OpenAI Agents SDK - Choosing Models and Providers for Production Workflows
When you're building agent systems with the OpenAI Agents SDK, one of the first decisions you face is model selection. It sounds simple - pick a model, set it, move on. But in practice, production agent workflows often need different models for different tasks within the same system. A triage agent that classifies incoming requests doesn't need the same model as a specialist agent that writes detailed analysis reports.
Getting this right from the start saves you from painful refactoring later. I've seen teams build entire agent systems against a single model, then discover that their costs are three times what they expected because every simple classification step was running through their most expensive model.
At Team 400, we build agent systems for Australian organisations across a range of industries, and model selection strategy is something we think about early in every project. Here's what we've learned.
Be explicit about model selection
The OpenAI Agents SDK lets you set models at three levels: per agent, per run, and as a process-wide environment variable. My strong recommendation is to be explicit rather than relying on defaults.
Here's what that looks like in TypeScript:
import { Agent, Runner } from "@openai/agents";

const triageAgent = new Agent({
  name: "Triage agent",
  instructions: "Classify incoming requests by priority and type.",
  model: "gpt-5.4-mini", // fast, cheap model for a simple classification task
});

const analysisAgent = new Agent({
  name: "Analysis agent",
  instructions: "Produce detailed investigation reports.",
  model: "gpt-5.4", // full-capability model where output quality matters
});

// Run-level model, set once for this runner's workflows.
const runner = new Runner({
  model: "gpt-5.4",
});
And the Python equivalent:
from agents import Agent, RunConfig, Runner

triage_agent = Agent(
    name="Triage agent",
    instructions="Classify incoming requests by priority and type.",
    model="gpt-5.4-mini",  # fast, cheap model for a simple classification task
)

# No model set here: the run-level model passed via RunConfig applies.
analysis_agent = Agent(
    name="Analysis agent",
    instructions="Produce detailed investigation reports.",
)

# Inside an async function (Runner.run is a coroutine):
result = await Runner.run(
    analysis_agent,
    "Investigate the billing issue on account 456.",
    run_config=RunConfig(model="gpt-5.4"),
)
The triage agent uses gpt-5.4-mini because it's doing a simple classification task. Speed matters more than depth. The analysis agent uses gpt-5.4 because the output quality needs to be high.
If you don't set a model explicitly, the agent uses whatever the SDK's current default happens to be. That default can change between SDK versions. I've seen this bite teams in production when an SDK upgrade silently changed the default model and their costs shifted overnight. Set your models explicitly. It takes 10 seconds and prevents surprises.
The model selection decision framework
Here's how I think about which model goes where:
Use the smaller, faster model when:
- The task is classification, routing, or extraction
- The output is structured (JSON, categories, yes/no decisions)
- Latency matters more than nuance
- The agent handles high volume and costs need to stay reasonable
Use the larger, more capable model when:
- The task requires reasoning across multiple pieces of information
- The output is unstructured text that needs to read well
- Accuracy on edge cases matters more than speed
- The agent handles lower volume but higher stakes decisions
Use the run-level default when:
- You want to swap models for an entire workflow without editing every agent
- You're running the same agents in different environments (dev uses a cheaper model, production uses the full-capability one)
- You want one override point for testing or experimentation
Use the environment variable (OPENAI_DEFAULT_MODEL) when:
- You want a process-wide fallback for agents that don't specify a model
- You're running batch jobs where the model should be consistent across everything
In practice, most of our production agent systems use a mix of per-agent and run-level models. The agents that need specific models get them set directly. Everything else inherits from the run config, which we can adjust per environment.
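To make the per-environment part concrete, here's a minimal pure-Python sketch of choosing the run-level model from the deployment environment. The APP_ENV variable name and the MODELS_BY_ENV mapping are illustrative assumptions, not part of the SDK; the returned string is what you'd pass as the run-level model.

```python
import os

# Hypothetical mapping from deployment environment to run-level model.
MODELS_BY_ENV = {
    "development": "gpt-5.4-mini",  # keep dev costs low while iterating
    "production": "gpt-5.4",        # full capability where it matters
}

def run_level_model(default_env: str = "development") -> str:
    """Pick the run-level model from APP_ENV, falling back to the default environment."""
    env = os.environ.get("APP_ENV", default_env)
    return MODELS_BY_ENV.get(env, MODELS_BY_ENV[default_env])
```

In the Python SDK you'd feed the result into RunConfig(model=...) when starting the workflow; the agents with explicit models keep them, and everything else follows the environment.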
Transport options - when WebSockets matter
Most agent workflows use the standard HTTPS request-response pattern. Your agent makes a call to the OpenAI API, waits for the response, processes it, makes the next call. Simple, reliable, well-understood.
The SDK also supports a WebSocket transport for the Responses API. This keeps a persistent connection open, which reduces the overhead of establishing a new HTTPS connection for each round trip. If your agent is doing many sequential model calls in a tight loop - think an agent that iterates through a document paragraph by paragraph - the WebSocket transport can reduce total latency noticeably.
But here's my honest take: for most agent workflows, the standard HTTPS path is fine. The connection establishment overhead is measured in milliseconds. Unless you're doing dozens of sequential calls where those milliseconds compound into noticeable delays, the WebSocket transport adds complexity without meaningful benefit.
Where WebSocket transport does make a difference is in scenarios where you're doing rapid back-and-forth model calls - something like an agent that's interacting with a user in real-time and needs to make multiple tool calls per user message. In those cases, the persistent connection keeps things feeling responsive.
One important distinction: the Responses WebSocket transport is not the same as the live audio/voice path. If you're building voice-enabled agents, that's a completely separate API surface using WebRTC or WebSocket for audio streaming. The Voice agents documentation covers that path.
Non-OpenAI models and mixed-provider setups
The SDK has a provider abstraction layer that lets you swap in non-OpenAI models. The exact configuration is language-specific - the TypeScript and Python SDKs handle it differently - so check the SDK docs for your language.
This matters for a few reasons. Some organisations have compliance requirements that restrict which model providers they can use. Others want to run certain agents against local models for cost or latency reasons while keeping their high-capability agents on OpenAI. Mixed-provider setups are becoming more common as the model market matures.
My practical advice: start with OpenAI models for everything. Get your agent system working and validated. Then, if you need to swap specific agents to different providers for cost, compliance, or latency reasons, use the provider abstraction to make those changes surgically. Don't start with a mixed-provider setup from day one - it adds debugging complexity before you even have a working system.
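One way to keep those swaps surgical is a small registry that defaults every agent to OpenAI and overrides only the named exceptions. This is a plain-Python sketch of the pattern, not the SDK's provider abstraction: the provider names, agent names, and config shape are all illustrative assumptions.

```python
# Everything defaults to OpenAI until a specific agent needs otherwise.
DEFAULT_PROVIDER = {"provider": "openai", "model": "gpt-5.4"}

# Hypothetical overrides, e.g. a local model for a high-volume, low-stakes agent.
PROVIDER_OVERRIDES = {
    "Triage agent": {"provider": "local", "model": "local-classifier"},
}

def provider_config(agent_name: str) -> dict:
    """Return the provider config for one agent, defaulting to OpenAI."""
    return PROVIDER_OVERRIDES.get(agent_name, DEFAULT_PROVIDER)
```

The point of the shape is that adding or removing an override touches one line, so a compliance- or cost-driven swap doesn't ripple through the rest of the system.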
Model settings beyond just the model name
Choosing a model is only part of the configuration. The SDK also lets you control:
- Reasoning effort - how much "thinking" the model does before responding. Lower effort means faster, cheaper responses. Higher effort means more thorough reasoning. For a triage agent, low effort is fine. For a complex analysis agent, you want high effort.
- Stored prompts - instead of embedding system prompts in code, you can reference prompt configurations stored externally. This is useful when non-technical team members need to update agent behaviour without touching the codebase.
- Tool behaviour - how the model interacts with available tools. Some features here depend on the Responses API rather than the older Chat Completions surface.
Practical patterns from our projects
A few patterns we've settled on across multiple agent projects:
Environment-based model overrides. We set the run-level model based on an environment variable, so the same agent code runs against gpt-5.4-mini in development and gpt-5.4 in production. This keeps dev costs low while testing, and we only see production-grade model behaviour when it matters.
Cost monitoring per agent. When you have multiple agents with different models, track token usage per agent, not just per workflow. This lets you spot when a particular agent is consuming more tokens than expected - often a sign that its prompt needs refinement or its task decomposition is wrong.
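A minimal sketch of what per-agent tracking can look like, independent of any SDK: the class name and counter fields are assumptions. In a real system you'd feed it the token counts reported on each run result.

```python
from collections import defaultdict

class AgentUsageTracker:
    """Aggregate token usage per agent, not just per workflow (illustrative sketch)."""

    def __init__(self):
        self._usage = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})

    def record(self, agent_name: str, input_tokens: int, output_tokens: int) -> None:
        """Add one run's token counts to the agent's running totals."""
        self._usage[agent_name]["input_tokens"] += input_tokens
        self._usage[agent_name]["output_tokens"] += output_tokens

    def report(self) -> dict:
        """Return per-agent totals, ready for logging or a dashboard."""
        return {name: dict(counts) for name, counts in self._usage.items()}
```

Even a crude tracker like this makes it obvious when one agent's consumption drifts out of line with its siblings.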
Explicit model pinning for production. Don't use model aliases that auto-update in production. If you're running gpt-5.4, specify the exact version. When a new version is available, test it in staging first, then update the pin.
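One way to enforce that pinning is to resolve aliases through a registry at startup, so an unpinned alias fails loudly instead of silently auto-updating. The version strings below are placeholders for illustration, not real model identifiers.

```python
# Hypothetical pin registry: aliases resolve to exact, staging-tested versions.
MODEL_PINS = {
    "gpt-5.4": "gpt-5.4-2025-08-01",            # placeholder version string
    "gpt-5.4-mini": "gpt-5.4-mini-2025-08-01",  # placeholder version string
}

def resolve_model(alias: str) -> str:
    """Map a model alias to its pinned version, failing loudly if unpinned."""
    try:
        return MODEL_PINS[alias]
    except KeyError:
        raise ValueError(f"No pinned version for model alias {alias!r}")
```

Updating a pin then becomes a deliberate, reviewable one-line change rather than something that happens to you when the provider rolls an alias forward.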
Where this fits in your agent architecture
Model selection is one piece of a larger design. How you define your agents, how you orchestrate runs, and how you handle tool calling all interact with model choice. A well-chosen model in a poorly designed agent system won't save you - and a mediocre model in a well-designed system can perform surprisingly well.
For the full picture on the OpenAI Agents SDK, the official documentation covers models, providers, and transport options. If you're building agent systems for your organisation and want help with architecture decisions - not just model selection, but the full design of your agent workflows - reach out to our team. We specialise in building production AI agent systems for Australian businesses, from initial design through to deployment and ongoing refinement on Azure AI infrastructure.