OpenAI Agents SDK - Getting Started and What We've Learned Building With It
OpenAI shipped their Agents SDK and it's genuinely good. Not "good for an early release" - actually good. We've been building agent systems for clients for a while now, and the SDK handles a lot of the tedious scaffolding that we used to write ourselves. Define an agent, give it tools, run it, get a result. It works in both Python and TypeScript, which matters because most of our client teams have strong opinions about language choice.
The official quickstart walks through the basics clearly. Rather than repeating that, I want to focus on what we've learned from building real agent systems with this SDK and where the interesting design decisions live.
The Basics in Five Minutes
Install the SDK. Set your API key. Create an agent. Run it.
In TypeScript:
import { Agent, run } from "@openai/agents";

const agent = new Agent({
  name: "Research assistant",
  instructions: "You help with research questions. Be concise.",
  model: "gpt-5.4",
});

const result = await run(agent, "What are the main industries in Queensland?");
console.log(result.finalOutput);
In Python:
from agents import Agent, Runner

agent = Agent(
    name="Research assistant",
    instructions="You help with research questions. Be concise.",
    model="gpt-5.4",
)

# Inside an async function; use Runner.run_sync(...) from synchronous code
result = await Runner.run(agent, "What are the main industries in Queensland?")
print(result.final_output)
That's it. The SDK handles the model call, manages the conversation, and gives you back a result object with the final output and the full run history. No manual API calls, no parsing raw responses, no managing message arrays.
Adding Tools - Where It Gets Interesting
A bare agent that can only chat is useful for about ten minutes. The real value comes when you give agents tools they can call.
The SDK has a clean pattern for this. Define a tool with a name, description, parameters schema, and an execution function. Attach it to the agent. The agent decides when to call it based on the conversation context.
import { Agent, run, tool } from "@openai/agents";
import { z } from "zod";

const lookupCustomer = tool({
  name: "lookup_customer",
  description: "Look up a customer by their account number.",
  parameters: z.object({
    accountNumber: z.string(),
  }),
  async execute({ accountNumber }) {
    // In reality, this hits your CRM or database
    return `Customer ${accountNumber}: Acme Corp, active since 2019`;
  },
});

const agent = new Agent({
  name: "Account manager",
  instructions: "Help staff look up customer information.",
  tools: [lookupCustomer],
});
What I like about this approach is that it keeps the tool definition close to the agent that uses it. In some frameworks we've worked with, tools are defined in a separate registry and injected at runtime. That's more flexible but harder to reason about. The SDK's approach is simpler and, for most agent systems, simpler wins.
The parameters use Zod schemas in TypeScript, which gives you type safety and validation for free. The Python version uses standard type hints with the @function_tool decorator, which is similarly clean.
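To make the decorator's job concrete, here's a heavily simplified, hypothetical sketch of the kind of work @function_tool does under the hood: read a function's signature and type hints and turn them into a parameters schema the model can see. The real SDK handles much more (docstrings, defaults, nested models), and none of these helper names are part of its API.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python hints to JSON-schema-style types
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    """Derive a tool-style schema from a plain function's signature."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {
            "type": "object",
            "properties": {
                name: {"type": PY_TO_JSON.get(hints.get(name), "string")}
                for name in params
            },
            "required": list(params),
        },
    }

def lookup_customer(account_number: str) -> str:
    """Look up a customer by their account number."""
    return f"Customer {account_number}: Acme Corp, active since 2019"

schema = tool_schema(lookup_customer)
print(schema["parameters"]["properties"])
# {'account_number': {'type': 'string'}}
```

This is also why tool descriptions and parameter names matter so much: they are the only things the model sees when deciding which tool to call.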
A Word About Tool Design
The SDK makes it easy to add tools. Maybe too easy. We've seen a pattern where developers give an agent twenty tools and then wonder why it picks the wrong one half the time. The model has to decide which tool to call based on the tool descriptions and the current conversation. More tools means more ambiguity.
Our rule of thumb: start with the minimum set of tools your agent needs. If you have a customer support agent, it probably needs "lookup order", "check refund status", and "escalate to human." It probably doesn't also need "update shipping address", "modify subscription", and "generate invoice" in the same agent. Split those into specialist agents instead.
The Handoff Pattern - Multiple Specialists
This is where the SDK design gets really smart. Instead of building one monolithic agent that handles everything, you build specialist agents and let a triage agent route between them.
const orderAgent = new Agent({
  name: "Order specialist",
  instructions: "Handle questions about orders, shipping, and delivery.",
  tools: [lookupOrder, trackShipment],
});

const billingAgent = new Agent({
  name: "Billing specialist",
  instructions: "Handle questions about invoices, payments, and refunds.",
  tools: [lookupInvoice, processRefund],
});

// Agent.create (rather than new Agent) infers the correct output type
// for an agent with handoffs
const triageAgent = Agent.create({
  name: "Customer support",
  instructions: "Route customer questions to the right specialist.",
  handoffs: [orderAgent, billingAgent],
});

const result = await run(triageAgent, "Where's my order #12345?");
The triage agent looks at the user's question and hands off to the appropriate specialist. The specialist has its own tools, its own instructions, and its own context. This is cleaner than giving one agent all the tools and hoping it figures out the right workflow.
We've built customer service systems using exactly this pattern. The triage layer handles the initial routing, and each specialist is focused on one domain. When something doesn't fit any specialist, the triage agent can ask for clarification or fall back to a general response.
The result.lastAgent property tells you which agent actually handled the query, which is useful for analytics and monitoring. You want to know whether your billing specialist is handling 80% of queries (maybe you need to fix your invoicing system) or whether the triage agent is failing to route properly.
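Turning that into a routing report is trivial once you log each result's lastAgent name per query. A quick sketch, using a hypothetical list of logged names:

```python
from collections import Counter

# Hypothetical log of result.lastAgent.name values, one per handled query
handled_by = [
    "Billing specialist", "Order specialist", "Billing specialist",
    "Billing specialist", "Customer support", "Billing specialist",
]

counts = Counter(handled_by)
total = len(handled_by)
for name, n in counts.most_common():
    print(f"{name}: {n / total:.0%}")
```

A "Customer support" entry here means the triage agent answered without handing off at all, which is exactly the kind of thing you want surfaced rather than buried.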
State Management - The Part Nobody Talks About
The quickstart glosses over state management, but this is where production agent systems get tricky. After the first turn, you need to decide how to carry state into the next turn. The SDK gives you several options:
Application-managed history. You keep the conversation history in your application and pass it back on each turn. Full control, full responsibility.
Sessions. The SDK manages history loading and saving for you. Less code, but you're depending on the SDK's storage.
Server-managed continuation. OpenAI manages the continuation state server-side. Simplest option, but you're giving up control of your conversation data.
For our client projects, we almost always use application-managed history. The overhead is minimal, and it means we control exactly what gets stored, where it gets stored, and how long it persists. For organisations in regulated industries (and in Australia, that's a lot of organisations), this matters.
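A minimal sketch of what application-managed history can look like, assuming you store messages as role/content dicts in your own storage (here, JSON files; in practice a database). The class name and layout are ours, not the SDK's; in the Python SDK, result.to_input_list() gives you the updated history to persist after each turn.

```python
import json
from pathlib import Path

class ConversationStore:
    """Keeps per-conversation history in storage you control,
    so retention, location, and redaction are your decisions."""

    def __init__(self, root: str = "./conversations"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def load(self, conversation_id: str) -> list[dict]:
        path = self.root / f"{conversation_id}.json"
        return json.loads(path.read_text()) if path.exists() else []

    def save(self, conversation_id: str, history: list[dict]) -> None:
        (self.root / f"{conversation_id}.json").write_text(json.dumps(history))

store = ConversationStore()
history = store.load("cust-42")
history.append({"role": "user", "content": "Where's my order?"})
# ... run the agent with `history` as input, append the new output items ...
store.save("cust-42", history)
```

The point isn't the storage mechanism; it's that nothing about the conversation leaves your control unless you decide it should.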
Tracing - Don't Skip This
The SDK includes built-in tracing that pushes data to OpenAI's traces dashboard. Every model call, tool invocation, handoff, and guardrail evaluation gets recorded. This is genuinely useful during development.
When your agent gives a wrong answer or picks the wrong tool, you can open the trace and see exactly what happened. Which tools were considered. What the model's reasoning was. Where the conversation went sideways. Without tracing, debugging agent behaviour is like debugging a distributed system with print statements. Possible, but painful.
We enable tracing from day one on every agent project. The insights it provides during prompt tuning alone justify the effort.
What We've Learned From Production Deployments
A few hard-won lessons from deploying OpenAI agent systems for Australian businesses:
Instructions matter more than you think. The difference between a good agent and a bad agent is usually in the system instructions, not the model choice or tool design. Spend time writing clear, specific instructions. Tell the agent what it should do, what it should not do, and what to do when it's unsure. Be explicit about tone, format, and guardrails.
Test with real user inputs. Your test cases probably look like "What's the status of order #12345?" Real user inputs look like "hey the thing i bought last tuesday hasnt shown up yet can you check". Build your testing around messy, ambiguous, poorly formatted inputs, because that's what you'll get in production.
Monitor tool call patterns. Track which tools get called, how often, and in what sequences. Unexpected patterns (a tool never being called, or being called repeatedly in a loop) are early warning signs of prompt or tool design issues.
Start small and iterate. Build one agent with two tools. Get it working well. Then add a second agent. Then add handoffs. The SDK makes it tempting to build an elaborate multi-agent system from day one. Don't. Get the basics right first.
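The tool-call monitoring above doesn't need anything elaborate. A rough sketch, assuming you can extract an ordered list of tool names from each run's trace (the function and threshold here are our own invention, not an SDK feature):

```python
from collections import Counter

def tool_call_report(calls: list[str], loop_threshold: int = 3) -> dict:
    """Summarise a run's tool calls and flag suspicious repetition."""
    counts = Counter(calls)
    # Longest streak of the same tool being called back-to-back
    longest_run, run, prev = 0, 0, None
    for name in calls:
        run = run + 1 if name == prev else 1
        longest_run = max(longest_run, run)
        prev = name
    return {
        "counts": dict(counts),
        "possible_loop": longest_run >= loop_threshold,
    }

# Hypothetical sequence pulled from one run's trace
report = tool_call_report(
    ["lookup_order", "track_shipment", "track_shipment", "track_shipment"]
)
print(report)
# {'counts': {'lookup_order': 1, 'track_shipment': 3}, 'possible_loop': True}
```

Run this over a day's traces and the outliers (tools never called, tools stuck in loops) tend to point straight at a bad description or an ambiguous instruction.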
Where This Fits in the Agent Ecosystem
The OpenAI Agents SDK is one of several options for building agent systems. We also work with Azure AI Foundry, Claude's agent capabilities, and open-source frameworks like LangChain. Each has its strengths.
The OpenAI SDK's advantage is simplicity. If your organisation is already using OpenAI models and you want to add agent capabilities, this is the shortest path from zero to a working system. The TypeScript and Python support means it fits into most tech stacks without friction.
For organisations building on Microsoft's ecosystem, Azure AI Foundry offers tighter integration with Azure services and enterprise governance features. For more complex orchestration needs, tools like OpenClaw or custom frameworks might be more appropriate.
If you're trying to figure out which approach makes sense for your use case, our AI agent development team can help you evaluate the options and build a proof of concept. We've deployed agents using multiple frameworks and can give you an honest assessment of what works best for your specific requirements and existing infrastructure.
For organisations just starting their AI agent journey, our AI consulting practice can help you identify the right use cases and design an architecture that scales. Getting the foundations right matters more than picking the hottest framework.