OpenAI's Agent Platform - What It Actually Offers and Where It Fits
OpenAI has been building out their agent platform steadily over the past year, and it's reached a point where it's worth talking about seriously. Not in a "this changes everything" way - we've heard that phrase enough - but in a practical "should we use this for our next project?" way.
I spent some time going through their updated agents documentation, and here's my honest assessment of where OpenAI's agent tooling stands, what it does well, and where you might want to look elsewhere.
What OpenAI Is Actually Offering
Let me break this down because the marketing language can be thick. OpenAI's agent platform has a few distinct components:
Agent Builder is a visual canvas for designing agent workflows. You drag and drop components - models, tools, logic nodes, guardrails - and wire them together. Think of it like a flowchart builder, but each node can call an LLM, search a vector store, execute a function, or route to another agent. It's the no-code/low-code entry point.
AgentKit is the underlying modular toolkit. If Agent Builder is the GUI, AgentKit is the programmatic layer underneath. You can build the same agent workflows in code, which gives you version control, testing, and CI/CD integration that a visual builder can't easily provide.
Agents SDK is for developers who want to build agentic applications from scratch. It handles the orchestration, tool calling, and agent handoffs, but you write your own code around it. This is where teams with software engineering capability will spend most of their time.
ChatKit is the deployment layer. It's a customizable UI component that you embed in your product. Point it at an agent workflow (or run it on your own infrastructure with the SDK), and you've got a chat interface connected to your agent backend.
The pitch is end-to-end: design your agent, equip it with tools, test it, deploy it, embed it, monitor it, optimise it. All within OpenAI's ecosystem.
What Works Well
The model quality is strong. Whatever you think about OpenAI as a company, their models are genuinely good at agentic tasks. The reasoning capability, the ability to decide which tool to use and when, the handling of multi-step workflows - it's mature. When we build agents on OpenAI models, we spend less time wrestling with the model's decision-making and more time on the actual business logic.
Function calling is well-implemented. OpenAI was early to structured tool calling, and their implementation is still one of the most reliable. The model formats function calls correctly, handles errors sensibly, and can chain multiple tool calls in a single turn. For agents that interact with APIs and databases, this matters a lot.
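To make the mechanics concrete, here's a minimal sketch of the dispatch side of tool calling: the model returns a structured call with a function name and JSON-encoded arguments, and your code looks up the function and executes it. The `get_order_status` tool and its shape are illustrative, not part of any OpenAI API.

```python
import json

# Hypothetical local tool; name and signature are illustrative.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

# Registry mapping tool names (as declared to the model) to functions.
TOOLS = {"get_order_status": get_order_status}

def dispatch_tool_call(tool_call: dict) -> str:
    """Execute one structured tool call of the general shape the API
    returns: {"function": {"name": ..., "arguments": "<json string>"}}."""
    fn = TOOLS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    # The result goes back to the model as a string in the next turn.
    return json.dumps(fn(**args))

# Simulated model output; in production this comes from the API response.
call = {"function": {"name": "get_order_status",
                     "arguments": '{"order_id": "A-123"}'}}
result = dispatch_tool_call(call)
```

The point of the registry pattern is that adding a tool is one function plus one dictionary entry, which keeps the dispatch loop stable as the agent grows.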
The evaluation tooling is surprisingly good. Agent evals (evaluating whether your agent does the right thing) are one of the hardest parts of building production agents. OpenAI's eval platform, including trace grading and datasets, gives you a structured way to measure agent performance over time. Most teams skip this entirely, build an agent that works on five test cases, and wonder why it fails in production. Having eval tooling built into the platform lowers the barrier to actually doing it.
Vector stores and file search are integrated. If your agent needs to search through documents - internal knowledge bases, product manuals, policy documents - OpenAI hosts vector stores that your agent can query directly. No need to set up a separate vector database. For smaller-scale deployments, this is genuinely convenient.
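Under the hood, file search is just similarity ranking over embeddings. A toy version, with two-dimensional stand-in vectors instead of real embeddings, shows the shape of what the hosted store does for you:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "vector store": doc name -> embedding. Real embeddings have
# hundreds of dimensions; these 2-D vectors are purely illustrative.
DOCS = {
    "leave-policy": [1.0, 0.0],
    "product-manual": [0.0, 1.0],
}

def search(query_vec, k=1):
    """Return the k most similar document names."""
    ranked = sorted(DOCS.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

top = search([0.9, 0.1])
```

The hosted option saves you running this yourself at scale (chunking, embedding, indexing, re-ranking), which is exactly the convenience being traded against the lock-in discussed below.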
Where It Gets Complicated
Vendor lock-in is real. If you build your entire agent stack on OpenAI's platform - Agent Builder, vector stores, ChatKit - you're deeply coupled to OpenAI. Their pricing changes, their API changes, their reliability issues are now your reliability issues. We saw this play out when OpenAI had capacity issues last year and agents that depended entirely on their infrastructure went down.
Our general advice: use OpenAI's models, but keep your orchestration layer portable. Build on the Agents SDK or use an open-source framework where you can swap models if you need to. We do a lot of work on Azure AI Foundry, which gives you access to OpenAI models through Azure's infrastructure with enterprise SLAs and data residency options. For Australian organisations with data sovereignty requirements, that matters.
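"Keep the orchestration layer portable" has a concrete shape in code: define a thin provider-agnostic interface and write everything against it. The classes below are stubs (the real ones would wrap the OpenAI and Azure clients); only the pattern is the point.

```python
from typing import Protocol

class ChatModel(Protocol):
    """Minimal provider-agnostic interface. Orchestration code depends
    on this, never on a vendor SDK directly."""
    def complete(self, prompt: str) -> str: ...

class OpenAIModel:
    # Stub: in practice this would wrap the OpenAI client.
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"

class AzureModel:
    # Stub: same models via Azure AI Foundry; swapping providers
    # requires no changes to callers.
    def complete(self, prompt: str) -> str:
        return f"[azure] {prompt}"

def answer(model: ChatModel, question: str) -> str:
    """All agent logic takes the interface, not a concrete provider."""
    return model.complete(question)

reply = answer(AzureModel(), "What is our refund policy?")
```

When pricing, reliability, or data-residency requirements change, the swap is one constructor call rather than a rewrite.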
Agent Builder is limited for production workloads. The visual builder is great for prototyping. You can get a working agent in an afternoon. But production agents need error handling, logging, retry logic, integration testing, deployment pipelines - things that are hard to do in a visual tool. Most teams that start with Agent Builder end up moving to the SDK for anything they're putting in front of real users.
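Retry logic is a good example of the glue that's awkward in a visual builder but routine in code. A sketch of exponential backoff around a flaky agent call (the flaky stub simulates a transient API error):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying with exponential backoff on any exception.
    The final failure is re-raised so callers can handle or log it."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Stub that fails twice, then succeeds - simulating a transient outage.
calls = {"n": 0}
def flaky_agent_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = with_retries(flaky_agent_call)
```

In a real deployment you'd narrow the exception types and add logging, but even this much is hard to express in a drag-and-drop canvas.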
ChatKit solves a narrow problem. If you need a chat interface and nothing else, ChatKit is fine. But most real agent deployments need more than a chat window. They need integration into existing applications, custom UIs, background processing, webhook triggers. ChatKit doesn't address those scenarios, so you end up building custom frontend code anyway.
The pricing model needs careful analysis. You're paying for model inference (per token), vector store hosting (per GB per day), and potentially file search queries. For a small agent handling a few hundred queries a day, costs are reasonable. For an agent that processes thousands of queries with large context windows and multiple tool calls per query, costs can scale fast. We always model out expected costs before recommending a platform, and we've talked clients out of approaches that would have cost them five figures a month in API calls.
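The cost modelling itself is simple arithmetic once you pin down token volumes. A back-of-envelope sketch - the per-million-token prices below are placeholders, not OpenAI's actual rate card:

```python
def monthly_cost(queries_per_day, in_tokens, out_tokens,
                 in_price_per_m, out_price_per_m, days=30):
    """Estimate monthly inference spend. Prices are per million tokens;
    substitute current rate-card numbers before trusting the output."""
    per_query = (in_tokens * in_price_per_m
                 + out_tokens * out_price_per_m) / 1_000_000
    return per_query * queries_per_day * days

# Illustrative heavy workload: 5,000 queries/day, 30k input tokens each
# (large context plus tool results), 1k output, placeholder prices.
cost = monthly_cost(5_000, 30_000, 1_000, 2.50, 10.00)
```

With these placeholder numbers the workload lands around $12,750 a month - exactly the kind of five-figure surprise worth catching in a spreadsheet before it shows up on an invoice.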
How It Compares to Other Approaches
We build agents across multiple platforms - OpenAI, Azure AI, open-source frameworks, and custom solutions. Here's where OpenAI fits in our view:
For straightforward conversational agents (customer service, FAQ bots, internal knowledge assistants), OpenAI's platform is a solid choice. The model quality is high, the tooling is mature, and you can get to production relatively quickly.
For complex multi-agent systems with specialised agents handing off to each other, we still prefer custom orchestration. Frameworks like Semantic Kernel or our own orchestration layer give us more control over agent routing, state management, and error recovery. OpenAI's Agents SDK can do this, but it's not as flexible as purpose-built orchestration code.
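The "purpose-built orchestration" we're describing can be as simple as a triage function plus a registry of specialists, with the orchestrator owning routing and (in a real system) state and error recovery. Everything here is illustrative:

```python
def triage(message: str) -> str:
    """Toy routing rule; a real triage step would use a classifier
    or a small model rather than keyword matching."""
    return "billing" if "invoice" in message.lower() else "support"

# Specialist agents as callables; in practice each wraps its own
# model, tools, and instructions.
AGENTS = {
    "billing": lambda m: f"billing handled: {m}",
    "support": lambda m: f"support handled: {m}",
}

def orchestrate(message: str) -> str:
    """The orchestrator controls the handoff explicitly, which is
    where custom code earns its keep over SDK defaults."""
    specialist = triage(message)
    return AGENTS[specialist](message)

reply = orchestrate("Question about my invoice from March")
```

Owning this loop means you decide what happens on a bad route or a failed specialist - the control that generic handoff mechanisms tend to hide.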
For enterprise deployments with compliance requirements, Azure AI Foundry with OpenAI models is usually the better path. You get the same model capabilities with Azure's security, networking, and compliance infrastructure. Data stays in your Azure tenant, you get enterprise support, and your IT security team is much happier.
For cost-sensitive applications, consider whether you actually need GPT-4o for every interaction. Many agent tasks - routing, classification, simple extraction - work fine with smaller models. A tiered approach where a smaller model handles routine tasks and a larger model handles complex reasoning can cut costs by 60-70%.
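The tiered approach reduces to a routing rule plus some arithmetic on the blended cost. Model names and the price ratio below are placeholders; the savings figure depends entirely on your actual traffic mix:

```python
# Intents a small model handles well enough; illustrative set.
ROUTINE_INTENTS = {"routing", "classification", "extraction"}

def pick_model(intent: str) -> str:
    """Route routine work to the cheap tier, reasoning to the big one."""
    return "small-model" if intent in ROUTINE_INTENTS else "large-model"

def blended_savings(routine_share, small_cost, large_cost):
    """Fractional saving vs. sending everything to the large model."""
    blended = routine_share * small_cost + (1 - routine_share) * large_cost
    return 1 - blended / large_cost

# If 70% of traffic is routine and the small model costs 1/15th as
# much per call, the blend saves roughly 65%.
savings = blended_savings(0.7, 1.0, 15.0)
```

That's where the 60-70% figure comes from: it's mostly a function of what share of your traffic is genuinely routine.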
Practical Recommendations
If you're evaluating OpenAI's agent platform for your organisation, here's what I'd suggest:
Start with a specific use case. Don't try to build a general-purpose AI assistant. Pick one well-defined workflow - expense report processing, customer inquiry routing, document summarisation - and build an agent for that. You'll learn more from one focused deployment than from a broad prototype.
Prototype in Agent Builder, build in SDK. Use the visual tool to validate your workflow and test the basic flow. Then rebuild it properly in code with error handling, logging, and tests. This sounds like doing the work twice, but the prototype phase is fast and catches design problems early.
Set up evals from day one. Build a test dataset of expected inputs and outputs before you write any agent code. Run your agent against this dataset after every change. This catches regressions that manual testing misses. OpenAI's eval tools make this easier than rolling your own, so actually use them.
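The dataset-first habit needs very little machinery to start. A minimal regression-style harness - `classify_inquiry` is a keyword stub standing in for your real agent call, and the eval set is illustrative:

```python
# Stand-in for the real agent; swap in your actual model call.
def classify_inquiry(text: str) -> str:
    text = text.lower()
    if "refund" in text:
        return "billing"
    if "password" in text:
        return "account"
    return "general"

# Expected inputs and outputs, written BEFORE the agent code.
EVAL_SET = [
    ("I want a refund for my last order", "billing"),
    ("I forgot my password", "account"),
    ("What are your opening hours?", "general"),
]

def run_evals(agent, dataset):
    """Run the agent over the dataset and return (pass rate, details)."""
    results = [(inp, expected, agent(inp)) for inp, expected in dataset]
    passed = sum(1 for _, exp, got in results if exp == got)
    return passed / len(results), results

score, details = run_evals(classify_inquiry, EVAL_SET)
```

Run this after every prompt or tool change and you have a regression gate; graduating to OpenAI's hosted eval tooling later is then a data migration, not a process change.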
Don't ignore the alternatives. OpenAI's platform is good, but it's not the only option. Microsoft's Copilot Studio, Azure AI Agent Service, and open-source frameworks all have strengths in different areas. The right platform depends on your existing tech stack, compliance needs, budget, and team capabilities.
We've built agent systems across most of these platforms for Australian organisations in finance, healthcare, professional services, and manufacturing. The technology choices matter, but they're usually less important than getting the use case right and building proper guardrails around the agent's behaviour. If you're exploring what agents could do for your business, our AI agent development team can help you evaluate options and build something that actually works in production, not just in a demo.
The agent space is moving fast, and OpenAI is clearly investing heavily in their platform. It's worth watching, worth experimenting with, and for the right use cases, worth building on. Just go in with clear requirements and realistic expectations about what these tools can and can't do today.