Migrating from OpenAI Assistants API - What You Need to Know

April 4, 2026 • 8 min read • Michael Ridland

If you built AI agents or assistants using OpenAI's Assistants API over the past couple of years, you've probably seen the writing on the wall. OpenAI has been steering developers towards their newer Responses API, and the Assistants API is heading for deprecation. This isn't a "maybe eventually" situation - it's happening, and teams that don't plan for it will be scrambling.

We've been helping Australian organisations build AI-powered applications with OpenAI's APIs since the early GPT-4 days, and we've now gone through several of these migrations with clients. Here's what we've learned about making the transition without breaking things.

Why OpenAI Is Moving Away from the Assistants API

The Assistants API was OpenAI's first real attempt at a stateful, agent-like API. You could create an assistant with instructions, attach tools (code interpreter, file search, function calling), and manage conversation threads that persisted on OpenAI's servers. It worked, and for a while it was the best option for building AI applications that needed memory and tool use.

But the architecture had problems. State management was entirely on OpenAI's side. Your conversation threads lived on their servers, which meant you were dependent on their storage, their rate limits, and their data retention policies. If you needed to inspect what happened in a conversation for debugging or compliance, you had to pull it through the API. If OpenAI had an outage, your conversation history was inaccessible.

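The Responses API takes a different approach. Instead of requiring server-side threads, it lets you own the state. In the pattern we recommend, you send the full context with each request and get back a response. If you need conversation history, you manage it yourself. (The API can also store responses and chain turns through a previous_response_id parameter, but that puts the state back on OpenAI's side.) If you need tool calls, they're returned inline in the response rather than through a polling mechanism.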

This is actually better for production systems, even though it feels like more work. You control your data. You control your state. You don't wake up one morning to find OpenAI changed their thread retention policy and your conversation history is gone.

What Actually Changes

Let me walk through the concrete differences that matter for migration:

Thread management goes away

In the Assistants API, you created threads, added messages to them, and ran the assistant against the thread. The thread was a server-side object with its own ID and lifecycle.

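# Old Assistants API pattern
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumes `assistant` was created earlier with client.beta.assistants.create(...)
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyse this quarterly report"
)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)
# The run is asynchronous: poll it until it finishes, then list the thread's messages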

In the Responses API, you send messages directly. If you want conversation history, you include previous messages in the request:

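# New Responses API pattern
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    input=[
        {"role": "user", "content": "Analyse this quarterly report"}
    ]
)
print(response.output_text)  # convenience property: the text output as one string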

For multi-turn conversations, you append the assistant's response to your message list and send everything back with the next request. Yes, this means you're sending more data with each request. But it also means you have full control over what context the model sees, and you can trim or summarise old messages as conversations get long.
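
A minimal sketch of that loop, assuming a hypothetical Conversation helper (it is not part of the OpenAI SDK). The model call is injected as a function so the history logic runs without network access; in real use, send_fn would wrap client.responses.create and return response.output_text.

```python
from dataclasses import dataclass, field
from typing import Callable

Message = dict[str, str]


@dataclass
class Conversation:
    """Client-side conversation history (hypothetical helper, not an SDK class)."""

    send_fn: Callable[[list[Message]], str]  # takes the full history, returns reply text
    messages: list[Message] = field(default_factory=list)

    def ask(self, user_text: str) -> str:
        # Append the new user turn, then send the whole history.
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send_fn(list(self.messages))
        # Store the assistant's reply so the next request includes it.
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_model(history: list[Message]) -> str:
    """Stand-in for the real API call; reports how many messages it received."""
    return f"saw {len(history)} message(s)"


chat = Conversation(send_fn=fake_model)
chat.ask("Analyse this quarterly report")
chat.ask("Summarise the risks")
```

Because the history is an ordinary list, trimming or summarising it before the next call is a plain list operation rather than an API feature.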

Tool calling gets simpler

The Assistants API had a somewhat awkward tool-calling flow. You'd start a run, poll for status, check if the run needed action (tool calls), execute the tools yourself, submit the outputs back, and then continue polling. It worked, but the polling loop was fragile and hard to debug.

The Responses API replaces that loop with plain request-response exchanges. The model returns tool call requests in its response, you execute them, and you send the results back in the next request. Streaming makes this even smoother - you get tool call events in the stream and can handle them as they arrive.

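# Tool calls in the Responses API
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[{
        "type": "function",
        "name": "get_customer_data",
        "description": "Look up customer information by ID",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"}
            },
            "required": ["customer_id"]
        }
    }],
    input=[{"role": "user", "content": "Look up customer C-4521"}]
)

# Tool call requests come back as items in response.output
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments)  # arguments is a JSON string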

No more run objects. No more polling. The conversation flow is more predictable and easier to reason about.
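
The "execute them and send the results back" step might look like the sketch below. It assumes tool calls arrive as output items with type "function_call" carrying name, arguments (a JSON string), and call_id, and that results go back as "function_call_output" items. The get_customer_data body, the TOOLS registry, and the fake output item are hypothetical stand-ins so the code runs without an API key.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def get_customer_data(customer_id: str) -> dict[str, str]:
    """Hypothetical local implementation of the customer lookup tool."""
    return {"customer_id": customer_id, "status": "active"}


# Maps tool names the model may request to local Python functions.
TOOLS: dict[str, Callable[..., Any]] = {"get_customer_data": get_customer_data}


def run_tool_calls(output_items: list[Any]) -> list[dict[str, str]]:
    """Execute each function_call item and build the items to send back."""
    results = []
    for item in output_items:
        if item.type != "function_call":
            continue  # skip text messages and other item types
        args = json.loads(item.arguments)  # arguments arrive as a JSON string
        result = TOOLS[item.name](**args)
        results.append({
            "type": "function_call_output",
            "call_id": item.call_id,  # ties the result to the original request
            "output": json.dumps(result),
        })
    return results


# Stand-in for response.output from a real API call.
fake_output = [SimpleNamespace(
    type="function_call",
    name="get_customer_data",
    arguments='{"customer_id": "C-4521"}',
    call_id="call_123",
)]
tool_results = run_tool_calls(fake_output)
```

In a real flow you would pass response.output to run_tool_calls, then make a second client.responses.create call whose input is the earlier input, plus the model's output items, plus tool_results.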

File search and code interpreter

The Assistants API bundled file search and code interpreter as built-in tools. You'd upload files to OpenAI, attach them to an assistant or thread, and the tools would work against those files.

In the Responses API, these capabilities still exist but work differently. File search uses vector stores that you manage explicitly. Code interpreter is still available as a built-in tool type. The key difference is that file management is more explicit - you create vector stores, add files to them, and reference them in your requests.

This is more work to set up but gives you better visibility into what's happening. With the Assistants API, files were somewhat opaque - you uploaded them and hoped the search worked. With the Responses API approach, you can inspect your vector stores, manage file lifecycle, and understand why a search returned specific results.
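
To make "reference them in your requests" concrete, here is a small sketch of the tool entry a request would carry. The helper name file_search_tool is our own, and the vector store ID is a placeholder. Check the field names (vector_store_ids, max_num_results) against the current API reference before relying on them.

```python
from typing import Any, Optional


def file_search_tool(vector_store_ids: list[str],
                     max_num_results: Optional[int] = None) -> dict[str, Any]:
    """Build a file_search tool entry pointing at explicit vector stores."""
    if not vector_store_ids:
        raise ValueError("file search needs at least one vector store ID")
    tool: dict[str, Any] = {
        "type": "file_search",
        "vector_store_ids": list(vector_store_ids),
    }
    if max_num_results is not None:
        tool["max_num_results"] = max_num_results  # optional cap on results
    return tool


# "vs_example" is a placeholder, not a real vector store ID.
tools = [file_search_tool(["vs_example"], max_num_results=5)]
```

The resulting list is what you would pass as the tools argument of a request.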

Planning the Migration

Based on the migrations we've done, here's a practical approach:

Step 1 - Audit your current usage

Before you change any code, understand exactly how you're using the Assistants API. Map out:

How many assistants you have and what each one does
Which tools each assistant uses (function calling, file search, code interpreter)
How you manage threads and conversation history
What your error handling looks like
How files are uploaded and managed

We build a spreadsheet for this. It sounds old-fashioned, but having every assistant, its configuration, and its dependencies in one place makes the migration plannable rather than chaotic.
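
If you would rather generate that spreadsheet than type it, a sketch like the following is enough. The column names mirror the audit list above but are otherwise our own choice, and the example row is made up.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class AssistantRecord:
    """One row of the audit; the column names are illustrative."""

    name: str
    purpose: str
    tools: str            # e.g. "function calling; file search"
    thread_handling: str  # how threads and history are managed today
    error_handling: str   # retries, timeouts, failed-run handling
    file_handling: str    # how files are uploaded and managed


def to_csv(records: list[AssistantRecord]) -> str:
    """Render the inventory as CSV text that opens in any spreadsheet."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(AssistantRecord)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


inventory = [AssistantRecord(
    name="report-analyst",  # hypothetical example entry
    purpose="Analyses quarterly reports",
    tools="file search; code interpreter",
    thread_handling="one thread per user session",
    error_handling="retry failed runs twice",
    file_handling="uploaded per thread",
)]
audit_csv = to_csv(inventory)
```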

Step 2 - Build your state management layer

This is the biggest piece of work. The Assistants API managed state for you. Now you need to manage it yourself. You'll need:

A database for conversation history (PostgreSQL works well for this)
Logic to trim or summarise long conversations before they exceed context limits
Session management to associate conversations with users
Possibly a caching layer if you're doing high-volume interactions

Don't over-engineer this. A simple conversations table with user_id, messages (as JSON), created_at, and updated_at covers most use cases. You can add complexity later if you need it.
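
As a rough sketch of that table, the example below uses Python's built-in sqlite3 module with an in-memory database so it runs anywhere. That is a stand-in for PostgreSQL, where a JSONB column would be the natural choice for messages. The function names are our own.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id         INTEGER PRIMARY KEY,
    user_id    TEXT NOT NULL,
    messages   TEXT NOT NULL,  -- JSON-encoded list of messages
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
)
"""


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def create_conversation(db: sqlite3.Connection, user_id: str) -> int:
    """Insert an empty conversation and return its ID."""
    now = _now()
    cur = db.execute(
        "INSERT INTO conversations (user_id, messages, created_at, updated_at) "
        "VALUES (?, ?, ?, ?)",
        (user_id, "[]", now, now),
    )
    db.commit()
    return cur.lastrowid


def load_messages(db: sqlite3.Connection, conversation_id: int) -> list[dict]:
    """Return the stored message list for one conversation."""
    row = db.execute(
        "SELECT messages FROM conversations WHERE id = ?", (conversation_id,)
    ).fetchone()
    if row is None:
        raise KeyError(conversation_id)
    return json.loads(row[0])


def append_message(db: sqlite3.Connection, conversation_id: int,
                   role: str, content: str) -> None:
    """Append one message and bump updated_at."""
    messages = load_messages(db, conversation_id)
    messages.append({"role": role, "content": content})
    db.execute(
        "UPDATE conversations SET messages = ?, updated_at = ? WHERE id = ?",
        (json.dumps(messages), _now(), conversation_id),
    )
    db.commit()


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
conv_id = create_conversation(db, user_id="user-1")
append_message(db, conv_id, "user", "Analyse this quarterly report")
append_message(db, conv_id, "assistant", "Here is the analysis.")
```

The read-modify-write in append_message is fine for a single process. With concurrent writers you would want a transaction or a row lock around it.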

Step 3 - Migrate tool definitions

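Function calling tools transfer almost directly. The JSON Schema you wrote for parameters carries over unchanged. The main structural difference is that the Assistants API nests name, description, and parameters under a "function" key, while the Responses API puts them at the top level of the tool object. Beyond that, you'll mainly need to change how you handle the tool call flow - from polling to inline response handling.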
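
Reshaping function tool definitions can be sketched as a small converter. This assumes your Assistants-style definitions nest their fields under a "function" key and that the Responses API expects the same fields at the top level; the function name to_responses_tool is our own.

```python
from typing import Any


def to_responses_tool(assistants_tool: dict[str, Any]) -> dict[str, Any]:
    """Flatten an Assistants-style function tool into the Responses-style shape."""
    if assistants_tool.get("type") != "function":
        # Built-in tools (file search, code interpreter) need manual review.
        return dict(assistants_tool)
    # Lift name, description, parameters, etc. to the top level.
    return {"type": "function", **assistants_tool["function"]}


old_tool = {
    "type": "function",
    "function": {
        "name": "get_customer_data",
        "description": "Look up customer information by ID",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}
new_tool = to_responses_tool(old_tool)
```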

File search requires more work. You'll need to create vector stores in the new format and re-upload or re-index your files. Plan time for this, especially if you have large file collections.

Step 4 - Test with production-like data

Don't just test with sample inputs. Run your actual production queries through the new implementation and compare outputs. We typically do this in parallel - keep the Assistants API running in production while routing a percentage of traffic through the new implementation for comparison.

Look for differences in response quality, latency, and error rates. The model behaviour should be very similar since you're likely using the same underlying model, but the different context management can occasionally produce different results.

Step 5 - Switch over gradually

Don't do a big-bang migration. Route traffic incrementally - 5%, then 25%, then 50%, then 100%. Monitor each step for issues. This approach has saved us from several problems that only appeared at certain traffic levels.
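
One way to implement the percentage split is to hash a stable identifier into a bucket, so each user consistently lands on the same side as the percentage grows. This is a sketch with function names of our own choosing, not the only way to do it. A feature-flag service would work just as well.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_responses_api(user_id: str, rollout_percent: int) -> bool:
    """True if this user should be served by the new implementation."""
    return rollout_bucket(user_id) < rollout_percent


users = [f"user-{i}" for i in range(200)]
on_new_path_at_25 = [u for u in users if use_responses_api(u, 25)]
```

Because the bucket depends only on the user ID, anyone routed to the new path at 5% stays on it at 25% and beyond, which keeps their experience consistent during the rollout.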

Common Gotchas

A few things that caught us and our clients off guard:

Context window management. The Assistants API handled context window limits internally by truncating old messages in the thread. You need to do this yourself now. If your conversations are long, you'll hit context limits fast. Build truncation logic early.
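
A minimal truncation sketch, assuming a crude estimate of about four characters per token. That estimate is a placeholder; swap in a real tokenizer for anything that matters. The function keeps the newest messages that fit the budget and always keeps the latest one.

```python
from typing import Callable

Message = dict[str, str]


def rough_token_count(message: Message) -> int:
    """Crude estimate (about 4 characters per token); replace with a real tokenizer."""
    return max(1, len(message["content"]) // 4)


def trim_history(messages: list[Message], max_tokens: int,
                 count: Callable[[Message], int] = rough_token_count) -> list[Message]:
    """Keep the newest messages whose combined estimate fits within max_tokens."""
    kept: list[Message] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count(message)
        if kept and used + cost > max_tokens:
            break  # budget exhausted; drop this message and everything older
        kept.append(message)  # the newest message is always kept
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 400},       # estimated at 100 tokens
    {"role": "assistant", "content": "b" * 400},  # estimated at 100 tokens
    {"role": "user", "content": "c" * 40},        # estimated at 10 tokens
]
trimmed = trim_history(history, max_tokens=120)
```

Dropping whole messages is the simplest policy. Summarising the dropped portion into a single short message is a common refinement once this basic version is in place.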

Rate limits are different. The Responses API has different rate limit tiers than the Assistants API. Check your current usage against the new limits before migrating.

Cost structure changes. Without server-side thread management, you might send more tokens per request (because you're including conversation history). But you also have more control over what you send, so you can potentially reduce costs by being smarter about context. Net effect depends on your use case.
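
To see how quickly resent history adds up, here is a back-of-the-envelope calculation. It assumes every turn has the same user and reply sizes, no trimming, and no caching discounts, so treat the numbers as illustrative rather than a pricing model.

```python
def total_input_tokens(turns: int, user_tokens: int, reply_tokens: int) -> int:
    """Input tokens summed over a conversation when full history is resent each turn."""
    total = 0
    history = 0  # tokens already in the conversation before this turn
    for _ in range(turns):
        request_input = history + user_tokens  # prior history plus the new user message
        total += request_input
        history = request_input + reply_tokens  # the reply joins the history
    return total


# Ten turns of a 100-token question and a 200-token answer.
with_full_history = total_input_tokens(turns=10, user_tokens=100, reply_tokens=200)  # 14500
new_messages_only = 10 * 100  # 1000
```

Input tokens grow roughly with the square of the number of turns, which is why trimming or summarising long conversations pays off.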

Streaming behaviour. If you're using streaming (and you probably should be for user-facing applications), the event format is different. Plan for this in your frontend code.
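
As a sketch of what handling the new format involves, the function below accumulates text from a stream of typed events. It assumes text chunks arrive as events of type "response.output_text.delta" with a delta attribute, and that "response.completed" marks the end. The fake_stream list stands in for the iterator returned by client.responses.create(..., stream=True).

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_text(events: Iterable[Any]) -> str:
    """Concatenate text deltas from a Responses-style event stream."""
    parts = []
    for event in events:
        if event.type == "response.output_text.delta":
            parts.append(event.delta)  # incremental chunk of output text
        elif event.type == "response.completed":
            break  # the response is finished
        # Other event types (created, tool call events, ...) are ignored here.
    return "".join(parts)


# Stand-in for a real stream so the sketch runs without an API key.
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Hello"),
    SimpleNamespace(type="response.output_text.delta", delta=", world"),
    SimpleNamespace(type="response.completed"),
]
text = collect_text(fake_stream)
```

In a user-facing application you would forward each delta to the frontend as it arrives rather than joining them at the end.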

Is It Worth the Effort?

Yes. The Responses API is a better foundation for production AI applications. The control you gain over state management, the simpler tool-calling flow, and the reduced dependency on OpenAI's infrastructure all make your system more reliable and maintainable.

The migration effort is real - plan for 2-4 weeks for a typical application, longer if you have complex file search setups. But you're not just migrating away from a deprecated API. You're moving to an architecture that gives you more control and will be easier to maintain going forward.

If your team needs help planning or executing this migration, our AI development team has been through it multiple times. We also work with Azure OpenAI for organisations that want the OpenAI models with Azure's enterprise features, which adds another dimension to the migration planning.

For teams that are building new AI applications from scratch, skip the Assistants API entirely and go straight to the Responses API. There's no point building on an API that's heading for sunset. And if you're evaluating whether OpenAI is even the right foundation for your agents, talk to us about the broader AI agent development space - there are strong alternatives depending on your requirements.