OpenAI Function Calling - Migrating from the Assistants API to the Responses API

March 21, 2026 • 10 min read • Michael Ridland

Function calling is the thing that turns a language model from a fancy autocomplete into something that can actually do work. Without it, all you've got is text generation. With it, your model can look up a customer record, create a ticket, check inventory, send an email - whatever your business logic needs. It's the bridge between "the model thinks" and "the model acts."

We've been building function calling into production agent systems for over a year now at Team 400, and the pattern has stayed remarkably consistent even as the underlying APIs have shifted. What has shifted is that the Assistants API - the thing many teams built their function calling workflows on - is being deprecated. OpenAI announced the shutdown date as August 26, 2026, and the replacement is the Responses API. If you've got production code on the Assistants API, you've got about five months to migrate.

Let me walk through how function calling actually works, what the migration looks like, and what I'd do differently if I were starting fresh today.

How Function Calling Works - The Basics

The concept is straightforward. You describe your functions to the model using JSON Schema - what the function does, what parameters it accepts, what types those parameters are. The model reads the user's message, decides whether any of your functions are relevant, and if so, returns a structured call with the function name and arguments filled in.

Here's a simple example. Say you want your agent to be able to look up weather data:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Melbourne"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

When a user asks "What's the weather like in Brisbane?", the model doesn't try to answer from its training data. Instead, it returns a function call: get_weather(location="Brisbane", unit="celsius"). You execute that function on your side - calling a weather API, querying a database, whatever - and pass the result back to the model. The model then uses that real data to formulate its response.

The key insight is that the model never executes the function itself. It decides what to call and with what arguments. You run the code. This is a deliberate design choice and a good one - it means you control exactly what happens, with your own authentication, error handling, and rate limiting.
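To make the "you run the code" half concrete, here's a minimal sketch of a local dispatcher. The registry, the canned weather data, and the execute_function helper are all assumptions for illustration, not part of any OpenAI SDK. The only thing taken from the API is that the model hands you a function name and a JSON string of arguments.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    # Stand-in for a real weather lookup; returns canned data for illustration.
    return {"location": location, "unit": unit, "temperature": 24}


# Map the names you advertised to the model onto the code that implements them.
FUNCTIONS = {"get_weather": get_weather}


def execute_function(name: str, arguments: str) -> dict:
    """Run the function the model asked for and return its result."""
    if name not in FUNCTIONS:
        return {"error": "unknown_function", "message": f"No function named {name}"}
    kwargs = json.loads(arguments)  # the model sends arguments as a JSON string
    return FUNCTIONS[name](**kwargs)


result = execute_function("get_weather", '{"location": "Brisbane", "unit": "celsius"}')
print(json.dumps(result))
```

Because the dispatch happens in your process, this is also where authentication, validation, and rate limiting would naturally live.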

What Made the Assistants API Different

The Assistants API wrapped function calling in a stateful, threaded conversation model. You created an assistant with tools defined upfront, then ran conversations through threads. When the model wanted to call a function, the run entered a requires_action status. You'd poll (or stream) for that status, execute the function, submit the output back, and the run would continue.

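import json
import time

# The Assistants API pattern
# Assumes client, thread and assistant were created earlier, and that
# execute_function is your own dispatcher.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Poll until the run finishes, handling tool calls whenever it asks for them
while run.status in ("queued", "in_progress", "requires_action"):
    if run.status != "requires_action":
        # Still working: wait, then fetch the latest run status
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id
        )
        continue

    tool_calls = run.required_action.submit_tool_outputs.tool_calls

    outputs = []
    for call in tool_calls:
        # call.function.arguments is a JSON string produced by the model
        result = execute_function(call.function.name, call.function.arguments)
        outputs.append({
            "tool_call_id": call.id,
            "output": json.dumps(result)
        })

    # Hand the results back; the run resumes and may ask for more calls
    run = client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=outputs
    )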

This worked, but the polling model was clunky. You'd either poll in a loop (wasteful) or use streaming (better, but more complex to handle). The stateful nature of threads meant OpenAI was managing your conversation state on their side, which was convenient until you needed to do something the API didn't support - like branching a conversation or replaying from a specific point.

Parallel function calling was a nice addition. Models released on or after November 6, 2023 could return multiple function calls in a single turn. Ask "What's the weather in Melbourne and Sydney?" and the model would return both calls at once instead of doing them sequentially. That's a practical speed improvement, provided you also execute the calls concurrently on your side.
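When the model hands back several independent calls in one turn, you only get the speed benefit if you run them concurrently. Here's a rough sketch using a thread pool. The call dictionaries are a simplified, assumed shape (a call ID plus a JSON argument string), and get_weather is a stand-in with a simulated delay rather than a real API.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor


def get_weather(location: str) -> dict:
    time.sleep(0.1)  # simulate a slow upstream API
    return {"location": location, "temperature": 22}


def run_calls_concurrently(calls: list[dict]) -> list[dict]:
    """Execute independent function calls in parallel, preserving their order."""

    def run_one(call: dict) -> dict:
        args = json.loads(call["arguments"])
        return {"call_id": call["call_id"], "output": json.dumps(get_weather(**args))}

    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map yields results in the same order as the input calls
        return list(pool.map(run_one, calls))


calls = [
    {"call_id": "call_1", "arguments": '{"location": "Melbourne"}'},
    {"call_id": "call_2", "arguments": '{"location": "Sydney"}'},
]
outputs = run_calls_concurrently(calls)
```

Threads suit I/O-bound work like HTTP or database calls. If your functions have side effects that depend on each other, run them sequentially instead.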

The strict: true parameter for Structured Outputs was another useful feature - it guaranteed that function call arguments would conform exactly to your JSON Schema. No more hoping the model got the types right.
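As I understand the documented constraints, strict mode expects every object in the schema to set additionalProperties to false and to list every property under required. Here's a sketch of the earlier weather tool adjusted to fit, plus a small checker. The strict_problems helper is hypothetical, something I'd write for a pre-deploy check rather than anything the SDK provides, and it only covers those two rules.

```python
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a given location",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Melbourne"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],  # every property is listed
            "additionalProperties": False,
        },
    },
}


def strict_problems(schema: dict, path: str = "parameters") -> list[str]:
    """Report object schemas that break the two strict-mode rules checked here."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        for name, sub in props.items():
            problems.extend(strict_problems(sub, f"{path}.{name}"))
    return problems


print(strict_problems(weather_tool["function"]["parameters"]))
```

Note that unit is now required. If a parameter really is optional, the usual approach is to let its type include null rather than leaving it out of required.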

The Deprecation - Why OpenAI Moved On

OpenAI deprecated the Assistants API as part of launching their new agents platform, which includes the Responses API (released March 11, 2025). Their reasoning makes sense when you look at the architecture: the Assistants API tried to be a stateful backend service, managing threads, files, and vector stores. That's a lot of responsibility for an API to hold, and it created limitations.

The Responses API takes a different approach. There are no threads or runs to manage: you send a request with your conversation history and tools, and get back a response. If the response includes tool calls, you execute them and send another request with the results included. The API can also chain turns for you through previous_response_id, but in the pattern I'm describing here the state management is on your side, which sounds like more work but actually gives you more control.

The official function calling documentation still covers the Assistants API patterns, but the Responses API is where all new development should be happening.

Here's my honest take on the deprecation: I think it was the right call, but the timeline is tight for teams that built heavily on Assistants. Five months is not a lot of time for an enterprise migration, especially if you've got function calling woven deeply into production workflows. If you haven't started planning your migration yet, start this week.

Function Calling in the Responses API

The good news is that function calling itself works essentially the same way in the Responses API. You define functions as tools, the model decides when to call them, you execute and return results. The mechanics around it are different, but the core pattern is familiar.

from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "name": "get_customer",
        "description": "Look up a customer by their email address",
        "parameters": {
            "type": "object",
            "properties": {
                "email": {
                    "type": "string",
                    "description": "The customer's email address"
                }
            },
            "required": ["email"]
        }
    },
    {
        "type": "function",
        "name": "create_support_ticket",
        "description": "Create a new support ticket for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "subject": {"type": "string"},
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high"]
                }
            },
            "required": ["customer_id", "subject"]
        }
    }
]

response = client.responses.create(
    model="gpt-4o",
    input=[
        {"role": "user", "content": "Can you create a high priority ticket for [email protected] about their API integration failing?"}
    ],
    tools=tools
)

The model might first call get_customer with the email, you return the customer data including their ID, and then it calls create_support_ticket with the customer ID and details from the user's message. Two function calls, chained logically.

What's different from the Assistants API is that you're managing the conversation loop yourself. There's no thread object holding state. Each request includes the full conversation context. For simple use cases this feels like more boilerplate. For production systems, it's actually better - you control the state, you can persist it however you want, and you're not dependent on OpenAI's infrastructure for conversation management.
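Here's roughly what that self-managed loop can look like. This is a sketch, not the one true implementation: run_tool_loop and its parameters are my own names, and create_response is passed in so the loop can be tested without a network call. It relies on the Responses API returning function_call items (with name, arguments, and call_id) in response.output, and on results going back as function_call_output items.

```python
import json


def run_tool_loop(create_response, input_items, tools, execute_function,
                  model="gpt-4o", max_turns=5):
    """Call the model repeatedly until it stops requesting function calls."""
    for _ in range(max_turns):
        response = create_response(model=model, input=input_items, tools=tools)
        calls = [item for item in response.output if item.type == "function_call"]
        if not calls:
            # No tool requests left: this response is the final answer
            return response, input_items

        # Keep the model's own output (including its function calls) in the history
        input_items = input_items + list(response.output)
        for call in calls:
            result = execute_function(call.name, call.arguments)
            input_items.append({
                "type": "function_call_output",
                "call_id": call.call_id,  # ties the result to the request
                "output": json.dumps(result),
            })
    raise RuntimeError("Model kept requesting tools after max_turns")
```

In real use you'd pass client.responses.create as create_response and read the final text from response.output_text. The max_turns cap is there so a confused model can't loop forever on your bill.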

Patterns That Work in Production

After building function calling into multiple production systems, here are the patterns I'd recommend.

Keep function descriptions precise. The model uses your function descriptions to decide when to call them. Vague descriptions lead to the model calling functions when it shouldn't, or not calling them when it should. "Get customer information" is worse than "Look up a customer's account details, subscription status, and billing history by their email address." Be specific about what the function returns, not just what it does.

Use enums generously. If a parameter has a fixed set of valid values, use an enum. The model will usually stick to those values on its own, and with strict: true, it's guaranteed. This prevents entire categories of bugs where the model invents parameter values that your backend doesn't understand.

Handle errors explicitly. When a function call fails - and it will - return a structured error message that the model can work with. Don't just throw an exception. Return something like {"error": "customer_not_found", "message": "No customer found with that email address"}. The model can then tell the user what happened and ask for the correct information.
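A minimal sketch of that error-handling idea, assuming a hypothetical lookup_customer function and an in-memory customer table. The wrapper turns exceptions into structured payloads the model can reason about, and keeps internal details out of the message for anything unexpected.

```python
class CustomerNotFound(Exception):
    """Raised when no customer matches the given email."""


def lookup_customer(email: str) -> dict:
    # Hypothetical data source standing in for a real database or CRM call
    customers = {"jane@example.com": {"id": "cus_123", "plan": "pro"}}
    if email not in customers:
        raise CustomerNotFound(email)
    return customers[email]


def safe_call(func, **kwargs) -> dict:
    """Run a tool function and convert failures into structured errors."""
    try:
        return {"ok": True, "data": func(**kwargs)}
    except CustomerNotFound:
        return {
            "ok": False,
            "error": "customer_not_found",
            "message": "No customer found with that email address",
        }
    except Exception:
        # Don't leak stack traces or internals to the model
        return {
            "ok": False,
            "error": "internal_error",
            "message": "The lookup failed; please try again later",
        }


print(safe_call(lookup_customer, email="nobody@example.com"))
```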

Group related functions into toolsets. Not every function should be available in every conversation. An agent handling customer service doesn't need access to your internal analytics functions. Define toolsets per agent role or conversation type. Fewer tools means less confusion for the model and faster response times.
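One simple way to do this is to keep a single catalogue of tool definitions and select from it per role. The role names, the run_revenue_report function, and the function_tool helper below are made up for illustration. The tool definitions use the flat Responses API shape shown earlier.

```python
def function_tool(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Build a function tool definition in the flat Responses API format."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


ALL_TOOLS = {
    "get_customer": function_tool(
        "get_customer", "Look up a customer by their email address",
        {"email": {"type": "string"}}, ["email"]),
    "create_support_ticket": function_tool(
        "create_support_ticket", "Create a new support ticket for a customer",
        {"customer_id": {"type": "string"}, "subject": {"type": "string"}},
        ["customer_id", "subject"]),
    "run_revenue_report": function_tool(
        "run_revenue_report", "Internal analytics: revenue for a given month",
        {"month": {"type": "string"}}, ["month"]),
}

# Each agent role only ever sees the tools listed for it
TOOLSETS = {
    "customer_service": ["get_customer", "create_support_ticket"],
    "analytics": ["run_revenue_report"],
}


def tools_for(role: str) -> list[dict]:
    return [ALL_TOOLS[name] for name in TOOLSETS[role]]


print([tool["name"] for tool in tools_for("customer_service")])
```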

Log everything. Every function call, every argument, every result. When something goes wrong in production (and it will), you need to trace exactly what the model asked for, what your function returned, and how the model interpreted the result. This logging is your debugging lifeline.
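A sketch of what "log everything" can look like in practice, using Python's standard logging module to emit one JSON line per call. The logged_call wrapper and its field names are assumptions. Adapt them to whatever your log pipeline expects, and redact sensitive arguments before they hit the logs.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tool_calls")


def logged_call(name: str, arguments: str, func):
    """Run a tool function and log its arguments, outcome, and duration."""
    started = time.perf_counter()
    record = {"function": name, "arguments": arguments}
    try:
        result = func(**json.loads(arguments))
        record.update(status="ok", result=result)
        return result
    except Exception as exc:
        record.update(status="error", error=repr(exc))
        raise  # let the caller decide how to report the failure to the model
    finally:
        record["duration_ms"] = round((time.perf_counter() - started) * 1000, 1)
        # default=str keeps logging from failing on non-JSON values like datetimes
        logger.info(json.dumps(record, default=str))


total = logged_call("add", '{"a": 1, "b": 2}', lambda a, b: {"sum": a + b})
```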

The Migration Path

If you're moving from the Assistants API to the Responses API, here's how I'd approach it:

Step 1: Inventory your tools. List every function you've defined in your assistants. Document their schemas, descriptions, and any special behaviours. This is your migration checklist.
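One mechanical part of that inventory: the two examples earlier in this post use different tool shapes. The weather tool nests its definition under a "function" key, while the Responses API example puts name, description, and parameters at the top level. A small converter (to_responses_tool is my own helper name, not an SDK function) handles that flattening.

```python
def to_responses_tool(tool: dict) -> dict:
    """Flatten a nested {"type": "function", "function": {...}} definition."""
    if tool.get("type") != "function" or "function" not in tool:
        # Already flat, or not a function tool; other tool types are left
        # untouched here and need a separate review during migration.
        return tool
    return {"type": "function", **tool["function"]}


nested = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}
flat = to_responses_tool(nested)
print(flat["name"])
```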

Step 2: Extract your state management. In the Assistants API, threads managed your conversation state. In the Responses API, that's on you. Build (or adopt) a conversation store - database, Redis, whatever fits your stack. You need to persist the message history and pass it with each request.
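As a starting point, here's a minimal conversation store on SQLite from the Python standard library. The table layout and the ConversationStore class are assumptions for illustration. A production version would want indexing, retention rules, and probably a different database. Items are stored as JSON, so objects returned by the SDK would need converting to plain dicts first.

```python
import json
import sqlite3


class ConversationStore:
    """Append-only store for the input items of each conversation."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "conversation_id TEXT NOT NULL, "
            "item TEXT NOT NULL)"
        )

    def append(self, conversation_id: str, item: dict) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO items (conversation_id, item) VALUES (?, ?)",
                (conversation_id, json.dumps(item)),
            )

    def history(self, conversation_id: str) -> list[dict]:
        rows = self.db.execute(
            "SELECT item FROM items WHERE conversation_id = ? ORDER BY id",
            (conversation_id,),
        )
        return [json.loads(row[0]) for row in rows]


store = ConversationStore()
store.append("conv_1", {"role": "user", "content": "Where is my order?"})
store.append("conv_1", {"type": "function_call_output", "call_id": "c1", "output": "{}"})
print(store.history("conv_1"))
```

Owning the store is also what makes branching and replay possible: copy the history up to a given point into a new conversation ID and carry on from there.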

Step 3: Rewrite the orchestration loop. Replace the poll-for-status pattern with a straightforward request-response loop. Send a request, check if the response contains tool calls, execute them, send the results back. It's simpler code, honestly.

Step 4: Test with real conversations. Don't just test that functions get called - test full conversation flows. Does the model chain function calls correctly? Does it handle errors gracefully? Does it ask for clarification when arguments are ambiguous? Run your existing conversation logs through the new implementation and compare outputs.

Step 5: Run both in parallel. If you can, run the Assistants API and Responses API side by side in production for a period. Route a percentage of traffic to the new implementation and compare behaviour. This catches edge cases that unit tests miss.
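For the side-by-side rollout in Step 5, you want routing that is stable per conversation, so a thread doesn't flip between implementations mid-way. One common approach is to hash the conversation ID into a bucket. The function name and the percentage parameter here are my own choices.

```python
import hashlib


def use_responses_api(conversation_id: str, rollout_percent: int) -> bool:
    """Deterministically route a fixed share of conversations to the new path."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable value in 0..99
    return bucket < rollout_percent


for conversation_id in ["conv_1", "conv_2", "conv_3"]:
    path = "responses" if use_responses_api(conversation_id, 10) else "assistants"
    print(conversation_id, path)
```

Raising rollout_percent only ever adds conversations to the new path. Anything already routed there stays there, which keeps comparisons clean as you ramp up.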

The Responses API also gives you access to new capabilities that weren't in the Assistants API - web search, computer use, and better integration with the OpenAI Agents SDK. If you're building agentic automations, the migration is an opportunity to expand what your agents can do.

What I'd Do Starting Fresh

If you're starting a new project today, don't touch the Assistants API. Build on the Responses API from the start.

But I'd go a step further. Consider whether you want to be calling the Responses API directly at all, or whether you want an abstraction layer. We typically build our AI agent systems with an orchestration layer that sits between our business logic and the model API. This means we can swap between OpenAI's Responses API, Azure OpenAI, or even different model providers without rewriting our function calling logic.

The function definitions stay the same. The execution logic stays the same. Only the transport layer changes. This kind of portability has saved us on more than one project when a client's requirements shifted mid-build.
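To give a feel for what that layer looks like, here's a stripped-down sketch using a Python Protocol. The names (ModelBackend, ModelTurn, ToolCall, EchoBackend) are illustrative, not our actual codebase, and a real backend would wrap a provider SDK and translate its response into these neutral types.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ToolCall:
    call_id: str
    name: str
    arguments: str  # JSON string, as providers typically return it


@dataclass
class ModelTurn:
    text: str
    tool_calls: list[ToolCall] = field(default_factory=list)


class ModelBackend(Protocol):
    """The only surface business logic is allowed to depend on."""

    def complete(self, history: list[dict], tools: list[dict]) -> ModelTurn: ...


class EchoBackend:
    """Stand-in backend for tests; a real one would call a model API."""

    def complete(self, history: list[dict], tools: list[dict]) -> ModelTurn:
        return ModelTurn(text=f"echo: {history[-1]['content']}")


def answer(backend: ModelBackend, history: list[dict], tools: list[dict]) -> str:
    # Business logic sees ModelTurn, never a provider-specific response object
    turn = backend.complete(history, tools)
    return turn.text


print(answer(EchoBackend(), [{"role": "user", "content": "hi"}], []))
```

Swapping providers then means writing one new class that satisfies ModelBackend, and the stand-in backend doubles as a cheap way to unit test orchestration code without calling a model.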

Function calling is probably the most practical capability in the modern AI toolkit. It's what makes agents useful rather than just interesting. The APIs will keep changing - we've already seen the shift from Chat Completions to Assistants to Responses, and there will be more shifts to come. But the pattern of "model decides, code executes" is stable. Build around that pattern, keep your implementations portable, and you'll be in good shape regardless of which API OpenAI ships next.