Running OpenAI Agents - The Agent Loop, State, and Streaming Explained

April 12, 2026 · 8 min read · Michael Ridland

Defining an agent - giving it a name, instructions, and tools - is the easy part. The interesting questions come when you actually run it. How does a single run work internally? How do you carry conversation state from one turn to the next? What happens when the agent needs human approval before continuing? These runtime concerns are where most of the complexity lives, and where most projects hit unexpected friction.

OpenAI's agent running documentation covers the technical details well. Here's what we've learned building production agents with this SDK for Australian organisations.

The Agent Loop - What Actually Happens During a Run

When you call run() in the OpenAI Agents SDK, it doesn't just send one request and return one response. It enters a loop that keeps going until it reaches a genuine stopping point. Understanding this loop is the single most important thing for working with the SDK effectively.

The loop works like this:

  1. Send the current input to the agent's model
  2. Look at what the model returned
  3. If the model made tool calls, execute those tools and loop back to step 1 with the results
  4. If the model handed off to another agent, switch to that agent and loop back
  5. If the model produced a final answer with no pending tool work, return the result

Steps 3 and 4 are what make this interesting. A single run() call might involve multiple model roundtrips. The agent might call a tool to fetch data, get the result, call another tool to process it, get that result, and then produce a final answer. All of that happens inside one run() invocation.

This is worth internalising because it affects how you think about timeouts, costs, and error handling. A "simple" agent run might actually involve five or six model calls under the hood, each with its own latency and token cost. Plan accordingly.
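The loop can be sketched as plain control flow. This is a minimal simulation with the model stubbed out, not SDK code - every name here is invented for illustration - but it shows why one run() can mean several model roundtrips:

```typescript
// Toy version of the agent loop. The model is a stub that makes one
// tool call, then produces a final answer once it sees the tool result.
type ModelTurn =
  | { kind: "tool_call"; tool: string }
  | { kind: "final"; text: string };

function fakeModel(history: string[]): ModelTurn {
  const sawToolResult = history.some((m) => m.startsWith("tool:"));
  return sawToolResult
    ? { kind: "final", text: "Done" }
    : { kind: "tool_call", tool: "fetch_data" };
}

function runLoop(input: string): { output: string; modelCalls: number } {
  const history = [input];
  let modelCalls = 0;
  while (true) {
    modelCalls++; // every pass through the loop is a model roundtrip
    const turn = fakeModel(history);
    if (turn.kind === "tool_call") {
      // Step 3: execute the tool, append the result, loop back.
      history.push(`tool:${turn.tool} -> result`);
      continue;
    }
    // Step 5: a final answer with no pending tool work ends the run.
    return { output: turn.text, modelCalls };
  }
}
```

Even this trivial run costs two model calls: one that decides to use the tool, and one that answers after seeing the result. Real agents with chained tools multiply that quickly.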

Choosing a Conversation Strategy

This is the decision that has the biggest long-term impact, and it's one you should make early. The SDK gives you four ways to carry state between turns, and mixing them tends to create headaches.

Local History (Manual Replay)

You keep the conversation history yourself - in memory, in a database, wherever - and pass it back on each turn. This gives you maximum control. You decide what goes into context. You can trim, summarise, or modify the history between turns.

The downside is that you're responsible for everything. If you mess up the history format, you get subtle bugs. If your history gets too long, you need to handle truncation yourself. For small chat loops or highly controlled workflows, this works fine. For anything with complex multi-turn conversations, it's more work than the other options.
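A sketch of what "responsible for everything" means in practice. The message shape and trimming policy below are illustrative, not the SDK's own types - the point is that the context window is entirely your problem:

```typescript
// Hand-rolled history management with a simple trimming policy:
// always keep the system message, plus the most recent N messages.
type Msg = { role: "system" | "user" | "assistant"; content: string };

const history: Msg[] = [
  { role: "system", content: "Answer with compact travel facts." },
];

function addTurn(user: string, assistant: string): void {
  history.push({ role: "user", content: user });
  history.push({ role: "assistant", content: assistant });
}

// What the model actually sees on the next turn is up to you.
function trimmed(msgs: Msg[], keepLast = 6): Msg[] {
  const system = msgs.filter((m) => m.role === "system");
  const rest = msgs.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-keepLast)];
}
```

Every policy decision here - what to keep, what to drop, whether to summarise instead of truncate - is code you write and maintain.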

Sessions (SDK-Managed State)

This is what we use for most projects. You create a session object - MemorySession in TypeScript, SQLiteSession in Python - and pass it to each run() call. The SDK manages the conversation history for you.

import { Agent, MemorySession, run } from "@openai/agents";

const agent = new Agent({
  name: "Tour guide",
  instructions: "Answer with compact travel facts.",
});

const session = new MemorySession();

const firstTurn = await run(
  agent,
  "What city is the Golden Gate Bridge in?",
  { session },
);
console.log(firstTurn.finalOutput);

const secondTurn = await run(agent, "What state is it in?", { session });
console.log(secondTurn.finalOutput);

The Python equivalent uses SQLiteSession, which gives you persistence to disk:

from agents import Agent, Runner, SQLiteSession

agent = Agent(
    name="Tour guide",
    instructions="Answer with compact travel facts.",
)

session = SQLiteSession("conversation_123")

first_turn = await Runner.run(
    agent,
    "What city is the Golden Gate Bridge in?",
    session=session,
)

Sessions are the best default for most applications. They handle history management, support resumable runs (which matters for approval flows), and give you control over storage. The SQLiteSession in particular is nice because your conversation state survives process restarts without any external infrastructure.

Conversation ID (Server-Managed, Shared)

The conversationId approach stores state on OpenAI's servers using the Conversations API. You create a conversation, get an ID, and pass it on each turn. The advantage is that multiple systems can share the same conversation - a web server and a background worker could both continue the same conversation thread.

We've used this for multi-service architectures where different backend systems need to interact with the same agent conversation. It removes the need to synchronise state between services. The tradeoff is that you're dependent on OpenAI's API for state management - your conversation data lives on their infrastructure.

Previous Response ID (Lightweight Server-Managed)

This is the cheapest continuation option. Instead of maintaining a full conversation, you just pass the ID of the last response. OpenAI links the responses together server-side. You only need to send the new user message on each turn, not the full history.

It's suitable for simple back-and-forth conversations where you don't need complex state management. But it doesn't support the richer features that sessions provide, like resumable approval flows.

Which Should You Pick?

For most production applications, sessions. They give you the right balance of convenience and control. Use MemorySession for ephemeral conversations (like a web chat where losing history on server restart is acceptable) and SQLiteSession (or a custom session implementation backed by your database) for persistent conversations.
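If you do reach for a custom session backed by your own database, the shape is roughly "a keyed list of conversation items". The method names below (getItems, addItems, clearSession) are assumptions for illustration - check the SDK's actual session contract before implementing - but the storage pattern carries over:

```typescript
// Sketch of a custom session over your own store. Swap the static Map
// for your database of choice; the Map only keeps the example runnable.
type Item = { role: string; content: string };

class MapBackedSession {
  private static store = new Map<string, Item[]>();

  constructor(private readonly sessionId: string) {}

  async getItems(): Promise<Item[]> {
    return MapBackedSession.store.get(this.sessionId) ?? [];
  }

  async addItems(items: Item[]): Promise<void> {
    const existing = await this.getItems();
    MapBackedSession.store.set(this.sessionId, [...existing, ...items]);
  }

  async clearSession(): Promise<void> {
    MapBackedSession.store.delete(this.sessionId);
  }
}
```

The session ID doubles as your database key, which makes per-conversation retention and deletion policies straightforward.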

Use conversationId if you specifically need shared state across multiple services. Use previous response ID for lightweight chatbots where you want to minimise complexity. Avoid local history unless you have a specific reason to manage state yourself.

Pick one strategy per conversation and stick with it. Mixing approaches - say, using local replay for some turns and previous response ID for others - creates subtle state duplication bugs that are painful to track down.

Streaming - Same Loop, Incremental Output

Streaming doesn't change the fundamental agent loop. The same sequence of model calls, tool executions, and potential handoffs happens. The difference is that you receive events as they happen rather than waiting for the entire run to complete.

const stream = await run(agent, "Give me three short facts about Saturn.", {
  stream: true,
});

for await (const event of stream) {
  if (
    event.type === "raw_model_stream_event" &&
    event.data.type === "response.output_text.delta"
  ) {
    process.stdout.write(event.data.delta);
  }
}

await stream.completed;
console.log("\nFinal:", stream.finalOutput);

Three things to keep in mind with streaming:

Always wait for the stream to finish. Don't treat the run as settled until stream.completed resolves. Partial output isn't final output - the agent might still have tool calls to make.

If the run pauses for approval, handle it through the interruptions mechanism. Don't start a fresh turn. Resume from the existing state. This keeps turn counts, history, and continuation IDs consistent.

Cancelling mid-stream doesn't lose work. If you cancel a stream partway through, you can resume the unfinished turn from state later. This is useful for long-running agent tasks where a user might need to disconnect and reconnect.

Handling Failures and Pauses

There are two categories of non-happy-path outcomes, and they need different treatment.

Runtime failures include things like hitting the maximum turn limit, guardrail exceptions, or tool errors. These are genuine errors - something went wrong. Handle them the way you handle any failure: catch, log, and retry where it makes sense.

Expected pauses are different. These are intentional interruptions - the most common being human approval requests. The agent hit a point where it needs a human to approve an action before continuing. The run isn't failed; it's paused.

The critical distinction: treat approvals as paused runs, not as new turns. When a human approves an action, resume the run from the saved state rather than starting a new run() call. If you start a new turn, you lose the context of what the agent was doing when it paused, and turn counts get out of sync.

This is something we had to learn the hard way on a client project. We were treating approval resolutions as new turns, which meant the agent would sometimes repeat work it had already done before the pause. Resuming from state fixed the issue completely.
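The distinction is easier to see in a toy model. Nothing below is SDK API - the types and function names are invented - but it shows what resuming saved state buys you over starting a new turn:

```typescript
// Toy model of a paused run: work done before the pause is kept, and
// resuming continues from the paused step on the SAME turn.
type Step = { action: string; done: boolean };

type RunState = {
  turn: number;
  steps: Step[];
  pendingApproval: number | null; // index of the step awaiting approval
};

function startRun(actions: string[], needsApproval: string): RunState {
  const steps = actions.map((action) => ({ action, done: false }));
  const state: RunState = { turn: 1, steps, pendingApproval: null };
  for (let i = 0; i < steps.length; i++) {
    if (steps[i].action === needsApproval) {
      state.pendingApproval = i; // pause here, waiting on a human
      return state;
    }
    steps[i].done = true;
  }
  return state;
}

function resumeApproved(state: RunState): RunState {
  if (state.pendingApproval === null) return state;
  // Pick up from the paused step - no repeated work, no new turn.
  for (let i = state.pendingApproval; i < state.steps.length; i++) {
    state.steps[i].done = true;
  }
  return { ...state, pendingApproval: null };
}
```

A fresh run() call, by contrast, would start at step zero with turn two - exactly the repeated-work symptom described above.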

Practical Recommendations

Set sensible turn limits. The default might let an agent loop through many tool calls. If your tools are expensive or slow, cap the maximum turns to prevent runaway costs. Better to fail fast and investigate than to let an agent spend 50 turns and a lot of tokens trying to figure something out.
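The fail-fast behaviour a turn cap buys you looks like this. The error class and loop below are invented for the sketch - the SDK exposes its own turn-limit setting, so treat this purely as the shape of the guard:

```typescript
// Illustrative turn cap: throw instead of letting a run spin forever.
class MaxTurnsExceeded extends Error {}

function runWithCap(
  step: () => "continue" | "final",
  maxTurns: number,
): number {
  for (let turn = 1; turn <= maxTurns; turn++) {
    if (step() === "final") return turn; // reached a real stopping point
  }
  throw new MaxTurnsExceeded(`no final answer within ${maxTurns} turns`);
}
```

An exception after a handful of turns is cheap to investigate; fifty silent turns of expensive tool calls are not.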

Log the internal turns. In production, you want visibility into what happened inside a run - how many model calls, which tools were called, what the intermediate results were. This telemetry is essential for debugging and cost monitoring.

Test with realistic conversation lengths. An agent that works great for 3-turn conversations might struggle at 20 turns as the context fills up. Test your session strategy with conversations that match real usage patterns.

Don't over-engineer state management early. Start with MemorySession, get your agent logic working, then move to persistent sessions when you actually need persistence. Premature complexity in the state layer will slow down your iteration speed.

Building Production Agent Systems

The OpenAI Agents SDK gives you a solid foundation for building agent applications, but the runtime concerns - state management, error handling, streaming, and approval flows - are where the real engineering work happens. Getting these right is the difference between a demo and a production system.

At Team 400, we've been building production AI agent systems for Australian organisations using both OpenAI's SDK and other frameworks. If you're looking at building agentic applications, whether it's customer service agents, internal automation, or specialist AI workflows, our AI development team can help you get from prototype to production. We also work extensively with Azure AI services for organisations that need to keep everything within the Microsoft ecosystem.

The agent loop concept is straightforward once you've internalised it. The conversation strategy decision is the one that matters most. Get that right early, keep your error handling clean, and the rest follows naturally.