Handling Tool Calls in the Claude Agent SDK - The Complete Loop
Every AI agent that does real work follows the same loop. The model decides it needs information or needs to perform an action. It emits a tool call. Your code executes that tool. You send the result back. The model continues.
Getting this loop right is the difference between an agent that works reliably and one that breaks in confusing ways. I've spent enough time debugging malformed tool results and mismatched IDs to have opinions about this. The mechanics are simple, but the details matter.
The Claude API documentation on handling tool calls covers the specification. Here's what the spec means when you're building production agents.
The Tool Call Lifecycle
When Claude decides to use a tool, the API response changes in two ways. First, the stop_reason becomes tool_use instead of end_turn. Second, the response includes one or more tool_use content blocks alongside any text content.
Each tool_use block has three fields that matter:
- id - a unique identifier for this specific tool call. You'll need this when sending results back.
- name - which tool Claude wants to use. This matches the name you defined in your tool configuration.
- input - a JSON object with the parameters, matching the input_schema you provided.
Here's what a typical response looks like when Claude decides to call a weather tool:
```json
{
  "stop_reason": "tool_use",
  "content": [
    {
      "type": "text",
      "text": "I'll check the current weather for you."
    },
    {
      "type": "tool_use",
      "id": "toolu_01A09q90qw90lq917835lq9",
      "name": "get_weather",
      "input": { "location": "Sydney, NSW", "unit": "celsius" }
    }
  ]
}
```
Notice that Claude can include text alongside the tool call. This is the "thinking out loud" before acting - useful for user-facing applications where you want to show the user what's happening.
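Detecting this state in code is a few lines of dict handling. The sketch below works on the response as a plain dict shaped like the JSON above; in the Python SDK you would read the same fields (`stop_reason`, `content`, each block's `type`, `id`, `name`, `input`) off the typed message object. The `get_weather` tool is the running example from this article, not a real API.

```python
def extract_tool_calls(response: dict) -> tuple[list[str], list[dict]]:
    """Split a response's content into its text parts and its tool_use blocks."""
    texts = [b["text"] for b in response["content"] if b["type"] == "text"]
    calls = [b for b in response["content"] if b["type"] == "tool_use"]
    return texts, calls

# A response shaped like the example above
response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "I'll check the current weather for you."},
        {
            "type": "tool_use",
            "id": "toolu_01A09q90qw90lq917835lq9",
            "name": "get_weather",
            "input": {"location": "Sydney, NSW", "unit": "celsius"},
        },
    ],
}

if response["stop_reason"] == "tool_use":
    texts, calls = extract_tool_calls(response)
    for call in calls:
        # Dispatch on call["name"], execute, and keep call["id"] for the result
        print(call["name"], call["input"])
```

Keeping the text blocks separate lets you stream the "thinking out loud" to the user while the tool executes.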
Sending Results Back
After you execute the tool, you send the result back as a user message containing a tool_result block. The structure is straightforward but the formatting rules are strict.
```json
{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": "Currently 22 degrees celsius, partly cloudy"
    }
  ]
}
```
Three things that will cause errors if you get them wrong:
The tool_use_id must match. Every tool result must reference the exact ID from the corresponding tool_use block. If you mix up IDs when handling multiple parallel tool calls, Claude won't know which result goes with which call.
Tool results must come immediately after the tool use. You can't insert other messages between the assistant's tool_use message and your tool_result response. The API enforces this ordering.
Tool results must come first in the content array. If you want to include additional text in the same message as a tool result, the tool_result blocks must precede any text blocks. Get this backwards and you'll hit a 400 error. This one catches people regularly because it feels natural to write "Here are the results:" before the actual results. Don't.
```json
{
  "role": "user",
  "content": [
    { "type": "tool_result", "tool_use_id": "toolu_01..." },
    { "type": "text", "text": "What should we do with this data?" }
  ]
}
```
Error Handling
Tools fail. APIs time out. Databases go down. External services return garbage. Your error handling strategy determines whether the agent recovers gracefully or crashes.
The Claude API provides the is_error flag for exactly this situation. When a tool execution fails, you still send back a tool_result - but you set is_error to true and put the error message in the content.
```json
{
  "type": "tool_result",
  "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
  "content": "ConnectionError: the weather API returned HTTP 500",
  "is_error": true
}
```
Claude handles this intelligently. It won't pretend the tool succeeded. Instead, it'll tell the user something like "I wasn't able to check the weather because the service is currently down." That's a much better user experience than a silent failure or an exception trace.
Here's something I've learned from building agents for clients through our AI agent development work: the quality of your error messages matters enormously. "Failed" tells Claude nothing. "Rate limit exceeded - retry after 60 seconds" tells Claude exactly what happened and what to do about it. Write error messages as if you're briefing a colleague, not logging to a file.
We've seen agents successfully retry operations, try alternative approaches, and gracefully inform users of limitations - all because the error messages gave Claude enough context to reason about the situation.
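In practice this means wrapping every tool execution so that exceptions become error results instead of crashing the loop. A sketch of that wrapper, assuming tools are plain Python callables registered by name (the handler registry is an illustrative pattern, not part of the API):

```python
def run_tool(call: dict, handlers: dict) -> dict:
    """Execute one tool_use block, converting any failure into an is_error result."""
    handler = handlers.get(call["name"])
    if handler is None:
        return {
            "type": "tool_result",
            "tool_use_id": call["id"],
            "content": f"Unknown tool: {call['name']}",
            "is_error": True,
        }
    try:
        output = handler(**call["input"])
        return {"type": "tool_result", "tool_use_id": call["id"], "content": str(output)}
    except Exception as exc:
        # Name the error class and describe what happened, as you would to a colleague
        return {
            "type": "tool_result",
            "tool_use_id": call["id"],
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
```

Note the loop never raises: whatever happens, Claude gets a result block it can reason about.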
Rich Content in Tool Results
Tool results don't have to be plain strings. You can return structured content including text blocks, images, and documents.
This opens up interesting patterns. A screenshot tool can return the actual image for Claude to analyse. A document retrieval tool can return the full document. A database query tool can return a formatted table.
```json
{
  "type": "tool_result",
  "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
  "content": [
    { "type": "text", "text": "Query returned 3 results:" },
    {
      "type": "document",
      "source": {
        "type": "text",
        "media_type": "text/plain",
        "data": "Order 4521 | Shipped | $1,240.00\nOrder 4522 | Processing | $890.00\nOrder 4523 | Delivered | $2,100.00"
      }
    }
  ]
}
```
The document type is particularly useful for returning large amounts of structured data. Rather than stuffing everything into a text string, you can pass it as a document with the appropriate media type. Claude processes this differently internally - documents get better treatment for long content than plain text strings.
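Wrapping long output this way is mechanical enough to put behind a small helper. This is a sketch of one possible builder, not an SDK function; the summary-plus-document split mirrors the JSON example above.

```python
def document_result(tool_use_id: str, summary: str, text: str,
                    media_type: str = "text/plain") -> dict:
    """Wrap long text output as a document block instead of one big string."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": [
            # A short human-readable summary first...
            {"type": "text", "text": summary},
            # ...then the bulk payload as a document with an explicit media type
            {
                "type": "document",
                "source": {"type": "text", "media_type": media_type, "data": text},
            },
        ],
    }
```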
Parallel Tool Calls
Claude can request multiple tools in a single response. When it does, you get multiple tool_use blocks in the content array. You need to execute all of them and return all the results in a single user message.
This is a performance optimisation. If Claude needs both the current weather and a customer's order history, it can request both simultaneously rather than waiting for one before requesting the other. Your code should execute them in parallel too - there's no reason to serialise independent tool calls.
The response with results looks like this:
```json
{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_weather_01",
      "content": "22 degrees, partly cloudy"
    },
    {
      "type": "tool_result",
      "tool_use_id": "toolu_orders_01",
      "content": "3 active orders totalling $4,230"
    }
  ]
}
```
Each result references its own tool_use_id, and Claude matches them up. If one tool fails and the other succeeds, mark the failed one with is_error and return both. Claude will use what it can.
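With async handlers, running the calls concurrently is an `asyncio.gather` over the tool_use blocks. The sketch below also folds in a per-call timeout, assuming handlers are async callables in a name-keyed registry (an illustrative pattern, not part of the API):

```python
import asyncio


async def run_tool_async(call: dict, handlers: dict, timeout: float = 10.0) -> dict:
    """Run one tool call with a timeout, returning a result or is_error block."""
    try:
        output = await asyncio.wait_for(handlers[call["name"]](**call["input"]), timeout)
        return {"type": "tool_result", "tool_use_id": call["id"], "content": str(output)}
    except asyncio.TimeoutError:
        return {
            "type": "tool_result",
            "tool_use_id": call["id"],
            "content": f"Timeout: {call['name']} did not respond within {timeout}s",
            "is_error": True,
        }
    except Exception as exc:
        return {
            "type": "tool_result",
            "tool_use_id": call["id"],
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }


async def run_all(calls: list[dict], handlers: dict) -> dict:
    """Execute independent tool calls concurrently; return one user message."""
    results = await asyncio.gather(*(run_tool_async(c, handlers) for c in calls))
    return {"role": "user", "content": list(results)}
```

Because each result carries its own `tool_use_id`, `gather` preserving input order is a convenience rather than a requirement.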
Invalid Tool Calls
Sometimes Claude calls a tool with wrong or missing parameters. This usually means your tool description wasn't clear enough, but it can also happen with ambiguous user requests.
When this happens, return a tool_result with is_error set to true and a message explaining what was wrong. Claude will typically retry with corrected parameters; in our experience it usually takes two or three attempts before it gives up and apologises to the user.
A better long-term fix is improving your tool descriptions. If Claude keeps calling a tool with the wrong parameter format, the description probably isn't specific enough about what it expects. We've found that adding examples to tool descriptions reduces invalid calls significantly.
If you want to eliminate this entirely, the API supports strict mode on tool definitions. With strict: true, the API guarantees that tool inputs match your schema exactly. Missing parameters, wrong types, extra fields - none of them get through. This adds a small amount of latency to each request but can be worth it for production systems where reliability matters more than speed.
Server Tools vs Client Tools
The documentation distinguishes between client tools (tools you execute) and server tools (tools Claude executes internally, like web search). For server tools, you don't need to handle the tool call lifecycle at all - Claude processes the results internally and incorporates them into its response.
This distinction matters architecturally. Client tools are where all your custom logic lives - your API calls, database queries, and business operations. Server tools are capabilities that Anthropic provides. When designing an agent, you're mostly thinking about client tools, but it's worth knowing that some capabilities come built-in.
Patterns We Use in Production
From building agents for Australian enterprises through our AI consulting practice, a few patterns have proven themselves.
Typed tool results. Even though the API accepts string content, we always return structured data when the result is complex. JSON strings that Claude can parse are better than narrative text for data-heavy responses.
Timeout handling. Every tool call gets a timeout. If an external API doesn't respond within the threshold, we return an error result rather than letting the whole agent hang. This is basic reliability engineering but I'm surprised how often it gets skipped.
Logging the full exchange. We log every tool_use block and every tool_result block in production. When something goes wrong - and something always goes wrong eventually - having the full conversation including tool calls makes debugging straightforward.
Idempotent tools where possible. If Claude retries a tool call (which happens when it gets an error), you want that retry to be safe. A "get order status" call is naturally idempotent. A "create new order" call is not. For non-idempotent operations, we add deduplication logic using the tool_use_id.
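The deduplication pattern above can be sketched as a thin wrapper that caches results by tool_use_id, so replaying the same call (for example, from your own retry loop after a transient failure) never re-runs the side effect. The class and handler registry are illustrative, not part of any SDK.

```python
class DedupExecutor:
    """Cache results by tool_use_id so replaying a call doesn't repeat side effects."""

    def __init__(self, handlers: dict):
        self.handlers = handlers
        self._seen: dict[str, dict] = {}  # tool_use_id -> cached result block

    def execute(self, call: dict) -> dict:
        if call["id"] in self._seen:
            # Same tool_use_id already executed: return the cached result,
            # skipping the (possibly non-idempotent) handler entirely
            return self._seen[call["id"]]
        output = self.handlers[call["name"]](**call["input"])
        result = {"type": "tool_result", "tool_use_id": call["id"], "content": str(output)}
        self._seen[call["id"]] = result
        return result
```

Note this guards against your own code replaying a call; if Claude itself retries, it emits a fresh tool_use block, so genuinely non-idempotent operations may also want a business-level key (order number, request hash) behind the handler.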
Where to Go from Here
The manual tool handling described here is what you need when you want full control. For many use cases, Anthropic's Tool Runner abstraction handles the loop automatically - parsing tool_use blocks, executing your functions, and sending results back without you writing the plumbing code.
But understanding the underlying protocol matters, even if you use the abstraction. When something breaks, you need to know what the messages actually look like. And for custom orchestration patterns - conditional tool execution, dynamic tool sets, multi-agent routing - you'll need to drop down to this level.
If you're building AI agents that interact with real business systems, this tool call loop is the foundation everything else sits on. Get it right and the rest follows naturally. Get it wrong and you'll spend your time debugging message formatting instead of building features.
For teams looking to build production AI agent systems, our agentic automation services cover the full lifecycle - from architecture through to deployment and monitoring. The tool call protocol is just the beginning; the real work is deciding what tools to build and how to orchestrate them effectively.