How Claude Tool Use Works - Building AI Agents That Do Things
There's a moment in every AI project where someone says "can it actually do something, not just talk about doing it?" That's the moment tool use becomes relevant.
Most people's experience with AI is conversational - you ask a question, you get text back. That's fine for answering questions and writing content, but it falls apart the second you need the AI to check a database, call an API, send an email, or interact with any real system. The model can describe what it would do, but it can't actually do it. Unless you give it tools.
Anthropic's tool use documentation explains the mechanics well. This post is about the practical side - how we've used tool use to build real AI agents for Australian businesses, what the architecture actually looks like, and where the gotchas are.
The Basic Contract
Tool use in Claude works on a simple principle: you tell the model what operations are available, and it decides when to call them. The model never executes anything directly. It generates a structured request saying "I'd like to call function X with these arguments," your application code runs that function, and you send the result back to the model.
Think of it like working with a very capable colleague who can reason about problems and tell you exactly what API calls to make, but doesn't have access to your keyboard. They say "run this query." You run it. You give them the results. They tell you what to do next.
In code terms, you define tools as JSON schemas describing the function name, parameters, and types. You include these tool definitions when you send a message to Claude's API. If Claude decides a tool call would help answer the question, it returns a response with stop_reason: "tool_use" and a tool_use block containing the function name and arguments. Your application executes the function, then sends the result back as a tool_result block. The conversation continues.
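The exchange above can be sketched in a few lines. The schema shape follows Anthropic's Messages API tool format; the tool name and parameter here (get_weather, city) are hypothetical examples, not part of any real system.

```python
# A minimal tool definition. This dict is what you'd pass in the tools=[...]
# list when calling client.messages.create(...) with the Anthropic SDK.
get_weather_tool = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Sydney'"},
        },
        "required": ["city"],
    },
}

def make_tool_result(tool_use_id: str, content: str) -> dict:
    """Wrap a tool's output in the block format the API expects back."""
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}

# When a response comes back with stop_reason "tool_use", you find the
# tool_use block, run your own get_weather function with its arguments,
# and send make_tool_result(block.id, output) back in a user-role message.
```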
This isn't magic. It's typed interfaces. If you've ever built a REST API and written client code that calls it, you already understand the pattern. The only difference is that the "client" deciding which endpoint to call is a language model instead of hardcoded application logic.
Three Types of Tools
Not all tools work the same way. Where the code executes changes what your application needs to handle.
User-Defined Tools (You Build Them, You Run Them)
This is where most of the value is. You define the schema. You write the implementation. You handle the execution. Claude just tells you when to call it and with what arguments.
A real example: we built an AI agent for a professional services firm that needed to look up client information, check project status, and draft status update emails. Three tools:
- lookup_client - takes a client name, returns their record from the CRM
- check_project_status - takes a project ID, returns timeline and budget data from the PM tool
- draft_email - takes a recipient, subject, and body, queues it for review
When a user asks "what's the status of the Henderson project and can you draft an update for the client?", Claude calls check_project_status, reads the result, calls lookup_client to get the contact details, then calls draft_email with a summary. The user reviews the draft and hits send. Three tool calls, one natural language request.
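On the application side, wiring those three tools up can be as simple as a dispatch table. The implementations below are stubs for illustration; in the real system each one would call the CRM, the PM tool, or the email queue.

```python
# Stub implementations of the three tools from the example above.
def lookup_client(client_name: str) -> dict:
    # Stub: would query the CRM here.
    return {"name": client_name, "email": f"{client_name.lower()}@example.com"}

def check_project_status(project_id: str) -> dict:
    # Stub: would query the project-management tool here.
    return {"project_id": project_id, "status": "on track", "budget_used_pct": 62}

def draft_email(recipient: str, subject: str, body: str) -> dict:
    # Stub: would queue the draft for human review here.
    return {"queued": True, "recipient": recipient, "subject": subject}

# Map tool names from Claude's tool_use blocks to the functions that run them.
TOOL_HANDLERS = {
    "lookup_client": lookup_client,
    "check_project_status": check_project_status,
    "draft_email": draft_email,
}

def execute_tool(name: str, arguments: dict) -> dict:
    """Run the handler registered for a tool_use block's name."""
    return TOOL_HANDLERS[name](**arguments)
```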
Anthropic-Schema Tools (Anthropic Designs Them, You Run Them)
For common operations like running shell commands, editing files, and controlling a browser, Anthropic publishes pre-defined tool schemas. You still execute the code on your side, but the schema design comes from Anthropic.
The advantage: Claude has been specifically trained on these tool signatures. It knows exactly how to format arguments for the bash tool, the text_editor tool, and the computer tool. It makes fewer errors and recovers more gracefully from unexpected results compared to equivalent custom tools you might define yourself.
We use the bash and text_editor tools extensively in our development workflows. They're the backbone of how Claude Code operates - reading files, running tests, making changes. The trained-in behaviour makes a noticeable difference in reliability.
Server-Executed Tools (Anthropic Runs Them)
For web_search, web_fetch, code_execution, and tool_search, Anthropic handles everything. You enable the tool in your request and the server takes care of execution. The response includes server_tool_use blocks showing what happened, but by the time you see them, it's already done.
The practical benefit: no infrastructure to maintain for these capabilities. Your agent can search the web, fetch URLs, and run code in a sandbox without you setting up anything. The trade-off is less control - you can't customise the search logic or sandbox configuration.
The Agentic Loop
Here's where it clicks. Tool use isn't a single call-and-response. It's a loop.
1. Send message with tool definitions
2. Claude responds with tool_use blocks
3. Execute the tools
4. Send results back
5. If stop_reason is still "tool_use", go to step 2
6. If stop_reason is "end_turn", you're done
This loop is what turns a chatbot into an agent. The model can chain multiple tool calls together, each one informed by the results of the previous ones. It can decide that the first search didn't return enough results and try a different query. It can call one tool to get an ID, then use that ID as input to another tool.
In practice, most interactions involve 1-3 tool calls. Complex tasks might chain 5-10. We've seen agentic loops run for 20+ iterations on particularly involved research or analysis tasks. The model manages its own workflow, which is the whole point.
One thing to plan for: the loop needs a safety valve. Set a maximum iteration count. We typically cap at 25 turns for production agents. Without a cap, a confused model could loop indefinitely, burning API credits and compute time without producing any useful output.
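The loop steps above, with the iteration cap, can be sketched as a single function. Here `client` stands for anything with a create(messages=...) method returning an object with .stop_reason and .content, and `execute_tool` is your own dispatcher; the SDK returns typed content blocks, but this sketch treats them as plain dicts for simplicity.

```python
MAX_TURNS = 25  # safety valve so a confused model can't loop forever

def run_agent(client, execute_tool, messages):
    """Drive the agentic loop until the model stops asking for tools."""
    for _ in range(MAX_TURNS):
        response = client.create(messages=messages)
        if response.stop_reason != "tool_use":
            return response  # "end_turn" or another terminal reason: done
        # Record the assistant turn, run every requested tool, send results back.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": str(execute_tool(block["name"], block["input"])),
            }
            for block in response.content
            if block.get("type") == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"Agent exceeded {MAX_TURNS} turns without finishing")
```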
When Tool Use Makes Sense (and When It Doesn't)
After building quite a few AI agents across different industries, we've developed a feel for when tool use is the right approach.
Use tools when:
- The task requires side effects. Sending messages, updating records, writing files - anything that changes state in the real world needs a tool. The model can't do it from text alone.
- You need current data. Stock prices, database records, API responses, today's weather. If the answer isn't in the training data (and it usually isn't for business-specific questions), you need a tool to fetch it.
- You want structured, reliable output. If you find yourself writing regex to extract a decision from the model's text response, that decision should have been a tool call. Tool schemas enforce output structure. Parsing prose to recover structure is fragile and unnecessary.
- You're integrating with existing systems. Databases, CRMs, ERPs, internal APIs. Tools are the bridge between natural language requests and the systems that fulfil them.
Don't use tools when:
- The model can answer from its own knowledge. Summarisation, translation, general knowledge questions - no tool needed. Adding a tool round-trip just adds latency for no benefit.
- There are no side effects. If nothing needs to be looked up, calculated externally, or changed, a tool doesn't add value.
- The task is trivially simple. Every tool call is at least one extra API round trip. For a quick "rewrite this sentence," the overhead of a tool call is silly.
The tell that you should be using tools: if you're doing string manipulation on model output to extract structured data, a tool would have given you that structure for free.
Practical Architecture Decisions
When we design AI agent systems for our clients, a few architectural patterns come up repeatedly:
Keep tool implementations thin. The tool function should do one thing - call the API, query the database, write the file. Don't put business logic in the tool implementation. The model handles the reasoning; your tools handle the execution.
Schema design matters. Clear parameter names, good descriptions, sensible defaults. The model reads your tool schema to decide how to call it. A parameter called q tells the model less than one called search_query. We've seen tool call accuracy improve meaningfully just from better schema descriptions.
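To make the difference concrete, here are two versions of the same search parameter. Both schemas are hypothetical; the point is how much more the second one tells the model before it ever makes a call.

```python
# Terse schema: the model has to guess what "q" means and how to fill it.
terse_schema = {
    "name": "search",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

# Descriptive schema: names, descriptions, and a sensible default.
descriptive_schema = {
    "name": "search_knowledge_base",
    "description": "Full-text search over the internal knowledge base.",
    "input_schema": {
        "type": "object",
        "properties": {
            "search_query": {
                "type": "string",
                "description": "Plain-language search terms, e.g. 'leave policy'",
            },
            "max_results": {
                "type": "integer",
                "description": "How many results to return",
                "default": 5,
            },
        },
        "required": ["search_query"],
    },
}
```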
Return useful errors. When a tool call fails, return a descriptive error message in the tool_result. The model can often recover from errors - retrying with different parameters, trying an alternative approach, or explaining to the user what went wrong. A generic "error occurred" wastes the model's ability to adapt.
Think about tool discovery. If your agent has 50 tools, the model has to read all 50 schemas to decide which one to use. That's a lot of input tokens. Consider grouping tools by domain and only including relevant tools based on context. Anthropic's tool_search server tool can help with this for large tool sets.
Parallel tool calls save time. Claude can return multiple tool_use blocks in a single response. If two tools are independent (looking up a customer while also checking inventory, for example), it'll call both at once. Your application should execute them in parallel rather than sequentially. This is one of those things that doesn't matter in development but makes a real difference at production scale.
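A thread pool is usually enough to run independent tool_use blocks concurrently. This sketch assumes `execute_tool(name, arguments)` is your own dispatcher and `tool_use_blocks` holds the tool_use entries from one response, treated as plain dicts.

```python
from concurrent.futures import ThreadPoolExecutor

def execute_blocks_in_parallel(execute_tool, tool_use_blocks):
    """Run every tool_use block concurrently and pair results with their IDs."""
    with ThreadPoolExecutor(max_workers=max(len(tool_use_blocks), 1)) as pool:
        futures = [
            pool.submit(execute_tool, block["name"], block["input"])
            for block in tool_use_blocks
        ]
        # f.result() blocks until that tool finishes; results stay in
        # block order, so tool_use_id pairing is straightforward.
        return [
            {"type": "tool_result", "tool_use_id": block["id"],
             "content": str(f.result())}
            for block, f in zip(tool_use_blocks, futures)
        ]
```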
The Honest Limitations
Tool use isn't perfect. Here's what to expect:
The model sometimes calls the wrong tool. With good schema design this is rare, but it happens. Your application should validate tool arguments before executing, especially for tools with side effects. You don't want a "send_email" tool firing because the model misunderstood the request.
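A minimal argument check before a side-effecting tool runs might look like the sketch below. The send_email tool and its required fields are hypothetical; the point is that validation happens in your code, before anything irreversible executes.

```python
# Required fields per side-effecting tool (hypothetical example).
REQUIRED_ARGS = {
    "send_email": {"recipient", "subject", "body"},
}

def validate_tool_call(name: str, arguments: dict) -> list:
    """Return a list of problems; an empty list means the call looks safe to run."""
    problems = []
    missing = REQUIRED_ARGS.get(name, set()) - arguments.keys()
    if missing:
        problems.append(f"missing arguments: {sorted(missing)}")
    if name == "send_email" and "@" not in arguments.get("recipient", ""):
        problems.append("recipient does not look like an email address")
    return problems
```

When validation fails, return the problems as an error tool_result rather than executing; the model can usually correct itself on the next turn.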
Latency adds up. Each tool call is a round trip to your backend, plus the time for Claude to process the result and decide what to do next. A 5-step agentic loop with tools that each take 500ms adds 2.5 seconds just in tool execution, plus the API latency for each turn. For real-time user-facing agents, keep the tool chain short.
Token costs scale with the loop. Every iteration of the agentic loop sends the full conversation history (including all previous tool calls and results) back to the API. Long tool chains with verbose results get expensive. Summarise tool results when you can, and set reasonable limits on conversation length.
Models can get stuck. Occasionally the model enters a loop where it keeps calling the same tool expecting different results. Your iteration cap handles this, but it's worth adding logic to detect repeated identical tool calls and break out early.
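Detecting a stuck loop can be as simple as counting identical (tool, arguments) signatures. The threshold and naming below are our own choices, not a standard API.

```python
import json
from collections import Counter

class RepeatGuard:
    """Flags when the same tool call repeats too many times in one session."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.counts = Counter()

    def should_break(self, name: str, arguments: dict) -> bool:
        """Record a call; return True once it has repeated max_repeats times."""
        # JSON with sorted keys gives a stable signature for the arguments.
        signature = (name, json.dumps(arguments, sort_keys=True))
        self.counts[signature] += 1
        return self.counts[signature] >= self.max_repeats
```

Call should_break before executing each tool in the loop; when it fires, stop and return whatever partial answer the model has, rather than burning more turns.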
Where This Is Heading
Tool use is how AI moves from "interesting technology" to "useful business tool." The models will keep getting better at deciding when and how to call tools. The schemas will get richer. The execution environments will get faster. But the fundamental pattern - model reasons, code executes, results flow back - is settled.
For Australian organisations looking at AI agent development, understanding tool use is essential. It's the mechanism that lets your AI agent actually interact with your business systems instead of just talking about them. And once you've built your first tool-using agent that actually saves someone real time, you'll want to build ten more.
The documentation from Anthropic is solid. Read through the full guide on how tool use works for the technical details, then start with a simple agent - two or three tools, a clear use case, and a real user who'll give you honest feedback. That's how every good AI agent project starts.