
Claude Agent SDK Custom Tools - Building AI Agents That Actually Do Things

April 1, 2026 · 9 min read · Michael Ridland

An AI agent that can only talk is a chatbot. An AI agent that can call your APIs, query your database, update your CRM, and generate documents - that's something worth building.

Custom tools are what make that jump possible in the Claude Agent SDK. They let you define your own functions that Claude can call during a conversation, turning a language model into something that actually does work in your systems. I've been building these for clients across different industries, and the pattern is the same every time: the value of an agent increases dramatically the moment it can interact with real data.

The Claude Agent SDK custom tools documentation covers the full reference. Here's what we've learned about building tools that work well in production.

Why Custom Tools Matter

Without tools, an AI agent is limited to generating text. It can suggest what SQL query you should run, but it can't run it. It can describe the steps to update a customer record, but it can't do the update. The human is still the bottleneck.

With custom tools, the agent closes that loop. A user asks "what's the status of order 4521?" and the agent calls your order lookup tool, gets the real data, and responds with actual information. No copy-pasting queries. No switching between systems.

We've seen this pattern repeatedly in our AI agent development work. The clients who get the most value from AI agents are the ones where the agent has been given access to the systems their team uses daily. It's not about replacing people - it's about removing the tedious intermediary steps between a question and an answer.

How Tool Definition Works

Every custom tool in the SDK has four parts. Nothing more, nothing less.

Name is a unique identifier. Claude uses this internally to reference the tool. Keep it descriptive - lookup_customer is better than tool1. When registered with an MCP server named "crm", the full tool name becomes mcp__crm__lookup_customer.
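That naming convention is mechanical, so it's easy to sanity-check. A quick sketch - the helper function here is mine for illustration, not part of the SDK:

```python
def full_tool_name(server: str, tool: str) -> str:
    # The SDK exposes in-process tools as mcp__<server-name>__<tool-name>.
    return f"mcp__{server}__{tool}"

print(full_tool_name("crm", "lookup_customer"))  # mcp__crm__lookup_customer
```

This matters when you configure allowed tools for auto-approval: you allow the full prefixed name, not the short one you gave the decorator.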

Description is what Claude reads to decide whether to call the tool. This is more important than most people realise, and I'll come back to it later.

Input schema defines the arguments Claude must provide. In TypeScript, you use Zod schemas. In Python, you pass a dict mapping parameter names to types, like {"customer_id": str, "include_history": bool}. The SDK converts this to JSON Schema automatically. If you need something more specific - enums, ranges, optional fields - you can pass a full JSON Schema dict in Python.

Handler is your async function that does the actual work. It receives the validated arguments and must return a content array. That content can include text, images (base64-encoded), or resources (identified by URI).

Here's what this looks like in practice:

from claude_agent_sdk import tool, create_sdk_mcp_server
from typing import Any

@tool(
    "lookup_customer",
    "Look up a customer by their ID and return their account details including name, email, and current plan",
    {"customer_id": str},
)
async def lookup_customer(args: dict[str, Any]) -> dict[str, Any]:
    customer = await db.get_customer(args["customer_id"])
    return {
        "content": [
            {
                "type": "text",
                "text": f"Name: {customer.name}\nEmail: {customer.email}\nPlan: {customer.plan}\nStatus: {customer.status}",
            }
        ]
    }
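When the simple `{"name": type}` mapping isn't expressive enough, the Python side accepts a full JSON Schema dict instead, as mentioned above. A sketch - the tool and its fields are hypothetical - showing an enum, a bounded integer, and an optional parameter, with per-property "description" keys playing the same role as field descriptions in Zod:

```python
# Hypothetical input schema for a customer search tool. Pass this dict as the
# third argument to @tool in place of the simple {"param": type} mapping.
search_schema = {
    "type": "object",
    "properties": {
        "plan": {
            "type": "string",
            "enum": ["free", "pro", "enterprise"],
            "description": "Plan tier to filter by",
        },
        "limit": {
            "type": "integer",
            "minimum": 1,
            "maximum": 100,
            "description": "Maximum number of customers to return",
        },
    },
    "required": ["plan"],  # limit stays optional
}
```

The constraints do double duty: the SDK validates arguments against them, and Claude reads them when deciding what values to supply.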

The TypeScript equivalent uses Zod for the schema, which gives you automatic type inference on the handler's args parameter. Nice touch.

The In-Process MCP Server Approach

Here's something that threw me initially. Custom tools don't run as standalone functions - you wrap them in an MCP (Model Context Protocol) server using create_sdk_mcp_server in Python or createSdkMcpServer in TypeScript.

But this isn't a separate process you need to manage. The server runs in-process, inside your application. No sockets, no ports, no deployment headaches. You define tools, bundle them into a server with a name and version, and pass that server to query().

crm_server = create_sdk_mcp_server(
    name="crm",
    version="1.0.0",
    tools=[lookup_customer, update_customer_plan, log_interaction],
)

This is smart design. It means your custom tools use the same protocol as external MCP servers (filesystem, GitHub, Slack), so everything fits together consistently. But you don't pay the complexity cost of running a separate service when your tools are just functions in your app.
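Wiring the server into a conversation is a few lines of configuration. A sketch against the SDK's documented options - the prompt is a placeholder, and this needs API credentials to actually run:

```python
from claude_agent_sdk import query, ClaudeAgentOptions

options = ClaudeAgentOptions(
    mcp_servers={"crm": crm_server},  # the key becomes the mcp__crm__ prefix
    allowed_tools=["mcp__crm__lookup_customer"],  # auto-approve this tool
)

async def run() -> None:
    async for message in query(prompt="What plan is customer 4521 on?", options=options):
        print(message)
```

Note that the server name you registered and the key in mcp_servers both feed into the full tool names, so keep them consistent.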

One thing to keep in mind: every tool in the server's tools array consumes context window space on every turn. If you're building dozens of tools, look into the SDK's tool search feature to load them on demand instead of all at once.

Error Handling - The Bit Most People Get Wrong

There are two ways to handle errors in a tool handler, and they produce very different outcomes.

Throwing an exception stops the agent loop entirely. Claude never sees the error. The query() call fails. The user gets nothing useful.

Returning is_error: True keeps the agent loop alive. Claude sees the error message as data and can react to it - retry the call, try a different approach, or explain what went wrong to the user.

import json

@tool("fetch_order", "Fetch order details by order number", {"order_id": str})
async def fetch_order(args: dict[str, Any]) -> dict[str, Any]:
    try:
        order = await api.get_order(args["order_id"])
        if order is None:
            return {
                "content": [{"type": "text", "text": f"No order found with ID {args['order_id']}"}],
                "is_error": True,
            }
        return {"content": [{"type": "text", "text": json.dumps(order, indent=2)}]}
    except Exception as e:
        return {
            "content": [{"type": "text", "text": f"Failed to fetch order: {str(e)}"}],
            "is_error": True,
        }

In almost every case, you want the second approach. Let Claude handle the failure gracefully. We've seen agents recover from API timeouts by telling the user to try again in a moment, or from missing records by asking if the user has the right ID. That kind of resilience only works if the agent stays alive after an error.

The only time I'd let an exception propagate is if something is genuinely unrecoverable - like your database connection pool is exhausted and there's no point in the agent trying anything else.

Tool Annotations and Parallel Execution

Tool annotations are optional metadata that describe how a tool behaves. The one that matters most in practice is readOnlyHint.

When you mark a tool with readOnlyHint: True, you're telling the SDK this tool doesn't modify anything. The SDK can then batch it with other read-only tools in parallel. If a user asks "show me the customer details and their recent orders", and both lookup_customer and get_recent_orders are marked as read-only, Claude can call both at once instead of sequentially.

from claude_agent_sdk import ToolAnnotations

@tool(
    "get_recent_orders",
    "Get the most recent orders for a customer",
    {"customer_id": str, "limit": int},
    annotations=ToolAnnotations(readOnlyHint=True),
)
async def get_recent_orders(args: dict[str, Any]) -> dict[str, Any]:
    orders = await db.get_orders(args["customer_id"], limit=args["limit"])
    return {"content": [{"type": "text", "text": json.dumps(orders, indent=2)}]}

The other annotations - destructiveHint, idempotentHint, openWorldHint - are informational. They don't change execution behaviour today, but keeping them accurate is good practice for future SDK versions and for anyone reading your code.

One thing worth noting: annotations are metadata, not enforcement. A tool marked readOnlyHint: True can still write to disk if that's what your handler does. Don't use annotations as a security boundary. Be honest about what the tool actually does.

Tools We've Built for Clients

To give you a sense of what's possible, here are some of the custom tools we've built in our agentic automation projects.

Database lookups are the most common. Customer details, order history, inventory levels, project status. These are almost always read-only, marked with readOnlyHint: True, and they return structured text that Claude can reason about. The key is formatting the output clearly - don't dump raw JSON with fifty fields. Return what's relevant.
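One way to keep lookup output focused is to whitelist the fields worth returning. A minimal sketch - the field names are illustrative:

```python
RELEVANT_FIELDS = ("name", "email", "plan", "status")

def format_customer(record: dict) -> str:
    # Return only the fields Claude needs, as labelled lines rather than raw JSON.
    return "\n".join(
        f"{field.capitalize()}: {record[field]}"
        for field in RELEVANT_FIELDS
        if field in record
    )

print(format_customer({"name": "Ada", "plan": "pro", "internal_flags": 7}))
```

Internal fields never reach the model, which keeps responses clean and saves context window space.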

CRM updates are where tools start doing real work. An agent that can update a contact's status, log an interaction, or create a follow-up task saves meaningful time on every interaction. These tools need careful error handling because you're writing to production systems. Always validate inputs in the handler before hitting the API.

Document generation tools return resource blocks with URIs. The agent calls a tool that generates a PDF or a report, and the tool returns a resource with a URI like file:///tmp/report-20260401.pdf. Claude can reference this in its response. We've used this pattern for generating proposals, compliance reports, and meeting summaries.
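The return shape is a content block with type "resource". A sketch of what a document-generation handler might return - the path and title are illustrative, and the nested fields follow the MCP embedded-resource format:

```python
def report_result(path: str, title: str) -> dict:
    # Wrap a generated file in an MCP resource content block so Claude can
    # reference it by URI in its response.
    return {
        "content": [
            {
                "type": "resource",
                "resource": {
                    "uri": f"file://{path}",
                    "mimeType": "application/pdf",
                    "text": title,
                },
            }
        ]
    }

result = report_result("/tmp/report-20260401.pdf", "April status report")
print(result["content"][0]["resource"]["uri"])  # file:///tmp/report-20260401.pdf
```

The text field gives Claude a human-readable handle on the file, which it can use when describing the result to the user.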

Multi-system orchestration is where it gets interesting. A single agent with tools for your CRM, your project management system, and your billing platform can answer questions that previously required checking three different applications. "What's the current project status and have we billed for this month?" - one question, one answer, zero tab switching.

Writing Good Tool Descriptions

This is the part that makes or breaks your agent's usefulness. Claude reads tool descriptions to decide which tool to call and when. A vague description means Claude guesses. A specific description means Claude picks correctly.

Bad: "Get data" - Claude has no idea when to use this.

Bad: "Query the database for information" - still too vague. Which database? What information?

Good: "Look up a customer by their ID and return their account details including name, email, current plan, and account status" - Claude knows exactly what this tool does and what it returns.

A few things we've learned:

Include the return format in the description. If your tool returns a JSON object with specific fields, mention the key fields. Claude makes better decisions when it knows what data it'll get back.

Be specific about what the tool can't do. If your customer lookup only searches by ID and not by name, say so. Otherwise Claude will try to pass a name as the customer_id parameter and get confused by the error.

Use the field-level .describe() in TypeScript Zod schemas. Each field can have its own description, which helps Claude provide the right values. latitude: z.number().describe("Latitude coordinate") is better than just latitude: z.number().

Test with ambiguous prompts. Ask your agent something that could match multiple tools and see which one it picks. If it picks wrong, the description needs work. This is an iterative process - treat tool descriptions like you'd treat API documentation. They need to be accurate and specific.

Getting Started

If you're already using the Claude Agent SDK, adding custom tools is the natural next step. Start with one or two read-only tools that give the agent access to your data. Once you see how much more useful the agent becomes, you'll want to add write operations.

If you're building AI agents for your organisation and want to get this right the first time, we can help. We've built custom tool sets across different industries and have a good sense of which patterns work and which ones cause headaches down the line.

The official custom tools documentation is worth reading end to end. It covers additional patterns like returning images, configuring allowed tools for auto-approval, and the full tool naming convention. Between that and what I've covered here, you'll have everything you need to start building agents that do more than just talk.