Defining Tools for Claude AI Agents - What Actually Works

March 28, 2026 · 9 min read · Michael Ridland

If you're building AI agents with Claude, tool definitions are where the rubber meets the road. You can have the best prompt engineering in the world, but if your tools are poorly defined, the agent will misuse them, call them at the wrong time, or pass garbage parameters. I've seen it happen on enough projects to know that tool design deserves more time than teams typically give it.

Anthropic's tool definition documentation covers the API mechanics well. What I want to talk about is what we've learned from actually building and deploying Claude-powered agents in production for Australian businesses.

The Basics - Tool Definition Structure

Every tool you define for Claude needs three things:

  • name - a string identifier (alphanumeric, hyphens, underscores, max 64 characters)
  • description - a plaintext explanation of what the tool does
  • input_schema - a JSON Schema defining the expected parameters

You can optionally include input_examples - concrete examples of valid inputs that help Claude understand complex tools.

Here's a simple example:

{
  "name": "get_customer_orders",
  "description": "Retrieves the order history for a specific customer by their customer ID. Returns the most recent 50 orders by default, including order date, total amount, status, and line items. Use this when the user asks about a customer's past orders, purchase history, or order status. Does not return payment details or shipping addresses for security reasons.",
  "input_schema": {
    "type": "object",
    "properties": {
      "customer_id": {
        "type": "string",
        "description": "The unique customer identifier, formatted as 'CUS-' followed by 8 digits (e.g., CUS-00012345)"
      },
      "limit": {
        "type": "integer",
        "description": "Maximum number of orders to return. Defaults to 50 if not specified. Maximum is 200."
      }
    },
    "required": ["customer_id"]
  }
}

That looks obvious. But the gap between "technically correct" and "works well in practice" is wider than you'd think.
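To make that gap concrete, here's the same definition expressed as a Python dict, ready to hand to an agent loop. This is a sketch, not the only way to do it; the commented-out API call at the end shows where the definition plugs in, and the model name there is an assumption.

```python
# The get_customer_orders tool from above, as a Python dict.
get_customer_orders_tool = {
    "name": "get_customer_orders",
    "description": (
        "Retrieves the order history for a specific customer by their "
        "customer ID. Returns the most recent 50 orders by default."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "Customer identifier, e.g. CUS-00012345",
            },
            "limit": {
                "type": "integer",
                "description": "Max orders to return (default 50, max 200)",
            },
        },
        "required": ["customer_id"],
    },
}

# In a real agent you would pass it via the `tools` parameter, e.g.:
# client = anthropic.Anthropic()
# response = client.messages.create(
#     model="claude-opus-4-6",  # model name is an assumption
#     max_tokens=1024,
#     tools=[get_customer_orders_tool],
#     messages=[{"role": "user", "content": "Orders for CUS-00012345?"}],
# )
```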

Descriptions Are Everything

I'm not exaggerating when I say that tool descriptions are the single most important factor in how well your agent uses its tools. We've had situations where changing a description - without touching anything else - completely fixed an agent that was choosing the wrong tool for user queries.

Anthropic recommends at least 3-4 sentences per tool description. I'd push that further: for any tool that's even slightly ambiguous, write as much as you need. Describe:

What the tool does. Not just the action, but what data it returns and in what format.

When to use it. Be explicit about the scenarios where this tool is the right choice. "Use this when the user asks about order history" is far better than leaving Claude to guess.

When NOT to use it. This is the one teams skip, and it's often the most valuable part. "Do not use this tool to check order status for orders placed in the last hour - those won't appear yet. Use check_order_queue instead." That kind of negative guidance prevents the most frustrating agent failures.

What it won't return. If the tool intentionally excludes data (for security, performance, or scope reasons), say so. Otherwise Claude might call the tool, not find what it expects, and call it again with different parameters - wasting tokens and confusing the user.

Parameter format expectations. If a customer ID needs a specific prefix, if a date needs a particular format, if a string is case-sensitive - put it in the description. Don't rely on JSON Schema validation alone to enforce this.

Bad description:

{
  "name": "search_products",
  "description": "Searches for products."
}

Good description:

{
  "name": "search_products",
  "description": "Searches the product catalogue by keyword, category, or SKU. Returns up to 20 matching products with name, price, stock level, and category. Use this when a user asks about product availability, pricing, or wants to find a specific item. Supports partial matching on product names. Category must be one of: electronics, clothing, food, homewares. If the user's query is vague, search by keyword rather than category. Returns an empty array if no products match - do not retry with modified parameters, instead tell the user no results were found."
}

The difference in agent behaviour between these two is dramatic.
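That said, descriptions can't enforce formats on their own, so it's worth validating parameters server-side before executing the tool and feeding any problems back as the tool result so Claude can self-correct. A minimal sketch, assuming the CUS- ID format from the earlier example (the error wording is illustrative):

```python
import re

CUSTOMER_ID_RE = re.compile(r"^CUS-\d{8}$")  # 'CUS-' plus exactly 8 digits

def validate_order_lookup(params: dict) -> list[str]:
    """Return human-readable problems; an empty list means the call is valid.

    Returning these messages as the tool result gives Claude something
    concrete to correct, instead of a bare exception.
    """
    problems = []
    customer_id = params.get("customer_id")
    if not customer_id:
        problems.append("customer_id is required")
    elif not CUSTOMER_ID_RE.match(customer_id):
        problems.append(
            f"customer_id {customer_id!r} must look like CUS-00012345"
        )
    limit = params.get("limit", 50)
    if not isinstance(limit, int) or not (1 <= limit <= 200):
        problems.append("limit must be an integer between 1 and 200")
    return problems
```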

Consolidate Your Tools

One mistake I see repeatedly: teams create a separate tool for every possible action. create_order, update_order, cancel_order, get_order, list_orders, search_orders. Six tools that are really just one resource with different operations.

Claude handles a small number of well-defined tools better than a large number of similar ones. When you've got 20+ tools and several sound alike, the model spends more effort deciding which to call and occasionally picks the wrong one.

Better approach: consolidate related operations into fewer tools with an action parameter.

{
  "name": "manage_orders",
  "description": "Manages customer orders. Supports creating, updating, cancelling, retrieving, and searching orders via the action parameter.",
  "input_schema": {
    "type": "object",
    "properties": {
      "action": {
        "type": "string",
        "enum": ["create", "update", "cancel", "get", "list", "search"],
        "description": "The operation to perform"
      },
      "order_id": {
        "type": "string",
        "description": "Required for get, update, and cancel actions"
      },
      "search_query": {
        "type": "string",
        "description": "Required for search action. Supports product name, customer name, or date range"
      }
    },
    "required": ["action"]
  }
}

This isn't always the right pattern. If two operations are fundamentally different - say, a read-only lookup versus a destructive delete - there's an argument for keeping them separate so you can apply different safety controls. But for CRUD operations on the same resource, consolidation usually works better.

Naming Conventions Matter at Scale

When you've got five tools, naming doesn't matter much. When you've got 30, it matters a lot.

Use consistent namespacing that tells Claude which service or domain a tool belongs to:

  • github_list_prs
  • github_create_issue
  • slack_send_message
  • slack_list_channels
  • crm_search_contacts
  • crm_update_deal

This is especially useful when you're using Anthropic's tool search feature, which lets Claude search through a large tool library rather than loading all tools into context at once. Clear namespacing makes those searches more accurate.

Choosing the Right Model

This is worth calling out because it affects tool behaviour directly.

Claude Opus 4.6 is the best choice for complex tool scenarios. It handles ambiguous queries better - when a user's request could map to multiple tools, Opus is more likely to ask for clarification rather than guessing. It also does better with multiple tools in a single turn and with tools that have complex, nested input schemas.

Claude Haiku works well for straightforward tool use where there's an obvious single tool to call. It's faster and cheaper, which matters for high-volume agent deployments. But it's more likely to infer missing parameters rather than asking the user, which can cause problems if those inferences are wrong.

For most production agents, we use Opus for the main agent loop and Haiku for sub-tasks where the tool selection is unambiguous.
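That split can be expressed as a simple routing rule. This is a hypothetical helper, and the model identifiers are assumptions, not documented IDs; swap in whatever your account exposes:

```python
# Model identifiers below are assumptions for illustration.
MAIN_LOOP_MODEL = "claude-opus-4-6"
SUBTASK_MODEL = "claude-haiku-4-5"

def pick_model(task: dict) -> str:
    """Route unambiguous single-tool sub-tasks to Haiku; everything else
    (ambiguous queries, multi-tool turns, nested schemas) stays on Opus."""
    if task.get("candidate_tools", 0) == 1 and not task.get("nested_schema"):
        return SUBTASK_MODEL
    return MAIN_LOOP_MODEL
```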

Input Examples - When to Use Them

The input_examples field lets you provide concrete examples of valid tool inputs. Each example is validated against your schema, so they also serve as a form of documentation.

I'd say input examples are worth adding in three scenarios:

  1. Complex nested objects. When your tool accepts objects within objects, an example makes the expected structure much clearer than schema alone.

  2. Format-sensitive parameters. If your tool expects dates in YYYY-MM-DD format, or IDs with specific prefixes, showing an example is faster for Claude to process than reading a description.

  3. Multiple valid use patterns. If a tool can be called in different ways depending on the scenario, examples for each pattern help Claude understand the variations.

For simple tools with one or two string parameters, the description is usually sufficient. Don't add examples just because you can.
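As a sketch of the first scenario, here's a hypothetical tool with a nested date-range object and two input examples, one per calling pattern. The tool and field names are made up for illustration:

```python
# Hypothetical tool showing input_examples for a nested schema.
create_report_tool = {
    "name": "create_report",
    "description": (
        "Generates a sales report for a date range, optionally filtered "
        "by category."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "date_range": {
                "type": "object",
                "properties": {
                    "start": {"type": "string", "description": "YYYY-MM-DD"},
                    "end": {"type": "string", "description": "YYYY-MM-DD"},
                },
                "required": ["start", "end"],
            },
            "category": {"type": "string"},
        },
        "required": ["date_range"],
    },
    # Each example must itself validate against input_schema.
    "input_examples": [
        {"date_range": {"start": "2026-01-01", "end": "2026-01-31"}},
        {"date_range": {"start": "2026-02-01", "end": "2026-02-28"},
         "category": "electronics"},
    ],
}
```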

Tool Response Design

This gets less attention than it should. How you format your tool's return data affects how well Claude uses the results.

Return only what Claude needs. If your tool queries a database and gets 50 columns back, don't pass all 50 to Claude. Filter to the fields that are relevant to the agent's task. Large responses eat context window and make it harder for Claude to find the important bits.

Use stable identifiers. Return customer IDs, order numbers, and slugs rather than internal database row IDs that mean nothing to the user. Claude will often include these identifiers in its response to the user, so they should be meaningful.

Include metadata that aids reasoning. If your search returned 200 results but you're only returning the top 20, include a total_results field so Claude can tell the user "I found 200 matches, here are the most relevant." Without that metadata, Claude has to guess whether 20 results is everything or just a subset.
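Putting those three points together, a response-shaping helper might look like this. Field names are illustrative; the point is the pattern of trimming fields and attaching truncation metadata:

```python
def shape_search_response(rows: list[dict], top_n: int = 20) -> dict:
    """Trim raw DB rows down to the fields Claude needs, plus metadata
    that tells it whether the list is complete."""
    relevant_fields = ("order_number", "customer_id", "status", "total")
    trimmed = [
        {field: row[field] for field in relevant_fields if field in row}
        for row in rows[:top_n]
    ]
    return {
        "results": trimmed,
        "returned": len(trimmed),
        "total_results": len(rows),  # lets Claude say "200 matches, top 20"
        "truncated": len(rows) > top_n,
    }
```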

Common Mistakes We See

Overloaded tool surfaces. Giving an agent 50 tools when it only needs 10 for its core job. Start with the minimum viable tool set and add more only when you have evidence the agent needs them.

Missing error guidance. Not telling Claude what to do when a tool returns an error. Should it retry? Try a different tool? Tell the user? Apologise? Put this in the tool description.

Assuming Claude reads the schema carefully. Claude uses tool descriptions far more than it reads JSON Schema properties. If something is important, put it in the description, not just in the schema.

Not testing with real user queries. Your tool might work perfectly with your test prompts but fail when real users ask messy, ambiguous questions. Test with the kind of queries your actual users will send.

Building Better Agents

Tool definition quality is one of those things that separates demo-quality agents from production-quality ones. It's not glamorous work - writing good descriptions feels tedious compared to architecting agent flows or fine-tuning prompts. But in our experience, it has more impact on end-user satisfaction than almost anything else.

If you're building Claude-powered agents and want help getting the tool layer right, our AI agent development team has built production agents across customer service, operations, and internal tooling. We know what works.

For teams who want to understand how Claude agents fit into a broader AI strategy - not just one agent, but a platform approach - our AI consulting practice can help you design an architecture that scales.

And if you're building with the Claude Agent SDK specifically and want hands-on guidance, we work closely with Anthropic's tools and frameworks and can accelerate your team's ramp-up significantly.

Good tools make good agents. Spend the time.