Back to Blog

Claude Tool Combinations - Agent Patterns That Actually Work in Production

April 28, 20269 min readMichael Ridland

When clients ask us how to build a Claude agent, the conversation usually starts with capabilities. What should it do? What does it need access to? What's the failure mode if it gets something wrong? Then very quickly we end up talking about tools, because tools are how an agent does anything useful.

Anthropic ships a set of built-in tools with the Claude API, and they're designed to work together. You don't have to use them all, and you don't have to use them at all - most production agent builds end up with a mix of Anthropic-provided tools and custom tools that talk to the client's own systems. But knowing how the Anthropic tools combine is the right starting point, because the patterns generalise.

I want to walk through the canonical tool combinations Anthropic suggests, what each one is genuinely good for, and where I've seen them work or fall down in real builds.

Research Agent: Web Search and Code Execution

The pairing is web_search with code_execution. Claude searches for information, then writes Python to process or analyse it. Together they let an agent answer questions that require both current information and meaningful computation.

A good example from a recent client engagement: a financial services team wanted an agent that could compare quarterly earnings across the major cloud providers and produce a comparison table. The agent searches for the latest earnings reports, extracts the numbers, runs Python to compute growth rates and ratios, and outputs a clean table. Without code execution, you'd be relying on the model to do arithmetic in its head, which is exactly where Claude (or any LLM) gets things subtly wrong.

The pattern is search, execute, optionally search again. The agent might find a partial answer in the first search, write code to process it, realise something is missing, search again with a more targeted query, and integrate the results. This iterative loop is where research agents start to feel genuinely capable rather than just summarising the first page of search results.

What works well about this combo: code execution runs server-side, so there's no sandbox to manage on your end. The agent can pip install reasonable packages within the runtime. It can produce visualisations. It can do real math.

What to watch for: search results are not always current. If your use case demands real-time data (live prices, current scores, breaking news), web search is sometimes a few hours behind, and you'll want to think about whether that matters for your application. Also, code execution has its own constraints around runtime and packages, so don't assume you can run heavy ML training inside it.

Coding Agent: Text Editor and Bash

This is the classic developer-assistant loop. The text editor tool reads and modifies files. The bash tool runs commands - tests, builds, git operations, package installs. Together they form what Anthropic calls the canonical software-development loop: inspect the code, make an edit, run the tests, repeat.

Both of these are client-executed, which is the important difference from web_search and code_execution. Your application controls what files the agent can see and what commands it can run. This is critical for security. You don't want an agent with bash access running arbitrary commands on a production server. You want it constrained to a specific working directory, with a command allowlist, and ideally running in an ephemeral environment that gets thrown away after the session.

We've been building coding agents for a few clients, and the practical lessons are:

Working directory hygiene matters. Lock the agent to a specific repository or working directory. Don't let it browse upward. The text editor should resolve paths relative to a root that the agent can't escape.

Command allowlists save you. Bash with no constraints is dangerous. Bash with an allowlist of "npm test, npm run build, git status, git diff" is useful. The constraint forces the agent to work within sensible boundaries.

Ephemeral environments are worth the setup cost. Running the agent in a fresh container per session, then discarding the container, means a bad command can't accumulate damage. If your agent is doing anything substantial, the investment in container orchestration is repaid quickly.

This is the pattern behind Claude Code and other developer-facing agents. It also generalises beyond coding - any task that involves "look at the state, change something, verify the change" maps to this pattern. Configuration management, data pipeline operations, infrastructure changes all fit the same shape.

Cite-Then-Fetch: Web Search and Web Fetch

This is the pairing I'd recommend to most teams starting out with research agents. Web search returns candidate URLs and snippets. Web fetch retrieves the full page content for the URLs that look most relevant.

The reason to use both rather than just web search is that snippets are often misleading. A search result might look perfect based on its summary, then the full page turns out to be about something different, or has the answer buried halfway down. By searching first and fetching only the promising results, the agent gets to inspect the snippets, pick what looks relevant, and only then read the full content. This is more efficient than trying to fit ten full pages into the context window.

It also produces better outputs. The agent can cite specific passages from the fetched pages rather than rephrasing snippet summaries. For applications where source attribution matters (legal research, due diligence, anything with regulatory implications), this is the difference between an agent that hallucinates citations and one that grounds its answers in real source material.

In our work with clients on enterprise AI agents, the cite-then-fetch pattern is the one we recommend for any agent that needs to answer questions with grounded evidence. Insurance teams use it for policy and regulatory research. Legal teams use it for case law summaries. Even sales teams use it for competitor analysis, where the cited source is part of the value (it's not "trust me, here's what their pricing is", it's "here's the pricing page that says it").

Long-Running Agents: Memory Plus Whatever Else

The memory tool persists state across conversations. By itself it doesn't do anything useful - it gives the agent a place to write down and later retrieve information that would otherwise be lost when the context window resets.

You combine memory with whatever other tools your agent needs. A support agent might pair memory with custom tools that talk to your ticketing system, your CRM, and your knowledge base. A project assistant might pair memory with calendar and email tools. The memory tool sits orthogonally to the rest of the toolset - it's a place to store and retrieve, not a workflow component.

The practical lessons here are about structure. Memory is just a key-value store with some structure. The agent decides what to write and what to read. Without instructions, the agent will accumulate a chaotic collection of notes that doesn't help it much. With clear instructions about what kinds of facts to remember (user preferences, ongoing projects, prior decisions) and how to retrieve them (by key, by topic), memory becomes a real capability.

This is the area where I think the biggest opportunities sit for enterprise agent deployments over the next year. Most current agents are stateless - they answer one question, then forget. Stateful agents that remember the user, the context of prior work, and the patterns of how a specific business operates are dramatically more useful. The memory tool is one piece of that puzzle, and it's worth understanding how to wire it in properly.

All-in-One: Computer Use

Computer use is the most general tool Anthropic provides. The agent sees screenshots and issues mouse and keyboard actions, which means it can drive any application a human can use. Legacy software with no API. Workflows that span multiple desktop apps. Visual verification steps. Things no other tool can reach.

I'll be direct about computer use: it's the right answer when nothing else works, and it's almost always the wrong answer otherwise.

The reasons not to use it when you have alternatives:

It's slow. Every action requires a screenshot roundtrip. Tasks that take 30 seconds with a proper API integration take 5 minutes with computer use.

It's brittle. UI changes break it. A vendor pushes a small UI update, and your agent stops working until someone updates the prompt or the screenshots.

It's expensive. Screenshots eat tokens. Long workflows eat a lot of tokens. The unit economics don't always make sense for high-frequency tasks.

It's hard to test. With API-based agents, you can write deterministic tests. With computer use, you're testing against a moving UI that's harder to mock.

That said, when computer use is right, it's transformative. The clients where we've recommended it have been ones with legacy desktop applications - typically internal tools built 10+ years ago, no API, no roadmap to add one, and a team of humans doing repetitive data entry from morning to night. Computer use is the answer there. The economics and brittleness arguments don't matter when the alternative is humans clicking the same buttons all day.

For new agent builds, the rule I'd apply is this: if there's an API, use the API. If there's a CLI, use bash. If there's a database, use a database connector. Only reach for computer use when the alternative is "we cannot automate this task".

Custom Tools and Where the Real Value Sits

The Anthropic-provided tools are a starting point, not the destination. The agents we build for clients almost always combine Anthropic tools with custom tools that talk to the client's specific systems. The CRM. The ticketing platform. The data warehouse. The internal knowledge base.

The pattern is: Anthropic tools handle the generic infrastructure (search, code execution, file editing, memory). Custom tools handle the specific business systems. Together they form an agent that has both general capabilities and specific access to your data.

This is also where the engineering effort goes. Wiring up the Anthropic tools is straightforward. Building reliable custom tool integrations, with proper auth, error handling, rate limiting, and observability, is where the real work sits. It's also where the real value gets created, because that's what differentiates your agent from one that anyone else can build.

We do a lot of this work through our AI agent development practice, and the pattern is consistent: clients underestimate the integration effort, and they overestimate how much value comes from the model itself versus the custom tooling around it. The model is necessary but not sufficient. The tools are where the agent becomes useful for a specific business.

If you're starting to think about building agents seriously, my recommendation is to spend more time thinking about what tools your agent needs than which model it should run on. The model can be swapped out. The tool integrations are what stay.


Reference: Tool combinations (Claude API docs)