Claude's Code Execution Tool - Running Python and Bash in the API
There's a moment in most AI agent projects where you need Claude to do more than generate text. You need it to actually run code - process a CSV, calculate something, generate a chart, or manipulate files. Before the code execution tool existed, the standard approach was to build your own sandboxed execution environment, handle all the security concerns yourself, and wire it up as a custom tool.
That works, but it's a lot of plumbing for something that should be straightforward. Anthropic's code execution tool handles all of that for you. Claude can write and run Python or Bash code inside a sandboxed container, return results, generate files, and iterate on solutions - all within a single API conversation.
We've been using this in client projects and it's genuinely changed how we architect certain types of AI agents.
What the Code Execution Tool Actually Is
At its core, the code execution tool gives Claude access to a sandboxed environment where it can run code. You include it in your API request alongside your other tools, and Claude can decide to write and execute code whenever it would help answer the user's question.
The sandbox runs Python with common data science libraries pre-installed (pandas, numpy, matplotlib, and others) and supports Bash commands for file operations and system tasks. Claude can create files, read uploaded files, generate visualisations, and iterate - if the first attempt at a calculation produces an error, it can read the error message and fix the code.
Here's a simple example using the Python SDK:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Calculate the mean and standard deviation of [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]",
        }
    ],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
)
Claude sees the tool, decides that code execution would be useful for this calculation, writes the Python code, runs it, and returns the result. You don't need to handle the execution - Anthropic's infrastructure does that.
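Once the response comes back, you usually want to separate Claude's final text from the tool activity. The sketch below is one way to do that, assuming the response content has been converted to plain dictionaries (for example with `response.model_dump()["content"]` in the Python SDK). The block type names and fields in `sample_content` are illustrative assumptions, not a definitive schema, so check them against what your own responses contain.

```python
def summarise_response(content):
    """Split content blocks into the final text and the types of non-text blocks."""
    text_parts = []
    tool_block_types = []
    for block in content:
        if block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        else:
            # Anything that is not plain text is treated as tool activity.
            tool_block_types.append(block.get("type"))
    return "".join(text_parts), tool_block_types


# Hypothetical response content; block type names and fields are assumptions.
sample_content = [
    {"type": "text", "text": "I'll compute that. "},
    {"type": "server_tool_use", "name": "bash_code_execution", "input": {"command": "python3 stats.py"}},
    {"type": "bash_code_execution_tool_result", "content": {"stdout": "5.5\n", "return_code": 0}},
    {"type": "text", "text": "The mean is 5.5."},
]

answer, tool_blocks = summarise_response(sample_content)
print(answer)
print(tool_blocks)
```

Logging the non-text block types during development is a cheap way to see when Claude chose to run code and when it answered directly.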
Two Tool Versions and What They Mean
There are currently two versions of the code execution tool, and which one you can use depends on your model.
code_execution_20250825 is the baseline version. It supports Bash commands and file operations and works across all supported Claude models, from Haiku 3.5 through to Opus 4.7. This is the version most people should start with.
code_execution_20260120 adds REPL state persistence and programmatic tool calling from within the sandbox. This means Claude can maintain state between code execution calls in the same conversation (variables persist) and can call other tools directly from the sandbox code. This version is only available on Opus 4.5+ and Sonnet 4.5+ models.
The REPL persistence in the newer version is a bigger deal than it sounds. Without it, each code execution is stateless - if Claude loads a dataframe in one call and wants to filter it in the next, it has to reload the data. With persistence, the dataframe stays in memory. For data analysis workflows with multiple steps, this avoids repeated loading and setup code, which cuts both execution time and token usage.
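To make the difference concrete, here is a local analogy using Python's built-in `exec`. It is not the sandbox itself, just a sketch of the same idea: a fresh namespace per snippet behaves like stateless execution, and a shared namespace behaves like a persistent REPL. The function names are hypothetical.

```python
def run_stateless(snippets):
    """Run each snippet in a fresh namespace, like separate stateless executions."""
    outcomes = []
    for code in snippets:
        namespace = {}
        try:
            exec(code, namespace)
            outcomes.append("ok")
        except NameError as err:
            outcomes.append(f"NameError: {err}")
    return outcomes


def run_persistent(snippets):
    """Run all snippets in one shared namespace, like a persistent REPL."""
    namespace = {}
    for code in snippets:
        exec(code, namespace)
    return namespace


steps = ["data = [3, 1, 2]", "top = sorted(data)[-1]"]

stateless_outcomes = run_stateless(steps)
persistent_state = run_persistent(steps)

print(stateless_outcomes)       # the second step fails because `data` is gone
print(persistent_state["top"])  # the second step can see `data` from the first
```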
My recommendation: start with code_execution_20250825 unless you specifically need state persistence or programmatic tool calling. It's simpler, works with more models, and handles the vast majority of use cases.
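That decision rule can be written down as a small helper. This is a sketch under stated assumptions: the function and its parameters are hypothetical, and it assumes the newer version uses the same `"name": "code_execution"` as the baseline example earlier in this post. Whether a given model supports the newer version is left to the caller.

```python
BASELINE_TOOL = "code_execution_20250825"
STATEFUL_TOOL = "code_execution_20260120"


def pick_code_execution_tool(needs_state=False, needs_programmatic_calls=False,
                             model_supports_newer=False):
    """Return a tool definition, preferring the baseline unless newer features are needed."""
    wants_newer = needs_state or needs_programmatic_calls
    if wants_newer and not model_supports_newer:
        raise ValueError("Requested features need the newer tool version, "
                         "which this model does not support.")
    tool_type = STATEFUL_TOOL if wants_newer else BASELINE_TOOL
    # Assumption: both versions are registered under the same tool name.
    return {"type": tool_type, "name": "code_execution"}


default_tool = pick_code_execution_tool()
stateful_tool = pick_code_execution_tool(needs_state=True, model_supports_newer=True)
print(default_tool)
print(stateful_tool)
```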
Where This Makes Sense in Practice
Not every AI agent needs code execution. But for certain categories of work, it's the difference between a useful agent and one that gives you approximate answers with caveats.
Data analysis and reporting. If you're building an agent that answers questions about data - "what were our top 10 products last quarter?" or "show me the trend in customer churn" - code execution lets Claude actually compute the answers rather than guessing. It can load data, run pandas queries, generate matplotlib charts, and return precise numbers.
File processing. We built an agent for a client that processes uploaded spreadsheets, validates the data against a set of business rules, and produces a cleaned output file. Without code execution, Claude would describe what it would do. With code execution, it actually does it and returns the processed file.
Mathematical and statistical work. Anything involving calculations beyond basic arithmetic benefits from code execution. Statistical tests, financial modelling, optimisation problems - Claude can write and run the code rather than attempting mental maths (which LLMs are not reliable at).
Prototyping and testing. During development, we use code execution to have Claude test its own suggestions. Instead of generating code and hoping it works, it can run it, see the output, and iterate.
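For the statistical case, the code Claude runs in the sandbox is often only a few lines. The snippet below is an illustration of what that might look like for the mean and standard deviation example earlier in this post, not a transcript of actual tool output. It uses only the standard library, and it reports both sample and population standard deviation because the original question does not say which one is wanted.

```python
import statistics

values = list(range(1, 11))  # [1, 2, ..., 10]

mean = statistics.mean(values)
sample_sd = statistics.stdev(values)       # divides by n - 1
population_sd = statistics.pstdev(values)  # divides by n

print(f"mean={mean}, sample sd={sample_sd:.4f}, population sd={population_sd:.4f}")
```

The point is that these numbers are computed, not recalled, so the sample and population figures come out distinct and exact rather than blurred into one approximate answer.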
Where code execution doesn't add much value: pure text generation tasks, conversational agents, summarisation, and similar work where Claude's language capabilities are sufficient on their own. Adding code execution to a chatbot that answers FAQs is overhead for no benefit.
The Cost Model Is Interesting
Here's something that surprised me: code execution is free when you include web search or web fetch in your request. If your API call includes web_search_20260209 or web_fetch_20260209 alongside code execution, you only pay standard input and output token costs. No extra charge for the execution itself.
This is actually quite smart from Anthropic's perspective. Code execution paired with web search lets Claude fetch data from the web and then process it with code before returning results. The filtered, processed output is typically much smaller than the raw web content, which means fewer tokens in Claude's context window. You get better answers with less token consumption.
When you use code execution without web search or web fetch, standard execution charges apply. The exact pricing depends on your plan and usage tier.
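If you want to keep an eye on this in your own code, a small check on the tools list is enough. The helper below is a hypothetical sketch that encodes the pairing rule as described in this section. It does not calculate cost, and the `name` values for the web tools are assumptions.

```python
FREE_PAIRING_TOOLS = {"web_search_20260209", "web_fetch_20260209"}


def execution_billed_separately(tools):
    """Return True if code execution is present without web search or web fetch."""
    types = {tool.get("type", "") for tool in tools}
    uses_code_execution = any(t.startswith("code_execution_") for t in types)
    has_free_pairing = bool(types & FREE_PAIRING_TOOLS)
    return uses_code_execution and not has_free_pairing


code_only = [{"type": "code_execution_20250825", "name": "code_execution"}]
with_search = code_only + [{"type": "web_search_20260209", "name": "web_search"}]

print(execution_billed_separately(code_only))    # code execution on its own
print(execution_billed_separately(with_search))  # paired with web search
```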
Platform Availability
Code execution is available on the Claude API (Anthropic's direct API) and on Microsoft Azure AI Foundry. It is not currently available on Amazon Bedrock or Google Vertex AI.
If you're building on Azure - which many of our Australian clients do - the Azure AI Foundry support is good news. You get code execution with the same data residency and compliance features that Azure provides, which matters for organisations with strict data governance requirements.
Practical Considerations
A few things we've learned from using this in production:
The sandbox is ephemeral. Files created during code execution don't persist between API calls (unless you're using the newer tool version with REPL persistence, and even then, persistence is within a single conversation). If you need to save outputs, extract them from the API response and store them yourself.
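A first step towards saving outputs is finding which files a response refers to. The sketch below assumes generated files show up as `file_id` fields somewhere inside the response content once it has been converted to plain dictionaries. It walks the structure generically so it does not depend on an exact schema. The sample data and IDs are made up, and actually downloading a file is a separate call to look up in Anthropic's documentation.

```python
def collect_file_ids(node):
    """Recursively gather every string stored under a 'file_id' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "file_id" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_file_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_file_ids(item))
    return found


# Hypothetical response content; structure and IDs are illustrative only.
sample_content = [
    {"type": "text", "text": "Here is your chart."},
    {
        "type": "bash_code_execution_tool_result",
        "content": {
            "stdout": "",
            "content": [{"file_id": "file_abc"}, {"file_id": "file_def"}],
        },
    },
]

file_ids = collect_file_ids(sample_content)
print(file_ids)
```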
The sandbox has common Python packages but not everything. If your use case requires a niche library that isn't pre-installed, you can try installing it with pip within the sandbox, but this adds execution time and isn't guaranteed to work for packages with complex native dependencies.
Execution adds latency. Each code execution step adds time to the API response. For simple calculations it's barely noticeable, but for complex data processing it can add several seconds. Factor this into your UX design - show a loading indicator or stream the text response while the code runs.
Error handling matters. Claude can and will write code that throws errors sometimes. The good news is that it can see the error output and fix the code. But in production, you should handle the case where code execution fails after multiple retries. Don't assume every execution succeeds.
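One way to avoid assuming success is to wrap the request in a bounded retry loop and fail loudly when the limit is reached. In this sketch both `send_request` and `looks_failed` are hypothetical callables you would supply: the first makes the API call, the second inspects the response and decides whether the execution failed. The fake responses are only there to make the example runnable offline.

```python
def run_with_retries(send_request, looks_failed, max_attempts=3):
    """Call send_request until looks_failed says the result is fine, or give up."""
    for attempt in range(1, max_attempts + 1):
        result = send_request()
        if not looks_failed(result):
            return result, attempt
    raise RuntimeError(f"code execution still failing after {max_attempts} attempts")


# Stand-in for real API responses: the first attempt fails, the second succeeds.
fake_responses = iter([
    {"return_code": 1, "stderr": "NameError"},
    {"return_code": 0, "stdout": "42\n"},
])

result, attempts = run_with_retries(
    send_request=lambda: next(fake_responses),
    looks_failed=lambda response: response["return_code"] != 0,
)
print(result, attempts)
```

Raising after the last attempt, rather than returning the failed result, forces the calling code to decide what the user sees when execution never succeeds.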
This feature is not eligible for Zero Data Retention (ZDR). Even if your organisation has a ZDR arrangement, code execution data is retained according to the feature's standard retention policy. Check this with your compliance team before building it into a production system.
Building Agents with Code Execution
The real power of code execution shows up when you combine it with other tools in an agentic workflow. An agent that can search the web, read documents, execute code, and iterate on results is qualitatively different from one that can only generate text.
For example, we built a research agent that:
1. Takes a question from the user
2. Searches the web for relevant data
3. Uses code execution to parse and analyse the data
4. Generates a summary with charts
5. Returns the analysis along with the generated files
Each step builds on the previous one, and code execution is what makes steps 3 and 4 possible within the API call rather than requiring external infrastructure.
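A request for an agent like this mostly comes down to listing both tools together. The helper below is a minimal sketch, not our production code: the function name is hypothetical, the `name` value for the web search tool is an assumption, and your account or model may need extra options for these tool versions. It only builds the keyword arguments, so it runs without an API key.

```python
def build_research_request(question, model="claude-opus-4-7", max_tokens=4096):
    """Build keyword arguments for a request that allows web search and code execution."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": question}],
        "tools": [
            # Assumption: the web search tool is registered under the name "web_search".
            {"type": "web_search_20260209", "name": "web_search"},
            {"type": "code_execution_20250825", "name": "code_execution"},
        ],
    }


request = build_research_request("How has Australian household solar uptake changed since 2015?")
print([tool["type"] for tool in request["tools"]])
```

With a configured client, the dictionary can be passed along as `client.messages.create(**request)`.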
Getting Started
If you want to try code execution, the simplest path is:
1. Get an API key from Anthropic (or use your Azure AI Foundry endpoint)
2. Include the code execution tool in your API request
3. Ask Claude something that benefits from computation
Start with small, contained tasks - calculate something, process a small file, generate a chart. Get comfortable with how it works before building it into complex agent architectures.
We've been building AI agent systems for Australian organisations using the Claude Agent SDK and related tools. If you're exploring code execution as part of a broader agent architecture, our AI consulting team can help you figure out where it fits and how to build it properly. We also work with Azure AI Foundry for clients who need their AI infrastructure on Azure.
Reference
This post is based on Anthropic's documentation on the Claude code execution tool.