OpenAI Code Interpreter - Containers, Files, and Practical Patterns
OpenAI's Code Interpreter gives your AI a sandbox to write and run Python code. That sounds simple, but the implications are significant. Instead of the model just telling you how to solve a problem, it actually solves it - writing code, executing it, handling errors, and trying again until it gets the right answer.
We've been using it with clients for data analysis, file processing, and as part of larger AI agent workflows. The container model is what makes it practical for real work, and it's also where most of the gotchas live.
What Code Interpreter Actually Does
Code Interpreter lets models run Python in a sandboxed virtual machine. The model writes code, executes it, reads the output, and iterates. If the code fails, the model can rewrite and retry. This loop is what makes it genuinely useful rather than just a party trick.
The practical use cases break down into a few categories:
Data analysis - upload a CSV or Excel file, ask questions about it, get answers backed by actual computation rather than the model guessing at statistics.
File generation - need a chart? A cleaned dataset? A PDF report? The model writes the code to create it and hands you the file.
Math and computation - anything where you need exact answers rather than approximations. The model knows its own limitations with mental arithmetic and will offload to code.
Image processing - this one surprised us. The latest reasoning models (o3, o4-mini) can use Code Interpreter to crop, rotate, zoom, and transform images. If you're working with visual data, the model can manipulate it programmatically rather than just describing what it sees.
The Container Model
Every Code Interpreter session runs inside a container - a sandboxed virtual machine. This is where your code executes and where your files live. Understanding containers is the key to using Code Interpreter well.
Auto Mode vs Explicit Mode
You have two options for container management.
Auto mode is simpler. Pass "container": { "type": "auto" } in your tool configuration, and OpenAI creates a container for you. If you're making follow-up requests with previous code_interpreter_call items in context, it reuses the existing container. This is fine for one-off tasks and experimentation.
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "code_interpreter",
        "container": {"type": "auto", "memory_limit": "4g"}
    }],
    instructions="You are a data analyst. Use Python to answer questions.",
    input="What is the standard deviation of [23, 45, 67, 89, 12, 34, 56]?"
)
Explicit mode gives you more control. Create a container first via the /v1/containers endpoint, then reference it by ID. This is what you want for production use, because you can pre-load files and reuse the container across multiple requests.
from openai import OpenAI

client = OpenAI()

container = client.containers.create(
    name="analysis-session",
    memory_limit="4g"
)

response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "code_interpreter",
        "container": container.id
    }],
    tool_choice="required",
    input="Analyse the uploaded dataset and identify outliers"
)
Memory Tiers
Containers come in four sizes: 1GB (default), 4GB, 16GB, and 64GB. The memory limit applies for the container's entire lifetime and affects pricing.
For most data analysis tasks, 4GB is the sweet spot. The default 1GB runs out quickly if you're working with anything beyond small datasets. 16GB and 64GB are for genuinely large data processing - think multi-million row datasets with complex transformations.
Pick the right tier upfront. You can't resize a container after creation.
The Expiration Problem
This is the biggest practical concern: containers expire after 20 minutes of inactivity. When a container expires, everything in it is gone - files, Python objects, computed results, all of it. The container metadata stays around as a snapshot, but the data is unrecoverable.
You can't resurrect an expired container. You have to create a new one and re-upload files.
This means you should treat containers as ephemeral. If the model generates a file you want to keep, download it while the container is still active. Don't assume you can come back for it later.
The good news is that any container operation refreshes the last_active_at timestamp. So retrieving the container, adding files, or deleting files all reset the 20-minute clock. If you're building an application with longer processing times, ping the container periodically to keep it alive.
Working with Files
This is where Code Interpreter gets genuinely useful for business applications. The model can both consume files you provide and generate files you need.
Uploading Files
Upload files to a container before or during a session. The model can then reference them in its Python code - reading CSVs with pandas, opening images with PIL, parsing JSON, whatever the task requires.
For auto-mode containers, you can attach file IDs directly in the container configuration:
container = {
    "type": "auto",
    "memory_limit": "4g",
    "file_ids": ["file-abc123", "file-def456"]
}
Generated Files
When the model creates files - charts, processed datasets, reports - they appear as annotations in the response. Each annotation includes a file_id you can use to download the result.
This is the workflow that clients love most. Upload a messy spreadsheet, ask for a cleaned version with a summary chart, and get back downloadable files. No code to write, no environment to set up, no dependencies to manage.
But remember the expiration rule. Those files live in the container, and the container dies after 20 minutes of inactivity. Download what you need promptly.
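A sketch of collecting those annotations, again treating the response as plain dicts. The annotation type for container-generated files is container_file_citation in the current API, but verify the shape against the responses you receive:

```python
def generated_files(response: dict) -> list[dict]:
    """List the files the model created during the run.

    Walks message content for container_file_citation annotations and
    returns the IDs needed to download each file.
    """
    files = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            for ann in part.get("annotations", []):
                if ann.get("type") == "container_file_citation":
                    files.append({
                        "container_id": ann.get("container_id"),
                        "file_id": ann.get("file_id"),
                        "filename": ann.get("filename"),
                    })
    return files
```

Each entry can then be downloaded through the container files content endpoint (client.containers.files.content.retrieve in the current Python SDK) - before the idle clock runs out.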
Patterns That Work Well
Iterative Data Exploration
The model's ability to write code, run it, see the error, and fix it is genuinely powerful. We've had sessions where the model tries three or four approaches to parse a particularly messy data file before finding one that works. A human developer would do the same thing, just slower.
This works especially well for exploratory data analysis. "What's interesting about this dataset?" is a reasonable prompt when Code Interpreter is available. The model will compute summary statistics, look for correlations, identify outliers, and generate visualisations - all through code it writes and runs.
Data Validation
Upload a dataset and ask the model to validate it against business rules. "Check that all dates are in the current financial year, all amounts are positive, and all account codes match the chart of accounts." The model writes validation code, runs it, and reports what it found.
This is faster than writing validation scripts yourself and catches things you might not think to check. We use this pattern regularly in data and AI consulting work when evaluating data quality for new projects.
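In practice this is mostly prompt construction - the model writes the pandas checks itself. A minimal sketch (the helper name and wording are ours):

```python
def validation_prompt(rules: list[str]) -> str:
    """Turn a list of business rules into one unambiguous instruction
    for the model to check against the uploaded dataset."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, 1))
    return (
        "Validate the uploaded dataset against the following rules. "
        "Write and run Python to check each rule, then report every "
        "violation with its row number:\n" + numbered
    )
```

Pass the result as the input alongside an uploaded file, e.g. validation_prompt(["All dates fall in the current financial year", "All amounts are positive"]).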
File Format Conversion
Need to convert between formats? Upload a CSV, get back a formatted Excel file with headers and styling. Upload JSON, get back a summary PDF. The model handles the conversion code and produces the output file.
Visual Analysis with Reasoning Models
The newer reasoning models (o3, o4-mini) can use Code Interpreter to process images programmatically. Upload a photo of a whiteboard, and the model can crop it, enhance contrast, extract text regions, and clean it up. This is more reliable than asking the model to "read" the image directly, because the model can apply actual image processing algorithms rather than relying on vision alone.
What Doesn't Work Well
Long-running computations hit the timeout wall. If your Python code takes more than a few minutes to execute, the request may time out before completing. Break large computations into smaller steps.
Large file handling is limited by the container's memory. A 64GB container sounds like a lot, but if you're loading a multi-gigabyte dataset into pandas, memory fills up fast. For truly large data, you need a proper data platform, not a Code Interpreter session.
Stateful workflows across sessions are fragile because of container expiration. If your application needs to process data across multiple interactions separated by more than 20 minutes, you need to persist state externally and reload it.
Library availability is limited to what's installed in the sandbox. The standard data science stack (pandas, numpy, matplotlib, scikit-learn) is there, but specialised libraries may not be. You can't install packages at runtime.
Integrating Code Interpreter into Larger Systems
Code Interpreter is most useful as one tool among many in an AI agent's toolkit. An agent might use Code Interpreter for data processing, function calling for API access, and file search for document retrieval - all within the same conversation.
For organisations building production AI systems, Code Interpreter handles the computational heavy lifting while other tools handle data access and external integration. Our AI agent development work often combines Code Interpreter with custom tools that connect to enterprise data sources.
The Responses API makes this composition straightforward. You define multiple tools in a single request, and the model decides which to use based on the task. Code Interpreter for computation, custom functions for everything else.
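A sketch of what that tool list looks like - Code Interpreter alongside a custom function the model can call (get_account_codes is a hypothetical enterprise lookup; the function-tool shape follows the Responses API's flat format):

```python
def build_tools(container_id: str) -> list[dict]:
    """Combine Code Interpreter with a custom function tool.

    The model chooses per step: computation goes to the container,
    data access goes to the function.
    """
    return [
        {"type": "code_interpreter", "container": container_id},
        {
            "type": "function",
            "name": "get_account_codes",  # hypothetical enterprise lookup
            "description": "Fetch valid account codes for an entity.",
            "parameters": {
                "type": "object",
                "properties": {"entity": {"type": "string"}},
                "required": ["entity"],
            },
        },
    ]
```

Pass build_tools(container.id) as the tools argument to responses.create and handle any function_call items the model emits in the usual way.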
If you're exploring how to use OpenAI's tools effectively in your organisation, or you're building AI agents that need computational capabilities, reach out to us. We've built enough of these systems to know where the real complexity hides - and it's usually not in the code, it's in the data and the workflow design.
For the full API reference, see OpenAI's Code Interpreter documentation.