Hosting the Claude Agent SDK in Production - What You Need to Know
Building an AI agent is one thing. Running it in production where real users depend on it is something else entirely.
The Claude Agent SDK isn't like calling a stateless API. You don't send a request and get a response. Instead, you're running a long-lived process that maintains conversation state, executes commands in a persistent shell environment, and manages file operations. It's closer to hosting an application server than calling an endpoint.
We've been working with the Claude Agent SDK across several client projects, and the hosting question comes up every single time. This post covers what you actually need to know to deploy it properly, based on the official Anthropic documentation and our own experience running these agents in production.
Why It's Different from a Normal API
With a standard LLM API call, your application sends a prompt, gets a response, and that's it. Stateless. Simple. You can run it behind a load balancer and scale horizontally without thinking too hard about it.
The Claude Agent SDK changes that model. Each agent instance:
- Maintains conversational state across multiple interactions
- Executes shell commands in a persistent environment
- Reads and writes files within a working directory
- Runs tools that carry context from previous interactions
This means each agent needs its own isolated environment. You can't just throw it behind a round-robin load balancer and hope for the best. State matters. Isolation matters. Security really matters - you're giving an AI the ability to run commands.
System Requirements
Each SDK instance needs:
- Python 3.10+ (for the Python SDK) or Node.js 18+ (for TypeScript)
- Claude Code CLI installed globally via npm
- Roughly 1 GiB RAM, 5 GiB disk, and 1 CPU as a starting point
- Outbound HTTPS to api.anthropic.com
- Optional network access to any MCP servers or external tools your agent uses
The resource allocation is a starting point. If your agent is doing heavy file processing or running build tools, bump the RAM and CPU. If it's mostly conversational, you can probably get away with less.
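The requirements above translate fairly directly into a container image. Here's a minimal sketch - the base image, the version pin, and the exact package install steps are assumptions you should adapt to your own stack:

```dockerfile
# Sketch of an agent container image: Python for the SDK, Node.js for the CLI.
FROM python:3.12-slim

# The Claude Code CLI runs on Node.js; install it alongside Python.
RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \
    && rm -rf /var/lib/apt/lists/*

# Pin a specific, tested CLI version (placeholder shown) and update deliberately.
ARG CLAUDE_CODE_VERSION=x.y.z
RUN npm install -g @anthropic-ai/claude-code@${CLAUDE_CODE_VERSION}

# Python SDK for driving the agent from application code.
RUN pip install --no-cache-dir claude-agent-sdk

# Run as a non-root user inside the sandbox.
RUN useradd --create-home agent
USER agent
WORKDIR /home/agent/workspace
```

Building the CLI version in as a pinned argument keeps image rebuilds deliberate rather than accidental.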
Container-Based Sandboxing Is Not Optional
Let me be blunt about this: if you're running the Claude Agent SDK in production, it needs to run inside a sandboxed container. This isn't a nice-to-have. The agent can execute arbitrary commands. Without proper sandboxing, a prompt injection or unexpected agent behaviour could affect your host system.
Container sandboxing gives you:
- Process isolation - the agent can't escape its container
- Resource limits - CPU and memory caps prevent runaway processes
- Network control - you decide what the agent can reach
- Ephemeral filesystems - nothing persists beyond the session unless you explicitly allow it
The SDK also supports programmatic sandbox configuration, so you can set these limits from your application code rather than relying purely on container orchestration.
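If you're self-hosting with plain Docker rather than a managed sandbox, the isolation properties above map onto standard runtime flags. The image name, network name, and limits below are placeholders matching the baseline resource allocation:

```shell
# Sketch: resource caps, bounded process count, read-only root filesystem
# with tmpfs scratch space, and a user-defined network controlling egress.
docker run --rm \
  --memory=1g \
  --cpus=1 \
  --pids-limit=256 \
  --read-only \
  --tmpfs /tmp \
  --network=agent-egress \
  my-agent-image:pinned
```

For stronger isolation than the default runtime, the same flags apply under gVisor (`--runtime=runsc`) or inside a Firecracker microVM.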
Four Deployment Patterns
Anthropic's documentation outlines four patterns, and each fits different use cases. Here's how we think about them.
Pattern 1 - Ephemeral Sessions
Spin up a container for each task, run the agent, destroy the container when done. This is the simplest model and the easiest to reason about.
Good for: one-off tasks like bug investigation, document processing, data extraction, or translation. The user kicks off work, the agent completes it, and the environment disappears.
We've used this pattern for a client that processes invoices. Each invoice gets its own container, the agent extracts structured data, and the container is torn down. Clean, predictable, and easy to scale by just running more containers in parallel.
The downside: there's a cold-start cost. Spinning up a container takes time, typically a few seconds, which is fine for background processing but noticeable for interactive use.
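The key discipline in this pattern is guaranteeing teardown even when the task fails. A sketch of that lifecycle, where `create_container` and `destroy_container` are hypothetical stand-ins for whatever your sandbox provider's API exposes:

```python
from contextlib import contextmanager

@contextmanager
def ephemeral_container(create_container, destroy_container):
    """Create a container for one task and guarantee it is destroyed after."""
    container = create_container()
    try:
        yield container
    finally:
        # Teardown runs whether the agent task succeeded or raised.
        destroy_container(container)
```

Wrapping each task this way means a crashed agent never leaves an orphaned container running up costs.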
Pattern 2 - Long-Running Sessions
Keep the container alive. Run multiple agent processes inside it based on demand. This works for agents that need to be always-on or that handle high message volumes.
Good for: email monitoring agents, chatbots on platforms like Slack or Teams, site builders that serve content through container ports, or any scenario where the agent needs to be immediately responsive.
The challenge here is resource management. A long-running container needs monitoring, health checks, and a plan for what happens when it runs out of memory or disk. You also need to handle multiple concurrent agent processes carefully to prevent them from interfering with each other.
Pattern 3 - Hybrid Sessions
This is the pattern we use most often. Containers are ephemeral but get hydrated with history and state when they start up. The agent runs, does its work, and the container spins down. When the user comes back, a new container starts with the previous context loaded.
Good for: intermittent interaction patterns. Think project management agents, research assistants, or customer support bots where the conversation spans hours or days but isn't continuous.
The Claude Agent SDK's session resumption features make this viable. You store the session state in a database, and when the user returns, you restore it. The agent picks up where it left off without the user noticing the container was recycled.
This pattern hits the sweet spot between cost (containers aren't running when idle) and user experience (context is preserved). For most of our client projects, this is where we start.
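The persistence layer for the hybrid pattern can be very simple. A minimal sketch using SQLite - the table layout and the idea of keying a resumable session ID per user are our assumptions; the Agent SDK's own resumption API supplies the session ID itself:

```python
import sqlite3

def init_store(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions (user_id TEXT PRIMARY KEY, session_id TEXT)"
    )

def save_session(conn: sqlite3.Connection, user_id: str, session_id: str) -> None:
    # Upsert so a recycled container's new session replaces the old pointer.
    conn.execute(
        "INSERT INTO sessions (user_id, session_id) VALUES (?, ?) "
        "ON CONFLICT(user_id) DO UPDATE SET session_id = excluded.session_id",
        (user_id, session_id),
    )

def load_session(conn: sqlite3.Connection, user_id: str):
    # Returns the stored session ID, or None for a first-time user.
    row = conn.execute(
        "SELECT session_id FROM sessions WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None
```

On each incoming message you call `load_session`; if it returns an ID, you resume, otherwise you start fresh, and either way you `save_session` before the container spins down.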
Pattern 4 - Single Container with Multiple Agents
Multiple agent processes sharing one container. Anthropic flags this as the least common pattern, and for good reason - agents can overwrite each other's files and interfere in unexpected ways.
The one scenario where this makes sense is agent-to-agent interaction. Simulations, collaborative problem-solving, or multi-agent architectures where the agents need to share a filesystem. But you need to be very deliberate about preventing conflicts.
Choosing a Sandbox Provider
Several providers specialise in container environments for AI code execution:
- Modal Sandbox - good developer experience, simple API
- Cloudflare Sandboxes - if you're already on Cloudflare's platform
- Daytona - focused on development environments
- E2B - purpose-built for AI agent sandboxing
- Fly Machines - lightweight VMs with fast cold starts
- Vercel Sandbox - integrates well with Next.js and Vercel deployments
For self-hosted options, Docker with appropriate security profiles, gVisor, or Firecracker microVMs all work. The choice depends on your existing infrastructure and security requirements.
The cost of hosting containers is a question we get asked constantly. Anthropic puts the minimum at roughly 5 cents per hour per running container. But honestly, the container cost is almost always dwarfed by the token cost. The API calls to Claude are where the real expense lives. Don't over-optimise container costs while ignoring token usage.
Practical Hosting Decisions
When to shut down idle containers: This depends on your provider and your interaction pattern. If users typically respond within minutes, keep containers warm for 5 to 10 minutes. If interactions are spaced hours apart, shut down aggressively and rely on session resumption. Most sandbox providers let you configure idle timeouts.
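That policy can be made explicit in code. A sketch of the decision logic - the thresholds here are illustrative, not recommendations, and should be tuned to your users' actual response cadence:

```python
def idle_timeout_seconds(median_response_gap_s: float) -> int:
    """Pick a warm-keep window from the typical gap between user messages."""
    if median_response_gap_s <= 60:
        return 10 * 60   # rapid back-and-forth: keep warm for ~10 minutes
    if median_response_gap_s <= 15 * 60:
        return 5 * 60    # intermittent: short warm window
    return 0             # hours apart: shut down immediately, rely on resumption

def should_shut_down(idle_s: float, timeout_s: int) -> bool:
    return idle_s >= timeout_s
```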
Monitoring and health checks: The same logging and monitoring you use for your backend works for agent containers. Ship logs to your existing observability stack. Set up health check endpoints. Monitor memory and disk usage, because agents that write files can fill up disk surprisingly fast.
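A disk-pressure check is cheap to add to an existing health endpoint. A minimal sketch - the 80% threshold is an assumption to tune, and the result should feed whatever alerting you already run:

```python
import shutil

def disk_usage_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def disk_healthy(path: str, threshold: float = 0.8) -> bool:
    # Healthy while usage stays under the threshold.
    return disk_usage_fraction(path) < threshold
```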
Setting maxTurns: Agent sessions won't time out on their own. If the agent gets stuck in a loop - and it can happen - it'll keep running and burning tokens. Set a maxTurns property as a safety net. The right number depends on your use case, but having one at all prevents the worst-case scenario.
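To illustrate why the cap matters, here's the guard logic in generic form. `agent_step` is a hypothetical callable standing in for one agent turn; the SDK's own maxTurns option enforces the same bound for you, so this is purely a sketch of the concept:

```python
class TurnLimitExceeded(Exception):
    pass

def run_with_turn_cap(agent_step, max_turns: int):
    """Run agent turns until one reports it is done, or the cap is hit."""
    for turn in range(max_turns):
        done, result = agent_step(turn)
        if done:
            return result
    # A stuck loop stops here instead of burning tokens indefinitely.
    raise TurnLimitExceeded(f"agent did not finish within {max_turns} turns")
```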
Updating the Claude Code CLI: The CLI follows semver, so breaking changes get a major version bump. In practice, we've found it's worth staying relatively current. Pin a specific version in your container image and update deliberately, testing each new version before rolling it out.
How We Approach Agent Hosting for Clients
Most of the organisations we work with don't want to manage container orchestration themselves. They want an AI agent that works and someone else handling the infrastructure.
Our typical approach is:
- Start with the hybrid session pattern unless there's a specific reason for always-on
- Use a managed sandbox provider rather than self-hosting containers
- Implement session persistence from day one, not as an afterthought
- Set up cost tracking and alerting for both token usage and container costs
- Build health monitoring into the standard ops workflow
The Agent SDK is production-ready, but it requires a different operational mindset than deploying a typical web API. If you're evaluating it for your organisation, thinking about the hosting model early saves significant rework later.
Our AI agent development team works with the Claude Agent SDK and other frameworks to build production agent systems. If you're planning an agent deployment and want to talk through the architecture, or if you need help with agentic automation more broadly, get in touch.