Securely Deploying Claude AI Agents - A Practical Security Guide
Here's something that doesn't get talked about enough in the AI space. You've built an AI agent that can execute code, read files, call APIs, and make decisions autonomously. How do you deploy it without creating a security nightmare?
This isn't a theoretical concern. We build and deploy AI agents for Australian businesses regularly, and security is the conversation that separates a proof of concept from a production system. An agent that works perfectly in a demo can be a real risk in production if it hasn't been properly isolated and constrained.
Anthropic recently published their guide to securely deploying Claude agents, and it's one of the more honest and practical security documents I've read from an AI provider. Rather than hand-waving about "enterprise-grade security," it lays out the actual threat model, the built-in protections, and the additional hardening you can apply. Let me walk through the key points and add some context from our own deployment experience.
Why AI Agents Need Different Security Thinking
Traditional software follows predetermined code paths. You write the code, you test the code, you deploy the code. The software does what you told it to do. AI agents are different. They generate their actions dynamically based on context and goals. This is what makes them useful - they can adapt to novel situations. But it also means their behaviour can be influenced by the content they process.
The specific risk here is prompt injection. A malicious file, a crafted email, a webpage with hidden instructions - any of these could potentially influence an agent's behaviour in ways you didn't intend. If your agent has access to credentials and network connectivity, a successful prompt injection could mean data exfiltration or unauthorised actions.
I want to be clear: this isn't a reason to avoid AI agents. It's a reason to deploy them with the same rigour you'd apply to any software that runs untrusted input. Web applications face SQL injection. AI agents face prompt injection. The response isn't to stop building, it's to build with appropriate controls.
Claude's Built-in Security Features
Claude Code ships with several security features that handle the most common risks. These aren't afterthoughts - they're designed into the tool.
The permissions system lets you control exactly what tools and commands the agent can use. You can set rules using glob patterns - "allow all npm commands," "block anything with sudo," "prompt for approval on file deletions." Organisations can set policies that apply across all users, which is important for teams where you don't want individual developers making their own security decisions.
Static analysis runs before bash commands execute. If the agent tries to modify system files or access sensitive directories, the command gets flagged. This catches a lot of the obvious bad outcomes before they happen.
Sandbox mode restricts filesystem and network access at the OS level. Commands run in a restricted environment where they can only touch the files and endpoints you've explicitly allowed.
For a developer running Claude Code on their laptop, these built-in features are probably sufficient. You're the only user, the blast radius is limited, and the permissions system gives you visibility into what the agent wants to do before it does it.
Production deployments need more.
The Security Principles That Matter
Anthropic's guide centres on three principles, and they're the same ones any security professional would recognise.
Security boundaries. Put sensitive resources outside the boundary that contains the agent. The classic example: instead of giving the agent an API key, run a proxy outside the agent's environment that injects credentials into outbound requests. The agent can make API calls, but it never sees the credentials themselves. If the agent is compromised, the credentials aren't exposed.
We've implemented this pattern on several client projects. The agent runs in a container. A reverse proxy sits in front of it, handling authentication and injecting headers. The agent's network access is restricted to the proxy. It can't make arbitrary outbound requests, and it can't see credentials. The proxy also logs everything, giving you an audit trail.
Least privilege. Only give the agent the access it needs for its specific task. Mount only the directories it needs to read. Restrict network access to the specific endpoints it needs to call. Drop unnecessary Linux capabilities in containers. This is standard security practice, but it's easy to skip when you're focused on getting the agent working.
Defence in depth. Layer your controls. Don't rely on any single mechanism. Combine container isolation with network restrictions with filesystem controls with request validation at your proxy. If one layer fails, the others still protect you.
Choosing Your Isolation Technology
The guide covers four levels of isolation, and the right choice depends on your threat model.
Sandbox runtime is the lightest option. Anthropic provides a sandbox-runtime package that uses OS-level primitives to restrict filesystem and network access. On Linux it uses bubblewrap, on macOS it uses Seatbelt profiles. You configure a JSON file specifying allowed paths and domains, and the agent is restricted to those. No Docker required.
This is great for CI/CD pipelines and single-developer setups. The overhead is minimal and the setup is simple. The trade-off is that the agent shares the host kernel, so a kernel vulnerability could theoretically allow escape. For most use cases, that's an acceptable risk.
Docker containers provide stronger isolation through Linux namespaces. Each container gets its own filesystem, process tree, and network stack. A properly hardened container configuration drops all capabilities, prevents privilege escalation, and applies a seccomp profile to restrict system calls.
This is our default recommendation for production deployments. The operational overhead is moderate - most teams already have Docker in their stack - and the isolation is strong enough for the vast majority of use cases. If you're running agents that process customer data or interact with production systems, containers are the minimum bar.
gVisor adds another layer by intercepting system calls through a user-space kernel. It's stronger than standard containers but adds meaningful performance overhead. We've used it for deployments where the agent processes data from untrusted external sources - things like customer-uploaded documents or scraped web content where the injection risk is higher.
VMs (Firecracker or QEMU) provide the strongest isolation. Each agent runs in its own virtual machine with a separate kernel. The performance overhead is highest, but the isolation is nearly airtight. This is what you want for multi-tenant environments where different customers' data must be completely separated, or for deployments processing highly sensitive information.
Network Controls - The Underrated Layer
If I had to pick one security control for agent deployments, it would be network restrictions. An agent that can't make arbitrary outbound network requests can't exfiltrate data, regardless of what prompt injection manages to achieve.
The pattern is simple. Run a proxy (nginx, Envoy, whatever your team knows) outside the agent's environment. Configure the agent's network to only reach the proxy. The proxy maintains an allowlist of permitted endpoints. Requests to anything else get blocked and logged.
This catches the scenario Anthropic specifically calls out - an agent processing a malicious file that instructs it to send data to an external server. The agent might try to comply, but the network control blocks the request. Defence in depth in action.
One caveat from the docs worth noting: the sandbox proxy doesn't inspect TLS traffic. It allowlists domains but can't validate what's being sent within the encrypted connection. If your agent has permissive credentials for an allowed domain, a compromised agent could potentially abuse those credentials within that domain. Your proxy logging won't show the request contents. Keep this in mind when designing your security boundaries.
What We Recommend for Australian Businesses
For most of the AI agent deployments we build, we follow a standard template:
- Agent runs in a Docker container with dropped capabilities and a read-only filesystem
- Only the working directory is mounted read-write
- Network access is restricted to a reverse proxy
- The proxy handles authentication, injects credentials, and maintains domain allowlists
- All requests are logged for audit purposes
- The container image is rebuilt regularly to pick up security patches
This gives you strong isolation without excessive operational complexity. It works on Azure Container Apps, Azure Kubernetes Service, or even a simple VM with Docker installed.
For organisations with stricter requirements - financial services, healthcare, government - we add gVisor or VM-level isolation and implement additional monitoring. But for most businesses, the container-plus-proxy pattern is the right balance of security and practicality.
Don't Skip the Basics
One more thing that the Anthropic guide touches on but deserves emphasis. All of this isolation and network control doesn't help if your agent's instructions are poorly written. Clear, specific instructions reduce the chance of the agent doing something unexpected. Tell it exactly what it should and shouldn't do. Define the scope of its responsibilities. Include explicit instructions about not following instructions from processed content.
Good prompt engineering is a security control. It's not sufficient on its own - you still need the technical controls - but it's a necessary part of the stack.
Getting Started
If you're deploying Claude agents or building with the Agent SDK and want to make sure your security posture is solid, talk to us. We can review your architecture, recommend the right isolation approach for your threat model, and help you implement it. Getting security right at the start is much cheaper than fixing it after something goes wrong.