OpenClaw Sandbox CLI - Managing Isolated Runtimes for AI Agents

April 20, 2026 | 7 min read | Michael Ridland

Deploy OpenClaw for Your Business

Secure deployment in 48 hours. Choose personal setup or fully managed.

Running AI agents against production codebases without isolation is one of those things that seems fine until it isn't. The agent deletes a file it shouldn't have. It runs a command that conflicts with another developer's work. It installs a dependency that breaks the build. These are real things that happen when agents have unrestricted access to your development environment.

OpenClaw's sandbox system gives you isolated runtimes where agents execute in their own contained environment. The sandbox CLI is how you manage those runtimes - inspecting, listing, and recreating them as your configuration changes. It's a small set of commands that solves a surprisingly annoying operational problem.

Why Sandboxing Matters for AI Agents

Let me be specific about what goes wrong without it.

When an AI agent runs code in your local environment, it has your permissions. It can read your SSH keys, access your environment variables, and write to any directory you can write to. For a trusted tool running on your personal machine, that might be acceptable. For an agent running automated tasks on behalf of a team, or an agent executing untrusted code, it's a genuine risk.

Sandboxing puts a boundary around the agent's execution. The agent gets its own filesystem, its own process space, and a defined set of resources. It can do its work without accidentally (or intentionally) affecting anything outside its sandbox.

We've seen the practical value of this on client projects where multiple agents run concurrently against the same codebase. Without isolation, they step on each other. With sandboxed runtimes scoped per agent, each one works in its own copy of the workspace and the results get merged back cleanly.

The Three Backends

OpenClaw supports three sandbox backends, and choosing the right one depends on your infrastructure and security requirements.

Docker

This is the default and the most straightforward. Each agent gets a Docker container built from a base image. The container has its own filesystem, isolated networking, and defined resource limits. If you're already using Docker in your development workflow, this is the natural choice.

The workflow looks like this: you specify your sandbox image in the config, OpenClaw spins up a container when the agent starts, and the container is pruned after it's been idle for a configurable period (default 24 hours).

{
  "agents": {
    "defaults": {
      "sandbox": {
        "backend": "docker",
        "docker": {
          "image": "openclaw-sandbox:bookworm-slim"
        }
      }
    }
  }
}
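If you'd rather script that change than hand-edit the file, here's a minimal Python sketch. It only sets the keys shown in the config above and leaves everything else alone. The helper name with_docker_sandbox and the pre-existing "model" entry are made up for illustration; they aren't part of OpenClaw.

```python
import copy
import json


def with_docker_sandbox(config: dict, image: str) -> dict:
    """Return a copy of `config` with the Docker sandbox backend and image set."""
    updated = copy.deepcopy(config)  # never mutate the caller's config
    sandbox = (
        updated.setdefault("agents", {})
        .setdefault("defaults", {})
        .setdefault("sandbox", {})
    )
    sandbox["backend"] = "docker"
    sandbox.setdefault("docker", {})["image"] = image
    return updated


# Hypothetical existing config; the "model" key is only here to show
# that unrelated settings survive the merge.
existing = {"agents": {"defaults": {"model": "example-model"}}}
merged = with_docker_sandbox(existing, "openclaw-sandbox:bookworm-slim")
print(json.dumps(merged, indent=2))
```

Working on a deep copy means a failed or abandoned edit never leaves you with a half-updated config in memory.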

For most development teams, Docker sandboxes are the right answer. They're fast to create, easy to configure, and the isolation model is well understood.

SSH

The SSH backend runs agents on a remote machine via SSH. This is useful when you need agents to execute in an environment that matches your production setup - specific OS versions, particular hardware, or pre-installed toolchains that are hard to replicate in Docker.

One thing to understand about the SSH backend: the remote workspace is canonical after the initial seed. That means the first time an agent runs, OpenClaw copies your local workspace to the remote host. After that, the remote copy is the source of truth. Running openclaw sandbox recreate deletes that remote workspace entirely and seeds it fresh from your local copy next time.

This matters because if the agent has made changes on the remote workspace and you recreate it, those changes are gone. Make sure you're pulling back any results you need before recreating.
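Here's a rough Python sketch of that "pull results first" habit. It only builds and prints the two commands instead of running them. The host, remote path, and backup directory are placeholders, and I'm assuming recreate accepts the same --agent flag that explain does, so check the CLI help before relying on it.

```python
import shlex
from typing import List


def backup_then_recreate(remote: str, remote_dir: str, local_dir: str, agent: str) -> List[List[str]]:
    """Build the commands to copy the remote workspace back, then recreate."""
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    pull = [
        "rsync", "-a",
        f"{remote}:{remote_dir.rstrip('/')}/",
        f"{local_dir.rstrip('/')}/",
    ]
    # Assumption: `recreate` takes `--agent <name>` like `explain` does.
    recreate = ["openclaw", "sandbox", "recreate", "--agent", agent]
    return [pull, recreate]


# All four arguments are placeholders for illustration.
commands = backup_then_recreate(
    "build@sandbox.example.com", "/srv/openclaw/work", "./sandbox-backup", "work"
)
for cmd in commands:
    print(shlex.join(cmd))
```

The order is the whole point: the copy comes first, so the recreate only happens once you have the remote changes somewhere safe.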

OpenShell

OpenShell is the newest backend and it's designed for cloud-native setups. It supports both local and remote modes, with the remote mode behaving similarly to SSH in terms of workspace management. The difference is in how the runtime is provisioned and managed - OpenShell handles the infrastructure layer so you don't need to manage SSH targets directly.

The CLI Commands

There are three commands, and they do what you'd expect.

openclaw sandbox explain

This is the diagnostic command. It shows you the effective sandbox configuration for a given agent or session - what backend is being used, what the tool policy is, what workspace access looks like.

openclaw sandbox explain --agent work

I use this more often than I expected. When something's not working right in a sandboxed agent, the first question is always "what does the sandbox actually look like?" This command answers that without you having to read through config files and figure out inheritance rules.

openclaw sandbox list

Lists all active sandbox runtimes with their status, backend type, age, idle time, and which agent they belong to.

openclaw sandbox list

The output tells you whether a runtime matches the current configuration. That's a useful detail - if you changed your config but runtimes are still running with the old settings, list will flag the mismatch. No more wondering why your agent is behaving differently than expected after a config change.
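I don't know how OpenClaw detects that mismatch internally, but the idea is easy to picture. One common way to do this kind of check is to fingerprint the configuration a runtime was created with and compare it to the current one. The Python sketch below illustrates that technique under that assumption; the function names and the "custom" image tag are hypothetical.

```python
import hashlib
import json


def config_fingerprint(sandbox_config: dict) -> str:
    """Hash a canonical JSON form so key order doesn't affect the result."""
    canonical = json.dumps(sandbox_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_current(runtime_fingerprint: str, current_config: dict) -> bool:
    """True if the runtime was created from the config that is active now."""
    return runtime_fingerprint == config_fingerprint(current_config)


old = {"backend": "docker", "docker": {"image": "openclaw-sandbox:bookworm-slim"}}
new = {"backend": "docker", "docker": {"image": "openclaw-sandbox:custom"}}  # hypothetical tag

created_with = config_fingerprint(old)
print(matches_current(created_with, old))  # True
print(matches_current(created_with, new))  # False
```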

openclaw sandbox recreate

This is the one you'll use most often. It removes existing runtimes so they get recreated with your current configuration next time an agent runs.

openclaw sandbox recreate --all

You can scope it to a specific agent, a specific session, or just browser containers. There's a --force flag to skip confirmation, but I'd recommend against using that in scripts until you're confident in what you're deleting.
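If you do end up scripting recreate, a small wrapper that makes the scope explicit and keeps --force opt-in is a cheap safety net. This is a sketch, not an official client. --all and --force come from the commands above; --agent for recreate and --session are my assumed spellings of the agent and session scopes. I've left the browser scope out because I haven't shown its flag here.

```python
from typing import List, Optional


def recreate_argv(
    *,
    all_runtimes: bool = False,
    agent: Optional[str] = None,
    session: Optional[str] = None,
    force: bool = False,
) -> List[str]:
    """Build an `openclaw sandbox recreate` command with exactly one scope."""
    chosen = sum([all_runtimes, agent is not None, session is not None])
    if chosen != 1:
        raise ValueError("pick exactly one scope: all_runtimes, agent, or session")

    argv = ["openclaw", "sandbox", "recreate"]
    if all_runtimes:
        argv.append("--all")
    elif agent is not None:
        argv += ["--agent", agent]  # assumed flag spelling for recreate
    else:
        argv += ["--session", session]  # assumed flag spelling
    if force:
        argv.append("--force")  # skips the confirmation prompt
    return argv


print(recreate_argv(agent="work"))
print(recreate_argv(all_runtimes=True, force=True))
```

Because force defaults to False, a script has to ask for the unconfirmed delete on purpose, which matches the advice above.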

When You Need to Recreate

The documentation lists several scenarios, and they're all situations we've hit in practice.

After updating a Docker image. You pull a new base image and tag it, but your running containers are still using the old one. recreate --all removes them so the next agent run starts fresh containers from the new image.

After changing sandbox configuration. Updated the scope from agent to session? Changed the prune settings? Existing runtimes won't pick up those changes automatically. Recreate them.

After changing SSH targets or auth material. If you've rotated SSH keys, changed the target host, or updated certificate material, you need to recreate so the new credentials are used.

After changing setupCommand. If your agent's setup script has changed - maybe you added a new dependency or changed a build step - existing runtimes still have the old setup. Recreate to run the new setup.

The general pattern is: if you changed anything about how the sandbox should be configured, recreate. Runtimes don't auto-update. They keep running with whatever settings they were born with until they're pruned or recreated.

Configuration Tips

Sandbox configuration lives in ~/.openclaw/openclaw.json. A few settings worth thinking about:

Scope controls how many runtimes you get. agent means each agent gets its own runtime. session means each session gets its own. shared means all agents share one. For teams, agent scope usually makes the most sense because different agents often need different environments.
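To make the difference concrete, here's a small Python sketch that counts how many runtimes you'd end up with under each scope. The key format and the agent and session names are invented for the example; only the three scope values come from OpenClaw.

```python
def runtime_key(scope: str, agent_id: str, session_id: str) -> str:
    """Return the identity a runtime would be shared under for a given scope."""
    if scope == "agent":
        return f"agent:{agent_id}"
    if scope == "session":
        return f"session:{session_id}"
    if scope == "shared":
        return "shared"
    raise ValueError(f"unknown scope: {scope!r}")


# Hypothetical workload: two sessions for one agent, one session for another.
sessions = [("work", "s1"), ("work", "s2"), ("research", "s3")]

for scope in ("agent", "session", "shared"):
    keys = {runtime_key(scope, agent, session) for agent, session in sessions}
    print(scope, len(keys))
```

With this workload, agent scope gives two runtimes, session scope gives three, and shared gives one.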

Prune settings control automatic cleanup. The defaults - 24 hours idle and 7 days maximum age - are reasonable for most setups. If you're running agents infrequently, you might want to shorten these to avoid accumulating stale containers.
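The two limits work as an either/or rule: a runtime goes once it has been idle too long or once it's simply too old. Here's a minimal Python sketch of that rule using the default values. Treating the limits as inclusive is my assumption, and the function is illustrative rather than OpenClaw's own code.

```python
from datetime import datetime, timedelta, timezone


def should_prune(
    created_at: datetime,
    last_used_at: datetime,
    now: datetime,
    idle_limit: timedelta = timedelta(hours=24),
    max_age: timedelta = timedelta(days=7),
) -> bool:
    """Prune when the runtime is idle too long OR older than the maximum age."""
    too_idle = (now - last_used_at) >= idle_limit
    too_old = (now - created_at) >= max_age
    return too_idle or too_old


now = datetime(2026, 4, 20, 12, 0, tzinfo=timezone.utc)

# Created two days ago, used an hour ago: kept.
print(should_prune(now - timedelta(days=2), now - timedelta(hours=1), now))
# Created two days ago, idle for 30 hours: pruned.
print(should_prune(now - timedelta(days=2), now - timedelta(hours=30), now))
# Created eight days ago, used an hour ago: pruned on age alone.
print(should_prune(now - timedelta(days=8), now - timedelta(hours=1), now))
```

The third case is the one that surprises people: a runtime in constant use still gets replaced once it passes the maximum age.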

Mode controls which agents get sandboxed. all sandboxes everything. non-main only sandboxes agents that aren't your primary one. off disables sandboxing entirely. We generally recommend all for team environments and non-main for individual development where you trust your primary agent but want isolation for experimental ones.
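The three modes boil down to a tiny decision table. This Python sketch spells it out; the function name and the is_main flag are mine, while the mode values are the ones described above.

```python
def is_sandboxed(mode: str, is_main: bool) -> bool:
    """Decide whether an agent runs in a sandbox under the given mode."""
    if mode == "all":
        return True
    if mode == "non-main":
        return not is_main  # only the primary agent runs unsandboxed
    if mode == "off":
        return False
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("all", "non-main", "off"):
    print(mode, is_sandboxed(mode, is_main=True), is_sandboxed(mode, is_main=False))
```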

Our Take

Sandboxing is one of those operational concerns that doesn't feel important until you hit a problem. And the problems it prevents - agents interfering with each other, agents accessing things they shouldn't, agents leaving behind state that breaks future runs - are the kind that waste hours of debugging time.

OpenClaw's approach is pragmatic. Three backends to match different infrastructure setups. A simple CLI to manage runtimes. Configuration that defaults to sensible values but lets you customise when needed. It's not flashy, but it works.

If you're running AI agents in any kind of team or production setting, some form of isolation is worth implementing. OpenClaw's sandbox system is one way to do it. For organisations exploring AI agent infrastructure more broadly, our agentic automation practice helps teams design and deploy agent architectures that are secure and operationally sound. And if you're evaluating platforms like OpenClaw for your AI development workflow, our AI agent development team can help you make an informed choice based on your specific requirements.
