Deploying Multiple Azure AI Containers with Docker Compose
Running AI services locally isn't just for development anymore. We're seeing more Australian organisations - particularly in healthcare, government, and financial services - that need AI capabilities but can't send data to the cloud. The reasons vary: data sovereignty requirements, latency constraints, or just plain network unreliability in regional locations. Whatever the driver, containerised Azure AI services solve a real problem.
Docker Compose makes it practical to run multiple AI services on a single machine without the overhead of Kubernetes. You define your services in a YAML file, run one command, and you've got Document Intelligence and Computer Vision running side by side. It's not the right approach for every scenario, but when it fits, it's remarkably straightforward.
The official Microsoft documentation covers the mechanics. Let me share what we've learned deploying these in production.
Why Run AI Containers Locally?
The most common scenario we encounter is document processing pipelines where the data can't leave a specific network boundary. A government agency processing citizen documents. A healthcare provider running OCR on patient records. A legal firm extracting clauses from contracts that are subject to privilege.
In all these cases, the cloud versions of Azure AI services work fine functionally. The constraint is about where the data goes, not what happens to it. Containers give you the same AI capability within your own infrastructure boundary.
There's also the latency angle. If you're processing thousands of documents and each one requires a round trip to an Azure region, the cumulative latency adds up. Local containers eliminate that entirely. We've seen processing pipelines run 3-4x faster just by removing the network hop.
The Docker Compose Setup
Here's what the compose file looks like for running Document Intelligence alongside the Vision Read (OCR) container:
```yaml
version: '3.7'
services:
  forms:
    image: "mcr.microsoft.com/azure-cognitive-services/form-recognizer/layout-3.1:latest"
    environment:
      eula: accept
      billing: # Your Document Intelligence billing URL
      apikey: # Your Document Intelligence API key
      FormRecognizer__ComputerVisionApiKey: # Your API key
      FormRecognizer__ComputerVisionEndpointUri: # Your endpoint URI
    volumes:
      - type: bind
        source: /data/docai/output
        target: /output
      - type: bind
        source: /data/docai/input
        target: /input
    ports:
      - "5010:5000"
  ocr:
    image: "mcr.microsoft.com/azure-cognitive-services/vision/read:latest"
    environment:
      eula: accept
      apikey: # Your Vision API key
      billing: # Your Vision billing URL
    ports:
      - "5021:5000"
```
A few things to note that the docs mention but are easy to miss:
The volume directories must exist before you start the containers. Docker won't create them for you with bind mounts. If you're scripting deployments, add a mkdir -p step before docker-compose up. I've seen teams waste hours debugging startup failures that came down to missing directories.
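A minimal pre-start sketch, assuming the /data/docai layout from the compose file above (adjust the path to your own environment):

```shell
# Bind-mount directories must exist before startup; Docker won't create
# them for you. DATA_ROOT is an assumption -- substitute your own path.
DATA_ROOT="${DATA_ROOT:-/data/docai}"
mkdir -p "${DATA_ROOT}/input" "${DATA_ROOT}/output"

# Only now start the stack.
docker-compose up -d
```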
Port mapping matters. Both containers internally listen on port 5000, so you need different host ports. The example uses 5010 and 5021 - pick whatever makes sense for your environment, but document it somewhere your team can find it.
The billing URL and API key are still required even for local containers. Microsoft uses these for metering, not for routing traffic. Your data stays local, but Microsoft still tracks usage for billing purposes. This surprises some clients who assume "local" means "no Azure dependency at all." You still need an Azure subscription and the appropriate resource provisioned.
Getting It Running
Start everything with:
```shell
docker-compose up
```
The first run pulls the container images, which are substantial - expect several gigabytes per image. On Australian internet connections, particularly in regional areas, plan accordingly. We've had deployments where the initial pull took over an hour on a 50Mbps connection.
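If the download window matters, you can fetch the images ahead of time, separately from startup, with docker-compose's built-in pull command:

```shell
# Pre-fetch every image referenced in the compose file without
# starting any containers -- useful on slow or metered connections.
docker-compose pull
```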
Once pulled, the containers start up and you'll see logs from both services. You know things are working when you see:
```
Now listening on: http://0.0.0.0:5000
Application started. Press Ctrl+C to shut down.
```
For production deployments, add the -d flag to run in detached mode:
```shell
docker-compose up -d
```
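Once both services report they're listening, a quick smoke test from the host confirms the port mappings work. This uses the same /status endpoint the health checks below rely on:

```shell
# Hit each container through its host-mapped port; -f makes curl
# exit non-zero on an HTTP error, so this works in scripts too.
curl -f http://localhost:5010/status   # Document Intelligence
curl -f http://localhost:5021/status   # Vision Read (OCR)
```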
Practical Considerations We've Learned
Resource Allocation
These containers aren't lightweight. Document Intelligence in particular can be memory-hungry when processing complex documents with embedded tables and images. We typically allocate a minimum of 8GB RAM and 4 CPU cores per container for production workloads. For the OCR container processing high-resolution scans, bump that to 12GB.
You can set resource limits in the compose file. Note that classic docker-compose only applies the deploy section when run with the --compatibility flag; Compose v2 honours it directly:

```yaml
services:
  forms:
    # ... other config
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G
```
Health Checks
Add health checks to your compose file. The containers expose endpoints you can ping, and Docker Compose can automatically restart unhealthy containers:
```yaml
services:
  forms:
    # ... other config
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/status"]
      interval: 30s
      timeout: 10s
      retries: 3
```
Without health checks, a container that's crashed internally but hasn't actually exited will sit there accepting connections and returning errors. We learned this one in production when a client's overnight batch job silently failed because the forms container had run out of memory but was still technically "running."
Logging and Monitoring
By default, both containers log to stdout. For production, you'll want to configure log rotation and potentially ship logs to a central location:
```yaml
services:
  forms:
    # ... other config
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "5"
```
Without log rotation, the containers will eventually fill your disk. Ask me how I know.
Common Pitfalls
Pricing tier mismatches. The containers only work with specific pricing tiers - F0 or Standard for Vision and Document Intelligence resources. If you've provisioned the wrong tier, the container will start but fail when you send requests. The error messages aren't always clear about what went wrong.
Gated preview containers. Some container versions require separate approval through Microsoft's online request form. If you're pulling a preview image and getting access denied, check whether it requires gating approval. This can add days to your deployment timeline, so check early.
Networking between containers. If your Document Intelligence container needs to call the OCR container (which it does for some operations), they communicate using Docker's internal networking. The FormRecognizer__ComputerVisionEndpointUri should point to the OCR container's internal address - something like http://ocr:5000. Don't use the host-mapped port for inter-container communication.
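The inter-container wiring looks like this in the compose file. Services on the same default compose network resolve each other by service name, so the endpoint uses the internal port 5000, not the host-mapped 5021:

```yaml
services:
  forms:
    environment:
      # Point at the ocr service's internal address -- Docker's DNS
      # resolves the service name "ocr" inside the compose network.
      FormRecognizer__ComputerVisionEndpointUri: "http://ocr:5000"
```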
Image version pinning. Using :latest is convenient but risky for production. Pin to a specific version tag so your deployment doesn't break when Microsoft pushes an update. We've seen breaking changes in minor version bumps that caused document layouts to parse differently.
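Pinning is a one-line change. The tag below is illustrative; check the Microsoft Container Registry for the current version list before committing to one:

```yaml
services:
  ocr:
    # Pin an explicit tag instead of :latest so updates are deliberate.
    # (Example tag -- verify available versions on mcr.microsoft.com.)
    image: "mcr.microsoft.com/azure-cognitive-services/vision/read:3.2"
```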
When Docker Compose Isn't Enough
Docker Compose works well for single-host deployments. If you need high availability, auto-scaling, or multi-node deployments, you'll want to move to Kubernetes or Azure Container Apps. But don't jump to orchestration platforms prematurely - many document processing workloads run perfectly well on a single beefy server with Docker Compose.
The decision point is usually around volume and uptime requirements. Processing a few hundred documents per day? Docker Compose on a single server is fine. Processing tens of thousands with 99.9% uptime requirements? Time to look at orchestration.
Putting It All Together
For a typical deployment, we build a wrapper script that handles directory creation, environment variable injection from a secrets manager, health checking after startup, and notification if something goes wrong. The compose file itself stays clean and readable.
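A sketch of such a wrapper, with placeholder names throughout - the secrets command, paths, and notification step are all assumptions to adapt to your tooling:

```shell
#!/bin/sh
# Hypothetical deploy wrapper: directories, secrets, startup, health check.
set -eu

DATA_ROOT="/data/docai"                       # assumption: adjust as needed
mkdir -p "${DATA_ROOT}/input" "${DATA_ROOT}/output"

# Inject secrets from your secrets manager (placeholder command).
FORMS_APIKEY="$(your-secrets-cli get docai/forms-apikey)"
export FORMS_APIKEY

docker-compose up -d

# Poll the forms container's status endpoint before declaring success.
for i in $(seq 1 30); do
  if curl -fs http://localhost:5010/status >/dev/null; then
    echo "forms container healthy"
    exit 0
  fi
  sleep 5
done

echo "forms container failed health check" >&2
# notify your on-call channel here
exit 1
```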
We also recommend running the containers behind a reverse proxy like nginx or Traefik. This gives you TLS termination, request logging, and rate limiting without modifying the container configuration.
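As a rough illustration, a minimal nginx server block for TLS termination in front of the forms container might look like this. The server name and certificate paths are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name docai.internal.example;          # placeholder hostname

    ssl_certificate     /etc/nginx/certs/docai.crt;   # placeholder paths
    ssl_certificate_key /etc/nginx/certs/docai.key;

    location / {
        # Forward to the host-mapped Document Intelligence port.
        proxy_pass http://localhost:5010;
        proxy_set_header Host $host;
    }
}
```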
If your organisation is looking at running Azure AI services locally - whether for data sovereignty, latency, or offline scenarios - our team has done this across multiple industries. We provide Azure AI consulting and can help with everything from initial architecture through to production deployment. For organisations building broader AI processing pipelines, our AI solutions architecture practice covers the full design and implementation lifecycle.
The containerised approach isn't right for everyone. But when your constraints demand local processing, Docker Compose gives you a practical path from "we need AI on-premises" to "it's running in production" without overcomplicating the infrastructure.