Running Azure AI Services in Disconnected Environments - A Practical Guide
Not every environment has reliable internet access. That might sound odd in 2026, but if you work with mining operations in remote Australia, defence systems with strict air-gap requirements, or hospitals that can't send patient data to the cloud, you know exactly what I'm talking about. Azure AI disconnected containers exist for these situations, and they're more capable than most people realise.
What Are Disconnected Containers?
Azure AI Services offers Docker containers that let you run AI capabilities locally without a persistent internet connection. These aren't watered-down versions of the cloud APIs - they're the same models, running on your own infrastructure, with no requirement to phone home during normal operation.
The list of what you can run offline is pretty solid:
- Speech services - speech to text, custom speech to text, and neural text to speech
- Text translation (standard)
- Language services - sentiment analysis, key phrase extraction, language detection, summarisation, named entity recognition, PII detection, and conversational language understanding
- Vision - the Read OCR capability
- Document Intelligence - form and document processing
That's a lot of AI capability you can run on a server sitting in a shipping container on a mine site or inside a secure government network.
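One thing that makes these containers approachable is that your applications talk to them over the same REST surface as the cloud APIs, just pointed at a host on your own network. As a rough sketch, here's what calling a Language container for sentiment analysis might look like - the port and endpoint path are assumptions based on the cloud Text Analytics API shape, so confirm them against your container's own swagger page:

```python
import json
import urllib.request

# Hypothetical local endpoint - the path mirrors the cloud Text Analytics
# sentiment API; check your container's /swagger page for the exact route.
CONTAINER_URL = "http://localhost:5000/text/analytics/v3.1/sentiment"

def build_payload(texts, language="en"):
    """Shape a batch of texts into the documents payload the API expects."""
    return {
        "documents": [
            {"id": str(i), "language": language, "text": t}
            for i, t in enumerate(texts, start=1)
        ]
    }

def analyse(texts):
    """POST the batch to the local container and return the parsed response."""
    req = urllib.request.Request(
        CONTAINER_URL,
        data=json.dumps(build_payload(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note there's no per-request API key here - billing is handled by the container's licence, not request authentication - though you should verify that against your own container's configuration.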
Microsoft's official documentation on disconnected containers covers the technical setup, but I want to talk about the practical side - when this approach makes sense and what to watch out for.
Why Would You Run AI Offline?
We see three main patterns across our Azure AI consulting work:
True air-gapped environments. Defence, certain government agencies, and some critical infrastructure operators have networks that physically cannot reach the internet. If they want AI capabilities, it has to run locally. Full stop.
Data sovereignty and compliance. Some organisations, particularly in healthcare and financial services, have strict rules about where data can go. Even though Azure has Australian data centres, some compliance frameworks require that certain data never leaves the organisation's own infrastructure. Disconnected containers let you process sensitive documents, patient records, or financial communications without data leaving your network.
Intermittent connectivity. Think remote mining operations, maritime vessels, or rural healthcare facilities. They might have satellite connectivity that works most of the time, but you can't build a production workflow that falls over every time the connection drops. Local containers keep processing regardless of network status.
Getting Access - It's Not Self-Service
This is worth calling out early because it catches people off guard. You can't just pull these containers from a registry and start using them offline. Microsoft requires you to apply for access, and the approval criteria are specific:
- Your organisation needs to be identified as a strategic customer or partner with Microsoft
- Your use case must genuinely require offline operation (zero connectivity, remote locations, or regulatory restrictions against cloud data processing)
- You need to complete the application carefully - Microsoft reviews these and they're looking for legitimate disconnected use cases
I've seen applications get rejected because the stated use case was essentially "we'd prefer not to use the cloud" rather than "we cannot use the cloud." Microsoft wants these containers used for genuine disconnected scenarios, not as a way to avoid cloud billing or data processing agreements.
The Pricing Model
Disconnected containers use commitment tier pricing. When you create your Azure AI resource in the portal, you select "Commitment tier disconnected containers" as the pricing tier. This means you're paying a fixed monthly fee for a set amount of usage, rather than pay-per-call like the standard cloud APIs.
This pricing model actually works well for most disconnected scenarios. If you're processing documents in a factory or transcribing medical dictation in a hospital, you typically have predictable, steady usage patterns that fit a commitment tier nicely.
The commitment tier option only appears if Microsoft has approved your application - you won't see it otherwise.
Setting It Up - What's Actually Involved
The technical setup follows a pattern that's consistent across all the container types:
Initial setup requires internet access. You need to pull the container image and download a licence file. After that, the container runs fully offline - though licences do expire, so plan for periodic renewals, which means either another brief connection or downloading the renewed licence on a connected machine and transferring it across.
Configure the container for disconnected mode. Each container type has specific environment variables and mount points. The key ones are the licence mount and the output mount for usage logging.
Run the container with Docker. A typical run command looks something like:

```shell
docker run -v /host/output:/output ... <image> ... Mounts:Output=/output
```
Usage tracking is local. The container writes usage records to the mounted output volume. You can query these via REST endpoints on the container itself - either for all-time usage or for a specific month.
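Before routing traffic to a freshly started container, it's worth checking it's actually ready to serve. Azure AI containers conventionally expose lightweight probe endpoints; the `/ready` path below is an assumption you should confirm on your container's swagger page. A minimal sketch:

```python
import urllib.request

def container_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the container's readiness endpoint answers HTTP 200.

    Assumes the conventional /ready probe (model loaded, accepting queries);
    verify the exact path for your container type.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure - treat all as "not ready".
        return False
```

The same check doubles as a Kubernetes readiness probe target or a gate in a deployment script.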
Kubernetes Deployment Note
If you're deploying to Kubernetes (which is common for production disconnected environments), there's a gotcha with environment variable names. Some containers use colons in their environment variable names (like Mounts:License), which Kubernetes doesn't accept. The fix is straightforward - replace colons with double underscores:
```yaml
env:
  - name: Mounts__License
    value: "/license"
  - name: Mounts__Output
    value: "/output"
```
Small thing, but it'll save you a frustrating debugging session.
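For context, here's roughly where that env block sits in a fuller deployment manifest. This is an illustrative sketch only - the image reference, claim names, and labels are placeholders to adapt, and your container type will need its own ports and resource requests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-container              # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ai-container
  template:
    metadata:
      labels:
        app: ai-container
    spec:
      containers:
        - name: ai-container
          image: <your-mirrored-image>   # pulled once, pushed to an internal registry
          env:
            - name: Mounts__License      # double underscores, not colons
              value: "/license"
            - name: Mounts__Output
              value: "/output"
          volumeMounts:
            - name: license
              mountPath: /license
              readOnly: true
            - name: output
              mountPath: /output
      volumes:
        - name: license
          persistentVolumeClaim:
            claimName: ai-license        # placeholder PVC holding the licence file
        - name: output
          persistentVolumeClaim:
            claimName: ai-output         # placeholder PVC for usage records
```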
Real-World Considerations
Model Updates
Because you're running locally, you don't get automatic model updates like cloud users do. When Microsoft releases an improved model, you need to pull the new container image and redeploy. For air-gapped environments, this means physically transferring the image via secure media.
Plan for a regular update cadence. We typically recommend quarterly reviews of available updates, with actual deployments scheduled around maintenance windows.
Hardware Requirements
These containers run real ML models, so they need real compute resources. The specific requirements vary by container type, but expect to need:
- Decent CPU (some containers benefit significantly from multiple cores)
- Adequate RAM (the language models in particular can be memory-hungry)
- GPU support for some containers (neural text to speech performs much better with a GPU)
Size your hardware based on your expected throughput, not just the minimum requirements. The minimum specs will run the container, but response times under load might not meet your SLA.
Monitoring and Usage Reporting
The containers expose REST endpoints for usage data, which you'll want to integrate with your monitoring stack. The endpoints return JSON with meter names and quantities:
```json
{
  "apiType": "noop",
  "serviceName": "noop",
  "meters": [
    {
      "name": "Sample.Meter",
      "quantity": 253
    }
  ]
}
```
You can also pull usage for specific months, which is useful for cost allocation and capacity planning.
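Feeding these records into a monitoring stack usually just means fetching and flattening them. A small sketch - the `/records/usage-logs` path here is an assumption, so check the actual usage endpoint your container exposes:

```python
import json
import urllib.request

def fetch_usage(base_url: str) -> dict:
    """Fetch the all-time usage record from a disconnected container.

    The path below is illustrative - confirm the exact usage endpoint
    on your container's swagger page.
    """
    with urllib.request.urlopen(f"{base_url}/records/usage-logs") as resp:
        return json.load(resp)

def meter_totals(record: dict) -> dict:
    """Collapse a usage record into a {meter name: quantity} mapping,
    ready to emit as gauges to a metrics system."""
    return {m["name"]: m["quantity"] for m in record.get("meters", [])}
```

Run on a schedule, `meter_totals` gives you a time series of consumption per meter - enough for both cost allocation and spotting when you're trending towards your commitment tier ceiling.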
When This Isn't the Right Approach
I want to be honest about the limitations. Disconnected containers are the right tool for specific situations, but they're not always the best choice:
If you have reliable internet, the cloud APIs are almost always simpler to manage. No container orchestration, no licence management, no manual updates. The cloud versions also tend to get new features and model improvements faster.
If you need the very latest models, cloud is better. There's an inherent lag between when Microsoft releases a new model version and when it's available as a disconnected container.
If your use case is experimental, start with the cloud APIs and only move to disconnected containers once you've validated the approach and confirmed you genuinely need offline operation.
How We Can Help
We've deployed Azure AI containers in disconnected environments for manufacturing and mining clients across Australia. The pattern works well once it's set up properly, but the initial architecture decisions - which containers you need, how to handle updates, how to monitor usage, how to size hardware - benefit from experience.
If you're evaluating offline AI capabilities for your organisation, reach out and we can talk through the specifics. We work closely with Microsoft's Azure AI stack through our Azure AI Foundry consulting practice, so we can help you figure out whether disconnected containers are the right fit or whether there's a better approach for your situation.