Deploying Azure AI Services Containers to Azure Container Instances - A Practical Guide
There's a common situation we see with Australian businesses adopting AI services. They want to use Azure's cognitive capabilities - text analytics, speech recognition, key phrase extraction - but they have constraints around where the data gets processed. Maybe it's a compliance requirement. Maybe it's latency. Maybe they just want more control over the runtime environment.
Azure AI Services containers solve this by letting you run the same models that power Azure's cloud APIs inside your own container infrastructure. And Azure Container Instances (ACI) is the simplest way to get those containers running in Azure without managing VMs or Kubernetes clusters.
Why Containers for AI Services?
The cloud-hosted APIs work fine for most use cases. You send a request, get a response. Simple. But containers make sense when you need to keep data processing within a specific network boundary, when you need predictable latency without internet round-trips, or when you want to run AI inference closer to where your data lives.
We've seen this come up most often in healthcare and financial services. One client in the insurance space needed text analytics capabilities but their security team wasn't comfortable sending policyholder data to a public API endpoint. Running the text analytics container inside their own Azure subscription - behind their own network security groups - made the security review much simpler.
The other scenario is cost predictability. API calls are priced per transaction. If you're processing millions of documents, a container running on dedicated compute can work out cheaper, depending on the volume and the service.
The Basic Setup
The process for getting an Azure AI Services container onto ACI is straightforward, though there are a few gotchas that the documentation glosses over.
You need three things before you start. First, an Azure AI resource (the Foundry resource) created in the Azure portal - this gives you the API key and endpoint URL that the container uses for billing and authentication. Second, you need to have already pulled and run the container locally with Docker to confirm it works. Third, you need to know the exact image name for the service you're deploying.
That second point is worth emphasising. Don't try to go straight to ACI without testing locally first. We learned this the hard way on an early deployment. The container configuration is specific to each AI service, and debugging configuration issues is much faster on your local machine than in ACI where you're waiting for container restarts and checking logs through the Azure portal.
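As a sketch, a local smoke test for the language detection container might look like this (the image tag, resource sizes, endpoint URL, and key are placeholders - check the documentation for the specific service you're deploying, as each container has its own requirements):

```shell
# Pull and run the container locally, passing the required settings
# as trailing arguments (the documented pattern for these containers).
docker run --rm -p 5000:5000 --memory 8g --cpus 2 \
  mcr.microsoft.com/azure-cognitive-services/textanalytics/language \
  Eula=accept \
  Billing=https://your-resource.cognitiveservices.azure.com/ \
  ApiKey=your-api-key-here

# In another terminal, confirm the container reports ready:
curl http://localhost:5000/ready
```

If the container exits immediately, the logs it prints to the terminal usually name the missing or invalid setting - far quicker to read here than through the ACI portal.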
Deploying Through the Azure Portal
The portal approach is good for one-off deployments or when you're experimenting. You go to the Container Instances creation page and fill in the basics - subscription, resource group, container name, location.
For the image settings, you need the full image path. Something like mcr.microsoft.com/azure-cognitive-services/keyphrase for Key Phrase Extraction, or mcr.microsoft.com/azure-cognitive-services/textanalytics/language for language detection. The exact paths differ between services, so check the specific documentation for whichever AI service you're deploying.
Size matters here. The documentation recommends 2 CPU cores and 4 GB memory as a starting point, but in practice this varies significantly between services. Speech-to-text containers want more memory. Vision containers want more CPU. Start with the documented recommendations and load test from there.
On the networking tab, set TCP port 5000. That's the default port these containers expose their API on.
The advanced tab is where you set the environment variables that every Azure AI container needs:
- ApiKey - one of the two keys from your Azure AI resource
- Billing - the endpoint URL from your Azure AI resource
- Eula - set to "accept"
That billing endpoint is important to understand. Even though the container runs inference locally, it still phones home to Azure for billing. If the container can't reach the billing endpoint, it'll stop processing after a grace period. So your networking needs to allow outbound connectivity to Azure, even if you're keeping the inference traffic internal.
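One quick sanity check before deploying is confirming that your network can reach the billing endpoint at all. For example (the URL is a placeholder for your own resource's endpoint):

```shell
# Any HTTP response code (even 404) means the endpoint is reachable;
# a timeout or connection error suggests blocked outbound traffic.
curl -s -o /dev/null -w "%{http_code}\n" \
  https://your-resource.cognitiveservices.azure.com/
```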
Deploying with the Azure CLI
For anything beyond experimentation, use the CLI with a YAML definition file. It's repeatable, version-controllable, and you can integrate it into your CI/CD pipeline.
The YAML file defines your container group - the image, environment variables, resource requests, ports, and restart policy. Here's what a real deployment looks like in practice:
apiVersion: 2018-10-01
location: australiaeast
name: text-analytics-container
properties:
  containers:
  - name: text-analytics
    properties:
      image: mcr.microsoft.com/azure-cognitive-services/textanalytics/language
      environmentVariables:
      - name: eula
        value: accept
      - name: billing
        value: https://your-resource.cognitiveservices.azure.com/
      - name: apikey
        value: your-api-key-here
      resources:
        requests:
          cpu: 4
          memoryInGb: 8
      ports:
      - port: 5000
  osType: Linux
  restartPolicy: OnFailure
  ipAddress:
    type: Public
    ports:
    - protocol: tcp
      port: 5000
type: Microsoft.ContainerInstance/containerGroups
Then deploy with:
az container create -g your-resource-group -f my-aci.yaml
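Once the create command returns, it's worth confirming the container group actually reached a running state and checking its logs. Something like the following (resource group and container names match the YAML above and are placeholders for your own):

```shell
# Show the provisioning state and public IP of the container group
az container show -g your-resource-group -n text-analytics-container \
  --query "{state: instanceView.state, ip: ipAddress.ip}" -o table

# Tail the container's logs - billing or EULA misconfigurations show up here
az container logs -g your-resource-group -n text-analytics-container
```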
A few things to watch out for. Not all Azure regions have the same CPU and memory availability for ACI. Australia East generally has good availability, but if you need larger containers, check the region availability documentation first. We've had a client try to deploy a speech container with 8 GB memory in a region that only supported up to 4 GB, and the error message was not particularly helpful.
Special Considerations for LUIS Containers
If you're deploying the LUIS (Language Understanding) container, there's an extra step. LUIS containers pull their model file at runtime from an Azure File Share, rather than having it baked into the container image. You need to export your LUIS model as a packaged app, upload it to an Azure File Share, and then mount that share to the container using the volumes configuration in your YAML.
This is one of the more fiddly setups in the Azure AI container world. Get your storage account name, key, and share name ready before you start, and test the file share mount with a simple container first to make sure your permissions are right.
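Assuming you're using the Azure CLI, preparing the file share might look like this (the storage account, share, and file names are placeholders for your own):

```shell
# Create the file share that will hold the exported LUIS model
az storage share create \
  --account-name yourstorageaccount \
  --name luis-models

# Upload the packaged LUIS app you exported from the LUIS portal
az storage file upload \
  --account-name yourstorageaccount \
  --share-name luis-models \
  --source ./your-luis-app.gz
```

The storage account name, key, and share name you use here are the same values that go into the volumes section of your ACI YAML.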
Validating Your Deployment
Once the container is running, there are several endpoints you can hit to check its health:
- The root URL (http://your-ip:5000/) shows a home page confirming the container is running
- /ready returns a 200 if the container is ready to accept queries - useful for health checks
- /status validates that your API key is working without running an actual inference query
- /swagger gives you the full API documentation with a "Try it out" feature for testing
That /ready endpoint is the one to use for monitoring. We typically set up an Azure Monitor alert that pings it every few minutes. If it goes down, you know before your users do.
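A quick manual version of that check, with the IP as a placeholder for your container's public address:

```shell
# Prints the HTTP status code from the readiness endpoint; 200 means
# the container is up and ready to accept inference requests.
curl -s -o /dev/null -w "%{http_code}\n" http://your-ip:5000/ready
```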
When ACI Is Right (and When It Isn't)
ACI is the right choice for single-container deployments where you don't need autoscaling or complex orchestration. It's good for dev/test environments, for proof-of-concept work, and for production workloads with predictable, steady traffic.
It's not the right choice if you need horizontal scaling, if you need to run multiple related containers that communicate with each other, or if you need fine-grained networking controls. For those scenarios, Azure Kubernetes Service (AKS) is the better option, though it comes with significantly more operational overhead.
Most of our clients start with ACI for their initial AI container deployments and move to AKS once they have multiple containers in production and need proper orchestration. That progression makes sense - don't over-engineer the infrastructure before you've validated the use case.
Fitting This Into Your AI Strategy
Container-based AI deployment is one piece of a bigger puzzle. If you're exploring Azure AI services for your organisation, the key question isn't usually "can we run this in a container?" - it's "should we?"
For most text analytics, translation, and simple vision tasks, the cloud APIs are simpler and cheaper at low to moderate volumes. Containers make sense when you have specific compliance, latency, or cost requirements that push you towards running your own infrastructure.
If you're working through these decisions and want a sounding board, our AI consulting team works with Australian businesses across healthcare, financial services, and professional services on exactly these kinds of architecture questions. Reach out to us if you want to talk through your specific situation.
For the full deployment walkthrough, see Microsoft's Azure Container Instance recipe documentation.