Microsoft Foundry Deployment Types Explained - Which One Should Your Business Actually Use?
One of the first real decisions you'll face when building on Microsoft Foundry is how to deploy your models. It sounds simple - pick a model, deploy it, start calling the API. But Foundry now offers nine different deployment types, each with different implications for where your data gets processed, how much you pay, and what kind of performance you get.
We've worked through this decision with enough Australian clients that I can tell you: most teams pick the wrong deployment type first. Not because they're making bad decisions, but because Microsoft's docs don't explain the trade-offs in plain language. The official documentation is thorough, but it reads like a spec sheet rather than a decision guide.
So let me walk you through it the way I'd explain it at a whiteboard.
The Two Things That Actually Matter
Every deployment type in Foundry comes down to two variables:
1. Where does your data get processed? This isn't where your data is stored at rest - that always stays in your nominated Azure geography. This is about where the inference happens when you send a prompt to the model. Your options are:
- Global: Could be processed anywhere in the world
- Data Zone: Stays within a defined zone (currently US or EU)
- Regional: Processed only in the specific Azure region you deployed to
2. How do you pay?
- Pay-per-token (Standard): You pay for what you use. Good for variable or low-volume workloads.
- Reserved capacity (Provisioned): You buy a fixed amount of processing capacity upfront, measured in Provisioned Throughput Units (PTUs). Good for high-volume, predictable workloads.
- Batch: 50% cheaper than Standard, but your requests go into a queue and come back within 24 hours (sometimes longer).
That's it. Those two axes combine to create the deployment types - nine in total, once you count the special-purpose Developer type and note that Regional has no Batch option. Once you see it that way, the matrix stops being confusing.
The Nine Types at a Glance
My simplified version of the comparison:
| Deployment Type | Data Processing | Billing | When to Use It |
|---|---|---|---|
| Global Standard | Anywhere | Pay-per-token | Default starting point for most workloads |
| Global Provisioned | Anywhere | Reserved PTUs | High-volume production with predictable throughput needs |
| Global Batch | Anywhere | 50% discount, 24hr turnaround | Large async jobs where you don't need real-time responses |
| Data Zone Standard | US or EU only | Pay-per-token | When you need data to stay within a defined zone |
| Data Zone Provisioned | US or EU only | Reserved PTUs | Data zone compliance plus high throughput |
| Data Zone Batch | US or EU only | 50% discount | Batch processing with data zone requirements |
| Regional Standard | Single region only | Pay-per-token | Strict regional compliance requirements |
| Regional Provisioned | Single region only | Reserved PTUs | Regional compliance plus guaranteed throughput |
| Developer | Anywhere | Pay-per-token | Fine-tuned model evaluation only (24hr lifetime) |
What We Actually Recommend for Most Australian Clients
About 80% of Australian businesses we work with should start with Global Standard and stay there until they have a specific reason to change.
Global Standard gives you the highest default quota, access to the newest models first, and the simplest setup. You pay per token, no minimum commitment. If you're running a proof of concept or deploying your first AI agent, just use this one.
The "global" part means your prompts could be processed in any Azure region worldwide. For internal tools, development environments, and non-regulated workloads, that's fine. Your data at rest stays in your Azure geography regardless.
When to Move Beyond Global Standard
These are the scenarios where we've actually recommended clients switch:
You need predictable latency at scale → Global Provisioned
One of our clients runs a customer-facing document processing system handling thousands of requests per hour. On Global Standard, they were getting occasional latency spikes - not failures, but enough variability to blow their SLAs. Global Provisioned fixed it. Reserved capacity means you're not sharing a pool with everyone else.
The catch: Provisioned capacity isn't cheap. You're buying PTUs whether you use them or not. For this client it worked out cheaper than pay-per-token at their volume. But if you're processing a few hundred requests a day? The minimum PTU commitment will burn money for nothing.
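The break-even maths is worth sketching out before you commit. Here's a minimal version of the calculation we run with clients - every price in it is a hypothetical placeholder (PTU minimums and rates vary by model and region), so substitute the actual figures from the Azure pricing page for your deployment:

```python
# Rough break-even sketch: reserved PTUs vs pay-per-token.
# All prices below are HYPOTHETICAL placeholders - check current
# Azure pricing for your model and region before deciding.

def monthly_token_cost(tokens_per_day: float, price_per_1k_tokens: float) -> float:
    """Pay-per-token cost over a 30-day month."""
    return tokens_per_day * 30 / 1000 * price_per_1k_tokens

def monthly_ptu_cost(ptus: int, price_per_ptu_per_hour: float) -> float:
    """Reserved capacity bills around the clock, used or not."""
    return ptus * price_per_ptu_per_hour * 24 * 30

def provisioned_is_cheaper(tokens_per_day: float, price_per_1k: float,
                           ptus: int, ptu_hourly: float) -> bool:
    return monthly_ptu_cost(ptus, ptu_hourly) < monthly_token_cost(tokens_per_day, price_per_1k)

# A few hundred requests a day (~500K tokens) vs a hypothetical
# 15-PTU minimum at $1/PTU/hour:
print(provisioned_is_cheaper(500_000, 0.01, 15, 1.0))     # False - PTUs sit idle
print(provisioned_is_cheaper(50_000_000, 0.01, 15, 1.0))  # True - volume justifies reserved capacity
```

The point isn't the specific numbers - it's that the fixed 24/7 PTU cost has to be beaten by your actual token volume, not your peak-hour volume.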
You're doing large batch jobs → Global Batch
If you're processing thousands of documents or running sentiment analysis across a dataset and you don't need results in real time, Batch is the obvious pick. Same models, 50% of the cost. The trade-off is a 24-hour turnaround window, though most jobs we've run finish well before that.
We set this up for a client doing quarterly analysis of customer feedback - tens of thousands of items for classification and summarisation. Batch pricing saved them roughly 40% compared to running the same work through Standard.
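Batch jobs take their input as a JSONL file, one request per line. A sketch of what building that file looks like for a sentiment-classification job like the one above - the request shape follows the OpenAI-style batch format, and `gpt-4o-batch` is a hypothetical deployment name standing in for whatever you've called your batch deployment:

```python
import json

# Hypothetical feedback items; in the real job this came from
# tens of thousands of rows in the client's data warehouse.
feedback_items = [
    "The onboarding process was confusing.",
    "Support resolved my issue in one call.",
]

def to_batch_line(idx: int, text: str) -> str:
    """One JSONL line = one chat-completion request in the batch."""
    request = {
        "custom_id": f"feedback-{idx}",  # your key for matching results back
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": "gpt-4o-batch",  # your batch deployment name (hypothetical here)
            "messages": [
                {"role": "system",
                 "content": "Classify the sentiment as positive, negative, or neutral."},
                {"role": "user", "content": text},
            ],
        },
    }
    return json.dumps(request)

with open("batch_input.jsonl", "w") as f:
    for i, item in enumerate(feedback_items):
        f.write(to_batch_line(i, item) + "\n")
```

You upload the file, create the batch job against it, and collect an output file keyed by `custom_id` when it completes. That `custom_id` is what lets you stitch results back to source rows, so make it something meaningful.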
You have strict data processing requirements → Regional Standard
This is the one Australian businesses ask about most. If your organisation has a policy (or regulatory requirement) that AI inference must happen within a specific Azure region - say, Australia East - you need Regional Standard or Regional Provisioned.
The part that trips people up: not every model is available in every region. The model catalogue for Australia East is noticeably thinner than what's available globally. You might find the model you actually want isn't deployed in Australia. At that point you've got a choice: use a different model that is available locally, or relax your data processing requirement and go Global.
There's no Data Zone option for Australia yet - the current Data Zones only cover the US and EU. For Australian data residency requirements, it's either Regional or Global. Nothing in between.
The Data Residency Question That Keeps Coming Up
Nearly every AI consulting engagement we do for Australian enterprise clients involves a conversation about data residency. And nearly every time, the conversation starts with the same misconception.
The misconception is: "If I use Global Standard, my data could end up stored anywhere in the world."
That's not how it works. Regardless of deployment type, data stored at rest stays in your designated Azure geography. The difference is only about where the inference computation happens - where the model actually processes your prompt and generates a response. Your data is encrypted in transit: the prompt goes to a data centre, gets processed, and the result comes back. Your prompts and outputs aren't used to train the models.
For most use cases, this means Global Standard is fine even for organisations that care about data sovereignty. The real exceptions are industries with explicit regulatory requirements about where computation occurs (not just storage) - certain government contracts, financial services, and healthcare.
If you're unsure whether your situation requires Regional deployment, talk to your compliance team with this specific distinction in mind. Don't default to Regional "just to be safe" - you'll limit your model options and pay more for it.
Cost Surprises to Watch For
Deployment type is only part of the cost picture in Foundry. A few things that catch people off guard:
Agent hosting costs add up quietly. If you're using Foundry Agent Service to build AI agents, you're paying for orchestration, conversation storage, tool calls, and monitoring on top of model inference. None of these are huge line items on their own. Together, they can surprise you.
Provisioned capacity is use-it-or-lose-it. You buy PTUs, you pay for them 24/7. We've seen organisations provision for peak load and then pay for idle capacity overnight and on weekends. If your traffic has clear patterns, work out whether you can scale PTUs up and down, or whether Standard with some latency variability is actually cheaper overall.
Batch isn't always cheaper in total. Yes, 50% cheaper per token. But if you're using Batch because you couldn't get enough Standard quota and your workload actually needs results faster than 24 hours, you'll end up building queuing logic, retry handling, and status polling. The engineering cost of that plumbing can outweigh the token savings, especially for smaller workloads.
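To make that plumbing cost concrete, here's the shape of the polling loop you end up writing around every batch job. This is a sketch, not Foundry's API - `fetch_status` stands in for whatever call returns the job state, and the terminal state names are illustrative:

```python
import time

def wait_for_batch(fetch_status, poll_seconds=60, max_polls=1440, sleep=time.sleep):
    """Poll until the job reaches a terminal state, or give up.

    fetch_status: callable returning the current job state string
    (a stand-in for the real status API call).
    """
    terminal = ("completed", "failed", "expired", "cancelled")
    for _ in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        sleep(poll_seconds)  # real code should back off, not hammer the endpoint
    raise TimeoutError("batch job did not finish within the polling budget")

# Example with a fake status source that completes on the third poll:
states = iter(["validating", "in_progress", "completed"])
result = wait_for_batch(lambda: next(states), sleep=lambda s: None)
print(result)  # completed
```

And that's just the happy path - add retry handling for failed items and result-file parsing, and you can see why the engineering cost matters for small workloads.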
A Decision Framework
The decision tree I walk clients through:
Step 1: Do you have a hard regulatory requirement about where AI inference computation occurs?
- Yes → Regional Standard or Regional Provisioned. Check model availability in your required region first.
- No → Continue to Step 2.
Step 2: What's your expected volume?
- Low/variable (under ~100K tokens/day) → Global Standard. Don't overthink it.
- High and consistent → Consider Global Provisioned. Run the numbers on PTU pricing vs. pay-per-token at your expected volume.
- Large batch jobs (not time-sensitive) → Global Batch.
Step 3: Do you need consistent latency?
- Yes, for customer-facing applications → Provisioned (Global or Regional depending on Step 1).
- No, some variability is fine → Standard is sufficient.
Three questions. That's all you need.
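The three steps above are simple enough to encode directly. A sketch of the framework as a helper function - it only narrows you to a deployment family, and you should still run actual pricing numbers before committing to Provisioned:

```python
def recommend_deployment(regional_requirement: bool,
                         high_consistent_volume: bool,
                         batch_friendly: bool,
                         needs_consistent_latency: bool) -> str:
    """Encode the three-question decision framework."""
    # Step 1: hard regulatory requirement on where inference runs?
    if regional_requirement:
        if high_consistent_volume or needs_consistent_latency:
            return "Regional Provisioned"
        return "Regional Standard"
    # Step 2: large batch jobs that aren't time-sensitive?
    if batch_friendly:
        return "Global Batch"
    # Steps 2-3: high consistent volume, or latency-sensitive?
    if high_consistent_volume or needs_consistent_latency:
        return "Global Provisioned"
    return "Global Standard"

print(recommend_deployment(False, False, False, False))  # Global Standard
print(recommend_deployment(True, False, False, True))    # Regional Provisioned
```

Most inputs land on `Global Standard`, which matches what we see in practice: the default is right for most teams.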
Where This Fits
Deployment types are one piece of a bigger puzzle. You've still got model selection to think about (Foundry's catalogue lets you mix models within a single system), governance and monitoring, and how you integrate with your data layer through tools like Microsoft Fabric.
But deployment type is one of those decisions that's easy to get wrong early and painful to change later. Once your infrastructure and cost models are built around a specific type, switching means rework.
If you're planning an AI project on Microsoft Foundry and want a second opinion on deployment types, model selection, or cost modelling, get in touch with our team. We'd rather help you get the architecture right now than untangle it six months from now.
For the full technical reference on deployment types, see the Microsoft Foundry deployment types documentation.