Microsoft Foundry Deployment Types Explained - Which One Should Your Business Actually Use?
One of the first real decisions you'll face when building on Microsoft Foundry is how to deploy your models. It sounds simple - pick a model, deploy it, start calling the API. But Foundry now offers nine different deployment types, each with different implications for where your data gets processed, how much you pay, and what kind of performance you get.
We've worked through this decision with enough Australian clients that I can tell you: most teams pick the wrong deployment type first. Not because they're making bad decisions, but because Microsoft's docs don't explain the trade-offs in plain language. The official documentation is thorough, but it reads like a spec sheet rather than a decision guide.
So let me walk you through it the way I'd explain it at a whiteboard.
The Two Things That Actually Matter
Every deployment type in Foundry comes down to two variables:
1. Where does your data get processed? This isn't where your data is stored at rest - that always stays in your nominated Azure geography. This is about where the inference happens when you send a prompt to the model. Your options are:
- Global: Could be processed anywhere in the world
- Data Zone: Stays within a defined zone (currently US or EU)
- Regional: Processed only in the specific Azure region you deployed to
2. How do you pay?
- Pay-per-token (Standard): You pay for what you use. Good for variable or low-volume workloads.
- Reserved capacity (Provisioned): You buy a fixed amount of processing capacity upfront, measured in Provisioned Throughput Units (PTUs). Good for high-volume, predictable workloads.
- Batch: 50% cheaper than Standard, but your requests go into a queue and come back within 24 hours (sometimes longer).
That's it. Those two axes combine to create the deployment types - nine in total, once you count the special-purpose Developer type and note that Regional has no Batch option. Once you see it that way, the matrix stops being confusing.
The Nine Types at a Glance
My simplified version of the comparison:
| Deployment Type | Data Processing | Billing | When to Use It |
|---|---|---|---|
| Global Standard | Anywhere | Pay-per-token | Default starting point for most workloads |
| Global Provisioned | Anywhere | Reserved PTUs | High-volume production with predictable throughput needs |
| Global Batch | Anywhere | 50% discount, 24hr turnaround | Large async jobs where you don't need real-time responses |
| Data Zone Standard | US or EU only | Pay-per-token | When you need data to stay within a defined zone |
| Data Zone Provisioned | US or EU only | Reserved PTUs | Data zone compliance plus high throughput |
| Data Zone Batch | US or EU only | 50% discount | Batch processing with data zone requirements |
| Regional Standard | Single region only | Pay-per-token | Strict regional compliance requirements |
| Regional Provisioned | Single region only | Reserved PTUs | Regional compliance plus guaranteed throughput |
| Developer | Anywhere | Pay-per-token | Fine-tuned model evaluation only (24hr lifetime) |
What We Actually Recommend for Most Australian Clients
About 80% of Australian businesses we work with should start with Global Standard and stay there until they have a specific reason to change.
Global Standard gives you the highest default quota, access to the newest models first, and the simplest setup. You pay per token, no minimum commitment. If you're running a proof of concept or deploying your first AI agent, just use this one.
The "global" part means your prompts could be processed in any Azure region worldwide. For internal tools, development environments, and non-regulated workloads, that's fine. Your data at rest stays in your Azure geography regardless.
When to Move Beyond Global Standard
These are the scenarios where we've actually recommended clients switch:
You need predictable latency at scale → Global Provisioned
One of our clients runs a customer-facing document processing system handling thousands of requests per hour. On Global Standard, they were getting occasional latency spikes - not failures, but enough variability to blow their SLAs. Global Provisioned fixed it. Reserved capacity means you're not sharing a pool with everyone else.
The catch: Provisioned capacity isn't cheap. You're buying PTUs whether you use them or not. For this client it worked out cheaper than pay-per-token at their volume. But if you're processing a few hundred requests a day? The minimum PTU commitment will burn money for nothing.
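The break-even maths is worth sketching out before you commit. Here's a minimal version of the calculation we run with clients - every price in it is a hypothetical placeholder (PTU minimums and rates vary by model and region), so substitute the actual figures from the Azure pricing page for your deployment:

```python
# Rough break-even sketch: reserved PTUs vs pay-per-token.
# All prices below are HYPOTHETICAL placeholders - check current
# Azure pricing for your model and region before deciding.

def monthly_token_cost(tokens_per_day: float, price_per_1k_tokens: float) -> float:
    """Pay-per-token cost over a 30-day month."""
    return tokens_per_day * 30 / 1000 * price_per_1k_tokens

def monthly_ptu_cost(ptus: int, price_per_ptu_per_hour: float) -> float:
    """Reserved capacity bills around the clock, used or not."""
    return ptus * price_per_ptu_per_hour * 24 * 30

def provisioned_is_cheaper(tokens_per_day: float, price_per_1k: float,
                           ptus: int, ptu_hourly: float) -> bool:
    return monthly_ptu_cost(ptus, ptu_hourly) < monthly_token_cost(tokens_per_day, price_per_1k)

# A few hundred requests a day (~500K tokens) vs a hypothetical
# 15-PTU minimum at $1/PTU/hour:
print(provisioned_is_cheaper(500_000, 0.01, 15, 1.0))     # False - PTUs sit idle
print(provisioned_is_cheaper(50_000_000, 0.01, 15, 1.0))  # True - volume justifies reserved capacity
```

The point isn't the specific numbers - it's that the fixed 24/7 PTU cost has to be beaten by your actual token volume, not your peak-hour volume.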
You're doing large batch jobs → Global Batch
If you're processing thousands of documents or running sentiment analysis across a dataset and you don't need results in real time, Batch is the obvious pick. Same models, 50% of the cost. The trade-off is a 24-hour turnaround window, though most jobs we've run finish well before that.
We set this up for a client doing quarterly analysis of customer feedback - tens of thousands of items for classification and summarisation. Batch pricing saved them roughly 40% compared to running the same work through Standard.
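Batch jobs take their input as a JSONL file, one request per line. A sketch of what building that file looks like for a sentiment-classification job like the one above - the request shape follows the OpenAI-style batch format, and `gpt-4o-batch` is a hypothetical deployment name standing in for whatever you've called your batch deployment:

```python
import json

# Hypothetical feedback items; in the real job this came from
# tens of thousands of rows in the client's data warehouse.
feedback_items = [
    "The onboarding process was confusing.",
    "Support resolved my issue in one call.",
]

def to_batch_line(idx: int, text: str) -> str:
    """One JSONL line = one chat-completion request in the batch."""
    request = {
        "custom_id": f"feedback-{idx}",  # your key for matching results back
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": "gpt-4o-batch",  # your batch deployment name (hypothetical here)
            "messages": [
                {"role": "system",
                 "content": "Classify the sentiment as positive, negative, or neutral."},
                {"role": "user", "content": text},
            ],
        },
    }
    return json.dumps(request)

with open("batch_input.jsonl", "w") as f:
    for i, item in enumerate(feedback_items):
        f.write(to_batch_line(i, item) + "\n")
```

You upload the file, create the batch job against it, and collect an output file keyed by `custom_id` when it completes. That `custom_id` is what lets you stitch results back to source rows, so make it something meaningful.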
You have strict data processing requirements → Regional Standard
This is the one Australian businesses ask about most. If your organisation has a policy (or regulatory requirement) that AI inference must happen within a specific Azure region - say, Australia East - you need Regional Standard or Regional Provisioned.
The part that trips people up: not every model is available in every region. The model catalogue for Australia East is noticeably thinner than what's available globally. You might find the model you actually want isn't deployed in Australia. At that point you've got a choice: use a different model that is available locally, or relax your data processing requirement and go Global.
There's no Data Zone option for Australia yet - the current Data Zones only cover the US and EU. For Australian data residency requirements, it's either Regional or Global. Nothing in between.
The Data Residency Question That Keeps Coming Up
Nearly every AI consulting engagement we do for Australian enterprise clients involves a conversation about data residency. And nearly every time, the conversation starts with the same misconception.
The misconception is: "If I use Global Standard, my data could end up stored anywhere in the world."
That's not how it works. Regardless of deployment type, data stored at rest stays in your designated Azure geography. The difference is only about where the inference computation happens - where the model actually processes your prompt and generates a response. Your data is encrypted in transit: the prompt goes to a data centre, gets processed, and the result comes back. Your prompts and outputs aren't used to train the models.
For most use cases, this means Global Standard is fine even for organisations that care about data sovereignty. The real exceptions are industries with explicit regulatory requirements about where computation occurs (not just storage) - certain government contracts, financial services, and healthcare.
If you're unsure whether your situation requires Regional deployment, talk to your compliance team with this specific distinction in mind. Don't default to Regional "just to be safe" - you'll limit your model options and pay more for it.
Cost Surprises to Watch For
Deployment type is only part of the cost picture in Foundry. A few things that catch people off guard:
Agent hosting costs add up quietly. If you're using Foundry Agent Service to build AI agents, you're paying for orchestration, conversation storage, tool calls, and monitoring on top of model inference. None of these are huge line items on their own. Together, they can surprise you.
Provisioned capacity is use-it-or-lose-it. You buy PTUs, you pay for them 24/7. We've seen organisations provision for peak load and then pay for idle capacity overnight and on weekends. If your traffic has clear patterns, work out whether you can scale PTUs up and down, or whether Standard with some latency variability is actually cheaper overall.
Batch isn't always cheaper in total. Yes, 50% cheaper per token. But if you're using Batch because you couldn't get enough Standard quota and your workload actually needs results faster than 24 hours, you'll end up building queuing logic, retry handling, and status polling. The engineering cost of that plumbing can outweigh the token savings, especially for smaller workloads.
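To make that plumbing cost concrete, here's the shape of the polling loop you end up writing around every batch job. This is a sketch, not Foundry's API - `fetch_status` stands in for whatever call returns the job state, and the terminal state names are illustrative:

```python
import time

def wait_for_batch(fetch_status, poll_seconds=60, max_polls=1440, sleep=time.sleep):
    """Poll until the job reaches a terminal state, or give up.

    fetch_status: callable returning the current job state string
    (a stand-in for the real status API call).
    """
    terminal = ("completed", "failed", "expired", "cancelled")
    for _ in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        sleep(poll_seconds)  # real code should back off, not hammer the endpoint
    raise TimeoutError("batch job did not finish within the polling budget")

# Example with a fake status source that completes on the third poll:
states = iter(["validating", "in_progress", "completed"])
result = wait_for_batch(lambda: next(states), sleep=lambda s: None)
print(result)  # completed
```

And that's just the happy path - add retry handling for failed items and result-file parsing, and you can see why the engineering cost matters for small workloads.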
A Decision Framework
The decision tree I walk clients through:
Step 1: Do you have a hard regulatory requirement about where AI inference computation occurs?
- Yes → Regional Standard or Regional Provisioned. Check model availability in your required region first.
- No → Continue to Step 2.
Step 2: What's your expected volume?
- Low/variable (under ~100K tokens/day) → Global Standard. Don't overthink it.
- High and consistent → Consider Global Provisioned. Run the numbers on PTU pricing vs. pay-per-token at your expected volume.
- Large batch jobs (not time-sensitive) → Global Batch.
Step 3: Do you need consistent latency?
- Yes, for customer-facing applications → Provisioned (Global or Regional depending on Step 1).
- No, some variability is fine → Standard is sufficient.
Three questions. That's all you need.
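The three steps above are simple enough to encode directly. A sketch of the framework as a helper function - it only narrows you to a deployment family, and you should still run actual pricing numbers before committing to Provisioned:

```python
def recommend_deployment(regional_requirement: bool,
                         high_consistent_volume: bool,
                         batch_friendly: bool,
                         needs_consistent_latency: bool) -> str:
    """Encode the three-question decision framework."""
    # Step 1: hard regulatory requirement on where inference runs?
    if regional_requirement:
        if high_consistent_volume or needs_consistent_latency:
            return "Regional Provisioned"
        return "Regional Standard"
    # Step 2: large batch jobs that aren't time-sensitive?
    if batch_friendly:
        return "Global Batch"
    # Steps 2-3: high consistent volume, or latency-sensitive?
    if high_consistent_volume or needs_consistent_latency:
        return "Global Provisioned"
    return "Global Standard"

print(recommend_deployment(False, False, False, False))  # Global Standard
print(recommend_deployment(True, False, False, True))    # Regional Provisioned
```

Most inputs land on `Global Standard`, which matches what we see in practice: the default is right for most teams.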
Where This Fits
Deployment types are one piece of a bigger puzzle. You've still got model selection to think about (Foundry's catalogue lets you mix models within a single system), governance and monitoring, and how you integrate with your data layer through tools like Microsoft Fabric.
But deployment type is one of those decisions that's easy to get wrong early and painful to change later. Once your infrastructure and cost models are built around a specific type, switching means rework.
If you're planning an AI project on Microsoft Foundry and want a second opinion on deployment types, model selection, or cost modelling, get in touch with our team. We'd rather help you get the architecture right now than untangle it six months from now.
For the full technical reference on deployment types, see the Microsoft Foundry deployment types documentation.