Azure AI Foundry vs Amazon SageMaker - Which Platform Fits
Every few weeks an Australian CTO sends me a Slack message that reads, roughly, "Our data team wants SageMaker, our app team wants Azure AI Foundry, what do we do?" The honest answer is that both can ship the same product. The dishonest answer pretends one of them is obviously better.
I've helped clients ship production systems on both. Some are still on the platform they started with. A handful migrated. A few run hybrid stacks and probably will forever. This is what I tell people when they actually want a decision rather than a feature checklist.
What you're really choosing
Azure AI Foundry and Amazon SageMaker have grown into broadly overlapping platforms in 2026, but they came from different directions and it still shows.
Foundry started as a way to build applications on top of pre-trained models, particularly OpenAI's. Its centre of gravity is "I have a model, now I want to wrap it in agents, evaluations, observability, and a deployment story." Microsoft has spent the last two years bolting on the fine-tuning, training, and data plumbing that SageMaker has had for most of a decade.
SageMaker started life as a machine learning workbench. Its centre of gravity is "I have data and a problem, now I want to experiment, train, evaluate, and deploy a model I own." AWS spent the last two years bolting on agents, Bedrock model access, and the application-layer pieces that Foundry was born with.
You're choosing a starting point, not a ceiling. Both will do most things eventually. The question is what feels native and what feels grafted on for the work you do most.
A quick comparison table
| Dimension | Azure AI Foundry | Amazon SageMaker |
|---|---|---|
| Best at | LLM apps, agents, RAG, enterprise integration | ML training pipelines, custom models, data science workflows |
| Frontier model access | OpenAI native, plus Llama, Mistral, Phi | Bedrock for Claude, Llama, Titan, Mistral |
| Anthropic Claude | Not available in Foundry directly | Available via Bedrock |
| Australian regions | Australia East (Sydney) for most services | ap-southeast-2 (Sydney), ap-southeast-4 (Melbourne) |
| Data residency story | Strong, mature, well-documented | Strong, more regions, slightly more configuration |
| Microsoft 365 integration | Native, deep, almost unfair | Via API, with effort |
| Custom model training | Capable but newer | Mature, well-trodden |
| Vendor lock-in feel | High if you use the proprietary pieces | High if you use Bedrock and SageMaker Pipelines |
| Pricing predictability | Better for token-based workloads | Better for steady training workloads |
| Australian skills market | Larger pool of .NET, Power Platform, Azure devs | Smaller but skilled ML engineer pool |
The table tells you about 30% of the story. The rest depends on your team and your work.
What we actually see in Australian projects
A few patterns come up so often I now expect them.
Microsoft-heavy organisations land on Foundry, and they should. If your company runs Microsoft 365, has Entra ID for identity, uses Fabric or Synapse for data, and has a Power Platform footprint, Foundry is not a close call. The integration tax of doing the same work on AWS is real, and you pay it monthly. Banks, state government departments, large professional services firms, and most insurance companies we work with fit this pattern.
AWS-heavy organisations land on SageMaker, but the picture is messier. If your data lake is on S3, your warehouse is Redshift, and your engineering culture lives in Terraform, you'll want SageMaker close to the data. But you'll often pair it with Bedrock for the model access, and the line between "SageMaker project" and "Bedrock project" gets blurry quickly. We see a lot of teams using SageMaker for training and inference of their own models, and Bedrock for the LLM layer in the same product.
Startups and data-science-first teams pick SageMaker more often than you'd expect. The notebook-first, training-pipeline-first workflow still feels more natural to ML-trained engineers. Foundry's UI improved a lot in 2025 but it still nudges you toward "use a model" rather than "build a model."
Hybrid is more common than vendors will admit. I'd estimate 30% of the production AI workloads we've seen this year touch both clouds. Usually it's the data layer on one side and the model layer on the other, sometimes for legitimate reasons (existing data gravity, specific model access) and sometimes because two teams made independent decisions before anyone noticed.
Pricing - past the headline rates
Both platforms publish per-token, per-hour, and per-endpoint rates that are easy to copy into a spreadsheet. The spreadsheet will be wrong.
Here's what actually moves the bill in production, with rough AUD ranges from real 2026 projects.
For a moderate LLM application (5-15 million tokens per day, retrieval-augmented, mid-sized enterprise):
- Azure AI Foundry: $8,000 to $25,000 AUD per month, dominated by model inference and Azure AI Search costs.
- Amazon SageMaker plus Bedrock: $9,000 to $28,000 AUD per month, dominated by Bedrock inference and OpenSearch Serverless costs.
The difference at this scale is rounding error. What actually matters is which platform's developer time you're paying for. A senior engineer comfortable on the platform you choose will save you 2-3x what the infrastructure costs.
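To make the arithmetic concrete, here's a minimal sketch of the back-of-envelope estimate behind figures like these. The blended token rate and the fixed monthly cost are placeholders chosen for illustration, not published prices from either vendor, and `WorkloadEstimate` is a hypothetical helper, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class WorkloadEstimate:
    """Rough monthly bill for a token-metered LLM workload (all figures in AUD)."""

    tokens_per_day: int            # combined input and output tokens
    aud_per_million_tokens: float  # placeholder blended rate, NOT a published price
    fixed_monthly_aud: float       # placeholder for search index, endpoints, logging
    days_per_month: int = 30

    def monthly_inference_aud(self) -> float:
        # Scale daily tokens to a month, then price them per million.
        monthly_tokens = self.tokens_per_day * self.days_per_month
        return monthly_tokens / 1_000_000 * self.aud_per_million_tokens

    def monthly_total_aud(self) -> float:
        return self.monthly_inference_aud() + self.fixed_monthly_aud


# Mid-range workload from the scenario above: 10 million tokens per day.
estimate = WorkloadEstimate(
    tokens_per_day=10_000_000,
    aud_per_million_tokens=20.0,   # assumption for illustration only
    fixed_monthly_aud=4_000.0,     # assumption for illustration only
)
print(f"Inference: {estimate.monthly_inference_aud():,.0f} AUD")
print(f"Total:     {estimate.monthly_total_aud():,.0f} AUD")
```

With those placeholder inputs the sketch gives 6,000 AUD of inference plus 4,000 AUD of fixed cost, so 10,000 AUD a month. Swap in your own rates. The useful part is seeing how quickly the fixed pieces and the blended rate swamp any small per-token difference between vendors.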
For a custom-trained model in production (computer vision or domain-specific NLP):
- SageMaker is usually 15-25% cheaper for the training side at scale. The endpoint pricing is similar, but the training pipeline and spot instance support are more mature.
- Foundry has closed most of the gap for fine-tuning of existing models. For training from scratch, SageMaker still wins on cost and on tooling.
For an LLM-heavy product (agents, document processing, customer service automation):
- Foundry is usually 5-10% cheaper because of the native OpenAI pricing and the absence of the Bedrock margin layer.
- If your product needs Claude specifically, Bedrock changes the maths. There's no Claude on Foundry. That single fact decides a lot of projects.
The honest summary is that for most workloads, list-price differences are well inside the noise of how well you architect the system. We've cut bills by 60% on both platforms by changing caching, batching, and model selection. The platform was rarely the lever.
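Caching is the easiest of those three levers to show in code. The sketch below is one simple way to do it, assuming exact-match caching is acceptable for your prompts. `CachedCompletions` and `fake_complete` are hypothetical names, and the fake function stands in for whatever Foundry or Bedrock client call you actually make.

```python
import hashlib
from typing import Callable, Dict


class CachedCompletions:
    """Wraps any completion function with an exact-match, in-memory cache."""

    def __init__(self, complete: Callable[[str, str], str]) -> None:
        self._complete = complete
        self._cache: Dict[str, str] = {}
        self.billable_calls = 0  # how many requests actually reached the model

    def __call__(self, model: str, prompt: str) -> str:
        # Key on model and prompt together so different models never share answers.
        key = hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.billable_calls += 1
            self._cache[key] = self._complete(model, prompt)
        return self._cache[key]


def fake_complete(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real, billable model call.
    return f"[{model}] answer to: {prompt}"


cached = CachedCompletions(fake_complete)
questions = [
    "What is our leave policy?",
    "What is our leave policy?",  # repeat: served from the cache
    "How do I claim expenses?",
]
for question in questions:
    cached("small-model", question)

print(cached.billable_calls)  # 2 billable calls for 3 requests
```

A production version would want expiry, a shared store instead of a process-local dictionary, and some thought about whether cached answers can leak between users. None of that is platform-specific, which is rather the point.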
Where each platform actually hurts
Vendor comparison pages skip this section. It's the one you actually need.
Azure AI Foundry frustrations
Quota and capacity remain annoying. Australia East is busy. Getting GPT-4o or o3 capacity in Sydney for a new subscription can take a few days of back-and-forth with Microsoft, and provisioned throughput units are a real planning problem for high-volume workloads. We've had go-lives held up by capacity requests more than once.
The portal is still inconsistent. Foundry has been through three rebrands and the docs lag behind the UI. Engineers regularly find a feature in the portal that has no SDK equivalent yet, or vice versa. It's getting better, but plan for some "ask Microsoft support" moments.
No Claude. If your product needs Claude (and a lot of agent-heavy products do in 2026), you're either using two clouds or you're not on Foundry.
Amazon SageMaker frustrations
The cognitive load is higher. SageMaker is genuinely seven products in a trench coat: Studio, Pipelines, Feature Store, Model Registry, Endpoints, JumpStart, and Canvas. New team members spend their first month figuring out which piece does what. Foundry is much more opinionated, which can feel limiting until you watch a SageMaker team argue about which service to use.
IAM. You will spend more time on IAM policies than you expect. This is the AWS tax, and it's real. Compared to Azure's RBAC plus Entra ID, AWS identity is more powerful and more painful.
Bedrock pricing is opaque on cross-region inference. If you need Claude in Sydney with regional failover, the bill can surprise you.
Compliance and data residency in Australia
For most Australian clients, both platforms now meet the bar for IRAP-assessed workloads, APP compliance, and APRA CPS 234 expectations. Microsoft has Australia East and Australia Southeast, plus the Canberra-based Australia Central regions aimed at government workloads. AWS has Sydney and Melbourne regions and an AWS GovCloud equivalent path for sensitive workloads.
For LLM data residency specifically:
- Foundry: OpenAI models in Australia East stay in region for inference. Fine-tuning data stays in region. Logging is configurable. This was a real concern in 2023 and is settled now.
- SageMaker plus Bedrock: Inference stays in region. Cross-region inference for Bedrock can be enabled or disabled per model. Anthropic Claude is available in Sydney with regional residency.
If you're in financial services, healthcare, or government, both platforms can meet your needs. Get your security team to read the platform's data processing documentation, not a blog post. Then talk to a partner who has actually shipped in your sector. (We do a lot of AI consulting for Australian financial services and healthcare, and the answers genuinely differ by sector.)
A decision framework that actually decides something
Stop reading feature lists. Run through this instead.
Where is your data already? If your data lake, warehouse, and identity provider live in one cloud, the AI platform on that cloud usually wins. The egress cost and integration tax of going cross-cloud are routinely underestimated.
What model do you need access to? If the answer involves Claude, you're either on AWS or you're hybrid. If the answer is GPT-4o, o3, or whatever OpenAI ships next, Foundry wins on day-one access.
What does your team know? A team of .NET and Power Platform engineers will be productive on Foundry in week one. The same team on SageMaker will need a month and probably a hire. The opposite is true for Python ML teams.
How much custom model training do you actually do? Be honest. Most enterprise AI in 2026 is RAG, agents, and fine-tuning of existing models. If you're not training from scratch more than once a quarter, you don't need SageMaker's training story. If you are, you probably do.
What does procurement look like? Some of our clients have enterprise agreements that make one cloud effectively free for some workloads. If your CFO has a $5m commitment to Microsoft, the platform decision was made for you.
If after running through this you still have a tie, pick whichever your strongest senior engineer wants to use. Conviction matters more than the marginal feature difference.
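If it helps to see the framework written down, the five questions and the tie-break can be sketched as a simple tally. This is an illustration, not a scoring model we use with clients: the equal weighting is an assumption (in practice a large enterprise commitment can outweigh everything else), and `Project` and `recommend` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Project:
    data_cloud: str                      # "azure" or "aws": where lake, warehouse, identity live
    needs_claude: bool
    needs_openai_day_one: bool
    team_background: str                 # "dotnet" or "python_ml"
    trains_from_scratch_quarterly: bool  # more than once a quarter
    committed_spend_cloud: str           # "azure", "aws", or "" for no commitment
    senior_engineer_pick: str            # "foundry" or "sagemaker", used only on a tie


def recommend(p: Project) -> str:
    """Tally one vote per question. Equal weights are an assumption."""
    votes = {"foundry": 0, "sagemaker": 0}

    # 1. Where is your data already?
    votes["foundry" if p.data_cloud == "azure" else "sagemaker"] += 1
    # 2. What model do you need access to?
    if p.needs_claude:
        votes["sagemaker"] += 1  # in practice, SageMaker alongside Bedrock
    if p.needs_openai_day_one:
        votes["foundry"] += 1
    # 3. What does your team know?
    votes["foundry" if p.team_background == "dotnet" else "sagemaker"] += 1
    # 4. How much custom model training do you actually do?
    if p.trains_from_scratch_quarterly:
        votes["sagemaker"] += 1
    # 5. What does procurement look like?
    if p.committed_spend_cloud == "azure":
        votes["foundry"] += 1
    elif p.committed_spend_cloud == "aws":
        votes["sagemaker"] += 1

    # Tie-break: go with the strongest senior engineer's conviction.
    if votes["foundry"] == votes["sagemaker"]:
        return p.senior_engineer_pick
    return max(votes, key=votes.get)


microsoft_shop = Project(
    data_cloud="azure", needs_claude=True, needs_openai_day_one=False,
    team_background="dotnet", trains_from_scratch_quarterly=False,
    committed_spend_cloud="azure", senior_engineer_pick="sagemaker",
)
split_team = Project(
    data_cloud="aws", needs_claude=False, needs_openai_day_one=False,
    team_background="dotnet", trains_from_scratch_quarterly=False,
    committed_spend_cloud="", senior_engineer_pick="sagemaker",
)
print(recommend(microsoft_shop))  # foundry (3 votes to 1)
print(recommend(split_team))      # sagemaker (1 vote each, so the tie-break decides)
```

The first example is the common case: a Microsoft-heavy shop that wants Claude still lands on Foundry, and the Claude requirement becomes a hybrid question to solve separately.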
When to bring in outside help
Both platforms reward depth. Generalists ship slower and miss things that bite later: capacity planning, prompt injection mitigation, identity boundaries for agents, evaluation pipelines that actually catch regressions. We've seen six-figure mistakes from teams who skipped this and figured the platform would catch them.
If you're starting an Azure AI Foundry build, our Azure AI Foundry consultants work alongside your engineering team to set up the right patterns from day one. For Microsoft-stack work more broadly, see our Microsoft AI consultants page or our Azure AI consulting service.
If you're on the fence, we run a fixed-scope AI Opportunity Planner engagement that includes a platform decision recommendation along with the use case prioritisation. It usually takes 2-3 weeks and costs less than the first cloud bill of a wrong-platform project.
The one-line answer
Build LLM applications and agents on Foundry. Build custom ML pipelines on SageMaker. If you're already deep in one cloud, just stay there and stop reading comparison posts.
Get in touch via team400.com.au/contact if you want a second opinion on a specific project before you commit. We've seen most of the patterns and we'll tell you straight.