Azure AI Foundry Pricing and Cost Management Tips
Azure AI Foundry pricing is consumption-based, which sounds straightforward until you get your first invoice and wonder where the money went. We've helped dozens of Australian organisations plan and manage their Azure AI spend, and the gap between expected and actual costs is one of the most common problems we solve.
This guide breaks down exactly what costs what, shares real numbers from production workloads, and gives you practical strategies for keeping spend under control.
How Azure AI Foundry Pricing Works
There's no flat licence fee for Azure AI Foundry itself. You pay for the underlying Azure resources you consume. The main cost components are:
- Model inference (API calls to deployed models)
- Compute (for fine-tuning and managed deployments)
- Storage (training data, model artifacts, documents)
- Azure AI Search (if using RAG)
- Networking (data transfer, private endpoints)
Let's break each one down with current pricing in both USD and AUD.
Model Inference Costs - The Biggest Line Item
For most organisations, inference costs make up 50-80% of total Azure AI Foundry spend. This is the cost of calling your deployed models.
Pay-Per-Token Pricing (Serverless API)
Pricing is per million tokens. A token is roughly 4 characters or 0.75 words in English.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| GPT-4o | ~$2.50 USD / $3.80 AUD | ~$10.00 USD / $15.20 AUD | Best reasoning, highest cost |
| GPT-4o mini | ~$0.15 USD / $0.23 AUD | ~$0.60 USD / $0.91 AUD | Best value for most tasks |
| o1 | ~$15.00 USD / $22.80 AUD | ~$60.00 USD / $91.20 AUD | Advanced reasoning, very expensive |
| o3-mini | ~$1.10 USD / $1.67 AUD | ~$4.40 USD / $6.69 AUD | Good reasoning at moderate cost |
| Llama 3.1 70B | ~$0.27 USD / $0.41 AUD | ~$0.27 USD / $0.41 AUD | Open source, self-hosted option |
| Mistral Large | ~$2.00 USD / $3.04 AUD | ~$6.00 USD / $9.12 AUD | Strong alternative to GPT-4o |
| Phi-3 Mini | ~$0.13 USD / $0.20 AUD | ~$0.13 USD / $0.20 AUD | Smallest cost, simpler tasks |
Prices are approximate and fluctuate. AUD figures assume an exchange rate of USD 1 = AUD 1.52. Check the Azure pricing page for current rates.
What these numbers mean in practice:
A customer service chatbot handling 1,000 conversations per day, with an average of 500 input tokens and 300 output tokens per conversation:
- Using GPT-4o: ~$4.25 USD/day = ~$130 USD/month = ~$197 AUD/month
- Using GPT-4o mini: ~$0.26 USD/day = ~$7.80 USD/month = ~$12 AUD/month
The difference between model choices is massive. We've seen clients reduce costs by 80% by switching appropriate workloads from GPT-4o to GPT-4o mini with no measurable quality loss.
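The chatbot arithmetic above is easy to reproduce. Here's a minimal cost calculator using the approximate USD rates from the table; the rates are placeholders that change often, so plug in current figures before relying on the output:

```python
# Estimate daily inference cost from per-token rates.
# Rates are the approximate USD prices per 1M tokens from the table above;
# they change frequently, so treat them as placeholders.

RATES_USD_PER_1M = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def daily_cost_usd(model, conversations, in_tokens, out_tokens):
    """Daily cost for a given number of conversations per day."""
    r = RATES_USD_PER_1M[model]
    input_cost = conversations * in_tokens / 1_000_000 * r["input"]
    output_cost = conversations * out_tokens / 1_000_000 * r["output"]
    return input_cost + output_cost

# The chatbot scenario: 1,000 conversations/day, 500 input / 300 output tokens
for model in RATES_USD_PER_1M:
    cost = daily_cost_usd(model, 1000, 500, 300)
    print(f"{model}: ${cost:.2f} USD/day")
```

Running this reproduces the $4.25/day (GPT-4o) and $0.26/day (GPT-4o mini) figures above.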
Provisioned Throughput Pricing
For high-volume production workloads, Azure offers Provisioned Throughput Units (PTUs) - reserved capacity at a lower per-token rate.
- You commit to a minimum capacity (measured in PTUs)
- Pricing is per PTU per hour, not per token
- Makes sense when you have consistent, predictable volume
When PTUs make financial sense: Generally when you're spending more than $5,000-$10,000 AUD/month on a single model's inference costs and your traffic is relatively steady. Below that threshold, pay-per-token is more cost-effective.
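A rough break-even comparison looks like this. Note the PTU count and the per-PTU-hour rate below are illustrative assumptions only; real PTU pricing varies by model, region, and commitment term, so check the Azure pricing page:

```python
# Rough break-even check: pay-per-token vs provisioned throughput.
# ASSUMPTIONS (illustrative only): the 50-PTU deployment size and the
# $1.00 USD/PTU-hour rate below are made up for the example; real PTU
# rates vary by model, region, and commitment term.

HOURS_PER_MONTH = 730

def paygo_monthly_usd(tokens_in_m, tokens_out_m, in_rate, out_rate):
    """Pay-per-token monthly cost; rates are USD per 1M tokens."""
    return tokens_in_m * in_rate + tokens_out_m * out_rate

def ptu_monthly_usd(ptus, usd_per_ptu_hour):
    """Provisioned throughput monthly cost (capacity billed per hour)."""
    return ptus * usd_per_ptu_hour * HOURS_PER_MONTH

# Example: 2,000M input + 800M output tokens/month at GPT-4o rates,
# versus a hypothetical 50-PTU deployment.
paygo = paygo_monthly_usd(2000, 800, 2.50, 10.00)
ptu = ptu_monthly_usd(50, 1.00)
print(f"pay-per-token: ${paygo:,.0f}/month, PTU: ${ptu:,.0f}/month")
```

The useful output is the comparison itself: run it with your actual monthly token volumes and a quoted PTU rate, and pick whichever side comes out cheaper.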
Compute Costs
Compute charges apply when you're fine-tuning models or running managed compute deployments (as opposed to serverless API).
Fine-Tuning Compute
| VM Size | Price per Hour (AUD approx) | Typical Use |
|---|---|---|
| Standard_NC6s_v3 (1x V100) | ~$4.50 | Small fine-tuning jobs |
| Standard_NC12s_v3 (2x V100) | ~$9.00 | Medium fine-tuning jobs |
| Standard_NC24s_v3 (4x V100) | ~$18.00 | Large fine-tuning jobs |
| Standard_NC24ads_A100_v4 (1x A100) | ~$5.50 | Fast fine-tuning |
| Standard_ND96asr_v4 (8x A100) | ~$44.00 | Large model fine-tuning |
Practical example: Fine-tuning GPT-4o mini on 500 training examples typically takes 20-40 minutes and costs $2-$5 AUD in compute. Fine-tuning GPT-4o on 1,000 examples takes 1-3 hours and costs $10-$30 AUD. These are low compared to inference costs over time.
Managed Compute for Inference
If you deploy open-source models (Llama, Mistral, Phi) on managed compute rather than through serverless API, you pay for the VM by the hour:
- A single A100 GPU for serving Llama 3.1 70B: ~$5.50 AUD/hour = ~$4,000 AUD/month
- A single V100 for serving smaller models (Phi-3, Llama 3.1 8B): ~$4.50 AUD/hour = ~$3,240 AUD/month
When managed compute makes sense: When you have very high volume (millions of tokens per day) and a single model to serve. The break-even point versus serverless API depends on your specific volume, but typically it's above 100 million tokens per month.
Azure AI Search Costs
If you're building RAG applications (which most production AI applications require), Azure AI Search is a significant cost component.
| Tier | Monthly Cost (AUD approx) | Storage | Indexes | Best For |
|---|---|---|---|---|
| Free | $0 | 50 MB | 3 | Experimentation only |
| Basic | ~$110/month | 2 GB | 15 | Small production workloads |
| Standard S1 | ~$370/month | 25 GB per partition | 50 | Most production workloads |
| Standard S2 | ~$1,480/month | 100 GB per partition | 200 | Large document collections |
| Standard S3 | ~$2,960/month | 200 GB per partition | 200 | Very large collections |
What most mid-market Australian organisations need: Standard S1 handles most initial production deployments. You'll likely spend $370-$740 AUD/month on AI Search once you're running a production RAG application.
Vector search adds cost: If you enable vector search (recommended for better retrieval quality), you'll also pay for the embedding model used to vectorise your documents. Using text-embedding-3-small costs roughly $0.02 USD per million tokens - usually negligible compared to other costs.
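To see why embedding costs are negligible, here's the back-of-envelope calculation using the ~$0.02 USD per million tokens figure above (the corpus size is illustrative):

```python
# Back-of-envelope embedding cost for vectorising a document corpus.
# Uses the ~$0.02 USD per 1M tokens figure quoted above for
# text-embedding-3-small; the corpus size is illustrative.

EMBED_RATE_USD_PER_1M = 0.02

def embedding_cost_usd(num_docs, avg_tokens_per_doc):
    return num_docs * avg_tokens_per_doc / 1_000_000 * EMBED_RATE_USD_PER_1M

# 10,000 documents averaging 1,000 tokens each = 10M tokens total
print(f"${embedding_cost_usd(10_000, 1_000):.2f} USD")  # -> $0.20
```

Even a ten-thousand-document corpus costs cents to embed; re-embedding on document updates is what you'd budget for, not the initial load.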
Storage Costs
Azure Blob Storage for your training data, documents, and model artifacts:
- Hot tier: ~$0.0264 AUD per GB per month
- Cool tier: ~$0.0132 AUD per GB per month
For most AI projects, storage costs are a rounding error - typically under $50 AUD/month unless you're working with very large document collections or media files.
Networking Costs
Usually small, but worth being aware of:
- Private endpoints: ~$13 AUD/month per endpoint. If you have 5 private endpoints for your AI Foundry setup, that's $65 AUD/month.
- Data egress: Charges apply when data leaves Azure or moves between regions. For AI workloads staying within a single region, this is minimal.
Real-World Cost Examples
Here are actual monthly cost breakdowns from Australian organisations we work with (anonymised):
Example 1 - Internal Knowledge Assistant (Mid-Size Professional Services)
| Component | Monthly Cost (AUD) |
|---|---|
| GPT-4o mini inference (50,000 queries/month) | $180 |
| Azure AI Search (Standard S1) | $370 |
| Storage | $15 |
| Application Insights monitoring | $30 |
| Total | $595 |
Example 2 - Document Processing Pipeline (Financial Services)
| Component | Monthly Cost (AUD) |
|---|---|
| GPT-4o inference (20,000 documents/month) | $2,400 |
| GPT-4o mini for pre-classification | $85 |
| Azure AI Search (Standard S1) | $370 |
| Fine-tuned model hosting | $320 |
| Storage (large document archive) | $90 |
| Monitoring | $45 |
| Total | $3,310 |
Example 3 - Customer Service AI (Retail)
| Component | Monthly Cost (AUD) |
|---|---|
| GPT-4o mini inference (200,000 conversations/month) | $720 |
| Azure AI Search (Standard S1) | $370 |
| Content safety API calls | $110 |
| Storage | $20 |
| Monitoring | $30 |
| Total | $1,250 |
Example 4 - Multi-Model Enterprise Platform (Large Enterprise)
| Component | Monthly Cost (AUD) |
|---|---|
| GPT-4o inference (complex analysis tasks) | $8,500 |
| GPT-4o mini inference (high-volume tasks) | $1,800 |
| Llama 3.1 70B on managed compute | $4,000 |
| Azure AI Search (Standard S2, 2 partitions) | $2,960 |
| Fine-tuning runs (periodic retraining) | $200 |
| Storage | $150 |
| Networking (private endpoints) | $130 |
| Monitoring | $120 |
| Total | $17,860 |
Cost Management Strategies That Actually Work
1. Model Routing - Match the Model to the Task
This is the single highest-impact cost optimisation. Not every query needs your most expensive model.
Build a routing layer that sends queries to different models based on complexity:
- Simple factual questions and classifications go to GPT-4o mini (or Phi-3)
- Complex analysis, reasoning, and generation go to GPT-4o
- Straightforward extraction tasks go to the cheapest model that meets accuracy thresholds
One client reduced monthly inference costs from $12,000 to $4,800 AUD by implementing model routing. The approach takes about 2-3 weeks to set up properly.
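A routing layer can start very simply. The sketch below uses a deliberately naive keyword-and-length heuristic as a placeholder; production routers usually use a small classifier model or rules tuned on real traffic, and the model names are the deployment names you'd configure in your own environment:

```python
# Minimal sketch of a model-routing layer. The complexity heuristic here
# is a deliberately naive placeholder -- production routers typically use
# a small classifier model or rules tuned on real traffic.

CHEAP_MODEL = "gpt-4o-mini"
EXPENSIVE_MODEL = "gpt-4o"

# Keywords that suggest the query needs reasoning or generation
COMPLEX_HINTS = ("analyse", "compare", "explain why", "draft", "summarise")

def route(query: str) -> str:
    """Return the model deployment a query should be sent to."""
    q = query.lower()
    # Long queries, or ones asking for analysis, go to the stronger model
    if len(q.split()) > 50 or any(hint in q for hint in COMPLEX_HINTS):
        return EXPENSIVE_MODEL
    return CHEAP_MODEL

print(route("What are your opening hours?"))
print(route("Analyse the differences between these two contracts"))
```

The pattern matters more than the heuristic: every query that resolves to the cheap model is roughly a 94% saving on that call at the rates in the table above.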
2. Prompt Optimisation - Reduce Token Count
Every token costs money. Optimise your prompts:
- Remove redundant instructions (the model doesn't need to be told the same thing three ways)
- Use concise system prompts (200-400 tokens is usually sufficient)
- Ask for concise outputs explicitly ("respond in 2-3 sentences" vs. letting the model write an essay)
- Reduce retrieved context in RAG applications (return 3 relevant chunks instead of 10)
A 30% reduction in average prompt length translates directly to a 30% reduction in input token costs. We've achieved this on multiple client projects just by auditing and trimming prompts.
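Because input cost scales linearly with prompt length, the savings are straightforward to project. The call volume and prompt sizes below are illustrative:

```python
# Input-token savings from trimming a system prompt that runs on every
# call. Call volume and prompt sizes below are illustrative.

def monthly_input_cost_usd(calls, prompt_tokens, rate_per_1m):
    return calls * prompt_tokens / 1_000_000 * rate_per_1m

calls = 500_000   # API calls per month
rate = 2.50       # GPT-4o input rate, USD per 1M tokens

before = monthly_input_cost_usd(calls, 2_000, rate)  # bloated system prompt
after = monthly_input_cost_usd(calls, 1_400, rate)   # trimmed by 30%
print(f"before ${before:.0f}, after ${after:.0f}, saved ${before - after:.0f}/month")
```

A 600-token trim sounds minor until you multiply it by half a million calls.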
3. Caching - Don't Pay Twice for the Same Answer
If your application receives similar queries repeatedly, implement a caching layer:
- Cache exact-match queries and their responses
- Use semantic similarity caching for near-duplicate queries
- Set appropriate cache expiry based on how often your data changes
For applications with repetitive query patterns (FAQ-style, standard report generation), caching can reduce API calls by 40-60%.
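The exact-match case is a few dozen lines. This sketch covers only that simpler case; semantic-similarity caching would additionally embed each query and compare it against cached embeddings:

```python
import hashlib
import time

# Exact-match response cache with TTL expiry. This covers the simplest
# case only; semantic-similarity caching would also embed the query and
# compare it against cached embeddings before falling through to the API.

class ResponseCache:
    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, response)

    def _key(self, query: str) -> str:
        # Normalise before hashing so trivial whitespace/case
        # differences still produce a cache hit
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get(self, query: str):
        entry = self._store.get(self._key(query))
        if entry and entry[0] > time.time():
            return entry[1]  # cache hit: no API call, no token cost
        return None  # miss (or expired): caller hits the model, then put()

    def put(self, query: str, response: str):
        self._store[self._key(query)] = (time.time() + self.ttl, response)

cache = ResponseCache(ttl_seconds=600)
cache.put("What is your refund policy?", "Refunds are available within 30 days.")
print(cache.get("what is your refund policy?  "))  # hit despite formatting
```

Set the TTL from how often your underlying data changes: minutes for live data, hours or days for policy-style content.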
4. Azure Budgets and Alerts
Set up Azure Cost Management budgets from day one:
- Create a budget for each AI Foundry project
- Set alerts at 50%, 75%, and 90% of expected monthly spend
- For development environments, set a hard spending cap
How to set this up: In the Azure Portal, go to Cost Management + Billing, create a budget scoped to your AI Foundry resource group, and configure action groups for email alerts.
5. Reserved Instances and Commitments
If your AI Search or managed compute costs are significant and predictable:
- Azure Reservations: Commit to 1- or 3-year terms for 20-40% savings on compute
- PTU commitments: Provisioned Throughput Units for high-volume inference workloads
- Azure Savings Plans: Flexible compute commitments that apply across VM types
For an organisation spending $5,000+ AUD/month consistently, a 1-year reservation typically saves $12,000-$24,000 AUD annually.
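That range comes directly from the discount band:

```python
# Annual savings from a 1-year reservation at the 20-40% discount range
# quoted above, for a steady $5,000 AUD/month compute spend.

monthly_spend_aud = 5_000

for discount in (0.20, 0.40):
    saved = monthly_spend_aud * 12 * discount
    print(f"{discount:.0%} discount -> ${saved:,.0f} AUD/year saved")
```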
6. Development vs. Production Environments
Don't run development at production scale:
- Use the free tier of AI Search for development
- Use GPT-4o mini (or even Phi-3) for development and testing, even if production uses GPT-4o
- Shut down managed compute deployments outside business hours in development
- Use smaller datasets for development fine-tuning runs
We've seen development environments that cost more than production because nobody thought to scale them down. A simple scheduled script that stops development compute at 6pm and starts it at 8am can save 60% of development compute costs.
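The "save 60%" figure is just the ratio of running hours, which you can sanity-check before writing the scheduler:

```python
# Sanity check on the ~60% figure: a dev deployment that runs only
# 8am-6pm (10 hours/day) versus one left on around the clock.

hours_on_per_day = 10
savings = 1 - hours_on_per_day / 24
print(f"compute hours avoided: {savings:.0%}")  # -> 58%
```

Restricting the schedule to weekdays pushes the saving higher still.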
7. Monitor and Optimise Continuously
AI costs change as usage patterns evolve. Set up a monthly review:
- Review Azure Cost Management reports by resource and tag
- Identify the top 3 cost drivers
- Check if model routing is working as expected
- Look for anomalies (sudden cost spikes, unexpected usage patterns)
- Review whether cheaper models have become available since your last assessment
Microsoft regularly releases new models and pricing changes. GPT-4o mini didn't exist two years ago - organisations that were paying for GPT-4 Turbo for simple tasks could have saved 90% by switching.
Budgeting for Your First Azure AI Foundry Project
If you're planning your first project and need a budget estimate, here's a framework:
Proof of Concept (4-8 weeks)
| Item | Estimated Cost (AUD) |
|---|---|
| Azure AI Foundry consumption (inference, search, storage) | $1,500 - $4,000 |
| Consulting/development (if external) | $15,000 - $40,000 |
| Total | $16,500 - $44,000 |
Production MVP (first 6 months)
| Item | Monthly Cost (AUD) |
|---|---|
| Inference | $200 - $5,000 |
| AI Search | $370 - $1,500 |
| Storage and networking | $50 - $200 |
| Monitoring | $30 - $100 |
| Monthly total | $650 - $6,800 |
Ongoing Optimisation Savings
Based on our experience, a focused cost optimisation effort after the first 3 months of production typically reduces monthly spend by 25-40%. The investment in optimisation (usually 20-40 hours of work) pays for itself within 2-3 months.
Common Pricing Mistakes to Avoid
Using GPT-4o for everything: The most expensive model is not always the best model for the job. Classify your tasks by complexity and route accordingly.
Leaving development resources running: Managed compute, AI Search instances, and other resources charge 24/7 unless you actively stop or downgrade them.
Ignoring token counts in prompts: A system prompt with 2,000 tokens that runs on every API call adds up quickly at scale. Audit your prompts.
Not setting budgets: Without budgets and alerts, you won't know there's a problem until the invoice arrives.
Over-provisioning AI Search: Starting with Standard S2 when Basic would suffice for your document volume wastes hundreds of dollars per month.
Skipping the caching layer: If even 20% of your queries are repeated or nearly identical, caching pays for itself immediately.
How Team 400 Helps With Cost Management
We include cost analysis and optimisation in every Azure AI Foundry engagement. When we build AI applications for clients, we design the architecture with cost efficiency in mind from the start - model routing, prompt optimisation, caching, and appropriate resource sizing.
For organisations already running Azure AI Foundry workloads, we offer focused cost optimisation assessments. We review your current architecture, identify savings opportunities, and implement changes. Typical engagement length is 2-3 weeks, and the savings typically exceed the engagement cost within 2-3 months.
Want to understand what your Azure AI Foundry project will cost, or need help reducing your current spend? Get in touch and we'll give you an honest assessment.
Explore our Azure AI Foundry consulting, our broader AI consulting services, or learn about our approach as Microsoft AI consultants.