How to Migrate from On-Premises AI to Azure AI Services
Most of the on-premises AI work we see in 2026 was built between 2019 and 2023. It's a Linux box (or three) sitting in a server room, running Python, scikit-learn, PyTorch and a fragile mess of cron jobs. The person who built it has moved on. The model is six versions behind. The hardware is up for refresh and nobody wants to sign off on another GPU procurement.
If that sounds like your environment, this article is for you. It's a buyer's guide to migrating on-premises AI to Azure AI Services, written from the perspective of a consulting team that has done it dozens of times for Australian clients across financial services, manufacturing, healthcare and government.
Why This Migration Is Now Cheaper to Do Than to Avoid
For most of the last decade, the case for on-premises AI was reasonable. Data sovereignty rules were unclear, cloud GPU prices were eye-watering, and many cloud AI services weren't mature. None of those reasons hold up in 2026.
A few numbers from recent client work:
- A NSW state government agency was running a document classification model on two on-prem GPU servers. Total cost of ownership over five years, including hardware, power, cooling, infrastructure team time and re-platforming work, was around $1.2 million AUD. The equivalent Azure deployment over five years modelled out at $410,000 AUD with reserved capacity.
- A Brisbane-based logistics company had a demand forecasting workload that needed a hardware refresh. The new on-prem cluster quote came in at $340,000 AUD upfront. The cloud equivalent was about $4,800 AUD per month, with no upfront cost, and scaled down to about $1,200 AUD per month during quieter periods.
- A Melbourne professional services firm wanted to retire a Windows Server hosting their old custom NLP pipeline. They replaced the entire pipeline with Azure AI Language and Azure OpenAI for a quarter of the original build cost, in 11 weeks.
The cloud isn't always cheaper for everything. But for AI workloads specifically, the economics changed somewhere around 2023 and they haven't reverted.
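To make the logistics example concrete, here is a minimal sketch of the break-even arithmetic. It uses only the hardware quote and the monthly cloud figures quoted above, and it ignores power, cooling and staff time on the on-prem side, which is an assumption that flatters on-prem. The function name is ours, for illustration only.

```python
def months_to_break_even(upfront_aud: float, cloud_monthly_aud: float) -> float:
    """Months of cloud spend needed to equal the on-prem upfront quote."""
    if cloud_monthly_aud <= 0:
        raise ValueError("cloud_monthly_aud must be positive")
    return upfront_aud / cloud_monthly_aud


# Figures from the logistics example: $340,000 AUD upfront versus
# $4,800 AUD per month at peak and $1,200 AUD per month in quiet periods.
peak = months_to_break_even(340_000, 4_800)
quiet = months_to_break_even(340_000, 1_200)

print(f"At the peak monthly rate:  {peak:.1f} months")
print(f"At the quiet monthly rate: {quiet:.1f} months")
```

Even if the workload ran at the peak rate every month, cloud spend would take about 71 months, close to six years, to match the hardware quote alone. That is longer than the five-year horizon used in the TCO example above.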
What "On-Premises AI" Usually Looks Like in 2026
When we walk into an existing on-premises AI environment, we typically find some combination of:
- Python services running on Linux VMs or bare metal, often with conda environments that nobody remembers how to rebuild
- One or more GPU servers (NVIDIA T4, V100, A100) running model training and inference
- A SQL Server, PostgreSQL or Oracle database holding feature data
- File shares with training data, often duplicated and undocumented
- Scheduled batch jobs via cron or Windows Task Scheduler
- A web app or API that exposes model predictions to other systems
- Custom logging that nobody monitors anymore
Before you can plan a migration, you need to actually know what's there. We've started engagements where the client confidently described their AI environment in one paragraph, only to discover six undocumented Python services and a Jupyter notebook running in production on someone's laptop.
The first phase of any migration is honest inventory.
The Five-Phase Migration Playbook
Here's the phase structure we use for most migrations. Timings assume a single AI workload of moderate complexity. Multiply for portfolios.
Phase 1 - Discovery and Inventory (1-3 weeks)
The goal is a complete picture of what you have, what it does, and what it costs.
For each AI workload, document:
- Business purpose and stakeholders
- Input data sources, volumes and refresh frequency
- Model type (regression, classification, vision, NLP, LLM)
- Training data location, size and lineage
- Compute footprint (CPU, GPU, memory, storage)
- Dependencies on other systems and APIs
- SLA expectations and current performance
- Compliance, data residency and audit requirements
- Who owns it, who uses it, who supports it
This is also when you decide what's worth migrating at all. Roughly a third of the AI workloads we audit are either no longer used, duplicated by something else, or providing so little value that the right answer is to retire them rather than migrate.
Be ruthless. Migrating dead weight is expensive.
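One way to keep the inventory honest is to capture each workload as a structured record rather than a paragraph in a slide deck. The sketch below is illustrative only: the field names loosely mirror the checklist above, while `monthly_requests`, `duplicated_by` and the triage thresholds are assumptions of ours, not a standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkloadRecord:
    """One row of the discovery inventory."""
    name: str
    business_purpose: str
    model_type: str                      # e.g. "classification", "vision", "LLM"
    owner: str
    data_sources: List[str] = field(default_factory=list)
    gpu_count: int = 0
    residency_requirement: str = "unspecified"
    monthly_requests: int = 0            # hypothetical usage measure
    duplicated_by: Optional[str] = None  # name of an overlapping workload, if any


def triage(record: WorkloadRecord, min_monthly_requests: int = 100) -> str:
    """Return 'retire', 'review' or 'migrate' using illustrative rules."""
    if record.duplicated_by is not None:
        return "retire"
    if record.monthly_requests == 0:
        return "retire"
    if record.monthly_requests < min_monthly_requests:
        return "review"
    return "migrate"


inventory = [
    WorkloadRecord("doc-classifier", "Route inbound documents", "classification",
                   "records team", gpu_count=2, monthly_requests=42_000),
    WorkloadRecord("legacy-sentiment", "Unused dashboard feed", "NLP",
                   "unknown", monthly_requests=0),
]
for record in inventory:
    print(record.name, "->", triage(record))
```

The rules are deliberately crude. Their job is to force a retire-or-migrate conversation for every workload, not to make the decision for you.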
Phase 2 - Target Architecture and Cost Model (2-4 weeks)
For each workload that's worth migrating, design the target state. The question isn't "how do we replicate this in Azure" but "given Azure's services, what's the right way to solve this problem now?"
Common target patterns:
- Custom ML model trained on tabular data: Azure Machine Learning with managed compute and managed online endpoints
- Custom computer vision model: Azure Machine Learning for training, Azure Machine Learning managed online endpoints or Azure Container Apps for inference
- Document understanding or extraction: Azure AI Document Intelligence, often paired with Azure OpenAI for downstream reasoning
- Text classification, sentiment, entity extraction: Azure AI Language services
- Speech to text or text to speech: Azure AI Speech
- Translation: Azure AI Translator
- Search and retrieval: Azure AI Search with semantic ranking
- Chat, summarisation, content generation: Azure AI Foundry with the OpenAI model family
- Agents and tool calling: Azure AI Foundry Agent Service
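If you are assessing a portfolio, it can help to encode these patterns as a lookup so every workload in the inventory gets a first-pass target. The sketch below simply restates the list above as data. The dictionary keys and the function name are hypothetical, and the output is a starting point for an architect, not a decision.

```python
from typing import Dict, List

# Workload kind -> candidate Azure services, taken from the patterns above.
TARGET_PATTERNS: Dict[str, List[str]] = {
    "tabular_ml": ["Azure Machine Learning"],
    "computer_vision": ["Azure Machine Learning", "Azure Container Apps"],
    "document_extraction": ["Azure AI Document Intelligence", "Azure OpenAI"],
    "text_analytics": ["Azure AI Language"],
    "speech": ["Azure AI Speech"],
    "translation": ["Azure AI Translator"],
    "search": ["Azure AI Search"],
    "generative": ["Azure AI Foundry"],
    "agents": ["Azure AI Foundry Agent Service"],
}


def suggest_targets(workload_kind: str) -> List[str]:
    """Return candidate services, or flag the workload for manual design."""
    return TARGET_PATTERNS.get(workload_kind, ["needs architect review"])


print(suggest_targets("document_extraction"))
print(suggest_targets("reinforcement_learning"))
```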
Once the architecture is sketched, build a real cost model. Use the Azure pricing calculator, but adjust for what we've actually seen in production. Microsoft's calculator tends to underestimate egress, storage growth, monitoring and Azure AI Search indexing costs. Add 25 percent to whatever the calculator says and you'll be closer to reality.
A good cost model has three scenarios: baseline, expected and worst case. The worst case scenario should include what happens if your data volumes double, your model is called more often than expected, and your team forgets to turn off dev environments. Sponsors need to see this. Surprise bills are how migrations get reversed.
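The three scenarios can be sketched in a few lines. Only the 25 percent uplift comes from the guidance above. The expected growth factor, the assumption that cost scales linearly when volumes double, and the idle dev environment figure are all placeholders to replace with your own numbers.

```python
CALCULATOR_UPLIFT = 1.25  # the 25 percent adjustment suggested above


def cost_scenarios(calculator_monthly_aud: float,
                   expected_growth: float = 1.2,          # assumed, not from the source
                   worst_case_volume_factor: float = 2.0,  # "data volumes double"
                   idle_dev_monthly_aud: float = 0.0) -> dict:
    """Return baseline, expected and worst-case monthly costs in AUD.

    Simplification: cost is assumed to scale linearly with volume.
    """
    baseline = calculator_monthly_aud * CALCULATOR_UPLIFT
    expected = baseline * expected_growth
    worst_case = baseline * worst_case_volume_factor + idle_dev_monthly_aud
    return {
        "baseline": round(baseline, 2),
        "expected": round(expected, 2),
        "worst_case": round(worst_case, 2),
    }


# Hypothetical workload: the pricing calculator says $4,000 AUD per month,
# and forgotten dev environments would add $1,500 AUD per month.
scenarios = cost_scenarios(4_000, idle_dev_monthly_aud=1_500)
for name, monthly in scenarios.items():
    print(f"{name:>10}: ${monthly:,.0f} AUD per month")
```

With these placeholder inputs the worst case lands at more than double the baseline, which is exactly the kind of gap sponsors should see before they sign off.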
Phase 3 - Data Migration and Connectivity (3-8 weeks)
Data is usually the slowest and most painful part. Plan for that.
Things to work through:
- Network connectivity: Do you need ExpressRoute, a site-to-site VPN, or is public internet plus Private Endpoints enough? For most Australian clients we end up with a Private Endpoint architecture inside an existing hub-and-spoke network.
- Data residency: Most Azure AI Services are available in Australia East, and many in Australia Southeast, but not every service or Foundry catalogue model is available in both. Check before you commit.
- Bulk data movement: For large historical datasets, Azure Data Box or AzCopy with parallelism. For continuous data feeds, Azure Data Factory or Microsoft Fabric pipelines.
- Database migration: SQL Server and Oracle have well-established paths to Azure SQL, Azure SQL Managed Instance or Azure Database for PostgreSQL. Plan for downtime windows.
- Secrets and credentials: Move everything to Azure Key Vault and managed identities. If your current on-prem system stores API keys in environment variables, this is the time to fix it.
- Identity: Wire up Entra ID. RBAC on Azure AI Services is genuinely useful and well integrated.
Do not let the data migration drag on. A common failure mode is for the data engineering team to chase perfection while the AI team waits idle. Get to "good enough" data in Azure as fast as possible and iterate from there.
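For the secrets item in the list above, the change in application code is usually small. The sketch below assumes the `azure-identity` and `azure-keyvault-secrets` packages are installed and that the workload runs with a managed identity that has been granted read access to the vault. The vault URL and secret name shown in the comments are placeholders.

```python
def make_secret_client(vault_url: str):
    """Build a Key Vault client that authenticates without stored credentials.

    DefaultAzureCredential uses the managed identity when running in Azure
    and falls back to developer credentials on a workstation.
    """
    # Imported lazily so the rest of this module loads without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def get_api_key(client, secret_name: str) -> str:
    """Fetch a secret value from any client exposing get_secret(name).value."""
    value = client.get_secret(secret_name).value
    if not value:
        raise RuntimeError(f"Secret '{secret_name}' is empty or missing")
    return value


# Typical usage (placeholder vault URL and secret name):
#   client = make_secret_client("https://<your-vault>.vault.azure.net")
#   api_key = get_api_key(client, "legacy-scoring-api-key")
```

Keeping `get_api_key` independent of the SDK makes it easy to unit test with a stub client, and it leaves a single place to change if the secret store changes later.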
Phase 4 - Workload Migration and Validation (4-12 weeks per workload)
For each workload, the rough sequence is:
- Stand up the new Azure environment alongside the old one
- Re-implement the workload using the target Azure services
- Run the new system in parallel with the old one, comparing outputs
- Fix the inevitable behavioural differences
- Cut over consumers from old to new
- Decommission the old system once you're confident
Parallel running is essential. Models behave differently in different environments. Tokenisers vary between versions of the same library. A document classification model on Azure AI Document Intelligence might give very similar outputs to your old custom model, but "very similar" is not "identical" and downstream business logic might depend on the differences. Find those differences during parallel running, not after cutover.
We typically run parallel for at least four weeks for any model that drives business decisions, and longer if the decisions involve money. One financial services client insisted on three months of parallel running for a fraud detection workload. They were right to. The new model was better overall but missed a particular fraud pattern that the old model caught. Catching it during parallel run cost a few weeks of tuning. Catching it after cutover would have cost millions.
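A parallel run is only useful if someone actually compares the outputs. Here is a minimal sketch of that comparison, assuming both systems log one prediction per record keyed by a shared ID. The IDs, labels and function name are made up for illustration.

```python
def compare_parallel_run(old_outputs: dict, new_outputs: dict):
    """Compare predictions from the old and new systems, keyed by record ID.

    Returns (agreement_rate, disagreements, missing_from_new).
    """
    shared = old_outputs.keys() & new_outputs.keys()
    disagreements = {
        record_id: (old_outputs[record_id], new_outputs[record_id])
        for record_id in sorted(shared)
        if old_outputs[record_id] != new_outputs[record_id]
    }
    agreement_rate = 1 - len(disagreements) / len(shared) if shared else 0.0
    missing_from_new = sorted(old_outputs.keys() - new_outputs.keys())
    return agreement_rate, disagreements, missing_from_new


# Hypothetical outputs from one day of parallel running.
old = {"txn-1": "fraud", "txn-2": "ok", "txn-3": "fraud", "txn-4": "ok"}
new = {"txn-1": "fraud", "txn-2": "ok", "txn-3": "ok", "txn-4": "ok"}

rate, diffs, missing = compare_parallel_run(old, new)
print(f"Agreement: {rate:.0%}")
for record_id, (old_label, new_label) in diffs.items():
    print(f"{record_id}: old={old_label} new={new_label}")
```

The headline agreement rate is the least interesting output. As the fraud example shows, a new model can score well overall and still miss one specific pattern, so review the individual disagreements, grouped by the old model's label, before cutover.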
Phase 5 - Operations Handover (2-4 weeks)
The migration isn't done when the new system goes live. It's done when your operations team can support it.
This phase covers:
- Monitoring: Application Insights, Log Analytics, Azure Monitor dashboards. Set alerts for model performance drift, latency, error rates and cost anomalies.
- Cost controls: Budgets, alerts, tag-based cost allocation. Reserved capacity for steady state workloads where it makes sense.
- Runbooks: Documented procedures for common operational tasks. Restarting an endpoint, rotating keys, retraining a model, scaling up for peak load.
- On-call handover: Whoever was supporting the old system needs to understand the new one. Pair them with someone who built it for at least a couple of weeks.
- Retraining cadence: Decide who is responsible for monitoring model drift and triggering retraining. This is the part that gets forgotten and causes silent model degradation 12 months later.
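Drift monitoring needs a number that someone can alert on. One common choice, which we are introducing here as an example rather than something the playbook mandates, is the Population Stability Index (PSI) between the model's scores at go-live and its recent scores. The sketch below uses equal-width bins over the baseline range. The bin count and the sample data are assumptions.

```python
import math


def population_stability_index(baseline, recent, bins=10):
    """PSI between two samples of model scores, using equal-width bins."""
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline scores must not all be identical")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = proportions(baseline)
    actual = proportions(recent)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical scores: an even spread at go-live, then a shift upwards.
go_live_scores = [i / 100 for i in range(100)]
recent_scores = [min(s + 0.3, 0.99) for s in go_live_scores]

print(f"No drift: {population_stability_index(go_live_scores, go_live_scores):.3f}")
print(f"Shifted:  {population_stability_index(go_live_scores, recent_scores):.3f}")
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the workload. Whatever you choose, wire it to an alert and name an owner.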
What the Migration Actually Costs
Rough fee ranges for Australian engagements in 2026, based on our recent work:
- Discovery and target architecture for a single moderate workload: $25,000-$60,000 AUD
- Portfolio assessment for 5-15 workloads: $60,000-$180,000 AUD
- Full migration delivery for a single workload: $80,000-$300,000 AUD depending on complexity
- Portfolio migration program for a mid-sized organisation: $400,000-$1.5 million AUD over 6-12 months
These ranges assume an external consulting team doing most of the work alongside your internal staff. If you have a strong internal AI engineering team and just need architectural guidance and unblockers, costs are significantly lower.
Cloud consumption costs once you're live depend entirely on your workloads. For most clients the new Azure running cost ends up at 40-65 percent of the old on-premises TCO. The savings increase over time as you avoid hardware refresh cycles and reduce the operations burden.
The Risks That Catch People Out
A few patterns that burned clients whose migrations we picked up after a failed first attempt:
Underestimating data engineering work. The AI side of the migration is usually the easy part. Cleaning up data sources, fixing pipelines, and dealing with messy legacy schemas is what blows out timelines.
Forgetting about the consumers. Other systems are calling your AI services today. Migration breaks those integrations unless you maintain compatible APIs or coordinate updates. We've seen migrations where the new system worked perfectly but six downstream applications quietly stopped working for weeks.
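One way to maintain a compatible API is a thin adapter that reshapes the new service's response into the format existing consumers already parse. Both response shapes below are hypothetical. Substitute the real legacy contract and the real response from your target service.

```python
def to_legacy_response(new_response: dict) -> dict:
    """Map a (hypothetical) new-service response onto the legacy contract.

    Assumed legacy shape: {"label": str, "score": float, "model_version": str}
    Assumed new shape:    {"prediction": {"category": str, "confidence": float},
                           "model_version": str (optional)}
    """
    prediction = new_response["prediction"]
    return {
        "label": prediction["category"],
        "score": round(float(prediction["confidence"]), 4),
        "model_version": new_response.get("model_version", "unknown"),
    }


new_style = {
    "prediction": {"category": "invoice", "confidence": 0.93127},
    "model_version": "2026-01",
}
print(to_legacy_response(new_style))
```

An adapter like this lets you cut consumers over on their own schedule instead of coordinating six application releases with your go-live date.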
Skipping parallel running. Some teams skip it because parallel running is awkward. They regret it.
Treating the migration as a full rebuild. Choosing the right Azure service for each workload is part of the target design, but if you also try to fix everything else that was wrong with the old system at the same time, you'll do neither well. Migrate first, improve later.
Ignoring data egress and ingress costs. Moving terabytes of training data in and out of Azure has a cost. So does pulling features from on-prem databases over a VPN every time a model scores. Architect for data locality.
Letting one architect design everything. Get a second pair of eyes on the target architecture before you build it. Architecture mistakes are expensive to undo at the point of cutover.
No exit plan. What happens if the migration fails halfway through? Have a clear rollback plan for each workload before you start cutover.
Skills Your Team Needs
For a successful migration, somebody in the engagement needs to understand:
- Azure AI Services capabilities and their limits
- Networking, identity and security in Azure
- Data engineering on Azure (Fabric, Data Factory, Synapse if relevant)
- Python and the relevant ML libraries
- Whatever language the old system was written in, so you can read it
- MLOps practices for model lifecycle management
- Cost optimisation on Azure
If your internal team has most of this and just needs targeted help, an embedded engineer model works well. If you're starting from a low base, a full delivery team is faster. Our forward deployed engineers work in both modes depending on what the client needs.
A Realistic Timeline for a Mid-Sized Migration
For a typical Australian mid-market organisation with 4-8 AI workloads, expect:
- Months 1-2: Discovery, inventory, target architecture, cost model
- Months 2-4: Data migration, network setup, security baseline
- Months 3-8: Workload migration in waves, with the simplest workloads first
- Months 7-10: Parallel running and cutover for the harder workloads
- Months 9-12: Operations handover, optimisation, decommissioning of old systems
Roughly twelve months end to end is realistic. Trying to compress it to six months is possible but expensive and risky. Trying to stretch it to two years means you'll lose momentum, sponsors and budget.
When to Stop and Reconsider
Not every workload should be migrated. Reasons to pause:
- The workload is providing minimal business value and could just be retired
- The data sovereignty requirements genuinely cannot be met by Azure (rare in 2026, but real for some government use cases)
- The model is so old and the original team so dispersed that a rebuild is cheaper than a migration
- The business is about to be sold, merged or restructured and the AI workload is not strategic
We've talked clients out of migrations more than once. A migration nobody needs is the most expensive migration of all.
Working With Team 400
Team 400 has been delivering AI and cloud migration work for Australian organisations since 2018. We've seen the failure modes, we've fixed the messes, and we've built a delivery approach that gets clients into production quickly without skipping the parts that matter.
If you're scoping a migration from on-premises AI to Azure, our Azure AI consulting service covers the full lifecycle from assessment through delivery. For organisations starting with strategy, our AI strategy consultants can help build the business case before you commit to a migration plan.
If you need senior engineers embedded with your team to drive delivery, our Microsoft AI consultants and Azure AI Foundry consultants work shoulder to shoulder with internal teams across Sydney, Melbourne, Brisbane and remote engagements.
Get in touch via our contact page. The first conversation is a working session with a senior engineer, not a sales pitch. We'll tell you whether a migration is worth doing, and if it is, how to scope it properly.