Back to Blog

When to Use Azure Cognitive Services vs Custom AI Models

June 1, 202610 min readMichael Ridland

The two most expensive AI mistakes I see in Australian businesses are mirror images of each other. The first is building a custom model when an off-the-shelf API would have done the job in a week. The second is forcing an off-the-shelf API onto a problem where it never had a chance, then blaming the technology when results are mediocre.

We saw both in the same month last year. A retailer spent eight months training a custom product image classifier that ended up only 2% more accurate than Azure AI Vision's prebuilt categories. Around the same time, a healthcare client tried to use the standard Azure AI Language sentiment API on clinical notes and got results that were close to random. Both projects burned six-figure budgets that did not need to be burned.

This post is the framework I run through with clients before either of those mistakes can happen. It is opinionated, because the neutral version of this advice is what got those companies into trouble in the first place.

What we are actually choosing between

Azure has two broad ways of buying AI capability.

Azure AI Services (the new name for what used to be called Cognitive Services). These are prebuilt APIs that do specific things. Text translation. Image OCR. Speech to text. Document Intelligence. Content safety. Sentiment analysis. Face detection. You call the API, you get a result, you pay per call. Microsoft maintains the model. You do not train it, you do not own the weights, you do not need to know how it works inside.

Custom AI models. These can be fine-tuned versions of a base model on Azure (think GPT-4.1 fine-tuned on your own dataset, or a custom Azure AI Document Intelligence model trained on your specific document layouts). They can also be fully custom models trained from scratch in Azure Machine Learning. You own the resulting weights, you maintain them, you pay for the training and the hosting.

In between sits a third option that often gets forgotten: prompt engineering on top of a foundation model. You take GPT-4.1 or Claude Sonnet, you write a clever system prompt with examples, and you ground it on your own data via retrieval. No fine-tuning, no model ownership, but tailored behaviour. This is the right answer more often than either of the extremes.

The decision criteria that actually matter

Here are the questions I ask, in order. The first one that gives you a clear answer is usually the answer.

How accurate does it need to be, and how do you know?

This sounds like an obvious question and almost nobody can answer it crisply. "We need it to be accurate" is not a target. "We need 92% precision on extracting invoice line items, measured against a labelled set of 500 invoices that reflect our supplier mix" is a target.

When clients give us a clear accuracy target, the decision often makes itself. We run the off-the-shelf service on a sample, we measure the gap, and we know whether tuning is needed.

A rule of thumb: if the prebuilt service is within 5 percentage points of your target, prompt engineering or grounding will usually close the gap. If it is 5 to 15 points away, fine-tuning is probably the right move. More than 15 points away and you may have either picked the wrong service or you have a problem that needs a custom model from scratch.

How specific is your domain language?

The prebuilt services are trained on broad data. They do well on general business text, general images, general audio. They struggle on:

  • Clinical and medical notes
  • Legal and regulatory text
  • Heavy code mixing and acronyms (mining, defence, mortgage broking)
  • Niche industry jargon (insurance claims phrasing, specific trading desk terminology)
  • Non-standard document layouts that change frequently

If your data is full of language your average reader would not understand without a glossary, prebuilt services will hit a ceiling. Custom is usually the right answer.

How much labelled data do you have, or can you create?

Custom models eat data. A fine-tuned classifier needs at least a few hundred labelled examples to do anything useful, and ideally a few thousand. A custom Document Intelligence model needs at least five examples per layout, but really wants twenty or more for production quality.

If you do not have the data and cannot reasonably get it within a couple of months, custom is off the table regardless of how appealing it sounds.

This is where prompt engineering shines. You can do impressive things with five well-chosen few-shot examples and no labelled training set.

How often does the problem space change?

A custom model is a snapshot. It learns the world as it was when you trained it. If your document layouts change every six months, or your product taxonomy gets revised, or new regulatory categories appear, you will be retraining constantly. Each retrain is real engineering effort.

Prebuilt services keep getting updated by Microsoft on a cadence you do not control. That is good when you want to ride the improvements, and bad when an update changes behaviour you were relying on. We have had clients caught both ways.

If your problem space is stable, custom is fine. If it is changing fast, lean toward grounded foundation models with prompt engineering, which adapt by changing prompts rather than retraining.

What is the cost picture across two years?

This is where the numbers usually settle the argument. Indicative ranges in AUD for typical Australian projects in 2026:

Approach Build cost (AUD) Annual run cost (AUD)
Azure AI Services prebuilt API $5,000 to $25,000 integration $5,000 to $80,000 in API calls
Foundation model with prompt engineering and RAG $30,000 to $120,000 $20,000 to $200,000 in inference
Fine-tuned foundation model $80,000 to $250,000 $40,000 to $300,000
Custom model trained from scratch $200,000 to $800,000+ $50,000 to $400,000

For most projects with up to a few hundred thousand API calls a month, prebuilt services are dramatically cheaper. The crossover point where custom wins on pure cost is usually higher volume than people assume, and it almost never happens in year one.

The exception is when accuracy improvements from a custom model translate directly to business value. A 4% accuracy gain on a process that handles 50,000 transactions a day, where each error costs $20 to remediate, justifies a lot of model training.

A decision tree we use with clients

I have a one-page version of this we share with clients. The short version:

  1. Can a prebuilt Azure AI Service do the job to your accuracy target? Use it. Stop overthinking.
  2. If not, can a foundation model with good prompts and your own data hit the target? Use grounding and prompt engineering. This is usually the right answer.
  3. If grounding still falls short and you have labelled data, fine-tune.
  4. If fine-tuning still falls short, and the problem genuinely needs a custom architecture, then build from scratch. This is rare.

Most of our engagements end at step 1 or step 2. Step 3 happens once or twice a year. Step 4 happens maybe once every two years across our whole client base, and it is almost always in highly specialised industrial or scientific contexts.

Where prebuilt services punch above their weight

A few specific Azure AI Services that consistently impress us:

Document Intelligence prebuilt models. The invoice, receipt, ID, and contract models are genuinely good. We have replaced multi-month custom development projects with two-week integrations.

Speech to text with custom vocabulary. The base service plus a small custom phrase list will get you 95%+ accuracy on most Australian business audio without training a custom acoustic model. Aussie accents and place names work surprisingly well.

Content safety. If you are deploying any kind of generative AI to customers, the prebuilt content safety service is not optional. Trying to roll your own is a waste of time.

Translator. For nearly all languages we deal with in the Australian market, the prebuilt service hits production quality.

Where prebuilt services hit a wall

The places we consistently see prebuilt services fall over:

Industry-specific document extraction. If your forms are unique to your industry and have varying layouts, the prebuilt models will not get you all the way. Custom Document Intelligence models are usually the answer.

Sentiment on specialised text. Clinical notes, legal opinions, technical support tickets in highly technical domains. The generic sentiment API gets confused. Either use a foundation model with prompts, or fine-tune.

Niche image classification. Detecting defects in manufacturing, identifying species, classifying medical images. Prebuilt vision will miss most of the nuance. Custom Vision or a fine-tuned vision model is the path.

Conversation summarisation in specialised contexts. The generic summarisation does a reasonable job on emails and meetings. It does a poor job on clinical handover notes or detailed financial advisor conversations.

Common objections we hear

"We need custom because we are special." Often the team genuinely believes this, and they are sometimes right. More often, the prebuilt services have not been seriously tested against the actual data. Always test before you build.

"We do not trust Microsoft with our data." Reasonable concern, and the answer is the regional deployment options plus your own data isolation policies. Australia East and Australia Southeast keep your data onshore. The prebuilt services do not train on your prompts. If you need more, you can self-host fine-tuned open weights models in Azure, which keeps you inside your own subscription.

"Our procurement team only buys software, not APIs." This is more common than you would think and is usually solved by quarterly committed-use agreements through your Microsoft account team. The unit economics on volume are decent.

"We want to own the model so we are not locked in." Fair. But owning a custom model is also a commitment to maintaining it, monitoring drift, and retraining. The lock-in argument is real but is rarely the strongest reason to go custom on its own.

How this conversation usually plays out with us

When clients come to us with this question, our first session is almost always a structured discovery. What is the actual business problem, what does the data look like, what does accuracy mean here, what is the volume, and what are the alternatives. Most of the time we can answer prebuilt versus custom in a single workshop.

If you want help running that conversation properly, our Azure AI consultants work with Australian businesses across exactly this decision. We have shipped projects on both ends of the spectrum, which means we are not selling you the answer that suits our incentive.

For broader Microsoft work, our Microsoft AI consultants page lays out the wider Azure ecosystem. If you are specifically focused on document and data extraction use cases, our AI integration services team has done this end-to-end on Document Intelligence many times.

If you are not sure where to start at all, the AI opportunity planner is a short engagement that gets you to a prioritised list of high-value use cases before you commit to any tool. It is the most common starting point for clients who know they should be doing AI but do not yet know where it pays off most.

The honest summary: prebuilt services win more often than people assume, custom wins more decisively when it is the right call, and the worst outcome is committing to either path before you have measured what the simpler option can do. We help clients avoid that with an honest first conversation. If that sounds useful, get in touch.