Azure Foundry Tools - The AI Services You're Probably Already Using (and Some You Should Be)
If you've spent any time building AI applications on Azure, you've used at least one of these services - even if you didn't know their current name. Speech-to-text, document processing, computer vision, translation. Microsoft has been shipping these pre-built AI capabilities for years, and they've gone through several branding cycles: Cognitive Services, Azure AI Services, and now Foundry Tools under the Microsoft Foundry umbrella.
The name changes are annoying (I'll grant you that), but the services themselves have gotten quietly better with each iteration. For Australian businesses building AI applications, Foundry Tools are the "don't build it yourself" layer of the stack. Production-ready AI capabilities you call via an API instead of training your own models.
Here's what's actually available, what's worth paying attention to, and how we've used them in client projects.
What Foundry Tools Are
Foundry Tools are pre-built AI services that run as managed APIs on Azure. You provision a resource, get an endpoint and key, and start making API calls. No model training, no ML infrastructure, no data science team required.
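That "endpoint and key" pattern is the same across nearly all of these services. A minimal sketch of what a call looks like, using the Language service's REST shape as the example (the endpoint, path, and api-version here are illustrative placeholders, not a real deployment):

```python
# Sketch of the common Foundry Tools call pattern: resource endpoint + key
# header + JSON POST. We only build the request here rather than send it.

def build_request(endpoint: str, path: str, key: str, payload: dict) -> dict:
    """Assemble the pieces of a typical Foundry Tools REST call."""
    return {
        "url": endpoint.rstrip("/") + path,
        "headers": {
            # Standard key header used across Azure AI services
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        "json": payload,
    }

req = build_request(
    "https://my-resource.cognitiveservices.azure.com",  # placeholder endpoint
    "/language/:analyze-text?api-version=2023-04-01",
    "YOUR_KEY",
    {"kind": "KeyPhraseExtraction", "analysisInput": {"documents": []}},
)
```

Swap the path and payload and the same pattern covers most of the services below.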
They sit in a specific spot in the AI stack: above raw model inference (where you'd use GPT-4o or Claude through Foundry Models) but below fully custom solutions. If you need to extract text from invoices, Foundry Tools has a service for that. If you need to build a novel document understanding system that handles a format nobody's seen before, you'll probably need custom model work.
The full list of services is in the Microsoft Learn documentation, but here's what matters in practice.
The Services Worth Knowing
Speech
The Speech service handles speech-to-text, text-to-speech, speech translation, and speaker recognition. It's been around for years and the accuracy has improved considerably over that time.
We've used it for meeting transcription pipelines, voice-enabled internal tools, and accessibility features. The real-time transcription works well with Australian English accents now, which wasn't always the case. Custom speech models are available if the base model struggles with industry jargon - you train on your organisation's specific terminology and the results improve noticeably.
One thing to watch: latency matters for real-time use cases. If your users are in Australia and you're running the Speech service in a US region, the round-trip time is noticeable. Check regional availability before committing to a design.
Document Intelligence
This is the one we use most frequently, by a wide margin. Document Intelligence (formerly Form Recognizer) extracts structured data from documents - invoices, receipts, contracts, identity documents, tax forms.
The pre-built models handle common document types out of the box. The invoice model, for example, extracts vendor name, invoice number, line items, totals, and tax amounts without any training. If your business processes a lot of documents, this alone can eliminate hours of manual data entry.
Custom models are where it gets good. You can train Document Intelligence on your organisation's specific document formats (internal forms, industry-specific paperwork, legacy templates) with as few as five training samples. We've built document processing pipelines that handle hundreds of documents per hour with accuracy rates above 95%, and the errors are consistently in edge cases (handwritten annotations, heavily redacted documents, poor scan quality).
If you're building any kind of document automation, look at this first. Our Azure AI consulting work frequently starts here because the ROI shows up in the first month.
Translator
Translator supports over 100 languages and handles both text and document translation. The quality for major language pairs (English to Mandarin, Japanese, Korean, Spanish) is solid for business communication. Not literary quality, but clear and accurate enough that you won't embarrass yourself.
For Australian businesses with multilingual customer bases or international operations, the document translation feature pulls its weight. Upload a Word doc or PDF, get it back translated with formatting preserved. It's not perfect. But it's a decent first pass that cuts professional translation costs.
Custom Translator lets you train models on your terminology. If your organisation uses terms that don't translate well out of the box (technical jargon, brand names, industry acronyms), custom training makes a real difference.
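For text translation, the request shape against the Translator v3 REST API looks like this (the key and region are placeholders, and we only build the request here rather than send it):

```python
# Sketch of a Translator v3 text translation request. The endpoint and query
# shape follow the public API; credentials are placeholders.

def build_translate_request(text: str, to_langs: list, key: str, region: str) -> dict:
    params = "&".join(["api-version=3.0"] + [f"to={lang}" for lang in to_langs])
    return {
        "url": f"https://api.cognitive.microsofttranslator.com/translate?{params}",
        "headers": {
            "Ocp-Apim-Subscription-Key": key,
            "Ocp-Apim-Subscription-Region": region,  # required for regional resources
            "Content-Type": "application/json",
        },
        "json": [{"Text": text}],  # the API accepts a batch of texts
    }

req = build_translate_request("Invoice attached.", ["zh-Hans", "ja"], "YOUR_KEY", "australiaeast")
```

Note that one request can target several languages at once - each `to=` parameter adds another translation to the response.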
Language
The Language service bundles a bunch of natural language processing capabilities: sentiment analysis, key phrase extraction, named entity recognition, text summarisation, and conversational language understanding (the successor to LUIS).
Named entity recognition is the one we use most. Feed it a block of text and it identifies people, organisations, locations, dates, quantities, and other entities. We've used it for automatically categorising support tickets, pulling key details out of emails, and building search indexes over unstructured text.
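The ticket-categorisation idea reduces to routing on which entity categories NER found. The category names below follow what azure-ai-textanalytics' `recognize_entities` returns ("Organization", "DateTime", and so on); the routing rules and queue names are invented for illustration:

```python
# Route a support ticket based on the entity categories NER detected in it.
# Categories mirror the Language service's NER output; rules are examples only.

def route_ticket(entity_categories: set) -> str:
    if "Organization" in entity_categories:
        return "accounts"    # mentions a company -> account management queue
    if "DateTime" in entity_categories:
        return "scheduling"  # mentions a date/time -> scheduling queue
    return "general"

queue = route_ticket({"Person", "DateTime"})
```

In production the rules get richer than this, but the pattern holds: NER turns unstructured text into categories you can branch on.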
Sentiment analysis has improved, but I'd still call it "directionally useful" rather than reliable for individual items. Across thousands of customer reviews, sentiment trends tell you something real. On a single customer email? The model sometimes misreads sarcasm or context-dependent tone. Use it for trends, not for routing individual customer interactions.
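The "trends, not individual items" advice can be encoded directly: never act on a single item's label, only summarise the distribution over a batch. The labels mirror what azure-ai-textanalytics' `analyze_sentiment` returns; the batch here is made up:

```python
# Summarise sentiment over a batch instead of trusting any single label.
from collections import Counter

def sentiment_trend(labels: list) -> dict:
    """Share of each sentiment label across a batch of items."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("positive", "neutral", "negative")}

# Invented month of customer reviews
batch = ["positive"] * 60 + ["neutral"] * 25 + ["negative"] * 15
trend = sentiment_trend(batch)
```

A shift in those shares month over month is a signal worth acting on, even when any individual label might be wrong.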
Vision
Computer Vision analyses images and videos: object detection, OCR (optical character recognition), image captioning, spatial analysis. The OCR capability overlaps with Document Intelligence, but Vision is more general-purpose. Need to read text from photos rather than structured documents? Vision is the right tool.
We've used it for quality inspection workflows, automated image categorisation, and extracting text from legacy scanned documents that don't have enough structure for Document Intelligence.
Content Safety
Content Safety detects harmful content in text and images: hate speech, violence, sexual content, self-harm references. If you're building any user-facing AI application, you need content safety filtering. Full stop.
This service runs as part of the broader Foundry platform's safety layer, and you can configure it in your Azure AI Foundry deployments. You set sensitivity thresholds per category, which matters because a healthcare app and a creative writing tool have very different tolerance levels.
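Those per-category thresholds amount to a simple policy check: allow content only if every category's severity stays at or below its configured limit. Severities use Content Safety's 0-7 scale; the threshold values below are invented examples, not recommendations:

```python
# Example content policy: maximum allowed severity per category (0-7 scale).
# These limits are illustrative - tune them to your application's tolerance.
MAX_SEVERITY = {"Hate": 0, "Violence": 2, "Sexual": 0, "SelfHarm": 0}

def is_allowed(severities: dict) -> bool:
    """True only if every category is at or below its configured limit."""
    return all(severities.get(cat, 0) <= limit for cat, limit in MAX_SEVERITY.items())

ok = is_allowed({"Hate": 0, "Violence": 2, "Sexual": 0, "SelfHarm": 0})
blocked = is_allowed({"Hate": 2, "Violence": 0, "Sexual": 0, "SelfHarm": 0})
```

The healthcare-versus-creative-writing point shows up in the numbers: a creative writing tool might raise the Violence limit while keeping the others at zero.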
Azure AI Search
Azure AI Search is technically part of the Foundry Tools family now. It's a managed search service with built-in AI enrichment, and it's the backbone of most RAG (Retrieval-Augmented Generation) implementations on Azure.
If you're building an AI application that needs to answer questions based on your organisation's documents, AI Search is almost certainly part of the architecture. It handles document ingestion, chunking, embedding generation, vector search, and hybrid search (combining traditional keyword search with vector similarity). We've written about this in the context of AI agent development - agents that need to retrieve information from enterprise knowledge bases lean heavily on AI Search.
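Of the steps listed above, chunking is the easiest to show in isolation: split each document into overlapping word windows before embedding and indexing. The window and overlap sizes are illustrative defaults, not tuned recommendations:

```python
# Split text into overlapping word chunks ahead of embedding and indexing.
# Overlap keeps context that straddles a chunk boundary retrievable.

def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list:
    """Chunks of `size` words, each sharing `overlap` words with the previous."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

chunks = chunk_words("alpha bravo charlie delta echo foxtrot", size=4, overlap=2)
```

AI Search's built-in skills can do this for you; understanding the mechanics still helps when retrieval quality is off, because chunk size and overlap are usually the first knobs to turn.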
Services Being Retired
A few older services are being phased out, and if you're using them, it's worth planning a migration:
- Anomaly Detector: Retired. Azure Monitor and custom ML models are the replacements.
- Content Moderator: Being replaced by Content Safety, which is a straight upgrade.
- LUIS (Language Understanding): Replaced by Conversational Language Understanding within the Language service. If you have LUIS apps in production, the migration path is clear but you still have to do the work.
- QnA Maker: Replaced by custom question answering in the Language service.
- Metrics Advisor and Personalizer: Both retired.
If any of these are in your production systems, don't panic, but do put migration on your roadmap. Microsoft typically gives 12+ months notice before actually switching services off.
Pricing: The Free Tier Is Real
Most Foundry Tools have a real free tier (the F0 SKU) that's actually useful for development and testing. The limits vary by service, but for Document Intelligence you get 500 pages per month free, and for Speech you get five hours of transcription per month. That's enough to build and test a proof of concept before spending anything.
Production pricing is usage-based. You pay per transaction, per page, per audio minute, or per character depending on the service. The costs are fair for what you get. Document Intelligence at scale costs a fraction of what manual data entry would, and speech-to-text is way cheaper than human transcription rates.
The gotcha is that costs stack up fast at high volumes. A document processing pipeline that handles 50,000 invoices per month needs a proper cost estimate before going live. We always model costs during the design phase. Learning about them from your first Azure bill is not fun.
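Modelling costs during design can start as simply as this: per-unit rate times volume, less any free allowance. The rates below are placeholders for illustration (and the F0 free tier is treated as a simple offset for clarity) - always take real numbers from the current Azure pricing page:

```python
# Back-of-envelope monthly cost estimate. Rates here are invented placeholders;
# the free allowance is modelled as a simple offset for illustration.

def monthly_cost(units: int, rate_per_unit: float, free_units: int = 0) -> float:
    """Estimated monthly spend after the free allowance."""
    return max(units - free_units, 0) * rate_per_unit

# 50,000 invoice pages at a placeholder $0.01/page with a 500-page allowance
estimate = monthly_cost(50_000, 0.01, free_units=500)  # roughly $495/month
```

Even a rough model like this, run at design time, is what catches the "50,000 invoices per month" surprise before the first bill does.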
How They Fit Into the Broader Foundry Platform
Foundry Tools aren't standalone anymore. They're integrated into the Microsoft Foundry platform alongside Foundry Models (GPT-4o, Claude, etc.) and the Foundry Agent Service. In practice, that means you can combine pre-built AI capabilities with custom model inference in a single application.
Here's what that looks like: an AI automation pipeline that receives a scanned document, uses Document Intelligence to extract structured data, sends the extracted text to GPT-4o for classification and summarisation, then uses the Language service to identify entities. All orchestrated through a single Foundry project with unified authentication, monitoring, and governance.
That composability is the real selling point. Individual services are fine on their own. Combine them well and they handle workflows that would take months to build from scratch.
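The document-to-entities flow above, sketched as a plain pipeline. Every stage here is a stub standing in for the real service call (Document Intelligence, a Foundry model, the Language service) - only the chaining is real:

```python
# Pipeline sketch: each function is a stub for a real Foundry Tools / model
# call. The point is the composition, not the stub logic.

def extract_text(document: bytes) -> str:   # stand-in for Document Intelligence
    return document.decode("utf-8")

def summarise(text: str) -> str:            # stand-in for a Foundry model call
    return text.split(".")[0] + "."

def find_entities(text: str) -> list:       # stand-in for the Language service
    return [word for word in text.split() if word.istitle()]

def pipeline(document: bytes) -> dict:
    text = extract_text(document)
    return {"summary": summarise(text), "entities": find_entities(text)}

result = pipeline(b"Acme invoiced Contoso. Payment due Friday.")
```

Replace each stub with the corresponding service client and you have the shape of the production pipeline, with the unified authentication and monitoring handled by the Foundry project around it.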
Practical Advice
Start with pre-built models. Before investing in custom model training, test the base models against your actual data. They've gotten good enough that they handle many use cases without customisation.
Check regional availability. Not every service tier is available in Australian Azure regions. For latency-sensitive applications (real-time speech, interactive document processing), running in Australia East or Australia Southeast matters. For batch processing, deploying to a US region with better availability and lower pricing is a fair trade-off.
Think about multi-service resources. You can provision a single Foundry Tools resource that gives you access to multiple services with one endpoint and key. Simpler to manage, but cost attribution gets messy if multiple teams share the resource. For production, we typically recommend separate resources per service or per team so you can actually track what's costing what.
Version your custom models. If you're training custom models (Custom Speech, custom Document Intelligence models, Custom Translator), version them properly and keep training datasets alongside them. When you retrain with new data, compare accuracy against the previous version before promoting to production.
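The "compare before promoting" rule can be encoded directly in a deployment script: only promote a retrained model when it beats the current version on the same evaluation set, optionally by a minimum margin. The accuracy numbers below are invented:

```python
# Gate model promotion on an accuracy comparison against the live version,
# evaluated on the same held-out dataset. Numbers are illustrative.

def should_promote(current_accuracy: float, candidate_accuracy: float,
                   min_gain: float = 0.0) -> bool:
    """Promote only when the retrained model is at least `min_gain` better."""
    return candidate_accuracy >= current_accuracy + min_gain

promote = should_promote(0.952, 0.961, min_gain=0.005)  # +0.009 gain clears the bar
```

The `min_gain` margin is worth having: it stops you churning production deployments over differences that are within evaluation noise.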
The Foundry Tools documentation on Microsoft Learn covers each service in detail. For the full picture of how these fit into an AI project on Azure, start with the Foundry Tools overview. If you need help figuring out which services fit your specific situation, that's the kind of problem we work through with Australian businesses every week.