Azure AI Architecture Patterns for Australian Enterprise
Architecture decisions made early in an AI project determine what's possible later. Get the architecture right, and your AI system scales, stays secure, and remains maintainable. Get it wrong, and you spend months rebuilding what should have taken weeks.
After building dozens of Azure AI systems for Australian enterprises, we've settled on patterns that work. Not theoretical reference architectures from Microsoft's documentation, but patterns refined through production deployment, real data, and actual users breaking things in ways we didn't anticipate.
Here are the seven architecture patterns we use most frequently.
Pattern 1 - Retrieval-Augmented Generation (RAG)
This is the most common pattern we implement. If you're building any kind of knowledge assistant, Q&A system, or search-and-summarise application, RAG is your starting point.
How It Works
- Ingestion pipeline processes your documents - PDFs, Word docs, web pages, database records - into chunks
- Embedding model (Azure OpenAI text-embedding-3-small or text-embedding-3-large) converts each chunk into a vector
- Azure AI Search stores the vectors alongside the original text and metadata
- When a user asks a question, the question is embedded and matched against the stored vectors
- The top matching chunks are retrieved and passed to Azure OpenAI GPT-4o as context
- GPT-4o generates an answer grounded in the retrieved content
Azure Services
[Document Sources] -> [Azure Functions / Data Factory]
|
[Document Intelligence] (OCR, layout)
|
[Azure OpenAI Embeddings]
|
[Azure AI Search] (vector + keyword index)
|
[User Query] -> [Azure OpenAI Embeddings] -> [Search] -> [GPT-4o] -> [Response]
Key Design Decisions
Chunking strategy: This matters more than most teams realise. Too small and you lose context. Too large and you dilute relevance. We typically use 500-1000 token chunks with 100-200 token overlap. But the right size depends on your content - legal documents need larger chunks than FAQ entries.
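A minimal sketch of overlapping chunking. Sizes here are counted in whitespace-delimited words as a rough stand-in for tokens; a production pipeline would count real tokens with a tokeniser such as tiktoken, and the 800/150 defaults are illustrative, not a recommendation for your content.

```python
def chunk_text(text, chunk_size=800, overlap=150):
    """Split text into overlapping chunks.

    Sizes are in whitespace-delimited words as a rough proxy for
    tokens; swap in a real tokeniser (e.g. tiktoken) for production.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the document
    return chunks
```

The overlap means the tail of each chunk reappears at the head of the next, so a sentence that straddles a boundary is still retrievable as a whole from at least one chunk.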
Hybrid search: Don't rely on vector search alone. Azure AI Search supports hybrid search (vector + keyword) which consistently outperforms either approach independently. Enable semantic reranking for an additional accuracy boost.
Metadata filtering: Tag your documents with metadata (department, document type, date, confidentiality level) and use filters in your search queries. A question about HR policy shouldn't search engineering documentation.
Citation and grounding: Always return source references with AI-generated answers. This builds user trust and makes it possible to verify the AI's claims. Azure AI Search returns document IDs and content snippets that you can link back to original sources.
When to Use This Pattern
- Internal knowledge bases and policy assistants
- Customer support with product documentation
- Research and analysis over document collections
- Any "ask questions about our data" use case
Production Considerations for Australian Enterprise
- Deploy Azure AI Search with at least 2 replicas for high availability
- Use Private Link for both Azure AI Search and Azure OpenAI
- Implement document-level security so users only see content they're authorised to access
- Log all queries and responses for audit purposes
- Budget: $1,500-$3,000/month AUD for a production S1 Azure AI Search + Azure OpenAI consumption. See our pricing breakdown for details.
Pattern 2 - Intelligent Document Processing
The second most common pattern: automating the extraction of structured data from unstructured documents.
How It Works
- Documents arrive via email, upload, or system integration
- Azure Document Intelligence extracts text, tables, and structure
- For standard documents (invoices, receipts), prebuilt models extract key fields
- For complex or non-standard documents, Azure OpenAI GPT-4o analyses the extracted content
- Extracted data is validated against business rules
- Validated data flows into downstream systems (ERP, CRM, database)
Azure Services
[Document Intake] -> [Azure Blob Storage]
|
[Azure Document Intelligence]
|
[Azure OpenAI GPT-4o] (complex extraction)
|
[Validation Logic] (Azure Functions)
|
[Business System] (ERP, CRM, Database)
Key Design Decisions
Prebuilt vs custom models: Test Azure Document Intelligence's prebuilt models first. For Australian invoices, receipts, and ID documents, they work surprisingly well. Only train custom models when prebuilt accuracy isn't sufficient. See our guide on prebuilt vs custom models.
Two-stage extraction: Use Document Intelligence for structural extraction (tables, key-value pairs, layout) and GPT-4o for semantic understanding (what does this clause mean, is this invoice compliant, summarise this report). This plays to each service's strengths.
Confidence scoring: Azure Document Intelligence returns confidence scores for each extracted field. Route low-confidence extractions to human review automatically. This gives you automation where the AI is confident and human oversight where it isn't.
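The routing itself is a few lines once you've assembled field-level confidences from the analysis result. A sketch, with an illustrative 0.85 threshold that you'd tune per document type:

```python
def route_extraction(fields, threshold=0.85):
    """Split extracted fields into auto-accepted and human-review sets.

    `fields` maps field name -> (value, confidence), the shape you can
    assemble from a Document Intelligence analysis result. The 0.85
    threshold is illustrative; tune it per document type.
    """
    accepted, review = {}, {}
    for name, (value, confidence) in fields.items():
        if confidence >= threshold:
            accepted[name] = value          # automate with confidence
        else:
            review[name] = (value, confidence)  # queue for a human
    return accepted, review
```

Anything landing in `review` goes to the human queue with the low-confidence value pre-filled, so reviewers correct rather than re-key.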
Error handling: Documents fail. Scanned PDFs with poor resolution, unusual layouts, handwritten annotations. Build a human review queue from day one, not as an afterthought.
Real Numbers
For an Australian client processing 20,000 documents per month:
- Azure Document Intelligence: ~$300/month
- Azure OpenAI (for complex documents, ~30% of volume): ~$200/month
- Azure Functions compute: ~$50/month
- Azure Blob Storage: ~$20/month
- Total: ~$570/month for a system that replaced 1.5 FTE of manual data entry
Pattern 3 - AI Agent with Tool Use
For applications where the AI needs to take actions, not just answer questions. Customer service agents that can look up orders, field service assistants that can create tickets, internal tools that can query databases.
How It Works
- User sends a message (text, voice, or structured input)
- Azure OpenAI GPT-4o interprets the intent
- The model decides which tool(s) to call based on the available functions
- Tool gateway executes the function call (API request, database query, system action)
- Results are returned to the model for interpretation
- Model generates a response incorporating the tool results
- High-risk actions route through approval before execution
Azure Services
[User Interface] -> [Azure App Service / Container Apps]
|
[Azure OpenAI GPT-4o] (with function calling)
|
[Tool Gateway] (Azure Functions)
|
┌─────────────┼─────────────┐
| | |
[CRM API] [ERP API] [Database]
Key Design Decisions
Function definitions: Each tool the agent can use needs a clear, well-documented function definition. The quality of your function definitions directly impacts how well the agent chooses and uses tools. Invest time in writing precise parameter descriptions.
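Here's what that investment looks like in the function-calling tool format Azure OpenAI accepts. The order-lookup tool itself is hypothetical; the point is the precision of the description and parameter docs, which are the only things the model sees when deciding whether and how to call it.

```python
# A tool definition in the Azure OpenAI function-calling format.
# "lookup_order" is a hypothetical tool for illustration.
lookup_order_tool = {
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": (
            "Look up a single customer order by its order number. "
            "Use this when the user asks about order status, delivery "
            "date, or order contents. Do not use it for refunds."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "order_number": {
                    "type": "string",
                    "description": (
                        "Order number in the form 'ORD-123456'. "
                        "Ask the user for it if not provided."
                    ),
                },
            },
            "required": ["order_number"],
        },
    },
}
```

Note the negative guidance ("Do not use it for refunds") and the format example in the parameter description; both measurably reduce wrong-tool and malformed-argument calls.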
Risk tiers: Not all actions are equal. Reading data is low-risk. Updating a customer record is medium-risk. Processing a refund is high-risk. Implement tiered approval based on action risk:
| Risk Level | Action Examples | Approval Required |
|---|---|---|
| Low | Data lookup, status check | None (autonomous) |
| Medium | Update record, send internal notification | Logged, reviewable |
| High | Financial transaction, customer communication, data deletion | Human approval required |
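The table above translates into a small gate at the tool-gateway layer. A sketch, with illustrative action names; the key design choice is that unknown actions fail safe to the high-risk tier.

```python
RISK_TIERS = {
    # Action names are illustrative; map your real tool names here.
    "lookup_order": "low",
    "update_record": "medium",
    "process_refund": "high",
}

def check_approval(action, approved_by=None):
    """Gate an action on its risk tier.

    Returns (allowed, audit_event). Low-risk runs autonomously,
    medium-risk runs but is logged for review, high-risk requires an
    explicit human approver before execution.
    """
    tier = RISK_TIERS.get(action, "high")  # unknown actions fail safe
    if tier == "high" and approved_by is None:
        return False, {"action": action, "tier": tier,
                       "status": "pending_approval"}
    status = "logged" if tier == "medium" else "executed"
    return True, {"action": action, "tier": tier,
                  "status": status, "approved_by": approved_by}
```

Every call produces an audit event regardless of outcome, which feeds directly into the audit logging requirements covered later.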
Conversation memory: Azure Cosmos DB is our go-to for conversation state. It handles the read-heavy, low-latency access pattern well; you rarely need its global distribution for this workload, but you do need its consistently fast reads.
Guardrails: Constrain what the agent can do at the tool gateway level, not just in the prompt. If the agent shouldn't be able to delete records, don't give it a delete function. Defence in depth.
When to Use This Pattern
- Customer service automation
- Internal IT helpdesk
- Field service assistants
- Any workflow where users need to both query and act on business data
Pattern 4 - Batch Intelligence Pipeline
Not everything needs real-time inference. Some workloads are better served by batch processing: analyse a month's worth of customer feedback, classify 50,000 support tickets, extract data from a backlog of documents.
How It Works
- Azure Data Factory or Azure Functions (timer trigger) kicks off the pipeline on a schedule
- Input data is read from the source (database, file storage, API)
- Data is processed in parallel through Azure OpenAI or other AI services
- Results are written to the destination (database, data warehouse, report)
- Summary metrics and errors are logged for monitoring
Azure Services
[Scheduler] -> [Azure Data Factory / Functions]
|
[Data Source] (SQL, Blob, API)
|
[Processing Pool] (Azure Functions with concurrency)
|
[Azure OpenAI / AI Services]
|
[Output Store] (SQL, Cosmos DB, Blob)
|
[Monitoring / Alerts]
Key Design Decisions
Rate limiting: Azure OpenAI has rate limits (tokens per minute, requests per minute). Your batch pipeline needs to respect these. Implement exponential backoff and queue-based processing rather than hammering the API as fast as possible.
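A sketch of the backoff policy. `RateLimitError` is a stand-in for the 429 error your client raises (the official SDK exposes `openai.RateLimitError`), and `sleep` is injectable so the policy can be tested without actually waiting.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's 429 error type."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff plus jitter on rate limits."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # budget exhausted; surface to the caller
            # 1s, 2s, 4s, ... plus jitter so parallel workers
            # don't all retry in lockstep
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

In a queue-based pipeline, the final re-raise should dead-letter the message rather than drop it, so nothing silently disappears.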
Cost management: Batch processing large volumes through GPT-4o can get expensive quickly. Use GPT-4o-mini for tasks that don't require the full model's reasoning capability. A routing layer that sends simple items to the cheaper model and complex items to GPT-4o can cut costs by 50-70%.
Idempotency: Batch pipelines fail partway through. Design each processing step to be idempotent so you can safely re-run without duplicating results.
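The mechanics are simple: derive a deterministic key per item and skip anything already recorded. In the sketch below `store` is dict-like for illustration; in production it would be a table in Cosmos DB or SQL, written in the same transaction as the result.

```python
import hashlib

def item_key(item_id, pipeline_version="v1"):
    """Deterministic key for a batch item; re-runs yield the same key."""
    return hashlib.sha256(f"{pipeline_version}:{item_id}".encode()).hexdigest()

def process_batch(items, process, store):
    """Process items idempotently: skip anything already in `store`."""
    for item in items:
        key = item_key(item["id"])
        if key in store:
            continue  # already processed on a previous run
        store[key] = process(item)
    return store
```

Including the pipeline version in the key means bumping the version deliberately reprocesses everything, while a crash-and-rerun with the same version touches only the unfinished items.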
Parallel processing: Azure Functions with queue-based triggering gives you natural parallelism. Put items on an Azure Service Bus queue, and Azure Functions scales out to process them concurrently, within your rate limits.
Real Numbers
A client processes 100,000 customer survey responses monthly:
- Azure OpenAI (GPT-4o-mini for classification, GPT-4o for detailed analysis of flagged items): ~$150/month
- Azure Functions: ~$30/month
- Azure Service Bus: ~$15/month
- Azure SQL Database: ~$100/month
- Total: ~$295/month for insights that previously required a team of analysts working for two weeks
Pattern 5 - Multi-Model Orchestration
For complex tasks that benefit from different models handling different parts of the problem.
How It Works
- Input arrives and is analysed for complexity
- A routing layer assigns the task to the appropriate model or chain of models
- Simple tasks go to GPT-4o-mini (fast, cheap)
- Complex reasoning tasks go to GPT-4o or o3 (smarter, slower, more expensive)
- Specialised tasks go to purpose-built models (Document Intelligence for OCR, Speech Service for transcription)
- Results are aggregated and returned
Key Design Decisions
Router design: The router can be rule-based (if document type = invoice, use Document Intelligence) or LLM-based (use GPT-4o-mini to classify the task complexity and route accordingly). Rule-based is cheaper and more predictable. LLM-based is more flexible.
Model fallback: If GPT-4o-mini produces a low-confidence result, escalate to GPT-4o automatically. This gives you the cost efficiency of the smaller model with the quality backstop of the larger one.
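Routing and fallback combine naturally into one function. A sketch: `run_mini` and `run_full` stand in for calls to GPT-4o-mini and GPT-4o, each returning a result plus a confidence score, and the 0.8 threshold is illustrative.

```python
def run_with_routing(task, run_mini, run_full, confidence_threshold=0.8):
    """Route to the cheap model first; escalate on low confidence.

    `run_mini` / `run_full` stand in for GPT-4o-mini / GPT-4o calls,
    each returning (result, confidence). Returns (result, model_used).
    """
    if task.get("complexity") == "high":
        result, _ = run_full(task)      # rule-based: skip the cheap model
        return result, "gpt-4o"
    result, confidence = run_mini(task)
    if confidence >= confidence_threshold:
        return result, "gpt-4o-mini"    # cheap model was good enough
    result, _ = run_full(task)          # fallback: escalate to GPT-4o
    return result, "gpt-4o"
```

Logging which branch each task took gives you the data to tune the threshold: if almost everything escalates, the cheap pass is wasted latency; if almost nothing does, you may be able to lower the threshold further.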
Latency budgeting: Each model call adds latency. If your total latency budget is 3 seconds, and you're chaining 3 model calls, each needs to complete in under a second. Design your chain with latency constraints in mind.
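One way to enforce this is a budget object threaded through the chain: ask it for the remaining time before each call and pass that as the request timeout. A sketch, with an injectable clock so the behaviour is testable:

```python
import time

class LatencyBudget:
    """Track a total latency budget across a chain of model calls.

    Call remaining() before each model call and use it as that call's
    timeout; once the budget is spent, abort the chain or degrade to
    a cached or partial answer.
    """
    def __init__(self, total_seconds, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + total_seconds

    def remaining(self):
        return max(0.0, self._deadline - self._clock())

    def exhausted(self):
        return self.remaining() == 0.0
```

A fast first call leaves more headroom for later ones, which is friendlier than fixed per-call timeouts when chain steps have variable latency.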
Pattern 6 - Event-Driven AI Processing
For organisations that need AI to respond to business events in near-real-time.
How It Works
- Business event occurs (new email, document uploaded, ticket created, sensor reading)
- Event is published to Azure Event Grid or Azure Service Bus
- Azure Functions picks up the event and sends relevant data to AI services
- AI processes the data (classify, extract, analyse, generate)
- Result triggers downstream actions (update record, send notification, create task)
Azure Services
[Event Source] -> [Azure Event Grid / Service Bus]
|
[Azure Functions]
|
[Azure OpenAI / AI Services]
|
[Action] (API call, database update, notification)
When to Use This Pattern
- Email triage and auto-routing
- Real-time document classification as files are uploaded
- Automated responses to customer enquiries
- IoT sensor data analysis with threshold alerts
- Compliance monitoring on transactions
The event-driven pattern is particularly good for Australian financial services organisations that need to monitor transactions for compliance in near-real-time without adding latency to the transaction itself.
Pattern 7 - AI-Augmented Search
Going beyond basic RAG to build search experiences where AI understands what users actually want, not just what they typed.
How It Works
- User enters a search query (natural language)
- Azure AI Search performs hybrid search (keyword + vector + semantic reranking)
- Top results are passed to Azure OpenAI GPT-4o for answer synthesis
- AI generates a direct answer with citations, plus returns the ranked search results
- Users can drill into source documents for verification
This is RAG's more sophisticated cousin. The difference is that the search experience itself is the product, not just a retrieval step for answer generation.
Key Design Decisions
Faceted search with AI: Use Azure AI Search's faceting alongside AI-generated answers. Users get both a direct answer and the ability to filter and browse results by metadata.
Query understanding: Use GPT-4o-mini to reformulate vague user queries before sending them to search. "What's our policy on WFH?" becomes a structured search for work-from-home policies with relevant metadata filters applied.
Freshness: For document collections that change frequently, implement incremental indexing with Azure AI Search's indexers. Stale search results are worse than no AI at all.
Cross-Cutting Concerns for Australian Enterprise
These apply across all patterns:
Security
- Managed identities for all service-to-service authentication. No API keys in code.
- Private Link for Azure OpenAI, Azure AI Search, and Azure Storage. No public internet traffic for data.
- Entra ID for user authentication with conditional access policies.
- Azure Key Vault for any secrets that can't be avoided (third-party API keys).
- Network Security Groups restricting traffic between subnets.
Observability
- Azure Monitor for infrastructure metrics
- Application Insights for application-level telemetry
- Custom logging for AI-specific metrics: prompt tokens, completion tokens, response latency, confidence scores, user feedback
- Dashboards in Azure Monitor Workbooks or Power BI for stakeholder visibility
Cost Control
- Azure Cost Management with budgets and alerts per resource group
- Model routing to use cheaper models where possible
- Caching for repeated queries (Azure Redis Cache or application-level)
- Autoscaling to avoid over-provisioning (Container Apps and Functions handle this well)
- Monthly cost reviews with the team
Compliance and Governance
- Content Safety filters on all Azure OpenAI deployments
- Audit logging for all AI interactions (who asked what, what was returned)
- Data retention policies aligned with your organisation's requirements
- Model versioning so you can track which model version produced each output
- Regular testing against bias and fairness benchmarks
Choosing the Right Pattern
| Use Case | Primary Pattern | Complexity | Typical Timeline |
|---|---|---|---|
| Knowledge assistant / Q&A | RAG | Medium | 6-12 weeks |
| Document processing | Intelligent Document Processing | Medium | 4-10 weeks |
| Customer service automation | AI Agent with Tool Use | High | 10-16 weeks |
| Bulk data analysis | Batch Intelligence | Low-Medium | 4-8 weeks |
| Complex multi-step tasks | Multi-Model Orchestration | High | 12-20 weeks |
| Real-time event response | Event-Driven AI | Medium | 6-12 weeks |
| Enterprise search | AI-Augmented Search | Medium-High | 8-14 weeks |
Most real-world systems combine 2-3 of these patterns. A customer service platform might use RAG for knowledge retrieval, Agent pattern for action execution, and Event-Driven for ticket processing.
Getting Architecture Right
Architecture mistakes are expensive to fix later. If you're planning an Azure AI system for your organisation, investing in architecture design upfront pays for itself many times over.
We help Australian enterprises design and build Azure AI systems that work in production, not just in demos. Whether you need an architecture review, a proof of concept, or full implementation support, talk to our team.
Explore our Azure AI consulting services for enterprise AI architecture, or see our Azure AI Foundry consulting for hands-on implementation. For a broader view of how we approach AI consulting, visit our services page.