
How to Build a RAG Application with LangChain and Azure - The Step-by-Step Guide

May 26, 2026 | 9 min read | Michael Ridland

If you're reading this, you've probably already decided that retrieval-augmented generation is the right approach for your use case. You've got documents, a knowledge base, maybe a SharePoint full of policy PDFs, and you want a chatbot or agent that can answer questions accurately without making things up. Good. That's the right problem for RAG to solve.

What you're trying to figure out now is whether LangChain on Azure is the right stack, what the build actually looks like, and what it costs. I've built RAG systems for Australian clients ranging from a 40-person professional services firm in Brisbane to a national retailer with seven figures of monthly Azure spend. The architecture is broadly the same. The decisions that matter are usually about pragmatism, not novelty.

This is the guide I wish I'd had three years ago.

What you actually get from RAG (and what you don't)

A RAG application takes a user question, finds the relevant chunks of information from your documents, stuffs them into a prompt with the question, and lets an LLM answer based on that context. That's it. There's no magic.

What you get: answers grounded in your actual data, with citations back to source documents. The model can say "I don't know" instead of inventing facts. You can update the knowledge base without retraining anything.

What you don't get: deterministic answers. You don't get an agent that can take actions on your behalf (that's a different pattern). You don't get something that "understands" your business the way a person does. And you don't get free accuracy. If your documents are contradictory, outdated, or poorly written, your RAG app will be too.

We've turned down two RAG projects in the last six months because the client didn't actually have the documents they thought they had. One had a SharePoint with 11 years of meeting minutes and zero policy documentation. RAG was not the answer for them.

The reference architecture on Azure

For most Australian businesses, the stack looks like this:

  • Azure OpenAI for the LLM (GPT-4.1 or GPT-4o for the answering model, text-embedding-3-large for embeddings)
  • Azure AI Search as the vector store with hybrid search
  • Azure Blob Storage for source documents
  • Azure Functions or Container Apps for the ingestion pipeline
  • LangChain as the orchestration layer in Python or TypeScript
  • Azure App Service or Container Apps for the API
  • Application Insights for tracing and observability

Why these choices? Data sovereignty is a real concern in Australia. Running everything in the Australia East region keeps you on the right side of most procurement and compliance conversations. Azure OpenAI gives you the OpenAI models with enterprise data handling commitments. Azure AI Search has a hybrid search mode that combines keyword and vector search and outperforms pure vector search on almost every benchmark I've seen.

I'd skip Pinecone, Weaviate, and Qdrant for most Microsoft-shop clients unless there's a specific reason. The integration with Microsoft Entra ID (formerly Azure AD), the procurement story, and the lack of a second vendor relationship all push toward Azure AI Search.

Step 1 - Document ingestion

This is where most projects either succeed or quietly die. Get this right and the rest is mechanical. Get it wrong and you'll spend months chasing hallucinations that are actually retrieval failures.

The basic flow:

  1. Pull documents from source (SharePoint, blob storage, a CRM, whatever)
  2. Extract text (PDFs with Azure Document Intelligence, Word with python-docx, HTML with BeautifulSoup)
  3. Chunk the text
  4. Generate embeddings
  5. Push to Azure AI Search

LangChain's document loaders cover most common formats. For PDFs with tables, scanned content, or any structure that matters, use Azure Document Intelligence with the layout model. It costs around AUD $1.50 per 1,000 pages and the quality difference vs. raw PDF extraction is enormous.

Chunking is the most important decision you'll make. Default fixed-size chunking (say 1,000 characters with 200 overlap) works for narrative text. It is terrible for policy documents, manuals, or anything with sections. We almost always end up with a custom chunker that respects document structure. For SharePoint policy libraries this might mean chunking by section heading. For product catalogues it means one chunk per product.
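
The fixed-size baseline is simple enough to sketch in plain Python. This is an illustration of the idea, not LangChain's own splitter; the function name `chunk_fixed` is made up for the example.

```python
def chunk_fixed(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` characters.

    Each window starts `size - overlap` characters after the previous one,
    so consecutive chunks share `overlap` characters.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# 2,500 characters -> windows starting at 0, 800 and 1600
print([len(chunk) for chunk in chunk_fixed("x" * 2500)])
```

Notice that nothing here knows where a section starts or ends, which is exactly why it struggles with policy documents.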

LangChain ships with RecursiveCharacterTextSplitter and MarkdownHeaderTextSplitter, which cover a lot of ground. Don't be afraid to write your own splitter. It's usually 50 lines of Python and saves you weeks of debugging downstream.
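
To give a feel for how small a custom splitter can be, here is a minimal sketch that chunks by section heading. It assumes the extracted text uses Markdown-style `#` headings, which is an assumption about your extraction step; `chunk_by_heading` and the chunk shape are hypothetical.

```python
import re

# Assumption: headings look like "# Title" or "## Title" after extraction.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def _emit(chunks: list[dict], title: str, body_lines: list[str]) -> None:
    """Append a chunk unless the section body is empty."""
    text = "\n".join(body_lines).strip()
    if text:
        chunks.append({"section": title, "text": text})


def chunk_by_heading(markdown: str) -> list[dict]:
    """Return one chunk per heading-delimited section, keeping the heading as metadata."""
    chunks: list[dict] = []
    title, body_lines = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            _emit(chunks, title, body_lines)  # close the previous section
            title, body_lines = match.group(2).strip(), []
        else:
            body_lines.append(line)
    _emit(chunks, title, body_lines)  # close the final section
    return chunks


doc = """# Leave policy
Intro text.

## Parental leave
Casual staff are covered.

## Annual leave
Four weeks per year.
"""
for chunk in chunk_by_heading(doc):
    print(chunk["section"], "->", chunk["text"])
```

Keeping the heading as metadata means it can later be indexed as a filterable field or prepended to the chunk text before embedding.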

Step 2 - Embeddings and the vector store

Use text-embedding-3-large from Azure OpenAI. It's 3072 dimensions, which costs more storage than the smaller models but produces measurably better retrieval. For an Australian SMB with under a million chunks, the cost difference is negligible (we're talking tens of dollars per month).
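
A quick back-of-envelope check on the storage side. This sketch assumes vectors are stored as 32-bit floats and ignores index overhead, so treat the result as a rough lower bound rather than what your bill will show.

```python
DIMENSIONS = 3072        # text-embedding-3-large output size
BYTES_PER_FLOAT32 = 4    # assumption: uncompressed 32-bit floats


def vector_storage_gib(num_chunks: int, dims: int = DIMENSIONS) -> float:
    """Raw vector payload in GiB, excluding text fields and index overhead."""
    return num_chunks * dims * BYTES_PER_FLOAT32 / 1024**3


print(round(vector_storage_gib(1_000_000), 2))             # 3072-dimension vectors
print(round(vector_storage_gib(1_000_000, dims=1536), 2))  # half the dimensions, half the space
```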

In Azure AI Search, create your index with both a vector field and the original text. Add filters for any metadata you'll want to scope queries by (document type, date, department, customer tier). Filterable metadata is the single biggest accuracy lever you have. If a user asks about "leave policy" and you can filter to HR documents, you've eliminated 95% of the noise before the LLM ever sees a chunk.
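
Azure AI Search filters are written as OData expressions. As a small sketch, here is a helper that builds an equality filter from metadata values. The field names `department` and `doc_type` are hypothetical and need to match whatever filterable fields you defined in your own index.

```python
def odata_filter(**equals: str) -> str:
    """Build an OData filter of `field eq 'value'` clauses joined with `and`.

    Single quotes inside values are escaped by doubling them, as OData requires.
    """
    clauses = []
    for field, value in equals.items():
        escaped = value.replace("'", "''")
        clauses.append(f"{field} eq '{escaped}'")
    return " and ".join(clauses)


print(odata_filter(department="HR", doc_type="policy"))
```

The resulting string is what you would pass as the filter on the search request, alongside the query text and the query vector.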

Set up hybrid search with semantic ranking enabled. The semantic ranker in Azure AI Search is included in the Standard tier and above and consistently improves results by 10-15% in our testing.
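
It helps to know how hybrid search combines its two result lists. Azure AI Search merges the keyword and vector rankings with Reciprocal Rank Fusion (RRF). The service does this for you; the sketch below only illustrates the idea, and the constant `k = 60` is a commonly used value that I'm assuming here.

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) in every list it appears in;
    documents found by both searches accumulate a higher total.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
print(rrf_merge([keyword_hits, vector_hits]))
```

Documents that both searches agree on ("a" and "c" here) rise above ones that only one search found.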

A reasonable Australia East setup:

  • Azure AI Search Standard S1: around AUD $400/month
  • Storage for embeddings: usually under AUD $50/month for SMB workloads
  • Embedding generation: roughly AUD $0.18 per million tokens

For a 500,000 chunk knowledge base, expect total monthly infrastructure of AUD $600-900 before query costs.
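
The one-off embedding bill is a small part of that total. A rough check, using the AUD $0.18 per million tokens rate above and assuming about 250 tokens per chunk (an illustrative figure, not a measurement):

```python
PRICE_PER_MILLION_TOKENS_AUD = 0.18  # embedding rate quoted above
TOKENS_PER_CHUNK = 250               # assumption: depends on your chunking


def embedding_cost_aud(num_chunks: int, tokens_per_chunk: int = TOKENS_PER_CHUNK) -> float:
    """One-off cost of embedding every chunk once."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_AUD


print(round(embedding_cost_aud(500_000), 2))
```

Re-embedding after a chunking change costs the same again, which is cheap enough that it shouldn't stop you from iterating on the chunker.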

Step 3 - The retrieval and generation chain

Here's where LangChain earns its keep. The LCEL (LangChain Expression Language) syntax lets you compose chains that are easy to test and observable.

A minimum viable RAG chain looks like this in pseudocode:

  1. Take the user query
  2. Optionally rewrite it (handle pronouns, expand acronyms, generate sub-queries)
  3. Retrieve top-k chunks from Azure AI Search using hybrid search
  4. Rerank with a cross-encoder if accuracy matters (we use Cohere's reranker via Azure Marketplace)
  5. Format chunks into a prompt with clear citation markers
  6. Call Azure OpenAI with the formatted prompt
  7. Parse citations back out of the response
  8. Return answer + sources
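
The steps above can be sketched in plain Python with the retriever and the model passed in as functions. This is not LCEL and not a LangChain API. It is a dependency-free outline of the same flow, and every name in it (`answer_question`, `format_prompt`, the chunk shape) is hypothetical.

```python
import re
from typing import Callable

Chunk = dict  # assumed shape: {"id": str, "text": str}


def format_prompt(question: str, chunks: list[Chunk]) -> str:
    """Step 5: number each chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(f"[{i}] {c['text']}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the numbered sources and cite them like [1]. "
        "If the sources don't cover the question, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


def parse_citations(answer: str, chunks: list[Chunk]) -> list[str]:
    """Step 7: map [n] markers in the answer back to chunk ids, ignoring invalid numbers."""
    cited: list[str] = []
    for number in re.findall(r"\[(\d+)\]", answer):
        index = int(number) - 1
        if 0 <= index < len(chunks) and chunks[index]["id"] not in cited:
            cited.append(chunks[index]["id"])
    return cited


def answer_question(
    question: str,
    retrieve: Callable[[str], list[Chunk]],
    generate: Callable[[str], str],
    rewrite: Callable[[str], str] = lambda q: q,
    rerank: Callable[[str, list[Chunk]], list[Chunk]] = lambda q, c: c,
    top_k: int = 5,
) -> dict:
    query = rewrite(question)                   # step 2 (optional)
    candidates = retrieve(query)                # step 3
    chunks = rerank(query, candidates)[:top_k]  # step 4 (optional)
    answer = generate(format_prompt(question, chunks))  # steps 5 and 6
    return {"answer": answer, "sources": parse_citations(answer, chunks)}  # steps 7 and 8


# Stubs stand in for Azure AI Search and Azure OpenAI
fake_index = [{"id": "hr-leave.pdf#3", "text": "Non-permanent employees accrue no paid parental leave."}]
result = answer_question(
    "What about parental leave for casuals?",
    retrieve=lambda query: fake_index,
    generate=lambda prompt: "Casual staff do not accrue paid parental leave [1].",
)
print(result)
```

Passing the retriever and model in as functions is what makes the chain easy to test: swap the stubs for real clients in production and keep the stubs in your unit tests.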

Query rewriting matters more than people realise. A user asks "what about parental leave for casuals?" - your vector search won't find the right chunk because the policy document says "non-permanent employees" not "casuals". A small rewriting step using gpt-4o-mini to expand the query against a glossary of company terms can lift recall by 20-30%.
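
To show the shape of that step without calling a model, here is a deterministic stand-in: a dictionary lookup that appends glossary terms to the query. In a real build the LLM does this with more nuance. The glossary entries and the `expand_query` name are invented for the example.

```python
import re

# Hypothetical glossary mapping everyday terms to the wording used in the documents
GLOSSARY = {
    "casuals": ["non-permanent employees"],
    "wfh": ["working from home", "remote work"],
}


def expand_query(query: str, glossary: dict[str, list[str]] = GLOSSARY) -> str:
    """Append the document-side wording for any glossary term found in the query."""
    extras: list[str] = []
    for word in re.findall(r"[a-z0-9'-]+", query.lower()):
        for synonym in glossary.get(word, []):
            if synonym not in extras:
                extras.append(synonym)
    if not extras:
        return query  # nothing to expand, leave the query untouched
    return f"{query} ({' OR '.join(extras)})"


print(expand_query("what about parental leave for casuals?"))
```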

Reranking is where you trade cost for quality. Pulling 50 candidates and reranking down to 5 with a cross-encoder is meaningfully better than pulling 5 candidates straight from vector search. The cost is a few cents per query.
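
The retrieve-wide-then-narrow pattern looks like this in outline. The scoring function below is a crude word-overlap measure standing in for a real cross-encoder, purely so the sketch runs without a model; in production you would pass the cross-encoder's scoring call in its place.

```python
from typing import Callable


def overlap_score(query: str, text: str) -> float:
    """Placeholder scorer: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float] = overlap_score,
    keep: int = 5,
) -> list[str]:
    """Score every candidate against the query and keep the best `keep`."""
    ranked = sorted(candidates, key=lambda text: score(query, text), reverse=True)
    return ranked[:keep]


candidates = [
    "annual leave policy",
    "parental leave for non-permanent employees",
    "office parking rules",
]
print(rerank("parental leave non-permanent employees", candidates, keep=2))
```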

Step 4 - Evaluation, which everyone skips

I'll keep this short because it deserves its own article. You cannot ship a RAG application responsibly without a test set. Build a set of 50-100 question-answer pairs that represent real user queries. Run it after every meaningful change to your prompts, chunking, or retrieval logic.

We use RAGAS for automated metrics (faithfulness, answer relevancy, context precision, context recall) and pair it with human review for the top 20 queries. Without this you're flying blind. With it you can confidently ship changes.
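
Before wiring up RAGAS, a retrieval hit-rate check is a useful first harness. The sketch below is not RAGAS and measures only one thing: whether the expected source document shows up in the top k results. The test-case shape and function name are assumptions for illustration.

```python
from typing import Callable


def hit_rate(test_set: list[dict], retrieve: Callable[[str], list[str]], k: int = 5) -> float:
    """Fraction of test questions whose expected source appears in the top-k retrieved ids."""
    if not test_set:
        return 0.0
    hits = 0
    for case in test_set:
        retrieved_ids = retrieve(case["question"])[:k]
        if case["expected_source"] in retrieved_ids:
            hits += 1
    return hits / len(test_set)


# Tiny illustrative test set; a real one would hold 50-100 cases
test_set = [
    {"question": "q1", "expected_source": "leave-policy.pdf"},
    {"question": "q2", "expected_source": "expenses.pdf"},
]
fake_results = {"q1": ["leave-policy.pdf", "other.pdf"], "q2": ["other.pdf"]}
print(hit_rate(test_set, retrieve=lambda question: fake_results[question]))
```

If this number drops after a chunking or prompt change, you know the regression is in retrieval before you spend time inspecting generated answers.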

Step 5 - Production concerns

The things that will bite you in production:

Cost runaway. A user with curiosity and an API key can spend AUD $500 in a weekend. Implement per-user rate limiting at the API layer. Use prompt caching in Azure OpenAI (cached input tokens are 50% off and most RAG prompts have a lot of repeated system content).
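
A minimal sketch of per-user rate limiting, using an in-memory sliding window. It assumes a single process; with more than one API instance you would need shared state (a cache or database), which this example does not cover. The class name is made up.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_requests` per user within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so tests don't need to sleep
        self.calls: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        recent = self.calls[user_id]
        # Drop timestamps that have aged out of the window
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


fake_now = [0.0]
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60, clock=lambda: fake_now[0])
print([limiter.allow("alice") for _ in range(3)])  # third call is rejected
fake_now[0] = 61.0
print(limiter.allow("alice"))  # window has passed, allowed again
```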

PII leakage. If your documents contain customer PII, your model will quote it back. Strip PII at ingestion or filter responses. Australian Privacy Principles apply to your RAG outputs the same as any other system.
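
As a sketch of what stripping at ingestion can look like, here is a regex pass for email addresses and Australian mobile numbers. The patterns are deliberately narrow and are my own assumptions. They will miss names, addresses, and landlines, so treat this as a first filter in front of a dedicated PII detection service, not a replacement for one.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Australian mobiles written as 04xx xxx xxx or +61 4xx xxx xxx (spaces optional)
AU_MOBILE = re.compile(r"(?<!\d)(?:\+61\s?|0)4\d{2}\s?\d{3}\s?\d{3}(?!\d)")


def redact_pii(text: str) -> str:
    """Replace matched emails and mobile numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return AU_MOBILE.sub("[PHONE]", text)


print(redact_pii("Call Jo on 0412 345 678 or jo@example.com.au."))
```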

Stale documents. Build incremental ingestion from day one. Full reindexing every night seems fine until your knowledge base is 50GB.
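
One common way to make ingestion incremental is to store a content hash per document and only reprocess what changed. A minimal sketch, assuming you can list current documents with their bytes and that you persist the hashes from the previous run somewhere; the function name and return shape are hypothetical.

```python
import hashlib


def plan_ingestion(
    current_docs: dict[str, bytes],
    indexed_hashes: dict[str, str],
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare current documents against last run's hashes.

    Returns (ids to re-chunk and re-embed, ids to delete from the index,
    hashes to persist for the next run).
    """
    to_upsert: list[str] = []
    new_hashes: dict[str, str] = {}
    for doc_id, content in current_docs.items():
        digest = hashlib.sha256(content).hexdigest()
        new_hashes[doc_id] = digest
        if indexed_hashes.get(doc_id) != digest:
            to_upsert.append(doc_id)  # new or changed since the last run
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in current_docs]
    return sorted(to_upsert), sorted(to_delete), new_hashes


previous = {
    "a.pdf": hashlib.sha256(b"old text").hexdigest(),
    "b.pdf": hashlib.sha256(b"same text").hexdigest(),
    "gone.pdf": hashlib.sha256(b"removed").hexdigest(),
}
current = {"a.pdf": b"new text", "b.pdf": b"same text", "c.pdf": b"brand new"}
upsert, delete, _ = plan_ingestion(current, previous)
print(upsert, delete)
```

Handling deletions matters as much as handling changes: a withdrawn policy that stays in the index will keep getting cited.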

Observability. Use LangChain's tracing (LangSmith is excellent for development, expensive at scale) or pipe everything through Application Insights with custom telemetry. You need to be able to answer "why did the model say that" for any given response.
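
Whichever backend you choose, the useful part is recording the same handful of fields for every request. A sketch of a per-request trace record follows; the field names are my own suggestion, and shipping the record to Application Insights or LangSmith is left out.

```python
import json
import time
import uuid
from typing import Callable


def build_trace(
    question: str,
    chunk_ids: list[str],
    prompt: str,
    answer: str,
    started_at: float,
    clock: Callable[[], float] = time.time,
) -> dict:
    """Collect what you need to explain a response after the fact."""
    return {
        "trace_id": str(uuid.uuid4()),
        "question": question,
        "retrieved_chunk_ids": list(chunk_ids),  # which chunks the model actually saw
        "prompt_chars": len(prompt),
        "answer": answer,
        "latency_ms": round((clock() - started_at) * 1000),
    }


trace = build_trace(
    question="What is the parental leave policy?",
    chunk_ids=["hr-leave.pdf#3"],
    prompt="...full prompt text...",
    answer="See the HR leave policy [1].",
    started_at=10.0,
    clock=lambda: 10.25,  # fixed clock for the example
)
print(json.dumps(trace, indent=2))
```

If the question may contain PII, redact it before logging, since traces tend to be retained longer than anyone intended.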

Prompt injection. Users will try to break out of the RAG context, especially if your app is public-facing. Use Azure AI Content Safety in front of both inputs and outputs, and never give the model tool-use capabilities it doesn't need.

Build vs buy decision

I'm a consultant. I have an obvious bias. But let me give you the honest answer.

Build with LangChain on Azure when:

  • You have specific document handling requirements (table-heavy PDFs, technical drawings, structured data)
  • You need fine control over chunking and retrieval logic
  • You'll integrate the RAG into existing applications with custom UI
  • You expect to evolve the system substantially (add agents, tool use, multi-step reasoning)
  • Total project budget is over AUD $40,000

Use Azure AI Foundry's built-in RAG patterns or Copilot Studio when:

  • Your documents are reasonably standard (Word, PDF, no complex structure)
  • You want a chat UI out of the box
  • You don't have engineering resources to maintain a custom stack
  • Speed to value matters more than customisation

For a deeper comparison of where Copilot Studio fits, see our Copilot Studio consulting work. For Azure-native AI builds, our Azure AI Foundry consultants handle the full implementation lifecycle.

What does a LangChain RAG build actually cost in Australia?

For an Australian SMB or mid-market organisation, a production-ready RAG build with us typically lands in these ranges:

  • Proof of concept (single document type, 50-200 documents, basic web UI): AUD $15,000-25,000, 3-4 weeks
  • Production MVP (multi-source ingestion, hybrid search, evaluation suite, authentication, observability): AUD $45,000-90,000, 8-12 weeks
  • Enterprise RAG platform (multi-tenant, role-based access, complex document handling, agent capabilities): AUD $150,000+, 4-6 months

Ongoing Azure infrastructure for a typical mid-market deployment runs AUD $1,500-4,000/month depending on query volume and knowledge base size.

These ranges assume you're working with a consultancy that's done this before. If you're paying a generalist developer to learn RAG on your dime, expect 2-3x these numbers and a much higher chance the project quietly fails.

When to call us

We're a Sydney-based AI consultancy that builds these systems for Australian businesses. Most of our RAG work is for professional services firms, insurers, and mid-market software companies who need defensible, accurate question-answering over their own data.

If you've read this far and you're trying to decide whether to build or buy, or you have a half-finished RAG project that's not performing, get in touch. A 30-minute conversation usually clarifies whether you need an AI agent developer for a custom build, a LangChain consultant for architecture review, or whether something simpler will do the job.

You can reach the team via the contact page or read more about how we work with clients on the services page.