LangChain for Enterprise - Production Deployment Guide
Getting a LangChain application working on your laptop is the easy part. Getting it into production in an enterprise environment - with security reviews, compliance requirements, SLAs, and real users - is where most teams get stuck.
We've deployed LangChain applications into production for Australian enterprises in financial services, mining, professional services, and government. The gap between demo and production is always bigger than teams expect. Here's what it actually takes.
The Production Readiness Checklist
Before any LangChain application goes live for our clients, it needs to pass these gates:
- Security review completed (prompt injection, data leakage, auth)
- Data residency requirements confirmed and verified
- Evaluation framework in place with baseline metrics
- Error handling covers all LLM failure modes
- Observability and tracing configured
- Cost monitoring and alerting active
- CI/CD pipeline with automated testing
- Rollback plan documented and tested
- Runbook for on-call support
- User feedback mechanism implemented
- Rate limiting and usage quotas configured
- Compliance documentation completed
Every item on this list has caused a production incident for us or a client at some point. None of them are optional.
Security for Enterprise LangChain Deployments
Security is the number one concern for enterprise AI deployments, and rightly so. A LangChain application that can access your knowledge base and take actions on behalf of users is a high-value target.
Prompt Injection Defence
Prompt injection is the most common attack vector for LLM applications. An attacker crafts input that causes the LLM to ignore its instructions and do something unintended.
For production LangChain applications, implement defence in depth:
Input validation: Filter and sanitise user input before it reaches the LLM. Block known injection patterns and limit input length.
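As a minimal sketch of that input-validation layer (the pattern list and length limit here are illustrative assumptions — in practice you'd tune them against real injection datasets):

```python
import re

# Hypothetical blocklist; extend and tune against real injection datasets.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"reveal your system prompt",
]
MAX_INPUT_LENGTH = 2000


def validate_input(user_input: str) -> str:
    """Reject over-long or suspicious input before it reaches the LLM."""
    if len(user_input) > MAX_INPUT_LENGTH:
        raise ValueError("Input exceeds maximum allowed length")
    lowered = user_input.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            raise ValueError("Input matches a blocked pattern")
    return user_input
```

Pattern blocklists are easy to bypass, which is why this is only the first layer — the hardened system prompt and output validation below catch what slips through.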
System prompt hardening: Write system prompts that are resistant to override attempts. Use clear delimiters between system instructions and user input. Test your prompts against injection datasets.
Output validation: Check LLM outputs before returning them to users or executing actions. If the model is supposed to return structured data, validate the structure. If it's supposed to stay within a topic, check for off-topic responses.
Least privilege tool access: If your LangChain agent has tools (database queries, API calls, file operations), restrict each tool to the minimum permissions needed. A knowledge base query tool should have read-only access to the knowledge base and nothing else.
```python
# Example: output validation for a RAG application
def validate_response(response: str, source_docs: list) -> str:
    # Check response doesn't contain PII patterns
    if contains_pii(response):
        return "I found relevant information but cannot display it due to data sensitivity rules."
    # Check response is grounded in source documents
    if not is_grounded(response, source_docs):
        return "I wasn't able to find a reliable answer in the available documents."
    return response
```
Data Residency and Privacy
For Australian enterprises, data residency is often a hard requirement. Your LangChain application architecture needs to ensure that:
- User queries are processed within your approved regions (typically Australia East for Azure)
- Document data stays within approved storage boundaries
- LLM calls go to models deployed in approved regions
- Logs and traces are stored in approved locations
- No data is sent to third-party services without explicit approval
This means being careful about which LangChain integrations you use. Some integrations send data to external services. Audit every component in your chain.
If you're using Azure OpenAI Service, deploy models in the Australia East region, and use the same region for Azure AI Search. For LangSmith, check their data processing locations - at the time of writing, LangSmith processes data in the US, which may not be acceptable for some Australian enterprises.
Authentication and Authorisation
Every production LangChain application needs proper auth:
User authentication: Integrate with your organisation's identity provider (Azure AD / Entra ID is most common for Australian enterprises). Don't build your own auth.
Document-level access control: If your RAG application searches across documents with different access levels, the retrieval layer needs to respect those permissions. A user should only get results from documents they're authorised to view.
This is harder than it sounds. Most vector stores don't have built-in access control. You need to implement filtering at the retrieval layer based on the authenticated user's permissions.
```python
# Example: permission-aware retrieval
def get_retriever_for_user(user: User, vector_store):
    # Get user's accessible document IDs
    accessible_docs = get_user_document_permissions(user)
    # Filter retrieval to only accessible documents
    return vector_store.as_retriever(
        search_kwargs={
            "k": 5,
            "filter": {"document_id": {"$in": accessible_docs}},
        }
    )
```
API key management: Never hardcode API keys. Use Azure Key Vault or your organisation's secrets management system. Rotate keys on a schedule.
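One simple pattern that works with any secrets backend: have your deployment pipeline inject secrets (from Key Vault or your organisation's equivalent) as environment variables, and fail fast at startup if one is missing. A minimal sketch (the secret name is illustrative):

```python
import os


def get_secret(name: str) -> str:
    """Read a secret injected at deploy time (e.g. from Azure Key Vault
    via your pipeline or a secrets CSI driver). Fail fast if missing,
    so misconfiguration surfaces at startup rather than mid-request."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Required secret {name!r} is not set")
    return value
```

Failing at startup rather than on first use makes a missing or mis-rotated key an obvious deployment error instead of an intermittent runtime one.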
Scaling LangChain Applications
Architecture Patterns
For enterprise workloads, we deploy LangChain applications using this architecture:
API layer: FastAPI or Azure Functions handling HTTP requests, authentication, and rate limiting.
Orchestration layer: LangChain chains and agents processing requests asynchronously.
Retrieval layer: Azure AI Search with the vector store indexed and ready.
LLM layer: Azure OpenAI Service with multiple model deployments for different use cases.
Queue layer: For high-throughput scenarios, use Azure Service Bus or Redis to decouple request intake from processing. This prevents slow LLM calls from blocking your API.
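The decoupling idea can be sketched in-process with `asyncio.Queue` — in production the queue would be Azure Service Bus or Redis, but the shape is the same: the API enqueues and returns, a worker drains the queue at its own pace:

```python
import asyncio


async def worker(queue: asyncio.Queue, results: list) -> None:
    """Drain the queue at the rate the LLM can sustain."""
    while True:
        request = await queue.get()
        if request is None:  # sentinel: shut down
            queue.task_done()
            break
        await asyncio.sleep(0.01)  # placeholder for the slow LLM call
        results.append(f"processed:{request}")
        queue.task_done()


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    results: list = []
    task = asyncio.create_task(worker(queue, results))
    # The API layer returns to the caller as soon as the item is enqueued.
    for req in ["q1", "q2", "q3"]:
        await queue.put(req)
    await queue.put(None)
    await queue.join()
    await task
    return results
```

The bounded `maxsize` also gives you natural backpressure: when the queue is full, intake blocks (or rejects) instead of piling up unbounded work.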
Handling Concurrency
LLM API calls are slow (1-10 seconds typically). Your LangChain application needs to handle concurrent users without blocking.
Async execution: Use LangChain's async interfaces throughout. Every chain, retriever, and tool call should be async.
```python
# Use async chain execution
response = await chain.ainvoke({"question": user_query})
```
Connection pooling: Reuse HTTP connections to Azure OpenAI Service. Create the client once and share it across requests.
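The create-once-share-everywhere pattern is easy to get right with a cached factory. A sketch (the `LLMClient` class is a stand-in for whatever Azure OpenAI client you use — the point is the single cached instance, which keeps one HTTP connection pool alive for the process):

```python
from functools import lru_cache


class LLMClient:
    """Stand-in for an Azure OpenAI client; the real one holds an HTTP
    connection pool that is expensive to recreate per request."""


@lru_cache(maxsize=1)
def get_llm_client() -> LLMClient:
    # Constructed once per process; every request that calls
    # get_llm_client() reuses the same instance and connection pool.
    return LLMClient()
```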
Caching: Cache embeddings, frequent queries, and retrieval results. For a knowledge base that doesn't change hourly, caching can reduce LLM costs by 30-50% and improve response times significantly.
Rate Limit Management
Azure OpenAI Service has rate limits measured in tokens per minute (TPM) and requests per minute (RPM). In production, you will hit these limits.
Strategies:
- Implement retry with exponential backoff: LangChain supports this natively.
- Queue and throttle: Use a request queue to smooth out traffic spikes.
- Multiple deployments: Deploy the same model in multiple Azure regions and load-balance across them.
- Request prioritisation: Give high-priority users or use cases faster access during rate-limited periods.
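To make the first strategy concrete, here is the backoff mechanic spelled out (LangChain models expose a `max_retries` setting that handles this for you; `RateLimitError` below is a stand-in for the provider's 429 error):

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the provider's 429 / rate-limit error."""


def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Double the delay each attempt; add jitter so workers don't
            # retry in lockstep and re-trigger the limit together.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```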
Monitoring and Observability
You cannot operate a production LangChain application without observability. When a user reports a bad answer, you need to trace back through every step and understand why.
What to Monitor
Request-level metrics:
- End-to-end latency (target: under 5 seconds for simple RAG, under 15 seconds for agent workflows)
- Token usage per request (input and output)
- Retrieval quality scores
- Error rates and types
System-level metrics:
- LLM API latency and error rates
- Vector store query performance
- Memory usage and CPU utilisation
- Queue depth (if using async processing)
Quality metrics:
- User satisfaction (thumbs up/down on responses)
- Answer accuracy (measured via periodic human evaluation)
- Hallucination rate (responses not grounded in retrieved context)
- Retrieval relevance scores
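Aggregating the hallucination rate is simple once each response carries a groundedness verdict. A sketch — the per-response `grounded` flag is assumed to come from an upstream check (human review or an automated judge), this just rolls it up into the metric you'd chart and alert on:

```python
def hallucination_rate(responses: list[dict]) -> float:
    """Fraction of responses flagged as not grounded in retrieved context.
    Each response dict is assumed to carry a boolean 'grounded' verdict
    from an upstream groundedness check."""
    if not responses:
        return 0.0
    ungrounded = sum(1 for r in responses if not r["grounded"])
    return ungrounded / len(responses)
```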
Observability Stack
For Azure-native deployments, we typically use:
- Azure Application Insights: Request tracing, performance monitoring, alerting
- Azure Monitor: Infrastructure metrics and log aggregation
- Custom LangChain callbacks: Logging chain execution details, token usage, and retrieval results
- LangSmith (optional): If data residency permits, LangSmith provides the best LangChain-specific tracing
Build a custom callback handler that logs every chain step:
```python
from langchain.callbacks.base import BaseCallbackHandler


class ProductionCallbackHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        # Log chain start with request ID
        pass

    def on_retriever_end(self, documents, **kwargs):
        # Log retrieved documents and relevance scores
        pass

    def on_llm_end(self, response, **kwargs):
        # Log token usage, latency, model used
        pass

    def on_chain_error(self, error, **kwargs):
        # Log error details, alert if critical
        pass
```
Alerting
Set up alerts for:
- Error rate exceeds 5% over a 5-minute window
- P95 latency exceeds your SLA target
- Daily token spend exceeds budget threshold
- Hallucination rate increases (if you have automated detection)
- Azure OpenAI Service is returning rate limit errors
CI/CD for LangChain Applications
Testing Strategy
LangChain applications need three types of tests:
Unit tests: Test individual components (document loaders, text splitters, output parsers) with deterministic inputs and outputs. These are fast and reliable.
Integration tests: Test chains end-to-end against real LLM APIs. These are slow and non-deterministic. Use a smaller, cheaper model (GPT-4o-mini) for integration tests and set generous pass criteria.
Evaluation tests: Run your evaluation dataset against the full application and check that quality metrics don't regress. These are the most important tests for LangChain applications. If answer accuracy drops below your baseline after a code change, the deployment should fail.
```python
# Example: evaluation test in CI
def test_rag_quality():
    results = run_evaluation(eval_dataset, chain)
    assert results["answer_correctness"] >= 0.85, \
        f"Answer correctness {results['answer_correctness']} below threshold 0.85"
    assert results["faithfulness"] >= 0.90, \
        f"Faithfulness {results['faithfulness']} below threshold 0.90"
```
Deployment Strategy
Use blue-green or canary deployments for LangChain applications. LLM behaviour can change subtly with code changes that look minor. Deploy to a small percentage of traffic first and monitor quality metrics before rolling out fully.
Prompt version control: Treat prompts as code. Store them in version control, review changes in pull requests, and test them against your evaluation dataset before deployment.
Model version pinning: Pin specific model versions in your deployment configuration. Azure OpenAI model updates can change behaviour. Don't let a model update surprise you in production.
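What "pin in configuration" looks like in practice — a sketch with illustrative deployment names and version strings (not recommendations), resolved through one accessor so nothing else in the codebase hardcodes a model:

```python
# Pinned model configuration; deployment names and versions are
# illustrative assumptions, not recommendations.
MODEL_CONFIG = {
    "chat": {
        "deployment": "gpt-4o-prod",
        "model_version": "2024-08-06",  # pinned: an update is a config change
        "api_version": "2024-06-01",
    },
    "embeddings": {
        "deployment": "text-embedding-3-large-prod",
        "model_version": "1",
        "api_version": "2024-06-01",
    },
}


def model_settings(use_case: str) -> dict:
    """Resolve a logical use case to its pinned deployment settings."""
    if use_case not in MODEL_CONFIG:
        raise KeyError(f"No model configured for use case {use_case!r}")
    return MODEL_CONFIG[use_case]
```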
Governance and Compliance
Responsible AI
Australian enterprises deploying AI applications need to consider:
- Bias testing: Test your application for biased outputs, particularly if it makes decisions affecting people
- Transparency: Users should know they're interacting with an AI and understand how to escalate to a human
- Human oversight: For high-stakes decisions, implement human-in-the-loop patterns where the AI recommends and a human approves
- Audit trail: Log every decision, recommendation, and action for compliance review
Documentation
Enterprise deployments need documentation that satisfies your risk, compliance, and security teams:
- System architecture and data flow diagrams
- Security controls and threat model
- Data processing inventory (what data goes where)
- Model cards (which models are used, their capabilities and limitations)
- Incident response procedures
- Business continuity plan
We've found that spending time on this documentation early actually speeds up the overall project. Security and compliance reviews are faster when teams can hand over clear documentation rather than answering ad-hoc questions.
Common Enterprise Deployment Mistakes
Going straight to production without evaluation
We've seen teams build a LangChain application, do some manual testing, and push it live. Within a week, users find hallucinated responses, and trust is damaged. Build your evaluation framework before you deploy. It's cheaper to find problems in testing than in production.
Over-centralising the AI platform
Some enterprises try to build a single LangChain platform that serves every team's AI needs. This usually results in a lowest-common-denominator solution that doesn't serve anyone well. Let individual teams build fit-for-purpose applications on shared infrastructure (Azure OpenAI, Azure AI Search) rather than a shared application layer.
Underestimating ongoing operations
A production LangChain application isn't a set-and-forget deployment. Models need updating, prompts need tuning, documents need re-indexing, and user feedback needs to be incorporated. Budget for ongoing operations from the start. We typically recommend allocating 15-20% of the build cost per year for ongoing operations.
Not planning for model changes
LLM providers regularly update and retire models. Azure OpenAI model versions have retirement dates. Your application needs to be designed so that model changes are a configuration update, not a code rewrite. Abstract your model selection so you can switch models with minimal friction.
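One way to keep that abstraction honest is a small registry plus factory: application code asks for a logical model name and never touches a concrete deployment. A sketch (registry entries are illustrative; the factory returns the spec here where a real implementation would construct the client):

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    provider: str
    deployment: str
    version: str


# Swapping or retiring a model means editing this registry, not app code.
MODEL_REGISTRY = {
    "default": ModelSpec("azure_openai", "gpt-4o-prod", "2024-08-06"),
    "cheap": ModelSpec("azure_openai", "gpt-4o-mini-prod", "2024-07-18"),
}


def make_llm(name: str = "default") -> ModelSpec:
    """Factory: resolve a logical name to its spec. The provider branch
    is where you'd construct the actual client (e.g. AzureChatOpenAI);
    returning the spec keeps this sketch self-contained."""
    spec = MODEL_REGISTRY[name]
    if spec.provider == "azure_openai":
        return spec
    raise ValueError(f"Unknown provider {spec.provider!r}")
```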
Timeline for Enterprise Deployment
Based on our experience deploying LangChain applications for Australian enterprises:
| Phase | Duration | Key Activities |
|---|---|---|
| Discovery and Design | 2-3 weeks | Requirements, architecture, security review |
| Proof of Concept | 2-4 weeks | Build working POC with real data |
| Production Build | 6-10 weeks | Full development, testing, evaluation framework |
| Security and Compliance Review | 2-4 weeks | Penetration testing, compliance documentation |
| Staged Rollout | 2-3 weeks | Canary deployment, monitoring, user feedback |
| Total | 14-24 weeks | |
Teams that skip the POC phase or rush security review consistently end up taking longer overall.
How Team 400 Helps
We're a Brisbane-headquartered AI consulting company that specialises in taking LangChain applications from prototype to production for Australian enterprises.
Our team has deployed LangChain applications into regulated environments including financial services, mining, and government. We know what Australian enterprise security teams are going to ask, and we build the answers into the architecture from day one.
We work as LangChain consultants across the full delivery lifecycle - from architecture design through to production deployment and ongoing operations. We also bring deep Azure AI expertise to ensure your LangChain applications are deployed on well-architected Azure infrastructure.
If you're planning an enterprise LangChain deployment, talk to our team. We'll walk you through what a production deployment looks like for your specific use case and compliance requirements. Learn more about our AI agent development and consulting services.