
Building AI Agents for Enterprise: Architecture Patterns

March 12, 2025 · 6 min read · Team 400

Last month, a client showed me their "AI agent" project. A developer had built something clever with LangChain over a weekend. It worked great on his laptop.

Then they tried to deploy it. The security team blocked it immediately. No audit logging. No access controls. No way to explain what it was doing or why. The compliance team had questions that nobody could answer.

This is the gap between AI demos and AI production. Let me walk through the architecture patterns that actually work in enterprise environments.

The Core Problem

Enterprise AI agents need to do something that sounds simple but isn't: take autonomous actions in systems where mistakes have consequences.

When an AI agent updates a customer record, processes a refund, or sends an email, that action is real. You can't A/B test it. You can't sandbox it forever. At some point, it has to interact with production systems and real data.

The architecture needs to make this safe, observable, and controllable.

Pattern 1: The Human-in-the-Loop Gradient

Not all agent actions are equal. Checking an order status is low-risk. Issuing a $50,000 refund is high-risk. Your architecture should reflect this.

We use a three-tier model:

Tier 1 - Autonomous: Agent can execute without human approval

  • Read-only operations (data lookups, status checks)
  • Low-value write operations below defined thresholds
  • Actions with easy reversal (draft creation, internal notes)

Tier 2 - Supervised: Agent proposes, human approves

  • Medium-value transactions
  • Customer-facing communications
  • Changes to important records

Tier 3 - Assisted: Agent prepares, human executes

  • High-value decisions
  • Actions with compliance implications
  • Anything involving external legal commitments

The boundaries between tiers should be configurable, not hardcoded. What's "low-value" varies dramatically between businesses.

        Agent Action Request
                 ↓
          Risk Assessment
                 ↓
            ┌────┴────┐
            │  Tier?  │
            └────┬────┘
                 ↓
     ┌───────────┼───────────┐
     ↓           ↓           ↓
   Tier 1      Tier 2      Tier 3
     ↓           ↓           ↓
  Execute    Queue for    Prepare
immediately   approval    materials
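
Here's a minimal sketch of that routing in Python, with the tier boundaries read from a per-business policy object rather than hardcoded. The names and thresholds (RiskPolicy, classify_action, the dollar limits) are illustrative, not from any particular framework:

from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    AUTONOMOUS = 1   # execute immediately
    SUPERVISED = 2   # queue for human approval
    ASSISTED = 3     # prepare materials, human executes

@dataclass
class RiskPolicy:
    autonomous_value_limit: float = 100.0      # configurable per business
    supervised_value_limit: float = 10_000.0
    always_assisted: tuple = ("legal_commitment", "compliance_change")

def classify_action(action_type: str, is_read_only: bool,
                    value: float, policy: RiskPolicy) -> Tier:
    if action_type in policy.always_assisted:
        return Tier.ASSISTED                   # legal/compliance never runs alone
    if is_read_only or value <= policy.autonomous_value_limit:
        return Tier.AUTONOMOUS
    if value <= policy.supervised_value_limit:
        return Tier.SUPERVISED
    return Tier.ASSISTED

In practice you'd load the policy per tenant, so "low-value" means whatever that business says it means.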

Pattern 2: Tool Isolation

Your agent needs to interact with external systems—CRM, ERP, email, databases. Each integration is a potential security hole.

The pattern that works: every external interaction goes through a tool gateway.

The agent doesn't call Salesforce directly. It calls your tool gateway with a request like update_contact(id="001xxx", field="phone", value="0412345678"). The gateway:

  1. Validates the request against a schema
  2. Checks permissions (can this agent modify this record?)
  3. Logs the request
  4. Makes the actual API call
  5. Logs the response
  6. Returns the result to the agent

This gives you a single point of control for:

  • Access policies
  • Rate limiting
  • Audit logging
  • Error handling
  • Credential management
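
A stripped-down gateway that walks those six steps might look like the following. The schema validators, permission check, and API client are placeholders you'd wire to your own systems:

import json, logging, time

logger = logging.getLogger("tool_gateway")

class ToolPermissionError(Exception):
    pass

class ToolGateway:
    def __init__(self, schemas, permissions, client):
        self.schemas = schemas          # tool name -> validator, raises if invalid
        self.permissions = permissions  # callable(agent_id, tool, args) -> bool
        self.client = client            # object that makes the real API calls

    def call(self, agent_id: str, tool: str, args: dict):
        self.schemas[tool](args)                         # 1. validate against schema
        if not self.permissions(agent_id, tool, args):   # 2. check permissions
            raise ToolPermissionError(f"{agent_id} may not call {tool}")
        logger.info("request %s", json.dumps(            # 3. log the request
            {"agent": agent_id, "tool": tool, "args": args, "ts": time.time()}))
        result = getattr(self.client, tool)(**args)      # 4. make the actual API call
        logger.info("response %s", json.dumps(           # 5. log the response
            {"tool": tool, "result": str(result)}))
        return result                                    # 6. return to the agent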

We've seen teams give their agent raw database access "for speed." Three months later, they're trying to reconstruct what changes were made and by whom. Don't do this.

Pattern 3: Conversation State Management

Enterprise agents often handle multi-turn conversations that span hours or days. A customer starts a request, goes to lunch, comes back, continues with new information.

The naive approach is stuffing everything into the LLM context window. This breaks down because:

  • Context windows have limits
  • Costs scale with context size
  • Older information gets "forgotten" even within the window

Better pattern: structured state + summarised context.

Maintain explicit state:

{
  "conversation_id": "conv_12345",
  "customer_id": "cust_789",
  "intent": "order_modification",
  "current_step": "awaiting_new_address",
  "collected_data": {
    "order_id": "ord_456",
    "original_address": "123 Old St",
    "new_address": null
  },
  "actions_taken": [
    {"action": "lookup_order", "timestamp": "...", "result": "success"}
  ]
}

When building the prompt for each turn, include:

  • The structured state
  • A summary of the conversation (not full transcript)
  • The last 2-3 exchanges verbatim
  • Any relevant retrieved context

This keeps costs manageable while maintaining coherence.
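
As a rough sketch, per-turn prompt assembly can be as simple as concatenating those four pieces. The summary is assumed to be produced elsewhere, typically by a cheaper LLM call run as the conversation grows:

import json

def build_prompt(state: dict, summary: str,
                 recent_turns: list[str], retrieved: list[str]) -> str:
    return "\n\n".join([
        "## Conversation state\n" + json.dumps(state, indent=2),
        "## Summary of earlier conversation\n" + summary,
        "## Recent exchanges\n" + "\n".join(recent_turns[-3:]),   # last 2-3 verbatim
        "## Retrieved context\n" + "\n".join(retrieved),
    ])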

Pattern 4: Retrieval Architecture

Most enterprise agents need access to company knowledge—policies, procedures, product information, customer history.

The standard approach is RAG (Retrieval-Augmented Generation): embed documents into vectors, search for relevant chunks, include them in the prompt.

But naive RAG fails in enterprise contexts because:

  • Policies have hierarchy (general rules, exceptions, exceptions to exceptions)
  • Recency matters (last week's price list, not last year's)
  • Source authority matters (official policy vs. someone's Slack message)

Our enterprise RAG pattern:

Structured retrieval first: Before vector search, check if the question maps to structured data. "What's the refund policy for electronics?" should hit a structured policy database, not a keyword search.

Source scoring: Not all retrieved content is equal. Weight results by:

  • Document recency
  • Document authority (official docs > wiki > emails)
  • Document specificity (product-specific > general)
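
A sketch of that re-ranking, with weights that are purely illustrative and would be tuned per deployment:

from datetime import datetime, timezone

AUTHORITY = {"official_policy": 1.0, "wiki": 0.6, "email": 0.3}

def score(chunk: dict, similarity: float) -> float:
    # chunk["updated_at"] is assumed to be a timezone-aware datetime
    age_days = (datetime.now(timezone.utc) - chunk["updated_at"]).days
    recency = 1.0 / (1.0 + age_days / 90.0)              # decays over ~3 months
    authority = AUTHORITY.get(chunk["source_type"], 0.2)
    specificity = 1.0 if chunk.get("product_specific") else 0.7
    return similarity * recency * authority * specificity

def rerank(chunks_with_sim: list[tuple[dict, float]], k: int = 5) -> list[dict]:
    ranked = sorted(chunks_with_sim, key=lambda cs: score(*cs), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

Multiplying the factors means a stale email can never outrank a current official policy on raw similarity alone.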

Citation requirements: Every fact the agent states should be traceable to a source. If the agent can't cite it, it shouldn't say it.

Pattern 5: Observability Stack

You cannot operate what you cannot observe. Enterprise AI agents need comprehensive logging at multiple levels.

Conversation level: Full transcripts, state changes, outcomes. Searchable, filterable, exportable.

Decision level: For every agent decision, capture the prompt, the response, the extracted action, and the reasoning. You'll need this when something goes wrong.

System level: Latencies, error rates, token usage, cost tracking. Standard APM stuff, but applied to AI-specific metrics.

Business level: Resolution rates, escalation rates, customer satisfaction, time to resolution. The metrics that matter to stakeholders.
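
For the decision level, one workable shape is an append-only record per decision. Field names here are illustrative, and the JSONL file stands in for whatever log pipeline you already run:

import json, time, uuid

def log_decision(prompt: str, response: str, action: dict, reasoning: str) -> None:
    record = {
        "decision_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "prompt": prompt,        # exact prompt sent to the model
        "response": response,    # raw model output
        "action": action,        # the action extracted from the output
        "reasoning": reasoning,  # the model's stated rationale
    }
    # Append-only sink; swap for your real log pipeline.
    with open("decisions.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")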

We typically build dashboards for each level and set up alerts for anomalies. A sudden spike in escalations might indicate a prompt regression. A drop in resolution rate might mean the knowledge base is stale.

Pattern 6: Graceful Degradation

AI systems fail in ways traditional software doesn't. The LLM returns nonsense. The embedding service times out. The retrieved context is irrelevant.

Design for this:

Confidence thresholds: If the agent's confidence is below a threshold, don't guess. Escalate or ask clarifying questions.

Circuit breakers: If the LLM API is slow or erroring, route to fallback behaviour (queue for human, provide canned responses, etc.).

Timeout handling: LLM calls can be slow. Users shouldn't wait 30 seconds for a response. Set aggressive timeouts and handle them gracefully.

Fallback chains: If the primary model fails, try a backup. If retrieval fails, acknowledge limitation rather than hallucinating.
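
A minimal sketch combining aggressive timeouts with a fallback chain. The model callables and the canned response are placeholders:

from concurrent.futures import ThreadPoolExecutor

CANNED = "I'm having trouble right now; I've queued this for a human teammate."

def call_with_timeout(fn, prompt: str, timeout_s: float = 5.0) -> str:
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, prompt).result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False, cancel_futures=True)  # don't block on a hung call

def answer(prompt: str, primary, backup) -> str:
    for model in (primary, backup):     # fallback chain: primary, then backup
        try:
            return call_with_timeout(model, prompt)
        except Exception:               # timeout or API error: try the next one
            continue
    return CANNED                       # acknowledge the limitation and escalate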

Putting It Together

Here's the high-level architecture we typically deploy:

User Interface (Chat, Voice, Email)
                 ↓
        Conversation Router
                 ↓
         Agent Orchestrator
        ↙        ↓        ↘
      State     LLM      Tool
     Manager  Gateway  Gateway
                 ↓        ↓
             Retrieval  External
              System    Systems
                 ↓
   Human Review Queue (Tier 2/3)
                 ↓
         Audit & Analytics

Every component is independently scalable, replaceable, and observable.

What This Costs to Build

For a properly architected enterprise AI agent:

  • Initial build: $100,000-$250,000 for the platform and first use case
  • Each additional use case: $30,000-$80,000 (leveraging the existing platform)
  • Ongoing operations: $5,000-$15,000/month depending on volume and complexity

The mistake is building quick and cheap, then trying to bolt on enterprise requirements later. It's always more expensive.

Next Steps

If you're building AI agents for enterprise deployment, we're happy to discuss architecture patterns for your specific situation. We've deployed agents that handle thousands of daily interactions and can share what we've learned.

Talk to our team about your agent architecture.