AI Agent Architecture: Patterns for Production Systems

January 14, 2026 · 5 min read · Michael Ridland

Building a demo AI agent is easy. Building one that runs reliably in production is hard.

The difference isn't the AI—it's the architecture around it.

Here's what we've learned building AI agents for Australian businesses.

The Simple Agent Pattern

Most agents should start here:

User Input → Agent (LLM + Tools) → Response
                    ↓
              State/Memory

Components:

  • Input processing: Parse and validate user intent
  • Agent core: LLM with system prompt and available tools
  • Tool layer: Functions the agent can call
  • State management: Context across interactions
  • Output formatting: Consistent response structure

This handles most business use cases. Don't over-architect until you need to.
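The whole pattern fits in a few lines. Here's a sketch with a stubbed LLM and a single illustrative tool (`get_weather` and `fake_llm` are stand-ins, not real APIs):

```python
def get_weather(city: str) -> str:
    """A tool the agent can call."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_llm(prompt: str) -> dict:
    # A real LLM would decide whether to call a tool; hard-coded here.
    if "weather" in prompt.lower():
        return {"tool": "get_weather", "args": {"city": "Sydney"}}
    return {"reply": "How can I help?"}

def run_agent(user_input: str, state: list) -> str:
    state.append({"role": "user", "content": user_input})    # state/memory
    decision = fake_llm(user_input)                          # agent core
    if "tool" in decision:                                   # tool layer
        result = TOOLS[decision["tool"]](**decision["args"])
        reply = f"Tool result: {result}"
    else:
        reply = decision["reply"]
    state.append({"role": "assistant", "content": reply})    # output
    return reply
```

Swap `fake_llm` for a real model client and the shape stays the same.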

Why Simple Agents Fail

Simple agents break when:

Context exceeds limits: Conversation history fills the context window

Tasks require coordination: Multiple steps with dependencies

Reliability becomes critical: Single failure point brings everything down

Scale requirements emerge: Can't parallelise effectively

When you hit these, you need more sophisticated patterns.

Pattern 1: Agent with Memory Tiers

For long-running conversations or persistent context:

                    ┌─────────────────┐
User Input ─────────│     Agent       │─────── Response
                    │                 │
                    │  Working Memory │ ← Current conversation
                    │  (context)      │
                    │        ↓        │
                    │  Short-term     │ ← Recent relevant history
                    │  (summary)      │
                    │        ↓        │
                    │  Long-term      │ ← Vector DB / persistent
                    │  (retrieval)    │
                    └─────────────────┘

  • Working memory: Current context (what the LLM sees)
  • Short-term: Summarised recent interactions (hours/days)
  • Long-term: Searchable history, retrieved on relevance

This lets agents maintain context across sessions without context limits.
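A minimal sketch of the tiers. The eviction threshold is illustrative, and naive substring search stands in for LLM summarisation and vector retrieval:

```python
class TieredMemory:
    def __init__(self, working_limit: int = 4):
        self.working = []        # current conversation (what the LLM sees)
        self.short_term = []     # summaries of recent interactions
        self.long_term = []      # persistent, searchable history
        self.working_limit = working_limit

    def add(self, message: str):
        self.working.append(message)
        self.long_term.append(message)           # everything is archived
        if len(self.working) > self.working_limit:
            # Evict oldest working message into a (stubbed) summary.
            evicted = self.working.pop(0)
            self.short_term.append(f"summary: {evicted[:40]}")

    def retrieve(self, query: str) -> list:
        # Stand-in for semantic search over a vector DB.
        return [m for m in self.long_term if query.lower() in m.lower()]
```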

Pattern 2: Router Agent

For handling varied request types:

                         ┌───────────────┐
                    ┌───→│ Scheduling    │
                    │    │ Agent         │
User ──→ Router ────┤    └───────────────┘
         Agent      │    ┌───────────────┐
                    ├───→│ FAQ           │
                    │    │ Agent         │
                    │    └───────────────┘
                    │    ┌───────────────┐
                    └───→│ Escalation    │
                         │ Handler       │
                         └───────────────┘

Router agent's job:

  • Classify intent
  • Route to appropriate specialist agent
  • Handle cases that don't fit cleanly

Specialist agents:

  • Focused system prompt
  • Specific tools for their domain
  • Optimised for one task type

This works better than one agent trying to do everything.
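A sketch of the router, with keyword matching standing in for what would normally be an LLM classification call; the specialist agents are stubs:

```python
def scheduling_agent(msg: str) -> str:
    return "scheduling: checking available times"

def faq_agent(msg: str) -> str:
    return "faq: here's our answer"

def escalation_handler(msg: str) -> str:
    return "escalation: routing to a human"

SPECIALISTS = {"scheduling": scheduling_agent, "faq": faq_agent}

def classify_intent(message: str) -> str:
    # A real router would use an LLM here; keywords for illustration.
    if any(w in message.lower() for w in ("book", "appointment", "schedule")):
        return "scheduling"
    if message.strip().endswith("?"):
        return "faq"
    return "escalation"    # doesn't fit cleanly

def route(message: str) -> str:
    intent = classify_intent(message)
    handler = SPECIALISTS.get(intent, escalation_handler)
    return handler(message)
```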

Pattern 3: Multi-Step Orchestrator

For complex tasks requiring multiple operations:

                    ┌─────────────────────────┐
                    │   Orchestrator Agent    │
                    │                         │
                    │  Plan: [Step1, Step2,   │
                    │         Step3, Step4]   │
                    │                         │
                    │  Current: Step 2        │
                    └───────────┬─────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
   ┌────┴─────┐            ┌────┴─────┐            ┌────┴─────┐
   │  Step 1  │            │  Step 2  │            │  Step 3  │
   │ Complete │            │ Running  │            │ Pending  │
   └──────────┘            └──────────┘            └──────────┘

The orchestrator:

  • Breaks complex requests into steps
  • Tracks progress through steps
  • Handles failures and retries
  • Reports status

Each step can be:

  • An LLM call
  • A tool execution
  • A sub-agent invocation

Use this for workflows like order processing, document workflows, or approval chains.
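A sketch of the orchestrator loop. Step functions and retry counts are illustrative; each step could equally be an LLM call, a tool, or a sub-agent:

```python
class Orchestrator:
    def __init__(self, steps, max_retries: int = 2):
        self.steps = steps                     # list of (name, fn) pairs
        self.status = {name: "pending" for name, _ in steps}
        self.max_retries = max_retries

    def run(self, context: dict) -> dict:
        for name, fn in self.steps:
            self.status[name] = "running"
            for attempt in range(self.max_retries + 1):
                try:
                    context = fn(context)      # each step transforms context
                    self.status[name] = "complete"
                    break
                except Exception:
                    if attempt == self.max_retries:
                        self.status[name] = "failed"
                        return context         # stop on hard failure
        return context
```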

Pattern 4: Supervisor with Workers

For parallel processing or redundant execution:

                    ┌─────────────────┐
                    │   Supervisor    │
                    │                 │
                    │  Distributes    │
                    │  Aggregates     │
                    │  Validates      │
                    └────────┬────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
   ┌────┴─────┐         ┌────┴─────┐         ┌────┴─────┐
   │ Worker 1 │         │ Worker 2 │         │ Worker 3 │
   └──────────┘         └──────────┘         └──────────┘

Use cases:

  • Parallel document processing
  • Consensus-based decisions (multiple perspectives)
  • Redundancy for reliability
  • Scaling throughput

The supervisor handles distribution, aggregation, and quality control.
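A sketch using asyncio, with a trivial worker standing in for a real agent instance:

```python
import asyncio

async def worker(doc: str) -> str:
    # Stand-in for an agent processing one document.
    await asyncio.sleep(0)
    return doc.upper()

async def supervise(docs: list) -> list:
    # Distribute: one concurrent task per document.
    results = await asyncio.gather(*(worker(d) for d in docs))
    # Validate/aggregate: drop empty results, keep input order.
    return [r for r in results if r]
```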

Error Handling Patterns

Graceful Degradation

When AI fails, have fallbacks:

async def handle_request(request):
    try:
        # Primary: full AI handling
        return await ai_agent.process(request)
    except AIUnavailableError:
        # Fallback 1: simpler AI model
        return await simple_model.process(request)
    except Exception:
        # Fallback 2: rule-based response
        # (catch Exception, never a bare except, so you don't
        # swallow KeyboardInterrupt and friends)
        return rule_based_handler(request)

Retry with Backoff

LLM calls fail. Retry sensibly, for example with the tenacity library:

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10)
)
async def call_llm(prompt):
    return await llm.generate(prompt)

Human Escalation

Know when to give up:

if confidence < 0.7 or attempts > 3:
    return escalate_to_human(
        context=conversation_history,
        reason="Low confidence response"
    )

State Management

Agents need state. Where to keep it:

In-memory: Fast, loses on restart. Good for development.

Cache (Redis): Fast, survives restarts. Good for session state.

Database: Slower, persistent. Good for conversation history.

Vector DB: For semantic retrieval. Good for long-term memory.

Typical production setup:

  • Redis for active session state
  • Postgres for conversation logs
  • Pinecone/Weaviate for memory retrieval
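A sketch of that setup's interface, with in-memory dicts standing in for Redis and Postgres so it runs standalone:

```python
import json
import time

class SessionStore:
    def __init__(self):
        self._cache = {}    # stand-in for Redis (active session state)
        self._log = []      # stand-in for Postgres (conversation logs)

    def save(self, session_id: str, state: dict):
        self._cache[session_id] = json.dumps(state)
        self._log.append((time.time(), session_id, state))

    def load(self, session_id: str) -> dict:
        raw = self._cache.get(session_id)
        return json.loads(raw) if raw else {}
```

In production, `_cache` becomes redis-py calls and `_log` becomes an insert into a conversations table; the interface stays the same.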

Monitoring and Observability

Production agents need visibility:

Log everything:

  • Input received
  • Agent reasoning (if available)
  • Tools called
  • Output generated
  • Time taken
  • Errors encountered

Track metrics:

  • Response latency (p50, p95, p99)
  • Success rate
  • Escalation rate
  • Tool call patterns
  • Cost per request

Alert on anomalies:

  • Latency spikes
  • Error rate increases
  • Unusual patterns
  • Cost overruns
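For the latency percentiles, the nearest-rank computation is simple enough to sketch directly; a production system would get this from a metrics library (Prometheus, StatsD) rather than computing it by hand:

```python
import math

def percentile(samples: list, p: float) -> float:
    # Nearest-rank method: smallest value with at least p% of
    # samples at or below it.
    ordered = sorted(samples)
    idx = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[idx]
```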

We've written about monitoring AI agents in detail.

Testing Strategies

Unit Tests for Tools

Tools should work independently:

def test_schedule_appointment():
    result = schedule_tool(date="2026-01-20", time="10:00")
    assert result.success
    assert result.appointment_id is not None

Integration Tests for Agent Flows

Test complete scenarios:

def test_booking_flow():
    agent = TestAgent()
    response = agent.process("Book an appointment for Monday")
    assert "available times" in response

    response = agent.process("10am please")
    assert "confirmed" in response.lower()

Evaluation Sets

Build test datasets with expected outputs. Run regularly to catch regressions.
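A minimal sketch of such an eval run, using substring matching as the (illustrative) pass criterion; real evals often use exact match, rubrics, or LLM grading:

```python
EVAL_SET = [
    {"input": "Book an appointment for Monday", "expect": "available times"},
    {"input": "What are your opening hours?", "expect": "hours"},
]

def run_evals(agent_fn, eval_set) -> float:
    passed = 0
    for case in eval_set:
        output = agent_fn(case["input"])
        if case["expect"].lower() in output.lower():
            passed += 1
    return passed / len(eval_set)    # pass rate, tracked across releases
```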

Scaling Considerations

Horizontal Scaling

Agents are stateless (state lives elsewhere). Scale by adding instances.

Queue-Based Processing

For async workloads:

Input Queue → Worker Pool → Output Queue
                  ↓
            Agent Instances
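A sketch of that shape using asyncio.Queue. A production system would use a durable queue (SQS, RabbitMQ), but the worker loop looks the same:

```python
import asyncio

async def agent_worker(in_q: asyncio.Queue, out_q: asyncio.Queue):
    while True:
        item = await in_q.get()
        if item is None:              # sentinel: shut this worker down
            in_q.task_done()
            break
        # Stand-in for a real agent instance processing the item.
        await out_q.put(f"processed:{item}")
        in_q.task_done()

async def run_pipeline(items, n_workers: int = 3):
    in_q, out_q = asyncio.Queue(), asyncio.Queue()
    workers = [
        asyncio.create_task(agent_worker(in_q, out_q))
        for _ in range(n_workers)
    ]
    for item in items:
        in_q.put_nowait(item)
    for _ in workers:
        in_q.put_nowait(None)         # one sentinel per worker
    await asyncio.gather(*workers)
    return [out_q.get_nowait() for _ in range(out_q.qsize())]
```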

Cost Management

LLM calls cost money. Control it:

  • Caching for identical queries
  • Smaller models for simple tasks
  • Rate limiting per user/client
  • Cost alerts
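Caching is the easiest win. A sketch keyed on a hash of the prompt, with `expensive_llm_call` as a stand-in for the real model call:

```python
import hashlib

_cache = {}
calls = {"count": 0}

def expensive_llm_call(prompt: str) -> str:
    calls["count"] += 1               # each call costs money
    return f"answer to: {prompt}"

def cached_generate(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:             # only pay for novel prompts
        _cache[key] = expensive_llm_call(prompt)
    return _cache[key]
```

Note this only helps for byte-identical queries; semantic caching (matching similar prompts) needs embeddings and a similarity threshold.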

Our Approach

When we build AI agents, architecture decisions depend on:

  • Complexity: Simple use cases get simple architecture
  • Reliability requirements: Critical systems get redundancy
  • Scale expectations: High volume gets queue-based processing
  • Integration needs: Deep integration drives architectural choices

Start simple. Add complexity when requirements demand it.

We work with Australian businesses on agent architecture. Get in touch to discuss your use case.