MCP Apps - How Model Context Protocol Is Getting a UI Layer

March 15, 2026 · 7 min read · Michael Ridland

MCP (Model Context Protocol) has been the quiet workhorse of the AI tooling world for the past year. If you've built anything where an AI agent talks to external systems, you've probably bumped into it. The protocol handles the plumbing - tool definitions, data exchange, authentication - and it does that job well.

But there's been an obvious gap. MCP servers can return text and structured data. That's it. No charts, no forms, no interactive elements. When an AI agent needs to show you a dashboard or collect structured input through a form wizard, you're on your own. Every host application (Claude Desktop, ChatGPT, custom chat UIs) has been implementing visual capabilities differently, if at all.

MCP Apps is a new extension to the protocol that fills this gap. Currently at version 1.1.2, it introduces a standardised way for MCP servers to deliver interactive UIs to host applications. I've spent some time going through the spec; here's what I think Australian development teams should know about it.

What Problem Does This Solve?

Think about the AI agents you've built or used. They fetch data, process it, and return a text response. That works fine for simple lookups and summaries. But the moment you need to show a data visualisation, display a live progress indicator, present a multi-step configuration wizard, or render a 3D model preview - text falls apart.

The current workaround is messy. Some teams build separate web UIs. Others hack together markdown-rendered tables and hope for the best. Every implementation is bespoke, none of them interoperate, and the user experience ranges from passable to painful.

MCP Apps says: here's a standard for this. Build your UI once, declare it as part of your MCP server, and any host that supports MCP Apps can render it.

The Architecture in Plain English

Three things work together:

The server is a standard MCP server that also declares UI resources alongside its tools. These UI resources are HTML templates registered under a ui:// URI scheme - think of them as pre-declared web pages that the server can serve up when needed.

The host is whatever chat application your users interact with - Claude Desktop, a custom-built chat interface, whatever. When a tool returns a result that has an associated UI, the host renders that UI in a sandboxed iframe.

The view is the actual UI running inside that iframe. It acts as an MCP client itself, meaning it can call tools on the server, read resources, and even send messages back into the chat conversation.

The communication between views and hosts happens via JSON-RPC over postMessage. If you've ever built an iframe-based widget, this will feel familiar. If you haven't, the key thing to know is that the iframe is properly sandboxed - no access to the host's DOM, cookies, or storage.
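To make the server side concrete, here's a rough sketch of the shapes involved: a tool plus the UI resource it points at, registered under the ui:// scheme. The field names are my illustrative assumptions based on the spec's description, not its exact schema.

```typescript
// Sketch of what a server exposes: a tool plus a pre-declared UI resource.
// Field names are illustrative assumptions, not the spec's exact schema.

interface UiResource {
  uri: string;       // registered under the ui:// scheme
  mimeType: string;  // the template is plain HTML
  text: string;      // the HTML template itself
}

interface ToolWithUi {
  name: string;
  description: string;
  uiResourceUri?: string; // links the tool's results to a pre-declared view
}

const salesChartView: UiResource = {
  uri: "ui://sales-dashboard/chart",
  mimeType: "text/html",
  text: '<!doctype html><html><body><div id="chart"></div></body></html>',
};

const getSalesTool: ToolWithUi = {
  name: "get_sales",
  description: "Fetch quarterly sales figures",
  uiResourceUri: salesChartView.uri,
};
```

Because the template is declared up front rather than generated at call time, the host can fetch and review it during discovery - which is exactly what the lifecycle below relies on.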

The Lifecycle

There's a five-phase lifecycle that I think is well thought out:

Discovery happens when the host first connects to an MCP server. The host learns what tools are available and what UI resources exist. This is where the host can prefetch and security-review the UI templates before anything runs. Smart move - it means the security evaluation happens at connection time, not at runtime when a user is waiting.

Initialisation fires when the host decides to render a view. It creates an iframe, loads the UI resource, and the view sends a ui/initialize message to confirm it's ready.

Data delivery is when the host passes tool arguments and results into the view. The view gets the data it needs to render something useful.

Interactive phase is where things get interesting. Users interact with the UI - clicking buttons, filling forms, adjusting sliders - and those interactions can trigger tool calls back to the server. A chart might let you drill down into data. A form wizard might validate inputs and call APIs as you step through it.

Teardown notifies the view before the host unmounts it. Clean shutdown, no orphaned connections.
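From the view's perspective, the lifecycle boils down to a message dispatcher. The sketch below shows the shape of that handshake: ui/initialize comes from the spec, but the data-delivery and teardown method names are hypothetical placeholders.

```typescript
// Minimal sketch of a view's message handling across the lifecycle.
// "ui/initialize" appears in the spec; "ui/toolResult" and "ui/teardown"
// are hypothetical placeholders for the data-delivery and teardown phases.

type JsonRpcRequest = { jsonrpc: "2.0"; id?: number; method: string; params?: unknown };
type JsonRpcResponse = { jsonrpc: "2.0"; id?: number; result?: unknown };

function handleHostMessage(msg: JsonRpcRequest): JsonRpcResponse | null {
  switch (msg.method) {
    case "ui/initialize":
      // Confirm the view is loaded and ready to receive data.
      return { jsonrpc: "2.0", id: msg.id, result: { ready: true } };
    case "ui/toolResult":
      // Hypothetical: host delivers tool arguments/results to render.
      return null;
    case "ui/teardown":
      // Hypothetical: clean up before the host unmounts the iframe.
      return null;
    default:
      return null;
  }
}

// In a real view this would sit behind window.addEventListener("message", ...)
// and reply via window.parent.postMessage(response, hostOrigin).
const reply = handleHostMessage({ jsonrpc: "2.0", id: 1, method: "ui/initialize" });
```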

Tool Visibility - A Subtle but Important Detail

Not every tool should be visible to both the AI model and the UI. MCP Apps introduces three visibility levels:

  • Both (default) - the AI agent and the UI can both use the tool
  • App-only - only the UI can call this tool. The AI agent doesn't even know it exists
  • Model-only - only the AI agent can use it, the UI can't

App-only tools are the interesting case. You might have a tool that handles UI-specific operations - saving user preferences within a widget, fetching paginated data for a table, or triggering a client-side export. These don't need to clutter up the AI agent's context. Keeping them app-only means the agent's reasoning stays focused on the tools that matter for its task.
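The practical effect of visibility is just filtering: the host builds a different tool list for the model than for the view. A sketch, assuming a "visibility" field (my naming, not necessarily the spec's):

```typescript
// Sketch: keep app-only tools out of the model's context, and
// model-only tools away from the UI. The "visibility" field name is
// an assumption; the three values mirror the levels described above.

type Visibility = "both" | "app-only" | "model-only";

interface Tool {
  name: string;
  visibility?: Visibility; // defaults to "both"
}

function toolsForModel(tools: Tool[]): Tool[] {
  return tools.filter((t) => (t.visibility ?? "both") !== "app-only");
}

function toolsForApp(tools: Tool[]): Tool[] {
  return tools.filter((t) => (t.visibility ?? "both") !== "model-only");
}

const tools: Tool[] = [
  { name: "get_sales" },                                 // both (default)
  { name: "save_widget_prefs", visibility: "app-only" }, // UI only
  { name: "plan_next_step", visibility: "model-only" },  // agent only
];
```

With this split, save_widget_prefs never appears in the agent's context at all - which is the whole point.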

Display Modes

Views can be rendered in three modes:

Inline embeds the UI directly in the chat flow. Good for small data visualisations, compact forms, or status indicators.

Fullscreen takes over the screen for immersive experiences. Data dashboards, complex configuration panels, or rich media viewers.

Picture-in-picture gives you persistent floating widgets. Think a live progress monitor that stays visible while you continue chatting with the agent.

Views declare which modes they support. The host decides how to implement them. This is a sensible separation - the server says what's possible, the host decides what's practical given its own UI constraints.
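That negotiation can be pictured as a simple intersection: the view lists the modes it supports in preference order, and the host takes the first one it can actually implement. This function is purely illustrative, not spec-defined behaviour.

```typescript
// Sketch: the view declares modes in preference order; the host picks
// the first one it can implement. Purely illustrative.

type DisplayMode = "inline" | "fullscreen" | "pip";

function pickMode(
  viewDeclares: DisplayMode[],
  hostSupports: DisplayMode[],
): DisplayMode | null {
  for (const mode of viewDeclares) {
    if (hostSupports.includes(mode)) return mode;
  }
  return null; // no overlap: the host falls back to text rendering
}
```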

The Security Model

This is where MCP Apps earns points. Views run in sandboxed iframes with strict isolation from the host. No DOM access, no cookies, no storage leakage. Servers declare what external network domains they need via CSP (Content Security Policy) metadata, and hosts enforce a restrictive-by-default policy. If a server doesn't declare a domain, it can't connect to it.

For anyone building enterprise AI solutions, this matters. You don't want an MCP server's UI quietly phoning home to unexpected endpoints. The explicit domain declaration means security teams can review and approve network access at connection time.
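The enforcement itself happens via CSP in the host, but the logic it encodes is easy to state: a network request is allowed only if its host appears in the server's declared list. A minimal sketch of that check:

```typescript
// Sketch of the restrictive-by-default policy: a request is allowed only
// if its hostname is in the server's declared domain list. Illustrative -
// real hosts enforce this through the iframe's Content Security Policy.

function isAllowedOrigin(url: string, declaredDomains: string[]): boolean {
  try {
    const { hostname } = new URL(url);
    return declaredDomains.includes(hostname);
  } catch {
    return false; // unparseable URLs are rejected outright
  }
}
```

A security review then reduces to reading one list of domains per server, rather than auditing arbitrary fetch logic.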

Progressive Enhancement - The Right Default

Here's my favourite design decision in the spec. If a host doesn't support MCP Apps, the tools still work - they just return text instead of UI. Your MCP server doesn't break. Users who don't have UI-capable hosts still get their data, just without the visual polish.

This means you can adopt MCP Apps incrementally. Build your server, add UI resources, and hosts that support them will render rich experiences while hosts that don't will gracefully fall back. No feature flags, no conditional logic, no "check if the host supports X" code paths.
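The fallback behaviour can be sketched as a tool result that always carries plain text alongside an optional UI reference; field names here are my assumptions, but the shape captures the idea:

```typescript
// Sketch of graceful fallback: a tool result always includes plain text,
// plus an optional ui:// reference. Hosts without MCP Apps support just
// use the text. Field names are illustrative assumptions.

interface ToolResult {
  text: string;           // always present: the textual answer
  uiResourceUri?: string; // optional: a ui:// view that renders it richly
}

function render(result: ToolResult, hostSupportsUi: boolean): string {
  if (hostSupportsUi && result.uiResourceUri) {
    return `render view ${result.uiResourceUri}`;
  }
  return result.text; // progressive enhancement: text always works
}

const result: ToolResult = {
  text: "Q3 sales: $1.2M",
  uiResourceUri: "ui://sales-dashboard/chart",
};
```

Note that the server-side code is identical either way - the branch lives in the host, which is why no feature-detection logic leaks into your server.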

What This Means for Agent Development in Australia

We've been building AI agent solutions across various platforms, and the pattern we keep hitting is the "last mile" problem. The agent fetches and processes data brilliantly, but presenting it to the user in a useful format is where things fall apart. Tables in markdown are barely readable. Charts described in text are useless. Multi-step workflows explained as numbered lists are error-prone.

MCP Apps gives agent builders a standard way to solve this. Instead of building custom frontends for every agent interaction that needs more than text, you build UI resources as part of your MCP server and let the host handle rendering.

For teams working with Azure AI Foundry or building custom agent platforms, this is worth watching closely. The spec is still maturing but the architecture is sound and the design decisions are pragmatic rather than theoretical. The security model alone makes it a better option than the ad-hoc iframe approaches most teams are cobbling together today.

If you want to dig into the technical details, the full MCP Apps specification is well documented and readable. Worth an afternoon if you're building anything that needs agents to show more than text.