
UX Guidelines for MCP Apps in Microsoft 365 Copilot - What Actually Works

May 14, 2026 | 9 min read | Michael Ridland

Every Australian business we work with that is rolling out Microsoft 365 Copilot eventually asks the same question: "We have a custom system - a CRM, an ERP, a job management tool, a quoting platform - and we want our staff to be able to use it from inside Copilot. How do we do that, and how do we make it not feel like we have just bolted our software on top of the chat window?"

That second part is the bit most teams get wrong. The first version of any custom Copilot agent we see usually looks like someone took the existing web app and tried to recreate it inside the chat panel. Forms, tabs, sidebars, navigation menus. All the things that work in a standalone application and absolutely do not work when a user is having a conversation with an AI.

Microsoft has now published a UX guideline for MCP apps in Copilot which captures most of what we have been telling clients for the last twelve months. It is worth reading in full if you are building anything in this space. What I want to do in this post is talk about which parts of the guidance matter most in practice, where teams keep getting it wrong, and what I would actually do if you handed me a new MCP agent build tomorrow.

The mindset shift that has to happen first

The biggest mistake we see is teams treating the Copilot surface as another deployment target for their existing UI. It is not. A Copilot agent should be doing things that are hard or annoying to do in your normal app, not the same things in a smaller window.

Microsoft puts this as "extract capabilities, do not replicate interfaces" and it is the single most important sentence in the entire document. If your existing app has fifty screens, you do not need fifty Copilot widgets. You need to identify the three or four atomic things a user actually wants to do mid-conversation, and expose those as tools. Everything else can stay in the main app.

We had a property management client in Brisbane last year who wanted their inspection scheduling tool inside Copilot. The initial design from their internal team was a full month view calendar widget with drag and drop. It looked impressive in a demo. In practice nobody used it because the user already had a calendar open in another tab and they did not want to do calendar work in a chat. What they actually wanted was to say "schedule an inspection at 14 Brookfield Road for Thursday afternoon" and have it happen. That is one tool call, one confirmation card, done. We threw away the calendar widget and the adoption metrics jumped.
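To make "one tool call, one confirmation card" concrete, here is a minimal sketch of what that single capability could look like. The type names, field names and the proposeInspection function are hypothetical and exist only for illustration; they are not taken from the client build or from any Microsoft or MCP SDK.

```typescript
// Hypothetical shapes for illustration only; not types from any real SDK.
interface ScheduleInspectionInput {
  address: string;
  date: string; // ISO date, e.g. "2026-05-21"
  timeWindow: "morning" | "afternoon";
}

interface ConfirmationCard {
  summary: string;
  actions: [string, string]; // exactly two actions on the card
}

// One atomic capability: turn an already-parsed request into a confirmation card.
// Nothing is booked here; booking would only happen after the user confirms.
function proposeInspection(input: ScheduleInspectionInput): ConfirmationCard {
  if (input.address.trim() === "") {
    throw new Error("address is required");
  }
  return {
    summary: `Inspection at ${input.address} on ${input.date} (${input.timeWindow})`,
    actions: ["Confirm", "Cancel"],
  };
}

const card = proposeInspection({
  address: "14 Brookfield Road",
  date: "2026-05-21",
  timeWindow: "afternoon",
});
```

The point of the sketch is the size of the surface: three input fields and a two-button card, instead of a month-view calendar.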

If you are starting on a custom Copilot agent and you are not sure where to begin, our Copilot Studio consultants do scoping workshops specifically to help teams figure out which capabilities are worth extracting.

Inline mode is the default for a reason

The Microsoft guideline splits MCP apps into two surfaces - inline mode and side-by-side mode. Every app has to support inline. Side-by-side is optional.

In practice, most of what your agent does should live in inline mode. Inline widgets are small cards that appear directly in the chat flow, before the model response. They are for previews, confirmations, simple actions, quick decisions. The constraint Microsoft suggests is that the widget should fit within a single scroll of the response, and that is the right constraint.

A few rules we apply when designing inline widgets:

Limit yourself to two actions per card. Not three, not five. If you find yourself reaching for a third button, you are probably trying to do too much. Split it into two interactions or move it to side-by-side.

Make state explicit. If the widget is loading data, show a loading state. If the action succeeded, confirm it visually. If it failed, show how to recover. Do not rely on the model text to tell the user what happened, because the model text is not always reliable and users often skim it anyway.

Title only when needed. If your card is showing a document or an item with a clear parent, give it a title. If it is showing a single inline result, skip the title and let the content speak.

No internal scrolling. No tabs. No pagination. If you need any of those, you are building the wrong surface. Inline is for summaries, not for systems.
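The rules above are simple enough to check mechanically during design review. The sketch below assumes a made-up InlineCardSpec description of a card; it is not a real SDK type, and the limits are just the ones listed in this post.

```typescript
// Hypothetical card description used only to illustrate the rules above.
// The error state cannot be constructed without a recovery hint.
type CardState =
  | { kind: "loading" }
  | { kind: "success"; message: string }
  | { kind: "error"; message: string; recovery: string };

interface InlineCardSpec {
  title?: string; // optional: title only when needed
  state: CardState;
  actions: string[];
  hasInternalScroll: boolean;
  hasTabs: boolean;
  hasPagination: boolean;
}

// Returns a list of rule violations; an empty list means the card passes.
function lintInlineCard(spec: InlineCardSpec): string[] {
  const problems: string[] = [];
  if (spec.actions.length > 2) {
    problems.push("more than two actions: split the interaction or move it to side-by-side");
  }
  if (spec.hasInternalScroll) problems.push("internal scrolling is not allowed inline");
  if (spec.hasTabs) problems.push("tabs are not allowed inline");
  if (spec.hasPagination) problems.push("pagination is not allowed inline");
  return problems;
}

// A data-grid style card breaks three of the rules.
const dataGridCard: InlineCardSpec = {
  state: { kind: "success", message: "42 jobs found" },
  actions: ["Sort", "Filter", "Export"],
  hasInternalScroll: true,
  hasTabs: false,
  hasPagination: true,
};
const gridProblems = lintInlineCard(dataGridCard);

// A plain confirmation card passes.
const confirmCard: InlineCardSpec = {
  state: { kind: "loading" },
  actions: ["Confirm", "Cancel"],
  hasInternalScroll: false,
  hasTabs: false,
  hasPagination: false,
};
const confirmProblems = lintInlineCard(confirmCard);
```

Modelling state as a union of loading, success and error is one way to make "make state explicit" hard to forget, because every card has to declare which state it is in.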

We had one client try to put a sortable, filterable data grid inside an inline widget. It technically worked. Nobody used it because once you have a sort dropdown and a filter row inside a card inside a chat, you are three layers of UI deep and the brain just gives up. They moved it to side-by-side and the same data became useful.

When side-by-side actually earns its keep

Side-by-side is the expanded workspace. It opens alongside the conversation when the user clicks an expand action, and the original inline widget collapses to a small chiclet so the chat context is preserved. This is the right surface for richer work - editing a document, reviewing a comparison table, doing multi-step configuration, working with a canvas.

The mistake here is the opposite of the inline mistake. Teams go too big. They use side-by-side as an excuse to rebuild the entire SaaS product, complete with global navigation, settings menus, profile dropdowns, the lot. Microsoft is explicit on this and they are right. If your side-by-side experience resembles your full application, it has exceeded scope.

The way we think about it is that side-by-side is a workspace for the current task, not a mini version of your app. If the task is "review and approve this quote", side-by-side shows the quote, the line items, an approve button and a reject button. It does not show your customer list, your reporting page, or your admin settings. Those live in the real app and the user can hand off to them via the "open in app" affordance if they need to.
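As a rough illustration of "a workspace for the current task", the sketch below models the quote review example as data. Every name here, including the example.com URL, is an assumption made for the example. The workspace type has no field for customer lists, reports or settings, so those cannot creep in.

```typescript
// Hypothetical types for a task-scoped side-by-side workspace.
interface LineItem {
  description: string;
  quantity: number;
  unitPriceCents: number; // integer cents to avoid floating-point totals
}

interface Quote {
  id: string;
  customer: string;
  lineItems: LineItem[];
}

interface QuoteReviewWorkspace {
  quote: Quote;
  totalCents: number;
  actions: ["Approve", "Reject"];
  openInAppUrl: string; // hand-off to the full application
}

function buildQuoteReview(quote: Quote, appBaseUrl: string): QuoteReviewWorkspace {
  const totalCents = quote.lineItems.reduce(
    (sum, item) => sum + item.quantity * item.unitPriceCents,
    0
  );
  return {
    quote,
    totalCents,
    actions: ["Approve", "Reject"],
    openInAppUrl: `${appBaseUrl}/quotes/${encodeURIComponent(quote.id)}`,
  };
}

const workspace = buildQuoteReview(
  {
    id: "Q-1042",
    customer: "Example Pty Ltd",
    lineItems: [
      { description: "Site inspection", quantity: 2, unitPriceCents: 15000 },
      { description: "Report", quantity: 1, unitPriceCents: 4500 },
    ],
  },
  "https://app.example.com"
);
```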

The chat also has to remain the primary surface of intent. Users should be able to keep chatting while side-by-side is open, ask follow-up questions, and see Copilot's reasoning. If you build something that locks them into the workspace and breaks the conversation, you have built the wrong thing.

Preserve human control - this is the bit enterprise IT cares about

Every Australian enterprise we work with has the same anxiety about Copilot agents. What if it does something it should not? What if it sends an email, books a meeting, updates a record, deletes a file, all without the user understanding what was about to happen?

The Microsoft guideline addresses this directly. Users must remain the ultimate decision-makers. There must be clear visibility into agent actions, explicit confirmations for sensitive operations, and transparent outcomes of what was created, modified or updated.

In practice this means:

Anything destructive needs a confirmation card with the actual data about to be affected. Not a generic "are you sure?" but a card showing the record name, the change being made and who initiated it. If the user is deleting a client record, show the client name and recent activity before they confirm.

Anything that affects external state - emails, calendar invites, payments - should always confirm before sending. Even if it slows the workflow down. The cost of a wrong send is much higher than the cost of one extra click.

Anything that has happened should be confirmed visibly afterwards. A receipt card with what was done and a link to the underlying record. If a user has to ask the agent "did that actually go through?" you have already lost trust.

This sounds obvious but we have reviewed agents from internal teams that quietly fire off API calls based on natural language interpretation without any confirmation step. That is the kind of thing that ends up in an incident report. Build the confirmation layer in from day one.
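A confirmation layer does not need to be complicated. The sketch below shows one possible shape: each proposed action declares its effect, and anything destructive or externally visible is turned into a confirmation card instead of being run. The Effect categories, the gate and buildReceipt functions and all field names are assumptions for illustration, not part of any Copilot or MCP API.

```typescript
// Hypothetical effect categories; "read" is the only one that runs without asking.
type Effect = "read" | "destructive" | "external";

interface ProposedAction {
  tool: string;
  effect: Effect;
  target: string; // the record the user will recognise, e.g. a client name
  change: string; // what is about to happen
  initiatedBy: string;
}

type GateResult =
  | { kind: "run" }
  | { kind: "confirm"; card: { title: string; lines: string[]; actions: ["Confirm", "Cancel"] } };

// Decide whether an action may run immediately or must be confirmed first.
function gate(action: ProposedAction): GateResult {
  if (action.effect === "read") {
    return { kind: "run" };
  }
  return {
    kind: "confirm",
    card: {
      title: action.change,
      // Show the actual data affected, not a generic "are you sure?".
      lines: [`Record: ${action.target}`, `Requested by: ${action.initiatedBy}`],
      actions: ["Confirm", "Cancel"],
    },
  };
}

interface Receipt {
  summary: string;
  recordUrl: string;
}

// After the action has run, confirm visibly what was done.
function buildReceipt(action: ProposedAction, recordUrl: string): Receipt {
  return { summary: `${action.change}: ${action.target}`, recordUrl };
}

const deleteClient: ProposedAction = {
  tool: "delete_client",
  effect: "destructive",
  target: "Acme Plumbing",
  change: "Delete client record",
  initiatedBy: "j.smith",
};
const decision = gate(deleteClient);
const doneReceipt = buildReceipt(deleteClient, "https://app.example.com/clients/123");
```

Putting the decision in one function means a new tool cannot skip confirmation by accident; it has to declare an effect, and the gate does the rest.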

If you want help thinking through the governance and control patterns for agents in regulated industries, our enterprise AI agents specialists work through this with banks, insurers and government clients regularly.

The "scale density with intent" idea is more useful than it sounds

One of the lines in the Microsoft guideline that I keep coming back to is "scale density with intent". The idea is that the visual footprint of your UI should match what the user is trying to do at that moment. Glanceable summaries get a small inline widget. Real working tasks get the expanded workspace. Same data, different surface, depending on intent.

This is more useful than it sounds because it forces you to think about each tool call as having multiple possible visual representations. A "find the next available appointment" tool might return a single suggested time in an inline card. The same tool, called as part of a broader rescheduling task, might surface as a full calendar view in side-by-side mode. Same underlying capability, different surface based on what the user is doing.
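One way to picture this is a single tool result with two renderings. In the sketch below, the Intent values and the renderAvailability function are invented for the example. How the intent is actually determined (by the model, by the calling tool, or by the user expanding the card) is left out.

```typescript
// Hypothetical: the same capability rendered at two densities.
type Intent = "glance" | "work";
type Surface = "inline" | "side-by-side";

interface Slot {
  start: string; // ISO date-time of an available appointment
}

interface Rendering {
  surface: Surface;
  slots: Slot[];
}

// Glanceable intent gets one suggested slot inline;
// a working task gets every slot in the expanded workspace.
function renderAvailability(slots: Slot[], intent: Intent): Rendering {
  if (intent === "glance") {
    return { surface: "inline", slots: slots.slice(0, 1) };
  }
  return { surface: "side-by-side", slots };
}

const available: Slot[] = [
  { start: "2026-05-21T13:00" },
  { start: "2026-05-21T15:30" },
  { start: "2026-05-22T09:00" },
];

const quickAnswer = renderAvailability(available, "glance");
const rescheduling = renderAvailability(available, "work");
```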

In our builds, we usually start by designing the inline surface and then ask, for each tool, "is there a version of this that needs more room?" If yes, we add a side-by-side representation. If no, we leave it as inline only. About two-thirds of the tools we ship end up inline only, which is a useful sanity check on whether you are over-engineering.

The honest assessment

Building agents on the MCP surface in Copilot is still early. The patterns Microsoft is documenting are real and correct, but the tooling around them is moving fast and not all of it is stable. Expect the SDK to change, expect the rendering to behave slightly differently in different Copilot surfaces (Teams, Word, Outlook all have quirks), and expect to do more manual testing than you would for a normal web app.

The flip side is that the early agents that get the UX right are getting genuinely impressive adoption at the clients we have shipped them for. Once a user discovers they can do a real task in two messages without leaving Copilot, they start asking for more. That is the moment your Copilot rollout stops being a curiosity and starts being part of how work actually happens.

If you want help designing the surface for your own Copilot agent or want a UX review on something you are already building, our Microsoft AI consulting team does exactly this kind of work across Australian enterprises.

You can read Microsoft's full guideline under the title "UX guidelines for MCP apps". It is one of the better pieces of Microsoft design documentation in recent memory and is worth working through carefully before you commit to a UX direction.