
How a Microsoft 365 Copilot Declarative Agent Actually Works - The Architecture Worth Understanding

May 27, 2026 · 9 min read · Michael Ridland

Whenever a client tells me they want to build a declarative agent for Microsoft 365 Copilot, the first thing I want to know is whether they understand what they are not building. They are not building a model. They are not hosting an inference endpoint. They are not running a custom orchestration layer. The agent is a configuration that runs on top of the Copilot platform Microsoft already operates. The architecture question is mostly about where the boundaries are, what you get for free, and what you have to build yourself.

The Microsoft Learn article on declarative agent architecture covers the formal picture. Useful reference. What I want to do here is talk about how this fits together in practice, the implications it has for how you scope work, and where Australian teams tend to be surprised when they start building.

The big idea

A declarative agent is a customised version of Microsoft 365 Copilot. Same underlying model. Same chat surface. Same enterprise data plane (Microsoft Graph, sensitivity labels, the lot). What you change is the agent's instructions, the knowledge it can access, the actions it can take, and how it presents itself.

That is the architectural shift. You are not building an AI system. You are configuring one Microsoft already built. The platform handles the model calls, the grounding, the safety filters, the streaming, the chat history. Your responsibility is the bit on top.

This is actually quite a relief for most teams. It means you do not have to think about token costs, vector databases, prompt orchestration, or model hosting. It also means there is a hard ceiling on what you can do. If your use case needs something the platform does not give you, a declarative agent is not the right tool. You probably want Copilot Studio or a custom solution.

The components, plain English

Five things make up a declarative agent. Worth knowing what each one does because they end up being where most of the architectural decisions land.

The agent manifest. A JSON file that describes the agent: its name, description, conversation starters, instructions, allowed knowledge sources, and allowed actions. This is the spine of the agent. When you change the agent, you are mostly changing this file.

Instructions. A natural language section inside the manifest that tells the model how to behave. "You are a customer service assistant for an Australian energy company. Always be polite. Always cite your sources. Never quote prices; refer the customer to the live pricing team." This is your prompt engineering canvas. It is also where most of the experimentation time goes.

Knowledge sources. Pointers to where the agent can ground its answers. SharePoint sites, OneDrive locations, Microsoft Graph content, public web search, uploaded files. The agent retrieves from these sources at runtime, not at build time. There is no embedding step you control. The platform handles it.

Actions. Plugins the agent can call. These can be API plugins (an OpenAPI document pointing at a REST API) or Office JS plugins (running inside an Office host). Actions are how the agent does things, as opposed to just answering questions.

Conversation starters. Optional but useful. Prompts that appear as buttons when the user opens the agent. Helps users discover what the agent is good for.

These five components are loaded together when the user opens the agent. The platform routes the user's chat through the model with your instructions, your knowledge sources, and your actions in scope. Output comes back through the same chat surface.
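To make the components concrete, here is a minimal sketch of a manifest built as a Python dictionary and serialised to JSON. Treat it as an illustration, not a template: the field names (`instructions`, `conversation_starters`, `capabilities`, `actions`) follow my reading of the published declarative agent manifest schema and should be checked against the schema version you target. The agent name, SharePoint URL, action id, and plugin file name are all made up.

```python
import json


def build_manifest() -> dict:
    """Assemble an illustrative declarative agent manifest.

    All values are hypothetical. Field names are an assumption based on the
    published manifest schema; verify them against the version you deploy.
    """
    return {
        "version": "v1.0",
        "name": "Energy Support Assistant",
        "description": "Answers customer service questions from approved support content.",
        # Instructions: the natural language behaviour section.
        "instructions": (
            "You are a customer service assistant for an Australian energy company. "
            "Always be polite. Always cite your sources. "
            "Never quote prices; refer the customer to the live pricing team."
        ),
        # Conversation starters: the buttons shown when the agent opens.
        "conversation_starters": [
            {"title": "Billing", "text": "How do I read my latest bill?"},
            {"title": "Outages", "text": "Is there a planned outage in my area?"},
        ],
        # Knowledge sources: pointers only, retrieval happens at runtime.
        "capabilities": [
            {
                "name": "OneDriveAndSharePoint",
                "items_by_url": [
                    {"url": "https://contoso.sharepoint.com/sites/SupportKB"}
                ],
            }
        ],
        # Actions: references to plugin definitions (hypothetical file name).
        "actions": [{"id": "outageLookup", "file": "outage-plugin.json"}],
    }


if __name__ == "__main__":
    print(json.dumps(build_manifest(), indent=2))
```

Notice how little there is. No model settings, no retrieval configuration, no hosting details. Everything the platform owns is simply absent from the file.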

What runs where

The architectural picture often surprises people. Let me lay out the data flow.

The user types a message in the Copilot chat. The chat client might be Teams, the Microsoft 365 web client, Outlook, or wherever else Copilot is surfaced. By this point the agent's manifest has already been deployed into the user's M365 tenant.

The platform receives the message. It identifies which agent the user is talking to. It loads the agent's manifest. It composes a prompt for the model that includes the user's message, the agent's instructions, the chat history, and (if relevant) retrieved content from the configured knowledge sources.

The model generates a response. If the model decides to call an action, the platform invokes the action. If the action is an API plugin, that is an outbound HTTPS call to wherever you have hosted the API. If the action is an Office JS plugin, the call goes to the local Office Add-in. Either way, the result comes back to the model, which folds it into the response.

The response streams back to the user. Sensitivity labels, citations, and policy controls are applied by the platform on the way out.

What is sitting on your side of the wall is the manifest, the instructions, the action implementations (if any), and the content in your knowledge sources. Everything else is Microsoft's.
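The flow above can be sketched as a toy simulation. This is not how the platform is implemented, and none of these names come from a Microsoft API: `Agent`, `retrieve`, and `handle_message` are hypothetical, the keyword matching stands in for real grounding, and the "model decision" is a one-line rule. The point is only to show which steps are platform-side and which single step (the action implementation) is yours.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Agent:
    instructions: str
    knowledge: dict[str, str]                 # source URL -> text (stand-in for your content)
    actions: dict[str, Callable[[str], str]]  # action name -> implementation you host


@dataclass
class Turn:
    prompt: str
    action_called: Optional[str]
    reply: str


def retrieve(agent: Agent, message: str) -> list[str]:
    """Naive word-overlap match standing in for platform-side grounding."""
    words = set(message.lower().split())
    return [url for url, text in agent.knowledge.items()
            if words & set(text.lower().split())]


def handle_message(agent: Agent, history: list[str], message: str) -> Turn:
    # Platform side: compose the prompt from instructions, history, grounding, message.
    sources = retrieve(agent, message)
    prompt = "\n".join([agent.instructions, *history,
                        *[f"[source] {url}" for url in sources], message])
    # Platform side: the model decides whether to call an action (toy rule here).
    chosen = next((name for name in agent.actions if name in message.lower()), None)
    if chosen is not None:
        # Your side: the only code in this flow that you actually run.
        result = agent.actions[chosen](message)
        reply = f"{result} (via {chosen})"
    else:
        reply = f"Answer grounded in {len(sources)} source(s)."
    return Turn(prompt=prompt, action_called=chosen, reply=reply)


if __name__ == "__main__":
    demo = Agent(
        instructions="Be polite. Cite your sources.",
        knowledge={"https://contoso.sharepoint.com/sites/SupportKB/outages.docx":
                   "planned outage schedule"},
        actions={"outage": lambda msg: "No outages reported"},
    )
    print(handle_message(demo, [], "Is there an outage in my suburb?").reply)
```

In the real thing, everything except the function stored in `actions` and the text stored in `knowledge` lives on Microsoft's side.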

This is genuinely useful. It also means your debugging story is constrained. You can see what you sent in (manifest, instructions, action responses) and what came out (the assistant's reply). The bit in the middle is not yours to inspect. If the agent is doing something weird, your levers are limited to changing the manifest, changing the actions, or changing the knowledge sources.

Decisions you have to make early

A few architectural choices that show up in every declarative agent project.

Knowledge scope. Which SharePoint sites, OneDrive folders, and public sources can the agent see? This is a security decision as much as a quality decision. Broader knowledge means better answers in some cases, more risk of the agent surfacing the wrong thing in others. Start narrow. Expand based on real usage.

Action surface. Does the agent need to do things, or just answer questions? If just answering, you can ship a knowledge-only agent with no actions and no custom hosting. Much simpler. If it needs to take actions, you are building API plugins (and now possibly Office JS plugins) and you have a bigger surface to maintain.

Tone and behaviour. What does the agent sound like, what does it refuse, what does it escalate? This is mostly an instructions problem. Allow time to iterate. The first version of the instructions is always too short. The fourth version is usually about right.

Surface placement. Where does the user actually find the agent? Teams sidebar, a Copilot chat, a SharePoint embed, a Loop component. The answer affects how you brief users and how you measure adoption.

Lifecycle and ownership. Who owns the manifest? Who can change it? Who reviews changes? Declarative agents are configuration, not code, but they still need governance. If the marketing team can deploy an agent that touches the finance team's SharePoint, you have a problem.
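The knowledge scope and ownership decisions can be backed by a simple pre-deployment check. The sketch below is one way to do it, under stated assumptions: the `ALLOWED_SITES` allow-list, the team names, and the `out_of_scope_sources` function are all hypothetical, and the manifest layout (`capabilities` entries carrying `items_by_url`) is my reading of the manifest schema, so confirm it against the version you use.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: SharePoint site paths each owning team may expose.
ALLOWED_SITES = {
    "marketing": {"/sites/MarketingHub"},
    "finance": {"/sites/FinanceReports"},
}


def out_of_scope_sources(manifest: dict, owner_team: str) -> list[str]:
    """Return knowledge URLs the owning team is not approved to expose.

    Path comparison is case-sensitive here, which is a simplification.
    """
    allowed = ALLOWED_SITES.get(owner_team, set())
    flagged = []
    for capability in manifest.get("capabilities", []):
        for item in capability.get("items_by_url", []):
            path = urlparse(item["url"]).path
            in_scope = any(path == site or path.startswith(site + "/")
                           for site in allowed)
            if not in_scope:
                flagged.append(item["url"])
    return flagged


if __name__ == "__main__":
    draft = {
        "capabilities": [{
            "name": "OneDriveAndSharePoint",
            "items_by_url": [
                {"url": "https://contoso.sharepoint.com/sites/MarketingHub/Shared%20Documents"},
                {"url": "https://contoso.sharepoint.com/sites/FinanceReports"},
            ],
        }]
    }
    # The marketing team's draft reaches into finance content, so it gets flagged.
    print(out_of_scope_sources(draft, "marketing"))
```

Run something like this in whatever review step gates manifest changes. It does not replace SharePoint permissions, which still apply per user at runtime. It just catches an over-broad manifest before it ships.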

We help clients work through these decisions in the early scoping phase. Often it is a workshop, a draft manifest, a small POC, and a refined manifest before any production deployment. Worth doing properly the first time. Cleaning up an over-broad agent later is harder than scoping it tightly upfront.

This is the work we do as part of our Microsoft 365 Copilot consulting engagements and broader Microsoft AI consulting. The architectural patterns are stabilising quickly, and getting the first agent right makes the second and third one much faster to deliver.

What is good about this architecture

The big win is leverage. You inherit a model, a chat experience, a security model, a governance layer, and an identity story. For an Australian organisation already on Microsoft 365, that is a lot of capability you do not have to build, host, or maintain. You ship a manifest. The platform does the rest.

The grounding story is also strong. Because the agent has direct access to Microsoft Graph data in the user's context, answers can be specific and current without you having to build an ingestion pipeline. The agent can cite a SharePoint document. The user can click through to it. The link respects the user's permissions. This is the kind of pattern that takes months to build from scratch in a custom RAG architecture.

The cost model is predictable. Declarative agents run inside the user's Copilot licence. No per-token billing on your side. No infrastructure bill that scales with usage, beyond whatever hosts your own action APIs. This matters for budget planning, particularly when you are scaling out from one agent to many.

What is still rough

Honest assessment. Three things.

The model is whatever Microsoft is running. You cannot pick. If your use case needs a specific model for tone, length, or domain knowledge, you are constrained by the platform's choice. Most of the time this is fine. Occasionally it is not, and the answer is to build outside the declarative agent framework.

The orchestration is closed. If the agent is making decisions you do not like, you can iterate on instructions and actions, but you cannot inspect the chain of thought or fine-tune the decision boundary. Debugging is a guessing game informed by experience.

Production telemetry is improving but not yet at the level you would want for a serious business application. Knowing which agent answered which question well, and which one badly, is harder than it should be. Plan to instrument usage yourself if you need real insight.

None of these are deal breakers. They are just the trade-offs you accept in exchange for the leverage.
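On the telemetry point, the one place you can always instrument is your own action code. A minimal sketch, assuming your actions are ordinary Python functions behind an API: the `instrumented` decorator, the in-memory `CALL_LOG`, and the `outage_lookup` action are all hypothetical, and a real deployment would send these records to whatever logging service you already run. This only covers action calls. Knowledge-only answers never touch your code, so they stay invisible to it.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory store for illustration only; not suitable for production.
CALL_LOG = defaultdict(list)


def instrumented(action_name: str):
    """Record the outcome and duration of every call to the wrapped action."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            ok = True
            try:
                return func(*args, **kwargs)
            except Exception:
                ok = False
                raise
            finally:
                # Runs on both success and failure.
                CALL_LOG[action_name].append(
                    {"ok": ok, "seconds": time.perf_counter() - started}
                )
        return wrapper
    return decorator


def summarise(action_name: str) -> dict:
    """Count total and failed calls for one action."""
    calls = CALL_LOG[action_name]
    failures = sum(1 for call in calls if not call["ok"])
    return {"calls": len(calls), "failures": failures}


@instrumented("outage_lookup")
def outage_lookup(postcode: str) -> str:
    """Hypothetical action implementation behind an API plugin."""
    if not postcode.isdigit():
        raise ValueError("postcode must be numeric")
    return f"No outages reported for {postcode}"
```

Even this much tells you which actions the agent is actually calling, how often they fail, and how slow they are, which is more than you can see from the chat surface.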

When to use a declarative agent

If you are an Australian organisation on Microsoft 365, with content in SharePoint or OneDrive, and you want to give your users a focused assistant for a specific job, this is often the right starting point. Cheap to build. Cheap to run. Reuses what you already have. Predictable security story.

If the agent needs deep custom orchestration, multiple model choices, or to live outside the M365 boundary, you want Copilot Studio or a custom build. We help clients pick between these patterns regularly, and the right answer depends on the specific job, the audience, and the regulatory constraints.

For most teams, the first declarative agent is a learning experience as much as a delivery. By the second or third one, the patterns are obvious, the governance is in place, and the rollout is fast. Having the architecture clear in your head from the start makes that progression much smoother.

If you are scoping a declarative agent project and want a sounding board on the architecture before you commit to a direction, that is exactly the kind of conversation we have with Australian organisations most weeks.

Reference: Declarative agent architecture - Microsoft Learn.