Back to Blog

Writing Instructions for Microsoft 365 Copilot Declarative Agents - What Actually Works

June 6, 20268 min readMichael Ridland

The instructions field on a declarative agent is the bit that does most of the work. Get it right and the agent behaves predictably, stays on topic, and does what you actually wanted. Get it wrong and you've built an expensive way to confuse your staff. We've now shipped a few dozen of these for Australian clients, and I can tell you the difference between a good agent and a bad one is almost always the instructions, not the connectors or the knowledge sources.

This post is a practical view on writing instructions for declarative agents in Microsoft 365 Copilot. The official Microsoft docs (linked at the bottom) cover the structure. What they don't cover is the dozen little patterns that decide whether a real user finds the agent useful or quietly stops using it after the first awkward interaction.

What declarative agents actually are

If you haven't built one yet, a declarative agent is a custom version of Microsoft 365 Copilot that you define with a JSON manifest. You give it a name, a description, an icon, some instructions, and optionally some grounding sources (SharePoint sites, files, websites) and actions (API calls via plugins). It runs inside Copilot Chat, Teams, or wherever Copilot lives in M365.

The big appeal is that you don't have to host anything. No bot framework, no Azure infrastructure, no orchestration code. You drop the agent manifest into the M365 Admin Centre, users find it in Copilot, and it just works. For internal use cases this is a sweet spot.

The catch is that the only real control you have over behaviour is via the instructions field. There's no code-level orchestration, no custom routing, no clever fallbacks. The model does what the instructions tell it to do, and that's about it. Which means writing those instructions well is everything.

The size and shape of useful instructions

Microsoft's documented limit is 8000 characters. In practice the agents that work well sit somewhere between 800 and 3000 characters. Anything longer and the model starts ignoring bits. Anything shorter and you haven't really constrained the behaviour.

The structure that works most consistently for us is:

  1. Identity. Who the agent is, what it's for, who it's for.
  2. Scope. What it should help with, and what it should not help with.
  3. Behaviour. How it should respond - tone, format, length.
  4. Grounding. When and how to use the knowledge sources.
  5. Actions. When and how to use any plugins or tools.
  6. Refusals and escalations. What to do when it can't help.

You don't need headings in the instructions. But mentally organising them this way before you write helps a lot. A common failure mode I see is instructions that are just a wall of "do this, do that, also do this" with no real shape. The model handles structured guidance much better than a list of bullets.

Start with the identity statement

The first sentence or two should set what this thing is. Not what it can do, what it is.

A good identity statement looks like: "You are the Group Policy Assistant for Acme Logistics. You help warehouse managers find the right answer in our operational policy library, especially around dangerous goods handling and chain-of-responsibility obligations."

That tells the model the persona, the audience, and the domain. Now everything else can hang off that. The model knows that when someone asks about something unrelated, it should redirect. It knows that "the manager" probably means a warehouse manager, not a finance manager. It knows the language should be operational, not academic.

Compare that to a vague version: "You are a helpful assistant that answers questions about company policies." That gives the model nothing. It'll happily answer questions about anything that vaguely relates to policy, including the policies of completely unrelated companies it knows about from training data. Specificity matters.

Scope and what to refuse

This is where most agent projects fall over. The instructions list what the agent should do, but they don't list what it shouldn't. So the agent does everything, badly.

We always include a "Do not help with" section, written in plain English. Things like:

  • "Do not provide legal advice. If a question requires legal interpretation, recommend the user contact the Legal team."
  • "Do not generate code. If a coding question arises, refer the user to the Engineering team."
  • "Do not discuss commercial pricing. If pricing comes up, refer the user to their account manager."

The model is reasonably good at following these if you're explicit. It's much worse at inferring them. "An agent for HR questions" will happily wade into legal interpretation if you don't explicitly say not to. We've seen this happen during user testing more than once.

This is the same logic we apply when building Copilot Studio agents for clients - the scope of refusals is as important as the scope of capabilities, and often the bit that gets missed in initial design.

Be specific about format and tone

The model defaults to a slightly chatty, headers-and-bullets style. Sometimes that's fine. Often it isn't.

If the agent is helping warehouse managers on a phone in a noisy environment, you want short answers. If it's helping finance prepare board reports, you want structured ones. Tell the model.

Examples of instructions that have worked for us:

  • "Respond in two to four sentences. Use plain English. Avoid headings and bullet points unless the user explicitly asks for a structured answer."
  • "When citing a policy, always include the policy name and section number in your response."
  • "Use Australian English spelling. Use the term 'OH&S' rather than 'OSHA' or 'safety'."

Be careful with this last one. The model will sometimes interpret formatting instructions too literally and produce stilted responses. Test, adjust, test again.

Grounding instructions matter more than you'd think

If your agent has knowledge sources attached, you need to tell it how to use them. Otherwise the model treats them as one of many possible sources, including its own pre-training, and you get answers that drift away from what's actually in your documents.

Good grounding instructions:

  • "Answer based on the content of the attached SharePoint site. If the answer isn't in the site, say so and offer to escalate."
  • "Always cite the source document and section. Don't paraphrase if a direct quote would do."
  • "Do not use general knowledge to answer questions. If the answer is not in the provided sources, say 'I don't have information on that' and offer the user a path to find out."

That last one is the difference between an agent that hallucinates with confidence and an agent that admits its limits. The hallucination version is dangerous because users believe it. The honest version is annoying for the first week and then becomes the trusted version.

Actions and tool use

If your agent has plugins or actions attached, the instructions need to explain when to call them. The model will try to be helpful and call actions even when it shouldn't, or not call them when it should.

A pattern we use:

  • "If the user asks about their leave balance, call the GetLeaveBalance action with their email address."
  • "Before calling SubmitTimesheet, confirm with the user that the totals are correct."
  • "If an action returns an error, apologise, explain what happened in plain English, and offer to escalate."

The "confirm before doing something irreversible" instruction is important. Without it, users get nervous about agents that take action on their behalf. With it, they trust them.

Refusals and escalations

Every agent needs a graceful "I can't help you with that" response. Otherwise the model will try to help with everything and fail badly at the things outside its scope.

The instruction that works:

"If you cannot answer a question or it falls outside your scope, respond with: 'That's not something I can help with directly. You can [specific next step - contact HR, raise a ticket, email a team]'. Do not attempt to provide a partial or speculative answer."

The specificity of the next step is what makes this work. "Contact someone" is a dead end for the user. "Email [email protected] with subject 'Leave Query'" is a path forward.

The testing process nobody talks about

Once you've written your instructions, test them with real users. Not your own queries, real users. The thing you'll discover is that real users phrase things in ways you didn't anticipate. They ask compound questions. They forget context. They go off topic and then come back.

We run a structured testing session before deploying any agent to production. About thirty queries, written by people who will actually use the agent, plus a few adversarial ones to see how it handles bad input. Then we tune the instructions based on what breaks.

The other thing to test is the same query in different conversation contexts. The model behaves differently when it's the first message versus the fifth. Instructions that work for a clean conversation sometimes break in a long, meandering one.

Where this fits in a broader Copilot programme

Declarative agents are a great starting point for an organisation that wants to extend Copilot. They're low-risk, don't need engineering effort, and can be built and deployed in days. But they have limits. Anything that needs real-time data, complex orchestration, multiple tool calls, or custom UI is going to push you towards custom agent development.

The good news is the work you do on instructions for a declarative agent translates directly into prompt design for a custom one. The same principles apply. The model is the model, whether you're deploying it through M365 or through your own infrastructure.

For Australian businesses just getting started with this, our Copilot training covers the patterns above plus the broader question of how to build a programme around Copilot adoption.

The official Microsoft documentation on writing effective instructions is here: Write effective instructions for declarative agents. It's a good reference. The patterns in this post are what we've found works in actual Australian production deployments.