Adding Skills to Microsoft 365 Copilot Declarative Agents - A Field Report

May 10, 2026 • 8 min read • Michael Ridland

Microsoft has been quietly, steadily shipping a real extensibility model for Copilot in Microsoft 365. The Agents Toolkit lets you build declarative agents, hand them a manifest and a few config files, and deploy them across an organisation. Add skills and your agent can generate images, run Python, or call a REST API of your choosing.

I've been building these for clients across mining, financial services and professional services for the better part of a year now. Some of what Microsoft documents is exactly right. Some of it glosses over the operational reality of running this stuff in a regulated Australian organisation. Here's what I'd tell you if you were sitting across from me asking whether to start.

What "adding a skill" actually means

The Microsoft documentation on adding skills to declarative agents introduces three things you can bolt onto a declarative agent: image generator, code interpreter, and custom actions via API or MCP plugins. It positions these as "skills" you enable.

The honest framing is that these are different categories of capability with different operational profiles. The built-in capabilities (image and code) are flags you flip in the manifest. The custom actions are real software you have to build, deploy and maintain. Lumping them under the same heading is convenient for marketing but causes confusion when planning effort and risk.

A senior engineer at one of our financial services clients put it well: "The manifest entries are configuration. The custom actions are software. Treat them differently."

Image generation is the easy win

Adding the GraphicArt entry to the capabilities array gives your agent the ability to generate images from text prompts. It's literally one JSON object in the manifest:

{
  "name": "GraphicArt"
}

Provision the agent through Agents Toolkit, reload, and the agent can now make pictures.
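If you'd rather script that manifest change than hand-edit it, a minimal Python sketch might look like the following. The trimmed-down manifest and the helper name enable_capability are illustrative assumptions, not part of Agents Toolkit; the only thing taken from the docs is the shape of the capabilities entry.

```python
import json


def enable_capability(manifest: dict, name: str) -> bool:
    """Add {"name": name} to manifest["capabilities"] unless it is already there.

    Returns True if the manifest was changed, False if it already had the entry.
    """
    capabilities = manifest.setdefault("capabilities", [])
    if any(entry.get("name") == name for entry in capabilities):
        return False
    capabilities.append({"name": name})
    return True


# Hypothetical, trimmed-down manifest. A real one carries more fields.
manifest = {
    "name": "Team helper",
    "description": "Example agent",
    "instructions": "Help the team with everyday questions.",
}

changed = enable_capability(manifest, "GraphicArt")
print(json.dumps(manifest, indent=2))
```

Running it a second time leaves the manifest untouched, which is handy if the script ends up in a pipeline.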

The use case for most Australian organisations isn't "make a picture of a cat". It's the boring stuff that matters: marketing teams generating draft visuals for internal campaigns, training teams creating diagram concepts for course materials, sales teams sketching out pitch visuals before sending to the design team. These workflows used to involve a queue and a brief. Now they involve a prompt and a couple of iterations.

What you don't get from the docs is the licensing nuance. The image generator works fine for commercial Microsoft 365 Copilot. In GCCH (Government Community Cloud High) environments it's not available. Most Australian federal and state government clients I work with aren't on GCCH, but if you're advising a client that's been migrated to a regulated tenant, double-check what's actually available before you scope work. We've helped a few organisations untangle exactly this question as part of our Copilot consulting work.

The other thing to know: generated images live in the Copilot conversation. They're not automatically saved to SharePoint or OneDrive. If users need to keep them, they have to download manually. This is a small thing that surprises end users.

Code interpreter is more capable than people realise

Adding the CodeInterpreter entry gives your agent the ability to run Python code in a sandboxed environment. Same single-object manifest entry:

{
  "name": "CodeInterpreter"
}

The framing in the docs is "solve complex tasks via Python code". This undersells what it actually does. Once code interpreter is on, your agent can do real analysis: statistical work, file parsing, generating charts from data the user uploads, and running calculations the model itself would otherwise hallucinate.

I had a finance team last year stop using their accountant's spreadsheets for a recurring monthly reconciliation. The Copilot agent we'd built had code interpreter on, and a junior analyst worked out they could just paste the data and ask for the reconciliation. The output was deterministic (Python doing actual maths) and the workflow that used to take a day was running in minutes.
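For a sense of the kind of code the interpreter might run for a request like that, here is a hand-written sketch. The references and amounts are invented, the function name reconcile is hypothetical, and whatever Copilot actually generates will look different. The point is only that the comparison is done by Python, not estimated by the model.

```python
from decimal import Decimal


def reconcile(ledger: dict, bank: dict) -> dict:
    """Compare two {reference: amount} mappings and report every discrepancy."""
    shared = set(ledger) & set(bank)
    return {
        "missing_from_bank": sorted(set(ledger) - set(bank)),
        "missing_from_ledger": sorted(set(bank) - set(ledger)),
        # Map each mismatched reference to (ledger amount, bank amount).
        "mismatched": {
            ref: (ledger[ref], bank[ref])
            for ref in sorted(shared)
            if ledger[ref] != bank[ref]
        },
    }


# Invented sample data. Decimal avoids float rounding on currency values.
ledger = {
    "INV-001": Decimal("1200.00"),
    "INV-002": Decimal("349.95"),
    "INV-003": Decimal("78.10"),
}
bank = {
    "INV-001": Decimal("1200.00"),
    "INV-002": Decimal("394.95"),  # transposed digits
    "INV-004": Decimal("15.00"),
}

report = reconcile(ledger, bank)
print(report)
```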

The risk worth flagging is that code interpreter encourages users to send data into Copilot they probably shouldn't. Financial records. Customer data. Sensitive HR information. The agent boundary is the data boundary now, and most organisations haven't thought through what that means. If you're enabling code interpreter for a broad audience, you need clear guidance on what's allowed to go in. We cover this when we work with leadership teams through AI for Leaders and the strategy side of our business AI work.

GCCH note again: code interpreter requires a Microsoft 365 Copilot add-on license in GCCH. Worth pricing into your project costs if relevant.

Custom actions via OpenAPI is where it gets real

This is where declarative agents stop being toys and start being useful. A custom action lets your agent call a REST API of your choosing, using an OpenAPI description document to tell Copilot how the API works.

The Microsoft tutorial walks through using the JSONPlaceholder API as an example. Fine for learning. Not what you'll actually deploy.

In practice the API plugins I've built for clients have called things like: their CRM, their job tracking system, their internal knowledge base, a custom inventory lookup, the company's billing platform. The pattern is always the same. Take an internal system that has an API. Write a tight OpenAPI description. Hand it to Agents Toolkit. Provision. Now staff can ask their Copilot questions about that system in natural language.

The bit the docs don't really cover is how careful you have to be with the OpenAPI description. Copilot reads it and decides when to call which endpoint based on the operation descriptions and parameter names. If your descriptions are vague, Copilot makes the wrong calls. If your descriptions are too narrow, Copilot doesn't realise it can use them. There's a craft to this that takes a few iterations to get right.
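One cheap way to catch the vague end of that spectrum before provisioning is a small lint pass over the description document. The sketch below is an assumption-heavy illustration: the job tracking spec is hypothetical, the eight-word threshold is arbitrary, and passing the check says nothing about whether Copilot will route correctly. It only flags operations that give the model almost nothing to go on.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch"}


def find_weak_operations(spec: dict, min_words: int = 8) -> list[str]:
    """Flag operations with no operationId, a thin description, or undescribed parameters."""
    problems = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as path-level "parameters"
            label = op.get("operationId", f"{method.upper()} {path}")
            if "operationId" not in op:
                problems.append(f"{label}: missing operationId")
            if len(op.get("description", "").split()) < min_words:
                problems.append(f"{label}: description too short")
            for param in op.get("parameters", []):
                if not param.get("description"):
                    problems.append(f"{label}: parameter '{param['name']}' has no description")
    return problems


# Hypothetical spec fragment: one vague operation, one tight one.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Job tracking (hypothetical)", "version": "1.0.0"},
    "paths": {
        "/jobs": {
            "get": {
                "operationId": "listJobs",
                "description": "Get jobs.",
                "parameters": [{"name": "q", "in": "query", "schema": {"type": "string"}}],
            }
        },
        "/jobs/{jobId}": {
            "get": {
                "operationId": "getJobById",
                "description": "Return the status, assigned crew and due date for one job, looked up by its job number.",
                "parameters": [{
                    "name": "jobId",
                    "in": "path",
                    "required": True,
                    "description": "Job number as shown on the work order, for example J-10423.",
                    "schema": {"type": "string"},
                }],
            }
        },
    },
}

problems = find_weak_operations(spec)
for problem in problems:
    print(problem)
```

The second operation is the style to aim for: it says what comes back and how the record is identified, in words a user might actually type.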

The other thing: auth. The docs touch on this lightly. In production you almost certainly need OAuth, you need the user's identity to flow through to the backend so it respects existing permissions, and you need to think carefully about what happens when a user without access asks a question that would otherwise be answered. Getting auth wrong here is a security incident waiting to happen.
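To make the "user without access" case concrete, here is a minimal sketch of a backend handler that enforces the caller's own permissions. Everything in it is hypothetical: the Caller type, the RECORDS store, and the assumption that the caller's identity has already been validated from an OAuth token upstream. Returning the same response for "doesn't exist" and "not yours" is one possible design choice, not the only defensible one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Identity of the signed-in user, assumed to be validated upstream."""
    user_id: str
    groups: frozenset


# Hypothetical backend data keyed by record id.
RECORDS = {
    "CUST-17": {"owner_group": "sales-au", "balance": "4,200.00"},
}


def get_record(caller: Caller, record_id: str) -> tuple[int, dict]:
    """Return (status, body), applying the caller's permissions, not the agent's."""
    record = RECORDS.get(record_id)
    if record is None or record["owner_group"] not in caller.groups:
        # Identical answer for "missing" and "forbidden", so the agent cannot
        # be used to confirm that a record exists.
        return 404, {"error": "Record not found or not accessible."}
    return 200, {"id": record_id, "balance": record["balance"]}


alice = Caller("alice", frozenset({"sales-au"}))
bob = Caller("bob", frozenset({"hr"}))

print(get_record(alice, "CUST-17"))
print(get_record(bob, "CUST-17"))
```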

The MCP angle is starting to matter

The docs mention MCP plugins alongside API plugins as a way of adding custom actions. MCP, the Model Context Protocol, is the more interesting of the two for anyone building serious agent work.

The short version: instead of writing an OpenAPI description and exposing your API directly to Copilot, you stand up an MCP server that mediates between Copilot and your backend. The MCP server can handle auth, caching, rate limiting, observability, and the awkward stuff that doesn't fit cleanly into an OpenAPI spec.
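The sketch below shows the shape of that mediation in plain Python. It is not MCP and uses no MCP SDK; the Mediator class, the lookup_stock tool and the limits are all hypothetical. It only illustrates the sort of cross-cutting work (here, per-user rate limiting and caching) that a server sitting between Copilot and the backend could own.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class Mediator:
    """Sits between callers and backend tools; adds rate limiting and caching."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._tools = {}
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._recent = defaultdict(deque)  # user -> timestamps of recent calls
        self._cache = {}

    def register(self, name: str, func: Callable) -> None:
        self._tools[name] = func

    def call(self, user: str, tool: str, arg: str):
        now = self._clock()
        recent = self._recent[user]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max_calls:
            raise RuntimeError(f"rate limit exceeded for {user}")
        recent.append(now)
        # Cache per user so one user's result is never served to another.
        key = (user, tool, arg)
        if key not in self._cache:
            self._cache[key] = self._tools[tool](arg)
        return self._cache[key]


backend_hits = []


def lookup_stock(sku: str) -> dict:
    """Stand-in for a real backend call."""
    backend_hits.append(sku)
    return {"sku": sku, "on_hand": 12}


fake_now = [0.0]  # injectable clock so the example is deterministic
mediator = Mediator(max_calls=2, window_seconds=60, clock=lambda: fake_now[0])
mediator.register("lookup_stock", lookup_stock)

first = mediator.call("alice", "lookup_stock", "SKU-1")
second = mediator.call("alice", "lookup_stock", "SKU-1")  # served from cache
print(first, len(backend_hits))
```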

For one-off integrations the OpenAPI route is fine. For a serious internal agent platform, MCP is where I'd be investing. The protocol is moving fast and Microsoft, Anthropic and the rest of the major model vendors have all converged on it. If you're starting fresh today, build MCP servers and treat OpenAPI as a fallback for systems you can't change. This is the architecture we're recommending for clients on our enterprise AI agents work.

What the docs get right and where they soft-pedal

Microsoft's documentation is good at the mechanics. JSON manifest changes. Provisioning. The Toolkit UI. You can follow the tutorial and have a working agent with skills enabled in an afternoon. That's a real achievement compared to where we were eighteen months ago.

What the docs soft-pedal is the operational picture around running these agents in a real organisation. Things like:

How do you version control the manifest properly when multiple developers are editing it? Git is fine for the JSON, but the linkage between manifest, API plugin, and Copilot environment isn't expressed anywhere in the repo. You have to maintain that mental model yourself.

What's the deployment story when you have ten declarative agents and need to push updates? Agents Toolkit handles single-agent provisioning well. Multi-agent rollouts across tenants need scripting you have to write yourself.

How do you monitor what your agents are actually doing? Logging is thin out of the box. You can see usage stats but not "which prompts triggered which actions and what happened". For anything regulated, you'll want better.
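On the monitoring question, one option is to record every action call at the backend you control. The sketch below is illustrative only: the audited decorator, the get_job action and the conversation_id correlation value are hypothetical, and a backend typically sees the action call and not the user's prompt, so tying the two together is still up to you.

```python
import functools
import json
import time

audit_log = []  # stand-in for a real log sink


def audited(action_name: str):
    """Wrap an action so each call is recorded with its outcome and duration."""
    def wrap(func):
        @functools.wraps(func)
        def inner(conversation_id: str, **params):
            # Caution: params may contain sensitive data. Redact before logging for real.
            entry = {"action": action_name, "conversation_id": conversation_id, "params": params}
            started = time.perf_counter()
            try:
                result = func(**params)
                entry["outcome"] = "ok"
                return result
            except Exception as exc:
                entry["outcome"] = f"error: {type(exc).__name__}"
                raise
            finally:
                entry["duration_ms"] = round((time.perf_counter() - started) * 1000, 2)
                audit_log.append(json.dumps(entry))
        return inner
    return wrap


@audited("get_job")
def get_job(job_id: str) -> dict:
    """Hypothetical action backing a custom action endpoint."""
    if not job_id.startswith("J-"):
        raise ValueError("unknown job id format")
    return {"job_id": job_id, "status": "scheduled"}


result = get_job("conv-1", job_id="J-10423")
print(audit_log[0])
```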

These aren't reasons not to use declarative agents. They're reasons to plan for the operational layer separately. Microsoft will catch up on a lot of this through the year. Don't wait for them to ship the perfect tooling before you start building, because the agents themselves are already valuable and the operational gaps are solvable with a bit of engineering.
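As an example of that bit of engineering, here is a sketch that covers two of the gaps: checking that the plugin files a manifest refers to are actually in the repo, and planning a rollout of several agents across environments. The assumptions are loud ones. The folder names and file names are hypothetical, the manifest's actions entries are assumed to carry id and file fields, and the CLI command ("atk provision --env ...") is assumed and passed in as a parameter so you can swap in whatever your Toolkit install actually uses. It defaults to a dry run that only prints.

```python
import subprocess


def missing_action_files(manifest: dict, files_in_repo: set) -> list:
    """List plugin files the manifest's actions refer to that are not in the repo."""
    return [
        action["file"]
        for action in manifest.get("actions", [])
        if action["file"] not in files_in_repo
    ]


def rollout_plan(agent_dirs, environments, cli=("atk", "provision")):
    """Return (working_directory, command) pairs, one per agent per environment."""
    return [
        (folder, [*cli, "--env", env])
        for env in environments
        for folder in agent_dirs
    ]


def run_plan(plan, dry_run=True):
    """Print each step. Only execute when dry_run is explicitly turned off."""
    for folder, command in plan:
        print(f"[{folder}] {' '.join(command)}")
        if not dry_run:
            subprocess.run(command, cwd=folder, check=True)


# Hypothetical manifest that points at a plugin file.
manifest = {
    "name": "Finance helper",
    "actions": [{"id": "billingPlugin", "file": "billing-plugin.json"}],
}
print(missing_action_files(manifest, {"declarativeAgent.json"}))

plan = rollout_plan(["agents/finance", "agents/hr"], ["dev", "prod"])
run_plan(plan)  # dry run: prints the four steps, executes nothing
```

Keeping the plan as plain data means it can be reviewed in a pull request before anyone runs it for real.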

What I'd actually do

If you're starting with declarative agents and skills today, the order I'd suggest:

Start with a single agent for a single team. Pick a team with a clear, repetitive workflow. Add image generator if they need visuals or code interpreter if they need analysis. Don't add custom actions yet. Get the team using the agent every day for a fortnight.

Once you've got that working, identify the one or two systems that team would benefit from the agent calling. Build OpenAPI descriptions for those. Add them as custom actions. Watch how usage changes.

Once you've got two or three agents running and lessons learned, start thinking about an MCP server layer to centralise the integration work. This is roughly the path we've walked with several clients on our Microsoft AI consulting engagements.

The mistake to avoid is trying to build the perfect platform first. The technology is moving fast enough that anything you architect today will be obsolete in twelve months. Build small, learn fast, swap out the parts that don't scale.

The official Microsoft guide on adding skills to declarative agents is a fine starting point for the mechanics. The judgement calls (which skills to enable, when to use MCP, how to think about auth and security) are the harder part. If you'd like to talk through what makes sense for your organisation, give us a shout.