Writing OpenAPI Specs That Actually Work With Microsoft 365 Copilot
If you're extending Microsoft 365 Copilot with your own data, the OpenAPI spec is the bit that trips you up. The Microsoft docs make it sound straightforward. You write a spec, point Copilot at it, and your API is now an action Copilot can call. In practice, it's more like writing a job ad for an LLM. Vague descriptions get vague usage. Bad parameter names get ignored. Overlapping operations confuse the model and it ends up calling the wrong one.
We've built a fair few Copilot extensions for Australian organisations over the last year - mostly for professional services firms, a couple of insurers, and one mining operation that wanted to query equipment maintenance records from Teams. Every single one of those projects taught us something about what makes an OpenAPI spec work well with Copilot, and what makes it quietly fail.
This post is what I wish I'd known before the first one.
Why the spec matters more than you think
When you wire a REST API into Copilot, Copilot doesn't read your code. It reads your spec. The descriptions, the operation IDs, the parameter names, the examples. That text is the only thing the model uses to decide:
- Should I call this API at all for the user's request?
- Which operation do I call?
- What values do I put in the parameters?
So your spec isn't just contract documentation any more. It's a prompt. A long, structured prompt that you've written in JSON or YAML. Treat it like one.
The implication that catches people off guard: your existing public API spec is probably not good enough. Most APIs were specified for developers who'd read the docs separately and figure out intent from context. Copilot doesn't have that context. If your endpoint is called /v2/q with a parameter called qs, no amount of Copilot magic is going to make that work reliably.
What "good" looks like in practice
Microsoft's guidance covers the basics. Descriptive operation IDs, clear summaries, good descriptions on every parameter. That's all true. But there's a layer underneath that matters more.
Write descriptions for the model, not your developers. A description like "Returns customer records filtered by status" is fine for a human developer. For Copilot, you want something like "Search for customer accounts. Use this when the user asks about customers, accounts, clients, or any specific person or company that might be in our CRM. Returns matching customer records with contact details, status, and recent activity." It's longer. It uses synonyms the user might actually type. It tells Copilot when to fire.
Use natural names everywhere. getCustomerByEmail reads better than searchCustV2 or customerLookupExt. Copilot will use the operation ID as part of its reasoning. Cryptic names produce cryptic decisions. We renamed every operation in a client's spec from internal codenames to plain English, and Copilot's accuracy on real user queries went up noticeably. Not a controlled experiment, but the support tickets dropped.
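To make the last two points concrete, here's a minimal sketch of an operation written for the model, plus a tiny lint that flags thin descriptions and codename-style operation IDs. The operation is shown as a Python dict mirroring what would sit in the spec. The searchCustomers name, the 15-word threshold and the suffix pattern are all assumptions for illustration, not anything Copilot enforces.

```python
import re

# Hypothetical operation object, as it would appear under
# paths["/customers"]["get"] in an OpenAPI document.
SEARCH_CUSTOMERS = {
    "operationId": "searchCustomers",
    "summary": "Search for customer accounts",
    "description": (
        "Search for customer accounts. Use this when the user asks about "
        "customers, accounts, clients, or any specific person or company "
        "that might be in our CRM. Returns matching customer records with "
        "contact details, status, and recent activity."
    ),
}

MIN_DESCRIPTION_WORDS = 15  # assumed threshold, tune it for your own spec
CRYPTIC_ID = re.compile(r"(V\d+|Ext)$")  # assumed pattern for codename-style suffixes


def lint_operation(operation: dict) -> list[str]:
    """Return human-readable warnings for one operation object."""
    warnings = []
    op_id = operation.get("operationId", "")
    words = operation.get("description", "").split()
    if len(words) < MIN_DESCRIPTION_WORDS:
        label = op_id or "<no operationId>"
        warnings.append(f"{label}: description has only {len(words)} words")
    if not op_id:
        warnings.append("operation has no operationId")
    elif CRYPTIC_ID.search(op_id):
        warnings.append(f"{op_id}: operationId looks like an internal codename")
    return warnings
```

Run against the dict above it returns an empty list. Run against something like searchCustV2 with the one-line developer description, it raises both warnings. A check like this won't tell you whether the description is good, only that someone has at least tried.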
Be brutal about removing unused operations. Every operation in your spec is something Copilot might try to call. If you've got 80 endpoints but only 12 of them make sense for Copilot to use, expose only those 12. We had a client whose Copilot kept calling an admin debug endpoint because the description sounded generic. Removing it from the Copilot-facing spec fixed the problem.
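One way to keep the Copilot-facing spec small without hand-editing is to generate it from the full spec with an allowlist. This is a rough sketch under the assumption that your spec is already loaded as a Python dict (from JSON or YAML). The paths and operation IDs in the example are made up.

```python
import copy

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

# Hypothetical full spec, trimmed to the parts that matter here.
FULL_SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "CRM API", "version": "1.0.0"},
    "paths": {
        "/customers": {"get": {"operationId": "searchCustomers"}},
        "/admin/debug": {"get": {"operationId": "dumpDebugState"}},
    },
}


def prune_spec(spec: dict, allowed_operation_ids: set[str]) -> dict:
    """Return a copy of `spec` containing only the allowlisted operations."""
    pruned = copy.deepcopy(spec)
    for path, item in list(pruned.get("paths", {}).items()):
        for method in list(item):
            if method not in HTTP_METHODS:
                continue  # leave path-level keys such as "parameters" alone
            if item[method].get("operationId") not in allowed_operation_ids:
                del item[method]
        # Drop path entries that no longer expose any operation.
        if not any(key in HTTP_METHODS for key in item):
            del pruned["paths"][path]
    return pruned


copilot_spec = prune_spec(FULL_SPEC, {"searchCustomers"})
```

The allowlist is deliberate. A denylist means every new endpoint is exposed to Copilot by default, which is how debug endpoints end up being called.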
This is part of why we usually build a dedicated Copilot extension layer for clients rather than just pointing Copilot at their existing API. The shape of the spec for Copilot is different from the shape of the spec for app developers.
Parameters are where most specs fail
If there's one thing to obsess over, it's how you describe parameters.
Mandatory fields should have descriptions that explain what's valid. For an enum, list the values in the description even though they're already in the enum array. Copilot uses both. For a date field, say what format you want and whether you accept relative phrases like "last quarter" - if you don't accept them, Copilot will helpfully try to send them anyway and your API will 400.
Optional parameters need a steer too. If customer_type is optional but most useful queries should pass it, say so. Something like "Customer type. Optional but strongly recommended for accuracy. Use 'individual' for personal accounts and 'business' for commercial. If the user hasn't specified, ask before calling."
That last bit is a trick worth knowing. You can put instructions in parameter descriptions that tell Copilot how to handle missing information. The model will often follow them.
Examples in parameters matter too. Don't just specify type. Add an example value. Copilot uses examples to ground its understanding of what real values look like. For an Australian context this is more important than you'd think - if your example for a date is 12/05/2026 and the user types "12 May 2026" or "May 12 2026", Copilot will figure it out faster when it has seen the format you want.
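Pulling the parameter advice together, here's a sketch of two parameter objects and a small lint that checks for a description, an example value, and enum values repeated in the description. The parameter names, the created_after field and the ISO date format are assumptions for illustration. Use whatever format your API actually accepts.

```python
# Hypothetical parameter objects, as they would appear in an operation's
# "parameters" array.
CUSTOMER_TYPE = {
    "name": "customer_type",
    "in": "query",
    "required": False,
    "description": (
        "Customer type. Optional but strongly recommended for accuracy. "
        "Use 'individual' for personal accounts and 'business' for commercial. "
        "If the user hasn't specified, ask before calling."
    ),
    "schema": {"type": "string", "enum": ["individual", "business"]},
    "example": "business",
}

CREATED_AFTER = {
    "name": "created_after",
    "in": "query",
    "required": False,
    "description": (
        "Only return customers created on or after this date, in YYYY-MM-DD "
        "format. Relative phrases such as 'last quarter' are not accepted."
    ),
    "schema": {"type": "string", "format": "date"},
    "example": "2026-05-12",
}


def lint_parameter(param: dict) -> list[str]:
    """Return warnings for one OpenAPI parameter object."""
    name = param.get("name", "<unnamed>")
    description = param.get("description", "")
    schema = param.get("schema", {})
    warnings = []
    if not description:
        warnings.append(f"{name}: missing description")
    if "example" not in param and "example" not in schema:
        warnings.append(f"{name}: missing example value")
    for value in schema.get("enum", []):
        if str(value) not in description:
            warnings.append(f"{name}: enum value {value!r} not mentioned in description")
    return warnings
```

Both parameters above pass cleanly. A bare parameter called qs with an enum and nothing else would fail all three checks.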
Responses need descriptions too
Most teams write detailed request specs and then leave the response schema bare. That's a mistake for Copilot. The response shape determines what Copilot can talk about with the user afterwards.
If your response returns a customer object with a tier field, describe what tier means. "Customer tier. Gold means more than $50k annual spend, Silver is $10k to $50k, Bronze is under $10k." Now when the user asks "is this customer high-value", Copilot can answer using the data instead of going back to ask the API.
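A quick way to find the gaps is to walk the response schema and list every property that has no description. The sketch below assumes an inline schema with no $ref indirection, and the customer schema itself is hypothetical apart from the tier wording above.

```python
# Hypothetical response schema for a customer record.
CUSTOMER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tier": {
            "type": "string",
            "enum": ["Gold", "Silver", "Bronze"],
            "description": (
                "Customer tier. Gold means more than $50k annual spend, "
                "Silver is $10k to $50k, Bronze is under $10k."
            ),
        },
        "contact": {
            "type": "object",
            "description": "Primary contact details for the account.",
            "properties": {"email": {"type": "string"}},
        },
    },
}


def undescribed_properties(schema: dict, prefix: str = "") -> list[str]:
    """List dotted paths of properties that have no description."""
    missing = []
    for name, prop in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if not prop.get("description"):
            missing.append(path)
        if prop.get("type") == "object":
            missing.extend(undescribed_properties(prop, prefix=f"{path}."))
        elif prop.get("type") == "array":
            # Recurse into the item schema for arrays of objects.
            missing.extend(undescribed_properties(prop.get("items", {}), prefix=f"{path}[]."))
    return missing


gaps = undescribed_properties(CUSTOMER_SCHEMA)
```

For the schema above, gaps comes back as name and contact.email. Not every field needs a paragraph, but anything with business meaning, like a tier, a status or a code, should get one.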
This is the difference between Copilot extensions that feel intelligent and ones that feel like a thin chat layer over a database. The good ones describe their data well enough that the model can reason about it.
Authentication is honestly still messy
OpenAPI authentication descriptors for Copilot have got better over the last year but they're still the part where I lose patience most. The supported auth flows are narrower than what general OpenAPI supports. OAuth 2.0 with specific flows. API keys via specific schemes. Anything funky like a custom header-based JWT setup, you're going to have to work around.
We've ended up putting an authentication proxy in front of a couple of clients' APIs specifically because their existing auth scheme didn't map cleanly to what Copilot wanted. Not a huge job, but factor it into your estimates. If you've got an old line-of-business API with bespoke auth, you might be building a thin adapter layer regardless.
For internal APIs that already sit inside Entra ID with proper app registrations, the path is much smoother. If you're starting fresh, build with Entra from the start.
The bits Microsoft doesn't shout about
A few things we've found that aren't in the official guidance but matter:
Test with the actual user phrases. Once your spec is loaded, ask Copilot questions the way a real user would. Not "call getCustomerByEmail with this email address" but "what do we know about John from ACME Mining". If Copilot can't figure out which operation to call from natural language, your descriptions need work.
Watch for operation collisions. If you've got two operations that sound similar from their descriptions, Copilot will pick wrong sometimes. We had getCustomerOrders and getCustomerHistory in the same spec where "history" included orders. We renamed and re-described them to make the distinction crisp.
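You can catch the obvious collisions before Copilot does with a crude word-overlap check across descriptions. This is only a sketch. It uses Jaccard similarity on lowercase word sets, which is a lexical stand-in and says nothing about how the model actually weighs descriptions. The descriptions and the 0.5 threshold are assumptions.

```python
import re
from itertools import combinations

# Hypothetical descriptions keyed by operationId.
DESCRIPTIONS = {
    "getCustomerOrders": "Get the order history for a customer.",
    "getCustomerHistory": "Get the history for a customer, including orders.",
    "searchCustomers": "Search for customer accounts by name or email.",
}


def _tokens(text: str) -> set[str]:
    """Lowercase word set for a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def similar_operations(descriptions: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for pairs whose word overlap meets the threshold."""
    flagged = []
    for (id_a, text_a), (id_b, text_b) in combinations(descriptions.items(), 2):
        a, b = _tokens(text_a), _tokens(text_b)
        if not a or not b:
            continue
        score = len(a & b) / len(a | b)  # Jaccard similarity
        if score >= threshold:
            flagged.append((id_a, id_b, round(score, 2)))
    return flagged


collisions = similar_operations(DESCRIPTIONS)
```

On these three descriptions, only the orders and history pair is flagged. Anything it flags is worth rewriting so each description says what the operation does that its neighbour doesn't.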
Length is a real constraint. Copilot has a limit on how much of your spec it can consider in a single reasoning step. A spec with 200 operations is going to perform worse than the same business surface area carved into focused extensions. We typically build several focused agents rather than one mega-agent for this exact reason.
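If your spec already uses tags, they're a reasonable first guess at where to carve it up. A minimal sketch, assuming the spec is loaded as a dict and that each operation's first tag reflects its business area. The paths, tags and operation IDs below are hypothetical.

```python
from collections import defaultdict

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

# Hypothetical spec covering two unrelated business areas.
SPEC = {
    "paths": {
        "/customers": {
            "get": {"operationId": "searchCustomers", "tags": ["customers"]},
        },
        "/customers/{id}/orders": {
            "get": {"operationId": "getCustomerOrders", "tags": ["customers"]},
        },
        "/maintenance/records": {
            "get": {"operationId": "searchMaintenanceRecords", "tags": ["maintenance"]},
        },
    }
}


def group_operations_by_tag(spec: dict) -> dict[str, list[str]]:
    """Group operationIds by each operation's first tag."""
    groups = defaultdict(list)
    for item in spec.get("paths", {}).values():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            tags = operation.get("tags") or ["untagged"]
            groups[tags[0]].append(operation.get("operationId", "<no operationId>"))
    return dict(groups)


candidate_agents = group_operations_by_tag(SPEC)
```

Here that gives a customers group with two operations and a maintenance group with one. Each group is a candidate for its own focused spec, though the real boundaries should follow how users ask questions, not how the API team organised its controllers.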
Don't trust the initial Copilot Studio import. When you import an OpenAPI spec into Copilot Studio, it generates connector actions. Review them by hand. We've seen it get parameter mappings subtly wrong, especially with nested objects. The fix is usually editing the generated action descriptions after import.
What we'd do differently
If I were starting a new Copilot extension project tomorrow, I'd write the OpenAPI spec from scratch rather than reusing an existing public API spec. Even if the underlying endpoints stayed the same, the spec would be different. Different operation IDs, different descriptions, fewer operations, more examples. Treat it as a separate artefact.
I'd also build the spec iteratively with real user queries. Write five queries you expect users to ask. Try them. See where Copilot gets confused. Adjust the spec. Try again. The spec is never done. It's tuned over time as you watch how people actually use it.
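It helps to keep those queries as a small checklist you re-run after every spec change. The sketch below is just a record-keeping structure. It doesn't call Copilot, and the observed operation is something you'd fill in by hand after each test session. The queries, operation IDs and observed values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryCase:
    query: str                                # what a real user would type
    expected_operation: str                   # operationId you want Copilot to pick
    observed_operation: Optional[str] = None  # recorded by hand; None means nothing was called


def failing_cases(cases: list[QueryCase]) -> list[QueryCase]:
    """Return the cases where Copilot picked a different operation, or none."""
    return [c for c in cases if c.observed_operation != c.expected_operation]


# Hypothetical checklist after one round of manual testing.
cases = [
    QueryCase("what do we know about John from ACME Mining", "searchCustomers", "searchCustomers"),
    QueryCase("what has ACME Mining ordered this year", "getCustomerOrders", "getCustomerHistory"),
    QueryCase("is ACME Mining a high-value customer", "searchCustomers", None),
]

for case in failing_cases(cases):
    print(f"{case.query!r}: expected {case.expected_operation}, got {case.observed_operation}")
```

The failures are the to-do list for the next round of description edits, and the passing cases are your regression check that a fix for one query hasn't broken another.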
If you're getting started with Microsoft 365 Copilot extensions for your business, we run Copilot training and implementation engagements that cover this in detail. The technology is solid now. The hard part is the spec design and the operational fit.
Reference: Best practices for writing OpenAPI specs from the Microsoft 365 Copilot Extensibility documentation.