Giving Microsoft 365 Copilot Agents the Ability to Read and Act on Documents

June 21, 2026•7 min read•Michael Ridland

There's a particular kind of frustration I hear from teams who've started building agents on top of Microsoft 365 Copilot. They've set up a declarative agent, given it some instructions, pointed it at the right knowledge sources, and it answers questions nicely. Then someone with a document open in front of them, a contract or a report or a spreadsheet, asks the agent something about that file, and the agent has no idea what they're talking about. The thing they're staring at is the one thing the agent can't see.

That gap is exactly what document interaction closes. It's one of those capabilities that sounds small in the release notes and turns out to change how people actually use an agent. Microsoft covers the setup in their document interaction documentation, and the configuration itself isn't complicated. The interesting part is what it lets you build, and where it still bites. We've put a few of these into real businesses, so here's the practical view.

What Document Interaction Actually Does

A declarative agent in Microsoft 365 Copilot is, at its core, Copilot with a particular personality, a set of instructions, and a defined slice of knowledge. You build it to do one job well rather than to be a general assistant. The default version of that agent works off whatever knowledge sources you've configured, things like SharePoint sites or specific files you've pointed it at.

Document interaction adds something different: the ability for the agent to work with the document the user has open right now, in context, during the conversation. So if someone's reading a forty-page services agreement in Word and they ask the agent to summarise the termination clauses or flag anything unusual about the payment terms, the agent can actually read that open document and respond to it. The file doesn't need to have been pre-loaded as a knowledge source. It's the live thing in front of the user.

This matters because it matches how people actually work. Nobody thinks "let me go configure this PDF as a knowledge source first." They've got the file open, they have a question about it, and they want an answer. Document interaction makes the agent meet them where they already are. That shift, from "the agent knows about a fixed set of documents" to "the agent can engage with whatever you're working on," is the difference between a clever demo and something people reach for every day.

Where It Earns Its Keep

The use cases that land hardest are the ones where people deal with a steady stream of one-off documents that all need the same kind of attention.

Think about a legal or contracts team. Every agreement that crosses their desk is different, so pre-loading them as knowledge sources is pointless, but the questions are always similar. What are our obligations? Where's the liability cap? Does anything here deviate from our standard terms? An agent built specifically for contract review, with document interaction switched on, lets a lawyer open any agreement and get a structured first read in seconds. It doesn't replace the lawyer's judgement, but it gets the boring first pass out of the way. We've done this kind of work with professional services firms, and the time saved on routine first-pass review adds up fast.

Finance teams are another natural fit. Someone gets a supplier proposal or a board pack and needs to pull out the numbers that matter, compare them against expectations, or sanity-check the assumptions. An agent that can read the open document and answer pointed questions about it turns a tedious manual scan into a conversation.

The pattern underneath all of these is the same: a high volume of documents, each one different from the last, and a consistent set of questions about them. When you've got that shape, a declarative agent with document interaction is one of the cleaner wins available in the Microsoft 365 stack right now. If you're trying to work out where this kind of agent fits in your own organisation, mapping those high-volume, repetitive document tasks is exactly the sort of thing our Copilot and agent work starts with.

The Honest Caveats

Now the bits that don't make it into the marketing.

The first is that the quality of the answer is bounded by the quality of the document and the clarity of your instructions. Feed it a clean, well-structured contract and ask a precise question and you'll get an excellent response. Feed it a scanned PDF that's really just an image of text, or a spreadsheet with merged cells and notes scattered everywhere, and the agent will struggle the same way any reader would. Document interaction reads documents; it doesn't perform miracles on bad ones. Teams that test it only on tidy sample files and then roll it out against the messy reality of their actual document pile are setting themselves up for disappointment.

The second is expectation management around what the agent is actually doing. It's reading the document and reasoning over the text, not auditing it with legal or financial certainty. I'm fairly blunt with clients about this. An agent that summarises contract clauses is a brilliant assistant and a terrible final authority. The right framing is "this gives you a fast, useful first read so a person can focus their attention," not "this checks the contract so you don't have to." Get that framing wrong and you've built a liability rather than a tool. The agents that succeed are the ones deployed with that boundary made explicit to the people using them.

The third is the usual Microsoft caveat: the surface is still moving. What's supported, which file types behave well, how the agent accesses the open document, all of this evolves release to release. It's worth checking the current documentation when you build rather than relying on what was true six months ago. This is normal for the Copilot extensibility space, which is genuinely useful and genuinely still maturing at the same time. Build for it, but build expecting to adjust.

How to Approach Building One

If you're going to put one of these into a real team, here's the shape I'd follow.

Start by being specific about the job. A declarative agent is at its best when it does one thing well, so "contract review assistant for our standard supplier agreements" beats "general document helper." The narrower the job, the better the instructions you can write, and the instructions are where most of the quality comes from. Spend real time on them. Tell the agent how to structure its answers, what to flag, what tone to take, and what to do when it's unsure.

Then test against your actual documents, not idealised ones. Pull a representative sample of the messy, real files your team deals with and see how the agent copes. That's where you'll learn whether the idea survives contact with reality, and it's far cheaper to learn it during a pilot than after a rollout.
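If your pilot files come straight off a shared drive, it's easy to end up testing a pile of clean Word documents and a single scanned PDF. Sampling by file type keeps the awkward formats in the test set. The sketch below is a minimal Python take on that, under a few assumptions: the files are reachable from a local or synced folder, the file extension is a good-enough proxy for "kind of document", and `sample_by_type` is a hypothetical helper written for this post, not part of Copilot or any Microsoft tooling.

```python
import random
from collections import defaultdict
from pathlib import Path


def sample_by_type(paths, per_type=5, seed=0):
    """Hypothetical helper: pick up to `per_type` files of each extension.

    Grouping by extension before sampling keeps the less common formats
    (scanned PDFs, odd spreadsheets) in the pilot set instead of letting
    the most common format crowd them out.
    """
    by_type = defaultdict(list)
    for p in paths:
        p = Path(p)
        # Normalise case so ".PDF" and ".pdf" land in the same group.
        by_type[p.suffix.lower() or "<no extension>"].append(p)

    rng = random.Random(seed)  # fixed seed so the pilot set can be reproduced
    sample = {}
    for ext in sorted(by_type):
        group = sorted(by_type[ext])  # stable order before sampling
        sample[ext] = rng.sample(group, min(per_type, len(group)))
    return sample
```

You'd call it with something like `sample_by_type(p for p in Path("contracts").rglob("*") if p.is_file())`, where `contracts` stands in for wherever your team's documents actually live. Extension is a blunt way to stratify; splitting further, say by supplier or by scanned versus born-digital files, is a judgement call for your own document pile.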

Finally, set the expectations of the people using it from day one. Make it clear what the agent is for and where the human still has to do the thinking. The agents that get trusted and used are the ones where everyone understands the boundary. The ones that get quietly abandoned are usually the ones that were oversold.

Worth the Effort

Document interaction is a small switch with an outsized effect on how usable a Copilot agent feels. It moves the agent from a thing that knows about your documents to a thing that can engage with the document in front of you, and for any team that processes a high volume of varied files, that's a meaningful upgrade. The technology is solid, the rough edges are manageable if you know they're there, and the wins are real when you aim it at the right job.

If you're thinking about where agents like this fit in your Microsoft 365 environment, or you've started building one and want a second opinion on whether you're aiming it at the right problem, that's a conversation we have often. Get in touch and we can talk through what's worth building and what's better left to a person for now.