Knowledge Sources in Microsoft 365 Copilot - What Your Custom Agents Can Actually Read
The moment a business gets past playing with Copilot and tries to build something useful with it, the same question comes up. "Can it answer questions about our stuff?" Our policies, our products, our project files, the things that actually live inside the company. Out of the box, Copilot knows about the public internet and whatever the person asking can already see in your Microsoft 365 tenant. The interesting work, the stuff that's genuinely worth paying for, happens when you point an agent at specific knowledge and tell it "answer from here."
That pointing is done through knowledge sources, and Microsoft's documentation on knowledge sources for Copilot extensibility lays out the options. I want to talk about it from the angle of someone who's built these agents for real businesses, because there's a gap between what the docs describe and what works once people start actually using the thing.
What a Knowledge Source Actually Is
Strip away the jargon and a knowledge source is just the set of information you've told an agent it's allowed to draw on when answering. Instead of the agent reaching across everything, you scope it down to specific places. A SharePoint site full of HR policies. A set of documents about your products. An external system you connect through a Graph connector. The agent grounds its answers in those sources, which means it can cite where an answer came from and is far less likely to make things up.
That grounding is the whole point. A general chatbot that confidently invents an answer about your refund policy is worse than useless; it's a liability. An agent grounded in the actual refund policy document, one that quotes the relevant section and links to it, is something a customer service team can genuinely rely on. The knowledge source is what turns a clever-sounding toy into a tool people trust.
We build a lot of these grounded agents, and the difference in adoption between a grounded one and an ungrounded one is night and day. People can tell when an answer is real versus plausible-sounding waffle, and they stop using the ones that waffle. If you want to understand how we approach this properly, our Copilot Studio consulting work is built around getting the grounding right before anything else.
The Sources You Can Actually Use
The most common and most useful starting point is SharePoint. If your organisation already has documents living in SharePoint, and almost every Microsoft 365 customer does, you can point an agent at a specific site, library, or set of files and have it answer from those. This is the path of least resistance and where I'd tell most teams to begin. You're using content that already exists, already has permissions on it, and is already maintained by someone.
Beyond SharePoint, you can connect external knowledge through Microsoft Graph connectors (Microsoft's newer documentation calls them Copilot connectors), which pull content from systems outside Microsoft 365 into the index that Copilot can search. Think of a knowledge base in a third-party support tool, a document management system, or a custom database. There's more setup involved, but it means the agent can reach company knowledge that doesn't live in Microsoft's world. For businesses with important content scattered across different platforms, this is how you bring it together without migrating everything.
There are also more specialised options, like embedded file content and connections to specific data through the agent's actions, but for most teams the story is SharePoint first, Graph connectors when you need to reach further. Don't overcomplicate the opening move.
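To make "SharePoint first, Graph connectors when you need to reach further" concrete, here's a rough sketch of how a declarative agent's manifest scopes its knowledge. The field names (capabilities, OneDriveAndSharePoint, items_by_url, GraphConnectors, connection_id) are as I recall them from Microsoft's declarative agent manifest schema, so treat them as assumptions and check the current schema before relying on them. The site URL, connection id, and agent name are made-up placeholders.

```python
import json

# Sketch of a declarative agent manifest scoped to two knowledge sources.
# ASSUMPTION: field names follow the declarative agent manifest schema as
# recalled; verify against the current schema. The URL and connection id
# below are hypothetical placeholders.
manifest = {
    "version": "v1.0",
    "name": "HR Policy Assistant",
    "description": "Answers HR policy questions from curated sources.",
    "instructions": "Answer only from the listed knowledge sources and cite the document used.",
    "capabilities": [
        {
            # SharePoint first: one specific site, not the whole intranet.
            "name": "OneDriveAndSharePoint",
            "items_by_url": [
                {"url": "https://contoso.sharepoint.com/sites/HRPolicies"}
            ],
        },
        {
            # A connector only when the content lives outside Microsoft 365.
            "name": "GraphConnectors",
            "connections": [{"connection_id": "supportKnowledgeBase"}],
        },
    ],
}

print(json.dumps(manifest, indent=2))
```

The point of the shape is that each source is named explicitly. Anything you don't list is out of scope for the agent.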
Permissions Are the Part Everyone Underestimates
Here's the thing the documentation mentions but doesn't shout loudly enough, and the thing that bites people. Copilot respects the existing permissions on your content. That sounds reassuring, and it is, but it cuts both ways.
The good news is that an agent grounded in SharePoint won't show a user content they don't already have access to. If someone can't open a document directly, the agent won't quietly leak its contents to them either. Security trimming, as it's called, is built in and it works.
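If security trimming feels abstract, this toy sketch shows the idea. It is not how Copilot implements it internally; the Document type, the group names, and the visible_to helper are all invented for illustration. The principle is simply that content is filtered by what the asking user can already read before anything is used to answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    allowed: frozenset  # users or groups with read access (hypothetical model)


def visible_to(user: str, groups: set, docs: list) -> list:
    """Keep only documents the user could already open directly."""
    principals = {user} | groups
    return [d for d in docs if d.allowed & principals]


docs = [
    Document("Refund policy", frozenset({"all-staff"})),
    Document("Salary bands", frozenset({"hr-team"})),
]

# Alice is ordinary staff, so the salary document never reaches the agent.
alice_docs = visible_to("alice", {"all-staff"}, docs)
print([d.title for d in alice_docs])
```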
The bad news, or at least the surprising news, is that your existing permissions are now exposed in a way they never were before. If a sensitive document was technically open to "everyone in the company" because nobody ever tightened the permissions, it was probably safe through obscurity. Nobody knew the file existed or where to find it. Put a Copilot agent in front of it and suddenly anyone can ask a natural-language question and have that document surface instantly. The agent is brilliant at finding things, including things you'd forgotten were poorly secured.
This catches organisations out constantly. We always tell clients that rolling out Copilot agents is, among other things, a permissions audit you didn't know you needed. Before you point an agent at a SharePoint site, somebody needs to actually look at what's in there and who can see it. It's tedious. It's also non-negotiable, and we build that review into every Copilot deployment we run because skipping it is how you end up with an embarrassing incident.
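The review itself can start very simply. Here's a minimal sketch that assumes you've already exported a permissions report into a list of records; the record shape, the file paths, and the list of "broad" groups are my own assumptions for illustration, not the output of any particular Microsoft tool.

```python
# ASSUMPTION: group names you consider "too broad" in your own tenant.
BROAD_GROUPS = {"Everyone", "Everyone except external users", "All Company"}


def flag_overshared(items: list) -> list:
    """Return paths of items shared with any organisation-wide group."""
    return [
        item["path"]
        for item in items
        if BROAD_GROUPS & set(item["shared_with"])
    ]


# Hypothetical export: one record per file, with who it is shared with.
report = [
    {"path": "/HR/holiday-policy.docx", "shared_with": ["HR Team", "Managers"]},
    {"path": "/HR/redundancy-plan.docx", "shared_with": ["Everyone except external users"]},
]

for path in flag_overshared(report):
    print("Review before grounding:", path)
```

A list like this doesn't decide anything for you. It just tells a human which files to look at first.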
Curation Beats Volume Every Time
A mistake I see again and again is the instinct to give an agent everything. "Let's just point it at the whole intranet." It feels generous and thorough. It produces a worse agent.
When you ground an agent in a vast, messy pile of content, it has to wade through outdated drafts, superseded policies, duplicate documents, and half-finished notes to find an answer. Retrieval gets noisier, so the answers get less reliable. Worse, it might confidently quote a policy that was replaced two years ago because nobody deleted the old version.
A tightly curated knowledge source, a clean set of current, authoritative documents, produces a dramatically better agent than a sprawling one. Less is genuinely more here. The work of deciding what goes in, and keeping the old rubbish out, is the work. One company we helped had thirty versions of their employee handbook floating around SharePoint. The single most valuable thing we did wasn't technical at all. It was sitting with them to identify the one true current version and making sure that was what the agent saw. The agent got good the moment the source got clean.
So the honest advice is to treat your knowledge source like a curated library, not a junk drawer. Fewer, better, current documents. And someone needs to own keeping it that way, because content rots and a knowledge source that was clean a year ago won't be clean now without maintenance.
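A first pass at that curation can be mechanical, even though the final call is human. This sketch assumes you can list candidate documents with a "family" label (which document they are a version of) and a last-modified date. Both fields, the file names, and the cutoff date are hypothetical.

```python
from datetime import date


def curate(docs: list, stale_before: date) -> list:
    """Keep the newest version per family, then drop anything too old."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["family"])
        if current is None or doc["modified"] > current["modified"]:
            latest[doc["family"]] = doc
    # Even a "latest" version is excluded if nobody has touched it since the
    # cutoff; that one needs an owner to review it, not silent inclusion.
    return [d for d in latest.values() if d["modified"] >= stale_before]


candidates = [
    {"family": "employee-handbook", "path": "handbook-2021.docx", "modified": date(2021, 3, 1)},
    {"family": "employee-handbook", "path": "handbook-2024.docx", "modified": date(2024, 6, 1)},
    {"family": "travel-policy", "path": "travel-2019.docx", "modified": date(2019, 1, 15)},
]

curated = curate(candidates, stale_before=date(2023, 1, 1))
print([d["path"] for d in curated])
```

In this example only the 2024 handbook survives. The travel policy drops out because its newest copy is older than the cutoff, which is a prompt to go and find out whether it's still current.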
What's Still Rough
I'll be straight about the limitations, because the marketing won't be. Grounding quality depends heavily on how your content is structured. Well-organised documents with clear headings and sensible structure get parsed and retrieved far better than a wall of text or, heaven forbid, information trapped inside scanned PDFs and images. If your knowledge lives in badly formatted files, the agent will struggle, and the fix is often to improve the source documents themselves before blaming the technology.
Freshness can lag too. When you update a document, the index doesn't always reflect the change instantly, so there can be a window where the agent answers from a slightly stale version. For most uses that's fine. For anything time-critical, like a price that just changed, it's worth understanding the refresh behaviour rather than assuming the agent is always bang up to date.
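One practical way to handle that window is to measure how long changes take to show up in your own tenant, then treat anything edited more recently than that as "possibly stale". The sketch below assumes you've measured that figure yourself; the four-hour value and the function name are placeholders, not documented behaviour.

```python
from datetime import datetime, timedelta


def may_be_stale(modified: datetime, asked_at: datetime, refresh_window: timedelta) -> bool:
    """True if the document changed so recently the index may not have caught up."""
    return asked_at - modified < refresh_window


# ASSUMPTION: a refresh window you observed by testing, not an official number.
observed_window = timedelta(hours=4)

price_list_changed = datetime(2025, 3, 10, 10, 0)
print(may_be_stale(price_list_changed, datetime(2025, 3, 10, 10, 30), observed_window))
print(may_be_stale(price_list_changed, datetime(2025, 3, 11, 9, 0), observed_window))
```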
And honestly, the setup and tuning take more thought than the demos suggest. Getting an agent to reliably answer from the right source, cite properly, and handle the awkward edge cases is real work. It's very doable, and the payoff is large, but anyone promising you a five-minute grounded agent that just works is selling you the demo, not the deployment.
Where to Start
If you're weighing this up, my advice is to start narrow and prove it. Pick one well-defined use case: a customer service knowledge base, an HR policy assistant, or a product information helper. Curate a clean set of source documents for it. Audit the permissions on those sources before you switch anything on. Build the agent, test it hard with real questions from the people who'll use it, and only then think about expanding.
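"Test it hard" is easier to sustain if the real questions live in a small repeatable check. This is a sketch of that idea only: ask_agent stands in for however you actually query your agent, and the answer shape (a dict with a "citations" list) is an assumption I've made for the example.

```python
def evaluate(ask_agent, cases: list) -> list:
    """Return the questions whose answer did not cite the expected source."""
    failures = []
    for case in cases:
        answer = ask_agent(case["question"])
        if case["expected_source"] not in answer["citations"]:
            failures.append(case["question"])
    return failures


# Stand-in agent for the example; replace with a call to your real agent.
def fake_agent(question: str) -> dict:
    if "refund" in question.lower():
        return {"text": "Refunds within 30 days.", "citations": ["refund-policy.docx"]}
    return {"text": "I'm not sure.", "citations": []}


cases = [
    {"question": "What is our refund window?", "expected_source": "refund-policy.docx"},
    {"question": "How much annual leave do I get?", "expected_source": "leave-policy.docx"},
]

print(evaluate(fake_agent, cases))
```

Checking the citation rather than the wording keeps the test stable: the phrasing of an answer will vary, but a grounded answer should keep pointing at the same document.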
That narrow, grounded, well-curated approach is what separates the Copilot projects that stick from the ones that get abandoned after the novelty wears off. The technology is genuinely good now. The discipline around what you feed it is what decides whether it's useful.
If you're trying to build agents that actually answer from your company's knowledge and you want them done properly, with the permissions and curation handled rather than glossed over, that's squarely what we do. Take a look at our business AI work or just get in touch and tell us what you're trying to build. The first question we'll ask is what your knowledge source looks like, because that's where these projects are won or lost.