OpenAI File Search - Building Knowledge-Grounded AI Agents with Vector Stores

April 24, 2026 · 8 min read · Michael Ridland

One of the most common requests we get from clients is some version of "we want an AI that can answer questions from our documents." It sounds simple. It isn't, usually. You need to chunk documents, generate embeddings, store them somewhere searchable, handle retrieval, manage context windows, and deal with the inevitable quality issues when the retrieval doesn't find the right passage.

OpenAI's file search tool takes most of that off your plate. You upload files, they go into a vector store, and when the model needs information from those files, it searches the vector store automatically. No embedding pipeline to build. No retrieval logic to write. The file search documentation makes it look straightforward, and for once, the documentation isn't overselling it. It actually works pretty well for a lot of use cases.

But "pretty well" has limits, and I want to be honest about where those limits sit.

How It Works

The setup has three steps. Upload a file to OpenAI's Files API. Create a vector store. Add the file to the vector store. Then when you make a Responses API call, you include file_search in the tools list and specify which vector store to search.

The model decides when to use file search based on the conversation. If a user asks a question that the model thinks requires information from the uploaded files, it runs a search, gets relevant passages, and uses those passages to generate its answer. You get back two things: the file search call itself (so you can see that it happened) and the model's response with citations pointing back to the source files.
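As a rough sketch of that call shape (the model name, question, and vector store ID below are placeholders, not values from a real deployment):

```python
import os

# Hypothetical vector store ID -- replace with your own.
VECTOR_STORE_ID = "vs_example123"

# A Responses API request with file search enabled. The model decides
# whether to actually run a search based on the conversation.
request = {
    "model": "gpt-4o-mini",
    "input": "What does our returns policy say about damaged goods?",
    "tools": [
        {"type": "file_search", "vector_store_ids": [VECTOR_STORE_ID]},
    ],
}

# Only hit the API when a key is configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    response = client.responses.create(**request)
    # The output list contains the file_search call itself, then the
    # model's message with citations back to the source files.
    for item in response.output:
        print(item.type)
```

The two output items are exactly the pair described above: the search call (so you can see it happened) and the cited answer.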

What I like about this design is that you're not managing the retrieval pipeline. OpenAI handles the chunking, embedding, indexing, and search. For teams that want to build a document Q&A system without becoming retrieval experts, this removes a real barrier.

Setting Up Your Vector Store

The practical workflow looks like this.

First, upload your file. OpenAI supports a reasonable range of formats - PDFs, Word documents, PowerPoint, code files, markdown, plain text, JSON, and HTML. Most of what you'd typically want to search is covered. The encoding needs to be UTF-8, UTF-16, or ASCII for text-based formats, which is standard but worth checking if you're working with legacy documents.

Then create a vector store. This is just an API call to create a named container for your files. You can have multiple vector stores for different purposes - one for product documentation, another for policy documents, a third for technical specs.

Finally, add files to the vector store. This triggers the processing pipeline on OpenAI's side: chunking, embedding, and indexing. You need to poll the status until it reports completed before you can search against it. For small files this takes seconds. For larger documents, give it a minute or two.

One thing we've learned working with clients: organise your vector stores by domain rather than dumping everything into one. A vector store with 500 diverse files produces noisier search results than three targeted stores with 150-170 files each. The model gets better passages when the search space is more focused.

Retrieval Customisation

Out of the box, file search returns whatever number of results the model thinks it needs. You can customise this.

Limiting results. If you're concerned about token usage or latency, you can cap the number of results returned. Fewer results means less context for the model to work with, which speeds things up but might miss relevant information. There's no magic number here - it depends on your documents and the kinds of questions people ask. We usually start with the defaults and tune down if latency is an issue.

Including search results in the response. By default, you see the model's answer with citations, but you don't see the raw search results. Adding include parameters lets you see exactly what passages the model retrieved. This is valuable during development and testing - you can check whether the model found the right passages and whether your documents are chunked well. For production, you might not need this, but we always enable it during the build phase.

Metadata filtering. This is where things get genuinely useful for larger deployments. You can attach metadata attributes to your vector store files and then filter searches based on those attributes. Say you've got product documentation for five different product lines. Rather than creating five separate vector stores, you tag each file with a product_line attribute and filter at search time. The user asking about Product A only gets results from Product A's documents.

We've used metadata filtering for a client with documentation across multiple regions. Each document was tagged with its region, and the agent filtered searches based on which region's operations the user was asking about. Same vector store, different search scopes. It's clean and it works.
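The mechanics have two halves: tag files with attributes when you add them to the store, then pass a filter on the file_search tool at query time. A sketch with hypothetical attribute names and IDs, assuming the documented comparison-filter shape:

```python
# At indexing time, attach attributes to the vector store file, e.g.:
#   client.vector_stores.files.create(
#       vector_store_id="vs_example123",
#       file_id="file_abc",
#       attributes={"region": "apac"},
#   )

# At query time, scope the search with a filter on the tool definition.
tool = {
    "type": "file_search",
    "vector_store_ids": ["vs_example123"],
    # "eq" is one of the comparison operators; filters can also be
    # combined with and/or compound filters for richer scoping.
    "filters": {"type": "eq", "key": "region", "value": "apac"},
}
```

Same vector store, different search scopes per request, exactly as in the regional-documentation case above.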

What It's Good At

File search works well for structured, factual knowledge retrieval. Policy documents, procedure manuals, product specifications, FAQ databases - anything where the answer to a question lives in a specific passage of a specific document. The citation system is particularly good here because users can trace the answer back to the source.

It's also a strong fit for code-related questions. If you upload your codebase (or selected files from it), the model can search for relevant functions, classes, and patterns. This is useful for internal developer tools - agents that help new team members understand how a codebase works.

For AI workspaces where teams need to interact with their organisational knowledge through natural language, file search is often the fastest path to something useful. The time from "we have documents" to "we have a working Q&A agent" can be measured in hours rather than weeks, which changes the economics of these projects significantly.

Where It Falls Short

Let me be direct about the limitations, because they matter for deciding whether this is the right approach.

You don't control the chunking. OpenAI handles how your documents get split into searchable passages. For well-structured documents with clear sections and headings, this usually works fine. For documents with complex layouts, tables that span pages, or information that only makes sense in context with surrounding paragraphs, the automatic chunking can miss the mark. We've had cases where the relevant answer spanned a page break and got split across two chunks, with neither chunk containing enough context on its own.

Search quality depends on document quality. If your documents are poorly written, inconsistently structured, or full of jargon without context, the retrieval will suffer. This isn't a file search problem - it's a garbage-in problem. But it's worth calling out because organisations often underestimate how messy their document collections are until they try to build a search system on top of them.

Rate limits matter at scale. Tier 1 gets you 100 requests per minute for file search. That's fine for an internal team tool. If you're building something that hundreds of people will use simultaneously, you'll need Tier 4 or 5 for 1,000 RPM, and you should plan your architecture accordingly.

No on-premises option. Your documents go to OpenAI. For some Australian organisations, particularly in healthcare, financial services, and government, this is a non-starter due to data sovereignty requirements. If that's your situation, you'll need to build your own retrieval pipeline using self-hosted embedding models and vector databases - or look at Azure OpenAI Service where you get more control over data residency.

File Search vs Building Your Own RAG Pipeline

This is the question that comes up in most of our AI consulting engagements. Should you use OpenAI's hosted file search, or build a custom retrieval-augmented generation (RAG) pipeline?

Use file search when you want to move fast, your documents are in standard formats, you're comfortable with OpenAI hosting your data, and your scale fits within the rate limits. The operational overhead is minimal - no infrastructure to manage, no embedding models to maintain, no vector database to tune.

Build your own when you need control over chunking strategies, you have data sovereignty requirements, you're at a scale where cost optimisation matters, or you need to integrate with existing search infrastructure. Custom RAG gives you more control at the cost of more engineering work.

Honestly, for most of the mid-market organisations we work with, file search is the right starting point. You can always migrate to a custom pipeline later if you outgrow it. But the "let's build custom RAG from day one" approach often delays getting value to users by months, and in that time the requirements change anyway.

Getting Started

If you want to try this out, the setup is genuinely quick. Upload a handful of documents that represent your use case. Create a vector store. Build a simple interface that takes user questions and passes them through the Responses API with file search enabled. Test it with real questions from real users, not contrived examples.

The gap between "it answers questions" and "it answers questions well enough that people actually use it instead of searching manually" is where the real work lives. That's about document quality, prompt design, and understanding what your users actually need from the system. The technology handles the hard parts. The human parts are still on you.

If you're planning a document intelligence or knowledge retrieval project and want to talk through the options, we're here to help. We've built these systems across multiple platforms and we can help you pick the right approach for your specific situation.