Back to Blog

Combining Files from a Folder in Power BI - What Actually Works

May 10, 20268 min readMichael Ridland

Everyone hits this problem eventually. A folder full of files that all look roughly the same and need to end up as a single table in Power BI. Monthly sales exports. Site reports from twenty branches. Survey responses from a year's worth of campaigns. You can open them one at a time, copy and paste them into a master Excel, and pray nothing changes. Or you can use the Combine Files feature in Power BI Desktop and let Power Query do the work.

I want to walk through how this actually plays out on real consulting jobs, not the tidy version Microsoft shows in the docs. Because in practice the folder is never as clean as the tutorial suggests.

How the feature works at a high level

When you point Power BI at a folder and click Combine Files, three things happen automatically. Power Query inspects the first file and works out what type of file it is. It generates a sample function that knows how to extract data from one file. Then it applies that function to every file in the folder and stacks the results into one table.

The clever bit is that these three pieces stay linked. The sample query and the function query reference each other. If you go back and change the sample, the function updates. If you add a new step to the sample, like trimming whitespace or unpivoting a column, that step runs on every file in the folder. Microsoft calls this the exemplar query pattern, and once you understand it, the whole feature makes sense.

The official walkthrough is in the Microsoft documentation on combining files. I'm not going to repeat the click-by-click. I want to talk about what goes wrong.

Where the tutorial breaks down

The tutorial assumes your folder contains files with identical structure. Same columns, same sheet names if they're Excel, same delimiter if they're CSV. Real folders are messier.

Here's what I see on most client engagements. Someone changed the CSV format halfway through the year. The Excel template got a new column added in March. There are two old files from before the rebrand that have a different header row. Someone accidentally saved a copy with a ~$ lock prefix because they had it open when the file got synced.

Combine Files handles uniform folders beautifully. It does not handle these messy realities on its own. You need to be ready to do some cleanup work before the combine step or build smarter logic into the exemplar query.

The simplest cleanup is a filter step early in the Folder query. Filter out files that don't end with .xlsx or .csv. Filter out files where the name starts with ~$. Filter out hidden files. This catches most of the rubbish that ends up in a SharePoint or OneDrive folder over time.

For the schema drift problem, the exemplar query is where you fix it. Build the extraction steps to be tolerant of column changes. If you reference columns by name, a missing column will break the whole load. If you reference by position, a reordered column will silently give you wrong data. Neither is good. The trick is to add a step that checks for the columns you care about and handles missing ones explicitly, even if that just means filling them with null.

The folder choice matters more than you'd think

There's a quiet decision baked into how you point Power BI at the folder. Local file path, network share, SharePoint folder, OneDrive folder. Each has different refresh behaviour and different gotchas.

Local file paths are fine for development on your machine. They break the moment you publish to the Power BI service because the gateway can't see your C drive. I've watched this happen more times than I'd like to admit. A consultant builds the report locally, it works, they publish, and the scheduled refresh fails because nothing can find the folder.

Network shares work if your organisation has a data gateway configured and the gateway service account has read access. If your IT department isn't sure what a data gateway is, you have a longer conversation ahead of you. We help clients sort this out on most of our Microsoft data and AI engagements, and it almost always uncovers wider issues with how data is being shared internally.

SharePoint and OneDrive folder connectors are the cleanest option for ongoing operational reports. They handle authentication through the user's account, they work in the service without a gateway, and they're tolerant of files being renamed because they use file IDs under the hood. The catch is they can be slow with large folders. If you've got 5,000 files in a single SharePoint library, expect refreshes to take longer than you'd like.

Why the exemplar query is the actual power

The exemplar query is the bit most people skip over and the bit that matters most. Once you understand it, the whole feature unlocks.

The default behaviour gives you a bare extraction. For a CSV that's basically "read the file, promote the first row to headers". For an Excel file with a single sheet, it's "open the workbook, take the only sheet, promote headers". Useful, but rarely enough.

The trick is that the exemplar query is just a normal Power Query query. You can do anything in it. Filter rows. Pivot or unpivot columns. Parse dates from filenames. Run regex over text columns. Any transformation you'd do on a single file, you can put in the exemplar query, and it will automatically apply to every file in the folder.

A pattern I use a lot is extracting metadata from the filename. If your files are named Sales_2026_March_NSW.csv, you can add steps in the exemplar query to parse the year, month and state out of the filename and add them as columns. Now your combined table has provenance baked into every row, without you needing to manually tag anything. That's massively useful for any report that needs to answer "where did this number come from".

Another pattern: skipping junk header rows. A lot of system exports have three or four header rows of report titles and metadata before the actual data starts. Add a "remove top N rows" step in the exemplar query and it runs on every file. Way cleaner than trying to do this with M code that loops over a list.

What I'd watch out for

A few patterns that have cost real time on real projects.

The "first file wins" problem. Combine Files picks the first file in the folder to base the exemplar on. If that file happens to be unusual, your sample is wrong and every other file gets processed through the wrong logic. Look at your data after the combine. If something feels off, swap the order or rebuild the exemplar against a more representative file.

The "Excel sheet name drift" problem. If your Excel files have sheets named Sheet1 on some files and Data on others, the combine will fail or silently miss data on the inconsistent ones. The fix is to write the exemplar to pick the sheet dynamically, like "first sheet" or "sheet matching a pattern", instead of hardcoding the name.

The "binary column type" problem. If you accidentally select a different scope in the Combine Files dialog (the whole workbook vs a specific sheet) you get very different behaviour. Selecting the workbook root gives you nested binary objects per file. Selecting a sheet gives you the data directly. Neither is wrong, but if you pick the wrong one you'll get confused fast.

The refresh performance problem. Combining lots of files is fast on initial load. It can be brutally slow on incremental refresh if you haven't set the refresh policy properly. By default, every refresh re-reads every file. For folders with hundreds of files this becomes the slow part of your whole report. Set up incremental refresh by file date or file name, depending on how your files are structured.

When to use Combine Files and when to do something else

Combine Files is the right tool when you have a folder of files that should logically be a single table, the files are reasonably consistent, and you can't change the upstream system to give you a database or API.

It's the wrong tool when you have a few hundred thousand files. At that point you want to be ingesting them through Microsoft Fabric, Azure Data Factory, or a proper data pipeline. The line is somewhere around five thousand files in my experience. Below that, Combine Files works fine. Above that, you're fighting the tool. This is where our Microsoft Fabric and Data Factory work usually starts.

It's also the wrong tool when the files have wildly different structures. If half your folder is JSON, a third is CSV, and the rest is Excel, you don't have a "folder of similar files". You have three different data sources that happen to share a folder. Treat them as separate queries.

Final thoughts

Combine Files is one of the small, well-designed features of Power Query that pays off compound interest over time. Learn it properly once, and you save yourself hours on every project that involves file-based data. Skip past it because the tutorial seems too simple, and you'll either rebuild this logic by hand every time or end up with brittle reports that break the moment a column moves.

The exemplar query pattern is the bit worth internalising. Once you understand that you're really just writing instructions for one file and Power Query handles the rest, the feature stops feeling magical and starts feeling like a tool you can shape. That shift, from passive consumer to active builder, is roughly the same shift we try to drive with every client we work with on Power BI and the broader Microsoft data stack.

If you're stuck on a specific folder structure or you've got a refresh that's mysteriously slow, drop us a line. We've probably seen the exact pattern before.