How to Shrink a Bloated Excel Workbook Before It Breaks Power BI
Almost every business we work with has at least one monster spreadsheet. You know the one. It's 80 megabytes, takes a full minute to open, freezes when you scroll, and three different teams depend on it. Somebody built it years ago, it grew, and now it's load-bearing infrastructure that wheezes every time you touch it. When that workbook becomes the source for a Power BI report, all of those problems get inherited and amplified.
Microsoft has a tidy reference on reducing the size of an Excel workbook so it works better with Power BI, and the advice is sound. But documentation tends to list the techniques without telling you which ones actually move the needle in practice. After cleaning up a lot of these beasts, I can tell you the size usually comes from two or three culprits, and fixing those gets you most of the way.
Why a Fat Workbook Is Worth Fixing
Before the how, a quick word on why this matters beyond tidiness. A bloated workbook isn't just slow. It's fragile. Big files corrupt more often, fail to refresh in Power BI, and time out when they're sitting behind a gateway. We've had clients whose entire reporting pipeline fell over once a month, and the root cause was a source workbook that had quietly crept past the size where things start to break.
There's also the practical ceiling. Power BI has limits on file sizes you can import, and when you're pulling from a workbook that's mostly dead weight, you're wasting that budget on rubbish. Trimming the file means faster refreshes, fewer mystery failures, and a model that has room to grow. It's unglamorous work, but it's the kind of thing that stops the 2am support call.
Culprit Number One: Formatting Nobody Can See
Here's the one that surprises people. A huge chunk of Excel bloat is formatting applied to cells that contain nothing. Someone selects an entire column and applies a colour, a border, a number format. Excel dutifully remembers that styling for all million-odd rows in the column, even though only fifty of them have data. Do that across a few sheets and you've added megabytes of pure overhead.
The tell-tale sign is when you press Ctrl+End and Excel jumps to cell XFD1048576 or somewhere absurdly far from your actual data. That means Excel thinks your used range is enormous, and it's storing formatting and empty-cell information for all of it. The fix is to select all the genuinely empty rows and columns below and to the right of your real data, delete them properly, then save. The used range resets, and the file often drops dramatically.
I've seen a 60 megabyte workbook fall to 8 megabytes from this one fix alone. No data lost, nothing changed that anyone could see. Just the removal of phantom formatting that was never doing anything useful. If you only do one thing on this list, do this one.
Culprit Number Two: Data That Should Live Somewhere Else
The second big offender is using Excel as a database when it was never meant to be one. Hundreds of thousands of rows of transactional data, sitting in a worksheet, being recalculated every time anyone breathes near the file. Excel can technically hold a lot of rows, but it gets slow and heavy long before it hits the hard limit.
If your workbook is really a data store, the honest answer is that the data shouldn't be in Excel at all. It should be in a proper source that Power BI can connect to directly, whether that's a database, a Fabric lakehouse, or even a set of CSV files in a folder. The spreadsheet becomes a place for analysis and presentation, not bulk storage. This is often the right moment to rethink the whole pipeline rather than keep patching the workbook.
That said, I'm realistic. Not every business can swap out a critical spreadsheet overnight, and sometimes Excel staying as the source is the pragmatic call for now. If that's where you are, at least get the data into the workbook's data model rather than sitting raw on worksheets, which brings me to the next point. When clients are ready to graduate from spreadsheet-as-database to something that scales, that's a big part of what our Microsoft Fabric work is about: moving the heavy data off the spreadsheet and onto foundations that don't fall over.
Use the Data Model, Not Worksheets
Excel has had a proper data model inside it for years, the same compression engine that powers Power BI. When you load data into the data model rather than onto worksheets, it gets compressed far more efficiently than raw cells. The same data that takes 40 megabytes spread across sheets might take a fraction of that in the model.
The practical move is to bring data in through Power Query and load it to the data model only, not to a worksheet, when you don't need to eyeball the raw rows. This keeps the workbook lean and, as a bonus, makes the eventual move to Power BI far smoother, because you've already done the modelling work. The queries and relationships you build in Excel's Power Query carry across almost directly.
This is also just better practice. Raw data dumped onto a worksheet invites people to fiddle with it, break formulas, and create the kind of mess that's impossible to audit. Data in the model is cleaner, more controlled, and more honest about being a source rather than a scratchpad.
The Smaller Wins That Still Add Up
Beyond the big three, there's a cluster of smaller things worth knowing about. Volatile formulas like INDIRECT, OFFSET, and TODAY recalculate constantly and can make a workbook sluggish even when it's not especially large. If a file feels slow out of proportion to its size, volatile formulas are often why. Replacing them with stable alternatives helps the responsiveness even if it doesn't shrink the file much.
Then there's the stuff that quietly accumulates. Hidden sheets full of old working data nobody remembers. Pivot cache copies that double up the underlying data for every pivot table. Conditional formatting rules layered on top of each other over years until there are hundreds of them. Images and embedded objects saved at full resolution. None of these is dramatic on its own, but a workbook that's been alive for five years tends to collect all of them, and clearing the junk out is satisfying and effective.
Saving in the modern .xlsx or .xlsb format matters too if you've somehow still got an old .xls file kicking around. The binary .xlsb format in particular can be noticeably smaller and faster to open, though it's slightly less compatible with other tools, so weigh that up.
A Sensible Order of Attack
If you've got a workbook that's misbehaving and you connect it to Power BI, here's how I'd work through it. Start with Ctrl+End to check the used range, and clear out the phantom formatting if it's blown out, because that's the highest-payoff fix. Then look at whether you're storing data that really belongs in a proper source. Move what you can into the data model. Clean up the volatile formulas, the orphaned sheets, and the accumulated cruft. Save in a modern format and check the size again.
Nine times out of ten, that sequence turns a wheezing 70 megabyte file into something lean that loads quickly and refreshes in Power BI without complaint. And if you go through all that and the file is still enormous, that's your signal that Excel has outgrown its job and it's time to put the data somewhere built for it.
That transition, from heroic spreadsheets to a reporting setup that scales, is one of the most common pieces of work we do. It usually starts exactly here, with one workbook that's become too important and too fragile at the same time. We help teams get the data onto solid foundations and build Power BI reporting that doesn't depend on a single overloaded file. If that sounds like a problem you recognise, get in touch and tell us about your workbook. We've almost certainly seen worse.