Reducing Power BI Data Model Size - Techniques That Actually Work

April 6, 2026 · 8 min read · Michael Ridland

There's a moment in every Power BI project where someone realises the data model has grown too big. Maybe the refresh takes 45 minutes. Maybe the .pbix file is 1.2 GB and won't publish to the service. Maybe reports that used to load in two seconds now take fifteen.

I've seen this happen more times than I can count. The good news is that Power BI's VertiPaq engine is remarkably efficient at compression - 10 GB of source data can compress down to about 1 GB in the model. The bad news is that many teams accidentally defeat that compression with design choices they made early in the project when the data was small and nobody was worried about size.

Here's what actually moves the needle when you need to shrink a model down.

Why Size Matters (Beyond the Obvious)

Before getting into techniques, it's worth understanding why smaller models aren't just a nice-to-have.

Shared capacity in Power BI has a hard limit of 1 GB per model. Premium and Fabric capacities can go higher, but bigger models consume more memory and that affects everything else running on the same capacity. Your model competes with other models for memory. When it gets evicted, the next user to open a report waits while it reloads.

Smaller models also refresh faster. If your refresh window is tight - say you need data updated before the 8am leadership meeting - a model that refreshes in 5 minutes instead of 30 gives you breathing room for retries when things go wrong. And things always go wrong eventually.

The performance angle is real too. Fewer rows mean faster DAX calculations. Reports feel snappier. Users actually trust and use the reports instead of going back to Excel.

Start by Removing Columns You Don't Need

This is the single most effective thing you can do, and it's where I always start when reviewing a client's model.

Most models import entire tables from the source system. Every column, whether it's used in a report or not. That "CreatedBy" column from your ERP system? The "InternalNotes" field with free-text descriptions? The system-generated UUID that nobody ever filters on? They're all sitting in your model consuming memory.

Go through each table and ask two questions about every column. Is it used in a visual, slicer, or filter? Is it needed for a relationship, calculation, or security rule? If the answer to both is no, remove it in Power Query.
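In Power Query, an explicit `Table.SelectColumns` step makes the keep-list visible in the query itself. A minimal sketch, assuming a hypothetical SQL source and column names:

```powerquery
let
    // Hypothetical source - substitute your own connection details
    Source = Sql.Database("sql-server-01", "SalesDb"),
    Sales  = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Keep only the columns reports actually use; everything else never loads
    KeptColumns = Table.SelectColumns(
        Sales,
        {"OrderDate", "CustomerKey", "ProductKey", "Quantity", "SalesAmount"}
    )
in
    KeptColumns
```

Selecting the columns to keep, rather than removing the ones you don't want, also means a new column added to the source table stays out of the model until you deliberately opt it in.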

I worked on a model last year where removing unused columns cut the model size by 40%. The client had imported 85 columns from a sales transaction table. Reports used 12 of them. The other 73 were just dead weight.

A practical tip: use DAX Studio's "View Metrics" feature to see column sizes. It shows you exactly how much space each column uses, sorted by size. Ten minutes with View Metrics tells you more about your model than an hour of guessing.

Filter Out Historical Data You Don't Report On

This is the second-biggest win I see. Most transactional tables grow continuously, but most reports only look at the last two or three years.

If your sales fact table has data going back to 2015 but nobody reports on anything before 2023, you're carrying three years of unnecessary history. Apply a filter in Power Query to load only the rows you need.

The best approach is using a parameter for the date cutoff. Set it to something like "3 years before today" using a relative calculation. This way your model automatically drops old data as new data comes in, without anyone needing to manually adjust the filter.
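The rolling cutoff might look like this in Power Query (table and column names are assumptions):

```powerquery
let
    Source = Sql.Database("sql-server-01", "SalesDb"),  // hypothetical source
    Sales  = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Rolling cutoff: three years before today, recalculated on every refresh
    CutoffDate = Date.AddYears(Date.From(DateTime.LocalNow()), -3),
    RecentRows = Table.SelectRows(Sales, each [OrderDate] >= CutoffDate)
in
    RecentRows
```

Against a relational source this filter should fold into the generated SQL, so the excluded rows never leave the database.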

Be careful with this one though. Don't just silently remove historical data from a model that people are actively using. Someone will eventually try to compare this year's performance against 2020 and wonder why the data's gone. Communicate the change and make sure the business is comfortable with the retention period.

For entity-based filtering - say you only need Australian data and the source system has global data - the same principle applies. Filter early in Power Query, not late in DAX. Every row you prevent from loading is a row that doesn't need to be compressed, indexed, or scanned.
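The entity filter follows the same pattern - one `Table.SelectRows` step near the top of the query, sketched here with a hypothetical `Country` column:

```powerquery
let
    Source = Sql.Database("sql-server-01", "SalesDb"),  // hypothetical source
    Sales  = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Filter as early as possible so the step can fold back to the source
    AustralianRows = Table.SelectRows(Sales, each [Country] = "Australia")
in
    AustralianRows
```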

Pre-Aggregate Where Possible

This is the technique with the biggest potential impact and the biggest trade-off.

If your fact table has one row per order line item and you're only ever reporting at the monthly level, you're loading 30x more rows than you need. Group the data by month, customer, and product in Power Query, sum up your measures, and load the summarised data instead.
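In Power Query, `Table.Group` does the summarisation. A sketch, assuming hypothetical column names:

```powerquery
let
    Source = Sql.Database("sql-server-01", "SalesDb"),  // hypothetical source
    Sales  = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Collapse each order date to the first of its month
    WithMonth = Table.AddColumn(
        Sales, "OrderMonth", each Date.StartOfMonth([OrderDate]), type date
    ),
    // One row per month / customer / product instead of one per line item
    Summarised = Table.Group(
        WithMonth,
        {"OrderMonth", "CustomerKey", "ProductKey"},
        {
            {"SalesAmount", each List.Sum([SalesAmount]), type number},
            {"Quantity",    each List.Sum([Quantity]), Int64.Type}
        }
    )
in
    Summarised
```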

I've seen this take models from millions of rows to thousands. The size reduction can be 95% or more.

The catch is that you lose the ability to drill down to individual transactions. If someone wants to see which specific orders made up a monthly total, that data isn't in the model anymore. You need to be sure your reporting requirements won't change before you commit to this approach.

A middle ground that works well is using composite models. Keep the summarised data in Import mode for fast performance on dashboards and summary reports, and add a DirectQuery connection to the source for detail-level drill-through pages. Users get speed for their daily reports and granularity when they need to investigate specific records.

Optimise Column Data Types

VertiPaq handles numeric data much more efficiently than text. Numeric columns use value encoding, which is fast and compact. Text columns use hash encoding, which requires looking up each unique value in a dictionary.

Look for columns where the data looks like text but is really numeric. Order numbers are a classic example. If every order number is "SO" followed by six digits (SO123456), you can strip the prefix in Power Query and store the numeric portion as an integer. For high-cardinality columns with millions of unique values, this can make a real difference.
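A sketch of the prefix strip, assuming every value really is "SO" followed by digits:

```powerquery
let
    Source = Sql.Database("sql-server-01", "SalesDb"),  // hypothetical source
    Sales  = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Drop the two-character "SO" prefix and store the rest as a whole number
    NumericOrders = Table.TransformColumns(
        Sales,
        {{"OrderNumber", each Int64.From(Text.Range(_, 2)), Int64.Type}}
    )
in
    NumericOrders
```

If users still need to see "SO123456" in visuals, a custom format string can re-add the prefix for display without paying the text-storage cost.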

Date and datetime columns are another area to check. If you have a datetime column but only use the date portion, truncate the time component. A column with 365 unique dates compresses far better than one with 8,760 unique datetime values (one per hour).
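Truncating is a single `Table.TransformColumns` step - `Date.From` drops the time portion (column name assumed):

```powerquery
let
    Source = Sql.Database("sql-server-01", "SalesDb"),  // hypothetical source
    Sales  = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Keep only the date portion; 8,760 hourly values collapse to 365 dates
    DateOnly = Table.TransformColumns(
        Sales,
        {{"OrderTimestamp", Date.From, type date}}
    )
in
    DateOnly
```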

Prefer Power Query Columns Over Calculated Columns

This is one of those tips that sounds minor but adds up across a large model.

When you need a computed column - a "Full Name" combining first and last name, a categorisation based on a value range, a fiscal year derived from a date - you have two options. Create it in Power Query (M language) or create it as a calculated column in the model (DAX).

Both produce the same result from a reporting perspective. But Power Query columns are compressed more efficiently by VertiPaq and don't slow down your data refresh. Calculated columns are computed after all Power Query tables load, which extends refresh time and uses slightly more memory.

The exception is when the calculation needs to reference data from multiple tables or use DAX-specific functions like RELATED or CALCULATE. In those cases, a calculated column is the right tool. But for simple column transformations - concatenation, type conversion, conditional logic - do it in Power Query.
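For example, a "Full Name" column built in Power Query rather than as a DAX calculated column (table and column names assumed):

```powerquery
let
    Source    = Sql.Database("sql-server-01", "SalesDb"),  // hypothetical source
    Customers = Source{[Schema = "dbo", Item = "Customer"]}[Data],
    // Computed during refresh and compressed like any other imported column
    WithFullName = Table.AddColumn(
        Customers, "Full Name",
        each [FirstName] & " " & [LastName],
        type text
    )
in
    WithFullName
```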

Disable Auto Date/Time

Power BI has a setting called "Auto date/time" that creates a hidden date table for every date column in your model. If you have five date columns across your tables, you get five hidden tables, each with a full calendar hierarchy.

For models with many date columns, this adds up. Disable it under Options > Data Load (the setting appears in both the Current File and Global sections) and create your own date table instead. One well-designed date table is better than five auto-generated ones that you can't customise anyway.
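A minimal date table can be built in Power Query too; this sketch runs from an assumed retention start date to today:

```powerquery
let
    StartDate = #date(2023, 1, 1),                    // assumed retention start
    EndDate   = Date.From(DateTime.LocalNow()),
    DayCount  = Duration.Days(EndDate - StartDate) + 1,
    Dates     = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    AsTable   = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed     = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
    WithYear  = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    WithMonth = Table.AddColumn(WithYear, "Month", each Date.MonthName([Date]), type text)
in
    WithMonth
```

Remember to mark it as a date table in the model so time-intelligence functions work against it.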

Use DirectQuery for Large Tables

If you've applied all the techniques above and your model is still too large, DirectQuery is your escape valve.

Instead of importing a table, DirectQuery queries the source system in real time when a user interacts with a report. No data is stored in the model at all for that table. This is perfect for large fact tables that are too big to import, even after filtering and aggregation.

The trade-off is performance. DirectQuery is slower than Import because every interaction generates a query against the source database. The source database needs to be fast enough to handle those queries without making users wait.

Composite models - mixing Import and DirectQuery tables in the same model - give you the best of both approaches. Keep your dimension tables and aggregated facts in Import mode, and put the detail-level fact table in DirectQuery. Reports are fast for normal use and can drill into detail when needed.

Getting Your Model Right

Data model design is one of those areas where getting it right early saves enormous pain later. Retrofitting a model that's been in production for two years - with dozens of reports built on top of it - is significantly harder than designing it properly from the start.

Our Power BI consulting team regularly works with organisations to audit and optimise existing models. We also help teams build new models with these principles baked in from day one. If your reports are slow and your refreshes are getting longer, there's almost certainly room to improve.

For a deeper look at these techniques, Microsoft's data reduction guidance for Import modelling is worth reading. It covers the VertiPaq storage engine internals that explain why these techniques work.

And if your data story goes beyond Power BI - if you need to think about data integration, Fabric, or how reporting fits into a broader business intelligence strategy - that's a conversation we're always happy to have.