Power BI Dataflow Refresh - How to Understand and Optimise Performance

April 8, 2026 · 9 min read · Michael Ridland

Dataflows are one of those Power BI features that start simple and get complicated quickly. You set up a few tables, schedule a refresh, and everything works fine. Then you add more tables, more transformations, and suddenly your refresh is taking two hours - and timing out at inconvenient moments.

I've seen this pattern play out across dozens of Power BI implementations. The dataflow itself usually isn't the problem - it's the refresh configuration and the transformation design that cause grief. The good news is that most performance issues have straightforward fixes once you know where to look.

Two Types of Refresh - And Why It Matters

Power BI dataflows support two refresh modes. Understanding which one you're using (and which one you should be using) is the first step to fixing performance problems.

Full refresh does exactly what it sounds like. It flushes all the existing data and reloads everything from scratch. Every row, every table, every time. For small datasets this is fine. For a table with 50 million rows of transaction data, it's a problem.

Incremental refresh is available on Premium and Premium Per User capacity. Instead of reloading everything, it partitions your data by date and only refreshes the partitions that have changed. After the initial load, subsequent refreshes only pull the last N days (or weeks, or months) of data.

The performance difference is dramatic. We had a client with a financial reporting dataflow that was doing a full refresh of three years of transaction data every night. The refresh took about 90 minutes and occasionally timed out. After setting up incremental refresh to only process the last 30 days, the nightly refresh dropped to about 8 minutes. Same data, same reports, wildly different refresh performance.

If you're on Premium and your dataflow processes any kind of time-series data, you should be using incremental refresh. Full stop.

One gotcha though: for incremental refresh to work efficiently, the date filter needs to be pushed down to the source system via query folding. If your data source can't fold the filter (flat files, some APIs, certain web connectors), Power BI ends up pulling the entire dataset anyway and filtering in memory, which defeats the purpose entirely. More on query folding later.
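Power BI manages the partitioning internally, but the selection logic is easy to picture. The sketch below (hypothetical function and parameter names, not any Power BI API) illustrates why a 30-day incremental policy over monthly partitions touches only one or two partitions per refresh instead of all thirty-six:

```python
from datetime import date, timedelta

def partitions_to_refresh(partition_starts, refresh_date, window_days):
    """Return the partition start dates an incremental policy would
    re-process: only partitions overlapping the trailing window.
    Partitions are monthly; each runs from its start date to the day
    before the next partition starts (or refresh_date for the latest)."""
    window_start = refresh_date - timedelta(days=window_days)
    selected = []
    for i, start in enumerate(partition_starts):
        end = (partition_starts[i + 1] - timedelta(days=1)
               if i + 1 < len(partition_starts) else refresh_date)
        if end >= window_start:  # partition overlaps the refresh window
            selected.append(start)
    return selected

# Three years of monthly partitions, as in the financial reporting example.
months = [date(2023 + y, m, 1) for y in range(3) for m in range(1, 13)]
hot = partitions_to_refresh(months, date(2025, 12, 15), 30)
# Only the partitions overlapping the last 30 days get re-processed;
# the other ~34 months of history are left untouched.
```

That difference between re-processing two partitions and re-processing thirty-six is where the 90-minutes-to-8-minutes improvement comes from.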

Reading Your Refresh History

Power BI gives you a CSV download of your refresh history that contains useful diagnostic data. Most people never look at it. You should.

Navigate to your dataflow, go to Settings, then Refresh History. Each refresh has a download icon that gives you a CSV with per-entity and per-partition details. On Premium, you get extra columns that are genuinely useful:

Rows processed tells you how many rows were scanned or written. If this number seems higher than expected, your query might not be folding properly.

Max commit (KB) shows peak memory usage. If you're getting out-of-memory errors, this is where you diagnose them. A high max commit value usually means your M query is doing something expensive in memory - like unpivoting a wide table or expanding a nested record with millions of rows.

Processor time versus duration is an interesting comparison. If processor time is high relative to duration, your transformations are doing a lot of work. If duration is high but processor time is low, you're probably waiting on the source system or on network latency.

Wait time shows how long your entity sat in a queue before processing started. High wait time means your Premium capacity is congested. Other workloads might be competing for the same resources.

Compute engine tells you whether the enhanced compute engine was used. This is one of the most important diagnostics to check.
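Those rules of thumb are easy to automate once you have the CSV. A minimal sketch, assuming hypothetical column names (check the header row of your own export, as the exact names vary):

```python
import csv
import io

# Hypothetical column names and sample values -- substitute the real
# header names from your own refresh-history CSV export.
SAMPLE = """Entity,Duration (s),Processor time (s),Wait time (s)
Staging_Sales,5400,300,20
Transform_Sales,600,540,450
"""

def diagnose(row):
    """Apply the rules of thumb from the refresh-history columns."""
    duration = float(row["Duration (s)"])
    cpu = float(row["Processor time (s)"])
    wait = float(row["Wait time (s)"])
    findings = []
    if wait > 60:
        findings.append("capacity congestion (high wait time)")
    if duration > 0 and cpu / duration < 0.2:
        findings.append("likely source/network bound (low CPU vs duration)")
    if duration > 0 and cpu / duration > 0.8:
        findings.append("transformation heavy (high CPU vs duration)")
    return findings

report = {row["Entity"]: diagnose(row)
          for row in csv.DictReader(io.StringIO(SAMPLE))}
```

Run this over a few weeks of downloads and patterns emerge quickly: entities that are consistently waiting, entities that are consistently CPU-bound, and entities where the source is the real drag.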

The Compute Engine - Where Performance Gains Hide

The enhanced compute engine is a feature on Premium capacity that uses an internal SQL engine to accelerate transformations. When it works, it can speed up refresh by a significant margin. When it doesn't activate, you're leaving performance on the table.

The compute engine column in your refresh history shows one of four statuses:

NA means the compute engine wasn't used at all. This happens if you're on Pro (which doesn't support it), if you've turned it off, if query folding is handling everything at the source, or if your transformations are too complex for the SQL engine.

Cached means your entity data was stored in the compute engine and is available for other queries to reference. This is what you want to see on staging entities that feed downstream dataflows.

Folded means the compute engine's SQL engine processed your transformations. Your query ran against the cached SQL data rather than going back to the source. This is the goal for transformation entities.

Cached + Folded is the best case scenario. Your entity is both cached for downstream use and its transformations were folded into SQL. If you see this across your entities, your dataflow architecture is well-designed.

Here's the pattern that reliably activates the compute engine: separate your dataflows into staging and transformation layers. The staging dataflow pulls raw data from your sources with minimal transformation. The transformation dataflow references the staging entities and applies your business logic. This separation lets the compute engine cache the staging data and fold your transformation queries into SQL.

We use this two-layer pattern on almost every Power BI consulting engagement and it consistently delivers better performance than a single monolithic dataflow.

Timeout Errors and How to Fix Them

Pro dataflows have hard timeout limits. Two hours for any single entity. Three hours for the entire dataflow. If you hit these limits, your refresh fails and no data gets updated.

Premium removes these timeouts, which is one reason to upgrade if your dataflows are growing. But just because Premium doesn't time out doesn't mean slow refreshes are acceptable. A refresh that takes four hours is still burning capacity and potentially blocking other workloads.

Common causes of long refreshes:

No query folding. Your M query pulls the entire dataset into memory and applies all transformations there. This is the number one cause of slow refreshes. Check the step folding indicators in the Power Query editor to see which steps fold and which don't.

Expanding table operations. Merging tables and expanding columns is expensive if the tables are large. If you're expanding a column from a table with millions of rows, consider whether you can restructure the query to filter first.

Too many entities in one dataflow. Each entity refreshes as part of a single transaction. If one entity fails, the whole dataflow fails and nothing gets updated. Breaking large dataflows into smaller ones with clear dependencies gives you more granular control over refresh and failure handling.

Gateway bottlenecks. If your data source is on-premises and goes through a gateway, that gateway is a potential bottleneck. Undersize it and your refreshes slow down. We've seen cases where adding a second gateway in a cluster cut refresh times in half, simply because the first gateway was maxed out on CPU.

Query Folding - The Single Most Important Performance Concept

Query folding is when Power Query translates your M transformations into native queries that run on the source system. Instead of pulling all the data into Power BI and filtering in memory, the filter, join, or aggregation happens at the database level.

This matters enormously for dataflow performance. A folded query might transfer 10,000 rows. The unfolded equivalent might transfer 10 million rows and then filter down to 10,000 in memory.

Most SQL-based data sources support query folding well. OData feeds support some level of filtering. Flat files, Excel files, web APIs, and SharePoint lists generally don't fold at all.

To check whether your query folds, use the step folding indicators in Power Query Online. Each step shows an icon indicating whether it folded to the source. If folding breaks at a particular step, look at what that step does. Common folding breakers include custom M functions, certain text operations, and transformations that have no SQL equivalent.

When folding breaks, restructure your query. Move the non-folding steps to the end so that as much filtering and joining as possible happens at the source before the data hits the Power BI engine.
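The effect of folding is easiest to see with a database analogy. This sketch uses SQLite rather than Power Query, but the principle is identical: a folded filter runs inside the engine and only matching rows cross the wire, while an unfolded one transfers everything and filters client-side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(i, "AU" if i % 100 == 0 else "OTHER") for i in range(10_000)],
)

# "Folded": the filter runs inside the engine; only matching rows
# are transferred to the client.
folded = conn.execute("SELECT * FROM sales WHERE region = 'AU'").fetchall()

# "Not folded": every row is transferred, then filtered client-side.
everything = conn.execute("SELECT * FROM sales").fetchall()
filtered = [r for r in everything if r[1] == "AU"]

# Both paths produce the same 100 rows, but the unfolded path
# moved all 10,000 rows to get there.
```

Scale that 100-to-1 ratio up to the 10,000-versus-10-million example above and the refresh-time impact is obvious.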

Orchestration Between Dataflows

When you have multiple dataflows in the same workspace with dependencies, Power BI handles the refresh chain automatically. If dataflow B references dataflow A, refreshing A will trigger B to refresh afterwards.

This automatic orchestration is convenient but limited. It only works within a single workspace, and it only flows in one direction. If you need cross-workspace orchestration or more complex refresh chains, you'll need to use the Power BI REST APIs or Power Automate.

The Power Automate approach is particularly clean. There's a Power BI connector that lets you trigger dataflow refreshes and wait for completion before triggering the next one. You can build arbitrarily complex refresh pipelines this way, with branching, error handling, and notifications.
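If you'd rather script it than use Power Automate, the REST route looks roughly like this. The endpoint shapes follow the documented "Refresh Dataflow" and "Get Dataflow Transactions" calls, but verify them against current Microsoft documentation; the session object and polling logic here are an illustrative sketch, not a production client:

```python
import time

BASE = "https://api.powerbi.com/v1.0/myorg"

def refresh_request(group_id, dataflow_id):
    """Build the POST that kicks off a dataflow refresh."""
    url = f"{BASE}/groups/{group_id}/dataflows/{dataflow_id}/refreshes"
    return ("POST", url, {"notifyOption": "MailOnFailure"})

def status_request(group_id, dataflow_id):
    """Build the GET that lists refresh transactions, newest first."""
    url = f"{BASE}/groups/{group_id}/dataflows/{dataflow_id}/transactions"
    return ("GET", url, None)

def run_chain(session, chain):
    """Refresh dataflows in order, waiting for each to finish before
    starting the next. `session` is any object exposing
    request(method, url, payload) -> parsed JSON dict -- e.g. a thin
    wrapper over requests.Session that adds the OAuth bearer token."""
    for group_id, dataflow_id in chain:
        session.request(*refresh_request(group_id, dataflow_id))
        while True:
            resp = session.request(*status_request(group_id, dataflow_id))
            if resp["value"][0]["status"] != "InProgress":
                break
            time.sleep(30)  # poll at a polite interval
```

Because the workspace (group) ID is a parameter on every call, this works across workspaces - which is exactly what the built-in orchestration can't do.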

For simpler scheduling needs, the built-in scheduled refresh works fine. Just be mindful of the Pro limit of 8 refreshes per day. On Premium, this limit rises to 48, but you should still think about whether your reports actually need to refresh as often as you think they do. We've seen organisations refreshing hourly when their source data only changes once a day.

Environment Optimisation

Beyond the dataflow design itself, your infrastructure matters:

Capacity sizing on Premium directly affects refresh speed. If your capacity is undersized, refreshes queue up and wait times increase. The refresh history CSV shows wait times - if these are consistently high, your capacity needs a review.

Gateway sizing for on-premises sources is often neglected. The default gateway configuration is fine for light workloads, but if you're pushing millions of rows through it, consider the hardware. Microsoft has published sizing guidance that's worth reading.

Network latency between your data source, gateway, and Power BI service region matters. Your Power BI tenant is assigned to a specific Azure region. If your data source is in a different region (or a different country), network latency adds up. Keep things in the same region where possible. For Australian tenants, Australia East is the common region - make sure your gateways and data sources are close to that.

Start With the Diagnostics

If your dataflow refreshes are slow, don't guess at the fix. Download the refresh history CSV, look at the actual numbers, and let the data tell you where the problem is. High processor time points to expensive transformations. High wait time points to capacity congestion. Long duration with low processor time points to network or source system delays.

Most performance issues come down to three things: missing query folding, missing incremental refresh, or an underactivated compute engine. Fix those and you'll fix most problems.

If your organisation is working through Power BI performance challenges and wants structured guidance, our Microsoft Fabric and Power BI consulting team can help you design dataflow architectures that scale. We've done this work across financial services, retail, and professional services firms across Australia.

For the full technical reference, see Microsoft's documentation on understanding and optimising dataflow refresh.