The Hidden Cost of Referencing Power Query Queries in Power BI
A senior data analyst at an Australian insurance client called us last year about a Power BI dataset that took 47 minutes to refresh. The model was small. The source was a perfectly healthy SQL Server. There was no obvious reason for it to be that slow.
The cause turned out to be a single shared query referenced by twelve other queries. The shared query hit a web API, did some clean-up, and then fanned out into staging tables. The analyst assumed Power BI was running the shared query once and reusing the result twelve times. That is not what Power Query does. It was running the source web API call thirteen times per refresh - once for each downstream query, plus once for the original.
Once we understood what was happening, the fix took an afternoon. The refresh time dropped from 47 minutes to under three. This is one of the most common performance traps in Power BI and very few people know about it until it bites them.
What "referencing a query" actually means
In Power Query, you can create a query and then create a second query that uses the first one as its starting point. The second query is said to reference the first. You can see this when you right-click a query in the Queries pane and choose Reference.
The mental model most people have is that this is like a function call. The referenced query runs once, returns a result, and the result is reused everywhere it is referenced.
That is wrong. The actual behaviour is closer to a code include. When Power Query executes a query that references another query, it copies the steps of the referenced query into the calling query and runs the whole thing from scratch.
Consider the scenario from the Microsoft documentation. You have Query1 that hits a web API and has Load disabled. You have Query2, Query3, and Query4 that each reference Query1 and add some transformations, and all three are loaded to the model.
When you click Refresh, here is what actually happens:
- Power Query runs
Query1's steps as part ofQuery2. The web API is called. - Power Query runs
Query1's steps as part ofQuery3. The web API is called again. - Power Query runs
Query1's steps as part ofQuery4. The web API is called a third time.
The web API is called three times even though logically it should be called once. The result is slow refreshes and a source system that is being hammered for no good reason.
Why this design choice exists
Power Query is built around a functional, declarative model where queries are evaluated independently. There is no global execution graph that says "compute these shared inputs first". Each query that gets loaded to the model is evaluated as an independent thing, top to bottom, with referenced queries inlined into it.
This makes the architecture simple and predictable. It also means there is no place for an automatic "compute this once and cache it" step. If you want caching, you have to build it.
The result is a tension. The most natural way to share logic across queries (referencing) is also a performance hazard. The way to avoid the hazard (persisted intermediate storage) requires more setup. We see this trade-off play out badly on real client projects all the time, and it is one of the things we look for first when reviewing slow Power BI models as part of our Power BI consulting work.
Table.Buffer does not save you
A common first reaction when teams hear about this is "I will just buffer the query in memory with Table.Buffer". This does not work the way people expect.
Table.Buffer caches a table in memory for the duration of a single query execution. The keyword is "single query execution". If you buffer Query1 and then reference it from Query2, the buffer exists only while Query2 is running. When Query3 runs separately, it starts with a fresh memory state and the buffer is gone. The source data gets pulled again.
Worse, if you put Table.Buffer inside the referenced query, you are now buffering the data three times per refresh - once for each downstream query that gets executed. This can actually make performance worse, not better. We have seen this happen on client work where someone tried to "optimise" with Table.Buffer and the refresh time went up.
Table.Buffer is useful for specific scenarios within a single query, like preventing repeated re-evaluation of an in-query lookup table. It is not a tool for sharing data between queries.
The fix - use a dataflow
The right way to share a data input across multiple queries is to materialise the input outside of Power Query's per-query execution model. The Microsoft-recommended approach for this is a dataflow.
A dataflow takes your Query1 logic and runs it on a schedule in the Power BI service (or in Microsoft Fabric for the Gen2 version). The result is stored persistently. Your Query2, Query3, and Query4 then read from the stored dataflow instead of from the original source.
The architecture goes from:
Web API -> Query1 (inline into Query2) -> Query2 -> Model
Web API -> Query1 (inline into Query3) -> Query3 -> Model
Web API -> Query1 (inline into Query4) -> Query4 -> Model
To:
Web API -> Dataflow Query1 (runs on schedule, stored)
Stored data -> Query2 -> Model
Stored data -> Query3 -> Model
Stored data -> Query4 -> Model
The web API is called once per dataflow refresh, not three times per dataset refresh. The downstream queries read from pre-computed storage, which is fast.
We covered the broader case for dataflows in our Power BI dataflows best practices post. The query referencing problem is one of the most concrete reasons to introduce them.
Dataflow Gen2 vs Dataflow Gen1
Microsoft has two flavours of dataflows now. Dataflow Gen2 runs inside Microsoft Fabric and stores its output in a Lakehouse or Warehouse by default. Dataflow Gen1 is the older "self-service" dataflow that lives inside a Power BI workspace.
If you have Fabric capacity, use Gen2. The compute is more flexible, the storage is queryable from other tools, and the developer experience is better. If you are on Power BI Pro or Premium Per User without Fabric, Gen1 dataflows are still a valid option and they solve the query referencing problem just as well.
The decision tree we use with clients:
- Got Fabric? Use Dataflow Gen2.
- On Pro or PPU only? Use Dataflow Gen1.
- Don't want the operational overhead of dataflows at all? Consider whether the source itself can be optimised, or whether you can collapse the shared logic into a single query that produces a wider table that downstream consumers filter from. Sometimes this is simpler than introducing a new artifact.
When referencing is fine
To be clear, referencing queries is not always a problem. It only matters when the referenced query does expensive work that gets repeated.
If Query1 is a simple list of literal values or a quick lookup against a small reference table, referencing it from ten other queries costs almost nothing. The cost only matters when the referenced query is doing real work - hitting a web API, running an expensive SQL query, doing heavy transformations, or pulling lots of data.
The rough heuristic we use during model reviews: if a referenced query runs for more than two or three seconds on its own, and it is referenced by more than one downstream query, you have a candidate for dataflow extraction.
How to spot this in your own model
If you suspect query referencing is hurting your refresh times, here is how to confirm it.
In Power BI Desktop, go to View > Performance Analyzer. Start recording and trigger a refresh. The trace will show you how long each query takes. If you see the same source operation appearing inside multiple downstream queries with similar durations, that is the signature.
For server-side traces on published datasets, the Power BI service refresh logs and the source system's query logs will show you the duplicate calls. Most Australian organisations we work with have SQL Server or PostgreSQL as the source for their financial reporting, and both have query logging that makes this obvious if you know to look. A single dataset refresh that produces the same expensive query three times per refresh is the smoking gun.
The broader lesson
This is one of those problems where the natural way to organise your work is also the way that costs you most. Reference queries feel like good practice. They look like code reuse. They follow the don't-repeat-yourself principle. But they hide a real performance cost that only becomes obvious when refreshes start getting slow.
The lesson generalises beyond Power Query. Tooling that makes it easy to share logic does not always make it efficient to share logic. The right question to ask is not "can I share this?" but "what actually happens when I do?".
For the original Microsoft guidance on this, the referencing Power Query queries documentation is short and worth reading. It is one of the better-written Microsoft Learn pages on Power Query because it actually shows you the gotcha rather than just telling you the rule.
If you are running into Power BI refresh times that feel longer than they should be, or you want a second pair of eyes on a model that is starting to feel slow, this is the kind of problem we sort through routinely on engagements. Get in touch if you want to talk about it. The fixes are usually less dramatic than the symptoms suggest. A two-hour review and a small architectural change can turn a 47-minute refresh into a three-minute one, which is a meaningfully better experience for everyone using the dataset.