Power BI Dataflows Best Practices - What We've Learned From Real Deployments
We've written before about what Power BI dataflows are and when to use them. This post is about doing them well - the best practices that separate a clean, maintainable dataflow environment from one that becomes its own source of problems.
Microsoft has published a comprehensive best practices guide for dataflows, and it's a solid reference. But it reads like a collection of links to other documentation, which is useful for finding specific answers but doesn't give you a sense of priority. We've deployed and troubleshot dataflows across dozens of Australian organisations; here's what actually matters most.
Back Up Your Dataflows. Seriously.
This one comes first because it's the lesson most organisations learn the hard way. Deleted dataflows cannot be recovered. Microsoft is very clear about this, yet we still see organisations with business-critical dataflows and no backup strategy.
There are several approaches, and which one you use depends on your tolerance for manual work.
Azure Data Lake Storage connection. If you connect your Power BI workspace to an Azure Data Lake Storage Gen2 account, copies of your dataflow definitions and data snapshots are automatically stored there. If someone deletes a dataflow, you can recover it by downloading the model.json file and importing it back. This is the best option if you're already on Azure, because it happens automatically.
Power Automate or Azure Logic Apps export. You can set up automated flows that export dataflow definitions to JSON files on a schedule, storing them in SharePoint or Azure Data Lake. This is a solid approach for organisations that want automated backups without the Azure storage connection.
Manual JSON export. You can export a dataflow definition to JSON from the Power BI service at any time. Quick and simple, but it only works if someone remembers to do it. For non-critical dataflows, this might be enough. For anything your month-end close depends on, it's not.
One thing to know: the "Save As" feature for dataflows creates a copy in the same workspace. That doesn't help you in a disaster recovery scenario - if the workspace itself has issues, your backup is gone too. Keep backups external to the workspace.
Query Folding Is Your Best Friend
If there's one performance concept that makes or breaks a dataflow, it's query folding. When Power Query can "fold" your transformation steps back to the source system, the source database does the work. When it can't, Power Query pulls all the data into memory and processes it locally. The difference in performance can be enormous - minutes versus hours for large datasets.
Microsoft has a detailed article on query folding, and it's worth reading closely. In practice, here's what we tell our clients:
Keep foldable steps at the top. Filtering, column selection, and simple type conversions typically fold to the source. More complex transformations like pivot/unpivot, merging queries, and custom columns often break folding. Structure your query so that all the heavy filtering and column reduction happens before any non-foldable steps.
Check whether folding is actually happening. In the Power Query editor, right-click on a step and look for "View Native Query." If you can see the generated SQL, that step is folding. If the option is greyed out, folding has stopped. This is the single most useful diagnostic step for dataflow performance.
Use the source system's strengths. If your source is SQL Server, write a view that does the heavy transformation work and connect your dataflow to the view. Don't try to replicate complex SQL logic in Power Query - it's almost never worth it, and the performance will be worse.
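To make the ordering principle concrete, here's a minimal M sketch; the server, database, table, and column names are hypothetical:

```
let
    // Hypothetical SQL Server source
    Source = Sql.Database("sql-prod-01", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],

    // Foldable steps first: these translate into the native query's
    // WHERE clause and column list, so SQL Server does the work
    Recent = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)),
    Slim = Table.SelectColumns(Recent, {"OrderID", "OrderDate", "CustomerID", "Amount"}),

    // Non-foldable step last: adding an index column typically breaks
    // folding, so everything after this point runs locally - but only
    // on the already-filtered, already-narrowed data
    Indexed = Table.AddIndexColumn(Slim, "RowNum", 1, 1)
in
    Indexed
```

Right-clicking the Slim step and choosing "View Native Query" should show generated SQL; doing the same on Indexed should be greyed out, confirming where folding stops.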
Computed Entities for Performance
Computed entities (or computed tables in Fabric terminology) are one of the more underutilised features of dataflows. A computed entity is one that references another entity within the same dataflow rather than connecting to an external source. The key benefit is that the referenced entity is already loaded into storage, so the computed entity reads from that rather than hitting the source system again.
This matters when you have a large base table that multiple transformations depend on. Instead of each entity querying the source independently, you load the base table once and build computed entities on top of it. We've seen this cut total refresh times by 40-60% in dataflows that hit the same source tables multiple times.
The catch: computed entities require Premium capacity or Premium Per User. They won't work on shared capacity.
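In M terms, a computed entity's source step is simply a reference to another entity rather than a new connection. A sketch, with hypothetical entity and column names:

```
// Computed entity "OrdersByCustomer". Its source is the
// already-loaded "Orders" entity in the same dataflow, not a
// second round trip to the database.
let
    Source = Orders,
    Grouped = Table.Group(
        Source,
        {"CustomerID"},
        {{"TotalAmount", each List.Sum([Amount]), type number},
         {"OrderCount", each Table.RowCount(_), Int64.Type}}
    )
in
    Grouped
```

Several computed entities can aggregate, filter, or reshape the same base table this way, and the source system is only queried once per refresh.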
Incremental Refresh for Growing Datasets
If your dataflow pulls transactional data that grows over time - sales transactions, log entries, customer interactions - you should be using incremental refresh. Without it, every refresh reloads the entire dataset. For a table with two years of history, that means reprocessing millions of rows when only today's data changed.
Microsoft's incremental refresh documentation for dataflows walks through the setup. The concept is straightforward: you define a date/time column that identifies new or changed records, and Power Query only processes the increment rather than the full table.
In practice, the trickiest part is choosing the right incremental boundary. Some organisations want hourly granularity. Others are fine with daily. The more frequently you need to refresh, the more carefully you need to think about whether the source system supports efficient querying of recent records. A poorly indexed source table can make incremental refresh slower than a full load, which defeats the purpose entirely.
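Dataflows generate the partition filter for you from the incremental refresh settings dialog, but it helps to see roughly what that filter looks like. A sketch - the RangeStart/RangeEnd parameter names follow Power BI's incremental refresh convention, and the table and column names are illustrative:

```
// Sketch of the per-partition filter incremental refresh applies.
let
    Source = Sql.Database("sql-prod-01", "SalesDB"),
    Sales = Source{[Schema = "dbo", Item = "SalesTransactions"]}[Data],

    // This comparison folds to the source as a WHERE clause, so only
    // the current window's rows are read - which is the whole point.
    // If it can't fold, every partition still scans the full table.
    Increment = Table.SelectRows(
        Sales,
        each [TransactionDate] >= RangeStart and [TransactionDate] < RangeEnd
    )
in
    Increment
```

This is also why the source-side indexing matters: the date comparison has to be cheap for the source to answer.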
Structure Dataflows by Domain, Not by Report
I've mentioned this before, but it bears repeating because we see the opposite pattern so often. Do not create one dataflow per report. Create dataflows around business domains.
A "Sales" dataflow that produces entities like Orders, Order Lines, Customers, and Products. A "Finance" dataflow with General Ledger, Budget, and Cost Centre entities. An "HR" dataflow with Employee, Department, and Headcount entities.
Multiple reports and semantic models then connect to these domain dataflows. The benefits compound quickly: one place to maintain business logic, one schedule to manage per domain, and a clear ownership model where the sales team owns the Sales dataflow, finance owns theirs, and so on.
The anti-pattern is a dataflow called "Marketing Dashboard Dataflow" that contains a random selection of entities from multiple source systems. When a second marketing report needs similar data, someone creates "Marketing Dashboard 2 Dataflow" with overlapping logic. Before long, you've recreated the same duplication problem that dataflows were supposed to solve.
Error Handling and Monitoring
Dataflows fail. Sources go offline, schemas change, credentials expire. The question is whether you find out about it before or after someone's executive dashboard shows stale data.
Set up email notifications for refresh failures at a minimum. Power BI workspace settings let you configure this per dataflow. If you're using Power Automate, you can build more sophisticated alerting - posting to a Teams channel, creating tickets in your ITSM tool, or triggering automated remediation.
Microsoft's guidance on error handling in Power Query is also worth reading. You can build error handling into the transformation logic itself, routing error rows to a separate entity for review rather than failing the entire refresh. We use this pattern for dataflows that pull from external APIs or third-party systems where occasional bad data is expected.
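A simple version of that routing pattern in M, using two entities over the same typed staging data (the entity and column names are illustrative):

```
// Entity "SalesClean": keep only rows whose Amount converted cleanly.
// A sibling entity "SalesErrors" applies Table.SelectRowsWithErrors
// to the same Typed step to capture the bad rows for review.
let
    Source = RawSales,  // hypothetical staging entity
    Typed = Table.TransformColumnTypes(Source, {{"Amount", type number}}),
    Clean = Table.RemoveRowsWithErrors(Typed, {"Amount"})
in
    Clean
```

The refresh succeeds either way; the error rows land somewhere visible instead of silently killing the pipeline.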
Workload Configuration on Premium
If you're running on Premium capacity, there are configuration settings that directly affect dataflow performance. The workload configuration documentation covers the details, but the highlights:
Enhanced compute engine. This can improve performance by up to 25x for large dataflows, particularly for computed entities and merge operations. It's not enabled by default on all capacity configurations, so check whether it's turned on.
Memory allocation. Premium capacities let you control how much memory is allocated to dataflow refreshes. If your refreshes are timing out or failing on large datasets, increasing the memory allocation often resolves the issue - though you're trading off capacity available for other workloads.
Container size. Larger container sizes let individual dataflow refreshes use more resources. For a few large, complex dataflows, increasing the container size makes more sense than having many small containers.
Reusability Across Workspaces
One of the most powerful patterns is building dataflows in a centralised workspace and having reports across multiple other workspaces consume them. This creates a hub-and-spoke model where the data team maintains a set of well-governed dataflows in the hub, and business units build their reports in their own spoke workspaces.
Microsoft calls these "linked entities" when one dataflow references entities from another dataflow in a different workspace. It works, but be aware of the refresh dependency chain. The source dataflow must refresh before any consuming dataflows, and those must refresh before any consuming semantic models. Schedule management gets complex quickly.
For organisations that are outgrowing basic Power BI governance, this pattern is often the right next step before investing in a full data warehouse or lakehouse architecture. It's something we help clients design and implement through our Power BI consulting practice, and it's a good way to get value from dataflows without a large upfront infrastructure investment.
Common Mistakes We See
A few patterns that come up regularly in our consulting work:
Putting too much logic in one dataflow. A dataflow with 50 entities and complex interdependencies is hard to debug, slow to refresh, and risky to change. Break it up by domain.
Ignoring refresh schedules. Dataflows, semantic models, and dashboards form a pipeline. If any piece refreshes at the wrong time, users see stale or inconsistent data. Map out the full refresh chain and build in buffers.
Not testing with production-scale data. A dataflow that works perfectly with a development database containing 10,000 rows might time out with 10 million. Always test with representative data volumes.
Skipping documentation. When the person who built the dataflow leaves the organisation, someone needs to understand what it does, where the data comes from, and why certain transformations were applied. Document your dataflows - even basic notes in the description field help.
Where to From Here
Dataflows are a good tool when applied to the right problems. They sit in a useful middle ground between "every report manages its own data" and "we need a full enterprise data warehouse." For many Australian organisations, especially those with 10-50 reports and a growing analytics team, dataflows offer the right level of structure without over-engineering the solution.
If you're looking at how dataflows fit into a broader data strategy - particularly with Microsoft Fabric and the move toward lakehouses and unified analytics - our Microsoft Fabric consulting team can help you figure out the right architecture for where you are today and where you're heading.