Back to Blog

Creating a Dataflow in Power BI - When the Legacy Approach Still Makes Sense

June 19, 20267 min readMichael Ridland

There's a moment in a lot of Power BI rollouts where the same transformation logic starts showing up in five different reports. Someone cleaned up the customer table in one report's Power Query. Someone else did almost the same thing in another. A third person did it slightly differently and now the customer counts don't match between two dashboards that are supposed to show the same business. This is the problem dataflows were built to solve, and it's one of the most common reasons Australian businesses we work with end up needing one.

A dataflow lets you do your data preparation once, in the Power BI service, and reuse the result across many reports and datasets. Instead of every report author rebuilding the same transformations, you build them in one place, refresh them on a schedule, and everyone downstream pulls from the same clean, consistent source. Microsoft's documentation on creating a dataflow is the official walkthrough. What I want to add is the consulting context: why you'd reach for this, when the legacy version is the right call, and where it tends to go wrong.

What a Dataflow Actually Is

If you've used Power Query in Power BI Desktop, you already understand dataflows. A dataflow is essentially Power Query running in the cloud instead of on your laptop. You define your queries, your transformations, your joins and filters and cleanups, and they execute in the Power BI service on a schedule. The output gets stored in the cloud and any report or dataset can connect to it.

The key shift in thinking is separation. In a typical setup, your data preparation and your report are bundled together in one PBIX file. Dataflows pull the preparation out and make it a shared asset. Your reports become consumers of clean data rather than each one owning its own messy version of the cleanup.

This matters most in organisations with more than a handful of report authors. When five people are all building reports off the same underlying systems, you want one definition of "what does a clean customer record look like," not five. Dataflows give you that single definition.

Creating One

The process is straightforward enough. In a Power BI workspace, you create a new dataflow, then define your queries. You can connect to the same broad range of data sources Power BI supports - SQL databases, SharePoint, web APIs, files in cloud storage, plenty more. You bring the data in, then use the online Power Query editor to shape it.

The editor is the familiar Power Query experience. You filter rows, rename and remove columns, change types, merge queries together, split columns, the whole toolkit. Each query you build becomes an entity in the dataflow, and each entity is a table that downstream reports can connect to.

Once your queries are defined you set a refresh schedule. The dataflow runs on its own cadence, independent of any report, and lands the refreshed data ready for consumption. From there, building a report against it is the same as connecting to any other source, except now you're connecting to data that's already clean.

If you want a hand designing this properly rather than learning by trial and error, our Power BI consultants have set up dataflow architectures for organisations across Australia, and we can usually spot the structural problems before they're baked in.

The "Legacy" Question

Microsoft labels the standard dataflow as legacy now, which sends a lot of people into a quiet panic about whether they're building on something that's about to be ripped out. Let me put that to rest. Legacy here doesn't mean deprecated or unsupported. It means there's a newer option, the Fabric-based Dataflow Gen2, that Microsoft is steering people toward for new work.

So which do you use? Here's how I actually decide it on engagements.

If the client is already in Microsoft Fabric, or moving there, Dataflow Gen2 is the right call. It's more capable, it integrates with the broader Fabric platform, it can write its output to a lakehouse or warehouse, and it's where the investment is going. Building new Fabric workloads on the legacy dataflow would be building toward yesterday.

If the client is on standard Power BI Pro or Premium and has no immediate Fabric plans, the legacy dataflow is completely fine and often the simpler choice. It does the job, it's well understood, and it doesn't require anyone to learn the broader Fabric model. For a business that just needs shared, reusable data prep and isn't ready to take on a whole platform shift, reaching for Fabric Gen2 can be overkill.

My honest take: don't let the word "legacy" stampede you into a migration you don't need. I've seen teams burn weeks moving working dataflows to Gen2 for no benefit other than dodging a scary-sounding label. If the legacy dataflow is solving your problem and you're not on a Fabric journey, leave it alone. The right time to move is when you're moving to Fabric for other reasons anyway, not because of a tag in the UI.

Where Dataflows Go Wrong

A few patterns I've watched trip people up.

The first is treating dataflows as a dumping ground. Once teams discover them, there's a temptation to push everything into dataflows, including transformations that really belong further upstream in the source database or a proper data warehouse. Dataflows are a presentation-layer convenience, not a replacement for a data platform. If you're doing heavy, business-critical transformation logic, that logic probably wants to live somewhere more robust than a Power BI dataflow. We've untangled a few setups where the dataflows had quietly become the de facto data warehouse, and it was not a fun position to be in.

The second is refresh sprawl. Every dataflow needs to refresh, and those refreshes consume capacity and take time. When you've got dozens of dataflows all refreshing on overlapping schedules, you can run into capacity contention and refreshes that miss their window. Be deliberate about refresh timing and frequency. A dataflow that feeds a daily report does not need to refresh every hour.

The third is the dependency tangle. Dataflows can reference other dataflows, which is a genuinely useful feature for building layered, reusable logic. It's also a great way to build a chain where one failed refresh upstream silently breaks five things downstream and nobody can work out why. If you build dependency chains, document them, and keep them shallow. A two-layer structure is manageable. A six-layer one is a debugging nightmare waiting to happen.

The Bigger Picture

Dataflows are one of those features that's easy to set up and easy to misuse. Used well, they bring real consistency to an organisation's reporting and they stop the endless duplication of the same cleanup logic across dozens of reports. Used badly, they become a shadow data platform that nobody owns and everybody's afraid to touch.

The teams that get the most out of them treat the dataflow layer as deliberate architecture, not as a convenient place to stash transformations. They decide what belongs in a dataflow versus what belongs upstream, they keep the structure clean, and they document the dependencies. That discipline is the difference between a feature that makes reporting better and one that makes it more fragile.

If your Power BI estate has grown organically and you're starting to feel the pain of inconsistent numbers and duplicated logic, that's usually the signal that some shared dataflow structure would help. It's also the kind of thing that's much easier to set up right the first time than to fix later. If you want a second pair of eyes on it, reach out and tell us what your reporting setup looks like. We spend a lot of time on the line between business intelligence and AI, and a well-organised data layer is the foundation everything else is built on.