Dataflow Gen1 to Gen2 Migration Scenarios - What We See in Real Australian Engagements
Every Australian organisation we work with on Microsoft Fabric eventually hits the same fork in the road. They have years of Power BI dataflows running quietly in the background, feeding semantic models, Excel reports, Dataverse tables, all sorts. Then someone with budget approval reads about Dataflow Gen2 and asks the obvious question: when do we migrate, and how much pain is involved?
The Microsoft documentation outlines three scenarios that mirror what we see in practice - personal, departmental, and enterprise. The scenarios are useful as a starting point. But the documentation is written by Microsoft, so it skips the awkward bits: the real cost of a phased migration, the gotcha with the dataflow connector that Microsoft now quietly recommends against, and the question of whether you should even bother migrating some Gen1 dataflows or just rebuild them properly.
This post walks through what each migration scenario actually looks like from a consulting chair, including the bits the official guidance leaves out.
Why Gen2 Is Worth the Effort
Before talking about migration, it is worth being clear about what you are getting. Dataflow Gen2 has fast copy, which can reduce ingestion time for large source pulls by an order of magnitude on the right connectors. Incremental refresh has been rebuilt and is no longer the half-baked feature it was in Gen1. You also get data destinations, so your output goes to a Lakehouse, Warehouse, Azure SQL database, or Fabric KQL database instead of being trapped inside the dataflow's storage.
That last point is the one most people underestimate. In Gen1, your transformed data lived in a CDM folder that you accessed through the dataflow connector. The data was not really yours in a portable sense. With Gen2 outputting to a Lakehouse table, that same data is queryable from Spark, T-SQL, Power BI Direct Lake mode, or anything else that talks to OneLake. The architectural unlock is significant.
Capacity consumption is the catch. Gen2 generally consumes more capacity units (CUs) than Gen1 for equivalent work, particularly if you enable Lakehouse staging or use the Warehouse compute path. Microsoft is upfront about this and recommends doing a proof of concept before committing. We agree. We have seen capacity bills triple after a poorly planned migration, and we have seen them stay flat after a well-planned one. The difference is usually whether anyone bothered to measure.
If you are still in the early stages of working out whether Fabric is the right fit, our Microsoft Fabric consultants page covers the decision criteria we use. If you have committed and just need a migration sorted, keep reading.
Scenario 1 - Personal or Team Dataflows in a Single Workspace
The first scenario is the small team running a handful of Gen1 dataflows in one workspace. They pull from Excel, SharePoint, maybe a small SQL database, transform a bit, and feed three or four semantic models. Total dataflow runtime is under 30 minutes a day. Nobody is suffering.
The migration logic here is straightforward in theory. Recreate the dataflow in Gen2, point the consumers at the new dataflow ID, and decommission Gen1. Microsoft's example query shows the standard pattern - replace the workspace and dataflow IDs in the Power Query M code, and you are done.
let
    // Connect through the dataflow connector and navigate to the workspace
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id="Workspaces"]}[Data],
    Workspace = Workspaces{[workspaceId="<new workspace ID>"]}[Data],
    // Swap in the ID of the new Gen2 dataflow
    DataflowId = Workspace{[dataflowId="<new dataflow ID>"]}[Data],
    DimDateTable = DataflowId{[entity="DimDate", version=""]}[Data]
in
    DimDateTable
Here is the bit Microsoft buries in an Important note. They explicitly recommend you do not use the dataflow connector with Gen2. The underlying storage is called DataflowsStagingLakehouse and is considered an implementation detail. Microsoft reserves the right to change it. If you migrate using the connector pattern above, your solution technically works, but it is on borrowed time.
What we tell clients in this scenario is to bite the bullet and output to a Lakehouse table during the migration. Yes, it adds work. Yes, it means updating semantic models to point at the Lakehouse instead of the dataflow connector. But you only do the migration once, and you avoid having to do it a second time if Microsoft changes the staging implementation in eighteen months.
The exception is when the dataflows are truly throwaway. A team using dataflows for ad hoc analysis that will be retired in six months can use the connector pattern and accept the risk. That is a valid call. Just be honest about which bucket your dataflows fall into.
Scenario 2 - Departmental Dataflows with Linked Tables Across Workspaces
This is where it gets interesting and where most of our migration work happens. A finance team has a master dataflow in one workspace that produces a clean customer dimension. The marketing team links to that dimension in their own workspace. The product team links to it in theirs. There are five or six dataflows downstream, each referencing the master through linked tables.
In Gen1, linked tables work but they are slow. Each refresh has to materialise the data, and you end up with copies sitting in multiple workspaces. The latency adds up. In Gen2, Microsoft's recommended pattern is to output the master dataflow to a Lakehouse, then create OneLake shortcuts from the downstream workspaces to that Lakehouse. The shortcuts are pointers, not copies. Refresh costs drop. Data freshness improves. One copy of the truth.
The query pattern in Gen2 changes too. Instead of PowerPlatform.Dataflows, you use Lakehouse.Contents:
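let
    // Connect to OneLake and navigate workspace -> Lakehouse -> table
    Source = Lakehouse.Contents([]),
    WorkspaceId = Source{[workspaceId="<workspace ID>"]}[Data],
    LakehouseId = WorkspaceId{[lakehouseId="<lakehouse ID>"]}[Data],
    // Read the table written by the master dataflow's data destination
    DimCustomerTable = LakehouseId{[Id="DimCustomer", ItemKind="Table"]}[Data]
in
    DimCustomerTable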
The migration approach we use for departmental scenarios is to migrate the master dataflows first, get them outputting cleanly to Lakehouse tables, then migrate the downstream dataflows one at a time. Trying to do them all at once creates a window where some consumers are pointing at Gen1, others at Gen2, and refresh schedules conflict. Phased works. Big bang does not.
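The ordering logic behind a phased migration can be sketched in a few lines of Python. This is a minimal illustration, not a tool we ship: the dataflow names and the dependency map are hypothetical, and in practice you would build the map from your own lineage inventory. It groups dataflows into waves so that a master dataflow always lands before anything that links to it.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each dataflow -> the dataflows it links to.
depends_on = {
    "finance_master_customer": set(),
    "marketing_campaigns": {"finance_master_customer"},
    "product_usage": {"finance_master_customer"},
    "marketing_attribution": {"marketing_campaigns"},
}


def migration_waves(depends_on):
    """Group dataflows into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the links form a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose upstreams are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = migration_waves(depends_on)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Dataflows in the same wave have no dependency on each other, but we would still cut them over one at a time and let a refresh cycle pass before starting the next.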
A genuinely useful tip - if your semantic models have the workspace and dataflow IDs hard-coded in Power Query, parameterise them before you start. Then you can use the Datasets - Update Parameters In Group REST API to flip them programmatically, followed by a refresh so the new values take effect. Saves enormous time on a large migration.
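As a rough sketch of what that flip looks like, the Python below builds the request for the Power BI REST API's Update Parameters In Group operation. The parameter names WorkspaceId and DataflowId are our assumption about how you named your Power Query parameters, the ID placeholders are left for you to fill in, and acquiring the access token is out of scope here. Treat it as a starting point rather than a finished script.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_update_parameters_request(group_id, dataset_id, new_values):
    """Return (url, body) for the Update Parameters In Group call."""
    url = (
        f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}"
        "/Default.UpdateParameters"
    )
    body = {
        "updateDetails": [
            {"name": name, "newValue": value}
            for name, value in new_values.items()
        ]
    }
    return url, body


def send_update(url, body, access_token):
    """POST the request; needs a valid Microsoft Entra access token."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical parameter names; use whatever your semantic model defines.
url, body = build_update_parameters_request(
    "<workspace ID>",
    "<semantic model ID>",
    {"WorkspaceId": "<new workspace ID>", "DataflowId": "<new dataflow ID>"},
)
print(url)
print(json.dumps(body, indent=2))
```

Keeping the request builder separate from the sender means you can loop over an inventory of semantic models, review the generated payloads, and only then call send_update for each one.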
If you have a departmental migration coming up and want a sanity check on the approach, our Power BI consultants team has done this several times for organisations across Sydney, Melbourne, and Brisbane. We are happy to review your plan.
Scenario 3 - Enterprise Dataflows Feeding Multiple Domains
The enterprise scenario is the largest and the one where the most architectural rethinking is justified. You have a central data team ingesting from twenty source systems, transforming through dozens of Gen1 dataflows, feeding semantic models across finance, operations, sales, HR, and so on.
Here is where we get a bit opinionated. If you have an enterprise Gen1 estate, do not migrate it. Rebuild it. The Gen1 dataflow architecture predates the medallion pattern that Microsoft now pushes for Fabric, and trying to lift and shift it usually produces a Frankenstein. You end up with bronze and silver tables co-mingled, transformations split awkwardly between dataflows and downstream semantic models, and a dependency graph that nobody can fully explain.
What works better is using the Gen1 estate as a requirements document. Document what each dataflow produces, who consumes it, what the refresh schedule is. Then design a clean medallion architecture in Fabric with bronze tables for raw ingestion, silver tables for cleaned and conformed data, and gold tables for business-ready outputs. Use Dataflow Gen2 for the bronze and silver layers where Power Query is the right tool, and use notebooks or stored procedures for transformations that are better suited to Spark or T-SQL.
This is more work upfront. It is also the only way to end up with an estate you actually want in three years' time.
The cost question is real. Capacity consumption on Gen2 will usually be higher than Gen1, particularly if you turn on staging. We typically see a 20 to 50 per cent increase in capacity consumption for equivalent workloads, though it varies. Run a representative sample of your dataflows through Gen2 first, measure the CU consumption, and project from there. The Microsoft Fabric Capacity Metrics app is your friend here. Do not skip this step or you will get a surprise on your first invoice.
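The projection itself is simple arithmetic, sketched here in Python. Every number below is made up for illustration, and the field names are ours: the idea is that you record CU seconds per refresh for the same dataflow on Gen1 and on Gen2, work out the uplift ratio for each, and apply a typical and a worst-case ratio to the Gen1 total for the whole estate.

```python
from statistics import median

# Hypothetical measurements (CU seconds per refresh) for a pilot sample.
samples = [
    {"dataflow": "sales_orders", "gen1_cu_s": 1200, "gen2_cu_s": 1500},
    {"dataflow": "dim_customer", "gen1_cu_s": 400, "gen2_cu_s": 560},
    {"dataflow": "hr_headcount", "gen1_cu_s": 250, "gen2_cu_s": 300},
]


def uplift_ratios(samples):
    """Gen2 consumption divided by Gen1 consumption, per sampled dataflow."""
    return [s["gen2_cu_s"] / s["gen1_cu_s"] for s in samples]


def project_gen2_cu_seconds(gen1_total_cu_s, ratios):
    """Return (typical, worst_case) Gen2 projections for the whole estate."""
    typical = gen1_total_cu_s * median(ratios)
    worst_case = gen1_total_cu_s * max(ratios)
    return typical, worst_case


ratios = uplift_ratios(samples)
# Hypothetical daily Gen1 total across the estate, in CU seconds.
typical, worst_case = project_gen2_cu_seconds(50_000, ratios)
print(f"Typical projection:    {typical:,.0f} CU seconds/day")
print(f"Worst-case projection: {worst_case:,.0f} CU seconds/day")
```

We plan against the worst-case figure and treat the typical one as the number we expect to land on. A three-dataflow sample is too small for a real estate, so pick a sample that covers your heaviest sources and your most complex transformations.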
For organisations with serious data estate work coming, our data-related services include both the architecture work and the migration execution. We have done enough of these now to know where the bodies are buried.
What to Watch Out For
A few honest warnings from the field.
Capacity sizing is the most common place to get caught. Gen1 dataflows ran on a shared compute model that was generous. Gen2 runs on your Fabric capacity, full stop. If your capacity is sized for your reporting workload and you add Gen2 dataflows on top, you can throttle other workloads. Size up before you migrate, then resize down once you have measured actual consumption.
Refresh failures look different in Gen2. The error messages are improving but still vague in some cases. Build observability in early. The Fabric Monitoring Hub is decent but does not surface everything. We typically add custom logging in our dataflow pipelines so we get useful diagnostics when something breaks at 3am.
OneLake shortcuts are powerful but they are not a free lunch. Cross-workspace shortcuts respect workspace-level permissions but do not propagate row-level security from the source. If you have RLS on your bronze data, you cannot rely on shortcuts to enforce it downstream. Build your security model with that in mind.
Last one - do not migrate dataflows that should not exist. We have done audits where 30 to 40 per cent of an organisation's Gen1 dataflows had not been refreshed in months or had no downstream consumers. Migration is an excellent time to retire dead code rather than carry it forward.
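A first-pass filter for retirement candidates might look like the Python sketch below. The record fields, the sample dataflows, and the 90-day staleness threshold are all assumptions for illustration. The real inputs would come from your own inventory of refresh history and lineage.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataflowRecord:
    name: str
    last_refresh: date   # date of the most recent successful refresh
    consumer_count: int  # semantic models or dataflows reading from it


def retirement_candidates(records, today, stale_after_days=90):
    """Names of dataflows that are stale or have no downstream consumers."""
    cutoff = today - timedelta(days=stale_after_days)
    return [
        r.name
        for r in records
        if r.last_refresh < cutoff or r.consumer_count == 0
    ]


# Hypothetical inventory extract.
records = [
    DataflowRecord("dim_customer", date(2024, 6, 1), 5),
    DataflowRecord("old_campaign_extract", date(2023, 11, 15), 2),
    DataflowRecord("adhoc_pricing_test", date(2024, 5, 28), 0),
]

candidates = retirement_candidates(records, today=date(2024, 6, 3))
print(candidates)
```

Anything the filter flags still needs a conversation with the owner before it is switched off, because a dataflow with no recorded consumers can still be feeding someone's Excel workbook.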
Where to Start
If you are sitting on a Gen1 estate and wondering where to start, the honest answer is - do an inventory first. Catalogue every dataflow, every consumer, every refresh schedule. Then map them onto the three scenarios above and prioritise based on business value and migration complexity. Quick wins first to build confidence and capacity learnings, then the harder pieces.
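To make the mapping and prioritisation concrete, here is a small Python sketch. The scoring scale, the thresholds that separate the three scenarios, and the sample entries are all hypothetical and should be tuned to your own estate. It tags each dataflow with a scenario and sorts so that high-value, low-complexity items come first.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    name: str
    consuming_workspaces: int  # workspaces that read from this dataflow
    source_systems: int        # distinct sources it ingests from
    business_value: int        # 1 (low) to 5 (high), scored by the team
    complexity: int            # 1 (simple) to 5 (hard), scored by the team


def scenario(item):
    """Map an item onto one of the three scenarios (thresholds are assumptions)."""
    if item.source_systems >= 10:
        return "enterprise"
    if item.consuming_workspaces > 1:
        return "departmental"
    return "personal"


def migration_order(items):
    """Quick wins first: lowest (complexity - value), then simplest."""
    return sorted(
        items,
        key=lambda i: (i.complexity - i.business_value, i.complexity, i.name),
    )


# Hypothetical inventory.
inventory = [
    InventoryItem("group_ledger", 6, 20, 5, 5),
    InventoryItem("dim_customer_master", 4, 3, 5, 3),
    InventoryItem("team_budget", 1, 2, 3, 1),
]

ordered = migration_order(inventory)
for item in ordered:
    print(f"{item.name}: {scenario(item)}")
```

The scores are subjective, and that is fine. The point is to force a conversation about value and complexity for every dataflow before anyone starts rebuilding.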
The official Microsoft guidance on Dataflow Gen2 migration scenarios is worth reading as a starting point. Just remember it is written from Microsoft's perspective, not yours.
If you would like a second opinion on your migration plan, or you want someone who has done this several times to drive it, get in touch. We do this work across Australia and we will give you a straight answer about whether you are heading in a sensible direction.