Migrating to Dataflow Gen2 Using Save As - A Practical Walkthrough
A lot of Australian organisations I work with have Power BI dataflows sitting quietly in production. They were built in 2021 or 2022, they refresh on schedule, and nobody touches them. Now Microsoft Fabric is the strategic direction, Dataflow Gen2 is the way forward, and someone has finally noticed those legacy Gen1 dataflows are an awkward fit for the new world. The question that always comes up: do we have to rebuild them, or is there a sensible migration path?
The short answer is that you don't have to rebuild them. There's a migration path, and it's called Save As. Microsoft's guide, Migrate to Dataflow Gen2 using Save As, covers the mechanics. What I want to do here is talk about what actually happens when you run this on real client workloads: what breaks, what improves, and how to plan a migration that doesn't blow up your refresh windows.
Why bother migrating
Worth being clear about why you'd do this at all. If your Gen1 dataflow is working and the team's happy, the case for migration isn't automatic. Microsoft is committed to Gen1 for the foreseeable future. They're not pulling the rug.
Where the case becomes compelling is when one of these is true:
- You're already invested in Fabric and you want a single dataflow story across the org.
- You need destinations beyond the default Power BI dataset, like writing to a lakehouse, warehouse, or SQL database.
- You want better performance through compute that scales with your Fabric capacity rather than the shared Gen1 backend.
- You need the parameterisation, fast copy, or staging features that only exist in Gen2.
- You want CI/CD via git integration on the dataflow definition.
For most of the clients we work with on Microsoft Fabric engagements, it ends up being a combination of the first and second points. They've decided Fabric is the strategy, and they want their data prep to land somewhere other than just Power BI.
What Save As actually does
The Save As feature, accessed from inside a Gen1 dataflow's edit experience, creates a new Dataflow Gen2 in your selected Fabric workspace with the same query logic. Tables, connections, and transformations carry over. The original Gen1 dataflow is left untouched, so you can run them in parallel during a migration.
What you get on the Gen2 side after Save As:
- A new Gen2 dataflow with the same M queries.
- The same source connections, though credentials need re-binding.
- The Power Query mashup definition carries across cleanly in most cases.
- The destination is initially unset, since Gen2 supports multiple destinations and you have to pick one.
What doesn't come across or needs attention:
- Computed entities behave differently in Gen2. If you've got computed entities that depend on linked entities from another dataflow, that chain needs rebuilding.
- Incremental refresh settings need re-configuring on the Gen2 side.
- Workspace-level settings like enhanced compute engine don't apply; Gen2 has its own performance model.
- Any custom M functions that referenced Gen1-specific behaviours might break, though this is rare in practice.
The thing to remember is that Save As is a one-way copy. Once you save, the two dataflows are independent. Changes to one don't propagate to the other. So you need a clear cutover plan rather than a slow parallel evolution.
A workflow that actually works
The pattern we use with clients goes like this.
First, audit the Gen1 dataflows. Pull a list from every workspace, document the source connections, the destination consumers, the refresh schedules, and the owner. Half the migration work is just knowing what you have. We usually find that 20 to 30 percent of the dataflows in any client tenant haven't refreshed in months or aren't actually consumed by anything. Don't migrate those. Decommission them.
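As a rough illustration of that triage, here's a minimal Python sketch. It assumes you've already pulled the inventory into simple records by whatever means suits you (portal export, REST API, or a spreadsheet). The `DataflowRecord` fields, the `triage` function, and the 90-day staleness threshold are illustrative choices of mine, not anything Microsoft defines.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class DataflowRecord:
    name: str
    workspace: str
    owner: str
    last_refresh: Optional[date]  # None means it has never refreshed
    consumer_count: int           # datasets or other consumers reading from it


def triage(
    records: List[DataflowRecord],
    today: date,
    stale_after_days: int = 90,   # assumed threshold; tune per tenant
) -> Tuple[List[DataflowRecord], List[DataflowRecord]]:
    """Split the inventory into (migrate, decommission) lists."""
    cutoff = today - timedelta(days=stale_after_days)
    migrate, decommission = [], []
    for record in records:
        stale = record.last_refresh is None or record.last_refresh < cutoff
        unused = record.consumer_count == 0
        if stale or unused:
            decommission.append(record)
        else:
            migrate.append(record)
    return migrate, decommission
```

Treat the decommission list as a set of candidates to confirm with the owner, not an automatic delete list.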
Second, group the surviving dataflows by complexity. Simple imports with a few transformations get one treatment. Complex multi-stage flows with computed entities, linked entities, or unusual sources get another. Migrate the simple ones first to build confidence and uncover any tenant-specific gotchas before you touch the hard stuff.
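The grouping can be sketched the same way. This assumes each surviving dataflow has been profiled by hand or by script. The `DataflowProfile` fields and the ten-query cut-off for "a few transformations" are hypothetical, so adjust them to match what you consider simple.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataflowProfile:
    name: str
    query_count: int
    has_computed_entities: bool
    has_linked_entities: bool
    unusual_source: bool


def complexity(profile: DataflowProfile, max_simple_queries: int = 10) -> str:
    """Return 'simple' or 'complex' using the criteria described above."""
    if (
        profile.has_computed_entities
        or profile.has_linked_entities
        or profile.unusual_source
        or profile.query_count > max_simple_queries  # assumed cut-off
    ):
        return "complex"
    return "simple"


def simple_first(profiles: List[DataflowProfile]) -> List[DataflowProfile]:
    """Order the backlog so simple dataflows are migrated before complex ones.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(profiles, key=lambda p: complexity(p) == "complex")
```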
Third, for each dataflow, do Save As into the target Fabric workspace, then immediately:
- Re-bind the source connections. Connections in Fabric are managed differently. You'll likely need to create new connection references via the Fabric portal or rely on existing OneLake or shared connections.
- Set the destination. This is the most important decision. For migrations where you want to preserve the existing downstream consumers (Power BI datasets pulling from the dataflow), use the Power BI dataset destination or stage to a lakehouse and have datasets pull from there. We default to lakehouse-as-destination for new Gen2 work because it opens up more downstream options.
- Configure refresh. The schedule should mirror the Gen1 schedule initially, with the option to reduce it if Gen2 performance allows.
- Run a manual refresh and validate the output table-by-table against the Gen1 output.
Fourth, run both in parallel for at least two refresh cycles. Compare row counts, key column hashes, and any aggregated metrics that downstream consumers care about. Don't trust visual inspection. We've had cases where a Gen2 dataflow produced subtly different results from Gen1 because Power Query optimises queries differently on Fabric compute, and we only caught it because the row hashes diverged.
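For the row-count and hash comparison, a minimal sketch follows. It assumes both outputs have been exported as lists of dictionaries (for example from CSV extracts) and that the key column is unique within each table. `row_hash` and `compare_tables` are names I've made up for illustration.

```python
import hashlib
from typing import Any, Dict, List


def row_hash(row: Dict[str, Any], columns: List[str]) -> str:
    """Hash the chosen columns of one row.

    repr() keeps types visible, so 1 and 1.0 hash differently. That is
    deliberate here: type drift between Gen1 and Gen2 should show up as a diff.
    """
    payload = "\x1f".join(repr(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_tables(
    gen1_rows: List[Dict[str, Any]],
    gen2_rows: List[Dict[str, Any]],
    key: str,
    columns: List[str],
) -> Dict[str, Any]:
    """Compare two table extracts by row count and per-key row hash."""
    gen1 = {r[key]: row_hash(r, columns) for r in gen1_rows}
    gen2 = {r[key]: row_hash(r, columns) for r in gen2_rows}
    shared = gen1.keys() & gen2.keys()
    return {
        "row_count_gen1": len(gen1_rows),
        "row_count_gen2": len(gen2_rows),
        "missing_in_gen2": sorted(gen1.keys() - gen2.keys()),
        "extra_in_gen2": sorted(gen2.keys() - gen1.keys()),
        "changed": sorted(k for k in shared if gen1[k] != gen2[k]),
    }
```

Matching row counts with a non-empty `changed` list is exactly the case visual inspection tends to miss.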
Fifth, cut over the consumers. Update Power BI datasets to point at the Gen2 source, redirect any Power Automate or other consumers, and once everything's pointing at Gen2, decommission the Gen1 dataflow. Don't leave the Gen1 dataflow running indefinitely. It costs nothing in Fabric capacity (because Gen1 runs on shared compute), but it creates confusion about which is the source of truth.
What actually breaks
The migration sounds clean. In practice, here's what bites people.
Connection re-binding is fiddly. Gen1 used dataflow-scoped connections managed inside the dataflow itself. Fabric uses workspace-level or tenant-level connections managed via the Manage connections and gateways page. After a Save As, you'll see the queries with broken credential warnings until you re-create the connection references. For tenants with strict service principal usage policies, this can become a multi-day setup process because you need security approval for the new connection registration.
Incremental refresh has to be redone from scratch. Gen2 supports incremental refresh but the configuration is per-destination. If your Gen1 dataflow had incremental refresh configured against the Power BI dataset destination, you'll set it up again on the Gen2 side. Don't forget to set the appropriate range window and refresh policy.
Computed entities behave differently. In Gen1, a computed entity was an entity built on top of another entity within the same dataflow, processed by the enhanced compute engine. In Gen2, the equivalent pattern is staging. Tables can be staged (cached to managed storage during refresh) or not, and the Power Query engine handles dependencies. Most Gen1 computed entities translate cleanly, but if you had complex chains, expect to spend a half-day understanding the new model.
Linked entities from other dataflows need a rethink. If your Gen1 dataflow consumed data from another Gen1 dataflow via a linked entity, that pattern doesn't have a direct equivalent in Gen2. You'll either replicate the upstream logic, switch the upstream to a lakehouse and consume from there, or use a different pattern entirely. We've usually taken the opportunity to flatten these chains during migration. Linked entities were always a bit of a workaround.
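Before flattening or re-pointing a chain, it helps to know which dataflows sit upstream of which. Here's a small sketch using Python's standard `graphlib` module (Python 3.9+). It assumes you've recorded the linked-entity dependencies by hand from your audit. The mapping format and the `upstream_first` name are my own.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def upstream_first(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return dataflow names ordered so every upstream comes before its consumers.

    depends_on maps each dataflow to the set of dataflows it reads from via
    linked entities. graphlib raises CycleError if the chain loops back on
    itself, which is worth knowing about before you migrate anything.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical chain recorded during the audit
dependencies = {
    "Sales Reporting": {"Customer Master", "Product Master"},
    "Customer Master": {"CRM Raw"},
    "Product Master": set(),
    "CRM Raw": set(),
}
order = upstream_first(dependencies)
```

Anything early in the order is a candidate for landing in a lakehouse first, so the dataflows further down can read from there.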
Performance can go either way. Gen2 against a healthy F SKU capacity is generally faster than Gen1 on shared compute. But if your Fabric capacity is undersized or you have lots of concurrent refreshes, Gen2 can be slower because it's competing for capacity units. Monitor the Capacity Metrics App after migration and adjust accordingly. We've had to bump clients from F64 to F128 mid-migration when we underestimated the load.
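One cheap check before you size anything is how many refreshes overlap at the busiest moment of the schedule. The sketch below assumes you have start and end times for each refresh, taken from refresh history or estimated from Gen1 durations. It only counts concurrency, it does not measure capacity units, and `peak_concurrency` is a name of my own.

```python
from datetime import datetime
from typing import List, Tuple


def peak_concurrency(windows: List[Tuple[datetime, datetime]]) -> int:
    """Return the maximum number of refresh windows that overlap at any instant."""
    events = []
    for start, end in windows:
        events.append((start, 1))   # a refresh begins
        events.append((end, -1))    # a refresh finishes
    # At identical timestamps, process finishes (-1) before starts (+1) so
    # back-to-back refreshes are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

If the peak is high, staggering the schedule is usually worth trying before reaching for a bigger SKU.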
Power Query M differences. Most M code translates one-to-one. The edge cases are functions that depend on Gen1-only behaviours, undocumented quirks of the enhanced compute engine, or anything that relied on the specific order of operations the Gen1 backend used. These show up as either errors or subtle data differences. The data-diffing step in the parallel run period catches them.
A few opinions worth airing
Don't migrate everything just because you can. We had a client last year who wanted to move 200 Gen1 dataflows to Gen2 as part of a Fabric rollout. We talked them down to 60. The rest were either unused or so simple they'd be easier to rebuild from scratch in a new pattern entirely.
Use the migration as an opportunity to redesign your data prep layer. If you're moving anyway, this is the right moment to flatten unnecessary chains, consolidate duplicated logic across dataflows, and shift away from the Power BI dataset destination as the default. Land in a lakehouse first, build datasets on top. This pattern scales better and gives you more flexibility for downstream consumers.
Be wary of Save As as a one-button solution. It works, but it's a starting point, not a finished migration. Budget a couple of hours per dataflow for the manual work that comes after Save As, and more for anything with computed or linked entities.
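To turn that into a budget, the arithmetic is simple enough to sketch. The two hours for a simple dataflow comes from the rule of thumb above. The six hours for a complex one is purely a placeholder of mine, as is the 45/15 split in the example, so replace both with your own numbers.

```python
def estimate_hours(
    simple_count: int,
    complex_count: int,
    simple_hours: float = 2.0,    # "a couple of hours" per dataflow
    complex_hours: float = 6.0,   # assumed figure for computed/linked entities
) -> float:
    """Rough post-Save As effort in hours for a migration backlog."""
    return simple_count * simple_hours + complex_count * complex_hours


# Hypothetical backlog: 60 dataflows, 15 of them with computed or linked entities
total = estimate_hours(simple_count=45, complex_count=15)  # 45*2 + 15*6 = 180 hours
```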
Get your Fabric capacity sizing sorted before you migrate. The most common cause of migration disappointment is moving to Gen2 on an undersized capacity, finding that refreshes are slower, and blaming Gen2 when the actual problem is capacity throttling. Use the Capacity Metrics App and plan the load.
Where this fits in a Fabric strategy
For Australian organisations seriously moving to Microsoft Fabric, dataflow migration is one of several workstreams running in parallel. There's lakehouse design, semantic model rebuilding, Power BI workspace governance, pipeline orchestration via Data Factory in Fabric, and so on. The dataflow piece is usually the easiest one to scope, which is why it's often where teams start. It's a good warm-up for the rest.
We run a lot of these migrations as part of broader Fabric implementations. If you'd like a hand thinking through the sequence for your environment, or just want a second opinion on whether you should be migrating at all, our Microsoft Fabric consultants and data engineering team work with this every week. Happy to have a conversation about your specific situation.
The Save As feature is one of those small Microsoft features that does exactly what it says on the tin, with a few sharp edges underneath. Knowing where those edges are is half the battle. The other half is having a clear reason to migrate in the first place. Don't migrate for the sake of it.
Reference: Migrate to Dataflow Gen2 using Save As - Microsoft Learn