
Dataflow Gen2 With a Virtual Network Gateway in Microsoft Fabric - What It Costs and When You Need It

June 26, 2026 · 8 min read · Michael Ridland

If your data sits behind a firewall, on a private network, or inside a locked-down Azure virtual network, you've probably already hit the wall that this feature is built to solve. Your security team is not going to open a public endpoint to your finance database just because someone in analytics wants to build a dataflow. And they're right not to. So the question becomes: how do you let Microsoft Fabric reach private data without poking a hole in the network that the security team will spend the next year worrying about? The virtual network data gateway for Dataflow Gen2 is Microsoft's answer, and it's worth understanding both how it works and what it adds to your bill before you commit to it.

We've set this up for a few Australian clients now, mostly in financial services and a couple in healthcare, where private networking isn't a nice-to-have but a compliance requirement. The technology is solid. The pricing has a wrinkle that catches people out. Let me walk through both.

What problem the virtual network gateway solves

A Dataflow Gen2 is the modern data preparation engine in Fabric. You use it to pull data from a source, clean it, reshape it, and land it somewhere Fabric can use. By default, a dataflow connects to sources over the public internet using Microsoft's cloud connectivity. That's fine when your source is a SaaS API or a database with a public endpoint and proper authentication.

It stops being fine the moment your source lives somewhere private. An Azure SQL database locked to a virtual network. A data warehouse with no public IP. An on-premises system reachable only through a private link. For those, Fabric needs a way into the private network, and that's what the gateway provides.

There are two flavours worth knowing. The on-premises data gateway is the older one, where you install a piece of software on a machine inside your network and it brokers the connection. The virtual network data gateway is the newer, fully managed option. You don't install or run anything. Microsoft provisions a managed gateway inside your Azure virtual network, and Fabric uses it to reach your private sources. No VM to patch, no server to babysit, no one getting paged at 2am because the gateway machine ran out of disk. For most cloud-native Australian businesses, the managed virtual network gateway is the one to reach for.

How the pricing actually works

Here's the part that catches people out, and it's worth being precise about because the surprise is avoidable.

When you run a Dataflow Gen2 normally, you pay for the compute it consumes, measured in Fabric capacity units, and that draws down against your capacity. Straightforward enough. When you run that same Dataflow Gen2 through a virtual network data gateway, the compute is billed at a higher rate. Microsoft applies a premium to the capacity units consumed because the work is running through managed private networking infrastructure rather than standard cloud connectivity.

So the same dataflow, doing the same transformation on the same volume of data, costs more when it runs through the gateway than when it runs over the public path. The difference isn't enormous per run, but it's real, and it compounds if you've got dataflows refreshing on a tight schedule. I've seen a client set up a private gateway for one sensitive source and then, without thinking it through, route every dataflow in the workspace through it for consistency. The result was a meaningful jump in capacity consumption for a lot of dataflows that didn't need private networking at all.

The practical rule is straightforward. Use the virtual network gateway for the dataflows that genuinely need to reach private data. Don't route public-source dataflows through it just because it's there. The premium is the cost of the privacy, and you only want to pay it where the privacy is required.
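As a rough illustration of that rule, here is a minimal sketch of the kind of audit you might run over an inventory of dataflows. Everything in it is hypothetical: the `Dataflow` record, its fields, and the sample names are made up for the example and are not pulled from any Fabric API. You would fill the inventory in by hand or from whatever documentation you keep about your workspace.

```python
from dataclasses import dataclass


@dataclass
class Dataflow:
    # Hypothetical inventory record, not a Fabric API object.
    name: str
    source_is_private: bool   # True if the source is only reachable on a private network
    uses_vnet_gateway: bool   # True if the dataflow is currently routed through the gateway


def overrouted(dataflows):
    """Return names of dataflows that use the gateway without needing private access."""
    return [d.name for d in dataflows if d.uses_vnet_gateway and not d.source_is_private]


# Made-up sample inventory for illustration only.
inventory = [
    Dataflow("finance_ledger", source_is_private=True, uses_vnet_gateway=True),
    Dataflow("public_fx_rates", source_is_private=False, uses_vnet_gateway=True),
    Dataflow("crm_export", source_is_private=False, uses_vnet_gateway=False),
]

print(overrouted(inventory))
```

Anything the function returns is a candidate for moving back to the public path, assuming the source really is publicly reachable and your security team agrees.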

If you want to understand the broader Fabric billing model that this sits inside, we wrote a separate breakdown of how Fabric Data Factory pricing works that covers the capacity-unit model in general, and it's worth reading alongside this if the whole consumption thing is new to you.

Estimating the cost before you commit

The honest answer to "what will this cost us" is that it depends on three things: how much data each dataflow moves, how complex the transformations are, and how often the dataflow refreshes. A small dataflow that pulls a few thousand rows from a private SQL database once a day will barely register. A heavy dataflow that reshapes millions of rows every fifteen minutes through the gateway will consume real capacity at the premium rate, and you'll feel it.

The way I'd approach an estimate is the way Microsoft's own pricing example does it. Take a representative dataflow. Work out roughly how many capacity-unit seconds it consumes per run. Apply the gateway premium. Multiply by the number of refreshes per day, then by your billing period. That gives you a number you can actually plan around, rather than discovering it after the fact.
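The steps above can be sketched in a few lines of Python. Treat this as a back-of-envelope model, not Microsoft's formula: it assumes the gateway premium can be expressed as a single multiplier on capacity-unit seconds, and every number in the example call is a placeholder. Take the real per-run consumption from your own capacity metrics and the real premium from Microsoft's pricing documentation.

```python
def estimate_cu_seconds(cu_seconds_per_run, gateway_multiplier, refreshes_per_day, days_in_period):
    """Estimate capacity-unit seconds for one dataflow over a billing period.

    gateway_multiplier is an assumed way of modelling the premium
    (1.0 would mean no premium); check the official pricing for the real rate.
    """
    per_run_with_premium = cu_seconds_per_run * gateway_multiplier
    return per_run_with_premium * refreshes_per_day * days_in_period


# Placeholder inputs, chosen only to show the arithmetic.
estimate = estimate_cu_seconds(
    cu_seconds_per_run=100,   # hypothetical measurement for a representative run
    gateway_multiplier=1.5,   # hypothetical premium, not Microsoft's figure
    refreshes_per_day=24,     # hourly schedule
    days_in_period=30,
)
print(f"Estimated CU seconds for the period: {estimate:,.0f}")
```

Running the same function with a multiplier of 1.0 gives the public-path baseline, so the difference between the two results is the cost of the private path for that dataflow.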

The mistake I see is people estimating based on data volume alone and ignoring refresh frequency. A modest dataflow refreshing every fifteen minutes runs ninety-six times a day. The same dataflow on an hourly schedule runs twenty-four times. That's a four-times difference in cost for, in a lot of cases, no real difference in business value, because nobody's looking at the data more than a few times a day anyway. Before you optimise the dataflow, ask whether it needs to refresh as often as it does. That single question saves more money than any amount of transformation tuning.
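The refresh arithmetic is simple enough to check in a couple of lines. This sketch assumes the schedule is a fixed interval running around the clock, which may not match a schedule restricted to business hours.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int) -> int:
    """Number of refreshes in 24 hours for a fixed-interval schedule."""
    return MINUTES_PER_DAY // interval_minutes


every_15_min = runs_per_day(15)
hourly = runs_per_day(60)

# Ratio of runs, and therefore of consumption if each run costs about the same.
print(every_15_min, hourly, every_15_min / hourly)
```

Because each run costs roughly the same for a given dataflow, the ratio of runs is also the ratio of capacity consumed.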

What works well

Having set this up a few times, here's what genuinely impresses me.

The managed gateway removes a whole category of operational pain. The old on-premises gateway meant someone owned a machine, kept it patched, monitored it, and dealt with it falling over. The virtual network gateway is Microsoft's problem to run. For a lean Australian business without a big platform team, that's worth a lot. You get private connectivity without taking on private infrastructure.

The security story is clean. Data moves through your virtual network, never traverses a public endpoint, and your security team can reason about the network path properly. In a compliance audit, "the data never leaves the private network" is a much easier sentence to defend than "we opened a restricted public endpoint with these rules." We've used exactly this setup to get analytics workloads approved in environments where a public path was simply never going to pass review.

Setup is reasonable. It's not a one-click affair, and you need someone who understands Azure networking to wire it up, but it's a known quantity. Once it's configured, it mostly stays out of the way.

What to watch out for

The cost premium is the obvious one, and I've banged on about it enough. Route only what needs routing.

The other watch-out is that troubleshooting connectivity through a managed gateway is harder than through a gateway you control. When a connection fails, you've got fewer levers to pull and less visibility into the network path than you would with a machine you own. Most of the time it just works, but when it doesn't, expect to spend a bit longer diagnosing it, and make sure whoever set up the virtual network can help, because the failures are usually networking failures, not Fabric failures.

There's also a capability gap to check before you assume it'll work. Not every connector and every source type is supported through the virtual network gateway in the same way it is over the public path. Before you design an architecture around it, confirm your specific source is supported through the managed gateway. Finding out it isn't, after you've promised the security team a private path, is not a fun conversation.

When you actually need this

You need the virtual network gateway if you have data sources on a private network that Fabric can't otherwise reach, and you can't or won't expose them publicly. That's it. For financial services, healthcare, government, and anyone with a serious data classification policy, that describes most of the interesting sources. For a business whose data already lives in publicly addressable cloud services with good authentication, you may never need it, and you shouldn't pay the premium for connectivity you don't require.

The decision is rarely a purely technical one. It's a conversation between the people who want the data and the people who own the network, and the gateway is the thing that lets both of them get what they want. Getting that architecture right early, before you've built a dozen dataflows that all need re-pointing, is where having someone who's done it before pays off. It's the kind of thing we help with as part of our Microsoft Fabric work and broader Azure AI and data consulting for Australian businesses.

For Microsoft's own worked pricing example, including the specific capacity-unit numbers, the Dataflow Gen2 with virtual network gateway pricing documentation lays out the maths.

If you're trying to get private data into Fabric without your security team vetoing the whole project, that's a problem we've solved a few times now. Get in touch if you'd like to talk through the right setup for your environment.