
Power BI Dataflows with Azure Data Lake Gen2 - Bring Your Own Storage

April 14, 2026 · 8 min read · Michael Ridland


Most organisations running Power BI start with the default storage. Dataflows get created, data gets refreshed, and everything lives inside Power BI's internal managed storage. For a while, that works perfectly fine.

Then something changes. Maybe your data team wants to reuse the same curated data outside of Power BI. Maybe your data engineers want access to the underlying files. Maybe your governance team wants to know exactly where data lives and who can access it. Whatever the trigger, you start looking at the "bring your own storage" option - connecting Power BI dataflows to Azure Data Lake Storage Gen2.

We've helped a number of Australian organisations set this up, and the feature is genuinely useful - but it comes with enough prerequisites and gotchas that you want to plan the connection properly before you start clicking buttons.

Why Would You Want This?

By default, when you build a dataflow in Power BI, the output data sits inside Power BI's internal storage. You can consume it in Power BI reports, reference it from other dataflows, and that's about it. The data is somewhat locked inside the platform.

When you connect a dataflow to your own ADLS Gen2 account, Power BI writes the output data in Common Data Model (CDM) format directly to your storage account. The data lands as structured files with metadata, and suddenly it's accessible to anything that can read from Azure storage.

This opens up real possibilities:

Data scientists can access the same curated data that powers your reports. Instead of building separate data pipelines, they can point their Python notebooks or Databricks clusters at the same ADLS Gen2 location that your Power BI dataflows write to.

Data engineers can build downstream processes that pick up from where Power BI leaves off. Got a dataflow that cleanses and transforms customer data? Now your Azure Functions, Logic Apps, or Fabric pipelines can read that same output.

Backup and compliance requirements become easier to satisfy. The data is in your storage account, under your control, with your retention policies and access controls applied.

I'll be honest - for a lot of smaller deployments, default storage is fine. If you have three analysts building reports and nobody else touches the data, connecting ADLS Gen2 adds complexity without much benefit. Where this feature really shines is in mid-to-large organisations where multiple teams need to work with the same data, or where data governance policies require that you know exactly where your data lives.

The Prerequisites - Read These Carefully

Microsoft's documentation lists the prerequisites, and I want to highlight the ones that trip people up in practice.

The storage account must have Hierarchical Namespace enabled. This is the ADLS Gen2 flag, and it must be set when you create the storage account. You cannot enable it after creation. If you already have a storage account without HNS, you need a new one. I've seen teams waste half a day trying to retroactively enable this.

Owner permission at the storage account level is required. Not at the resource group level, not inherited from the subscription. The storage account itself. If you're an admin, you still need to explicitly assign yourself the Owner role on that specific resource. Azure's IAM model is granular, and this is one of those cases where it catches people out.

You also need the Storage Blob Data Owner role. This is separate from the Owner role. You need both. And the person connecting also needs the Storage Blob Data Reader role. Three separate role assignments on the same storage account.

The storage account must be in the same region as your Power BI capacity. For Pro workspaces, it needs to match the Fabric home region. For Premium workspaces, it needs to match the Premium capacity region. If your organisation has resources scattered across Australia East and Australia Southeast, check which region your Power BI tenant actually lives in before provisioning storage.

Firewalled storage accounts are not supported. This is a big one. Many organisations have their storage accounts locked behind virtual network rules or private endpoints as a security baseline. Power BI's ADLS Gen2 integration currently does not work with firewalled storage accounts. If your security team mandates network restrictions on all storage, this is a blocker.

MFA-protected ADLS connections are also not supported. If your organisation enforces multi-factor authentication across all Azure resource access (which many do), you'll hit this wall during setup.

Tenant-Level vs Workspace-Level Connections

You have two options for how the ADLS Gen2 connection is configured, and the choice depends on your organisation's structure.

Tenant-level storage sets a default ADLS Gen2 account for the entire Power BI tenant. A Power BI administrator configures this once, and then individual workspace admins can opt their workspaces into using it. It's a good approach when you want a centralised data lake and your governance model is relatively uniform.

Workspace-level storage lets each workspace admin connect to a different ADLS Gen2 account. This makes sense in larger organisations where different business units have separate storage accounts, or where data sovereignty requirements mean certain data must stay in a specific location.

You can use both simultaneously. Set a tenant-level default, but allow workspace-level overrides where needed. The key thing to understand is that even if you set up tenant-level storage, workspaces don't automatically start using it. Each workspace admin still needs to explicitly enable it.

One thing to be aware of: if you're connecting at the workspace level, the workspace must have zero dataflows in it before you make the connection. You can't retroactively switch an existing workspace's storage. This means you either need to plan the connection upfront, or you need to recreate your dataflows in a fresh workspace after connecting storage.

How the Storage Structure Works

Once connected, Power BI creates a container called powerbi in your ADLS Gen2 account. Inside that container, the structure is:

powerbi/
  <workspace-name>/
    <dataflow-name>/
      model.json
      <entity-folders>/

The model.json file contains the CDM metadata - schema definitions, data types, relationships. The entity folders contain the actual data files.

This structure is predictable and stable, which means you can build automated processes that read from it. We've built Azure Functions that trigger on new files appearing in these locations, picking up freshly refreshed data and pushing it into downstream systems.
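As a minimal sketch of that trigger logic (the function name is ours; a real deployment would wire this into an Azure Function with a blob trigger on the powerbi container), a helper that spots a freshly written model.json and extracts the workspace and dataflow names might look like:

```python
# Sketch: classify blob paths written by Power BI dataflows.
# Layout assumed, per the structure above:
#   powerbi/<workspace>/<dataflow>/model.json

def parse_dataflow_path(blob_path: str):
    """Return (workspace, dataflow) if the path is a refreshed
    model.json, or None for entity data files and anything else."""
    parts = blob_path.strip("/").split("/")
    # Expect exactly: ["powerbi", workspace, dataflow, "model.json"]
    if len(parts) == 4 and parts[0] == "powerbi" and parts[3] == "model.json":
        return parts[1], parts[2]
    return None

print(parse_dataflow_path("powerbi/Sales/CustomerCleanse/model.json"))
# → ('Sales', 'CustomerCleanse')

# Data files inside entity folders are ignored:
print(parse_dataflow_path("powerbi/Sales/CustomerCleanse/Customers/part-0.csv"))
# → None
```

Triggering on model.json rather than on individual data files means your downstream process fires once per refresh, after the metadata has been finalised, instead of once per partition file.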

The CDM format itself is worth understanding. It's not just flat CSV files - it includes rich metadata about the data types, hierarchies, and relationships. If you're building anything that consumes this data, you can either parse the CDM metadata or just read the underlying data files directly. For most use cases, reading the data files directly is simpler.
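To make that concrete, here's a rough sketch of pulling entity names, columns, and data file locations out of the metadata. The cut-down model.json is illustrative only; real files written by Power BI carry annotations, refresh timestamps, and more attributes, but the entities/attributes/partitions shape shown here follows the CDM model.json schema.

```python
import json

# A cut-down model.json in the shape Power BI dataflows write
# (illustrative sample data, not real output).
model_json = """
{
  "name": "CustomerCleanse",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Customers",
      "attributes": [
        {"name": "CustomerId", "dataType": "int64"},
        {"name": "Name", "dataType": "string"}
      ],
      "partitions": [
        {"name": "Part001",
         "location": "https://mylake.dfs.core.windows.net/powerbi/Sales/CustomerCleanse/Customers/part-0.csv"}
      ]
    }
  ]
}
"""

model = json.loads(model_json)
for entity in model["entities"]:
    columns = [a["name"] for a in entity["attributes"]]
    files = [p["location"] for p in entity["partitions"]]
    print(entity["name"], columns, files)
```

The partition locations are full URLs into your storage account, so a consumer can enumerate entities from model.json and then read the listed files with whatever storage client it already uses.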

Setting Up the Connection

The actual connection process is straightforward once your prerequisites are met:

  1. Navigate to your workspace in the Power BI service
  2. Open Workspace Settings
  3. Go to the Azure Connections tab, then the Storage section
  4. Either use the tenant default (if configured) or select "Connect to Azure"
  5. Choose your subscription, resource group, and storage account
  6. Save

Power BI will automatically configure the required permissions on the storage account and set up the filesystem. From that point forward, every dataflow in that workspace writes its data to your ADLS Gen2 account.

The initial setup uses your personal Azure credentials to establish the connection, but after that, Power BI uses its own service account. Your personal account doesn't need to remain connected.
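If you're rolling this out across many workspaces, the same assignment can be scripted. The Power BI REST API exposes a Groups - AssignToDataflowStorage operation for this; the sketch below prepares (but does not send) that call, with placeholder GUIDs and token, and it assumes the storage account has already been registered at the tenant level.

```python
import json
from urllib.request import Request

def build_storage_assignment(group_id: str, storage_id: str, token: str) -> Request:
    """Prepare the REST call that assigns a workspace (group) to a
    registered dataflow storage account. Placeholders: group_id and
    storage_id are GUIDs from your tenant; token is an AAD access
    token for the Power BI API."""
    url = (
        "https://api.powerbi.com/v1.0/myorg/"
        f"groups/{group_id}/AssignToDataflowStorage"
    )
    body = json.dumps({"dataflowStorageId": storage_id}).encode()
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_storage_assignment("<workspace-guid>", "<storage-guid>", "<aad-token>")
print(req.get_method(), req.full_url)
```

Sending the prepared request with `urllib.request.urlopen(req)` (or swapping in your preferred HTTP client) completes the assignment; the manual steps above remain the simplest path for a one-off workspace.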

When This Feature Falls Short

I want to give you an honest assessment of the limitations we've run into.

The firewall restriction is the biggest practical issue. In security-conscious environments - and that includes most of the financial services and government clients we work with - storage accounts sit behind network rules. Until Microsoft supports private endpoints or service endpoints for this integration, those organisations can't use it.

Performance-wise, writing to external storage adds a small amount of latency to dataflow refreshes. For most workloads it's negligible, but if you have a dataflow that's already pushing the refresh time limit, the extra write time could tip it over.

There's also no built-in lifecycle management for the data in ADLS Gen2. Power BI writes the data, but it doesn't clean up old versions or manage retention. You need to set up your own Azure lifecycle management policies to handle that.
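If you do add lifecycle rules, scope them to the powerbi container so they can't touch unrelated data in the account. A sketch of the policy document Azure Storage lifecycle management expects, expressed in Python for clarity (the rule name and the 90-day window are our illustrative choices, not Power BI defaults):

```python
import json

# Sketch of an Azure Storage lifecycle management policy that deletes
# dataflow output blobs 90 days after last modification. Apply via the
# portal, CLI, or management SDK. Rule name and retention window are
# illustrative assumptions.
policy = {
    "rules": [
        {
            "enabled": True,
            "name": "expire-dataflow-output",
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    # Only touch what Power BI writes.
                    "prefixMatch": ["powerbi/"],
                },
                "actions": {
                    "baseBlob": {
                        "delete": {"daysAfterModificationGreaterThan": 90}
                    }
                },
            },
        }
    ]
}

print(json.dumps(policy, indent=2))
```

Be careful with delete rules here: Power BI expects its own files to remain in place between refreshes, so retention windows should be generous enough not to remove data a live dataflow still references.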

And if you're on the GCC (Government Community Cloud) version of Power BI in the US, this feature isn't available at all.

Who Should Set This Up?

If you're running Power BI as part of a broader Azure data platform, and you have teams beyond the BI analysts who need access to curated data, this is worth doing. The effort to set it up is modest, and the value of having your dataflow outputs available as standard Azure storage is significant.

If you're a standalone Power BI shop and nobody outside your BI team touches the data, keep using default storage. It's simpler and it works.

For organisations in between - maybe you're starting to build out a broader data platform, or you're thinking about bringing data scientists onto the team - I'd suggest setting up the ADLS Gen2 connection on a single workspace first. Test it, understand the folder structure, verify that your security requirements can be met, and then expand from there.

At Team 400, we help Australian organisations design their Power BI architecture to work within their broader data strategy. If you're weighing up whether ADLS Gen2 storage makes sense for your setup, or if you're running into the prerequisites and need a hand, reach out to our team.

We also work extensively with Microsoft Fabric and Azure AI services, so if your data platform ambitions go beyond Power BI, we can help with the bigger picture too.

Reference

This post is based on Microsoft's documentation on configuring dataflow storage with Azure Data Lake Gen2.