Python in Power BI Query Editor - A Practical Guide for Data Teams

April 4, 2026 · 7 min read · Michael Ridland

I'll be honest - when Microsoft first added Python support to Power BI's Query Editor, I thought it was a gimmick. Power Query's M language already handles most data transformation scenarios, and adding another language seemed like complexity for complexity's sake.

Then we hit a client project where we needed to fill in missing sensor data from an IoT pipeline using statistical imputation. M couldn't do it elegantly. Python with pandas did it in three lines. That changed my thinking entirely.

Microsoft's documentation on Python in Power Query Editor walks through the basics, but having used this in production projects across several Australian organisations, I want to share what actually matters when you're doing this for real.

When Python in Query Editor Actually Makes Sense

Let me be clear about this - you should not default to Python for every data transformation in Power BI. Power Query's built-in transformations are faster, easier to maintain, and don't require a Python installation on every machine that opens the file. Use them first.

But there are genuine scenarios where Python earns its place:

Filling missing values with statistical methods. This is Microsoft's own example, and it's a good one. Power Query can replace nulls with a fixed value, sure. But backfilling, forward-filling, or interpolating based on surrounding data points? That's where pandas shines. The backfill pattern alone - bfill() in current pandas, or fillna(method='backfill') on older versions - has saved us hours of manual data cleanup on several projects.
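As a rough sketch of what that looks like - the column names and sample values here are invented, and inside Query Editor the input would arrive as the built-in dataset DataFrame rather than being constructed by hand:

```python
import pandas as pd
import numpy as np

# Hypothetical sensor readings with gaps; stands in for Power BI's `dataset`
dataset = pd.DataFrame({
    "Timestamp": pd.date_range("2026-01-01", periods=6, freq="h"),
    "Reading": [21.0, np.nan, np.nan, 24.0, np.nan, 26.0],
})

# Backfill: propagate the next valid reading backwards over each gap
dataset["Backfilled"] = dataset["Reading"].bfill()

# Interpolate: estimate each gap linearly from the surrounding points
dataset["Interpolated"] = dataset["Reading"].interpolate(method="linear")
```

Interpolation usually suits slowly drifting sensor values better than backfill, which assumes the next reading is a fair stand-in for the gap.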

Clustering and classification during data prep. We had a retail client who needed to segment their customer base before it even hit the data model. Running a quick k-means cluster in the Query Editor meant the segmentation stayed in the data pipeline rather than being a separate offline process.
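A minimal sketch of that kind of in-pipeline segmentation, assuming scikit-learn is installed on every machine that refreshes the file - the customer metrics and the choice of two clusters are invented for illustration:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer metrics; stands in for Power BI's `dataset`
dataset = pd.DataFrame({
    "AnnualSpend": [120, 150, 3000, 3200, 90, 2800],
    "OrderCount": [4, 5, 40, 45, 3, 38],
})

# Scale first so the larger-magnitude column doesn't dominate the distances
scaled = StandardScaler().fit_transform(dataset[["AnnualSpend", "OrderCount"]])

# Assign each customer to one of two segments; the label column flows
# straight into the data model with the rest of the query
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
dataset["Segment"] = kmeans.fit_predict(scaled)
```

Fixing random_state keeps the segment labels stable between refreshes, which matters once reports start filtering on them.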

Text processing that goes beyond simple string operations. Regex in M is painful. Python's regex support, or even the fuzzywuzzy library for approximate string matching, handles messy text data far more naturally.
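For teams that want to avoid extra gateway installs, the standard library's difflib can cover basic approximate matching on its own - a sketch with invented supplier names and a made-up canonical list:

```python
import pandas as pd
from difflib import SequenceMatcher

# Hypothetical messy supplier names; stands in for Power BI's `dataset`
dataset = pd.DataFrame({"Supplier": ["Acme Pty Ltd", "ACME P/L", "Globex Corp."]})
canonical = ["Acme Pty Ltd", "Globex Corporation"]

def best_match(name: str) -> str:
    # Pick the canonical name with the highest similarity ratio (0..1)
    return max(
        canonical,
        key=lambda c: SequenceMatcher(None, name.lower(), c.lower()).ratio(),
    )

dataset["CanonicalSupplier"] = dataset["Supplier"].map(best_match)
```

For large tables or stricter matching you'd swap in a dedicated library, but difflib ships with every Python install, which simplifies the gateway story later.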

Statistical outlier detection. Identifying and flagging outliers using z-scores or IQR methods during the data prep stage means your reports don't get thrown off by bad data.
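The IQR variant looks something like this - the sample amounts are invented, and the 1.5 multiplier is the conventional Tukey fence rather than anything project-specific:

```python
import pandas as pd

# Hypothetical transaction amounts; stands in for Power BI's `dataset`
dataset = pd.DataFrame({"Amount": [100, 105, 98, 110, 102, 5000]})

# IQR fences: anything beyond 1.5 * IQR from the quartiles is flagged
q1 = dataset["Amount"].quantile(0.25)
q3 = dataset["Amount"].quantile(0.75)
iqr = q3 - q1
dataset["IsOutlier"] = (
    (dataset["Amount"] < q1 - 1.5 * iqr) | (dataset["Amount"] > q3 + 1.5 * iqr)
)
```

Unlike z-scores, the IQR method doesn't assume the data is roughly normal, which often suits skewed financial amounts better.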

If your transformation can be done with standard Power Query steps - splitting columns, merging tables, changing types, pivoting - just use Power Query. It's simpler and performs better.

Setting It Up - The Bits Microsoft Glosses Over

The official docs tell you to install Python and pandas. That's true, but there are a few things they skip that will bite you in production.

Use a virtual environment. Don't install packages into your system Python. Create a virtual environment specifically for Power BI work. This isolates your Power BI dependencies from everything else on your machine and makes it easier to replicate the setup across team members.

python -m venv powerbi-env
powerbi-env\Scripts\activate
pip install pandas scikit-learn

Then point Power BI Desktop to this environment under File > Options > Python scripting. Set the Python home directory to your virtual environment's folder.

Tell Power BI where your Python is. This sounds obvious, but I've seen teams spend an hour debugging "Python not found" errors because Power BI was looking at the wrong installation. If you have Anaconda, regular Python, and maybe a pyenv setup all on the same machine, Power BI can get confused. Be explicit about the path.

The dataset variable is your interface. When your Python script runs inside Query Editor, your current query data arrives as a pandas DataFrame called dataset. You work with it, modify it, and Power BI picks up any DataFrames left in scope when the script finishes. If you create multiple DataFrames, Power BI will let you choose which one to keep. This is actually useful - you can output a cleaned version and a summary table from the same script.
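A small sketch of that multi-output pattern - the sample data is invented, and in Query Editor the initial `dataset` is supplied by Power BI rather than constructed:

```python
import pandas as pd

# Stands in for the DataFrame Power BI injects into the script
dataset = pd.DataFrame({
    "Region": ["NSW", "NSW", "VIC"],
    "Sales": [100.0, 150.0, 200.0],
})

# First output: a cleaned version of the input
cleaned = dataset.dropna()

# Second output: a per-region summary. Both `cleaned` and `summary` are
# left in scope, so Power BI offers each as a selectable result table.
summary = cleaned.groupby("Region", as_index=False)["Sales"].sum()
```

Only DataFrames survive the handoff back to Power BI; scalars and intermediate variables are simply discarded when the script finishes.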

A Real-World Example Beyond the Docs

Microsoft's example fills missing stock values using backfill. Here's a pattern we use more often - detecting and handling outliers in financial data before it hits the data model:

import pandas as pd
import numpy as np

# Calculate z-scores for the amount column
z_scores = np.abs((dataset['Amount'] - dataset['Amount'].mean()) / dataset['Amount'].std())

# Flag outliers (more than 3 standard deviations)
dataset['IsOutlier'] = z_scores > 3

# Cap extreme values instead of removing them
upper_limit = dataset['Amount'].mean() + 3 * dataset['Amount'].std()
lower_limit = dataset['Amount'].mean() - 3 * dataset['Amount'].std()
dataset['CleanedAmount'] = dataset['Amount'].clip(lower_limit, upper_limit)

This gives you the original data, an outlier flag for investigation, and a cleaned column for reporting - all in the data pipeline, not buried in a DAX measure.

The Privacy Level Headache

Here's something the docs mention but don't emphasise enough: every data source in your query needs to be set to "Public" for Python scripts to work. This is a privacy level setting, not a security setting - it controls how Power BI isolates data between different sources within the mashup engine.

In practice, this means you'll get a privacy warning the first time you run a Python step. You need to go to File > Options > Privacy and either set each source to Public individually, or choose "Always ignore Privacy Level settings." The second option is easier for development but think carefully about it in production, especially if your queries combine data from multiple sources with different sensitivity levels.

For client engagements, we typically set the specific sources to Public rather than globally ignoring privacy levels. It takes a bit more effort upfront but avoids awkward conversations with the client's security team later.

Gateway and Refresh Considerations

This is where the "use Python in Power BI" story gets a bit rough. If you're only working in Power BI Desktop, everything is straightforward - the Python runs on your local machine.

But when you publish to the Power BI service and want scheduled refresh, you need a gateway with Python installed on the gateway machine. Specifically:

  • You need an on-premises data gateway running in personal mode - Python steps aren't supported through the standard (enterprise) gateway
  • Python must be installed on the gateway machine
  • All the packages your scripts use must be installed there too
  • The gateway service account needs permission to run Python

We've seen this catch teams out. They build a report with Python transforms on their laptop, publish it, and then can't refresh it because the gateway machine doesn't have scipy or whatever package they used. Make a requirements.txt file for your Power BI Python dependencies and keep it version-controlled alongside your .pbix files.
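One low-effort way to keep Desktop and gateway in sync, assuming pip-based environments on both machines, is to freeze the working environment and replay it on the gateway:

```shell
# On the machine where the report was built, inside the Power BI venv:
# capture the exact package versions the Query Editor scripts depend on
pip freeze > requirements.txt

# On the gateway machine, recreate the same environment from that file
pip install -r requirements.txt
```

Committing requirements.txt next to the .pbix means the refresh environment is reproducible rather than rebuilt from memory.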

Limitations Worth Knowing

A few things that have tripped us up or limited what we could do:

Performance. Python steps in Query Editor are slower than native M steps. The data has to be serialised, passed to the Python runtime, processed, and passed back. For small to medium datasets this is fine. For millions of rows, it can be noticeably slow. Profile your refresh times.

No nested tables. Power Query loves nested tables - they're how it handles grouped data. Python steps can't work with them. You'll need to expand nested tables before the Python step and optionally re-group after.

Debugging is basic. When your Python script errors out in Query Editor, the error messages are often unhelpful. Test your scripts in a regular Python environment first with sample data, then paste them into Power BI once you know they work.
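A simple local harness for that workflow: fake the `dataset` variable with representative sample data, paste the script body underneath, and add the sanity checks Power BI's error dialog won't give you. Everything here is illustrative:

```python
import pandas as pd

# Stand-in for the DataFrame Power BI injects; build it from a sample CSV
# or inline rows so the script can run outside Query Editor
dataset = pd.DataFrame({"Amount": [10.0, None, 30.0]})

# --- paste the body of your Query Editor script below this line ---
dataset["Amount"] = dataset["Amount"].fillna(0.0)
# --- end of script body ---

# Sanity checks you only get in a real Python environment
assert not dataset["Amount"].isna().any()
print(dataset.head())
```

Once the script passes locally, the version pasted into Power BI is just the middle section, unchanged.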

Team skills. This is the non-technical one, but it matters most. If only one person on the team knows Python, you've created a maintenance dependency. We usually recommend that at least two people on any data team have working Python knowledge before adopting this pattern in production reports.

Where This Fits in the Bigger Picture

Python in Query Editor is one piece of the broader story around bringing code-based data processing into the Microsoft analytics stack. If you're also looking at Microsoft Fabric, Databricks, or Azure Synapse, you'll find Python even more central.

For organisations that are primarily Power BI shops today and want to gradually introduce more sophisticated data processing, Python in Query Editor is a sensible first step. It keeps everything inside the tool your team already knows, and the learning curve is manageable.

If you're working with a data team that needs help bringing Python capabilities into their Power BI workflows, our Power BI consulting team has done this across retail, financial services, and manufacturing clients. We can also help with the broader data integration strategy if you're thinking about Fabric or Synapse down the line.

The bottom line? Python in Power BI Query Editor is not something you should use everywhere. But when you need it - and you will eventually - it's genuinely useful. Set up your environments properly, document your dependencies, and make sure more than one person on the team can maintain what you build.