Using Copilot in Microsoft Fabric Data Factory - What Actually Works

March 9, 2026 · 7 min read · Michael Ridland

If you've spent any time building data pipelines in Microsoft Fabric, you know the drill. You're stitching together Copy activities, writing M expressions in Power Query, and debugging pipeline failures at 11pm because something broke in production. Microsoft's answer to making this less painful is Copilot for Data Factory - and after spending real time with it across client projects, I've got some honest thoughts on where it shines and where it still needs work.

What Copilot for Data Factory Actually Does

At its core, Copilot sits inside two parts of the Data Factory workload: Dataflow Gen2 and pipelines. In Dataflow Gen2, you can type natural language prompts to transform data instead of writing Power Query M code manually. In pipelines, you can generate entire pipeline structures from a description, get expressions written for you, and have Copilot summarise or troubleshoot existing pipelines.

The idea is simple - instead of remembering the exact M syntax for filtering European customers from a table, you type "Only keep European customers" and Copilot figures out the expression. Same deal with pipelines - describe what you want to move and where, and it generates the Copy activities and configuration.

Microsoft's official documentation walks through the setup with the Northwind OData sample, which gives you a decent feel for the basics.

Getting Started - The Prerequisites

Before you can use any of this, there are a few boxes to tick:

  • You need a Microsoft Fabric licence (obviously)
  • Your workspace needs Fabric capacity
  • Your tenant admin needs to enable Copilot

That last point trips people up more than you'd think. We've had clients come to us saying "Copilot doesn't appear in my Data Factory toolbar" and nine times out of ten, it's because their admin hasn't flipped the switch in the Fabric admin portal. It's not enabled by default, and depending on your organisation's policies around AI features, getting approval can take a few weeks.

Dataflow Gen2 - Where Copilot is Most Useful

This is where I think Copilot adds the most value right now. Writing M expressions has always been one of those skills that data engineers either love or tolerate. The syntax isn't hard once you know it, but remembering the exact function names and parameter orders? That's where people lose time.

With Copilot in Dataflow Gen2, you connect to your data source as normal, then open the Copilot pane and start typing what you want. Some things that work well:

Filtering and grouping - prompts like "Only keep European customers" or "Count the total number of employees by City" translate into clean M steps. The Applied Steps pane updates in real time, so you can see exactly what Copilot generated and verify it against your data.
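For a concrete sense of the output, here's a hedged sketch of the kind of M steps those two prompts typically produce. Column names like Country and City are assumptions about the source table, and the exact step names and country list Copilot chooses will differ:

```m
// What "Only keep European customers" might generate,
// assuming the source table has a Country column
#"Filtered rows" = Table.SelectRows(Source, each List.Contains(
    {"Germany", "France", "UK", "Spain", "Italy"}, [Country])),

// What "Count the total number of employees by City" might generate
#"Grouped rows" = Table.Group(#"Filtered rows", {"City"},
    {{"Employee Count", each Table.RowCount(_), Int64.Type}})
```

Because each prompt lands as a normal step in Applied Steps, you can click through and sanity-check the generated code just like hand-written M.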

Creating sample data - this one surprised me. You can ask Copilot to "Create a new query with sample data that lists all the Microsoft OS versions and the year they were released" and it generates a table from scratch. Handy for prototyping transforms when you don't have production data available yet.
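Under the hood the generated query is ordinary M - roughly something like this sketch built with #table (the exact rows and type annotations Copilot produces will vary):

```m
let
    Source = #table(
        type table [#"OS Version" = text, #"Year Released" = Int64.Type],
        {
            {"Windows 95", 1995},
            {"Windows XP", 2001},
            {"Windows 7", 2009},
            {"Windows 10", 2015},
            {"Windows 11", 2021}
        }
    )
in
    Source
```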

Explaining queries - if you inherit a Dataflow with complicated M code (and let's be honest, most inherited Dataflows have at least a few head-scratchers), you can ask Copilot to explain the current query in plain English. This alone saves hours when onboarding new team members.

There's also an undo feature built in, which matters more than you'd expect. If Copilot generates a step that doesn't look right, you can hit Undo in the Copilot pane or just type "Undo" to roll it back. Small thing, but it makes experimentation feel safe.

Where It Falls Short in Dataflows

Complex multi-step transformations don't always come out clean. If you try to describe a complicated pivot, unpivot, and merge in a single prompt, you'll likely get something that needs manual adjustment. The sweet spot is one transformation per prompt - keep it focused and iterate.

Also, Copilot sometimes generates technically correct M code that's not the most efficient approach. It might create five steps where two would do the job. For one-off analyses, that doesn't matter. For Dataflows running on a schedule against large datasets, performance matters, and you'll want to review what Copilot generated.
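As a hedged illustration of what that looks like in practice (the table and column names here are made up): two prompts issued one after the other can produce two filter steps where a single combined step would do.

```m
// Two steps from two separate prompts
#"Kept active customers" = Table.SelectRows(Source, each [Status] = "Active"),
#"Kept 2024 orders" = Table.SelectRows(#"Kept active customers",
    each [OrderYear] = 2024),

// Equivalent single step after a manual tidy-up
#"Filtered rows" = Table.SelectRows(Source,
    each [Status] = "Active" and [OrderYear] = 2024)
```

Folding two predicates into one Table.SelectRows is trivial here, but on scheduled Dataflows against large sources, trimming unnecessary steps is worth the review pass.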

Pipeline Generation - Promising but Still Maturing

This is the flashier feature. You can describe a data integration scenario in natural language - something like "Create a pipeline that copies data from a SQL Server table called Orders to Azure Data Lake Storage Gen2" - and Copilot generates the pipeline structure with a Copy activity already configured.

In practice, I've found this works best as a starting point rather than a finished product. Copilot gets the basic structure right most of the time, but you'll still need to:

  • Configure actual connection details (it can't guess your server names)
  • Set up authentication
  • Add error handling that matches your organisation's standards
  • Configure scheduling and triggers
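To make the "starting point" framing concrete, the skeleton you get back is roughly a Copy activity definition like the simplified sketch below. This is illustrative only, with placeholder names - the real JSON carries full connection, dataset, and policy settings, which is exactly the part you still have to fill in yourself:

```json
{
  "name": "Copy Orders to ADLS",
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "SqlServerSource" },
    "sink": { "type": "ParquetSink" },
    "enableStaging": false
  }
}
```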

Where pipeline Copilot gets genuinely useful is the expression builder integration. Inside any pipeline activity that supports dynamic content, you can ask Copilot to write expressions. Need a date formatted as yyyy-MM-dd? Need to pick the first non-null value between two parameters? Just describe it and Copilot writes the expression.
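Both of those asks map onto the pipeline expression language directly. Here's roughly what you get back for each (the parameter names below are hypothetical):

```
// "Format today's date as yyyy-MM-dd"
@formatDateTime(utcNow(), 'yyyy-MM-dd')

// "First non-null value between two parameters"
@coalesce(pipeline().parameters.startDate, pipeline().parameters.defaultDate)
```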

This matters because pipeline expressions use a specific syntax that's not quite the same as Power Query M or standard programming languages. Having Copilot handle that translation removes a real friction point.

Troubleshooting - The Underrated Feature

Pipeline troubleshooting with Copilot might be the most practically useful feature of the lot. When a pipeline fails, instead of manually parsing error logs and hunting through activity outputs, you can ask Copilot to explain the error and suggest fixes.

We've used this on a few Microsoft Fabric consulting engagements where clients had inherited complex pipelines from previous teams. Being able to point Copilot at a failed run and get a plain-English explanation of what went wrong - and what to try next - cut our investigation time significantly.

It's not perfect. Sometimes the recommendations are too generic ("check your connection settings" when the issue is actually a schema mismatch), but it gets you pointed in the right direction more often than not.

What This Means for Data Teams

Here's my honest take. Copilot in Data Factory doesn't replace the need for people who understand data engineering fundamentals. You still need to know what a good pipeline looks like, how to handle incremental loads, and what proper error handling means for your business. What it does is reduce the time spent on syntax and boilerplate.

For organisations building their first Fabric implementation, Copilot lowers the barrier to getting productive quickly. Your team can focus on the data logic rather than fighting with M expressions or pipeline expression syntax.

For mature data teams, it's a productivity boost. Senior engineers can prototype faster, and junior team members can be more self-sufficient without constantly asking "what's the M function for..." questions.

A Few Things to Watch Out For

AI-generated mistakes are real. Microsoft is upfront about this - "AI powers Copilot, so surprises and mistakes are possible." Always validate what Copilot generates against your actual data. I've seen it apply filters that looked correct but missed edge cases in the data.

Copilot needs context. The quality of the output depends heavily on the quality of your prompt and the structure of your data. Well-named columns and tables get better results than cryptic abbreviations. If your source table has columns named "col1", "col2", "col3", don't expect Copilot to know what to do with them.

Tenant-level control matters. For regulated industries - financial services, healthcare, government - your organisation needs to understand what data Copilot processes and where. This is a conversation your IT and compliance teams should have before enabling it.

Getting Help with Fabric and Data Factory

We work with Australian organisations on Microsoft Data Factory implementations regularly, and Copilot has become part of how we accelerate delivery. If you're evaluating Fabric or trying to get more out of your existing setup, get in touch and we can walk through what makes sense for your situation.

The tooling is improving fast. What was a novelty feature twelve months ago is now a genuine productivity tool for data teams who take the time to learn its strengths and limitations. The key is knowing when to lean on it and when to write the code yourself.