
Creating Your First Microsoft Fabric Data Factory Pipeline - A Practical Walkthrough

April 6, 2026 · 7 min read · Michael Ridland


Most data teams I talk to are in one of two camps when it comes to Microsoft Fabric. They've either been circling it for months, reading documentation but not actually building anything. Or they jumped in, got confused by the sheer number of options, and went back to what they knew.

If that sounds familiar, building your first pipeline is a good way to break the deadlock. Not because a sample data pipeline is production-ready work, but because it forces you to set up the foundations - workspace, Lakehouse, pipeline authoring - and those foundations are what everything else builds on.

Here's what the process actually looks like, and where you might trip up along the way.

Why Start With Pipelines?

Fabric gives you a lot of ways to move and reshape data. Dataflows Gen2, notebooks, Spark jobs, copy jobs. For someone coming from Azure Data Factory, pipelines are the most familiar entry point. The concepts map almost directly - activities, triggers, monitoring, parameters.

But even if you've never used ADF, pipelines are still the right starting point because they teach you how Fabric thinks about data movement. Source to destination. Configuration through a visual canvas. Runs you can monitor and schedule.

Once you've built one pipeline, the rest of Fabric makes more sense.

Setting Up - What You Actually Need

Before you touch the pipeline editor, you need two things sorted.

A Fabric-enabled workspace. This is where your pipeline and Lakehouse will live. If your organisation has a Fabric capacity (F SKU), you can create a workspace and assign it to that capacity. If you're evaluating Fabric, Microsoft offers a trial capacity that gives you enough room to experiment.

The thing people miss here is that workspace assignment matters. A pipeline sitting in a workspace without an assigned capacity won't run - you'll just get a vague error that doesn't point at the real problem. Check your workspace settings and make sure a capacity is allocated before you start building.
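If you'd rather check this programmatically, the Fabric REST API's Get Workspace endpoint returns a `capacityId` field when a capacity is assigned. Here's a minimal sketch - the workspace IDs and response bodies below are made up for illustration, and you should verify the response shape against the official API docs:

```python
# Sketch: check whether a Fabric workspace has a capacity assigned.
# Assumes the Fabric REST API "Get Workspace" response includes a
# capacityId field; the IDs and sample responses are illustrative.

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def workspace_url(workspace_id: str) -> str:
    """Build the Get Workspace endpoint URL."""
    return f"{FABRIC_API}/workspaces/{workspace_id}"

def has_capacity(workspace: dict) -> bool:
    """A workspace without an assigned capacity has no capacityId."""
    return bool(workspace.get("capacityId"))

# Simulated responses - in practice you'd GET workspace_url(...) with a
# bearer token and parse the JSON body.
assigned = {"id": "aaaa-1111", "displayName": "Data Platform", "capacityId": "cap-01"}
unassigned = {"id": "bbbb-2222", "displayName": "Sandbox"}

print(has_capacity(assigned))    # True - pipelines here can run
print(has_capacity(unassigned))  # False - assign a capacity first
```

Running a check like this before deploying pipelines into a workspace saves the "why won't it run" debugging session later.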

A Lakehouse as your destination. You can create one on the fly during the pipeline wizard, but I'd recommend creating it separately first. Give it a sensible name that reflects what it'll hold. "Lakehouse1" is fine for a tutorial but it becomes confusing fast when you have ten of them.

Building the Pipeline Step by Step

Open your workspace and select "New item", then search for "Pipeline". Give it a name - something descriptive. "Sample Holiday Data Load" beats "Pipeline1" by a wide margin.

The Copy Data Assistant

Once you're in the pipeline editor, you'll see a mostly empty canvas. Select "Copy data assistant" and it walks you through a guided experience.

First, you pick your data source. For a first pipeline, select the "Sample data" tab and choose "Public Holidays". This is a small dataset that Microsoft hosts specifically for getting-started scenarios. No connection strings, no credentials, no fiddling with firewall rules. It just works.

You'll get a preview of the data - country codes, holiday names, dates. Nothing exciting, but that's the point. You want the pipeline mechanics to be the interesting part, not the data.

Configuring the Destination

Next, you point the pipeline at your Lakehouse. If you created one earlier, select it from the list. If not, you can create one inline. Pick "Tables" as the root folder (not "Files" - that's for unstructured data) and give your table a name. "public_holidays" works well.

One detail that catches people: the table name you enter here becomes the Delta table name in your Lakehouse. Delta format is the default and it's the right choice for structured data. You get ACID transactions, time travel, and schema evolution out of the box.

Review and Run

The assistant shows you a summary of what it's about to do. Source, destination, column mappings. Check these over - the defaults are usually right for sample data but in production scenarios this is where you'd spot column type mismatches or unwanted columns.

Hit "Save + Run" and the pipeline executes immediately. You'll see the output tab populate with run status, duration, rows read, and rows written.

For sample data, this runs in seconds. Real-world pipelines loading millions of rows from SQL Server or an API will take longer, obviously, but the monitoring experience is the same.

Scheduling - Where It Gets Useful

A pipeline that runs once isn't much use. The scheduling interface in Fabric is straightforward but has a few options worth knowing about.

From the pipeline editor, select "Schedule" on the Home tab. You can set frequency (every 15 minutes, hourly, daily, weekly), start and end dates, and time zone.

Here's my honest take: the scheduling in Fabric is functional but basic compared to what mature orchestration tools offer. You get simple recurrence patterns but not complex dependency chains. If pipeline B needs to wait for pipeline A to finish, you'll need to build that logic using "Invoke Pipeline" activities or look at Fabric's newer orchestration features.
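If you do need to chain pipelines outside the canvas, Fabric also exposes an on-demand job endpoint in its REST API, so an external orchestrator can kick off pipeline B once pipeline A reports success. The sketch below only assembles the request rather than sending it; the endpoint path reflects my reading of the Fabric Job Scheduler API, and the IDs and token are placeholders - check the official API reference before relying on it:

```python
# Sketch: build the request for running a Fabric pipeline on demand,
# so pipeline B can be triggered once pipeline A reports success.
# Endpoint path is my understanding of the Fabric Job Scheduler API;
# workspace/item IDs and the token are placeholders.

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def run_pipeline_request(workspace_id: str, pipeline_id: str) -> dict:
    """Assemble the on-demand job request for a pipeline item."""
    return {
        "method": "POST",
        "url": (f"{FABRIC_API}/workspaces/{workspace_id}"
                f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline"),
        "headers": {"Authorization": "Bearer <token>"},  # placeholder
    }

req = run_pipeline_request("ws-123", "pipe-b-456")
print(req["url"])
```

For most teams the in-canvas "Invoke Pipeline" activity is simpler; the API route matters when the trigger lives outside Fabric.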

For most use cases though - daily data loads, hourly refreshes - the built-in scheduler does the job.

What I'd Do Differently in Production

Sample data pipelines are great for learning the tool, but production pipelines need more thought. Here's what I tell clients when they move past the getting-started phase.

Parameterise everything. Hard-coded source paths, database names, and table names make pipelines brittle. Use pipeline parameters so you can point the same pipeline at dev, test, and production environments by changing a parameter value rather than rebuilding the pipeline.
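The same idea in miniature: one definition, with the environment chosen by a parameter value - much like referencing `@pipeline().parameters.env` in a Fabric pipeline expression. The environment names and table names below are made up for illustration:

```python
# Sketch: one pipeline definition, environment selected by a parameter.
# Mirrors swapping connections via a pipeline parameter; the lakehouse
# and table names here are illustrative, not real.

ENVIRONMENTS = {
    "dev":  {"lakehouse": "lh_dev",  "source_table": "sales_dev"},
    "test": {"lakehouse": "lh_test", "source_table": "sales_test"},
    "prod": {"lakehouse": "lh_prod", "source_table": "sales"},
}

def resolve_config(env: str) -> dict:
    """Fail loudly on an unknown environment rather than defaulting."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"Unknown environment: {env!r}")
    return ENVIRONMENTS[env]

print(resolve_config("dev")["lakehouse"])   # lh_dev
print(resolve_config("prod")["source_table"])  # sales
```

The point is the shape: changing `env` repoints the whole pipeline, and an unrecognised value fails fast instead of silently loading into the wrong place.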

Add error handling early. The default behaviour when a copy activity fails is that the pipeline fails. That's fine until you have 15 activities in a pipeline and one flaky API call at step 3 kills the whole run. Use "On failure" paths and "Set variable" activities to capture errors and decide whether to continue or abort.

Monitor actively, not passively. Fabric's monitoring hub shows you run history, but nobody checks it daily. Set up alerts for pipeline failures. Microsoft has built alert integration into Fabric that can notify via email or Teams. Five minutes setting up alerts saves hours of discovering stale data three days late.

Think about incremental loads from day one. Your first pipeline will do a full data load. That works for small datasets. But when your source table has 50 million rows, full loads every hour are wasteful and slow. Design for incremental loads using watermark columns (like a "last modified" timestamp) from the start, even if your initial dataset doesn't strictly need it.
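The watermark logic itself is simple - here's a pure-Python sketch of the two halves: fetch only rows newer than the last watermark, then advance the watermark to the newest row you loaded. Table and column names are illustrative, and in a real pipeline the watermark would be persisted (for example in a control table) between runs:

```python
# Sketch: watermark-based incremental load logic. Table and column
# names are illustrative; a real pipeline persists the watermark
# between runs (e.g. in a control table).

from datetime import datetime

def incremental_query(table: str, watermark_col: str, last_watermark: datetime) -> str:
    """Only fetch rows modified since the previous run."""
    return (f"SELECT * FROM {table} "
            f"WHERE {watermark_col} > '{last_watermark.isoformat()}'")

def advance_watermark(rows: list, watermark_col: str, current: datetime) -> datetime:
    """New watermark = max modified timestamp seen; unchanged if no rows."""
    if not rows:
        return current
    return max(row[watermark_col] for row in rows)

last = datetime(2026, 4, 1)
print(incremental_query("sales", "last_modified", last))

batch = [{"id": 1, "last_modified": datetime(2026, 4, 3)},
         {"id": 2, "last_modified": datetime(2026, 4, 5)}]
print(advance_watermark(batch, "last_modified", last))  # 2026-04-05 00:00:00
```

Note the empty-batch case: if nothing changed, the watermark stays put, so the next run picks up exactly where the last successful one left off.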

Where Fabric Pipelines Fit in the Bigger Picture

Pipelines handle data movement and orchestration. They're the "plumbing" that gets data from source systems into your Lakehouse or Warehouse. They don't do heavy transformation - that's where notebooks and Dataflows Gen2 come in.

The pattern I see working well is: pipeline pulls raw data into a Lakehouse bronze layer, a notebook or dataflow transforms it into a silver layer, and Power BI reports sit on top. This is the medallion architecture pattern and Fabric is built around it.
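The bronze-to-silver step can be sketched in miniature: bronze keeps everything as it arrived, silver standardises and enforces quality. In Fabric this would run in a notebook or Dataflow Gen2 over Delta tables; the rows and column names below are invented for illustration:

```python
# Sketch of the medallion flow in miniature: raw (bronze) rows cleaned
# into a silver layer. A real Fabric implementation would do this in a
# notebook or Dataflow Gen2 over Delta tables; the data is made up.

bronze = [
    {"country": "AU", "holiday": "Anzac Day",  "date": "2026-04-25"},
    {"country": "au", "holiday": "  Easter  ", "date": "2026-04-05"},
    {"country": None, "holiday": "Unknown",    "date": None},  # bad row
]

def to_silver(rows: list) -> list:
    """Standardise casing, trim whitespace, drop rows missing key fields."""
    cleaned = []
    for row in rows:
        if not row["country"] or not row["date"]:
            continue  # bronze keeps everything; silver enforces quality
        cleaned.append({
            "country": row["country"].upper(),
            "holiday": row["holiday"].strip(),
            "date": row["date"],
        })
    return cleaned

silver = to_silver(bronze)
print(len(silver))  # 2 - the bad row is filtered out
```

Keeping the raw row in bronze means you can always re-derive silver when the cleaning rules change - that's the property the medallion pattern buys you.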

If your organisation is already invested in the Microsoft stack - Power BI for reporting, Azure for infrastructure, Microsoft 365 for collaboration - Fabric pipelines are the natural choice for data integration. The experience is genuinely good, especially compared to where it was twelve months ago.

Getting Help With Fabric

We've been helping Australian businesses adopt Microsoft Fabric since it went GA, and the pattern we see most often is teams that understand the concepts but get stuck on the practical details. Connection configurations, gateway setup, capacity sizing, incremental refresh design - the stuff that documentation covers in theory but not in your specific context.

If you're building out your data platform on Fabric and want hands-on guidance, our Microsoft Fabric consulting team can help you get it right from the start. We also run data integration projects where we build the pipelines alongside your team so you're self-sufficient afterwards.

For the full official walkthrough, Microsoft's Create a pipeline tutorial walks through each step with screenshots.

Building your first pipeline is a small step, but it's the step that makes everything else in Fabric feel less abstract. Get the sample data flowing, then start thinking about your real data. That's how every successful Fabric adoption I've seen has started.