How to Build Your First Data Pipeline with Data Factory
Building your first data pipeline in Data Factory should take hours, not weeks. Yet plenty of teams lose weeks to what should be a simple first pipeline because they got lost in configuration options or followed outdated tutorials.
This guide walks you through building a real, production-ready data pipeline from scratch. We'll cover both Azure Data Factory (ADF) and Fabric Data Factory, noting where they differ. By the end, you'll have a working pipeline that moves data from a source system, applies basic transformations, and lands it in a destination - with proper error handling and monitoring.
Before You Start - What You Need
For Azure Data Factory
- An Azure subscription (free trial works for learning)
- An Azure resource group
- An Azure Data Factory instance (create one in the Azure portal - takes about 2 minutes)
- At least one data source to read from (Azure Blob Storage is easiest to start with)
- A destination to write to (Azure SQL Database or Azure Data Lake Storage Gen2)
For Fabric Data Factory
- A Microsoft Fabric capacity (even a trial capacity works)
- A Fabric workspace
- Access to at least one data source
- A Fabric Lakehouse or Warehouse as your destination
The interfaces are similar enough that most of this guide applies to both. We'll call out differences where they matter.
Step 1 - Plan Your Pipeline Before You Build
This sounds obvious, but we've seen teams jump straight into the visual designer without thinking through what they're building. Take 15 minutes to answer these questions:
- What data are you moving? Be specific. Which table, which file, which API endpoint?
- Where is the source? Cloud service, on-premises database, SaaS application?
- Where is the destination? Data lake, SQL database, warehouse?
- What transformations do you need? Column renaming, filtering, data type conversion, joining with other data?
- How often should it run? Real-time, hourly, daily, weekly?
- What happens when it fails? Who gets notified? Does it retry?
For your first pipeline, keep it simple. Move one dataset from one source to one destination with minimal transformation. You can add complexity later.
Step 2 - Set Up Your Connections
In ADF, connections are called Linked Services. In Fabric, they're called Connections. Same concept, slightly different names.
Creating a Source Connection
Azure Data Factory:
- Open the ADF Studio (from the Azure portal, click "Launch Studio")
- Go to the Manage tab
- Click Linked services then New
- Search for your source type (e.g., "Azure Blob Storage")
- Configure the connection - name, authentication method, connection string or account URL
- Click Test connection to verify it works
- Click Create
Fabric Data Factory:
- Open your Fabric workspace
- Go to Settings then Manage connections and gateways
- Click New connection
- Select your source type
- Configure and test the connection
Creating a Destination Connection
Repeat the same process for your destination. If you're writing to Azure SQL Database, you'll need:
- Server name (e.g., yourserver.database.windows.net)
- Database name
- Authentication (SQL auth or Microsoft Entra ID, formerly Azure AD)
Common mistake: Using SQL authentication with credentials stored in plain text. Use Azure Key Vault to store connection strings and secrets from day one. It takes 10 minutes to set up and saves you a security headache later.
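To make the Key Vault pattern concrete, here's a sketch of what an ADF linked service definition looks like when the connection string is pulled from Key Vault rather than stored inline. The names ("MyKeyVault", "sql-connection-string") are placeholders for your own Key Vault linked service and secret - adjust them to match your setup.

```python
import json

# Illustrative ADF linked service definition referencing an Azure Key Vault
# secret instead of a plain-text connection string. All names are examples.
linked_service = {
    "name": "AzureSqlDbSource",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    # A Key Vault linked service you create first
                    "referenceName": "MyKeyVault",
                    "type": "LinkedServiceReference",
                },
                # The secret that holds the actual connection string
                "secretName": "sql-connection-string",
            }
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

The key point: the pipeline definition never contains the credential itself, only a reference. Rotating the secret in Key Vault requires no change to the pipeline.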
Step 3 - Create Your Datasets (ADF) or Set Up Source/Destination (Fabric)
In Azure Data Factory
Datasets define the structure of your data. You need one for the source and one for the destination.
- Go to the Author tab
- Click the + next to Datasets and select New dataset
- Choose your source type and linked service
- Configure the dataset - file path, table name, schema
- Repeat for the destination
Tip: Use parameterised datasets from the start. Instead of hardcoding a file path like /data/sales/2026-04-20.csv, use a parameter like /data/sales/{date}.csv. This makes your pipeline reusable.
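In ADF itself you'd pass the date in as a dataset parameter; the resolution logic is simple enough to sketch in plain Python. This is an illustration of the pattern, not ADF expression syntax:

```python
from datetime import date

def sales_blob_path(run_date: date, root: str = "/data/sales") -> str:
    """Build the dated file path a parameterised dataset would resolve to,
    rather than hardcoding a path like /data/sales/2026-04-20.csv."""
    return f"{root}/{run_date.isoformat()}.csv"

print(sales_blob_path(date(2026, 4, 20)))  # /data/sales/2026-04-20.csv
```

The same pipeline can now load any day's file - including re-running a failed day - just by changing the parameter value at trigger time.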
In Fabric Data Factory
Fabric pipelines don't require separate dataset objects. You configure source and destination directly within the pipeline activities. This is simpler for getting started, though it can lead to duplication if you're building many pipelines against the same sources.
Step 4 - Build the Pipeline
Now for the actual pipeline.
Create a Copy Data Activity
The Copy Data activity is the workhorse of Data Factory. It moves data from source to destination with optional column mapping and basic transformations.
- Go to the Author tab (ADF) or create a new Data Pipeline (Fabric)
- Drag a Copy Data activity onto the canvas
- In the Source tab, select your source dataset/connection
- In the Sink tab (that's Microsoft's term for "destination"), select your destination dataset/connection
- In the Mapping tab, configure column mappings
Column mapping options:
- Auto-map works when source and destination have matching column names
- Manual mapping lets you rename columns, change data types, or exclude columns
- Import schema reads the source schema and lets you adjust mappings visually
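Behind the Mapping tab, the Copy Data activity stores an explicit "translator" in its JSON definition. Here's a sketch of that structure as a Python dict - the column names are invented examples, and your real mapping may carry extra properties (data types, ordinals) that the designer adds for you:

```python
# Illustrative Copy Data column mapping ("translator"). Columns not listed
# in "mappings" are excluded from the copy; sink names can differ from
# source names, which is how renaming works.
mapping = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "cust_id"}, "sink": {"name": "CustomerId"}},
        {"source": {"name": "order_ts"}, "sink": {"name": "OrderTimestamp"}},
    ],
}

# Summarise the renames the mapping performs
renames = {m["source"]["name"]: m["sink"]["name"] for m in mapping["mappings"]}
print(renames)  # {'cust_id': 'CustomerId', 'order_ts': 'OrderTimestamp'}
```

Seeing the JSON shape helps when you move to parameterised or dynamically generated mappings later - it's the same structure, just built at runtime.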
Add Data Transformation
For your first pipeline, keep transformations simple. The Copy Data activity supports:
- Column renaming via mapping
- Data type conversion
- Filtering (using a query on the source side)
- Adding static columns
For more complex transformations (joins, aggregations, conditional logic), you'll need either:
- Mapping data flows (ADF) - visual, code-free transformation tool
- Dataflows Gen2 (Fabric) - Power Query-based transformations
- Notebooks (Fabric) - Python/Spark code for full flexibility
We recommend starting with Copy Data and only adding transformation complexity when you need it.
Add Error Handling
Every production pipeline needs error handling. At minimum:
- Set retry policy on the Copy Data activity (Settings tab). We recommend 3 retries with 30-second intervals.
- Add a failure path. Drag from the red "X" output of Copy Data to a Web activity or Set Variable activity that logs the failure.
- Configure alerts (covered in Step 6).
Here's a simple pattern we use on most projects:
Copy Data --> [Success] --> Log Success (Stored Procedure or Web Activity)
--> [Failure] --> Log Failure --> Send Alert (Logic App or Email)
Step 5 - Schedule Your Pipeline
Triggers in Azure Data Factory
ADF supports three main trigger types:
- Schedule trigger: Runs at a set time (e.g., daily at 2 AM AEST). Most common for batch pipelines.
- Tumbling window trigger: Runs for a specific time window. Useful for processing data in defined periods.
- Event trigger: Runs when a file arrives in Blob Storage or Data Lake. Great for event-driven architectures.
To create a schedule trigger:
- Click Add trigger then New/Edit
- Choose Schedule
- Set your recurrence (daily at 2:00 AM is a common starting point for Australian businesses - well outside business hours in AEST)
- Set the start date and time zone (use AUS Eastern Standard Time or E. Australia Standard Time)
Important: Always set the time zone explicitly. The default is UTC, which is 10 hours behind AEST (11 behind AEDT during daylight saving). We've seen pipelines that were supposed to run at 2 AM run at noon because someone forgot to set the time zone.
Triggers in Fabric Data Factory
Fabric pipelines support schedule triggers configured directly on the pipeline. The process is simpler:
- Open your pipeline
- Click Schedule in the toolbar
- Set recurrence and time zone
Step 6 - Set Up Monitoring and Alerts
A pipeline that fails silently is worse than no pipeline at all. Set up monitoring before you go live.
Basic Monitoring
Azure Data Factory:
- The Monitor tab in ADF Studio shows all pipeline runs with status, duration, and error details
- Set up Azure Monitor alerts for pipeline failures (go to the ADF resource in Azure portal then Alerts then New alert rule)
Fabric Data Factory:
- The Monitoring hub in Fabric shows pipeline run history
- Configure alerts through the Fabric admin settings
What to Monitor
At minimum, set up alerts for:
- Pipeline failures
- Pipelines that run longer than expected (indicates performance issues or data volume spikes)
- Pipelines that don't run when expected (missed triggers)
For more detail on monitoring, see our guide on Data Factory monitoring and alerting best practices.
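The "ran longer than expected" alert is the one teams most often skip because the built-in failure alert doesn't cover it. A simple threshold check against a known baseline is enough to start with - this sketch assumes you can pull run durations from the Monitor API or run history export:

```python
def flag_slow_runs(runs, baseline_minutes, factor=2.0):
    """Return the IDs of runs whose duration exceeded `factor` times the
    normal baseline - the 'ran longer than expected' alert condition.
    `runs` is a list of (run_id, duration_minutes) tuples."""
    threshold = baseline_minutes * factor
    return [run_id for run_id, minutes in runs if minutes > threshold]

# Example: a pipeline that normally finishes in about 15 minutes
runs = [("run-001", 12), ("run-002", 55), ("run-003", 14)]
print(flag_slow_runs(runs, baseline_minutes=15))  # ['run-002']
```

A run taking twice its usual time usually means a data volume spike or a degraded source - both worth knowing about before they become failures.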
Step 7 - Test Thoroughly
Before scheduling your pipeline, test it properly:
- Run it manually with a small dataset first. Click Debug in the pipeline designer.
- Check the output. Verify the data landed correctly in the destination. Check row counts, data types, and a sample of values.
- Test failure scenarios. What happens if the source is unavailable? If the data format changes? If the destination is full?
- Run with production-volume data. Performance with 100 rows tells you nothing about performance with 10 million rows.
- Test the schedule. Publish and let the trigger fire at least twice before considering it done.
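The "check the output" step is easy to automate rather than eyeball. A minimal post-run validation - row counts match, sampled values have the expected types - catches most copy problems. This is an illustrative sketch; in practice you'd feed it counts from source and sink queries:

```python
def validate_copy(source_rows, sink_rows, sample, expected_types):
    """Minimal post-copy checks: row counts agree, and a sample of sink
    rows has the expected column types. `expected_types` maps column
    name -> Python type. Returns a list of issues (empty means pass)."""
    issues = []
    if source_rows != sink_rows:
        issues.append(f"row count mismatch: {source_rows} vs {sink_rows}")
    for row in sample:
        for column, expected in expected_types.items():
            if not isinstance(row.get(column), expected):
                issues.append(f"{column}: expected {expected.__name__}")
    return issues

sample = [{"CustomerId": 42, "OrderTotal": 199.95}]
print(validate_copy(1000, 1000, sample, {"CustomerId": int, "OrderTotal": float}))  # []
```

Run the same checks after every scheduled execution, not just during initial testing - the source system will eventually send you data that breaks an assumption.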
Common Mistakes to Avoid
We see these mistakes repeatedly across Data Factory projects:
1. Not using parameterisation. Hardcoding file paths, table names, and connection details makes pipelines rigid and forces duplication. Use parameters from the start.
2. Ignoring data type mismatches. A column that looks like an integer in the source might contain text values on edge cases. Explicit data type mapping prevents mysterious failures at 2 AM.
3. No incremental loading strategy. Loading the full dataset every time works for small tables but becomes untenable at scale. Implement watermark-based incremental loading early.
4. Skipping CI/CD. Manual publishing from the ADF Studio is fine for learning, but production pipelines need proper source control and deployment pipelines. Set up Git integration from day one.
5. Over-engineering the first pipeline. Your first pipeline should be simple. Get data moving, prove the pattern, then add complexity iteratively.
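The watermark idea in mistake 3 is straightforward once sketched: remember the highest modified timestamp you've loaded, pull only rows newer than it, and advance the watermark after a successful load. This Python sketch uses ISO 8601 timestamp strings (which sort correctly as text); in a real pipeline the watermark lives in a control table and the filter runs as a source query:

```python
def rows_to_load(rows, last_watermark):
    """Select only rows changed since the stored watermark, and return the
    new watermark to persist after the load succeeds. Each row is a dict
    with a 'modified' ISO 8601 timestamp string."""
    fresh = [r for r in rows if r["modified"] > last_watermark]
    # Advance the watermark only as far as data we actually loaded
    new_watermark = max((r["modified"] for r in fresh), default=last_watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "modified": "2026-04-19T08:00:00"},  # already loaded last run
    {"id": 2, "modified": "2026-04-20T09:30:00"},  # new since the watermark
]
fresh, wm = rows_to_load(rows, "2026-04-19T23:59:59")
print(len(fresh), wm)  # 1 2026-04-20T09:30:00
```

Crucially, only persist the new watermark after the load succeeds - if you advance it first and the copy fails, those rows are silently skipped forever.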
What Comes After Your First Pipeline
Once your first pipeline is running successfully, you'll naturally want to:
- Add more pipelines for additional data sources
- Implement incremental loading using watermarks or change data capture
- Build transformation logic using data flows or notebooks
- Set up proper CI/CD with Git integration and deployment pipelines
- Create a monitoring dashboard for operational visibility
This is where having experienced Data Factory consultants pays off. The difference between a well-architected data platform and a spaghetti mess of pipelines is usually visible within the first 3-6 months.
Getting Help
If you're just getting started with Data Factory and want guidance from people who've built dozens of implementations, Team 400 can help. We're Microsoft Data Factory consultants based in Australia, and we work across both Azure Data Factory and Fabric Data Factory.
Whether you need help with your first pipeline or you're planning a full data platform build, get in touch. We also offer Power BI consulting if you need reporting on top of your data pipelines, and broader AI and data services for organisations looking to do more with their data.