How to Build Your First Data Pipeline with Data Factory
Building your first data pipeline in Data Factory should take hours, not weeks. Yet plenty of teams lose weeks to what should be a simple first pipeline because they got lost in configuration options or followed outdated tutorials.
This guide walks you through building a real, production-ready data pipeline from scratch. We'll cover both Azure Data Factory (ADF) and Fabric Data Factory, noting where they differ. By the end, you'll have a working pipeline that moves data from a source system, applies basic transformations, and lands it in a destination - with proper error handling and monitoring.
Before You Start - What You Need
For Azure Data Factory
- An Azure subscription (free trial works for learning)
- An Azure resource group
- An Azure Data Factory instance (create one in the Azure portal - takes about 2 minutes)
- At least one data source to read from (Azure Blob Storage is easiest to start with)
- A destination to write to (Azure SQL Database or Azure Data Lake Storage Gen2)
For Fabric Data Factory
- A Microsoft Fabric capacity (even a trial capacity works)
- A Fabric workspace
- Access to at least one data source
- A Fabric Lakehouse or Warehouse as your destination
The interfaces are similar enough that most of this guide applies to both. We'll call out differences where they matter.
Step 1 - Plan Your Pipeline Before You Build
This sounds obvious, but we've seen teams jump straight into the visual designer without thinking through what they're building. Take 15 minutes to answer these questions:
- What data are you moving? Be specific. Which table, which file, which API endpoint?
- Where is the source? Cloud service, on-premises database, SaaS application?
- Where is the destination? Data lake, SQL database, warehouse?
- What transformations do you need? Column renaming, filtering, data type conversion, joining with other data?
- How often should it run? Real-time, hourly, daily, weekly?
- What happens when it fails? Who gets notified? Does it retry?
For your first pipeline, keep it simple. Move one dataset from one source to one destination with minimal transformation. You can add complexity later.
Step 2 - Set Up Your Connections
In ADF, connections are called Linked Services. In Fabric, they're called Connections. Same concept, slightly different names.
Creating a Source Connection
Azure Data Factory:
- Open the ADF Studio (from the Azure portal, click "Launch Studio")
- Go to the Manage tab
- Click Linked services then New
- Search for your source type (e.g., "Azure Blob Storage")
- Configure the connection - name, authentication method, connection string or account URL
- Click Test connection to verify it works
- Click Create
Fabric Data Factory:
- Open your Fabric workspace
- Go to Settings then Manage connections and gateways
- Click New connection
- Select your source type
- Configure and test the connection
Creating a Destination Connection
Repeat the same process for your destination. If you're writing to Azure SQL Database, you'll need:
- Server name (e.g., yourserver.database.windows.net)
- Database name
- Authentication (SQL auth or Microsoft Entra ID, formerly Azure AD)
Common mistake: Using SQL authentication with credentials stored in plain text. Use Azure Key Vault to store connection strings and secrets from day one. It takes 10 minutes to set up and saves you a security headache later.
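To make the Key Vault pattern concrete, here's a sketch of what an ADF linked service definition looks like when the connection string is pulled from Key Vault rather than stored inline. The names ("MyKeyVault", "sql-connection-string") are placeholders for your own Key Vault linked service and secret - adjust them to match your setup.

```python
import json

# Illustrative ADF linked service definition referencing an Azure Key Vault
# secret instead of a plain-text connection string. All names are examples.
linked_service = {
    "name": "AzureSqlDbSource",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    # A Key Vault linked service you create first
                    "referenceName": "MyKeyVault",
                    "type": "LinkedServiceReference",
                },
                # The secret that holds the actual connection string
                "secretName": "sql-connection-string",
            }
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

The key point: the pipeline definition never contains the credential itself, only a reference. Rotating the secret in Key Vault requires no change to the pipeline.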
Step 3 - Create Your Datasets (ADF) or Set Up Source/Destination (Fabric)
In Azure Data Factory
Datasets define the structure of your data. You need one for the source and one for the destination.
- Go to the Author tab
- Click the + next to Datasets and select New dataset
- Choose your source type and linked service
- Configure the dataset - file path, table name, schema
- Repeat for the destination
Tip: Use parameterised datasets from the start. Instead of hardcoding a file path like /data/sales/2026-04-20.csv, use a parameter like /data/sales/{date}.csv. This makes your pipeline reusable.
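In ADF itself you'd pass the date in as a dataset parameter; the resolution logic is simple enough to sketch in plain Python. This is an illustration of the pattern, not ADF expression syntax:

```python
from datetime import date

def sales_blob_path(run_date: date, root: str = "/data/sales") -> str:
    """Build the dated file path a parameterised dataset would resolve to,
    rather than hardcoding a path like /data/sales/2026-04-20.csv."""
    return f"{root}/{run_date.isoformat()}.csv"

print(sales_blob_path(date(2026, 4, 20)))  # /data/sales/2026-04-20.csv
```

The same pipeline can now load any day's file - including re-running a failed day - just by changing the parameter value at trigger time.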
In Fabric Data Factory
Fabric pipelines don't require separate dataset objects. You configure source and destination directly within the pipeline activities. This is simpler for getting started, though it can lead to duplication if you're building many pipelines against the same sources.
Step 4 - Build the Pipeline
Now for the actual pipeline.
Create a Copy Data Activity
The Copy Data activity is the workhorse of Data Factory. It moves data from source to destination with optional column mapping and basic transformations.
- Go to the Author tab (ADF) or create a new Data Pipeline (Fabric)
- Drag a Copy Data activity onto the canvas
- In the Source tab, select your source dataset/connection
- In the Sink tab (that's Microsoft's term for "destination"), select your destination dataset/connection
- In the Mapping tab, configure column mappings
Column mapping options:
- Auto-map works when source and destination have matching column names
- Manual mapping lets you rename columns, change data types, or exclude columns
- Import schema reads the source schema and lets you adjust mappings visually
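Behind the Mapping tab, the Copy Data activity stores an explicit "translator" in its JSON definition. Here's a sketch of that structure as a Python dict - the column names are invented examples, and your real mapping may carry extra properties (data types, ordinals) that the designer adds for you:

```python
# Illustrative Copy Data column mapping ("translator"). Columns not listed
# in "mappings" are excluded from the copy; sink names can differ from
# source names, which is how renaming works.
mapping = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "cust_id"}, "sink": {"name": "CustomerId"}},
        {"source": {"name": "order_ts"}, "sink": {"name": "OrderTimestamp"}},
    ],
}

# Summarise the renames the mapping performs
renames = {m["source"]["name"]: m["sink"]["name"] for m in mapping["mappings"]}
print(renames)  # {'cust_id': 'CustomerId', 'order_ts': 'OrderTimestamp'}
```

Seeing the JSON shape helps when you move to parameterised or dynamically generated mappings later - it's the same structure, just built at runtime.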
Add Data Transformation
For your first pipeline, keep transformations simple. The Copy Data activity supports:
- Column renaming via mapping
- Data type conversion
- Filtering (using a query on the source side)
- Adding static columns
For more complex transformations (joins, aggregations, conditional logic), you'll need either:
- Mapping data flows (ADF) - visual, code-free transformation tool
- Dataflows Gen2 (Fabric) - Power Query-based transformations
- Notebooks (Fabric) - Python/Spark code for full flexibility
We recommend starting with Copy Data and only adding transformation complexity when you need it.
Add Error Handling
Every production pipeline needs error handling. At minimum:
- Set retry policy on the Copy Data activity (Settings tab). We recommend 3 retries with 30-second intervals.
- Add a failure path. Drag from the red "X" output of Copy Data to a Web activity or Set Variable activity that logs the failure.
- Configure alerts (covered in Step 6).
Here's a simple pattern we use on most projects:
Copy Data --> [Success] --> Log Success (Stored Procedure or Web Activity)
--> [Failure] --> Log Failure --> Send Alert (Logic App or Email)
Step 5 - Schedule Your Pipeline
Triggers in Azure Data Factory
ADF supports three main trigger types:
- Schedule trigger: Runs at a set time (e.g., daily at 2 AM AEST). Most common for batch pipelines.
- Tumbling window trigger: Runs for a specific time window. Useful for processing data in defined periods.
- Event trigger: Runs when a file arrives in Blob Storage or Data Lake. Great for event-driven architectures.
To create a schedule trigger:
- Click Add trigger then New/Edit
- Choose Schedule
- Set your recurrence (daily at 2:00 AM is a common starting point for Australian businesses - well outside business hours in AEST)
- Set the start date and time zone (use AUS Eastern Standard Time or E. Australia Standard Time)
Important: Always set the time zone explicitly. The default is UTC, which is 10 hours behind AEST (11 behind AEDT during daylight saving). We've seen pipelines that were supposed to run at 2 AM run at noon because someone forgot to set the time zone.
Triggers in Fabric Data Factory
Fabric pipelines support schedule triggers configured directly on the pipeline. The process is simpler:
- Open your pipeline
- Click Schedule in the toolbar
- Set recurrence and time zone
Step 6 - Set Up Monitoring and Alerts
A pipeline that fails silently is worse than no pipeline at all. Set up monitoring before you go live.
Basic Monitoring
Azure Data Factory:
- The Monitor tab in ADF Studio shows all pipeline runs with status, duration, and error details
- Set up Azure Monitor alerts for pipeline failures (go to the ADF resource in Azure portal then Alerts then New alert rule)
Fabric Data Factory:
- The Monitoring hub in Fabric shows pipeline run history
- Configure alerts through the Fabric admin settings
What to Monitor
At minimum, set up alerts for:
- Pipeline failures
- Pipelines that run longer than expected (indicates performance issues or data volume spikes)
- Pipelines that don't run when expected (missed triggers)
For more detail on monitoring, see our guide on Data Factory monitoring and alerting best practices.
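The "ran longer than expected" alert is the one teams most often skip because the built-in failure alert doesn't cover it. A simple threshold check against a known baseline is enough to start with - this sketch assumes you can pull run durations from the Monitor API or run history export:

```python
def flag_slow_runs(runs, baseline_minutes, factor=2.0):
    """Return the IDs of runs whose duration exceeded `factor` times the
    normal baseline - the 'ran longer than expected' alert condition.
    `runs` is a list of (run_id, duration_minutes) tuples."""
    threshold = baseline_minutes * factor
    return [run_id for run_id, minutes in runs if minutes > threshold]

# Example: a pipeline that normally finishes in about 15 minutes
runs = [("run-001", 12), ("run-002", 55), ("run-003", 14)]
print(flag_slow_runs(runs, baseline_minutes=15))  # ['run-002']
```

A run taking twice its usual time usually means a data volume spike or a degraded source - both worth knowing about before they become failures.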
Step 7 - Test Thoroughly
Before scheduling your pipeline, test it properly:
- Run it manually with a small dataset first. Click Debug in the pipeline designer.
- Check the output. Verify the data landed correctly in the destination. Check row counts, data types, and a sample of values.
- Test failure scenarios. What happens if the source is unavailable? If the data format changes? If the destination is full?
- Run with production-volume data. Performance with 100 rows tells you nothing about performance with 10 million rows.
- Test the schedule. Publish and let the trigger fire at least twice before considering it done.
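The "check the output" step is easy to automate rather than eyeball. A minimal post-run validation - row counts match, sampled values have the expected types - catches most copy problems. This is an illustrative sketch; in practice you'd feed it counts from source and sink queries:

```python
def validate_copy(source_rows, sink_rows, sample, expected_types):
    """Minimal post-copy checks: row counts agree, and a sample of sink
    rows has the expected column types. `expected_types` maps column
    name -> Python type. Returns a list of issues (empty means pass)."""
    issues = []
    if source_rows != sink_rows:
        issues.append(f"row count mismatch: {source_rows} vs {sink_rows}")
    for row in sample:
        for column, expected in expected_types.items():
            if not isinstance(row.get(column), expected):
                issues.append(f"{column}: expected {expected.__name__}")
    return issues

sample = [{"CustomerId": 42, "OrderTotal": 199.95}]
print(validate_copy(1000, 1000, sample, {"CustomerId": int, "OrderTotal": float}))  # []
```

Run the same checks after every scheduled execution, not just during initial testing - the source system will eventually send you data that breaks an assumption.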
Common Mistakes to Avoid
We see these mistakes repeatedly across Data Factory projects:
1. Not using parameterisation. Hardcoding file paths, table names, and connection details makes pipelines rigid and forces duplication. Use parameters from the start.
2. Ignoring data type mismatches. A column that looks like an integer in the source might contain text values on edge cases. Explicit data type mapping prevents mysterious failures at 2 AM.
3. No incremental loading strategy. Loading the full dataset every time works for small tables but becomes untenable at scale. Implement watermark-based incremental loading early.
4. Skipping CI/CD. Manual publishing from the ADF Studio is fine for learning, but production pipelines need proper source control and deployment pipelines. Set up Git integration from day one.
5. Over-engineering the first pipeline. Your first pipeline should be simple. Get data moving, prove the pattern, then add complexity iteratively.
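The watermark idea in mistake 3 is straightforward once sketched: remember the highest modified timestamp you've loaded, pull only rows newer than it, and advance the watermark after a successful load. This Python sketch uses ISO 8601 timestamp strings (which sort correctly as text); in a real pipeline the watermark lives in a control table and the filter runs as a source query:

```python
def rows_to_load(rows, last_watermark):
    """Select only rows changed since the stored watermark, and return the
    new watermark to persist after the load succeeds. Each row is a dict
    with a 'modified' ISO 8601 timestamp string."""
    fresh = [r for r in rows if r["modified"] > last_watermark]
    # Advance the watermark only as far as data we actually loaded
    new_watermark = max((r["modified"] for r in fresh), default=last_watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "modified": "2026-04-19T08:00:00"},  # already loaded last run
    {"id": 2, "modified": "2026-04-20T09:30:00"},  # new since the watermark
]
fresh, wm = rows_to_load(rows, "2026-04-19T23:59:59")
print(len(fresh), wm)  # 1 2026-04-20T09:30:00
```

Crucially, only persist the new watermark after the load succeeds - if you advance it first and the copy fails, those rows are silently skipped forever.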
What Comes After Your First Pipeline
Once your first pipeline is running successfully, you'll naturally want to:
- Add more pipelines for additional data sources
- Implement incremental loading using watermarks or change data capture
- Build transformation logic using data flows or notebooks
- Set up proper CI/CD with Git integration and deployment pipelines
- Create a monitoring dashboard for operational visibility
This is where having experienced Data Factory consultants pays off. The difference between a well-architected data platform and a spaghetti mess of pipelines is usually visible within the first 3-6 months.
Getting Help
If you're just getting started with Data Factory and want guidance from people who've built dozens of implementations, Team 400 can help. We're Microsoft Data Factory consultants based in Australia, and we work across both Azure Data Factory and Fabric Data Factory.
Whether you need help with your first pipeline or you're planning a full data platform build, get in touch. We also offer Power BI consulting if you need reporting on top of your data pipelines, and broader AI and data services for organisations looking to do more with their data.