Microsoft Fabric Data Factory - Getting Started with Copy Jobs for Data Ingestion

April 17, 2026 · 7 min read · Michael Ridland

If you're moving data into Microsoft Fabric for the first time, the Copy job is where you'll start. It's the simplest way to get raw data from a source into your Lakehouse, and honestly, it works well enough that you might wonder why people make data ingestion sound so complicated.

That said, there are things the official tutorial doesn't tell you that matter once you're past the "hello world" stage. I'll walk through the basics and then share what we've learned running these in production for Australian enterprises.

What is a Copy Job?

A Copy job in Fabric Data Factory is exactly what it sounds like - it copies data from point A to point B. The source could be a database, a file store, an API, or one of Microsoft's sample datasets; the destination is typically a table in your Lakehouse.

The Copy job takes what used to be the Copy Activity inside an Azure Data Factory pipeline and makes it a standalone item, so you can run it without wrapping it in a pipeline. For simple ingestion scenarios, this is a welcome simplification.

If you've used Azure Data Factory before, the Copy job will feel familiar. The configuration wizard walks you through source, destination, mapping, and execution. The main difference is that everything lives inside the Fabric workspace rather than in a separate Azure resource.

Setting Up Your First Copy Job

Here's the practical walkthrough. I'll use the NYC Taxi sample data that Microsoft provides, because it's large enough to be realistic without requiring you to set up external connections.

Prerequisites

You need a Fabric tenant with an active subscription and a workspace. If you're evaluating, Microsoft offers a free trial. You'll also need access to Power BI, since that's still the entry point for the Fabric experience.

Creating the Copy Job

From your Fabric workspace, select + New item and search for Copy job. Give it a meaningful name - something like "Ingest-NYC-Taxi-Bronze" rather than "Copy job 1". You'll thank yourself later when you have thirty of these.

Configuring the Source

On the Choose data source page, select Sample data from the top options, then pick NYC Taxi - Green. The preview shows you what you're getting before you commit. This is a good habit even with sample data - always preview before you copy.

For production use, you'd connect to your actual data source here. Fabric supports a wide range of connectors - databases, cloud storage, SaaS applications. The connector library is solid, though some connections require an on-premises data gateway if your source isn't cloud-accessible.

Configuring the Destination

Select Lakehouse as your destination. You can create a new Lakehouse inline or connect to an existing one. For a clean start, create a new one and give it a clear name.

Choose Full copy for the copy mode. This replaces all data in the destination each time - fine for initial loads and reference data, but you'll want incremental for anything that runs on a schedule with large volumes.

When mapping to the destination, select Tables (not Files), choose Append as the update method, and rename the destination table to something meaningful. The tutorial suggests "Bronze", following the medallion architecture pattern where raw data lands in a bronze layer before being refined.

Running It

Hit Save, then Run. The copy executes and you can monitor progress in the Results pane. Microsoft warns this can take over 30 minutes for the sample dataset, which is fair - it's a decent chunk of data. In practice, your timing will depend on data volume and source connectivity.

What the Tutorial Doesn't Tell You

Medallion Architecture Matters

The tutorial names the destination table "Bronze" and doesn't explain why. The medallion architecture - bronze, silver, gold - is a data engineering pattern where:

  • Bronze holds raw data exactly as it arrived from the source
  • Silver is cleaned, deduplicated, and conformed
  • Gold is business-ready aggregations and models

This isn't just academic naming. It affects how you structure your Lakehouse, how you handle data quality, and how you build dependencies between jobs. Get the bronze layer right first, then build transformations on top.
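To make the layering concrete, here's a minimal sketch of what a bronze-to-silver step does, with no Spark required. The record shapes and field names are invented for illustration: bronze keeps every row exactly as it arrived, duplicates included, while silver keeps one row per business key, preferring the latest version.

```python
# Bronze: raw rows exactly as they landed, duplicates and all.
bronze = [
    {"order_id": 1, "amount": 100, "modified": "2026-04-15"},
    {"order_id": 1, "amount": 120, "modified": "2026-04-16"},  # later correction
    {"order_id": 2, "amount": 250, "modified": "2026-04-15"},
]

# Silver: one row per business key, keeping the most recent version.
latest: dict[int, dict] = {}
for row in sorted(bronze, key=lambda r: r["modified"]):
    latest[row["order_id"]] = row
silver = list(latest.values())

print(silver)
```

In a real Fabric workspace this step would typically run as a notebook or Dataflow against the bronze table, but the shape of the logic - dedupe on a key, keep the latest - is the same.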

Full Copy vs Incremental

The tutorial uses full copy, which is fine for getting started. In production, you'll almost certainly need incremental loads for your transactional data. Full copies of large tables are slow and expensive.

Fabric supports incremental patterns, but you need to think about your watermark column - typically a modified date or an auto-incrementing ID. Plan this early because retrofitting incremental loads onto a system designed for full copies is painful.
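The watermark pattern itself is simple to sketch. Assuming a modified-date column called `ModifiedAt` (the column name and query shape here are illustrative, not Fabric's internal implementation), each incremental run selects only rows changed since the last successful load, then records a new high-water mark:

```python
from datetime import datetime, timezone

def build_incremental_query(table: str, watermark_column: str,
                            last_watermark: datetime) -> str:
    """Build the source query for one incremental run.

    Only rows modified after the previous run's high-water mark are
    selected; the new mark is recorded after a successful load.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > '{last_watermark.isoformat()}'"
    )

# Example: the last successful load finished at 02:00 UTC.
last_mark = datetime(2026, 4, 16, 2, 0, tzinfo=timezone.utc)
query = build_incremental_query("SalesOrders", "ModifiedAt", last_mark)
print(query)
```

The hard part isn't the query - it's guaranteeing the watermark column is reliable (monotonic, always populated) and persisting the mark only after the load succeeds.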

Naming Conventions Save Time

I've seen Fabric workspaces where every Copy job is named "Copy job", "Copy job (1)", "Copy job (2)". Six months later, nobody knows what any of them do. Establish naming conventions from day one.

We use a pattern like Source-Entity-Layer - so ERP-SalesOrders-Bronze or CRM-Contacts-Bronze. It's immediately clear what each job does and where the data lands.
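A convention is only useful if it's checked. Something as small as this - the pattern is our convention, not anything Fabric mandates - can run in a review script against exported item names:

```python
import re

# Source-Entity-Layer, e.g. ERP-SalesOrders-Bronze. The layer names
# follow the medallion pattern; adjust the allowed set to your standard.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+-[A-Za-z0-9]+-(Bronze|Silver|Gold)$")

def is_valid_job_name(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))

print(is_valid_job_name("ERP-SalesOrders-Bronze"))  # True
print(is_valid_job_name("Copy job (1)"))            # False
```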

Monitoring and Alerting

The Results pane gives you real-time monitoring during execution, but you'll want proper alerting for scheduled jobs. Fabric integrates with Azure Monitor, and you can set up alerts for failed runs. Don't wait until someone notices stale data to discover a job has been failing for a week.

Where Copy Jobs Fit in the Bigger Picture

Copy jobs are the first stage of a larger pipeline. After ingesting raw data, you'll typically:

  1. Transform with Dataflows - clean and reshape data in the silver layer using Fabric's visual data transformation tools
  2. Model for analytics - build a semantic model in the gold layer for Power BI reporting
  3. Orchestrate with pipelines - chain Copy jobs, Dataflows, and notebooks into end-to-end workflows

If you're migrating from Azure Data Factory, a lot of these concepts will be familiar. The execution engine is different but the patterns carry over. We've written separately about upgrading existing Azure Data Factory pipelines to Fabric, which covers the migration path.

Practical Tips from Production Deployments

Test with sample data first. Even if you know your source system well, use Microsoft's sample data to validate your Lakehouse setup and mapping before pointing at production databases. It removes one variable from troubleshooting.

Watch your capacity units. Fabric consumption is capacity-based. Large copy jobs consume CU, and if you're on a shared capacity, your copy jobs compete with everyone else's queries and reports. Schedule heavy ingestion outside business hours when possible.

Document your mappings. The column mapping screen in the Copy job wizard is easy to configure but hard to review later. Keep a separate record of source-to-destination mappings, especially for tables with many columns or where you're renaming fields.
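One lightweight approach is to keep the mapping as version-controlled data next to your other artifacts. The structure below is our own convention for documentation purposes, not a Fabric export format:

```python
import json

# Source-to-destination column mapping for one Copy job, kept in git
# so renames are reviewable in pull requests.
mapping = {
    "job": "ERP-SalesOrders-Bronze",
    "source_table": "dbo.SalesOrders",
    "destination_table": "Bronze_SalesOrders",
    "columns": {
        "OrderID": "order_id",
        "OrderDate": "order_date",
        "TotalAmt": "total_amount",  # renamed for clarity at the destination
    },
}

print(json.dumps(mapping, indent=2))
```

When a column mapping question comes up six months later, a diffable file beats clicking back through the wizard.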

Start simple, add complexity later. It's tempting to build the entire medallion architecture on day one. Don't. Get one Copy job running reliably, then add transformations, then add orchestration. Each layer should be stable before you build on top of it.

Getting Help with Fabric

Microsoft Fabric brings together a lot of capabilities that used to live in separate products - Data Factory, Power BI, Synapse, and more. That consolidation is genuinely useful, but it also means there's a lot to learn.

We work with organisations across Australia as Microsoft Fabric consultants, helping teams get their data platform right from the start. Whether you're building your first Lakehouse or migrating from an existing Azure data stack, the patterns matter more than the specific clicks in the wizard.

For organisations looking to connect their Fabric data through to reporting, our Power BI consulting team works closely with the data engineering side to make sure the whole pipeline - from ingestion through to dashboards - actually works as a system.

The official tutorial walks through the mechanics well. What I've tried to add here is the context that makes those mechanics useful in practice.