Microsoft Fabric Data Integration Strategy - Picking the Right Tool for the Job
Microsoft Fabric has a lot of ways to move and transform data. That's both its strength and the thing that trips people up.
You've got Mirroring, Copy Jobs, Copy Activity in Pipelines, Dataflow Gen2, Notebooks, Apache Airflow Jobs, and Eventstreams. Each one does something slightly different. Pick the wrong one and you'll spend weeks building something that would have taken days with the right approach. Pick the right one and the work feels almost effortless.
We've been helping Australian enterprises figure this out since Fabric went generally available, and the pattern is always the same - teams waste the most time at the decision point, not the implementation. Microsoft's data integration decision guide lays out the options clearly. Here's how we think about the choices in practice.
The Three Questions That Matter
Before you look at any specific tool, answer these:
What are you actually trying to do? Are you moving data from point A to point B? Transforming it along the way? Orchestrating a multi-step workflow? Streaming events in real time? Each of these points you toward a different tool.
What's the skill level of the person maintaining this? A database administrator, a data engineer, and a data scientist will all prefer different approaches. Building something technically elegant that nobody on your team can maintain is worse than building something simple that everyone understands.
What's the data workload type? Batch, incremental, near real-time, or full streaming? This narrows the field significantly.
Get those three answers clear and the decision usually makes itself.
Data Movement - Your Four Main Options
Mirroring
If you need near real-time replication from a source database into Fabric with minimal setup, Mirroring is the answer. It's a no-code, turnkey experience. You point it at your SQL Server (or one of the other six-plus supported sources), and your data appears in OneLake as read-only Delta tables.
Where Mirroring shines is the simplicity. No ETL logic, no scheduling, no pipeline maintenance. The data just flows. For analytics teams that need fresh data without putting load on production databases, it's exactly right.
The limitations are real, though. You can't transform data during the mirroring process. The destination is always a mirrored database - you don't choose where the data lands. And the connector list, while growing, is still smaller than what you get with Copy Job or Copy Activity. If you need data from fifty different sources, Mirroring won't cover all of them.
For organisations running SQL Server workloads and wanting analytics access, Mirroring is the fastest path to value. We've had clients go from zero to fresh analytics data in Fabric in under a day.
Copy Job
This is the newer option and it's quickly becoming our default recommendation for data ingestion. Copy Job gives you a wizard-driven experience for copying data from 50+ sources, with built-in support for both watermark-based incremental loads and native change data capture (CDC).
The multi-table selection is what sets it apart from Copy Activity. Instead of configuring one table at a time, you pick a batch, configure the copy pattern once, and let it run. For migrations or initial data loads, this saves hours of repetitive work.
Copy Job is genuinely no-code. A business analyst can set one up. The CDC support means it automatically detects CDC-enabled tables and handles the incremental logic for you. No custom expressions, no control tables, no maintenance scripts.
Where it falls short - transformation support is minimal. If you need to reshape data during the copy, you're looking at Copy Activity or Dataflow Gen2 instead.
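To make the watermark idea concrete: an incremental load remembers the highest modification timestamp it has seen and only copies rows newer than that. Copy Job manages this for you, but the pattern it automates, sketched in plain Python with hypothetical in-memory data standing in for a source table and control table, looks roughly like this:

```python
# Sketch of the watermark pattern that Copy Job automates.
# The rows and watermark values here are hypothetical stand-ins for a
# real source table and a control table in a database.

def incremental_load(source_rows, destination, watermark):
    """Copy only rows modified after the stored watermark, then advance it."""
    new_rows = [r for r in source_rows if r["modified_at"] > watermark]
    destination.extend(new_rows)
    # The new watermark is the latest timestamp among the rows just copied.
    return max((r["modified_at"] for r in new_rows), default=watermark)

source = [
    {"id": 1, "modified_at": "2024-01-01T10:00"},
    {"id": 2, "modified_at": "2024-01-02T09:30"},
    {"id": 3, "modified_at": "2024-01-03T14:15"},
]
dest = []

# First run: everything is newer than the initial watermark, so all rows copy.
wm = incremental_load(source, dest, watermark="2024-01-01T00:00")
# Second run: nothing has changed at the source, so nothing is copied.
wm = incremental_load(source, dest, wm)
```

The value of Copy Job is that you never write or maintain this logic - no control tables, no edge cases around the watermark advancing - but seeing it spelled out clarifies what "watermark-based incremental load" actually means.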
Copy Activity in Pipelines
This is the workhorse you know from Azure Data Factory. Copy Activity handles batch, bulk, and incremental data movement with the same 50+ connector catalogue. It's low-code, supports the drag-and-drop pipeline canvas, and scales from gigabytes to petabytes.
The difference from Copy Job is flexibility. Copy Activity sits inside a pipeline, so you can chain it with other activities - lookups, stored procedures, conditional logic, loops. For medallion architecture implementations where data moves through bronze, silver, and gold layers with transformations at each step, this is the right tool.
The trade-off is more configuration. You're building pipelines, mapping connections, and managing more moving parts. For straightforward data copies, Copy Job does the same thing with less effort. Use Copy Activity when you need orchestration around the copy.
Eventstreams
If your data is continuous - IoT telemetry, transaction feeds, application events - Eventstreams handles real-time ingestion and lightweight stream processing. It connects to 25+ streaming sources and supports SQL-based stream analytics for in-flight transformations.
This is a different category entirely from the batch tools above. You don't schedule Eventstreams; they run continuously. Typical use cases include monitoring customer support metrics in real time, processing IoT sensor data, and building event-driven workflows that react to changes as they happen.
We've seen Eventstreams work well in manufacturing and logistics where real-time visibility is the whole point. For traditional data warehouse patterns with daily or hourly loads, you don't need this.
Orchestration - Pipelines vs Apache Airflow
If you need to coordinate multiple activities in sequence - copy data, then transform it, then run a stored procedure, then notify someone - you need orchestration.
Pipelines are the low-code option. Visual canvas, drag-and-drop activities, built-in scheduling, dependency management. If your team is comfortable with Azure Data Factory, Fabric Pipelines will feel familiar. It's the same paradigm with improvements (no publish step, better monitoring, Copilot assistance for expressions).
Apache Airflow Jobs are for Python-first teams. If your data engineers already write DAGs and have existing Airflow workflows, this managed service lets them keep working in their preferred tool without managing infrastructure. The connector ecosystem through Airflow is massive - 100+ connectors.
The honest advice? If you don't already have Airflow expertise on your team, don't introduce it just because it sounds more sophisticated. Pipelines handle 90% of orchestration needs with less complexity. Use Airflow when you genuinely need Python-based orchestration logic that goes beyond what the visual pipeline canvas can express.
Transformation - Three Different Worlds
Notebooks
For complex transformations - joins across large datasets, statistical computations, machine learning feature engineering, custom algorithms - Spark Notebooks are the most powerful option. You get the full distributed computing stack with support for Python, Scala, SQL, and R.
This is squarely a data engineer and data scientist tool. If your team writes code and needs the flexibility to do anything Spark can do, Notebooks are the right choice. The interactive development environment is good for exploration, and the same code runs in production.
Dataflow Gen2
This is Power Query in Fabric. If your team already uses Power Query in Excel or Power BI, they'll feel at home. 170+ connectors, 400+ transformation functions, all through a visual interface. No code required.
Dataflow Gen2 is our go-to recommendation for business analysts and data integrators who need to clean, reshape, and standardise data. The data profiling tools are genuinely useful for understanding data quality before you build pipelines around it.
The limitation is performance at very large scale. For terabyte-scale transformations, Notebooks with Spark will be faster. For everything else, the ease of use wins.
Eventstreams (Again)
Eventstreams shows up in the transformation column too because it supports SQL-based stream analytics. If you need lightweight transformations on streaming data - filtering, aggregation, windowing functions - you can do that inline without a separate transformation step.
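To illustrate what a windowing function does: events are grouped into fixed time buckets and aggregated per bucket. Eventstreams expresses this in SQL over the stream; the same tumbling-window idea, sketched in plain Python over hypothetical sensor readings, looks like this:

```python
# Plain-Python sketch of a tumbling-window aggregation - the kind of
# lightweight in-flight transformation Eventstreams expresses in SQL.
# The events and the 60-second window size are hypothetical.
from collections import defaultdict

def tumbling_window_avg(events, window_seconds=60):
    """Group (timestamp, value) events into fixed windows; average each window."""
    buckets = defaultdict(list)
    for ts, value in events:
        # Every event falls into exactly one non-overlapping window.
        window_start = (ts // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Timestamps in seconds, values as sensor readings.
sensor_events = [(5, 20.0), (42, 22.0), (61, 30.0), (119, 34.0), (130, 28.0)]
averages = tumbling_window_avg(sensor_events)
```

If your streaming needs go beyond filtering and windowed aggregates like this, that's the point to hand the data off to a Notebook or downstream pipeline rather than stretch the inline transformations.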
Making the Decision in Practice
Here's how we typically walk clients through the choice.
Start with the destination. If your data needs to end up in a Fabric Lakehouse or Warehouse, any of these tools will work. If you're also sending data to non-Fabric destinations, check the connector list carefully.
Match the tool to the maintainer. A pipeline built by a consultant that your business analyst team can't modify isn't a good pipeline. Pick the tool that your team can own long-term.
Don't over-engineer the first iteration. We see teams reach for Notebooks and Spark when a Copy Job and a Dataflow Gen2 would do the job with a fraction of the complexity. Start simple. Add sophistication when the simple approach hits a wall.
Run parallel approaches during migration. If you're moving from Azure Data Factory to Fabric, you don't have to commit to one tool immediately. Build new work in Fabric while existing ADF pipelines keep running. This is explicitly supported and we've covered the migration process in detail.
Real Scenario Mapping
"We need our SQL Server data in Fabric for Power BI reports" - Mirroring. Done in a day. Near real-time freshness, zero maintenance.
"We need to consolidate data from fifteen different regional databases" - Copy Job with CDC for the initial migration and ongoing incremental loads.
"We need a medallion architecture with bronze, silver, and gold layers" - Pipelines with Copy Activity for movement, Dataflow Gen2 or Notebooks for transformation at each layer.
"We need real-time dashboards for our operations centre" - Eventstreams for ingestion, with Data Activator for alerts.
"We have complex Python-based data processing that needs to run on schedule" - Apache Airflow Jobs if you have Airflow expertise, Notebooks in Pipelines otherwise.
"Our business analysts need to clean and standardise supplier data" - Dataflow Gen2. No question.
Getting Help With the Decision
The choice matters because changing your data integration approach mid-project is expensive. Getting it right at the start saves real time and money.
If you're working through a Fabric implementation, our Microsoft Fabric consultants can help you map your specific workloads to the right tools. We also work with teams on the broader data factory migration from ADF to Fabric, and our business intelligence practice can help you think about how the data integration layer fits into your analytics strategy.
The full Microsoft decision guide includes detailed comparison tables and additional scenarios. It's worth bookmarking for when specific technical questions come up during implementation.