What Does It Actually Cost to Load 1 TB of Parquet into a Fabric Warehouse
One of the more useful pricing scenarios Microsoft published recently was the cost breakdown for loading 1 TB of Parquet data into a Fabric Warehouse using a pipeline Copy activity. The headline number is around 20 cents at US West 2 pay-as-you-go rates. That sounds great. The reality, when you translate it to Australian capacity choices and real-world workloads, is a bit more nuanced.
I want to walk through this scenario because it is genuinely useful as a reference point, and because the levers that affect the actual cost are not obvious from reading the documentation. This is the kind of math that turns up in capacity sizing conversations roughly every week, and getting it right saves real money.
If you are working through Fabric capacity planning more broadly, our Microsoft Fabric consultants page covers the strategy side. This post is the tactical view on one specific ingestion scenario.
The Microsoft Scenario in Plain English
The setup. You have 1 TB of Parquet data sitting in ADLS Gen2. You want to load it into a Fabric Warehouse table. You use a pipeline Copy activity with default settings. Microsoft's example shows the load taking 662 seconds (just over 11 minutes) and consuming 3,960 CU seconds.
Where do those 3,960 CU seconds come from? Microsoft's pipeline pricing model charges Copy activities based on the "intelligent throughput optimization" value, which is essentially how much parallelism the Copy activity used. Each unit of intelligent throughput optimization consumes 1.5 CU hours per hour of activity duration.
In this scenario, the optimisation value is 4. Rounding the 662 seconds down to 11 minutes, the math is:
- 4 units of optimisation × 1.5 CU hours per unit × (11 minutes / 60 minutes per hour) = 1.1 CU hours
- 1.1 CU hours × 3600 seconds per hour = 3,960 CU seconds
At Microsoft's US West 2 pay-as-you-go rate of $0.18 per CU hour, that is 1.1 × $0.18 = around 20 cents.
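The arithmetic above can be sketched as a small Python function. This is a minimal illustration, not Microsoft's billing code: the function and constant names are made up for this post, and the only inputs are the figures already quoted (1.5 CU hours per optimisation unit per hour, an optimisation value of 4, 11 minutes, $0.18 per CU hour).

```python
# CU hours consumed per unit of intelligent throughput optimisation, per hour of run time
# (the figure quoted from Microsoft's pricing model above).
CU_HOURS_PER_ITO_HOUR = 1.5


def copy_activity_cost(ito_units, duration_minutes, rate_per_cu_hour):
    """Return (CU hours, CU seconds, cost) for one Copy activity run."""
    cu_hours = ito_units * CU_HOURS_PER_ITO_HOUR * (duration_minutes / 60)
    cu_seconds = cu_hours * 3600
    cost = cu_hours * rate_per_cu_hour
    return cu_hours, cu_seconds, cost


# Microsoft's scenario: optimisation value 4, 11 minutes, US West 2 PAYG rate.
cu_hours, cu_seconds, cost = copy_activity_cost(4, 11, 0.18)
print(f"{cu_hours:.2f} CU hours, {cu_seconds:,.0f} CU seconds, ${cost:.2f}")
```

Swapping in a different rate or duration gives the equivalent figure for another region or a slower load.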
Twenty cents to load a terabyte. The cloud is genuinely cheap for some workloads.
What This Means in Australian Money
The US West 2 PAYG rate is $0.18 per CU hour. In Australia East, the comparable PAYG rate is around $0.22 per CU hour at the time of writing, but check the current Fabric pricing page because Microsoft adjusts these. So the same load in Sydney would cost around 24 cents PAYG.
If you are on a reserved capacity (F64, F128, etc.) the math changes. You are paying for the capacity whether you use it or not. The 3,960 CU seconds are consumed out of the budget of that capacity. On an F64, you have 64 × 3600 = 230,400 CU seconds per hour of capacity budget. The 3,960 CU seconds for the 1 TB load is 1.7 per cent of one hour of an F64.
This is where capacity planning gets interesting. If you are running a single 1 TB load per day, an F2 might be enough capacity (an F2 has 7,200 CU seconds per hour, so the load is a little over half of one hour's budget). If you are running fifteen of these loads concurrently, an F64 might not be enough: each load draws 4 × 1.5 = 6 CUs while it runs, so fifteen at once draw 90 CUs against the F64's 64, before any smoothing. The capacity sizing question is rarely "what does one workload cost" - it is "what is my peak concurrent demand."
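A rough way to put numbers on this is sketched below. It assumes an F-SKU's budget is simply its CU count × 3,600 CU seconds per hour, as described above, and that a running Copy activity draws optimisation value × 1.5 CUs. It ignores any smoothing or bursting Fabric applies, and the function names are illustrative only.

```python
def capacity_budget_cu_seconds(sku_cus, hours=1.0):
    """CU seconds available on an F-SKU (e.g. 64 for F64) over the given hours."""
    return sku_cus * 3600 * hours


def share_of_hour(workload_cu_seconds, sku_cus):
    """Fraction of one hour of capacity budget that a workload consumes."""
    return workload_cu_seconds / capacity_budget_cu_seconds(sku_cus)


def concurrent_draw_cus(loads, ito_units=4, cu_per_ito=1.5):
    """Instantaneous CU draw while `loads` Copy activities run at the same time."""
    return loads * ito_units * cu_per_ito


LOAD_CU_SECONDS = 3960  # the 1 TB Parquet load from Microsoft's example

for sku in (2, 64):
    print(f"F{sku}: {share_of_hour(LOAD_CU_SECONDS, sku):.1%} of one hour of budget")

print(f"15 concurrent loads draw {concurrent_draw_cus(15):.0f} CUs against an F64's 64")
```

The first calculation answers "what does one workload cost", the second "what is my peak concurrent demand". They give very different sizing answers.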
The Bits That Actually Affect Your Bill
Here is where the Microsoft documentation glosses over the practical levers.
The intelligent throughput optimisation value is configurable. Microsoft's example shows it at 4, which is a reasonable default. You can set it higher (up to 256) for more parallelism, or lower for less. Higher values mean faster loads but more CU consumption. The relationship is roughly linear - doubling the optimisation halves the duration but doubles the CU consumption per second. So the total CU consumed for the same data volume is similar. But the wall-clock time matters if you have time-sensitive downstream workloads.
A practical example. The 1 TB load at optimisation 4 takes 11 minutes. At optimisation 8, it would take roughly 5 to 6 minutes but cost the same total CU. At optimisation 32, it would take 1 to 2 minutes but cost the same. If you need the data available by 6am for morning reporting, crank up the optimisation. If you have all night, leave it at 4.
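The trade-off can be sketched under the idealised assumption that duration scales inversely with the optimisation value. Real loads will not scale this cleanly, so treat the output as a rough guide. The baseline figures come from Microsoft's example; the names are illustrative.

```python
BASE_ITO = 4          # optimisation value in Microsoft's example
BASE_MINUTES = 11     # duration at that value
CU_HOURS_PER_ITO_HOUR = 1.5


def scaled_load(ito_units):
    """Return (minutes, CU seconds) for the 1 TB load, assuming perfectly linear scaling."""
    minutes = BASE_MINUTES * BASE_ITO / ito_units
    cu_seconds = ito_units * CU_HOURS_PER_ITO_HOUR * (minutes / 60) * 3600
    return minutes, cu_seconds


for ito in (4, 8, 32):
    minutes, cu_seconds = scaled_load(ito)
    print(f"optimisation {ito:>2}: {minutes:5.2f} min, {cu_seconds:,.0f} CU seconds")
```

Under this assumption the CU total stays flat while the wall-clock time shrinks, which is why the choice comes down to your deadline and not your budget.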
Source format matters. Parquet is the cheapest format to read because it is columnar and compressed. If your source is CSV, expect roughly 2-3x the CU consumption for the same data volume. If your source is JSON, even more. We have seen organisations save real money by converting upstream CSV exports to Parquet before ingestion.
Schema complexity adds overhead. A simple table with 10 columns loads faster than a wide table with 500 columns, even if the total data volume is similar. The Copy activity has per-column overhead that adds up on wide tables. If you can split a 500-column source into a few narrower tables, do so.
Destination type matters significantly. Microsoft's example uses a Warehouse as the destination. If you wrote the same data to a Lakehouse instead, you would skip the Warehouse compute charges but pay slightly more on the Spark side for downstream consumption. For a pure landing zone, Lakehouse is usually cheaper. For a heavily queried analytical layer, Warehouse can be worth the extra ingestion cost because of the T-SQL query performance. Pick based on your downstream use case, not just ingestion economics.
Staging makes a difference. Some Copy activity configurations write to a staging Lakehouse first, then to the final destination. This adds CU consumption but can be faster for certain transformations. The example in the Microsoft documentation does not include staging. If you turn staging on, expect roughly 30 to 50 per cent more CU consumption for the equivalent load.
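To get a feel for how these levers stack up, here is a rough sketch that applies the ranges quoted above (2-3x for CSV, 30 to 50 per cent extra for staging) to the 3,960 CU second baseline. Multiplying the factors together is an assumption made for illustration. The levers may not combine this neatly in practice, and only measurement on your own pipeline will tell you.

```python
BASELINE_CU_SECONDS = 3960    # Microsoft's 1 TB Parquet example

# (low, high) multipliers taken from the rough ranges quoted above.
CSV_FACTOR = (2.0, 3.0)       # CSV source instead of Parquet
STAGING_FACTOR = (1.3, 1.5)   # staging enabled


def adjusted_range(baseline, *factors):
    """Apply each (low, high) multiplier in turn; assumes the factors simply multiply."""
    low = high = baseline
    for factor_low, factor_high in factors:
        low *= factor_low
        high *= factor_high
    return low, high


low, high = adjusted_range(BASELINE_CU_SECONDS, CSV_FACTOR, STAGING_FACTOR)
print(f"CSV with staging: roughly {low:,.0f} to {high:,.0f} CU seconds per TB")
```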
How to Verify Your Actual Cost
The Fabric Metrics App is the source of truth here. After any Copy activity run, you can find the actual CU seconds consumed in the Metrics App under data movement operations.
What I recommend to clients - run a representative load (not necessarily a full 1 TB, but a meaningful sample like 100 GB) on the actual pipeline you will use in production. Check the Metrics App to see the actual CU consumption. Multiply by 10 to get an estimate for the 1 TB equivalent. Then double it for safety. That is your cost estimate.
The reason for the safety factor is that production loads behave differently from test loads. There is concurrency from other workloads on the same capacity. There are transient retries when source systems are slow. There are days when one of the source files is corrupted and the pipeline does extra work to handle it. The headline cost from Microsoft's documentation assumes everything works perfectly first time. Your production pipelines will not.
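The sample-then-extrapolate recipe can be written down as a short helper. This is a sketch only: the 450 CU second sample reading is a made-up placeholder for whatever the Metrics App reports for your run, 1 TB is treated as 1,000 GB to match the "multiply by 10" rule of thumb, and the function name is illustrative.

```python
def estimate_full_load(sample_cu_seconds, sample_gb, target_gb=1000, safety_factor=2.0):
    """Scale a measured sample up to the target volume, then apply a safety margin."""
    scale = target_gb / sample_gb
    return sample_cu_seconds * scale * safety_factor


# Hypothetical reading: a 100 GB test load that showed 450 CU seconds in the Metrics App.
estimate = estimate_full_load(sample_cu_seconds=450, sample_gb=100)
rate_per_cu_hour = 0.22  # approximate Australia East PAYG rate quoted earlier
print(f"{estimate:,.0f} CU seconds, about ${estimate / 3600 * rate_per_cu_hour:.2f} per 1 TB load")
```

The safety factor is a parameter, not a constant, because how much margin you need depends on how noisy your capacity and source systems are.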
For data integration work, we typically build observability around CU consumption from day one. The Metrics App gives you the data, but you need to consume it consistently to spot trends. Capacity that was fine in month one can become constrained by month six as workloads grow.
When 20 Cents Becomes Real Money
Let's scale this to a realistic enterprise workload. An organisation we worked with recently was loading roughly 800 GB per day into Fabric Warehouse, split across maybe 15 source systems. That is 0.8 TB, or about 80 per cent of the volume in Microsoft's example.
At Microsoft's example rates, that is around 16 cents per day, or $58 per year. Genuinely cheap.
But the actual workload was more like 60,000 CU seconds per day across all the Copy activities, not the 3,168 CU seconds you would naively expect from scaling Microsoft's example. Why? Because there were 200 individual Copy activities, not one big one. Each had startup overhead. Several involved transformations beyond simple copy. Some sources were CSV not Parquet. Some had retries. Some ran with higher optimisation values for time-sensitive loads.
At the Australian PAYG rate of around $0.22 per CU hour, 60,000 CU seconds (about 16.7 CU hours) works out to roughly $3.70 per day, or about $1,340 per year, just for the data ingestion. Still cheap. But about 19x what Microsoft's headline example would suggest. That is the gap between a clean documentation scenario and production reality.
The lesson is not that Fabric is expensive. It is cheap. The lesson is that scaling cost estimates linearly from a single-workload example will give you a misleading picture. The base unit of Fabric ingestion cost is the activity run, not the data volume.
What to Do About It
If you are sizing Fabric capacity for ingestion-heavy workloads, do these three things.
One, count your activity runs, not just your data volume. A pipeline with 200 small Copy activities will consume more CU than a pipeline with 5 large Copy activities moving the same total data.
Two, measure on your actual workload. Run a representative sample on a small capacity. Use the Metrics App. Extrapolate. Add safety margin. Do not trust documentation examples for your specific situation.
Three, optimise the chatty pipelines. If you have a pipeline doing 50 small loads, consider whether you can consolidate them into 10 larger loads. The per-activity overhead is real and consolidating workloads usually reduces total CU consumption without hurting the outcome.
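A toy model makes the consolidation argument concrete. It assumes each activity run carries a fixed CU overhead on top of a volume-proportional cost. That is a simplification, and the 100 CU seconds per run used here is a hypothetical placeholder, not a measured or documented figure. Replace it with what you see in the Metrics App.

```python
def pipeline_cu_seconds(runs, total_tb, overhead_per_run, cu_seconds_per_tb=3960):
    """Toy model: fixed overhead per activity run plus a volume-proportional cost."""
    return runs * overhead_per_run + total_tb * cu_seconds_per_tb


OVERHEAD_PER_RUN = 100  # hypothetical CU seconds of startup overhead per Copy activity
TOTAL_TB = 0.2          # hypothetical total volume moved by the pipeline

chatty = pipeline_cu_seconds(runs=50, total_tb=TOTAL_TB, overhead_per_run=OVERHEAD_PER_RUN)
consolidated = pipeline_cu_seconds(runs=10, total_tb=TOTAL_TB, overhead_per_run=OVERHEAD_PER_RUN)

print(f"50 small loads: {chatty:,.0f} CU seconds")
print(f"10 larger loads: {consolidated:,.0f} CU seconds")
```

The data volume is identical in both cases. Only the number of runs changes, which is the point of counting activity runs as well as terabytes.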
For the original Microsoft scenario, see the Fabric Data Factory 1 TB Parquet pricing scenario. It is a good baseline. Just remember it is a baseline, not your bill.
If you want help sizing Fabric capacity for a real workload, or you have a capacity bill that is bigger than expected and you would like another set of eyes on it, get in touch. We do this work for organisations across Sydney, Melbourne, Brisbane, and the Sunshine Coast, and we will give you a straight answer about whether your capacity is sized right.