Loading 1 TB of CSV into a Fabric Lakehouse - What It Actually Costs
One of the more honest questions I get asked when an Australian business is sizing up Microsoft Fabric is "what is this thing actually going to cost us when we move real data?" Not the demo data. Not the 100 MB sample they used in the training session. The real volumes that come out of a production CRM or a partner FTP drop or an old SQL warehouse that nobody has touched since 2018.
Microsoft has a pricing scenario in the Fabric docs that walks through loading 1 TB of CSV data from ADLS Gen2 into a Lakehouse table. The number lands at about $14.11 USD using pay-as-you-go pricing in the West US 2 region. That is a useful anchor point, but it deserves some unpacking before you take it to your CFO.
What the scenario actually measures
The Microsoft example uses a Copy activity inside a pipeline to move 1 TB of CSV data into a Lakehouse table. The run consumes 282,240 CU seconds, which works out to 78.4 CU-hours, and at $0.18 per CU-hour that lands at $14.11. The run took 763.78 seconds, which is just under 13 minutes.
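The arithmetic behind that figure is easy to reproduce. Here is a minimal sketch using only the numbers quoted above; the function name is illustrative, and the $0.18 default only applies to pay-as-you-go pricing in West US 2.

```python
def copy_cost_usd(cu_seconds: float, usd_per_cu_hour: float = 0.18) -> float:
    """Convert CU seconds to CU-hours and price them at a flat hourly rate."""
    cu_hours = cu_seconds / 3600
    return cu_hours * usd_per_cu_hour


scenario_cu_seconds = 282_240  # from the Microsoft pricing scenario
scenario_cu_hours = scenario_cu_seconds / 3600
scenario_cost = copy_cost_usd(scenario_cu_seconds)

print(f"{scenario_cu_hours:.1f} CU-hours, ${scenario_cost:.2f} USD")
```

Swapping in the per-CU-hour rate for your own region is the only change needed to re-run the estimate closer to home.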
That is genuinely cheap for that volume of data movement. If you compare it to running an equivalent Azure Data Factory copy on a dedicated integration runtime, or running a Databricks job to do the same ingestion, you are in a similar ballpark but with less plumbing.
A few things to notice before you celebrate. The price assumes pay-as-you-go pricing in West US 2. If you are running in Australia East or Australia Southeast, the per-CU-hour rate is different. Reserved capacity pricing is also different again, and for any production workload running on a Fabric capacity bigger than F2 you almost certainly want a reservation. The $14.11 number is a unit cost demonstration, not a price tag.
The other thing to watch is that the duration of the run does not directly drive the cost. The CU seconds metric is what you pay for, and that already factors in duration. Two runs that complete in different wall-clock times can cost the same number of CU seconds. That matters because there is no benefit to paying for premium acceleration just to shave minutes off a non-time-sensitive load.
Where the costs actually live in a Fabric project
When we work with Australian clients on Microsoft Fabric implementations, the data movement cost is rarely the part that catches people out. It is the supporting workloads that drift.
A typical Fabric implementation has Data Factory pipelines pulling data in, Dataflows Gen2 doing transformations, Lakehouse storage, semantic models for Power BI, scheduled refreshes, and a handful of notebooks doing things nobody has audited in six months. Each of those consumes capacity differently. The CU consumption from a single nightly 1 TB load is predictable. The capacity consumption from a hundred analysts each running ad-hoc DAX queries against a semantic model at 9:15 AM on Monday is not.
The trap is that the demo scenarios in the Microsoft documentation always show one workload at a time. Real Fabric tenants have all of them running simultaneously, and that is where capacity throttling and smoothing become real concerns.
Sizing your capacity is the actual decision
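If you have one 1 TB nightly load and very little else happening, a small capacity will handle it, but check the arithmetic before you pick the smallest one. At 78.4 CU-hours per run, the load is more than the 48 CU-hours an F2 provides in a full day, so F4 (96 CU-hours per day) is the realistic floor and F8 leaves more comfortable headroom. Run the load at 2 AM when nothing else is contending, and capacity smoothing helps you absorb the spike without paying for an oversized SKU during business hours.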
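A quick way to sanity-check a SKU against a known load is to compare the load's CU seconds with the capacity's daily budget. The sketch below assumes that an F-SKU provides as many capacity units as the number in its name (F2 = 2 CUs, F4 = 4 CUs) and that a pipeline run is smoothed evenly over 24 hours. The function names are illustrative, not part of any Fabric API.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_cu_draw(cu_seconds_per_day: float) -> float:
    """Average CU draw if the day's consumption is spread evenly over 24 hours."""
    return cu_seconds_per_day / SECONDS_PER_DAY


def share_of_capacity(cu_seconds_per_day: float, sku_cus: int) -> float:
    """Fraction of a SKU's daily CU budget consumed by the workload."""
    return average_cu_draw(cu_seconds_per_day) / sku_cus


nightly_load = 282_240  # CU seconds per run, from the Microsoft scenario

for sku_cus in (2, 4, 8):
    share = share_of_capacity(nightly_load, sku_cus)
    print(f"F{sku_cus}: {share:.0%} of the daily CU budget")
```

Anything above 100 percent means the load alone exceeds that SKU's smoothed budget before any other workload is counted, and anything close to 100 percent leaves little room for reports or refreshes.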
If you are running a full enterprise data platform with concurrent ETL, multiple semantic model refreshes, Power BI report traffic from a few hundred users, and notebooks running ad-hoc analysis, you are looking at F16 or higher. The per-CU-hour rate is lower with reserved capacity than with pay-as-you-go, but the bigger SKU still costs more in absolute terms.
We have seen organisations massively overspend on Fabric capacity because they did not test their workload mix before committing. The Fabric Metrics App is the tool you actually need. It tells you which workloads are burning your CU seconds and when. Without that data, you are guessing.
We have also seen the opposite, where teams pick a tiny capacity because the per-CU-hour math looks cheap, and then their Power BI users get throttled at 10 AM every day. The cost in productivity dwarfs the saving on the capacity bill.
CSV is rarely the right format anyway
A subtle point in the Microsoft scenario is that the source data is CSV. If you have control over the source, you should not be moving CSV around at 1 TB scale. Parquet typically compresses to a fraction of the CSV size for the same data (a third is a common rule of thumb, though the ratio depends heavily on the data), and the columnar format reads faster into Spark and SQL engines.
For an Australian business setting up a Fabric platform, one of the first decisions is whether the bronze layer in your Lakehouse stays in the original CSV/JSON format from the source systems, or whether you immediately convert to Parquet on landing. Both patterns are valid. The CSV-first pattern is simpler for replay and auditing because you have the raw file as it arrived. The Parquet-first pattern is cheaper to query and takes less space to store.
In practice, we usually recommend a hybrid. Land the raw file in OneLake as CSV or whatever format it arrives in, then immediately materialise a Parquet copy in the bronze layer. That gives you the audit trail and the query performance, and the storage cost difference is small.
The hidden cost - egress and ingress
The Microsoft pricing example assumes the data is in ADLS Gen2 in the same region as your Fabric capacity. If your source data lives in Azure, that is straightforward. If it lives in AWS S3 or an on-premises file server, the math changes.
Egress out of AWS is not free. Bandwidth from on-premises into Azure may be cheap if you have ExpressRoute already configured, but otherwise you are pulling 1 TB over a VPN or public internet, and that takes a while regardless of how cheap the Fabric capacity is. Test this end to end before you commit to a regular schedule. We have seen "five minute" data loads in production take six hours because the bottleneck was network throughput from a partner data centre, not Fabric.
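A back-of-envelope transfer-time estimate is worth doing before the pilot. The sketch below is plain arithmetic, not a measurement. The `efficiency` factor is a hypothetical allowance for protocol overhead and contention, and the link speeds in the loop are examples rather than figures from any particular network.

```python
def transfer_hours(size_bytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours to move size_bytes over a link rated at link_mbps megabits/second.

    efficiency is an assumed fraction of the rated bandwidth that is actually usable.
    """
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    seconds = (size_bytes * 8) / usable_bits_per_second
    return seconds / 3600


one_tb = 10**12  # decimal terabyte, in bytes

for mbps in (100, 500, 1_000, 10_000):
    print(f"{mbps:>6} Mbps: {transfer_hours(one_tb, mbps):.1f} hours")
```

If the estimate for your real link is measured in hours, the network sets the schedule and the capacity size does not.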
If you are using the Data Factory capability inside Fabric specifically because you have hybrid sources, build out a pilot first that ingests a realistic sample from the actual source over the actual network. The cost of being wrong on this is high.
What this means for budget conversations
When a CFO or a head of data asks you how much Fabric will cost, the honest answer is "it depends on the workload mix, and we need to measure before we can predict accurately." That is not the answer they want. The answer they want is a number.
Here is a reasonable way to give them one. Take your three or four largest expected workloads, estimate the CU seconds for each based on the published pricing scenarios, multiply by frequency, then add a 30 to 50 percent buffer for everything you have not thought of yet. Compare that to the cost of a reserved Fabric capacity at the size that comfortably handles the peak. Pick whichever is lower, and commit to monitoring capacity usage for the first three months and adjusting.
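That estimate can be sketched in a few lines. Everything here is illustrative: the `Workload` class, the function name, and the second workload's numbers are made-up placeholders, and the result is a consumption-based figure in the same sense as the $14.11 in the Microsoft scenario. Only the 1 TB load's CU seconds and the $0.18 rate come from the scenario.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cu_seconds_per_run: float
    runs_per_month: int


def monthly_estimate_usd(workloads, usd_per_cu_hour: float, buffer: float = 0.4) -> float:
    """Sum CU-hours across workloads, add a buffer, and price at a flat rate."""
    total_cu_seconds = sum(w.cu_seconds_per_run * w.runs_per_month for w in workloads)
    cu_hours = total_cu_seconds / 3600
    return cu_hours * (1 + buffer) * usd_per_cu_hour


workloads = [
    Workload("nightly 1 TB CSV load", 282_240, 30),  # from the Microsoft scenario
    Workload("semantic model refresh", 36_000, 60),  # hypothetical placeholder values
]

estimate = monthly_estimate_usd(workloads, usd_per_cu_hour=0.18)
print(f"Estimated consumption cost: ${estimate:,.2f} USD per month")
```

The `buffer` default of 0.4 sits in the middle of the 30 to 50 percent range. Replace the placeholder workload with figures from your own measurements once you have them.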
This is how we approach Fabric capacity planning with clients on our AI consulting engagements. The number on day one is always wrong. The question is whether the team has the discipline to track actual consumption and right-size after three months.
A practical takeaway
Fabric pricing is fair for the volumes most Australian mid-market and enterprise businesses move. The 1 TB CSV load at around $14 USD is representative of what you should expect for one-off bulk movements, and even running that nightly works out to roughly $420 USD per month (30 runs at $14.11), which is fine.
The risk is not the per-workload cost. The risk is capacity sizing for the aggregate workload mix, and not measuring properly before committing to a reserved SKU. Get the measurement piece right and Fabric is one of the better-value enterprise data platforms on the market in 2026.
Reference: Microsoft Fabric Data Factory - Load 1 TB of CSV data to Lakehouse tables pricing scenario