
Microsoft Fabric Capacity Planning - How to Size Workloads

April 19, 2026 · 11 min read · Michael Ridland


Capacity planning is where most Fabric implementations either waste money or hit performance walls. The Capacity Unit (CU) model is fundamentally different from what teams are used to with dedicated Azure resources, and getting it wrong is expensive in both directions - overspend if you're too generous, frustrated users and failed jobs if you're too conservative.

We've sized Fabric capacity for dozens of Australian organisations, and the patterns are consistent enough that I can share practical guidance. This post covers how CU consumption actually works, how to size for specific workloads, and the monitoring approach that keeps you right-sized over time.

How Fabric Capacity Units Work

Before sizing anything, you need to understand the mechanics:

Capacity Units (CUs) are a shared compute pool. Every Fabric workload - Data Engineering, Data Warehouse, Data Factory, Power BI, Real-Time Analytics - draws from the same CU pool; you don't allocate CUs to specific workloads.

CU consumption is measured per second. When a Spark job runs, it consumes CUs for every second it's active. When a Power BI report renders, it consumes CUs for the duration of the query. When a Data Factory pipeline executes, it consumes CUs for each activity.

Smoothing spreads consumption over time. Fabric doesn't enforce hard per-second limits. Instead, it uses a smoothing window (typically 5 minutes for interactive operations and 24 hours for background operations) that averages your consumption. This means you can burst above your capacity for short periods without hitting throttling.

Throttling happens when sustained consumption exceeds capacity. If your smoothed CU consumption consistently exceeds your SKU's capacity, Fabric starts delaying background operations first (pipeline runs, Spark jobs) and eventually rejects or delays interactive operations (queries, report renders).

Background vs. interactive matters. Background operations (scheduled pipeline runs, Spark batch jobs) have a longer smoothing window and are throttled first. Interactive operations (Power BI report queries, ad-hoc SQL queries) have a shorter smoothing window and are protected at the expense of background work.
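The smoothing behaviour described above can be sketched as a rolling window over per-second consumption. This is an illustrative model, not Fabric's exact algorithm - the window length and capacity figure below are assumptions for the example:

```python
from collections import deque

def smoothed_throttle_check(per_second_cu, capacity_cu, window_s):
    """Return True if smoothed CU consumption ever exceeds capacity.

    Illustrative only: averages consumption over a fixed window
    (dividing by the full window length, so a short burst early on is
    treated against a background of zero consumption).
    """
    window = deque(maxlen=window_s)
    for cu in per_second_cu:
        window.append(cu)
        if sum(window) / window_s > capacity_cu:
            return True
    return False

# A 30-second burst at 3x a 64-CU capacity inside a 5-minute window
# averages out well under capacity, so no throttling:
burst = [3 * 64] * 30 + [0] * 270
print(smoothed_throttle_check(burst, capacity_cu=64, window_s=300))   # False

# Sustained consumption at 1.5x capacity eventually trips the check:
print(smoothed_throttle_check([96] * 600, capacity_cu=64, window_s=300))  # True
```

This is why a heavy refresh or Spark job can run fine in isolation but cause throttling when it overlaps with sustained load.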

CU Consumption by Workload Type

Each Fabric workload consumes CUs differently. Here's what we've observed in practice:

Data Engineering (Spark)

Spark jobs are typically the heaviest CU consumers. A Spark notebook or job consumes CUs based on:

  • The number of executor nodes allocated
  • The duration of the job
  • The volume of data processed

Typical consumption patterns:

  • A simple Spark notebook transforming 1 million rows might consume 10-20 CU-seconds
  • A complex pipeline processing 100 million rows with multiple joins and aggregations might consume 500-2,000 CU-seconds
  • Large-scale data processing jobs can consume 10,000+ CU-seconds
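For a back-of-envelope estimate before you have real metrics, you can reason from executor count and runtime. The 2-VCores-per-CU mapping below is the commonly documented Fabric Spark rate, but treat it and the overall model as an approximation to calibrate against the Capacity Metrics app:

```python
def spark_cu_seconds(executors, vcores_per_executor, duration_s,
                     vcores_per_cu=2):
    """Back-of-envelope CU-seconds for a Spark job.

    Assumes billing scales with (VCores in use) x (runtime), using the
    commonly cited mapping of 2 Spark VCores per CU. Real consumption
    also depends on autoscale behaviour and driver overhead.
    """
    total_vcores = executors * vcores_per_executor
    return total_vcores / vcores_per_cu * duration_s

# e.g. 4 medium executors (8 VCores each) running for 2 minutes:
print(spark_cu_seconds(4, 8, 120))  # 1920.0
```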

Optimisation tips:

  • Use smaller executor configurations where possible (Fabric's auto-scale often over-provisions)
  • Partition your data effectively to minimise full table scans
  • Use Delta Lake's Z-ordering for columns frequently used in filters
  • Cache intermediate results when the same data is used in multiple transformations
  • Consider Dataflows Gen2 instead of Spark for simpler transformations - it uses significantly fewer CUs

Data Warehouse (SQL)

The Fabric Warehouse consumes CUs for every query executed. Consumption depends on:

  • Data volume scanned
  • Query complexity (joins, aggregations, window functions)
  • Concurrent query count

Typical consumption patterns:

  • A simple SELECT with a WHERE clause on a partitioned table: 1-5 CU-seconds
  • A moderately complex dashboard query joining 3-4 tables: 5-20 CU-seconds
  • A heavy analytical query scanning large tables with multiple aggregations: 50-200 CU-seconds
  • A badly written query that does a full table scan on a 500GB table: 500+ CU-seconds

Optimisation tips:

  • Design your data model to minimise query complexity (proper star schema design pays off)
  • Use statistics and appropriate column types to help the query optimiser
  • Avoid SELECT * in production queries
  • Monitor the most expensive queries using Fabric's query monitoring views and optimise the top 10
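Once you've exported consumption data, ranking the most expensive items is a simple aggregation. The record schema below is hypothetical - adapt the field names to whatever the Capacity Metrics app or query monitoring views give you:

```python
from collections import defaultdict

def top_consumers(records, n=10):
    """Rank items by total CU-seconds consumed.

    `records` is a list of {"item": ..., "cu_seconds": ...} dicts
    (a hypothetical schema for exported consumption data).
    """
    totals = defaultdict(float)
    for r in records:
        totals[r["item"]] += r["cu_seconds"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

records = [
    {"item": "sales_dashboard", "cu_seconds": 180},
    {"item": "etl_notebook", "cu_seconds": 950},
    {"item": "sales_dashboard", "cu_seconds": 220},
    {"item": "adhoc_query", "cu_seconds": 40},
]
print(top_consumers(records, n=2))
# [('etl_notebook', 950.0), ('sales_dashboard', 400.0)]
```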

Power BI

Power BI consumes CUs for:

  • Semantic model refreshes (loading data into the model)
  • Report rendering (executing DAX queries when users view reports)
  • Direct Lake reads (reading data from OneLake)

Typical consumption patterns:

  • An Import mode refresh of a 5GB model: 200-500 CU-seconds
  • A single page view of a moderately complex report: 2-10 CU-seconds
  • A Direct Lake report page view: 1-5 CU-seconds (typically lighter than Import mode queries)
  • 50 concurrent users viewing dashboards during a morning peak: 100-500 CU-seconds per minute

Optimisation tips:

  • Move to Direct Lake mode where possible - it's more CU-efficient than Import mode for most workloads
  • Optimise DAX measures to reduce query time
  • Reduce the number of visuals per page (each visual generates a separate query)
  • Use aggregation tables for large datasets
  • Stagger refresh schedules so multiple large models don't refresh simultaneously
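Staggering refreshes just means assigning sequential start offsets so the large models never overlap. A minimal sketch, where the expected durations are estimates you'd take from refresh history:

```python
def stagger_refreshes(models, start_minute=0, gap_minutes=5):
    """Assign non-overlapping refresh start times (minutes after midnight).

    `models` is a list of (name, expected_duration_minutes) tuples,
    scheduled longest-first so the heaviest refreshes finish earliest.
    """
    schedule = []
    t = start_minute
    for name, duration in sorted(models, key=lambda m: -m[1]):
        schedule.append((name, t))
        t += duration + gap_minutes
    return schedule

models = [("finance_model", 20), ("sales_model", 45), ("hr_model", 10)]
print(stagger_refreshes(models, start_minute=60))
# [('sales_model', 60), ('finance_model', 110), ('hr_model', 135)]
```

The gap between refreshes gives the 24-hour background smoothing window a chance to absorb each spike before the next one starts.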

Data Factory (Pipelines)

Pipeline CU consumption depends on:

  • Activity type (copy activities, dataflow activities, notebook activities)
  • Data volume moved
  • Number of concurrent pipeline runs

Typical consumption patterns:

  • A copy activity moving 1GB of data: 10-30 CU-seconds
  • A Dataflows Gen2 transformation processing 10 million rows: 50-200 CU-seconds
  • An orchestration pipeline with 20 activities: depends on the activities, but overhead per pipeline is minimal

Optimisation tips:

  • Use bulk copy where possible instead of row-by-row operations
  • Parallelise independent copy activities within a pipeline
  • Schedule heavy pipeline runs during off-peak hours to avoid competing with interactive workloads

Real-Time Analytics (KQL)

KQL database CU consumption depends on:

  • Ingestion volume (events per second)
  • Query frequency and complexity
  • Materialised view maintenance
  • Data retention and compaction

Typical consumption patterns:

  • Ingesting 10,000 events per minute: 5-15 CU-seconds per minute (sustained)
  • A KQL dashboard query scanning 1 hour of data: 2-10 CU-seconds
  • Materialised view refresh: varies widely, but typically 10-50% of raw query cost

Real-time analytics is a continuous consumer. Unlike batch jobs that spike and complete, streaming workloads generate a constant CU draw. Budget for this baseline load plus headroom for queries.
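Budgeting for that baseline plus headroom is straightforward arithmetic. The per-1,000-events rate below is an assumption you'd calibrate from the Capacity Metrics app, not a published figure:

```python
def streaming_cu_budget(events_per_min, cu_per_1k_events,
                        query_headroom_pct=50):
    """Rough sustained CU-seconds/minute budget for a streaming workload.

    `cu_per_1k_events` is an assumed ingestion rate to calibrate against
    measured consumption; the headroom covers dashboard queries and
    materialised view maintenance on top of the ingestion baseline.
    """
    baseline = events_per_min / 1000 * cu_per_1k_events
    return baseline * (1 + query_headroom_pct / 100)

# 10,000 events/min at ~1 CU-second per 1,000 events, plus 50% headroom:
print(streaming_cu_budget(10_000, 1.0))  # 15.0
```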

Sizing Strategy - Start Small, Measure, Scale

The worst approach to capacity sizing is guessing a number and hoping it's right. The best approach is systematic:

Phase 1 - Estimate (Before You Have Data)

Use this table as a rough starting point based on organisation size and workload complexity:

| Profile | Recommended Starting SKU | Monthly Cost (AUD, PAYG) |
| --- | --- | --- |
| Small: Under 500 employees, basic BI, simple ETL | F4 or F8 | $800-1,600 |
| Medium: 500-2,000 employees, moderate analytics, daily pipelines | F16 or F32 | $3,200-6,400 |
| Large: 2,000-5,000 employees, complex analytics, real-time components | F32 or F64 | $6,400-12,800 |
| Enterprise: 5,000+ employees, heavy data engineering, ML workloads | F64 or F128 | $12,800-25,600 |

Start one tier below your estimate. It's easier (and cheaper) to scale up than to justify scaling down.
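That lookup, including the "start one tier below" rule, can be expressed directly. The headcount bands come from the table in this post, not from any official Microsoft sizing rule:

```python
# Headcount bands mirroring the sizing table above (this post's
# benchmarks, not an official Microsoft sizing rule).
PROFILES = [
    (500, ("F4", "F8")),          # Small
    (2_000, ("F16", "F32")),      # Medium
    (5_000, ("F32", "F64")),      # Large
    (float("inf"), ("F64", "F128")),  # Enterprise
]

def starting_sku(employees, start_one_tier_below=True):
    """Pick a starting SKU from the band; default to the lower tier,
    since it's easier to scale up than to justify scaling down."""
    for limit, (lower, upper) in PROFILES:
        if employees < limit:
            return lower if start_one_tier_below else upper

print(starting_sku(1_200))         # F16
print(starting_sku(8_000, False))  # F128
```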

Phase 2 - Measure (First 4-8 Weeks)

Once your Fabric environment is running with real workloads, install and monitor the Capacity Metrics app. This is non-negotiable - flying blind on capacity is a guaranteed path to either wasted spend or poor performance.

Key metrics to track:

  • CU utilisation percentage - Your average utilisation should sit between 50-70% during peak hours. Below 50% and you're over-provisioned. Above 80% and you're at risk of throttling.
  • Throttling events - Any throttling of interactive operations is a problem. Background throttling is acceptable if it doesn't delay critical pipeline runs.
  • Peak vs. off-peak patterns - Understand when your capacity is busiest and when it's idle. This informs whether you can pause capacity during off-hours.
  • Top consumers - Identify which workloads and which specific items (reports, pipelines, notebooks) consume the most CUs. The top 10% of items typically consume 60-80% of capacity.

Phase 3 - Right-Size (After 4-8 Weeks)

Based on your measurements:

  • If peak utilisation consistently stays below 40%, scale down one SKU tier
  • If peak utilisation regularly exceeds 80% or you're seeing interactive throttling, scale up one SKU tier
  • If utilisation is heavily skewed to certain hours, consider pausing capacity during idle periods
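The first two rules reduce to a simple decision function - a sketch of the thresholds above, which you'd feed from the Capacity Metrics app:

```python
def right_size_action(peak_utilisation_pct, interactive_throttling):
    """Apply the Phase 3 rules: scale up above 80% peak utilisation or
    on any interactive throttling, scale down under 40%, else hold."""
    if interactive_throttling or peak_utilisation_pct > 80:
        return "scale up one SKU tier"
    if peak_utilisation_pct < 40:
        return "scale down one SKU tier"
    return "hold current SKU"

print(right_size_action(35, False))  # scale down one SKU tier
print(right_size_action(65, True))   # scale up one SKU tier
print(right_size_action(60, False))  # hold current SKU
```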

Phase 4 - Optimise Continuously

Capacity needs change as your organisation adds new reports, builds new pipelines, and onboards more users. We recommend a monthly capacity review during the first year, moving to quarterly once consumption patterns stabilise.

Cost Optimisation Techniques

Beyond right-sizing your SKU, several techniques can reduce your effective cost:

Pause Capacity During Off-Hours

If your workloads are primarily consumed during business hours (say, 7am to 7pm AEST), you can pause your Fabric capacity overnight and on weekends. This reduces your monthly cost by roughly 50%.

Implementation: Use an Azure Automation runbook or Logic App to pause capacity at 7pm and resume at 6:30am (giving it 30 minutes to warm up). On weekends, keep it paused unless you have overnight batch jobs that run on Saturday or Sunday.

Caveat: Pausing capacity means no scheduled refreshes, no pipeline runs, and no report access during paused hours. Make sure this aligns with your business requirements.
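The schedule decision at the heart of that runbook can be sketched as pure logic - the actual suspend/resume call goes through the Azure management API (or a Logic App action), which isn't shown here. Times are assumed to be local (AEST):

```python
from datetime import datetime

def capacity_should_run(now: datetime,
                        resume_hour=6, resume_minute=30,
                        pause_hour=19, weekend_on=False):
    """Decide whether Fabric capacity should be running right now.

    Mirrors the schedule above: resume 6:30am, pause 7pm, paused on
    weekends unless weekend batch jobs need it. The runbook would call
    this and then invoke suspend/resume via the Azure management API.
    """
    if now.weekday() >= 5 and not weekend_on:  # Saturday/Sunday
        return False
    minutes = now.hour * 60 + now.minute
    return (resume_hour * 60 + resume_minute) <= minutes < pause_hour * 60

print(capacity_should_run(datetime(2026, 4, 20, 9, 0)))   # Monday 9am: True
print(capacity_should_run(datetime(2026, 4, 18, 12, 0)))  # Saturday: False
print(capacity_should_run(datetime(2026, 4, 20, 22, 0)))  # Monday 10pm: False
```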

Use Reserved Instances

If you're confident in your capacity tier (you've been running for 3+ months and consumption is stable), switch from pay-as-you-go to a one-year or three-year reservation:

| Commitment | Typical Savings vs. PAYG |
| --- | --- |
| One year | 25-35% |
| Three years | 40-50% |

For an F32 capacity at ~$6,400/month PAYG, a one-year reservation might bring it down to ~$4,200-4,800/month. That's a meaningful saving.
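The arithmetic is worth checking against your own PAYG bill - the savings percentages are the typical ranges above, not guaranteed rates:

```python
def reserved_monthly_cost(payg_monthly, savings_pct):
    """Effective monthly cost after a reservation discount."""
    return payg_monthly * (1 - savings_pct / 100)

# F32 at ~$6,400/month PAYG with a one-year reservation (25-35% off):
low = reserved_monthly_cost(6_400, 35)
high = reserved_monthly_cost(6_400, 25)
print(f"${low:,.0f}-{high:,.0f}/month")  # $4,160-4,800/month
```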

Optimise Your Heaviest Workloads

Because 10% of items typically consume 60-80% of CUs, optimising your top consumers has an outsized impact:

  1. Identify the top 10 CU consumers using the Capacity Metrics app
  2. Analyse each one for optimisation opportunities (query tuning, model simplification, job scheduling)
  3. Implement changes and measure the CU reduction
  4. Repeat monthly as new workloads are added

We've seen organisations reduce their CU consumption by 30-50% through targeted optimisation of their heaviest workloads, without reducing functionality.

Separate Dev/Test and Production

Running development and testing workloads on your production capacity is risky (a runaway Spark notebook can throttle production reports) and wasteful (dev workloads inflate your capacity measurements). Use a separate, smaller capacity for dev/test:

  • Production: F32 (~$6,400/month)
  • Dev/Test: F8 (~$1,600/month)
  • Total: ~$8,000/month

This is cheaper than over-provisioning production to absorb dev workloads and gives you isolation for testing.

Capacity Planning for Growth

Don't just size for today's workloads. Plan for growth over the next 12 months:

  • How many new reports will be created? Each report adds to the interactive CU load.
  • Will data volumes grow? Larger datasets mean longer refresh times and heavier queries.
  • Are new data sources being onboarded? Each new source means new pipelines and potentially new Spark jobs.
  • Is the user base growing? More concurrent users mean more interactive CU demand.
  • Are real-time workloads being added? These generate continuous baseline CU consumption.

A reasonable growth buffer is 20-30% above your current measured consumption. This gives you room to grow without immediately hitting throttling, while not paying for excessive idle capacity.
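Applying that buffer to your measured peak tells you whether your current SKU still fits - a trivial calculation, but worth making explicit in your capacity reviews:

```python
def capacity_with_growth(peak_measured_cu, buffer_pct=25):
    """Measured peak consumption plus a growth buffer (20-30% is the
    range suggested above)."""
    return peak_measured_cu * (1 + buffer_pct / 100)

# Measured peak of 48 CUs on an F64: a 25% buffer puts you at 60 CUs,
# so the F64 still fits; at 30% you'd be at 62.4, close to the limit.
print(capacity_with_growth(48, 25))  # 60.0
print(capacity_with_growth(48, 30))  # 62.4
```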

When to Scale Up vs. Optimise

This is a judgment call, but here's our rule of thumb:

Scale up when:

  • You're already well-optimised and still hitting capacity limits
  • Business growth is driving legitimate new workload
  • The cost of scaling up is less than the cost of the engineering time to optimise

Optimise when:

  • You haven't reviewed your top consumers in the last 3 months
  • You know there are badly performing queries or unoptimised Spark jobs
  • Your utilisation pattern is spiky (suggesting a few heavy items, not broad load)
  • Optimisation can defer a scale-up by 6+ months

In practice, the right answer is usually both: optimise first to get the most from your current capacity, then scale up when optimisation can't keep pace with growth.

How Team 400 Helps with Capacity Planning

Capacity planning is a core part of our Microsoft Fabric consulting engagements. We don't treat it as an afterthought - it's built into our implementation methodology from day one.

Our approach:

  1. Initial sizing based on your workload profile and our benchmarks from similar Australian organisations
  2. Monitoring setup with the Capacity Metrics app and custom alerts for utilisation thresholds
  3. 4-week review after go-live to validate sizing against actual consumption
  4. Optimisation sprint targeting the top CU consumers for performance improvements
  5. Right-sizing recommendation with cost projections for reserved instances vs. PAYG

We also help with Power BI performance tuning (one of the most common sources of CU overconsumption), Data Factory pipeline optimisation, and broader data platform architecture that affects how workloads are distributed.

If you're struggling with Fabric capacity - either paying too much for what you're using or hitting throttling that affects your users - get in touch with our team. We'll assess your current consumption and give you a clear plan for right-sizing.

Explore our full range of data and AI services to see how capacity planning fits into a broader Fabric and analytics strategy.