Data Factory for SAP Integration - Connecting SAP to Power BI
SAP is everywhere in Australian enterprise. Mining companies, manufacturers, utilities, large retailers, government departments - if you operate at scale in Australia, there's a good chance your core business processes run on SAP ECC, S/4HANA, or SAP BW.
And if you run SAP, you almost certainly have a reporting problem.
SAP's built-in reporting tools (SAP GUI reports, BEx queries, embedded analytics in Fiori) serve SAP users well enough, but they don't help the 80% of your organisation that doesn't live in SAP. Finance wants data in Excel. Operations wants dashboards. Executives want a single view across SAP and non-SAP systems. The sales team wants CRM data blended with SAP order data.
This is where Data Factory and Power BI come in. Data Factory extracts data from SAP, stages it in a modern data platform, and Power BI turns it into reports and dashboards that anyone in the organisation can use.
We've built this architecture for Australian organisations across mining, manufacturing, retail, and professional services. Here's how to do it right.
Why Not Connect Power BI Directly to SAP?
Power BI has native SAP connectors (SAP HANA and SAP BW), so you might wonder why you need Data Factory at all. Fair question.
Reasons to use Data Factory instead of direct Power BI-to-SAP connections:
Performance. Direct queries against SAP for large datasets are slow and put load on your SAP system. Your SAP admin will not be happy when 50 Power BI users all hit SAP at 9 AM.
Data blending. Most useful reports combine SAP data with non-SAP data (CRM, HR, external market data). Data Factory lets you land everything in one place and join it there.
Data transformation. SAP data structures are notoriously complex. Material master data alone can span 20+ tables. Flattening those structures in Power Query is painful and brittle; Data Factory handles the transformation much better.
History and snapshots. SAP is transactional - most tables hold current state, not history. Data Factory can capture daily snapshots to enable trend analysis that SAP alone can't provide.
Reduced SAP dependency. If SAP goes down for maintenance, your Power BI reports still work because they're reading from a separate data store, not from SAP directly.
Security separation. You can grant reporting access through the data platform without granting SAP access. This is a big deal in Australian enterprises where SAP licensing and access control are tightly managed.
SAP Connector Options in Data Factory
Data Factory offers several ways to connect to SAP. Choosing the right one depends on your SAP landscape and what data you need.
SAP Table Connector
Best for: Extracting raw table data from SAP ECC or S/4HANA.
The SAP Table connector reads directly from SAP tables using the SAP RFC protocol. It's the simplest approach and works well for:
- Master data extraction (materials, customers, vendors, cost centres)
- Transactional data (sales orders, purchase orders, invoices)
- Configuration tables
Requirements:
- SAP .NET Connector (NCo) 3.0 installed on the self-hosted integration runtime
- RFC user in SAP with appropriate authorisations (S_RFC, S_TABU_DIS)
- Network connectivity from SHIR to SAP application server
Limitations:
- You need to know which SAP tables contain the data you want (not always obvious)
- No built-in support for SAP's delta extraction mechanisms
- Large table extractions can be slow without proper filtering
SAP ODP (Operational Data Provisioning) Connector
Best for: Structured extraction with delta capability from SAP BW or S/4HANA.
The ODP connector accesses SAP's extraction framework, which supports:
- Delta (incremental) extraction
- DataSources from SAP BW extractors
- CDS views from S/4HANA
- SAP BW InfoProviders and queries
Requirements:
- SAP .NET Connector installed on the SHIR
- ODP framework activated in SAP (usually already active in recent versions)
- Appropriate SAP authorisations
This is our preferred connector for most SAP integration projects. It provides structured extraction with delta support, which means after the initial full load, subsequent extractions only pull changed records. This dramatically reduces extraction time and SAP system load.
SAP HANA Connector
Best for: Direct access to SAP HANA database tables or views.
If your SAP system runs on HANA (S/4HANA or Suite on HANA), you can bypass the application layer and read directly from the HANA database.
Requirements:
- SAP HANA ODBC driver installed on the SHIR
- HANA database user with SELECT permissions
- Network connectivity from SHIR to the HANA database port (3<instance>15, e.g. 30015 for instance 00)
When to use:
- You need high-performance extraction of large data volumes
- You want to access HANA calculation views
- Your SAP team has built data models in HANA that are ready for extraction
When to avoid:
- Your SAP system is not on HANA
- You need delta extraction (the HANA connector has no built-in delta mechanism; loads are full by default)
- SAP admin restricts direct HANA database access (common in large enterprises)
SAP BW Connector
Best for: Extracting data from SAP BW InfoProviders and queries.
If your organisation uses SAP BW or BW/4HANA as its data warehouse, this connector accesses the curated data models that your SAP BW team has already built.
Requirements:
- SAP .NET Connector installed on the SHIR
- MDX or BAPI access to BW
- Appropriate SAP BW authorisations
Advantage: You get data that's already been modelled and curated by your SAP team, reducing the transformation work in Data Factory.
Recommended Architecture
Here's the architecture we recommend for SAP-to-Power-BI via Data Factory:
SAP ECC/S/4HANA/BW
|
| (SAP ODP or Table connector via SHIR)
v
Azure Data Factory / Fabric Data Factory
|
| (Copy + Transform)
v
Azure Data Lake Gen2 / Fabric Lakehouse
|
| (Bronze -> Silver -> Gold layers)
v
Azure SQL / Fabric Warehouse (Gold/Serving layer)
|
v
Power BI (Reports and Dashboards)
The Three-Layer Pattern
Bronze layer: Raw SAP data, extracted as-is. No transformations. This is your safety net - you can always reprocess from bronze if something goes wrong in later layers.
Silver layer: Cleaned and standardised SAP data. Column names translated from SAP technical names to business-friendly names. Data types standardised. Basic quality checks applied. SAP tables joined into meaningful entities (e.g., combining MARA, MAKT, and MARC into a unified materials master).
Gold layer: Business-ready data models optimised for Power BI. Star schema or dimensional models with fact and dimension tables. Pre-calculated measures where appropriate. This is what Power BI connects to.
Why three layers instead of one? Because SAP data is messy. Tables have cryptic names (VBAK, VBAP, LIKP, LIPS). Fields use abbreviations and internal codes. Relationships between tables aren't always obvious. Processing in layers lets you separate concerns and makes troubleshooting much easier.
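The bronze-to-silver step for the materials example above can be sketched in a few lines. This is an illustrative pandas sketch with tiny sample rows, not production pipeline code; the table and column names (MARA, MAKT, MATNR, SPRAS, MAKTX) are standard SAP, but the sample data and output column names are assumptions.

```python
import pandas as pd

# Bronze: raw extracts, SAP technical names kept as-is (sample rows for illustration)
mara = pd.DataFrame({"MATNR": ["M-001", "M-002"], "MTART": ["FERT", "ROH"]})
makt = pd.DataFrame({"MATNR": ["M-001", "M-002"], "SPRAS": ["E", "E"],
                     "MAKTX": ["Finished pump", "Raw casting"]})

# Silver: filter texts to one language, join related tables into a single
# materials entity, and translate technical names to business-friendly names
materials = (
    mara.merge(makt[makt["SPRAS"] == "E"], on="MATNR", how="left")
        .rename(columns={"MATNR": "Material_Number",
                         "MTART": "Material_Type",
                         "MAKTX": "Material_Description"})
        [["Material_Number", "Material_Type", "Material_Description"]]
)
print(materials)
```

The same shape applies whether the transformation runs in a mapping data flow, a Dataflow Gen2, or a Spark notebook: language filter, join, rename, project.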
Step-by-Step Implementation
Step 1 - Identify the Data You Need
Start with the Power BI reports you want to build, and work backwards:
- What questions do the reports answer? (e.g., "What are our sales by product by region?")
- What data fields do you need? (sales order amount, product name, ship-to region, date)
- Which SAP tables or extractors contain those fields? (VBAK, VBAP, KNA1, MARA)
This is where SAP knowledge is essential. If you don't have someone on your team who understands SAP data structures, bring in help. Guessing at SAP table relationships is a fast way to produce wrong reports.
Common SAP data entities and their source tables:
| Business Entity | Key SAP Tables | Notes |
|---|---|---|
| Sales Orders | VBAK, VBAP, VBEP | Header, item, schedule line |
| Deliveries | LIKP, LIPS | Header and item |
| Invoices | VBRK, VBRP | Header and item |
| Purchase Orders | EKKO, EKPO | Header and item |
| Materials Master | MARA, MAKT, MARC, MARD | General, text, plant, storage |
| Customers | KNA1, KNB1, KNVV | General, company code, sales area |
| Vendors | LFA1, LFB1 | General, company code |
| GL Accounts | SKA1, SKAT | Chart of accounts, text |
| Financial Postings | BSEG, BKPF | Line items, document header |
| Cost Centres | CSKS, CSKT | Master, text |
Step 2 - Set Up the Self-Hosted Integration Runtime
Follow the setup instructions in our on-premises connectivity guide.
For SAP specifically:
- Install the SAP .NET Connector 3.0 on the SHIR server (download from the SAP Support Portal - requires an SAP S-user)
- Copy the NCo DLLs to the SHIR installation directory
- Restart the SHIR service
- Verify the SAP connector appears in the linked service options in Data Factory
Step 3 - Build Extraction Pipelines
For each SAP entity:
- Create a linked service for your SAP system (using the appropriate connector)
- Build a pipeline with a Copy activity
- Configure the source (SAP table, ODP extractor, or HANA query)
- Configure the destination (Data Lake or Lakehouse - bronze layer)
- Add appropriate filters (date ranges, company codes) to limit extraction scope
- Test with a small dataset first
Delta extraction pattern (using ODP):
For the initial load:
- Set the extraction mode to "Full"
- Extract all historical data
For subsequent loads:
- Set the extraction mode to "Delta"
- ODP tracks the last extraction point and only sends changed records
- Merge deltas into the bronze layer
Step 4 - Build Transformation Pipelines
Transform bronze data to silver and gold layers using:
- Mapping data flows (ADF) for visual, code-free transformations
- Dataflows Gen2 (Fabric) for Power Query-based transformations
- Spark notebooks (Fabric or Databricks) for complex transformations
Key transformations for SAP data:
- Rename columns from SAP technical names (MATNR) to business names (Material_Number)
- Convert SAP date formats (YYYYMMDD as string) to proper date types
- Decode SAP coded values (e.g., order status codes to readable descriptions)
- Join related tables into unified entities
- Apply currency conversion where needed
- Filter to relevant company codes, sales organisations, or plant codes
Step 5 - Build Power BI Reports
Connect Power BI to the gold layer (not to the bronze or silver layers, and definitely not directly to SAP).
Power BI configuration tips for SAP data:
- Use Import mode for the best end-user performance
- Schedule Power BI dataset refreshes after Data Factory pipelines complete
- Build a date dimension table in the gold layer (don't rely on SAP date fields directly)
- Use Power BI row-level security to replicate SAP authorisation logic where needed
Performance Considerations
SAP extractions can be slow. Here's how to improve performance:
1. Filter aggressively. Don't extract all data from all time. Use date filters, company code filters, and plant filters to reduce volume.
2. Use parallel extraction. The SAP Table connector supports partitioning - split large tables across multiple parallel reads. For MARA with 500,000 records, splitting into 10 partitions can speed up extraction by 5-8x.
3. Extract during off-peak hours. Schedule SAP extractions for nights and weekends (Australian time) when SAP system load is lowest. This improves extraction speed and keeps your SAP admin happy.
4. Use delta extraction. After the initial full load, incremental deltas reduce extraction volume by 90%+ for most entities.
5. Size the SHIR appropriately. SAP extraction is CPU-intensive due to data serialisation. For large SAP estates, use an 8-core, 32 GB SHIR server minimum.
Common SAP Integration Challenges
SAP authorisations. Getting the right RFC authorisations in SAP can take weeks in large enterprises. Start this process early. You need an SAP Basis team member to create the RFC user and assign appropriate roles.
SAP system availability. SAP systems have maintenance windows, transport schedules, and occasional unplanned downtime. Build retry logic into your pipelines and don't depend on 100% SAP availability.
Data volume management. SAP financial data (BSEG) can contain hundreds of millions of records. Extracting the full history requires planning - consider initial load in batches by fiscal year.
Currency and unit handling. SAP stores amounts and quantities with separate currency and unit fields. Make sure your transformation logic handles currency conversion correctly, especially for multi-currency organisations.
Language-dependent texts. SAP stores text descriptions in multiple languages. Make sure your extraction filters on the correct language key (E or EN for English, depending on whether the field uses the internal or ISO code, or your organisation's preferred language).
How Team 400 Can Help
SAP integration is one of the most technically demanding Data Factory projects you can undertake. It requires deep knowledge of both SAP data structures and Azure data platform architecture.
We're Microsoft Data Factory consultants with hands-on SAP integration experience. We've built SAP-to-Power-BI architectures for Australian enterprises across mining, manufacturing, and professional services.
Our team works across the entire stack - Data Factory for extraction, Microsoft Fabric for the data platform, and Power BI for reporting. That means one team, one architecture, and no finger-pointing between vendors when something needs to be fixed.
Contact us to discuss your SAP integration requirements, or explore our full range of AI and data consulting services.