Power BI Web Data Import - A Practical Guide for Australian Teams
One of the first tricks I show people learning Power BI is the web connector. You paste a URL, Power BI scrapes the tables off the page, and suddenly you have data in your report. It feels a bit like magic the first time you see it. Then you try to refresh the report two weeks later and realise the page changed, your column names broke, and the whole thing fell over.
The Microsoft tutorial walks through importing the UEFA European Championship winners from Wikipedia. Cute example. Not particularly useful for most Australian businesses. But the underlying mechanics are exactly what you need when you want to pull ASX disclosures, ABS census tables, government open data portals, or competitor pricing pages into a report.
I've used the web connector on real consulting jobs more times than I can count. Here's what actually matters when you're using it for something beyond a tutorial.
When the web connector is the right call
Before getting into the how, a quick sanity check. The web connector is great for:
- Reference data that lives on a webpage and doesn't have an API. Things like postcode lookups, public statistics, exchange rates, regulator-published lists.
- Quick prototypes where you need to demonstrate a concept before someone commits to building a proper data pipeline.
- Augmenting internal data with public context. A sales report becomes more interesting when you can overlay it with population data or industry benchmarks.
It is not great for anything you'd describe as "important" that needs to refresh reliably for years. Webpages change. Tables get reformatted. CSS classes get renamed. Your beautiful Power BI report becomes a ticking time bomb.
If the data really matters and there's no API, talk to the source. Most government bodies will give you a direct CSV download URL if you ask, and a CSV download is dramatically more stable than scraping HTML.
Walking through it properly
The basic flow Microsoft describes is right. From the Home ribbon, dropdown Get data, pick Web, paste your URL, and Navigator opens with whatever tables Power BI can find on the page. You pick a table and click Transform Data to open Power Query Editor.
Sounds simple. Two things worth knowing that the tutorial glosses over.
First, the Navigator preview shows you the most recently selected table. So if you're hunting through 15 tables on a Wikipedia page trying to find the one with the data you want, you have to click each one and wait for the preview. There's no search. Bring patience. For pages with many tables, I sometimes use the browser developer tools to inspect the HTML first and figure out which table number I need before going into Power BI.
Second, Power BI's web connector can handle basic HTML pretty well, but it falls over with anything rendered by JavaScript. If the data appears in your browser after a network request fires, the web connector won't see it. You'll either get an empty result or a totally different page than what you expected. Right-click the page in your browser, view source, and search for some text from the table. If you can't find it in the raw HTML, the web connector won't either.
For JavaScript-heavy sites, you need a different approach. Either find the underlying API the page calls (network tab in browser dev tools), or use a service like a no-code scraper that renders pages first. We've built custom data ingestion solutions for clients where this came up, usually wrapped in an Azure Function that hits a real browser engine before handing the data to Power BI via a clean CSV in blob storage.
Cleaning up in Power Query Editor
Once you've got the table loaded into Power Query, the real work starts. Web tables are almost never in the shape you want. The Microsoft tutorial covers removing columns, filtering rows, renaming, and replacing values. All correct, all things you'll do.
A few patterns I've found useful that the tutorial doesn't mention:
Don't trust the first few rows. Web tables often have summary rows, header rows, or footer notes that get scraped as data. Look at the bottom of your imported table as well as the top. I've seen reports where the totals row from a webpage was being summed alongside the data rows for years before anyone noticed.
Promote headers early, but check what got promoted. Power BI's "Use First Row as Headers" is great when the first row really is headers. When the page has a merged header cell or a sub-header row, you end up with column names like "Column1.1" and "Result Sales". Worth a manual rename pass before doing anything else.
Use Replace Values for footnote markers. Wikipedia and government pages love their footnote markers like [a], [b], [c] tacked onto values. The tutorial mentions replacing "2020[c]" with "2021". In practice you want to strip these markers systematically. The Replace Values dialog supports regex if you tick the option in Advanced Options, which lets you nuke all bracket-and-letter patterns in one go with \[[a-z]\].
Always set explicit data types. Power BI tries to detect types from web sources and it's wrong often enough that you should never trust it. A year column might come in as text. A revenue column might come in as a date because one cell had a forward slash. Click each column, set its type explicitly, and validate that no errors appear in the column's quality indicator.
Authentication gotchas
The tutorial warns that "Web connections are only established using basic authentication" and that's basically true. If a page needs you to log in first, the web connector probably won't work. There are workarounds involving API keys in headers (the Advanced web connector options support this), but they're flaky.
If you need authenticated data, you're better off using a proper connector. Most SaaS tools either have a built-in Power BI connector or expose a REST API you can call from Power Query using Web.Contents() with authentication headers. We've done this for clients pulling data out of HubSpot, monday.com, Zendesk, and various Australian-specific tools. Worth the extra setup time because the result actually refreshes in the service.
One detail people miss: when you publish a report that uses the web connector, Power BI Service needs to be able to reach that URL on its own. If you're scraping an internal SharePoint page or an intranet site, the service can't get to it. You'd need an on-premises data gateway, and even then the web connector through a gateway has its own quirks.
Naming queries like an adult
When you finish your transformations in Power Query Editor, name the query something meaningful before clicking Close & Apply. The tutorial example calls it "Euro Cup Winners" which is fine. The default would have been "Table 3" or similar.
This matters more than people think. Six months later when you or a colleague opens this report to fix something, "Table 3" tells you nothing. "ASX_Top_200_Constituents" tells you exactly what's there. We've inherited some shocking Power BI reports from previous consultants where every query was called Query1, Query2, Query3. Don't be that person.
A naming convention that's served me well: source first, then content. So Wikipedia_UEFA_Winners or ABS_Population_NSW_2021 or Internal_Sales_FY26. When you scan the Queries pane, you immediately know what's from where.
Building visuals on top
Once your data lands in the report, the rest is normal Power BI work. Maps, charts, slicers, the usual. The Microsoft tutorial walks through creating a map and a pie chart from the Euro Cup data, which is fine for a tutorial.
One thing I'd push back on from the tutorial: pie charts. Don't use them. They make comparison hard, they don't scale beyond a few categories, and there's almost always a better choice. For category comparison, bar charts. For composition, stacked bars or treemaps. Pie charts have their place but it's narrow and the tutorial example isn't it.
If you're trying to learn the deeper craft of Power BI report design, look at things like deployment pipelines and proper modelling before chasing fancy visual types. There's a lot of Power BI consulting work we do that's purely about cleaning up the modelling layer so the visuals make sense in the first place.
When to stop scraping and build something real
I've used the web connector to win business. Show up to a client, they say "we wish we had X data but it's only on the regulator's website", you spin up a Power BI report in an hour that proves the concept. Now they want it productionised.
That's the line. Prototype with the web connector. If the report has real users and real consequences, replace the web scrape with a proper data ingestion. Schedule a job that pulls the source data into your warehouse or lakehouse, version it, validate it, alert when something changes. Then connect Power BI to your clean copy.
Many of our Microsoft Fabric implementations started exactly this way. Quick web-scraped Power BI proof of concept, then a proper data pipeline once the value was clear. The web connector is a starting point, not a finishing point.
Reference
Microsoft's original tutorial covers the basic mechanics: Tutorial - Analyze webpage data by using Power BI Desktop.
Worth doing once just to feel how the web connector works. Then come back to this post when you're trying to use it for something that matters.