Pulling Data From Webpages Into Power BI - When It Works and When It Will Bite You
I get asked about this feature more than I expected to when I started consulting on Power BI projects. A finance manager or operations lead will say something like "the data we need is on a public webpage, can we just pull it into Power BI?". The answer is yes, technically. The longer answer is yes but you need to know what you are signing up for.
Power BI Desktop has a Web connector that lets you point at a URL and import tables from the page. It has been there for years. It works well for what it is. It is also one of the most fragile data sources you can build a report on, and I want to talk about both sides of that honestly.
The basic mechanics
Open Power BI Desktop. On the Home ribbon, click Get data, then Web. A dialog asks for a URL. Paste it in. Click OK. After a few seconds the Navigator window appears, showing you the tables that Power BI managed to extract from the page.
The left side of the Navigator shows a list of available data elements - things like "Table 1", "Table 2", or sometimes more meaningful names if the page has structured data. Click any of them and the right side shows a preview. From here you have two choices. Click Load to bring the data into the model as it is, or click Transform Data to open Power Query Editor and clean it up first.
In practice we almost always click Transform Data. Web data is rarely clean enough to use directly. There will be merged header rows, footer rows that have crept in, columns that came across as text when they should be numbers, and stray asterisks or footnote markers that need stripping. Power Query handles all of this, but you need to know it is coming.
Once the data is in, it behaves like any other source. You can build relationships to it, write DAX against it, drop it onto the canvas. The connector is just the front door.
What actually works well
Static reference data is the sweet spot. Public statistics, lookup tables, country code lists, ABS reference data that does not change often, that sort of thing. Pull it in, refresh it occasionally, get on with your life. We have used the Web connector for things like exchange rate snapshots, RBA cash rate history, and various government published reference tables for clients in the financial services space.
Pages that publish proper HTML tables with semantic markup work the best. The connector parses the document object model and finds elements that look like tables. When the publisher of the page has done their HTML properly, you get clean rectangular data on the first try.
For a few quick analyses or one off reports, it is genuinely useful. A research analyst at a retail client we worked with needed to pull product information from a public catalogue site for a competitive pricing exercise. The whole thing was set up in fifteen minutes. The report was used once, then archived. That is the right use case.
Where it gets ugly
JavaScript heavy pages are where the connector struggles. If the data on the page is loaded dynamically by client side code after the initial HTML loads, the Web connector typically does not see it. The connector fetches the raw HTML response from the server, which on a modern web app might be a nearly empty shell with a script tag that builds everything later. You open Navigator and there is nothing useful in there.
There is a newer Web by example option that uses inference to extract data from rendered pages, and it is better at handling complex layouts than the original connector. It is still not a substitute for a real API, and it still chokes on heavily dynamic sites. If you find yourself fighting this for more than half an hour, the answer is almost always to stop and find a different data source.
The other issue is fragility. Web pages change. The publisher reorganises their site. They add a new column. They merge two tables. They rename a section. Your refresh fails or, worse, refreshes successfully but pulls the wrong data because Power Query is now binding to a table that has different semantics than it had last month.
We have seen this happen on a client report that pulled commodity prices from a public site. The site got redesigned, the table structure changed, and the report kept refreshing without errors but with subtly wrong numbers for two weeks before anyone noticed. That is the nightmare scenario. The fix was to switch the data source to a paid API, which should have been the answer from day one.
Authentication and privacy levels
When you connect to a web page, Power BI will sometimes ask about authentication. For most public pages it is Anonymous. For pages behind a login you can set Basic, Windows, or Web API key authentication. The connector handles common patterns but it is not magic. If a site uses some bespoke OAuth flow or a CAPTCHA, the connector will not handle it.
Privacy levels are another quiet trap. If you combine web data with other sources in the same query and the privacy settings do not match, you can get a Formula.Firewall error that blocks the refresh entirely. The fix is to set the privacy level on the Web connection appropriately - usually Public for public web pages - through Data source settings. We see this constantly when a junior developer adds a web source to a model and the whole thing breaks on the next refresh in the Power BI Service.
The certificate revocation switch
There is a setting in File > Options and settings > Options > Security called "Check if your certificates have been revoked". By default it is on, which is what you want. Power BI checks the certificate of the server you are connecting to and refuses the connection if it has been revoked.
In some scenarios, particularly when you are using a debugging proxy like Fiddler to capture web traffic, the certificate check causes problems and you may need to turn it off temporarily. The Microsoft documentation calls this out. My advice is to turn it back on the minute you are done. Leaving certificate revocation checking off in a corporate environment is the kind of thing that will get flagged in a security audit and rightly so.
When to use a Power Query function for the URL
If you are pulling from a paginated source or a set of pages with similar structure, you can wrap the Web.Contents call in a Power Query function and invoke it for each URL. We do this for clients who need to scrape index pages where data is spread across multiple URLs. It is more work than the basic connector but still less work than building a custom data pipeline.
The catch is that the Power BI Service has tight rules about dynamic data sources. If the URL is constructed dynamically inside the M code, the service may refuse to refresh it because it cannot evaluate the privacy settings statically. The workaround is to pass the URL as a parameter and use the RelativePath option in Web.Contents. This pattern is well documented in the Microsoft Learn articles but it is the kind of thing that catches developers out when their report works in Desktop but refuses to refresh in the cloud.
What we recommend to clients
If a customer is considering using web scraping as a long term data source, we usually push back hard. The cost of maintaining a scraping pipeline over years almost always exceeds the cost of using a proper API or buying the data from a vendor. We have seen clients with five or six "quick web pulls" that turned into a maintenance burden the data team eventually had to absorb.
For one off analyses and quick research, the Web connector is genuinely handy and we use it ourselves. For dashboards that real business decisions depend on, find an API. If the data is not available through an API, ask the publisher. Most reasonable organisations will provide structured data if you ask nicely or pay them a small fee. It is almost always cheaper than the alternative.
When we help clients architect their analytics platforms through our data engineering services, the conversation about data source quality comes up early and it is not negotiable. The cleanest dashboard in the world is worthless if the data behind it is unreliable, and web scraped data is fundamentally unreliable in a way that even bad APIs are not.
Worth knowing, worth using carefully
The Web connector is one of those Power BI features that is exactly as good as it needs to be for the use case it was built for. It is not a replacement for proper integration. It is a useful tool for quick exploration and one off pulls.
If you are an Australian organisation thinking about how to bring more data sources into your Power BI estate, the right starting point is a clear understanding of which sources you really need to integrate, how often they change, and how much you are willing to spend to maintain them. Our team helps clients work through these decisions on Microsoft Fabric and Power BI projects all the time. If you want a sensible second opinion on your data sourcing strategy, get in touch.
Reference - Connect to webpages from Power BI Desktop (Microsoft Learn)