Connecting Amazon S3 to Microsoft Fabric Data Factory - A Practical Guide
A surprising number of Australian businesses live in two cloud worlds at once. Their data platform is being built on Microsoft Fabric, but a big chunk of their actual data sits in Amazon S3 buckets, often because some earlier system was built on AWS and nobody has moved it. That is completely normal and not a problem to apologise for. The question that lands on my desk is always the same: how do we get the data from S3 into Fabric without building a fragile mess of scripts that breaks every second Tuesday?
The answer, more often than not, is the Amazon S3 connector in Fabric Data Factory. It is a built-in source connector that lets a Data Factory pipeline reach into an S3 bucket, read files, and pull them into the Fabric world (a Lakehouse, a Warehouse, or wherever you are landing data). No custom code, no Lambda functions, no homegrown sync job running on a forgotten VM.
Let me walk through how it actually works, where it shines, and the bits that trip people up.
Why this connector matters
Cross-cloud data movement used to be genuinely painful. You either wrote your own code against the AWS SDK, paid for a third-party tool, or set up some scheduled export that dumped files somewhere and hoped a downstream job picked them up. Every one of those approaches has a maintenance tail, and the homegrown ones tend to be understood by exactly one person who eventually leaves.
Having S3 as a first-class source inside Fabric Data Factory changes the economics. You configure a connection once, point a pipeline at it, and the movement becomes part of your normal orchestration rather than a separate thing bolted on the side. For clients who are standardising on Fabric but cannot rip everything out of AWS overnight (which is most of them), this is the bridge that makes a staged migration realistic.
We do a lot of this kind of plumbing in our Microsoft Fabric consulting work, and the pattern repeats: the data is in AWS, the analytics future is in Fabric, and the S3 connector is the join.
What it actually does
At its core the connector reads files from an S3 bucket. You give it the connection details, it authenticates against AWS, and then a Copy activity (or a dataflow) in your pipeline can treat that bucket like any other file source.
It handles the common file formats you would expect: delimited text (such as CSV), JSON, Parquet, ORC, and Avro, plus binary copy when you just want to move files byte-for-byte without parsing them. Parquet is the one I steer people towards when they have a choice, because it is columnar, compressed, and plays nicely with everything on the Fabric side. If your S3 data is sitting as enormous uncompressed CSVs, that is worth a conversation before you start moving terabytes around.
You can point it at a whole bucket, a folder prefix, or specific files. The wildcard and prefix support is genuinely useful for the standard "land a new file every day in a dated folder" pattern, because you can write a pipeline that grabs everything matching a path expression rather than hardcoding file names.
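To make the dated-folder pattern concrete, here is a minimal Python sketch of the matching logic. It does not call the connector or AWS at all. The folder layout (exports/YYYY/MM/DD/), the function names, and the sample keys are assumptions for illustration, and the connector's own wildcard rules may differ in detail.

```python
from datetime import date
from fnmatch import fnmatchcase


def dated_prefix(root: str, day: date) -> str:
    """Build a folder prefix such as 'exports/2024/03/05/' for a given day."""
    return f"{root.rstrip('/')}/{day:%Y/%m/%d}/"


def matching_keys(keys, prefix, pattern="*.parquet"):
    """Return keys directly under `prefix` whose file name matches `pattern`."""
    matches = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        # Skip anything in a nested subfolder; only match files in this folder.
        if "/" in name:
            continue
        if fnmatchcase(name, pattern):
            matches.append(key)
    return matches


# Hypothetical object keys, standing in for a bucket listing.
sample_keys = [
    "exports/2024/03/05/orders.parquet",
    "exports/2024/03/05/orders.csv",
    "exports/2024/03/04/orders.parquet",
    "exports/2024/03/05/archive/old.parquet",
]

prefix = dated_prefix("exports", date(2024, 3, 5))
print(matching_keys(sample_keys, prefix))
```

The point of the sketch is that the pipeline only ever needs the run date and a pattern, never a literal file name, which is what keeps the daily-drop pattern from needing edits every time a file is renamed upstream.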
Authentication, which is where people get stuck
The connector authenticates using AWS access keys, specifically an access key ID and a secret access key. This is the part where projects stall, not because it is hard, but because of how AWS permissions and security teams interact.
A few honest observations from doing this repeatedly:
Create a dedicated IAM user (or role-based credentials) for Fabric with read-only access to exactly the buckets it needs. Do not hand over an admin key because it was quicker. I have seen people do this, and it is the kind of shortcut that turns into an incident report later. The connector only needs to read, so scope it to read.
Store the secret in a way that does not end up in plain text somewhere. Fabric connections keep credentials in the connection object rather than scattering them through pipeline definitions, which is good, but the bigger risk is the key getting pasted into a Teams chat or a runbook during setup. Treat the AWS secret key like a password, because it is one.
The other recurring snag is the AWS side simply not being ready. Half the delay on these projects is waiting for someone with AWS access to create the IAM user and confirm the bucket policy. If you are scoping a project, sort that out early rather than discovering on day one that nobody knows who owns the AWS account.
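The read-only scoping described above can be sketched as an IAM policy document. This is a minimal sketch, not a verified permission set for the connector: the bucket name and prefix are hypothetical, the helper function is my own, and while s3:ListBucket and s3:GetObject are standard IAM action names, you should check the connector documentation for the exact actions it needs before handing this to a security team.

```python
import json


def read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document granting read-only access to one bucket.

    ASSUMPTION: listing the bucket and reading objects is enough for the
    use case; confirm the required actions against the connector docs.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Restrict object reads to the given prefix (or the whole
                # bucket when the prefix is empty).
                "Resource": [f"{bucket_arn}/{prefix}*"],
            },
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
policy = read_only_policy("example-exports-bucket", "exports/")
print(json.dumps(policy, indent=2))
```

Note what is absent: no write, delete, or wildcard actions, and no other buckets. That absence is the whole point of the exercise.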
Where it works well
For batch movement of files on a schedule, this connector is solid. Daily extracts, periodic dumps, bulk loads of historical data: that whole category is exactly what it is built for. Set up a pipeline, schedule it, and let it run. We have clients pulling daily files out of S3 into a Fabric Lakehouse where they get cleaned and modelled for reporting, and once it is configured it just works in the background.
The integration into the rest of Data Factory is the real benefit. Because the S3 read is just a source in a normal pipeline, you can chain it with transformations, write the result wherever you need, add error handling, and monitor it alongside everything else. It is not a separate tool you have to babysit. If you are already building out pipelines in Fabric, adding S3 as a source is a small step rather than a new project. Our Data Factory consulting work is largely about getting these pipelines designed so they stay maintainable as they grow, and S3 sources slot into that cleanly.
Where to be careful
Now the honest caveats, because no connector is free of sharp edges.
This is a batch tool, not a streaming one. If you are imagining files appearing in S3 and instantly showing up in Fabric, adjust that expectation. The connector reads when a pipeline runs. You can run pipelines often, but you are still on a schedule or a trigger, not a live feed. For most reporting and analytics that is completely fine. If you genuinely need near-real-time, that is a different architecture conversation.
Then there is the cost question that nobody enjoys: data egress. AWS charges per gigabyte for data transferred out of S3 to the internet, and a pipeline pulling files into Fabric is exactly that kind of transfer. If you are pulling large volumes across to Fabric regularly, those egress fees add up, and they are easy to forget until the AWS bill arrives. Before you set up a pipeline that drags terabytes across every night, work out whether you actually need all of it, or whether you can filter, aggregate, or move only the changed files. The cheapest data movement is the one you do not do.
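A back-of-envelope calculation is usually enough to make the egress conversation real. The sketch below is just arithmetic, assuming a flat per-GB rate. The rate shown is a placeholder rather than a quoted AWS price, the volumes are made up, and real pricing varies by region and is tiered by volume, so treat the output as an order-of-magnitude guide only.

```python
# ASSUMPTION: placeholder rate only. Look up the current per-GB data transfer
# price for your bucket's region on the AWS pricing page before relying on it.
ASSUMED_USD_PER_GB = 0.10


def monthly_egress_cost(gb_per_run, runs_per_month, usd_per_gb=ASSUMED_USD_PER_GB):
    """Rough monthly egress cost: volume per run x runs per month x price per GB."""
    if gb_per_run < 0 or runs_per_month < 0 or usd_per_gb < 0:
        raise ValueError("inputs must be non-negative")
    return gb_per_run * runs_per_month * usd_per_gb


# Hypothetical volumes: a full nightly copy versus only the changed files.
full = monthly_egress_cost(gb_per_run=500, runs_per_month=30)
changed_only = monthly_egress_cost(gb_per_run=20, runs_per_month=30)

print(f"Full nightly copy:  ${full:,.2f} per month")
print(f"Changed files only: ${changed_only:,.2f} per month")
```

Running the two scenarios side by side is the useful part: the gap between them is the budget you free up by filtering before you move anything.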
Region matters too. The further apart your S3 bucket and your Fabric capacity sit geographically, the more latency and potentially more cost you wear on every transfer. For Australian clients with buckets in an AWS region close to home, this is usually fine, but it is worth checking rather than assuming.
And keep an eye on schema drift. If the upstream AWS system changes its file format, adds a column, or renames something, your pipeline can break or silently load wrong data. This is not unique to S3, but cross-cloud setups make it harder to spot because the team that changed the file format often has no idea your Fabric pipeline depends on it. Build in some validation and do not assume the file looks the same today as it did at go-live.
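The validation does not need to be elaborate. As a minimal sketch, assuming a CSV file whose first row is a header, the check below compares the columns that arrived against the columns you expected. The column names and the function are hypothetical, and in a real pipeline this logic would sit in whatever validation step you run before loading.

```python
import csv
import io


def check_header(csv_text: str, expected: list[str]) -> dict:
    """Compare a CSV header row against the expected column names."""
    reader = csv.reader(io.StringIO(csv_text))
    actual = next(reader, [])  # an empty file yields an empty header
    return {
        "missing": [c for c in expected if c not in actual],
        "unexpected": [c for c in actual if c not in expected],
        # Same columns, different order: worth flagging for positional loads.
        "reordered": sorted(actual) == sorted(expected) and actual != expected,
    }


# Hypothetical expected schema and a file where upstream renamed a column
# and added a new one.
expected_columns = ["order_id", "customer", "amount"]
drifted_file = "order_id,customer_name,amount,currency\n1,Acme,10.50,AUD\n"

report = check_header(drifted_file, expected_columns)
print(report)

if report["missing"] or report["unexpected"] or report["reordered"]:
    print("Schema drift detected; fail the run rather than load bad data.")
```

Failing loudly on a mismatch is the design choice that matters here. A stopped pipeline gets noticed the same day, whereas a silently shifted column can sit in reports for weeks.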
How we approach it
When we set one of these up, the order is roughly: confirm AWS access and IAM permissions first, agree on file formats and folder structure, work out the volume and therefore the likely egress cost, then build the pipeline with proper logging and a sensible failure path. The actual connector configuration is the quick part. The thinking around it is where the value is.
If you are running a mixed AWS and Microsoft environment and trying to work out the cleanest way to get your S3 data into Fabric, that is exactly the sort of thing we help with. Get in touch and we can talk through whether the S3 connector is the right fit or whether your situation calls for something else. We would rather give you a straight answer than sell you a pipeline you do not need. You can also read more about how we approach data and AI projects across the Microsoft stack.
Reference: Amazon S3 connector overview, Microsoft Fabric Data Factory documentation.