Using the Amazon S3 Compatible Connector in Microsoft Fabric Data Factory
There is a connector in Microsoft Fabric Data Factory that confuses people the first time they see it, and it is worth clearing up because the confusion costs real time on projects. Next to the Amazon S3 connector sits a second one called "Amazon S3 Compatible". Same logo more or less, almost the same name, and a lot of people pick one at random and then wonder why their connection will not authenticate.
So let me sort that out. The S3 Compatible connector is not for AWS. It is for everything else that pretends to be AWS. There is a whole category of object storage that speaks the Amazon S3 API without being Amazon: MinIO running in someone's data centre, Cloudflare R2, Wasabi, Backblaze B2, Dell ECS, NetApp StorageGRID, and a long tail of on-premises appliances that vendors shipped with an S3-compatible front door. If your data lives in one of those, this is the connector you want.
We hit this more than you would expect. Plenty of Australian businesses have an on-prem MinIO cluster holding application data, or they moved to Cloudflare R2 to dodge AWS egress fees, and now they want that data in Fabric for reporting. The S3 Compatible connector is the bridge.
What it actually is
At a technical level it is a source connector that reads files from any storage system implementing the S3-compatible API. You give it three things: the service URL (the endpoint of your storage), an access key ID, and a secret access key. From there a Copy activity or a dataflow in your pipeline treats that storage like any other file source.
The key difference from the plain Amazon S3 connector is that service URL field. With actual AWS, the endpoint is known and the connector works it out from the region. With S3-compatible storage, you have to tell it exactly where the storage lives, because it could be https://minio.yourcompany.com.au or an R2 endpoint or a box sitting in a rack in your Sydney office. Get that URL wrong and nothing works, which is the single most common reason these connections fail on the first try.
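Because so many first-try failures come down to that one field, it can be worth sanity-checking the URL before pasting it into the connection dialog. The sketch below is a hypothetical helper of my own, not anything Fabric provides, and it assumes the endpoint is normally just a scheme, a host and an optional port. If your storage really does need a path in the endpoint, ignore that warning.

```python
from urllib.parse import urlsplit


def check_service_url(url: str) -> list:
    """Return a list of likely problems with an S3-compatible service URL.

    Hypothetical pre-flight check; an empty list means nothing obvious is wrong.
    """
    problems = []
    parts = urlsplit(url.strip())

    if parts.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    if not parts.hostname:
        problems.append("URL has no host name")
    if parts.scheme == "http":
        problems.append("plain HTTP: confirm the storage is not served over HTTPS")
    if parts.path not in ("", "/"):
        problems.append("unexpected path: the endpoint is usually just host and port")

    try:
        parts.port  # accessing .port raises ValueError if it is not a valid number
    except ValueError:
        problems.append("port is not a valid number")

    return problems


if __name__ == "__main__":
    for candidate in ("https://minio.yourcompany.com.au:9000", "minio.yourcompany.com.au"):
        print(candidate, "->", check_service_url(candidate) or "looks plausible")
```

This only catches malformed URLs. It cannot tell you whether the URL is the right one for your storage, which still has to come from whoever runs it.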
It handles the file formats you would expect from any Fabric source. CSV and other delimited text, JSON, Parquet, ORC, Avro, plus binary copy when you just want to shift files byte-for-byte without parsing them. If you have a choice in the matter, Parquet is the one I push people towards, because it is compressed and columnar and behaves itself on the Fabric side. Uncompressed CSV at volume is a slow, expensive habit.
When you would reach for it
The honest answer is: when your storage is not AWS but talks like it is. A few patterns we see regularly.
On-prem MinIO is the big one. A lot of teams stood up MinIO years ago as a cheap, self-hosted object store for application data or backups, and it quietly accumulated data that nobody was reporting on. When the analytics conversation starts, this connector lets you pull that data into a Fabric Lakehouse without exporting it manually or writing a sync script that one person understands and nobody can maintain.
Cloudflare R2 is the next most common. People move to R2 specifically because it has no egress fees, which matters a great deal if you are reading data out regularly. If that is your storage, the S3 Compatible connector points straight at the R2 endpoint and you are away.
The rest are the various enterprise storage appliances. If your infrastructure team bought a storage system with an S3-compatible API, this is how Fabric reads from it without anyone having to build middleware.
We do a fair bit of this stitching in our Microsoft Fabric consulting work. The pattern is nearly always the same: data sitting in some object store that is not part of the Microsoft world, and a business that wants it modelled and reported in Fabric without a rebuild.
Authentication, and where it goes sideways
The connector uses an access key ID and a secret access key, the same model as AWS. That part is familiar. The trouble is usually one of three things.
First, the service URL. I said it above and I will say it again because it is genuinely the number one issue. The endpoint has to be exact, including whether it is HTTPS, the right port if your storage runs on a non-standard one, and the right addressing style. Some S3-compatible systems use path-style addressing (the bucket name in the URL path) rather than virtual-hosted style (the bucket name as a subdomain), and if your storage expects one and the connector assumes the other, you get authentication errors that look like a credentials problem but are not.
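If the two addressing styles are new to you, it helps to see them side by side. This is a small illustration only: the function name, bucket and key are made up, and it simply builds the URL an S3 client would request under each style.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def object_urls(service_url: str, bucket: str, key: str) -> dict:
    """Build the request URL for one object under both S3 addressing styles."""
    parts = urlsplit(service_url)
    encoded_key = quote(key, safe="/")  # keep folder separators, escape the rest

    # Path-style: the bucket is the first segment of the URL path.
    path_style = urlunsplit(
        (parts.scheme, parts.netloc, f"/{bucket}/{encoded_key}", "", "")
    )
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
    virtual_hosted = urlunsplit(
        (parts.scheme, f"{bucket}.{parts.netloc}", f"/{encoded_key}", "", "")
    )
    return {"path_style": path_style, "virtual_hosted": virtual_hosted}


if __name__ == "__main__":
    urls = object_urls(
        "https://minio.yourcompany.com.au:9000", "sales", "2024/orders.parquet"
    )
    for style, url in urls.items():
        print(f"{style}: {url}")
```

Notice that virtual-hosted style needs DNS (and a certificate) that covers the bucket subdomain. That is why self-hosted stores are often set up for path-style, and why a mismatch tends to show up as a confusing failure rather than a clear one.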
Second, certificates. On-prem and self-hosted storage often runs with a self-signed or internal certificate. If Fabric cannot validate the certificate chain, the connection fails. This is solvable, but it is the kind of thing that turns a ten-minute setup into a half-day of back-and-forth with the infrastructure team. Sort out the certificate situation before you assume the connector is broken.
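A quick way to see the certificate situation for yourself is to attempt a TLS handshake with normal validation switched on. The helper below is a rough diagnostic sketch with a made-up name, and it comes with one big assumption: it validates against the trust store of the machine you run it on, not Fabric's. Run it somewhere that does not have your internal CA installed, otherwise a pass tells you very little.

```python
import socket
import ssl
from typing import Optional


def tls_problem(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Return None if the certificate chain validates, else a short description.

    Uses the local machine's trusted CAs, so results depend on where it runs.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # The handshake happens here; hostname and chain are both verified.
            with context.wrap_socket(raw, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as err:
        return f"certificate not trusted: {err.verify_message}"
    except OSError as err:
        # Covers refused connections, timeouts and other TLS failures.
        return f"could not complete TLS handshake: {err}"


if __name__ == "__main__":
    # Hypothetical endpoint; substitute your own host and port.
    print(tls_problem("minio.yourcompany.com.au", 9000) or "certificate validates")
```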
Third, network reachability. If the storage is on-prem and not exposed to the public internet, Fabric in the cloud cannot reach it directly. You will need a data gateway (in Fabric, usually the on-premises data gateway) or some other network path that lets the Fabric service talk to your internal storage. This is the one that catches people who tested everything from inside the office network and assumed it would just work from the cloud. It will not, unless there is a route.
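The reachability question can be answered in a few lines, as long as you ask it from the right place. The sketch below is a plain TCP check with a hypothetical function name. It only tells you whether the machine running it can open a connection, so run it from the gateway host or from somewhere outside the office network, not from your own desk.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failure, refused connection, timeout, or no route to the host.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint; substitute your own host and port.
    print("reachable" if can_reach("minio.yourcompany.com.au", 9000) else "no route")
```

A successful TCP connection says nothing about certificates or credentials. It just rules out the network as the cause.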
Scope the credentials to read-only and to exactly the buckets the pipeline needs. Do not hand over an admin key because it was faster during setup. That shortcut shows up in a security review later, and it is never a good look.
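As a rough picture of what "read-only and exactly these buckets" looks like, here is a sketch that builds an AWS-style policy document. It assumes your storage accepts that policy format, which MinIO does. Other providers typically scope keys through their own console or API instead, so treat this as an illustration of the shape rather than something to paste in. The function name and bucket name are made up.

```python
import json


def read_only_policy(buckets: list) -> dict:
    """Build an AWS-style policy granting list and read access to the given buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions: find the bucket and list its contents.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{name}" for name in buckets],
            },
            {
                # Object-level permission: read only, no write or delete.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{name}/*" for name in buckets],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_policy(["app-exports"]), indent=2))
```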
Where it works well, and where to be careful
For scheduled batch movement, this connector is dependable. Daily extracts, periodic dumps, bulk historical loads, that whole category is what it is built for. Once it is configured and the network path is sorted, it runs in the background as part of your normal orchestration. Because the read is just a source in a standard pipeline, you can chain transformations after it, write the result into a Lakehouse or Warehouse, add error handling, and monitor it next to everything else. That integration is the real win. Our Data Factory consulting work is mostly about getting these pipelines designed so they stay maintainable as they grow rather than turning into a pile of one-off jobs.
Now the caveats.
It is a batch tool, not a streaming one. The connector reads when a pipeline runs. You can trigger pipelines frequently, but you are still on a schedule, not a live feed. For reporting and analytics that is almost always fine. If you genuinely need near-real-time data, that is a different architecture and a different conversation.
Watch the cost of moving data, but check who is charging you. The reason people pick R2 or self-hosted MinIO is often to avoid AWS-style egress fees. If your storage genuinely has no egress cost, great, move what you like. If it is a cloud provider that does charge, the same rule applies as with AWS: filter, aggregate, or move only changed files rather than dragging everything across every night.
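"Move only changed files" usually means keeping a watermark, the timestamp of the last successful run, and picking up only objects modified after it. The sketch below shows that logic on its own with hypothetical names and sample data. In a real pipeline the filtering would be done by whatever last-modified options the source settings offer, or by a listing step ahead of the copy.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StoredObject:
    """Minimal stand-in for one entry in a bucket listing."""
    key: str
    last_modified: datetime
    size_bytes: int


def changed_since(objects: list, watermark: datetime) -> list:
    """Keep only objects modified strictly after the last successful run."""
    return [obj for obj in objects if obj.last_modified > watermark]


if __name__ == "__main__":
    listing = [
        StoredObject("orders/2024-05-01.parquet", datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc), 10_000_000),
        StoredObject("orders/2024-05-02.parquet", datetime(2024, 5, 2, 2, 0, tzinfo=timezone.utc), 12_000_000),
    ]
    last_run = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    to_copy = changed_since(listing, last_run)
    print([obj.key for obj in to_copy], sum(obj.size_bytes for obj in to_copy), "bytes")
```

Keep the timestamps timezone-aware and in UTC on both sides. Comparing a local-time watermark against UTC listing times is an easy way to skip or re-copy files without noticing.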
Schema drift is the quiet killer. If the upstream system changes a file format, adds a column, or renames a field, your pipeline can break or silently load wrong data. Cross-system setups make this harder to catch, because the team that owns the storage often has no idea your Fabric pipeline depends on the file shape. Build in some validation and do not assume today's file looks like the one you tested at go-live.
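Validation does not have to be elaborate to be useful. As a minimal sketch, assuming delimited text with a header row, the check below compares the columns you expect against what actually arrived. The function and column names are hypothetical. For Parquet or other formats you would compare the file's schema instead, but the idea is the same.

```python
import csv
import io


def header_drift(expected: list, csv_text: str) -> dict:
    """Compare expected column names against the header row of a CSV file."""
    reader = csv.reader(io.StringIO(csv_text))
    actual = next(reader, [])  # an empty file yields an empty header
    return {
        "missing": [col for col in expected if col not in actual],
        "unexpected": [col for col in actual if col not in expected],
    }


if __name__ == "__main__":
    sample = "id,amount_aud,region\n1,19.95,NSW\n"
    drift = header_drift(["id", "amount"], sample)
    if drift["missing"] or drift["unexpected"]:
        print("schema drift detected:", drift)
```

Failing the pipeline loudly on a missing column is nearly always better than loading the file and finding out from a report a month later.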
How we approach it
The order we work in: confirm exactly what storage system it is and get the precise service URL, sort out certificates and network reachability, agree on file formats and folder structure, then build the pipeline with proper logging and a sensible failure path. The connector configuration itself is the quick bit. Everything around it is where projects actually succeed or stall.
If you have data sitting in MinIO, R2, Wasabi or some other S3-compatible store and you are trying to get it into Fabric cleanly, that is squarely the kind of problem we help with. Get in touch and we will give you a straight answer about whether this connector fits or whether your setup needs something different. You can also read more about how we handle data and AI work across the Microsoft stack.
Reference: Amazon S3 Compatible connector overview, Microsoft Fabric Data Factory documentation.