Azure Blob Storage integration

Azure Blob Storage pipelines,
described in plain English.

Land PostgreSQL, MySQL, and SaaS data in Azure Blob as Parquet, JSON, or CSV — CDC streaming or scheduled snapshots — and move raw files byte-for-byte between Azure, S3, and GCS. Live on rsync.ai Cloud, no per-row fees.

TL;DR

rsync.ai writes to Azure Blob Storage two ways: structured output (Parquet, JSON, or CSV with a schema manifest, sourced from Postgres/MySQL CDC or snapshots and partitioned on ADLS Gen2 for Synapse and Fabric) and byte-identical blob passthrough (copy any object between Azure, S3, and GCS). Authenticate with an access key, SAS token, or managed identity; PII columns are masked before the write.

  • Parquet, JSON, or CSV — ADLS Gen2 & Hive partitioning
  • CDC streaming or scheduled snapshots — or both
  • Access key, SAS token, or managed-identity auth
  • Blob passthrough between Azure Blob, S3, and GCS

What the Azure Blob connector does

Structured exports for analytics, and raw blob passthrough for everything else.

Structured exports

Postgres & MySQL tables to Parquet, JSON, or CSV with a schema manifest.

Blob passthrough

Copy any object byte-for-byte between Azure, S3, and GCS — SHA-256 verified.

PII-safe

Mask or hash sensitive columns before a single byte lands in your container.
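
As a rough illustration of masking or hashing columns before the write, the sketch below applies per-column rules to one row. The rule names (`mask`, `hash`), the `protect_row` helper, and the keyed-hash choice are assumptions made for this example; they are not rsync.ai's documented configuration.

```python
import hashlib
import hmac

# Hypothetical rule names, used only in this sketch.
MASK = "mask"
HASH = "hash"


def protect_row(row: dict, rules: dict, secret: bytes) -> dict:
    """Return a copy of `row` with sensitive columns masked or hashed.

    Columns without a rule, and NULL values, pass through unchanged.
    """
    out = {}
    for column, value in row.items():
        rule = rules.get(column)
        if value is None or rule is None:
            out[column] = value
        elif rule == MASK:
            # Replace every character so only the length of the value survives.
            out[column] = "*" * len(str(value))
        elif rule == HASH:
            # Keyed SHA-256 (HMAC): the same input always maps to the same
            # token, so joins still work, but the raw value is not stored.
            out[column] = hmac.new(
                secret, str(value).encode("utf-8"), hashlib.sha256
            ).hexdigest()
        else:
            raise ValueError(f"unknown rule {rule!r} for column {column!r}")
    return out


row = {"id": 7, "email": "a@b.co", "ssn": "123-45-6789"}
safe = protect_row(row, {"email": HASH, "ssn": MASK}, secret=b"demo-key")
```

Running the transform on each row before serialization means the unmasked value never reaches the file that is uploaded to the container.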

  • Parquet (Snappy), JSON, or CSV output
  • Date / hour / Hive-style partitioning
  • SAS token or managed-identity auth
  • PII masking before the Azure write
  • Resumable block-blob uploads
  • SHA-256 integrity on every object
  • Blob passthrough: Azure ↔ S3 ↔ GCS
  • No per-row or per-GB pricing

rsync.ai vs. Fivetran, Airbyte, and custom scripts for Azure Blob

What you give up — and gain — choosing rsync.ai for pipelines into Azure Blob Storage.

Features compared across rsync.ai, Fivetran, Airbyte, and custom scripts:

  • Plain-English pipeline setup
  • CDC streaming to Azure Blob (Postgres & MySQL)
  • Parquet output with schema manifest
  • Blob passthrough (Azure ↔ S3 ↔ GCS)
  • PII masking before write
  • No per-row / per-MAR pricing
  • Resumable snapshots (no restart on failure)

Azure Blob Storage pipelines — frequently asked

What can rsync.ai write to Azure Blob Storage?

Structured data and raw files. PostgreSQL and MySQL tables (via CDC or snapshot), along with data from any other source, land as Parquet, JSON, or CSV with a schema manifest, ready for Azure Synapse, Microsoft Fabric, or Databricks. Separately, blob passthrough copies any object byte-for-byte from S3 or GCS into Azure Blob without re-encoding.

How does rsync.ai authenticate to Azure Blob?

Use a storage-account access key, a scoped SAS token, or — if rsync.ai runs inside Azure — a managed identity so no long-lived secret is stored. You point rsync.ai at the storage account and container; the path layout and access tier are yours to configure.
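
To make the three options concrete, here is a minimal sketch of a destination description that insists on exactly one auth method. The `AzureBlobTarget` class and its field names are hypothetical and do not reflect rsync.ai's actual configuration format; the only real-world detail assumed is the standard `https://<account>.blob.core.windows.net` endpoint form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AzureBlobTarget:
    """Hypothetical description of an Azure Blob destination."""

    account: str
    container: str
    access_key: Optional[str] = None      # storage-account access key
    sas_token: Optional[str] = None       # scoped, expiring SAS token
    use_managed_identity: bool = False    # only meaningful when running inside Azure

    def auth_mode(self) -> str:
        """Return the single configured auth method, or raise if ambiguous."""
        candidates = (
            ("access_key", bool(self.access_key)),
            ("sas_token", bool(self.sas_token)),
            ("managed_identity", self.use_managed_identity),
        )
        chosen = [name for name, enabled in candidates if enabled]
        if len(chosen) != 1:
            raise ValueError(
                f"expected exactly one auth method, got {chosen or 'none'}"
            )
        return chosen[0]

    def account_url(self) -> str:
        # Standard public-cloud Blob endpoint for a storage account.
        return f"https://{self.account}.blob.core.windows.net"


target = AzureBlobTarget("mylake", "raw", use_managed_identity=True)
mode = target.auth_mode()
```

Rejecting ambiguous configurations up front avoids silently falling back to a long-lived key when a managed identity was intended.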

Does it work with ADLS Gen2 and hierarchical namespaces?

Yes. Azure Data Lake Storage Gen2 (hierarchical namespace on Blob) is supported, including Hive-style partitioned Parquet that Synapse serverless SQL pools and Microsoft Fabric read directly. A JSON schema manifest is written alongside each batch for schema-evolution discovery.
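
For readers unfamiliar with Hive-style layouts, the sketch below shows how a partitioned object path and a JSON schema manifest might be built. The `key=value` directory convention is the standard Hive one; the file-naming pattern, the manifest fields, and both helper functions are assumptions for illustration, not rsync.ai's documented output format.

```python
import json
from datetime import datetime, timezone


def partition_path(table: str, ts: datetime, batch: int, by_hour: bool = True) -> str:
    """Build a Hive-style path such as table/date=YYYY-MM-DD/hour=HH/part-N.parquet.

    `ts` should be timezone-aware; it is normalized to UTC so that every
    writer agrees on which partition a batch belongs to.
    """
    ts = ts.astimezone(timezone.utc)
    parts = [table, f"date={ts:%Y-%m-%d}"]
    if by_hour:
        parts.append(f"hour={ts:%H}")
    parts.append(f"part-{batch:05d}.parquet")
    return "/".join(parts)


def schema_manifest(table: str, columns: dict) -> str:
    """Serialize a minimal, hypothetical schema manifest for one batch."""
    manifest = {
        "table": table,
        "columns": [{"name": name, "type": typ} for name, typ in columns.items()],
    }
    return json.dumps(manifest, indent=2)


when = datetime(2025, 3, 4, 9, 30, tzinfo=timezone.utc)
path = partition_path("orders", when, batch=3)
manifest = schema_manifest("orders", {"id": "int64", "total": "double"})
```

Engines that understand Hive partitioning can prune on `date` and `hour` from the path alone, without opening the Parquet files.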

Is Azure Blob a source or a destination?

Both. Azure Blob is typically a destination for data-lake and archival workloads, but rsync.ai can also read objects from Azure Blob and move them to another store (S3, GCS, or another container) with byte-identical blob passthrough. Blob → relational database is intentionally rejected — a raw binary can't be written to a table row without parsing.
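
The passthrough rules above can be sketched as follows: bytes are streamed unchanged while a SHA-256 digest is computed, the landed copy is re-hashed and compared, and routes into a relational database are refused. In-memory streams stand in for the real object stores, and the helper names and store identifiers are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK = 1024 * 1024  # read 1 MiB at a time so large objects never sit fully in memory
OBJECT_STORES = {"azure_blob", "s3", "gcs"}  # hypothetical identifiers


def check_route(source: str, dest: str) -> None:
    """Reject blob -> relational routes: raw bytes cannot become table rows."""
    if source in OBJECT_STORES and dest not in OBJECT_STORES:
        raise ValueError(f"blob passthrough from {source} to {dest} is not supported")


def sha256_of(stream: BinaryIO) -> str:
    digest = hashlib.sha256()
    while chunk := stream.read(CHUNK):
        digest.update(chunk)
    return digest.hexdigest()


def passthrough(src: BinaryIO, dst: BinaryIO) -> str:
    """Copy src to dst without re-encoding and verify the result by SHA-256."""
    sent = hashlib.sha256()
    while chunk := src.read(CHUNK):
        sent.update(chunk)
        dst.write(chunk)
    dst.seek(0)  # re-read what actually landed
    landed = sha256_of(dst)
    if sent.hexdigest() != landed:
        raise IOError("SHA-256 mismatch after copy")
    return landed


check_route("s3", "azure_blob")
payload = bytes(range(256)) * 4096
copy = io.BytesIO()
checksum = passthrough(io.BytesIO(payload), copy)
```

Hashing while streaming keeps the source to a single read; only the destination is read back for verification.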

Do I have to deploy anything to use the Azure Blob connector?

No. rsync.ai Cloud is live at app.rsync.ai — sign up free and build an Azure Blob pipeline in minutes, nothing to provision. If you'd rather run the whole stack inside your own VPC, self-hosting (source-available, Elastic License 2.0) arrives July 2026.