AWS S3 data pipelines,
described in plain English.
Land PostgreSQL, MySQL, and SaaS data in S3 as Parquet, JSON, or CSV — CDC streaming or scheduled snapshots — and move raw files byte-for-byte between S3, GCS, and Azure. Live on rsync.ai Cloud, no per-row fees.
rsync.ai writes to AWS S3 in two ways: structured output (Parquet, JSON, or CSV with a schema manifest, from Postgres/MySQL CDC or snapshots) and byte-identical blob passthrough (copy any object between S3, GCS, and Azure). Path layout, partitioning, and format are configurable; PII columns are masked before the write. Works with S3-compatible stores like MinIO, R2, and Wasabi.
- Parquet, JSON, or CSV — date / hour / Hive partitioning
- CDC streaming or scheduled snapshots — or both
- IAM role or access-key auth — MinIO & R2 supported
- Blob passthrough between S3, GCS, and Azure Blob
Popular pipelines into S3
Step-by-step guides for the most common sources. Any other source works too — just describe it.
PostgreSQL → S3
Stream Postgres CDC changes or take consistent snapshots to S3 as Parquet, JSON, or CSV. Date or hour partitioning, Hive-style layout for Athena and Glue.
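To make the Hive-style layout concrete, here is a minimal Python sketch of how a date- or hour-partitioned object key could be built. The function name, the `dt=` and `hour=` partition names, and the default file name are illustrative assumptions, not rsync.ai's actual path scheme, which is configurable.

```python
from datetime import datetime, timezone


def hive_partition_key(
    prefix: str,
    table: str,
    ts: datetime,
    hourly: bool = False,
    filename: str = "part-0000.parquet",  # hypothetical default file name
) -> str:
    """Build a Hive-style object key such as prefix/table/dt=YYYY-MM-DD/hour=HH/file."""
    # Normalize to UTC so the same event always lands in the same partition.
    ts = ts.astimezone(timezone.utc)
    parts = [prefix.strip("/"), table, f"dt={ts:%Y-%m-%d}"]
    if hourly:
        parts.append(f"hour={ts:%H}")
    parts.append(filename)
    # Skip empty segments (for example, an empty prefix).
    return "/".join(p for p in parts if p)


if __name__ == "__main__":
    event_time = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
    print(hive_partition_key("raw/", "orders", event_time, hourly=True))
```

Keys in the `name=value` form are what lets Athena and Glue treat each path segment as a partition column.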
MySQL → S3
Export MySQL tables to S3 with binlog-based CDC or scheduled snapshots. Schema inferred from MySQL column types, written as Snappy-compressed Parquet by default.
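As a rough illustration of inferring a schema from MySQL column types, the Python sketch below maps a column type string to a Parquet type name. The mapping table and the `infer_parquet_type` helper are assumptions made for this example; the rules rsync.ai actually applies are not documented here and may differ.

```python
import re

# Hypothetical mapping from MySQL base types to Parquet type names.
MYSQL_TO_PARQUET = {
    "tinyint": "INT32",
    "smallint": "INT32",
    "mediumint": "INT32",
    "int": "INT32",
    "integer": "INT32",
    "bigint": "INT64",
    "float": "FLOAT",
    "double": "DOUBLE",
    "decimal": "DECIMAL",
    "date": "DATE",
    "datetime": "TIMESTAMP",
    "timestamp": "TIMESTAMP",
    "char": "STRING",
    "varchar": "STRING",
    "text": "STRING",
    "json": "STRING",
    "blob": "BYTE_ARRAY",
    "varbinary": "BYTE_ARRAY",
}


def infer_parquet_type(mysql_type: str) -> str:
    """Map a MySQL column type such as 'varchar(255)' to a Parquet type name."""
    normalized = mysql_type.strip().lower()
    match = re.match(r"[a-z]+", normalized)
    if match is None:
        raise ValueError(f"unrecognized MySQL type: {mysql_type!r}")
    base = match.group(0)
    # tinyint(1) is the conventional MySQL encoding of a boolean.
    if normalized.startswith("tinyint(1)"):
        return "BOOLEAN"
    # An unsigned 32-bit int can exceed the signed INT32 range, so widen it.
    # (Unsigned bigint is not handled specially in this simplified sketch.)
    if base in ("int", "integer") and "unsigned" in normalized:
        return "INT64"
    # Unknown types fall back to STRING rather than failing the export.
    return MYSQL_TO_PARQUET.get(base, "STRING")
```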
What the S3 connector does
Structured exports for analytics, and raw blob passthrough for everything else.
Structured exports
Postgres & MySQL tables to Parquet, JSON, or CSV with a schema manifest.
Blob passthrough
Copy any object byte-for-byte between S3, GCS, and Azure — SHA-256 verified.
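A SHA-256 verified copy can be sketched in a few lines of Python: hash the source while streaming it, then re-read the destination and compare digests. This version works on local file-like objects only; the function names are illustrative, and how rsync.ai performs the check across cloud providers is not specified here.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of everything left to read in the stream."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Copy src to dst in chunks, then confirm dst hashes to the same digest.

    dst must be readable and seekable so it can be re-read after the write,
    and is assumed to be empty and positioned at offset 0 on entry.
    """
    source_digest = hashlib.sha256()
    for chunk in iter(lambda: src.read(chunk_size), b""):
        source_digest.update(chunk)
        dst.write(chunk)
    dst.seek(0)
    written_digest = sha256_of_stream(dst, chunk_size)
    if written_digest != source_digest.hexdigest():
        raise IOError("checksum mismatch: destination differs from source")
    return written_digest


if __name__ == "__main__":
    source = io.BytesIO(b"example object bytes")
    target = io.BytesIO()
    print(copy_and_verify(source, target))
```

Hashing during the copy avoids a second pass over the source, which matters when the source is a large remote object.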
PII-safe
Mask or hash sensitive columns before a single byte lands in your bucket.
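The sketch below shows one way masking and hashing could be applied to a row before it is written. The rule names (`"mask"`, `"hash"`), the keyed HMAC-SHA-256 hashing, and the helper functions are assumptions chosen for illustration, not rsync.ai's configuration format or algorithm.

```python
import hashlib
import hmac


def mask_value(value: str, keep_last: int = 4) -> str:
    """Replace all but the last keep_last characters with '*'."""
    # Values no longer than keep_last are fully masked so nothing leaks.
    visible = value[-keep_last:] if keep_last > 0 and len(value) > keep_last else ""
    return "*" * (len(value) - len(visible)) + visible


def hash_value(value: str, key: bytes) -> str:
    """Keyed SHA-256 (HMAC) so equal inputs still join, but are hard to reverse."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_pii_rules(row: dict, rules: dict, key: bytes) -> dict:
    """Return a copy of row with the configured columns masked or hashed."""
    cleaned = {}
    for column, value in row.items():
        rule = rules.get(column)
        if rule is None or value is None:
            cleaned[column] = value  # no rule, or nothing to protect
        elif rule == "mask":
            cleaned[column] = mask_value(str(value))
        elif rule == "hash":
            cleaned[column] = hash_value(str(value), key)
        else:
            raise ValueError(f"unknown PII rule {rule!r} for column {column!r}")
    return cleaned
```

A keyed hash is used here rather than a bare SHA-256 because low-entropy values such as phone numbers are otherwise easy to recover by brute force.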
rsync.ai vs. Fivetran, Airbyte, and custom scripts for S3
What you give up, and what you gain, by choosing rsync.ai for pipelines into AWS S3.