Sync MySQL to AWS S3: snapshots and streaming CDC.
Export MySQL change events to S3 as partitioned Parquet files — feed Athena, Spark, or Glue without a separate ETL pipeline. Snapshot large tables, then stream incremental changes continuously.
rsync.ai reads MySQL binlog events in ROW format, batches them into Snappy-compressed Parquet files, and writes them to S3 with Hive-style date partitioning. Works with MySQL 5.7+, RDS, Aurora, Cloud SQL, and PlanetScale. IAM role or access key auth. Column-level PII masking before the S3 write. Athena-, Spark-, and Glue-compatible out of the box.
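The batching step can be sketched with the standard library alone. This assumes each decoded binlog row event carries its table, operation, row values, and event timestamp. `ChangeEvent`, `MicroBatcher`, and both thresholds are hypothetical names for illustration, not rsync.ai's API. Each flushed batch would then be encoded as one Snappy-compressed Parquet file (for example with pyarrow), which is left out here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ChangeEvent:
    """One decoded binlog row event (hypothetical shape)."""
    table: str
    op: str        # "insert", "update" or "delete"
    row: dict      # column values for this change
    ts: datetime   # binlog event timestamp


@dataclass
class MicroBatcher:
    """Buffer events per (table, event date); flush on size or age."""
    max_rows: int = 10_000          # hypothetical size threshold
    max_age_seconds: float = 300.0  # hypothetical "every N minutes" threshold
    buffers: dict = field(default_factory=lambda: defaultdict(list))
    opened_at: dict = field(default_factory=dict)

    def add(self, event: ChangeEvent, now: datetime) -> list:
        """Buffer one event and return any batches that became ready."""
        key = (event.table, event.ts.date().isoformat())  # e.g. ("orders", "2026-05-30")
        self.opened_at.setdefault(key, now)
        self.buffers[key].append(event)
        return self.flush_ready(now)

    def flush_ready(self, now: datetime) -> list:
        """Return (table, date, events) for every buffer past a threshold."""
        ready = []
        for key in list(self.buffers):
            too_big = len(self.buffers[key]) >= self.max_rows
            too_old = (now - self.opened_at[key]).total_seconds() >= self.max_age_seconds
            if too_big or too_old:
                table, day = key
                ready.append((table, day, self.buffers.pop(key)))
                del self.opened_at[key]
        return ready
```

Keying buffers by event date means one flush never mixes rows from two date partitions, and the age threshold bounds how stale a low-traffic table's files can get.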
- Parquet output with Snappy compression — Athena, Spark, Glue compatible
- Hive-style date partitioning (YYYY-MM-DD) out of the box
- Chunked snapshot for large tables, then streaming CDC
- PII columns masked/nulled before S3 write — never in your data lake
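A chunked snapshot is commonly implemented with keyset pagination on the primary key, so each query reads a bounded range instead of holding one long full-table scan open. The sketch below assumes a single-column, integer primary key; `keyset_sql`, `snapshot_chunks`, and the `fetch` callback (which would run the SQL through any MySQL driver and return rows as dicts) are hypothetical names, not rsync.ai's implementation.

```python
from typing import Callable, Iterator, List, Optional


def keyset_sql(table: str, pk: str, chunk_size: int, after_key: bool) -> str:
    """SQL for one chunk; %s is the last primary key seen (driver placeholder)."""
    where = f" WHERE `{pk}` > %s" if after_key else ""
    return f"SELECT * FROM `{table}`{where} ORDER BY `{pk}` LIMIT {int(chunk_size)}"


def snapshot_chunks(
    fetch: Callable[[str, Optional[int]], List[dict]],
    table: str,
    pk: str,
    chunk_size: int,
) -> Iterator[List[dict]]:
    """Yield the table in primary-key order, one bounded chunk at a time."""
    last_key: Optional[int] = None
    while True:
        sql = keyset_sql(table, pk, chunk_size, after_key=last_key is not None)
        rows = fetch(sql, last_key)
        if not rows:
            return
        yield rows
        if len(rows) < chunk_size:
            return  # short chunk: reached the end of the table
        last_key = rows[-1][pk]
```

Rows that change while the chunks are being read are typically reconciled by the CDC events applied afterwards, which is why the starting binlog position is recorded before the snapshot begins.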
How to sync MySQL to AWS S3 — 5 steps
From binlog setup to Parquet files in S3 — typically under 15 minutes.
- 1
Enable MySQL binlog
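Enable binary logging (`log_bin`, which should then report `ON`) and set `binlog_format=ROW` in your MySQL config. On MySQL 5.7 the server also needs a unique `server_id`, or it will not start with binary logging enabled. Create a replication user with REPLICATION SLAVE and REPLICATION CLIENT privileges, plus SELECT on the tables you want to snapshot. On RDS, enabling automated backups turns on binary logging; set `binlog_format` to ROW in the DB parameter group (on Aurora, in the DB cluster parameter group). Keep at least 3 days of binlog retention, enough buffer for an rsync.ai restart without losing events. On RDS, retention is set with `CALL mysql.rds_set_configuration('binlog retention hours', 72);`.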
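A quick pre-flight check for the settings above could look like this. It is a sketch that assumes you have already run `SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format', 'binlog_expire_logs_seconds', 'expire_logs_days')` with any MySQL driver and turned the rows into a dict; `check_binlog_settings` is a hypothetical helper, not part of rsync.ai. On RDS the retention variables do not reflect `binlog retention hours`, so check that separately with `CALL mysql.rds_show_configuration;`.

```python
def check_binlog_settings(variables: dict, min_retention_days: float = 3.0) -> list:
    """Return human-readable problems; an empty list means the settings look usable."""
    v = {name.lower(): str(value) for name, value in variables.items()}
    problems = []
    if v.get("log_bin", "").upper() != "ON":
        problems.append("log_bin is not ON (binary logging disabled)")
    if v.get("binlog_format", "").upper() != "ROW":
        problems.append(f"binlog_format is {v.get('binlog_format')!r}, expected 'ROW'")
    # MySQL 8.0 reports binlog_expire_logs_seconds; 5.7 only has expire_logs_days.
    # A value of 0 for both means binlogs are never purged automatically.
    seconds = int(v.get("binlog_expire_logs_seconds") or 0)
    if seconds == 0:
        seconds = int(float(v.get("expire_logs_days") or 0) * 86400)
    if 0 < seconds < min_retention_days * 86400:
        problems.append(
            f"binlog retention is {seconds / 86400:g} days, expected >= {min_retention_days:g}"
        )
    return problems
```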
binlog_format=ROW · log_bin=ON · REPLICATION SLAVE + REPLICATION CLIENT
- 2
Connect MySQL source
Provide host, port, replication username, password, and the database name. rsync.ai connects as a MySQL replica and records the starting binlog position or GTID set before taking the initial snapshot. SSL/TLS and SSH tunnels are supported for MySQL instances inside a private VPC.
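The snapshot-to-CDC handoff depends on ordering binlog coordinates: streaming resumes from the recorded position, and changes before it are already reflected in the snapshot. This sketch assumes file/offset coordinates such as those reported by `SHOW MASTER STATUS` (`SHOW BINARY LOG STATUS` on newer 8.x releases) and a hypothetical event dict with `file` and `pos` keys; GTID-based tracking works differently and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BinlogPosition:
    """A point in one server's binlog stream; compares by file, then offset."""
    file_index: int  # numeric suffix, e.g. 123 for "mysql-bin.000123"
    offset: int      # byte position inside that file

    @classmethod
    def parse(cls, file_name: str, offset: int) -> "BinlogPosition":
        return cls(int(file_name.rsplit(".", 1)[1]), offset)


def events_to_stream(events: list, start: BinlogPosition) -> list:
    """Keep events at or after the position recorded before the snapshot."""
    return [e for e in events if BinlogPosition.parse(e["file"], e["pos"]) >= start]
```

Comparing the numeric file suffix first and the byte offset second matches how MySQL rotates binlog files, but only within a single server's binlog sequence; after a failover to another server, GTIDs are the safer coordinate.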
mysql://rsync_cdc:pass@host:3306/mydb
- 3
Connect AWS S3
Provide your S3 bucket name and AWS credentials: either an IAM access key and secret, or an IAM role ARN if rsync.ai runs on EC2 or ECS with an instance profile or task role. The IAM policy needs `s3:ListBucket` on the target bucket itself, plus `s3:PutObject` and `s3:GetObject` on the objects under your target prefix. You can optionally specify a key prefix (e.g. `mysql/prod/`).
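As a sketch, a minimal IAM policy document for these permissions can be generated as below. `s3_sync_policy` is a hypothetical helper, and `your-bucket` and `mysql/prod/` are the example names from this page. The one detail that often trips people up is that `s3:ListBucket` is a bucket-level action and must target the bucket ARN, while object actions target `bucket/prefix*`.

```python
import json


def s3_sync_policy(bucket: str, prefix: str = "") -> dict:
    """Least-privilege IAM policy document for reading and writing under s3://bucket/prefix."""
    prefix = prefix.lstrip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action: it targets the bucket ARN.
                "Sid": "ListTargetBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions target object ARNs under the prefix.
                "Sid": "ReadWriteSyncObjects",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


print(json.dumps(s3_sync_policy("your-bucket", "mysql/prod/"), indent=2))
```

If the bucket uses SSE-KMS encryption, the role also needs permission on the KMS key, which this sketch does not cover.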
s3://your-bucket/mysql/ · IAM role or access key
- 4
Describe the sync in plain English
Tell rsync.ai what you want: 'Snapshot the MySQL orders table to S3 Parquet daily, then stream changes hourly.' or 'Replicate all tables in mydb to S3 continuously as CDC events.' The AI pipeline planner decides whether to use a full snapshot, incremental CDC, or both — and proposes a partition layout for Athena compatibility.
No SQL · No YAML · No DAGs
- 5
Approve the S3 path layout and start
rsync.ai shows the proposed S3 key structure for each table (e.g. `s3://bucket/mysql/mydb/orders/2026-05-30/part-0001.parquet`). You can adjust prefix, partition granularity (hourly / daily / monthly), and file format (Parquet or JSON-Lines). Approve, and the pipeline starts — snapshot first, then streaming CDC batches on the schedule you set.
s3://bucket/mysql/{database}/{table}/YYYY-MM-DD/part-NNNN.parquet
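The key template above can be reproduced with a small helper, handy when pointing Athena or Spark at the output. `s3_object_uri` and its `partition_key` option are hypothetical. Athena's `MSCK REPAIR TABLE` only discovers Hive-style `key=value` directories (e.g. `dt=2026-05-30/`). With a bare `2026-05-30/` directory, partitions are typically registered via partition projection or `ALTER TABLE ... ADD PARTITION ... LOCATION`, so the sketch shows both forms.

```python
from datetime import date
from typing import Optional


def s3_object_uri(bucket: str, prefix: str, database: str, table: str,
                  day: date, part: int, partition_key: Optional[str] = None) -> str:
    """Build s3://bucket/prefix/{database}/{table}/YYYY-MM-DD/part-NNNN.parquet."""
    day_dir = day.isoformat()  # "2026-05-30"
    if partition_key:  # hypothetical Hive-style variant, e.g. "dt=2026-05-30"
        day_dir = f"{partition_key}={day_dir}"
    segments = [prefix.strip("/"), database, table, day_dir, f"part-{part:04d}.parquet"]
    return f"s3://{bucket}/" + "/".join(s for s in segments if s)
```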
MySQL → S3 output layout
Each MySQL table maps to a partitioned Parquet prefix in your S3 bucket. Paths shown for a 2026-05-30 run.
| MySQL table | S3 path | Notes |
|---|---|---|
| orders | s3://bucket/mysql/mydb/orders/2026-05-30/part-0001.parquet | Partitioned by event date. Each date partition holds that day's inserts and updates. Deletes written as tombstone records with _deleted=true. |
| products | s3://bucket/mysql/mydb/products/2026-05-30/part-0001.parquet | Full snapshot on first run, then CDC-only files. Parquet schema inferred from MySQL column types. |
| customers | s3://bucket/mysql/mydb/customers/2026-05-30/part-0001.parquet | PII columns (email, phone) can be nulled or hashed before writing to S3. Configured per column in the pipeline. |
| sessions | s3://bucket/mysql/mydb/sessions/2026-05-30/part-0001.parquet | High-volume table. rsync.ai batches CDC events and writes Parquet files every N minutes (configurable) rather than per event. |
| transactions | s3://bucket/mysql/mydb/transactions/2026-05-30/part-0001.parquet | Append-only. rsync.ai writes INSERT events only. DECIMAL(18,8) preserved as Parquet DECIMAL with matching precision/scale. |
| audit_log | s3://bucket/mysql/mydb/audit_log/2026-05-30/part-0001.parquet | Append-only audit table. Each Parquet file is Snappy-compressed. Athena can query directly with no manifest needed. |
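The per-table notes above (tombstones for deletes, INSERT-only tables, PII nulling or hashing) can be sketched as one record transform applied before a batch is written. This is an illustration under stated assumptions, not rsync.ai's code. `to_output_record` and `pii_rules` are hypothetical. The assumptions are that a tombstone carries the deleted row's last values and that `_deleted` is false on every other record. For "hashed" it swaps in a keyed HMAC-SHA256, because a plain SHA-256 of low-entropy values like phone numbers can be reversed by brute force.

```python
import hashlib
import hmac
from typing import Optional


def to_output_record(op: str, before: Optional[dict], after: Optional[dict],
                     pii_rules: dict, hmac_key: bytes,
                     append_only: bool = False) -> Optional[dict]:
    """Turn one binlog row change into the record written to S3 (or None to skip)."""
    if append_only and op != "insert":
        return None  # append-only tables keep INSERT events only
    if op in ("insert", "update"):
        record = dict(after)
        record["_deleted"] = False
    elif op == "delete":
        record = dict(before)  # assumption: tombstone carries the last row image
        record["_deleted"] = True
    else:
        raise ValueError(f"unknown operation {op!r}")
    for column, rule in pii_rules.items():
        if record.get(column) is None:
            continue  # column absent or already NULL
        if rule == "null":
            record[column] = None
        elif rule == "hash":
            value = str(record[column]).encode("utf-8")
            record[column] = hmac.new(hmac_key, value, hashlib.sha256).hexdigest()
        else:
            raise ValueError(f"unknown PII rule {rule!r}")
    return record
```

A keyed hash stays deterministic, so analysts can still join or count distinct values on a hashed column without ever seeing the raw value in the data lake.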
rsync.ai vs. Fivetran, Airbyte, mysqldump+cron for MySQL → S3
How the options stack up for getting MySQL data into your data lake.