MySQL integration

MySQL CDC pipelines,
in plain English.

Replicate MySQL to PostgreSQL, AWS S3, or Google Sheets using real-time binlog CDC — no polling, no missed deletes, no data engineers required.

TL;DR

rsync.ai connects to MySQL as a replica, reads the binary log in ROW format, and forwards every INSERT, UPDATE, and DELETE to your destination in real time. Works with MySQL 5.7+, Amazon RDS, Aurora, Google Cloud SQL, Azure Database for MySQL, and PlanetScale. GTID and file+position replication both supported. Column-level PII masking built in.

  • Real-time binlog CDC — no polling, no watermarks, no missed deletes
  • Works with RDS, Aurora, Cloud SQL, Azure DB, and PlanetScale
  • GTID replication with automatic resume after failover
  • Self-hosted — credentials and row data never leave your network

Every MySQL flavor supported

If it speaks the MySQL binary log protocol, rsync.ai can replicate it.

  • MySQL 5.7+
  • MySQL 8.0+
  • Amazon RDS for MySQL
  • Amazon Aurora MySQL
  • Google Cloud SQL for MySQL
  • Azure Database for MySQL
  • PlanetScale
  • MariaDB 10.3+
  • Binlog ROW format
  • GTID replication
  • File+position replication
  • Vitess / PlanetScale Connect API

rsync.ai vs. the MySQL CDC alternatives

An honest look at the trade-offs for real-time MySQL replication.

Compared: rsync.ai, Fivetran, Airbyte, Debezium

Features compared:

  • Real-time MySQL binlog CDC
  • Plain-English pipeline setup
  • Self-hosted (data never leaves your VPC)
  • Source-available connector code (auditable)
  • Column-level PII masking / redaction
  • No per-row / per-MAR pricing
  • GTID + file+position replication

MySQL CDC — frequently asked

What is binlog CDC and how does it work with MySQL?

CDC (Change Data Capture) via MySQL's binary log (binlog) reads every INSERT, UPDATE, and DELETE event from MySQL's replication stream, the same mechanism MySQL replicas use. When `binlog_format=ROW` is set (with the default `binlog_row_image=FULL`), each event carries full row images: the new row for an INSERT, the old row for a DELETE, and both the before- and after-image for an UPDATE. rsync.ai connects as a MySQL replica, reads these events, and forwards them to your destination in real time: no polling, no `SELECT … WHERE updated_at > ?` watermarks, no missed deletes.
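To make the idea concrete, here is a minimal Python sketch of what a destination does with row events. The `(kind, before, after)` tuple shape and the `apply_event` helper are assumptions made for illustration, not rsync.ai's actual event format; the point is only that a DELETE arrives as an explicit event carrying the old row, which a polling query can never see.

```python
from typing import Any, Optional

Row = dict[str, Any]


def apply_event(table: dict[int, Row], kind: str,
                before: Optional[Row], after: Optional[Row]) -> None:
    """Apply one row-change event to an in-memory copy keyed by primary key `id`.

    Hypothetical event shape: `before` is the old row image (UPDATE, DELETE),
    `after` is the new row image (INSERT, UPDATE).
    """
    if kind == "insert":
        table[after["id"]] = after
    elif kind == "update":
        # The primary key itself may have changed, so drop the old key first.
        table.pop(before["id"], None)
        table[after["id"]] = after
    elif kind == "delete":
        # The before-image tells us exactly which row disappeared.
        table.pop(before["id"], None)
    else:
        raise ValueError(f"unknown event kind: {kind}")


events = [
    ("insert", None, {"id": 1, "email": "a@example.com"}),
    ("insert", None, {"id": 2, "email": "b@example.com"}),
    ("update", {"id": 1, "email": "a@example.com"},
               {"id": 1, "email": "a2@example.com"}),
    ("delete", {"id": 2, "email": "b@example.com"}, None),
]

replica: dict[int, Row] = {}
for kind, before, after in events:
    apply_event(replica, kind, before, after)

print(replica)  # {1: {'id': 1, 'email': 'a2@example.com'}}
```

A watermark query such as `SELECT … WHERE updated_at > ?` would have picked up the update to row 1 but would never have learned that row 2 was deleted.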

Does MySQL CDC work on Amazon RDS, Aurora, and Cloud SQL?

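Yes. On Amazon RDS for MySQL, enable automated backups (a backup retention period above 0 turns binary logging on), set `binlog_format=ROW` in the DB parameter group, and extend binlog retention with `CALL mysql.rds_set_configuration('binlog retention hours', …)`. On Aurora MySQL, binary logging is not tied to backups: set `binlog_format=ROW` in the DB cluster parameter group to turn it on. On Google Cloud SQL, binary logging is enabled by turning on point-in-time recovery for the instance. On Azure Database for MySQL, use the `binlog_expire_logs_seconds` and `log_bin` server parameters. rsync.ai supports both file+position and GTID-based replication for all managed MySQL flavors.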
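Whichever platform you are on, the end state can be verified the same way: run `SHOW VARIABLES` and compare a few values. The sketch below is a hypothetical preflight check in Python, not part of rsync.ai; the `REQUIRED` table and the `binlog_problems` helper are illustrative names, and requiring `binlog_row_image=FULL` is an assumption based on the full row images described above.

```python
# Settings a binlog CDC reader typically needs (assumed list, for illustration).
REQUIRED = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}


def binlog_problems(variables: dict[str, str]) -> list[str]:
    """Return a human-readable problem for every setting that does not match.

    `variables` is expected to hold the name/value pairs returned by
    `SHOW VARIABLES` on the source server.
    """
    problems = []
    for name, expected in REQUIRED.items():
        actual = variables.get(name, "<unset>")
        if actual.upper() != expected:
            problems.append(f"{name} is {actual}, expected {expected}")
    return problems


example = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
print(binlog_problems(example))  # ['binlog_format is MIXED, expected ROW']
```

An empty list means the server is ready for row-based CDC as far as these three settings are concerned.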

Does MySQL CDC work on PlanetScale?

PlanetScale disables traditional binary log access for external replication clients by design. rsync.ai falls back to PlanetScale's native Connect API (CDC-over-HTTP) when connecting to a PlanetScale branch, so you still get row-level change events without needing binlog access. Set your connection type to 'PlanetScale' and rsync.ai handles the rest.

Will CDC replication fill up my MySQL disk with binlog files?

Only if you let binlog retention grow unbounded. MySQL purges binary logs on a time basis, using `expire_logs_days` (MySQL 5.7) or `binlog_expire_logs_seconds` (MySQL 8.0), regardless of how far any replica has read, so a slow or stopped consumer cannot pin logs on disk. Set retention to 3–7 days: long enough to survive an rsync.ai outage or restart without losing events, short enough to keep disk usage bounded. rsync.ai records its replication position once each event is durably written to the destination, so after a restart it resumes from the last committed position.
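As a rough sizing aid, the following Python sketch picks the retention statement for a given server version and estimates the worst-case binlog footprint. Both helper names are hypothetical, and the write rate is something you would measure on your own server. On managed services the value is usually set through a parameter group or server parameter rather than `SET`.

```python
def retention_statement(major: int, minor: int, days: int) -> str:
    """Return the SQL that sets binlog retention to `days` on self-managed MySQL."""
    if days < 1:
        raise ValueError("retention must be at least one day")
    if (major, minor) >= (8, 0):
        # MySQL 8.0 expresses retention in seconds and supports SET PERSIST.
        return f"SET PERSIST binlog_expire_logs_seconds = {days * 86400};"
    # MySQL 5.7 expresses retention in whole days.
    return f"SET GLOBAL expire_logs_days = {days};"


def worst_case_binlog_gib(gib_per_hour: float, days: int) -> float:
    """Upper bound on binlog disk usage: write rate times retention window."""
    return gib_per_hour * 24 * days


print(retention_statement(8, 0, 3))   # SET PERSIST binlog_expire_logs_seconds = 259200;
print(retention_statement(5, 7, 3))   # SET GLOBAL expire_logs_days = 3;
print(worst_case_binlog_gib(0.5, 3))  # 36.0
```

For example, a server writing half a GiB of binlog per hour needs about 36 GiB of headroom for a 3-day window.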

Should I use GTID replication or file+position?

Use GTID if your MySQL supports it (MySQL 5.6+ with `gtid_mode=ON` and `enforce_gtid_consistency=ON`). GTID gives each transaction a globally unique identifier, so rsync.ai can resume from an exact transaction boundary after a failover or primary switchover, with no manual position bookkeeping. File+position works fine for single-primary setups but requires you to record the exact binlog file and offset when the initial snapshot is taken, and those coordinates are only meaningful on the server that wrote them. rsync.ai supports both; GTID is the recommended default.
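The reason GTID survives a failover is that a position is a set of transaction IDs rather than a byte offset in one server's file. The Python sketch below parses a GTID set in MySQL's `uuid:start-end[:start-end]` text format and checks whether a given transaction has already been applied. It illustrates the bookkeeping only and is not rsync.ai's implementation; the helper names are hypothetical, and it does not handle the tagged GTIDs found in newer MySQL releases.

```python
GtidSet = dict[str, list[tuple[int, int]]]


def parse_gtid_set(text: str) -> GtidSet:
    """Parse 'uuid:1-5:11-18,uuid2:7' into {uuid: [(1, 5), (11, 18)], uuid2: [(7, 7)]}."""
    result: GtidSet = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as '7' is a single-transaction interval.
            ranges.append((int(start), int(end or start)))
    return result


def already_applied(gtid_set: GtidSet, gtid: str) -> bool:
    """True if the transaction 'uuid:n' falls inside the (inclusive) set."""
    uuid, _, txn = gtid.partition(":")
    n = int(txn)
    return any(lo <= n <= hi for lo, hi in gtid_set.get(uuid.lower(), []))


SERVER = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
executed = parse_gtid_set(f"{SERVER}:1-5:11-18")

print(already_applied(executed, f"{SERVER}:4"))   # True
print(already_applied(executed, f"{SERVER}:7"))   # False
```

After a switchover, the same check works against the new primary, because transaction IDs keep the UUID of the server that originally committed them.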

Can rsync.ai mask or redact PII columns before they reach the destination?

Yes. In your pipeline configuration you can mark any column as redacted (nulled), hashed (SHA-256), or truncated before events are forwarded. For example, you can hash `users.email` so the destination gets a consistent fingerprint useful for joins, but not the raw email address. Redaction happens in rsync.ai's processing layer — the raw value never reaches the destination or any log.
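The three masking modes can be sketched in a few lines of Python. The rule names (`redact`, `hash`, `truncate:N`) and the `mask_row` helper below are assumptions for illustration, not rsync.ai's configuration syntax; only the use of SHA-256 comes from the description above.

```python
import hashlib
from typing import Any


def mask_row(row: dict[str, Any], rules: dict[str, str]) -> dict[str, Any]:
    """Return a copy of `row` with the per-column masking rules applied.

    Hypothetical rule strings: 'redact' (null the value), 'hash' (SHA-256 hex
    digest), and 'truncate:N' (keep the first N characters).
    """
    masked = dict(row)
    for column, rule in rules.items():
        if column not in masked or masked[column] is None:
            continue  # nothing to mask
        value = str(masked[column])
        if rule == "redact":
            masked[column] = None
        elif rule == "hash":
            masked[column] = hashlib.sha256(value.encode("utf-8")).hexdigest()
        elif rule.startswith("truncate:"):
            masked[column] = value[: int(rule.split(":", 1)[1])]
        else:
            raise ValueError(f"unknown masking rule: {rule}")
    return masked


row = {"id": 7, "email": "a@example.com", "ssn": "123-45-6789", "zip": "94107-1234"}
rules = {"email": "hash", "ssn": "redact", "zip": "truncate:5"}
print(mask_row(row, rules))
```

Because the hash is deterministic, the same email always maps to the same 64-character fingerprint, which is what keeps joins on the destination side working.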

Is rsync.ai self-hosted? Can I run it inside my VPC?

Yes. rsync.ai runs entirely in your own infrastructure via `docker compose up`. The MySQL connector, the AI pipeline planner, and the control plane all run on compute you own. Your MySQL credentials and row data never leave your network. The license is Elastic License 2.0 — free to self-host, cannot be resold as a managed service.

Start replicating MySQL today.

Pick a destination — PostgreSQL, S3, or Google Sheets — and have CDC running in under 10 minutes.