Export PostgreSQL to S3 as Parquet, JSON, or CSV.
Stream PostgreSQL CDC changes to S3 for data lake ingestion, compliance archiving, or cost-effective long-term storage, with the pipeline described in plain English. Output is Parquet by default, Snappy-compressed and date-partitioned.
rsync.ai reads PostgreSQL via logical replication (CDC mode) or consistent COPY (snapshot mode) and writes to S3 as Parquet (Snappy), JSON, or CSV. Path layout is configurable: date or hour partitioning, Hive-style or flat. Schema is inferred from Postgres column types. Run an initial snapshot followed by incremental CDC, or a scheduled full snapshot. Works with self-hosted Postgres, RDS, Aurora, Google Cloud SQL, Supabase, and Neon.
- Parquet (Snappy-compressed), JSON, or CSV — your choice
- Date or hour partitioning — Hive-style for Athena / Glue
- CDC streaming or scheduled snapshots — or both
- Self-hosted, source-available under Elastic License 2.0
How to sync PostgreSQL to AWS S3 — 5 steps
From enabling logical replication to Parquet files landing in your bucket.
- 1
Enable logical replication on PostgreSQL
Set `wal_level=logical` in postgresql.conf (on RDS/Aurora, set `rds.logical_replication=1` in the parameter group instead); either change takes effect only after a restart. Set `max_replication_slots` ≥ 2. Create a publication for the tables you want to export: `CREATE PUBLICATION rsync_pub FOR ALL TABLES;`. For batch snapshot mode only (no CDC streaming), you can skip the replication slot; rsync.ai uses a consistent COPY snapshot instead.
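The prerequisites above can be checked before connecting. The following is a minimal sketch, not part of rsync.ai: the function names `check_cdc_readiness` and `create_publication_sql` are hypothetical, and the `settings` dictionary is assumed to hold the values you would get from running `SHOW wal_level;` and `SHOW max_replication_slots;` yourself.

```python
from __future__ import annotations

import re

# Conservative pattern for unquoted identifiers such as rsync_pub or public.orders.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?$")


def check_cdc_readiness(settings: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means CDC mode can be used.

    `settings` is assumed to map setting names to the strings returned by SHOW.
    """
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be 'logical'")
    if int(settings.get("max_replication_slots", "0")) < 2:
        problems.append("max_replication_slots must be at least 2")
    return problems


def create_publication_sql(name: str, tables: list[str] | None = None) -> str:
    """Build CREATE PUBLICATION for all tables, or for an explicit table list."""
    for value in [name, *(tables or [])]:
        if not _IDENT.match(value):
            raise ValueError(f"unsupported identifier: {value!r}")
    target = "ALL TABLES" if not tables else "TABLE " + ", ".join(tables)
    return f"CREATE PUBLICATION {name} FOR {target};"


if __name__ == "__main__":
    print(check_cdc_readiness({"wal_level": "replica", "max_replication_slots": "10"}))
    print(create_publication_sql("rsync_pub", ["public.orders", "public.users"]))
```

Restricting the publication to named tables, rather than `FOR ALL TABLES`, is one way to keep tables you do not export out of the change stream.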
wal_level=logical · pgoutput · or batch snapshot mode
- 2
Connect PostgreSQL
Provide host, port (default 5432), user, password, and database. For CDC streaming mode also provide the replication slot name and publication name. The user needs SELECT on target tables (snapshot mode) or REPLICATION privilege (CDC mode). rsync.ai supports self-hosted Postgres 10+, Amazon RDS, Aurora, Google Cloud SQL, Supabase, and Neon.
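As a rough illustration of which fields each mode needs, the sketch below models the connection details as a small dataclass. `PostgresSource` and `missing_for` are hypothetical names chosen for this example; they do not describe rsync.ai's actual configuration format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PostgresSource:
    """Hypothetical container for the connection fields listed above."""

    host: str
    user: str
    password: str
    database: str
    port: int = 5432                  # PostgreSQL default port
    slot_name: str | None = None      # required in CDC mode only
    publication: str | None = None    # required in CDC mode only

    def missing_for(self, mode: str) -> list[str]:
        """Return the names of fields still required for 'cdc' or 'snapshot'."""
        if mode not in ("cdc", "snapshot"):
            raise ValueError(f"unknown mode: {mode!r}")
        if mode == "snapshot":
            return []
        required = {"slot_name": self.slot_name, "publication": self.publication}
        return [name for name, value in required.items() if not value]


if __name__ == "__main__":
    src = PostgresSource(host="db.internal", user="rsync", password="change-me", database="app")
    print(src.missing_for("cdc"))  # lists the CDC-only fields not yet provided
```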
CDC mode: REPLICATION role · Snapshot mode: SELECT only
- 3
Connect AWS S3
Provide the S3 bucket name and your preferred authentication: AWS access key + secret, or an IAM role ARN (recommended for EC2/ECS deployments). The IAM policy needs `s3:PutObject`, `s3:GetObject`, and `s3:ListBucket` on your bucket. If rsync.ai runs inside AWS, attach an instance profile or ECS task role — no long-lived credentials needed.
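A policy granting the three permissions named above might look like the following sketch. The helper name `s3_export_policy` is hypothetical, and the ARNs assume the standard `aws` partition. Note that `s3:ListBucket` is a bucket-level action, so it targets the bucket ARN, while `s3:PutObject` and `s3:GetObject` target the objects under it.

```python
import json


def s3_export_policy(bucket: str) -> dict:
    """Build a least-privilege IAM policy document for one export bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # ListBucket is a bucket-level action, so it uses the bucket ARN.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_export_policy("your-bucket"), indent=2))
```

If exports only ever land under one prefix, the object-level resource can be narrowed further, for example to `arn:aws:s3:::your-bucket/postgres/*`.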
Access key / secret · or IAM role (recommended)
- 4
Describe the sync in plain English
Type what you want: 'Snapshot all tables from the public schema to S3 as Parquet daily at 2am, then stream CDC changes hourly.' rsync.ai determines the S3 path layout, file format, and partitioning strategy. You can also specify a custom path prefix, format (Parquet, JSON, CSV), or compression (Snappy, GZIP, uncompressed).
No SQL · No YAML · Parquet / JSON / CSV · Snappy / GZIP
- 5
Approve path layout and start the pipeline
rsync.ai shows the proposed S3 path structure — e.g. `s3://your-bucket/postgres/public/orders/2026-05-30/part-0001.parquet` — and the Parquet schema inferred from PostgreSQL column types. Review partitioning (date, hour), file naming, and column types. Approve and the pipeline takes an initial consistent snapshot, then streams incremental changes on the schedule you set.
s3://bucket/postgres/{schema}/{table}/YYYY-MM-DD/part-NNNN.parquet
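To make the path layout concrete, here is a minimal sketch of how an object key could be assembled for the date, hour, flat, and Hive-style variants. The function `s3_key` is hypothetical, and two details are assumptions rather than documented rsync.ai behavior: the Hive partition names `date=` and `hour=`, and placing the hour in its own directory under the date.

```python
from datetime import datetime, timezone


def s3_key(schema, table, ts, part, *, prefix="postgres",
           hourly=False, hive=False, ext="parquet"):
    """Build an object key such as postgres/public/orders/2026-05-30/part-0001.parquet."""
    day = ts.strftime("%Y-%m-%d")
    hour = ts.strftime("%H")
    if hive:
        # Hive-style key=value directories, which Athena and Glue can read as partitions.
        partitions = [f"date={day}"] + ([f"hour={hour}"] if hourly else [])
    else:
        partitions = [day] + ([hour] if hourly else [])
    filename = f"part-{part:04d}.{ext}"
    return "/".join([prefix, schema, table, *partitions, filename])


if __name__ == "__main__":
    ts = datetime(2026, 5, 30, 14, 0, tzinfo=timezone.utc)
    print(s3_key("public", "orders", ts, 1))
    print(s3_key("public", "events", ts, 2, hourly=True, hive=True))
```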
PostgreSQL → S3 path layout
Default path structure rsync.ai proposes. Customize prefix, format, and partitioning before approving.
| PostgreSQL table | S3 path (Parquet) | Notes |
|---|---|---|
| public.users | s3://bucket/postgres/public/users/YYYY-MM-DD/part-0001.parquet | UUID columns stored as STRING in Parquet. TIMESTAMPTZ→TIMESTAMP (UTC). |
| public.orders | s3://bucket/postgres/public/orders/YYYY-MM-DD/part-0001.parquet | NUMERIC(10,2)→DECIMAL(10,2) in Parquet. JSONB columns stored as STRING. |
| public.products | s3://bucket/postgres/public/products/YYYY-MM-DD/part-0001.parquet | TEXT[]→repeated STRING group. JSONB metadata→STRING. |
| public.events | s3://bucket/postgres/public/events/YYYY-MM-DD/part-0001.parquet | High-volume table; hourly partitioning recommended. BIGSERIAL→INT64. |
| public.sessions | s3://bucket/postgres/public/sessions/YYYY-MM-DD/part-0001.parquet | TIMESTAMPTZ expires_at→TIMESTAMP. UUID columns→STRING. |
| public.audit_logs | s3://bucket/postgres/public/audit_logs/YYYY-MM-DD/part-0001.parquet | JSONB row_data→STRING. Append-only table; no UPDATE/DELETE CDC needed. |
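The type conversions noted in the table can be summarized as a lookup. This sketch covers only the mappings listed above; the function `parquet_type` is a hypothetical helper, and it raises an error for any PostgreSQL type the table does not mention rather than guessing.

```python
import re

# Only the conversions stated in the table above.
_SIMPLE_TYPES = {
    "uuid": "STRING",
    "jsonb": "STRING",
    "timestamptz": "TIMESTAMP (UTC)",
    "bigserial": "INT64",
    "text[]": "repeated STRING",
}

_NUMERIC = re.compile(r"numeric\((\d+),\s*(\d+)\)")


def parquet_type(pg_type: str) -> str:
    """Map a PostgreSQL column type to the Parquet type listed in the table."""
    normalized = pg_type.strip().lower()
    match = _NUMERIC.fullmatch(normalized)
    if match:
        # Precision and scale carry over unchanged, e.g. NUMERIC(10,2) to DECIMAL(10,2).
        return f"DECIMAL({match.group(1)},{match.group(2)})"
    if normalized not in _SIMPLE_TYPES:
        raise ValueError(f"no mapping listed for {pg_type!r}")
    return _SIMPLE_TYPES[normalized]


if __name__ == "__main__":
    for pg in ["UUID", "NUMERIC(10,2)", "TEXT[]", "BIGSERIAL"]:
        print(pg, "->", parquet_type(pg))
```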
rsync.ai vs. Fivetran, Airbyte, custom pg_dump scripts for PostgreSQL → S3
What you give up — and gain — choosing rsync.ai for Postgres to S3 pipelines.