Google Cloud Storage integration

Google Cloud Storage pipelines,
described in plain English.

Land PostgreSQL, MySQL, and SaaS data in GCS as Parquet, JSON, or CSV — CDC streaming or scheduled snapshots — and move raw files byte-for-byte between GCS, S3, and Azure. Live on rsync.ai Cloud, no per-row fees.

TL;DR

rsync.ai writes to Google Cloud Storage two ways: structured output (Parquet, JSON, or CSV with a schema manifest, from Postgres/MySQL CDC or snapshots, partitioned for BigQuery external tables) and byte-identical blob passthrough (copy any object between GCS, S3, and Azure). Authenticate with a service account or workload identity; PII columns are masked before the write.

  • Parquet, JSON, or CSV — Hive partitioning for BigQuery
  • CDC streaming or scheduled snapshots — or both
  • Service-account or workload-identity auth
  • Blob passthrough between GCS, S3, and Azure Blob

What the GCS connector does

Structured exports for analytics, and raw blob passthrough for everything else.

Structured exports

Postgres & MySQL tables to Parquet, JSON, or CSV with a schema manifest.

Blob passthrough

Copy any object byte-for-byte between GCS, S3, and Azure — SHA-256 verified.

PII-safe

Mask or hash sensitive columns before a single byte lands in your bucket.

  • Parquet (Snappy), JSON, or CSV output
  • Date / hour / Hive-style partitioning
  • Service-account or workload-identity auth
  • PII masking before the GCS write
  • Resumable uploads
  • SHA-256 integrity on every object
  • Blob passthrough: GCS ↔ S3 ↔ Azure
  • No per-row or per-GB pricing
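As a rough illustration of what masking before the write can look like, the sketch below applies per-column rules to a row before it would be serialized. It is not rsync.ai's implementation: the rule names (`hash`, `mask_email`), the function names, and the choice of a keyed SHA-256 (HMAC) are all assumptions made for this example.

```python
import hashlib
import hmac


def hash_value(value: str, key: bytes) -> str:
    """Keyed SHA-256 (HMAC): equal inputs give equal tokens, the raw value is not stored."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(value: str) -> str:
    """Keep the first character and the domain; star out the rest of the local part."""
    local, sep, domain = value.partition("@")
    if not sep:
        # Not an email-shaped value: hide it completely.
        return "*" * len(value)
    return local[:1] + "*" * max(len(local) - 1, 0) + "@" + domain


def mask_row(row: dict, rules: dict, key: bytes) -> dict:
    """Return a copy of `row` with the hypothetical per-column rules applied."""
    out = {}
    for column, value in row.items():
        rule = rules.get(column)
        if value is None or rule is None:
            out[column] = value  # no rule for this column, or nothing to mask
        elif rule == "hash":
            out[column] = hash_value(str(value), key)
        elif rule == "mask_email":
            out[column] = mask_email(str(value))
        else:
            raise ValueError(f"unknown masking rule: {rule!r}")
    return out


if __name__ == "__main__":
    row = {"id": 1, "email": "alice@example.com", "ssn": "123-45-6789"}
    rules = {"email": "mask_email", "ssn": "hash"}
    print(mask_row(row, rules, key=b"demo-key"))
```

Hashing with a key keeps joins on the masked column possible, since the same input always yields the same token, while making the tokens hard to reverse without the key.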

rsync.ai vs. Fivetran, Airbyte, custom scripts for GCS

What you give up — and gain — choosing rsync.ai for pipelines into Google Cloud Storage.

Compared: rsync.ai, Fivetran, Airbyte, custom scripts

Features compared:

  • Plain-English pipeline setup
  • CDC streaming to GCS (Postgres & MySQL)
  • Parquet output with schema manifest
  • Blob passthrough (GCS ↔ S3 ↔ Azure)
  • PII masking before write
  • No per-row / per-MAR pricing
  • Resumable snapshots (no restart on failure)

Google Cloud Storage pipelines — frequently asked

What can rsync.ai write to Google Cloud Storage?

Structured data and raw files. PostgreSQL and MySQL tables (via CDC or snapshot) and SaaS data land as Parquet, JSON, or CSV with a schema manifest, ready for BigQuery external tables, Dataproc, or DuckDB. Separately, blob passthrough copies any object byte-for-byte from S3 or Azure Blob into GCS without re-encoding.

How does rsync.ai authenticate to GCS?

Use a key for a Google Cloud service account that holds the Storage Object Admin role (roles/storage.objectAdmin) on your bucket, or, if rsync.ai runs inside Google Cloud, use workload identity so that no long-lived key is stored. You control the bucket, path prefix, and object lifecycle rules.

Can I use GCS files as BigQuery external tables?

Yes. rsync.ai writes Hive-style partitioned Parquet (year=2026/month=06/day=24/), a layout that BigQuery external tables read directly and use for partition pruning. A JSON schema manifest is written alongside each batch so downstream tools can discover column additions as your source schema evolves.
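To make the layout concrete, here is a minimal sketch of how a date maps to a Hive-style prefix like the one above, and how a downstream tool might spot new columns by comparing two manifests. The manifest shape (a `columns` list of objects with a `name` field) and both function names are assumptions for illustration; the actual manifest format is not described here.

```python
from datetime import date


def hive_partition_prefix(base: str, day: date) -> str:
    """Build a year=/month=/day= object prefix under `base` for the given date."""
    return (
        f"{base.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


def added_columns(previous: dict, current: dict) -> list:
    """List column names present in `current` but not in `previous`.

    Assumes a hypothetical manifest shape: {"columns": [{"name": ..., "type": ...}]}.
    """
    known = {col["name"] for col in previous["columns"]}
    return [col["name"] for col in current["columns"] if col["name"] not in known]


if __name__ == "__main__":
    print(hive_partition_prefix("exports/orders", date(2026, 6, 24)))

    old = {"columns": [{"name": "id", "type": "INT64"}]}
    new = {"columns": [{"name": "id", "type": "INT64"},
                       {"name": "coupon_code", "type": "STRING"}]}
    print(added_columns(old, new))
```

Zero-padding the month and day keeps prefixes in lexicographic date order, which makes listing a date range by prefix straightforward.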

Is GCS a source or a destination?

Both. GCS is typically a destination for data-lake and archival workloads, but rsync.ai can also read objects from GCS and move them to another store (S3, Azure Blob, or another GCS bucket) with byte-identical blob passthrough. Pipelines from a blob source to a relational database are intentionally rejected, because a raw binary can't be written to a table row without parsing.
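The idea behind a byte-identical, SHA-256 verified copy can be sketched as follows: hash the bytes while streaming them to the destination, then re-read the destination and compare digests. This is an illustration only, using in-memory streams in place of real GCS, S3, or Azure objects; the function names and chunk size are assumptions, not rsync.ai's API.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def sha256_of(stream: BinaryIO) -> str:
    """Hash a stream from its current position to the end, chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def copy_with_digest(src: BinaryIO, dst: BinaryIO) -> str:
    """Copy bytes unchanged from src to dst and return the SHA-256 of what was read."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: src.read(CHUNK_SIZE), b""):
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()


def verified_copy(src: BinaryIO, dst: BinaryIO) -> str:
    """Copy, then re-read the destination and fail if the digests differ."""
    expected = copy_with_digest(src, dst)
    dst.seek(0)  # the destination must be seekable and readable in this sketch
    actual = sha256_of(dst)
    if actual != expected:
        raise IOError(f"checksum mismatch: expected {expected}, got {actual}")
    return expected


if __name__ == "__main__":
    source = io.BytesIO(b"\x89PNG\r\n" + bytes(range(256)) * 64)
    target = io.BytesIO()
    print(verified_copy(source, target))
```

Because the bytes are never decoded or re-encoded, the same routine works for any object type, which is also why a blob cannot be routed into a table row without a parsing step.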

Do I have to deploy anything to use the GCS connector?

No. rsync.ai Cloud is live at app.rsync.ai — sign up free and build a GCS pipeline in minutes, nothing to provision. If you'd rather run the whole stack inside your own VPC, self-hosting (source-available, Elastic License 2.0) arrives July 2026.