v2026.06 · Live on rsync.ai Cloud

Documentation

Build data pipelines in plain English — from your first sync to custom MCP connectors and querying results with the Data Explorer.

Overview

Start here

rsync.ai is a data pipeline you drive in plain English. Describe what you want and the agent handles the rest: generating missing MCP connectors, moving the data, and letting you query the results, all in one product.

Quick start

  1. Sign up free at app.rsync.ai. There is nothing to provision, so you can start building immediately. (Self-hosting on Docker Compose arrives July 2026.)
  2. Open the pipeline builder and connect your first source.
  3. Describe your sync: "Sync Shopify orders to Postgres every hour."
  4. If the MCP connector doesn't exist, the Tool Generator builds one from the source's API docs in 2–5 minutes.
  5. Approve the MCP connector schema and start the sync.
  6. Open the Data Explorer to query your synced data in plain English.

Data Pipeline

A pipeline in rsync.ai moves data from a source to a destination on a schedule or in real time. You configure it in plain English — no YAML, no JSON, no UI forms.

How a pipeline runs

1. Describe: you type what you want, including source, destination, schedule, and filters.
2. Plan: the agent confirms the schema, tables, and sync cadence before touching anything.
3. Connect: an existing MCP connector is used, or the Tool Generator builds one (2–5 min).
4. Move: data is extracted, transformed if needed, and loaded into the destination.
5. Query: results are immediately available in the Data Explorer.

Supported sources

Databases (real-time CDC)

PostgreSQL, MySQL

Databases (batch)

MariaDB, ClickHouse

Storage & files

Google Sheets, plus AWS S3, GCS, Azure Blob, and MinIO with blob passthrough for any file format

SaaS & APIs

Shopify, Stripe, HubSpot, GitHub, Slack, Notion, Linear, Pipedrive

Any REST / GraphQL API

via Tool Generator (generate on demand)

Destinations

  • PostgreSQL (self-managed or cloud)
  • MySQL
  • Google Sheets
  • AWS S3, GCS, Azure Blob, MinIO — byte-identical blob passthrough for any file format
  • Snowflake, BigQuery, Redshift — coming soon

Schedule options

  • Real-time (CDC) — Postgres & MySQL only
  • Cron schedule — e.g. every 15 minutes, daily at 6am UTC
  • Manual trigger — run on demand from the UI or API
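The cron examples above correspond to standard five-field cron expressions. The sketch below assumes ordinary cron semantics (whether rsync.ai accepts raw cron strings is not stated here, and the helper names are hypothetical). It shows those two expressions and computes the next run time for each in UTC, which is a quick way to sanity-check a schedule before saving it.

```python
from datetime import datetime, timedelta, timezone

# Standard five-field cron equivalents of the examples above. Whether rsync.ai
# accepts raw cron strings is an assumption; these are plain cron semantics.
EVERY_15_MINUTES = "*/15 * * * *"
DAILY_6AM_UTC = "0 6 * * *"


def next_every_n_minutes(now: datetime, n: int) -> datetime:
    """Next minute strictly after `now` that is a multiple of n (n must divide 60)."""
    base = now.replace(second=0, microsecond=0)
    return base + timedelta(minutes=n - base.minute % n)


def next_daily_at(now: datetime, hour: int) -> datetime:
    """Next hour:00 strictly after `now`, in now's timezone."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2026, 6, 1, 10, 7, 30, tzinfo=timezone.utc)
print(EVERY_15_MINUTES, "next:", next_every_n_minutes(now, 15).isoformat())
print(DAILY_6AM_UTC, "next:", next_daily_at(now, 6).isoformat())
```

The "n must divide 60" restriction mirrors how `*/n` behaves in cron: the minute field restarts at 0 every hour.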
Warehouses (Snowflake, BigQuery, Redshift) are coming soon as destinations. Use the Tool Generator today to build a custom warehouse MCP connector if needed.

Tool Generator

The Tool Generator is rsync.ai's AI agent that builds a production-ready MCP connector from any REST or GraphQL docs URL. No code to write — paste the docs link and the agent produces a versioned, containerized MCP connector in minutes.

What gets generated

  • Authentication handler — API key, OAuth 2.0, bearer token, basic auth
  • Schema discovery — introspects all available resources and fields
  • Pagination in offset, cursor (page-token), and link-header styles
  • Rate-limit backoff with configurable retry budget
  • Dockerfile for isolated, reproducible execution
  • Versioned output — regenerate at any time if the API changes
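To make the pagination and retry items concrete, here is a minimal sketch of the kind of loop a connector runs for page-token pagination. It follows the token until the API stops returning one, and on a rate-limit response it sleeps with exponential backoff, drawing from a retry budget shared by the whole run. It illustrates the technique only and is not the code the Tool Generator emits; `fetch_page`, `RateLimited`, and the default budget and delays are assumptions.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# (records, next_page_token); a falsy token means "no more pages".
Page = Tuple[List[dict], Optional[str]]


class RateLimited(Exception):
    """Hypothetical error a fetch function raises on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from a Retry-After header, if any


def paginate(
    fetch_page: Callable[[Optional[str]], Page],
    retry_budget: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every record, following page tokens and backing off on rate limits.

    The retry budget is shared by all pages in one run; once it is spent,
    the next RateLimited error propagates to the caller.
    """
    token: Optional[str] = None
    retries_used = 0
    while True:
        try:
            records, next_token = fetch_page(token)
        except RateLimited as err:
            if retries_used >= retry_budget:
                raise
            # Prefer the server's Retry-After; otherwise back off exponentially.
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** retries_used
            sleep(min(delay, max_delay))
            retries_used += 1
            continue  # retry the same page token
        yield from records
        if not next_token:
            return
        token = next_token
```

Passing `sleep` in as a parameter keeps the loop testable without real waiting.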

How to generate an MCP connector

  1. In the pipeline builder, click "Add source" and paste the API docs URL (e.g. https://developers.notion.com/reference).
  2. rsync.ai fetches the spec, identifies endpoints, and shows a preview of the resources it will sync.
  3. Review and approve the MCP connector schema. This is your human-in-the-loop gate before any data moves.
  4. The MCP connector is generated and immediately available. Typical time: 2–5 minutes.

Example — Notion MCP connector

Paste https://developers.notion.com/reference. The agent discovers Databases, Pages, Blocks, Users, and Comments endpoints, maps their schemas, and generates an MCP connector with OAuth token refresh built in.

Generation costs $0.20–$1.00 per MCP connector depending on API spec size. Generated MCP connectors are stored in your account and reusable across all pipelines — you only pay once per MCP connector version.
Works with any publicly documented REST or GraphQL API. Internal APIs? Host your OpenAPI spec on a URL accessible to rsync.ai and point the generator at it.

Data Explorer

After a pipeline runs, query the synced data directly in rsync.ai — no BI tool, no separate database client. Type a question in plain English and the Explorer converts it to SQL and runs it instantly.

Example session

You ask

Which Linear issues have been open the longest?

Explorer generates & runs

select title, days_open
from linear_issues
order by days_open desc
limit 10;

Result

title | days_open
Auth token refresh | 214
Webhook retries | 189
Schema drift alerts | 176

3 rows · export as CSV or JSON
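To see what the generated query does outside the product, you can run it locally. This sketch loads the three example rows into an in-memory SQLite table named `linear_issues`, which stands in for your real destination database (an assumption of the sketch). It then mirrors the Explorer's CSV and JSON export using only the standard library.

```python
import csv
import io
import json
import sqlite3

# In-memory stand-in for the destination table, using the example rows above.
conn = sqlite3.connect(":memory:")
conn.execute("create table linear_issues (title text, days_open integer)")
conn.executemany(
    "insert into linear_issues (title, days_open) values (?, ?)",
    [("Webhook retries", 189), ("Auth token refresh", 214), ("Schema drift alerts", 176)],
)

# The query the Explorer generated for "Which Linear issues have been open the longest?"
cursor = conn.execute(
    "select title, days_open from linear_issues order by days_open desc limit 10"
)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

# CSV export: header row, then one line per result row.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(columns)
writer.writerows(rows)
csv_text = buf.getvalue()

# JSON export: one object per row, keyed by column name.
json_text = json.dumps([dict(zip(columns, row)) for row in rows], indent=2)

print(csv_text)
print(json_text)
```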

Features

  • NL→SQL: describe your question, Explorer writes the query
  • CodeMirror 6 editor — syntax highlighting, autocomplete, keyboard shortcuts
  • Runs against the destination database that received your synced data — no copy, no extra latency
  • Export as CSV or JSON
  • Query history — re-run or iterate on past queries
The Explorer is available immediately after your first pipeline run. It connects to the destination database automatically — no separate configuration needed.

API

REST

Everything you can do in the rsync.ai UI — create pipelines, trigger runs, check status, and query results — is available over a REST API, so you can wire pipelines into CI, orchestrators, or your own apps.

Authentication

Create a token in app.rsync.ai → Settings → API tokens and send it as a bearer token. The base URL is https://api.rsync.ai/v1.

# Trigger a run for an existing pipeline
curl -X POST https://api.rsync.ai/v1/pipelines/pl_123/runs \
  -H "Authorization: Bearer $RSYNC_API_TOKEN" \
  -H "Content-Type: application/json"

# Poll the run status
curl https://api.rsync.ai/v1/runs/run_456 \
  -H "Authorization: Bearer $RSYNC_API_TOKEN"

Core resources

GET /pipelines
POST /pipelines
POST /pipelines/{id}/runs
GET /runs/{id}
GET /connectors
POST /explorer/query
The API uses the same token-scoped permissions as your account, and the surface is expanding: new endpoints ship alongside product features. Need an endpoint that isn't here yet?
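For scripted use, the trigger-and-poll pattern from the curl example can be wrapped in a small standard-library helper. The base URL, endpoint paths, and bearer header come from this page. The response fields (`id`, `status`) and the terminal status values are assumptions, so check them against a real response first. The HTTP call, sleep, and clock are injectable so the polling logic can be tested offline.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.rsync.ai/v1"
# Assumed terminal run states; confirm against a real GET /runs/{id} response.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def api_call(method: str, path: str, token: str, body: Optional[dict] = None) -> dict:
    """Send one bearer-authenticated JSON request and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def run_and_wait(
    pipeline_id: str,
    token: str,
    call: Callable[..., dict] = api_call,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Trigger a run, then poll it until it reaches a terminal status."""
    run = call("POST", f"/pipelines/{pipeline_id}/runs", token)
    run_id = run["id"]  # assumed response field
    deadline = clock() + timeout_seconds
    while True:
        run = call("GET", f"/runs/{run_id}", token)
        if run.get("status") in TERMINAL_STATUSES:  # assumed response field
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} not finished after {timeout_seconds}s")
        sleep(poll_seconds)


# Example (needs network access and a real token):
# import os
# print(run_and_wait("pl_123", os.environ["RSYNC_API_TOKEN"])["status"])
```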

Real-time CDC

Change Data Capture (CDC) streams row-level inserts, updates, and deletes from your database in real time using a Debezium-based log reader. Supported for Postgres and MySQL. Other sources use scheduled batch sync.

Postgres setup

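-- Enable logical replication (requires superuser)
ALTER SYSTEM SET wal_level = logical;
-- wal_level only changes on a server restart; pg_reload_conf() is not enough.
-- Restart PostgreSQL now, then confirm the new setting:
SHOW wal_level;   -- expect: logical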

-- Create a replication slot
SELECT pg_create_logical_replication_slot('rsync_slot', 'pgoutput');

-- Create a login role for rsync.ai with replication rights and read access
CREATE ROLE rsync_replicator WITH REPLICATION LOGIN PASSWORD 'your-password';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO rsync_replicator;

MySQL setup

# Add to my.cnf / my.ini, then restart mysqld
[mysqld]
log_bin       = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
server-id     = 1

-- Then, in the mysql client, create the replication user
CREATE USER 'rsync'@'%' IDENTIFIED BY 'your-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'rsync'@'%';
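Before pointing rsync.ai at MySQL, it can help to check the option file for the settings above. This sketch, with a hypothetical function name, uses the standard-library `configparser` to read the `[mysqld]` section and report anything missing. It checks the file literally. MySQL 8.0 already enables binary logging with ROW format by default, so a flagged line is not necessarily a misconfiguration; run `SHOW VARIABLES LIKE 'binlog_format';` on the live server to confirm what is in effect.

```python
import configparser

# Values the snippet above sets; compared case-insensitively.
REQUIRED_VALUES = {"binlog_format": "ROW", "binlog_row_image": "FULL"}


def check_mysql_cdc_config(text: str) -> list:
    """Return problems found in the [mysqld] section of a my.cnf-style file."""
    parser = configparser.ConfigParser(
        allow_no_value=True,             # option files may contain bare flags like log_bin
        strict=False,                    # tolerate repeated options
        inline_comment_prefixes=("#",),  # MySQL allows trailing # comments
        interpolation=None,
    )
    # configparser cannot parse !include / !includedir directives, so skip them.
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("!")]
    parser.read_string("\n".join(kept))
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section"]
    # MySQL treats '-' and '_' in option names as equivalent; normalize to '_'.
    opts = {key.replace("-", "_"): value for key, value in parser.items("mysqld")}
    problems = []
    if "log_bin" not in opts:
        problems.append("log_bin is not set")
    for key, expected in REQUIRED_VALUES.items():
        value = opts.get(key)
        if value is None or value.upper() != expected:
            problems.append(f"{key} should be {expected}, found {value!r}")
    if not opts.get("server_id"):
        problems.append("server_id is not set")
    return problems
```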
MariaDB uses scheduled batch sync, not CDC. Binary log format differences prevent real-time streaming today.

Unstructured Data

Beta

rsync.ai can move files of any format between cloud object stores, byte-identical, with no parsing required. PDFs, images, Parquet files, video, archives, or any other binary: if it lives in S3, GCS, or Azure Blob, rsync.ai can copy it to any other supported object store.

What "blob passthrough" means

The file is read from the source byte-by-byte and written to the destination without parsing, transformation, or re-encoding. Every byte that left the source arrives at the destination unchanged.
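The passthrough guarantee can be expressed directly in code: stream the object in fixed-size chunks, hash the bytes as they pass through, then re-read the destination and compare digests. This is a minimal local sketch with in-memory streams standing in for source and destination objects; it is not rsync.ai's transfer code, and the function names and 8 MiB chunk size are assumptions.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per read; an arbitrary choice for this sketch


def sha256_of(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a stream in chunks so large objects never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def passthrough_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy src to dst unchanged and return the SHA-256 of the bytes that were sent."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: src.read(chunk_size), b""):
        digest.update(chunk)  # hash exactly the bytes being forwarded
        dst.write(chunk)      # no decoding, parsing, or re-encoding
    return digest.hexdigest()


def verify_copy(dst: BinaryIO, expected_hex: str) -> None:
    """Re-read the destination and raise on any digest mismatch."""
    dst.seek(0)
    actual = sha256_of(dst)
    if actual != expected_hex:
        raise ValueError(f"SHA-256 mismatch: expected {expected_hex}, got {actual}")


# Local demo with in-memory streams standing in for source and destination objects.
source = io.BytesIO(b"%PDF-1.7\n\x00\xff any binary payload")
destination = io.BytesIO()
sent_digest = passthrough_copy(source, destination)
verify_copy(destination, sent_digest)
print("verified", sent_digest)
```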

Supported pairs (v1)

AWS S3 → GCS, Azure Blob, S3 (cross-bucket)
Google GCS → AWS S3, Azure Blob, GCS (cross-bucket)
Azure Blob → AWS S3, GCS, Azure Blob (cross-account)
Blob → relational database (e.g. Postgres) is intentionally rejected with a CAPABILITY_MISMATCH error. A raw binary cannot be written to a table row without parsing. Use the pipeline builder with a document-to-SQL transform for that flow.

Metadata catalog

Alongside every file transfer, rsync.ai writes a metadata record to your destination so you can track what was moved and detect changes for incremental re-sync:

file_name: Original file name from the source
bytes: File size in bytes
mime_type: Detected MIME type (e.g. application/pdf)
source_url: Full source URI (s3://bucket/key)
dest_key: Destination object key after transfer
sha256: SHA-256 hash of the transferred bytes (integrity proof)
created_at: Object creation timestamp at source
updated_at: Last-modified timestamp at source, used for incremental sync
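A sketch of how a record with these fields could be assembled, and how `updated_at` drives incremental re-sync: on later runs, only objects modified after the previous run's watermark are copied again. The `ObjectInfo` type, the watermark handling, and the destination key are hypothetical. MIME detection here uses Python's extension-based `mimetypes` module, which may differ from how rsync.ai detects types.

```python
import hashlib
import mimetypes
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class ObjectInfo:
    """Minimal view of a source object (hypothetical; fields mirror the catalog above)."""
    url: str             # e.g. s3://bucket/key
    data: bytes
    created_at: datetime
    updated_at: datetime


def metadata_record(obj: ObjectInfo, dest_key: str) -> dict:
    """Build one catalog row for a transferred object."""
    file_name = obj.url.rsplit("/", 1)[-1]
    mime_type, _ = mimetypes.guess_type(file_name)
    return {
        "file_name": file_name,
        "bytes": len(obj.data),
        "mime_type": mime_type or "application/octet-stream",
        "source_url": obj.url,
        "dest_key": dest_key,
        "sha256": hashlib.sha256(obj.data).hexdigest(),
        "created_at": obj.created_at.isoformat(),
        "updated_at": obj.updated_at.isoformat(),
    }


def needs_resync(objects: Iterable[ObjectInfo], watermark: Optional[datetime]) -> List[ObjectInfo]:
    """Objects changed since the previous run; everything on the first run."""
    if watermark is None:
        return list(objects)
    return [o for o in objects if o.updated_at > watermark]


# Example: a first run (no watermark yet) copies and catalogs every object.
first = datetime(2026, 6, 1, tzinfo=timezone.utc)
report = ObjectInfo("s3://bucket/docs/report.pdf", b"%PDF-1.7 ...", first, first)
for o in needs_resync([report], watermark=None):
    print(metadata_record(o, dest_key="docs/report.pdf"))
```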

Limits

  • Maximum file size: 1.5 GB per object
  • No limit on number of objects per pipeline run
  • SHA-256 integrity verified on every transfer — mismatches fail loudly

Coming soon

  • Parse-for-AI — extract structured text from PDFs and images, chunk, and send to a vector store (separate lane from blob passthrough)
  • Warehouse blob destinations — land files into Snowflake internal stages, BigQuery Cloud Storage, Redshift Spectrum (requires warehouse feature)

Self-hosting

Coming July 2026
rsync.ai Cloud is live today at app.rsync.ai — nothing to deploy. Self-hosting reaches general availability in July 2026. The setup below is a preview of what running rsync.ai in your own VPC will look like.

When it ships, rsync.ai will be source-available under ELv2 — free to run on your own infrastructure. The stack is Docker Compose with three core services: the pipeline agent, the API server, and a Postgres metadata store.

Requirements

  • Docker Engine 24+ and Docker Compose v2
  • 6 GB RAM minimum (8 GB recommended)
  • 20 GB disk space
  • OpenAI API key or Ollama running locally

Environment

# .env  (copy from .env.example)
OPENAI_API_KEY=sk-...           # or set OLLAMA_BASE_URL instead
POSTGRES_PASSWORD=changeme       # internal metadata store — change in prod
SECRET_KEY=change-in-production  # session signing key

# Optional
S3_BUCKET=rsync-data             # for pipeline run artifacts
S3_ENDPOINT=https://...          # MinIO, Cloudflare R2, etc.
TRUST_PROXY=1                    # set if behind nginx / Caddy for TLS
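Because several values in the `.env` above are placeholders, a small preflight check before `docker compose up` can catch one left unchanged. This sketch is hypothetical and not shipped with rsync.ai. It parses simple `KEY=value` lines, requires either `OPENAI_API_KEY` or `OLLAMA_BASE_URL`, rejects the placeholder values shown above, and can generate a random string you could use for `SECRET_KEY`.

```python
import secrets

# Placeholder values from the .env example above.
PLACEHOLDERS = {"sk-...", "changeme", "change-in-production"}


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines; skips blanks and # comments (a simplification)."""
    env = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop a trailing " # comment"
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("\"'")
    return env


def check_env(env: dict) -> list:
    """Return settings that are missing or still hold example values."""
    problems = []
    if not (env.get("OPENAI_API_KEY") or env.get("OLLAMA_BASE_URL")):
        problems.append("set OPENAI_API_KEY or OLLAMA_BASE_URL")
    if env.get("OPENAI_API_KEY") in PLACEHOLDERS:
        problems.append("OPENAI_API_KEY is still the example value")
    for key in ("POSTGRES_PASSWORD", "SECRET_KEY"):
        if env.get(key, "") in PLACEHOLDERS | {""}:
            problems.append(f"{key} is missing or still a placeholder")
    return problems


def new_secret() -> str:
    """A random 64-character URL-safe string (48 random bytes, base64-encoded)."""
    return secrets.token_urlsafe(48)


# Example: check_env(parse_env(open(".env").read())) returns [] once the file is filled in.
print(new_secret())
```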

Start up

git clone https://github.com/rsync-ai/rsync
cd rsync
cp .env.example .env    # fill in your values
docker compose up -d

The UI is at http://localhost:8080. Put Caddy or nginx in front for TLS in production.

License

rsync.ai is source-available under the Elastic License 2.0 (ELv2). You can read, run, and modify the code for your own internal use. You may not offer rsync.ai as a managed service to third parties without a commercial agreement.

When self-hosting ships (July 2026) you'll be able to run it freely for internal use. For OEM or multi-tenant deployments, contact us.

Ready to try it?

Start building on rsync.ai Cloud today — self-hosting on your own infrastructure arrives July 2026.
