Documentation
Build data pipelines in plain English — from your first sync to custom MCP connectors and querying results with the Data Explorer.
Overview
rsync.ai is a plain-English data pipeline. You describe what you want, and the agent handles the rest: generating missing MCP connectors, moving data, and letting you query results, all in one product.
Data Pipeline
Describe a sync in plain English. The agent runs it.
Tool Generator
Build an MCP connector from any REST or GraphQL docs URL.
Data Explorer
Query your synced data in plain English or SQL.
Quick start
1. Sign up free at app.rsync.ai. There is nothing to provision, so you can start building immediately. (Self-hosting on Docker Compose arrives July 2026.)
2. Open the pipeline builder and connect your first source.
3. Describe your sync: "Sync Shopify orders to Postgres every hour."
4. If the MCP connector doesn't exist, the Tool Generator builds one from the source's API docs in 2–5 minutes.
5. Approve the MCP connector schema and start the sync.
6. Open the Data Explorer to query your synced data in plain English.
Data Pipeline
A pipeline in rsync.ai moves data from a source to a destination on a schedule or in real time. You configure it in plain English — no YAML, no JSON, no UI forms.
How a pipeline runs
Supported sources
- Databases (real-time CDC): PostgreSQL, MySQL
- Databases (batch): MariaDB, ClickHouse
- Storage & files: AWS S3, GCS, Azure Blob, MinIO, Google Sheets (blob passthrough for any file format)
- SaaS & APIs: Shopify, Stripe, HubSpot, GitHub, Slack, Notion, Linear, Pipedrive
- Any REST / GraphQL API: via Tool Generator (generate on demand)
Destinations
- PostgreSQL (self-managed or cloud)
- MySQL
- Google Sheets
- AWS S3, GCS, Azure Blob, MinIO — byte-identical blob passthrough for any file format
- Snowflake, BigQuery, Redshift — coming soon
Schedule options
- Real-time (CDC) — Postgres & MySQL only
- Cron schedule, e.g. "every 15 minutes", "daily at 6am UTC"
- Manual trigger: run on demand from the UI or API
Tool Generator
The Tool Generator is rsync.ai's AI agent that builds a production-ready MCP connector from any REST or GraphQL docs URL. No code to write — paste the docs link and the agent produces a versioned, containerized MCP connector in minutes.
What gets generated
- Authentication handler — API key, OAuth 2.0, bearer token, basic auth
- Schema discovery — introspects all available resources and fields
- Pagination: cursor, offset, page-token, and link-header styles
- Rate-limit backoff with configurable retry budget
- Dockerfile for isolated, reproducible execution
- Versioned output — regenerate at any time if the API changes
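As an illustration of the "rate-limit backoff with configurable retry budget" item above, here is a minimal Python sketch of the general technique. It is not the code the Tool Generator emits: the `RateLimited` exception, the `call_with_backoff` function, and every parameter name are hypothetical, and the policy shown (honor a server-supplied delay, otherwise exponential backoff with full jitter) is one common choice among several.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the caller's request function on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(
    request: Callable[[], T],
    retry_budget: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimited until the retry budget is spent."""
    attempt = 0
    while True:
        try:
            return request()
        except RateLimited as exc:
            if attempt >= retry_budget:
                raise  # budget exhausted: surface the error to the caller
            if exc.retry_after is not None:
                # Prefer the server's hint, capped at max_delay.
                delay = min(exc.retry_after, max_delay)
            else:
                # Exponential backoff with full jitter.
                delay = random.uniform(0, min(base_delay * 2 ** attempt, max_delay))
            sleep(delay)
            attempt += 1
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting; a caller would normally leave it at the default.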
How to generate an MCP connector
1. In the pipeline builder, click "Add source" and paste the API docs URL (e.g. https://developers.notion.com/reference).
2. rsync.ai fetches the spec, identifies endpoints, and shows a preview of the resources it will sync.
3. Review and approve the MCP connector schema. This is your human-in-the-loop gate before any data moves.
4. The MCP connector is generated and is immediately available. Typical time: 2–5 minutes.
Example — Notion MCP connector
Paste https://developers.notion.com/reference. The agent discovers Databases, Pages, Blocks, Users, and Comments endpoints, maps their schemas, and generates an MCP connector with OAuth token refresh built in.
Data Explorer
After a pipeline runs, query the synced data directly in rsync.ai — no BI tool, no separate database client. Type a question in plain English and the Explorer converts it to SQL and runs it instantly.
Example session
You ask
Explorer generates & runs
select title, days_open from linear_issues order by days_open desc limit 10;
Result
3 rows · export as CSV or JSON
Features
- NL→SQL: describe your question, Explorer writes the query
- CodeMirror 6 editor — syntax highlighting, autocomplete, keyboard shortcuts
- Runs against the destination database that received your synced data — no copy, no extra latency
- Export as CSV or JSON
- Query history — re-run or iterate on past queries
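To make the query-then-export flow concrete, the sketch below runs the example query from the session above against an in-memory SQLite table and serializes the result as CSV. Everything here is a stand-in: the Explorer runs against your destination database rather than SQLite, the rows are made-up sample data, and `run_query` and `to_csv` are hypothetical helper names.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a synced Linear issues table.
SAMPLE_ISSUES = [
    ("Fix login redirect", 42),
    ("Upgrade CI runners", 17),
    ("Dark mode flicker", 63),
]

# The query shown in the example session.
QUERY = (
    "select title, days_open from linear_issues "
    "order by days_open desc limit 10;"
)


def run_query(rows):
    """Load rows into an in-memory table and run the example query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table linear_issues (title text, days_open integer)")
        conn.executemany("insert into linear_issues values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


def to_csv(rows):
    """Serialize result rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["title", "days_open"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(run_query(SAMPLE_ISSUES)))
```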
API
Everything you can do in the rsync.ai UI (create pipelines, trigger runs, check status, and query results) is available over a REST API, so you can wire pipelines into CI, orchestrators, or your own apps.
Authentication
Create a token in app.rsync.ai → Settings → API tokens and send it as a bearer token. The base URL is https://api.rsync.ai/v1.
# Trigger a run for an existing pipeline
curl -X POST https://api.rsync.ai/v1/pipelines/pl_123/runs \
  -H "Authorization: Bearer $RSYNC_API_TOKEN" \
  -H "Content-Type: application/json"

# Poll the run status
curl https://api.rsync.ai/v1/runs/run_456 \
  -H "Authorization: Bearer $RSYNC_API_TOKEN"
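The same two calls can be scripted from Python with only the standard library. The sketch below takes the base URL, paths, and bearer header from the curl examples; the response shape is not documented here, so the `status` field and the terminal values `succeeded` and `failed` are assumptions, as are all function names.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

BASE_URL = "https://api.rsync.ai/v1"  # base URL from the Authentication section
# ASSUMPTION: a run object has a "status" field with these terminal values.
TERMINAL_STATUSES = {"succeeded", "failed"}


def build_trigger_request(pipeline_id: str, token: str) -> urllib.request.Request:
    """Build the POST request that triggers a run for an existing pipeline."""
    return urllib.request.Request(
        f"{BASE_URL}/pipelines/{pipeline_id}/runs",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_status_request(run_id: str, token: str) -> urllib.request.Request:
    """Build the GET request that reads a run's current state."""
    return urllib.request.Request(
        f"{BASE_URL}/runs/{run_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_json(request: urllib.request.Request) -> Dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_run(
    run_id: str,
    token: str,
    fetch: Callable[[urllib.request.Request], Dict] = fetch_json,
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll the run until it reports an (assumed) terminal status."""
    for _ in range(max_polls):
        run = fetch(build_status_request(run_id, token))
        if run.get("status") in TERMINAL_STATUSES:
            return run
        sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

A CI job would call `fetch_json(build_trigger_request("pl_123", token))`, read the run identifier from the response, and pass it to `wait_for_run`. Check the real response fields before relying on the assumed `status` values.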
Core resources
- /pipelines
- /pipelines/{id}/runs
- /runs/{id}
- /connectors
- /explorer/query
Real-time CDC
Change Data Capture (CDC) streams row-level inserts, updates, and deletes from your database in real time using a Debezium-based log reader. Supported for Postgres and MySQL. Other sources use scheduled batch sync.
Postgres setup
-- Enable logical replication (requires superuser)
ALTER SYSTEM SET wal_level = logical;
-- wal_level only takes effect after a server restart; a config reload is not enough.
-- Restart PostgreSQL, then run the remaining statements.
-- Create a replication slot
SELECT pg_create_logical_replication_slot('rsync_slot', 'pgoutput');
-- Create the rsync replication role and grant it read access
CREATE ROLE rsync_replicator WITH REPLICATION LOGIN PASSWORD 'your-password';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO rsync_replicator;
MySQL setup
# Add to my.cnf / my.ini
[mysqld]
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
server-id = 1

-- Create replication user
CREATE USER 'rsync'@'%' IDENTIFIED BY 'your-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'rsync'@'%';
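Before pointing a CDC pipeline at MySQL, it can help to confirm the config file actually carries the binlog settings listed above. The following is a rough sketch of such a check, not an rsync.ai tool: `check_mysql_cdc_config` is a hypothetical helper, it only reads the text of one file, and it does not handle `!include` directives or see values set elsewhere (including server defaults).

```python
import configparser
from typing import List

# Values taken from the MySQL setup above.
REQUIRED = {"binlog_format": "ROW", "binlog_row_image": "FULL"}


def check_mysql_cdc_config(cnf_text: str) -> List[str]:
    """Return a list of problems found in the [mysqld] section of a my.cnf."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    # MySQL treats '-' and '_' in option names as interchangeable.
    options = {
        key.replace("-", "_"): (value or "")
        for key, value in parser.items("mysqld")
    }
    problems = []
    if "log_bin" not in options:
        problems.append("log_bin is not set")
    for key, expected in REQUIRED.items():
        actual = options.get(key, "")
        if actual.upper() != expected:
            problems.append(f"{key} should be {expected}, found {actual or 'nothing'}")
    server_id = options.get("server_id", "")
    if not server_id.isdigit() or int(server_id) == 0:
        problems.append("server_id must be a positive integer")
    return problems
```

An empty list means the file matches the settings shown in this section. To see what the running server actually uses, query it with `SHOW VARIABLES LIKE 'binlog%'`.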
Unstructured Data
Beta
rsync.ai can move any file format between cloud object stores, byte-identical, with no parsing required. PDFs, images, Parquet files, video, archives, or any other binary: if it lives in S3, GCS, or Azure Blob, rsync.ai can copy it anywhere else in the same class.
What "blob passthrough" means
The file is read from the source byte-by-byte and written to the destination without parsing, transformation, or re-encoding. Every byte that left the source arrives at the destination unchanged.
Supported pairs (v1)
CAPABILITY_MISMATCH error. A raw binary cannot be written to a table row without parsing. Use the pipeline builder with a document-to-SQL transform for that flow.
Metadata catalog
Alongside every file transfer, rsync.ai writes a metadata record to your destination so you can track what was moved and detect changes for incremental re-sync:
- file_name: Original file name from the source
- bytes: File size in bytes
- mime_type: Detected MIME type (e.g. application/pdf)
- source_url: Full source URI (s3://bucket/key)
- dest_key: Destination object key after transfer
- sha256: SHA-256 hash of the transferred bytes (integrity proof)
- created_at: Object creation timestamp at source
- updated_at: Last-modified timestamp at source, used for incremental sync
Limits
- Maximum file size: 1.5 GB per object
- No limit on number of objects per pipeline run
- SHA-256 integrity verified on every transfer — mismatches fail loudly
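The passthrough-plus-integrity idea can be sketched in a few lines of Python: copy the bytes in chunks without interpreting them, hash them on the way through, and re-hash the destination to confirm nothing changed. This illustrates the technique only and is not rsync.ai's implementation. `copy_blob`, `verify_blob`, and `CHUNK_SIZE` are hypothetical names, the record holds just a subset of the catalog fields, and guessing the MIME type from the file name is a simplification of whatever detection the product performs.

```python
import hashlib
import io
import mimetypes
from typing import BinaryIO, Dict

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def copy_blob(src: BinaryIO, dst: BinaryIO, file_name: str) -> Dict[str, object]:
    """Copy src to dst unchanged and return a partial metadata record."""
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            break
        dst.write(chunk)      # bytes are written exactly as read
        digest.update(chunk)  # hash the same bytes that were written
        total += len(chunk)
    mime_type, _ = mimetypes.guess_type(file_name)
    return {
        "file_name": file_name,
        "bytes": total,
        "mime_type": mime_type or "application/octet-stream",
        "sha256": digest.hexdigest(),
    }


def verify_blob(dst: BinaryIO, expected_sha256: str) -> None:
    """Re-read the destination and fail loudly if its hash differs."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: dst.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError("SHA-256 mismatch: destination differs from source")


if __name__ == "__main__":
    source = io.BytesIO(b"example bytes standing in for a PDF")
    dest = io.BytesIO()
    record = copy_blob(source, dest, "report.pdf")
    dest.seek(0)
    verify_blob(dest, str(record["sha256"]))
    print(record)
```

Chunked reads keep memory use flat regardless of object size, which matters near the 1.5 GB per-object limit.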
Coming soon
- Parse-for-AI: extract structured text from PDFs and images, chunk, and send to a vector store (separate lane from blob passthrough)
- Warehouse blob destinations: land files into Snowflake internal stages, BigQuery Cloud Storage, Redshift Spectrum (requires warehouse feature)
Self-hosting
Coming July 2026
rsync.ai Cloud is available today at app.rsync.ai, with nothing to deploy. Self-hosting reaches general availability in July 2026. The setup below is a preview of what running rsync.ai in your own VPC will look like.
When it ships, rsync.ai will be source-available under ELv2 and free to run on your own infrastructure. The stack is Docker Compose with three core services: the pipeline agent, the API server, and a Postgres metadata store.
Requirements
- Docker Engine 24+ and Docker Compose v2
- 6 GB RAM minimum (8 GB recommended)
- 20 GB disk space
- OpenAI API key or Ollama running locally
Environment
# .env (copy from .env.example)
OPENAI_API_KEY=sk-...            # or set OLLAMA_BASE_URL instead
POSTGRES_PASSWORD=changeme       # internal metadata store; change in prod
SECRET_KEY=change-in-production  # session signing key

# Optional
S3_BUCKET=rsync-data             # for pipeline run artifacts
S3_ENDPOINT=https://...          # MinIO, Cloudflare R2, etc.
TRUST_PROXY=1                    # set if behind nginx / Caddy for TLS
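A small pre-flight check can catch an .env file that still carries the example values. The sketch below is illustrative and not part of rsync.ai: `parse_env` is a deliberately simplified parser (Docker Compose applies its own rules for quoting and comments), and `check_env` encodes only what the comments above state, namely that one of `OPENAI_API_KEY` or `OLLAMA_BASE_URL` is needed and that the two placeholder secrets should be changed.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks, comments, and trailing ' # ...' notes."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Simplification: treat ' #' as the start of an inline comment.
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def check_env(values: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the basics look set."""
    problems = []
    if not values.get("OPENAI_API_KEY") and not values.get("OLLAMA_BASE_URL"):
        problems.append("set OPENAI_API_KEY or OLLAMA_BASE_URL")
    if values.get("POSTGRES_PASSWORD", "changeme") == "changeme":
        problems.append("POSTGRES_PASSWORD is unset or still the example value")
    if values.get("SECRET_KEY", "change-in-production") == "change-in-production":
        problems.append("SECRET_KEY is unset or still the example value")
    return problems
```

Running `check_env(parse_env(open(".env").read()))` before starting the stack would list any placeholders left in the file.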
Start up
git clone https://github.com/rsync-ai/rsync
cd rsync
cp .env.example .env   # fill in your values
docker compose up -d
The UI is at http://localhost:8080. Put Caddy or nginx in front for TLS in production.
License
rsync.ai is source-available under the Elastic License 2.0 (ELv2). You can read, run, and modify the code for your own internal use. You may not offer rsync.ai as a managed service to third parties without a commercial agreement.
When self-hosting ships (July 2026) you'll be able to run it freely for internal use. For OEM or multi-tenant deployments, contact us.
Ready to try it?
Start building on rsync.ai Cloud today — self-hosting on your own infrastructure arrives July 2026.