Sharding Postgres
Sharding splits a PostgreSQL database and all its tables and indices between multiple machines. Each machine runs its own, independent PostgreSQL server, while PgDog takes care of routing queries and moving data between hosts.
Intro to sharding
If you're not familiar with database sharding fundamentals, take a look at the sharding basics. Even if you're a seasoned database expert, it's good to have a refresher to confirm your understanding matches with our implementation.
Routing queries
PgDog is a query router. It can extract sharding hints directly from the SQL using the PostgreSQL parser and send queries to one or more shards.
Query router feature | Description |
---|---|
Direct-to-shard queries | Automatic sharding key detection which sends the query to one shard only. |
Cross-shard queries | Queries that don't have a sharding key are sent to all shards with results collected and transformed, as if they came from one database. |
Manual routing | Provide the sharding key in a query comment, or separately with a SET PostgreSQL command. |
Sharded COPY | Data sent via COPY commands is automatically split between all shards, using the configured sharding function. |
Managing data
PgDog implements the logical replication protocol used by Postgres to move data between nodes.
Data consistency
To make sure data is atomically written in cross-shard transactions, PgDog supports PostgreSQL's prepared transactions and two-phase commit.
Resharding
Resharding takes a database cluster with N shards (where N can be 1, for unsharded databases), and turns it into a cluster with M databases. It uses logical replication to do this without downtime or impacting production operations.
Schema management
PgDog makes sure that database schema is identical on all shards. It also has support for in-database primary key generation.