Primary keys
Primary keys are columns with a unique index and a not null constraint. Theoretically, any data type can be used as the primary key, but the common ones are BIGINT
(specified as BIGSERIAL
for automatic generation) and UUID
.
In sharded databases, primary keys generated on each shard have to be globally unique: no two shards can contain a row with the same value. To make this easy and avoid using external ID generation services, PgDog provides a few pl/PgSQL functions that can do this automatically from inside Postgres.
Note
Make sure to install and enable the schema manager before using this functionality.
How it works
Take the following table as an example:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
If you run this command through PgDog, this table will be created on all shards. Underneath, Postgres expands BIGSERIAL
to the following code:
The users_id_seq
is a sequence, automatically created by Postgres, that will be used to generate unique values for inserted rows that don't provide one for the id
column.
Since each shard has its own sequence, values pulled from it would repeat on all shards, creating duplicate references to different objects in your database. To avoid this, we've written a pl/PgSQL function to replace nextval
that can generate globally unique values.
pgdog.next_id_seq
The function pgdog.next_id_seq
, installed by default if you're using our schema manager, accepts a sequence and returns a unique and valid value for each shard it's executed on. For example:
The function consumes values from the sequence until it finds one that satisfies the sharding function and the shard number of the current database. To make use of it, set it as the default value for your table's primary key, like so:
Sequence cache
When looking for the next valid number, next_id_seq
will consume several values from the sequence in a row. By default, each call to nextval
requires a write to the WAL, which could be a bit slower than optimal. To mitigate this, increase the sequence's cache:
This will keep 250 values of the sequence in memory instead of on disk. If you're deploying a large number of shards, increase the cache size accordingly.
UUIDs
Since UUIDs are randomly generated, they don't need a sequence to guarantee uniqueness. If you're using UUIDs as sharding keys and don't want to generate them in your application, you can use one of our pl/PgSQL functions to create valid values on each shard in the cluster:
Just like with BIGSERIAL
, you can set this function as the default on a column: