Skip to content

Unique IDs

Experimental feature

This feature is new and experimental. Please report any issues you encounter.

To generate unique identifiers, regular PostgreSQL databases use sequences. For example, BIGSERIAL and SERIAL columns get their values by calling:

SELECT nextval('sequence_name');

This guarantees that these columns contain unique and monotonically increasing integers.

If your database is sharded however, using sequences will create identical IDs for different rows on different shards. To address this, PgDog can generate unique 64-bit signed identifiers internally, based on the system clock.

How it works

The unique ID algorithm implemented by PgDog is based on three inputs:

  • Current system time in milliseconds
  • Unique identifier for the PgDog node (e.g. hostname)
  • An internal sequence

The unique node identifier ensures that two different instances of PgDog can't produce the same ID at the same time. Additionally, the internal sequence allows for submillisecond ID creation in very busy deployments.

Once configured, you can fetch unique IDs using a standard SQL command:

SHOW pgdog.unique_id;
   unique_id
----------------
 29888761298944

Configuration

Node identifier

To make IDs globally unique, a different node identifier is required for each instance in a PgDog deployment.

If you're using our Helm chart, this is taken care of automatically when deploying it as a StatefulSet resource:

statefulSet:
  enabled: true

Otherwise, you need to ensure each PgDog instance has a different NODE_ID environment variable configured at startup. The variable can contain anything, as long as it ends with -<number> (hyphen and a number).

For example, if you have a three node deployment, they could be identified as follows:

export NODE_ID=pgdog-prod-0
export NODE_ID=pgdog-prod-1
export NODE_ID=pgdog-prod-2

When configured correctly, you're able to get each node's identifier by querying the admin database, for example:

SHOW INSTANCE_ID;
  instance_id
----------------
  pgdog-prod-0

Maximum number of nodes

Due to how the ID generation algorithm is implemented, PgDog allows up to a maximum of 1024 instances (starting at 0) in the same deployment.

Minimum ID

If you're migrating data from an existing database, you can ensure that all IDs generated by PgDog start at a minimum value. This is configurable in pgdog.toml, like so:

[general]
unique_id_min = 5_000_000

When set, all generated IDs are guaranteed to be larger than this value.

Limitations

The generated unique IDs are 64-bit signed integers, matching the BIGINT (and BIGSERIAL) PostgreSQL format. However, since they are time-based, subsequently generated IDs will have gaps, for example:

678973936041
678944576104
678948770152

This is normally not an issue, since PostgreSQL sequences are not guaranteed to be gap-free either, but this is something to be aware of for applications that attempt to detect rolled back transactions.

Additionally, because PgDog reserves only 41 bits for the timestamp portion of the identifier, the IDs have a maximum value. Currently, the available ID range is 69.73 years, set to overflow on August 3, 2095. We expect databases to use 128-bit integers by then, expanding the ID range almost indefinitely.

Generation rate

Since the identifiers are time-based, to ensure uniqueness, PgDog limits how many IDs can be generated per unit of time. This limit is currently 4,096 IDs per millisecond.

When it's reached, PgDog will pause ID generation until the clock ticks to the next millisecond. This gives it an effective ID generation rate of 4,096,000 / second / node, which should be sufficient for most deployments.