Unique IDs
Experimental feature
This feature is new and experimental. Please report any issues you encounter.
To generate unique identifiers, regular PostgreSQL databases use sequences. For example, BIGSERIAL and SERIAL columns get their values by calling:
This guarantees that these columns contain unique and monotonically increasing integers.
If your database is sharded however, using sequences will create identical IDs for different rows on different shards. To address this, PgDog can generate unique 64-bit signed identifiers internally, based on the system clock.
How it works
The unique ID algorithm implemented by PgDog is based on three inputs:
- Current system time in milliseconds
- Unique identifier for the PgDog node (e.g.
hostname) - An internal sequence
The unique node identifier ensures that two different instances of PgDog can't produce the same ID at the same time. Additionally, the internal sequence allows for submillisecond ID creation in very busy deployments.
Once configured, you can fetch unique IDs using a standard SQL command:
Configuration
Node identifier
To make IDs globally unique, a different node identifier is required for each instance in a PgDog deployment.
If you're using our Helm chart, this is taken care of automatically when deploying it as a StatefulSet resource:
Otherwise, you need to ensure each PgDog instance has a different NODE_ID environment variable configured at startup. The variable can contain anything, as long as it ends with -<number> (hyphen and a number).
For example, if you have a three node deployment, they could be identified as follows:
When configured correctly, you're able to get each node's identifier by querying the admin database, for example:
Maximum number of nodes
Due to how the ID generation algorithm is implemented, PgDog allows up to a maximum of 1024 instances (starting at 0) in the same deployment.
Minimum ID
If you're migrating data from an existing database, you can ensure that all IDs generated by PgDog start at a minimum value. This is configurable in pgdog.toml, like so:
When set, all generated IDs are guaranteed to be larger than this value.
Limitations
The generated unique IDs are 64-bit signed integers, matching the BIGINT (and BIGSERIAL) PostgreSQL format. However, since they are time-based, subsequently generated IDs will have gaps, for example:
This is normally not an issue, since PostgreSQL sequences are not guaranteed to be gap-free either, but this is something to be aware of for applications that attempt to detect rolled back transactions.
Additionally, because PgDog reserves only 41 bits for the timestamp portion of the identifier, the IDs have a maximum value. Currently, the available ID range is 69.73 years, set to overflow on August 3, 2095. We expect databases to use 128-bit integers by then, expanding the ID range almost indefinitely.
Generation rate
Since the identifiers are time-based, to ensure uniqueness, PgDog limits how many IDs can be generated per unit of time. This limit is currently 4,096 IDs per millisecond.
When it's reached, PgDog will pause ID generation until the clock ticks to the next millisecond. This gives it an effective ID generation rate of 4,096,000 / second / node, which should be sufficient for most deployments.