UUID v1 vs v4 vs v7: Picking a DB Primary Key
Summary (TL;DR)
To put it bluntly: using UUID v4 as the primary key of a hot insert table is a footgun. I recently looked at a PostgreSQL 16 workload where events.id defaulted to gen_random_uuid() (a v4), and every INSERT was landing on a random B-tree leaf — cooling shared_buffers and dragging index fragmentation into the high single digits as reported by pgstattuple. Switching the column default to a v7 generator — same 16-byte uuid type, same index — cut average INSERT latency to roughly a third of what it was, and the fragmentation numbers stabilized. Choosing a UUID version is a database-design decision, not a cryptography decision. A UUID is a 128-bit value written as 8-4-4-4-12 hexadecimal digits, with four bits fixed as the version and two or three bits fixed as the variant. Version 1 stamps time and a node identifier (historically a MAC address) into those 128 bits — roughly sortable, but it leaks the host. Version 4 fills the non-reserved bits with random data: strong unguessability, but no ordering, so insertions land in arbitrary index positions. Version 7, formalized in RFC 9562 (2024), puts a Unix-millisecond timestamp in the high bits and random tail, combining v4’s safety with v1’s index locality. For a public API where opaque randomness genuinely matters, v4 is still the default. Almost everywhere else, v7 is the better primary key. v1 is legacy — accepted by databases, but it should not be a new choice.
Background
A UUID is 128 bits. By convention it is printed as 32 hexadecimal digits grouped with hyphens: xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx. The M digit holds the version (1, 4, 7, etc.) and the top bits of N hold the variant. Everything else is version-specific.
Version 1 was designed for uniqueness across machines and time. Its 60-bit timestamp counts 100-nanosecond intervals since 1582-10-15, a clock-sequence field handles clock rollbacks, and a 48-bit node ID was originally the network MAC address. The privacy consequence is direct: a v1 UUID minted on your laptop encodes your laptop’s MAC, and a one-line uuid -d reverses it. Modern libraries sometimes randomize the node ID to avoid the leak, but many still follow the original rule.
Version 4 is 122 bits of randomness with 6 bits fixed for version and variant. It assumes a strong RNG (the browser’s crypto.randomUUID() or PostgreSQL’s gen_random_uuid()). Collisions are effectively impossible at any realistic scale — the birthday bound gives roughly one-in-a-trillion odds after generating about a billion v4s. The downside is that successive v4s are uncorrelated, so inserting them into an index touches random pages, hurting cache locality and write amplification (WAL plus full-page writes in Postgres).
Version 7 is part of RFC 9562, published in 2024, which obsoleted RFC 4122 and added versions 6, 7, and 8. A v7 UUID stores a 48-bit Unix millisecond timestamp in its high bits, followed by the version tag, a small rand_a field, the variant tag, and a 62-bit rand_b tail. The practical effect is that v7 values generated within the same millisecond are adjacent in sort order, and values across milliseconds sort chronologically. The random tail still provides more than enough entropy for uniqueness within a millisecond. PostgreSQL 18 ships a native uuidv7(); on earlier versions you can use the pg_uuidv7 extension or generate them in the application layer (Node uuid 9.x, Python uuid6).
The variant bits matter because they distinguish the RFC 9562 / 4122 family from legacy Microsoft and Apollo UUIDs. For this guide, assume the RFC variant (the first hex digit of the N group is 8, 9, a, or b).
Storage format is a separate concern. PostgreSQL 16’s native uuid type stores the value as 16 bytes; MySQL typically uses BINARY(16) or CHAR(36) — the string form doubles storage and makes comparisons character-wise rather than byte-wise. The choice of version and the choice of storage format interact: sorting v7 as a binary value is cheap and correct, sorting it as a hex string is correct but slower, and sorting v4 is meaningless regardless of storage.
Data / Comparison
| Property | v1 | v4 | v7 |
|---|---|---|---|
| Generation input | Timestamp + clock sequence + node ID | 122 random bits | 48-bit Unix ms timestamp + random tail |
| Privacy | Leaks node ID (often the MAC) | No host or time information | Leaks creation time in ms, no host |
| Sortable by time | Yes (but byte order ≠ time order without rearrangement) | No | Yes — lexicographic order matches chronological order |
| Index locality | Moderate | Poor (random inserts across the B-tree) | Good (near-monotonic) |
| Typical use case | Legacy systems, some COM/Windows IDs | Public API IDs, session tokens, salts | Event logs, high-volume inserts, time-paged tables |
| Entropy | Low (most bits are time/node) | High (about 122 bits) | High tail (about 74 bits), low-collision within ms |
A rough mental model: v4 maximizes unguessability at the cost of index behavior, v7 keeps enough unguessability for most applications while restoring the insert-at-the-end pattern that databases love, and v1 is a historical artifact worth recognizing but not choosing.
Real-world Scenarios
Scenario 1 — Append-heavy event log. This is exactly the workload from the opening anecdote. A table that ingests millions of rows per day and is usually queried as “last 24 hours, ordered by time” benefits directly from v7. New rows land at the end of the primary-key index, so the hot pages stay warm and range scans over time ranges map onto contiguous index segments. Migrating from v4 to v7 here often reduces write latency and index fragmentation without changing any query code — typically a one-line column-default change.
Scenario 2 — Public user-facing IDs. Share links like /orders/{id} must be unguessable so that visitors cannot enumerate other users’ orders. v4 is the safe default. If you also want the benefits of v7, be aware that a v7 reveals its creation timestamp to the millisecond, which may be fine for orders but could leak business signals (e.g., exact checkout volume per minute) in more sensitive contexts. The compromise I have recommended to teams is a dual-ID pattern: v7 internally for the primary key, and a separate v4 or short random slug exposed to the outside world. You keep the index behavior without leaking timing externally.
Scenario 3 — Multi-region or sharded systems. v7’s timestamp prefix means two regions generating UUIDs in the same millisecond will interleave cleanly by time, but within a millisecond there is no ordering guarantee across regions. If you need a stricter cross-region ordering, ULID (a 48-bit timestamp + 80-bit randomness encoded as Crockford Base32) has nearly identical properties and a more compact 26-character textual form. Snowflake-style IDs go further by including an explicit machine ID for per-node ordering (the original Twitter design and Discord’s variant are both 64-bit), at the cost of requiring coordination to allocate those IDs.
Common Misconceptions
“UUIDs are always slow in databases.” They are slower than a 4-byte int in raw terms, but the real cost comes from random-insert fragmentation in B-tree indexes, which v7 largely eliminates. Storing UUIDs as 16 bytes instead of a 36-character string cuts index size in half and speeds comparisons. Many “UUIDs are slow” benchmarks are really “v4 stored as CHAR(36) in MySQL is slow.”
“v4 is the only secure version.” v7’s random tail is still a large entropy pool, and for most applications — session references, API IDs — it is unguessable enough that an attacker would not enumerate them. Where guessability matters is the timestamp prefix: v7 reveals when the row was created. If that is acceptable (and it usually is), v7 is a reasonable choice even for external IDs.
“UUIDs must be stored as strings.” The string form is 36 characters (32 hex plus 4 hyphens) vs 16 bytes binary. Binary is more compact and sorts correctly as bytes, which matters for v1 with its non-monotonic byte order and for v7 where byte order should align with time. PostgreSQL’s uuid type already stores the value as 16 bytes, so there is nothing extra to think about there.
“v1’s MAC leak doesn’t matter because nobody looks.” The MAC in a v1 UUID is a known reversal — uuid -d and any forensics tool extract it. If your UUIDs appear in URLs, support tickets, or log dumps shared with third parties, that is a real information disclosure.
Checklist
- Will this UUID appear in a database index that takes frequent inserts? Prefer v7. Fall back to v4 only if unguessability of creation time is essential.
- Is the UUID visible to end users or partners? Either version works; just confirm that the timestamp leak of v7 is acceptable for the context.
- Are you on Postgres? Store as
uuid(16 bytes). On MySQL, useBINARY(16)unless string compatibility outweighs the size cost. - Do you need ordering guarantees across multiple generators? v7 alone is not enough; consider ULID with the same time prefix, or a Snowflake-style ID with an explicit machine ID.
- Is v1 still in the codebase? Document the MAC leak and plan a migration when the schema allows.
- Are you generating UUIDs client-side? Use a library that calls a cryptographically strong RNG (
crypto.randomUUID()in modern browsers for v4; v7 libraries typically wrap the same RNG).
Related Tool
The Patrache Studio UUID generator produces v4 and v7 UUIDs locally, so the generated values are never logged to a third-party service. UUIDs almost always ride inside JSON payloads — JSON Formatting, Validation, and Schema in Practice covers the schema patterns that keep those IDs well-typed as they move between services. When you need a compact textual form — for instance, a 22-character short ID derived from a 16-byte UUID — the rules in Base64 and URL Encoding: Purpose, Pitfalls, Correct Usage explain why Base64URL rather than standard Base64 is the right choice.
References
- IETF RFC 9562, “Universally Unique IDentifiers (UUIDs)” — https://datatracker.ietf.org/doc/html/rfc9562
- IETF RFC 4122, “A Universally Unique IDentifier (UUID) URN Namespace” (obsoleted by RFC 9562) — https://datatracker.ietf.org/doc/html/rfc4122
- PostgreSQL documentation, “UUID Type” — https://www.postgresql.org/docs/current/datatype-uuid.html
- ULID specification — https://github.com/ulid/spec