Ask HN: We just had an actual UUID v4 collision...

I know what you’re thinking… and I still can’t believe it, but… This morning, our database flagged a duplicate UUID (v4). I checked, thinking it might have been a double-insert bug or something, but no.

The original UUID belonged to a record added in 2025 (about a year ago), and today the system inserted a new document with a freshly generated UUIDv4 that came out exactly the same: b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd
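For reference, the quoted value has the exact shape of a version 4 UUID under the RFC 9562 (formerly RFC 4122) layout: the first hex digit of the third group is the version (`4`), and the top two bits of the fourth group’s first digit are the variant (`10`). The helper below is a minimal sketch with hypothetical names that reads those two fields from a string. It only shows that the value has the v4 shape; it cannot tell you which generator produced it.

```typescript
// Hypothetical helper: extract the version and variant fields from a UUID string.
// Layout per RFC 9562: xxxxxxxx-xxxx-Vxxx-Nxxx-xxxxxxxxxxxx
//   V = version nibble, top two bits of N = variant.
interface UuidFields {
  version: number;
  variantBits: string; // top two bits of the variant nibble, e.g. "10"
}

function parseUuidFields(uuid: string): UuidFields {
  const re = /^[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f])[0-9a-f]{3}-([0-9a-f])[0-9a-f]{3}-[0-9a-f]{12}$/i;
  const m = re.exec(uuid);
  if (m === null) {
    throw new Error(`not a UUID: ${uuid}`);
  }
  const version = parseInt(m[1], 16);
  const variantNibble = parseInt(m[2], 16);
  // Drop the low two bits of the nibble and render the remaining two as binary.
  const variantBits = (variantNibble >> 2).toString(2).padStart(2, "0");
  return { version, variantBits };
}

function isRfcV4(uuid: string): boolean {
  const { version, variantBits } = parseUuidFields(uuid);
  return version === 4 && variantBits === "10";
}

const reported = "b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd";
console.log(parseUuidFields(reported)); // { version: 4, variantBits: '10' }
console.log(isRfcV4(reported));         // true
```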

We’re using this: https://www.npmjs.com/package/uuid. I thought this was technically impossible and would never happen, and since we’re not modifying the UUIDs in any way, I really wonder how that… is possible!? We’re literally only calling:

  import { v4 as uuidv4 } from "uuid";
  const document_id = uuidv4();

…and then inserting it into the database. That’s it.
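Conceptually, a v4 generator takes 16 random bytes, overwrites 6 bits with the version and variant, and hex-formats the result, so everything depends on where those 16 bytes come from. The sketch below shows only that formatting step per RFC 9562; it is not the `uuid` package’s source, and `uuidV4FromBytes` is a hypothetical name. Taking the bytes as a parameter makes the key point explicit: identical input bytes always yield an identical UUID.

```typescript
// Hypothetical illustration of the RFC 9562 v4 layout (not the uuid package's code).
// The caller supplies 16 bytes; in production they must come from a CSPRNG
// such as crypto.getRandomValues().
function uuidV4FromBytes(input: Uint8Array): string {
  if (input.length !== 16) {
    throw new Error("need exactly 16 bytes");
  }
  const b = Uint8Array.from(input); // copy so the caller's buffer is untouched
  b[6] = (b[6] & 0x0f) | 0x40; // version 4 in the high nibble of byte 6
  b[8] = (b[8] & 0x3f) | 0x80; // variant bits 10 at the top of byte 8
  const hex = Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

// Same bytes in, same UUID out: uniqueness comes entirely from the entropy source.
const zeros = new Uint8Array(16);
console.log(uuidV4FromBytes(zeros)); // 00000000-0000-4000-8000-000000000000
```

That leaves 128 − 4 − 2 = 122 random bits per UUID. Modern browsers and recent Node.js releases also expose `crypto.randomUUID()`, which performs the same steps using the platform CSPRNG.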

Also, the database only has about 15,000 records, and now there’s one collision. Statistically… impossible. Has this ever happened to anyone?! What in the… help.
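The intuition holds if the bytes really are random. With 122 random bits per v4 UUID, the standard birthday approximation for n values is p ≈ 1 − exp(−n(n−1) / (2 · 2^122)). The sketch below evaluates it for roughly 15,000 records; the function name is illustrative, and the result assumes a healthy, uniform entropy source.

```typescript
// Birthday-bound estimate for at least one collision among n ideal UUIDv4 values.
// Assumes all 122 random bits are uniform and independent (a healthy CSPRNG).
const RANDOM_BITS = 122; // 128 bits minus 4 version bits and 2 variant bits

function v4CollisionProbability(n: number): number {
  const space = 2 ** RANDOM_BITS;
  const pairs = (n * (n - 1)) / 2;
  // p = 1 - exp(-pairs / space). Math.expm1 keeps precision for tiny values;
  // a plain 1 - Math.exp(...) would round to exactly 0 here.
  return -Math.expm1(-pairs / space);
}

console.log(v4CollisionProbability(15_000).toExponential(2)); // 2.12e-29
```

At this scale an ideal generator essentially never collides, which is why the replies below focus on the entropy source rather than on luck.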


jandrewrogers: This is surprisingly common. The collision resistance of UUIDv4 rests on the assumption of a high-quality entropy source. That assumption is broken by hardware defects, ordinary software bugs, and developers who don’t understand what “high-quality entropy” actually means, or that UUIDv4 requires it to work as advertised.
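One common way this assumption breaks is a generator seeded from something guessable, such as a timestamp: two processes that start from the same seed emit the same sequence, and therefore the same “random” UUIDs. The sketch below is deliberately broken to illustrate that failure mode. `xorshift32` is used as a stand-in non-cryptographic PRNG, `badUuid` is hypothetical, and nothing here reflects how the `uuid` package works.

```typescript
// DELIBERATELY BROKEN: a non-cryptographic PRNG with only a 32-bit seed.
// Used solely to show why UUIDv4 needs a high-quality entropy source.
function xorshift32(seed: number): () => number {
  let state = seed >>> 0 || 1; // xorshift must not start at 0
  return () => {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0;
    return state;
  };
}

// Hypothetical "UUID" generator that fills 16 bytes from the PRNG above.
function badUuid(next: () => number): string {
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i += 4) {
    const word = next();
    bytes[i] = word >>> 24;
    bytes[i + 1] = (word >>> 16) & 0xff;
    bytes[i + 2] = (word >>> 8) & 0xff;
    bytes[i + 3] = word & 0xff;
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant 10
  const hex = Array.from(bytes, (x) => x.toString(16).padStart(2, "0")).join("");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Simulate two workers that both seeded from Date.now() in the same millisecond
// (a fixed value here so the demo is reproducible).
const seed = 1_700_000_000_000 % 2 ** 32;
const a = badUuid(xorshift32(seed));
const b = badUuid(xorshift32(seed));
console.log(a === b); // true: identical seeds give identical UUIDs
```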

It is relatively expensive to detect when an entropy source is broken, so almost no one ever does. They find out when a collision happens, like you just did. UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.
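Thorough statistical validation is expensive, but a couple of cheap checks can at least catch gross failures such as a stuck or repeating source. The sketch below shows two classic ones: a continuous test that rejects a block identical to the previous block, and a monobit count over 20,000 bits using the acceptance range from the original FIPS 140-2 self-tests (9,725 < ones < 10,275). The names are hypothetical, and passing says very little: a buffer of repeated `0x55` bytes passes the monobit test while being obviously non-random.

```typescript
// Hypothetical cheap health checks for a byte-oriented entropy source.
// They detect gross failures only; they do not certify randomness.
const MONOBIT_BYTES = 2500; // 20,000 bits

function countOnes(byte: number): number {
  let n = 0;
  for (let v = byte; v !== 0; v >>= 1) n += v & 1;
  return n;
}

// Monobit test with the acceptance range from the original FIPS 140-2 self-tests.
function monobitPasses(sample: Uint8Array): boolean {
  if (sample.length !== MONOBIT_BYTES) {
    throw new Error(`expected ${MONOBIT_BYTES} bytes`);
  }
  let ones = 0;
  for (let i = 0; i < sample.length; i++) ones += countOnes(sample[i]);
  return ones > 9725 && ones < 10275;
}

// Continuous test: fail if a block exactly repeats the previous block.
class ContinuousCheck {
  private last: Uint8Array | null = null;

  accept(block: Uint8Array): boolean {
    const prev = this.last;
    this.last = Uint8Array.from(block); // keep a copy for the next comparison
    if (prev === null || prev.length !== block.length) return true;
    return !prev.every((v, i) => v === block[i]);
  }
}

console.log(monobitPasses(new Uint8Array(MONOBIT_BYTES)));            // false (all zeros)
console.log(monobitPasses(new Uint8Array(MONOBIT_BYTES).fill(0x55))); // true, yet clearly not random
```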


LocalH: This is why Cloudflare built its lava lamp wall. Not that the wall is such a great source of entropy on its own (I’m sure it’s not their only source, but you can never have too many sources of entropy), but it makes entropy visible in a way that can grab people who don’t fully understand RNGs and the role entropy plays in them.

The more sources of entropy, the more closely you approach “perfect” randomization, and a large share of those sources need to be non-deterministic. Even at a small scale, local applications such as games can use the mouse coordinates, the timing between button presses, or the exact frame count from game start until the player presses Start to greatly enhance randomness while still using PRNGs under the hood.
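A minimal sketch of that game-style approach: fold a handful of player-driven measurements (cursor position, input timing, frame count) into a 32-bit seed with FNV-1a, then drive the mulberry32 PRNG from it. FNV-1a and mulberry32 are assumptions chosen for brevity; both are fine for gameplay but are not cryptographic, so this is no substitute for a CSPRNG when generating identifiers like UUIDs.

```typescript
// Fold arbitrary numeric measurements into a 32-bit seed using FNV-1a.
// Each value is serialized as a float64 so fractional timings are kept.
function seedFromEvents(values: number[]): number {
  const view = new DataView(new ArrayBuffer(8));
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (const v of values) {
    view.setFloat64(0, v);
    for (let i = 0; i < 8; i++) {
      hash ^= view.getUint8(i);
      hash = Math.imul(hash, 0x01000193) >>> 0; // FNV 32-bit prime
    }
  }
  return hash;
}

// mulberry32: a small, fast, NON-cryptographic PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical inputs captured before the player presses Start.
const mouseX = 412;
const mouseY = 187;
const msBetweenPresses = 231.4;
const framesBeforeStart = 5873;

const rng = mulberry32(seedFromEvents([mouseX, mouseY, msBetweenPresses, framesBeforeStart]));
const diceRoll = 1 + Math.floor(rng() * 6); // 1..6
```

Because the seed depends only on these inputs, replaying identical inputs reproduces the same sequence. That is handy for debugging a game, and it is exactly the property that makes this unsuitable for unique IDs.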


greiskul: > you can never have too many sources of entropy.

This is so true. And the beauty is that with algorithms, we don’t even need to know much about the entropy to be able to extract it. There’s the von Neumann method of generating an unbiased coin from a biased coin… and then there’s modern cryptographic hashing. Feed it all the bits you can. In the real world, collisions only end up happening if every single one of those bits is identical. So if you’re feeding in actual entropy that cannot be controlled, predicted, or replicated, modern cryptography tells you that the end result is, for all practical purposes, unique.
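The von Neumann trick works on pairs of bits from an independent but biased source: `01` and `10` are equally likely whatever the bias, so emit one bit for each unequal pair and discard `00` and `11`. Below is a minimal sketch (the function name is hypothetical; emitting the first bit of each unequal pair is one of two equivalent conventions). It assumes the flips are independent and does nothing about correlation between successive bits.

```typescript
// Von Neumann extractor: turns independent-but-biased bits into unbiased bits.
// Pairs (1,0) -> 1 and (0,1) -> 0; pairs (0,0) and (1,1) are discarded.
// A trailing unpaired bit is ignored.
function vonNeumannExtract(bits: readonly number[]): number[] {
  const out: number[] = [];
  for (let i = 0; i + 1 < bits.length; i += 2) {
    const first = bits[i];
    const second = bits[i + 1];
    if (first !== second) out.push(first);
  }
  return out;
}

console.log(vonNeumannExtract([1, 0, 0, 1, 1, 1, 0, 0, 1])); // [ 1, 0 ]

// Rough check with a simulated coin that lands 1 about 80% of the time.
const flips = Array.from({ length: 100_000 }, () => (Math.random() < 0.8 ? 1 : 0));
const unbiased = vonNeumannExtract(flips);
const ones = unbiased.filter((bit) => bit === 1).length;
console.log((ones / unbiased.length).toFixed(3)); // close to 0.5
```

The price is throughput: with an 80/20 coin only about 32% of pairs (2 × 0.8 × 0.2) produce an output bit.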