How Monero’s proof of work works

How Monero’s proof of work works

门罗币的工作量证明机制是如何运作的

Monero’s proof of work is called RandomX. Monero does not ask miners to run the same tiny hash function over and over. It asks them to run a small random program on a virtual machine, hit memory hard while doing it, and then hash the result. Bitcoin’s proof of work is great for specialized chips because the work never changes. RandomX was built to do the opposite. It tries to make efficient mining look as much like a normal CPU workload as possible. 门罗币(Monero)的工作量证明机制被称为 RandomX。门罗币并不要求矿工反复运行同一个微小的哈希函数,而是要求他们在虚拟机上运行一段随机的小程序,在此过程中高强度地占用内存,最后对结果进行哈希处理。比特币的工作量证明非常适合专用芯片,因为其计算任务从不改变。而 RandomX 的设计初衷恰恰相反,它试图让高效的挖矿过程尽可能看起来像普通的 CPU 工作负载。

Short version

简短总结

Here is the shortest useful summary: Monero takes the candidate block header plus a nonce. It also uses an older block hash as a medium-term key. That key builds a large shared memory dataset. The candidate block input is hashed to create a seed for a special virtual machine. The VM runs integer math, floating-point math, branches, and lots of memory accesses across 8 chained programs. The final machine state is hashed into a 256-bit output. If that output is below the network target, the block is valid. The interesting part is not the yes-or-no rule at the end. Every proof-of-work system has that. The interesting part is how Monero makes each hash attempt expensive in the exact ways normal CPUs are good at and custom chips hate. 以下是最简明扼要的总结:门罗币获取候选区块头和随机数(nonce),并使用一个较旧的区块哈希作为中期密钥。该密钥会构建一个大型共享内存数据集。候选区块输入被哈希处理后,为一台特殊的虚拟机生成种子。该虚拟机在 8 个链式程序中运行整数运算、浮点运算、分支跳转以及大量的内存访问。最终的机器状态被哈希成一个 256 位的输出。如果该输出低于网络目标值,则区块有效。有趣之处不在于最后的“是或否”判定规则(每个工作量证明系统都有),而在于门罗币如何让每一次哈希尝试的成本,恰好落在普通 CPU 擅长而定制芯片厌恶的领域。

Why Monero does not use a simple hash

为什么门罗币不使用简单的哈希算法

If your proof of work is just “run this fixed function on new inputs until you get a lucky output,” hardware designers have a clear job: build silicon that runs that exact function as cheaply and as fast as possible. That is what happened to Bitcoin with SHA-256 ASICs. Monero did not want that path. Long before RandomX, the project was explicit that specialized mining hardware creates centralization pressure. Fewer manufacturers matter more. Large farms matter more. Ordinary users matter less. Monero’s earlier answer was the CryptoNight family. Later, in late 2019, Monero switched to RandomX, which its own release notes described as a new proof of work “based on random instructions, adapted to CPUs.” So the design target changed from make memory matter to make a whole CPU matter. 如果你的工作量证明仅仅是“在新的输入上运行这个固定函数,直到得到一个幸运输出”,那么硬件设计者的任务就很明确:制造出能以最低成本、最快速度运行该特定函数的芯片。这就是比特币在 SHA-256 ASIC 矿机上发生的情况。门罗币不想走这条路。早在 RandomX 之前,该项目就明确表示,专用挖矿硬件会产生中心化压力。少数制造商变得更重要,大型矿场变得更重要,而普通用户则变得无足轻重。门罗币早期的解决方案是 CryptoNight 系列算法。后来,在 2019 年底,门罗币转向了 RandomX,其发布说明将其描述为一种“基于随机指令、适配 CPU”的新型工作量证明。因此,设计目标从“让内存变得重要”转变为“让整个 CPU 变得重要”。

The core idea behind RandomX

RandomX 的核心理念

RandomX starts from one observation: CPUs are not just arithmetic boxes. They are flexible machines built to run changing code and juggle a lot of hardware features at once. A modern CPU has: multiple cache levels, integer units, floating-point units, branch handling, out-of-order execution, speculative execution, and memory controllers. Normal cryptographic hashes do not use much of that variety. They mostly push data through a fixed pipeline. RandomX tries to bind proof of work to those broader CPU strengths. Its design document says the work must be dynamic. That means the miner is not just feeding in new data. The miner is also getting new code to run. That is why RandomX is based on random code execution. RandomX 基于一个观察:CPU 不仅仅是算术运算盒。它们是灵活的机器,旨在运行不断变化的代码并同时处理多种硬件特性。现代 CPU 拥有:多级缓存、整数单元、浮点单元、分支处理、乱序执行、推测执行和内存控制器。普通的加密哈希算法并没有利用到这些多样性,它们大多只是将数据推入固定的流水线。RandomX 试图将工作量证明与这些更广泛的 CPU 优势绑定。其设计文档指出,工作必须是动态的。这意味着矿工不仅是在输入新数据,还在运行新的代码。这就是为什么 RandomX 基于随机代码执行。

What a miner is actually computing

矿工实际上在计算什么

At the Monero level, RandomX takes two important inputs: a key K and a hashing input H. For Monero, K comes from an older block hash, called the key block. The RandomX reference README recommends changing this key every 2048 blocks with a 64-block delay, and that is how Monero wires it in. That detail matters because miners do not rebuild the heavy shared memory structures for every nonce. They rebuild them only when the key changes, roughly every 2.8 days. H is the candidate block hashing blob with a chosen nonce. That is the part miners keep changing over and over. So you can think of Monero mining like this: the network gives you a medium-term environment through K; your candidate block gives you a per-attempt input through H. The environment changes slowly. The attempt changes constantly. 在门罗币层面,RandomX 接收两个重要输入:密钥 K 和哈希输入 H。对于门罗币而言,K 来自一个较旧的区块哈希,称为“密钥区块”(key block)。RandomX 的参考 README 建议每 2048 个区块更换一次密钥,并有 64 个区块的延迟,门罗币正是这样实现的。这个细节很重要,因为矿工不需要为每一个随机数(nonce)重建沉重的共享内存结构。他们只在密钥更改时(大约每 2.8 天)才进行重建。H 是带有选定随机数的候选区块哈希数据块。这是矿工不断更改的部分。因此,你可以这样理解门罗币挖矿:网络通过 K 为你提供了一个中期环境;你的候选区块通过 H 为你提供了每次尝试的输入。环境变化缓慢,而尝试则不断变化。

Step 1: build the cache from the key

第一步:从密钥构建缓存

RandomX first takes the key K and runs Argon2d on it. Argon2d is better known as a password-hashing and key-derivation function. It is useful here for the same reason it is useful there: it is memory-hard. It forces the machine to touch a lot of memory in a way that is annoying to cheat. In the default RandomX parameters, this produces a 256 MiB cache. That cache is the smaller of the two big memory structures in RandomX. It is not the structure miners want to use directly for maximum speed. It is the structure used to build the bigger one. RandomX 首先获取密钥 K 并对其运行 Argon2d。Argon2d 更广为人知的是作为一种密码哈希和密钥派生函数。它在这里之所以有用,原因与它在密码学中的用途相同:它是“内存硬”(memory-hard)的。它强制机器以一种难以作弊的方式访问大量内存。在默认的 RandomX 参数下,这会产生一个 256 MiB 的缓存。该缓存是 RandomX 中两个大型内存结构中较小的一个。它不是矿工为了获得最大速度而直接使用的结构,而是用于构建更大结构的基础。

Step 2: expand the cache into the dataset

第二步:将缓存扩展为数据集

From that 256 MiB cache, RandomX builds the dataset. The default dataset size is: 2,147,483,648 bytes base size + 33,554,368 bytes extra size. Together that is about 2080 MiB, a little over 2 GiB. This odd-looking size is deliberate. It is big enough to spill out of on-chip memory and into DRAM, and the extra non-power-of-two tail makes life more annoying for hardware designers. The dataset is read-only during hashing. RandomX uses it to force regular DRAM traffic. The design doc says each program iteration reads one 64-byte dataset item, and across a whole hash result that becomes 16,384 dataset reads. That gives RandomX one of its main bottlenecks: memory access, not just arithmetic. RandomX 从那 256 MiB 的缓存中构建数据集。默认数据集大小为:2,147,483,648 字节的基础大小加上 33,554,368 字节的额外大小。总计约 2080 MiB,略高于 2 GiB。这个看起来很奇怪的大小是刻意设计的。它足够大,以至于会溢出片上内存进入 DRAM,而额外的非 2 的幂次方尾部让硬件设计者的工作变得更加麻烦。数据集在哈希过程中是只读的。RandomX 利用它来强制产生常规的 DRAM 流量。设计文档称,每个程序迭代读取一个 64 字节的数据集项,在整个哈希结果中,这相当于 16,384 次数据集读取。这为 RandomX 带来了其主要瓶颈之一:内存访问,而不仅仅是算术运算。

Step 3: initialize the scratchpad from the block input

第三步:从区块输入初始化暂存区(scratchpad)

Now RandomX turns to the per-hash input H. It computes Hash512(H) using Blake2b. That 64-byte result seeds an AES-based generator, which fills the scratchpad. The scratchpad is the VM’s working memory. Unlike the large dataset, it is meant to live in CPU cache, not DRAM. Its default size is 2 MiB, split to mimic CPU cache levels: 16 KiB L1, 256 KiB L2, 2 MiB L3. This is one of the smartest parts of RandomX. It uses two very different memory structures at once: a big dataset to hit DRAM, a smaller scratchpad to behave like cache-heavy code. That lets it pressure both the memory subsystem and the CPU core. 现在 RandomX 转而处理每次哈希的输入 H。它使用 Blake2b 计算 Hash512(H)。该 64 字节的结果为一个基于 AES 的生成器提供种子,进而填充暂存区。暂存区是虚拟机的内存工作区。与大型数据集不同,它旨在驻留在 CPU 缓存中,而不是 DRAM 中。其默认大小为 2 MiB,并被拆分以模拟 CPU 缓存层级:16 KiB L1、256 KiB L2、2 MiB L3。这是 RandomX 最聪明的地方之一。它同时使用了两种截然不同的内存结构:一个用于冲击 DRAM 的大型数据集,以及一个表现得像缓存密集型代码的小型暂存区。这使得它能够同时对内存子系统和 CPU 核心施加压力。

Step 4: generate a random program

第四步:生成随机程序

After the scratchpad is ready, RandomX generates a program for its virtual machine. This is not a C program or a JavaScript program. It is a compact VM program with its own instruction set. Two details matter a lot: Every instruction is 8 bytes long. Any 8-byte word is… 在暂存区准备好后,RandomX 为其虚拟机生成一个程序。这不是 C 程序或 JavaScript 程序,而是一个拥有自己指令集的紧凑型虚拟机程序。有两个细节非常重要:每条指令长度为 8 字节。任何 8 字节的字都是……