A new register allocator for ZJIT
A new register allocator for ZJIT
ZJIT 的新寄存器分配器
We recently landed a new register allocator in ZJIT and I thought I’d write a post about it! What is a register allocator? Whenever a compiler generates machine code it needs to decide where to put values. Those values usually take the shape of a variable in your function, though the compiler can also compute intermediate values as well. 我们最近在 ZJIT 中引入了一个新的寄存器分配器,我想写一篇文章来介绍它!什么是寄存器分配器?每当编译器生成机器码时,它都需要决定将数值放在哪里。这些数值通常表现为你函数中的变量,尽管编译器也可以计算中间值。
When we need to perform a calculation on some value, the CPU needs to know how to find the value. The CPU can typically only compute output based on inputs that are in registers, though some architectures (like x86) allow computations on values stored in memory. That said, reading and writing to registers is much faster than memory, so it behooves the compiler to keep values in registers as long as possible. 当我们对某个数值进行计算时,CPU 需要知道如何找到该数值。CPU 通常只能基于寄存器中的输入来计算输出,尽管某些架构(如 x86)允许对存储在内存中的值进行计算。话虽如此,读写寄存器的速度远快于内存,因此编译器有必要尽可能长时间地将数值保留在寄存器中。
Any particular function in your program could have tons of variables, but the number of available registers is finite, and architecture dependent. This is where a register allocator comes in. The register allocator looks at all of the variables, then figures out which registers they should go in, and if there aren’t enough registers figures out how to “spill” those variables to memory. 程序中的任何特定函数都可能拥有大量的变量,但可用寄存器的数量是有限的,且取决于架构。这就是寄存器分配器发挥作用的地方。寄存器分配器会查看所有变量,确定它们应该放入哪些寄存器,如果寄存器不足,则会计算如何将这些变量“溢出”(spill)到内存中。
How does it work? There are several well-known approaches to register allocation, each with different trade-offs between compile time and code quality. For ZJIT, we chose to implement a linear scan register allocator based on a reduced version of Christian Wimmer’s paper titled “Linear Scan Register Allocation on SSA Form”. The paper isn’t very long so I highly recommend giving it a read when you have time! Alternatively, Max Bernstein wrote a blog post breaking down the paper as well. 它是如何工作的?有几种众所周知的寄存器分配方法,每种方法在编译时间和代码质量之间都有不同的权衡。对于 ZJIT,我们选择实现一种线性扫描寄存器分配器,它基于 Christian Wimmer 的论文《Linear Scan Register Allocation on SSA Form》的简化版本。这篇论文不长,所以我强烈建议你有时间读一读!另外,Max Bernstein 也写了一篇博客文章对该论文进行了详细解读。
ZJIT uses Static single-assignment form, or SSA form in its back end. “SSA form” is a representation of code that only allows a variable to be assigned once. For example, this Ruby code: ZJIT 在其后端使用静态单赋值形式(SSA form)。“SSA 形式”是一种只允许变量被赋值一次的代码表示方式。例如,这段 Ruby 代码:
a = 123
a += 1
a
Would be represented in our backend Low-level Intermediate Representation (or “LIR”) kind of like this: 在我们的后端低级中间表示(或“LIR”)中,它会被表示为类似这样:
v1 = Const(123)
v2 = Const(1)
v3 = Add v1, v2
CRet v3
In the above pseudocode (it’s not exactly the same as the IR we use in the backend, but quite close), v1, v2, and v3 are all different variables. No variables are allowed to be re-assigned like in the corresponding Ruby code. The generated intermediate representation has exactly the same semantics as the Ruby code, but adds the restriction that a variable can only be written to once. 在上面的伪代码中(它与我们在后端使用的 IR 不完全相同,但非常接近),v1、v2 和 v3 都是不同的变量。不允许像对应的 Ruby 代码那样重新赋值变量。生成的中间表示与 Ruby 代码具有完全相同的语义,但增加了一个限制:变量只能被写入一次。
When ZJIT generates an intermediate representation of the Ruby code, it translates all variables to these numeric SSA variables and also uses SSA variables for temporary values. In the example translation above we can think of v1 as being equivalent to a, v2 as a temporary variable with the value 1, and v3 as a new version of a that has been added with v2. Variables can be written to once, but you can read from the variable as many times as you want. We call the place where the variable was written the “definition” and the place where the variable is read a “use”. 当 ZJIT 生成 Ruby 代码的中间表示时,它会将所有变量转换为这些数字化的 SSA 变量,并为临时值也使用 SSA 变量。在上面的转换示例中,我们可以认为 v1 等同于 a,v2 是一个值为 1 的临时变量,而 v3 是 a 的一个新版本,即 a 加上 v2 的结果。变量可以被写入一次,但你可以根据需要多次读取该变量。我们将变量被写入的地方称为“定义”(definition),将变量被读取的地方称为“使用”(use)。
Lifetimes / 生命周期
The first thing a register allocator needs to understand is when each value is alive and for how long. We call this duration a “lifetime” or “live range”. A value’s lifetime (or live range) starts at the point where it’s defined (the definition) and ends at the place it is last used (its “last use”). 寄存器分配器首先需要理解的是每个数值何时存活以及存活多久。我们将这段持续时间称为“生命周期”或“活跃范围”。一个数值的生命周期(或活跃范围)始于它被定义的地方(定义),终于它最后一次被使用的地方(“最后使用”)。
If two values have overlapping lifetimes they can’t share the same register. If there are more overlapping lifetimes than registers, then we know we need to spill a value to memory. Since these ranges refer to the “lifetime” of the variable, it’s common to say that the variable “came to life” at its definition, and then “died” at its last use. 如果两个数值的生命周期重叠,它们就不能共享同一个寄存器。如果重叠的生命周期数量超过了寄存器数量,那么我们就知道需要将一个数值溢出到内存中。由于这些范围指的是变量的“生命周期”,通常我们会说变量在定义时“诞生”,在最后一次使用时“死亡”。
Let’s look at an example. Consider this Ruby method that is already in SSA form: 让我们看一个例子。考虑这个已经处于 SSA 形式的 Ruby 方法:
def add_twice(a, b, c)
d = a + b
e = d + c
e
end
Here are the lifetimes for each value: 以下是每个数值的生命周期:
# Instruction | a | b | c | d | e
-------------------+-----+-----+-----+-----+----
1 d = a + b | x | x | . | . |
2 e = d + c | | | x | x | .
3 return e | | | | | x
A . character means that the variable is alive at that instruction. An x character means the variable dies, but is still used at that instruction. We can also express these lifetimes as ranges. The live range for each value is [definition, last use]:
. 字符表示变量在该指令处存活。x 字符表示变量在该指令处死亡,但仍在该指令中被使用。我们也可以将这些生命周期表示为范围。每个数值的活跃范围是 [定义, 最后使用]:
a: [0, 1] b: [0, 1] c: [0, 2] d: [1, 2] e: [2, 3]
Where instruction 0 represents the function entry (parameter definitions). The parameters a, b, and c are all alive at instruction 1 because they were defined before the method body. a and b are last used at instruction 1, so they die after that. c stays alive through instruction 2 because that’s where it’s last used. d is defined at instruction 1 and last used at instruction 2, so it’s alive for both. e is defined at instruction 2 and last used at instruction 3. 其中指令 0 代表函数入口(参数定义)。参数 a、b 和 c 在指令 1 处都是存活的,因为它们是在方法体之前定义的。a 和 b 在指令 1 处最后一次被使用,所以它们在那之后死亡。c 在指令 2 处依然存活,因为那是它最后一次被使用的地方。d 在指令 1 处定义,在指令 2 处最后一次使用,所以它在这两处都存活。e 在指令 2 处定义,在指令 3 处最后一次使用。
At instruction 1, a and b die and d comes to life. Since we know that a and b are never used again, we’re free to reuse one of their registers for d. So we only need three registers at instruction 1: one each for a/d, b, and c. Similarly at instruction 2, c and d die as e is born, so we can reuse a register again. 在指令 1 处,a 和 b 死亡,d 诞生。由于我们知道 a 和 b 不会再被使用,我们可以自由地将它们的一个寄存器重用于 d。因此,在指令 1 处我们只需要三个寄存器:a/d、b 和 c 各一个。同样在指令 2 处,c 和 d 死亡,e 诞生,所以我们可以再次重用寄存器。
Computing lifetimes involves a backward dataflow analysis over the control flow graph. We walk the instructions in reverse order, tracking which values are currently live. When we see the definition or use of a value, we know it is live at that instruction. 计算生命周期涉及对控制流图进行反向数据流分析。我们以相反的顺序遍历指令,跟踪哪些数值当前是存活的。当我们看到一个数值的定义或使用时,我们就知道它在该指令处是存活的。
ZJIT has a debugging option to dump live range graphs like this: ZJIT 有一个调试选项可以导出如下的活跃范围图:
$ ruby --zjit-call-threshold=2 --zjit-dump-lir=live_intervals ../test.rb
Here is an example of the output graph from one basic block as an excerpt from many blocks in a larger function: 这是一个来自较大函数中多个基本块之一的输出图示例:
v0 v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11
--- --- --- --- --- --- --- --- --- --- --- ---
i0 : . . . . . . . . . . . . FrameSetup
i2 : v . . . . . . . . . . . v0 = Load [x21 - 0x28]
i4 : █ v . . . . . . . . . . v1 = Load [x21 - 0x20]
i6 : █ █ . . . . . . . . . . Jmp bb3_l2([x19 + 0x18], v0, v1)
As mentioned earlier, in ZJIT, SSA variables names are just numbers. In the block above we have 12 variables numbered 0 through 11. These variables are listed in the first 如前所述,在 ZJIT 中,SSA 变量名只是数字。在上面的块中,我们有 12 个编号从 0 到 11 的变量。这些变量列在第一行。