zlib-rs in Firefox
2026-06-16 Author: Folkert de Vries
As of 151.0.0, Firefox uses zlib-rs for gzip (de)compression. This is very exciting, and has both performance and safety advantages. We first started talking to Mozilla engineers in summer 2024, and it took 2 years to actually get zlib-rs into production. What took us so long?
Integrating zlib-rs into the Firefox codebase
Switching to zlib-rs is not entirely trivial: we present zlib-rs as a drop-in compatible replacement, but there are some asterisks to this claim. We change the algorithms that are used at the different compression levels (in a way that is consistent with zlib-ng, but inconsistent with stock zlib), so the exact output bytes and output length can change slightly. The Firefox test suite tested for the exact output bytes in some cases, and for the (rough) output length in more. This is a good fail-safe against messing up the compression configuration, but it meant that all of these tests needed to be updated.
Firefox also adds a prefix to all symbols: instead of inflate it uses MOZ_Z_inflate to prevent symbol clashes. We’ve long supported prefixing the symbol name in various ways, so getting this to work was just a matter of configuration. So some work was needed, but the changes were straightforward. All seemed well, until…
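As an illustration of what symbol prefixing means on the Rust side, here is a minimal sketch. The `prefixed!` macro and the `inflate_stub` function are hypothetical names made up for this example, and the prefix is hard-coded; this is not the actual zlib-rs configuration mechanism.

```rust
// Hypothetical helper: prepend a fixed prefix to a symbol name at compile time.
// A real build would typically take the prefix from configuration instead of
// hard-coding it.
macro_rules! prefixed {
    ($name:literal) => {
        concat!("MOZ_Z_", $name)
    };
}

// The Rust-level name stays `inflate_stub`; the linker-visible symbol becomes
// `MOZ_Z_inflate`, so it cannot clash with another zlib in the same process.
// (`unsafe(...)` around the attribute is required in the 2024 edition and
// accepted by recent compilers in earlier editions.)
#[unsafe(export_name = prefixed!("inflate"))]
pub extern "C" fn inflate_stub(flush: i32) -> i32 {
    // Placeholder body: a real inflate takes a stream pointer and does work.
    flush
}
```

Rust code keeps calling `inflate_stub` by its Rust name; only C callers and the linker see the prefixed symbol.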
Intel CPU bug
We started seeing crashes. The logs showed that a bounds check had failed that logically couldn’t fail. Of course, we’re lucky that we even got a bounds check failure; in C you’d just get silent data corruption. We could not reproduce the issue locally, and as more reports came in, a pattern started to emerge: our implementation triggered the infamous Intel Raptor Lake CPU bug. This generation of CPUs is plagued by instability and degradation issues. Something in our code was prone to triggering these issues, but of course we had no idea what, or even how to track it down.
Eventually Fabian Giesen wrote “Oodle 2.9.14 and Intel 13th/14th gen CPUs”, which identifies the problem as a particular instruction used in writing the result of Huffman coding to memory. Zlib also uses Huffman coding, and zlib-rs turned out to also use the offending instruction. Still, finding and shipping the solution in Firefox was not a quick fix. This May, shortly after the 151 release, Mozilla engineers shipped the patch (see “After a year, Firefox finally stops crashing on Intel’s Raptor Lake CPUs — Mozilla releases new version patch critical flaw on Intel 13th-gen and 14th-gen CPUs”).
Fixing the bug
Once you know what to look for, fixing the issue is reasonably straightforward. We had this function:
    pub fn push_dist(&mut self, dist: u16, len: u8) {
        let buf = &mut self.buf.as_mut_slice()[self.filled..][..3];
        let [dist1, dist2] = dist.to_le_bytes();
        buf[0] = dist1;
        buf[1] = dist2;
        buf[2] = len;
        self.filled += 3;
    }
This code is dead simple: we assign three byte values to consecutive indices of an array. But the assembly for this function (with LLVM 22) contains a store to memory from ch, the register that holds bits 8-15 of RCX: mov byte ptr [rsi + rdi + 1], ch. Due to the hardware bug, this instruction occasionally writes bits 0-7 instead, causing the crashes we were seeing.
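To make the failure mode concrete, here is a small software model of it. This is only a sketch of the effect described above (the high-byte store picking up the low byte), not a reproduction of the hardware behavior; the function names are invented for this example.

```rust
/// What a correct CPU stores for `dist`: low byte first, then high byte.
pub fn store_correct(dist: u16) -> [u8; 2] {
    dist.to_le_bytes()
}

/// Hypothetical model of the faulty store: the write that should take
/// bits 8-15 (`ch`) takes bits 0-7 (`cl`) instead.
pub fn store_faulty(dist: u16) -> [u8; 2] {
    let [lo, _hi] = dist.to_le_bytes();
    [lo, lo]
}

/// Reassemble the distance the way a later reader of the buffer would.
pub fn load(bytes: [u8; 2]) -> u16 {
    u16::from_le_bytes(bytes)
}
```

In this model a distance of 0x1234 is read back as 0x3434. A silently wrong distance like that is the kind of value that could plausibly trip a bounds check further down the line.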
To work around LLVM emitting this particular instruction, we use a tiny bit of unsafe code (LLVM is clever, so this was the simplest way we’ve found to have it generate the right thing):
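    pub fn push_dist(&mut self, dist: u16, len: u8) {
        let buf = &mut self.buf.as_mut_slice()[self.filled..][..3];
        let bytes = dist.to_le_bytes();
        // SAFETY: `buf` is exactly 3 bytes long (the slicing above is bounds-checked),
        // so a 2-byte write at its start is in bounds; `write_unaligned` has no
        // alignment requirement.
        unsafe { buf.as_mut_ptr().cast::<[u8; 2]>().write_unaligned(bytes) }
        buf[2] = len;
        self.filled += 3;
    }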
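To check that the two versions above behave identically at the source level, both can be dropped into a small harness. This is a minimal sketch: `SymBuf`, `written` and `compare` are hypothetical stand-ins invented for this example, and only the field names `buf` and `filled` come from the snippets.

```rust
/// Hypothetical stand-in for the real buffer type.
pub struct SymBuf {
    buf: Vec<u8>,
    filled: usize,
}

impl SymBuf {
    pub fn with_capacity(n: usize) -> Self {
        Self { buf: vec![0; n], filled: 0 }
    }

    /// Original version: two separate byte stores for the distance.
    pub fn push_dist_bytewise(&mut self, dist: u16, len: u8) {
        let buf = &mut self.buf.as_mut_slice()[self.filled..][..3];
        let [dist1, dist2] = dist.to_le_bytes();
        buf[0] = dist1;
        buf[1] = dist2;
        buf[2] = len;
        self.filled += 3;
    }

    /// Workaround version: a single two-byte store for the distance.
    pub fn push_dist_wide(&mut self, dist: u16, len: u8) {
        let buf = &mut self.buf.as_mut_slice()[self.filled..][..3];
        let bytes = dist.to_le_bytes();
        // SAFETY: `buf` is exactly 3 bytes long, so 2 bytes at its start are
        // in bounds, and `write_unaligned` has no alignment requirement.
        unsafe { buf.as_mut_ptr().cast::<[u8; 2]>().write_unaligned(bytes) }
        buf[2] = len;
        self.filled += 3;
    }

    /// The bytes written so far.
    pub fn written(&self) -> &[u8] {
        &self.buf[..self.filled]
    }
}

/// Push the same (dist, len) pairs through both versions and return both outputs.
pub fn compare(pairs: &[(u16, u8)]) -> (Vec<u8>, Vec<u8>) {
    let mut a = SymBuf::with_capacity(pairs.len() * 3);
    let mut b = SymBuf::with_capacity(pairs.len() * 3);
    for &(dist, len) in pairs {
        a.push_dist_bytewise(dist, len);
        b.push_dist_wide(dist, len);
    }
    (a.written().to_vec(), b.written().to_vec())
}
```

This only confirms that both versions produce the same bytes. It says nothing about which instructions the compiler emits; that still has to be checked by inspecting the generated assembly for the toolchain in use.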
The fix in Firefox by Mike Hommey is here. The patch has been upstreamed into zlib-rs and we will continue to carry that patch for the foreseeable future: it’s a marginal amount of unsafe that is easily vetted. These are the sacrifices we make to run reliably on a variety of platforms. It turns out that LLVM 23 no longer emits the offending instruction, although I believe that is serendipitous and not deliberate. When we bump our MSRV to a version that requires LLVM 23 (e.g. for custom allocators and c-variadic functions) we can drop this workaround.
Results
So why go through all of this trouble? Because zlib-rs is faster. Much faster. Especially on Linux x86_64 the speedup is almost silly. These benchmarks from zlib-py compare stock zlib versus zlib-rs:
(Benchmark table omitted here.)
Compression is also faster, but harder to compare because of the difference in compression ratio.