My Thoughts on Bun's Rust Rewrite

Before discussing Bun’s Rust rewrite, something needs to be said, because no one is saying it.

Bun stands where it does today because of Zig. Jarred chose Zig not because it was “cool,” but because it let a small team rapidly prototype a high-performance JS runtime without a GC and without a heavy runtime. Zig’s low friction, direct memory manipulation, and straightforward C interop were the core reasons Bun could punch above its weight on performance with an extremely small team in its early days. The architecture, data structures, and low-level design of Bun that you see today were shaped by Zig.

Jarred himself said that the architecture doesn’t change and the data structures don’t change. In plain English: the skeleton that the Rust rewrite inherits was built with Zig. Building the foundation with Zig, shipping the product with Zig, raising funding with Zig, and then switching to a more “mainstream” tech stack once the company has been acquired and grown strong: there’s nothing wrong with that. It’s a normal business decision. That’s how tech debt works in Silicon Valley startups.

The Zig community doesn’t need Bun’s gratitude, but please don’t pretend this rewrite happened because Zig itself is inadequate.

The Real Issue No One Dares to Say

Now, let’s discuss the rewrite itself. 6,755 commits, on a branch named claude/phase-a-port, in a PR opened May 8th and merged May 14th. Six days. A full rewrite of a production-grade JS runtime, merged in six days. Let that number sit in your mind for a second.

There’s a fundamental principle in software engineering: code you don’t understand should not run in production. Not because it necessarily has bugs, but because when it does break, you won’t know where to start looking. This principle isn’t conservatism; it’s the baseline of maintainability.

6,755 commits, not a single line written by a human. The PR’s reviewer list: coderabbitai[bot] reviewed it, claude[bot] reviewed it, and the only human reviewer, alii, was still marked “Awaiting requested review,” meaning they hadn’t even looked. Code written by Claude, reviewed by Claude. This closed loop isn’t logically impossible, but it means that no human being has actually read this codebase in its entirety.

“All Tests Pass” Doesn’t Mean What You Think

Someone will push back here: the test suite passes on all platforms, so isn’t that validation? No. A test suite validates the correctness of known behavior on known paths. It does not validate whether error paths are handled correctly, how the code behaves at boundary conditions under stress, whether state stays consistent in concurrent scenarios, or whether the memory model conforms to intent under extreme conditions.
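To make the gap between “tests pass” and “error paths are correct” concrete, here is a minimal sketch in Rust. Everything in it is invented for illustration: Pool, parse_with_handle, and the counter are hypothetical stand-ins, not anything from Bun’s codebase. The function checks out a handle, does some fallible work, and returns the handle, but the early return on bad input skips the release.

```rust
/// Hypothetical pool that only counts how many handles are checked out.
pub struct Pool {
    in_use: usize,
}

impl Pool {
    pub fn new() -> Self {
        Pool { in_use: 0 }
    }

    fn acquire(&mut self) {
        self.in_use += 1;
    }

    fn release(&mut self) {
        self.in_use -= 1;
    }

    /// Number of handles that have been acquired but not released.
    pub fn in_use(&self) -> usize {
        self.in_use
    }
}

/// Parses a decimal number while holding a handle from the pool.
/// On bad input the `?` returns early, so `release` never runs.
pub fn parse_with_handle(
    pool: &mut Pool,
    input: &str,
) -> Result<u32, std::num::ParseIntError> {
    pool.acquire();
    let value = input.trim().parse::<u32>()?; // error path exits here
    pool.release();
    Ok(value)
}
```

A test that only feeds this function valid input such as "42" sees the right result and a pool with nothing checked out, so it passes. Feed it "oops" and the call returns an error while one handle stays checked out forever. In idiomatic Rust a guard type with a Drop implementation would close this hole, but only if whoever wrote or reviewed the code knew the release was required on every path.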

Jarred himself admitted that memory issues when re-entering across JS boundaries are something the Rust compiler can’t handle; that part still relies on humans. And those parts that rely on humans? No human has reviewed them.

The more fundamental issue is that AI translates code via local semantic equivalence. It ensures each function behaves identically to the original in isolation, but it doesn’t understand the global invariants between functions: the design constraints that aren’t written into tests and live only in the original author’s head. A violated constraint might not show up in today’s tests, but could surface six months from now, under a specific production load, as a completely inexplicable crash.
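A small Rust sketch of what such an unwritten invariant can look like. The Timers type and its methods are hypothetical, made up for this example and not taken from Bun. Each method is correct on its own and would pass a per-function test; the trap is a rule that exists only between them, namely that removing any entry invalidates handles handed out earlier.

```rust
/// Hypothetical timer table; the names and API are invented for illustration.
pub struct Timers {
    names: Vec<String>,
}

impl Timers {
    pub fn new() -> Self {
        Timers { names: Vec::new() }
    }

    /// Registers a timer and returns its handle (an index into `names`).
    pub fn add(&mut self, name: &str) -> usize {
        self.names.push(name.to_string());
        self.names.len() - 1
    }

    /// Removes a timer in O(1) by moving the last entry into its slot.
    /// Unwritten rule: every handle obtained earlier is now suspect.
    pub fn remove(&mut self, handle: usize) -> String {
        self.names.swap_remove(handle)
    }

    /// Looks up the name behind a handle, if the index is in range.
    pub fn name(&self, handle: usize) -> Option<&str> {
        self.names.get(handle).map(|s| s.as_str())
    }
}
```

Add "a", "b", and "c", then remove the handle for "a". The handle for "b" still works, the handle for "c" now resolves to nothing, and the handle for the removed "a" quietly resolves to "c". This is all safe Rust and compiles without a warning, which is the point: a translator can reproduce every function faithfully and the borrow checker can approve all of it, while the constraint that kept callers honest is written down nowhere.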

This isn’t a knock on Claude. Any translator, human programmers included, faces this problem without thorough review. At the scale of 6,755 commits, the risk is multiplied across every one of them.

After the Acquisition, the Risk Bearer Has Changed

There’s a political-economy dimension here that technical discussions usually ignore. In the early days, Bun was Jarred betting on himself. Using Zig, iterating fast, and accepting tech debt was reasonable startup logic, with the risk self-assumed.

Now Bun has been acquired by a major company, and its user base includes real production systems. The risk of this rewrite is no longer borne by Jarred, but by every engineer running Bun in production and the users behind them.

Jarred says this version is still in canary, and that there’s optimization and cleanup work to do before the official release. Canary is a line of defense, but it’s not human review. Optimization and cleanup are code quality concerns, not comprehension concerns. If no one on the team has fully read a codebase, then no matter how comprehensive the tests and no matter how long canary runs, its internal state is a black box to its maintainers. That will turn into very real pain the day a severe bug has to be debugged.

Zig’s “Problems” Were Misdiagnosed

Let’s return to Jarred’s stated reasons for the migration: the Zig codebase had too many use-after-free bugs, double-frees, and memory leaks on error paths. This is true. But the conclusion drawn from this diagnosis, that “Zig doesn’t work,” is wrong.

The correct diagnosis is that, in a commercial project that prioritizes rapid iteration, the cognitive tax of manual memory management exceeded the team’s budget. This isn’t a bug in Zig; it’s a structural mismatch between Zig’s design goals and Bun’s business model.

Zig’s target users are systems programmers who know what they’re doing and are willing to pay the price for ultimate control. TigerBeetle used Zig to write a database with virtually no memory bugs, because their team culture and project nature align with Zig’s philosophy.

Bun’s team culture is fast iteration, fast shipping, fast bug fixes. There’s a fundamental tension between this and the rigorous memory discipline that Zig demands. This is a mismatch between Bun and Zig, not a failure of Zig. Interpreting “our team frequently makes mistakes with this tool” as “this tool is inadequate” is an attribution error. The hammer doesn’t fit the job, but that’s not the hammer’s fault.

So, Will This Rewrite Work?

Honestly: in the short term it’ll probably be fine; in the long term there are structural risks.