On Rendering Diffs

You open a pull request expecting to understand what changed. For small and medium changes, everything works. The code is readable, the files are there, you scroll around, add comments, and it’s all pretty seamless.

Then you open something larger. Maybe an agent generated the implementation, tests, fixtures, and snapshots. Maybe the branch just touched more files than expected. Either way, the review surface starts to degrade. It might only show you one file at a time, or require each file to be loaded separately before you can read it, or even make basic navigation feel sluggish.

Some of these are reasonable trade-offs for genuinely hard problems. But they still have a cost: reviewers feel the limits of the tool, and product teams have to build workarounds for these limits. Diff rendering matters, but for most tools it is not the product. The product is what happens around the code: review workflows, automation, agent output, CI results, and collaboration. Code review should support that work, not become something every team has to build from scratch.

That is why, about 6 months ago, we released Diffs. Our goal was to make the code and diff rendering part just work, so teams could spend their time on the product around it. Originally we launched with just the basic pieces: File and FileDiff components. We quickly got feedback about performance issues, so we followed up with a simple virtualizer that avoided rendering code when it was out of view and an API to move syntax highlighting into worker threads.

The simple virtualizer helped, but it was a stopgap. There was still a lot of O(n×m) complexity, high memory usage, and virtualization blanking (empty regions that flash into view when scrolling outpaces rendering). What was missing was a higher-level component that could manage an entire review surface and handle the hard problems related to scale. That missing layer became CodeView: a virtualization-first component for reviewing code and diffs. And we built it around a deliberately impossible goal: You should be able to just render any diff.

Not literally, of course. There are physical limits to browsers, compute, and memory. But practically speaking, I think we’ve come pretty close, and I’d like to share a bit about how we got there. If you find long-form blog posts boring, go check out the CodeView playground at DiffsHub.com where you can pretty much view any PR or diff that GitHub will send our way. Nearly any diff, at any scale, nearly instantly.


DIFFS LOOK SIMPLE UNTIL THEY ARE NOT

On the surface, rendering diffs in a browser may not seem very hard. It’s just text, right? Browsers are purpose-built to take raw HTML and turn that into something you can look at and interact with. But a good review surface needs more than text. It needs syntax highlighting, line numbers, annotations, comments, theming, split and unified layouts, wrapping modes, and enough customization to fit into someone else’s product.

Each of those features adds cost and complexity. Syntax highlighting adds processing time and inflates DOM count. Comments involve additional layout complexity that we can’t fully control, and they still have to work seamlessly with your existing design system. With CodeView, we take that per-file complexity and scale it up; work that was cheap for a single diff now has meaningful cost across a large review.

We can roughly break down the problems into three categories:

  • Rendering: DOM complexity grows quickly, and the browser can become overloaded while scrolling or interacting with the page.
  • Processing: Every file or diff operation gets multiplied, so work that was fast in isolation can become expensive when repeated thousands of times.
  • Memory: Large files and diffs get transformed into rendering data structures, which can push against browser memory limits and make garbage collection more frequent.

Our simple virtualizer helped with some rendering problems, and moving highlighting off the main thread helped with parts of the processing problem. But CodeView needed to treat rendering, memory, and processing as connected parts of the same problem.


VIRTUALIZATION

Virtualization, or windowing, is a way of tackling the rendering problem. In its simplest form, the idea is to only render the part of the content near the viewport. As you scroll, the virtualizer renders the new content coming into view and removes content that has moved off screen.
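
To make the idea concrete, here is a minimal sketch of the windowing calculation in TypeScript. It is not how CodeView works internally; it assumes every row has the same fixed height, and the names (`visibleRange`, `overscan`) are hypothetical ones chosen for illustration. Given the scroll position, it returns which rows are worth rendering, plus a few extra rows on either side so that fast scrolling is less likely to reveal empty space.

```typescript
// Hypothetical helper for illustration only; not part of the Diffs API.
// Assumes all rows share one fixed height.
interface VisibleRange {
  start: number; // index of the first row to render (inclusive)
  end: number; // index one past the last row to render (exclusive)
}

function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 5, // extra rows rendered above and below the viewport
): VisibleRange {
  if (rowCount <= 0 || rowHeight <= 0) return { start: 0, end: 0 };

  // Rows intersecting the viewport, before clamping to the list bounds.
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight);

  // Widen by the overscan, then clamp to [0, rowCount].
  const start = Math.min(rowCount, Math.max(0, first - overscan));
  const end = Math.max(start, Math.min(rowCount, last + overscan));
  return { start, end };
}

// Usage: a 100px viewport over 1,000 rows of 20px, scrolled to 200px.
const range = visibleRange(200, 100, 20, 1000, 2);
console.log(range.start, range.end);
```

Only the rows in `[start, end)` need DOM nodes; the rest of the scrollable height can be represented by spacer elements or absolute positioning. Real code and diffs rarely have uniform row heights once wrapping and comments are involved, which is where the harder part begins.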

Keeping the DOM small has a lot of benefits: lower memory usage, less layout work, less paint work, and fewer elements for the browser to manage. The trade-off is that the virtualizer has to estimate or measure how tall everything is, and it must coordinate those changes dynamically. One thing that adds to this complexity is that browsers generally manage scroll compositing separately from JavaScript execution. This can help scrolling feel more responsive to user interactions, but it also means that JavaScript can easily lag behind scroll updates.
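
The estimate-then-measure trade-off can be sketched roughly as follows. This is an illustrative TypeScript example under stated assumptions, not CodeView's implementation: the `HeightIndex` class and its methods are hypothetical. Every item starts with an estimated height, measured heights replace estimates as items get rendered, and a running table of offsets lets a scroll position be mapped back to an item with a binary search.

```typescript
// Hypothetical sketch for illustration only; not part of the Diffs API.
// Tracks per-item heights (estimated until measured) and maps a scroll
// offset to the item that contains it.
class HeightIndex {
  private readonly heights: number[];
  // offsets[i] is the top of item i; offsets[itemCount] is the total height.
  private readonly offsets: number[];

  constructor(itemCount: number, estimatedHeight: number) {
    this.heights = new Array<number>(itemCount).fill(estimatedHeight);
    this.offsets = new Array<number>(itemCount + 1).fill(0);
    this.rebuildFrom(0);
  }

  // Recompute the tops of every item after `index`. This is O(n) per update;
  // a production virtualizer would likely batch updates or use a tree.
  private rebuildFrom(index: number): void {
    for (let i = index; i < this.heights.length; i++) {
      this.offsets[i + 1] = this.offsets[i] + this.heights[i];
    }
  }

  // Replace an estimate with a measured height and return the difference.
  setMeasured(index: number, height: number): number {
    const delta = height - this.heights[index];
    if (delta !== 0) {
      this.heights[index] = height;
      this.rebuildFrom(index);
    }
    return delta;
  }

  get totalHeight(): number {
    return this.offsets[this.heights.length];
  }

  topOf(index: number): number {
    return this.offsets[index];
  }

  // Binary search for the last item whose top is at or above `offset`.
  // Offsets past the end clamp to the last item; an empty list returns 0.
  indexAt(offset: number): number {
    let lo = 0;
    let hi = this.heights.length - 1;
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (this.offsets[mid] <= offset) lo = mid;
      else hi = mid - 1;
    }
    return lo;
  }
}

// Usage: four items estimated at 20px; item 1 turns out to be 50px tall.
const index = new HeightIndex(4, 20);
const delta = index.setMeasured(1, 50);
console.log(delta, index.totalHeight, index.indexAt(70));
```

The value returned by `setMeasured` matters because of the scroll lag described above. If an item above the viewport turns out taller or shorter than estimated, everything below it shifts, and one common response is to adjust the scroll position by that delta so the content the reader is looking at stays put.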