Escaping the Fork: How Meta Modernized WebRTC Across 50+ Use Cases

By Boris Tsirkin, Joachim Reiersen

At Meta, WebRTC powers real-time audio and video across various platforms. But forking a large open-source project like WebRTC within our monorepo presents unique challenges – over time, an internal fork can drift behind upstream, cutting itself off from community upgrades.

We’re sharing how we escaped this “forking trap” – from building a dual-stack architecture that enabled safe A/B testing across 50+ use cases, to the workflows that now keep us continuously upgraded with upstream. This approach improved performance, binary size, and security – and we continue to use it today to A/B test each new upstream release before rolling it out.

At Meta, real-time communication (RTC) powers a range of services, from Messenger and Instagram video chats used around the world to low-latency Cloud Gaming and immersive VR casting on Meta Quest. To meet the performance demands of billions of users, we spent years developing a specialized, high-performance variant of the open-source WebRTC library.

Permanently forking a large open-source project can lead into a trap that is common across the industry. It starts with good intentions: you need a specific internal optimization or a quick bug fix. But over time, as the upstream project evolves and your internal features accumulate, the effort needed to merge in upstream commits can become prohibitive.

Recently, we officially concluded a massive multiyear migration to break this cycle. We successfully moved over 50 use cases from a divergent WebRTC fork to a modular architecture built on top of the latest upstream version – using it as a skeleton while injecting our own proprietary implementations of key components.

This article details how we engineered our way out of the forking trap: building two versions of WebRTC side by side in a single library for A/B testing, inside a monorepo, while continuously upgrading the library under test.

The Challenge: The Monorepo and the Static Linker

Upgrading a library like WebRTC is risky, especially while serving billions of users, where a regression can be hard to roll back. The variety of devices and environments we run on also rules out a one-time upgrade, since a single switchover could break some users’ experiences.

To mitigate this risk, we prioritized A/B testing: running the legacy version of WebRTC alongside the new upstream version (with clean patches and our features applied) in the same app, and dynamically switching users between the two to verify the new version.
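
The article does not say how users are assigned to a version. As a hedged illustration only, a common A/B technique is deterministic hash bucketing: each user lands in a stable bucket from 0 to 99, and a rollout percentage decides which buckets get the new stack. `Flavor`, `ChooseFlavor`, and the experiment name are hypothetical, and FNV-1a is used simply because it is portable and deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical identifiers for the two WebRTC stacks.
enum class Flavor { kLegacy, kLatest };

// 64-bit FNV-1a: a simple hash that gives the same result on every platform.
inline std::uint64_t Fnv1a(const std::string& s) {
  std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;  // FNV prime
  }
  return h;
}

// Maps (experiment, user) to a stable bucket in [0, 100). Users whose bucket
// is below rollout_percent get the latest stack; everyone else stays on legacy.
inline Flavor ChooseFlavor(const std::string& experiment,
                           const std::string& user_id, int rollout_percent) {
  if (rollout_percent <= 0) return Flavor::kLegacy;
  if (rollout_percent >= 100) return Flavor::kLatest;
  const std::uint64_t bucket = Fnv1a(experiment + ":" + user_id) % 100;
  return bucket < static_cast<std::uint64_t>(rollout_percent)
             ? Flavor::kLatest
             : Flavor::kLegacy;
}
```

Salting the hash with the experiment name keeps assignments independent across experiments, and because a bucket below 30 is also below 60, raising `rollout_percent` only moves users from legacy to latest, never back.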

Because of constraints on the application build graph and on binary size, we also needed to link both WebRTC versions statically. Done naively, this violates C++’s One Definition Rule (ODR) and produces thousands of symbol collisions at link time, so we set out to find a way for two versions of the same library to coexist in the same address space.

Furthermore, Meta uses a monorepo, and we did not want to repeat this painful process on every upgrade. This motivated us to find a way to maintain custom patches for open-source projects in a monorepo environment while still pulling new versions from upstream and reapplying the patches each time.

This led us to focus on two challenges. First, A/B testing: because of application constraints, achieving it meant building two copies of WebRTC into the same library. Second, patch management: with no feature branches in a monorepo, how do we track our patches and rebase them? Other libwebrtc-based open-source projects usually handle this by sequentially applying a set of stored patch files on top of a clean checkout on each library upgrade. Because of scalability concerns with that approach, we explored more nuanced options.

Solution 1: The Shim Layer and Dual-Stack Architecture

To enable A/B testing, we chose to build two copies of WebRTC into the same app. However, doing this statically within the same overarching call orchestration library creates its own challenges. To tackle them, we built a shim layer between the application layer and WebRTC.

The shim is a proxy library that sits between our application code and the underlying WebRTC implementations. Instead of calling WebRTC directly, the app calls the shim API. The shim exposes a single, unified, version-agnostic API, holds a “flavor” configuration, and at runtime dispatches each call to either the legacy or the latest WebRTC implementation.
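
A minimal sketch of this dispatch idea, assuming an interface-plus-factory design. `shim::PeerConnection`, `CreatePeerConnection`, and the toy classes in `webrtc_legacy` and `webrtc_latest` are hypothetical stand-ins, not Meta’s shim or the real WebRTC API; each toy class represents a full, separately renamespaced copy of the library.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-ins for the two renamespaced WebRTC copies.
namespace webrtc_legacy {
class PeerConnection {
 public:
  std::string Describe() const { return "legacy"; }
};
}  // namespace webrtc_legacy

namespace webrtc_latest {
class PeerConnection {
 public:
  std::string Describe() const { return "latest"; }
};
}  // namespace webrtc_latest

namespace shim {

enum class Flavor { kLegacy, kLatest };

// Version-agnostic interface that application code programs against.
class PeerConnection {
 public:
  virtual ~PeerConnection() = default;
  virtual std::string Describe() const = 0;
};

// Forwards every call to one concrete flavor's implementation.
template <typename Impl>
class PeerConnectionAdapter final : public PeerConnection {
 public:
  std::string Describe() const override { return impl_.Describe(); }

 private:
  Impl impl_;
};

// Reads the flavor once and instantiates the matching implementation.
inline std::unique_ptr<PeerConnection> CreatePeerConnection(Flavor flavor) {
  switch (flavor) {
    case Flavor::kLegacy:
      return std::make_unique<
          PeerConnectionAdapter<webrtc_legacy::PeerConnection>>();
    case Flavor::kLatest:
      return std::make_unique<
          PeerConnectionAdapter<webrtc_latest::PeerConnection>>();
  }
  return nullptr;
}

}  // namespace shim
```

A caller writes `auto pc = shim::CreatePeerConnection(flavor);` and uses `pc` without knowing which copy backs it. Choosing the flavor at object creation is one way to keep a single session from mixing objects from both stacks; this sketch makes no claim about how Meta’s shim scopes that choice.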

This approach, shimming at the lowest possible layer, avoids the significant binary size regression that duplicating the higher-layer call orchestration library would have caused. Duplication would have added approximately 38 MB of uncompressed size, whereas our solution added only about 5 MB, an 87% reduction in added size. Next, we’ll look at the hurdles introduced by this dual-stack architecture and how we resolved them.
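
The 87% figure follows from comparing the size each option adds; since both inputs are approximate, so is the result:

$$1 - \frac{5\ \text{MB}}{38\ \text{MB}} = \frac{33}{38} \approx 0.868 \approx 87\%$$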

Solving Symbol Collisions

Statically linking two copies of WebRTC into a single binary produces thousands of duplicate symbol errors. To make every symbol in each flavor unique, we used automated renamespacing: we built scripts that systematically rewrite every C++ namespace in a given WebRTC version, so the webrtc:: namespace becomes webrtc_latest:: in the latest upstream copy and webrtc_legacy:: in the legacy copy. We applied this rename to every external namespace in the library.
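
A minimal sketch of the rewrite, assuming a regex pass over source text; `Renamespace` is a hypothetical helper, not Meta’s script. It assumes `from` contains no regex metacharacters (true for identifiers like `webrtc`) and does not handle every form a production tool would need to, such as namespace aliases, string literals, or macros that spell the namespace name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rewrites one C++ namespace name to a flavor-specific name in `source`.
std::string Renamespace(const std::string& source, const std::string& from,
                        const std::string& to) {
  // Declarations such as "namespace webrtc {", "using namespace webrtc;",
  // and C++17 nested forms such as "namespace webrtc::internal {".
  const std::regex declaration("\\bnamespace\\s+" + from + "\\b");
  // Qualified names such as "webrtc::Foo" and "::webrtc::Foo". The leading
  // \b keeps identifiers like "mywebrtc::" untouched.
  const std::regex qualified("\\b" + from + "::");
  const std::string out =
      std::regex_replace(source, declaration, "namespace " + to);
  return std::regex_replace(out, qualified, to + "::");
}
```

Running `Renamespace(text, "webrtc", "webrtc_latest")` over every file of one checkout, and `Renamespace(text, "webrtc", "webrtc_legacy")` over the other, yields two copies whose namespaced symbols no longer collide when linked together.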

Not everything in WebRTC lives in a namespace, though. Global C functions, global variables, and classes left outside namespaces, intentionally or accidentally, collide as well. We moved what we could into namespaces and renamed the remaining symbols (such as global C functions) with flavor-specific identifiers.
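
One way to give global C functions flavor-specific names is a token-pasting macro. This is a hypothetical sketch: `FLAVORED`, `WEBRTC_FLAVOR`, and `WebRtcFoo_Version` are invented for illustration and are not necessarily the mechanism Meta used. Both flavors are defined in one file here only to keep the example self-contained.

```cpp
#include <cassert>

// Two-step concatenation so WEBRTC_FLAVOR is macro-expanded before pasting.
#define FLAVOR_CONCAT_INNER(prefix, name) prefix##_##name
#define FLAVOR_CONCAT(prefix, name) FLAVOR_CONCAT_INNER(prefix, name)
#define FLAVORED(name) FLAVOR_CONCAT(WEBRTC_FLAVOR, name)

// Legacy copy: defines webrtc_legacy_WebRtcFoo_Version.
#define WEBRTC_FLAVOR webrtc_legacy
extern "C" int FLAVORED(WebRtcFoo_Version)(void) { return 1; }
#undef WEBRTC_FLAVOR

// Latest copy: the same source line now defines webrtc_latest_WebRtcFoo_Version.
#define WEBRTC_FLAVOR webrtc_latest
extern "C" int FLAVORED(WebRtcFoo_Version)(void) { return 2; }
#undef WEBRTC_FLAVOR
```

In a real build, each flavor’s sources would instead be compiled with a different `-DWEBRTC_FLAVOR=...` value. An alternative that leaves the sources untouched is renaming symbols in the compiled objects, for example with binutils’ `objcopy --redefine-sym old=new`.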

Macros and preprocessor flags presented a subtler problem. The preprocessor has no notion of namespaces, so renamespacing does not help here. Macros like RTC_CHECK and RTC_LOG are also used outside WebRTC in wrapper libraries, so including both versions’ headers in the same translation unit triggers macro redefinition errors.