Adopting AV1 for Real-Time Communication (RTC) at Scale
By Yu-Chen (Eric) Sun, Jie Dong, Kewei Huang, Dave Jack, Joachim Reiersen, Phil Scherbel, Karthik Sekuru, Vertika Singh, Thileepan Subramaniam, Wei Zhou
Adopting AV1 for real-time communication at Meta has been a multi-year effort spanning codec selection, device eligibility, rate control, and error resilience. We’re sharing the technical and operational challenges we faced while deploying AV1 and expanding its coverage, and how we addressed them. We’re also presenting several technologies for improving AV1 call quality, including rate control and error resilience.
The AV1 video codec, first standardized by AOMedia in 2018, has rapidly evolved and gained widespread industry support. Today, leading companies like YouTube, Netflix, and Meta stream video using AV1 at scale. Meta introduced AV1 for real-time video calls on high-end devices in 2023, aiming to deliver superior call quality. Since then, we have made notable progress in expanding AV1’s reach and improving the experience for AV1-powered calls. Today, AV1 is enabled on the majority of mobile devices in Meta Real-Time Communication (RTC) applications such as Messenger and WhatsApp.
Why Is Meta Interested in Adopting AV1 for RTC?
The motivation for switching to a more advanced video codec is straightforward: it delivers the same visual quality while using much less bandwidth. In offline tests, we observed at least a 20% bitrate reduction with AV1 compared with H.264/AVC under our product settings on low-end and mid-range devices. If devices can accommodate higher encoding complexity, the bitrate reductions are even greater.
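To make "same visual quality at lower bitrate" concrete, the sketch below compares two codecs at a single target quality: it interpolates each codec's rate-quality curve in the log-bitrate domain and reports the fractional bitrate reduction. This is a simplified, hypothetical illustration, not the methodology behind the 20% figure; the quality metric, the function names, and the sample rate-quality points are assumptions. The standard Bjøntegaard delta-rate (BD-rate) metric instead averages the difference over a whole quality range using fitted curves.

```python
import math
from bisect import bisect_left


def rate_at_quality(points, target_quality):
    """Bitrate (kbps) needed to reach target_quality, from (bitrate_kbps, quality) samples.

    Interpolates linearly in log-bitrate between the two samples that bracket
    target_quality. Assumes quality increases with bitrate and values are distinct.
    """
    pts = sorted(points, key=lambda p: p[1])  # order samples by quality
    qualities = [q for _, q in pts]
    if not qualities[0] <= target_quality <= qualities[-1]:
        raise ValueError("target quality is outside the measured range")
    i = bisect_left(qualities, target_quality)
    if qualities[i] == target_quality:
        return pts[i][0]  # exact sample, no interpolation needed
    (r0, q0), (r1, q1) = pts[i - 1], pts[i]
    t = (target_quality - q0) / (q1 - q0)
    return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))


def bitrate_savings(anchor_points, test_points, target_quality):
    """Fractional bitrate reduction of the test codec vs. the anchor at equal quality."""
    anchor_rate = rate_at_quality(anchor_points, target_quality)
    test_rate = rate_at_quality(test_points, target_quality)
    return 1.0 - test_rate / anchor_rate
```

With made-up curves in which every AV1 sample needs 80% of the H.264/AVC bitrate for the same quality, `bitrate_savings` returns approximately 0.2, that is, a 20% reduction.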
For real-time video calls, a lower bitrate at equal quality means people on slower or limited networks can enjoy significantly better video quality. This matters to our users because the RTC product must handle bitrate fluctuations while still meeting low-latency requirements. In real-world networks, especially in emerging markets, video bitrates for RTC products typically range from 10 kbps to 400 kbps, and maintaining good video quality below 100 kbps remains challenging.
To evaluate the user experience across codecs, we enabled AV1 in the Messenger app and conducted a side-by-side comparison using two Android phones. In the examples below, AV1 is displayed on the right and H.264/AVC on the left, both limited to 100 kbps. The H.264/AVC video appears noticeably blurry, while the AV1 video remains much clearer, highlighting the significant advantage of AV1 for video calls under bandwidth constraints.
An increased focus on screen content calls for high-quality encoding of computer-generated content. Traditionally, video encoders aren’t well suited to complex content such as text, which carries a lot of high-frequency detail, and people are very sensitive to blurry text. AV1 includes two coding tools, palette mode and intra-block copy, that drastically improve performance for screen content.
Palette mode is based on the observation that pixel values in a screen-content frame are usually concentrated on a limited number of colors. It represents such content efficiently by signaling the color clusters (a small set of palette colors plus a per-pixel color index) instead of quantized transform-domain coefficients. In addition, typical screen content often contains repetitive patterns within the same picture. Intra-block copy predicts a block from an already-coded region of the same frame, which can improve compression efficiency significantly. A benefit of AV1 is that both tools are available in the Main profile.
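The sketch below illustrates both ideas on a toy grid of 8-bit luma samples: `palette_encode` replaces a block's samples with a small color table plus a per-pixel index map, and `intra_block_copy_search` looks for an identical, already-coded block in the same frame and returns its block vector. All function names are hypothetical and this is not AV1's normative process: real AV1 entropy-codes the palette and index map, uses palettes of 2 to 8 colors per plane, and imposes stricter constraints on which reconstructed area an intra-block-copy vector may reference. Here "already coded" is approximated as fully above the current block, or in the same block row and fully to its left.

```python
from collections import Counter


def palette_encode(block, max_colors=8):
    """Return (palette, index_map), or None if the block has too many distinct colors."""
    counts = Counter(v for row in block for v in row)
    if len(counts) > max_colors:
        return None  # not palette-friendly: a real encoder would use transform coding
    palette = sorted(counts)  # the distinct sample values, sorted for determinism
    lookup = {value: index for index, value in enumerate(palette)}
    index_map = [[lookup[v] for v in row] for row in block]
    return palette, index_map


def palette_decode(palette, index_map):
    """Rebuild the block by looking up each index in the palette."""
    return [[palette[i] for i in row] for row in index_map]


def get_block(frame, x, y, size):
    """Extract a size x size block whose top-left corner is at column x, row y."""
    return [row[x:x + size] for row in frame[y:y + size]]


def intra_block_copy_search(frame, x, y, size):
    """Return a block vector (dx, dy) to an identical already-coded block, or None.

    Simplification: a candidate counts as already coded if it lies fully above the
    current block, or in the same block row and fully to its left.
    """
    target = get_block(frame, x, y, size)
    width = len(frame[0])
    for ry in range(0, y + 1):
        for rx in range(0, width - size + 1):
            fully_above = ry + size <= y
            left_same_row = ry == y and rx + size <= x
            if not (fully_above or left_same_row):
                continue
            if get_block(frame, rx, ry, size) == target:
                return rx - x, ry - y
    return None
```

For a glyph that appears once at the top-left of an 8x8 frame and again at its bottom-right, the search for the second copy returns the vector `(-4, -4)`, so the decoder can reproduce that block by copying pixels instead of coding residuals. A two-color glyph also maps to a two-entry palette with a one-bit-per-pixel index map.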
The Challenges in Adopting AV1
While the comparison clearly illustrates AV1’s advantages, there are significant challenges to its adoption in RTC. Unlike video on demand (VOD), RTC systems must manage end-to-end video latency, which ideally should remain below 300 milliseconds. If latency exceeds this threshold, people begin to notice delays in the conversation. Maintaining both high video quality and low latency is challenging. For example, multi-pass encoding techniques, which can improve quality, introduce additional delay. On the decoder side, extensive buffering further increases latency. Additionally, any sudden spike in bitrate can cause video freezes during calls, degrading the user experience.
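A back-of-the-envelope latency budget shows why buffering is costly in RTC: every frame the encoder waits for (modeled here as lookahead, one way multi-pass encoding adds delay) or the receiver buffers adds one full frame interval to the end-to-end delay. The class name, field names, per-stage timings, and the 30 fps rate below are hypothetical placeholders, not Meta measurements; only the 300 ms target comes from the text.

```python
from dataclasses import dataclass

LATENCY_BUDGET_MS = 300.0  # end-to-end target cited in the text


@dataclass
class PipelineLatency:
    """Hypothetical per-stage latencies (ms) for one direction of a video call."""
    fps: float
    capture_ms: float
    encode_ms: float
    network_ms: float
    decode_ms: float
    render_ms: float
    lookahead_frames: int = 0      # frames the encoder waits for before coding one
    jitter_buffer_frames: int = 0  # frames the receiver holds before decoding

    def frame_interval_ms(self) -> float:
        return 1000.0 / self.fps

    def end_to_end_ms(self) -> float:
        # Each buffered frame, on either side, delays output by one frame interval.
        buffered = self.lookahead_frames + self.jitter_buffer_frames
        stages = (self.capture_ms + self.encode_ms + self.network_ms
                  + self.decode_ms + self.render_ms)
        return stages + buffered * self.frame_interval_ms()

    def within_budget(self, budget_ms: float = LATENCY_BUDGET_MS) -> bool:
        return self.end_to_end_ms() <= budget_ms
```

With placeholder stage timings of 20, 15, 120, 10, and 20 ms, the pipeline totals 185 ms and fits the budget, but adding a 4-frame lookahead at 30 fps contributes about 133 ms more and pushes it past 300 ms.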
RTC products must also adapt dynamically to network conditions during a call. Two key challenges are fluctuating network bandwidth and packet loss. To cope with bandwidth changes, the video encoder adjusts parameters such as resolution and frame rate. However, switching resolutions typically requires a new key frame, which can cause a sudden bitrate spike and temporary video freezing. Similarly, packet loss can trigger retransmissions or force the encoder to send another key frame, both of which may lead to video freezes. Managing these issues effectively is key to delivering high-quality, uninterrupted video calls. Additionally, the RTC client must perform both real-time encoding and decoding, which consume significant power, so power efficiency is important, especially on mobile devices.
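The arithmetic behind key-frame freezes can be sketched as follows: a frame much larger than the per-frame share of the available bandwidth takes longer than one frame interval to transmit, and the receiver stalls for the difference. The function names, the 100 kbps / 15 fps operating point, and the 800-byte and 8,000-byte frame sizes are hypothetical; the model also ignores packet overhead, pacing, queuing of later frames, and loss.

```python
def frame_budget_bytes(bandwidth_kbps: float, fps: float) -> float:
    """Average bytes per frame that the link can carry at a steady frame rate."""
    return bandwidth_kbps * 1000 / 8 / fps


def transmit_time_ms(frame_bytes: int, bandwidth_kbps: float) -> float:
    """Time to send one frame over the link; bits / (kbit/s) yields milliseconds."""
    return frame_bytes * 8 / bandwidth_kbps


def stall_ms(frame_bytes: int, bandwidth_kbps: float, fps: float) -> float:
    """Delay beyond one frame interval that this frame causes at the receiver."""
    return max(0.0, transmit_time_ms(frame_bytes, bandwidth_kbps) - 1000.0 / fps)
```

At 100 kbps and 15 fps, the per-frame budget is about 833 bytes, so an 800-byte delta frame arrives within one frame interval (64 ms), while an 8,000-byte key frame needs 640 ms and stalls playback for roughly 573 ms.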
Encoder and Decoder Selection
Choosing the right encoder and decoder is the most critical step in adopting a new codec. The computational complexity of video codecs is a significant consideration for mobile devices. While AV1 offers improved compression efficiency through advanced coding tools, these benefits come at the cost of increased computational demands, particularly during encoding. To assess this added complexity, we integrated an open-source AV1 encoder in an offline experiment and measured power consumption on a Pixel 8 device during a video call. The results showed a 14% increase in power usage compared with H.264/AVC, a significant challenge for mobile deployment. To address this, we adopted an internal low-complexity encoder whose power consumption is similar to that of the H.264 baseline, as detailed in the next section.