p99 0ms* autocomplete for 240 million domain names
We’ll get to the asterisk. I run Wirewiki.com, a website for inspecting internet infrastructure such as domain names. It helps people check (historic) DNS records, DNS delegation, email deliverability config, etc. There are a ton of sites that offer this (growing faster than ever thanks to vibe coding), so I need a way to stand out. I picked tool quality / usefulness and UX. The autocomplete is the main way to navigate Wirewiki, so it should be as complete, accurate and fast as possible. I want it to be instant. Like, next frame instant. I’ve mostly achieved that.
Here’s how. On keyDown (the user starts pressing a key), we prefetch the suggestions for the typed character + any next character. And on keyUp (the user releases the key), we render the suggestions.
That gives us a time budget of keyPress1Duration + gap between key presses + keyPress2Duration. If the API returns before the end of the second key press, we’ll have the results ready in time. (A 60 Hz display draws a frame every 16.7 ms, so we technically have 8.33 ms of extra time budget at p50, but near 0 ms at p99.)
In terms of API round-trip time: the request for q=wi fires the instant i is pressed; if its response lands before k is released, completions for wik render with zero perceived latency. So for the purpose of this article, we’ll define latency as the time from keyUp until the results are ready for rendering. p99 0 ms means that 99% of the time, the results will be ready before the user even releases the key. We need two things to make this happen: client-side prefetching and caching of the suggestions, and an API that’s fast enough.
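To make the keyDown / keyUp split concrete, here is a minimal sketch of a client-side cache. It is not Wirewiki’s actual code: the class name SuggestionCache, the Fetcher type, and the response shape (one entry per query string, covering the prefix and each possible next character) are all assumptions made for illustration.

```typescript
// Assumed response shape: suggestions for the prefix itself and for the prefix
// plus each possible next character, keyed by the full query string.
type PrefetchResponse = Record<string, string[]>;
type Fetcher = (prefix: string) => Promise<PrefetchResponse>;

class SuggestionCache {
  private ready = new Map<string, string[]>(); // query -> suggestions
  private requested = new Set<string>(); // prefixes fetched or in flight
  private fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  // keyDown: start the request for the text as it will read once the key lands.
  prefetch(prefix: string): void {
    if (this.requested.has(prefix)) return; // already fetched or in flight
    this.requested.add(prefix);
    this.fetcher(prefix).then(
      (response) => this.store(response),
      () => {
        this.requested.delete(prefix); // request failed: allow a retry
      },
    );
  }

  // Copy every query in the response into the cache.
  store(response: PrefetchResponse): void {
    for (const query of Object.keys(response)) {
      this.ready.set(query, response[query]);
    }
  }

  // keyUp: undefined means the response has not arrived yet.
  lookup(query: string): string[] | undefined {
    return this.ready.get(query);
  }
}

// Hypothetical wiring in the browser (fetchSuggestions and render are placeholders):
//   const cache = new SuggestionCache(fetchSuggestions);
//   input.addEventListener("keydown", (e) => {
//     if (e.key.length === 1) cache.prefetch(input.value + e.key);
//   });
//   input.addEventListener("keyup", () => render(cache.lookup(input.value)));
```

With this shape, typing w, i, k means the response to the wi request already contains the entry for wik, so the keyUp of k becomes a plain map lookup. When lookup returns undefined, the page would have to render once the response arrives, and that is the case that counts as non-zero latency.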
How big is the budget? We now know that we can spend two key press durations and a gap duration, but how long is that in milliseconds? I’ve measured it while typing 100 domain names reasonably fast and found that p99 works out to 121 ms for me.
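A measurement like this could be reproduced with a sketch along the following lines. The names KeyTiming, pairBudgets and budgetMetBy are hypothetical. It also assumes that "p99" here means the budget that 99% of key pairs meet or exceed, which is the 1st percentile of the measured budgets; that reading is an interpretation and is not stated in the text.

```typescript
// One key press: timestamps in milliseconds, e.g. from performance.now().
interface KeyTiming {
  down: number;
  up: number;
}

// Budget for each consecutive pair of keys: keyDown of the first key until
// keyUp of the second, i.e. keyPress1Duration + gap + keyPress2Duration.
function pairBudgets(keys: KeyTiming[]): number[] {
  const budgets: number[] = [];
  for (let i = 0; i + 1 < keys.length; i++) {
    budgets.push(keys[i + 1].up - keys[i].down);
  }
  return budgets;
}

// Largest budget that at least `fraction` of the samples meet or exceed
// (assumed reading of "p99": fraction = 0.99).
function budgetMetBy(budgets: number[], fraction: number): number {
  if (budgets.length === 0) throw new Error("no samples");
  const sorted = budgets.slice().sort((a, b) => a - b);
  const index = Math.min(
    sorted.length - 1,
    Math.floor((1 - fraction) * sorted.length),
  );
  return sorted[index];
}
```

Because the budget is measured from one keyDown to the next keyUp, overlapping key presses (the next key goes down before the previous one is released) are handled naturally: the gap term simply becomes negative.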
How fast can we make the API? I’m using the Tranco list of the top 1 million most popular domains for this API. These should be suggested first, and supplemented by any other domain name currently in use. CZDS (ICANN’s Centralized Zone Data Service) offers the list of all domains for most of the gTLDs (like .com, .net, .org). Lists for ccTLDs (like .uk, .de, .fr) are unfortunately not available. But ccTLD domains with any meaningful traffic will be in the Tranco list anyway.
I’ve designed the API to first search Tranco (the head), and then CZDS (the tail) if necessary. The results are returned in rank order, so the first 8 are the most popular.
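The head-then-tail order can be sketched as follows. The function name suggest and the Lookup signature are hypothetical. The de-duplication step rests on an assumption, namely that a domain can appear in both Tranco and CZDS.

```typescript
// A prefix lookup against one data source; returns at most `limit` names.
type Lookup = (prefix: string, limit: number) => string[];

function suggest(
  prefix: string,
  head: Lookup,
  tail: Lookup,
  limit: number = 8,
): string[] {
  const results = head(prefix, limit).slice(0, limit);
  if (results.length === limit) return results; // head alone fills the list

  // Top up from the tail, skipping names the head already returned.
  // Asking the tail for `limit` names is enough: at most results.length
  // of them can be duplicates.
  const seen = new Set(results);
  for (const name of tail(prefix, limit)) {
    if (results.length === limit) break;
    if (!seen.has(name)) {
      seen.add(name);
      results.push(name);
    }
  }
  return results;
}
```

The tail is only consulted when the head returns fewer than `limit` results, so popular prefixes never touch the disk-backed index.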
Head: in-memory character trie. A trie (prefix tree) stores the top 8 suggestions precomputed for every prefix. A prefix lookup is a walk of a few pointers. Worst case time complexity: O(length of what you typed).
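A minimal sketch of such a trie, assuming the domains are inserted in rank order (most popular first) so that each node keeps the first 8 names that pass through it. The class names and the Map-based node layout are illustrative; a production version would likely use a more compact representation.

```typescript
const TOP_K = 8;

class TrieNode {
  children = new Map<string, TrieNode>();
  top: string[] = []; // best TOP_K domains that start with this node's prefix
}

class SuggestionTrie {
  private root = new TrieNode();

  // rankedDomains must be ordered most popular first.
  constructor(rankedDomains: string[]) {
    for (const domain of rankedDomains) this.insert(domain);
  }

  private insert(domain: string): void {
    let node = this.root;
    for (let i = 0; i < domain.length; i++) {
      const ch = domain.charAt(i);
      let next = node.children.get(ch);
      if (!next) {
        next = new TrieNode();
        node.children.set(ch, next);
      }
      node = next;
      // Insertion follows rank order, so the first TOP_K arrivals are the best.
      if (node.top.length < TOP_K) node.top.push(domain);
    }
  }

  // One child hop per typed character, then return the precomputed list.
  lookup(prefix: string): string[] {
    let node = this.root;
    for (let i = 0; i < prefix.length; i++) {
      const next = node.children.get(prefix.charAt(i));
      if (!next) return [];
      node = next;
    }
    return node.top;
  }
}
```

Because the top list is filled at build time, a lookup never has to explore the subtree below the prefix, which is what keeps it at O(length of what you typed). In this sketch an empty prefix returns an empty list.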
Tail: SSD-backed memory-mapped block index. The CZDS domains are sorted and delta-compressed into fixed-size blocks with a tiny in-memory directory. A lookup binary-searches the directory (27 MB), then linearly scans one block of 256 names. The 240M domain names take about 2.5 GB of disk space. Hot pages are cached in memory by the OS. Worst case time complexity: O(length of what you typed * log(number of domains)).
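The lookup path can be sketched in memory as below. Several things here are assumptions: "delta-compressed" is modeled as front coding (each name stores how many leading characters it shares with the previous name in its block, plus the remaining suffix), the directory holds the first name of every block, and plain arrays stand in for the memory-mapped file. The names BlockIndex, buildIndex and prefixLookup are hypothetical.

```typescript
// [characters shared with the previous name in the block, remaining suffix]
type Entry = [number, string];

interface BlockIndex {
  directory: string[]; // first name of each block, kept in memory
  blocks: Entry[][]; // front-coded blocks (on disk / mmap in a real system)
}

// sortedNames must be in plain code-unit order, e.g. the result of names.sort().
function buildIndex(sortedNames: string[], blockSize: number): BlockIndex {
  const index: BlockIndex = { directory: [], blocks: [] };
  let current: Entry[] = [];
  let prev = "";
  for (const name of sortedNames) {
    if (current.length === 0) {
      index.directory.push(name);
      prev = ""; // first name of a block is stored in full
    }
    let shared = 0;
    while (
      shared < prev.length &&
      shared < name.length &&
      prev.charAt(shared) === name.charAt(shared)
    ) {
      shared++;
    }
    current.push([shared, name.slice(shared)]);
    prev = name;
    if (current.length === blockSize) {
      index.blocks.push(current);
      current = [];
    }
  }
  if (current.length > 0) index.blocks.push(current);
  return index;
}

function prefixLookup(index: BlockIndex, prefix: string, limit: number): string[] {
  // Binary search for the first block whose first name is >= prefix.
  let lo = 0;
  let hi = index.directory.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index.directory[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  const results: string[] = [];
  // Matches can start in the block just before that one.
  for (let b = Math.max(0, lo - 1); b < index.blocks.length; b++) {
    let name = "";
    for (const [shared, suffix] of index.blocks[b]) {
      name = name.slice(0, shared) + suffix; // undo the front coding
      if (name.startsWith(prefix)) {
        results.push(name);
        if (results.length === limit) return results;
      } else if (name > prefix) {
        return results; // sorted order: nothing later can match
      }
    }
  }
  return results;
}
```

Unlike the single-block scan described above, this sketch keeps reading into the following blocks when the matches straddle a block boundary. It returns names in sorted order; any ranking of tail results is outside its scope.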
Both the number of domains and the query length are bounded. That makes the worst case for both data structures effectively O(1), which should keep p99 latency low. Let’s see. Every keystroke travels Browser → Cloudflare → nginx → API and the response returns along the same path.
Most requests are answered within 2 ms by the API. Even at 1.6k req/s, nginx + the API respond within 15 ms 99% of the time. I’m sure we could shave off a couple of milliseconds, but I’m happy with this. Optimizing the API further doesn’t make sense, since the network dominates latency. In practice, the autocomplete latency is about equal to the round-trip time from the browser through Cloudflare to the server, plus 10 ms.