Why implementing ActivityPub is hard, and why it doesn't have to be


A quiet failure

Picture the moment your server sends its first Follow activity to Mastodon. You read the spec, built the JSON, signed the HTTP request, and POSTed it with care. What comes back is a single line: 401 Unauthorized. No body. No explanation. What went wrong?

Maybe the clock behind your Date header drifted a few minutes. Maybe the hash in your Digest header is off. Maybe you uppercased the (request-target) pseudo-header while building the signing string, or published your public key as PEM where the other side wanted multibase. The remote server won’t tell you. So you start reading someone else’s server code to debug your own. I know, because I’ve been there.
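To make the (request-target) pitfall concrete, here is a minimal sketch of how the signing string is assembled under draft-cavage-http-signatures. The function name `buildSigningString` and its parameters are my own illustration, not part of any library, and the sketch covers only the string assembly, not hashing or signing.

```typescript
// Hypothetical helper (not from any library): assembles the string that
// draft-cavage-http-signatures asks you to sign.
function buildSigningString(
  method: string,
  path: string,
  headers: Record<string, string>,
  signedHeaders: string[],
): string {
  // Header names are case-insensitive, so index the values by lowercase name.
  const byName = new Map<string, string>();
  for (const name of Object.keys(headers)) {
    byName.set(name.toLowerCase(), headers[name].trim());
  }

  const lines = signedHeaders.map((name) => {
    const lower = name.toLowerCase();
    if (lower === "(request-target)") {
      // The pseudo-header and the HTTP method are both lowercase.
      return `(request-target): ${method.toLowerCase()} ${path}`;
    }
    const value = byName.get(lower);
    if (value === undefined) {
      throw new Error(`Missing header: ${name}`);
    }
    return `${lower}: ${value}`;
  });

  // Lines are joined with a single "\n" and no trailing newline.
  return lines.join("\n");
}

const signingString = buildSigningString(
  "POST",
  "/inbox",
  { Host: "example.com", Date: "Tue, 07 Jun 2014 20:51:35 GMT" },
  ["(request-target)", "host", "date"],
);
```

If any byte of this string differs from what the receiver reconstructs, verification fails, and a bare 401 is all you see.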

Fedify began as a casualty of another project. I set out to build a single-user microblogging server, the one that would later become Hollo, and started implementing ActivityPub from scratch. Somewhere between the signature specs and the JSON-LD, the protocol work swallowed the product, and I put the whole thing down. What I picked back up wasn’t the app. It was the framework the app should have had. Fedify shipped first; only then could Hollo exist, built on top of it.

ActivityPub development gets hard in a few very specific places. In this post I want to walk through five of them, then show what each one looks like with Fedify. If you’ve spent time in the fediverse, you’ll probably nod along. If you haven’t, you may wonder why anyone would do all of this by hand. Either way, the conclusion is the same: nobody has to anymore.

Five scenes


Scene 1: there is more than one standard

ActivityPub servers authenticate each other with HTTP signatures. Except there isn’t one signature spec. Most of the fediverse runs on draft-cavage-http-signatures-12, an expired draft that never became a standard. The actual standard exists too: RFC 9421, HTTP Message Signatures. The problem is that you can’t know which one a given server accepts until you try. A real-world implementation therefore has to sign with one spec, see whether it gets rejected, re-sign with the other, and remember per server which one worked so it can skip the dance next time. The fediverse calls this double-knocking. Yes, you get to implement it yourself.
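The double-knocking loop can be sketched roughly as follows. Everything here is an assumption made for illustration: the `SignedSend` callback stands in for whatever signs and sends the request, trying RFC 9421 first is an arbitrary choice, and treating 400 and 401 as "signature rejected" is a simplification.

```typescript
type SignatureSpec = "rfc9421" | "draft-cavage";

// Hypothetical transport: signs the request with the given spec, sends it,
// and resolves to the HTTP status code of the response.
type SignedSend = (spec: SignatureSpec) => Promise<number>;

// Remembers, per remote host, which spec last worked.
const preferredSpec = new Map<string, SignatureSpec>();

async function doubleKnock(host: string, send: SignedSend): Promise<number> {
  // Assumption: start with RFC 9421 unless we already know better.
  const first: SignatureSpec = preferredSpec.get(host) ?? "rfc9421";
  const second: SignatureSpec =
    first === "rfc9421" ? "draft-cavage" : "rfc9421";

  const status = await send(first);
  // Assumption: only 400 and 401 mean "the signature was not accepted".
  if (status !== 400 && status !== 401) {
    if (status < 400) preferredSpec.set(host, first);
    return status;
  }

  // Second knock: re-sign with the other spec and try once more.
  const retryStatus = await send(second);
  if (retryStatus < 400) preferredSpec.set(host, second);
  return retryStatus;
}
```

The per-host map is what lets the next delivery to the same server skip the failed first attempt.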

Scene 2: one document, many shapes

ActivityPub’s wire format is JSON-LD, and in JSON-LD the same document can take many shapes. Your parser has to accept every combination, and which one arrives depends on the sender’s implementation. The spec-compliant answer is to normalize every document with a JSON-LD processor: expansion followed by compaction. In practice many implementations treat it all as “just JSON” and quietly break on whatever shape some server happens to emit. Either way, you end up with defensive code smeared across the whole codebase: is this a string? An array? An object? A URI I have to fetch?
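As an illustration of that defensive code, here is a sketch of the "just JSON" approach for a single property. The helper `extractIds` and the example URLs are hypothetical, and this is explicitly not a JSON-LD processor: it only handles the handful of shapes listed in the comments.

```typescript
// A property such as "attributedTo" may arrive as a bare URI, an embedded
// object, or an array mixing both. The URLs are made up for illustration.
const shapes: unknown[] = [
  "https://example.com/users/alice",
  { id: "https://example.com/users/alice", type: "Person" },
  ["https://example.com/users/alice", { id: "https://example.com/users/bob" }],
];

// Hypothetical helper: reduces any of the shapes above to a list of IDs.
function extractIds(value: unknown): string[] {
  if (value === null || value === undefined) return [];
  if (typeof value === "string") return [value];
  if (Array.isArray(value)) {
    const ids: string[] = [];
    for (const item of value) {
      ids.push(...extractIds(item));
    }
    return ids;
  }
  if (typeof value === "object") {
    const obj = value as Record<string, unknown>;
    // Compacted documents use "id"; expanded ones use "@id".
    const id = obj["id"] ?? obj["@id"];
    return typeof id === "string" ? [id] : [];
  }
  return [];
}

const normalized = shapes.map(extractIds);
```

Multiply this by every property of every activity type and the smear becomes visible.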

Scene 3: the zombie post

A user publishes a post, spots a typo, and deletes it right away. Your server sends a Create, then a Delete. Thanks to network weather, some receiving server gets the Delete first and the Create second. It ignores the deletion of a post that doesn’t exist yet, then dutifully processes the creation of a post that was already deleted. That post now lives on that server forever, while its author believes it’s gone.
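The failure is easy to reproduce with a toy receiver. The `NaiveInbox` class, the simplified `Activity` type, and the example URL below are all hypothetical; real inboxes are far more involved, but the ordering problem has the same shape.

```typescript
// Heavily simplified activities, for illustration only.
type Activity =
  | { type: "Create"; object: { id: string; content: string } }
  | { type: "Delete"; object: string };

// Hypothetical receiving server that applies activities in arrival order.
class NaiveInbox {
  readonly posts = new Map<string, string>();

  receive(activity: Activity): void {
    if (activity.type === "Create") {
      this.posts.set(activity.object.id, activity.object.content);
    } else {
      // Deleting a post we have never seen is a silent no-op.
      this.posts.delete(activity.object);
    }
  }
}

const postId = "https://example.com/notes/1";
const create: Activity = {
  type: "Create",
  object: { id: postId, content: "Helo, world" },
};
const del: Activity = { type: "Delete", object: postId };

// Arrival order matches send order: the post is gone, as intended.
const inOrder = new NaiveInbox();
inOrder.receive(create);
inOrder.receive(del);

// Arrival order is swapped: the Delete does nothing, the Create sticks.
const reordered = new NaiveInbox();
reordered.receive(del);
reordered.receive(create);
```

After both runs, `inOrder.posts` is empty while `reordered.posts` still holds the deleted post: the zombie.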