The AT-URI Syntax Mess
It’s time to tackle one of the most unfortunate warts in the atproto specifications: the AT URI syntax, as currently specified, is not a valid URI or URL. We put DIDs in the URI “authority” position (just after the at://), and they conflict with the host/port hierarchical URI syntax components specified in RFC 3986.
The core issue was surfaced in 2025 (by @mcc in this GitHub issue), and is coming to a head now for two reasons. First, more folks are trying to use AT URIs in places that expect a well-formed URI (or URL), such as HTML <link> tags. Second, the ATP Working Group at the IETF has a deliverable of standardizing the AT URI scheme with a permanent registration.
The purpose of this post is to give some background on the situation, then talk through some ways we could try to fix it. Unfortunately, I don’t think there are any great options on the table: any path the ecosystem takes is going to involve some developer pain and disruption.
I care a lot about this because I think AT URIs are quietly one of the best and most important features of the protocol! Putting an individual account identifier in the authority section, instead of a provider hostname, is both concretely and symbolically one of the Big Ideas of atproto: accounts are the ultimate authority for their content, and can seamlessly move between hosting providers. Accounts are in the driver’s seat, and resolve to a current network location. That aspect isn’t going to change, and if we can resolve this syntax mess then folks can go nuts and start using AT URIs all over the place.
What Went Wrong?
To ground things a bit, here is an example AT URI pointing at a specific record, with a DID in the authority place:

at://did:plc:vwzwgnygau7ed7b7wt5ux7y2/app.bsky.feed.post/3k5nobkf2w72g

And the generic syntax:

at:// <authority> / <collection> / <record-key>
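As a rough illustration of that generic syntax (not taken from any official SDK), here is a minimal Python sketch that splits a record AT URI into its three parts. The names AtUri and parse_at_uri are made up for this example, and it assumes the full three-part form shown above; it does no character-level validation of the DID, collection, or record key.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    """Hypothetical container for the three parts of a record AT URI."""
    authority: str
    collection: str
    record_key: str


def parse_at_uri(uri: str) -> AtUri:
    """Split at://<authority>/<collection>/<record-key> into its parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    # Split on "/" only, so the colons inside the DID stay in the authority.
    segments = uri[len(prefix):].split("/")
    if len(segments) != 3 or not all(segments):
        raise ValueError(
            f"expected at://<authority>/<collection>/<record-key>: {uri!r}"
        )
    return AtUri(*segments)


print(parse_at_uri(
    "at://did:plc:vwzwgnygau7ed7b7wt5ux7y2/app.bsky.feed.post/3k5nobkf2w72g"
))
```

Note that this treats everything between at:// and the next slash as one opaque authority string, which is how atproto code tends to think about it, and exactly what generic URI parsers do not do.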
The at: URI scheme was originally designed in 2022, and was provisionally registered in 2023. We knew that putting a DID in the authority part could cause compatibility issues, and would mean the syntax wasn’t a valid web URL under the WHATWG specification. But we (incorrectly) thought that it still complied with the generic URI syntax, as standardized in RFC 3986.
The root of the issue was a motivated reading of RFC 3986, in particular Section 1.2.3: “For some URI schemes, the visible hierarchy is limited to the scheme itself: everything after the scheme component delimiter (":") is considered opaque to URI processing.”
At the time I interpreted this to mean that the underlying principle was that a URI is a scheme name followed by a colon character. Everything after that, as described in Sections 3.2, 3.3, and onward, seemed like optional “hierarchical” syntax which could be ignored. There are plenty of valid URI schemes that do this, like email (mailto:John.Doe@example.com) and Usenet (news:comp.infosystems.www.servers.unix).
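The difference between those opaque schemes and at:// shows up in a quick experiment. The sketch below uses Python's standard urllib.parse.urlsplit purely as one convenient generic splitter (the describe helper is made up for this example): mailto: and news: have no // and so no authority component, while the // in an AT URI opts into authority parsing.

```python
from urllib.parse import urlsplit

examples = [
    "mailto:John.Doe@example.com",
    "news:comp.infosystems.www.servers.unix",
    "at://did:plc:vwzwgnygau7ed7b7wt5ux7y2/app.bsky.feed.post/3k5nobkf2w72g",
]


def describe(uri: str) -> tuple:
    """Return (scheme, authority, path) as split by the generic syntax."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in examples:
    scheme, authority, path = describe(uri)
    print(f"scheme={scheme!r} authority={authority!r} path={path!r}")
```

For the first two, the authority comes back empty and everything after the colon lands in the path. For the AT URI, the whole DID lands in the authority slot, which is where the host/port rules start to apply.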
Further guidance for URI schemes intended for permanent IANA registration (which we want for the AT URI scheme) is given in RFC 7595 Section 3.2, which says: “Schemes SHOULD avoid improper use of "//". The use of double slashes in the first part of a URI is not a stylistic indicator that what follows is a URI: double slashes are intended for use ONLY when the syntax of the
That makes it clear what the best practice is, though I’ll note that this is a “SHOULD”, not a “MUST”. Overall, it seems clear that the generic/hierarchical syntax restrictions are intended to be required when // is used. The formal ABNF grammar in RFC 3986 Appendix A requires it: whatever follows the // has to parse as an authority, meaning optional userinfo, a host, and an optional numeric port, and a host name cannot contain a colon. The WHATWG URL syntax leaves no ambiguity about this. And on a pragmatic level, the actual strings fail to parse with most real-world URL and URI libraries.
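To make the grammar conflict concrete, here is a hedged sketch in Python. The regex is my own transcription of the authority, userinfo, reg-name, and port rules from RFC 3986 Appendix A; it deliberately leaves out IP literals, and is_rfc3986_authority is a made-up helper name. The second half shows how one real library behaves: Python's urlsplit is lenient about splitting the string, but it treats the first colon as the host/port separator and raises ValueError as soon as the port is read.

```python
import re
from urllib.parse import urlsplit

# Character sets from RFC 3986 Appendix A (IP-literal hosts are omitted).
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"
REG_NAME = rf"(?:[{UNRESERVED}{SUB_DELIMS}]|{PCT_ENCODED})*"
USERINFO = rf"(?:[{UNRESERVED}{SUB_DELIMS}:]|{PCT_ENCODED})*"
# authority = [ userinfo "@" ] host [ ":" port ], with port = *DIGIT
AUTHORITY = re.compile(rf"(?:{USERINFO}@)?{REG_NAME}(?::[0-9]*)?")


def is_rfc3986_authority(text: str) -> bool:
    """Check a string against the (simplified) RFC 3986 authority rule."""
    return AUTHORITY.fullmatch(text) is not None


DID = "did:plc:vwzwgnygau7ed7b7wt5ux7y2"
AT_URI = f"at://{DID}/app.bsky.feed.post/3k5nobkf2w72g"

print(is_rfc3986_authority("example.com:8080"))
print(is_rfc3986_authority(DID))

parts = urlsplit(AT_URI)
print(parts.hostname)  # the text before the first colon is taken as the host
try:
    print(parts.port)
except ValueError as exc:
    print(f"ValueError: {exc}")
```

The DID fails the grammar check because a colon can only appear in the authority as the userinfo separator or in front of a numeric port, and a DID uses it for neither.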
What Can We Do About It?
There are billions of AT URIs in the wild today. Many of these are in content-addressed data records which cannot be updated without changing the record version (hash), or in millions of cryptographically signed label objects from hundreds of providers. This means that the current syntax is going to be encountered, and needs to be at least partially supported by atproto implementations, indefinitely.
With that in mind, I think there are a few broad approaches the ecosystem could take:

Keep the existing syntax, and stop calling it a “URI”: for example, call them “AT reference identifiers” or something like that. This would mean abandoning the IETF working group charter goal of standardizing the URI scheme. It would also cause endless confusion and ambiguity when trying to use the strings on the web, in other protocols, etc. I think this is a bad outcome and we should no…