A bidirectional typechecking puzzle
A bidirectional typechecking puzzle
一个双向类型检查的难题
Recently I stumbled across a bidirectional typechecking puzzle in my Grace programming language that I thought people interested in programming language theory might find interesting. The context is that Grace is based on a bidirectional typechecking system (specifically: Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism) and as Grace evolved I began to run into a few limitations of that approach. Some of these limitations I could workaround with various hacks but recently these workarounds piled up to generate a bizarre bug.
最近,我在我的 Grace 编程语言中偶然发现了一个双向类型检查的难题,我想对编程语言理论感兴趣的人可能会觉得它很有趣。背景是 Grace 基于一种双向类型检查系统(具体来说是:Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism),随着 Grace 的发展,我开始遇到该方法的一些局限性。其中一些局限性我可以通过各种技巧来规避,但最近这些变通方法堆积起来,导致了一个奇怪的 Bug。
The problem
问题所在
To illustrate the bug, consider this Grace program: 为了说明这个 Bug,请看这段 Grace 程序:
let authorities = [ { domain: "google.com" } , { domain: "localhost", port: 8080 } ]
for { domain, port = 443 } of authorities in "${domain}:${show port}"
You can read this program as describing a list comprehension: Iterate over each record in the list of authorities and for each one bind the domain and port fields to variables of the same name, defaulting port to 443 if absent. Return "${domain}:${show port}" for each authority record.
你可以将此程序理解为列表推导式:遍历 authorities 列表中的每条记录,并将 domain 和 port 字段绑定到同名变量,如果 port 缺失则默认为 443。为每条记录返回 "${domain}:${show port}"。
I would expect a program like this to return:
我预期这样的程序会返回:
[ "google.com:443", "localhost:8080" ]
… but the bugged version would return:
……但有 Bug 的版本会返回:
[ "google.com:443", "localhost:443" ]
What gives? The port: 8080 field in the second record is being completely ignored.
怎么回事?第二条记录中的 port: 8080 字段被完全忽略了。
Bidirectional typechecking
双向类型检查
You might think that this is a problem with Grace’s evaluator but actually it’s a problem with Grace’s typechecker if you can believe that. The first part of the problem arises out of how Grace infers the type of lists. The type I would like Grace to infer for the list of authorities is: List { domain: Text, port: Optional Natural } … which you can read as: a List of records, each of which has a required domain field storing Text and an Optional port field storing a Natural number.
你可能认为这是 Grace 求值器的问题,但如果你敢相信的话,这实际上是 Grace 类型检查器的问题。问题的第一部分源于 Grace 推断列表类型的方式。我希望 Grace 为 authorities 列表推断出的类型是:List { domain: Text, port: Optional Natural }……你可以将其理解为:一个记录列表,其中每个记录都有一个存储 Text 的必选 domain 字段,以及一个存储 Natural 数值的可选 port 字段。
However, that is not the type that Grace actually infers: 然而,这并不是 Grace 实际推断出的类型:
>>> :type [ { "domain": "google.com" }, { "domain": "localhost", "port": 8080 } ]
List { domain: Text }
Whoops! Why is that? Well, it’s related to Grace using a bidirectional typechecker, where typechecking operations essentially come in two flavors: 糟糕!这是为什么呢?嗯,这与 Grace 使用双向类型检查器有关,其中类型检查操作本质上分为两种:
- infer the type of an expression (推断表达式的类型)
- check the type of an expression against an expected type (根据预期类型检查表达式的类型)
… and while that has some upsides (which I talk about in another post on the appeal of bidirectional typechecking) it has some downsides: you can’t easily infer the type of a list because neither of those two operations provides a way to combine two or more element types into an element supertype for the whole list. ……虽然这有一些优点(我在另一篇关于双向类型检查吸引力的文章中讨论过),但它也有一些缺点:你无法轻易推断出列表的类型,因为这两种操作都无法提供一种将两个或多个元素类型合并为整个列表的元素超类型的方法。
Before the bug fix, I’d hack around this by inferring the type of a list like this: 在修复这个 Bug 之前,我通过以下方式推断列表类型来绕过这个问题:
- infer the type of the first element of the list (推断列表中第一个元素的类型)
- check every other element against the inferred type of the first element (根据第一个元素的推断类型检查所有其他元素)
… and that’s why Grace would infer a type like this: ……这就是为什么 Grace 会推断出这样的类型:
>>> :type [ { "domain": "google.com" }, { "domain": "localhost", "port": 8080 } ]
List { domain: Text }
… because the type of the first element is { domain: Text } so that would become the element type for the entire list. If you expected a different type then you had to add an explicit type annotation. This wasn’t a great solution but it “worked” for a while as a half-measure.
……因为第一个元素的类型是 { domain: Text },所以它成为了整个列表的元素类型。如果你期望不同的类型,就必须添加显式的类型注解。这不是一个完美的解决方案,但作为一种权宜之计,它“奏效”了一段时间。
Most of the bidirectional typechecking literature recommends a different solution: require a type annotation on the list because then you can check each element of the list against the expected element type. However, that also didn’t sit right with me because I wanted Grace to work with real-world JSON which often has large and complicated schemas. I didn’t want users to have to spell out the entire schema as a giant type annotation. 大多数双向类型检查文献推荐另一种解决方案:要求对列表进行类型注解,因为这样你就可以根据预期的元素类型检查列表中的每个元素。然而,这对我来说并不合适,因为我希望 Grace 能处理现实世界中的 JSON,而这些 JSON 通常具有庞大且复杂的模式。我不希望用户必须将整个模式写成一个巨大的类型注解。
Elaboration
细化 (Elaboration)
Okay, that explains why the type is wrong but that still doesn’t explain why the result of evaluation is wrong. We’ve been talking about typechecking so far, but how does typechecking affect evaluation? Well, Grace’s typechecker does more than just flag type errors: it also adjusts the expression along the way, a process known as “elaboration”. 好吧,这解释了为什么类型是错误的,但仍然没有解释为什么求值结果是错误的。到目前为止我们一直在讨论类型检查,但类型检查如何影响求值呢?嗯,Grace 的类型检查器不仅仅是标记类型错误:它还在过程中调整表达式,这个过程被称为“细化”(elaboration)。
Specifically, when Grace checks a subtype against a supertype the typechecker inserts an explicit coercion converting the subtype to the supertype if they differ. For example, Grace’s evaluator internally represents all Optional values as either null or tagged as some x (for soundness reasons). However, if you input an expression without the tag where an Optional is expected then Grace will automatically insert a tag for you, like this:
具体来说,当 Grace 根据超类型检查子类型时,如果两者不同,类型检查器会插入一个显式的强制转换,将子类型转换为超类型。例如,Grace 的求值器在内部将所有 Optional 值表示为 null 或标记为 some x(出于健全性考虑)。然而,如果你在预期 Optional 的地方输入了一个没有标记的表达式,Grace 会自动为你插入一个标记,如下所示:
>>> [ some 1, 2 ] # This would be a type mismatch without elaboration
[ some 1, some 2 ]
There the type checker infers that the first element has type Optional Natural and then sees that the second element is a naked Natural number (without the some tag), but rather than flag that as a type error the typechecker helpfully inserts a some tag to fix the type mismatch.
在这里,类型检查器推断出第一个元素的类型是 Optional Natural,然后看到第二个元素是一个裸的 Natural 数值(没有 some 标记),但类型检查器并没有将其标记为类型错误,而是贴心地插入了一个 some 标记来修复类型不匹配。
The same thing happens with records, too! Grace supports record subtyping and also coerces the subtype to match the supertype. For example, if you annotate a record with a smaller record type: 记录也会发生同样的情况!Grace 支持记录子类型化,也会强制将子类型转换为匹配超类型。例如,如果你用一个较小的记录类型来注解一个记录:
>>> { x: 1, y: true }: { x: Natural }
{ "x": 1 }
… the typechecker allows that buuuut will also strip every field not present in the supertype. This coercion is important for soundness and the Appendix explains why in more detail. This sort of coercion contributed to the bug in the original example, and we can see why if we evaluate the authorities list just by itself:
……类型检查器允许这样做,但也会剔除所有超类型中不存在的字段。这种强制转换对于健全性很重要,附录中对此有更详细的解释。这种强制转换导致了原始示例中的 Bug,如果我们单独对 authorities 列表进行求值,就能看出原因:
>>> [ { domain: "google.com" }, { domain: "localhost", port: 8080 } ]
[ { "domain": "google.com" }, { "domain": "localhost" } ]
We get this result because: 我们得到这个结果是因为:
- the inferred type of the first element is
{ domain: Text }(第一个元素的推断类型是{ domain: Text }) - the second element is checked against that expected type (第二个元素根据该预期类型进行检查)
- the second element matches that type but also has an extra field (
port) (第二个元素匹配该类型,但还有一个额外的字段port) - the typechecker strips the
portfield to match the expected type (类型检查器剔除了port字段以匹配预期类型)
I don’t believe this coercion is the root cause of the bug (and I explain why in more detail in the Appendix), but we still need to understand coercion to understand the bug. 我不认为这种强制转换是该 Bug 的根本原因(我在附录中更详细地解释了原因),但我们仍然需要理解强制转换才能理解这个 Bug。