Record type inference for dummies

Record type inference for dummies

匿名记录类型推导入门

The reason I’m writing this post is because I actually wanted to write a more advanced post on type inference for anonymous records, but then I realized that most of my readers wouldn’t understand the latter post in isolation. So I figured I would write this introductory post to teach people new to type theory the basics. 我写这篇文章的原因是,我原本想写一篇关于匿名记录(anonymous records)类型推导的高阶文章,但我意识到大多数读者在没有背景知识的情况下很难理解。因此,我决定写这篇入门文章,为类型理论的新手讲解基础知识。

The reason I’m writing both posts is because I believe that good type inference for anonymous records is one of the big things holding back statically typed languages and not enough people appreciate this or understand why. I also think that there is a large disconnect between what programming language experts understand is possible and what lay programmers are comfortable or familiar with. 我写这两篇文章的原因在于,我相信优秀的匿名记录类型推导是阻碍静态类型语言发展的关键因素之一,但目前并没有足够多的人意识到这一点或理解其原因。同时,我认为编程语言专家所认为的“可行性”与普通程序员所熟悉或习惯的范式之间存在巨大的鸿沟。

To make a play on XKCD #2501: So this post (and the next one) are going to be an exposition of where the field of type theory was at over three decades ago, which our industry still hasn’t really caught up to, yet. 借用 XKCD #2501 的梗:这篇文章(以及下一篇)将阐述三十多年前类型理论领域的发展水平,而我们的行业至今尚未真正赶上。

Anonymous records

匿名记录

I mentioned that this is a post about type inference for anonymous records and I’ve met some programmers who either don’t know what anonymous records are or at least don’t recognize them by that name, so I’ll briefly touch upon them. An anonymous record is a record that doesn’t require an associated datatype declaration, and they’re quite common in dynamically typed languages. 我提到这是一篇关于匿名记录类型推导的文章,但我遇到过一些程序员,他们要么不知道什么是匿名记录,要么至少不熟悉这个术语,所以我将简要介绍一下。匿名记录是指不需要关联数据类型声明的记录,它们在动态类型语言中非常常见。

For example: 例如:

  • JavaScript calls them “objects”: { name: "Alice", age: 25 }

  • Python calls them “dictionaries”: { "name": "Alice", "age": 25 }

  • Ruby calls them “hashes”: { :name => "Alice", :age => 25 }

  • Nix calls them “attribute sets”: { name = "Alice"; age = 25; }

  • JavaScript 称之为“对象”(objects):{ name: "Alice", age: 25 }

  • Python 称之为“字典”(dictionaries):{ "name": "Alice", "age": 25 }

  • Ruby 称之为“哈希”(hashes):{ :name => "Alice", :age => 25 }

  • Nix 称之为“属性集”(attribute sets):{ name = "Alice"; age = 25; }

If you’ve ever worked with JSON then you’ve worked with anonymous records because JSON objects (just like JavaScript objects) are anonymous records. 如果你使用过 JSON,那么你就接触过匿名记录,因为 JSON 对象(就像 JavaScript 对象一样)本质上就是匿名记录。

Static typing

静态类型

A smaller number of statically typed programming languages support anonymous records because statically typed languages usually prefer named datatypes. For example, Haskell does not support anonymous records and requires a datatype declaration for all records, like this: 支持匿名记录的静态类型编程语言较少,因为静态类型语言通常更倾向于使用命名数据类型。例如,Haskell 不支持匿名记录,要求所有记录都必须有数据类型声明,如下所示:

data Person = Person{ name :: Text, age :: Integer }

example :: Person
example = Person{ name = "Alice", age = 25 }

… but there are still some statically typed programming languages that do support anonymous records, like TypeScript: ……但仍有一些静态类型编程语言支持匿名记录,例如 TypeScript:

{ name: "Alice", age: 25 } : { name: string, age: number }

… or C# (which calls them “anonymous types”): ……或者 C#(称之为“匿名类型”):

new { Name = "Alice"; Age = 25 }

… or PureScript (which calls them records): ……或者 PureScript(直接称之为记录):

{ name: "Alice", age: 25 } :: { name :: String, age :: Int }

If all we needed was language support for record literals without any operations on records then it’s fairly easy to infer their types. To do so, though, I’m going to introduce some type theory notation. 如果我们只需要语言支持记录字面量,而不需要对记录进行任何操作,那么推导它们的类型是相当容易的。不过,为了实现这一点,我需要引入一些类型理论的符号。

Let’s start by defining a basic abstract syntax tree that says that an expression e can be either: 让我们先定义一个基本的抽象语法树(AST),说明表达式 e 可以是以下几种情况之一:

  • a Boolean (e.g. True)

  • a String (e.g. "Alice")

  • a Number (e.g. 25.0)

  • a record containing 0 or more fields (e.g. { name = "Alice", age = 25 })

  • 布尔值(例如 True

  • 字符串(例如 "Alice"

  • 数字(例如 25.0

  • 包含 0 个或多个字段的记录(例如 { name = "Alice", age = 25 }

The equivalent Haskell code would be something like: 对应的 Haskell 代码如下:

type Identifier = Text
data Expression = Boolean Bool | String Text | Number Double | Record (Map Identifier Expression)

Fields can store arbitrary expressions, which means you can nest records. We’ll also need to define an abstract syntax tree for our inferred types, which can be either: 字段可以存储任意表达式,这意味着你可以嵌套记录。我们还需要为推导出的类型定义一个抽象语法树,它可以是:

  • the BooleanType

  • the StringType

  • the NumberType

  • a record type containing 0 or more fields

  • 布尔类型

  • 字符串类型

  • 数字类型

  • 包含 0 个或多个字段的记录类型

The equivalent Haskell code would be something like: 对应的 Haskell 代码如下:

data Type = BooleanType | StringType | NumberType | RecordType (Map Identifier Type)

Now that we have a syntax for expressions and types we can define some type inference rules. 现在我们有了表达式和类型的语法,可以定义一些类型推导规则了。

If you’ve never seen this notation before, you can think of it as mathematical pseudocode for how to implement a type inference function. The equivalent Haskell function would be something like: 如果你以前从未见过这种符号,可以将其视为实现类型推导函数的数学伪代码。对应的 Haskell 函数如下:

infer :: [(Identifier, Type)] -> Expression -> Either Text Type
infer context (Boolean _) = return BooleanType
infer context (String _)  = return StringType
infer context (Number _)  = return NumberType

Now suppose that we wanted to infer the type of a record literal. Normally we’d reason about the expression’s type by hand like this: 现在假设我们想要推导一个记录字面量的类型。通常我们会像这样手动推导表达式的类型:

To infer the type of { name = "Alice", age = 25 } we need to infer the type of each field: 要推导 { name = "Alice", age = 25 } 的类型,我们需要推导每个字段的类型:

A. first, infer the type of "Alice" (which is StringType) B. then, infer the type of 25 (which is NumberType) now combine those into the final record type: { name :: StringType, age :: NumberType }

A. 首先,推导 "Alice" 的类型(即 StringType) B. 然后,推导 25 的类型(即 NumberType) 最后将它们组合成最终的记录类型:{ name :: StringType, age :: NumberType }

If we take that reasoning process and generalize it to all records we might write something like this: 如果我们把这个推理过程推广到所有记录,我们可以这样写:

infer context (Record fields) = do
    fieldTypes <- traverse (infer context) fields
    return (RecordType fieldTypes)

… and we can verify this all works in the Haskell REPL: ……我们可以在 Haskell REPL 中验证这一切是否有效:

ghci> infer [] (Record [("name", String "Alice"), ("age", Number 25)])
Right (RecordType (fromList [("age",NumberType),("name",StringType)]))

Field access

字段访问

Any programming language worth its salt will also support record field access using something like dot notation (e.g. e.name). For example, in Nix that would look like this: 任何合格的编程语言都会支持使用点号(例如 e.name)来访问记录字段。例如,在 Nix 中是这样的:

nix-repl> { name = "Alice"; age = 25; }.name
"Alice"

So we’ll add a type inference rule for field access, but first we need to… 因此,我们将为字段访问添加一条类型推导规则,但首先我们需要……