Elixir v1.20: Now a gradually typed language

Elixir v1.20: Now a gradually typed language

Elixir v1.20:现已成为一门渐进式类型语言

In 2022, we announced the effort to add set-theoretic types to Elixir. In June 2023, we published an award winning paper on Elixir’s type system design and said our work was transitioning from research to development. With Elixir v1.20, we have completed our first development milestone which is to perform type inference and gradually type check every Elixir program, without introducing type annotations. 2022 年,我们宣布了为 Elixir 添加集合论类型(set-theoretic types)的计划。2023 年 6 月,我们发表了一篇关于 Elixir 类型系统设计的获奖论文,并表示我们的工作正从研究阶段转向开发阶段。随着 Elixir v1.20 的发布,我们完成了第一个开发里程碑:在不引入类型注解的情况下,对所有 Elixir 程序进行类型推断和渐进式类型检查。

This means Elixir increasingly reports dead code and verified bugs: typing violations that are guaranteed to fail at runtime if executed. Elixir can find verified bugs in existing programs efficiently, without introducing developer overhead, and with an extremely low false positives rate. 这意味着 Elixir 现在能更有效地报告死代码和已验证的 Bug:即那些如果在运行时执行,必然会导致失败的类型违规。Elixir 可以在不增加开发者负担的情况下,高效地发现现有程序中的已验证 Bug,且误报率极低。

In this announcement, we will break down the type system goals, what the dynamic() type means in Elixir, and how it finds verified bugs. In particular, our implementation performs well in the “If T: Benchmark for Type Narrowing” benchmark. Elixir passes 12 of the 13 categories, showing that it can recover precise type information from ordinary Elixir code, which we use to find verified bugs in dynamically typed programs. 在本公告中,我们将详细介绍类型系统的目标、Elixir 中 dynamic() 类型的含义,以及它是如何发现已验证 Bug 的。特别值得一提的是,我们的实现方案在“If T: Benchmark for Type Narrowing”基准测试中表现优异。Elixir 通过了 13 个类别中的 12 个,这表明它能够从普通的 Elixir 代码中恢复精确的类型信息,我们正是利用这一点在动态类型程序中发现已验证的 Bug。

The type system was made possible thanks to a partnership between CNRS and Remote. The development work is currently sponsored by Fresha, and Tidewave. 该类型系统的实现得益于 CNRS 和 Remote 之间的合作。目前的开发工作由 Fresha 和 Tidewave 赞助。

Types, in my Elixir?

我的 Elixir 里有了类型系统?

Our goal is to introduce a type system which is: 我们的目标是引入一个具备以下特性的类型系统:

  • sound - the types inferred and assigned by the type system align with the behaviour of the program
  • 可靠(Sound) - 类型系统推断和分配的类型与程序的实际行为保持一致。
  • gradual - Elixir’s type system includes the dynamic() type, which can be used when the type of a variable or expression is checked at runtime. In the absence of dynamic(), Elixir’s type system behaves as a static one
  • 渐进式(Gradual) - Elixir 的类型系统包含 dynamic() 类型,当变量或表达式的类型需要在运行时检查时可以使用它。在没有 dynamic() 的情况下,Elixir 的类型系统表现得像静态类型系统一样。
  • developer friendly - the types are described, implemented, and composed using basic set operations: unions, intersections, and negations (hence it is a set-theoretic type system), with clear error messages
  • 对开发者友好 - 类型通过基本的集合运算(并集、交集和补集)进行描述、实现和组合(因此它是一个集合论类型系统),并提供清晰的错误信息。

Introducing a type system into an existing language is a complex change. For this reason, our first milestone was to implement the type system without introducing typing annotations but still have it provide value to developers by finding dead code and verified bugs. This is done through the dynamic() type, which in Elixir is quite different from other gradually typed languages. Let’s break it down. 将类型系统引入现有语言是一项复杂的变革。因此,我们的第一个里程碑是在不引入类型注解的前提下实现类型系统,同时通过发现死代码和已验证 Bug 来为开发者提供价值。这是通过 dynamic() 类型实现的,它在 Elixir 中与其他渐进式类型语言有很大不同。让我们深入剖析一下。

The dynamic() type

dynamic() 类型

Many gradual type systems have the any() type, which, from the point of view of the type system, often means “anything goes” and no type violations are reported. On the other hand, Elixir’s gradual type is called dynamic() and it has two important properties: compatibility and narrowing. 许多渐进式类型系统都有 any() 类型,从类型系统的角度来看,这通常意味着“任何情况都可以”,不会报告任何类型违规。相比之下,Elixir 的渐进类型被称为 dynamic(),它具有两个重要属性:兼容性(compatibility)和收窄(narrowing)。

In static type systems, when you have a type of shape integer() or binary() and you invoke a function, said function must accept both types. However, because type systems cannot capture the intention of all of our programs with precision, this may lead to false positives. 在静态类型系统中,当你拥有 integer()binary() 类型的变量并调用函数时,该函数必须同时接受这两种类型。然而,由于类型系统无法精确捕捉我们所有程序的意图,这可能会导致误报。

Although value_or_error has type integer() or binary(), the operator / accepts only numbers, and String.upcase accepts only binaries/strings, the program above is valid and emits no exceptions at runtime. However, a type system would still report two violations, because the types supplied to / and String.upcase are not a subtype of the accepted types. 虽然 value_or_error 的类型是 integer()binary(),运算符 / 仅接受数字,而 String.upcase 仅接受二进制/字符串,但上述程序是有效的,且在运行时不会抛出异常。然而,类型系统仍会报告两次违规,因为提供给 /String.upcase 的类型不是所接受类型的子类型。

While the program above could be better written to have no typing violations, type systems will always reject valid programs, and if Elixir were to introduce too many false positives in existing codebases, it would quickly erode the trust in the type system. Therefore, Elixir’s gradual type system tags the value_or_error variable above with the type dynamic(integer() or binary()), which means the type is either integer() or binary() at runtime. 虽然上述程序可以通过重写来消除类型违规,但类型系统总会拒绝一些有效的程序。如果 Elixir 在现有代码库中引入过多的误报,会迅速削弱开发者对类型系统的信任。因此,Elixir 的渐进式类型系统将上述 value_or_error 变量标记为 dynamic(integer() or binary()) 类型,这意味着在运行时该类型要么是 integer(),要么是 binary()

When calling a function with a dynamic() type, Elixir will only emit a typing violation if the supplied types and the accepted types are disjoint. In the program above, even though / expects only numbers, dynamic(integer() or binary()) can be an integer() and given the accepted and supplied types are not disjoint, there are no typing violations. 当调用带有 dynamic() 类型的函数时,只有在提供的类型与接受的类型不相交(disjoint)时,Elixir 才会发出类型违规警告。在上述程序中,尽管 / 仅期望数字,但 dynamic(integer() or binary()) 可以是 integer(),鉴于接受的类型和提供的类型并非不相交,因此不存在类型违规。

However, if we were to change the program to this: 然而,如果我们把程序改成这样:

value_or_error = if value > 1 do value else "not well" end
Map.fetch!(value_or_error, :some_key)

Because Map.fetch! expects a map data structure, and value_or_error can only be integer or binary at runtime, the accepted and supplied types are disjoint, which turns into a violation. This is known as the compatibility property and it explains how Elixir reports only verified bugs. 因为 Map.fetch! 期望一个映射(map)数据结构,而 value_or_error 在运行时只能是整数或二进制,所以接受的类型和提供的类型是不相交的,这就会导致违规。这就是所谓的兼容性属性,它解释了为什么 Elixir 只报告已验证的 Bug。

However, reporting only verified bugs would not be useful if we can’t find many bugs in the first place. We addressed this problem by making sure Elixir’s dynamic type can be narrowed. 然而,如果我们无法发现足够多的 Bug,那么只报告已验证的 Bug 就没有意义了。我们通过确保 Elixir 的动态类型可以被“收窄”(narrowing)来解决这个问题。

In the program above, data starts as a dynamic() type. We then use it as data.a and data.b inside the plus operator, so Elixir will refine the data variable to have type %{…, a: number(), b: number()}, which implies it is a map with both a and b fields with number values (and potentially any other field, hence the leading …). 在上述程序中,data 最初是 dynamic() 类型。随后我们在加法运算符内将其用作 data.adata.b,因此 Elixir 会将 data 变量细化为 %{..., a: number(), b: number()} 类型,这意味着它是一个包含 ab 字段且值均为数字的映射(可能还包含其他字段,因此有前导的 ...)。

Therefore, if you were to forget to select the .b field and write this: 因此,如果你忘记选择 .b 字段并写成这样:

def add_a_and_b(data) do
  data.a + data
end

data would be first narrowed to a map of shape %{…, a: number()}, then attempted to be used as a number(), which would emit a violation. In other words, the dynamic() type in Elixir effectively works as a range, which can be refined as it is used throughout the program and reports violations whenever type checks fall outside of the range. This is a contrast to other gradual type systems, which use the dynamic type to discard all type information. data 首先会被收窄为 %{..., a: number()} 形状的映射,然后尝试将其用作 number(),这会触发违规。换句话说,Elixir 中的 dynamic() 类型实际上起到了一个范围的作用,它可以在程序运行过程中随着使用而不断细化,并在类型检查超出该范围时报告违规。这与其他渐进式类型系统形成了鲜明对比,后者通常使用动态类型来丢弃所有类型信息。