Safe Made Easy Pt.1: Single Ownership is (Not) Optional
Intro
This post introduces an approach to memory safety that I believe is more practical and more ergonomic than the available alternatives.
It all started way back when, and was inspired by things I read and wrote:
- Attempts to bolt linear types on top of Rust (1), (2)
- Leakpocalypse
- Verdagon’s post on Vale design and higher RAII
- A lot, and I mean A LOT, of TypeScript
Three years of development later, I believe I finally got it. The proposal is complete. Moreover, I have implemented it in my own programming language I intend to release soon-ish, and I want to share the design decisions and the entire path from “huh, why not” to “omg it’s live”.
So, TL;DR: linear types (whose values are dropped exactly once) + abstract interpretation + a bunch of tricks allow us to eliminate the same classes of bugs as Rust does (at least in non-concurrent environments) plus memory leaks, and we can extend the approach to also cover concurrent environments, all the while being more ergonomic and less restrictive. Sounds fun? Let’s dig in.
What it promises and what it doesn’t
It is safe - it completely eliminates entire classes of bugs, such as: double-free, use-after-free, dangling pointers, null pointer dereferences, buffer overflows, out-of-bounds accesses, iterator invalidation, uninitialized memory access, and memory leaks.
Single ownership enables linearity - each value is dropped exactly once - and prohibits ownership cycles. Together with the flow-sensitive type system built to enforce it, these eliminate most of the above. Buffer overflows and OOB accesses are covered separately, but the mechanics of the rest of the system make dealing with these easy and efficient.
It is sound - I will demonstrate over the course of this series that the claims hold for arbitrary inputs. There are no holes that can be used to break the guarantees provided from inside the system.
It is NOT simple - there is a fairly large number of primitives working together so that the whole system can uphold the safety guarantees promised.
It is NOT concerned with concurrency - though the “fearless concurrency” guarantees are a natural extension to the proposed system, they have not been implemented completely enough to demonstrate the viability of the approach. I will expand on this in a future post once I get it up and running.
It is NOT claiming to be “zero cost”, though it keeps runtime overhead to a minimum - it introduces a runtime check (a single branch per indeterminate access) only where the compiler cannot statically prove availability.
Motivating example
Consider this pseudocode:

    var x: T = new T;
    if random() > 0.5 {
        drop x;
    }
    print(x);

This code conditionally consumes a value and then uses it.
There are two ways this could go in a real language. C++ doesn’t particularly care and will happily compile this code, which then invokes undefined behavior (UB) in about 50% of runs. A modern C++ developer would reach for std::unique_ptr and std::optional here - and they would help, partially. RAII via smart pointers eliminates the manual delete, and optional gives you a way to represent “maybe moved.” But unique_ptr only manages heap-allocated objects, and the type system does not enforce the optional check - operator* on an empty optional is undefined behavior, and even .value() only gives you a runtime exception (std::bad_optional_access) instead of a compile-time error. It is still on you to remember.
In Rust, though, this code does not compile at all:

    fn main() {
        let x = Box::new(42);
        if rand::random::<f64>() > 0.5 {
            drop(x);
        }
        println!("{}", x);
    }

Rust takes a very different approach. The compiler tracks moves through control flow - it sees that x might have been moved in the if branch, and rejects the program outright. Rust requires a variable to be definitely initialized at every point where it is used - the conditional move is accepted on its own, but after the if, x is only maybe-moved, so the use in println! is an error (E0382, borrow of moved value). You can wrap the value in Option<T> yourself and .take() it manually, but Rust won’t do that for you - the burden is on the developer to restructure the code upfront.
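For reference, the manual Option<T> workaround mentioned above might look like the sketch below. The function name conditional_drop and the coin parameter are made up for illustration; coin stands in for rand::random::<f64>() > 0.5 so that the snippet needs no external crate.

```rust
/// Conditionally consumes a boxed value, then reports what is left.
/// `coin` is a stand-in for `rand::random::<f64>() > 0.5` (assumption,
/// used only to keep this sketch dependency-free).
fn conditional_drop(coin: bool) -> String {
    // Wrap the value by hand so that "maybe moved" is representable.
    let mut x: Option<Box<i32>> = Some(Box::new(42));

    if coin {
        // `take()` moves the value out and leaves `None` behind.
        drop(x.take());
    }

    // The use site now has to handle both states explicitly.
    match x {
        Some(v) => format!("still here: {}", v),
        None => String::from("already dropped"),
    }
}
```

Note that the wrapping has to be decided at the declaration of x, before the developer even reaches the branch that needs it, which is the upfront restructuring cost described above.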
So, what if there was a third way between these two?
The proposal
The proposed solution is straightforward:

    var x: T = new T;
    if rand() > 0.5f {
        drop x;
    }
    // <- At this point, typeof(x) is Option<T>
The type of the value is now control-flow-dependent - the compiler evaluates it as it goes through the program, widening it each time control flow diverges to accommodate both possibilities. Then it becomes the developer’s responsibility to narrow it down when they want to use it:

    if x {
        // x is definitely available
    } else {
        // x is definitely not available
    }
One way to view this is to consider which information is available to the compiler at various points:
- The first conditional statement makes the compiler lose information on the availability of x, which the type system expresses by widening the type of x to Option<T>.
- The second conditional statement provides information to the compiler - in each branch of the statement, x has a definite availability.
- But after the second conditional statement we are back to the state where the information is not available.
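The three points above can be modelled as a tiny abstract-interpretation lattice. This is only a sketch of the general technique, written in Rust, and not the author's implementation: the names Avail, join and narrow are invented here, and it assumes the branches of the second conditional do not themselves drop or reassign x.

```rust
/// What the checker knows about one variable at one program point.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Avail {
    Available,   // typeof(x) is T
    Unavailable, // x has definitely been dropped or moved
    Maybe,       // typeof(x) is Option<T>
}

/// Widening: merge the states flowing out of two branches.
/// If the branches disagree, the information is lost.
fn join(a: Avail, b: Avail) -> Avail {
    if a == b { a } else { Avail::Maybe }
}

/// Narrowing: refine the state inside `if x { .. } else { .. }`.
/// Already-definite states are passed through unchanged (a simplification:
/// a real checker could flag the impossible branch as unreachable).
fn narrow(state: Avail, in_then_branch: bool) -> Avail {
    match (state, in_then_branch) {
        (Avail::Maybe, true) => Avail::Available,
        (Avail::Maybe, false) => Avail::Unavailable,
        (known, _) => known,
    }
}
```

Walking the example through this model: joining Unavailable (the branch that dropped x) with Available (the implicit else) gives Maybe after the first conditional; narrow gives a definite state in each branch of the second; and joining those two branch states gives Maybe again afterwards.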
Compared to the C++ approach, we now force the developer to consider the state space explicitly and avoid the crash, because the typechecker will catch all attempts to use an Option<T> where a T should be used, or to use a definitely non-available value. Compared to the Rust approach, we gain flexibility at the cost of a runtime check - a single null/tag comparison at the point of access.
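To give a feel for how small that check can be, here is a rough analogy in Rust; how the proposed language lays out a maybe-available value is not shown in this post, so treat this purely as an illustration. Rust guarantees that Option<Box<T>> has the same size as Box<T>, with None represented as the null pointer, so testing availability is a null comparison. The helper names below are hypothetical.

```rust
use std::mem::size_of;

/// True when `Option<Box<i32>>` needs no separate tag word:
/// `None` is encoded as the null pointer.
fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<i32>>>() == size_of::<Box<i32>>()
}

/// Reads through a maybe-available value.
/// The `match` is the single runtime check at the point of access.
fn read_or(x: &Option<Box<i32>>, fallback: i32) -> i32 {
    match x {
        Some(v) => **v,
        None => fallback,
    }
}
```

For types without a spare bit pattern, the same check would compare a tag instead of a pointer, which is the “tag” half of the null/tag comparison.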