How memory safety CVEs differ between Rust and C/C++

CVE (Common Vulnerabilities and Exposures) is a database used for cataloguing and reporting security vulnerabilities in software. Many kinds of vulnerabilities can be reported. Some of them are caused simply by bugs in the program logic (like a recent CVE reported in Cargo), but some of the nastiest ones are caused by memory unsafety, which can easily lead to exploits.

In this post I want to focus on the latter kind of CVEs: how they are reported, especially in libraries, and how that differs between Rust and C or C++. I sometimes see people online comparing the number of CVEs in Rust and C/C++ software, which tends to be accompanied by claims that Rust is not really memory safe, or that it is not worth adopting if CVEs can still exist in it. I also sometimes encounter similar views when I teach Rust to programmers who are used to programming in C or C++.

Now, anyone is, of course, free to make such comparisons and draw their own conclusions from them. But I think that there is an important difference in how potential vulnerabilities related to memory safety are treated in Rust and C/C++, which might not be obvious at first, especially if you don’t know how Rust works. I’d like to explain that in this post.

But first, I should clarify that it is absolutely possible to cause memory unsafety bugs and undefined behaviour (UB) in Rust. In the vast majority of cases, the unsafe keyword is required for this to happen, but anyone who claims that Rust programs cannot experience UB at all is simply incorrect. It is also perfectly possible to cause general vulnerabilities (meaning those unrelated to memory unsafety) in Rust. Forgetting to add a check that your admin dashboard is only accessible to admins can happen in any language, after all.

And yet, there is something very different about how potential vulnerabilities are treated in Rust compared to C or C++, and it is related to the core reason why Rust is actually much more memory safe in practice than C or C++. I’ll try to demonstrate it using the curl networking library, which is written in C.

Potential vulnerability in curl?

(lib)curl is one of the most used and well-maintained open source libraries in the world. Its primary developer, Daniel Stenberg, is one of the most prolific open source maintainers of our time, and together with many other people, he has been diligently improving this library for the past 30 years. Despite having to deal with a recent avalanche of CVEs found by LLMs, he and his collaborators are doing a very good job of keeping curl safe from potential exploits and vulnerabilities, and they take pride in curl being a very robust piece of software.

So, let’s put that to the test, shall we? I opened the documentation of libcurl and picked the first function I saw that accepts an argument, curl_getenv. This is supposed to be a simple function that provides a portable abstraction for getting the value of an environment variable across different operating systems. curl is supposed to be safe and robust, so surely this function doesn’t contain any UB or memory unsafety, right?

So what about the following C program?

#include <curl/curl.h>
int main(void) {
    curl_getenv(NULL);
}

This tiny C program is as simple as it gets: it just calls the curl_getenv function with a NULL pointer argument, and it compiles without any warnings. And yet, when you execute it, you (might) get a segfault, and thus a memory safety bug, and thus a potential vulnerability/exploit:

$ gcc test.c -otest -lcurl -Wall -Wextra
$ ./test
Segmentation fault (core dumped)

Of course, this program is artificially simple, but that’s kind of the point. In practice, situations like this can (and do) easily happen in larger programs by accident all the time. Huh. So maybe curl isn’t so safe after all? Should I go and report this as a vulnerability in curl?! No, of course not. That would be stupid. I know that, you know that. But how do we actually know it? That’s the interesting part.

Consider a very similar program that calls the function like this: curl_getenv("FOO"). What if that program still segfaulted, and thus contained a potential vulnerability? I am sure that the curl maintainers would like to know about that, and would consider it to be a pretty big issue if I reported it! At the same time, I’m sure that they would (rightfully) tell me off if I reported the first program as a vulnerability in curl. Yet those two programs differ by so little. So, what gives?

Well, in practice, UB like the one in my original example is said to be caused by “wrong usage”, and it is not considered to be an issue in the library or API that I am using, but in my (application) code. This is done mostly for the following two reasons:

In C, it is often not possible to specify the contract (invariants, preconditions, postconditions, etc.) of APIs precisely due to its limited type system, and library authors often don’t bother describing all possible kinds of wrong usage, as it would not be practical. Indeed, the documentation of curl_getenv does not say that calling it with NULL is forbidden and might lead to a segfault! The authors thus assume that you will use the library “correctly” (whatever that means), and if you don’t, then any caused vulnerabilities are your fault.

The fact that it is so simple to trigger UB by accident in C or C++ means that if we reported all the potential possibilities of causing a vulnerability, such as the one in my example program, most C or C++ libraries would be flooded by millions of CVEs. It wouldn’t make sense to do that, because there would be five different ways of potentially causing a vulnerability in every function call. And thus, in C and C++, we usually do not consider similar situations to warrant a CVE in the used library. In other words, we create CVEs for specific misuses of a library, not for the existence of a library API that can be misused.
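
To make the first reason a bit more concrete, here is a minimal sketch in C. The function lookup_env is a hypothetical name that I made up for illustration; it is not part of libcurl, and I am not claiming that curl_getenv is implemented this way. It only uses getenv from the C standard library. The point is that the "must not be NULL" precondition can only live in a comment (and, optionally, in a runtime check), because the parameter type const char * accepts NULL just as happily as a valid string.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper for illustration only; not a libcurl API.
 *
 * Contract (expressible only as a comment in C): `name` must point to a
 * NUL-terminated string. The type `const char *` also admits NULL, so the
 * compiler cannot reject a call such as lookup_env(NULL).
 */
const char *lookup_env(const char *name) {
    /* Defensive check: report "not found" for a NULL name instead of
     * forwarding it to getenv, where the behaviour would be undefined. */
    if (name == NULL) {
        return NULL;
    }
    /* Returns NULL if the variable is not set in the environment. */
    return getenv(name);
}
```

With the check in place, lookup_env(NULL) simply returns NULL instead of (possibly) crashing. But nothing forces a library author to write such a check, and the compiler will not complain whether it is there or not, which is why the burden of "correct usage" ends up on the caller.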

How does it differ in Rust?

So, what is the crucial difference in how the situation above would be treated in Rust?