How we used Quint to find over 10 bugs in SQLite while hardening Turso

How we used Quint to find over 10 bugs in SQLite while hardening Turso

我们是如何利用 Quint 在加固 Turso 的同时发现 SQLite 中 10 多个 Bug 的

From the moment we started Turso, we have taken testing extremely seriously. There is no other choice: Turso is a rewrite of SQLite, and SQLite sets a very high bar for reliability. We consistently hear that this, more than any specific feature, is what users love about SQLite. 从我们启动 Turso 的那一刻起,我们就极其重视测试。别无选择:Turso 是 SQLite 的重写版本,而 SQLite 为可靠性设定了极高的标准。我们经常听到用户说,比起任何特定功能,这才是他们喜爱 SQLite 的原因。

Our first version shipped already with a built-in Deterministic Simulation Testing mechanism. Today, aside from our original DST, we have differential testers, fuzzers, concurrency simulators, as well as making good use of Antithesis. We will never pass up an opportunity to improve things even more. Because of this, we have been attracting a healthy community of quality aficionados into our Open Source community. This article goes into the details of how one of our Open Source members, Pavan Nambi, used Quint to tighten our testing so hard that he ended up finding 10 bugs (and counting) in SQLite itself. 我们的第一个版本就内置了确定性模拟测试(DST)机制。如今,除了最初的 DST,我们还拥有差异测试器、模糊测试器、并发模拟器,并充分利用了 Antithesis。我们绝不会放过任何进一步改进的机会。正因如此,我们吸引了一个由质量发烧友组成的健康社区加入我们的开源项目。本文将详细介绍我们的开源成员 Pavan Nambi 如何使用 Quint 加强测试,最终在 SQLite 本身中发现了 10 个(且还在增加)Bug。

Formal Methods

形式化方法

Despite our testing discipline, one area in which we are still making slow inroads is formal methods. Formal methods have traditionally been very inaccessible to mortals, and usually come with the “who-watches-the-watchmen” added challenge of how to verify that the model of the system itself is correct. TLA+ is the gold standard for formal verification, but other tools have been appearing recently that try to tackle the issue of accessibility in formal methods. One such tool is Quint. 尽管我们有严格的测试纪律,但在形式化方法领域,我们的进展仍然缓慢。传统上,形式化方法对普通人来说非常难以触及,而且通常伴随着“谁来监督监督者”的挑战,即如何验证系统模型本身是正确的。TLA+ 是形式化验证的黄金标准,但最近出现了一些其他工具,试图解决形式化方法的易用性问题。Quint 就是其中之一。

As per their GitHub page, Quint “combines the robust theoretical basis of the Temporal Logic of Actions (TLA) with state-of-the-art type checking and development tooling.” Pavan decided to take it for a spin, but quickly concluded that trying to specify the entire system would be close to impossible. At that point, he had an idea: most drivers that consume SQLite do so through the C API. Turso exposes a C API that is compatible with SQLite’s. The C API is also well documented and well specified. If he could model the C API in Quint, that would amount to a pretty good level of coverage. 根据其 GitHub 页面,Quint “结合了动作时序逻辑 (TLA) 强大的理论基础与最先进的类型检查和开发工具”。Pavan 决定尝试一下,但很快得出结论:试图对整个系统进行规范化几乎是不可能的。此时,他产生了一个想法:大多数使用 SQLite 的驱动程序都是通过 C API 进行的。Turso 暴露了一个与 SQLite 兼容的 C API。该 C API 文档齐全且规范明确。如果他能在 Quint 中对 C API 进行建模,那将达到相当不错的覆盖率。

Quinting SQLite

使用 Quint 测试 SQLite

With this idea in mind, the process was now simple. We just had to rinse and repeat the following steps: Take one documented SQLite C API contract, then model only the state and properties needed for that. Generate a small trace. Run it against SQLite. Compare the documented result with the observed one. If the model checker fails, we get a counter-example: a sequence of states that lead to the violation. The expectation was that running it against SQLite would help us validate the model. In many situations it did. But in others, upon closer inspection, the system deviated from the model because the system was wrong. 有了这个想法,过程就变得简单了。我们只需要重复以下步骤:选取一个有文档记录的 SQLite C API 契约,然后仅对所需的特定状态和属性进行建模。生成一个小轨迹(trace)。针对 SQLite 运行它。将记录的结果与观察到的结果进行比较。如果模型检查器失败,我们会得到一个反例:导致违规的状态序列。我们的预期是,针对 SQLite 运行它将有助于验证模型。在许多情况下确实如此。但在其他情况下,经过仔细检查,系统偏离了模型,因为系统本身出错了。

One example: sqlite3_deserialize()

一个例子:sqlite3_deserialize()

Here is one example: the function sqlite3_deserialize() lets you take a serialized SQLite database image and load it into a database connection as an in-memory database. This is useful when you have a file-based database but want to fully load it into memory and operate on it as an in-memory database from that point on. According to the specification of this API, sqlite3_deserialize() should fail with SQLITE_BUSY if the target database is currently in a read transaction or if it is involved in a backup operation. 这是一个例子:sqlite3_deserialize() 函数允许你获取一个序列化的 SQLite 数据库镜像,并将其作为内存数据库加载到数据库连接中。当你有一个基于文件的数据库,但想将其完全加载到内存中并从那时起将其作为内存数据库操作时,这非常有用。根据该 API 的规范,如果目标数据库当前处于读取事务中或正在进行备份操作,sqlite3_deserialize() 应该返回 SQLITE_BUSY 错误。

Quint works by generating traces. Traces are just a sequence of states, not SQL statements. Those traces can then be translated into whatever you want to allow execution against the system. In our case, a small C program. The Quint model automatically generated a trace where this contract was violated: Open database. Create table. Serialize database. Start reading from the database. While the read transaction is active, call sqlite3_deserialize(), expecting SQLITE_BUSY. Quint 通过生成轨迹来工作。轨迹只是一系列状态,而不是 SQL 语句。这些轨迹可以被转换成任何你想要的形式,以便在系统上执行。在我们的例子中,是一个小的 C 程序。Quint 模型自动生成了一个违反此契约的轨迹:打开数据库。创建表。序列化数据库。开始从数据库读取。在读取事务处于活动状态时,调用 sqlite3_deserialize(),预期返回 SQLITE_BUSY

In summary, the database was supposed to return busy if a read transaction was in progress during a deserialize operation. The C program was used to move the database from one state to the other according to the trace. But in this case, the database did not return SQLITE_BUSY: it crashed instead. Crashes have one big advantage: it is easy to know that this is not an issue with the model, since crashing is almost never the right behavior. This was fixed in SQLite in this commit. 总之,如果在反序列化操作期间正在进行读取事务,数据库应该返回 busy。C 程序被用来根据轨迹将数据库从一个状态移动到另一个状态。但在这种情况下,数据库没有返回 SQLITE_BUSY:它反而崩溃了。崩溃有一个很大的优点:很容易判断这不是模型的问题,因为崩溃几乎永远不是正确的行为。此问题已在 SQLite 的此提交中修复。

And many more followed

随后发现了更多问题

During the entire exploration with Quint, the following bugs were found: 在整个使用 Quint 的探索过程中,发现了以下 Bug:

  • EXISTS-to-join optimization applies LIMIT/OFFSET as if duplicate join rows were real output rows
  • EXISTS 转 join 优化错误地应用了 LIMIT/OFFSET,仿佛重复的 join 行是真实的输出行
  • Nested EXISTS-to-join rewrite loses outer correlation and filters out a valid row
  • 嵌套的 EXISTS 转 join 重写丢失了外部关联,并过滤掉了有效行
  • Quoted constraint names with embedded quotes can become effectively undroppable via DROP CONSTRAINT
  • 带有嵌入引号的带引号约束名称可能导致无法通过 DROP CONSTRAINT 删除
  • Wrong-result regression in UPDATE one-pass planning when a correlated subquery is wrapped under BETWEEN in the WHERE clause
  • 当关联子查询在 WHERE 子句中被包裹在 BETWEEN 下时,UPDATE 一次性规划出现结果错误回归
  • SQLite crashes when ALTER ADD CHECK is used with internal sqlite_ tables
  • 当 ALTER ADD CHECK 与内部 sqlite_ 表一起使用时,SQLite 崩溃
  • sqlite3_deserialize() crashes instead of returning SQLITE_BUSY during an active read transaction
  • sqlite3_deserialize() 在活动读取事务期间崩溃,而不是返回 SQLITE_BUSY
  • sqlite3_deserialize() should reject a database involved in backup, but the backup/deserialization interleaving crashes
  • sqlite3_deserialize() 应该拒绝涉及备份的数据库,但备份/反序列化的交错导致了崩溃
  • sqlite3_mutex 128-byte alignment UB
  • sqlite3_mutex 128 字节对齐未定义行为 (UB)
  • NULL pzErr crash in sqlite3changegroup_change_finish()
  • sqlite3changegroup_change_finish() 中的 NULL pzErr 崩溃
  • xfer optimization bypasses BLOB type checks via ANY, resulting in integrity_check failures
  • xfer 优化通过 ANY 绕过了 BLOB 类型检查,导致 integrity_check 失败
  • xfer optimization bypasses check constraints, resulting in an inconsistent db
  • xfer 优化绕过了检查约束,导致数据库不一致
  • sqlite3_uri_int64() returns the default for invalid present values instead of zero (a docs mismatch rather than a real bug)
  • sqlite3_uri_int64() 对无效的现有值返回默认值而不是零(这更像是一个文档不匹配,而不是真正的 Bug)

In addition to the ones above, there are also other findings that we are working with the SQLite team to validate. 除了上述问题外,还有其他发现,我们正在与 SQLite 团队合作进行验证。

Summary

总结

Formal methods… 形式化方法……