I built a Python ORM with a Rust engine — here's how the GIL, PyO3, and asyncio actually cooperate

I like Tortoise ORM. Django-style models, async-first, clean. But I wanted more speed on read-heavy paths without reaching for SQLAlchemy, so I built yara-orm: a Tortoise-style async ORM where the model and query layer is Python, but the engine — connection pooling, parameter binding, and row decoding — is written in Rust (PyO3 over tokio-postgres and rusqlite).

The API is exactly what you’d expect:

import asyncio

from yara_orm import Model, YaraOrm, fields, in_transaction

class Author(Model):
    id = fields.IntField(pk=True)
    name = fields.CharField(max_length=100)

class Book(Model):
    id = fields.IntField(pk=True)
    title = fields.CharField(max_length=200)
    author = fields.ForeignKeyField("Author", related_name="books")

async def main():
    await YaraOrm.init("postgres://localhost/app")  # or sqlite://./app.db
    await YaraOrm.generate_schemas()

    ada = await Author.create(name="Ada")
    hot = await Book.filter(author__name="Ada").order_by("-id").limit(10)

    async with in_transaction():
        await Book.create(title="Atomic", author=ada)

asyncio.run(main())

But the API isn’t the interesting part — Tortoise already nailed that. The interesting part is underneath: a Rust database engine and Python’s asyncio loop have to share one interpreter, and if you get it wrong the GIL collapses the whole thing back into a single-threaded program. Here’s exactly how that works.

Two runtimes in one process

There are two schedulers running at once:

CPython’s asyncio event loop — single-threaded, on your main thread, where your async def code runs.
Tokio’s multi-threaded runtime — background worker threads that actually open sockets, send queries, and parse wire protocols.

The job is to let them cooperate so that the event loop never blocks on I/O, and the database I/O never blocks on the GIL. The GIL is the thing that makes that non-trivial.
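
Here is that cooperation in miniature, as a pure-Python analogy rather than yara-orm code. A ThreadPoolExecutor plays the part of Tokio's worker threads, time.sleep plays the part of waiting on a socket, and a second task on the event loop keeps ticking while the "query" is in flight. fake_query and ticker are made-up names for illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def fake_query(delay: float) -> List[Tuple[int, str]]:
    # Stand-in for database I/O: time.sleep releases the GIL, like a socket wait.
    time.sleep(delay)
    return [(1, "Ada"), (2, "Grace")]


async def ticker(stop: asyncio.Event) -> int:
    # Another task on the same event loop; it only progresses while the loop is free.
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def main() -> Tuple[List[Tuple[int, str]], int]:
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    tick_task = asyncio.create_task(ticker(stop))
    # The pool plays the role of Tokio's background workers.
    with ThreadPoolExecutor(max_workers=2) as pool:
        rows = await loop.run_in_executor(pool, fake_query, 0.2)
    stop.set()
    ticks = await tick_task
    return rows, ticks


rows, ticks = asyncio.run(main())
print(rows, ticks)
```

If the blocking call ran on the loop thread instead, ticker would not get a single turn until the query returned.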

How the GIL shows up in Rust

In PyO3 you can’t touch a Python object without proof that you hold the GIL. That proof is a token — Python<'py> — threaded through the API: every function that reads or creates Python objects takes one, and the 'py lifetime ties every borrowed Bound<'py, PyAny> to it. It’s a compile-time guarantee. No token, no access to the interpreter.

So the GIL boundary is explicit in the code, and only two places actually need it:

Binding parameters (Python → Rust): pulling a Python int / str / datetime / UUID out and converting it to a Rust value reads Python objects, so it holds the GIL.
Decoding rows (Rust → Python): constructing the int / str / datetime you get back creates Python objects, so it holds the GIL.

Everything between those two — acquiring a pooled connection, sending the query, waiting on the socket, parsing the wire protocol — touches no Python objects at all. So it runs with the GIL released. That’s the entire point: while Postgres is doing work and bytes are in flight, the GIL is free and other Python tasks run.
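
You can check the "GIL is free while bytes are in flight" claim without any Rust. CPython's time.sleep releases the GIL while it waits, much as a PyO3 function does once it stops touching Python objects. In this hypothetical sketch, blocking_io stands in for the Tokio side, and the main thread keeps executing Python bytecode until the worker signals that it is done. If the sleep held the GIL, the main thread would stall for the whole 0.2 seconds instead.

```python
import threading
import time


def blocking_io(done: threading.Event) -> None:
    # time.sleep releases the GIL for its whole duration, like a socket wait in Rust.
    time.sleep(0.2)
    done.set()


done = threading.Event()
worker = threading.Thread(target=blocking_io, args=(done,))
worker.start()

# Pure-Python work on the main thread; it needs the GIL to make any progress.
iterations = 0
while not done.is_set():
    iterations += 1
worker.join()
print(f"main thread ran {iterations} iterations while the worker waited")
```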

To make that safe, the data crossing into the async world has to be owned and Send. yara-orm converts each parameter into a small Rust Value enum under the GIL, then hands an owned Vec<Value> to the database layer:

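use uuid::Uuid;
// Owned, GIL-free parameter values: every variant holds owned data, so Value is Send.
#[derive(Clone)]
enum Value { Null, Int(i64), Text(String), Uuid(Uuid), /* ... */ }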

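By the time real I/O starts there isn’t a single Py<...> or Bound<...> in scope. Nothing is borrowed from the interpreter and nothing needs the GIL, so Tokio is free to move the future across worker threads. This is also why you can’t just hold a Bound<'py, PyAny> across an .await in that future. It carries the 'py lifetime of the GIL token and isn’t Send, while future_into_py requires a Send + 'static future, so the compiler rejects it. An owned Py<PyAny> is Send, but you still need the GIL to do anything with it, which is why the engine converts everything to plain Rust values up front. The architecture is partly forced by PyO3’s types, which is a feature, not a limitation.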
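
Python has no Send, but the shape of the binding step carries over. Inspect each parameter once, turn it into a small tagged value, and pass only that immutable data onward. Value, to_value, and bind_params below are hypothetical names that mirror the Rust enum above. They are not yara-orm's API. One detail worth copying into any real version: bool must be checked before int, because bool is a subclass of int in Python.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Sequence, Tuple, Union
from uuid import UUID

# Hypothetical Python mirror of the Rust Value enum: a tag plus an immutable payload.
Payload = Union[None, bool, int, float, str, bytes, UUID, datetime, Decimal]


@dataclass(frozen=True)
class Value:
    kind: str
    payload: Payload


def to_value(obj: object) -> Value:
    # The only step that inspects Python objects (the "under the GIL" part).
    if obj is None:
        return Value("null", None)
    if isinstance(obj, bool):  # must precede int: bool is a subclass of int
        return Value("bool", obj)
    if isinstance(obj, int):
        if not -(2**63) <= obj < 2**63:
            raise OverflowError("integer does not fit in i64")
        return Value("int", obj)
    if isinstance(obj, float):
        return Value("float", obj)
    if isinstance(obj, str):
        return Value("text", obj)
    if isinstance(obj, (bytes, bytearray)):
        return Value("bytes", bytes(obj))  # copy into an immutable bytes object
    if isinstance(obj, UUID):
        return Value("uuid", obj)
    if isinstance(obj, datetime):
        return Value("datetime", obj)
    if isinstance(obj, Decimal):
        return Value("decimal", obj)
    raise TypeError(f"cannot bind parameter of type {type(obj).__name__}")


def bind_params(params: Sequence[object]) -> Tuple[Value, ...]:
    # Convert everything up front, so the I/O phase receives one owned, immutable tuple.
    return tuple(to_value(p) for p in params)


bound = bind_params([42, "Ada", None, True, UUID(int=1)])
print([v.kind for v in bound])
```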

How a Rust future becomes a Python await

The model layer calls await engine.fetch_rows(sql, params). On the Rust side fetch_rows doesn’t block — it returns a Python awaitable, built with pyo3-async-runtimes:

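use pyo3::prelude::*;
use pyo3_async_runtimes::tokio::future_into_py;

// Method on the engine's #[pymethods] impl; `backend` and `to_pyerr` are defined elsewhere.
fn fetch_rows<'p>(&self, py: Python<'p>, sql: String, params: Vec<Value>) -> PyResult<Bound<'p, PyAny>> {
    let backend = self.backend.clone(); // owned handle: the future must be Send + 'static
    future_into_py(py, async move {
        // runs on a Tokio worker thread, GIL released
        backend.fetch_all_values(&sql, &params).await.map_err(to_pyerr)
    })
}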

future_into_py does three things:

Creates a Python asyncio.Future bound to the currently running event loop — which is why this has to be called from inside a running loop.
Spawns the Rust async move { ... } onto the Tokio runtime, which lives on its own background threads, completely separate from the asyncio loop thread.
When the Rust future finishes on a Tokio worker thread, it schedules the result back onto the asyncio loop with loop.call_soon_threadsafe(...). That is the method asyncio prescribes for scheduling a callback on a loop from another OS thread.
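
Those three steps can be mimicked in pure Python, which makes the hand-off easy to poke at. future_from_thread below is a hypothetical helper. It is not part of pyo3-async-runtimes or yara-orm. A plain threading.Thread stands in for the Tokio task, and only the resolve callback, scheduled with loop.call_soon_threadsafe, ever touches the future. The fut.done() guard covers a coroutine that was cancelled before the worker finished. I'm not claiming this mirrors every detail of the real crate, such as how it propagates cancellation to the Rust side.

```python
import asyncio
import threading
import time
from typing import Any, Callable, Optional


def future_from_thread(fn: Callable[..., Any], *args: Any) -> "asyncio.Future[Any]":
    # Step 1: create a future bound to the loop that is running right now.
    loop = asyncio.get_running_loop()  # raises RuntimeError if no loop is running
    fut = loop.create_future()

    def resolve(result: Any, exc: Optional[BaseException]) -> None:
        # Runs on the loop thread. The awaiting task may have been cancelled meanwhile.
        if fut.done():
            return
        if exc is not None:
            fut.set_exception(exc)
        else:
            fut.set_result(result)

    def worker() -> None:
        # Step 2: do the work off the loop thread (a Tokio task, in yara-orm's case).
        try:
            result, exc = fn(*args), None
        except Exception as e:  # forward failures to the awaiting coroutine
            result, exc = None, e
        # Step 3: hand the outcome back; `fut` is never touched from this thread.
        loop.call_soon_threadsafe(resolve, result, exc)

    threading.Thread(target=worker, daemon=True).start()
    return fut


def slow_fetch(sql: str) -> list:
    time.sleep(0.05)  # stand-in for network I/O
    return [(1, "Atomic")]


def failing_fetch() -> list:
    raise ValueError("boom")


async def main() -> tuple:
    rows = await future_from_thread(slow_fetch, "SELECT id, title FROM book")
    try:
        await future_from_thread(failing_fetch)
        error = ""
    except ValueError as e:
        error = str(e)
    return rows, error


rows, error_message = asyncio.run(main())
```

Calling future_from_thread outside a running loop fails at step 1, which is the same constraint the real future_into_py has.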

From Python it’s an ordinary await: the coroutine suspends, the event loop keeps serving other tasks, and when the Tokio side resolves the future the loop wakes the coroutine with the rows. The decode step (Rust Value → Python objects) re-acquires the GIL only for as long as it takes to build the result objects, then releases it again. So the two runtimes never block each other: the asyncio thread is never blocked on I/O, and the database I/O never holds the GIL. The GIL is held only during the cheap conversion at each end.

That last sentence is the whole performance story, and it tells you exactly where to optimize. The query builder runs once per query; the decoder runs once per row. A SELECT returning 5,000 rows runs your row-hydration code 5,000 times — that loop is where the time goes, on every ORM. So that’s where the effort went:

uuid.UUID and decimal.Decimal type objects are imported once per interpreter, not re-resolved per cell (UUID primary keys show up on basically every query).
Postgres decoding dispatches on the column’s type OID via a jump table, instead of walking a 16-deep chain of type comparisons per cell.
SQLite upper-cases each column’s declared type once per result set instead of per cell, and binds parameters by move rather than copying them twice.
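
All three changes follow one pattern: move per-type work out of the per-cell path. Here is a minimal Python sketch of that pattern, with a dict standing in for the jump table. The converters are resolved once per result set from the column type OIDs, so the row loop only calls functions. The OIDs are Postgres's built-in pg_type values (for example 23 is int4, 25 is text, 2950 is uuid). The text-format cells and the make_decoder helper are assumptions for illustration, not how yara-orm's Rust decoder is written. The same shape covers the SQLite case: upper-case each declared type once while building converters, never inside decode.

```python
from decimal import Decimal
from typing import Any, Callable, Dict, Optional, Sequence, Tuple
from uuid import UUID

# Jump table: Postgres type OID -> converter for a text-format cell.
CONVERTERS: Dict[int, Callable[[str], Any]] = {
    16: lambda s: s == "t",  # bool
    20: int,                 # int8
    21: int,                 # int2
    23: int,                 # int4
    25: str,                 # text
    701: float,              # float8
    1043: str,               # varchar
    1700: Decimal,           # numeric
    2950: UUID,              # uuid
}


def make_decoder(
    column_oids: Sequence[int],
) -> Callable[[Sequence[Optional[str]]], Tuple[Any, ...]]:
    # Resolved once per result set: one lookup per column, not one per cell.
    converters = [CONVERTERS.get(oid, str) for oid in column_oids]

    def decode(row: Sequence[Optional[str]]) -> Tuple[Any, ...]:
        # Per-row hot path: no type dispatch left, just call each column's converter.
        return tuple(
            None if cell is None else conv(cell)
            for conv, cell in zip(converters, row)
        )

    return decode


decode = make_decoder([23, 25, 2950])
raw_rows = [
    ("1", "Atomic", "00000000-0000-0000-0000-000000000001"),
    ("2", None, "00000000-0000-0000-0000-000000000002"),
]
decoded = [decode(r) for r in raw_rows]
```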

None of these are glamorous. All of them compound across rows — and crucially, all of them are inside the…
