BODHI: Precise OS Kernel Specification Inference

Formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Writing these specifications by hand demands deep domain expertise, which motivates using large language models (LLMs) to automate the process. However, on OSV-Bench, a benchmark of 245 specification generation tasks derived from the Hyperkernel OS kernel, the best reported Pass@1 is only 55.10%.

We propose BODHI, a domain-knowledge prompting method that augments the standard few-shot prompt with a structured C-to-Python translation guide covering 15 categories of domain-specific translation patterns. Inspired by Structured Chain-of-Thought (SCoT) prompting, the guide organizes translation by separation of concerns, treating pre-condition extraction and post-condition generation as distinct categories.
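
The prompt structure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `GuideCategory`, `render_guide`, and `build_prompt`, the section headers, and all placeholder strings are assumptions introduced here, and only two of the 15 categories are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideCategory:
    """One section of the translation guide (hypothetical structure)."""
    title: str
    rules: tuple[str, ...]


def render_guide(categories: list[GuideCategory]) -> str:
    """Render the guide as numbered sections so each concern stays separate."""
    sections = []
    for index, category in enumerate(categories, start=1):
        lines = [f"## {index}. {category.title}"]
        lines += [f"- {rule}" for rule in category.rules]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


def build_prompt(guide: str, examples: list[tuple[str, str]], task_c_source: str) -> str:
    """Place the guide ahead of the usual few-shot examples and the new task."""
    parts = ["# Translation guide", guide, "# Examples"]
    for c_source, python_spec in examples:
        parts.append(
            f"C implementation:\n{c_source}\n\nPython specification:\n{python_spec}"
        )
    parts.append(
        f"# Task\nC implementation:\n{task_c_source}\n\nPython specification:"
    )
    return "\n\n".join(parts)


# Placeholder content only; the actual guide text is not reproduced here.
categories = [
    GuideCategory("Pre-condition extraction", ("<rule text goes here>",)),
    GuideCategory("Post-condition generation", ("<rule text goes here>",)),
]
prompt = build_prompt(
    render_guide(categories),
    examples=[("<example C code>", "<example Python spec>")],
    task_c_source="<C code of the system call to specify>",
)
```

Keeping the guide as a separate block in front of the examples means the few-shot portion of the prompt is unchanged, so the same guide text can be reused across models.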

Evaluated on nine models from six providers (Anthropic, Mistral, Amazon, DeepSeek, Meta, Alibaba), covering dense, mixture-of-experts and reasoning architectures, BODHI improves every model tested, with gains ranging from +11% to +32%. The best configuration (Claude Opus 4.6 + BODHI) reaches 96.73% Pass@1.

BODHI reduces both syntax and semantic errors, with the strongest effect on models whose instruction-following capability is sufficient to make use of structured reference material. These results indicate that domain knowledge injection is a model-agnostic technique that substantially narrows the gap between general-purpose code generation and formal specification synthesis.
