OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

Abstract: Large Language Models (LLMs) perform well on many language tasks, but their Theory of Mind (ToM) reasoning is still uneven in complex social settings. Existing benchmarks, including ExploreToM, do not always test the recursive beliefs and information asymmetries that make these settings difficult.

This paper presents OSCToM (Observer-Self Conflict Theory of Mind), an approach for modeling nested belief conflicts in LLM-based ToM tasks. The key case is one in which an observer’s view of another agent conflicts with the observer’s own belief state. Such cases go beyond simple perspective-taking and require recursive, multi-layered reasoning.
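
To make the notion of an observer-self conflict concrete, the following is a minimal toy sketch rather than the OSCToM generator or its domain-specific language. It assumes a classic object-relocation story, assumes that every agent present at an event sees the event and sees who else is present, and uses hypothetical names (`track_beliefs`, `observer_self_conflicts`) that do not come from the paper. A belief chain such as `("Bob", "Sally")` reads "where Bob thinks Sally thinks the object is".

```python
from itertools import permutations


def track_beliefs(agents, events, max_order=2):
    """Toy belief tracker (hypothetical helper, not part of OSCToM).

    events is a list of (location, present_agents) pairs. A chain (A, B)
    stands for "A's belief about B's belief" of the object's location.
    """
    chains = [c for n in range(1, max_order + 1) for c in permutations(agents, n)]
    beliefs = {}
    for location, present in events:
        for chain in chains:
            # Assumption: a chain is updated only when every agent in it
            # witnessed the event (and therefore saw the others witness it).
            if all(agent in present for agent in chain):
                beliefs[chain] = location
    return beliefs


def observer_self_conflicts(beliefs):
    """List cases where an observer's own belief differs from the belief
    the observer attributes to another agent."""
    conflicts = []
    for chain, attributed in beliefs.items():
        if len(chain) != 2:
            continue
        observer, other = chain
        own = beliefs.get((observer,))
        if own is not None and own != attributed:
            conflicts.append((observer, other, own, attributed))
    return sorted(conflicts)


agents = ["Sally", "Anne", "Bob"]
events = [
    ("basket", {"Sally", "Anne", "Bob"}),  # everyone sees the ball placed
    ("box", {"Anne", "Bob"}),              # Sally is away when it is moved
]
beliefs = track_beliefs(agents, events)
conflicts = observer_self_conflicts(beliefs)
for observer, other, own, attributed in conflicts:
    print(f"{observer} believes '{own}' but thinks {other} believes '{attributed}'")
```

In this toy story, Bob and Anne each hold one belief themselves while attributing a different belief to Sally. Answering questions about such cases requires keeping the two belief layers separate, which is the recursive reasoning the abstract describes.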

OSCToM combines reinforcement learning (RL), an extended domain-specific language, and compositional surrogate models to generate observer-self conflicts. In our experiments, OSCToM-8B gives the best overall result among the systems tested. It improves on the reported ExploreToM results on FANToM and remains competitive on Hi-ToM and BigToM.

On the information-asymmetric FANToM benchmark, OSCToM reaches 76% accuracy, compared with the 0.2% reported by ExploreToM. The data-synthesis procedure is also 6x more efficient. Together, these results suggest that targeted training data can help smaller models handle advanced cognitive reasoning. The project code is available online.