Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models

Abstract: Communicating with only a limited vocabulary is a common but challenging cognitive phenomenon, requiring an ideal communicator to plan carefully to optimize for intelligibility while working around a constrained lexicon.

In this work, we investigate how humans respond to a broad array of questions under varying vocabulary limitations, the most restrictive of which permits only 250 highly frequent words.

We provide theoretically motivated comparisons to greedy and globally optimal sampling algorithms using Sequential Monte Carlo inference with large language models.

Humans generally resemble greedy sampling more than globally optimal sampling, though more skilled humans are more likely to backtrack and revise, which is a non-greedy behavior.

An observed human pattern of leaning on semantically light words in high-constraint settings falls out of both greedy and globally optimal sampling.

We discuss the results and their broader implications for resource-rational cognition, psycholinguistics, L2 communication, and language impairments.