Learning Transferable Latent User Preferences for Human-Aligned Decision Making
Abstract: Large language models (LLMs) are increasingly used as reasoning modules in many applications. While effective at certain tasks, LLMs often struggle to produce human-aligned solutions. Human-aligned decision making requires accounting for both explicitly stated goals and latent user preferences that shape how ambiguous situations should be resolved.
Existing approaches to incorporating such preferences either rely on extensive, repeated user interactions or fail to generalize latent preferences across tasks and contexts, limiting their practical applicability. We consider a setting in which an LLM performs high-level reasoning and is responsible for inferring latent user preferences from limited interactions; these inferred preferences then guide downstream decision making.
We introduce CLIPR (Conversational Learning for Inferring Preferences and Reasoning), a framework that learns actionable, transferable natural language rules that represent latent user preferences from minimal conversational input. These rules are iteratively refined through adaptive feedback and applied to both in-distribution and out-of-distribution ambiguous tasks across multiple environments.
Evaluations on three datasets and a user study show that CLIPR consistently outperforms existing methods in improving alignment and reducing inference costs.