Learning Transferable Latent User Preferences for Human-Aligned Decision Making
Abstract: Large language models (LLMs) are increasingly used as reasoning modules in many applications. While effective at certain tasks, LLMs often struggle to produce human-aligned solutions. Human-aligned decision making requires accounting for both explicitly stated goals and latent user preferences that shape how ambiguous situations should be resolved.
Existing approaches to incorporating such preferences either rely on extensive, repeated user interactions or fail to generalize latent preferences across tasks and contexts, limiting their practical applicability. We consider a setting in which an LLM performs high-level reasoning and is responsible for inferring latent user preferences from limited interactions; these inferred preferences then guide downstream decision making.
We introduce CLIPR (Conversational Learning for Inferring Preferences and Reasoning), a framework that learns actionable, transferable natural language rules that represent latent user preferences from minimal conversational input. These rules are iteratively refined through adaptive feedback and applied to both in-distribution and out-of-distribution ambiguous tasks across multiple environments.
Evaluations on three datasets and a user study show that CLIPR consistently outperforms existing methods in improving alignment and reducing inference costs.