When Plausible Is Not Realistic: Evaluating Human Mobility in LLM-Based Urban Simulation


LLM-based generative agents are increasingly used in urban simulators, yet it remains unclear whether they reproduce empirically realistic human mobility patterns or merely generate plausible mobility narratives.

We introduce a validation framework that evaluates the mobility of generative agents in LLM-based urban simulators against real-world mobility data. The framework compares simulated and observed behavior using mobility laws, temporal rhythms, network motifs, semantic activity transitions, and behavioral mobility profiles.

Using datasets from the Greater Paris region and Shanghai, we evaluate AgentSociety and CitySim across multiple dimensions of mobility realism. Our analysis reveals a substantial gap between narrative plausibility and empirical mobility realism.

Although the simulators capture some high-level semantic activity distributions, they struggle to reproduce core spatial and temporal constraints, including realistic trip-length distributions, origin-destination flows, dwell times, and transition dynamics.
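As an illustration of what comparing trip-length distributions can look like in practice, the following is a minimal sketch and not the framework's actual implementation. It assumes trips are given as origin and destination coordinates, measures trip length as great-circle (haversine) distance, bins lengths on hypothetical logarithmic edges, and scores the mismatch between simulated and observed histograms with the Jensen-Shannon divergence. The function names, the bin edges, and the choice of divergence are all assumptions made for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def binned_distribution(lengths_km, edges):
    """Normalised histogram of trip lengths over half-open bins [e_i, e_{i+1}).

    Trips outside the outermost edges are dropped. If no trip falls in any
    bin, a list of zeros is returned.
    """
    counts = [0] * (len(edges) - 1)
    for d in lengths_km:
        for i in range(len(edges) - 1):
            if edges[i] <= d < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Ranges from 0 (identical) to 1 (disjoint support).
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical bin edges in km (assumption: roughly logarithmic spacing).
EDGES_KM = [0.1, 1.0, 10.0, 100.0]

# Toy data for illustration only; these are not results from any dataset.
observed_km = [0.5, 0.8, 2.0, 3.5, 7.0, 12.0]
simulated_km = [15.0, 20.0, 30.0, 45.0, 60.0, 80.0]

p_obs = binned_distribution(observed_km, EDGES_KM)
p_sim = binned_distribution(simulated_km, EDGES_KM)
score = js_divergence(p_obs, p_sim)
```

A score near 0 would indicate that the simulated trip lengths fall into the same distance bands as the observed ones, while a score near 1 would indicate almost no overlap. The same histogram-and-divergence pattern could in principle be applied to dwell times, although the bins and the divergence measure would need to be chosen to suit each quantity.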

We further observe that realistic mobility diversity is unstable across default prompting configurations and may require explicit profile-aware initialization. To support reproducible evaluation, we also contribute scalable and open LLM-driven infrastructure for regional-scale map generation, observability-enhanced simulation, mobility-metric computation, and traffic simulation.

Our findings highlight the need for rigorous empirical validation of LLM-based urban simulators and provide practical tools for building more realistic and reproducible urban simulation systems.