Broadening Access to Transportation Safety Data with Generative AI: A Schema-Grounded Framework for Spatial Natural Language Queries

Abstract: Transportation safety analysis requires integrating crash records, roadway attributes, and geospatial data through GIS-based workflows, but access to these workflows remains uneven across agencies and community stakeholders. Their technical prerequisites separate the analytical tools central to safety planning from many of the people who need them. Local agencies, school committees, and residents may have safety concerns but limited capacity to retrieve, filter, map, and analyze the relevant data.

Generative AI offers a way to narrow this divide, but its public-sector use raises questions about reliability, reproducibility, and governance. This paper presents a schema-grounded natural language interface for transportation safety analysis, using a large language model (LLM) to interpret user intent while preserving deterministic, reviewable execution against an authoritative database.

User queries are translated into structured semantic frames, validated by a rule-based layer, compiled into a typed directed acyclic graph (DAG) of spatial operations, and executed against a PostGIS database. This bounded design separates language interpretation from deterministic execution, keeping results reproducible and schema-grounded while lowering access barriers.
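
The stages described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the framework's actual implementation: the `Frame` fields, the `SCHEMA` whitelist, the `ALIASES` table, the buffer cap, the operation kinds, and the `geom` column are all hypothetical names chosen for the example. The sketch only renders parameterized SQL (using `%(name)s` placeholders in the style of psycopg) and does not connect to a database; the cast to `geography` assumes geometries stored in longitude/latitude.

```python
from dataclasses import dataclass, replace
from graphlib import TopologicalSorter

# Hypothetical schema whitelist and aliases; not taken from the paper.
SCHEMA = {"crashes": "crashes", "schools": "schools", "bus_stops": "bus_stops"}
ALIASES = {"crash": "crashes", "school": "schools", "bus stop": "bus_stops"}
MAX_BUFFER_M = 5000.0  # assumed upper bound on the search radius


@dataclass(frozen=True)
class Frame:
    """Semantic frame an LLM might emit for 'crashes within 300 m of schools'."""
    target: str      # layer whose features are counted
    near: str        # reference layer for the proximity filter
    buffer_m: float  # search radius in meters


def validate(frame: Frame) -> tuple[Frame, list[str]]:
    """Rule-based layer: repair what can be repaired, reject the rest."""
    fixes, fields = [], {}
    for name in ("target", "near"):
        raw = getattr(frame, name)
        value = raw.strip().lower()
        canonical = ALIASES.get(value, value)
        if canonical not in SCHEMA:
            raise ValueError(f"unknown layer: {value!r}")
        if canonical != raw:
            fixes.append(f"{name}: {raw!r} -> {canonical!r}")
        fields[name] = canonical
    buffer_m = min(max(frame.buffer_m, 0.0), MAX_BUFFER_M)
    if buffer_m != frame.buffer_m:
        fixes.append(f"buffer_m: {frame.buffer_m} -> {buffer_m}")
    return replace(frame, buffer_m=buffer_m, **fields), fixes


@dataclass(frozen=True)
class Op:
    """One typed node in the operation graph."""
    name: str
    kind: str                 # "scan", "within_distance", or "count"
    inputs: tuple[str, ...]   # names of upstream nodes
    out_type: str             # "geometry_set" or "scalar"
    arg: object = None


def compile_plan(frame: Frame) -> tuple[dict[str, Op], list[str]]:
    """Compile a validated frame into a typed DAG and a valid execution order."""
    ops = [
        Op("t", "scan", (), "geometry_set", SCHEMA[frame.target]),
        Op("r", "scan", (), "geometry_set", SCHEMA[frame.near]),
        Op("f", "within_distance", ("t", "r"), "geometry_set", frame.buffer_m),
        Op("n", "count", ("f",), "scalar"),
    ]
    plan = {op.name: op for op in ops}
    for op in ops:  # type check: every operation here consumes geometry sets
        for dep in op.inputs:
            if plan[dep].out_type != "geometry_set":
                raise TypeError(f"{op.name} cannot consume {plan[dep].out_type}")
    graph = {op.name: set(op.inputs) for op in ops}
    return plan, list(TopologicalSorter(graph).static_order())


def to_sql(plan: dict[str, Op], order: list[str]) -> tuple[str, dict]:
    """Walk the DAG in dependency order and emit parameterized PostGIS SQL."""
    exprs, params = {}, {}
    for name in order:
        op = plan[name]
        if op.kind == "scan":
            exprs[name] = op.arg  # table name comes from the whitelist only
        elif op.kind == "within_distance":
            t, r = (exprs[i] for i in op.inputs)
            params["buffer_m"] = op.arg
            exprs[name] = (
                f"{t} AS t WHERE EXISTS (SELECT 1 FROM {r} AS r WHERE "
                "ST_DWithin(t.geom::geography, r.geom::geography, %(buffer_m)s))"
            )
        elif op.kind == "count":
            exprs[name] = f"SELECT COUNT(*) FROM {exprs[op.inputs[0]]}"
    return exprs[order[-1]], params


if __name__ == "__main__":
    frame, fixes = validate(Frame("Crash", "school ", 9000.0))
    sql, params = to_sql(*compile_plan(frame))
    print(fixes)
    print(sql, params)
```

In this sketch the language model would only ever produce a `Frame`. Everything after `validate` is ordinary deterministic code, so the same validated frame always yields the same SQL, and the list of fixes gives a reviewer a record of what the rule-based layer changed.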

The framework is evaluated on a statewide Massachusetts transportation safety database that integrates crash records, roadway attributes, and geospatial layers including schools, bus stops, crosswalks, and municipal boundaries. All evaluation queries executed successfully, and the validation layer corrected errors in 29% of them, reflecting the gap between flexible natural language and strict schema requirements.

The results suggest that combining natural language accessibility with deterministic execution is a practical direction for broadening access to transportation safety data, with implications for trustworthy AI in public-sector planning.
