DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA
Abstract: Diagram question answering (Diagram QA) requires reasoning-level attribution that links each question-answer pair to all visual regions needed to derive the answer, rather than only the region containing the final response.
Creating such structured evidence across diagrams, charts, maps, circuits, and infographics is time-consuming, and existing annotation tools tightly couple their interfaces to dataset-specific formats.
We present DIAGRAMS, a lightweight, schema-driven review framework that decouples interface logic from dataset-specific JSON structures through an internal meta-schema and dataset adapters.
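As a rough illustration of the adapter idea (the class and field names below are hypothetical, not the actual DIAGRAMS API or meta-schema), a dataset adapter maps a dataset-specific JSON record into a shared internal record that the interface renders:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical internal meta-schema records; field names are illustrative,
# not the released DIAGRAMS schema.
@dataclass
class Region:
    region_id: str
    bbox: List[float]            # [x_min, y_min, x_max, y_max] in pixels
    label: Optional[str] = None

@dataclass
class ReviewItem:
    image_path: str
    question: str
    answer: str
    candidate_regions: List[Region] = field(default_factory=list)
    evidence_region_ids: List[str] = field(default_factory=list)

def adapt_example(raw: dict) -> ReviewItem:
    """Sketch of a dataset adapter: convert one dataset-specific JSON
    record into the internal meta-schema the review interface consumes."""
    regions = [
        Region(region_id=str(i), bbox=r["box"], label=r.get("name"))
        for i, r in enumerate(raw.get("regions", []))
    ]
    return ReviewItem(
        image_path=raw["image"],
        question=raw["question"],
        answer=raw["answer"],
        candidate_regions=regions,
    )
```

Supporting a new dataset then amounts to writing one such adapter; the review interface itself stays unchanged.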
Given an image and a QA pair, optionally with candidate regions, the system performs QA-conditioned evidence selection and proposes the regions required for reasoning.
When QA pairs or candidate regions are missing, it generates them and supports human verification and refinement.
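A minimal sketch of this review-first loop, continuing the hypothetical types above; the three callables stand in for the model backend and the review UI and are assumptions, not part of the released package:

```python
from typing import Callable, List

def review_example(
    item: ReviewItem,
    propose_evidence: Callable[[ReviewItem], List[str]],
    generate_candidates: Callable[[str], List[Region]],
    reviewer_confirm: Callable[[List[str], ReviewItem], List[str]],
) -> ReviewItem:
    """Sketch of the review-first loop: propose evidence conditioned on the
    QA pair, then let a human reviewer accept, drop, or add regions."""
    # Fall back to model-generated candidate regions when the dataset
    # provides none.
    if not item.candidate_regions:
        item.candidate_regions = generate_candidates(item.image_path)

    # QA-conditioned evidence selection: pick the subset of regions needed
    # to derive the answer, not just the region holding the answer text.
    suggested_ids = propose_evidence(item)

    # Human verification and refinement before the attribution is saved.
    item.evidence_region_ids = reviewer_confirm(suggested_ids, item)
    return item
```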
Across six Diagram QA datasets, model-suggested evidence achieves 85.39% precision and 75.30% recall against reviewer-final selections (micro-averaged).
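For reference, micro-averaging here pools counts over all QA pairs before dividing; a small sketch, assuming suggested and reviewer-final evidence sets are matched by region id:

```python
from typing import Iterable, List, Tuple

def micro_precision_recall(
    pairs: Iterable[Tuple[List[str], List[str]]]
) -> Tuple[float, float]:
    """Micro-averaged precision/recall: sum true positives, suggested
    counts, and reviewer-final counts over all QA pairs, then divide once."""
    tp = n_suggested = n_final = 0
    for suggested, final in pairs:
        tp += len(set(suggested) & set(final))
        n_suggested += len(set(suggested))
        n_final += len(set(final))
    precision = tp / n_suggested if n_suggested else 0.0
    recall = tp / n_final if n_final else 0.0
    return precision, recall
```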
These results indicate that the review-first framework reduces manual region creation while maintaining high agreement with final reasoning-level attributions.
We release a public demo and installable package to support dataset auditing, grounded supervision creation, and grounded evaluation.