DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset

Multi-party dialogue discourse parsing aims to identify the dependency structure and the relation types that link utterances in a conversation. Previous studies are mostly limited to the textual modality or to two-party dialogue, and so do not cover settings that are both multimodal and multi-party.
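
To make the task concrete, the following is a minimal sketch of how one parsed dialogue could be represented: utterances as nodes, and each discourse link as a labeled edge from a head utterance to a dependent one. The class names, the field names, the example dialogue, and the relation labels are all illustrative assumptions and do not reflect DraDDP's actual schema or label inventory. The check that a head always precedes its dependent, and the choice to allow more than one head per utterance, are likewise assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    index: int    # position of the utterance in the dialogue
    speaker: str  # speaker identifier (hypothetical field)
    text: str     # transcript of the utterance


@dataclass(frozen=True)
class Relation:
    head: int       # index of the earlier utterance being attached to
    dependent: int  # index of the utterance that attaches to the head
    label: str      # relation type (illustrative label, not DraDDP's inventory)


def is_well_formed(utterances, relations):
    """Return True if every link points backward to an existing utterance.

    Assumption for this sketch: a dependent always follows its head.
    """
    n = len(utterances)
    return all(0 <= r.head < r.dependent < n for r in relations)


def parents(relations, dependent):
    """List the (head, label) pairs attached to one utterance."""
    return [(r.head, r.label) for r in relations if r.dependent == dependent]


# A made-up three-speaker exchange, not taken from the dataset.
dialogue = [
    Utterance(0, "A", "Did anyone see my keys?"),
    Utterance(1, "B", "They are on the kitchen table."),
    Utterance(2, "C", "I moved them there this morning."),
    Utterance(3, "A", "Thanks, both of you."),
]
links = [
    Relation(0, 1, "question-answer"),
    Relation(1, 2, "elaboration"),
    Relation(1, 3, "acknowledgement"),
    Relation(2, 3, "acknowledgement"),
]
```

Under these assumptions, a parser's job is to predict `links` given `dialogue` (and, in the multimodal setting, the aligned video), so an evaluation would compare predicted edges and labels against annotated ones.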

In this paper, we construct DraDDP, the first publicly available English multimodal dataset for multi-party dialogue discourse parsing, built from American TV dramas. DraDDP contains 495 dialogue segments with 6,374 utterances and 9.1 hours of parallel video content, covering rich multi-party interaction scenarios.

Moreover, we establish comprehensive benchmarks by evaluating this task on DraDDP and analyzing in depth the impact of different modalities. Experimental results demonstrate the value of multimodal information in capturing dialogue structures and relation types.

We will publicly release the dataset, annotation guidelines, and code to promote future research in multimodal dialogue understanding.
