ACAT: A Collaborative Platform for Efficient Aspect-Based Sentiment Dataset Annotation

Abstract: Aspect-Based Sentiment Analysis (ABSA) requires high-quality datasets to train reliable models. However, existing annotation tools export their output as flat files, leaving researchers to manually consolidate multi-annotator data, reconstruct relational structures, and compute reliability metrics with custom scripts.

This paper introduces ACAT (Aspect-based sentiment analysis Collaborative Annotation Tool), a web-based platform that natively supports four ABSA workflows: (1) Aspect-Category Sentiment Analysis, (2) Clause-Level Segmentation, (3) Aspect-Term Sentiment Analysis with character-level position tracking, and (4) Aspect Sentiment Triplet Extraction with dual span offset preservation.
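To illustrate what character-level position tracking and dual span offsets could look like, here is a minimal sketch of a triplet record that stores separate offsets for the aspect and the opinion. The class names `Span` and `Triplet`, the helper `find_span`, the half-open offset convention, and the sample review are all assumptions made for illustration; they are not taken from ACAT's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical character span: start is inclusive, end is exclusive."""
    start: int
    end: int

    def text(self, source: str) -> str:
        # Recover the annotated surface string from the stored offsets.
        return source[self.start:self.end]


@dataclass(frozen=True)
class Triplet:
    """Hypothetical (aspect, opinion, polarity) record with two spans."""
    aspect: Span
    opinion: Span
    polarity: str


def find_span(source: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase and return its offsets."""
    start = source.index(phrase)  # raises ValueError if phrase is absent
    return Span(start, start + len(phrase))


# Invented example review, not drawn from the paper's dataset.
review = "The pasta was delicious but the service was slow."
triplets = [
    Triplet(find_span(review, "pasta"), find_span(review, "delicious"), "positive"),
    Triplet(find_span(review, "service"), find_span(review, "slow"), "negative"),
]

for t in triplets:
    print(t.aspect.text(review), "|", t.opinion.text(review), "|", t.polarity)
```

Keeping both spans as offsets into the original string, rather than as copied substrings, lets an export step check that each annotation still points at the text it was drawn from.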

Its core contribution is an automated Extract, Transform, Load (ETL) pipeline that aligns collaborative annotations and computes Inter-Annotator Agreement (IAA) metrics directly at export, yielding training-ready datasets.

In a preliminary validation on 1,002 restaurant reviews with two annotators of differing expertise, ACAT achieves a median annotation time of 31.58 seconds and a raw IAA ranging from 0.78 to 0.86 across all tasks.
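The align-then-score step described above can be sketched for two annotators as follows. The abstract does not say how raw IAA is defined, so this sketch assumes it means simple percentage agreement, and it shows Cohen's kappa alongside as a common chance-corrected alternative. The function names, the review identifiers, and the labels are hypothetical and do not come from ACAT.

```python
from collections import Counter


def align(a: dict, b: dict) -> list:
    """Pair up labels for items that both annotators labeled."""
    shared = sorted(a.keys() & b.keys())
    return [(a[key], b[key]) for key in shared]


def raw_agreement(pairs: list) -> float:
    """Fraction of aligned items on which the two labels match."""
    if not pairs:
        raise ValueError("no items were labeled by both annotators")
    return sum(x == y for x, y in pairs) / len(pairs)


def cohen_kappa(pairs: list) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(pairs)
    p_observed = raw_agreement(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    # Chance agreement from each annotator's marginal label frequencies.
    p_expected = sum(left[label] * right[label] for label in left) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented labels keyed by a hypothetical review id.
annotator_a = {"r1": "positive", "r2": "negative", "r3": "neutral", "r4": "positive"}
annotator_b = {"r1": "positive", "r2": "negative", "r3": "positive",
               "r4": "positive", "r5": "negative"}

pairs = align(annotator_a, annotator_b)  # r5 is dropped: only one annotator labeled it
print(f"raw agreement: {raw_agreement(pairs):.2f}")
print(f"Cohen's kappa: {cohen_kappa(pairs):.2f}")
```

Raw agreement is easy to interpret but does not account for matches that would occur by chance, which is why the two numbers can differ noticeably on small or skewed label sets.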