Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets
Abstract: Large-scale datasets are widely used to perform summarization tasks, but they may not include queries alongside documents and summaries. In the search for suitable datasets for Query-Focused Summarization (QFS), we identify two research questions: Is it possible to automatically generate evidence-based query keywords from query-free datasets? Does evidence-based query generation support the QFS task?
This paper proposes an evidence-based model to generate queries from query-free datasets. To evaluate our model intrinsically, we compare the similarity between the original queries and the system-generated queries of two QFS datasets. We also perform summarization tasks using different pre-trained models, as well as a state-of-the-art (SOTA) QFS model, to measure the extrinsic performance of our query generation approach.
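The abstract does not say how evidence is extracted. The sketch below is purely illustrative and rests on one assumption: a summary term counts as evidence when it also appears in the source document. Under that assumption, query keywords can be derived from a query-free (document, summary) pair and then compared with an original query. The heuristic, the small stopword list, and the names `generate_query_keywords` and `jaccard` are all hypothetical and are not the authors' model. Jaccard overlap is only one possible similarity measure for the intrinsic comparison.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "on", "for",
             "is", "are", "was", "were", "with", "by", "that", "this", "it", "as"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def generate_query_keywords(document: str, summary: str, k: int = 5) -> list[str]:
    """Hypothetical heuristic: a summary term is 'evidence' only if it also
    occurs in the document; rank evidence terms by combined frequency."""
    doc_counts = Counter(tokenize(document))
    sum_counts = Counter(tokenize(summary))
    evidence = {t: doc_counts[t] + sum_counts[t] for t in sum_counts if t in doc_counts}
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(evidence, key=lambda t: (-evidence[t], t))
    return ranked[:k]


def jaccard(a: list[str], b: list[str]) -> float:
    """Set overlap between two keyword lists (one possible intrinsic similarity)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


if __name__ == "__main__":
    document = ("Solar panels convert sunlight into electricity. Solar energy "
                "adoption is growing because panel costs fell.")
    summary = "Solar panel costs fell, so solar energy adoption is growing."
    generated = generate_query_keywords(document, summary, k=5)
    print(generated)  # ['solar', 'adoption', 'costs', 'energy', 'fell']
    # Compare against a toy "original" query.
    print(jaccard(generated, ["solar", "energy", "costs"]))  # 0.6
```

Note that "so" is dropped even though it survives stopword filtering: it appears in the summary but not in the document, so it has no supporting evidence under this heuristic.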
Experimental results indicate that summaries generated using evidence-based queries achieve competitive ROUGE scores compared to those generated from the original queries.
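ROUGE scores measure n-gram overlap between a generated summary and a reference summary. The function below is a simplified ROUGE-1 F1, shown only to make the metric concrete. It uses regex tokenization with no stemming, and the name `rouge1_f1` is hypothetical. It is not the evaluation script used in the paper, and published scores are normally computed with a standard ROUGE implementation.

```python
import re
from collections import Counter


def unigrams(text: str) -> Counter:
    """Lowercased alphanumeric unigram counts (no stemming or stopword removal)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 based on clipped unigram overlap."""
    cand, ref = unigrams(candidate), unigrams(reference)
    # Counter intersection keeps the minimum count of each shared token (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "solar energy adoption is growing as panel costs fall"
    # 4 of 9 reference unigrams matched with precision 1.0, so F1 = 8/13.
    print(round(rouge1_f1("solar adoption is growing", reference), 4))  # 0.6154
```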
Paper Details:
- Authors: Yllias Chali, Deen Abdullah
- arXiv ID: 2605.05392
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Submission Date: 6 May 2026