A Survey of Automated Presentation Coaching: Systems, Methods, and Open Challenges

Abstract: Automated coaching for oral presentations sits at the intersection of computer-assisted pronunciation training (CAPT), prosody modeling, and speech synthesis, yet no prior work has systematically surveyed and compared existing systems along these dimensions.

This survey reviews and categorizes automated presentation coaching systems, spanning pronunciation tutors, fluency and prosody coaches, multimodal trainers, and conference Q&A practice tools.

We introduce a five-dimensional task taxonomy (segmental pronunciation, lexical stress, suprasegmental prosody, pacing, and content faithfulness) and map each surveyed system onto it to reveal coverage gaps.

We further review the core technical methods these systems employ: exemplar generation based on text-to-speech (TTS) synthesis, and diagnostic methods for assessing pronunciation, prosody, and fluency.

Key open challenges include the scarcity of annotated presentation corpora, the difficulty of providing accent-fair feedback across diverse first-language (L1) backgrounds, and the need for low-latency diagnostics during real-time rehearsal.
