A Survey of Automated Presentation Coaching: Systems, Methods, and Open Challenges
Abstract: Automated coaching for oral presentations sits at the intersection of computer-assisted pronunciation training (CAPT), prosody modeling, and speech synthesis, yet no prior work has systematically surveyed and compared existing systems along these dimensions.
This survey reviews and categorizes automated presentation coaching systems, spanning pronunciation tutors, fluency and prosody coaches, multimodal trainers, and conference Q&A practice tools.
We introduce a five-dimensional task taxonomy (segmental pronunciation, lexical stress, suprasegmental prosody, pacing, and content faithfulness) and explicitly map the surveyed systems onto it to reveal coverage gaps.
We further review the core technical methods these systems employ: text-to-speech (TTS) exemplar generation, and diagnostic methods for assessing pronunciation, prosody, and fluency.
Key open challenges include the scarcity of annotated presentation corpora, the difficulty of giving accent-fair feedback across diverse first-language (L1) backgrounds, and the need for low-latency diagnostics during real-time rehearsal.