The opportunity at home – can AI drive innovation in personal assistant devices and sign language?


Advancing tech innovation and combating the data desert that exists for sign language have been areas of focus for the AI for Accessibility program. Toward those goals, in 2019 the team hosted a sign language workshop, soliciting applications from top researchers in the field.

Abraham Glasser, a Ph.D. student in Computing and Information Sciences and a native American Sign Language (ASL) signer, supervised by Professor Matt Huenerfauth, was awarded a three-year grant. His work would focus on a very pragmatic need and opportunity: driving inclusion by concentrating on and improving common interactions with home-based smart assistants for people who use sign language as a primary form of communication.

Since then, faculty and students in the Golisano College of Computing and Information Sciences at Rochester Institute of Technology (RIT) have conducted the work at the Center for Accessibility and Inclusion Research (CAIR). CAIR publishes research on computing accessibility, and its team includes many Deaf and Hard of Hearing (DHH) students who work bilingually in English and American Sign Language.

To begin this research, the team investigated how DHH users would prefer to interact with their personal assistant devices, be it a smart speaker or another type of household device that responds to spoken commands. Traditionally, these devices have used voice-based interaction, and as the technology evolved, newer models now incorporate cameras and display screens.

Currently, none of the available devices on the market understand commands in ASL or other sign languages, so introducing that capability is an important future tech development to address an untapped customer base and drive inclusion. Abraham explored simulated scenarios in which, through the camera on the device, the tech would be able to watch the signing of a user, process their request, and display the output result on the screen of the device.
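The interaction flow simulated here, watch the user's signing through the camera, process the request, and render the result on screen, can be sketched as a simple pipeline. Every function and name below is hypothetical and purely illustrative; real ASL recognition remains an open research problem, limited today by scarce data, and no such device API exists.

```python
from dataclasses import dataclass


@dataclass
class DeviceResponse:
    text: str        # caption rendered on the device screen
    on_screen: bool  # visual output, since the user may not hear audio


def recognize_signs(video_frames):
    """Placeholder recognizer: a canned gloss sequence stands in for
    the output of a (yet-to-exist) sign-language recognition model."""
    return ["WEATHER", "TODAY"]


def parse_intent(gloss):
    """Map a gloss sequence to a device intent (toy rule)."""
    if "WEATHER" in gloss:
        return "get_weather"
    return "unknown"


def execute_intent(intent):
    """Run the request and produce a displayable result (toy data)."""
    responses = {"get_weather": "Sunny, 72°F"}
    return responses.get(intent, "Sorry, I didn't understand that.")


def handle_signed_command(video_frames):
    """Camera in, on-screen caption out: watch signing, process, display."""
    gloss = recognize_signs(video_frames)
    intent = parse_intent(gloss)
    return DeviceResponse(text=execute_intent(intent), on_screen=True)
```

The point of the sketch is the shape of the pipeline, not the stages themselves: each placeholder marks a component that would need real sign-language data to build.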

Some prior research had focused on the phases of interacting with a personal assistant device, but little included DHH users. Examples of available research included studies of device activation, including the concerns of waking up a device, as well as device output modalities in the form of videos, ASL avatars, and English captions.

The call to action from a research perspective included collecting more data for sign language technologies, which remains the key bottleneck. To pave the way for technological advancements, it was critical to understand how DHH users would like the interaction with these devices to look and what types of commands they would like to issue.

Abraham and the team designed a Wizard-of-Oz videoconferencing study. A "wizard" ASL interpreter had a home personal assistant device in the room with them, joining the call without being seen on camera. The device's screen and output were viewable in the call's video window, and each participant was guided by a research moderator. As the Deaf participants signed to the personal home device, they did not know that the ASL interpreter was voicing the commands in spoken English.

A team of annotators watched the recordings, identified key segments of the videos, and transcribed each command into English and ASL gloss. Abraham was able to identify new ways that users would interact with the device, such as "wake-up" commands, which were not captured in previous research.
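One way to picture the annotators' output is as one record per video segment, pairing the English transcription with its ASL gloss. The field names and example values below are invented for illustration; they are not the team's actual annotation schema.

```python
from dataclasses import dataclass, field


@dataclass
class CommandAnnotation:
    """One annotated command segment from a session recording (illustrative)."""
    participant_id: str
    start_sec: float              # segment boundaries within the recording
    end_sec: float
    english: str                  # English transcription of the command
    asl_gloss: list = field(default_factory=list)  # gloss, written in caps
    is_wake_up: bool = False      # wake-up commands were a new finding


# A hypothetical segment: the participant signs HEY to wake the device,
# then asks about the weather.
seg = CommandAnnotation(
    participant_id="P03",
    start_sec=12.4,
    end_sec=15.1,
    english="What's the weather today?",
    asl_gloss=["HEY", "WEATHER", "TODAY"],
)
```

Keeping gloss and English side by side in each record is what makes later analyses, such as tallying command categories, straightforward.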

Screenshots of various “wake up” signs produced by participants during the study conducted remotely by researchers from the Rochester Institute of Technology. Participants were interacting with a personal assistant device, using American Sign Language (ASL) commands which were translated by an unseen ASL interpreter, and they spontaneously used a variety of ASL signs to activate the personal assistant device before giving each command. The signs here include examples labeled as: (a) HELLO, (b) HEY, (c) HI, (d) CURIOUS, (e) DO-DO, and (f) A-L-E-X-A.

Additionally, a summary of command categories and frequencies showed that the most popular category was “command and control,” where users adjust device settings, navigate through results, and answer yes/no-style questions. The next most popular category was entertainment questions, followed by lifestyle and shopping.
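Summarizing category frequencies from annotated segments is a simple tally. The category labels below mirror those named above, but the counts are made-up illustrations, not the study's data.

```python
from collections import Counter

# Hypothetical per-command category labels drawn from annotated segments.
labels = [
    "command_and_control", "entertainment", "command_and_control",
    "lifestyle", "shopping", "command_and_control", "entertainment",
]

freq = Counter(labels)
ranked = freq.most_common()  # most frequent category first
```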

Furthermore, despite signing to a device, participants made sophisticated use of the space around their bodies, for example to represent and refer to the people or things their questions were about. Another observation was the use of a question-mark sign at the beginning of yes/no questions to call the device’s attention, whereas this sign typically appears at the end of such questions.

When it came to errors, such as the device not giving the result the user was looking for, users would most commonly simply ignore the error and proceed with a different command. A close second was repeating the command with the exact same wording and signing style, followed by rewording the command, for instance rephrasing questions to be more English-like or fingerspelling words for emphasis on re-attempts.
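The recovery behaviors observed, ignoring the error and moving on, repeating verbatim, and rewording, can be captured as a small label set for coding follow-up attempts. This coding rule is an illustrative sketch, not the paper's method, and the word-overlap heuristic is a deliberately naive stand-in.

```python
from enum import Enum


class RecoveryStrategy(Enum):
    """How a participant responded after a failed device result."""
    IGNORE_AND_MOVE_ON = "issue a different command"    # most common
    REPEAT_VERBATIM = "same wording and signing style"  # close second
    REWORD = "rephrase the same request"


def code_attempt(previous_cmd: str, next_cmd: str) -> RecoveryStrategy:
    """Naive rule for coding a retry that follows an error (illustrative)."""
    if next_cmd == previous_cmd:
        return RecoveryStrategy.REPEAT_VERBATIM
    # Crude heuristic: shared words suggest a reworded version of the
    # same request rather than an entirely different command.
    if set(previous_cmd.split()) & set(next_cmd.split()):
        return RecoveryStrategy.REWORD
    return RecoveryStrategy.IGNORE_AND_MOVE_ON
```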

A paper with the full details of the research has been presented and published in the Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, entitled “Analyzing Deaf and Hard-of-Hearing Users’ Behavior, Usage, and Interaction with a Personal Assistant Device that Understands Sign-Language Input” by Abraham Glasser, Matthew Watkins, Kira Hart, Sooyeon Lee, Matt Huenerfauth.

The knowledge gained through this research then became the basis for building a video dataset of recordings of DHH people producing commands in ASL and interacting with their personal assistant devices, such as asking about the weather, controlling electronics in their home environment, and more. Using surveys and interviews to gather preferences and requirements from DHH users, videos were collected of ASL commands, leading to the production of a publ…