My Accessibility Stack and the future on Wayland



This is an article quite some time in the making. I’ve written 3 or 4 drafts of it over the last 4 months, looking for just the right thing to say. After I wrote the initial draft of this one, it sat in the drafts for another 2 months before I really finished it. In the end, I’ve decided that being straightforward is the best way to go, so here it is: As the Linux Desktop transitions to a Wayland-only future, I will be locked out of my computer, as the accessibility software I rely on is left behind.


The desktop environment I use, KDE Plasma, has announced that in early 2027, X11 support will be removed from the system. That means in roughly 9 months, I will no longer be welcome on that desktop environment, forced either to cling to an older version or to switch to a more niche environment that still supports X11. Why? The Wayland desktop has been making great strides in accessibility recently, right? Well, there’s a subset of accessibility that absolutely no one is talking about, and that’s what I’m here to fix today: input devices.


Most of the discussion about accessibility concerns output, for users who have limited vision or are blind. The series of articles written by Fireborn last year discusses the myriad problems of using the Linux desktop while blind, and likewise GNOME has been devoting its attention to, for instance, supporting AccessKit in its applications, which helps screen readers such as Orca render their contents via text-to-speech. But accessibility cuts both ways, and difficulty conveying input to a system is an equally valid accessibility concern.


Such is the case for me. Last year, after a gradual but steady period of decline, I was diagnosed with Ehlers-Danlos Syndrome, a musculoskeletal genetic defect which wreaks all sorts of havoc on your body. An earlier draft of this article included my whole, nasty journey of getting diagnosed and treated for such a rare (and often misdiagnosed) condition, but it’s really not that important. What matters is what it did to me in particular: it destroyed all those important little muscles in the wrist that let you flex your fingers, say, in order to use a keyboard or work a mouse.


Thanks to months of intensive physical therapy with a specialist in hypermobility disorders, I’ve regained partial use of my hands; depending on the day, I can get through maybe a few hours of typing with a specialized keyboard. (As an aside, the fact that I’ve regained ANY use is something of a miracle, largely owing to living close to specialists, a liberal medical leave policy in Massachusetts, and my undeniable position of privilege within the horrifically messed up U.S. healthcare system. There are many worlds in which I wind up permanently disabled and never even find out why.)


But regrowing a large set of small muscles that all withered away is terribly slow, very painful, and probably imperfect; I may never truly regain full use. The progress I’ve made doesn’t get me through a full workday, let alone weeks on end; I need another way to get through my life and career. Enter Talon Voice.


Talon Voice

With possibly one of the most understated landing pages in existence, Talon doesn’t do much at first glance to convey that it’s probably the most powerful hands-free input system ever created. At its core, Talon is a deeply, thoughtfully crafted combination of a hyper-fast, accurate speech-to-text ML model, a bespoke scripting language, and Python, all working in concert to give users nearly infinite extensibility. With it, they can craft their own hands-free means of communicating with their applications, either working with the application… or against it. (That notion of “adversarial accessibility” is something that comes up quite frequently!)


The community set of scripts is the first thing to install to make Talon useful, and boy, is it a whopper. There are tens of thousands of lines of code in here, all carefully hand-written by individuals to meet their specific needs and gathered into a whole so that others can benefit from their work. With Talon, I can do things like:

- Focus applications, saving me the mouse movement and clicks necessary to select them from the taskbar.
- Write text, using Dictation Mode (most of this article was written with Talon).
- Interact with my browser completely hands-free, using the Rango browser extension (which happens to be faster than moving around with a mouse anyhow).
- Write my own script to call out over D-Bus to an external speech-to-text program for times when I’m writing longer prose (Whisper-v3-large is a truly remarkable model, understanding things like proper nouns it’s never heard of before, though it isn’t exactly fast).
- Make a hissing noise to scroll, a motion that is persistently painful for me regardless of what input device I try (though in the future, I may try to integrate a foot pedal with Talon).
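The D-Bus hand-off mentioned above can be sketched roughly as follows. This is not the script used for this article, only a minimal illustration that shells out to the standard `gdbus` command-line tool from Python. The service name, object path, and method name are hypothetical placeholders, and wiring the function into a Talon action is left out because it depends on Talon’s own Python API.

```python
import shutil
import subprocess

# Hypothetical names: placeholders for whatever external speech-to-text
# service is actually listening on the session bus.
SERVICE = "org.example.Transcriber"
OBJECT_PATH = "/org/example/Transcriber"
METHOD = "org.example.Transcriber.StartDictation"


def build_gdbus_call(service, object_path, method, *args):
    """Return the argv for a `gdbus call` on the session bus."""
    return [
        "gdbus", "call", "--session",
        "--dest", service,
        "--object-path", object_path,
        "--method", method,
        *args,
    ]


def call_transcriber(timeout_s=60.0):
    """Invoke the (hypothetical) service and return gdbus's raw reply text.

    Returns None when the gdbus tool is not installed.
    """
    if shutil.which("gdbus") is None:
        return None
    argv = build_gdbus_call(SERVICE, OBJECT_PATH, METHOD)
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout_s, check=True
    )
    return result.stdout.strip()
```

Keeping the command construction separate from the subprocess call makes the sketch easy to check without a running service; a real script would more likely use a D-Bus binding and parse the typed reply rather than raw text.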


The list goes on and on and on, but I’ve saved the two best extensions I’ll bring up here for last.

gaze_ocr

gaze_ocr is an unbelievably cool extension that lets you control your screen using OCR. Using an OCR backend (one is not provided on Linux, but I was able to plug in RapidOCR), gaze_ocr reads the contents of your screen, allowing you to click directly on any object. Using an eye tracker, it will even disambiguate the text on the screen depending on what you’re literally looking at. I really cannot do this justice without a video, so I strongly encourage you to watch the sixty-second intro here: https://youtu.be/qkFy66WF3bU. Suffice it to say that this alone makes life so cool. There is zero integration required on the part of any of the applications involved, yet I can interact with them anyway. Adversarial accessibility at its finest! I feel like I’m in a sci-fi movie whenever I use this package. But it isn’t even the most powerful extension I use…

Cursorless

Man, this is…
