The text mode lie: why modern TUIs are a nightmare for accessibility

“It’s text, so it’s accessible” is a myth. There is a persistent misconception among sighted developers: if an application runs in a terminal, it is inherently accessible. The logic assumes that because there are no graphics, no complex DOM, and no WebGL canvases, the content is just raw ASCII text that a screen reader can easily parse. The reality is different. Modern Text User Interfaces (TUIs) are often more hostile to accessibility than poorly coded graphical interfaces. The very tools designed to improve the Developer Experience (DX) in the terminal, frameworks like Ink (JS/React), Bubble Tea (Go), or tcell, are actively destroying the experience for blind users.

The Architectural Flaw: Stream vs. Grid

To understand the failure, we must distinguish between two concepts often conflated under “terminal apps”: the CLI (Command Line Interface) and the TUI.

The CLI (The Stream): This operates on a standard input/output model (stdin/stdout). You type a command, the system appends the result below, and the cursor moves down. The output is linear and chronological. For a screen reader, especially a kernel-level reader like Speakup, this is ideal.

The TUI (The Grid): This treats the terminal window not as a stream of text but as a 2D grid in which every character cell acts as a pixel. It abandons temporal flow for spatial layout.
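
To make the distinction concrete, here is a minimal Python sketch of the bytes each model sends to the terminal. The function names `stream_output` and `grid_output` are hypothetical, chosen only for illustration; the one escape sequence used is the standard CUP cursor-position sequence (`ESC [ row ; col H`).

```python
ESC = "\x1b"


def stream_output(lines):
    """CLI model: append each line; the cursor only ever moves down."""
    return "".join(line + "\n" for line in lines)


def grid_output(cells):
    """TUI model: address every write by (row, col), in any order the app likes.

    `cells` is a list of (row, col, text) tuples with 1-based coordinates.
    """
    return "".join(f"{ESC}[{row};{col}H{text}" for row, col, text in cells)


# Print the raw bytes as Python literals so the terminal is left untouched.
print(repr(stream_output(["$ make", "build ok"])))
print(repr(grid_output([(24, 70, "12:00"), (1, 1, "Title")])))
```

In the stream case, reading order and arrival order are the same thing. In the grid case, the clock in the bottom-right corner is written before the title in the top-left, and nothing in the byte stream tells a screen reader which of the two matters.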

Case Study: The gemini-cli Madness

Let’s look at a concrete example: gemini-cli, a tool written in Node.js using the Ink framework. On the surface, it looks like a simple chat interface. Underneath, Ink is trying to reconcile a React component tree into a terminal grid.

When you use this tool with Speakup (Linux) or NVDA (Windows), the application doesn’t just fail; it actively spams you. Because the framework treats the screen as a reactive canvas, every update triggers a redraw. When the AI is “thinking,” the tool updates a timer or a spinner. To do this, it moves the hardware cursor to the timer location, writes the new time, and moves it back. For a sighted user, this happens instantly. A screen reader user hears this: “Responding… Time elapsed 1s… Responding… Time elapsed 2s… [Fragment of chat history]… Responding…”

It drives the screen reader mad. The cursor teleports all over the screen to update status indicators, spinners, and history, and Speakup tries to read whatever is under the cursor at that exact millisecond. You end up hearing random bits of conversation mixed with timer updates, which makes it impossible to focus on what you are actually typing.
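
The cursor round trip described above can be sketched in a few lines of Python. This illustrates the general pattern only; it is not Ink's actual renderer output, and the function names `timer_update` and `cursor_jumps` and the coordinates are hypothetical. It uses the standard DECSC/DECRC save and restore sequences (`ESC 7`, `ESC 8`) and CUP for positioning.

```python
import re

ESC = "\x1b"
SAVE_CURSOR = f"{ESC}7"      # DECSC: remember the current cursor position
RESTORE_CURSOR = f"{ESC}8"   # DECRC: jump back to the saved position
CUP = re.compile(r"\x1b\[\d+;\d+H")  # absolute cursor positioning


def timer_update(row, col, seconds):
    """Bytes for one status refresh: jump away, overwrite, jump back."""
    return f"{SAVE_CURSOR}{ESC}[{row};{col}HTime elapsed {seconds}s{RESTORE_CURSOR}"


def cursor_jumps(frames):
    """Count cursor relocations (CUP moves plus restores) across emitted frames."""
    return sum(len(CUP.findall(f)) + f.count(RESTORE_CURSOR) for f in frames)


frames = [timer_update(1, 60, s) for s in (1, 2, 3)]
print(cursor_jumps(frames))  # three refreshes, two relocations each
```

Every one of those relocations is a moment at which a cursor-tracking reader may start speaking whatever sits at the new position, even though the user typed nothing.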

Worse, let’s pretend that you’ve somehow managed with Speakup so far, but now you want to do some work with NVDA, maybe to paste an error you’re getting on Windows. So you open your terminal, SSH into your Linux box, attach to your screen session, and paste your text. The result is an immediate crash of the screen reader (NVDA) or massive system instability. Why? Every time you type a character or paste text, the application triggers a state change, and the framework decides it needs to re-render the interface. Because the conversation history is part of that state, the application attempts to redraw or recalculate the layout for thousands of lines of text at once. The more messages a conversation holds, the worse this gets. And no, you can’t avoid it with Insert+5, the key combination that is supposed to stop NVDA from announcing dynamic content changes.

The Lag Loop

Furthermore, frameworks like Ink that run in single-threaded environments (like Node.js) suffer massive performance degradation as the history grows. If you paste a large block of text, the system has to calculate the diff for thousands of lines. This causes input lag: you press a key, and you wait, sometimes up to 10 seconds for a single character to echo back. The system is too busy calculating how to redraw the screen to actually process your input.
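
A rough way to see why the lag grows with the conversation: the sketch below models a renderer that rebuilds the whole frame on every keystroke and compares it line by line with the previous one. It is a simplified Python model, not Ink's implementation; `render_frame`, `diff_lines`, and `keystroke_cost` are hypothetical names, and cost is counted in line comparisons rather than measured in time.

```python
def render_frame(history, input_line):
    """Rebuild the full frame: every history line plus the input prompt."""
    return list(history) + [f"> {input_line}"]


def diff_lines(old, new):
    """Return (changed_row_indices, comparisons_made) for two frames."""
    changed = []
    comparisons = 0
    for i in range(max(len(old), len(new))):
        comparisons += 1
        old_line = old[i] if i < len(old) else None
        new_line = new[i] if i < len(new) else None
        if old_line != new_line:
            changed.append(i)
    return changed, comparisons


def keystroke_cost(history, typed):
    """Total line comparisons needed to echo `typed` one character at a time."""
    total = 0
    frame = render_frame(history, "")
    for n in range(1, len(typed) + 1):
        next_frame = render_frame(history, typed[:n])
        _, comparisons = diff_lines(frame, next_frame)
        total += comparisons
        frame = next_frame
    return total


history = [f"msg {i}" for i in range(1000)]
print(keystroke_cost(history, "hello"))
print(keystroke_cost([], "hello"))
```

With 1,000 history lines, typing five characters costs 5,005 comparisons, each pass changing a single row; with an empty history the same input costs 5. On a single thread, that work sits between the keypress and the echo.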

Why The “Old Guard” Works (nano, vim, menuconfig)

Sighted developers often ask: “If TUIs are bad, why do you use nano, vim, or menuconfig?” The answer is not that these tools handle the cursor perfectly by default. The answer is that they allow you to hide the cursor entirely.

  1. Hiding the Cursor (nano, vim): In tools like nano or vim, usability depends on turning off features that track cursor position. If you run nano with options that show the cursor position (like --constantshow), or if you use vim without specific configuration, the experience is broken. When the cursor is visible and tracking is active, Speakup prioritizes the cursor’s location update over the character echo. Instead of hearing the letter “a” when you type it, you hear “Column 2”. You type “b”, and you hear “Column 3”. These older tools succeed because they let you disable this noise. You can configure them to suppress the visual cursor or status bar updates, forcing the screen reader to rely on the character input stream rather than the noisy coordinate updates. Modern frameworks rarely offer a “no-cursor” or “headless” mode; they assume the visual cursor is essential.

  2. Single Column Focus (menuconfig): Tools like the Linux kernel’s menuconfig work because they enforce a strict, single-column focus. Even though there are borders and titles, the active area is a vertical list, and the cursor stays pinned to it. It doesn’t jump to the bottom right to update a clock, then to the top left to update a title. The spatial complexity is kept low enough that the screen reader never gets “lost.”

  3. The Lost Art of Scrolling Regions (Irssi): Irssi is the gold standard for accessible chat, and not by luck. Irssi was built over 20 years with a custom rendering engine that uses VT100 scrolling regions. When a new message arrives, Irssi tells the terminal driver: “Define a scrolling region from line 1 to 23.” It then sends a command: “Scroll up.” The terminal moves the bits up.
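
The scrolling-region exchange described for Irssi can be sketched as the bytes sent to the terminal. This is an illustrative Python sketch, not Irssi's actual code (Irssi is written in C and goes through terminfo); the function name `append_message` is hypothetical. It uses DECSTBM (`ESC [ top ; bottom r`) to define the region and a line feed at the bottom margin to scroll it.

```python
ESC = "\x1b"


def append_message(text, top=1, bottom=23):
    """Bytes that scroll the region [top, bottom] up one line and add `text`."""
    return "".join([
        f"{ESC}[{top};{bottom}r",  # DECSTBM: define the scrolling region
        f"{ESC}[{bottom};1H",      # CUP: move to the last line of the region
        "\n",                      # line feed at the bottom margin scrolls the region up
        f"{ESC}[{bottom};1H",      # return to column 1 of the now-blank last line
        text,                      # write only the new message
    ])


# Show the raw bytes as a Python literal instead of sending them to the terminal.
print(repr(append_message("<alice> hi")))
```

The payload is a handful of control bytes plus the new line. Nothing already on screen is rewritten, and the cursor ends up on the new message, which is exactly where a cursor-tracking screen reader should be reading.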