Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4
Title: Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4
Authors: Deyan Ginev, Brian Caruso, Bruce Miller, Jeff Sank, Jacob Weiskoff
Abstract: We report on the ongoing development of arXiv’s HTML Papers offering, available on every new TeX/LaTeX submission since its initial release in 2023. The main highlights from 2025 and early 2026 are: (i) community-driven improvements to HTML fidelity and service health, with roughly half of 6,000 user reports resolved; (ii) corpus-scale conversion work aimed at 90% error-free HTML (currently 75%); (iii) initial MathML 4 Intent annotations for accessible speech output; (iv) an in-progress Rust port of LaTeXML, reducing compute costs and enabling faster previews on submission. The arXiv HTML Papers project remains experimental, but is gradually maturing as we better understand the needs of arXiv’s readers and the technical opportunities presented by new standards and by advances in programming languages and AI.
Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL)
MSC classes: 68U15 (Primary); 68V25, 68U35 (Secondary)
ACM classes: I.7.2; H.3.7