In Defense of YAML
May 21, 2026 | Best Practices
A common refrain among developers: YAML is bad and TOML is good. This post argues otherwise, tracing the history of configuration formats, examining what YAML 1.2 actually fixed, and introducing py-yaml12, a new Rust-backed Python library for working with the modern spec.
Every programmer has opinions about configuration files. These opinions tend to be strongly held and inversely proportional to the stakes involved. In the last few years, the consensus view has shifted: YAML is bad, TOML is good, and enthusiastic YAML users may simply be uninformed. This post takes a different view. We argue for YAML on the basis of its history, its specification, and the state of its tooling in 2026.
The case against YAML was, for a long time, a reasonable one. The format attracted its critics for real reasons, through years of surprising behavior that burned even careful users. But the specification evolved, and the tooling is finally catching up. To understand why the current consensus is outdated, we need to trace the lineage of configuration formats themselves, because this sort of argument has played out before.
A brief history of configuration formats
The INI file emerged in the early 1980s alongside MS-DOS and the first versions of Windows. It was the simplest thing that could possibly work: key-value pairs grouped into sections denoted by square brackets, with semicolons for comments. INI files were flat, readable, and easy to edit by hand. For the needs of that era (configuring device drivers, specifying font paths, setting application preferences), the format was entirely adequate. Its real limitations were twofold: you could not nest deeper than one level of sections, and there was no formal specification, so every parser implemented its own dialect. But for two decades, this was fine.
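To make the one-level limit concrete, here is a small sketch using Python's standard-library `configparser`, which implements one modern INI dialect rather than the original Windows API. The file contents are invented for illustration. Sections form a single flat level, and every value comes back as a string unless the caller converts it.

```python
import configparser

# A small INI file: sections in square brackets, key = value pairs,
# and a full-line comment starting with a semicolon.
INI_TEXT = """
; display settings
[display]
width = 640
height = 480

[display.fonts]
path = fonts/system
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Sections are a flat list: "display.fonts" is just a section whose name
# happens to contain a dot, not a child of "display".
print(parser.sections())

# Every value is stored as a string; typing is left to the caller.
print(repr(parser["display"]["width"]))
print(parser.getint("display", "width"))
```

Any apparent hierarchy, such as the dotted section name above, is a naming convention that each application has to interpret for itself.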
Then came XML. In the late 1990s, the enterprise software world adopted angle brackets broadly. XML could represent arbitrary hierarchy. It had schemas, namespaces, and transformations, and it was self-describing. For a while it seemed as though the debate was settled. But in practice, XML configuration files grew large. Anyone who maintained a Java web.xml or an Ant build file in 2003 knows what it was like to edit dozens of nested elements just to change a database connection string. The verbosity made the files hard to maintain by hand, and hand maintenance is precisely what configuration files demand.
JSON appeared as the lightweight reaction. Douglas Crockford, who claims to have discovered rather than invented the format, offered the simplicity of the JavaScript object literal: curly braces, square brackets, quoted strings, and a tiny set of types. JSON displaced XML in web APIs through the late 2000s and early 2010s. But as people began using it for configuration (rather than machine-to-machine data exchange), its limitations became apparent. JSON has no comments. It has no multiline strings. Trailing commas are illegal. These are reasonable constraints for a serialization format, but they make JSON miserable for files that humans must author and maintain. According to Crockford himself, he removed comments from JSON because people were abusing them to hold parsing directives. It was the right call for data interchange, but it left a gap.
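One quick way to see these constraints is to feed a few config-style snippets to Python's standard `json` module, which is a strict parser. The snippets are made up for illustration, and other JSON libraries may offer lenient modes that accept some of them.

```python
import json


def parses(text: str) -> bool:
    """Return True if Python's standard json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Each snippet tries one convenience that configuration authors tend to want.
SAMPLES = {
    "plain": '{"port": 5432}',
    "comment": '{"port": 5432 // default Postgres port\n}',
    "trailing comma": '{"port": 5432,}',
    # A literal line break inside a JSON string is an invalid control character.
    "raw newline in string": '{"motd": "line one\nline two"}',
    # The only way to get a newline is the two-character escape \n.
    "escaped newline": '{"motd": "line one\\nline two"}',
}

for name, text in SAMPLES.items():
    print(f"{name:>22}: {'accepted' if parses(text) else 'rejected'}")
```

Only the plain object and the escaped newline are accepted. Every other convenience is a syntax error.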
YAML (2001) and TOML (2013) each arose to fill that gap, and both positioned themselves explicitly against what came before. YAML offered the full expressive power of a serialization language (including arbitrary nesting, multiple documents, references, and custom types) with a syntax built on indentation rather than brackets. TOML, created by Tom Preston-Werner a dozen years later, was a reaction to YAML’s complexity: it aimed to be a “standardized INI” with explicit typing, obvious semantics, and a formal specification. The pattern repeats in each generation: the previous format’s excess becomes the new format’s founding grievance. What is interesting about the current moment is that YAML’s problems were not inherent to the format’s design. They were artifacts of one specification version, YAML 1.1, and of the parsers that stayed frozen on it.
The case against YAML (as it was)
The criticisms of YAML are not fabricated. They reflect what real programmers experienced over many years. The most infamous problem is the Norway incident, which has become shorthand for YAML’s implicit typing behavior. In YAML 1.1, the bare (unquoted) scalar NO was interpreted as the boolean value false, so a list of country codes would silently turn Norway into false.
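The mechanism behind the Norway problem is implicit type resolution of plain (unquoted) scalars. The sketch below uses a hypothetical `resolve_plain_scalar` helper, not any real parser’s API. It checks a scalar against the boolean patterns from the YAML 1.1 type repository and the YAML 1.2 core schema. Real parsers also check null, integer, float, and other types, and individual 1.1 parsers differ in how much of the boolean list they implement.

```python
import re

# Plain-scalar boolean patterns, transcribed from the YAML 1.1 type
# repository (yaml.org/type/bool.html) and the YAML 1.2 core schema.
BOOL_PATTERNS = {
    "1.1": (
        re.compile(r"y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON"),
        re.compile(r"n|N|no|No|NO|false|False|FALSE|off|Off|OFF"),
    ),
    "1.2": (
        re.compile(r"true|True|TRUE"),
        re.compile(r"false|False|FALSE"),
    ),
}


def resolve_plain_scalar(scalar: str, version: str):
    """Hypothetical helper: return True/False if an unquoted scalar matches
    the given YAML version's boolean pattern, otherwise return it unchanged.

    Quoted scalars such as "NO" are always strings and never reach this step.
    """
    true_re, false_re = BOOL_PATTERNS[version]
    if true_re.fullmatch(scalar):
        return True
    if false_re.fullmatch(scalar):
        return False
    return scalar


countries = ["DK", "FI", "NO", "SE"]
print([resolve_plain_scalar(c, "1.1") for c in countries])
print([resolve_plain_scalar(c, "1.2") for c in countries])
```

Under the 1.1 rules, `NO` becomes `False` while its neighbors stay strings. Under the 1.2 core schema, only spellings of `true` and `false` resolve to booleans.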
The same applied to yes, on, off, y, n, and various capitalizations thereof. Ruud van Asseldonk’s widely circulated “YAML Document from Hell” catalogued these and other problems: port mappings like 22:22 parsed as sexagesimal (base-60) integers, version numbers like 10.23 parsed as floats rather than strings, date-like values parsed as timestamps, and tags beginning with ! could trigger arbitrary code execution in some parsers. This is not only a country-code problem. In data science and machine learning code, n and y are natural variable names. Under YAML 1.1’s implicit boolean rules, a parser can resolve those keys as booleans instead of strings. These were not edge cases that only the reckless encountered. They followed directly from the YAML 1.1 specification.
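The sexagesimal rule can be sketched the same way. `resolve_sexagesimal` is a hypothetical helper built on the base-60 integer pattern from the YAML 1.1 type repository. The YAML 1.2 core schema has no base-60 form, so under 1.2 every one of these mappings stays a string.

```python
import re

# YAML 1.1 base-60 integer pattern (yaml.org/type/int.html).
# The YAML 1.2 core schema has no base-60 form.
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")


def resolve_sexagesimal(scalar: str):
    """Hypothetical helper: interpret an unquoted scalar as a YAML 1.1
    base-60 integer if it matches the pattern, otherwise return it unchanged."""
    if not SEXAGESIMAL.fullmatch(scalar):
        return scalar
    sign = -1 if scalar.startswith("-") else 1
    value = 0
    for part in scalar.lstrip("+-").replace("_", "").split(":"):
        value = value * 60 + int(part)  # each colon shifts by one base-60 digit
    return sign * value


for mapping in ["22:22", "1:20:00", "80:443", "8080:80"]:
    print(mapping, "->", resolve_sexagesimal(mapping))
```

Under these rules `22:22` becomes 1342 (22 × 60 + 22), while `80:443` and `8080:80` stay strings, because every group after a colon must have at most two digits and a value below 60. That inconsistency is part of what made the behavior so hard to predict.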