"Respectful" YAML patching in Rust

Patching a YAML file programmatically is straightforward in principle: parse, modify, serialize. Ideally the process should also be respectful, that is, it should preserve the following properties of the initial file:

  • Formatting. The same YAML value can be represented in multiple ways: how mappings and lists are indented, whether blank lines separate sections, how strings are quoted, and so on. For example, a list can be represented in block style (items: followed by one "- " line per element) or in flow style (items: [1, 2, 3]). A general-purpose YAML library typically picks one canonical form when serializing and applies it to the entire document.

  • Comments. One of YAML’s ergonomic advantages is that a value can have an associated inline note explaining why it’s set the way it is. Comments are typically erased at the deserialization stage and therefore have no chance to be serialized back.

Losing either property hurts. A dropped comment effectively loses historical context. Mangled formatting can render the resulting file invalid, or wipe out a layout that was carefully chosen for a specific situation (e.g. turn an intentional flow list into a block list).
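To make the formatting property concrete, here is a minimal sketch that renders the same list in both styles. The function names are made up for illustration and don't come from any YAML library; a serializer with a single canonical style behaves as if it always called one of them, whatever the input looked like.

```rust
/// Render a list under `key` in block style: one `- item` line per element.
/// (Hypothetical helper, for illustration only.)
fn render_block(key: &str, items: &[&str]) -> String {
    let mut out = format!("{}:\n", key);
    for item in items {
        out.push_str(&format!("  - {}\n", item));
    }
    out
}

/// Render the same list in flow style: a single bracketed line.
/// (Hypothetical helper, for illustration only.)
fn render_flow(key: &str, items: &[&str]) -> String {
    format!("{}: [{}]\n", key, items.join(", "))
}
```

Both strings parse to the same value, which is exactly why a parse-modify-serialize round trip can't tell which one the author originally wrote.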

Reaching for a popular general-purpose YAML library is the obvious move, but none of them preserve both:

  • serde_yaml is no longer maintained, and the feature request was declined as out of scope long before that.
  • yaml-rust2 doesn't preserve comments; the feature request was closed with the note that the work would happen in saphyr instead.
  • saphyr is yaml-rust2's spiritual successor by the same maintainer; comment support is planned but not yet shipped.

So a more niche tool is needed.

The candidates

A search of crates.io and lib.rs for libraries that claim comment preservation turns up four candidates:

  • yamlpath + yamlpatch: comment- and format-preserving routing (yamlpath) and patch operations (yamlpatch).
  • yaml-edit: per its README, preserves formatting, comments, and whitespace.
  • rust-yaml: README has a dedicated Comment Preservation section.
  • yamp: README lists comment preservation as one of the project's design goals.

The experiment

The example below uses a simplified config for a trading bot. The assets are organized into named groups, plus a catch-all default group:

# outer comment
asset_groups:
  group_abc: # group_abc comment
    - BTC
    - ETH
    - SOL
  # group_xyz outer comment
  group_xyz:
    - DOGE # asset comment
    - PEPE
  default: # default group inner comment
    - 1INCH
    - ATOM
    - LINK

The toy CLI used here supports two operations:

  • list-assets ASSET1,ASSET2: add the listed assets to the default group, keeping it in alphabetical order.
  • delist-assets ASSET1,ASSET2: remove the listed assets from whichever group they live in. If a group goes empty, drop the group entirely.
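The CLI itself isn't shown in this post, so the following is only a sketch of how the two invocations could be parsed into a command value. The `Command` enum and `parse_command` function are hypothetical names, and the argument shape (operation name followed by one comma-separated list) is assumed from the usage lines above.

```rust
/// The two operations of the toy CLI, each carrying its parsed asset list.
/// (Hypothetical type; the real CLI's internals are not shown in the post.)
#[derive(Debug, PartialEq)]
enum Command {
    ListAssets(Vec<String>),
    DelistAssets(Vec<String>),
}

/// Parse arguments such as `["list-assets", "BTC,ETH"]` into a `Command`.
fn parse_command(args: &[&str]) -> Result<Command, String> {
    if args.len() != 2 {
        return Err("usage: <list-assets|delist-assets> ASSET1,ASSET2".to_string());
    }
    // Split on commas, tolerate stray spaces, and skip empty entries.
    let assets: Vec<String> = args[1]
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect();
    match args[0] {
        "list-assets" => Ok(Command::ListAssets(assets)),
        "delist-assets" => Ok(Command::DelistAssets(assets)),
        other => Err(format!("unknown operation: {}", other)),
    }
}
```

Keeping the parsed assets in the order given is deliberate: sorting belongs to the list-assets operation on the default group, not to argument parsing.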

Listing assets

The first test is a single list-assets invocation with four assets, picked to exercise three cases at once:

list-assets 1INCH,BTC,XRP,BNB

  • 1INCH is already in default → no-op.
  • BTC is already in group_abc → also no-op.
  • XRP and BNB are new and should land in default, alphabetically sorted alongside the existing items.
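The three cases above can be sketched on a plain in-memory model, ignoring comments and formatting entirely. This is an assumption-laden illustration of the intended semantics, not code from the toy CLI: the `Groups` alias and `list_assets` function are hypothetical, and creating a missing default group on demand is my own choice.

```rust
/// Groups in file order; a Vec of pairs (rather than a map) keeps that order.
/// (Hypothetical model, not taken from the post's CLI.)
type Groups = Vec<(String, Vec<String>)>;

/// Add each asset to `default` unless some group already holds it,
/// then keep `default` sorted.
fn list_assets(groups: &mut Groups, assets: &[&str]) {
    for &asset in assets {
        let already_listed = groups
            .iter()
            .any(|(_, members)| members.iter().any(|m| m == asset));
        if already_listed {
            continue; // no-op, whichever group holds it
        }
        // Assumption: create `default` if the config doesn't have one yet.
        let idx = match groups.iter().position(|(name, _)| name == "default") {
            Some(i) => i,
            None => {
                groups.push(("default".to_string(), Vec::new()));
                groups.len() - 1
            }
        };
        groups[idx].1.push(asset.to_string());
    }
    // Byte-wise sort of the whole group: digits sort before letters,
    // so 1INCH stays ahead of ATOM.
    if let Some((_, members)) = groups.iter_mut().find(|(name, _)| name == "default") {
        members.sort();
    }
}
```

The data-level logic is the easy part. The hard part, and the point of the comparison, is writing the result back without disturbing the comments and layout around it.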

The expected output:

# outer comment
asset_groups:
  group_abc: # group_abc comment
    - BTC
    - ETH
    - SOL
  # group_xyz outer comment
  group_xyz:
    - DOGE # asset comment
    - PEPE
  default: # default group inner comment
    - 1INCH
    - ATOM
    - BNB
    - LINK
    - XRP
  • yamlpath + yamlpatch: exact match
  • yaml-edit: outer comment dropped, "default" misindented
  • rust-yaml: multiple issues, disqualified

In rust-yaml's output, the comments are scattered (some end up at the bottom of the file, some are duplicated), 1INCH is split into two list items (- 1 and - INCH), and the deliberate whitespace and the inline comment on DOGE are both lost.

The library's own comment_preservation_demo.rs exhibits the same comment-scattering behavior when run unmodified.

  • yamp: parsing issues, disqualified

No output is shown here because some of the comments in the input confuse yamp's parser.

list-assets is the easier of the two operations since it only touches a single group and only adds. yamlpath + yamlpatch produce exactly the expected output. yaml-edit does violate both properties, but not severely enough to disqualify it on this test alone.

Delisting assets

delist-assets is the more demanding operation: any group can be modified, any asset can be removed, and groups can be removed entirely. The test:

delist-assets DOGE,PEPE,BTC,SOL,ATOM,SHIB

That covers every interesting case at once:

  • DOGE and PEPE are both members of group_xyz. Removing both should empty the group, which means the whole group_xyz group has to be removed.
  • BTC and SOL come out of group_abc, leaving it with only ETH.
  • ATOM is removed from default.
  • SHIB isn't in the file at all, so it should be a no-op.
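On a plain in-memory model the same cases reduce to two `retain` passes. As before, this is only a sketch of the intended semantics with hypothetical names (`Groups`, `delist_assets`); it says nothing about how any of the tested libraries remove nodes from the document.

```rust
/// Groups in file order; a Vec of pairs (rather than a map) keeps that order.
/// (Hypothetical model, not taken from the post's CLI.)
type Groups = Vec<(String, Vec<String>)>;

/// Remove the given assets from every group, then drop groups left empty.
/// Assets that aren't in any group are simply ignored.
fn delist_assets(groups: &mut Groups, assets: &[&str]) {
    for (_, members) in groups.iter_mut() {
        members.retain(|m| !assets.contains(&m.as_str()));
    }
    // Caveat of this sketch: a group that was already empty beforehand
    // is dropped too.
    groups.retain(|(_, members)| !members.is_empty());
}
```

Dropping a group is what makes this test demanding at the text level: the key, its items, and any comments that belong to it all have to go together.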

The expected output:

# outer comment
asset_groups:
  group_abc: # group_abc comment
    - ETH
  default: # default group inner comment
    - 1INCH
    - LINK
  • yamlpath + yamlpatch: almost, a single comment rearranged

When the now-empty group_xyz: key is removed, the standalone comment that was sitting on the line above it doesn't get removed with it. Instead it migrates onto the nearest surviving content line as an inline comment. The output is valid YAML and no comment is lost, but the comment is now attached to the wrong list item.
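For contrast, the behavior the expected output implies (the comment leaves together with its key) can be sketched as a naive line-based pass. This is not how yamlpatch works internally, and the function names are hypothetical. It also assumes space indentation, a nested block that is indented deeper than its key, and a key that appears only once in the text.

```rust
/// Number of leading spaces on a line.
fn indent(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Remove the mapping key `key` from `text`, together with its nested block
/// and any standalone comment lines directly above it at the same indentation.
/// (Hypothetical line-based sketch, not a real YAML-aware implementation.)
fn remove_key_with_leading_comments(text: &str, key: &str) -> String {
    let lines: Vec<&str> = text.lines().collect();
    let prefix = format!("{}:", key);
    let key_idx = match lines
        .iter()
        .position(|l| l.trim_start().starts_with(prefix.as_str()))
    {
        Some(i) => i,
        None => return text.to_string(), // key absent: leave the text untouched
    };
    let key_indent = indent(lines[key_idx]);

    // Walk upwards over comment-only lines at the key's indentation.
    let mut start = key_idx;
    while start > 0 {
        let above = lines[start - 1];
        if above.trim_start().starts_with('#') && indent(above) == key_indent {
            start -= 1;
        } else {
            break;
        }
    }

    // Walk downwards over the key's nested block (more deeply indented lines).
    let mut end = key_idx + 1;
    while end < lines.len()
        && !lines[end].trim().is_empty()
        && indent(lines[end]) > key_indent
    {
        end += 1;
    }

    // Keep everything outside the [start, end) range.
    let mut out = String::new();
    for (i, line) in lines.iter().enumerate() {
        if i < start || i >= end {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Whether a comment above a key "belongs" to that key is a heuristic rather than something YAML defines, which may be part of why a library would choose to keep the comment instead of deleting it.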

  • yaml-edit: logical structure changed, disqualified