Lessons from Log4Shell: Building a CRA-Ready Log4j

By: Piotr P. Karwasz, VP Logging, Apache Software Foundation

The disclosure of Log4Shell (CVE-2021-44228) on December 9, 2021 did not just expose a vulnerability: it exposed a way of building software that was no longer fit for purpose, and it helped bring the European Cyber Resilience Act into being. I recently hosted a session for the Open Regulatory Compliance community’s CRA Monday series to tell the story from the inside: what the Apache Logging team actually did in the years after Log4Shell to rebuild the project as something CRA-ready. This blog recaps and expands upon that session; you can also watch the recording or view the slides.

A Wake-Up Call for the Software Ecosystem

Log4Shell’s impact was unprecedented in scale. Apache Log4j is embedded so deeply across the software ecosystem that the vulnerability propagated almost everywhere at once, and most organizations had no idea where they were exposed. The rush to assess risk revealed a fundamental problem: few teams maintained a reliable Software Bill of Materials (SBOM), so the question “are we affected?” had no quick answer. The scramble had at least one useful side effect: it pushed many teams to finally migrate from Log4j 1, already end-of-life since 2015, to Log4j 2.

Lessons from the Log4j perspective

Since Log4j is mostly consumed as a dependency rather than built upon, the lessons the Apache Software Foundation’s Logging Services team drew from Log4Shell were different from those of the broader ecosystem. The problems were not about visibility into our own dependencies, but about the state of the project itself:

  • Documentation was hard to navigate, with many features either undocumented or described only in terms a new contributor could not act on.
  • The release process was antiquated, understood by only a handful of people, and run on personal hardware: a single point of failure that nobody had reason to address until a crisis made it unavoidable.
  • Builds were slow and tests were flaky, meaning a failure late in a multi-hour process sent you back to the beginning.

None of these were unique to Log4j. Log4Shell made them impossible to ignore, and addressing them put us on a path that anticipates much of what the CRA now asks of software maintainers.

Documentation: from maintainer knowledge to public record

Logging is not always safe. There are real security concerns: CRLF injection from unstructured logging; sensitive information leaking into debug output; and injection of Log4j formatting patterns through user-supplied strings. Before Log4Shell, much of this knowledge lived in the heads of a few maintainers: not written down, not discoverable, and not actionable for the thousands of teams depending on the library. We rewrote the documentation website from scratch. The goal was to turn that private knowledge base into a public record by:

  • Covering security best practices and an explicit security model
  • Providing reference documentation generated directly from code, so it stays in sync as the library evolves
  • Making Log4j’s versioning policy and support status explicit and visible, both required for CRA attestations
  • Moving the issue tracker from JIRA closer to the code in GitHub Issues
  • Mirroring some discussions on both GitHub Discussions and mailing lists

The results were measurable: more documentation pull requests, more site visits (a useful proxy for coverage and clarity), and noticeably better answers from LLMs trained on our new content.
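To make the first of the security concerns above concrete, here is a minimal sketch of how CRLF injection can be neutralized before an untrusted value reaches a log line. It is plain Java and deliberately does not depend on Log4j; the class `LogSanitizer` and its method name are hypothetical, invented for this illustration. In a real Log4j configuration, the pattern layout documents an `%enc{...}{CRLF}` converter that serves the same purpose at the layout level, which is usually the better place to do it.

```java
/**
 * Hypothetical helper (not part of Log4j): makes line breaks in untrusted
 * input visible so that one log event cannot masquerade as several.
 */
final class LogSanitizer {

    private LogSanitizer() {
        // Utility class, no instances.
    }

    /** Replaces CR and LF characters with the literal text "\r" and "\n". */
    static String neutralizeLineBreaks(String untrusted) {
        if (untrusted == null) {
            return "null";
        }
        StringBuilder out = new StringBuilder(untrusted.length());
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '\r':
                    out.append("\\r");
                    break;
                case '\n':
                    out.append("\\n");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

Without this kind of encoding, a user name such as `bob` followed by a line break and a forged `INFO admin logged in` entry would appear in an unstructured log file as two separate, equally credible events. With it, the forged text stays on the same line as the original event, where it is easy to spot.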

Release process: from manual to reproducible

In December 2021, Log4j’s tests ran on a Jenkins instance, binaries were built on maintainer machines, signing was manual, and builds were not reproducible. A full binary and site build took hours. This was not unusual for open source projects, but it created real risks around build integrity, and it was clearly not sustainable. By September 2024 we had migrated to GitHub Actions, achieved reproducible builds signed by a CI GPG key known only to ASF admins, parallelized tests, and reduced the build-and-deploy cycle to around 30 minutes.
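What "reproducible" buys in practice is that anyone can rebuild a release from source and check that the result is byte-for-byte identical to the published artifact. A minimal sketch of that check, comparing SHA-256 digests with the Java standard library, could look like the following. The class `ReproducibilityCheck` and its method names are hypothetical and only illustrate the idea; this is not the verification tooling the project actually uses.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: compares a published artifact with an independent rebuild. */
final class ReproducibilityCheck {

    private ReproducibilityCheck() {
        // Utility class, no instances.
    }

    /** Returns the lowercase hexadecimal SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] artifact) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(artifact);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** A rebuild reproduces a release only if both artifacts hash to the same digest. */
    static boolean isReproduced(byte[] publishedArtifact, byte[] rebuiltArtifact) {
        return sha256Hex(publishedArtifact).equals(sha256Hex(rebuiltArtifact));
    }
}
```

In real use the byte arrays would come from files, for example via `Files.readAllBytes`, with one file downloaded from the repository and the other produced by a local build of the same tag. A single differing timestamp or file ordering inside the archive is enough to make the digests diverge, which is why reproducibility has to be designed into the build rather than checked after the fact.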

Currently:

  • The CI pipeline now automatically stages releases up to the voting phase: a first among ASF projects.
  • We are working on integration with Apache Trusted Releases, which will bring automation to the voting and publishing steps as well.
  • We are working on full-SLSA build and source attestations, which will make us one of the first ASF projects to achieve this. This includes SLSA source level 4, requiring a non-author review for every commit: a critical guarantee for a project at the center of the most significant supply-chain incident in recent memory.

Machine-readable metadata: SBOMs, VEX, and beyond

One of the most concrete CRA requirements is the expectation that software comes with machine-readable security information. We now publish CycloneDX SBOMs to Maven Central; each one references a Vulnerability Disclosure Report, a machine-readable version of our CVE list, hosted on our website. This gives downstream users a complete, well-curated source of vulnerability information, unaffected by the data loss that public vulnerability databases sometimes introduce when converting between formats. It is also open to improvements by contributors.

The next step is Vulnerability Exploitability eXchange (VEX) statements, generated automatically through an open source toolset we are developing with OpenRefactory. The system combines:

  • An AI-backed Root Cause Service that identifies vulnerable methods
  • A Call Graph Service that maps per-component call graphs
  • A VEX Generation Service that determines the maximum reachable path and generates enriched VEX statements, which we call VEXplanations
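The core idea behind combining these three services can be sketched in a few lines: given the vulnerable methods and a call graph, check whether any vulnerable method is reachable from the application's entry points, and derive a VEX status from the answer. The code below is a heavily simplified illustration under stated assumptions, not the toolset itself: the class `VexSketch`, the `Statement` record, and the string-based call graph are all hypothetical, and the status and justification labels are borrowed from the OpenVEX vocabulary.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of reachability-based VEX assessment. */
final class VexSketch {

    private VexSketch() {
        // Utility class, no instances.
    }

    /** Minimal stand-in for a VEX statement; labels follow OpenVEX naming. */
    record Statement(String vulnerabilityId, String status, String justification) {}

    /** Breadth-first search over the call graph, starting at the entry points. */
    static boolean isReachable(Map<String, List<String>> callGraph,
                               Set<String> entryPoints,
                               Set<String> vulnerableMethods) {
        Deque<String> queue = new ArrayDeque<>(entryPoints);
        Set<String> visited = new HashSet<>(entryPoints);
        while (!queue.isEmpty()) {
            String method = queue.poll();
            if (vulnerableMethods.contains(method)) {
                return true;
            }
            for (String callee : callGraph.getOrDefault(method, List.of())) {
                // Set.add returns false for already visited methods, so cycles terminate.
                if (visited.add(callee)) {
                    queue.add(callee);
                }
            }
        }
        return false;
    }

    /** Maps the reachability result to a VEX status. */
    static Statement assess(String vulnerabilityId,
                            Map<String, List<String>> callGraph,
                            Set<String> entryPoints,
                            Set<String> vulnerableMethods) {
        if (isReachable(callGraph, entryPoints, vulnerableMethods)) {
            return new Statement(vulnerabilityId, "affected", "");
        }
        return new Statement(vulnerabilityId, "not_affected",
                "vulnerable_code_not_in_execute_path");
    }
}
```

A real implementation has to deal with much more than this sketch does, including reflection, dynamic dispatch, and call graphs that span several components, and it would attach the reachable path itself as evidence rather than a bare yes-or-no answer.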

We are currently testing this within Apache Solr and plan to extend it to Log4j and Commons. The goal is to give downstream users a machine-readable guarantee of no known exploitable vulnerabilities, assessed automatically rather than by hand. We are also planning to support Common Lifecycle Enumeration (ECMA-428), the machi…
