The PR you would have opened yourself
Published April 16, 2026
TL;DR
We provide a Skill and a test harness to help port language models from transformers to mlx-lm, so they become (almost) instantly available the moment they are added to transformers. The Skill is designed to support contributors and reviewers as an aid, not an automation. We explain why we built it, how it works, and reflect on how to contribute meaningfully to open source in the age of agents.
The advent of code agents
In 2026, code agents started to actually work. What used to be auto-completion at the side of your editor turned into a system that one-shots reasonable solutions from brief specifications. The generated code usually works out of the box, covers what you asked for, and makes reasonable assumptions about details you didn’t specify. This is great. As Jensen Huang puts it, we’ve instantly gone from 30 million to one billion coders in the world. Creative minds are unleashed.
But it forces us to rethink open source. Take the transformers library as an example. It has hundreds of contributors, is used in thousands of projects, and has been downloaded over a billion times. Suddenly, anyone with an agent can instruct it to find some open issue, fix it, and submit a PR. And that’s exactly what’s happening. Those people feel happy because they are contributing to a great library, but the sad reality is that, most of the time, they are not, and they don’t realize it.
Why not?
There are two assumptions that agent-generated PRs usually miss. Codebases like transformers care deeply about the code. It’s cool to build projects where it doesn’t matter what the code looks like, but transformers is not one of them. Being used by thousands of people, transformers is primarily built as a human-to-human communication method, through code. Model files read top to bottom, because we want practitioners to understand them without jumping through complex abstractions. This permeates throughout the library design and is the reason why, for example, we favor flat hierarchies.
Agents don’t have that context. Because design decisions are not explicit, agents suggest refactors to “improve” the codebase by following “best practices”, without realizing they are breaking implicit contracts between the library and its users. They are verbose, generalize too early, don’t notice when a change affects other areas, introduce subtle bugs, break performance. They are also sycophantic, and accept any idea as good and follow it through diligently, including ones a maintainer would have pushed back early on with a terse comment.
A small number of maintainers still have to read every PR, understand it, decide if the design direction is right, identify side effects, and write feedback. PR volume has gone up tenfold, but the number of maintainers has not (and cannot, because team coordination does not scale).
What does this have to do with MLX?
Transformers is one of the first projects to feel this pressure because of sheer volume, but the same dynamic is happening everywhere. As an example from a different domain, App Store reviewers are swamped because anyone can now build and submit an app, so many do. The same logic applies to MLX: its maintainers care deeply about the code and read every PR carefully. We wanted to see whether agents could help contributors land high-quality model ports fast, and at the same time support reviewers in their work.
Not only do we aspire to produce PRs that could have come from a careful human contributor, but we also provide additional artifacts to increase the signal: generation examples, numerical comparisons, and a separate non-agentic test harness for reproducibility.
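To make the numerical comparison concrete, here is a minimal sketch of the kind of logits parity check such a harness might run. The checkpoint id, prompt, and the choice to compare in float32 are illustrative assumptions, not the harness's actual settings.

```python
# Minimal sketch of a logits parity check between a transformers reference and
# an MLX port. Checkpoint id and prompt are illustrative placeholders.
import numpy as np
import torch
import mlx.core as mx
from transformers import AutoModelForCausalLM, AutoTokenizer
from mlx_lm import load

MODEL_ID = "allenai/OLMo-2-1124-7B"  # hypothetical example checkpoint
prompt = "The quick brown fox"

# Reference forward pass with transformers, in float32 to limit numerical noise.
hf_tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
hf_model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float32)
input_ids = hf_tokenizer(prompt, return_tensors="pt")["input_ids"]
with torch.no_grad():
    hf_logits = hf_model(input_ids).logits[0].numpy()

# Forward pass with the MLX port, reusing the exact same token ids.
mlx_model, _ = load(MODEL_ID)
mlx_out = mlx_model(mx.array(input_ids.numpy()))
mlx_logits = np.array(mlx_out[0].astype(mx.float32))  # cast before converting to numpy

# Report the worst disagreement across all positions and vocabulary entries.
print("max abs diff:", np.abs(hf_logits - mlx_logits).max())
```

A small maximum absolute difference (relative to the dtype in use) is a useful signal that the port is faithful; a large one points the reviewer at a real divergence to investigate.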
Another connection between transformers and MLX is that, most times, mlx-lm models are ported from transformers implementations. Because transformers focuses on clarity and readability, it has become the source of truth for model definitions. Downstream contributors wait until the transformers implementations are ready before they port to other frameworks. As a side effect, this is an excellent environment for an agent because it naturally limits the scope: rather than creating an implementation from scratch, the agent relies on transformers code as the source of truth. This approach supports our goal: when a model lands in transformers, it should be available on MLX shortly after.
What we did
We built a Skill that mlx-lm contributors can use to port a model from transformers to MLX. Given a prompt like “convert the olmo_hybrid architecture to MLX”, the Skill sets up a virtual environment to work in, discovers and downloads the relevant models from the Hub, reads the transformers modeling code, writes the MLX implementation, and runs a battery of tests. If results don’t look right, it debugs and iterates, and does not declare success until it’s satisfied.
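As an illustration of what one of those tests could look like, here is a sketch of a greedy-generation smoke test that runs the same prompt through both implementations and lets you compare the outputs side by side. The checkpoint id and prompt are placeholders, not what the Skill actually uses.

```python
# Sketch of a generation smoke test: greedy-decode the same prompt with the
# transformers reference and with the MLX port, then compare the outputs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from mlx_lm import load, generate

MODEL_ID = "allenai/OLMo-2-1124-7B"  # hypothetical example checkpoint
prompt = "Porting a model to MLX involves"

# Greedy generation with the transformers reference.
hf_tok = AutoTokenizer.from_pretrained(MODEL_ID)
hf_model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
inputs = hf_tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = hf_model.generate(**inputs, max_new_tokens=40, do_sample=False)
hf_text = hf_tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Greedy generation with the MLX port (mlx_lm's default sampling is temperature 0).
mlx_model, mlx_tok = load(MODEL_ID)
mlx_text = generate(mlx_model, mlx_tok, prompt=prompt, max_tokens=40)

print("transformers:", hf_text)
print("mlx-lm:     ", mlx_text)
```

Identical or near-identical continuations are a quick sanity signal; diverging text after a few tokens usually means a numerical issue worth chasing with finer-grained checks.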
We designed it to be useful to reviewers as much as to contributors. For the contributor, the Skill of course handles all the scaffolding: finding model variants on the Hub, diffing their configs to spot parameters that vary across variants, downloading checkpoints, and setting up editable installs of both mlx-lm and transformers. But it also handles the more difficult modeling tasks. It pays attention to salient architecture details and verifies sensitive areas, like RoPE configurations, that may result in hard-to-find bugs. It detects when the config doesn’t declare a dtype and infers it from the safetensors metadata header. It runs per-layer comparisons between transformers and MLX to pinpoint exactly where divergence occurs. These are the kinds of checks that only someone with porting experience would think to run. For the reviewer, the Skill produces a PR that is upfront…
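To illustrate the dtype inference mentioned above: a safetensors file begins with an 8-byte little-endian header length, followed by a JSON header that records the dtype, shape, and data offsets of every tensor, so the dominant dtype can be read without loading any weights. The snippet below is a minimal sketch of that idea, with a placeholder file name; it is not the Skill's actual code.

```python
# Sketch: recover tensor dtypes from a safetensors header without loading weights.
import json
import struct
from collections import Counter

def dominant_dtype(path: str) -> str:
    with open(path, "rb") as f:
        # First 8 bytes: little-endian length of the JSON header that follows.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" holds free-form metadata, not a tensor entry.
    counts = Counter(v["dtype"] for k, v in header.items() if k != "__metadata__")
    return counts.most_common(1)[0][0]

# Placeholder file name; prints something like "BF16" or "F16".
print(dominant_dtype("model-00001-of-00002.safetensors"))
```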