MAI-Code-1-Flash

Today we’re introducing MAI-Code-1-Flash, a new Microsoft coding model built for fast, efficient assistance in everyday developer workflows. It is built end-to-end by Microsoft using clean and appropriately licensed data. The model is rolling out to GitHub Copilot individual users in Visual Studio Code, where it appears in the model picker and can also be selected by the default Auto picker.

Features and capabilities

Agentic coding in real developer environments: the model was trained and designed for the GitHub Copilot harness, so the two work better together.

Adaptive thinking: the model stays concise for simple requests and spends more reasoning budget on complex tasks.

Strong instruction following across single-turn and multi-turn scenarios.

MAI-Code-1-Flash is designed around a simple goal: delivering high-quality coding help with better efficiency. It outperforms Claude Haiku 4.5 across coding benchmarks, with better price-performance.

Built for developers, not benchmarks

Coding models are most useful when they perform well in the same environment developers use every day. That is why we built MAI-Code-1-Flash with production workflows at the center, rather than optimizing only for benchmarks. The model was trained directly with GitHub Copilot harnesses used in production. This allows it to learn how to interact with surrounding tools and systems in agentic coding tasks, making it uniquely well suited to real-world Copilot workflows compared to other available models.

During training, we evaluated checkpoints across core software engineering tasks, repository question answering, refactoring, and telemetry-grounded tasks adapted from real GitHub Copilot usage. This alignment between training, evaluation, and production helps offline improvements translate into real-world developer quality.

Designed to maximize value per token

MAI-Code-1-Flash was trained with adaptive solution length control, which helps the model adjust the depth of its response to the task. It can stay concise for simpler requests and spend more reasoning budget when a problem requires deeper analysis or broader code changes. In practice, this means developers start seeing useful output sooner. We see MAI-Code-1-Flash solving harder problems with up to 60% fewer tokens on SWE-Bench Verified. This helps reduce latency, lower cost, improve the value returned per token, and make interactive workflows feel smoother.
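As a rough illustration of how a token reduction could translate into cost, here is a minimal sketch. Only the 60% upper bound comes from the text above; the baseline token count and the per-million-token price are hypothetical placeholders, not published figures, and real pricing may not be a flat per-token rate.

```python
def tokens_after_reduction(baseline_tokens: int, reduction: float) -> int:
    """Tokens remaining after cutting a fraction `reduction` (0.0 to 1.0) of a baseline."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0.0 and 1.0")
    return round(baseline_tokens * (1.0 - reduction))


def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost of generating `tokens` tokens at a flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
BASELINE_TOKENS = 10_000      # assumed tokens a baseline solution needs
PRICE_PER_MILLION = 5.00      # assumed price in USD per million output tokens
MAX_REDUCTION = 0.60          # the "up to 60% fewer tokens" bound from the text

reduced_tokens = tokens_after_reduction(BASELINE_TOKENS, MAX_REDUCTION)
baseline_cost = output_cost(BASELINE_TOKENS, PRICE_PER_MILLION)
reduced_cost = output_cost(reduced_tokens, PRICE_PER_MILLION)

print(f"tokens: {BASELINE_TOKENS} -> {reduced_tokens}")
print(f"cost:   ${baseline_cost:.4f} -> ${reduced_cost:.4f}")
```

Under these assumptions the per-task output cost scales linearly with the token count, so the best-case 60% reduction is also a 60% reduction in output cost.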

Benchmark results in the production harness

To understand both quality and efficiency, we evaluated MAI-Code-1-Flash against Claude Haiku 4.5 on SWE-Bench Verified, SWE-Bench Pro, SWE-Bench Multilingual, and Terminal Bench 2 using the same production harness that developers use for their everyday coding tasks. We measured task success and the average number of solution tokens required to complete each task.
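The two measurements described above, task success and average solution tokens, can be sketched as a simple aggregation. The `TaskResult` record, its field names, and the sample values below are hypothetical and exist only for illustration; they are not the format of the actual evaluation harness.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """One benchmark task outcome (hypothetical record layout)."""
    task_id: str
    passed: bool
    solution_tokens: int


def summarize(results: list[TaskResult]) -> tuple[float, float]:
    """Return (pass rate in percent, average solution tokens per task)."""
    if not results:
        raise ValueError("results must not be empty")
    pass_rate = 100.0 * sum(r.passed for r in results) / len(results)
    avg_tokens = mean(r.solution_tokens for r in results)
    return pass_rate, avg_tokens


# Made-up sample data, not real benchmark output.
sample = [
    TaskResult("task-1", True, 1200),
    TaskResult("task-2", False, 800),
    TaskResult("task-3", True, 1000),
    TaskResult("task-4", True, 1000),
]

pass_rate, avg_tokens = summarize(sample)
print(f"pass rate: {pass_rate:.1f}%, average solution tokens: {avg_tokens}")
```

Reporting both numbers side by side is what allows quality and efficiency to be compared at once: a model can improve one without the other.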

MAI-Code-1-Flash outperforms Claude Haiku 4.5 across all core coding benchmarks tested, with higher pass rates on all four evaluations, including a 16-point lead on the diverse, real-world tasks of SWE-Bench Pro (51.2% vs. 35.2%). It’s not just smarter; it’s leaner, solving harder problems with up to 60% fewer tokens on SWE-Bench Verified, which shows that higher accuracy and greater efficiency are no longer a trade-off.
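The SWE-Bench Pro lead quoted above is a difference in percentage points, which is not the same as a relative improvement. A short sketch of both calculations, using only the two pass rates from the text (the function names are illustrative):

```python
def point_lead(rate_a: float, rate_b: float) -> float:
    """Absolute lead of rate_a over rate_b, in percentage points."""
    return rate_a - rate_b


def relative_improvement(rate_a: float, rate_b: float) -> float:
    """Lead of rate_a over rate_b, as a percentage of rate_b."""
    if rate_b == 0:
        raise ValueError("rate_b must be non-zero")
    return (rate_a - rate_b) / rate_b * 100.0


# SWE-Bench Pro pass rates quoted in the text above.
MAI_CODE_1_FLASH = 51.2
CLAUDE_HAIKU_4_5 = 35.2

lead = round(point_lead(MAI_CODE_1_FLASH, CLAUDE_HAIKU_4_5), 1)
relative = round(relative_improvement(MAI_CODE_1_FLASH, CLAUDE_HAIKU_4_5), 1)

print(f"lead: {lead} points")          # lead: 16.0 points
print(f"relative gain: {relative}%")   # relative gain: 45.5%
```

The 16-point lead therefore corresponds to roughly 45% more tasks solved relative to the baseline's pass rate.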

Math, science, instruction following, and agentic coding tasks

MAI-Code-1-Flash comes out ahead on every benchmark in the table, with the widest margin on IF Bench precise instruction following (+28.9) and the narrowest on rubric-based Advanced IF (+14.5). The strong instruction following carries over to agentic tool use. MAI-Code-1-Flash also outperforms Claude Haiku 4.5 on core reasoning capabilities in math, science, and visual generation coding.

Standard benchmarks reward memorization as much as reasoning. For example, a model that has seen the Monty Hall problem will answer it correctly, but invert the prizes and it fails. We built a 186-question, 34-category benchmark around adversarial traps such as inverted classics, impossible tasks, and underdetermined scenarios to see whether models were actually reasoning or just pattern-matching. MAI-Code-1-Flash surpasses Claude Haiku 4.5 overall and reaches 85.8% adjusted accuracy, with especially strong performance in reasoning, instruction following, and recognizing impossible problems. We also see room for the model to grow, since core adversarial categories such as Einstellung traps remained below 50% accuracy.

Try it out

MAI-Code-1-Flash is now rolling out to GitHub Copilot individual users in VS Code. No additional setup is required. As the rollout progresses, you may see GitHub Copilot route tasks to MAI-Code-1-Flash through the Auto picker, or see the model available directly in the model picker. We have also built a few fun sample apps with MAI-Code-1-Flash in VS Code.

We would love to hear from you! Please join the GitHub Community to share your feedback.

Build the Future With Us

We’re a lean, fast-moving lab made up of some of the world’s most talented minds. We have an exciting roadmap of compute at MAI, with our next-generation GB200 cluster now operational. And we have an ambitious mission we truly believe in. We’re also fortunate to partner with incredible product teams, giving our models the chance to reach billions of users and create immense positive impact. If you’re a brilliant, highly ambitious, and low-ego individual, you’ll fit right in. Come and join us as we work on our next generation of models!
