ZAYA1-8B Technical Report
arXiv:2605.05365 (cs) [Submitted on 6 May 2026]
Title: ZAYA1-8B Technical Report
Authors: Robert Washbourne, Rishi Iyer, Tomas Figliolia, Henry Zheng, Ryan Lorig-Roach, Sungyeon Yang, Pritish Yuvraj, Quentin Anthony, Yury Tokpanov, Xiao Yang, Ganesh Nanduru, Stephen Ebert, Praneeth Medepalli, Skyler Szot, Srivatsan Rajagopal, Alex Ong, Bhavana Mehta, Beren Millidge
Abstract: We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra’s MoE++ architecture. ZAYA1-8B’s core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform.
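For intuition on the active-vs-total parameter split, the sketch below shows a generic top-k mixture-of-experts layer in which each token activates only k of n experts, so the parameters touched per token are a small fraction of the total. This is an illustrative standard MoE routing layer, not the MoE++ architecture described in the report.

```python
# Generic top-k MoE routing sketch (NOT the report's MoE++ architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its top-k experts,
        # so "active" parameters per token are far below the "total" count.
        weights, idx = F.softmax(self.router(x), dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for j in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, j] == e  # tokens whose j-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, j, None] * self.experts[e](x[mask])
        return out
```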
With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-weight reasoning models. ZAYA1-8B was trained from scratch for reasoning, with reasoning data included from pretraining onward using an answer-preserving trimming scheme.
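As a rough illustration of what an answer-preserving trim could look like, the hypothetical sketch below shortens a reasoning trace to a token budget while leaving the final answer segment untouched; the `Answer:` delimiter, word-level token counting, and prefix-truncation rule are assumptions, not the report’s actual scheme.

```python
def trim_preserving_answer(trace: str, budget: int, answer_tag: str = "Answer:") -> str:
    """Trim the chain-of-thought portion of `trace` to roughly `budget`
    tokens (approximated as words) while keeping the final answer intact.
    The answer delimiter and truncation rule are illustrative assumptions."""
    idx = trace.rfind(answer_tag)
    if idx == -1:
        reasoning, answer = trace, ""
    else:
        reasoning, answer = trace[:idx], trace[idx:]
    # Spend the budget on the answer first; reasoning gets whatever is left.
    keep = max(0, budget - len(answer.split()))
    trimmed = " ".join(reasoning.split()[:keep])
    return (trimmed + "\n" + answer).strip() if answer else trimmed
```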
Post-training uses a four-stage RL cascade: reasoning warmup on math and puzzles; a 400-task RLVE-Gym curriculum; math and code RL with test-time compute traces and synthetic code environments built from competitive-programming references; and behavioral RL for chat and instruction following.
We also introduce Markovian RSA, a test-time compute method that recursively aggregates parallel reasoning traces while carrying forward only bounded-length reasoning tails between rounds. In test-time compute (TTC) evaluation, Markovian RSA raises ZAYA1-8B to 91.9% on AIME’25 and 89.6% on HMMT’25 with only a 4K-token carried tail, narrowing the gap to much larger reasoning models including Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High.
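The following is a minimal sketch of how such a Markovian RSA loop could be structured, assuming a generic `generate(prompt) -> str` sampler, majority-vote answer aggregation, and word-level token counting; all names are illustrative and not taken from the report.

```python
from collections import Counter
from typing import Callable, List

def keep_tail(text: str, max_tokens: int) -> str:
    """Keep only the last `max_tokens` (approximated as words) of a trace."""
    words = text.split()
    return " ".join(words[-max_tokens:])

def extract_answer(trace: str) -> str:
    """Placeholder answer extraction: take the last non-empty line."""
    lines = [ln for ln in trace.splitlines() if ln.strip()]
    return lines[-1] if lines else ""

def markovian_rsa(problem: str,
                  generate: Callable[[str], str],
                  n_parallel: int = 8,
                  n_rounds: int = 4,
                  tail_tokens: int = 4096) -> str:
    """Recursively aggregate parallel reasoning traces. Between rounds only a
    bounded-length tail of each trace is carried forward, so the state passed
    from round to round stays O(n_parallel * tail_tokens) no matter how long
    the full traces grow (the 'Markovian' property)."""
    tails: List[str] = []
    answer = ""
    for _ in range(n_rounds):
        # Each round conditions only on the previous round's bounded tails,
        # never on the full reasoning history.
        context = problem
        if tails:
            context += "\n\nPrevious reasoning tails:\n" + "\n---\n".join(tails)
        traces = [generate(context) for _ in range(n_parallel)]
        tails = [keep_tail(t, tail_tokens) for t in traces]
        # Aggregate this round's traces by majority vote over extracted answers.
        answer = Counter(extract_answer(t) for t in traces).most_common(1)[0][0]
    return answer
```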
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)