zai-org / GLM-5

GLM-5.2 & GLM-5.1 & GLM-5 👋 Join our Wechat or Discord community. 📖 Check out the GLM-5.2 blog and GLM-5 Technical report. 📍 Use GLM-5.2 API services on Z.ai API Platform. 🔜 Try GLM-5.2 at z.ai. GLM-5.2 & GLM-5.1 & GLM-5 👋 加入我们的微信或 Discord 社区。📖 查看 GLM-5.2 博客和 GLM-5 技术报告。📍 在 Z.ai API 平台使用 GLM-5.2 API 服务。🔜 前往 z.ai 体验 GLM-5.2。

Introduction

简介

GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2 是我们最新的长周期任务旗舰模型。它在长周期任务能力上较前代 GLM-5.1 实现了质的飞跃，并首次在稳健的 100 万 token 上下文中实现了这一能力。

GLM-5.2’s new capabilities include: GLM-5.2 的新功能包括：

Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
稳健的 100 万上下文： 提供稳健的 100 万 token 上下文，可稳定支持长周期任务。
Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
灵活的进阶编程： 更强的编程能力，支持多种思维努力程度（thinking effort levels），以平衡性能与延迟。
Improved Architecture: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2’s MTP layer for speculative decoding, increasing the acceptance length by up to 20%
架构改进： 我们提出了 IndexShare，在每四个稀疏注意力层中复用同一个索引器，在 100 万上下文长度下将每个 token 的 FLOPs 降低了 2.9 倍。我们还改进了 GLM-5.2 用于投机采样的 MTP 层，使接受长度提升了高达 20%。

On standard coding benchmarks, GLM-5.2 is the strongest open-source model, improving on GLM-5.1 by a wide margin: 81.0 vs. 62.0 on Terminal-Bench 2.1 and 62.1 vs. 58.4 on SWE-bench Pro. It also closes much of the gap to the closed-source frontier — on Terminal-Bench 2.1 (81.0) it lands within a few points of Claude Opus 4.8 (85.0) — while staying ahead of Gemini 3.1 Pro. For more detail, check our blog. 在标准编程基准测试中，GLM-5.2 是目前最强的开源模型，较 GLM-5.1 有显著提升：在 Terminal-Bench 2.1 上得分为 81.0（对比 62.0），在 SWE-bench Pro 上得分为 62.1（对比 58.4）。它也大幅缩小了与闭源前沿模型的差距——在 Terminal-Bench 2.1 上（81.0）与 Claude Opus 4.8（85.0）仅有几分之差，同时领先于 Gemini 3.1 Pro。更多详情请查看我们的博客。

GLM-5.1

GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks). GLM-5.1 是我们面向智能体工程的下一代旗舰模型，其编程能力较前代有显著增强。它在 SWE-Bench Pro 上达到了业界领先水平，并在 NL2Repo（代码库生成）和 Terminal-Bench 2.0（真实终端任务）上大幅领先于 GLM-5。

But the most meaningful leap goes beyond first-pass performance. Previous models—including GLM-5—tend to exhaust their repertoire early: they apply familiar techniques for quick initial gains, then plateau. Giving them more time doesn’t help. GLM-5.1, by contrast, is built to stay effective on agentic tasks over much longer horizons. We’ve found that the model handles ambiguous problems with better judgment and stays productive over longer sessions. It breaks complex problems down, runs experiments, reads results, and identifies blockers with real precision. By revisiting its reasoning and revising its strategy through repeated iteration, GLM-5.1 sustains optimization over hundreds of rounds and thousands of tool calls. The longer it runs, the better the result. 但最重大的飞跃不仅在于首轮表现。之前的模型（包括 GLM-5）往往会过早耗尽其“技能库”：它们使用熟悉的技巧快速获得初步成果，随后便陷入瓶颈，给予更多时间也无济于事。相比之下，GLM-5.1 旨在在更长的周期内保持智能体任务的有效性。我们发现该模型能以更好的判断力处理模糊问题，并在更长的会话中保持高效。它能拆解复杂问题、运行实验、读取结果并精准识别阻碍。通过反复迭代回顾推理过程并修正策略，GLM-5.1 能够在数百轮对话和数千次工具调用中持续优化。运行时间越长，结果越好。

GLM-5

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity. 我们推出了 GLM-5，旨在解决复杂系统工程和长周期智能体任务。规模化仍然是提升通用人工智能（AGI）智能效率的最重要途径之一。与 GLM-4.5 相比，GLM-5 的参数规模从 355B（激活 32B）扩展至 744B（激活 40B），预训练数据从 23T 增加到 28.5T token。GLM-5 还集成了 DeepSeek 稀疏注意力机制（DSA），在保持长上下文能力的同时大幅降低了部署成本。

Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the RL training inefficiency. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. 强化学习旨在弥合预训练模型在“胜任”与“卓越”之间的差距。然而，由于强化学习训练效率低下，在大规模 LLM 上部署它是一项挑战。为此，我们开发了 slime，这是一种新型异步强化学习基础设施，显著提升了训练吞吐量和效率，从而实现了更细粒度的后训练迭代。

With advances in both pre-training and post-training, GLM-5 delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models. 凭借预训练和后训练的双重进步，GLM-5 在广泛的学术基准测试中较 GLM-4.7 实现了显著提升，并在推理、编程和智能体任务方面达到了全球开源模型中的顶尖水平，进一步缩小了与前沿模型的差距。

GLM-5 is purpose-built for complex systems engineering and long-horizon agentic tasks. On our internal evaluation suite CC-Bench-V2, GLM-5 significantly outperforms GLM-4.7 across frontend, backend, and long-horizon tasks, narrowing the gap to Claude Opus 4.5. On Vending Bench 2, a benchmark that measures long-term operational capability, GLM-5 ranks #1 among open-source models. Vending Bench 2 requires the model to run a simulated vending machine business over a one-year horizon; GLM-5 finishes with a final account balance of $4,432, approaching Claude Opus 4.5 and demonstrating strong long-term planning and resource management. GLM-5 专为复杂系统工程和长周期智能体任务而设计。在我们内部评估套件 CC-Bench-V2 上，GLM-5 在前端、后端和长周期任务中均显著优于 GLM-4.7，缩小了与 Claude Opus 4.5 的差距。在衡量长期运营能力的基准测试 Vending Bench 2 中，GLM-5 在开源模型中排名第一。Vending Bench 2 要求模型模拟运营一家自动售货机业务一年；GLM-5 最终账户余额达到 4,432 美元，接近 Claude Opus 4.5，展现了强大的长期规划和资源管理能力。

Download Model

模型下载

Model	Hugging Face	ModelScope	Size	Precision
GLM-5.2	🤗 Hugging Face	🤖 ModelScope	744B-A40B	BF16
GLM-5.2-FP8	🤗 Hugging Face	🤖 ModelScope	744B-A40B	FP8
GLM-5.1	🤗 Hugging Face	🤖 ModelScope	744B-A40B	BF16
GLM-5.1-FP8	🤗 Hugging Face	🤖 ModelScope	744B-A40B	FP8
GLM-5	🤗 Hugging Face	🤖 ModelScope	744B-A40B	BF16
GLM-5-FP8	🤗 Hugging Face	🤖 ModelScope	744B-A40B	FP8

Serve GLM-5 Series Locally

本地部署 GLM-5 系列

GLM-5.2 supports deployment with the following frameworks. Feel free to try them out: GLM-5.2 支持通过以下框架进行部署，欢迎尝试：

SGLang (v0.5.13.post1+) — see cookbook
vLLM (v0.23.0+) — see recipes
Transformers (v0.5.12+) — see transformers docs
KTransformers (v0.5.12+) — see tutorial

For deployment on the Ascend NPU platform, inference frameworks such as vLLM-Ascend, xLLM and SGLang are supported — see here. 如需在昇腾 NPU 平台上部署，支持 vLLM-Ascend、xLLM 和 SGLang 等推理框架——详情见此处。

GLM-5 supports controlling the thinking budget through the reasoning_effort parameter, which accepts two levels: max and high. max is the default — if reasoning_effort is left unset (or set to any value other than high), the model runs at Max. To use the High level, you must explicitly pass reasoning_effort="high". For default scenarios such as benchmark/leaderboard reproduction, keep Max (no setting required); only set reasoning_effort="high" when you specifically want the High level. Thinking can be turned off entirely by setting enable_thinking=false. GLM-5 支持通过 reasoning_effort 参数控制思维预算，该参数接受两个级别：max 和 high。max 为默认值——如果未设置 reasoning_effort（或设置为除 high 以外的任何值），模型将以 Max 级别运行。若要使用 High 级别，必须显式传入 reasoning_effort="high"。对于基准测试/排行榜复现等默认场景，请保持 Max（无需设置）；仅在明确需要 High 级别时才设置 reasoning_effort="high"。通过设置 enable_thinking=false 可以完全关闭思维过程。

Citation

引用

If you find GLM-5 series model useful in your research, please cite our technical report: 如果您在研究中使用了 GLM-5 系列模型，请引用我们的技术报告：

@misc{glm5team2026glm5vibecodingagentic,
  title={GLM-5: from Vibe Coding to Agentic Engineering},
  author={GLM-5-Team and : and Aohan Zeng and Xin Lv and Zhenyu Hou and Zhengxiao Du and Qinkai Zheng and Bin Chen and Da Yin}
}