Frontier AI has broken the open CTF format
What makes me qualified to say this? I started playing CTFs in 2021, the same year I started university. My first CTF was HCKSYD, a 48-hour solo CTF. I full solved it and won in 2 hours. I was completely hooked. That led me to win DownUnderCTF, Australia’s largest CTF, with Blitzkrieg multiple times. Blitzkrieg was one of Australia’s strongest teams at the time. I later joined TheHackersCrew, an international top-tier team that was consistently ranked highly on CTFTime, the main global ranking and event calendar the scene uses as its scoreboard. With them, I competed in some of the most prestigious CTFs in the world, consistently placing well within the top 10 until the end of 2025.
I am not saying this because I dislike CTFs. I am saying it because CTFs were the thing that made me fall in love with security. They taught me how to learn, gave me a way to measure myself, and introduced me to many of the people I respect most in the field. Watching people pretend the format is still fine is frustrating because the old game is not there anymore.
What changed? As AI tools ramped up in capability, especially when GPT-4 first came out, a significant percentage of medium-difficulty CTF challenges started becoming one-shottable, meaning a single prompt from a user could produce the solve and flag. You could paste a cryptography challenge into ChatGPT, come back in 10 minutes, and have the solution. At the time, we did not think too much of it. Hard challenges went mostly untouched, and the time saved was not large enough to ruin the competition.
The issue was never that AI could help. CTF players have always used tools. The issue is when the model does the reasoning, writes the solve, and leaves the human with nothing meaningful to do besides copy the flag.
Enter Claude Opus 4.5. When Opus 4.5 dropped, the tone changed. Almost every medium difficulty challenge, and some hard challenges, became agent-solvable. Claude Code packaged everything into a CLI and made it easy to connect other CLI and MCP tools. It became trivial to build an orchestrator that used the CTFd API to spin up a Claude instance for every challenge. You could let the system run for the first hour, then only start working on whatever was left.
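To make the orchestrator idea concrete, here is a minimal sketch of the shape such a script might take. It is not the setup any particular team used. It assumes a CTFd instance exposing the v1 REST API with token authentication, and it treats the agent as an arbitrary command line (`agent_cmd`, a hypothetical placeholder) that takes the prompt as its last argument. The `Challenge` type, the prompt wording, and the `solved_by_me` field handling are illustrative assumptions.

```python
import json
import subprocess
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    id: int
    name: str
    category: str
    solved: bool = False


def fetch_challenges(base_url: str, token: str) -> list[Challenge]:
    # Assumption: CTFd v1 REST API with token auth; field names may vary by version.
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/challenges",
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        rows = json.load(response)["data"]
    return [
        Challenge(r["id"], r["name"], r["category"], bool(r.get("solved_by_me", False)))
        for r in rows
    ]


def plan_jobs(challenges: list[Challenge], agent_cmd: list[str]) -> list[list[str]]:
    # One agent invocation per unsolved challenge; the prompt is the final argument.
    return [
        [*agent_cmd, f"Solve CTF challenge #{c.id} '{c.name}' ({c.category}) and print the flag."]
        for c in challenges
        if not c.solved
    ]


def run_jobs(jobs: list[list[str]], max_workers: int = 4) -> list[str]:
    # Launch the agent processes in parallel and collect their stdout in job order.
    def run(argv: list[str]) -> str:
        return subprocess.run(argv, capture_output=True, text=True).stdout

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, jobs))


if __name__ == "__main__":
    # Dry run with hard-coded challenges; "echo" stands in for a real agent CLI.
    demo = [Challenge(1, "baby-rsa", "crypto"), Challenge(2, "warmup", "pwn", solved=True)]
    for argv in plan_jobs(demo, ["echo"]):
        print(argv)
```

The point of the sketch is how little is there: list the challenges, skip the solved ones, and fan out one process per remaining challenge. Everything difficult happens inside the model behind `agent_cmd`.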
That changed the game. Teams that refused to use AI were not just missing a convenience; they were playing a slower version of the competition. Open online CTFs started becoming a question of how quickly you could automate the easy and medium work, then how much human attention you had left for the hardest challenges. The scoreboard started measuring orchestration and willingness to use frontier models alongside, and sometimes above, security skill.
The effects were obvious. The CTFTime leaderboard started feeling wrong. Some legendary teams that were consistently near the top appeared less often. Player activity felt lower. Challenge developers who treated CTFs as an artform had less reason to spend weeks building something beautiful if it was going to be eaten by an agent in minutes.
GPT-5.5 seals the deal. I have been working heavily with GPT-5.5 and GPT-5.5 Pro since launch. By benchmark metrics, 5.5 is close to Claude Mythos’ capability, and Pro likely surpasses it. These models can one-shot Insane difficulty active leakless heap pwn challenges on HackTheBox. They can solve a large portion of what a smaller CTF organiser can realistically produce. If you orchestrate Pro against Insane challenges in a 48-hour CTF, there is a good chance you get the flag before the event ends.
That makes open CTFs pay-to-win. The more tokens you can throw at a competition, the faster you can burn down the board. Specialised cybersecurity models like alias1 by Alias Robotics are becoming less relevant compared to general frontier LLMs. The competition is turning into “who can afford to run enough agents, with enough context, for long enough.”
CTFs feel much more like a cheesable mess than a competition. Your performance in a CTF no longer defines your skill the way it used to. Recruiting security practitioners by CTF performance is becoming less meaningful. It is not even a particularly good measure of AI skill, because most of the orchestration needed for CTFs is already open source or vibe codeable.
The “beginners are fine” take. I have seen various takes that beginners can still learn from CTFs as they always have. These takes miss the scoreboard. CTFs were not just a set of puzzles. They were a ladder. Even as a beginner, you had something to climb. You could see yourself improve, solve more challenges, place higher, join better teams, and become more competitive over time.
That feedback loop is breaking. If the visible scoreboard is dominated by teams using AI, a beginner is pushed toward using AI before they have built the instincts the AI is replacing. That is an anti-pattern. It prevents active learning, and active struggle is the bit that actually teaches you. It is also completely demotivating to put in real effort and see no visible progress because the ladder above you has been automated.
It also changes what challenge authors want to build. If beginner CTFs become another place where people quietly paste prompts and climb a scoreboard, authors have more reason to put their effort into learning platforms instead. At least on platforms like picoGym and HackTheBox, the expectation is education, and beginners are less incentivised to cheat themselves out of learning.
Beginners are better off using picoGym, HackTheBox, and other lab environments where the point is actually learning instead of pretending the public scoreboard still reflects human growth.
“CTF isn’t dead”. I have seen some hopium posts about how CTF is not dead, it is just augmented by AI. They often point at CTFs like DEF CON to argue that AI still cannot solve everything. That is true, but it is the wrong defence. The hardest top-tier finals have very few participants, and they are usually gated behind qualifiers that are easier than the finals themselves.