Gemini 3.5 Flash might be fast enough for gen AI to make sense

At last year’s I/O event, Google was still talking about the 2.5 branch of Gemini, and what a difference a year makes. We’ve gone through the 3.0 and 3.1 families since then, and now it’s on to version 3.5. Gemini 3.5 Flash is rolling out across a wide range of Google products starting today, and Google again claims this model is even better than its last-gen Pro model. That has been a trend with Google’s tick-tock model updates over the past year, but the team says this release is special.

Gemini 3.5 Flash supposedly offers frontier-level intelligence while also being efficient enough that it may finally make complex agentic tasks worth doing at scale. Tulsee Doshi, senior director of product management for Gemini, explains that the innovations of Gemini 3.5 Flash are woven through multiple Google products, and this is just the start.

It’s no secret that generative AI is currently a money pit, and all the major AI players are trying to find paths to greater efficiency. The problem is magnified when you start building agentic experiences that are supposed to run for longer to complete complex tasks. Gemini 3.5 Flash may be a big step toward making that viable. The new model can output nearly 300 tokens per second, but its benchmark scores are similar to larger frontier models (like 3.1 Pro) that build outputs at a quarter of that speed.
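To put that speed gap in concrete terms, here is a rough back-of-the-envelope sketch. It treats "nearly 300 tokens per second" as a round 300 and takes the quarter-speed figure for the larger model at face value; the 30,000-token job size is a made-up example, not a number from Google.

```python
# Throughput figures are approximations taken from the article's wording.
FLASH_TPS = 300.0          # "nearly 300 tokens per second", rounded up
PRO_TPS = FLASH_TPS / 4    # larger models at "a quarter of that speed"


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent purely on emitting tokens, ignoring latency and tool calls."""
    return output_tokens / tokens_per_second


# Hypothetical agentic job that emits 30,000 tokens over its whole run.
job_tokens = 30_000
flash_time = generation_seconds(job_tokens, FLASH_TPS)
pro_time = generation_seconds(job_tokens, PRO_TPS)

print(f"Flash: {flash_time:.0f} s, Pro-class: {pro_time:.0f} s")
```

Under those assumptions, the same job takes about 100 seconds instead of 400, and that difference compounds when an agent chains many steps together.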

Google now says that the companies using the most AI tokens could save a billion dollars per year by shifting to the more efficient Gemini 3.5 Flash. API pricing for the new model is significantly lower than that of the Pro model it apes. Gemini 3.5 Flash clocks in at $1.50 per 1M input tokens and $9 per 1M output tokens. The 3.1 Pro model starts at $2 and $12, respectively, and the price goes up if you use more than 200k tokens.
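The arithmetic behind those prices is easy to check. The sketch below uses only the per-million-token rates quoted above and the base Pro tier (under 200k tokens); the model names are plain labels rather than confirmed API identifiers, and the 50k-in, 10k-out request is a hypothetical workload.

```python
# USD per 1M tokens, as quoted in the article. Keys are labels, not API model IDs.
PRICES = {
    "gemini-3.5-flash": {"input": 1.50, "output": 9.00},
    "gemini-3.1-pro": {"input": 2.00, "output": 12.00},  # base tier only
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the quoted per-million-token rates."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical request: 50k tokens in, 10k tokens out.
flash_cost = request_cost("gemini-3.5-flash", 50_000, 10_000)
pro_cost = request_cost("gemini-3.1-pro", 50_000, 10_000)
savings_pct = (pro_cost - flash_cost) / pro_cost * 100

print(f"Flash ${flash_cost:.3f} vs. Pro ${pro_cost:.3f} ({savings_pct:.0f}% less)")
```

Because both the input and output rates are exactly three-quarters of the Pro rates, the saving works out to 25 percent at the base tier no matter how a workload splits between input and output.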

According to Doshi, the team made numerous improvements in pre-training with Gemini 3.5 Flash, but insights gleaned from how devs use Gemini models are really paying off. “With post-training, we’re really starting to unlock some of the value of the feedback we’re getting from users, for example, from Antigravity,” said Doshi. “That’s really what you’re seeing play out in terms of the code performance and the tool use performance. And then, the hope is that you’ll continue to see the step change where 3.5 Pro will be better, and the next Flash meets Pro performance with that series.”

Google is focused on code generation with the new model, which is a core agentic angle for AI. Both Terminal Bench and SWE-Bench Pro tests show substantial improvements—3.5 Flash clobbers older Flash models and shows a small but measurable improvement versus Gemini 3.1 Pro. Its scores are in the same neighborhood as OpenAI’s much larger and more expensive GPT 5.5.

A major barrier in agentic workflows is how generative models can use interfaces designed for humans. It’s not an easy problem to solve, Doshi said. “Certain things like UI control are expensive to do because the model has to search the page, it has to know where to click, it has to act through multiple steps. I think Flash is able to do that well because of that combination of quality and cost.”

Google’s AI evaluations demonstrate these improvements, too. Among Google’s current collection of benchmarks is OSWorld-Verified, which tests how models handle general tasks in real computing environments. The pattern is similar to the coding results: Gemini 3.5 Flash substantially outperforms older Flash models and is even a bit faster than Gemini 3.1 Pro. It’s essentially tied with GPT 5.5.

Gemini 3.5 Flash has been rolled out internally at Google, and Doshi noted that it’s having a big impact. “We have a set of internal metrics we’ve been evaluating that measures how Googlers code, so looking at our own code bases and how well the models perform on that,” Doshi said. “And you can see a massive, massive jump between where 3.1 Pro was and where 3.5 Flash is.”

Google unveiled the Antigravity IDE last year, and it’s being upgraded to version 2.0 with support for Gemini 3.5 Flash. This update will support multiple parallel workflows—essentially sub-agents spawned by Gemini 3.5 Flash. Again, Google says this is only possible because the new model is so efficient at spitting out tokens. In addition to Antigravity, Gemini 3.5 Flash is coming to the Gemini app, the API, AI Studio, Android Studio, and all of Google’s enterprise products. As for the Pro variant, Google says that’s already in internal testing, and it should be ready for release next month.

Gemini Spark is 3.5 Flash in agent form

Companies are moving on from “AI” as their primary buzzword to “agents.” With Gemini Spark, Google is offering its first dedicated agent to users. Spark runs 24/7 in Google’s cloud, so it doesn’t use any of your computing resources and isn’t tied to any specific device or browser tab. Instead, it spans your entire Google footprint, using Gemini 3.5 Flash to run multiple agentic workflows at your command.

Google doesn’t always explain its buzzwords very well. So what is an AI agent anyway? Google’s Doshi explains: “I think of agents as being able to take a model plus a harness [software interface] such that the combination can actually take action on your behalf.” With Spark, you can give the AI instructions, and it handles the task. This can take place over time as the agent grabs context from your Drive files, Gmail, and more. You could have it watch for certain emails and integrate them into daily digests or have it monitor your meetings and generate summaries and action items. Spark can send you notifications or ask follow-up questions to better meet your needs, too.
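Doshi's "model plus a harness" description can be illustrated with a toy loop. This is only a sketch of the general idea, not how Spark is built: the model is a stand-in function, the single tool returns canned text, and every name here (fake_model, search_mail, run_agent) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A "tool" is any plain function the harness is allowed to run for the user.
Tool = Callable[[str], str]


def fake_model(instruction: str, history: List[str]) -> Tuple[str, str]:
    """Stand-in for a real model call; returns (action_name, argument).

    A real agent would send the instruction and history to a model API here.
    """
    if not history:
        return ("search_mail", instruction)  # first step: go gather context
    return ("finish", history[-1])           # then report what was found


def search_mail(query: str) -> str:
    # Hypothetical tool: returns canned data instead of touching a mailbox.
    return f"2 messages matching '{query}'"


def run_agent(instruction: str, tools: Dict[str, Tool], max_steps: int = 5) -> str:
    """The harness: asks the model what to do, runs the tool, feeds results back."""
    history: List[str] = []
    for _ in range(max_steps):
        action, argument = fake_model(instruction, history)
        if action == "finish":
            return argument
        history.append(tools[action](argument))
    return "stopped: step limit reached"


result = run_agent("invoices", {"search_mail": search_mail})
print(result)
```

The step limit is the harness's job rather than the model's, which is one reason a long-running agent needs more than a raw model endpoint.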
