Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM

The generative AI boom has driven the cost of memory into the stratosphere, and Google is a key part of that trend. So it’s only fitting that Google should offer some less RAM-hungry local AI models. The company has announced the release of a new Gemma 4 model that fills a gap in the lineup that launched earlier this year. The new model is efficient enough that you may be able to run it on a pretty average consumer laptop.

In April, Google released four models in the Gemma 4 family, which also marked the shift to a more open Apache 2.0 license. The initial models included two mobile-optimized options (E2B and E4B) along with a pair of models for more serious work (26B Mixture of Experts and 31B Dense). That left a rather large unserved space in the middle, which is right where the new model falls. Gemma 4 12B is considerably more capable than the mobile versions, but it won’t require a $20,000 AI accelerator to run locally.

Google says Gemma 4 12B is unique in that it can run on many consumer laptops without sacrificing quality. As long as you’ve got a computer with 16GB of system RAM or VRAM, the 12-billion-parameter model will work. That’s about half the total memory footprint of Gemma 4 26B MoE, and Google claims the new model is almost as capable, at least as far as benchmarks go.
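
To see why a 12-billion-parameter model can plausibly fit in 16GB, a rough calculation helps. The sketch below estimates memory for the weights alone at a few storage precisions. The precisions are illustrative assumptions (the article does not say which formats Google ships), and the figures ignore activations, the context cache, and whatever the operating system needs.

```python
# Back-of-the-envelope weight memory for a 12-billion-parameter model.
# The bit widths are illustrative assumptions, not Google's published formats.
PARAMS = 12_000_000_000
GB = 1_000_000_000  # decimal gigabytes


def weight_gb(params: int, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal GB."""
    return params * bits_per_param / 8 / GB


def fits(params: int, bits_per_param: int, budget_gb: float) -> bool:
    """True if the weights alone fit inside the given memory budget."""
    return weight_gb(params, bits_per_param) <= budget_gb


for bits in (16, 8, 4):
    size = weight_gb(PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:.0f} GB, fits in 16 GB: {fits(PARAMS, bits, 16)}")
```

Under these assumptions, the weights only fit in a 16GB budget when they are stored at fewer than 16 bits per parameter, and real-world headroom would be tighter than the raw numbers suggest.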

Google says the new model is capable of complex multistep reasoning and agentic workflows that previously required the larger Gemma variants. Despite the smaller parameter count, Gemma 4 12B comes with the newly devised Multi-Token Prediction (MTP) drafters, which take advantage of unused processing cycles to calculate possible future tokens. The result is greater speed and efficiency. Google has released optional MTP versions of the other Gemma 4 models, but this is the first one to have MTP out of the box.
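
The article does not spell out how drafted tokens are checked, but the general draft-and-verify idea can be sketched in a few lines. The toy below is not Google's MTP implementation: `target_next` and `draft_next` are hypothetical stand-ins for the full model and a cheap drafter, and the check runs token by token for clarity, whereas a real system would typically verify a whole draft in one batched pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the full model: a deterministic next-token rule."""
    return (context[-1] * 3 + 1) % 11


def draft_next(context: List[Token]) -> Token:
    """Hypothetical cheap drafter: usually agrees with the target, sometimes not."""
    guess = target_next(context)
    if len(context) % 3 == 0:
        return (guess + 1) % 11  # deliberate mistake so rejection is exercised
    return guess


def generate_plain(prompt: List[Token], steps: int, target: NextToken) -> List[Token]:
    """Ordinary decoding: one target call per generated token."""
    out = list(prompt)
    for _ in range(steps):
        out.append(target(out))
    return out


def generate_with_drafts(prompt: List[Token], steps: int, target: NextToken,
                         draft: NextToken, k: int = 4) -> List[Token]:
    """Draft up to k tokens ahead, keep the verified prefix, fix the first miss."""
    out = list(prompt)
    goal = len(prompt) + steps
    while len(out) < goal:
        # 1. The drafter guesses a short run of future tokens.
        guesses: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            guesses.append(draft(out + guesses))
        # 2. The target checks each guess against its own choice.
        for guess in guesses:
            correct = target(out)
            out.append(correct)  # always keep the target's token
            if guess != correct:
                break  # later guesses were built on a wrong token; discard them
    return out


plain = generate_plain([1], 12, target_next)
fast = generate_with_drafts([1], 12, target_next, draft_next)
print(fast == plain)
```

The property that matters is that drafting never changes the output: every kept token is one the target model would have chosen anyway, so a good drafter buys speed while a bad one only wastes the draft.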

Gemma 4 12B is also more efficient thanks to a new approach to multimodality. The Gemma 4 family is natively multimodal, accepting text, audio, or images as inputs. Most gen AI models, including the other Gemma 4 variants, use dedicated encoders to process non-text inputs and pass that data to the LLM. This works well enough, but it increases latency and memory usage. With the new mid-weight model, Google has implemented a streamlined embedding module for vision that relies on a single matrix multiplication plus positional embeddings, so image data reaches the LLM with proper spatial awareness. This eliminates the need for a bulky middleman encoder. For audio, there’s no encoding at all. The developers worked out a method of projecting the raw audio signal into the same vector space used for text tokens.
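
The vision path described above, one matrix multiplication plus positional information, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the grayscale image, the patch size, the embedding width, and the random weights standing in for learned ones. The article gives no shapes or implementation details.

```python
import random
from typing import List

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Cut an H x W image into flattened patch x patch tiles, in row-major order.

    Assumes H and W are multiples of `patch`.
    """
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([image[top + r][left + c]
                          for r in range(patch) for c in range(patch)])
    return tiles


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def embed_image(image: Matrix, patch: int, projection: Matrix,
                positions: Matrix) -> Matrix:
    """Turn an image into one embedding vector per patch."""
    tiles = patchify(image, patch)          # (num_patches, patch * patch)
    projected = matmul(tiles, projection)   # the single matrix multiplication
    # Adding a per-patch position vector tells the model where each tile sat.
    return [[v + p for v, p in zip(vec, pos)]
            for vec, pos in zip(projected, positions)]


rng = random.Random(0)
PATCH, WIDTH = 2, 3  # hypothetical patch size and embedding width
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # 4 x 4 toy image
num_patches = (4 // PATCH) ** 2
projection = [[rng.uniform(-1, 1) for _ in range(WIDTH)] for _ in range(PATCH * PATCH)]
positions = [[rng.uniform(-1, 1) for _ in range(WIDTH)] for _ in range(num_patches)]

tokens = embed_image(image, PATCH, projection, positions)
print(len(tokens), len(tokens[0]))
```

Each patch becomes a vector of the same width as a text token embedding, which is what lets the language model consume it directly without a separate encoder stage.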

If you want to check out the new Gemma 4 model, it’s accessible without a download via tools like LM Studio, Google AI Edge Gallery, and more. But the whole idea with Gemma 4 12B is that you can run it locally and on your own terms. If you’ve got the RAM, the model weights are available for download immediately on Kaggle and Hugging Face. The download is just shy of 18GB.
