Build Your Own Local AI Coding Agent with Gemma 4 and OpenCode

Coding agents are now part of normal development work. Many people use them through cloud-hosted models because that is convenient and gives access to very capable models. But if you want to control costs, prefer not to send your code to the cloud for privacy reasons, or are experimenting and want to better understand how the agent stack actually works, a local setup is worth trying. That is what this post is about.

Here, we’ll set up a local coding agent from three pieces: Ollama, for serving the model; Gemma 4, as the local LLM; and OpenCode, as the agent interface. By the end, we’ll have OpenCode connected to a local LLM.


1. Install Ollama

We start by installing Ollama, which will serve the Gemma 4 model locally. If you haven’t used it before, Ollama is a runtime for downloading, running, and serving local language models from your own machine. Once it is set up, Ollama exposes a local API endpoint. This way, other tools (e.g., OpenCode) can talk to the model directly.

On Windows, you can install it with the official installer: https://ollama.com/download. Alternatively, you can install it from PowerShell using winget: winget install Ollama.Ollama. After installation, Ollama should appear in the Windows Start menu, and you can launch it like any other app. Once it is running, the Ollama icon appears in the system tray, which means the local Ollama service is running in the background.

If you are on a Linux machine, you can install Ollama with: curl -fsSL https://ollama.com/install.sh | sh. After installation, check if Ollama is available: ollama --version. Once Ollama is installed, it runs a local server on your machine. Later, OpenCode will talk to this local Ollama server instead of calling a cloud model provider.
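
To confirm that the local server is actually up, one option is to ask it for its version over HTTP. The following is a minimal sketch, assuming Ollama is listening on its default address (http://localhost:11434) and that the /api/version endpoint is available; the helper names api_url and ollama_version are made up for this example.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_BASE_URL = "http://localhost:11434"


def api_url(path: str, base_url: str = OLLAMA_BASE_URL) -> str:
    """Join the base URL and an API path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def ollama_version(base_url: str = OLLAMA_BASE_URL, timeout: float = 3.0):
    """Return the server's version string, or None if it cannot be reached."""
    try:
        url = api_url("/api/version", base_url)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None
```

Calling ollama_version() returns a version string when the server is running and None otherwise, which makes it a convenient first check before wiring up other tools.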


2. Download Gemma 4

Next, we prepare a local LLM. For this post, we’ll use Gemma 4. Gemma 4 is a new open model released by Google on April 2, 2026. This model is designed for reasoning, coding, multimodal understanding, and agentic workflows. It comes in multiple sizes, including smaller edge-oriented variants and larger workstation-oriented variants.

Since this post is about running the model locally on a laptop, we’ll focus on the edge-friendly variants, i.e., E2B (gemma4:e2b) and E4B (gemma4:e4b). In Ollama’s naming, the E stands for “effective” parameters. For this walkthrough, I use the E4B model because it is the more capable of the two.

In PowerShell or a Linux terminal, run: ollama pull gemma4:e4b. You can then check the downloaded model with: ollama list. On my machine, Ollama reports the following: gemma4:e4b 9.6 GB. For reference, my laptop has an Intel i7-13800H CPU, 32 GB RAM, and an NVIDIA RTX 2000 Ada Laptop GPU with about 8 GB VRAM. You can choose gemma4:e2b instead if E4B feels too slow.
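
The same listing can be read programmatically. The sketch below assumes the JSON layout returned by Ollama’s /api/tags endpoint (a "models" list whose entries carry "name" and a "size" in bytes); the sample payload is hypothetical and only mirrors the size reported above, and the decimal-gigabyte formatting is a presentation choice for this example.

```python
def format_size_gb(size_bytes: int) -> str:
    """Format a byte count as decimal gigabytes with one decimal place."""
    return f"{size_bytes / 1e9:.1f} GB"


def summarize_models(tags_response: dict) -> list[str]:
    """Turn an /api/tags-style response into one 'name  size' line per model."""
    lines = []
    for model in tags_response.get("models", []):
        lines.append(f"{model['name']}  {format_size_gb(model['size'])}")
    return lines


# Hypothetical payload, shaped like a response from GET /api/tags.
sample = {"models": [{"name": "gemma4:e4b", "size": 9_600_000_000}]}
print("\n".join(summarize_models(sample)))  # gemma4:e4b  9.6 GB
```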

A few technical notes here. The gemma4:e4b build that we just downloaded is a 4-bit quantized model in GGUF, the local model format used by the Ollama runtime. On my machine, Ollama reports that gemma4:e4b supports a 128K context length. Before moving to the next step, we can do a quick test: ollama run gemma4:e4b "what's the capital of France?". If you get “Paris” back, then congratulations, Gemma 4 is now available on your local machine through Ollama.
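
The quick test can also go through the local HTTP API, which is the path other tools take when they talk to Ollama. This is a minimal sketch, assuming the default address and the /api/generate endpoint with streaming turned off; build_generate_request and ask are names invented for this example.

```python
import json
import urllib.request

# Assumption: default Ollama address; adjust if your server listens elsewhere.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for a single prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

With the server running, ask("gemma4:e4b", "what's the capital of France?") should return the model’s answer as a string. The generous timeout is deliberate, since the first call also has to load the model into memory.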


3. Install OpenCode

Next, we need an agent interface. We’ll use OpenCode for that. If you have used tools like Claude Code or Codex, OpenCode belongs to the same broad category. You can think of it as an agent runtime that can operate within a local repo, inspect files, run commands, and perform various tasks.

One difference that matters for us is that OpenCode is open-source and agnostic about LLM providers. You can connect it to cloud models (e.g., Claude/GPT/Gemini models), or you can connect it to a local model served by Ollama. That is exactly what we’ll do here.

If you are on a Windows machine, you first need to install Node.js: winget install OpenJS.NodeJS.LTS. On Linux, you can run sudo apt update followed by sudo apt install -y nodejs npm. After installation, verify that both node and npm are available: node --version and npm --version. Now we can install OpenCode: npm install -g opencode-ai. Then verify the installation: opencode --version.

At this point, OpenCode is installed. You can simply launch the interactive OpenCode TUI (terminal UI) from any project folder by running: opencode.
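
Before moving on, it can help to confirm in one pass that every command-line tool used so far is on the PATH. The sketch below is one way to do that with the standard library; the tool list and the helper names find_tools and missing_tools are choices made for this example.

```python
import shutil

# The command-line tools this walkthrough relies on.
REQUIRED_TOOLS = ("ollama", "node", "npm", "opencode")


def find_tools(names=REQUIRED_TOOLS) -> dict:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found: dict) -> list:
    """Return the names of tools that could not be located."""
    return [name for name, path in found.items() if path is None]


# Print a small report for the current machine.
for name, path in find_tools().items():
    print(f"{name:10s} {path or 'NOT FOUND'}")
```

If anything shows up as NOT FOUND right after installation, opening a new terminal is worth trying first, since PATH changes often apply only to new sessions.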


4. Connect OpenCode to Gemma 4

By default, OpenCode doesn’t know which model we want to use. Therefore, we need to point it to the Gemma 4 model served by Ollama. Let’s first create an Ollama model tag with the full context window (128K) enabled. This is important because we want the agent to work properly without its context being truncated.

We can do that with a small Ollama Modelfile. Specifically, we can create a file called gemma4-e4b-128k.Modelfile in the folder/repo we want to work with:

FROM gemma4:e4b
PARAMETER num_ctx 131072

Then, in the command line, we create a new Ollama tag by: ollama create gemma4:e4b-128k -f gemma4-e4b-128k.Modelfile.
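
If you prefer to script this step, the Modelfile and the create command can be generated rather than typed. This is a small sketch that reproduces the two lines above and shows where 131072 comes from (128 × 1024 tokens); render_modelfile and create_command are hypothetical helper names.

```python
def render_modelfile(base_model: str, context_k: int) -> str:
    """Render a two-line Modelfile that pins the runtime context window."""
    num_ctx = context_k * 1024  # 128K tokens -> 131072
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


def create_command(tag: str, modelfile: str) -> list[str]:
    """Build the argument list for 'ollama create' without running it."""
    return ["ollama", "create", tag, "-f", modelfile]


content = render_modelfile("gemma4:e4b", 128)
print(content, end="")
print(" ".join(create_command("gemma4:e4b-128k", "gemma4-e4b-128k.Modelfile")))
```

From here, writing content to gemma4-e4b-128k.Modelfile and passing the command list to subprocess.run would perform the same two steps as the manual approach.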

One thing to point out: this does not trigger a new model download. It just creates an Ollama profile that uses the same Gemma 4 E4B model, but explicitly sets the runtime context window to 128K.
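
To check that the new tag really carries the larger context window, you can inspect its parameters, for example with ollama show gemma4:e4b-128k or through the /api/show endpoint. The sketch below parses a parameters listing of the kind that endpoint returns. It assumes one "name value" pair per line separated by whitespace, and the sample text (including the stop value) is hypothetical.

```python
def parse_parameters(text: str) -> dict[str, list[str]]:
    """Parse 'name value' lines; a name may repeat, so values are collected in lists."""
    params: dict[str, list[str]] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            name, value = parts
            params.setdefault(name, []).append(value.strip())
    return params


def context_window(text: str):
    """Return num_ctx as an int, or None if the listing does not set it."""
    values = parse_parameters(text).get("num_ctx")
    return int(values[0]) if values else None


# Hypothetical parameters listing for the tag created above.
sample = 'num_ctx                        131072\nstop                           "<hypothetical_stop>"\n'
print(context_window(sample))  # 131072
```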
