I Built an AI Agent Orchestrator Where Gemma 4 Only Knows What You Teach It


This is a submission for the Gemma 4 Challenge: Build with Gemma 4.

What I Built: GemmaOrch is a skill-based AI agent orchestrator. You define what an agent knows by dropping Markdown files into a folder, assign those skills to a named agent, and chat with it. The agent, powered by Gemma 4, answers only within the boundaries of those files: it refuses anything outside that scope with a fixed phrase rather than improvising expertise it wasn’t given.

The core idea: agent behavior lives in .md files, not in code. No prompts are hardcoded in the application, and no domain logic is baked into the service layer. The skill files are the agent.
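To make the "skill files are the agent" idea concrete, here is a minimal sketch of how a system prompt could be assembled from a folder of Markdown files at runtime. It is an illustration under assumptions, not GemmaOrch's actual code: the class name `SkillPromptSketch`, the method names, the prompt wording, and the refusal phrase are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class SkillPromptSketch {

    // Hypothetical refusal phrase; the real project defines its own wording.
    static final String REFUSAL = "This is outside my assigned skills.";

    /** Reads every .md file in a folder, sorted by path so the prompt is stable. */
    public static List<String> loadSkills(Path skillsDir) throws IOException {
        List<String> skills = new ArrayList<>();
        try (Stream<Path> files = Files.list(skillsDir)) {
            List<Path> markdown = files
                    .filter(f -> f.getFileName().toString().endsWith(".md"))
                    .sorted()
                    .toList();
            for (Path file : markdown) {
                skills.add(Files.readString(file));
            }
        }
        return skills;
    }

    /** Concatenates the skills into one system prompt with a scope rule on top. */
    public static String buildSystemPrompt(String agentName, List<String> skills) {
        StringBuilder sb = new StringBuilder();
        sb.append("You are the agent \"").append(agentName).append("\".\n");
        sb.append("Answer only from the skills below. For anything else, reply exactly: ")
          .append(REFUSAL).append("\n\n");
        for (String skill : skills) {
            sb.append("--- SKILL ---\n").append(skill.strip()).append("\n\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Demo with a throwaway folder: one skill file and one file that is ignored.
        Path dir = Files.createTempDirectory("skills");
        Files.writeString(dir.resolve("refunds.md"), "# Refunds\nRefunds are issued within 14 days.\n");
        Files.writeString(dir.resolve("notes.txt"), "not a skill");
        System.out.println(buildSystemPrompt("support-bot", loadSkills(dir)));
    }
}
```

Because the prompt is rebuilt from the files on each run, editing a skill changes the agent's behavior without touching the Java code.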

What it solves: building a specialized AI assistant usually means either fine-tuning a model (expensive, slow to iterate) or embedding complex prompt engineering in your codebase (brittle, hard to maintain). GemmaOrch separates the two concerns: the orchestration logic stays in Java, and the expertise lives in plain Markdown that anyone can read and edit.

Key features: skill-driven agents (system prompts built at runtime), a GitHub skill importer, streaming chat via Spring WebFlux, an MCP server (JSON-RPC 2.0), a REST API, and zero external infrastructure (H2 database).
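For readers unfamiliar with the MCP wire format, the sketch below builds a JSON-RPC 2.0 request envelope of the kind an MCP client sends when calling a tool. The `jsonrpc`, `id`, `method`, and `params` fields come from JSON-RPC 2.0, and `tools/call` with `name` and `arguments` follows the MCP specification. The tool name `ask_agent` and the `message` argument are assumptions made for illustration; the tools GemmaOrch actually exposes may differ.

```java
class McpRequestSketch {

    /** Minimal JSON string escaping: quotes, backslashes, and control characters. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    /** Builds a tools/call request. Tool and argument names are hypothetical. */
    public static String toolsCall(int id, String toolName, String message) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\""
                + ",\"params\":{\"name\":\"" + escape(toolName) + "\""
                + ",\"arguments\":{\"message\":\"" + escape(message) + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(toolsCall(1, "ask_agent", "What is the refund window?"));
    }
}
```

A real client would use a JSON library instead of string concatenation; the hand-rolled version only keeps the sketch dependency-free.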

Built with: Java 25 · Spring Boot 3.5 · Spring AI 1.1.5 · Thymeleaf · HTMX 2.0.

How I Used Gemma 4: I used gemma-4-31b-it, the 31B dense instruction-tuned variant, served by Google AI Studio and accessed through Spring AI’s spring-ai-starter-model-google-genai.

Why the 31B dense, specifically: The project enforces a hard constraint: agents must refuse anything outside their assigned skills. I tested smaller variants first. The 4B model followed the constraint most of the time but would occasionally drift outside its assigned scope. With the 31B dense model, these failures essentially disappeared, and the constraint held reliably across multi-turn conversations and adversarial inputs.

Two specific things the 31B unlocked: long-context constraint adherence (it doesn’t “forget” its instructions as the context grows) and role disambiguation (it correctly understands that it is the agent being invoked, not the orchestrator invoking agents).

Why not the 26B MoE? GemmaOrch is a single-tenant orchestrator where precision per response matters more than requests per second. For this use case, the dense model’s full parameter activation on every token is worth the extra inference cost.

The open-weights advantage: Gemma 4 is open. The application is architected so that the model is an environment variable: swap AI Studio for a local Ollama instance and nothing else changes. For users with sensitive skill content, self-hosting is a real deployment path.
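One way to picture "the model is an environment variable" is a small resolver that reads the provider, model name, and base URL from the environment and falls back to the hosted default. This is a sketch only: the variable names (`GEMMAORCH_PROVIDER`, `GEMMAORCH_MODEL`, `GEMMAORCH_BASE_URL`), the `ModelConfig` record, and the local model tag are hypothetical, and the real project may wire this through Spring configuration instead.

```java
import java.util.Map;

class ModelConfigSketch {

    /** Where the model runs and which model to ask for. baseUrl may be null. */
    public record ModelConfig(String provider, String model, String baseUrl) {}

    /** Resolves the config from an environment map, defaulting to the hosted 31B model. */
    public static ModelConfig fromEnv(Map<String, String> env) {
        // All three variable names are assumptions made for this sketch.
        String provider = env.getOrDefault("GEMMAORCH_PROVIDER", "google-genai");
        String model = env.getOrDefault("GEMMAORCH_MODEL", "gemma-4-31b-it");
        String baseUrl = env.get("GEMMAORCH_BASE_URL"); // null means the provider's default endpoint
        return new ModelConfig(provider, model, baseUrl);
    }

    public static void main(String[] args) {
        // Hosted default, taken from the real process environment.
        System.out.println(fromEnv(System.getenv()));

        // Self-hosted variant: Ollama listens on port 11434 by default.
        // "my-local-gemma" is a placeholder tag, not a real model name.
        System.out.println(fromEnv(Map.of(
                "GEMMAORCH_PROVIDER", "ollama",
                "GEMMAORCH_MODEL", "my-local-gemma",
                "GEMMAORCH_BASE_URL", "http://localhost:11434")));
    }
}
```

Passing the environment in as a `Map` keeps the resolver easy to test without touching real process variables.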

Source: https://github.com/Bzaid94/gemma-agents-orchestrator.git · License: Apache 2.0