What I learned building an AI agent loop in Go

Hello there! A few months ago I built nevinho, a personal AI agent I run on my own machine. Bash, file edits, web search, voice input, the works. It taught me a lot, but the whole thing was hardcoded around my own use case. Anyone who wanted something similar had to fork it and rip it apart. So I started over.

Vikusha is the same idea, but as a Go framework. You bring your own system prompt, your own tools, your own transports, and the harness handles the rest. Which means I’m writing the core agent loop again. This time as a reusable framework, so others can build their own agents on top of it instead of forking mine. This post is about that loop. The thing every AI coding tool, every chatbot with tools, every “AI agent” is doing under the hood. Once you see it, you can’t unsee it.

What an agent actually does

When you ask an AI assistant “what’s in this directory?”, it looks like a lot is happening. The model “decides” to run a command, “reads” the output, “answers” you. It feels intelligent. What’s really happening is a loop. You send the model your message plus a list of tools it can call. The model replies with either text (it’s answering you) or a tool call (it wants to run something). If it called a tool, you run it, send the result back, and ask again. Eventually it replies with text only and you’re done. That’s it. That’s the agent.

loop:
  response = provider.complete(system, messages, tools)
  text, tool_calls = split(response.content)
  if no tool_calls:
    return text
  messages += assistant(response.content)
  messages += user(run each tool → tool_result)

No “reasoning engine”, no chain-of-thought magic. The model decides what to do, you execute, the model sees the output, the model decides again. The loop is the abstraction.
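A minimal Go sketch of the loop above, under stated assumptions: the Block, Message, and Provider types, the RunLoop function, and the scripted fake provider are all made up for illustration and are not Vikusha's actual API. The maxTurns limit is also an addition, there only so a misbehaving model cannot spin forever.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is one piece of message content. Kind is "text", "tool_use",
// or "tool_result". All names here are illustrative assumptions.
type Block struct {
	Kind  string
	Text  string // text body, or the tool output for tool_result
	ID    string // tool_use id, echoed back in the matching tool_result
	Name  string // tool name (tool_use only)
	Input string // raw JSON arguments (tool_use only)
}

type Message struct {
	Role    string
	Content []Block
}

// Provider is a stand-in for any model backend.
type Provider interface {
	Complete(system string, msgs []Message) ([]Block, error)
}

// RunLoop calls the provider until a response contains no tool_use blocks.
func RunLoop(p Provider, system, prompt string, run func(name, input string) string, maxTurns int) (string, error) {
	msgs := []Message{{Role: "user", Content: []Block{{Kind: "text", Text: prompt}}}}
	for turn := 0; turn < maxTurns; turn++ {
		content, err := p.Complete(system, msgs)
		if err != nil {
			return "", err
		}
		var text string
		var results []Block
		for _, b := range content {
			switch b.Kind {
			case "text":
				text += b.Text
			case "tool_use":
				results = append(results, Block{Kind: "tool_result", ID: b.ID, Text: run(b.Name, b.Input)})
			}
		}
		if len(results) == 0 {
			return text, nil // no tool calls: the model answered
		}
		// Keep the assistant turn whole, then reply with every result at once.
		msgs = append(msgs,
			Message{Role: "assistant", Content: content},
			Message{Role: "user", Content: results})
	}
	return "", errors.New("agent loop: turn limit reached")
}

// scripted replays canned responses in order, one per call.
type scripted struct{ steps [][]Block }

func (s *scripted) Complete(string, []Message) ([]Block, error) {
	if len(s.steps) == 0 {
		return nil, errors.New("scripted: no more responses")
	}
	next := s.steps[0]
	s.steps = s.steps[1:]
	return next, nil
}

func main() {
	p := &scripted{steps: [][]Block{
		{{Kind: "tool_use", ID: "t1", Name: "ls", Input: `{}`}},
		{{Kind: "text", Text: "The directory contains go.mod."}},
	}}
	run := func(name, input string) string { return "go.mod" }
	answer, err := RunLoop(p, "You are helpful.", "what's in this directory?", run, 8)
	fmt.Println(answer, err)
}
```

The scripted provider makes the control flow visible without a network call: the first response asks for a tool, the second is plain text, and the loop exits on the second.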

The four things that bit me

When I first wrote this I got it wrong in roughly four ways. Each one took me a confusing afternoon to figure out.

  1. Exit on absence of tool_use, not on stop_reason. Anthropic’s API returns a stop_reason field. It feels like the right exit condition. It isn’t. stop_reason can be max_tokens while there are still tool calls in the response. The actual signal is whether the response content has any tool_use blocks. If yes, run them and loop. If no, you’re done.

  2. Send the assistant’s full content back, unchanged. When the model returns text plus tool calls, you have to append both as one message in the conversation history. If you split them, the API rejects the next request because the tool result references a tool_use id that isn’t in the previous message anymore.

  3. All tool results go in one user message. If the model called three tools in parallel, all three results have to come back in a single user message containing three tool_result blocks. Putting them in three separate messages breaks the pairing.

  4. Errors are data, not exceptions. If a tool crashes or returns garbage, don’t abort the loop. Wrap the error in a tool_result with is_error: true and send it back to the model. The model sees the failure and either retries with different input or tells the user what happened. If you throw, the user gets nothing.

These four rules are the entire correctness of the loop. Everything else is wrapping.
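The “one user message” and “errors are data” rules can be sketched together in Go. This is an illustration under assumptions, not Vikusha's code: ToolUse, ToolResult, ToolFunc, runTools, and safeCall are hypothetical names. Every call produces exactly one result, in order, and a missing tool, a returned error, or a panic all become a result with IsError set.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolUse and ToolResult are hypothetical stand-ins for the provider's blocks.
type ToolUse struct {
	ID, Name, Input string
}

type ToolResult struct {
	ToolUseID string
	Content   string
	IsError   bool
}

type ToolFunc func(input string) (string, error)

// runTools executes every call and returns one result per call, in order.
// Failures become results with IsError set; nothing here aborts the loop.
func runTools(tools map[string]ToolFunc, calls []ToolUse) []ToolResult {
	results := make([]ToolResult, 0, len(calls))
	for _, c := range calls {
		res := ToolResult{ToolUseID: c.ID}
		fn, ok := tools[c.Name]
		if !ok {
			res.Content, res.IsError = "unknown tool: "+c.Name, true
		} else if out, err := safeCall(fn, c.Input); err != nil {
			res.Content, res.IsError = err.Error(), true
		} else {
			res.Content = out
		}
		results = append(results, res)
	}
	return results
}

// safeCall converts a panic inside a tool into an ordinary error.
func safeCall(fn ToolFunc, input string) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("tool panicked: %v", r)
		}
	}()
	return fn(input)
}

func main() {
	tools := map[string]ToolFunc{
		"ok":   func(string) (string, error) { return "fine", nil },
		"fail": func(string) (string, error) { return "", errors.New("disk full") },
		"boom": func(string) (string, error) { panic("nil map write") },
	}
	calls := []ToolUse{
		{ID: "a", Name: "ok"},
		{ID: "b", Name: "fail"},
		{ID: "c", Name: "boom"},
		{ID: "d", Name: "nope"},
	}
	// All four results travel back together, as one user message.
	for _, r := range runTools(tools, calls) {
		fmt.Printf("%s error=%v %q\n", r.ToolUseID, r.IsError, r.Content)
	}
}
```

Returning a slice rather than sending results one by one makes it hard to split them across messages by accident: the caller wraps the whole slice in a single user message.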

Two providers, same loop

Here’s where it gets interesting. Anthropic and OpenAI both support tool calling, but their wire formats are nothing alike. Anthropic puts tool calls inside the assistant’s content array, alongside text blocks. OpenAI puts them in a separate tool_calls field on the message. Anthropic sends tool arguments as a JSON object. OpenAI sends them as a JSON-encoded string. Anthropic puts the system prompt at the top level of the request. OpenAI prepends it as a message with role “system”.

If you build the loop against one of them, the other looks like a totally different problem. The fix is to have your own internal representation and translate at the edges. In Vikusha I have a generic llm.Block type with three variants: text, tool_use, tool_result. The agent loop only knows about blocks. Each provider has a Complete method that takes a generic request and returns a generic response. The translation lives inside the provider package, hidden behind the interface.
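To make “translate at the edges” concrete, here is a rough Go sketch that decodes one assistant message from each wire format into the same block type. The Block struct and the fromAnthropic and fromOpenAI functions are assumptions for illustration (Vikusha's llm.Block may look different), and the wire structs are trimmed to the few fields this example needs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Block is a hypothetical internal representation.
type Block struct {
	Kind  string // "text" or "tool_use" in this sketch
	Text  string
	ID    string
	Name  string
	Input json.RawMessage // always raw JSON, whatever the provider sent
}

// fromAnthropic reads an assistant message whose content array mixes
// text and tool_use blocks; input arrives as a JSON object.
func fromAnthropic(raw []byte) ([]Block, error) {
	var msg struct {
		Content []struct {
			Type  string          `json:"type"`
			Text  string          `json:"text"`
			ID    string          `json:"id"`
			Name  string          `json:"name"`
			Input json.RawMessage `json:"input"`
		} `json:"content"`
	}
	if err := json.Unmarshal(raw, &msg); err != nil {
		return nil, err
	}
	var out []Block
	for _, c := range msg.Content {
		out = append(out, Block{Kind: c.Type, Text: c.Text, ID: c.ID, Name: c.Name, Input: c.Input})
	}
	return out, nil
}

// fromOpenAI reads an assistant message with a separate tool_calls field;
// arguments arrive as a JSON-encoded string.
func fromOpenAI(raw []byte) ([]Block, error) {
	var msg struct {
		Content   *string `json:"content"` // null when the model only calls tools
		ToolCalls []struct {
			ID       string `json:"id"`
			Function struct {
				Name      string `json:"name"`
				Arguments string `json:"arguments"`
			} `json:"function"`
		} `json:"tool_calls"`
	}
	if err := json.Unmarshal(raw, &msg); err != nil {
		return nil, err
	}
	var out []Block
	if msg.Content != nil && *msg.Content != "" {
		out = append(out, Block{Kind: "text", Text: *msg.Content})
	}
	for _, tc := range msg.ToolCalls {
		args := json.RawMessage(tc.Function.Arguments)
		if !json.Valid(args) {
			return nil, fmt.Errorf("tool call %s: arguments are not valid JSON", tc.ID)
		}
		out = append(out, Block{Kind: "tool_use", ID: tc.ID, Name: tc.Function.Name, Input: args})
	}
	return out, nil
}

func main() {
	a, _ := fromAnthropic([]byte(`{"content":[{"type":"text","text":"Reading."},{"type":"tool_use","id":"toolu_1","name":"file_read","input":{"path":"go.mod"}}]}`))
	o, _ := fromOpenAI([]byte(`{"content":null,"tool_calls":[{"id":"call_1","type":"function","function":{"name":"file_read","arguments":"{\"path\":\"go.mod\"}"}}]}`))
	for _, b := range append(a, o...) {
		fmt.Printf("%s %s %s\n", b.Kind, b.Name, b.Input)
	}
}
```

Once both responses are blocks, the loop sees the same tool_use with the same raw JSON input, regardless of which provider produced it.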

type Provider interface {
    Name() string
    Complete(ctx context.Context, req *Request) (*Response, error)
}

That’s the whole contract. Plug in Anthropic, OpenAI, OpenRouter, Ollama, whatever. The loop doesn’t care. This abstraction earns its keep the moment you switch providers. I started building against Anthropic, ran out of API credit, switched to OpenRouter (which speaks the OpenAI dialect), and the agent code didn’t change a line. Same loop, same Chat call, same tool execution. Just a different constructor.
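A small interface like this also makes a fake backend cheap to write. The sketch below is hypothetical: the Request and Response fields, the Echo type, and the Ask helper are assumptions standing in for the real generic types. Only the Provider interface is copied from above.

```go
package main

import (
	"context"
	"fmt"
)

// Request and Response are trimmed, hypothetical versions of the generic types.
type Request struct {
	System string
	Prompt string
}

type Response struct {
	Text string
}

// Provider matches the interface shown in the post.
type Provider interface {
	Name() string
	Complete(ctx context.Context, req *Request) (*Response, error)
}

// Echo is a fake backend that never touches the network.
type Echo struct{}

func (Echo) Name() string { return "echo" }

func (Echo) Complete(_ context.Context, req *Request) (*Response, error) {
	return &Response{Text: "echo: " + req.Prompt}, nil
}

// Ask depends only on the interface, so any backend can be passed in.
func Ask(ctx context.Context, p Provider, prompt string) (string, error) {
	resp, err := p.Complete(ctx, &Request{Prompt: prompt})
	if err != nil {
		return "", fmt.Errorf("%s: %w", p.Name(), err)
	}
	return resp.Text, nil
}

func main() {
	out, err := Ask(context.Background(), Echo{}, "hello")
	fmt.Println(out, err)
}
```

Swapping Echo{} for a real provider changes one argument at the call site; Ask itself stays the same, which is the property the paragraph above describes.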

Tools, the easy part

A tool in Vikusha is anything that satisfies this interface:

type Tool interface {
    Name() string
    Description() string
    Schema() json.RawMessage
    Run(ctx context.Context, input json.RawMessage) (string, error)
}

Name and description are what the model sees when deciding whether to call you. Schema is the JSON Schema for the input parameters. Run executes the thing and returns text. The first real tool I built was file_read. It’s about 30 lines.

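import (
    "context"
    "encoding/json"
    "os"
)
// Read is the file_read tool. An empty struct is assumed here; the real type may carry config.
type Read struct{}
func (r *Read) Name() string { return "file_read" }
func (r *Read) Description() string { return "Read the contents of a file at the given path." }
func (r *Read) Schema() json.RawMessage { return json.RawMessage(`{ "type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"] }`) }
// Run decodes the JSON input, reads the file, and returns its contents as text.
func (r *Read) Run(ctx context.Context, input json.RawMessage) (string, error) {
    var in struct{ Path string `json:"path"` } // the tag matches the schema's "path" property
    if err := json.Unmarshal(input, &in); err != nil { return "", err }
    data, err := os.ReadFile(in.Path)
    if err != nil { return "", err } // the loop turns this into an error result for the model
    return string(data), nil
}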

The model gets the name, description, and schema in the request. When it wants to read a file, it sends back {"name": "file_read", "input": {"path": "go.mod"}}. The loop looks the tool up by name, runs it, and feeds the result back as a tool_result.
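One way the lookup-by-name step might be wired up is a small registry. Everything except the Tool interface is an assumption for illustration: Registry, NewRegistry, Definitions, Call, and the toy Greet tool are made-up names. The Definition field names follow Anthropic's tool format (name, description, input_schema); another provider would need its own mapping.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"sort"
)

// Tool is the interface from the post.
type Tool interface {
	Name() string
	Description() string
	Schema() json.RawMessage
	Run(ctx context.Context, input json.RawMessage) (string, error)
}

// Registry is a hypothetical name-to-tool index.
type Registry struct{ tools map[string]Tool }

// NewRegistry indexes tools by name and rejects duplicate names.
func NewRegistry(tools ...Tool) (*Registry, error) {
	r := &Registry{tools: make(map[string]Tool, len(tools))}
	for _, t := range tools {
		if _, dup := r.tools[t.Name()]; dup {
			return nil, fmt.Errorf("duplicate tool name %q", t.Name())
		}
		r.tools[t.Name()] = t
	}
	return r, nil
}

// Definition is what the model sees for each tool.
type Definition struct {
	Name        string          `json:"name"`
	Description string          `json:"description"`
	InputSchema json.RawMessage `json:"input_schema"`
}

// Definitions lists every tool, sorted by name so requests are stable.
func (r *Registry) Definitions() []Definition {
	defs := make([]Definition, 0, len(r.tools))
	for _, t := range r.tools {
		defs = append(defs, Definition{Name: t.Name(), Description: t.Description(), InputSchema: t.Schema()})
	}
	sort.Slice(defs, func(i, j int) bool { return defs[i].Name < defs[j].Name })
	return defs
}

// Call looks the tool up by name and runs it.
func (r *Registry) Call(ctx context.Context, name string, input json.RawMessage) (string, error) {
	t, ok := r.tools[name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", name)
	}
	return t.Run(ctx, input)
}

// Greet is a toy tool used only for this example.
type Greet struct{}

func (Greet) Name() string        { return "greet" }
func (Greet) Description() string { return "Greet a person by name." }
func (Greet) Schema() json.RawMessage {
	return json.RawMessage(`{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}`)
}
func (Greet) Run(_ context.Context, input json.RawMessage) (string, error) {
	var in struct {
		Name string `json:"name"`
	}
	if err := json.Unmarshal(input, &in); err != nil {
		return "", err
	}
	return "hello, " + in.Name, nil
}

func main() {
	reg, err := NewRegistry(Greet{})
	if err != nil {
		panic(err)
	}
	defs, _ := json.Marshal(reg.Definitions())
	fmt.Println(string(defs)) // the tool list that goes into the request
	out, err := reg.Call(context.Background(), "greet", json.RawMessage(`{"name":"Vikusha"}`))
	fmt.Println(out, err)
}
```

An unknown name comes back as an ordinary error from Call, so the loop can report it to the model as a failed tool result instead of crashing.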