How the itrstats tax assistant works: one query, every layer

itrstats 税务助手是如何工作的：一个查询，贯穿每一层

This post walks through how the itrstats (https://itrstats.in) tax assistant handles a single compound user question, end to end through every layer of the backend. 本文将详细介绍 itrstats (https://itrstats.in) 税务助手如何处理一个复合用户问题，并从后端每一层进行端到端的解析。

A user types this in: “What’s tax on ₹15 lakh in new regime, what percentile am I in, and is the marginal relief rule relevant here?” 用户输入了这样一个问题：“在新税制下，150万卢比的税额是多少？我处于什么百分位？边际减免规则在这里适用吗？”

The response that comes back: “In the new regime, tax on ₹15 lakh is ₹97,500 for FY 2025-26. Under the old regime, the same income would be ₹2,57,400, so the new regime is cheaper by ₹1,59,900. You are in roughly the top 17.42% of Indian taxpayers, with about 82.58% earning less. Marginal relief is not relevant here because it applies around the ₹12 lakh 87A rebate threshold and surcharge thresholds, not at ₹15 lakh.” 返回的回答是：“在新税制下，2025-26财年150万卢比的税额为97,500卢比。在旧税制下，同样的收入税额为257,400卢比，因此新税制节省了159,900卢比。你大约处于印度纳税人前17.42%的位置，约82.58%的人收入低于此水平。边际减免规则在此处不适用，因为它适用于120万卢比的87A退税门槛和附加费门槛附近，而非150万卢比。”

Behind that response, three tools fired, the model made two passes, and a composer did a final validation strip before anything left the server. The whole thing finishes in about four seconds. 在这一回答背后，系统触发了三个工具，模型进行了两轮处理，并在数据离开服务器前由一个“合成器”（composer）进行了最终的验证过滤。整个过程在大约四秒内完成。

The model did not compute a single number in the entire trace. It picked tools, narrated the result, and was kept on the rails by a Pydantic-enforced output schema. This post follows that one query through every layer of the system. 在整个追踪过程中，模型没有计算任何一个数字。它负责选择工具、叙述结果，并通过 Pydantic 强制执行的输出模式（schema）保持在预定轨道上。本文将追踪这一查询在系统每一层的流转过程。

Hop zero: the request arrives

第零跳：请求到达

The POST lands at /v1/assistant/query. The route is intentionally thin: validate, rate-limit, call one action, return. Orchestration lives in the action layer, which does the boring-but-necessary plumbing before the agent ever runs: POST 请求到达 /v1/assistant/query。路由层被刻意设计得很轻量：仅负责验证、限流、调用一个动作并返回。编排逻辑位于动作层（action layer），它在代理（agent）运行前处理那些枯燥但必要的准备工作：

Resolves a conversation ID, generating a new one if needed.
解析会话 ID，必要时生成一个新的。
Persists the request row to Postgres before the model is called, so a downstream crash still leaves a record of what was asked.
在调用模型前将请求行持久化到 Postgres，以便在下游崩溃时仍能保留查询记录。
Loads recent conversation turns and replays them into the agent’s input as alternating user/assistant messages.
加载最近的会话轮次，并将它们作为交替的用户/助手消息回放给代理的输入端。
There is no server-side thread state in the model. Hands off to run_agent, then persists the final response on the way back out. That’s the route → action layer.
模型中没有服务器端的线程状态。将任务移交给 run_agent，然后在返回时持久化最终响应。这就是“路由 → 动作层”的工作流程。

The interesting work starts inside run_agent. 有趣的工作始于 run_agent 内部。

Hop one: the agent decides what to call

第一跳：代理决定调用什么

The agent seeds its input list with system prompt, replayed turns, and the current query. The model call uses OpenAI’s Responses API with three knobs that matter: 代理使用系统提示词、回放的会话轮次和当前查询来初始化其输入列表。模型调用使用了 OpenAI 的 Responses API，其中有三个关键参数：

response = await client.responses.parse(
    model=AssistantConfig.OPENAI_ANSWER_MODEL,
    input=input_list,
    tools=tools, # JSON Schema for each registered tool
    tool_choice="auto",
    parallel_tool_calls=True,
    text_format=AssistantAnswerSchema, # Pydantic-enforced output
)

tools= is the JSON-Schema list of registered tools the model can call. parallel_tool_calls=True lets the model request several tool calls in one response instead of one at a time. text_format=AssistantAnswerSchema constrains the final answer to a Pydantic schema, so once the model stops calling tools it cannot return free-form text. tools= 是模型可以调用的已注册工具的 JSON-Schema 列表。parallel_tool_calls=True 允许模型在一次响应中请求多个工具调用，而不是一次一个。text_format=AssistantAnswerSchema 将最终答案限制为 Pydantic 模式，因此一旦模型停止调用工具，它就不能返回自由格式的文本。

For our ₹15 lakh query, hop 1’s output is three function calls in one response. No final answer this hop. Just three calls. The loop body executes them and re-runs the model. 对于我们的150万卢比查询，第一跳的输出是在一次响应中包含三个函数调用。这一跳没有最终答案，只有三个调用。循环体执行这些调用并重新运行模型。

The LLM is the dispatcher, not the calculator. A model that hallucinates tax math is a liability. A model that picks the right tool and forwards the user’s numbers is a feature. 大语言模型是调度员，而不是计算器。一个会产生税务数学幻觉的模型是累赘，而一个能选择正确工具并转发用户数据的模型才是核心功能。

The three tools fire

三个工具触发

Each registered tool is a ToolSpec with four fields: a name, a description, a JSON Schema for its parameters, and an async handler that takes an AgentContext plus the model’s arguments and returns a dict. 每个已注册的工具都是一个包含四个字段的 ToolSpec：名称、描述、参数的 JSON Schema，以及一个异步处理程序（handler）。该处理程序接收 AgentContext（包含请求元数据、先前轮次和检索工具存入的 collected_chunks 字段的结构体）以及模型的参数，并返回一个字典。