How the itrstats tax assistant works: one query, every layer
This post walks through how the itrstats (https://itrstats.in) tax assistant handles a single compound user question, end to end through every layer of the backend.
A user types this in: “What’s tax on ₹15 lakh in new regime, what percentile am I in, and is the marginal relief rule relevant here?”
The response that comes back: “In the new regime, tax on ₹15 lakh is ₹97,500 for FY 2025-26. Under the old regime, the same income would be ₹2,57,400, so the new regime is cheaper by ₹1,59,900. You are in roughly the top 17.42% of Indian taxpayers, with about 82.58% earning less. Marginal relief is not relevant here because it applies around the ₹12 lakh 87A rebate threshold and surcharge thresholds, not at ₹15 lakh.”
Behind that response, three tools fired, the model made two passes, and a composer did a final validation strip before anything left the server. The whole thing finishes in about four seconds.
The model did not compute a single number in the entire trace. It picked tools, narrated the result, and was kept on the rails by a Pydantic-enforced output schema. This post follows that one query through every layer of the system.
Hop zero: the request arrives
The POST lands at /v1/assistant/query. The route is intentionally thin: validate, rate-limit, call one action, return. Orchestration lives in the action layer, which does the boring-but-necessary plumbing before the agent ever runs:
- Resolves a conversation ID, generating a new one if needed.
- Persists the request row to Postgres before the model is called, so a downstream crash still leaves a record of what was asked.
- Loads recent conversation turns and replays them into the agent’s input as alternating user/assistant messages. There is no server-side thread state in the model.
- Hands off to run_agent, then persists the final response on the way back out.
That’s the route → action layer.
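The action-layer steps above can be sketched roughly as follows. This is a minimal illustration, not the actual itrstats code: `InMemoryStore` stands in for the Postgres tables, `handle_query` is a hypothetical name for the action, and `run_agent` is stubbed out so the sketch runs without a model.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Stand-in for Postgres: maps conversation ID -> ordered list of rows."""
    rows: dict = field(default_factory=dict)

    def append(self, conversation_id: str, role: str, content: str) -> int:
        """Persist one row and return its position in the conversation."""
        turns = self.rows.setdefault(conversation_id, [])
        turns.append({"role": role, "content": content})
        return len(turns) - 1

    def turns_before(self, conversation_id: str, index: int, limit: int = 6) -> list:
        """Return up to `limit` rows that precede the row at `index`."""
        turns = self.rows.get(conversation_id, [])
        return turns[max(0, index - limit):index]


async def run_agent(query: str, history: list) -> str:
    """Placeholder for the real agent loop (assumption: it returns the answer)."""
    return f"(answer to {query!r} given {len(history)} prior messages)"


async def handle_query(store: InMemoryStore, query: str, conversation_id=None) -> dict:
    # 1. Resolve the conversation ID, generating a new one if needed.
    conversation_id = conversation_id or str(uuid.uuid4())
    # 2. Persist the request before the model is called, so a crash
    #    further down still leaves a record of what was asked.
    index = store.append(conversation_id, "user", query)
    # 3. Load earlier turns to replay as alternating user/assistant messages.
    history = store.turns_before(conversation_id, index)
    # 4. Hand off to the agent, then persist the response on the way out.
    answer = await run_agent(query, history)
    store.append(conversation_id, "assistant", answer)
    return {"conversation_id": conversation_id, "answer": answer}
```

Calling `handle_query` twice with the same conversation ID replays the first exchange into the second call, which is all the "memory" the model gets in this design.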
The interesting work starts inside run_agent.
Hop one: the agent decides what to call
The agent seeds its input list with the system prompt, the replayed turns, and the current query. The model call uses OpenAI’s Responses API with three knobs that matter:
response = await client.responses.parse(
    model=AssistantConfig.OPENAI_ANSWER_MODEL,
    input=input_list,
    tools=tools,  # JSON Schema for each registered tool
    tool_choice="auto",
    parallel_tool_calls=True,
    text_format=AssistantAnswerSchema,  # Pydantic-enforced output
)
tools= is the JSON-Schema list of registered tools the model can call. parallel_tool_calls=True lets the model request several tool calls in one response instead of one at a time. text_format=AssistantAnswerSchema constrains the final answer to a Pydantic schema, so once the model stops calling tools it cannot return free-form text.
For our ₹15 lakh query, hop 1’s output is three function calls in one response. No final answer this hop. Just three calls. The loop body executes them and re-runs the model.
The LLM is the dispatcher, not the calculator. A model that hallucinates tax math is a liability. A model that picks the right tool and forwards the user’s numbers is a feature.
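The two-pass shape described above (call the model, execute whatever it asked for, call it again) can be sketched like this. Everything here is illustrative: `fake_model` replaces the real Responses API call so the example runs offline, the three tool names are hypothetical, and running the handlers concurrently with `asyncio.gather` is an assumption about how the loop body executes them. The item shapes (`function_call`, `function_call_output`, `call_id`) mirror the ones the Responses API uses.

```python
import asyncio
import json


# Hypothetical handlers; the canned values come from the example response.
async def compute_tax(income, regime):
    return {"tax": 97500, "regime": regime}

async def income_percentile(income):
    return {"top_percent": 17.42}

async def search_rules(topic):
    return {"chunks": [f"notes on {topic}"]}

HANDLERS = {
    "compute_tax": compute_tax,
    "income_percentile": income_percentile,
    "search_rules": search_rules,
}


async def fake_model(input_list):
    """Stand-in for the model: requests tools first, answers once outputs exist."""
    if not any(item.get("type") == "function_call_output" for item in input_list):
        return [
            {"type": "function_call", "call_id": "c1", "name": "compute_tax",
             "arguments": json.dumps({"income": 1500000, "regime": "new"})},
            {"type": "function_call", "call_id": "c2", "name": "income_percentile",
             "arguments": json.dumps({"income": 1500000})},
            {"type": "function_call", "call_id": "c3", "name": "search_rules",
             "arguments": json.dumps({"topic": "marginal relief"})},
        ]
    return [{"type": "message", "content": "final structured answer"}]


async def run_agent(input_list, max_hops=4):
    """Loop until the model stops calling tools; returns (hops used, answer)."""
    for hop in range(1, max_hops + 1):
        output = await fake_model(input_list)
        calls = [item for item in output if item["type"] == "function_call"]
        if not calls:
            return hop, output[-1]["content"]
        # Echo the calls back into the input, then run all handlers concurrently.
        input_list.extend(calls)
        results = await asyncio.gather(
            *(HANDLERS[c["name"]](**json.loads(c["arguments"])) for c in calls)
        )
        # Pair each result with its call_id so the model can match them up.
        for call, result in zip(calls, results):
            input_list.append({
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": json.dumps(result),
            })
    raise RuntimeError("agent did not produce a final answer")
```

With this stub the loop finishes on the second hop: hop one yields three calls, hop two yields the answer. The `max_hops` cap is a defensive choice in the sketch so a model that keeps requesting tools cannot loop forever.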
The three tools fire
Each registered tool is a ToolSpec with four fields: a name, a description, a JSON Schema for its parameters, and an async handler. The handler takes an AgentContext (a struct holding request metadata, prior turns, and the collected_chunks field that retrieval tools deposit into) plus the model’s arguments, and returns a dict.
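A ToolSpec along these lines might look like the sketch below. Only the four-field shape and the `collected_chunks` field come from the description above; the other `AgentContext` field names, the `as_openai_tool` helper, and the `compute_tax` tool itself are hypothetical. The slab table, the ₹75,000 standard deduction (which assumes salaried income), and the 4% cess are my assumptions about FY 2025-26 new-regime rules rather than anything stated in this post, though together they do reproduce the ₹97,500 figure. The handler deliberately ignores the 87A rebate, marginal relief, and surcharge, so it is only meaningful for incomes like the ₹15 lakh example.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class AgentContext:
    """Per-request state. Only collected_chunks is named in the post;
    the other field names are guesses."""
    request_meta: dict = field(default_factory=dict)
    prior_turns: list = field(default_factory=list)
    collected_chunks: list = field(default_factory=list)


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    parameters: dict  # JSON Schema for the model-supplied arguments
    handler: Callable[[AgentContext, dict], Awaitable[dict]]

    def as_openai_tool(self) -> dict:
        """Render the flat function-tool shape used by the Responses API."""
        return {"type": "function", "name": self.name,
                "description": self.description, "parameters": self.parameters}


# Assumed FY 2025-26 new-regime slabs: (upper bound in rupees, rate in percent).
NEW_REGIME_SLABS = [(400_000, 0), (800_000, 5), (1_200_000, 10), (1_600_000, 15),
                    (2_000_000, 20), (2_400_000, 25), (float("inf"), 30)]
STANDARD_DEDUCTION = 75_000  # assumption: salaried taxpayer
CESS_PCT = 4                 # health and education cess


async def compute_tax(ctx: AgentContext, args: dict) -> dict:
    """Deterministic slab arithmetic; no rebate, marginal relief, or surcharge."""
    taxable = max(args["income"] - STANDARD_DEDUCTION, 0)
    tax_x100, lower = 0, 0  # integer arithmetic avoids float rounding drift
    for upper, rate_pct in NEW_REGIME_SLABS:
        if taxable > lower:
            tax_x100 += (min(taxable, upper) - lower) * rate_pct
        lower = upper
    return {"tax": round(tax_x100 * (100 + CESS_PCT) / 10_000)}


TAX_TOOL = ToolSpec(
    name="compute_tax",
    description="Compute new-regime income tax for a gross annual income in rupees.",
    parameters={
        "type": "object",
        "properties": {"income": {"type": "integer", "minimum": 0}},
        "required": ["income"],
        "additionalProperties": False,
    },
    handler=compute_tax,
)

# asyncio.run(TAX_TOOL.handler(AgentContext(), {"income": 1_500_000}))
# → {"tax": 97500}
```

Keeping the arithmetic in a plain function like this is what lets the model stay a dispatcher: it supplies `income`, and the number in the answer comes from code that can be unit-tested.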