LLM APIs as Infrastructure: Building Deterministic Systems Around Probabilistic AI

The nature of an AI API: Somebody built this thing before you ever touched it. A lab trained it on an enormous amount of text, aligned it, wrapped it behind an endpoint, and rented it to you. You don’t own the model. You inherit the interface: its capabilities, its limits, its context window, and its cost.

When you send a prompt, the model has two sources of information: what it learned during training and the context you provide. Training is fixed. Context is yours. From those two inputs, it produces a probable continuation. People hear “probable” and think “guessing,” but this isn’t a coin flip. The output is weighted and structured, shaped by the patterns and distributions the model learned and by everything you just told it.

Most of the time, that’s useful. Sometimes, it’s confidently wrong. A wrong answer does not mean there is a bug waiting to be patched. It’s the behavior you have to design around.

Beyond the API call

Traditional APIs train us to think in terms of predictable systems: send a request, expect a response. Even complex systems have repeatable behavior. There may be state management, caching, retries, and rate limits, but your code usually knows what success and failure look like. AI systems break that expectation at the model layer. Same prompt, same context, but different output. Not because something failed. Because that’s how the component works.

The request can succeed, the response can be 200 OK, the JSON can validate, and your logs can look clean, yet the answer can still be wrong. Hallucinations, dropped instructions, valid JSON with incorrect data. Nothing technically failed. The model simply produced the wrong output. That shifts the focus from whether the system responded to whether the response is right.

Traditional APIs usually fail in ways your code already knows how to handle. The request either resolves or rejects, and your code has a branch for each.

try {
  const response = await fetchClaim(id);
  renderClaim(response);
} catch (error) {
  showError(error);
}

Traditional software encourages a simple mental model: Request → Response → Done. AI systems need more checkpoints between the user’s input and the final output, plus a separate quality gate before anything ships at all.

Runtime (every request): User Input → Prompt/Context → LLM → Structured Output → Schema Validation → Business Rules → UI
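
The checkpoints that follow the model call can be sketched as a list of named stages that stops at the first failure. This is a minimal illustration, not a prescribed design: runStages, afterModel, and TODAY are hypothetical names, the stages check only incidentDate, and the model call itself is left out so the sketch stays synchronous.

```typescript
// A stage transforms its input or throws to reject it.
interface Stage {
  name: string;
  run: (input: unknown) => unknown;
}

type PipelineResult =
  | { ok: true; value: unknown }
  | { ok: false; failedStage: string; reason: string };

// Runs stages in order and reports which checkpoint rejected the value.
function runStages(stages: Stage[], input: unknown): PipelineResult {
  let current = input;
  for (const stage of stages) {
    try {
      current = stage.run(current);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      return { ok: false, failedStage: stage.name, reason };
    }
  }
  return { ok: true, value: current };
}

// Assumed fixed date for the example; a real app would inject the current date.
const TODAY = "2026-07-02";

// Hypothetical stages mirroring the arrows after "LLM" in the runtime flow.
const afterModel: Stage[] = [
  { name: "structured-output", run: (raw) => JSON.parse(String(raw)) },
  {
    name: "schema-validation",
    run: (data) => {
      const date = (data as { incidentDate?: unknown } | null)?.incidentDate;
      if (typeof date !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(date)) {
        throw new Error("incidentDate must be a YYYY-MM-DD string");
      }
      return { incidentDate: date };
    },
  },
  {
    name: "business-rules",
    run: (data) => {
      const { incidentDate } = data as { incidentDate: string };
      // YYYY-MM-DD strings sort chronologically, so string comparison is enough.
      if (incidentDate > TODAY) throw new Error("incidentDate is in the future");
      return data;
    },
  },
];
```

Because each failure names its stage, the UI can tell a malformed response from a rule violation and react differently to each.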

Pre-deployment (CI/CD): Test Dataset → Full Pipeline → Eval → Pass/Fail Gate → Deploy

The API call is only one step. The model is only one part. It is not the whole app.
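
The pre-deployment gate can be sketched in a few lines. Everything named here is an assumption for illustration: EvalCase, runEvalGate, the exact-match comparison, and the 0.9 threshold. A real pipeline would call the model asynchronously, and many evals need fuzzier scoring than strict equality.

```typescript
interface ExtractedFields {
  incidentDate: string;
  injuries: string;
}

interface EvalCase {
  input: string;
  expected: ExtractedFields;
}

interface EvalReport {
  passRate: number;
  failures: string[];
  deploy: boolean;
}

// Runs every case through the pipeline and gates deployment on the pass rate.
function runEvalGate(
  cases: EvalCase[],
  pipeline: (input: string) => ExtractedFields,
  threshold: number
): EvalReport {
  const failures: string[] = [];
  for (const testCase of cases) {
    const actual = pipeline(testCase.input);
    const matches =
      actual.incidentDate === testCase.expected.incidentDate &&
      actual.injuries === testCase.expected.injuries;
    if (!matches) failures.push(testCase.input);
  }
  const passRate =
    cases.length === 0 ? 0 : (cases.length - failures.length) / cases.length;
  // An empty dataset proves nothing, so it never opens the gate.
  return { passRate, failures, deploy: cases.length > 0 && passRate >= threshold };
}

// Stand-in for the full pipeline: it always returns the same answer.
const stubPipeline = (_input: string): ExtractedFields => ({
  incidentDate: "2026-07-01",
  injuries: "No",
});

const dataset: EvalCase[] = [
  {
    input: "I got rear-ended yesterday, nobody was hurt.",
    expected: { incidentDate: "2026-07-01", injuries: "No" },
  },
  {
    input: "Someone hit me today and my neck hurts.",
    expected: { incidentDate: "2026-07-02", injuries: "Yes" },
  },
];

const report = runEvalGate(dataset, stubPipeline, 0.9);
```

In CI, a script would typically exit with a non-zero status when report.deploy is false, so the deploy step never runs.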

Where determinism matters

In a probabilistic system, determinism matters when the output becomes data, triggers an action, or changes the state of the application. If a user says, “I got rear-ended yesterday,” the assistant can word its interpretation differently each time. But the submitted incidentDate cannot be different every time. The system needs one resolved value, one validation path, and one record of what was submitted.

Lock these down:

Final form fields: incidentDate, injuries, accidentType
Required fields: whether something is complete or missing
Business rules: whether the form can be submitted
Actions: sending the form, saving a record, approving a claim
Audit/history: what value was used and why
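
As a sketch of what locking down a value can look like, the code below resolves a relative phrase with plain date arithmetic and writes one audit entry, so the same input always yields the same incidentDate. The names (resolveRelativeDate, lockIncidentDate, AuditEntry), the two supported phrases, and the policy that a rule overrides the model's proposal are assumptions for illustration. It also assumes the relative phrase has already been isolated from the user's text.

```typescript
interface AuditEntry {
  field: string;
  value: string;
  source: "rule" | "model";
  reason: string;
}

// Hypothetical rule table: relative phrase to day offset.
const RELATIVE_DAY_OFFSETS = new Map<string, number>([
  ["today", 0],
  ["yesterday", -1],
]);

// Resolves a known phrase against `today` (YYYY-MM-DD) using UTC arithmetic.
// Returns null when no rule applies or `today` is not a valid date.
function resolveRelativeDate(phrase: string, today: string): string | null {
  const offset = RELATIVE_DAY_OFFSETS.get(phrase.trim().toLowerCase());
  if (offset === undefined) return null;
  const date = new Date(`${today}T00:00:00Z`);
  if (Number.isNaN(date.getTime())) return null;
  date.setUTCDate(date.getUTCDate() + offset);
  return date.toISOString().slice(0, 10);
}

// Produces the single value to submit, plus the record of why it was chosen.
function lockIncidentDate(phrase: string, modelValue: string, today: string): AuditEntry {
  const ruleValue = resolveRelativeDate(phrase, today);
  if (ruleValue !== null) {
    const note =
      ruleValue === modelValue ? "model agreed" : `model proposed ${modelValue}`;
    return {
      field: "incidentDate",
      value: ruleValue,
      source: "rule",
      reason: `"${phrase}" resolved against ${today}; ${note}`,
    };
  }
  return {
    field: "incidentDate",
    value: modelValue,
    source: "model",
    reason: "no deterministic rule matched; needs user confirmation",
  };
}

// entry.value is "2026-07-01" no matter how the model phrased its answer.
const entry = lockIncidentDate("yesterday", "2026-07-01", "2026-07-02");
```

The wording around the value can still vary from run to run. What gets stored and submitted does not.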

Let these breathe:

The assistant’s wording
The explanation shown to the user
Suggestions for what to clarify
Summaries or labels that do not trigger action

The boundary in practice

Every step between the model and your user is a layer you own and control. Here’s what that looks like in a real integration. In a form application, the model can help turn a user’s plain language into structured data.

A user might write: “I got rear-ended yesterday. My side mirror broke, but nobody was hurt.”

The application expects this shape:

interface IncidentExtraction {
  formData: {
    incidentDate: string;
    accidentType: string;
    damageDescription: string;
    injuries: "Yes" | "No" | "Unclear";
    notes: string;
  };
  feedback: string[];
  confirmation: string;
}

But the model does not receive a TypeScript interface. It receives instructions and a strict schema.

import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({});

const incidentSchema = {
  type: Type.OBJECT,
  properties: {
    formData: {
      type: Type.OBJECT,
      properties: {
        incidentDate: { type: Type.STRING, description: "Resolved date in YYYY-MM-DD format." },
        accidentType: { type: Type.STRING, description: "Short classification of the incident." },
        damageDescription: { type: Type.STRING, description: "Brief description of the damage." },
        injuries: { type: Type.STRING, enum: ["Yes", "No", "Unclear"], description: "Whether injuries were mentioned." },
        notes: { type: Type.STRING, description: "Any additional relevant details." }
      },
      required: ["incidentDate", "accidentType", "damageDescription", "injuries", "notes"]
    },
    feedback: { type: Type.ARRAY, items: { type: Type.STRING }, description: "User-facing notes about assumptions or missing details." },
    confirmation: { type: Type.STRING, description: "Short message asking the user to review the filled form." }
  },
  required: ["formData", "feedback", "confirmation"]
};

const today = "2026-07-02";
const prompt = `
  Today is ${today}. Extract incident details from the user's description.
  User description: "I got rear-ended yesterday. My side mirror broke, but nobody was hurt."
  Rules:
  - Resolve relative dates using today's date.
  - If a field is unclear, use "Unclear" or an empty string.
  - Do not submit the form.
`;

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: prompt,
  config: {
    responseMimeType: "application/json",
    responseSchema: incidentSchema,
    temperature: 0.1 // Lower temperature for more consistent extractions
  }
});

The schema shapes the output. It does not guarantee it. Everything after this call is your code.
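
A minimal sketch of that code, under stated assumptions: parseExtraction, checkBusinessRules, and the specific rules (no future dates, "Unclear" injuries need confirmation) are hypothetical, and the raw string is assumed to be the text of the model's response. The point is that the application re-checks the shape itself instead of trusting the schema, then applies rules the model never sees.

```typescript
// Repeated from the interface above so this block is self-contained.
interface IncidentExtraction {
  formData: {
    incidentDate: string;
    accidentType: string;
    damageDescription: string;
    injuries: "Yes" | "No" | "Unclear";
    notes: string;
  };
  feedback: string[];
  confirmation: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isInjuries(value: unknown): value is "Yes" | "No" | "Unclear" {
  return value === "Yes" || value === "No" || value === "Unclear";
}

// Schema validation: returns null unless the text parses and matches the shape.
function parseExtraction(raw: string): IncidentExtraction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data)) return null;
  const form = data["formData"];
  const feedback = data["feedback"];
  const confirmation = data["confirmation"];
  if (!isRecord(form) || typeof confirmation !== "string") return null;
  if (!Array.isArray(feedback) || !feedback.every((item) => typeof item === "string")) {
    return null;
  }
  const incidentDate = form["incidentDate"];
  const accidentType = form["accidentType"];
  const damageDescription = form["damageDescription"];
  const injuries = form["injuries"];
  const notes = form["notes"];
  if (
    typeof incidentDate !== "string" ||
    typeof accidentType !== "string" ||
    typeof damageDescription !== "string" ||
    typeof notes !== "string" ||
    !isInjuries(injuries)
  ) {
    return null;
  }
  return {
    formData: { incidentDate, accidentType, damageDescription, injuries, notes },
    feedback,
    confirmation,
  };
}

// True only for a real calendar date written as YYYY-MM-DD.
function isRealIsoDate(text: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(text)) return false;
  const date = new Date(`${text}T00:00:00Z`);
  return !Number.isNaN(date.getTime()) && date.toISOString().slice(0, 10) === text;
}

// Business rules: an empty list means the form may be submitted.
function checkBusinessRules(extraction: IncidentExtraction, today: string): string[] {
  const problems: string[] = [];
  const { incidentDate, injuries } = extraction.formData;
  if (!isRealIsoDate(incidentDate)) {
    problems.push("incidentDate is not a real YYYY-MM-DD date");
  } else if (incidentDate > today) {
    problems.push("incidentDate is in the future");
  }
  if (injuries === "Unclear") problems.push("injuries must be confirmed by the user");
  return problems;
}
```

With the @google/genai SDK, the raw string would most likely come from response.text, which can be undefined and should be checked first. A null result or a non-empty problem list is the signal to ask the user for clarification and to keep the form from being submitted.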