Stop Shipping AI Slop: Build an Anti-Slop Harness Around Your LLM
“AI slop” is not a model problem. It’s an engineering problem you decided not to solve. The slop is the bland, off-voice, half-hallucinated, occasionally-just-an-error-message text that your LLM emits maybe 5% of the time, and that 5% is the part users screenshot.
The instinct is to fix it in the prompt: add three more sentences of “be concise, be accurate, match my tone.” That treats a stochastic system as if it were deterministic. It isn’t. You cannot prompt your way to a guarantee.
What actually works is treating the model like any other unreliable upstream dependency: wrap it in a harness that validates, rejects, and retries before anything reaches a user. The model proposes; the harness disposes. Here’s how to build one.
Slop is a systems problem, not a prompt problem
Every production LLM feature I’ve shipped converged on the same shape: the model is one stage in a pipeline, not the pipeline itself. You don’t trust raw generation any more than you’d trust raw user input. You parse it, you validate it against constraints you can express in code, and you reject anything that fails, automatically, before a human ever sees it.
The key insight is that most slop is detectable. Empty output, a leaked stack trace, the wrong language, a 900-word answer when you asked for 200, a banned phrase like “in today’s fast-paced world”: all of these are checkable with deterministic code. You don’t need a judge model to catch them (though a judge model has its place at the end). You need a gate that runs on every generation, costs microseconds, and never gets tired. Think of it as five layers, each rejecting a different class of failure.
Layer 1: Structured output, not freeform text
The single biggest reduction in slop comes from refusing to accept prose where you can demand structure. If you ask for a JSON object with named fields and a schema, the failure modes collapse from “infinite” to “a handful you can enumerate.” Use the provider’s native structured-output / tool-calling mode and attach a real schema: Pydantic, Zod, JSON Schema, whatever your stack speaks.
This does two things. First, it forces the model to commit to a shape, which kills rambling preambles (“Sure! Here’s a great answer for you…”). Second, it gives you a parse step that fails loudly. If the model returns something that doesn’t validate, that’s not a soft warning — it’s a rejected generation that triggers a retry. A parse failure is a quality signal, not an exception to swallow. The corollary: never try/except: pass around your parser. A swallowed parse error is slop with the lights turned off.
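As a minimal sketch of a parse step that fails loudly, the snippet below uses only the standard library so it runs anywhere; in a real stack a Pydantic or Zod schema would likely replace the hand-written checks. The `Summary` shape, the `RejectedGeneration` exception, and `parse_summary` are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass


class RejectedGeneration(Exception):
    """Raised when model output does not match the expected shape."""


@dataclass(frozen=True)
class Summary:
    # Hypothetical schema, for illustration only.
    title: str
    bullets: list[str]


def parse_summary(raw: str) -> Summary:
    """Parse raw model output, or raise RejectedGeneration with a specific reason."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        # A chatty preamble ("Sure! Here's...") lands here instead of reaching a user.
        raise RejectedGeneration(f"not valid JSON: {e.msg}") from e
    if not isinstance(data, dict):
        raise RejectedGeneration("top-level value is not an object")
    title, bullets = data.get("title"), data.get("bullets")
    if not isinstance(title, str) or not title.strip():
        raise RejectedGeneration("title: missing or empty")
    if (
        not isinstance(bullets, list)
        or not bullets
        or not all(isinstance(b, str) for b in bullets)
    ):
        raise RejectedGeneration("bullets: expected a non-empty list of strings")
    return Summary(title=title, bullets=bullets)
```

The caller is expected to catch `RejectedGeneration` and turn its message into retry feedback, never to swallow it.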
Layer 2: Reject error strings the model smuggles through
This one surprises people. Models are trained on the entire internet, which includes a lot of error messages, apology boilerplate, and refusal language. Under pressure — ambiguous input, a retrieval miss, a truncated context — the model will sometimes emit text that is syntactically valid but semantically garbage: “I’m sorry, I cannot access that file,” “Error: undefined,” “As an AI language model, I don’t have the ability to…,” or a half-rendered template with {{variable}} still in it.
Structured output won’t catch these, because they fit the schema fine. You need an explicit denylist of error-shaped strings and patterns, checked against every field. It’s crude and it works. Maintain it like you maintain a spam filter: every time a new flavor of garbage reaches production, it earns a line in the rejection list.
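One way to apply a denylist to every field, rather than to a single body string, is sketched below. The patterns are illustrative assumptions, not a vetted list, and `scan_fields` is a hypothetical helper that takes an already-parsed record as a plain dict.

```python
import re

# Illustrative denylist; grow your own from failures that actually reach production.
DENYLIST = [re.compile(p, re.IGNORECASE) for p in (
    r"as an ai language model",
    r"i['’]?m sorry, (?:but )?i (?:cannot|can['’]t)",
    r"\berror:\s",
    r"\{\{.*?\}\}",  # unrendered template tokens
)]


def scan_fields(record: dict[str, object]) -> list[str]:
    """Check every string field (and strings inside list fields) against the denylist."""
    hits = []
    for name, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            if not isinstance(v, str):
                continue  # numbers, booleans, and nested objects are skipped in this sketch
            for pat in DENYLIST:
                if pat.search(v):
                    hits.append(f"{name}: matched /{pat.pattern}/")
    return hits
```

Reporting the field name alongside the pattern keeps the rejection specific enough to feed back into a retry.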
Layer 3: Voice and constraint checks
This is where you encode the things that make output yours rather than generic. Most of it is deterministic and cheap:
- Length bounds. A word or token range per field. Reject the 900-word answer and the one-liner.
- Banned phrases. The motivational-closer clichés, the “delve,” the emoji clusters, the corporate hedging. A regex pass catches all of them.
- Required language. If you build bilingual TR/EN tooling like I do, you check that a Turkish response is actually in Turkish; a quick script-ratio or language-ID check catches the model code-switching mid-paragraph.
- Format invariants. Markdown headings present, no leaked system-prompt fragments, no placeholder tokens.
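The checks in this list can be sketched as one small function. The stopword lists below are a deliberately crude stand-in for a real language-ID library, and the threshold, placeholder patterns, and function names are all assumptions made for illustration.

```python
import re

# Tiny stopword lists: an assumption for illustration, not a real language-ID model.
TR_WORDS = {"ve", "bir", "bu", "için", "ile", "de", "da", "ama", "çok", "değil"}
EN_WORDS = {"the", "and", "is", "of", "to", "in", "that", "for", "with", "this"}
PLACEHOLDER = re.compile(r"\{\{.*?\}\}|\[(?:TODO|INSERT[^\]]*)\]|lorem ipsum", re.IGNORECASE)


def turkish_ratio(text: str) -> float:
    """Share of recognised stopwords that are Turkish; 0.0 if none are recognised."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    tr = sum(t in TR_WORDS for t in tokens)
    en = sum(t in EN_WORDS for t in tokens)
    return tr / (tr + en) if (tr + en) else 0.0


def voice_checks(text: str, min_words: int, max_words: int,
                 min_tr_ratio: float = 0.8) -> list[str]:
    """Length bounds, placeholder tokens, and a rough Turkish-language check."""
    fails = []
    n = len(text.split())
    if not (min_words <= n <= max_words):
        fails.append(f"length: {n} words, expected {min_words}-{max_words}")
    if PLACEHOLDER.search(text):
        fails.append("placeholder token present")
    ratio = turkish_ratio(text)
    if ratio < min_tr_ratio:
        fails.append(f"language: Turkish stopword ratio {ratio:.2f} below {min_tr_ratio}")
    return fails
```

A text with no recognised stopwords is rejected here by design, on the view that an unverifiable language is safer to retry than to ship; a production check would more plausibly score each paragraph separately so a single switched paragraph cannot hide in a long answer.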
Here’s the core of a harness that strings these layers together with a bounded retry loop.
```python
import re

from pydantic import BaseModel, ValidationError


class Article(BaseModel):
    title: str
    body: str


# Patterns that look like errors, refusals, or unrendered templates.
ERROR_SHAPES = [
    r"as an ai language model",
    r"i (?:cannot|can't|am unable to) (?:access|comply)",
    r"\berror:\s",
    r"\b(?:undefined|null)\b",  # word boundaries on both sides
    r"\{\{.*?\}\}",  # leaked template tokens
]
# Accept both straight and curly apostrophes in "today's".
BANNED_PHRASES = [r"in today['’]s fast-paced", r"delve into", r"unleash the power"]


def gate(text: str) -> list[str]:
    """Deterministic checks. Returns a list of failures (empty == pass)."""
    fails = []
    if not text.strip():
        fails.append("empty output")
    words = len(text.split())
    if not (200 <= words <= 800):
        fails.append(f"length out of bounds: {words} words")
    for pat in ERROR_SHAPES:
        if re.search(pat, text, re.I):
            fails.append(f"error-shaped string: /{pat}/")
    for pat in BANNED_PHRASES:
        if re.search(pat, text, re.I):
            fails.append(f"banned phrase: /{pat}/")
    return fails


def generate(client, prompt: str, max_attempts: int = 3) -> Article:
    last_fails: list[str] = []
    for _ in range(max_attempts):
        # Name the specific defects from the previous attempt, if any.
        feedback = "" if not last_fails else (
            "\n\nYour previous output was rejected for: "
            + "; ".join(last_fails)
            + ". Fix these and return only the schema."
        )
        # `client.structured` stands in for your provider's native structured mode.
        raw = client.structured(prompt + feedback, schema=Article)
        try:
            article = Article.model_validate(raw)
        except ValidationError as e:
            last_fails = [f"schema: {e.error_count()} errors"]
            continue
        last_fails = gate(article.body)
        if not last_fails:
            return article
    raise RuntimeError(f"slop after {max_attempts} attempts: {last_fails}")
```
Notice what the harness does on rejection: it feeds the specific failures back into the next attempt. The model is far better at fixing a named defect than at avoiding an abstract one. And notice the loop is bounded: after max_attempts it raises an error.
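To see the feedback loop work without a provider, here is a stripped-down sketch with a scripted stand-in for the model. `retry_with_feedback`, `fake_model`, and `no_cliches` are hypothetical names; the fake model "fixes" its output only once the defect is named in the prompt, which is an idealised version of the behaviour described above.

```python
from typing import Callable


def retry_with_feedback(
    call_model: Callable[[str], str],
    check: Callable[[str], list[str]],
    prompt: str,
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Return (accepted_output, attempts_used), or raise after max_attempts."""
    fails: list[str] = []
    for attempt in range(1, max_attempts + 1):
        feedback = ""
        if fails:
            feedback = "\n\nYour previous output was rejected for: " + "; ".join(fails)
        output = call_model(prompt + feedback)
        fails = check(output)
        if not fails:
            return output, attempt
    raise RuntimeError(f"slop after {max_attempts} attempts: {fails}")


def fake_model(prompt: str) -> str:
    # Scripted stand-in: drops the cliché only when the rejection reason is in the prompt.
    if "banned phrase" in prompt:
        return "Shipping fast requires a validation gate."
    return "In today's fast-paced world, shipping fast requires a gate."


def no_cliches(text: str) -> list[str]:
    return ["banned phrase: fast-paced"] if "fast-paced" in text.lower() else []


if __name__ == "__main__":
    text, attempts = retry_with_feedback(fake_model, no_cliches, "Write one sentence about shipping.")
    print(attempts, text)
```

What the caller does when the `RuntimeError` fires (fall back, queue for review, show nothing) is a product decision that this sketch leaves open.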