From Regex to AST: Building Taint Tracking for AI Agent Code
AgentGuard v0.5.0 ships AST-based taint tracking. This post explains how it works and why it matters.
The Regex Ceiling
Regex catches obvious patterns: prompt = f"You are helpful. {user_input}". A regex rule sees f"..." with {user_input} and flags it. Done.
But regex cannot track this:
query = request.json.get("query")
processed = query.strip().upper()
template = "Answer: {q}"
prompt = template.format(q=processed)
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)
The taint flows: request.json -> query -> processed -> template.format() -> prompt -> openai call. That is four assignments before the call. Regex sees each line independently and cannot connect them.
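To make the ceiling concrete, here is a small sketch of a line-by-line regex check. The pattern FSTRING_RULE and the helper regex_flags are hypothetical and written for this illustration; they are not AgentGuard's actual rules.

```python
import re

# Hypothetical rule: flag an f-string that interpolates {user_input}.
FSTRING_RULE = re.compile(r'f"[^"]*\{\s*user_input\s*\}[^"]*"')

obvious = 'prompt = f"You are helpful. {user_input}"'

multi_hop = '''query = request.json.get("query")
processed = query.strip().upper()
template = "Answer: {q}"
prompt = template.format(q=processed)
'''


def regex_flags(source: str) -> list[int]:
    """Return 1-based line numbers where the rule matches, one line at a time."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if FSTRING_RULE.search(line)
    ]


print(regex_flags(obvious))    # [1]
print(regex_flags(multi_hop))  # []
```

The second result is empty because no single line contains both the untrusted value and the prompt construction.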
AST to the Rescue
Python’s ast module parses source code into a syntax tree. We can walk that tree and track how data flows.
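As a quick illustration of what the tree exposes, the sketch below parses one line from the earlier example with the standard-library ast module and lists the name being assigned and the names being read. The variable names target_names and read_names are chosen for this example only.

```python
import ast

source = "prompt = template.format(q=processed)"
tree = ast.parse(source)

# The first (and only) statement is an assignment node.
assign = tree.body[0]

# Names on the left-hand side of the assignment.
target_names = [t.id for t in assign.targets if isinstance(t, ast.Name)]

# Names that appear anywhere in the right-hand side expression.
read_names = sorted(
    node.id for node in ast.walk(assign.value) if isinstance(node, ast.Name)
)

print(type(assign).__name__, target_names, read_names)
# Assign ['prompt'] ['processed', 'template']
```

Knowing which names an assignment reads and which it writes is the basic fact a taint tracker needs from each statement.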
Step 1: Identify Sources
A “source” is any expression that produces untrusted data:
SOURCE_PATTERNS = {
    "user_input", "user_msg", "user_message", "request", "req", "query", "message", "msg",
}
Plus attribute access patterns: request.args.get("q"), request.json["key"], input(). In AST terms, we check ast.Name nodes against the source set, and ast.Call nodes for request.args.get patterns.
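A source check along these lines might look like the following minimal sketch. The function is_source and the helper expr are hypothetical names for this illustration, and the real rule may match request.args.get more precisely than this name-based walk does.

```python
import ast

SOURCE_PATTERNS = {
    "user_input", "user_msg", "user_message", "request", "req", "query", "message", "msg",
}


def is_source(node: ast.AST) -> bool:
    """Return True if the expression mentions a source name or calls input()."""
    for sub in ast.walk(node):
        # A bare name such as user_input, or the root of request.args.get(...).
        if isinstance(sub, ast.Name) and sub.id in SOURCE_PATTERNS:
            return True
        # A direct call to the built-in input().
        if (
            isinstance(sub, ast.Call)
            and isinstance(sub.func, ast.Name)
            and sub.func.id == "input"
        ):
            return True
    return False


def expr(code: str) -> ast.AST:
    """Parse a single expression and return its root node."""
    return ast.parse(code, mode="eval").body


print(is_source(expr('request.args.get("q")')))  # True
print(is_source(expr('request.json["key"]')))    # True
print(is_source(expr("input()")))                # True
print(is_source(expr('"You are helpful."')))     # False
```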
Step 2: Track Propagation
When a source is assigned to a variable, that variable becomes tainted. But taint also propagates through:
- Method calls: processed = user_input.strip() leaves processed tainted.
- F-strings: prompt = f"Hello {user_input}" makes prompt tainted.
- .format(): prompt = template.format(q=query) makes prompt tainted if query is.
- String concatenation: prompt = "Hello " + user_input makes prompt tainted.
- List/dict construction: messages = [{"role": "user", "content": user_input}] makes messages tainted.
The tracker walks assignments in order, maintaining a tainted_vars dict. When it sees x = tainted_expr, it adds x to the dict. When it sees x = safe_expr, it removes x.
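The propagation step can be sketched as below. This is a simplified illustration, not AgentGuard's code: track and is_tainted are hypothetical names, SOURCE_NAMES is a trimmed set chosen for the example, and only top-level assignments to plain names are handled.

```python
import ast

# Trimmed, illustrative source set (assumption for this sketch).
SOURCE_NAMES = {"user_input", "request"}


def is_tainted(node: ast.AST, tainted_vars: dict) -> bool:
    """An expression is tainted if it reads a source name or a tainted variable."""
    return any(
        isinstance(sub, ast.Name)
        and (sub.id in SOURCE_NAMES or sub.id in tainted_vars)
        for sub in ast.walk(node)
    )


def track(source: str) -> dict:
    """Walk top-level assignments in order; map tainted names to their line."""
    tainted_vars = {}
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue
        tainted = is_tainted(stmt.value, tainted_vars)
        for target in stmt.targets:
            if not isinstance(target, ast.Name):
                continue
            if tainted:
                tainted_vars[target.id] = stmt.lineno  # x = tainted_expr
            else:
                tainted_vars.pop(target.id, None)      # x = safe_expr
    return tainted_vars


sample = '''query = request.json.get("query")
processed = query.strip().upper()
template = "Answer: {q}"
prompt = template.format(q=processed)
'''

print(track(sample))
# {'query': 1, 'processed': 2, 'prompt': 4}
```

Because the check walks the whole right-hand expression, method calls, f-strings, .format(), concatenation, and list/dict literals are all covered by the same rule in this sketch.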
Step 3: Identify Sinks
A “sink” is where tainted data reaches an LLM:
- Variable assignment: prompt = <tainted> or messages = [<tainted>]
- Function call: openai.chat.completions.create(messages=<tainted>)
When the tracker sees a tainted expression reaching a sink, it fires a finding.
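For the function-call case, a sink check could be sketched as follows. SINK_SUFFIXES, dotted_name, and find_sink_hits are hypothetical names for this example, and the set of tainted variables is passed in by hand rather than computed.

```python
import ast

# Hypothetical sink list: calls whose dotted name ends with one of these.
SINK_SUFFIXES = ("chat.completions.create",)


def dotted_name(node: ast.AST):
    """Turn an attribute chain like a.b.c into 'a.b.c', or None if not a plain chain."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_sink_hits(source: str, tainted_vars: set) -> list:
    """Report (line, call name, tainted names) for each tainted sink argument."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name is None or not name.endswith(SINK_SUFFIXES):
            continue
        # Check positional and keyword arguments alike.
        for arg in list(node.args) + [kw.value for kw in node.keywords]:
            used = {n.id for n in ast.walk(arg) if isinstance(n, ast.Name)}
            hit = sorted(used & tainted_vars)
            if hit:
                findings.append((node.lineno, name, hit))
    return findings


call_site = '''response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
'''

print(find_sink_hits(call_site, {"prompt"}))
# [(1, 'openai.chat.completions.create', ['prompt'])]
```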
Step 4: Sanitizers
Not all transformations preserve taint. Some explicitly make data safe: safe = str(user_input)[:100]. The tracker treats str(), int(), float(), len(), and explicit escape functions as sanitizers. When data passes through a sanitizer, the taint is removed.
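One way to model this is a recursive taint check that stops at sanitizer calls. The sketch below assumes a sanitizer is a direct call to one of the listed built-ins; SANITIZERS and is_tainted are illustrative names, and the explicit escape functions mentioned above are left out because the post does not name them.

```python
import ast

SANITIZERS = {"str", "int", "float", "len"}


def is_tainted(node: ast.AST, tainted_vars: set) -> bool:
    """Recursive check that treats a sanitizer call as a taint barrier."""
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in SANITIZERS
    ):
        return False  # everything inside the sanitizer call is considered clean
    if isinstance(node, ast.Name):
        return node.id in tainted_vars
    return any(
        is_tainted(child, tainted_vars) for child in ast.iter_child_nodes(node)
    )


def expr(code: str) -> ast.AST:
    """Parse a single expression and return its root node."""
    return ast.parse(code, mode="eval").body


tainted = {"user_input"}
print(is_tainted(expr("str(user_input)[:100]"), tainted))  # False
print(is_tainted(expr("user_input.strip()"), tainted))     # True
print(is_tainted(expr('"Hello " + user_input'), tainted))  # True
```

Note that user_input.strip() stays tainted because strip is a method on the tainted value, not a call to a listed sanitizer.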
What It Catches (That Regex Cannot)
- Multi-hop flow: 4 variable assignments leading to an LLM call.
- Template .format() with named args: Correctly tracks taint through .format().
- Messages array with tainted content: Detects taint inside nested list/dict structures.
What It Does Not Flag (Correctly)
- Sanitized input: str(user_input)[:100] is recognized as safe.
- Hardcoded prompt: No taint source, so no alert.
Limitations
- Python only: JavaScript/TypeScript support is on the roadmap.
- Intra-file only: No interprocedural analysis yet.
- No control flow: If/else branches are not tracked separately.
- Conservative sanitizers: str() is treated as a sanitizer, but may not be safe in all contexts.
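To see why the control-flow limitation matters, consider the sketch below. It assumes a tracker that applies assignments strictly in source order, ignoring which branch they sit in; linear_taint is a hypothetical stand-in written for this example and may not match how AgentGuard behaves on the same input.

```python
import ast


def linear_taint(source: str, sources=frozenset({"user_input"})) -> set:
    """Apply every assignment in line order, ignoring if/else structure."""
    tainted = set()
    assigns = sorted(
        (n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Assign)),
        key=lambda n: n.lineno,
    )
    for stmt in assigns:
        names = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
        value_is_tainted = bool(names & (sources | tainted))
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                if value_is_tainted:
                    tainted.add(target.id)
                else:
                    tainted.discard(target.id)
    return tainted


branchy = '''if is_admin:
    text = user_input
else:
    text = "static greeting"
prompt = "Hello " + text
'''

print(linear_taint(branchy))  # set()
```

The later safe assignment in the else branch overwrites the taint from the if branch, so the flow through text into prompt goes unreported even though it can happen at runtime. Tracking each branch separately and merging the results would avoid this.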
Try It
pip install --upgrade dfx-agentguard
agentguard src/ --format text
The taint tracking rule (ASI01-TAINT-TRACK) runs alongside existing regex rules. Both layers work together: regex for speed, AST for precision. AgentGuard is MIT-licensed. v0.5.0 includes 38 tests and a 32-sample benchmark with 100% detection rate.