From Regex to AST: Building Taint Tracking for AI Agent Code

AgentGuard v0.5.0 ships AST-based taint tracking. This post explains how it works and why it matters.

The Regex Ceiling

Regex catches obvious patterns: prompt = f"You are helpful. {user_input}". A regex rule sees f"..." with {user_input} and flags it. Done.

But regex cannot track this:

query = request.json.get("query")
processed = query.strip().upper()
template = "Answer: {q}"
prompt = template.format(q=processed)
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

The taint flows: request.json -> query -> processed -> template.format() -> prompt -> openai call. Four assignments sit between the source and the sink. Regex sees each line independently and cannot connect them.
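To make the ceiling concrete, here is a minimal sketch of a line-based regex rule run against both snippets. The pattern FSTRING_RULE and the helper regex_hits are hypothetical illustrations, not AgentGuard's actual rule.

```python
import re

# Hypothetical rule: an f-string assigned to `prompt` that interpolates a
# name starting with "user". This is an illustration, not AgentGuard's pattern.
FSTRING_RULE = re.compile(r'prompt\s*=\s*f"[^"]*\{user\w*\}')

OBVIOUS = 'prompt = f"You are helpful. {user_input}"'

MULTI_HOP = '''\
query = request.json.get("query")
processed = query.strip().upper()
template = "Answer: {q}"
prompt = template.format(q=processed)
'''


def regex_hits(source: str) -> list[int]:
    """Return the 1-based line numbers where the rule matches."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if FSTRING_RULE.search(line)
    ]


obvious_hits = regex_hits(OBVIOUS)      # the direct f-string is flagged
multi_hop_hits = regex_hits(MULTI_HOP)  # no single line looks suspicious

print(obvious_hits, multi_hop_hits)
```

The second snippet carries the same untrusted data into the prompt, but no individual line matches, because the rule has no memory of earlier assignments.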

AST to the Rescue

Python’s ast module parses source code into a syntax tree. We can walk that tree and track how data flows.
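As a quick illustration of what the tree exposes, the following sketch parses a single assignment with the standard library ast module and pulls out the pieces a taint tracker cares about. The variable names are only for this example.

```python
import ast

SOURCE = "prompt = template.format(q=processed)"

tree = ast.parse(SOURCE)
assign = tree.body[0]                  # ast.Assign node for the whole statement

target_name = assign.targets[0].id     # name being assigned: "prompt"
call = assign.value                    # ast.Call node for template.format(...)
method = call.func.attr                # method name: "format"
receiver = call.func.value.id          # object the method is called on: "template"

# Keyword arguments map the template field to the variable that fills it.
keyword_args = {kw.arg: kw.value.id for kw in call.keywords}

# ast.walk visits every node, which is how a tracker inspects nested expressions.
node_types = [type(node).__name__ for node in ast.walk(tree)]

print(target_name, receiver, method, keyword_args)
```

Unlike a regex match, each of these values is tied to its syntactic role, so the tracker knows that processed is an argument flowing into prompt rather than just a word on the same line.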

Step 1: Identify Sources

A “source” is any expression that produces untrusted data:

SOURCE_PATTERNS = {
    "user_input", "user_msg", "user_message", "request", "req", "query", "message", "msg",
}

Beyond bare names, sources also include attribute-access and call patterns such as request.args.get("q"), request.json["key"], and input(). In AST terms, we check ast.Name nodes against the source set, and ast.Call nodes for request.args.get patterns.
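A source check along these lines could be sketched as follows. The function is_source and the helper expr are hypothetical, and treating any attribute, subscript, or call chain rooted at a source name as a source is an assumption made for this example, not a description of AgentGuard's internals.

```python
import ast

SOURCE_NAMES = {
    "user_input", "user_msg", "user_message", "request", "req", "query", "message", "msg",
}


def is_source(node: ast.AST) -> bool:
    """Return True if the expression looks like it produces untrusted data."""
    # Bare name such as user_input or request.
    if isinstance(node, ast.Name):
        return node.id in SOURCE_NAMES
    if isinstance(node, ast.Call):
        # Built-in input() reads directly from the user.
        if isinstance(node.func, ast.Name) and node.func.id == "input":
            return True
        # Method call: inspect what the method is called on.
        return is_source(node.func)
    # request.args / request.json["key"]: follow the chain down to its root name.
    if isinstance(node, (ast.Attribute, ast.Subscript)):
        return is_source(node.value)
    return False


def expr(code: str) -> ast.AST:
    """Parse a single expression and return its AST node."""
    return ast.parse(code, mode="eval").body


for snippet in ['request.args.get("q")', 'request.json["key"]', "input()", 'config.get("x")']:
    print(snippet, is_source(expr(snippet)))
```

Name-based matching is cheap but coarse: any variable that happens to be called query or msg is treated as untrusted, which favors catching flows over avoiding false positives.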

Step 2: Track Propagation

When a source is assigned to a variable, that variable becomes tainted. But taint also propagates through:

  • Method calls: after processed = user_input.strip(), processed is still tainted.
  • F-strings: after prompt = f"Hello {user_input}", prompt is tainted.
  • .format(): after prompt = template.format(q=query), prompt is tainted if query is.
  • String concatenation: after prompt = "Hello " + user_input, prompt is tainted.
  • List/dict construction: after messages = [{"role": "user", "content": user_input}], messages is tainted.

The tracker walks assignments in order, maintaining a tainted_vars dict. When it sees x = tainted_expr, it adds x to the dict. When it sees x = safe_expr, it removes x.
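The walk described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the names track and expr_is_tainted are hypothetical, only top-level assignments to plain names are handled, the source set is shortened, and sanitizers are ignored. An expression counts as tainted if any name inside it is a source or is already in tainted_vars, which covers method calls, f-strings, .format(), concatenation, and list/dict literals in one rule.

```python
import ast

SOURCE_NAMES = {"user_input", "request", "query"}  # shortened set for the example


def expr_is_tainted(node: ast.AST, tainted_vars: dict[str, int]) -> bool:
    """True if any name inside the expression is a source or already tainted."""
    return any(
        isinstance(sub, ast.Name)
        and (sub.id in SOURCE_NAMES or sub.id in tainted_vars)
        for sub in ast.walk(node)
    )


def track(source: str) -> dict[str, int]:
    """Walk top-level assignments in order; map tainted names to their line."""
    tainted_vars: dict[str, int] = {}
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue
        tainted = expr_is_tainted(stmt.value, tainted_vars)
        for target in stmt.targets:
            if not isinstance(target, ast.Name):
                continue  # attribute and tuple targets are skipped in this sketch
            if tainted:
                tainted_vars[target.id] = stmt.lineno
            else:
                tainted_vars.pop(target.id, None)  # reassigned to a safe value
    return tainted_vars


CODE = '''\
query = request.json.get("query")
processed = query.strip().upper()
template = "Answer: {q}"
prompt = template.format(q=processed)
greeting = f"Hello {processed}"
messages = [{"role": "user", "content": prompt}]
processed = "static text"
'''

result = track(CODE)
print(result)  # {'query': 1, 'prompt': 4, 'greeting': 5, 'messages': 6}
```

Note that processed drops out of the result because its last assignment is a string literal, while template never enters it because its value contains no tainted name.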

Step 3: Identify Sinks

A “sink” is where tainted data reaches an LLM:

  • Variable assignment: prompt = <tainted> or messages = [<tainted>]
  • Function call: openai.chat.completions.create(messages=<tainted>)

When the tracker sees a tainted expression reaching a sink, it fires a finding.

Step 4: Sanitizers

Not all transformations preserve taint. Some explicitly make data safe: safe = str(user_input)[:100]. The tracker treats str(), int(), float(), len(), and explicit escape functions as sanitizers. When data passes through a sanitizer, the taint is removed.
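One way to model this rule is a recursive check that stops descending as soon as it meets a sanitizer call. The sketch below is an assumption about how such a check could be written, not AgentGuard's code: is_tainted, expr, and the fixed TAINTED set are hypothetical, and escape functions are left out.

```python
import ast

SANITIZERS = {"str", "int", "float", "len"}  # built-ins named in the post
TAINTED = {"user_input"}                     # assumed already-tainted names


def is_tainted(node: ast.AST) -> bool:
    """Recursively check an expression, treating sanitizer calls as clean."""
    # A call to a sanitizer yields clean data regardless of its arguments.
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in SANITIZERS
    ):
        return False
    if isinstance(node, ast.Name):
        return node.id in TAINTED
    # Otherwise the expression is tainted if any sub-expression is.
    return any(is_tainted(child) for child in ast.iter_child_nodes(node))


def expr(code: str) -> ast.AST:
    """Parse a single expression and return its AST node."""
    return ast.parse(code, mode="eval").body


for snippet in ["str(user_input)[:100]", "user_input.strip()", 'f"Hi {user_input}"']:
    print(snippet, is_tainted(expr(snippet)))
```

The recursion matters: in "Hello " + str(user_input) the sanitized operand stays clean, whereas a flat search for tainted names anywhere in the expression would still flag it.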

What It Catches (That Regex Cannot)

  • Multi-hop flow: four variable assignments leading to an LLM call.
  • Template .format() with named args: taint is tracked through the keyword arguments of .format().
  • Messages array with tainted content: taint is detected inside nested list/dict structures.

What It Does Not Flag (Correctly)

  • Sanitized input: str(user_input)[:100] is recognized as safe.
  • Hardcoded prompt: No taint source, so no alert.

Limitations

  • Python only: JavaScript/TypeScript support is on the roadmap.
  • Intra-file only: No interprocedural analysis yet.
  • No control flow: If/else branches are not tracked separately.
  • Permissive sanitizers: str() is treated as a sanitizer even though it returns a string argument unchanged, so it may not be safe in all contexts.

Try It

pip install --upgrade dfx-agentguard
agentguard src/ --format text

The taint tracking rule (ASI01-TAINT-TRACK) runs alongside existing regex rules. Both layers work together: regex for speed, AST for precision. AgentGuard is MIT-licensed. v0.5.0 includes 38 tests and a 32-sample benchmark with 100% detection rate.