How Do You Trust an AI Agent With Your Money? You Don't — You Check Its Receipt

Cryptographically verifiable agent behavior: swap, edit, or forge a step and it’s rejected.

TL;DR: As we let AI agents do real things (issue refunds, move data, call APIs), “just trust it” stops being good enough. The fix: the agent hands you a tamper-evident receipt that proves it followed the approved rules and didn’t fake anything. I built a demo: change the rules, edit a step, or fake the signature, and the check fails every time. It is about 120 lines of ordinary, well-known crypto, with no API key needed.

The scary question: You’re about to let an agent issue refunds, move files, or hit your production APIs. How do you actually know it followed the rules you approved — and not some changed version? And how do you know the log it gives you afterward wasn’t edited? Right now, the honest answer is usually: you don’t. You trust the logs. But logs can be edited, the rules an agent runs can be quietly swapped, and a compromised agent can claim it did one thing while doing another.

The 2026 fix is called verifiable agent behavior (the closest research term is “zkML”, short for zero-knowledge machine learning): the agent produces a tamper-evident receipt that proves it ran exactly the approved process, and anyone can check that receipt without having to trust the agent.

The 10-second version:

  • What happened | Result
  • Agent ran the approved refund rules, honestly | ✅ ACCEPT
  • Someone swapped in sneaky “refund anything” rules | 🚨 REJECT: rules do not match the approved set
  • Someone edited a step (turned a $40 refund into $5000) | 🚨 REJECT: receipt does not reconcile
  • Someone faked the receipt without the secret key | 🚨 REJECT: signature invalid

Only the honest run passes. Every kind of cheating gets caught.

How it works (in plain terms): Three normal building blocks, no magic:

  1. A fingerprint of the approved rules. Run the rules through a hashing function and you get a short fingerprint that is, for all practical purposes, unique. Anyone can fingerprint the approved rules and compare: if the agent used different rules, the fingerprints won’t match.

  2. A receipt you can’t edit. Every step the agent takes is chained together so each step depends on all the steps before it. Change any one step and the whole thing stops adding up, like a tamper-evident seal.

  3. A signature. The agent signs the final seal with a secret key. If someone tries to forge a receipt without that key, the signature won’t check out.

To verify, you recompute all three and ask: Did it use the approved rules? Is the receipt intact? Is the signature real? All three have to pass.
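The three building blocks and the three-question check can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the code in the demo repository: the function names, the rule and step fields, and the choice of HMAC-SHA256 as the "signature" are all hypothetical. HMAC is a shared-secret scheme, so in this sketch the checker has to hold the same key as the agent; letting truly anyone verify would need a public-key signature instead.

```python
import copy
import hashlib
import hmac
import json

# Hypothetical placeholder key: an obvious fake, never a real credential.
FAKE_KEY = b"demo-only-not-a-real-secret"


def _canon(obj) -> bytes:
    """Canonical JSON bytes, so the same data always hashes the same way."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def fingerprint(rules: dict) -> str:
    """Block 1: SHA-256 fingerprint of the rules."""
    return hashlib.sha256(_canon(rules)).hexdigest()


def seal_steps(rules_fp: str, steps: list) -> str:
    """Block 2: hash chain; each link folds in the previous link plus one step."""
    link = rules_fp
    for step in steps:
        link = hashlib.sha256(link.encode("utf-8") + _canon(step)).hexdigest()
    return link


def sign(seal: str, key: bytes) -> str:
    """Block 3: HMAC-SHA256 over the final seal (assumed signature scheme)."""
    return hmac.new(key, seal.encode("utf-8"), hashlib.sha256).hexdigest()


def make_receipt(rules: dict, steps: list, key: bytes) -> dict:
    rules_fp = fingerprint(rules)
    seal = seal_steps(rules_fp, steps)
    return {"rules_fp": rules_fp, "steps": copy.deepcopy(steps),
            "seal": seal, "signature": sign(seal, key)}


def verify(receipt: dict, approved_rules: dict, key: bytes) -> str:
    """Recompute all three pieces; every check has to pass."""
    if receipt["rules_fp"] != fingerprint(approved_rules):
        return "REJECT: rules do not match the approved set"
    if seal_steps(receipt["rules_fp"], receipt["steps"]) != receipt["seal"]:
        return "REJECT: receipt does not reconcile"
    if not hmac.compare_digest(sign(receipt["seal"], key), receipt["signature"]):
        return "REJECT: signature invalid"
    return "ACCEPT"


# Illustrative data only; the field names are made up for this sketch.
APPROVED_RULES = {"max_refund": 100, "require_order_id": True}
SNEAKY_RULES = {"max_refund": 10**9, "require_order_id": False}
STEPS = [{"action": "lookup_order", "order_id": "A-17"},
         {"action": "refund", "amount": 40}]

honest = make_receipt(APPROVED_RULES, STEPS, FAKE_KEY)
swapped = make_receipt(SNEAKY_RULES, STEPS, FAKE_KEY)

edited = copy.deepcopy(honest)
edited["steps"][1]["amount"] = 5000  # tamper with a step after sealing

forged_steps = copy.deepcopy(STEPS)
forged_steps[1]["amount"] = 5000
forged = make_receipt(APPROVED_RULES, forged_steps, b"attacker-guess")

for name, receipt in [("honest", honest), ("swapped rules", swapped),
                      ("edited step", edited), ("forged signature", forged)]:
    print(f"{name}: {verify(receipt, APPROVED_RULES, FAKE_KEY)}")
```

Only the honest receipt comes back as ACCEPT; the other three fail at the rules check, the chain check, and the signature check respectively. The forged receipt is internally consistent because the attacker recomputed the chain, so the signature is the only check that catches it.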

Why this matters: Every other post in this series makes agents more independent: they rewrite their own code, sleep, model other people, get curious. This one is the safety net for all of that, because independence without a way to check up on it is a liability. The more power we hand to agents, the less we can afford to just trust them, and the more we need a way to check them.

The end goal of the real research is even stronger: prove an agent followed the approved rules without re-running it and without exposing any private data or secret model. That lets two companies trust each other’s agents: yours proves it behaved, mine checks the proof, and neither of us has to reveal our secrets.

Try it:

  git clone https://github.com/Shridhar-2205/living-software
  cd living-software/05-verifiable-agent
  python demo.py

Honest note: the real research uses heavier cryptography so the checker doesn’t have to re-run anything and never sees the secret model. My demo re-checks a signed, sealed receipt instead. That is much simpler, and it shows the same payoff (cheat in any way ⇒ rejected), so you can feel what “verifiable behavior” actually buys you. It uses only standard, modern hashing (SHA-256), and the “secret key” is an obvious fake, never a real credential.

Shridhar Shah, Senior Software Engineer on the AI team at Cisco. Part 5 (the finale) of Toward Living Software.