Testing an AI Memory Reliability Checklist on 3 Redacted Agent Setups

I’m testing a small AI memory reliability checklist. The question is simple: when an AI agent reads project instructions, memory files, Cursor rules, or AGENTS.md before acting, can we tell which instructions should actually govern the action?

I’m looking for 3 people who use Claude, Cursor, Codex, or custom agents and are willing to share redacted, non-sensitive instruction files. Examples: AGENTS.md, CLAUDE.md, .cursorrules, Cursor rules, project instructions, memory exports, and SOPs or checklists.

Please do not send API keys, passwords, private customer data, legal records, medical records, financial records, HR records, or anything else sensitive. If something is private, redact it first and leave only the structure.

If you participate, I’ll return a short report covering: stale instructions; conflicting rules; what should and should not govern action; missing verification gates; and places where a relevant memory could override a more authoritative one.

This is not a security audit, legal review, compliance review, or production safety certification. It is a small research pilot to see whether the checklist is useful before I turn it into a tool. The public research behind it is here: https://github.com/keniel13-ui/ai-memory-judgment-demo

The basic idea came from a failure pattern I’ve been testing: relevance is not authority. A memory or instruction can be highly relevant to a request and still be the wrong thing for an agent to obey.

For example: an old instruction may still match the task but have been superseded; a preference may be relevant but should not override a project rule; a workflow note may describe what happened before, not what should happen now; a read-only question may share vocabulary with a write/execute policy without being governed by it.
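
To make the distinction concrete, here is a minimal Python sketch of the four cases above. It is not the checklist itself and not code from the linked repo. The `Instruction` fields, the authority levels, the `governing` function, and every score are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    text: str
    relevance: float          # assumed similarity to the request, 0..1
    authority: int            # assumed levels: 2 = project rule, 1 = preference, 0 = historical note
    superseded: bool = False  # True if a newer instruction replaced this one
    scope: str = "any"        # "read", "write", or "any"


def governing(instructions, action_kind, min_relevance=0.5):
    """Return the instructions allowed to govern the action, most authoritative first."""
    eligible = [
        i for i in instructions
        if i.relevance >= min_relevance        # close enough to the query
        and not i.superseded                   # stale instructions never govern
        and i.authority > 0                    # historical notes describe, they do not direct
        and i.scope in ("any", action_kind)    # a write policy does not govern a read
    ]
    # Authority decides the order; relevance only breaks ties.
    return sorted(eligible, key=lambda i: (-i.authority, -i.relevance))


memory = [
    Instruction("Deploy with the old script", 0.95, 2, superseded=True),
    Instruction("I prefer tabs", 0.90, 1),
    Instruction("Project rule: use spaces", 0.70, 2),
    Instruction("Last sprint we skipped review", 0.85, 0),
    Instruction("Writes require approval", 0.80, 2, scope="write"),
]

closest = max(memory, key=lambda i: i.relevance)
for_read = governing(memory, "read")
for_write = governing(memory, "write")
```

In this made-up data the closest match to the query is the superseded deploy instruction, yet it never appears in the governing list. For a read-only question, the project rule ranks above the more relevant personal preference, and the write-approval policy is left out entirely.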

The checklist tries to separate “What is close to the query?” from “What is allowed to govern the action?” If you want to test it, comment here or DM me. I’ll take the first 3 redacted setups.