How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines

By Krishna Ganeriwal, Plawan Rath, Ashwini Verma

AI coding assistants are powerful but only as good as their understanding of your codebase. When we pointed AI agents at one of Meta’s large-scale data processing pipelines – spanning four repositories, three languages, and over 4,100 files – we quickly found that they weren’t making useful edits quickly enough.

We fixed this by building a pre-compute engine: a swarm of 50+ specialized AI agents that systematically read every file and produced 59 concise context files encoding tribal knowledge that previously lived only in engineers’ heads.

The result: AI agents now have structured navigation guides for 100% of our code modules (up from 5%, covering all 4,100+ files across three repositories). We also documented 50+ “non-obvious patterns,” or underlying design choices and relationships not immediately apparent from the code, and preliminary tests show 40% fewer AI agent tool calls per task.

The system works with most leading models because the knowledge layer is model-agnostic. It also maintains itself: every few weeks, automated jobs validate file paths, detect coverage gaps, re-run quality critics, and auto-fix stale references. The AI isn’t just a consumer of this infrastructure; it’s the engine that runs it.
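To make the path-validation step concrete, here is a minimal sketch of how a maintenance job might find stale file references in a context file. The regex, file layout, and function name are illustrative assumptions, not Meta’s actual tooling:

```python
import re
from pathlib import Path

# Assumed pattern for file paths mentioned in a context file.
PATH_RE = re.compile(r"[\w./-]+\.(?:py|cpp|h|php)\b")

def find_stale_references(context_file: Path, repo_root: Path) -> list[str]:
    """Return file paths referenced in a context file that no longer exist.

    A real maintenance job would run this over all context files and either
    auto-fix the references or flag them for a critic agent to review.
    """
    text = context_file.read_text()
    stale = []
    for match in PATH_RE.finditer(text):
        if not (repo_root / match.group(0)).exists():
            stale.append(match.group(0))
    return stale
```

Running this periodically is what keeps the context layer from decaying as the codebase moves underneath it.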

The Problem: AI Tools Without a Map

Our pipeline is config-as-code: Python configurations, C++ services, and Hack automation scripts working together across multiple repositories. Onboarding a single data field touches configuration registries, routing logic, DAG composition, validation rules, C++ code generation, and automation scripts – six subsystems that must stay in sync.

We had already built AI-powered systems for operational tasks: scanning dashboards, pattern-matching against historical incidents, and suggesting mitigations. But when we tried to extend them to development tasks, they fell apart. The AI had no map. It didn’t know that two configuration modes use different field names for the same operation (swap them and you get silently wrong output), or that dozens of “deprecated” enum values must never be removed because serialization compatibility depends on them. Without this context, agents would guess, explore, and guess again – often producing code that compiled but was subtly wrong.

The Approach: Teach the Agents Before They Explore

We used a large-context-window model and task orchestration to structure the work in phases: two explorer agents mapped the codebase; 11 module analysts read every file and answered five key questions; two writers generated context files; 10+ critic passes ran three rounds of independent quality review; four fixers applied corrections; eight upgraders refined the routing layer; three prompt testers validated 55+ queries across five personas; four gap-fillers covered remaining directories; and three final critics ran integration tests. In all, 50+ specialized tasks were orchestrated in a single session.
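The phased structure can be sketched as sequential phases whose tasks run in parallel. Everything below – the role names, task strings, and `run_agent` stub – is a simplified stand-in for the actual agent dispatch, which is not public:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(role: str, task: str) -> str:
    """Placeholder for dispatching one specialized agent (e.g., an LLM call)."""
    return f"{role}: {task} done"

# Abbreviated phase list; the real run had 50+ tasks across nine roles.
PHASES = [
    ("explorer", ["map repo layout", "map build graph"]),              # 2 explorers
    ("module-analyst", [f"analyze module {i}" for i in range(11)]),    # 11 analysts
    ("writer", ["draft context files A-M", "draft context files N-Z"]),
    ("critic", [f"quality pass {i}" for i in range(10)]),              # 10+ critic passes
    ("fixer", [f"apply corrections batch {i}" for i in range(4)]),     # 4 fixers
]

def run_phases(phases):
    results = []
    for role, tasks in phases:            # phases run sequentially...
        with ThreadPoolExecutor() as pool:  # ...tasks within a phase run in parallel
            results.extend(pool.map(lambda t: run_agent(role, t), tasks))
    return results
```

The key property is that each phase only starts once the previous one has fully finished, so critics always see completed drafts and fixers always see completed critiques.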

The five questions each analyst answered per module: What does this module configure? What are the common modification patterns? What are the non-obvious patterns that cause build failures? What are the cross-module dependencies? What tribal knowledge is buried in code comments?

Question five was where the deepest learnings emerged. We found 50+ non-obvious patterns like hidden intermediate naming conventions where one pipeline stage outputs a temporary field name that a downstream stage renames (reference the wrong one and code generation silently fails), or append-only identifier rules where removing a “deprecated” value breaks backward compatibility. None of this had been written down before.

What We Built: A Compass, Not An Encyclopedia

Each context file follows what we call the “compass, not encyclopedia” principle – 25–35 lines (~1,000 tokens) with four sections: Quick Commands (copy-paste operations), Key Files (the 3–5 files you actually need), Non-Obvious Patterns, and See Also (cross-references). No fluff; every line earns its place. All 59 files together consume less than 0.1% of a modern model’s context window.
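A context file in this shape might look like the following. The module name, file paths, and commands are invented for illustration; only the four-section structure comes from the description above:

```markdown
# context/field_registry.md  (hypothetical example)

## Quick Commands
- Regenerate configs: <build-tool> run //pipeline:codegen
- Validate a field:   <build-tool> run //pipeline:validate -- --field <name>

## Key Files
- configs/field_registry.py  -- field definitions
- routing/dag_builder.py     -- DAG composition
- codegen/emit_cpp.py        -- C++ generation

## Non-Obvious Patterns
- The two configuration modes use different field names for the same
  operation; swapping them produces silently wrong output.
- Enum values marked "deprecated" are append-only: removing one breaks
  serialization compatibility.

## See Also
- context/routing.md, context/codegen.md
```

Notice that the non-obvious patterns section carries the tribal knowledge; the rest is pure navigation.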

On top of this, we built an orchestration layer that auto-routes engineers to the right tool based on natural language. Type “Is the pipeline healthy?” and it scans dashboards and matches against 85+ historical incident patterns. Type “Add a new data field” and it generates the configuration with multi-phase validation. Engineers describe their problem; the system figures out the rest.
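In miniature, such a router maps a free-form query to a tool. The sketch below uses simple keyword overlap where the real system presumably uses a model; the route table and tool names are assumptions:

```python
# Hypothetical routing table: keyword sets -> tool identifiers.
ROUTES = [
    ({"healthy", "health", "status"}, "scan_dashboards"),
    ({"add", "field", "onboard"}, "generate_field_config"),
]

def route(query: str) -> str:
    """Pick the tool whose keywords best overlap the query, else ask for more."""
    words = set(query.lower().replace("?", "").split())
    best_tool, best_overlap = "ask_for_clarification", 0
    for keywords, tool in ROUTES:
        overlap = len(keywords & words)
        if overlap > best_overlap:
            best_tool, best_overlap = tool, overlap
    return best_tool

route("Is the pipeline healthy?")  # -> "scan_dashboards"
route("Add a new data field")      # -> "generate_field_config"
```

The point of the layer is exactly this indirection: engineers never need to know which tool exists, only what they want done.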

The system self-refreshes every few weeks, validating file paths, identifying coverage gaps, re-running critic agents, and auto-fixing issues. Context that decays is worse than no context at all. Beyond individual context files, we generated a cross-repo dependency index and data-flow maps showing how changes propagate across repositories. This turns “What depends on X?” from a multi-file exploration (~6,000 tokens) into a single graph lookup (~200 tokens) – crucial in config-as-code, where one field change ripples across six subsystems.
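The “single graph lookup” amounts to a reverse dependency index. A minimal sketch, with invented module names and edges standing in for the generated index:

```python
from collections import defaultdict

# Hypothetical forward edges: module -> modules it depends on.
DEPENDS_ON = {
    "routing/dag_builder": ["configs/field_registry"],
    "codegen/emit_cpp": ["configs/field_registry", "routing/dag_builder"],
    "automation/onboard": ["codegen/emit_cpp"],
}

def build_reverse_index(deps):
    """Invert the forward edges so 'what depends on X?' is a direct lookup."""
    rdeps = defaultdict(set)
    for module, uses in deps.items():
        for dep in uses:
            rdeps[dep].add(module)
    return rdeps

def what_depends_on(module, rdeps):
    """Transitive dependents: everything a change to `module` can ripple into."""
    seen, stack = set(), [module]
    while stack:
        for dependent in rdeps.get(stack.pop(), ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen
```

Answering the question becomes one dictionary walk over precomputed edges instead of re-reading every file that might mention the module.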

Results

Metric                            | Before        | After
AI context coverage               | ~5% (5 files) | 100% (59 files)
Codebase files with AI navigation | ~50           | 4,100+
Tribal knowledge documented       | 0             | 50+ non-obvious patterns
Tested prompts (core pass rate)   | 0             | 55+ (100%)

In preliminary tests on six tasks against our pipeline, agents with pre-computed context used roughly 40% fewer tool calls and tokens per task. Complex workflow guidance that previously required about two days of research and consulting with engineers now completes in about 30 minutes. Quality was non-negotiable: three rounds of independent critic agents improved scores from 3.65 to 4.20 out of 5.0, and all referenced file paths were verified, with zero hallucinations.

Challenging the Conventional Wisdom on AI Context Files

Recent academic research found that AI-generated context files actually decreased agent success rates on well-known open-source Python repositories. This finding deserves serious consideration, but it has a limitation: It was evaluated on codeb…