Grokers: Bottom-Up Inductive Comprehension and Write-Time Intelligence over Typed Knowledge Graphs

Abstract: We present Grokers, an architecture for building persistent, structured comprehension of typed knowledge graphs through bottom-up inductive traversal of dependency subgraphs. Retrieval-augmented generation (RAG) pays the full comprehension cost at every query; Grokers instead moves that work to write time. Autonomous Groker agents analyze nodes in a typed stream graph, extract structured attributes via governed language model (LM) calls, and inductively compose that understanding upward through dependency relations. The enriched typed attributes they write serve all future queries at zero additional LM cost.
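The write-time, bottom-up idea can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `deps` graph, the `extract_attributes` function (a stand-in for a governed LM call that only counts invocations), and the `covers` attribute are hypothetical and are not taken from the Qbix / Safebox / Safebots implementation. The sketch visits each node after its dependencies, composes child attributes upward, and then answers queries from the stored attributes without further calls.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each node maps to the nodes it depends on.
deps = {
    "app": {"auth", "db"},
    "auth": {"db"},
    "db": set(),
}

lm_calls = 0  # counts invocations of the stand-in LM call


def extract_attributes(node, child_attrs):
    """Stand-in for a governed LM call (assumption, not a real API).

    Composes the attributes already written for the node's dependencies
    into an attribute for the node itself.
    """
    global lm_calls
    lm_calls += 1
    covered = sorted({node} | {n for a in child_attrs for n in a["covers"]})
    return {"covers": covered}


def grok(graph):
    """Write-time pass: visit dependencies first, then their dependents."""
    attrs = {}
    # static_order() yields every node after all of its predecessors,
    # so child attributes always exist before the parent is analyzed.
    for node in TopologicalSorter(graph).static_order():
        child_attrs = [attrs[d] for d in sorted(graph[node])]
        attrs[node] = extract_attributes(node, child_attrs)
    return attrs


attrs = grok(deps)


def query(node):
    """Read-time lookup: served from stored attributes, no LM call."""
    return attrs[node]["covers"]
```

In this sketch the LM stand-in runs once per node at write time, and every later `query` call is a plain dictionary lookup.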

We prove three formal properties: (1) the Byte-Identity Theorem, which establishes that context blocks assembled from a transactionally maintained denormalization index are byte-identical across LM turns between semantic changes, enabling KV-cache hit rates approaching 100%; (2) the Accumulation Monotonicity Theorem, which establishes that, under a governed growth protocol for the wisdom library, the fraction of interactions resolved without LM calls is non-decreasing in the number of completed interactions; and (3) the Dual-Traversal Ordering Theorem, which establishes that top-down generation and bottom-up comprehension are the unique correct traversal orderings for their respective tasks over a dependency DAG, and that their composition closes into a complete generation-comprehension cycle.
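The byte-identity property can be made concrete with a small sketch. It illustrates the property only and is not the proof or the authors' index: the `index` contents and the `assemble_context` function are hypothetical, and canonical JSON serialization is an assumed stand-in for however the denormalization index actually renders context blocks. The point is that a deterministic serialization yields the same bytes on every turn until an attribute changes, which is the condition under which a prefix-matching KV cache can typically be reused.

```python
import json


def assemble_context(index, node_ids):
    """Serialize selected index entries in a canonical form.

    Sorting node ids and keys, and fixing the separators, removes every
    source of incidental variation, so equal content gives equal bytes.
    """
    block = [{"id": n, "attrs": index[n]} for n in sorted(node_ids)]
    text = json.dumps(block, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


# Hypothetical denormalized index; key order deliberately differs per entry.
index = {
    "db": {"kind": "store", "summary": "persists rows"},
    "auth": {"summary": "checks tokens", "kind": "service"},
}

# Two turns with no semantic change, requested in different orders.
turn1 = assemble_context(index, ["db", "auth"])
turn2 = assemble_context(index, ["auth", "db"])
same_bytes = turn1 == turn2

# A semantic change to one attribute produces a different block.
index["auth"]["summary"] = "checks tokens and sessions"
turn3 = assemble_context(index, ["auth", "db"])
changed = turn3 != turn1
```

Here `same_bytes` is true because nothing semantic changed between the first two turns, and `changed` is true once an attribute is rewritten.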

We further present a deterministic alternative to embedding-based semantic search, with a synonym caching protocol whose LM fallback rate converges to zero for finite-vocabulary domains. A reference implementation is provided in the open-source Qbix / Safebox / Safebots stack.
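A minimal sketch of a synonym cache with an LM fallback shows why the fallback rate can fall to zero over a finite vocabulary. The `SynonymCache` class, the `CANON` table, and `fake_lm` are hypothetical names introduced for illustration; `fake_lm` is a stand-in for the LM call, and the protocol described in the paper may differ in its details. Each distinct term triggers the fallback at most once, so once every term in the vocabulary has been seen, all later lookups are deterministic cache hits.

```python
class SynonymCache:
    """Maps surface terms to canonical terms, calling a fallback on misses."""

    def __init__(self, canonicalize):
        self._canonicalize = canonicalize  # stand-in for the LM fallback
        self._cache = {}
        self.fallbacks = 0  # number of times the fallback was invoked

    def resolve(self, term):
        key = term.strip().lower()  # normalize before lookup
        if key not in self._cache:
            self.fallbacks += 1
            self._cache[key] = self._canonicalize(key)
        return self._cache[key]


# Hypothetical finite vocabulary used by the stand-in fallback.
CANON = {
    "car": "vehicle",
    "automobile": "vehicle",
    "auto": "vehicle",
    "vehicle": "vehicle",
}


def fake_lm(term):
    """Stand-in for an LM call that returns a canonical term."""
    return CANON.get(term, term)


cache = SynonymCache(fake_lm)

# First pass: every term is new, so each one costs one fallback.
first_pass = [cache.resolve(t) for t in ["Car", "automobile", "auto"]]
fallbacks_after_first = cache.fallbacks

# Second pass: same vocabulary, so no further fallbacks occur.
second_pass = [cache.resolve(t) for t in ["car", "Auto", "AUTOMOBILE"]]
fallbacks_after_second = cache.fallbacks
```

In this sketch the fallback count stops growing after the first pass, because the second pass introduces no new terms.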
