cocoindex-io / cocoindex
Your agents deserve fresh context. Star us ❤️. CocoIndex turns codebases, meeting notes, inboxes, Slack, PDFs, and videos into live, continuously fresh context for your AI agents and LLM apps to reason over effectively — with minimal incremental processing.
Get your production AI agent ready in 10 minutes with reliable, continuously fresh data — no stale batches, no context gap. Incremental · only the delta · Any scale · parallel by default · Declarative · Python, 5 min.
Declare what should be in your target — CocoIndex keeps it in sync forever, recomputing only the Δ.
```python
import cocoindex as coco
from cocoindex.connectors import localfs, postgres
from cocoindex.ops.text import RecursiveSplitter

@coco.fn(memo=True)  # ← cached by hash(input) + hash(code)
async def index_file(file, table):
    for chunk in RecursiveSplitter().split(await file.read_text()):
        table.declare_row(text=chunk.text, embedding=embed(chunk.text))

@coco.fn
async def main(src):
    table = await postgres.mount_table_target(PG, table_name="docs")
    table.declare_vector_index(column="embedding")
    await coco.mount_each(index_file, localfs.walk_dir(src).items(), table)

coco.App(coco.AppConfig(name="docs"), main, src="./docs").update_blocking()
```
Run once to backfill. Re-run anytime — only the changed files re-embed.
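The `memo=True` caching in the example above can be pictured as content-hash memoization: a result is reused whenever both the input and the function's code are unchanged. Here is a minimal, illustrative sketch of that idea — a toy model, not CocoIndex's actual engine (`memoized`, `chunk_count`, and the module-level `_cache` are hypothetical names for this sketch):

```python
import hashlib

_cache = {}

def memoized(fn):
    """Re-run fn only when the input or the function's code changed."""
    code_hash = hashlib.sha256(fn.__code__.co_code).hexdigest()
    def wrapper(text):
        key = (code_hash, hashlib.sha256(text.encode()).hexdigest())
        if key not in _cache:  # unchanged input + unchanged code → cache hit
            _cache[key] = fn(text)
        return _cache[key]
    return wrapper

calls = []

@memoized
def chunk_count(text):
    calls.append(text)  # record real (non-cached) invocations
    return len(text.split("\n\n"))
```

Hashing the code alongside the input is what makes edits to the transform itself trigger recomputation, while re-runs over unchanged files cost nothing.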
Building with an AI coding agent? Drop in our CocoIndex skill so your agent writes correct v1 code — concepts, APIs, patterns, all in one file.
Incremental engine for long-horizon agents. Data transformation for any engineer, designed for AI workloads — with a smart incremental engine for always-fresh, explainable data.
Why incremental? Your agents are only as good as the data they see. Batch pipelines drift out of date; CocoIndex stays live — and only runs the Δ.
CocoIndex Enterprise: built for enterprise-scale corpora. Incremental compute is the only way to keep a large corpus fresh without re-embedding it every cycle.
CocoIndex scales from a single repo to petabyte-scale stores — parallel by default, delta-only by design. Process once. Reconcile forever.
When a source changes, CocoIndex identifies the affected records, propagates the change across joins and lookups, updates the target, and retires stale rows — without touching anything that didn’t change.
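That reconciliation step can be pictured as a diff between the previous and current source state: process only added or changed keys, retire removed ones, and leave everything else alone. A toy sketch (illustrative only, not CocoIndex's engine), assuming each source is a dict of key → content hash and `embed_stub` stands in for the expensive transform:

```python
def reconcile(prev, curr, target, process):
    """Apply only the delta: process added/changed keys, retire removed ones."""
    for key in curr.keys() - prev.keys():
        target[key] = process(curr[key])   # new record
    for key in curr.keys() & prev.keys():
        if curr[key] != prev[key]:         # content changed
            target[key] = process(curr[key])
    for key in prev.keys() - curr.keys():
        target.pop(key, None)              # retire stale row
    return target

prev = {"a.md": "h1", "b.md": "h2"}
curr = {"a.md": "h1", "b.md": "h3", "c.md": "h4"}  # b changed, c added
processed = []

def embed_stub(h):
    processed.append(h)  # count how much real work was done
    return f"vec({h})"

target = {"a.md": "vec(h1)", "b.md": "vec(h2)"}
reconcile(prev, curr, target, embed_stub)
```

Only the changed and new records are processed; the unchanged `a.md` row is never touched — the same delta-only property the engine guarantees across joins and lookups.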
Built on a Rust engine. The core is Rust — production-grade from day zero. Parallel chunking, zero-copy transforms where possible, and failure isolation so one bad record doesn’t stall the flow.