cocoindex-io / cocoindex
Your agents deserve fresh context. Star us ❤️. CocoIndex turns codebases, meeting notes, inboxes, Slack, PDFs, and videos into live, continuously fresh context for your AI agents and LLM apps to reason over effectively — with minimal incremental processing.
Get your production AI agent ready in 10 minutes with reliable, continuously fresh data — no stale batches, no context gap. Incremental · only the delta · Any scale · parallel by default · Declarative · Python, 5 min.
Declare what should be in your target — CocoIndex keeps it in sync forever, recomputing only the Δ.
```python
import cocoindex as coco
from cocoindex.connectors import localfs, postgres
from cocoindex.ops.text import RecursiveSplitter

@coco.fn(memo=True)  # ← cached by hash(input) + hash(code)
async def index_file(file, table):
    for chunk in RecursiveSplitter().split(await file.read_text()):
        table.declare_row(text=chunk.text, embedding=embed(chunk.text))

@coco.fn
async def main(src):
    table = await postgres.mount_table_target(PG, table_name="docs")
    table.declare_vector_index(column="embedding")
    await coco.mount_each(index_file, localfs.walk_dir(src).items(), table)

coco.App(coco.AppConfig(name="docs"), main, src="./docs").update_blocking()
```
Run once to backfill. Re-run anytime — only the changed files re-embed.
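The `memo=True` caching in the example above can be pictured as content-hash memoization: a result is reused whenever both the input and the function's code are unchanged. Here is a minimal, illustrative sketch of that idea — a toy model, not CocoIndex's actual engine (`memoized`, `chunk_count`, and the module-level `_cache` are hypothetical names for this sketch):

```python
import hashlib

_cache = {}

def memoized(fn):
    """Re-run fn only when the input or the function's code changed."""
    code_hash = hashlib.sha256(fn.__code__.co_code).hexdigest()
    def wrapper(text):
        key = (code_hash, hashlib.sha256(text.encode()).hexdigest())
        if key not in _cache:  # unchanged input + unchanged code → cache hit
            _cache[key] = fn(text)
        return _cache[key]
    return wrapper

calls = []

@memoized
def chunk_count(text):
    calls.append(text)  # record real (non-cached) invocations
    return len(text.split("\n\n"))
```

Hashing the code alongside the input is what makes edits to the transform itself trigger recomputation, while re-runs over unchanged files cost nothing.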
Building with an AI coding agent? Drop in our CocoIndex skill so your agent writes correct v1 code — concepts, APIs, patterns, all in one file.
Incremental engine for long-horizon agents. Data transformation for any engineer, designed for AI workloads — with a smart incremental engine for always-fresh, explainable data.
Why incremental? Your agents are only as good as the data they see. Batch pipelines drift out of date; CocoIndex stays live — and only runs the Δ.
CocoIndex Enterprise: built for enterprise-scale corpora. Incremental compute is the only way to keep a large corpus fresh without re-embedding it every cycle.
CocoIndex scales from a single repo to petabyte-scale stores — parallel by default, delta-only by design. Process once. Reconcile forever.
When a source changes, CocoIndex identifies the affected records, propagates the change across joins and lookups, updates the target, and retires stale rows — without touching anything that didn’t change.
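That reconciliation step can be pictured as a diff between the previous and current source state: process only added or changed keys, retire removed ones, and leave everything else alone. A toy sketch (illustrative only, not CocoIndex's engine), assuming each source is a dict of key → content hash and `embed_stub` stands in for the expensive transform:

```python
def reconcile(prev, curr, target, process):
    """Apply only the delta: process added/changed keys, retire removed ones."""
    for key in curr.keys() - prev.keys():
        target[key] = process(curr[key])   # new record
    for key in curr.keys() & prev.keys():
        if curr[key] != prev[key]:         # content changed
            target[key] = process(curr[key])
    for key in prev.keys() - curr.keys():
        target.pop(key, None)              # retire stale row
    return target

prev = {"a.md": "h1", "b.md": "h2"}
curr = {"a.md": "h1", "b.md": "h3", "c.md": "h4"}  # b changed, c added
processed = []

def embed_stub(h):
    processed.append(h)  # count how much real work was done
    return f"vec({h})"

target = {"a.md": "vec(h1)", "b.md": "vec(h2)"}
reconcile(prev, curr, target, embed_stub)
```

Only the changed and new records are processed; the unchanged `a.md` row is never touched — the same delta-only property the engine guarantees across joins and lookups.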
Built on a Rust engine. The core is Rust — production-grade from day zero. Parallel chunking, zero-copy transforms where possible, and failure isolation so one bad record doesn’t stall the flow.