yifanfeng97 / Hyper-Extract

Smart Knowledge Extraction CLI Transform documents into structured knowledge with one command. 📖 English Version · 中文版

智能知识提取命令行工具 (CLI) 只需一条命令，即可将文档转换为结构化知识。

“Stop reading. Start understanding.” “告别文档焦虑，让信息一目了然”

Hyper-Extract is an intelligent, LLM-powered knowledge extraction and evolution framework. It radically simplifies transforming highly unstructured texts into persistent, predictable, and strongly-typed Knowledge Abstracts. It effortlessly extracts information into a wide spectrum of formats—ranging from simple Collections (Lists/Sets) and Pydantic Models, to complex Knowledge Graphs, Hypergraphs, and even Spatio-Temporal Graphs.

Hyper-Extract 是一个基于大语言模型（LLM）的智能知识提取与演化框架。它极大地简化了将高度非结构化文本转换为持久、可预测且强类型的“知识摘要”的过程。它能够轻松地将信息提取为多种格式，从简单的集合（列表/集合）和 Pydantic 模型，到复杂的知识图谱、超图，甚至是时空图。

✨ Core Features / 核心功能

🔷 8 Knowledge Structures: From simple Lists to advanced Graphs, Hypergraphs, and Spatio-Temporal Graphs. 8 种知识结构：从简单的列表到高级的图谱、超图以及时空图。
🧠 10+ Extraction Engines: GraphRAG, LightRAG, Hyper-RAG, KG-Gen, and more — ready to use. 10+ 种提取引擎：内置 GraphRAG、LightRAG、Hyper-RAG、KG-Gen 等，开箱即用。
📝 80+ YAML Templates: Zero-code extraction across Finance, Legal, Medical, TCM, Industry, and General domains. 80+ 种 YAML 模板：覆盖金融、法律、医疗、中医、工业及通用领域，实现零代码提取。
🔄 Incremental Evolution: Feed new documents anytime to expand and refine your knowledge base. 增量演化：随时输入新文档，以扩展和完善您的知识库。

🎯 What Can You Do With It? / 你可以用它做什么？

📄 Researcher — Turn papers into knowledge graphs

Feed a 20-page academic paper, get an interactive graph of key concepts, authors, and citations.

he parse paper.pdf -t general/academic_graph -o ./paper_kb/
he show ./paper_kb/

研究人员 — 将论文转化为知识图谱 输入一份 20 页的学术论文，即可获得包含关键概念、作者和引用的交互式图谱。

🏦 Financial Analyst — Extract entities from earnings reports

Automatically identify companies, executives, financial metrics, and their relationships from unstructured reports.

he parse earnings.md -t finance/earnings_graph -o ./finance_kb/
he search ./finance_kb/ "What are the key risk factors?"

金融分析师 — 从财报中提取实体 自动从非结构化报告中识别公司、高管、财务指标及其相互关系。

🔒 Local Deployment — Keep data on-premise with vLLM

Run Qwen3.5-9B + bge-m3 locally via vLLM. No data leaves your machine.

from hyperextract import create_client
llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)

本地部署 — 通过 vLLM 确保数据不出本地 通过 vLLM 在本地运行 Qwen3.5-9B + bge-m3，确保数据不离开您的机器。

🚀 Supported Platforms & Models / 支持的平台与模型

Hyper-Extract relies on the LLM’s structured output capability (json_schema or Function Calling). Hyper-Extract 依赖于大模型的结构化输出能力（json_schema 或 Function Calling）。

Platform	Verified Models
OpenAI	gpt-4o, gpt-4o-mini, gpt-5
阿里云百炼	qwen-plus, qwen-turbo, deepseek-r1
Local vLLM	Qwen3.5-9B (GPTQ-Marlin)

Embedding models (semantic search) work with any OpenAI-compatible endpoint: text-embedding-3-small, text-embedding-v4 (Bailian), bge-m3 (local vLLM). 嵌入模型（语义搜索）适用于任何兼容 OpenAI 的端点：text-embedding-3-small、text-embedding-v4（百炼）、bge-m3（本地 vLLM）。

⚡ 30-Second Quick Start / 30 秒快速上手

# Install uv
tool install hyperextract

# Configure API key
he config init -k YOUR_OPENAI_API_KEY

# Extract knowledge from a document
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en

# Query it
he search ./output/ "What are Tesla's major achievements?"

# Visualize
he show ./output/

🐍 Python API

uv pip install hyperextract
from hyperextract import Template

ka = Template.create("general/biography_graph")
with open("examples/en/tesla.md") as f:
    result = ka.parse(f.read())
result.show()

📈 Why Hyper-Extract? / 为什么选择 Hyper-Extract？

Feature	GraphRAG	LightRAG	KG-Gen	ATOM	Hyper-Extract
Knowledge Graph	✅	✅	✅	✅	✅
Temporal Graph	✅	❌	❌	✅	✅
Spatial Graph	❌	❌	❌	❌	✅
Hypergraph	❌	❌	❌	❌	✅
Domain Templates	❌	❌	❌	❌	✅
Interactive CLI	✅	❌	❌	❌	✅
Multi-language	✅	❌	❌	❌	✅

📋 What’s under the hood? (Architecture & Templates) / 核心架构与模板

Hyper-Extract follows a three-layer architecture: Hyper-Extract 采用三层架构：

Auto-Types: 8 strongly-typed data structures (Model, List, Set, Graph, Hypergraph, Temporal Graph, Spatial Graph, Spatio-Temporal Graph). 自动类型：8 种强类型数据结构（模型、列表、集合、图、超图、时序图、空间图、时空图）。
Methods: Extraction algorithms: KG-Gen, GraphRAG, LightRAG, Hyper-RAG, Cog-RAG, and more. 方法：提取算法，包括 KG-Gen、GraphRAG、LightRAG、Hyper-RAG、Cog-RAG 等。
Templates: 80+ presets across 6 domains. Zero-code setup. 模板：覆盖 6 大领域的 80+ 预设，实现零代码配置。

📚 Documentation & Resources / 文档与资源

Resource	Link
Full Documentation	yifanfeng97.github.io/Hyper-Extract
CLI Guide	Command-line interface
Provider System	Model compatibility & local deployment
Template Gallery	80+ presets
Examples	Working code

🤝 Contributing & License / 贡献与许可

Contributions are welcome! Please submit Issues and PRs. Licensed under Apache-2.0.

欢迎贡献代码！请提交 Issue 和 PR。本项目采用 Apache-2.0 协议开源。