yifanfeng97 / Hyper-Extract
Smart Knowledge Extraction CLI: transform documents into structured knowledge with one command.
“Stop reading. Start understanding.” “Leave document anxiety behind and see the information at a glance.”
Hyper-Extract is an LLM-powered framework for knowledge extraction and evolution. It turns highly unstructured text into persistent, predictable, strongly typed Knowledge Abstracts. Output formats range from simple Collections (Lists/Sets) and Pydantic Models to Knowledge Graphs, Hypergraphs, and Spatio-Temporal Graphs.
✨ Core Features
- 🔷 8 Knowledge Structures: From simple Lists to advanced Graphs, Hypergraphs, and Spatio-Temporal Graphs.
- 🧠 10+ Extraction Engines: GraphRAG, LightRAG, Hyper-RAG, KG-Gen, and more, ready to use.
- 📝 80+ YAML Templates: Zero-code extraction across Finance, Legal, Medical, Traditional Chinese Medicine (TCM), Industry, and General domains.
- 🔄 Incremental Evolution: Feed new documents anytime to expand and refine your knowledge base.
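To make the incremental-evolution idea concrete, here is a minimal sketch of folding a newly extracted batch into an existing graph. It is not Hyper-Extract's implementation: `KnowledgeGraph`, `merge`, and the name-normalization rule are hypothetical and only illustrate deduplicating nodes and edges as new documents arrive.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy knowledge base: nodes keyed by normalized name, edges as triples."""

    nodes: dict[str, str] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    @staticmethod
    def _key(name: str) -> str:
        # Hypothetical normalization: ignore case and extra whitespace.
        return " ".join(name.lower().split())

    def add_edge(self, source: str, relation: str, target: str) -> None:
        for name in (source, target):
            # Keep the first spelling seen for each entity.
            self.nodes.setdefault(self._key(name), name)
        self.edges.add((self._key(source), relation, self._key(target)))

    def merge(self, other: "KnowledgeGraph") -> None:
        """Fold another extraction result into this one without duplicates."""
        for key, name in other.nodes.items():
            self.nodes.setdefault(key, name)
        self.edges |= other.edges


kb = KnowledgeGraph()
kb.add_edge("Nikola Tesla", "invented", "AC induction motor")

update = KnowledgeGraph()
update.add_edge("nikola tesla", "invented", "AC induction motor")  # already known
update.add_edge("Nikola Tesla", "worked_with", "Westinghouse")     # new fact

kb.merge(update)
print(len(kb.nodes), "nodes,", len(kb.edges), "edges")
```

A real system would likely need fuzzier entity matching than exact normalized names; the point here is only that merging is idempotent for facts already in the knowledge base.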
🎯 What Can You Do With It?
📄 Researcher — Turn papers into knowledge graphs
Feed a 20-page academic paper, get an interactive graph of key concepts, authors, and citations.
he parse paper.pdf -t general/academic_graph -o ./paper_kb/
he show ./paper_kb/
🏦 Financial Analyst — Extract entities from earnings reports
Automatically identify companies, executives, financial metrics, and their relationships from unstructured reports.
he parse earnings.md -t finance/earnings_graph -o ./finance_kb/
he search ./finance_kb/ "What are the key risk factors?"
🔒 Local Deployment — Keep data on-premise with vLLM
Run Qwen3.5-9B + bge-m3 locally via vLLM. No data leaves your machine.
from hyperextract import create_client
llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)
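The model strings above appear to follow a `provider:model@base_url` pattern. The sketch below splits such a string into its parts to show how it reads. `ModelSpec` and `parse_model_spec` are hypothetical helpers, not part of the hyperextract package, and the real parser may accept other forms.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    provider: str
    model: str
    base_url: Optional[str]


def parse_model_spec(spec: str) -> ModelSpec:
    """Split 'provider:model[@base_url]' into parts (hypothetical helper)."""
    # Split at the first colon only, because the URL itself contains colons.
    provider, sep, rest = spec.partition(":")
    if not sep or not provider or not rest:
        raise ValueError(f"expected 'provider:model[@base_url]', got {spec!r}")
    # The base URL is optional and follows the first '@'.
    model, _, base_url = rest.partition("@")
    if not model:
        raise ValueError(f"missing model name in {spec!r}")
    return ModelSpec(provider, model, base_url or None)


print(parse_model_spec("vllm:Qwen3.5-9B@http://localhost:8000/v1"))
print(parse_model_spec("vllm:bge-m3@http://localhost:8001/v1"))
```

Read this way, the LLM and the embedder in the snippet above are served by two separate vLLM processes on ports 8000 and 8001.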
🚀 Supported Platforms & Models
Hyper-Extract relies on the LLM’s structured output capability (json_schema or Function Calling).
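As a rough illustration of why structured output matters, the sketch below pairs a JSON Schema of the kind a json_schema or function-calling interface accepts with a strict check of the reply. It uses only the standard library rather than Pydantic, makes no API call, and the `Person` type, `PERSON_SCHEMA`, and `parse_person` are hypothetical names chosen for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    birth_year: int


# JSON Schema describing the reply shape the model is constrained to.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "birth_year": {"type": "integer"},
    },
    "required": ["name", "birth_year"],
    "additionalProperties": False,
}


def parse_person(reply: str) -> Person:
    """Check a JSON reply against the expected fields and types."""
    data = json.loads(reply)
    if not isinstance(data, dict) or set(data) != set(PERSON_SCHEMA["required"]):
        raise ValueError("reply does not match the expected fields")
    if not isinstance(data["name"], str):
        raise ValueError("name must be a string")
    year = data["birth_year"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(year, int) or isinstance(year, bool):
        raise ValueError("birth_year must be an integer")
    return Person(**data)


# Stand-in for a model reply that was constrained by PERSON_SCHEMA.
reply = '{"name": "Nikola Tesla", "birth_year": 1856}'
print(parse_person(reply))
```

When the model honors the schema, every reply parses into the same typed object, which is what makes the extracted knowledge predictable enough to store and merge.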
| Platform | Verified Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-5 |
| Alibaba Cloud Bailian | qwen-plus, qwen-turbo, deepseek-r1 |
| Local vLLM | Qwen3.5-9B (GPTQ-Marlin) |
Embedding models (semantic search) work with any OpenAI-compatible endpoint: text-embedding-3-small, text-embedding-v4 (Bailian), bge-m3 (local vLLM).
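Semantic search generally works by embedding the query and ranking stored items by vector similarity. The sketch below shows that ranking step with cosine similarity over toy 3-dimensional vectors; in practice the vectors would come from one of the embedding endpoints listed above. The `cosine` and `search` functions are illustrative and are not Hyper-Extract's search code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]


# Toy vectors standing in for real embeddings.
index = {
    "Tesla designed the AC induction motor.": [0.9, 0.1, 0.0],
    "Revenue grew in the third quarter.": [0.0, 0.2, 0.9],
    "Tesla worked on wireless power.": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "What did Tesla invent?"

results = search(query, index)
for score, text in results:
    print(f"{score:.2f}  {text}")
```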
⚡ 30-Second Quick Start
# Install
uv tool install hyperextract
# Configure API key
he config init -k YOUR_OPENAI_API_KEY
# Extract knowledge from a document
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en
# Query it
he search ./output/ "What are Tesla's major achievements?"
# Visualize
he show ./output/
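For batch jobs, the same `he parse` step can be driven from a script. The sketch below only assembles the command line, using the flags shown in the quick start (`-t`, `-o`, `-l`), and prints it in dry-run mode. `parse_command` and `parse_all` are hypothetical helpers, and actually running the commands assumes `he` is installed and on `PATH`.

```python
import shlex
import subprocess
from pathlib import Path


def parse_command(doc: Path, template: str, out_dir: Path, lang: str = "en") -> list[str]:
    """Build the `he parse` invocation used in the quick start."""
    return ["he", "parse", str(doc), "-t", template, "-o", str(out_dir), "-l", lang]


def parse_all(docs: list[Path], template: str, out_root: Path, dry_run: bool = True):
    """Parse each document into its own output directory under out_root."""
    commands = []
    for doc in docs:
        cmd = parse_command(doc, template, out_root / doc.stem)
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # requires `he` on PATH
    return commands


commands = parse_all(
    [Path("examples/en/tesla.md")],
    template="general/biography_graph",
    out_root=Path("output"),
)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.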
🐍 Python API
uv pip install hyperextract
from hyperextract import Template
ka = Template.create("general/biography_graph")
with open("examples/en/tesla.md", encoding="utf-8") as f:
    result = ka.parse(f.read())
result.show()
📈 Why Hyper-Extract?
| Feature | GraphRAG | LightRAG | KG-Gen | ATOM | Hyper-Extract |
|---|---|---|---|---|---|
| Knowledge Graph | ✅ | ✅ | ✅ | ✅ | ✅ |
| Temporal Graph | ✅ | ❌ | ❌ | ✅ | ✅ |
| Spatial Graph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Hypergraph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Domain Templates | ❌ | ❌ | ❌ | ❌ | ✅ |
| Interactive CLI | ✅ | ❌ | ❌ | ❌ | ✅ |
| Multi-language | ✅ | ❌ | ❌ | ❌ | ✅ |
📋 What’s under the hood? (Architecture & Templates)
Hyper-Extract follows a three-layer architecture:
- Auto-Types: 8 strongly-typed data structures (Model, List, Set, Graph, Hypergraph, Temporal Graph, Spatial Graph, Spatio-Temporal Graph).
- Methods: Extraction algorithms, including KG-Gen, GraphRAG, LightRAG, Hyper-RAG, Cog-RAG, and more.
- Templates: 80+ presets across 6 domains. Zero-code setup.
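The graph-shaped Auto-Types differ mainly in what a single edge can express. The sketch below contrasts a binary edge, a hyperedge, and an edge annotated with time and place. These dataclasses are hypothetical illustrations with made-up data; they are not the classes Hyper-Extract ships.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """Graph edge: exactly two endpoints."""
    source: str
    relation: str
    target: str


@dataclass(frozen=True)
class HyperEdge:
    """Hypergraph edge: one relation joining any number of nodes."""
    relation: str
    members: frozenset[str]


@dataclass(frozen=True)
class SpatioTemporalEdge:
    """Edge annotated with when and where the relation holds."""
    source: str
    relation: str
    target: str
    time: Optional[str] = None
    place: Optional[str] = None


# One paper with three authors is a single hyperedge...
paper = HyperEdge("co_authored", frozenset({"Alice", "Bob", "Carol"}))

# ...but needs three binary edges, which no longer record that it was one paper.
pairwise = [Edge(a, "co_authored", b) for a, b in combinations(sorted(paper.members), 2)]

talk = SpatioTemporalEdge("Alice", "presented", "the paper", time="2024-05", place="Vienna")

print(len(paper.members), "members in one hyperedge")
print(len(pairwise), "binary edges for the same fact")
print(talk)
```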
📚 Documentation & Resources
| Resource | Link |
|---|---|
| Full Documentation | yifanfeng97.github.io/Hyper-Extract |
| CLI Guide | Command-line interface |
| Provider System | Model compatibility & local deployment |
| Template Gallery | 80+ presets |
| Examples | Working code |
🤝 Contributing & License
Contributions are welcome! Please submit Issues and PRs. Licensed under Apache-2.0.