yifanfeng97 / Hyper-Extract

yifanfeng97 / Hyper-Extract

Smart Knowledge Extraction CLI Transform documents into structured knowledge with one command. 📖 English Version · 中文版

智能知识提取命令行工具 (CLI) 只需一条命令,即可将文档转换为结构化知识。


“Stop reading. Start understanding.” “告别文档焦虑,让信息一目了然”

Hyper-Extract is an intelligent, LLM-powered knowledge extraction and evolution framework. It radically simplifies transforming highly unstructured texts into persistent, predictable, and strongly-typed Knowledge Abstracts. It effortlessly extracts information into a wide spectrum of formats—ranging from simple Collections (Lists/Sets) and Pydantic Models, to complex Knowledge Graphs, Hypergraphs, and even Spatio-Temporal Graphs.

Hyper-Extract 是一个基于大语言模型(LLM)的智能知识提取与演化框架。它极大地简化了将高度非结构化文本转换为持久、可预测且强类型的“知识摘要”的过程。它能够轻松地将信息提取为多种格式,从简单的集合(列表/集合)和 Pydantic 模型,到复杂的知识图谱、超图,甚至是时空图。


✨ Core Features / 核心功能

  • 🔷 8 Knowledge Structures: From simple Lists to advanced Graphs, Hypergraphs, and Spatio-Temporal Graphs. 8 种知识结构:从简单的列表到高级的图谱、超图以及时空图。
  • 🧠 10+ Extraction Engines: GraphRAG, LightRAG, Hyper-RAG, KG-Gen, and more — ready to use. 10+ 种提取引擎:内置 GraphRAG、LightRAG、Hyper-RAG、KG-Gen 等,开箱即用。
  • 📝 80+ YAML Templates: Zero-code extraction across Finance, Legal, Medical, TCM, Industry, and General domains. 80+ 种 YAML 模板:覆盖金融、法律、医疗、中医、工业及通用领域,实现零代码提取。
  • 🔄 Incremental Evolution: Feed new documents anytime to expand and refine your knowledge base. 增量演化:随时输入新文档,以扩展和完善您的知识库。

🎯 What Can You Do With It? / 你可以用它做什么?

📄 Researcher — Turn papers into knowledge graphs

Feed a 20-page academic paper, get an interactive graph of key concepts, authors, and citations.

he parse paper.pdf -t general/academic_graph -o ./paper_kb/
he show ./paper_kb/

研究人员 — 将论文转化为知识图谱 输入一份 20 页的学术论文,即可获得包含关键概念、作者和引用的交互式图谱。

🏦 Financial Analyst — Extract entities from earnings reports

Automatically identify companies, executives, financial metrics, and their relationships from unstructured reports.

he parse earnings.md -t finance/earnings_graph -o ./finance_kb/
he search ./finance_kb/ "What are the key risk factors?"

金融分析师 — 从财报中提取实体 自动从非结构化报告中识别公司、高管、财务指标及其相互关系。

🔒 Local Deployment — Keep data on-premise with vLLM

Run Qwen3.5-9B + bge-m3 locally via vLLM. No data leaves your machine.

from hyperextract import create_client
llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)

本地部署 — 通过 vLLM 确保数据不出本地 通过 vLLM 在本地运行 Qwen3.5-9B + bge-m3,确保数据不离开您的机器。


🚀 Supported Platforms & Models / 支持的平台与模型

Hyper-Extract relies on the LLM’s structured output capability (json_schema or Function Calling). Hyper-Extract 依赖于大模型的结构化输出能力(json_schema 或 Function Calling)。

PlatformVerified Models
OpenAIgpt-4o, gpt-4o-mini, gpt-5
阿里云百炼qwen-plus, qwen-turbo, deepseek-r1
Local vLLMQwen3.5-9B (GPTQ-Marlin)

Embedding models (semantic search) work with any OpenAI-compatible endpoint: text-embedding-3-small, text-embedding-v4 (Bailian), bge-m3 (local vLLM). 嵌入模型(语义搜索)适用于任何兼容 OpenAI 的端点:text-embedding-3-smalltext-embedding-v4(百炼)、bge-m3(本地 vLLM)。


⚡ 30-Second Quick Start / 30 秒快速上手

# Install uv
tool install hyperextract

# Configure API key
he config init -k YOUR_OPENAI_API_KEY

# Extract knowledge from a document
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en

# Query it
he search ./output/ "What are Tesla's major achievements?"

# Visualize
he show ./output/

🐍 Python API

uv pip install hyperextract
from hyperextract import Template

ka = Template.create("general/biography_graph")
with open("examples/en/tesla.md") as f:
    result = ka.parse(f.read())
result.show()

📈 Why Hyper-Extract? / 为什么选择 Hyper-Extract?

FeatureGraphRAGLightRAGKG-GenATOMHyper-Extract
Knowledge Graph
Temporal Graph
Spatial Graph
Hypergraph
Domain Templates
Interactive CLI
Multi-language

📋 What’s under the hood? (Architecture & Templates) / 核心架构与模板

Hyper-Extract follows a three-layer architecture: Hyper-Extract 采用三层架构:

  1. Auto-Types: 8 strongly-typed data structures (Model, List, Set, Graph, Hypergraph, Temporal Graph, Spatial Graph, Spatio-Temporal Graph). 自动类型:8 种强类型数据结构(模型、列表、集合、图、超图、时序图、空间图、时空图)。
  2. Methods: Extraction algorithms: KG-Gen, GraphRAG, LightRAG, Hyper-RAG, Cog-RAG, and more. 方法:提取算法,包括 KG-Gen、GraphRAG、LightRAG、Hyper-RAG、Cog-RAG 等。
  3. Templates: 80+ presets across 6 domains. Zero-code setup. 模板:覆盖 6 大领域的 80+ 预设,实现零代码配置。

📚 Documentation & Resources / 文档与资源

ResourceLink
Full Documentationyifanfeng97.github.io/Hyper-Extract
CLI GuideCommand-line interface
Provider SystemModel compatibility & local deployment
Template Gallery80+ presets
ExamplesWorking code

🤝 Contributing & License / 贡献与许可

Contributions are welcome! Please submit Issues and PRs. Licensed under Apache-2.0.

欢迎贡献代码!请提交 Issue 和 PR。 本项目采用 Apache-2.0 协议开源。