PaddlePaddle / PaddleOCR
Globally Leading OCR Toolkit & Document AI Engine
PaddleOCR converts PDF documents and images into structured, LLM-ready data (JSON/Markdown) with industry-leading accuracy. With 70k+ stars and adoption by projects such as Dify, RAGFlow, and Cherry Studio, PaddleOCR is a foundation for building intelligent RAG and agentic applications.
🚀 Key Features
📄 Intelligent Document Parsing (LLM-Ready)
Transforming messy visuals into structured data for the LLM era.
- SOTA Document VLM: Features PaddleOCR-VL-1.6 (0.9B), a lightweight vision-language model for document parsing. It achieves 96.3% accuracy on OmniDocBench v1.6, leads in text, formula, and table recognition, and shows significantly improved handling of ancient documents, rare characters, seals, and charts. Structured output is available in Markdown and JSON.
- Structure-Aware Conversion: Powered by PP-StructureV3, which converts complex PDFs and images into Markdown or JSON. Unlike the PaddleOCR-VL series models, it provides finer-grained coordinate information, including table cell coordinates and text coordinates.
- Production-Ready Efficiency: Achieves commercial-grade accuracy with a very small footprint. It outperforms numerous closed-source solutions on public benchmarks while remaining resource-efficient for edge and cloud deployment.
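To illustrate why cell-level coordinates are useful, here is a minimal sketch that rebuilds a Markdown table from cells that carry bounding boxes. The `Cell` class, the `cells_to_markdown` helper, and the `row_tolerance` parameter are hypothetical names invented for this example; they are not the PP-StructureV3 output schema, which should be checked in the official documentation.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One recognized table cell (hypothetical schema, not PP-StructureV3's)."""

    text: str
    bbox: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def cells_to_markdown(cells: list[Cell], row_tolerance: float = 10.0) -> str:
    """Group cells into rows by vertical position, then emit a Markdown table."""
    rows: list[list[Cell]] = []
    # Sort top-to-bottom so that cells of the same row end up adjacent.
    for cell in sorted(cells, key=lambda c: c.bbox[1]):
        if rows and abs(rows[-1][0].bbox[1] - cell.bbox[1]) <= row_tolerance:
            rows[-1].append(cell)
        else:
            rows.append([cell])

    lines: list[str] = []
    for index, row in enumerate(rows):
        row.sort(key=lambda c: c.bbox[0])  # left-to-right within a row
        lines.append("| " + " | ".join(c.text for c in row) + " |")
        if index == 0:
            # The first row is treated as the header (an assumption of this sketch).
            lines.append("|" + " --- |" * len(row))
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Cell("Qty", (120, 10, 200, 30)),
        Cell("Item", (10, 12, 110, 30)),
        Cell("2", (120, 40, 200, 60)),
        Cell("Pen", (10, 41, 110, 60)),
    ]
    print(cells_to_markdown(demo))
```

The row grouping here is a simple tolerance on the top edge of each box; real tables with merged or skewed cells would need something more robust.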
🔍 Universal Text Recognition (Scene OCR)
High-speed, multilingual text spotting.
- 100+ Languages Supported: Native recognition across a large set of languages. The PP-OCRv5 single-model solution handles mixed-language documents (Chinese, English, Japanese, Pinyin, etc.).
- Complex Element Mastery: Beyond standard text recognition, PaddleOCR supports natural scene text spotting across a wide range of environments, including IDs, street views, books, and industrial components.
- Performance Leap: PP-OCRv5 delivers a 13% accuracy boost over the previous generation while maintaining the “Extreme Efficiency” that PaddleOCR is known for.
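A common step after scene OCR is dropping low-confidence lines before passing text downstream. The sketch below assumes the PaddleOCR 3.x Python package, where `PaddleOCR(...).predict(...)` yields dict-like results with parallel `rec_texts` and `rec_scores` lists; treat those names and constructor flags as assumptions to verify against the installed version. `keep_confident_lines` and `ocr_image` are hypothetical helpers written for this example.

```python
def keep_confident_lines(result: dict, min_score: float = 0.8) -> list[str]:
    """Return recognized lines whose confidence is at least min_score.

    Assumes `result` exposes parallel "rec_texts" and "rec_scores" lists.
    """
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    return [text for text, score in zip(texts, scores) if score >= min_score]


def ocr_image(image_path: str, min_score: float = 0.8) -> list[str]:
    """Run OCR on one image and keep confident lines.

    Requires `pip install paddleocr` plus model downloads on first use.
    """
    # Imported lazily so the pure-Python helper above works without the package.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
        use_textline_orientation=False,
    )
    lines: list[str] = []
    for res in ocr.predict(image_path):
        lines.extend(keep_confident_lines(res, min_score))
    return lines


if __name__ == "__main__":
    sample = {
        "rec_texts": ["Invoice", "T0tal", "42.00"],
        "rec_scores": [0.99, 0.41, 0.93],
    }
    print(keep_confident_lines(sample))
```

The 0.8 threshold is arbitrary; a suitable value depends on the image quality and on how costly a missed line is compared with a wrong one.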
🛠️ Developer-Centric Ecosystem
- Seamless Integration: A leading choice for the AI agent ecosystem, deeply integrated with Dify, RAGFlow, Pathway, and Cherry Studio.
- LLM Data Flywheel: A complete pipeline for building high-quality datasets, providing a sustainable “Data Engine” for fine-tuning large language models.
- One-Click Deployment: Supports various hardware backends (NVIDIA GPU, Intel CPU, Kunlunxin XPU, and other AI accelerators).
📣 Recent updates
🔥 2026.05.28: Release of PaddleOCR 3.6.0
PaddleOCR-VL-1.6 highlights:
- New SOTA Accuracy: Achieves over 96.3% on OmniDocBench v1.6 and sets new SOTA results on OmniDocBench v1.5 and Real5-OmniDocBench, ahead of both open-source and proprietary solutions in text, formula, and table recognition.
- Comprehensive Capability Upgrade: Significant improvements in table, ancient document, and rare character recognition, with notably better seal recognition, text spotting, and chart understanding across multiple scenarios.
- Seamless Migration: The model architecture is fully consistent with PaddleOCR-VL-1.5, so existing integrations can swap in the new model without adaptation work.
- Try it now: Available on Hugging Face or the official website.
2026.04.21: Release of PaddleOCR 3.5.0
- Flexible inference backends: Switch between Paddle static graph, Paddle dynamic graph, and Transformers. PaddleOCR is now deeply integrated with the Hugging Face ecosystem, and 20 major models support Transformers as the inference backend.
- Office documents to Markdown: Convert common document formats such as Word, Excel, and PowerPoint into Markdown.
- DOCX export for parsed results: The PaddleOCR-VL series, PP-StructureV3, and PP-DocTranslation now support exporting parsed results to DOCX for convenient viewing and editing in Microsoft Word.
- Official browser inference SDK: Released PaddleOCR.js, the official browser inference SDK, which runs PP-OCRv5 directly in the browser.
2026.01.29: Release of PaddleOCR 3.4.0
- PaddleOCR-VL-1.5 (SOTA 0.9B VLM): Our latest flagship model for document parsing is now live. It reaches 94.5% accuracy on OmniDocBench, surpassing top-tier general large models and specialized document parsers.
- Real-World Robustness: First to introduce the PP-DocLayoutV3 algorithm for irregular shape positioning, handling 5 difficult scenarios: skew, warping, scanning, illumination, and screen photography.
- Capability Expansion: Now supports seal recognition and text spotting, and expands coverage to 111 languages (including China’s Tibetan script and Bengali).
- Long Document Mastery: Supports automatic cross-page table merging and hierarchical heading identification.
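To make the idea of cross-page table merging concrete, here is a minimal heuristic sketch. It is not PaddleOCR's algorithm: the block schema (`type`, `rows`, `text`) and the `merge_cross_page_tables` function are hypothetical, and the rule used (a table that opens a page continues the table that closed the previous page when the column counts match) is only one plausible approach.

```python
def merge_cross_page_tables(pages: list[list[dict]]) -> list[dict]:
    """Flatten pages into one block list, joining tables split by a page break.

    Each block is a dict in a hypothetical schema:
      {"type": "table", "rows": [[...], ...]}  (rows assumed non-empty)
      {"type": "text", "text": "..."}
    """
    merged: list[dict] = []
    for page in pages:
        for position, block in enumerate(page):
            previous = merged[-1] if merged else None
            continues_table = (
                position == 0  # first block on this page
                and previous is not None
                and previous["type"] == "table"
                and block["type"] == "table"
                and len(block["rows"][0]) == len(previous["rows"][0])
            )
            if continues_table:
                rows = block["rows"]
                # Drop a header row that is repeated at the top of the continuation.
                if rows[0] == previous["rows"][0]:
                    rows = rows[1:]
                previous["rows"].extend(rows)
            else:
                copy = dict(block)
                if copy["type"] == "table":
                    copy["rows"] = list(copy["rows"])  # avoid mutating the input
                merged.append(copy)
    return merged


if __name__ == "__main__":
    pages = [
        [
            {"type": "text", "text": "Report"},
            {"type": "table", "rows": [["Item", "Qty"], ["Pen", "2"]]},
        ],
        [
            {"type": "table", "rows": [["Item", "Qty"], ["Ink", "5"]]},
            {"type": "text", "text": "End"},
        ],
    ]
    for block in merge_cross_page_tables(pages):
        print(block)
```

A production system would likely also compare column positions and layout cues rather than relying on column counts alone.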
2025.10.16: Release of PaddleOCR 3.3.0
- Released PaddleOCR-VL. Model introduction: PaddleOCR-VL is a SOTA, resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition.