PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters
Evaluate PP-OCRv6 online, then integrate lightweight, production-ready OCR using the PaddlePaddle, Transformers, or ONNX Runtime backend.
PP-OCRv6 is the latest generation of PaddleOCR's universal OCR model family. It is designed for real-world text detection and recognition across documents, screenshots, multilingual images, digital displays, industrial labels, and scene text. The family scales from 1.5M to 34.5M parameters across three tiers: tiny, small, and medium. The medium and small tiers support 50 languages, including Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages.
Try PP-OCRv6 online: PP-OCRv6 Online Demo.
On PaddleOCR's official in-house multi-scenario OCR benchmarks, PP-OCRv6_medium reaches 86.2% detection Hmean and 83.2% recognition accuracy. Compared with PP-OCRv5_server, it improves text detection by 4.6 percentage points and text recognition by 5.1 percentage points.
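Detection Hmean is conventionally the harmonic mean of precision and recall. The sketch below shows that definition and also backs out the PP-OCRv5_server scores implied by the gains quoted above. The function names are illustrative, and the baseline values are derived by subtraction from the figures in this post rather than taken from a separate benchmark table.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_baseline(new_score: float, gain_points: float) -> float:
    """Score of the older model implied by a gain in percentage points."""
    return round(new_score - gain_points, 1)


# Figures quoted above for PP-OCRv6_medium vs. PP-OCRv5_server.
v5_detection = implied_baseline(86.2, 4.6)
v5_recognition = implied_baseline(83.2, 5.1)
print(v5_detection, v5_recognition)
```

Under these figures, the implied PP-OCRv5_server baselines are about 81.6% detection Hmean and 78.1% recognition accuracy.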
PP-OCRv6 focuses on a practical OCR need: producing accurate, structured text outputs with small models and flexible deployment options. For a deeper discussion of why specialized OCR models remain useful in the VLM era, see our previous blog: PP-OCRv5 on Hugging Face: A Specialized Approach to OCR.
What’s new in PP-OCRv6
PP-OCRv6 introduces architecture, training, and data improvements across detection and recognition. The main design goal is to improve OCR accuracy while keeping model sizes suitable for a range of deployment settings.
Three model tiers
PP-OCRv6 provides three model tiers, covering different model sizes and OCR accuracy levels.
| Model | Model size | Detection Hmean | Recognition accuracy | Typical application scenarios |
|---|---|---|---|---|
| PP-OCRv6_tiny | 1.5M params | 80.6% | 73.5% | Edge devices, lightweight local OCR, latency-sensitive demos, constrained environments |
| PP-OCRv6_small | 7.7M params | 84.1% | 81.3% | Mobile, desktop, balanced OCR services, multilingual OCR with lower compute cost |
| PP-OCRv6_medium | 34.5M params | 86.2% | 83.2% | Accuracy-oriented OCR, server-side pipelines, industrial OCR, document ingestion, multilingual OCR |
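One way to read the table is as a lookup from a size budget to a tier. The following is a minimal sketch of that idea, not part of PaddleOCR: the `Tier` class and `pick_tier` helper are hypothetical names, and the numbers are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    params_m: float      # parameter count in millions
    det_hmean: float     # detection Hmean, percent
    rec_accuracy: float  # recognition accuracy, percent


# Values copied from the tier table.
TIERS = [
    Tier("PP-OCRv6_tiny", 1.5, 80.6, 73.5),
    Tier("PP-OCRv6_small", 7.7, 84.1, 81.3),
    Tier("PP-OCRv6_medium", 34.5, 86.2, 83.2),
]


def pick_tier(max_params_m: float) -> Tier:
    """Return the most accurate tier that fits within a parameter budget."""
    candidates = [t for t in TIERS if t.params_m <= max_params_m]
    if not candidates:
        raise ValueError(f"No tier fits within {max_params_m}M parameters")
    return max(candidates, key=lambda t: t.rec_accuracy)


print(pick_tier(10).name)
```

A 10M-parameter budget selects PP-OCRv6_small here. In practice the choice would likely also weigh latency and language coverage, which this sketch ignores.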
PPLCNetV4 backbone
PP-OCRv6 uses PPLCNetV4 as a unified backbone for text detection and text recognition. For developers, the main benefit is consistency across the model family. The tiny, small, and medium tiers are not unrelated models: they belong to the same OCR family and share a common architectural direction.
RepLKFPN for text detection
Text detection is the first stage of the OCR pipeline. Detection quality determines the crops sent to the recognizer, and poor crops often lead to poorer recognition. PP-OCRv6 upgrades the detection module with RepLKFPN, a lightweight large-kernel feature pyramid network designed for multi-scale text detection while keeping inference efficient. This matters for real-world OCR inputs, where text may be small, dense, rotated, low-resolution, or embedded in complex backgrounds.
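To make the link between detection and recognition concrete, the sketch below crops a region from a toy image using a detected quadrilateral. It is a simplified illustration in plain Python, not PaddleOCR's actual cropping code: the helper names are hypothetical, and it uses an axis-aligned bounding box, whereas real pipelines typically also rectify rotated text.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]


def bounding_box(polygon: Sequence[Point]) -> Box:
    """Axis-aligned box (left, top, right, bottom) enclosing a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return int(min(xs)), int(min(ys)), int(max(xs)) + 1, int(max(ys)) + 1


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Crop a row-major image (list of rows) to the box, clamped to bounds."""
    left, top, right, bottom = box
    height, width = len(image), len(image[0])
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, width), min(bottom, height)
    return [row[left:right] for row in image[top:bottom]]


# A 4x6 toy "image" whose pixel values encode their position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
# Hypothetical detector output: a quadrilateral around columns 1-3, rows 1-2.
quad = [(1, 1), (3, 1), (3, 2), (1, 2)]
text_crop = crop(image, bounding_box(quad))
print(text_crop)
```

If the detector's polygon is too tight or shifted, the crop loses part of the text before the recognizer ever sees it, which is why detection errors propagate downstream.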
EncoderWithLightSVTR for recognition
For text recognition, PP-OCRv6 uses EncoderWithLightSVTR. It combines local context modeling with global attention to improve recognition quality on challenging text crops. The recognition improvements are especially relevant for multilingual text, screen text, industrial characters, special symbols, dense text, and noisy image regions.
Unified multilingual OCR
The medium and small tiers support 50 languages in one model family, covering Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages. This helps reduce the need for separate OCR models across common multilingual OCR scenarios.
Quick start with PaddleOCR
Install PaddleOCR: pip install paddleocr
Run OCR with Paddle Inference (the default backend):
from paddleocr import PaddleOCR

# Model: PP-OCRv6_medium (default)
# Backend: Paddle Inference (default)
ocr = PaddleOCR(
    use_doc_orientation_classify=False,  # skip document orientation classification
    use_doc_unwarping=False,  # skip document unwarping
    use_textline_orientation=False,  # skip text-line orientation classification
)

result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
for res in result:
    res.print()  # print the OCR result
    res.save_to_img("output")  # save the visualization image
    res.save_to_json("output")  # save the structured JSON output
The OCR result can be saved as visualization images and structured JSON output. The structured output can then be used by downstream systems such as document parsing, search, extraction, RAG, analytics, or agent workflows.
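As an illustration of downstream use, the sketch below filters recognized lines by confidence and joins them into plain text. The sample JSON is made up for the example, and the `rec_texts` and `rec_scores` keys are assumptions about the saved schema, so check them against a JSON file written by your installed version. The `confident_lines` helper is hypothetical.

```python
import json

# Hypothetical sample of a saved result. The "rec_texts" and "rec_scores"
# keys are assumptions; verify them against a file your version writes.
SAMPLE_JSON = """
{
  "input_path": "general_ocr_002.png",
  "rec_texts": ["Boarding Pass", "GATE 12", "???"],
  "rec_scores": [0.98, 0.95, 0.41]
}
"""


def confident_lines(result: dict, min_score: float = 0.5) -> list:
    """Keep recognized lines whose confidence is at least min_score."""
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    return [t for t, s in zip(texts, scores) if s >= min_score]


result = json.loads(SAMPLE_JSON)
document_text = "\n".join(confident_lines(result))
print(document_text)
```

The joined text could then be passed to a search index or a RAG chunker. The 0.5 threshold is arbitrary and would need tuning per use case.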
Available inference backends
PP-OCRv6 can be used with multiple inference backends through PaddleOCR. PaddleOCR 3.7 provides a unified inference-engine interface: the engine parameter selects the underlying runtime, and related configuration can be passed through the pipeline or module API.
| Backend | Description |
|---|---|
| Transformers | Hugging Face / PyTorch-oriented inference path for supported PaddleOCR models |
| ONNX Runtime | Portable inference path for ONNX-based deployment environments |
| Paddle Inference | Native Paddle inference format |
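Because the unified engine interface is described as a PaddleOCR 3.7 feature, it may be worth checking the installed version before passing an engine argument. The sketch below does this with the standard library only. The helper names are hypothetical, and the "3.7 or later" threshold is an assumption drawn from the sentence above rather than from a documented compatibility matrix.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.7.0' into (3, 7, 0), ignoring non-numeric suffixes like 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package: str = "paddleocr") -> Optional[Tuple[int, ...]]:
    """Installed version as a tuple, or None if the package is missing."""
    try:
        return parse_version(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


def supports_engine_argument(version: Optional[Tuple[int, ...]]) -> bool:
    """Assumption: the unified engine interface needs PaddleOCR 3.7 or later."""
    return version is not None and version >= (3, 7)


print(supports_engine_argument(installed_version()))
```

The printed value depends on the local environment; it is False when paddleocr is not installed.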
For Hugging Face users, PaddleOCR supports running selected OCR and document parsing models with a Transformers backend. This can be enabled with: engine="transformers"
For more details on how the Transformers backend works in PaddleOCR, see: PaddleOCR: Running OCR and Document Parsing Tasks with a Transformers Backend
Run PP-OCRv6 with the Transformers backend:
from paddleocr import PaddleOCR

# Model: PP-OCRv6_medium (default)
# Backend: Transformers
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="transformers",
)

result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")