PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters

Evaluate PP-OCRv6 online, then integrate lightweight, production-ready OCR with a PaddlePaddle, Transformers, or ONNX Runtime backend.

PP-OCRv6 is the latest generation of PaddleOCR’s universal OCR model family. It is designed for real-world text detection and recognition across documents, screenshots, multilingual images, digital displays, industrial labels, and scene text. The model family scales from 1.5M to 34.5M parameters across three tiers: tiny, small, and medium. The medium and small tiers support 50 languages: Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages.

Try PP-OCRv6 online: PP-OCRv6 Online Demo.

On PaddleOCR’s official in-house multi-scenario OCR benchmarks, PP-OCRv6_medium reaches 86.2% detection Hmean and 83.2% recognition accuracy. Compared with PP-OCRv5_server, that is an improvement of 4.6 percentage points in text detection and 5.1 percentage points in text recognition.
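
For readers unfamiliar with the detection metric: Hmean conventionally denotes the harmonic mean of detection precision and recall. The post does not describe its evaluation protocol (for example, how a predicted box is matched to a ground-truth box), so the sketch below shows only the standard formula, and the input values are hypothetical rather than PP-OCRv6 measurements.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the usual F-score)."""
    if precision + recall == 0:
        # Avoid division by zero when nothing was detected correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for illustration only; not taken from the benchmark.
example = hmean(precision=0.90, recall=0.80)
print(f"Hmean = {example:.3f}")
```

Because the harmonic mean is pulled toward the lower of the two inputs, a detector cannot reach a high Hmean by trading away recall for precision or the reverse.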

PP-OCRv6 focuses on a practical OCR need: producing accurate, structured text outputs with small models and flexible deployment options. For a deeper discussion of why specialized OCR models remain useful in the VLM era, see our previous blog: PP-OCRv5 on Hugging Face: A Specialized Approach to OCR.

What’s new in PP-OCRv6

PP-OCRv6 introduces architecture, training, and data improvements across detection and recognition. The main design goal is to improve OCR accuracy while keeping model sizes suitable for different deployment settings.

Three model tiers

PP-OCRv6 provides three model tiers, covering different model sizes and OCR accuracy levels.

| Model | Model size | Detection Hmean | Recognition accuracy | Typical application scenarios |
| --- | --- | --- | --- | --- |
| PP-OCRv6_tiny | 1.5M params | 80.6% | 73.5% | Edge devices, lightweight local OCR, latency-sensitive demos, constrained environments |
| PP-OCRv6_small | 7.7M params | 84.1% | 81.3% | Mobile, desktop, balanced OCR services, multilingual OCR with lower compute cost |
| PP-OCRv6_medium | 34.5M params | 86.2% | 83.2% | Accuracy-oriented OCR, server-side pipelines, industrial OCR, document ingestion, multilingual OCR |
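
One way to read the tier table is as a lookup: given a parameter budget, pick the most accurate tier that fits. The sketch below encodes the table's figures and does that. The helper name `pick_tier` and the `need_50_languages` flag are hypothetical, not part of PaddleOCR; the flag excludes the tiny tier only because the post states 50-language support for the small and medium tiers.

```python
from typing import Optional

# Figures copied from the tier table:
# (name, parameters in millions, detection Hmean %, recognition accuracy %)
TIERS = [
    ("PP-OCRv6_tiny", 1.5, 80.6, 73.5),
    ("PP-OCRv6_small", 7.7, 84.1, 81.3),
    ("PP-OCRv6_medium", 34.5, 86.2, 83.2),
]

# The post states 50-language support for these two tiers only.
FIFTY_LANGUAGE_TIERS = {"PP-OCRv6_small", "PP-OCRv6_medium"}


def pick_tier(max_params_m: float, need_50_languages: bool = False) -> Optional[str]:
    """Return the most accurate tier within the parameter budget, or None."""
    candidates = [
        tier
        for tier in TIERS
        if tier[1] <= max_params_m
        and (not need_50_languages or tier[0] in FIFTY_LANGUAGE_TIERS)
    ]
    if not candidates:
        return None
    # Rank by recognition accuracy (index 3 of each tuple).
    return max(candidates, key=lambda tier: tier[3])[0]


print(pick_tier(10.0))                         # budget fits tiny and small
print(pick_tier(2.0, need_50_languages=True))  # nothing fits both constraints
```

Parameter count is only a proxy for latency and memory, so a real selection would also measure the candidate tiers on the target hardware.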

PPLCNetV4 backbone

PP-OCRv6 uses PPLCNetV4 as a unified backbone for text detection and text recognition. For developers, the main benefit is consistency across the model family. The tiny, small, and medium tiers are not unrelated models; they are part of the same OCR family and share a common architectural direction.

RepLKFPN for text detection

Text detection is the first stage of the OCR pipeline. Detection quality affects the crops sent to the recognizer, and poor crops often lead to poorer recognition. PP-OCRv6 upgrades the detection module with RepLKFPN, a lightweight large-kernel feature pyramid network designed for multi-scale text detection while keeping inference efficient. This is relevant for real-world OCR inputs, where text may be small, dense, rotated, low-resolution, or embedded in complex backgrounds.
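
To make the hand-off between the two stages concrete, the sketch below crops a detected region out of an image before it would be passed to a recognizer. It is a simplified illustration, not PaddleOCR's implementation: the image is a toy nested list, the function name `crop_text_region` is hypothetical, and the crop is an axis-aligned bounding box, whereas a production pipeline would typically rectify rotated quadrilaterals.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]                   # rows of grayscale pixel values
Polygon = Sequence[Tuple[float, float]]   # (x, y) corners of a detected region


def crop_text_region(image: Image, polygon: Polygon) -> Image:
    """Crop the axis-aligned bounding box of a polygon, clamped to the image."""
    height, width = len(image), len(image[0])
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x0 = max(0, int(min(xs)))
    y0 = max(0, int(min(ys)))
    x1 = min(width, int(max(xs)) + 1)   # +1 so the far edge is included
    y1 = min(height, int(max(ys)) + 1)
    return [row[x0:x1] for row in image[y0:y1]]


# Toy 4x6 image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
box = [(1, 1), (3, 1), (3, 2), (1, 2)]   # hypothetical detector output

crop = crop_text_region(image, box)
print(crop)
```

If the detector's box is too tight or shifted, the crop loses part of a character, and no recognizer can recover what is no longer in its input.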

EncoderWithLightSVTR for recognition

For text recognition, PP-OCRv6 uses EncoderWithLightSVTR. It combines local context modeling with global attention to improve recognition quality on challenging text crops. The recognition improvements are especially relevant for multilingual text, screen text, industrial characters, special symbols, dense text, and noisy image regions.
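
The post does not describe the layers inside EncoderWithLightSVTR, so the following is only a toy illustration of the general idea of pairing a local mixing step with global self-attention over a feature sequence. Everything in it is an assumption made for illustration: the neighbour averaging stands in for a learned local operator, the attention has no learned projections, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def local_mix(seq: List[Vector], window: int = 3) -> List[Vector]:
    """Average each position with its neighbours (stand-in for a local conv)."""
    half = window // 2
    dim = len(seq[0])
    mixed = []
    for i in range(len(seq)):
        neighbours = seq[max(0, i - half): i + half + 1]
        mixed.append([sum(v[d] for v in neighbours) / len(neighbours) for d in range(dim)])
    return mixed


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with Q = K = V = seq (no learned weights)."""
    dim = len(seq[0])
    attended = []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in seq]
        weights = softmax(scores)
        attended.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(dim)])
    return attended


def encode(seq: List[Vector]) -> List[Vector]:
    """Local mixing first, then global attention added back as a residual."""
    local = local_mix(seq)
    attended = global_attention(local)
    return [[l + a for l, a in zip(lv, av)] for lv, av in zip(local, attended)]


# Hypothetical 4-step feature sequence with 2 channels per step.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
encoded = encode(features)
print(len(encoded), len(encoded[0]))
```

The local step lets each position see only its immediate neighbours, while the attention step lets every position draw on the whole text line, which is the complementary behaviour the paragraph above refers to.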

Unified multilingual OCR

The medium and small tiers support 50 languages in one model family, covering Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages. This helps reduce the need for separate OCR models across common multilingual OCR scenarios.

Quick start with PaddleOCR

Install PaddleOCR:

pip install paddleocr

Run OCR with Paddle Inference (the default backend):

from paddleocr import PaddleOCR
# Model: PP-OCRv6_medium (default)
# Backend: Paddle Inference (default)
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
)
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")

The OCR result can be saved as visualization images and structured JSON output. The structured output can then be used by downstream systems such as document parsing, search, extraction, RAG, analytics, or agent workflows.
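
As a sketch of downstream use, the snippet below filters recognized lines by confidence before handing them to a search or RAG step. The field names `rec_texts` and `rec_scores` are an assumption about the saved JSON layout and should be checked against a file produced by `save_to_json`; the helper name, the threshold, and the sample payload are hypothetical.

```python
import json
from typing import Any, Dict, List


def extract_lines(result: Dict[str, Any], min_score: float = 0.5) -> List[str]:
    """Keep recognized lines whose confidence is at least min_score."""
    # Assumed keys: parallel lists of recognized strings and their scores.
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    return [text for text, score in zip(texts, scores) if score >= min_score]


# Hypothetical payload standing in for a file written by save_to_json.
sample = json.loads('{"rec_texts": ["Invoice", "T0tal"], "rec_scores": [0.98, 0.41]}')

lines = extract_lines(sample)
print(lines)
```

For a real file, `json.load` on the saved path would replace the inline sample. Filtering on confidence is a design choice: it trades some recall for cleaner text in the downstream index.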

Available inference backends

PP-OCRv6 can be used with multiple inference backends through PaddleOCR. PaddleOCR 3.7 provides a unified inference-engine interface: the engine parameter selects the underlying runtime, and related configuration can be passed through the pipeline or module API.

| Backend | Description |
| --- | --- |
| Transformers | Hugging Face / PyTorch-oriented inference path for supported PaddleOCR models |
| ONNX Runtime | Portable inference path for ONNX-based deployment environments |
| Paddle Inference | Native Paddle inference format |

For Hugging Face users, PaddleOCR supports running selected OCR and document parsing models with a Transformers backend. This can be enabled with engine="transformers".

For more details on how the Transformers backend works in PaddleOCR, see: PaddleOCR: Running OCR and Document Parsing Tasks with a Transformers Backend

Run the PP-OCRv6 example with the Transformers backend:

from paddleocr import PaddleOCR
# Model: PP-OCRv6_medium (default)
# Backend: Transformers (engine="transformers")
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="transformers",
)
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
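
The two examples in this post differ only in the engine argument, so the shared options can be factored into one place. The sketch below is one possible arrangement rather than a PaddleOCR convention: `build_ocr_kwargs` and `run_ocr` are hypothetical helper names, and "transformers" is the only engine value used because it is the only one the post names.

```python
from typing import Any, Dict, Optional


def build_ocr_kwargs(engine: Optional[str] = None) -> Dict[str, Any]:
    """Shared pipeline options; engine is omitted to keep the default backend."""
    kwargs: Dict[str, Any] = {
        "use_doc_orientation_classify": False,
        "use_doc_unwarping": False,
        "use_textline_orientation": False,
    }
    if engine is not None:
        kwargs["engine"] = engine
    return kwargs


def run_ocr(image: str, engine: Optional[str] = None):
    """Build the pipeline and run it on one image path or URL."""
    # Imported lazily so this module loads even where paddleocr is not installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(**build_ocr_kwargs(engine))
    return ocr.predict(image)


print(build_ocr_kwargs())                # default backend
print(build_ocr_kwargs("transformers"))  # Transformers backend
```

Calling `run_ocr(image_url, engine="transformers")` and iterating over the returned results mirrors the loop in the Paddle Inference example, which makes it straightforward to compare outputs from the two backends on the same image.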