PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend
PaddleOCR 3.5 brings OCR and document parsing tasks closer to the Hugging Face ecosystem. With this release, supported PaddleOCR models can run with Hugging Face Transformers as an inference backend by setting `engine="transformers"`.
PaddleOCR continues to provide OCR model series such as PP-OCRv5 and document parsing model series such as PaddleOCR-VL 1.5, while Transformers becomes one of the supported backends for running them. Try the live demo on Hugging Face Spaces: https://huggingface.co/spaces/PaddlePaddle/paddleocr-3.5-transformers-demo
What changed?
PaddleOCR 3.5 introduces a more flexible inference-engine interface. Developers can select the backend through the engine parameter and pass backend-specific options through engine_config. In practice, this means:
- The pipelines behind these tasks are managed by PaddleOCR, so developers do not need to manually call each internal component.
- Transformers becomes one of the supported inference backends for running supported PaddleOCR models.
- Developers can configure backend-related options such as `dtype`, device placement, and attention implementation through `engine_config`.
A simple way to understand the stack:
| Layer | What it means | Examples |
|---|---|---|
| Application layer | Applications that use OCR and document parsing outputs | RAG, agents, Document AI… |
| Model layer | OCR and document parsing capabilities | PP-OCRv5, PaddleOCR-VL 1.5… |
| Inference backend layer | Runtime used to run supported models | Paddle static graph, Paddle dynamic graph, Transformers |
This release is mainly about the inference backend layer: PaddleOCR continues to provide OCR and document parsing capabilities, while Transformers gives supported PaddleOCR models another backend option that fits naturally into Hugging Face-centered environments. The larger Document AI workflow remains in the hands of developers and application builders.
Why this matters
For RAG, Document AI, and document agent applications, the hard part often starts before the LLM. Developers first need to turn PDFs, scanned documents, screenshots, tables, charts, formulas, and complex page layouts into reliable structured data. If this ingestion step is weak, the downstream LLM workflow may miss key information, retrieve the wrong context, or produce unreliable answers.
PaddleOCR helps address this document ingestion challenge by providing OCR model series such as PP-OCRv5 and document parsing model series such as PaddleOCR-VL-1.5. With PaddleOCR 3.5, these capabilities are easier to connect with Transformers-centered stacks. Supported PaddleOCR models can run with a Transformers backend, while PaddleOCR continues to manage the OCR or document parsing pipeline behind the scenes. For developers, this means less integration friction and a more natural path from documents to downstream RAG, agent, search, analytics, or automation workflows.
Quick start
Install PaddleOCR 3.5, PaddleX, Transformers, and a compatible PyTorch build for your hardware. For example, on a CUDA 12.6 environment:
```bash
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
python -m pip install "paddleocr==3.5.0" "paddlex==3.5.2" "transformers>=5.4.0"
```
For CPU, ROCm, or other environments, install the PyTorch build that matches your target hardware.
Run from the command line:
```bash
paddleocr ocr \
  -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png \
  --device gpu:0 \
  --engine transformers
```
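When several files need to go through the same command, it can help to assemble the argument list in one place. The following is a minimal sketch that only reuses the flags shown above (`-i`, `--device`, `--engine`); the helper names `build_ocr_command` and `run_ocr_cli` are hypothetical and not part of PaddleOCR.

```python
import shlex
import subprocess
from typing import List


def build_ocr_command(image: str, device: str = "gpu:0",
                      engine: str = "transformers") -> List[str]:
    """Assemble the argument list for the `paddleocr ocr` command shown above."""
    return ["paddleocr", "ocr", "-i", image, "--device", device, "--engine", engine]


def run_ocr_cli(image: str, device: str = "gpu:0",
                engine: str = "transformers") -> int:
    """Run the CLI for one input and return its exit code.

    Assumes the `paddleocr` executable is installed and on PATH.
    """
    command = build_ocr_command(image, device, engine)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=False).returncode
```

Passing the arguments as a list avoids shell quoting issues when input paths contain spaces.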
Or use the Python API:
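```python
from paddleocr import PaddleOCR

# Build the OCR pipeline and select Transformers as the inference backend.
pipeline = PaddleOCR(
    device="gpu:0",
    engine="transformers",
    # Disable the optional preprocessing stages for this simple example.
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    # Backend-specific options are passed through engine_config.
    engine_config={
        "dtype": "float32",
    },
)

results = pipeline.predict(
    "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png"
)
for result in results:
    print(result)
```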
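For downstream use such as RAG ingestion, the recognized text usually needs to be pulled out of each result rather than printed. The sketch below assumes that each OCR result behaves like a mapping with `rec_texts` and `rec_scores` entries; that layout is an assumption here and should be checked against the output printed by the example above. The helper name `extract_text_lines` and the `min_score` threshold are hypothetical.

```python
from typing import Any, List, Mapping


def extract_text_lines(result: Mapping[str, Any], min_score: float = 0.5) -> List[str]:
    """Keep non-empty recognized lines whose confidence is at least min_score.

    Assumes `result` exposes parallel "rec_texts" and "rec_scores" sequences.
    """
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    return [
        text
        for text, score in zip(texts, scores)
        if score >= min_score and text.strip()
    ]


# Stand-in data with the assumed layout, so the helper can be tried without a GPU.
sample = {
    "rec_texts": ["Invoice 2024", "  ", "Total: 42"],
    "rec_scores": [0.98, 0.91, 0.31],
}
print(extract_text_lines(sample))  # ['Invoice 2024']
```

With a real pipeline, the same function could be applied to each item yielded by `pipeline.predict(...)`.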
The Hugging Face Space uses float32 for broad compatibility. For your own hardware, you can tune backend-specific options through engine_config:
```python
engine_config = {
    "dtype": "bfloat16",
    "device_type": "gpu",
    "device_id": 0,
    "attn_implementation": "sdpa",
}
```
The best configuration depends on your model, hardware, and deployment environment.
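One way to keep the `device` argument and `engine_config` consistent is to derive the config from a single device string. The sketch below is a hypothetical helper (`build_engine_config` is not part of PaddleOCR). It only reuses the keys from the example above and assumes a `type:id` device string such as `gpu:0`; which values each key accepts for other hardware is not covered here.

```python
from typing import Any, Dict, Optional


def build_engine_config(device: str, dtype: str = "float32",
                        attn_implementation: Optional[str] = None) -> Dict[str, Any]:
    """Translate a 'gpu:0'-style device string into engine_config keys."""
    # "gpu:0" -> ("gpu", ":", "0"); "cpu" -> ("cpu", "", "")
    device_type, _, device_id = device.partition(":")
    config: Dict[str, Any] = {"dtype": dtype, "device_type": device_type}
    if device_id:
        config["device_id"] = int(device_id)
    if attn_implementation is not None:
        config["attn_implementation"] = attn_implementation
    return config


print(build_engine_config("gpu:0", dtype="bfloat16", attn_implementation="sdpa"))
```

The returned dictionary could then be passed as `engine_config=` together with `engine="transformers"` when constructing `PaddleOCR`. Whether a given `dtype` or attention implementation works depends on the model and hardware.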
When should you use the Transformers backend?
Use the Transformers backend when you want PaddleOCR’s OCR and document parsing capabilities to fit more naturally into a Hugging Face-centered stack. This is especially useful if you are building RAG, Document AI, search, analytics, or agent applications and already rely on PyTorch / Transformers infrastructure for model loading, experimentation, deployment, or model artifact management.
The Transformers backend is a good fit when you want:
- a more familiar development experience for teams already using Transformers,
- Hub-compatible model discovery and distribution for supported PaddleOCR models,
- easier integration with existing PyTorch / Transformers services.
When maximizing OCR or document parsing throughput is the priority, PaddleOCR’s default paddle_static backend is usually the recommended choice. This release is not about replacing one backend with another. It is about giving developers more flexibility: use PaddleOCR for OCR and document parsing capabilities, and choose the inference backend that best fits your stack.
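Because the throughput difference between backends depends on the model and hardware, it may be worth measuring it on your own inputs before choosing. The following is a minimal timing sketch, not a rigorous benchmark; `measure_throughput` is a hypothetical helper that works with any callable, and the stand-in function at the bottom only demonstrates the call pattern.

```python
import time
from typing import Any, Callable, Sequence


def measure_throughput(predict: Callable[[Any], Any], inputs: Sequence[Any],
                       warmup: int = 1, repeats: int = 3) -> float:
    """Return the number of inputs processed per second, excluding warm-up calls."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    # Warm-up calls absorb one-time costs such as model loading or kernel selection.
    for _ in range(warmup):
        predict(inputs[0])
    start = time.perf_counter()
    for _ in range(repeats):
        for item in inputs:
            predict(item)
    elapsed = time.perf_counter() - start
    return (repeats * len(inputs)) / max(elapsed, 1e-9)


# Stand-in workload; replace with a real pipeline call to compare backends.
rate = measure_throughput(lambda text: text.upper(), ["a", "b", "c"])
print(f"{rate:.1f} items/s")
```

To compare backends, one pipeline could be built per backend and wrapped as `lambda path: list(pipeline.predict(path))`, where `list(...)` forces evaluation in case results are produced lazily.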
Try it now
Try the PaddleOCR 3.5 Transformers demo on Hugging Face Spaces: https://huggingface.co/spaces/PaddlePaddle/paddleocr-3.5-transformers-demo
Explore PaddleOCR models on the Hub: https://huggingface.co/PaddlePaddle/models