PriorLabs / TabPFN
TabPFN Quick Start Interactive Notebook Tutorial
Tip: Dive right in with our interactive Colab notebook! It walks you through installation, classification, and regression examples, and is the best way to get a hands-on feel for TabPFN.
⚡ GPU Recommended: For optimal performance, use a GPU (even older GPUs with ~8 GB of VRAM work well; some large datasets need 16 GB). On CPU, only small datasets (≲1000 samples) are feasible. No GPU? Use our free hosted inference via TabPFN Client.
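As a rough illustration, the hardware rules of thumb above can be combined with the dataset-size guidance from Usage Tips (fewer than 100,000 samples and 2000 features) into a quick pre-flight check. This is a minimal sketch: preflight_warnings and its threshold constants are hypothetical names written for this example, not part of the tabpfn package, and the caller must supply has_gpu.

```python
# Hypothetical pre-flight check encoding this README's rules of thumb.
# Not part of the tabpfn package; the thresholds are copied from the text.
CPU_MAX_SAMPLES = 1_000            # On CPU, only small datasets (≲1000 samples) are feasible
RECOMMENDED_MAX_SAMPLES = 100_000  # TabPFN works best below this many samples...
RECOMMENDED_MAX_FEATURES = 2_000   # ...and below this many features


def preflight_warnings(n_samples: int, n_features: int, has_gpu: bool) -> list[str]:
    """Return human-readable warnings for a dataset of the given shape."""
    messages = []
    if not has_gpu and n_samples > CPU_MAX_SAMPLES:
        messages.append(
            f"{n_samples} samples on CPU: expect slow inference; "
            "use a GPU or the hosted TabPFN Client."
        )
    if n_samples >= RECOMMENDED_MAX_SAMPLES or n_features >= RECOMMENDED_MAX_FEATURES:
        messages.append(
            f"Shape ({n_samples}, {n_features}) exceeds the recommended size; "
            "see the Large datasets guide."
        )
    return messages


print(preflight_warnings(500, 20, has_gpu=False))  # → []
for message in preflight_warnings(5_000, 20, has_gpu=False):
    print("Warning:", message)
```

To fill in has_gpu, one option is PyTorch's torch.cuda.is_available(), since TabPFN runs on PyTorch; the helper itself takes a plain boolean so it runs without any GPU libraries installed.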
Installation
Official installation (pip):
```bash
pip install tabpfn
```
Or install from source:
```bash
pip install "tabpfn @ git+https://github.com/PriorLabs/TabPFN.git"
```
Or, for a local development installation: first install uv (version 0.10.0 or higher recommended), which we use for development, then run:
```bash
git clone https://github.com/PriorLabs/TabPFN.git --depth 1
cd TabPFN
uv sync
```
Basic Usage
To use our default TabPFN-2.6 model, trained purely on synthetic data:
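```python
from tabpfn import TabPFNClassifier, TabPFNRegressor

# X_train, y_train, X_test: your own training features, training targets, and test features (array-like)

# Classification: y_train holds class labels
clf = TabPFNClassifier()
clf.fit(X_train, y_train)  # downloads checkpoint on first use
predictions = clf.predict(X_test)

# Regression: y_train holds continuous targets
reg = TabPFNRegressor()
reg.fit(X_train, y_train)  # downloads checkpoint on first use
predictions = reg.predict(X_test)
```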
To use other model versions (e.g. TabPFN-2.5):
```python
from tabpfn import TabPFNClassifier, TabPFNRegressor
from tabpfn.constants import ModelVersion

classifier = TabPFNClassifier.create_default_for_version(ModelVersion.V2_5)
regressor = TabPFNRegressor.create_default_for_version(ModelVersion.V2_5)
```
For complete examples, see the tabpfn_for_binary_classification.py, tabpfn_for_multiclass_classification.py, and tabpfn_for_regression.py files.
Usage Tips
- Use batch prediction mode: Each predict call reprocesses the full training set, so calling predict 100 times on one sample each is almost 100 times slower and more expensive than a single call on all 100 samples. If the test set is very large, split it into chunks of 1000 samples each.
- Avoid data preprocessing: Do not apply data scaling or one-hot encoding when feeding data to the model.
- Use a GPU: TabPFN is slow on a CPU. Ensure a GPU is available for better performance.
- Mind the dataset size: TabPFN works best on datasets with fewer than 100,000 samples and fewer than 2000 features. For larger datasets, see the Large datasets guide.
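The batch-prediction tip can be sketched as a small helper that sends the test set to the model in as few calls as possible, splitting it into 1000-row chunks only when it is large. This is a minimal sketch under stated assumptions: predict_in_chunks and the stand-in _SumModel used in the demo are hypothetical names for this example, not tabpfn APIs. In practice, model would be a fitted TabPFNClassifier or TabPFNRegressor, and method could be set to "predict_proba" for class probabilities from the classifier.

```python
import numpy as np


def predict_in_chunks(model, X_test, chunk_size: int = 1000, method: str = "predict"):
    """Call model.<method> on consecutive row chunks of X_test and stack the results.

    Each call reprocesses the training set, so chunks should be as large as memory
    allows; 1000 rows is this README's suggestion for very large test sets.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    n_rows = len(X_test)
    if n_rows == 0:
        raise ValueError("X_test is empty")
    predict_fn = getattr(model, method)
    outputs = []
    for start in range(0, n_rows, chunk_size):
        stop = start + chunk_size
        # Positional row slicing for pandas objects, plain slicing for NumPy arrays.
        rows = X_test.iloc[start:stop] if hasattr(X_test, "iloc") else X_test[start:stop]
        outputs.append(np.asarray(predict_fn(rows)))
    return np.concatenate(outputs, axis=0)


class _SumModel:
    """Hypothetical stand-in for a fitted model: predicts row sums and counts calls."""

    def __init__(self):
        self.n_calls = 0

    def predict(self, X):
        self.n_calls += 1
        return np.asarray(X).sum(axis=1)


demo_model = _SumModel()
X_demo = np.arange(2_500 * 3).reshape(2_500, 3)
demo_preds = predict_in_chunks(demo_model, X_demo)
print(demo_preds.shape, demo_model.n_calls)  # → (2500,) 3
```

For most test sets, a single predict call on the full X_test remains the best choice; chunking only matters when the test set is too large to process at once.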
TabPFN Ecosystem
Choose the right TabPFN implementation for your needs:
- TabPFN Client: Simple API client for using TabPFN via cloud-based inference.
- TabPFN Extensions: A companion repository with advanced utilities, integrations, and features, and a great place to contribute:
  - interpretability: Gain insights with SHAP-based explanations, feature importance, and feature selection tools.
  - unsupervised: Tools for outlier detection and synthetic tabular data generation.
  - embeddings: Extract TabPFN's internal learned embeddings for downstream tasks or analysis.
  - many_class: Handle multi-class classification problems that exceed TabPFN's built-in class limit.
  - rf_pfn: Combine TabPFN with traditional models such as Random Forests in hybrid approaches.
  - hpo: Automated hyperparameter optimization tailored to TabPFN.
  - post_hoc_ensembles: Boost performance by ensembling multiple TabPFN models after training.
  To install:
  ```bash
  git clone https://github.com/priorlabs/tabpfn-extensions.git
  pip install -e tabpfn-extensions
  ```
- TabPFN (this repo): Core implementation for fast, local inference with PyTorch and CUDA support.
- TabPFN UX: No-code graphical interface for exploring TabPFN's capabilities, ideal for business users and prototyping.
TabPFN Workflow at a Glance
Follow this decision tree to build your model and choose the right extensions from our ecosystem. It walks you through critical questions about your data, hardware, and performance needs, guiding you to the best solution for your specific use case.