Migrating Your GitHub CI to Hugging Face Jobs
If you have a GitHub repository and you have GitHub Actions enabled, you probably use GitHub-hosted runners for CI. That is the default for many projects because it is simple: add a workflow, write runs-on: ubuntu-latest, and GitHub gives you a machine. That default is convenient, but it also has limits. GitHub Actions can be slow or down for maintenance, the hosted machines are generic, and GPU access is not something most open-source projects can just turn on.
For Trackio, those limits started to matter. We wanted both reliable CPU CI for basic unit tests and frontend checks and GPU CI for tests that need to run on actual CUDA hardware. So we built an alternative: keep GitHub Actions in charge of CI, but run the jobs on Hugging Face Jobs. The result: Trackio’s CI now runs on Hugging Face Jobs and streams back real-time logs, cutting our CI time for CPU jobs by about 30% and enabling a whole new test suite that runs on GPU machines!
In this article, we explain step-by-step how to recreate the same setup for your GitHub repo. If you are using an agent, you can point it to this article, since we provide CLI instructions alongside browser-based instructions for us humans. Let’s start with a quick intro to Hugging Face Jobs!
What is Hugging Face Jobs?
Hugging Face Jobs lets you run commands or scripts on Hugging Face’s serverless infrastructure with almost any hardware flavor. A Job is essentially: a command to run; a Docker image, from Docker Hub or a Hugging Face Space; a hardware flavor, such as cpu-upgrade, t4-small, or h200; and optional environment variables and secrets.
That makes Jobs a natural fit for CI. CI jobs are already command-driven, already run in clean environments, and often benefit from choosing exactly the right hardware. For ML libraries, the GPU case is especially compelling: you can run a test suite on real GPU hardware without maintaining your own always-on runner. The key step is connecting GitHub Actions to HF Jobs, which we describe below.
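To make the ingredients of a Job concrete, here is a minimal sketch that assembles them into an hf jobs run command line without executing it. JobSpec and to_cli_args are hypothetical names introduced for illustration, and the flag spellings (--flavor, --env, --secrets) are assumptions worth checking against hf jobs run --help for your installed CLI version.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Hypothetical container for the pieces of a Job described above."""
    image: str                                           # Docker image to run
    command: list[str]                                   # command executed in the image
    flavor: str = "cpu-basic"                            # hardware flavor
    env: dict[str, str] = field(default_factory=dict)    # plain environment variables
    secrets: list[str] = field(default_factory=list)     # names of secrets to forward


def to_cli_args(spec: JobSpec) -> list[str]:
    """Build an argument list for `hf jobs run` (flag names are assumptions)."""
    args = ["hf", "jobs", "run", "--flavor", spec.flavor]
    for key, value in spec.env.items():
        args += ["--env", f"{key}={value}"]
    for name in spec.secrets:
        args += ["--secrets", name]
    # The image comes first, then the command to run inside it.
    return args + [spec.image] + spec.command


spec = JobSpec(
    image="python:3.12",
    command=["python", "-c", "print('hello from a Job')"],
    flavor="t4-small",
    env={"CI": "true"},
    secrets=["HF_TOKEN"],
)
# Print a shell-quoted command you could paste into a terminal.
print(shlex.join(to_cli_args(spec)))
```

Keeping the spec as data and rendering the command last makes it easy to swap the flavor per CI job, which is the property the rest of this setup relies on.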
The architecture
For this setup, we created huggingface/jobs-actions, a small bridge that turns a GitHub Actions job into an ephemeral self-hosted runner running inside an HF Job. The complete flow looks like this:
- A pull request triggers a GitHub Actions workflow.
- GitHub queues any job whose runs-on label is not available, for example hf-jobs-cpu-upgrade or hf-jobs-t4-small, and sends a signed workflow_job.queued webhook to the dispatcher through the GitHub App.
- The dispatcher Space verifies the webhook, checks for an hf-jobs-* label, mints a short-lived GitHub runner registration token, and starts an HF Job on the matching hardware.
- The HF Job boots an ephemeral GitHub Actions runner and registers it with the repo using that one-shot token.
- GitHub assigns the pending workflow job to that runner; the runner executes the CI job, reports status back to GitHub, and exits.
From GitHub’s point of view, this is just a self-hosted runner. From Hugging Face’s point of view, it is just a Job that launches a container to run the workflow steps from the repo’s GitHub Actions.
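The dispatcher's first two duties in the flow above, verifying the webhook and picking a hardware flavor from the label, can be sketched in a few lines. This is an illustration under stated assumptions, not the actual huggingface/jobs-actions code: the function names are hypothetical, the signature check follows GitHub's documented X-Hub-Signature-256 scheme (HMAC-SHA256 of the raw body), and the flavor is assumed to be whatever follows the hf-jobs- prefix.

```python
import hashlib
import hmac
import json
from typing import Optional

LABEL_PREFIX = "hf-jobs-"


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header with our own HMAC of the body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, header)


def flavor_for_event(body: bytes) -> Optional[str]:
    """Return the requested HF Jobs flavor for a queued workflow job, else None."""
    event = json.loads(body)
    if event.get("action") != "queued":
        return None  # only newly queued jobs need a runner
    for label in event.get("workflow_job", {}).get("labels", []):
        if label.startswith(LABEL_PREFIX):
            return label[len(LABEL_PREFIX):]
    return None  # not a job meant for HF Jobs


# Simulate a delivery locally with a made-up secret and a trimmed payload.
secret = b"example-webhook-secret"
body = json.dumps(
    {"action": "queued", "workflow_job": {"labels": ["hf-jobs-t4-small"]}}
).encode()
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()

if signature_is_valid(secret, body, header):
    print(flavor_for_event(body))
```

Verifying the signature before parsing anything else matters here because a valid event ends with paid hardware being started on your account.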
Step 1: Duplicate the dispatcher Space
The first thing you need is the dispatcher. This is a small Docker Space that receives GitHub workflow_job webhook events and launches HF Jobs in response. Create this first because the GitHub App needs a webhook URL, and that URL comes from the Space.
Web setup
Go to huggingface/jobs-actions-dispatcher and click Duplicate this Space. Use:
- Owner: your HF user or org
- Name: jobs-actions-dispatcher
- Hardware: cpu-upgrade
Use cpu-upgrade for real CI so the dispatcher stays available for GitHub webhooks. cpu-basic is fine for testing and will probably work, but it can sleep after inactivity; if GitHub’s webhook arrives while it is waking up, the workflow may stay queued forever.
After it builds, open the duplicated Space. You will see a section that says “Required Space secrets,” which you can ignore for now. The landing page should display the GitHub App webhook URL you need in the next step. It will look like this: https://YOUR-HF-NAMESPACE-jobs-actions-dispatcher.hf.space/webhook
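If you are scripting the setup, the webhook URL can be derived from the pattern shown above. The helper below is a hypothetical sketch: it assumes the Space subdomain is simply the namespace and Space name joined by a hyphen, lowercased, with any other character replaced by a hyphen. That rule is an assumption, so treat the URL displayed on your Space's landing page as the source of truth.

```python
import re


def webhook_url(namespace: str, space_name: str = "jobs-actions-dispatcher") -> str:
    """Guess the dispatcher webhook URL from the Space owner and name."""
    # Assumption: subdomain is "<namespace>-<space name>", lowercased, with
    # characters outside [a-z0-9-] replaced by hyphens.
    subdomain = re.sub(r"[^a-z0-9-]", "-", f"{namespace}-{space_name}".lower())
    return f"https://{subdomain}.hf.space/webhook"


print(webhook_url("YOUR-HF-NAMESPACE"))
```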
Step 2: Create and install the GitHub App
Next, create and install the GitHub App from the dispatcher Space itself. This App needs permission to listen for queued workflow jobs and create ephemeral self-hosted runner registration tokens.
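For context on why the App needs that second permission: GitHub's REST API issues runner registration tokens through a POST to the repository's actions/runners/registration-token endpoint. The sketch below only builds that request and does not send it. It assumes the caller already holds a GitHub App installation access token; the function name and placeholder values are hypothetical, and the dispatcher's real implementation may differ.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def registration_token_request(repo: str, installation_token: str) -> urllib.request.Request:
    """Build (but do not send) the request for a runner registration token."""
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{repo}/actions/runners/registration-token",
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {installation_token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


request = registration_token_request(
    "YOUR-GITHUB-ORG/YOUR-REPO", "PLACEHOLDER-INSTALLATION-TOKEN"
)
print(request.get_method(), request.full_url)
# Sending it with urllib.request.urlopen(request) would return JSON that
# includes "token" and "expires_at" fields.
```

Because the token is short-lived and the runner is ephemeral, a leaked token has a narrow window of use, which is part of what makes this design reasonable for public repositories.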
Web setup
Open your duplicated dispatcher Space: https://YOUR-HF-NAMESPACE-jobs-actions-dispatcher.hf.space. In the setup form, enter the GitHub repo whose CI should run on HF Jobs: YOUR-GITHUB-ORG/YOUR-REPO. Then click the button to create the GitHub App. GitHub will ask you to choose a name for the App; the name can be anything, as long as it is available in your GitHub account or org.
After you submit, the final screen tells you exactly how to upload the App credentials to the dispatcher Space with the hf CLI. Important note: you will need to provide a Hugging Face token that has permission to launch Jobs, belonging to the personal account or org that should be billed for those Jobs. Save this token as the HF_TOKEN secret in your dispatcher Space. Finally, you will install the App on…