How hard can it be to build a CI/CD system?

How hard can it be to build a CI/CD system? That question stuck with me long enough that I actually started building one. Not because someone asked me to. Not because I spotted a market gap. Just because the question wouldn’t go away.

The trigger was Concourse CI. I’ve been using it for a while, and what I love about it is the resource abstraction: an interface that anything external has to follow. Check for new versions, pull them, push back. As a Go developer, I find that this kind of clean interface resonates with me. Everything in the pipeline is just something that implements that contract. But the operational overhead is significant, since you run web nodes, workers, and a PostgreSQL database. And I needed CI for my own side projects anyway: games and open source tools that require custom environments GitHub Actions can’t provide. So I started building.

What I wanted was a single binary that could also scale horizontally when needed: start with nothing, grow when you need to. You start like this:

./pikoci server \
  --db-system mem \
  --pubsub-system mem \
  --run-worker \
  --pipeline-config pipeline.hcl

That’s a complete CI/CD system, running entirely in memory with no files and no external services. When you want persistence, switch to --db-system sqlite. When you need distributed workers, add NATS and start workers on other machines. The pipeline config never changes.
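
One way to get "same binary, different backends" is a registry keyed by the flag value. The Go sketch below assumes a hypothetical `PubSub` interface and a `backends` map from flag value to constructor. It implements only an in-memory backend and leaves a NATS entry as a comment rather than guessing at it. None of these names come from PikoCI itself.

```go
package main

import (
	"flag"
	"fmt"
	"sync"
)

// PubSub is a hypothetical interface every queue backend would implement.
type PubSub interface {
	Publish(topic string, msg []byte) error
	Subscribe(topic string) (<-chan []byte, error)
}

// memPubSub keeps subscriptions in process memory: no files, no services.
type memPubSub struct {
	mu   sync.Mutex
	subs map[string][]chan []byte
}

func (m *memPubSub) Subscribe(topic string) (<-chan []byte, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	ch := make(chan []byte, 16) // small buffer; a real backend needs backpressure
	m.subs[topic] = append(m.subs[topic], ch)
	return ch, nil
}

func (m *memPubSub) Publish(topic string, msg []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, ch := range m.subs[topic] {
		ch <- msg // blocks if a subscriber's buffer is full
	}
	return nil
}

// backends maps a --pubsub-system value to a constructor. A "nats" entry
// would dial a NATS server; it is omitted to keep the sketch dependency-free.
var backends = map[string]func() (PubSub, error){
	"mem": func() (PubSub, error) {
		return &memPubSub{subs: map[string][]chan []byte{}}, nil
	},
}

// newPubSub picks a backend by name, failing loudly on unknown values.
func newPubSub(system string) (PubSub, error) {
	ctor, ok := backends[system]
	if !ok {
		return nil, fmt.Errorf("unknown pubsub system %q", system)
	}
	return ctor()
}

func main() {
	system := flag.String("pubsub-system", "mem", "queue backend")
	flag.Parse()
	ps, err := newPubSub(*system)
	if err != nil {
		panic(err)
	}
	jobs, _ := ps.Subscribe("jobs")
	_ = ps.Publish("jobs", []byte("run job test"))
	fmt.Println(string(<-jobs))
}
```

Callers only ever see the interface, so swapping backends changes which constructor runs, not the code that publishes or consumes.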

The interesting parts: Four pluggable abstractions

PikoCI has four concepts that you define in HCL and can source from a URL: resource types, runners, service types, and secret types. Each follows the same pattern: define the type once, then instantiate it with params.

A resource_type defines how to watch something for changes and fetch it. A resource is an instance of one:

resource_type "git" { source = "pikoci://git" } # built-in

resource "git" "my-app" {
  params {
    url  = "https://github.com/org/app"
    name = "app"
  }
  check_interval = "@every 1m"
}

A runner_type defines where tasks execute. The docker and exec runners are built in, so no declaration is needed. Here’s what the docker runner looks like under the hood, in case you want to define your own:

runner_type "docker" {
  run {
    path = "docker"
    args = [ "run", "--rm", "-v", "$WORKDIR:/workdir", "-w", "/workdir", "$image", "/bin/sh", "-ec", "$cmd", ]
  }
}
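
The `$WORKDIR`, `$image`, and `$cmd` placeholders suggest that each argument is expanded before the command runs. Below is a minimal Go sketch of that step. It assumes the placeholders are filled from the task’s params plus a `WORKDIR` value. `expandArgs` is a hypothetical helper built on the standard library’s `os.Expand`.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// expandArgs substitutes $name / ${name} placeholders in each argument and
// reports any placeholder that has no value (os.Expand alone would silently
// substitute an empty string).
func expandArgs(args []string, vars map[string]string) ([]string, error) {
	out := make([]string, len(args))
	var missing []string
	for i, a := range args {
		out[i] = os.Expand(a, func(name string) string {
			v, ok := vars[name]
			if !ok {
				missing = append(missing, name)
			}
			return v
		})
	}
	if len(missing) > 0 {
		return nil, fmt.Errorf("unset placeholders: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	// Args copied from the docker runner_type above.
	args := []string{"run", "--rm", "-v", "$WORKDIR:/workdir", "-w", "/workdir", "$image", "/bin/sh", "-ec", "$cmd"}
	vars := map[string]string{
		"WORKDIR": "/tmp/pikoci/job-1", // hypothetical per-job directory
		"image":   "golang:1.25",
		"cmd":     "cd app && make integration-test",
	}
	expanded, err := expandArgs(args, vars)
	if err != nil {
		panic(err)
	}
	cmd := exec.Command("docker", expanded...)
	fmt.Println(strings.Join(cmd.Args, " | "))
	// cmd.Run() would start the container; it is skipped in this sketch.
}
```

Each argument is expanded separately, so `$cmd` stays a single argv element. That means `cd app && make integration-test` reaches `/bin/sh -ec` intact rather than being word-split by an outer shell.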

A secret_type defines where credentials come from. Secrets are bound to variables, which can then be referenced anywhere in the pipeline:

secret_type "vault" { source = "pikoci://vault" }

variable "db_password" {
  secret "vault" {
    path = "secret/data/db"
    key  = "password"
  }
}

A service_type defines processes that run alongside your tasks. They are started before the tasks and stopped after them, and the stop step is guaranteed to run regardless of the outcome. This is the feature I’m most proud of:

service_type "postgres" {
  start "exec" {
    path = "/bin/sh"
    # The official postgres image refuses to start without a password or an auth method.
    args = ["-ec", "docker run -d --name db -e POSTGRES_HOST_AUTH_METHOD=trust -p 5432:5432 postgres:16"]
  }
  ready_check "exec" {
    path    = "/bin/sh"
    args    = ["-ec", "pg_isready -h localhost"]
    timeout = "30s"
  }
  stop "exec" {
    path = "/bin/sh"
    args = ["-ec", "docker rm -f db"]
  }
}

No Docker-in-Docker. No docker-compose running alongside CI. The service stops whether the job passed or failed.

All four types can be sourced from a URL. The built-ins use pikoci://, the same mechanism as anything you host yourself. If you need a runner that executes jobs in Kubernetes or Azure, write it once, host it anywhere, and reference it by URL.
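
Resolving a `source` attribute can come down to a switch on the URL scheme. The Go sketch below assumes two things: built-ins are embedded definitions looked up by the host part of a `pikoci://` URL, and everything else is fetched over HTTP(S). `Loader` and `httpFetch` are hypothetical names, and the real resolution rules may differ.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// Loader resolves a type's `source` attribute to its definition text.
type Loader struct {
	Builtins map[string]string                   // name -> embedded definition
	Fetch    func(rawURL string) ([]byte, error) // used for http(s) sources
}

// httpFetch downloads a remotely hosted definition.
func httpFetch(rawURL string) ([]byte, error) {
	resp, err := http.Get(rawURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", rawURL, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// Load uses one entry point for built-in and self-hosted types alike.
func (l Loader) Load(source string) (string, error) {
	u, err := url.Parse(source)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "pikoci":
		// "pikoci://git" parses with Host == "git".
		def, ok := l.Builtins[u.Host]
		if !ok {
			return "", fmt.Errorf("no built-in %q", u.Host)
		}
		return def, nil
	case "http", "https":
		b, err := l.Fetch(source)
		if err != nil {
			return "", err
		}
		return string(b), nil
	default:
		return "", fmt.Errorf("unsupported source scheme %q", u.Scheme)
	}
}

func main() {
	l := Loader{
		Builtins: map[string]string{"git": "(embedded git resource_type definition)"},
		Fetch:    httpFetch,
	}
	def, err := l.Load("pikoci://git")
	fmt.Println(def, err)
	_, err = l.Load("ftp://example.com/runner.hcl")
	fmt.Println(err)
}
```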

Putting it all together

Here’s a small pipeline that uses all four abstractions at once:

job "test" {
  get "git" "my-app" { trigger = true }
  service "postgresql" {
    version  = "17"
    port     = "5432"
    password = var.db_password
  }
  task "run-tests" {
    run "docker" {
      image = "golang:1.25"
      cmd = "cd app && make integration-test"
      args = ["-v", "/cache/go:/root/go/pkg/mod"]
    }
  }
}

A git resource watches for changes, a Vault secret supplies the Postgres password, Postgres starts as a service, and the task runs inside Docker. All four abstractions, one pipeline.

Running pipelines locally

pikoci run --pipeline-config pipeline.hcl --job test

Run any job on your laptop, no server required. Override resources with local paths and inject secrets via --var. The same pipeline that runs in CI runs on your laptop.
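
Repeatable flags like `--var` fit Go’s `flag.Value` interface. This sketch assumes the form `--var name=value` and uses the hypothetical names `varFlags`, `runOpts`, and `parseRunArgs`. It is not the actual `pikoci run` implementation.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"sort"
	"strings"
)

// varFlags collects repeated --var name=value flags.
type varFlags map[string]string

// String lists variable names only, never values: they may be secrets.
func (v varFlags) String() string {
	names := make([]string, 0, len(v))
	for k := range v {
		names = append(names, k)
	}
	sort.Strings(names)
	return strings.Join(names, ",")
}

// Set is called once per --var occurrence; only the first '=' splits.
func (v varFlags) Set(s string) error {
	name, value, ok := strings.Cut(s, "=")
	if !ok || name == "" {
		return fmt.Errorf("expected name=value, got %q", s)
	}
	v[name] = value
	return nil
}

type runOpts struct {
	Config string
	Job    string
	Vars   varFlags
}

func parseRunArgs(args []string) (runOpts, error) {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	opts := runOpts{Vars: varFlags{}}
	fs.StringVar(&opts.Config, "pipeline-config", "pipeline.hcl", "pipeline file")
	fs.StringVar(&opts.Job, "job", "", "job to run")
	fs.Var(opts.Vars, "var", "set a variable as name=value (repeatable)")
	if err := fs.Parse(args); err != nil {
		return runOpts{}, err
	}
	return opts, nil
}

func main() {
	opts, err := parseRunArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("job=%s config=%s vars=[%s]\n", opts.Job, opts.Config, opts.Vars)
}
```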

The queue decision

Workers don’t connect directly to the server; they subscribe to a queue. This means workers can sit behind NAT, run on a laptop, live in a different data center, or be completely ephemeral. The server never needs to know where workers are, which made distributed workers trivially easy to add: start a worker anywhere with network access to the queue and it just works.
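
The queue decouples the server from its workers. The server only publishes, and whichever workers are subscribed compete for jobs. In this in-process Go sketch, a buffered channel stands in for NATS or another broker. `Job`, `Result`, `worker`, and `dispatch` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a unit of work the server publishes to the queue.
type Job struct {
	ID   int
	Name string
}

// Result records which worker picked up which job.
type Result struct {
	JobID  int
	Worker string
}

// worker pulls jobs until the queue is closed. Where it runs is irrelevant
// to the server: anything that can reach the queue can join.
func worker(name string, queue <-chan Job, results chan<- Result, wg *sync.WaitGroup) {
	defer wg.Done()
	for j := range queue {
		results <- Result{JobID: j.ID, Worker: name}
	}
}

// dispatch publishes jobs and collects results from however many workers exist.
func dispatch(jobs []Job, workerNames []string) []Result {
	queue := make(chan Job, len(jobs)) // buffered: publishing never waits on a worker
	results := make(chan Result, len(jobs))
	var wg sync.WaitGroup
	for _, name := range workerNames {
		wg.Add(1)
		go worker(name, queue, results, &wg)
	}
	for _, j := range jobs {
		queue <- j // the server only knows the queue, never the workers
	}
	close(queue) // no more jobs: workers exit once the queue drains
	wg.Wait()
	close(results)
	out := make([]Result, 0, len(jobs))
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	jobs := []Job{{1, "test"}, {2, "lint"}, {3, "build"}, {4, "deploy"}}
	for _, r := range dispatch(jobs, []string{"laptop", "dc-2", "ephemeral-1"}) {
		fmt.Printf("job %d ran on %s\n", r.JobID, r.Worker)
	}
}
```

Which worker takes which job is nondeterministic, and that is the point: the server publishes without choosing a target. In a real broker, jobs would wait in the queue until some worker, wherever it runs, picks them up.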

Where it is now

PikoCI deploys itself. Its pipeline runs PR checks, mock tests, and integration tests against six database, queue, and secret backends: MariaDB, PostgreSQL, NATS, RabbitMQ, Kafka, and Vault, all running as services. Then it builds multi-arch Docker images and redeploys itself with zero downtime.

Those six backends are not just test targets; they are what PikoCI’s own pluggable abstractions plug into. The same PikoCI binary connects to any of them depending on how you start it, and testing against all of them is how I make sure the abstractions actually hold.

All of it is publicly visible at ci.pikoci.com, no account needed. It’s Apache 2.0 licensed, written in Go, and very much something I use for my own projects.