subPath ConfigMap Mounts Don't Hot-Reload: Silent Drift in Kubernetes

A Pipelock instance running in a Kubernetes cluster watched its config file for hours while four edits to the underlying ConfigMap landed in etcd. The dashboards showed updates. The pod showed an old config. The tests that exercised the new config kept failing for reasons that made no sense. The problem is not Pipelock. The problem is subPath.

Mount a ConfigMap key as a single file with subPath, and kubelet stops propagating updates to that mount. The behavior is documented but easy to miss, and it is the load-bearing reason any service that runs hot-reload in Kubernetes needs to think about how its config volume is mounted.

The shape of the bug

A reasonable-looking ConfigMap mount in a Deployment spec:

spec:
  containers:
  - name: pipelock
    volumeMounts:
    - name: config
      mountPath: /etc/pipelock/pipelock.yaml
      subPath: pipelock.yaml
  volumes:
  - name: config
    configMap:
      name: pipelock-config

The pod gets /etc/pipelock/pipelock.yaml populated from the pipelock.yaml key of the pipelock-config ConfigMap. Other files in /etc/pipelock/ are unaffected. This is what subPath was designed for: pin one file to one path without taking over the whole directory.

The drift surfaces when you kubectl edit configmap pipelock-config, change the value, and watch the running pod for the change to propagate. It does not propagate. The running pod’s view of /etc/pipelock/pipelock.yaml is the same content it had at pod creation. The kubelet has updated the underlying ConfigMap volume, but the bind mount that subPath created points at a different inode that is not part of the update path.

Restart the pod and the new content shows up. fsnotify watchers configured to react to file changes never fire because the file the container sees is, from the container’s perspective, the same file it has always been.

Why the directory mount works

Drop the subPath and mount the whole ConfigMap as a directory:

spec:
  containers:
  - name: pipelock
    volumeMounts:
    - name: config
      mountPath: /etc/pipelock
  volumes:
  - name: config
    configMap:
      name: pipelock-config

Now /etc/pipelock/ contains every key from the ConfigMap. Kubelet syncs the directory periodically, subject to its sync period and ConfigMap cache. The atomic-update mechanism Kubernetes uses for ConfigMap volumes replaces a symlink that points at the current “version” of the data. Watchers need to watch the mounted directory or reopen the file path after an update: once the symlink is swapped, the path resolves to a new inode, while an already-open file descriptor (or a watch placed on the old file) stays attached to the old one. With that watch shape, the service hot-reloads correctly.
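The watch shape described above can be sketched as a small polling loop. This is a minimal illustration using only the Python standard library, not Pipelock's fsnotify implementation; `config_identity` and `PathWatcher` are hypothetical names. It re-stats the path on every poll and reopens the file by path whenever the device, inode, or modification time it resolves to has changed.

```python
import os


def config_identity(path):
    """Return (device, inode, mtime_ns) for whatever `path` resolves to right now."""
    st = os.stat(path)  # follows symlinks, so it always sees the current version
    return (st.st_dev, st.st_ino, st.st_mtime_ns)


class PathWatcher:
    """Poll a config path and reload by reopening it, never via a cached handle."""

    def __init__(self, path):
        self.path = path
        self.identity = config_identity(path)

    def poll(self):
        """Return the new content if the path now resolves elsewhere, else None."""
        current = config_identity(self.path)
        if current == self.identity:
            return None
        self.identity = current
        # Reopen by path: a handle opened before the swap would still
        # be attached to the previous version of the file.
        with open(self.path, "r", encoding="utf-8") as fh:
            return fh.read()
```

Calling `poll()` from a timer every few seconds is enough for this sketch. An event-based watcher would get the same effect by watching the mounted directory rather than the file itself.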

The cost is that /etc/pipelock/ now belongs to the ConfigMap. If you had other files in that directory (a CA certificate from a different volume, a generated state file written by an init container), the directory mount hides them, because the volume is mounted over the whole directory. You have to mount each piece into a directory of its own and let the service compose them at runtime.

The kubelet propagation lifecycle

Kubelet runs a sync loop that watches the API server for ConfigMap and Secret updates. When an update lands, kubelet writes the new content to a new versioned directory inside the volume’s emptyDir on the node, then atomically swaps a symlink (named ..data in current kubelet implementations) to point at it. The container, which had been reading through the symlink, now reads the new version. The swap is a single rename, so readers see either the old version or the new version, never a torn state.
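The swap can be simulated in an ordinary temporary directory. The sketch below is a simplified model, not kubelet's code: `publish_version` and `read_config` are hypothetical helpers, and the version names `v1` and `v2` stand in for the timestamped directory names kubelet generates. The `..data` and `..data_tmp` names mirror the ones kubelet's atomic writer uses.

```python
import os
import tempfile


def publish_version(volume_dir, version, files):
    """Write `files` into a fresh versioned directory, then repoint `..data` at it."""
    version_dir = os.path.join(volume_dir, version)
    os.mkdir(version_dir)
    for name, content in files.items():
        with open(os.path.join(version_dir, name), "w", encoding="utf-8") as fh:
            fh.write(content)
        # User-visible entry: a stable symlink that goes through `..data`.
        visible = os.path.join(volume_dir, name)
        if not os.path.lexists(visible):
            os.symlink(os.path.join("..data", name), visible)
    # Build the new symlink under a temporary name, then rename it over the
    # old one. The rename is atomic, so readers see old or new, never a mix.
    tmp_link = os.path.join(volume_dir, "..data_tmp")
    os.symlink(version, tmp_link)
    os.replace(tmp_link, os.path.join(volume_dir, "..data"))


def read_config(volume_dir, name):
    """Read a key the way a container would: by its visible path."""
    with open(os.path.join(volume_dir, name), encoding="utf-8") as fh:
        return fh.read()


volume = tempfile.mkdtemp()
publish_version(volume, "v1", {"pipelock.yaml": "mode: old\n"})
before = read_config(volume, "pipelock.yaml")
publish_version(volume, "v2", {"pipelock.yaml": "mode: new\n"})
after = read_config(volume, "pipelock.yaml")
```

The visible `pipelock.yaml` entry is never rewritten. Only the `..data` symlink moves, which is why a reader that resolves the path afresh picks up the new version.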

subPath works by computing the source path at pod creation and creating a bind mount to that specific path. The bind mount captures the inode that backs the file at that moment. Kubelet’s atomic swap operates on the symlink in the volume, not on the inode the bind mount points at. The bind survives the swap and continues to point at the original inode, which kubelet never updates.
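An unprivileged script cannot create a bind mount, but a hard link pins an inode in much the same way, so it can stand in for one. The following sketch is a simulation under that assumption, not a reproduction of what the container runtime does; `write_version`, `read_text`, and `pinned.yaml` are hypothetical names. It resolves the path once, pins the result, swaps the symlink, and compares the two views.

```python
import os
import tempfile


def write_version(volume_dir, version, content):
    """Create <volume>/<version>/pipelock.yaml and atomically repoint `..data`."""
    os.mkdir(os.path.join(volume_dir, version))
    with open(os.path.join(volume_dir, version, "pipelock.yaml"), "w") as fh:
        fh.write(content)
    tmp_link = os.path.join(volume_dir, "..data_tmp")
    os.symlink(version, tmp_link)
    os.replace(tmp_link, os.path.join(volume_dir, "..data"))


def read_text(path):
    with open(path) as fh:
        return fh.read()


volume = tempfile.mkdtemp()
write_version(volume, "v1", "mode: old\n")
live_path = os.path.join(volume, "..data", "pipelock.yaml")

# Stand-in for the subPath bind mount: resolve the path once, at "pod
# creation", and pin whatever inode it resolves to under a second name.
pinned_path = os.path.join(volume, "pinned.yaml")
os.link(os.path.realpath(live_path), pinned_path)

write_version(volume, "v2", "mode: new\n")

live_view = read_text(live_path)      # follows the swapped symlink
pinned_view = read_text(pinned_path)  # still the inode captured before the swap
```

After the second `write_version`, `live_view` holds the new content and `pinned_view` holds the old, and nothing about the pinned file changes that a watcher could react to.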

There is no documented kubelet behavior that re-evaluates a subPath mount during a ConfigMap update. The upstream issue, kubernetes/kubernetes#50345, has been open since 2017. The current state of the world is “subPath plus ConfigMap is static for running containers.”

Where this hurts

Anyone running a service that watches its config file for live updates. Pipelock has fsnotify-based hot-reload on its config (SIGHUP is also supported). Other services with the same shape:

Envoy and most service-mesh proxies, which use file-based dynamic configuration discovery.
Prometheus, which picks up changes to file-based service discovery targets from disk (its main config reloads on SIGHUP or a POST to /-/reload).
Nginx with the auto_reload patches.
Any custom service that watches its config for runtime updates.

For all of these, subPath is a silent foot-gun. The service starts up, reads its config, watches the file, and never sees the file change because the file is a frozen bind mount.
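One way to make the foot-gun louder is a startup check inside the service. The heuristic below assumes the on-disk layout kubelet currently uses for directory-mounted ConfigMaps (a `..data` symlink next to the visible files). That layout is an implementation detail rather than a documented contract, and `receives_configmap_updates` is a hypothetical helper, not part of Pipelock.

```python
import os


def receives_configmap_updates(config_path):
    """Heuristic: True if `config_path` is served through a `..data` symlink.

    A directory-mounted ConfigMap exposes each key as a symlink into the
    directory `..data` points at. A subPath mount is a plain file with no
    `..data` entry beside it.
    """
    parent = os.path.dirname(os.path.abspath(config_path))
    data_link = os.path.join(parent, "..data")
    if not os.path.islink(data_link):
        return False
    # The file must actually resolve into the directory `..data` points at.
    real_file = os.path.realpath(config_path)
    real_data = os.path.realpath(data_link)
    return os.path.commonpath([real_file, real_data]) == real_data
```

A service could log a warning at startup when this returns False and hot-reload is enabled. A False result is a hint, not proof, since a config file that does not come from a ConfigMap at all also lacks the `..data` entry.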

The damage scales with how much you trust your config-update workflow. If you kubectl apply a new ConfigMap and assume the running pod picks it up, every minute between the apply and the next pod restart is a minute the cluster is running stale config. For a security tool, that gap means the new policy is not enforced, the new pattern does not match, the new allowlist is not honored. The dashboards say one thing. The reality is another.

Two patterns that work

Mount the directory, expose the file. Mount the ConfigMap as a directory volume, and read the file by path from inside that directory. This is the simplest pattern for services that own their config directory. The cost is that the directory is now ConfigMap-shaped, so anything else that needs to live there has to come from a different mount point.
Sidecar plus emptyDir. A sidecar container mounts the ConfigMap as a directory, watches for updates, and writes the consolidated file to a shared emptyDir volume that the main container reads. This adds a moving piece, but it lets you compose multiple sources (ConfigMap, Secret, downward API, environment) into a single config file at a single path.
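The sidecar's render step might look roughly like the sketch below. It is an illustration under stated assumptions: `read_dir` and `render_config` are hypothetical names, and plain concatenation stands in for whatever merge logic the real config format needs. The part that matters is the last step, which writes to a temporary file in the shared directory and renames it over the target so the main container never reads a half-written file.

```python
import os
import tempfile


def read_dir(source_dir):
    """Read every visible file in a mounted ConfigMap or Secret directory."""
    contents = {}
    for name in sorted(os.listdir(source_dir)):
        if name.startswith(".."):  # skip kubelet's `..data` bookkeeping entries
            continue
        path = os.path.join(source_dir, name)
        if os.path.isfile(path):  # follows symlinks to the current version
            with open(path, encoding="utf-8") as fh:
                contents[name] = fh.read()
    return contents


def render_config(source_dirs, out_path):
    """Concatenate all source files and atomically replace `out_path`."""
    parts = []
    for source_dir in source_dirs:
        for name, text in read_dir(source_dir).items():
            parts.append(f"# from {os.path.basename(source_dir)}/{name}\n{text}")
    merged = "\n".join(parts)
    # Temp file in the same directory, so the final rename stays on one
    # filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(out_path))
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(merged)
    os.chmod(tmp_path, 0o644)  # mkstemp defaults to 0600; let the main container read it
    os.replace(tmp_path, out_path)
    return merged
```

Because each render replaces the output file with a new inode, the main container still needs to watch the shared directory or reopen the path, exactly as with a directory-mounted ConfigMap.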