Screw you Realtek

So I’ve got three nodes in my homelab k8s cluster (it celebrated its 7th birthday the other day 🎈) that are lovely little Lenovo M75 boxes. They’re cheap, reasonably powerful, not too old, and make great k8s nodes for light workloads. The big problem, however, is that they have Realtek RTL8111/8168/8211/8411 NICs in them, which are not good.

The problems begin

A few months ago, I was troubleshooting soft-hangs, and the internet suggested that the in-kernel r8169 driver did not behave very well under load. “My system stops responding when there’s more than a dribble of network traffic” was a common complaint. There was a suggestion that the out-of-tree r8168 driver might work better. I run my k8s nodes on Debian stable, and the r8168-dkms package is available there, so I give it a shot. The dkms package takes care of building the kernel module and blacklisting the r8169 driver, which makes switching pretty straightforward. I blindly do this on the three k8s nodes that have this NIC, and reboot. Problem fixed! No more soft-hangs!
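For the record, the swap itself is small; a sketch of what the dkms package arranges on Debian stable (the modprobe.d filename and interface name here are illustrative, not verified against the package contents):

```shell
# Install the out-of-tree driver; dkms builds the module for the running
# kernel (and rebuilds it on kernel upgrades)
sudo apt install r8168-dkms

# The package blacklists the in-kernel driver via a modprobe.d snippet,
# conceptually equivalent to:
#   /etc/modprobe.d/r8168-dkms.conf   (illustrative filename)
#     blacklist r8169

# After rebooting, confirm which driver is actually bound to the NIC
# (enp1s0 is illustrative — use your interface name)
ethtool -i enp1s0 | grep '^driver'
```

Removing the package later reverses all of this, which is what makes the experiment cheap to undo.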

Some time later

At some point, I move house. When rebuilding / cabling the homelab, I notice a little gentle jankiness. One of the k8s nodes takes a little bit too long pulling container images from the local registry cache. Poking around, nothing seems obviously wrong. I run an iperf test across the node pool and oh dear:

[  1] local 2001:8b0:c8f:e8b0::2f port 5001 connected with 2001:8b0:c8f:e8b0::2c port 53764
[ ID] Interval            Transfer     Bandwidth
[  1] 0.0000-10.0204 sec  1.15 GBytes   987 Mbits/sec
[  2] local 2001:8b0:c8f:e8b0::2f port 5001 connected with 2001:8b0:c8f:e8b0::2e port 35998
[ ID] Interval            Transfer     Bandwidth
[  2] 0.0000-11.5529 sec  4.75 MBytes  3.45 Mbits/sec

987 Mbit/s good. 3.45 Mbit/s bad. These are roughly the same hardware, same driver, same OS, same switch. I swap the cable, then the switch port; no change. Claude (hah!) makes a suggestion that this is a jumbo frames problem. Does clamping the MTU on the iperf client help? It turns out that iperf -6 -M 1500 (which clamps the TCP MSS) brings the performance back to line rate. Huh. Everything’s configured for jumbo frames, MTU of 9000. So why does this one host have a problem? And where between 1500 and 9000 does it stop behaving?

One binary chop later:

iperf -6 -N -M 7373 -c thinknodebot
…
[  1] 0.0000-10.2512 sec  36.8 MBytes  30.1 Mbits/sec

Too big.

iperf -6 -N -M 7371 -c thinknodebot
…
[  1] 0.0000-10.0125 sec   960 MBytes   804 Mbits/sec

Works fine.
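The chop itself is mechanical enough to script; a minimal sketch, assuming a measure(mss) helper (not in the original post) that runs something like iperf -6 -N -M &lt;mss&gt; -c &lt;host&gt; and returns throughput in Mbit/s — here stubbed with the cliff observed on this host:

```python
def find_cliff(measure, lo=1500, hi=9000, good_mbps=500):
    """Binary-search the largest MSS that still measures near line rate.

    measure(mss) is assumed to run an iperf test at that MSS and
    return the measured throughput in Mbit/s.
    """
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if measure(mid) >= good_mbps:
            lo = mid   # still fast: the cliff is above mid
        else:
            hi = mid   # collapsed: the cliff is at or below mid
    return lo          # largest MSS that measured "good"

# Stub reproducing the behaviour seen on the broken host:
# near line rate up to MSS 7372, a few Mbit/s beyond it.
fake_iperf = lambda mss: 940 if mss <= 7372 else 3
print(find_cliff(fake_iperf))  # → 7372
```

About a dozen iperf runs instead of guessing, which is roughly what the manual chop took anyway.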

7372 is a very weird threshold. I also discover that iperf’ing the other way, from the problem host to anything else, works great. Something is wonky with the RX path on this one host. I’m out of ideas, so I ask Claude again. “Have you considered using the r8169 driver? The r8168 is a little problematic and you might get more stability with the in-kernel r8169.” Sigh. I remove the r8168-dkms package, which restores the r8169 driver, and reboot. It works at line rate with jumbo frames. I tear my hair out.
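For what it’s worth, the MSS threshold maps to a specific on-wire packet size; a back-of-envelope sketch, assuming a minimal 20-byte TCP header with no options (real connections usually carry timestamp options, which shift these numbers slightly), suggesting some RX buffer limit in the r8168 driver or hardware sits just above it:

```python
# MSS is TCP payload only; add the headers back to get on-wire sizes.
mss = 7372            # largest MSS that still ran at line rate
tcp_header = 20       # assuming no TCP options
ipv6_header = 40
eth_header = 14       # ethernet header, excluding the 4-byte FCS

ip_packet = mss + tcp_header + ipv6_header
frame = ip_packet + eth_header
print(ip_packet, frame)  # → 7432 7446
```

So the working/broken boundary sits at IPv6 packets of roughly 7.4 KB — well under the configured 9000-byte MTU, which is exactly why a plain MTU check turned up nothing.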
