The end of the AArch64 desktop experiment

This post is part 7 of the “Let me try to use an AArch64 system as a desktop” series: AArch64 desktop: day one, AArch64 desktop: day two, AArch64 desktop: last day, Arm desktop: 2025 attempt, part one, Arm desktop: emulation, Arm desktop: so many cores, not enough speed.

After about eleven months of using an AArch64 desktop, I decided to end that experiment.

Hardware used

About a year ago, I bought myself an Ampere Altra system. After moving some hardware around and making a few extra orders, the final setup was:

  • CPU: Ampere Altra Q80-30 processor (80 cores at 3.0GHz)
  • RAM: 128 GB (8x 16GB HMA82GR7CJR8N-XN)
  • GPU: AMD Radeon RX6700XT
  • NVMe: Lexar LM970 2TB, ADATA SX8200 Pro 1TB
  • Motherboard: ASRock Rack ALTRAD8UD-1L2T
  • PSU: MSI MPG A850G (850W)
  • Case: Endorfy 700 Air
  • USB: no-name USB 3.2/10Gbps controller (PCIe x4)

To be fair, I should mention that this is a server motherboard, not a desktop one, and Altra systems were never meant to be desktops (despite companies selling them as such). Naturally, the Qualified Vendor List (QVL, for fans of three-letter acronyms) of tested and approved devices is quite short, and for Ampere Altra systems it does not contain AMD Radeon GPU cards. They can be made to work, but this often requires additional effort. The extra USB 3.2 controller let me connect more USB devices than the motherboard alone supported, and gave me some 10Gbps ports for external NVMe drives. The whole system was running just fine* under Fedora 42–44.

The first issue

Have you noticed the small “*” at the end of the previous paragraph? The system I used was not quite Fedora — I had to use my own, self-built kernel. You see, the PCI Express controller in the Ampere Altra has some issues. Let me quote the description of the Ampere Altra erratum 82288 patches:

Per Altra family erratum, PCIE_65 may cause invalid addresses to be generated on PCIe mmio writes, impacting certain device types, notably AMD GPUs, and thus the Altra family is not generally compatible with those device types.

And the longer description from the patch itself:

PCIe device drivers may map MMIO space as Normal, non-cacheable memory attribute (e.g. Linux kernel drivers mapping MMIO using ioremap_wc). This may be for the purpose of enabling write combining or unaligned accesses. This can result in data corruption on the PCIe interface’s outbound MMIO writes due to issues with the write-combining operation. The workaround modifies software that maps PCIe MMIO space as Normal, non-cacheable memory (e.g. ioremap_wc) to instead Device, non-gathering memory (e.g. ioremap). And all memory operations on PCIe MMIO space must be strictly aligned.

So, to have a working Linux system, I had to rebuild the kernel on every package update, which usually meant weekly. Each Monday or Tuesday, I would update my local copy of the Fedora kernel package repository and build it using my own versioning scheme, like “7.0.2-200.fc44.pcie65.6”. The “pcie65” part reminded me which patches I had applied, and the “6” was a counter for the patch rebases. I cloned the repository from GitHub and then rebased the patches, adapting them whenever they needed work. A side effect was that I often ran a newer kernel than the official Fedora release: the Fedora kernel package repo has a “stabilisation” branch where the soon-to-be-pushed version lives. So when Fedora had a 6.19.y kernel, I had a 7.0.z one.
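
The versioning scheme described above can be sketched as a small Python helper. This is only an illustration, not the tooling actually used for the builds: the field names (`upstream`, `release`, `dist`, `patch_tag`, `rebase`) are labels made up for this example, and whether the rebase counter was ever reset between kernel versions is not stated in the post.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KernelVersion:
    """One build label in the style of '7.0.2-200.fc44.pcie65.6'.

    All field names are hypothetical labels chosen for this sketch.
    """
    upstream: str   # upstream kernel version, e.g. "7.0.2"
    release: str    # package release number, e.g. "200"
    dist: str       # distribution tag, e.g. "fc44"
    patch_tag: str  # reminder of the applied patch set, e.g. "pcie65"
    rebase: int     # counter for patch rebases, e.g. 6

    def __str__(self):
        return (f"{self.upstream}-{self.release}.{self.dist}."
                f"{self.patch_tag}.{self.rebase}")

    def bump_rebase(self):
        """Return a copy with the rebase counter increased by one."""
        return replace(self, rebase=self.rebase + 1)


def parse_version(text):
    """Split a label like '7.0.2-200.fc44.pcie65.6' into its parts."""
    upstream, rest = text.split("-", 1)
    release, dist, patch_tag, rebase = rest.split(".")
    return KernelVersion(upstream, release, dist, patch_tag, int(rebase))


current = parse_version("7.0.2-200.fc44.pcie65.6")
print(current.bump_rebase())  # label for the next rebase of the same kernel
```

Keeping the patch tag and the counter as separate fields makes it easy to tell at a glance (for example from `uname -r`) which patch set and which rebase a running kernel was built from.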

So many cores, not enough speed

As I wrote in my previous post, having eighty CPU cores does not mean that the system is a good, fast desktop machine.

AMD GPU started failing

As I mentioned above, to get my AMD Radeon RX6700XT running properly I had to patch the kernel with out-of-tree patches. It worked: I could play some games and watch videos with hardware-accelerated decoding. Until one day, around the Linux 7.0 release, when it started to fail. Running a game ended with:

kernel: amdgpu 0000:03:00.0: Fence fallback timer expired on ring vcn_dec_0

Over and over again. Watching YouTube videos became impossible because 720 out of 750 frames were dropped, and so on. Normally I would bisect the kernel to find out where the problem was. But I was running a tainted kernel due to the PCIE65 patches, so who knew where the problem actually was…
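
To put numbers on a failure like this, a short Python sketch can count the fence timeout messages per ring and work out the share of dropped frames. The regular expression is modelled only on the single log line quoted above, so other amdgpu messages may need a different pattern. The function names and the sample lines are made up for this illustration.

```python
import re
from collections import Counter

# Pattern based on the one log line quoted in the post (assumption: other
# kernel versions may format this message differently).
FENCE_TIMEOUT = re.compile(
    r"amdgpu (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Fence fallback timer expired on ring (?P<ring>\S+)"
)


def count_fence_timeouts(lines):
    """Count fence timeout messages per (PCI address, ring) pair."""
    counts = Counter()
    for line in lines:
        match = FENCE_TIMEOUT.search(line)
        if match:
            counts[(match["pci"], match["ring"])] += 1
    return counts


def dropped_share(dropped, total):
    """Return dropped frames as a percentage of all frames."""
    return 100.0 * dropped / total


# Hypothetical sample: the quoted message twice, plus an unrelated line.
sample = [
    "kernel: amdgpu 0000:03:00.0: Fence fallback timer expired on ring vcn_dec_0",
    "kernel: usb 1-1: new high-speed USB device",
    "kernel: amdgpu 0000:03:00.0: Fence fallback timer expired on ring vcn_dec_0",
]

print(count_fence_timeouts(sample))
print(f"{dropped_share(720, 750):.0f}% of frames dropped")
```

With the numbers from the post, 720 dropped frames out of 750 works out to 96%, which explains why video playback was unusable.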

Let’s get Nvidia

I bought an Nvidia RTX 2060 graphics card and put it in place of the AMD Radeon. It turned out that if I wanted to use it with the nouveau kernel driver, I still needed the PCIE65 patches applied… So I tried the default Fedora kernel with the Nvidia binary driver, and it worked fine. Video decoding was accelerated, and some games under Wine worked as well. But then I started FreeCAD. And OrcaSlicer. Both crashed and exited… It turned out that there was no org.freedesktop.Platform.GL.nvidia, the runtime extension that Flatpak applications need in order to use the Nvidia binary driver, in the Flatpak repositories for AArch64. And I used both of those tools quite often.

Powering up the old x86-64…

At that point, I gave up and booted my x86-64 system, which had been powered off all that time. There were a lot of cables to move, some new ones to arrange, and now I have both “wooster” (Ampere Altra) and “puchatek” (Ryzen 5 3600) systems running under my desk. Moving from 80 cores to 6 cores (12 threads) was a weird experience. A much smaller number, yet things work fine. I can load all threads and the music still plays. All games from my Steam library are playable. A working FreeCAD allows me to finish designing cases for my home projects, and I can 3D print prototypes straight from OrcaSlicer. The “wooster” system stays powered on, churning through RISC-V package builds. It may be weak in single-threaded work, but it flies when it comes to multi-core load.

Conclusion

As for the Ampere Altra, I am not planning to repeat this experiment. Another AArch64 desktop attempt would require a completely new hardware platform. And I have no plans.
