z386: An Open-Source 80386 Built Around Original Microcode - Small Things Retro

This is the fifth installment of the 80386 series. The FPGA CPU is now far enough along to run real software, and this post is about how it works. z386 is a 386-class CPU built around the original Intel microcode, in the same spirit as z8086. The core is not an instruction-by-instruction emulator in RTL. The goal is to recreate enough of the original machine that the recovered 386 control ROM can drive it.

Today z386 boots DOS 6 and DOS 7, runs protected-mode programs like DOS/4GW and DOS/32A, and plays games like Doom and Cannon Fodder. Here are some rough numbers against ao486:

| Metric | z386 | ao486 |
|---|---|---|
| Lines of code (cloc) | 8K | 17.6K |
| ALUTs | 18K | 21K |
| Registers | 5K | 6.5K |
| BRAM | 116K | 131K |
| FPGA clock | 85MHz | 90MHz |
| 3DBench FPS | 34 | 43 |
| Doom (original) FPS, max details | 16.5 | 21.0 |
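
As a rough way to read these numbers, the sketch below divides each benchmark score by the FPGA clock to separate raw speed from per-clock efficiency. The figures are copied from the table; the helper names (`relative_speed`, `per_clock_ratio`) are made up for illustration, and two benchmarks are far too few to treat the result as more than an approximate indication.

```python
# Figures copied from the comparison table (z386 vs. ao486).
CLOCK_MHZ = {"z386": 85.0, "ao486": 90.0}
BENCHMARKS = {
    "3DBench FPS": {"z386": 34.0, "ao486": 43.0},
    "Doom FPS (max details)": {"z386": 16.5, "ao486": 21.0},
}


def relative_speed(scores):
    """z386 score as a fraction of the ao486 score."""
    return scores["z386"] / scores["ao486"]


def per_clock_ratio(scores):
    """Same ratio after dividing each score by its core's FPGA clock."""
    z386_per_mhz = scores["z386"] / CLOCK_MHZ["z386"]
    ao486_per_mhz = scores["ao486"] / CLOCK_MHZ["ao486"]
    return z386_per_mhz / ao486_per_mhz


for name, scores in BENCHMARKS.items():
    print(f"{name}: {relative_speed(scores):.2f}x overall, "
          f"{per_clock_ratio(scores):.2f}x per MHz")
```

On both benchmarks z386 lands at a little under 0.8x the ao486 frame rate, and somewhat above 0.8x once the clock difference is factored out.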

In current builds, z386 performs like a fast (~70MHz) cached 386-class machine, or a low-end 486. It runs at a much higher clock than historical 386 CPUs, but with somewhat worse CPI (cycles per instruction). The current cache is a 16 KB, 4-way set-associative unified L1, chosen partly to keep the clock high. Real high-end 386 systems often used larger external caches, typically in the 32 KB to 128 KB range.
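
To make the cache parameters concrete, here is a minimal sketch of how a 16 KB, 4-way set-associative cache splits a 32-bit physical address into tag, set index, and byte offset. The line size is not stated in this post, so the 16-byte value below is purely an assumption for illustration, and the function names are hypothetical; the real z386 cache may be organized differently.

```python
CACHE_BYTES = 16 * 1024  # from the text: 16 KB unified L1
WAYS = 4                 # from the text: 4-way set-associative
LINE_BYTES = 16          # ASSUMPTION: line size is not given in the post
ADDR_BITS = 32           # the 386 uses 32-bit physical addresses


def geometry(cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Return (sets, offset_bits, index_bits, tag_bits).

    Assumes every size is a power of two.
    """
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = ADDR_BITS - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, cache_bytes=CACHE_BYTES, ways=WAYS,
                  line_bytes=LINE_BYTES):
    """Split a physical address into (tag, set index, byte offset)."""
    sets, offset_bits, index_bits, _ = geometry(cache_bytes, ways, line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


sets, offset_bits, index_bits, tag_bits = geometry()
print(f"{sets} sets, {offset_bits} offset bits, "
      f"{index_bits} index bits, {tag_bits} tag bits")
```

Under the assumed 16-byte line, the cache has 256 sets, and a lookup compares one 20-bit tag in each of the four ways. A different line size only shifts bits between the offset and index fields, as long as the sizes stay powers of two.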

Much of this 386 microarchitecture archaeology has already been covered in the previous four posts: the multiplication/division datapath, the barrel shifter, protection and paging, and the memory pipeline. z386 tries to be both an educational reconstruction and a usable FPGA CPU. It keeps many 386-like structures: a 32-entry paging TLB, a barrel shifter shaped like the original, ROM/PLA-style decoding, the Protection PLA model, and most importantly the 37-bit-wide, 2,560-entry microcode ROM.
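
For a sense of scale, the microcode ROM dimensions quoted above work out as follows. This is only back-of-the-envelope arithmetic on the two numbers from the text (37 bits wide, 2,560 entries); the `pack_rom` helper is a hypothetical illustration of validating a ROM image and is not taken from the z386 sources.

```python
WORD_BITS = 37   # from the text: microcode word width
ENTRIES = 2560   # from the text: number of microcode words

total_bits = WORD_BITS * ENTRIES
total_bytes = total_bits / 8
# Bits needed to address every entry (2560 is not a power of two).
addr_bits = (ENTRIES - 1).bit_length()

print(f"{total_bits} bits = {total_bytes:.0f} bytes, "
      f"{addr_bits}-bit micro-address")


def pack_rom(words):
    """Hypothetical helper: check that a list of ints fits the ROM shape."""
    if len(words) != ENTRIES:
        raise ValueError(f"expected {ENTRIES} words, got {len(words)}")
    limit = 1 << WORD_BITS
    for address, word in enumerate(words):
        if not 0 <= word < limit:
            raise ValueError(f"word at {address:#05x} exceeds {WORD_BITS} bits")
    return tuple(words)


rom = pack_rom([0] * ENTRIES)  # placeholder contents, not real microcode
```

The whole control store is 94,720 bits, a little under 12 KB, and needs a 12-bit micro-address to reach every entry.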

At the same time, it uses FPGA-friendly shortcuts where they make sense, such as DSP blocks for multiplication and the small fast L1 cache. In this post, I will fill in the rest of the design: instruction prefetch, decode, the microcode sequencer, cache design, testing, how z386 differs from ao486, and some lessons from the bring-up.

From z8086 to z386

A little background first. Last year I wrote z8086, an original-microcode-driven 8086, based on reenigne’s disassembly work. That project showed that it was possible to build a working CPU around recovered microcode. Towards the end of the year, I learned that 80386 microcode had recently been extracted, and that reenigne and several others — credited at the end of this post — were working on a disassembly. They generously shared their work with me, and z386 started from there.

The 386 is a very different problem from the 8086. The instruction set is larger, the internal state is much richer, and the machine has to enforce protection, paging, privilege checks, and precise faults. More importantly, the 80386 micro-operations are denser and more contextual. If the 8086 microcode reads like a straightforward C program, the 386 microcode reads more like hand-tuned assembly: short, subtle, and full of assumptions about hidden hardware. That puzzle took about four months of evenings and weekends. The result is not a perfect 386 yet, but it is now far enough along to run real protected-mode DOS software.

z386 - high-level view

At a high level, the 386 is organized around eight major units. z386 follows the same division closely enough that the original Intel block diagram is still a useful map.

(Note: The article continues with detailed descriptions of the eight units: Prefetch, Decoder, Microcode sequencer, ALU/shifter, Segmentation, Protection, Paging, and BIU/cache/memory path.)
