The design of littlefs
The design of littlefs
littlefs 的设计
littlefs is a little fail-safe filesystem designed for microcontrollers. littlefs 是一个专为微控制器设计的小型故障安全(fail-safe)文件系统。
.---._____ .-----.
| | --|o |---| littlefs | --| |---|
| '-----' '----------' |
| |
littlefs was originally built as an experiment to learn about filesystem design in the context of microcontrollers. The question was: How would you build a filesystem that is resilient to power-loss and flash wear without using unbounded memory? This document covers the high-level design of littlefs, how it is different than other filesystems, and the design decisions that got us here. For the low-level details covering every bit on disk, check out SPEC.md. littlefs 最初是作为一项实验而构建的,旨在学习微控制器环境下的文件系统设计。核心问题是:如何在不使用无限内存的情况下,构建一个能够抵御断电和闪存磨损的文件系统?本文档涵盖了 littlefs 的高层设计、它与其他文件系统的区别,以及促成当前设计的决策过程。有关磁盘上每一位的底层细节,请参阅 SPEC.md。
The problem
问题所在
The embedded systems littlefs targets are usually 32-bit microcontrollers with around 32 KiB of RAM and 512 KiB of ROM. These are often paired with SPI NOR flash chips with about 4 MiB of flash storage. These devices are too small for Linux and most existing filesystems, requiring code written specifically with size in mind. littlefs 针对的嵌入式系统通常是 32 位微控制器,拥有约 32 KiB 的 RAM 和 512 KiB 的 ROM。它们通常搭配约 4 MiB 闪存空间的 SPI NOR 闪存芯片。这些设备对于 Linux 和大多数现有文件系统来说太小了,因此需要专门针对体积进行优化的代码。
Flash itself is an interesting piece of technology with its own quirks and nuance. Unlike other forms of storage, writing to flash requires two operations: erasing and programming. Programming (setting bits to 0) is relatively cheap and can be very granular. Erasing however (setting bits to 1), requires an expensive and destructive operation which gives flash its name. Wikipedia has more information on how exactly flash works. 闪存本身是一项有趣的技术,有其独特的怪癖和细微差别。与其他存储形式不同,写入闪存需要两个操作:擦除和编程。编程(将位设置为 0)相对廉价且可以非常细粒度。然而,擦除(将位设置为 1)则需要昂贵且具有破坏性的操作,这也是“闪存”(Flash)名称的由来。维基百科上有关于闪存工作原理的更多详细信息。
To make the situation more annoying, it’s very common for these embedded systems to lose power at any time. Usually, microcontroller code is simple and reactive, with no concept of a shutdown routine. This presents a big challenge for persistent storage, where an unlucky power loss can corrupt the storage and leave a device unrecoverable. This leaves us with three major requirements for an embedded filesystem. 更令人头疼的是,这些嵌入式系统在任何时候断电都非常常见。通常,微控制器代码简单且具有响应性,没有关机程序的概念。这对持久化存储提出了巨大挑战,因为一次不幸的断电可能会损坏存储数据,导致设备无法恢复。这为嵌入式文件系统提出了三个主要要求。
-
Power-loss resilience - On these systems, power can be lost at any time. If a power loss corrupts any persistent data structures, this can cause the device to become unrecoverable. An embedded filesystem must be designed to recover from a power loss during any write operation.
-
断电鲁棒性 - 在这些系统中,随时可能断电。如果断电损坏了任何持久化数据结构,可能导致设备无法恢复。嵌入式文件系统必须设计为能够在任何写入操作期间发生断电后进行恢复。
-
Wear leveling - Writing to flash is destructive. If a filesystem repeatedly writes to the same block, eventually that block will wear out. Filesystems that don’t take wear into account can easily burn through blocks used to store frequently updated metadata and cause a device’s early death.
-
磨损均衡 - 写入闪存具有破坏性。如果文件系统反复写入同一个块,最终该块会磨损。不考虑磨损的文件系统很容易耗尽用于存储频繁更新元数据的块,从而导致设备过早损坏。
-
Bounded RAM/ROM - If the above requirements weren’t enough, these systems also have very limited amounts of memory. This prevents many existing filesystem designs, which can lean on relatively large amounts of RAM to temporarily store filesystem metadata. For ROM, this means we need to keep our design simple and reuse code paths where possible. For RAM we have a stronger requirement, all RAM usage is bounded. This means RAM usage does not grow as the filesystem changes in size or number of files. This creates a unique challenge as even presumably simple operations, such as traversing the filesystem, become surprisingly difficult.
-
受限的 RAM/ROM - 如果上述要求还不够,这些系统的内存也非常有限。这阻碍了许多现有的文件系统设计,因为它们往往依赖相对大量的 RAM 来临时存储文件系统元数据。对于 ROM,这意味着我们需要保持设计简单,并尽可能重用代码路径。对于 RAM,我们有更严格的要求:所有 RAM 使用量必须是有界的。这意味着 RAM 的使用量不会随着文件系统大小或文件数量的变化而增长。这带来了一个独特的挑战,因为即使是看似简单的操作(如遍历文件系统)也会变得异常困难。
Existing designs?
现有设计?
So, what’s already out there? There are, of course, many different filesystems, however they often share and borrow feature from each other. If we look at power-loss resilience and wear leveling, we can narrow these down to a handful of designs. 那么,目前市面上有什么呢?当然,有很多不同的文件系统,但它们通常会相互借鉴功能。如果我们着眼于断电鲁棒性和磨损均衡,可以将它们归纳为几种设计。
First we have the non-resilient, block based filesystems, such as FAT and ext2. These are the earliest filesystem designs and often the most simple. Here storage is divided into blocks, with each file being stored in a collection of blocks. Without modifications, these filesystems are not power-loss resilient, so updating a file is a simple as rewriting the blocks in place. 首先是基于块的非鲁棒性文件系统,例如 FAT 和 ext2。这些是最早的文件系统设计,通常也是最简单的。在这种设计中,存储被划分为块,每个文件存储在一组块中。如果不进行修改,这些文件系统不具备断电鲁棒性,因此更新文件就像原地重写块一样简单。
.--------.
| root |
| |
'--------'
.-' '-.
v v
.--------. .--------.
| A | | B |
| | | |
'--------' '--------'
.-' .-' '-.
v v v
.--------. .--------. .--------.
| C | | D | | E |
| | | | | |
'--------' '--------' '--------'
Because of their simplicity, these filesystems are usually both the fastest and smallest. However the lack of power resilience is not great, and the binding relationship of storage location and data removes the filesystem’s ability to manage wear. 由于其简单性,这些文件系统通常既是最快的也是最小的。然而,缺乏断电鲁棒性并不理想,且存储位置与数据之间的绑定关系剥夺了文件系统管理磨损的能力。
In a completely different direction, we have logging filesystems, such as JFFS, YAFFS, and SPIFFS, storage location is not bound to a piece of data, instead the entire storage is used for a circular log which is appended with every change made to the filesystem. Writing appends new changes, while reading requires traversing the log to reconstruct a file. Some logging filesystems cache files to avoid the read cost, but this comes at a tradeoff of RAM. 在完全不同的方向上,我们有日志文件系统,如 JFFS、YAFFS 和 SPIFFS。存储位置不绑定到特定的数据,而是将整个存储空间用作循环日志,每次对文件系统的更改都会追加到日志中。写入操作追加新的更改,而读取操作则需要遍历日志来重构文件。一些日志文件系统会缓存文件以避免读取开销,但这需要以牺牲 RAM 为代价。
v
.--------.--------.--------.--------.--------.--------.--------.--------.
| C | new B | new A | | A | B | | |
| | | | | | | | |
'--------'--------'--------'--------'--------'--------'--------'--------'
Logging filesystem are beautifully elegant. With a checksum, we can easily detect power-loss and fall back to the previous state by ignoring failed appends. And if that wasn’t good enough, their cyclic nature means that logging filesystems distribute wear across storage perfectly. The main downside is performance. If we look at garbage collection, the process of cleaning up outdated data from the end of the log, I’ve yet to see a pure logging filesystem that does not have one of these two costs: O(n²) runtime or O(n) RAM. 日志文件系统非常优雅。通过校验和,我们可以轻松检测到断电,并通过忽略失败的追加操作回退到之前的状态。如果这还不够好,它们的循环特性意味着日志文件系统可以完美地将磨损分布到整个存储空间。主要的缺点是性能。如果我们观察垃圾回收(即从日志末尾清理过期数据的过程),我还没有见过哪种纯日志文件系统不需要付出以下两种代价之一:O(n²) 的运行时间或 O(n) 的 RAM。
SPIFFS is a very interesting case here, as it uses the fact that repeated programs to NOR flash is both atomic and masking. This is a very neat solution, however it limits the type of storage you can support. SPIFFS 在这里是一个非常有趣的案例,因为它利用了对 NOR 闪存的重复编程既是原子性的又是掩码性的这一事实。这是一个非常巧妙的解决方案,但它限制了所支持的存储类型。
Perhaps the most common type of filesystem, a journaling filesystem is the offspring that happens when you mate a block based filesystem with a logging filesystem. ext4 and NTFS are good examples. Here, we take a normal block based filesystem and add a bounded log where we note every change before it occurs. 也许最常见的文件系统类型是日志式文件系统(Journaling filesystem),它是基于块的文件系统与日志文件系统结合的产物。ext4 和 NTFS 就是很好的例子。在这里,我们采用一个普通的基于块的文件系统,并添加一个有界的日志,在每次更改发生之前记录下来。
journal
.--------.--------. .--------.
| C' | D' | | E' |
| | | | |
'--------'--------' '--------'
| root |-->| |-->| |
| | | | | |
'------' '--------' '--------'