Openrsync: An implementation of rsync, by the OpenBSD team

Introduction

openrsync has been merged into the OpenBSD base system. If you’d like to contribute to openrsync, please mail your patches to tech@openbsd.org. This repository is simply the OpenBSD version plus some glue for portability. This is an implementation of rsync with a BSD (ISC) license. It’s compatible with a modern rsync (3.1.3 is used for testing, but any version supporting protocol 27 will do), but accepts only a subset of rsync’s command-line arguments. Its officially supported operating system is OpenBSD, but it will compile and run on other UNIX systems. See Portability for details.

The canonical documentation for openrsync is its manual pages. See rsync(5) and rsyncd(5) for protocol details, and openrsync(1) for utility documentation. If you’d like to write your own rsync implementation, the protocol manpages should have all the information required. The Architecture and Algorithm sections on this page serve to introduce developers to the source code. They are non-canonical.

Project background

openrsync was written as part of the rpki-client(1) project, an RPKI validator for OpenBSD. openrsync was funded by NetNod, IIS.SE, SUNET and 6connect.

Installation

On an up-to-date UNIX system, simply download and run:

% ./configure
% make
# make install

This will install the openrsync utility and manual pages. It’s ok to have an installation of rsync at the same time: the two will not collide in any way. If you upgrade your sources and want to re-install, just run the same commands again. To uninstall:

# make uninstall

If you’d like openrsync to act as the server side of a transfer, name it with --rsync-path, as in the following:

% rsync --rsync-path=openrsync src/* dst
% openrsync --rsync-path=openrsync src/* dst

If you’d like openrsync and rsync to interact, it’s important to use command-line flags available on both. See openrsync(1) for a listing.

Algorithm

For a thorough description of the rsync algorithm, see “The rsync algorithm”, by Andrew Tridgell and Paul Mackerras. Andrew Tridgell’s PhD thesis, “Efficient Algorithms for Sorting and Synchronization”, covers the topics in more detail. This section gives a description suited to delving into the source code.

The rsync algorithm has two components: the sender and the receiver. The sender manages the source files; the receiver manages the destination. In the first of the following invocations, the sender is the host remote and the receiver is the local host; in the second, the roles are reversed.

% openrsync -lrtp remote:foo/bar ~/baz/xyzzy
% openrsync -lrtp ~/foo/bar remote:baz/xyzzy

The algorithm hinges upon a file list of names and metadata (e.g., mode and mtime) shared between the components. The file list describes all source files of the update and is generated by the sender. The sharing is implemented in flist.c. After the list is shared, the receiver and sender each independently sort the entries by the lexicographic order of their filenames. This allows the file list to be sent and received out of order. The ordering is directory-first: a directory’s name is a prefix of the names of the files it contains, so it sorts before them, and directories are therefore processed before their contents. Moreover, once sorted, both sender and receiver may refer to file entries by their position in the sorted array.
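A minimal sketch of the sorting step, assuming a deliberately simplified entry type. The `struct entry`, `entry_cmp`, and `sort_file_list` names are hypothetical and are not openrsync’s flist.c code; the point is only that a deterministic byte-wise sort gives both sides the same order, and therefore the same indices, and that a directory sorts ahead of its contents because a string sorts before any longer string it is a prefix of.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified file-list entry; openrsync's real entries
 * also carry metadata such as mode, mtime, size, and link target. */
struct entry {
	const char *path;
};

/* Compare two entries by the byte-wise lexicographic order of their paths. */
static int
entry_cmp(const void *a, const void *b)
{
	const struct entry *ea = a, *eb = b;

	return strcmp(ea->path, eb->path);
}

/* Sort a (possibly out-of-order) file list in place.  Because sender
 * and receiver run the same deterministic sort, index i names the same
 * file on both sides afterwards. */
void
sort_file_list(struct entry *fl, size_t n)
{
	if (n > 1)
		qsort(fl, n, sizeof(fl[0]), entry_cmp);
}
```

Sorting the paths `foo`, `bar/b.txt`, `bar`, `bar/a.txt` this way yields `bar`, `bar/a.txt`, `bar/b.txt`, `foo`, so index 0 names the directory `bar` on both sides.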

After the receiver reads the list, it iterates through each file in the list, passing information to the sender so that the sender may send back instructions to update the file. This is called the “block exchange” and is the mainstay of the rsync algorithm. During the block exchange, the sender waits to receive either a request for an update or an end-of-sequence message; once a request is received, it scans the source file for new blocks to send to the receiver. Once the block exchange is complete, all files are up to date. The receiver is implemented in receiver.c; the sender, in sender.c. A great deal of the block exchange happens in blocks.c.

Block exchange

The block exchange sequence differs depending on whether the file is a directory, a symbolic link, or a regular file. For symbolic links, the information required by the receiver is already encoded in the file list metadata: the symbolic link is updated to point to the correct target, and no update is requested from the sender. For directories, the directory is created if it does not already exist; again, no update is requested from the sender.

Regular files are handled as follows. First, the receiver checks whether the file is up to date, which is the case if its size and last modification time match those in the file list. If so, no update is requested from the sender. Otherwise, the receiver divides its existing copy of the file into blocks of a fixed size (see Block sizes for details); the terminal block may be smaller if the file size is not divisible by the block size. If the file is empty or does not exist, it has zero blocks.
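A minimal sketch of the up-to-date check and the block arithmetic, assuming the block size has already been chosen (how it is chosen is left to the Block sizes section and not modeled here). The `struct blockinfo` type and the `is_up_to_date` and `split_blocks` functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical summary of how a file splits into fixed-size blocks. */
struct blockinfo {
	size_t	 blksz;	/* nominal block size */
	size_t	 blks;	/* number of blocks, including a short terminal one */
	size_t	 rem;	/* length of a short terminal block, or 0 if none */
};

/* Quick check: same size and same modification time means up to date. */
int
is_up_to_date(int64_t srcsz, time_t srcmtime, int64_t dstsz, time_t dstmtime)
{
	return srcsz == dstsz && srcmtime == dstmtime;
}

/* Split a file of len bytes into blocks of blksz bytes.  An empty or
 * missing file (len == 0) has zero blocks. */
struct blockinfo
split_blocks(size_t len, size_t blksz)
{
	struct blockinfo bi;

	assert(blksz > 0);
	bi.blksz = blksz;
	bi.rem = len % blksz;
	bi.blks = len / blksz + (bi.rem != 0);
	return bi;
}
```

For example, a 2500-byte file with a 700-byte block size has four blocks, the last of them 400 bytes long.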

Each block is hashed twice: first, with a fast 4-byte hash of the Adler-32 type; second, with a slower 16-byte MD4 hash. These hashes are implemented in hash.c. The receiver sends the file’s block hashes to the sender. Once they are received, the sender examines its corresponding source file against the given blocks. At each byte offset of the source file, the sender computes the fast hash of the block-sized window starting there and looks for a matching fast hash in the received block information. If it finds a match, it then computes and checks the slow hash. If no match is found, it advances to the next byte. The matching (and indeed all block operations) is implemented in blocks.c.
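The reason a per-byte search is affordable is that an Adler-32-style checksum can be rolled: sliding the window by one byte updates it in constant time instead of rehashing the whole block. The sketch below follows the weak checksum described in the Tridgell and Mackerras paper (a sum of bytes and a sum of running sums, each modulo 2^16, packed into 32 bits). It is an assumption-laden illustration, not openrsync’s hash.c or blocks.c: the `struct rollsum` type and the `rollsum_*` and `find_block` names are hypothetical, bytes are treated as unsigned (so values may not match rsync’s on the wire), and a `memcmp` stands in for the MD4 confirmation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical rolling state for an Adler-32-style weak checksum. */
struct rollsum {
	uint32_t s1;	/* sum of the bytes in the window */
	uint32_t s2;	/* sum of the running values of s1 */
	size_t	 len;	/* window (block) length */
};

/* Compute the weak checksum of buf[0..len) from scratch. */
void
rollsum_init(struct rollsum *r, const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL || len == 0);
	r->s1 = r->s2 = 0;
	r->len = len;
	for (i = 0; i < len; i++) {
		r->s1 += buf[i];
		r->s2 += r->s1;
	}
}

/* Slide the window by one byte: drop `out` at the front, append `in`. */
void
rollsum_roll(struct rollsum *r, unsigned char out, unsigned char in)
{
	r->s1 += (uint32_t)in - out;
	r->s2 += r->s1 - (uint32_t)r->len * out;
}

/* Pack both 16-bit sums into one 32-bit value (arithmetic mod 2^16). */
uint32_t
rollsum_digest(const struct rollsum *r)
{
	return (r->s1 & 0xffff) | (r->s2 << 16);
}

/* Return the first offset in src whose blk_len-byte window matches blk:
 * weak checksum first, then memcmp as a stand-in for the MD4 check.
 * Return -1 if there is no match. */
long
find_block(const unsigned char *src, size_t src_len,
    const unsigned char *blk, size_t blk_len)
{
	struct rollsum want, cur;
	size_t off;

	if (blk_len == 0 || src_len < blk_len)
		return -1;
	rollsum_init(&want, blk, blk_len);
	rollsum_init(&cur, src, blk_len);
	for (off = 0; ; off++) {
		if (rollsum_digest(&cur) == rollsum_digest(&want) &&
		    memcmp(src + off, blk, blk_len) == 0)
			return (long)off;
		if (off + blk_len >= src_len)
			return -1;	/* no bytes left to roll in */
		rollsum_roll(&cur, src[off], src[off + blk_len]);
	}
}
```

Only windows whose cheap weak checksum matches pay for the expensive comparison, which is the same division of labor the fast and slow hashes have in the algorithm above.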

When a match is found, the data prior to the match is first sent as a stream of bytes to the receiver. This is followed by an identifier for the found block, or zero if no more data is forthcoming. The receiver first writes the stream of bytes, then copies the identified block from its existing copy of the file, if a block has been specified. This continues until the end of the file, at which point the file has been fully reconstituted. If the file does not exist on the receiver side (the basis case), the entire file is sent as a stream of bytes. Following this, the whole file is hashed using an MD4 hash. The hashes computed on each side are then compared; and on success, the algorithm continues to the