The smallest C++ binary


I thought of a cute problem: what is the smallest ./a.out binary, by file size, that I can create? Here are the rules the program must follow:

  • ./a.out must run successfully.
  • The exit status ($?) must deterministically be 0.
  • The binary must be produced by GCC only; no post-processing with objcopy, hex editors, or manual patching.

We begin with the simplest program possible:

// compiled with gcc empty.c
int main() { return 0; }

This gives us a file size of 15816 bytes (from stat). Not too shabby, but we would need about four times the RAM of the Apollo Guidance Computer to fit our binary that does nothing.

Looking at the output of file:

 file a.out
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/jms7zxzm7w1whczwny5m3gkgdjghmi2r-glibc-2.42-51/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not stripped

“not stripped” looks suspicious. Whatever it is, surely it is better if we can strip stuff out of our binary. It turns out that gcc provides a -s flag, which removes all symbol table and relocation information from the executable. We are now at 14352 bytes with our binary stripped.

Between running ./a.out and hitting int main(), there is a lot of sorcery happening behind the scenes, so much so that Matt Godbolt gave a one-hour talk about it at CppCon. Let's tweak the program so that we have a freestanding binary that skips everything that happens before int main().

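// saved as empty.cpp (the code is C++); compiled with gcc empty.cpp -s -nostartfiles
#include <cstdlib>
// -nostartfiles drops the usual startup code, so we provide the entry point ourselves.
extern "C" __attribute__((noreturn)) void _start() { exit(0); }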

This only gives us a measly improvement to 13632 bytes. Given how much Matt says is happening before int main(), surely there is still code in our binary that we never run!

Checking objdump -x a.out, we can see a bunch of libraries being dynamically loaded:

(Output truncated for brevity)