Test-case Reducers Are Underappreciated Debugging Tools
Test-case reducers are less well known than they should be, and those who are aware of them don’t always realise the variety of ways we can use – perhaps even abuse! – them. In this post, I’m going to explore some of the things I’ve learnt while using these wonderful tools. I’ll start at the basics, because the idea is so simple that it can be hard to believe it works. I’ll then work my way up to a deeper surprise.
Test-case reducers try to reduce the length of an input, but we can force them to take into account additional factors such as how often an error occurs, or the number of instructions executed. Depending on the problem you’re trying to debug, this can make a huge difference to the real-world effectiveness of test-case reducers. I don’t expect that anything in this post will surprise experts, but since I learnt this stuff the hard way, perhaps others will benefit from having some of it in one place.
Test-case reduction
Imagine we have written a program that crashes on a large input, and we don’t know what part of the input causes the crash: what can we do? Most of us will probably start with debugging our program using classic techniques, gradually moving from the quick and dirty (printf) to the more principled (debuggers) to – if we’re desperate and experienced – “exotic” tools (sanitisers, valgrind, etc). Every programmer ends up with a set of debugging tools and techniques they reach for, some more effective than others.
One technique that is used less often than it probably should be is reducing the size of the input. In nearly all cases, the smaller an input is, the easier it is for us to work out why it’s leading to problems. Reduction can be manual. We can load an input into a text editor, remove a portion of it, and then see if the new, smaller, input still causes a crash. Unsurprisingly, as easily-bored humans with limited vision, we tend to miss many opportunities for reduction when we do so manually.
It’s also often the case that deleting part of an input stops the program crashing in the way we were investigating: the reduced program might run to completion, or throw a different, correct and expected, error on the new input. Even worse, at some point one realises that deleting portion A of the input doesn’t achieve the effect we want, but deleting disjoint portions A and B does: how big is the search space of deletions?! A Sisyphean future beckons.
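To put a rough number on that search space, here is a small sketch. It assumes deletions only ever happen at the granularity of whole lines, which is a simplification of my own; the function names `deletion_sets` and `deletion_pairs` are hypothetical and exist purely for illustration.

```python
from math import comb

def deletion_sets(n_lines):
    """Count the non-empty sets of lines that could be deleted from an input."""
    return 2 ** n_lines - 1

def deletion_pairs(n_lines):
    """Count the ways to delete exactly two lines (an "A and B" deletion)."""
    return comb(n_lines, 2)

for n in (10, 100, 1000):
    # Print the digit count rather than the full number, which gets huge quickly.
    print(f"{n} lines: {deletion_pairs(n)} two-line deletions, "
          f"{len(str(deletion_sets(n)))}-digit number of possible deletion sets")
```

Even a 10-line input has 1023 possible deletion sets under this assumption, and the count doubles with every extra line, so exhaustively trying deletions by hand is hopeless for inputs of any real size.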
Test-case reducers
Fortunately, there are tools that automate the process of reducing test cases: test-case reducers. These take a program, an input, and an interestingness test. The test-case reducer tries ever shorter versions of the input, and the interestingness test tells the reducer whether those shorter versions still trigger the problem you care about. Test-case reducers can be astonishingly effective – 95-99% reductions are common – and often make debugging vastly easier.
Test-case reducers can sound like they’re magic: how can a tool know what parts of the input to remove? To make things even worse, the community that has most thoroughly embraced them are compiler authors, who many programmers think of as being an impossibly skilled elite. It seems to me that the combination of these two things has put many people off from trying such tools. The good news is that test-case reducers are not magic. The easiest way to see that is by writing one.
Let’s imagine that I’ve written this program which reads words in from a file:
import sys
for l in open(sys.argv[1]):
    # len(l) includes the trailing newline, if the line has one
    if len(l) > 25:
        print("Word too long\n")
When I run this on my machine with /usr/share/dict/words it prints a warning:
$ python3 t.py /usr/share/dict/words
Word too long
For the sake of the example, let’s pretend that seeing that warning is an “error” and we haven’t been able to work out why it crashes using traditional debugging techniques. Can a test-case reducer help us? The first thing we need to do is define our interestingness test. I’ll deliberately follow the conventions of the test-case reducers I’m familiar with and say that an interestingness test is a program which:
- Returns 0 if the input is “interesting” – that is, the input manifests the error we are interested in – and we should use this reduced input going forward.
- Returns non-0 if the input is “uninteresting” and we need to try a different reduction.
I’m going to write a simple shell script that takes a filename in as the first argument, runs my program above, and checks if it produces “Word too long” as its output. If it does, my interestingness test will return 0; if not it will return 1.
#! /bin/sh
if python3 t.py "$1" | grep "Word too long" > /dev/null; then
  exit 0
else
  exit 1
fi
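If the check ever needs more logic than grep can comfortably express, the same test could be sketched in Python instead. This is only an illustration: the function names `is_interesting` and `exit_code` are hypothetical, and the path to the program under test is passed in as a parameter rather than hard-coded to t.py.

```python
import subprocess, sys

def is_interesting(program, path):
    """Return True if running `program` on `path` prints the warning we care about."""
    out = subprocess.run([sys.executable, program, path],
                         capture_output=True, text=True).stdout
    return "Word too long" in out

def exit_code(program, path):
    # Same convention as the shell script: 0 = interesting, 1 = uninteresting.
    return 0 if is_interesting(program, path) else 1
```

A wrapper script would then only need to call something like `sys.exit(exit_code("t.py", sys.argv[1]))` to behave like the shell version.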
Now we need to do the hard part: the reducer! Fortunately it’s not too difficult:
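#! /usr/bin/env python3
# Arguments: <interestingness test> <input file>
import subprocess, sys, tempfile
# Load the input and split it into lines, dropping trailing whitespace
cur = [x.rstrip() for x in list(open(sys.argv[2]))]
i = 0
while i < len(cur):
    with tempfile.NamedTemporaryFile(mode="w") as p:
        # Candidate: the current input with line i removed
        cnd = cur[:]
        del cnd[i]
        p.write("\n".join(cnd))
        p.flush()
        # Exit code 0 from the interestingness test means "still interesting"
        if subprocess.run([sys.argv[1], p.name]).returncode == 0:
            cur = cnd
        else:
            i += 1
print("\n".join(cur))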
In essence, this first loads in the text input and splits it into lines (cur). Then it loops over the input. Each iteration creates a candidate input (cnd), with one line of input removed (del cnd[i]) relative to the previous starting point. We create a temporary file, write the candidate input to it, and run our interestingness test. If it returns 0, we keep the candidate as our new state.
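To watch the loop at work without any files or subprocesses, here is an in-memory sketch of the same idea. The names `reduce_lines` and `too_long` are hypothetical, and `too_long` is only a rough stand-in for the shell interestingness test: it treats a line as too long if it has more than 24 characters, on the assumption that each line would carry a trailing newline when t.py measures it.

```python
def reduce_lines(lines, is_interesting):
    """Greedy one-line-at-a-time reduction, mirroring the loop above."""
    cur = list(lines)
    i = 0
    while i < len(cur):
        cnd = cur[:i] + cur[i + 1:]  # candidate with line i removed
        if is_interesting(cnd):
            cur = cnd                # keep the smaller input, retry the same index
        else:
            i += 1                   # line i is needed, move on to the next one
    return cur

def too_long(lines):
    # Rough in-process stand-in for running t.py and grepping its output.
    return any(len(l) + 1 > 25 for l in lines)

words = ["apple", "antidisestablishmentarianism", "banana", "cherry"]
print(reduce_lines(words, too_long))
# prints ['antidisestablishmentarianism']
```

Every line whose removal leaves the input “interesting” is thrown away, so only the one word that triggers the warning survives.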