Generating a Tiny Test Suite with Greedy Set Cover Minimization

Aristotelis Koutsouridis
October 1, 2021

No one likes waiting for regression tests to run. Maintaining the smallest regression test suite that still covers all the code keeps your software pipeline as fast as possible. In addition, when running a fuzzing campaign to find bugs, a smaller initial test suite (also known as a "corpus" or "seeds") typically improves fuzzing efficiency.

In this article, we describe an algorithm that produces minimized test suites averaging 2.7x fewer files than libFuzzer's existing approach. We've contributed the code back to LLVM so everyone can get these speedups.

The Challenge of Minimization

Before diving into the technical details, let's consider the problem of test suite minimization. The goal is a test suite that is significantly smaller than the initial one, yet has the same coverage. We can think of each test case of a test suite as a file associated with a set of features. The term feature is used in libFuzzer to describe code coverage: edge coverage, edge counters, value profiles, and indirect caller/callee pairs can all be considered features. To simplify things, we can think of the feature set of a test case as the set of edges that are covered when the target program is run with that particular test case.

When choosing which test cases to include in the minimized test suite, it seems reasonable to prefer test cases with many features and hence large coverage. At the same time, other factors need to be taken into account, such as file size and execution time: the target program will probably spend more time on a large file than on a small one due to I/O.
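To make the algorithms below concrete, here is a minimal Python sketch of how a test case could be modeled. The Testcase class and its fields are illustrative assumptions for this post, not libFuzzer's actual data structures:

from dataclasses import dataclass

@dataclass(frozen=True)
class Testcase:
    name: str            # file name in the corpus
    size: int            # file size in bytes
    features: frozenset  # IDs of covered features, e.g. CFG edges

# Two hypothetical test cases: the second is larger but covers more features.
t1 = Testcase("seed-1", size=64, features=frozenset({1, 2, 3}))
t2 = Testcase("seed-2", size=128, features=frozenset({2, 3, 4, 5}))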

libFuzzer’s Test Suite Minimization

The algorithm behind test suite minimization in libFuzzer is very straightforward. A first pass over the initial test suite gathers the unique features of each file; when gathering features for a file, only features not yet encountered in previously processed files are added. The files, along with their now-disjoint feature sets, are then sorted first by file size and then by number of unique features. Finally, a second pass is performed over the sorted test suite: a test case is included in the minimized test suite only if it adds new features.

def minimize(initial_corpus):
    minimized_corpus = []
    all_features = set()
    # Smaller files first; among files of equal size, more unique features first.
    initial_corpus.sort(key=lambda f: (f.size, -len(f.features)))
    for file in initial_corpus:
        if not file.features <= all_features:  # the file adds at least one new feature
            minimized_corpus.append(file)
            all_features |= file.features
    return minimized_corpus

Due to the ordering mentioned earlier, smaller files are preferred for the minimized test suite and among files with the same size, the file with the largest number of unique features is selected. It is easy to see that the minimized test suite covers exactly the same features as the initial test suite.

Set Cover Minimization

The problem of test suite minimization can be reduced without much effort to a well-known problem in computer science: the set cover problem. The problem statement is the following:

given a set U (the universe) and a collection S of subsets of U, what is the smallest sub-collection of S whose union equals U?

We can reduce test suite minimization to this problem by taking the universe U to be the union of the features of all the files in the initial test suite. Each element of S is then the feature set of one test suite file; clearly their union equals U. With that in mind, the solution to the set cover problem is the smallest set of files that covers all the features!

For example, picture a diagram where each dot represents a feature (e.g. an edge of the control flow graph) and each colored region S corresponds to the execution path of a single test case file from the initial test suite. The set of all dots is the set of all features in the initial test suite. If regions S3, S4, and S6 together contain every dot, then {S3, S4, S6} is a candidate solution because it covers all the dots.
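As a concrete toy instance (with made-up feature IDs and region contents, standing in for the figure from the original post):

# Hypothetical feature sets for six test case files S1..S6.
U = set(range(1, 13))  # the universe: every feature in the initial corpus
S = {
    "S1": {1, 2},
    "S2": {2, 3, 4},
    "S3": {1, 2, 3, 4, 5},
    "S4": {6, 7, 8, 9},
    "S5": {8, 9},
    "S6": {5, 10, 11, 12},
}
assert set().union(*S.values()) == U     # the collection covers U
assert S["S3"] | S["S4"] | S["S6"] == U  # {S3, S4, S6} is a valid cover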

Unfortunately, the above problem is NP-complete, which means that no polynomial-time algorithm for it is known (and none exists unless P = NP). In our setting, the test suite can contain from hundreds to a few million test cases, each having hundreds of features, so exact solutions are out of reach. Therefore, our next best option is an approximation algorithm that finds a sub-optimal solution (the approximate minimized test suite may be larger than the optimal minimized test suite). Luckily, there is a very famous greedy approximation algorithm for set cover which guarantees that the approximate solution is at most roughly ln(n) times larger than the optimal solution, where n is the number of elements in the universe U. This approximation ratio seemed very promising compared to libFuzzer's algorithm, for which we have no guarantees on the quality of the solution.

The algorithm itself, although a bit more complex than the single-pass algorithm described earlier, is very intuitive. A set of covered features is maintained, initially empty. At each iteration we select the test case that covers the most features not yet covered; when two files tie on the number of previously uncovered features, the smaller one is picked. The algorithm terminates when all features are covered.

def set_cover_minimize(initial_corpus):
    all_features = set().union(*(f.features for f in initial_corpus))
    covered = set()
    minimized_corpus = []
    while covered != all_features:
        # Greedy choice: the file covering the most not-yet-covered features;
        # ties are broken in favor of the smaller file.
        file = max(initial_corpus,
                   key=lambda f: (len(f.features - covered), -f.size))
        covered |= file.features
        minimized_corpus.append(file)
        initial_corpus.remove(file)
    return minimized_corpus
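Running this on the toy instance above (reusing the hypothetical Testcase class and the sets S, with each file's size taken to be its feature count purely for illustration) recovers the expected cover:

corpus = [Testcase(name, size=len(fs), features=frozenset(fs))
          for name, fs in S.items()]
picked = set_cover_minimize(corpus)
print([t.name for t in picked])  # ['S3', 'S4', 'S6']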

An optimization that improves the run time of minimization is implementing the covered set as a bit vector instead of an std::set. Features are represented as simple 32-bit integers within libFuzzer; in fact, libFuzzer uses a 256KB (2^21-bit) bit vector to track edge coverage during fuzzing. The same idea is used for minimization in order to speed things up: a feature with value f is covered if covered[f % covered.size] is set to 1. But what about collisions?

Collisions are ignored. This means that the minimized test suite may not cover some features that the initial test suite did. For example, if the initial test suite contains the features 0 and 2^21 in two different files, then 2^21 may be falsely considered covered and the associated file will not get picked. In practice, though, this is very rare.
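Here is a minimal Python sketch of the idea; the 2^21 size matches libFuzzer's feature-tracking bit vector, but the helper names are ours:

NBITS = 1 << 21                  # 2^21 slots, i.e. a 256KB bit vector

covered = bytearray(NBITS // 8)  # all bits start at 0

def mark_covered(f: int) -> None:
    i = f % NBITS                # features f and f + NBITS collide; ignored
    covered[i // 8] |= 1 << (i % 8)

def is_covered(f: int) -> bool:
    i = f % NBITS
    return bool(covered[i // 8] & (1 << (i % 8)))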

Comparing Results

The set cover implementation of test suite minimization can be used with the -set_cover_merge=1 flag, which works like the existing -merge=1 flag that libFuzzer uses for minimizing and merging multiple test suites into one. We compared the two algorithms on several targets:

  • JQ: we’ve seen a harness for this target in a previous blog post. Here we get some really interesting results.
  • rapidjson: another JSON parsing library.
  • sqlite: this target is harnessed in the fuzzer-test-suite repository.

First, let's see how to obtain the minimized test suite with the standard libFuzzer minimization. You can also experiment with the Docker image of the testing environment, which contains the initial test suite and the libFuzzer target.

root@da1a2c981a4d:/src/harness# ls initial-corpus/ | wc -l
101633
root@dd166043164c:/src/harness# mkdir minimized-corpus
root@dd166043164c:/src/harness# ./fuzzer_jq -merge=1 minimized-corpus initial-corpus
...
MERGE-OUTER: the control file has 5419469 bytes
MERGE-OUTER: consumed 8Mb (80Mb rss) to parse the control file
MERGE-OUTER: 679 new files with 6683 new features added;
...
root@dd166043164c:/src/harness# find minimized-corpus -ls | awk '{sum += $7; n++;} END {print sum/n;}'
922.415
root@dd166043164c:/src/harness# du -sh minimized-corpus
3.0M minimized-corpus

Initially, the test suite has a little over 100,000 test cases (the ls | wc -l command). An empty output directory is created with mkdir, and the fuzzer is run with the -merge=1 flag in order to minimize initial-corpus into minimized-corpus. The lines starting with MERGE-OUTER are libFuzzer output. The find/awk pipeline computes the average file size (in bytes) in the minimized test suite, and du reports its total size. By repeating the same workflow with the -set_cover_merge=1 flag, we can compare the two minimization techniques.

Overall, we get a 2.7x improvement in minimized test suite size (number of files) over the existing libFuzzer approach.

  • On JQ, we get a 2.4x improvement (283 vs. 679 files).
  • On rapidjson, we get a 4.31x improvement (245 vs. 1056).
  • On sqlite, we get a 2.35x improvement (853 vs. 2001).

Focusing on the JQ target, the main result is that the minimized test suite produced by the greedy set cover approach contains significantly fewer files, roughly 60% fewer. The trade-off is execution time: minimization took 5min50s instead of 5min29s, a rather small increase that is well worth the dramatic decrease in test suite size. Another interesting result is the comparison of average file size in the minimized test suite. Because the standard libFuzzer minimization prefers smaller files, its average test case size is expected to be small; indeed, for greedy set cover the average size is 1.8 times larger. At first glance this seems worrying, because larger files mean more bytes for the target to read and therefore slower fuzzing. But looking at the total size of the minimized test suite, we see that it is approximately 50% smaller in bytes, since it contains far fewer files.

Looking at the rapidjson target, the difference in minimized test suite quality is even more pronounced: the test suite has a quarter as many files, and the total size is 3 times smaller. Importantly, the number of features in the minimized test suite remains the same across the two implementations. This shows that the bit-vector optimization doesn't have a negative impact on coverage for real-world targets and test suites.

We also see similar results for sqlite. This target was given a relatively small starting test suite of approximately 10,000 test cases. 

Takeaways

This effort to improve test suite minimization for fuzzing shows that approximation algorithms can be really useful for certain hard combinatorial problems. The quality of the solution is much better, and there is a theoretical upper bound on the size of the minimized test suite, unlike with a simple forward-pass algorithm. Furthermore, even though the greedy algorithm is not time-efficient in terms of complexity (it is not linear), in practice it performs quite well once simple optimizations (the bit vector) are applied to the underlying implementation.

It would be interesting to see more people use this kind of minimization to fuzz their target applications, which would highlight the effect of better minimization during fuzzing. Set cover merge is now part of LLVM's libFuzzer, after we contributed this work upstream.

Note: This blog post was updated 10/25/21 to emphasize the 2.7x improvements in this method. 
