What is Property-Based Fuzzing?
This is part two of a three part series on Property-based Fuzz Testing. Part one can be found here.
Fuzzing is the automated process of finding software bugs by feeding random data into a target program until one of those permutations reveals a flaw. Fuzzing has been responsible for discovering a large number of security-critical issues found in operating systems, browsers, and other security-sensitive applications.
A fuzzer needs two things to analyze a program:
- One or multiple ways to feed random data into the application under test
- A way to detect issues
Commonly, a fuzzer operates on binary programs that read from a file or from stdin. They detect UNIX signals that are triggered on assertion failure or a segmentation fault. AFL is an example of a fuzzer that has been successful at findings a large number of issues.
Property-based testing is a form of fuzzing. Property-based testing feeds random data into an application (or function) and detects flaws. It is particularly powerful as it allows developers to define and check custom correctness and safety policies, i.e. properties they define in their test.
The most important fuzzing advancement has been coverage guidance. A coverage-guided fuzzer gathers coverage information for each random input it tries. If a random input exercises new code, the fuzzer will keep it in a set of interesting inputs. The fuzzer will then generate new inputs by mutating those interesting inputs.
Coverage-guided fuzzing is simple, but extremely powerful. Imagine having a program that looks like:
if(input == 'B')
if(input == 'U')
if(intput == 'G')
A regular fuzzer has about one chance in 17 million to generate an input that starts with 'BUG'. This would take a long time! You can also see how the odds would decrease exponentially with each additional comparison.
However, a coverage-guided coverage has the capability to detect incremental progress, which turns an exponential problem into a linear one. Instead of having to generate the right input all at once, it can generate it one character at a time. Indeed, it has a 1 in 256 chance of generating an input that starts with 'B'. Since that input will exercise new code, the coverage-guided fuzzer will keep it and start mutating it. It has then a roughly 1 in 256 chance to mutate it to an input that starts with 'BU'. We say roughly since that probably depends on the mutation strategy: in addition to generate a 'U' in the second character, it should not modify the first character. Quickly enough, it will find the input triggering the bug.
Coverage-guided fuzzing scales to complex programs as well. For instance, AFL, a coverage guided fuzzer, has been used to generate JPEG images out of thin air:
JPEGs automatically generated by coverage-guided fuzzing
By fuzzing a JPEG parser with a coverage-guided fuzzer, the fuzzer eventually generates valid JPEG images! The odds of that happening without coverage guidance is infinitesimally small.
Combining Fuzzing & Property-based Testing
Property-based testing already has some of the great techniques used by modern fuzzers:
- In-memory fuzzing. Most fuzzers start a new process per fuzzing input. The cost of starting a new process can sometimes dominate runtime. To avoid this unnecessary cost, some fuzzers like libfuzzer allow users to define a function that will be repeatedly called in the same process. Property-based testing does that, which allows them to analyze thousands of inputs a second.
- Structure-aware fuzzing. Stucture-aware fuzzers understand the input format of the program under test. For instance, when fuzzing a JPEG parser, structure-aware might skew the distribution towards valid JPEG images, which leads to better coverage of your code. Property-based testing gives you structure-aware fuzzing with input generators. Fuzzing targets can use a similar idea to generate higher-level types from a sequence of types.
- Stateful testing. While not a part of fuzzers per-say, this is a technique used by the some of the best vulnerability researchers we know to test set stateful APIs. They use the random data from the input data generated by the fuzzer to select which method to call next. This is doing the same thing as property-based stateful testing.
However, most property-based testing frameworks are not coverage-guided. As we have seen in the previous section, coverage guidance helps automated tools build complex inputs exploring more parts of the program than they could before. Property-based fuzzing fills that gap and brings advanced fuzzing techniques to property-based testing to get even deeper coverage in less time.
In part three, I'll talk about useful properties to check.