Why Fuzzing Works
Have you heard of fuzzing before but wondered what all the fuss was about, or have you seen the “bug walls” of well-known fuzzers and wondered how they could possibly uncover so many problems, especially in heavily tested, well-audited codebases? Today I’ll talk about the fundamental reasons why fuzzing is so effective, and why it’s likely to remain a useful part of a secure software development lifecycle for quite a while.
If you haven’t heard of fuzzing, the briefest description of the idea is sending randomized inputs to a program and seeing what happens. This apparently goes back to the days of punch cards, where programmers would pull a series of cards from a trash to feed as input, and it continued into the days of Unix when someone first tried sending random characters to command line programs to see if they’d crash. Modern fuzzing practices are much more mature and involve a slew of practical advances that increase both the efficiency of testing and probability of detecting bugs, but the fundamental strength of fuzzing as testing is still centered around the raw speed and unbiased nature of generating many randomized inputs.
The Law of Fuzzing
This may sound like a strange method of testing, but it’s actually so effective that it should be a part of every serious software development lifecycle. In fact, I’d argue the effectiveness of fuzzing really isn’t up for debate anymore. I typically explain how effective fuzzing software is with what I call the “Law of Fuzzing”: if the software hasn’t been fuzzed, you’ll find bugs by fuzzing it. I’m actually not the first person to make this claim; in one of the first books on fuzzing from 2008 one of the authors said almost exactly the same thing: “I (Ari) have never seen a piece of software or a network device that passes all fuzz tests thrown at it.”
While it may sound like a bold claim, I’d argue that it’s no less axiomatic than “unit tests will help you find bugs.” In order to avoid sounding bombastic, I’ll add two caveats that I feel are very reasonable: 1) you must do a decent job of fuzzing in terms of actually getting reasonable input from the fuzzer to the primary input processing functionality at a decent rate and 2) we’ll only talk about applying the law of fuzzing to one kind of software, the kind that dominates the world: non-trivial programs written by humans.
Does it seem outlandish to make such a claim, or perhaps strange that I would equate fuzzing to unit tests? Let’s start by reviewing the fact that typical fuzzing is simply a kind of repeated “negative testing”. That is how I think of fuzzing, not something new or magic, but as just an automated form of a testing method that has been around forever. The tools and techniques have changed over time, but programmers have always been locked in a struggle with the software they write, trying to get it to do all the things it is supposed to and none of things it’s not. As we’ll see, fuzzing is a great tool in this fight because its advantages line up with some core problems at the heart of software development and testing.
Now that we’ve set the stage, join me on a journey to the fundamentals of the programming universe! There’s essentially two basic reasons why fuzzing is so useful in software development: combinatorics and complexity. As we’ll see these two topics are fundamental parts of the unchanging challenge of software engineering.
Combinatorics - Counting for advanced students
Combinatorics is an unusual word and unfamiliar to most people besides computer scientists and mathematicians, so let’s start by explaining it. Wolfram MathWorld defines combinatorics as “the branch of mathematics studying the enumeration, combination, and permutation of sets of elements” which put more plainly is figuring how many different ways you can pick and choose from a finite list of things.
The best illustration of this concept comes from my favorite burger place in college, which claimed that you could order one of several millions of different and unique burgers from them. How is this possible? Here’s some potential choices for our burger: from one to four beef patties, optional choice of seven kinds of cheese, four kinds of buns, and seven condiments/toppings with a choice of light, regular, extra, or no serving of each… if you think about it, the presence of each new choice multiplies the number of total possibilities, so for our definition of unique we have around 2,097,152 (4*(7+1)*4*4^7) different burgers, and we haven’t even talked about bacon. This gives a good illustration of how having only a dozen or so choices can mean millions of combinations, even without considering things like getting more than one kind of cheese or whether it makes a difference if the ketchup goes on before the pickles. But let’s bring things back to software before I get too hungry.
In software, we’re almost always dealing with options, which means we’re dealing with combinatorics. The options in software are often simple, but numerous, like how many different fonts can a user use, or how many images are on this web page and how are they positioned? Software of all kinds often has many more options than one might think, and we always expect that it handles all of the possible sets of options correctly. The only issue is that as the number of options go up, the number of different scenarios that can happen goes up too, and quickly.
When the number of situations gets high enough, developers typically stop testing all the combinations because it becomes too cumbersome. The traditional approach of testing what we expect to happen in each unique situation (unit testing) should then be augmented with something that can rapidly try combinations and monitor if something goes wrong (like fuzzing). This is particularly helpful when you have millions of combinations of options to test, especially when you throw in that modern fuzzers can intelligently explore different situations. So the fuzzer will be able to figure it out if your condiment software is strong but your swiss cheese handling is full of holes… yep, I couldn’t resist.
Complexity - Why some problems are hard to think about
Complexity as a software metric essentially reflects how hard it is to reason about all the possible things that could happen in a given piece of code. The number of possible ways a piece of code can execute is itself combinatorial: as it is designed to handle more options and inputs there is an explosion in the number of potential values and conditions that can arise. A more complicated task may require more complicated software, but code complexity is also affected by how the programmers choose to write the program.
Effective abstraction can keep complexity problems localized to different components, but there is also potential for complexity in interactions between components. There are numerical measures and best practices around these numbers, but suffice to say that once the complexity of a program gets high enough, the totality of all possibilities becomes impossible to fit in a person's brain. This makes predicting potential problems in the code more difficult.
Thinking about what could go wrong in code is one of the fundamental problems of programming and one of the reasons why writing bug-free code is a difficult endeavor. Programmers naturally focus on getting software to work, and generally we tend to focus on the situations we expect and have limitations when it comes to things we’re not thinking about (a classic example of unexpected behavior: “what happens if the user doesn’t enter any password at all?”). This is where fuzzing really shines, because it doesn’t require a developer to describe exactly what inputs to process, it tries many random variations and can report if something goes wrong. There’s no silver bullet when it comes to discovering all of the things that could happen in a piece of software, but the randomized nature of fuzzing makes it quite effective at discovering corner cases that we might not think to test.
So far we’ve covered that the fuzzing’s strengths of speed and unbiased nature help address some of the core problems of developing reliable software. While adopting any kind of fuzz testing is a step in the right direction, there’s also some logical implications from what we’ve discussed so far that get into the questions of “how much” or “how often.” If you’re interested in putting fuzzing to the test, I think there’s two additional corollaries to the Law of Fuzzing that are worth considering.
First, let’s say you are perfect today and your software has no bugs--that’s great! But tomorrow when you continue development, you might make a mistake, and this brings us to the first corollary of the law of fuzzing: new code hasn’t been fuzzed, therefore we should continuously fuzz our code (as part of continuous integration or other test automation). This idea has been borne out by Google’s OSS-Fuzz effort, where they reported a whopping 40% of the bugs found with fuzzing are regression bugs. If such a finding applies across that many codebases (250+ at the time of that talk), it probably also applies to code you care about.
The second corollary is that incomplete fuzzing leads to incomplete results, which is to say that if there is some part of your code you haven’t fuzzed, then the Law of Fuzzing applies to that remainder. That being said, there are certainly diminishing returns here, and it’s probably unrealistic to shoot for 100% code coverage in fuzzing. In this manner it’s just like unit testing--more is better to a point, but there’s no perfect percentage threshold for everyone. A common recommendation is to fuzz anything that takes untrusted input or performs parsing on input, because if an attacker could send anything to that part of the software, it makes sense to test throwing at least a few million different things at it yourself first.
To bring this to a close, let me just say this: fuzzing is neither new nor untested, and the reasons it works are pretty fundamental. If you knew nothing about fuzzing but just examined the results from the past decade on a variety of targets and languages, you would likely come to the same conclusion: that it should be a part of the secure software development lifecycle. I hope you’ve enjoyed my attempt at a straightforward explanation of why fuzzing works, and why it is unlikely to stop working, at least until we just get the computers to write all the code for us.