Back to the Fuzz: Fuzzing for Command Injections

Adam Van Prooyen
March 2, 2021

Some readers may remember NCSA HTTPd, an early web server out of the University of Illinois at Urbana–Champaign (I don’t, but that's probably because I was still wearing a diaper in its heyday). Even so, NCSA HTTPd has touched every one of our lives: the Apache project took over its codebase, which became the Apache web server.

Today we will be looking at reproducing an early CVE in the NCSA HTTPd web server. This is interesting for a couple of reasons:

  • The vulnerable program is a CGI binary — meaning it takes input through environment variables
  • The bug is a command injection vulnerability — not a classically “fuzzable” class of bug
  • It’s a 90’s web server so we get to feel like old-school hackers

A command injection attack comes from a class of software bugs that doesn't involve memory corruption or any other means of taking over the vulnerable program. Instead, it exploits flaws in the program's use of system or exec calls (think command line) to run an arbitrary command on the host. This class of bug is very similar to SQL injection: both are caused by improper escaping (filtering of special characters), and both are difficult to find with many traditional software testing methodologies.
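To make the escaping problem concrete, here is a minimal Python sketch (not code from the target). The ph -m -s command line mirrors the one phf builds later in this post; the helper names build_naive and build_quoted and the sample payload are made up for illustration. The sketch only builds and tokenizes strings, it never runs them.

```python
import shlex

def build_naive(server: str) -> str:
    # Concatenates untrusted input straight into a shell command line.
    return "ph -m -s " + server

def build_quoted(server: str) -> str:
    # shlex.quote wraps the value so a POSIX shell sees it as one argument.
    return "ph -m -s " + shlex.quote(server)

payload = "example.com; touch /getfuzzed"

# Unquoted, the payload spills into several shell words. A real shell would
# additionally treat ';' as a command separator and run touch on its own.
naive_words = shlex.split(build_naive(payload))

# Quoted, the whole payload stays a single argument to -s.
quoted_words = shlex.split(build_quoted(payload))

print(naive_words)
print(quoted_words)
```

The difference between the two word lists is the whole bug class in miniature: the data either stays data or becomes part of the command.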

Over the course of this blog post we will discuss:

  • how to fuzz environment variables
  • how to fuzz for a command injection vulnerability
  • a breakdown of the bug we found

Fuzzing environment variables

The binary we will be looking at within the NCSA HTTPd web server is called phf. phf is a CGI binary and thus takes input from environment variables, and even from STDIN when handling a POST request. This poses a challenge for us, as most fuzzers do not support these input methods out of the box.

The fuzzers you are most likely used to seeing, such as AFL or libFuzzer, take input via files or through special functions. We, however, want to fuzz environment variables. To do this we will use LD_PRELOAD to modify the behavior of functions such as getenv and reroute the fuzz data from our file input into these environment variables.

If you aren’t familiar with LD_PRELOAD, the concept is simple: the dynamic loader allows us to specify a shared library (.so) to fulfill undefined symbols in a binary before the normal shared libraries do. Put another way, LD_PRELOAD allows us to override or hook shared library functions to add custom behavior. Check out my previous blog post, Firmware Fuzzing 101, for a more detailed overview.

For environment variable fuzzing, our LD_PRELOAD harness will need to do two things:

  • Hook main and load in fuzzed environment variable data from the fuzz file
  • Hook getenv to return the fuzzed environment variable data

Below is pseudo-code for how we can accomplish this:

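fuzzed_envp: dict = {}

def __libc_start_main(main, argc, argv, envp):
    global fuzzed_envp
    # the fuzz file is passed as the last command-line argument
    fuzzed = open(argv[-1])
    for env in envp:
        # variables set to "fuzzme" get fuzz data; the rest stay fixed
        if env.data == "fuzzme":
            env.data = fuzzed.read(ENV_SIZE)
    fuzzed_envp = envp
    return main(argc, argv, envp)

def getenv(key):
    # serve lookups from the fuzzed copy instead of the real environment
    return fuzzed_envp[key]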

With this, we can use the environment variables to signal to our fuzz harness which ones should be fuzzed and which ones shouldn’t. To show how this works, the Mayhemfile for this harness has been included below:

version: '1.9'
project: ncsa-httpd
target: phf
baseimage: $MAYHEM_DOCKER_REGISTRY/phf
advanced_triage: true
cmds:
  - cmd: /build/cgi-bin/phf @@
    env:
      LD_PRELOAD: /build/envfuzz.so
      SERVER_NAME: example.com
      SERVER_PORT: '80'
      SCRIPT_NAME: /phf
      QUERY_STRING: fuzzme
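
The "fuzzme" marker convention can also be sketched outside of LD_PRELOAD. The Python below is not the harness used in this post; it is a launcher-style analog that builds the environment a target would see, assuming a hypothetical ENV_SIZE of 64 bytes per fuzzed variable and a made-up helper name, build_fuzzed_env.

```python
import os

ENV_SIZE = 64      # hypothetical chunk size per fuzzed variable
MARKER = "fuzzme"  # value that flags a variable for fuzzing

def build_fuzzed_env(base_env: dict[str, str], data: bytes,
                     env_size: int = ENV_SIZE) -> dict[bytes, bytes]:
    """Replace every value equal to MARKER with the next chunk of fuzz data."""
    fuzzed: dict[bytes, bytes] = {}
    offset = 0
    for key, value in base_env.items():
        if value == MARKER:
            chunk = data[offset:offset + env_size]
            offset += env_size
            # Environment strings are NUL-terminated, so cut at the first NUL.
            value_bytes = chunk.split(b"\0", 1)[0]
        else:
            value_bytes = os.fsencode(value)
        fuzzed[os.fsencode(key)] = value_bytes
    return fuzzed

base = {
    "SERVER_NAME": "example.com",
    "SERVER_PORT": "80",
    "SCRIPT_NAME": "/phf",
    "QUERY_STRING": "fuzzme",
}
fuzzed_env = build_fuzzed_env(base, b"Qname=a\nid\x00ignored")
print(fuzzed_env[b"QUERY_STRING"])
```

On POSIX systems the resulting dictionary could be handed to subprocess.run(..., env=fuzzed_env) to launch a target. The LD_PRELOAD approach avoids that extra process launch and works with fuzzers that only know how to supply a file.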


Let’s run it in Mayhem…

…and a bug! Unfortunately, this bug is an unexploitable uninitialized variable issue.

This harness, no matter how long we fuzz, will never find the command injection bug! Let’s tackle this in the next section.

Fuzzing for command injection vulnerabilities

Why can’t we find the command injection? Mayhem, like most fuzzers, is looking for memory corruption or other things that lead to faulting/signaling behavior. Since command injection results in perfectly “valid” behavior, Mayhem doesn’t have any way of detecting it.

Finding command injection bugs takes another trick, but it also turns out to be pretty simple to implement. We will do two things:

  • predispose the fuzzer to add commands that create “sentinel” files
  • check for their presence after system or popen commands

To predispose the fuzzer to inject commands, we can use a dictionary with the command string touch /getfuzzed and variants such as ;touch /getfuzzed, \ntouch /getfuzzed, `touch /getfuzzed`, etc. This will make the command show up more in the fuzzed file and if an injection is found, we will know where to look for evidence.
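
As a rough illustration, the dictionary entries could be generated with a few lines of Python. This sketch assumes the AFL-style dictionary syntax (one double-quoted token per line, with \xNN escapes for bytes that cannot appear literally); whether the dictionary used in this post follows exactly that syntax is an assumption. The helper names are made up, and only the four variants named above are included.

```python
SENTINEL_CMD = "touch /getfuzzed"

def afl_escape(token: str) -> str:
    """Escape a token for an AFL-style dictionary line."""
    out = []
    for byte in token.encode("utf-8"):
        ch = chr(byte)
        # Quotes, backslashes and non-printable bytes are written as \xNN.
        if ch in '"\\' or byte < 0x20 or byte > 0x7E:
            out.append(f"\\x{byte:02x}")
        else:
            out.append(ch)
    return "".join(out)

def injection_dictionary(cmd: str = SENTINEL_CMD) -> list[str]:
    """Return dictionary lines for the bare command and its prefixed variants."""
    variants = [cmd, ";" + cmd, "\n" + cmd, "`" + cmd + "`"]
    return ['"' + afl_escape(v) + '"' for v in variants]

for line in injection_dictionary():
    print(line)
```

Writing these lines to a file such as injection.dict gives the fuzzer ready-made tokens to splice into its inputs.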

To check for the injection, we will use the same technique as before with LD_PRELOAD harnessing. Below is pseudo-code for how we can accomplish this:

def pre():
    rm("/getfuzzed")

def post(cmd):
    if file_exists("/getfuzzed"):
        print("injection from {}".format(cmd))
        abort()

def popen(cmd, rw):
    pre()
    ret = popen_orig(cmd, rw)
    post(cmd)
    return ret

def system(cmd):
    pre()
    ret = system_orig(cmd)
    post(cmd)
    return ret

In actuality, the popen hook is somewhat more complicated to get correct, as it returns a file pointer (FILE *) with access to the subprocess’s input/output – meaning that the subprocess has usually not finished executing by the end of the popen hook. Therefore, in the real harness, we would need to keep a file pointer to cmd string mapping and check for the file's presence after calls to fread, fwrite, pclose, etc.
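
The timing issue can be shown with a small Python analog. This is not the LD_PRELOAD harness: it uses subprocess.Popen in place of popen, a temporary sentinel path in place of /getfuzzed, and hypothetical names (WatchedPopen, InjectionDetected, demo). It assumes a POSIX shell. The point is that the sentinel check is deferred until the child has exited, the equivalent of checking in pclose.

```python
import os
import shlex
import subprocess
import tempfile

class InjectionDetected(RuntimeError):
    """Raised when the sentinel file appears after a command has run."""

class WatchedPopen:
    """Remember the command string and check for the sentinel at close time."""

    def __init__(self, cmd: str, sentinel: str):
        self.cmd = cmd
        self.sentinel = sentinel
        # pre(): remove a stale sentinel so it cannot cause a false positive.
        if os.path.exists(sentinel):
            os.remove(sentinel)
        # Popen returns immediately; the child may still be running.
        self.proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)

    def close(self) -> bytes:
        # communicate() waits for the child to exit, like pclose.
        output, _ = self.proc.communicate()
        # post(): only now is it safe to look for the sentinel.
        if os.path.exists(self.sentinel):
            raise InjectionDetected(f"injection from {self.cmd!r}")
        return output

def demo() -> bool:
    """Return True if a newline-injected command is caught."""
    with tempfile.TemporaryDirectory() as tmp:
        sentinel = os.path.join(tmp, "getfuzzed")
        # ': > file' creates the file using only shell builtins.
        cmd = "echo hello\n: > " + shlex.quote(sentinel)
        try:
            WatchedPopen(cmd, sentinel).close()
        except InjectionDetected:
            return True
        return False
```

Checking right after the constructor instead of in close would race with the child process, which is the same problem the real popen hook runs into.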

Now that we have support for command injection, let's update the Mayhemfile. Note that the only two changes are to add the new LD_PRELOAD and the dictionary.

version: '1.9'
project: ncsa-httpd
target: phf
baseimage: $MAYHEM_DOCKER_REGISTRY/phf
advanced_triage: true
cmds:
  - cmd: /build/cgi-bin/phf @@
    env:
      LD_PRELOAD: /build/envfuzz.so /build/injectiondetect.so
      SERVER_NAME: example.com
      SERVER_PORT: '80'
      SCRIPT_NAME: /phf
      QUERY_STRING: fuzzme
    dictionary: /build/injection.dict

After rerunning in Mayhem, we find a new crash!

This time instead of an uninitialized variable bug, we have “Improper Input Validation” – code for a generic crash or abort.

But, by examining the stack trace of this defect, we can clearly see that this was the command injection that we were looking for.

Bug breakdown

This bug turns out to be about as classic as you can get. If we look at the stack trace from Mayhem, we can see that the popen call happens at phf.c:202. In phf.c, we have:

strcpy(commandstr, "/usr/local/bin/ph -m ");
if (strlen(serverstr)) {
    strcat(commandstr, " -s ");
    escape_shell_cmd(serverstr);
    strcat(commandstr, serverstr);
    strcat(commandstr, " ");
}
...
printf("%s%c", commandstr, LF);
printf("%c", LF);
phfp = popen(commandstr,"r");

The program takes user input, shell-escapes the server string, and appends it to the command string. If escape_shell_cmd (util.c:137) is flawed, we have command injection. And sure enough, it is:

void escape_shell_cmd(char *cmd) {
    register int x,y,l;

    l=strlen(cmd);
    for(x=0;cmd[x];x++) {
        if(ind("&;`'\"|*?~<>^()[]{}$\\",cmd[x]) != -1){
            for(y=l+1;y>x;y--)
                cmd[y] = cmd[y-1];
            l++; /* length has been increased */
            cmd[x] = '\\';
            x++; /* skip the character */
        }
    }
}


If the command contains any of the listed special characters, they are escaped by prepending a backslash. Unfortunately, the authors left the newline character out of that list, so the shell treats whatever follows a newline in the input as a separate command and attempts to execute it.
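
The gap is easy to see in a Python port of the same logic. This is a sketch for illustration, not the original code: the character list is copied from the C function above, while the specials parameter and the sample payload are additions.

```python
# Same character list as the C function, without the newline.
SPECIALS = "&;`'\"|*?~<>^()[]{}$\\"

def escape_shell_cmd(cmd: str, specials: str = SPECIALS) -> str:
    """Prefix every listed special character with a backslash."""
    return "".join("\\" + ch if ch in specials else ch for ch in cmd)

payload = "example.com\ntouch /getfuzzed"

# A semicolon is neutralized...
print(repr(escape_shell_cmd("example.com;touch /getfuzzed")))

# ...but a newline passes through untouched, so the payload is unchanged.
print(repr(escape_shell_cmd(payload)))

# One possible fix (an assumption, not taken from this post): add the
# newline to the list so it is backslash-escaped like the others.
print(repr(escape_shell_cmd(payload, SPECIALS + "\n")))
```

With the newline in the list, an unquoted backslash-newline pair is read by a POSIX shell as a line continuation rather than a command separator.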

Conclusion

In this article, we have shown that with some ingenuity you can easily fuzz targets and find bugs that you traditionally wouldn’t be able to. 

With more effort, we can improve this method as well. One direction for improvement could be to add a shell parser into the command injection detection code to allow the coverage-guided fuzzing and symbolic execution to solve for injection, rather than requiring some amount of luck to line up the injected command and the escape logic. This would also allow us to remove the variations of touch /getfuzzed from the dictionary (although they may still speed it up in some cases).

Another direction we could take this in would be SQL injection. Instead of trying to inject a command and check for a specific file, we can use the dictionary again to bias the fuzzer toward injecting SLEEP statements into the input, and therefore into SQL queries. Combined with timers, we could use this to detect slow queries (bad) and fully-fledged SQL injections (worse!). The same principle of adding the SQL parser into the binary (if it isn't already) would help improve coverage and discoverability for these injections.

You can find the code for this on our Vulnerabilities Lab GitHub along with Mayhem configuration information to run it yourself. If you want to learn more about LD_PRELOAD and hacking on binaries that are traditionally difficult to fuzz, check out the Firmware Fuzzing 101 blog post as well.

Happy Hacking!
