ForAllSecure Blog

Back to the Fuzz: Fuzzing for Command Injections

Adam Van Prooyen
·
March 02, 2021

Some readers may remember NCSA HTTPd, an early web server out of the University of Illinois at Urbana–Champaign (I don’t, but that's probably because I was still wearing a diaper in its heyday). Even so, NCSA HTTPd has affected just about every one of our lives: the Apache project took over its codebase, which became the Apache web server.

Today we will be looking at reproducing an early CVE in the NCSA HTTPd web server. This is interesting for a couple of reasons:

  • The vulnerable program is a CGI binary — meaning it takes input through environment variables
  • The bug is a command injection — not a classically “fuzzable” class of bug
  • It’s a 90’s web server so we get to feel like old-school hackers

Command injection is a class of software bugs that does not involve memory corruption or any other way of hijacking the vulnerable program itself. Instead, it exploits flaws in how a program uses system or exec-style calls (think command line) so that an attacker can run arbitrary commands on the host. It closely resembles SQL injection: both are caused by improper escaping (failing to neutralize special characters), and both are difficult to find with many traditional software testing methodologies.
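To make this concrete, here is a minimal Python sketch (not code from phf) of the underlying mistake: untrusted input is concatenated into a string that a shell then interprets. run_greeting is a hypothetical helper, and the example assumes a POSIX /bin/sh.

```python
import shlex
import subprocess

def run_greeting(name, quote=False):
    """Hypothetical helper: splice untrusted input into a shell command."""
    # With quote=True, shlex.quote wraps the input so the shell sees one literal word.
    arg = shlex.quote(name) if quote else name
    cmd = "echo hello " + arg  # naive string concatenation
    # shell=True hands the whole string to /bin/sh -c, much like system() or popen() in C.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout
```

Calling run_greeting("world; echo INJECTED") prints two lines, hello world followed by INJECTED, because the shell treats ; as a command separator. With quote=True, the same input is echoed back as a single literal argument.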

Over the course of this blog post we will discuss:

  • how to fuzz environment variables
  • how to fuzz for command injection
  • a breakdown of the bug we found

Fuzzing environment variables

The binary we will be looking at within the NCSA HTTPd web server is called phf. Because phf is a CGI binary, it takes input from environment variables, and even from STDIN when handling a POST request. This poses a challenge, since most fuzzers do not support these input methods out of the box.

The fuzzers you are most likely used to seeing, such as AFL or libFuzzer, take input through files or through a dedicated entry-point function (libFuzzer's LLVMFuzzerTestOneInput, for example). We instead want to fuzz environment variables. To do this, we will use LD_PRELOAD to modify the behavior of functions such as getenv, rerouting the fuzz data from our input file into these environment variables.

If you aren’t familiar with LD_PRELOAD, the concept is simple: the dynamic loader allows us to specify a shared library (.so) to fulfill undefined symbols in a binary before the normal shared libraries do. Put another way, LD_PRELOAD allows us to override or hook shared library functions to add custom behavior. Check out my previous blog post, Firmware Fuzzing 101, for a more detailed overview.

For environment variable fuzzing, our LD_PRELOAD harness will need to do two things:

  • Hook program startup (__libc_start_main, which calls main) and load the fuzzed environment variable data from the fuzz file
  • Hook getenv to return the fuzzed environment variable data

Below is pseudo-code for how we can accomplish this:

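fuzzed_envp: dict = {}

def __libc_start_main(main, argc, argv, envp):
    # The fuzz file is the last argument (the @@ in the Mayhemfile).
    with open(argv[-1], "rb") as fuzzed:
        for env in envp:
            key, _, value = env.partition("=")  # entries look like "KEY=VALUE"
            if value == "fuzzme":
                # Replace the marker with the next ENV_SIZE bytes of fuzz data;
                # latin-1 maps each byte to exactly one character.
                value = fuzzed.read(ENV_SIZE).decode("latin-1")
            fuzzed_envp[key] = value
    return main(argc, argv, envp)

def getenv(key):
    # Unset variables return None, mirroring C's NULL.
    return fuzzed_envp.get(key)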

With this, we can use the environment variables themselves to tell the fuzz harness which ones should be fuzzed: any variable set to fuzzme is filled with fuzz data, and the rest are left alone. To show how this works, the Mayhemfile for this harness is included below:

version: '1.9'
project: ncsa-httpd
target: phf
baseimage: $MAYHEM_DOCKER_REGISTRY/phf
advanced_triage: true
cmds:
 - cmd: /build/cgi-bin/phf @@
   env:
     LD_PRELOAD: /build/envfuzz.so
     SERVER_NAME: example.com
     SERVER_PORT: '80'
     SCRIPT_NAME: /phf
     QUERY_STRING: fuzzme
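Outside Mayhem, a saved test case can be replayed by running the same command with the same environment. The sketch below is a hypothetical helper, not part of Mayhem: it copies the values from the Mayhemfile and assumes it runs inside the phf image, where /build/cgi-bin/phf and /build/envfuzz.so exist. The test case path takes the place of @@, which the harness reads as the last command-line argument.

```python
import subprocess

# Values copied from the Mayhemfile; the paths exist only inside the phf image.
TARGET = "/build/cgi-bin/phf"
HARNESS_ENV = {
    "LD_PRELOAD": "/build/envfuzz.so",
    "SERVER_NAME": "example.com",
    "SERVER_PORT": "80",
    "SCRIPT_NAME": "/phf",
    "QUERY_STRING": "fuzzme",  # marker the harness replaces with fuzz data
}

def build_env(base=None):
    """Layer the harness variables on top of an explicit base environment."""
    # Starting from an explicit base keeps stray variables in the caller's
    # environment (e.g. REQUEST_METHOD) from changing phf's behavior.
    env = dict(base or {})
    env.update(HARNESS_ENV)
    return env

def replay(testcase_path):
    """Run phf on one saved test case; testcase_path stands in for @@."""
    return subprocess.run([TARGET, testcase_path], env=build_env(),
                          capture_output=True)
```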

Let’s run it in Mayhem…

…and a bug! Unfortunately, this bug is an unexploitable uninitialized variable issue.

This harness, no matter how long we fuzz, will never find the command injection bug! Let’s tackle this in the next section.

Fuzzing for command injection

Why can’t we find the command injection? Mayhem, like most fuzzers, detects bugs by watching for memory corruption and other behavior that ends in a fault or signal, such as a segfault or an abort. Since a command injection produces perfectly “valid” behavior from the program's point of view, Mayhem has no way of detecting it.

Finding command injection bugs requires another trick, but it also turns out to be simple to implement. We will do two things:

  • predispose the fuzzer to add commands that create “sentinel” files
  • check for the sentinel files' presence after calls to system or popen

To predispose the fuzzer to inject commands, we can use a dictionary containing the command string touch /getfuzzed and variants such as ;touch /getfuzzed, \ntouch /getfuzzed, `touch /getfuzzed`, etc. This makes the command show up more often in the fuzzed input, and if an injection does occur, we know exactly which sentinel file to look for.
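A dictionary like this could be generated with a short script. The sketch below assumes the AFL-style dictionary format (one name="value" entry per line, with \", \\ and \xNN escapes); check which formats your fuzzer accepts. make_dict and afl_escape are hypothetical helpers, and the |, && and $(...) variants are illustrative additions beyond the ones the post lists.

```python
PAYLOAD = "touch /getfuzzed"

# ";", newline and backticks are the variants named in the post;
# "|", "&&" and "$(...)" are illustrative extras.
VARIANTS = [
    PAYLOAD,
    ";" + PAYLOAD,
    "\n" + PAYLOAD,
    "`" + PAYLOAD + "`",
    "|" + PAYLOAD,
    "&&" + PAYLOAD,
    "$(" + PAYLOAD + ")",
]

def afl_escape(s):
    """Escape an ASCII token for an AFL-style dictionary line."""
    out = []
    for ch in s:
        if ch in '\\"':
            out.append("\\" + ch)  # backslash and quote get a backslash prefix
        elif " " <= ch <= "~":
            out.append(ch)  # printable ASCII passes through unchanged
        else:
            out.append("\\x{:02x}".format(ord(ch)))  # e.g. newline -> \x0a
    return "".join(out)

def make_dict(variants=VARIANTS):
    """Return dictionary text with one name="value" entry per variant."""
    lines = ['inj{}="{}"'.format(i, afl_escape(v)) for i, v in enumerate(variants)]
    return "\n".join(lines) + "\n"
```

Writing make_dict() to a file such as injection.dict produces a dictionary that a Mayhemfile entry could point at.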

To check for the injection, we will use the same technique as before with LD_PRELOAD harnessing. Below is pseudo-code for how we can accomplish this:

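import os
import sys

SENTINEL = "/getfuzzed"
# Stand-ins for the real libc functions; a C LD_PRELOAD hook would
# typically look these up with dlsym(RTLD_NEXT, "popen") / "system".
popen_orig = os.popen
system_orig = os.system

def pre():
    # Clear any sentinel left over from an earlier command.
    try:
        os.remove(SENTINEL)
    except FileNotFoundError:
        pass

def post(cmd):
    if os.path.exists(SENTINEL):
        print("injection from {}".format(cmd), file=sys.stderr)
        os.abort()  # raise SIGABRT so the fuzzer records a crash

def popen(cmd, mode="r"):
    pre()
    ret = popen_orig(cmd, mode)
    post(cmd)  # may run before the child finishes; see below
    return ret

def system(cmd):
    pre()
    ret = system_orig(cmd)  # blocks until the shell exits
    post(cmd)
    return ret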

In practice, the popen hook is harder to get right, because popen returns a file pointer (FILE *) connected to the subprocess’s input or output, which means the subprocess usually has not finished executing by the time the popen hook returns. The real harness therefore needs to keep a mapping from each FILE * to its command string and check for the sentinel file after calls such as fread, fwrite, and pclose.

Now that we can detect command injection, let's update the Mayhemfile. The only two changes are the additional library in LD_PRELOAD and the new dictionary entry.

version: '1.9'
project: ncsa-httpd
target: phf
baseimage: $MAYHEM_DOCKER_REGISTRY/phf
advanced_triage: true
cmds:
 - cmd: /build/cgi-bin/phf @@
   env:
     LD_PRELOAD: /build/envfuzz.so /build/injectiondetect.so
     SERVER_NAME: example.com
     SERVER_PORT: '80'
     SCRIPT_NAME: /phf
     QUERY_STRING: fuzzme
   dictionary: /build/injection.dict

After rerunning in Mayhem, we find a new crash!

This time, instead of an uninitialized variable bug, we have “Improper Input Validation”, the category Mayhem uses for a generic crash or abort.

However, the stack trace for this defect clearly shows that this is the command injection we were looking for.

Bug breakdown

This bug turns out to be about as classic as you can get. If we look at the stack trace from Mayhem, we can see that the popen call happens at phf.c:202. In phf.c, we have:

strcpy(commandstr, "/usr/local/bin/ph -m ");
if (strlen(serverstr)) {
    strcat(commandstr, " -s ");
    escape_shell_cmd(serverstr);
    strcat(commandstr, serverstr);
    strcat(commandstr, " ");
}
...
printf("%s%c", commandstr, LF);
printf("%c", LF);
phfp = popen(commandstr,"r");

The program takes user input, shell-escapes the server string, and appends it to the command string. If escape_shell_cmd (util.c:137) is flawed, we have a command injection. And sure enough, it is:

void escape_shell_cmd(char *cmd) {
    register int x,y,l;
 
    l=strlen(cmd);
    for(x=0;cmd[x];x++) {
        if(ind("&;`'\"|*?~<>^()[]{}$\\",cmd[x]) != -1){
            for(y=l+1;y>x;y--)
                cmd[y] = cmd[y-1];
            l++; /* length has been increased */
            cmd[x] = '\\';
            x++; /* skip the character */
        }
    }
}

If the command contains any of the listed special characters, each one is escaped by prepending a backslash. Unfortunately, the list omits the newline character. Since popen runs the command through /bin/sh, which treats a newline as a command separator, any text following a newline in the server string is executed as a separate command.
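To see the flaw in isolation, here is a Python port of the escaping logic and of the command construction in phf.c. It is a sketch for experimentation: it returns a new string instead of shifting the C buffer in place, build_commandstr is a hypothetical name, and the other options phf appends are omitted. The special parameter lets us try adding a newline to the escaped set as one possible fix.

```python
# Character set copied from escape_shell_cmd in util.c.
SPECIAL = "&;`'\"|*?~<>^()[]{}$\\"

def escape_shell_cmd(cmd, special=SPECIAL):
    """Prepend a backslash to each special character (returns a new string)."""
    out = []
    for ch in cmd:
        if ch in special:
            out.append("\\")
        out.append(ch)
    return "".join(out)

def build_commandstr(serverstr, special=SPECIAL):
    """Mirror the strcpy/strcat sequence in phf.c for the -s option only."""
    cmd = "/usr/local/bin/ph -m "
    if serverstr:
        cmd += " -s " + escape_shell_cmd(serverstr, special) + " "
    return cmd
```

With the original character set, "x;id" becomes x\;id, but "x\nid" passes through unchanged, so the shell receives id as a second command. Adding "\n" to the set turns the newline into a backslash-newline, which /bin/sh treats as a line continuation rather than a command separator; quoting the whole argument, as Python's shlex.quote does, is a more robust alternative.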

Conclusion

In this article, we have shown that with some ingenuity, you can fuzz targets and find classes of bugs that fuzzers traditionally can't.

With more effort, we can improve this method as well. One direction for improvement could be to add a shell parser into the command injection detection code to allow the coverage-guided fuzzing and symbolic execution to solve for injection, rather than requiring some amount of luck to line up the injected command and the escape logic. This would also allow us to remove the variations of touch /getfuzzed from the dictionary (although they may still speed it up in some cases).

Another direction would be SQL injection. Instead of injecting a command and checking for a specific file, we can again use the dictionary to bias the fuzzer toward injecting SLEEP statements into the input, and therefore into SQL queries. Combined with timers, this could detect slow queries (bad) and fully fledged SQL injections (worse!). The same principle of adding a SQL parser into the binary (if it isn't already there) would help improve coverage and discoverability for these injections.
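A rough sketch of the timing idea, using Python's built-in sqlite3 module rather than a real target: SQLite has no SLEEP function, so one is registered with create_function, and the 0.1-second threshold and the make_db, timed_query and lookup helpers are hypothetical. The vulnerable query is built by string concatenation, the SQL counterpart of phf's command string.

```python
import sqlite3
import time

SLOW_THRESHOLD = 0.1  # seconds; hypothetical cutoff for a suspiciously slow query

def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite has no SLEEP(); register one so injected payloads can call it.
    conn.create_function("SLEEP", 1, lambda secs: time.sleep(secs) or 0)
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")
    return conn

def timed_query(conn, sql):
    """Run sql and flag it if it takes longer than SLOW_THRESHOLD."""
    start = time.monotonic()
    rows = conn.execute(sql).fetchall()
    elapsed = time.monotonic() - start
    if elapsed > SLOW_THRESHOLD:
        # A fuzzing harness would abort() here instead of raising.
        raise RuntimeError("slow query ({:.2f}s): {}".format(elapsed, sql))
    return rows

def lookup(conn, name):
    # Vulnerable on purpose: untrusted input concatenated into SQL.
    return timed_query(conn, "SELECT name FROM users WHERE name = '" + name + "'")
```

A payload such as x' OR SLEEP(0.2) = 0 -- forces the registered SLEEP to run for each row, so the timer flags it, while an ordinary lookup of alice returns immediately.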

You can find the code for this on our Vulnerabilities Lab GitHub, along with the Mayhem configuration to run it yourself. If you want to learn more about LD_PRELOAD and hacking on binaries that are traditionally difficult to fuzz, check out the Firmware Fuzzing 101 blog post as well.

Happy Hacking!
