Mayhem is an advanced fuzzer that combines the tried-and-true methods of guided fuzzing with the ingenuity of symbolic execution. Refer to this video for more information.

Mayhem does not rely solely on lists of known software weaknesses or vulnerabilities. It executes across a broad range of code to find new vulnerabilities, then tests against them to confirm that they pose a risk. In this respect, Mayhem is comparable to penetration testing, a popular form of manual testing. See our Mayhem Sanitizers for more.

Anyone can do fuzz test automation. The real question is who can do automation with precision. Mayhem combines fast fuzz testing with systematic symbolic execution. Together, they produce the most unique inputs in the least amount of time, efficiently canvassing your software application. This allows organizations to release safer software with less time, effort, and cost.

No, Mayhem does not require source code to test an application. Mayhem works with both source code and binaries. Mayhem finds vulnerabilities before and after software release.

  • We are a platform
  • We store all the test cases, can continue progress, perform regression tests
  • We have a structured UI to display results
  • We provide an automated framework to handle jobs, users, projects
  • We provide documentation, tutorials, easy to use CLI, docker ingest, scriptable API

C, C++, Go, Rust, and Jovial. Refer to this solution brief for more information.

Linux x86/64 compiled software. Other platforms may be compatible on a case-by-case basis. Refer to this solution brief for more information.

All defects found by Mayhem are tested three times and verified.

No. Mayhem accepts arbitrary black box binaries to perform analysis and does not modify the application at all.

Mayhem is priced based on the tier and number of cores you’d like in your cluster. For more information, reach out to a sales representative here.

Mayhem only saves the unique test cases that get new coverage. Multiple test cases can run into the same path, but there's no reason to save them all. Similarly, multiple test cases can trigger the same defect, but there’s no reason to save them all. To ensure regression test efficiency, we run deduplications and only save the most useful test cases.
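The idea of keeping only test cases that reach new coverage can be sketched in a few lines. This is an illustrative sketch, not Mayhem's implementation: `toy_target` is a hypothetical stand-in for an instrumented program that reports which branches an input reached, and the branch labels are made up for the example.

```python
from typing import FrozenSet, Iterable, List, Set


def toy_target(data: bytes) -> FrozenSet[str]:
    """Hypothetical target: returns labels of the branches this input reached."""
    covered = {"entry"}
    if data.startswith(b"A"):
        covered.add("starts_with_A")
        if len(data) > 3:
            covered.add("long_A")
    if b"\x00" in data:
        covered.add("has_nul")
    return frozenset(covered)


def keep_new_coverage(test_cases: Iterable[bytes]) -> List[bytes]:
    """Keep only the inputs that reach at least one branch not seen before."""
    seen: Set[str] = set()
    kept: List[bytes] = []
    for case in test_cases:
        coverage = toy_target(case)
        if not coverage <= seen:  # some branch in `coverage` is new
            kept.append(case)
            seen |= coverage
    return kept


candidates = [b"B", b"C", b"A", b"AB", b"AAAA", b"ABCD", b"A\x00"]
print(keep_new_coverage(candidates))  # [b'B', b'A', b'AAAA', b'A\x00']
```

Seven candidates shrink to four: `b"C"`, `b"AB"`, and `b"ABCD"` follow paths that earlier inputs already covered, so saving them would add nothing to a regression suite.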

Mayhem is offered both as an on-prem and cloud solution (private and public). And, for our Federal customers in the United States, we are TAA compliant.

Yes. We have designed Mayhem to integrate into your current CI/CD. Refer to this solution brief for more information.

Mayhem is a versatile solution that is able to conduct a multitude of testing techniques: unit testing, regression testing, negative testing, continuous testing, dynamic testing. Based on use cases that work for organizations, Mayhem can be incorporated as a part of the development phase of the SDLC, shifting DAST and fuzz testing further left. Mayhem is scriptable, integrating into most CI/CD tools such as Jenkins and Travis.

CWEs can lead to CVEs.  Mayhem can detect a number of CWE defects such as Improper Input Validation, Out-of-bounds Read, Incorrect Calculation of Buffer Size, Divide By Zero, Failure to Release Memory Before Removing Last Reference ('Memory Leak'), Use of Uninitialized Variable, NULL Pointer Dereference, Free of Memory not on the Heap, Release of Invalid Pointer or Reference, Out-of-bounds Write, and Improper Control of Dynamically-Managed Code Resources. Refer to this datasheet for more information.

An updated list can be found here.

Application Security Theory

In software, a bug refers to an error in code which causes it to behave unexpectedly (for example, to crash). The term is popularly traced to an actual bug, a moth, found in an early computer. Bugs are usually found either during unit testing by the software developer or during module testing by QA teams.

A defect is found when the application does not conform to the requirement specification. A defect can also be found when the client or user is testing or when an error is found and a fix is in development.

A known vulnerability refers to a bug that has been reported to the vendor and may have a workaround or a patch available. These vulnerabilities are assigned by MITRE an identifier known as a Common Vulnerabilities and Exposures (CVE) number. A CVE identifier includes a year and a sequence number, for example CVE-2020-0654.
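As a small illustration of the identifier format, the sketch below splits a CVE identifier into its year and sequence number. The function name `parse_cve_id` is made up for this example; the pattern assumes the `CVE-YYYY-NNNN` syntax, where the sequence part has four or more digits.

```python
import re
from typing import Tuple

# "CVE-", a four-digit year, "-", then a sequence number of four or more digits.
CVE_PATTERN = re.compile(r"CVE-(\d{4})-(\d{4,})")


def parse_cve_id(identifier: str) -> Tuple[int, int]:
    """Return (year, sequence_number) for a well-formed CVE identifier."""
    match = CVE_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a CVE identifier: {identifier!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_cve_id("CVE-2020-0654"))  # (2020, 654)
```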

A zero-day vulnerability refers to a bug or defect that has not been reported to the vendor and therefore does not yet have a workaround or patch available. These unknown vulnerabilities are valuable to malicious actors, who can exploit them for years without detection. For example, the Heartbleed vulnerability, which gave bad actors the opportunity to extract private information from websites, was in the wild for nearly two years before it was discovered.

White-box (known) testing examines the functionality of an application with access to its internal structure. It is also known as Clear Box Testing, Open Box Testing, Glass Box Testing, Transparent Box Testing, Code-Based Testing or Structural Testing.

Black-box (unknown) testing examines the functionality of an application without access to its internal structure or code. It can be applied to third-party binaries, for example.

Gray-box testing is a mixture of white-box (known) and black-box (unknown) testing.

Positive testing, or functional testing, is a testing process where an application is sent a valid set of inputs. The purpose of positive testing is to ensure the application behaves as expected. While this type of testing is typically conducted by QA teams, modern DevOps shops may collaborate closely with security or development teams.

Negative testing, or non-functional testing, is a testing process where an application is sent an invalid set of inputs. The purpose of negative testing is to ensure the application remains stable in unexpected use cases. While this type of testing is typically conducted by security teams, modern DevOps shops may collaborate closely with QA or development teams.

Unit testing tests individual units/components of software. A unit is the smallest, testable part of any software. It usually has one or a few inputs. It usually has only a single output.

A property-based test allows for a range of inputs to be programmed and tested within a single test, rather than having to write a different test for every value that you want to test.
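A minimal sketch of the idea, using only Python's standard library: instead of asserting one output for one input, the test states a property (decoding an encoded string returns the original) and checks it against many generated inputs. The run-length encoder here is a hypothetical function under test, chosen only for illustration. Dedicated libraries such as Hypothesis add smarter generation and shrinking of failing inputs.

```python
import random
from typing import List, Tuple


def run_length_encode(text: str) -> List[Tuple[str, int]]:
    """Hypothetical function under test: 'aab' -> [('a', 2), ('b', 1)]."""
    encoded: List[Tuple[str, int]] = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded


def run_length_decode(pairs: List[Tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials: int = 500, seed: int = 0) -> int:
    """Property: decode(encode(text)) == text for every generated text."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        # A small alphabet makes repeated characters (the interesting case) likely.
        text = "".join(rng.choice("ab ") for _ in range(length))
        assert run_length_decode(run_length_encode(text)) == text, repr(text)
    return trials


print(check_round_trip(), "generated inputs satisfied the property")
```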

Regression testing is defined as re-running both functional and non-functional tests to make sure that code previously tested and developed still runs after making any changes. If the code doesn’t run properly after a change, then that is considered to be a regression.

Test coverage, also referred to as code coverage, measures how much a program has been exercised by a test suite. Mayhem uses the edge coverage metric, which measures the number of control flow graph edges that have been exercised. Refer to this webinar for more information.
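To make the edge coverage metric concrete, here is a small sketch. The control flow graph and the execution traces are hypothetical, and the result is reported as a fraction of all edges rather than a raw count. How Mayhem records edges internally is not shown here.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]

# Hypothetical control flow graph of a small function: (from_block, to_block).
CFG_EDGES: Set[Edge] = {
    ("entry", "check"),
    ("check", "then"),
    ("check", "else"),
    ("then", "exit"),
    ("else", "exit"),
}


def edges_of(trace: List[str]) -> Set[Edge]:
    """Edges exercised by one execution, given the sequence of blocks visited."""
    return set(zip(trace, trace[1:]))


def edge_coverage(traces: Iterable[List[str]]) -> float:
    """Fraction of control flow graph edges exercised by a set of executions."""
    exercised: Set[Edge] = set()
    for trace in traces:
        exercised |= edges_of(trace) & CFG_EDGES
    return len(exercised) / len(CFG_EDGES)


then_run = ["entry", "check", "then", "exit"]
else_run = ["entry", "check", "else", "exit"]
print(edge_coverage([then_run]))            # 0.6
print(edge_coverage([then_run, else_run]))  # 1.0
```

A single test that takes the `then` branch exercises three of the five edges. Adding a test that takes the `else` branch covers the remaining two.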

SDL refers to Secure Development Lifecycle, which is the process of embedding security testing throughout the entire software cycle. SDLC refers to Software Development Lifecycle, which defines the different phases that a software product goes through from the beginning to the end of its life. The phases, in sequence, are Training, Requirements, Design, Implementation, Verification, Release, and Response. As a part of the SDL, fuzz testing appears under Verification.

Application Security Testing Tools

SAST uncovers software bugs by analyzing source code. The defects identified are known unknown risks such as CVEs. Because static analysis solutions require source code, they are able to provide prescriptive remediation advice, down to the line of code. Static analysis can also be introduced earlier than most tools in the software development lifecycle, lowering cost and effort for remediation. Refer to this blog for more information.

Software Composition Analysis (SCA) is a relatively new industry term for a set of tools that provides users visibility into their open source inventory. Despite its misleading name suggesting access to all aspects of the source code (proprietary, third party commercial and open source), software composition analysis in effect acts as an open source management tool only.

SCA looks at your open source packages and checks to see if you're using anything with known vulnerabilities and flags those components. Refer to this brief for more information.

Dynamic application security testing (DAST) is an application testing method whereby a security solution monitors the target program while it executes on real inputs. For example, Valgrind is a type of dynamic analysis solution that looks for memory errors while a target runs.

Interactive application security testing (IAST) analyzes software code for security vulnerabilities while the app runs. It combines dynamic application security testing (DAST) and static application security testing (SAST) techniques. There are two types: active and passive. Both require an agent within the application for testing. Active IAST is designed to validate an existing vulnerability. Passive IAST leverages any form of functional testing, listening in on those tests and reporting any findings, which makes it suitable for DevOps environments.

The defects that AFT tools identify are unknown unknown risks. AFT “thinks like a hacker” to uncover new defects utilizing unknown or uncommon attack patterns. After each simulated attack, AFT tools monitor the target’s reactions, or behaviors, and leverage them as feedback to autonomously generate new test cases that are increasingly likely to uncover more defects and new code edges. Refer to this whitepaper for more information.

Concolic execution is a form of dynamic symbolic execution: an analysis that runs on a trace of a program that ran (for real) on a specific input. Technically, we only do concolic execution, not static symbolic execution. Static symbolic execution is what someone might mean when they say "symbolic execution", but when we say it, we mean the more general sense of "static or dynamic symbolic execution", of which we do only the dynamic kind.
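The general shape of concolic exploration can be sketched with a toy example. This is a simplified illustration, not how Mayhem works internally: `traced_program` is a hypothetical program that records each branch condition it evaluates on a concrete input, and `solve` is a brute-force stand-in for the constraint solver a real engine would use.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A recorded branch: (predicate over the input, outcome observed on this run).
Constraint = Tuple[Callable[[int], bool], bool]


def traced_program(x: int) -> Tuple[str, List[Constraint]]:
    """Hypothetical program that logs every branch predicate it evaluates."""
    trace: List[Constraint] = []
    is_big = lambda v: v > 10
    trace.append((is_big, is_big(x)))
    if is_big(x):
        is_magic = lambda v: v * 3 == 42
        trace.append((is_magic, is_magic(x)))
        if is_magic(x):
            return "defect", trace
        return "big", trace
    return "small", trace


def solve(constraints: List[Constraint]) -> Optional[int]:
    """Stand-in for a solver: search a small domain for a satisfying input."""
    for value in range(-100, 101):
        if all(pred(value) == wanted for pred, wanted in constraints):
            return value
    return None


def concolic_explore(seed_input: int) -> Dict[str, int]:
    """Run concretely, then negate each recorded branch to reach new paths."""
    found: Dict[str, int] = {}
    worklist = [seed_input]
    while worklist:
        x = worklist.pop()
        outcome, trace = traced_program(x)
        if outcome in found:
            continue  # this path was already explored
        found[outcome] = x
        for i, (pred, taken) in enumerate(trace):
            # Keep the path up to branch i, but force branch i the other way.
            new_input = solve(trace[:i] + [(pred, not taken)])
            if new_input is not None:
                worklist.append(new_input)
    return found


print(concolic_explore(0))  # {'small': 0, 'big': 11, 'defect': 14}
```

Starting from the single concrete input `0`, the exploration derives `11` to pass the first check and then `14` to satisfy `v * 3 == 42`, a value that random guessing would be unlikely to hit.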

"Symbolic execution" usually means "static symbolic execution", in which you analyze a non-executing program to consider how it might behave when it does execute for real. Symbolic execution is a program analysis technique that uses formal computer science methods to determine an input that triggers a node in the application to execute. Once determined, the valid input is used to derive invalid inputs for negative testing. Refer to this video for more information.

Security testing that is focused on the behavior of the software rather than the actual lines of code.

A penetration test, or pen test, is an authorized cyberattack on a computer system. It is performed as a means of evaluating the overall security of that system, often with limited prior knowledge of the test. This is not to be confused with a vulnerability assessment which looks for known vulnerabilities in a system. Penetration testing services are typically time-boxed.

Secure development practices call for the use of various testing techniques throughout the development lifecycle. SAST, SCA, and AFT each offer strengths that offset the other techniques’ limitations, together providing comprehensive application testing across the spectrum of software security risk. Refer to this whitepaper for more information.

AFT not only uncovers defects at run time, but also produces artifacts like test suites and reproducible defects that can be used to provide necessary context to fix them, something not provided by SAST or SCA solutions. Your choice will depend on your needs and many organizations opt to use a combination of solutions as they each have their own strengths and weaknesses. For example, if you have a lot of custom code and develop your own libraries, an SCA tool, targeted at cataloguing vulnerabilities in open source software, won’t be a great fit for your environment while an AFT or SAST solution would be a good place to start. You should use AFT alongside SAST and/or SCA for complete SDLC coverage. Refer to this brief for more information.

AFT can be continuous where a pen test is usually defined by the period of the contract. Guided AFT can expose more code coverage than a pen tester, depending on their experience. And continuous AFT can be much more cost effective than hiring a team of pen testers.

Fuzz Testing

Fuzz testing, or fuzzing, is a dynamic application security testing technique for negative testing. Fuzzing aims to detect known, unknown, and zero-day vulnerabilities. Fuzzers send malformed inputs to targets. Their objective is to trigger bad behaviors, such as crashes, infinite loops, and/or memory leaks. These anomalous behaviors are often a sign of an underlying vulnerability. Refer to this webinar for more information.

Fuzz testing should be a part of every SDLC. It looks at the runtime behavior of the code. It provides more code coverage than SAST or SCA. Refer to this infographic for more information.

Google finds 80% of bugs in its Chrome browser with fuzz testing, while the remaining 20% is found with other techniques or in production. Taking an adversarial approach to your software is a proven and effective method for addressing the "low hanging fruit", or the weaknesses that are most attractive to malicious actors. While fuzzing is effective, it's also dependent on your tool of choice and the expert behind the tool. Refer to this webinar for more information.

There are three types of fuzz testing: random, template, and guided. Refer to this whitepaper for more information.

Random fuzzing is the act of sending random inputs to an application. There is no systematic method to the generation of these test cases, and they do not resemble a valid input. As a result, most inputs do not penetrate the application, leading to low code coverage. For a random fuzzer to reach the same level of coverage as its more effective counterparts, it requires significantly more time and effort from security experts. Refer to this blog for more information.
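The sketch below illustrates why random inputs rarely penetrate an application. `parse_record` is a hypothetical target that rejects anything not starting with a 4-byte magic value, a check of the kind many file and network parsers perform first. Uniformly random bytes match such a header roughly once in four billion attempts.

```python
import random

MAGIC = b"FUZZ"


def parse_record(data: bytes) -> bool:
    """Hypothetical target: only inputs starting with MAGIC get past the first check."""
    return data[:4] == MAGIC


def random_fuzz(trials: int, seed: int = 0) -> int:
    """Send `trials` random 8-byte inputs; return how many passed the header check."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        data = bytes(rng.getrandbits(8) for _ in range(8))
        if parse_record(data):
            accepted += 1
    return accepted


trials = 10_000
print(f"{random_fuzz(trials)} of {trials} random inputs reached code behind the header check")
```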

Template fuzzers utilize manually supplied custom inputs and modify them to include anomalies, and we also include generational and protocol fuzzers in this subcategory. As a group, they are more effective than random fuzzers, because their test cases resemble valid inputs. The probability for these test cases to penetrate an application is higher than for a random fuzzer’s test cases. However, there are several drawbacks. First, template fuzzers randomly include anomalies without an understanding of common error-detection techniques. As a result, their test cases are often blocked. Second, templates have limitations. Not only is the test case quality limited to the valid input template provided, but template fuzzers also do not fuzz beyond the scope of the template. A template fuzzer can be effective for fuzzing a single function, given that it was provided an expertly crafted template. However, scaling template fuzzing requires expertise. Generational fuzzers are a bit better and understand the inner workings of their input type. Their tests are written to resemble a valid input, while evading common error-detection techniques. While vendor-maintained generational fuzzers require significantly less expertise than template fuzzers for testing, they still require expertise for operation and results interpretation. Most commercial fuzzers available today are template/generational fuzzers.
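A minimal sketch of template fuzzing, under the same kind of assumption as a header-checking parser: `parse_record` is a hypothetical target, and the template is a hand-supplied valid input. Each test case is the template with one byte replaced at random, so many test cases still resemble a valid input and get past the first check.

```python
import random

MAGIC = b"FUZZ"


def parse_record(data: bytes) -> bool:
    """Hypothetical target: only inputs starting with MAGIC get past the first check."""
    return data[:4] == MAGIC


def mutate_template(template: bytes, rng: random.Random) -> bytes:
    """Return a copy of the template with one randomly chosen byte replaced."""
    mutated = bytearray(template)
    position = rng.randrange(len(mutated))
    mutated[position] = rng.getrandbits(8)
    return bytes(mutated)


def template_fuzz(template: bytes, trials: int, seed: int = 0) -> int:
    """Send `trials` mutated templates; return how many passed the header check."""
    rng = random.Random(seed)
    return sum(parse_record(mutate_template(template, rng)) for _ in range(trials))


valid_input = MAGIC + b"\x00\x01\x02\x03"  # manually supplied template
trials = 1_000
print(f"{template_fuzz(valid_input, trials)} of {trials} mutated templates passed the header check")
```

Roughly half of the mutations land in the payload and leave the header intact. The other half corrupt the header and are blocked, which mirrors the drawback described above: anomalies are placed without any understanding of the target's checks.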

Guided fuzzers are intelligent: they monitor and leverage the target’s behavior to autonomously generate new, custom test cases on the fly. These fuzzers have scoring capabilities that measure the effectiveness of the test cases they send. High-scoring test cases influence the new set of test cases generated and sent. Unlike the aforementioned fuzzer types, guided fuzzers only require a way to monitor their target and send inputs to it. They are not supplied starter test cases. In recent years, modern fuzzers have grown in sophistication and wield robust automation capabilities. Guided fuzzers relieve significant strains introduced from manual test case creation. However, if users desire deep fuzzing analysis, guided fuzzers require expert assistance. Mayhem, from ForAllSecure, provides automated guided fuzz testing.
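The feedback loop described above can be sketched as follows. This is a toy illustration and not Mayhem's algorithm: `run_target` is a hypothetical target that reports which of its nested checks an input passed (real guided fuzzers obtain this by instrumenting or monitoring the program), and the score is simply the number of checks passed.

```python
import random
from typing import List, Optional, Set, Tuple

MAGIC = b"FUZZ"


def run_target(data: bytes) -> Tuple[Set[int], bool]:
    """Hypothetical target with nested byte checks.

    Returns the ids of the checks passed and whether the defect was triggered
    (all four bytes matched).
    """
    reached: Set[int] = set()
    for index, expected in enumerate(MAGIC):
        if index >= len(data) or data[index] != expected:
            return reached, False
        reached.add(index)
    return reached, True


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    mutated = bytearray(data)
    mutated[rng.randrange(len(mutated))] = rng.getrandbits(8)
    return bytes(mutated)


def guided_fuzz(max_iterations: int = 200_000, seed: int = 0) -> Optional[bytes]:
    """Mutate the highest-scoring input; keep mutants that reach new checks."""
    rng = random.Random(seed)
    corpus: List[Tuple[int, bytes]] = [(0, bytes(len(MAGIC)))]  # (score, input)
    seen: Set[int] = set()
    for _ in range(max_iterations):
        _, parent = max(corpus)  # highest score drives the next test case
        candidate = mutate(parent, rng)
        reached, crashed = run_target(candidate)
        if crashed:
            return candidate
        if not reached <= seen:  # new behavior observed: keep this input
            seen |= reached
            corpus.append((len(reached), candidate))
    return None


print(guided_fuzz())
```

Starting from four zero bytes and no knowledge of the format, the loop discovers the magic value one byte at a time, because every partial match is rewarded and reused. A purely random fuzzer would have to guess all four bytes at once.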

In order to fuzz test, Mayhem needs a way to interact with the application. Unit tests and integration tests both typically involve running the software under test with a specific input and asserting that a specific output was observed. Fuzzing extends this form of testing by parameterizing the test with an array of bytes and then searching for strings of input bytes that trigger bugs. Fortunately, developers can write a fuzz test harness in much less time than required to write individual unit tests. Better yet, these harnesses typically only need to be written once for a given application.
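The difference between a unit test and a fuzz harness can be sketched like this. All names are hypothetical: `read_field` stands in for the code under test and contains a deliberate defect, and `find_failing_input` is a naive random driver standing in for the fuzzer that would normally call the harness.

```python
import random
from typing import Optional


def read_field(packet: bytes) -> bytes:
    """Hypothetical code under test: the first byte declares the payload length.

    Defect: the declared length is trusted without checking the packet size.
    """
    if not packet:
        raise ValueError("empty packet")
    declared = packet[0]
    payload = packet[1:]
    if declared:
        _ = payload[declared - 1]  # IndexError when declared > len(payload)
    return payload[:declared]


def check_known_packet() -> None:
    """Conventional unit test: one specific input, one asserted output."""
    assert read_field(b"\x02hi!") == b"hi"


def fuzz_one_input(data: bytes) -> None:
    """Fuzz harness: the same call, parameterized over arbitrary bytes."""
    try:
        read_field(data)
    except ValueError:
        pass  # documented rejection of malformed input, not a defect


def find_failing_input(trials: int = 1_000, seed: int = 0) -> Optional[bytes]:
    """Naive driver: feed random byte strings to the harness until one fails."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = bytes(rng.getrandbits(8) for _ in range(rng.randint(0, 8)))
        try:
            fuzz_one_input(data)
        except Exception:
            return data
    return None


check_known_packet()
print("input that triggers the defect:", find_failing_input())
```

The unit test passes, because its hand-picked input is well formed. The harness exposes the defect as soon as the driver supplies a length byte larger than the payload that follows it.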

A seed corpus is a set of valid inputs that serve as a starting point for fuzzing a target.

Fuzzers send malformed inputs to targets. Their objective is to trigger bad behaviors, such as crashes, infinite loops, and/or memory leaks. These anomalous behaviors are often a sign of an underlying vulnerability.

When developers are creating software, they are generally not looking at all possible ways their software can behave badly. Unfortunately, there are infinite ways to do something wrong. Although the effectiveness of fuzz testing has been well-documented, fuzzing, as a whole, has been criticized for its shallow analysis and inability to penetrate through the peripheral layers of an application. Despite criticism, fuzz testing is a proven method and a recommended practice for organizations who build their own software and rely on supplier software for business productivity. Newer guided fuzzing solutions offer scoring capabilities that measure the effectiveness of the test cases they send. Guided fuzzers do rely on sample inputs, or a corpus, for initial guidance to explore a program. Thereafter, however, they monitor and leverage the target’s behavioral feedback to generate new, customized test cases on the fly. These newly generated test cases aim to incrementally test new sections of code, checking the security of each new region they successfully penetrate and improving usability.

Perhaps the most famous example of fuzz testing is the discovery of the buffer overflow defect in certain versions of OpenSSL, better known as Heartbleed. An updated list of recent vulnerabilities Mayhem uncovered can be found here.

Among the benefits of fuzz testing are:

  • Ability to find unknown vulnerabilities
  • High accuracy, low false positives
  • True automation with the SDLC
  • Security testing done at machine speed and scale
  • Shift-Left DAST

Everyone. Seriously, the end users benefit from having secure applications that won’t leak their personal information or crash in life-critical situations. Organizations benefit in that defects are easier and therefore less costly to fix in development. And government agencies benefit in that third-party applications can be rigorously tested for unknown vulnerabilities before being deployed in mission-critical systems.

Organizations that develop their own code benefit the most because they have the most control over the source code. However, fuzz testing can also be applied to third-party binaries and therefore bring benefit to those who procure third-party applications.