
How Much Testing is Enough? Understanding Test Results with bncov and Coverage Analysis.


A frequently asked question in software testing is “Is that enough testing, or should we do more?” Whether you’re writing unit tests for your programs or finding bugs in closed-source third-party software, knowing what code you have and have not covered is an important piece of information. In this article, we’ll introduce bncov, an open source tool developed by ForAllSecure, and demonstrate how it can be used to answer common questions that arise in software testing.

At its core, bncov is a code coverage analysis tool. While there are several well-known tools that offer visibility into code coverage, we wanted to build a solution that enhanced and/or extended functionality in the following areas:

  1. Easily scriptable. Scriptability is a key feature to align with larger analysis efforts and for combining with other tools.
  2. Strong data presentation. Good visualizations quicken and enhance understanding.
  3. Fuzzing/testing workflow compatible. Tools that exactly fit your needs increase productivity and speed.
  4. Supports binary targets. Sometimes you don’t have the original source code.

While existing code coverage tools are strong in some of these areas, our main focus was scriptability because of our requirements for flexibility. The driving purpose is to be able to answer common questions in software testing that often require combining information from static and dynamic analysis, so flexibility is important in order to answer a large variety of potential questions. We found that a plugin for Binary Ninja works well for this, because it allows users to easily leverage information from Binary Ninja in a Python scripting environment.

The workflow for using bncov is a three-step process. While the first step is up to you, we’ve made the other steps easy to pipeline:

  1. Generate test cases. Test cases can be generated by any approach, from fuzzing solutions to manual test case development.
  2. Generate coverage data from those test cases.
  3. Run analysis and display output with bncov.

After running the normal install process for Binary Ninja plugins, the first step is to collect coverage information. This is done by running your target program on your inputs (also known as input files or seeds) and collecting coverage in the drcov format (produced by DynamoRIO’s built-in drcov tool). We’ve packaged a script to make this easier, but it’s nothing a simple bash loop couldn’t accomplish. It’s important to note what data is collected, because this is the data that ends up in the plugin and forms the basis of our analyses. The coverage files generated by drcov record which basic blocks are executed, but not the order or the number of times the blocks are executed.
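For instance, driving the target once per seed under drcov might look like the sketch below (the `drrun` path, target binary, and directory names here are hypothetical; bncov’s packaged script handles the same job):

```python
import subprocess
from pathlib import Path

def build_drcov_cmd(drrun, target, seed, logdir):
    # drcov writes one coverage log per execution into logdir;
    # the target program reads the seed file passed as its argument.
    return [str(drrun), "-t", "drcov", "-logdir", str(logdir),
            "--", str(target), str(seed)]

def collect_coverage(drrun, target, seed_dir, logdir):
    # Run the target once per seed, accumulating drcov logs in logdir.
    Path(logdir).mkdir(parents=True, exist_ok=True)
    for seed in sorted(Path(seed_dir).iterdir()):
        subprocess.run(build_drcov_cmd(drrun, target, seed, logdir),
                       check=True)
```

The resulting directory of `.drcov` files is what gets imported into the plugin in the next step.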

With the information from the coverage files, we can now visualize block coverage using bncov. Import the whole directory of coverage files, and you’ll see blocks colored in a heatmap fashion, with blocks painted from blue to purple to red. Redder hues indicate that a block was covered in a smaller percentage of input files (i.e. the block is “rare” among the inputs), while bluer hues show blocks with a higher percentage, indicating more common code paths. Blocks that have not been covered at all are not recolored. This color scheme allows users to instantly visualize which blocks have been tested and what the common code paths are as they review functions.

Figure 1: The smaller the relative percentage of test cases that cover the block (“the rarer it is”), the more reddish it is.
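As a rough sketch of the idea behind the heatmap (bncov’s actual palette and scaling may differ), the color of a covered block can be interpolated from its coverage frequency:

```python
def heatmap_color(times_covered, total_traces):
    # Map coverage frequency to an RGB color: blocks covered by few
    # traces trend red ("rare"), blocks covered by most traces trend
    # blue (common paths). Blocks with zero coverage are not recolored.
    assert 0 < times_covered <= total_traces
    frac = times_covered / total_traces  # in (0, 1]
    red = round(255 * (1 - frac))
    blue = round(255 * frac)
    return (red, 0, blue)  # midpoints come out purple
```

A block hit by every trace maps to pure blue, while a block hit by a single trace out of many maps to nearly pure red.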

Coverage visualization is very helpful for manual analysis, but bncov’s unique differentiator is its scripting flexibility and ability to automate analysis. The same code coverage data that drives the visualization can be used within Binary Ninja’s built-in scripting console or in a normal Python environment (headless scripting requires a Binary Ninja commercial license), allowing for additional analyses that leverage Binary Ninja’s existing knowledge of the binary. The ability to programmatically reason about code coverage with a set of input files is extremely powerful, and we’ve provided some built-in examples as starting points, such as the GUI commands “Highlight Rare Blocks” and “Highlight Coverage Frontier.” These commands highlight and log blocks that are only covered by a single coverage file and blocks that have an outgoing edge to an uncovered block, respectively. Users can build various interesting analyses on top of these building blocks to answer challenging questions, such as the one we started with: “Should we do more testing?”

Figure 2: Blocks highlighted in green are in the “Coverage Frontier” — meaning they have an outgoing edge that isn’t covered.
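In scripting terms, both of these commands reduce to simple set operations over per-trace block sets. A simplified sketch of the two analyses (bncov’s real implementation works through its coverage database and Binary Ninja’s API):

```python
def rare_blocks(trace_blocks):
    # Blocks covered by exactly one trace ("Highlight Rare Blocks").
    # trace_blocks: dict of coverage-file name -> set of block addresses.
    counts = {}
    for blocks in trace_blocks.values():
        for addr in blocks:
            counts[addr] = counts.get(addr, 0) + 1
    return {addr for addr, n in counts.items() if n == 1}

def coverage_frontier(covered, successors):
    # Covered blocks with at least one edge to an uncovered block
    # ("Highlight Coverage Frontier"). successors maps a block address
    # to the set of its successor block addresses.
    return {addr for addr in covered
            if successors.get(addr, set()) - covered}
```

Both functions take only plain sets and dicts, so the same logic works inside the Binary Ninja console or in a standalone script.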

As a demonstration, let’s walk through an open source project that has built-in test resources. The open source XML library TinyXML-2 is an excellent example because it is a compact library that includes a test program, test inputs, and a Google OSS-Fuzz harness. If users choose to conduct additional testing (like fuzzing), it’s helpful to understand what code the built-in test cases cover and compare how much more coverage fuzzing yields. This process is simplified by using a bncov script to compare coverage between the sets of coverage files from before and after fuzzing. The code below is the heart of the coverage comparison process from the script:

Figure 3: Comparing coverage between sets of inputs with bncov
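Conceptually, the comparison boils down to set arithmetic over the union of blocks covered by each group of coverage files. A simplified sketch of the idea (not the script’s exact code):

```python
def union_coverage(trace_blocks):
    # Union of basic blocks covered by any trace in one input set.
    # trace_blocks: iterable of per-trace block-address sets.
    total = set()
    for blocks in trace_blocks:
        total |= blocks
    return total

def compare_coverage(before, after):
    # Compare overall block coverage of two groups of traces,
    # e.g. the coverage files from before and after fuzzing.
    old, new = union_coverage(before), union_coverage(after)
    return {
        "common": old & new,        # blocks hit by both sets
        "only_before": old - new,   # blocks only the first set hit
        "gained": new - old,        # new blocks found by the second set
    }
```

The sizes of these three sets are exactly the numbers you need to judge how much additional coverage fuzzing bought you.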

We’ll start the analysis with three initial input sets:

  1. The test XML files included in the resources directory
  2. XML inputs extracted from TinyXML-2’s test binary
  3. A set of XML files gathered from multiple test suites on the Internet

First, we collect coverage using bncov’s drcov automation script on each input set to understand the baseline level of coverage we get from the different inputs. We wrote a simple program that uses TinyXML-2 to parse and print input files, which we used as our target for collecting coverage (and later for fuzzing). The baseline results show that the extracted test cases offered significantly more coverage than the test cases from the resources directory, which makes sense, as the test binary includes all the tests from the resources directory. Also, as you might expect, the combination of multiple external test suites had the most coverage among the initial input sets.

By fuzzing our target program with each of the input sets, we explore new code paths in TinyXML-2 by generating test cases that cover basic blocks the initial sets do not. The results of fuzzing will vary greatly depending on multiple factors: how long the fuzzer is run and how fast the target program is, the kind of input processing the target does, the quality of the starting input set, the capabilities of the fuzzer, etc. In our case, though, we’re looking to compare coverage and look for relative increases in block coverage across the input sets, so we just fuzzed each input set for the same period of time with AFL. Once the fuzzing finished, we did some comparison using one of the scripts included with bncov.
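Fuzzing each input set under identical conditions is easy to script as well. This sketch builds one standard afl-fuzz invocation per input set (the directory names and target binary here are hypothetical):

```python
def afl_cmd(input_dir, output_dir, target):
    # Standard AFL invocation: -i seeds, -o findings directory, and
    # "@@" is replaced by AFL with the path of the current test case.
    return ["afl-fuzz", "-i", input_dir, "-o", output_dir,
            "--", target, "@@"]

# One command per input set, each fuzzing the same target program.
input_sets = {"resources": "seeds/resources",
              "extracted": "seeds/extracted",
              "external": "seeds/external"}
commands = {name: afl_cmd(path, f"findings/{name}", "./xmltest")
            for name, path in input_sets.items()}
```

Each command can then be launched for the same wall-clock duration so the resulting coverage numbers are comparable.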

Figure 4: Coverage comparison script output.

As expected, we saw increased coverage for each input set after a short fuzzing run. Although the gap in the number of blocks covered between each input set narrows after fuzzing, there are certain blocks that were only found by the external suite. This result makes sense, as certain input constructs are harder for a fuzzer like AFL to synthesize. This is where a technique known as symbolic execution, a technology within our Mayhem solution, can often help by solving for inputs that are unlikely to be discovered by random permutations from a fuzzer.

Using the script output, we can now start to answer “how much is enough testing?” Using bncov, users now have data points that show which functions have been exercised and which basic blocks are not covered by the existing test cases. With the included coverage frontier analysis, we can also see the boundary between existing test inputs and untouched code, allowing users to automatically identify functions that could benefit from further exploration. This type of analysis quickly increases the amount of understanding a user has of the target code, and this is the kind of information needed to answer “how much is enough.”

Figure 5: Enumerate frontier blocks for each function.
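Given a set of frontier block addresses and each function’s address range, grouping frontier blocks per function is a small extension of the same set logic (a sketch; in bncov the function lookup would go through Binary Ninja’s API rather than hand-built ranges):

```python
def frontier_by_function(frontier, function_ranges):
    # frontier: set of frontier block addresses.
    # function_ranges: dict of function name -> (start, end) addresses.
    # Returns only the functions that actually contain frontier blocks,
    # i.e. the functions that could benefit from further testing.
    result = {}
    for name, (start, end) in function_ranges.items():
        hits = {addr for addr in frontier if start <= addr < end}
        if hits:
            result[name] = hits
    return result
```

Sorting the result by the number of frontier blocks per function gives a quick worklist of where new test inputs are most likely to unlock uncovered code.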

Coverage analysis, and using coverage information to enhance fuzzing, is an active and developing research area. Using bncov to reason about coverage is a step forward because it enables analysis automation and the flexible reasoning required for targeted application of techniques that augment fuzzing, such as directed symbolic execution. We’ll share more on these advanced topics in a future installment, but in the meantime you can fork bncov on GitHub and experiment for yourself! We hope it helps you get a better understanding of your testing coverage and discover code paths you might be missing.

ForAllSecure offers Mayhem, a dynamic testing solution that brings together the tried-and-true techniques of coverage-guided fuzzing with the advantages of symbolic execution, including patented technology from over a decade of research at Carnegie Mellon University. You can learn more on the ForAllSecure website.

Top 5 Takeaways From the “ForAllSecure Makes Software Security Autonomous” Livestream


In February 2019, Dr. David Brumley, ForAllSecure CEO, and Zach Walker, DIU project manager, discussed how Mayhem, ForAllSecure’s behavior testing solution, has helped secure the Department of Defense’s most critical platforms. The Defense Innovation Unit, also known as DIU, is a progressive group within the Department of Defense employing bleeding-edge technology to solve the nation’s defense challenges. Brumley and Walker recount their experience deploying and utilizing Mayhem within the government, lessons learned from the partnership, and what the future looks like for Mayhem.

The pain of not being able to hire enough cybersecurity people to secure software is universal.

Security is largely manual, and human effort can’t scale. This was Brumley and ForAllSecure’s motive behind Mayhem, the autonomous cyber reasoning system that won the 2016 DARPA Cyber Grand Challenge.

The Department of Defense’s and Brumley’s pains check out with market statistics. (ISC)² predicts that by 2022 there will be 1.8 million open jobs in software security. It’s apparent that security testing which relies on human expertise is unsustainable and unscalable.

“The purpose of autonomous cybersecurity isn’t to replace the human element”, Brumley clarifies. “The purpose is to elevate human potential. Scarce security expertise shouldn’t be wasted on boring, manual tasks, such as testing patches and sifting out false positives. By automating these mundane tasks, we allow humans to focus on what they do best: leverage their creativity. Humans should have more creative roles in security, such as finding new attack vectors machines can’t find.”

Few realize that risk can be inherited through vulnerable code components sourced from software supply chains.

It’s commonly assumed that security is an upstream responsibility. And it’s true that users of open source or third-party components don’t have the same flexibility, control, or insight to find and fix vulnerabilities as those components’ developers do. Yet it is ultimately the user, not the developer, of the software that is liable. In some cases, it may merely be one vulnerable component, but it’s important to consider the attack surface: every system that interfaces with or depends on the vulnerable component is now exposed.

Another factor to consider is that code decays over time. As modern software requires more code, it grows complex and expands its attack surface. It requires more components from the supply chain, introducing new vulnerabilities and reintroducing previously addressed vulnerabilities. Akin to dental hygiene, continuous analysis is critical to ensure that software maintains its security posture.

Continuous behavior testing is a proven and accepted practice for software security.

Brumley believes that the Sec in DevSecOps is about being continuous. “Today, the Sec in DevSecOps is secondary, and it’s asynchronous. Security should always keep happening in the background. It should never sleep. Security is, in many ways, a game and the goal is to outpace attackers,” Brumley comments.

Brumley’s statement echoes similar sentiments made by technology-forward organizations. Continuous behavior testing is a proven and accepted technique that is commonly practiced by tech behemoths like Google and Microsoft. However, not all organizations have the technical savvy and budget to do what they do; it simply isn’t within reach for the average organization. As Mayhem is brought to market, its purpose is to make this advanced testing technique accessible to those outside of the academic and security researcher community. Visit the ForAllSecure website to learn more about behavior testing.

Deploying Mayhem across the Department of Defense has provided ForAllSecure valuable insight into challenges in aerospace, automotive, critical infrastructure, and more.

The software security challenges Mayhem addresses for the Department of Defense are not unique to the government. The DoD is a large, complex organization with multiple branches, each with its own specific needs.

“In October 2017, we had a big demonstration. This was three to four months into the project with ForAllSecure. We had people across numerous agencies try [Mayhem] out. We had the top civilian in Cyber Command say he thought this is one of the most important things Defense Innovation Unit and the nation is doing in cyber.”

– Zach Walker, Defense Innovation Unit.

Through ForAllSecure’s partnership with the DoD, Mayhem has gained widespread exposure to challenges and use cases that are transferable and relevant to the commercial market, including aerospace, automotive, critical infrastructure, and more. Brumley also shared what commercializing Mayhem entails.

“To take Mayhem to market, we’ve been focusing on Mayhem Sword, or the analysis component, first. This is a decision made after seeing tremendous market validation for this capability. Most notably, Mayhem Sword was used within the DoD to secure weapons systems. Mayhem is not a box that plugs into the network. It is a high-tech solution. Today, it supports x86, x64, and ARM on Linux, as well as compiled languages,” Brumley shares.

“Like the Mayhem prototype shown at CGC, we want to have Mayhem eventually automatically patch the vulnerabilities it finds. This is the Mayhem Shield component. As we go to market, we’ll expand our offering to include automatic patching, whether it’s our patch or a vendor’s patch,” Brumley finishes. Visit the ForAllSecure website to learn more.

Walker and Brumley’s fashion sense captivates audiences.

Zach and David’s taste in shoes wows crowds. Take a close look during the fireside chat session (41:00 – 43:00). What are your thoughts? Tell us with #UnleashingMayhem.


Onward to the Next Chapter in ForAllSecure’s Journey


Welcome back to the second installment of the ForAllSecure Journey series. In my previous post, we took a look back at ForAllSecure’s history. In today’s piece, I’d like to share not only my vision for the future, but also an exciting announcement.

Where it all began…

In 1998, I joined Stanford as a computer security officer, a role that is called CISO today. It was me and one other person. Our job was “computer security”, which included incident response. Then, we were able to accomplish a lot with just two people:

  • Monitor for brand-new vulnerabilities. 90% of our information came from simply subscribing to the SecurityFocus mailing list and checking CERT bulletins. The work involved wasn’t half bad.
  • Scan for known vulnerabilities. We used Nessus when it was first released as open source.
  • Tell users to update. Admittedly, this wasn’t easy. Even back then, users didn’t want to risk an update breaking their workflow.

Two years later, our team was at capacity. New vulnerabilities were released more frequently. Scanning took considerable time. Informing users was still a bottleneck. Updates only came after our users were hacked.

The worst part was the way it felt. It was a never-ending treadmill. We were reactive to whatever the attacker was doing. Security wasn’t in control of the security cycle; we were reacting to the pace set by the attacker.

It was impossible to do everything we needed to do. We needed to get ahead of the curve. Here are some of the fundamental challenges I saw:

  • We were reactively looking for vulnerabilities already discovered by attackers. We must be proactive by leveraging superior techniques and technology.
  • We were manually scanning for known vulnerabilities. We need to instantly recognize which machines are at risk every time a new vulnerability is discovered.
  • We were pleading with users to patch for security and unable to articulate other business impact. We shouldn’t be guessing; we should algorithmically determine functionality, performance, and security of all updates before they ship.

These needs gave rise to ForAllSecure’s vision:

Autonomously check and protect the world’s software.

Humans are at their best when they’re given the space to be creative. I believe that automating boring, time-consuming tasks with autonomous technology will give them that. Relieving scarce security, development, and operations engineers from mundane, manual tasks offers mental freedom to creatively tackle challenging issues. Ultimately, we wanted to build a solution that elevates human potential.

Since then, we’ve aggressively executed against this mission. ForAllSecure has carried out fundamental research that top peer-reviewed venues accepted. We competed in an open contest to determine how well our technology works, and won. Now, we’re moving on to the next step.

The Next Stage of Growth

We believe in our vision and the need to invest in growing the business to achieve it. As part of this, we are thrilled to announce our recent close of $15M Series A, led by New Enterprise Associates (NEA). NEA’s Forest Baskett and Aaron Jacobson have proven experience scaling early stage enterprise companies. We have also brought on several strategic investors, including Lane Bess and Jim Swartz. Jacobson, Baskett, Bess, and Swartz are accomplished business-building partners, and we look forward to their guidance as we bring Mayhem to market broadly.

Today, the Mayhem solution is available as a part of our early access program. Through this program, we’re collaborating with design partners, such as the Defense Innovation Unit and Fortune 1000 companies in automotive, Internet of Things (IoT), and high-tech industries, to develop Mayhem into a scalable, enterprise-grade platform with broader architecture support, DevOps integration, and enhanced usability. Ultimately, we want to make it easy for security, operations, and development teams to bring powerful dynamic security testing, which historically has been exclusive to tech behemoths like Google, into their software lifecycle. To learn more, visit the ForAllSecure website.

As we progress, I’ll continue to unveil more about the ForAllSecure journey, and I want to share my deep gratitude to our design partners, who have been strategic to Mayhem’s evolution. Thank you, and I look forward to the journey ahead!

A Reflection on ForAllSecure’s Journey in Bootstrapping Behavior Testing Technology


Software security is a global challenge that is slated to grow worse. The application attack surface is growing by 111 billion new lines of software code every year, with newly reported zero-day exploits rising from one-per-week in 2015 to one-per-day by 2021, according to the Application Security Report from Cybersecurity Ventures. Mobile alone has one new application released every 13 seconds.

One common approach to addressing software security issues is applying network filters. This is an easy band-aid. However, we don’t think it is the right long-term solution. Network filters applied by solutions like Web Application Firewalls (WAFs) aim to solve symptoms, not the root cause. When there is a new software vulnerability, we believe the software must be fixed. How do we know which software to fix?

First, we must check software. It is pertinent that we check all software; not just a few programs or those a developer chooses to submit.

Second, we must use the right tools. Most tools today require source code and are built with developers in mind. These solutions can be effective, if developers choose to use them. When security testing tools require source code, end-users are forced to trust the developer to run the tool and fix all problems. This doesn’t allow the IT administrator, the end-user, or the CISO to independently verify the security, safety, and resiliency of the software they buy and use. Shouldn’t they be able to check the software? After all, it’s these users — not the developer — who pay the price for an incident.

I’ve often heard that consumers don’t buy security. Thus, if developers ship exploitable software, no one will know. That’s true today because we do not have the right technology. Once we can check software without a developer, we can give consumers — whether it be a user or enterprise — the facts they need to make informed decisions. That, in turn, we think, will drive a more security-centric software model. As an analogy, imagine you didn’t have crash test data for cars. Then, you likely wouldn’t be compelled to evaluate cars on crash test data. It’s the same in security: if we can give users crash test data for programs, they will be able to make better choices.

We sought to uncover the right solution to address the persistent software security issues that have existed in the market for over two decades. We began our research in a university lab, where a brand new technology was born.

Evolving Deep Technology: From Research to Application

The Mayhem concept was born in my research lab at Carnegie Mellon University, where we explored binary analysis, symbolic execution, and fuzzing. Some of the earliest work we did dates back to 2003, when I was a graduate student. In the last 15 years, we’ve developed new techniques that you’ll find in today’s off-the-shelf code analysis, security analysis, and patching solutions. My co-founders Thanassis Avgerinos and Alexandre Rebert, as well as many other students, spent years publishing their work in academic tier-1 venues.

In academia, our research focused on program verification, but with a twist. Typical academic program verification takes in a “safety” property and a program, then tries to verify the program is safe. Although this type of research aims to “verify a program is safe”, it frequently proves the opposite: more often than not, researchers reveal the bugs they found. The distinction is perhaps subtle, but critical. Decades of research have followed this approach, and we thought it was flawed. Mayhem’s twist is that we check for insecurity, verifying the program execution path by execution path. The science behind Mayhem comes from two areas: symbolic execution and fuzz testing.

Symbolic execution translates a program into a logical formula, and then reasons about the program’s safety using logic. This original research was significant because it was the first to break the “exponential cliff” barrier. The exponential cliff refers to technical limitations caused by the exponential-size formulas generated by previous techniques: previous work would “fork” every time a new program branch was taken. Mayhem, by contrast, was able to generate linear-size formulas because it did not require “forking” to build them.
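As a toy illustration of the idea (not Mayhem’s implementation), a symbolic executor gathers one constraint per branch along a path and asks a solver for a concrete input that satisfies all of them. Here a brute-force search over a small input domain stands in for a real SMT solver:

```python
def target_paths():
    # Each path through the toy program below is a list of constraints
    # (predicates over the input x) plus a label for where it ends:
    #
    #   def target(x):
    #       if x > 10:          # branch 1
    #           if x * 2 == 44: # branch 2
    #               bug()
    return [
        ([lambda x: x <= 10], "early return"),
        ([lambda x: x > 10, lambda x: x * 2 != 44], "no bug"),
        ([lambda x: x > 10, lambda x: x * 2 == 44], "bug"),
    ]

def solve(constraints, domain=range(0, 256)):
    # Stand-in for an SMT solver: find any concrete input that
    # satisfies every constraint collected along the path.
    for x in domain:
        if all(c(x) for c in constraints):
            return x
    return None

for constraints, label in target_paths():
    print(label, solve(constraints))
```

Solving the “bug” path’s constraints yields a concrete crashing input, which is exactly how symbolic execution both finds a bug and proves it with a witness.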

This work appeared at Tier 1 academic venues, such as ACM/IEEE International Conference on Software Engineering, the ACM Conference on Computer and Communications Security, and USENIX Security, and won the ICSE “Best Paper” award. The academic work also resulted in four US patents created by the founders and owned by Carnegie Mellon University: US 9,135,405; US 9,183,396; US 9,542,559; US 9,619,375. All patents have been exclusively licensed to ForAllSecure.

In 2014, we had our Mayhem Symbolic Executor analyze over 38,000 programs from scratch and perform over 209 million tests of those programs. Of the 209 million tests, 2.6 million resulted in successful hacking of programs. Those 2.6 million successes were the result of 13,875 previously undiscovered bugs. The only cost was Amazon compute time: on average, it cost $0.28 for Mayhem to discover and prove each new vulnerability.

This work was a proof-of-concept that demonstrated the power of checking software at scale. Not only did we find serious vulnerabilities, but we also found a way to put an expected cost on finding a new bug. Clearly, more work was needed, but we took it as a positive indication.

Fuzzing is Part of Mayhem Too

Although we wrote far less about it in academia, we acknowledge fuzzing is also a critical technique for finding bugs. In a nutshell, fuzzing chooses a new input, runs the program, and observes whether the execution is safe. Typically “safe” is defined as “not crashing”. Fuzzing has found significant bugs. For example, the OSS-Fuzz program at Google has found over 13,000 bugs in Chrome.
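The core loop is simple enough to sketch in a few lines. This toy fuzzer mutates a seed and records inputs that crash a deliberately buggy parser (illustrative only; it is not Mayhem’s or AFL’s implementation):

```python
import random

def buggy_parse(data: bytes):
    # Toy target: "crashes" (raises) on one specific 4-byte header.
    if data[:4] == b"FUZZ":
        raise RuntimeError("crash: unhandled header")
    return len(data)

def fuzz(target, seed: bytes, iterations=100_000, rng=None):
    # Core fuzzing loop: choose a new input by mutating the seed, run
    # the program, and observe whether the execution is safe (here,
    # "safe" means "does not raise").
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    crashes = []
    for _ in range(iterations):
        data = bytearray(seed)
        for _ in range(rng.randint(1, 4)):  # flip a few random bytes
            data[rng.randrange(len(data))] = rng.randrange(256)
        try:
            target(bytes(data))
        except Exception:
            crashes.append(bytes(data))
    return crashes
```

Starting from a seed one byte away from the crashing header, random mutation stumbles onto the crash fairly quickly; a header the seed shares no bytes with would take astronomically longer, which is exactly the gap symbolic execution fills.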

Mayhem uses both symbolic execution and fuzzing. Fuzzing’s strength is in quickly guessing inputs, allowing users to run hundreds of tests per second. However, it misses the opportunity to deeply reason about each execution. This is where symbolic execution offers value: it does deep reasoning using formal logic, but can take seconds or longer on each execution. We found that the key to effective dynamic testing is to use these two techniques together: apply the deep reasoning of symbolic execution on some runs, while continuously fuzzing in the background. To learn more about the synergistic power of symbolic execution and fuzzing, download the “What is Behavior Testing” whitepaper.

DARPA Cyber Grand Challenge

The Cyber Grand Challenge (CGC) was the next step in maturing research to commercial impact. I’ve previously written why the CGC mattered to me. I often joke that DARPA must have read our papers and thought, “I don’t believe you. Show me at DEFCON”, because that’s what they did. They asked participants to demonstrate that they could find bugs, and go one step further: fix them. The catch? It must be done without source code or human intervention.

I believe the CGC reflected real-life considerations better than academic research. A typical research paper aims to answer whether the program is secure after patching. In the CGC, our primary criteria were performance and functionality. It is, at times, better to remain vulnerable than to deploy a patch that doesn’t meet performance or functionality goals. My experience as a CISO was often similar: security was important conceptually, but the “now” impact of performance and functionality outweighed the “someday” impact of a possible compromise.

Prior to the CGC, patching bugs wasn’t something we had done. Others in academia had done some work, but it wasn’t our area of focus. We read the papers and found the techniques useful.

Our goal was to have less than 5% performance impact on every patch and never lose functionality. With only the binary, how could we measure this? “QA” is a hidden component in security. QA is frequently met with yawns in the security community and is even considered a separate market. In Mayhem, QA is integral. In the CGC, we automatically created regression tests and replayed them against every candidate patch. We measured performance and ensured, to the extent we could, that no functionality was lost. I believe one of the reasons we won the CGC was our QA. It was that important. Others had great techniques for finding and fixing vulnerabilities. We found Mayhem made much better business decisions, such as not fielding a performance-expensive patch by default. Mayhem made intelligent choices based upon quantitative data, in line with a successful strategy.

CGC was completely autonomous — no intervention was permitted in the find, fix, QA, and decision-making cycle of software security. We are proud to have won DARPA’s $60M experiment and mathematical evaluation of autonomous technology. We took our $2M prize money to bootstrap the evolution of Mayhem from a DARPA demonstration to a product.

Working with Design Partners: From Prototype to Product

We received an overwhelming response from enterprises, institutions, and individuals across various industries after our DARPA CGC win. The market need was undeniable, and we began designing our product.

We quickly realized we couldn’t transition the entirety of the Mayhem prototype, as fielded in the CGC, to market. Market validation revealed that the technology that automatically found defects, created test suites, and performed QA on patches was the most desired. We have prioritized this, with the plan of bringing the rest of the Mayhem prototype to production over time.

When you bring a product to market, all the messy, real-life considerations that are abstracted in academia and DARPA work must be taken into account. For example, in CGC…

  • …we didn’t have to worry about ingesting a program. DARPA designed a whole infrastructure for that. As a product, we have to develop easy-to-use ways to ingest new software.
  • …we had 7 system calls, which are theoretically representative of real systems. As a product, we have to deal with the complexity of each OS. We are focusing on Linux today, as we believe it will offer the best user experience out of the gate.

We’re not designing Mayhem as a product in isolation. We have several design partners who committed early on to help realize the vision. The Department of Defense has been a large influencer. They acutely feel the pain from a) having mission-critical systems to secure, and b) having to deal with both having no source code and fitting into DevOps situations. In 2017, we partnered with Defense Innovation Unit, a progressive group within the Department of Defense, to adapt Mayhem’s capabilities into both left-of-ship development processes and right-of-ship testing and validation processes. Our multi-million dollar contract with the Department of Defense became non-dilutive funding for ForAllSecure, continuing our ability to grow while remaining profitable.

In addition to DIU, we are also collaborating with design partners in automotive, Internet of Things (IoT), and high-tech industries to understand how Mayhem is used to secure the software they develop, as well as the software they integrate from third-party suppliers, including partners and open source. Our collaboration with design partners has helped us develop Mayhem into a scalable, enterprise-grade platform with broader architecture support, DevOps integration, and enhanced usability, allowing security, testing, and development teams to bring powerful dynamic security testing into their software lifecycle.

Over the coming year, ForAllSecure will make its solution available to mainstream companies and eventually to end-user consumers. In our first steps, we’re focusing on those with critical systems, as well as those already familiar with behavior testing techniques, such as fuzzing or symbolic execution. Today, the current version of Mayhem is available as part of our early access program. Contact us if you’re interested in learning more.

Additionally, we are scaling our engineering and business teams across our offices in Palo Alto CA, Pittsburgh PA, and Washington D.C. to continue evolving Mayhem. We are expanding our team with people passionate about building autonomous software security. Visit our Careers page to explore open roles.

The Next Chapter

The last two years have been a remarkable journey. I am incredibly proud of what we’ve accomplished for the company and the Mayhem solution. I am grateful to our design partners as well as the ForAllSecure team.

In the next installment of this series, I’ll share more about my vision for ForAllSecure’s future and what’s in store for the next stage of our growth. Stay tuned!

Innovators under 35

I am truly honored to share that I have been named to MIT Technology Review’s prestigious annual list of Innovators Under 35 as a Pioneer. The award, first given by the magazine in 1999, celebrates young innovators who are poised to be leaders in their fields. Many amazing people have been given this award: Larry Page and Sergey Brin of Google; Mark Zuckerberg of Facebook; Max Levchin of PayPal. I am humbled to be in such great company.

Gideon Lichfield, editor-in-chief of MIT Technology Review, said: “MIT Technology Review inherently focuses on technology first – the breakthroughs and their potential to disrupt our lives. Our annual Innovators Under 35 list is a chance for us to honor the outstanding people behind those technologies. We hope these profiles offer a glimpse into what the face of technology looks like today as well as in the future.”

The award is a recognition of my work on Mayhem – the autonomous cyber reasoning system that competed in and won the DARPA Cyber Grand Challenge. While this award is inherently individual, I would like to recognize the amazing people that poured years of their lives into taking Mayhem from a research idea to the winning system, especially David Brumley, Thanassis Avgerinos, Sang Kil Cha, John Davis, Tyler Nighswander, Ryan Goulden and Ned Williamson. Every single one of them deserves this award as much as I do.

Mayhem: from research project to winning the DARPA Cyber Grand Challenge

Mayhem was originally started by David Brumley, Thanassis Avgerinos, Sang Kil Cha and myself at Carnegie Mellon University. We made advances in the area of formal verification of software programs that allowed our analyses to scale to larger software. Our work won an ACM Distinguished Paper award after finding thousands of bugs in Linux software.

Over the past few years, I led the development of the Mayhem Cyber Reasoning System, which culminated in two historic events in 2016: the DARPA Cyber Grand Challenge, the first all-machine hacking competition, which pitted 7 fully autonomous systems against each other, and the DEF CON CTF, which pitted 14 of the best hacking teams against a completely autonomous system.

Mayhem won first place and $2 million in the Cyber Grand Challenge. Mayhem then went on to compete against elite hackers at DEF CON, where it held its ground, showing that first-generation autonomous systems are already competitive with the world’s best hackers.

What machines (currently) lack in creativity, they make up for in speed, tenacity & scale. Mayhem analyzes thousands of programs in parallel in a few hours, a task that would take a human many years of tedious work. Mayhem can find thousands of bugs and previously unknown vulnerabilities in a day running on the cloud. In the time it takes an expert to open up a file, an automated system may have looked at hundreds.

Beyond the Cyber Grand Challenge: defending real-world systems

We’re on the cusp of the age of automated computer security reasoning, and we have to adapt the way we think about security accordingly. In a world where cyberattacks are becoming commonplace and are increasingly leveraged by nation states to disrupt weapon development, power grids, and elections, computer security becomes a national security issue. With the shortage of computer security experts and the increasing volume of software in our daily lives, relying solely on human expertise is insufficient and dangerous. Automated computer security tools are a necessity to protect ourselves.

To meet this need, we founded ForAllSecure with the vision to automatically check the world’s software for security vulnerabilities. We are actively working on adapting our cyber-reasoning technology to secure critical systems and infrastructure both in the public and private sector. If you would like to hear more about how Mayhem can be applied to your software, please go to

Applying Cyber Grand Challenge Technology to Real Software

I first heard about Mayhem when I read that researchers at my university, Carnegie Mellon, had reported 1200 crashes in Debian just by running their binary analysis system on Debian programs for 15 minutes at a time. When I learned that the technology developed by those researchers was spun out as a startup, ForAllSecure, I knew I had to get involved.

Read More

Why ForAllSecure is on MIT Technology Review’s 2017 List of Smartest Companies

I am honored to share that ForAllSecure has been named to MIT Technology Review’s 2017 list of 50 Smartest Companies. According to the MIT Tech Review team, to make the list, a company must exhibit technological leadership and business acumen that set it apart from competitors.

Nanette Byrnes, senior editor for MIT Tech Review business shared:

“Public and private, large and small, based in countries around the globe, this group of companies is creating new opportunities and pouncing on them. These are the ones that competitors must follow.”

Read More

Case Study: LEGIT_00004

LEGIT_00004 was a challenge from the DEF CON CTF that implemented a file system in memory. The intended bug was a tricky memory leak that the challenge author didn’t expect Mayhem to get. However, Mayhem found an unintended null-byte overwrite bug that it leveraged to gain arbitrary code execution. We heard that other teams noticed this bug, but thought it would be too hard to deal with. Mayhem 1 – Humans 0. In the rest of this article, we will explain what the bug was, and how Mayhem used it to create a full-fledged exploit.

Read More

Mayhem Wins DARPA CGC

Mayhem is a fully autonomous system for finding and fixing computer security vulnerabilities. On Thursday, August 4, 2016, Mayhem competed in the historic DARPA Cyber Grand Challenge against other computers in a fully automatic hacking contest… and won. The team walked away with $2 million, which ForAllSecure will use to continue its mission to automatically check the world’s software for exploitable bugs.

Read More

Why CGC Matters to Me

By David Brumley

In 2008 I started as a new assistant professor at CMU. I sat down, thought hard about what I had learned from graduate school, and tried to figure out what to do next. My advisor in graduate school was Dawn Song, one of the top scholars in computer security. She would go on to win a MacArthur “Genius” Award in 2010. She’s a hard act to follow. I was constantly reminded of this because, by some weird twist of fate, I was given her office when she moved from CMU to Berkeley.

The research vision I came up with is the same I have today:

Automatically check the world’s software for exploitable bugs.

To me, the two most important words are “automatically” and “exploitable”. “Automatically” because we produce software far faster than humans could check it manually (and manual analysis is unfortunately far too common in practice). “Exploitable” because I didn’t want to find just any bugs, but those that could be used by attackers to break into systems.

Read More