Fuzzing: How to Automatically Create Complex Test Cases and Uncover Unknown Bugs

04 sept. 2019

8min

Fuzzing: How to Automatically Create Complex Test Cases and Uncover Unknown Bugs

auteur.e

A world-famous security expert who uncovered thousands of bugs impacting billions. He consulted and trained top NASDAQ companies and governments. Speaker @ Black-Hat, HITB, IEEE WCRE…

The advice that it is way more cost effective and efficient to fix bugs early in the development process of a product rather than after shipping has been heard by many developers. And it’s true! Numerous studies suggest that the expense of fixing a security bug increases exponentially as the product moves forward in the Software Development Life Cycle (SDLC). In 2001, Soo Hoo et al. calculated that fixing one at deployment stage is 100 times more expensive than doing it during the development stage.

Security bugs are how skilled researchers have been able to produce system jailbreaks and thus evade having to use application stores to access mobile apps, from which companies such as Apple, Google, and Samsung generate a significant portion of their revenue! And everybody remembers 2016’s Samsung Galaxy Note S7 fiasco with the exploding batteries, which was estimated to have cost the company $17 billion! It’s likely that this bug could have been detected via carrying out fuzzing on several of the interfaces related to power management!

A couple of years ago, the French data-protection authority CNIL fined a housing association €75,000 for a lack of security in its systems. Although not a huge amount for international corporations, it’s a substantial penalty for small businesses. Now, as GDPR laws have advanced, a fine can be set at 4% of the worldwide annual revenue of a company! Imagine the amount that the European Commission could collect from Internet giants such as Google, whose revenue in 2018 was $136.22 billion!

Fuzzing: A definition

Do you remember, under Windows 95, for example, when if you copied and pasted some text from the Internet onto a Microsoft Word document, the process word.exe would crash? Or when Visual Studio would crash when certain gdi32 API calls were made? What a horrible feeling it must have been to lose so much time simply because the products had not been tested for unexpected inputs!

Testing offers a viable family of approaches for detecting implementation bugs. Many developers are familiar with the concept of testing mostly in the form of functional testing, such as with unit tests, where you have a fixed set of inputs and call your function with these parameters and generally use a set of unit tests as a regression suite.

On the other hand, with negative testing, rather than assessing the conformance of a function with regard to certain fixed inputs, you focus on generating and submitting inputs with the intention of triggering a fault. Fuzzing is a particular class of negative testing where you automatically generate and evaluate abnormal inputs in order to trigger the effects of the targeted bug family (or families). It is also sometimes referred to as an “act of software torture” (Vuagnoux, 2005), a term that was coined initially by Barton Miller (Barton et al., 1989; Forrester and Miller, 2000), because it involves generating and submitting a high quantity of partially malformed inputs to the system under test conditions in the hope of triggering the effects of a fault.

Users of fuzzing

In the Silicon Valley, fuzzing is used by the tier 1 tech companies that make up FAANG—Facebook, Apple, Amazon, Netflix, and Google—to ensure the robustness and security of their products, and in particular to search for security bugs, or in other words, vulnerabilities.

Any entity that develops a sufficiently complex or critical product, in terms of handling critical data, or a product that has a critical impact or a significant competitive advantage, should use fuzzing not just for its software stack, but also its hardware stack. In fact, it would be unrealistic to ignore the fact that, for any protocol or data processing, fuzzing ought to be used. For example, some banks need to fuzz the input format of SWIFT messages, and some car manufacturers need to fuzz the messages sent between different units in smart cars.

Any development team should use fuzzing as a form of ultimate unit test on at least 80% of their core functions, such as audio, video, networking, and pattern processing. Those areas particularly prone to errors include hardware accelerated paths with handwritten assembly portions, complex state-based operations, and “spaghetti code” (code sections with high cyclomatic complexity, or functions that have a consequential number of lines, or where the contracts are either not super-clear or not verified at runtime).

A fuzzing overview

The best way to think of fuzzing is as the ultimate form of unit testing!

Let’s consider this C program:

int main(int argc, char** argv) {    if(argv[1][0]=="b") {        if(argv[1][1]=="o") {            if(argv[1][2]=="o") {                if(argv[1][3]=="m") {                    assert(false); // to trigger a crash    }    }    }    }    return 0;}

This program will crash if we call it via ./pgm_name boom

At the stage called Identify The Target, given a target function, fuzzing will basically fix all the parameters apart from one buffer that typically can be “tainted” by the attacker (for example, httpd, which receives parameters from the user via recv()). At the stage Generate Fuzzed Data you will then iterate over the domain value of that buffer parameter. When coupled with grey-box techniques, such as the edge-coverage one explained later in this article, fuzzing will discover that input “b” covers an edge that is not covered by, for instance, input “a”. It will then keep “b” for later use. In a subsequent Generate Fuzzed Data step, one example of a mutation operator is “append a character”. It’s the same story here: Appending “o” will construct the input “bo”, which covers an edge not covered before. Thus we will keep this input until fuzzing creates the input “boom”, which will trigger a failure, here in the form of an assertion. This input “boom” will be kept as a crashing input, which can then be used as a regression input in your test suite.

More graphically, with grey-box fuzzing techniques, you are able to automatically generate inputs such as the one illustrated here, which is a fuzzing of a GIF-rendering library.

Types of bugs found thanks to fuzzing

Fuzzing is traditionally applied to searching for memory-corruption bugs. When developing a modern application, we are generally interested in more than just security bugs. Thus, coupling fuzzing techniques with error detectors will expand your bug-finding capabilities.

Here are classes of bugs that can be detected by combining fuzzing with some of those test verdicts:

Memory-corruption and safety bugs: Many critical security bugs, such as Heap-Based Buffer Overflow, Use After Free, and Stack-Based Buffer Overflow, have a significant impact on security, while Uninitialized Variable and Integer Overflow can also lead to the security of your product being compromised.
Scheduling bugs:Race Condition, Hang, Livelock.
Assertions: Typically found in debug builds, although some are also seen in production, such as in web browsers.
Performance bugs: Memory leaks, which ultimately affect the disk operating system (DOS) of the OS after prolonged use without reboot—something that is still currently a typical problem with several OS!
Web and mobile GUI security bugs:Cross-Site Scripting (XSS), SQL Injection, XPATH Injection, PHP Code Injection, and Shell Command Injection.

The different flavors and smartness of fuzzing

In its basic essence, fuzzing can be seen as piping the output of ‘/dev/random’ to the input of your program. There are, however, different degrees of smartness, depending on the knowledge of the input format and the target, as outlined below.

Fuzzing and knowledge of the input format

Based on the degree of knowledge assumed a priori over the target, we can distinguish different methods of fuzzing as follows:

Dumb: In other words, random bit flipping.
Grammar-based generative and mutative fuzzing: This is where the tester writes a grammar that expresses a set of production rules regarding an approximation of the words accepted by the fuzzing target.
Inference: After submitting each input to the target, new knowledge is gained and reinjected into the input-generation process.

An example of grammar-based fuzzing, where words of the grammar are generated that will then become inputs. An example of this can be seen in the use of HTML files to fuzz a browser such as Microsoft Internet Explorer.

Fuzzing and knowledge of the target

We typically differentiate between black-box fuzzing, which assumes no knowledge of the internal workings of the application, and grey-box fuzzing, where we have the ability to capture some knowledge about the inner workings of an application (for example, which basic blocks at the assembly level have been executed by the application and the dependencies DLL/SO).

For black-box and approximate inference purposes, Radamsa from OUSPG is generally one of the most-used mutators. Meanwhile, American Fuzzy Lop, written by Michal Zalewski, was one of the first public fuzzers to democratize grey-box fuzzing. It works by using knowledge acquired by processing an input that will partially guide you in—or at least increase your knowledge of—how to generate the next inputs. This is typically achieved by recording knowledge at runtime or, for example, via instrumentation added at compile time to measure the executed basic blocks at the assembly level.

The initial investment cost of a fuzzing and sanitizer harness

Depending on whether any security analysis or bug-finding technique has ever been applied to the codebase, the return on investment is generally extremely valuable, not only in terms of bug-finding capabilities and bug-fixing efficiency (from birth to patch), as noted by the US-Computer Emergency Readiness Team, but also with regard to the overall cost for the company or entity. The cost, bad publicity, and impact on stock price suffered by Intel, Nvidia, and Qualcomm when they’ve needed to patch an architectural flaw that has security implications is proof of the latter!

In recent years, developments in LLVM (a project focusing on modular and reusable compiler and toolchain technologies) have also made fuzzing more affordable. However, do not underestimate the cost of producing separate fuzzing builds. These should be additional compilation profiles that are separate from your debug builds, typically when all assertions are enabled. Also, do not underestimate the infrastructure and maintenance cost of elements such as storage, computation, and data analysis.

The trade-offs of fuzzing

Beware: Fuzzing is not the perfect solution when it comes to the general problem of identifying and fixing all the bugs in your application! In particular, due to its generally mutational nature, it is generally weak at passing conditions where part of an input is the result of an operation applied to another part, for example a hash value. In such cases, current solutions include a manual limitation analysis and then eventually simplifying this type of code in fuzzing builds via conditional preprocessor definitions (via #ifdef, for example).

The bug-discovery capabilities via fuzzing are also limited by your ability to create new inputs, instrument your code and libraries, exercise relevant portions of your code without a graphical interface, and compute at every state a test verdict depending on which fault you want to expose.

Smart fuzzing: The future

The most likely future directions fuzzing could take include the following:

Continuous integration fuzzing: For this, on each commit of any developer, you recompile, instrument, and fuzz the targets in order to minimize the lifetime of bugs as much as possible.
Automatic generation of fuzzing players: Many companies of medium maturity need skilled developers to be able to tackle the functions they want to fuzz. For this, developers need to write fuzzing players that will call these functions and set up the required states to reach them, respecting the contracts expected by these functions. The next stage of maturity is the ability to automatically generate such fuzzing players.
Model-based fuzzing: This consists of having, or iteratively building, a model of the input in the form of complex grammars.
Concolic execution: This involves marking some parts of the inputs as symbolic, some parts as concrete, then computing the equations for reaching each basic block and solving those huge equations with constraint solvers. This permits you to obtain new concrete inputs for testing the parameters of the functions.
Taint-driven fuzzing: This is aimed specifically at certain “sink functions”, for higher precision when mutating.

As always in computer-security research and engineering, do not assume that fuzzing will help you to identify all the bugs in your products. It is also recommended that you learn how to use and tune static analysis checkers and concolic execution tools. And don’t hesitate to seek the advice of those who are expert in red team penetration testing and code audit capabilities so that you can identify the most critical security breaches in your products before they get exploited by malicious parties.

This article is part of Behind the Code, the media for developers, by developers. Discover more articles and videos by visiting Behind the Code!

Want to contribute? Get published!

Follow us on Twitter to stay tuned!

Illustration by Victoria Roussel

Les thématiques abordées

Testing

Inspirez-vous davantage sur : Coder stories

We can learn a lot by listening to the tales of those that have already paved a path and by meeting people who are willing to share their thoughts and knowledge about programming and technologies.

Keeping up with Swift's latest evolutions

Daniel Steinberg was our guest for an Ask Me Anything session (AMA) dedicated to the evolutions of the Swift language since Swift 5 was released.

10 mai 2021

"We like to think of Opstrace as open-source distribution for observability"

Discover the main insights gained from an AMA session with Sébastien Pahl about Opstrace, an open-source distribution for observability.

16 avr. 2021

The One Who Co-created Siri

Co-creator of the voice assistant Siri, Luc Julia discusses how the back end for Siri was built at Apple, and shares his vision of the future of AI.

07 déc. 2020

The Breaking Up of the Global Internet

Only 50 years since its birth, the Internet is undergoing some radical changes.

26 nov. 2020

On the Importance of Understanding Memory Handling

One concept that can leave developers really scratching their heads is memory, and how programming languages interact with it.

27 oct. 2020

En voir plus

La newsletter qui fait le taf

Envie de ne louper aucun de nos articles ? Une fois par semaine, des histoires, des jobs et des conseils dans votre boite mail.

Vous êtes à la recherche d’une nouvelle opportunité ?

Plus de 200 000 candidats ont trouvé un emploi sur Welcome to the Jungle.

Explorer les jobs