What is AFL++?
AFL++ (American Fuzzy Lop plus plus) is a fuzzer developed from the American Fuzzy Lop (AFL) fuzzer to incorporate features from the wider AFL family of fuzzers that were never merged into AFL itself. The AFL++ framework comprises afl-fuzz, a fuzzer with many configurations and mutation strategies, different source- and binary-code instrumentation modules, utilities for handling test cases, and helper libraries (Heuse et al., 2022). The community-driven open-source tool offers various features, including a custom mutator API that allows the fuzzing process to be extended at many stages. Fuzzing entails uncovering bugs in a fully automated fashion (Zhang, 2022). The tool was developed as a collection of patches and forks from the AFL family of fuzzers, with additional research-grade extensions that make it ready for practical use. AFL++ has been described as superior to Google’s AFL because it is faster, includes more and better mutations, has better instrumentation, and supports custom modules (Zhang, 2022). The automated software testing tool was designed to create inputs that induce crashes and feed them to the target program or server being tested for vulnerabilities. The tool is primarily aimed at C programs and ships a special compiler that instruments the target program or server with assembly blocks acting as a series of indicators that help track the program’s behavior when executed with a given input.
How AFL++ Works
Fuzz testing, also referred to as fuzzing, involves the use of a tool to automate software testing by injecting invalid, malformed, or unexpected inputs into software to help uncover defects and flaws (What Is Fuzz Testing and How Does It Work?, n.d.). The AFL++ tool injects inputs into the system under test and then monitors for exceptions such as crashes or leakage of information. This is done by throwing data at the software in several ways: mutating structured sample inputs a little at a time, feeding random data via stdin or from a file, and instrumenting the code so that the code paths triggered by each input can be observed and fed back into the data generator (What Is Fuzz Testing and How Does It Work?, n.d.). Some fuzzers are “blind”: random inputs are generated and sent to the target program or server without any feedback, a drawback because success then relies on luck rather than on knowledge of the input format.
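The “blind” approach can be sketched in a few lines of C. The target function parse_record() below is hypothetical, with an artificial crash condition, and the fuzzer has no feedback at all; it illustrates why purely random input generation rarely finds deep bugs.

```c
/* A minimal sketch of "blind" fuzzing. parse_record() is a
 * hypothetical buggy target, not a real API: it "crashes"
 * (returns -1) only when the input starts with the magic "FUZ". */
#include <stdlib.h>
#include <string.h>

int parse_record(const unsigned char *buf, size_t len) {
    if (len >= 3 && memcmp(buf, "FUZ", 3) == 0)
        return -1;                 /* simulated crash */
    return 0;                      /* parsed fine */
}

/* Throw `iters` purely random inputs at the target; return the
 * number of crashes observed. With no feedback, hitting the 3-byte
 * magic is a roughly 1-in-16-million event per attempt. */
int blind_fuzz(unsigned iters, unsigned seed) {
    srand(seed);
    int crashes = 0;
    unsigned char buf[8];
    for (unsigned i = 0; i < iters; i++) {
        size_t len = 1 + (size_t)(rand() % (int)sizeof buf);
        for (size_t j = 0; j < len; j++)
            buf[j] = (unsigned char)(rand() % 256);
        if (parse_record(buf, len) < 0)
            crashes++;
    }
    return crashes;
}
```

Coverage-guided fuzzers like AFL++ replace the blind random loop with instrumentation feedback, as described in the following sections.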
AFL++ is a collection of patches from the AFL family of tools and hence has no single principle of operation. One of its core techniques is coverage measurement, in which instrumentation injected into the target program records which parts of the code each input reaches (Technical “Whitepaper” for Afl-Fuzz, n.d.). This coverage provides insight into the code execution path, which helps discover fault conditions that can cause vulnerabilities in the program.
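The coverage scheme described in the cited whitepaper can be sketched as follows: every basic block is assigned a random ID at compile time, and a small shim updates a shared bitmap keyed on the (previous block, current block) pair, so the map captures edges rather than just blocks. The function names below are illustrative, not AFL++’s real internals.

```c
/* Sketch of AFL-style edge coverage, following the scheme in the
 * afl-fuzz whitepaper: a bitmap indexed by XOR of block IDs, with
 * the previous ID shifted so that A->B and B->A stay distinct. */
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536

uint8_t coverage_map[MAP_SIZE];
static uint32_t prev_location;

/* Called at the start of every instrumented basic block. */
void cover_block(uint32_t cur_location) {
    coverage_map[(cur_location ^ prev_location) % MAP_SIZE]++;
    prev_location = cur_location >> 1;   /* preserve edge direction */
}

/* Reset state before each run of the target. */
void cover_reset(void) {
    memset(coverage_map, 0, sizeof coverage_map);
    prev_location = 0;
}

/* Count how many distinct edges (tuples) the run touched. */
int edges_hit(void) {
    int n = 0;
    for (int i = 0; i < MAP_SIZE; i++)
        if (coverage_map[i]) n++;
    return n;
}
```

The XOR-of-IDs trick keeps the per-branch cost to a couple of instructions, which is why instrumented targets stay fast enough for thousands of executions per second.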
Detecting new behaviors
The AFL++ fuzzer keeps a global map of tuples (branch transitions) against which individual execution traces are compared; traces produced by mutated inputs that exhibit new transitions are preserved for later processing, while inputs that do not cause new transitions are discarded (Technical “Whitepaper” for Afl-Fuzz, n.d.). Because it can instrument both source code and binaries, the fuzzer learns from each input it executes, recycling interesting inputs and mutating them further to narrow down specific issues in the target program or server (Fuzzing IoT Binaries with AFL++ – Part I, 2022).
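The keep-or-discard decision can be sketched as a comparison between a run’s trace bitmap and the global map of tuples seen so far (a simplified version of what the whitepaper calls the “virgin” map; the function name here is an assumption):

```c
/* Sketch of "is this input interesting?": compare a run's trace
 * bitmap against a global map of all tuples ever observed. If the
 * trace holds any tuple not seen before, record it and keep the
 * input; otherwise the input can be discarded. */
#include <stdint.h>

#define MAP_SIZE 65536

uint8_t virgin_map[MAP_SIZE];   /* tuples seen across all runs */

int has_new_bits(const uint8_t *trace) {
    int new_bits = 0;
    for (int i = 0; i < MAP_SIZE; i++) {
        if (trace[i] && !virgin_map[i]) {
            virgin_map[i] = 1;   /* remember the new tuple */
            new_bits = 1;
        }
    }
    return new_bits;
}
```

A second identical run reports nothing new, which is exactly why duplicate behaviors cost the fuzzer almost no queue space.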
Evolving the input queue
The fuzzer evolves the input queue by adding mutated test cases that produce new transitions (Technical “Whitepaper” for Afl-Fuzz, n.d.). The tool starts from an “input corpus,” a set of files containing typical inputs for the target server or program. AFL++ runs indefinitely (or for a specified time), executing the program many times per second, feeding in each candidate input and watching how the program behaves (Fuzzing IoT Binaries with AFL++ – Part I, 2022). Inputs that cause crashes or timeouts are stored in a separate location.
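The evolutionary loop can be condensed into a toy model. Here run_target() is a stand-in for an instrumented execution whose coverage “tuple” is how many bytes of a hypothetical magic value the input matches; the queue grows only when a mutant reveals a new tuple, and that is what lets the loop climb toward the bug one byte at a time.

```c
/* Toy sketch of the evolutionary queue. run_target() is a stand-in
 * for instrumented execution: it reports, as a coverage tuple, how
 * many leading bytes of the input match the magic "BUG". */
#include <string.h>

#define QCAP 64

typedef struct { unsigned char buf[3]; } entry_t;

entry_t queue[QCAP];
int queue_len;
int seen[4];                     /* tuples 0..3 = magic bytes matched */

int run_target(const unsigned char *buf) {
    const char *magic = "BUG";
    int n = 0;
    while (n < 3 && buf[n] == (unsigned char)magic[n]) n++;
    return n;
}

/* Walk the queue, mutate each entry one byte at a time through all
 * 256 values, and keep only mutants that light up a tuple never seen
 * before. Returns 1 once tuple 3 (the full "BUG" magic) is reached. */
int evolve(void) {
    memset(seen, 0, sizeof seen);
    queue_len = 1;
    memset(queue[0].buf, 0, sizeof queue[0].buf);
    seen[run_target(queue[0].buf)] = 1;

    for (int qi = 0; qi < queue_len; qi++)
        for (int pos = 0; pos < 3; pos++)
            for (int v = 0; v < 256; v++) {
                entry_t m = queue[qi];
                m.buf[pos] = (unsigned char)v;
                int tuple = run_target(m.buf);
                if (!seen[tuple] && queue_len < QCAP) {
                    seen[tuple] = 1;
                    queue[queue_len++] = m;   /* new transition: keep */
                    if (tuple == 3) return 1; /* reached the "bug" */
                }
            }
    return 0;
}
```

A blind fuzzer would need about 2^24 tries to hit the 3-byte magic; the coverage-guided loop finds it in at most 3 x 256 mutations per queue entry.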
Culling the corpus
The corpus is the set of files containing possible inputs. As the input queue evolves, AFL++ periodically re-evaluates it using an algorithm that selects a smaller subset of test cases that still covers every tuple seen so far, providing a balance between the speed of queue cycling and the diversity of test cases (Technical “Whitepaper” for Afl-Fuzz, n.d.).
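A simplified version of this culling pass is easy to state in code: for every tuple, mark as “favored” the cheapest queue entry (by an execution-time x file-size score) that covers it. The struct layout and names below are illustrative, not AFL++’s actual data structures.

```c
/* Sketch of corpus culling: per tuple, favor the queue entry with
 * the lowest score (exec_time * size) that covers it. Favored
 * entries get most of the fuzzing time; the rest are kept but
 * deprioritized. */
#define NTUPLES 8

typedef struct {
    unsigned score;                 /* exec_time * file size */
    unsigned char covers[NTUPLES];  /* 1 if entry exercises tuple i */
    int favored;
} qentry_t;

void cull_queue(qentry_t *q, int n) {
    for (int i = 0; i < n; i++)
        q[i].favored = 0;
    for (int t = 0; t < NTUPLES; t++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (q[i].covers[t] && (best < 0 || q[i].score < q[best].score))
                best = i;
        if (best >= 0)
            q[best].favored = 1;    /* cheapest cover for tuple t */
    }
}
```

Entries whose coverage is entirely subsumed by cheaper entries end up unfavored, which is how the queue stays fast to cycle without losing any tuple.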
Trimming input files
Large input files hurt fuzzing performance, so AFL++ uses instrumentation feedback to automatically trim them down, keeping only changes that do not affect the execution path. The minimization algorithm attempts to zero large blocks, performs block deletion, and applies alphabet normalization and byte-by-byte normalization on non-zero bytes (Technical “Whitepaper” for Afl-Fuzz, n.d.).
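The block-deletion step can be sketched as follows. Here path_hash() is a stand-in for “execute the target and checksum the trace bitmap” (in this toy model the path depends only on the 'A' bytes in the input); a deletion is committed only when the hash is unchanged.

```c
/* Sketch of input trimming. path_hash() stands in for running the
 * instrumented target and hashing its trace; in this toy model the
 * execution path depends only on the 'A' bytes. */
#include <string.h>
#include <stdint.h>

uint32_t path_hash(const unsigned char *buf, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 'A')
            h = h * 31 + 1;
    return h;
}

/* Delete `block`-sized chunks wherever the path hash stays the same.
 * Trims buf in place and returns the new length. `block` must be
 * <= 256 in this sketch (size of the scratch buffer). */
size_t trim_input(unsigned char *buf, size_t len, size_t block) {
    uint32_t want = path_hash(buf, len);
    size_t pos = 0;
    while (pos < len) {
        size_t n = block < len - pos ? block : len - pos;
        unsigned char saved[256];
        memcpy(saved, buf + pos, n);
        memmove(buf + pos, buf + pos + n, len - pos - n);
        if (path_hash(buf, len - n) == want) {
            len -= n;                            /* path kept: commit */
        } else {
            memmove(buf + pos + n, buf + pos, len - pos - n);
            memcpy(buf + pos, saved, n);         /* restore, move on */
            pos += n;
        }
    }
    return len;
}
```

Because every candidate deletion is verified against the trace, the trimmed file is guaranteed to exercise the same path as the original, just faster.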
Instrumentation feedback also makes it easy to evaluate different fuzzing strategies and to optimize their parameters (Technical “Whitepaper” for Afl-Fuzz, n.d.). AFL++ fuzzing strategies include sequential bit flips, sequential addition and subtraction of small integers, and sequential insertion of known interesting integers such as 0, 1, and INT_MAX.
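Each of these deterministic stages is a simple byte-level operation. A minimal sketch of the three primitives (the function names are illustrative):

```c
/* Sketch of three deterministic mutation primitives: walking bit
 * flips, small-integer add/sub, and substitution of "interesting"
 * constants such as 0, 1, and INT32_MAX. */
#include <stdint.h>
#include <string.h>

/* Flip one bit of the buffer (bit index counts across bytes). */
void flip_bit(unsigned char *buf, size_t bit) {
    buf[bit >> 3] ^= (unsigned char)(1u << (bit & 7));
}

/* Add or subtract a small integer; wraps modulo 256 like afl-fuzz's
 * 8-bit arithmetic stage. */
void add_to_byte(unsigned char *buf, size_t pos, int delta) {
    buf[pos] = (unsigned char)(buf[pos] + delta);
}

/* Overwrite a 32-bit word with a known-troublesome constant. */
void set_interesting32(unsigned char *buf, size_t pos, int32_t val) {
    memcpy(buf + pos, &val, sizeof val);
}
```

In the real fuzzer these primitives are applied sequentially at every position of the input, which is why deterministic stages dominate the first pass over each new queue entry.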
De-duping crashes is a technique AFL++ uses to identify unique crashes based on two conditions: the crash trace includes a tuple not seen in any earlier fault, or the crash trace is missing a tuple that was present in earlier faults (Technical “Whitepaper” for Afl-Fuzz, n.d.). This way, the tool ensures only unique crashes are reported.
This de-duplication contributes to AFL++’s efficiency by keeping crash triage unambiguous. In addition, when investigating a crash, the fuzzer relies on instrumentation feedback to discard mutations of the crashing input that no longer crash, isolating the parts of the input that actually matter for the fault (Technical “Whitepaper” for Afl-Fuzz, n.d.).
The fork server
The AFL++ fuzzer uses a “fork server” to improve performance by cloning an already-initialized copy of the fuzzed process for every run instead of executing the target from scratch. The target is stopped at the first instrumented function, and this design also makes possible a manual (deferred) fork server mode and a “persistent” mode that tries out multiple inputs in a single process (Technical “Whitepaper” for Afl-Fuzz, n.d.).
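The core idea is plain POSIX fork(): pay initialization cost once, then clone a fresh copy of the process per input. The sketch below (target_main() is a hypothetical target body) shows the parent waiting on each child the way afl-fuzz waits on the fork server’s children; it assumes a POSIX system.

```c
/* POSIX sketch of the fork-server idea: rather than execve()-ing
 * the target for every input, an already-initialized process
 * fork()s a fresh copy per run, so startup cost is paid once. */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical instrumented target body: pretend input 42 crashes. */
int target_main(int input) {
    return input == 42 ? 1 : 0;
}

/* Run one input in a forked child and return its exit status,
 * as the parent (the fuzzer side) would observe it. */
int fork_and_run(int input) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(target_main(input));   /* child: run target, report */
    int status = 0;
    waitpid(pid, &status, 0);        /* parent: collect the verdict */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the clone inherits the parent’s fully initialized memory image, dynamic-linker and startup work is amortized across all executions, which is where much of the fork server’s speedup comes from.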
Parallelization entails periodically examining the queues produced by other fuzzer instances and pulling in test cases that produce fuzzer behaviors not yet seen locally (Technical “Whitepaper” for Afl-Fuzz, n.d.).
Binary-only instrumentation uses QEMU in user mode, which also allows binaries built for other architectures to be executed. Other binary translators exist, including DynamoRIO and PIN, but QEMU is the fastest option because it can leverage a fork server resembling the one used with compiler-instrumented code (Technical “Whitepaper” for Afl-Fuzz, n.d.).
Why is AFL++ useful in fuzzing code?
AFL++ is useful because you can compile your code with it and then use the resulting coverage information to establish how to trigger code paths using intelligent mutations. For example, given a program that parses PNG files, AFL++ started with essentially no meaningful seed data can eventually generate valid PNG images on its own (Godefroid, 2020). AFL++ is effective at finding bugs: it automates the software testing process, enabling security vulnerabilities to be found with increased efficiency and accuracy (Godefroid, 2020). By automating test generation and execution, the fuzzer saves time and cost, and this approach has been able to detect thousands of security vulnerabilities across different kinds of software. AFL++ fuzzing comes in handy at every untrusted interface in a product development lifecycle: when developers are writing software that is likely to process untrusted input, fuzzing is an important tool, and when working on stand-alone software with large and complex data parsers, AFL++ helps uncover bugs with increased effectiveness (Godefroid, 2020). Static program analysis and manual code inspection are less thorough, and some vulnerabilities and bugs may be missed.
When using the AFL++ fuzzer, vulnerabilities and bugs that might have been missed during static code analysis or manual inspection of the code can be detected. AFL++ supports several modes, including QEMU mode, in which QEMU’s user-mode emulation is used to run the binary (Hazimeh et al., 2020). QEMU instruments the basic blocks as the program runs, and that instrumentation generates the information used to create new test cases that trigger different code paths and improve coverage. QEMU mode also enables instrumentation of foreign-architecture binaries, such as an ARM binary on an x86_64 host, which is essential for fuzzing IoT firmware binaries. However, the fuzzer works with file inputs only; it does not support programs that take input from a socket (Hazimeh et al., 2020). Code fuzzing with AFL++ gives a good overall picture of the code quality of the system or server being tested, helping assess its robustness and security posture. Hackers use fuzzing as a primary technique to uncover vulnerabilities, so fuzzing with AFL++ helps prevent zero-day exploits that can arise from unknown bugs and flaws in the system (Hazimeh et al., 2020). Also, AFL++ involves low overhead cost and saves time because the process is automated compared to static analysis and manual code inspection.
What is static code analysis?
Static code analysis is a source-code debugging methodology carried out as part of a code review process, in which the source code itself is audited rather than a compiled program. It is performed during the implementation phase of a security development lifecycle and refers to running tools that uncover possible flaws and vulnerabilities in the source code using techniques such as taint analysis and data-flow analysis (Prähofer et al., 2012). The process entails comparing the source code against a predefined set of rules to uncover potential security flaws. Static code analysis is carried out in the early development phases, before testing of the software commences (Kulenovic & Donko, 2014). Data-flow analysis is one such static analysis technique for checking for vulnerabilities. An example of a static code analysis tool is FindBugs, a popular open-source tool for Java.
Another technique is control-flow analysis, which determines the paths a program can take, using branches and loops to reveal the potential paths and their dependencies. The methods used in control-flow analysis include computing dominance relationships, whereby the analyzer determines which parts of the code or statements are always executed first, and finding strongly connected components, which establishes the loops in the program (Kulenovic & Donko, 2014). Data-flow analysis, on the other hand, determines where data originates, where it goes, and how the code uses it. Its methods include live variables, available expressions, and reaching definitions (Kulenovic & Donko, 2014). Reaching definitions establishes which variable assignments may reach a particular part of the code; live-variable analysis determines the lifetime of variables and thus detects dead variables; and available-expressions analysis identifies expressions that have already been evaluated so that redundant computation can be removed.
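One of these analyses, live variables, can be sketched concretely. The toy IR below models each statement as one defined variable plus up to two used variables (0 meaning “none”); scanning backwards while tracking liveness flags every dead store, i.e. a definition that is overwritten or never used. This is a simplification that ignores branches, which real analyzers must handle.

```c
/* Sketch of live-variable analysis on straight-line code: a
 * definition is a dead store if the variable is redefined or the
 * program ends before any use. Variables are single chars; 0 means
 * "no variable" in a slot. */
typedef struct {
    char def;        /* variable defined by this statement, or 0 */
    char use[2];     /* variables read by this statement, or 0 */
} stmt_t;

/* Set dead[i] = 1 for every statement whose definition is never
 * used, scanning backwards and tracking which variables are live. */
void find_dead_stores(const stmt_t *s, int n, int *dead) {
    unsigned char live[128] = {0};
    for (int i = n - 1; i >= 0; i--) {
        /* the store is dead if its target is not live afterwards */
        dead[i] = (s[i].def && !live[(int)s[i].def]);
        if (s[i].def)
            live[(int)s[i].def] = 0;   /* definition kills liveness */
        for (int u = 0; u < 2; u++)
            if (s[i].use[u])
                live[(int)s[i].use[u]] = 1;  /* uses make vars live */
    }
}
```

For the sequence a=1; a=2; b=a; return b, the analysis flags only the first assignment to a as dead, exactly the kind of finding a tool like FindBugs reports.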
How does it work?
Program compilation proceeds through a series of transformation stages, from the input phase through scanning to parsing. Static code analyzers consume the output of these stages and use different techniques to check the code for bugs and vulnerabilities (Prähofer et al., 2012). This includes scanning, whereby the analyzer checks the code for errors such as unused imports, too many nested loops, curly brackets used instead of square brackets, or single quotes used instead of double quotes, among other defects. The static code analyzer does this by checking the code against coding rules defined in standards to establish whether the code complies with them (Prähofer et al., 2012). The types of static analysis include control analysis, which focuses on control flow; data analysis, which examines how data is defined and used; fault analysis, which analyzes faults and failures; and interface analysis, which checks that the code’s interfaces fit the model.
Why is static code analysis used to discover bugs and software vulnerabilities?
Injection vulnerabilities are among the most common vulnerabilities found during static code analysis. For instance, a static code analysis tool can check code that performs SQL queries to determine whether the queries depend on untrusted or external input and whether that input is sanitized to remove potentially malicious content before use. Other common findings include SQL and LDAP injection, cross-site scripting, and buffer overflows. Static code analysis also finds other coding issues, including programming errors, syntax violations, and security vulnerabilities (Louridas, 2006). It is used to discover vulnerabilities because it is relatively simple provided it is automated, and numerous tools exist for different programming languages, including C and C++, Java, and C#. Static code analysis is used to discover bugs and software vulnerabilities because it offers speed, depth, and accuracy. Tools automate the process of checking for bugs and vulnerabilities, which is faster than manual code evaluation. The methodology surfaces problems early and points out where the errors are in the code, so they can be fixed sooner, which also makes debugging less costly (Louridas, 2006). Unlike software testing, a static code analyzer can consider every possible code execution path, offering in-depth analysis that finds potential problems that testing or manual review could miss. Accuracy is also improved compared to manual code analysis, which is prone to human error (Louridas, 2006). Automated tools can check each line of code for potential problems, ensuring high-quality code before software testing starts, and they help ensure compliance with coding standards.
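The SQL-injection rule described above is essentially a taint-tracking check: data from an untrusted source must pass through a sanitizer before reaching a query sink. The sketch below models that rule with a taint flag; the function names and struct are hypothetical, and the sanitizer's actual escaping is elided.

```c
/* Toy sketch of the taint rule behind SQL-injection findings:
 * values carry a taint flag; a tainted value reaching the query
 * sink without passing through the sanitizer is a finding. */
typedef struct {
    const char *text;
    int tainted;
} value_t;

/* Source: anything from the user starts out tainted. */
value_t from_user_input(const char *s) {
    value_t v = { s, 1 };
    return v;
}

/* Sanitizer: clears the taint (real escaping elided here). */
value_t sanitize(value_t v) {
    v.tainted = 0;
    return v;
}

/* Sink: returns 1 (report a finding) if tainted data reaches it. */
int run_query(value_t v) {
    return v.tainted;
}
```

A static analyzer performs this same propagation symbolically over all paths of the program, flagging any source-to-sink flow that skips the sanitizer.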
Fuzzing IoT binaries with AFL++ – Part I. (2022, January 13). Attify Blog – IoT Security, Pentesting, and Exploitation. https://blog.attify.com/fuzzing-iot-devices-part-1/
Godefroid, P. (2020, March 4). Fuzzing: Using automated testing to identify security bugs in software. Microsoft Research. https://www.microsoft.com/en-us/research/blog/a-brief-introduction-to-fuzzing-and-why-its-an-important-tool-for-developers/
Hazimeh, A., Herrera, A., & Payer, M. (2020). Magma: A Ground-Truth Fuzzing Benchmark. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4(3), 1–29. https://doi.org/10.1145/3428334
Heuse, M., Eißfeldt, H., Fioraldi, A., & Maier, D. (2022, January 1). AFL++ – AFL Plus Plus. GitHub. https://github.com/AFLplusplus/AFLplusplus
Ji, T., Wang, Z., Tian, Z., Fang, B., Ruan, Q., Wang, H., & Shi, W. (2020). AFL: Direction-sensitive fuzzing. Journal of Information Security and Applications, 54, 102497. https://doi.org/10.1016/j.jisa.2020.102497
Kulenovic, M., & Donko, D. (2014). A survey of static code analysis methods for security vulnerabilities detection. 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). https://doi.org/10.1109/mipro.2014.6859783
Louridas, P. (2006). Static code analysis. IEEE Software, 23(4), 58–61. https://doi.org/10.1109/ms.2006.114
Prähofer, H., Angerer, F., Ramler, R., Lacheiner, H., & Grillenberger, F. (2012, September 1). Opportunities and challenges of static code analysis of IEC 61131-3 programs. IEEE Xplore. https://doi.org/10.1109/ETFA.2012.6489535
What Is Fuzz Testing and How Does It Work? (n.d.). Synopsys. https://www.synopsys.com/glossary/what-is-fuzz-testing.html
Zhang, W. (2022). Obtaining Fuzzing Results with Different Timeouts. 2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). https://doi.org/10.1109/icstw55395.2022.00048