In a recent report, published by Veracode, it is pointed out that a high percentage of commercial and open source softwares is written in C/C++ and thereby, making them highly susceptible to many dangerous attacks. Under this category, buffer overflow (BoF) tops the list with 32% of the total vulnerabilities. In the present work, we focus on BoF vulnerability in C code.
There have been numerous studies in the literature to detect the presence of BoF in applications. These studies can broadly be categorized under static and dynamic code analysis approaches. Both of the approaches have their pros and cons.
Static code analysis is capable of analyzing all the possible paths from the tainted source to potentially vulnerable statements. However, it may produce false positives when dealing with complex language constructs or non trivial sanitization functions. It is also sensitive to the artifacts introduced by the compiler. For example, an off-by-one error can be detected by a static source code analyzer, but its actual exploitability will be known only after the compilation.
On the contrary, Dynamic analysis can be very accurate in vulnerability detection, but it faces a hard time in finding paths that activate the vulnerability. The discipline of generating inputs that trigger a vulnerability is termed as fuzzing].
Fuzzing can be completely random or intelligent. In the later case, a possible approach is to use a symbolic execution technique to produce a list of constraints that should be satisfied in order to execute a given path (the so-called path-condition). These path constraints are presented to a constraint solver to get some possible inputs that satisfy them. However, the obtained results highly depend on the solver’s efficiency. An alternative approach consists in re-phrasing the input generation as a search problem, that can be solved using dedicated techniques like genetic algorithms.
In this paper we propose a hybrid approach to detect BoF vulnerabilities. First, a static code analysis is used to generate Taint Dependency Sequences (TDS) representing a subset of tainted paths leading to a potential vulnerability. Then, a dynamic analysis is used to generate concrete inputs allowing to execute one of these paths.
This dynamic part relies on a genetic algorithm. The fitness function that we use associates a score to each current input according to the runtime dynamics of the application: each obtained execution trace is compared with the TDS that we target, and a next generation of inputs is produced from the best individuals. This runtime aspect makes the approach faster and accurate. We provide experimental results on the Verisec benchmark to validate our approach.
To read this external content in full, download the complete paper from the author archives online at the Cornell University arXiv repository .