Malware is a collective term for malicious software that enters a system without the user’s authorization. With the growing popularity of the Internet, the increasing amount of vulnerable software, and the rising sophistication of malicious code itself, malware has become a major threat to today’s computing world. By exploiting design flaws, attackers can gain access to confidential information on the target platform, or even take control of it.
In recent years, many techniques have been developed for detecting malicious software. Taint analysis is a form of information-flow analysis which establishes whether values from unauthenticated methods and parameters may flow into security-sensitive operations.
Because taint analysis can detect many common vulnerabilities in applications, it has attracted much attention from both the research and industry communities. Starting from the premise that some data (such as input from a user) is not trustworthy, taint analysis tracks where that data may be used to harm the software, and monitors suspicious actions.
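The idea of marking untrusted data, propagating the mark, and checking it at a sensitive operation can be sketched in a few lines of Python. This is only an illustration: the names `Value`, `read_user_input`, `concat`, and `run_query` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A piece of data plus a taint flag (hypothetical helper type)."""
    data: str
    tainted: bool = False


def read_user_input(raw: str) -> Value:
    # Taint source: anything coming from the user is marked untrusted.
    return Value(raw, tainted=True)


def concat(left: Value, right: Value) -> Value:
    # Propagation rule: the result is tainted if either operand is.
    return Value(left.data + right.data, left.tainted or right.tainted)


def run_query(sql: Value) -> str:
    # Taint sink: a security-sensitive operation refuses tainted data.
    if sql.tainted:
        raise ValueError("tainted data reached a security-sensitive operation")
    return "executed: " + sql.data
```

A query built with `concat(Value("SELECT ... "), read_user_input(name))` carries the taint flag and is rejected by `run_query`, while a query built only from constants passes.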
Generally speaking, there are two approaches to taint analysis: dynamic and static. Static analysis examines a program’s code without actually executing it, relying only on the information available at compile time. Taking a binary executable as an example, the binary code is usually first disassembled into assembly instructions; control-flow and data-flow analysis techniques can then be applied to draw conclusions about the functionality of the program.
Static analysis is useful for providing a view of the overall behavior of a program without focusing on any particular execution, and it has low overhead in terms of system resource usage. However, it is imprecise when handling the dynamic structures of the target program (pointers, aliases, and conditional statements). Moreover, many interesting questions that can be asked about a program are undecidable in general.
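As a rough illustration of static taint propagation, the sketch below runs a worklist data-flow analysis over a toy instruction list. The instruction set (`input`, `const`, `mov`, `jcc`, `jmp`, `sink`) is a hypothetical stand-in for disassembled code, not the representation used by any real disassembler or by the framework described here.

```python
def successors(program, pc):
    """Control-flow successors of the instruction at index pc."""
    op = program[pc]
    if op[0] == "jmp":
        return [op[1]]
    if op[0] == "jcc":  # conditional jump: both outcomes are possible
        return [op[1], pc + 1]
    return [pc + 1]


def static_taint(program):
    """Return indices of sinks that tainted data may reach on some path."""
    taint_in = [None] * len(program)  # None means "not reached yet"
    taint_in[0] = frozenset()
    worklist, alerts = [0], set()
    while worklist:
        pc = worklist.pop()
        state = set(taint_in[pc])
        op = program[pc]
        if op[0] == "input":
            state.add(op[1])  # taint source
        elif op[0] == "const":
            state.discard(op[1])
        elif op[0] == "mov":  # ("mov", dst, src)
            if op[2] in state:
                state.add(op[1])
            else:
                state.discard(op[1])
        elif op[0] == "sink" and op[1] in state:
            alerts.add(pc)
        for succ in successors(program, pc):
            if succ >= len(program):
                continue
            old = taint_in[succ]
            # Join by union: tainted on any path means "may be tainted".
            new = frozenset(state) if old is None else old | state
            if new != old:
                taint_in[succ] = new
                worklist.append(succ)
    return sorted(alerts)


PROGRAM = [
    ("input", "a"),     # 0: a is untrusted
    ("const", "b"),     # 1: b is a constant
    ("jcc", 5),         # 2: branch on an unknown condition
    ("mov", "c", "b"),  # 3: c = b (clean)
    ("jmp", 6),         # 4
    ("mov", "c", "a"),  # 5: c = a (tainted)
    ("sink", "c"),      # 6: flagged, since c may be tainted
    ("sink", "b"),      # 7: never tainted
]
```

Here `static_taint(PROGRAM)` flags the sink at index 6 without running the program, because one of the two branches carries taint into `c`. The analysis cannot tell whether that branch is ever taken, which is the kind of imprecision mentioned above.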
Dynamic analysis examines the program at runtime. It is more precise than its static counterpart because it takes runtime information into account. Moreover, because the dynamic technique focuses only on a particular execution of the target program, the amount of analysis is sharply reduced.
However, it suffers from large runtime overhead, and can only detect software vulnerabilities once an attack has been launched. As a result, it cannot locate latent weak spots, even though doing so is highly desirable in many cases. In the field of dynamic taint tracking, many testing-based techniques have been developed that attempt to detect potential security threats by improving code coverage.
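The coverage limitation can be seen in a small sketch of dynamic taint tracking: an interpreter for a hypothetical toy instruction set (`input`, `const`, `add`, `jz`, `jmp`, `sink`) that keeps a shadow set of tainted registers alongside the concrete values. The instruction set and function names are assumptions made for illustration only.

```python
def dynamic_taint(program, user_inputs):
    """Execute once; return (sinks hit by tainted data, executed indices)."""
    regs, tainted = {}, set()
    inputs = iter(user_inputs)
    alerts, executed = [], []
    pc = 0
    while pc < len(program):
        executed.append(pc)
        op = program[pc]
        nxt = pc + 1
        if op[0] == "input":  # taint source
            regs[op[1]] = next(inputs)
            tainted.add(op[1])
        elif op[0] == "const":  # ("const", dst, value)
            regs[op[1]] = op[2]
            tainted.discard(op[1])
        elif op[0] == "add":  # ("add", dst, a, b)
            _, dst, a, b = op
            regs[dst] = regs[a] + regs[b]
            if a in tainted or b in tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)
        elif op[0] == "jz":  # ("jz", src, target): jump if src == 0
            if regs[op[1]] == 0:
                nxt = op[2]
        elif op[0] == "jmp":
            nxt = op[1]
        elif op[0] == "sink" and op[1] in tainted:
            alerts.append(pc)
        pc = nxt
    return alerts, executed


PROGRAM = [
    ("input", "x"),     # 0: x is untrusted
    ("const", "k", 7),  # 1
    ("jz", "x", 5),     # 2: if x == 0 goto 5
    ("sink", "k"),      # 3: benign use
    ("jmp", 6),         # 4: skip to end
    ("sink", "x"),      # 5: tainted use, only reached when x == 0
]
```

With the input `5`, the run executes instructions 0 to 4 and reports nothing; the flaw at index 5 is only observed when the input happens to be `0`. Exact taint state along the executed path comes at the price of missing code that the chosen input never reaches.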
However, high code coverage is difficult to achieve, and the testing incurs too much runtime overhead. To exploit the merits of both dynamic and static analysis while avoiding their respective drawbacks, it is necessary to combine the two approaches.
In this paper, we propose HYBit, a hybrid framework that integrates dynamic and static taint analysis to discover software flaws and vulnerabilities. Because the source code of most software is hard to obtain, and intruders certainly do not supply source code along with their attacks, our framework is designed to operate on binary code.
In the framework, the target binary is first analyzed by the dynamic taint analyzer. Then, using the runtime information provided by its dynamic counterpart, the static taint analyzer can easily process the unexecuted part of the target program. Furthermore, a taint behavior filtration mechanism is proposed to optimize the performance of the framework. We evaluate our framework from three perspectives: efficiency, coverage, and effectiveness. The results are encouraging.
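The two-stage idea can be sketched as follows. This is not HYBit’s implementation: the toy instruction set, the function names, and the choice of handing the static pass a taint snapshot at each untaken branch are all assumptions made for illustration, and the filtration mechanism is not modeled.

```python
def propagate(op, tainted):
    """Shared taint rule for one toy instruction; True if a sink is hit."""
    if op[0] == "input":
        tainted.add(op[1])
    elif op[0] == "const":
        tainted.discard(op[1])
    elif op[0] == "add":  # ("add", dst, a, b)
        if op[2] in tainted or op[3] in tainted:
            tainted.add(op[1])
        else:
            tainted.discard(op[1])
    return op[0] == "sink" and op[1] in tainted


def dynamic_pass(program, user_inputs):
    """One concrete run; records coverage and taint at untaken branches."""
    regs, tainted, inputs = {}, set(), iter(user_inputs)
    alerts, executed, untaken = set(), set(), []
    pc = 0
    while pc < len(program):
        op = program[pc]
        executed.add(pc)
        if propagate(op, tainted):
            alerts.add(pc)
        nxt = pc + 1
        if op[0] == "input":
            regs[op[1]] = next(inputs)
        elif op[0] == "const":
            regs[op[1]] = op[2]
        elif op[0] == "add":
            regs[op[1]] = regs[op[2]] + regs[op[3]]
        elif op[0] == "jmp":
            nxt = op[1]
        elif op[0] == "jz":  # ("jz", src, target): jump if src == 0
            jump = regs[op[1]] == 0
            nxt = op[2] if jump else pc + 1
            other = pc + 1 if jump else op[2]
            untaken.append((other, frozenset(tainted)))
        pc = nxt
    return alerts, executed, untaken


def static_pass(program, untaken, executed):
    """Explore only unexecuted code, seeded with runtime taint snapshots."""
    alerts, seen, worklist = set(), set(), list(untaken)
    while worklist:
        pc, state = worklist.pop()
        # Simplification: stop when a path rejoins code the run covered.
        if pc >= len(program) or pc in executed or (pc, state) in seen:
            continue
        seen.add((pc, state))
        op, tainted = program[pc], set(state)
        if propagate(op, tainted):
            alerts.add(pc)
        if op[0] == "jmp":
            succs = [op[1]]
        elif op[0] == "jz":
            succs = [op[2], pc + 1]  # no concrete values: follow both
        else:
            succs = [pc + 1]
        worklist.extend((s, frozenset(tainted)) for s in succs)
    return alerts


def hybrid(program, user_inputs):
    dyn_alerts, executed, untaken = dynamic_pass(program, user_inputs)
    return dyn_alerts, static_pass(program, untaken, executed)


PROGRAM = [
    ("input", "x"),     # 0
    ("const", "k", 7),  # 1
    ("jz", "x", 5),     # 2: if x == 0 goto 5
    ("sink", "k"),      # 3
    ("jmp", 6),         # 4
    ("sink", "x"),      # 5: latent flaw, unreached when x != 0
]
```

For the input `5`, the dynamic pass alone reports nothing, but the static pass starts at the untaken branch target with the recorded taint set and flags the sink at index 5. Seeding the static pass with runtime state spares it from re-deriving taint for code that was already executed.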