Hackers bite the (static analysis) dust: Part 2
Implementing security with static analysis
By Nikola Valerjev
Embedded.com
(03/19/08, 12:04:00 AM EDT)
We often hear that security is "hard to implement." Why is that? A better way to phrase it is that it is very hard to find the right balance between usability and security in a system. It is very easy to create a system that is secure but unusable: for instance, one that is not connected to an outside network, or one that is powered off.

So, why is security so hard to combine with usability? The short answer is: because of the amount of source code that goes into a typical system. Today's systems are not just big, they're huge, registering at millions of lines of code. All other reasons are just corollaries of that one basic fact. That leads us to the first rule:

Rule 1: Security is hard because systems are huge
A system is implemented through source code. Every line of code in a source base tells the system to execute a set of instructions. In other words, every line of code could potentially do something that compromises the security and reliability of the system. In order to implement security, someone or something needs to check every line of source code.

But that is not where it ends. Each source line depends on and interacts with all of the lines around it! A system is not simply a raw collection of source lines; it could be more accurately described as a tightly woven fabric whose threads are source lines. So not only do we need to check every source line, but also the interactions each source line has with everything else.
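To get a feel for how quickly those interactions pile up, here is a rough sketch that counts only the possible pairs of source lines. It is an upper bound under a simplifying assumption (any two lines might interact; in practice most pairs do not), and the function name `pairwise_interactions` is made up for this illustration.

```python
import math


def pairwise_interactions(line_count):
    """Number of distinct pairs of source lines: n * (n - 1) / 2."""
    return math.comb(line_count, 2)


# Each step multiplies the line count by ten; watch what the pair count does.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} lines -> {pairwise_interactions(n):>13,} possible pairs")
```

Going from 1,000 lines to 100,000 lines multiplies the code by a hundred, but the pair count goes from 499,500 to 4,999,950,000, roughly ten thousand times more. And that is counting pairs only; chains of three or more interacting lines grow faster still.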

According to Wikipedia, the Microsoft Windows source base consists of about 50,000,000 lines of source code. That sounds like a huge number, but what does it actually mean? Consider a huge novel; "War and Peace" comes to mind. Leo Tolstoy's masterpiece depicting 19th-century Russia is a mere 1,500 pages, or about 100,000 lines of text. That gives me my new unit: one WAP ("War And Peace").

That makes the Windows source base about 500 WAPs. Imagine reading and intimately understanding five hundred books that rival "War and Peace" in size and complexity. No offense to Tolstoy, but that sounds like a literature-class nightmare.

A smaller application, like the Apache web server, comes in at about 1.5 WAPs. Among popular operating systems, the Debian Linux source base takes the top prize at 2,000 WAPs.
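For anyone who wants to check the WAP arithmetic, here is a small sketch of the conversion. The only inputs are the figures quoted above (100,000 lines per WAP, 50,000,000 lines for Windows, 1.5 WAPs for Apache, 2,000 WAPs for Debian); the helper names are made up for this illustration.

```python
LINES_PER_WAP = 100_000  # the article's estimate for "War and Peace"


def lines_to_waps(lines):
    """Convert a line count into WAPs."""
    return lines / LINES_PER_WAP


def waps_to_lines(waps):
    """Convert WAPs back into an approximate line count."""
    return round(waps * LINES_PER_WAP)


print(lines_to_waps(50_000_000))  # Windows: 500.0 WAPs
print(waps_to_lines(1.5))         # Apache: 150000 lines
print(waps_to_lines(2_000))       # Debian: 200000000 lines
```

In other words, the 1.5 WAPs quoted for Apache correspond to about 150,000 lines, and the 2,000 WAPs quoted for Debian to about 200 million lines.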

Rule 2: Security is hard because complexity grows exponentially relative to source code, due to interdependencies
The problem of understanding the code does not scale proportionally with the amount of code. The solution is not simply to throw more programmers at understanding and checking code, because the complexity grows exponentially.

At some point the complexity reaches a limit where a certain level of security is no longer possible, and it looks like Windows and Linux developers abandoned that idea several hundred WAPs ago. It's no wonder that my Windows machine downloads new critical security updates every few days. It is impossible to find all of the vulnerabilities.

Rule 3: Security is hard because source code generally never gets removed
How did we ever get ourselves into this mess? The biggest problem is that most of the systems are evolved versions of things that have been developed for decades. And when people add new things, they rarely remove the old functionality.

Why? Because no one wants to risk removing something that other things might depend on. And that leads to a vicious cycle: the more code you add, the less likely you are to understand the complexity of the system, and the less likely you are to want to remove something old. But you still have to add more code, and so on. You get the idea.

When I was in college, my professors told us to cherish our class projects, because that would be the only time we would write a new program from scratch. They were right.

Static analysis to the rescue
It turns out that static source analyzers have an answer to this complexity problem as well. Analyzing one source line at a time is not sufficient. What is really needed is analysis of the interaction between any two source lines that have anything in common (such as a path of execution or access to the same data).

Static source analyzers can do exactly that, no matter how far apart two lines of source code are. They do it by internally building data structures based largely on compiler optimization techniques, such as forward and backward data flow analysis and control flow graphs. This may sound like a load of technical gibberish, but the good news is that static analyzers can spot interdependencies that even experienced programmers would not detect.
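To make "forward data flow analysis over a control flow graph" a little less abstract, here is a toy sketch. It is not how any particular commercial analyzer works; it is a minimal "definitely assigned" analysis on a hand-built graph, and every name in it (`BLOCKS`, `EDGES`, `possibly_uninitialized`, the variables `n` and `buf`) is hypothetical. The point is that the use of `buf` in the last block is flagged because of a branch that may sit arbitrarily far away in the source.

```python
from collections import deque

# Hypothetical toy program as a control flow graph (CFG).
# Each block is a list of statements: (defined_variable_or_None, used_variables).
BLOCKS = {
    "entry": [("n", [])],        # n = read_input()
    "then":  [("buf", ["n"])],   # buf = allocate(n)
    "else":  [],                 # nothing assigns buf on this path
    "exit":  [(None, ["buf"])],  # use(buf), far away from the branch
}
EDGES = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}


def assigned_on_entry(block, entry, preds, out):
    """Variables assigned on EVERY path reaching the start of `block`."""
    if block == entry or not preds[block]:
        return set()
    return set.intersection(*(out[p] for p in preds[block]))


def possibly_uninitialized(blocks, edges, entry):
    """Return (block, variable) pairs where a variable may be used unassigned."""
    all_vars = {d for stmts in blocks.values() for d, _ in stmts if d}
    preds = {b: [] for b in blocks}
    for src, dsts in edges.items():
        for dst in dsts:
            preds[dst].append(src)

    # out[b] = variables definitely assigned at the end of block b.
    # Start optimistic and shrink the sets until nothing changes (a fixed point).
    out = {b: set(all_vars) for b in blocks}
    worklist = deque(blocks)
    while worklist:
        b = worklist.popleft()
        assigned = assigned_on_entry(b, entry, preds, out)
        assigned.update(d for d, _ in blocks[b] if d)
        if assigned != out[b]:
            out[b] = assigned
            worklist.extend(edges[b])  # successors must be re-examined

    # Second pass: walk each block and report uses that are not covered.
    warnings = []
    for b, stmts in blocks.items():
        assigned = assigned_on_entry(b, entry, preds, out)
        for defined, used in stmts:
            warnings.extend((b, u) for u in used if u not in assigned)
            if defined:
                assigned.add(defined)
    return warnings


print(possibly_uninitialized(BLOCKS, EDGES, "entry"))  # [('exit', 'buf')]
```

If the "else" block is changed to assign `buf` as well, the warning disappears, because `buf` is then assigned on every path that reaches "exit". A real analyzer tracks far more than this (values, pointers, calls across functions), but the basic shape of propagating facts along the graph until they stop changing is the same idea.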

Figure 2: Static source analyzer integrated into the application builder. Every time a developer builds an application, current defects are immediately reported and can be quickly addressed and fixed. Source: Green Hills Software.

Furthermore, while trying to employ thousands of programmers to manually read through and verify source code will likely lead to mutiny, static analyzers will happily analyze a system or application of any size, with the same scrutiny and rigor.

Legacy code that is almost never used (and is potentially riddled with security holes) is not spared from the inspection; static analyzers dip and peek into every nook and cranny until the entire source base is completely scrubbed.

No Silver Bullet, but. . .
OK, so static analyzers are great. Let's just run them on every system, fix all the problems they find, and we're done, right?

Well, not quite. While static analyzers find the classes of problems that are likely to open a way into a system, there is a class of problems that is beyond what static analyzers can do today, and possibly ever.

Just as we don't have programs that are intelligent enough to autonomously design a complete system from scratch, we don't have static analyzers that are intelligent enough to understand a system the way a human can. Humans, luckily for us programmers, are not obsolete.

Some cynics might view static source analyzers as yet another weapon in the hacker arsenal. Those are probably the same cynics who thought that source line debuggers were a bad idea when they came out decades ago (it would be unthinkable today to embark on a project of any significant size without source line debuggers).

It is easy to see why this cynical view is very shortsighted. First, static source analysis works only when the source code is available, and hackers will generally have access to source code only for things that are developed from open source. Last time I checked, most systems used today are proprietary, especially in highly secure environments. So, obtaining the source code is not that easy.

Second, hackers need to find only one vulnerability to break in, so to them it is only a matter of time before they find one, with or without a static analyzer. The more complex the system is, the easier it is to find one.

System designers, on the other hand, need to detect and fix as many vulnerabilities as possible (hopefully all). The more complex the system is, the harder it will be to detect all vulnerabilities. Therefore, it is the designers who truly benefit from static analysis, since it is not just a time saver, but a way to design more secure systems.

Finally, static analyzers do a lot of grunt work that is simply too costly for humans to perform. In the ultimate twist of irony, we have been enabled to create more complex systems by advances in hardware speed and capacity, along with the help of more efficient software tools and debuggers. Now, static analyzers, as a new breed of software tools, are helping us overcome the burden of ever-expanding complexity.

If static analyzers seemed like a luxury item a few years ago, and are a strong recommendation today, in the very near future they will be a required methodology by which software gets implemented and tested.

To read Part 1, go to "Hackers favorite tricks."

As a Director of Engineering at Green Hills Software, Nikola Valerjev is responsible for managing teams that plan, design, and develop new products, including the DoubleCheck static source analyzer. He also manages teams that evaluate new and existing solutions from the user perspective. Mr. Valerjev holds a Bachelor of Science and a Master of Engineering degree in computer science from Cornell University. He has been with Green Hills Software since 1997.