
Hackers bite the (static analysis) dust: Part 2

We often hear that security is “hard to implement.” Why is that? A better way to phrase it is that it is very hard to find the right balance between usability and security in a system. It is very easy to create a system that is secure but unusable: for instance, one that is not connected to an outside network, or one that is powered off.

So, why is security so hard to combine with usability? The short answer is: because of the amount of source code that goes into a typical system. Today's systems are not just big, they're huge, registering at millions of lines of code. All other reasons are just corollaries of that one basic fact. That leads us to the first rule:

Rule 1: Security is hard because systems are huge
A system is implemented through source code. Every line of code in a source base tells the system to execute a set of instructions. In other words, every line of code could potentially do something to compromise the security and reliability of the system. In order to implement security, someone or something needs to check every line of source code.
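As a contrived illustration (the function and buffer names are invented, not taken from any real code base), here is the kind of single line that can undermine an entire system:

    #include <string.h>

    void save_name(const char *input)
    {
        char name[16];
        /* One innocuous-looking line: if input holds more than 15
           characters, strcpy() writes past the end of 'name' and
           corrupts the stack -- a classic buffer overflow. */
        strcpy(name, input);
    }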

But that is not where it ends. Each source line depends on and interacts with all of the lines around it! A system is not simply a raw collection of source lines; it could be more accurately described as a tightly woven fabric built up of threads that are source lines. So not only do we need to check every source line, but also the interactions each source line has with everything else.
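Here is a sketch of why checking lines in isolation is not enough (again with invented names, purely for illustration). Each of the two marked lines is harmless on its own; the defect lives entirely in their interaction:

    #include <stdlib.h>

    static char *g_buf;

    void setup(void)    { g_buf = malloc(64); }
    void teardown(void) { free(g_buf); }   /* this line...              */

    void log_event(char c)
    {
        g_buf[0] = c;   /* ...and this one: fine in isolation, but a
                           use-after-free if teardown() already ran */
    }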

According to Wikipedia, the Microsoft Windows source base consists of about 50,000,000 lines of source code. That sounds like a huge number, but what does it actually mean? Consider a huge novel; “War and Peace” comes to mind. Leo Tolstoy's masterpiece depicting 19th-century Russia is a mere 1,500 pages, or about 100,000 lines of text, which gives us my new unit: one WAP (“War And Peace”).

That makes the Windows source base about 500 WAPs. Imagine reading and intimately understanding five hundred books that rival “War and Peace” in size and complexity. No offense to Tolstoy, but that sounds like a literature-class nightmare.

A smaller application, like the Apache web server, comes in at about 1.5 WAPs. Among popular operating systems, the Debian Linux source base takes the top prize at 2,000 WAPs.

Rule 2: Security is hard because complexity grows exponentially relative to source code, due to interdependencies
The problem of understanding the code does not scale proportionally with the amount of code. The solution is not simply to throw more programmers at understanding and checking code, because the complexity grows exponentially.
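To make “exponentially” concrete, here is a back-of-the-envelope illustration: a function containing k independent if-statements can have up to 2^k distinct execution paths, so the paths to check grow far faster than the lines themselves:

    paths(k)  = 2^k
    paths(10) = 1,024
    paths(20) = 1,048,576
    paths(30) = 1,073,741,824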

At some point the complexity reaches a limit where a certain level of security is no longer possible, and it looks like Windows and Linux developers abandoned that idea several hundred WAPs ago. It's no wonder that my Windows machine downloads new critical security updates every few days. It is impossible to find all of the vulnerabilities.

Rule 3: Security is hard because source code generally never gets removed
How did we ever get ourselves into this mess? The biggest problem is that most systems are evolved versions of things that have been developed for decades. And when people add new things, they rarely remove the old functionality.

Why? Because no one wants to risk removing something that other things might depend on. And that leads to a vicious cycle: the more code you add, the less likely you are to understand the complexity of the system, and the less likely you are to remove something old, but you still have to add more code, oh, you get the idea.

When I was in college, my professors told us to cherish our class projects, because that would be the only time we would write a new program from scratch. They were right.

Static analysis to the rescue
It turns out that static source analyzers have solutions for that as well. Analyzing one source line at a time is not sufficient. What is really needed is analysis of the interaction between any two source lines that have anything in common (like a path of execution, access to the same data, etc.).

Static source analyzers can do exactly that, no matter how physically distant two source lines of code are from each other. They do it by internally building data structures based largely on compiler optimization techniques, like forward and backward data-flow analysis and control-flow graphs. This may sound like a load of technical gibberish, but the good news is that static analyzers can spot interdependencies that even experienced programmers would not detect.
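As a small, hypothetical example of what that buys you (the function names are invented for illustration, not taken from any product): forward data-flow analysis can connect a NULL returned on one path of one function to a dereference that happens later, in a different function:

    #include <stdlib.h>

    static int *lookup(int key)
    {
        if (key < 0)
            return NULL;              /* one path hands back NULL...     */
        return malloc(sizeof(int));   /* ...and malloc() may return NULL */
    }

    void handle(int key)
    {
        int *slot = lookup(key);
        *slot = key;    /* data-flow analysis traces the NULL from
                           lookup() to this dereference, flagging a
                           potential crash that spans two functions */
        free(slot);
    }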

Figure 2: Static source analyzer integrated into the application builder. Every time a developer builds an application, current defects are immediately reported and can be quickly addressed and fixed. Source: Green Hills Software.

Furthermore, while trying to employ thousands of programmers to manually read through and verify source code will likely lead to mutiny, static analyzers will happily analyze a system or application of any size with the same scrutiny and rigor.

Legacy code that is almost never used (and is potentially riddled with security holes) is not spared from the inspection; static analyzers dip and peek into every nook and cranny until the entire base is completely scrubbed.

No Silver Bullet, but...
Ok, so static analyzers are great. Let's just run them on every system, fix all the problems they find, and we're done, right?

Well, not quite. While static analyzers find the classes of problems that are most likely to give attackers a way to break into a system, there is a class of problems that is beyond what static analyzers can do today, and possibly ever.

Just as we don't have programs that are intelligent enough to autonomously design a complete system from scratch, we don't have static analyzers that are intelligent enough to understand a system the way a human can. Humans, luckily for us programmers, are not obsolete.

Some cynics might view static source analyzers as yet another weapon in the hacker arsenal. Those are probably the same cynics who thought that source-line debuggers were a bad idea when they came out decades ago (it would be unthinkable today to embark on a project of any significant size without source-line debuggers).

It is easy to see why this cynical view is very short-sighted. First, static source analysis works only when the source code is available, and hackers will generally have access to source code only for things that are developed from open source. Last time I checked, most systems used today are proprietary, especially in highly secure environments. So, obtaining the source code is not that easy.

Second, hackers need to find only one vulnerability to break in, so to them it is only a matter of time before they find one, with or without a static analyzer. The more complex the system is, the easier it is to find one.

System designers, on the other hand, need to detect and fix as many vulnerabilities as possible (hopefully all). The more complex the system is, the harder it will be to detect all vulnerabilities. Therefore, it is the designers who truly benefit from static analysis, since it is not just a time saver, but a way to design more secure systems.

Finally, static analyzers do a lot of grunt work that is simply too cost-prohibitive for humans to perform. In the ultimate twist of irony, advances in hardware speed and capacity, along with more efficient software tools and debuggers, have enabled us to create ever more complex systems. Now static analyzers, as a new breed of software tools, are helping us overcome the burden of that ever-expanding complexity.

If static analyzers seemed like a luxury item a few years ago, and are a strong recommendation today, in the very near future they will be a required part of how software gets implemented and tested.

To read Part 1, go to “Hackers' favorite tricks.”

As Director of Engineering at Green Hills Software, Nikola Valerjev is responsible for managing teams that plan, design, and develop new products, including the DoubleCheck static source analyzer. He also manages teams that evaluate new and existing solutions from the user perspective. Mr. Valerjev holds a Bachelor of Science and a Master of Engineering degree in computer science from Cornell University. He has been with Green Hills Software since 1997.
