Pundits (and many other sources) predict Microsoft's Vista (née Longhorn) operating system will comprise at least 50 million lines of code… assuming the troubled OS is ever released.
50 million lines of code. The scale is staggering.
Expect a staggering number of bugs.
Though it’s easy to poke fun at Microsoft, I’m impressed with the company’s recent performance. Windows XP is, at least for me, a very stable product. The much-reviled update service seems to be working, and reports suggest, surprisingly, that Internet Explorer has had fewer security vulnerabilities than Firefox in recent months.
But any 50 MLOC program is a monster. How will they test it?
Well-written C and C++ code contains some 5 to 10 errors per 100 LOC after a clean compile, but before inspection and testing. Even at the low end of that range, 5%, a 50 MLOC program will start off with some 2.5 million bugs.
Testing typically exercises only half the code. It’s hard to devise tests that check rarely invoked exception handlers, deeply nested IFs and nested loops. So the 50% test coverage number suggests Vista could ship with some 1.25 million bugs.
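The arithmetic in the two paragraphs above can be written out as a tiny model. This is a sketch under the article's own simplifying assumption that bugs are spread evenly through the code, so tests that exercise half the code find about half the bugs; the function names are illustrative only.

```python
# Back-of-the-envelope defect arithmetic using the article's round numbers.
# Assumption: bugs are distributed uniformly, so testing that exercises a
# fraction of the code finds that same fraction of the bugs.

def initial_defects(loc: int, errors_per_100_loc: float) -> float:
    """Bugs present after a clean compile, before inspection or testing."""
    return loc * errors_per_100_loc / 100

def after_testing(defects: float, test_coverage: float) -> float:
    """Bugs left if testing finds only those in the exercised fraction of code."""
    return defects * (1 - test_coverage)

VISTA_LOC = 50_000_000
start = initial_defects(VISTA_LOC, 5)   # low end of the 5-10 per 100 LOC range
shipped = after_testing(start, 0.50)    # typical 50% test coverage
print(f"{start:,.0f} initial, {shipped:,.0f} shipped")
```

This prints 2,500,000 initial and 1,250,000 shipped bugs; at the high end of the range (10 per 100 LOC) both figures double.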
There are better testing methods, and they do produce fantastic programs. Code coverage, for instance, can ensure every branch and conditional has been taken. It’s required by the FAA’s DO-178B level A standard for safety-critical avionics.
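To make the idea concrete, here is a hand-instrumented sketch of decision (branch) coverage: each decision records whether it evaluated true or false, and the test set is complete only when every decision has been seen both ways. The function `clamp_altitude` and the `record` helper are hypothetical, and real projects use dedicated coverage tools rather than manual instrumentation. DO-178B level A actually goes further and requires modified condition/decision coverage (MC/DC), which this sketch does not check.

```python
# Hand-instrumented decision coverage: every decision must be observed
# evaluating both True and False before the test set counts as complete.

taken: set[tuple[str, bool]] = set()

def record(decision: str, outcome: bool) -> bool:
    """Note which way a decision went, then pass the outcome through."""
    taken.add((decision, outcome))
    return outcome

def clamp_altitude(alt: int, floor: int, ceiling: int) -> int:
    """Hypothetical function under test."""
    if record("below_floor", alt < floor):
        return floor
    if record("above_ceiling", alt > ceiling):
        return ceiling
    return alt

def uncovered(decisions: list[str]) -> list[tuple[str, bool]]:
    """Decision outcomes that no test has exercised yet."""
    return [(d, o) for d in decisions for o in (True, False) if (d, o) not in taken]

DECISIONS = ["below_floor", "above_ceiling"]

clamp_altitude(500, 1000, 30000)     # below floor
print(uncovered(DECISIONS))          # three outcomes still untested
clamp_altitude(40000, 1000, 30000)   # above ceiling
clamp_altitude(20000, 1000, 30000)   # in range
print(uncovered(DECISIONS))          # empty: every outcome taken
```

The first test alone leaves three outcomes unexercised; only after the above-ceiling and in-range cases run does the list come back empty.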
But the costs are unbelievable. It’s not unusual for the qualification process to produce a half page of documentation for each line of code. A 50 MLOC program’s doc might be 25 million pages long, consuming 50,000 reams of paper – a stack 2 miles high. Will Vista undergo this rigorous evaluation? Probably not.
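The paper estimate follows from a few unit conversions. A minimal sketch, assuming single-sided printing (one page per sheet), 500 sheets per ream, and a ream about 2 inches thick; the ream thickness is an assumption, not a figure from the article.

```python
# Documentation burden for a DO-178B-style qualification, per the article's
# "half a page per line of code" rule of thumb.
PAGES_PER_LOC = 0.5
SHEETS_PER_REAM = 500
REAM_THICKNESS_IN = 2.0   # assumption: ordinary copy paper
INCHES_PER_MILE = 63_360

def doc_burden(loc: int) -> tuple[float, float, float]:
    """Return (pages, reams, stack height in miles) for a program of `loc` lines."""
    pages = loc * PAGES_PER_LOC
    reams = pages / SHEETS_PER_REAM   # single-sided: one page per sheet
    miles = reams * REAM_THICKNESS_IN / INCHES_PER_MILE
    return pages, reams, miles

pages, reams, miles = doc_burden(50_000_000)
print(f"{pages:,.0f} pages, {reams:,.0f} reams, {miles:.1f} miles")
```

With these assumptions it prints 25,000,000 pages, 50,000 reams, and a stack of about 1.6 miles, in the same range as the 2-mile figure; the exact height depends on the paper.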
Maybe Microsoft routinely uses a very disciplined approach to software engineering, including the mandatory use of code inspections. Again, the numbers are interesting. Since good inspections typically find 70% of the system’s mistakes, 30% survive, so after inspection Vista might have 50 million LOC × 0.05 bugs/LOC × 0.30, or 750,000 bugs. If testing finds half of those, they’re still shipping with some 375,000 problems.
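Inspection and testing can be modeled as successive filters, each removing a fixed fraction of the bugs still present. This sketch uses the article's round numbers (70% found by inspection, half of the remainder found by testing) and Python's `fractions.Fraction` so the counts stay exact; treating each stage's yield as independent of the others is a simplification.

```python
from fractions import Fraction

# Each stage removes a fixed fraction of the defects that reach it.
STAGES = [
    ("inspection", Fraction(70, 100)),  # good inspections find ~70%
    ("testing", Fraction(50, 100)),     # testing finds ~half of what remains
]

def run_pipeline(loc: int, bugs_per_loc: Fraction, stages):
    """Return (label, defects remaining) after compile and after each stage."""
    remaining = loc * bugs_per_loc
    history = [("after compile", remaining)]
    for name, found in stages:
        remaining *= 1 - found
        history.append((f"after {name}", remaining))
    return history

for label, bugs in run_pipeline(50_000_000, Fraction(5, 100), STAGES):
    print(f"{label:<18}{int(bugs):>12,}")
```

It prints 2,500,000, 750,000 and 375,000 for the three stages, matching the paragraph above.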
What if Microsoft were certified to the highest level of the Capability Maturity Model? Level 5 organizations employ a wide range of practices to generate great software. A CMM5 project typically ships with 1 bug per thousand lines of code. For Vista that works out to 50,000 bugs.
This isn’t an anti-Microsoft rant. It’s a peek inside the problems any organization has when building huge programs. Though we do indeed have ways to build better code, the costs are huge, and scale exponentially as the program size increases.
The largest commercial embedded systems I’m aware of are some cell phones, which have around 5 million lines of code, generally a mix of C, C++ and Java. Few if any of these companies work at CMM level 5, but even that 0.1% bug rate would yield 5,000 defects, a hopelessly buggy product. One can only hope that the most important features (like making a phone call) work well enough for most users most of the time.
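The CMM level 5 figure is a defect density, so the totals for both Vista and the cell phone are a single multiplication. A sketch using the article's 1 defect per KLOC (the 0.1% rate); integer math keeps the counts exact.

```python
# CMM level 5 projects typically ship with about 1 defect per 1,000 LOC (0.1%).
CMM5_DEFECTS_PER_KLOC = 1

def cmm5_defects(loc: int) -> int:
    """Expected shipped defects at the CMM level 5 density."""
    return loc * CMM5_DEFECTS_PER_KLOC // 1000

for name, loc in [("Vista", 50_000_000), ("cell phone", 5_000_000)]:
    print(f"{name}: {cmm5_defects(loc):,} defects")
```

This prints 50,000 defects for Vista and 5,000 for the cell phone.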
Firmware size doubles every 10 months to two years, depending on which surveys one believes. Programs are gigantic today, and will be simply unbelievable tomorrow.
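To see what those doubling periods imply, here is a sketch that assumes steady exponential growth; the 5 MLOC starting point (the cell-phone figure above) and the 5-year horizon are hypothetical choices for illustration.

```python
def projected_loc(loc_today: float, years: float, doubling_months: float) -> float:
    """Size after `years`, assuming the code doubles every `doubling_months`."""
    return loc_today * 2 ** (years * 12 / doubling_months)

# Bracket the survey range: doubling every 10 months vs. every 24 months.
for months in (10, 24):
    size = projected_loc(5_000_000, 5, months)
    print(f"doubling every {months} months: {size / 1e6:,.1f} MLOC in 5 years")
```

Over five years a 10-month doubling period turns 5 MLOC into 320 MLOC, while a two-year period yields about 28.3 MLOC, so which survey one believes matters enormously.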
What do you think? How can we tame bug rates in these huge applications? Reply to , as the response form on this web page hasn’t been working. I guess there’s a bug somewhere…
Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at . His website is .
Microsoft's problems with Vista were the topic of a recent article in the Wall Street Journal. Chairman Bill Gates was forced to reluctantly accept the recommendations of Microsoft's senior executive in charge of Vista, Jim Allchin, to rebuild Vista from scratch. A very costly but wise decision. Here's a relevant excerpt from the article (emphasis added):
Mr. Allchin's reforms address a problem dating to Microsoft's beginnings. Old-school computer science called for methodical coding practices to ensure that the large computers used by banks, governments and scientists wouldn't break. But as personal computers took off in the 1980s, companies like Microsoft didn't have time for that. PC users wanted cool and useful features quickly. They tolerated — or didn't notice — the bugs riddling the software. Problems could always be patched over. With each patch and enhancement, it became harder to strap new features onto the software since new code could affect everything else in unpredictable ways.
The problem, in my opinion, is one of communication, not between programmers (nothing can really be done about that since programmers come and go) but between different parts of a complex software system. It has to do with data and event dependencies, not only at the program level, but also at the system level. Essentially, it is a matter of the left hand not knowing what the right hand is doing. The problem is proportional to complexity and it affects the entire software development industry, not just Microsoft. But it does not have to be that way. There is a solution, one which, unfortunately, will require a fundamental rethinking of software construction and, eventually, of microprocessor architecture. It is expensive but it is never too late to retrace one's steps. The longer we wait, the costlier the problem will be because software is getting more complex and indispensable every day. Eventually, something will have to be done. Otherwise we're in very serious trouble.
– Louis Savain
What about code re-use?
I would assume Microsoft would use some portions of XP as part of a certified routine library. If even a modest share of XP is reused, say 30%, your estimates would drop by 30%. In the best case, the 375,000 errors would then be 262,500. Not exactly great, but better.
Then there would be final application testing. There are hundreds of applications written for XP, and Vista would have to be backward compatible with XP to some degree.
A good testing program using applications written by other companies would expose some percentage of errors. When XP was released, Microsoft already had a large list of programs that would not function properly on it. That tells me the bugs exposed by those programs were investigated and either not found, fixed, or left as is.
– Bill Mills
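The reuse argument in the comment above is simple proportional scaling. A minimal sketch, assuming (as the comment's best case does) that reused, already-certified XP code contributes no new bugs; the function name `with_reuse` is hypothetical.

```python
def with_reuse(new_code_bugs: int, reused_pct: int) -> int:
    """Best case: only the non-reused (100 - reused_pct)% of the code adds bugs."""
    return new_code_bugs * (100 - reused_pct) // 100

print(f"{with_reuse(375_000, 30):,}")  # the inspection-plus-testing estimate, 30% reused
```

It prints 262,500. A less optimistic model would give the reused code its own, smaller residual defect density instead of zero.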
Have you checked out SPIN? http://spinroot.com/spin/whatispin.html It has somewhat limited applicability – I don't think it would be as useful for GUI code. I'm currently reading the related book “Design and Validation of Computer Protocols”, in preparation for designing a protocol at work. It's found here: http://spinroot.com/spin/Doc/Book91.html
– Bryce Schober
Here is an interesting statistic from an IEEE Software article (September/October 1997): Motorola reported 126 defects per million assembly-equivalent lines of code on 9 projects in 1997 when they reached SEI CMM level 5.
Now, if we assume that one line of C is equivalent to roughly 5 assembly statements, then the 50 million lines of code in the Windows project are equivalent to 250 million assembly instructions. Multiplying 126 by 250 gives 31,500 defects. Still a large number, but a factor of 10 less than the estimate in your article. By the way, the productivity rate was 2.8 using the CMM level 2 data as a baseline (1.0). So it is possible to increase quality at a reduced cost. Microsoft should have learned that by now, don’t you think?
– Steve Hanka
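Steve Hanka's conversion can be checked with integer arithmetic. The sketch below assumes, as his comment does, 5 assembly statements per line of C; the constant and function names are illustrative.

```python
MOTOROLA_DEFECTS_PER_MASM = 126  # per million assembly-equivalent LOC (IEEE Software, 1997)
ASM_PER_C_LINE = 5               # the commenter's assumed expansion ratio

def cmm5_estimate(c_loc: int) -> int:
    """Shipped defects at Motorola's CMM level 5 density, for `c_loc` lines of C."""
    asm_loc = c_loc * ASM_PER_C_LINE
    return asm_loc * MOTOROLA_DEFECTS_PER_MASM // 1_000_000

print(f"{cmm5_estimate(50_000_000):,}")
```

It prints 31,500. The result scales linearly with the assumed expansion ratio, so a ratio of 3 or 10 assembly statements per C line would change the estimate proportionally.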
It would appear that the methodology to develop a large scale system by a single entity is broken.
As someone involved with software validation and verification, I find that developing proprietary multi-million-LOC systems is very inefficient.
I honestly believe the only way these kinds of projects can be successful is if they follow the open source model. There's no way a single corporate mantra can compare to eyes from around the world, from every kind of background imaginable, reviewing the same system code.
– Joe Sallmen
I am sure I read somewhere that Microsoft employs one Software Test Engineer for every two Software Development Engineers. If this is true, Microsoft is at least trying to ship high quality, independently tested software.
In contrast, I worked for a couple of years in a British company developing software supposedly to RTCA/DO-178B (safety critical). This company's software processes were all abysmal, particularly the testing. Certification was achieved by sending a stack of meaningless documents to the Civil Aviation Authority to be signed.
Big Code or Critical Code, the cost of delivering good quality software is high, but the cost of rectifying poor quality software is always higher.
– Martin Allen
While I'm far from a Microsoft supporter, I think you have painted a much too dark picture for Microsoft. Maybe not for the rest of us, but for Microsoft.
First, although Longhorn/Vista is predicted to have 50 MLOC, that number won't be that much bigger than the 20-25 MLOC for Windows XP. XP came out more or less on time for a Microsoft O/S release (only a year or two late), and has been judged the most reliable Microsoft O/S release ever. None of the rest of us has that kind of track record.
Second, Microsoft will have at least 10,000 beta testers trying out the code for Longhorn/Vista for a year or more before it ships. Not exactly an option for the rest of us.
Third, Microsoft has huge resources to apply to the problem that the rest of us don't. Having billions of dollars in cash accounts allows them to do things the rest of us can't. I am aware of at least three different static checker projects underway at Microsoft. Basically, these are lint-like (or splint-like) programs that read the source code looking for particular bug patterns. The rest of us have to make do with splint, which is no longer under development.
If pair programming turns out to be an effective bug reducer, Microsoft can afford to pair everyone up and reap the benefits. Try asking your CEO/CFO to double your programming staff.
Fourth, Microsoft is coding an ever increasing percentage of their O/S's in higher level languages with features like array bounds checking and automatic memory reclamation, which trap or eliminate certain classes of bugs. Not exactly an option for most embedded systems or avionics.
I suspect Microsoft is already performing at a better rate than the 0.1% CMM level 5 figure you quoted. Even though I don't like Microsoft, I am impressed at their improvements in quality.
We should be worrying about what the rest of us can do, and how we can find cost-effective ways to reduce our bug counts. Fortunately, there are lots of available techniques that can help, and most people aren't using them yet.
– Terry Colligan