How many hurdles do project teams have to clear before their embedded systems are truly complete? Let us count the ways.
What are the top ten ways to protect Congressional pages?
10. If our representatives are so enamored of 700-mile fences, well, build one around Congress.
Whoops. Wrong publication.
If David Letterman can have a Top Ten list, so can I. But instead of ragging on Mel Gibson or making fun of any of a vast number of Congressional scandals, I'll hit embedded systems. What are, in my opinion, the top ten reasons embedded systems projects get into trouble?
10. Not enough resources
Firmware is the most expensive thing in the universe.
Ex-Lockheed CEO Norman Augustine, in his wonderful book Augustine's Laws wrote about how defense contractors were in a bind in the late 1970s.1 They found themselves unable to add anything to fighter aircraft, because increasing a plane's weight means decreasing performance. Business requirements–more profits–meant that regardless of the physics of flight they had to add features. Desperate for something hideously expensive yet weightless, they found . . . firmware! Today firmware in a high-performance aircraft consumes about half the total price of the plane (if you factor in all the development costs). A success by any standard, except perhaps for the taxpayers. Indeed, retired USAF Colonel Everest Riccioni has suggested firmware-stuffed fighter airplanes and smart missiles are now so expensive that the U.S. is unilaterally disarming.2
Face it: firmware is expensive and getting more costly as it gets bigger. One hard-to-believe analysis claims that embedded software doubles in size every 10 months.3 Couple that with the exponential relationship between schedule and size and it's pretty clear that within a few years firmware development will pretty-nearly consume the entire world's GDP.
Yet software people can't get reasonable sums of money for anything but bare necessities. While our EE pals routinely cadge $50,000 logic analyzers, we're unable to justify a few grand for a compiler. How many of you, dear readers, have been successful getting approval for quality tools like static analyzers? Free/Open Source tools bring giant smiles to the head CPA. The statement “just port GNU” is an early indicator of a looming disaster.
9. Starting coding too soon
Agile methods have shaken up the world of software development. Most value code over documents, which all too often is incorrectly seen as justification for typing void main() much too early.
Especially in the world of embedded systems, we don't dare shortchange careful analysis. Early design decisions are often not malleable. Select an underpowered CPU or build hardware with too little memory and the project immediately heads toward disaster. Poorly structured code may never meet real-time requirements. Should the system use an RTOS? Maybe a hierarchical state machine makes the most sense. These sorts of design decisions profoundly influence the project and are tremendously expensive to change late in the project.
Sometimes the boss reinforces this early-coding behavior. When he sees us already testing code in the first week of the project he sees progress. Or, what looks like progress. “Wow—did you see that the team already has the thing working? We'll be shipping six months ahead of schedule!”
8. The use of C
C is the perfect language for developers who love to spend a lot of time debugging.
C is no worse than most languages. But there are a few—Ada, SPARK, and (reputedly, though I have not seen reliable data) Eiffel—that intrinsically lead to better code.
Ada, for instance, is still the favored language for safety-critical applications. Lots of research (for instance Andy German's article on software static code analysis)4 shows that Ada code has half or fewer bugs than C code. SPARK, a relative newcomer, achieves a sixth of Ada's error rate. One might argue that since Ada and SPARK are often used in high-reliability applications, more effort goes into getting the software right. However, Ada programs can be produced for half the cost of C.5
Note that C and C++ dominate this industry. Let's see: they produce buggier results than some alternatives, and cost more to boot. I guess we keep using C/C++ because, uh, debugging is so much fun?
I do like coding in C. It's a powerful language that's relatively easy to master. The disciplined use of C, coupled with the right tools, can lead to greatly reduced error rates. The problem is, we tend to furiously type some code into the editor and rush to start debugging. That's impossible in Ada et al, which forces one to think very carefully about each line of code. It's important to augment C with resources similar to those used by SPARK and Ada developers, like static analyzers, Lint, and complexity checkers, as well as the routine use of code inspections.
So you can change Number 8 to the misuse of C. But the original title is more fun.
7. Bad science
Bad science means one of two things. First, and most common, poor analysis of the real-world events the system monitors or controls. I remember working on a system many years ago when we discovered, to our horror, that lead sulphide IR detectors were more sensitive to ambient temperature than infrared light. That necessitated a major redesign of the system's electronics and mechanics. Yet this was a well-known effect we should have known about. Then there are the systems that don't have enough A/D resolution, precision, and/or accuracy to make meaningful measurements. Poor filter selection can produce noisy data.
The second type is when one stumbles onto something that's truly new, or something not widely known. Penzias and Wilson ran into this in 1965 as they tried and tried to eliminate the puzzling noise in a receiver, only to eventually find that they had discovered the cosmic microwave background radiation.
It's pretty hard to stick to a schedule when uncovering fundamental physics. But most of the time the science is known; we simply have to understand and apply that knowledge.
6. Poorly defined process
While there is certainly an art to developing embedded systems, that doesn't mean there's no discipline. A painter routinely cleans his brushes; a musician keeps her piano tuned. Many novelists craft a fixed number of pages per day.
There's plenty of debate about process today, but no one advocates the lack of one. CMM, XP, SCRUM, PSP, and dozens of others each claim to be The One True Way for certain classes of products. Pick one. Or pick three and combine best practices from each. But use a disciplined approach that meets the needs of your company and situation.
There are indisputable facts we know but all too often ignore. Inspections and design reviews are much cheaper and more effective than relying on testing alone. Unmanaged complexity leads to lots of bugs. Small functions are more correct than big ones.
We have a vast lore documenting techniques that work. Ignore them at your peril!
5. Vague requirements
Next to the emphasis on testing, perhaps the greatest contribution the agile movement has made is to highlight the difficulty of eliciting requirements. For any reasonably sized project it's somewhere between extremely hard to impossible to correctly discern all aspects of a system's features.
But that's no excuse for shortchanging the process of developing a reasonably complete specification. If we don't know what the system is supposed to do, we won't deliver something the customer wants. Yes, it's reasonable to develop incrementally with frequent deliverables so stakeholders can audit the application's functionality, and to continuously hold the schedule to scrutiny. Yes, inevitable changes will occur. But we must start with a pretty clear idea of where we're going.
Requirements do change. We groan and complain, but such evolution is a fact of life in this business. Our goal is to satisfy the customer, so such changes are in fact a good thing. But companies will fail without a reasonable change control procedure. Accepting a modification without evaluating its impact is lousy engineering and worse business. Instead, chant: “Mr. Customer—we love you. Whatever you want is fine! But here's the cost in time and money.”
4. Weak managers or team leads
Managers or team leads who don't keep their people on track sabotage projects. Letting the developers ignore standards or skip using Lint or other static analyzers is simply unacceptable. No relentless focus on quality? These are all signs the manager isn't managing. They must track code size and performance, the schedule versus current status, keep a wary eye on the progress of consultants, and much more.
Management is very hard. It makes coding look easy. Perturb a system five times the same way and you'll get five identical responses. Perturb a person five times the same way and expect five very different results. Management is every bit as much of an art as is engineering.
Most people shirk from confrontation, yet it's a critical tool, hopefully exercised gently, to guide straying people back on course.
3. Inadequate testing
Considering that a few lines of nested conditionals can yield dozens of possible states it's clear just how difficult it is to create a comprehensive set of tests. Yet without great tests to prove the project's correctness we'll ship something that's rife with teeming bugs.
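To make the combinatorics concrete, here is a minimal sketch in C. The function decide(), its five boolean inputs, and the action names are all hypothetical, invented purely for illustration. Ten lines of nested conditionals already have 32 distinct input combinations, and the small harness walks every one of them and tallies the outcomes.

```c
#include <stdbool.h>

/* Hypothetical actions for an imaginary motor controller. */
typedef enum { ACT_RUN, ACT_SLOW, ACT_STOP, ACT_ALARM, ACT_COUNT } action;

/* A few lines of nested conditionals over five boolean inputs. */
action decide(bool estop, bool sensor_ok, bool door_open,
              bool over_temp, bool low_batt)
{
    if (estop)
        return ACT_STOP;
    if (!sensor_ok)
        return ACT_ALARM;
    if (door_open) {
        if (over_temp)
            return ACT_ALARM;
        return ACT_STOP;
    }
    if (over_temp || low_batt)
        return ACT_SLOW;
    return ACT_RUN;
}

/* Exhaustively exercise all 2^5 input combinations, counting how many
   lead to each action. Returns the number of combinations visited. */
unsigned tally_actions(unsigned counts[ACT_COUNT])
{
    unsigned total = 0;

    for (unsigned i = 0; i < ACT_COUNT; i++)
        counts[i] = 0;

    for (unsigned bits = 0; bits < 32u; bits++) {
        action a = decide((bits >> 0) & 1u, (bits >> 1) & 1u,
                          (bits >> 2) & 1u, (bits >> 3) & 1u,
                          (bits >> 4) & 1u);
        counts[a]++;
        total++;
    }
    return total;
}
```

Exhaustive enumeration like this is only practical because every input here is a boolean. Add one integer parameter and the input space grows far beyond what a brute-force loop can cover, which is where the difficulty of comprehensive testing really starts.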
Embedded systems defy conventional test techniques. How do you build automatic tests for a system which has buttons some human must push and an LCD someone has to watch? A small number of companies use virtualization. Some build test harnesses to simulate I/O. But any rigorous test program is expensive.
Worse, testing is often left to the end of the project, which is probably running late. With management desperate to ship, what gets cut?
Design a proper test program at the project's outset and update it continuously as the program evolves. Test incrementally, constantly, and completely.
2. Writing optimistic code
The inquiry board investigating the 1996 half-billion dollar failure of Ariane 5 recommended (among other findings) that the engineers take into account that software can fail. Developers had an implicit assumption that, unlike hardware, which is subject to structural problems, software, once tested, is perfect.
Programming is a human process subject to human imperfections. The usual tools, which are usually not used, can capture and correct all sorts of unexpected circumstances. These tools include checking pointer values; range checking data passed to functions; using asserts and exception handlers, and checking outputs (I have a collection of amusing pictures of embedded systems displaying insane results, like an outdoor thermometer showing 505 degrees, and a parking meter demanding $8 million in quarters).
For very good reasons of efficiency, C does not check, well, pretty much anything. It's up to us to add those that are needed.
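As a rough illustration of adding those checks by hand, the sketch below applies them to an imaginary outdoor thermometer. Everything in it is an assumption made for the example: the function name read_temperature(), the 10-bit ADC, the linear scaling, and the plausibility limits. It checks its pointers, range-checks the input, and refuses to hand back a reading that cannot be real.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plausibility limits for an outdoor reading, in degrees C. */
#define TEMP_MIN_C (-60)
#define TEMP_MAX_C (70)

/* Hypothetical full-scale count of a 10-bit A/D converter. */
#define ADC_MAX_COUNT (1023u)

typedef enum { TEMP_OK, TEMP_ERR_NULL, TEMP_ERR_RANGE } temp_status;

/* Convert a raw ADC count to degrees C.
   The scaling (count / 4 - 100) is invented for illustration only. */
temp_status read_temperature(const unsigned *raw_adc, int *out_c)
{
    /* Check pointer values before dereferencing them. */
    if (raw_adc == NULL || out_c == NULL)
        return TEMP_ERR_NULL;

    /* Range-check the data passed in. */
    if (*raw_adc > ADC_MAX_COUNT)
        return TEMP_ERR_RANGE;

    int c = (int)(*raw_adc / 4u) - 100;

    /* Check the output: never report a temperature that cannot be real. */
    if (c < TEMP_MIN_C || c > TEMP_MAX_C)
        return TEMP_ERR_RANGE;

    /* Document the invariant; active in debug builds, compiled out
       when NDEBUG is defined. */
    assert(c >= TEMP_MIN_C && c <= TEMP_MAX_C);

    *out_c = c;
    return TEMP_OK;
}
```

A caller that gets anything other than TEMP_OK can blank the display or show an error code instead of printing nonsense. Note that the output is left untouched on failure, so the caller must test the status before using the value.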
My wife once asked why, when dealing with a kid problem, I look at all the possible lousy outcomes of any decision. Engineers are trained in worst-case analysis, which sometimes spills over to personal issues. What can go wrong? How will we deal with that problem?
1. Unrealistic schedules
Scheduling is hard. Worse, it's a process that will always be inherently full of conflict. The boss wants the project in half the estimated time for what may be very good reasons, like shipping the product to stave off bankruptcy, or maybe just to save money. Firmware is the most expensive thing in the universe, so it's not surprising that some want to chop the effort in half.
Capricious schedules are unrealistic. All too often, though, the supposedly accurate ones we prepare are equally unrealistic. Unless we create them carefully, spending the time required to get accurate numbers (see Karl E. Wiegers' “Stop Promising Miracles”)6, we're doing the company a disservice. Yet there are some good reasons for seemingly arbitrary schedules. That “show” deadline may actually have some solid business justification. A well-constructed schedule shows time required for each feature. Negotiate with the boss to subset the feature list to meet the—possibly very important—deadline.
Your top ten?
Those are my top ten. What are yours?
Jack Ganssle is a lecturer and consultant specializing in embedded systems' development issues.
2. Riccioni, Col. Everest. “Is the Air Force Spending Itself into Unilateral Disarmament?” POGO (Project on Government Oversight), August 2001. Available at: www.pogo.org/p/defense/do-010801-unilateraldisarm.html.
3. Hartenstein, Reiner. “The Digital Divide of Computing” in the April 2004 Proceedings of the First Conference on Computing Frontiers.
4. German, Andy. “Software Static Code Analysis Lessons Learned,” Crosstalk, November 2003. Available at: www.stsc.hill.af.mil/crosstalk/2003/11/0311German.html
Nice article on the top ten reasons embedded systems projects get into trouble. I'm the Director of R&D at a company that builds commercial fire alarm systems. Many of my top ten are the same as yours – however there are some others. I started my career in hardware, so as you will see, my list is colored by that and the fact that I am principally involved in embedded software from a management perspective:
1.) Code Jockeys – We find that project specifications and functionality are usually driven by the hardware group because finding a software engineer who really cares about the application is rare. In our business we want fire alarm engineers first and embedded software engineers second. Too frequently the embedded software engineer is only interested in coding up the application. Many, many problems would be found and circumvented earlier if the embedded engineer knew the application and realized that what he was implementing was incomplete or had applications errors. I can't count the number of projects that have needed additional software work because the software engineering team forgot important elements in the design, or blindly implemented things that made no sense from an application standpoint.
2.) “Better” is the enemy of “good enough” – One of the reasons hardware slips less frequently than software is that it is easy to tell when a hardware engineer is “polishing a turd.” Every design change requires a new board, so design changes are apparent and can be controlled and overruled. In software it is not apparent if the engineer is changing the code unless you are looking at the code in every routine in the system on a daily basis. Our systems are 100,000 to 300,000 lines of code so that is practically impossible to do. A hardware engineer who worked for me had a quote from General Patton on his wall: “A good plan, violently executed now, is better than a perfect plan executed next week.” That sentiment is far more common among hardware engineers than software engineers.
3.) Resistance to process – For some reason software engineers have to be forced, kicking and screaming, to follow a process well. Perfunctory code reviews don't find problems and that is used as evidence as to why we shouldn't have code reviews. Lines of code are only counted when management demands to see them. Collecting metrics is viewed as an evil plot by management, coding standards are fascist constraints on creativity, and unit testing is a delaying tactic to prevent the engineer from moving on to more coding.
4.) Estimation by “expertise” – linked to number 3 is the inclination of software engineers to rely on their “experience” in estimating new software rather than metrics. In my organization this is worse in my PC software group than the embedded, but that is principally because my embedded manager is a convert to metrics and my PC manager is not. I have even had cases where a metrics driven estimate was given and I only found out later that experience was used to create the estimate and then the man months predicted by experience were used to generate the LOC estimate. Unfortunately the way I found this out was when doing a root canal to find out why we were running at double the predicted schedule (and not coincidentally, double the LOC).
5.) Code bloat – We have basic fire panels that have 4000 LOC. We also have intelligent horns with 4000 LOC. How can that be? My cynical manager belief is that the small panel was written by an engineer who understood the application and was primarily interested in getting a functional unit. The engineer who wrote the horn code was interested in a nicely structured system with data abstraction and clean interfaces (see #1 above). Those are all good things, and are absolutely necessary when writing 100,000 LOC. But when writing code that should be 500 or 1000 LOC, the overhead associated with those things is unnecessary. Unfortunately it also means that developing the software costs 4x as much and there are 4x more opportunities for defects.
6.) Lack of architecture – The flip side of #5 is the engineers who think that you can code up a 100,000 line panel using the same level of architecture and design that you used on a 1000 line horn. This is the same as your #9.
Finally, I would say your #4 is actually my #1. A good project manager will eliminate all the other reasons for getting into trouble. He won't create unrealistic schedules and he will insist that good practices are followed. In my experience there are not many really good project managers in the embedded software ranks, but those I have met have delivered their projects successfully and have a track record of on-time performance that is similar to hardware.
Another excellent article from you Jack, however I needed to write to you to address some of the points that “Anonymous” the Director of R&D for a fire alarm company added to the end of your article.
His preamble really didn't need to include his background in hardware, it was very obvious from his lack of understanding of a valid software methodology to follow.
#1 You don't want your embedded software engineers “caring” about a project; you want “ego-less” programming. Counting on a person to have “parental feelings” about a project to guarantee its success is foolhardy at best. This whole paragraph reeks of poorly defined requirements and design. He is blaming his software engineers for his lack of ability to create a team to define a product, and instead he expects his engineers to somehow magically divine his and the marketplace's requirements for a given product. His statement, “I can't count the number of projects that have needed additional software work because the software engineering team forgot important elements in the design, or blindly implemented things that made no sense from an application standpoint,” is extremely telling. Who is defining the work to be done? Where is the traceability back to design and requirements? Nope, this is just another “suit” that either doesn't know what to implement, or doesn't want to put his butt on the line to define a product in sufficient detail so that the engineers don't have any wiggle room for what gets included or not included. “Design creep” is an indication of process / management failure, not a failing of the Engineers.
Given the lack of respect for s/w Engineers by this Director I would bet that the salary level for these s/w Engineers is below average as well. I can just see this guy spouting “Hell I can get hardware engineers for $45k per year, I won't pay more than that for some lousy 'Code Jockey'!” Which means that this would be a destination for the newly graduated and freshly imported H1B engineers. Who else would put up with his crap?
#2 Two words: “configuration control”. “Yes Timmy we can tell what's changed in a s/w project”. He states 300,000 lines of code as if that is huge. Mr. Director, you have no idea what a huge software project is; this is just getting warmed up. Wasn't Patton fired??? This guy should be as well. Rushing headlong into coding before waiting for that “perfect” design is obviously this Director's methodology. One of the more interesting changes in the last few years in new product design is the use of VHDL to define FPGA and CPLD functionality. Before this, hardware engineers had the advantage of working with previously defined building blocks (i.e. “components”) that generally had large margins for tolerance and failure designed into them by the manufacturer. Now that hardware engineers have gotten into programming VHDL, their lack of concern for reliability and tolerances is becoming evident. I have taken to reviewing VHDL code to look for bad practices perpetrated by hardware engineers who are used to predefined and hence reliable components. So I guess I would agree with him that h/w engineers are more likely to follow Patton's bad advice.
#3 Yup, nobody likes to perform routine, worthless procedures. “TPS reports” are nobody's favourite thing to do! Are you debating with your H1B engineers the value of your defined process? Could it be they may have some valuable insight into improving the process? Are s/w engineers pointing out that your favourite tool (i.e. “code metrics”) to judge them with is flawed? If only there was a simple number that would tell me if a program was well coded or not, code metrics is something that keeps SQA engineers in large companies employed, it is not a “silver bullet”, in fact there is no “silver bullet”. OOOOOOhhhhhh here is another good quote: “and unit testing is a delaying tactic to prevent the engineer from moving on to more coding.” Why are your development engineers being delayed by unit testing? Are they performing the tests? Have you ignored the idea of independence in your testing? Your engineers are correct, they should be working at a pace that suits them, if they have to stop to perform some mind numbing task such as unit testing their own code then “yes”, you are killing your development team's productivity. Software Engineers need to test their own code to their own level of satisfaction, but formal testing is independent, and controlled.
#4 Looking for that “silver bullet” again huh? There is no guarantee on schedules, they are coordination tools which are dynamic and constantly changing, not “tablets of jade, with letters of gold”. Given that we have already established that your projects are not properly defined how can you hope to estimate a schedule when you don't know what is to be implemented. If you absolutely have to have it by a drop dead date (I call this “management by show dates”), then prepare to scale back product features as this date nears, not add new features in hallway conversations with the software engineers.
#5 “intelligent horns”.. Somehow I feel there is a steer joke in there somewhere. This is so rife with great quotes here is another: “The engineer who wrote the horn code was interested in a nicely structured system with data abstraction and clean interfaces”. I guess Mr. Director prefers obfuscated code with magic numbers and buggy interfaces.
Anyone want to bet that the s/w engineer who did this horn has already left that company after being confronted with what Mr. Director calls “management”???
And then there is this: “software costs 4x as much and there are 4x more opportunities for defects.” Yes, doing it correctly does cost more, is this a surprise to you? And as for defects do you suppose that a poorly structured system with magic numbers and 'dirty' interfaces somehow lowers the number of defects?
#6 Well following in Mr. Director's opinions for development, yes that 100,000 line project should also use obfuscated code, magic numbers, and bad interfaces if for no other reason but to lower its outrageous cost and keep the defect count low. This guy needs to seek help!
#7 (virtual) Yes good people are hard to find, especially management. But hanging on to good people is even harder especially when you have a “suit” who thinks he knows how to run software development projects.
– Chris Gates
In Break Points, Jack Ganssle criticizes C for its error-proneness, citing a document (www.adaic.com/whyada/ada-vs-c/cada_art.html) written in 1995.
This document contains contradictory statements, omissions, subjective assumptions, and statistically invalid conclusions.* But I'm not here to defend C or attack an article that was merely referenced in Embedded Systems Design.
Rather, I'm pointing out yet another instance where we're failing ourselves by using flawed evidence to argue for or against a language (methodology, etc.). At the same time, we complain that software is still too much of an art and not enough of an engineering discipline!
I understand why a valid study of, say, a comparison between two languages is too costly to produce: the number of factors that would have to be controlled for is overwhelmingly large, and such a study would be prohibitively expensive for any single company to perform. Furthermore, most software is written by small teams, where differences in personal skill levels overwhelm any differences in the choice of language or approach.
One could therefore argue that even if the studies done to date are imperfect, we should still pick the best language based on what we know today. Best based on what? Defect rates? Availability and cost of developers? Learning curve? Clearly there is no language that's best for every project, even within the embedded realm. Furthermore, making decisions based on research any statistician would reject cannot equal progress in making software development into an engineering discipline.
—Grant D. Schultz
Senior Software Engineer
* The following observations are sufficient to render the C/Ada comparison, and thus Jack Ganssle's conclusion, invalid. This of course doesn't prove C superior:
- The system was begun in C. Therefore many of the architectural and design-level mistakes were already known and/or corrected before Ada was brought into the picture. The article mentions this (“In this project there was substantial value carried from the old design, so that the Ada version was able to get a better start.”) but does not demonstrate that this had no impact on the outcome.
- According to the article, “By mid-1991, the amount of Ada and the amount of C in VADS had reached approximate parity.” However, it also states that, up to October of 1994, 54% of the combined C and Ada SLOC was C. If these are both true, then the relative amounts of the two languages remained in near parity for around three years. This contradicts the claim that the project gradually shifted to Ada.
- The article states, “Since the base salary figures do not take into account salary changes during the development period, nor inflation, nor exact assignments, nor time spent on other (e.g. sales and mentoring) activities, nor many other burden costs,…” If that is the case, then the cost figures should not be used.
- No objective definition of “new features” seems to be given, nor is the word “maintenance” used anywhere in the document. In all other software metrics literature I have read, these two things are covered separately, as the effort required for each type of work differs. Also, each type of work is typically assigned to different people.
- “C was used for the biggest, oldest sections of VADS.” Yet no account is made for the fact that older code can grow buggier over time, particularly if developers know that the old code will be eclipsed by code in a newer, more glamorous language.
- The article says, “It is expected that more gifted developers would work on more difficult projects and so fix rates would depend more on individual update habits than on true merit.” But we all know that those more skilled in office politics will increase their chances of being put on the newer, more exciting parts of a project. We have no way (nor does the author provide one) to objectively evaluate developer skill levels.
- The graph entitled “Bug Fix Rates vs Experience” has axes labeled “Bugs Fixed / Feature” and “Number of Updates”.
- No basic statistical measures of correlation are given for any of the data where a correlation is claimed. It may be visible to the eye, but a more scholarly work would include these additional stats.
Jack Ganssle responds: Thanks for your comments. Your point about the study is well-taken. And you're so right about the difficulty of running a cost-effective comparison between languages. Ada is surely no panacea. However, my point was this: C offers us little inherent protection against sloppy coding or careless mistakes. Other languages do. As I wrote, that doesn't make C unfit for embedded use; what it means is that we must be diligent in our processes and use adjunct tools (Lint, etc) that help us build correct code.