It's always the software when something goes wrong - Embedded.com

It’s always the software when something goes wrong

Everybody blames the software when something goes wrong. Are software engineers just put-upon? Turns out, they deserve it.

Everybody always blames the software. The phones go out? Software glitch. Your clock can't keep up with Daylight Saving Time (as I outlined in an earlier column)? Software needs to be updated. I can't get my PDA to talk to my laptop over the Bluetooth link? Software profiles are to blame.

Pity the poor code developer. It's always his fault. Those hardware designers always seem to get off easy.

Unfortunately, I've run up against yet another problem where the software took the blame. If you're a user of XM Satellite Radio, like I am, you may have received this same e-mail from the company, which states (slightly paraphrased):

“As many of you know, XM customers have experienced service outages or significantly degraded service. We quickly identified the problem and are working hard to return to our normal levels of service. The problem occurred during the loading of software to a critical component of our satellite broadcast system, which resulted in a loss of signal from one of our satellites.”

Is my perception of the software usually being the culprit based on fact, or is it simply the quick and easy response? Keep in mind that I come from a hardware background. Generally, whenever we found an issue in the board design (Note the term “issue.” We never had hardware “problems.”), we simply threw it over the wall and had the software guys write a work-around. That's a free fix, right?

To get a perspective from someone who makes a living on the “other” side of the wall, I contacted our respected columnist Dan Saks. I asked Dan whether my perception of the software always being at fault was accurate. Much to my surprise (and delight), Dan agreed with my assessment.

In Dan's view, software has become so complex that when a problem occurs, the cause usually is in the software. Generally, by the time you get your product out in the field, the hardware has been tested, tested, and tested again. You certainly don't want to do a hardware fix after you've released the product to manufacture.

But a software fix (or update) is something that can take place almost any time. That phenomenon is even easier thanks to the increased use of flash memory. If the system is built properly, like most handsets for example, the vendor can ship the product with software that's “almost 100%.” Then they simply send down a fix over the network.

In my eyes, it sounds like cheating. What's your opinion?

Richard Nass is editor in chief of Embedded Systems Design magazine. He can be reached at .

Reader Response


Just once, I'd like to see hardware guys develop a medium complexity 21st century project without the firmware part.

Use transistors, relays, gears, I don't care. Just make something like a cell phone but no microprocessor.

It will be big and expensive guaranteed. That's not the purpose of the test.

I just want to see the number of bugs in a hardware-only solution.

-John Davies
President
Montrose Hill Systems, Inc.
Pittsburgh, PA


I don't think it's the software developers' fault as much as management's. There is pressure early on to get hardware the software guys can use. Then, as soon as the hardware is ready, Management and Sales force (usually unrealistic) ship dates. Because “The Budget” requires that it ship on a particular date, they cut corners on the to-be-completed items, which usually means “software/integration testing.”

It's extremely frustrating and is a major reason for me to be looking elsewhere.

-Andy Kunz
Sr. Firmware Engr
Transistor Devices, Inc.
Hackettstown, NJ


Richard,

I agree with you both. Because of hardware lead times. Because of Flash memory. Because of changing hardware in the field. Because of software complexity. From a biz sense, have software “fix” the problem. In designing a system, you need to design not only for the requirements but also for the changes that could and will occur. When an issue is found, you need to quickly find the problem, come up with a solution, and get it to your customers. All this is dependent on how well you designed your system (hardware & software) in the first place.

-Mark Meyer
Real-time Embedded Software Engineer
Differential Designs Inc.
Commerce, MI


Yes, the software engineer always has to face the problems, surely because the software guy's brain works as fast and as restlessly as the clock cycles of the hardware.

-Vijaykumar Thota
Software Engineer
Siemens
Babenhausen, Germany


It's always easy to look at the things that can be worked out without a lot of pain.

Whenever it comes to software, there is always scope to look for a workaround for those things that are improper in hardware. Take the example of a fixed-point processor: everyone would like to see performance comparable to floating-point precision, achieved by developing algorithms or by any other means.

Precisely, software can change the way things happen, and whenever these ways are not 100%, everything comes back to the guys who are responsible. So, of course, software ... :)

-Suresh Kurmi
software engineer
STMicroelectronics
Noida, India
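
The fixed-point remark in the letter above can be made concrete with a small sketch. It assumes the common Q15 format (16-bit signed values with 15 fractional bits), which the letter does not name; the helper names `to_q15`, `from_q15`, and `q15_mul` are hypothetical and chosen only for illustration. It shows the kind of rounding and saturation work that software takes on when the hardware has no floating-point unit.

```python
# A minimal sketch of Q15 fixed-point arithmetic (the Q15 format is an assumption).
Q = 15                 # number of fractional bits
SCALE = 1 << Q         # 32768, the integer that represents 1.0


def saturate(v: int) -> int:
    """Clamp a value to the signed 16-bit range used by Q15."""
    return max(-SCALE, min(SCALE - 1, v))


def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to a Q15 integer, rounding to nearest."""
    return saturate(int(round(x * SCALE)))


def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a float."""
    return v / SCALE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values with rounding and saturation."""
    product = a * b              # intermediate result has 30 fractional bits
    product += 1 << (Q - 1)      # add half an LSB so the shift rounds to nearest
    return saturate(product >> Q)


if __name__ == "__main__":
    result = q15_mul(to_q15(0.3), to_q15(0.7))
    print(from_q15(result), "vs floating point", 0.3 * 0.7)
```

The result tracks the floating-point product to within a few least-significant bits, but only because the software handles scaling, rounding, and overflow explicitly.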


The conversation continues . . .


The open-ended increasing complexity of the software problem domain and managing change are precisely why software engineering is inherently difficult.

The fact that a given program can't be proven correct makes it unlikely it'll be error proof. Keeping it simple enough to be reliable while adding complexity is something hardware engineers rarely face.

The way business regards changes to hardware vs. software compounds the problem. I'd like to hear more about why these differ. Scope changes and investment in testing seem to be managed more rigorously for hardware than software.

-Talman Stoner
Principal Engineer
Knowledge-wave
Beaverton, OR


As for my take, although I am a software person myself, once the product hits the market, I would rather that all problems become software problems. As you have well pointed out, an enormous amount of resources is needed to fix a hardware glitch. But fixing software basically requires us to upload a patch somewhere secure and prepare it for download, whether it is a bug fix or a safe hardware workaround. Hardware, on the other hand, at its very worst, equates to RECALL. I guess at the end of the day, whether we are executives or suffering, burdened engineers, it is all about defending our profit, not as a particular branch of development but as a business unit maximizing corporate profit, and hoping it comes back as bonuses or tax refunds :).

-Gauvin Repuspolo
Firmware Developer
Makati, Philippines


Software usually has to bear the blame for most system failures…that is the price software has to pay for easy reconfigurability. Once the hardware is done, we seldom tinker with it, because it is time consuming and expensive. As a result software is changed to cover up for hardware bugs, poor initial design, requirement changes, etc. This ensures that in the system the modules most susceptible to errors/bugs are in software.

-Sibin Thomas
Senior Engineer
MindTree Consulting
Bangalore, India


Look at where the functionality and complexity are in today's products–most of it is in the software. A typical mechanical fabrication drawing will have at most a few dozen dimensions, and a schematic might show a few hundred interconnections, but software complexity is measured in thousands of lines of code. The bugs/issues tend to cluster where the complexity is.

Everyone expects to build and test fully functional mechanical and electronic prototypes but the software is not a prototype if it appears to be fully functional–it is sold!

It would be interesting to develop a fair way to compare the ratio of issues/complexity for a statistically significant sample of mechanical, electronic, and software projects. I suspect that the ratios would be pretty similar, though mechanical might come out slightly better because they have managed to standardize their API (SAE, ANSI, ISO,…) for long enough that there aren't too many ways left to screw up a 6-32 x 1″ screw.

-Mark Dresser
Don Mills, Canada


Although it is true that the vast majority of problems are in the software rather than in the hardware, it is not about cheating. In my experience, and that of others as senior as I am (40+ years of hardware and software), it is a variant of “the shuttle disaster.” That is, management does not listen to engineering.

Specifically, in a recent example, management said, for valid business reasons, that it wanted the next software release in 6 months. When engineering said that a realistic estimate for the requested feature set was 10-11 months, the response from management was to work harder (implicitly, cut corners). After all, creative work is the same as laying bricks. Crack the whip and a few more layers of bricks will appear after a little blood, sweat, and tears. As we know, creative work is not like that.

It is not cheating, it is GREED, plain and simple, and the inability of many managers to understand that the creative process (architecture and software design in this case) does not flow uniformly. Nor are all schools and programmers equal in their abilities, even though such programmers can usually get something to work. Clarity, robustness, performance, and size optimization seem not to matter much these days with bloated RAM and disks.

So far, the above discussion has neglected QUALITY. In the case I was referring to, we had a backlog of over 2,000 deferred bugs, with software for new or improved features being written on top of those bugs, on a system with low visibility into internal behavior. Debugging took a lot of time. Finally, the sheer number of core dumps forced management to go into a “fix the bugs” mode. Now we have a stable and robust software system.

Our product is currently faster than the competition's, but everywhere I look I could speed up the software by 20 to 200 with well-understood algorithmic changes. But conservative management, trying to minimize risk, is afraid to replace the slow code with improved, faster code. We have one programmer who always gets their code to work, but where a poor programmer will get speed O(N), a good programmer can get O(log(N)), and a great programmer can get O(4), this programmer gets O(N*N).

The project management was nonexistent. At the end of the 6 months, when the programming staff still said 6 more months were needed, we were told to work harder (70-80 hours a week for 9 months), and that we would be done in a week or two. The result of this greed (management's stock options will be worth $X) is the burnout of many wonderful people who had to quit.

-Name withheld on request
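
The complexity classes mentioned in the letter above can be illustrated with a small sketch. The letter does not say what task was involved, so this example assumes a simple membership test on a sorted list; all four function names are hypothetical. The quadratic version shows one common way such code arises by accident: it still "works," but copies a growing prefix on every pass through the loop.

```python
from bisect import bisect_left


def linear_contains(sorted_items, target):
    """O(N): compare against every element in turn."""
    for item in sorted_items:
        if item == target:
            return True
    return False


def binary_contains(sorted_items, target):
    """O(log N): halve the search range each step (input must be sorted)."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


def hashed_contains(item_set, target):
    """O(1) on average: a hash lookup in a set built ahead of time."""
    return target in item_set


def accidental_quadratic_contains(sorted_items, target):
    """O(N*N): the slice copies i + 1 elements on every iteration."""
    for i in range(len(sorted_items)):
        prefix = sorted_items[: i + 1]   # hidden O(i) copy inside an O(N) loop
        if prefix[-1] == target:
            return True
    return False


if __name__ == "__main__":
    data = list(range(0, 2000, 2))       # sorted even numbers
    lookup = set(data)
    for value in (998, 999):
        print(value,
              linear_contains(data, value),
              binary_contains(data, value),
              hashed_contains(lookup, value),
              accidental_quadratic_contains(data, value))
```

All four functions return the same answers, which is why the slow version can pass testing and still ship; the difference only shows up as the input grows.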


The conversation continues . . .


I also come from a hardware background, and I agree that release of software is very poor. But do not blame the engineer behind it.

You really did name the reason: the vendor CAN ship the product with incomplete software (something impossible with hardware). Why is this the decision? First, time to market. Second, it saves money.

Be sure that today there are known means to test and validate software to nearly 100% (hardware cannot be tested against 100% of the possibilities either). The “issue” is that this is very expensive and takes quite a long time.

What would have happened if Microsoft had released Windows 95 in 2000 because of an intensive validation program? Who would have paid for the resulting expensive software? First, their competitor would have taken the lead; second, they would not have been able to sell a single copy of it.

Then, if we can accept that an electronic gadget (computer, PDA, phone) fails “because software is not completely validated” and a simple fix may be sent to solve this, why go to the trouble of making the software robust? Let the users do the validation work and enjoy making money so “easily”!

If a battery in a notebook burns, what happens to the manufacturer? Can you imagine serious damages because of software failures (or business losses)? What would happen then? Why do we accept that software is “so complicated” that it cannot be validated? What does “complicated” mean? Why does a plane never have a failure in software that makes it crash? (Do they use software “gods” for engineers?)

I suggest that the blame goes on the businessman behind the software engineer, or on the end user who does not claim “big” compensation if software fails (or on the “system” that enables this situation).

What do you think? For me, no engineer likes to leave the work unfinished (and they know it is).

Regards,
-Jose G. Fernández
Advanced Electronics Principal Engineer


Rich,

You cut to the heart of the issue by ironically describing a software work-around as a “free fix.” The problem is that software has always been perceived as free. Perhaps it is the ethereal nature of software versus the physical presence of hardware that renders this perception. Is it the normally faster turnaround time of software or the vision that a forklift upgrade of software can be accomplished without an actual forklift or even any physical presence? How many schedules have been “time compressed” due to hardware that is late and/or delivered with “issues” that software needs to discover and fix? How often are “system issues” left to the software developers? Who really needs to monitor deployment of a complex system?

No, it's not cheating. The reality is that software is the key to making money. Meeting the time-to-market requires software developers to monitor hardware designs to catch as many “issues” early as possible, spend time with sales and marketing to fully understand the key deliverables, survey the deployment site and procedures to enhance the customer perception, validate the system against external functional interfaces, and, of course, actually develop the software that delivers against all those requirements, all the while anticipating the direction the market is moving. Software development is far beyond mere programming.

What might be considered cheating is that a vendor can ship a product that is “almost 0%” complete in applications but has a solid “software forklift” to build the applications in situ function by function, download by download. Customer expectations usually demand significantly more than 0% upon initial installation, but they will also accept significantly less than 100% functionality if they can start their revenue stream earlier. That really isn't cheating. That is a strategic competitive advantage.

-Brian R. Zimmerman
BSEE, MBA
Senior Member, IEEE


With your article, I've realized that the software quality crisis is becoming more of an issue in the world of embedded systems. I'm glad you opened up this topic.

Technologies or techniques that contribute to improvement of software quality matter, and I believe that opportunities in this field are almost limitless. For instance, the key is to detect as many bugs as possible in the early stages of development using Failure Mode Effects Analysis, proper implementation of CMMI or Agile techniques, or whatever your organization sees fit. Even computer-assisted software engineering (CASE) tools that compensate for the human limitations of software engineers are becoming increasingly significant.

As your article points out, because software is more “flexible” and agile when it comes to fixing system problems, software engineers inherently take the responsibility for the glitches particularly when hardware fixing can cause a bitter recall. Case in point: About ten years ago, the Flash ROM was only used in a small fraction of the total products we manufacture–it used to be included in customer sample units only, but nowadays this type of re-programmable memory is incorporated in almost all of our products, ensuring that software upgrade is possible even after deploying to market.

Nevertheless, as a project manager in particular, I feel it's an embarrassment for us in the software engineering department to still incur software bugs in the later stages of product development and I would absolutely hate to receive a report from the Customer Service team about a bug that occurred after mass production! (That could get me and my team on the hot seat! :))

-Honofre Tingson, Jr.
Manager, Audio-Visual Embedded Software Engineering team
Fujitsu Ten Solutions Philippines, Inc.
Manila, Philippines


I read your article “It's always the software when something goes wrong.” I agree with your article that software is the problem. But I would also argue that in most projects I have seen, the time allocated to the hardware and software work items seems to be an issue. All projects have finite development cycles, and most projects seem to allocate a lot of time for hardware (development + verification), which eats into the time left for software engineering. Testing of software is not 100% automated (at least in the first run/release of a new product), which leads to issues in verifying software. Also, the effort required for “last minute” hardware changes is typically underestimated from a software development perspective.

-Vividh Siddha

