The mandate for certified safe and secure software used to be the exclusive domain of military, medical, government and other niche areas. New regulations are beginning to play a critical role in the viability of devices manufactured for the global market.
As with many technologies, the military was one of the first “industries” to mainstream the use of computers in both its infrastructure and weapons systems. One can only imagine some of the spectacular failures that led to the development of some of the military-specific standards.
Regardless, the military was one of the first to propose rigor in the development of software used in military devices. From military applications, it was a natural evolution for software to move into civilian applications such as avionics. First used in communication, diagnostics and guidance systems, software control systems have moved into the arena of flight control, where fly-by-wire systems have now been deployed in commercial aircraft. The European Airbus A380 is a perfect example of an aircraft flown entirely by computer; there are no mechanical linkages between the pilot and the flight control surfaces.
Medical devices are another area where the safety of software plays a role in ensuring both operator and patient safety. Programmable electronic devices are deployed in everything from portable blood glucose monitors to implanted heart defibrillators. Increasingly, automobile manufacturers are adding more and more computing power to their products. The reasons range from safety to environmental concerns to cost.
Engine management software cleans our exhaust and controls the transmission to ensure optimal performance, and anti-lock braking software maximizes stopping power. In the late 1990s, BMW replaced the wiring harness used for controlling things like electric door locks, mirror and window controls with a simple two-wire CAN bus, and as a result, eliminated over 10 kg of wiring from the vehicle. Nowadays, modern luxury vehicles contain upward of 80 programmable electronic devices.
Many automotive manufacturers are toying with the idea of X-by-wire systems (steer by wire, brake by wire). This is an attractive feature from the standpoint of safety: removing the steering column also removes the prospect of impaling the driver in an accident. Furthermore, the manufacturer no longer has to maintain two versions of the vehicle, because the steering wheel and the glove box can be interchangeable. The dealer can customize the car for driving either in the US/Europe or in the UK/Japan/Australia.
The use of software in the aforementioned devices improves their functionality and usefulness, but if that software fails, then in some cases the results are catastrophic. Expensive devices may be ruined, but worse, there is a potential for loss of life.
Notable Software Bugs
July 22, 1962 – Mariner I space probe. A bug in the flight control software caused the Mariner I rocket to calculate an incorrect trajectory. The rocket was destroyed by Mission Control over the Atlantic.
1982 – Soviet gas pipeline. Conspiracy theories aside, a bug in the Soviet gas pipeline control software caused the largest non-nuclear, man-made explosion in history.
1985-1987 – Therac-25 medical accelerator. A radiation therapy device had a bug that could lead to a race condition. When that condition occurred, the patient received many times the recommended dose of radiation. The failure directly caused the deaths of five patients and harmed many more.
January 15, 1990 – AT&T network outage. A bug in a new release of code caused AT&T's switches to crash. Over 60 thousand New Yorkers were left without phone service for nine hours.
June 4, 1996 – Ariane 5 Flight 501. A bug in the Ariane 5 rocket's software caused the engines to overpower, resulting in such extreme acceleration that the rocket ripped itself apart.
November 2000 – National Cancer Institute, Panama City. Operators found that they could trick the software of a radiation therapy device. Despite the legal requirement that all treatment schedules be rechecked by hand, the device delivered twice the recommended dose. Eight patients died and 20 more will undoubtedly be permanently disabled.
May 2004 – Mercedes-Benz “Sensotronic” braking system. In one of the largest recalls in automotive history, Mercedes-Benz had to recall 680,000 cars due to a failure of its Sensotronic braking system.
It is interesting to note that in every case, these system failures occurred in devices whose designers knew in advance the possibly devastating results that a software failure could cause, and made every effort to prevent it. It is also interesting to note that in the case of the National Cancer Institute in Panama, even with a supposedly attentive operator bound by law to recalculate the settings by hand (but who did not), the device still caused harm.
From a historical perspective, there are a number of accepted, if not mandated, standards that many industries must adhere to: military and avionics, aerospace, nuclear and power plants, rail and medical. Their standards provide guidance as to how software (if not the entire device) is to be designed and deployed. They vary in their rigor, guidance, application and impact on development, but their goal is the same: to produce safe and reliable devices.
As a side note, it was pointed out to me that software safety, software security and software reliability are not one and the same. As a contrived and trivial example of the difference, a fire suppression system does not have to be reliable in the sense that it always works as one would expect; the goal of safe software is that if it fails, it fails in a safe fashion. In the case of a fire suppression system, that may mean that if the software fails, the fire suppression system comes on.
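The fail-safe idea in the fire suppression example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sensor callable, the 0.0 to 1.0 reading range, the threshold and the `Valve` type are all hypothetical, and the choice that "safe" means "suppression on" is taken from the example above.

```python
from enum import Enum


class Valve(Enum):
    CLOSED = "closed"
    OPEN = "open"  # suppression agent released


def decide_valve(sensor, threshold=0.5):
    """Return the valve state, defaulting to OPEN on any fault.

    `sensor` is a hypothetical callable returning a smoke level in [0.0, 1.0].
    """
    try:
        level = sensor()
        if not 0.0 <= level <= 1.0:
            raise ValueError("sensor reading out of range")
        return Valve.OPEN if level >= threshold else Valve.CLOSED
    except Exception:
        # Assumed fail-safe choice: if anything goes wrong, suppression comes on.
        return Valve.OPEN
```

Note that the sketch makes no claim to reliability: a flaky sensor would trigger the system needlessly. It only ensures that a failure ends in the state assumed to be safe.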
The two standards examined here actually treat the device (the aircraft in the case of avionics, a medical device in the second case) as a total system. For this paper, however, only the software aspects will be considered.
The first is the Federal Aviation Administration's (FAA) DO-178B standard. Titled “Software Considerations in Airborne Systems and Equipment Certification,” the standard known as DO-178 was first published in 1982 by the Radio Technical Commission for Aeronautics (RTCA).
After two revisions, the current version, DO-178B, was released in 1992. The standard was developed to establish guidelines on how software is designed, implemented, maintained and used in aircraft. Basically, it specifies that every line of code be directly traceable to a requirement, that every test case be traceable to a line of code and that every line of code have a corresponding test case.
The DO-178B standard has five levels of certification, each of which equates to the potential for harm if the system fails. The lowest is Level E and the highest is Level A. The potential for harm and the level of certification are:
* Level A: Where a software failure would cause or contribute to a catastrophic failure of the aircraft flight control systems.
* Level B: Where a software failure would cause or contribute to a hazardous/severe failure condition in the flight control systems.
* Level C: Where a software failure would cause or contribute to a major failure condition in the flight control systems.
* Level D: Where a software failure would cause or contribute to a minor failure condition in the flight control systems.
* Level E: Where a software failure would have no adverse effect on the aircraft or on pilot workload.
As an example of the various types of applications and their potential for causing harm, the in-flight entertainment system may be considered Level E, while a fly-by-wire system is considered Level A. As the potential for catastrophic failure increases, so does the amount of diligence required to prevent it. For all levels of the standard, almost all of the following “Certification Artifacts” are required:
* Plan for Software Aspects of Certification
* Software Development Plan
* Software Verification Plan
* Software Configuration Management Plan
* Software Quality Assurance Plan
* Software Requirements Standards
* Software Design Standards
* Software Coding Standards
* Software Requirements Data
* Software Design Description
* Software Verification Cases and Procedures
* Software Life Cycle Environment Configuration Index
* Software Accomplishment Summary
The documents above provide “Best in Practice” techniques for the design, implementation, deployment and maintenance of the software during its life cycle. The records listed below prove that those practices were followed.
Records and Test Results
* Software Verification Results
* Problem Reports
* Software Configuration Records
* Software Quality Assurance Records
The most rigorous aspect of the DO-178B standard is its approach to quality assurance and testing of the code. That goal is accomplished by “Functional Analysis” of the software and by “Structural Coverage Analysis” of the software.
The goal of functional analysis is to show a one-to-one correspondence between the code that makes up the software and the requirements (traceability); basically, “this code is here because of this requirement.” The functional analysis tests the software through boundary testing and other techniques, and demonstrates that it does what it is supposed to do without undefined results.
There are three levels of structural coverage analysis:
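A traceability check of this kind can be sketched as a simple cross-reference between requirements, code and tests. The sketch below is an assumption-laden illustration, not anything prescribed by DO-178B: it traces at the granularity of named code units rather than individual lines, and the function name and dictionary layout are hypothetical.

```python
def trace_gaps(code_to_reqs, test_to_code, requirements):
    """Report gaps in a requirements/code/test cross-reference.

    code_to_reqs: code unit -> list of requirement IDs it implements
    test_to_code: test case -> list of code units it exercises
    requirements: iterable of all requirement IDs
    """
    # Code that exists without a requirement behind it.
    untraced_code = sorted(u for u, reqs in code_to_reqs.items() if not reqs)

    # Requirements that no code unit claims to implement.
    implemented = {r for reqs in code_to_reqs.values() for r in reqs}
    unimplemented = sorted(set(requirements) - implemented)

    # Code units that no test case exercises.
    tested = {u for units in test_to_code.values() for u in units}
    untested_code = sorted(set(code_to_reqs) - tested)

    return {
        "untraced_code": untraced_code,
        "unimplemented_requirements": unimplemented,
        "untested_code": untested_code,
    }
```

An empty list under every key is the “this code is here because of this requirement” condition described above; anything else is a gap to be explained or removed.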
* Statement Coverage
* Decision Coverage
* Modified Condition/Decision Coverage
Statement coverage essentially means that each line of code has been executed at least once. Decision coverage means that each entry and exit point has been executed at least once and that every decision has taken all of its possible outcomes at least once. Modified Condition/Decision Coverage (MC/DC) additionally requires that every condition within a decision has taken all of its possible outcomes at least once and that each condition has been shown to independently affect the decision's outcome.
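The difference between decision coverage and MC/DC is easier to see on a concrete decision. The following Python sketch uses a made-up three-condition decision, `a and (b or c)`, and checks whether a set of test vectors demonstrates the independent effect of each condition. It assumes the strict “unique cause” form of MC/DC, where the two tests in a pair differ in exactly one condition; the function names are hypothetical.

```python
from itertools import combinations


def decision(a, b, c):
    """Example decision with three conditions (hypothetical)."""
    return a and (b or c)


def shows_independence(tests, index):
    """True if two tests differ only in condition `index` and flip the outcome."""
    for t1, t2 in combinations(tests, 2):
        differs_only_here = all(
            (x != y) == (i == index) for i, (x, y) in enumerate(zip(t1, t2))
        )
        if differs_only_here and decision(*t1) != decision(*t2):
            return True
    return False


def achieves_mcdc(tests, n_conditions=3):
    """True if the decision took both outcomes and every condition is shown
    to affect the outcome independently."""
    outcomes = {decision(*t) for t in tests}
    return outcomes == {True, False} and all(
        shows_independence(tests, i) for i in range(n_conditions)
    )
```

Two vectors such as `(True, True, False)` and `(False, True, False)` already drive the decision to both outcomes, which is enough for decision coverage, but they say nothing about `b` or `c`. Adding `(True, False, False)` and `(True, False, True)` gives four vectors for three conditions that together satisfy the check above.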
The amount of structural coverage analysis depends on the level of certification that is desired and is outlined below:
* Level E – No structural coverage requirements
* Level D – 100% traceability
* Level C – Level D plus 100% statement coverage
* Level B – Level C plus 100% decision coverage
* Level A – Level B plus 100% modified condition/decision coverage
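Because each level builds on the one below it, the list above maps naturally onto a lookup table. The sketch below is one possible encoding in Python; the objective names, the percentage-based `achieved` dictionary and the function name are assumptions made for illustration and are not taken from the standard.

```python
# Cumulative objectives per level, as summarized in the list above.
REQUIRED_COVERAGE = {
    "E": [],
    "D": ["traceability"],
    "C": ["traceability", "statement"],
    "B": ["traceability", "statement", "decision"],
    "A": ["traceability", "statement", "decision", "mcdc"],
}


def missing_evidence(level, achieved):
    """Return the objectives for `level` that are not yet at 100%.

    achieved: objective name -> percentage measured so far (hypothetical input).
    """
    return [
        objective
        for objective in REQUIRED_COVERAGE[level]
        if achieved.get(objective, 0.0) < 100.0
    ]
```

A project aiming for Level B with 97.5% decision coverage would get back `["decision"]`, which makes the remaining work explicit.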
The DO-178B specification spells out what, and to a large degree, how a flight system must be designed, implemented, tested and maintained.
At the other extreme in specifying device safety is the FDA's approach. The Food and Drug Administration's (FDA) 510(k) process requires that manufacturers notify the FDA 90 days before they plan to market a medical device. It is similar to the FAA's DO-178B in that its intent is to make sure that medical devices are designed and deployed in a manner that ensures patient and operator safety.
The FDA takes a “kinder, gentler” approach to device design. In its guidance documents, the agency states that it wants to allow developers to use a “Least Burdensome” approach. I am not implying that this particular standard is more lax than the FAA's; rather, the FDA's approach does not constrain development to a single paradigm.
One company could use extreme programming techniques and another could use the traditional waterfall approach. As long as both companies adhere to the practices that they document and provide proof of due diligence, both approaches are fine with the FDA.
Above and beyond the FDA regulations on device development, in the US, due to the nature of its liability laws, it is in the best interest of a medical device manufacturer to deliver very safe products.
Converging to Software Control
Historically, operator, plant and stakeholder safety depended on operator training, physical barriers, mechanical interrupts and mechanical fail-safes and lockouts. As technology evolved, so did the safety systems. Electrical interrupts and lockouts replaced mechanical ones, and physical barriers were replaced by beams and light curtains. The really disruptive change occurred when plant systems that depended on operator control and intervention started becoming “automated.” The machinery began to think for itself.
There are a multitude of reasons for using programmable logic and electronics in industrial devices. In some cases, the plant operation becomes so fast or complicated that a human can no longer keep up with the task. It could also be said that quality control is better: computer-based systems don't have bad days or end-of-shift fatigue. In reality, the reason for the explosion of automation can be summed up in two words: cost reduction.
Digital systems are faster, more precise and, over the long haul, less expensive than a $35-an-hour laborer who has a pension. As in the BMW example given earlier, it is far more cost effective to replace a wiring harness or pneumatic actuators with a single wire or bus control system. Not only does it reduce the bill of materials (BOM) for the system, but in most cases, the labor involved in installation is lower. In large interconnected systems such as a paper machine, the savings in material and installation labor can make the difference between a positive ROI and a negative ROI.
One of my first jobs as an adult was working as an industrial electrician at a local paper mill. I pulled many a mile of cable that year, working with hundreds of others doing the same. At the same time, the instrumentation crews bent and installed thousands of miles of pneumatic tubing. While there is still a need for the cabling required to power the thousands of motors that are used in a paper machine, most of the “one switch, one control cable” wiring and pneumatics can be replaced with busses, each of which can support many switches and controllers.
The mill had a number of processes that were largely performed using programmable logic elements. At the time, the conventional wisdom was that automation required a redundant or isolated safety system. That way, if the control portion of the bus system went nuts and started a broadcast storm that caused a process to malfunction, the safety-related system could still put the machine in a safe state. This “separation of church and state” approach works pretty well, but redundancy is expensive.
Jack Ganssle said recently that the most expensive thing in the universe is software. That is true, but it is only true because the next alternative (doing it purely with logic circuits) is prohibitively expensive.
Cultural and Philosophical Differences
There are several cultural differences between the US and Europe in the evolution of safe software standards and in their overall acceptance in the two regions.
Europeans in general are used to more regulation in their daily lives, and European governments tend to be more supportive of standards. European states use standards and certifications as barriers to trade. The European legal system is somewhat sympathetic to companies that comply with standards groups as opposed to those that do not.
Compliance with standards tends to protect manufacturers against liability in the event that they produce an unsafe product. Furthermore, European workers are motivated to adhere to safety standards because they, as individuals, are likely to be held civilly or criminally responsible for the products they develop. In fact, it is the personal responsibility of the chief officers of the company to make every effort to ensure that safe products are developed.
Some European companies take this so far as to have their officers sign a “Declaration of Conformity” stating that the device was produced in accordance with standards and is in compliance with national standards.
In the US, rightly or wrongly, acceptance of standards and common practices, no matter how stringent, does nothing to mitigate a manufacturer's liability in the eyes of either the law or the jury. With the exception of those committing gross negligence (for example, an inebriated pilot crashing a plane), an employee will not face civil or criminal charges as a result of an unsafe product reaching the market.
So, the only reasons for US manufacturers to adhere to a standard are that they see it as a marketing tool that differentiates them from their competitors, that it is a government regulation or that they fear litigation if a product harms someone.
Do not misunderstand the prior statement. Many US companies do have internal coding, quality and safety standards that they follow; they are motivated by the market to produce safe products, so that is not the issue. It is that there is rarely an incentive for them to join and follow external standards groups.
The Tipping Point
As a product marketing manager, one aspect of my job is to keep a finger on the pulse of the embedded space. I do a lot of reading, a lot of talking and, most of all, a lot of listening. I read blogs and trade journals; I talk to a lot of people, including customers of course, lost sales and what essentially amount to cold calls at trade shows. Since I am interested both personally and professionally in industrial automation, as well as safety-critical applications such as avionics, I tend to ask questions pertaining to that aspect of people's projects.
What I am finding is that the strategic thinking of developers and manufacturers of home, building and industrial automation is split along geographical lines. My perception of this split started 24 months ago, when IEC 61508 was mentioned during a call with our German sales office.
I had never heard of it. Neither had any of the US-based customers I normally spoke with. DO-178B and 510(k) I was familiar with. Over the next few months, the German office reported more and more interest in IEC 61508. Then interest arose in France and the UK. I received two inquiries from Japan today.
A decade ago, the International Electrotechnical Commission (IEC) issued the final version of its IEC 61508 specification governing the development of electrical/electronic/programmable electronic safety-related systems.
The main thrust of IEC 61508 is to provide “guidance” for developing devices that are functionally safe. In the context of IEC 61508, functional safety is defined as follows: “Functional safety is part of the overall safety that depends on a system or equipment operating correctly in response to its inputs. Functional safety is achieved when every specified safety function is carried out and the level of performance required of each safety function is met.”
Basically, the standard strives to ensure that safety systems perform as specified, and if they fail, they fail in a manner that is safe. One thing that needs to be (re)emphasized is that when discussing safety in this context, reliability is not implied, only that if there is a failure, the system will fail safely.
In many ways, the IEC 61508 standard is very similar to the DO-178B standard. It is very structured in its approach to developing software. Unlike the DO-178B standard, the IEC 61508 standard does allow certification of standalone software. Basically, it allows software reuse without having to go through the process of recertifying the entire portion of code that has been previously certified. Of course, all of the code that can be precertified must be code that is independent of the hardware.
Even though all hardware-specific code such as drivers must still be certified, the ability to precertify generic code has a dramatic impact on the expense of developing safety systems. Since estimates for developing and certifying code to these standards run upward of $100 per line of code, this ability to amortize the cost of development over multiple projects makes such projects feasible.
It also makes commercially available, precertified software attractive, as the software vendor's business model is to amortize its development costs over many, many sales. An added benefit is that manufacturers can add features such as USB or Ethernet connectivity at a reasonable price, where before they could not afford to certify the extra tens of thousands of lines of additional code.
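The amortization argument can be put in rough numbers. The sketch below takes the $100-per-line figure quoted above at face value and assumes, purely for illustration, that certification cost scales linearly with line count; the function names and the example line counts are hypothetical.

```python
COST_PER_LINE_USD = 100  # "upward of $100 per line" figure quoted in the text


def cost_without_reuse(generic_lines, specific_lines, projects):
    """Every project certifies generic and hardware-specific code from scratch."""
    return projects * (generic_lines + specific_lines) * COST_PER_LINE_USD


def cost_with_reuse(generic_lines, specific_lines, projects):
    """Generic code is certified once; only hardware-specific code is redone."""
    return (generic_lines + projects * specific_lines) * COST_PER_LINE_USD
```

With made-up figures of 20,000 generic lines, 2,000 hardware-specific lines and four projects, the sketch gives $8.8 million without reuse and $2.8 million with it.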
Another bright spot for manufacturers is that the standard allows developers to partition their systems into safe and non-safe feature sets. When properly implemented using MMU hardware, the standard allows developers to avoid the costly burden of validating application code that runs in the non-safe partition and does not perform safety-related activities. While it is not a trivial task to guarantee that the non-safe partition can't bring down the safety-related partition, the benefits to the manufacturer and end customer are immense (when the other options involve validation at a cost of $100s per LOC).
Another major difference between DO-178B and IEC 61508 is that at its highest level of safety, SIL 4, IEC 61508 is stricter in how that safety is achieved. Just as with DO-178B, as one works through the four levels of failure reduction, SIL 1 to SIL 4, the degree of functional and structural analysis becomes more rigorous. Unlike DO-178B, at its highest level, SIL 4, IEC 61508 calls for redundancy.
Not only does it call for the use of multiple (at least two) processors, but also for the use of two or more different types of processors (ARM vs. MIPS), with the software written for each processor by different teams. For more information on hardware redundancies, see IEC 61508-2. For more information about using different implementation teams, see IEC 61508-3 section 220.127.116.11 and IEC 61508-7 Appendix B 1.5 and C 3.1 to C 3.5.
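The idea of diverse, independently written channels can be illustrated with a small comparison sketch. Everything in it is assumed for the sake of the example: the over-temperature function, the 120-degree limit, the two "team" implementations and the rule that any disagreement or fault forces a safe state. Real systems would run the channels on separate processors rather than in one Python process.

```python
SAFE_STATE = "SAFE_STATE"


def channel_a(temp_c):
    """Team A (hypothetical): floating-point comparison against the limit."""
    return "SHUTDOWN" if temp_c > 120.0 else "RUN"


def channel_b(temp_c):
    """Team B (hypothetical): fixed-point comparison in tenths of a degree."""
    limit_tenths = 1200
    return "RUN" if round(temp_c * 10) <= limit_tenths else "SHUTDOWN"


def vote(temp_c):
    """Accept a result only when both channels agree; otherwise fail safe."""
    try:
        a = channel_a(temp_c)
        b = channel_b(temp_c)
    except Exception:
        return SAFE_STATE  # a crashed channel counts as a disagreement
    return a if a == b else SAFE_STATE
```

The two channels agree well away from the limit, but a reading just above it can be rounded down by the fixed-point channel while the floating-point channel trips. The comparison turns that design difference into a safe state instead of letting either channel act alone.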
Opportunities for Cost Reduction
Automation was first introduced to improve quality, efficiency and productivity. However, some of those gains were offset by the need to develop safety systems to deal with automation.
That required redundant systems to monitor the automated systems. With them came the added expense of isolated busses and control systems. So expense added up, not only due to the development of the safety system, but due to its manufacture and installation as well.
I think in general we can say that manufacturers are developing safe devices, regardless of whether they adhere to a safety standard developed in house or to an open standard developed by a committee. It is infrequent that a truly catastrophic event occurs due to a software error.
That safety record has come at a relatively high cost when comparing features and functionality to counterparts in the consumer space. The question arises: which way is better, a proprietary, in-house safety standard or an open standard such as IEC 61508?
There is some data available on this question. The quantitative approach used by many safety standards reduces costs by preventing either over-engineering or under-engineering. Shell Global Solutions cut up to 20% from the cost of implementing safety systems. Extensive investigation showed that about 65% of safety functions were over-engineered, while 10% were actually under-engineered and represented a weak link in the overall safety management of the facility. Only 25% didn't require changes. (exida.com)
The question “Can adherence to a safety standard save money?” is answered positively. Now, what about the question of whether adherence will make money? I think the answer to that question is also yes. From my small and clearly unscientific study of our current and potential customer base, I can conclude that if one does not begin to plan for utilizing the design and maintenance guidelines set forth in standards such as IEC 61508, one is effectively writing off a growing segment of the international market. Will IEC 61508 go the way of the Dodo bird and ISO 9000? Only time will tell. Right now, it seems it is becoming established and that momentum is growing.
FIPS 140-2. On May 26, 2006, the Federal Information Processing Standard (FIPS) 140-2, “Security Requirements for Cryptographic Modules,” took effect. The standard was developed in conjunction with the NSA and is published by the National Institute of Standards and Technology (NIST).
It describes the requirements and standards that a hardware and/or software product must meet to be purchased for Sensitive but Unclassified (SBU) government use. The standard has been adopted by the Canadian Communications Security Establishment (CSE) as well as the American National Standards Institute.
In essence, FIPS 140-2 specifies the security requirements to be met by the cryptographic module that is used to protect sensitive but unclassified information. The standard covers all computer and communication systems, providing four levels of increasing security: Level 1, Level 2, Level 3 and Level 4. Many of the devices requiring adherence to FIPS 140-2 are easy to identify: PCs, laptops, printers, routers, switches; basically anything attached to the network.
Others are not identified so intuitively; things like telephones, both traditional and IP-based, are covered. What about cell phones? It is possible, with the advent of combining traditional cell with VoIP-based services, that the lines are being blurred.
HIPAA. To improve the efficiency and effectiveness of the health care system, the Health Insurance Portability and Accountability Act (HIPAA) of 1996, Public Law 104-191, included “Administrative Simplification” provisions that required Health and Human Services (HHS) to adopt national standards for electronic health care transactions. At the same time, Congress recognized that advances in electronic technology could erode the privacy of health information.
Consequently, Congress incorporated into HIPAA provisions that mandated the adoption of Federal privacy protections for individually identifiable health information.
This new U.S. regulation gives patients greater access to their own medical records and more control over how their personally identifiable health information is used. The regulation also addresses the obligations of health care providers and health plans to protect health information.
There are many more software safety standards than the few mentioned in this paper. However, IEC 61508 seems to be becoming a de facto standard, especially in areas where previously there were either no standards for the industry or no regulatory reasons to adopt one.
One of the primary reasons is that IEC 61508 is a standard that is generic in application, but comprehensive in its approach to achieving safety. Companies that previously utilized proprietary or in-house standards are adopting IEC 61508 as a marketing tool to prevent them from being shut out of markets.
Another factor that may drive North American manufacturers to adopt IEC 61508 is the 2002 Sarbanes-Oxley Act governing the behavior of corporate management. In the litigious society of North America, it will only be a matter of time before some enterprising attorney connects the Sarbanes-Oxley Act with an unfortunate software failure.
Furthermore, mandates for security in various segments of government, health care and finance are forcing manufacturers of infrastructure and office equipment either to bear the expense of conforming to security standards or to write those markets off entirely.
Because of the rapid convergence of functionality into such things as cell phones, it is my personal belief that not only will these sorts of safety and security requirements thrive in the current areas of acceptance, but they will also grow into other areas. I also believe that it is better to adopt them now, while they provide a marketable differentiation in a product that will command a premium, rather than wait until they are just an expected commodity feature of a product that commands no added value.
Todd Brian is a product manager for Accelerated Technology, an Embedded Systems Division of
1) Garfinkel, Simson, History's Worst Software Bugs
2) Validated Software's
3) Ganssle, Jack, The Embedded Muse 124, Feb. 9, 2006
4) IEC Web Page: Functional Safety Zone – E
5) IEC Web Page: IEC-61508