Who's at fault when code kills? - Embedded.com

Who’s at fault when code kills?


In 2001 a Cobalt 60 machine in Panama delivered 20% to 100% more radiation than required to 28 patients, killing between 8 and 21 people.

Who was at fault?

St. Louis-based Multidata Systems provided the software that controlled the machines. Three FDA inspections in the '90s cited the company for poor specification and documentation procedures, inadequate testing, and a lack of comprehensive investigation into customer complaints. Another inspection in 2001, after the tragedy in Panama, revealed more of the same. In particular the company didn't have a comprehensive testing plan that proved the code was “fit for use.”

Doctors would hand off a treatment plan to radiation physicists who operated the machine. Lead shields carefully placed around the tumors protected patients' other organs. The physicists used a mouse to draw the configuration of blocks on the screen; the software then computed an appropriate dose.

To better control the gamma ray beam, physicists sometimes used five lead blocks instead of the recommended four. The software didn't have a provision for this configuration, but users found they could draw a single polygon that represented all five blocks. Unfortunately, it was possible to create a depiction that confused the code, causing the machine to deliver as much as twice the required dose.

Multidata contends that the hospital should have verified the dosages by running a test using water before irradiating people, or by manually checking the software's calculations. While I agree that a back-of-the-envelope check on any important computer calculation makes sense, I grew up with slide rules. Back then one had to have a pretty good idea of the size of a result before doing the math. Today most people take the computer's result as gospel. The software has got to be right.

The physicists believe the code should have at least signaled an error if the entered data was incorrect or confusing. Well, duh.
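As a rough illustration of the kind of check the physicists had in mind, here is a minimal sketch that refuses to compute anything from a layout outside what the software was specified to handle. Everything in it is an assumption made for illustration: the names `MAX_BLOCKS`, `BlockConfigError`, `signed_area` and `validate_blocks` are hypothetical, the four-block limit is taken only from the "recommended four" above, and none of it reflects how Multidata's code actually worked.

```python
MAX_BLOCKS = 4  # assumed limit, based only on the "recommended four" blocks


class BlockConfigError(ValueError):
    """Raised when the drawn shielding layout cannot be trusted."""


def signed_area(polygon):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def validate_blocks(blocks):
    """Reject layouts the dose calculation was never specified to handle.

    `blocks` is a list of polygons, each a list of (x, y) vertices.
    Returns the list unchanged if every check passes.
    """
    if not 1 <= len(blocks) <= MAX_BLOCKS:
        raise BlockConfigError(
            f"expected 1 to {MAX_BLOCKS} blocks, got {len(blocks)}"
        )
    for index, polygon in enumerate(blocks):
        if len(polygon) < 3:
            raise BlockConfigError(f"block {index} has fewer than 3 vertices")
        if abs(signed_area(polygon)) < 1e-9:
            raise BlockConfigError(f"block {index} has zero area")
    return blocks
```

A caller would run `validate_blocks` before any dose computation and stop on `BlockConfigError` rather than guess. Note that this sketch would not catch the workaround described above, where one outline stands in for several blocks; that would need further checks (for example, for self-intersecting outlines) that are not attempted here.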

So who's at fault?

This week the physicists were sentenced to prison for four years, and were barred from practicing their profession for at least another four years. So far Multidata has dodged every lawsuit filed against it by injured patients and next-of-kin.

In my opinion this is a clear miscarriage of justice. Why prosecute careful users who didn't violate any rule laid down in the manual?

Who is at fault when software kills?

Is it management for not instituting a defined software process? Or for squeezing schedules till we're forced to court risky practices?

What about the software engineers? They, after all, wrote the bad code. The very first article of the IEEE's code of ethics states: “[We] accept responsibility in making engineering decisions consistent with the safety, health and welfare of the public, and to disclose promptly factors that might endanger the public or the environment.”

But how can we — or worse, the courts — blame users? Sure, there's a class of nefarious customers unafraid to open the cabinet doors and change the system's design. It's hard to guard against that sort of maliciousness. A normal user, running the system in a reasonable way, surely cannot be held accountable for code that behaves incorrectly.

What do you think? If you build safety critical systems, are you afraid of being held criminally accountable for bugs?

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He's conducting a seminar about building better firmware faster in Las Vegas Dec 10.

Reader Response

John Patrick: Therac-25 (courses.cs.vt.edu/~cs3604/lib/Therac_25/Therac_1.html) deja vu.

Just because you can do something with software doesn't mean you should. If the software recommends using four blocks and you use five, it would be common sense for the user to test and make sure it works as expected. However, the article states that it “was possible” to confuse the software, which means the error was not always present.

The real question is whether or not the individuals at the company are at fault or the company as a whole. The individual software engineers can state that they were instructed not to run rigorous tests by their managers. The managers can state that they were told that it wasn't in the budget. Upper management could state that the budget was set by expected sales. Sales can state that the customers wouldn't buy it at the higher cost that testing would require. So, who's to blame, the customer who wouldn't pay more for a properly tested unit?

To me, this would indicate that the company as its own entity was to blame. Many dollars would be lost, probably resulting in the bankruptcy of the company and loss of jobs for those involved. No jail time, but punishment (and possible justice) served. But in today's litigious society, there's nothing stopping civil suits from being filed against the ex-employees.

Yes, but I don't think this is the right question. There is a trickle-down effect that goes all the way from systems engineering to the software cross-compiler, to the hardware the code runs on. Think about it. Systems engineers design the whole “thing”. The software engineers design and write the code, compile it using a cross-compiler, and load it to the target microprocessor-based hardware. If something fails… the finger points to the software group, when in fact the failure can come from any point in the chain. Are you going to put the whole engineering staff in prison? Since a failure in a safety-critical system is NOT INTENTIONAL, the blame is vaporized. You must prove “where” in the process the TRUE failure came from. Putting someone in prison for an UNINTENTIONAL failure just to satisfy the general public stinks of witch hunt.

– Steve King

That company is liable, as is the team of engineers, managers, and support staff. We need to take responsibility for our creations. If we can't test and verify a system to an acceptable degree (that measurement will vary, depending on the application, obviously) then we don't build it. If it's built anyway, then we should be ready to pay the consequences.

– Sean Thomson

It's a blunder to punish the physicists. The software engineers must be accountable for that.

– ramesh babu

Regarding the blame on the Company, I would say the people in higher places, the directors who have the authority over the entire development process, should take the responsibility, not the low level coders. The worst punishment the coder can take for his job (and salary) is getting fired but the leaders with their high salaries should take the bigger, court punishments.

See the interfaces. The hospitals deal with the Company, i.e., its executives. The interface of the coder is his leader. The immediate higher people would have the power over their direct lower people.

– Naveen

Surely – the software engineer's discipline could be blamed (I should include myself in the category). Unless engineering discipline and ethics are taken seriously by every embedded systems engineer (especially in safety critical systems), the community may have a guilty feeling lurking in the background. Mere compliance to safety standards for legality is not enough – we need to embed the concept of safety and quality in ourselves. As engineers, we have a greater social responsibility than anyone else. Let us stand up for ethical and social cause rather than making up for our management deadlines. After all, that is the purpose of engineering.

– Saravanan T S

Follow the MONEY

– Martin Lundstrom

I do not think that either the physicists or software engineers are to blame. There should be a standard body for testing safety critical systems. End Users should know what they are using. Only standardized and verified code should be allowed to run on critical systems. Why just license drugs? License medical software too!

– Sumit Kumar Jha

Well, true that: the code that goes inside a healthcare machine must be tested the toughest. Just blaming the developer won't be neat. The incident reveals a lack of knowledge on both sides: the doctor's of how the software or machine works in all situations, and the developer's of how a doctor may think in different situations while giving treatment. So, both the parties are guilty.

Only proper logical training of doctors before they use the machine, as well as thorough testing by the vendor team in collaboration with medical expertise, will lead to KillLess medical solutions.

– Nitin

Can the doctor and engineer work together both during development and usage so 'everyone is on safer side'? Think that can make a safer world at some expense…

– Saravanan T S

Idea is not to find scapegoats to punish.

Governments should ensure that such critical systems comply with standards before they are declared 'safe' for use.

The same logic applies everywhere. Often airlines get the blame for any malfunction that is caused by some 'third party' components used by them.

Logically, the organisation (as a whole) that finally sells these products to the end users (common public, hospitals, etc.) must take the blame.

– Anand

Wait for a few more years and the entire medical industry will move to India and China or maybe even Europe where killing a few people does not really matter if done with a good conscience.

Moral of the story: Don't sentence the engineers, sentence the lawyers.

– john doe

The problem is not with the system; it was the physicists who used the equipment improperly. Poor software cannot be an excuse for user incompetence.

– Michael

Earlier this year in Dallas, half a dozen people died after receiving transplant tissue from a donor who unknowingly had rabies. The hospital knew after the first victim, but their software was inadequate to enable them to track all the people who had received tissue from the same donor. The blame belongs on the hospital administrators and executives who decided on inadequate software and ignored the consequences for years.

– Ronda Hilton

No doubt the software process failed its risk assessment (hazard analysis). This would have answered the question, “We assume 4 blocks – what if someone used a different number?” Then the software should have been able to handle this – but that may not have been possible.

You cannot anticipate all novel ways in which someone could misuse a device. Engineers must nevertheless diligently try to do so. But if someone uses something “off-label”, how can it be the engineers' fault?

– Jeff Geisler

The software should never have allowed the delivery of a killing dose of radiation, no matter the input. Management is to blame, as usual. They hire the engineers and specify the tools, hardware and working conditions. If they deem that they are not receiving sufficient resources for a project, they should refuse to start the project, especially considering the dangerous nature of radiation. The people who were jailed were technicians, not physicists; they were not responsible for the deaths.

– David Eisenberger

First of all, this tragedy and that of the Therac-25 point out two flaws with our current system that I think Jack would probably agree need to be addressed (and haven't): (1) The FDA, for whatever combination of reasons, has been ineffective at reducing or eliminating problems of this nature. (2) While the med-techs, doctors and the hospital were all licensed, the guys actually designing and verifying the software weren't. They aren't even required to have a high-school diploma, let alone a license to write and test safety-critical software like this. That does seem wrong and it is up to our profession to fix this.

I believe, therefore, that the question can be re-stated: Should software engineers on safety-critical systems be licensed in a similar fashion to the physicists/med-techs in the case in question? The implied desired effects of such a licensing would be twofold: First, the legal blame would likely be shifted from the med-techs (the users) to the engineers, if such a similar disaster occurred again. The other is that these engineers would be more careful about their delivered work, if they knew their careers were at stake.

A tweak of the FDA rules may be sufficient here for now, but an industry-wide overhaul may be called for at some point. By “tweak” I might suggest that the FDA and/or IEEE set up a “licensing board” that accredits academic institutions to deliver instructional courses and board-certified exams for safety-critical design and test. Furthermore, all software developers and testers of safety-critical applications should be required to be certified by these institutions.

Sadly, I understand this won't eliminate tragedies such as this, but it would reduce their likelihood of happening again. As to the costs of such a proposed new system: How many lives does it take to make this effort worthwhile to you? I think that we already crossed that threshold a few decades ago.

– Jim Gilbert

There are a number of details missing from the problem description that would help place the blame.

1. What did the user manual state with regard to block placement and treatment design? If all manual and system training specified a 4 block solution then the physicists were delving into uncharted waters and were in fact conducting experiments. As investigators, they were required by common procedure in the industry to prove out their treatment plans.

On the other hand, if the manuals and training implied “draw any shape and the machine will correctly compute the dosage”, that is a different situation.

In either case, the company should have had reasonable requirement documents to allow the software engineers and coders to do their job properly. In addition, test plans and test results would have gone a long way toward providing an assurance that the calculations were correct.

– jeff tuttle

What the physicists tried to do seems reasonable enough. The fact that the system allowed them to do something beyond its specifications screams of inadequate testing and a design that wasn't properly bounded. It's certainly regrettable, but with safety-critical systems there's no such thing as too much testing – you should always assume that a monkey will operate the machinery.

Multidata as a corporation is guilty of bad engineering at the minimum and probably criminal negligence. You can't really blame individuals other than senior management.

– Ben Warren

Very interesting. Software has a bug... fix the software. The machine in question was manufactured in the U.S., so it should have gone through FDA certification/approval. If there was an issue, then it should have been caught in the course of the approval/certification process. Furthermore, if the company knew that this issue existed and did not disclose it as part of the approval/certification process, then they are solely to blame. If the issue was not discovered until after the approval process, then it was incumbent on the company to notify all hospitals that there was an issue and that proper therapy doses could be compromised, or alternatively recall the machine. So in the end the innocent are punished and the responsible continue to get off scot-free.

– Peter

This is a case of finding someone to blame; the physicists were the most immediate and accessible. The entity to be blamed is the governmental authority who authorized use of the device. But I do not mean blame in the sense of criminal intent. As others have said, no one meant to harm another. But the blame is for the failure of the governing authority to impose adequate testing standards, i.e., validation and verification. For example, the FAA imposes an extremely rigorous V&V for Level A (safety critical) software and firmware. We have no mathematical theory to prove error-free software, but we can at least reduce the probability to an acceptable level. That is the responsibility of the regulatory agency.

– Phil Gillaspy

People's lives are at stake in this case so all involved should follow a higher standard of safety.

If the physicists deviated from the operating manual, they should have been skeptical about the accuracy of the radiation dose. Most software is not perfect.

The user interface should have been designed to be fool-proof, with detection of non-standard inputs. Users are not perfect. This type of design should not be performed by programmers, and certainly not by a company with a poor record of documentation and testing.

This type of machine should include an independent measurement of the radiation dose, which can alarm and shut down the machine when a life-threatening level is reached. It may not be possible to distinguish a 20% overdose from a normal dose, but surely a 50% overdose can be detected.

– Carl Noren

People seem to be missing the statement that the FDA *DID* cite Multidata for all kinds of problems. Unfortunately, they either did not have the teeth, or did not have the guts, to truly force Multidata to fix the problems before people were killed. Even *AFTER* people were killed, the FDA only did further 'inspections'; they did not take the company to court or ban its products from being sold.

Part of the travesty here is that health physicists (perhaps even some of those who were jailed) were the ones who originally recognized the problem with the calculations and told the FDA and Multidata about them.

As to the speculation that the health physicists (HPs) should have recognized that the dose was wrong: these calculations have a lot of different parameters, and are difficult if not impossible to compute by hand. The use of lead blocks could change the dosages by factors easily as large as the differences. If the results had been wildly off (factors of 25 or 100) maybe it would have been obvious based on a back of the envelope calculation.

The purpose of the software was to compute the right numbers. What would you expect the HPs to do? Create their own software to cross check the purchased product? Spend hours running through computations by hand?

Incidentally, there is a good chance people will now die of untreated diseases because some of the hospitals where these HPs worked may have to shut down their nuclear medicine programs without HPs to oversee them.

– Greg Nelson

I am amazed with all of the comments presented that no one has mentioned the system's responsibility in terms of hardware.

No machine should be able to deliver a lethal dose, period, regardless of the command sent by the software. There have to be hardware limits or interlocks on anything that can be a safety issue. Software can never be totally responsible for policing itself.

– Thomas lavoie

I find it interesting the number of generalities that are put forth in these comments that just don't hold water in a global context. This event happened in Panama, so why do you assume the FDA had any oversight? It's possible that they couldn't get FDA approval because their product was a turd and dumped it into environments which don't have the regulations of the US.

Software development is a complex task and most people don't understand how it's done or how it works within the machines that are thrust in front of them. Granted, anyone tasked with operating potentially lethal equipment should be trained to understand the ramifications of their decisions, but there are many who aren't adequately informed. Having worked in the avionics industry, I have unearthed many stories that would ground a majority of the flying public, but in most cases it was the software that prevented an “event”.

Short of a complete overhaul of the process, the task of ensuring the safety of the products will fall onto the individuals involved in development and certification. The FAA and FDA have people who are responsible for the certification task but they're not able to do everything required to guarantee the system is without fault. Education and licensing are both good steps toward making the products better. But we also need better tools and regulations that require safety-critical systems to do no harm.

– John

If we want to prevent future mishaps, the first thing we need to do is to stop looking for someone to put in jail. Then people can talk about what happened and what to do differently to see that it doesn't happen again.

Software that acts in the real world needs to be treated differently from software that merely moves bits around. Yes, it needs to be carefully built and thoroughly tested, but all testing in our current environment is incomplete. The fact that the test that should have been done is obvious in hindsight doesn't change this. The standard that is most likely to keep things like this from happening is “stick to tested procedures and configurations.” If you find a possible solution that is different from those that have been tested, don't use it until it has also been tested.

If I had to point to one failure I would point to the lack of training. The physicists should have been trained to use a different level of care with a computer-controlled machine than they would with a word processor. The developers need to be highly aware of this and advocate this kind of training and careful use.

– Robin Warner

Jack quoted the first article of IEEE at the end of his article.

How many authors of software (aka programmers, software engineers, software developers, etc) are actually members of IEEE? I think that we will discover that the software industry is not as well self-policed as some expect it to be, especially now that offshore software houses are leading to contracting out.

If a train runs off the track, the railroad is responsible regardless of the hardware or software involved. The same applies to the operators of this medical system, but I think that the physicists' company should have been liable, not the physicists thrown in jail, unless of course they were shown to be criminally negligent.

A few years back, DC10s were falling out of the sky and the failures traced to improper procedure removing and reinstalling the engines on the part of the operator – who didn't follow the manufacturer's recommended procedure.

I don't know whether or not anyone wound up in jail but some parties were liable.

– Darcy

With regard to Jim Gilbert's comment, I've worked (at different times) on software which had to undergo both FAA Level A and FDA testing. One of the problems at the FDA is that they really only have one “approval process” in place, so the application for software approval is called an “IND” – short for “investigational new drug”! Clearly they don't REALLY have the infrastructure in place to correctly evaluate software, and there is no equivalent of the FAA's DER (designated engineering representative) to coordinate and oversee the application and the “squawks” brought up during the development process. As far as “licensing” individuals to work on the code, this will NEVER happen in the current political climate of “outsourcing on demand” – witness the spectacle we've had of one Mr. William Gates testifying BEFORE CONGRESS that if the “cap” on visas like H-1Bs weren't extended, his company (you might have heard of it) “would go out of business” – while folks like me can't even get job interviews. In the times we live in (unfortunately), money talks and dead patients don't – better get over it!

– Jeff Lawton

It's just a matter of time before the trial lawyers start going after us for writing bad software. If we don't have a test case that handles every possible condition under the sun, does that make us liable? There is plenty of blame to go around for this incident. I don't think the technicians should go to jail but they are guilty of incompetence for not verifying the settings they programmed were correct before using it on the patient. They should not be allowed to work in this field again. I find it similar to the recent Vioxx case where an FDA approved drug proved to have problems after it was released for sale. If you can't provide protection for the manufacturer against lawsuits, it will come to the point where nothing can ever be made anymore. We will be like Cuba, everyone riding around in 1950's model cars because it's too risky to design something new. The lawyers have all but destroyed the health care system; now it appears they want to do the same for all technological development.

– Phil McDermott

This is a complex case. I don't think the article gave enough detail.

I am a software engineer who also underwent radiation therapy earlier this year. I received radiation from a cyclotron, not cobalt 60, so maybe my experience doesn't apply. But I received thirty separate treatments over six weeks. Even if one dose was off by a factor of five or ten, it probably wouldn't have come close to threatening my life. When people write that there should be independent dosage monitors that prevent overdoses, that implies keeping cumulative dosage records per patient, possibly across multiple machines. That's probably a more complicated procedure than most well meaning writers envision.

I agree the most with Michael, Jeff Geisler and Jeff Tuttle. A lot depends on the content of the user's manual and the training that the physicists received. If the training materials specified, or stated the assumption of, four blocks, the physicists should not have been experimenting with five, and bear responsibility.

What does it mean to say that the software was confused? If there was information available to the program that could have detected unusual block configurations or the resulting high dosages and the programmers failed to test for and report that, they bear some blame, maybe all of it, assuming no fault with the training materials. But if the inputs to the software when five blocks were used were indistinguishable from other known configurations, how could they be held responsible?

I think that licensing of software developers for critical applications like this is appropriate, although by itself that can't address all of the issues raised by the article and those who commented on it. As an advocate of free markets, I think that non-government licensing or certification would be at least as effective as government oversight. To protect themselves, equipment manufacturers could require that engineers and programmers carry liability insurance against unforeseen consequences. Perhaps the manufacturers could provide it as part of their compensation package. The insurance company, who along with the manufacturer stands to lose the most if accidents occur, could test and certify the employees.

– Bob Straub
