Who's at fault when code kills?
In 2001 a Cobalt 60 machine in Panama delivered 20% to 100% more radiation than required to 28 patients, killing between 8 and 21 people.
Who was at fault?
St. Louis-based Multidata Systems provided the software that controlled the machines. Three FDA inspections in the '90s cited the company for poor specification and documentation procedures, inadequate testing, and a lack of comprehensive investigation into customer complaints. Another inspection in 2001, after the tragedy in Panama, revealed more of the same. In particular the company didn't have a comprehensive testing plan that proved the code was "fit for use."
Doctors would hand off a treatment plan to radiation physicists who operated the machine. Lead shields carefully placed around the tumors protected patients' other organs. The physicists used a mouse to draw the configuration of blocks on the screen; the software then computed an appropriate dose.
To better control the gamma ray beam physicists sometimes used five, instead of the recommended four, lead blocks. The software didn't have a provision for this configuration but users found they could draw a single polygon that represented all 5 blocks. Unfortunately, it was possible to create a depiction that confused the code, causing the machine to deliver as much as twice the required dose.
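To see how a single drawn shape can silently confuse a dose calculation, consider a minimal sketch. This is not Multidata's actual algorithm; every name and formula here is a hypothetical illustration. A program that computes the blocked area with the shoelace formula gets a badly wrong answer for a self-intersecting outline, yet raises no error unless someone adds an explicit validity check:

```python
def shoelace_area(points):
    """Polygon area via the shoelace formula.
    For a self-intersecting outline, overlapping lobes cancel,
    so the result can silently understate the covered area."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def _orient(a, b, c):
    # Sign of the cross product: which side of line a-b point c lies on.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def is_simple(points):
    """True if no two non-adjacent edges properly cross.
    A dose program could refuse any input that fails this check."""
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if (i + 1) % n == j or (j + 1) % n == i:
                continue  # adjacent edges share a vertex; skip them
            a, b = points[i], points[(i + 1) % n]
            c, d = points[j], points[(j + 1) % n]
            d1, d2 = _orient(c, d, a), _orient(c, d, b)
            d3, d4 = _orient(a, b, c), _orient(a, b, d)
            if (d1 > 0) != (d2 > 0) and (d3 > 0) != (d4 > 0):
                return False  # two edges cross: not a simple polygon
    return True

square = [(0, 0), (2, 0), (2, 2), (0, 2)]   # a valid 2x2 outline
bowtie = [(0, 0), (2, 0), (0, 2), (2, 2)]   # same points, crossed order

print(shoelace_area(square), is_simple(square))  # 4.0 True
print(shoelace_area(bowtie), is_simple(bowtie))  # 0.0 False
```

The bowtie visibly covers two triangles, yet the formula reports zero area because the two lobes cancel; without the `is_simple` guard, nothing tells the operator the number is garbage. Whatever the real code did, this is the class of failure a hazard analysis should have flagged.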
Multidata contends that the hospital should have verified the dosages by running a test using water before irradiating people, or by manually checking the software's calculations. While I agree that a back-of-the-envelope check on any important computer calculation makes sense, I grew up with slide rules. Back then one had to have a pretty good idea of the size of a result before doing the math. Today most people take the computer's result as gospel. The software has got to be right.
The physicists believe the code should have at least signaled an error if the entered data was incorrect or confusing. Well, duh.
So who's at fault?
This week the physicists were sentenced to prison for four years, and were barred from practicing their profession for at least another four years. So far Multidata has dodged every lawsuit filed against it by injured patients and next-of-kin.
In my opinion this is a clear miscarriage of justice. Why prosecute careful users who didn't violate any rule laid down in the manual?
Who is at fault when software kills?
Is it management for not instituting a defined software process? Or for squeezing schedules till we're forced to court risky practices?
What about the software engineers? They, after all, wrote the bad code. The very first article of the IEEE's code of ethics states: "[We] accept responsibility in making engineering decisions consistent with the safety, health and welfare of the public, and to disclose promptly factors that might endanger the public or the environment."
But how can we (or worse, the courts) blame users? Sure, there's a class of nefarious customers unafraid to open the cabinet doors and change the system's design. It's hard to guard against that sort of maliciousness. A normal user, running the system in a reasonable way, surely cannot be held accountable for code that behaves incorrectly.
What do you think? If you build safety critical systems, are you afraid of being held criminally accountable for bugs?
Jack G. Ganssle is a lecturer and consultant on embedded development issues. He's conducting a seminar about building better firmware faster in Las Vegas Dec 10. Contact him at email@example.com. His website is www.ganssle.com.
Therac-25 (courses.cs.vt.edu/~cs3604/lib/Therac_25/Therac_1.html) deja vu.
- John Patrick
Just because you can do something with software doesn't mean you should. If the software recommends using four blocks and you use five, it would be common sense for the user to test and make sure it works as expected. However, the article states that it "was possible" to confuse the software, which means the error was not always present.
The real question is whether or not the individuals at the company are at fault or the company as a whole. The individual software engineers can state that they were instructed not to run rigorous tests by their managers. The managers can state that they were told that it wasn't in the budget. Upper management could state that the budget was set by expected sales. Sales can state that the customers wouldn't buy it at the higher cost that testing would require. So, who's to blame, the customer who wouldn't pay more for a properly tested unit?
To me, this would indicate that the company as its own entity was to blame. Many dollars would be lost, probably resulting in the bankruptcy of the company and loss of jobs for those involved. No jail time, but punishment (and possible justice) served. But in today's litigious society, there's nothing stopping civil suits from being filed against the ex-employees.
Yes, but I don't think this is the right question. There is a trickle-down effect that goes all the way from systems engineering to the software cross-compiler, to the hardware the code runs on. Think about it. Systems engineers design the whole "thing". The software engineers design and write the code, compile it using a cross-compiler, and load it to the target microprocessor-based hardware. If something fails... the finger points to the software group, when in fact the failure can come from any point in the chain. Are you going to put the whole engineering staff in prison? Since a failure in a safety-critical system is NOT INTENTIONAL, the blame is vaporized. You must prove "where" in the process the TRUE failure came from. Putting someone in prison for an UNINTENTIONAL failure just to satisfy the general public stinks of witch hunt.
- Steve King
That company is liable, as is the team of engineers, managers, and support staff. We need to take responsibility for our creations. If we can't test and verify a system to an acceptable degree (a measurement that will obviously vary with the application), then we don't build it. If it's built anyway, then we should be ready to pay the consequences.
- Sean Thomson
It's a blunder to punish the physicists. The software engineers must be held accountable for that.
- ramesh babu
Regarding the blame on the company, I would say the people in higher places, the directors who have authority over the entire development process, should take the responsibility, not the low-level coders. The worst punishment a coder can face in his job is getting fired (and losing his salary), but the leaders, with their high salaries, should take the bigger punishments, in court.
Consider the interfaces: the hospitals deal with the company, i.e., its executives, while the coder's interface is his team leader. The people immediately above have power over those directly below them.
Surely the software engineer's discipline could be blamed (I should include myself in that category). Unless engineering discipline and ethics are taken seriously by every embedded systems engineer (especially in safety-critical systems), the community will have a guilty feeling lurking in the background. Mere compliance with safety standards for legality is not enough; we need to embed the concept of safety and quality in ourselves. As engineers, we have a greater social responsibility than anyone else. Let us stand up for ethical and social causes rather than just making our management's deadlines. After all, that is the purpose of engineering.
- Saravanan T S
Follow the MONEY
- Martin Lundstrom
I do not think that either the physicists or software engineers are to blame. There should be a standard body for testing safety critical systems. End Users should know what they are using. Only standardized and verified code should be allowed to run on critical systems. Why just license drugs? License medical software too!
- Sumit Kumar Jha
Well, it's true that the code that goes inside a healthcare machine must be tested most rigorously. But just blaming the developer won't be fair. The incident reveals the doctor's lack of knowledge of how the software or machine works in all situations, and also the developer's lack of insight into how a doctor may think in different situations while giving treatment. So both parties are guilty.
Proper training of doctors before they use the machine, along with thorough testing by the vendor team in collaboration with medical expertise, is the only way to medical solutions that kill less.
Can the doctor and engineer work together, both during development and usage, so that everyone is on the safer side? I think that could make a safer world, at some expense...
- Saravanan T S
The idea is not to find scapegoats to punish.
Governments should ensure that such critical systems comply with standards before they are declared 'safe' for use.
The same logic applies everywhere. Airlines often get the blame for malfunctions caused by 'third party' components they use.
Logically, the organisation (as a whole) that finally sells these products to the end users (the general public, hospitals, etc.) must take the blame.
Wait a few more years and the entire medical industry will move to India and China, or maybe even Europe, where killing a few people does not really matter if done with a good conscience.
Moral of the story: Don't sentence the engineers, sentence the lawyers.
- john doe
The problem is not with the system; it was the physicists who used the equipment improperly. Poor software cannot be an excuse for user incompetence.
Earlier this year in Dallas, half a dozen people died after receiving transplant tissue from a donor who unknowingly had rabies. The hospital knew after the first victim, but their software was inadequate to enable them to track all the people who had received tissue from the same donor. The blame belongs on the hospital administrators and executives who decided on inadequate software and ignored the consequences for years.
- Ronda Hilton
No doubt the software process failed its risk assessment (hazard analysis). This would have answered the question, "We assume 4 blocks - what if someone used a different number?" Then the software should have been able to handle this - but that may not have been possible.
You cannot anticipate all novel ways in which someone could misuse a device. Engineers must nevertheless diligently try to do so. But if someone uses something "off-label", how can it be the engineers' fault?
- Jeff Geisler
The software should never have allowed the delivery of a killing dose of radiation, no matter the input. Management is to blame, as usual. They hire the engineers and specify the tools, hardware, and working conditions. If they deem that they are not receiving sufficient resources for a project, they should refuse to start the project, especially considering the dangerous nature of radiation. The people who were jailed were technicians, not physicists; they were not responsible for the deaths.
- David Eisenberger
First of all, this tragedy and that of the Therac-25 point out two flaws with our current system that I think Jack would probably agree need to be addressed (and haven't been): (1) The FDA, for whatever combination of reasons, has been ineffective at reducing or eliminating problems of this nature. (2) While the med-techs, doctors, and the hospital were all licensed, the guys actually designing and verifying the software weren't. They aren't even required to have a high-school diploma, let alone a license, to write and test safety-critical software like this. That does seem wrong, and it is up to our profession to fix it.
I believe, therefore, that the question can be re-stated: Should software engineers on safety-critical systems be licensed in a similar fashion to the physicists/med-techs in the case in question? The desired effects of such licensing would be twofold. First, the legal blame would likely be shifted from the med-techs (the users) to the engineers if a similar disaster occurred again. Second, these engineers would be more careful about their delivered work if they knew their careers were at stake.
A tweak of the FDA rules may be sufficient here for now, but an industry-wide overhaul may be called for at some point. By "tweak" I might suggest that the FDA and/or IEEE set up a "licensing board" that accredits academic institutions to deliver instructional courses and board-certified exams for safety-critical design and test. Furthermore, all software developers and testers of safety-critical applications should be required to be certified by these institutions.
Sadly, I understand this won't eliminate tragedies such as this one, but it would reduce the likelihood of them happening again. As to the costs of such a proposed new system: how many lives does it take to make this effort worthwhile to you? I think we crossed that threshold a few decades ago.
- Jim Gilbert
There are a number of details missing from the problem description that would help place the blame.
1. What did the user manual state with regard to block placement and treatment design? If all manuals and system training specified a four-block solution, then the physicists were delving into uncharted waters and were in fact conducting experiments. As investigators, they were required by common industry procedure to prove out their treatment plans.
On the other hand, if the manuals and training implied "draw any shape and the machine will correctly compute the dosage", that is a different situation.
In either case, the company should have had reasonable requirements documents to allow the software engineers and coders to do their jobs properly. In addition, test plans and test results would have gone a long way toward providing assurance that the calculations were correct.
- jeff tuttle
What the physicists tried to do seems reasonable enough. The fact that the system allowed them to do something beyond its specifications screams of inadequate testing and a design that wasn't properly bounded. It's certainly regrettable, but with safety-critical systems there's no such thing as too much testing - you should always assume that a monkey will operate the machinery.
Multidata as a corporation is guilty of bad engineering at the minimum, and probably criminal negligence. You can't really blame individuals other than senior management.
- Ben Warren
Very interesting. Software has a bug... fix the software. The machine in question was manufactured in the U.S., so it should have gone through FDA certification/approval. If there was an issue, then it should have been caught in the course of the approval/certification process. Furthermore, if the company knew that this issue existed and did not disclose it as part of the approval/certification process, then they are solely to blame. If the issue was not discovered until after the approval process, then it was incumbent on the company to notify all hospitals that there was an issue and that proper therapy doses could be compromised, or alternatively to recall the machine. So in the end the innocent are punished and the responsible continue to get off scot-free.
This is a case of finding someone to blame, and the physicists were the most immediate and accessible. The entity to be blamed is the governmental authority who authorized use of the device. But I do not mean blame in the sense of criminal intent. As others have said, no one meant to harm another. The blame is for the failure of the governing authority to impose adequate testing standards, i.e., validation and verification. For example, the FAA imposes extremely rigorous V&V for Level A (safety-critical) software and firmware. We have no mathematical theory to prove software error-free, but we can at least reduce the probability of error to an acceptable level. That is the responsibility of the regulatory agency.
- Phil Gillaspy
People's lives are at stake in this case so all involved should follow a higher standard of safety.
If the physicists deviated from the operating manual, they should have been skeptical about the accuracy of the radiation dose. Most software is not perfect.
The user interface should have been designed to be fool-proof, with detection of non-standard inputs. Users are not perfect. This type of design should not be performed by programmers, and certainly not by a company with a poor record of documentation and testing.
This type of machine should include an independent measurement of the radiation dose, which can alarm and shut down the machine when a life-threatening level is reached. It may not be possible to distinguish a 20% overdose from a normal dose, but surely a 50% overdose can be detected.
- Carl Noren
People seem to be missing the statement that the FDA *DID* cite Multidata for all kinds of problems. Unfortunately, they either did not have the teeth, or did not have the guts, to truly force Multidata to fix the problems before people were killed. Even *AFTER* people were killed, the FDA only did further 'inspections' -- they did not take the company to court or ban its products from being sold.
Part of the travesty here is that health physicists (perhaps even some of those who were jailed), were the ones who originally recognized the problem with the calculations and told the FDA and Multidata about them.
As to the speculation that the health physicists (HPs) should have recognized that the dose was wrong -- these calculations have a lot of different parameters, and are difficult if not impossible to compute by hand. The use of lead blocks could change the dosages by factors easily as large as the differences. If the results had been wildly off (factors of 25 or 100) maybe it would have been obvious based on a back of the envelope calculation.
The purpose of the software was to compute the right numbers. What would you expect the HPs to do? Create their own software to cross check the purchased product? Spend hours running through computations by hand?
Incidentally, there is a good chance people will now die of untreated diseases because some of the hospitals where these HPs worked may have to shut down their nuclear medicine programs without HPs to oversee them.
- Greg Nelson
I am amazed with all of the comments presented that no one has mentioned the system's responsibility in terms of hardware.
No machine should be able to deliver a lethal dose, period, regardless of the command sent by the software. There has to be hardware limits or interlocks on anything that can be a safety issue. Software can never be totally responsible for policing itself.
- Thomas lavoie
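The hardware-limit point above can be sketched in software terms too: an independent monitor that integrates readings from its own sensor and trips when the cumulative dose exceeds the plan by a set tolerance. The sketch below is purely illustrative (the class name, units, and 20% threshold are invented, not taken from any real machine), and as the comment says, a true interlock belongs in hardware, not in the same code it polices:

```python
class DoseInterlock:
    """Hypothetical independent dose watchdog. It trusts only its own
    sensor readings, never the treatment software's commanded dose."""

    def __init__(self, planned_dose_gy, tolerance=0.2):
        # Trip when measured dose exceeds plan by more than `tolerance`.
        self.limit = planned_dose_gy * (1.0 + tolerance)
        self.total = 0.0
        self.tripped = False

    def record(self, measured_increment_gy):
        """Accumulate one sensor sample. Returns False once the limit
        is exceeded; a real system would cut beam power at that point."""
        self.total += measured_increment_gy
        if self.total > self.limit:
            self.tripped = True
        return not self.tripped

# Plan calls for 2.0 Gy; the interlock allows up to 2.4 Gy.
guard = DoseInterlock(2.0, tolerance=0.2)
print(guard.record(1.0))  # True  (1.0 Gy delivered, within limit)
print(guard.record(1.0))  # True  (2.0 Gy delivered, within limit)
print(guard.record(0.5))  # False (2.5 Gy exceeds 2.4 Gy: trip)
```

The essential design point is that the monitor's measurement path shares nothing with the dose-computation path, so a single software error cannot both cause the overdose and suppress the alarm.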
I find it interesting how many generalities are put forth in these comments that just don't hold water in a global context. This event happened in Panama, so why do you assume the FDA had any oversight? It's possible that they couldn't get FDA approval because their product was a turd, and dumped it into environments which don't have the regulations of the US.
Software development is a complex task, and most people don't understand how it's done or how it works within the machines that are thrust in front of them. Granted, anyone tasked with operating potentially lethal equipment should be trained to understand the ramifications of their decisions, but many aren't adequately informed. Working in the avionics industry has unearthed many stories that would ground a majority of the flying public, but in most cases it was the software that prevented an "event".
Short of a complete overhaul of the process, the task of ensuring the safety of the products will fall onto the individuals involved in development and certification. The FAA and FDA have people who are responsible for the certification task, but they're not able to do everything required to guarantee the system is without fault. Education and licensing are both good steps toward making the products better. But we also need better tools and regulations that require safety-critical systems to do no harm.
If we want to prevent future mishaps, the first thing we need to do is to stop looking for someone to put in jail. Then people can talk about what happened and what to do differently to see that it doesn't happen again.
Software that acts in the real world needs to be treated differently from software that merely moves bits around. Yes, it needs to be carefully built and thoroughly tested, but all testing in our current environment is incomplete. The fact that the test that should have been done is obvious in hindsight doesn't change this. The standard most likely to keep things like this from happening is "stick to tested procedures and configurations." If you find a possible solution that is different from those that have been tested, don't use it until it has also been tested.
If I had to point to one failure, I would point to the lack of training. The physicists should have been trained to use a different level of care with a computer-controlled machine than they would with a word processor. The developers need to be highly aware of this and advocate this kind of training and careful use.
- Robin Warner
Jack quoted the first article of IEEE at the end of his article.
How many authors of software (aka programmers, software engineers, software developers, etc.) are actually members of the IEEE? I think we will discover that the software industry is not as well self-policed as some expect it to be, especially now that work is being contracted out to offshore software houses.
If a train runs off the track, the railroad is responsible regardless of the hardware or software involved. The same applies to the operators of this medical system, but I think that the physicists' company should have been liable, not the physicists thrown in jail, unless of course they were shown to be criminally negligent.
A few years back, DC-10s were falling out of the sky, and the failures were traced to improper procedure in removing and reinstalling the engines on the part of the operator, who didn't follow the manufacturer's recommended procedure.
I don't know whether or not anyone wound up in jail but some parties were liable.
With regard to Jim Gilbert's comment, I've worked (at different times) on software which had to undergo both FAA Level A and FDA testing. One of the problems at the FDA is that they really only have one "approval process" in place, so the application for software approval is called an "IND" - short for "investigational new drug"! Clearly they don't REALLY have the infrastructure in place to correctly evaluate software, and there is no equivalent of the FAA's DER (designated engineering representative) to coordinate and oversee the application and the "squawks" brought up during the development process. As far as "licensing" individuals to work on the code, this will NEVER happen in the current political climate of "outsourcing on demand" - witness the spectacle we've had of one Mr. William Gates testifying BEFORE CONGRESS that if the "cap" on visas like H-1Bs weren't extended, his company (you might have heard of it) "would go out of business" - while folks like me can't even get job interviews. In the times we live in (unfortunately), money talks and dead patients don't - better get over it!
- Jeff Lawton
It's just a matter of time before the trial lawyers start going after us for writing bad software. If we don't have a test case that handles every possible condition under the sun, does that make us liable? There is plenty of blame to go around for this incident. I don't think the technicians should go to jail, but they are guilty of incompetence for not verifying that the settings they programmed were correct before using them on a patient. They should not be allowed to work in this field again. I find it similar to the recent Vioxx case, where an FDA-approved drug proved to have problems after it was released for sale. If you can't provide protection for the manufacturer against lawsuits, it will come to the point where nothing can ever be made anymore. We will be like Cuba, everyone riding around in 1950s-model cars because it's too risky to design something new. The lawyers have all but destroyed the health care system; now it appears they want to do the same for all technological development.
- Phil McDermott
This is a complex case. I don't think the article gave enough detail.
I am a software engineer who also underwent radiation therapy earlier this year. I received radiation from a cyclotron, not cobalt 60, so maybe my experience doesn't apply. But I received thirty separate treatments over six weeks. Even if one dose was off by a factor of five or ten, it probably wouldn't have come close to threatening my life. When people write that there should be independent dosage monitors that prevent overdoses, that implies keeping cumulative dosage records per patient, possibly across multiple machines. That's probably a more complicated procedure than most well meaning writers envision.
I agree most with Michael, Jeff Geisler, and Jeff Tuttle. A lot depends on the content of the user's manual and the training that the physicists received. If the training materials specified, or stated the assumption of, four blocks, the physicists should not have been experimenting with five, and bear responsibility.
What does it mean to say that the software was confused? If there was information available to the program that could have detected unusual block configurations or the resulting high dosages and the programmers failed to test for and report that, they bear some blame, maybe all of it, assuming no fault with the training materials. But if the inputs to the software when five blocks were used were indistinguishable from other known configurations, how could they be held responsible?
I think that licensing of software developers for critical applications like this is appropriate, although by itself that can't address all of the issues raised by the article and those who commented on it. As an advocate of free markets, I think that non-government licensing or certification would be at least as effective as government oversight. To protect themselves, equipment manufacturers could require that engineers and programmers carry liability insurance against unforeseen consequences. Perhaps the manufacturers could provide it as part of their compensation package. The insurance company, who along with the manufacturer stands to lose the most if accidents occur, could test and certify the employees.
- Bob Straub