[Falcon 1 launch image] Making launch failures like the inaugural Falcon 1 flight (above) into teachable events is a process often hindered by bureaucracy. (credit: SpaceX)

Actually, we need more successful failures

Bob Clarebrough’s recent article “We need more failures” (The Space Review, July 3, 2006) points out that learning from failure is essential to success, and that failures represent an opportunity to achieve better results in the future rather than a cause for despair.

This is true; learning from failure has been an essential feature of the space launch industry. A great many of the procedures, processes, and even the basic philosophies applied in space launch activities derive directly from the lessons of failures. However, the process of learning from failures has not always been done well, has been subject to some astonishing “failures” of its own, and is not getting better with age.

The Air Force way

All the way into the 1990s, US Air Force space launch failure investigations were conducted under the provisions of Air Force Regulation 127-4. This was the same regulation used for the investigation of aircraft accidents, dubbed “mishaps” in USAF parlance. A key feature of 127-4 was secrecy and restricted access. Even for unclassified operations, the information gathered and the analyses undertaken were treated as very closely held material. The central concept driving the investigation process was that people should not fear retribution for providing honest and complete data on an accident. Even if the individuals involved had made a serious error, the mishap investigation could not be used as a basis for disciplinary action. Should it be necessary to prosecute individuals for dereliction of duty, a completely separate investigation would be required; it could not be based on the formal mishap investigation.


The result of this policy was that access to the investigation teams’ data was strictly controlled, outside conversations were restricted, and the final report was limited in distribution; typically no more than a dozen or so copies were made, and they went to a largely fixed list of organizations specified in the regulation. The fact that space launches were intrinsically different from aircraft mishaps, in that they almost never involved private property damage and never involved any loss of life, carried no weight with the authorities; the regulation had to be followed.

The formal report was so limited in distribution, in fact, that not even the manufacturers of the failed equipment could receive a copy. Although the companies that built the hardware were often deeply involved in a mishap investigation—as they always were in the case of space launches—they could not be officially apprised of the formal results. An Air Force JAG officer explained it this way: “Suppose an Air Force mishap investigation concluded that the employee of a private firm made an error that caused a rocket to fail. If the Air Force told the company of this conclusion, they could conceivably fire the employee on this basis. And since the employee had been promised that the mishap investigation would not be used to support punitive action, he could sue the Air Force.”

Thus, the lessons learned from launch failures were literally locked away and not widely disseminated, even within the Air Force. The Air Force System Project Offices (SPOs) depended on the Aerospace Corporation for the highly prized “corporate memory” that would prevent repetition of launch failures. Discussions about past failures within the Air Force itself were limited to reminiscing over beers in the Officers’ Club.

In 1982 things got even tougher, as the increasingly litigious nature of American society was being felt even by the military. Following certain aircraft accidents, the military was being pressed hard to allow mishap report data to be used in civilian lawsuits. Relatives of military members killed in aircraft accidents were attempting to gain access to data so that the manufacturers could be sued. Fearing a breakdown of the entire mishap prevention structure, the Air Force Inspection and Safety Center directed the destruction of all copies of mishap reports except those held at the Center itself. That destruction order included not only aircraft mishap reports but those on space launches as well. Once again, this was totally inappropriate for space launches, where litigation issues did not exist, but the military philosophy in effect dictated “one size fits all.”

Regulations underwent revisions, and at times the Air Force relented a bit on revealing information about launch failures, but basic attitudes in the Air Force with regard to launch mishap investigation did not really change until after the Air Force Delta 2 failure of January 17, 1997. When commercial users of the Delta realized that the Air Force was going to conduct its usual secretive mishap investigation, they set up a howl of protest. The private firms wanted to know what had happened and what needed to be done to fix it, period. In November 1998 the Air Force formally announced that it was abandoning its close-hold policy for space launches; recognizing reality had only taken forty years.

NASA’s approach

Meanwhile, in parallel and generally separate from the Air Force, NASA was conducting its own space launch mishap investigations. While not restricted by AFR 127-4, NASA followed generally similar practices for mishap investigations, but suffered from its organizational structure as well. Whereas all Air Force space launch SPOs worked for the same general officer commander, NASA had farmed out its acquisition of launch capabilities to the various space centers. NASA Lewis (now Glenn) handled the Atlas. Goddard bought the Delta. Langley took care of the Scout booster. Johnson and Marshall handled the Saturn and Shuttle. Kennedy handled launch operations, both in Florida and for NASA launches from Vandenberg AFB.


The various NASA centers viewed each other with considerable suspicion, often even thinking of each other as competitors. When it came to mishap investigations, lessons learned by one center—even if disseminated—would almost certainly be discounted as irrelevant by the others. These attitudes had their impact. The Air Force watched in disbelief as, only a year after the 1986 loss of the Challenger, NASA repeated the same type of countdown management mistakes, both in the loss of a NASA Atlas Centaur at Cape Canaveral and in the seemingly chaotic countdown for a Scout booster at Vandenberg AFB. The Air Force conclusion was that NASA did not seem to be learning from its own mistakes; the Air Force itself, observing the NASA failures from a distance, had scrutinized and modified its own processes as a result.

Then there was the problem of Air Force and NASA cooperation. When there were common interests, the two agencies typically invited participation in each other’s mishap investigations, but that did not mean that they agreed on corrective actions. At times this led to absurd situations that could have been catastrophic.

Following the loss of an Atlas from Vandenberg AFB in 1980, the Air Force elected to make certain modifications to its Atlas E boosters. The SPO that handled the payload lost on that 1980 failure then elected to procure new Atlas boosters through NASA rather than continue to fly converted ICBMs similar to the vehicle that had failed; reliability concerns were one reason for this decision. However, NASA had decided not to make the modifications the Air Force had made to its version. Thus, unaware, the Air Force proceeded to launch NASA-procured boosters that had not been made flightworthy by the USAF’s own standards, and carrying the same type of payload that had been lost in the earlier failure. This incredible fact was not discovered until the final launch of the series, and then only through pure happenstance, when an Air Force officer who had served on the earlier mishap investigation board glanced at a booster before it left the factory.

The commercial factor

In the late 1980s came the beginning of private industry’s commercial launches, and two new factors were introduced into the challenge of learning from failures: competence and competition.

On April 18, 1991, Atlas 1 mission AC-70, carrying the commercial communications satellite BS-3H, failed to achieve orbit when one of the RL-10 engines on the Centaur stage failed to start properly. The rocket’s manufacturer, General Dynamics, had to conduct its first mishap investigation not under government control. The problem that caused the failure was identified and corrective action was applied. Everyone was satisfied, and the following year, for the Atlas 1 AC-71 launch of the commercial communications satellite Galaxy 1R, the exact same failure happened.

The original failure investigation had failed, a first in recent memory. More tests were conducted, and investigators stumbled onto the correct answer; it was related both to a hardware failure and to modifications the company had made in order to achieve higher performance. Clearly, companies would have to learn how to conduct failure investigations without government guidance.


Then there is the problem of selective memories. Admitting to failures can be emotionally distressing, but in the commercial world it can be financially painful as well. Openness in investigating a failure will be applauded by everyone, but you can bet that a certain percentage of customers will simply take their business elsewhere. The fact is that every single launch company carries launch “successes” on its books that were rather less than that, in some cases spectacularly so. Still other failures are attributed to a single cause—preferably one implying a screw-up by a subcontractor, an unforeseeable fault, or the discovery of a heretofore undiscovered principle of physics—when the reality would prove to be a bit too embarrassing.

All too often, rather than learning from the failures of others, private firms’ engineers argue that the lessons learned by other firms don’t apply to them. They would not make that same mistake, and besides, their hardware is not absolutely identical anyway.

Today

In his article, Bob Clarebrough says “…failures are major sources of learning and are studied by managers worldwide…” But the fact is that this is not true, at least not in the space business.

Following the loss of the Columbia, NASA Administrator Sean O’Keefe sought to adopt the US Navy’s submarine crew training methods, which included a study of the loss of the shuttle Challenger as a means of demonstrating the interplay of technology and humans in creating disasters. But O’Keefe had to turn to the Navy for such methods because NASA had no such training tradition when it came to studying failures. Neither does the US Air Force. Neither does the private launch industry. There is something basically absurd about submariners studying a space launch failure when space launch personnel don’t even do it. And this is nothing new.

In May 1986 a NASA Delta 3925 booster carrying a GOES weather satellite failed when a momentary short caused by a chafed wire shut down the booster’s engine. In August 1998 an Air Force Titan 4B broke up during ascent when a momentary short caused the guidance system to reset itself. The cause was the same as that of the Delta failure 12 years earlier: a chafed wire.

The 1986 Delta failure was doubly tragic in that a pre-launch review of the booster had revealed serious workmanship problems. One senior MDAC engineer described it as “the worst quality I have ever seen.” There were certainly ample lessons from the past showing poor quality control to be a major cause for concern, but they were ignored.


The 1998 Titan 4B failure was doubly tragic in that, following the Delta failure of 1986, the entire industry went through something approaching an epiphany with regard to booster wiring harnesses. Special inspections were done on every rocket. Tests were conducted that discovered all sorts of interesting things about wiring harnesses and insulation. New tooling was devised to prevent harness damage during assembly and processing operations. The need for redundancy in systems was emphasized. The whole industry became sensitized to wiring harness problems, yet 12 years later it was all but forgotten.

Today, there is little appreciation for the lessons to be learned from failures. Failures and their causes—and today that invariably amounts to human error—are ignored by academia and industry alike. Government becomes intensely interested for a few moments after a disaster and then lapses back into its old ways. Industry tries hard to forget old embarrassments. Few people give it all any thought, and those who do look for information find that gaining insight is more akin to an intelligence-gathering operation than a research project.

Even in the case of the SpaceX Falcon 1 failure, despite a corporate pledge for an open investigation process, data that would have been provided freely under even the old restrictive government rules has been locked up.

A senior engineer from Lockheed Martin summed up the present-day situation back in 2000: “We have a great lessons lost program.”

We don’t need more failures, but we need more successful failures—ones that teach useful lessons.

