The Space Review in association with SpaceNews

Boeing’s CST-100 Starliner after landing on its uncrewed Orbital Flight Test in December. Boeing engineers scrambled during the mission to correct a software problem that a safety panel warned could have led to a “catastrophic spacecraft failure” as it prepared to return to Earth. (credit: NASA/Bill Ingalls)

Starliner software setback


The quarterly teleconferences of NASA’s Aerospace Safety Advisory Panel, or ASAP, are enlightening, but rarely exciting. ASAP is an independent committee charged with examining the safety of NASA programs and facilities with a broad mandate, from the International Space Station and crewed spacecraft to the long-term health risks of human spaceflight to terrestrial facilities. The public meetings offer insights into safety issues they see in NASA programs and how the agency is addressing them, but with few surprises.

Until, that is, someone brings up “catastrophic spacecraft failure.”


That happened at the panel’s latest meeting last Thursday. At that meeting, the committee revealed that Boeing’s CST-100 Starliner had a second serious software problem, not previously reported by NASA or Boeing, that could have put the spacecraft in jeopardy during its uncrewed test flight last December (see “The year of commercial crew comes to an end, without crew”, The Space Review, December 23, 2019).

Paul Hill, a member of ASAP, said the committee had been told by NASA that engineers discovered that second problem during ground testing while the spacecraft was in orbit on that two-day test flight. The problem, he said, was corrected prior to the reentry of Starliner. “While this anomaly was corrected in flight, if it had gone uncorrected, it would have led to erroneous thruster firings and uncontrolled motion during SM [service module] separation for deorbit, with the potential for a catastrophic spacecraft failure,” he said.

The revelation was a surprise, and it took several hours for both Boeing and NASA to respond. Boeing, in a statement late that day, confirmed that a joint NASA-Boeing independent review team (IRT) had found a “valve mapping software issue” during that test flight, but didn’t state that it could have resulted in a “catastrophic” failure. “That error in the software would have resulted in an incorrect thruster separation and disposal burn,” the company said. “What would have resulted from that is unclear.”

NASA and Boeing elaborated in a media teleconference the next day, arranged on less than 24 hours’ notice after the comments at the ASAP meeting spread in the media. “After the crew module separates from the service module, the service module has to go be an independent spacecraft and get rid of herself,” said Jim Chilton, senior vice president for Boeing Space and Launch. “We believe that the way that the software was going to do that could have resulted in the service module bumping back into the crew module.”

John Mulholland, vice president and program manager for Starliner at Boeing, explained that, normally, the crew module handles all the thruster firings on the service module when the two are attached. After the service module separates prior to reentry, the “propulsion controllers” on that module take over to perform the firings needed to move away from the crew module and deorbit. The “valve mapping” is different in that case than in normal flight, he said, “and the software, unfortunately, had the same valve mapping for both of those conditions.”
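To make that class of error concrete, here is a minimal Python sketch, assuming a simplified model in which each vehicle configuration has its own table mapping thruster commands to valves. The configuration names, command names, valve IDs and functions (`valve_map_buggy`, `valve_map_fixed`, `mismatches`) are all hypothetical; this is an illustration of the kind of defect Mulholland described, not Boeing’s flight software. It shows how returning one table for both configurations goes unnoticed while the modules are attached, and how checking each configuration against its intended table exposes it.

```python
from enum import Enum


class Config(Enum):
    ATTACHED = "attached"    # crew module drives the service module thrusters
    SEPARATED = "separated"  # service module's own propulsion controllers take over


# Hypothetical tables: thruster command -> valves to open. Values are invented.
ATTACHED_MAP = {"sep_push": ("V1", "V2"), "deorbit": ("V3", "V4")}
SEPARATED_MAP = {"sep_push": ("V5", "V6"), "deorbit": ("V7", "V8")}


def valve_map_buggy(config: Config) -> dict:
    # The defect class described: the same mapping is used for both
    # conditions, so `config` is effectively ignored.
    return ATTACHED_MAP


def valve_map_fixed(config: Config) -> dict:
    # Select the table that matches the current vehicle configuration.
    return ATTACHED_MAP if config is Config.ATTACHED else SEPARATED_MAP


def valves_for(command: str, config: Config, lookup) -> tuple:
    """Return the valves a given lookup function would open for a command."""
    return lookup(config)[command]


# Hypothetical "requirements" table: the intended mapping per configuration.
EXPECTED = {Config.ATTACHED: ATTACHED_MAP, Config.SEPARATED: SEPARATED_MAP}


def mismatches(lookup) -> list:
    """List the configurations where the lookup disagrees with the requirement."""
    return [cfg for cfg, table in EXPECTED.items() if lookup(cfg) != table]
```

Here `mismatches(valve_map_buggy)` flags only the separated configuration, which mirrors why such an error would not show up in attached operations and would matter only once the service module was on its own.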

Boeing officials on the call didn’t state that a collision between the two modules could have been catastrophic, but said it could have made the crew module unstable by introducing a wobble, or could have damaged the module’s heat shield. “Nothing good could come from those two spacecraft bumping,” Chilton said.

That issue was detected in ground tests prompted by Starliner’s first problem: a timer error that caused the spacecraft to think it was 11 hours ahead of where it actually was in the mission. As a result, it skipped an orbit insertion burn immediately after separating from the upper stage, setting off a chain of events that led to the mission being shortened and a planned ISS docking abandoned.


Mulholland said that the Starliner software was supposed to initialize that mission elapsed timer by “polling” its Atlas V rocket, but only do so in the final phase of the countdown. However, that latter requirement was missing from the software on the spacecraft. “So, it polled an incorrect mission elapsed time from the launch vehicle, which then gave us an 11-hour mismatch,” he said.
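A similarly hedged sketch of the timer problem follows, assuming a model in which the spacecraft latches the first elapsed-time value it accepts from the launch vehicle. The class names, the `require_terminal_count` flag and the 11-hour-off early reading are invented for illustration; only the missing “final phase of the countdown” condition comes from Mulholland’s description.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 3600.0  # seconds


@dataclass
class LaunchVehicleClock:
    # Hypothetical snapshot of what the rocket reports when polled.
    reported_s: float      # elapsed-time value returned by the poll
    terminal_count: bool   # True once the final phase of the countdown has begun


class MissionElapsedTimer:
    def __init__(self, require_terminal_count: bool):
        self.require_terminal_count = require_terminal_count
        self.initial_met_s: Optional[float] = None

    def poll(self, lv: LaunchVehicleClock) -> bool:
        """Try to initialize from the launch vehicle; return True if the value was latched."""
        if self.initial_met_s is not None:
            return False  # already initialized; later polls are ignored
        if self.require_terminal_count and not lv.terminal_count:
            return False  # the condition described as missing from the flight software
        self.initial_met_s = lv.reported_s
        return True


def simulate(require_terminal_count: bool) -> float:
    """Return the timer error in hours after an early and a late poll (invented values)."""
    timer = MissionElapsedTimer(require_terminal_count)
    correct_s = 0.0
    early = LaunchVehicleClock(reported_s=correct_s + 11 * HOUR, terminal_count=False)
    late = LaunchVehicleClock(reported_s=correct_s, terminal_count=True)
    timer.poll(early)
    timer.poll(late)
    return (timer.initial_met_s - correct_s) / HOUR
```

With the guard disabled, `simulate(False)` lets the early poll win and the timer stays 11 hours ahead; with it enabled, `simulate(True)` accepts only the terminal-count reading and the error is zero.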

“We went hunting immediately after our first software problem, and we found one,” said Chilton of finding the thruster error. “I don’t think we would have found it if we hadn’t gone looking right after that first one.”

Process escapes

At its Thursday meeting, ASAP said it was concerned the two software errors were symptoms of a more fundamental problem. “The panel has a larger concern with the rigor of Boeing’s verification processes,” Hill said. ASAP called for reviews of Boeing’s software development and testing processes.

“Further,” he added, “with confidence at risk for a spacecraft that is intended to carry humans in space, the panel recommends an even broader Boeing assessment of, and corrective actions in, Boeing’s SE&I [systems engineering and integration] processes and verification testing.”

NASA appears to be on the same page as its safety advisors. “The real problem is that we had numerous process escapes in the design, development and test cycle for software,” said Doug Loverro, NASA associate administrator for human exploration and operations, during Friday’s telecon. “As we go forward, that is what we’re going to be concentrating on.”

That software cycle, Mulholland said, is a “pretty standard one” for the industry: after defining requirements, developers write code. That code then goes through peer reviews and a series of increasingly rigorous tests, leading to flight qualification tests. That approach, he said, is “designed to uncover and correct code errors as early as you can.”

But Loverro said the process, or at least the application of it at Boeing, was flawed, with “multiple process breakdowns” leading to both errors getting into the flight software. “For each of these two problems that we know about, some of that breakdown was in different spots and some was in the same spot of the process,” he said.

Boeing said it would reverify all of the software written for Starliner, about one million lines of code, to look for any additional errors. Meanwhile, NASA will review that software development process, including the role that organizational culture could have played. NASA will also carry out an organizational safety assessment of Boeing, similar to the one it conducted of SpaceX, which was prompted by concerns among agency leadership after SpaceX CEO Elon Musk briefly smoked marijuana on a podcast.


NASA had planned to carry out a similar review of Boeing but reportedly balked because of the estimated cost, deciding instead to do a much less thorough review. ASAP endorsed the decision to now conduct the full safety review. “The review of SpaceX proved to be valuable to both NASA and the company, so it’s a prudent step to execute the same process with the other provider,” said Patricia Sanders, chair of the panel.

Loverro pointed to other problems at Boeing, notably the grounding of the 737 MAX jetliner because of software issues, as one reason to do the safety assessment. “There could possibly be process issues at Boeing, and so we want to understand what the culture is at Boeing that may have led to that,” he said.

Loverro, though, defended not revealing the problem earlier. It came to light only when NASA briefed ASAP on an interim report by the independent review team, which the panel then mentioned at its public meeting. “We didn’t end up having an anomaly. We found an issue and we fixed it,” he said, arguing that talking about it earlier would have amounted to speculation.

“This is normal,” he added. “During test, we’re always going to find things that are wrong, and our job is to test and find them and then fix them.”

That overall process of finding and fixing the problems could take some time. The IRT is scheduled to complete its work at the end of the month, at which time NASA promises to provide more information about what went wrong and what will be done to correct it.

While NASA administrator Jim Bridenstine pledged transparency in the process, he also made clear during the telecon that these details were coming out at this time only because of the ASAP meeting the day before. “But in the interests of transparency, and some of the things that I saw online yesterday, I wanted to make sure that everybody knew kind of where we were in the investigation,” he said.

Neither Boeing nor NASA would commit to a schedule for completing those reviews or deciding whether to move ahead with a crewed flight test, as originally planned, or perform a second uncrewed flight test. Loverro said even the existence of the second software problem and the process errors that led to it doesn’t mean another uncrewed test flight is required. “You don’t go ahead and do flight tests to verify that you’ve solved problems. You do flight tests to look at a holistic picture of the system.”

Boeing is preparing for the possibility of a second uncrewed flight test. On January 29, Boeing announced it was taking a $410 million charge against its earnings, largely to cover the costs of a second test flight along with other testing and reviews. The company mentioned it briefly in its earnings call with financial analysts, but the news was lost amid the far more serious financial issues involving the 737 MAX.

In an interview the same day, Chilton said the decision to take the $410 million charge was a precautionary one. “We’re ready for anything,” he said. “NASA is going to decide what we should do next, but they could decide to refly an OFT, and if that’s what they want to do, we’re ready for it.”

Meanwhile, at the same ASAP meeting that revealed the new Boeing software problem, members were optimistic about the prospects of SpaceX flying people on its Crew Dragon spacecraft soon. Sanders said SpaceX still has to resolve a number of technical issues, such as the interaction of titanium with nitrogen tetroxide, which was blamed for the explosion of a Crew Dragon spacecraft during preparations for a static fire test of its abort engines in April 2019. But, she added, “the end appears to be in sight” for that work.

“The panel’s assessment of the status of SpaceX is that NASA is at a point where there is not a question of whether they will be flying crew in the near term, but when, and under what risk conditions,” she said.
