Over the last several years, the term resilience has become prominent in the aviation industry’s lexicon. Resilience has been introduced as a topic in crew resource management training, and safety management efforts increasingly are oriented toward ensuring resilience in the face of disruptions. But defining the term is easier than putting the concept into operation.
According to Capt. Pierre Wannaz, senior adviser at CEFA Aviation, the term resilience originates from the physical property of material absorbing energy when the material is deformed elastically. The Merriam-Webster Dictionary says resilience is the capability of a strained body to recover its size and shape after deformation caused especially by compressive stress.
“The term has been used in psychology, in business, and in many other fields,” Wannaz says. “In essence, resilience is defined as the ability to successfully adapt and respond positively to difficulties or other adverse conditions.”
Resilience may be confused with robustness, but they are not the same. “Robustness is more about how we respond to challenges when there are ways that were trained and skills and resources that were provided,” according to Shawn Pruchnicki, assistant professor at the Center for Aviation Studies at The Ohio State University. “Resilience is more about how we go beyond that and how we continuously adapt to changing situations for which we may not necessarily have procedures. … In the aviation realm, it is when we are faced with unexpected events for which we do not have the likes of specific checklists that we can demonstrate” our level of resilience.
Andy German, lead safety consultant at Atkins, observes that with resilience, it is possible to sustain required operations under both expected and unexpected conditions, including rare “Black Swan” events – unpredictable events with the potential for severe consequences.
An example is the catastrophic uncontained engine failure on Qantas Flight QF32, an Airbus A380, in 2010. The uncontained failure caused significant damage to numerous aircraft systems including fuel, hydraulic and electric systems; flight controls; and engine controls, presenting the crew with an extremely challenging situation.
“Quite rapidly, the crew realised that some checklist actions were impossible, the systems were damaged beyond the possibilities envisaged by the computer and some sensors had also failed. Without an efficient crew that started to look at what was still working and what could be done with the systems still operating, the outcome would have most certainly been dramatic,” says Wannaz.
“Unfortunately, it is often the case that the strict application of procedures is set as the most important priority. Respecting procedures is a must in aviation and should be a part of the pilot’s DNA, but putting the procedures as a dogmatic issue, ahead of the pilot’s judgment, can lead to situations with no gain in safety, if not to a catastrophic event,” says Wannaz. “In the case of QF32, if the pilots had just followed the electronic centralised aircraft monitoring (ECAM) checklists, they would have spent much of their time head-down, focusing on one computer, potentially losing the overview of the situation and performing some useless, inefficient, if not dangerous, switching. Their achievement was the result of a competent crew that had sufficient confidence in what they had realised from the situation and from their flying skills.”
Another example of resilient performance is illustrated by Capt. Chesley Sullenberger’s ditching in 2009 of US Airways Flight 1549, an Airbus A320, on the Hudson River in New York following a bird strike on takeoff. “It was totally resource-constrained, primarily it was time-constrained,” says Pruchnicki. “The crew needed to adapt and only perform part of a checklist, and they had to be able to figure out on the fly what part of that checklist was going to be valuable in the next two minutes. They were not paralysed by this unexpected event, which was one that they were not specifically trained for, but rather adapted procedures to meet the challenge, to adapt to the circumstances.”
Proactivity vs Resilience
Being resilient also differs from being proactive, as the latter has to do mainly with trying to anticipate problems that might occur, either because of similar past occurrences or because of issues or trends identified by flight safety departments.
“‘Proactive’ is the base level of expectation that an organisation should strive to achieve; to be resilient, an organisation needs to move on from proactive and through the preemptive stage to be optimised,” says Matt Simpson, technical director for cyber resilience at Atkins.
Resilience includes the ability for operations to continue even after safety has been compromised, Simpson says. “Within the aviation industry, this means ensuring that systems do not shut down – from air traffic control to the air-conditioning within a room of servers – even if, for example, the cyber aspects of the system are not working properly,” he says.
“Today, the emerged part of the iceberg is well known, but the roots of what happened in the emerged part [are] the result of many ‘small issues’ happening in the submerged part,” says Wannaz. “By tracking and identifying underlying problems or risks, the safety department can proactively deal with potential issues, thus avoiding a startle effect that would affect an improperly trained crew. Unfortunately, that is not sufficient, due to the presence of both the identifiable and the ‘black swan’ events. Resilience and proactivity have both to be present in aviation today.”
More Than the Sum of Parts
Resilience requires that consideration be given to the performance of individuals, teams, organizations and even systems. All of these dimensions mutually reinforce one another.
Team resilience comprises the resilience of the individuals involved, but it would be limiting to say that it derives from simply adding all of the individual capabilities, according to Pruchnicki. “In a resilient team, there develops a symbiotic relationship where the emerging performance is more than just the individual skills brought together. Team resilience is a characteristic that we see in individuals that come together with little to no notice and perform a function adequately as a team to meet the challenges they are faced with,” he says.
“Organisational resilience has to do with the infrastructure that is in place and the resources that are provided to deal not only with normal everyday life, but also with life when we are faced with exceedances. So, it is something that tends to be more pre-planned and that is brought together ahead of time.”
A team’s resilience depends on the ability of individuals to draw on experience and training, culture and behaviors, together with the ability to rely on the people around them. “It builds on personal resilience but should be able to depend on investments made in managing potential risk to deal with change and disturbance (preparedness), as well as the provision of clear leadership and strategy,” says German. “Investment in resilience is in conflict with a number of practices brought in to minimise costs, for example: minimising equipment types being used in an organisation … and not investing in people, resulting in organisations being ‘one brick thick.’”
Simpson observes that, for example, the majority of cyber events are not malicious attacks. Instead, malware is most often introduced to networks through employee error – that is, clicking on a phishing link or using personal portable media such as USB sticks, which become malware-infected through uncontrolled use. “Therefore, organisations need to invest time and effort into educating all of their staff to improve organisational safety. A people-focused change-management program, as well as the introduction of robust processes and security technology, are a good starting point to making an organisation and its infrastructure cyber resilient,” he says.
System Resilience
The U.S. National Aeronautics and Space Administration’s (NASA’s) Resilient Autonomy team is currently developing the Expandable Variable-Autonomy Architecture (EVAA) system. EVAA will determine, by following a set of programmed rules, when safety should take priority over a mission and when human safety should take priority over vehicle safety.
“We are working on autonomy and system resilience,” says Mark Skoog, NASA Armstrong Flight Research Center principal investigator for autonomy. “Many individual components have resilience embedded into the design. This may be as simple as redundancy, though we try to use this as only a last resort. We emphasise verification through crosschecks. This may be at the component or sensor level, but it extends all the way up to the system level.”
Air traffic management (ATM) is another example of resilience at the system level. In ATM, resilience is about minimizing the impact of disruptions to air traffic operations. “Harmonisation is required for solutions that involve strong multi-stakeholder collaboration, such as trajectory-based operations and system-wide information management,” says Ruben Flohr, ATM expert at the SESAR JU (Single European Sky ATM Research Joint Undertaking).
Training for Resilience
From a resilience development perspective, humans are seen as a trained resource that provides system flexibility and resilience. “Individual resilience can be developed through training and operational experience, but it depends on an organisation’s investment made in managing potential risk to deal with change and disturbance,” says German. “We use tools such as Structured What-if [Technique], root cause analysis and bowtie [analysis] to model defense in depth, and socio-technical models such as systems theoretic process analysis (STPA) to consider the people, equipment, monitoring systems and data required for a safe resilience system. Also, the use of cyber vulnerability assessments can help a business understand the cyber risks it should be mitigating.”
According to Pruchnicki, there are some basic areas, such as anticipation, learning and monitoring, where improved abilities can help individuals and teams to be more resilient. “But how can someone be trained to anticipate or to monitor better? This is where it may be difficult to operationalise resilience,” he says.
The implementation of evidence-based training (EBT) represents a first step. “Emphasising the systematic debriefing of the flight crew after a flight, focusing on the potential points to improve but also on the positive aspects of performance, is a boost towards confidence, which is a key element of resilience. By better understanding one’s own performance, competence can also be improved,” says Wannaz.
Under EBT, instead of focusing just on the result, the trainee is faced with the process of achieving that result. By understanding the process, the pilots will learn about their strengths, but also about their weaknesses. “By knowing the strengths and the weaknesses we have, it is possible to elaborate on working strategies so that strengths are used positively, and workarounds or other strategies can be put in place so that these weaknesses do not become an obstacle in the overall crew performance,” says Wannaz. “To illustrate with a personal example, as a captain, in an abnormal situation, I cannot (unlike some other colleagues that I can observe as an instructor) make an efficient ‘public address’ to my passengers, or convey an emergency briefing to the cabin crew, spontaneously. To compensate [for] this weakness, my strategy is that I need a few seconds of reflection and a paper to [write down] some keywords that will enable me to communicate a clear and efficient message.”
Fostering Resilience
Organizational resilience largely depends on an organization having an empowered, learning and just culture. “The culture, in turn, is dependent on the leadership, investment in preparedness and reward/punishment systems in place,” says German. “Leadership needs to be clear about the organisational goals, so in part, this is about considering the business as being in an infinite game and not a ‘quarterly reporting’ finite game. If a country (or union) wishes to encourage resilience, then legislation and regulation is necessary.”
The resilient autonomy architecture that NASA is building deals with integration in the aviation system of nontraditional types of missions, such as drone package delivery or urban air mobility. These missions operate much closer to people and infrastructure than aviation has ever operated in the past. “Our current system is [designed] to get aircraft up and away from people and buildings as quickly as possible, and when they are near or on the ground, place many barriers between the operating aircraft and people,” says Skoog. “Drone package delivery on the other hand is ‘in your face’ aviation (i.e., operating to and from close proximity to people) and in some cases, in and out of unimproved landing sites. This puts a level of complexity of the environment at a number of orders of magnitude more complex than aviation has dealt with in the past.”
NASA’s approach is to have many separate computerised monitors, each focused on one aspect of safety. “We channel these monitors through a central function we call ‘the moral compass,’ which gives control of the aircraft to the highest consequence task/monitor. Consequence is dictated by a set of rules of behaviour, which dictate when safety should outweigh the mission and when vehicle safety should be compromised to protect human life and limb,” says Skoog. “Our group will not be setting the rules of behaviour; that is for the governing authorities, such as the FAA [U.S. Federal Aviation Administration], state and local governments. We are instead establishing a framework/architecture that allows a set of rules of behaviour to be plugged into the aircraft, and the internal systems will ensure the vehicle will conduct itself accordingly.”
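The arbitration idea Skoog describes, with independent monitors feeding a central function that hands control to the highest-consequence request under a pluggable rule set, can be illustrated with a minimal sketch. This is not NASA's EVAA code: the monitor names, the state fields, the consequence categories and the ranking values below are all hypothetical, chosen only to show the shape of the design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Request:
    monitor: str   # name of the monitor asking for control (hypothetical)
    category: str  # consequence category the monitor reports (hypothetical)
    action: str    # task or manoeuvre the monitor wants performed


# A monitor looks at the vehicle state and returns a Request, or None if idle.
Monitor = Callable[[dict], Optional[Request]]


def terrain_monitor(state: dict) -> Optional[Request]:
    # Asks to climb when the vehicle is low (the 50 m threshold is illustrative).
    if state["height_m"] < 50:
        return Request("terrain", "vehicle_safety", "climb")
    return None


def people_monitor(state: dict) -> Optional[Request]:
    # Asks to divert when people are detected under the flight path.
    if state["people_below"]:
        return Request("people", "human_safety", "divert")
    return None


def mission_monitor(state: dict) -> Optional[Request]:
    # The mission always has something to do: keep flying the route.
    return Request("mission", "mission", "continue_route")


# Rules of behaviour are plain data supplied from outside the arbiter, so a
# different authority could plug in a different ranking. Values are made up.
EXAMPLE_RULES: Dict[str, int] = {"human_safety": 3, "vehicle_safety": 2, "mission": 1}


def arbitrate(state: dict, monitors: List[Monitor],
              rules: Dict[str, int]) -> Optional[Request]:
    """Return the request whose category ranks highest under the given rules."""
    requests = [r for m in monitors if (r := m(state)) is not None]
    if not requests:
        return None
    return max(requests, key=lambda r: rules[r.category])


MONITORS = [terrain_monitor, people_monitor, mission_monitor]
```

Calling `arbitrate({"height_m": 30, "people_below": True}, MONITORS, EXAMPLE_RULES)` selects the people monitor's divert request, because the example rules rank human safety above vehicle safety and the mission. Passing a different rules dictionary changes the outcome without touching the monitors or the arbiter, which is the separation between framework and rule-setting that the quote points to.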
SESAR solutions aim to increase ATM system resilience in areas including enhancing safety, mitigating the impact of weather events and increasing resource flexibility. “Safety nets increase resilience and can be introduced in many different areas of operations,” says Flohr. “At airports, these safety nets help to mitigate the risk of runway incursions and excursions, such as traffic alerts for pilots and conformance monitoring notifications for controllers. Controllers also have tools to ensure the safety of the aircraft once airborne. These range from reactive short-term conflict detection tools to more proactive medium-term systems for planning and tactical operations.”
Image: © Steve Jurvetson | Wikimedia CC-BY 2.0
Mario Pierobon is a safety management consultant and content producer.