ARMS
Presented by:
Jari NISULA
Mgr, Airline Safety Mgt Systems
Airbus
Page 1
Presentation contents
Page 2
Central role of “Risk” in the SMS framework
Risk Management has a very central role in the new SMS framework introduced by ICAO.
Component 2 of the SMS framework, “Safety Risk Management”, is the part where safety is concretely delivered: by identifying hazards, risk assessing them, and by taking action to manage the risks.
The Management of Change process (point 3.2 on the slide) often requires making a Risk Assessment (or a “Safety Assessment”) of the new planned activity; for example, opening a new route or introducing a new aircraft type. This again calls for a good practical method.
Let’s now look more in depth into Component 2, “Safety Risk Management”…
Page 3
Risk Assessment within Risk Management
(ICAO SMM)
This chart comes from the ICAO SMM (edition 1). The Risk Management process starts with Hazard Identification (HI). For an airline, this typically consists of things like Flight Data Analysis, Safety Reporting, etc. This is an area which has improved drastically in the last 10 years, and today an airline can have access to a large amount of very proactive safety data.
The second part (in red) is the Risk Assessment, in terms of severity, probability and acceptability. This is the difficult bit, and it is what the rest of the presentation will focus on.
Finally, the last part (in yellow) is Risk Mitigation. This is about taking action in order to make sure that all risks remain at acceptable levels. It involves many organizational issues and, even though it has its own challenges, these are not related to the Risk Assessment methodology itself. A typical arrangement is to use the Safety Review Board and Safety Action Groups to take care of this part.
Page 4
Objectives for a Risk Assessment methodology
[Diagram: Hazard Identification data → RA → Operational Risk Profile; Planned changes → RA → Associated Risk]
There are two main inputs. The first one is the operational Hazard Identification data (produced by the sources listed in blue on the previous slide). The Risk Assessment method should be able – based on that data – to create a good overview of operational risks; we could call this the Operational Risk Profile.
We can now list objectives concerning the acceptable inputs, the method
itself and the results (see bullets).
Page 5
Problems with older methods – fictitious example
Classic approach to Risk Assessment:
This may seem like a simple task, but a closer study of the problem reveals fundamental problems caused by a deficient underlying conceptual framework.
Page 6
Fictitious example (cont’d)
• Severity of what?
Actual outcome: blown tires?
Most likely potential accident scenario: overshoot with some
injuries & few fatalities (if any)?
The worst-case scenario: overshoot with 100% fatalities?
Shall you consider bigger A/C? More pax? Critical airports?
• Likelihood of what?
The same maintenance error?
Near-overshoot events?
Actual overshoot events?
Any A/C type? Any location?
Where do all these problems of assessment come from? After all, the
event is a historical fact – shouldn’t it therefore be easy to risk-assess it?
*The ICAO definition of risk refers to the “worst foreseeable situation”, which usually equates to 100% fatalities. But this is not the same as the “most probable accident outcome”, which in real life may be a more useful concept.
Page 7
Conceptual confusion on historical events
• Further question:
Should we assess events or Safety Issues?
Page 8
Further problems
Trying to assess the likelihood when dealing with individual events causes other problems. Importantly, when an event type becomes more frequent, one should re-assess the risk of the previous events by correcting the likelihood upwards; otherwise their risk level does not reflect the increased likelihood. Such continuous re-assessment is not feasible in a real-life context.
Moreover, if one wants to estimate the total risk over an event type (e.g. TCAS events during approach to LHR), the temptation is to sum together the risk values of the individual events. If likelihood was one of the two axes in the initial assessment, likelihood is now being taken into account twice – once inside each event’s risk value and once through the number of events summed – whereas severity is counted only once. The answer is flawed.
Page 9
List of problems with older methods
5. The aviation system with its various actors, technology and variable
conditions is extremely complex.
Page 10
ARMS Methodology
Page 11
Airline Risk Mgt Solutions (ARMS) Working Group
Due to the complexity of aviation and the nature of risk assessment, it will
never be 100% scientific and objective, but we are convinced it can be
done significantly better than with existing methods and that’s our aim.
The results are valuable only if they are actually useful in the real-life
operational context. We wanted this methodology to be developed by
operational practitioners, so that almost by definition the result is
pragmatic.
The ECAST SMS WG took ARMS as the reference for operational risk
assessment, not trying to duplicate the ARMS work in any way.
Page 12
ARMS Mission Statement
The Mission of the ARMS Working Group is to produce useful and cohesive Operational Risk
Assessment methods for airlines and other aviation organizations and to clarify the related
Risk Management processes.
The produced methods need to match the needs of users across the aviation domain in terms of
integrity of results and simplicity of use; and thereby effectively support the important role that Risk
Management has in Aviation Safety Management Systems.
Through its deliverables, the Working Group also aims at enhancing commonality of Risk
Management methodologies across organizations in the aviation industry, enabling increased
sharing and learning.
In its work, the Working Group seeks contribution from aviation safety experts with knowledge of the user needs and practical applications of risk management in the operational setting.
The deliverables of the Working Group will be methodology definitions – not necessarily software
tools. The first results will be delivered before 1-Jan-09 after which the potential continuation of the
work will be reviewed.
The results of the Working Group will be available to the whole industry.
Page 13
ARMS Methodology
Page 14
Level 1 deliverable:
Conceptual methodology
On light blue background
Perhaps the most important deliverable of the ARMS working group is the
conceptual methodology for Operational Risk Assessment.
Page 15
Level 2 deliverable:
Example application
On yellow/orange background
A little “C” in the corner reminds that this part may sometimes be further customized for specific contexts.
In addition to the conceptual methodology, the ARMS group has developed a concrete example application, including all necessary matrices and guidance text.
Page 16
ARMS Methodology
Page 17
Key points of the ARMS Methodology
Before going into the terminology and the methodology itself, here are the key points of this new methodology, summarized on one slide.
Page 18
Terminology
In order to talk the same language, we have listed on the next few slides the terminology definitions used by the ARMS group.
Why is Safety Issue such an important concept? Two reasons. First of all,
you can do something about Safety Issues. Managing Safety pretty much
equals managing your Safety Issues. Secondly, you can define a Safety
Issue very precisely and therefore carry out a good Risk Assessment
without much room for subjectivity.
Page 19
Terminology
• (Safety) Event
Any happening that had or could have had a safety impact,
irrespective of real or perceived severity (ARMS)
Page 20
Terminology
RISK
We started with the ICAO definition of Risk, but were forced to modify it a
little bit.
Page 21
Preferred use related to “Risk Controls”
• Synonyms:
Risk Control
Barrier
Protection
Defense
Measures to avoid or to limit the bad outcome; through prevention, recovery, mitigation. (SHELL)
• Not used:
Safety Barrier (misleading)
Protection, defense (for harmonization reasons)
Page 22
Not used due to several meanings
• Threat
Another meaning in the TEM context
Usually the word scenario can be used instead
• Mitigation
Classic= post-accident risk controls
ICAO = all risk controls (prevention, recovery, mitigation)
Used: controlling risks or reducing risks (verbs)
Used: Risk Controls, Barriers (nouns)
Mitigation again has two meanings. We try to avoid the word altogether.
Page 23
Process summary – simplified schematic
[Simplified process schematic: Safety Events → quick ERC screening (risk index values 1, 3, 10, 30, 100, 300) → Urgent Actions? → safety database → Safety Issue identification → Risk Assessment → Risk Reduction]
Let’s now go into the methodology itself. It is important to start from the
overall process. This is a simplified summary.
The starting point is the safety data, which flows in from Hazard
Identification. The incoming elements are typically events. Due to this fact,
and due to the need to screen for items requiring urgent action, the first
step has to be a quick screening of all incoming events. The purpose is
not a thorough analysis, but only a first-cut classification.
The data flows into a safety database, which is used for trend analysis.
This may lead to actions due to increasing trends, etc., sometimes without
a formal risk assessment. A key step here is to identify the Safety Issues.
The Safety Issues (SI) are then subject to a detailed Risk Assessment.
Safety Issues are no longer single events, but well-defined Issues,
typically highlighted by several events.
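To make this intake step concrete, here is a minimal sketch in Python (all names, example events and the urgency threshold are illustrative assumptions, not ARMS definitions):

```python
# Minimal sketch of the simplified flow above. All names, example events and
# the urgency threshold are illustrative assumptions, not ARMS definitions.
from dataclasses import dataclass

URGENT_ACTION_THRESHOLD = 500   # assumed cut-off for flagging "urgent action"

@dataclass
class SafetyEvent:
    description: str
    erc_index: int              # first-cut risk index from the quick ERC screening

def quick_screening(description: str, erc_index: int,
                    database: list[SafetyEvent]) -> SafetyEvent:
    """First-cut classification only: record the ERC index, flag anything
    urgent, and store the event for later trend analysis and Safety Issue
    identification (which then feeds the detailed SIRA assessment)."""
    event = SafetyEvent(description, erc_index)
    if event.erc_index >= URGENT_ACTION_THRESHOLD:
        print(f"Urgent action review needed: {event.description}")
    database.append(event)
    return event

db: list[SafetyEvent] = []
quick_screening("Unstabilised approach, late go-around", 20, db)
quick_screening("Landing with degraded braking, stopped near runway end", 500, db)
```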
Page 24
[Process chart: Safety Events → quick ERC screening → all collected safety data (categorized, with ERC risk index values) → Data Analysis (frequencies, trends, identification of Safety Issues) → Safety Performance Monitoring, Investigations and actions to reduce risk]
The Database has all the safety data in a structured format, enriched by
descriptors covering things like date, a/c type, location, flight phase. But
each event now also has a risk index value coming from the ERC. These
values can be used in statistics. Data Analysis is about looking at the data
with the help of the descriptors, statistical tools and graphs/charts in order
to detect Safety Issues. It is also the basis for monitoring the Safety
Performance.
Identified Safety Issues are risk assessed in the “Safety Issue Risk
Assessment” (SIRA). This will provide risk tolerability information on all
detected risks.
Finally, Risk values and related actions are monitored through the Risk
Register database.
Page 25
Event Risk Classification (ERC)
We spent a lot of time working on how to deal with historical events, and
the concept of risk related to them. The first conclusion, which we hope
makes sense to everybody, is that when dealing with an individual event,
we should not try to estimate its frequency.
When you ask the question: “what really makes an event worrying,
concerning, frightening?”, you realize there are two main factors: how bad
could it have been (as an accident) and how close did it get (to the
accident). The Risk Assessment of historical events is based on these two
dimensions, which translate to more specific questions.
What we are measuring is the risk experienced in the event under study,
that day, in those conditions. This acknowledges that some barriers have
already been breached, and what really matters is the remaining set of
barriers and their effectiveness. This is the Risk we measure with the ERC
matrix, presented on the next slide. If you look at tomorrow, the risk would
be different, because now you would assume all the barriers to be in place,
a priori.
Page 26
Event Risk Classification (ERC)
Question 1: If this event had escalated into an accident, what would have been the most probable accident outcome?
Question 2: What was the effectiveness of the remaining barriers between this event and the most probable accident scenario? (Effective / Limited / Minimal / Not effective)
[ERC matrix extract: a “Major” outcome (1 or 2 fatalities, multiple serious injuries, major damage to the aircraft) gives risk index 10 / 20 / 100 / 500 depending on barrier effectiveness; a “Negligible” outcome (no potential damage or injury could occur) gives risk index 1]
We have guiding questions to take the user through the ERC assessment.
Having only 4 classes both ways helps make this assessment easy.
The guidance text for each class can be customized to specific
applications.
One has to keep in mind that the overall purpose is only to make an initial
estimate of the risk, so that the event is classified correctly. This is not the
final risk assessment. This classification should be possible even without
the guiding text, just based on the two questions.
Why is the bottom row just one block? Because if you say that this event
could not have escalated into an accident, then it makes no sense to
estimate the remaining safety margin.
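As an illustration of how such a lookup could be implemented (a minimal sketch: only the “Major” row values 10/20/100/500 and the single “Negligible” value 1 come from the slide above; the other rows and all names are placeholder assumptions):

```python
# Sketch of an ERC lookup driven by the two guiding questions.
# Only the "Major" row (10, 20, 100, 500) and the single "Negligible" value (1)
# are taken from the slide; the other rows are placeholder assumptions and would
# need to be replaced by an organization's own calibrated index values.

BARRIER_CLASSES = ("Effective", "Limited", "Minimal", "Not effective")

ERC_MATRIX = {
    # answer to Question 1 -> risk index per answer to Question 2
    "Catastrophic": (50, 100, 500, 2500),   # placeholder row (assumption)
    "Major":        (10, 20, 100, 500),     # values shown on the slide
    "Minor":        (2, 4, 6, 10),          # placeholder row (assumption)
}

def erc_index(accident_outcome: str, barrier_effectiveness: str) -> int:
    """Return the ERC risk index for one historical event.

    accident_outcome      -- Question 1: the most probable accident outcome
    barrier_effectiveness -- Question 2: effectiveness of the remaining barriers
    """
    if accident_outcome == "Negligible":
        # The bottom row is a single block: if the event could not have
        # escalated into an accident, the remaining margin is not assessed.
        return 1
    return ERC_MATRIX[accident_outcome][BARRIER_CLASSES.index(barrier_effectiveness)]

# The example on the next slide: a "Major" most probable outcome with no
# effective remaining barriers gives risk index 500 (the red zone).
assert erc_index("Major", "Not effective") == 500
```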
Page 27
Event Risk Classification (ERC) - example
[The same ERC matrix as on the previous slide; in this example the event falls in the “Major” outcome / “Not effective” barriers cell, giving risk index 500]
The most probable accident outcome would have been a slow-speed overrun with injuries but without multiple fatalities. (This is a good example of why we did not like the risk definition phrasing “worst foreseeable situation”, which would often be too severe.)
There were no remaining barriers left. It was pure luck (or favorable
conditions) which made the plane stop on the runway and not just after. (A
physical net at the end of the runway would be such an extra barrier,
though).
This leads you to the red zone of the matrix with risk index 500.
Page 28
Event Risk Classification (ERC) - RESULT
The first result is the color.
Typical examples of the color’s meaning are presented above. These are
naturally subject to adaptation in each organization.
Page 29
Event Risk Classification (ERC) - RESULT
• The ERC will also produce a numerical Risk Index value for each event
The values (which can naturally be customized) have been derived semi-scientifically by looking at insurance data on accidents. The data show that the amount of loss in different categories of accidents was roughly 1:5:25. The objective is also to create roughly exponential scales both ways and to make sure the difference between the lowest and highest value is at least about 1000.
You can ask yourself how many of your least severe events you would need to consider their cumulative risk as high as that of one of your most severe events (a fatal accident avoided by pure luck).
Page 30
Data Analysis - example
[Chart: for airports AAA–EEE, the number of unstabilised-approach events (left axis, 0–40) compared with the summed ERC risk index per airport (right axis, 0–3500)]
This is an example of Data Analysis (see next slide) and the use of ERC
risk index values.
Just looking at the absolute numbers of events (in this case unstabilized
approaches) can be misleading. Using rates is better, because the data is
normalized based on the exposure data. But still, it is only looking at
frequency of events, not their severity or risk.
By summing the ERC values of the events (in this case per airport), one gets an estimate of the cumulative risk of these events per airport. This can give a completely different picture, as the example above illustrates. Each graph tells a true but different story, and it is important to look at each one of them.
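A minimal sketch of this comparison (airport codes and index values below are made up purely for illustration):

```python
# Compare a plain event count per airport with the summed ERC risk index per
# airport, as described above. All data below are fictitious.
from collections import defaultdict

# (airport, ERC risk index) pairs for unstabilised-approach events
events = [
    ("AAA", 10), ("AAA", 10), ("AAA", 20), ("AAA", 10),
    ("BBB", 500),                    # a single event, but a very high-risk one
    ("CCC", 10), ("CCC", 20),
]

counts: dict[str, int] = defaultdict(int)
total_erc: dict[str, int] = defaultdict(int)
for airport, index in events:
    counts[airport] += 1
    total_erc[airport] += index

for airport in sorted(counts):
    print(f"{airport}: {counts[airport]} events, cumulative ERC index {total_erc[airport]}")
# AAA has the most events, but BBB carries the highest cumulative risk:
# the two graphs tell different (and equally true) stories.
```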
Page 31
[The process chart from page 25 repeated: Safety Events → quick ERC screening → all collected safety data (categorized, with ERC risk index values) → Data Analysis (frequencies, trends, identification of Safety Issues) → Safety Performance Monitoring, Investigations and actions to reduce risk]
Page 32
Events vs. Safety Issues
• Examples (fictitious):
Windshear at approach to XXX
Quality of de-icing in YYY
Operation into ZZZ (high-altitude, short runway, …)
Fatigue on red-eye flights
• You can Risk Assess Safety Issues because you can define &
scope them precisely
Safety Issues evolve over time: old ones disappear and new ones emerge. For example, high fuel prices make companies try to save fuel through new procedures, which may introduce new Safety Issues.
Safety Issues can be precisely defined, which makes the eventual Risk
Assessment clear, transparent and credible.
Page 33
Conceptual framework for Risk Assessment
[Diagram: hazards / Safety Issues (mx, ops, ground, atc, wx) pass through avoidance barriers to an Undesirable Event, and through recovery barriers to accidents (air collision, runway overrun, ground collision, CFIT). The four factors along the chain are: hazard frequency, avoidance barriers, recovery barriers, accident severity.]
Safety Issues are risk-assessed regularly through SIRA (Safety Issue Risk
Assessment). A vital starting point is a proper conceptual framework for
such an Assessment.
For example, the initial hazard could be a maintenance error affecting the
braking system and the accident scenario a runway overrun. The
Undesirable Event is the point in time marking the transition from
Avoidance to Recovery, which in this case could be defined as landing on
a runway where the brakes are needed. Avoidance would be anything allowing the detection of the maintenance error before the landing, and recovery would be a safe landing despite the problem (which might be impossible; i.e. sometimes there are no recovery (or avoidance) barriers).
Page 34
Safety Issue Risk Assessment (SIRA)
The actual method for SIRA can be constructed in many different ways.
As input, there are the values for the four factors, and as output the risk
level. One can think of a simple Excel application, or an approach based on two sub-matrices feeding into a final tolerability matrix, as presented on the next slides.
Page 35
Safety Issue Risk Assessment (SIRA)
[SIRA matrix 1 – “1. How frequent is the initial hazard (per sector)?”: hazard frequency rows 10⁻⁴, 10⁻⁵, 10⁻⁶, 10⁻⁷ combined with the effectiveness of the avoidance barriers, giving intermediate classes 1–5. SIRA matrix 2: the effectiveness of the recovery barriers combined with the severity of the most probable accident scenario (Negligible … Catastrophic), giving intermediate classes A–E.]
The first matrix contains the first two factors (frequency & avoidance barriers). Here the calculation is done per flight sector, but this aspect can naturally be customized. The barriers are like filters, through which a certain fraction of events passes. The task is to estimate whether that fraction is closer to 1/1000, 1/100, 1/10, or whether virtually everything passes.
The second matrix uses the same scale for the recovery barriers and then integrates the accident severity. Here the reference is again the most probable accident scenario. In its conceptual content, this second matrix is actually similar to the ERC matrix.
The results of the two matrices are fed to the final matrix, which gives the
tolerability of the risk.
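To illustrate the underlying arithmetic, here is a minimal sketch of a spreadsheet-style SIRA (the alternative to the intermediate matrices mentioned on a later slide); the per-sector tolerability targets and verdict thresholds below are assumptions for illustration, not ARMS-calibrated values:

```python
# Sketch of the SIRA logic in a spreadsheet-like numerical form. The per-sector
# tolerability targets and the verdict thresholds are illustrative assumptions.

def sira(hazard_freq_per_sector: float,
         avoidance_pass_fraction: float,
         recovery_pass_fraction: float,
         severity: str) -> tuple[float, str]:
    """Combine the four SIRA factors into an estimated accident frequency and
    a tolerability verdict. The barriers act as filters: a fraction of hazard
    occurrences passes the avoidance barriers (reaching the Undesirable Event),
    and a fraction of those passes the recovery barriers (ending in an accident).
    """
    accident_freq = (hazard_freq_per_sector
                     * avoidance_pass_fraction
                     * recovery_pass_fraction)

    # Assumed tolerable accident frequency per sector for each severity of the
    # most probable accident scenario.
    target = {"Catastrophic": 1e-9, "Major": 1e-7,
              "Minor": 1e-5, "Negligible": 1e-3}[severity]

    if accident_freq > 100 * target:
        verdict = "Stop / Improve"      # intolerable, risk reduction required
    elif accident_freq > target:
        verdict = "Secure / Monitor"    # keep under active control
    else:
        verdict = "Accept"
    return accident_freq, verdict

# Example: a hazard seen about once per 10,000 sectors, weak avoidance barriers
# (about 1 in 10 occurrences passes) and minimal recovery barriers (1 in 2),
# with a "Catastrophic" most probable accident scenario.
print(sira(1e-4, 0.1, 0.5, "Catastrophic"))   # clearly in the intolerable region
```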
Page 36
SIRA - Example
Safety Issue:
• Risk of runway overrun at any airport in the …
[The SIRA matrices from the previous slide, filled in for this Safety Issue]
This example illustrates using the SIRA for the Safety Issue described above. It cannot be stressed enough that the basis for a good assessment is a very precise definition of the Safety Issue. Such definitions allow the assessment to be based on facts rather than hazy assumptions. For example, when the applicable airports have been defined, one can work on actual runway length data.
Page 37
SIRA – Example (cont’d)
[Final tolerability matrix: rows 1–5 (from matrix 1) against columns A–E (from matrix 2), with the outcome bands Stop, Improve, Secure, Monitor and Accept]
Note:
• Another SIRA application uses Excel instead of the intermediate matrices.
The result on the final matrix would be in the red, meaning that this part of the operation needs to be stopped immediately – this risk cannot be tolerated at all.
This way of applying the SIRA with matrices is very visual, but it introduces some limitations in the range of values on the axes. For example, how to cover hazards more frequent than 10⁻⁴? Calibrating the tolerability can also become quite a demanding exercise and in some cases end up too conservative compared to JAR/FAR 25.1309.
Page 38
Conceptual difference between ERC and SIRA
[The same hazards → barriers → accidents diagram as on page 34. Individual historical events are assessed with the ERC (“How concerning was this event?”) at whatever point they reached along the chain, whereas a SIRA assessment of a Safety Issue spans the whole chain.]
This slide illustrates the conceptual difference between ERC and SIRA.
In ERC, the historical event may (or may not) have reached the level of the Undesirable Event, or escalated even further. Therefore, some barriers have typically already been breached; they are history and we do not care about them. What counts are the barriers that were still in place in the historical event. We assess the risk present there and then.
Page 39
[Diagram: Hazard Identification data → RA → Operational Risk Profile; Planned changes → RA → Associated Risk]
RA of Future Risks:
• Hazard Analysis: what could go wrong?
• Risk Assess identified threats as Safety Issues
Let’s come back for a while to the objectives for the Risk Assessment methodology.
So far we have seen that the ARMS methodology can digest Hazard Identification data and transform it into an Operational Risk Profile. This is done by plotting Safety Issues on a “risk map” using the SIRA values, and also through statistics using the ERC risk index values.
But what about the planned changes and the associated Future Risks?
Page 40
[The process chart from page 25 repeated; Future Risk assessments enter it from the bottom-left corner]
This type of Future Risk assessment starts in the bottom-left corner of the process chart.
The first step is to analyze the hazards associated with the change. There
are various systematic techniques for this. These are beyond the ARMS
process and are not discussed here.
Once the hazards have been identified, they can be formulated as Safety
Issues and fed into the same SIRA as used earlier.
Page 41
ARMS Methodology
Page 42
Safety Accountability and Safety Delivery
[Organization chart: Board of Directors and CEO at the top (ACCOUNTABILITY); the Safety Review Board (CEO, COO, Safety Mgr, Qty Mgr); below, the Postholders & Mgt team and local SAGs (DELIVERY), supported by the Safety Mgr. Side labels: safety transparency / assurance flowing upwards, safety risk management / analysis flowing downwards.]
The fundamental split is between the Top Management and the rest of the
organization – the former having the Safety Accountability and the latter
being responsible for the Safety Delivery.
The risk review and action management tasks at the different levels of the organization are handled through the Safety Review Board (SRB) and one or more Safety Action Groups (SAGs) – sometimes called Safety Committees.
The Safety Manager is not accountable for the Safety Performance, but
responsible for the Safety Management System itself.
Page 43
Roles and organization
Page 44
Roles and organization
Hazard Identification
Tools, methods
Risk Assessment
Expertise
Ensuring safety actions
SMS quality and evolution
Page 45
Conclusion
Page 46