Root Cause Analysis Half Day
Ali Zuashkiani
Director of Educational Programs
Centre for Maintenance Optimization
and Reliability Engineering
University of Toronto
CONTENT
[Diagram: Heat and Fuel as causes of Fire]
Introduction: Condition vs. Action
[Diagram: Heat (Action) + Fuel (Condition) → Fire]
Probably the Fuel was there for a while, then somebody added the Heat …
We can consider the Fuel as a Condition.
What about the fire triangle?
Consider the causes again:
[Diagram: Heat (Action) + Fuel (Condition) + Oxygen (Condition) → Fire]
We can have more than two causes for each effect, but at least one of them should be an action.
Introduction: Infinite Continuum
Now consider this cause-and-effect chain:
[Diagram: Vessel Explosion, caused by Relief Valve Stuck, caused by Corrosion Build-up, caused by High Moisture Content]
What is this? A cause?
What if we ask Why?
Now the Cause has become an Effect itself.
You can ask Why and go back along the cause-and-effect chain.
The limit is our knowledge (or our desire to go back).
Now consider this cause-and-effect chain:
What about this Effect?
What is caused by the vessel explosion?
This time the Effect has become the Cause.
[Diagram: Operator Injured, caused by Vessel Explosion, caused by Relief Valve Stuck, caused by Corrosion Build-up, caused by High Moisture Content]
Now consider this cause-and-effect chain:
[Diagram: Operator Injured, caused by Vessel Explosion, caused by Relief Valve Stuck, caused by Corrosion Build-up, caused by High Moisture Content]
What happened between this Cause and Effect?
Vessel Cracked
Introduction: Event Based Problems
So, when we speak about root cause analysis, we mean that there is a continuum of events that happened in the past and caused the problem.
Introduction:
Infinite Continuum
Introduction: Goal of RCA
Introduction: Different Methods
There are so many extensive and complicated RCA methods on the market. Why?
If we are looking only for solutions, why spend so much on causal analysis?
There is a robust answer:
➢ The more we understand the causes, the better we can control them.
Introduction: Effective Problem Solving
To have an effective organization, there is a
need to have effective problem solving abilities.
• Blaming
• Common sense: a single reality
• Management expectation of finding a single root cause
• Failure to ask enough Whys
• Failure to define the problem correctly
• Mistaking problem description for problem analysis
• Effect of past experiences
• Linear thinking
• Categorization
• Narration culture
Introduction: Blaming
Why is blaming so common in organizations?
• It is usually used to make an example of someone, discouraging other personnel from making the same or similar mistakes.
What is the real outcome?
• Problem likely to recur
• Bad feelings
• Hiding mistakes
• Prevents learning
• Personnel become unwilling to be involved in complex analyses
• Expanding gap between management and employees
• Increasing probability of disasters caused by multiple failures
The result will not be an Effective Organization
Don’t forget:
• Most problems occur unintentionally.
• We are all human, and humans make mistakes; we need to correct those mistakes.
Human Errors
(adapted from RCM II, John Moubray)
Human errors can be grouped into four categories:
• Anthropometric factors: Errors that occur because a person (or part of a person,
such as a hand or arm):
– simply cannot fit into the space available to do something
– cannot reach something
– is not strong enough to lift or move something
• Human sensory factors: Errors that occur because a person cannot see (field of
view, colour schemes), or cannot hear (background noise levels)
• Psychological factors
Failure Modes:
Human Error: Psychological
[Diagram: an Error is either an unintended action or an intended action]
• Unintended action: “Does the job wrong”
• Intended action: “Does the wrong job”
Unintended action:
• Slip (attention failure): doing incorrectly something I normally do correctly
• Lapse (memory failure): missing out a step in a planned sequence of events
Introduction: Single Root Cause
Preparation: Which Problems
• Availability?
• Quality
• OEE?
• Safety?
• Environment?
[Figure: Heinrich Pyramid (29 incidents, 300 near misses), with less RCA effort and detail toward the base]
Preparation:
Limited Time and Resources
So many problems, so little time and so few resources.
What can we do?
PRIORITIZATION
What common tools do we have to prioritize?
▪ Pareto Analysis
▪ Cost – Benefit Analysis
▪ Decision Matrix
▪ Jack Knife Diagram
▪ Hybrid Approach
Preparation: Pareto Analysis
When there is one main measure of importance, Pareto analysis is simple and effective:
• 80 / 20 Rule
• The significant few vs. the trivial many
[Chart: Pareto chart of Unavailability (0% to 100%) by Truck Register Number, ordered 5, 6, 4, 7, 2, 1, 3, 9, 10, 8]
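The Pareto ranking can be sketched in a few lines. This is only an illustration: the unavailability hours below are hypothetical (the chart gives the ranking of the trucks, not the values), and the function name `pareto` and the 80% cutoff parameter are assumptions made for the example.

```python
def pareto(losses, cutoff=0.8):
    """Rank items by loss and return those that cover `cutoff` of the total.

    `losses` maps an item (here a truck register number) to its loss
    (here hours of unavailability).
    """
    total = sum(losses.values())
    ranked = sorted(losses.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, loss in ranked:
        if cumulative >= cutoff * total:
            break  # the items already selected cover the cutoff share
        cumulative += loss
        selected.append(name)
    return selected, cumulative / total


# Hypothetical unavailability hours per truck register number (illustrative only).
hours = {5: 120, 6: 90, 4: 60, 7: 40, 2: 30, 1: 20, 3: 15, 9: 10, 10: 10, 8: 5}
significant_few, share = pareto(hours)
print(significant_few, round(share, 2))
```

With these made-up numbers, half of the trucks account for most of the unavailability, which is where RCA effort would be directed first.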
Preparation: Cost – Benefit Analysis
Preparation: Decision Matrix
Comparing the effort required to solve the problem with its impact in a 2D matrix:
[Matrix: Effort (least to most) against Impact / Benefit (least to most)]
problem weight = effort × impact
Preparation: Hybrid Approach
✓ Preparing 2D or 3D matrix
✓ Using Pareto or ABC Analysis to prioritize
• A: top 15%: Most Critical
• B: mid 20%: Moderately Critical
• C: last 65%: Least Critical
[Chart: Probability of Success (3.00 to 5.00) against Impact of the Problem (2.00 to 5.00), divided into zones C, B, A]
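A minimal sketch of the ABC split, assuming each problem has already been given a single score (for instance impact multiplied by probability of success, which is an assumption made here, not something the slide prescribes). The function name `abc_classes` and the problem names are hypothetical.

```python
def abc_classes(scores, a_share=0.15, b_share=0.20):
    """Split problems into A/B/C classes by the rank of their score.

    The top `a_share` of the ranked list is class A, the next `b_share`
    is class B, and everything else is class C.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    a_end = round(n * a_share)
    b_end = a_end + round(n * b_share)
    return {
        name: "A" if i < a_end else "B" if i < b_end else "C"
        for i, name in enumerate(ranked)
    }


# Twenty hypothetical problems, scored 1 to 20 (illustrative only).
scores = {f"P{i}": i for i in range(1, 21)}
classes = abc_classes(scores)
print(sorted(name for name, cls in classes.items() if cls == "A"))
```

With 20 problems this yields 3 in class A, 4 in class B and 13 in class C, matching the 15% / 20% / 65% split.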
Preparation:
Sporadic vs. Chronic Failures
Which one will receive greater attention from the organization?
Chronic: failure of a passenger seat adjustment mechanism of an aircraft
• On average this has caused a 1 minute flight delay per day during the last 10 years of the aircraft's life
• Per aircraft: more than 60 hours of delay in 10 years
Sporadic: collision of an airport truck with the empennage of a passenger jet
• This can happen to 1 out of 10 aircraft in their lives and requires on average 24 hours plus $1000 for repair
• Per aircraft: 2.4 hours of delay and $100 of repair cost on average
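The comparison is simple arithmetic, sketched below with the figures from the two examples. The variable names are illustrative; the only assumption is that the sporadic event is averaged over all aircraft (1 in 10 is affected), which is how the 2.4 hour figure arises and which gives an average repair cost of $100 per aircraft.

```python
# Chronic failure: 1 minute of delay per day, every day, for 10 years.
chronic_delay_hours = 1 * 365 * 10 / 60

# Sporadic failure: affects 1 aircraft out of 10 over its life,
# costing 24 hours of delay and $1000 of repair when it happens.
aircraft_per_event = 10
sporadic_delay_hours = 24 / aircraft_per_event
sporadic_repair_cost = 1000 / aircraft_per_event

print(round(chronic_delay_hours, 1), sporadic_delay_hours, sporadic_repair_cost)
```

The chronic failure costs each aircraft roughly 25 times more delay than the sporadic one, even though the collision is the event that attracts attention.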
Preparation:
Triggering Criteria / Threshold
• By which criteria should we decide for new problems?
[Chart: Effort (3.00 to 5.00) against Impact (2.00 to 5.00), divided into zones C, B, A]
Note that triggering criteria can be defined for different aspects or consequences (related to the different goals and policies of the organization):
• Safety
• Environment
• Production
• Critical assets
• Financial
• Non-compliance with …
• Or a weighted average of all of the above
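The weighted-average option in the list above could look like the sketch below. The weights, the 1 to 5 rating scale, the threshold and the function name `trigger_score` are all hypothetical; each organization would set its own.

```python
def trigger_score(ratings, weights):
    """Weighted average of consequence ratings (all ratings on the same scale)."""
    total_weight = sum(weights.values())
    return sum(ratings[k] * weights[k] for k in weights) / total_weight


# Hypothetical weights and threshold (assumptions for illustration only).
WEIGHTS = {"safety": 4, "environment": 3, "production": 2, "financial": 1}
THRESHOLD = 3.0

# Hypothetical ratings of one new problem on a 1 to 5 scale.
ratings = {"safety": 5, "environment": 1, "production": 3, "financial": 2}
score = trigger_score(ratings, WEIGHTS)
needs_rca = score >= THRESHOLD
print(score, needs_rca)
```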
Preparation
➢ When a triggering threshold is found to have been reached, preparatory actions should be taken to support the effort before the analysis starts.
• Actions required (if any) to preserve the data:
Photos, videos, etc.
Coordination with repair/maintenance personnel if we suspect that any particular part or piece of evidence should be preserved.
Preparation: Operating Context
The operating context document should include:
• The preliminary definition of the problem, which will probably be revised during the analysis.
• Interviews with those who may have information about different aspects of the problem, including experts from inside or outside the organization.
• All the information initially collected
• Operating context document should be
concise and complete.
• Operating context should include a brief
description and reference to relevant
information with all required drawings/maps
etc.
Preparation:
Operating Context References
The operating context may contain statements about, and/or references to, relevant information or documents:
• Event timelines
• Performed response actions
• Equipment / system / plant design specification (vs. real operating conditions)
• Any recent changes prior to event (of any type)
• Related processes documents (production, maintenance, quality, change, procurement,
training, etc.)
• Relevant flowcharts
• Schematic drawings of equipment / system / plant
• Process & Instrumentation Diagrams (P&IDs)
• Environmental conditions (temperature, pressure, humidity, light, noise, etc.)
• Photographs and / or videos (taken soon after the event)
• Equipment conditions and indications before / during / after incident and in
comparison to normal operations
• Logs and reports
• Shift change information
• Human factors (time: day / night / after sleep / meal / days off; incorrect guidance / lack of knowledge; complacency; imprecise communication; distraction; fatigue; work stress; high workload)
• Any other information about human intervention in the process prior to the incident
• Observations and interview documents
• OEM recommendations
• Historical records, including similar events (and their differences)
• Records and documents of any recent changes
• Patterns and trends
• Third party reports
• Legal / organizational regulations/ requirements
Data Preservation
• It is likely that some data or evidence will be lost in the period between the event and the official start of the investigation.
Team members
A natural team for a typical industrial case:
• Operation supervisor
• Experienced Operator
• Maintenance supervisor
• Experienced technician
• Facilitator
• Expert
• Any person who has information, is affected by the problem, or can help to solve it should be engaged in some way.
• The team should participate in defining the problem, developing the causal network, identifying solutions, and implementing them.
• The RCA team leader should be selected from the department or section where the problem originated.
• Usually the best ideas come from people who are experts in the subject but were not involved in the problem itself.
• Be cautious about people who keep pushing the problem onto other fields.
Problem Definition:
Problem Definition Importance
➢ What will happen if the problem is not well defined?
Problem Definition: 5W + H
The questions that usually arise in any problem investigation:
• Who
• What
• When
• Where
• Why
• How
The only "who" questions: who knows more; who has the information.
Hold this question for now; it will be dealt with in future steps.
Importance: Safety, Environment, Cost, Revenue
Problem Definition:
What: Primary Effect
➢ In the causal network, there are usually some effects that are more important: the ones that catch attention.
➢ Any effect we want to prevent from recurring can be the primary effect of our problem.
➢ This is what we want to solve.
[Diagram: Operator Injured, caused by Vessel Explosion, caused by Relief Valve Stuck, caused by Corrosion Build-up, caused by High Moisture Content]
Problem Definition: Primary Effect
What if we see the problem like this?
[Diagram: Vessel Explosion, caused by Relief Valve Stuck (itself caused by Corrosion Build-up, caused by High Moisture Content) and by Vessel Material Strength, branches into the consequences $10,000 Damage to Asset, Operator Injured, and Lost Production]
✓ Sometimes a practical way to find the primary effect is to look for the point where it branches into secondary effects (consequences).
✓ These secondary effects are usually where the problem conflicts with the goals of the organization; in other words, where desired functions fail and the problem becomes important.
[Diagram: causal network around Vessel Explosion, with the consequences $10,000 Damage to Asset, Operator Injured, and Lost Production]
How do different people see the desired functions, and what are they?
• Operation Supervisor?
• Maintenance Supervisor?
• HSE Officer?
[Diagram: causal network around Vessel Explosion, with the consequences $10,000 Damage to Asset, Operator Injured, and Lost Production]
Now what?
✓ Agreeing on the primary effect enables the team to concentrate on solving one issue at a time.
✓ If consensus is not reached on the primary effect, start with one of the most widely accepted primary effects; the other parts of this causal puzzle will be completed later.
Primary Effect Examples; Use Noun-Verb Format.
When
• When was the primary effect observed?
• Clock, date
• Continuous, intermittent, during running time
• Relative to other events
• Etc.
• When Examples:
After intervention
Start of the night shift
After a routine test
Jan 31st, 1980
During earthquake
etc.
Where
• Where was the primary effect observed?
• Geographically
• Physically on the asset
• Relative to other assets
• Facility; System; Component
• Etc.
• Where examples:
• Smelter Furnace # 2
• Centrifugal pump #1 outlet
• Highly polluted area
• Close to gas station
Importance
• Importance of the primary effect should
address the following issues:
• Safety
• Environment
• Production
• Cost
• Frequency
• Reputation
Estimate Return On Investment
Determining the “Importance” should make these items clear:
• Why is the problem under consideration?
[Diagram: causal network around Vessel Explosion, with the consequences $10,000 Damage to Asset, Operator Injured, and Lost Production]
Safety: No impact / Serious potential
Environmental: No impact
Revenue: $10,000 lost (production rate reduced for 4 hours: 1250 kg/hr at 2 $/kg)
Cost: Spare parts $3000 / Labor $1000
Frequency: once in 2014, 3 times in 2013
---------------------------------------------------------------------------------------------------------------
Safety: No impact / Serious potential
Customer service: No impact
Production: $20,000 lost
Cost: $5000 (Materials $4500 / Labor $500)
Frequency: 2 times in 2014 and 2 times in 2012
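The revenue figure in the first example above can be checked with a short calculation. It assumes that 1250 kg/hr is the drop in production rate during the 4 affected hours; the variable names are illustrative.

```python
# Figures taken from the first example above.
hours_affected = 4
rate_reduction_kg_per_hr = 1250  # assumed to be the drop in production rate
price_per_kg = 2                 # $/kg

lost_revenue = hours_affected * rate_reduction_kg_per_hr * price_per_kg
repair_cost = 3000 + 1000        # spare parts + labor
cost_per_occurrence = lost_revenue + repair_cost
print(lost_revenue, cost_per_occurrence)
```

Multiplying the per-occurrence figure by the recorded frequency gives a rough basis for estimating the return on investment of a solution.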
A Complete Problem Definition
What: Broken leg
When: Feb 20, 2012 at 6:30 PM, while driving an unfamiliar route after sunset
Where: Route No. 23, West Town; Delivery Service
Significance:
  Safety: Lost-time injury; broken right leg
  Environmental: No impact
  Revenue: No impact
  Cost: $2500 (vehicle), $10000 (medical)
  Frequency: 1st lost-time / 2nd recordable this year; 10% of customers complained of late deliveries recently.
Causal Analysis
Cause and Effects Principle
A review of the principles:
• Causes and effects are the same thing; only our point of view makes them different
• For each effect there are at least two causes: an action and a condition
• Each effect exists only if its causes exist at the same point in time and space
How the causes can grow:
[Diagram: a tree in which every Effect branches into at least two further Effects each time we ask Why?]
Minimum number of causes at each step: 2, 4, 8, 16, 32, … ∞
The Number of Causes Increases When We Ask Why
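The growth follows directly from the rule that every effect has at least two causes: each time we ask Why, the minimum number of causes doubles. A small sketch (the function name is illustrative):

```python
def minimum_causes(levels):
    """Minimum number of causes after asking Why `levels` times,
    given that each effect has at least two causes."""
    return [2 ** n for n in range(1, levels + 1)]


print(minimum_causes(5))  # → [2, 4, 8, 16, 32]
```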
Where to start
• Start by connecting all the easy or known parts; then you can look for the harder, remaining parts of the puzzle.
Adding more Causes
Always Causes Between The Causes
[Diagram: Vessel Explosion caused by Relief Valve Stuck, then expanded with the intermediate cause Vessel Cracked and the additional cause Vessel Material Strength]
[Diagram: expanded causal chart with the nodes Vessel Explosion, Vessel Cracked, Pressure Increased, Relief Valve Stuck, Relief Valve Exists, Vessel Material Strength, and Vessel Exists]
Action and condition causes
How to look for actions and conditions:
- First look for the cause only!
- Depending on whether it is an action or a condition
cause, look for the other one.
- Actions often end in “ed”.
- Conditions are often noun phrases with an unstated “existed”
[Diagram: Action + Condition → Effect]
➢ Dividing causes into actions and conditions is a tool, not a goal!
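One way to use the action/condition split as a checking tool is sketched below. The `Cause` class, its `kind` field and the `missing_kinds` helper are hypothetical names introduced for illustration, and labelling "Relief Valve Stuck" as an action is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    name: str
    kind: str  # "action", "condition", or "effect" for the primary effect
    causes: list = field(default_factory=list)


def missing_kinds(effect):
    """Return which kinds of cause ('action', 'condition') the effect still lacks."""
    present = {c.kind for c in effect.causes}
    return sorted({"action", "condition"} - present)


fire = Cause("Fire", "effect", [
    Cause("Heat", "action"),
    Cause("Fuel", "condition"),
    Cause("Oxygen", "condition"),
])
explosion = Cause("Vessel Explosion", "effect", [
    Cause("Relief Valve Stuck", "action"),  # assumed classification
])

print(missing_kinds(fire))       # → []
print(missing_kinds(explosion))  # → ['condition']
```

A non-empty result is a prompt to keep asking, not a verdict: it points to the kind of cause the team has not yet looked for.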
Causal Network
Steps to develop a causal network:
Adding Evidences
[Diagram: Smoke, caused by Fire, caused by Sparks; attached evidence: smelled smoke, observed flames, observed ashes]
Sensed Evidence
• Directly sensed (sight, sound, taste, touch, smell, etc.)
[Diagram: High Exhaust Gas Temp (EGT) with a “caused by” link]
Inferred Evidence
• Repeatable causal relationships (should be verified)
Evidences
• It is possible to look for evidence outside the RCA team and its meetings.
✓ Arrange interviews to get sensed and inferred information
• People working in the area of the problem
• Technical specialists
• Technical publications
✓ Make a list of the information / evidence to search for, with the names of the people responsible for the research and the time available for each task
More on Evidences
• Remaining “or”s imply causes that are not completely validated by evidence; we prefer not to have them, but sometimes that is not so easy
• When no evidence is available, use “?”
• Such a cause is not discarded, but the lack of evidence will be taken into account when evaluating potential causes and deciding where to focus our efforts
Other Elements in a Causal Chart
[Diagram: a Primary Effect with Action Causes and Conditional Causes, each annotated with evidence, a question mark, or Stop]
• The use of the question mark
• Evidence 1, Evidence 2
• Reasons for stop:
  • Out of control
  • No more knowledge
  • Desired conditions
  • New problem
  • Other cause paths more productive
Developing a chart to depict causal relations
Cause & Effect (Time & Space)
Each effect exists only if its causes exist at the same point in time and space;
considering this principle, causal relations can be checked by asking:
• For each cause: if the cause is removed, would the effect still remain?
If so, that cause does not belong to the set, or at least there are other facts and/or relations in place.
• For each effect: are all of the proposed causes always enough to make it happen?
If not, there are other causes waiting to be discovered.
Links Between Causes and Effects
and or
Expanding Causal Network
• Continue each causal path from the primary effect to the right
• Until reaching a point of no information, a point that is clearly outside the sphere of control, or a desired condition
• Start again from the primary effect to follow another path.
• Continue this until all possible causes are identified
• Ask for causes between causes (if appropriate), more causes for each effect, and unknown causes on the right.
• In a few iterations most of the participants' knowledge will be on the chart.
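The expansion loop described above can be supported by a small check that lists the causal paths still lacking a reason to stop. This is a sketch under assumptions: the chart is stored as a plain dictionary, the names `STOP_REASONS` and `open_paths` are hypothetical, the stop reason given to "High Moisture Content" is invented for the example, and cycles in the chart are not handled.

```python
STOP_REASONS = {
    "out of control",
    "no more knowledge",
    "desired condition",
    "new problem",
    "other paths more productive",
}


def open_paths(chart, effect, path=()):
    """Yield causal paths that end without a recorded stop reason.

    `chart` maps an effect to {"causes": [...], "stop": reason}; both keys
    are optional, and an effect missing from the chart has no causes yet.
    """
    path = path + (effect,)
    node = chart.get(effect, {})
    causes = node.get("causes", [])
    if not causes:
        if node.get("stop") not in STOP_REASONS:
            yield path  # end of a path that still needs a Why or a stop reason
        return
    for cause in causes:
        yield from open_paths(chart, cause, path)


chart = {
    "Vessel Explosion": {"causes": ["Vessel Cracked"]},
    "Vessel Cracked": {"causes": ["Relief Valve Stuck", "Vessel Material Strength"]},
    "Relief Valve Stuck": {"causes": ["Corrosion Build-up"]},
    "Corrosion Build-up": {"causes": ["High Moisture Content"]},
    "High Moisture Content": {"stop": "no more knowledge"},  # hypothetical stop reason
}
remaining = list(open_paths(chart, "Vessel Explosion"))
print(remaining)
```

Here only the path ending at "Vessel Material Strength" is reported, so that is where the team would ask Why next.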
Yellow Stickers
What is the right way to build an aircraft?
Discover Important Causal Paths
Look for what is different and ask why it is different:
(5W+H questions about the differences)
Brainstorming
A few reminders:
• While generating ideas, collect as much information as possible without criticism or judgment (all ideas are equally valid at this stage).
• Be creative; all ideas are welcome, even the silly ones.
• No secondary discussion about the ideas; everyone should understand that all ideas will be discussed later.
• Build on other ideas.
• Write all ideas where everyone can easily see them.
• Remind the team that some of the ideas recorded during brainstorming may not appear in the final report, so ask them to move quickly.
• Set a time limit for brainstorming.
Effective Solutions
Identification
Solutions?
What actions can be taken upon a cause?
• Remove
• Change
• Control
Adding Solutions
[Diagram: causal chart for the primary effect Person Fall; each action cause and conditional cause (one is labelled “Handle stopped fall”) carries its evidence and a list of candidate solutions (Solution 1, Solution 2, …)]
Be cautious when you hear:
• We tried that before.
• Wrong!
• It will never work here.
• We have too many other important tasks.
• No one will buy it!
• We will just be extra careful in the future.
• That’s not our strategy.
• It isn’t in the budget.
• Good thought, unfortunately not practical.
• Top management will never agree.
• Which standard says to do it that way?
• We’ve always done it that way.
• Good idea, I’ll get back to you (and never does).
Encourage Creativity
• Escape the boundaries of the conventional / logical belief system.
• Set up quick brainstorming sessions.
• Look for creative connections between ideas (when people laugh).
• Improve group synergy; build on other ideas.
• Discard the notion of a “right answer” and genuinely appreciate different viewpoints and ideas from all participants.
• Ask for critics’ viewpoints.
• Get your subconscious mind engaged.
What is the solution?
[Diagram: Heat, Fuel, and Oxygen as causes of Fire]
There are two more important factors affecting effective
solutions being proposed:
• Organizational culture
Solutions with Maintenance Nature
- Time Based Maintenance
- Condition Based Maintenance
- Failure Finding
- Run to Failure
Solution Criteria
• Prevent reoccurrence.
• Within Control
– RCA Team Selection Issue
– How effective are solutions when you give them to someone else, or tell someone else to implement them?
– Is that a significant problem in your organization?
• Lack of involvement, investment of effort, and “buy-in”
– “It is not my problem…”
• May not fully understand circumstances, or
consequences of problem situation.
• Meets Goals & Objectives
– Does not cause unacceptable problems.
• i.e. “Unintended Consequences” may be foreseeable!
– Prevents similar occurrences
• e.g. at different locations
– Provides reasonable value for its cost / effort /
resources expended.
• Value can be non-financial e.g. strategic
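The three solution criteria above (prevents reoccurrence, within control, meets goals and objectives) can be applied as a simple screening checklist. The following is a minimal sketch, not part of the original course material: the class name, the function name, and the second candidate solution are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateSolution:
    description: str
    prevents_recurrence: bool  # criterion 1: stops the problem from happening again
    within_control: bool       # criterion 2: the RCA team can implement it itself
    meets_goals: bool          # criterion 3: no unacceptable side effects, reasonable value


def failed_criteria(solution):
    """Return the names of the criteria that the solution does not satisfy."""
    checks = {
        "prevents recurrence": solution.prevents_recurrence,
        "within control": solution.within_control,
        "meets goals and objectives": solution.meets_goals,
    }
    return [name for name, ok in checks.items() if not ok]


candidates = [
    # Judgments below are illustrative assumptions, not results from the course.
    CandidateSolution("Revise lockout procedure in manual", True, True, True),
    CandidateSolution("Ask another department to retrain operators", True, False, True),
]

# A solution is kept only if it passes every criterion.
accepted = [c.description for c in candidates if not failed_criteria(c)]
```

Listing the failed criteria, rather than returning a single yes/no, keeps the discussion focused on why a proposed solution was rejected.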
Solutions
Solutions
Implementation & Tracking
Effective Implementation and Tracking
An effective report should:
• Communicate Findings
Problem definition, summary of causal relationships, solutions
• Establish credibility and a basis for obtaining funding
Summary of potential achievements, costs and feasibility
• Provide a visual dialogue
Attached documents including causal chart (causes, evidence, solutions)
• Ensure success
Action plan
• Provide learning for others
Accessibility to other personnel
Action Plan
Causes | Corrective Actions | Name | Due Date
Not believed in cautions in manual | Send out safety bulletin about requirements to follow safety cautions | Fred Barringer | Completed
Not believed in cautions in manual | Brief all the operators in conference room | Jack Goldberg | Completed
Lockout procedures not updated | Revise lockout procedure in manual | Robert Vesely | March 14th, 2013
Lockout procedures not updated | Perform separate RCA on why the procedure was not updated for 10 years | John Clark | March 1st, 2013
Circuit not locked out | Include lockout steps in revised manual | Paul Geitner | March 9th, 2013
Responsibilities
• Implementation and tracking can be the responsibility of
the team leader or of whichever team member is
assigned to the solution(s)
Track and Audit
What to look for:
✓ Solutions Effectiveness
✓ Implementation Efficiency
• Problem Reoccurrence
• Return on Investment
• Achieved Goals
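Two of the audit items above can be expressed as simple calculations. The sketch below uses the standard return-on-investment ratio, (benefit - cost) / cost, and a plain count of repeat events; the function name and all figures are hypothetical and serve only to illustrate the arithmetic.

```python
def audit_solution(benefit, cost, recurrences_since_fix):
    """Summarize one implemented solution for a track-and-audit review.

    benefit and cost are in the same currency; recurrences_since_fix counts
    how often the original problem has reappeared since implementation.
    """
    if cost <= 0:
        raise ValueError("cost must be positive to compute a return on investment")
    return {
        "roi": (benefit - cost) / cost,           # 1.5 means a 150% return
        "recurred": recurrences_since_fix > 0,    # any repeat event flags the solution
    }


# Hypothetical figures: 50,000 in avoided losses for a 20,000 implementation cost.
result = audit_solution(benefit=50_000, cost=20_000, recurrences_since_fix=0)
```

As noted under the solution criteria, value can also be non-financial, so a ratio like this covers only part of the audit.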
Implementation Notes
• Many root cause analyses fail at this stage
• Who is in charge?
• Implementation challenges
• Follow-ups/tracking
• Documentation of implementation
• A review of the results for other equipment or
systems
A Note on Available Resources
• Implementing solutions and tracking their results takes
resources, and every RCA generates jobs to follow up. It is
therefore essential to keep the number of RCAs, and as a result
the number of jobs, within the resource limits of each
department. This is why prioritization and correctly set
thresholds are needed.
• As a rule of thumb:
• each team is expected to complete 6-15 RCAs in a year,
depending on its experience and the size of the problems
• an RCA is expected to reach the implementation phase within
10-15 days at most, in most cases
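The rule of thumb above gives a quick capacity check when setting trigger thresholds. The following is a small sketch of that arithmetic only; the function names and the example of three teams are assumptions made for illustration.

```python
# Rule-of-thumb range from the slide: 6 to 15 RCAs per team per year.
RCAS_PER_TEAM_PER_YEAR = (6, 15)


def yearly_rca_capacity(num_teams):
    """Return the (low, high) number of RCAs that num_teams can complete in a year."""
    low, high = RCAS_PER_TEAM_PER_YEAR
    return num_teams * low, num_teams * high


def within_capacity(expected_rcas, num_teams):
    """True if the expected number of triggered RCAs fits the optimistic capacity."""
    _, high = yearly_rca_capacity(num_teams)
    return expected_rcas <= high


# Hypothetical department with three RCA teams.
low, high = yearly_rca_capacity(3)
```

If the current thresholds are expected to trigger more RCAs than the upper bound, the thresholds need to be raised or the problems prioritized more strictly.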
Tracking is mostly about comparing progress against the
original implementation plan
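Comparing progress against the plan can be as simple as listing the open actions whose due date has passed. The sketch below reuses the three dated items from the action plan shown earlier; the `ActionItem` class, the `overdue_items` function, the review date, and the assumption that none of the three items is complete yet are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    action: str
    owner: str
    due: date
    completed_on: Optional[date] = None  # None means the action is still open


def overdue_items(items, today):
    """Return the open items whose due date is earlier than the review date."""
    return [i for i in items if i.completed_on is None and i.due < today]


plan = [
    ActionItem("Revise lockout procedure in manual", "Robert Vesely", date(2013, 3, 14)),
    ActionItem("Perform separate RCA on why the procedure was not updated",
               "John Clark", date(2013, 3, 1)),
    ActionItem("Include lockout steps in revised manual", "Paul Geitner", date(2013, 3, 9)),
]

review_date = date(2013, 3, 10)  # hypothetical review date
late_owners = [item.owner for item in overdue_items(plan, review_date)]
```

Running the same check at every review meeting gives the person in charge a short, concrete list of items to follow up.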
Food for thought
• What is the aim of Root Cause Analysis?
• What are the main steps to an effective RCA?
• How much easier and more effective would an RCA be if more
information were provided by a knowledgeable and motivated
team?
• Which solution do we call effective?
• Being structured and verified (by evidence) makes an RCA
more objective.
• Why are objective, team-based solutions better solutions and
easier to implement?
• Analysis is not enough! One needs to execute!