Root Cause Analysis Half Day


Root Cause Analysis

Ali Zuashkiani
Director of Educational Programs
Centre for Maintenance Optimization
and Reliability Engineering
University of Toronto

CONTENT

• Introduction to Root Cause Analysis and Principles
• Cause and Effect Analysis
• Effective Solutions Identification
• Solutions Implementation and Tracking
• Conclusion
INTRODUCTION
Think:
What is the problem?
What is the root cause?
What should we do?
What is Root Cause Analysis?

Ichihara Refinery, 2011, Japan


Introduction: Cause and Effect

Think: Root Cause Analysis:
A caused B, B caused C, C caused D, and so on….
A is the root cause: remove it and everything will be just fine. Or will it?
Introduction: Domino Model
What do you think about the Domino model?
Now apply it to a fire event….
Chart: Fire ← Heat ← Spark (each item is caused by the one to its right)
Did the spark cause the heat, or vice versa?
What if we do not see them as linear?
Chart: Fire, caused by both Heat and Fuel

Introduction: Condition vs. Action
Chart: Fire, caused by Heat (Action) and Fuel (Condition)
Probably the Fuel was there for a while, then somebody added the Heat…
We can consider the Fuel as a Condition and the Heat as an Action.
For any Effect there are at least two Causes:
one a Condition Cause and the other an Action Cause.

Introduction: Condition vs. Action
What about the fire triangle?
Consider the causes again:
Chart: Fire, caused by Heat (Action), Fuel (Condition) and Oxygen (Condition)
We can have more than two causes for each effect, but at least one of them should be an action.
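The rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: `Cause`, `is_well_formed` and `effect_occurs` are hypothetical names, and each cause is reduced to a type (action or condition) plus a flag saying whether it exists at the same time and place as the effect.

```python
from dataclasses import dataclass


@dataclass
class Cause:
    name: str
    kind: str      # "action" or "condition" (hypothetical labels)
    present: bool  # exists at the same time and place as the effect?


def is_well_formed(causes):
    """At least two causes, including at least one action and one condition."""
    kinds = {c.kind for c in causes}
    return len(causes) >= 2 and {"action", "condition"} <= kinds


def effect_occurs(causes):
    """The effect happens only if all of its causes are present together."""
    return all(c.present for c in causes)


fire = [
    Cause("Heat", "action", True),
    Cause("Fuel", "condition", True),
    Cause("Oxygen", "condition", True),
]
print(is_well_formed(fire), effect_occurs(fire))  # True True

# Controlling any single cause (here: removing the oxygen) prevents the fire.
fire[2].present = False
print(effect_occurs(fire))  # False
```

Because `effect_occurs` needs every cause at once, removing or controlling any one of them is enough to stop the effect.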

Introduction: Infinite Continuum
Now consider this cause and effect chain:
Chart: Vessel Explosion ← Relief Valve Stuck ← Corrosion Build-up ← High Moisture Content (each item is caused by the one to its right)
What is this? A cause?
What if we ask Why?
Now the Cause has become an Effect itself.
You can ask Why and go back in the cause and effect chain.
The limit is our knowledge (or our desire to go back).
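Asking "Why?" repeatedly can be sketched as walking back through a table of cause-and-effect links. This is a simplified, linear illustration with one cause per effect (the domino view, which these slides show is a simplification); the `caused_by` table and the `ask_why` function are hypothetical.

```python
# Hypothetical "effect -> cause" links taken from the vessel example.
caused_by = {
    "Vessel Explosion": "Relief Valve Stuck",
    "Relief Valve Stuck": "Corrosion Build-up",
    "Corrosion Build-up": "High Moisture Content",
}


def ask_why(effect, links, max_whys=10):
    """Repeatedly ask 'Why?' starting from `effect`.

    Stops when no cause is known (the limit of our knowledge)
    or after `max_whys` questions (the limit of our desired effort).
    """
    chain = [effect]
    while len(chain) - 1 < max_whys and chain[-1] in links:
        chain.append(links[chain[-1]])
    return chain


print(" <- ".join(ask_why("Vessel Explosion", caused_by)))
# Vessel Explosion <- Relief Valve Stuck <- Corrosion Build-up <- High Moisture Content
print(ask_why("Vessel Explosion", caused_by, max_whys=1))
# ['Vessel Explosion', 'Relief Valve Stuck']
```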

Introduction: Infinite Continuum
Now consider this cause and effect chain:
What about this Effect?
What is caused by the vessel explosion?
This time the Effect has become the Cause.
Chart: Operator Injured ← Vessel Explosion ← Relief Valve Stuck ← Corrosion Build-up ← High Moisture Content

Introduction: Infinite Continuum
Now consider this cause and effect chain:
Causes and effects form an infinite continuum.
The limit is our knowledge or desired level of effort.
Chart: Operator Injured ← Vessel Explosion ← Relief Valve Stuck ← Corrosion Build-up ← High Moisture Content
What happened between this Cause and Effect? Vessel Cracked (between Relief Valve Stuck and Vessel Explosion)

Introduction: Event Based Problems
So, when we speak about root cause analysis, we mean a continuum of events that happened in the past and caused the problem.

So, we are tackling Event-Based Problems.

Note: The majority of our daily problems are event based.

Introduction: Event Based Problems

Event-based problems are solved by eliminating, changing, or controlling at least one of their causes.

Introduction: Goal of RCA

However, the goal of Root Cause Analysis is not finding the root cause (which never exists!).


➢ The goal of Root Cause Analysis is to control the causes, in an effective and worthwhile way, to prevent the problem from recurring.
➢ What we are looking for is the solution(s).

Introduction: Different Methods
There are so many extensive and complicated RCA methods on the market.
Why?
If we are looking only for solutions, why spend so much on causal analysis?
There is a robust answer:
➢ The more we understand the causes, the better we can control them.
Introduction: Effective Problem Solving
An effective organization needs effective problem-solving abilities.

Sugarland Explosion, 2008, USA


Introduction:
Barriers to Effective Problem Solving
Some common barriers to effective problem solving:

• Blaming
• Common sense: a single reality
• Management expectation of finding a single root cause
• Failure to ask enough Whys
• Failure to define the problem correctly
• Mistaking problem description for problem analysis
• Effect of past experiences
• Linear thinking
• Categorization
• Narration culture

Introduction: Blaming

Blaming culture is a great enemy of Effective Problem Solving

• Blaming culture prevents people from participating and providing information
• Blaming results in causes (like human error) with no clear solutions
• Blaming focuses on “who” instead of going deep enough with “why”

Introduction: Blaming
Why is blaming so common in organizations?
• It is usually used to make an example of someone, discouraging other personnel from making the same or similar mistakes.
What is the real outcome?
• Problem likely to recur
• Bad feelings
• Hiding mistakes
• Prevents learning
• Personnel become unwilling to take part in complex analyses
• Expanding gap between management and employees
• Increasing probability of disasters caused by multiple failures
The result will not be an Effective Organization.
Introduction: Blaming

Don’t forget:
• Most problems occur unintentionally.
• We are all human; humans make mistakes, and we need to correct those mistakes.

➢ Remind the team of this at the beginning of every single RCA session,
➢ so we can reduce the blaming attitude and encourage participation.

Human Errors
(adapted from RCM II, John Moubray)
Human errors can be grouped into four categories:

• Anthropometric factors: Errors that occur because a person (or part of a person,
such as a hand or arm):
– simply cannot fit into the space available to do something
– cannot reach something
– is not strong enough to lift or move something

• Human sensory factors: Errors that occur because a person cannot see (field of
view, colour schemes), or cannot hear (background noise levels)

• Physiological factors: Errors that occur because of environmental stresses which reduce human performance (temperature, vibration, tiredness, humidity)

• Psychological factors
Failure Modes:
Human Error: Psychological

ERROR
• UNINTENDED ACTION: “Does the job wrong”
• INTENDED ACTION: “Does the wrong job”
Failure Modes:
Human Error: Psychological
ERROR
• UNINTENDED ACTION
– SLIP (attention failures): Do something incorrectly that I normally do correctly
– LAPSE (memory failures): Miss out a step in a planned sequence of events
• INTENDED ACTION
– MISTAKE
   Rule-based mistakes: Follow the rules, but the rules are inappropriate or wrong
   Knowledge-based mistakes: React inappropriately to a new situation (where no rules exist)
– VIOLATION
   Routine violations
   Exceptional violations
   Acts of sabotage
Introduction: Single Reality
What do you see here?
A single reality?

➢ Even identical twins see the world differently.
➢ Add different lifestyles, experiences, education, emotions and motivations to humans’ basic differences.
➢ Let’s understand and appreciate this, and take advantage of all perspectives and experiences.

Introduction: Single Root Cause

You can see the symptoms above the surface: “The weed”.
The causes of the problem are below the surface: “The root”.

➢ The root is not a single cause; it is a system containing a network of interconnected causes.
Introduction: Effective RCA Steps
Effective Root Cause Analysis should cover the following steps:

➢ Preparation and Problem Definition
Prepare for analysis sessions and investigations. Reach agreement on the problem definition and its importance.
➢ Causal Analysis
Identify the causes, develop a chart, look for evidence, and develop a causal reality accepted by all parties.
➢ Identification of Effective Solutions
Challenge the causes to find suitable solutions. Determine the best solutions to solve the problem.
➢ Implementation and Tracking of Solutions
Plan the selected solutions, implement them, and follow them up.
Preparation
&
Problem Definition

Preparation: Which Problems

• Which problems are important for us to solve?
• Utilization?
• Availability?
• Quality?
• OEE?
• Safety?
• Environment?
Preparation: Which Problems

Heinrich Pyramid: 1 accident, 29 incidents, 300 near misses
• RCA effort and level of detail: more for accidents, less for near misses
Preparation:
Limited Time and Resources
So many problems, such limited time and resources:
what can we do?
PRIORITIZATION
What common tools do we have to prioritize?
▪ Pareto Analysis
▪ Cost – Benefit Analysis
▪ Decision Matrix
▪ Jack Knife Diagram
▪ Hybrid Approach
Preparation: Pareto Analysis
When there is one main measure of importance, Pareto analysis is simple and effective:
• 80/20 Rule
• The significant few vs. the worthless many
Chart: Pareto chart of unavailability (y-axis, 0% to 100%) by truck register number (x-axis, in order 5, 6, 4, 7, 2, 1, 3, 9, 10, 8)
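A minimal Pareto sketch in Python. The unavailability hours below are invented for illustration (the chart's actual values are not given); they were chosen only so that the ranking matches the truck order on the chart. The 80% cutoff follows the 80/20 rule, and `pareto` is a hypothetical helper name.

```python
# Hypothetical unavailability hours per truck (register number -> hours).
unavailability = {1: 7, 2: 10, 3: 5, 4: 20, 5: 200, 6: 100, 7: 12, 8: 1, 9: 3, 10: 2}


def pareto(values, cutoff=0.80):
    """Rank items by value (largest first) and return:
    - the ranked list of (item, cumulative share of the total), and
    - the 'significant few': the leading items needed to reach the cutoff."""
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    cumulative, running = [], 0
    for item, value in ranked:
        running += value
        cumulative.append((item, running / total))
    significant_few = []
    for item, share in cumulative:
        significant_few.append(item)
        if share >= cutoff:
            break
    return cumulative, significant_few


cumulative, few = pareto(unavailability)
print([item for item, _ in cumulative])  # [5, 6, 4, 7, 2, 1, 3, 9, 10, 8]
print(few)                               # [5, 6]
```

With these made-up numbers, 2 of the 10 trucks account for about 83% of the unavailability, so they would be analyzed first.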

Preparation: Cost – Benefit Analysis

Estimated relative cost/benefit should be considered:
• The order of magnitude is enough
• No need for accounting-level accuracy

Preparation: Decision Matrix
Comparing the effort required to solve the problem with its impact in a 2D matrix.
Each problem is assigned a number depending on its location on the matrix, based on:
• Impact / Benefit
• Effort / Cost / Likelihood of success

Effort score (rows: 3 = least effort, 1 = most effort) vs. impact score (columns: 1 = least impact, 3 = most impact):

Effort \ Impact     1   2   3
3 (least effort)    3   6   9
2                   2   4   6
1 (most effort)     1   2   3

problem weight = effort × impact
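The weighting rule can be sketched as below, assuming scores of 1 to 3 on each axis, with effort scored so that 3 means least effort, as in the matrix. The problem names and their scores are hypothetical.

```python
def problem_weight(effort_score, impact_score):
    """problem weight = effort x impact, each scored 1..3.

    Effort: 3 = least effort, 1 = most effort.
    Impact: 3 = most impact, 1 = least impact.
    """
    for score in (effort_score, impact_score):
        if score not in (1, 2, 3):
            raise ValueError("scores must be 1, 2 or 3")
    return effort_score * impact_score


# Rebuild the matrix: one row per effort score, one column per impact score.
for effort in (3, 2, 1):
    print(effort, [problem_weight(effort, impact) for impact in (1, 2, 3)])
# 3 [3, 6, 9]
# 2 [2, 4, 6]
# 1 [1, 2, 3]

# Hypothetical problems with (effort score, impact score).
problems = {
    "Pump seal leaks": (3, 3),
    "Conveyor misalignment": (1, 3),
    "Compressor trips": (2, 3),
}
ranked = sorted(problems, key=lambda p: problem_weight(*problems[p]), reverse=True)
print(ranked)  # ['Pump seal leaks', 'Compressor trips', 'Conveyor misalignment']
```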

Preparation: Hybrid Approach

✓ Preparing a 2D or 3D matrix
✓ Using Pareto or ABC Analysis to prioritize
• A: top 15%: Most Critical
• B: mid 20%: Moderately Critical
• C: last 65%: Least Critical
Chart: probability of success (y-axis, 3.00 to 5.00) vs. impact of the problem (x-axis, 2.00 to 5.00), divided into regions C, B and A
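An ABC split can be sketched as below. Two assumptions are not from the course: the two chart axes are combined into one score by multiplying impact by probability of success, and the 15% / 20% / 65% shares are applied to each problem's rank position. The problem list is hypothetical.

```python
from collections import Counter


def abc_classify(scores, a_share=15, b_share=20):
    """Rank problems by score (highest first) and label them by rank position:
    top a_share% -> A, next b_share% -> B, the rest -> C.
    Shares are whole percentages; integer arithmetic avoids rounding issues."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    labels = {}
    for rank, name in enumerate(ranked, start=1):
        if 100 * rank <= a_share * n:
            labels[name] = "A"
        elif 100 * rank <= (a_share + b_share) * n:
            labels[name] = "B"
        else:
            labels[name] = "C"
    return labels


# Hypothetical problems: (impact, probability of success), both on a 2..5 scale.
problems = {f"P{k}": (2 + k % 4, 2 + (3 * k) % 4) for k in range(1, 21)}
# Assumption: one combined score = impact x probability of success.
scores = {name: impact * prob for name, (impact, prob) in problems.items()}

labels = abc_classify(scores)
print(Counter(labels.values()))  # 3 A, 4 B and 13 C for 20 problems
```

Ties in score keep their original order, so borderline problems may need a team decision rather than a mechanical split.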

Preparation:
Sporadic vs. Chronic Failures
Which one will receive greater attention from the organization?
• Failure of a passenger seat adjustment mechanism of an aircraft: on average, this has caused 1 minute of flight delay daily during the last 10 years of the aircraft's life. Impact: more than 60 hours of delay in 10 years per aircraft.
• Collision of an airport truck with the empennage of a passenger jet: this happens to 1 out of 10 aircraft in their lives and requires on average 24 hours plus $1,000 for repair. Impact: 2.4 hours of delay per aircraft, plus $1,000 repair cost per collision.

Which one would have a greater impact on the organization?
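The per-aircraft figures can be checked with a few lines of arithmetic. This sketch assumes the seat-mechanism delay occurs on all 365 days of each year and treats the collision figures as expected values per aircraft; the variable names are illustrative.

```python
MIN_PER_HOUR = 60
DAYS_PER_YEAR = 365  # assumption: the 1-minute delay occurs every day
YEARS = 10

# Chronic: 1 minute of delay every day for 10 years.
chronic_delay_h = 1 * DAYS_PER_YEAR * YEARS / MIN_PER_HOUR

# Sporadic: 1 aircraft in 10 has a collision in its life; each costs 24 h and $1,000.
p_collision = 1 / 10
sporadic_delay_h = p_collision * 24
sporadic_repair_usd = p_collision * 1000

print(f"Chronic:  {chronic_delay_h:.1f} h of delay per aircraft")
# Chronic:  60.8 h of delay per aircraft
print(f"Sporadic: {sporadic_delay_h:.1f} h of delay and ${sporadic_repair_usd:.0f} of repair cost per aircraft")
# Sporadic: 2.4 h of delay and $100 of repair cost per aircraft
```

On these numbers the chronic failure causes roughly 25 times more delay per aircraft than the sporadic one, even though it tends to attract less attention.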
Preparation:
Sporadic vs. Chronic Failures

➢ Both sporadic and chronic problems should be recognized and considered during prioritization.
➢ Chronic failures with minor consequences are usually seen as “the way things are”.

Preparation:
Triggering Criteria / Threshold
• By which criteria should we decide whether to analyze new problems?
Chart: effort (y-axis, 3.00 to 5.00) vs. impact (x-axis, 2.00 to 5.00), divided into regions C, B and A

Note that triggering criteria can be defined for different aspects/consequences (related to the different goals and policies of the organization):
• Safety
• Environment
• Production
• Critical assets
• Financial
• Non-compliance with …
• Or a weighted average of all of the above
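One way to combine per-category criteria with a weighted average is sketched below. The weights, the 1 to 5 severity scale, the per-category limit and the overall threshold are all hypothetical; each organization would set its own from its goals and policies.

```python
# Hypothetical weights (summing to 1) for the consequence categories above.
WEIGHTS = {
    "safety": 0.35, "environment": 0.20, "production": 0.20,
    "critical_assets": 0.10, "financial": 0.10, "compliance": 0.05,
}
THRESHOLD = 3.0                  # hypothetical trigger for the weighted average
CATEGORY_LIMITS = {"safety": 4}  # hypothetical: a severe safety score alone triggers RCA


def weighted_score(scores):
    """Weighted average of 1..5 severity scores; missing categories count as 0."""
    return sum(weight * scores.get(name, 0) for name, weight in WEIGHTS.items())


def triggers_rca(scores):
    """True if any category reaches its own limit or the weighted average reaches THRESHOLD."""
    if any(scores.get(name, 0) >= limit for name, limit in CATEGORY_LIMITS.items()):
        return True
    return weighted_score(scores) >= THRESHOLD


minor_event = {"safety": 2, "production": 4, "financial": 3}
print(round(weighted_score(minor_event), 2), triggers_rca(minor_event))  # 1.8 False
print(triggers_rca({"safety": 5}))                                        # True
```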

Preparation
➢ When a triggering threshold is reached, preparation actions should be taken to support the effort before the analysis starts:
• Actions required (if any) to preserve the data:
– Photos, videos, etc.
– Coordination with repair/maintenance personnel if we suspect any special part or evidence should be preserved.
• Preparing the operating context document:
– What the problem is (preliminary definition) and in which operating context it happened.
– This document should be prepared by the department or section where the problem happened.

Preparation: Operating Context
The operating context document should include:
• The preliminary definition of the problem, which will probably be revised during the analysis.
• Interviews with those who may have information about different aspects of the problem, including experts from inside or outside the organization.
• All the information initially collected.

Preparation: Operating Context
• The operating context document should be concise and complete.
• The operating context should include a brief description and references to relevant information, with all required drawings, maps, etc.

Preparation:
Operating Context References
The operating context may include statements about, and/or references to, relevant information or documents:
• Event Time lines
• Performed response actions
• Equipment / system / plant design specification (vs. real operating conditions)
• Any recent changes prior to event (of any type)
• Related processes documents (production, maintenance, quality, change, procurement,
training, etc.)
• Relevant flowcharts
• Schematic drawings of equipment / system / plant
• Process & Instrumentation Diagrams (P&IDs)
• Environmental conditions (temperature, pressure, humidity, light, noise, etc.)
• Photographs and / or videos (taken soon after the event)
• Equipment conditions and indications before / during / after incident and in
comparison to normal operations
• Logs and reports
• Shift change information
• Human factors (time: day/night/after sleep/meal/days off; incorrect guidance/lack of knowledge; complacency; imprecise communication; distraction; fatigue; work stress; high workload)
• Any other information about human intervention with the process prior to the incident
• Observations and interview documents
• OEM recommendations
• Historical records, including similar events (and their differences)
• Records and documents of any recent changes
• Patterns and trends
• Third party reports
• Legal / organizational regulations/ requirements
Data Preservation
• Some data/evidence is likely to be lost between the event and the official beginning of the investigation.

• So it is the responsibility of the personnel and supervisors at the event site to preserve it as much as possible and to describe or refer to it in the operating context document.

Team members
A natural team for a typical industrial case:
• Operation supervisor
• Experienced Operator
• Maintenance supervisor
• Experienced technician
• Facilitator
• Expert
Team members
• Any person who has information, is affected by the problem, or can
be effective to solve it should be engaged somehow.

• Don’t make sessions too crowded: aim for around 5 people and no more than 7.

• It is better to have both creative and logical types of individuals in the team.

• Critics (but not disruptive ones) are welcome.

Team members
• The team should participate in problem definition,
developing causal network, identifying solutions and
implementing them.
• The RCA team leader should be selected from the department or section where the problem originated.
• The best ideas often come from people who are experts in the subject but were not involved in the problem itself.
• Be cautious about people who keep shifting the problem to other fields.
Problem Definition:
Problem Definition Importance
➢ What will happen if the problem is not well defined?

➢ How effective will the solutions be for a problem that is not understood correctly?

➢ Can we assume everyone knows the problem?

Problem Definition: 5W + H
The questions that usually arise in any problem investigation:
• Who: the only “who” questions are: Who knows more? Who has the information?
• What
• When
• Where
• Why: hold this question for now; it will be dealt with in future steps.
• How
Importance: Safety, Environment, Cost, Revenue
Problem Definition:
What: Primary Effect
➢ In the causal network, there are usually some effects that are more important: the ones that catch attention.
➢ Any effect we want to prevent from recurring can be the primary effect of our problem.
➢ This is what we want to solve.

What is the primary effect below?
What would be the impact of selecting a different primary effect?
Chart: Injury ← Fall ← Oil on the Floor ← Oil Dropping ← Shaft Oil Seal Failure ← Long Time in Use
Problem Definition: Primary Effect
What is the primary effect in this problem?

Chart: Operator Injured ← Vessel Explosion ← Relief Valve Stuck ← Corrosion Build-up ← High Moisture Content

Problem Definition: Primary Effect
What if we see the problem like this?
Chart: Vessel Explosion (caused by Relief Valve Stuck ← Corrosion Build-up ← High Moisture Content, and by Vessel Material Strength) branches into three consequences: $10,000 Damage to Asset, Operator Injured, and Lost Production
✓ Sometimes a practical way to find the primary effect is to look for the point where it branches into secondary effects (consequences).
✓ These secondary effects are usually where the problem conflicts with the goals of the organization; in other words, where desired functions fail and the problem becomes important.
Problem Definition: Primary Effect
How do different people see the desired functions, and what are they?
• Operation Supervisor?
• Maintenance Supervisor?
• HSE Officer?

Problem Definition: Primary Effect
Now what?
✓ Agreeing on the primary effect enables the team to concentrate on solving one issue at a time.
✓ If consensus is not achieved on the primary effect, start with one of the most widely accepted primary effects; the other parts of this causal puzzle will be completed later.
Primary Effect Examples: Use Noun-Verb Format
• CMMS Server Failure.
• Production Lost.
• Crane Collapsed.
• Shipment Delayed.
• Lost Sales.
• Exceeded Sales Goal.
• Wrong Blood Type Administered.
• SO3 Gas Released.
• Oil Spilled.
• Production Goal Met.

When
• When was the primary effect observed?
• Clock, date
• Continuous, intermittent, during running time
• Relative to other events
• Etc.

When
• When Examples:

After intervention
Start of the night shift
After a routine test
Jan 31st, 1980
During earthquake
etc.

Where
• Where was the primary effect observed?
• Geographically
• Physically on the asset
• Relative to other assets
• Facility; System; Component
• Etc.

Where
• Where examples:
• Smelter Furnace # 2
• Centrifugal pump #1 outlet
• Highly polluted area
• Close to gas station

– Facility: Smelter Plant
– System: Electric power
– Component: MCC #2

Importance
• Importance of the primary effect should
address the following issues:
• Safety
• Environment
• Production
• Cost
• Frequency
• Reputation
Estimate Return On Investment
Importance
By determining the “Importance”, these items should become clear:
• Why is the problem under consideration?
• How much time should be spent on it?
• Who should be involved?
• How much cost and effort should be spent to solve it?

Importance
Chart: Vessel Explosion and its consequences ($10,000 Damage to Asset, Operator Injured, Lost Production)

Importance of the problem

Importance
Example 1
Safety: No impact / Serious potential
Environmental: No impact
Revenue: $10,000 lost (production reduced by 1,250 kg/h for 4 hours, at $2/kg)
Cost: Spare parts: $3,000 / Labor: $1,000
Frequency: once in 2014, 3 times in 2013

Example 2
Safety: No impact / Serious potential
Customer service: No impact
Production: $20,000 lost
Cost: $5,000 (Materials $4,500 / Labor $500)
Frequency: 2 times in 2014 and 2 times in 2012
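The revenue figure in the first example follows from rate × duration × price. A minimal check, assuming the 1,250 kg/h is the production rate lost during the 4 hours; adding revenue and direct cost into one total is an illustrative choice, not part of the course.

```python
def lost_revenue(lost_rate_kg_per_h, duration_h, price_per_kg):
    """Revenue lost = lost production rate x duration x unit price."""
    return lost_rate_kg_per_h * duration_h * price_per_kg


revenue = lost_revenue(1250, 4, 2)  # 1,250 kg/h for 4 h at $2/kg
cost = 3000 + 1000                  # spare parts + labor
print(revenue, cost, revenue + cost)  # 10000 4000 14000
```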

A Complete Problem Definition
What: Broken leg
When: Feb 20, 2012 at 6:30 PM, while driving an unfamiliar route after sunset
Where: Route No. 23, West Town; Delivery Service
Significance:
• Safety: Lost-time injury; broken right leg
• Environmental: No impact
• Revenue: No impact
• Cost: $2,500 (vehicle), $10,000 (medical)
• Frequency: 1st lost-time / 2nd recordable injury this year; 10% of customers recently complained of late deliveries

Causal Analysis

Cause and Effects Principle
A review of the principles:
• Causes and effects are the same; only our viewpoint makes them different.

• Causes and effects exist in an infinite continuum, in all directions and dimensions.

• For each effect there are at least two causes: an action and a condition.

• Each effect exists only if its causes exist at the same point in time and space.
Cause and Effects Principle
How the causes can grow:
Chart: a tree starting from one Effect; each time we ask Why?, every effect branches into at least two causes, and each cause is an effect in its own right.
Number of causes increases when we ask Why: minimum 2, 4, 8, 16, 32, … ∞
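The minimum growth can be written as a formula: if every effect has at least two causes, asking "Why?" n times reaches at least 2^n causes at that level. A short sketch (the function names are illustrative):

```python
def minimum_causes(whys):
    """Minimum number of causes at the level reached after `whys` questions,
    assuming every effect has at least two causes (an action and a condition)."""
    return 2 ** whys


def minimum_total_causes(whys):
    """Minimum number of causes across all levels: 2 + 4 + ... + 2**whys."""
    return sum(minimum_causes(n) for n in range(1, whys + 1))


print([minimum_causes(n) for n in range(1, 6)])  # [2, 4, 8, 16, 32]
print(minimum_total_causes(5))                   # 62
```

This is why the chart needs a rule for ending cause paths: without one, the network grows without limit.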

Where to start

• Start by connecting all the easy or known parts; then look for the harder remaining parts of the puzzle.

• As with puzzles, don’t waste time on where to start; start with one of the many known pieces, then come back later.

Adding more Causes
There are always causes between the causes.
Chart (before): Vessel Explosion ← Relief Valve Stuck
Chart (after): Vessel Explosion ← Vessel Cracked, caused by Relief Valve Stuck and Vessel Material Strength
Adding more Causes
Chart: the vessel explosion chain expanded with more causes: Vessel Explosion, Vessel Cracked, Pressure Increased, Relief Valve Stuck, Relief Valve Exists, Vessel Material Strength, Vessel Exists
Action and condition causes
How to look for actions and conditions:
- First look for the cause only!
- Depending on whether it is an action or a condition
cause, look for the other one.
- Actions often end in “ed”.
- Conditions are often noun phrases with an unstated
phrase of “existed”
Chart: an Effect caused by an Action and a Condition
➢ Dividing causes into actions and conditions is a tool, not a goal!
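The two rules of thumb can be turned into a rough first guess, shown below as a hypothetical helper. It only looks at the last word of a cause label, so it is easily fooled (irregular verbs such as "stuck" do not end in "ed", while a noun like "speed" does); the team, not the heuristic, decides.

```python
def guess_cause_type(label):
    """Rough guess: a last word ending in 'ed' suggests an action;
    anything else is treated as a condition (a noun phrase with an
    unstated 'existed')."""
    words = label.strip().split()
    if not words:
        raise ValueError("empty cause label")
    return "action" if words[-1].lower().endswith("ed") else "condition"


for label in ["Pressure Increased", "Vessel Cracked",
              "Vessel Material Strength", "Relief Valve Stuck"]:
    print(f"{label}: {guess_cause_type(label)}")
# Pressure Increased: action
# Vessel Cracked: action
# Vessel Material Strength: condition
# Relief Valve Stuck: condition
```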
Causal Network
Steps to develop a causal network:

• Start with the primary effect
• Ask why for the primary effect
• Look for causes of both action and condition types
• Make iterative loops to find more causes
• Support each cause with evidence
• End each cause path
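The steps above can be supported by a small data structure. The sketch below, with hypothetical names (`Node`, `review`), checks three of the steps: that each effect has both an action and a condition cause, that each cause is supported by evidence (otherwise it is flagged with "?"), and that each cause path is ended with a stated reason. The example network is loosely based on the vessel example, and its evidence is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str = "effect"  # "effect" (primary effect), "action" or "condition"
    evidence: list[str] = field(default_factory=list)
    causes: list["Node"] = field(default_factory=list)
    stop_reason: str = ""  # why this cause path ends, e.g. "No more knowledge"


def review(node, issues=None):
    """Walk the network from the primary effect and collect open issues."""
    if issues is None:
        issues = []
    if node.kind != "effect" and not node.evidence:
        issues.append(f"? no evidence: {node.name}")
    if node.causes:
        kinds = {cause.kind for cause in node.causes}
        if not {"action", "condition"} <= kinds:
            issues.append(f"needs an action and a condition cause: {node.name}")
        for cause in node.causes:
            review(cause, issues)
    elif node.kind != "effect" and not node.stop_reason:
        issues.append(f"path not ended: {node.name}")
    return issues


# Illustrative network; structure and evidence are hypothetical.
primary = Node("Vessel Explosion", causes=[
    Node("Vessel Cracked", "action", ["Crack found on shell"], causes=[
        Node("Pressure Increased", "action", ["Pressure log"]),
        Node("Vessel Material Strength", "condition", stop_reason="No more knowledge"),
    ]),
    Node("Vessel Exists", "condition", ["Plant drawings"], stop_reason="Desired conditions"),
])

for issue in review(primary):
    print(issue)
# path not ended: Pressure Increased
# ? no evidence: Vessel Material Strength
```

Each flagged item is a prompt for the next iterative loop: ask why again, look for evidence, or record a reason for stopping.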

Adding Evidence
Chart: Smoke, caused by Fire, caused by Sparks
Evidence: smelled smoke; observed flames; observed ashes

Sensed Evidence
• Directly sensed (sight, sound, taste, touch, smell, etc.)

Adding Evidence
Chart: High Exhaust Gas Temp (EGT), supported by the evidence “EGT Indicator Reading”

Inferred Evidence
• Repeatable causal relationships (should be verified)

Chart: EGT indicator reading increased ← EGT circuit voltage increased ← Electric potential between conductors increased ← Thermocouple temp increased

Evidences
• Evidence can also be sought outside the RCA team and its meetings.
✓ Arrange interviews and research to get sensed and inferred information from:
• People working in the problem area
• Technical specialists
• Technical publications
✓ Make a list of the information/evidence to search for, with the name of the person responsible for each search and the time available for the task
More on Evidences
• Remaining “or”s indicate causes that are not completely validated by evidence: we prefer not to have them, but sometimes they are hard to avoid
• When no evidence is available, use “?”
• Such a cause is not discarded, but the lack of evidence is taken into account when evaluating potential causes and deciding where to focus our efforts
Other Elements in a Causal Chart
[Figure: a Primary Effect linked to Action and Conditional causes; each cause lists its evidence (Evidence 1, Evidence 2, ...) or a “?” where none exists, and a cause path can end in “Stop”]
• The use of the question mark
• Reasons for stop:
– Outside our control
– No more knowledge
– Desired condition reached
– New problem
– Other cause paths more productive
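The chart elements above (evidence, the question mark, and stop reasons) can be sketched as a small data model with a review step. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the chart is assumed to be a tree, and the evidence strings in the example are illustrative only. The review lists causes that would be shown with a “?” and path ends that have neither further causes nor a stop reason.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class StopReason(Enum):
    """Reasons for ending a cause path, following the slide's list."""
    OUTSIDE_CONTROL = "Outside our control"
    NO_MORE_KNOWLEDGE = "No more knowledge"
    DESIRED_CONDITION = "Desired condition reached"
    NEW_PROBLEM = "New problem"
    OTHER_PATHS = "Other cause paths more productive"


@dataclass
class Cause:
    text: str
    kind: str = ""  # "action", "condition", or "" for the primary effect
    evidence: List[str] = field(default_factory=list)  # empty -> shown as "?"
    causes: List["Cause"] = field(default_factory=list)
    stop: Optional[StopReason] = None


def review_chart(primary: Cause) -> Tuple[List[str], List[str]]:
    """Walk the chart below the primary effect (assumed to be a tree).

    Returns (causes marked "?" for lack of evidence,
             path ends that have neither further causes nor a stop reason).
    """
    question_marks: List[str] = []
    open_ends: List[str] = []
    stack = list(primary.causes)
    while stack:
        cause = stack.pop()
        if not cause.evidence:
            question_marks.append(cause.text)
        if not cause.causes and cause.stop is None:
            open_ends.append(cause.text)
        stack.extend(cause.causes)
    return sorted(question_marks), sorted(open_ends)


# Hypothetical fire chart; the evidence strings are illustrative only.
EXAMPLE_CHART = Cause("Fire", causes=[
    Cause("Heat", "action", evidence=["Observed sparks"],
          stop=StopReason.NO_MORE_KNOWLEDGE),
    Cause("Fuel", "condition", evidence=["Observed ashes"]),
    Cause("Oxygen", "condition", stop=StopReason.OUTSIDE_CONTROL),
])
```

Here `review_chart(EXAMPLE_CHART)` returns `(["Oxygen"], ["Fuel"])`: Oxygen is kept on the chart with a “?” rather than discarded, and the Fuel path still needs either more causes or a stated reason to stop.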
Developing a chart to depict causal relations

Cause & Effect (Time & Space)
An effect exists only if its causes exist at the same point in time and space; using this principle, causal relations can be checked by asking:
• For each cause: if the cause is removed, would the effect still occur?
If so, that cause does not belong to the set, or at least other causes and/or relationships are at work.
• For each effect: are the proposed causes, taken together, always sufficient to make it happen?
If not, there are other causes waiting to be discovered.
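The two questions amount to a necessity test and a sufficiency test. As a minimal sketch, assume the effect can be modelled as a hypothetical predicate over the set of causes present at the same time and place; here the fire triangle (heat, fuel, and oxygen) from the earlier slides serves as the model, and "smoke" is included as a deliberately wrong proposed cause.

```python
from typing import Callable, FrozenSet, Set, Tuple

EffectModel = Callable[[FrozenSet[str]], bool]


def fire_occurs(present: FrozenSet[str]) -> bool:
    """Illustrative model: fire needs heat, fuel and oxygen together."""
    return {"heat", "fuel", "oxygen"} <= present


def check_causes(effect: EffectModel, proposed: Set[str]) -> Tuple[Set[str], bool]:
    """Apply the time-and-space checks to a proposed set of causes.

    Returns (unnecessary, sufficient):
    - sufficient: whether the proposed causes together produce the effect;
      if False, other causes are waiting to be discovered.
    - unnecessary: causes whose removal leaves the effect in place, so they
      do not belong to the set. This test is only informative once the
      set is sufficient.
    """
    full = frozenset(proposed)
    sufficient = effect(full)
    unnecessary = {c for c in proposed if effect(full - {c})}
    return unnecessary, sufficient
```

With `{"heat", "fuel"}` the check returns `(set(), False)`: the set is not sufficient, which points to the missing oxygen. With `{"heat", "fuel", "oxygen", "smoke"}` it returns `({"smoke"}, True)`: removing smoke leaves the fire in place, so smoke does not belong among the causes.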

Links Between Causes and Effects
• and: causes supported with strong evidence
• or: possible causes that are not supported with strong evidence, or that are just contributing factors
Expanding Causal Network
• Continue each causal path from the primary effect to the right
• Stop on reaching a point of no information, a point clearly outside the sphere of control, or a desired condition
• Start again from the primary effect to follow another path
• Continue until all possible causes are identified
• Ask for causes between causes (if appropriate), for more causes of each effect, and for unknown causes on the right
• Within a few iterations, most of the participants' knowledge will be on the chart
Yellow Stickers
• A good tool for starting a causal analysis
• Helps the team step back and see the whole picture on the wall
• If the problem grows, MS Excel, MS Visio, or dedicated software programs can be used to continue
What is the right way to build an aircraft?

Discover Important Causal Paths
Look for what is different and ask why it is different
(5W+H questions about the differences: who, what, when, where, why, and how):
• Design vs. application review
• Analysis of recent changes
• Analysis of similar events
• Analysis of similar situations/equipment without the problem
Brainstorming
A few reminders:
• While generating ideas, collect as much information as possible without criticism or judgment (all ideas are equally valid at this stage).
• Be creative; all ideas are welcome, even silly ones.
• No side discussions about the ideas; everyone should understand that all ideas will be discussed later.
• Build on other ideas.
• Write all ideas where everyone can easily see them.
• Remind the team that some of the ideas recorded during brainstorming may not appear in the final report, so they should move quickly.
• Set a time limit for brainstorming.
Effective Solutions Identification

Solutions?
What actions can be taken on a cause?
• Remove
• Change
• Control
✓ Do not impose any restrictions at this stage
Adding Solutions
[Figure: a causal chart for the primary effect "Person Fall", including the box "Handle stopped fall", in which each action and conditional cause carries its evidence and its candidate solutions (Solution 1, Solution 2, ...)]
Be cautious when you hear:
• We tried that before.
• Wrong!
• It will never work here.
• We have too many other important tasks.
• No one will buy it!
• We will just be extra careful in the future.
• That’s not our strategy.
• It isn’t seen in the budget.
• Good thought, but unfortunately impractical.
• Top management will never agree.
• Which standard says to do it that way?
• We’ve always done it that way.
• Good idea, I’ll get back to you (and never does).
Encourage Creativity
• Escape the boundaries of conventional/logical belief systems.
• Set up quick brainstorming sessions.
• Look for creative connections between ideas (when people laugh).
• Improve group synergy; build on other ideas.
• Discard the notion of a single “right answer” and genuinely appreciate different viewpoints and ideas from all participants.
• Ask for critics’ viewpoints.
• Get your subconscious mind engaged.

What is the solution?
[Figure: Fire with its three causes: Heat, Fuel, and Oxygen]
Two more important factors affect whether effective solutions are proposed:
• Organizational culture
• The main expert to whom the problem is referred
Solutions with Maintenance Nature
For causes related to equipment failure:
- Time-Based Maintenance
- Condition-Based Maintenance
- Failure Finding
- Run to Failure
[Figure: Common Failure Patterns]
Solution Criteria
• Prevent recurrence.
• Within your control.
• Meet your goals and objectives.
• …….
✓ The best solutions are often applied to conditional causes.
Solution Criteria
• Prevent recurrence
• Focus on permanent solutions
• Temporary solutions may be needed to satisfy current requirements or to mitigate consequences
– e.g. customer relations, politics, etc.
Solution Criteria
• Within Control
– RCA Team Selection Issue
– How effective are solutions when you hand them to someone else, or tell someone else to carry them out?
– Is that a significant problem in your organization?
• Lack of involvement, investment of effort, and “buy-in”
– “It is not my problem…”
• They may not fully understand the circumstances or consequences of the problem situation.
Solution Criteria
• Meets Goals & Objectives
– Does not cause unacceptable problems.
• “Unintended consequences” may be foreseeable!
– Prevents similar occurrences
• e.g. at different locations
– Provides reasonable value for the cost/effort/resources expended
• Value can be non-financial, e.g. strategic
Solutions

Make sure the “or”s in the chart are not forgotten and that the selected solutions cover them too
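Checking that the selected solutions also cover the “or” branches can be sketched as a set comparison. This minimal sketch assumes each “or” link is recorded as a list of alternative causes and each selected solution as the set of causes it acts on; the function name and the example data (two unconfirmed explanations for a stuck relief valve) are hypothetical.

```python
from typing import Dict, List, Set


def uncovered_alternatives(or_links: Dict[str, List[str]],
                           solutions: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Report "or" alternatives that no selected solution addresses.

    or_links:  effect -> possible causes joined by "or" (not fully validated).
    solutions: solution -> set of causes it removes, changes or controls.
    Returns effect -> uncovered alternatives (effects with no gaps are omitted).
    """
    covered: Set[str] = set().union(*solutions.values())
    gaps: Dict[str, List[str]] = {}
    for effect, alternatives in or_links.items():
        missing = [cause for cause in alternatives if cause not in covered]
        if missing:
            gaps[effect] = missing
    return gaps


# Hypothetical example: two unconfirmed explanations for a stuck relief valve.
EXAMPLE_OR_LINKS = {"Relief valve stuck": ["Corrosion", "Debris in valve seat"]}
EXAMPLE_SOLUTIONS = {"Inspect and clean relief valve": {"Debris in valve seat"}}
```

Here `uncovered_alternatives(EXAMPLE_OR_LINKS, EXAMPLE_SOLUTIONS)` returns `{"Relief valve stuck": ["Corrosion"]}`: the plan addresses only one of the two unconfirmed explanations, so the other could still cause a recurrence.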

Solutions Implementation & Tracking

Effective Implementation and Tracking
An effective report should:

• Communicate Findings
Problem definition, summary of causal relationships, solutions
• Establish credibility and a basis for obtaining funding
Summary of potential achievements, costs and feasibility
• Provide a visual dialogue
Attached documents, including the causal chart (causes, evidence, solutions)
• Ensure success
Action plan
• Provide learning for others
Accessible to other personnel
Action Plan
Causes | Corrective Actions | Name | Due Date
Not believed in cautions in manual | Send out safety bulletin about requirements to follow safety cautions | Fred Barringer | Completed
Not believed in cautions in manual | Brief all the operators in conference room | Jack Goldberg | Completed
Lockout procedures not updated | Revise lockout procedure in manual | Robert Vesely | March 14, 2013
Lockout procedures not updated | Perform separate RCA on why the procedure was not updated for 10 years | John Clark | March 1, 2013
Circuit not locked out | Include lockout steps in revised manual | Paul Geitner | March 9, 2013
Responsibilities
• Implementation and tracking can be the responsibility of the team leader or of any team member assigned to the solution(s)
• However, overall responsibility lies with the facility (operations) manager, and all tracking should be reported to that manager
Track and Audit
What to look for:
✓ Solutions Effectiveness
✓ Implementation Efficiency
• Problem Recurrence
• Return on Investment
• Achieved Goals
• Follow up on the action plan (team leader)
• Annual review (third-party auditor)
+ Analyze time efficiency
Implementation Notes
• Many root cause analyses fail at this stage
• Who is in charge?
• Implementation challenges
• Follow-up/tracking
• Documentation of implementation
• A review of the results for other equipment or systems
A Note on Available Resources
• When implementing and tracking results, the number of resulting actions quickly becomes a concern, so it is essential to keep the number of RCAs, and therefore the number of jobs they generate, within each department's resource limits; this is where prioritization and correctly set thresholds are needed.
• As a rule of thumb:
– each team is expected to complete 6-15 RCAs a year, depending on its experience and the size of the problems
– in most cases, an RCA is expected to reach the implementation phase within 10-15 days at most
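The first rule of thumb translates into a quick capacity check. This is a minimal sketch, assuming the department can estimate how many problems per year will cross its RCA threshold; the function name is hypothetical, and the defaults of 6 and 15 are simply the bounds quoted above.

```python
import math
from typing import Tuple


def teams_needed(rcas_per_year: int, low: int = 6, high: int = 15) -> Tuple[int, int]:
    """Range of RCA teams needed if each team completes low..high RCAs a year.

    Experienced teams on smaller problems approach `high`;
    newer teams or larger problems approach `low`.
    Returns (teams needed at the high rate, teams needed at the low rate).
    """
    if rcas_per_year < 0 or not 0 < low <= high:
        raise ValueError("invalid inputs")
    return math.ceil(rcas_per_year / high), math.ceil(rcas_per_year / low)
```

For example, `teams_needed(40)` returns `(3, 7)`. If the department has fewer teams than the lower figure, the thresholds that trigger an RCA need raising or the problems need stricter prioritization.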
Tracking is mostly about comparing progress against the original implementation plan.
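Comparing progress against the plan can start with flagging open actions whose due date has passed. This minimal sketch loads the action plan from the earlier slide; the `completed` flags for the three dated actions and the review dates used below are assumptions for illustration, because the plan does not record progress after it was written. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Action:
    cause: str
    action: str
    owner: str
    due: Optional[date]  # None when the plan already shows the action as completed
    completed: bool


# Action plan from the earlier slide; completion status of dated items is assumed.
PLAN = [
    Action("Not believed in cautions in manual",
           "Send out safety bulletin about requirements to follow safety cautions",
           "Fred Barringer", None, True),
    Action("Not believed in cautions in manual",
           "Brief all the operators in conference room", "Jack Goldberg", None, True),
    Action("Lockout procedures not updated",
           "Revise lockout procedure in manual", "Robert Vesely", date(2013, 3, 14), False),
    Action("Lockout procedures not updated",
           "Perform separate RCA on why the procedure was not updated for 10 years",
           "John Clark", date(2013, 3, 1), False),
    Action("Circuit not locked out",
           "Include lockout steps in revised manual", "Paul Geitner", date(2013, 3, 9), False),
]


def overdue(plan: List[Action], as_of: date) -> List[Action]:
    """Open actions whose due date is before `as_of`, earliest first."""
    late = [a for a in plan if not a.completed and a.due is not None and a.due < as_of]
    return sorted(late, key=lambda a: a.due)
```

Reviewed on March 10, 2013, `overdue(PLAN, date(2013, 3, 10))` returns the actions owned by John Clark and Paul Geitner, in that order; Robert Vesely's revision is not yet due.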

Food for thought
• What is the aim of Root Cause Analysis?
• What are the main steps to an effective RCA?
• How much easier and more effective would the analysis be if more information were provided by a knowledgeable and motivated team?
• Which solutions do we call effective?
• Being structured and verified (by evidence) makes an RCA more objective.
• Why are objective, team-based solutions better and easier to implement?
• Analysis is not enough! One needs to execute!