Fault Tree Analysis: M. Pandey, University of Waterloo
Table of Contents
Introduction
Fault Tree Analysis
Basic Events
Advantages
Limitations
Notation
General Procedure for Fault Tree Analysis
Rules of Fault Tree Construction
Considerations
Fault Tree Evaluation
Boolean Algebra
The OR Gate
The AND Gate
Qualitative Analysis
Minimal Cut Sets
Criticality
Quantitative Analysis
Common-Cause Failures
References
Introduction
there is a need to analyze all the possible failure mechanisms in complex
systems
see the NRC Fault Tree Handbook
(http://www.nrc.gov/reading-rm/doc-collections/nuregs/staff/sr0492/sr0492.pdf),
also Chapters 8 and 10 in McCormick (1981)
one of the principal methods of probabilistic safety (or risk) analysis
(PRA)
developed by Bell Telephone Laboratories in 1962 for the U.S. Air Force
Minuteman system, later adopted and extensively used by Boeing
Company
fault tree diagrams
  - are used most often as a system-level risk assessment technique
  - can model the possible combinations of equipment failures, human
    errors, and external conditions that can lead to a specific type of
    accident
  - follow a top-down structure and represent a graphical model of the
    pathways within a system between basic events that can lead to a
    foreseeable loss event (or a failure) referred to as the top event
the contributory events and conditions are interconnected using standard
logic symbols (AND, OR, etc.), also referred to as gates
events that must coexist to cause the top event are described using the
AND relationship
alternate events that can individually cause the top event are described
using the OR relationship
the occurrence of a top event may or may not lead to a serious or adverse
consequence
the relative likelihood of a number of potential consequences will depend
tree
Basic Events
An event that cannot be developed any further.
all basic events are generally assumed to be statistically independent
unless they are common cause failures (i.e. failures arising from a
common cause or an initiating event)
basic events can be either
  - primary fault events, i.e. subsystem failure due to a basic mode such as
    a structural fault, failure to open or close, or to start or stop, or
  - secondary fault events, i.e. subsystem failure due to excessive
    operational or environmental stress that pushes the system element
    out of tolerance
Advantages
allow the use of reliable information on component failure and other basic
events to estimate the overall risk associated with new system designs for
which no historical data exists
simple to understand and easy to implement
qualitative descriptions of potential problems and combinations of events
causing specific problems of interest
quantitative estimates of failure frequencies and likelihoods, and of the
relative importance of various failure sequences and contributing events
lists of recommendations for reducing risks
quantitative evaluations of recommendation effectiveness
Limitations
difficult to conceive all possible scenarios leading to the top event
construction of fault trees for large systems can be tedious
correlations between basic events (e.g. failure of components belonging to
the same batch) are difficult to model and exact solutions to correlated
events do not exist
subjective decisions regarding the level of detail and completeness are
often necessary
Notation
Symbol Name     Description

Primary Event Symbols
Circle          Basic Event: a basic initiating fault requiring no further
                development
Oval            Conditioning Event: specific conditions or restrictions that
                apply to any logic gate (used with INHIBIT gate)
Diamond         Undeveloped Event: an event that is not developed further
                because it is of insufficient consequence or because
                information is unavailable
House           External Event: an event which is normally expected to occur
                (not a fault event)

Gate Symbols
OR Gate         The output event occurs if at least one of the input events
                occurs
AND Gate        The output event occurs only if all of the input events occur
INHIBIT Gate    The output event occurs if the input event occurs and the
                attached conditioning event is satisfied

Transfer Symbols
Triangle-in     Indicates that the tree is developed further someplace else
                (e.g. another page)
Triangle-out    Indicates that this portion of the tree is a subtree
                connected to the corresponding Triangle-in (appears at the
                top of the tree)
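General Procedure for Fault Tree Analysis
[Figure: eight-step procedure, Step 1 to Step 8. The step descriptions are
not recoverable here; Steps 1 to 4 are spelled out in the worked example
further on.]
Rules of Fault Tree Construction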
Rule 1. State the fault event as a fault, including the description and timing of
a fault condition at some particular time.
Include
(a) what the fault state of that system or component is,
(b) when that system or component is in the fault state.
Test the fault event by asking
(c) is it a fault?
(d) is the what-and-when portion included in the fault statement?
Rule 2. There are two basic types of fault statements, state-of-system and
state-of-component.
To continue the tree,
(a) if state-of-system fault statement, use Rule 3
(b) if state-of-component fault statement, use Rule 4
Rule 3. A state-of-system fault may use an AND, OR, or INHIBIT gate or no
gate at all.
To determine which gate to use, the input faults must be the
(a) minimum necessary and sufficient fault events,
(b) immediate fault events.
Rule 4. A state-of-component fault always uses an OR gate.
To continue, look for the primary, secondary, and command failure
fault events. Then state those fault events.
(a) primary failure is failure of that component within the design
envelope or environment
(b) secondary failures are failures of that component due to
environments exceeding the design environment
(c) command faults are inadvertent operation of the component
because of a failure of a control element
Rule 5. No gate-to-gate relationships.
Put an event statement between any two gates.
Rule 6. Expect no miracles.
Those things that would normally occur as the result of a fault will
occur, and only those things. Also, normal system operation may be
expected to occur when faults occur.
Example: (McCormick, 1981) Construct a fault tree for the simple electric
motor circuit shown below.
[Figure: circuit diagram showing the power supply, switch, fuse, motor, and
connecting wire]
Solution:
Step 1. Define the system of interest.
Need to identify
  - Intended Functions
  - Physical Boundaries (to avoid overlooking key elements of a system at
    interfaces and penalizing a system by associating other equipment
    with the subject of the study)
  - Analytical Boundaries (to limit the level of analysis resolution, to
    explicitly exclude certain types of events and conditions, such as
    sabotage, from the analysis)
  - Initial Conditions (including equipment that is assumed to be out of
    service initially, which affect the combinations of additional events
    necessary to produce a specific system problem)
For this particular problem we have,
Intended Function        the motor is used for some (unknown) purpose
Physical Boundaries      power supply
Analytical Boundaries    include all contributors in the above diagram
Initial Conditions       switch closed, motor on
Step 2. Define the top event.
We are interested in the event that the motor fails to operate. Therefore,
the top event is defined as
    "Motor fails to operate"
Step 3. Construct the fault tree, starting from the top, i.e., define the treetop
structure. Identify the main contributing events, including all events and
scenarios that may cause the top event.
Step 4. Explore each branch in successive levels of detail, following the rules
of fault tree construction.
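[Figure: fault tree for the motor circuit. Indentation shows the inputs to
each gate; the gate type is given in brackets.]
Motor fails to operate [OR]
  Defect in motor
  No current to motor [OR]
    Wire failure (open)
    Power supply failure
    Switch open [OR]
      Switch fails open
      Switch opened erroneously (undeveloped event)
    Fuse fails open [OR]
      Fuse failure under normal conditions (open)
      Fuse failure due to overload [INHIBIT, condition: Fuse fails open]
        Overload in circuit [OR]
          Wire failure (shorted)
          Power failure (surge)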
Gate 2. The event "No current to motor" is the result of other events and
is therefore developed further. The lack of current to the motor can
result from a broken connection in any of the other four components in
the circuit: failure of the wire or power supply (basic events), the
switch being open, or failure of the fuse.
Gate 3. The open switch may be due to a basic failure of the switch, or the
event that the switch was opened erroneously. The erroneous opening
of the switch is due to human error, which could be developed further
into more basic events (i.e. operator is inexperienced, under stress,
etc.). However, due to insufficient information, the event is not
explored further. This purposely undeveloped event is therefore
denoted with the diamond symbol.
Gate 4. The fuse failure event may be caused by fuse failure under normal
conditions (primary failure) or by an overload from the circuit
(secondary failure).
Gate 5. The secondary fuse failure occurs when an overload is present in
the circuit, but the fuse does not open every time an overload is
present (because not every overload results in sufficient overcurrent
to open the fuse). This is why a conditional (INHIBIT) gate, denoted
by the hexagon, is used. The condition, i.e. "Fuse fails open", is
placed in the connecting oval, and the conditional gate is treated like
an AND gate in subsequent tree analysis.
Gate 6. The overload in the circuit may be caused either by a short or a
power surge, both of which are primary (i.e. basic) events.
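The gate descriptions above can be turned into a small executable model. The sketch below is only an illustration: the helper names (OR, AND, occurs, MOTOR_TREE) are hypothetical, the top gate is assumed to be an OR gate (no Gate 1 description appears in the text), and the INHIBIT gate is evaluated as an AND gate, as the Gate 5 description suggests.

```python
# Minimal sketch of the motor fault tree as nested gates (names are illustrative).

def OR(*inputs):
    return ("OR", inputs)

def AND(*inputs):
    # Also used for the INHIBIT gate, which the text treats like an AND gate.
    return ("AND", inputs)

def occurs(node, state):
    """Return True if the event `node` occurs.

    `node` is a basic-event name (str) or a (gate, inputs) tuple;
    `state` maps basic-event names to True/False (missing means False).
    """
    if isinstance(node, str):
        return state.get(node, False)
    gate, inputs = node
    results = [occurs(child, state) for child in inputs]
    return any(results) if gate == "OR" else all(results)

MOTOR_TREE = OR(                      # top event: motor fails to operate (assumed OR)
    "defect in motor",
    OR(                               # no current to motor
        "wire failure (open)",
        "power supply failure",
        OR("switch fails open", "switch opened erroneously"),   # switch open
        OR(                           # fuse fails open
            "fuse failure under normal conditions",
            AND(                      # fuse failure due to overload (INHIBIT)
                "condition: fuse fails open",
                OR("wire failure (shorted)", "power failure (surge)"),
            ),
        ),
    ),
)

# A short circuit alone does not cause the top event ...
print(occurs(MOTOR_TREE, {"wire failure (shorted)": True}))   # False
# ... unless the condition on the INHIBIT gate is also satisfied.
print(occurs(MOTOR_TREE, {"wire failure (shorted)": True,
                          "condition: fuse fails open": True}))   # True
```

Representing gates as plain tuples keeps the sketch short; a larger model would probably use dedicated classes for events and gates.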
Considerations
construction of a fault tree is subjective
need to take into account
Boolean Algebra
fault trees describe the relationships between events using Boolean logic
a fault tree can be translated into an equivalent set of Boolean equations
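The OR Gate
Represents the union of events at the gate. For event Q with two input events
A and B attached to the OR gate, the probability is obtained as
$P(Q) = P(A) + P(B) - P(A \cap B)$    (1)
or
$P(Q) = P(A) + P(B) - P(A) P(B \mid A)$    (2)
  - if A and B are mutually exclusive, $P(A \cap B) = 0$ and
$P(Q) = P(A) + P(B)$    (3)
  - if A and B are independent, $P(B \mid A) = P(B)$ and
$P(Q) = P(A) + P(B) - P(A) P(B)$    (4)
  - if B is completely dependent on A (B occurs whenever A occurs),
    $P(B \mid A) = 1$ and
$P(Q) = P(A) + P(B) - P(A)(1) = P(B)$    (5)
  - if the probabilities are small, the product term can be neglected
    (rare-event approximation) and
$P(Q) \approx P(A) + P(B)$    (6)
The AND Gate
Represents the intersection of events at the gate. For event Q with two input
events A and B attached to the AND gate, the probability is obtained as
$P(Q) = P(A) P(B \mid A) = P(B) P(A \mid B)$    (7)
  - if A and B are independent,
$P(Q) = P(A) P(B)$    (8)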
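The two-input gate formulas can be checked numerically with a few lines of code. This is a minimal sketch, not part of the original notes: the function names are hypothetical, and the optional argument p_b_given_a stands for the conditional probability P(B | A), defaulting to P(B), i.e. independence.

```python
def or_gate(p_a, p_b, p_b_given_a=None):
    """P(A or B) = P(A) + P(B) - P(A) P(B|A).

    If p_b_given_a is omitted, A and B are assumed independent.
    """
    if p_b_given_a is None:
        p_b_given_a = p_b
    return p_a + p_b - p_a * p_b_given_a

def and_gate(p_a, p_b, p_b_given_a=None):
    """P(A and B) = P(A) P(B|A); independence assumed if p_b_given_a is omitted."""
    if p_b_given_a is None:
        p_b_given_a = p_b
    return p_a * p_b_given_a

def or_gate_rare_event(p_a, p_b):
    """Rare-event approximation: neglect the product term."""
    return p_a + p_b

print(f"independent OR:       {or_gate(0.01, 0.01):.4f}")                   # 0.0199
print(f"mutually exclusive:   {or_gate(0.01, 0.01, p_b_given_a=0.0):.4f}")  # 0.0200
print(f"B fully dependent:    {or_gate(0.01, 0.02, p_b_given_a=1.0):.4f}")  # 0.0200
print(f"rare-event approx.:   {or_gate_rare_event(0.01, 0.01):.4f}")        # 0.0200
print(f"independent AND:      {and_gate(0.50, 0.02):.4f}")                  # 0.0100
```

For small probabilities the rare-event value (0.0200) is only slightly larger than the independent result (0.0199), so the approximation errs on the conservative side.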
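Rules of Boolean algebra ($\overline{X}$ denotes the complement of X)

Mathematical Notation | Engineering Notation | Designation
$X \cap Y = Y \cap X$ | $X \cdot Y = Y \cdot X$ | Commutative Law
$X \cup Y = Y \cup X$ | $X + Y = Y + X$ |
$X \cap (Y \cap Z) = (X \cap Y) \cap Z$ | $X (Y Z) = (X Y) Z$ | Associative Law
$X \cup (Y \cup Z) = (X \cup Y) \cup Z$ | $X + (Y + Z) = (X + Y) + Z$ |
$X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z)$ | $X (Y + Z) = X Y + X Z$ | Distributive Law
$X \cup (Y \cap Z) = (X \cup Y) \cap (X \cup Z)$ | $X + (Y Z) = (X + Y)(X + Z)$ |
$X \cap X = X$ | $X \cdot X = X$ | Idempotent Law
$X \cup X = X$ | $X + X = X$ |
$X \cap (X \cup Y) = X$ | $X (X + Y) = X$ | Law of Absorption
$X \cup (X \cap Y) = X$ | $X + (X Y) = X$ |
$X \cap \overline{X} = \emptyset$ | $X \cdot \overline{X} = \emptyset$ | Complementation
$X \cup \overline{X} = \Omega = I$ | $X + \overline{X} = \Omega = I$ |
$\overline{X \cap Y} = \overline{X} \cup \overline{Y}$ | $\overline{X Y} = \overline{X} + \overline{Y}$ | de Morgan's Rule
$\overline{X \cup Y} = \overline{X} \cap \overline{Y}$ | $\overline{X + Y} = \overline{X} \, \overline{Y}$ |
$\emptyset \cap X = \emptyset$ | $\emptyset \cdot X = \emptyset$ |
$\emptyset \cup X = X$ | $\emptyset + X = X$ |
$\Omega \cap X = X$ | $\Omega \cdot X = X$ |
$\Omega \cup X = \Omega$ | $\Omega + X = \Omega$ |
$\overline{\emptyset} = \Omega$ | $\overline{\emptyset} = \Omega$ |
$\overline{\Omega} = \emptyset$ | $\overline{\Omega} = \emptyset$ |
$X \cup (\overline{X} \cap Y) = X \cup Y$ | $X + (\overline{X} Y) = X + Y$ | Other relationships
$\overline{X} \cap (X \cup \overline{Y}) = \overline{X} \cap \overline{Y}$ | $\overline{X} (X + \overline{Y}) = \overline{X} \, \overline{Y}$ |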
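Identities of this kind can be confirmed by brute force, since each variable takes only two values. The sketch below checks a handful of the laws by enumerating every True/False assignment; the helper name holds and the dictionary of laws are illustrative choices, not part of the original notes.

```python
from itertools import product

def holds(law, n_vars):
    """Check a Boolean identity for every True/False assignment of its variables."""
    return all(law(*values) for values in product([False, True], repeat=n_vars))

# Each entry: (identity as a function returning True when both sides agree, number of variables)
laws = {
    "commutative":  (lambda x, y: (x and y) == (y and x), 2),
    "associative":  (lambda x, y, z: (x or (y or z)) == ((x or y) or z), 3),
    "distributive": (lambda x, y, z: (x and (y or z)) == ((x and y) or (x and z)), 3),
    "idempotent":   (lambda x: (x and x) == x, 1),
    "absorption":   (lambda x, y: (x or (x and y)) == x, 2),
    "de Morgan":    (lambda x, y: (not (x and y)) == ((not x) or (not y)), 2),
    "X + X'Y = X + Y": (lambda x, y: (x or ((not x) and y)) == (x or y), 2),
}

for name, (law, n_vars) in laws.items():
    print(f"{name}: {holds(law, n_vars)}")   # every line prints True
```

The absorption and idempotent laws are the ones that matter most when reducing a fault tree expression to minimal cut sets, because they remove repeated and redundant terms.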
Qualitative Analysis
used for identifying
  - critical events
  - potential system weaknesses
  - best ways to reduce the risk associated with the top event
conducted using minimal cut sets
Criticality
the importance of each minimal cut set can be based on its relative
Example: Determine all the minimal cut sets for the small motor problem.
Solution:
Let T denote the top event
Let P denote primary events (circles)
Let G denote intermediate events (rectangles)
Let S denote undeveloped events (diamonds)
Let C denote conditioning events (ovals)
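where
T = motor fails to operate
P1 = defect in motor
P2 = wire failure (open)
P3 = power supply failure
P4 = switch fails open
P5 = fuse failure under normal conditions (open)
P6 = wire failure (shorted)
P7 = power failure (surge)
G1 = no current to motor
G2 = fuse fails open
G3 = switch open
G4 = fuse failure due to overload
G5 = overload in circuit
S1 = switch opened erroneously
C1 = fuse fails open (condition on the INHIBIT gate)
[Figure: the motor fault tree relabelled with the symbols above; "+" marks
an OR gate and "x" marks the conditional gate treated as AND]
Therefore, writing + for OR and $\cdot$ for AND,
$T = P1 + G1$    (9)
$G1 = P2 + P3 + G2 + G3$    (10)
$G2 = P5 + G4$    (11)
$G3 = P4 + S1$    (12)
$G4 = C1 \cdot G5$    (13)
$G5 = P6 + P7$    (14)
Substituting from the top down,
$T = P1 + G1$    (15)
$= P1 + P2 + P3 + G2 + G3$    (15.1)
$= P1 + P2 + P3 + P5 + G4 + G3$    (15.2)
$= P1 + P2 + P3 + P5 + G4 + P4 + S1$    (15.3)
$= P1 + P2 + P3 + P4 + P5 + S1 + C1 \cdot G5$    (15.4)
$= P1 + P2 + P3 + P4 + P5 + S1 + C1 \cdot (P6 + P7)$    (15.5)
$= P1 + P2 + P3 + P4 + P5 + S1 + C1 \cdot P6 + C1 \cdot P7$    (15.6)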
The top event, therefore, contains 6 single-component minimal cut sets and
2 double-component minimal cut sets.
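The same result can be obtained mechanically by expanding the gates from the top event downward. The sketch below is one possible implementation under stated assumptions: the GATES dictionary encodes the tree using the event labels listed above, the top gate is taken to be an OR gate, the conditional gate is treated as AND, and the function names cut_sets and minimal are hypothetical.

```python
from itertools import product

# Gate definitions using the labels from the example (structure assumed from the text).
GATES = {
    "T":  ("OR",  ["P1", "G1"]),
    "G1": ("OR",  ["P2", "P3", "G2", "G3"]),
    "G2": ("OR",  ["P5", "G4"]),
    "G3": ("OR",  ["P4", "S1"]),
    "G4": ("AND", ["C1", "G5"]),   # conditional (INHIBIT) gate treated as AND
    "G5": ("OR",  ["P6", "P7"]),
}

def cut_sets(event):
    """Return the cut sets of `event` as a set of frozensets of basic events."""
    if event not in GATES:                      # basic, undeveloped, or conditioning event
        return {frozenset([event])}
    kind, inputs = GATES[event]
    child_sets = [cut_sets(child) for child in inputs]
    if kind == "OR":                            # union of the inputs' cut sets
        return set().union(*child_sets)
    # AND: combine one cut set from every input
    return {frozenset().union(*combo) for combo in product(*child_sets)}

def minimal(sets):
    """Drop every cut set that strictly contains another cut set (absorption)."""
    return {s for s in sets if not any(other < s for other in sets)}

mcs = minimal(cut_sets("T"))
for s in sorted(mcs, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
# ['P1'] ['P2'] ['P3'] ['P4'] ['P5'] ['S1'] ['C1', 'P6'] ['C1', 'P7'], one per line
```

Using frozensets makes repeated events collapse automatically (idempotent law), and the minimal step applies the law of absorption. For this small tree no cut set is absorbed, so the expansion already yields the eight minimal cut sets.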
Quantitative Analysis
calculate the probability of occurrence of the top event, given the fault
Common-Cause Failures
Multiple failures originating from a common cause that fails the system.
the basic events may not always be independent
failure events can be caused by a common environment or have factors in
probabilities
Example: Calculate the probability of occurrence of the top event for the
simple motor example. Assume the probability of occurrence of each basic
event is equal to 0.01 and the probability of the event S1, "Switch opened
erroneously", is equal to 0.001. Also assume that the condition C1, "Fuse
fails open", has a probability of occurrence of 0.50.
Solution:
Using the same notation as before we are given
Event   Description                                    Probability
P1      Defect in motor                                0.01
P2      Wire failure (open)                            0.01
P3      Power supply failure                           0.01
P4      Switch fails open                              0.01
P5      Fuse failure under normal conditions (open)    0.01
P6      Wire failure (shorted)                         0.01
P7      Power failure (surge)                          0.01
S1      Switch opened erroneously                      0.001
C1      Fuse fails open                                0.50
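The probability of intermediate events can be evaluated using the fault tree.
[Figure: the motor fault tree annotated with event probabilities]
Overload in circuit             0.01 + 0.01 = 0.02
Fuse failure due to overload    0.50 x 0.02 = 0.01
Fuse fails open                 0.01 + 0.01 = 0.02
Switch open                     0.01 + 0.001 = 0.011
No current to motor             0.01 + 0.01 + 0.011 + 0.02 = 0.051
The probability of the top event is given by the union of the minimal cut sets
determined before. Adding the cut set probabilities (rare-event
approximation, as used for the intermediate values above) gives
$P(T) \approx P(P1) + P(P2) + P(P3) + P(P4) + P(P5) + P(S1) + P(C1) P(P6) + P(C1) P(P7)$
$= 5(0.01) + 0.001 + 2(0.50)(0.01) = 0.061$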
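The top-event probability can also be computed directly from the minimal cut sets. The sketch below compares the rare-event sum with an exact value obtained by enumerating all combinations of event states. It assumes that all events, including the condition C1, are statistically independent random events; the variable names are illustrative.

```python
from itertools import product
from math import prod

# Probabilities given in the example.
P = {"P1": 0.01, "P2": 0.01, "P3": 0.01, "P4": 0.01, "P5": 0.01,
     "P6": 0.01, "P7": 0.01, "S1": 0.001, "C1": 0.50}

# Minimal cut sets of the motor fault tree.
MCS = [{"P1"}, {"P2"}, {"P3"}, {"P4"}, {"P5"}, {"S1"},
       {"C1", "P6"}, {"C1", "P7"}]

# Rare-event approximation: sum of the cut set probabilities.
rare_event = sum(prod(P[e] for e in cs) for cs in MCS)

# Exact value under independence: enumerate all 2^9 event states and add up
# the probability of every state in which at least one cut set has occurred.
events = sorted(P)
exact = 0.0
for states in product([False, True], repeat=len(events)):
    occurred = {e for e, s in zip(events, states) if s}
    if any(cs <= occurred for cs in MCS):
        exact += prod(P[e] if s else 1.0 - P[e] for e, s in zip(events, states))

print(f"rare-event approximation: {rare_event:.3f}")   # 0.061
print(f"exact (independent):      {exact:.4f}")        # 0.0594
```

The approximation overestimates the exact value slightly, which is the usual behaviour when cut set probabilities are small. Full enumeration is only practical for small trees; with n events it visits 2^n states.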
References
McCormick, N.J. 1981. Reliability and Risk Analysis: Methods and Nuclear Power
Applications. Academic Press, New York.
United States Coast Guard. 2004. Risk-based Decision-making (RBDM)
Guidelines (3rd Ed.), Vol. 3 Risk Assessment Tools Reference, Chapter 9
Fault Tree Analysis.
(http://www.uscg.mil/hq/g-m/risk/E-Guidelines/RBDMGuide.htm)
United States Nuclear Regulatory Commission (NRC). 1981. Fault Tree
Handbook. NUREG-0492. Systems and Reliability Research Office of
Nuclear Regulatory Research, Washington, D.C.
(http://www.nrc.gov/reading-rm/doc-collections/nuregs/staff/sr0492/sr0492.pdf)