Introduction To Reliability and Maintenance

Reliability.Asset.Integrity Center

Session Objectives
To recognize the importance of reliability
To understand the basic definitions of reliability and its measures
To understand the concept of the bathtub reliability curve
To understand the basic methodology of reliability analysis and its relation to maintenance

OPERATIONAL ISSUES
The following issues put pressure on keeping a plant safe, reliable and efficient:
Highly competitive business environment
Increased concern for safety and the environment
Tight profit margins
Escalating operational costs
Increased system complexity
Depletion of oil and gas resources
Increased demand
Changes in materials, operating conditions and equipment age

Why RELIABILITY?
PETROCHEMICAL BUSINESS DRIVERS
Reduce operational cost
Healthy, safe and environmentally friendly operation
Maximize utilization
Meet operational targets and customer demand
Reduce waste, failures and downtime
High availability
Continuously improve plant performance
RELIABILITY DIRECTLY IMPACTS ALL OF THESE

Reliability and Organizational Profitability
The recent oil spill in the Gulf of Mexico caused an estimated loss of USD 23 billion to BP.
What caused it?
Bad cement job
Failure of the shoe track barrier
The negative pressure test was accepted when it should not have been
Failure in well control procedures
Failure of the blow-out preventer
The rig's fire and gas system failed to prevent ignition
Source: BP report, www.bp.com

System Performance Improvement
[Figure: tree diagram (Modarres et al., 1999)]
Improve system performance
  Improve reliability: prolong the life of equipment/components
    Study reliability engineering issues
    Estimate and reduce the failure rate
  Improve maintainability: minimize downtime

Failure Causes for Engineering Components and Systems
Cause: Description
1. Poor design: improper design, dimensions, tolerances, stress concentration, no interchangeability of parts
2. Improper installation: improper foundation, excessive vibration, inadequate inputs (e.g. voltage), wrong techniques/tools
3. Incorrect production: outdated technology, wrong equipment, lack of process control and calibrated equipment, inadequate training
4. Improper maintenance: under/over maintenance, wrong tools/techniques, poor spare part management, insufficient skills and training
5. Complexity: a larger number of components, interfaces and interconnections
6. Poor operational instruction: wrong instructions, lack of clarity, difficult to understand, poor language/SOP
7. Human error: lack of understanding of process and equipment, carelessness, forgetfulness, poor judgment

What is RELIABILITY?
Reliability is the probability that an item will perform its required function under given conditions for a stated time interval.
Probability: describes the stochastic (random) behaviour of the occurrence of failure
Required function: the designed function of the system
Given conditions: the external conditions in which the system usually operates
Time interval: the design life period of the system

RELIABILITY MEASURES
MEAN TIME TO FAILURE (MTTF)
The average time that elapses until a failure occurs. It is used for non-repairable items.
$$\mathrm{MTTF} = \frac{1}{n} \sum_{i=1}^{n} t_i$$
where t_i is the time to failure of item i and n is the number of items.
Example:
Consider six components of a similar type with failure times of 23, 34, 32, 28, 19 and 27 days respectively.
MTTF = (23+34+32+28+19+27) / 6 = 27.2 days
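
MEAN TIME BETWEEN FAILURE (MTBF)
The average time between successive failures. It is used for repairable systems when the failure rate is assumed to be constant (random failure).
$$\mathrm{MTBF} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
where x_i is the i-th uptime period and n is the number of uptime periods.
Example:
[Figure: timeline in days showing four uptime periods of 50, 30, 60 and 46 days, each ended by a failure and followed by downtime]
MTBF = (50+30+60+46) / 4 = 46.5 days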
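The two worked examples (MTTF and MTBF) both reduce to an arithmetic mean of observed durations. A minimal Python sketch follows; the function name `mean_time` is an illustrative choice, not something defined in the slides.

```python
from statistics import mean

# Failure times (days) of the six non-repairable components in the MTTF example.
failure_times = [23, 34, 32, 28, 19, 27]

# Uptime periods (days) between successive failures in the MTBF example.
uptimes = [50, 30, 60, 46]


def mean_time(durations):
    """Arithmetic mean of observed durations, i.e. (1/n) * sum of the values."""
    if not durations:
        raise ValueError("at least one duration is required")
    return mean(durations)


mttf = mean_time(failure_times)
mtbf = mean_time(uptimes)
print(f"MTTF = {mttf:.1f} days")
print(f"MTBF = {mtbf:.1f} days")
```

The same helper serves both measures because only the interpretation of the durations differs: times to first failure for non-repairable items, uptimes between failures for repairable systems.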

FAILURE RATE (HAZARD RATE)
Failure rate (hazard rate) is the conditional probability, per unit time, that a component fails in a small time interval given that it has survived from time zero until the beginning of that interval.
[Figure: timeline on which the item survives up to time t; the question posed is the probability of failure in the interval from t to t + Δt]
Note: The term failure rate has been widely used to describe the reliability of both non-repairable components and repairable systems. The more appropriate term for non-repairable components is hazard rate, and for repairable systems it is rate of occurrence of failure (ROCOF).
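The verbal definition of the hazard rate can be written as a limit. The symbols here are assumptions introduced for illustration: T is the random time to failure, f(t) its probability density, and R(t) = Pr(T > t) the reliability function.

```latex
h(t) = \lim_{\Delta t \to 0}
       \frac{\Pr\left(t < T \le t + \Delta t \mid T > t\right)}{\Delta t}
     = \frac{f(t)}{R(t)}
```

Dividing by Δt is what turns the conditional probability of failing in the interval into a rate per unit time.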

FAILURE RATE (HAZARD RATE), CONTINUED
Failure rate is an important function in reliability studies since it describes how the probability of failure changes over the lifetime of the item, and hence the item's reliability performance.
Increasing rate = reliability deteriorates
Decreasing rate = reliability improves
Constant rate = reliability is maintained

Bathtub curve
The bathtub curve is a conceptual model of the reliability characteristics (failure rate) of a component or system over its lifetime. It is divided into three regions:
1. Early failures
2. Useful life
3. Wear out
[Figure: bathtub curve of failure rate versus time, with the three regions marked in that order]

Bathtub curve: region 1, early failures
This is the infant mortality or burn-in period. The failure rate is initially higher due to issues such as improper manufacturing, improper installation and poor materials.

Bathtub curve: region 2, useful life
The failure rate is approximately constant because the failures, assumed to be mostly stress-related, occur at random. This flat portion of the bathtub is also referred to as the normal operating life of the component or system, where realistically many components or systems spend most of their operating lifetimes.

Bathtub curve: region 3, wear out
The failure rate increases because of degradation phenomena due to wear out. Wear out is generally caused by fatigue, corrosion, creep, friction and other aging factors.
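One common way to reproduce the three bathtub regions numerically is the Weibull hazard function, whose shape parameter controls whether the rate falls, stays flat or rises. The slides do not name a distribution, so the Weibull model, the function name and every parameter value below are assumptions for illustration only.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


# Hypothetical shape parameters, one per bathtub region.
regions = {
    "early failures (beta < 1)": 0.5,
    "useful life (beta = 1)": 1.0,
    "wear out (beta > 1)": 3.0,
}
eta = 1000.0  # hypothetical scale parameter, in hours

for name, beta in regions.items():
    rates = [weibull_hazard(t, beta, eta) for t in (100.0, 500.0, 900.0)]
    print(name, [f"{r:.2e}" for r in rates])
```

With a shape parameter below 1 the printed rates decrease with time, at exactly 1 they are constant, and above 1 they increase, matching the early failure, useful life and wear-out regions respectively.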

Failure rate curve: Repairable system
[Figure: failure rate versus time for a repairable system. The original fielded system failure curve passes through a decreasing failure rate phase and a useful life phase. Major maintenance actions at times t1, t2, ..., tn (Improvement #1, Improvement #2) each postpone the system wear-out phase, giving a useful life extension.]
Equipment/system useful life phase extension (Wasson, 2006)

Various types of failure rate curve
1. Traditional view (random failure, then wear out). Typical equipment: belts, chains, impellers. Maintenance strategy: preventive maintenance.
2. Bathtub curve. Typical equipment: electro-mechanical components and motors. Maintenance strategy: condition monitoring.
3. Slow aging (steady increase in failure rate). Typical equipment: turbines, engines, compressors, piping. Maintenance strategy: condition monitoring.
4. Best new (sharp increase in failure rate, then level off). Typical equipment: hydraulic and pneumatic equipment. Maintenance strategy: condition based maintenance.
5. Random failure (failure rate is constant, no age related failure pattern). Typical equipment: ball and roller bearings. Maintenance strategy: condition based maintenance.
6. Worst new (high infant mortality, then random failure). Typical equipment: electronic equipment/components. Maintenance strategy: condition based maintenance.

Reliability Analysis
Statistical concepts play critical roles in reliability analysis and techniques.
Applications of reliability techniques to real-world problems generally involve three main elements:
Acquisition: effective and efficient data collection
Analysis: description and analysis of data (descriptive and inferential statistics)
Interpretation of data: use the results to solve the problem

General Methodology for Reliability Analysis
1. Setting objectives
2. Definition of system and failure
3. Data gathering
4. Exploratory analysis
5. Distribution analysis
6. Estimation of reliability measures
7. Recommendations for operation and maintenance improvement

Setting Objectives
A clear objective is a very important factor in a successful reliability study.
Have a clear definition of the specific purpose to be achieved at the end of the analysis.
The objective of the reliability study strongly influences the approach and the method of modeling and analysis used.
A precise objective sets the proper conditions for collecting the relevant maintenance data to be used in the analysis.

System Definition
Example: Gas Compression Train
[Figure: system boundary diagram for a gas compression train, adapted from OREDA (2002). Elements shown: inlet valve, inlet gas conditioning (scrubber, cooler etc.), recycle valve, inter-stage conditioning (scrubber, cooler etc.), fuel/gas control with local fuel/gas valve and fuel/gas inlet, air inlet equipment, exhaust equipment, gas generator, power turbine, gear box, compressor unit (1st and 2nd stage), after cooler, outlet valve, starter system, lubrication system, shaft seal system, control and monitoring, and miscellaneous. Power, coolant and remote instrumentation connections cross the system boundary.]

Source of Data
Historical data: test and field data on the same components/equipment
Vendor data: data from the manufacturer/vendor/consultant
Test data: experimental data on the parts
Operational data: field data collected under actual operating conditions
Handbook data: theoretical data from standard engineering handbooks
Reliability database: e.g. OREDA, MIL-HDBK-217F
Judgmental data: information based on expert opinion
Cost data: data on sales, maintenance and operational costs

Operational Data
Main categories of data for reliability analysis:
Inventory data: information on equipment related to design, operational, functional and environmental characteristics. It can be classified under equipment identification, manufacturing and design, maintenance and test, and engineering and process data.
Failure event data: detailed records of failure incidents, e.g. event date, duration, modes, causes, codes, severity and effect on the system, downtime date and duration.
Operating time data: the time and duration of each operating state, i.e. operation, standby and downtime.

Types of Data
Complete data: the exact time to failure is known.
Right censored (suspension): the item is still running at the end of the observation time.
Left censored: the failure time is only known to be before a certain time.
Interval censored: the failure time is only known to lie within an interval.
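The four data types can be told apart by what is known about the lower and upper bound of each failure time. The sketch below is one possible encoding, assuming a hypothetical `Observation` record with two optional bounds; it is not a format prescribed by the slides or by any reliability database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Bounds on a failure time; None means the bound is unknown on that side."""
    lower: Optional[float]  # item known to have survived at least this long
    upper: Optional[float]  # item known to have failed by this time


def classify(obs: Observation) -> str:
    """Name the data type implied by the bounds that are known."""
    if obs.lower is not None and obs.upper is not None:
        if obs.lower == obs.upper:
            return "complete"           # exact failure time
        return "interval censored"      # failed somewhere between the bounds
    if obs.lower is not None:
        return "right censored"         # still running at time `lower`
    if obs.upper is not None:
        return "left censored"          # failed some time before `upper`
    raise ValueError("observation carries no information")


# Hypothetical observations, in days.
samples = [
    Observation(120.0, 120.0),
    Observation(200.0, None),
    Observation(None, 30.0),
    Observation(60.0, 90.0),
]
for s in samples:
    print(s, "->", classify(s))
```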

Exploratory Data Analysis
Use statistical tools and techniques to investigate data sets in order to gain insight into the data, understand their important characteristics, identify outliers and extract important factors.
Common exploratory tools:
Histogram
Pie chart
Pareto chart
Box plot
Trend chart
Scatter plot

Exploratory Analysis
Example: gas compression train
Subsystem codes used in the charts:
1. Gas Turbine (GT)
2. Centrifugal Gas Compressor (GC)
3. Starter System (STS)
4. Gearbox (GB)
5. Fuel System (FS)
6. Vibration Monitoring System (VMS)
7. Anti-surge Valve System (AVS)
8. Lube Oil System (LOS)
9. Process and Utilities (PRO)
10. Turbine Control System (TCS)
[Figure: pie charts of the share of failures by subsystem for Train 1 and Train 2]
[Figure: Pareto chart for the gas compression train (overall), showing number of failures and cumulative % by subsystem in the order GT, TCS, GC, AVS, PRO, LOS, STS, FS, VMS, GB]
[Figure: trend chart of failures by subsystem for the years 2002 to 2009]
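A Pareto chart is built by sorting the subsystems by failure count and accumulating their percentage share. The sketch below shows that calculation only; the counts are hypothetical placeholders, not the values behind the original charts, and `pareto_table` is an illustrative name.

```python
def pareto_table(failure_counts):
    """Return (code, count, cumulative %) rows sorted by descending count."""
    total = sum(failure_counts.values())
    if total == 0:
        raise ValueError("no failures recorded")
    rows, running = [], 0
    ordered = sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True)
    for code, count in ordered:
        running += count
        rows.append((code, count, 100.0 * running / total))
    return rows


# Hypothetical failure counts per subsystem code (illustration only).
counts = {"GC": 10, "GT": 20, "AVS": 5, "TCS": 15}
for code, count, cumulative in pareto_table(counts):
    print(f"{code:4s} {count:3d} {cumulative:6.1f}%")
```

Reading the cumulative column from the top shows how few subsystems account for most of the failures, which is the point of the Pareto view.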

Types of Configurations
Series
Parallel
[Figure: example reliability block diagram (RBD) for an Acid Gas Removal Unit, combining series and parallel blocks. Block labels: M201, T202A, T202B, T201A to T201D, A201, T203-A to T203-D, M202. Equipment names shown: feed gas separator, feed/pure gas exchanger, absorber.]

Series Configuration
Blocks are connected in series.
It can be thought of as an OR relationship (i.e. the system fails if A OR B fails).
It implies no redundancy in the components.
If units are in series, then all units must work for the system to work.
If any unit in the series fails, then the system fails.
The reliability of the system is given by:
$$R_s = R_1 \times R_2 \times \cdots \times R_n$$
[Figure: three blocks R1, R2 and R3 connected in series]

Reliability Calculation for Series System
Calculate the system reliability given R1 = 0.90, R2 = 0.95 and R3 = 0.98.
$$R_S = R_1 \times R_2 \times R_3 = (0.90)(0.95)(0.98) = 0.8379$$
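The series calculation is a plain product of block reliabilities, assuming the blocks fail independently. A minimal sketch, with `series_reliability` as an illustrative function name:

```python
from math import prod


def series_reliability(reliabilities):
    """Reliability of independent blocks in series: product of block reliabilities."""
    values = list(reliabilities)
    if not values:
        raise ValueError("at least one block is required")
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each reliability must lie between 0 and 1")
    return prod(values)


# Values from the worked example: R1 = 0.90, R2 = 0.95, R3 = 0.98.
r_s = series_reliability([0.90, 0.95, 0.98])
print(f"R_S = {r_s:.4f}")
```

Because every factor is at most 1, the system reliability can never exceed that of its weakest block.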

Reliability Calculation for Series System
What are the system reliability and failure rate?
Assume that the components have constant failure rates λ1, λ2 and λ3. Then the system reliability is
$$R_s(t) = R_1(t)\,R_2(t)\,R_3(t) = e^{-\lambda_1 t}\, e^{-\lambda_2 t}\, e^{-\lambda_3 t} = e^{-(\lambda_1 + \lambda_2 + \lambda_3) t}$$
So the failure rate of the system is
$$\lambda_S = \lambda_1 + \lambda_2 + \lambda_3$$

Exercise for Series System
Consider a system with three components in series.
[Figure: three blocks R1, R2 and R3 connected in series]
You are required to achieve a system reliability of 0.98 over 800 hours of non-stop operation.
1. What would be the target failure rate for the system?
$$R_s(t) = e^{-\lambda_S t}$$
$$0.98 = e^{-\lambda_S (800)}$$
$$\ln(0.98) = -\lambda_S (800)$$
$$\lambda_S = -\frac{\ln(0.98)}{800} = 2.53 \times 10^{-5} \text{ per hour}$$
2. What would the system MTBF be?
$$\mathrm{MTBF}_S = \frac{1}{\lambda_S} = \frac{1}{2.53 \times 10^{-5}} = 39599 \text{ hours} \approx 1650 \text{ days}$$
(The figure of 39599 hours uses the unrounded value of λ_S.)
3. Assuming the component failures are identically distributed,
a) What should be the component failure rate?
$$\lambda_S = \lambda_1 + \lambda_2 + \lambda_3 = 3\lambda$$
$$2.53 \times 10^{-5} = 3\lambda$$
$$\lambda = \frac{2.53 \times 10^{-5}}{3} = 8.42 \times 10^{-6} \text{ per hour}$$
b) What would be the component MTBF?
$$\mathrm{MTBF} = \frac{1}{\lambda} = \frac{1}{8.42 \times 10^{-6}} = 118{,}796 \text{ hours} \approx 4950 \text{ days}$$
c) What should be the component reliability?
$$R(t) = e^{-\lambda t} = e^{-(8.42 \times 10^{-6})(800)} = 0.993$$
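The exercise can be checked numerically in a few lines. This sketch assumes constant failure rates and three identical components, as the exercise states; the variable names are illustrative.

```python
from math import exp, log

target_reliability = 0.98   # required system reliability
mission_hours = 800.0       # non-stop operating time
n_components = 3            # identical components in series

# 1. System failure rate from R_s(t) = exp(-lambda_s * t).
lam_system = -log(target_reliability) / mission_hours
# 2. System MTBF is the reciprocal of the constant failure rate.
mtbf_system = 1.0 / lam_system
# 3a. In series the rates add, so each identical component gets an equal share.
lam_component = lam_system / n_components
# 3b. Component MTBF.
mtbf_component = 1.0 / lam_component
# 3c. Component reliability over the same mission time.
r_component = exp(-lam_component * mission_hours)

print(f"system failure rate    = {lam_system:.3e} per hour")
print(f"system MTBF            = {mtbf_system:.0f} hours")
print(f"component failure rate = {lam_component:.3e} per hour")
print(f"component MTBF         = {mtbf_component:.0f} hours")
print(f"component reliability  = {r_component:.3f}")
```

A useful cross-check is that the component reliability cubed returns the system target of 0.98.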

Parallel Configuration
The system fails only when all the units fail.
It can be thought of as an AND relationship (i.e. the system fails if 1 AND 2 AND ... AND n fail).
At least one unit must succeed for a successful mission.
The reliability of the system is given by:
$$R_s = 1 - \left[(1 - R_1)(1 - R_2) \cdots (1 - R_n)\right]$$
[Figure: n blocks, numbered 1 to n, connected in parallel]

Reliability Calculation for Parallel System
Calculate the system reliability given R1 = 0.90 and R2 = 0.98.
[Figure: blocks 1 and 2 connected in parallel]
$$R_S = 1 - \left[(1 - R_1)(1 - R_2)\right] = 1 - \left[(1 - 0.90)(1 - 0.98)\right] = 1 - (0.10)(0.02) = 1 - 0.002 = 0.998$$
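For a parallel arrangement the unreliabilities multiply, again assuming independent units. A minimal sketch, with `parallel_reliability` as an illustrative name:

```python
from math import prod


def parallel_reliability(reliabilities):
    """Reliability of independent units in parallel: 1 minus the product of unreliabilities."""
    values = list(reliabilities)
    if not values:
        raise ValueError("at least one unit is required")
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each reliability must lie between 0 and 1")
    return 1.0 - prod(1.0 - r for r in values)


# Values from the worked example: R1 = 0.90, R2 = 0.98.
r_s = parallel_reliability([0.90, 0.98])
print(f"R_S = {r_s:.3f}")
```

Unlike the series case, adding a unit in parallel can only raise the system reliability, which is why parallel blocks model redundancy.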

Combination of Basic Configurations
Any of the previous configuration types can be used simultaneously in one diagram.
Consider a system made up of several subsystems.
[Figure: block diagram combining series and parallel blocks; the block labels 1, 2, 3, 4 and 6 survive from the original figure]

Steps to calculate system reliability for a combined series-parallel configuration
1. Break the system into smaller series and parallel arrangements.
2. Calculate the reliability of each arrangement identified in step 1.
3. Finally, calculate RS using the reliabilities obtained in step 2.
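The three steps can be sketched on a small example. The layout and every reliability value below are hypothetical, since the original figure is not recoverable: block A in series with a parallel pair (B, C), followed in series by block D.

```python
from math import prod


def series(reliabilities):
    """Independent blocks in series: product of reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Independent blocks in parallel: 1 minus the product of unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical block reliabilities.
r_a, r_b, r_c, r_d = 0.95, 0.90, 0.90, 0.99

# Step 1: the system splits into the parallel arrangement (B, C)
#         and the series chain A -> (B, C) -> D.
# Step 2: reliability of the inner parallel arrangement.
r_bc = parallel([r_b, r_c])
# Step 3: system reliability from the series chain of the reduced blocks.
r_system = series([r_a, r_bc, r_d])

print(f"R(B,C) = {r_bc:.4f}")
print(f"R_S    = {r_system:.6f}")
```

Larger diagrams follow the same pattern: reduce the innermost arrangement to a single equivalent block, then repeat outward until one block remains.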

k-out-of-n Redundancy
At times, a system function is such that k out of n of its components need to be working for the system to function.
[Figure: four parallel blocks feeding a 3/4 node, and n parallel blocks feeding a k/n node]

A node is used to signify k-out-of-n redundancy.
The basic property of the node is to define the number of incoming paths that must be good for the system to be good.

For n identical components (i.e. same reliability values), the system reliability is calculated as
$$R_s = \Pr(\text{at least } k \text{ components are working}) = \sum_{x=k}^{n} P(x)$$
where P(x) follows the binomial distribution
$$P(x) = \binom{n}{x} R^{x} (1 - R)^{n - x}$$
and
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

Example: k-out-of-n Redundancy
A high pressure boiler is fitted with 5 identical pressure relief valves. The pressure inside the boiler is successfully controlled by any three of these valves. If the failure probability of a relief valve is 0.05, compute the reliability of the pressure relief valve system.
Solution: This is a 3-out-of-5 system where n = 5, k = 3 and R = 1 - 0.05 = 0.95.
$$R_s = \sum_{x=k}^{n} \binom{n}{x} R^{x} (1 - R)^{n - x}$$
$$= \frac{5!}{3!\,(5-3)!}(0.95)^{3}(1 - 0.95)^{5-3} + \frac{5!}{4!\,(5-4)!}(0.95)^{4}(1 - 0.95)^{5-4} + \frac{5!}{5!\,(5-5)!}(0.95)^{5}(1 - 0.95)^{5-5}$$
$$= 0.99884$$
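The binomial sum is easy to evaluate directly. This sketch assumes identical, independent components; `k_out_of_n_reliability` is an illustrative name.

```python
from math import comb


def k_out_of_n_reliability(k, n, r):
    """Probability that at least k of n identical, independent components work."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    return sum(comb(n, x) * r**x * (1.0 - r) ** (n - x) for x in range(k, n + 1))


# Boiler example: any 3 of 5 relief valves suffice, each with R = 0.95.
r_s = k_out_of_n_reliability(3, 5, 0.95)
print(f"R_S = {r_s:.5f}")
```

Two limiting cases tie this back to the basic configurations: k = n reduces to a series system of n identical blocks, and k = 1 reduces to a parallel system.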

AVAILABILITY
Definition
The probability that a system or component is performing its required function at a given point in time, or over a stated period of time, when operated and maintained in a prescribed manner (Ebeling, 1997).

AVAILABILITY
Three types of availability measures
1. Inherent, Ai: steady state availability which considers only corrective maintenance (CM)
$$A_i = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$
2. Achieved, Aa: steady state availability which includes both corrective maintenance (CM) and preventive maintenance (PM)
$$A_a = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + \mathrm{MMT}}$$
3. Operational, Ao
$$A_o = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + \mathrm{MMT} + \mathrm{MLDT}}$$
which is also expressed as
$$A_o = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}}$$
where
MTBF = mean time between failure
MTTR = mean time to repair
MTBM = mean time between maintenance
MMT = mean maintenance time
MLDT = mean logistics down time (LDT + ADT)
LDT = logistics delay time
ADT = administrative delay time
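The three availability measures differ only in which downtime contributions appear in the denominator. The sketch below makes that explicit; the function names and all input figures are hypothetical and chosen only to show the ordering of the results.

```python
def inherent_availability(mtbf, mttr):
    """Ai: corrective maintenance only."""
    return mtbf / (mtbf + mttr)


def achieved_availability(mtbm, mmt):
    """Aa: corrective and preventive maintenance."""
    return mtbm / (mtbm + mmt)


def operational_availability(mtbm, mmt, mldt):
    """Ao: maintenance time plus logistics and administrative delays."""
    return mtbm / (mtbm + mmt + mldt)


# Hypothetical figures, all in hours.
mtbf, mttr = 1000.0, 10.0
mtbm, mmt = 500.0, 8.0
ldt, adt = 12.0, 5.0
mldt = ldt + adt  # mean logistics down time, as defined above

print(f"Ai = {inherent_availability(mtbf, mttr):.4f}")
print(f"Aa = {achieved_availability(mtbm, mmt):.4f}")
print(f"Ao = {operational_availability(mtbm, mmt, mldt):.4f}")
```

For the same MTBM and MMT, the operational figure is never higher than the achieved one, because the delay term only adds to the denominator.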

Operational Availability
$$A_o = \frac{\text{UPTIME}}{\text{UPTIME} + \text{DOWNTIME}}$$
UPTIME: standby time, operating time
DOWNTIME:
Logistics Delay Time (LDT): parts availability, waiting for items/services
Administrative Delay Time (ADT): locating tools, setting up test equipment, finding personnel, reviewing manuals
Corrective Maintenance Time (CMT): preparation time, fault location time, getting parts, correcting fault, test and check out
Preventive Maintenance Time (PMT): servicing, inspection, overhaul

THANK YOU

References
Modarres, M., Kaminskiy, M. and Krivtsov, V. (1999). Reliability Engineering and Risk Analysis. Marcel Dekker, New York.
OREDA (2002). Offshore Reliability Data Handbook, 4th Edition. OREDA Participants.
Ebeling, C. (1997). An Introduction to Reliability and Maintainability Engineering. McGraw-Hill Companies, Inc., Boston.
Wasson, C. S. (2006). System Analysis, Design, and Development. John Wiley & Sons, Hoboken, NJ, USA.
