Crow AMSAA PDF
And Their Use in Interpreting Meridian Energy Ltd’s Main Unit Failure Data
by
Nigel Comerford
Areva T&D New Zealand
Purpose
To explore the Crow/AMSAA model and its use in monitoring and projecting reliability
growth or performance of existing plant, and to apply this technique to the main unit
failures of Meridian Energy’s generating system. This allows reliability improvement to
be confirmed and quantified, operational events to be correlated against reliability, and
failure rates to be forecast.
1.0 Executive Summary
Meridian Energy is a New Zealand state owned enterprise, which operates, among other
assets, 38 hydro machines in a deregulated electricity market. The Forced Outages, or
unit failures, of this system are the subject of this paper.
The key objectives are to explore the Crow/AMSAA model and its use in monitoring and
projecting reliability growth or performance of existing plant, and to apply this technique
to the main unit failures of Meridian Energy’s generating system. This allows reliability
improvement to be confirmed and quantified, operational events to be correlated against
reliability, and failure rates to be forecast.
Although Forced Outages within this industry do not have a great effect on availability,
they do cost money and expose the business to risk.
The Crow/AMSAA technique involves plotting, most commonly, cumulative failures vs
cumulative time on a log-log scale, with the slope of the resulting straight line indicating
improving, deteriorating, or constant reliability. Instantaneous failure rate can be
determined and, due to the straight-line nature of the plots, forecasts can be made of
failures into the future. This method handles mixed failure modes, so it is suitable
for the complex nature of the generating units, which at a system level exhibit random
failures of mixed modes.
A review of the last five years of Forced Outage data has determined the average
estimated cost of a Forced Outage at $6500 per event.
Crow/AMSAA plots of the last nine years of data show that although on average the
reliability of the system is constant, there was a deteriorating situation up until the turn
of the century, followed by consistent year-on-year improvement; that is, we were at a
level, then deteriorated, and have since improved to be at our best now.
This performance, mapped against operational history, shows that the improvement
started after the unsettled period of the late 90’s and the completion of the Automation &
Remote Control project, and seems to have been driven by the Event Analysis system. The
analysis also shows that reliability has improved greatly even though the number of
machine starts and the amount of generation have both increased slightly and the direct
maintenance spend has dropped.
A sound management tool has been proven which allows Forced Outages to be forecast,
so that their effect on availability and unplanned maintenance costs can be known.
Combining this information with the cost of each Forced Outage shows that, had there
been no improvement on the performance of the 1999 – 2000 period, we could have
expected to see as many as 150 extra Forced Outages in the last year alone. The reduction
in Forced Outages that has been achieved equates to savings of $975,000 / year.
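The savings figure follows directly from the two numbers quoted in this summary. As a minimal sketch of that arithmetic (the variable and function names are invented for illustration and are not part of the original analysis):

```python
# Figures quoted in the executive summary.
COST_PER_FORCED_OUTAGE = 6500    # average estimated cost per event, in dollars
AVOIDED_OUTAGES_PER_YEAR = 150   # extra outages expected had the 1999-2000 trend continued


def annual_saving(avoided_outages, cost_per_outage):
    """Value of the Forced Outages that did not occur in one year."""
    return avoided_outages * cost_per_outage


saving = annual_saving(AVOIDED_OUTAGES_PER_YEAR, COST_PER_FORCED_OUTAGE)
print(f"Estimated saving: ${saving:,.0f} per year")  # Estimated saving: $975,000 per year
```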
3.3 Why should we reduce Forced Outages & how will measuring Forced
Outage Rate in this manner help?
Forced Outages do not have a great impact on availability; however, they do impact the
system’s exposure to risk, they cost money, and they generally indicate the overall health
of the system.
The fewer Forced Outages there are, the less exposure there is to ‘Revenue Opportunity
Cost’, the cost of lost generation due to a unit’s sudden unavailability and the system’s
inability to pick up the generation, and the cost of market-imposed penalties for repeated
failure to deliver offered generation.
Overall, the health of the generating system is perceived to be better when fewer Forced
Outages are occurring. It is seen as an indicator of the general health of the plant: the
fewer Forced Outages there are, the fewer high priority alarms, the fewer corrective
maintenance work orders, and the fewer unknown plant conditions. Our probable
exposure to “the big one”, that 1-in-200 Forced Outage event that causes considerable
plant damage, cost, or injury, is lower when our failure rate is lower.
Effective measuring, allowing transparency of reliability, cusps in the failure rate, and
Forced Outage forecasting and quantification will allow better management decisions to
4.1 History
Reliability Growth Modeling has its origins in the tracking of improvements in
manufacturing times and has been exhaustively demonstrated as a true log-log
phenomenon. In 1936 T. P. Wright pioneered the idea that improvements in the time to
manufacture an airplane could be described mathematically. Wright’s findings showed
that, as airplanes were produced in sequence, the direct labour input per plane decreased
in a mathematical pattern that forms a straight line when plotted on log-log paper [4].
Learning curves were used extensively by General Electric, and a GE reliability engineer
made log-log plots of cumulative MTBF vs cumulative time, which gave a straight line
(Duane 1964). James Duane developed a deterministic postulate for monitoring failure
rates of more complex systems over time using a log-log plot with straight lines.
At the US Army Materiel Systems Analysis Activity (AMSAA) during the mid 1970’s,
Larry Crow converted Duane’s postulate into mathematical and statistical proof via
Weibull statistics in MIL-HDBK-189 [3].
By the end of the seventies there were several dozen different growth models in use. The
Aerospace Industries Association Technical Management Committee studied this array of
methods as applied to mechanical components and concluded the Crow/AMSAA model
was the best. The U.S. Air Force study, conducted by Dr Abernethy [1], including both
mechanical and electrical controls, reached the same conclusion.
4.2 Application
Although the Crow/AMSAA model has its base in measuring reliability growth, defined
as “The positive improvement in a reliability parameter over a period of time due to
changes in product design or the manufacturing process”, it is the wider definition of
reliability growth management that is more relevant for our purposes: “The systematic
planning for reliability achievement as a function of time and other resources, and
controlling the ongoing rate of achievement by reallocation of resources based on
comparison between planned and assessed reliability values”.
With reliability growth management in mind the Crow/AMSAA plot has more recently
found other important applications. Many industries are routinely doing Crow/AMSAA
analysis and it is considered best practice for tracking fleets of units to trend reliability,
Table 4.1
Then by plotting the cumulative failures over cumulative time, as in Figure 4.1, and
assuming 1 to 1 scale log paper is used, the following can be determined.
[Figure 4.1 plot: Cumulative Failures (failure number, 0.1 to 1000) vs Cumulative Time in
days (1 to 1000) on log-log axes. The fitted line rises 8.1 cm over a run of 10 cm, giving
Beta = 0.81, and has Lambda = 0.29.]
Figure 4.1
Cumulative Failure Events at time t is represented by n(t). The scale parameter λ is the
intercept on the y-axis of n(t) at t = 1 unit. With the data plotted in Figure 4.1, λ is read
off as 0.29. The slope β can be measured graphically provided log paper with a 1 x 1 scale
is used; in the above case β is 8.1 / 10 = 0.81.
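Where the plotted points are available numerically, the same two parameters can be estimated by fitting a straight line to the logs of the data instead of measuring the paper plot. The sketch below is illustrative only. It uses ordinary least squares on (log t, log n), which is an assumption of this example and not necessarily the fitting method used by the software in this project. Its input points are hypothetical, generated from λ = 0.29 and β = 0.81 and not taken from Table 4.1, and the function name fit_crow_amsaa is invented for the example.

```python
import math


def fit_crow_amsaa(times, counts):
    """Least-squares straight line through (log10 t, log10 n).

    Returns (lam, beta) for the model n(t) = lam * t**beta.
    """
    xs = [math.log10(t) for t in times]
    ys = [math.log10(n) for n in counts]
    k = len(xs)
    x_mean = sum(xs) / k
    y_mean = sum(ys) / k
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                          # slope of the log-log line
    lam = 10 ** (y_mean - beta * x_mean)      # value of n(t) at t = 1
    return lam, beta


# Hypothetical data (NOT Table 4.1): failure numbers 1..24 placed exactly
# on the line read from Figure 4.1, so the fit should return its parameters.
counts = list(range(1, 25))
times = [(n / 0.29) ** (1 / 0.81) for n in counts]

lam, beta = fit_crow_amsaa(times, counts)
print(round(lam, 2), round(beta, 2))  # 0.29 0.81
```

With real failure data the points scatter about the line, so the fitted values depend on the method chosen; a regression fit like this one corresponds most closely to drawing the line by eye.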
The model’s intensity function ρ(t) measures the instantaneous failure rate at each
cumulative time point. The intensity function is:
$$\rho(t) = \lambda \beta t^{\beta - 1}$$
The log of the cumulative failure events n(t) versus the log of cumulative time is a linear
plot if the model applies:
$$n(t) = \lambda t^{\beta}$$
The reciprocal of ρ(t) is the instantaneous MTBF, just as the reciprocal of the cumulative
failure rate C(t) = n(t)/t is the cumulative MTBF; either can be modeled as an alternative.
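These quantities are straightforward to compute once λ and β are known. The following is a minimal sketch that assumes C(t) denotes the cumulative failure rate n(t)/t (the usual definition, though it is not spelled out here), and uses the Figure 4.1 values λ = 0.29 and β = 0.81. The function names and the choice of t = 236 days are for illustration only.

```python
LAM = 0.29   # scale parameter read from Figure 4.1
BETA = 0.81  # slope read from Figure 4.1


def cumulative_failures(t, lam=LAM, beta=BETA):
    """n(t) = lam * t**beta"""
    return lam * t ** beta


def intensity(t, lam=LAM, beta=BETA):
    """Instantaneous failure rate: lam * beta * t**(beta - 1)"""
    return lam * beta * t ** (beta - 1)


def cumulative_rate(t, lam=LAM, beta=BETA):
    """Cumulative failure rate, assumed here to be n(t) / t."""
    return cumulative_failures(t, lam, beta) / t


t = 236.0  # cumulative days, chosen for illustration
instantaneous_mtbf = 1 / intensity(t)
cumulative_mtbf = 1 / cumulative_rate(t)

print(f"instantaneous MTBF: {instantaneous_mtbf:.1f} days")
print(f"cumulative MTBF:    {cumulative_mtbf:.1f} days")
```

Under these definitions the instantaneous rate is always β times the cumulative rate, so with β below 1 the instantaneous MTBF is the longer of the two, which is the signature of improving reliability.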
Goodness of fit is indicated by the proximity of the points to a straight line. Curvature or
discontinuities may be observed, but this is part of the process. As improvement in the
reliability of the modeled system becomes apparent, a cusp or corner should appear on the
plot at the point of change. The straight line should be fit from this point onwards to
model the latest process.
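Fitting from the cusp onwards can be sketched as follows. Everything in this example is hypothetical: the data are synthetic (a slope of 1.2 up to day 100 and 0.6 afterwards, with non-integer counts for simplicity), the cusp is taken as already identified by eye, and the least-squares fit is an assumption of the example, not a description of the software used.

```python
import math


def fit_loglog(times, counts):
    """Least-squares fit of n(t) = lam * t**beta on log-log axes."""
    xs = [math.log10(t) for t in times]
    ys = [math.log10(n) for n in counts]
    k = len(xs)
    x_mean, y_mean = sum(xs) / k, sum(ys) / k
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    lam = 10 ** (y_mean - beta * x_mean)
    return lam, beta


CUSP_TIME = 100.0  # hypothetical cusp, read off the plot by eye


def synthetic_count(t):
    """Hypothetical process: deteriorating before the cusp, improving after."""
    if t <= CUSP_TIME:
        return 0.1 * t ** 1.2
    n_at_cusp = 0.1 * CUSP_TIME ** 1.2
    return n_at_cusp * (t / CUSP_TIME) ** 0.6


times = [10.0, 20.0, 40.0, 70.0, 100.0, 150.0, 220.0, 300.0, 400.0]
counts = [synthetic_count(t) for t in times]

# Fit over all points, then over the points from the cusp onwards only.
_, beta_all = fit_loglog(times, counts)
latest = [(t, n) for t, n in zip(times, counts) if t >= CUSP_TIME]
_, beta_latest = fit_loglog([t for t, _ in latest], [n for _, n in latest])

print(f"beta over all data:   {beta_all:.2f}")
print(f"beta from cusp onward: {beta_latest:.2f}")
```

The fit over all the data blends the two regimes and so understates the recent improvement, which is why the line is fitted from the cusp onwards when the latest process is the one of interest.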
The equation $n(t) = \lambda t^{\beta}$ can be used to predict future failures:
$$t = \left(\frac{n}{\lambda}\right)^{1/\beta}$$
For example, to continue Table 4.1, the 25th failure will occur at
(25/0.29)^(1/0.81) = 245 days; therefore the next failure will occur in 245 - 236 = 9 days.
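The worked example can be reproduced in a few lines. This is a minimal sketch using the values quoted in the text (λ = 0.29, β = 0.81, and a current cumulative time of 236 days); the function name is invented for illustration.

```python
def time_of_failure(n, lam, beta):
    """Cumulative time at which the n-th failure is expected: (n / lam)**(1 / beta)."""
    return (n / lam) ** (1 / beta)


LAM, BETA = 0.29, 0.81   # parameters read from Figure 4.1
CURRENT_TIME = 236       # cumulative days quoted in the worked example

t25 = time_of_failure(25, LAM, BETA)
days_to_next = round(t25) - CURRENT_TIME

print(round(t25))     # 245
print(days_to_next)   # 9
```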
Ability to plot multiple datasets and their fit lines on one graph
Ease of import of data from common packages such as MS Excel
A number of software packages were trialled for this project; however, ‘WinSMITH
Visual’ was chosen as it fulfilled all the above criteria.
All of these types of failures are classified by ‘North American Electric Reliability
Council’ codes, such as U1, U2, U3, and SF, and all of them can arise from a mixture of
failure mechanisms and modes. In addition to the expected electrical & mechanical
failure modes, causes such as PLC code, operational policy, human error, and faults on
the connected transmission grid operator’s system that cause unit failures are all included.
Direct Costs
  Average cost of the Initial Callout Work Order, based on the
  past 12 months of callouts.                                          $330
  Average cost of the Follow-up Work Orders to actually repair
  the fault, based on the past 12 months of Forced Outages.            $815
  Average cost per Forced Outage, over the last five years, of
  the big failure events, of which there have been three: MAN03
  Baffle Plate, AVI Battery Bank, and BEN03 CB.                        $440
Indirect Costs
  Average cost of Asset Coordinators’, Tactical Engineers’, and
  AREVA Engineers’ time for RCFA per Failure Event, based on a
  subjective estimate @ $100/Hr.                                      $1000
  Average cost of ROC per Forced Outage, based on data from the
  last five years.                                                    $3430
Total                                                                 $6015
This estimate is sensitive to the accuracy of the data within our CMMS and is drawn from
a number of systems, as there is no consistent effort to record the true actual cost of each
Forced Outage. It is also of note that the effect of ROC seems much less in recent years
than, say, four or five years ago. This cost could be further inflated by less tangible costs,
such as the cost of disruption to scheduled work due to a Forced Outage or the increased
personal risk associated with responding to a callout after-hours.
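The component figures above can be totalled, and the weight of each component checked, with a short calculation. This is a minimal sketch; the dictionary keys are shortened labels invented for the example, while the dollar values are those listed above.

```python
# Average cost per Forced Outage, by component (dollars), as listed above.
cost_components = {
    "initial callout work order": 330,
    "follow-up repair work orders": 815,
    "big failure events (averaged)": 440,
    "RCFA engineering time": 1000,
    "revenue opportunity cost (ROC)": 3430,
}

total = sum(cost_components.values())
print(f"Total per Forced Outage: ${total}")  # Total per Forced Outage: $6015

# Share of the total carried by each component, largest first.
for name, cost in sorted(cost_components.items(), key=lambda kv: -kv[1]):
    print(f"{name:32s} {cost / total:6.1%}")
```

ROC is the largest single component, so the estimate is most sensitive to how ROC is valued, which is relevant given the observation above that its effect has lessened in recent years.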
Figure 8.1
The failure data from the last nine financial years, from July 1995 to the end of June 2004,
has been plotted above in Figure 8.1, using the IEC method. The plot shows a beta value
of 0.969, which indicates that over this period, on average, reliability has been relatively
constant, i.e. no improvement or degradation. However, when the discrete points are
examined, cusps can be seen, indicating periods where improvements have been made and
periods where degradation in failure rate is evident.
The Cumulative Failure Rate is 0.01874 and the Instantaneous Failure Rate is 0.01751.
These two figures are very close, again indicating little overall improvement, on average,
in the last nine years.
Extrapolation of the plot shows that if failures were to continue at the above instantaneous
rate there would be 153 failures in the upcoming financial year.
Figure 8.2
The Cumulative Failure Rate is 0.0207 and the Instantaneous Failure Rate is 0.02485,
again indicating deterioration over time.
Extrapolation of the plot shows that if failures were to continue at the rate they were
around the end of 1999, that is, with an increasing rate of occurrence and no added effort
to reduce our failure rate, then there would be 240 failures in the upcoming financial year.
Figure 8.5
Figure 8.6
The failure data from July 2003 to the end of June 2004 has been plotted above in Figure
8.6. The plot shows a beta value of 0.562, which indicates that over this period, on
average, reliability has been improving, that is the rate of occurrence of Forced Outages is
reducing.
The Cumulative Failure Rate is 0.01872; however, the Instantaneous Failure Rate is
0.01016, which indicates that the rate at which failures are occurring now is less than the
average rate, which suggests an improving situation.
Extrapolation of the plot shows that if failures were to continue at the above instantaneous
rate there would be 87 failures in the upcoming financial year. Continuing this
extrapolation suggests that if the current rate of improvement were to continue, then the
number of Forced Outages occurring each year would reduce by about 3 to 4 per year;
however, there would obviously be a theoretical floor once the inherent reliability of the
system is met.
[Figure 9.1 plot: annual Beta value (red line) against a timeline of operational events.
Events annotated on the plot, in no recoverable order: DGA technique in use (started in
the 80's); thermography in use from about '93; lube oil analysis in place since early 90's;
VA started early 90's; PDA started; outsourcing of maintenance started 1995; Asset
Managers started to move from the stations to centralised offices; ECNZ finishes, MEL;
MDT restructuring begins; ARC Project started; ARC Project complete; PRISM started;
PRISM finished; MEMFOS started; TKA overhaul; EA RCFA process started; battery
condition monitoring started; condition monitoring of CB's started; UW 2MTT came
on-line; MAN lube oil analysis started; MAN VA of aux plant started; MW lube oil
analysis started; VA of UW & MW aux plant started; BN racks; MAN 1/2 Life Project
started.]
Figure 9.1
[Figure 9.2 chart: "MEL Operational Parameters / Beta Value Comparison". Scaled value
(0 to 1,800,000) plotted by month, '95 to '04, for Sys MW, Unit Starts, Mtce Spend, and
Beta Value, each with a polynomial trend line.]
Figure 9.2
Figure 11.1
If further reduction in the number of Forced Outages is desired, bearing in mind the potential
savings, then improvement can be made either by further controlling the ongoing rate of
achievement through reallocation of existing resources, based on comparison between planned
and assessed reliability values, or by a greater direct spend on maintenance. However, any
program or physical change to plant designed to reduce the failure rate needs to be done for
less cost than the combined value of the ongoing Forced Outages it is expected to save.
References
1. Abernethy, R, ‘The New Weibull Handbook, Fourth Edition’, Robert B Abernethy, 2000
2. Broeman W, et al, ‘Technical Report No. TR-652 AMSAA Reliability Growth
Guide’, AMSAA, 2000
3. ‘Military Handbook – 189 Reliability Growth Management’, US Department of
Defense, 1981
4. Barringer P, ‘Problem of the Month, Nov 2002 – Crow/AMSAA Reliability Growth
Plots’, www.barringer1.com, 2003
5. O’Connor, P, ‘Practical Reliability Engineering, Fourth Edition’, Wiley, 2002