The Top 10 Worst-Performing Alarm Systems in Industry


EMPOWERING PEOPLE. DRIVING ASSETS.

Alarm Management is the current “hot topic” in the process industries. Overloaded and poorly performing DCS alarm systems are common and have been identified as contributing factors in several major accidents, including those at BP Texas City in 2005 and the Texaco Pembroke refinery in the UK in 1994.

To improve an alarm system, it is essential to perform an initial benchmark. Benchmarking a system has many benefits. It provides a basis to compare a system against industry best practices as well as a reference point to measure improvements at the end of an alarm management project. Other benefits include creating solid, data-driven analysis to communicate the state of the alarm system to appropriate stakeholders at a site, and to justify further investment in alarm system improvement. Finally, a significant benefit of a benchmark study is the identification of bad acting alarms. As a standard practice, PAS identifies bad acting alarms in the initial benchmark report. Our experience indicates that breakthrough gains can be realized simply by resolving the bad acting alarms.

There are several different alarm problems to examine, with differing solutions. In performing these analyses, some amazing phenomena have been documented, presented here as examples of how bad things can get.

The solution to alarm problems can be achieved by following the seven-step process developed by PAS. The methodology and all other aspects of alarm management are detailed in The Alarm Management Handbook – A Comprehensive Guide, which is available at www.PAS.com and on www.Amazon.com. The implications of, and solutions to, these problems are presented there in much more detail than can be accomplished in this brief paper. The book is intentionally designed with a very “how-to” focus.

Here are the seven steps to a highly effective alarm management system:

Always-needed initial steps:
Step 1: Develop, Adopt, and Maintain an Alarm Philosophy
Step 2: Collect Data and Benchmark Your Systems
Step 3: Perform “Bad Actor” Alarm Resolution

Steps to implement based on alarm system performance after the first 3 steps:
Step 4: Perform Alarm Documentation and Rationalization (D&R)
Step 5: Implement Alarm Audit and Enforcement Technology
Step 6: Implement Real Time Alarm Management
– and, of course –
Step 7: Control and Maintain Your Improved System

The following examples are taken from analyses from many different process industries, including Refining, Petrochemical, Power, and Pulp & Paper. The facilities are located all over the world. Of course, the specifics are omitted for confidentiality reasons. In every case, there are many “close runners-up” to the “winners” shown, as these are very common problems, spread throughout the biggest names in the processing industry.

All of the analyses shown are based upon the span of control of a single board operating position. Later, when we mention 3,517 alarms in ten minutes, we will be referring to a quantity of alarms presented to a single person, not multiple people.

And now, the top 10 worst-performing alarm systems.

Number 10: Worst Diagnostic Alarm Percentage

This measure is the extent to which alarms indicating malfunctioning instruments are a percentage of the overall alarm load. A high absolute count of such alarms indicates significant maintenance problems with the instruments. A high percentage of instrument diagnostic alarms indicates that important process alarms are likely to be “buried” in the alarms from the malfunctioning instruments. Our “winner” in this category has both high counts and high percentages – the alarm system is dominated by BAD VALUE Alarms.
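As a rough illustration of the diagnostic alarm percentage measure, the sketch below computes the diagnostic share of an alarm log. The record layout (tag, alarm type) and the set of alarm types treated as diagnostic are assumptions made for the example; real DCS journals use their own type names.

```python
from collections import Counter

# Hypothetical alarm types that indicate an instrument malfunction.
DIAGNOSTIC_TYPES = {"BADPV", "IOFAIL"}

def diagnostic_share(events, analysis_days):
    """Return (diagnostic count, percent of total load, diagnostic alarms per day).

    `events` is an iterable of (tag, alarm_type) tuples, one per annunciated alarm.
    """
    counts = Counter(
        "diagnostic" if alarm_type in DIAGNOSTIC_TYPES else "process"
        for _tag, alarm_type in events
    )
    total = sum(counts.values())
    diag = counts["diagnostic"]
    percent = 100.0 * diag / total if total else 0.0
    return diag, percent, diag / analysis_days

# Made-up one-day log: 71 bad-value alarms and 29 process alarms.
events = [("FI101", "BADPV")] * 71 + [("PI205", "PVHI")] * 29
diag, percent, per_day = diagnostic_share(events, analysis_days=1)
print(diag, round(percent, 1), per_day)
```

Reporting both the absolute count per day and the percentage matters here, since the text above treats them as two separate warning signs.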

Worst Diagnostic Alarm Percentage

71% of the entire annunciated alarm system load is from instrument malfunction alarms. They averaged more than 600 such alarms per day during the 24-day analysis.

Figure #10: Worst Diagnostic Alarm Percentage

Commentary and Solutions:

It is surprising to see the amount of “bad measurement” alarm events on most systems. These are often in the hundreds or thousands. If the best control engineers in the company had been specifically asked to design instruments that would have such poor performance, it is unlikely that they could have done it! Yet, we find these on almost every system we analyze.

Since no instrument was designed to be in such a state, every one of these situations can be fixed, and should not be “just tolerated” – as is often the case. They are misconfigured in range, in “measurement clamping,” or there is an installation problem (impulse leads filling up, etc.). The original justification for installing a flow meter probably did not include a specification that it was OK if it didn’t work half of the time! If that had been proposed, the money would have never been spent to buy it in the first place.

For example, a typical problem we see involves out-of-range alarms from transmitters. Long ago, the available instrument sensors had a significant tradeoff between accuracy (significant digits) and range; you could obtain high accuracy only over a small range, probably less than the possible variation of the process.

Modern sensors can generally provide all of the accuracy needed over the entire range that the process is likely to vary. But some engineers continue to follow the older configuration practices and do not consider the consequences of generating lots of Bad Value alarms during conditions such as startup and shutdown. The correct practice is to configure the instrument for the entire possible range of the process value under all conditions, then check the accuracy obtained and, if needed, specify a better sensor.

These situations must be addressed promptly, since an instrument malfunction often removes an identified, rationalized indicator of an abnormal situation from the operator’s view. The time that operators spend confirming the instrument problems reduces their attention to other operator duties.

Generally, the addition of a new instrument must follow a management-of-change methodology, to ensure it is done properly. So does the removal of an instrument, to ensure that it is truly not needed and the removal is done properly. And functionally, the indefinite toleration of a malfunctioning instrument is the same as removing it. If there is an incident, it will be difficult to explain how a relevant instrument was allowed to malfunction for months – to effectively be removed from service – without the appropriate level of review. This is the stuff of fines and lawsuits.

Number 9: Worst Nuisance Alarm Percentage

We usually find that only 10 to 20 different configured alarms make up from 20% to 80% or more of all alarm events in a system. Those most frequent alarms were not originally designed to annunciate hundreds (or thousands) of times per day, but they do. These alarms are called “nuisance” alarms, as they deliver no value to the operator. In fact, their high rate of occurrence becomes a hindrance to the operator’s ability to identify important alarms during a process upset. Addressing them and making them work properly will substantially improve an alarm system and provide immediate and much-needed relief to the operator, and it is not difficult or very time-consuming to do. Our winner in this category:

Worst Nuisance Alarm Percentage

98% of all alarms are from the top ten bad actors. This data covers a one-year period. (See Figure #9 below.)

Figure #9: Worst Nuisance Alarm Percentage

Commentary and Solutions:

Again, in this system, half of the top 10 alarms are related to instrument malfunction, previously discussed. The others are from a pressure switch, a flow meter, and command-disagree signals from several motor-operated valves. Commonly, we see every possible alarm type in the “top 10.”

Chattering and fleeting alarms are alarms that appear and clear faster than would be possible through the application of an appropriate operator action. The top 10 most frequent alarms usually contain several alarms that chatter or are fleeting. These can be addressed in a variety of ways. First, the requirement to have the alarm, and a proper alarm trip point (relative to the normal variation of the process), should be confirmed. Also needed and effective are proper deadband selection (for an analog signal) or proper “mechanical deadband adjustment” (for field switches), and the proper application of alarm delay times. ON-Delay times prevent alarms from being annunciated to the operator until they have been in effect for a certain number of seconds, which can eliminate most chattering or “fleeting” alarms. OFF-Delay times do not delay the initial presentation, but instead turn a string of chattering alarms into a longer-duration single alarm event.

The calculation methods, initial parameter selections, and other cures for several similar problems are beyond the scope of this paper, but are covered fully in the chapter “Common Alarm Problems and How To Solve Them” in The Alarm Management Handbook.

Number 8: Worst Alarm Suppression

In a DCS, you intentionally configure an alarm on a point. You assign values and a priority that cause it to annunciate to the operator. Most DCSs have another setting on the alarm that you can select – an alarm suppression setting. If suppression is “OFF”, then the alarm works normally. If suppression is “ON”, then the alarm behaves as if you never configured it in the first place. We call this situation “Alarm Suppression.” This is a potentially very dangerous setting to manage, and we have very often seen it mismanaged in very hazardous ways. It is fundamentally different from intentionally “de-configuring” the alarm.

DCS alarm systems are notoriously easy to change, and inadequate control over such changes is common. Security settings in most DCSs are insufficiently granular to allow operators to make the kinds of changes that are needed, yet restrict them from making inappropriate alarm system changes. The response to a nuisance alarm is often to suppress it. We have seen critical alarms disabled for months, with no records, no approvals, no repair efforts, and no other actions taken. Paper-based management-of-control systems are rarely effective. The practice of uncontrolled alarm suppression is highly dangerous, and unfortunately common.

A very common DCS, for example, has a much-abused suppression setting called DISABLE. When this is used, the alarm event is still produced for the electronic journal, but not annunciated to the operator. Thus, you can analyze the count of alarm events that have been suppressed vs. those the operator sees. Our winner in this category is:

Worst Alarm Suppression

98% of all alarm events have been suppressed from the operator’s view. (See Figure #8 below.)

Figure #8: Worst Alarm Suppression

Commentary and Solutions:

In this case, the operators have actually made the alarm system almost meet the EEMUA-related “Manageable” rating of ~300 alarms per day. The method, however, was by suppressing the other tens of thousands of alarms – many of the highest priority. This is a hazardous situation.

An ongoing program of nuisance alarm detection and resolution is needed. Alarm settings must be locked down from inappropriate change and undocumented suppression. Needed here (and at many other facilities) is an Alarm Shelving System, in which a programmatic overlay is used to allow for temporary but controlled alarm suppression. A proper shelving system controls, authorizes, and documents suppression of nuisance alarms until they can be repaired. It shows the overall list and relevant status information. It does not allow for such alarms to be “forgotten,” and issues periodic reminders for re-enabling via a “snooze” function.

Number 7: Worst Alarm Priority Distribution

Alarm priority is used to differentiate alarms so that the operator can address the most important ones first. This is particularly important during upset conditions. Best practice for effective operation is for three different priority levels, with an approximate alarm event distribution of 80% LOW, 15% HIGH, and 5% EMERGENCY priority.

Without a consistent and logical method for determining priority, typically the distribution will be more heavily weighted towards the upper end.

Worst Annunciated Alarm Priority Distribution

1% LOW, 80% HIGH, 19% EMERGENCY

Note that we have seen systems configured with as many as 24% of the alarms set to the highest possible priority – which produces a system ineffective at providing truly useful guidance to the operator and actually devalues the importance of those alarms. (See Figure #7 below.)

Figure #7: Worst Alarm Priority Distribution

Commentary and Solutions:

The best solution for an improper alarm priority distribution is Alarm Rationalization. This is the review of an existing alarm system with the intent of ensuring alarms exist only when operator action is needed, ensuring duplications are eliminated, documenting the rationale for each alarm, solving various alarm-related problems, and assigning priorities in a logical and consistent manner.

Number 6: Worst Alarm Daily Rates – Including Suppressed Alarms

In this measure, we can see how easy it is to generate tens of thousands of alarm events per day with an improperly configured alarm system.

Worst Alarm Daily Rates Including Suppressed Alarms

• Worst Average Daily Rate: 26,665 alarms per day (recorded – 1 every 3 seconds)
• Worst Daily Total: 48,803 in one day (>1 every 2 seconds)
• Based on an 18-day analysis period

Figure #6: Worst Alarm Daily Rates Including Suppressed Alarms

Commentary:

Again, a large difference is shown between the alarms produced on the system and the smaller number of alarms presented to the operator due to uncontrolled alarm suppression.

Number 5: Worst Alarm Daily Rates – Annunciated Alarms Only

In this measure, alarm overload of the operator is clearly apparent.

Worst Average Daily Rate – Annunciated Alarms Only

• Worst Average Daily Rate: 10,858 alarms per day (average 1 every 8 seconds)
• Worst Daily Total of this system: 25,324 in one day (average 1 every 4 seconds)

Note that this daily total is not the worst “peak” rate in this paper (shown later); it is the worst daily total of this system, which had the highest average over a long analysis period. The intent is to show that these rates are not “aberrations” – they are sustained conditions.

Figure #5: Worst Average Daily Rate – Annunciated Alarms Only (41 Days)

Commentary:

The rate far exceeds the ability of a single operator to process. The lower-value line indicates the alarm rate if the top 10 most-frequent alarms were eliminated. In this case, the result is still too high, but the difference is quite large.

Number 4: Worst Annunciated Alarm Burst Rate

(Highest count in a 10-minute period presented to the operator)

“Alarms per 10 minutes” is an important measure. The minutes leading up to an accident have been shown many times to be the time when the alarm system is needed most, and performing at its worst. Human factors studies have shown that the maximum number of alarms that can be effectively analyzed and dealt with by an operator is approximately 10 alarms in 10 minutes. Such a rate cannot be handled for hours on end, however.

Even if an alarm system “averages” less than the 10-in-10 threshold rate, the system is at high risk for important alarms to be missed by the operator during the periods the threshold is exceeded. The “average” is a very misleading measurement.
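To make the “alarms per 10 minutes” measure concrete, the sketch below bins alarm timestamps into fixed 10-minute windows and reports the average, the peak, and the share of windows above the 10-in-10 threshold. Fixed (rather than sliding) windows and the function and field names are assumptions chosen for simplicity, and the sample data is made up.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
THRESHOLD = 10  # the "10 alarms in 10 minutes" guideline

def burst_stats(timestamps, start, end):
    """Bin alarm timestamps into fixed 10-minute windows over [start, end)."""
    n_windows = int((end - start) / WINDOW)
    bins = Counter(
        int((t - start) / WINDOW) for t in timestamps if start <= t < end
    )
    # Include empty windows so quiet periods count toward the statistics.
    counts = [bins.get(i, 0) for i in range(n_windows)]
    over = sum(1 for c in counts if c > THRESHOLD)
    return {
        "average": sum(counts) / n_windows,
        "peak": max(counts),
        "pct_over": 100.0 * over / n_windows,
    }

# One hour of data: 30 alarms in the first five minutes, then silence.
start = datetime(2024, 1, 1)
end = start + timedelta(hours=1)
alarms = [start + timedelta(seconds=10 * i) for i in range(30)]
stats = burst_stats(alarms, start, end)
print(stats)
```

In this made-up hour the average is 5 alarms per window, comfortably under the threshold, yet one window holds 30 alarms. That gap is why the peak and the share of windows over the threshold are reported alongside the average.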

Worst Annunciated Alarm Burst Rate per 10 Minutes

The worst alarm burst rate consists of 3,517 alarms in a single 10-minute period. About 77% of the 10-minute periods in this 102-day span exceeded 10 alarms.

Figure #4: Worst Annunciated Alarm Burst Rate per 10 Minutes

Commentary:

This alarm system is much more of a hindrance than a help to the operator.

Number 3: Worst Average Time-In-Flood Percentage

Alarm flood analysis is more sophisticated than just “Alarms per 10 minutes.” We define an alarm flood event as beginning when the rate exceeds 10 alarms in 10 minutes, and ending only when the system rate declines to fewer than 5 alarms in 10 minutes – a manageable amount. Alarm floods can make a difficult process situation much worse. Alarm floods can go on for hours or days. For that overall period of time, usually during system upsets, the alarm system can be much more of a hindrance than a help to the operator. The winner:

Worst Average Time-In-Flood

This alarm system is in flood for >89% of the time (42-day analysis).

Figure #3A: Worst Average Time-In-Flood Percentage

Alarm Flood Analysis – Petrochemical System

Number of Floods                                            51
Floods Per Day                                              1.2
Total Alarms in All Floods                                  294,311
Average Alarms per Flood                                    5,771
Highest Alarm Count in a Flood                              215,313
Percentage of Alarms in Floods vs. All Annunciated Alarms   99.7%
Total Duration of Floods in Hours                           899.33
Percentage of Time Alarm System is in a Flood Condition     89.2%

Figure #3B: Worst Average Time-In-Flood Percentage
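The flood definition used in this section (a flood starts when a 10-minute count exceeds 10 and ends only when it falls below 5) can be sketched as a simple hysteresis scan over per-window alarm counts. The use of fixed 10-minute windows and the function names are assumptions for illustration; this is not a description of how any particular analysis product implements it.

```python
def find_floods(counts, start_above=10, end_below=5):
    """Scan per-window alarm counts and return a list of floods.

    Each flood is (first_window, end_window_exclusive, alarms_in_flood).
    A flood begins at a window with more than `start_above` alarms and ends
    at the first later window with fewer than `end_below` alarms.
    """
    floods = []
    in_flood = False
    for i, c in enumerate(counts):
        if not in_flood and c > start_above:
            in_flood, begin, total = True, i, 0
        if in_flood:
            if c < end_below:
                floods.append((begin, i, total))
                in_flood = False
            else:
                total += c
    if in_flood:  # flood still running at the end of the data
        floods.append((begin, len(counts), total))
    return floods

def time_in_flood_percent(floods, n_windows):
    """Share of all windows that fall inside a flood."""
    return 100.0 * sum(end - begin for begin, end, _ in floods) / n_windows

# Made-up counts for eight consecutive 10-minute windows.
counts = [2, 12, 8, 6, 3, 0, 25, 4]
floods = find_floods(counts)
print(floods, time_in_flood_percent(floods, len(counts)))
```

Note that the windows holding 8 and 6 alarms keep the first flood alive even though they are below the start threshold. The separate, lower end threshold is what distinguishes flood analysis from a plain count of windows above 10.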

Number 2: Longest Single Alarm Flood

Floods can be minor to severe, infrequent to frequent. During a severe flood, the alarm system could possibly be more help to the operator if it was actually turned off, rather than being such a major distraction.

Longest Flood: 106.8 hours (4.4 days)

Figure #2: Longest Single Alarm Flood

Commentary:

During this longest 4.4-day flood, more than 370,000 alarms were presented to the operator – an average rate of about 2,200 alarms per hour, or more than 370 per 10 minutes. This is significantly higher, and of longer duration, than the alarm system performance leading up to the well-researched and written-about Texaco Pembroke refinery explosion. This alarm system is much more of a hindrance than a help to the operator.

The curing of significant alarm flood problems gets into the realm of real-time alarm adjustment techniques. This must be overlaid on top of an alarm system that has already had nuisance alarm elimination and rationalization. Real-time techniques readily address the problems of state-based alarm adjustment and flood suppression.

Most alarms in a process unit pertain to the normal operating state of a piece of equipment. However, equipment often has several normal, but differing, operating states. A few common state examples include:

• Running
• Not Running
• Use Feed “A” or “B”
• Make Product “C” or “D”
• Full Rates
• Half Rates

Algorithms are created for the automatic monitoring and detection of various plant states, and when detected the proper alarm settings for those states are automatically put into effect. Since most alarm floods arise from the unexpected shutdown of certain equipment, the proper settings for those events can be determined and placed into effect in real time when such events are detected.

For example, when a compressor trips, there are usually many diagnostic alarms immediately produced. These are a distraction to the operator at that time; the important alarms are from the remainder of the system as the upstream and downstream effects of the shutdown must be properly managed. Only later are the diagnostics needed, for restart of the compressor.

Real-time alarm management is a sophisticated technology, and the implementation involves careful consideration of many factors. While beyond the scope of this article, they are covered in depth in The Alarm Management Handbook.
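The idea of state-based alarm adjustment can be illustrated with a small lookup keyed on a detected equipment state. Everything named here (the compressor tags, the trip points, the two-signal state test) is hypothetical and chosen only to show the shape of the technique; a real implementation needs the careful review this paper describes.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class AlarmSetting:
    enabled: bool
    trip_point: Optional[float] = None

# Hypothetical settings table: equipment state -> {alarm tag: setting}.
SETTINGS_BY_STATE: Dict[str, Dict[str, AlarmSetting]] = {
    "RUNNING": {
        "K101_LUBE_OIL_PRESS_LO": AlarmSetting(True, 1.5),
        "K101_DISCH_PRESS_LO": AlarmSetting(True, 8.0),
    },
    "NOT_RUNNING": {
        # These alarms are expected when the machine is down, so they would
        # only distract the operator while the trip is being managed.
        "K101_LUBE_OIL_PRESS_LO": AlarmSetting(False),
        "K101_DISCH_PRESS_LO": AlarmSetting(False),
    },
}

def detect_state(motor_running: bool, speed_rpm: float,
                 min_speed_rpm: float = 500.0) -> str:
    """Assumed state test: two agreeing signals are needed to call it RUNNING."""
    if motor_running and speed_rpm >= min_speed_rpm:
        return "RUNNING"
    return "NOT_RUNNING"

def settings_for(motor_running: bool, speed_rpm: float) -> Dict[str, AlarmSetting]:
    """Return the alarm settings to put into effect for the detected state."""
    return SETTINGS_BY_STATE[detect_state(motor_running, speed_rpm)]

after_trip = settings_for(motor_running=False, speed_rpm=0.0)
print(after_trip["K101_LUBE_OIL_PRESS_LO"].enabled)
```

Keeping the settings in one table per state makes each state's alarm configuration reviewable as a whole, which fits the requirement that real-time techniques sit on top of an already rationalized alarm system.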

Number 1: The Worst-Performing Individual Alarm

The winner of the Worst Performing Individual Alarm: A Low Pressure Alarm from a Pressure Controller (note: an Analog Signal, not a pressure switch)

Our winner very steadily produced an alarm rate averaging over 1,400 alarms per 10 minutes for a period of 41 hours (over 200,000 in a single day). This is 142 alarms per minute, or about 2.3 per second. In combination with three other nuisance alarms, this system produced a single-day peak of 208,311 alarms, the highest single-day total we have encountered (although not by much).

Figure #1: The Worst Performing Individual Alarm

Bonus Number 0: A Special Case

The Highest Sustained Alarm Rate Ever Recorded

Here is a special case that was not included in any of the above examples. In a new system being brought online, there was a grounding problem with the Safety Instrumented System’s PLC. This caused over two weeks of extremely high-rate diagnostic alarms, averaging over 170,000 per day. The alarms were not configured to be visible to the operator.

The impact on the process network devices associated with extremely high alarm rates can be severe. Historian capacity overflows are a common problem. In extreme cases, significant network or server bandwidth can be consumed, control response slowed, and needed information can be lost.

It was the installation of our PAS alarm analysis software, which successfully captured and analyzed this event data, that identified this problem and led to the proper diagnosis and repair.

• Average: 173,657 alarms per day
• Peak: 180,488 alarms per day (> 2 per second)

Summary:

Overloaded and malfunctioning alarm systems are common throughout industry. The problems can be identified, isolated, and solved through proven alarm management methodologies, such as those described in The Alarm Management Handbook, coauthored by Bill Hollifield and Eddie Habibi of PAS.

It is unfortunate that bad alarm systems continue to negatively impact the profitability, safety, and environmental performance of processing plants worldwide. The good news is that solutions are available today to improve the alarm system and to help bring some sanity to the control room.

About the Author: Bill Hollifield is PAS’s leading expert in Alarm Management, with multi-company, international experience in all aspects of the subject. At PAS, Bill is Principal Consultant responsible for the Alarm Management work processes, intellectual property and software product directions. He is a voting member of the ISA SP-18 Alarm Management committee, and co-author of the new ISA book, Alarm Management: Seven Effective Methods for Optimum Performance.
