Actionable Alarming: Make Alarms & Situational Awareness Your Closest Allies To Maximize Productivity & Minimize Downtime
Executive summary
Effective, actionable alarming has recently become one of the most closely
examined aspects of automation systems, demonstrating very high ROI
and a significant reduction in avoidable mistakes.
In many cases a faulty alarm design, rather than an unstable process, is the
root cause of untrustworthy and nuisance alarm notifications. The presence of
large amounts of alarm ‘noise’ in the system leads to a lack of operator trust in
the system, resulting in genuine abnormal situations being ignored or obscured
by that noise.
Content in this paper is derived from guidelines and standards found in ANSI/ISA [2], EEMUA [3],
NAMUR [4], API [5] and IEC [6]. These resources offer valuable information to those interested
in alarm management, but can lack practical examples of how alarms actually work in an
operation and the consequential results. Instead of quoting the guidelines, this paper will
guide you through practical examples and demonstrate situational awareness scenarios and
contemporary alarm management methodologies.
What Caused the Increase in Alarm Notifications?
The continuous evolution of automated control systems from relay boards to Programmable
Logic Controllers (PLCs) to smart devices has dramatically increased the number of configured
alarms and their visual impact on a system. Consequently, today’s typical control systems now
have up to five times as many configured alarms for each operator to manage as they did just a
few decades ago.
During the evolution of control systems technologies, the automation industry began
implementing concepts which expanded in scope to become what is now called the “Internet of
Things.” An aspect of this scenario is prevalent in the manufacturing world; nearly every device
added to a plant in the past decade is “smart,” with smart drives, smart transmitters, and smart
controllers in place.
[1] www.automationworld.com/alarm-management-opinions
[2] ANSI/ISA-18.2-2009 Management of Alarm Systems for the Process Industries - www.isa.org/templates/one-column.aspx?pageid=111294&productId=116626
[3] EEMUA 191 - Engineering Equipment & Materials Users’ Association “Alarm Systems, a guide to Design, Management and Procurement” - www.eemua.org/News/Better-alarms-handling.aspx
From a business perspective, going from “analog” to “smart” makes a lot of sense since, for a
relatively small investment, things like smart drives can save up to 50 percent in energy cost.
With that, additional data can be made available to the Process Visualization: Current, Torque,
Speed, Interlocks, Energy, Deviation, Frequency, Supply Voltage, Trip Alarms, Communication
Status, and Diagnostics.
But while this “Smartness” is good, it leads to an increased volume of data from devices. Bringing
more information to the attention of the operator is not inherently bad, but a significant portion is
considered “auxiliary information” rather than actionable alarm information.
New System Design Considerations
An Alternative Approach for Viewing Data vs. Actionable Alarms
When bringing data from devices into the system, the following questions should be asked:
1. What information does the operator need in order to run the equipment?
2. How should the operator be properly alerted to abnormal situations without introducing
a source of constant distraction?
How these questions are answered will present an opportunity to rationalize alarms during the
design process. This will ensure the operator is not overloaded with irrelevant information.
Let’s look at an example of a pump motor at a processing plant. We want the operator to be
alerted to issues pertaining to significant, temperature-related abnormalities.
Any aberration in Voltage, Frequency, Speed, and Current can affect the motor, causing it to
overheat. An overheated motor can be caused by a malfunctioning gear box, a drop in voltage,
high frequency, high torque, and other factors, but the end result is heat.
The motor temperature is detected by a PTC thermistor sensor. Motor temperature, in this
example, is the only factor that should have a configured alarm, since heat is one of the main
causes of permanent damage to the motor by melting the windings.
So, we have identified a single critical situation (excessive motor temperature) that can be
caused by a number of different factors. The other data points provide context and help
clarify the nature of the cause; in this case they are auxiliary information. Data such as
Current, Frequency, and other operational values should be presented to the operator only as
supplementary information, not as alarm events.
When the alarm is triggered, the operator can readily discern the anomaly (temperature
deviation from normal), then visualize the auxiliary information that provides context for the
motor’s increase in temperature. If he does not see the relationship between the alarm and
auxiliary data, he can notify the maintenance department to investigate this matter.
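The pump motor example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MotorSnapshot` type, the `evaluate` function, the field names, and the 130 °C limit are all hypothetical and do not come from any product or standard. It assumes a temperature value in degrees Celsius is available and shows a single configured alarm (temperature) with every other signal carried as auxiliary context.

```python
from dataclasses import dataclass

# Hypothetical limit for illustration only; a real value comes from the motor data sheet.
TEMP_HIGH_LIMIT_C = 130.0


@dataclass
class MotorSnapshot:
    """One reading of the pump motor signals (hypothetical field names)."""
    temperature_c: float
    current_a: float
    frequency_hz: float
    voltage_v: float
    torque_nm: float


def evaluate(snapshot: MotorSnapshot):
    """Return (alarm_message_or_None, auxiliary_context).

    Only temperature can raise an alarm. The remaining signals are never
    alarmed; they are returned as context for the operator.
    """
    alarm = None
    if snapshot.temperature_c > TEMP_HIGH_LIMIT_C:
        alarm = f"Pump motor temperature high: {snapshot.temperature_c:.1f} C"
    auxiliary = {
        "current_a": snapshot.current_a,
        "frequency_hz": snapshot.frequency_hz,
        "voltage_v": snapshot.voltage_v,
        "torque_nm": snapshot.torque_nm,
    }
    return alarm, auxiliary


hot = MotorSnapshot(temperature_c=141.0, current_a=38.2, frequency_hz=62.0,
                    voltage_v=395.0, torque_nm=210.0)
alarm, context = evaluate(hot)
print(alarm)
print(context)
```

The design choice mirrors the text: one actionable alarm per critical situation, with the possible causes shown alongside it rather than alarmed individually.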
The simple steps outlined here demonstrate a natural rationalization process that should occur
for each alarm configured in the system. Let’s outline these steps:
[4] NAMUR - Normenarbeitsgemeinschaft für Meß- und Regeltechnik in der chemischen Industrie - NA 102:2008 – Alarm Management - www.namur.net/nc/en/recommendations-and-worksheets/current-nena.html?tx_nena_pi1%5Bda%5D=138
[5] API - American Petroleum Institute - Pipeline SCADA Alarm Management – API RP 1167. - global.ihs.com/doc_detail.cfm?&item_s_key=00562674&input_doc_number=API RP 1167
[6] IEC – International Electrotechnical Commission - IEC 62682 Ed. 1.0 b:2014 - Management of alarm systems for the process industries - webstore.iec.ch/webstore/webstore.nsf/ArtNum_PK/50243
Determining which information is relevant to the operator can be extremely challenging. The
knowledge for making such decisions may be shared across multiple disciplines, such as
engineers, supervisors, operators and maintenance personnel. Further compounding the
problem is the likelihood that the plant is in an early startup phase while these important
decisions are being made.
It is paramount to ensure that further validation of these decisions is confirmed and fine-tuned
before they are implemented, in order to eliminate erroneous settings which may result in
unnecessary noise for operators. Experience shows that failure to do this can be very costly,
and even more costly to correct after the plant goes live.
What is Situational Awareness in Context with Actionable Alarming?
Under normal conditions, the goal of an operator is to optimize production. However, the more
a system veers off target due to abnormalities or disruptions, the more the operator will be
diverted from that goal.
Figure 2
Deviation from a target against operations goals. (Diagram labels: Deviation From Target
Operating Region, Off Spec Product, Increased Cost, Equipment Damage, Environmental Release,
No Production, Explosion, Loss of Life.)
The farther a system goes off target, the more the operators will have to balance and refocus
efforts to keep the system running. This diversion detracts from the primary goal to optimize
production.
Looking across the broad spectrum of modern operations, it is common to see alarm systems
performing poorly and causing the process to run in the “yellow” zone. Additionally, operators
who have to contend with such systems are faced with a daily battle to maintain normal
operations, or even just to stay on their production targets. As a result, the production system
consistently fails to achieve optimal or potential performance, producing a direct, negative
impact on the revenue and profitability of an operation. It should be no surprise that a poorly
performing alarm system can be very costly to an organization.
Situational Awareness
Situational Awareness involves having the appropriate level of awareness of what is happening
in order to properly interpret information and events, make decisions, and take necessary
actions that will impact goals and objectives. Situational Awareness can reduce the noise and
distraction in alarming events. By using Situational Awareness design elements, operators
recognize a problem on average 38 percent faster than with traditional approaches [7] to
Process Visualization screen design. In many cases, diversions from normal operation can be
recognized and mitigated, or corrected, before an alarm condition ever ensues.
[7] ASM Consortium Case Study, www.asmconsortium.net/Documents/HFES2005BusinessJustificationforHFInterfaces-v100b.pdf
Figure 3
An operator’s reaction to an anomaly and the tools used in order to decide which course of
action to take to appropriately respond to the event. (Diagram compares a Traditional HMI with
SA Graphics over Interpretation Time and Alarm Time, annotated “-40%”. Labels include
What Happened?, What is Happening?, Trends, Operational Limits, and Alarm Boundaries.)
In traditional Supervisory Control Systems, an operator uses several tools before making a
decision. This concept is depicted on the right side of the graph, in which the operator received
notification of a temperature alarm on a pump in an alarm data grid. Using the alarm data grid,
he located the graphic representing the pump. Note that in this example the operator only acted
the moment the event became an alarm, so his response was purely reactive.
Next the operator identified the process around the pump in order to understand the alarm
limits or operational limits. The quality of this analysis is usually dependent on the operator’s
knowledge and years of experience, or comes from standard operating procedures.
Another tool the operator used was historical trends that provided historical information on the
pump load. For example, he investigated the duration of this behavior. So far, the operator was
in three different sections of the control system to determine the nature of the alarm.
Figure 4
A situational awareness graphic for a flow controller which manages the pump speed.
This approach is different in the sense that the data is always present in context. This symbol
unifies the following information: (1) the position of the signal within the last five minutes,
and (2) the direction the signal is now heading, as identified by the arrow. In this example,
the downward direction indicates an abnormal rate of change.
The horizontal grey line represents the setpoint, and the light grey rectangle represents the
optimal range. For this system, we can immediately see the marker is about to go below the
optimal range, and if this signal keeps going down it will trigger an alarm. Knowing what might
occur, operators can take corrective action before an alarm is activated, all because this
information was shown in context.
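The idea of acting before the alarm fires can be illustrated with a small projection of the recent trend. The sketch below is an assumption-laden example, not how any particular visualization product computes its arrow: the function name `minutes_to_low_limit`, the simple straight-line estimate between the oldest and newest sample, and the sample values are all hypothetical.

```python
def minutes_to_low_limit(samples, low_alarm_limit):
    """Estimate minutes until the signal crosses a low alarm limit.

    samples: list of (time_in_minutes, value) pairs ordered by time,
             for example the last five minutes of a flow signal.
    Returns 0.0 if the newest value is already at or below the limit,
    None if the signal is steady or rising, otherwise the projected
    minutes until the limit is reached (straight-line assumption).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be ordered by increasing time")
    if v1 <= low_alarm_limit:
        return 0.0
    rate = (v1 - v0) / (t1 - t0)  # units per minute; negative when falling
    if rate >= 0:
        return None
    return (v1 - low_alarm_limit) / -rate


# Hypothetical flow readings over the last five minutes, low alarm limit at 40.
falling = [(0, 60.0), (1, 58.5), (2, 56.0), (3, 54.0), (4, 52.5), (5, 50.0)]
print(minutes_to_low_limit(falling, low_alarm_limit=40.0))
```

With these readings the signal has dropped 10 units in 5 minutes, so the projection gives the operator roughly another 5 minutes to take corrective action before the limit is reached.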
Another useful feature of situational awareness presented in this example is that alarm settings,
operational limits, direction of data, and historical information are all shown in one symbol. In
the traditional approach, the operator had to visit three screens to obtain the same data, and
also had to know or seek information on operational limits and alarm settings.
Introducing Severities to Categorize Impact
As part of the rationalization process, one must determine the impact that an alarm has on
operations and assign it a “severity” designation. It is strongly recommended that this severity
designation be made by someone with considerable knowledge of the process. Process
engineers and operators themselves should be consulted to eliminate doubt from the analysis of
alarm impact.
Figure 5
Example of severities applied to
alarms to categorize priority ranges.
It is advised to not use more than
four types of severities for alarm
conditions.
While these numbers might differ by plant, they affect operations in their own way.
Applying these severities makes it easy for operators to know which type of response is
required for each condition. The consequences are also clear for each given alarm severity.
Properly using Situational Awareness Library symbols with Situational Awareness techniques
ensures nothing else in the system uses these colors, so operators will not be confused during
an alarm event.
A good rule-of-thumb is to configure the severity distribution of alarms as: 79 percent = Low, 15
percent = Medium, 5 percent = High, and 1 percent = Critical. This general distribution may vary
considerably depending on the industry and type of product or service delivered.
[8] Severity/priority impacts are explained in EEMUA 191 and API RP 1167.
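The rule-of-thumb severity distribution above can be checked against a list of configured alarms with a short script. This is a minimal sketch: the target shares come from the rule of thumb in the text, while the function names and the 5-percentage-point tolerance are hypothetical and should be adapted to the industry and process in question.

```python
from collections import Counter

# Target shares taken from the rule of thumb in the text.
TARGET_SHARE = {"Low": 0.79, "Medium": 0.15, "High": 0.05, "Critical": 0.01}


def severity_shares(severities):
    """Return the fraction of configured alarms in each severity class."""
    counts = Counter(severities)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no configured alarms supplied")
    return {name: counts.get(name, 0) / total for name in TARGET_SHARE}


def deviations(severities, tolerance=0.05):
    """Return classes whose share differs from the target by more than tolerance.

    The tolerance (5 percentage points) is an assumption for illustration.
    """
    shares = severity_shares(severities)
    return {
        name: round(shares[name] - TARGET_SHARE[name], 4)
        for name in TARGET_SHARE
        if abs(shares[name] - TARGET_SHARE[name]) > tolerance
    }


balanced = ["Low"] * 79 + ["Medium"] * 15 + ["High"] * 5 + ["Critical"] * 1
top_heavy = ["Critical"] * 98 + ["Low"] * 2
print(deviations(balanced))
print(deviations(top_heavy))
```

A system in which nearly every alarm is configured as Critical shows large deviations in both the Critical and Low classes, which is a signal that severities need to be rationalized.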
Another compelling reason to adopt this approach is that it does not interfere with alarm priority
usage. For example, if 100 represents maintenance and 200 represents production, it is still
possible to prioritize and route alarms to proper consumers. Implementing these four severity
types determines how a user group should respond to an alarm.
In accordance with best practices for Situational Awareness design, as shown in the Image
column of Figure 5, the graphical alarm severity representation uses “triple information coding”
to exploit shape, color, and a number to represent the information to the user at run time. This
makes it possible even for someone with impaired color perception to interpret the condition
correctly. In well-designed systems, according to the best practices encouraged by Situational
Awareness methods, the only bright colors that appear in Process Visualization Systems are
those used to represent alarm conditions. A powerful additional visual cue can be presented to
the operator in the form of an alarm border animation, which provides a conspicuous graphical
indication that a particular part of the system is in an alarm state.
Figure 7
Examples of indication icons.
How Alarm Aggregation Aids in the Detection of Alarms
A common practice of Process Visualization is to embed an alarm banner on each screen. The
intention is to always give operators high visibility of active alarms, regardless of the content
on the current screen. This design has its limitations, since there are typically more standing
alarms in the system than can be viewed in the alarm banner, which can result in the operations
team becoming desensitized to its content. What the operator needs is a system wide KPI
view of the most urgent alarms, and access to a graphic that conveys the proper situational
awareness to take appropriate action.
Figure 8
A high level overview graphic that
includes KPI indicators for Reactor
R33.
The Alarm ‘badge’ in the upper right shows four active alarms, with severity levels 2, 3, and 4.
The severity 2 and 3 alarms are active but acknowledged, meaning the operator is aware of
them. The upper left corner represents a blinking severity 4 indicator, the highest severity
level for an unacknowledged alarm as represented by its alarm border and icon. This prompts
the operator to drill down into the system to assess the situation, without needing to access a
detailed alarm banner for information.
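The aggregation behind such a badge can be sketched as follows. Everything here is hypothetical: the `ActiveAlarm` type, the tag names, and the `summarize` function are illustrative only. The text does not say which severity occurs twice among the four alarms, so the example assumes two severity 2 alarms. It also follows the wording above in treating the larger number as the more severe; the `higher_is_more_severe` flag should be set to `False` where a site numbers severities the other way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveAlarm:
    tag: str            # hypothetical tag name
    severity: int
    acknowledged: bool


def summarize(alarms, higher_is_more_severe=True):
    """Aggregate active alarms into the values shown on an overview badge."""
    unacked = [a.severity for a in alarms if not a.acknowledged]
    pick = max if higher_is_more_severe else min
    return {
        "active_count": len(alarms),
        "severities": sorted({a.severity for a in alarms}),
        # Severity of the blinking indicator: most severe unacknowledged alarm.
        "blink_severity": pick(unacked) if unacked else None,
    }


reactor_alarms = [
    ActiveAlarm("R33.LevelLo", 2, acknowledged=True),
    ActiveAlarm("R33.FlowDev", 2, acknowledged=True),
    ActiveAlarm("R33.PressHi", 3, acknowledged=True),
    ActiveAlarm("R33.TempHi", 4, acknowledged=False),
]
print(summarize(reactor_alarms))
```

The badge needs only these three values, which is why the operator can judge urgency from the overview without opening a detailed alarm list.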
In Figure 9, an operator can view details of the alarm event that was brought to his attention,
and then take the appropriate corrective action.
Figure 9
The detailed display of Reactor 33.
How Operations Can Use Alarm Analysis to Optimize an Existing System
Alarm analysis software can significantly aid in implementing the rationalization process
on a running system. Major components of these analysis tools include: 1) a dashboard to
present alarm metrics and indicate progress towards these metrics, 2) reporting and analyses
capabilities to examine historical alarms, and 3) a predictive component to identify relationships
between alarms.
Performance metrics
Performance metrics can be used to analyze the system. A good guideline to use is EEMUA
191, which advises that a system should not exceed 144 alarms per 24 hour period per
operator, or one alarm every 10 minutes, though these numbers can vary by production
process and industry.
One approach to optimization is to review the current situation, establish a baseline for that
situation by running this analysis, and accordingly define a plan for improvement. For example, if
the measured rate is 500 alarms per day, 200 alarms per day might be a good target goal to set
for starters in the first two months.
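The baseline described above can be computed directly from a list of alarm timestamps. The following is a minimal sketch, assuming the timestamps have already been exported from the alarm history; the function names and the generated sample data are hypothetical. The 144 alarms per operator per day figure is the EEMUA 191 guideline quoted in the text.

```python
from datetime import datetime, timedelta

EEMUA_MAX_PER_OPERATOR_DAY = 144  # guideline figure quoted in the text


def alarms_per_operator_day(timestamps, operators, window_start, window_end):
    """Average number of alarms per operator per 24-hour period in the window."""
    days = (window_end - window_start) / timedelta(days=1)
    if days <= 0 or operators <= 0:
        raise ValueError("window and operator count must be positive")
    in_window = sum(1 for t in timestamps if window_start <= t < window_end)
    return in_window / (days * operators)


def average_interval_minutes(rate_per_day):
    """Average spacing between alarms, in minutes, for a given daily rate."""
    return 24 * 60 / rate_per_day


# Hypothetical baseline: 500 alarms in one day for a single operator.
start = datetime(2024, 1, 1)
end = start + timedelta(days=1)
stamps = [start + timedelta(minutes=2 * i) for i in range(500)]

rate = alarms_per_operator_day(stamps, operators=1, window_start=start, window_end=end)
print(rate, rate > EEMUA_MAX_PER_OPERATOR_DAY)
print(average_interval_minutes(EEMUA_MAX_PER_OPERATOR_DAY))
```

Running the same calculation over each week of the improvement plan shows whether the measured rate is moving from the baseline toward the interim target.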
Figure 10 represents an alarm analysis software dashboard which displays
key metrics that help determine how a system is trending towards goals. Configurable Alarm
widgets, shown in the figure, clearly indicate how the system is trending towards the set metrics
and goals.
Figure 10
Configurable Alarm
widgets clearly indicate how the
system is trending towards the set
metrics and goals.
A good Alarm analysis tool should analyze and categorize these alarms and present them in
reports to the user.
Figure 11
Total alarm report
Figure 12
Frequent Alarm report
Figure 13
Standing Alarm report
Figure 14
Fleeting Alarms report
Since fleeting alarms can be the cause of a lot of noise in the system, the report can include
criteria on alarm duration (Chatter Time) to detect them. One method for eliminating
this category of alarms is to implement “bounce” and “de-bounce” timers set to exceed the
length of chatter time. It should be understood, however, that the goal is for these reports to
be empty; chattering alarms should not exist.
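The effect of a delay timer on a chattering signal can be illustrated with a short simulation. This is a sketch under stated assumptions, not the timer implementation of any specific product: the `debounce` function and the sample data are hypothetical, and only an on-delay is shown (the alarm clears immediately when the condition goes false).

```python
def debounce(samples, on_delay_s):
    """Return the times at which an alarm would be raised.

    samples: iterable of (time_in_seconds, condition_is_true) ordered by time.
    The alarm is raised only after the condition has stayed true for at
    least on_delay_s seconds without interruption.
    """
    raised = []
    true_since = None
    active = False
    for t, condition in samples:
        if condition:
            if true_since is None:
                true_since = t
            if not active and t - true_since >= on_delay_s:
                active = True
                raised.append(t)
        else:
            true_since = None
            active = False
    return raised


# A signal that chatters twice (1 s each) and then stays in the alarm condition.
signal = [(0, True), (1, False), (2, True), (3, False),
          (4, True), (5, True), (6, True), (7, True),
          (8, True), (9, True), (10, True)]

print(debounce(signal, on_delay_s=0))  # no timer: every flicker raises an alarm
print(debounce(signal, on_delay_s=5))  # timer longer than the chatter time
```

With no delay the operator sees three activations; with a delay longer than the chatter time only the sustained condition produces an alarm.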
Another useful type of report, the “consequential analysis” report, shows relationships between
alarms. This report type reveals whether obscure cause-and-effect relationships exist among
process areas and resultant alarms.
These various reports are identified and described in the EEMUA 191 guideline.
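One simple way to look for the relationships a consequential analysis report describes is to count how often one alarm follows another within a short time window. The sketch below is only an illustration: the function name `follower_counts`, the tag names, and the 10-second window are hypothetical, and a high count indicates a pattern worth investigating rather than proof of cause and effect.

```python
from collections import Counter


def follower_counts(events, window_s):
    """Count how often alarm B activates within window_s seconds after alarm A.

    events: list of (time_in_seconds, tag) pairs sorted by time.
    Returns a Counter keyed by (first_tag, following_tag).
    """
    pairs = Counter()
    for i, (t_a, tag_a) in enumerate(events):
        for t_b, tag_b in events[i + 1:]:
            if t_b - t_a > window_s:
                break  # events are sorted, so later ones are outside the window
            if tag_b != tag_a:
                pairs[(tag_a, tag_b)] += 1
    return pairs


# Hypothetical alarm history: a low-flow alarm is repeatedly followed by a high-temperature alarm.
history = [(0, "FlowLo"), (5, "TempHi"), (100, "FlowLo"), (104, "TempHi"), (300, "LevelHi")]
print(follower_counts(history, window_s=10))
```

Pairs that recur many times are candidates for review during rationalization, for example to decide whether the second alarm adds information the operator can act on.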
Applying Rationalization methods
We have reviewed alarm design considerations, how situational awareness and alarming are
interrelated, that alarm severity requirements should be addressed during the design process
instead of as an afterthought, and we have demonstrated the importance of a carefully considered
alarm design. In addition, we have examined the importance of alarm aggregation and the
various facets of alarm analysis such as alarm types, their characteristics, and examples of alarm
reporting. These are the foundation to understanding the process of analyzing alarms.
One of the final steps in this entire process is to improve the alarm design of the current system
in order to reduce the alarm load on operations and enable them to focus on the most critical
alarms and production optimization.
To determine which areas of the system require design improvements, individual alarms can
be examined. There might be several reasons why an alarm is no longer valid or is improperly
configured. A path for improving the design can be implemented using several techniques,
because at this point it is clear which pain points exist in the process.
The system should be designed to conceal many of the various types of alarms considered
to be noise. A lot of noise in a running system could potentially be caused by equipment that
is either not in production or is defective. For optimal alarm system management, the system
designer needs to understand the types of alarm states that can be attained by alarms, and how
these states are attained.
Understanding these alarm states helps the system designer rationalize and manage the
number of alarms within an operation. Since we understand how to use alarm analysis to
improve an alarm system design, managing alarms that have attained the aforementioned
states is the final step in making the alarm system more robust. The implementation of this
knowledge in practice will result in alarm noise reduction, by only showing alarms that are
currently important to the operation.
While any of these techniques can be used to reduce alarm loads, a good understanding of
the current situation and a design improvement plan should be in order. The plan might include
setting a goal as an organization; for example, to eliminate the top five bad actors from the
alarm analysis report each week. By implementing and refining the changes in this example,
the root cause found in three lists, multiplied by five bad actors, would result in tracking down 15
faulty alarms per week. This appears to be an achievable goal. After 10 weeks, 150 bad actors
that most likely are responsible for 80 percent of the noise would be eliminated.
The Approach to Maintaining a Well-functioning Alarm System
The ANSI/ISA-18.2 standard refers to an “alarm life cycle,” and this is an important concept. The
management of alarms, design, runtime, and analysis can be an enormous task. This process
is also ongoing, not a task that is executed only once. Though this ongoing process can
appear tedious, ultimately it will result in more stable operations with higher throughput. What’s
more, operators can be freed to focus on other more productive activities.
When the concepts introduced in this paper are executed in conjunction with effective
Situational Awareness practices, the result is a solid supervisory control platform that enables
the optimization of industrial operations.
Activity begins with a new or old system and an analysis of the current situation, implementation
of the system, and alarm system philosophy. These activities are ongoing activities; monthly/
weekly activities are required to act upon report data, and to execute improvement plans such
as eliminating bad actors, implementing state-based alarm suppression, and/or adjusting alarm
limit settings.
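State-based alarm suppression, mentioned above as one of the recurring improvement activities, can be sketched as a filter on the equipment state. The state names, equipment tags, and the `visible_alarms` function below are hypothetical and chosen only for illustration; which states justify suppression is a decision for the rationalization team.

```python
# Hypothetical equipment states in which alarms are treated as noise.
SUPPRESS_IN_STATES = {"OUT_OF_SERVICE", "MAINTENANCE"}


def visible_alarms(alarms, equipment_state):
    """Return only the alarms whose equipment is in a state where they matter.

    alarms: list of (equipment, message) pairs.
    equipment_state: dict mapping equipment to its current state.
    Equipment with an unknown state is treated as running, so its alarms
    are shown rather than hidden.
    """
    return [
        (equipment, message)
        for equipment, message in alarms
        if equipment_state.get(equipment, "RUNNING") not in SUPPRESS_IN_STATES
    ]


active = [("P-101", "Low flow"), ("P-102", "Low flow"), ("M-7", "High temperature")]
states = {"P-101": "RUNNING", "P-102": "OUT_OF_SERVICE"}
print(visible_alarms(active, states))
```

Defaulting unknown equipment to the running state is a deliberately cautious choice: an alarm is hidden only when the system positively knows the equipment is not in production.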
(Figure: recurring alarm management activities. Labels include Annual, Monthly, Analysis of
current situation, Alarm Rationalization, Operator Input, and Alarm Advisor.)
Conclusion
The demands upon modern systems require a new approach to solving daily problems. At a
time when many organizations are confronted by challenges of an aging workforce and high
staff turnover, the systems we create need to be more stable and have the ability to present
data in context, as well as deliver actionable alarming. Adoption of some of the fundamental
approaches outlined in this paper can help many organizations improve their overall
performance. Actionable alarming is one of the areas in which many implementations fail and
systems that are delivered do not perform adequately. Improvements and investments in this
area have a high ROI, since investment costs are relatively low and the return rate is high in
gained productivity.
“We were getting alarm horns all the time; at startup, shutdown and day-to-day operation. In
one 18-hour period, operators were confronted with 5000 alarms, every one of which required
intervention of some sort, and 98 percent were designated top priority. The plant had to
designate an operator just for alarm management,” said Ron Bewsey, SRP I&E Supervisor.
After applying the techniques described in this white paper, these were the reported results at
the following facilities:
Achievements at Santan:
• Startup time and effort were reduced from 2 operators for up to 4 hours to 1 operator for
less than 2 hours.
• 40% of configured alarms and resulting nuisance alarms were identified and deleted.
Achievements at Navajo:
• The initial priority distribution (Priority 1: 98%, Priorities 2 through 4: 2%) was updated to a
final priority distribution (Priority 1: 11%, Priority 2: 14%, Priority 3: 75%, Priority 4: Information
Only, Priority 5: Non-critical bad I/O).
• 44% of the configured alarms and resulting nuisance alarms were identified and deleted.
The benefit of improving an alarm management system is that this process can be started at
any point during the lifecycle of a system.
All the methods and features in this white paper are delivered within the latest releases of
Wonderware® System Platform and Wonderware Alarm Adviser.
Figure 17
Alarm Management Lifecycle. (Diagram labels: Design, Optimize; entry point marked “Start here”.)