Submitted Feb. 2004 ICWS
Autonomic Approach to Survivable Cyber-Secure Infrastructures
Frederick Sheldon, Tom Potok
Applied Software Engineering
Oak Ridge National Lab.1
Oak Ridge, TN 37831 USA
SheldonFT |
[email protected]
Michael Langston
Dept. Computer Science
University of Tennessee
Knoxville, TN 37996 USA
[email protected]
Abstract
Information systems now form the backbone of nearly
every government and private system. Web services
currently play, or will play, a major role in supporting
access to distributed resources and the command and control
that exploit this backbone. Increasingly these systems are
networked together allowing for distributed operations,
sharing of databases, and redundant capability. Ensuring
these networks are secure, robust, and reliable is critical
for the strategic and economic well being of the Nation.
This paper argues in favor of a biologically inspired
approach to creating survivable cyber-secure
infrastructures (SCI). Our discussion employs the power
transmission grid as its example.
Keywords: Infrastructure Vulnerability, Reliability,
Cyber-Security, Software Agent, Autonomic Computing Paradigm
1 Introduction
Survivability of a system can be expressed as a combination
of reliability, availability, security, and human safety. Each
critical infrastructure (component) will stress a different
combination of these four facets to ensure the proper
operation of the entire system(s) in the face of threats from
within (malfunctioning components, normal but complex
system interrelationships that engender common failures)
and threats from without (malicious attacks, and
environmental insult, etc.). Structured models allow the
system reliability to be derived from the reliabilities of its
components. The probability that the system-of-systems
survives depends explicitly on each of the constituent
components and their interrelationships as well as
system-of-systems relationships. Reliability analysis can provide
insight to developers about inherent (and defined)
components and/or (intra-)system “weaknesses” [1-4].
Naturally, as the software/system complexity increases, the
reliability analysis task becomes more difficult.
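As a minimal sketch of how a structured model derives system reliability from component reliabilities, the following composes series and parallel blocks. It assumes statistically independent component failures, which is exactly the assumption that common mode failures violate; the component names and reliability values are hypothetical and are not taken from the paper.

```python
def series(*reliabilities):
    """All components must work: R = product of the R_i."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel(*reliabilities):
    """At least one component must work: R = 1 - product of (1 - R_i)."""
    failure = 1.0
    for r in reliabilities:
        failure *= (1.0 - r)
    return 1.0 - failure


# Hypothetical control chain: one sensor, two redundant controllers, one actuator.
sensor, controller, actuator = 0.99, 0.95, 0.98
system = series(sensor, parallel(controller, controller), actuator)
print(f"system reliability = {system:.4f}")
```

The redundant controller pair raises that stage from 0.95 to 0.9975, but the single sensor and actuator remain in series and therefore bound the overall result.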
In the face of ever-increasing computing complexity and
pervasiveness, introspection and self-management are at the
core of autonomic systems (AS). AS strive to
(transparently) provide users with a machine/system that
runs at peak performance 24x7. Like their biological
counterparts, AS maintain and adjust their function in the
context of changing components, workloads, stress, and
external conditions and in the context of hardware/software
failures, random or malicious [5, 6].
1
This manuscript has been authored by UT-Battelle, a contractor of the
U.S. Government (USG) under Department of Energy (DOE) Contract
DE-AC05-00OR22725. The USG retains a non-exclusive, royalty-free
license to publish or reproduce the published form of this contribution, or
allow others to do so, for U.S. Government purposes.
Axel Krings and Paul Oman
Dept. Computer Science
University of Idaho
Moscow, ID 83844 USA
Krings |
[email protected]
1.1 Biologically Inspired Survivability (BIS)
The next generation of high performance dynamic and
adaptive nonlinear networks, of which power systems are
an application, will be designed and upgraded with
interdisciplinary knowledge for achieving improved
survivability, security, reliability, reconfigurability and
efficiency. Furthermore, there is an urgent need for the
development of innovative methods and conceptual
frameworks for analysis, planning, and operation of
complex, efficient, and secure electric power networks2.
SCI represents the combination of performance and
reliability modeling, and survivability analysis germane to
future (fourth-generation) power distribution and electronic
information infrastructure applications including
communication, network-centric distributed command and
control as they relate to electrical energy generation,
storage/distribution, and electrical machinery and
equipment. Two important themes form the basis for
increasing robustness in large-scale networked information
systems. First, cognitive immunity promises improved,
cost-effective technologies for the detection and
quantification of, and recovery from, vulnerabilities/faults.3
Such cognition is truly context dependent. For example, to
establish immunity in the power distribution and electronic
information (PDEI) infrastructure, we must:
• Assess the state-of-practice of (remote) real-time
vulnerability/fault detection for SCI.
• Use existing models of SCI vulnerability/fault detection
to develop an improved numerical simulation model for
nominal/transient flows.
• Based on the new numerical model, develop and test a
real-time detector that can promptly locate and accurately
quantify vulnerabilities/faults.
• Simulate a real-time network of detectors and evaluate
the effects of signal strength and noise on
vulnerability/fault detectability.
• Explore the use of the numerical model, driven by
real-time data, within a secure communications infrastructure
to define the parameters of a survivable SCI including
Supervisory Control and Data Acquisition (SCADA)
system.
The second theme, self-healing, provides biologically-inspired
response strategies and proactive automatic
contingency planning for the PDEI infrastructure including
automated data acquisition, secure system monitoring, and
control techniques between source/sink and control centers.
To achieve this, we must:
• Determine the similarities between energy control (e.g.,
electric power grid control) and information networks for
adaptation to SCI control systems,
• Assess the state-of-the-practice with respect to the
application of Information Security (InfoSec) principles
within existing SCI control and information networks,
• Adapt or develop procedures for Common Mode Failure
Analysis (CMFA) and Security/Survivability Systems
Analysis (S/SSA) from the electric power domain to
application within SCI and information networks in
general (e.g., Internet) [7],
• Identify areas within SCI control and information
networks where existing InfoSec technologies can be
applied, but are heretofore absent,
• Identify SCI specific vulnerabilities for which new
InfoSec technologies and devices must be developed or
adapted.
2
The continued security of electric power networks can be compromised
not only by technical breakdowns, but also by deliberate sabotage,
misguided economic incentives, regulatory difficulties, the shortage of
energy production and transmission facilities, as well as the lack of
appropriately trained engineers, scientists and operations personnel.
3
The term "fault" is used consistent with "fault-tolerant design" models,
and does not necessarily refer to short circuits like "bolted faults." During
the Aug. 10, 1996 west coast cascading failures one contributing cause
was McNary generator exciter circuits erroneously detecting a "phase
imbalance" that was actually a drop in frequency. Frequency oscillations
also contributed to voltage swings which were erroneously interpreted as
"switch onto fault" logic by several protective relays that (subsequently)
tripped offline. Theoretically, a fault is a discrepancy between a computed,
observed, or measured value or condition and the true, specified, or
theoretically correct value or condition (ANSI). Generally, a fault is an
“accidental or abnormal physical” condition that may cause a functional
unit(s) to fail to perform its required function (when and if encountered).
Faults can be classified in terms of criticality indicating the severity of the
failure consequences. Error analysis is the process of investigating an
observed fault with the purpose of tracing the fault to its source
(diagnosis).
1.2 Ensuring System Integrity
To codify and systematize BIS the focus should be on
requirements, models and tools that aid in the process of
ensuring system integrity [8] by selecting the mitigation
mechanisms that maximize the individual and system wide
objectives (see Fig. 1). In this way, optimization
techniques can be added showing how resources (i.e., cost)
can be spent on individual solutions, and how this affects
the overall survivability. An advantage of this approach,
especially in the first phase, would be that SCI
implementations in the long haul could be targeted more
easily, as it is a bottom-up approach [9]. In fact, the
applicability of the proposed technology/methodology to
multiple energy sectors in the infrastructure scope is broad
because the degree of impact (i.e., to improve or sustain energy
assurance) on the energy infrastructure is determined at the
component level [10, 11].
2 Network Vulnerability
As a society, we have become dependent on the computer
infrastructure networks (including energy grids, pipelines,
transportation systems/ thoroughfares and facilities) that
sustain our daily lives. Network-centric infrastructure
demands robust systems that can respond automatically and
dynamically to both accidental and deliberate faults.
Adaptation of fault-tolerant computing techniques has made
computing and information systems intrusion-tolerant and
more survivable, but even with these advancements, a
system will inevitably exhaust all resources in the face of a
determined cyber adversary. Computing and information
systems also have a tendency to become more fragile and
susceptible to accidental faults and errors over time if
manually applied maintenance or restoration routines are
not administered regularly. This project seeks to address
these deficiencies by creating a new generation of security
and survivability technologies. These “fourth-generation”
technologies will bring attributes of human cognition to
bear on the problem of reconstituting systems that suffer the
accumulated effects of imperfect software, human error,
and accidental hardware faults, or the effects of a cyber
attack. Vulnerabilities addressed include mobile/malicious
code, denial-of-service attacks, and misuse and malicious
insider threats, as well as accidental faults introduced by
human error and the problems associated with software and
hardware aging [12, 13].
The overarching goal in light of, for example, the threat
posed by a blackout similar to the one that occurred on
August 14, 2003, is to implement systems that always
provide critical functionality and show a positive trend in
reliability, that exceed initial operating capability and
approach a theoretical optimal performance level in the
long run. Desired capabilities include self-optimization,
self-diagnosis, and self-healing, together with an
architecture/methodology for systems that support
self-awareness and reflection in order to achieve these
capabilities.
2.1 Survival Strategy
SCI is a strategy intended to meet the critical need for
fourth generation survivability and security mechanisms
that complement first-generation security mechanisms
(trusted computing bases, encryption, authentication and
access control), second-generation security mechanisms
(boundary controllers, intrusion detection systems, public
key infrastructure, biometrics), and third-generation
security and survivability mechanisms (real-time execution
monitors, error detection and damage prevention, error
compensation and repair). New fourth generation
technologies will draw on biological metaphors (so-called
artificial biology) such as software that survives because it
possesses biological properties of redundancy and
regeneration (i.e., parts die off without affecting the whole),
natural diversity and immune systems to achieve robustness
and adaptability, the structure of organisms and ecosystems
to achieve scalability, and human cognitive attributes
(reasoning, learning and introspection) to achieve the
capacity to predict, diagnose, heal and improve the ability
to provide service.
2.2 Hierarchical Evaluation
The SCI strategy uses a hierarchical method to evaluate
and implement survivability mechanisms and mitigate
failures associated with three important areas of energy
assurance: (a) securing cyber assets, (b) modeling and
analysis to understand and enable fundamentally robust and
fault-tolerant systems, and (c) systems architecture that can
overcome vital limitations. Infrastructure evaluation
comprises two phases. First, individual components of the
infrastructure are evaluated in isolation to derive individual
component survivability (CS). The process identifies
feasible mitigation mechanisms on a per component basis.
In the second phase, the CS is composed into the
system-at-large. This approach leverages individual CS models to
create hierarchical structures with increased system
survivability (e.g., against failures due to the complexity of
engaging unanticipated component interactions)4. To
codify and systematize this approach the focus is on models
that aid in the process of ensuring system integrity [15] by
selecting mitigation mechanisms that maximize individual
and system wide objectives. In this way, optimization
techniques can be added showing how resources may be
spent on individual solutions, and consequently, how such
strategies affect the overall survivability. Naturally,
individual component survivability alone is not the means
for understanding the survivability of the whole
system-of-systems. However, using a bottom up compositional
approach enables a model-based notational language to be
used to provide a complete and unambiguous description of
the system.
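The two-phase evaluation described above can be sketched as a small optimization. Phase one lists, per component, the feasible mitigation options as (cost, resulting survivability) pairs; phase two composes them and picks the combination that maximizes system survivability within a budget. The component names, costs, and values are hypothetical, and composing by a simple product assumes independent components in series, which is an illustrative simplification rather than the model used in the paper.

```python
from itertools import product
from math import prod

# Phase 1 (hypothetical data): mitigation options per component as
# (cost, component survivability); the zero-cost entry is "do nothing".
COMPONENTS = {
    "relay":   [(0, 0.90), (3, 0.97), (5, 0.99)],
    "rtu":     [(0, 0.85), (2, 0.93), (6, 0.98)],
    "network": [(0, 0.92), (4, 0.985)],
}


def best_plan(components, budget):
    """Phase 2: exhaustively compose options and keep the best affordable plan."""
    names = list(components)
    best = None
    for choice in product(*(components[n] for n in names)):
        cost = sum(c for c, _ in choice)
        if cost > budget:
            continue
        survivability = prod(s for _, s in choice)  # assumed series composition
        if best is None or survivability > best[0]:
            best = (survivability, cost, dict(zip(names, choice)))
    return best


survivability, cost, plan = best_plan(COMPONENTS, budget=8)
print(f"survivability={survivability:.4f} cost={cost} plan={plan}")
```

Exhaustive search is only practical for a handful of components; it is used here because it makes the trade-off between spending on individual solutions and overall survivability easy to inspect.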
2.3 Networks of Control
Industries that use and develop critical infrastructure have
become more computerized, and the risk of digital
disruption from a range of adversaries has increased [7].
The societal common ground has proven essential to our
digital economy, but has become fragile and operated at its
margins of efficiency without reinvestment for many years.
Assessment and mitigation strategies are needed to support
implementing/configuring optimally redundant (backup)
systems, low-cost data collection methodologies,
identification of critically vulnerable nodes and
communication pathways, detecting intruders or abnormal
operations, and mechanisms for distributed intelligent adaptive
control to effect more flexible and adaptive systems.
4
The sources of common mode faults are widespread. See [14]
(A. Krings and P. Oman, A Simple GSPN for Modeling Common Mode
Failures in Critical Infrastructures, HICSS-36 Minitrack on Secure and
Survivable Software Systems, Hawaii, 2003, 334a-44) for modelling
primitives that represent interdependency failures in very simple control
systems (i.e., an initial step in creating a framework for analyzing
reliability/survivability characteristics of infrastructures with both
hardware and software controls).
Fault-tolerant systems deal with accidental faults and
errors while intrusion-tolerant systems cope with malicious,
intentional faults caused by an intelligent adversary.
Combining fault- and intrusion-tolerance technologies
produces very robust and survivable systems, but these
techniques depend upon resources that may eventually be
depleted beyond the point required to maintain critical
system functionality. A biologically inspired approach
will reconstitute and reconfigure these resources in such a
manner that the systems are better protected in the process,
reliability is continually improved as vulnerabilities and
software bugs are discovered and fixed autonomously, and
the ability to provide critical services is never lost.
3 Autonomic Framework
The autonomic computing (AC) approach was outlined in
2001 by Paul Horn, Senior VP of Research at IBM, as a
corporate-wide initiative in response to what IBM's customers feel are
the major impediments to more widespread deployment of
computing in the workplace. Customers are concerned with
the total cost of ownership (TCO) and believe that
configuration management (i.e., installing software and
patches, setting various performance parameters, etc) is a
significant contributor to TCO.
Ideally, systems would be self-managing: they would work
well out of the box and continue to work well as the computing
environment changes (due to failure-induced outages,
changes in load characteristics, or the addition of server capacity).
New applications may be easier to deploy if existing ones
can automatically adjust and if the appropriate building
blocks exist to support the construction of new applications
in ways that can adapt themselves. The essential theme for
AC systems therefore is self-management and cognition
consisting of the following four pillars [5]:
• Self-configuration – Automated configuration of
components and systems follows high-level policies. The rest
of the system adjusts automatically and seamlessly.
• Self-optimization – Components and systems continually
seek opportunities to improve their own performance and
efficiency.
• Self-healing – The system automatically detects, diagnoses,
and repairs localized software and hardware problems.
• Self-protection – The system automatically defends against
malicious attacks or cascading failures. It uses early
warning to anticipate and prevent system wide failures.
Therefore, the higher-order cognitive processes of
reflection and self-awareness are key to creating systems
that are not fragile in the presence of unforeseen inputs.
Moreover, these systems will have the capacity to reason,
learn, and respond intelligently to things never encountered
before. However, to meet this challenge many factors
must be considered (see Fig. 1).
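The self-healing and self-protection pillars can be illustrated with a single pass of a detect, diagnose, and repair loop. This is only a sketch under stated assumptions: the Component class, the MAX_RESTARTS policy, and the restart/isolate actions are hypothetical names introduced here, and the diagnosis rule (a fault is treated as transient until the restart budget is used up) is a deliberately simple stand-in for real diagnosis.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    healthy: bool = True
    restarts: int = 0
    isolated: bool = False


MAX_RESTARTS = 2  # hypothetical high-level policy (self-configuration)


def autonomic_step(components):
    """One pass of detect -> diagnose -> repair over all components."""
    actions = []
    for c in components:
        if c.isolated or c.healthy:      # detect: skip anything not faulty
            continue
        if c.restarts < MAX_RESTARTS:    # diagnose: assume a transient fault
            c.restarts += 1
            c.healthy = True             # repair: restart (self-healing)
            actions.append((c.name, "restart"))
        else:                            # diagnose: assume a persistent fault
            c.isolated = True            # fence it off (self-protection)
            actions.append((c.name, "isolate"))
    return actions


fleet = [
    Component("feeder-a"),
    Component("feeder-b", healthy=False),
    Component("feeder-c", healthy=False, restarts=2),
]
print(autonomic_step(fleet))
```

Isolating a component that keeps failing, rather than restarting it indefinitely, is one simple way to keep a local fault from consuming resources or cascading.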
3.1 Cognitive Cyber Defense
To achieve SCI a hierarchical method may be used to
assess and implement survivability mechanisms and
mitigate vulnerabilities as well as all classes of failures by: (1)
hardening cyber assets using a framework for SCI
survivability, (2) providing robustness and fault-tolerance
through modeling, simulation, and analysis, and (3)
overcoming fundamental limitations for increased
reliability via effective systems architecture and the
application/development of the autonomic computing
paradigm mentioned above [16]. Survivability assessment
comprises two phases. First, individual components of the
infrastructure are evaluated in isolation to derive each
component's survivability. This phase identifies feasible
mitigation mechanisms on a per component basis. In the
second phase, the mapping from component survivability is
extended to the overall system at large, resulting in better
comprehension that can:
• Enhance control system dependability due to fault
tolerance and system integrity strategies (i.e., using the
autonomic computing paradigm),
• Support a modular/scalable approach to critical systems
automation, and energy/information distribution,
• Improve the ability to sustain operational capability post
attack,
• Support modeling and simulation of damage
phenomenology in support of more intelligent sensors,
and
• Provide an optimized technology assessment approach
that can be used to select system architectures and define
the elements of systems and their control to enable
improved system survivability (e.g., using segregated
system zones and the autonomic computing paradigm).
3.2 Common Mode Failures
Critical energy infrastructures and essential utilities have
been optimized for reliability under the assumption of a
benign operating environment. Consequently, they are
susceptible to cascading failures induced by relatively
minor events such as weather phenomena, accidental
damage to system components, and/or cyber attack. In
contrast, survivable complex control structures should and
could be designed to lose sizable portions of the system and
still maintain essential control functions [7]. For example,
in [14], the Aug. 10, 1996 cascading blackout is studied to
identify and analyze common mode faults leading to the
cascading failure. Strategies are needed to define
independent, survivable software control systems for
automated regulation of critical infrastructures like electric
power, telecom, and emergency communications systems.
3.3 Cyber Security
Several complicating factors contribute to the difficulty of
implementing cyber security in power substation control
networks. First is the geographic distribution of these
networks, spanning hundreds of miles with network
components located in isolated remote locations, as well as
the sheer number of devices connected to a single network
open to compromise. The large number of access points greatly
increases the risk of cyber attack against electronic
equipment in a substation [17].
Our approach uses intelligent software agents (SAs) [18,
19] (each modeled as an individual component) to
deploy new and user-friendly data collection and
management capabilities which possess inherent resiliency
to failures in control networks [7, 20] as well as
maintenance/evolution properties that promote low cost of
ownership [7, 21]. SAs enable secure, robust real-time
status updates for identifying remotely accessible devices
vulnerable to overload, cyber attack, etc. [22, 23], as well as
intelligent adaptive control [24].
Requirements
• Real-time control
• Network connectivity
• Fault tolerant/fail safe
• Harsh environment
• Performance Constraints
• Size/ Weight ...
• Power/ Thermal ...
• Component libraries
Models
• Structural analysis
• Dynamic equations
• CAD modeling and simulation
• Part interaction analysis
• Sensor and actuator circuits
Tools
• Smart process schedulers
• Communications configuration
• Intelligent autonomic programming tools
• Automatic code generation / V&V
• COTS integration / User interfaces
• On-line resource allocation
Figure 1. Automate/integrate physical, computational
platform and real-world constraints.
3.4 Inherent Obstacles
The diversity of equipment and protocols used in the
communication and control of power systems is staggering
[7]. The diversity and lack of interoperability in these
communication protocols create obstacles for anyone
attempting to establish secure communication to and from a
substation (or among substations in a network of
heterogeneous protocols and devices). In addition to the
diversity of electronic control equipment is the variety of
communications media used to access this equipment. It is
not uncommon to find commercial telephone lines,
wireless, microwave, private fiber, and Internet connections
within substation control networks [25].
3.5 Mitigation Strategies
Previous work in this area has presented details of both
threats and mitigation mechanisms for substation
communication networks [25]. In [26], the most important
mitigation actions that would reduce the threat of cyber
intrusion are highlighted. The greatest reduction can be
achieved by enacting a program of cyber security education
combined with an enforced security policy. Combined,
these two strategies will have the greatest impact because of
the lag in cyber security knowledge within the industry.
Education and enforcement will assist with counteracting
both external and insider threats5.
5
FERC (Federal Energy Regulatory Commission) adopted NERC (North
American Electric Reliability Council) security policies as standard.
4 Software Agents
Adaptive/intelligent software agents [27-29] can be used to
deploy new and user-friendly data collection capabilities with
inherent resiliency to failures in responsive decision networks [30]
as well as software maintenance/evolution properties that
promote low cost of ownership [21, 31] (see [6] for a
discussion of fundamental [dis-]advantages). Using
software agents can enable secure and robust real-time
status updates for identifying remotely accessible devices
vulnerable to overload, cyber attack, etc. [31-33], distributed
intelligent adaptive control [34], and characterization of
damage and failure mechanisms (see Fig. 2).
Cognitive systems may comprise three types of processes: a)
reactive, timely response to external stimuli, b)
deliberative, learning and reasoning, c) reflective,
continuously monitor/adapt based on introspection.
Figure 2. Conceptualization: cognitive agents, components & application context.
[Diagram labels: Application/Environment Context; Component Context;
Inter-Agent Communication and Coordination; Cognitive Agent (Interacting
and Contributing; Reasoning and Adapting); Public Goals, Public Beliefs,
Public Services; Private Beliefs, Private Goals; Intentions (Autonomic,
Proactive, Reactive); Inputs/Outputs (sensors and actuators); Agent
Privileges, Access Policies and Enforcement Mechanisms; Component Rules
and Constraints.]
4.1 Cognitive Agent Architecture
Based on the BDI model [35, 36] the Beliefs of an agent
can consist of private and public beliefs. Private beliefs
represent local agent state information, which form the
main basis for reasoning and reactive behavior. Public
beliefs include (distributed) information about the
context/environment and are the basis for reflective
processes. The Desires are goals, where private goals
govern the deliberative activities while the public goals
direct the reflective processes as they describe the overall
cognitive system goals. Intentions (services) consist of
reactive, pro-active, autonomic and public plans. Reactive
plans deal with timely responses to inputs and changes in
the environment. Proactive plans form the core of the
deliberative process, and represent the planning and
reasoning processes. The autonomic plans represent the
reflective process, including monitoring the agent's
performance to achieve robust and secure behavior. The
robustness will include, at a minimum, fail-safe plans to
respond to unexpected events. An agent may make available
its Public Services to other agents in the system.
Agents in cognitive systems are autonomous and situated.
Thus each agent is implemented with one or more active
processes (or threads). For simple reactive agents (with very
limited deliberative or reflective processes), a single thread
is adequate to respond reactively to external stimuli.
Complex agents may have separate threads for reactive,
proactive and autonomic plans6.
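To make the agent architecture of Section 4.1 concrete, the sketch below models an agent with private and public beliefs and goals and runs its reactive, proactive, and autonomic plans on separate threads. It is an illustration only, not the CMAML specification: the class name, the belief and goal keys (load, max_load, grid_frequency_hz, min_frequency_hz), and the decision strings are all hypothetical.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class CognitiveAgent:
    private_beliefs: dict = field(default_factory=dict)  # local agent state
    public_beliefs: dict = field(default_factory=dict)   # shared context
    private_goals: dict = field(default_factory=dict)    # govern deliberation
    public_goals: dict = field(default_factory=dict)     # direct reflection
    decisions: dict = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def _record(self, plan, decision):
        with self._lock:  # the three plans share one decisions table
            self.decisions[plan] = decision

    def reactive(self):
        """Timely response to the latest input."""
        load = self.private_beliefs["load"][-1]
        limit = self.private_goals["max_load"]
        self._record("reactive", "trip" if load > limit else "hold")

    def proactive(self):
        """Deliberation over the history held in private beliefs."""
        history = self.private_beliefs["load"]
        rising = len(history) >= 2 and history[-1] > history[0]
        self._record("proactive", "shed_load" if rising else "no_change")

    def autonomic(self):
        """Reflection against public goals, with a fail-safe plan."""
        try:
            ok = (self.public_beliefs["grid_frequency_hz"]
                  >= self.public_goals["min_frequency_hz"])
            self._record("autonomic", "nominal" if ok else "raise_alarm")
        except KeyError:  # unexpected event: a required belief is missing
            self._record("autonomic", "fail_safe")

    def run_once(self):
        plans = (self.reactive, self.proactive, self.autonomic)
        threads = [threading.Thread(target=p) for p in plans]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return dict(self.decisions)


agent = CognitiveAgent(
    private_beliefs={"load": [0.6, 0.8, 1.1]},
    private_goals={"max_load": 1.0},
    public_goals={"min_frequency_hz": 59.5},
)
result = agent.run_once()
print(result)
```

Because no public belief about grid frequency is supplied here, the autonomic plan falls back to its fail-safe decision while the reactive and proactive plans proceed independently.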
4.2 Modeling and Optimization
In addition, as an extension to the SCI, we identify how
specific SCI communication protocols and mechanisms
[27] can be modeled and mapped onto fault-models for
understanding the impacts of common mode failures and
usage profiles, including load scheduling [37-39], to
identify weak points (assisting risk assessment and
mitigation) in the system [14, 40, 41]. For example, there
are cost effective ways to apply survivability methods [33,
42] based on redundancy and dissimilarities to the
communication networks controlling the SCI. This
provides several advantages: 1) the result uses a
transformation model [43-45] to map the specific protocol
and/or application to a graph and/or Petri Net(s) [46]; 2)
interesting optimization criteria can be applied to facilitate
survivability based on redundancy, while investigating the
degree of independence required to achieve certain
objectives (e.g., defining minimal cut sets of fault trees
associated with any hazard); 3) isolation of the critical
subsystems, which constitute a graph, and using agreement
solutions to augment the graph to achieve the required
survivability (robustness). Thus, different graphs may be
derived that contain the original critical subsystems and are
augmented by edges and/or vertices that allow the use of
agreement algorithms. In this way, critical systems
decisions are decentralized and invulnerable to malicious
attacks, as long as the threshold of faulty components
dictated by the agreement algorithms is not violated.
Moreover, the whole field of system fault diagnosis, which
originated from the PMC model (Preparata, Metz and
Chien), can be applied [47, 48]. The fundamental question
is "Who tests whom, and how is the test implemented to
identify faulty components?" In this vein, we can address
(i.e., specify) how to derive "diagnostics" that would
determine if the system is robust along with a measure of
confidence (i.e., determine the effectiveness, see [32-34, 42,
49, 50]).
6
Cognitive agent systems specification is defined by a Cognitive
Multi-Agent Modeling Language (CMAML) and formally described using
denotational semantics [6]. The key concepts of the language are Agent,
Belief, Goal, Plan, KQML Performative, FIPA Performative and
Blackboard [ICA02Kavi].
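The reference above to minimal cut sets of fault trees can be illustrated with a short sketch. A fault tree is written here as nested ("AND" | "OR", [children]) tuples over named basic events; this representation and the example tree (two communication links sharing one power supply) are hypothetical and chosen only for illustration.

```python
from itertools import product


def cut_sets(node):
    """Return every cut set of a fault tree as a list of frozensets."""
    if isinstance(node, str):                 # basic event
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(c) for c in children]
    if gate == "OR":                          # any child alone causes the event
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":                         # one cut set from every child
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node):
    """Drop every cut set that strictly contains another cut set."""
    sets = set(cut_sets(node))
    minimal = [s for s in sets if not any(other < s for other in sets)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


# Hypothetical hazard: loss of control communication.
TREE = ("OR", [
    ("AND", ["primary_link", "backup_link"]),
    "shared_power",
    ("AND", ["primary_link", "shared_power"]),
])

for cs in minimal_cut_sets(TREE):
    print(sorted(cs))
```

A minimal cut set of size one, such as the shared power supply in this example, marks a single point of failure: the redundancy of the two links gives no protection against it, which is the kind of dependence the independence analysis is meant to expose.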
4.3 Exemplar
Consider the need for secure web services in the context of
compute-intensive applications. A natural place to focus
basic research efforts is on computational problems that are
hard to solve but easy to check. NP-complete problems are
prime examples. Such a problem cannot be solved in
polynomial time (assuming P≠NP), and yet is easy to
check, in the case of a “yes” instance due to its membership
in NP. Within this class, let us further restrict our attention
to problems that are FPT (Fixed-Parameter Tractable) [51].
A problem of size n, and parameterized by k, is FPT if it
can be decided in $O(f(k)\,n^c)$ time, where f is an arbitrary
function and c is a constant independent of both n and k.
Algorithms for FPT generally operate in two stages. The
first stage, termed “kernelization,” is aimed at condensing
an arbitrarily difficult instance into its combinatorial kernel
or core. The goal is to make the kernel's size some small
function of the relevant parameter (e.g., see [52]). The
second stage, known as “branching,” is used to explore the
search space of the kernel efficiently. It is branching that
requires the vast majority of time, space and
communication (e.g., see [53]). By kernelizing sequentially,
but branching across the web, we achieve:
• Verifiability. Membership in NP means that we can
usually expect to be able to check a candidate solution
quickly. This is a critical feature, ensuring that a faulty
or malicious processor cannot invalidate or subvert our
computation.
• Security. We break the search space into disjoint sections
and distribute them out to different processing elements.
Each element knows only its share of the given instance,
which is of course advantageous should the problem be
sensitive. Even if two or more elements are
untrustworthy and work in collusion, they cannot deduce
the entire instance. Any attempt to exploit intercepted
transmissions is similarly thwarted, thereby containing
damage from intrusion. Strong concealment of the total
problem is a natural part of this method.
• Scalability. As a computation, an FPT-based approach
scales wonderfully. Branching translates to a most
flexible form of partitioning. There are no a priori lower
or upper bounds on the degree of parallelism that can be
utilized. Furthermore, almost any architectural model
will do, from tightly coupled parallel systems to widely
distributed grids. This process can be viewed as
something akin to a real-time, secure version of
seti@home or folding@home.
• Robustness. The kernelization-plus-branching algorithm
design paradigm requires no explicit communication
between remote processing elements. If a limited number
of elements or links fail or become unreliable, we are
able to add, delete or shift branching segments around at
will, thereby ensuring at worst a graceful form of
degradation and preventing catastrophic failure.
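The kernelization-plus-branching paradigm and the verifiability property can be sketched on a small example. Vertex Cover is used here as an assumed illustrative FPT problem (the text above does not name a specific one), with a high-degree kernelization rule followed by a bounded search tree; the graph is hypothetical and assumed to be simple (no self-loops).

```python
def kernelize(edges, k):
    """A vertex of degree > remaining budget must be in any small cover."""
    edges = {frozenset(e) for e in edges}
    forced = set()
    while True:
        budget = k - len(forced)
        if budget < 0:
            return None                      # "no" instance
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        high = [v for v, d in degree.items() if d > budget]
        if not high:
            break
        forced.add(high[0])
        edges = {e for e in edges if high[0] not in e}
    if len(edges) > budget * budget:         # kernel too large: "no" instance
        return None
    return edges, forced, budget


def branch(edges, budget):
    """Bounded search tree: one endpoint of any uncovered edge is chosen.
    The two sub-branches are independent, so they could be farmed out
    to different processing elements."""
    if not edges:
        return set()
    if budget == 0:
        return None
    u, v = tuple(next(iter(edges)))
    for pick in (u, v):
        rest = {e for e in edges if pick not in e}
        sub = branch(rest, budget - 1)
        if sub is not None:
            return sub | {pick}
    return None


def verify(edges, cover, k):
    """Cheap check of a candidate solution returned by a remote element."""
    return len(cover) <= k and all(u in cover or v in cover for u, v in edges)


def vertex_cover(edges, k):
    kernel = kernelize(edges, k)
    if kernel is None:
        return None
    kernel_edges, forced, budget = kernel
    rest = branch(kernel_edges, budget)
    return None if rest is None else forced | rest


# Hypothetical graph: a star centred on vertex 0 plus a triangle 6-7-8.
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (6, 7), (7, 8), (6, 8)]
cover = vertex_cover(EDGES, 3)
print(cover, verify(EDGES, cover, 3))
```

Checking a returned cover takes one pass over the edges, so a coordinator need not trust the element that produced it; a wrong or malicious answer is simply rejected by verify.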
5 Summary and Conclusions
Agent-based computing combined with the vision of autonomic
computing represents an important new paradigm both for
Artificial Intelligence and, more generally, Computer
Science. It has the potential to significantly improve the
theory and the practice of modeling, designing, and
implementing SCI systems. Yet, to date, there has been
little systematic analysis of what makes the agent-based
approach such an appealing and powerful computational
model. Moreover, even less effort has been devoted to
discussing the inherent disadvantages that stem from
adopting an agent-oriented view. Here both sets of issues
are explored. The standpoint of this paper has been the role
of agent-based software in solving complex, real-world
problems of security and survivability. In particular, it was argued that
the development of robust, cyber defensible survivable
software systems requires autonomous agents that can
complete their objectives while situated in a dynamic and
uncertain environment, that can engage in rich, high-level
social interactions, and that can operate within flexible
organizational structures.
It has been claimed that agent-based computing (ABC)
can significantly improve our ability to model, design and
build complex, distributed software systems. Indeed, a high
degree of correspondence exists between the requirements
of complex system development paradigms and the key
concepts and notions of agent-based computing. The ABC
approach will likely succeed as a mainstream software
engineering (SE) paradigm because it is a logical evolution
from contemporary SE approaches and because it is well
suited to developing software for open systems. In contrast,
ABC has the characteristic of unpredictable interactions.
The strong possibility of emergent (nondeterministic)
behavior in the wrong context is an inherent drawback.
However, as a long-term means of addressing these
problems, a social-level characterization of agent-based
systems was advocated as a promising point of departure.
Agent-based computing
should be seen in its broader context as a general-purpose
model of computation that naturally encompasses
autonomic distributed and concurrent systems.
6
References
[1] F. T. Sheldon and K. Jerath, Assessing the Effect of Failure
Severity, Coincident Failures and Usage-Profiles on the
Reliability of Embedded Control Systems, To Appear ACM
Symposium on Applied Computing, Nicosia Cyprus, 2004.
[2] F. T. Sheldon and S. A. Greiner, Composing, Analyzing and
Validating Software Models to Assess the Performability of
Competing Design Candidates, Annals of Software Engineering
(On Software Reliability, Testing and Maturity), vol. 8, 1999.
[3] F. T. Sheldon, S. Greiner, and M. Benzinger, Specification,
safety and reliability analysis using Stochastic Petri Net models,
10th Int. Wkshp on Software Specification and Design, San Diego,
CA, 2000, 123-132.
[4] F. T. Sheldon, K. Jerath, and S. A. Greiner, Examining
Coincident Failures and Usage-Profiles in Reliability Analysis of
an Embedded Vehicle Sub-System, Proc Ninth Int’l Conf. on
Analytical and Stochastic Modeling Techniques [ASMT 2002],
Darmstadt Germany, June 3-5, 2002, 558-563.
[5] J. O. Kephart and D. M. Chess, The Vision of Autonomic
Computing, IEEE Computer Magazine, 2003, 41-50.
[6] N. R. Jennings, On Agent-based Software Engineering,
Artificial Intelligence, vol. 117 (2), 2000, 277-96.
[7] F. Sheldon, T. Potok, A. Krings, and P. Oman, Critical
Energy Infrastructure Survivability, Inherent Limitations,
Obstacles and Mitigation Strategies, Int'l Jr. Power and Energy
Systems (Spec. Theme Blackout), 2004, To Appear.
[8] F. T. Sheldon and H. Y. Kim, Testing Software Requirements
with Z and Statecharts Applied to an Embedded Control System,
To Appear: Software Quality Journal, 2004.
[9] A. W. Krings, W. S. Harrison, N. Hanebutte, C. S. Taylor, M.
McQueen, and S. Matthews, An Agent Supported Bottom-Up
Approach to Computer and Network Survivability, Int'l Conf.
Dependable Systems and Networks (Supplement to DSN-2001),
Goteborg Sweden, 2001, B70-71.
[10] C. Taylor, P. Oman, and A. Krings, Assessing Power
Substation Network Security and Survivability: A Work in
Progress Report, Proc. Int’l Conf. on Security and Management
(SAM'03), Las Vegas, 2003.
[11] H. Y. Kim, K. Jerath, and F. T. Sheldon, "Assessment of
High Integrity Components for Completeness, Consistency,
Fault-Tolerance and Reliability," in Component-Based Software
Quality: Methods and Techniques, vol. LNCS 2693, A. Vallecillo,
Ed. (Heidelberg: Springer-Verlag, 2003) 259-86.
[12] B. Liscouski and W. J. S. Elliott, "Causes of the August 14
Blackout in the United States and Canada," NRCAN/USDOE
(US-Canada Power System Outage Task Force), Wash. DC,
Interim Report, Nov. 2003.
[13] E. J. Lerner, "What's wrong with the Electric Grid?" The
Industrial Physicist, Vol. 9, Issue 5, (Accessed: Nov. 1, 2003)
http://www.tipmagazine.com Last Updated: Jan. 2004.
[14] A. Krings and P. Oman, A Simple GSPN for Modeling
Common Mode Failures in Critical Infrastructures, HICSS-36
Minitrack on Secure Survivable SW Sys, Hawaii, 2003, 334a-44.
[15] F. T. Sheldon and H. Y. Kim, Validation of Guidance Control
Software Requirements for Reliability and Fault-Tolerance, IEEE
Proc. RAMS, Seattle, Jan. 2002, 312-318.
[16] C. Tristram, From Artificial Intelligence to Artificial
Biology? Technology Review, vol. 106 (9), 2003, 40.
[17] NERC, An Approach to Action for the Electricity Sector, Ver.
1 (Princeton, NJ: N. American Electric Reliability Council, 2001).
[18] T. E. Potok, M. T. Elmore, J. W. Reed, and F. T. Sheldon,
VIPAR: Advanced Information Agents Discovering Knowledge in
an Open and Changing Environment, Proc. 7th World Multiconf.
On Systemics, Cybernetics and Informatics Spec. Session on
Agent-Based Computing, Orlando, July 27-30, 2003, 28-33.
[19] F. T. Sheldon, M. T. Elmore, and T. E. Potok, An
Ontology-Based Software Agent System Case Study, IEEE Proc. Int’l Conf.
on Information Technology: Coding & Computing, Las Vegas,
Apr. 28-30, 2003, 500-06.
[20] T. E. Potok, L. Phillips, R. Pollock, A. Loebl, and F. T.
Sheldon, Suitability of Agent-Based Systems for Command and
Control in Fault-tolerant, Safety-critical Responsive Decision
Networks, ISCA 16th Int’l Conf. on Parallel and Distributed
Computer Systems (PDCS), Reno, Aug. 13-25, 2003, 283-290.
[21] F. T. Sheldon, K. Jerath, and H. Chung, Metrics for
Maintainability of Class Inheritance Hierarchies, Jr. of Software
Maintenance and Evolution, vol. 14 (3), 2002, 147-160.
[22] D. Conte de Leon, J. Alves-Foss, A. Krings, and P. Oman,
Modeling Complex Control Systems to Identify Remotely
Accessible Devices Vulnerable to Cyber Attack, ACM Wkshp on
Scientific Aspects of Cyber Terrorism, Wash. DC, Nov. 2002.
[23] C. Taylor, A. Krings, and J. Alves-Foss, Risk Analysis and
Probabilistic Survivability Assessment (RAPSA): An Assessment
Approach for Power Substation Hardening, Proc. ACM Wkshp on
Scientific Aspects of Cyber Terrorism, Wash. DC, Nov. 2002.
[24] C. Taylor, A. Krings, W. S. Harrison, N. Hanebutte, and M.
McQueen, Considering Attack Complexity: Layered Intrusion
Tolerance, DSN 2002 Wkshp on Intrusion Tolerance, June 2002.
[25] P. Oman, E. Schweitzer, and J. Roberts, Protecting the Grid
From Cyber Attack, Part II: Safeguarding IEDS, Substations and
SCADA Systems, Utility Automation, vol. 7 (1), 2002, 25-32.
[26] C. Taylor, P. Oman, and A. Krings, Assessing Power
Substation Network Security and Survivability: A Work in
Progress Report, Proc. Int’l Conf. on Security and Management
(SAM'03), Las Vegas, 2003, 281-287.
[27] Z. Zhou, F. T. Sheldon, and T. E. Potok, Modeling with
Stochastic Message Sequence Charts, IIIS Proc. Int’l. Conf. on
Computer, Communication and Control Technology, Orlando, FL,
July 31 - Aug. 2, 2003.
[28] F. T. Sheldon, M. T. Elmore, and T. E. Potok, An
Ontology-Based Software Agent System Case Study, Int’l Conf on
Information Technology Coding and Computing (ITCC), Las
Vegas, Nevada, USA, 2003.
[29] T. Potok, M. Elmore, J. Reed, and F. T. Sheldon, VIPAR:
Advanced Information Agents Discovering Knowledge in an
Open and Changing Environment, SCI 2003 Proc. 7th World
MultiConf on Systemics, Cybernetics and Informatics (Special
Session on Agent-Based Computing), Orlando, 2003.
[30] F. T. Sheldon, T. Potok, and K. Kavi, Multi-Agent Systems
for Knowledge Management and Decision Networks, Informatica,
vol. 28 (SI Agent Based Computing), 2004, To Appear.
[31] T. E. Potok, L. Phillips, R. Pollock, A. Loebl, and F. T.
Sheldon, Suitability of Agent-Based Systems for Command and
Control in Fault-tolerant, Safety-critical Responsive Decision
Networks, ISCA 16th Int’l Conf. on Parallel and Distributed
Computer Systems (PDCS), Reno NV, 2003.
[32] D. Conte de Leon, J. Alves-Foss, A. Krings, and P. Oman,
Modeling Complex Control Systems to Identify Remotely
Accessible Devices Vulnerable to Cyber Attack, ACM Wkshp on
Scientific Aspects of Cyber Terrorism (SACT), Wash. DC, 2002.
[33] C. Taylor, A. Krings, and J. Alves-Foss, Risk Analysis and
Probabilistic Survivability Assessment (RAPSA): An Assessment
Approach for Power Substation Hardening, Proc. ACM Wkshp on
Scientific Aspects of Cyber Terrorism, (SACT), Wash. DC, 2002.
[34] C. Taylor, A. Krings, W. S. Harrison, N. Hanebutte, and M.
McQueen, Considering Attack Complexity: Layered Intrusion
Tolerance, Int’l Conf on Dependable Systems and Networks
(Wkshp on Intrusion Tolerance), 2002.
[35] A. S. Rao and M. P. Georgeff, BDI Agents: From theory to
practice, Int'l Conf. on Multi-Agent Sys, San Fran., 1995, 312-319.
[36] K. M. Kavi, M. Aborizka, and D. Kung, A framework for the
design of intelligent agent based real-time systems, Proc. 5th Int'l
Conf. on Algorithms and Architectures for Parallel Processing,
Beijing, 2002, 196-200.
[37] A. Krings, W. Harrison, A. Azadmanesh, and M. McQueen,
Scheduling Issues in Survivability Applications using Hybrid
Fault Models, To Appear Parallel Processing Letters, 2004.
[38] A. W. Krings, W. S. Harrison, M. H. Azadmanesh, and M.
McQueen, The Impact of Hybrid Fault Models on Scheduling for
Survivability, Int’l Wkshp on Scheduling in Computer- and
Manufacturing Systems (Seminar 02231, Report 343), Schloss
Dagstuhl, Germany, 2002.
[39] F. T. Sheldon, K. Jerath, and S. A. Greiner, Examining
Coincident Failures and Usage-Profiles in Reliability Analysis of
an Embedded Vehicle Sub-System, 9th Int’l Conf. on Analytical
and Stochastic Modeling Techniques [ASMT 2002], Darmstadt
Germany, 2002, 558-563.
[40] A. Krings and P. Oman, Secure and Survivable Software
Systems, IEEE HICSS-36, Minitrack on Secure and Survivable
Software Systems, Big Island, Hawaii, 2003, 334a.
[41] W. S. Harrison, A. Krings, N. Hanebutte, and M. McQueen,
On the Performance of a Survivability Architecture for Networked
Computing Systems, IEEE Proc. HICSS-35, Hawaii, 2002, 1-9.
[42] C. Taylor, A. Krings, W. S. Harrison, and N. Hanebutte,
Merging Survivability System Analysis and Probability Risk
Assessment for Survivability Analysis, IEEE DSN 2002 Book of
FastAbstracts, 2002.
[43] A. W. Krings and M. H. Azadmanesh, A Graph Based Model
for Survivability Applications, To Appear European Journal of
Operational Research (EJOR), 2004.
[44] A. Krings and P. Oman, A Simple GSPN for Modeling
Common Mode Failures in Critical Infrastructures, HICSS-36
Minitrack on Secure and Survivable Software Systems, Hawaii,
Jan. 2003, 334a-44.
[45] A. W. Krings, Agent Survivability: An Application for Strong
and Weak Chain Constrained Scheduling, HICSS-37, Minitrack on
Security and Survivability in Mobile Agent Based Distributed
Systems, Big Island, Hawaii, Jan. 2004, To Appear.
[46] F. T. Sheldon, K. M. Kavi, W. W. Everett, R. Brettschneider,
J. T. Yu, and R. C. Tausworthe, Reliability Measurement: From
Theory to Practice, IEEE Software, 1992, 13-20.
[47] S. Chessa and P. Santi, Comparison based system-level fault
diagnosis in ad-hoc networks, 20th IEEE Symp. On Reliable
Distributed Systems, 2001, 257-266.
[48] F. P. Preparata, G. Metze, and R. T. Chien, On the
Connection Assignment Problem of Diagnosable Systems, IEEE
Transactions on Computers, vol. EC-16, 1967, 848 - 854.
[49] A. Krings, S. Harrison, N. Hanebutte, C. Taylor, and M.
McQueen, Attack Recognition Based on Kernel Attack
Signatures, Int'l Sym. on Information Systems and Engineering
(ISE), Las Vegas, 2001, 413-419.
[50] C. Taylor, W. Harrison, A. Krings, N. Hanebutte, and M.
McQueen, Low-Level Network Attack Recognition: A
Signature-Based Approach, IEEE Proc. PDCS, Anaheim, 2001, 570-574.
[51] R. G. Downey and M. R. Fellows, Parameterized Complexity
(Springer-Verlag, 1999).
[52] F. N. Abu-Khzam, R. L. Collins, M. R. Fellows, M. A.
Langston, W. H. Suters, and C. T. Symons, Kernelization
Algorithms for the Vertex Cover Problem: Theory and
Experiments, Proc. Wkshp on Algorithm Engineering and
Experiments (ALENEX), 2004, To Appear.
[53] F. N. Abu-Khzam, M. A. Langston, and P. Shanbhag,
Scalable Parallel Algorithms for Difficult Combinatorial
Problems: A Case Study in Optimization, Proc., Int’l Conf on
Parallel and Distributed Computing and Systems, 2003, 563-568.
7
Apx: Cyber-Security in the Electric Sector
Excerpt from [12]: The generation and delivery of
electricity has been, and continues to be, a target of
malicious groups and individuals intent on disrupting the
electric power system. Even attacks that do not directly
target the electricity sector can have disruptive effects on
electricity system operations. Many malicious code attacks,
by their very nature, are unbiased and tend to interfere with
operations supported by vulnerable applications. One such
incident occurred in January 2003, when the “Slammer”
Internet worm took down monitoring computers at
FirstEnergy Corporation’s idled Davis-Besse nuclear plant.
A subsequent report by the North American Electric
Reliability Council (NERC) concluded that, although it
caused no outages, the infection blocked commands that
operated other power utilities. The report, “NRC Issues
Information Notice on Potential of Nuclear Power Plant
Network to Worm Infection,” is available at web site
http://www.nrc.gov/reading-rm/doc-collections/news/2003/03-108.html.
This example, among others, highlights the increased
vulnerability to disruption via cyber means faced by North
America’s critical infrastructure sectors, including the
energy sector. Of specific concern to the U.S. and Canadian
governments are the Supervisory Control and Data
Acquisition (SCADA) systems, which contain computers
and applications that perform a wide variety of functions
across many industries. In electric power, SCADA includes
telemetry for status and control, as well as Energy
Management Systems (EMS), protective relaying, and
automatic generation control. SCADA systems were
developed to maximize functionality and interoperability,
with little attention given to cyber security. These systems,
many of which were intended to be isolated, are now, for a
variety of business and operational reasons, either directly
or indirectly connected to the global Internet. For example,
in some instances, there may be a need for employees to
monitor SCADA systems remotely. However, connecting
SCADA systems to a remotely accessible computer
network can present security risks. These risks include the
compromise of sensitive operating information and the
threat of unauthorized access to SCADA systems’ control
mechanisms.
Security has always been a priority for the electricity
sector in North America; however, it is a greater priority
now than ever before. Electric system operators recognize
that the threat environment is changing and that the risks
are greater than in the past, and they have taken steps to
improve their security postures. NERC’s Critical
Infrastructure Protection Advisory Group has been
examining ways to improve both the physical and cyber
security dimensions of the North American power grid.
This group includes Canadian and U.S. industry experts in
the areas of cyber security, physical security and
operational security. The creation of a national SCADA
program to improve the physical and cyber security of these
control systems is now also under discussion in the United
States. The Canadian Electrical Association Critical
Infrastructure Working Group is examining similar
measures.