Expert Systems

5.1 Overview
meningitis (Shortliffe, 1976). MYCIN is a rule-based expert system that incorporates approximately 500 rules and uses backward-chaining reasoning, which is discussed in more detail later in this chapter.

During the development of MYCIN, the importance of separating a knowledge base and an inference engine was recognized. Shortly afterwards, EMYCIN (Buchanan and Shortliffe, 1984), the domain-independent version of MYCIN, was produced by removing the specific domain knowledge about infectious blood disease. It became the forerunner of expert system shells. The impact of EMYCIN on the development of subsequent expert systems was immediate. For example, PUFF (Aikens et al., 1983), a program for analyzing pulmonary problems, was developed based on EMYCIN in about 5 years, a significant improvement over the 20 years required to develop MYCIN.

The value of expert system shells was quickly recognized by the research community and industry, and the number of expert systems increased dramatically in the following years. The estimated number of developed expert systems grew from 50 in 1985 to 2,200 in 1988 and then to 12,500 in 1992 (Durkin, 1993; Hammond, 1986). The vast majority of them have been developed using a shell, with LISP, PROLOG, and OPS-5 as the other major languages used. During the past three decades, the field of expert systems has changed from a technology confined to research circles to one used commercially to aid human decision making in a wide range of applications.

5.1.2 Characteristics and Categories

Compared to conventional computer applications and other AI programs, expert systems have the following distinguishing characteristics:

• They deal with complex problems that require a considerable amount of human expertise.
• They encode human knowledge for a specific domain and simulate human reasoning using that knowledge.
• They often solve problems using heuristic or approximate methods, which are not guaranteed to find the best solutions.
• They are able to explain and justify solutions to human users at different knowledge levels.

An expert system is meant to embody the knowledge of human experts in a particular domain so that nonexpert users can use it to solve difficult problems. The knowledge and information handled by the expert system may be inaccurate, fuzzy, or incomplete; hence, an expert system should be able to reason under approximate and noisy conditions. To improve its performance so that recurring problems can be solved faster and new problems can be solved better, an expert system should learn from its past experience automatically. The learning may be achieved either from previous examples or through analogies between similar problems. In a network environment, an expert system may interact with other systems, such as by obtaining information from a database or by consulting other expert systems. Like human experts, an expert system should be able to explain its reasoning process, justify its conclusions, and answer questions about the inference procedure. The explanation ability is also important in the development of the expert system: it enables the designer to validate the encoded knowledge and the reasoning algorithms. Finally, for real-world applications that require real-time response, an expert system should perform well under time and resource constraints.

In the past, expert systems have been built to solve a wide range of problems in the fields of engineering, manufacturing, business, medicine, environment, energy, agriculture, telecommunications, government, law, education, and transportation. Generally, they fall into the following broad categories (Waterman, 1986):

• Controlling: Governing the behavior of a complex system
• Designing: Configuring components to meet design objectives and constraints
• Diagnosing: Determining the cause of malfunctions based on observations
• Instructing: Assisting students' learning and understanding of a subject
• Interpreting: Forming high-level conclusions or descriptions from raw data
• Monitoring: Comparing a system's observed behavior to its expected behavior
• Planning: Constructing a sequence of actions that achieves a set of goals
• Predicting: Projecting probable consequences from given situations
• Scheduling: Assigning resources to a sequence of activities
• Selecting: Identifying the best choice from a list of possibilities
• Simulating: Modeling the behavior of a complex system

Among the large number of developed expert systems, the following are some well-known ones:

• DENDRAL: A chemical expert system that interprets mass spectra of organic chemical compounds
• ELIZA: An interactive dialog expert system
• HEARSAY: A speech-understanding expert system
• INTERNIST: One of the largest medical expert systems, covering internal medicine
• MACSYMA: A symbolic mathematics expert system that solves problems involving algebra, calculus, and differential equations
• MOLGEN: An object-oriented expert system for planning gene-cloning experiments
• MYCIN: A medical expert system that deals with bacterial infections
• PUFF: A medical diagnostic system for interpreting the results of respiratory tests
• PROSPECTOR: An expert system for predicting potential mineral deposits
• XCON (originally called R1): An expert system used to configure computers that consist of hundreds of modules

5.1.3 Architecture

The architecture of a typical expert system is shown in Figure 5.1. It contains the following major components:

• A knowledge base contains specialized knowledge of the problem domain. In a rule-based expert system, the knowledge, including facts, concepts, and relationships, is represented in if . . . then . . . rule form.
• An inference engine performs knowledge processing. It is modeled after the reasoning process of human experts and applies the knowledge in the knowledge base, together with information about a given problem, to derive a solution.
• A working memory holds the data, goal statements, and intermediate results that make up the current state of the inference process.
• An explanation subsystem provides explanations of the reasoning process of the system and justifications for the system's actions and conclusions. Adequate explanation facilitates the acceptance of the system by the user community.
• A user interface is the window through which the user interacts with the system. Interface styles include question-and-answer, menu-driven, natural language, and graphics interfaces, as well as online help. An expert system needs to be easy to learn and friendly to use.
• A knowledge-base editor is the window through which the knowledge engineer develops the system. It also assists in maintaining and updating the knowledge base.
• A system interface links the expert system to external programs and information sources. For example, the expert system may consult other expert systems or obtain information over the Internet.

Among these components, the two most important are the knowledge base and the inference engine. The knowledge base contains formally encoded domain-specific knowledge of a problem, whereas the inference engine solves problems using various reasoning algorithms. To achieve expert-level performance, expert systems need access to a substantial amount of domain knowledge and must apply that knowledge efficiently through one or several reasoning mechanisms.

The separation of the knowledge base and the inference engine is essential for the success of expert systems, and it is important for a number of reasons. First, knowledge can be represented in a more natural and declarative form that is easier to understand and implement. For example, if . . . then . . . rules represent domain knowledge in a more declarative form than embedding the knowledge in problem-solving procedures. Second, developers can focus on creating and organizing high-level knowledge rather than on the detailed implementation. Third, changes made to the knowledge base do not affect the function of the inference engine, and vice versa. Finally, the same problem-solving techniques employed by the inference engine can be used in a variety of applications.

As a subfield of artificial intelligence (AI), expert system research has strong links with related topics in AI. The four fundamental topics in expert systems are representing knowledge, reasoning using knowledge, acquiring knowledge, and explaining solutions. Building a successful expert system depends on a number of factors, including the nature of the task, the availability of expertise, the amount of knowledge required, and the ability to formally encode the expertise and reason efficiently with it to solve the task.
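To make this separation concrete, the following minimal sketch (an illustration only, not any particular shell's design; the rule format, fact names, and domains are invented) keeps domain rules as data in a knowledge base, holds the facts of the current problem in a working memory, and reuses one small generic inference engine across different knowledge bases:

    # The inference engine is generic; only the knowledge base and the facts change.
    def inference_engine(knowledge_base, working_memory):
        """One recognize-act sweep: assert the conclusions of rules whose premises hold."""
        derived = set(working_memory)
        for premises, conclusion in knowledge_base:
            if premises <= derived:
                derived.add(conclusion)
        return derived

    medical_kb = [({"fever", "stiff_neck"}, "suspect meningitis")]
    hardware_kb = [({"no_power", "fan_silent"}, "check power supply")]

    print(inference_engine(medical_kb, {"fever", "stiff_neck"}))
    print(inference_engine(hardware_kb, {"no_power", "fan_silent"}))  # same engine, new domain

Adding or editing rules changes only the data handed to the engine, which is exactly the benefit the separation is meant to provide.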
5.1.4 Languages and Tools

Expert systems have been constructed using various general-purpose programming languages as well as specialized tools. LISP and PROLOG have been used widely, and OPS-5 has also been popular among rule-based programmers. OPS is a product of the Instructable Production System Project at CMU in 1975. It was used to create one of the classic expert systems, R1 (later called XCON), which helped DEC configure VAX computers. C and C++ have been used in applications where execution speed is important. The C Language Integrated Production System (CLIPS) is a popular development environment for building rule- and object-based expert systems; CLIPS is written in C for portability and speed. JAVA is relatively new and is gaining popularity in building expert systems, mostly because of its portability across different computer platforms.

Compared to general-purpose programming languages, shells provide better tools for the fast design and implementation of expert systems. An expert system shell is a programming environment that contains the necessary utilities for both developing and running an expert system. For example, since an expert system shell provides a built-in inference engine, the developer can focus on entering problem-specific knowledge into the knowledge base. By using expert system shells, the programming skills required to build an expert system become minimal, and the design and implementation time can be greatly reduced.

EMYCIN (Buchanan and Shortliffe, 1984) is an example of such a shell. The tools it provides include (a) an abbreviated rule language that is easier to read than LISP and more concise than the English subset used by MYCIN; (b) an indexing and grouping scheme for rules; (c) a goal-driven inference engine; (d) a communication interface between the program and the end user; (e) an interface between the system designer and the program, which supports displaying and editing rules, editing knowledge in tables, and running selected rule sets on a problem; and (f) facilities for monitoring the behavior of the program, such as explaining how a conclusion is reached, comparing the results of the current run with correct results and exploring the discrepancies, and reviewing conclusions against a stored library of cases.

General-purpose shells are offered to address a broad range of problems. Many domain-specific tools have also been developed to provide special features that support the development of expert systems for specific domains. The introduction and widespread use of expert system shells has promoted the successful application of expert systems to a wide range of tasks.

5.2 Knowledge Representation

Knowledge representation is a substantial subfield of AI in its own right. Winston defines a representation as "a set of syntactic and semantic conventions that make it possible to describe things" (Winston, 1984). The syntax of a representation specifies a set of rules for combining symbols to form expressions in the language; the semantics of a representation specify the meanings of those expressions. In the field of expert systems, knowledge representation is mostly concerned with symbolically coding a large amount of domain knowledge in a computer-tractable form such that it can be used to describe the task and the environment unambiguously and to reason with the knowledge efficiently toward certain goals.

Because of the requirement of unambiguity, mathematical tools are used as bridges between the verbal or mental representation of knowledge and its computer representation. These tools include logic, probability theory, fuzzy sets, abstract algebra, and set theory. The major knowledge representation schemes in expert systems include the following:

• Production rules are generative rules of intelligent behavior, similar to the grammar rules in automata theory and formal grammars. They encode relationships between facts and concepts as well as associations between data and actions (i.e., the "what to do when" knowledge); a short sketch of this scheme and the next follows the list.
• Semantic networks and frames are graphical formalisms for encoding facts, such as objects and events and their properties, and heuristic knowledge for processing information, such as procedures. In these structured representations, nodes stand for concepts and arcs for relationships between them.
• Logic languages are derived from mathematical logic; they encode facts and control information in logic formulas.
• Model-based approaches use theoretical models to represent knowledge about components and systems.
• Object-oriented approaches represent knowledge in terms of interacting objects, which consist of data and procedures.
• Blackboard architectures partition domain knowledge into independent knowledge sources and build up solutions on a global data structure, the blackboard. The blackboard serves the function of the working memory, but its structure is much more complex.
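As a concrete illustration of the first two schemes, the small sketch below encodes one fragment of knowledge as a production rule and a related concept as a frame. The slot names and the automotive example are illustrative assumptions, not a standard notation:

    # A production rule: an association between data (the "if" part) and an action.
    rule = {
        "if":   ["battery_voltage < 11.5", "engine_cranks == False"],
        "then": "recommend: recharge or replace the battery",
    }

    # A frame for the concept "battery": slots for relations, properties, and a procedure.
    battery_frame = {
        "is_a": "electrical_component",              # arc to a more general concept
        "nominal_voltage": 12.0,                     # property slot
        "low_voltage_threshold": 11.5,
        "is_low": lambda measured: measured < 11.5,  # attached procedure
    }

    print(rule["then"], "| low voltage:", battery_frame["is_low"](10.9))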
5.2.1 Rule-Based Expert Systems

Based on predicate logic, probability theory, and fuzzy sets, rule-based expert systems are the most common type of expert system. Rule-based expert systems use if-then rules (or production rules) to represent human expert knowledge, which is often a mix of theoretical knowledge, heuristics derived from experience, and special-purpose rules for dealing with abnormal situations. A production system is an example of a rule-based system: its knowledge base is a rule set, its inference engine is a rule interpreter that decides when to apply which rules, and its working memory contains the data that are examined and modified in the inference process. The rules are triggered by the data, and the rule interpreter controls the selection of rules and the updating of the data.

Generally, the knowledge base of a rule-based expert system consists of a set of rules of the following form:

    if condition_1 and . . . and condition_m are true
    then conclusion_1 and . . . and conclusion_n

Each rule contains one or more conditions and one or more conclusions or actions.

MYCIN is a classical rule-based system. It diagnoses certain infectious diseases, prescribes antimicrobial therapy, and can explain its reasoning (Buchanan and Shortliffe, 1984). In a controlled test, its performance equaled that of specialists. Because medical diagnosis often involves a degree of uncertainty, each MYCIN rule has associated with it a certainty factor (CF), which indicates the extent to which the evidence supports the conclusion (i.e., its likelihood). A typical MYCIN rule looks like this:

    If the stain of the organism is gram-positive, and
       the morphology of the organism is coccus, and
       the growth conformation of the organism is clumps
    then there is suggestive evidence (0.7) that the identity of the organism is staphylococcus.

There are many sources of uncertainty in problem solving. The domain knowledge may be incomplete, error-prone, or approximate. The data may be noisy or unreliable. In the field of expert systems research, methods for handling uncertainty include the certainty factor approach, the Bayesian approach, fuzzy logic, and belief theory.

In the certainty factor approach, certainty factors are associated with given data, indicating their degree of certainty. Each rule is also associated with a CF representing how strongly the conditions of the rule support the conclusions. The certainty factor of a hypothesis h given evidence e is defined as:

    CF[h, e] = MB[h, e] - MD[h, e],    (5.1)

where MB[h, e] measures the extent to which e supports h and MD[h, e] measures the extent to which e supports the negation of h. Since MB[h, e] and MD[h, e] have values between zero and one, a CF has a value between -1 and 1. As the CF moves toward 1, the evidence increasingly supports the hypothesis. The CF of the conclusion of a rule is computed as the CF of the rule times the minimum of the CFs of its individual premises. In MYCIN, when several different possible diseases are consistent with the available evidence, their certainty factors show how likely each is. When there are multiple possible treatments, their certainty factors show the likelihood that each treatment will work.

The certainty factor approach is simple, easy to understand, and efficient to run; it provides a means of estimating belief that is natural to a domain expert. Its combination methods ensure locality, detachment, and modularity (Stefik, 1995). This means that adding or deleting rules does not require rebuilding the entire certainty model.
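The combination just described can be written directly in code. The following is a simplified sketch of the idea rather than MYCIN's full calculus; the numeric values are invented for illustration:

    # Sketch of the certainty factor calculus described above.
    def cf(mb, md):
        """CF of a hypothesis: measure of belief minus measure of disbelief (Eq. 5.1)."""
        return mb - md

    def rule_conclusion_cf(rule_cf, premise_cfs):
        """CF of a rule's conclusion: the rule CF times the minimum CF of its premises."""
        return rule_cf * min(premise_cfs)

    # The staphylococcus rule above (CF 0.7) applied to premises known with
    # certainty factors 0.9, 0.8, and 0.6 (illustrative values).
    print(rule_conclusion_cf(0.7, [0.9, 0.8, 0.6]))   # -> 0.42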
A more general approach than the certainty factor approach is the Bayesian approach, based on conditional probabilities. The certainty factor approach is simple because it ignores prior probabilities that are difficult to estimate. It makes a strong assumption, however, that all rules are independent. At the heart of the Bayesian approach is Bayes' rule (also known as Bayes' law or Bayes' theorem):

    P(Y | X) = P(X | Y) P(Y) / P(X),    (5.2)

where P(A) represents the unconditional or prior probability that A is true, and P(A | B) represents the conditional or posterior probability of A given that all we know is B. This simple equation underlies modern AI systems for probabilistic inference. The limitations of the Bayesian approach reside in the assumptions that the pieces of evidence are independent, that the prior probabilities are known, and that the sets of hypotheses are both mutually exclusive and exhaustive.
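As a worked illustration of equation (5.2), consider updating the probability of a disease after a positive test; the numbers are invented for illustration:

    # Bayes' rule: P(Y|X) = P(X|Y) * P(Y) / P(X), with P(X) expanded over Y and not-Y.
    p_disease = 0.01                  # prior P(Y)
    p_pos_given_disease = 0.95        # P(X|Y)
    p_pos_given_healthy = 0.05        # P(X|not Y)

    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))      # P(X)
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

    print(round(p_disease_given_pos, 3))   # posterior P(Y|X), about 0.161

Even a reliable test leaves a modest posterior when the prior is small, which is exactly the kind of prior knowledge the certainty factor approach ignores.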
Fuzzy logic is also widely used to deal with uncertainty. Probability theory represents uncertainty based on frequency of occurrence, whereas fuzzy sets represent imprecision as a graded membership function with a value between zero and one. Fuzzy logic has been applied in various types of expert systems, including rule-based systems, to handle uncertainties. In fuzzy rule-based expert systems, knowledge is represented in fuzzy rules and fuzzy sets. Fuzzy rules are if-then statements that contain fuzzy variables. These rules are processed based on the mathematical principles of fuzzy logic.
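The following sketch illustrates graded membership and a single fuzzy rule; the membership function and the rule are invented for illustration:

    # Graded membership: "temperature is high" holds to a degree between 0 and 1.
    def high_temperature(t_celsius):
        """Shoulder membership: 0 below 25, 1 above 35, linear in between."""
        return min(1.0, max(0.0, (t_celsius - 25.0) / 10.0))

    # Fuzzy rule: if temperature is high then fan speed is fast.
    # The conclusion inherits the degree to which the premise holds.
    def fan_speed_fast(t_celsius):
        return high_temperature(t_celsius)

    for t in (24, 28, 33, 40):
        print(t, round(fan_speed_fast(t), 2))   # 0.0, 0.3, 0.8, 1.0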
Belief theory is yet another alternative approach. The Dempster-Shafer theory of evidence performs reasoning under uncertainty in a structured space of hypotheses. It provides a means for computing a belief function over the hypothesis space and a way of combining belief functions derived from different pieces of evidence. This approach rests on better mathematical foundations than the CF approach and is more general than the Bayesian and CF approaches (Durkin, 1993).

5.2.2 Model-Based Systems

Expert systems relying on heuristic rules have a number of limitations. They fail when a problem does not match the heuristics or when the heuristics are applied in inappropriate situations. The model-based approach attempts to address these problems by solving problems based on theoretical models of the components and systems involved. Model-based systems are less dependent on expert opinion and experience and have greater flexibility and scope for expansion. The downsides are the cost and complexity of building models and the resulting programs, which may be large and cumbersome (Stern and Luger, 1997).

The central concept of a model-based system is the model: a description of a device or system in an appropriate modeling language. The model specifies the structure, functions, and behaviors of the devices for the purposes of analysis, prediction, diagnosis, and other such procedures. The causal and structural information in the model may be represented using a number of data structures, such as rules and objects. For example, when objects are used to represent the component structure in a model, the fields of an object represent a component's state, and the methods define its functionality. A model-based system reasons based on the model.

The types of models employed in model-based systems can be categorized as follows:

• Declarative versus procedural: Declarative models describe relationships between entities. They are very general but can lead to complex and inefficient reasoning algorithms. Procedural models are more appropriate when the knowledge contains a strong element of directionality. They tend to be highly specialized to a certain task or situation and can efficiently derive outputs for given inputs.
• Quantitative versus qualitative: Quantitative models are numerical models (e.g., the algebraic or differential equation-based models of traditional engineering disciplines). In qualitative models, variables take on qualitative values like low and normal instead of numerical values. The representation is usually based on fuzzy sets and belief networks. The advantage of qualitative models is that they map the domain knowledge in a more natural way and better reflect a human point of view.
• Certain versus uncertain: When the knowledge is inaccurate, incomplete, or ambiguous, the model should still be able to represent and process it. The major techniques for dealing with uncertainty are probability theory and fuzzy theory.
• Static versus dynamic: Static models represent steady-state or equilibrium behaviors of systems, whereas dynamic models represent behaviors in transient states.
• Continuous versus discontinuous: In continuous models, behavior trajectories evolve smoothly through adjacent states. Discontinuous (discrete) models represent discrete state transitions, jumping from one state to another with different properties.

The following simple examples of device and circuit analysis from Davis and Hamscher (1992) illustrate the concept of models. To represent the adder shown in Figure 5.2(A), the model encodes the knowledge that captures the functionality of the adder. It consists of three expressions that represent the relationships between the values on the terminals: (1) if we know the values at A and B, then the value at C is A + B; (2) if we know the values at A and C, then the value at B is C - A; and (3) if we know the values at B and C, then the value at A is C - B.

Using this kind of model in fault diagnosis, consider the circuit of two multipliers and two adders shown in Figure 5.2(B). The input values are given at A through D, and the output values are given at E and F, with the expected output values in parentheses and the actual ones in brackets. The task is to determine which device is faulty from the fact that we expect 9 at E and instead get 6. Let's assume that only a single device is faulty. Since the value at E depends on Add-1, Mult-1, and Mult-2, one of them must have a fault. Since we get a correct value at F, Mult-2 must be correct. Thus, the fault lies in either Mult-1 or Add-1. Additional tests may be applied to the few remaining devices to identify the faulty one.

[Figure 5.2: (A) A model of an adder with terminals A, B, and C and the constraint A + B --> C. (B) A circuit of two multipliers (Mult-1, Mult-2) and two adders (Add-1, Add-2) with inputs A = 1, B = 3, C = 2, and D = 3; the expected outputs are (E = 9) and (F = 8), and the observed outputs are [E = 6] and [F = 8].]
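The single-fault argument above can be mechanized by simulating the model, recording dependencies, and intersecting suspects. The sketch below is an illustration only; the exact wiring is an assumption chosen to be consistent with the values in the figure and may differ from the original circuit in detail:

    # Minimal sketch of model-based fault diagnosis for the circuit of Figure 5.2(B).
    inputs = {"A": 1, "B": 3, "C": 2, "D": 3}

    # Each device: (behavior, names of the values it reads). Assumed wiring.
    devices = {
        "Mult-1": (lambda v: v["A"] * v["B"], ["A", "B"]),
        "Mult-2": (lambda v: v["C"] * v["D"], ["C", "D"]),
        "Add-1":  (lambda v: v["Mult-1"] + v["Mult-2"], ["Mult-1", "Mult-2"]),
        "Add-2":  (lambda v: v["Mult-2"] + v["C"], ["Mult-2", "C"]),
    }
    outputs = {"E": "Add-1", "F": "Add-2"}   # output terminals and the devices driving them
    observed = {"E": 6, "F": 8}

    # Simulate the model to predict the expected values.
    values = dict(inputs)
    for name, (behavior, _) in devices.items():
        values[name] = behavior(values)

    def supporters(name):
        """All devices a value depends on (the recorded dependencies)."""
        if name in inputs:
            return set()
        _, sources = devices[name]
        return {name}.union(*(supporters(s) for s in sources))

    # Single-fault reasoning: a suspect must feed every wrong output and no correct one.
    suspects = set(devices)
    for terminal, device in outputs.items():
        if observed[terminal] != values[device]:
            suspects &= supporters(device)   # fault lies behind the discrepant output
        else:
            suspects -= supporters(device)   # devices behind correct outputs are cleared

    print(sorted(suspects))   # -> ['Add-1', 'Mult-1'], as argued in the text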
5.3 Reasoning

Some reasoning strategies are general, whereas others are specific to particular knowledge representation schemes. Goal-driven and data-driven strategies are general reasoning strategies. In data-driven reasoning, we reason forward from the conditions that are known to be true in order to establish new conclusions. In contrast, in goal-driven reasoning, we reason backward from a goal state toward the conditions necessary to make it true. In addition, representation-specific reasoning methods have been developed for the various types of expert systems, including rule-based systems, frame-based systems, logic systems, model-based systems, object-oriented systems, fuzzy logic systems, blackboard systems, and many others.

5.3.1 Forward and Backward Chaining in Rule-Based Systems

In a rule-based system, the inference engine usually goes through a simple recognize-assert cycle. The control scheme is called forward chaining for data-driven reasoning and backward chaining for goal-driven reasoning. The basic idea of forward chaining is that when the premises of a rule (the if portion) are satisfied by the data, the expert system asserts the conclusions of the rule (the then portion) as true. A forward-chaining reasoning system starts by placing initial data in its working memory. Then the system goes through a cycle of matching the premises of rules with the facts in the working memory, selecting one rule, and placing its conclusion in the working memory. This inference process is useful in searching for a goal or an interpretation, given a set of data. For example, XCON is a forward-chaining rule-based system (McDermott, 1982) that contains several thousand rules for designing configurations of computer components for individual customers. It was one of the first clear commercial successes of expert systems. Its underlying technology has been implemented in the general-purpose language OPS-5.

In a backward-chaining reasoning system, the goal is initially placed in the working memory. The system matches rule conclusions with the goal, selects one rule, and places its premises in the working memory. The process iterates, with these premises becoming new goals to match against rule conclusions. Thus, the system works backward from the original goal until all the subgoals in the working memory are known to be true. Subgoals may also be solved by asking the user for information. For example, MYCIN's inference engine uses a backward-chaining control strategy. From its goal of finding significant disease-causing organisms, MYCIN uses its rules to reason backward to the data available. Once it finds such organisms, it attempts to select a therapy to treat the disease(s). Since it was designed as a consultant for physicians, MYCIN was given the ability to explain both its reasoning and its knowledge (Buchanan and Shortliffe, 1984).

Given a fixed reasoning method, the process of searching through alternative solutions can be affected through the structuring and ordering of the rules in an implementation. For example, in production systems, a rule of the form "if p and q and r then s" may be interpreted in backward chaining as a procedure of four steps: to do s, first do p, then do q, then do r. Although the procedural interpretation of rules reduces the advantages of declarative representation, it can be used to reflect more efficient heuristic solution strategies. For instance, the premises of a rule may be ordered so that the one that is most likely to fail or is easiest to satisfy will be tried first.

To illustrate forward and backward chaining, consider a simple example with the following rules in the knowledge base:

    Rule 1: if a and b, then c
    Rule 2: if c and d, then p(s1)
    Rule 3: if not d and not e, then p(s2) or p(s3)
    Rule 4: if not d and e, then p(s4)

The symbols a through e and s1 through s4 represent objects, and p represents a property of objects.

To perform backward-chaining reasoning, the top-level goal, p(X), is placed in the working memory as shown in Figure 5.3(A), where X is a variable that can match any object. The conclusions of three rules (rules 2, 3, and 4) match the expression in the working memory. If we resolve conflicts in favor of the lower-numbered rule, then rule 2 is selected and fires. This causes X to be bound to s1 and the two premises of rule 2 to be placed in the working memory, as in Figure 5.3(B). Then, since the conclusion of rule 1 matches a fact in the working memory, we fire rule 1 and place its premises in the working memory, as in Figure 5.3(C). At this point, there are three entries in the working memory (a, b, d) that do not match any rule conclusion. The expert system will query the user directly about these subgoals. If the user confirms them as true, the expert system will have successfully determined the causes for the top-level goal p(X).

[Figure 5.3: The working memory of (A) the rule-based system initially, (B) the system after rule 2 fires, and (C) the system after rule 1 fires.]

The control of the previous backward-chaining process performs a depth-first search, in which each new subgoal is searched exhaustively before moving on to older subgoals. Other search strategies, such as breadth-first search, can also be applied.

Given the same set of rules, forward chaining can also be applied to derive new conclusions from given data. For example, the algorithm of forward chaining with breadth-first search is as follows: Compare the contents of the working memory with the premises of each rule in the rule base, using the ordering of the rules. If the data in the working memory match a rule's premises, the conclusion is placed in the working memory, and control moves to the next rule. Once all rules have been considered, control starts again from the beginning of the rule set.
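The forward-chaining control loop just described can be sketched directly on the four rules of this example. In this sketch, "not" is treated as absence of the fact from the working memory, and only one of rule 3's alternative conclusions is kept; both are simplifying assumptions:

    # Forward chaining over the example rule base, restarting from the top of the rule set.
    rules = [
        ({"a", "b"}, "c"),                 # Rule 1
        ({"c", "d"}, "p(s1)"),             # Rule 2
        ({"not d", "not e"}, "p(s2)"),     # Rule 3 (one of its alternative conclusions)
        ({"not d", "e"}, "p(s4)"),         # Rule 4
    ]

    def holds(premise, memory):
        if premise.startswith("not "):
            return premise[4:] not in memory
        return premise in memory

    def forward_chain(memory):
        changed = True
        while changed:                      # start again from the beginning of the rule set
            changed = False
            for premises, conclusion in rules:
                if all(holds(p, memory) for p in premises) and conclusion not in memory:
                    memory.add(conclusion)  # place the conclusion in the working memory
                    changed = True
        return memory

    # Given the data a, b, and d, the system derives c and then p(s1).
    print(forward_chain({"a", "b", "d"}))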
5.3.2 Model-Based Reasoning

Model-based reasoning comprises a diverse set of approaches and a collection of loosely connected techniques, with most applications in the areas of monitoring, control, and diagnosis. For diagnosis applications, model-based reasoning usually involves the following elements: (a) simulating and predicting the normal behavior of the system; (b) recording dependencies between internal model components and predicted observations; (c) upon detection of abnormal observations, using the dependencies to identify conflicting model assumptions; and (d) in the presence of multiple candidates, applying a measurement strategy to reduce the number of candidates.

Among the various types of models, qualitative models have been the focus of model-based reasoning in AI. They are useful in situations where quantitative models are impossible to develop due to lack of knowledge or are prohibitively computationally expensive. They may also correspond to commonsense knowledge better than abstract mathematical models do. The major approaches to building qualitative models and related reasoning systems are the following:

• Constraint-based approach: A physical system is modeled by a set of qualitative mathematical constraints between the variables that comprise the model (Kuipers, 1986).
• Component-based approach: A model represents the topological structure of the target system, consisting of components, connections, and materials that flow through the connections (de Kleer and Brown, 1984). The behavior of each component is described by a set of local properties (i.e., relationships between inputs, internal states, and outputs).
• Process-based approach: A system is modeled based on component models and system topology, similar to the component-based approach. The difference is that the process-based approach uses processes to model physical interactions and to define the dynamic characteristics of the target system (Davis and Hamscher, 1992).

5.3.3 Case-Based Reasoning

Knowledge acquisition is a very difficult process in building expert systems. Case-based reasoning (CBR) systems simplify the process by using a collection of past problem solutions (cases) to address new problems (Kolodner, 1993). The basic idea underlying this approach is "what was true yesterday is likely to be true today."

Past cases are either collected from human experts or drawn from previous successes or failures of the system. A case usually contains three components: a description of the problem, the solution to the problem, and the outcome of the solution. The description of the problem contains all the descriptive information necessary for solving the problem. The description of the solution allows the reuse of a previous solution without starting from scratch when a similar case arrives. The outcome description contains explanations of what was carried out, whether it was a success or failure, the repair strategy in the case of failure, and how to avoid the failure. Cases can be represented in many different forms, such as rules, logic formulas, frames, and database records. To be successful, it is important for a CBR system to have a sufficiently large collection of cases to reason from and to work in a domain that is well understood.

CBR systems select and reason from appropriate past cases to build general rules, and they are able to learn from experience because the successes and failures of previous attempts are retained. The major steps in case-based reasoning are as follows:

1. A new problem is analyzed and represented in a form such that the system can retrieve relevant past cases from its memory. A past case is appropriate if it has the potential to provide a solution to the new problem. Typically, heuristics are used to choose cases similar to the new problem based on certain similarity measures.
2. A retrieved case is modified so that it is applicable to the new problem. Analytic or heuristic methods are applied to transform the stored solution into operations suitable to the new problem.
3. In applying the transformed case to the new problem, an initial solution is proposed, tested, and evaluated. If it does not perform well, an explanation of the result is generated.
4. The new case is saved into the case collection with a record of its success or failure for future use. The system goes through an incremental learning process as new cases are added.

To retrieve relevant cases, one technique is to assign indexes (or labels) to all cases. Indexes represent an interpretation of a situation and determine under what conditions a case can be used to make useful inferences (Kolodner, 1993). Assigning indexes is problem dependent and requires a good understanding of the problem domain. Another technique is to compute the relevance of a case based on similarity measures. For example, in nearest-neighbor matching, each corresponding feature of a new case and a retrieved case is compared by looking at the field type and values. Each feature gets a similarity score, and the weighted sum of the feature scores is the overall similarity score of the two cases.
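A minimal sketch of this weighted nearest-neighbor matching follows; the case features, weights, and library contents are invented for illustration:

    # Weighted nearest-neighbor matching between a new case and stored cases.
    def similarity(new_case, stored_case, weights):
        """Weighted sum of per-feature scores (1.0 for a matching value, 0.0 otherwise)."""
        total = sum(weights.values())
        score = sum(w * (new_case.get(f) == stored_case.get(f))
                    for f, w in weights.items())
        return score / total

    case_library = [
        {"product": "laptop", "symptom": "no_power", "os": "linux", "solution": "replace adapter"},
        {"product": "printer", "symptom": "paper_jam", "os": "any", "solution": "clear tray"},
    ]
    new_problem = {"product": "laptop", "symptom": "no_power", "os": "windows"}
    weights = {"product": 1.0, "symptom": 3.0, "os": 0.5}

    best = max(case_library, key=lambda c: similarity(new_problem, c, weights))
    print(best["solution"])   # retrieves the most similar past case's solution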
To apply a retrieved case to the new problem, the previous solution may need to be modified to fit the current problem. There are two general types of modification: structural and derivational. In structural modification, simple changes, such as substituting values or adjusting and interpolating numerical values, are made to the retrieved solution to make it work for the new problem. In derivational modification, the procedures that generated the previous solution are run again to generate a new solution for the new case.

To illustrate how case-based reasoning works, let's look at CHEF, a CBR system that creates cooking recipes (Hammond, 1986). The input to CHEF is a list of goals that specify the types, tastes, and textures of dishes. The output is a recipe that satisfies these constraints. For example, the input may be "create a crisp pork dish; include broccoli; use the stir-fry method." CHEF looks in its case base for a recipe that makes a similar meal and then adapts it to solve the new problem. If it already has a recipe for pork with cabbage, it may copy the recipe and substitute broccoli for cabbage. However, broccoli may not remain crisp if cooked like cabbage. Thus, the way of stir-frying may be modified, for example, by using rules from other cases involving crisp broccoli.
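A toy version of the structural adaptation described for CHEF might look like the following sketch; the recipe data, substitution, and repair step are invented and are not CHEF's actual representation:

    # Structural modification: copy a retrieved recipe and substitute an ingredient.
    retrieved_recipe = {
        "name": "stir-fried pork with cabbage",
        "ingredients": ["pork", "cabbage", "soy sauce"],
        "steps": ["slice pork", "chop cabbage", "stir-fry 5 minutes"],
    }

    def adapt(recipe, old, new, extra_steps=()):
        """Substitute one ingredient for another and append any repair steps."""
        return {
            "name": recipe["name"].replace(old, new),
            "ingredients": [new if i == old else i for i in recipe["ingredients"]],
            "steps": list(recipe["steps"]) + list(extra_steps),
        }

    # Substitute broccoli for cabbage; add a repair step so the broccoli stays crisp.
    new_recipe = adapt(retrieved_recipe, "cabbage", "broccoli",
                       extra_steps=["stir-fry broccoli separately and briefly"])
    print(new_recipe["name"], new_recipe["ingredients"])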
CBR has been applied to a variety of applications, including diagnosis, classification, interpretation, instruction, planning, and scheduling. One example is the Support Management Automated Reasoning Technology (SMART) system of Compaq (Acorn and Walden, 1992), which provides quality technical support by using previous cases to resolve new ones instead of trying to diagnose each new problem from scratch. Another system, CARES (Ong et al., 1997), employs CBR to predict the recurrence of colorectal cancer.

Hybrid systems combining CBR with rule-based reasoning and model-based reasoning have also been developed. In combining CBR and rule-based reasoning, rules may be used to capture broad trends and general strategies in the problem domain, whereas cases are used to support exceptions and to explain and justify rules. In combining CBR and model-based reasoning, model-based reasoning is good at handling well-understood components under normal situations, whereas CBR covers the part of the domain that does not have a good model or theory. In addition, by incorporating domain knowledge into CBR, past cases can be more efficiently organized, searched, and retrieved, and better adapted to new cases.

5.4 Knowledge Acquisition

Knowledge acquisition refers to the process of extracting, structuring, and organizing domain knowledge from domain experts into a program. A knowledge engineer is an expert in AI languages and knowledge representation who investigates a particular problem domain, determines important concepts, and creates correct and efficient representations of the objects and relations in the domain.

Capturing the domain knowledge of a problem domain is the first step in building an expert system. In general, the knowledge acquisition process through a knowledge engineer can be divided into four phases:

1. Planning: The goal is to understand the problem domain, identify domain experts, analyze various knowledge acquisition techniques, and design proper procedures.
2. Knowledge extraction: The goal is to extract knowledge from experts by applying various knowledge acquisition techniques.
3. Knowledge analysis: The outputs from the knowledge extraction phase, such as concepts and heuristics, are analyzed and represented in formal forms, including heuristic rules, frames, objects and relations, semantic networks, classification schemes, neural networks, and fuzzy logic sets. These representations are used in implementing a prototype expert system.
4. Knowledge verification: The prototype expert system containing the formal representation of the heuristics and concepts is verified by the experts. If the knowledge base is incomplete or insufficient to solve the problem, alternative knowledge acquisition techniques may be applied, and an additional knowledge acquisition process may be conducted.

Many knowledge acquisition techniques and tools have been developed, with various strengths and limitations. Commonly used techniques include interviewing, protocol analysis, repertory grid analysis, and observation.

Interviewing is a technique used for eliciting knowledge and design requirements from domain experts. The basic form involves free-form or unstructured question-and-answer sessions between the domain expert and the knowledge engineer. The major problem of this approach results from the inability of domain experts to explicitly describe their reasoning process, as well as the biases involved in human reasoning. A more effective form of interviewing, called structured interviewing, is goal-oriented and directed by a series of clearly stated goals. Here, experts either fill out a set of carefully designed questionnaire cards or answer questions carefully designed based on an established domain model of the problem-solving process. This technique reduces the interpretation problems inherent in unstructured interviewing as well as the distortion caused by domain expert subjectivity.

As an example, let's look at the interviewing process used in constructing GTE's COMPASS system (Prerau, 1990). COMPASS is an expert system that examines error messages derived from a telephone switch's self-test routines and suggests running additional tests or replacing a particular component. The interviewing process in building COMPASS follows an elicit-document-test cycle:

1. Elicit knowledge from an expert.
2. Document the elicited knowledge in rules and procedures.
3. Test the new knowledge using a set of data:
   (a) Have the expert analyze a new set of data.
   (b) Analyze the same set of data using the documented knowledge.
   (c) Compare the two results.
   (d) If the results differ, find the rules or procedures that lead to the discrepancy and return to step 1 to elicit more knowledge to resolve the problem.

Protocol analysis is another technique of data analysis, originated in clinical psychology. In this approach, an expert is asked to talk about his or her thinking process while solving a given problem. The difference from interviewing is that experts find it much easier to talk about specific problem instances than to talk in abstract terms. The problem-solving process being described is then analyzed to produce a structured model of the expert's knowledge, including objects of significance, important attributes of the objects, relationships among the objects, and inferences drawn from the relationships. The advantage of protocol analysis is the accurate description of the specific actions and rationales as the expert solves the problem.

Repertory grid analysis investigates the expert's mental model of the problem domain. First, the expert is asked to identify the objects in the problem domain and the traits that differentiate them. Then, a rating grid is formed by rating the objects according to the traits.

Observation involves observing how an expert solves a problem. It enables the expert to work continuously on a problem without being interrupted while the knowledge is obtained. A major limitation of this technique is that the underlying reasoning process of an expert may not be revealed in his or her actions.

Knowledge acquisition is a difficult and time-consuming task that often becomes the bottleneck in expert system development (Hayes-Roth et al., 1983). Various techniques have been developed to automate the process by using domain-tailored environments containing well-defined domain knowledge and specific problem-solving methods (Rothenfluh et al., 1996). For example, OPAL is a program that expedites knowledge elicitation for the expert system ONCOCIN (Shortliffe et al., 1981), which constructs treatment plans for cancer patients. OPAL uses a model of the cancer domain to acquire knowledge directly from an expert. OPAL's domain model has four main aspects: entities and relationships, domain actions, domain predicates, and procedural knowledge. Based on its domain knowledge, OPAL can acquire more knowledge from a human expert and translate it into executable code, such as production rules and finite state tables. Following OPAL, the more general-purpose systems PROTEGE and PROTEGE-II were developed (Musen, 1989). PROTEGE-II contains tools for creating domain ontologies and for generating OPAL-like knowledge acquisition programs for particular applications. PROTEGE-II is a general tool developed by abstraction from a successful application, similar to the process that led from MYCIN to EMYCIN.

Another example of automated knowledge acquisition is the SALT system (Marcus and McDermott, 1989), associated with an expert system called Vertical Transportation (VT) for designing custom elevator systems. SALT assumes a propose-and-revise strategy in the knowledge acquisition process. Domain knowledge is seen as performing one of three roles: (1) proposing an extension to the current design, (2) identifying constraints upon design extensions, and (3) repairing constraint violations. SALT automatically acquires these kinds of knowledge by interacting with an expert and then compiles the knowledge into production rules to generate a domain-specific knowledge base. SALT retains the original knowledge in a declarative form as a dependency network, which can be updated and recompiled as necessary.

To build a knowledge base, the knowledge can either be captured through knowledge engineers or be generated automatically by machine learning techniques. For example, rules in rule-based expert systems may be obtained through a knowledge acquisition process involving domain experts and knowledge engineers, or they may be generated automatically from examples using decision-tree learning algorithms. Case-based reasoning is another example of automated knowledge extraction, in which the expert system searches its collection of past cases, finds the ones that are similar to the new problem, and applies the corresponding solutions to the new one. The whole process is fully automatic. An expert system of this type can be built quickly and maintained easily by adding and deleting cases. Automatic knowledge generation is especially valuable when a large set of examples exists or when no domain expert is available.
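As a minimal sketch of generating rule-like knowledge from examples with a decision-tree learner, the following assumes scikit-learn is available; the training examples and feature names are invented:

    # Generating rule-like knowledge automatically from examples.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical examples: [fever, rash] -> diagnosis label.
    X = [[1, 1], [1, 0], [0, 1], [0, 0]]
    y = ["measles", "flu", "allergy", "healthy"]

    tree = DecisionTreeClassifier().fit(X, y)
    # The printed tree reads as nested if-then rules over the two features.
    print(export_text(tree, feature_names=["fever", "rash"]))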
In addition to generating knowledge automatically, machine learning methods have also been used to improve the performance of inference engines by learning the importance of individual rules and better control strategies for reasoning.

5.5 Explanation

Providing explanations that clarify the decision-making process and justify recommendations is an integral component of an expert system. It is one of the primary user requirements (Dhaliwal and Benbasat, 1996; Ye and Johnson, 1995). The explanation facility may be used by different users for different reasons in different contexts. For example, novice users can use the facility to find out more about the knowledge being applied to solve a particular problem. Advanced users access it to make sure that the system's knowledge and reasoning process are appropriate. Decision makers use the explanation facility because it aids them in formulating problems and models for analysis. The context may be problem solving by end users, knowledge-base debugging by knowledge engineers, or expert system validation by domain experts and/or knowledge engineers. Although explanations are commonly used by end users, they also play a significant role in the development of expert systems by offering enhanced debugging and validation abilities. Most current expert system development shells and environments include explanation tools.

The knowledge required for providing explanations may be derived from the knowledge base or may be separate from the knowledge used in solving problems. Depending on the type of problem-solving task, explanations may be presented to the users in different ways, such as feed-forward and feedback explanations. Feed-forward explanations focus on the input cues, are not case specific, and are presented prior to an assessment being performed, whereas feedback explanations focus on the outcome, explain a particular case-specific outcome, and are presented subsequent to the assessment.

The major types of explanations include why, how, what, what-if, and strategic explanations. The why explanations provide justification knowledge of the underlying reasons for an action based on causal models. The how explanations provide reasoning-trace knowledge of the inference process. The why and how explanations were first introduced in MYCIN. They remain the core of most explanation facilities in current expert system applications and development shells. The what explanations provide knowledge about the object definitions or decision variables used by the system. The what-if explanations provide direct and explicit information about the sensitivity of decision variables. The strategic explanations provide information about the problem-solving strategy and metaknowledge.

In MYCIN, the explanation module is invoked at the end of every consultation. To explain a result, the module retrieves the list of rules that were successfully applied, along with the conclusions drawn. It allows the user to interrogate the system about the conclusions. Inquiries generally fall into two types: why a particular question was asked and how a particular conclusion was reached. MYCIN keeps track of the goal-subgoal structure of the computation and uses it to answer a why question by citing the related rules together with other conditions. To answer a how question, MYCIN maintains a record of the decisions it made and cites the rules that it applied as well as the degree of certainty of the decision.
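One simple way to support how and why questions is to record which rule established each conclusion as the system reasons. The sketch below is an invented mechanism for illustration, not MYCIN's actual implementation; the rule name and facts are hypothetical:

    # Record which rule established each conclusion, then answer how/why questions.
    trace = {}   # conclusion -> (rule name, premises used, certainty factor)

    def record(rule_name, premises, conclusion, cf):
        trace[conclusion] = (rule_name, premises, cf)

    def explain_how(conclusion):
        rule, premises, cf = trace[conclusion]
        return (f"{conclusion} was concluded by {rule} (CF {cf}) "
                f"because {' and '.join(premises)} held.")

    def explain_why(fact):
        """Why is this fact of interest? Cite a rule whose premises need it."""
        for conclusion, (rule, premises, _) in trace.items():
            if fact in premises:
                return f"{fact} is needed by {rule} to establish {conclusion}."
        return f"{fact} is not used by any recorded rule."

    record("RULE037", ["gram-positive stain", "coccus morphology", "clumped growth"],
           "organism is staphylococcus", 0.7)
    print(explain_how("organism is staphylococcus"))
    print(explain_why("coccus morphology"))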
References

Acorn, T., and Walden, S. (1992). SMART: Support management automated reasoning technology for Compaq customer service. In Proceedings of the 11th National Conference on AI.
Aikens, J.S., Kunz, J.C., and Shortliffe, E.H. (1983). PUFF: An expert system for interpretation of pulmonary function data. Computers and Biomedical Research 16, 199-208.
Buchanan, B.G., and Shortliffe, E.H. (1984). Rule-based expert systems: The MYCIN experiments of the Stanford heuristic programming project. Reading, MA: Addison-Wesley.
Clancy, W.J., and Shortliffe, E.H. (1984). Readings in medical artificial intelligence: The first decade. Reading, MA: Addison-Wesley.
Davis, R., and Hamscher, W. (1992). Model-based reasoning: Troubleshooting. In W. Hamscher, L. Console, and J. de Kleer (Eds.), Readings in model-based diagnosis. San Mateo, CA: Morgan Kaufmann.
de Kleer, J., and Brown, J.S. (1984). A qualitative physics based on confluences. Artificial Intelligence 24, 7-83.
Dhaliwal, J.S., and Benbasat, I. (1996). The use and effects of knowledge-based system explanations: Theoretical foundations and a framework for empirical evaluation. Information Systems Research 7(3), 342-362.
Durkin, J. (1993). Expert systems: Catalog of applications. Akron, OH: Intelligent Computer Systems.
Gordon, J., and Shortliffe, E.H. (1985). The Dempster-Shafer theory of evidence. In B.G. Buchanan and E.H. Shortliffe (Eds.), Rule-based expert systems: The MYCIN experiments of the Stanford heuristic programming project. Reading, MA: Addison-Wesley.
Hammond, K. (1986). A model of case-based planning. Proceedings of the 5th National Conference on AI, 65-95.
Harmon, P., and Sawyer, B. (1990). Creating expert systems for business and industry. New York: John Wiley & Sons.
Hayes-Roth, F., Waterman, D.A., and Lenat, D.B. (1983). Constructing an expert system. Reading, MA: Addison-Wesley.
Kolodner, J.L. (1993). Case-based reasoning. San Mateo, CA: Morgan Kaufmann.
Kuipers, B. (1986). Qualitative simulation. Artificial Intelligence 29, 289-388.
Marcus, S., and McDermott, J. (1989). SALT: A knowledge acquisition language for propose-and-revise systems. Artificial Intelligence 39, 1-37.
McDermott, J. (1982). R1: A rule-based configurer of computer systems. Artificial Intelligence 19(1), 39-88.
Musen, M.A. (1989). Automated generation of model-based knowledge acquisition tools. San Francisco: Morgan Kaufmann.
Ong, L.S., Shepherd, S., Tong, L.C., Seow-Choen, F., Ho, Y.H., Tang, C.L., Ho, Y.S., and Tan, K. (1997). The colorectal cancer recurrence support (CARES) system. Artificial Intelligence in Medicine 11(3), 175-188.
Prerau, D.S. (1990). Developing and managing expert systems. Reading, MA: Addison-Wesley.
Rothenfluh, T.E., Gennari, J.H., Eriksson, H., Puerta, A.R., Tu, S.W., and Musen, M.A. (1996). Reusable ontologies, knowledge acquisition tools, and performance systems: PROTEGE-II solutions to Sisyphus-2. International Journal of Human-Computer Studies 44, 303-332.
Shortliffe, E.H. (1976). Computer-based medical consultation: MYCIN. New York: American Elsevier.
Shortliffe, E.H., Scott, A.C., Bischoff, M.M., van Melle, W., and Jacobs, C.D. (1981). ONCOCIN: An expert system for oncology protocol management. In Proceedings of the 7th National Conference on AI, 876-881.
Stefik, M. (1995). Introduction to knowledge systems. San Francisco: Morgan Kaufmann.
Stern, C.R., and Luger, G.F. (1997). Abduction and abstraction in diagnosis: A schema-based account. In Expertise in context. Cambridge, MA: MIT Press.
Waterman, D.A. (1986). A guide to expert systems. Reading, MA: Addison-Wesley.
Winston, P.H. (1984). Artificial intelligence (2nd ed.). Reading, MA: Addison-Wesley.
Ye, R., and Johnson, P.E. (1995). The impact of explanation facilities on user acceptance of expert systems advice. MIS Quarterly 19(2), 157-172.