Lecture Notes 18CS753-MODULE5

Regulation – 2018 (CBCS Scheme) ARTIFICIAL INTELLIGENCE - 18CS753
MODULE 5- LEARNING, EXPERT SYSTEMS

SYLLABUS
LEARNING,
 WHAT IS LEARNING
 ROTE LEARNING
 LEARNING BY TAKING ADVICE
 LEARNING IN PROBLEM – SOLVING
 LEARNING FROM EXAMPLES: INDUCTIONS
 EXPLANATION – BASED LEARNING
 DISCOVERY
 ANALOGY
 FORMAL LEARNING THEORY
 NEURAL NET LEARNING AND GENETIC LEARNING
EXPERT SYSTEMS
 WHAT ARE EXPERT SYSTEMS?
 REPRESENTING AND USING DOMAIN KNOWLEDGE
 EXPERT SYSTEM SHELLS
 EXPLANATION
 KNOWLEDGE ACQUISITION
TEXT BOOK:
1. E. Rich, K. Knight & S. B. Nair - Artificial Intelligence, 3/e, McGraw Hill. (CHAPTERS 4, 5,6)
Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 1

Anekal
MODULE 5A- LEARNING

INTRODUCTION
What is learning?
• Learning covers a wide range of phenomena. At one end of the spectrum is skill refinement. People get
better at many tasks simply by practicing. At the end of the spectrum lies knowledge acquisition. Many AI
programs draw heavily on knowledge as their source of power. Knowledge is generally acquired through
experience, and such acquisition is focused here. Knowledge acquisition itself includes many activities.
Simple storing of computed information, rote learning, is the most basic learning activity.
• Another way we learn is through taking advice from others. Advice taking is similar to rote learning,
but high level advice may not be in a form simple enough for a program to use directly in problem-solving.
People also learn through their problem solving experience.
• After solving a complex problem we remember the structure of the problem and the methods we used
to solve it. We can generalize from our experience to solve related problems. Another form of learning is
learning from examples. We often learn to classify things in the world without being given explicit rules.
 WHAT IS LEARNING
 ROTE LEARNING
 LEARNING BY TAKING ADVICE
 LEARNING IN PROBLEM – SOLVING
 LEARNING FROM EXAMPLES: INDUCTIONS
 EXPLANATION – BASED LEARNING
 DISCOVERY
 ANALOGY
 FORMAL LEARNING THEORY
 NEURAL NET LEARNING AND GENETIC LEARNING
Learning is an important area in AI, perhaps more so than planning.
• Problems are hard -- harder than planning.
• Recognized Solutions are not as common as planning.
• A goal of AI is to enable computers that can be taught rather than programmed.
Learning is a an area of AI that focuses on processes of self-improvement.
Information processes that improve their performance or enlarge their knowledge bases are said to learn.
Why is it hard?
• Intelligence implies that an organism or machine must be able to adapt to new situations.
• It must be able to learn to do new things.
• This requires knowledge acquisition, inference, updating/refinement of knowledge base,
acquisition of heuristics, applying faster searches, etc.
How can we learn?
Many approaches have been taken to attempt to provide a machine with learning capabilities. This is
because learning tasks cover a wide range of phenomena. Listed below are a few examples of how one
may learn. We will look at these in detail shortly
 Skill refinement o one can learn by practicing, e.g playing the piano.
 Knowledge acquisition o one can learn by experience and by storing the experience in a knowledge
base. One basic example of this type is rote learning.
 Taking advice o Similar to rote learning although the knowledge that is input may need to be
transformed (or operationalized) in orders to be used effectively.

Anekal
 Problem solving o if we solve a problem one may learn from this experience. The next time we see
a similar problem we can solve it more efficiently. This does not usually involve gathering new
knowledge but may involve reorganization of data or remembering how to achieve to solution.
 Induction o one can learn from examples. Humans often classify things in the world without
knowing explicit rules. Usually involves a teacher or trainer to aid the classification.
 Discovery o Here one learns knowledge without the aid of a teacher.
 Analogy
 If a system can recognize similarities in information already stored then it may be able to transfer
some knowledge to improve to solution of the task in hand
ROTE LEARNING
Rote Learning is basically memorization.
• Saving knowledge so it can be used again.
• Retrieval is the only problem.
• No repeated computation, inference or query is necessary.
A simple example of rote learning is caching

• Store computed values (or large piece of data) Recall this information when required by
computation.
• Significant time savings can be achieved.
• Many AI programs (as well as more general ones) have used caching very effectively.
Memorization is a key necessity for learning:
• It is a basic necessity for any intelligent program -- is it a separate learning process?
• Memorization can be a complex subject -- how best to store knowledge?
• A minimax search was used to explore the game tree.
• Time constraints do not permit complete searches.
• It records board positions and scores at search ends.
• Now if the same board position arises later in the game the stored value can be recalled and the
end effect is that deeper searched have occurred.
Store v Compute
• Rote Learning must not decrease the efficiency of the system.
• We are must able to decide whether it is worth storing the value in the first place.
• Consider the case of multiplication -- it is quicker to recomputed the product of two numbers
rather than store a large multiplication table
How can we decide?
• Cost-benefit analysis: o Decide when the information is first available whether it should be stored.
An analysis could weigh up amount of storage required, cost of computation, likelihood of recall.
• Selective forgetting: o here we allow the information to be stored initially and decide later if we

Anekal
retain it. Clearly the frequency of reuse is a good measure. We could tag an object with its time of
last use.
• If the cache memory is full and we wish to add a new item we remove the least recently used
object. Variations could include some form of cost-benefit analysis to decide if the object should be
removed.
Capabilities of Rote learning include:
– Organized storage information: In order for it to be faster to use a stored value than it would be
to re-compute it, there must be a way to access the appropriate stored value quickly.
– Generalization: The number of distinct objects that might potentially be stored can be very large.
To keep the number of stored objects down to a manageable level, some kind of generalization is necessary.
Learning by Taking Advice:
 The idea of advice taking in AI based learning was proposed as early as 1958 (McCarthy). However
very few attempts were made in creating such systems until the late 1970s.
 Expert systems providing a major impetus in this area.
There are two basic approaches to advice taking:
• Take high level, abstract advice and convert it into rules that can guide performance elements of the
system. Automate all aspects of advice taking
• Develop sophisticated tools such as knowledge base editors and debugging. These are used to aid an
expert to translate his expertise into detailed rules. Here the expert is an integral part of the
learning system. Such tools are important in expert systems area of AI.
Automated Advice Taking
The following steps summarize this method:
 Request
o This can be simple question asking about general advice or more complicated by identifying
shortcomings in the knowledge base and asking for a remedy.
 Interpret
o Translate the advice into an internal representation.
 Operationalize
o Translated advice may still not be usable so this stage seeks to provide a representation that
can be used by the performance element.
 Integrate
o When knowledge is added to the knowledge base care must be taken so that bad side-effects
are avoided.
o E.g. Introduction of redundancy and contradictions.
 Evaluate
o The system must assess the new knowledge for errors, contradictions etc.
The steps can be iterated.
 Knowledge Base Maintenance
o Instead of automating the five steps above, many researchers have instead assembled tools
that aid the development and maintenance of the knowledge base.
Many have concentrated on:
• Providing intelligent editors and flexible representation languages for integrating new knowledge.
• Providing debugging tools for evaluating, finding contradictions and redundancy in the existing
knowledge base.

Anekal
EMYCIN is an example of such a system. Example Learning System - FOO

Learning the game of hearts
FOO (First Operational Operationalizes) tries to convert high level advice (principles, problems, methods)
into effective executable (LISP) procedures.
Hearts:
• Game played as a series of tricks.
• One player - who has the lead - plays a card. Other players follow in turn and play a card.
o The player must follow suit.
o If he cannot he play any of his cards.
• The player who plays the highest value card wins the trick and the lead.
• The winning player takes the cards played in the trick.
• The aim is to avoid taking points. Each heart counts as one point the queen of spades is worth 13
points.
• The winner is the person that after all tricks have been played has the lowest points score.
Hearts is a game of partial information with no known algorithm for winning. Although the possible
situations are numerous general advice can be given such as:
• Avoid taking points.
• Do not lead a high card in suit in which an opponent is void.
• If an opponent has the queen of spades try to flush it.
In order to receive advice a human must convert into a FOO representation (LISP clause) (avoid (take-
points me) (trick))
FOO operationalizes the advice by translating it into expressions it can use in the game. It can
UNFOLD avoid and then trick to give:
(Achieve (not (during
(Scenario
(Each p1 (players) (play-card p1)) (Take-trick (trick-winner)))
(take-points me))))
However the advice is still not operational since it depends on the outcome of trick which is generally not
known. Therefore FOO uses case analysis (on the during expression) to determine which steps could case
one to take points.
Step 1 is ruled out and
Step 2's take-points is UNFOLDED: (achieve (not (exists c1 (cards-played)
Exists c2 (point-cards) (during (take
(trick-winner) c1)
(Take me c2))))))
FOO now has to decide: Under what conditions does (take me c2) occur during (take (trickwinner) c1).
A technique, called partial matching, hypothesizes that points will be taken if me = trickwinner and c2 =
c1.
We can reduce our expression to:
(achieve (not (and (have-points(card-played)) (= (trick-
winner) me ))))
This not quite enough a this means Do not win trick that has points. We do not know who the trick-winner
is, also we have not said anything about how to play in a trick that has point led in the suit. After a few
more steps to achieve this FOO comes up with:

Anekal
(Achieve (>= (and (in-suit-led(card-of me)) (possible

(trick-has-points)))
(low (card-of me)))
FOO had an initial knowledge base that was made up of:
• Basic domain concepts such as trick, hand, deck suits, avoid, win etc.
• Rules and behavioral constraints -- general rules of the game.
• Heuristics as to how to UNFOLD. FOO has 2
basic shortcomings:
• It lacks a control structure that could apply operationalization automatically.
• It is specific to hearts and similar tasks
LEARNING BY PROBLEM SOLVING
There are three basic methods in which a system can learn from its own experiences
Learning through problem solving from own experience –without an instructor /advisor.
– Learning by Parameter Adjustment
– Learning with Macro-Operators
– Learning by Chunking.
Learning by Parameter Adjustment.
Many programs rely on an evaluation procedure to summarize the state of search etc. Game playing
programs provide many examples of this. However, many programs have a static evaluation function.
In learning a slight modification of the formulation of the evaluation of the problem is required. Here the
problem has an evaluation function that is represented as a polynomial of the form such as:
The t terms values of features and the c terms are weights.

In designing programs it is often difficult to decide on the exact value to give each weight initially. So the
basic idea of idea of parameter adjustment is to:
• Start with some estimate of the correct weight settings.
• Modify the weight in the program on the basis of accumulated experiences.
• Features that appear to be good predictors will have their weights increased and bad ones will be
decreased.
• Use outcomes to adjust the weights for factors using evaluation function.
Considerations
– What are the initial weights?
– When does certain weight increase?
– When do they decrease?
LEARNING WITH MACRO-OPERATORS
Suppose you are faced with the problem of getting to the downtown post office. • Your solution may
involve getting in to your car, starting it, and driving along a certain route. • Substantial planning may go
into choosing the appropriate route, but you need not plan about how to go about starting your car. •
Sequences of actions that can be treated as a whole are called macro-operators.
The basic idea here is similar to Rote Learning: Avoid expensive re-computation. Macro-operators
can be used to group a whole series of actions into one. For example: Making dinner can be described a lay
the table, cook dinner, serves dinner. 0We could treat laying the table as on action even though it involves
a sequence of actions. The STRIPS problem-solving employed macro-operators in it's learning phase.

Anekal
Consider a blocks world example in which ON(C,B) and ON(A,TABLE) are true. STRIPS can achieve
ON(A,B) in four steps:
UNSTACK(C,B), PUTDOWN(C), PICKUP(A), STACK(A,B)
STRIPS now builds a macro-operator MACROP with preconditions ON(C,B), ON(A,TABLE), post-conditions
ON(A,B), ON(C,TABLE) and the four steps as its body.
MACROP can now be used in future operation. But it is not very general. The above can be easily
generalized with variables used in place of the blocks. However generalization is not always that easy
LEARNING BY CHUNKING
Chunking involves similar ideas to Macro Operators and originates from psychological ideas on
memory and problem solving.
The computational basis is in production systems (studied earlier). SOAR is a system that use
production rules to represent its knowledge. It also employs chunking to learn from experience.
Basic Outline of SOAR's Method
• SOAR solves problems it fires productions these are stored in long term memory.
• Some firings turn out to be more useful than others.
• When SOAR detects are useful sequence of firings, it creates chunks.
• A chunk is essentially a large production that does the work of an entire sequence of smaller ones.
• Chunks may be generalized before storing.
SOAR is a uniform processing architecture. Problems like choosing which sub goal to tackle and
which operators to try are solved with the same mechanism as the problems in the original problem space.
In solving 8 puzzle problems, SOAR learns how to place a given tile without permanently disturbing
the previously placed tiles. Chunks are generally applicable toward any goal state. This contrasts with

Anekal
macro tables, which are structured toward reaching a particular goal state from any initial state.
The price that SOAR pays for this generality and flexibility is speed. At present, chunking is
inadequate for duplicating the contents of large, directly-computed macro-operator tables.
LEARNING FROM EXAMPLES: INDUCTION

This involves the process of learning by example -- where a system tries to induce a general rule from a set
of observed instances.
This involves classification -- assigning, to a particular input, the name of a class to which it belongs.
Classification is important to many problem solving tasks.
A learning system has to be capable of evolving its own class descriptions:
• Initial class definitions may not be adequate.
• The world may not be well understood or rapidly changing.
The task of constructing class definitions is called induction or concept learning
Classification is the process of assigning to a particular input, the name of the class to which it belongs. It is
embedded inside another operation.
To see how this can happen, consider a problem solving system that contains the following
production rules:
If: The current goal is to set from place A to place B, and there is a WALL separating the two
places. Then: look for DOORWAY in the WALL and go through it.
To use this rule successfully, the system’s matching routine must be able to identify an object as a
wall. Without this, the rule can never be invoked. Then, to apply the rule, the system must be able to
recognize a doorway.
Before classification can be done, the classes it will use must be defined. This can be done in a
variety of ways, including: – Isolate a set of features that are relevant to the task domain. Define each class

Anekal
by a weighted sum of values of these features. Each class is then defined by a scoring function of the form:
C1t1+C2t2+C3t3+…..
Each t corresponds to a value of a relevant parameter, and c represents the weight to be attached to
the corresponding t.
Ex: if the task is weather prediction, the parameters can be rainfall and location of cold front.
Different functions can be written to combine these parameters to predict sunny, cloudy, rainy or snowy
weather. – Isolate a set of features that are relevant to the task domain. Define each class as a structure
composed of those features.
Ex: if the task is to identify animals, the body of each type of animal can be stored as a structure, with
various features representing such as color, length of neck etc. It is often difficult to construct, by hand, good
class definitions, idea of producing a classification program that can evolve its own class definition –
concept learning or induction.
If classes are described by scoring functions, concept learning can be done using a technique of
coefficient adjustments.
If we want to define class structurally, some other technique for learning class definitions is
necessary.
Three such techniques are: Winston’s Learning Program, Version Spaces, Decision Trees.
WINSTON’S LEARNING PROGRAM
The program started with a line drawing of a blocks world structure. • Construct a semantic net representation of the
structural description of the object(s). • This structure was then provided as input to the learning program.
• The goal is to construct representation of the definitions of concepts in this domain.
• Concepts such a house - brick (rectangular block) with a wedge (triangular block) suitably placed on top of
it, tent - 2 wedges touching side by side, or an arch - two non-touching bricks supporting a third wedge or
brick, were learned.
• The idea of near miss objects -- similar to actual instances was introduced. Input was a line drawing of a
blocks world structure
• Input processed (see VISION Sections later) to produce a semantic net representation of the structural
description of the object
• Links in network include left-of, right-of,
does-not-marry, supported-by, has-part, and
isa.
The marry relation is important -- two objects with a
common touching edge are said to marry. Marrying is
assumed unless does-not-marry stated
There are three basic steps to the problem of concept formulation:

1. Select one know instance of the concept. Call this the concept definition.

Anekal
2. Examine definitions of other known instance of the concept. Generalize the definition to include them.
3. Examine descriptions of near misses. Restrict the definition to exclude these.
Both steps 2 and 3 rely on comparison and both similarities and differences need to be identified
Node A represents the entire structure, which is composed of two parts: – node B, a wedge – node C,
a brick. The two supporting objects are related not only by left-of and right-of links, but also by a does – not
-marry link, which says that the two objects do not marry.
The two objects marry if they have faces that touch and they have a common edge.
The basic approach that Winston’s program took to the problem of concept formation can be
described as follows:
1. Begin with a structural description of one known instance of the concept. Call that description the
concept definition.

Anekal
2. Examine descriptions of other known instances of the concept. Generalize the definition to include
them.
3. Examine descriptions of near misses of the concept. Restrict the definition to exclude them.
Step 2 and 3 rely on a comparison process by which similarities and differences between structures
can be detected.
The output of the comparison procedure is a skeleton structure describing the commonalities
between the two input structures.
To see how this approach works, we trace it through the process of learning what an arch is. Suppose
that the arch description of fig (b) is presented first. It then becomes the definition of the concept Arch.
Then suppose the arch description of (c) is presented. The comparison routine will return a structure
similar to the two input structures except that it will note that the objects represented by the nodes labeled
C are not identical.
The c-note link from node C describes the difference found by the comparison routine. It notes that
the difference occurred in the isa link. It also notes that if we were to follow isa links from Brick and
Wedges, these links would eventually merge. At this point, a new description of the concept Arch can be
generated. This description simply says that node C must be either a brick or a wedge.
VERSION SPACES
Goal: to produce a description that is consistent with all positive examples but no negative examples
in the training set. Version spaces works by maintaining a set of possible descriptions and evolving that set
as new examples and near misses are presented.
we assume that a simple frame-based language is used for version space. Structural concept learning
systems are not without their problems. The biggest problem is that the teacher must guide the system
through carefully chosen sequences of examples.
In Winston's program the order of the process is important since new links are added as and when
now knowledge is gathered. The concept of version spaces aims is insensitive to order of the example
presented.
To do this instead of evolving a single concept description a set of possible descriptions are
maintained. As new examples are presented the set evolves as a process of new instances and near misses.
We will assume that each slot in a version space description is made up of a set of predicates that do
not negate other predicates in the set -- positive literals. Indeed we can represent a description as a frame
bases representation with several slots or indeed use a more general representation. For the sake of
simplifying the discussion we will keep to simple representations.

Anekal
Before we proceed to the version space algorithm, we should make some observations about the
representation. Some descriptions are more general than others. A portion of that partial ordering is shown
in fig 17.10. The entire partial ordering is called the concept space and is depicted
The version space is a set of descriptions, so an initial idea is to keep an explicit list of those

Anekal
descriptions – Unfortunately, the number of descriptions in the concept space is exponential in the number
of features and values. However, it turns out that the version space has a concise representation. It consists
of two subsets of the concept space.
One subset called G contains the most general descriptions consistent with the training examples
seen so far. The other subset S, contains the most specific descriptions consistent with the training
examples. Version space is a set of all descriptions that lie between some element of G and some element of
S in the partial order of concept space.
This representation of version space is not only efficient for storage, but also for modifications. Each
time we receive a positive training example, we want to make S more general. Negative training examples
serve to make the G set more specific. If S and G sets converge, our range of hypothesis will narrow to a
single concept description. The algorithm for narrowing the version space is called the candidate
elimination algorithm.
CANDIDATE ELIMINATION ALGORITHM

If we keep to the above definition the Mitchell's candidate elimination algorithm is the best known algorithm.

Anekal
• S and G are both singletons, so the algorithm has converged on the target concept. No more examples are
needed.
• Candidate Elimination algorithm is a least-commitment algorithm. The version space is pruned as little as
possible at each step.
• Even if all positive training examples are Japanese cars, the algorithm will not reject the possibility that
the target concept may include cars of other origin – until it receives a negative example that forces
rejection.
• This means that if training data are sparse, S and G may never converge to a single description.
• Algorithm involves exhaustive, breadth-first search through the version space.

Anekal
• The S set always contains exactly one element, because any two positive examples always have exactly
one generalization.
• The version space has several deficiencies:
– Large space requirements of the exhaustive, breadth-first search.
– Inconsistent data
– noise, can cause the candidate elimination algorithm to prune the target concept from the version
space prematurely.
 In the previous example, if the third training instance had been mislabeled (-) instead of (+), the target
concept of “Japanese economy car” would never be reached.
 Given enough erroneous negative examples, the G set can be specialized so far that the version space
becomes empty. In this case, the algorithm concludes that no concept fits the training examples.
 One solution to this problem is to maintain several G and S sets.
 When inconsistency arises, the algorithm switches to G and S sets that are consistent with most, but not
all, of the training examples.
 Another problem with the candidate elimination algorithm is the learning of disjunctive concepts
 The negative instance of Japanese car will only cause an inconsistency.

DECISION TREES

Anekal
When the concept space is very large, decision tree learning algorithms run more quickly than the
version space. Also disjunction is straightforward.
Drawback of ID3 approach: – Large, complex decision trees can be difficult for humans to
understand, so a decision tree system must have a hard time explaining the reason for its classification.
EXPLANATION BASED LEARNING:
The position is called a “fork” because the white knight attacks both the black king and the black
queen. • Black must move the king thereby leaving the queen open to capture. From this single experience,
black is able to learn quite a bit about the fork trap. The idea is that if any piece x attacks both the opponents
king and another piece y, then piece y will be lost. We don’t need to see dozens of positive and negative

Anekal
examples of fork positions in order to draw this conclusion. From just one experience, we can learn to avoid
this trap in the future. An EBL system attempts to learn from a single example x by explaining why x is an
example of the target concept. The explanation is then generalized, and the system’s performance is
improved through the availability of this knowledge.
IN GENERAL INPUTS TO EBL PROGRAMS:

Anekal

Anekal
Old Proof: New Proof:

RO=NY (given) BAC= DAE
ON=ON (Reflexive) CAD=CAD
RO+ON=ON+NY (Additive) BAC+CAD=CAD+DAE
RN=OY (Transitive) BAD=CAE

Anekal

Anekal
5B: EXPERT SYSTEMS

INTRODUCTION:
• Expert systems (ES) are one of the prominent research domains of AI. It is introduced by the
researchers at Stanford University, Computer Science Department.
• Expert systems solve problems that are normally solved by human “experts”.
• To solve expert-level problems, expert systems need access to a substantial domain knowledge
base, which must be built as efficiently as possible.
• They also need to exploit one or more reasoning mechanisms to apply their knowledge to the
problems they are given.
• Then they need a mechanism for explaining what they have done to the users who rely on them.
• The problems that expert systems deal with are highly diverse.
• There are some general issues that arise across these varying domains.
• But it also turns out that there are powerful techniques that can be defined for specific classes of
problems.
• Expert systems are complex AI programs.
• The most widely used way of representing domain knowledge in expert system is a set of
production rules, which are often coupled with a frame system that defines the objects that occur
in the rules.
• All the rules discussed next are actually the English version of the actual rules that the systems use.
• Difference among these rules illustrates some of the important differences in the way expert
systems operate.
• What are Expert Systems?
o The expert systems are the computer applications developed to solve complex
problems in a particular domain, at the level of extra-ordinary human intelligence
and expertise.

Anekal
Capabilities of Expert Systems

The expert systems are capable of
 Advising
 Instructing and assisting human in decision making
 Demonstrating
 Deriving a solution
 Diagnosing
 Explaining
 Interpreting input
 Predicting results
 Justifying the conclusion
 Suggesting alternative options to a problem
They are incapable of
 Substituting human decision makers
 Possessing human capabilities
 Producing accurate output for inadequate knowledge base
 Refining their own knowledge
Components of Expert Systems
 The components of ES include
 Knowledge Base
 Inference Engine
 User Interface
Knowledge Base
• It contains domain-specific and high-quality knowledge.
• Knowledge is required to exhibit intelligence.
• The success of any ES majorly depends upon the collection of highly accurate and precise
knowledge.

Anekal
What is Knowledge?
o The data is collection of facts.
o The information is organized as data and facts about the task domain.
o Data, information, and past experience combined together are termed as knowledge.
Components of Knowledge Base
 The knowledge base of an ES is a store of both, factual and heuristic knowledge.
 Factual Knowledge − It is the information widely accepted by the Knowledge
Engineers and scholars in the task domain.
 Heuristic Knowledge − It is about practice, accurate judgment, one’s ability of
evaluation, and guessing.
Knowledge representation
 It is the method used to organize and formalize the knowledge in the knowledge base.
 It is in the form of IF-THEN-ELSE rules.
Inference Engine
 Use of efficient procedures and rules by the Inference Engine is essential in deducting a correct,
flawless solution.
 In case of knowledge-based ES, the Inference Engine acquires and manipulates the knowledge
from the knowledge base to arrive at a particular solution.
In case of rule based ES, it
 Applies rules repeatedly to the facts, which are obtained from earlier rule application.
o Adds new knowledge into the knowledge base if required.
o Resolves rules conflict when multiple rules are applicable to a particular case.
To recommend a solution, the Inference Engine uses the following strategies
o Forward Chaining
o Backward Chaining
User Interface
 User interface provides interaction between user of the ES and the ES itself.
 It is generally Natural Language Processing so as to be used by the user who is well- versed in the
task domain.
 The user of the ES need not be necessarily an expert in Artificial Intelligence.
 It explains how the ES has arrived at a particular recommendation.
 The explanation may appear in the following forms
o Natural language displayed on screen.
o Verbal narrations in natural language.
o Listing of rule numbers displayed on the screen.
o The user interface makes it easy to trace the credibility of the deductions.
Requirements of Efficient ES User Interface

Anekal
• It should help users to accomplish their goals in shortest possible way.

• It should be designed to work for user’s existing or desired work practices.
• Its technology should be adaptable to user’s requirements; not the other way round.
• It should make efficient use of user input.
Expert Systems Limitations
o No technology can offer easy and complete solution.
o Large systems are costly; require significant development time, and computer resources.
o ESs have their limitations which include
o Limitations of the technology
o Difficult knowledge acquisition
o ES are difficult to maintain
o High development costs
Applications of Expert System
The following table shows where ES can be applied.
Application Description
Design Domain Camera lens design, automobile design.
Medical Domain Diagnosis Systems to deduce cause of disease from
Observed data, conduction medical operations on humans.
Monitoring Systems Comparing data continuously with observed system or with prescribed
behavior such as leakage monitoring in long petroleum pipeline.
Process Control Systems Controlling a physical process based on monitoring.
Knowledge Domain Finding out faults in vehicles, computers.
Finance/Commerce Detection of possible fraud, suspicious transactions, stock market trading,
Airline scheduling, cargo scheduling.
Expert System Technology
 There are several levels of ES technologies available.
 Expert systems technologies include
 Expert System Development Environment
o The ES development environment includes hardware and tools.
o They are
• Workstations,
• minicomputers,
• Mainframes.
• High level Symbolic Programming Languages such as LISt Programming (LISP) and
PROgrammationen LOGique (PROLOG).
• Large databases.
• Tools
• They reduce the effort and cost involved in developing an expert system to large

Anekal
extent.
• Powerful editors and debugging tools with multi-windows
 They provide rapid prototyping
o Have Inbuilt definitions of model,
o knowledge representation, and
o Inference design.
 Shells
o A shell is nothing but an expert system without knowledge base.
o A shell provides the developers with knowledge acquisition,
o inference engine,
o user interface, and
o Explanation facility.
 For example, few shells are given below
o Java Expert System Shell (JESS) that provides fully developed Java API for creating an expert
system.
 Vidwan, a shell developed at the National Centre for Software Technology, Mumbai in 1993. It
enables knowledge encoding in the form of IF-THEN rules
Representing and Using Domain Knowledge
• Expert systems are complex AI programs.
• The most widely used way of representing domain knowledge in expert systems is as a set of
production rules, which are often coupled with a frame system that defines the objects that occur in the
rules.
• MYCIN is one example of an expert system rule.
• All the rules we show are English versions of the actual rules that the systems use.
 RI (sometimes are called XCON) is a program that configures DEC VAX systems.
 Its rules look like this:
 Notice that RI’s rules, unlike MYCIN’s, contain no numeric measures of certainty.
 In the task domain with which RI deals, it is possible to state exactly the correct thing to be done in
each particular set of circumstances.
 One reason for this is that there exists a good deal of human expertise in this area.
 Another is that since RI is doing a design task, it is not necessary to consider all possible
alternatives; one good one is enough. As a result, probabilistic information is not necessary in RI.

Anekal
 PROSPECTOR is a program that provides advice on mineral exploration. Its rules look like this:
o In PROSPECTOR, each rule contains two confidence estimates.

o The first indicates the extent to which the presence of the evidence described in the condition part
of the rule suggests the validity of the rule’s conclusion.
o In the PROSPECTOR rule shown above, the number 2 indicates that the presence of the evidence is
mildly encouraging.
o The second confidence estimate measures the extent to which the evidence is necessary to the
validity of the conclusion or stated another way, the extent to which the lack of the evidence
indicates that the conclusion is not valid.
 DESIGN ADVISOR is a system that critiques chip designs. Its rules look like:
 This gives advice to a chip designer, who can accept or reject the advice.
 If the advice is rejected,, the system can exploit a justification-based truth maintenance system to
revise its model of the circuit.
 The first rule shown here says that an element should be criticized for poor reset ability if the
sequential level count is greater than two, unless its signal is currently believed to be resettable.
Reasoning with the Knowledge

 Expert systems exploit many of the representation and reasoning mechanisms that we have seen.
 Because these programs are usually written primarily as rule-based systems, forward chaining,
backward chaining, or some combination of the two is usually used.

Anekal
 For example, MYCIN used backward chaining to discover what organisms were present; then it used
forward chaining to reason from the organisms to a treatment regime.
 RI, on the other hand, used forward chaining.
 As the field of expert systems matures, more systems that exploit other kinds of reasoning mechanisms
are being developed.
 The DESIGN ADVISOR is an example of such a system; in addition to exploiting rules, it makes extensive
use of a justification-based truth maintenance system.
EXPERT SYSTEM SHELLS
 Initially, each expert system that was built was created from scratch, usually in LISP.
 In particular, since the systems were constructed as a set of declarative representations combined with
an interpreter for those representations, it was possible to separate the interpreter from the domain
specific knowledge and thus to create a system that could be used to construct new expert systems by
adding new knowledge corresponding to the new problem domain.
 The resulting interpreters are called shells. One influential example of such a shell is EMYCIN (for
empty MYCIN) which was derived from MYCIN
 There are now several commercially available shells that serve as the basis for many of the expert
systems currently being built.
 These shells provide much greater flexibility in representing knowledge and in reasoning with it than
MYCIN did.
 They typically support rules, frames, truth maintenance systems, and a variety of other reasoning
mechanisms.
 Early expert systems shells provided mechanisms for knowledge representation, reasoning and
explanation.
 But as experience with using these systems to solve real world problem grew, it became clear that
expert system shells needed to do something else as well.
 They needed to make it easy to integrate expert systems with other kinds of programs.
EXPLANATION
In order for an expert system to be an effective tool, people must be able to interact with it easily. To
facilitate this interaction, the expert system must have the following two capabilities in addition to the
ability to perform its underlying task:
Explain its reasoning:
o In many of the domains in which expert systems operate, people will not accept results unless they
have been convinced of the accuracy of the reasoning process that produced those results.
o This is particularly true, for example, in medicine, where a doctor must accept ultimate responsibility
for a diagnosis, even if that diagnosis was arrived at with considerable help from a program.
Acquire new knowledge and modifications of old knowledge:
o Since expert systems derive their power from the richness of the knowledge bases they exploit, it is
extremely important that those knowledge bases be as complete and as accurate as possible.
o One way to get this knowledge into a program is through interaction with the human expert. Another
way is to have the program learn expert behavior from raw data.

Anekal
KNOWLEDGE ACQUISITION
 The success of any expert system majorly depends on the quality, completeness, and accuracy of the
information stored in the knowledge base.
 The knowledge base is formed by readings from various experts, scholars, and the Knowledge
Engineers.
 The knowledge engineer is a person with the qualities of empathy, quick learning, and case
analyzing skills.
 He acquires information from subject expert by recording, interviewing, and observing him at work,
etc.
 He then categorizes and organizes the information in a meaningful way, in the form of IF-THEN-
ELSE rules, to be used by interference machine.
 The knowledge engineer also monitors the development of the ES.
How are expert systems built?

 Typically, a knowledge engineer interviews a domain expert to elucidate expert knowledge, when is
then translated into rules.
 After the initial system is built, it must be iteratively refined until it approximates expert-level
performance.
 This process is expensive and time-consuming, so it is worthwhile to look for more automatic ways of
constructing expert knowledge bases.
 While no totally automatic knowledge acquisition systems yet exist, there are many programs that
interact with domain experts to extract expert knowledge efficiently.
 These programs provide support for the following activities:
 Entering knowledge
 Maintaining knowledge base consistency
 Ensuring knowledge base completeness
 The most useful knowledge acquisition programs are those that are restricted to a particular problem-
solving paradigm e.g. diagnosis or design.
 It is important to be able to enumerate the roles that knowledge can play in the problem-solving
process.
 For example, if the paradigm is diagnosis, then the program can structure its knowledge base around

Anekal
symptoms, hypotheses and causes.

 It can identify symptoms for which the expert has not yet provided causes.
 Since one symptom may have multiple causes, the program can ask for knowledge about how to decide
when one hypothesis is better than another.
 If we move to another type of problem-solving, say profitably interacting with an expert.
MOLE (Knowledge Acquisition System)
 It is a system for heuristic classification problems, such as diagnosing diseases.
 In particular, it is used in conjunction with the cover-and-differentiate problem-solving method.
 An expert system produced by MOLE accepts input data, comes up with a set of candidate explanations
or classifications that cover (or explain) the data, then uses differentiating knowledge to determine
which one is best.
 The process is iterative, since explanations must themselves be justified, until ultimate causes the
ascertained.
 MOLE interacts with a domain expert to produce a knowledge base that a system called MOLE-p (for
MOLE-performance) uses to solve problems.
The acquisition proceeds through several steps:
 Initial Knowledge base construction.
o MOLE asks the expert to list common symptoms or complaints that might require diagnosis.
o For each symptom, MOLE prompts for a list of possible explanations.
o MOLE then iteratively seeks out higher-level explanations until it comes up with a set of ultimate
causes.
o During this process, MOLE builds an influence network similar to the belief networks.
o The expert provides covering knowledge, that is, the knowledge that a hypothesized event might be
the cause of a certain symptom.
o Refinement of the knowledge base.
o MOLE now tries to identify the weaknesses of the knowledge base.
o One approach is to find holes and prompt the expert to fill them.
o It is difficult, in general, to know whether a knowledge base is complete, so instead MOLE
lets the expert watch MOLE-p solving sample problems.
o Whenever MOLE-p makes an incorrect diagnosis, the expert adds new knowledge.
o There are several ways in which MOLE-p can reach the wrong conclusion.
o It may incorrectly reject a hypothesis because it does not feel that the hypothesis is needed
to explain any symptom.
o MOLE has been used to build systems that diagnose problems with car engines, problems in
steel rolling mills, and inefficiencies in coal-burning power plants.
o For MOLE to be applicable, however, it must be possible to pre-enumerate solutions or
classifications.
o It must also be practical to encode the knowledge in terms of covering and differentiating.
o One problem-solving method useful for design tasks is called propose-and-revise.

Anekal
o Propose- and revise systems build up solutions incrementally.

o First, the system proposes an extension to the current design.
o Then it checks whether the extension violates any global or local constraints. Constraints
violations are then fixed, and the process repeats.
SALT Program
 The SALT program provides mechanisms for elucidating this knowledge from the expert.
 Like MOLE, SALT builds a dependency network as it converses with an expert. Each node stands
for a value of a parameter that must be acquired or generated.
 There are three kinds of links:
o Contributes-to: Associated with the first type of link are procedures that allow SALT to
generate a value for one parameter based on the value of another.
o Constraints: Rules out certain parameter values.
Suggests-revision-of: points of ways in which a constraint violation can be fixed.
o
SALT uses the following heuristics to guide the acquisition process:
1. Every non-input node in the network needs at least one contributes-to link coming into it.
2. If links are missing, the expert is prompted to fill them in.
3. No contributes-to loops are allowed in the network.
4. Without a value for at least one parameter in the loop, it is impossible to compute values for any
parameter in that loop.
5.If a loop exists, SALT tries to transform one of the contributes-to links into a constraints link.
6.Constraining links should have suggests-revision-of links associated with them.
7.These include constraints links that are created when dependency loops are broken.
o Control Knowledge is also important.
o It is critical that the system propose extensions and revisions that lead toward a design solution.
o SALT allows the expert to rate revisions in terms of how much trouble they tend to produce.
o SALT compiles its dependency network into a set of production rules.
o As with MOLE, an expert can watch the production system solve problems and can override the
system’s decision.
o At the point, the knowledge base can be changes or the override can be logged for future
inspection.

Anekal

Lecture Notes 18CS753-MODULE5

Uploaded by

Copyright:

Available Formats

Lecture Notes 18CS753-MODULE5

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture Notes 18CS753-MODULE5

Uploaded by

Copyright:

Available Formats

Regulation – 2018 (CBCS Scheme) ARTIFICIAL INTELLIGENCE - 18CS753

MODULE 5- LEARNING, EXPERT SYSTEMS

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 1

MODULE 5A- LEARNING

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 2

A simple example of rote learning is caching

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 3

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 4

EMYCIN is an example of such a system. Example Learning System - FOO

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 5

(Achieve (>= (and (in-suit-led(card-of me)) (possible

The t terms values of features and the c terms are weights.

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 6

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 7

LEARNING FROM EXAMPLES: INDUCTION

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 8

There are three basic steps to the problem of concept formulation:

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 9

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 10

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 11

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 12

CANDIDATE ELIMINATION ALGORITHM

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 13

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 14

 The negative instance of Japanese car will only cause an inconsistency.

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 15

EXPLANATION BASED LEARNING:

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 16

IN GENERAL INPUTS TO EBL PROGRAMS:

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 17

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 18

Old Proof: New Proof:

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 19

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 20

5B: EXPERT SYSTEMS

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 21

Capabilities of Expert Systems

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 22

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 23

• It should help users to accomplish their goals in shortest possible way.

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 24

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 25

o In PROSPECTOR, each rule contains two confidence estimates.

Reasoning with the Knowledge

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 26

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 27

How are expert systems built?

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 28

symptoms, hypotheses and causes.

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 29

o Propose- and revise systems build up solutions incrementally.

Prepared by KARTHIKA K – Dept. of CSE, Sri Sairam College of Engineering, Page 30

You might also like