Combined A.I Note
Principles and
Concepts of
Machine Intelligence
COPYRIGHT © 2009
Introduction
AI is one of the newest sciences. Work started in earnest soon after World War II, and
the name itself was coined in 1956. Along with molecular biology, AI is regularly cited
as the "field I would most like to be in" by scientists in other disciplines. A student in
physics might reasonably feel that all the good ideas have already been taken by
Galileo, Newton, Einstein, and the rest. AI, on the other hand, still has openings for
several groundbreaking contributions.
What is A.I.?
Some definitions of artificial intelligence have been organized into four categories in
Figure 1.
Principia Mathematica (a book on mathematics), Gelernter's theorem prover
(geometry).
o It appeared that computers could perform well at these formal tasks
simply by being fast at exploring a large number of solution paths and
then selecting the best one, but this assumption turned out to be false,
since no computer is fast enough to overcome the combinatorial
explosion generated by most problems
o General Problem Solver (GPS) – (Newell et al., 1963) an attempt to
model commonsense reasoning and symbolic manipulation of logical
expressions.
• The initial drawback of the early attempts was that the programs were not designed
to handle large amounts of knowledge. But as AI research progressed and
techniques for handling larger amounts of knowledge were developed, new tasks
could reasonably be attempted. These include:
• Some task domains in A.I.:
• Mundane Tasks:
o perception (vision, speech)
o Natural Language processing (understanding, generation, translation)
o Commonsense reasoning
o Robot control
• Formal tasks
o Games (Checkers, Chess, Ayo etc.)
o Mathematics (Geometry, Logic, Calculus, Proving properties of
Program)
• Expert Tasks
o Engineering (Design, fault finding, manufacturing planning)
o Scientific analysis
o Medical analysis
o Financial analysis
Developments in AI
1. Problem Solving and Planning: This deals with systematic refinement of goal
hierarchy, plan revision mechanisms and a focused search of important goals.
2. Expert Systems: This deals with knowledge processing and complex decision-making
problems.
3. Natural Language Processing: Areas such as automatic text generation, text
processing, machine translation, speech synthesis and analysis, grammar and style
analysis of text etc. come under this category.
4. Robotics: This deals with the controlling of robots to manipulate or grasp objects and
using information from sensors to guide actions etc.
5. Computer Vision: This topic deals with intelligent visualisation, scene analysis,
image understanding and processing and motion derivation.
6. Learning: This topic deals with research and development in different forms of
machine learning.
7. Genetic Algorithms: These are adaptive algorithms which have inherent learning
capability. They are used in search, machine learning and optimisation.
8. Neural Networks: This topic deals with simulation of learning in the human brain by
combining pattern recognition tasks, deductive reasoning and numerical computations.
Application Areas of AI
Autonomous Planning and Scheduling, Game Playing, Autonomous Control,
Diagnosis, Logistics Planning, Robotics, Language Understanding and Problem
Solving
o Are there techniques that are appropriate for the solution of a variety of
AI problems?
o Can these techniques be useful in solving other problems?
- An AI technique must exploit knowledge in such a way that:
• The knowledge captures generalization
• It can be understood by people who must provide it
• It can easily be modified to correct errors and reflect changes in the world and
in our world view
• It can be used in a great many situations even if it is not totally accurate or
complete
• It can be used to help overcome its own sheer bulk by helping to narrow the
range of possibilities that must be considered.
Note that it is possible to solve AI problems using non-AI techniques (although the
solutions are not likely to be very good). It is also possible to apply AI techniques
to the solution of non-AI problems.
Characteristics of AI Techniques
• Search – provides a way of solving important problems for which no more
direct approach is available as well as a framework into which any
direct techniques that are available can be embedded
• Use of Knowledge – Provides a way of solving complex problems by exploiting
the structures of the objects that are involved.
• Abstraction – Provides a way of separating important features and variations
from the many unimportant ones that would otherwise overwhelm any
process.
Why develop AI programs?
Why do we attempt to model human performance?
• To test psychological theories of human performance
• To enable computers to understand human reasoning
• To enable people to understand computer reasoning
• To exploit what knowledge we can glean from people
How do we measure a system's intelligence?
• Turing Test (Alan Turing, 1950): A method to determine whether a system can think.
Problem Spaces
Solving AI problems requires four things:
• Define the problem precisely: this must include the specification of the initial
situation(s) and the goal of the problem solving process
• Analyze the problem: the emergent features after analysis can help determine
the appropriateness of various possible techniques for solving the problem.
• Isolate and represent the task knowledge that is necessary to solve the problem.
• Choose the best problem-solving technique(s) and apply it (them) to the
particular problem.
• AI problems are abstractions of a state space search containing the set of
possible states, the operators and the set of rules for transformation.
• State spaces:
o It forms the basis of most AI problems
o It allows for a formal definition of a problem, as the problem of
transforming some initial state into some goal state using a set of
permissible operations
o It permits us to define the process of solving a particular problem as a
combination of known techniques (each represented as a rule defining a
single step in the state space) and search, the general technique of
exploring the space in order to find some path from the current state to a
goal state
o Most AI problems are NP problems. Hence search is a very important
process in the solution of hard problems for which no more direct
techniques are available.
Exercise
1. Find a good state space representation for the following problems:
i) Traveling salesman
ii) Missionaries and cannibals
iii) Tower of Hanoi
Example
Water Jug Problem: You are given two jugs, a 4-gallon one and a 3-gallon one.
Neither has any measuring marks on it. There is a pump that can be used to fill the
jugs with water. How can you get exactly 2 gallons of water into the 4-gallon jug?
Although such a specification must somehow be provided before we can design a
program to solve the problem, producing the specification is itself a very hard
problem.
• Generally, a problem can be solved by using the rules of the state space, in
combination with an appropriate control strategy, to move through the problem
space until a goal state is found. The process of search is therefore fundamental
to problem solving.
Production Systems
Consists of:
• a set of rules, each consisting of a left side (a pattern) that determines the
applicability of the rule, and a right side that describes the operation to be
performed;
• one or more knowledge databases (knowledge bases) that contain relevant and
appropriate information for the task;
• a control strategy that specifies the order in which the rules will be compared to
the database, and a way of resolving the conflicts that arise when several rules
match at once;
• a rule applier.
Examples: OPS5 [Brownston et al., 1985], ACT* [Anderson, 1983]
- Expert System Shells
- General problem-solving architectures like SOAR [Laird et al., 1987]
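As an illustration, such a production system can be sketched in a few lines of Python (a toy sketch, not any of the systems cited above; the rule format and the "first matching rule fires" conflict-resolution strategy are assumptions made for this example):

```python
# Toy production system: rules are (pattern-test, action) pairs over a
# working-memory set of facts; conflict resolution is "first rule that matches".

def run_production_system(rules, facts, max_cycles=10):
    """Repeatedly fire the first applicable rule until no rule adds a new fact."""
    facts = set(facts)
    for _ in range(max_cycles):
        for test, action in rules:          # control strategy: rule order
            new_facts = action(facts) if test(facts) else None
            if new_facts and not new_facts <= facts:
                facts |= new_facts          # right side: operation performed
                break                       # restart the match cycle
        else:
            break                           # no rule applicable: quiescence
    return facts

# Two rules: "if wet then slippery" and "if raining then wet".
rules = [
    (lambda f: "wet" in f,     lambda f: {"slippery"}),
    (lambda f: "raining" in f, lambda f: {"wet"}),
]

print(run_production_system(rules, {"raining"}))
```

Note how the second rule must fire before the first can: the control strategy repeatedly re-scans the rules against the growing knowledge base.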
Control strategies
Requirements
1. Causes motion: it should be able to lead to a solution
2. Systematic: it should be based on a structure that brings motion, e.g. a tree or
graph, which can facilitate an effective search process.
Searching Techniques
Algorithm:
Breadth-First Search
1. Create a variable called NODE-LIST and set it to the initial state
2. Until a goal state is found or NODE-LIST is empty do:
a) Remove the first element from NODE-LIST and call it E.
If NODE-LIST was empty, quit
b) For each way that each rule can match the state described in E do:
i) Apply the rule to generate a new state
ii) If the new state is a goal state, quit and return this state
iii) Otherwise, add the new state to the end of NODE-LIST
Description
• Construct a tree with the initial state as its root. Generate all offspring of the root
by applying each of the applicable rules to the initial state. Thereafter, for each leaf
node, generate all successors by applying the rules that are appropriate.
• Continue this process until some rule produces a goal state.
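For the water jug problem, the breadth-first procedure above can be sketched in Python (the successor rules are the usual fill/empty/pour operations; the function names are illustrative):

```python
from collections import deque

def water_jug_successors(state):
    """Apply every rule to (x, y): x = 4-gallon jug, y = 3-gallon jug."""
    x, y = state
    moves = [(4, y), (x, 3), (0, y), (x, 0)]                 # fill / empty either jug
    pour = min(x, 3 - y); moves.append((x - pour, y + pour)) # pour x -> y
    pour = min(y, 4 - x); moves.append((x + pour, y - pour)) # pour y -> x
    return [m for m in moves if m != state]

def breadth_first_search(start, is_goal, successors):
    node_list = deque([start])            # NODE-LIST from the algorithm
    seen = {start}
    while node_list:
        e = node_list.popleft()           # step (a): remove the first element
        if is_goal(e):
            return e
        for s in successors(e):           # step (b): apply each rule
            if s not in seen:
                seen.add(s)
                node_list.append(s)       # add the new state to the end
    return None

print(breadth_first_search((0, 0), lambda s: s[0] == 2, water_jug_successors))
```

The `seen` set plays the role of the search-graph idea discussed later: it stops previously generated states from being expanded again.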
Exercise:
Apply breadth-first search to the water jug problem.
Depth-first Search
• Pursues a single branch of the tree until it yields a solution or until a decision to
terminate the path is made (i.e. when it reaches a dead end, or when the length of
the path exceeds the futility limit). Thereafter backtracking occurs. It
backtracks to the most recently created state from which alternative moves are
available.
• Chronological backtracking – so called because the order in which steps are undone
depends only on the temporal sequence in which the steps were originally made.
The most recent step is always the first to be undone.
Figure: One level of a breadth-first search tree for the water jug problem – the root (0,0) has successors (4,0) and (0,3).
Algorithm: Depth-First Search
1. If the initial state is a goal state, quit and return success.
2. Otherwise, do the following until success or failure is signaled:
(a) Generate a successor, E, of the initial state. If there are no more successors,
signal failure.
(b) Call Depth-First Search with E as the initial state.
(c) If success is returned, signal success; otherwise continue this loop.
Figure: A depth-first search tree for the water jug problem – the single path (0,0), (4,0), (4,3).
Advantages of Depth-First Search
• Requires less memory, since only the nodes on the current path are stored; this
contrasts with breadth-first search, where all of the tree that has so far been
generated must be stored.
• By chance (if care is taken in ordering the alternative successor states), it may find
a solution without much searching.
• In breadth first all nodes at level n must be examined before any node on level
n+1 can be examined. This is particularly significant if many acceptable solutions
exist. Depth-first search can stop when one of them is found.
Disadvantage
• A wrong path may be followed, and the search can get trapped if there are loops on
that path.
• It may find a solution along a long path of the tree, not necessarily a minimal path.
Note: the Best-First Search combines the strengths of these two algorithms to achieve
a better implementation.
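The recursive depth-first procedure above can be sketched in Python (a minimal sketch: the depth limit plays the role of the futility limit, loop avoidance checks the current path, and the small example graph is made up for illustration):

```python
def depth_first_search(state, is_goal, successors, limit=10, path=None):
    """Recursive depth-first search with a 'futility limit' on path length."""
    path = path or [state]
    if is_goal(state):
        return path                       # success: return the path found
    if limit == 0:
        return None                       # futility limit reached: backtrack
    for e in successors(state):           # generate successors one by one
        if e not in path:                 # avoid loops on the current path
            result = depth_first_search(e, is_goal, successors,
                                        limit - 1, path + [e])
            if result:
                return result             # success is propagated upward
    return None                           # no successor worked: failure

# Small example graph; chronological backtracking undoes the most recent step first.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
print(depth_first_search("A", lambda s: s == "E", lambda s: graph[s]))
```

Backtracking here is implicit: when a recursive call returns None, control falls back to the most recently created state that still has untried successors.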
Exercise
Discuss the traveling salesman problem.
A salesman has a list of cities, each of which he must visit exactly once. There are
direct roads between each pair of cities on the list. Find the route the salesman
should follow for the shortest possible trip that both starts and finishes at any one of
the cities.
Figure: A five-city instance with cities IB, OS, ON, LA and BE, and direct roads between every pair.
- Combinatorial explosion
- Branch and bound strategy
- NP problem.
Number of paths among the cities = (N-1)!
Total time to perform the search is of the order of N!
Branch and bound Technique: Begin generating complete paths, keeping track of
the shortest path found so far. Give up exploring any path as soon as its partial length
becomes greater than the shortest path found so far.
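The branch-and-bound strategy can be sketched on a tiny instance (the distance matrix below is made up for illustration; cities are just indices):

```python
def branch_and_bound_tsp(dist):
    """Grow partial tours city by city, pruning any partial tour whose
    length already exceeds the shortest complete tour found so far."""
    n = len(dist)
    best = [float("inf"), None]                 # [shortest length, tour]

    def extend(tour, length):
        if length >= best[0]:
            return                              # bound: give up this path
        if len(tour) == n:                      # complete tour: close the loop
            total = length + dist[tour[-1]][tour[0]]
            if total < best[0]:
                best[0], best[1] = total, tour + [tour[0]]
            return
        for city in range(n):
            if city not in tour:                # branch on each unvisited city
                extend(tour + [city], length + dist[tour[-1]][city])

    extend([0], 0)
    return best

# Four cities with made-up symmetric distances.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(branch_and_bound_tsp(dist))
```

Pruning does not change the answer, only the work: every path abandoned early could not possibly beat the best tour already in hand.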
Heuristic Search
• A heuristic is a technique that improves the efficiency of a search process,
possibly by sacrificing claims of completeness.
• It is a control structure that is not guaranteed to find the best answer but will
always find a good answer.
• Heuristics are rules of thumb; unlike algorithms, they do not guarantee
correctness.
• Heuristics help to find good though non-optimal solutions to NP problems.
• There are general purpose heuristics and also domain-specific heuristics. e.g.
Nearest neighbour heuristic, which works by selecting the locally superior
alternative at each step.
• Error bounds (applicable to general purpose heuristics)
• Heuristics address the problem of combinatorial explosion.
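The nearest-neighbour heuristic mentioned above can be sketched for the travelling salesman problem (the distance matrix is made up for illustration; on this particular instance the greedy tour happens to be optimal, which is not guaranteed in general):

```python
def nearest_neighbour_tour(dist, start=0):
    """Greedy TSP heuristic: always move to the closest unvisited city.
    Fast, but produces a good tour rather than a guaranteed-optimal one."""
    n = len(dist)
    tour, current = [start], start
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = min(unvisited, key=lambda c: dist[current][c])  # locally superior step
        tour.append(current)
        unvisited.discard(current)
    tour.append(start)                                            # return home
    length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
    return tour, length

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(nearest_neighbour_tour(dist))
```

The heuristic visits each city once and runs in roughly N^2 steps, instead of the (N-1)! paths an exhaustive search would examine.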
Arguments in favour of Heuristics
• Rarely do we actually need the optimum solution in the real world; a good
approximation will usually serve very well.
• Although the approximations produced by heuristics may not be very good in the
worst case, worst cases rarely arise in the real world.
• Understanding why a heuristic works, or why it doesn't, often leads to a
deeper understanding of the problem.
• Domain-specific heuristic knowledge can be incorporated into a rule-based search
procedure:
• by putting it in the rules
• as a heuristic function that evaluates individual problem states and determines
how desirable they are.
• A heuristic function is a function that maps problem state descriptions to measures
of desirability, usually represented as numbers. The value of the heuristic function
at any given node is meant to be as good an estimate as possible of whether that
node lies on the desired path to a solution; it specifies the level of importance of
that node to the solution path. It is designed to guide a search process efficiently
toward a solution. The purpose of the heuristic function is to guide the search
process in the most profitable direction by suggesting which path to follow first
when more than one is available.
Characteristics of Problems
• Decomposable problems: can the problem be decomposed into smaller or easier
subproblems?
• Ignorable problems: solution steps can be ignored when considered not
necessary (e.g. theorem proving)
• Recoverable: in which solution steps can be undone
• Irrecoverable: in which solution steps cannot be undone
• Certain-outcome: lead to definite outcome
• Uncertain-outcome: produces a probability to lead to a solution
(the hardest problems to solve are those that are irrecoverable, uncertain-
outcome) e.g. advising a lawyer who is defending a client who is standing trial
for murder
• Problems that require an absolutely good (optimal) solution versus those that
require only a relatively good one, e.g. the traveling salesman
(any-path vs. best-path) problem.
• Problems whose solution is a state versus those whose solution is a path.
Characteristics of Production systems
• Production systems are a good way to describe the operations that are
performed in a search for a solution to a problem.
• Questions
• Can Production system be described with characteristics that shed some light on
how they can easily be implemented?
• If so, what relationships are there between problem types and the types of
production systems suited to solving those problems?
Types of Production systems
• Monotonic production systems: the application of a rule never prevents the later
application of another rule that could have been applied at the time the first rule
was selected.
• Non-Monotonic P.S: lacks the above attribute of Monotonic system
• Partially Commutative P.S.: if the application of a particular sequence of rules
transforms state x into state y, then any permutation of those rules that is
allowable also transforms state x into state y. (i.e. the order of selection of the
set of rules is not important)
• Commutative P.S: is both monotonic and partially commutative
• Partially commutative, monotonic systems are useful for solving ignorable
problems. Problems that involve creating new things rather than changing old
ones are generally ignorable, e.g. theorem proving and making deductions
from known facts.
• Non-Monotonic, Partially Commutative P.S. are good for problems where
changes occur but can be reversed and in which order of operations is not
critical. e.g. Physical manipulation systems like ‘robot navigation on flat plane’
• Non-partially-commutative P.S. are useful for problems in which irreversible
changes occur, e.g. describing the process of producing a chemical compound
(chemical synthesis).
Issues in the Design of Search Programs
• Every search process can be viewed as the traversal of a tree structure
• Each node represents a problem state
• Each arc represents a relationship between the state nodes it connects
• In most programs the tree is not represented explicitly; it is represented
implicitly by the rules, and the program explicitly generates only those paths
that it decides to explore.
• (Therefore, keep in mind the distinction between the implicit search tree and
the explicit partial search tree that is actually constructed by the search
program.)
• The direction of search (forward / backward reasoning.)
• Selection of applicable rules (matching)
• How to represent each node of the search process (Knowledge representation
problem and the Frame problem)
Search Tree/Graph
• The concept of a search graph eliminates the repeated traversal of previously
expanded and processed nodes that may reappear. This makes the search space
an arbitrary directed graph rather than a tree. The graph differs from a tree in
that several paths may come together at a node. (See Figure 1.3 below)
Figure 1.3: Part of the water jug search graph – the node (0,0) with successors (4,0) and (0,3).
• Simulated Annealing (Kirkpatrick et al. 1983)
• Algorithm:
1. Evaluate the initial state. If it is a goal state, then return it and quit. Otherwise
continue with the initial state as the current state.
2. Initialize BEST-SO-FAR to the current state
3. Initialize T according to the annealing schedule
4. Loop until a solution is found or until there are no new operators left to be
applied in the current state
a) Select an operator that has not yet been applied to current state and apply
it to produce a new state
b) Evaluate the new state. Compute
ΔE = (value of current state) − (value of new state)
- If the new state is a goal state, then return it and quit.
- If it is not a goal state but is better than the current state, then make it the
current state. Also set BEST-SO-FAR to this new state.
- If it is not better than the current state, then make it the current state with
probability P′, where P′ = e^(−ΔE/T). This step is usually implemented by
invoking a random number generator to produce a number in the range [0,1].
If that number is less than P′, the move is accepted. Otherwise do
nothing.
(c) Revise T as necessary according to annealing schedule
5. Return BEST-SO-FAR as the answer.
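The simulated annealing algorithm above can be sketched in Python (a minimal sketch for a toy one-dimensional landscape; the value function, the neighbour moves and the annealing schedule are all assumptions made for this example):

```python
import math
import random

def simulated_annealing(start, value, neighbours, schedule, seed=0):
    """Maximise value(x): always accept improving moves, and accept a worsening
    move with probability P' = exp(-dE / T), dE = value(current) - value(new)."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    current = best = start
    for t in schedule:                   # the annealing schedule supplies T
        new = rng.choice(neighbours(current))
        d_e = value(current) - value(new)
        if d_e <= 0 or rng.random() < math.exp(-d_e / t):
            current = new                # move, possibly downhill
            if value(current) > value(best):
                best = current           # track BEST-SO-FAR
    return best

# Toy landscape: maximise -(x - 3)^2 over the integers, stepping left or right.
value = lambda x: -(x - 3) ** 2
neighbours = lambda x: [x - 1, x + 1]
schedule = [10.0 / (1 + k) for k in range(200)]   # T slowly decreasing toward 0
print(simulated_annealing(0, value, neighbours, schedule))
```

Early on, when T is large, P′ is close to 1 and bad moves are often accepted; as T falls, the procedure behaves more and more like pure hill climbing.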
Best- First Search
• Combines the good attributes of both Depth-First Search and Breadth-First
Search.
• Depth first search is good because it allows a solution to be found without all
computing branches having to be expanded.
• Breadth-first search is good because it does not get trapped on dead-end paths.
The Best-first combines these two attributes to follow a single path at a time,
but switch paths whenever some competing path looks more promising than the
current one does.
• At each step of the best-first search process, we select the most promising of the
nodes generated so far. This is done by applying an appropriate heuristic
function to each of them. We then expand the chosen node by using the rules to
generate its successors. If one of them is a solution, we can quit. If not, all those
new nodes are added to a set of nodes generated so far. Again the most
promising node is selected and the process continues. Usually what happens is
that a bit of depth-first searching occurs as the most promising branch is
explored. But eventually, if a solution is not found, that branch will start to look
less promising than one of the top-level branches that had been ignored. At that
point, the now more promising, previously ignored branch will be explored. But
the old branch is not forgotten. Its last node remains in the set of generated but
unexpanded nodes. The search returns to it when all the others get bad enough
that it is again the most promising path.
Figure 1.4: Stages of a best-first search – the root A is expanded to B, C and D; at each step the most promising node is expanded, generating in turn E and F, then G and H, then I and J.
• OPEN - nodes that have been generated and have had the heuristic function
applied to them but which have not yet been examined ( i.e. had their
successors generated)
• OPEN is actually a priority queue in which the elements with the highest
priority are those with the most promising value of the heuristic function.
(Revise: Standard technique for manipulating queues.)
• CLOSED- nodes that have already been examined. We need to keep the nodes
in memory if we want to search a graph rather than a tree, since whenever a new
node is generated, we need to check whether it has been generated before.
Algorithm
1. Start with OPEN containing just the initial state
2. until a goal is found or there are no nodes left on OPEN do:
(a) Pick the best node on OPEN
(b) Generate its successors
(c) For each successor do:
i. If it has not been generated before, evaluate it, add it to OPEN,
and record its parent.
ii. If it has been generated before, change the parent if this new path
is better than the previous one. In that case, update the cost of
getting to this node and to any successors that this node may
already have.
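With OPEN as a priority queue, the best-first algorithm can be sketched in Python (a minimal sketch: the toy graph and heuristic values are made up, and the path-cost update of step ii is omitted for brevity):

```python
import heapq

def best_first_search(start, is_goal, successors, h):
    """OPEN is a priority queue ordered by the heuristic value h(node);
    the parent map records already-generated nodes so a graph can be searched."""
    open_heap = [(h(start), start)]           # OPEN: most promising node first
    parent = {start: None}
    while open_heap:
        _, node = heapq.heappop(open_heap)    # pick the best node on OPEN
        if is_goal(node):
            path = []                         # recover the path via parent links
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for s in successors(node):
            if s not in parent:               # not generated before
                parent[s] = node
                heapq.heappush(open_heap, (h(s), s))
    return None

# Toy graph with made-up heuristic values (smaller h = more promising).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": [], "G": []}
h = {"A": 3, "B": 2, "C": 1, "D": 2, "G": 0}.get
print(best_first_search("A", lambda n: n == "G", lambda n: graph[n], h))
```

The heap pops C (h = 1) before B (h = 2), so the search switches to the more promising branch exactly as the prose above describes.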
It is clear from the above algorithm that it continues to explore a new state in
each iteration of the repeat-until loop and exits only when the current state is
equal to the goal. The most important part of the algorithm is generating a new
state. This is not an easy task. If generation of new states is not feasible, the
algorithm should be terminated.
In our simple algorithm, however, we intentionally did not include this, to keep it
simple. But how does one generate the states of a problem? To formalize this, we
define a four-tuple, called the state space, denoted by
{ nodes, arc, goal, current },
where
nodes represent the set of existing states in the search space;
an arc denotes an operator applied to an existing state to cause transition to another
state;
goal denotes the desired state to be identified in the nodes; and
current represents the state, now generated for matching with the goal.
The state space for most of the search problems we will cover in this chapter takes the
form of a tree or graph.
The algorithm for traversing a tree in a breadth first manner can be presented with a
queue as follows:
Procedure Breadth-first-search
Begin
i) Place the starting node in a queue;
ii) Repeat
Delete queue to get the front element;
If the front element of the queue = goal,
return success and stop;
Else do
Begin
insert the children of the front element,
if they exist, in any order at the rear end of
the queue;
End
Until the queue is empty;
End.
The breadth first search algorithm, presented above, rests on a simple principle. If the
current node is not the goal, add the offspring of the current node in any order to the
rear end of the queue, and redefine the front element of the queue as the current node.
The algorithm terminates when the goal is found.
Fig. 2: First few steps of breadth first search on the tree of fig. 1.
Time Complexity
For the sake of analysis, we consider a tree with an equal branching factor b at each
node and largest depth d. Since the goal is not located within depth (d−1), the number
of nodes visited in the failing searches [1], [2] is given by
1 + b + b^2 + b^3 + … + b^(d−1) = (b^d − 1)/(b − 1), b >> 1.
Further, the first state within the fringe nodes could be the goal. On the other hand, the
goal could be the last visited node in the tree. Thus, on average, the number of fringe
nodes visited is given by (1 + b^d)/2.
Consequently, the total number of nodes visited in the average case becomes
(b^d − 1)/(b − 1) + (1 + b^d)/2
≈ b^d (b + 1) / 2(b − 1).
Since the time complexity is proportional to the number of nodes visited, therefore, the
above expression gives a measure of time complexity.
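The node count above is a geometric sum, which is easy to confirm numerically for small b and d:

```python
def nodes_up_to_depth(b, d):
    """Directly count the nodes at depths 0 .. d-1 of a complete b-ary tree."""
    return sum(b ** k for k in range(d))

# The closed form (b^d - 1)/(b - 1) agrees with the direct count.
b, d = 3, 4
print(nodes_up_to_depth(b, d), (b ** d - 1) // (b - 1))
```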
Space Complexity
The maximum number of nodes will be placed in the queue when the leftmost node at
depth d is inspected for comparison with the goal. The queue length in this case
becomes b^d. The space complexity of the algorithm, which depends on the queue
length, in the worst case is thus of the order of b^d. In order to reduce the space
requirement, the generate-and-test algorithm is realized in an alternative manner, as
presented below.
Depth First Search
The depth first search generates nodes and compares them with the goal along the
largest depth of the tree and moves up to the parent of the last visited node, only when
no further node can be generated below the last visited node. After moving up to the
parent, the algorithm attempts to generate a new offspring of the parent node. The
above principle is employed recursively to each node of a tree in a depth first search.
One simple way to realize the recursion in the depth first search algorithm is to employ
a stack. A stack-based realization of the depth first search algorithm is presented below.
Procedure Depth-first-search
Begin
i) Push the starting node into a stack, pointed to by the stack-top;
ii) While the stack is not empty do
Begin
Pop stack to get the stack-top element;
If the stack-top element = goal,
return success and stop;
Else push the children of the stack-top element,
if they exist, in any order into the stack;
End While;
End.
Fig. 3. Depth first search on a tree, where the node numbers denote the order of visiting
that node.
In the above algorithm, a starting node is placed in the stack, the top of which is pointed
to by the stack-top. For examining the node, it is popped out from the stack. If it is the
goal, the algorithm terminates, else its children are pushed into the stack in any order.
The process is continued until the stack is empty. The ascending order of node numbers
in Fig. 3 represents its traversal of the tree in a depth first manner. The contents of the stack at
the first few iterations are illustrated below in Fig 4. The arrowhead in the figure
denotes the position of the stack-top.
Space Complexity
Maximum memory in depth first search is required when we reach the largest depth for
the first time. Assuming that each node has a branching factor b, when a node at depth
d is examined, the number of nodes saved in memory are all the unexpanded nodes up
to depth d plus the node being examined. Since at each level there are (b − 1) unexpanded
nodes, the total amount of memory required = d(b − 1) + 1. Thus the space complexity
of depth first search is a linear function of d, unlike breadth first search, where it is an
exponential function of d. This, in fact, is the most useful aspect of the depth first
search.
Time Complexity
If we find the goal at the leftmost position at depth d, then the number of nodes
examined = (d + 1). On the other hand, if we find the goal at the extreme right at depth
d, then the number of nodes examined includes all the nodes in the tree, which is
1 + b + b^2 + b^3 + … + b^d = (b^(d+1) − 1)/(b − 1)
So, the total number of nodes examined in an average case
= (d + 1)/2 + (b^(d+1) − 1)/2(b − 1)
≈ b(b^d + d) / 2(b − 1)
This is the average case time complexity of the depth first search algorithm. Since for
large depth d, the depth first search requires quite a large runtime, an alternative way to
solve the problem is by controlling the depth of the search tree. Such an algorithm,
where the user mentions the initial depth cut-off at each iteration, is called an Iterative
Deepening Depth First Search or simply an Iterative deepening search.
Iterative Deepening Search
When the initial depth cut-off is one, it generates only the root node and examines it. If
the root node is not the goal, then depth cut-off is set to two and the tree up to depth 2 is
generated using typical depth first search. Similarly, when the depth cut-off is set to m,
the tree is constructed up to depth m by depth first search. One may thus wonder that in
an iterative happening search, one has to regenerate all the nodes excluding the fringe
nodes at the current depth cut-off. Since the number of nodes generated by depth first
search up to depth h is
( bh+1-1) / (b-1),
the total number of nodes expanded in failing searches by an iterative deepening search
will be
The last pass in the algorithm results in a successful node at depth d, the average time
complexity of which by typical depth first search is given by
b(b^d + d) / 2(b − 1).
Thus the total average time complexity is given by
b(b^d − d)/(b − 1)^2 + b(b^d + d)/2(b − 1)
≈ (b + 1) b^(d+1) / 2(b − 1)^2.
Consequently, the ratio of the average time complexity of the iterative deepening
search to that of the depth first search is given by
{(b + 1) b^(d+1) / 2(b − 1)^2} : {b^(d+1) / 2(b − 1)}
= (b + 1) : (b − 1).
The iterative deepening search thus does not take much extra time, when compared to
the typical depth first search. The unnecessary expansion of the entire tree by depth first
search, thus, can be avoided by iterative deepening. A formal algorithm of iterative
deepening is presented below.
Procedure Iterative-deepening
Begin
1. Set current depth cutoff =1;
2. Put the initial node into a stack, pointed to by stack-top;
3. While the stack is not empty and the depth is within the
given depth cut-off do
Begin
Pop stack to get the stack-top element;
if stack-top element = goal, return it and stop
else push the children of the stack-top in any order
into the stack;
End While;
4. Increment the depth cut-off by 1 and repeat
through step 2;
End.
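The iterative deepening procedure can be sketched in Python (a recursive depth-limited search replaces the explicit stack; the example graph is made up for illustration):

```python
def depth_limited_search(node, is_goal, successors, limit):
    """Depth-first search that gives up below the depth cut-off."""
    if is_goal(node):
        return [node]
    if limit == 0:
        return None
    for child in successors(node):
        result = depth_limited_search(child, is_goal, successors, limit - 1)
        if result:
            return [node] + result
    return None

def iterative_deepening_search(start, is_goal, successors, max_depth=20):
    """Re-run depth-limited search with cut-off 1, 2, 3, ... until the goal is found."""
    for cutoff in range(1, max_depth + 1):   # increment the depth cut-off
        result = depth_limited_search(start, is_goal, successors, cutoff)
        if result:
            return result
    return None

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}
print(iterative_deepening_search("A", lambda n: n == "E", lambda n: graph[n]))
```

Each pass regenerates the shallow part of the tree, but as the analysis above shows, the extra work is only a constant factor of roughly (b + 1)/(b − 1).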
The breadth first, depth first and the iterative deepening search can be equally used for
Generate and Test type algorithms. However, while the breadth first search requires an
exponential amount of memory, the depth first search calls for memory proportional to
the largest depth of the tree. The iterative deepening, on the other hand, has the
advantage of searching in a depth first manner in an environment of controlled depth of
the tree.
Procedure Hill-Climbing
The ‘generate and test’ type of search algorithms presented above only expand the
search space and examine the existence of the goal in that space. An alternative
approach to solving search problems is to employ a function f(x) that gives an
estimate of the distance of the goal from node x. After f(x) is evaluated at the
possible initial nodes x, the nodes are sorted by their functional values and pushed
into a stack so that the stack-top element has the least f value. It is then popped out
and compared with the goal. If the stack-top element is not the goal, it is expanded
and f is measured for each of its children. These children are sorted by their
functional values and pushed into the stack in the same way. If the stack-top element
is the goal, the algorithm exits; otherwise the process is continued until the stack
becomes empty. Pushing the sorted nodes into the stack adds a depth-first flavour to
the present algorithm. The hill climbing algorithm is formally presented below.
Begin
1. Identify possible starting states and measure the distance (f) of their
closeness with the goal node; Push them in a stack according to the
ascending order of their f ;
2. Repeat
Pop stack to get the stack-top element;
If the stack-top element is the goal, announce it and exit
Else push its children into the stack in the ascending order of their f values;
Until the stack is empty;
End.
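The hill climbing procedure can be sketched in Python (a minimal greedy version that keeps only the best next state rather than a stack of sorted children; the function f and the neighbour moves are assumptions for a toy problem):

```python
def hill_climbing(start, f, neighbours):
    """Greedy descent on f (a distance-to-goal estimate): keep moving to the
    best neighbour while it improves on the current state."""
    current = start
    while True:
        best = min(neighbours(current), key=f)
        if f(best) >= f(current):
            return current               # local minimum (foothill) or plateau
        current = best                   # take the locally best step

# Toy example: minimise the distance to 7 on the integers.
f = lambda x: abs(x - 7)
neighbours = lambda x: [x - 1, x + 1]
print(hill_climbing(0, f, neighbours))
```

On this landscape every step improves f, so the climb reaches the goal; the shortcomings discussed next (foothills, plateaus, ridges) arise on landscapes where no neighbour improves on a non-goal state.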
The hill climbing algorithm too is not free from shortcomings. One common problem is
trapping at a local maximum on a foothill: when trapped at a local maximum, the
measure of f at all possible next legal states yields less promising values than the
current state. A second drawback of hill climbing is reaching a plateau [2]: once a state
on a plateau is reached, all legal next states also lie on this surface, making the search
ineffective. A new algorithm, called simulated annealing, discussed below, can solve
the first two problems. Besides the above, another problem that gives trouble is
traversal along a ridge. A ridge (vide fig. 4.5) on many occasions leads to a local
maximum. However, moving along a ridge is not possible in a single step due to the
non-availability of appropriate operators; multiple steps of movement are required to
solve this problem.
Simulated Annealing
“Annealing” is a process in metal working, where the metal is first melted at a high
temperature beyond its melting point and then allowed to cool down until it returns
to the solid form. Thus in the physical process of annealing, the hot material gradually
loses energy and finally at one point in time reaches a state of minimum energy. A
common observation reveals that most physical processes have transitions from higher
to lower energy states, but there still remains a small probability that the material may
cross a valley of energy states [2] and move up to an energy state higher than the
energy state of the valley. The concept can be illustrated with a rolling ball. For
instance, consider a rolling ball that falls from a higher (potential) energy state to a
valley and then moves up to a slightly higher energy state (vide fig. 4.6). The
probability of such:
Procedure Simulated-Annealing
Begin
1. Identify possible starting states and measure the distance (f) of their
closeness with the goal node; Push them in a stack according to the
ascending order of their f; Set the temperature T to a suitably high value;
2. Repeat
Pop stack to get the stack-top element;
If the stack-top element is the goal,
announce it and exit;
Else do
Begin
a) generate children of the stack-top element N and
compute f for each of them;
b) If measure of f for at least one child of N is improving
Then push those children into stack in ascending order of
their f;
c) If none of the children of N is better in f
Then do
Begin
a) select any one of them randomly, compute its p’ and test whether p’ exceeds a
randomly generated number in the interval [0,1]; If yes, select that state as the next
state; If no, generate
another alternative legal next state and test in this way until one move can be selected;
Replace stack-top element by the selected move (state);
b) Reduce T slightly; If the reduced value is negative, set it to zero;
End;
Until the stack is empty;
End.
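A minimal Python sketch of the idea follows. It simplifies the stack bookkeeping to a single current state, and the cooling schedule (multiplying T by a constant each round) is an assumption, since the text does not fix one.

```python
import math
import random

def simulated_annealing(start, f, expand, is_goal,
                        T=100.0, cooling=0.95, max_steps=10000):
    """Hill climbing with annealing moves.

    f is the heuristic (smaller is better); a worse state is accepted
    with probability p' = exp(-E / T), where E is the increase in f.
    """
    current = start
    for _ in range(max_steps):
        if is_goal(current):
            return current
        children = expand(current)
        if not children:
            return None
        best = min(children, key=f)
        if f(best) < f(current):
            current = best                        # ordinary hill-climbing move
        else:
            # No improving child: maybe accept a worse move with probability p'.
            candidate = random.choice(children)
            E = f(candidate) - f(current)
            p = math.exp(-E / T) if T > 0 else 0.0
            if random.random() < p:
                current = candidate
            T = max(T * cooling, 0.0)             # reduce T; keep it non-negative
    return None
```

When every state has an improving child, the annealing branch never fires and the sketch behaves exactly like hill climbing, as the text notes.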
The algorithm behaves like hill climbing as long as there exists at least one next state
better than the state pointed to by the stack-top. If this fails, the last begin-end
bracketed part of the algorithm is invoked. This part corresponds to simulated
annealing. It examines legal next states one by one, checking whether the probability of
occurrence of the state is higher than a random value in [0, 1]. If the answer is yes, the
state is selected; otherwise the next possible state is examined. Hopefully, at least one
state will be found whose probability of occurrence is larger than the randomly generated
probability.
Another important point that we did not include in the algorithm is the computation of E.
It is the difference between the f value of the next state and that of the current
(stack-top) state.
The third point to note is that T should be decreased once a new state with a less
promising value is selected. T is always kept non-negative. When T becomes zero, p′
will be zero, and thus the probability of transition to any other state will be zero.
In fact, we would expand nodes by judiciously selecting the more promising ones, where
these nodes are identified by measuring their strength relative to their competitive
counterparts with the help of specialized intuitive functions, called heuristic
functions.
Heuristic search is generally employed for two distinct types of problems: i) forward
reasoning and ii) backward reasoning. We have already discussed that in a forward
reasoning problem we move towards the goal state from a pre-defined starting state,
while in a backward reasoning problem, we move towards the starting state from the
given goal. The former class of search algorithms, when realized with heuristic
functions, is generally called heuristic Search for OR-graphs or the Best First search
Algorithms. It may be noted that the best first search is a class of algorithms, and
depending on the variation of the performance measuring function it is differently
named. One typical member of this class is the algorithm A*. On the other hand, the
heuristic backward reasoning algorithms are generally called AND-OR graph
search algorithms and one ideal member of this class of algorithms is the AO*
algorithm. We will start this section with the best first search algorithm.
Procedure Best-First-Search
Begin
1. Identify possible starting states and measure the distance (f) of their
closeness with the goal; Put them in a list L;
2. While L is not empty do
Begin
a) Identify the node n from L that has the minimum f; If there
exists more than one node with minimum f, select any one of them
(say, n) arbitrarily;
b) If n is the goal
Then return n along with the path from the starting node,
and exit;
Else remove n from L and add all the children of n to the list L,
with their labeled paths from the starting node;
End While;
End.
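Using a priority queue for the list L, the procedure can be sketched in Python. The names expand and is_goal are illustrative placeholders, and a visited set is added to avoid re-expanding nodes, a detail the procedure itself leaves open.

```python
import heapq

def best_first_search(start_states, f, expand, is_goal):
    """Best-first search over an OR-graph, following the procedure above.

    L is kept as a priority queue ordered by f; each entry also carries
    the labeled path from a starting node.
    """
    L = [(f(s), [s]) for s in start_states]
    heapq.heapify(L)
    visited = set()
    while L:
        _, path = heapq.heappop(L)       # node n from L with minimum f
        n = path[-1]
        if is_goal(n):
            return path                  # goal together with its path
        if n in visited:                 # skip nodes already expanded
            continue
        visited.add(n)
        for child in expand(n):
            heapq.heappush(L, (f(child), path + [child]))
    return None
```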
As already pointed out, the best first search algorithm is a generic algorithm and
requires several extra features for its efficient realization.
For instance, how f can be defined is not explicitly mentioned in the algorithm. Further,
what happens if an offspring of the current node is not a fringe node? The A* algorithm,
to be discussed shortly, is a complete realization of the best first algorithm that takes
these issues into account in detail. The following definitions, however, are required
for presenting the A* algorithm.
Definition 4.1: A node is called open if the node has been generated and h’(x) has
been applied to it, but it has not yet been expanded.
Definition 4.2: A node is called closed if it has been expanded for generating
offspring.
In order to measure the goodness of a node in the A* algorithm, we require two cost
functions: i) heuristic cost and ii) generation cost. The heuristic cost measures the
distance of the current node x from the goal and is denoted by h(x). The cost of
generating a node x, denoted by g(x), on the other hand measures the distance of
node x from the starting node in the graph. The total cost function at node x,
denoted by f(x), is the sum f(x) = g(x) + h(x).
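Combining the open/closed bookkeeping of Definitions 4.1 and 4.2 with f(x) = g(x) + h(x) gives the following Python sketch. The example graph and heuristic in the usage below are made up for illustration.

```python
import heapq

def a_star(start, h, neighbors, is_goal):
    """A* search with f(x) = g(x) + h(x).

    neighbors(x) -> iterable of (successor, edge_cost) pairs
    h(x)         -> heuristic estimate of the distance from x to the goal
    """
    open_list = [(h(start), 0, start, [start])]   # entries: (f, g, node, path)
    closed = set()                                # expanded nodes (Definition 4.2)
    while open_list:
        f_val, g, node, path = heapq.heappop(open_list)
        if is_goal(node):
            return path, g
        if node in closed:
            continue
        closed.add(node)                          # node is now expanded
        for succ, cost in neighbors(node):
            if succ not in closed:
                g2 = g + cost
                heapq.heappush(open_list, (g2 + h(succ), g2, succ, path + [succ]))
    return None, float("inf")
```

With an admissible h (one that never overestimates), the first goal popped carries an optimal path.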
Knowledge & Reasoning
Issues in Knowledge Representation
• Knowledge consists of facts that can be exploited by AI programs in order to generate
good results
• The main entities are:
• Facts: truths in some relevant world
• Representation: the formalism used to describe facts in a way that they can be
manipulated.
• These two entities should be structured at two levels:
• Knowledge level – at which facts are described
• Symbol level – at which representations of objects at the knowledge level are
defined in terms of symbols that can be manipulated by programs.
Mapping between Facts and Representation
[Figure: facts in the world map to and from an internal representation, which the
reasoning program manipulates; English sentences are an alternative external
representation of the same facts.]
• Acquisitional Efficiency: the ability to acquire new information easily. It
should be possible to make direct insertions into the database and to add
new knowledge.
• At what level should knowledge be represented? Is there a good set of
primitives into which knowledge can be broken down? Is it helpful to
use such primitives?
• How should sets of objects be represented?
• Given a large amount of knowledge stored in a database, how can
relevant parts be accessed when they are needed?
• (Reference: E. Rich and K. Knight, 1999, Artificial Intelligence)
Answers to Questions
1. (Yes) instance and isa, i.e. class membership and class inclusion, are
common and support inheritance
2. (Yes)
a. Inverses: if we can define a relationship from the perspective of
object A to object B, it is also possible to define an inverse
relationship from the perspective of B to A.
b. Existence in an isa hierarchy: just as there are classes of objects
and specializations of them, there are classes of attributes and
specializations of attributes.
• There exist techniques for Reasoning about values:
- information about the type of value
- constraints on the value
- rules for computing values when it is needed
- rules that describe actions that should be taken, if a value ever
becomes known.
• Single-valued attributes
- explicit notation to track duplication
- replacement of an old value
Representation should be done at a variety of granularities, i.e. both low-level
primitives and high-level forms should be used; the particular domain will determine
which should be employed more.
Exercise
Investigate the frame problem
Knowledge based Agents
• The knowledge base is the central component of a knowledge-based agent.
• A knowledge base is a set of representations of facts about the world that enable
the knowledge-based agent to adapt to its environment.
• Each individual representation in a knowledge base is called a sentence. The
sentences are expressed in a knowledge representation language.
• A knowledge base must support a Tell (assert) and an Ask (query) mode.
In essence, a knowledge-based agent takes a percept as input and
returns an action.
• The expectation from a knowledge-based agent is that when it is asked a question,
the response should follow from what it has been told, as a result of
reasoning.
• The knowledge base of a knowledge-based agent is referred to as background
knowledge. Every time an agent program is called, two things happen:
o 1) it tells the knowledge base what it perceives and
o 2) it asks the knowledge base what action it should perform.
• In doing this, the concept of logical reasoning is used to determine which action
is better than others.
• The objective of knowledge representation is to express knowledge in a
computer-tractable form, such that it can be used to help agents perform well.
• A knowledge representation is defined by two aspects:
o The Syntax of a language, which describes the possible
configurations that can constitute sentences
o The Semantics, which determines the facts in the world to which the
sentences refer.
• Provided the syntax and semantics are precisely defined, we can call the
language a Logic. Also, from the syntax and semantics we can derive an
inference mechanism for an agent using the language.
• An Inference procedure can do one of two things:
o 1. Given a knowledge base KB, it can generate new sentences α that
purport to be entailed by KB Or
o 2. Given a knowledge base KB and another sentence α, it can report
whether or not α is entailed by KB. An inference procedure that
generates only entailed sentences is called sound or truth-preserving.
• Entailment: In mathematical notation, the relation entailment between a
knowledge base KB and a sentence α is pronounced “KB entails
α” and written as:
KB╞ α.
• An inference procedure i can be described by the sentences it can derive. If i can
derive α from KB, this is written KB ⊢i α, i.e. α is
derived from KB by i, or i derives α from KB.
• The record of operation of a sound inference procedure is called a proof.
• An inference procedure is complete if it can find a proof for any sentence that is
entailed.
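For the propositional case, entailment can be tested directly by enumerating models, which is one way to obtain a sound and complete inference procedure. The encoding of sentences as Python functions over a model is just one convenient choice for this sketch.

```python
from itertools import product

def entails(kb, alpha, symbols):
    """Return True iff KB |= alpha, by checking that alpha holds in
    every model (truth assignment over symbols) in which KB holds."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False          # found a model of KB in which alpha fails
    return True
```

For example, with KB = (P ⇒ Q) ∧ P, the procedure reports that Q is entailed but ¬P is not.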
• In summary:
• Logic consist of:
1. A formal system of describing states of affairs, consisting of:
a) The syntax of the language, which describes how to make sentences,
b) The semantics of the language, which states the systematic constraints on how
sentences relate to states of affairs.
2. The proof Theory- a set of rules for deducing the entailments of a set of
sentences.
• Therefore, by examining the semantics of a logical language, we can extract the
meaning of its sentences.
Types of Agent Programs
Simple Reflex Agent
Model-based Agent
Goal-based Agent
Utility-based Agent
Learning Agent
Knowledge Representation with First-Order Logic
• Propositional logic can also be used, but it lacks sufficient primitives for
representing knowledge in a logical way. It has a limited ontology, making
only the commitment that the world consists of facts; this ontology is
grossly inadequate.
• Predicate logic (first-order logic) makes a stronger set of ontological
commitments. It has primitives to represent objects, relations, properties and
functions. The main reason for this is that the world consists of objects,
that is, things with individual identities and properties that distinguish them
from other objects. Among these objects various relations hold. Some of
these are functions: relations in which there is exactly one “value” for a given
“input”. Examples of objects, properties, relations and functions include:
• Objects: people, houses, numbers, ball etc.
• Relations: brotherof, biggerthan, inside, part of, hascolor etc.
• Properties: red, round, yellow, prime
• Functions: fatherof, bestfriend etc.
• First-order logic has sentences, but it also has terms, which represent
objects. Constant symbols, variables and function symbols are used to build
terms, while quantifiers and predicate symbols are used to build sentences.
Syntax of First-Order Logic (with equality, in Backus-Naur Form)
Sentence → AtomicSentence
| Sentence Connective Sentence
| Quantifier Variable, … Sentence
| ¬Sentence | (Sentence)
AtomicSentence → Predicate(Term, …) | Term = Term
Term → Function(Term, …) | Constant | Variable
Connective → ⇒ | Λ | V | ⇔
Quantifier → ∀ | ∃
Constant → A | X1 | John | …
Variable → a | x | s | …
Predicate → Before | HasColor | Raining | …
Function → Mother | LeftLegOf | …
• Thus the universal quantifier makes statements about every object in the
universe. The existential quantifier allows us to make statements about some
object in the universe without naming it.
• ∃x sister(x, Spot) Λ cat(x) – This is pronounced “There exists an x such that x is a
sister of Spot and x is a cat.”
Exercise
Represent the following sentences in first-order logic using a consistent vocabulary you
must define.
a) Not all students take both History and Biology
b) Only one student failed History
c) Only one student failed both History and Biology
d) The best score in History was better than the best score in Biology
e) Every person who dislikes all vegetarians is smart
f) No person likes smart vegetarians
g) There is a woman who likes all men who are not vegetarians
h) There is a barber who shaves all men in town who do not shave themselves
i) No person likes a Professor unless the professor is smart
j) Politicians can fool some of the people all of the time, and they can fool all of
the people some of the time, but they can’t fool all of the people all of the time.
Sample Solution
a) ¬∀x (student(x) ⇒ (take(x, History) Λ take(x, Biology)))
b) ∃x (student(x) Λ failed(x, History) Λ
∀y ((student(y) Λ failed(y, History)) ⇒ y = x))
c) ∃x (student(x) Λ failed(x, History) Λ failed(x, Biology) Λ
∀y ((student(y) Λ failed(y, History) Λ failed(y, Biology)) ⇒ y = x))
d) betterthan(bestscore(History), bestscore(Biology))
e) ∀x ((person(x) Λ ∀y (vegetarian(y) ⇒ dislike(x, y))) ⇒ smart(x))
Semantic Networks
Some of the networks have been explicitly designed to implement hypotheses about
human cognitive mechanisms, while others have been designed primarily for computer
efficiency.
In artificial intelligence (AI) the primary aim is to store knowledge so that programs
can process it and achieve the verisimilitude of human intelligence.
In cognitive theory
For some authors, knowledge is stored either in episodic or in semantic memory. The
former is organized along spatio-temporal dimensions, the latter according to semantic,
content-oriented principles, e.g. networks of concepts.
In computer science
The advantages of a knowledge representation structure like a semantic network over
First-Order Logic include the fact that:
1. It makes it easy to describe properties of relations
2. It is a form of object-oriented programming and has the advantages that such
systems normally have, including modularity and ease of viewing by people.
Here is an example of semantic nets:
Figure 1. Animals-Birds-Tweety
The major problem with semantic nets is that, although the name of this knowledge
representation language is semantic nets, there are, ironically, no clear semantics for
the various network representations. For the above example, it can be interpreted as a
representation of a specific bird named Tweety, or as a representation of some
relationship between Tweety, birds and animals.
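One way to make an intended interpretation concrete is to fix it in code. The sketch below (the link names and properties are illustrative, not part of the original figure) treats the Tweety net as frames connected by instance/isa links, with property lookup by inheritance:

```python
# A tiny semantic net: instance/isa links plus property links.
net = {
    "Tweety": {"instance": "Bird"},
    "Bird":   {"isa": "Animal", "can_fly": True},
    "Animal": {"alive": True},
}

def lookup(node, prop):
    """Follow instance/isa links upward until prop is found (inheritance)."""
    while node is not None:
        frame = net.get(node, {})
        if prop in frame:
            return frame[prop]
        node = frame.get("instance") or frame.get("isa")
    return None
```

Here Tweety inherits can_fly from Bird and alive from Animal, while Animal itself knows nothing about flying.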
Semantic networks can also include other explicit kinds of relationships among concepts
and concept types that adequately represent the semantic relationships among entities.
As an example, Figure 2 shows a KL-ONE network that defines the concepts Truck and
TrailerTruck as subtypes of Vehicle.
Figure 2: Truck and TrailerTruck in KL-ONE
Figure 2 has nine ovals for concept nodes and nine arrows, which represent different
kinds of links. The white ovals represent generic concepts for the types, as
distinguished from the shaded oval, which is an individual concept for the instance 18.
The oval marked with an asterisk * indicates that Integer is a built-in or primitive type.
The concepts Truck and TrailerTruck are defined in Figure 2, but Vehicle, Trailer,
WtMeasure, and VolMeasure would have to be defined by other KL-ONE diagrams.
Generally, formal graph notations annotated with specific labels can be used for
semantic network representations.
Exercises
Show the representation of the following: i) Student ii) Teacher iii) Worker iv)
Computer.
Learning
What is Learning?
• Learning is the process through which an entity acquires knowledge.
• Machine intelligence is not natural and automatic, an intelligent machine is one
that exhibits intelligence after a process of learning.
Approaches to Learning:
• Symbolic learning: describes systems that formulate and modify rules, facts,
and relationships, explicitly represented in words or symbols. In other words
they create and modify their own knowledge base.
• Numerical learning: refers to systems that use numerical models, where
certain techniques are used for optimizing the numerical parameters. Examples
include neural networks, genetic algorithms and simulated annealing.
• Learning can be with a teacher, in which case it is said to be supervised.
Unsupervised learning is learning without a teacher.
Classification of learning
• Rote Learning: The system is given confirmation of correct decisions.
When it produces an incorrect decision it is “spoon fed” with the correct rule or
relationship that it should have used.
• Learning from Advice: Rather than being given a specific rule that should
apply in a given circumstance, the system is given a piece of general advice,
such as “gas is more likely to escape from a valve than from a pipe”. The
system must sort out for itself how to move from this high level advice to an
immediately usable rule.
• Learning by Induction: The system is presented with sets of example data
and is told the correct conclusion that it should draw from each. The system
continually refines its rules and relations so as to correctly handle each new
example.
• Learning by Analogy: The system is told the correct response to a similar,
but not identical task. The system must adapt the previous response to
generate a new rule applicable to the new circumstances.
• Explanation-Based Learning (EBL): The system analyzes a set of
example solutions and their outcomes to determine why each one was
successful or otherwise. Explanations are generated, which are used to guide
future problem solving. An example of an EBL system is PRODIGY (a
general-purpose problem solver).
• Case-Based Reasoning (CBR): Any case about which the system has
reasoned is filed away, together with the outcome, whether it is successful
or otherwise. Whenever a new case is encountered, the system adapts its
stored behaviour to fit the new circumstances.
• Explorative or Unsupervised Learning: This is also called discovery
learning; rather than having an explicit goal, an explorative system
continuously searches for patterns and relationships in the input data,
perhaps marking some patterns as interesting and warranting further
investigation. Examples of applications of unsupervised learning can be
found in:
o Data mining, where patterns are sought among large or complex
data sets;
o Identifying clusters, possibly for compressing the data;
o Feature recognition.
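As a concrete illustration of the clustering case, here is a minimal k-means sketch for one-dimensional data. The choice of k-means is ours for illustration; the text does not prescribe a particular clustering algorithm.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Unsupervised discovery of k clusters in 1-D data: alternately
    assign each point to its nearest centre, then recompute centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)                # pick k initial centres
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)
```

On two well-separated groups of numbers, the centres settle on the group means without the data ever being labeled.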
2. Expert systems
• Expert systems are meant to solve real problems which normally would
require a specialised human expert (such as a doctor or a mineralogist).
• Building an expert system entails extracting the relevant knowledge from
the human expert and representing the ‘heuristic knowledge’ in a knowledge
base.
• A knowledge engineer has the job of extracting this knowledge and building
the expert system. This is called Knowledge Elicitation and Encoding.
• The most widely used knowledge representation scheme for expert systems
is rules (sometimes in combination with frame systems).
• Typically, the rules won't have certain conclusions - there will just be some
degree of certainty that the conclusion will hold if the conditions hold.
Statistical techniques are used to determine these certainties. Rule-based
systems, with or without certainties, are generally easily modifiable and
make it easy to provide reasonably helpful traces of the system's reasoning.
These traces can be used in providing explanations of what it is doing.
• Expert systems have been used to solve a wide range of problems in
domains such as medicine, mathematics, engineering, geology, computer
science, business, law, defence.
Figure 1 shows the most important modules that make up a rule-based expert system.
The user interacts with the system through a user interface (which may use menus,
natural language or any other style of interaction). Then an inference engine is used to
reason with both the expert knowledge (extracted from our friendly expert) and data
specific to the particular problem being solved. The expert knowledge will typically be
in the form of a set of IF-THEN rules. The case specific data includes both data
provided by the user and partial conclusions (along with certainty measures) based on
this data. In a simple forward chaining rule-based system the case specific data will be
the elements in working memory.
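A simple forward chaining loop over working memory can be sketched as follows. The rule format and the fact names in the usage are illustrative, not the API of any real expert system shell, and certainty measures are omitted for brevity.

```python
def forward_chain(rules, facts):
    """Fire IF-THEN rules against working memory until nothing new is added.

    Each rule is a (conditions, conclusion) pair; working memory is a set
    of facts, to which partial conclusions are added as rules fire.
    """
    memory = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in memory and all(c in memory for c in conditions):
                memory.add(conclusion)      # partial conclusion enters working memory
                changed = True
    return memory
```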
Almost all expert systems also have an explanation subsystem, which allows the
program to explain its reasoning to the user. Some systems also have a knowledge base
editor, which helps the expert or knowledge engineer to easily update and check the
knowledge base.
One important feature of expert systems is the way they (usually) separate domain
specific knowledge from more general purpose reasoning and representation
techniques. The general purpose bit (in the dotted box in the figure) is referred to as an
expert system shell. As we see in the figure, the shell will provide the inference engine
(and knowledge representation scheme), a user interface, an explanation system and
sometimes a knowledge base editor. Given a new kind of problem to solve (say, car
design), we can usually find a shell that provides the right sort of support for that
problem, so all we need to do is provide the expert knowledge. There are numerous
commercial expert system shells, each one appropriate for a slightly different range of
problems. (Expert systems work in industry includes both writing expert system shells
and writing expert systems using shells.) Using shells to write expert systems generally
greatly reduces the cost and time of development (compared with writing the expert
system from scratch).