Combined A.I Note


COVENANT UNIVERSITY

COLLEGE OF SCIENCE AND TECHNOLOGY

ARTIFICIAL INTELLIGENCE LECTURE NOTE

DR. J.O. DARAMOLA

Department of Computer and Information Sciences

Principles and
Concepts of
Machine Intelligence

COPYRIGHT © 2009

Introduction

AI is one of the newest sciences. Work started in earnest soon after World War II, and
the name itself was coined in 1956. Along with molecular biology, AI is regularly cited
as the "field I would most like to be in" by scientists in other disciplines. A student in
physics might reasonably feel that all the good ideas have already been taken by
Galileo, Newton, Einstein, and the rest. AI, on the other hand, still has openings for
several groundbreaking contributions.

AI currently encompasses a huge variety of subfields, ranging from general-purpose
areas, such as learning and perception, to specific tasks such as playing chess, proving
mathematical theorems, writing poetry, and diagnosing diseases. AI systematizes and
automates intellectual tasks and is therefore potentially relevant to any sphere of human
intellectual activity. In this sense, it is truly a universal field.

What is A.I.?
Some definitions of artificial intelligence have been organized into four categories in
Figure 1.

Systems that think like humans:
• "The exciting new effort to make computers think . . . machines with minds, in the full and literal sense." (Haugeland, 1985)
• "[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning . . ." (Bellman, 1978)

Systems that think rationally:
• "The study of mental faculties through the use of computational models." (Charniak and McDermott, 1985)
• "The study of the computations that make it possible to perceive, reason, and act." (Winston, 1992)

Systems that act like humans:
• "The art of creating machines that perform functions that require intelligence when performed by people." (Kurzweil, 1990)
• "The study of how to make computers do things at which, at the moment, people are better." (Rich and Knight, 1991)

Systems that act rationally:
• "Computational Intelligence is the study of the design of intelligent agents." (Poole et al., 1998)
• "AI . . . is concerned with intelligent behavior in artifacts." (Nilsson, 1998)

Figure 1: Some Definitions of A.I.

A.I. Problem Domains

• Early work focused on formal tasks like:
o Game playing, e.g. checkers-playing programs, where experience gained
through playing against opponents was used to improve performance.
o Theorem proving, e.g. the Logic Theorist, an early attempt to prove
mathematical theorems, i.e. theorems from the first and second chapters of
Principia Mathematica (a book on mathematics), and Gelernter’s theorem
prover (geometry).
o It appeared that computers could perform well at these formal tasks
simply by being fast at exploring a large number of solution paths and
then selecting the best one, but this assumption turned out to be false,
since no computer is fast enough to overcome the combinatorial
explosion generated by most problems.
o General Problem Solver (GPS) (Newell et al., 1963): an attempt to
model commonsense reasoning and symbolic manipulation of logical
expressions.
• The initial drawback of these early attempts was that the programs could handle
only small amounts of knowledge. But as AI research progressed and
techniques for handling larger amounts of knowledge were developed, new tasks
could reasonably be attempted. These include:
• Some Task Domains in A.I.:
• Mundane Tasks:
o Perception (vision, speech)
o Natural Language processing (understanding, generation, translation)
o Commonsense reasoning
o Robot control
• Formal tasks
o Games (Checkers, Chess, Ayo etc.)
o Mathematics (Geometry, Logic, Calculus, Proving properties of
Program)
• Expert Tasks
o Engineering (Design, fault finding, manufacturing planning)
o Scientific analysis
o Medical analysis
o Financial analysis

Developments in AI

1. Problem Solving and Planning: This deals with systematic refinement of goal
hierarchy, plan revision mechanisms and a focused search of important goals.
2. Expert Systems: This deals with knowledge processing and complex decision-making
problems.
3. Natural Language Processing: Areas such as automatic text generation, text
processing, machine translation, speech synthesis and analysis, grammar and style
analysis of text etc. come under this category.
4. Robotics: This deals with the controlling of robots to manipulate or grasp objects and
using information from sensors to guide actions etc.
5. Computer Vision: This topic deals with intelligent visualisation, scene analysis,
image understanding and processing and motion derivation.
6. Learning: This topic deals with research and development in different forms of
machine learning.

7. Genetic Algorithms: These are adaptive algorithms which have inherent learning
capability. They are used in search, machine learning and optimisation.
8. Neural Networks: This topic deals with simulation of learning in the human brain by
combining pattern recognition tasks, deductive reasoning and numerical computations.

Application Areas of AI
Autonomous Planning and Scheduling, Game Playing, Autonomous Control,
Diagnosis, Logistics Planning, Robotics, Language Understanding and Problem
Solving

• Some Pertinent Questions in A.I.

• What are our underlying assumptions about intelligence?
• What kinds of techniques will be useful for solving AI problems?
• At what level of detail, if at all, are we trying to model human
intelligence?
• How will we know when we have succeeded in building an intelligent
program?
• Underlying Assumption
• The physical symbol system hypothesis [Newell and Simon, 1976]:
a physical symbol system has the necessary and sufficient means for
general intelligent action.
• A physical symbol system consists of:
o Symbols, which are physical patterns that occur as components of
another type of entity called an expression (or symbol structure).
o A collection of processes that operate on expressions to
produce other expressions: processes of creation,
modification, reproduction and destruction.
• Therefore, a physical symbol system is a machine that produces
through time an evolving collection of symbol structures.
• The hypothesis can only be tested empirically; the computer provides
the medium for this experimentation.
• The importance of the physical symbol system hypothesis is twofold:
o It is a significant theory of the nature of human intelligence and so is
of great interest to psychologists.
o It also forms the basis of the belief that it is possible to build programs
that can perform intelligent tasks now performed by people.
A.I. Techniques
• An AI technique is a method that exploits knowledge in solving a problem.
• Nature of Knowledge
o It is voluminous
o It is hard to characterize accurately
o It is constantly changing
o It differs from data by being organized in a way that corresponds to the
way it will be used.
• Questions

o Are there techniques that are appropriate for the solution of a variety of
AI problems?
o Can these techniques be useful in solving other problems?
- An AI technique must exploit knowledge in such a way that:
• The knowledge captures generalization
• It can be understood by people who must provide it
• It can easily be modified to correct errors and reflect changes in the world and
in our world view
• It can be used in a great many situations even if it is not totally accurate or
complete
• It can be used to help overcome its own sheer bulk by helping to narrow the
range of possibilities that must be considered.

Note: it is possible to solve AI problems using non-AI techniques (although the
solutions are not likely to be very good). It is also possible to apply AI techniques to
the solution of non-AI problems.

Characteristics of AI Techniques
• Search – provides a way of solving important problems for which no more
direct approach is available as well as a framework into which any
direct techniques that are available can be embedded
• Use of Knowledge – Provides a way of solving complex problems by exploiting
the structures of the objects that are involved.
• Abstraction – Provides a way of separating important features and variations
from the many unimportant ones that would otherwise overwhelm any
process.
Why develop AI programs?
Why do we attempt to model human performance?
• To test psychological theories of human performance
• To enable computers to understand human reasoning
• To enable people to understand computer reasoning
• To exploit what knowledge we can glean from people
How do we measure system’s Intelligence?
• Turing Test (Alan Turing, 1950): A method to determine whether a system can think.
Problem Spaces
Solving AI problems requires four things:
• Define the problem precisely: this must include the specification of the initial
situation(s) and the goal of the problem solving process
• Analyze the problem: the emergent features after analysis can help determine
the appropriateness of various possible techniques for solving the problem.
• Isolate and represent the task knowledge that is necessary to solve the problem.
• Choose the best problem-solving technique(s) and apply it (them) to the
particular problem.

State Space Definition of Problems

• AI problems are abstractions of a state space search containing the set of
possible states, the operators and the set of rules for transformation.
• State spaces:
o form the basis of most AI problems;
o allow for a formal definition of a problem, in a way that shows a path from
the initial state to a goal state using a set of permissible operations;
o permit us to define the process of solving a particular problem as a
combination of known techniques (each represented as a rule defining a
single step in the state space search) and the general technique of exploring
the space in order to find some path from the current state to a goal state.
o Most AI problems are NP problems. Hence search is a very important
process in the solution of hard problems for which no more direct
techniques are available.

Exercise
1. Find a good state space representation for the following problems:
i) Traveling salesman
ii) Missionaries and cannibals
iii) Tower of Hanoi

Example
Water Jug Problem: You are given two jugs, a 4-gallon one and a 3-gallon one.
Neither has any measuring marks on it. There is a pump that can be used to fill the
jugs with water. How can you get exactly 2 gallons of water into the 4-gallon jug?
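A state here can be written as a pair (x, y), where x is the amount in the 4-gallon jug and y the amount in the 3-gallon jug. The sketch below (our own illustration, not part of the original text; the function name is arbitrary) encodes the permissible operations as a successor function:

```python
# A sketch of one possible state-space formulation of the water jug
# problem. A state is a pair (x, y): x = gallons in the 4-gallon jug,
# y = gallons in the 3-gallon jug.

def successors(state):
    """Return the states reachable from (x, y) in one move."""
    x, y = state
    moves = [
        (4, y),                              # fill the 4-gallon jug
        (x, 3),                              # fill the 3-gallon jug
        (0, y),                              # empty the 4-gallon jug
        (x, 0),                              # empty the 3-gallon jug
        (min(4, x + y), max(0, x + y - 4)),  # pour the 3-gal jug into the 4-gal jug
        (max(0, x + y - 3), min(3, x + y)),  # pour the 4-gal jug into the 3-gal jug
    ]
    return [m for m in moves if m != state]  # drop moves that change nothing

print(successors((0, 0)))   # from empty jugs, only filling either jug is useful
```

Note that from (0, 0) only two successors remain, since emptying or pouring empty jugs changes nothing.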

Steps to a formal description of a problem


- Define a state space that contains all possible configurations of the relevant
objects (and perhaps some impossible ones).
Note: It is possible to define this space without explicitly enumerating all the states
it contains.
• Specify one or more states within that space that describe possible situations
from which the problem solving may start; these states are called the initial
states.
• Specify the goal states that would be acceptable as solutions.
• Specify a set of rules that describe the actions (operators) available with
respect to the following:
o What unstated assumptions are present in the informal problem
description?
o How general should the rules be?
o How much of the work required to solve the problems should be
precomputed and represented in the rules?
Open problem
• Writing programs that can produce formal descriptions from informal ones.
This process is called operationalization. For example, consider the task of
specifying precisely what it means to understand an English sentence.
Although such a specification must somehow be provided before we can design a
program to solve the problem, producing such a specification is itself a very hard
problem.
• Generally, a problem can be solved by using the rules in the state space, in
combination with an appropriate control strategy, to move through the problem
space until a path from an initial state to a goal state is found. Therefore the
process of search is fundamental to problem solving.

Production Systems
Consist of:
• a set of rules, each consisting of a left side (a pattern) that determines the applicability
of the rule and a right side that describes the operation to be performed;
• one or more knowledge databases (knowledge bases) that contain relevant and
appropriate information for the task;
• a control strategy that specifies the order in which the rules will be compared to the
database, and a way of resolving the conflicts that arise when several rules match at
once;
• a rule applier.
Examples: OPS5 [Brownston et al., 1985], ACT* [Anderson, 1983]
- Expert System Shells
- General problem-solving architectures like SOAR [Laird et al., 1989]
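A toy sketch (ours, not any of the systems cited above) showing the four components on a trivial working memory: rules are (pattern, action) pairs, the database is a set of facts, and the control strategy is simply "first matching rule fires":

```python
# A toy production system: each rule pairs a condition (the left side)
# with an action (the right side) over a working-memory set of facts.
# The facts and rule contents are purely illustrative.

rules = [
    (lambda wm: "wet" in wm and "cold" in wm, lambda wm: wm | {"use-heater"}),
    (lambda wm: "wet" in wm,                  lambda wm: wm | {"dry-off"}),
]

def run(working_memory, rules, limit=10):
    """Rule applier: fire the first rule whose action changes the memory."""
    for _ in range(limit):
        for condition, action in rules:       # compare rules to the database
            new = action(working_memory) if condition(working_memory) else None
            if new is not None and new != working_memory:
                working_memory = new          # the matched rule fires
                break                         # conflict resolution: first match wins
        else:
            break                             # no rule applicable: stop
    return working_memory

result = run({"wet", "cold"}, rules)
print(result)
```

Both rules match the initial memory; the first-match strategy fires the heater rule first, then the drying rule, then halts when no rule can change the memory further.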

Control strategies
Requirements
1. Causes motion: it should be able to lead to a solution.
2. Systematic: it should follow a structure, e.g. a tree or graph, that facilitates an
effective search process.

Searching Techniques
Algorithm:
Breadth-First Search
1. Create a variable called NODE-LIST and set it to the initial state
2. Until a goal state is found or NODE-LIST is empty do:
a) Remove the first element from NODE-LIST and call it E.
If NODE-LIST was empty, quit
b) For each way that each rule can match the state described in E do:
i) Apply the rule to generate a new state
ii) If the new state is a goal state, quit and return this state
iii) Otherwise, add the new state to the end of NODE-LIST
Description
• Construct a tree with the initial state as its root. Generate all offspring of the root
by applying each of the applicable rules to the initial state. Thereafter, for each leaf
node, generate all its successors by applying the rules that are appropriate.
• Continue this process until some rule produces a goal state.
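The NODE-LIST loop above can be sketched in Python and applied to the running water-jug example. This is a sketch of ours, not the original text's code: the function names are illustrative, the jug moves are redefined so the block is self-contained, and a "generated before" check is added so repeated states are not re-expanded (as the Search Tree/Graph section later suggests):

```python
from collections import deque

def successors(state):
    """Water-jug moves for a 4- and a 3-gallon jug."""
    x, y = state
    moves = {(4, y), (x, 3), (0, y), (x, 0),
             (min(4, x + y), max(0, x + y - 4)),   # pour 3-gal into 4-gal
             (max(0, x + y - 3), min(3, x + y))}   # pour 4-gal into 3-gal
    moves.discard(state)                           # drop do-nothing moves
    return moves

def breadth_first_search(start, is_goal):
    """NODE-LIST is a FIFO queue; parents are recorded to recover the path."""
    parent = {start: None}
    node_list = deque([start])
    while node_list:
        e = node_list.popleft()            # remove the first element, E
        for new in successors(e):          # apply every applicable rule to E
            if new in parent:              # generated before: skip it
                continue
            parent[new] = e
            if is_goal(new):               # goal found: reconstruct the path
                path = [new]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return list(reversed(path))
            node_list.append(new)          # add to the end of NODE-LIST
    return None

path = breadth_first_search((0, 0), lambda s: s[0] == 2)
print(path)
```

Because shorter paths are examined before longer ones, the path returned (seven states, six moves) is minimal.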

Exercise:
Apply the breadth-first search to the water jug problem.

Advantages of Breadth-First Search

• The algorithm will not get trapped exploring a wrong path (a blind alley).
• If there is a solution, breadth-first search will find it, though it may take time. Also,
if there are multiple solutions, a minimal solution will be found. This is
because longer paths are never examined until all the shorter ones have
already been examined.

Depth-First Search
• Pursues a single branch of the tree until it yields a solution or until a decision to
terminate the path is made (i.e. when it reaches a dead end, or when the length of
the path exceeds the futility limit). Thereafter backtracking occurs: it
backtracks to the most recently created state from which alternative moves are
available.
• This is chronological backtracking, because the order in which steps are undone
depends only on the temporal sequence in which the steps were originally made.
The most recent step is always the first to be undone.

One level of a Breadth-First Search Tree:

(0,0) → (4,0), (0,3)

Two levels of a Breadth-First Search Tree:

(0,0) → (4,0), (0,3)
(4,0) → (4,3), (0,0), (1,3)
(0,3) → (4,3), (0,0), (3,0)

A Depth-First Search Tree:

(0,0) → (4,0) → (4,3)

Algorithm: Depth-First Search
1. If the initial state is a goal state, quit and return success.
2. Otherwise, do the following until success or failure is signaled:
(a) Generate a successor, E, of the initial state. If there are no more successors,
signal failure.
(b) Call Depth-First Search with E as the initial state.
(c) If success is returned, signal success; otherwise continue this loop.

Advantages of Depth-First Search
• Requires less memory, since only the nodes on the current path are stored; this
contrasts with breadth-first search, where all of the tree that has so far been
generated must be stored.
• By chance (if care is taken in ordering the alternative successor states), it may find
a solution without much searching.
• In breadth-first search all nodes at level n must be examined before any node at level
n+1 can be examined. This is particularly significant if many acceptable solutions
exist; depth-first search can stop when one of them is found.
Disadvantages

• A wrong path may be followed, and the search can get trapped if there are loops on
that path.
• It may find a solution along a long path; the first solution found is not necessarily
the minimal one.
Note: the Best-First Search combines the strengths of these two algorithms to achieve
a better implementation.
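The recursive algorithm above can be sketched as follows. This is our own illustration: the depth `limit` parameter plays the role of the futility limit (so loops and over-long paths cannot trap the search), and the toy successor function stands in for a real rule set:

```python
# A recursive sketch of depth-first search with a depth bound acting
# as the futility limit. All names are illustrative.

def depth_first_search(state, successors, is_goal, limit):
    """Return a path to a goal state, or None on failure."""
    if is_goal(state):
        return [state]
    if limit == 0:                      # futility limit reached: give up here
        return None
    for e in successors(state):         # (a) generate a successor, E
        result = depth_first_search(e, successors, is_goal, limit - 1)  # (b)
        if result is not None:          # (c) success: prepend and propagate
            return [state] + result
    return None                         # no successor led to a goal: backtrack

# Tiny illustrative tree: node n has children 2n and 2n+1 while n < 8.
succ = lambda n: [2 * n, 2 * n + 1] if n < 8 else []
path = depth_first_search(1, succ, lambda n: n == 5, limit=4)
print(path)   # explores 1 -> 2 -> 4 (dead end), backtracks, then finds 5
```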
Exercise
Discuss the traveling salesman problem.
A salesman has a list of cities, each of which he must visit exactly once. There are
direct roads between each pair of cities on the list. Find the route the salesman
should follow for the shortest possible trip that both starts and finishes at any one of
the cities.

(Figure: a graph of five cities — IB, OS, ON, LA, BE — with direct roads between
each pair.)

- Combinatorial explosion
- Branch and bound strategy
- NP problem
Number of distinct paths among N cities = (N-1)!
Total time to perform the search is proportional to N!
Branch and Bound Technique: Begin generating complete paths, keeping track of
the shortest path found so far. Give up exploring any path as soon as its partial length
becomes greater than the shortest path found so far.
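The branch and bound technique can be sketched as below. This is our own illustration: the function names and the distance matrix are assumptions, not data from the text, and the five cities are indexed 0–4:

```python
# Branch and bound for the traveling salesman problem: grow partial
# tours depth-first, and abandon any partial tour whose length already
# exceeds the best complete tour found so far.

def branch_and_bound_tsp(dist, start=0):
    """Return (best_length, best_tour) over tours starting and ending at start."""
    n = len(dist)
    best = [float("inf"), None]          # best length so far, and its tour

    def extend(tour, length):
        if length >= best[0]:            # bound: partial path already too long
            return
        if len(tour) == n:               # all cities visited: close the loop
            total = length + dist[tour[-1]][start]
            if total < best[0]:
                best[0], best[1] = total, tour + [start]
            return
        for city in range(n):            # branch: try each unvisited city next
            if city not in tour:
                extend(tour + [city], length + dist[tour[-1]][city])

    extend([start], 0)
    return best[0], best[1]

# Symmetric distances among 5 cities (illustrative numbers).
D = [[0, 7, 6, 8, 4],
     [7, 0, 8, 5, 6],
     [6, 8, 0, 9, 7],
     [8, 5, 9, 0, 8],
     [4, 6, 7, 8, 0]]
length, tour = branch_and_bound_tsp(D)
print(length, tour)
```

The bound does not change which tour is optimal; it only prunes branches that provably cannot beat the best complete tour already found.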

Heuristic Search
• A heuristic is a technique that improves the efficiency of a search process,
possibly by sacrificing claims of completeness.
• It is a control structure that is not guaranteed to find the best answer but will
always find a good answer.
• Heuristics are rules of thumb; unlike algorithms, they provide guidance without
a guarantee of correctness.
• Heuristics help to find good, though non-optimal, solutions to NP problems.
• There are general-purpose heuristics and also domain-specific heuristics, e.g. the
nearest neighbour heuristic, which works by selecting the locally superior
alternative at each step.
• Error bounds can be computed (applicable to general-purpose heuristics).
• Heuristics help to tame the problem of combinatorial explosion.
Arguments in favour of Heuristics

• Rarely do we actually need the optimum solution in the real world; a good
approximation will usually serve very well.
• Although the approximations produced by heuristics may not be very good in the
worst case, worst cases rarely arise in the real world.
• Understanding why a heuristic works, or why it doesn't work, often leads to a
deeper understanding of the problem.
• Domain-specific heuristic knowledge can be incorporated into a rule-based search
procedure:
o by putting it in the rules;
o as a heuristic function that evaluates individual problem states and determines
how desirable they are.
• A heuristic function is a function that maps from problem state descriptions to
measures of desirability, usually represented as numbers. The value of the heuristic
function at any given node is meant to be as good an estimate as possible of whether
that node is on the desired path to a solution; it specifies the level of importance of
that node to the solution path. A well-designed heuristic function guides the search
process in the most profitable direction by suggesting which path to follow first
when more than one is available.
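As a concrete illustration (ours, not from the text), a heuristic function for the running water-jug example might measure how far the 4-gallon jug is from holding the 2-gallon goal; lower values then mark more promising states:

```python
# A minimal heuristic function for the water-jug example: map each
# state (x, y) to a number, here the distance of jug 1 from 2 gallons.
# The choice of measure is an illustrative assumption.

def h(state):
    """Lower values are more promising: distance of the 4-gal jug from goal."""
    x, y = state
    return abs(x - 2)

states = [(4, 0), (0, 3), (4, 2), (2, 0)]
best = min(states, key=h)        # the node a heuristic search would try first
print([h(s) for s in states], best)
```

A search guided by h would expand (2, 0) first, since its heuristic value (0) beats the others (all 2).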
Characteristics of Problems
• Decomposable problems: can the problem be decomposed into smaller or easier
subproblems?
• Ignorable problems: solution steps can be ignored when they prove unnecessary
(e.g. theorem proving)
• Recoverable problems: solution steps can be undone
• Irrecoverable problems: solution steps cannot be undone
• Certain-outcome problems: each step leads to a definite outcome
• Uncertain-outcome problems: each step has only some probability of leading
toward a solution
(The hardest problems to solve are those that are both irrecoverable and
uncertain-outcome, e.g. advising a lawyer who is defending a client standing trial
for murder.)
• Problems that require an absolutely best solution versus those for which a relatively
good solution suffices, e.g. the traveling salesman (any-path/best-path) problem.
• Problems whose solution is a state versus those whose solution is a path.
Characteristics of Production systems
• Production systems are a good way to describe the operations that are
performed in a search for a solution to a problem.
• Questions
• Can production systems be described by characteristics that shed some light on
how they can easily be implemented?
• If so, what relationships are there between problem types and the types of
production systems best suited to solving them?
Types of Production systems

• Monotonic production systems: the application of a rule never prevents the later
application of another rule that could have been applied at the time the first rule
was selected.
• Non-monotonic production systems: lack the above attribute of monotonic systems.
• Partially commutative production systems: if the application of a particular sequence
of rules transforms state x into state y, then any permutation of those rules that is
allowable also transforms state x into state y (i.e. the order of selection of the
set of rules is not important).
• Commutative production systems: are both monotonic and partially commutative.
• Partially commutative, monotonic systems are useful for solving ignorable
problems. Problems that involve creating new things rather than changing old
ones are generally ignorable, e.g. theorem proving, and making deductions
from known facts.
• Non-monotonic, partially commutative systems are good for problems where
changes occur but can be reversed, and in which the order of operations is not
critical, e.g. physical manipulation systems like robot navigation on a flat plane.
• Non-partially-commutative systems are useful for problems in which irreversible
changes occur, e.g. describing the process of producing a chemical compound
(chemical synthesis).
Issues in the Design of Search Programs
• Every search process can be viewed as the traversal of a tree structure.
• Each node represents a problem state.
• Each arc represents a relationship between the state nodes it connects.
• In most programs the tree is not represented explicitly; it is represented implicitly
by the rules, and the program generates explicitly only those paths that it decides
to explore.
• (Therefore, keep in mind the distinction between the implicit search tree and the
explicit partial search tree that is actually constructed by the search program.)
• The direction of search (forward/backward reasoning).
• Selection of applicable rules (matching).
• How to represent each node of the search process (the knowledge representation
problem and the frame problem).

Search Tree/Graph
• The concept of a search graph eliminates the repeated traversal of previously
expanded and processed nodes that may reappear. This makes the search space
an arbitrary directed graph rather than a tree. The graph differs from a tree in
that several paths may come together at a node. (See Figure 1.3 below.)

(0,0) → (4,0), (0,3)
(4,0) → (1,3), (4,3)
(0,3) → (4,3), (3,0)

Figure 1.3: Search Graph for the Water-Jug Problem

• Heuristic Search Techniques

• Weak methods are varieties of heuristic search techniques whose efficacy
depends on the way they exploit domain-specific knowledge, since in
themselves they are unable to overcome the combinatorial
explosion to which search processes are so vulnerable.
• Hill Climbing
• Algorithm
1. Evaluate the initial state. If it is also a goal state, then return it and quit. Otherwise
continue with the initial state as the current state.
2. Loop until a solution is found or until there are no new operators left to be applied
in the current state:
(a) Select an operator that has not yet been applied to the current state and
apply it to produce a new state.
(b) Evaluate the new state:
i. If it is a goal state, then return it and quit.
ii. If it is not a goal state but it is better than the current state, then
make it the current state.
iii. If it is not better than the current state, then continue in the loop.
• Hill climbing is a variant of generate-and-test (another search
technique based on depth-first search) in which feedback from the test
procedure is used to help the generator decide which direction to move in the
search space.
• Used mostly when a good heuristic function is available for evaluating states
but no other useful knowledge is available.
• The key difference between this algorithm and generate-and-test is the use
of an evaluation function as a way to inject task-specific knowledge into the
control process.
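The loop above can be sketched in Python, assuming a numeric evaluation function in which larger values are better; the function names and the toy objective are illustrative assumptions of ours:

```python
# A direct sketch of the hill-climbing loop: move only to strictly
# better neighboring states, and stop at a local maximum.

def hill_climb(initial, successors, evaluate, is_goal):
    """Follow the algorithm above with a numeric evaluation function."""
    current = initial
    if is_goal(current):
        return current
    while True:
        moved = False
        for new in successors(current):           # apply operators in turn
            if is_goal(new):                      # (b)i. goal: return it
                return new
            if evaluate(new) > evaluate(current): # (b)ii. better: move there
                current, moved = new, True
                break
        if not moved:                             # no operator improves:
            return current                        # stop (a local maximum)

# Maximize f(x) = -(x - 3)^2 over the integers, stepping by +/-1.
result = hill_climb(
    0,
    successors=lambda x: [x - 1, x + 1],
    evaluate=lambda x: -(x - 3) ** 2,
    is_goal=lambda x: x == 3,
)
print(result)   # climbs 0 -> 1 -> 2 -> 3
```

On this smooth one-dimensional objective the climb reaches the goal; on a function with a plateau or ridge, the same loop would stop early, which motivates simulated annealing below.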

• Simulated Annealing (Kirkpatrick et al., 1983)
• Algorithm:
1. Evaluate the initial state. If it is a goal state, then return it and quit. Otherwise
continue with the initial state as the current state.
2. Initialize BEST-SO-FAR to the current state.
3. Initialize T according to the annealing schedule.
4. Loop until a solution is found or until there are no new operators left to be
applied in the current state:
(a) Select an operator that has not yet been applied to the current state and apply
it to produce a new state.
(b) Evaluate the new state. Compute
ΔE = (value of current state) – (value of new state)
- If the new state is a goal state, then return it and quit.
- If it is not a goal state but is better than the current state, then make it the
current state. Also set BEST-SO-FAR to this new state.
- If it is not better than the current state, then make it the current state with
probability P′ = e^(–ΔE/T). This step is usually implemented by
invoking a random number generator to produce a number in the range [0,1].
If that number is less than P′, then the move is accepted; otherwise do
nothing.
(c) Revise T as necessary according to the annealing schedule.
5. Return BEST-SO-FAR as the answer.

• A variation of hill climbing in which, at the beginning of the process, some
downhill moves may be made. This is to make the final solution relatively
insensitive to the starting state.
• This reduces the chances of getting stuck at a local maximum, a plateau or a ridge.
o A local maximum is a state that is better than all its neighbors but is not better
than some other states further away. At a local maximum, all moves appear
to make things worse. Local maxima are particularly frustrating because
they often occur almost within sight of a solution.
o A plateau is a flat area of the search space in which a whole set of
neighboring states has the same value. On a plateau, it is not possible to
determine the best direction in which to move by making local comparisons.
o A ridge is a special kind of local maximum. It is an area of the search space
that is higher than the surrounding areas and that itself has a slope (which
one would like to climb). But the orientation of the high region, compared to
the set of available moves and the directions in which they move, makes it
impossible to traverse the ridge by single moves.
• Uses an objective function as the heuristic function, and the objective
function is minimized.
• Uses a probability P = e^(–ΔE/T), where ΔE is generalized to represent the change
in the value of the objective function and T is the temperature.
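A minimal sketch of the algorithm, written for a minimized objective function: ΔE is taken here as the increase in the objective, so a worse move is accepted with probability e^(−ΔE/T). The schedule, toy objective, and fixed random seed are illustrative assumptions of ours:

```python
import math
import random

def simulated_annealing(initial, successors, objective, schedule, seed=0):
    """Minimize `objective`; `schedule[i]` is the temperature at step i."""
    rng = random.Random(seed)                # seeded for reproducibility
    current = best = initial                 # best plays the role of BEST-SO-FAR
    for t in schedule:                       # T revised per the annealing schedule
        new = rng.choice(successors(current))
        delta = objective(new) - objective(current)   # increase in objective
        if delta < 0:                        # better: always accept
            current = new
            if objective(current) < objective(best):
                best = current
        elif rng.random() < math.exp(-delta / t):     # worse: accept with e^(-dE/T)
            current = new
    return best

# Minimize f(x) = (x - 5)^2 over the integers, moving by +/-1.
schedule = [10.0 * 0.95 ** i for i in range(300)]
result = simulated_annealing(
    20,
    successors=lambda x: [x - 1, x + 1],
    objective=lambda x: (x - 5) ** 2,
    schedule=schedule,
)
print(result, (result - 5) ** 2)
```

Early on (T large) uphill moves are often accepted, letting the walk escape bad regions; as T decays they are almost never accepted, and returning BEST-SO-FAR guarantees the answer is at least as good as the starting state.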

Best-First Search
• Combines the good attributes of both depth-first search and breadth-first
search.
• Depth-first search is good because it allows a solution to be found without all
competing branches having to be expanded.
• Breadth-first search is good because it does not get trapped on dead-end paths.
Best-first search combines these two attributes: it follows a single path at a time,
but switches paths whenever some competing path looks more promising than the
current one does.
• At each step of the best-first search process, we select the most promising of the
nodes generated so far. This is done by applying an appropriate heuristic
function to each of them. We then expand the chosen node by using the rules to
generate its successors. If one of them is a solution, we can quit. If not, all those
new nodes are added to the set of nodes generated so far. Again the most
promising node is selected and the process continues. Usually what happens is
that a bit of depth-first searching occurs as the most promising branch is
explored. But eventually, if a solution is not found, that branch will start to look
less promising than one of the top-level branches that had been ignored. At that
point, the now more promising, previously ignored branch will be explored. But
the old branch is not forgotten: its last node remains in the set of generated but
unexpanded nodes. The search returns to it whenever all the others get bad enough
that it is again the most promising path.

Figure 1.4: Stages of a Best-First Search

Step 1: A
Step 2: A → B, C, D
Step 3: D expanded → E, F
Step 4: B expanded → G, H
Step 5: E expanded → I, J

Algorithm: Best-First Search (OR Graph)

Although best-first search applies to a tree structure, it is better to search a graph so
that duplicate paths will not be pursued. To implement such a graph-search
procedure, we will need to use two lists of nodes:

• OPEN – nodes that have been generated and have had the heuristic function
applied to them, but which have not yet been examined (i.e. have not had their
successors generated).
• OPEN is actually a priority queue in which the elements with the highest
priority are those with the most promising value of the heuristic function.
(Revise: standard techniques for manipulating priority queues.)
• CLOSED – nodes that have already been examined. We need to keep these nodes
in memory if we want to search a graph rather than a tree, since whenever a new
node is generated, we need to check whether it has been generated before.
Algorithm
1. Start with OPEN containing just the initial state.
2. Until a goal is found or there are no nodes left on OPEN do:
(a) Pick the best node on OPEN.
(b) Generate its successors.
(c) For each successor do:
i. If it has not been generated before, evaluate it, add it to OPEN,
and record its parent.
ii. If it has been generated before, change the parent if this new path
is better than the previous one. In that case, update the cost of
getting to this node and to any successors that this node may already
have.
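The OPEN list above can be sketched with a priority queue ordered by the heuristic value. This simplified version (ours; the names are illustrative) records each node's parent once and omits the path-improvement step ii, which suffices when nodes are only ever reached once at each heuristic comparison:

```python
import heapq

def best_first_search(start, successors, h, is_goal):
    """OPEN is a priority queue ordered by the heuristic value h."""
    open_list = [(h(start), start)]
    parent = {start: None}                   # doubles as the CLOSED check
    while open_list:
        _, node = heapq.heappop(open_list)   # pick the best node on OPEN
        if is_goal(node):                    # goal: reconstruct the path
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return list(reversed(path))
        for succ in successors(node):        # generate its successors
            if succ not in parent:           # not generated before
                parent[succ] = node          # evaluate it, record its parent
                heapq.heappush(open_list, (h(succ), succ))
    return None

# Same toy tree as before: node n has children 2n and 2n+1 while n < 8.
succ = lambda n: [2 * n, 2 * n + 1] if n < 8 else []
path = best_first_search(1, succ, h=lambda n: abs(n - 5), is_goal=lambda n: n == 5)
print(path)
```

Note how the search briefly explores the promising-looking node 6 (heuristic value 1) before switching back to the branch through 2 once 5 itself appears on OPEN.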

General Problem Solving Approaches


There exist quite a large number of problem solving techniques in AI that rely on
search. The simplest among them is the generate and test method. The algorithm for
the generate and test method can be formally stated as follows:

Procedure Generate & Test


Begin
Repeat
Generate a new state and call it current-state;
Until current-state = Goal;
End.

It is clear from the above algorithm that it continues exploring a new state in each
iteration of the repeat-until loop and exits only when the current state is equal to the
goal. The most important part of the algorithm is the generation of a new state, and
this is not an easy task. If generation of new states is not feasible, the
algorithm should be terminated.
In our simple algorithm, however, we did not include this, intentionally keeping it
simple. But how does one generate the states of a problem? To formalize this, we
define a four-tuple, called a state space, denoted by
{ nodes, arc, goal, current },
where
nodes represents the set of existing states in the search space;
an arc denotes an operator applied to an existing state to cause a transition to another
state;
goal denotes the desired state to be identified among the nodes; and
current represents the state now generated for matching with the goal.

The state space for most of the search problems we will cover in this chapter takes the
form of a tree or graph.

Breadth First Search


The breadth first search algorithm visits the nodes of the tree along its breadth, starting
from the level with depth 0 to the maximum depth. It can be easily realized with a
queue. For instance, consider the tree, given in Figure 1. Here, the nodes in the tree are
traversed following their ascending ordered labels.

Fig. 1: The order of traversal in a tree of depth 3 by breadth first manner.

The algorithm for traversal of a tree in a breadth-first manner can be presented with a
queue as follows:

Procedure Breadth-first-search
Begin
i) Place the starting node in a queue;
ii) Repeat
Delete queue to get the front element;
If the front element of the queue = goal,
return success and stop;
Else do
Begin
insert the children of the front element,
if they exist, in any order at the rear end of
the queue;
End

Until the queue is empty;
End.
The breadth first search algorithm, presented above, rests on a simple principle: if the
current node is not the goal, add the offspring of the current node, in any order, to the
rear end of the queue and redefine the front element of the queue as the current node.
The algorithm terminates when the goal is found.
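The procedure above can be sketched in Python with a FIFO queue; the adjacency dictionary representing the tree is illustrative, not the exact tree of Fig. 1:

```python
from collections import deque

def breadth_first_search(tree, start, goal):
    """Visit nodes level by level using a FIFO queue; return the
    goal node if found, otherwise None."""
    queue = deque([start])
    while queue:
        front = queue.popleft()          # delete the queue to get the front element
        if front == goal:
            return front                 # success
        # insert the children of the front element, if any, at the rear
        queue.extend(tree.get(front, []))
    return None                          # queue empty: goal not found

# Illustrative tree whose labels happen to follow breadth-first order
tree = {1: [2, 3, 4], 2: [5, 6], 3: [7, 8], 4: [9]}
print(breadth_first_search(tree, 1, 7))  # → 7
```

Using `deque.popleft` keeps each dequeue O(1); a plain list would make the front deletion O(n).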

Fig. 2: First few steps of breadth first search on the tree of fig. 1.

Time Complexity
For the sake of analysis, we consider a tree with equal branching factor b from each
node and largest depth d. Since the goal is not located within depth (d-1), the number of
false searches [1], [2] is given by
1 + b + b^2 + b^3 + … + b^(d-1) = (b^d - 1) / (b - 1), b >> 1.
Further, the first state within the fringe nodes could be the goal. On the other hand, the
goal could be the last visited node in the tree. Thus, on average, the number of fringe
nodes visited is given by (1 + b^d) / 2.
Consequently, the total number of nodes visited in the average case becomes
(b^d - 1) / (b - 1) + (1 + b^d) / 2
≈ b^d (b + 1) / 2(b - 1).
Since the time complexity is proportional to the number of nodes visited, the
above expression gives a measure of the time complexity.
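The geometric-series closed form used above can be checked numerically; the branching factor and depth below are chosen arbitrarily for illustration:

```python
def nodes_up_to(depth, b):
    """Exact number of nodes in a complete b-ary tree from depth 0
    to the given depth: 1 + b + b^2 + ... + b^depth."""
    return sum(b**h for h in range(depth + 1))

b, d = 3, 5
# closed form for the nodes visited in failing searches up to depth d-1
closed_form = (b**d - 1) // (b - 1)
print(nodes_up_to(d - 1, b), closed_form)  # 121 121
```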

Space Complexity
The maximum number of nodes is placed in the queue when the leftmost node at
depth d is inspected for comparison with the goal. The queue length in this case
becomes b^d. The space complexity of the algorithm, which depends on the queue
length, is thus of the order of b^d in the worst case. In order to reduce the space
requirement, the generate and test algorithm is realized in an alternative manner, as
presented below.

Depth First Search
The depth first search generates nodes and compares them with the goal along the
largest depth of the tree and moves up to the parent of the last visited node, only when
no further node can be generated below the last visited node. After moving up to the
parent, the algorithm attempts to generate a new offspring of the parent node. The
above principle is employed recursively to each node of a tree in a depth first search.
One simple way to realize the recursion in the depth first search algorithm is to employ
a stack. A stack-based realization of the depth first search algorithm is presented below.

Procedure Depth-first-search

Begin
1. Push the starting node onto the stack,
pointed to by the stack-top;
2. While stack is not empty do
Begin
Pop stack to get the stack-top element;
If stack-top element = goal, return
success and stop;
Else push the children of the stack-top
element in any order onto the stack;
End while;
End.

Fig. 3. Depth first search on a tree, where the node numbers denote the order of visiting
that node.

In the above algorithm, a starting node is placed in the stack, the top of which is pointed
to by the stack-top. To examine a node, it is popped out of the stack. If it is the goal,
the algorithm terminates; else its children are pushed into the stack in any order. The
process continues until the stack is empty. The ascending order of node numbers in Fig.
3 represents the order of traversal of the tree in a depth first manner. The contents of the
stack at the first few iterations are illustrated below in Fig 4. The arrowhead in the
figure denotes the position of the stack-top.

Fig. 4: A snapshot of the stack at the first few iterations.
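A stack-based sketch of the procedure in Python; the adjacency dictionary is illustrative, and reversing the children before pushing makes the traversal visit the leftmost child first, in the spirit of Fig. 3:

```python
def depth_first_search(tree, start, goal):
    """Stack-based depth first search; return the goal if found."""
    stack = [start]
    while stack:
        top = stack.pop()                 # pop stack to get the stack-top element
        if top == goal:
            return top
        # push the children of the stack-top element; reversing keeps
        # the leftmost child on the stack-top
        stack.extend(reversed(tree.get(top, [])))
    return None

# Illustrative tree whose labels follow depth-first visiting order
tree = {1: [2, 6, 9], 2: [3, 4, 5], 6: [7, 8]}
print(depth_first_search(tree, 1, 8))  # → 8
```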

Space Complexity
Maximum memory in depth first search is required when we reach the largest depth for
the first time. Assuming that each node has a branching factor b, when a node at depth
d is examined, the nodes saved in memory are all the unexpanded nodes up to depth d
plus the node being examined. Since at each level there are (b - 1) unexpanded nodes,
the total number of nodes in memory = d(b - 1) + 1. Thus the space complexity of
depth first search is a linear function of d, unlike breadth first search, where it is an
exponential function of d. This, in fact, is the most useful aspect of the depth first
search.

Time Complexity
If we find the goal at the leftmost position at depth d, then the number of nodes
examined = (d + 1). On the other hand, if we find the goal at the extreme right at depth
d, then the number of nodes examined includes all the nodes in the tree, which is
1 + b + b^2 + b^3 + … + b^d = (b^(d+1) - 1) / (b - 1).
So, the total number of nodes examined in the average case
= (d + 1) / 2 + (b^(d+1) - 1) / 2(b - 1)
≈ b(b^d + d) / 2(b - 1).
This is the average case time complexity of the depth first search algorithm. Since for a
large depth d the depth first search requires quite a large runtime, an alternative way to
solve the problem is to control the depth of the search tree. Such an algorithm, in which
the depth cut-off is increased at each iteration, is called an Iterative Deepening Depth
First Search or simply an Iterative Deepening Search.

Iterative Deepening Search
When the initial depth cut-off is one, it generates only the root node and examines it. If
the root node is not the goal, then the depth cut-off is set to two and the tree up to depth 2 is
generated using typical depth first search. Similarly, when the depth cut-off is set to m,
the tree is constructed up to depth m by depth first search. One may thus worry that in
an iterative deepening search, one has to regenerate all the nodes excluding the fringe
nodes at the current depth cut-off. Since the number of nodes generated by depth first
search up to depth h is
(b^(h+1) - 1) / (b - 1),
the total number of nodes expanded in failing searches by an iterative deepening search
will be

Σ (h = 0 to d-1) {1 / (b - 1)} (b^(h+1) - 1)
≈ b(b^d - d) / (b - 1)^2.

The last pass in the algorithm results in a successful node at depth d, the average time
complexity of which by typical depth first search is given by
b(b^d + d) / 2(b - 1).
Thus the total average time complexity is given by
b(b^d - d) / (b - 1)^2 + b(b^d + d) / 2(b - 1)
≈ (b + 1) b^(d+1) / 2(b - 1)^2.
Consequently, the ratio of the average time complexity of the iterative deepening search
to that of the depth first search is given by
{(b + 1) b^(d+1) / 2(b - 1)^2} : {b^(d+1) / 2(b - 1)}
= (b + 1) : (b - 1).

The iterative deepening search thus does not take much extra time, when compared to
the typical depth first search. The unnecessary expansion of the entire tree by depth first
search, thus, can be avoided by iterative deepening. A formal algorithm of iterative
deepening is presented below.

Procedure Iterative-deepening
Begin
1. Set current depth cutoff =1;
2. Put the initial node into a stack, pointed to by stack-top;
3. While the stack is not empty and the depth is within the
given depth cut-off do
Begin
Pop stack to get the stack-top element;
if stack-top element = goal, return it and stop
else push the children of the stack-top in any order
into the stack;
End While;

4. Increment the depth cut-off by 1 and repeat
through step 2;
End.

The breadth first, depth first and the iterative deepening search can be equally used for
Generate and Test type algorithms. However, while the breadth first search requires an
exponential amount of memory, the depth first search calls for memory proportional to
the largest depth of the tree. The iterative deepening, on the other hand, has the
advantage of searching in a depth first manner in an environment of controlled depth of
the tree.
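A minimal sketch of iterative deepening, realized here recursively rather than with an explicit stack; the tree and depth limit below are illustrative:

```python
def depth_limited_search(tree, node, goal, limit):
    """Depth first search that never descends below the depth cut-off."""
    if node == goal:
        return node
    if limit == 0:
        return None                      # cut-off reached: back up
    for child in tree.get(node, []):
        found = depth_limited_search(tree, child, goal, limit - 1)
        if found is not None:
            return found
    return None

def iterative_deepening(tree, start, goal, max_depth):
    """Run depth limited search with cut-off 0, 1, 2, ... until the
    goal is found or max_depth is exceeded."""
    for cutoff in range(max_depth + 1):
        found = depth_limited_search(tree, start, goal, cutoff)
        if found is not None:
            return found
    return None

tree = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
print(iterative_deepening(tree, 1, 6, max_depth=3))  # → 6
```

Each pass regenerates the shallower levels, which, as shown above, costs only a constant factor of roughly (b+1)/(b-1) over a single depth first search.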

Hill Climbing
The ‘generate and test’ type of search algorithms presented above only expand the
search space and examine the existence of the goal in that space. An alternative
approach to solving search problems is to employ a function f(x) that gives an
estimate of the distance of the goal from node x. After f(x) is evaluated at the possible
initial nodes x, the nodes are sorted by their functional values and pushed into a stack
in descending order of their f values, so that the stack-top element has the least f value.
It is then popped out and compared with the goal. If the stack-top element is not the
goal, it is expanded and f is measured for each of its children. The children are sorted
and pushed into the stack in the same manner. If the stack-top element is the goal, the
algorithm exits; otherwise the process continues until the stack becomes empty.
Pushing the sorted nodes onto the stack adds a depth first flavor to the present
algorithm. The hill climbing algorithm is formally presented below.

Procedure Hill-Climbing
Begin
1. Identify possible starting states and measure the distance (f) of their
closeness with the goal node; Push them into a stack in descending order
of their f, so that the state with the least f is at the top;
2. Repeat
Pop stack to get the stack-top element;
If the stack-top element is the goal, announce it and exit
Else push its children into the stack in descending order of their f values;
Until the stack is empty;
End.
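A minimal stack-based sketch of the procedure; the search space and the heuristic f below are invented for illustration:

```python
def hill_climbing(children, f, start, goal):
    """Stack-based hill climbing: children of the current node are
    pushed so that the child with the smallest f (closest to the
    goal by the heuristic) sits on the stack-top."""
    stack = [start]
    while stack:
        top = stack.pop()
        if top == goal:
            return top
        kids = children.get(top, [])
        # sort descending by f, so the least-f child is pushed last
        # and therefore popped first
        stack.extend(sorted(kids, key=f, reverse=True))
    return None

# Illustrative search space: states are numbers, the goal is 7,
# and f estimates the distance to the goal
children = {0: [2, 5], 5: [6, 3], 6: [7, 4]}
f = lambda x: abs(7 - x)
print(hill_climbing(children, f, 0, 7))  # → 7
```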

The hill climbing algorithm, too, is not free from shortcomings. One common problem
is getting trapped at a local maximum on a foothill: when trapped at a local maximum,
the measure of f at all possible next legal states yields less promising values than the
current state. A second drawback of hill climbing is reaching a plateau [2]: once a state
on a plateau is reached, all legal next states also lie on this surface, making the search
ineffective. A new algorithm, called simulated annealing, discussed below, can easily
solve these first two problems. Besides the above, another troublesome problem is
traversal along a ridge. A ridge (vide fig. 4.5) on many occasions leads to a local
maximum. However, moving along the ridge is not possible in a single step due to the
non-availability of appropriate operators; multiple steps of movement are required to
solve this problem.

Simulated Annealing
“Annealing” is a process in metallurgy, where a metal is first heated beyond its melting
point and then allowed to cool down slowly until it returns to the solid form. Thus in
the physical process of annealing, the hot material gradually loses energy and finally at
one point of time reaches a state of minimum energy. A common observation reveals
that most physical processes have transitions from higher to lower energy states, but
there still remains a small probability that the system may cross the valley of energy
states [2] and move up to an energy state higher than the energy state of the valley. The
concept can be verified with a rolling ball. For instance, consider a rolling ball that falls
from a higher (potential) energy state into a valley and then moves up to a slightly
higher energy state (vide fig. 4.6). The probability of such a transition to a higher
energy state, however, is very small and is given by

p = exp(-ΔE / KT)

where p is the probability of transition from a lower to a higher energy state, ΔE
denotes the (positive) change in energy, K is the Boltzmann constant and T is the
temperature of the current thermal state. For small ΔE, p is higher than for large ΔE.
This follows intuitively: with respect to the example of the ball movement, the
probability of transition to a slightly higher state is greater than the probability of
transition to a much higher state.
An obvious question naturally arises: how do we realize annealing in search?
Readers, at this stage, would remember that the need for simulated annealing is to
identify the direction of search when the function f yields no better next state than the
current state. Under this circumstance, ΔE is computed for all possible legal next states
and p’ is evaluated for each such next state by the following formula:
p’ = exp(-ΔE / T)
A random number in the closed interval [0,1] is then generated and compared with p’.
If p’ is larger, the corresponding state is selected for the next transition. The parameter
T, also called temperature, is gradually decreased as the search proceeds. The logic
behind this is that as T decreases, p’ too decreases, thereby allowing the algorithm to
terminate at a stable state. The algorithm for simulated annealing is formally presented
below.

Procedure Simulated Annealing


Begin
1. Identify possible starting states and measure the distance (f) of their closeness with
the goal; Push them into a stack in descending order of their f, so that the state with
the least f is at the top;
2. Repeat
Pop stack to get the stack-top element;
If the stack-top element is the goal,
announce it and exit;
Else do
Begin
a) generate the children of the stack-top element N and
compute f for each of them;
b) If the measure of f for at least one child of N is improving
Then push those children into the stack in descending order of
their f;
c) If none of the children of N is better in f
Then do
Begin
i) select any one of them randomly, compute its p’ and test whether p’ exceeds a
randomly generated number in the interval [0,1]; If yes, select that state as the next
state; If no, generate another alternative legal next state and test in this way until one
move can be selected; Replace the stack-top element by the selected move (state);
ii) Reduce T slightly; If the reduced value is negative, set it to zero;
End;
End;
Until the stack is empty;
End.

The algorithm is similar to hill climbing if there always exists at least one better next
state than the state pointed to by the stack-top. If this fails, the last begin-end
bracketed part of the algorithm is invoked. This part corresponds to simulated
annealing. It examines each legal next state one by one, testing whether the probability
of occurrence of the state is higher than the random value in [0,1]. If the answer is yes,
the state is selected; else the next possible state is examined. Hopefully, at least one
state will be found whose probability of occurrence is larger than the randomly
generated probability.

Another important point that we did not include in the algorithm is the process of
computation of ΔE. It is computed by taking the difference between the value of f of
the next state and that of the current (stack-top) state.
The third point to note is that T should be decreased once a new state with a less
promising value is selected. T is always kept non-negative. As T approaches zero, p’
approaches zero, and thus the probability of transition to any other state vanishes.
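The acceptance test at the heart of the algorithm can be sketched in isolation; the function and parameter names below are my own:

```python
import math, random

def accept(delta_e, T, rng=random.random):
    """Simulated-annealing acceptance test: a worse move (positive
    delta_e) is taken with probability p' = exp(-delta_e / T)."""
    if delta_e <= 0:          # improving move: always accept
        return True
    if T <= 0:                # frozen: no more uphill moves
        return False
    return rng() < math.exp(-delta_e / T)

# With a high temperature, uphill moves are accepted often;
# as T is reduced toward zero they become vanishingly rare.
random.seed(0)
hot = sum(accept(1.0, T=10.0) for _ in range(1000))
cold = sum(accept(1.0, T=0.01) for _ in range(1000))
print(hot, cold)   # hot is far larger than cold
```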

4.3 Heuristic Search


This section is devoted to solving the search problem by a new technique, called
heuristic search. The term “heuristic” stands for “rule of thumb”, i.e., a rule that works
successfully in many cases but whose success is not guaranteed.

In fact, we expand nodes by judiciously selecting the more promising ones, where
these nodes are identified by measuring their strength relative to their competitive
counterparts with the help of specialized intuitive functions, called heuristic functions.

Heuristic search is generally employed for two distinct types of problems: i) forward
reasoning and ii) backward reasoning. We have already discussed that in a forward
reasoning problem we move towards the goal state from a pre-defined starting state,
while in a backward reasoning problem, we move towards the starting state from the
given goal. The former class of search algorithms, when realized with heuristic
functions, is generally called heuristic search for OR-graphs or the Best First Search
algorithms. It may be noted that best first search is a class of algorithms, named
differently depending on the variation of the performance measuring function. One
typical member of this class is the algorithm A*. On the other hand, the heuristic
backward reasoning algorithms are generally called AND-OR graph search
algorithms, and one ideal member of this class of algorithms is the AO* algorithm.
We will start this section with the best first search algorithm.

4.3.1 Heuristic Search for OR Graphs


Most of the forward reasoning problems can be represented by an OR-graph, where a
node in the graph denotes a problem state and an arc represents an application of a rule
to a current state to cause transition of states. When a number of rules are applicable to
a current state, we could select a better state among the children as the next state. We
remember that in hill climbing, we ordered the promising initial states in a sequence
and examined the state occupying the beginning of the list. If it was the goal, the
algorithm terminated; but if it was not, it was replaced by its offspring, in any order, at
the beginning of the list. The hill climbing algorithm thus is not free from a depth first
flavor. In the best first search algorithm, to be devised shortly, we start with a
promising state and generate all its offspring. The performance (fitness) of each of
these nodes is then examined and the most promising node, based on its fitness, is
selected for expansion. The most promising node is then expanded and the fitness of
all the newborn children is measured. Now, instead of selecting only from the
generated children, all the nodes having no children are examined and the most
promising of these fringe nodes is selected for expansion. Thus, unlike hill climbing,
the best first search provides scope for correction in case a wrong step was selected
earlier. This is the prime advantage of the best first search algorithm over hill
climbing. The best first search algorithm is formally presented below.

Procedure Best-First-Search
Begin
1. Identify possible starting states and measure the distance (f) of their
closeness with the goal; Put them in a list L;
2. While L is not empty do
Begin
a) Identify the node n from L that has the minimum f; If there
exist more than one node with minimum f, select any one of them
(say, n) arbitrarily;
b) If n is the goal
Then return n along with the path from the starting node,
and exit;
Else remove n from L and add all the children of n to the list L,
with their labeled paths from the starting node;
End While;
End.
As already pointed out, the best first search algorithm is a generic algorithm and
requires many more extra features for its efficient realization.
For instance, how f should be defined is not explicitly mentioned in the algorithm.
Further, what happens if an offspring of the current node is not a fringe node? The A*
algorithm, to be discussed shortly, is a complete realization of the best first algorithm
that takes these issues into account in detail. The following definitions are required for
presenting the A* algorithm.

Definition 4.1: A node is called open if the node has been generated and the h’ (x) has
been applied over it but it has not been expanded yet.
Definition 4.2: A node is called closed if it has been expanded for generating
offsprings.
In order to measure the goodness of a node in the A* algorithm, we require two cost
functions: i) heuristic cost and ii) generation cost. The heuristic cost measures the
distance of the current node x from the goal and is denoted by h(x). The cost of
generating a node x, denoted by g(x), on the other hand, measures the distance of node
x from the starting node in the graph. The total cost function at node x, denoted by
f(x), is the sum g(x) + h(x).
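Under these definitions f(x) = g(x) + h(x), and the open/closed bookkeeping can be sketched with a priority queue; the graph, edge costs and heuristic table below are invented, and re-insertion of open nodes is handled lazily for brevity:

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search on a weighted graph: expand the open node with the
    least f(x) = g(x) + h(x); closed nodes are never re-expanded."""
    open_list = [(h(start), 0, start, [start])]   # (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return g, path                # cost and path from the start
        if node in closed:
            continue                      # stale duplicate entry
        closed.add(node)
        for child, cost in graph.get(node, []):
            if child not in closed:
                g2 = g + cost
                heapq.heappush(open_list,
                               (g2 + h(child), g2, child, path + [child]))
    return None

# Illustrative graph with an admissible heuristic
graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 1), ('G', 6)], 'B': [('G', 2)]}
h = {'S': 3, 'A': 3, 'B': 2, 'G': 0}.get
print(a_star(graph, h, 'S', 'G'))   # → (4, ['S', 'A', 'B', 'G'])
```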

Knowledge & Reasoning
Issues in Knowledge Representation
• Knowledge consists of facts that can be exploited by AI programs in order to
generate good results
• The main entities are:
• Facts: truths in some relevant world
• Representation: the formalism used to describe facts in a way that they can be
manipulated.
• These two entities should be structured at two levels:
• Knowledge level – at which facts are described
• Symbol level – at which representation of objects at the knowledge level are
defined in terms of symbols that can be manipulated by programs.
Mapping between Facts and Representations

[Fig.: Facts are mapped to and from internal representations, on which the reasoning
program operates; English generation maps the internal representations back to
English representations of the facts.]

Approaches to Knowledge Representation


Properties of a good knowledge representation system
• Representational Adequacy – the ability to represent all the kinds of knowledge
that are needed in that domain.
• Inferential Adequacy – the ability to manipulate the representational structures
in such a way as to derive new structures corresponding to the new knowledge
inferred from old.
• Inferential Efficiency – the ability to incorporate into the knowledge structures
additional information that can be used to focus the attention of the inference
mechanism in the most promising directions.

• Acquisitional Efficiency – the ability to acquire new information easily. It
should be possible to insert new knowledge directly into the database.

Types of Knowledge Representation


• Simple relational knowledge – using database tables
• Inheritable knowledge – provides for property inheritance, in which elements
of specific classes inherit attributes and values from the more general classes in
which they are included. Objects are organized into classes and classes are
arranged in a generalization hierarchy. [See Fig. 4.5 page 111: Artificial
Intelligence, Elaine, R. and K. Knight]. Examples include slot-and-filler
structures such as semantic networks and frames.
• Inferential Knowledge: the power of property inheritance and traditional
logic (predicate, propositional) are combined to generate the inferences
(deductions) that are needed.
• Procedural knowledge: facts are not just static or declarative, procedural
knowledge specifies what to do and when to do it. This is another kind of
knowledge that must be represented. Mostly production rules are used to
represent procedural knowledge.
• Semantic knowledge - Ontologies, Vocabulary, Thesaurus, Episodic
knowledge

Issues in Knowledge Representation


The following issues cut across the various kinds of real world knowledge.
• Are any attributes of objects so basic that they occur in almost every
problem domain? If there are, we need to make sure that they are
handled appropriately in each of the mechanisms we propose. If such
attributes exist, what are they?
• Are there any important relationships that exist among attributes of
objects?

• At what level should knowledge be represented? Is there a good set of
primitives into which knowledge can be broken down? Is it helpful to
use such primitives?
• How should sets of objects be represented?
• Given a large amount of knowledge stored in a database, how can
relevant parts be accessed when they are needed?
• (Reference: E. Rich and K. Knight, 1999, Artificial Intelligence)
Answers to Questions
1. (Yes) instance and isa, i.e. class membership and class inclusion, are
common and support inheritance
2. (Yes)
a. Inverses: if we can define a relationship from the perspective of
object A to object B, it is also possible to define an inverse
relationship from the perspective of B to A.
b. Existence in an isa hierarchy: just as there are classes of objects
and specializations of those classes, there are attributes and
specializations of attributes.
• There exist techniques for Reasoning about values:
- information about the type of value
- constraints on the value
- rules for computing values when it is needed
- rules that describe action that should be taken, if a value ever
becomes known.
• Single-valued attributes
- explicit notation to track duplication
- replacement of an old value
Representation should be done at a variety of granularities, i.e. both low-level
primitives and high-level forms should be used; the particular domain will determine
which should be employed more.

Exercise
Investigate the frame problem

Knowledge based Agents
• The knowledge base is the central component of a knowledge-based agent.
• A knowledge base is a set of representations of facts about the world that
enable the knowledge-based agent to adapt to its environment.
• Each individual representation in a knowledge base is called a sentence. The
sentences are expressed in a knowledge representation language.
• A knowledge base must have a Tell (assert) and an Ask (query) mode
respectively. In essence a knowledge-based agent takes a percept as input and
returns an action.
• The expectation from a knowledge-based agent is that when it is asked a
question, the response should follow from what it has been told; this following
is a product of reasoning.
• The knowledge base of a knowledge-based agent is referred to as background
knowledge. Every time the agent program is called, two things happen:
o 1) it tells the knowledge base what it perceives, and
o 2) it asks the knowledge base what action it should perform.
• In doing this, the concept of logical reasoning is used to determine which action
is better than others.
• The objective of knowledge representation is to express knowledge in a
computer-tractable form, such that it can be used to help agents perform well.
• A knowledge representation is defined by two aspects:
o The Syntax of the language, which describes the possible
configurations that can constitute sentences
o The Semantics, which determines the facts in the world to which the
sentences refer.
• Provided the syntax and semantics are precisely defined, we can call the
language a Logic. Also, from the syntax and semantics we can derive an
inference mechanism for an agent using the language.
• An Inference procedure can do one of two things:
o 1. Given a knowledge base KB, it can generate new sentences α that
purport to be entailed by KB Or

o 2. Given a knowledge base KB and another sentence α, it can report
whether or not α is entailed by KB. An inference procedure that
generates only entailed sentences is called sound or truth preserving.
• Entailment: In mathematical notation, the entailment relation between a
knowledge base KB and a sentence α is pronounced “KB entails
α” and written as:
KB ╞ α.
• An inference procedure i can be described by the sentences it can derive. If i can
derive α from KB, this is written as KB ├i α, i.e. α is
derived from KB by i, or i derives α from KB.
• The record of operation of a sound inference procedure is called a proof.
• An inference procedure is complete if it can find a proof for any sentence that is
entailed.
• In summary:
• Logic consists of:
1. A formal system for describing states of affairs, consisting of:
a) the syntax of the language, which describes how to make sentences, and
b) the semantics of the language, which states the systematic constraints on how
sentences relate to states of affairs;
2. The proof theory – a set of rules for deducing the entailments of a set of
sentences.
• Therefore, by examining the syntax and semantics of a logical language, we
can extract its proof theory.
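The Tell/Ask cycle described above can be sketched with a toy knowledge base; the class, the tuple encoding of rules, and the percept/action strings are my own, and Ask here uses simple forward chaining rather than a full inference procedure:

```python
class KnowledgeBase:
    """Toy KB: Tell asserts facts or rules; Ask checks entailment by
    forward chaining over simple premise => conclusion rules."""
    def __init__(self):
        self.facts = set()
        self.rules = []                   # (premise, conclusion) pairs

    def tell(self, sentence):
        if isinstance(sentence, tuple):   # ('p', 'q') encodes p => q
            self.rules.append(sentence)
        else:
            self.facts.add(sentence)

    def ask(self, query):
        changed = True                    # chain until no new fact appears
        while changed:
            changed = False
            for premise, conclusion in self.rules:
                if premise in self.facts and conclusion not in self.facts:
                    self.facts.add(conclusion)
                    changed = True
        return query in self.facts

kb = KnowledgeBase()
kb.tell('percept:breeze')                      # 1) tell the KB what is perceived
kb.tell(('percept:breeze', 'action:retreat'))  # background rule
print(kb.ask('action:retreat'))                # 2) ask what to do → True
```

The answer follows from what the KB was told, which is the behaviour expected of a knowledge-based agent.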
Types of Agent Programs
Simple Reflex Agent
Model-based Agent
Goal-based Agent
Utility-based Agent
Learning Agent

Knowledge Representation with First-Order Logic
• Propositional logic can also be used but it lacks sufficient primitives for
representing knowledge in a logical way. It has a limited ontology, making
only the commitment that the world consists of facts, which is grossly
inadequate for many domains.
• Predicate logic (first-order logic) makes a stronger set of ontological
commitments. It has primitives to represent objects, relations, properties and
functions. The main reason for this is that the world consists of objects,
that is, things with individual identities and properties that distinguish them
from other objects. Among these objects various relations hold. Some of
these are functions – relations in which there is exactly one “value” for a given
“input”. Examples of objects, properties, relations and functions include:
• Objects: people, houses, numbers, ball etc.
• Relations: brotherof, biggerthan, inside, part of, hascolor etc.
• Properties: red, round, yellow, prime
• Functions: fatherof, bestfriend etc.
• First-order logic has sentences, but it also has terms, which represent
objects. Constant symbols, variables and function symbols are used to build
terms, and quantifiers and predicate symbols are used to build sentences.
Syntax of First-Order Logic (with equality, in Backus-Naur Form)
Sentence → AtomicSentence
| Sentence Connective Sentence
| Quantifier Variable, … Sentence
| ┐Sentence | (Sentence)
AtomicSentence → Predicate(Term, …) | Term = Term
Term → Function(Term, …) | Constant | Variable
Connective → => | Λ | V | <=>
Quantifier → V | Ǝ
Constant → A | X1 | John | …
Variable → a | x | s | …
Predicate → Before | HasColor | Raining | …
Function → Mother | LeftLegOf | …

• A term is a logical expression that refers to an object. The formal semantics of
terms: an interpretation must specify the functional relation referred to by the
function symbol and the objects referred to by its argument terms; the term
then refers to the object obtained by applying that relation to those objects.
• Constant symbol: an interpretation specifies which object is referred to by
each constant symbol, e.g. A, B, ADE etc.
• Predicate symbol: refers to a particular relation in the model.
• Function symbol: refers to a functional relation, which relates any given object
to exactly one other object, e.g. any angle has only one number as its cosine;
any person has only one person as his or her father.
• Atomic sentences: an atomic sentence is formed from a predicate symbol
followed by a parenthesized list of terms. For example:
• Brother(Richard, John)
• This sentence states that Richard is the brother of John.
• Atomic sentences can have arguments that are complex terms, e.g.
• Married(fatherof(Richard), motherof(John)) – this states that the father of
Richard married the mother of John. An atomic sentence is true if the relation
referred to by the predicate symbol holds between the objects referred to by the
arguments.
• Complex sentences: a complex sentence is a combination of two or more atomic
sentences using logical connectives, e.g.
• Brother(Richard, John) Λ Boss(Richard, John) – true when Richard is the
brother of John and Richard is the boss of John.
• Older(John, 30) => ┐Younger(John, 30) – if John is older than 30, then he is
not younger than 30.
• Quantifiers: quantifiers help to express properties of entire collections of
objects, rather than having to enumerate the objects by name. The two standard
quantifiers in first-order logic are the universal and existential quantifiers.
• Vx cat(x) => mammal(x). This is pronounced “for all x, if x is a cat then x is a
mammal”.

• Thus the universal quantifier makes statements about every object in the
universe. The existential quantifier allows us to make statements about some
object in the universe without naming it.
• Ǝx sister(x, Spot) Λ cat(x) – This is pronounced “there exists an x, such that x
is a sister of Spot and x is a cat”.
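The semantics of the two quantifiers can be sketched by model checking over a small finite domain; the domain and predicate extensions below are invented for illustration:

```python
# A finite model: the domain and the extensions of the predicates
domain = {'tom', 'felix', 'rex'}
cat = {'tom', 'felix'}
mammal = {'tom', 'felix', 'rex'}
sister_of_spot = {'felix'}

# Vx cat(x) => mammal(x): the implication must hold for every object
forall_holds = all((x not in cat) or (x in mammal) for x in domain)

# Ǝx sister(x, Spot) Λ cat(x): the conjunction must hold for some object
exists_holds = any(x in sister_of_spot and x in cat for x in domain)

print(forall_holds, exists_holds)   # True True
```

`all` mirrors the universal quantifier and `any` the existential one; note the universal case pairs naturally with implication and the existential case with conjunction, as in the sentences above.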
Exercise
Represent the following sentences in first-order logic, using a consistent vocabulary
that you must define.
a) Not all students take both History and Biology
b) Only one student failed History
c) Only one student failed both History and Biology
d) The best score in History was better than the best score in Biology
e) Every person who dislikes all vegetarian is smart
f) No person likes smart vegetarians
g) There is a woman who likes all men who are not vegetarians
h) There is a barber who shaves all men in town who do not shave themselves
i) No person likes a Professor unless the professor is smart
j) Politicians can fool some of the people all of the time, and they can fool all of
the people some of the time, but they can’t fool all of the people all of the time.
Sample Solution
a) ┐Vx (student(x) => (take(x, History) Λ take(x, Biology)))
b) Ǝx (student(x) Λ failed(x, History) Λ
Vy ((student(y) Λ failed(y, History)) => y = x))
c) Ǝx (student(x) Λ failed(x, History) Λ failed(x, Biology) Λ
Vy ((student(y) Λ failed(y, History) Λ failed(y, Biology)) => y = x))
d) Ǝx (score(x, History) Λ Vy (score(y, Biology) => greater(x, y)))
e) Vx ((person(x) Λ Vy (vegetarian(y) => dislikes(x, y))) => smart(x))

Semantic Networks

A semantic network or net is a graphic notation for representing knowledge in patterns


of interconnected nodes and arcs. Computer implementations of semantic networks
were first developed for artificial intelligence and machine translation, but earlier
versions have long been used in philosophy, psychology, and linguistics.

What is common to all semantic networks is a declarative graphic representation that


can be used either to represent knowledge or to support automated systems for
reasoning about knowledge. Some versions are highly informal, but other versions are
formally defined systems of logic. Following are six of the most common kinds of
semantic networks:

1. Definitional networks emphasize the subtype or is-a relation between a concept


type and a newly defined subtype. The resulting network, also called a
generalization or subsumption hierarchy, supports the rule of inheritance for
copying properties defined for a supertype to all of its subtypes. Since
definitions are true by definition, the information in these networks is often
assumed to be necessarily true.
2. Assertional networks are designed to assert propositions. Unlike definitional
networks, the information in an assertional network is assumed to be
contingently true, unless it is explicitly marked with a modal operator. Some
assertional networks have been proposed as models of the conceptual structures
underlying natural language semantics.
3. Implicational networks use implication as the primary relation for connecting
nodes. They may be used to represent patterns of beliefs, causality, or
inferences.
4. Executable networks include some mechanism, such as marker passing or
attached procedures, which can perform inferences, pass messages, or search for
patterns and associations.
5. Learning networks build or extend their representations by acquiring knowledge
from examples. The new knowledge may change the old network by adding and
deleting nodes and arcs or by modifying numerical values, called weights,
associated with the nodes and arcs.
6. Hybrid networks combine two or more of the previous techniques, either in a
single network or in separate, but closely interacting networks.

Some of the networks have been explicitly designed to implement hypotheses about
human cognitive mechanisms, while others have been designed primarily for computer
efficiency.

Knowledge representation is an issue that arises in both cognitive science and


artificial intelligence. In cognitive science it is concerned with how people store and

process information. In artificial intelligence (AI) the primary aim is to store knowledge
so that programs can process it and achieve the verisimilitude of human intelligence.

In cognitive theory

For some authors, knowledge is stored in either episodic or semantic memory. The
former is organized along spatio-temporal dimensions, the latter according to semantic,
content-oriented principles, e.g. networks of concepts.

Long-Term Memory is a large storage system that stores factual information,
procedural rules of behaviour, experiential knowledge, in fact everything we know.
There are two types of long-term memory: episodic memory and semantic memory.
Episodic memory represents our memory of events and experiences in a serial form; it
is from this memory that we can reconstruct the actual events that took place at a given
point in our lives. Semantic memory is a structured record of facts, concepts and skills
that we have acquired. The information in semantic memory is derived from that in our
episodic memory, such that we can learn new facts or concepts from our experiences.
Semantic memory is structured in a way that allows access to information,
representation of relationships between pieces of information, and inference. One
model of how semantic memory is structured is as a network: items are associated with
each other in classes and may inherit attributes from parent classes. This model is
known as a semantic network.

In computer science

In computer science, a semantic network can be defined as a knowledge representation


formalism which describes objects and their relationships in terms of a network
consisting of labelled arcs and nodes.

• “A semantic network is often used as a form of knowledge representation. It is a


directed graph consisting of vertices which represent concepts and edges which
represent semantic relations between the concepts.”
• “A semantic network is a knowledge representation tool consisting of a
framework of semantically related terms, with the purpose of allowing a
definition of those words through their relationships.”
• Most ontologies use a kind of semantic network for knowledge representation.

The advantages of a knowledge representation structure like a semantic network over
first-order logic include the following:
1. It makes it easy to describe properties of relations.
2. It is a form of object-oriented programming, with the advantages that such
systems normally have, including modularity and ease of viewing by people.

Here is an example of a semantic net:

Figure 1. Animals-Birds-Tweety

The major problem with semantic nets is that, although the name of this knowledge
representation language is semantic nets, there is, ironically, no clear semantics for the
various network representations. The above example can be interpreted as the
representation of a specific bird named Tweety, or as a representation of some
relationship between Tweety, birds and animals.
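The inheritance idea behind a net like Figure 1 can be sketched in a few lines: nodes are linked by is-a arcs, and a property lookup climbs the hierarchy until it finds a value. The node names below follow the Animals-Birds-Tweety example; the specific properties are illustrative assumptions:

```python
# A minimal semantic network: nodes linked by "isa" arcs, with
# property inheritance up the hierarchy.
isa = {"bird": "animal", "tweety": "bird"}   # tweety isa bird isa animal
properties = {
    "animal": {"breathes": True},
    "bird":   {"can_fly": True, "has_feathers": True},
    "tweety": {"colour": "yellow"},
}

def lookup(node, prop):
    """Return a property value, inheriting from supertypes if absent."""
    while node is not None:
        if prop in properties.get(node, {}):
            return properties[node][prop]
        node = isa.get(node)          # climb one level up the hierarchy
    return None                       # property not found anywhere

print(lookup("tweety", "can_fly"))    # True   (inherited from bird)
print(lookup("tweety", "breathes"))   # True   (inherited from animal)
print(lookup("tweety", "colour"))     # yellow (local property)
```

The lookup loop is exactly the "rule of inheritance" mentioned for definitional networks: a property defined once at a supertype is available to all of its subtypes and instances.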

Semantic networks can also include other explicit kinds of relationships among
concepts and concept types that adequately represent the semantic relationships among entities.
As an example, Figure 2 shows a KL-ONE network that defines the concepts Truck and
TrailerTruck as subtypes of Vehicle.

Figure 2: Truck and TrailerTruck in KL-ONE

Figure 2 has nine ovals for concept nodes and nine arrows, which represent different
kinds of links. The white ovals represent generic concepts for the types, as
distinguished from the shaded oval, which is an individual concept for the instance 18.
The oval marked with an asterisk * indicates that Integer is a built-in or primitive type.
The concepts Truck and TrailerTruck are defined in Figure 2, but Vehicle, Trailer,
WtMeasure, and VolMeasure would have to be defined by other KL-ONE diagrams.

The double-line arrows represent subtype-supertype links from TrailerTruck to Truck


and from Truck to Vehicle. The arrows with a circle in the middle represent roles. The
Truck node has four roles labeled UnloadedWt, MaxGrossWt, CargoCapacity, and
NumberOfWheels. The TrailerTruck node has two roles, one labeled HasPart and one
that restricts the NumberOfWheels role of Truck to the value 18. The notation v/r at the
target end of the role arrows indicates value restrictions or type constraints on the
permissible values for those roles.

Generally, formal graph notations annotated with specific labels can be used for
semantic network representations.
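As a rough illustration (not an actual KL-ONE implementation), the Figure 2 definitions can be written as plain data: each concept lists its supertype and its roles with their value restrictions, and a subtype may restrict an inherited role, as TrailerTruck does with NumberOfWheels:

```python
# The Figure 2 concepts as Python data: supertype links plus roles
# with value restrictions (v/r). Vehicle, Trailer, WtMeasure and
# VolMeasure are assumed to be defined elsewhere, as in the note.
concepts = {
    "Truck": {
        "supertype": "Vehicle",
        "roles": {
            "UnloadedWt":     "WtMeasure",
            "MaxGrossWt":     "WtMeasure",
            "CargoCapacity":  "VolMeasure",
            "NumberOfWheels": "Integer",
        },
    },
    "TrailerTruck": {
        "supertype": "Truck",
        "roles": {
            "HasPart": "Trailer",
            # TrailerTruck restricts NumberOfWheels to the individual 18.
            "NumberOfWheels": 18,
        },
    },
}

def roles_of(concept):
    """Collect roles, letting subtypes restrict inherited roles."""
    inherited = {}
    if concepts.get(concept, {}).get("supertype") in concepts:
        inherited = roles_of(concepts[concept]["supertype"])
    return {**inherited, **concepts.get(concept, {}).get("roles", {})}

print(roles_of("TrailerTruck")["NumberOfWheels"])  # 18 (restricted)
print(roles_of("TrailerTruck")["CargoCapacity"])   # VolMeasure (inherited)
```

Note how the dictionary merge puts the subtype's roles last, so a restriction at TrailerTruck overrides the generic v/r defined at Truck, mirroring the role-restriction arrow in Figure 2.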

Exercises

Show the representation of the following: i) Student ii) Teacher iii) Worker iv)
Computer.

Learning
What is Learning?
• Learning is the process through which an entity acquires knowledge.
• Machine intelligence is not natural and automatic; an intelligent machine is one
that exhibits intelligence after a process of learning.
Approaches to Learning:
• Symbolic learning: describes systems that formulate and modify rules, facts,
and relationships, explicitly represented in words or symbols. In other words
they create and modify their own knowledge base.
• Numerical learning: refers to systems that use numerical models, where
certain techniques are used for optimizing the numerical parameters. Examples
include neural networks, genetic algorithms and simulated annealing.
• Learning can be with a teacher, in which case it is said to be supervised.
Unsupervised learning is learning without a teacher.
Classification of learning
• Rote Learning: The system is given confirmation of correct decisions.
When it produces an incorrect decision, it is “spoon-fed” with the correct rule or
relationship that it should have used.
• Learning from Advice: Rather than being given a specific rule that should
apply in a given circumstance, the system is given a piece of general advice,
such as “gas is more likely to escape from a valve than from a pipe”. The
system must sort out for itself how to move from this high level advice to an
immediately usable rule.
• Learning by Induction: The system is presented with sets of example data
and is told the correct conclusion that it should draw from each. The system
continually refines its rules and relations so as to correctly handle each new
example.
• Learning by Analogy: The system is told the correct response to a similar,
but not identical task. The system must adapt the previous response to
generate a new rule applicable to the new circumstances.
• Explanation-Based Learning (EBL): The system analyzes a set of example
solutions and their outcomes to determine why each one was successful or
otherwise. Explanations are generated, which are used to guide future
problem solving. An example of an EBL system is PRODIGY (a
general-purpose problem-solver).
• Case-Based Reasoning (CBR): Any case about which the system has
reasoned is filed away, together with the outcome, whether it is successful
or otherwise. Whenever a new case is encountered, the system adapts its
stored behaviour to fit the new circumstances.
• Explorative or Unsupervised Learning: This is also called discovery
learning. Rather than having an explicit goal, an explorative system
continuously searches for patterns and relationships in the input data,
perhaps marking some patterns as interesting and warranting further
investigation. Examples of applications of unsupervised learning can be
found in:

o data mining, where patterns are sought among large or complex
data sets;
o identifying clusters, possibly for compressing the data;
o feature recognition.
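A minimal example of explorative learning is clustering unlabelled data. The sketch below groups made-up one-dimensional points into two clusters with a plain k-means loop; no teacher supplies a correct answer, the system only discovers structure in the data:

```python
# Unsupervised learning in miniature: 1-D k-means with k = 2.
# The data values and initial guesses are invented for illustration.
data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centres = [data[0], data[3]]          # naive initial centre guesses

for _ in range(10):                   # a few refinement passes
    clusters = [[], []]
    for x in data:                    # assign each point to nearest centre
        nearest = min((0, 1), key=lambda i: abs(x - centres[i]))
        clusters[nearest].append(x)
    # move each centre to the mean of its cluster
    centres = [sum(c) / len(c) for c in clusters]

print(sorted(clusters[0]))  # [0.8, 1.0, 1.2]
print(sorted(clusters[1]))  # [7.9, 8.0, 8.3]
```

The two groups emerge purely from the distances between points, which is the sense in which the learning is "without a teacher".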

Expert Systems

• Expert systems are meant to solve real problems which normally would
require a specialised human expert (such as a doctor or a mineralogist).
• Building an expert system entails extracting the relevant knowledge from
the human expert and representing the ‘heuristic knowledge’ in a knowledge
base.
• A knowledge engineer has the job of extracting this knowledge and building
the expert system. This is called Knowledge Elicitation and Encoding.
• The most widely used knowledge representation scheme for expert systems
is rules (sometimes in combination with frame systems).
• Typically, the rules won't have certain conclusions - there will just be some
degree of certainty that the conclusion will hold if the conditions hold.
Statistical techniques are used to determine these certainties. Rule-based
systems, with or without certainties, are generally easily modifiable and
make it easy to provide reasonably helpful traces of the system's reasoning.
These traces can be used in providing explanations of what it is doing.
• Expert systems have been used to solve a wide range of problems in
domains such as medicine, mathematics, engineering, geology, computer
science, business, law, defence.

Expert System Architecture

Figure 1 shows the most important modules that make up a rule-based expert system.
The user interacts with the system through a user interface (which may use menus,
natural language or any other style of interaction). Then an inference engine is used to
reason with both the expert knowledge (extracted from our friendly expert) and data
specific to the particular problem being solved. The expert knowledge will typically be
in the form of a set of IF-THEN rules. The case specific data includes both data
provided by the user and partial conclusions (along with certainty measures) based on
this data. In a simple forward chaining rule-based system the case specific data will be
the elements in working memory.
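The forward-chaining cycle just described can be sketched in a few lines: rules are condition-conclusion pairs, working memory is a set of facts, and the engine keeps firing applicable rules until no new facts appear. The rule contents here are invented for illustration:

```python
# A minimal forward-chaining inference engine. Each rule is a pair
# (set of condition facts, conclusion fact); working memory holds
# the case-specific data supplied by the user.
rules = [
    ({"has_feathers"},           "is_bird"),
    ({"is_bird", "can_fly"},     "nests_in_trees"),
]
working_memory = {"has_feathers", "can_fly"}   # case-specific data

changed = True
while changed:                     # keep firing rules until nothing new
    changed = False
    for conditions, conclusion in rules:
        if conditions <= working_memory and conclusion not in working_memory:
            working_memory.add(conclusion)     # assert the new fact
            changed = True

print(sorted(working_memory))
# ['can_fly', 'has_feathers', 'is_bird', 'nests_in_trees']
```

A real shell would add conflict resolution, certainty factors, and the explanation trace described below, but the match-fire-repeat loop is the core of the inference engine.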

Almost all expert systems also have an explanation subsystem, which allows the
program to explain its reasoning to the user. Some systems also have a knowledge base
editor which helps the expert or knowledge engineer to easily update and check the
knowledge base.

One important feature of expert systems is the way they (usually) separate domain
specific knowledge from more general purpose reasoning and representation
techniques. The general purpose bit (in the dotted box in the figure) is referred to as an
expert system shell. As we see in the figure, the shell will provide the inference engine
(and knowledge representation scheme), a user interface, an explanation system and
sometimes a knowledge base editor. Given a new kind of problem to solve (say, car
design), we can usually find a shell that provides the right sort of support for that
problem, so all we need to do is provide the expert knowledge. There are numerous
commercial expert system shells, each one appropriate for a slightly different range of
problems. (Expert systems work in industry includes both writing expert system shells
and writing expert systems using shells.) Using shells to write expert systems generally
greatly reduces the cost and time of development (compared with writing the expert
system from scratch).

Examples: MYCIN, XCON (R1), PROSPECTOR.

