06 Learning Systems
Artificial Intelligence - Learning System
Topics
(Lectures 31, 32, 33, 34 : 4 hours)
1. What is Learning
   Definition, learning agents, components of a learning system; paradigms
   of machine learning.
2. Rote Learning
6. Clustering
   Distance functions, K-means clustering algorithm.
7. Analogy
9. Reinforcement Learning
   RL problem : agent - environment interaction, key features; RL tasks,
   Markov system, Markov decision processes, agent's learning task,
   policy, reward function, maximize reward, value functions.
10. References
Machine Learning
• Mitchell, 1997 :
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.
1. What is Learning
1.1 Definition
Components of a learning system :
◊ Learning element,
◊ Performance element,
◊ Critic, and
◊ Problem generator.
[Figure: Learning Agent architecture - sensors deliver percepts from the
environment to the critic, which sends feedback to the learning element;
the learning element exchanges knowledge with and makes changes to the
performance element and sets learning goals for the problem generator;
the performance element acts on the environment through effectors.]
■ Performance Element: Selects and performs the external driving actions;
e.g., turning, accelerating and braking on the roads.
■ Learning Element: Uses the critic's feedback to make improvements and
formulate learning goals; e.g., learn rules for braking and accelerating,
learn the geography of the city.
■ Critic: Observes the world and passes information to the learning element;
e.g., after a quick right turn across three lanes of traffic, observe the
reaction of other drivers.
■ Problem Generator: Suggests new experiences to try; e.g., try the south
city road.
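The loop below is a minimal Python sketch of how these four components might interact in the taxi-driving example; all class names, rules and percept strings are illustrative assumptions, not part of the original slides.

```python
# Sketch of a learning agent loop (illustrative names, not from the slides).

class PerformanceElement:
    """Chooses driving actions from percepts using its current rules."""
    def __init__(self):
        self.rules = {"red_light": "brake"}          # current knowledge

    def act(self, percept):
        return self.rules.get(percept, "cruise")     # default action

class Critic:
    """Observes the outcome and scores the last action for the learner."""
    def feedback(self, percept, action):
        return -1.0 if percept == "obstacle" and action != "brake" else 1.0

class LearningElement:
    """Uses the critic's feedback to change the performance element."""
    def improve(self, performer, percept, action, score):
        if score < 0:
            performer.rules[percept] = "brake"       # learn a safer rule

class ProblemGenerator:
    """Suggests new experiences to try, e.g. a different route."""
    def suggest(self):
        return "try_south_city_road"

def run_agent(percepts):
    performer, critic, learner, generator = (
        PerformanceElement(), Critic(), LearningElement(), ProblemGenerator())
    for percept in percepts:
        action = performer.act(percept)
        score = critic.feedback(percept, action)
        learner.improve(performer, percept, action, score)
    return performer.rules, generator.suggest()      # learned rules, next experiment

print(run_agent(["red_light", "obstacle", "obstacle"]))
```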
In the natural world, the organisms that are poorly suited for an environment die off,
while those well-suited for it prosper. Genetic algorithms search the
space of individuals for good candidates. The "goodness" of an
individual is measured by some fitness function. Search takes place in
parallel, with many individuals in each generation.
The techniques used for constructing class definitions (or concept learning) are :
• Winston's learning program
• Version Spaces
• Decision Trees
Note : The decision tree technique is explained in full detail, while the
other two techniques are briefly described in the next few slides.
■ In Winston's program, each concept is learned through a near miss. A near
miss is an object that is not an instance of the concept but differs from an
instance in only one important property.
[Figure: a semantic network describing a structure - node A has-part B and
has-part C, with supported-by links between the parts and isa links to
Wedge and Brick.]
• Winston's Program
• Fundamental Assumptions
Tree nodes :
‡ each node is connected to a model.
‡ nodes in the generalization tree are connected to a model that
matches everything in its sub-tree.
‡ nodes in the specialization tree are connected to a model that
matches only one thing in its sub-tree.
Links between nodes :
‡ denote generalization relations in a generalization tree, and
specialization relations in a specialization tree.
The key idea in version space learning is that specialization of the general
models and generalization of the specific models may ultimately lead
to just one correct model that matches all observed positive examples
and does not match any negative examples. This means :
■ Finally, the positive and negative examples may be such that only one
general model and one identical specific model survive.
Examples 1 to 5 are described by five features :
Country of Origin, Manufacturer, Color, Decade, Type.
[Table of the five training examples, giving each example's Origin,
Manufacturer, Color, Decade, Type and its positive / negative label,
not reproduced here.]
The initial boundaries of the version space are :
G = ( ?, ?, ?, ?, ? )
S = the description of the first positive example.
These two models represent the most general and the most specific
heuristics one might learn.
‡ At this point, the new G and S sets specify a version space that
can be translated roughly into English as : the concept may be as
specific as "Japanese, blue economy car".
‡ Prune away all the specific models that match the negative example.
No match is found, therefore there is no change in S :
S = { (Japan, ?, Blue, ?, Economy) }
‡ Specialize the general models so that they exclude the negative example.
[Diagram of the version space at this stage, showing general descriptions
such as (?, ?, ?, ?, Economy) and (Japan, ?, Blue, ?, ?) above the S set.]
‡ It is now clear that the car must be Japanese, because all descriptions in
the version space contain Japan as origin.
It is a positive example.
‡ Prune G to exclude descriptions inconsistent with the positive example,
and generalize S to include it.
The two boundaries converge to a single description :
S = { (Japan, ?, ?, ?, Economy) }
G = { (Japan, ?, ?, ?, Economy) }
The learned concept is therefore a Japanese economy car.
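The following is a compact Python sketch of the candidate-elimination procedure traced above. The hypothesis representation (tuples with "?" wildcards) follows the slides; the training data shown is an illustrative assumption, chosen only to be consistent with the descriptions that appear in the trace.

```python
# Compact candidate-elimination sketch over attribute tuples.

def matches(h, x):
    """A hypothesis matches an instance if every non-'?' value agrees."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def generalize(s, x):
    """Minimal generalization of S so that it covers positive example x."""
    return tuple(sv if sv == xv else "?" for sv, xv in zip(s, x))

def specialize(g, s, x):
    """Minimal specializations of g that exclude negative x but stay above s."""
    return [g[:i] + (s[i],) + g[i + 1:]
            for i, gv in enumerate(g)
            if gv == "?" and s[i] != "?" and s[i] != x[i]]

def candidate_elimination(examples):
    first_pos = next(x for x, label in examples if label)
    S = first_pos                      # most specific boundary (one hypothesis)
    G = [("?",) * len(first_pos)]      # most general boundary
    for x, positive in examples:
        if positive:
            S = generalize(S, x)
            G = [g for g in G if matches(g, x)]
        else:
            new_G = []
            for g in G:
                new_G.extend(specialize(g, S, x) if matches(g, x) else [g])
            G = new_G
    return S, G

# Hypothetical training data (Origin, Manufacturer, Color, Decade, Type):
examples = [
    (("Japan", "Honda", "Blue", "1980", "Economy"), True),
    (("Japan", "Toyota", "Green", "1970", "Sports"), False),
    (("Japan", "Toyota", "Blue", "1990", "Economy"), True),
    (("USA", "Chrysler", "Red", "1980", "Economy"), False),
    (("Japan", "Honda", "White", "1980", "Economy"), True),
]
print(candidate_elimination(examples))
# S = (Japan, ?, ?, ?, Economy) and G = [(Japan, ?, ?, ?, Economy)]
```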
• Description
[Figure: an example decision tree - internal decision nodes test attribute
values (e.g., C = red / blue, B < 4.5 / B ≥ 4.5, B < 8.1 / B ≥ 8.1,
C = true / false) and leaf nodes assign the class label K (e.g., K = y).]
■ Description
■ Attribute Selection
The entropy of a variable X is
H(X) = ∑i=1..n p(xi) • log2 ( 1/p(xi) ) = – ∑i=1..n p(xi) • log2 p(xi)
Uncertainty
Uncertainty about the value of Y is measured by its entropy, H(Y). The
uncertainty about the value of Y when we know the value of X is
given by the conditional entropy of Y given X, i.e., H(Y |X).
H(Y) = – ∑i=1..k p(yi) • log2 p(yi) is the initial entropy in Y,
H(Y |X) = ∑j=1..l p(xj) • H(Y |xj) is the conditional entropy of Y given X.
■ Example
‡ Learning set
In the above example, two attributes, Temperature and
Humidity, have continuous ranges. ID3 requires them to be
discretized into values like hot, medium, cold and high, normal.
[The table of acceptable discrete values is not reproduced here.]
‡ Attribute Selection
Apply the definitions of entropy and information gain.
The entropy of Y is given by
H(Y) = ∑i=1..k p(yi) • log2 ( 1/p(yi) ) = – ∑i=1..k p(yi) • log2 p(yi)
The conditional entropy of Y given X is
H(Y |X) = ∑j=1..l p(xj) • H(Y |xj) ,
where H(Y |xj) = – ∑i=1..k p(yi | xj) • log2 p(yi | xj) is the
entropy of Y restricted to the examples having X = xj.
The information gain of attribute X is Gain(Y, X) = H(Y) – H(Y |X).
For each attribute, the gain is calculated and the attribute with the
highest gain is used in the decision node.
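A short Python sketch of this attribute-selection computation is given below; the entropy and gain functions follow the formulas above, while the tiny weather-style data set is an illustrative assumption, not the slides' full table.

```python
# Entropy and information gain for ID3-style attribute selection.
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) = - sum_i p(y_i) * log2 p(y_i) over the class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Gain(Y, X) = H(Y) - H(Y | X) when splitting `rows` on attribute `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return base - remainder

# Illustrative weather-style rows (not the slides' full table):
rows = [
    {"Outlook": "sunny", "Wind": "weak", "Play": "no"},
    {"Outlook": "sunny", "Wind": "strong", "Play": "no"},
    {"Outlook": "cloudy", "Wind": "weak", "Play": "yes"},
    {"Outlook": "rainy", "Wind": "weak", "Play": "yes"},
    {"Outlook": "rainy", "Wind": "strong", "Play": "no"},
]
for attribute in ("Outlook", "Wind"):
    print(attribute, round(information_gain(rows, attribute, "Play"), 3))
```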
Entropy(Ssunny)
= – (2/5) x log2(2/5) – (3/5) x log2(3/5) = 0.970950
Entropy(Scloudy)
= – (4/4) x log2(4/4) – (0/4) x log2(0/4) = 0
Entropy(Srainy)
= – (3/5) x log2(3/5) – (2/5) x log2(2/5) = 0.970950
Entropy(Shot)
= – (2/4) x log2(2/4) – (2/4) x log2(2/4) = 1.00000000
Entropy(Smedium)
= – (4/6) x log2(4/6) – (2/6) x log2(2/6) = 0.91829583
Entropy(Scold)
= – (3/4) x log2(3/4) – (1/4) x log2(1/4) = 0.81127812
Entropy(Shigh)
= – (3/7) x log2(3/7) – (4/7) x log2(4/7) = 0.9852281
Entropy(Snormal)
= – (6/7) x log2(6/7) – (1/7) x log2(1/7) = 0.5916727
Entropy(Sweak)
= – (6/8) x log2(6/8) – (2/8) x log2(2/8) = 0.811
Entropy(Sstrong)
= – (3/6) x log2(3/6) – (3/6) x log2(3/6) = 1.00
◊ Step 09 : This process goes on until all days' data are classified.
[Figure: the final decision tree, rooted at Outlook with branches
sunny, cloudy and rainy.]
■ Applications
4. Explanation Based Learning (EBL)
[Figure: EBL block diagram - the external inputs (the specific goal /
problem and a partial external solution, among others described below)
feed the Generalizer, which produces the generalized rule.]
• Input to EBL :
The blocks colored yellow in the figure are external to EBL.
They are the 4 different kinds of input that the EBL algorithm accepts,
representing facts and rules that constitute what the learner knows.
The facts describe an instance of the goal concept and the rules
describe relationships between objects and actions in a domain; e.g.,
the cup domain includes facts about concavities, bases and lugs, as well as
rules about liftability, stability and what makes an open vessel.
4. Finally, the operational rules are extracted from the general explanation.
The definition of the target concept is the conjunction of the remaining
leaf nodes.
Goal Concept
Training example
colour(Obj23, Blue) ^ has-part(Obj23, Handle16) ^ has-part(Obj23, Bottom19)
^ owner(Obj23, Ralph) ^ has-part(Obj23, Concavity12) ^ is(Obj23, Light) ^
is(Ralph, Male) ^ isa(Handle16,Handle) ^ isa(Bottom19, Bottom) ^
is(Bottom19, Flat) ^ isa(Concavity12, Concavity) ^
is(Concavity12, Upward-Pointing)
Domain theory
has-part(x,y) ^ isa(y, Concavity) ^ is(y, Upward-Pointing) → open-vessel(x)
is(x, Light) ^ has-part(x,y) ^ isa(y, Handle) → liftable(x)
has-part(x,y) ^ isa(y, Bottom) ^ is(y, Flat) → stable(x)
Operationality Criterion
[Figure: the explanation tree for the training example - Cup(Obj1) at the
root, supported by intermediate conclusions and, at the leaves, operational
facts such as has-part(Obj1, Concavity1).]
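A minimal Python sketch of building such an explanation is shown below: it forward-chains the three domain-theory rules over the training-example facts. The final rule that combines open-vessel, liftable and stable into cup is an assumption added for illustration, and the variable-level generalization step of full EBL is not shown.

```python
# Minimal forward-chaining sketch over the cup example. The rule that turns
# open-vessel + liftable + stable into cup is an assumed goal rule.

facts = {
    ("is", "Obj23", "Light"),
    ("has-part", "Obj23", "Handle16"), ("isa", "Handle16", "Handle"),
    ("has-part", "Obj23", "Bottom19"), ("isa", "Bottom19", "Bottom"),
    ("is", "Bottom19", "Flat"),
    ("has-part", "Obj23", "Concavity12"), ("isa", "Concavity12", "Concavity"),
    ("is", "Concavity12", "Upward-Pointing"),
}

def derive(facts):
    """Apply the three domain-theory rules, then the assumed cup rule."""
    derived = set(facts)
    for (_, x, y) in [f for f in facts if f[0] == "has-part"]:
        if ("isa", y, "Concavity") in facts and ("is", y, "Upward-Pointing") in facts:
            derived.add(("open-vessel", x))
        if ("isa", y, "Handle") in facts and ("is", x, "Light") in facts:
            derived.add(("liftable", x))
        if ("isa", y, "Bottom") in facts and ("is", y, "Flat") in facts:
            derived.add(("stable", x))
    for x in {f[1] for f in derived if f[0] in ("open-vessel", "liftable", "stable")}:
        if {("open-vessel", x), ("liftable", x), ("stable", x)} <= derived:
            derived.add(("cup", x))              # assumed goal rule
    return derived

print(("cup", "Obj23") in derive(facts))         # True: the explanation succeeds
```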
5. Discovery
Simon (1966) first proposed the idea that we might explain scientific discovery
in computational terms and automate the processes involved on a computer.
Project DENDRAL (Feigenbaum 1971) demonstrated this by inferring structures
of organic molecules from mass spectra, a problem previously solved only by
experienced chemists.
Two discovery programs, AM and BACON, are illustrated in the next few slides.
• AM (Automated Mathematician)
AM is a heuristic-driven program that discovers concepts in elementary
mathematics and set theory. AM has two inputs :
(a) a description of some concepts of set theory, e.g. union, intersection;
• BACON System
The next few slides show how BACON.1 rediscovers Kepler's third law and
BACON.3 rediscovers the ideal gas law.
■ Apply heuristics 1 to 4 : (Iteration 3)
Try heuristic 1 : not applicable, D2/P is not constant.
Try heuristic 2 : not applicable, D2/P is not linearly related to any other
variable.
Try heuristic 3 : applicable, D2/P decreases as D/P increases, so add the
new variable : (D2/P) × (D/P) = D3/P2
■ Conclusion : D3/P2 is constant.
This is Kepler's third law (which took about 20 years to discover originally!).
■ A limitation of BACON.1 :
It works only for target equations relating at most two variables.
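Below is a rough Python sketch of the BACON.1-style search traced above: keep checking terms for constancy and, when two terms vary in opposite directions, form their product (or a ratio when they vary together). The planetary data values are approximate and included only for illustration.

```python
# Rough BACON.1-style sketch: rediscover that D^3/P^2 is constant.
# D = orbital distance (AU), P = period (years); values are approximate.

D = [0.39, 0.72, 1.00, 1.52]
P = [0.24, 0.61, 1.00, 1.88]

def nearly_constant(values, tol=0.05):
    mean = sum(values) / len(values)
    return max(abs(v - mean) for v in values) <= tol * abs(mean)

def direction(values):
    """Simplification: data here is strictly monotone, so 'inc' or 'dec'."""
    return "inc" if all(a < b for a, b in zip(values, values[1:])) else "dec"

def combine(terms):
    """Make a new term: product of the newest term with an earlier one that
    varies oppositely, otherwise the ratio of two terms that vary together."""
    name_t, t = terms[-1]
    for name_u, u in terms[:-1]:
        if direction(u) != direction(t):
            return f"({name_u})*({name_t})", [x * y for x, y in zip(u, t)]
    name_u, u = terms[0]
    return f"({name_u})/({name_t})", [x / y for x, y in zip(u, t)]

terms = [("D", D), ("P", P)]
for _ in range(5):
    constant = [name for name, values in terms if nearly_constant(values)]
    if constant:
        print("constant term found:", constant[0])   # algebraically D^3/P^2
        break
    terms.append(combine(terms))
```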
BACON.3 :
BACON.3 is a knowledge-based production system that discovers
empirical laws. The main heuristics detect constancies and trends in data,
and lead to the formulation of hypotheses and the definition of theoretical
terms. The program represents information at varying levels of
description. The lowest levels correspond to direct observations, while the
highest correspond to hypotheses that explain everything so far observed.
BACON.3 is built on top of BACON.1.
■ BACON holds some variables constant and tries to notice trends in the data.
■ BACON has also been applied to Kepler's third law, Ohm's law,
conservation of momentum and Joule's law.
• Example :
Rediscovering the ideal gas law pV/nT = 8.32, where p is the pressure on
a gas, n is the number of moles, T is the temperature and V the volume of
the gas. [The step-by-step complete algorithm is not given like previous
example, but the procedure is explained below]
6. Clustering
[Figure: data points scattered in the plane, falling into two visually
separate groups (clusters).]
Let R denote the field of real numbers. The space of all n-tuples of real
numbers forms an n-dimensional vector space over R, denoted by Rn.
An element of Rn is written as X = (x1, x2, . . xi . . , xn), where each xi is a
real number. Similarly, another element is Y = (y1, y2, . . yi . . , yn).
The angle between the vectors X and Y is
θ = cos-1 ( X • Y / ( ||X|| ||Y|| ) )
and the distance between X and Y is
||X – Y|| = ( ∑i=1..n (xi – yi)2 )1/2
■ Euclidean Distance
The Euclidean distance is the ordinary distance between two points that
one would measure with a ruler. The Euclidean distance between two
points P = (p1, p2, . . pi . . , pn) and Q = (q1, q2, . . qi . . , qn) in
Euclidean n-space is defined as :
dist(P, Q) = ( (p1 – q1)2 + (p2 – q2)2 + . . + (pn – qn)2 )1/2
■ Manhattan Distance
The Manhattan distance between the point P1 with coordinates (x1, y1)
and the point P2 with coordinates (x2, y2) is
|x1 – x2| + |y1 – y2|
■ Minkowski Metric
dist(Xi, Xj) = ( |xi1 – xj1|h + |xi2 – xj2|h + . . + |xir – xjr|h )1/h
‡ Euclidean distance IF h = 2 :
dist(Xi, Xj) = ( (xi1 – xj1)2 + (xi2 – xj2)2 + . . + (xir – xjr)2 )1/2
‡ Manhattan distance IF h = 1 :
dist(Xi, Xj) = |xi1 – xj1| + |xi2 – xj2| + . . + |xir – xjr|
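A small Python sketch of these distance functions, assuming simple numeric tuples, is given below.

```python
# Minkowski metric and its two common special cases.

def minkowski(x, y, h):
    """dist(x, y) = ( sum_i |x_i - y_i|^h )^(1/h)"""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)

def euclidean(x, y):          # Minkowski metric with h = 2
    return minkowski(x, y, 2)

def manhattan(x, y):          # Minkowski metric with h = 1
    return minkowski(x, y, 1)

p, q = (1, 1), (4, 3)
print(euclidean(p, q))        # sqrt(13) = 3.605...
print(manhattan(p, q))        # 5.0
```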
Let objects A and B be the first centroids. If P1 and P2 denote the
coordinates of the centroids, then P1 = (1, 1) and P2 = (2, 1).
Let centroid P1 be for cluster group-1 and centroid P2 for cluster group-2.
[Figure: the objects A, B, C, D plotted in the X-Y plane.]
1. Iteration 0
(a) Cluster centers as stated before :
Group-1 has center P1 = (1, 1) ;
Group-2 has center P2 = (2, 1).
Objects and their attributes X and Y :
Object :  A   B   C   D
X      :  1   2   4   5
Y      :  1   1   3   4
(b) Calculate the distances from each cluster center to each object.
1st, calculate the Euclidean distances from centroid P1 to each point A,
B, C, D. This forms the 1st row of the distance matrix.
2nd, calculate the Euclidean distances from centroid P2 to each point A,
B, C, D. This forms the 2nd row of the distance matrix.
For example, two of the distance matrix elements, D13 and D23, are :
D13 = dist(P1, C) = ( (4 – 1)2 + (3 – 1)2 )1/2 = √13 ≈ 3.61
D23 = dist(P2, C) = ( (4 – 2)2 + (3 – 1)2 )1/2 = √8 ≈ 2.83
Similarly calculate the other elements D11, D12, D14, D21, D22, D24.
(c) The distance matrix becomes :
          A      B      C      D
P1 :    0.00   1.00   3.61   5.00    (1st row : distances to the group-1 center)
P2 :    1.00   0.00   2.83   4.24    (2nd row : distances to the group-2 center)
2. Iteration 1 :
The cluster groups have new members. Compute the new centroids of
each group and repeat the iteration process.
Group-1 has one member A, so the centroid remains P1 = (1, 1).
Group-2 now has 3 members B, C, D, so its new centroid is the average
of their coordinates : P2 = ( (2+4+5)/3 , (1+3+4)/3 ) = (11/3, 8/3).
3. Iteration 2 :
The cluster groups have new members. Compute the new centroids of
each group and repeat the iteration process.
Group-1 has 2 members (A and B), so the new centroid is the average of
their coordinates : P1 = ( (1+2)/2 , (1+1)/2 ) = (1.5, 1) ; similarly the new
centroid of group-2 is P2 = ( (4+5)/2 , (3+4)/2 ) = (4.5, 3.5).
The grouping of objects in this last iteration and the one before it does
not change anymore. Thus, the computation of the k-means clustering
has reached stability and no more iterations are needed.
Results of the final grouping are :
Object        X    Y    Group (cluster)
Medicine A    1    1    1
Medicine B    2    1    1
Medicine C    4    3    2
Medicine D    5    4    2
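The whole example can be reproduced with a compact k-means sketch like the one below; the helper names are illustrative, and math.dist (Python 3.8+) supplies the Euclidean distance.

```python
# Compact k-means sketch for the 4-object example above (k = 2).
from math import dist   # Euclidean distance, Python 3.8+

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}

def kmeans(points, centroids):
    while True:
        # assignment step: each object joins the nearest centroid's group
        groups = {i: [] for i in range(len(centroids))}
        for name, p in points.items():
            i = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            groups[i].append(name)
        # update step: each centroid moves to the mean of its group
        # (assumes no group ever becomes empty, true for this data)
        new_centroids = [
            tuple(sum(points[n][d] for n in names) / len(names) for d in (0, 1))
            for names in groups.values()
        ]
        if new_centroids == centroids:      # stable: grouping no longer changes
            return groups, centroids
        centroids = new_centroids

print(kmeans(points, [(1.0, 1.0), (2.0, 1.0)]))
# ({0: ['A', 'B'], 1: ['C', 'D']}, [(1.5, 1.0), (4.5, 3.5)])
```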
7. Analogy
Example: Infer by analogy the hydraulics laws that are similar to Kirchhoff's laws.
[Figure: two water flows Qa = 3 and Qb = 9 merge into a single flow Qc,
analogous to two electric currents I1 and I2 that combine into I3 = I1 + I2;
by analogy, Qc = Qa + Qb = 12.]
The Sun has a greater mass than the Earth and attracts it, causing the Earth to
revolve around the Sun. The nucleus also has a greater mass than the electron
and attracts it. Therefore it is plausible that the electron also revolves around the
nucleus.
8. Neural Net and Genetic Learning
The neural net learning, the genetic learning and the reinforcement
learning paradigms are described in the sections that follow.
■ A Perceptron is a model of a single 'trainable' neuron.
■ Perceptron is a model of a single 'trainable' neuron.
[Figure: a perceptron - inputs X1, X2, X3, . . , Xn with weights
w1, w2, w3, . . , wn feed a single unit with threshold T, which produces
the output Y.]
‡ w1, w2, . . . , wn are weights of the edges and are real valued.
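A minimal perceptron sketch in Python follows. The 0/1 output convention, the learning-rate update rule and the AND training data are assumptions for illustration; the slides do not spell out a training procedure.

```python
# Minimal perceptron: weighted sum of inputs compared against threshold T,
# trained with the classic perceptron update rule.

def predict(x, w, T):
    """Output Y = 1 if the weighted sum of the inputs reaches threshold T."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= T else 0

def train(samples, n_inputs, T=0.5, lr=0.1, epochs=20):
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(x, w, T)
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]   # adjust weights
    return w

# Learn the logical AND of two inputs (illustrative data):
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train(samples, 2)
print([predict(x, w, 0.5) for x, _ in samples])   # [0, 0, 0, 1]
```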
‡ When two organisms mate they share their genes. The resultant
offspring may end up having half the genes from one parent and
half from the other. This process is called crossover.
‡ A gene may be mutated and expressed in the organism as a
completely new trait.
(2) [Fitness] Evaluate the fitness of each chromosome in the population.
(3) [New population] Create a new population by repeating the selection,
crossover and mutation steps until the new population is complete.
(4) [Replace] Use the newly generated population for a further run of the
algorithm.
(5) [Test] If the end condition is satisfied, stop and return the best
solution in the current population; otherwise go back to step (2).
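A tiny genetic-algorithm sketch following these steps is given below; the bit-string chromosomes, tournament selection and the count-of-ones fitness function are illustrative assumptions.

```python
# Tiny genetic algorithm: selection, crossover and mutation on bit strings.
import random

def fitness(chrom):
    return sum(chrom)                          # "goodness" of an individual

def select(pop):
    """Pick the fitter of two random individuals (tournament selection)."""
    return max(random.sample(pop, 2), key=fitness)

def crossover(a, b):
    """Offspring takes the first part of one parent, the rest of the other."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.05):
    """Occasionally flip a gene, expressing a completely new trait."""
    return [1 - g if random.random() < rate else g for g in chrom]

def evolve(pop_size=20, length=16, generations=30):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop = [mutate(crossover(select(pop), select(pop))) for _ in range(pop_size)]
    return max(pop, key=fitness)               # best solution found

print(evolve())
```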
9. Reinforcement Learning
The RL problem and the tasks are illustrated in the next few slides.
• RL Problem : Agent - Environment Interaction
[Figure: the agent observes state st and reward rt, emits action at, and
the environment returns the next reward rt+1 and the next state st+1.]
The agent and the environment interact at a sequence of discrete time
steps, t = 0, 1, 2, 3, 4, . . .
‡ At each discrete time t, the agent (learning system) observes state
st ∈ S and chooses action at ∈ A, drawing on its experience.
‡ The agent's goal is to maximize the total amount of reward it
receives in the long run.
• Key Features of RL
■ The learner is not told what actions to take; instead it finds out
which actions yield the most reward by trying them.
‡ Policy
‡ Reward
‡ Value Functions
Each of these elements is explained in the next few slides. However, first
the stochastic process, the Markov chain, and a few notations are explained.
■ Stochastic processes
A physical stochastic process is a sequence of events or paths
governed by probabilistic laws.
A stochastic process X = { X(t), t ∈ T } is a collection of random
variables. For each t in the index set T, the X(t) is a random
variable. The t is often interpreted as time and X(t) the state of
the process at time t.
If the index set T is a countable set,
then we have a discrete-time stochastic process;
Else if T is a non-countable continuous set,
then we have a continuous-time stochastic process.
Any realization of X is named a sample path, which can be
discrete or continuous.
■ Markov chain
A Markov chain is a stochastic process with the Markov property.
The Markov property means that, given the present state, the future states
are independent of the past states; i.e., the future states depend
only on the present state, and not on any past states.
In other words, the description of the present state fully captures all
the information that could influence the future evolution of the
process. Future states will be reached through a probabilistic process
instead of a deterministic one.
At each step the system may change its state from the current
state to another state, or remain in the same state, according to a
certain probability distribution. The changes in state are called
transitions, and the probabilities associated with various state-changes
are called transition probabilities.
■ Notations followed :
Rt(n)  -  the n-step return
Rt     -  the return
• Markov System
[Figure: a state-transition diagram - circles represent the states and the
arrows between them are labelled with transition probabilities
(0.8, 0.2, 0.4, 0.6, 0.5, 0.3, 0.1).]
■ Transition matrix :
The matrix P whose ij-th entry is pij is called the transition matrix
associated with the system. The entries in each row add up to 1.
Example : a 2 x 2 transition matrix P
                  To state 1    To state 2
From state 1         P11           P12        (arrows originating in state 1)
From state 2         P21           P22        (arrows originating in state 2)
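A short Python sketch that simulates such a Markov system from its transition matrix is shown below; the probability values are illustrative.

```python
# Simulate a 2-state Markov system from its transition matrix.
import random

P = [[0.8, 0.2],     # from state 1: P11, P12
     [0.6, 0.4]]     # from state 2: P21, P22

assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)   # each row adds up to 1

def step(state):
    """Move to the next state according to the current row of P."""
    return random.choices([0, 1], weights=P[state])[0]

def simulate(state=0, steps=10):
    path = [state]
    for _ in range(steps):
        state = step(state)
        path.append(state)
    return [s + 1 for s in path]               # report the states as 1 and 2

print(simulate())
```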
• Markov Decision Process (MDP)
Description :
An MDP is a discrete-time stochastic control process characterized by a
set of states. In each state there are several actions from which the
decision maker must choose. The state transitions possess the Markov
property, i.e., given the state of the MDP at time t, the transition
probabilities to the state at time t+1 are independent of all previous
states and actions.
MDPs are an extension of Markov chains; the difference is the addition
of actions (allowing choice) and rewards (giving motivation).
The MDP is characterized by :
‡ a set of states S, and
‡ the transition probabilities
Pass′ = P ( st+1 = s′ | st = s, at = a )
A policy π is a mapping
π : S → A with π(s) = a
giving the action chosen in state s.
• Policy
It defines the learning agent's way of behaving at a given time :
a mapping from states to actions (deterministic policy), or a
distribution over actions (stochastic policy).
• Reward Function
The function defines the goal in a reinforcement learning problem.
■ Tells what are the good and bad events for the agent;
• Maximize Reward
The agent's goal is to maximize the reward it receives in the long run.
The "agent - environment" interaction is generally a sequence of episodes.
If the interaction breaks into a sequence of separate episodes, the tasks
are called episodic tasks. If the interaction does not break into
identifiable episodes but continues without limit, the tasks are called
continuing tasks.
• Value Functions
The value of state s under policy π is the expected return when starting
from s and following π thereafter :
Vπ(s) = Eπ { Rt | st = s } = Eπ { ∑k=1..∞ γ^(k-1) • rt+k | st = s }
where γ (0 ≤ γ ≤ 1) is the discount factor.
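The sketch below estimates Vπ(s) by averaging sampled discounted returns on a tiny two-state Markov system; the states, rewards, transition probabilities and discount factor are all assumptions chosen only to illustrate the formula.

```python
# Monte-Carlo estimate of V(s) = E{ sum_k gamma^(k-1) * r_(t+k) | s_t = s }.
import random

P = {0: [0.9, 0.1], 1: [0.0, 1.0]}   # state 1 mostly persists; state 2 absorbing
R = {0: 1.0, 1: 0.0}                 # reward received while in each state

def rollout(s, gamma=0.9, horizon=100):
    """One sampled discounted return starting from state s."""
    ret, discount = 0.0, 1.0
    for _ in range(horizon):
        ret += discount * R[s]
        discount *= gamma
        s = random.choices([0, 1], weights=P[s])[0]
    return ret

def value(s, episodes=5000):
    return sum(rollout(s) for _ in range(episodes)) / episodes

print(round(value(0), 2), round(value(1), 2))   # about 5.26 and 0.0
```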
10. References : Textbooks
1. "Artificial Intelligence", Elaine Rich and Kevin Knight, (2006), McGraw-Hill
Companies Inc., Chapter 17, pages 447-484.