15 Wk7 Decision Tree ILP
FOL Syntax
FOL Rules
(Copi)
1. Courses are either tough or boring.
2. Not all courses are boring.
3. Therefore there are tough courses.
(Cx, Tx, Bx)
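A translation into FOL, reading Cx as "x is a course", Tx as "x is tough", and Bx as "x is boring" (these readings are assumed; the slide gives only the predicate symbols):

$$1.\ \forall x\, (Cx \rightarrow (Tx \lor Bx))$$
$$2.\ \lnot \forall x\, (Cx \rightarrow Bx) \;\equiv\; \exists x\, (Cx \land \lnot Bx)$$
$$\therefore\ \exists x\, (Cx \land Tx)$$

The argument is valid: by 2 there is a course c with ¬Bc; by 1, Tc ∨ Bc, so Tc, which gives the conclusion.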
Dealing with Time
Readings: 8.1 to 8.4; 9.1, 9.2 (skip 9.2.3 to 9.4); 9.5 resolution (skip 9.5.4 to 9.5.5)
Problems from Nilsson (p. 82)
Learning Logical Rules
Decision Trees
• Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples
• $\mathrm{Gain}(Patrons) = B\!\left(\tfrac{6}{12}\right) - \left[\tfrac{2}{12}B\!\left(\tfrac{0}{2}\right) + \tfrac{4}{12}B\!\left(\tfrac{4}{4}\right) + \tfrac{6}{12}B\!\left(\tfrac{2}{6}\right)\right] \approx 0.541$ bits
• $\mathrm{Gain}(Type) = 1 - \left[1 \cdot B\!\left(\tfrac{1}{2}\right)\right] = 0$ bits. Information Gain is higher for Patrons.
Decision trees
• One possible representation for hypotheses
• E.g., the “true” tree for deciding whether to wait (tree figure omitted)
Information gain
• A chosen attribute A divides the training set E into subsets
E1, … , Ev according to their values for A, where A has v
distinct values.
$$\mathrm{remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, B\!\left(\frac{p_i}{p_i + n_i}\right)$$
• Information Gain (IG) or reduction in entropy from the
attribute test:
$$IG(A) = B\!\left(\frac{p}{p + n}\right) - \mathrm{remainder}(A)$$
• Choose the attribute with the largest IG
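To make these definitions concrete, here is a minimal Python sketch of B, remainder, and IG; the helper names (b_entropy, remainder, gain) are my own, and the counts in the final calls come from the restaurant example above:

    import math

    def b_entropy(q):
        # Entropy B(q) of a Boolean variable that is true with probability q
        if q <= 0.0 or q >= 1.0:
            return 0.0
        return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

    def remainder(subsets):
        # subsets: list of (p_i, n_i) positive/negative counts, one per attribute value
        p = sum(pi for pi, ni in subsets)
        n = sum(ni for pi, ni in subsets)
        return sum((pi + ni) / (p + n) * b_entropy(pi / (pi + ni))
                   for pi, ni in subsets)

    def gain(p, n, subsets):
        # IG(A) = B(p / (p + n)) - remainder(A)
        return b_entropy(p / (p + n)) - remainder(subsets)

    # Restaurant example: 6 positive and 6 negative examples overall
    print(gain(6, 6, [(0, 2), (4, 0), (2, 4)]))          # Patrons: ~0.541 bits
    print(gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]))  # Type: 0.0 bits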
Decision tree learning
• Aim: find a small tree consistent with the training examples
• Idea: (recursively) choose "most significant" attribute as root of
(sub)tree
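A sketch of that recursive idea, as a simplified ID3-style learner (all names here are illustrative; importance reuses the gain helper from the previous sketch, and examples are (attribute-dict, Boolean label) pairs):

    from collections import Counter

    def plurality_label(examples):
        # Most common label among (x, y) pairs
        return Counter(y for _, y in examples).most_common(1)[0][0]

    def importance(a, examples):
        # Information gain of attribute a, using gain() from the sketch above
        p = sum(1 for _, y in examples if y)
        n = len(examples) - p
        subsets = [(sum(1 for x, y in examples if x[a] == v and y),
                    sum(1 for x, y in examples if x[a] == v and not y))
                   for v in {x[a] for x, _ in examples}]
        return gain(p, n, subsets)

    def dtl(examples, attributes, parent_examples=()):
        # Recursively choose the most significant attribute as the (sub)tree root
        if not examples:
            return plurality_label(parent_examples)
        if len({y for _, y in examples}) == 1:
            return examples[0][1]               # all labels agree
        if not attributes:
            return plurality_label(examples)
        a = max(attributes, key=lambda attr: importance(attr, examples))
        tree = {a: {}}
        for v in {x[a] for x, _ in examples}:
            subset = [(x, y) for x, y in examples if x[a] == v]
            tree[a][v] = dtl(subset, [b for b in attributes if b != a], examples)
        return tree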
Example contd.
• Decision tree learned from the 12 examples:
Logic Programming
• example of a logic program:
parent_of(charles,george).
parent_of(george,diana).
parent_of(bob,harry).
parent_of(harry,elizabeth).
grandparent_of(X,Y) :- parent_of(X,Z),
parent_of(Z,Y).
• From the program, we can ask queries about
grandparents.
• Query: grandparent_of(X,Y)?
• Answers:
• grandparent_of(charles,diana).
• grandparent_of(bob,elizabeth).
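For intuition, the same inference in a minimal Python sketch that joins the parent facts (the tuple representation is my own; a real Prolog engine answers the query by unification and backtracking):

    # Background facts: parent_of(X, Y) pairs from the program above
    parent_of = {
        ("charles", "george"),
        ("george", "diana"),
        ("bob", "harry"),
        ("harry", "elizabeth"),
    }

    # grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).
    grandparent_of = {(x, y)
                      for (x, z1) in parent_of
                      for (z2, y) in parent_of
                      if z1 == z2}

    print(sorted(grandparent_of))
    # [('bob', 'elizabeth'), ('charles', 'diana')]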
Forms of Reasoning
• Deduction: From causes to effect (Prediction)
– fact a, rule a => b
INFER b (first-order logic)
Given:
Background knowledge (of the domain): facts
Examples (of the relation to be learned): facts
Try to learn:
Theories (as a result of learning): rules
• Generate hypothesis H:
– grandparent_of(X,Y) :- parent_of(X,Z),
parent_of(Z,Y).
Rule Learning (Intuition)
• How to come up with a rule for grandparent_of(X,Y)?
1. Take the example grandparent_of(bob,elizabeth).
2. Find the subset of background knowledge relevant to this
example: parent_of(bob,harry),
parent_of(harry,elizabeth).
3. Form a rule from these facts
grandparent_of(bob,elizabeth) :-
parent_of(bob,harry), parent_of(harry,elizabeth).
4. Generalize the rule (see the sketch after this list)
grandparent_of(X,Y) :- parent_of(X,Z), parent_of(Z,Y).
5. Check if this rule is valid w.r.t. the positive and the negative examples
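A minimal Python sketch of steps 3-4, turning the ground clause into a general rule by replacing each distinct constant with a fresh variable (the clause representation and helper names are my own, not Progol's actual data structures):

    def generalize(head, body):
        # Replace each distinct constant with a fresh variable (step 4).
        # A literal is a (predicate, args) tuple, e.g. ("parent_of", ("bob", "harry")).
        var_for = {}              # constant -> variable name
        names = iter("XYZUVW")

        def lift(term):
            if term not in var_for:
                var_for[term] = next(names)
            return var_for[term]

        lift_lit = lambda lit: (lit[0], tuple(lift(t) for t in lit[1]))
        return lift_lit(head), [lift_lit(lit) for lit in body]

    # Step 3: the ground rule formed from the example and the relevant facts
    head = ("grandparent_of", ("bob", "elizabeth"))
    body = [("parent_of", ("bob", "harry")),
            ("parent_of", ("harry", "elizabeth"))]

    print(generalize(head, body))
    # (('grandparent_of', ('X', 'Y')),
    #  [('parent_of', ('X', 'Z')), ('parent_of', ('Z', 'Y'))])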
Progol Algorithm Outline
Decision step
For given x, determine optimal t.
Minimum Expected Loss
Example: classify medical images as ‘cancer’ or ‘normal’
Loss matrix L (rows: truth, columns: decision) [matrix values omitted]
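In the standard formulation, the decision regions $\mathcal{R}_j$ are chosen to minimize the expected loss

$$\mathbb{E}[L] = \sum_{k}\sum_{j} \int_{\mathcal{R}_j} L_{kj}\, p(\mathbf{x}, C_k)\, d\mathbf{x}$$

which is achieved by assigning each $\mathbf{x}$ to the class $j$ that minimizes $\sum_k L_{kj}\, p(C_k \mid \mathbf{x})$.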
Minimum Expected Loss
Decision step
For given x, make optimal
prediction, y(x), for t.
Loss function:
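In the standard formulation, with $L(t, y(\mathbf{x}))$ the penalty for predicting $y(\mathbf{x})$ when the true target is $t$, the quantity to minimize is the expected loss

$$\mathbb{E}[L] = \iint L(t, y(\mathbf{x}))\, p(\mathbf{x}, t)\, d\mathbf{x}\, dt$$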
The Squared Loss Function
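For the squared loss, $L(t, y(\mathbf{x})) = (y(\mathbf{x}) - t)^2$, the expected loss becomes

$$\mathbb{E}[L] = \iint (y(\mathbf{x}) - t)^2\, p(\mathbf{x}, t)\, d\mathbf{x}\, dt$$

and the standard result is that the minimizing prediction is the conditional mean, $y(\mathbf{x}) = \mathbb{E}[t \mid \mathbf{x}]$.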
Performance measurement
• How do we know that h ≈ f ?
1. Use theorems of computational/statistical learning theory
2. Try h on a new test set of examples
(use same distribution over example space as training set)
Learning curve = % correct on test set as a function of training set size
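A minimal Python sketch of how such a learning curve is produced (the train function and data are stand-ins; any learner that returns a hypothesis h with h(x) giving a predicted label would do):

    import random

    def learning_curve(examples, train, test_set, sizes, trials=10):
        # % correct on a fixed test set as a function of training-set size.
        # train: maps a list of (x, y) pairs to a hypothesis h.
        curve = []
        for m in sizes:
            accs = []
            for _ in range(trials):          # average over random draws of m examples
                random.shuffle(examples)
                h = train(examples[:m])
                accs.append(sum(h(x) == y for x, y in test_set) / len(test_set))
            curve.append((m, 100.0 * sum(accs) / trials))
        return curve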