BookSlides 6A Probability-Based Learning PDF
1 Big Idea
2 Fundamentals
  Bayes’ Theorem
  Bayesian Prediction
  Conditional Independence and Factorization
3 Standard Approach: The Naive Bayes’ Classifier
4 Summary
Big Idea
Figure: A game of find the lady: (a) the cards dealt face down on a table; and (b) the initial likelihoods of the queen ending up in each position.
Figure: A game of find the lady: (a) the cards dealt face down on a table; and (b) a revised set of likelihoods for the position of the queen based on evidence collected.
Figure: A game of find the lady: (a) the set of cards after the wind blows over the one on the right; and (b) the revised likelihoods for the position of the queen based on this new evidence.
Figure: A game of find the lady: the final positions of the cards in the game.
We can use estimates of likelihoods to determine the most likely prediction that should be made. More importantly, we revise these predictions as we collect data and whenever extra evidence becomes available.
Fundamentals
Bayes’ Theorem
P(X | Y) = (P(Y | X) × P(X)) / P(Y)
Example
After a yearly checkup, a doctor informs their patient that there is both bad news and good news. The bad news is that the patient has tested positive for a serious disease and that the test the doctor used is 99% accurate (i.e., the probability of testing positive when a patient has the disease is 0.99, as is the probability of testing negative when a patient does not have the disease). The good news, however, is that the disease is extremely rare, striking only 1 in 10,000 people.
Applying Bayes’ Theorem, with d for having the disease and t for testing positive:

P(d | t) = (P(t | d) × P(d)) / P(t)

where, by the total probability rule, P(t) = P(t | d) × P(d) + P(t | ¬d) × P(¬d) = 0.99 × 0.0001 + 0.01 × 0.9999 ≈ 0.0101, so

P(d | t) = (0.99 × 0.0001) / 0.0101 = 0.0098
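As a quick check, here is a minimal Python sketch of this calculation (the variable names are illustrative, not from the slides); it computes P(t) by the total probability rule and then applies Bayes’ Theorem:

    # Diagnostic test example
    p_d = 0.0001            # prior: the disease strikes 1 in 10,000 people
    p_t_given_d = 0.99      # P(test positive | disease)
    p_t_given_not_d = 0.01  # P(test positive | no disease), since the test is 99% accurate

    # Total probability: P(t) = P(t|d)P(d) + P(t|not d)P(not d)
    p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

    # Bayes' Theorem: P(d|t) = P(t|d)P(d) / P(t)
    p_d_given_t = p_t_given_d * p_d / p_t

    print(round(p_t, 4))          # ~0.0101
    print(round(p_d_given_t, 4))  # ~0.0098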
Bayes’ Theorem follows from the product rule. Dividing both sides of P(X | Y) × P(Y) = P(Y | X) × P(X) by P(Y):

(P(X | Y) × P(Y)) / P(Y) = (P(Y | X) × P(X)) / P(Y)
⇒ P(X | Y) = (P(Y | X) × P(X)) / P(Y)
Like any probability, the result of Bayes’ Theorem satisfies:

0 ≤ P(X | Y) ≤ 1
Σ_i P(X_i | Y) = 1.0

where the X_i are the possible outcomes of X.
Bayesian Prediction
Chain Rule
P(q[1], ..., q[m]) = P(q[1]) × P(q[2] | q[1]) × ... × P(q[m] | q[m-1], ..., q[2], q[1])
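The chain rule can be verified numerically. Below is a small Python sketch on a hypothetical joint distribution over three binary events (the probabilities are invented purely for illustration): the product of the chained conditionals recovers the joint probability.

    from itertools import product

    # A hypothetical joint distribution over three binary events A, B, C.
    # The numbers are illustrative; they only need to sum to 1.
    joint = {outcome: 1 / 8 for outcome in product([True, False], repeat=3)}
    joint[(True, True, True)] = 0.2
    joint[(False, False, False)] = 0.05
    total = sum(joint.values())
    joint = {k: v / total for k, v in joint.items()}  # renormalise to sum to 1

    def prob(partial):
        """Probability of all outcomes consistent with the partial assignment."""
        return sum(p for outcome, p in joint.items()
                   if all(outcome[i] == v for i, v in partial.items()))

    # Chain rule: P(a, b, c) = P(a) * P(b | a) * P(c | b, a)
    a, b, c = True, True, False
    chained = (prob({0: a})
               * prob({0: a, 1: b}) / prob({0: a})
               * prob({0: a, 1: b, 2: c}) / prob({0: a, 1: b}))
    print(abs(chained - joint[(a, b, c)]) < 1e-12)  # True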
For a diagnosis example with evidence h (headache), ¬f (no fever), and v (vomiting) and target M (meningitis), the query is: P(M | h, ¬f, v) = ?
P(m) = |{d5, d8, d10}| / |{d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}| = 3/10 = 0.3
P(h, ¬f, v) = |{d3, d4, d6, d7, d8, d10}| / |{d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}| = 6/10 = 0.6
P(h, ¬f, v | m) = ?
P(m | h, ¬f, v) = 0.3333
P(¬m | h, ¬f, v) = 0.6667
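The same numbers can be reproduced with a short Python sketch that works directly from the instance sets listed above; the likelihood P(h, ¬f, v | m) comes from the overlap between the two sets ({d8, d10}).

    all_ids = {f"d{i}" for i in range(1, 11)}
    meningitis = {"d5", "d8", "d10"}                   # instances where m holds
    evidence = {"d3", "d4", "d6", "d7", "d8", "d10"}   # instances where h, not f, v hold

    p_m = len(meningitis) / len(all_ids)                        # 0.3
    p_e = len(evidence) / len(all_ids)                          # 0.6
    p_e_given_m = len(meningitis & evidence) / len(meningitis)  # 2/3

    # Bayes' Theorem: P(m | h, not f, v) = P(h, not f, v | m) * P(m) / P(h, not f, v)
    p_m_given_e = p_e_given_m * p_m / p_e
    print(round(p_m_given_e, 4))      # 0.3333
    print(round(1 - p_m_given_e, 4))  # 0.6667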
P(m | h, f, ¬v) = ?
P(¬m | h, f, ¬v) = ?
P(m | h, f, ¬v) = (P(h | m) × P(f | h, m) × P(¬v | f, h, m) × P(m)) / P(h, f, ¬v)
               = (0.6666 × 0 × 0 × 0.3) / 0.1
               = 0
P(¬m | h, f, ¬v) = (P(h | ¬m) × P(f | h, ¬m) × P(¬v | f, h, ¬m) × P(¬m)) / P(h, f, ¬v)
                = (0.7143 × 0.2 × 1.0 × 0.7) / 0.1
                = 1.0
P(m | h, f, ¬v) = 0
P(¬m | h, f, ¬v) = 1.0
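A Python sketch that plugs in the factor values shown above makes the problem concrete: a single zero-valued factor in the chain-rule expansion drives the whole posterior to zero, no matter what the other factors say.

    p_evidence = 0.1         # P(h, f, not v)

    # Factors for the m case, as given above
    p_h_given_m = 0.6666
    p_f_given_hm = 0.0       # zero: this combination never occurs in the training data
    p_nv_given_fhm = 0.0
    p_m = 0.3

    # Factors for the not-m case
    p_h_given_nm = 0.7143
    p_f_given_hnm = 0.2
    p_nv_given_fhnm = 1.0
    p_nm = 0.7

    p_m_given_e = p_h_given_m * p_f_given_hm * p_nv_given_fhm * p_m / p_evidence
    p_nm_given_e = p_h_given_nm * p_f_given_hnm * p_nv_given_fhnm * p_nm / p_evidence
    print(p_m_given_e)             # 0.0
    print(round(p_nm_given_e, 2))  # 1.0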
Curse of Dimensionality
As the number of descriptive features grows, the number of potential conditioning events grows exponentially. Consequently, the size of the dataset must grow exponentially as each new descriptive feature is added to ensure that, for any conditional probability, there are enough instances in the training dataset matching the conditions for the resulting probability estimate to be reasonable.
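A quick Python illustration of the growth (assuming, for simplicity, that every descriptive feature is binary): the number of distinct conditioning events doubles with every feature added, so the data needed to cover them grows exponentially.

    # With m binary descriptive features there are 2**m distinct combinations
    # of feature values that a probability can be conditioned on.
    for m in (1, 5, 10, 20, 30):
        print(m, 2 ** m)   # 2, 32, 1024, 1048576, 1073741824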
If two events X and Y are independent, then:

P(X | Y) = P(X)
P(X, Y) = P(X) × P(Y)

Recall that when two events are dependent, these rules are:

P(X | Y) = P(X, Y) / P(Y)
P(X, Y) = P(X | Y) × P(Y) = P(Y | X) × P(X)
Conditional Independence and Factorization
If X is conditionally independent of Y given knowledge of a third event Z, then:

P(X | Y, Z) = P(X | Z)
P(X, Y | Z) = P(X | Z) × P(Y | Z)

X and Y are dependent:
P(X | Y) = P(X, Y) / P(Y)
P(X, Y) = P(X | Y) × P(Y) = P(Y | X) × P(X)

X and Y are independent:
P(X | Y) = P(X)
P(X, Y) = P(X) × P(Y)
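A numeric sanity check in Python on a hypothetical distribution (the numbers are invented for illustration): Z influences both X and Y, so X and Y are dependent in the joint, yet once we condition on Z the equalities above hold.

    from itertools import product

    # Hypothetical generative model: Z causes both X and Y, and
    # X and Y are conditionally independent given Z.
    p_z = {1: 0.5, 0: 0.5}
    p_x_given_z = {1: 0.9, 0: 0.2}   # P(X=1 | Z=z)
    p_y_given_z = {1: 0.8, 0: 0.3}   # P(Y=1 | Z=z)

    def bern(p_true, value):
        return p_true if value == 1 else 1 - p_true

    joint = {(x, y, z): p_z[z] * bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y)
             for x, y, z in product([0, 1], repeat=3)}

    def prob(x=None, y=None, z=None):
        return sum(p for (xx, yy, zz), p in joint.items()
                   if (x is None or xx == x)
                   and (y is None or yy == y)
                   and (z is None or zz == z))

    # Marginally, X and Y are dependent: P(X, Y) != P(X) * P(Y)
    print(round(prob(x=1, y=1), 4), round(prob(x=1) * prob(y=1), 4))  # 0.39 vs 0.3025

    # Conditioned on Z, they are independent: P(X | Y, Z) == P(X | Z)
    print(round(prob(x=1, y=1, z=1) / prob(y=1, z=1), 4),
          round(prob(x=1, z=1) / prob(z=1), 4))                       # 0.9 and 0.9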
If the descriptive features q[1], ..., q[m] are conditionally independent of one another given the target level t = l, the chain rule factorizes as:

P(q[1], ..., q[m] | t = l)
  = P(q[1] | t = l) × P(q[2] | t = l) × ... × P(q[m] | t = l)
  = ∏_{i=1}^{m} P(q[i] | t = l)
Applying Bayes’ Theorem with this factorization gives the naive Bayes’ prediction:

P(t = l | q[1], ..., q[m])
  = (∏_{i=1}^{m} P(q[i] | t = l)) × P(t = l) / P(q[1], ..., q[m])
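A minimal sketch of this scoring rule in Python (the function and argument names are illustrative): since the denominator P(q[1], ..., q[m]) is the same for every candidate level l, the numerators are computed and then normalised.

    def naive_bayes_posteriors(priors, conditionals, query):
        """priors:       {level: P(t = l)}
        conditionals: {level: {feature: {value: P(feature = value | t = l)}}}
        query:        {feature: value}
        Returns normalised posteriors P(t = l | query) under the
        conditional independence assumption."""
        scores = {}
        for level, prior in priors.items():
            score = prior
            for feature, value in query.items():
                score *= conditionals[level][feature].get(value, 0.0)
            scores[level] = score
        total = sum(scores.values())
        # Normalising plays the role of dividing by P(q[1], ..., q[m]).
        return {level: s / total for level, s in scores.items()} if total else scores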
Using this factorization, the earlier query now receives a non-zero probability for both outcomes:
P(m | h, f, ¬v) = 0.1948
P(¬m | h, f, ¬v) = 0.8052
A Worked Example
ID   CREDIT HISTORY   GUARANTOR/COAPPLICANT   ACCOMMODATION   FRAUD
1    current          none                    own             true
2    paid             none                    own             false
3    paid             none                    own             false
4    paid             guarantor               rent            true
5    arrears          none                    own             false
6    arrears          none                    own             true
7    current          none                    own             false
8    arrears          none                    own             false
9    current          none                    rent            false
10   none             none                    own             true
11   current          coapplicant             own             false
12   current          none                    own             true
13   current          none                    rent            true
14   paid             none                    own             false
15   arrears          none                    own             false
16   current          none                    own             false
17   arrears          coapplicant             rent            false
18   arrears          none                    free            false
19   arrears          none                    own             false
20   paid             none                    own             false
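As a sketch of how the naive Bayes’ probabilities are estimated from this table, the Python below computes the priors and per-feature conditionals by counting rows and then scores a hypothetical query (the query values CH=paid, GC=none, ACC=rent are chosen here purely for illustration):

    from collections import Counter, defaultdict

    # Rows of the table above: (CREDIT HISTORY, GUARANTOR/COAPPLICANT, ACCOMMODATION, FRAUD)
    data = [
        ("current", "none", "own", True), ("paid", "none", "own", False),
        ("paid", "none", "own", False), ("paid", "guarantor", "rent", True),
        ("arrears", "none", "own", False), ("arrears", "none", "own", True),
        ("current", "none", "own", False), ("arrears", "none", "own", False),
        ("current", "none", "rent", False), ("none", "none", "own", True),
        ("current", "coapplicant", "own", False), ("current", "none", "own", True),
        ("current", "none", "rent", True), ("paid", "none", "own", False),
        ("arrears", "none", "own", False), ("current", "none", "own", False),
        ("arrears", "coapplicant", "rent", False), ("arrears", "none", "free", False),
        ("arrears", "none", "own", False), ("paid", "none", "own", False),
    ]
    features = ("CH", "GC", "ACC")

    # Priors P(FRAUD = l) and conditional counts for P(feature = value | FRAUD = l)
    class_counts = Counter(row[-1] for row in data)
    priors = {label: n / len(data) for label, n in class_counts.items()}
    cond = {label: defaultdict(Counter) for label in class_counts}
    for *values, label in data:
        for feature, value in zip(features, values):
            cond[label][feature][value] += 1

    def p(feature, value, label):
        return cond[label][feature][value] / class_counts[label]

    # Hypothetical query, for illustration only
    query = {"CH": "paid", "GC": "none", "ACC": "rent"}
    scores = {label: priors[label] for label in priors}
    for label in scores:
        for feature, value in query.items():
            scores[label] *= p(feature, value, label)
    total = sum(scores.values())
    for label, score in scores.items():
        print(label, round(score / total, 4))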
Summary
P(t | d) = (P(d | t) × P(t)) / P(d)