BookSlides 6A: Probability-Based Learning


Fundamentals of Machine Learning for Predictive Data Analytics
Chapter 6: Probability-based Learning
Sections 6.1, 6.2, 6.3

John Kelleher and Brian Mac Namee and Aoife D’Arcy

john.d.kelleher@dit.ie brian.macnamee@ucd.ie aoife@theanalyticsstore.com



1 Big Idea

2 Fundamentals
Bayes’ Theorem
Bayesian Prediction
Conditional Independence and Factorization

3 Standard Approach: The Naive Bayes’ Classifier


A Worked Example

4 Summary

Big Idea
Figure: A game of find the lady: (a) the cards dealt face down on a table; and (b) the initial likelihoods of the queen ending up in each position (Left, Center, Right).

Figure: A game of find the lady: (a) the cards dealt face down on a table; and (b) a revised set of likelihoods for the position of the queen based on evidence collected.

Figure: A game of find the lady: (a) the set of cards after the wind blows over the one on the right; and (b) the revised likelihoods for the position of the queen based on this new evidence.

Figure: A game of find the lady: the final positions of the cards in the game.

We can use estimates of likelihoods to determine the most likely prediction that should be made.
More importantly, we can revise these predictions as we collect data and whenever extra evidence becomes available.

Fundamentals

Table: A simple dataset for MENINGITIS diagnosis with descriptive features that describe the presence or absence of three common symptoms of the disease: HEADACHE, FEVER, and VOMITING.

ID  HEADACHE  FEVER  VOMITING  MENINGITIS
1   true      true   false     false
2   false     true   false     false
3   true      false  true      false
4   true      false  true      false
5   false     true   false     true
6   true      false  true      false
7   true      false  true      false
8   true      false  true      true
9   false     true   false     false
10  true      false  true      true

A probability function, P(), returns the probability of a feature taking a specific value.
A joint probability refers to the probability of an assignment of specific values to multiple different features.
A conditional probability refers to the probability of one feature taking a specific value given that we already know the value of a different feature.
A probability distribution is a data structure that describes the probability of each possible value a feature can take. The sum of a probability distribution must equal 1.0.

A joint probability distribution is a probability distribution over more than one feature assignment and is written as a multi-dimensional matrix in which each cell lists the probability of a particular combination of feature values being assigned.
The sum of all the cells in a joint probability distribution must be 1.0.

P(H, F, V, M) =
  [ P(h, f, v, m),      P(¬h, f, v, m)
    P(h, f, v, ¬m),     P(¬h, f, v, ¬m)
    P(h, f, ¬v, m),     P(¬h, f, ¬v, m)
    P(h, f, ¬v, ¬m),    P(¬h, f, ¬v, ¬m)
    P(h, ¬f, v, m),     P(¬h, ¬f, v, m)
    P(h, ¬f, v, ¬m),    P(¬h, ¬f, v, ¬m)
    P(h, ¬f, ¬v, m),    P(¬h, ¬f, ¬v, m)
    P(h, ¬f, ¬v, ¬m),   P(¬h, ¬f, ¬v, ¬m) ]

Given a joint probability distribution, we can compute the probability of any event in the domain that it covers by summing over the cells in the distribution where that event is true.
Calculating probabilities in this way is known as summing out.
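As an illustration, here is a minimal Python sketch of summing out over a joint distribution for four binary features; the uniform cell probabilities are placeholders for illustration only, not values estimated from any dataset.

    from itertools import product

    # Illustrative joint distribution over (H, F, V, M): each of the
    # 16 cells gets a placeholder probability of 1/16.
    joint = {cell: 1 / 16 for cell in product([True, False], repeat=4)}

    # Summing out: P(m) is the total mass of the cells where M is true.
    p_m = sum(p for (h, f, v, m), p in joint.items() if m)
    print(p_m)  # 0.5 under the uniform placeholder distribution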

Bayes’ Theorem

P(X|Y) = P(Y|X) P(X) / P(Y)

Example
After a yearly checkup, a doctor informs their patient that there is both bad news and good news. The bad news is that the patient has tested positive for a serious disease and that the test the doctor has used is 99% accurate (i.e., the probability of testing positive when a patient has the disease is 0.99, as is the probability of testing negative when a patient does not have the disease). The good news, however, is that the disease is extremely rare, striking only 1 in 10,000 people.

What is the actual probability that the patient has the disease?
Why is the rarity of the disease good news, given that the patient has tested positive for it?

P(d|t) = P(t|d) P(d) / P(t)

P(t) = P(t|d) P(d) + P(t|¬d) P(¬d)
     = (0.99 × 0.0001) + (0.01 × 0.9999) = 0.0101

P(d|t) = (0.99 × 0.0001) / 0.0101 = 0.0098
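The same arithmetic can be checked with a few lines of Python (the variable names are ours):

    # Rare-disease example: 99% accurate test, 1-in-10,000 prior.
    p_d = 0.0001            # P(d): prior probability of the disease
    p_t_given_d = 0.99      # P(t|d): probability of a positive test if diseased
    p_t_given_not_d = 0.01  # P(t|¬d): probability of a false positive

    # Theorem of Total Probability for the evidence P(t)
    p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

    # Bayes' Theorem
    p_d_given_t = p_t_given_d * p_d / p_t
    print(round(p_t, 4), round(p_d_given_t, 4))  # 0.0101 0.0098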

Deriving Bayes' Theorem:

P(Y|X) P(X) = P(X|Y) P(Y)

Dividing both sides by P(Y):

P(X|Y) P(Y) / P(Y) = P(Y|X) P(X) / P(Y)

The P(Y) terms on the left cancel, giving:

⇒ P(X|Y) = P(Y|X) P(X) / P(Y)

The divisor is the prior probability of the evidence.
This division functions as a normalization constant:

0 ≤ P(X|Y) ≤ 1
Σ_i P(X_i|Y) = 1.0

We can calculate this divisor directly from the dataset:

P(Y) = |{rows where Y is the case}| / |{rows in the dataset}|

Or, we can use the Theorem of Total Probability to calculate this divisor:

P(Y) = Σ_i P(Y|X_i) P(X_i)

Bayesian Prediction

Generalized Bayes' Theorem

P(t = l | q[1], …, q[m]) = P(q[1], …, q[m] | t = l) × P(t = l) / P(q[1], …, q[m])

Chain Rule

P(q[1], …, q[m]) = P(q[1]) × P(q[2]|q[1]) × ⋯ × P(q[m]|q[m−1], …, q[2], q[1])

To apply the chain rule to a conditional probability we just add the conditioning term to each term in the expression:

P(q[1], …, q[m] | t = l) = P(q[1]|t = l) × P(q[2]|q[1], t = l) × ⋯ × P(q[m]|q[m−1], …, q[2], q[1], t = l)
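A quick numeric sanity check of the chain rule, using an arbitrary made-up joint distribution over three binary events (nothing here comes from the meningitis dataset):

    from itertools import product

    # Arbitrary joint distribution over three binary events A, B, C.
    weights = dict(zip(product([0, 1], repeat=3), range(1, 9)))
    total = sum(weights.values())
    P = {cell: w / total for cell, w in weights.items()}

    def prob(pred):
        """Probability of an arbitrary event, by summing matching cells."""
        return sum(p for cell, p in P.items() if pred(cell))

    a, b, c = 1, 1, 1
    lhs = P[(a, b, c)]                                  # P(a, b, c)
    rhs = (prob(lambda k: k[0] == a)                    # P(a)
           * prob(lambda k: k[:2] == (a, b)) / prob(lambda k: k[0] == a)  # P(b|a)
           * P[(a, b, c)] / prob(lambda k: k[:2] == (a, b)))              # P(c|a, b)
    print(abs(lhs - rhs) < 1e-12)  # True: the chain rule holds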

Given the dataset above, what target level should we predict for this query?

HEADACHE  FEVER  VOMITING  MENINGITIS
true      false  true      ?

P(M|h, ¬f, v) = ?

In terms of Bayes' Theorem this problem can be stated as:

P(M|h, ¬f, v) = P(h, ¬f, v | M) × P(M) / P(h, ¬f, v)

There are two values in the domain of the MENINGITIS feature, 'true' and 'false', so we have to do this calculation twice.

We will do the calculation for m first. To carry out this calculation we need to know the following probabilities: P(m), P(h, ¬f, v), and P(h, ¬f, v | m).


We can calculate the required probabilities directly from the data. For example, we can calculate P(m) and P(h, ¬f, v) as follows:

P(m) = |{d5, d8, d10}| / |{d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}| = 3/10 = 0.3
P(h, ¬f, v) = |{d3, d4, d6, d7, d8, d10}| / |{d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}| = 6/10 = 0.6
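These counts are easy to reproduce in Python; the DATA list below is the meningitis table transcribed as (headache, fever, vomiting, meningitis) tuples:

    # Meningitis dataset, rows d1..d10 from the table above.
    DATA = [
        (True, True, False, False), (False, True, False, False),
        (True, False, True, False), (True, False, True, False),
        (False, True, False, True), (True, False, True, False),
        (True, False, True, False), (True, False, True, True),
        (False, True, False, False), (True, False, True, True),
    ]

    p_m = sum(1 for h, f, v, m in DATA if m) / len(DATA)
    p_hfv = sum(1 for h, f, v, m in DATA if h and not f and v) / len(DATA)
    print(p_m, p_hfv)  # 0.3 0.6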

However, as an exercise, we will use the chain rule to calculate:

P(h, ¬f, v | m) = ?


Using the chain rule:

P(h, ¬f, v | m) = P(h | m) × P(¬f | h, m) × P(v | ¬f, h, m)
= |{d8, d10}| / |{d5, d8, d10}| × |{d8, d10}| / |{d8, d10}| × |{d8, d10}| / |{d8, d10}|
= 2/3 × 2/2 × 2/2 = 0.6666

So the calculation of P(m|h, ¬f, v) is:

P(m|h, ¬f, v) = (P(h|m) × P(¬f|h, m) × P(v|¬f, h, m) × P(m)) / P(h, ¬f, v)
= (0.6666 × 0.3) / 0.6 = 0.3333

The corresponding calculation for P(¬m|h, ¬f, v) is:

P(¬m | h, ¬f, v) = (P(h, ¬f, v | ¬m) × P(¬m)) / P(h, ¬f, v)
= (P(h|¬m) × P(¬f|h, ¬m) × P(v|¬f, h, ¬m) × P(¬m)) / P(h, ¬f, v)
= (0.7143 × 0.8 × 1.0 × 0.7) / 0.6 = 0.6667

P(m|h, ¬f, v) = 0.3333
P(¬m|h, ¬f, v) = 0.6667

These calculations tell us that it is twice as probable that the patient does not have meningitis as that they do, even though the patient is suffering from a headache and is vomiting!
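A sketch that reproduces both posteriors directly by counting (the dataset is re-declared so the snippet runs on its own):

    # Meningitis dataset as (headache, fever, vomiting, meningitis) tuples.
    DATA = [
        (True, True, False, False), (False, True, False, False),
        (True, False, True, False), (True, False, True, False),
        (False, True, False, True), (True, False, True, False),
        (True, False, True, False), (True, False, True, True),
        (False, True, False, False), (True, False, True, True),
    ]

    def prob(pred):
        """Fraction of rows satisfying a predicate."""
        return sum(1 for row in DATA if pred(row)) / len(DATA)

    # P(M | h, ¬f, v) = P(h, ¬f, v, M) / P(h, ¬f, v), for both target values.
    p_evidence = prob(lambda r: r[0] and not r[1] and r[2])
    for target in (True, False):
        p_joint = prob(lambda r: r[0] and not r[1] and r[2] and r[3] == target)
        print(target, round(p_joint / p_evidence, 4))  # True 0.3333, False 0.6667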

The Paradox of the False Positive

The mistake of forgetting to factor in the prior gives rise to the paradox of the false positive, which states that in order to make predictions about a rare event, the model has to be as accurate as the event is rare, or else there is a significant chance of false positive predictions (i.e., predicting the event when it is not the case).

Bayesian MAP Prediction Model

M_MAP(q) = argmax_{l ∈ levels(t)} P(t = l | q[1], …, q[m])
         = argmax_{l ∈ levels(t)} P(q[1], …, q[m] | t = l) × P(t = l) / P(q[1], …, q[m])

Bayesian MAP Prediction Model (without normalization)

M_MAP(q) = argmax_{l ∈ levels(t)} P(q[1], …, q[m] | t = l) × P(t = l)
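In code, the unnormalized MAP rule is just an argmax over target levels; a minimal sketch, where likelihood and prior are assumed to be lookup functions supplied by the caller:

    def map_predict(query, levels, likelihood, prior):
        """Unnormalized MAP: argmax over levels of likelihood × prior.

        `likelihood(query, level)` and `prior(level)` are assumed
        caller-supplied functions (hypothetical interfaces).
        """
        return max(levels, key=lambda level: likelihood(query, level) * prior(level))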
Now consider a second query:

HEADACHE  FEVER  VOMITING  MENINGITIS
true      true   false     ?

P(m | h, f, ¬v) = ?
P(¬m | h, f, ¬v) = ?

P(m | h, f, ¬v) = (P(h|m) × P(f|h, m) × P(¬v|f, h, m) × P(m)) / P(h, f, ¬v)
= (0.6666 × 0 × 0 × 0.3) / 0.1 = 0

P(¬m | h, f, ¬v) = (P(h|¬m) × P(f|h, ¬m) × P(¬v|f, h, ¬m) × P(¬m)) / P(h, f, ¬v)
= (0.7143 × 0.2 × 1.0 × 0.7) / 0.1 = 1.0

P(m | h, f , ¬v ) = 0

P(¬m | h, f , ¬v ) = 1.0

There is something odd about these results!



Curse of Dimensionality
As the number of descriptive features grows, the number of potential conditioning events grows exponentially. Consequently, the size of the dataset must grow exponentially with each new descriptive feature to ensure that, for any conditional probability, there are enough matching instances in the training data for the resulting probability estimate to be reasonable.
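To make the growth concrete: with m binary descriptive features and a binary target, the full joint distribution has 2^(m+1) cells, each of which needs supporting instances in the training data.

    # Cells in the joint distribution for m binary features plus a binary target.
    for m in (3, 10, 20):
        print(m, 2 ** (m + 1))
    # 3 16
    # 10 2048
    # 20 2097152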

The probability of a patient who has a headache and a fever having meningitis should be greater than zero!
Our dataset is not large enough → our model is over-fitting to the training data.
The concepts of conditional independence and factorization can help us overcome this flaw of our current approach.

Conditional Independence and Factorization

If knowledge of one event has no effect on the probability of another event, and vice versa, then the two events are independent of each other.
If two events X and Y are independent then:

P(X|Y) = P(X)
P(X, Y) = P(X) × P(Y)

Recall that when two events are dependent these rules are:

P(X|Y) = P(X, Y) / P(Y)
P(X, Y) = P(X|Y) × P(Y) = P(Y|X) × P(X)

Full independence between events is quite rare.
A more common phenomenon is that two or more events may be independent if we know that a third event has happened.
This is known as conditional independence.

For two events, X and Y, that are conditionally independent given knowledge of a third event, here Z, the definitions of the probability of a joint event and of conditional probability are:

P(X|Y, Z) = P(X|Z)
P(X, Y|Z) = P(X|Z) × P(Y|Z)

Compare the rules when X and Y are dependent:

P(X|Y) = P(X, Y) / P(Y)
P(X, Y) = P(X|Y) × P(Y) = P(Y|X) × P(X)

with the rules when X and Y are independent:

P(X|Y) = P(X)
P(X, Y) = P(X) × P(Y)

If the event t = l causes the events q[1], …, q[m] to happen, then the events q[1], …, q[m] are conditionally independent of each other given knowledge of t = l, and the chain rule definition can be simplified as follows:

P(q[1], …, q[m] | t = l) = P(q[1] | t = l) × P(q[2] | t = l) × ⋯ × P(q[m] | t = l)
= ∏_{i=1}^{m} P(q[i] | t = l)

Using this, we can simplify the calculations in Bayes' Theorem under the assumption of conditional independence between the descriptive features given the level l of the target feature:

P(t = l | q[1], …, q[m]) = (∏_{i=1}^{m} P(q[i] | t = l)) × P(t = l) / P(q[1], …, q[m])

Without conditional independence:

P(X, Y, Z, W) = P(X|W) × P(Y|X, W) × P(Z|Y, X, W) × P(W)

With conditional independence (given W), the joint factorizes as:

P(X, Y, Z, W) = P(X|W) × P(Y|W) × P(Z|W) × P(W)
                Factor 1   Factor 2   Factor 3   Factor 4

The joint probability distribution for the meningitis dataset:

P(H, F, V, M) =
  [ P(h, f, v, m),      P(¬h, f, v, m)
    P(h, f, v, ¬m),     P(¬h, f, v, ¬m)
    P(h, f, ¬v, m),     P(¬h, f, ¬v, m)
    P(h, f, ¬v, ¬m),    P(¬h, f, ¬v, ¬m)
    P(h, ¬f, v, m),     P(¬h, ¬f, v, m)
    P(h, ¬f, v, ¬m),    P(¬h, ¬f, v, ¬m)
    P(h, ¬f, ¬v, m),    P(¬h, ¬f, ¬v, m)
    P(h, ¬f, ¬v, ¬m),   P(¬h, ¬f, ¬v, ¬m) ]

Assuming the descriptive features are conditionally independent of each other given MENINGITIS, we only need to store four factors:

Factor 1: < P(M) >
Factor 2: < P(h|m), P(h|¬m) >
Factor 3: < P(f|m), P(f|¬m) >
Factor 4: < P(v|m), P(v|¬m) >

P(H, F, V, M) = P(M) × P(H|M) × P(F|M) × P(V|M)

Calculate the factors from the data.

Factor 1: < P(m) = 0.3 >
Factor 2: < P(h|m) = 0.6666, P(h|¬m) = 0.7143 >
Factor 3: < P(f|m) = 0.3333, P(f|¬m) = 0.4286 >
Factor 4: < P(v|m) = 0.6666, P(v|¬m) = 0.5714 >
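A sketch that recomputes these factors by counting (dataset re-declared as before); note that it also confirms P(h|¬m) = 5/7 ≈ 0.7143:

    # Meningitis dataset as (headache, fever, vomiting, meningitis) tuples.
    DATA = [
        (True, True, False, False), (False, True, False, False),
        (True, False, True, False), (True, False, True, False),
        (False, True, False, True), (True, False, True, False),
        (True, False, True, False), (True, False, True, True),
        (False, True, False, False), (True, False, True, True),
    ]

    def cond(feature_idx, m_value):
        """P(feature is true | MENINGITIS == m_value), by counting."""
        rows = [r for r in DATA if r[3] == m_value]
        return sum(1 for r in rows if r[feature_idx]) / len(rows)

    print(sum(1 for r in DATA if r[3]) / len(DATA))  # Factor 1: P(m) = 0.3
    for name, idx in (("h", 0), ("f", 1), ("v", 2)):
        print(name, round(cond(idx, True), 4), round(cond(idx, False), 4))
    # h 0.6667 0.7143
    # f 0.3333 0.4286
    # v 0.6667 0.5714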

Using the factors above, calculate the probability of MENINGITIS = 'true' for the following query:

HEADACHE  FEVER  VOMITING  MENINGITIS
true      true   false     ?

P(m|h, f, ¬v) = (P(h|m) × P(f|m) × P(¬v|m) × P(m)) / Σ_i (P(h|M_i) × P(f|M_i) × P(¬v|M_i) × P(M_i))
= (0.6666 × 0.3333 × 0.3333 × 0.3) / ((0.6666 × 0.3333 × 0.3333 × 0.3) + (0.7143 × 0.4286 × 0.4286 × 0.7))
= 0.1948

Now use the factors to calculate the probability of MENINGITIS = 'false' for the same query:

HEADACHE  FEVER  VOMITING  MENINGITIS
true      true   false     ?

P(¬m|h, f, ¬v) = (P(h|¬m) × P(f|¬m) × P(¬v|¬m) × P(¬m)) / Σ_i (P(h|M_i) × P(f|M_i) × P(¬v|M_i) × P(M_i))
= (0.7143 × 0.4286 × 0.4286 × 0.7) / ((0.6666 × 0.3333 × 0.3333 × 0.3) + (0.7143 × 0.4286 × 0.4286 × 0.7))
= 0.8052

P(m|h, f, ¬v) = 0.1948
P(¬m|h, f, ¬v) = 0.8052

As before, the MAP prediction would be MENINGITIS = 'false', but the posterior probabilities are no longer as extreme!
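The factorized posteriors can be verified with the factor values listed above:

    # Factors for each MENINGITIS value: prior and P(feature = true | M).
    factors = {
        True:  {"prior": 0.3, "h": 0.6666, "f": 0.3333, "v": 0.6666},
        False: {"prior": 0.7, "h": 0.7143, "f": 0.4286, "v": 0.5714},
    }

    def score(m):
        """Unnormalized P(h|M) × P(f|M) × P(¬v|M) × P(M) for the query h, f, ¬v."""
        fa = factors[m]
        return fa["h"] * fa["f"] * (1 - fa["v"]) * fa["prior"]

    total = score(True) + score(False)
    print(round(score(True) / total, 4), round(score(False) / total, 4))  # 0.1948 0.8052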

Standard Approach: The Naive Bayes' Classifier

Naive Bayes' Classifier

M(q) = argmax_{l ∈ levels(t)} (∏_{i=1}^{m} P(q[i] | t = l)) × P(t = l)
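A minimal, smoothing-free sketch of this classifier for categorical features (the class and method names are ours, not from the book):

    from collections import Counter, defaultdict

    class NaiveBayes:
        """Naive Bayes' for categorical features; no smoothing, as in the slides."""

        def fit(self, examples):
            # examples: list of (feature_dict, label) pairs
            labels = [label for _, label in examples]
            self.label_counts = Counter(labels)
            self.priors = {l: c / len(examples) for l, c in self.label_counts.items()}
            # (feature, label) -> Counter of observed values
            self.value_counts = defaultdict(Counter)
            for features, label in examples:
                for feature, value in features.items():
                    self.value_counts[(feature, label)][value] += 1
            return self

        def predict(self, query):
            def score(label):
                s = self.priors[label]  # P(t = l)
                for feature, value in query.items():
                    # P(q[i] | t = l), estimated by counting
                    s *= self.value_counts[(feature, label)][value] / self.label_counts[label]
                return s
            return max(self.priors, key=score)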

Naive Bayes' is simple to train!
1 Calculate the priors for each of the target levels.
2 Calculate the conditional probabilities for each feature given each target level.
Table: A dataset from a loan application fraud detection domain.

ID  CREDIT HISTORY  GUARANTOR/COAPPLICANT  ACCOMMODATION  FRAUD
1   current         none                   own            true
2   paid            none                   own            false
3   paid            none                   own            false
4   paid            guarantor              rent           true
5   arrears         none                   own            false
6   arrears         none                   own            true
7   current         none                   own            false
8   arrears         none                   own            false
9   current         none                   rent           false
10  none            none                   own            true
11  current         coapplicant            own            false
12  current         none                   own            true
13  current         none                   rent           true
14  paid            none                   own            false
15  arrears         none                   own            false
16  current         none                   own            false
17  arrears         coapplicant            rent           false
18  arrears         none                   free           false
19  arrears         none                   own            false
20  paid            none                   own            false
P(fr) = 0.3                          P(¬fr) = 0.7
P(CH = 'none' | fr) = 0.1666         P(CH = 'none' | ¬fr) = 0
P(CH = 'paid' | fr) = 0.1666         P(CH = 'paid' | ¬fr) = 0.2857
P(CH = 'current' | fr) = 0.5         P(CH = 'current' | ¬fr) = 0.2857
P(CH = 'arrears' | fr) = 0.1666      P(CH = 'arrears' | ¬fr) = 0.4286
P(GC = 'none' | fr) = 0.8334         P(GC = 'none' | ¬fr) = 0.8571
P(GC = 'guarantor' | fr) = 0.1666    P(GC = 'guarantor' | ¬fr) = 0
P(GC = 'coapplicant' | fr) = 0       P(GC = 'coapplicant' | ¬fr) = 0.1429
P(ACC = 'own' | fr) = 0.6666         P(ACC = 'own' | ¬fr) = 0.7857
P(ACC = 'rent' | fr) = 0.3333        P(ACC = 'rent' | ¬fr) = 0.1429
P(ACC = 'free' | fr) = 0             P(ACC = 'free' | ¬fr) = 0.0714

Table: The probabilities needed by a Naive Bayes prediction model, calculated from the dataset. Notation key: FR = FRAUDULENT, CH = CREDIT HISTORY, GC = GUARANTOR/COAPPLICANT, ACC = ACCOMMODATION, T = 'true', F = 'false'.
Given these probabilities, what prediction should the model make for this query?

CREDIT HISTORY  GUARANTOR/COAPPLICANT  ACCOMMODATION  FRAUDULENT
paid            none                   rent           ?

A Worked Example

The relevant probabilities for this query are:

P(fr) = 0.3                          P(¬fr) = 0.7
P(CH = 'paid' | fr) = 0.1666         P(CH = 'paid' | ¬fr) = 0.2857
P(GC = 'none' | fr) = 0.8334         P(GC = 'none' | ¬fr) = 0.8571
P(ACC = 'rent' | fr) = 0.3333        P(ACC = 'rent' | ¬fr) = 0.1429

(∏_{k=1}^{m} P(q[k] | fr)) × P(fr) = 0.0139
(∏_{k=1}^{m} P(q[k] | ¬fr)) × P(¬fr) = 0.0245

Since 0.0245 > 0.0139, the Naive Bayes' model predicts:

CREDIT HISTORY  GUARANTOR/COAPPLICANT  ACCOMMODATION  FRAUDULENT
paid            none                   rent           'false'

The model is generalizing beyond the dataset: no instance in the training data matches this exact combination of feature values, yet the model still returns a sensible prediction!
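Plugging the relevant table probabilities into the scoring rule reproduces the prediction (values copied from the table above):

    # Scores for the query (CH='paid', GC='none', ACC='rent').
    score_fraud = 0.1666 * 0.8334 * 0.3333 * 0.3      # ≈ 0.0139
    score_not_fraud = 0.2857 * 0.8571 * 0.1429 * 0.7  # ≈ 0.0245
    print("fraud" if score_fraud > score_not_fraud else "not fraud")  # not fraud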


Summary

P(t|d) = P(d|t) × P(t) / P(d)

A Naive Bayes' classifier naively assumes that each of the descriptive features in a domain is conditionally independent of all of the other descriptive features, given the state of the target feature.
This assumption, although often wrong, enables the Naive Bayes' model to maximally factorise the representation that it uses of the domain.
Surprisingly, given the naivety and strength of the assumption it depends upon, a Naive Bayes' model often performs reasonably well.