Bayesian Learning Introduction Bayes08 PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 65

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Bayesian Learning: An Introduction


Joao Gama

LIAAD-INESC Porto, University of Porto, Portugal


September 2008

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Motivation: Information Processing

Introduction

Bayesian Network Classifiers

k-Dependence Bayesian Classifiers

Links and References

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Outline

Motivation: Information Processing

Introduction

Bayesian Network Classifiers

k-Dependence Bayesian Classifiers

Links and References

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Life as information processing:

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Motivation: Micro Array

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Motivation: Supervised Learning Task I

Given:
a set of micro-array experiments, each done with mRNA from
a different patient (same cell type from every patient)
Patients expression values for each gene constitute the
features, and
patients disease constitutes the class

Goal: Learn a model that accurately predicts class based on


features

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Micro-array data: Data Points are Samples

Person
A28202 ac AB00014 at AB00015 at
Person1
1144.0
321.0
2567.2
Person2
105.2
586.1
759.2
Person3
586.3
559.0
3210.2
Person4
42.8
692.0
812.2
Learning Problem:
Find groups of genes particularly active for groups of

...
...
...
...
...
Persons.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Supervision: Add Class Values

Person
Person1
Person2
Person3
Person4

A28202 ac
1144.0
105.2
586.3
42.8

AB00014 at
321.0
586.1
559.0
692.0

AB00015 at
2567.2
759.2
3210.2
812.2

...
...
...
...
...

Class
normal
cancer
normal
cancer

Learning Problems:
Find a function: Class = f(A28202 ac, AB00014 at,
AB00015 at, . . .)
Given the expression level of genes of a Person, predict if he
has cancer or not.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Splice junctions: Supervised Learning Task II

DNA: T G C A G C T C C G G A C T C C A T
mRNA: A C G U C G A G G C C U G A G G U A
Exons: Sequences of nucleotides that are expressed
(translated to proteins)
Introns: Intercalated sequences eliminated during the
translation Non-coding regions
Splice-junctions: Frontiers between an exon and an intron
Donors: border exon-intron
Acceptors: border intron-exon

Learning Problem: Given a sequence of DNA, recognize the


boundaries between exons and introns.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Illustrative Example: patient monitoring in ICU ...


ALARM is a diagnostic application used to explore probabilistic reasoning techniques
in belief networks. ALARM implements an alarm message system for patient
monitoring; it calculates probabilities for a differential diagnosis based on available
evidence. The medical knowledge is encoded in a graphical structure connecting 8
diagnoses, 16 finding and 13 intermediate variables.
Three types of variables are represented in ALARM. Diagnoses and other qualitative
information are at the top level of the network. These variables have no predecessors
and they are assumed to be mutually independent a priori. All nodes are associated
with a set of mutually exclusive and exhaustive values representing the presence or
absence or the severity of a particular disease. Measurements represent any available
quantitative information. All continuous variables are represented categorically with
sets of discrete intervals dividing the value range. Depending on the necessary level of
detail, three to five categories are used per node. Intermediate variables are inferred
entities that cannot be measured directly.
The probabilities in a belief network can represent objective as well as subjective
knowledge. ALARM contains statistical data on prior probabilities, logical conditional
probabilities computed from equations relating variables, and a number of subjective
assessments. It is necessary to obtain conditional probabilities for the states of a node,
given all different states of the parent nodes.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Illustrative Example: Qualitative model ...


A DAG that reflects the conditional independence between
variables.

Example: ICU Alarm network


Domain: Monitoring Intensive-Care Patients
37 variables
509 parameters
instead of 254
PULMEMBOLUS

PAP

SHUNT

MINVOLSET

KINKEDTUBE

INTUBATION

VENTMACH

VENTLUNG

DISCONNECT

VENITUBE
PRESS

MINOVL

ANAPHYLAXIS

SAO2

TPR

HYPOVOLEMIA

LVEDVOLUME

CVP

PCWP

LVFAILURE

STROEVOLUME

FIO2

VENTALV

PVSAT

ARTCO2

EXPCO2

INSUFFANESTH

CATECHOL

HISTORY

ERRBLOWOUTPUT

CO

HR

HREKG

ERRCAUTER

HRSAT

HRBP
BP

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Illustrative Example: .. and Quantitative model

A set of contingency tables between dependent variables.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Illustrative Example: Propagating Evidence ...

A Bayesian net:

Prediction: Observing Smoking


and propagating this evidence

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Illustrative Example: Propagating Evidence ...

A Bayesian net:

Diagnosis:
Observing
Positive X-ray and propagating this evidence

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Another Interface ...


ERIC HORVITZ, a researcher at Microsoft and a guru in the field of Bayesian
statistics, feels bad about the paperclip, but he hopes his latest creation will make up
for it. The paperclip in question, as even casual users of Microsofts Office software
will be aware, is a cheery character who pops up on the screen to offer advice on
writing a letter or formatting a spreadsheet. That was the idea, anyway. But many
people regard the paperclip as annoyingly over-enthusiastic, since it appears without
warning and gets in the way.
Mobile Manager evaluates incoming e-mails on a users PC and decides which are
important enough to forward to a pager, mobile phone or other e-mail address. Its
Bayesian innards give it an almost telepathic ability to distinguish junk mail from
genuinely important messages.

The Clip is an interface for a Bayesian Network:

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Why Learn Bayesian Networks?

Join Probability Distribution of a set of Variables.


Conditional independences & graphical language capture
structure of many real-world distributions
Graph structure provides much insight into domain
Learned model can be used for many tasks:
Prediction: Given the Inputs: which Outputs?
Diagnosis: Given the Outputs: which Inputs?
Unsupervised: Given Inputs and Outputs: Which structure?

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Outline

Motivation: Information Processing

Introduction

Bayesian Network Classifiers

k-Dependence Bayesian Classifiers

Links and References

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Bayes Theorem
What is the most probable hypothesis h, given training data D?
A method to calculate the probability of a hypothesis based on
its prior probability,
the probability of observing the data given the hypothesis,
The data itself
P(h|D) =

P(h)P(D|h)
P(D)

P(h) = prior probability of hypothesis h


P(D) = prior probability of training data D
P(h|D) = probability of h given D
P(D|h) = probability of D given h

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Illustrative Example

The Win envelope has 1


dollar and four beads in it.
The Lose envelope has
three beads in it.
Someone draws an envelope at
random and offers to sell it to
you. Which one should we
choose?

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Illustrative Example

Before deciding, you are allowed to see one bead in one


envelope.
If it is black, Which one
should we choose?
And if it is red?

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Illustrative Example

Prior Probabilities:
P(Win) =
P(Lose) =
P(red) =
P(black) =
P(black|Win) =
P(red|Win) =
After seeing the bead:
P(Win|black) =
P(Win|red) =

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Illustrative Example

Prior Probabilities:
P(Win) = 1/2
P(Lose) = 1/2
P(red) = 3/7
P(black) = 4/7
P(black|Win) = 1/2
P(red|Win) = 1/2
After seeing the bead:
If bead = black:
=
P(Win|black) = P(Win)P(black|Win)
P(black)
If bead = red: P(Win|red) =

1/21/2
= 0.4375
4/7
P(Win)P(red|Win)
= 1/21/2
P(red)
3/7

= 0.583

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Bayesians ...

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Bayesians ...

The Homo apriorius establishes the probability of an hypothesis, no matter what


data tell. Astronomical example: H0=25. What the data tell is uninteresting.
The Homo pragamiticus establishes that it is interested by the data only.
Astronomical example: In my experiment I found H0=66.6666666, full stop.
The Homo frequentistus measures the probability of the data given the
hypothesis. Astronomical example: if H0=66.666666, then the probability to
get an observed value more different from the one I observed is given by an
opportune expression. Dont ask me if my observed value is near the true one, I
can only tell you that if my observed values is the true one, then the probability
of observing data more extreme than mine is given by an opportune expression.
The Homo sapients measures the probability of the data and of the hypothesis.
Astronomical example: [missing]
The Homo bayesianis measures the probability of the hypothesis, given the data.
Astronomical example: the true value of H0 is near 72 with +-3 uncertainty.
Stefano Andreon Homepage

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Outline

Motivation: Information Processing

Introduction

Bayesian Network Classifiers

k-Dependence Bayesian Classifiers

Links and References

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

An Illustrative Problem:

A patient takes a lab test and the result comes back positive. The
test returns a correct positive result in only 75% of the cases in
which the disease is actually present, and a correct negative result
in only 96% of the cases in which the disease is not present.
Furthermore, 8% of the entire population have this cancer.
How to represent that information?

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Representation:
It is useful to represent this information in a graph.
The graphical information is
qualitative
The nodes represent
variables.
Arcs specify the
(in)dependence between
variables. Direct arcs
represent influence between
variables.
The direction of the arc tell us that the value of the variable
disease influences the value of the variable test.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Inference:

Inference using Bayes Theorem:


P(Disease|Test =0 Positive 0 ) =
= P(Disease)P(Test =0 Positive 0 |Disease)

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

The Semantics of Arrows

C
A

Direction of Arrow indicates Influence not causality.


The ground truth might be different!

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes Classifier

Assume target function f : X Y , where each instance x


described by attributes hx1 , x2 . . . xn i.
Most probable value of f (x) is:
YMAP

= argmax P(yj |x1 , x2 . . . xn )


yj Y

YMAP

= argmax
yj Y

P(x1 , x2 . . . xn |yj )P(yj )


P(x1 , x2 . . . xn )

= argmax P(x1 , x2 . . . xn |yj )P(yj )


yj Y

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes Classifier

Naive Bayes assumption: Attributes are independent given the


class.
Q
P(x1 , x2 . . . xn |yj ) = i P(xi |yj ) which gives Naive Bayes classifier:
Y
YNB = argmax P(yj )
P(xi |yj )
yj V

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes

Assume a decision problem with p variables.


Each variable assume k values.
The joint probability requires to estimate k p probabilities.
Assuming that variables are conditionally independent given
the class, only requires to estimate k p probabilities.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes Formulae

Naive Bayes can be expressed in addictive form:


P(yi |~x ) ln(P(yi )) +

ln(P(xj |yi )

Points out the contribution of each attribute to the decision.

Two class problems:


ln

P(y+ |~x )
P(y+ ) X P(xj |y+ )
ln
+
ln
P(y |~x )
P(y )
P(xj |y )

The sign of each term indicates the class the attribute


contributes to.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes as a Bayesian Net

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes Algorithm


Naive Bayes Learn(examples)
Initialize all counts to zero
For each example {X , yj }
Nr = Nr + 1
Count(yj ) = Count(yj ) + 1
For each attribute value xi X
Count(xi , yj ) = Count(xi , yj ) + 1

Classify New Instance(x)


j)
yNB = argmax P(y

yj Y

xi X

i |yj )
P(x

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes: Example

Weather
Rainy
Sunny
Sunny
Overcast
Rainy
Rainy
Overcast
Overcast
Sunny
Rainy
Overcast
Sunny
Sunny
Rainy

Temperature
71
69
80
83
70
65
64
72
75
68
81
85
72
75

Humidity
91
70
90
86
96
70
65
90
70
80
75
85
95
80

Wind
Yes
No
Yes
No
No
Yes
Yes
Yes
Yes
No
No
No
No
No

Play
No
Yes
No
Yes
Yes
No
Yes
Yes
Yes
Yes
Yes
No
No
Yes

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Two representations:

P(play |weather , temperature, humidity , wind) =

= P(Play )P(Weather |Play )P(Temperature|Play )P(Humidity |Play )P(Wind|Play

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes: Counts

Nr. examples: 14
Play =0 Yes 0 : 9
Play =0 No 0 : 5
Weather
Yes
Sunny
2
Overcast
4
Rainy
3

No
3
0
2

Temperature
Yes
No
Mean
73
74.6
SD
6.2
7.9

Humidity
Yes
No
Mean
79.1 86.2
SD
10.2
9.3

Wind
Yes
False
6
True
3

No
2
3

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes: Distribution Tables

Nr. examples: 14
P(Play =0 Yes 0 ) = 9/14
P(Play =0 No 0 ) = 5/14
Weather
Yes
Sunny
2/9
Overcast
4/9
Rainy
3/9

No
3/5
0/5
2/5

Temperature
Yes
No
Mean
73
74.6
SD
6.2
7.9

Humidity
Yes
No
Mean
79.1 86.2
SD
10.2
9.3

Continuous attributes can be approximated using normal


2
1
distribution: N(x) = 2
e (x)
2 2

Wind
Yes
False
6/9
True
3/9

No
2/5
3/5

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes: Test Example

Weather
Sunny

Temperature
66

Humidity
90

Wind
Yes

Play
?

P(Yes|Weather = Sunny 0 , Temperature = 66, Humidity = 90, Wind = Yes 0 ) =


= P(Yes)P(Weather = Sunny 0 |Yes)P(Temperature = 66|Yes)P(Humidity =
90|Yes)P(Wind = Yes 0 |Yes)
P(No|Weather = Sunny 0 , Temperature = 66, Humidity = 90, Wind = Yes 0 ) =
= P(No)P(Weather = Sunny 00 |No)P(Temperature = 66|No)P(Humidity =
90|No)P(Wind = Yes 00 |No)

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes: Subtleties

what if none of the training instances with target value yj


i |yj ) = 0 and
have attribute value xi ? Then P(x
Q

P(yj ) i P(xi |yj ) = 0


i |yj )
Typical solution is Bayesian estimate for P(x
nc +mp

P(xi |yj ) n+m where


n is number of training examples for which y = yj ,
nc number of examples for which Y = yj and X = xi
i |yj )
p is prior estimate for P(x
m is weight given to prior (i.e. number of virtual examples)

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Discretization

Transform a continuous variable into a set of ordered intervals (bins).


Two Main Problems:
How many Intervals?
Generic: Use 10 intervals or min(10, nr . of different values)
Bioinformtics: few intervals (2, 3, ...)
How to define to borders of each Interval?

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Discretization: Basic Methods


Equal width discretization. Divides the range of observed
values for a feature into k equally sized bins. Pos: simplicity,
Neg: the presence of outliers.
Equal frequency discretization. Divides the range of
observed values into k bins, where (considering n instances)
each bin contains n/k values.
k-means. An iterative method that begins with an
equal-width discretization, iteratively adjust the boundaries to
minimize a squared-error function and only stops when it can
not change any value to improve the previous criteria.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes: Analysis

Conditional independence assumption is often violated


Y
P(x1 , x2 . . . xn |yj ) =
P(xi |yj )
i

...but it works surprisingly well anyway. Note dont need


j |x) to be correct; need only that
estimated posteriors P(y

j)
argmax P(y
yj Y

Y
i

i |yj ) = argmax P(yj )P(x1 . . . , xn |yj )


P(x
yj Y

see [Domingos & Pazzani, 1996] for analysis


Naive Bayes posteriors often unrealistically close to 1 or 0

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes: Analysis


Robust to the presence of irrelevant attributes.
Suppose a two class problem, where Xi is an irrelevant
attribute: p(xi |y1 ) = p(xi |y2 ).
p(Y |x1 , ..., xiQ
, ..., xn ) Q
n
p(Y )p(xi |c) i1
l=1 p(xl |Y )
l=i+1 p(xl |Y ) and
p(Y |x1 , ..., xi , ..., xn ) p(Y |x1 , ..., xi1 , xi+1 , ..., xn )
Redundant variables must be taken into account.
Suppose that Xi = Xi1 then p(xi1 |Y ) = p(xi |Y )
p(Y |x1 , ..., xi1 , xi , ..., xQ
n)
Qn
p(Y )p(xi1 |Y )p(xi |Y ) i2
l=1 p(xl |Y )
l=i+1 p(xl |Y )
and
p(Y |x1 , ..., xi1Q, xi , ..., xn ) Q
n
p(Y )p(xi |Y )2 i2
l=1 p(xl |Y )
l=i+1 p(xl |Y )

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes: Summary

The variability of a dataset is summarized in contingency


tables.
Requires a single scan over the dataset.
The algorithm is Incremental (incorporation of new examples)
and decremental (forgetting old examples).

The dimension of the decision model is independent of the


number of examples.
Low variance: stable with respect to small perturbations of the
training set.
High bias: the number of possible states is finite.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Naive Bayes: Extensions

Techniques that apply different naive Bayes classifiers to


different regions of the input space. Recursive Bayes (Langley,
93), Naive Bayes Tree (Kohavi,96).
Techniques that built new attributes that reflect
interdependencies between original attributes. Semi-naive
Bayes (Kononenko, 91).
Processing continuous attributes: Flexible Bayes (G.John),
Linear Bayes (Gama, 01)
Search for better Parameters: Iterative Bayes (Gama, 99)

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Successful Stories

KDDCup 1998: Winner - Boosting naive Bayes;


Coil 1999: Winner - Simple naive Bayes;
The most used classifier in Text Mining;

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

The Balance-scale Problem

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Rapid Miner: Naive Bayes

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Knimer: Naive Bayes

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Weka: Naive Bayes

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Outline

Motivation: Information Processing

Introduction

Bayesian Network Classifiers

k-Dependence Bayesian Classifiers

Links and References

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Mutual Information
Uncertainty of a random Variable (Entropy):
X
H(X ) =
P(x)log2 (P(x))
Uncertainty about X after knowing Y (Conditional Entropy):
X
X
H(X |Y ) =
P(y )
P(x|y )logP(x|Y )
y

Reduction in the uncertainty of X when Y is known (Mutual


Information):
I (X , Y ) = H(X |Y ) H(X )
XX
P(xi , yj )
I (X , Y ) =
P(xi , yj )log
P(xi )p(xj )
i

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Example: TAN
Tree augmented naive Bayes
Compute the Mutual Information
between all pairs of variables given the
Class
X
I (X , Y |C ) =
P(c)I (X , Y |C = c)
c

Construct a spanning tree maximizing


MI.
All the variables depends on the class.
p(c|x1 , x2 , x3 ) p(c)p(x1 |c)p(x2 |x1 , c)p(x3 |x1 , c)

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

k-Dependency Bayesian Networks


All the attributes depends on
the class
Any attribute depends on k
other attributes, at most.
Restricted Bayesian Networks
Decision Models of Increase
Complexity
p(c|x1 , x2 , x3 ) p(c)p(x1 |c)p(x2 |x1 , c)p(x3 |x1 , c)
k-Dependency Bayesian Networks: a common framework for
classifiers with increase (smooth) complexity.

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

k-Dependency Bayesian Networks

Compute I (Xi , C ) and I (Xi , Xj |C ) for all pairs of variables


Iterate
Choose the variable Xmax not yet in the model that maximizes
I (Xi |C )
Choose the k parents of Xmax : those with greater
I (Xj , Xmax |C )

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

k-Dependency Bayesian Networks

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

k-Dependency Bayesian Networks

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

k-Dependency Bayesian Networks

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Weka: TAN

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Outline

Motivation: Information Processing

Introduction

Bayesian Network Classifiers

k-Dependence Bayesian Classifiers

Links and References

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Software Available
R (package e1071)
Weka (naive Bayes, TAN models, k-dependence Bayesian classifiers)
Genie: Bayesian Networks, Influence Diagrams, Dynamic Bayesian Networks
(http://genie.sis.pitt.edu/)
Hugin (http://www.hugin.com/)
Elvira (http://www.ia.uned.es/ elvira)
Kevin Murphys MATLAB toolbox - supports dynamic BNs, decision networks,
many exact and approximate inference algorithms, parameter estimation, and
structure learning
(http://www.ai.mit.edu/ murphyk/Software/BNT/bnt.html)
Free Windows software for creation, assessment and evaluation of belief
networks.
(http://www.research.microsoft.com/dtas/msbn/default.htm)
Open source package for the technical computing language R, developed by
Aalborg University
(http://www.math.auc.dk/novo/deal)
http://www.snn.ru.nl/nijmegen

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Bibliography

Tom Mitchell, Machine Learning, (chapter 6), McGraw-Hill,


1997
R. Duda, P. Hart, D. Stork; Pattern Classification, J. Willey &
Sons, 2000
J.Pearl, Probabilistic Reasoning in Intelligent Systems:
Networks of Plausible Inference, Morgan Kaufmann, 1988
P. Domingos, M.Pazzani; On the Optimality of the Simple
Bayes Classifier under zero-one loss, Machine Learning, 29
R. Neapolitan, Learning Bayesian Networks, Prentice Hall,
2004

Outline Motivation: Information Processing Introduction Bayesian Network Classifiers k-Dependence Bayesian Classifiers Links a

Would you like to learn more? Wait for SAD ...

You might also like