22 BN Inference


CSE 473: Artificial Intelligence

Bayesian Networks: Inference

Hanna Hajishirzi

Many slides over the course adapted from either Luke


Zettlemoyer, Pieter Abbeel, Dan Klein, Stuart Russell or Andrew
Moore
1
Outline

§ Bayesian Networks Inference


§ Exact Inference: Variable Elimination
§ Approximate Inference: Sampling
Bayes Net Representation

3
4
Reachability (D-Separation)
! Question: Are X and Y conditionally independent given evidence vars {Z}?
! Yes, if X and Y “separated” by Z
! Look for active paths from X to Y
! No active paths = independence!
! A path is active if each triple is active:
! Causal chain A → B → C where B is unobserved (either direction)
! Common cause A ← B → C where B is unobserved
! Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed
! All it takes to block a path is a single inactive segment

[Figure: examples of active triples (dependent) vs. inactive triples (independent)]
5
Bayes Net Joint Distribution
Example: Alarm Network

[Graph: B → A ← E;  A → J;  A → M]

B    P(B)          E    P(E)
+b   0.001         +e   0.002
-b   0.999         -e   0.998

B    E    A    P(A|B,E)
+b   +e   +a   0.95
+b   +e   -a   0.05
+b   -e   +a   0.94
+b   -e   -a   0.06
-b   +e   +a   0.29
-b   +e   -a   0.71
-b   -e   +a   0.001
-b   -e   -a   0.999

A    J    P(J|A)         A    M    P(M|A)
+a   +j   0.9            +a   +m   0.7
+a   -j   0.1            +a   -m   0.3
-a   +j   0.05           -a   +m   0.01
-a   -j   0.95           -a   -m   0.99
6
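The full joint over the alarm network is just the product of these local CPTs: P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A). A minimal Python sketch of that product (the dictionary and function names are ours, not from the slides):

```python
# CPTs of the alarm network from the slide, keyed by assignments
# ('+b' = burglary true, '-b' = false, and so on).
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e', '+a'): 0.95,  ('+b', '+e', '-a'): 0.05,
       ('+b', '-e', '+a'): 0.94,  ('+b', '-e', '-a'): 0.06,
       ('-b', '+e', '+a'): 0.29,  ('-b', '+e', '-a'): 0.71,
       ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999}
P_J = {('+a', '+j'): 0.9, ('+a', '-j'): 0.1, ('-a', '+j'): 0.05, ('-a', '-j'): 0.95}
P_M = {('+a', '+m'): 0.7, ('+a', '-m'): 0.3, ('-a', '+m'): 0.01, ('-a', '-m'): 0.99}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of the local CPT entries."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

# e.g. a burglary, no earthquake, alarm sounds, both neighbors call:
p = joint('+b', '-e', '+a', '+j', '+m')
```

The product of five small tables replaces one table of 2^5 = 32 entries; the joint still sums to 1 over all assignments.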
Probabilistic Inference
§ Probabilistic inference: compute a desired probability from
other known probabilities (e.g. conditional from joint)

§ We generally compute conditional probabilities


§ P(on time | no reported accidents) = 0.90
§ These represent the agent’s beliefs given the evidence

§ Probabilities change with new evidence:


§ P(on time | no accidents, 5 a.m.) = 0.95
§ P(on time | no accidents, 5 a.m., raining) = 0.80
§ Observing new evidence causes beliefs to be updated
Inference

! Inference: calculating some useful quantity from a joint probability distribution

! Examples:
! Posterior probability
! Most likely explanation
9
Inference by Enumeration
§ General case:
§ Evidence variables: E1 … Ek = e1 … ek
§ Query* variable: Q
§ Hidden variables: H1 … Hr      (together: all the variables)

§ We want: P(Q | e1 … ek)
§ First, select the entries consistent with the evidence
§ Second, sum out H to get joint of Query and evidence:

§ Finally, normalize the remaining entries to conditionalize


§ Obvious problems:
§ Worst-case time complexity O(d^n)
§ Space complexity O(d^n) to store the joint distribution
Inference in BN by Enumeration
Inference by Enumeration in Bayes’ Net
! Given unlimited time, inference in BNs is easy

[Graph: B → A ← E;  A → J;  A → M]

! Reminder of inference by enumeration by example:

P(B | +j, +m) ∝_B P(B, +j, +m)
             = Σ_{e,a} P(B, e, a, +j, +m)
             = Σ_{e,a} P(B) P(e) P(a|B, e) P(+j|a) P(+m|a)
             = P(B)P(+e)P(+a|B,+e)P(+j|+a)P(+m|+a) + P(B)P(+e)P(-a|B,+e)P(+j|-a)P(+m|-a)
             + P(B)P(-e)P(+a|B,-e)P(+j|+a)P(+m|+a) + P(B)P(-e)P(-a|B,-e)P(+j|-a)P(+m|-a)
11
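The enumeration above can be run directly: sum the hidden variables e and a out of the full joint, then normalize over B. A sketch using the alarm CPTs from the earlier slide (the compact CPT encoding and names are ours):

```python
from itertools import product

# Alarm-network CPTs (values from the slides); '+'/'-' prefix marks true/false.
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e'): 0.95, ('+b', '-e'): 0.94,
       ('-b', '+e'): 0.29, ('-b', '-e'): 0.001}   # P(+a | B, E)
P_J = {'+a': 0.9, '-a': 0.05}                     # P(+j | A)
P_M = {'+a': 0.7, '-a': 0.01}                     # P(+m | A)

def joint(b, e, a, j, m):
    pa = P_A[(b, e)] if a == '+a' else 1 - P_A[(b, e)]
    pj = P_J[a] if j == '+j' else 1 - P_J[a]
    pm = P_M[a] if m == '+m' else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Enumerate: sum the hidden variables e, a out of the joint, then normalize.
unnorm = {b: sum(joint(b, e, a, '+j', '+m')
                 for e, a in product(('+e', '-e'), ('+a', '-a')))
          for b in ('+b', '-b')}
Z = sum(unnorm.values())
posterior = {b: p / Z for b, p in unnorm.items()}   # P(B | +j, +m)
```

This builds the whole joint over B, E, A (for the fixed evidence) before summing, which is exactly the waste variable elimination avoids.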
Inference by Enumeration?

P(Antilock | observed variables) = ?
12
Variable Elimination
§ Why is inference by enumeration so slow?
§ You join up the whole joint distribution before you
sum out the hidden variables
§ You end up repeating a lot of work!

§ Idea: interleave joining and marginalizing!


§ Called Variable Elimination
§ Still NP-hard, but usually much faster than inference
by enumeration

§ We’ll need some new notation to define VE


Review

§ Joint distribution: P(X,Y)
§ Entries P(x,y) for all x, y
§ Sums to 1

T      W      P
hot    sun    0.4
hot    rain   0.1
cold   sun    0.2
cold   rain   0.3

§ Selected joint: P(x,Y)
§ A slice of the joint distribution
§ Entries P(x,y) for fixed x, all y
§ Sums to P(x)

T      W      P
cold   sun    0.2
cold   rain   0.3
Review
§ Family of conditionals: P(X | Y)
§ Multiple conditionals
§ Entries P(x | y) for all x, y
§ Sums to |Y|

T      W      P
hot    sun    0.8
hot    rain   0.2
cold   sun    0.4
cold   rain   0.6

§ Single conditional: P(Y | x)
§ Entries P(y | x) for fixed x, all y
§ Sums to 1

T      W      P
cold   sun    0.4
cold   rain   0.6
Review

§ Specified family: P(y | X)
§ Entries P(y | x) for fixed y, but for all x
§ Sums to … who knows!

T      W      P
hot    rain   0.2
cold   rain   0.6

§ In general, when we write P(Y1 … YN | X1 … XM)
§ It is a “factor,” a multi-dimensional array
§ Its values are all P(y1 … yN | x1 … xM)
§ Any assigned X or Y is a dimension missing (selected) from the array
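One concrete way to hold such a factor is a dict keyed by assignment tuples (an illustrative representation, not prescribed by the slides). Selecting an assigned variable removes that dimension from the array, exactly as described above:

```python
# A factor as a dict from assignment tuples to numbers.
# Here the joint P(T, W) from the review example.
P_TW = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
        ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

def select(factor, axis, value):
    """Fix one variable to an observed value: the selected dimension
    disappears from the keys of the resulting factor."""
    return {k[:axis] + k[axis + 1:]: v
            for k, v in factor.items() if k[axis] == value}

P_cold_W = select(P_TW, 0, 'cold')   # the selected joint P(cold, W)
```

The result is a factor over W alone, and its entries sum to P(cold) = 0.5 rather than to 1.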
Inference
§ Inference is expensive with enumeration

§ Variable elimination:
§ Interleave joining and marginalization: Store
initial results and then join with the rest
Example: Traffic Domain
§ Random Variables
§ R: Raining
§ T: Traffic
§ L: Late for class!
§ First query: P(L)

[Graph: R → T → L]

+r   0.1
-r   0.9

+r   +t   0.8
+r   -t   0.2
-r   +t   0.1
-r   -t   0.9

+t   +l   0.3
+t   -l   0.7
-t   +l   0.1
-t   -l   0.9
Variable Elimination Outline
§ Maintain a set of tables called factors
§ Initial factors are local CPTs (one per node)

+r   0.1        +r   +t   0.8        +t   +l   0.3
-r   0.9        +r   -t   0.2        +t   -l   0.7
                -r   +t   0.1        -t   +l   0.1
                -r   -t   0.9        -t   -l   0.9

§ Any known values are selected
§ E.g. if we know L = +l, the initial factors are

+r   0.1        +r   +t   0.8        +t   +l   0.3
-r   0.9        +r   -t   0.2        -t   +l   0.1
                -r   +t   0.1
                -r   -t   0.9

§ VE: Alternately join factors and eliminate variables
Operation 1: Join Factors
§ First basic operation: joining factors
§ Combining factors:
§ Just like a database join
§ Get all factors over the joining variable
§ Build a new factor over the union of the variables involved

§ Example: Join on R   (R and T | R  →  R, T)

+r   0.1        +r   +t   0.8        →   +r   +t   0.08
-r   0.9        +r   -t   0.2            +r   -t   0.02
                -r   +t   0.1            -r   +t   0.09
                -r   -t   0.9            -r   -t   0.81

§ Computation for each entry: pointwise products
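A sketch of this join in the dict-based factor representation (names are ours): each output entry is the pointwise product of the matching entries, here P(r) · P(t | r) = P(r, t).

```python
# The two factors that mention R, as dicts from assignments to probabilities.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}

# Join on R: a new factor over (R, T) with entry P(r) * P(t | r) = P(r, t).
P_RT = {(r, t): P_R[r] * p for (r, t), p in P_T_given_R.items()}
```

Because one input was a prior and the other a full conditional, the joined factor is a genuine joint and sums to 1; in general a joined factor need not.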
Example: Multiple Joins

[Graph: R → T → L]

Join R:

+r   0.1        +r   +t   0.8        →   +r   +t   0.08
-r   0.9        +r   -t   0.2            +r   -t   0.02
                -r   +t   0.1            -r   +t   0.09
                -r   -t   0.9            -r   -t   0.81

+t   +l   0.3   (the L factor is unchanged)
+t   -l   0.7
-t   +l   0.1
-t   -l   0.9
Example: Multiple Joins

Join T:  R, T  and  L  →  R, T, L

+r   +t   0.08       +t   +l   0.3        →   +r   +t   +l   0.024
+r   -t   0.02       +t   -l   0.7            +r   +t   -l   0.056
-r   +t   0.09       -t   +l   0.1            +r   -t   +l   0.002
-r   -t   0.81       -t   -l   0.9            +r   -t   -l   0.018
                                              -r   +t   +l   0.027
                                              -r   +t   -l   0.063
                                              -r   -t   +l   0.081
                                              -r   -t   -l   0.729
Operation 2: Eliminate
§ Second basic operation: marginalization
§ Take a factor and sum out a variable
§ Shrinks a factor to a smaller one
§ A projection operation
§ Example: sum out R

+r   +t   0.08       →   +t   0.17
+r   -t   0.02           -t   0.83
-r   +t   0.09
-r   -t   0.81
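Summing out, in the same dict-based sketch (names are ours): drop the eliminated position from each assignment tuple and accumulate the entries that collide.

```python
# A factor over (R, T), from the join example above.
P_RT = {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
        ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}

def sum_out(factor, axis):
    """Marginalize out the variable at the given position: project each
    key to the remaining variables and add up colliding entries."""
    out = {}
    for key, p in factor.items():
        smaller = key[:axis] + key[axis + 1:]
        out[smaller] = out.get(smaller, 0.0) + p
    return out

P_T = sum_out(P_RT, 0)   # factor over T alone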
Multiple Elimination

R, T, L  →  Sum out R  →  T, L  →  Sum out T  →  L

+r   +t   +l   0.024       +t   +l   0.051       +l   0.134
+r   +t   -l   0.056       +t   -l   0.119       -l   0.866
+r   -t   +l   0.002       -t   +l   0.083
+r   -t   -l   0.018       -t   -l   0.747
-r   +t   +l   0.027
-r   +t   -l   0.063
-r   -t   +l   0.081
-r   -t   -l   0.729
P(L): Marginalizing Early!

Initial factors (R; T | R; L | T):

+r   0.1          +r   +t   0.8        +t   +l   0.3
-r   0.9          +r   -t   0.2        +t   -l   0.7
                  -r   +t   0.1        -t   +l   0.1
                  -r   -t   0.9        -t   -l   0.9

Join R:                      Sum out R:

+r   +t   0.08               +t   0.17
+r   -t   0.02               -t   0.83
-r   +t   0.09
-r   -t   0.81               (the L | T factor is unchanged)
Marginalizing Early (aka VE*)

Join T:                      Sum out T:

+t   0.17        +t   +l   0.051       +l   0.134
-t   0.83        +t   -l   0.119       -l   0.866
                 -t   +l   0.083
+t   +l   0.3    -t   -l   0.747
+t   -l   0.7
-t   +l   0.1
-t   -l   0.9
* VE is variable elimination
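The two join/sum-out steps above chain into the whole early-marginalization computation of P(L); a compact sketch (the helper name is ours):

```python
# Variable elimination on the traffic chain R -> T -> L, marginalizing early.
# Dict-based factors; a sketch of the slides' computation of P(L).
P_R = {('+r',): 0.1, ('-r',): 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L_given_T = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

def join_then_sum_out(prior, cond):
    """Join a one-variable factor with a conditional over (that variable,
    child), then immediately sum the shared variable out."""
    out = {}
    for (v, w), p in cond.items():
        out[(w,)] = out.get((w,), 0.0) + prior[(v,)] * p
    return out

f_T = join_then_sum_out(P_R, P_T_given_R)     # eliminate R -> factor over T
f_L = join_then_sum_out(f_T, P_L_given_T)     # eliminate T -> P(L)
```

Joining and summing out in one pass keeps every intermediate factor over a single variable, which is exactly the saving over full enumeration.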
Traffic Domain

[Graph: R → T → L]                       P(L) = ?

! Inference by Enumeration               ! Variable Elimination

P(L) = Σ_t Σ_r P(L|t) P(r) P(t|r)        P(L) = Σ_t P(L|t) Σ_r P(r) P(t|r)

Join on r                                Join on r
Join on t                                Eliminate r
Eliminate r                              Join on t
Eliminate t                              Eliminate t
27
Marginalizing Early! (aka VE)

Join R  →  Sum out R  →  Join T  →  Sum out T

Initial factors:

+r   0.1        +r   +t   0.8        +t   +l   0.3
-r   0.9        +r   -t   0.2        +t   -l   0.7
                -r   +t   0.1        -t   +l   0.1
                -r   -t   0.9        -t   -l   0.9

Join R:              Sum out R:          Join T:                 Sum out T:

+r   +t   0.08       +t   0.17           +t   +l   0.051         +l   0.134
+r   -t   0.02       -t   0.83           +t   -l   0.119         -l   0.866
-r   +t   0.09                           -t   +l   0.083
-r   -t   0.81                           -t   -l   0.747
28
Evidence
§ If evidence, start with factors that select that evidence
§ No evidence uses these initial factors:

+r   0.1        +r   +t   0.8        +t   +l   0.3
-r   0.9        +r   -t   0.2        +t   -l   0.7
                -r   +t   0.1        -t   +l   0.1
                -r   -t   0.9        -t   -l   0.9

§ Computing P(L | +r), the initial factors become:

+r   0.1        +r   +t   0.8        +t   +l   0.3
                +r   -t   0.2        +t   -l   0.7
                                     -t   +l   0.1
                                     -t   -l   0.9

§ We eliminate all vars other than query + evidence
Evidence II
§ Result will be a selected joint of query and evidence
§ E.g. for P(L | +r), we’d end up with:

                     Normalize
+r   +l   0.026   →   +l   0.26
+r   -l   0.074       -l   0.74

§ To get our answer, just normalize this!

§ That’s it!
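The whole P(L | +r) computation as a sketch: select the evidence in every factor mentioning R, eliminate T, then normalize the selected joint (variable names are ours):

```python
# Traffic-chain CPTs from the slides.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L_given_T = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

evidence_r = '+r'
# Select the evidence, join on T, and sum T out -> selected joint P(+r, L).
selected = {}
for (t, l), p in P_L_given_T.items():
    selected[l] = selected.get(l, 0.0) + \
        P_R[evidence_r] * P_T_given_R[(evidence_r, t)] * p

Z = sum(selected.values())                            # = P(+r)
posterior = {l: p / Z for l, p in selected.items()}   # P(L | +r)
```

The unnormalized entries are exactly the 0.026 / 0.074 table on the slide; dividing by their sum (which equals P(+r)) gives the conditional.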
General Variable Elimination
§ Query: P(Q | E1 = e1, …, Ek = ek)

§ Start with initial factors:


§ Local CPTs (but instantiated by evidence)

§ While there are still hidden variables (not Q or


evidence):
§ Pick a hidden variable H
§ Join all factors mentioning H
§ Eliminate (sum out) H

§ Join all remaining factors and normalize
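The loop above can be sketched generically. Factors are (variables, table) pairs and all helper names are ours; this is an illustrative implementation, not the course's reference code.

```python
from itertools import product as iproduct

# A factor is (vars, table): vars is a tuple of variable names, table maps
# full assignment tuples (in vars order) to numbers.

def join(f1, f2):
    """Join two factors: build a factor over the union of their variables
    whose entries are pointwise products of the matching entries."""
    (v1, t1), (v2, t2) = f1, f2
    vs = v1 + tuple(x for x in v2 if x not in v1)
    domains = {}
    for vars_, table in ((v1, t1), (v2, t2)):
        for i, x in enumerate(vars_):
            domains.setdefault(x, set()).update(k[i] for k in table)
    out = {}
    for vals in iproduct(*(sorted(domains[x]) for x in vs)):
        a = dict(zip(vs, vals))
        out[vals] = t1[tuple(a[x] for x in v1)] * t2[tuple(a[x] for x in v2)]
    return (vs, out)

def sum_out(f, var):
    """Eliminate one variable by summing it out of the factor."""
    vs, t = f
    i = vs.index(var)
    out = {}
    for k, p in t.items():
        nk = k[:i] + k[i + 1:]
        out[nk] = out.get(nk, 0.0) + p
    return (vs[:i] + vs[i + 1:], out)

def eliminate(factors, hidden):
    """While there are hidden variables: join all factors mentioning H,
    then sum H out.  Finally join what remains and normalize."""
    for h in hidden:
        touching = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        joined = touching[0]
        for f in touching[1:]:
            joined = join(joined, f)
        factors = rest + [sum_out(joined, h)]
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    z = sum(result[1].values())
    return (result[0], {k: v / z for k, v in result[1].items()})

# Traffic example: P(L) by eliminating R, then T.
fR = (('R',), {('+r',): 0.1, ('-r',): 0.9})
fT = (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                   ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
fL = (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                   ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})
vars_, table = eliminate([fR, fT, fL], ['R', 'T'])
```

Evidence would be handled before the loop by selecting the observed values out of each initial factor, as on the previous slides.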


Variable Elimination Bayes Rule

Start / Select              Join on B                 Normalize

B    P(B)                   A    B    P               A    B    P
+b   0.1                    +a   +b   0.08            +a   +b   8/17
-b   0.9                    +a   -b   0.09            +a   -b   9/17

B    A    P(A|B)
+b   +a   0.8
+b   -a   0.2
-b   +a   0.1
-b   -a   0.9
Example
Query: P(B | j, m)

Choose A
Example

Choose E

Finish with B

Normalize
Variable Elimination

[Graph: B → A ← E;  A → J;  A → M]

P(B, j, m) = Σ_{A,E} P(B, j, m, A, E)

           = Σ_{A,E} P(B) P(E) P(A | B, E) P(m | A) P(j | A)

           = Σ_E P(B) P(E) Σ_A P(A | B, E) P(m | A) P(j | A)

           = Σ_E P(B) P(E) Σ_A P(m, j, A | B, E)

           = Σ_E P(B) P(E) P(m, j | B, E)

           = P(B) Σ_E P(m, j, E | B)

           = P(B) P(m, j | B)
Another Variable Elimination Example

Computational complexity critically depends on the largest factor being
generated in this process. Size of factor = number of entries in table. In
the example above (assuming binary variables) all factors generated are of
size 2 --- as they all only have one variable (Z, Z, and X3 respectively).
36
Variable Elimination Ordering

! For the query P(Xn | y1, …, yn) work through the following two different orderings
as done in the previous slide: Z, X1, …, Xn−1 and X1, …, Xn−1, Z. What is the size of
the maximum factor generated for each of the orderings?

[Figure: the two networks from the previous slide]

! Answer: 2^(n+1) versus 2^2 (assuming binary)
! In general: the ordering can greatly affect efficiency.
37
VE: Computational and Space Complexity

! The computational and space complexity of variable elimination is
determined by the largest factor

! The elimination ordering can greatly affect the size of the largest factor.
! E.g., previous slide’s example: 2^n vs. 2

! Does there always exist an ordering that only results in small factors?
! No!
38
Exact Inference: Variable Elimination
§ Remaining Issues:
§ Complexity: exponential in tree width (size of the
largest factor created)
§ Best elimination ordering? NP-hard problem

§ What you need to know:


§ Should be able to run it on small examples, understand
the factor creation / reduction flow
§ Better than enumeration: saves time by marginalizing
variables as soon as possible rather than at the end

§ We have seen a special case of VE already


§ HMM Forward Inference
Variable Elimination

! Interleave joining and marginalizing

! d^k entries computed for a factor over k variables with domain sizes d

! Ordering of elimination of hidden variables can affect size of factors generated

! Worst case: running time exponential in the size of the Bayes’ net
40
