22 BN Inference
22 BN Inference
22 BN Inference
Hanna Hajishirzi
3
4
Reachability (D-Separation)
! Question: Are X and Y Active Triples Inactive Triples
conditionally independent (dependent) (Independent)
given evidence vars {Z}?
! Yes, if X and Y “separated” by Z
! Look for active paths from X to Y
! No active paths = independence!
! A path is active if each triple
is active:
! Causal chain A → B → C where B
is unobserved (either direction)
! Common cause A ← B → C where
B is unobserved
! Common effect (aka v-structure)
A → B ← C where B or one of its
descendents is observed
! All it takes to block a path is
a single inactive segment
! 5
Bayes Net Joint Distribution
Example:#Alarm#Network#
B# P(B)# E# P(E)#
+b# 0.001#
B# E# +e# 0.002#
Qb# 0.999# Qe# 0.998#
A#
A# J# P(J|A)# A# M# P(M|A)#
B# E# A# P(A|B,E)#
+a# +j# 0.9# +a# +m# 0.7#
+b# +e# +a# 0.95#
+a# Qj# 0.1# +a# Qm# 0.3#
Qa# +j# 0.05# J# M# Qa# +m# 0.01#
+b# +e# Qa# 0.05#
+b# Qe# +a# 0.94#
Qa# Qj# 0.95# Qa# Qm# 0.99#
+b# Qe# Qa# 0.06#
Qb# +e# +a# 0.29#
Qb# +e# Qa# 0.71#
Qb# Qe# +a# 0.001#
Qb# Qe# Qa# 0.999#
6
Bayes Net Joint Distribution
Example:#Alarm#Network#
B# P(B)# E# P(E)#
+b# 0.001#
B# E# +e# 0.002#
Qb# 0.999# Qe# 0.998#
A#
A# J# P(J|A)# A# M# P(M|A)#
B# E# A# P(A|B,E)#
+a# +j# 0.9# +a# +m# 0.7#
+b# +e# +a# 0.95#
+a# Qj# 0.1# +a# Qm# 0.3#
Qa# +j# 0.05# J# M# Qa# +m# 0.01#
+b# +e# Qa# 0.05#
+b# Qe# +a# 0.94#
Qa# Qj# 0.95# Qa# Qm# 0.99#
+b# Qe# Qa# 0.06#
Qb# +e# +a# 0.29#
Qb# +e# Qa# 0.71#
Qb# Qe# +a# 0.001#
Qb# Qe# Qa# 0.999#
7
Probabilistic Inference
§ Probabilistic inference: compute a desired probability from
other known probabilities (e.g. conditional from joint)
! Most#likely#explana)on:#
9
Inference by Enumeration
§ General case:
§ Evidence variables:
§ Query* variable:
§ Hidden variables: All variables
§ We want:
§ First, select the entries consistent with the evidence
§ Second, sum out H to get joint of Query and evidence:
=P (B)P (+e)P (+a|B, +e)P (+j| + a)P (+m| + a) + P (B)P (+e)P ( a|B, +e)P (+j| a)P (+m| a)
P (B)P ( e)P (+a|B, e)P (+j| + a)P (+m| + a) + P (B)P ( e)P ( a|B, e)P (+j| a)P (+m| a)
11
Inference#by#Enumera)on?#
Inference by Enumerataion
P (Antilock|observed variables) = ?
12
Variable Elimination
§ Why is inference by enumeration so slow?
§ You join up the whole joint distribution before you
sum out the hidden variables
§ You end up repeating a lot of work!
§ Variable elimination:
§ Interleave joining and marginalization: Store
initial results and then join with the rest
Example: Traffic Domain
§ Random R +r
0.1
-‐r
0.9
Variables
§ R: Raining T
§ T: Traffic +r
+t
0.8
+r
-‐t
0.2
§ L: Late for class! L -‐r
+t
0.1
-‐r
-‐t
0.9
§ First query: P(L)
+t
+l
0.3
+t
-‐l
0.7
-‐t
+l
0.1
-‐t
-‐l
0.9
Variable Elimination Outline
§ Maintain a set of tables called factors
§ Initial factors are local CPTs (one per node)
+r
+t
0.8
+r
0.1
+t
+l
0.3
+r
-‐t
0.2
-‐r
0.9
-‐t
+l
0.1
-‐r
+t
0.1
-‐r
-‐t
0.9
§ Example: Join on R
R,T
R
+r
0.1
+r
+t
0.8
+r
+t
0.08
-‐r
0.9
+r
-‐t
0.2
+r
-‐t
0.02
T
-‐r
+t
0.1
-‐r
+t
0.09
-‐r
-‐t
0.9
-‐r
-‐t
0.81
§ Computation for each entry: pointwise products
Example: Multiple Joins
+r
0.1
R -‐r
0.9
Join R
+r
+t
0.08
R, T
+r
-‐t
0.02
T +r
+t
0.8
-‐r
+t
0.09
+r
-‐t
0.2
-‐r
-‐t
0.81
L
-‐r
+t
0.1
-‐r
-‐t
0.9
L
+t
+l
0.3
+t
+l
0.3
+t
-‐l
0.7
+t
-‐l
0.7
-‐t
+l
0.1
-‐t
+l
0.1
-‐t
-‐l
0.9
-‐t
-‐l
0.9
Example: Multiple Joins
R, T, L
+r
+t
0.08
+r
-‐t
0.02
R, T -‐r
+t
0.09
Join T +r
+t
+l
0.024
-‐r
-‐t
0.81
+r
+t
-‐l
0.056
+r
-‐t
+l
0.002
L
+r
-‐t
-‐l
0.018
+t
+l
0.3
-‐r
+t
+l
0.027
+t
-‐l
0.7
-‐r
+t
-‐l
0.063
-‐t
+l
0.1
-‐t
-‐l
0.9
-‐r
-‐t
+l
0.081
-‐r
-‐t
-‐l
0.729
Operation 2: Eliminate
§ Second basic operation: marginalization
§ Take a factor and sum out a variable
§ Shrinks a factor to a smaller one
§ A projection operation
§ Example:
+r
+t
0.08
+r
-‐t
0.02
+t
0.17
-‐r
+t
0.09
-‐t
0.83
-‐r
-‐t
0.81
Multiple Elimination
R, T, L T, L L
Sum Sum
+r
+t
+l
0.024
out R out T
+r
+t
-‐l
0.056
+r
-‐t
+l
0.002
+t
+l
0.051
+l
0.134
-‐l
0.886
+r
-‐t
-‐l
0.018
+t
-‐l
0.119
-‐r
+t
+l
0.027
-‐t
+l
0.083
-‐r
+t
-‐l
0.063
-‐t
-‐l
0.747
-‐r
-‐t
+l
0.081
-‐r
-‐t
-‐l
0.729
P(L) : Marginalizing Early!
+r
0.1
-‐r
0.9
Join R Sum out R
R +r
+t
0.08
+r
+t
0.8
+r
-‐t
0.02
+t
0.17
-‐r
+t
0.09
-‐t
0.83
+r
-‐t
0.2
-‐r
+t
0.1
-‐r
-‐t
0.81
T
T
-‐r
-‐t
0.9
R, T
+t
0.17
-‐t
0.83
+t
+l
0.051
+l
0.134
+t
-‐l
0.119
-‐l
0.886
+t
+l
0.3
-‐t
+l
0.083
+t
-‐l
0.7
-‐t
-‐l
0.747
-‐t
+l
0.1
-‐t
-‐l
0.9
* VE is variable elimination
Traffic#Domain#
Traffic Domain
R P (L) = ?
T ! Inference#by#Enumera)on# ! Variable#Elimina)on#
XX X X
= P (L|t)P (r)P (t|r) = P (L|t) P (r)P (t|r)
L t r t r
Join#on#r# Join#on#r#
Join#on#t# Eliminate#r#
Eliminate#r# Join#on#t#
Eliminate#t# Eliminate#t#
27
Marginalizing Early
Marginalizing#Early!#(aka#VE)#
Join#R# Sum#out#R# Join#T# Sum#out#T#
28
Evidence
§ If evidence, start with factors that select that evidence
§ No evidence uses these initial factors:
Normalize
+r
+l
0.026
+l
0.26
+r
-‐l
0.074
-‐l
0.74
§ That’s it!
General Variable Elimination
§ Query:
B a, B
B P
+b 0.1
-b 0.9 a
A B P A B P
B A P
+a +b 0.08 +a +b 8/17
+b +a 0.8 +a -b 0.09 +a -b 9/17
b -a 0.2
-b +a 0.1
-b -a 0.9
Example
Query:
Choose A
Example
Choose E
Finish with B
Normalize
Variable Elimination
P(B, j, m) = ∑ P(b, j, m, A, E) = B E
A,E
= ∑ P(B)P(E)∑ P(m, j, A | B, E )
E A
= P(B)P(m, j | B)
Another#Variable#Elimina)on#Example#
Another#Variable#Elimina)o
Another Example
Another#Variable#Elimina)on#Example#
Another#Variable#Elimina)on#Example#
Computa)onal#complexity#cri)cally#
Computa)onal#complexity#cri)cally#
depends#on#the#largest#factor#being#
depends#on#the#largest#factor#being#
generated#in#this#process.##Size#of#facto
generated#in#this#process.##Size#of#factor
Computa)onal#complexity#cri)cally#
=#number#of#entries#in#table.##In#
=#number#of#entries#in#table.##In#
depends#on#the#largest#factor#being#
example#above#(assuming#binary)#all#
example#above#(assuming#binary)#all#
generated#in#this#process.##Size#of#fact
factors#generated#are#of#size#2#QQQ#as#
factors#generated#are#of#size#2#QQQ#as#
=#number#of#entries#in#table.##In#
they#all#only#have#one#variable#(Z,#Z,#
they#all#only#have#one#variable#(Z,#Z,#
example#above#(assuming#binary)#all#
and#X3#respec)vely).##
and#Xfactors#generated#are#of#size#2#QQQ#as#
3#respec)vely).##
they#all#only#have#one#variable#(Z,#Z,#
and#X3#respec)vely).##
36
Variable#Elimina)on#Ordering#
Variable#Elimina)on#Ordering#
Variable Elimination Ordering
! For#the#query#P(Xn|y1,…,yn)#work#through#the#following#two#different#orderin
as#done#in#previous#slide:#Z,#X1,#…,#XnQ1#and#X1,#…,#XnQ1,#Z.##What#is#the#size#of#th
maximum#factor#generated#for#each#of#the#orderings?#
! For#the#query#P(Xn|y1,…,yn)#work#through#the#following#two#different#orderings#
as#done#in#previous#slide:#Z,#X1,#…,#XnQ1#and#X1,#…,#XnQ1,#Z.##What#is#the#size#of#the#
maximum#factor#generated#for#each#of#the#orderings?#
…#
…#
…# …#
! Answer:#2n+1#versus#22#(assuming#binary)#
! Answer:#2n+1#versus#22#(assuming#binary)#
! In#general:#the#ordering#can#greatly#affect#efficiency.###
! In#general:#the#ordering#can#greatly#affect#efficiency.###
37
VE: Computational and Space
Complexity
VE:#Computa)onal#and#Space#Complexity#
! The#computa)onal#and#space#complexity#of#variable#elimina)on#is#
determined#by#the#largest#factor#
! The#elimina)on#ordering#can#greatly#affect#the#size#of#the#largest#factor.###
! E.g.,#previous#slide’s#example#2n#vs.#2#
! Does#there#always#exist#an#ordering#that#only#results#in#small#factors?#
! No!#
38
Exact Inference: Variable Elimination
§ Remaining Issues:
§ Complexity: exponential in tree width (size of the
largest factor created)
§ Best elimination ordering? NP-hard problem
! dk#entries#computed#for#a#factor#over#k#
variables#with#domain#sizes#d#
! Ordering#of#elimina)on#of#hidden#variables#
can#affect#size#of#factors#generated#
! Worst#case:#running#)me#exponen)al#in#the#
size#of#the#Bayes’#net#
40