22 BN Inference


CSE 473: Artificial Intelligence

Bayesian Networks: Inference

Hanna Hajishirzi

Many slides over the course adapted from either Luke


Zettlemoyer, Pieter Abbeel, Dan Klein, Stuart Russell or Andrew
Moore
1
Outline

§ Bayesian Networks Inference


§ Exact Inference: Variable Elimination
§ Approximate Inference: Sampling
Bayes Net Representation

3
4
Reachability (D-Separation)
! Question: Are X and Y conditionally independent given evidence vars {Z}?
! Yes, if X and Y “separated” by Z
! Look for active paths from X to Y
! No active paths = independence!
! A path is active if each triple is active:
! Causal chain A → B → C where B is unobserved (either direction)
! Common cause A ← B → C where B is unobserved
! Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed
! All it takes to block a path is a single inactive segment

[Figure: examples of active triples (dependent) vs. inactive triples (independent)]
5
Bayes Net Joint Distribution
Example: Alarm Network

[Graph: B → A ← E;  A → J;  A → M]

B    P(B)          E    P(E)
+b   0.001         +e   0.002
-b   0.999         -e   0.998

B    E    A    P(A|B,E)
+b   +e   +a   0.95
+b   +e   -a   0.05
+b   -e   +a   0.94
+b   -e   -a   0.06
-b   +e   +a   0.29
-b   +e   -a   0.71
-b   -e   +a   0.001
-b   -e   -a   0.999

A    J    P(J|A)         A    M    P(M|A)
+a   +j   0.9            +a   +m   0.7
+a   -j   0.1            +a   -m   0.3
-a   +j   0.05           -a   +m   0.01
-a   -j   0.95           -a   -m   0.99
6
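The full joint over the alarm network is just the product of these local CPTs: P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A). A minimal Python sketch of that product (the dictionary and function names are ours, not from the slides):

```python
# CPTs of the alarm network from the slide, keyed by assignments
# ('+b' = burglary true, '-b' = false, and so on).
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e', '+a'): 0.95,  ('+b', '+e', '-a'): 0.05,
       ('+b', '-e', '+a'): 0.94,  ('+b', '-e', '-a'): 0.06,
       ('-b', '+e', '+a'): 0.29,  ('-b', '+e', '-a'): 0.71,
       ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999}
P_J = {('+a', '+j'): 0.9, ('+a', '-j'): 0.1, ('-a', '+j'): 0.05, ('-a', '-j'): 0.95}
P_M = {('+a', '+m'): 0.7, ('+a', '-m'): 0.3, ('-a', '+m'): 0.01, ('-a', '-m'): 0.99}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of the local CPT entries."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

# e.g. a burglary, no earthquake, alarm sounds, both neighbors call:
p = joint('+b', '-e', '+a', '+j', '+m')
```

The product of five small tables replaces one table of 2^5 = 32 entries; the joint still sums to 1 over all assignments.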
Probabilistic Inference
§ Probabilistic inference: compute a desired probability from
other known probabilities (e.g. conditional from joint)

§ We generally compute conditional probabilities


§ P(on time | no reported accidents) = 0.90
§ These represent the agent’s beliefs given the evidence

§ Probabilities change with new evidence:


§ P(on time | no accidents, 5 a.m.) = 0.95
§ P(on time | no accidents, 5 a.m., raining) = 0.80
§ Observing new evidence causes beliefs to be updated
Inference

! Inference: calculating some useful quantity from a joint probability distribution

! Examples:
! Posterior probability
! Most likely explanation
9
Inference by Enumeration
§ General case:
§ Evidence variables: E1 … Ek = e1 … ek
§ Query* variable: Q
§ Hidden variables: H1 … Hr      (together: all the variables)

§ We want: P(Q | e1 … ek)
§ First, select the entries consistent with the evidence
§ Second, sum out H to get joint of Query and evidence:

§ Finally, normalize the remaining entries to conditionalize


§ Obvious problems:
§ Worst-case time complexity O(d^n)
§ Space complexity O(d^n) to store the joint distribution
Inference in BN by Enumeration
Inference by Enumeration in Bayes’ Net
! Given unlimited time, inference in BNs is easy

[Graph: B → A ← E;  A → J;  A → M]

! Reminder of inference by enumeration by example:

P(B | +j, +m) ∝_B P(B, +j, +m)
             = Σ_{e,a} P(B, e, a, +j, +m)
             = Σ_{e,a} P(B) P(e) P(a|B, e) P(+j|a) P(+m|a)
             = P(B)P(+e)P(+a|B,+e)P(+j|+a)P(+m|+a) + P(B)P(+e)P(-a|B,+e)P(+j|-a)P(+m|-a)
             + P(B)P(-e)P(+a|B,-e)P(+j|+a)P(+m|+a) + P(B)P(-e)P(-a|B,-e)P(+j|-a)P(+m|-a)
11
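The enumeration above can be run directly: sum the hidden variables e and a out of the full joint, then normalize over B. A sketch using the alarm CPTs from the earlier slide (the compact CPT encoding and names are ours):

```python
from itertools import product

# Alarm-network CPTs (values from the slides); '+'/'-' prefix marks true/false.
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e'): 0.95, ('+b', '-e'): 0.94,
       ('-b', '+e'): 0.29, ('-b', '-e'): 0.001}   # P(+a | B, E)
P_J = {'+a': 0.9, '-a': 0.05}                     # P(+j | A)
P_M = {'+a': 0.7, '-a': 0.01}                     # P(+m | A)

def joint(b, e, a, j, m):
    pa = P_A[(b, e)] if a == '+a' else 1 - P_A[(b, e)]
    pj = P_J[a] if j == '+j' else 1 - P_J[a]
    pm = P_M[a] if m == '+m' else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Enumerate: sum the hidden variables e, a out of the joint, then normalize.
unnorm = {b: sum(joint(b, e, a, '+j', '+m')
                 for e, a in product(('+e', '-e'), ('+a', '-a')))
          for b in ('+b', '-b')}
Z = sum(unnorm.values())
posterior = {b: p / Z for b, p in unnorm.items()}   # P(B | +j, +m)
```

This builds the whole joint over B, E, A (for the fixed evidence) before summing, which is exactly the waste variable elimination avoids.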
Inference by Enumeration?

P(Antilock | observed variables) = ?
12
Variable Elimination
§ Why is inference by enumeration so slow?
§ You join up the whole joint distribution before you
sum out the hidden variables
§ You end up repeating a lot of work!

§ Idea: interleave joining and marginalizing!


§ Called Variable Elimination
§ Still NP-hard, but usually much faster than inference
by enumeration

§ We’ll need some new notation to define VE


Review

§ Joint distribution: P(X,Y)
§ Entries P(x,y) for all x, y
§ Sums to 1

T      W      P
hot    sun    0.4
hot    rain   0.1
cold   sun    0.2
cold   rain   0.3

§ Selected joint: P(x,Y)
§ A slice of the joint distribution
§ Entries P(x,y) for fixed x, all y
§ Sums to P(x)

T      W      P
cold   sun    0.2
cold   rain   0.3
Review
§ Family of conditionals: P(X | Y)
§ Multiple conditionals
§ Entries P(x | y) for all x, y
§ Sums to |Y|

T      W      P
hot    sun    0.8
hot    rain   0.2
cold   sun    0.4
cold   rain   0.6

§ Single conditional: P(Y | x)
§ Entries P(y | x) for fixed x, all y
§ Sums to 1

T      W      P
cold   sun    0.4
cold   rain   0.6
Review

§ Specified family: P(y | X)
§ Entries P(y | x) for fixed y, but for all x
§ Sums to … who knows!

T      W      P
hot    rain   0.2
cold   rain   0.6

§ In general, when we write P(Y1 … YN | X1 … XM)
§ It is a “factor,” a multi-dimensional array
§ Its values are all P(y1 … yN | x1 … xM)
§ Any assigned X or Y is a dimension missing (selected) from the array
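One concrete way to hold such a factor is a dict keyed by assignment tuples (an illustrative representation, not prescribed by the slides). Selecting an assigned variable removes that dimension from the array, exactly as described above:

```python
# A factor as a dict from assignment tuples to numbers.
# Here the joint P(T, W) from the review example.
P_TW = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
        ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

def select(factor, axis, value):
    """Fix one variable to an observed value: the selected dimension
    disappears from the keys of the resulting factor."""
    return {k[:axis] + k[axis + 1:]: v
            for k, v in factor.items() if k[axis] == value}

P_cold_W = select(P_TW, 0, 'cold')   # the selected joint P(cold, W)
```

The result is a factor over W alone, and its entries sum to P(cold) = 0.5 rather than to 1.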
Inference
§ Inference is expensive with enumeration

§ Variable elimination:
§ Interleave joining and marginalization: Store
initial results and then join with the rest
Example: Traffic Domain
§ Random Variables
§ R: Raining
§ T: Traffic
§ L: Late for class!
§ First query: P(L)

[Graph: R → T → L]

+r   0.1
-r   0.9

+r   +t   0.8
+r   -t   0.2
-r   +t   0.1
-r   -t   0.9

+t   +l   0.3
+t   -l   0.7
-t   +l   0.1
-t   -l   0.9
Variable Elimination Outline
§ Maintain a set of tables called factors
§ Initial factors are local CPTs (one per node)

+r   0.1        +r   +t   0.8        +t   +l   0.3
-r   0.9        +r   -t   0.2        +t   -l   0.7
                -r   +t   0.1        -t   +l   0.1
                -r   -t   0.9        -t   -l   0.9

§ Any known values are selected
§ E.g. if we know L = +l, the initial factors are

+r   0.1        +r   +t   0.8        +t   +l   0.3
-r   0.9        +r   -t   0.2        -t   +l   0.1
                -r   +t   0.1
                -r   -t   0.9

§ VE: Alternately join factors and eliminate variables
Operation 1: Join Factors
§ First basic operation: joining factors
§ Combining factors:
§ Just like a database join
§ Get all factors over the joining variable
§ Build a new factor over the union of the variables involved

§ Example: Join on R   (R and T | R  →  R, T)

+r   0.1        +r   +t   0.8        →   +r   +t   0.08
-r   0.9        +r   -t   0.2            +r   -t   0.02
                -r   +t   0.1            -r   +t   0.09
                -r   -t   0.9            -r   -t   0.81

§ Computation for each entry: pointwise products
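A sketch of this join in the dict-based factor representation (names are ours): each output entry is the pointwise product of the matching entries, here P(r) · P(t | r) = P(r, t).

```python
# The two factors that mention R, as dicts from assignments to probabilities.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}

# Join on R: a new factor over (R, T) with entry P(r) * P(t | r) = P(r, t).
P_RT = {(r, t): P_R[r] * p for (r, t), p in P_T_given_R.items()}
```

Because one input was a prior and the other a full conditional, the joined factor is a genuine joint and sums to 1; in general a joined factor need not.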
Example: Multiple Joins

[Graph: R → T → L]

Join R:

+r   0.1        +r   +t   0.8        →   +r   +t   0.08
-r   0.9        +r   -t   0.2            +r   -t   0.02
                -r   +t   0.1            -r   +t   0.09
                -r   -t   0.9            -r   -t   0.81

+t   +l   0.3   (the L factor is unchanged)
+t   -l   0.7
-t   +l   0.1
-t   -l   0.9
Example: Multiple Joins

Join T:  R, T  and  L  →  R, T, L

+r   +t   0.08       +t   +l   0.3        →   +r   +t   +l   0.024
+r   -t   0.02       +t   -l   0.7            +r   +t   -l   0.056
-r   +t   0.09       -t   +l   0.1            +r   -t   +l   0.002
-r   -t   0.81       -t   -l   0.9            +r   -t   -l   0.018
                                              -r   +t   +l   0.027
                                              -r   +t   -l   0.063
                                              -r   -t   +l   0.081
                                              -r   -t   -l   0.729
Operation 2: Eliminate
§ Second basic operation: marginalization
§ Take a factor and sum out a variable
§ Shrinks a factor to a smaller one
§ A projection operation
§ Example: sum out R

+r   +t   0.08       →   +t   0.17
+r   -t   0.02           -t   0.83
-r   +t   0.09
-r   -t   0.81
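Summing out, in the same dict-based sketch (names are ours): drop the eliminated position from each assignment tuple and accumulate the entries that collide.

```python
# A factor over (R, T), from the join example above.
P_RT = {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
        ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}

def sum_out(factor, axis):
    """Marginalize out the variable at the given position: project each
    key to the remaining variables and add up colliding entries."""
    out = {}
    for key, p in factor.items():
        smaller = key[:axis] + key[axis + 1:]
        out[smaller] = out.get(smaller, 0.0) + p
    return out

P_T = sum_out(P_RT, 0)   # factor over T alone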
Multiple Elimination

R, T, L  →  Sum out R  →  T, L  →  Sum out T  →  L

+r   +t   +l   0.024       +t   +l   0.051       +l   0.134
+r   +t   -l   0.056       +t   -l   0.119       -l   0.866
+r   -t   +l   0.002       -t   +l   0.083
+r   -t   -l   0.018       -t   -l   0.747
-r   +t   +l   0.027
-r   +t   -l   0.063
-r   -t   +l   0.081
-r   -t   -l   0.729
P(L): Marginalizing Early!

Initial factors (R; T | R; L | T):

+r   0.1          +r   +t   0.8        +t   +l   0.3
-r   0.9          +r   -t   0.2        +t   -l   0.7
                  -r   +t   0.1        -t   +l   0.1
                  -r   -t   0.9        -t   -l   0.9

Join R:                      Sum out R:

+r   +t   0.08               +t   0.17
+r   -t   0.02               -t   0.83
-r   +t   0.09
-r   -t   0.81               (the L | T factor is unchanged)
Marginalizing Early (aka VE*)

Join T:                      Sum out T:

+t   0.17        +t   +l   0.051       +l   0.134
-t   0.83        +t   -l   0.119       -l   0.866
                 -t   +l   0.083
+t   +l   0.3    -t   -l   0.747
+t   -l   0.7
-t   +l   0.1
-t   -l   0.9
* VE is variable elimination
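The two join/sum-out steps above chain into the whole early-marginalization computation of P(L); a compact sketch (the helper name is ours):

```python
# Variable elimination on the traffic chain R -> T -> L, marginalizing early.
# Dict-based factors; a sketch of the slides' computation of P(L).
P_R = {('+r',): 0.1, ('-r',): 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L_given_T = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

def join_then_sum_out(prior, cond):
    """Join a one-variable factor with a conditional over (that variable,
    child), then immediately sum the shared variable out."""
    out = {}
    for (v, w), p in cond.items():
        out[(w,)] = out.get((w,), 0.0) + prior[(v,)] * p
    return out

f_T = join_then_sum_out(P_R, P_T_given_R)     # eliminate R -> factor over T
f_L = join_then_sum_out(f_T, P_L_given_T)     # eliminate T -> P(L)
```

Joining and summing out in one pass keeps every intermediate factor over a single variable, which is exactly the saving over full enumeration.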
Traffic Domain

[Graph: R → T → L]                       P(L) = ?

! Inference by Enumeration               ! Variable Elimination

P(L) = Σ_t Σ_r P(L|t) P(r) P(t|r)        P(L) = Σ_t P(L|t) Σ_r P(r) P(t|r)

Join on r                                Join on r
Join on t                                Eliminate r
Eliminate r                              Join on t
Eliminate t                              Eliminate t
27
Marginalizing Early! (aka VE)

Join R  →  Sum out R  →  Join T  →  Sum out T

Initial factors:

+r   0.1        +r   +t   0.8        +t   +l   0.3
-r   0.9        +r   -t   0.2        +t   -l   0.7
                -r   +t   0.1        -t   +l   0.1
                -r   -t   0.9        -t   -l   0.9

Join R:              Sum out R:          Join T:                 Sum out T:

+r   +t   0.08       +t   0.17           +t   +l   0.051         +l   0.134
+r   -t   0.02       -t   0.83           +t   -l   0.119         -l   0.866
-r   +t   0.09                           -t   +l   0.083
-r   -t   0.81                           -t   -l   0.747
28
Evidence
§ If evidence, start with factors that select that evidence
§ No evidence uses these initial factors:

+r   0.1        +r   +t   0.8        +t   +l   0.3
-r   0.9        +r   -t   0.2        +t   -l   0.7
                -r   +t   0.1        -t   +l   0.1
                -r   -t   0.9        -t   -l   0.9

§ Computing P(L | +r), the initial factors become:

+r   0.1        +r   +t   0.8        +t   +l   0.3
                +r   -t   0.2        +t   -l   0.7
                                     -t   +l   0.1
                                     -t   -l   0.9

§ We eliminate all vars other than query + evidence
Evidence II
§ Result will be a selected joint of query and evidence
§ E.g. for P(L | +r), we’d end up with:

                     Normalize
+r   +l   0.026   →   +l   0.26
+r   -l   0.074       -l   0.74

§ To get our answer, just normalize this!

§ That’s it!
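The whole P(L | +r) computation as a sketch: select the evidence in every factor mentioning R, eliminate T, then normalize the selected joint (variable names are ours):

```python
# Traffic-chain CPTs from the slides.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L_given_T = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

evidence_r = '+r'
# Select the evidence, join on T, and sum T out -> selected joint P(+r, L).
selected = {}
for (t, l), p in P_L_given_T.items():
    selected[l] = selected.get(l, 0.0) + \
        P_R[evidence_r] * P_T_given_R[(evidence_r, t)] * p

Z = sum(selected.values())                            # = P(+r)
posterior = {l: p / Z for l, p in selected.items()}   # P(L | +r)
```

The unnormalized entries are exactly the 0.026 / 0.074 table on the slide; dividing by their sum (which equals P(+r)) gives the conditional.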
General Variable Elimination
§ Query: P(Q | E1 = e1, …, Ek = ek)

§ Start with initial factors:


§ Local CPTs (but instantiated by evidence)

§ While there are still hidden variables (not Q or


evidence):
§ Pick a hidden variable H
§ Join all factors mentioning H
§ Eliminate (sum out) H

§ Join all remaining factors and normalize
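The loop above can be sketched generically. Factors are (variables, table) pairs and all helper names are ours; this is an illustrative implementation, not the course's reference code.

```python
from itertools import product as iproduct

# A factor is (vars, table): vars is a tuple of variable names, table maps
# full assignment tuples (in vars order) to numbers.

def join(f1, f2):
    """Join two factors: build a factor over the union of their variables
    whose entries are pointwise products of the matching entries."""
    (v1, t1), (v2, t2) = f1, f2
    vs = v1 + tuple(x for x in v2 if x not in v1)
    domains = {}
    for vars_, table in ((v1, t1), (v2, t2)):
        for i, x in enumerate(vars_):
            domains.setdefault(x, set()).update(k[i] for k in table)
    out = {}
    for vals in iproduct(*(sorted(domains[x]) for x in vs)):
        a = dict(zip(vs, vals))
        out[vals] = t1[tuple(a[x] for x in v1)] * t2[tuple(a[x] for x in v2)]
    return (vs, out)

def sum_out(f, var):
    """Eliminate one variable by summing it out of the factor."""
    vs, t = f
    i = vs.index(var)
    out = {}
    for k, p in t.items():
        nk = k[:i] + k[i + 1:]
        out[nk] = out.get(nk, 0.0) + p
    return (vs[:i] + vs[i + 1:], out)

def eliminate(factors, hidden):
    """While there are hidden variables: join all factors mentioning H,
    then sum H out.  Finally join what remains and normalize."""
    for h in hidden:
        touching = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        joined = touching[0]
        for f in touching[1:]:
            joined = join(joined, f)
        factors = rest + [sum_out(joined, h)]
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    z = sum(result[1].values())
    return (result[0], {k: v / z for k, v in result[1].items()})

# Traffic example: P(L) by eliminating R, then T.
fR = (('R',), {('+r',): 0.1, ('-r',): 0.9})
fT = (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                   ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
fL = (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                   ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})
vars_, table = eliminate([fR, fT, fL], ['R', 'T'])
```

Evidence would be handled before the loop by selecting the observed values out of each initial factor, as on the previous slides.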


Variable Elimination Bayes Rule

Start / Select              Join on B                 Normalize

B    P(B)                   A    B    P               A    B    P
+b   0.1                    +a   +b   0.08            +a   +b   8/17
-b   0.9                    +a   -b   0.09            +a   -b   9/17

B    A    P(A|B)
+b   +a   0.8
+b   -a   0.2
-b   +a   0.1
-b   -a   0.9
Example
Query: P(B | j, m)

Choose A
Example

Choose E

Finish with B

Normalize
Variable Elimination

[Graph: B → A ← E;  A → J;  A → M]

P(B, j, m) = Σ_{A,E} P(B, j, m, A, E)

           = Σ_{A,E} P(B) P(E) P(A | B, E) P(m | A) P(j | A)

           = Σ_E P(B) P(E) Σ_A P(A | B, E) P(m | A) P(j | A)

           = Σ_E P(B) P(E) Σ_A P(m, j, A | B, E)

           = Σ_E P(B) P(E) P(m, j | B, E)

           = P(B) Σ_E P(m, j, E | B)

           = P(B) P(m, j | B)
Another Variable Elimination Example

Computational complexity critically depends on the largest factor being
generated in this process. Size of factor = number of entries in table. In
the example above (assuming binary variables) all factors generated are of
size 2 --- as they all only have one variable (Z, Z, and X3 respectively).
36
Variable Elimination Ordering

! For the query P(Xn | y1, …, yn) work through the following two different orderings
as done in the previous slide: Z, X1, …, Xn−1 and X1, …, Xn−1, Z. What is the size of
the maximum factor generated for each of the orderings?

[Figure: the two networks from the previous slide]

! Answer: 2^(n+1) versus 2^2 (assuming binary)
! In general: the ordering can greatly affect efficiency.
37
VE: Computational and Space Complexity

! The computational and space complexity of variable elimination is
determined by the largest factor

! The elimination ordering can greatly affect the size of the largest factor.
! E.g., previous slide’s example: 2^n vs. 2

! Does there always exist an ordering that only results in small factors?
! No!
38
Exact Inference: Variable Elimination
§ Remaining Issues:
§ Complexity: exponential in tree width (size of the
largest factor created)
§ Best elimination ordering? NP-hard problem

§ What you need to know:


§ Should be able to run it on small examples, understand
the factor creation / reduction flow
§ Better than enumeration: saves time by marginalizing
variables as soon as possible rather than at the end

§ We have seen a special case of VE already


§ HMM Forward Inference
Variable Elimination

! Interleave joining and marginalizing

! d^k entries computed for a factor over k variables with domain sizes d

! Ordering of elimination of hidden variables can affect size of factors generated

! Worst case: running time exponential in the size of the Bayes’ net
40
