Artificial Intelligence: Cognitive Agents: AI, Uncertainty & Bayesian Networks
2015-03-10 / 03-12
Kim, Byoung-Hee
Biointelligence Laboratory
Seoul National University
http://bi.snu.ac.kr
A Bayesian network is a graphical model
for probabilistic relationships among a set of variables
Causality, Dependency
From correlation to causality
Qualitative methods
Quantitative methods
Granger causality index
http://www.google.com/trends/
Google's CausalImpact
R package for causal inference in time series
Official posting: http://google-opensource.blogspot.kr/2014/09/causalimpact-new-open-source-package.html
Introductory article (in English): https://gigaom.com/2014/09/11/google-has-open-sourced-a-tool-for-inferring-cause-from-correlations/
[Figure: the sprinkler Bayesian network with nodes SEASON, SPRINKLER, RAIN, WET, SLIPPERY]
P(SEASON): DRY 0.6, RAINY 0.4
P(SPRINKLER | SEASON), P(RAIN | SEASON): conditional probability tables indexed by SEASON ∈ {DRY, RAINY}
P(SLIPPERY | WET): P(YES | YES) = 0.8, P(YES | NO) = 0.1, P(NO | YES) = 0.2, P(NO | NO) = 0.9
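By the chain rule implied by the network, the joint factorizes as P(SEASON) P(SPRINKLER|SEASON) P(RAIN|SEASON) P(WET|SPRINKLER,RAIN) P(SLIPPERY|WET). A minimal sketch; the CPT entries not shown on the slide are assumed, illustrative values:

```python
# Joint probability in the sprinkler Bayesian network via the chain rule.
# Only P(SEASON) and P(SLIPPERY | WET) come from the slide; the other CPTs
# below are assumed for illustration.

P_season = {"dry": 0.6, "rainy": 0.4}                      # from the slide
P_sprinkler = {"dry": 0.7, "rainy": 0.2}                   # P(sprinkler=on | season), assumed
P_rain = {"dry": 0.1, "rainy": 0.8}                        # P(rain=yes | season), assumed
P_wet = {(True, True): 0.99, (True, False): 0.9,           # P(wet=yes | sprinkler, rain), assumed
         (False, True): 0.85, (False, False): 0.05}
P_slippery = {True: 0.8, False: 0.1}                       # P(slippery=yes | wet), from the slide

def joint(season, sprinkler, rain, wet, slippery):
    """P(season, sprinkler, rain, wet, slippery) as a product of the CPTs."""
    p = P_season[season]
    p *= P_sprinkler[season] if sprinkler else 1 - P_sprinkler[season]
    p *= P_rain[season] if rain else 1 - P_rain[season]
    p *= P_wet[(sprinkler, rain)] if wet else 1 - P_wet[(sprinkler, rain)]
    p *= P_slippery[wet] if slippery else 1 - P_slippery[wet]
    return p

# Sanity check: the joint sums to 1 over all 2 x 2^4 = 32 configurations.
total = sum(joint(se, sp, r, w, sl)
            for se in ("dry", "rainy")
            for sp in (True, False)
            for r in (True, False)
            for w in (True, False)
            for sl in (True, False))
```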
Often hand programming is not possible.
Solution? Get the computer to program itself, by
showing it examples of the behavior we want!
This is the learning approach to AI.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr
Artificial Intelligence (AI)
(Traditional) AI
Knowledge & reasoning; work with facts/assertions;
develop rules of logical inference
Planning: work with applicability/effects of actions;
develop searches for actions which achieve goals/avert
disasters.
Expert systems: develop by hand a set of rules for
examining inputs, updating internal states and
generating outputs
Inferences viewed as
message passing along the network
ex) p(x₁, x₂, …, x₇) = ∏ᵢ₌₁⁷ p(xᵢ | paᵢ), where paᵢ denotes the parents of xᵢ in the DAG
* Without a given DAG structure, the usual chain rule can be applied to obtain
the joint distribution, but the computational cost is much higher.
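The saving can be made concrete by counting free parameters for seven binary variables; the parent sets below are one hypothetical DAG used only for illustration:

```python
# Parameter count: full joint over 7 binary variables vs. a factorized
# Bayesian network. The parent sets are a hypothetical DAG for illustration.
parents = {1: [], 2: [], 3: [], 4: [1, 2, 3], 5: [1, 3], 6: [4], 7: [4, 5]}

# Full joint table: 2^7 entries, minus one for the normalization constraint.
full_joint = 2**7 - 1

# Factorized: each binary node needs one free parameter per parent configuration.
factorized = sum(2**len(pa) for pa in parents.values())
```

With this structure the network needs 21 parameters instead of 127, and the gap grows exponentially with the number of variables.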
Bayes' theorem:
p(Y | X) = p(X | Y) p(Y) / p(X)
(posterior = likelihood × prior / normalizing constant)
p(X) = Σ_Y p(X | Y) p(Y)
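Bayes' rule in a few lines of Python; the prior and likelihood numbers below are made up for illustration:

```python
# Bayes' rule for a discrete cause Y and an observation X:
# p(Y|X) = p(X|Y) p(Y) / p(X), with p(X) = sum_Y p(X|Y) p(Y).
prior = {"rain": 0.3, "no_rain": 0.7}        # p(Y), illustrative numbers
likelihood = {"rain": 0.9, "no_rain": 0.2}   # p(X = wet_grass | Y), illustrative

evidence = sum(likelihood[y] * prior[y] for y in prior)              # p(X)
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}  # p(Y|X)
```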
Likelihood: p(D | w)
Frequentist
w: a fixed parameter determined by an 'estimator'
Maximum likelihood: error function = −log p(D | w)
Error bars: Obtained by the distribution of possible data sets D
Bootstrap
Cross-validation
Bayesian
w: a probability distribution expressing the uncertainty in the parameters
Prior knowledge
Noninformative (uniform) prior, Laplace correction in estimating priors
Monte Carlo methods, variational Bayes, expectation propagation (EP)
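Both views can be sketched on toy coin-flip data; the flips, the resample count, and the seed below are arbitrary choices:

```python
import random

random.seed(0)
flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # hypothetical coin flips (1 = heads)

# Frequentist view: w is a fixed parameter; the maximum-likelihood estimate is
# the relative frequency, and an error bar comes from bootstrap resampling.
ml_estimate = sum(flips) / len(flips)
boot_means = [sum(random.choices(flips, k=len(flips))) / len(flips)
              for _ in range(2000)]
m = sum(boot_means) / len(boot_means)
boot_se = (sum((b - m) ** 2 for b in boot_means) / len(boot_means)) ** 0.5

# Bayesian view with a noninformative (uniform) prior: the Laplace correction
# adds one pseudo-count of heads and one of tails.
laplace_estimate = (sum(flips) + 1) / (len(flips) + 2)
```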
(See the article 'Where Do Probabilities Come From?' on page 491 of the textbook (Russell and Norvig, 2010) for more discussion)
Node c is tail-to-tail
Node c is head-to-tail
Node c is head-to-head
Head-to-head node
Blocks a path if it is unobserved; but if the node, and/or
at least one of its descendants, is observed, the path
becomes unblocked.
d-separation?
All paths are blocked.
The joint distribution will satisfy the corresponding conditional
independence among the variables concerned.
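These blocking rules can be verified numerically on a hypothetical head-to-head (collider) network a → c ← b with made-up CPT values: a and b are independent while c is unobserved, and become dependent once c is observed.

```python
from itertools import product

# Collider a -> c <- b over binary variables; p(c=1 | a, b) is an assumed CPT.
p_a = {0: 0.6, 1: 0.4}
p_b = {0: 0.7, 1: 0.3}
p_c1 = {(a, b): 0.9 if a == b else 0.1 for a, b in product((0, 1), repeat=2)}

joint = {(a, b, c): p_a[a] * p_b[b] * (p_c1[(a, b)] if c else 1 - p_c1[(a, b)])
         for a, b, c in product((0, 1), repeat=3)}

def marg(keep):
    """Marginalize the joint onto the variable positions listed in `keep`."""
    out = {}
    for key, v in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + v
    return out

# Path blocked at the unobserved head-to-head node c: a and b independent.
pab, pa, pb = marg((0, 1)), marg((0,)), marg((1,))
indep_marginal = all(abs(pab[a, b] - pa[a,] * pb[b,]) < 1e-9
                     for a, b in product((0, 1), repeat=2))

# Observing c unblocks the path: a and b are dependent given c.
pc, pac, pbc = marg((2,)), marg((0, 2)), marg((1, 2))
indep_given_c = all(abs(joint[a, b, c] / pc[c,]
                        - (pac[a, c] / pc[c,]) * (pbc[b, c] / pc[c,])) < 1e-9
                    for a, b, c in product((0, 1), repeat=3))
```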
d-separation
(a) a is dependent on b given c
Head-to-head node e is unblocked, because a descendant c is in
the conditioning set.
Tail-to-tail node f is unblocked, since f is not observed.
(b) a is independent of b given f: the tail-to-tail node f, once observed, blocks the path.
Markov blanket of a node: its parents, its children, and its children's co-parents
Exponential family & conjugacy
Many probability densities over x can be represented in the same form:
p(x | η) = h(x) g(η) exp{ηᵀ u(x)}
For each member there is a conjugate family of density functions having the same functional form:
Beta & binomial
Dirichlet & multinomial
Normal & Normal (for the mean)
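As a concrete instance of conjugacy, a minimal beta-binomial update; the prior hyperparameters and data counts below are illustrative:

```python
# Conjugacy: a Beta(alpha, beta) prior on the heads probability, combined with
# binomial data (h heads, t tails), gives a Beta(alpha + h, beta + t) posterior.
def beta_binomial_update(alpha, beta, heads, tails):
    return alpha + heads, beta + tails

# Beta(2, 2) prior and 7 heads / 3 tails are made-up numbers.
a, b = beta_binomial_update(2, 2, heads=7, tails=3)
posterior_mean = a / (a + b)   # mean of Beta(9, 5)
```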
The principle of indifference: head and tail are equally probable, P(heads) = 1/2
Usual method
Estimate the probability distribution of a variable X
based on a relative frequency and a belief concerning that
relative frequency
e.g., a binomial likelihood with a beta prior on its parameter
Expert System
Uncertain expert knowledge can be encoded into a Bayesian network
DAG in a Bayesian network is hand-constructed by domain experts
Then the conditional probabilities are assessed by the expert, learned from data, or
obtained using a combination of both techniques.
Bayesian network-based expert systems are popular
Planning
In a somewhat different form, known as decision graphs or influence diagrams
We do not cover this direction here
https://www.coursera.org/course/pgm :
Probabilistic Graphical Models by D. Koller
Fully connected graph: the corresponding set of distributions contains all possible distributions
Fully disconnected graph: only joint distributions that factorize into the
product of the marginal distributions over the variables
Multivariate Gaussian
N(x | μ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp{−(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ)}
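A direct evaluation of this density for D = 2 in plain Python; the test point and parameters are arbitrary:

```python
import math

# Density of a 2-D Gaussian N(x | mu, Sigma), evaluated straight from the
# formula (2*pi)^(-D/2) |Sigma|^(-1/2) exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu)).
def gaussian_2d(x, mu, sigma):
    (a, b), (c, d) = sigma
    det = a * d - b * c                                   # |Sigma| for a 2x2 matrix
    inv = [[d / det, -b / det], [-c / det, a / det]]      # Sigma^{-1}, closed form
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# At the mean of a standard Gaussian the density is 1 / (2*pi).
val = gaussian_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```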