Impact of Bayesian Network Model Structure on the Accuracy of Medical Diagnostic Systems

Agnieszka Oniśko and Marek J. Druzdzel
1 Introduction
2 Bayesian networks
Bayesian networks [2] are acyclic directed graphs modeling probabilistic depen-
dencies and independencies among variables. The graphical part of a Bayesian
network reflects the structure of a problem, while local interactions among neigh-
boring variables are quantified by conditional probability distributions. Bayesian
networks have proven to be powerful tools for modeling complex problems in-
volving uncertain knowledge.
Mathematically, a Bayesian network is an acyclic directed graph that consists
of a qualitative part, encoding the existence of probabilistic influences among
the domain variables, and a quantitative part, encoding the joint probability
distribution over these variables. Each node in the graph represents a random
variable. Each arc represents a direct dependence between two
variables. Formally, the structure of the directed graph is a representation of a
factorization of the joint probability distribution. In the case of a Bayesian
network consisting of $n$ variables $X_1, X_2, \ldots, X_n$, this factorization
is represented as follows:
$$\Pr(X_1, X_2, \ldots, X_n) \,=\, \prod_i \Pr\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr) \qquad (1)$$
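As an illustration, the following Python sketch evaluates this factorization for a toy two-variable network; the network, its CPTs, and the variable names are hypothetical and serve only to make Eq. (1) concrete.

```python
# Toy network: Flu -> Fever. CPTs map (value, parent-values) -> probability.
# Variables, states, and numbers are hypothetical, chosen only to make
# Eq. (1) concrete.
cpts = {
    "Flu":   {("yes", ()): 0.10, ("no", ()): 0.90},
    "Fever": {("yes", ("yes",)): 0.80, ("no", ("yes",)): 0.20,
              ("yes", ("no",)): 0.05, ("no", ("no",)): 0.95},
}
parents = {"Flu": (), "Fever": ("Flu",)}

def joint_probability(assignment):
    # Pr(X1, ..., Xn) = product over i of Pr(Xi | Pa(Xi)).
    p = 1.0
    for var, cpt in cpts.items():
        pa_values = tuple(assignment[pa] for pa in parents[var])
        p *= cpt[(assignment[var], pa_values)]
    return p

print(joint_probability({"Flu": "yes", "Fever": "yes"}))  # 0.10 * 0.80 = 0.08
```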
Table 1. Medical data used in our experiments (mv stands for missing values)
Table 2. Bayesian network models used in our experiments (#nodes: number of nodes;
µ #states: average number of states per node; µ in-degree: average number of parents
per node; #arcs: number of arcs; #params: number of numerical parameters)
We assumed that the models obtained this way were perfect in the sense
of having the right structure and containing parameters as precise as the data
would allow.
A critical element of our experiments is the comparison of model accuracy. We
define diagnostic accuracy as the percentage of correct diagnoses on real patient
cases. This is a simplification, as one might want to know the models' sensitivity
and specificity for each of the disorders, or even study the models' ability to
detect a disorder in terms of ROC (Receiver Operating Characteristic)
curves or the AUC (Area Under the ROC Curve) measure. We decided against
this because ROC curves, like sensitivity and specificity, express a model's
ability to detect a single disorder. We focused instead on a simple measure: the
percentage of correct diagnoses. Furthermore, because Bayesian network models
operate only on probabilities, we used probability as the decision criterion: the
diagnosis that is most likely given patient data is the diagnosis that the model
puts forward.
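A minimal sketch of this decision rule, assuming a hypothetical `posterior` inference routine that returns a dictionary mapping each disorder to its posterior probability:

```python
def diagnostic_accuracy(model, cases):
    # cases: list of (findings, true_diagnosis) pairs from real patients.
    correct = 0
    for findings, true_diagnosis in cases:
        # Posterior distribution over disorders given the patient's findings;
        # `posterior` is a hypothetical stand-in for the network's inference.
        posteriors = model.posterior("Disorder", evidence=findings)
        map_diagnosis = max(posteriors, key=posteriors.get)  # most likely state
        correct += (map_diagnosis == true_diagnosis)
    return correct / len(cases)
```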
Because virtually all of the original data sets were rather small, we always
applied the "leave-one-out" method [13] to test the models' performance. It
involves n-fold learning: each fold learns from n − 1 of the n available records
and subsequently tests the resulting model on the remaining record.
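A sketch of this procedure, with the parameter-learning and diagnosis steps passed in as placeholders, since the paper's own implementation is not shown:

```python
def leave_one_out_accuracy(structure, records, learn, diagnose):
    # records: list of (findings, diagnosis) pairs; `learn` fits the
    # network parameters on the training fold (e.g., via EM), and
    # `diagnose` returns the most likely disorder for given findings.
    correct = 0
    for i, (findings, diagnosis) in enumerate(records):
        training = records[:i] + records[i + 1:]  # all but the i-th record
        model = learn(structure, training)
        correct += (diagnose(model, findings) == diagnosis)
    return correct / len(records)
```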
Fig. 2. Histograms of arc strength for the Bayesian network models (clockwise: Acute Inflammation, Spect Heart, Cardiotography, Hepatitis, Lymphography, and Primary Tumor). The Euclidean distance was applied.

While there are several values of arc strength that are more likely than others,
the probability distributions are generally spread over the entire range of 0–1.
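The two distances have standard definitions for discrete distributions; the sketch below implements them and, as an assumption on our part, pairs the child's conditional distributions across two parent states, which is one common reading of link strength [14–17]. Any normalization or averaging over parent configurations used in the paper is not shown here.

```python
import math

def euclidean(p, q):
    # Euclidean distance between two discrete distributions.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def hellinger(p, q):
    # Hellinger distance: (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2, in [0, 1].
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q))) / math.sqrt(2)

# Hypothetical example: the child's distribution under two parent states.
p_given_yes = [0.80, 0.20]
p_given_no = [0.05, 0.95]
print(euclidean(p_given_yes, p_given_no))   # ~1.061
print(hellinger(p_given_yes, p_given_no))   # ~0.603
```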
5 Experimental results
We conducted two experiments to investigate the impact of departures from the
ideal structure of a Bayesian network on its accuracy. There are two straight-
forward ways of distorting the structure of a Bayesian network: (1) removing
its arcs, and (2) reversing them. Note that adding arcs would not have much
impact on the accuracy of Bayesian network models, as the additional
dependencies introduced by such arcs are compensated for in the learning
process by parameters that capture the independence numerically.
Fig. 3. The diagnostic accuracy of the six models (clockwise: Acute Inflammation, Spect Heart, Cardiotography, Hepatitis, Lymphography, and Primary Tumor) as a function of the percentage of arcs removed. Arcs ordered according to the Euclidean distance.
In the first experiment, we started with the original models, then removed 10%,
20%, 30%, . . . , 90%, and 100% of their arcs, re-learned their numerical
parameters from the Irvine Machine Learning Repository data sets by means of
the EM algorithm, and re-tested the resulting distorted models at each step. The
first model in this sequence (0% of arcs removed) was the original, gold-standard
model, and the last model (100% of arcs removed) included all original variables
but no arcs, i.e., it assumed that all model variables are independent of each
other.
In the experiment, we followed three different orders of arc removal: (a) ascending
order of arc strength (i.e., from the weakest to the strongest arc), (b) descending
order of arc strength (i.e., from the strongest to the weakest arc), and (c) random
order. A sketch of this procedure follows.
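Under our reading of the procedure, the experiment's loop looks roughly as follows; `strength`, `learn_em`, `accuracy`, and `without_arcs` are hypothetical placeholders for the arc-strength measure, EM parameter learning, the leave-one-out accuracy test, and graph editing, respectively.

```python
import random

def arc_removal_curve(structure, arcs, data, order="ascending"):
    # Order the arcs: by strength (ascending/descending) or at random.
    if order == "random":
        arcs = random.sample(arcs, len(arcs))
    else:
        arcs = sorted(arcs, key=strength, reverse=(order == "descending"))
    curve = []
    for pct in range(0, 101, 10):  # 0%, 10%, ..., 100% of arcs removed
        k = round(len(arcs) * pct / 100)
        distorted = structure.without_arcs(arcs[:k])  # drop the first k arcs
        model = learn_em(distorted, data)             # re-learn parameters (EM)
        curve.append((pct, accuracy(model, data)))    # re-test at each step
    return curve
```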
Figures 3 and 4 show the results of our experiment for each of the models
and for the two measures of distance, Euclidean and Hellinger, respectively.
Fig. 4. The diagnostic accuracy of the six models (clockwise: Acute Inflammation, Spect Heart, Cardiotography, Hepatitis, Lymphography, and Primary Tumor) as a function of the percentage of arcs removed. Arcs ordered according to the Hellinger distance.
6 Discussion
This paper presented the results of two experiments probing the sensitivity of
Bayesian network accuracy to network structure. We started by learning realistic
gold-standard medical diagnostic models from real data sets originating from the
Irvine Machine Learning Repository. We subjected these models to systematic
structure distortions and tested the impact of these distortions on the accuracy
of the models. In the first experiment, we systematically removed fractions of the
existing arcs, and in the second experiment we systematically reversed a fraction
of the arcs. Our results suggest that the precise structure of Bayesian networks
is not as important as popularly believed. Structure transformations such as arc
removal and arc reversal turn out to have only a moderate impact on the
diagnostic quality of the models. Of these, arc removal seems to have the
stronger impact.
Fig. 5. The diagnostic accuracy of the six models (clockwise: Acute Inflammation, Spect Heart, Cardiotography, Hepatitis, Lymphography, and Primary Tumor) as a function of the percentage of arcs reversed. Arcs ordered according to the Euclidean distance.

It is clear that when using the relative probability of a disorder as the main
decision criterion for choosing the diagnosis, prior probability distributions are
important. For example, the diagnostic accuracy of the Spect Heart and
Cardiotography models reached 80% even after all arcs had been removed. The
dominating factor here is the prior probability distribution of the node
representing the class variable, Cardiotography, with the a priori distribution
(0.78, 0.14, 0.08). When no evidence reaches the Cardiotography node, the
model always chooses the first, most likely state as its diagnosis. This leads to
an accuracy of 78%.
In a problem as hard as testing whether the accuracy of Bayesian networks is
sensitive to their structure, no single study will provide a definitive answer. In
addition to increasing the sample size of models tested, we have several follow-up
questions and studies in mind. The first is applying different measures of
accuracy. Pradhan et al. [6], for example, focus on the posterior probability of
the correct diagnosis. While this measure has several disadvantages, which we
discussed earlier [4], it might lead to different results.
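For instance, one way to operationalize that measure is to average the posterior assigned to the true diagnosis over all cases; the averaging is our assumption, and `posterior` is the same hypothetical inference routine as before.

```python
def mean_posterior_of_truth(model, cases):
    # Average posterior probability that the model assigns to the
    # correct diagnosis (cf. Pradhan et al. [6]); averaging over cases
    # is our assumption, not necessarily the paper's exact metric.
    total = 0.0
    for findings, true_diagnosis in cases:
        total += model.posterior("Disorder", evidence=findings)[true_diagnosis]
    return total / len(cases)
```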
The strongest test of sensitivity to structure will be node removal, which is
equivalent to the problem of feature selection. When important features have
been removed, the accuracy will suffer. While the end result will never fall below
the 100% arc-removal baseline, the shape of the curves pictured in Figures 3
and 4 may be different. We have indirectly touched upon this problem: when all
paths between a feature node and the disease node have been removed, the
feature node has de facto been removed.
Acknowledgments
References
[1] Max Henrion, John S. Breese, and Eric J. Horvitz. Decision Analysis and Expert
Systems. AI Magazine, 12(4):64–91, Winter 1991.
[2] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1988.
[3] Marek J. Druzdzel and Agnieszka Oniśko. The impact of overconfidence bias
on practical accuracy of Bayesian network models: An empirical study. In Silja
Renooij, Hermi J.M. Tabachneck-Schijf, and Suzanne M. Mahoney, editors, Work-
ing Notes of the 2008 Bayesian Modelling Applications Workshop, Special Theme:
How Biased Are Our Numbers?, Helsinki, Finland, July 9 2008. Annual Confer-
ence on Uncertainty in Artificial Intelligence (UAI–2008).
[4] Agnieszka Oniśko and Marek J. Druzdzel. Effect of imprecision in probabilities
on Bayesian network models: An empirical study. In Working Notes of the Euro-
pean Conference on Artificial Intelligence in Medicine (AIME–03) Workshop on
Qualitative and Model-based Reasoning in Biomedicine, pages 45–49, October 19
2003.
[5] Agnieszka Oniśko and Marek J. Druzdzel. Impact of precision of Bayesian network
parameters on accuracy of medical diagnostic systems. Artificial Intelligence in
Medicine, 57(3):197–206, 2013.
[6] Malcolm Pradhan, Max Henrion, Gregory Provan, Brendan del Favero, and Kurt
Huang. The sensitivity of belief networks to imprecise probabilities: An experi-
mental investigation. Artificial Intelligence, 85(1–2):363–397, August 1996.
[7] K. Bache and M. Lichman. UCI Machine Learning Repository.
http://archive.ics.uci.edu/ml, 2013. University of California, Irvine,
School of Information and Computer Sciences, USA.
[8] Agnieszka Oniśko, Marek J. Druzdzel, and Hanna Wasyluk. Extension of the
Hepar II model to multiple-disorder diagnosis. In M. Klopotek, M. Michalewicz,
and S.T. Wierzchoń, editors, Intelligent Information Systems, Advances in Soft
Computing Series, pages 303–313, Heidelberg, 2000. Physica-Verlag (A Springer-
Verlag Company).
[9] J. Czerniak and H. Zarzycki. Application of rough sets in the presumptive di-
agnosis of urinary system diseases. In Jerzy Soldek and Leszek Drobiazgiewicz,
editors, Artificial Intelligence and Security in Computing Systems, ACS'2002 9th
International Conference, pages 41–51, Norwell, MA, USA, 2003. Kluwer Aca-
demic Publishers.
[10] Igor Kononenko. Inductive and Bayesian learning in medical diagnosis. Applied
Artificial Intelligence, 7:317–337, 1993.
[11] Gregory F. Cooper and Edward Herskovits. A Bayesian method for the induction
of probabilistic networks from data. Machine Learning, 9(4):309–347, 1992.
[12] Agnieszka Oniśko, Marek J. Druzdzel, and Hanna Wasyluk. An experimental
comparison of methods for handling incomplete data in learning parameters of
Bayesian networks. In M. Klopotek, M. Michalewicz, and S.T. Wierzchoń, editors,
Intelligent Information Systems, Advances in Soft Computing Series, pages 351–
360, Heidelberg, 2002. Physica-Verlag (A Springer-Verlag Company).
[13] A.W. Moore and M.S. Lee. Efficient algorithms for minimizing cross validation
error. In Proceedings of the 11th International Conference on Machine Learning,
San Francisco, 1994. Morgan Kaufmann.
[14] B. Boerlage. Link strengths in Bayesian networks. Master’s thesis, Dept. of
Computer Science, The University of British Columbia, Vancouver, Canada, 1992.
[15] A.E. Nicholson and N. Jitnah. Using mutual information to determine relevance in
Bayesian networks. In Proceedings of the 5th Pacific Rim International Conference
on Artificial Intelligence. Springer, 1998.
[16] Imme Ebert-Uphoff. Measuring connection strengths and link strengths in dis-
crete Bayesian networks. Technical Report GT-IIC-07-01, Georgia Institute of
Technology, 2007.
[17] Imme Ebert-Uphoff. Tutorial on how to measure link strengths in discrete
Bayesian networks. Technical Report GT-ME-2009-001, Georgia Institute of Tech-
nology, 2009.
[18] Carmen Lacave and F. Javier Díez. A review of explanation methods for heuristic
expert systems. The Knowledge Engineering Review, 19(2):133–146, 2004.
[19] J. R. Koiter. Visualizing inference in Bayesian networks. Master’s thesis, Delft
University of Technology, Delft, The Netherlands, 2006.
[20] Yuichiro Kanazawa. Hellinger distance and Akaike’s information criterion for the
histogram. Statistics and Probability Letters, 17(4):293–298, 1993.