Comparison of Artificial Neural Network and Bayesian Belief Network in Computer-Assisted Diagnosis Scheme For Mammography
2.2. Topology of networks and GA initialization

In these experiments, all 38 features were used to train and test the classifiers. The topologies of the ANN and the BBN are different. The ANN has 38 input neurons, 16 hidden neurons, and one output neuron. The topology of the BBN was similar to the BBN that we developed and tested for mass detection in our previous studies [20]. Basically, the six global features are located in the top layer of the network and comprise the "parent" nodes of the detection node (the output node for mass identification). The 32 local features are "child" nodes located in a layer below the detection node. Unlike the ANN, where the input features are continuous data (e.g., ranging from 0 to 1), in a BBN each node must be quantified to a fixed number of exclusive states, so the continuous data for these features must be converted to discrete data. In this study, each feature was divided into five discrete states. The methods used to convert these features into discrete states have been reported elsewhere [20].
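The conversion method itself is only cited here [20]; as a rough illustration, a simple equal-width binning of the normalized feature values (the binning rule is our assumption; only the five-state count comes from the text) could look like this in Python:

```python
import numpy as np

def discretize_feature(values, n_states=5):
    """Map normalized feature values in [0, 1] to states 0..n_states-1.

    Equal-width binning is assumed for illustration only; the actual
    conversion used in the study is described in reference [20].
    """
    values = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)
    # Interior bin edges 0.2, 0.4, 0.6, 0.8 give five exclusive states.
    edges = np.linspace(0.0, 1.0, n_states + 1)[1:-1]
    return np.digitize(values, edges)
```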
Although each feature vector contains 38 features, many of them might be redundant. Redundant features used in the input nodes of an ANN or a BBN contribute very little information but add a lot of noise, which results in poor generalization by the networks. Even though the topology of a BBN can be interpreted, manual selection of independent and effective features is a difficult task. To find a small number of independent features and eliminate the redundant ones in this feature vector, a genetic algorithm (GA) was used to optimize the feature set and the topologies of the ANN and the BBN. The GA software, GENESIS, was acquired from Prime Time Freeware for AI [21] and used in this study.

A GA solves a complex optimization problem by emulating the evolutionary principle that only the strongest survive. A population of candidate chromosomes is created, evaluated, recombined, and mutated to generate new and different chromosomes. The best are kept as a basis for evolving better chromosomes. In general, a GA involves the steps of initialization, evaluation, selection, search, and termination [22]. Although GENESIS provides a basic program structure for optimization, users need to determine many detailed parameters and functions, such as the encoding, the fitness function, and the evaluation criterion.
Based on their distributions in our databases, each of the 38 features was normalized to a range between 0 and 1. In the GA, a binary coded chromosome was used: each feature corresponded to a gene in a chromosome (a bit of the structure defined in GENESIS). In this binary coded chromosome, 1 indicates the presence of a gene (the feature is used as an input node) and 0 indicates its absence (the feature is rejected). In our experiments, all chromosomes have a fixed length of 38 (covering the six global features and 32 local features). The initial population size of the chromosomes was set as 50. In order to incorporate our experience in feature selection and also to achieve a diverse initial population, about one third of the initial chromosomes were manually selected with a small number of bits set to 1 (≤ 10), while the rest of the initial population was randomly assigned by the GA software. Meanwhile, the crossover rate, the mutation rate, and the generation gap used in the GA were set at 0.6, 0.001, and 1.0, respectively.
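GENESIS handles the encoding and initialization internally; the following sketch merely illustrates the 38-bit encoding and the mixed seeded/random initial population described above (the hand-picked bit patterns are stand-ins for the experience-based choices):

```python
import random

N_FEATURES = 38          # six global + 32 local features
POPULATION_SIZE = 50

def random_chromosome():
    """Random 38-bit chromosome; bit i = 1 keeps feature i, 0 rejects it."""
    return [random.randint(0, 1) for _ in range(N_FEATURES)]

def seeded_chromosome(selected):
    """Hand-picked chromosome with a small number (<= 10) of bits set to 1."""
    bits = [0] * N_FEATURES
    for i in selected:
        bits[i] = 1
    return bits

# About one third of the initial population is seeded from experience
# (random index sets stand in for the manual selections here), and the
# remainder is generated randomly, for 50 chromosomes in total.
seeded = [seeded_chromosome(random.sample(range(N_FEATURES), 8))
          for _ in range(POPULATION_SIZE // 3)]
population = seeded + [random_chromosome()
                       for _ in range(POPULATION_SIZE - len(seeded))]
```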
2.3. Optimization of feature selection

From the initial 38 features, we used the GA to select subsets of features, xi, i = 1, 2, ..., n, where n ≤ 38. The selected features were then connected to the input neurons in the ANN and to the probability nodes in the BBN. The number of hidden neurons (hj, j = 1, 2, ..., m) in the ANN was determined as half of the number of input neurons, or m = (int)(0.5 + n/2). There is one neuron in the output layer to represent the result of mass detection. A detailed description of the ANN structure used in our CAD scheme was previously reported [19]. In this study, the number of training iterations was fixed at 1,000. The momentum and learning rate were set as 0.8 and 0.01, respectively. In the BBN, the features selected by the GA were assigned to different nodes: if a selected feature was a global feature, it was placed in the top layer; otherwise, the feature was connected to one of the "child" nodes in the BBN.
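For concreteness, the mapping from a chromosome to the ANN layer sizes can be written out directly; this is a minimal Python sketch of the rule above, not the original implementation:

```python
def ann_topology(chromosome):
    """Derive ANN layer sizes from a 38-bit feature-selection chromosome.

    Implements the rule from the text: n input neurons (one per selected
    feature), m = (int)(0.5 + n/2) hidden neurons, and one output neuron.
    """
    n = sum(chromosome)       # number of bits set to 1 = selected features
    m = int(0.5 + n / 2)      # half the inputs, rounded to nearest integer
    return n, m, 1

# For example, a chromosome with 14 bits set yields a 14-7-1 network,
# and one with 15 bits set yields a 15-8-1 network.
```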
Because a GA is a task-independent optimizer, users must provide or define a fitness function and an evaluation criterion so that the GA has an optimization goal. In these experiments, the fitness function was based on the receiver operating characteristic (ROC) curve, and the evaluation criterion was the maximum area under the ROC curve (the Az value). ROC curves were generated by the ROCFIT program [23], based on output data from the networks. Once the GA selected a chromosome, the corresponding set of features was extracted and used in the networks. The training database was used to train the ANN by setting its weights, or to train the BBN by computing its conditional probability tables.
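Training the BBN amounts to estimating conditional probability tables from the discretized training samples by frequency counting. A minimal sketch for a single five-state feature node follows; the (feature_state, is_mass) layout and the Laplace smoothing are our assumptions, not details from the paper:

```python
from collections import Counter

def node_cpt(cases, n_states=5):
    """Estimate P(feature_state | mass) for one BBN feature node by
    frequency counting over the discretized training samples.

    `cases` is an iterable of (feature_state, is_mass) pairs; the layout
    and the add-one smoothing are illustrative assumptions.
    """
    counts = {True: Counter(), False: Counter()}
    for state, is_mass in cases:
        counts[is_mass][state] += 1
    cpt = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        # Add-one smoothing avoids zero entries for unseen states.
        cpt[label] = [(counter[s] + 1) / (total + n_states)
                      for s in range(n_states)]
    return cpt
```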
After training the networks, the second database (the evaluation database) was used to examine the performance of the networks. The output values on the evaluation database were directly used as input data to compute a ROC curve, and the ROCFIT program produced an Az value for each selected chromosome. Chromosomes with higher Az values had higher priority to be selected to generate new chromosomes in the next generation through the GA techniques of crossover and mutation. The GA was terminated when it had converged to a maximum Az value or when the search reached 50 generations.
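The overall fitness evaluation for one chromosome can then be sketched as below. The training and scoring routines are stubs passed in as parameters, the `positives`/`negatives` attributes of the evaluation database are hypothetical, and the binormal ROCFIT fit [23] is replaced by a simple empirical AUC for illustration:

```python
import numpy as np

def empirical_auc(pos_scores, neg_scores):
    """Empirical area under the ROC curve: the probability that a
    true-positive region scores above a false-positive one, with ties
    counted as one half (a stand-in for the binormal ROCFIT fit [23])."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties

def fitness(chromosome, train_db, eval_db, train_fn, predict_fn):
    """Fitness of one chromosome: train on the training database with the
    selected features, then measure Az on the separate evaluation database.
    `train_fn`/`predict_fn` are stubs for the ANN or BBN."""
    features = [i for i, bit in enumerate(chromosome) if bit]
    model = train_fn(train_db, features)
    pos = [predict_fn(model, region) for region in eval_db.positives]
    neg = [predict_fn(model, region) for region in eval_db.negatives]
    return empirical_auc(pos, neg)   # higher Az = fitter chromosome
```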
The two chromosomes that produced the highest Az values for the ANN and the BBN were selected to build an optimal ANN and an optimal BBN, respectively. The performance and robustness of these networks were then compared using another independent testing database containing 312 true-positive mass regions and 1,261 false-positive regions. This test database was never involved in any of the optimization processes. Finally, we tested a hybrid classifier, in which the feature vectors passed through the two networks separately and the ultimate output was the average score of the outputs of the two networks.
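Assuming each trained network exposes a scoring function, the hybrid combination described above reduces to a one-line average:

```python
def hybrid_score(ann_predict, bbn_predict, feature_vector):
    """Hybrid classifier: pass the same feature vector through both trained
    networks and average their output scores (interfaces are hypothetical)."""
    return 0.5 * (ann_predict(feature_vector) + bbn_predict(feature_vector))
```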
III. Results

Using the same training database and all 38 features to train and test the ANN and the BBN, we achieved Az values of 0.791 ± 0.012 and 0.783 ± 0.011, respectively, on the evaluation database involving 172 positive mass regions and 1,876 negative regions.

Table 1 demonstrates that after GA optimization, the number of features selected in both the ANN and the BBN was significantly reduced: fewer than half of the original 38 features were retained for the two networks. Although the features selected in the two networks were not exactly the same, the performance levels of the two networks converged to the same level.

Table 1: Optimization results for the ANN and the BBN.

  Network   Number of local features   Number of global features   Az
  ANN       12                         2                           0.866
  BBN       14                         3                           0.868
In an independent test of these optimal networks, the results for the ANN and the BBN remained at the same level. For the ANN, the Az value was 0.847 ± 0.014, and for the BBN, the Az value was 0.845 ± 0.011. Finally, using a hybrid classifier containing both the ANN and the BBN, the Az value on the test database was increased to 0.859 ± 0.01.

IV. Discussion

Objectively evaluating CAD performance and robustness is a very complicated and difficult task [18]. The performance of a scheme depends on many factors, such as case difficulty in the training and testing databases [24], the size of the training database [19], the validation methods [25], and the ground truth used for comparison [26]. Basically, CAD schemes that are developed at different institutions and optimized using different databases are not comparable [24]. In this study, we used the same database and the same optimization protocol to train and test two machine learning classifiers, an ANN and a BBN. Hence, the performance of these two networks can be objectively compared.

When applied to a new independent database, the performance deterioration of a CAD scheme may be caused by two factors. The first is bias in the training samples, due to the limited size of the training set compared to the diverse feature distributions in real clinical testing populations. The second is data over-fitting in certain learning algorithms, which makes classifiers much more sensitive to the noise patterns in the testing samples. The ANN and the BBN represent two very typical and popular classifiers used in CAD. The ANN uses a black-box "hill-climbing" method to search for the relationship between training samples and classification results; both sample bias and data over-fitting affect the robustness of the ANN. The BBN uses Bayesian probability theory to find the optimal relationship between the input features and the output results. Although there is no data over-fitting problem in the BBN, bias in the learning samples generates incorrect probability tables that reduce the robustness of the BBN. As a result, the ANN usually outperforms the BBN in training results, but falls behind when testing new cases. In our experiments, several methods were used to minimize over-fitting during ANN training, including limiting the number of training iterations and maintaining a large ratio between the momentum and the learning rate [11]. The training iteration number was limited to 1,000, and the ratio between momentum and learning rate was set at 80 (0.8/0.01).

Although data over-fitting is a potential danger when testing an ANN, by using a GA and setting an appropriate fitness criterion, the impact of over-fitting can be significantly reduced, as shown in this study. In both optimization and independent testing, the ANN and the BBN achieved the same performance level (Az value), which clearly indicates that, in this experiment, the performance "deterioration" in independent testing is mainly caused by the bias in the training database. There is no significant difference between using an ANN and a BBN in our CAD scheme for mass detection, as long as each network has been properly optimized and trained. This study demonstrated that improving the performance and robustness of CAD schemes may depend more on feature selection and database diversity than on any particular machine learning or classification algorithm.

V. Acknowledgements

This work is supported in part by the National Cancer Institute under grants CA77850 and CA79587, and by the US Army under grant DAMD17-98-1-8018.
VI. References

1. Vyborny CJ, Giger ML, Computer vision and artificial intelligence in mammography, Am J Roentgenol 1994; 162:699-708.
2. Kegelmeyer WP, Pruneda JM, Bourland PD, Hillis A, Riggs MW, Nipper ML, Computer-aided mammographic screening for spiculated lesions, Radiology 1994; 191:331-337.
3. Zheng B, Chang YH, Gur D, Computerized detection of masses from digitized mammograms using a single image segmentation and a multi-layer topographic feature analysis, Acad Radiol 1995; 2:959-966.
4. Zhang W, Doi K, Giger ML, Nishikawa RM, Schmidt RA, An improved shift-invariant artificial neural network for computerized detection of clustered microcalcifications in digital mammograms, Med Phys 1996; 23:595-601.
5. Sahiner B, Chan HP, Wei D, Petrick N, Helvie MA, Adler DD, Goodsitt MM, Image feature selection by a genetic algorithm: application to classification of mass and normal breast tissue, Med Phys 1996; 23:1671-1684.
6. Li L, Qian W, Clarke LP, Computer-assisted diagnosis method for mass detection with multiorientation and multiresolution wavelet transforms, Acad Radiol 1997; 4:724-731.
7. Polakowski WE, Cournoyer DA, Rogers SK, Computer-aided breast cancer detection and diagnosis of masses using difference of Gaussians and derivative-based feature saliency, IEEE Trans Med Imaging 1997; 16:811-819.
8. Rymon R, Zheng B, Chang YH, Gur D, Incorporation of a set enumeration tree-based classifier into a hybrid computer-assisted diagnosis scheme for mass detection, Acad Radiol 1998; 5:181-187.
9. Cheng HD, Lui YM, Freimanis RI, A novel approach to microcalcification detection using fuzzy logic technique, IEEE Trans Med Imaging 1998; 17:442-450.
10. Yu S, Guan L, Brown S, Automated detection of clustered microcalcifications in digitized mammograms, J Electronic Imaging 1999; 8:76-82.
11. Schalkoff R, Pattern recognition: statistical, structural and neural approaches, John Wiley & Sons, Inc., New York, NY, 1992.
12. Diederich J, Explanation and artificial neural networks, Int J Man-Machine Stud 1992; 37:335-341.
13. Teach RL, Shortliffe EH, An analysis of physician attitudes regarding computer-based clinical consultation systems, Comput Biomed Res 1981; 14:542-548.
14. Chan HP, Sahiner B, Wagner RF, Petrick N, Effects of sample size on classifier design for computer-aided diagnosis, Proc SPIE 1998; 3338:845-858.
15. Mitchell TM, Machine learning, WCB McGraw-Hill, Boston, MA, 1997.
16. Michie D, Spiegelhalter DJ, Taylor CC, Machine learning, neural and statistical classification, Ellis Horwood, New York, NY, 1994.
17. Kahn CE, Roberts LM, Shaffer KA, Haddawy P, Construction of a Bayesian network for mammographic diagnosis of breast cancer, Comput Biol Med 1997; 27:19-29.
18. Nishikawa RM, Variations in measured performance of CAD schemes due to database composition and scoring protocol, Proc SPIE Med Imaging 1998; 3338:838-844.
19. Zheng B, Chang YH, Good WF, Gur D, Adequacy testing of training set sample sizes in the development of a computer-assisted diagnosis scheme, Acad Radiol 1997; 4:497-502.
20. Zheng B, Chang YH, Wang XH, Good WF, Gur D, Application of a Bayesian belief network in a computer-assisted diagnosis scheme for mass detection, Proc SPIE Med Imaging 1999; 3661:167.
21. Kantrowitz M (editor), Prime Time Freeware for AI, Issue 1-1, selected materials from the Carnegie Mellon University Artificial Intelligence Repository, 1994.
22. Genetic algorithms, chapter 12 of The handbook of applied expert systems, edited by Liebowitz J, CRC Press, Boca Raton, FL, 1997.
23. Kronman HB, Wang PL, Shen JH, ROCFIT: a modified maximum likelihood algorithm for estimating a binormal ROC curve from confidence rating data, University of Chicago, Chicago, IL, 1985.
24. Nishikawa RM, Giger ML, Doi K, Effect of case selection on the performance of computer-aided detection schemes, Med Phys 1994; 21:265-269.
25. Tourassi GD, Floyd CE, The effect of data sampling on the performance evaluation of artificial neural networks in medical diagnosis, Med Decis Making 1997; 17:186-197.
26. Kallergi M, Carney GM, Gaviria J, Evaluating the performance of detection algorithms in digital mammography, Med Phys 1999; 26:267-275.