Using Neural Networks in Reliability Prediction

NACHIMUTHU KARUNANITHI, DARRELL WHITLEY, and YASHWANT K. MALAIYA, Colorado State University

IEEE Software, July 1992
…cascade-correlation learning algorithm. The algorithm, which dynamically constructs feed-forward neural networks, combines the ideas of incremental architecture and learning in one training algorithm. It starts with a minimal network (consisting of an input and an output layer) and dynamically trains and adds hidden units one by one, until it builds a suitable multilayer architecture.

As the box on the facing page describes, we chose feed-forward and Jordan networks as the two classes of models most suitable for our prediction experiments. Figure 1a shows a typical three-layer feed-forward network; Figure 1b shows a Jordan network.

A typical feed-forward neural network comprises an input layer, one or more hidden layers, and an output layer. The input neurons do not perform any computation; they merely copy the input values and associate them with weights, feeding the neurons in the (first) hidden layer. Feed-forward networks can propagate activations only in the forward direction; Jordan networks, on the other hand, have both forward and feedback connections. The feedback connection in the Jordan network in Figure 1b is from the output layer to the hidden layer through a recurrent input unit. At time t, the recurrent unit receives as input the output unit's output at time t-1. That is, the output of the additional input unit is the same as the output of the network that corresponds to the previous input pattern.

In Figure 1b, the dashed line represents a fixed connection with a weight of 1.0. This weight copies the output to the additional recurrent input unit and is not modified during training.

We used the cascade-correlation algorithm to construct both feed-forward and Jordan networks. Figure 2 shows a typical feed-forward network developed by the cascade-correlation algorithm. The cascade network differs from the feed-forward network in Figure 1a because it has feed-forward connections between I/O layers, not just among hidden units.

In our experiments, all neural networks use one output unit. On the input layer, the feed-forward nets use one input unit; the Jordan networks use two units, the normal input unit and the recurrent input unit.

Choosing training data. A neural network's predictive ability can be affected by what it learns and in what sequence. Figure 3 shows two reliability-prediction training regimes: generalization training and prediction training.

Generalization training is the standard way of training feed-forward networks. During training, each input i_t at time t is associated with the corresponding output o_t. Thus the network learns to model the actual functionality between the independent (or input) variable and the dependent (or output) variable.

Prediction training, on the other hand, is the general approach for training recurrent networks. Under this training, the value of the input variable i_t at time t is associated with the actual value of the output variable at time t+1. Here, the network learns to predict outputs anticipated at the next time step.

Thus if you combine these two training regimes with the feed-forward network and the Jordan network, you get four neural-network prediction models: FFN generalization, FFN prediction, JN generalization, and JN prediction.

Figure 1. (A) A standard feed-forward network and (B) a Jordan network. In both, the input layer receives execution time and the output layer produces cumulative faults.

Figure 2. A feed-forward network developed by the cascade-correlation algorithm.
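The two training regimes differ only in how input-output pairs are formed from the failure history. A minimal sketch in Python (the failure-history numbers here are made up for illustration):

```python
# Failure history: cumulative execution time -> cumulative faults observed.
execution_time = [1, 2, 3, 4, 5]        # i_t  (inputs)
cumulative_faults = [4, 9, 12, 14, 15]  # o_t  (outputs)

# Generalization training: input i_t is paired with the output o_t
# observed at the same time step.
generalization_pairs = list(zip(execution_time, cumulative_faults))

# Prediction training: input i_t is paired with the output o_{t+1}
# at the next time step, so the network learns to predict ahead.
prediction_pairs = list(zip(execution_time[:-1], cumulative_faults[1:]))

print(generalization_pairs)  # [(1, 4), (2, 9), (3, 12), (4, 14), (5, 15)]
print(prediction_pairs)      # [(1, 9), (2, 12), (3, 14), (4, 15)]
```

Generalization training pairs each input with the same step's output; prediction training shifts the outputs forward by one step, so the final input has no target and is dropped.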
Figure 3. Two network-training regimes: (A) generalization training and (B) prediction training.
Training the network. Most feed-forward networks and Jordan networks are trained using a supervised learning algorithm. Under supervised learning, the algorithm adjusts the network weights using quantified error feedback. There are several supervised learning algorithms, but one of the most widely used is back propagation, an iterative procedure that adjusts network weights by propagating the error back into the network.2

Typically, training a neural network involves several iterations (also known as epochs). At the beginning of training, the algorithm initializes network weights with a set of small random values (between +1.0 and -1.0). During each epoch, the algorithm presents the network with a sequence of training pairs. We used cumulative execution time as input and the corresponding cumulative faults as the desired output to form a training pair. The algorithm then calculates a sum squared error between the desired outputs and the network's actual outputs. It uses the gradient of the sum squared error (with respect to the weights) to adapt the network weights so that the error measure is smaller in future epochs. Training terminates when the sum squared error is below a specified tolerance limit.

PREDICTION EXPERIMENT

We used the testing and debugging data from an actual project described by Yoshiro Tohma and colleagues3 to illustrate the prediction accuracy of neural networks. In this data (Tohma's Table 4), execution time was reported in terms of days.

Method. Most training methods initialize neural-network weights with random values at the beginning of training, which causes the network to converge to different weight sets at the end of each training session. You can thus get different prediction results at the end of each training session. To compensate for these prediction variations, you can take an average over a large number of trials. In our experiment, we trained the network with 50 random seeds for each training-set size and averaged their predictions.
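The training procedure described above (small random initial weights, a sum-squared-error measure, gradient-based weight adaptation, and a tolerance-based stopping rule) can be sketched for the simplest possible network: a single logistic output unit with no hidden units. The data points, learning rate, and tolerance below are illustrative assumptions, not values from the article:

```python
import math, random

random.seed(1)

# Normalized (execution time, cumulative faults) training pairs
# (made-up numbers that roughly follow a logistic growth curve).
pairs = [(0.1, 0.23), (0.3, 0.35), (0.5, 0.50), (0.7, 0.65), (0.9, 0.77)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Initialize weights with small random values between -1.0 and +1.0.
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)
rate, tolerance = 0.5, 1e-4

for epoch in range(20000):
    sse, gw, gb = 0.0, 0.0, 0.0
    for x, target in pairs:
        out = sigmoid(w * x + b)
        err = out - target
        sse += err * err
        # Gradient of the squared error w.r.t. the weights,
        # using sigmoid'(z) = out * (1 - out).
        gw += 2.0 * err * out * (1.0 - out) * x
        gb += 2.0 * err * out * (1.0 - out)
    if sse < tolerance:   # stop once the error is below the tolerance limit
        break
    w -= rate * gw        # adapt weights against the error gradient
    b -= rate * gb

print(f"epochs used: {epoch}, final sum squared error: {sse:.6f}")
```

One epoch here is a full pass over the training pairs; the loop repeats epochs until the sum squared error drops below the tolerance limit.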
Results. After training the neural network with a failure history up to time t (where t is less than the total testing and debugging time of 46 days), you can use the network to predict the cumulative faults at the end of a future testing and debugging session.

To evaluate neural networks, you can use the following two extreme prediction horizons: the next-step prediction (at t+1) and the endpoint prediction (at t=46). Since you already know the actual cumulative faults for those two future testing and debugging sessions, you can compute the network's prediction error at t. The relative prediction error is given by (predicted faults - actual faults)/actual faults.4

                          Average error               Maximum error
Model                1st half  2nd half  Overall  1st half  2nd half  Overall
Neural-net models
  FFN generalization    7.34      1.19     3.36     10.48      2.85    10.48
  FFN prediction        6.25      1.10     2.92      8.69      3.18     8.69
  JN generalization     4.26      3.03     3.47     11.00      3.97    11.00
  JN prediction         5.43      2.08     3.26      7.76      3.48     7.76
Analytic models
  Logarithmic          21.59      6.16    11.61     35.75     13.48    35.75
  Inverse polynomial   11.97      5.65     7.88     20.36     11.65    20.36
  Exponential          23.81      6.88    12.85     40.85     15.25    40.85
  Power                38.30      6.39    17.66     76.52     15.64    76.52
  Delayed S-shape      43.01      7.11    19.78     54.52     22.38    54.52
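The relative-error measure above is straightforward to compute; a short sketch, with hypothetical fault counts:

```python
def relative_error(predicted_faults, actual_faults):
    """Relative prediction error: (predicted - actual) / actual."""
    return (predicted_faults - actual_faults) / actual_faults

# Next-step horizon: predict faults at t+1.
print(relative_error(predicted_faults=92, actual_faults=100))   # -0.08

# Endpoint horizon: predict faults at the end of testing (t=46).
print(relative_error(predicted_faults=315, actual_faults=300))  # 0.05
```

A negative value means the model underestimates the cumulative faults; a positive value means it overestimates them.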
Figures 4 and 6 show the relative prediction-error curves of the neural-network models. In these figures, the percentage prediction error is plotted against the percentage normalized execution time (the training time t divided by the total execution time).

Figures 4 and 5 show the relative error curves for endpoint predictions of neural networks and five well-known analytic models. Results from the analytic models are included because they can provide a better basis for evaluating neural networks. Yashwant Malaiya and colleagues give details about the analytic models and their fitting.5,6 The graphs suggest that neural networks are more accurate than analytic models.
Since the neural-network models' errors are above the analytic models' in the first half by only two to four percent, and the difference in the second half is less than two percent, these two approaches don't appear to be that different. But worst-case prediction errors may suggest that the analytic models have a slight edge over the neural-network models. However, the difference in overall average errors is less than two percent, which suggests that both the neural-network models and the analytic models perform comparably here.

NEURAL NETWORKS VS. ANALYTIC MODELS

In comparing the five analytic models and the neural networks in our experiment, we used the number of parameters as a measure of complexity; the more parameters, the more complex the model. Since we used the cascade-correlation algorithm for evolving the network architecture, the number of hidden units used to learn the problem varied, depending on the training data.

FFN prediction. In this model, for the network with no hidden unit, the equivalent two-parameter model is

    μ(t_i) = 1 / (1 + e^(β0 + β1 t_(i-1)))

where t_(i-1) is the cumulative execution time at the (i-1)th instant.

For the network with one hidden unit, the equivalent five-parameter model is

    μ(t_i) = 1 / (1 + e^(β0 + β1 t_(i-1) + β2 h_1))

where β0, β1, and β2 are the model parameters, which are determined by the weights feeding the output unit. In this model, β0 = -w_0, β1 = -w_1, and β2 = -w_h (the weight from the hidden unit). However, the output of h_1 is an intermediate value computed using another two-parameter logistic-function expression:

    h_1 = 1 / (1 + e^-(w_3 + w_4 t_i))

Thus, the model has five parameters that correspond to the five weights in the network.

Implications. These expressions imply that the neural-network approach develops models that can be relatively complex. They also suggest that neural networks use models of varying complexity at different phases of testing. In contrast, the analytic models have only two or three parameters, and their complexity remains static. Thus, the main advantage of neural-network models is that model complexity is automatically adjusted to the complexity of the failure history.

We have demonstrated how you can use neural-network models and training regimes for reliability prediction. Results with actual testing and debugging data suggest that neural-network models are better at endpoint predictions than analytic models. Though the results presented here are for only one data set, they are consistent with 13 other data sets we tested.

The major advantages of the neural-network approach are:

+ It is a black-box approach; the user need not know much about the underlying failure process of the project.

+ It is easy to adapt models of varying complexity at different phases of testing within a project as well as across projects.

+ You can simultaneously construct a model and estimate its parameters if you use a training algorithm like cascade correlation.

We recognize that our experiments are only beginning to tap the potential of neural-network models in reliability, but we believe that this class of models will eventually offer significant benefits. We also recognize that our approach is very new and still needs research to demonstrate its practicality on a broad range of software projects.

ACKNOWLEDGMENTS

We thank the IEEE Software reviewers for their useful comments and suggestions. We also thank Scott Fahlman for providing the code for his cascade-correlation algorithm. This research was supported in part by NSF grant IN-9010546, and in part by a project funded by the SDIO/IST and monitored by the Office of Naval Research.

REFERENCES

1. S. Fahlman and C. Lebiere, "The Cascade-Correlation Learning Architecture," Tech. Report CMU-CS-90-100, CS Dept., Carnegie Mellon Univ., Pittsburgh, Feb. 1990.
2. D. Rumelhart, G. Hinton, and R. Williams, "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing, Volume I, MIT Press, Cambridge, Mass., 1986, pp. 318-362.
3. Y. Tohma et al., "Parameter Estimation of the Hyper-Geometric Distribution Model for Real Test/Debug Data," Tech. Report 901002, CS Dept., Tokyo Inst. of Technology, 1990.
4. J. Musa, A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw-Hill, New York, 1987.
5. Y. Malaiya, N. Karunanithi, and P. Verma, "Predictability Measures for Software Reliability Models," IEEE Trans. Reliability Eng. (to appear).
6. Software Reliability Models: Theoretical Developments, Evaluation and Applications, Y. Malaiya and P. Srimani, eds., IEEE CS Press, Los Alamitos, Calif., 1990.

Nachimuthu Karunanithi is a PhD candidate in computer science at Colorado State University. His research interests are neural networks, genetic algorithms, and software-reliability modeling. Karunanithi received a BE in electrical engineering from PSG Tech., Madras University, in 1982 and an ME in computer science from Anna University, Madras, in 1984. He is a member of the subcommittee on software-reliability engineering of the IEEE Computer Society's Technical Committee on Software Engineering.

Darrell Whitley is an associate professor of computer science at Colorado State University. He has published more than 30 papers on neural networks and genetic algorithms. Whitley received an MS in computer science and a PhD in anthropology, both from Southern Illinois University. He serves on the Governing Board of the International Society for Genetic Algorithms and is program chair of both the 1992 Workshop on Combinations of Genetic Algorithms and Neural Networks and the 1992 Foundations of Genetic Algorithms Workshop.

Yashwant K. Malaiya is a guest editor of this special issue. His photograph and biography appear elsewhere in this issue.

Address questions about this article to Karunanithi at CS Dept., Colorado State University, Fort Collins, CO 80523; Internet karunani@cs.colostate.edu.