Efficient Defect Estimation Method


An Efficient Defect Estimation Method for Software Defect Curves

Supported by the National Natural Science Foundation of China (60233020), the Aviation Science Foundation of China (01F51025) and the 863
Program of China (2001AA113192).
Chenggang Bai
Department of Automatic
Control, Beijing University of
Aeronautics and Astronautics
Beijing 100083, China
[email protected]
Kai-Yuan Cai
Department of Automatic
Control, Beijing University of
Aeronautics and Astronautics
Beijing 100083, China
[email protected]
T.Y. Chen
School of Information
Technology, Swinburne
University of Technology
Hawthorn 3122, Australia
[email protected]
Abstract
Software defect curves describe the behavior of the
estimate of the number of remaining software defects as
software testing proceeds. They are of two possible
patterns: single-trapezoidal-like curves or multiple-
trapezoidal-like curves. In this paper we present some
necessary and/or sufficient conditions for software defect
curves of the Goel-Okumoto NHPP model. These
conditions can be used to predict the effect of the
detection and removal of a software defect on the
variations of the estimates of the number of remaining
defects. A field software reliability dataset is used to
justify the trapezoidal shape of software defect curves and
our theoretical analyses. The results presented in this
paper may provide useful feedback information for
assessing software testing progress and have potential in
the emerging area of software cybernetics, which explores
the interplay between software and control.
1. Introduction
Software defects play a key role in software reliability
study. Therefore, it is very important to study the
properties of software defects. As pointed out by
Yourdon [1], "Defects are not the only measure of
quality, of course; but they are the most visible indicator
of quality throughout a project". The problem has been
studied by many researchers [2-4]. Many early studies of
defect occurrence suggested that it followed a Rayleigh
curve [5, 6], roughly proportional to project staffing. The
underlying assumption is that the more effort expended,
the more mistakes are made. McConnell [7] has discussed
the relationship between defect rate and development
time. In his observations, projects that achieve the lowest
defect rates also achieve the shortest schedules. Since
software testing is the immediate phase prior to the
release of software, it will be most interesting to know
more about the relationship between the number of
remaining software defects and testing process. Although
there have been some experimental curves to depict the
relationship between the number of remaining software
defects and testing process, a thorough and rigorous
analysis has not yet been conducted. In our previous
investigation [8], we have presented the notion of
software defect curve to depict such a relationship. This
paper is a follow-up of our previous investigation.
Software defect curves describe the behavior of the
estimate of the number of remaining software defects as
software testing proceeds. Figure 1 shows a typical
pattern of software defect curves under a
time-homogeneous test profile. In Figure 1, $n$ denotes the
number of software defects detected and removed, and
$\hat{M}_n$ denotes the estimate of the number of defects
remaining in the software after $n$ defects are detected and
removed. In the early testing phase, as more and more
software defects are detected and removed, the estimate
of the number of remaining defects tends to increase. In
the middle testing phase, the estimate of the remaining
defects remains steady. In the late testing phase, the
estimate of the number of remaining defects tends to
decrease.
If the test profile is time-nonhomogeneous, changing
from one profile to another, then the expected trend of
software defect curves is a multiple-trapezoidal-like
curve as shown in Figure 2.
The trapezoidal shape of software defect curves has
been recently proposed [8]. It was motivated mainly by
the question: as software testing proceeds and software
reliability tends to grow, does the estimate of
the number of remaining software defects tend to
decrease? This question was triggered by the empirical
observation that, under certain circumstances, the
estimate of the initial number of software defects tends to
increase as software testing proceeds [2].
Figure 2 Expected trend of software defect
curves under multiple test profiles
Proceedings of the 27th Annual International Computer Software and Applications Conference (COMPSAC03)
0730-3157/03 $ 17.00 2003 IEEE
Obviously, software defect curves help to answer this
question. A considerable amount of theoretical analysis
and simulation has been conducted to justify the
trapezoidal shape of software defect curves and their
potential applications [8]. Software defect curves may
help to detect changes in the software test profile, identify
stages of software testing, improve software testing
strategies, and so on. To do so, it is necessary to
analyze the actual trend of a software defect curve. In
particular, we need to characterize the relationship
between $\hat{M}_n$ and $\hat{M}_{n+1}$ (refer to Figure 1). In this paper
we present some necessary and/or sufficient conditions
for this relationship. These conditions are easily verifiable.
These results also supplement our previous theoretical
analyses of software defect curves [8].
The rest of the paper is organized as follows. Section 2
reviews the Goel-Okumoto NHPP model of software
reliability. Section 3 presents necessary and/or sufficient
conditions for software defect curves under the
Goel-Okumoto NHPP model. Section 4 presents a software
defect curve based on Musa's software reliability data.
Concluding remarks are presented in Section 5.
2. The Goel-Okumoto NHPP model
The Goel-Okumoto NHPP model is one of the most
important models in software reliability engineering. It is
based on the following assumptions [9].
(1). Software is tested under the anticipated operating
environment.
(2). For any finite set of time instants
$t_1 < t_2 < \cdots < t_n$, the numbers of software failures
$q_1, q_2, \ldots, q_n$ observed in the time intervals
$(0, t_1], (t_1, t_2], \ldots, (t_{n-1}, t_n]$,
respectively, are independent.
(3). Every software defect has an equal chance of
being detected.
(4). The cumulative number of software failures
observed up to time $t$, $N(t)$, follows a Poisson
distribution with the mean value $m(t)$ such that the
number of software failures observed in the interval
$(t, t + \Delta t)$ is proportional to the interval length and to
the number of remaining software defects at time $t$.
(5). $m(t)$ is a bounded and non-decreasing function
with
$$m(t) = 0 \text{ as } t \to 0, \quad \text{and} \quad m(t) \to a \text{ as } t \to \infty,$$
where $a$ is the total number of software failures
eventually observed.
With these assumptions, we have
$$m(t) = a\{1 - \exp(-bt)\}$$
In this way
$$\Pr\{N(t_i) - N(t_{i-1}) = q_i\} = \frac{[m(t_i) - m(t_{i-1})]^{q_i} \exp\{-(m(t_i) - m(t_{i-1}))\}}{q_i!}$$
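The mean value function and the Poisson probability of observing a given number of failures in an interval can be evaluated directly. The following is a minimal sketch, not part of the original paper; the function names and the parameter values used to exercise them are illustrative assumptions.

```python
import math


def mean_failures(t: float, a: float, b: float) -> float:
    """Goel-Okumoto mean value function m(t) = a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t))


def prob_failures(q: int, t_prev: float, t_cur: float, a: float, b: float) -> float:
    """Pr{N(t_cur) - N(t_prev) = q}: Poisson with mean m(t_cur) - m(t_prev)."""
    mu = mean_failures(t_cur, a, b) - mean_failures(t_prev, a, b)
    if mu <= 0.0:
        # An empty interval can only contain zero failures.
        return 1.0 if q == 0 else 0.0
    # Work in log space so that large q does not overflow mu**q or q!.
    return math.exp(q * math.log(mu) - mu - math.lgamma(q + 1))
```

Summing `prob_failures` over all `q` for a fixed interval should give a value close to 1, which is a quick sanity check on hypothetical values of `a` and `b`.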

Suppose that there is a one-to-one correspondence
between failures and defects, and each failure-causing
defect is removed immediately upon the detection of a
failure, without introducing new defects. Then, the
number of software failures is equal to the number of
software defects. Thus the Goel-Okumoto NHPP model
can be used to estimate the number of software defects [2].
Originally in the Goel-Okumoto NHPP model,
$\{t_1, t_2, \ldots, t_n\}$ can be any observation time instants
given a priori, and the model deals with $\{q_1, q_2, \ldots, q_n\}$,
which are classified as type II data² [10]. It should be
noted that the Goel-Okumoto NHPP model deals with both
type I and type II data. Since type I data are more accurate
than type II data, we use type I data in this study.

² Type I data mean that software reliability data are represented in terms
of the time intervals between successive software failures. Type II data
mean that software reliability data are represented in terms of the
numbers of failures observed in successive time intervals.

Figure 1 Expected trend of software defect
curves under a single test profile (horizontal axis: $n$; vertical axis: $\hat{M}_n$; regions: early phase, middle phase, late phase)
[Figure 2 plot: horizontal axis $n$; vertical axis $\hat{M}_n$; regions: Test profile 1, Test profile 2, Test profile 3]
Here we still treat $t_i$ as the time instant of the $i$th
software failure. Let the software failure process be
represented as in Figure 3, where $\{X_i : i = 1, \ldots, n\}$ is a series
of time intervals between successive software failures
with
$$X_i = T_i - T_{i-1}.$$
Suppose $X_1, X_2, \ldots, X_n$ are independent and $t_i$ is a
realization of $T_i$. Then the joint density
function of $T_1, T_2, \ldots, T_n$ can be determined by³
$$L(t_1, t_2, \ldots, t_n) = (ab)^n \exp\Big\{-b \sum_{i=1}^{n} t_i\Big\} \exp\{-a(1 - e^{-b t_n})\}$$
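Hence, the maximum likelihood estimates $\hat{a}_n$ and $\hat{b}_n$ of $a$ and $b$
are determined by the following equations
$$\frac{n}{\hat{a}_n} = 1 - e^{-\hat{b}_n t_n} \qquad (1)$$
$$\frac{n}{\hat{b}_n} = \hat{a}_n t_n e^{-\hat{b}_n t_n} + \sum_{i=1}^{n} t_i \qquad (2)$$
Here we are more interested in $\hat{M}_n$ than $\hat{a}_n$, with
$$\hat{M}_n = \hat{a}_n - n \qquad (3)$$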
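Equations (1) and (2) can be checked by differentiating the logarithm of the joint density. The derivation below is a worked sketch that assumes the likelihood has the standard NHPP form $L = (ab)^n \exp\{-b \sum t_i\} \exp\{-a(1 - e^{-b t_n})\}$ and that the maximum is attained at an interior point.

```latex
\begin{align*}
\ln L &= n \ln a + n \ln b - b \sum_{i=1}^{n} t_i - a\left(1 - e^{-b t_n}\right) \\
\frac{\partial \ln L}{\partial a} &= \frac{n}{a} - \left(1 - e^{-b t_n}\right) = 0
  \;\Longrightarrow\; \frac{n}{\hat{a}_n} = 1 - e^{-\hat{b}_n t_n} \\
\frac{\partial \ln L}{\partial b} &= \frac{n}{b} - \sum_{i=1}^{n} t_i - a\, t_n e^{-b t_n} = 0
  \;\Longrightarrow\; \frac{n}{\hat{b}_n} = \hat{a}_n t_n e^{-\hat{b}_n t_n} + \sum_{i=1}^{n} t_i
\end{align*}
```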
Let $z = \hat{b}_n t_n$ and $P_n = \dfrac{\sum_{i=1}^{n} t_i}{n t_n}$; the following then follow from
equations (1)-(3):
$$\hat{M}_n = \frac{n}{e^z - 1} \qquad (4)$$
$$P_n = \frac{1}{z} - \frac{1}{e^z - 1} \qquad (5)$$

³ The joint density function can be obtained from most
books on non-homogeneous Poisson processes, e.g. [11].
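Equations (4) and (5) suggest a simple numerical route to the estimate of remaining defects: compute $P_n$ from the failure times, solve $P_n = 1/z - 1/(e^z - 1)$ for $z$, and substitute into $n/(e^z - 1)$. The sketch below uses plain bisection, which is an assumption of this illustration rather than the authors' procedure; the function names and the cutoff `Z_MIN` are hypothetical. Because the right-hand side of (5) decreases from 1/2 to 0 as $z$ grows, no finite root exists when $P_n \ge 1/2$, and the sketch returns infinity in that case.

```python
import math

Z_MIN = 1e-6  # below this, equation (5) is numerically indistinguishable from 1/2


def inv_expm1(z: float) -> float:
    """1 / (e^z - 1), returning 0.0 where e^z would overflow a float."""
    return 0.0 if z > 700.0 else 1.0 / math.expm1(z)


def p_of_z(z: float) -> float:
    """Right-hand side of equation (5); strictly decreasing in z > 0."""
    return 1.0 / z - inv_expm1(z)


def estimate_remaining(times: list[float]) -> float:
    """Estimate of remaining defects from failure instants t_1 <= ... <= t_n."""
    n = len(times)
    p_n = sum(times) / (n * times[-1])
    if p_n >= p_of_z(Z_MIN):
        return math.inf  # equation (5) has no finite positive root
    lo, hi = Z_MIN, 1.0
    while p_of_z(hi) > p_n:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):  # bisection on the decreasing function p_of_z
        mid = 0.5 * (lo + hi)
        if p_of_z(mid) > p_n:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return n * inv_expm1(z)  # equation (4)
```

Evenly spaced failure times such as `[1.0, 2.0, 3.0, 4.0]` give $P_n > 1/2$ and therefore an unbounded estimate, whereas failure times with widening gaps give a finite one.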
Let $z(w)$ be the inverse function of
$$w = \frac{1}{z} - \frac{1}{e^z - 1},$$
and
$$h(w) = \frac{1}{e^{z(w)} - 1}.$$
In [8], we have proved the following
proposition.
Proposition 1 $\hat{M}_{n+1} \sim \hat{M}_n$ iff $(n+1)\,h(P_{n+1}) \sim n\,h(P_n)$,
where $\sim$ can be $>$, $=$, or $<$. Q.E.D.
In the next section, we will present another necessary
and sufficient condition for software defect curves, which
is easier to verify.
3. Necessary and sufficient conditions
From equations (4) and (5), we note
$$P_n = \frac{1}{\ln\left(1 + \dfrac{n}{\hat{M}_n}\right)} - \frac{\hat{M}_n}{n}$$
Thus, we have
$$\lim_{\hat{M}_n / n \to 0} P_n = 0$$
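The expression for $P_n$ in terms of $\hat{M}_n / n$ is obtained by eliminating $z$ between equations (4) and (5). A short worked step, restating (4) as $\hat{M}_n = n/(e^z - 1)$ and (5) as $P_n = 1/z - 1/(e^z - 1)$:

```latex
\begin{align*}
\hat{M}_n = \frac{n}{e^z - 1}
  &\;\Longrightarrow\; e^z = 1 + \frac{n}{\hat{M}_n},
  \qquad z = \ln\left(1 + \frac{n}{\hat{M}_n}\right) \\
P_n = \frac{1}{z} - \frac{1}{e^z - 1}
  &= \frac{1}{\ln\left(1 + n/\hat{M}_n\right)} - \frac{\hat{M}_n}{n}
\end{align*}
```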
Let
$$y = f(x) = \begin{cases} \dfrac{1}{\ln\left(1 + \frac{1}{x}\right)} - x, & x > 0 \\ 0, & x = 0 \end{cases}$$
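Lemma 1 $y = f(x)$ is continuous and strictly
increasing on $[0, \infty)$.
Proof It is obvious that $y = f(x)$ is continuous on
$[0, \infty)$.
Furthermore, for $x \in (0, \infty)$, $y = f(x)$ is equivalent
to
$$x = \frac{1}{e^{\theta} - 1}, \qquad y = \frac{1}{\theta} - \frac{1}{e^{\theta} - 1} \qquad (6)$$
with $\theta = \ln(1 + 1/x) > 0$. So,
$$\frac{dy}{dx} = \frac{dy}{d\theta} \Big/ \frac{dx}{d\theta} = \frac{(e^{\theta} - 1)^2}{\theta^2 e^{\theta}} - 1$$
It is easy to prove that
$$\frac{(e^{\theta} - 1)^2}{\theta^2 e^{\theta}} - 1 > 0, \quad \text{for } \theta > 0.$$
Therefore $y = f(x)$ is strictly increasing for $x \in (0, \infty)$.
Finally, $f(0) = 0$, and $y = f(x) > 0$ for $x \in (0, \infty)$
because $\ln(1 + \frac{1}{x}) < \frac{1}{x}$. Hence, we can conclude that
$y = f(x)$ is strictly increasing on $[0, \infty)$. Q.E.D.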
Corollary 1 $x = k(y) = f^{-1}(y)$ is continuous and
strictly increasing on $[0, \frac{1}{2})$.
Proof Note $f(0) = 0$ and
$$\lim_{x \to \infty} \left[ \frac{1}{\ln\left(1 + \frac{1}{x}\right)} - x \right] = \frac{1}{2}.$$
Therefore, we have $y \in [0, \frac{1}{2})$, and $x = k(y)$ is
continuous and strictly increasing on $[0, \frac{1}{2})$
as a result of
Lemma 1. Q.E.D.
Figure 3 Software failure process (time axis $t$ with failure instants $T_0 = 0, T_1, T_2, \ldots, T_{i-1}, T_i$ and inter-failure intervals $X_1, X_2, \ldots, X_i$)
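The function $f$ and its inverse $k$ have no closed form for $k$, but both are easy to evaluate numerically. The following is a minimal sketch that inverts $f$ by bisection, relying only on the monotonicity stated in Lemma 1; the bisection approach, the bracket-growth cap, and the function names are assumptions of this illustration.

```python
import math


def f(x: float) -> float:
    """f(x) = 1/ln(1 + 1/x) - x for x > 0, with f(0) = 0."""
    if x == 0.0:
        return 0.0
    return 1.0 / math.log1p(1.0 / x) - x


def k(y: float) -> float:
    """Inverse of f on [0, 1/2), found by bisection (f is strictly increasing)."""
    if not 0.0 <= y < 0.5:
        raise ValueError("k(y) is only defined for 0 <= y < 1/2")
    if y == 0.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while f(hi) < y:  # grow the bracket until it contains the root
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("y is too close to 1/2 for double precision")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating `f` at increasingly large arguments shows the values approaching, but never reaching, 1/2, in line with the limit used in the proof of Corollary 1.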
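Lemma 2 For $x \in (0, \infty)$, $\dfrac{d^2 y}{dx^2} < 0$.
Proof Refer to equation (6), in which $x = \frac{1}{e^{\theta} - 1}$ and $y = \frac{1}{\theta} - \frac{1}{e^{\theta} - 1}$ with $\theta > 0$.
$$\frac{d^2 y}{dx^2} = \frac{\dfrac{d^2 y}{d\theta^2} \cdot \dfrac{dx}{d\theta} - \dfrac{dy}{d\theta} \cdot \dfrac{d^2 x}{d\theta^2}}{\left(\dfrac{dx}{d\theta}\right)^3} = \frac{2e^{\theta} - \theta e^{\theta} - \theta - 2}{\theta^3 (e^{\theta} - 1) \left(\dfrac{dx}{d\theta}\right)^2}$$
It is easy to verify that $2e^{\theta} - \theta e^{\theta} - \theta - 2 < 0$ for
$\theta > 0$. This completes the proof. Q.E.D.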
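Lemma 3 $k(y)$ is a strictly convex function over
$[0, \frac{1}{2})$.
Proof First,
$$\frac{d^2 x}{dy^2} = \frac{d}{dy}\left(\frac{dx}{dy}\right) = -\frac{dx}{dy} \cdot \frac{d^2 y}{dx^2} \Big/ \left(\frac{dy}{dx}\right)^2 > 0$$
Hence, $k(y)$ is a strictly convex function over
$(0, \frac{1}{2})$.
Furthermore, with $\theta$ as in equation (6), we have
$$k'(0) = \lim_{y \to 0} \frac{k(y) - k(0)}{y - 0} = \lim_{y \to 0} \frac{x}{y} = \lim_{y \to 0} \frac{dx}{dy} = \lim_{\theta \to \infty} \frac{\theta^2 e^{\theta}}{(e^{\theta} - 1)^2 - \theta^2 e^{\theta}} = 0$$
$$k''(0) = \lim_{y \to 0} \frac{\dfrac{dx}{dy} - k'(0)}{y - 0} \ge 0$$
This implies that $k(y)$ is a convex function defined
on $[0, \frac{1}{2})$, that is, for all $y \in [0, \frac{1}{2})$ and
$\lambda \in (0, 1)$, there holds
$$k((1 - \lambda) \cdot 0 + \lambda y) \le (1 - \lambda) k(0) + \lambda k(y)$$
or $k(\lambda y) \le \lambda k(y)$.
In order to show that $k(y)$ is a strictly convex
function defined on $[0, \frac{1}{2})$, we need to show
$k(\lambda y) \ne \lambda k(y)$ for $y \in (0, \frac{1}{2})$, $\lambda \in (0, 1)$. This is to be
done by contradiction. Suppose $\exists\, y_0 \in (0, \frac{1}{2})$, $\lambda_0 \in (0, 1)$,
such that $k(\lambda_0 y_0) = \lambda_0 k(y_0)$.
Let $x_0 = k(y_0)$, so that $\lambda_0 x_0 = k(\lambda_0 y_0)$. Then
$$y_0 = \frac{1}{\ln\left(1 + \frac{1}{x_0}\right)} - x_0$$
$$\lambda_0 y_0 = \frac{1}{\ln\left(1 + \frac{1}{\lambda_0 x_0}\right)} - \lambda_0 x_0$$
However, we also have
$$\lambda_0 y_0 = \lambda_0 \left( \frac{1}{\ln\left(1 + \frac{1}{x_0}\right)} - x_0 \right)$$
Thus,
$$\left(1 + \frac{1}{x_0}\right)^{1/\lambda_0} = 1 + \frac{1}{\lambda_0 x_0}$$
On the other hand, by Bernoulli's inequality we have
$$\left(1 + \frac{1}{x_0}\right)^{1/\lambda_0} > 1 + \frac{1}{\lambda_0 x_0}, \quad x_0 > 0, \ \lambda_0 \in (0, 1)$$
This leads to a contradiction. Q.E.D.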
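Theorem 1 $\hat{M}_{n+1} \sim \hat{M}_n$ iff $(n+1)\,k(P_{n+1}) \sim n\,k(P_n)$,
where $\sim$ can be $>$, $<$, or $=$.
Proof The proof follows from $P_n = f\left(\dfrac{\hat{M}_n}{n}\right)$, that is, $\hat{M}_n = n\,k(P_n)$, and the fact that
$k(\cdot)$ is a strictly increasing function. Q.E.D.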
Theorem 2 If $P_{n+1} > P_n$, then $\hat{M}_{n+1} > \hat{M}_n$.
Proof $P_{n+1} > P_n$ implies $k(P_{n+1}) > k(P_n)$, which in
turn implies $(n+1)\,k(P_{n+1}) > n\,k(P_n)$. Q.E.D.
Theorem 3 If $(n+1) P_{n+1} < n P_n$, then $\hat{M}_{n+1} < \hat{M}_n$.
Proof Note $k(\cdot)$ is a strictly increasing and convex
function, and $k(0) = 0$. Hence:
$$k(P_{n+1}) < k\left(\frac{n}{n+1} P_n\right) = k\left(\frac{n}{n+1} P_n + \frac{1}{n+1} \cdot 0\right) < \frac{n}{n+1} k(P_n) + \frac{1}{n+1} k(0) = \frac{n}{n+1} k(P_n)$$
This completes the proof as a result of Theorem 1.
Q.E.D.
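Theorems 1-3 can be combined into a single decision procedure: try the two cheap sufficient conditions first and fall back on the exact comparison of Theorem 1 only when neither applies. The sketch below is one possible arrangement, not the authors' implementation; all function names are hypothetical, $k$ is inverted by bisection as an assumption, and cases where $P_n$ or $P_{n+1}$ is at least 1/2 (where $k$ is undefined) are reported as undefined.

```python
import math


def f(x: float) -> float:
    """f(x) = 1/ln(1 + 1/x) - x for x > 0, with f(0) = 0."""
    return 0.0 if x == 0.0 else 1.0 / math.log1p(1.0 / x) - x


def k(y: float) -> float:
    """Inverse of f on [0, 1/2) by bisection (assumed numerical method)."""
    if y == 0.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while f(hi) < y:
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("y is too close to 1/2 for double precision")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_stat(times: list[float]) -> float:
    """P_n = (t_1 + ... + t_n) / (n * t_n)."""
    return sum(times) / (len(times) * times[-1])


def trend_after_next_failure(times: list[float]) -> tuple[str, str]:
    """times holds t_1..t_{n+1}; returns (trend of the estimate, rule used)."""
    n = len(times) - 1
    p_n, p_next = p_stat(times[:-1]), p_stat(times)
    if p_n >= 0.5 or p_next >= 0.5:
        return ("undefined", "P outside [0, 1/2)")
    if p_next > p_n:
        return ("increase", "Theorem 2")
    if (n + 1) * p_next < n * p_n:
        return ("decrease", "Theorem 3")
    lhs, rhs = (n + 1) * k(p_next), n * k(p_n)  # exact test of Theorem 1
    if lhs > rhs:
        return ("increase", "Theorem 1")
    if lhs < rhs:
        return ("decrease", "Theorem 1")
    return ("equal", "Theorem 1")
```

With the made-up failure instants `[1, 2, 3, 50]`, a next failure at time 51 is classified by Theorem 2, whereas a next failure at time 200 satisfies neither sufficient condition and is resolved by Theorem 1.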
4. Example of software defect curves
In this example, we use the software reliability dataset
of Musa [12]. $\hat{M}_n$ ($1 \le n \le 136$) can be computed from
Equations (1)-(3). Figure 4 depicts the trajectory of $\hat{M}_n$.
Disregarding the portion of $\hat{M}_n$ for $1 \le n \le 27$, which
fluctuates excessively and is unreliable, as anticipated at the very
early stage of software testing, Figure 4 matches the
pattern of Figure 2. For $28 \le n \le 95$, the behavior of $\hat{M}_n$
is displayed in Figure 5, which also matches the pattern of
Figure 1.
Figure 4 Trajectory of $\hat{M}_n$ for the Musa Data (horizontal axis: $n$, 0 to 140; vertical axis: $\hat{M}_n$, 0 to 180)
Figure 5 Trajectory of $\hat{M}_n$ for the Musa Data from $n = 28$ to $n = 95$ (horizontal axis: $n$, 20 to 100; vertical axis: $\hat{M}_n$, 0 to 40)
The example software defect curves in Figures 4 and 5
demonstrate:
(1) The trapezoidal shape of software defect curves
complies with empirical observations.
(2) Although the expected trend of a software defect
curve may be a single-trapezoidal-like curve or
a multiple-trapezoidal-like curve, fluctuations are
associated with the software defect curve throughout the
testing process.
(3) For the trapezoidal-like software defect curve, if $\hat{M}_n$
tends to increase, then the software testing is in its early
phase with respect to the corresponding test profile; if $\hat{M}_n$
tends to fluctuate around certain values, then the
software testing is in its middle phase with respect to the
corresponding test profile; if $\hat{M}_n$ tends to decrease, then
the software testing is in its late phase with respect to the
corresponding test profile.
Once we know whether $\hat{M}_n$ tends to increase or
decrease, we can judge the phase of software testing. As
shown in Equations (1)-(3), $\hat{M}_n$ can be calculated only
after we know $\hat{b}_n$. However, Equation (2) defines $\hat{b}_n$
only implicitly. Therefore, it is very difficult to calculate
$\hat{M}_n$ on-line during the process of software testing. To
solve this problem, we have developed Theorems
1-3 to provide more easily computable conditions.
These conditions involve $P_n$, which can be computed
directly from the data collected during the process of
software testing. In view of Theorems 1-3, it is natural to
ask: will $P_n$ be as effective as $\hat{M}_n$ in judging the stage of
software testing? Figures 6 and 7 show the relationship
between $P_n$ and $n$.
Figure 6 Trajectory of $P_n$ for the Musa Data (horizontal axis: $n$, 0 to 140; vertical axis: $P_n$, 0.2 to 1)
Figure 7 Trajectory of $P_n$ for the Musa Data from $n = 28$ to $n = 95$ (horizontal axis: $n$, 20 to 100; vertical axis: $P_n$, 0.28 to 0.46)
Having compared Figures 4 and 6, as well as Figures 5
and 7, we can conclude that the patterns of the trajectories
of $P_n$ and $\hat{M}_n$ are very similar. Since it is easier to
compute $P_n$, $P_n$ should be used instead of $\hat{M}_n$ to check
the stage of software testing.
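Because $P_n$ depends only on the running sum of the failure instants and on the latest failure instant, it can be maintained incrementally as failures are observed. The following is a minimal sketch, assuming cumulative failure times arrive one at a time; the class name `RunningP` and its method are illustrative and not taken from the paper.

```python
class RunningP:
    """Maintains P_n = (t_1 + ... + t_n) / (n * t_n) in O(1) per failure."""

    def __init__(self) -> None:
        self.n = 0        # number of failures observed so far
        self.total = 0.0  # running sum t_1 + ... + t_n

    def update(self, t: float) -> float:
        """Record the cumulative time t of the next failure and return P_n."""
        self.n += 1
        self.total += t
        return self.total / (self.n * t)
```

Feeding successive failure instants to `update` yields the trajectory of $P_n$ without solving Equation (2) at any step.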
5. Conclusion
In this paper, we have reviewed the trapezoidal shape
of software defect curves and presented some necessary
and/or sufficient conditions for software defect curves of
the Goel-Okumoto NHPP model. These results have
supplemented the results of our previous theoretical
analyses. Our necessary/sufficient conditions provide a
faster approach to predict the types of change for the
estimated number of remaining software defects, after
detecting and removing a software defect. We have also
used an example software defect curve generated by a
field software reliability dataset to further justify the
trapezoidal shape of software defect curves, and to
demonstrate the applications of our necessary/sufficient
conditions. In addition, these necessary/sufficient
conditions may help to assess software testing progress
and thus provide useful feedback information for adaptive
software testing, which is the counterpart of adaptive control
in software testing and falls within the scope of software
cybernetics. Software cybernetics explores the interplay
between software and cybernetics [13, 14].
References
[1] E. Yourdon, "Software Metrics," Application Development
Strategies (newsletter), Nov. 1994, pp. 16.
[2] K.Y. Cai, Software Defect and Operational Profile
Modeling, Kluwer Academic Publishers, 1998.
[3] A.M. Neufelder, "How to Predict Software Defect Density
during Proposal Phase," National Aerospace and
Electronics Conference (NAECON 2000), pp. 71-76.
[4] N.E. Fenton and M. Neil, "A Critique of Software Defect
Prediction Models," IEEE Transactions on Software
Engineering, Vol. 25, Issue 5, pp. 675-689.
[5] D.N. Card, "Managing Software Quality with Defects," in
Proceedings: Computer Software and Applications
Conference, August 2002, IEEE Computer Society.
[6] C.Y. Huang, S.Y. Kuo, and I.Y. Chen, "Analysis of a
Software Reliability Growth Model with Logistic
Testing-Effort Function," 8th International Symposium on Software
Reliability Engineering, Nov. 1997, pp. 378-388.
[7] S. McConnell, "Software Quality at Top Speed," Software
Development, Aug. 1996.
[8] C.G. Bai, K.Y. Cai, "Software Defect Curves," submitted
for publication, 2002.
[9] A.L. Goel and K. Okumoto, "Time-Dependent Error-Detection
Rate Model for Software Reliability and Other
Performance Measures," IEEE Transactions on Reliability,
Vol. R-28, No. 3, 1979, pp. 206-211.
[10] K.Y. Cai, "Towards a Conceptual Framework of Software
Run Reliability Modeling," Information Sciences, Vol. 126,
2000, pp. 137-163.
[11] D.L. Snyder, Random Point Processes, Wiley, New York,
1975.
[12] J.D. Musa, Software Reliability Data, Bell Telephone
Laboratories, Whippany, N.J. 07981, 1979.
[13] K.Y. Cai, "Optimal Software Testing and Adaptive Software
Testing in the Context of Software Cybernetics,"
Information and Software Technology, Vol. 44, 2002,
pp. 841-855.
[14] K.Y. Cai, T.Y. Chen, T.H. Tse, "Towards Research on
Software Cybernetics," Proc. 7th IEEE International
Symposium on High Assurance Systems Engineering, 2002,
pp. 240-241.