Efficient Defect Estimation Method
Supported by the National Natural Science Foundation of China (60233020), the Aviation Science Foundation of China (01F51025) and the 863
Program of China (2001AA113192).
Chenggang Bai
Department of Automatic
Control, Beijing University of
Aeronautics and Astronautics
Beijing 100083, China
[email protected]
Kai-Yuan Cai
Department of Automatic
Control, Beijing University of
Aeronautics and Astronautics
Beijing 100083, China
[email protected]
T.Y. Chen
School of Information
Technology, Swinburne
University of Technology
Hawthorn 3122, Australia
[email protected]
Abstract
Software defect curves describe the behavior of the
estimate of the number of remaining software defects as
software testing proceeds. They are of two possible
patterns: single-trapezoidal-like curves or multiple-
trapezoidal-like curves. In this paper we present some
necessary and/or sufficient conditions for software defect
curves of the Goel-Okumoto NHPP model. These
conditions can be used to predict the effect of the
detection and removal of a software defect on the
variations of the estimates of the number of remaining
defects. A field software reliability dataset is used to
justify the trapezoidal shape of software defect curves and
our theoretical analyses. The results presented in this
paper may provide useful feedback information for
assessing software testing progress and have potential in
the emerging area of software cybernetics, which explores
the interplay between software and control.
1. Introduction
Software defects play a key role in software reliability
study. Therefore, it is very important to study the
properties of software defects. As pointed out by
Yourdon [1], "Defects are not the only measure of
quality, of course; but they are the most visible indicator
of quality throughout a project". The problem has been
studied by many researchers [2-4]. Many early studies of
defect occurrence suggested that it followed a Rayleigh
curve [5, 6], roughly proportional to project staffing. The
underlying assumption is that the more effort expended,
the more mistakes are made. McConnell [7] has discussed
the relationship between defect rate and development
time. In his observations, projects that achieve the lowest
defect rates also achieve the shortest schedules. Since
software testing is the phase immediately prior to the
release of software, it is of particular interest to know
more about the relationship between the number of
remaining software defects and the testing process. Although
there have been some experimental curves that depict the
relationship between the number of remaining software
defects and the testing process, a thorough and rigorous
analysis has not yet been conducted. In our previous
investigation [8], we presented the notion of a
software defect curve to depict such a relationship. This
paper is a follow-up to that investigation.
Software defect curves describe the behavior of the
estimate of the number of remaining software defects as
software testing proceeds. Figure 1 shows a typical
pattern of software defect curves under a time-homogeneous
test profile. In Figure 1, $n$ denotes the
number of software defects detected and removed, and
$M_n$ denotes the corresponding estimate of the number of
remaining software defects. The question of interest is the
relationship between $M_n$ and $M_{n+1}$ (refer to Figure 1).
In this paper we present some necessary and/or sufficient
conditions for this relationship. These conditions are easily
verifiable. These results also supplement our previous
theoretical analyses of software defect curves [8].
The rest of the paper is organized as follows. Section 2
reviews the Goel-Okumoto NHPP model of software
reliability. Section 3 presents necessary and/or sufficient
conditions for software defect curves under the Goel-Okumoto
NHPP model. Section 4 presents a software
defect curve based on Musa's software reliability data.
Concluding remarks are presented in Section 5.
2. The Goel-Okumoto NHPP model
The Goel-Okumoto NHPP model is one of the most
important models in software reliability engineering. It is
based on the following assumptions [9].
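(1). Software is tested under the anticipated operating
environment.
(2). For any finite set of time instants $t_1 < t_2 < \cdots < t_n$,
the numbers of software failures $q_1, q_2, \ldots, q_n$ observed
in the time intervals $(0, t_1], (t_1, t_2], \ldots, (t_{n-1}, t_n]$
are independent, and each follows a Poisson distribution:
$$\Pr\{q_i \text{ failures in } (t_{i-1}, t_i]\} = \frac{[m(t_i) - m(t_{i-1})]^{q_i}}{q_i!} \exp\{-[m(t_i) - m(t_{i-1})]\}, \quad i = 1, \ldots, n,$$
where $t_0 = 0$ and $m(t) = a(1 - e^{-bt})$ is the mean value function
of the model, with parameters $a > 0$ and $b > 0$.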
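As an illustration of assumption (2), the following minimal sketch evaluates the Poisson probability of observing a given number of failures in one interval. It assumes the standard Goel-Okumoto mean value function $m(t) = a(1 - e^{-bt})$; the function names and the parameter values `a = 100`, `b = 0.05` are hypothetical and chosen only for illustration.

```python
import math


def mean_value(t, a, b):
    """Goel-Okumoto mean value function m(t) = a * (1 - exp(-b * t))."""
    return a * (1.0 - math.exp(-b * t))


def interval_count_pmf(q, t_prev, t_curr, a, b):
    """Probability of exactly q failures in (t_prev, t_curr], t_curr > t_prev.

    Evaluated in log space so that large q does not overflow.
    """
    mu = mean_value(t_curr, a, b) - mean_value(t_prev, a, b)
    return math.exp(q * math.log(mu) - math.lgamma(q + 1) - mu)


# Hypothetical parameters, for illustration only.
a, b = 100.0, 0.05
probs = [interval_count_pmf(q, 0.0, 10.0, a, b) for q in range(200)]
total = sum(probs)  # should be very close to 1
```

The probabilities over all possible counts sum to one, which is a quick way to check that the interval mean was computed consistently.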
Suppose that there is a one-to-one correspondence
between failures and defects, and that each failure-causing
defect is removed immediately upon the detection of a
failure, without introducing new defects. Then, the
number of software failures is equal to the number of
software defects. Thus the Goel-Okumoto NHPP model
can be used to estimate the number of software defects [2].
Originally in the Goel-Okumoto NHPP model,
$\{t_1, t_2, \ldots, t_n\}$ can be any observation time instants
given a priori, and the model deals with $\{q_1, q_2, \ldots, q_n\}$,
which are classified as type II data (see footnote 2) [10]. It should be
noted that the Goel-Okumoto NHPP model deals with both
2
Type I data mean that software reliability data are represented in terms
of the time intervals between successive software failures. Type II data
mean that software reliability data are represented in terms of the
numbers of failures observed in successive time intervals.
[Figure: software defect curve, $M_n$ versus $n$, across Test profile 1, Test profile 2, and Test profile 3]
Proceedings of the 27th Annual International Computer Software and Applications Conference (COMPSAC03)
0730-3157/03 $ 17.00 2003 IEEE
type I and type II data. Since type I data is more accurate
than type II data, we use type I data in this study.
Here we still treat $t_i$ as the time instant of the $i$th
software failure. Let the software failure process be
represented as in Figure 3, where $\{X_i : i = 1, \ldots, n\}$ is a series
of time intervals between successive software failures
with $X_i = T_i - T_{i-1}$.
Suppose $X_1, X_2, \ldots, X_n$ are independent and $t_i$ is a
realization of $T_i$. Then the joint density
function of $T_1, T_2, \ldots, T_n$ can be determined by (see footnote 3)
$$L(t_1, t_2, \ldots, t_n) = (ab)^n \exp\Big\{-b \sum_{i=1}^{n} t_i\Big\} \exp\{-a(1 - e^{-b t_n})\}$$
Hence, the maximum likelihood estimates $a_n$ and $b_n$ of $a$ and $b$
are determined by the following equations:
$$a_n = \frac{n}{1 - e^{-b_n t_n}} \tag{1}$$
$$\frac{n}{b_n} = a_n t_n e^{-b_n t_n} + \sum_{i=1}^{n} t_i \tag{2}$$
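For readers who want the intermediate step, the following sketch shows how equations (1) and (2) arise, assuming the likelihood $L = (ab)^n \exp\{-b\sum t_i\}\exp\{-a(1 - e^{-bt_n})\}$ of the Goel-Okumoto model. It is a standard log-likelihood calculation, not a reproduction of the authors' own derivation.

```latex
\begin{align*}
\ln L &= n\ln a + n\ln b - b\sum_{i=1}^{n} t_i - a\left(1 - e^{-b t_n}\right), \\
\frac{\partial \ln L}{\partial a} &= \frac{n}{a} - \left(1 - e^{-b t_n}\right) = 0
  \;\Longrightarrow\; a_n = \frac{n}{1 - e^{-b_n t_n}}, \\
\frac{\partial \ln L}{\partial b} &= \frac{n}{b} - \sum_{i=1}^{n} t_i - a\, t_n e^{-b t_n} = 0
  \;\Longrightarrow\; \frac{n}{b_n} = a_n t_n e^{-b_n t_n} + \sum_{i=1}^{n} t_i .
\end{align*}
```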
Here we are more interested in $M_n$ than $a_n$, with
$$M_n = a_n - n \tag{3}$$
Let $z = b_n t_n$ and $P_n = \frac{\sum_{i=1}^{n} t_i}{n t_n}$. The following then follow from
equations (1)-(3):
$$M_n = \frac{n}{e^z - 1} \tag{4}$$
$$P_n = \frac{1}{z} - \frac{1}{e^z - 1} \tag{5}$$
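Equations (4) and (5) suggest a simple numerical recipe: compute $P_n$ from the failure times, solve $P_n = 1/z - 1/(e^z - 1)$ for $z$, and substitute into $M_n = n/(e^z - 1)$. The sketch below is one possible implementation using bisection. The function names are illustrative, and the check that $0 < P_n < 1/2$ reflects the fact that the right-hand side of (5) decreases from $1/2$ to $0$ as $z$ grows.

```python
import math


def w(z):
    """Right-hand side of equation (5): 1/z - 1/(e^z - 1), for z > 0."""
    # For large z the second term is negligible; skipping it avoids overflow.
    tail = 0.0 if z > 700.0 else 1.0 / math.expm1(z)
    return 1.0 / z - tail


def remaining_defects(failure_times):
    """Estimate M_n from cumulative failure times t_1 <= ... <= t_n.

    Returns None when P_n lies outside (0, 1/2), in which case
    equation (5) has no finite positive solution z.
    """
    n = len(failure_times)
    t_n = failure_times[-1]
    p_n = sum(failure_times) / (n * t_n)
    if not 0.0 < p_n < 0.5:
        return None
    lo, hi = 1e-6, 1.0
    while w(hi) > p_n:        # w is decreasing, so expand until bracketed
        hi *= 2.0
    for _ in range(200):      # plain bisection on z
        mid = 0.5 * (lo + hi)
        if w(mid) > p_n:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return 0.0 if z > 700.0 else n / math.expm1(z)
```

For example, `remaining_defects([1.0, 2.0, 3.0, 4.0, 10.0])` has $P_5 = 0.4$ and returns a positive estimate, while evenly spaced failure times such as `[1.0, 2.0, 3.0, 4.0]` give $P_4 = 0.625$ and therefore `None`.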
3
The joint density function can be found in most
books on non-homogeneous Poisson processes, e.g. [11].
Let $z(w)$ be the inverse function of $w = \frac{1}{z} - \frac{1}{e^z - 1}$, and
$h(w) = \frac{1}{e^{z(w)} - 1}$. In [8], we have proved the following
proposition.
Proposition 1 $M_{n+1} \sim M_n$ iff $(n+1)\,h(P_{n+1}) \sim n\,h(P_n)$,
where $\sim$ can be $>$, $=$, or $<$. Q.E.D.
In the next section, we will present another necessary
and sufficient condition for software defect curves, which
is easier to verify.
3. Necessary and sufficient conditions
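From equations (4) and (5), we note that $z = \ln(1 + \frac{n}{M_n})$, and hence
$$P_n = \frac{1}{\ln\left(1 + \frac{n}{M_n}\right)} - \frac{M_n}{n}$$
Thus, we have
$$\lim_{M_n/n \to 0} P_n = 0$$
Let
$$y = f(x) = \begin{cases} \dfrac{1}{\ln\left(1 + \frac{1}{x}\right)} - x, & x > 0 \\ 0, & x = 0 \end{cases}$$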
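Lemma 1 $y = f(x)$ is continuous and strictly
increasing on $[0, \infty)$.
Proof It is obvious that $y = f(x)$ is continuous on
$[0, \infty)$.
Furthermore, for $x \in (0, \infty)$, $y = f(x)$ is equivalent
to the parametric form
$$\begin{cases} x = \dfrac{1}{e^\theta - 1} \\ y = \dfrac{1}{\theta} - \dfrac{1}{e^\theta - 1} \end{cases} \qquad \theta > 0 \tag{6}$$
So,
$$\frac{dy}{dx} = -\frac{1}{\theta^2}\frac{d\theta}{dx} - 1 = \frac{1}{\theta^2} \cdot \frac{(e^\theta - 1)^2}{e^\theta} - 1$$
It is easy to prove that
$$\frac{1}{\theta^2} \cdot \frac{(e^\theta - 1)^2}{e^\theta} - 1 > 0, \quad \text{for } \theta > 0.$$
Therefore $y = f(x)$ is strictly increasing for $x \in (0, \infty)$.
Finally, $f(0) = 0$, and $y = f(x) > 0$ for $x \in (0, \infty)$
because $\ln(1 + \frac{1}{x}) < \frac{1}{x}$. Hence, we can conclude that
$y = f(x)$ is strictly increasing on $[0, \infty)$. Q.E.D.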
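The inequality described as easy to prove in the proof of Lemma 1 can be checked in one line. The sketch below writes $\theta > 0$ for the parameter of equation (6), that is, $x = 1/(e^\theta - 1)$, and uses the elementary fact that $\sinh u > u$ for $u > 0$; it is offered as one possible argument, not necessarily the one the authors had in mind.

```latex
\[
\frac{(e^{\theta} - 1)^2}{e^{\theta}}
  = e^{\theta} - 2 + e^{-\theta}
  = \left(e^{\theta/2} - e^{-\theta/2}\right)^2
  = 4\sinh^2\!\left(\frac{\theta}{2}\right)
  > 4\left(\frac{\theta}{2}\right)^2
  = \theta^2 ,
\qquad \theta > 0 .
\]
```

Dividing both sides by $\theta^2$ gives $\frac{(e^\theta - 1)^2}{\theta^2 e^\theta} > 1$, which is the positivity of $dy/dx$ used in the proof.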
Corollary 1 $x = k(y) = f^{-1}(y)$ is continuous and
strictly increasing on $[0, \frac{1}{2})$.
Figure 3 Software failure process (time axis $t$ with failure instants $T_0 = 0, T_1, T_2, \ldots, T_{i-1}, T_i$ separated by the intervals $X_1, X_2, \ldots, X_i$)
Proof Note $f(0) = 0$ and
$$\lim_{x \to \infty} \left[\frac{1}{\ln\left(1 + \frac{1}{x}\right)} - x\right] = \frac{1}{2}.$$
Therefore, we have $y \in [0, \frac{1}{2})$, and $x = k(y)$ is
continuous and strictly increasing on $[0, \frac{1}{2})$ as a result of
Lemma 1. Q.E.D.
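A quick numerical sanity check of Lemma 1 and of the limit used in the proof of Corollary 1 can be sketched as follows. The grid of sample points is arbitrary and the variable names are illustrative; the check does not replace the proof.

```python
import math


def f(x):
    """f(x) = 1/ln(1 + 1/x) - x for x > 0, with f(0) = 0."""
    if x == 0.0:
        return 0.0
    return 1.0 / math.log1p(1.0 / x) - x


# Sample points 0, 1e-6, 1e-5, ..., 1e4 (an arbitrary grid).
xs = [0.0] + [10.0 ** e for e in range(-6, 5)]
ys = [f(x) for x in xs]

is_increasing = all(a < b for a, b in zip(ys, ys[1:]))
below_half = all(0.0 <= y < 0.5 for y in ys)
gap_at_large_x = 0.5 - f(1.0e4)  # small and positive if f(x) -> 1/2
```

On this grid the values of `f` increase strictly and stay below $1/2$, consistent with the range $[0, \frac{1}{2})$ on which $k$ is defined. Evaluating `f` at much larger `x` is numerically delicate, because two nearly equal large numbers are subtracted.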
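Lemma 2 For $x \in (0, \infty)$, $\frac{d^2 y}{dx^2} < 0$.
Proof Refer to equation (6), and let $\theta > 0$ denote its parameter. By the chain rule,
$$\frac{d^2 y}{dx^2} = \frac{d}{d\theta}\left(\frac{dy}{dx}\right) \cdot \frac{d\theta}{dx} = \frac{(e^\theta - 1)^3 \, (2e^\theta - 2 - \theta e^\theta - \theta)}{\theta^3 e^{2\theta}}$$
It is easy to verify that $2e^\theta - 2 - \theta e^\theta - \theta < 0$ for
$\theta > 0$. This completes the proof. Q.E.D.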
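The sign condition invoked at the end of the proof of Lemma 2 can be verified by differentiating twice. The sketch below writes $\theta > 0$ for the parameter of equation (6) and introduces an auxiliary function $g$, which is a name chosen here for illustration and not part of the original text.

```latex
\begin{align*}
g(\theta) &= \theta e^{\theta} + \theta - 2e^{\theta} + 2, & g(0) &= 0, \\
g'(\theta) &= \theta e^{\theta} - e^{\theta} + 1, & g'(0) &= 0, \\
g''(\theta) &= \theta e^{\theta} > 0 \quad (\theta > 0).
\end{align*}
```

Since $g'' > 0$ on $(0, \infty)$ and $g'(0) = 0$, $g'$ is positive there; since $g(0) = 0$, $g$ is positive as well. Hence $2e^\theta - 2 - \theta e^\theta - \theta = -g(\theta) < 0$ for $\theta > 0$.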
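Lemma 3 $k(y)$ is a strictly convex function over
$[0, \frac{1}{2})$.
Proof First, for $y \in (0, \frac{1}{2})$, Lemmas 1 and 2 give
$$\frac{d^2 x}{dy^2} = \frac{d}{dy}\left(\frac{dx}{dy}\right) = -\frac{d^2 y}{dx^2} \Big/ \left(\frac{dy}{dx}\right)^2 \cdot \frac{dx}{dy} > 0$$
Hence, $k(y)$ is a strictly convex function over
$(0, \frac{1}{2})$.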
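Furthermore, with $\theta$ denoting the parameter of equation (6), we have
$$k'(0) = \lim_{y \to 0} \frac{k(y) - k(0)}{y - 0} = \lim_{y \to 0} \frac{x}{y} = \lim_{y \to 0} \frac{dx}{dy} = \lim_{\theta \to \infty} \frac{\theta^2 e^\theta}{(e^\theta - 1)^2 - \theta^2 e^\theta} = 0$$
$$k''(0) = \lim_{y \to 0} \frac{\frac{dx}{dy} - k'(0)}{y - 0} \ge 0$$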
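This implies that $k(y)$ is a convex function defined
on $[0, \frac{1}{2})$, that is, $\forall y \in [0, \frac{1}{2})$,
$\forall \lambda \in (0, 1)$, there holds
$$k((1 - \lambda) \cdot 0 + \lambda y) \le (1 - \lambda) k(0) + \lambda k(y)$$
or $k(\lambda y) \le \lambda k(y)$.
In order to show that $k(y)$ is a strictly convex
function defined on $[0, \frac{1}{2})$, we need to show
$k(\lambda y) \ne \lambda k(y)$ for $y \in (0, \frac{1}{2})$, $\lambda \in (0, 1)$. This is to be
done by contradiction. Suppose $\exists y_0 \in (0, \frac{1}{2})$, $\lambda_0 \in (0, 1)$,
such that $k(\lambda_0 y_0) = \lambda_0 k(y_0)$.
Let $x_0 = k(y_0)$, so that $\lambda_0 x_0 = k(\lambda_0 y_0)$. Then
$$y_0 = \frac{1}{\ln\left(1 + \frac{1}{x_0}\right)} - x_0$$
$$\lambda_0 y_0 = \frac{1}{\ln\left(1 + \frac{1}{\lambda_0 x_0}\right)} - \lambda_0 x_0$$
However, we also have
$$\lambda_0 y_0 = \lambda_0 \left(\frac{1}{\ln\left(1 + \frac{1}{x_0}\right)} - x_0\right)$$
Thus,
$$\left(1 + \frac{1}{x_0}\right)^{1/\lambda_0} = 1 + \frac{1}{\lambda_0 x_0}$$
On the other hand, since $1/\lambda_0 > 1$, Bernoulli's inequality gives
$$\left(1 + \frac{1}{x_0}\right)^{1/\lambda_0} > 1 + \frac{1}{\lambda_0 x_0}, \quad \forall x_0 \in (0, \infty),\ \lambda_0 \in (0, 1)$$
This leads to a contradiction. Q.E.D.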
Theorem 1 $M_{n+1} \sim M_n$ iff $(n+1)\,k(P_{n+1}) \sim n\,k(P_n)$,
where $\sim$ can be $>$, $<$, or $=$.
Proof The proof follows from $P_n = f\left(\frac{M_n}{n}\right)$ and the fact that
$k(\cdot)$ is a strictly increasing function, so that $M_n = n\,k(P_n)$. Q.E.D.
Theorem 2 If $P_{n+1} > P_n$, then $M_{n+1} > M_n$.
Proof $P_{n+1} > P_n$ implies $k(P_{n+1}) > k(P_n)$, which in
turn implies $(n+1)\,k(P_{n+1}) > n\,k(P_n)$. Q.E.D.
Theorem 3 If $(n+1) P_{n+1} < n P_n$, then $M_{n+1} < M_n$.
Proof Note $k(\cdot)$ is a strictly increasing and convex
function, and $k(0) = 0$. Hence:
$$k(P_{n+1}) < k\left(\frac{n}{n+1} P_n\right) = k\left(\frac{n}{n+1} P_n + \frac{1}{n+1} \cdot 0\right) < \frac{n}{n+1} k(P_n) + \frac{1}{n+1} k(0) = \frac{n}{n+1} k(P_n)$$
This completes the proof as a result of Theorem 1.
Q.E.D.
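Theorems 1-3 can be combined into a simple decision procedure: try the cheap sufficient conditions of Theorems 2 and 3 first, and evaluate $k(\cdot)$ only when neither applies. The sketch below is one possible reading of that idea, not the authors' implementation. The function names are illustrative, $k$ is computed by bisection on $f(x) = 1/\ln(1 + 1/x) - x$, and both $P_n$ and $P_{n+1}$ are assumed to lie in $(0, \frac{1}{2})$.

```python
import math


def f(x):
    """f(x) = 1/ln(1 + 1/x) - x for x > 0, with f(0) = 0."""
    return 0.0 if x == 0.0 else 1.0 / math.log1p(1.0 / x) - x


def k(y):
    """Inverse of f on [0, 1/2), found by bisection (f is increasing)."""
    if not 0.0 <= y < 0.5:
        raise ValueError("k(y) is only defined for 0 <= y < 1/2")
    lo, hi = 0.0, 1.0
    while f(hi) < y:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_statistic(times):
    """P_n = (t_1 + ... + t_n) / (n * t_n) for cumulative failure times."""
    return sum(times) / (len(times) * times[-1])


def trend(times):
    """Classify M_{n+1} versus M_n given failure times t_1, ..., t_{n+1}."""
    n = len(times) - 1
    if n < 1:
        raise ValueError("need at least two failure times")
    p_n = p_statistic(times[:n])
    p_next = p_statistic(times)
    if not (0.0 < p_n < 0.5 and 0.0 < p_next < 0.5):
        raise ValueError("P_n and P_{n+1} must lie in (0, 1/2)")
    if p_next > p_n:                      # Theorem 2
        return "increase"
    if (n + 1) * p_next < n * p_n:        # Theorem 3
        return "decrease"
    lhs, rhs = (n + 1) * k(p_next), n * k(p_n)   # Theorem 1
    if lhs > rhs:
        return "increase"
    if lhs < rhs:
        return "decrease"
    return "equal"
```

With the hypothetical failure times `[1, 2, 3, 10, 11]`, $P_4 = 0.4$ and $P_5 \approx 0.49$, so Theorem 2 applies and the estimate increases. With `[1, 2, 3, 10, 40]`, $5P_5 = 1.4 < 4P_4 = 1.6$, so Theorem 3 applies and the estimate decreases. Only inputs such as `[1, 2, 3, 10, 20]`, which fall between the two sufficient conditions, require evaluating $k$.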
4. Example of software defect curves
In this example, we use the software reliability dataset
of Musa [12]. [...] $M_n$.
Disregarding the portion of $M_n$ for $1 \le n \le 27$, which
fluctuates excessively and is unreliable, as anticipated at the very
early stage of software testing, Figure 4 matches the
pattern of Figure 2. For $28 \le n \le 95$, the behavior of
$M_n$ [...]
Figure 4 Trajectory of $M_n$
Figure 5 Trajectory of $M_n$
[...] tends to increase or
decrease, we can judge the phase of software testing. As
shown in Equations (1)-(3), [...] $M_n$ to check
the stage of software testing.
5. Conclusion
In this paper, we have reviewed the trapezoidal shape
of software defect curves and presented some necessary
and/or sufficient conditions for software defect curves of
the Goel-Okumoto NHPP model. These results have
supplemented the results of our previous theoretical
analyses. Our necessary/sufficient conditions provide a
faster approach to predict the types of change for the
estimated number of remaining software defects, after
detecting and removing a software defect. We have also
used an example software defect curve generated by a
field software reliability dataset to further justify the
trapezoidal shape of software defect curves, and to
demonstrate the applications of our necessary/sufficient
conditions. In addition, these necessary/sufficient
conditions may help to assess software testing progress
and thus provide useful feedback information for adaptive
software testing, which is the counterpart of adaptive control
in software testing and falls within the scope of software
cybernetics. Software cybernetics explores the interplay
between software and cybernetics [13, 14].
References
[1] E. Yourdon, "Software Metrics," Application Development
Strategies (newsletter), Nov. 1994, pp 16.
[2] K.Y. Cai, Software Defect and Operational Profile
Modeling, Kluwer Academic Publishers, 1998.
[3] A.M. Neufelder, "How to Predict Software Defect Density
during Proposal Phase," National Aerospace and
Electronics Conference, NAECON 2000, pp. 71-76.
[4] N.E. Fenton and M. Neil, "A Critique of Software Defect
Prediction Models," IEEE Transactions on Software
Engineering, Vol. 25, No. 5, pp. 675-689.
[5] D.N. Card, "Managing Software Quality with Defects," in
Proceedings: Computer Software and Applications
Conference, August 2002, IEEE Computer Society.
[6] C.Y. Huang, S.Y. Kuo, and I.Y. Chen, "Analysis of a
Software Reliability Growth Model with Logistic Testing-Effort
Function," 8th International Symposium on Software
Reliability Engineering, Nov. 1997, pp. 378-388.
[7] S. McConnell, "Software Quality at Top Speed," Software
Development, Aug. 1996.
[8] C.G. Bai and K.Y. Cai, "Software Defect Curves," submitted
for publication, 2002.
[9] A.L. Goel and K. Okumoto, "Time-Dependent Error-Detection
Rate Model for Software Reliability and Other
Performance Measures," IEEE Transactions on Reliability,
Vol. R-28, No. 3, 1979, pp. 206-211.
[10] K.Y. Cai, "Towards a Conceptual Framework of Software
Run Reliability Modeling," Information Sciences, Vol. 126,
2000, pp. 137-163.
[11] D.L. Snyder, Random Point Processes, Wiley, New York,
1975.
[12] J.D. Musa, Software Reliability Data, Bell Telephone
Laboratories, Whippany, N.J. 07981, 1979.
[13] K.Y. Cai, "Optimal Software Testing and Adaptive Software
Testing in the Context of Software Cybernetics,"
Information and Software Technology, Vol. 44, 2002,
pp. 841-855.
[14] K.Y. Cai, T.Y. Chen, and T.H. Tse, "Towards Research on
Software Cybernetics," Proc. 7th IEEE International
Symposium on High Assurance Systems Engineering, 2002,
pp. 240-241.