Using a Neural Network in the Software Testing Process
International Journal of Intelligent Systems, January 2002
1 INTRODUCTION
The main objective of software testing is to determine how well the evaluated application conforms to
its specifications. Two common approaches to software testing are black-box and white-box testing.
While the white-box approach uses the actual code of the tested program to perform its analysis, the
black-box approach checks the program output against the input without taking into account its inner
workings [10].
Software testing is divided into three stages: generation of test data, application of the data to the
software being tested, and evaluation of the results. Traditionally, software testing was done manually
by a human tester who chose the test cases and analyzed the results [2]. However, as the number and
size of the programs being tested keep growing, the burden on the human tester increases, and
alternative, automated software testing methods are needed. While automated methods appear to take
over the role of the human tester, the issues of the reliability and the capability of these methods still
need to be resolved [11]. Testing thus remains an important aspect of the design of a software product.
Both the white-box and the black-box approach to software testing have their limitations. Voas &
McGraw (1998) note that present-day software systems are too large to be tested by the white-box
approach as a single entity; instead, white-box testing techniques work at the subsystem level. A further
limitation of the white-box approach is that it cannot detect certain kinds of faults, such as missing
code [12]. The main problem associated with the black-box approach is generating test cases that are
more likely to detect faults [12].
"Fault-based testing" is the term used to refer to methods that base the selection of test data on the
detection of specific faults [12] and is a type of white-box approach as it is using the code of the tested
program [10]. Mutation analysis is a fault-based technique that generates mutant versions of the
program that is being tested [3]. A test set is applied to every mutant program and is evaluated to
determine whether the test set is able to distinguish between the original and mutant versions.
Artificial neural networks (ANNs) have been used in the past to handle several aspects of software
testing. Experiments have been conducted to evaluate the effectiveness of generating test cases capable
of exposing faults [1], to use principal components analysis to find faults in a system [6], to compare the
capabilities of neural networks to other fault-exposing techniques [5, 7], and to find faults in failure data
[9].
In this paper, we present a new application of neural networks as an automated “oracle” for a tested
system. A multi-layer neural network is trained on the original software application by using randomly
generated test data that conform to the specifications. The neural network can be trained to approximate
the original program with reasonable accuracy, though it may be unable to classify the test data 100%
correctly. In effect, the trained neural network becomes a simulated model of the software application.
When new versions of the original application are created and “regression testing” is required, the tested
code is executed on the test data to yield outputs that are compared with those of the neural network. We
assume here that the new versions do not change the existing functions, which means that the
application is supposed to produce the same output for the same inputs. A comparison tool then decides
whether the output of the tested application is correct or erroneous, based on the network activation
values. Figure 1.1 presents an overview of the proposed testing methodology.
[Figure 1.1: Input data is fed to the (possibly faulty) application and to the trained network; the distance between the actual output and the network output is calculated, yielding the decision "output erroneous" or "output correct."]
Section 5 describes the experiments conducted to evaluate the effectiveness of using a neural network as
an automated "oracle." Section 6 summarizes our conclusions.
[Figure 2.1: Model of a neuron — input signals x1, ..., xn are multiplied by synaptic weights wi1, ..., win, combined by an adder, and passed through an activation function to produce the output signal yi.]
The synaptic weight of the connection between input signal xj and neuron i is denoted wij. The input
signals, multiplied by their synaptic weights, are linearly combined. The resulting value is then passed to
the activation function, typically a sigmoid function (shown in Figure 2.2) with a range between 0.0 and
1.0, which generates the neuron output signal yi.
$net = \sum_{j=1}^{n} x_j w_{ij}$  (1)

$y_i = \frac{1}{1 + e^{-net}}$  (2)
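To make the computation in Equations (1) and (2) concrete, the following is a minimal sketch of a single-neuron forward pass in Python; the function and variable names are ours, not the paper's.

    import numpy as np

    def sigmoid(net):
        # Logistic activation of Equation (2): maps any net value into (0, 1).
        return 1.0 / (1.0 + np.exp(-net))

    def neuron_output(inputs, weights):
        # Single-neuron forward pass: Equation (1) followed by Equation (2).
        net = np.dot(inputs, weights)  # linear combination of weighted inputs
        return sigmoid(net)

    # Example: three input signals and their synaptic weights.
    x = np.array([0.2, 0.7, 0.1])
    w = np.array([0.4, -0.3, 0.9])
    print(neuron_output(x, w))  # a value between 0.0 and 1.0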
[Figure 2.3 Structure of a Multi-layer Network and Signal Propagation Through the Network: input signals x1, ..., xn propagate through the hidden layer to output signals y1, ..., ym.]
Multi-layer feedforward neural networks are capable of solving general and complex problems and the
backpropagation technique is computationally efficient as a training method [8]. In this paper, we use
the multi-layer network trained by backpropagation to simulate a credit approval application.
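As an illustration of how such a network can be trained, here is a minimal sketch of a one-hidden-layer feedforward network with sigmoid units trained by incremental backpropagation. It is a generic textbook implementation in the spirit of [8], not the authors' code; all names, the bias handling, and the XOR example are our assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class MLP:
        # One-hidden-layer feedforward network trained by backpropagation.
        def __init__(self, n_in, n_hidden, n_out, lr=0.5):
            # Small random initial weights (Section 4 uses the range -0.5..0.5);
            # the extra row of each matrix holds the bias weight of each unit.
            self.W1 = rng.uniform(-0.5, 0.5, (n_in + 1, n_hidden))
            self.W2 = rng.uniform(-0.5, 0.5, (n_hidden + 1, n_out))
            self.lr = lr

        def forward(self, x):
            self.x1 = np.append(x, 1.0)                           # input plus bias signal
            self.h1 = np.append(sigmoid(self.x1 @ self.W1), 1.0)  # hidden plus bias signal
            self.y = sigmoid(self.h1 @ self.W2)                   # output activations
            return self.y

        def backward(self, target):
            # Error terms use the sigmoid derivative y * (1 - y).
            d_out = (target - self.y) * self.y * (1.0 - self.y)
            d_hid = (self.W2[:-1] @ d_out) * self.h1[:-1] * (1.0 - self.h1[:-1])
            self.W2 += self.lr * np.outer(self.h1, d_out)
            self.W1 += self.lr * np.outer(self.x1, d_hid)

        def train(self, X, T, epochs):
            for _ in range(epochs):
                for x, t in zip(X, T):
                    self.forward(x)
                    self.backward(t)

    # Toy usage: learn XOR with 2 inputs, 4 hidden units, 1 output.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
    T = np.array([[0], [1], [1], [0]], float)
    net = MLP(2, 4, 1)
    net.train(X, T, epochs=5000)
    print(np.round([net.forward(x)[0] for x in X], 2))  # typically close to [0, 1, 1, 0]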
[Figure: Training and evaluation phases. In the training phase, a set of test cases is run through the tested program (which may contain faults), and the resulting input-output pairs train the automated oracle (a neural network). In the evaluation phase, each program input is fed both to the tested program and to the trained oracle, and a comparison tool compares the two outputs, yielding the decision "output erroneous" or "output correct."]
Type of Output   Outputs Same                Outputs Different
Binary           Both correct / Both wrong   ANN correct / Faulty application correct
Continuous       Both correct / Both wrong   ANN correct / Faulty application correct / Both wrong
Table 3.4 Possibilities of Fault Types
The two types of outputs are handled differently by the comparison tool, as there are four possible
situations for the binary output and five for the continuous output. The analysis of the continuous output
differs from that of the binary output in one respect: when the ANN output and the faulty application
output are different, the ANN output may be correct, the faulty application output may be correct, or, in
the case of a continuous output, both may be wrong, whereas only the first two situations are possible
for the binary output (see Table 3.4). In our experiment (see next section), the values of both the low
and the high threshold for each type of output were obtained experimentally so as to yield the best
overall fault detection results.
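Our reading of the comparison tool is that it thresholds the absolute distance between the two outputs: a distance below the low threshold is accepted as correct, and a distance above the high threshold is flagged as erroneous. The sketch below encodes this interpretation; the handling of distances that fall between the two thresholds, and the example threshold values (taken from Table 5.2), are assumptions on our part.

    LOW_THRESHOLD = 0.26   # low threshold for the binary output (Table 5.2)
    HIGH_THRESHOLD = 0.86  # high threshold for the binary output (Table 5.2)

    def compare(app_output: float, ann_output: float) -> str:
        # Classify the tested application's output against the ANN oracle.
        distance = abs(app_output - ann_output)
        if distance <= LOW_THRESHOLD:
            return "correct"
        if distance >= HIGH_THRESHOLD:
            return "erroneous"
        # The paper does not spell out the middle band; treating it as
        # "undecided" is our assumption.
        return "undecided"

    print(compare(1.0, 0.93))  # small distance -> correct
    print(compare(1.0, 0.05))  # large distance -> erroneous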
A more detailed description and the type of each attribute can be viewed in Table 4.1, and Table 4.2
provides a sample data set. Each record includes nine input attributes and two output attributes (one
binary, one continuous).
Name of the attribute   Data Type   Attribute Type   Details
Serial ID Number        integer     Input            unique for each customer
Citizenship             integer     Input            0: American; 1: Others
State                   integer     Input            0: Florida; 1: other states
Age                     integer     Input            1-100
Sex                     integer     Input            0: Female; 1: Male
Region                  integer     Input            0-6 for different regions in the US
Income Class            integer     Input            0 if income p.a. < $10k; 1 if >= $10k; 2 if >= $25k; 3 if >= $50k
Number of dependents    integer     Input            0-4
Marital status          integer     Input            0: Single; 1: Married
Credit approved         integer     Output           0: No; 1: Yes
Amount                  integer     Output           >= 0
Table 4.1 Attributes of the Data
ID   Citizenship   State   Age   Sex   Region   Income Class   Dependents   Marital Status   Credit Approved   Amount
1    0             1       20    1     3        1              1            1                0                   860
2    1             1       18    1     4        1              1            0                0                  1200
3    0             0       15    0     5        1              0            0                1                     0
4    0             0       53    1     3        1              0            1                0                  1400
5    0             0        6    1     4        2              2            0                1                     0
6    1             1       95    1     3        0              1            0                0                   400
7    1             0       78    1     5        2              2            0                1                     0
8    0             0       84    0     2        0              2            0                0                  1650
9    0             1       28    0     3        2              3            1                0                  1370
10   0             0       74    0     2        2              2            0                0                  1950
Table 4.2 A Sample Data Set
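Because the training data is generated randomly within the specification of Table 4.1, a generator might look like the following sketch; the field names, the uniform distributions, and the helper itself are our assumptions, as the paper does not show its generator.

    import random

    def random_customer(serial_id: int) -> dict:
        # Draw one record uniformly within the attribute ranges of Table 4.1.
        return {
            "id": serial_id,                         # unique serial number
            "citizenship": random.randint(0, 1),     # 0: American, 1: Others
            "state": random.randint(0, 1),           # 0: Florida, 1: other states
            "age": random.randint(1, 100),
            "sex": random.randint(0, 1),             # 0: Female, 1: Male
            "region": random.randint(0, 6),          # regions in the US
            "income_class": random.randint(0, 3),
            "dependents": random.randint(0, 4),
            "marital_status": random.randint(0, 1),  # 0: Single, 1: Married
        }

    # Tables 5.2 and 5.3 suggest 1,000 test cases per faulty version.
    test_cases = [random_customer(i + 1) for i in range(1000)]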
1.  If ((Region == 5) || (Region == 6))
2.      Credit Limit = 0
3.  Else
4.      If (Age < 18)
5.          Credit Limit = 0
6.      Else
7.          If (Citizenship == 0) {
8.              Credit Limit = 5000 + 1000 * Income Class
9.              If (State == 0)
10.                 If ((Region == 3) || (Region == 4))
11.                     Credit Limit = Credit Limit * 2.00
12.                 Else
13.                     Credit Limit = Credit Limit * 1.50
14.             Else
15.                 Credit Limit = Credit Limit * 1.10
16.             If (Marital Status == 0)
17.                 If (Number of Dependents > 0)
18.                     Credit Limit = Credit Limit + 200 * Number of Dependents
19.                 Else
20.                     Credit Limit = Credit Limit + 500
21.             Else
22.                 Credit Limit = Credit Limit + 1000
23.             If (Sex == 0)
24.                 Credit Limit = Credit Limit + 500
25.             Else
26.                 Credit Limit = Credit Limit + 1000
27.         }
28.         Else {
29.             Credit Limit = 1000 + 800 * Income Class
30.             If (Marital Status == 0)
31.                 If (Number of Dependents > 2)
32.                     Credit Limit = Credit Limit + 100 * Number of Dependents
33.                 Else
34.                     Credit Limit = Credit Limit + 100
35.             Else
36.                 Credit Limit = Credit Limit + 300
37.             If (Sex == 0)
38.                 Credit Limit = Credit Limit + 100
39.             Else
40.                 Credit Limit = Credit Limit + 200
41.         }
42. If (Credit Limit == 0)
43.     Credit Approved = 1
44. Else
45.     Credit Approved = 0
Figure 4.3 Algorithm for the Credit Card Approval Application
A detailed description of the application logic is necessary for the reader to understand the type of faults
that are injected into the application, though this logic was "hidden" from the backpropagation training
algorithm. The algorithm that the application follows can be found in Figure 4.3. The structure of the
application consists of a series of layered conditional statements, which provides the opportunity to
examine the effects of the faults over a range of possibilities. The faults injected in our experiment
consist of minor changes to the conditional statements: a change of operator or a change of a value used
in a conditional statement. Several assumptions are made when applying the faults to the application:
only one change is made at a time, and the fault is either a sign change or an error in the numerical value
used in the comparison. Consequently, the analyses of the two outputs were conducted independently of
each other. Table 4.4 lists the details of the injected faults.
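For instance, a single injected fault might flip a relational operator in one conditional. The following hypothetical mutant of line 4 of Figure 4.3 is our own illustration, not one of the faults listed in Table 4.4:

    def original_check(age: int) -> bool:
        # Line 4 of Figure 4.3: applicants under 18 get no credit.
        return age < 18

    def mutant_check(age: int) -> bool:
        # Operator-change fault: "<" mutated to "<=".
        return age <= 18

    # Only the boundary input age == 18 distinguishes the mutant from the
    # original, so a test set must contain it to expose this fault.
    for age in (17, 18, 19):
        print(age, original_check(age), mutant_check(age))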
Description of the neural network
The backpropagation neural network used in this experiment is capable of generalizing the training data;
however, the network cannot be applied to the raw data, as the attributes do not have a uniform
representation. Some input attributes are categorical rather than truly numeric, so for the neural network
to be trained on the test data, the raw data has to be preprocessed, and since the values and types differ
for each attribute, the input data must be normalized. The values of a continuous attribute vary over a
range, while a binary attribute has only two possible values (0 or 1). Thus, for the network to process
the data uniformly, the continuous input attributes have to be normalized to a number between 0 and 1.
The data is preprocessed by normalizing the values of all the input attributes according to the maximum
and minimum possible values of each attribute. The output attributes were processed in a different
manner. If the output attribute is binary, the two possible network outputs are 0 and 1. Continuous
outputs, on the other hand, cannot be treated in the same way, since they can take an unlimited number
of values. To overcome this problem, we divided the range of the continuous output (found by using its
maximum and minimum possible values) into ten equal-sized intervals and placed the output of each
training example into the correct interval (a value between 0 and 9).
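A minimal sketch of this preprocessing follows; the function names are ours, and the maximum credit limit of $18,000 is our derivation from Figure 4.3 (American citizen, income class 3, Florida, region 3 or 4, married, male), not a figure stated in the paper.

    def normalize(value: float, lo: float, hi: float) -> float:
        # Min-max scale an input attribute into [0, 1].
        return (value - lo) / (hi - lo)

    def output_interval(amount: float, max_amount: float = 18000.0,
                        n_bins: int = 10) -> int:
        # Place a continuous output into one of ten equal-sized intervals (0-9).
        bin_width = max_amount / n_bins
        return min(int(amount // bin_width), n_bins - 1)

    print(normalize(53, 1, 100))    # Age 53 scaled into [0, 1]
    print(output_interval(1400.0))  # a $1,400 limit falls into interval 0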
The architecture of the neural network also depends on the data. In this experiment, we used eight
non-computational input units for the eight relevant input attributes (the first attribute is not used, as it
is merely a descriptor of the example) and twelve computational output units for the output attributes.
The first two output units are used for the binary output; for the purposes of training, the unit with the
higher output value is considered the "winner." Similarly, the remaining ten units are used for the
continuous output. The initial synaptic weights of the neural network were chosen randomly from the
range between -0.5 and 0.5. Experimenting with the neural network and the training data, we concluded
that one hidden layer with twenty-four units was sufficient for the neural network to approximate the
original application with reasonable accuracy. A learning rate of 0.5 was used, and the network required
1500 epochs to reach a 0.2% misclassification rate on the binary output and 5.4% on the continuous
output. Figure 4.5 shows the convergence of the error as a function of the number of epochs.
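The twelve output units thus split into a group of two (the binary output) and a group of ten (the interval of the continuous output), each decoded by taking the unit with the highest activation. A sketch of this winner-take-all decoding, with names of our choosing:

    import numpy as np

    def decode_outputs(activations: np.ndarray) -> tuple:
        # Split the 12 output activations into the two winner indices.
        assert activations.shape == (12,)
        binary = int(np.argmax(activations[:2]))    # units 0-1: credit approved
        interval = int(np.argmax(activations[2:]))  # units 2-11: limit interval 0-9
        return binary, interval

    # Example with made-up activation values:
    acts = np.random.default_rng(1).random(12)
    print(decode_outputs(acts))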
[Figure 4.5: Convergence of the training error over roughly 1,200 epochs for Output 1 (Credit Approval) and Output 2 (Credit Limit); the error decreases from about 0.8 toward 0.]
[Table 4.4 (columns: Fault #, Original Line #, Injected Fault, Error Type, Error (%), Confidence Interval (95%), Final root mean squared error) — the rows are not recoverable from this copy.]
In the tables below, "% FA" (false alarms) denotes the percentage of correct outputs classified as being incorrect, and "% Miss" denotes the percentage of incorrect outputs classified as being correct.

Injected        Low = 0.21, High = 0.81   Low = 0.21, High = 0.91   Low = 0.31, High = 0.81   Low = 0.31, High = 0.91
Fault Number    % FA      % Miss          % FA      % Miss          % FA      % Miss          % FA      % Miss
02              1.40      16.43           1.40      12.86           1.16      16.43           1.16      12.86
03              1.41      14.97           2.11      13.61           1.29      15.65           1.99      14.29
04              2.22       5.56           3.75       5.07           2.22       5.80           3.75       5.31
05              6.74       2.80          12.92       2.80           5.62       3.16          11.80       3.16
06              1.40      31.88           1.61      27.54           1.29      31.88           1.50      27.54
Average         2.63      14.33           4.36      12.37           2.32      14.58           4.04      12.63
Total Average        8.48                      8.37                      8.45                      8.34
Table 5.1 The Error Rate as a Function of the Threshold Values for the Binary Output
Injected        Number of         Number of
Fault Number    Correct Outputs   Incorrect Outputs   % FA      % Miss
02              860               140                  1.28     14.29
03              853               147                  1.88     13.61
04              586               414                  3.41      5.07
05              178               822                 10.11      2.80
06              931                69                  1.61     28.99
Average                                                3.66     12.95
Total Average                                               8.31
Table 5.2 The Minimum Error Rate for the Binary Output (Low Threshold = 0.26, High Threshold = 0.86)
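The two error columns in Tables 5.1-5.3 are simple misclassification rates, and each "Total Average" is the mean of the two column averages. A sketch of the computation, where the raw counts of 11 and 20 for fault 02 are back-calculated from the published percentages rather than given in the paper:

    def error_rates(n_correct, false_alarms, n_incorrect, misses):
        # Percentage of correct outputs flagged as incorrect, and vice versa.
        fa_rate = 100.0 * false_alarms / n_correct
        miss_rate = 100.0 * misses / n_incorrect
        return fa_rate, miss_rate

    # Fault 02 in Table 5.2: 860 correct outputs, 11 flagged as incorrect;
    # 140 incorrect outputs, 20 accepted as correct.
    print(error_rates(860, 11, 140, 20))  # approximately (1.28, 14.29)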
Table 5.3 summarizes the results for the continuous output. Due to the increased complexity involved in
evaluating the continuous output, there is a significant change in the capability of the neural network to
distinguish between the correct and the faulty test cases: the minimum average error achieved for the
binary output is 8.31%, vs. a minimum average error of 20.79% for the continuous output. Varying the
threshold values also produced no evident change in the overall average error percentage for the
continuous output.
Injected        Number of         Number of
Fault Number    Correct Outputs   Incorrect Outputs   % FA      % Miss
02              140               860                 28.14      1.43
03              307               693                  6.78     49.51
04              587               413                  5.33     21.12
05              822               178                  8.99      3.89
06               69               931                 23.63     13.04
07              559               441                 11.11      5.72
08              355               645                 21.86      7.89
09              217               783                  8.17     73.27
10              303               697                  7.60     52.15
11              238               762                  7.74     66.39
12              276               724                 24.17     10.51
13              371               629                 23.05      6.47
14               99               901                 22.86     23.23
15               65               935                 23.32     33.85
16              407               593                 23.27      4.91
17              273               727                 22.56     13.55
18               20               980                 24.49     50.00
19               71               929                 24.54      1.41
20             1000                 0                  0.00      4.20
21              125               875                 20.91     50.40
Average                                               16.93     24.65
Total Average                                              20.79
Table 5.3 The Minimum Error Rate for the Continuous Output (Low Threshold = 0.10, High Threshold = 0.90)
6 CONCLUSIONS
In this experiment, we have used a neural network as an automated "oracle" for testing a real application
and applied mutation testing to generate faulty versions of the original program. We then used a
comparison tool to evaluate the correctness of the obtained results based on the absolute difference
between the two outputs.
The neural network is shown to be a promising method of testing a software application, provided that
the training data has good coverage of the input range. The backpropagation method of training the
neural network is a relatively rigorous method capable of generalization, and one of its properties
ensures that the network can be updated by learning new data. As the software that the network is
trained to simulate is updated, the trained neural network can learn to classify the new data. Thus, the
neural network is capable of learning new versions of evolving software.
The benefits and the limitations of the approach presented in this paper need to be studied further on
additional software systems involving a larger number of inputs and outputs. However, as most of the
methodology introduced in this paper has been developed from other known techniques in artificial
intelligence, it can serve as a solid basis for future experimentation. One possible extension is the
generation of test cases that are more likely to expose faults. The heuristic used by the comparison tool
may be modified by using more than two thresholds or by fuzzifying overlapping thresholds. The
method can be further evaluated by introducing more types of faults into a tested application.
ACKNOWLEDGEMENTS
This work was partially supported by the USF Center for Software Testing under grant no. 2108-004-00.
REFERENCES
1. C. Anderson, A. von Mayrhauser, R. Mraz, On the Use of Neural Networks to Guide Software
Testing Activities, Proceedings of ITC'95, the International Test Conference, October 21-26, 1995.
2. J. Choi, B. Choi, Test Agent System Design, 1999 IEEE International Fuzzy Systems Conference
Proceedings, August 22-25, 1999.
3. R. A. DeMillo, A. J. Offutt, Constraint-based Automatic Test Data Generation, IEEE Transactions on
Software Engineering SE-17, 9 (Sept. 1991), pp. 900-910.
4. S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, 1999.
5. T. M. Khoshgoftaar, E. B. Allen, J. P. Hudepohl, S. J. Aud, Application of Neural Networks to
Software Quality Modeling of a Very Large Telecommunications System, IEEE Transactions on
Neural Networks, vol 8, num 4, July 1997, pp. 902-909.
6. T. M. Khoshgoftaar, R.M. Szabo, Using Neural Networks to Predict Software Faults During Testing,
IEEE Transactions on Reliability, vol 45, num 3, September 1996, pp. 456-462.
7. L. V. Kirkland, R. G. Wright, Using Neural Networks to Solve Testing Problems, IEEE Aerospace
and Electronics Systems Magazine, vol 12, num 8, August 1997, pp. 36-40.
8. T. Mitchell, Machine Learning, McGraw-Hill, 1997.
9. S. A. Sherer, Software Fault Prediction, Journal of Systems and Software, vol 29, num 2, May 1995,
pp. 97-105.
10. J. M. Voas, G. McGraw, Software Fault Injection: Inoculating Programs Against Errors, Wiley, 1998.
11. J. M. Voas, K.W. Miller, Software Testability: The New Verification, IEEE Software, 1995.
12. E. Weyuker, T. Goradia, A. Singh, Automatically Generating Test Data from a Boolean
Specification, IEEE Transactions on Software Engineering SE-20, 5 (May 1994), pp. 353-363.