
Journal of Computer Sciences 1 (4): 505-509, 2005

ISSN 1549-3636
© 2005 Science Publications

Bayesian Regularization in a Neural Network Model
to Estimate Lines of Code Using Function Points

1K.K. Aggarwal, 1Yogesh Singh, 1Pravin Chandra and 2Manimala Puri
1GGS Indraprastha University, Delhi, India
2IT Department, D.Y.Patil, COE, Pune, India

Abstract: It is a well known fact that at the beginning of any project, the software industry needs to
know how much it will cost to develop the software and how much time will be required. This paper
examines the potential of using a neural network model for estimating the lines of code once the
functional requirements are known. Using the International Software Benchmarking Standards Group
(ISBSG) Repository Data (release 9) for the experiment, this paper examines the performance of back
propagation feed forward neural networks in estimating the Source Lines of Code. Multiple training
algorithms are used in the experiments. Results demonstrate that the neural network models trained
using Bayesian Regularization provide the best results and are suitable for this purpose.

Key words: Neural network, estimation, lines of code, function point

INTRODUCTION

The estimation of resource expenditure (for example, effort and schedule) is an essential software
project management activity. Most projects (60-80%) encounter effort and schedule overruns[1-5].
Software development involves a number of interrelated factors which can affect development effort
and time, and it is a complex, dynamic process. It is a challenge to estimate the lines of code for a
project during its early stages, as very little is known about the problem. Several researchers have
suggested various techniques to predict software effort, namely model based (SLIM, COCOMO,
Checkpoint), expert based (Delphi), regression based, etc. The latest of these techniques are machine
learning techniques. There are a number of approaches[1,2,3] to machine learning, namely Neural
Networks, Fuzzy Logic, Case Based Reasoning and Hybrid Systems. Many researchers[6-9] have
explored the possibility of using Neural Networks for estimating the effort. Neuro-Fuzzy models[10]
have also been explored and found to be useful in software estimation. This paper focuses on using a
neural network to predict the lines of code when the function point count, the FP standard used, the
language to be used and the maximum team size are known. The ISBSG repository, which is available
for a number of projects, is used to show that neural networks are indeed suitable for this purpose.
The paper also examines which training algorithm is best suited for the purpose.

Artificial neural networks: Artificial neural networks can model complex non-linear relationships
and approximate any measurable function. They can be used as an effective tool for pattern
classification and clustering[11,12]. They are particularly useful in problems where there is a complex
relationship between an input and an output. It has been established that a one hidden layer
feedforward network with a sufficient number of sigmoidal nodes can approximate any continuous
mapping with arbitrary precision[13-16]. The feed forward multi layer network is a network in which
no loops occur in the network path. A learning rule is defined as a procedure for modifying the
weights and biases of a network with the objective of minimizing the mismatch between the desired
output and the output obtained from the network for any given input. The learning rule / network
training algorithm is used to adjust the weights and biases of the network in order to move the
network outputs closer to the targets. The classical backpropagation algorithm was the first training
algorithm developed[17]. The simplest implementation of backpropagation learning updates the
network weights and biases in the direction in which the performance function decreases most rapidly
- the negative of the gradient[17] - though second order optimization algorithms such as the conjugate
gradient, the Levenberg-Marquardt and Bayesian learning algorithms have also been developed. In
this paper, a four input and one output network is used. The network uses only one hidden layer. The
activation functions at the hidden and output layers are the tangent-hyperbolic (tanh) function. The
network inputs are (a) the function point count for the project, (b) the team size, (c) the level of the
language used in development and (d) the function point standard. The block diagram of the network
used is shown in Fig. 1. Figure 1 shows a model whose inputs are function points, language used, FP
standard and maximum team size. The output (target) is lines of code.
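To make the learning rule concrete, the following MATLAB fragment is an illustrative sketch only
(not taken from the paper; the variable names, sizes and values are our own assumptions) of a single
plain gradient-descent update for a one hidden layer tanh network:

% Sketch: one gradient-descent update of a 4-input, single-hidden-layer
% tanh network on one training example.
x  = [0.30; -0.50; 0.25; 0.10];      % one scaled input vector (4 x 1)
t  = 0.20;                           % desired (target) output
lr = 0.01;                           % learning rate

W1 = 0.1*randn(15, 4);  b1 = zeros(15, 1);   % hidden layer (15 tanh nodes)
W2 = 0.1*randn(1, 15);  b2 = 0;              % output layer (1 tanh node)

h = tanh(W1*x + b1);                 % forward pass: hidden activations
y = tanh(W2*h + b2);                 % forward pass: network output

% Backward pass for the squared error E = 0.5*(y - t)^2
dOut = (y - t) * (1 - y^2);          % delta at the output node (tanh derivative)
dHid = (W2' * dOut) .* (1 - h.^2);   % deltas at the hidden nodes

% Move weights and biases along the negative of the gradient
W2 = W2 - lr * dOut * h';   b2 = b2 - lr * dOut;
W1 = W1 - lr * dHid * x';   b1 = b1 - lr * dHid;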

Corresponding Author: Manimala Puri, IT Department, D.Y.Patil, COE, Pune, India



Table 1: Input codes for FP standard

FP standard      Code
CPM 4.0           0.50
IFPUG 4           0.25
IFPUG 4.1        -0.25
Backfired        -0.50

Table 2: Input codes for language used

Language used    Code
3GL               0.25
4GL               0.50
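For illustration only, the coding of Tables 1 and 2 could be applied in MATLAB along the following
lines (a sketch; the function name and error handling are our own, not the authors'):

function code = fpStdCode(fpStd)
% Sketch: numeric coding of the FP standard per Table 1; the language code
% of Table 2 would be handled in the same way.
switch fpStd
    case 'CPM 4.0',   code =  0.5;
    case 'IFPUG 4',   code =  0.25;
    case 'IFPUG 4.1', code = -0.25;
    case 'Backfired', code = -0.5;
    otherwise, error('Unknown FP standard: %s', fpStd);
end
end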

Fig. 1: Neural network model

EXPERIMENT

Study area and data used: The project data used was that of the International Software
Benchmarking Standards Group (ISBSG) repository data (release 9). Out of the various fields
available, the following fields were used as inputs.

Project ID: This was used for identifying projects.

Function points: The function point count for that particular project.

F.P. standard: This field specifies which function point standard was used, e.g. CPM 4.0, IFPUG 4,
IFPUG 4.1, etc.

Language: This defines the language type used for the project, e.g. 3GL, 4GL, Application
Generator, etc.

Lines of code: The number of source lines of code (SLOC) produced by the project. This is not
available for all projects.
Since SLOC is not available for all projects, only those projects were considered for the experiment
where SLOC data was available. This led to a data set of 88 projects.

MATERIALS AND METHODS

The function point count, function point standard, language used and maximum team size were used
as inputs. Outliers were removed from all data sets. The function point data was normalized by linear
scaling between -1 and 1. The FP standard was coded as shown in Table 1 and the language used was
coded as shown in Table 2. Since there was only one data point pertaining to 5GL, that project was
dropped. All data variables are scaled in the range -1 to 1.

Neural Network model for estimating lines of code: The neural network used was a sigmoid feed
forward network with a single hidden layer, built using the neural network toolbox of MATLAB.
Seventy one exemplars were used for training with SLOC as the target. The number of neurons in the
hidden layer was varied from five to sixteen. It was found that the network with fifteen neurons in the
hidden layer yielded the best results. Thus there are four nodes in the input layer, fifteen neurons in
the hidden layer and one node in the output layer. The MATLAB adaptation learning function
selected for this experiment was 'learngdm' and the performance function used was the mean square
error (MSE). The transfer functions used were tangent-hyperbolic in both the hidden and the output
layers. The goal was kept as 0.00 (though it was never achieved, a goal of 10^-16 was reached, which
is as good as 0). The number of epochs was kept as 1000. After training, testing was done on the
network with a data set of seventeen projects. Random partitioning was done to form three sets of
training and test data. After training, testing was done and the outputs obtained were compared with
the target values. In this case we obtained seventy one training cases and seventeen test cases in every
set. The objectives of the experiments were twofold:

* To verify if neural networks can be used for prediction of SLOC counts on the basis of function
  points, team size, function point standard and language type used in development.
* To empirically evaluate the training algorithms and to find which training algorithm is suitable for
  the estimation purpose.

The experiment was conducted using the same neural network but with different training algorithms,
as shown in Table 3. After performing the same experiment with the different algorithms, the results
are compared in the next section.

Error measurements: Different error measurements have been used by various researchers. We have
chosen the Mean Absolute Percentage Error (MAPE), which is calculated as shown in Eq. (1)[7].


Table 3: Different training algorithms

Training function  Description
trainb             Trains a network with weight and bias learning rules with batch updates. The weights
                   and biases are updated at the end of an entire pass through the input data.
trainbfg           Updates weight and bias values according to the BFGS quasi-Newton method.
trainbr            Updates the weight and bias values according to Levenberg-Marquardt optimization. It
                   minimizes a combination of squared errors and weights and then determines the correct
                   combination so as to produce a network that generalizes well. The process is called
                   Bayesian regularization.
trainc             Trains a network with weight and bias learning rules with incremental updates after
                   each presentation of an input. Inputs are presented in cyclic order.
traincgb           Updates weight and bias values according to conjugate gradient backpropagation with
                   Powell-Beale restarts.
traincgf           Updates weight and bias values according to conjugate gradient backpropagation with
                   Fletcher-Reeves updates.
traincgp           Updates weight and bias values according to conjugate gradient backpropagation with
                   Polak-Ribiere updates.
traingd            Updates weight and bias values according to gradient descent.
traingda           Updates weight and bias values according to gradient descent with adaptive learning
                   rate.
traingdm           Updates weight and bias values according to gradient descent with momentum.
traingdx           Updates weight and bias values according to gradient descent with momentum and an
                   adaptive learning rate.
trainlm            Updates weight and bias values according to Levenberg-Marquardt optimization.
trainoss           Updates weight and bias values according to the one step secant method.
trainrp            Updates weight and bias values according to the resilient backpropagation algorithm
                   (RPROP).
trainscg           Updates weight and bias values according to the scaled conjugate gradient method.
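As an illustrative sketch only (it uses the toolbox functions listed in Table 3 and the architecture
described above, but the data file, variable names and the random split are our own assumptions
rather than the authors' actual script), the experiment could be set up roughly as follows:

% Sketch: 4-15-1 tansig network trained with Bayesian regularization (trainbr).
% P is assumed to be a 4 x 88 matrix of scaled inputs (function points, team
% size, language code, FP standard code); T is 1 x 88 scaled SLOC targets.
load isbsg_projects                      % hypothetical MAT-file providing P and T

idx    = randperm(size(P, 2));           % random partition: 71 train, 17 test
trnIdx = idx(1:71);
tstIdx = idx(72:end);

net = newff(minmax(P(:, trnIdx)), [15 1], {'tansig', 'tansig'}, ...
            'trainbr', 'learngdm', 'mse');
net.trainParam.epochs = 1000;            % 1000 epochs, goal of 0 as in the paper
net.trainParam.goal   = 0;

net  = train(net, P(:, trnIdx), T(trnIdx));
pred = sim(net, P(:, tstIdx));           % predicted (scaled) SLOC for the test set

% Re-training with any other algorithm of Table 3 only requires replacing
% 'trainbr' above, e.g. with 'traingd', 'trainrp' or 'trainlm'.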

Table 4: Results using various training algorithms

              ------------- Run 1 -------------   ------------- Run 2 -------------   ------------- Run 3 -------------
Algorithm     MAPE    Co-rel.   Std    Signif.    MAPE    Co-rel.   Std    Signif.    MAPE    Co-rel.   Std    Signif.
traingd       17.35    0.33     0.28    0.81      13.18    0.48     0.17    0.95      18.57    0.03     0.23    0.10
traingdm      20.50    0.20     0.18    0.57      14.76    0.42     0.16    0.92      22.04   -0.32     0.21    0.81
traingda      17.97    0.33     0.35    0.80      15.76    0.41     0.14    0.94      19.89    0.09     0.32    0.28
traingdx      24.32    0.56     0.31    0.98      16.44    0.41     0.15    0.91      22.95    0.37     0.31    0.87
trainrp       18.42    0.46     0.21    0.94      16.83    0.34     0.15    0.84      15.44    0.67     0.19    0.99
trainoss      34.59    0.35     0.56    0.83      17.02    0.50     0.16    0.96      34.05    0.61     0.38    0.99
trainscg      39.20    0.27     0.63    0.71      25.63    0.31     0.31    0.79      56.14    0.63     0.63    0.99
traincgp      19.42    0.39     0.25    0.88      17.12    0.54     0.16    0.98      63.72    0.52     0.72    0.97
traincgf      24.91    0.69     0.32    0.99      23.09    0.44     0.27    0.93      39.17    0.60     0.48    0.99
traincgb      25.75    0.39     0.36    0.87      23.97    0.75     0.24    0.99      72.80    0.36     0.80    0.86
trainb        20.63    0.20     0.18    0.56      12.85    0.49     0.15    0.96      23.37   -0.09     0.31    0.28
trainbfg      30.84    0.08     0.53    0.26      33.79    0.64     0.41    0.99      50.61    0.73     0.63    0.99
trainlm       37.50   -0.22     0.46    0.62      89.16    0.24     0.93    0.67      63.59    0.55     0.75    0.98
trainbr       13.93    0.71     0.14    0.99      12.94    0.60     0.13    0.99      17.08    0.70     0.19    0.99
MAPE: mean absolute percentage error; Co-rel.: correlation coefficient; Std: standard deviation;
Signif.: significance of the correlation.


MAPE = \left( \sum_{j=1}^{n} \left| \frac{Estimate_j - Actual_j}{Actual_j} \right| \right) \div n \times 100        (1)

The smaller the MAPE, the better the model and the better its predictions.
The correlation coefficient (r) is a measure of the degree of linear relationship between two variables.
The correlation coefficient may take on any value between plus one and minus one.

Significance of correlation (sig): The significance level calculated for each correlation is a primary
source of information about the reliability of the correlation.
Fig. 3: SLOC (actual and predicted) vs. projects for Run 2

Standard deviation (std): The standard deviation (this term was first used by Pearson, 1894) is a
commonly used measure of variation. The standard deviation of a population of values is computed
as:

\sigma = \left[ \sum_{i} (x_i - \mu)^2 / N \right]^{1/2}        (2)

where:
\mu is the population mean
N is the population size.
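For completeness, the following is a sketch of how these measures might be computed from the
test-set outputs (the variable names are ours; the paper does not state which quantity its standard
deviation is computed over):

% actual and predicted are row vectors of SLOC values for the test projects.
relErr = (predicted - actual) ./ actual;
mape   = 100 * mean(abs(relErr));          % MAPE as in Eq. (1)

R = corrcoef(actual, predicted);           % 2 x 2 correlation matrix
r = R(1, 2);                               % correlation coefficient r

x     = relErr;                            % population std as in Eq. (2),
sigma = sqrt(sum((x - mean(x)).^2) / numel(x));   % shown here for the relative errors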

RESULTS AND DISCUSSION

The results obtained using the various training algorithms are shown in Table 4.
Results demonstrate that the trainbr algorithm can be rated as the best: the average MAPE in this case
is 14.65, the average co-relation is 0.64 and the average significance is 0.99. The traingd algorithm
can be rated as the next best, with an average MAPE of 16.36, an average co-relation of 0.28 and an
average significance of 0.62. These are followed by the trainrp algorithm, with average values of
MAPE, co-relation and significance of 16.89, 0.49 and 0.90 respectively. The plots of actual SLOC
and predicted SLOC for the various test projects using the trainbr algorithm for the three runs are
shown in Fig. 2-4.

Fig. 4: SLOC (actual and predicted) vs. projects for Run 3

Future scope: The neural network model used here could further be extended to a neuro fuzzy model
trained and tested on the same ISBSG data (Release 9). There is a possibility that such a model could
be a better model.

CONCLUSION

In the present work, the possibility of using neural networks for estimating the lines of code for a
software project was explored. The ISBSG Data (Release 9) was used to train and test the neural
network. It is concluded from the experimental work that neural networks can very well be used for
estimating the lines of code once the function point count is known. It is also concluded that the
trainbr algorithm yields the best results.

Fig. 2: SLOC (actual and predicted) vs. projects for Run 1

REFERENCES

1.  Aggarwal, K.K. and Yogesh Singh, 2001. Software Engineering Programs, Documentation
    Operating Procedure. New Age International Publishers.


2.  Pressman, R., 1997. Software Engineering: A Practitioner's Approach. McGraw Hill.
3.  Sommerville, I., 1996. Software Engineering. Addison Wesley.
4.  Moløkken, K. and M. Jørgensen, 2003. A review of surveys on software effort estimation. Proc.
    2003 Intl. Symp. Empirical Software Engineering.
5.  Putnam, L.H., 1978. A general empirical solution to the macro software sizing and estimation
    problem. IEEE Trans. Software Engineering, 4: 345-361.
6.  Dawson, C.W., 1996. A neural network approach to software projects effort estimation. Trans.
    Information and Commun. Technol.
7.  Finnie, G. and G. Wittig, 1996. AI tools for software development effort estimation. IEEE Trans.
    Software Engineering, pp: 346-353.
8.  Ali Idri, T.M. Khoshgoftaar and A. Abran, 2002. Can neural networks be easily interpreted in
    software cost estimation. IEEE Trans. Software Engineering, pp: 1162-1167.
9.  Fuzzy systems and neural networks in software engineering project management. J. Applied
    Intell., 4: 31-42.
10. Hodgkinson, A. and P. Garratt, 1999. A neuro fuzzy cost estimator. Proc. Intl. Conf. Software
    Eng. Application, pp: 401-406.
11. Haykin, S., 2003. Neural Networks: A Comprehensive Foundation. Prentice Hall, India.
12. Aggarwal, K., Y. Singh and M. Puri, 2005. Measurement of software understandability using
    neural networks. Proc. Intl. Conf. Multidimensional Aspects of Engineering, IEEE, WEI Group.
13. Cybenko, G., 1989. Approximation by superposition of a sigmoidal function. Math. Control,
    Signals and Syst., 5: 233-243.
14. Funahashi, K., 1989. On the approximate realization of continuous mappings by neural networks.
    Neural Networks, 2: 183-192.
15. Barron, A.R., 1993. Universal approximation bounds for superposition of a sigmoid function.
    IEEE Trans. Inform. Theory, 39: 930-945.
16. Hornik, K., M. Stinchcombe and H. White, 1989. Multilayer feedforward networks are universal
    approximators. Neural Networks, 2: 359-366.
17. Rumelhart, D.E., G.E. Hinton and R.J. Williams, 1986. Learning representations by
    back-propagating errors. Nature, 323: 533-536.
