Lecture2 SLR

Lecture 2
ML in Software Engineering
Software Project Management: What is it?
• Project management is one of the most critical activities in building a
software system because it covers the entire software development process
from start to finish.
• Estimating software effort is part of project management; it is needed
especially for planning and monitoring a software project to ensure that
the project is completed on time and within the desired budget.
Why Software Projects Fail
• In 2017, software failures were estimated to have caused a loss of
nearly $1.7 trillion.
• Failed software projects cost the US economy 25-75 billion dollars
during 2005-2010.
(This is because these projects started without an estimate, or with an
inaccurate estimate, of the time and budget they would take.)
Prediction/Estimation in SE
• To start a new software project, managers need to identify how much
effort will be required to complete it.
• Effort is considered the ‘budget and schedule’ of the project.
• To get this information, software managers need some kind of effort
estimation/prediction based on similar (past completed) projects.
Prediction/Estimation in SE
• In earlier days, these estimations were performed using algorithmic
models (mathematical equations) or expert judgment.
• Expert judgment is manual estimation by experts (e.g. software
managers) who have successfully completed similar projects in the past.
• These techniques (algorithmic models and expert judgment) are referred
to as non-ML techniques.
Problems with non-ML techniques
• These techniques were used in SE studies from the early 1990s to the
early 2000s.
• However, they worked well mainly for small projects. When they were
used on medium or large software projects, the estimates they provided
were reported as “inaccurate” and the projects failed to complete.
ML in Software Development Effort Estimation (SDEE)
• SDEE using ML-based methods attracted the attention of researchers and
has been used extensively since 1991.
• ML allows the estimation to be performed using information from
previously finished projects.
• By applying this learning mechanism, experts spend less time on
estimating the proposed project and more time on other functionality of
the software system that will satisfy the customer.
• Over the last two decades, ML techniques have been widely investigated
with the aim of providing better prediction accuracy than the other
techniques (algorithmic models, expert judgment, etc.).
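As a minimal sketch of estimating from previously finished projects, the following uses a nearest-neighbour (analogy-based) estimate: the effort of a new project is taken as the mean effort of the most similar past projects. The feature choice, the numbers, and the function name `estimate_effort` are hypothetical illustrations, not data or methods from any particular study.

```python
import math

def estimate_effort(past_projects, new_features, k=2):
    """Estimate effort as the mean effort of the k most similar past projects."""
    # Rank past projects by Euclidean distance to the new project.
    ranked = sorted(
        past_projects,
        key=lambda p: math.dist(p["features"], new_features),
    )
    nearest = ranked[:k]
    return sum(p["effort"] for p in nearest) / len(nearest)

# Hypothetical history: features = (size in function points, team size),
# effort in person-hours.
history = [
    {"features": (100, 3), "effort": 800},
    {"features": (120, 4), "effort": 1000},
    {"features": (400, 10), "effort": 4200},
]

estimate = estimate_effort(history, (110, 3), k=2)
print(estimate)  # mean effort of the two closest past projects
```

In practice the features would usually be normalised first so that one large-valued attribute does not dominate the distance.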
The Rise of ML in SE
• Jorgensen performed an SLR, and the results revealed that the use of
ML techniques has increased since the early 2000s, while algorithmic
models (non-ML techniques) became scarce over time.
• From the analysis and interpretation of these results, we believe that
the main reason is the possibility of obtaining better effort
predictions.
The Rise of ML in SE
• The SLRs in SDEE (Wen et al.) and SFP (Malhotra et al.) found that ML
techniques performed better than non-ML techniques in 66% and 65%
of studies, respectively.
• All these findings suggest that the use of ML techniques in the domain
of software engineering is recommended to improve the quality of
future software systems.
A Systematic Literature Review (SLR) on the Use of ML in SDEE
THE METHOD USED TO CONDUCT THE SLR
In SE, an SLR is conducted according to the guidelines suggested by
Kitchenham.
Steps of the SLR
• In the first phase, we set up a few research questions that are related to
the objectives of the SLR.
• With these research questions in mind, in the second phase of the SLR
we designed a search strategy to identify studies that help answer
them.
• In this phase, we define a search string as well as the literature
resources, so that the search strategy is iterative and unbiased.
Steps of the SLR
• The third phase is about finding all the relevant studies, based on our
research questions, in the selected literature resources.
• We define the inclusion and exclusion criteria to determine which
studies to include and which to discard.
• Then, we define the quality assessment criteria to gauge the strength
and quality of individual studies.
Steps of the SLR
• In the fifth phase, we collect all the important information that is
relevant to our research questions and store it in a table (called the
data extraction form).
• In the final phase of the SLR, we analyze and synthesize the extracted
data based on the research questions.
The SLR we performed (The Use of ML in SDEE)
• Research questions:
• The goal of our SLR was to select and analyze the studies (from 1991 to
2017) in the domain of SDEE that used ML techniques for prediction.
• To this aim, we have formulated and analyzed seven different research
questions.
Our RQs
• RQ1. Which ML techniques are used in SDEE studies?
• RQ2. Which ML techniques outperform other ML techniques in
terms of effort estimation accuracy?
• RQ3. Do ML techniques provide better results in terms of effort
prediction accuracy than non‐ML techniques?
• RQ4. Which are the datasets most frequently used in SDEE studies?
• RQ5. Which are the accuracy measures widely used in SDEE studies
assessing ML techniques?
• RQ6. Which are the dominant journals for papers analyzing ML
techniques for SDEE?
Search strategy
• The main search strategy to identify and download the studies consists of
two phases:
a. Primary search
b. Secondary search
For the primary search phase, we used the following steps:
1. Identify the major terms from the research questions.
2. Consider synonyms and alternative terms for the terms identified in step 1.
3. Combine the search terms, i.e., use Boolean OR for synonyms and alternative
spellings and Boolean AND to combine the major terms.
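The three steps above can be sketched in code: synonyms are grouped, joined with OR inside each group, and the groups are joined with AND. This is only an illustration; the helper name `build_search_string` is hypothetical and the term groups are an abbreviated subset chosen for the example.

```python
def build_search_string(term_groups):
    """Join synonyms with OR inside each group, then join groups with AND."""
    clauses = []
    for group in term_groups:
        # Quote multi-word phrases so they are searched as a unit.
        terms = [f'"{t}"' if " " in t else t for t in group]
        clause = " OR ".join(terms)
        # Parenthesise only groups that offer more than one alternative.
        clauses.append(f"({clause})" if len(terms) > 1 else clause)
    return " AND ".join(clauses)

# Abbreviated, illustrative term groups (one group per major term).
groups = [
    ["software"],
    ["effort", "cost", "costs"],
    ["estimat*", "predict*"],
    ["machine learning", "data mining", "neural net*"],
]

query = build_search_string(groups)
print(query)
```

Keeping the terms as data makes it easy to add a synonym and regenerate the string for each digital library, although individual libraries may need small syntax adjustments.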
Search String (Primary Search)
• Software AND (effort OR cost OR costs) AND (estimat* OR predict*)
AND (machine AND learning OR “data mining” OR “artificial
intelligence” OR AI OR “pattern recognition” OR “case based
reasoning” OR “decision tree” OR “regression tree” OR “classification
tree” OR “neural net*” OR “genetic algorithm” OR “genetic program*” OR
“Bayesian belief network” OR “Bayesian net*” OR “association
rule*” OR “support vector machine” OR SVR OR “support vector
regression” OR “support vector*”)
Search Interface of IEEE Xplore
Secondary Search
• Some studies can easily be missed by the search string (primary
search), so we also have to perform a secondary search.
• We went through the references of the identified and selected studies
and downloaded further studies that were originally missed.
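The secondary search can be sketched as a set operation: collect everything cited by the selected studies and keep only what the primary search did not already retrieve. The study identifiers and the function name `secondary_search` are hypothetical, and in a real review each candidate would still go through the same selection steps as the primary results.

```python
def secondary_search(selected, references, already_seen):
    """Return cited studies that the primary search did not retrieve."""
    candidates = set()
    for study in selected:
        # Gather every reference listed by a selected study.
        candidates.update(references.get(study, []))
    # Drop anything already retrieved during the primary search.
    return sorted(candidates - set(already_seen))

# Hypothetical identifiers for illustration only.
selected_studies = ["S1", "S2"]
reference_lists = {"S1": ["S2", "S7"], "S2": ["S7", "S9"]}
primary_results = {"S1", "S2", "S3"}

missed = secondary_search(selected_studies, reference_lists, primary_results)
print(missed)
```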
Digital Libraries for Selecting Studies
• The literature sources we focused on for searching and selecting our primary
studies are the following:
IEEE Xplore
ScienceDirect
Scopus
Springer
ACM Digital Library
We selected these databases for retrieving studies because they are widely used
in the software engineering community.
Study Selection
• After searching for and selecting our studies based on their titles and
abstracts, we used two main phases to filter them and obtain more
relevant and reliable literature.
• We first used our inclusion and exclusion criteria to decide which
studies to include in or exclude from the systematic literature review.
Then, we used the quality assessment criteria to further filter the
selected studies.
Inclusion Criteria
• Studies that used ML techniques for software effort estimation.

• Studies that used both ML and non‐ML techniques for software effort
estimation.
• Studies that compare different ML techniques or studies that compare
ML techniques with non‐ML techniques.
• For studies that are published both in journals and conferences, only
the journal version of the study was included.
Exclusion Criteria
• Studies which focus on software effort estimation but do not use ML
techniques.
• Studies employing a dependent variable different from effort/cost/budget
estimation.
• Studies which focus on the estimation of software
maintenance/quality/testing.
• Review studies (i.e., without empirical investigations).
After carefully studying other related SLRs and holding several rounds of meetings, both
authors finalized the inclusion/exclusion criteria by mutual agreement.
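As a rough sketch, the inclusion and exclusion criteria above can be expressed as a filter over study records. The record fields (`work_id`, `venue`, `uses_ml`, `target`, `focus`, `is_review`) are hypothetical, and in a real SLR these judgments are made manually by the reviewers rather than computed.

```python
EFFORT_TARGETS = {"effort", "cost", "budget"}

def passes_criteria(study):
    """Apply the inclusion/exclusion criteria to one study record."""
    return (
        study["uses_ml"]                       # must use ML techniques
        and study["target"] in EFFORT_TARGETS  # dependent variable is effort/cost/budget
        and study["focus"] == "development"    # not maintenance/quality/testing
        and not study["is_review"]             # must be an empirical study
    )

def screen(studies):
    """Keep eligible studies, preferring the journal version of a work."""
    kept = {}
    for s in filter(passes_criteria, studies):
        current = kept.get(s["work_id"])
        if current is None or (
            s["venue"] == "journal" and current["venue"] != "journal"
        ):
            kept[s["work_id"]] = s
    return list(kept.values())

def record(work_id, venue, uses_ml=True, target="effort",
           focus="development", is_review=False):
    """Build a hypothetical study record for the example."""
    return {"work_id": work_id, "venue": venue, "uses_ml": uses_ml,
            "target": target, "focus": focus, "is_review": is_review}

candidates = [
    record("A", "conference"),
    record("A", "journal"),            # same work, journal version
    record("B", "journal", uses_ml=False),
    record("C", "journal", is_review=True),
    record("D", "journal", focus="maintenance"),
]
included = screen(candidates)
print([(s["work_id"], s["venue"]) for s in included])
```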
Quality Assessment Criteria
• We performed a quality assessment for each study to determine its
credibility and relevance.
• It can be considered a supplementary criterion for selecting the studies
in our SLR.
• After applying the inclusion/exclusion criteria, we used a set of
quality criteria questions to weigh the resulting candidate studies.
• Studies of low quality (i.e., with weights below a certain threshold) were
excluded.
Q1. Are the aims of the research clearly stated?
Q2. Are the estimation methods well defined?
Q3. Is the experiment applied on sufficient dataset(s)?
Q4. Is the accuracy of estimations measured?
Q5. Is the proposed estimation method compared with other methods?
Q6. Are the limitations of the study analyzed explicitly?
Q7. How recent is the publication?
Q8. Does the study have a sufficient average citation count?
• For a single study, each question gets a value between 0 and 1.
• If the overall score of a study across all questions is less than 4, the
study/paper is considered to be of low quality and is therefore excluded.
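The scoring rule above amounts to summing eight per-question values and comparing the total with the threshold of 4. The sketch below assumes, for illustration only, that answers are scored as 0, 0.5, or 1; the lecture states only that each value lies between 0 and 1, and the function names are hypothetical.

```python
QUALITY_THRESHOLD = 4  # overall score below this value => study excluded

def quality_score(answers):
    """Sum the per-question scores for Q1..Q8, each expected in [0, 1]."""
    if len(answers) != 8:
        raise ValueError("expected one score per question Q1..Q8")
    if any(not 0 <= a <= 1 for a in answers):
        raise ValueError("each score must lie between 0 and 1")
    return sum(answers)

def is_retained(answers):
    """A study is retained unless its overall score is less than 4."""
    return quality_score(answers) >= QUALITY_THRESHOLD

# Hypothetical scores for two candidate studies (Q1..Q8).
study_a = [1, 1, 0.5, 1, 0.5, 0, 0.5, 0.5]
study_b = [0.5, 0.5, 0, 1, 0, 0, 0.5, 0.5]

print(quality_score(study_a), is_retained(study_a))
print(quality_score(study_b), is_retained(study_b))
```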
• The results will be discussed in the next lecture.
