Empirical Studies of Agile Software Development: A Systematic Review

Tore Dybå and Torgeir Dingsøyr
Abstract
Agile software development represents a major departure from traditional, plan-based approaches to
software engineering. A systematic review of empirical studies of agile software development up to
and including 2005 was conducted. The search strategy identified 1,996 studies, of which 36 were
identified as empirical studies. The studies were grouped into four themes: introduction and
adoption, human and social factors, perceptions of agile methods, and comparative studies. The
review investigates what is currently known about the benefits and limitations of, and the strength of
evidence for, agile methods. Implications for research and practice are presented. The main
implication for research is a need for more and better empirical studies of agile software
development within a common research agenda. For the industrial readership, the review provides a
map of findings, according to topic, that can be compared for relevance to their own settings and
situations.
1 Introduction
The issue of how software development should be organized in order to deliver faster, better,
and cheaper solutions has been discussed in software engineering circles for decades. Many
remedies for improvement have been suggested, from the standardization and measurement of
the software process to a multitude of concrete tools, techniques, and practices.
1 Postprint of: Dybå, T., & Dingsøyr, T. (2008). Empirical Studies of Agile Software Development: A Systematic Review. Information and
Software Technology, 50, 833-859. doi:10.1016/j.infsof.2008.01.006
https://www.sciencedirect.com/science/article/abs/pii/S0950584908000256?via%3Dihub
Released under Creative Commons Attribution Non-Commercial No Derivatives License.
Recently, many of the suggestions for improvement have come from experienced
practitioners, who have labelled their methods agile software development. This movement
has had a huge impact on how software is developed worldwide. However, though there are
many agile methods, little is known about how these methods are carried out in practice and
what their effects are.
This systematic review seeks to evaluate, synthesize, and present the empirical findings on
agile software development to date, and provide an overview of topics researched, their
findings, strength of the findings, and implications for research and practice. We believe this
overview will be important for practitioners who want to stay up to date with the state of
research, as well as for researchers who want to identify topic areas that have been researched
or where research is lacking. This review will also help the scientific community that works
with agile development to build a common understanding of the challenges that must be faced
when investigating the effectiveness of agile methods. The results of such investigation will
be relevant to the software industry.
Methods for agile software development constitute a set of practices for software
development that have been created by experienced practitioners [68]. These methods can be
seen as a reaction to plan-based or traditional methods, which emphasize “a rationalized,
engineering-based approach” [21, 47] in which it is claimed that problems are fully
specifiable and that optimal and predictable solutions exist for every problem. The
“traditionalists” are said to advocate extensive planning, codified processes, and rigorous
reuse to make development an efficient and predictable activity [11].
According to Erickson et al. [27]:

agility means to strip away as much of the heaviness, commonly associated with the
traditional software-development methodologies, as possible to promote quick response to
changing environments, changes in user requirements, accelerated project deadlines and the
like. (p. 89)
Williams and Cockburn [66] state that agile development is “about feedback and change”,
that agile methodologies are developed to “embrace, rather than reject, higher rates of
change”.
In 2001, the “agile manifesto” was written by the practitioners who proposed many of the
agile development methods. The manifesto states that agile development should focus on four
core values2:
• Individuals and interactions over processes and tools.
• Working software over comprehensive documentation.
• Customer collaboration over contract negotiation.
• Responding to change over following a plan.
In an article that describes the history of iterative and incremental development, Larman and
Basili [40] identify Dynamic Systems Development Method (DSDM) [60] as the first agile
method, followed by extreme programming (XP) [9], which originated from the Chrysler C3
project in 1996 [5]. In 1998, the word “agile” was used in combination with “software
process” for the first time [6]. Several further methods followed, including the Crystal family
of methods [16], EVO [28], Feature-Driven Development [50], Lean Development [52] and
Scrum [56]. In 2004, a new version of XP appeared [10]. See Table 1 for an overview of the
most referenced agile development methods, and Table 2 for a comparison of traditional and
agile development.
2 http://agilemanifesto.org
Table 1: Overview of the most referenced agile development methods (excerpt: extreme programming).

Extreme programming (XP; XP2) [9, 10]: Focuses on best practice for development. Consists of twelve practices: the planning game, small releases, metaphor, simple design, testing, refactoring, pair programming, collective ownership, continuous integration, 40-hour week, on-site customers, and coding standards. The revised "XP2" consists of the following "primary practices": sit together, whole team, informative workspace, energized work, pair programming, stories, weekly cycle, quarterly cycle, slack, ten-minute build, continuous integration, test-first programming, and incremental design. There are also 11 "corollary practices".
Table 2: Main differences between traditional development and agile development [47].
Many have tried to explain the core ideas in agile software development, some by examining
similar trends in other disciplines. Conboy and Fitzgerald [19], for example, describe agility
as what is known in other fields as “flexibility” and “leanness”. They refer to several sources
of inspiration, primarily:
• Agile manufacturing, which was introduced by researchers from Lehigh University in
an attempt for the USA to regain its competitive position in manufacturing. Key
concepts in agile manufacturing are integrating customer-supplier relationships; managing
change, uncertainty, and complexity; and utilizing human resources and
information [30, 55].
• Lean development [67], which is rooted in the Toyota Production System [49] from
the 1950s. Some of the core ideas in this system were to eliminate waste, achieve
quality first time, and focus on problem solving.
Meso and Jain [44] have compared ideas in agile development to those in Complex Adaptive
Systems by providing a theoretical lens for understanding how agile development can be used
in volatile business environments. Turk et al. [64] have clarified the assumptions that underlie
processes of agile development and also identify the limitations that may arise from these
assumptions. In the literature, we also find articles that trace the roots of agile development to
the Soft Systems Methodology of Peter Checkland [14], New product development [63] and
Ackoff’s interactive planning [4].
Nerur and Balijepally [46] compare agile development to maturing design ideas in
architectural design and strategic management: “the new design metaphor incorporates
learning and acknowledges the connectedness of knowing and doing (thought and action), the
interwoven nature of means and ends, and the need to reconcile multiple world-views" (p. 81).
However, agile development methods have also been criticized by some practitioners and
academics, mainly focusing on five aspects:
1. Agile development is nothing new; such practices have been in place in software
development since the 1960s [43].
3. There is little scientific support for many of the claims made by the agile community
[42].
4. The practices in XP are rarely applicable, and are rarely applied by the book [34].
5. Agile development methods are suitable for small teams, but for larger projects, other
processes are more appropriate [17].
It has also been suggested that the social values embraced by extreme programming can lead
agile teams to make ineffective decisions that are contrary to what the individual group
members actually want [41].
2.2 Summary of Previous Reviews
Introductions to and overviews of agile development are given by Abrahamsson et al. [2],
Cohen et al. [17], and Erickson et al. [27]. These three reports describe the state of the art and
state of the practice in terms of characteristics of the various agile methods and lessons
learned from applying such methods in industry. We summarize each of these previous
overviews briefly.
The first review of the existing literature on agile software development was done in a
technical report published by Abrahamsson et al. at VTT in 2002 [2]. The report discusses the
concept of agile development, presents processes, roles, practices, and experience with 10
agile development methods, and compares the methods with respect to the phases that they
support and the level of competence that they require. Only DSDM and the Rational Unified
Process [38] were found to give full coverage to all phases of development, while Scrum
mainly covers aspects related to project management. Abrahamsson et al. found anecdotal
evidence that agile methods are “effective and suitable for many situations and
environments”, but state that very few empirically validated studies support these claims. The
report was followed by a comparative analysis of nine agile methods in 2003 [3], where it is
stated that empirical support for the suggested methods remains scarce.
Cohen et al.’s review published in 2004 [17] emphasizes the history of agile development,
traces some of its roots in other disciplines and, in particular, discusses relations between
agile development and the Capability Maturity Model (CMM) [51]. They further describe the
state of the art with respect to the main agile methods and their characteristics. They also
describe the state of the practice, which resulted from an online discussion between 18
practitioners, many of whom were involved in defining the various agile development
methods. They discuss issues such as the introduction of, and project management in, agile
development. They also present experiments and surveys, and seven case studies of agile
development. The authors believe that agile methods will be consolidated in the future, just as
object-oriented methods were consolidated. Further, they do not believe that agile methods
will rule out traditional methods. Rather, they believe that agile and traditional methods will
have a symbiotic relationship, in which factors such as the number of people working on a
project, application domain, criticality, and innovativeness will determine which process to
select.
In 2005, Erickson et al. [27] described the state of research on XP, agile software
development, and agile modelling. With respect to XP, they found a small number of case
studies and experience reports that promote the success of XP. The XP practice of pair
programming is supported by a more well-established stream of research, and there are some
studies on iterative development. Erickson et al. recommend that the other core practices in
XP be studied separately in order to identify what practices are working (for recent studies of
practices, see [22, 26]). Further, they see challenges with matching agile software
development methods with standards such as ISO, and they argue that this is an area that
needs further research. There was much less research on agile modelling than on XP.
In a short time, agile development has attracted huge interest from the software industry. A
survey in the USA and Europe reveals that 14% of companies are using agile methods, and
that 49% of the companies that are aware of agile methods are interested in adopting them
[1]. In just six years, the Agile3 conference has grown to attract a larger attendance than most
conferences in software engineering.
Rajlich [53] describes agile development as a paradigm shift in software engineering, which
has emerged from independent sources: studies of software life cycles and iterative
development. “The new paradigm brings a host of new topics into the forefront of software
engineering research. These topics have been neglected in the past by researchers inspired by
the old paradigm, and therefore there is a backlog of research problems to be solved.” (p. 70)
No systematic review of agile software development research has previously been published.
The existing reviews that were presented in the previous section only partially cover the
empirical studies that exist today. Further, the previous reviews do not include any
assessment of the quality of the published studies, as in this systematic review.
This means that practitioners and researchers have to rely on practitioner books in order to get
an overview. We hope that this article will be useful for both groups, and that it will make
clear which claims on agile software development are supported by scientific studies.
3 www.agile200X.org
The objective of the review is to answer the following research questions:
1. What is currently known about the benefits and limitations of agile software
development?
2. What is the strength of the evidence in support of these findings?
3. What are the implications of these studies for the software industry and the research
community?
3 Review method
Informed by the established method of systematic review [31, 35, 36], we undertook the
review in distinct stages: the development of a review protocol, the identification of inclusion
and exclusion criteria, a search for relevant studies, critical appraisal, data extraction, and
synthesis. In the rest of this section, we describe the details of these stages and the methods
used.
We developed a protocol for the systematic review by following the guidelines, procedures,
and policies of the Campbell Collaboration4, the Cochrane Handbook for Systematic Reviews
of Interventions [31], the University of York’s Centre for Reviews and Dissemination’s
guidance for those carrying out or commissioning reviews [35], and consultation with
software engineering specialists on the topic and methods. This protocol specified the
research questions, search strategy, inclusion, exclusion and quality criteria, data extraction,
and methods of synthesis.
Studies were eligible for inclusion in the review if they presented empirical data on agile
software development and passed the minimum quality threshold (see Section 3.5). Studies of
both students and professional software developers were included. Inclusion of studies was
4 www.campbellcollaboration.org
not restricted to any specific type of intervention or outcome measure. The systematic review
included qualitative and quantitative research studies, published up to and including 2005.
Only studies written in English were included.
Studies were excluded if their focus, or main focus, was not agile software development or if
they did not present empirical data. Furthermore, as our research questions are concerned
with agile development as a whole, and its underlying assumptions, studies that focused on
single techniques or practices, such as pair programming, unit testing, or refactoring, were
excluded. In addition to agile methods in general, we included the following specific
methods: XP, Scrum, Crystal, DSDM, FDD, and Lean.
Finally, given that our focus was on empirical research, “lessons learned” papers (papers
without a research question and research design) and papers merely based on expert opinion
were also excluded.
The search strategy included electronic databases and hand searches of conference
proceedings. Hand searches covered the proceedings of the following conferences:
• XP
• XP/Agile Universe
• Agile Development Conference
Figure 1 shows the systematic review process and the number of papers identified at each
stage. In stage 1, the titles, abstracts, and keywords of the articles in the included electronic
databases and conference proceedings were searched using the following search terms:
All these search terms for agile articles were combined by using the Boolean “OR” operator,
which entails that an article only had to include any one of the terms to be retrieved. That is,
we searched:
1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7 OR 8 OR 9
Excluded from the search were editorials, prefaces, article summaries, interviews, news,
reviews, correspondence, discussions, comments, readers' letters and summaries of tutorials,
workshops, panels, and poster sessions. This search strategy resulted in a total of 2,946 “hits”
that included 1,996 unduplicated citations.
Figure 1: The systematic review process and the number of papers identified at each stage (stage 1: identification of relevant studies from databases and conference proceedings, n = 1,996).
Relevant citations from stage 1 (n = 1,996) were entered into and sorted with the aid of
EndNote. They were then imported to Excel, where we recorded the source of each citation,
our retrieval decision, retrieval status, and eligibility decision. For each subsequent stage,
separate EndNote databases and Excel sheets were established.
At stage 2, both authors sat together and went through the titles of all studies that resulted
from stage 1, to determine their relevance to the systematic review. At this stage, we excluded
studies that were clearly not about agile software development, independently of whether they
were empirical or not. As an example, because our search strategy included the term “xp and
software”, we got several “hits” on articles about Microsoft’s Windows XP operating system.
In addition, because we used the term “agile and software”, we got several hits on articles
related to agile manufacturing. Articles with titles that indicated clearly that the articles were
outside the scope of this systematic review were excluded. However, titles are not always
clear indicators of what an article is about. Some authors’ use of “clever” or witty titles can
sometimes obscure the actual content of an article. In such cases, the articles were included
for review in the next stage. At this stage, 1,175 articles were excluded.
At stage 3, studies were excluded if their focus, or main focus, was not agile software
development or if they did not present empirical data. However, we found that abstracts were
of variable quality; some abstracts were missing, poor, and/or misleading, and several gave
little indication of what was in the full article. In particular, it was not always obvious
whether a study was, indeed, an empirical one. Therefore, at this stage, we included all
studies that indicated some form of experience with agile development. If it was unclear from
the title, abstract, and keywords whether a study conformed to the screening criteria, it was
included for a detailed quality assessment (see below).
At this stage, we divided the abstracts among ourselves and a third researcher in such a way
that each abstract was reviewed by two researchers independently of each other. For the 821
abstracts assessed, the number of observed agreements was 738 (89.9 percent). We also
computed the Kappa coefficient of agreement, which corrects for chance agreement [18]. The
Kappa coefficient for stage 3 assessments was 0.78, which is characterized as “substantial
agreement” by Landis and Koch [39]. All disagreements were resolved by discussion that
included all three researchers, before proceeding to the next stage. As a result of this
discussion, another 551 articles were excluded at this stage, which left 270 articles for the
detailed quality assessment.
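For reference, Cohen's kappa relates the observed proportion of agreement, p_o, to the proportion expected by chance, p_e. The sketch below uses the stage 3 figures, with p_e inferred from the reported values rather than from the underlying rating tables (which are not reproduced here):

$$\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \frac{738}{821} \approx 0.899,$$

so the reported $\kappa = 0.78$ corresponds to a chance-agreement level of roughly $p_e \approx 0.54$.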
Each of the 270 studies that remained after stage 3 was assessed independently by both
authors, according to 11 criteria. These criteria were informed by those proposed for the
Critical Appraisal Skills Programme (CASP)5 (in particular, those for assessing the quality of
qualitative research [29]) and by principles of good practice for conducting empirical research
in software engineering [37].
The 11 criteria covered three main issues pertaining to quality that need to be considered
when appraising the studies identified in the review (see Appendix B):
• Rigour: has a thorough and appropriate approach been applied to key research
methods in the study?
• Credibility: are the findings well-presented and meaningful?
• Relevance: how useful are the findings to the software industry and the research
community?
5 www.phru.nhs.uk/casp/casp.htm
We included three screening criteria that were related to the quality of the reporting of a
study’s rationale, aims, and context. Thus, each study was assessed according to whether:
1. the study reported empirical research or whether it was merely a “lessons learned”
report based on expert opinion;
2. the aims and objectives were clearly reported (including a rationale for why the study
was undertaken);
3. there was an adequate description of the context in which the research was carried out.
The first of these three criteria represents the minimum quality threshold of the review and
was used to exclude non-empirical research papers (see Appendix B). As part of this
screening process, any single-technique or single-practice papers were also identified and
excluded.
Five criteria were related to the rigour of the research methods employed to establish the
validity of data collection tools and the analysis methods, and hence the trustworthiness of the
findings. Consequently, each study was assessed according to whether:
4. the research design was appropriate to address the aims of the research;
5. there was an adequate description of the sample used and the methods for identifying
and recruiting the sample;
6. any control groups were used to compare treatments;
7. appropriate data collection methods were used and described;
8. there was an adequate description of the methods used to analyze the data, and whether
appropriate measures were taken to ensure that the analysis was grounded in the data.
In addition, two criteria were related to the assessment of the credibility of the study methods
for ensuring that the findings are valid and meaningful. In relation to this, we judged the
studies according to whether:
9. the relationship between the researcher and participants was considered to an adequate
degree;
10. the study provided clearly stated findings with credible results and justified
conclusions.
The final criterion was related to the assessment of the relevance of the study for the software
industry at large and the research community. Thus, we judged the studies according to
whether:

11. the study was of value for research or practice.
Taken together, these 11 criteria provided a measure of the extent to which we could be
confident that a particular study’s findings could make a valuable contribution to the review.
Each of the 11 criteria was graded on a dichotomous (“yes” or “no”) scale. Again, only
criterion 1 was used as the basis for including or excluding a study.
Of the 270 articles assessed for quality, the number of observed agreements regarding
inclusion/exclusion based on the screening criterion was 255 (94.4 percent). The
corresponding Kappa coefficient was 0.79. Again, all disagreements were resolved by
discussion that included all three researchers. At this stage, another 234 lessons-learned or
single-practice articles were excluded, leaving 33 primary and 3 secondary studies for data
extraction and synthesis. A summary of the quality assessment criteria for these studies is
presented in Table 3.
1. Is the paper based on research (or is it merely a “lessons learned” report based on expert opinion)?
2. Is there a clear statement of the aims of the research?
3. Is there an adequate description of the context in which the research was carried out?
4. Was the research design appropriate to address the aims of the research?
5. Was the recruitment strategy appropriate to the aims of the research?
6. Was there a control group with which to compare treatments?
7. Was the data collected in a way that addressed the research issue?
8. Was the data analysis sufficiently rigorous?
9. Has the relationship between researcher and participants been considered to an adequate degree?
10. Is there a clear statement of findings?
11. Is the study of value for research or practice?
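Taken together, the stage counts reported above are consistent:

1,996 − 1,175 = 821 (after title screening); 821 − 551 = 270 (after abstract screening); 270 − 234 = 36 (after quality screening: 33 primary and 3 secondary studies).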
During this stage, data was extracted from each of the 33 primary studies included in this
systematic review according to a predefined extraction form (see Appendix C). This form
enabled us to record full details of the articles under review and to be specific about how each
of them addressed our research questions.
When we piloted the extraction process we found that extracting data was hindered by the
way some of the primary studies were reported. Due to this, we also found that we differed
too much in what we actually extracted for independent extraction to be meaningful. As a
consequence, all data from all primary studies were extracted by both authors in consensus
meetings.
The aims, settings, research methods descriptions, findings, and conclusions, as reported by
the authors of the primary studies, were copied verbatim into NVivo, from QSR Software6, a
specialist software package for undertaking the qualitative analysis of textual data.
Meta-ethnographic methods were used to synthesize the data extracted from the primary
studies [48]. The first stage of the synthesis was to identify the main concepts from each
primary study, using the original author’s terms. The key concepts were then organized in
tabular form to enable comparison across studies and the reciprocal translation of findings
into higher-order interpretations. This process is analogous to the method of constant
comparison used in qualitative data analysis [45, 62]. When we identified differences in
findings, we investigated whether these could be explained by the differences in methods or
characteristics of the study setting.
In a meta-ethnographic synthesis, studies can relate to one another in one of three ways: they
may be directly comparable as reciprocal translations; they may stand in opposition to one
another as refutational translations; or taken together they may represent a line of argument
[13]. Table 4 shows Noblit and Hare’s seven-step process for conducting a meta-ethnography.
This process of reciprocal and refutational translation and synthesis of studies achieved three
things with respect to answering our overarching question about the benefits and limitations
of agile software development. First, it identified a set of higher-order interpretations, or
themes, which recurred across studies. Second, it documented that agile software
6 See http://www.qsrinternational.com/
development contains both positive and negative dimensions. Finally, it highlighted gaps in
the evidence about the applicability of agile methods to software development.
Table 4: Noblit and Hare's seven-step process for conducting a meta-ethnography.

1. Getting started
2. Deciding what is relevant to the initial interest
3. Reading the studies
4. Determining how the studies are related
5. Translating the studies into one another
6. Synthesizing translations
7. Expressing the synthesis
4 Results
We identified 36 empirical studies on agile software development. Thirty-three are primary
studies (S1-S33) and three are secondary studies (S34-S36); see Appendix A. In what
follows, we discuss the primary studies. These cover a range of research topics, were done
with a multitude of research methods, and were performed in settings that ranged from
professional projects to university courses. Key data, along with a description of the domain
in which each primary study was conducted, is presented in Appendix D.
We categorized the studies into four main groups: 1) introduction and adoption, 2) human and
social factors, 3) customer and developer perceptions, and 4) comparative studies. Three
studies did not fit into any of these categories. They provide baseline data on various aspects
of agile development (S1, S8, S11).
We now describe characteristics of the studies, describe the research methods applied, and
assess the quality of the studies. Then, we present the studies included in the four categories
mentioned above.
With respect to the kinds of agile method that have been studied, we see from Table 5 that 25
(76%) of the studies in this review were done on XP. Studies on agility in general come next,
with five (15%) of the studies. Scrum and Lean Software Development were studied in only
one empirical research article each.
Table 5: Studies by type of agile method used.
If we look at the level of experience of the employees who perform agile development in the
reviewed studies (Appendix D), we see that 24 (73%) of the studies that investigated agile
projects dealt with employees who are beginners (less than a year of experience in agile
development). Four (12%) studies dealt with mature agile development teams (at least one
year of experience in agile development). Two studies did not indicate whether it was a
beginner or mature team that was studied and for three studies (surveys) this classification
was not applicable.
Most studies (24 of 33, 73%) dealt with professional software developers. The remaining nine
(27%) were conducted in a university setting. Most projects were of short duration and were
completed in small teams.
Table 6: Distribution of studies by publication channel and number of occurrences.
Table 6 gives an overview of the studies according to publication channel. We see that the
conferences XP and Agile Development have the largest number of studies. Most of the
studies were published in conferences (26 of 33, 79%), while seven (21%) appeared in
scientific journals.
Of the 13 single-case studies, nine were done in projects in industry. The material for the other four
studies was taken from projects where students did the development. Interestingly, three of
these studies took their data from the same project. Only one of the single-case studies in
industry was done on a mature development team.
For the 11 multiple-case studies, all were done in industry, but only three of the studies were
on mature teams. The number of cases varied from two to three.
Three of the four surveys were done on employees in software companies, while one was
done on students. The three experiments were all done on students, with team sizes ranging
from three to 16. For the two mixed-method studies, Melnik and Maurer (S22) reported on a
survey amongst students in addition to interviews and notes from discussions. The study by
Baskerville et al. (S3) reported on 10 case studies in companies, in combination with
findings from group discussions in a “discovery colloquium” that was inspired by principles
in action research [8].
Taken together, these 11 criteria provide a measure of the extent to which we can be
confident that a particular study’s findings can make a valuable contribution to the review.
The grading of each of the 11 criteria was done on a dichotomous (“yes” or “no”) scale. The
results of the quality assessment are shown in Table 8, in which a “1” indicates “yes” (or OK)
to the question, while “0” indicates “no” (or not OK).
Because we only included research papers in this review, all included studies were rated as
OK on the first screening criterion. However, two of the included studies still did not have a
clear statement of the aims of the research. All studies had some form of description of the
context in which the research was carried out. For three of the studies, the chosen research
design did not seem appropriate to the aims of the research. As many as 25 out of the 33
primary studies did not have a recruitment strategy that seemed appropriate for the aims
stated for the research. Ten of the studies included one or more groups with which to compare
agile methods. As many as seven and eight studies, respectively, did not adequately describe
their data collection and data analysis procedures. In only one study was the recognition of
any possibility of researcher bias mentioned.
Table 8: Quality assessment.

Study  1 Research  2 Aim  3 Context  4 R. design  5 Sampling  6 Ctrl. grp.  7 Data coll.  8 Data anal.  9 Reflexivity  10 Findings  11 Value  Total
S1 1 1 1 0 0 0 1 1 0 1 1 7
S2 1 1 1 1 1 1 1 1 0 1 1 10
S3 1 1 1 1 0 0 0 0 0 1 1 6
S4 1 1 1 1 1 0 1 1 0 1 1 9
S5 1 1 1 1 0 1 1 1 0 1 1 9
S6 1 1 1 1 0 1 1 1 0 1 1 9
S7 1 1 1 1 0 1 1 1 0 1 1 9
S8 1 1 1 1 0 0 1 0 0 1 1 7
S9 1 1 1 1 1 0 1 1 0 1 1 9
S10 1 0 1 1 0 1 1 1 0 1 1 8
S11 1 1 1 1 0 0 0 0 0 1 1 6
S12 1 1 1 1 1 0 1 1 0 1 1 9
S13 1 1 1 1 0 0 0 1 0 1 1 7
S14 1 1 1 1 0 1 1 1 0 1 1 9
S15 1 1 1 1 0 1 1 1 0 1 1 9
S16 1 1 1 1 0 0 1 1 0 1 1 8
S17 1 1 1 1 0 0 1 1 0 1 1 8
S18 1 1 1 1 0 0 1 0 0 1 1 7
S19 1 1 1 1 0 0 1 1 0 1 1 8
S20 1 1 1 1 0 0 1 1 0 1 1 8
S21 1 1 1 1 0 0 1 1 0 1 1 8
S22 1 1 1 1 1 0 1 1 0 1 1 9
S23 1 0 1 0 0 0 0 0 0 1 1 4
S24 1 1 1 1 0 0 1 1 0 1 1 8
S25 1 1 1 1 0 0 0 0 0 1 1 6
S26 1 1 1 1 0 0 0 0 0 1 1 6
S27 1 1 1 1 0 0 1 1 0 1 1 8
S28 1 1 1 1 1 1 1 1 0 1 1 10
S29 1 1 1 1 1 0 1 1 0 1 1 9
S30 1 1 1 1 1 0 1 1 0 1 1 9
S31 1 1 1 0 0 0 0 0 1 1 1 6
S32 1 1 1 1 0 1 1 1 0 1 1 9
S33 1 1 1 1 0 1 1 1 0 1 1 9
Total 33 31 33 30 8 10 26 25 1 33 33
We frequently found the following: methods were not well described; issues of bias, validity,
and reliability were not always addressed; and methods of data collection and analysis were
often not explained well. None of the studies got a full score on the quality assessment and
only two studies got one negative answer. Twenty-one studies were rated at two or three
negative answers, while ten studies were rated as having four or more negative answers. The
highest number of negative answers was seven.
Several studies addressed how agile development methods are introduced and adopted in
companies; see Table 9. We characterized these studies as falling into three broad groups:
those that discuss introduction and adoption, those that discuss how the development process
is changed, and those that discuss how knowledge and projects are managed.
Table 9: Study aims for studies on the introduction and adoption of agile development methods.
However, some researchers argue that there is nothing new about agile methods. Hilkka et al.
(S9) studied two development organizations in Finland, and concluded that XP is “old wine
in new bottles”. XP “formalizes several habits that appear naturally (...) close customer
involvement, short release cycles, cyclical development, and fast response to change
requests”. In a company-internal development department, the researchers found that:
the tools and techniques of XP had been employed for more than 10 years and had been
applied in a quite systematic fashion, though the company had never made a deliberate
decision to use XP. (p. 52)
Another “new economy” company was more aware of developments in the field:
the XP process had more or less emerged as a novel way of solving time and budget
constraints. The developers were aware of XP practices, but did not choose to engage in it by
the book. (p. 52)
However, most studies treated agile development as something "new" that consequently
requires introduction and adoption.
Svensson and Höst (S30) present the results of introducing a process based on XP to a large
software development company. The process was introduced to a pilot team that worked for
eight months. Svensson and Höst concluded that the introduction of the process proved
difficult, due to the complexity of the organization. They advise companies that want to
introduce agile development methods to assess existing processes, with the following goals in
mind: determining what to introduce; clarifying terminology to simplify communication with
the rest of the company; avoiding underestimating the effort needed to introduce and adapt
XP; and introducing the practice of continuous testing early, because it takes time and effort
to introduce this properly.
In contrast, Bahli and Zeid (S2) studied how a Canadian organization shifted from a waterfall
process to XP, and found that “even though team members had no prior experience with XP
(except one week of training), they found the model easy to use and useful to develop
information systems”. A development manager described the shift as follows:
The first week was tough, no one of my guys have a strong experience with XP. But they
quickly caught up and we got quite good results. A lot of work is needed to master XP but we
are getting there. (p. 8)
The study reports that the development team found using the waterfall model to be an
“unpleasant experience”, while XP was found to be “beneficial and a good move from
management”. The XP project was delivered a bit less late (50% time-overrun, versus 60%
for the traditional), and at a significantly reduced cost overrun (25%, compared to 50% cost
overrun for the traditional project).
Bahli and Zeid claim that the adoption of XP was facilitated by “a high degree of knowledge
creation enabled by mutual development of information systems developers and users”.
Karlström and Runeson (S29) found that XP teams experienced improved communication,
but were perceived by other teams as more isolated. Their study is described in detail below.
Tessem (S31) set up a project with researchers and students to learn more about how practices
in XP work. The project lasted for three weeks and had two deliveries (three planned, but
reduced because of “severe underestimation in the beginning”). Six people worked on the
project. The participants' experience ranged from programming experience gained only in
university courses to experience with professional software development. The aim of the
project was to develop a web application for group work in university courses. Tessem
reports on experience with key practices of XP. Pair programming was found to be a
“positive experience, enhancing learning and also leading to higher quality”. However, three
of the programmers also reported it to be “extremely inefficient”, “very exhausting”, and “a
waste of time”. Towards the end of the project, single programming was used to a greater
extent than pair programming. Tessem suggests that there is a connection between this shift in
programming methods and a higher occurrence of problems in the end. Frequent partner
changes are suggested as a way to achieve optimal learning and to increase collective code
ownership. Further, the on-site customer role was perceived as “very valuable by all
programmers”. Tessem also found that test-first programming contributed to “higher quality
in code”, while the project struggled to get functional tests running.
A study by Svensson and Höst (S29) also provides insight into the development process, but
focused primarily on how agile development affects customer collaboration. This study was
done in a software development company in Sweden with 250 software developers, who were
responsible for over 30 software systems. A modified process was introduced that mainly
followed XP. The researchers found that having the customer on-site enabled better
collaboration with the customer, because it provided an arena for detailed discussions.
Concepts from lean development were introduced in a large company’s information systems
department in a study organized by Middleton (S23). The techniques were tried on two two-
person teams that were maintaining a financial and management information system. The
teams were instructed to change their work practice, so that it involved shorter feedback
cycles and completing work before taking on more. In the beginning, many errors were
discovered in the work, which led to a time of “frustration and low productivity”. One of the
teams made fewer errors over time, but the other team continued to make a high number of
errors, which also led to an early termination of the study. According to Middleton, this was
because one person in the fault-prone team felt overqualified for the work and was not willing
to discuss his work with others, and was “unable to produce work without errors”. There was
no infrastructure in the company to handle this problem. Although the experiment was short
and only successful for one team, Middleton claimed that “by moving responsibility for
measuring quality from the manager to the workers, a much quicker and more thorough
response to defects was obtained”.
Hilkka et al. (S9) found that in the cases they studied, XP worked best with experienced
developers with domain and tool knowledge. The tools facilitated fast delivery and easy
modification of prototypes. In addition, continuous feedback was found to be a key factor for
success.
The study by Bahli and Zeid (S2) examined knowledge sharing in an XP project and a
traditional project. They found that when the XP model was used, the creation of tacit
knowledge improved as a result of frequent contacts:
Because the XP model’s main characteristics are short iterations with small releases and rapid
feedback, close user participation, constant communication and coordination and collective
ownership, knowledge and the capability to create and utilize knowledge among the
development team members are eminent. (p. 4)
Hilkka et al. also underline the importance of skilled team members with solid domain
knowledge: “without these kinds of persons, the chosen approach would probably have little
possibility to succeed” (S9).
Karlström and Runeson (S12) studied the feasibility of applying agile methods in large
software development projects, using stage-gate project management models. They report
findings from introductory trials with XP in three companies: ABB, Ericsson Microwave
Systems, and Vodafone Group. They found that the engineers were motivated by the
principles of agile development, but that the managers were initially afraid and needed to be
trained. They also found that as a result of using agile development, the engineers focused on
past and current releases, while the managers focused increasingly on current and future
releases. A potential problem was that technical issues were raised too early for management.
In the study by Tessem (S31), the planning game practice of XP was used to estimate the size
of work. Estimates made by the project team at the beginning of the project were about one
third of what turned out to be correct, which is explained by both the team’s lack of
estimation experience and coarse user stories. Estimates improved towards the end of the
project. Several of the study participants mentioned that during the project, there were not
enough discussions on design and architecture.
In a study by Svensson and Höst (S30), the planning game activity was found to have a
positive effect on collaboration within the company, because it provided the organization with
better insight into the software development process.
Several studies examined various human and social factors related to agile development; see
Table 10. Three broad topics were investigated: the impact of organizational culture, how
collaborative work takes place in agile development, and what characterizes agile
development teams.
Table 10: Study aims for studies on human and social factors.
In an ethnographically informed study of three companies in the UK, Robinson and Sharp (S26)
found that XP can thrive in radically different organizational settings.
The companies were studied with respect to three factors: organizational type, organizational
structure, and physical and temporal settings. These factors are described in Table 11.
Case A was a large multinational bank with an XP team that was “a small part of the bank’s
software development activities”. Case B was a medium-sized company that produces content
security software, using only XP. Case C was a small start-up company that had used XP
since the beginning to develop web-based intelligent advertisements.
Table 11: Differences in organization type, structure, and physical setting for the three cases studied
in S26.
Robinson and Sharp (S26) found that, despite the variations in organization type, structure,
and physical setting, XP was working well. They list a number of consequences for
development that were generated by the organizational culture.
Case C is described further in another publication (S27), which characterizes the development
team in more detail. In case C, the authors claim that the organization seemed to behave in an
agile fashion. They
found no signs of such normal software development artefacts as modelling techniques,
requirements documents, or minutes of meetings. The working mode in this company
resembles descriptions of communities of practice in the literature on knowledge management
[65].
The organizational culture affected how XP was carried out, with respect to behaviour,
beliefs, attitudes and values, organizational structure, as well as physical and temporal
settings.
Collaborative work in XP development has been studied from three angles: the role of
conversation in collaborative work, how progress is tracked, and how work is standardized.
With respect to conversation, Robinson and Sharp (S25) describe pairing as a process of:
purposeful talk where they [two developers] discuss, explore, negotiate and progress the task
at hand. This talk has a complex structure with identifiable episodes of exploration, creation,
fixing & refining, overlaid with explaining, justifying & scrutinising. (p. 102)
Pairing is described as intense and stressful, and one pair's conversation would frequently
spread to other pairs. Mackenzie and Monk (S16) also emphasize the importance of
conversations, claiming that such talk constitutes "talking code into existence".
With respect to tracking progress, Robinson and Sharp described it as happening on two
levels: the daily rhythm and the rhythm oriented around the iteration. Progress was
communicated in daily stand-up meetings, and teams would often have ceremonies around
releasing code. One team studied by Robinson and Sharp used a toy cow that was tilted to
make a ‘moo’ sound when new code was released. Chong (S5) reports similar findings,
stating that “XP makes developing software visually and aurally available”.
Studies of collaborative work also find that work patterns are standardized, as observed by
Chong (S5).
One practice that standardizes work in XP is the planning game, which is described in use by
Mackenzie and Monk (S16):
the card game knit together in a rule-governed process a very disparate set of work processes
and relations involving management, the customer or client and all the members of the
software development team. (p. 114)
Mackenzie and Monk claimed that the process spans the usual boundaries between project
managers and software developers.
Robinson and Sharp (S24) claim that agile development teams have faith in their own
abilities, show respect and responsibility, establish trust, and preserve the quality of working
life. Young et al. (S33) used a technique called “repertory grid analysis” to identify good
personality characteristics for members of XP development teams.
Faith in one’s own abilities was observed to have two aspects in the study by Robinson and
Sharp (S24): believing that the team was capable of achieving the tasks at hand, and
understanding what the limitations were. The team received feedback on their beliefs from
successfully executing code, from a satisfied customer, and from support and encouragement
from each other.
Preserving the quality of working life was observed through constructive discussions in the
planning game, taking into account the needs of individuals in pair programming, and
adhering to 40-hour work weeks. In addition, one team took regular breaks and identified
several ways to relieve developers in hectic periods.
Respect for one’s team members and a sense of responsibility were manifested via the way in
which work was assigned; active agreement was required. “Individuals clearly felt that they
had the respect of their fellow team members and were therefore empowered to take on
responsibility in this way”.
In the study by Robinson and Sharp (S24), trust was found to be pervasive:
The nature of the trust relationship here transcends the immediate business of two individuals
pairing and is persistent. It also applies across pairs (and sub-teams), with each pair trusting
the others to do their part, and it extends beyond the 12 practices. (p. 146)
Young et al. (S33) investigated what personality traits it is beneficial for team members to
possess in agile development. They discussed the traits of roles such as team leader, technical
lead, architect, good (XP) team member, and bad team member. Good XP team members are
described as “analytical, with good interpersonal skills and a passion for extending his
knowledge base (and passing this on to others).”
Several studies have investigated how agile methods are perceived by different groups. We
describe findings from studies that examined the perceptions amongst customers, developers,
and university students. Table 12 gives an overview of the aims of these studies.
Table 12: Study aims for the perceptions of customers, developers, and students.
Several aspects of customer perceptions are discussed in the literature on agile development.
Some have addressed how satisfied customers are with agile methods, others describe the
customer role, and some focus on the collaboration between a customer and the development
team.
With respect to the customer’s satisfaction with agile development methods, Ilieva et al.
(S10) studied the introduction of an agile method based on XP and the Personal Software
Process [32]. They state that the customer had constant control over the development process,
which was “highly praised by the customer at the project sign-off”. In addition, Mann and
Maurer (S17) found, in a study on the impact of Scrum on overtime and customer
satisfaction, that customers believed that the daily meetings kept them up to date and that
planning meetings were helpful to “reduce the confusion about what should be developed”.
The attitude of the customers was found to change from “one of ambivalence to becoming
involved”. The customers stated that their satisfaction with the project that was based on XP
was greater than with previous projects at the company.
However, Mann and Maurer stress that the customer should be trained in the Scrum process
so that they will understand the new expectations that the developers will have of them.
The role of the customer is also the focus in the study by Martin et al. (S19), on three XP
projects with on-site customers. In all cases, they found that the customer was under stress
and committed to working long hours, although all the customers were supported by an
acceptance team, various technical advisors, or senior personnel:
The existing XP Customer practices appears to be achieving excellent results, but they also
appear to be unsustainable, and so constitute a great risk to XP projects, especially in long or
high pressure projects (p. 12)
Martin et al. (S20) also studied the role of the customer in outsourced projects, and found that
this was challenging because the customer was required to become acclimatized to the
different cultures or organizations of the developers.
Koskela and Abrahamsson (S13) analyzed the role of the customer in an XP project and
found that most of the time was spent on participating in planning game sessions and
acceptance testing, followed by retrospective sessions at the end of release cycles.
Mannaro et al. (S18) surveyed the job satisfaction amongst employees in software companies
that used XP and companies that did not use agile development methods. One hundred and
twenty-two people completed a web-based questionnaire. The bulk of these were from
Europe and the United States.
Ninety-five percent of the employees who used XP answered that they would like their
company to continue using their current development process, while the number for the
employees in companies that did not use agile development methods was 40%. In addition,
the employees in the companies that used XP were significantly more willing to use the
development process in the future than the employees in companies that did not use XP.
Further, Mannaro et al. (S18) claimed that employees who use XP have greater job
satisfaction, feel that the job environment is more comfortable, and believe that their
productivity is higher. In particular, 73% of the employees who used pair programming claim
that this practice speeds up the software development process.
In the study by Ilieva et al. (S10), developers found pair programming to be “a very useful
style of working as everyone was strictly conforming to the coding standards”. However, the
authors also note that working 40 hours a week in pairs requires a lot of concentration, and
that as a result, the developers became exhausted.
Mann and Maurer (S17) found that the introduction of Scrum led to a reduction of overtime,
and all developers recommended the use of Scrum in future projects. The developers were
more satisfied with the product, and saw that the Scrum process fostered more customer
involvement and communication. One developer said that “the Scrum process is giving me
confidence that we are developing the software that the customer wants”.
A study by Bahli and Zeid (S2) used the Technology Acceptance Model [54] to study the
adoption of XP in a company that develops medical information systems. They found that
employees saw XP as easy to use and useful, and that employees intended to use this
development process in the future.
Melnik and Maurer (S21, S22) report on student perceptions of agile development methods in
two studies, one from 2002 and one from 2005. They found that 240 students who responded
to a survey at the Southern Alberta Institute of Technology and at the University of Calgary
in Canada were “very enthusiastic about core agile practices”. The findings were consistent
across educational programmes.
The students found that working in agile teams helped them to develop professional skills
such as communication, commitment, cooperation, and adaptability. Seventy-eight percent of
the respondents stated that they believe that XP improves the productivity of small teams.
This figure is comparable to the findings of Mannaro et al. (S18) on pair programming and
productivity for employees in software companies.
Further, 76% of the respondents believed that using XP improved the quality of code, and
65% would recommend using XP to any company for which they may work in the future. Of
those that recommended using XP to their future employers, a large number preferred to work
in pairs.
In the 2002 study, Melnik and Maurer (S21) present qualitative findings on perceptions of XP
in general, pair programming, test-first design, and the planning game. Most students found
pair programming to be helpful, but some expressed concern when the members of the
pair had different levels of competence. One student stated that “There was a huge difference
in skill level in my pair, so we weren’t very productive when I wasn’t driving”. In addition,
test-first design was difficult for many students. The authors believe that this is because
design in itself is very difficult, and writing the tests first forces students to make design
decisions early.
One third (11) of the reviewed primary studies provided some form of comparison of agile
development against an alternative; see Table 13. Using our interpretations as a basis, these
comparisons can be grouped into four higher-order comparative topics: project management,
productivity, product quality, and team characteristics. Non-comparative studies that mention
one or more of these issues are also included in this section.
Table 13: Study aims for comparative studies.
The management of software projects has long been a matter of interest. Agile methods have
reinforced this interest, because many conventional ideas about management are challenged
by such methods. Ceschi et al. (S4) found, in their survey of plan-based and agile companies,
that agile methods improved the management of the development process as well as
relationships with the customer. In particular, they found that companies that use agile
methods prefer to organize their processes in more releases and that the managers of such
companies are more satisfied with the way they plan their projects than are plan-based
companies. Moreover, Baskerville et al. (S3) found that “Internet-speed development” project
management differs from that of traditional development in that “Projects do not begin or
end, but are an ongoing operation more akin to operations management.” (ibid., p. 77).
Karlström and Runeson (S12) studied how traditional stage-gate project management could
be combined with agile methods. In a case study of three large companies, they found that
agile methods give the stage-gate model powerful tools for microplanning, day-to-day work
control, and reporting on progress. They also found that they were able to communicate much
more effectively when using the working software and face-to-face meetings of agile methods
than when using written documents. In turn, the stage-gate model provided the agile methods
with a means to coordinate with other development teams and to communicate with
marketing and senior management. Their conclusion was that it is feasible to integrate agile
methods with stage-gate project management to improve cost control, product functionality,
and on-time delivery.
A central concern for agile methods is to attend to the real needs of the customer, which are
often not stated explicitly in a more or less complete requirements specification. Thus,
Dagnino et al. (S6) compared and contrasted the use of an evolutionary agile approach with a
more traditional incremental approach in two different technology development projects.
They showed that by planning in detail only the features and requirements to be implemented
in a specific cycle, the agile team was better able to incorporate changes in requirements at a
later stage with less impact on the project. In addition, by delivering in-progress software to
the customer more frequently, the agile team was able to demonstrate business value more
quickly and more often than the traditional, iterative team. Combined with continuous
feedback from the customer, this led to a sharp increase in customer satisfaction on the agile
project.
Similarly, Ceschi et al. (S4) found that the tighter links between the customer and the
development team resulted in agile companies being more satisfied with their customer
relationships than plan-based companies. Furthermore, Sillitti et al.’s (S28) survey of project
managers found that companies that use agile methods are more customer-centric and flexible
than document-driven ones, and that companies that use agile methods seem to have a more
satisfactory relationship with the customer.
However, with respect to human resource management, Baskerville et al. (S3) concluded that
compared to traditional development, team members of agile teams are less interchangeable,
and more difficult to describe and identify.
4.7.2 Productivity
Four studies compared the productivity of agile teams with the productivity of teams using
traditional development methods (S7, S10, S14, S32); see Table 14. Ilieva et al. (S10)
compared the productivity of two similar projects, one of which used traditional methods and
the other of which used XP. They measured the productivity for three iterations of each
project. Overall, the results showed a 42% increase in productivity for the agile team. The
increase in productivity was largest for the first iteration, while there was virtually no
difference in productivity for the last iteration.
The case study by Layman et al. (S14) compared an old release developed with traditional
methods with a new release developed with agile methods. The results showed a 46%
increase in productivity for the new agile release compared with the old, traditional release.
However, the agile team had notably greater domain and programming language expertise
and project manager experience, because three of the team members on the new release had
previously worked on the old release.
Dalcher et al. (S7) performed an experiment in which fifteen software teams developed
comparable software products using four different development approaches (V-model,
incremental, evolutionary, and XP). The greatest difference in productivity was between the
V-model teams and the XP teams, with the XP teams being, on average, 337% more
productive than the V-model teams. However, this productivity gain was due to the XP teams
delivering 3.5 times more lines of code without delivering more functionality.
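Because productivity in these comparisons is measured in lines of code, a small worked example may help to show why such a measure can rise sharply without any more functionality being delivered. The numbers below are hypothetical and are not taken from S7; they only illustrate the measurement effect.

```python
# Hypothetical numbers, for illustration only (not data from S7): why a
# lines-of-code-based productivity measure can rise without more functionality
# being delivered.

def loc_productivity(loc: int, person_hours: float) -> float:
    return loc / person_hours

def feature_productivity(features: int, person_hours: float) -> float:
    return features / person_hours

hours = 400.0                              # assume equal effort for both teams
v_model = {"loc": 1_000, "features": 20}
xp = {"loc": 3_500, "features": 20}        # 3.5x the code, same functionality

print(loc_productivity(xp["loc"], hours) / loc_productivity(v_model["loc"], hours))
# -> 3.5: the XP teams look far more "productive" in LOC terms ...
print(feature_productivity(xp["features"], hours) /
      feature_productivity(v_model["features"], hours))
# -> 1.0: ... but no more functionality per hour was delivered.
```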
Contrary to the studies by Dalcher et al. (S7), Ilieva et al. (S10), and Layman et al. (S14),
Wellington et al. (S32) found a 44% decrease in productivity for an XP team compared with
a traditional team. Furthermore, Svensson and Höst (S30) found no change in overall
productivity when comparing results from before and after the introduction of an agile
process. However, they did find evidence that when the agile process was introduced, the
team improved their productivity during the first iterations.
In addition, Mannaro et al. (S18) asked their subjects whether the team’s productivity had
increased significantly as a result of the development process that was used. On a scale from
1 (Strongly Disagree) to 6 (Strongly Agree), the mean for the non-XP developers was 3.78,
while the mean for the XP developers was one scale point higher (4.75). Similarly, 78% of
Melnik and Maurer’s (S22) respondents either believed or believed strongly that using XP
improves the productivity of small teams.
Table 14. Comparisons of productivity.
Note 2 to Table 14: Comparisons were made between two one-semester courses; however, the actual hours
worked by the members of the teams were not measured.
4.7.3 Product quality
Several aspects of product quality were examined by the studies in this review. For example,
comparing the results for a new release of a project to those for an old release, Layman et al.
(S14) found a 65% improvement in pre-release quality and a 35% improvement in post-
release quality. Ilieva et al. (S10) found 13% fewer defects reported by the customer or by the
quality assurance team in an XP project than in a non-XP project.
In Wellington et al.’s (S32) study, the XP team’s code scored consistently better on the
quality metrics used than the traditional team's code. In addition, the quality of the code delivered by
the XP team was significantly greater than that delivered by the traditional team. However,
both teams agreed that the traditional team had developed a better and much more consistent
user interface.
Macias et al. (S15) measured the internal and external quality of the products developed by
10 XP teams and 10 traditional teams. However, in contrast to Layman et al. and Wellington
et al., they found no difference in either internal or external quality between the XP teams
and the traditional teams.
With respect to product size, the XP model teams in Dalcher et al.’s (S7) study delivered 3.5
times more lines of code than the V-model teams. This is in sharp contrast to Wellington et
al.’s (S32) results, which showed that the traditional team delivered 78% more lines of code
than the XP team. However, in contrast to both Dalcher et al. and Wellington et al., Macias et
al. (S15) found no difference in product size between the XP teams and the traditional teams.
4.7.4 Work practices and job satisfaction
A few studies made qualitative comparisons of social behaviour. Chong (S5), for example,
performed an ethnographic study to compare the work routines and work practices of the
software developers on an XP team and a non-XP team. Chong’s observations suggest that
certain features of XP promote greater uniformity in work routine and work practice across
individual team members and that, consequently, XP provides a framework for standardizing
the work of software development and making it more visible and accessible to the members
of a software development team.
The point of departure for Mannaro et al. (S18) was the importance of job satisfaction for the
effectiveness of the software development process. Consequently, they performed a survey to
compare the job satisfaction of developers that used XP practices with that of developers that
did not use them. The results of their study showed that the developers viewed XP practices
favourably and indicated that developers who use XP practices are more comfortable with
their job environment and more satisfied with their jobs than developers that do not use XP
practices.
5 Discussion
The present review identified a greater number of studies than did previous reviews.
Abrahamsson et al. (S34) wrote in their 2002 review that the existing evidence consists
mainly of practitioners’ success stories. Cohen et al. (S35) found seven case studies on agile
development in their 2004 report; we included none of these in our final set of studies,
because they were either lessons learned studies or single-practice studies. Further, Erickson
et al.’s (S36) 2004 review found four “case studies and lessons learned reports”, none of
which we included in our review. This systematic review shows that there are many more
empirical studies on agile development methods in general than have previously been
acknowledged. In contrast to the previous reviews, this review used an explicit search
strategy combined with explicit inclusion and exclusion criteria.
We now address our research questions, starting by discussing what we found regarding the
benefits and limitations of agile software development. The second subsection discusses the
strength of evidence of these findings, while the third subsection discusses the implications of
the findings for research and practice. Finally, we discuss the limitations of this systematic
review.
5.1 Benefits and limitations of agile software development
The studies that address the introduction and adoption of agile methods do not provide a
unified view of current practice, but offer a broad picture of experience and some
contradictory findings. XP was found to be difficult to introduce in a complex organization,
yet seemingly easy in other types of organizations. This is consistent with earlier findings that
suggest that agile development methods are more suitable for small teams than for larger
projects [17]. It is likely that the ease with which XP can be introduced will depend on how
interwoven software development is with the other functions of the organization. Most studies
reported that agile development practices are easy to adopt and work well. Benefits were
reported in the following areas: customer collaboration, work processes for handling defects,
learning in pair programming, thinking ahead for management, focusing on current work for
engineers, and estimation. With respect to limitations, the lean development technique did not
work well for one of the teams trying it out, pair programming was seen as inefficient, and
some claimed that XP works best with experienced development teams. A further limitation
that was reported by one of the studies, which has also been repeatedly mentioned in the
literature [42, 61], was the lack of attention to design and architectural issues.
A recurring theme in studies on agile development is what we have called human and social
factors and how these factors affect, and are affected by, agile development methods. A
benefit of XP was that it thrived in radically different environments, in organizations that
ranged from having a hierarchical structure to having little or no central control. In addition, customer
involvement and physical settings varied greatly for the successful XP teams studied. It seems
to be possible to adopt XP in various organizational settings. Further, conversation,
standardization, and the tracking of progress have been studied and are described as
mechanisms for creating awareness within teams and organizations. In addition, studies of XP
indicate that successful teams manage to balance a high level of individual autonomy with a
high level of team autonomy and corporate responsibility. They have faith in their own
abilities and preserve the quality of their working lives. Good interpersonal skills and trust
were found to be important characteristics of a successful XP team.
Many studies have sought to identify how agile methods are perceived by different groups.
Studies on customer perceptions report that customers are satisfied with the opportunities for
feedback and responding to changes. However, we also found that the role of on-site
customer can be stressful and cannot be sustained for a long period. Developers are mostly
satisfied with agile methods. Companies that use XP have reported that their employees are
more satisfied with their job and that they are more satisfied with the product. There were
mixed findings regarding the effectiveness of pair programming, and several developers
regarded it as an exhausting practice because it requires heavy concentration. University
students perceive agile methods as providing them with relevant training for their future work
and believe that these methods improve the productivity in teams. However, they reported
that pair programming was difficult when there was a large skill differential between the
members of the pairs. In addition, test-first development was reported to be difficult for many
students.
With respect to product quality, most studies report increased code quality when agile
methods are used, but, again, none of these studies had an appropriate recruitment strategy to
ensure an unbiased comparison. The size of the end product seems not to be correlated with
the method of development used. Different studies have reported larger, smaller, and equal
sizes of end product for traditional versus agile methods. The effects of using agile rather than
traditional methods on work practices and job satisfaction have not been established conclusively.
Some studies have found that work practice is more standardized when agile methods are
used and that job satisfaction is greater. However, a study of team cohesion did not find any
improvement of cohesion in an XP team.
5.2 Strength of the evidence
Several systems exist for making judgments about the strength of evidence in systematic
reviews (see [7] for an overview). Most of these systems suggest that the strength of evidence
can be based on a hierarchy with evidence from systematic reviews and randomized
experiments at the top of the hierarchy and evidence from observational studies and expert
opinion at the bottom of the hierarchy [36]. The inherent weakness with evidence hierarchies
is that randomized experiments are not always feasible and that, in some instances,
observational studies may provide better evidence.
To cope with the weaknesses of evidence hierarchies, we used the GRADE (Grading of
Recommendations Assessment, Development and Evaluation) working group definitions to
grade the overall strength of the evidence as high, moderate, low, or very low [7] (see Table
15). According to GRADE, the strength of evidence can be determined on the basis of the
combination of four key elements, i.e., in addition to study design, study quality, consistency,
and directness are also evaluated. The GRADE system initially categorizes evidence
concerning study design by assigning randomized experiments a high grade and observational
studies a low grade. However, by considering the quality, consistency, and directness of the
studies in the evidence base, the initial overall grade could be increased or decreased, i.e.,
evidence from inconsistent, low-quality experiments may be assigned a low grade, while
strong or very strong evidence of association from two or more high-quality observational
studies may be assigned a high grade [7].
Table 15. Definitions used for grading the strength of evidence [7].
High: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low: Any estimate of effect is very uncertain.
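As an illustration of how these four elements can be combined, the following minimal sketch encodes the kind of grading logic described above. It is not the GRADE working group's procedure: the numeric levels and single-step adjustments are simplifying assumptions made only for this example.

```python
# Illustrative sketch only: a simplified reading of how a GRADE-style grade
# combines study design, quality, consistency, and directness. The actual
# GRADE process involves expert judgment; the step sizes here are assumptions.

LEVELS = ["very low", "low", "moderate", "high"]

def grade_evidence(randomized: bool,
                   serious_quality_limitations: bool,
                   important_inconsistency: bool,
                   major_uncertainty_about_directness: bool,
                   strong_association: bool = False) -> str:
    # Initial grade from study design.
    level = 3 if randomized else 1   # high for randomized, low for observational

    # Downgrade one step for each serious shortcoming.
    level -= int(serious_quality_limitations)
    level -= int(important_inconsistency)
    level -= int(major_uncertainty_about_directness)

    # Strong, consistent associations from observational studies may be upgraded.
    if not randomized and strong_association:
        level += 1

    return LEVELS[max(0, min(level, 3))]

# A body of evidence that is mostly observational, with serious quality
# limitations, inconsistent results, and indirect comparisons:
print(grade_evidence(randomized=False,
                     serious_quality_limitations=True,
                     important_inconsistency=True,
                     major_uncertainty_about_directness=True))  # -> "very low"
```

Under these simplifying assumptions, an evidence base with the characteristics discussed in the following paragraphs ends up at a very low grade, which is consistent with the overall judgment reached later in this section.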
Regarding study design, there were only three experiments in the review (two randomized
trials), while the remaining primary studies were observational. That there are few
experiments is natural because we included only studies that addressed agile methods as a
whole and excluded ones that investigated specific practices in isolation. This is consistent
with Shadish et al.’s comments that experiments are best used to investigate specific cause-
effect phenomena [57]. Consequently, our initial categorization of the total evidence in this
review based on study design is low. We now consider the quality, consistency, and directness
of the studies in the evidence base.
With respect to the quality of the studies, methods were not, in general, described well; issues
of bias, validity, and reliability were not always addressed; and methods of data collection
and analysis were often not explained well (see Section 4.3). As many as 25 out of the 33
primary studies did not have a recruitment strategy that seemed appropriate for the aims
stated for the research and 23 of the studies did not use other groups or baselines with which
to compare their findings. Furthermore, in only one study was the possibility of researcher
bias mentioned (see Table 9). Using these findings as a basis, we conclude that there are
serious limitations to the quality of the studies, which inevitably increase the risk of bias or
confounding. Hence, we must be circumspect about the studies’ reliability.
With respect to consistency, i.e., the similarity of estimates of effect across studies, we found
differences in both the direction of effects and the size of the differences in effects, i.e., we
found no consistent evidence of association from two or more studies with no plausible
confounders nor did we find direct evidence from studies with no major threats to validity.
These inconsistencies might be due to imprecise or sparse data, or to reporting bias.
With respect to directness, i.e., the extent to which the people, interventions, and outcome
measures are similar to those of interest, we found that most studies were concerned with XP.
This leaves an uncertainty about the directness of evidence for other agile methods. However,
given that most of the studies regarding XP were performed with student subjects or
professionals who had little or no experience in agile development, this also raises an issue
regarding the directness of evidence for XP. In addition, very few studies provided direct
comparisons of interventions; hence, we had to make comparisons across studies. However,
such indirect comparisons leave greater uncertainty than direct comparisons because of all the
other differences between studies that can affect the results. Our judgment is thus that there
are major uncertainties about the directness of the included studies.
Combining the four components of study design, study quality, consistency, and directness,
we find that the strength of the evidence in the current review regarding the benefits and
limitations of agile methods, and for decisions related to their adoption, is very low. Hence,
any estimate of effect that is based on evidence of agile software development from current
research is very uncertain. This is consistent with criticisms that have been raised regarding
the sparse scientific support for many of the claims made by the agile community [42].
5.3 Implications for research and practice
This systematic review has a number of implications for research and practice. For research,
the review shows a clear need for more empirical studies of agile development methods.
Agile development has had a deep impact on the software industry in recent years. In our
opinion, this should lead to a greater interest amongst researchers as to what has driven the
trend and what the effects are of the changes that emerge in response to the adoption of agile
development.
This review also shows that, with rare exceptions, only XP has been studied. Hence, research
on other agile approaches that are popular in industry should be a priority when designing
future studies. In our opinion, management-oriented approaches, such as Scrum, are clearly
the most under-researched compared to their popularity in industry.
Another striking finding is that only one research group in the world has studied mature agile
development teams. If we want to investigate the potential of agile methods, we clearly need
to direct more resources towards investigating the practices of mature teams.
The review shows that a range of research methods have been applied. We need to employ
both flexible and fixed research designs if we are to gain a deeper understanding of agile
development. Edmondson and McManus [25] argue that the research design needs to fit the
current state of theory and research. They divide this state into three categories: nascent,
intermediate, and mature; see Table 16. For agile software development, we believe the
current state of theory and research on methods is clearly nascent, which suggests a need for
exploratory qualitative studies. Rajlich [53] phrased it as a “backlog of research problems to
be solved”.
Other areas of research on agile software development, such as studies of particular practices,
like pair programming, or areas that connect well to existing streams of software engineering
research, might be described as being at an intermediate, or even a mature, state.
A major challenge is to increase the quality of studies on agile software development. In [58],
Sjøberg et al. discuss measures to increase the quality of empirical studies in software
engineering in general. Recently, Höst and Runeson [33] have suggested a checklist to use in
case studies in software engineering. The recent special issue of Information and Software
Technology on qualitative software engineering research [20] provides many useful examples
of approaches for study designs, data collection, and analysis that should be relevant for
future studies of agile software development. The state of research with respect to controlled
experiments has been described thoroughly in a survey by Sjøberg et al. [59].
In order to increase the usefulness of the research for industry and to provide a sufficient
number of studies of high quality on subtopics related to agile development, we think that
researchers in the field should collaborate to determine a common research agenda. It lies
beyond the scope of this article to suggest such an agenda, but we hope that the synthesis of
research presented herein may provide the inspiration to create one.
For practitioners, this review shows that many promising studies of the use of agile methods
have been reported. Although serious limitations have been identified, e.g., that the role of
on-site customer seems to be unsustainable for long periods and that it is difficult to introduce
agile methods into large and complex projects, the results of the review suggest that it is
possible to achieve improvements in job satisfaction, productivity, and customer
satisfaction.
The strongest, and probably most relevant, evidence for practice is from the studies of mature
agile teams, which suggests that it is necessary to focus on human and social factors in order
to succeed. Specifically, it seems that a high level of individual autonomy must be balanced
with a high level of team autonomy and corporate responsibility. It also seems important to
staff agile teams with people that have faith in their own abilities combined with good
interpersonal skills and trust.
Evidence also suggests that instead of abandoning traditional project management principles,
one should rather take advantage of these principles, such as stage-gate project management
models, and combine them with agile project management. The evidence also suggests that
agile methods are not necessarily the best choice for large projects. Thus, consistent with
recommendations provided by others [11, 12, 15], we suggest that practitioners carefully
study their projects’ characteristics and compare them with the relevant agile methods’
required characteristics.
Due to the limited number and relatively poor quality of the primary studies in this review, it
is impossible to offer more definitive and detailed advice. Rather, this review provides an
overview of research carried out to date, which must be critically appraised by companies in
order to identify similarities and differences between the studies reported and their own
situation. A particularly important aid in this appraisal is the description of the context of the
studies in this review (Appendix D). A further aid would be to apply the principles of
evidence-based software engineering in order to support and improve the decisions about
what methods and technologies to employ [24].
The review clearly shows the need for more research in order to determine the situations in
which advice on agile development that has been offered by practitioners may suitably be
applied. We would like to urge companies to participate in research projects in the future, in
order to target research goals that are relevant for the software industry. Action research is
one such way of organizing collaboration between industry and researchers that would be
highly relevant for a nascent field such as agile software development.
5.4 Limitations of this review
The main limitations of the review are bias in the selection of publications and inaccuracy in
data extraction. To help to ensure that the process of selection was unbiased, we developed a
research protocol in advance that defined the research questions. Using these questions as a
basis, we identified keywords and search terms that would enable us to identify the relevant
literature. However, it is important to recognize that software engineering keywords are not
standardized and that they can be both discipline- and language-specific. Therefore, due to
our choice of keywords and search strings, there is a risk that relevant studies were omitted.
To avoid selection bias, we piloted every part of the review process, and in particular, the
search strategy and citation management procedure, in order to clarify weaknesses and refine
the selection process. Furthermore, since our focus was on empirical research, we excluded
“lessons learned” papers and papers that were based merely on expert opinion. If the review
had included this literature, the current study could, in principle, have provided more data. In
that event, it might have been possible to draw more general conclusions. To further ensure
the unbiased selection of articles, a multistage process was utilized that involved three
researchers who documented the reasons for inclusion/exclusion at every step, as described
in Section 3 and also as suggested by Kitchenham [36].
When we piloted the data extraction process, we found that several articles lacked sufficient
details about the design and findings of a study and that, due to this, we differed too much in
what we actually extracted. As a consequence, all data from all the 33 primary studies were
extracted by the two authors in consensus meetings according to a predefined extraction form
(Appendix C). However, we often found that the extraction process was hindered by the way
some of the primary studies were reported. Many articles lacked sufficient information for us
to be able to document them satisfactorily in the extraction form. More specifically, we
frequently found that methods were not described adequately, that issues of bias and validity
were not always addressed, that methods of data collection and analysis were often not
explained well, and that samples and study settings were often not described well. There is
therefore a possibility that the extraction process may have resulted in some inaccuracy in the
data.
6 Conclusion
We identified 1,996 studies from searches of the literature, of which 36 were found to be
research studies of acceptable rigour, credibility, and relevance. Thirty-three of the 36 studies
identified were primary studies, while three were secondary studies.
The studies fell into four thematic groups: introduction and adoption, human and social
factors, perceptions of agile methods, and comparative studies. We identified a number of
reported benefits and limitations of agile development within each of these themes. However,
the strength of evidence is very low, which makes it difficult to offer specific advice to
industry. Consequently, we advise readers from industry to use this article as a map of
findings according to topic, which they can use to investigate relevant studies further and
compare the settings in the studies to their own situation.
The studies investigated XP almost exclusively, and only a few of the studies on XP were
done on mature development teams. A clear finding of the review is that we need to increase
both the number and the quality of studies on agile software development. In particular, agile
project management methods, such as Scrum, which are popular in industry, warrant further
attention. We see that there is a backlog of research issues to be addressed. In this context,
there is a clear need to establish a common research agenda for agile software development
and for future field studies to pay more attention to the fit between their research methods and
the state of prior work.
Acknowledgements
The work in this paper was supported by the Research Council of Norway through the project
Evidence-Based Software Engineering (181685/I30). We are grateful to Geir K. Hanssen at
SINTEF ICT, who participated in selecting and assessing the studies included in this review.
We are also grateful to Chris Wright for proofreading the paper.
Appendix A: Studies Included in the Review
[S1] P. Abrahamsson and J. Koskela, “Extreme programming: A survey of empirical data from a controlled
case study,” Proceedings - 2004 International Symposium on Empirical Software Engineering, ISESE 2004,
Aug 19-20 2004, Redondo Beach, CA, United States, 2004.
[S2] B. Bahli and E.S.A. Zeid, “The role of knowledge creation in adopting extreme programming model: an
empirical study,” ITI 3rd International Conference on Information and Communications Technology: Enabling
Technologies for the New Knowledge Society, 2005.
[S3] R. Baskerville, B. Ramesh, L. Levine, J. Pries-Heje, and S. Slaughter, “Is Internet-Speed Software
Development Different?” IEEE Software, no. 6, vol. 20, pp. 70-77, 2003.
[S4] M. Ceschi, A. Sillitti, G. Succi, and S. De Panfilis, “Project Management in Plan-Based and Agile
Companies,” IEEE Software, no. 3, vol. 22, pp. 21-27, 2005.
[S5] J. Chong, “Social Behaviours on XP and non-XP teams: A Comparative Study,” Proceedings of the Agile
Development Conference (ADC'05), 2005.
[S6] A. Dagnino, K. Smiley, H. Srikanth, A.I. Anton, and L. Williams, “Experiences in applying agile
software development practices in new product development,” Proceedings of the 8th IASTED International
Conference on Software Engineering and Applications, Nov 9-11 2004, Cambridge, MA, United States, 2004.
[S7] D. Dalcher, O. Benediktsson, and H. Thorbergsson, “Development Life Cycle Management: A
Multiproject Experiment,” Proceedings of the 12th International Conference and Workshops on the
Engineering of Computer-Based Systems (ECBS'05), 2005.
[S8] A. Fruhling, K. Tyser, and G.-J. De Vreede, “Experiences with Extreme Programming in Telehealth:
Developing and Implementing a Biosecurity Health Care Application,” Proceedings of the 38th Hawaii
International Conference on System Sciences (HICSS), Hawaii, USA, 2005.
[S9] M.-R. Hilkka, T. Tuure, and R. Matti, “Is Extreme Programming Just Old Wine in New Bottles: A
Comparison of Two Cases,” Journal of Database Management, no. 4, vol. 16, pp. 41-61, 2005.
[S10] S. Ilieva, P. Ivanov, and E. Stefanova, “Analyses of an agile methodology implementation,” Proceedings
30th Euromicro Conference, 2004, IEEE Computer Society Press, pp. 326-333.
[S11] T. Jokela and P. Abrahamsson, “Usability assessment of an extreme programming project: Close co-
operation with the customer does not equal to good usability,” in Product Focused Software Process
Improvement, Lecture Notes in Computer Science, vol. 3009. Berlin: Springer Verlag, 2004, pp. 393-407.
[S12] D. Karlström and P. Runeson, “Combining Agile Methods with Stage-Gate Project Management,” IEEE
Software, no. 3, vol. 22, pp. 43-49, 2005.
[S13] J. Koskela and P. Abrahamsson, “On-site customer in an XP project: Empirical results from a case
study,” in Software Process Improvement, Proceedings, Lecture Notes in Computer Science, vol. 3281, T.
Dingsøyr, Ed. Berlin: Springer-Verlag, 2004, pp. 1-11.
[S14] L. Layman, L. Williams, and L. Cunningham, “Exploring extreme programming in context: an industrial
case study,” Agile Development Conference, 2004.
[S15] F. Macias, M. Holcombe, and M. Gheorghe, “A formal experiment comparing extreme programming
with traditional software construction,” Proceedings of the Fourth Mexican International Conference on
Computer Science (ENC 2003), 2003.
[S16] A. Mackenzie and S. Monk, “From Cards to Code: How Extreme Programming Re-Embodies
Programming as a Collective Practice,” Computer Supported Cooperative Work, vol. 13, pp. 91-117, 2004.
[S17] C. Mann and F. Maurer, “A Case Study on the Impact of Scrum on Overtime and Customer
Satisfaction,” Agile Development Conference, 2005.
[S18] K. Mannaro, M. Melis, and M. Marchesi, “Empirical analysis on the satisfaction of IT employees
comparing XP practices with other software development methodologies,” in Extreme Programming and Agile
Processes in Software Engineering, Proceedings, Lecture Notes in Computer Science, vol. 3092: Springer
Verlag, 2004, pp. 166-174.
[S19] A. Martin, R. Biddle, and J. Noble, “The XP customer role in practice: three studies,” Agile
Development Conference, 2004.
[S20] A. Martin, R. Biddle, and J. Noble, “When XP met outsourcing,” in Extreme Programming and Agile
Processes in Software Engineering, Proceedings, Lecture Notes in Computer Science, vol. 3092. Berlin:
Springer Verlag, 2004, pp. 51-59.
[S21] G. Melnik and F. Maurer, “Perceptions of Agile Practices: A Student Survey,” in Proceedings, eXtreme
Programming/Agile Universe 2002, Lecture Notes in Computer Science, vol. 2418: Springer Verlag, 2002, pp.
241-250.
[S22] G. Melnik and F. Maurer, “A Cross-Program Investigation of Student's Perceptions of Agile Methods,”
International Conference on Software Engineering (ICSE), St. Louis, Missouri, USA, 2005.
[S23] P. Middleton, “Lean software development: Two case studies,” Software Quality Journal, no. 4, vol. 9,
pp. 241-252, 2001.
[S24] H. Robinson and H. Sharp, “The characteristics of XP teams,” in Extreme Programming and Agile
Processes in Software Engineering, Lecture Notes in Computer Science, vol. 3092. Berlin: Springer Verlag,
2004, pp. 139-147.
[S25] H. Robinson and H. Sharp, “The social side of technical practices,” in Extreme Programming and Agile
Processes in Software Engineering, Lecture Notes in Computer Science, vol. 3556. Berlin: Springer Verlag,
2005, pp. 100-108.
[S26] H. Robinson and H. Sharp, “Organisational culture and XP: three case studies,” Proceedings of the Agile
Development Conference (ADC'05), 2005.
[S27] H. Sharp and H. Robinson, “An ethnographic study of XP practice,” Empirical Software Engineering,
no. 4, vol. 9, pp. 353-375, 2004.
[S28] A. Sillitti, M. Ceschi, B. Russo, and G. Succi, “Managing Uncertainty in Requirements: a Survey in
Documentation-driven and Agile Companies,” Proceedings of the 11th International Software Metrics
Symposium (METRICS), 2005.
[S29] H. Svensson and M. Höst, “Introducing Agile Process in a Software Maintenance and Evolution
Organization,” Ninth European Conference on Software Maintenance and Reengineering (CSMR'05), 2005.
[S30] H. Svensson and M. Höst, “Views from an organization on how agile development affects its
collaboration with a software development team,” Lecture Notes in Computer Science, vol. 3547. Berlin:
Springer Verlag, 2005, pp. 487-501.
[S31] B. Tessem, “Experiences in Learning XP Practices: A Qualitative Study,” in XP 2003, vol. 2675. Berlin:
Springer Verlag, 2003, pp. 131-137.
[S32] C.A. Wellington, T. Briggs, and C.D. Girard, “Comparison of Student Experiences with Plan-Driven and
Agile Methodologies,” Proceedings of the 35th ASEE/IEEE Frontiers in Education Conference, 2005.
[S33] S.M. Young, H.M. Edwards, S. Mcdonald, and J.B. Thompson, “Personality Characteristics in an XP
Team: A Repertory Grid Study,” Proceedings of Human and Social Factors of Software Engineering (HSSE),
St. Louis, Missouri, USA, 2005.
[S34] P. Abrahamsson, O. Salo, J. Ronkainen, and J. Warsta, “Agile software development methods: Review
and analysis,” VTT Technical report 2002,
[S35] D. Cohen, M. Lindvall, and P. Costa, “An Introduction to Agile Methods,” in Advances in Computers,
Advances in Software Engineering, vol. 62, M. V. Zelkowitz, Ed. Amsterdam: Elsevier, 2004.
[S36] J. Erickson, K. Lyytinen, and K. Siau, “Agile Modeling, Agile Software Development, and Extreme
Programming: The State of Research,” Journal of Database Management, no. 4, vol. 16, pp. 88 - 100, 2005.
Appendix B: Quality assessment form
Screening questions:
3. Is there an adequate description of the context in which the research was carried out?
Consider whether the researcher has identified:
□ Yes □ No
– The industry in which products are used (e.g. banking, telecommunications, consumer
goods, travel, etc)
– The nature of the software development organization (e.g. in-house department or
independent software supplier)
– The skills and experience of software staff (e.g. with a language, a method, a tool, an
application domain)
– The type of software products used (e.g. a design tool, a compiler)
– The software processes being used (e.g. a company standard process, the quality assurance
procedures, the configuration management process)
If question 1, or both of questions 2 and 3, receive a “No” response do not continue with the
quality assessment.
Detailed questions:
Research design
4. Was the research design appropriate to address the aims of the research?
Consider:
□ Yes □ No
– Has the researcher justified the research design (e.g. have they discussed how they decided
which methods to use)?
Sampling
5. Was the recruitment strategy appropriate to the aims of the research?
Consider:
□ Yes □ No
– Has the researcher explained how the participants or cases were identified and selected?
– Are the cases defined and described precisely?
– Were the cases representative of a defined population?
– Have the researchers explained why the participants or cases they selected were the most
appropriate to provide access to the type of knowledge sought by the study?
– Was the sample size sufficiently large?
Control group
6. Was there a control group with which to compare treatments?
Consider:
□ Yes □ No
– How were the controls selected?
– Were they representative of a defined population?
– Was there anything special about the controls?
– Was the non-response high? Could non-respondents be different in any way?
Data collection
7. Was the data collected in a way that addressed the research issue?
Consider:
□ Yes □ No
– Were all measures clearly defined (e.g. unit and counting rules)?
– Is it clear how data was collected (e.g. semi-structured interviews, focus group etc.)?
– Has the researcher justified the methods that were chosen?
– Has the researcher made the methods explicit (e.g. is there an indication of how interviews
were conducted, did they use an interview guide)?
– If the methods were modified during the study, has the researcher explained how and why?
– Whether the form of the data is clear (e.g. tape recording, video material, notes etc.)
– Whether quality control methods were used to ensure completeness and accuracy of data
collection
Data analysis
8. Was the data analysis sufficiently rigorous?
Consider:
□ Yes □ No
– Was there an in-depth description of the analysis process?
– If thematic analysis was used, is it clear how the categories/ themes were derived from the
data?
– Has sufficient data been presented to support the findings?
– To what extent has contradictory data been taken into account?
– Whether quality control methods were used to verify the results
Findings
10. Is there a clear statement of findings?
Consider:
□ Yes □ No
– Are the findings explicit (e.g. magnitude of effect)?
– Has an adequate discussion of the evidence, both for and against the researcher’s
arguments, been demonstrated?
– Has the researcher discussed the credibility of their findings (e.g. triangulation, respondent
validation, more than one analyst)?
– Are limitations of the study discussed explicitly?
– Are the findings discussed in relation to the original research questions?
– Are the conclusions justified by the results?
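For readers who want to apply this checklist systematically, the sketch below shows one possible way of recording the answers for a single study. The criterion labels, the example answers, and the simple count of “Yes” responses are assumptions made for illustration; they are not necessarily how the assessments in this review were aggregated.

```python
# Illustrative only: recording Yes/No answers to the quality assessment
# questions for one study. The criterion keys, the example answers, and the
# yes-count aggregate are assumptions for illustration.

from dataclasses import dataclass, field

@dataclass
class QualityAssessment:
    study_id: str
    answers: dict[str, bool] = field(default_factory=dict)  # criterion -> Yes/No

    def record(self, criterion: str, answer: bool) -> None:
        self.answers[criterion] = answer

    def yes_count(self) -> int:
        # A naive aggregate: the number of criteria answered "Yes".
        return sum(self.answers.values())

qa = QualityAssessment(study_id="example-study")  # hypothetical assessment
qa.record("adequate description of context", True)
qa.record("appropriate research design", True)
qa.record("appropriate recruitment strategy", False)
qa.record("control group used", True)
qa.record("data collection addressed research issue", True)
qa.record("sufficiently rigorous data analysis", False)
qa.record("clear statement of findings", True)
print(qa.study_id, qa.yes_count(), "of", len(qa.answers), "criteria satisfied")
```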
Appendix C: Data extraction form
Study description
1. Study identifier: Unique id for the study
2. Date of data extraction
3. Bibliographic reference: Author, year, title, source
4. Type of article: Journal article, conference paper, workshop paper,
book section
5. Study aims: What were the aims of the study?
6. Objectives: What were the objectives?
7. Design of study: Qualitative, quantitative
(experiment, survey, case study, action research)
Study findings
1. Findings and conclusions: What were the findings and conclusions?
(verbatim from the study)
2. Validity: Limitations, threats to validity
3. Relevance: Research, practice
Appendix D: Overview of Primary Studies
ID | Research method | Agile method | Agile experience | Professional-Student | Project duration | Team size | Domain, comment
S1 | Singlecase | XP | Beginner | Student | 8.4 weeks | 4 | Research prototype developed.
S2 | Multicase | XP | Beginner | Professional | 1 year | 9 | Medical information systems
S3 | Mixed | General | - | Professional | - | NA | Web development
S4 | Survey | General | NA | Professional | - | NA | NA
S5 | Multicase | XP | Beginner | Professional | - | 7-12 | Mid-size software start-up
S6 | Multicase | Other | Beginner | Professional | 2700 hours | 5 | Industrial automation
S7 | Experiment | XP | Beginner | Student | 1 year | 3-4 | NA
S8 | Singlecase | XP | Beginner | Professional | 21 months | 4 | Medical information systems
S9 | Multicase | XP | Beginner | Professional | NA / 18 months | 6 / 4 | Factory system + communication system
S10 | Singlecase | XP | Beginner | Professional | 900 hours | 4 | Financial software
S11 | Singlecase | XP | Beginner | Student | 8.4 weeks | 4 | Research prototype developed.
S12 | Multicase | General | Beginner | Professional | - | - | Industrial automation / Defence / Telecom
S13 | Singlecase | XP | Beginner | Student | 8.4 weeks | 4 | Research prototype developed.
S14 | Singlecase | XP | Beginner | Professional | 3.5 months | 10 | Airline company software
S15 | Experiment | XP | Beginner | Student | 1 semester | 4-5 | NA
S16 | Singlecase | XP | Beginner | Professional | - | 6-12 | Knowledge management software
S17 | Singlecase | Scrum | Beginner | Professional | 22 months | 4-6 | Oil and gas software
S18 | Survey | XP | NA | Professional | NA | NA | NA
S19 | Multicase | XP | Beginner | Professional | 15 / 45 / 18 | 11 / 8 / 16 | -
S20 | Multicase | XP | Beginner | Professional | 15 months / 18+ months | 11 / 60 | -
S21 | Survey | XP | Beginner | Student | NA | NA | NA
S22 | Mixed | General | Beginner | Student | - | NA | NA
S23 | Multicase | LSD | Beginner | Professional | 3 days | 2 | Financial system
S24 | Multicase | XP | Mature | Professional | - | 8 / 23 | Web applications / document software
S25 | Multicase | XP | Mature | Professional | - | 7 / 23 / 8 | Mid-size software start-up
S26 | Multicase | XP | Mature | Professional | - | 12 / 20 / 8 | Banking / Content security software / web-applications
S27 | Singlecase | XP | Mature | Professional | - | 10 | -
S28 | Survey | General | NA | Professional | - | NA | NA
S29 | Singlecase | XP | Beginner | Professional | - | - | Software house
S30 | Singlecase | XP | Beginner | Professional | - | - | Software maintenance and evolution
S31 | Singlecase | XP | Beginner | Student | 3 weeks | 6 | Educational software
S32 | Experiment | XP | Beginner | Student | 1 semester | 16 | NA
S33 | Singlecase | XP | - | Professional | - | 6 | Software house
* Several numbers for a study in the columns Project duration and Team size indicate that the study included several teams.
References
[1] “North American and European Enterprise Software and Services Survey,” N. 2005, Ed.: Business
Technographics, 2005.
[2] P. Abrahamsson, O. Salo, J. Ronkainen, and J. Warsta, “Agile software development methods: Review and
analysis,” VTT Technical report 2002,
[3] P. Abrahamsson, J. Warsta, M.T. Siponen, and J. Ronkainen, “New Directions on Agile Methods: A
Comparative Analysis,” in Proceedings of the 25th International Conference on Software Engineering
(ICSE'03): IEEE Press, 2003.
[4] R.L. Ackoff, “Alternative Types of Planning,” in Ackoff’s Best: His Classic Writings on Management. New
York: Wiley, 1999, pp. 104-114.
[5] A. Anderson, R. Beattie, K. Beck, D. Bryant, M. Dearment, M. Fowler, M. Fronczak, R. Garzaniti, D.
Gore, B. Hacker, C. Hendrickson, R. Jeffries, D. Joppie, D. Kim, P. Kowalsky, D. Mueller, T. Murasky, R.
Nutter, A. Pantea, and D. Thomas, “Chrysler goes to "extremes",” Distributed Computing Magazine, no. Oct.,
pp. 24-28, 1998.
[6] M. Aoyama, “Web-based Agile Software Development,” IEEE Software, no. 6, vol. 15, pp. 56-65, 1998.
[7] D. Atkins, D. Best, P.A. Briss, M. Eccles, Y. Falck-Ytter, S. Flottorp, G.H. Guyatt, R.T. Harbour, M.C.
Haugh, D. Henry, S. Hill, R. Jaeschke, G. Leng, A. Liberati, N. Magrini, J. Mason, P. Middleton, J.
Mrukowicz, D. O’connell, A. D Oxman, B. Phillips, H.J. Schünemann, T.T.-T. Edejer, H. Varonen, G.E. Vist,
J.W. Williams Jr, and Z. Stephanie, “Grading quality of evidence and strength of recommendations,” BMJ, no.
1490, vol. 328, 2004.
[8] D. Avison, F. Lau, M. Myers, and P.A. Nielsen, “Action Research,” Communications of the ACM, no. 1,
vol. 42, pp. 94-97, 1999.
[9] K. Beck, Extreme Programming Explained: Embrace Change: Addison-Wesley, 2000, ISBN 201-61641-6.
[10] K. Beck, Extreme Programming Explained: Embrace Change (2nd ed): Addison-Wesley, 2004, ISBN 978-
0321278654.
[11] B. Boehm, “Get ready for agile methods, with care,” IEEE Computer, no. 1, vol. 35, pp. 64 - 69, 2002.
[12] B. Boehm and R. Turner, Balancing Agility and Discipline: A Guide for the Perplexed. Boston: Addison-
Wesley, 2003, ISBN 978-0321186126.
[13] N. Britten, R. Campbell, C. Pope, J. Donovan, M. Morgan, and R. Pill, “Using Meta Ethnography to
Synthesise Qualitative Research: A Worked Example,” Journal of Health Services Research and Policy, no. 4,
vol. 7, pp. 209–215, 2002.
[14] P. Checkland and J. Scholes, Soft Systems Methodology in Action. Chichester: Wiley, 1990, ISBN
0/471/98605/4.
[15] A. Cockburn, “Selecting a project's methodology,” IEEE Software, no. 4, vol. 17, pp. 64-71, 2000.
[16] A. Cockburn, Crystal Clear : A Human-Powered Methodology for Small Teams: Addison-Wesley, 2004,
ISBN 0-201-69947-8.
[17] D. Cohen, M. Lindvall, and P. Costa, “An Introduction to Agile Methods,” in Advances in Computers,
Advances in Software Engineering, vol. 62, M. V. Zelkowitz, Ed. Amsterdam: Elsevier, 2004.
[18] J. Cohen, “A Coefficient of Agreement for Nominal Scales,” Educational and Psychological
Measurement, vol. 20, pp. 37–46, 1960.
[19] K. Conboy and B. Fitzgerald, “Toward a Conceptual Framework of Agile Methods: A Study of Agility in
Different Disciplines,” Proceedings of XP/Agile Universe, Springer Verlag, 2004, pp. 105-116.
[20] Y. Dittrich, M. John, J. Singer, and B. Tessem, “For the Special issue on Qualitative Software
Engineering Research,” Information and Software Technology, no. 6, vol. 49, pp. 531–539, 2007.
[21] T. Dybå, “Improvisation in Small Software Organizations,” IEEE Software, no. 5, vol. 17, pp. 82-87,
2000.
[22] T. Dybå, E. Arisholm, D. Sjøberg, J. Hannay, and F. Shull, “Are Two Heads Better than One? On the
Effectiveness of Pair-Programming,” IEEE Software, no. 6, vol. 24, pp. 10-13, 2007.
[23] T. Dybå, T. Dingsøyr, and G.K. Hanssen, “Applying Systematic Reviews to Diverse Study Types: An
Experience Report,” Proceedings of the 1st International Symposium on Empirical Software Engineering and
Measurement (ESEM'07), IEEE Computer Society, Madrid, Spain, 2007, pp. 225-234.
[24] T. Dybå, B. Kitchenham, and M. Jørgensen, “Evidence-based software engineering for practitioners,”
IEEE Software, no. 1, vol. 22, pp. 58-65, 2005.
[25] A.C. Edmondson and S.E. Mcmanus, “Methodological Fit in Management Field Research,” Academy of
Management Review, no. 4, vol. 32, pp. 1155-1179, 2007.
[26] H. Erdogmus, M. Morisio, and M. Torchiano, “On the effectiveness of the test-first approach to
programming ” IEEE Transactions on Software Engineering, no. 3, vol. 31, pp. 226-237, 2005.
[27] J. Erickson, K. Lyytinen, and K. Siau, “Agile Modeling, Agile Software Development, and Extreme
Programming: The State of Research,” Journal of Database Management, no. 4, vol. 16, pp. 88 - 100, 2005.
[28] T. Gilb, Competitive Engineering: A Handbook for Systems Engineering, Requirements Engineering, and
Software Engineering Using Planguage. Oxford: Elsevier Butterworth-Heinemann, 2005, ISBN 0-7507-6507-6.
[29] T. Greenhalgh, How to Read a Paper (2nd Ed.). London: BMJ Publishing Group, 2001,
[30] A. Gunasekaran, “Agile manufacturing: A framework for research and development,” International
journal of production economics, no. 1-2, vol. 62, pp. 87-105, 1999.
[31] J.P.T. Higgins and S. Green, “Cochrane Handbook for Systematic Reviews of Interventions 4.2.5
[updated May 2005],” in The Cochrane Library, vol. 3. Chichester, UK: John Wiley & Sons, Ltd., 2005.
[32] W.S. Humphrey, PSP: A Self-Improvement Process for Software Engineers: Addison-Wesley, 2005, ISBN
978-0321305497.
[33] M. Höst and P. Runeson, “Checklists for Software Engineering Case Study Research,” Proceedings of the
First International Symposium on Empirical Software Engineering and Measurement, IEEE, Madrid, Spain,
2007, pp. 479-481.
[34] G. Keefer, “Extreme Programming Considered Harmful for Reliable Software Development 2.0,” AVOCA
GmbH, online report 2003,
[35] K.S. Khan, G. Ter Riet, J. Glanville, A.J. Sowden, and J. Kleijnen, “Undertaking Systematic Review of
Research on Effectiveness, CRD’s Guidance for those Carrying Out or Commissioning Reviews, CRD Report
Number 4 (2nd Ed.),” NHS Centre for Reviews and Dissemination, University of York 2001,
[36] B.A. Kitchenham, “Guidelines for performing Systematic Literature Reviews in Software Engineering
Version 2.3,” Keele University and University of Durham, EBSE Technical Report 2007,
[37] B.A. Kitchenham, S.L. Pfleeger, L.M. Pickard, P.W. Jones, D.C. Hoaglin, K. El Emam, and J. Rosenberg,
“Preliminary guidelines for empirical research in software engineering,” IEEE Transactions on Software
Engineering, no. 8, vol. 28, pp. 721 - 734, 2002.
[38] P. Kruchten, The Rational Unified Process: An Introduction, 3rd ed. Boston: Addison-Wesley, 2003,
[39] J.R. Landis and G.G. Koch, “The Measurement of Observer Agreement for Categorical Data,”
Biometrics, no. 1, vol. 33, pp. 159–174, 1977.
[40] C. Larman and V.R. Basili, “Iterative and Incremental Development: A Brief History,” IEEE Computer,
no. 6, vol. 36, pp. 47-56, 2003.
[41] J. Mcavoy and T. Butler, “The impact of the Abilene Paradox on double-loop learning in an agile team,”
Information and Software Technology, no. 6, vol. 49, pp. 552-563, 2007.
[42] P. Mcbreen, Questioning Extreme Programming. Boston, MA, USA: Pearson Education, 2003, ISBN 0-
201-84457-5.
[43] H. Merisalo-Rantanen, T. Tuure, and R. Matti, “Is Extreme Programming Just Old Wine in New Bottles:
A Comparison of Two Cases,” Journal of Database Management, no. 4, vol. 16, pp. 41-61, 2005.
[44] P. Meso and R. Jain, “Agile Software Development: Adaptive Systems Principles and Best Practices,”
Information Systems Management, no. 3, vol. 23, pp. 19-30, 2006.
[45] M.B. Miles and M. Huberman, Qualitative Data Analysis : An Expanded Sourcebook, 2nd ed: Sage
Publications, 1994, ISBN 0803955405.
[46] S. Nerur and V. Balijepally, “Theoretical Reflections on Agile Development Methodologies,”
Communications of the ACM, no. 3, vol. 50, pp. 79-83, 2007.
[47] S. Nerur, R. Mahapatra, and G. Mangalaraj, “Challenges of migrating to agile methodologies,”
Communications of the ACM, vol. May, pp. 72 - 78, 2005.
[48] G.W. Noblit and R.D. Hare, Meta-Ethnography: Synthesizing Qualitative Studies. London: Sage
Publications, 1988,
[49] T. Ohno, Toyota Production System: Beyond Large-scale Production. New York, USA: Productivity
Press, 1988, ISBN 0-915299-14-3.
[50] S.R. Palmer and J.M. Felsing, A Practical Guide to Feature-Driven Development. Upper Saddle River,
NJ: Prentice Hall, 2002, ISBN 0-13-067615-2.
[51] M.C. Paulk, C.V. Weber, B. Curtis, and M.B. Chrissis, The Capability maturity model : guidelines for
improving the software process. Boston: Addison-Wesley, 1995, ISBN: 0-201-54664-7.
[52] M. Poppendieck and T. Poppendieck, Lean Software Development - An Agile Toolkit for Software
Development Managers. Boston: Addison-Wesley, 2003, ISBN 0-321-15078-3.
[53] V. Rajlich, “Changing the paradigm of Software Engineering,” Communications of the ACM, no. 8, vol.
49, pp. 67 - 70, 2006.
[54] C.K. Riemenschneider, B.C. Hardgrave, and F.D. Davis, “Explaining Software Developer Acceptance of
Methodologies: A Comparison of Five Theoretical Models,” IEEE Transactions on Software Engineering, no.
12, vol. 28, pp. 1135 - 1145, 2002.
[55] L.M. Sanchez and R. Nagi, “A review of agile manufacturing systems,” International Journal of
Production Research, no. 16, vol. 39, pp. 3561-3600, 2001.
[56] K. Schwaber and M. Beedle, Agile Software Development with Scrum. Upper Saddle River: Prentice Hall,
2001,
[57] W.R. Shadish, T.D. Cook, and D.T. Campbell, Experimental and Quasi-Experimental Designs for
Generalized Causal Inference. Boston: Houghton Mifflin Company, 2002,
[58] D. Sjøberg, T. Dybå, and M. Jørgensen, “The Future of Empirical Methods in Software Engineering
Research,” in Future of Software Engineering (FOSE '07): IEEE, 2007, pp. 358-378.
[59] D. Sjøberg, J.E. Hannay, O. Hansen, V.B. Kampenes, A. Karahasanovic, N.-K. Liborg, and A.C. Rekdal,
“A Survey of Controlled Experiments in Software Engineering,” IEEE Transactions on Software Engineering,
no. 9, vol. 31, pp. 733 - 753, 2005.
[60] J. Stapleton, DSDM: Business Focused Development, , Second ed: Pearson Education, 2003, ISBN 978-
0321112248.
[61] M. Stephens and D. Rosenberg, Extreme Programming Refactored: The Case Against XP. Berkeley, CA:
Apress, 2003, ISBN 1-59059-096-1.
[62] A. Strauss and J. Corbin, Basics of Qualitative Research: Second edition: Sage Publications, 1998, ISBN
0-8039-5939-7.
[63] H. Takeuchi and I. Nonaka, “The new product development game,” Harvard Business Review, no.
January, pp. 137 - 146, 1986.
[64] D. Turk, R. France, and B. Rumpe, “Assumptions Underlying Agile Software-Development Processes,”
Journal of Database Management, no. 4, vol. 16, pp. 62-87, 2005.
[65] E. Wenger, Communities of practice : learning, meaning and identity. Cambridge, UK: Cambridge
University Press, 1998, ISBN 0-521-43017-8.
[66] L. Williams and A. Cockburn, “Agile Software Development: It’s about Feedback and Change,” IEEE
Computer, no. 6, vol. 36, pp. 39-43, 2003.
[67] J.P. Womack, D.T. Jones, and D. Roos, The Machine That Changed the World: The Story of Lean
Production-- Toyota's Secret Weapon in the Global Car Wars That Is Now Revolutionizing World Industry:
Free Press, 2007, ISBN 978-0743299794.
[68] P. Ågerfalk and B. Fitzgerald, “Flexible and Distributed Software Processes: Old Petunias in New
Bowls?,” Communications of the ACM, no. 10, vol. 49, pp. 27-34, 2006.