The RIGHT model for Continuous Experimentation
Fabian Fagerholm a,∗, Alejandro Sanchez Guinea b, Hanna Mäenpää a, Jürgen Münch a,c

a Department of Computer Science, University of Helsinki, P.O. Box 68, FI-00014 University of Helsinki, Finland
b University of Luxembourg, 4 rue Alphonse Weicker, L-2721, Luxembourg
c Faculty of Informatics, Reutlingen University, Alteburgstraße 150, D-72762 Reutlingen, Germany
Abstract
Context: Development of software-intensive products and services increasingly occurs by continuously deploying product or service increments, such as new features and enhancements, to customers. Product and service developers must continuously find out what customers want by direct customer feedback and usage behaviour observation. Objective: This paper examines the preconditions for setting up an experimentation system for continuous customer experiments. It describes the RIGHT model for Continuous Experimentation (Rapid Iterative value creation Gained through High-frequency Testing), illustrating the building blocks required for such a system. Method: An initial model for continuous experimentation is analytically derived from prior work. The model is matched against empirical case study findings from two startup companies and further developed. Results: Building blocks for a continuous experimentation system and infrastructure are presented. Conclusions: A suitable experimentation system requires at least the ability to release minimum viable products or features with suitable instrumentation, to design and manage experiment plans, to link experiment results with a product roadmap, and to manage a flexible business strategy. The main challenges are proper, rapid design of experiments, advanced instrumentation of software to collect, analyse, and store relevant data, and the integration of experiment results in both the product development cycle and the software development process.
∗ Corresponding author
Email addresses: [email protected] (Fabian Fagerholm), [email protected] (Alejandro Sanchez Guinea), [email protected] (Hanna Mäenpää), [email protected], [email protected] (Jürgen Münch)
Keywords: Continuous experimentation, Product development, Software architecture, Software development process, Agile software development, Lean software development
1. Introduction

The accelerating digitalisation in most industry sectors means that an increasing number of companies are or will soon be providers of software-intensive products and services. Simultaneously, new companies enter the marketplace as software companies from the outset. Software enables increased flexibility in the types of services that can be delivered, even after an initial product has been delivered to customers. Many constraints that previously existed, particularly in terms of the behaviour of a product or service, can now be removed.

With this newfound flexibility, the challenge for companies is no longer primarily how to identify and solve technical problems, but rather how to solve problems which are relevant for customers and thereby deliver value. Finding such solutions has often been haphazard and based on guesswork, but many successful companies have approached this issue in a systematic way. Recently, a family of generic approaches has been proposed. For example, the Lean Startup methodology [26] proposes a three-step cycle: build, measure, learn.

However, a detailed framework for conducting systematic, experiment-based software development has not been elaborated. Such a framework has implications for the technical product infrastructure, the software development process, the skills that software developers need to design, execute, analyse, and interpret experiments, and the organisational capabilities needed to operate and manage a company based on experimentation in research and development.

Methods and approaches for continuous experimentation with software product and service value should themselves be based on empirical research. In this paper, we present the most important building blocks of a framework for continuous experimentation. Specifically, our research question is:
RQ How can Continuous Experimentation with software-intensive products and services be organised in a systematic way?

To further scope the question, we split it into two sub-questions:

RQ1 What is a suitable process model for Continuous Experimentation with software-intensive products and services?

RQ2 What is a suitable infrastructure architecture for Continuous Experimentation with software-intensive products and services?

We give an answer to the research questions by validating an analytically derived model against a series of case studies in which we implemented different parts of the model in cooperation with two startup companies. The result is the RIGHT model for Continuous Experimentation (Rapid Iterative value creation Gained through High-frequency Testing). This model focuses on developing the right software, whereas the typical focus of software engineering in the past has been on developing the software right (e.g. in terms of technical quality). The model is instantiated in the RIGHT process model and the RIGHT infrastructure architecture model. Together, these instantiations address the need to integrate the requirements, design, implementation, testing, deployment, and maintenance phases of software development in a way that uses continuous empirical feedback from users.

The rest of this paper is organised as follows. In Section 2, we review related work on integrating experimentation into the software development process. In Section 3, we describe the research approach and context of the study. In Section 4, we first present our proposed model for continuous experimentation, and then relate the findings of our case study to it in order to illustrate its possible application and show the empirical observations that it was grounded in. In Section 5, we discuss the model and consider some possible variations. Finally, we conclude the paper and present an outlook on future work in Section 6.
2. Related work

Delivering software that has value – utility for its users – can be considered a primary objective for software development projects. In this section, we describe models for systematic value delivery and approaches for using experiments as a means for value testing and creation. In addition, we discuss related work with respect to experiments at scale.
2.1. Models for systematic value delivery

Lean manufacturing and the Toyota Production System [22] have inspired the definition of Lean software development. This approach provides comprehensive guidance for the combination of design, development, and validation built as a single feedback loop focused on discovery and delivery of value [25]. The main ideas of this approach, which have been emphasised since its introduction, are summarised in seven principles: optimize the whole, eliminate waste, build quality in, learn constantly, deliver fast, engage everyone, and keep getting better [24].

Lean Startup [26] provides mechanisms to ensure that product or service development effectively addresses what customers want. The methodology is based on the Build-Measure-Learn loop, which establishes learning about customers and their needs as the unit of progress. It proposes to apply the scientific method and thinking to startup businesses in the form of learning experiments. As the results of experiments are analysed, the company has to decide whether to “persevere” on the same path or “pivot” in a different direction while considering what has been learned from customers.

Customer Development [4] emphasises the importance of not only carrying out product development activities but also learning and discovering who a company’s initial customers will be, and what markets they are in. Customer Development argues that a separate and distinct process is needed for those activities. Customer Development is a four-step model divided into a search and an execution phase. In the search phase, a company performs customer discovery, testing whether the business model is correct (product/market fit), and customer validation, which develops a replicable sales model. In the execution phase, customer creation focuses on creating and driving demand, and company building is the transition from an organisation designed to learn and discover to one that is optimised for cost-efficient delivery of validated products or services.

In light of the benefits that a methodology such as Lean Startup can provide, where controlled experiments constitute the main activity driving development, Holmström Olsson et al. [12] propose a target stage for any company that wishes to build a development system with the ability to continuously learn from real-time customer usage of software. They describe the stages that a company has to traverse in order to achieve that target as the “stairway to heaven”. The target stage is achieved when the software organisation functions as an R&D experiment system. The stages on the way to achieving the target are: (i) traditional development, (ii) agile R&D organisation, (iii) continuous integration, and (iv) continuous deployment. The authors first describe these four stages and then analyse them through a multiple-case study that examines the barriers that exist on each step on the path towards continuous deployment. The target stage is only described; the authors do not detail any means to overcome the barriers. A main finding from the case study is that the transition towards Agile development requires shifting to small development teams and focusing on features rather than on components. It is also worth noting that the transition towards continuous integration requires an automated build and test system (continuous integration), a main version control branch to which code is continuously delivered, and modularised development. Holmström Olsson et al. found that in order to move from continuous integration to continuous deployment, organisational units such as product management must be fully involved, and close work with a very active lead customer is needed when exploring the product concept further. The authors suggest two key actions to make the transition from continuous deployment to an R&D experiment system. First, the product must be instrumented so that field data can be collected in actual use. Second, organisational capabilities must be developed in order to effectively use the collected data for testing new ideas with customers.

Other works have studied some of the stages of the “stairway to heaven” individually. Ståhl & Bosch [27] have studied the continuous integration stage, pointing out that there is no homogeneous practice of continuous integration in the industry. They propose a descriptive model that allows studying and evaluating the different ways in which continuous integration can be viewed. Eklund & Bosch [7] present an architecture that supports continuous experimentation in embedded systems. They explore the goals of an experiment system, develop experiment scenarios, and construct an architecture that supports the goals and scenarios. The architecture combines an experiment repository, data storage, and software to be deployed on embedded devices via over-the-air data communication channels. The architecture also considers the special requirements for safety in, e.g., automotive applications. However, the main type of experiment is confined to A/B testing, and the architecture is considered mainly from the perspective of a software development team rather than a larger product development organisation.

Holmström Olsson & Bosch [13] describe the Hypothesis Experiment Data-Driven Development (HYPEX) model. The goal of this model is to shorten the feedback loop to customers. It consists of a loop in which potential features are generated into a feature backlog, features are selected, and a corresponding expected behaviour is defined. The expected behaviour is used to implement and deploy a minimum viable feature (MVF). Observed and expected behaviour is compared using a gap analysis, and if a sufficiently small gap is identified, the feature is finalised. On the other hand, if a significant gap is found, hypotheses are developed to explain it, and alternative MVFs are developed and deployed, after which the gap analysis is repeated. The feature may also be abandoned if the expected benefit is not achieved.
2.2. Systematic value creation through experimentation

The models outlined above all aim to make experimentation systematic in the software development organisation. One important conceptual concern is the definition of experimentation. Experimentation has been established in software engineering since the 1980s. Basili et al. [3] were among the first to codify a framework and process for experimentation. Juristo et al. [14] and Wohlin et al. [31] present more recent syntheses regarding experimentation in software engineering. Taken together, these works show that “experimentation” in software engineering can be considered in a broad sense, including both controlled experiments and more explorative activities which aim at understanding and discovery rather than hypothesis testing. For the purposes of this article, we consider experimentation to be a range of activities that can be placed on a spectrum ranging from controlled experiments to open-ended exploration. However, we emphasise that regardless of the placement within this spectrum, all methods require rigorous study designs and a defensible and transparent way of reasoning and drawing conclusions from empirical data. They are not the same method being applied more or less carefully. The logic of controlled experiments relies on careful manipulation of variables, observation of effects, and analysis to test for causal relationships. Quasi-controlled experiments relax some of the requirements for randomised treatment. Case studies often include qualitative elements and their logic is different from controlled experiments: they generalise analytically rather than statistically [32]. Qualitative methods may also be used alone, such as in interview- or observation-based studies.

Experimentation may also be considered in terms of goals, and goals may exist on different levels of the product development organisation. On the product level, experimentation may be used to select features from a set of proposed features. On the technical level, experimentation may be used to optimise existing features. However, the model presented in this paper links experimentation on the product and technical levels to the product vision and strategy on the business level. Experimentation becomes a systemic activity that drives the entire organisation. This allows for focused testing of business hypotheses and assumptions, which can be turned into faster decision-making and reaction to customer needs. Depending on the specific method used, the results of an experiment may suggest new information which should be incorporated into the decision-making process.
2.3. Considerations for running experiments at a large scale

Previous works have presented case studies that exhibit different aspects of continuous experimentation. Steiber [28] reports on a study of the continuous experimentation model followed by Google, analysing a success story of this approach. Tang et al. [29] describe an overlapping experiment infrastructure, developed at Google, that allows web queries in a search engine to be part of multiple experiments, thus allowing more experiments to be carried out at a faster rate. Adams [1] presents a case study on the implementation of Adobe’s Pipeline, a process that is based on the continuous experimentation approach.

Kohavi et al. [16, 17] note that running experiments at large scale requires addressing multiple challenges in three areas: cultural/organisational, engineering, and trustworthiness. The larger organisation needs to learn the reasons for running controlled experiments and the trade-offs between controlled experiments and other methods of evaluating ideas. Even negative experiments, which degrade user experience in the short term, should be run because of their learning value and long-term benefits. When the technical infrastructure supports hundreds of concurrent experiments, each with millions of users, classical testing and debugging techniques no longer apply because there are millions of live variants of the system in production. Instead of heavy up-front testing, Kohavi et al. report having used alerts and post-deployment fixing. The system has also identified many negative features that were avoided despite being advocated by key stakeholders, saving large amounts of money.

Experimentation also has an important relationship with company culture. Kohavi et al. [15] describe a platform for experimentation built and used at Microsoft, noting the cultural challenges involved in using experiment results, rather than opinions from persons in senior positions, as the basis of decisions. They suggest, for example, that one should avoid trying to build features through extensive planning without early testing of ideas, that experiments should be carried out often, that a failed experiment is a learning opportunity rather than a mistake, and that radical and controversial ideas should be tried. All these suggestions are challenging to put into practice in organisations that are not used to experimentation-based decision-making. Kohavi et al. note the challenges they faced at Microsoft, and describe efforts to raise awareness of the experimentation approach.
The final stage of the “stairway to heaven” model is detailed and analysed by Bosch [5]. The differences between traditional development and the continuous approach are analysed, showing that in the context of the new, continuous software development model, R&D is best described as an “innovation experiment system” approach where the development organisation constantly develops new hypotheses and tests them with certain groups of customers. This approach focuses on three phases: pre-deployment, non-commercial deployment, and commercial deployment. The author presents a first systematisation of this so-called “innovation experiment system” adapted for software development for embedded systems. It is argued that aiming for an “innovation experiment system” is equally valid for embedded systems as it is in the case of cloud computing and Software-as-a-Service (SaaS), and that the process could be similar in both cases. That is, requirements should evolve in real time based on data collected from systems in actual use with customers.

Inspired by the ideas that define the last stage of the “stairway to heaven”, we develop and propose the RIGHT model for Continuous Experimentation. In this model, experiments are derived from business strategies and aim to assess assumptions derived from those strategies, potentially invalidating or supporting the strategy. Previous works have explored the application of a framework for linking business goals and strategies to software development activities (e.g., [2], [20]). However, those works have not considered the particular traits of an experiment system such as the one presented in this paper. The model presented here also describes the platform infrastructure that is necessary to establish the whole experiment system. The Software Factory [8] can serve as infrastructure for the proposed model, as it is a software development laboratory well suited for continuous experimentation. In a previous article, in which we presented a study on creating minimum viable products [19] in the context of collaboration between industry and academia, we showed the Software Factory laboratory in relation to the Lean Startup approach and continuous experimentation. Some of the foundational ideas behind Software Factory with respect to continuous experimentation have been studied in the past, analysing, for instance, the establishment of laboratories specifically targeted for continuous development [21] and the impact of continuous integration in teaching software engineering.

The building blocks presented in this paper, although generalisable with certain limitations, are derived from a startup environment where the continuous experimentation approach is not only well suited but possibly the only viable option for companies to grow. Our work has similarities to the “Early Stage Startup Software Development Model” (ESSSDM) of Bosch et al. [6], which extends existing Lean Startup approaches by offering more operational process support and better decision-making support for startup companies. Specifically, ESSSDM provides guidance on when to move product ideas forward, when to abandon a product idea, and what techniques to use and when, while validating product ideas. Some of the many challenges faced when trying to establish a startup following the Lean Startup methodology are presented by May [18], with insights that we have considered for the present work.
3. Research approach

Our general research framework can be characterised as design science research [11], in which the purpose is to derive a technological rule which can be used in practice to achieve a desired outcome in a certain field of application [30]. The continuous experimentation model presented in this paper was first constructed based on the related work presented in the previous section as well as the authors’ experience. While a framework can be derived by purely analytic means, its validation requires grounding in empirical observations. For this reason, we conducted a holistic multiple case study [32] in the Software Factory laboratory at the Department of Computer Science, University of Helsinki, in which we matched the initial model to empirical observations and made subsequent adjustments to produce the final model. The model can still be considered tentative, pending further validation in other contexts. It is important to note that this study investigates how Continuous Experimentation can be carried out in a systematic way independently of the case projects’ goals and the experiments carried out in them. Those experiments and their outcomes are treated as qualitative findings in the context of this study. In this section, we describe the case study context and the research process.
of this study. In this section, we describe the case study context and the research process.
254
3.1. Context
255
The Software Factory is an educational platform for research and industry collabora-
256
tion [8]. In Software Factory projects, teams of Master’s-level students use contemporary
257
tools and processes to deliver working software prototypes in close collaboration with
258
industry partners. The goal of Software Factory activities is to provide students with
259
means for applying their advanced software development skills in an environment with
260
working life relevance and to deliver meaningful results for their customers [19].
10
261
During the case projects used in this study, two of the authors were involved as
262
participant observers. The first author coordinated the case projects: started the projects,
263
handled contractual and other administrative issues, followed up progress through direct
264
interaction with the customer and student teams, ended the projects, handled project
265
debriefing and coordinated the customer interviews. The third author also participated
266
as an observer in several meetings where the customer and student teams collaborated.
267
The researchers were involved in directing the experimentation design activities together
268
with the customer, and students were not directly involved in these activities. However,
269
the customer and students worked autonomously and were responsible for project
270
management, technical decisions, and other issues related to the daily operations of the
271
project.
3.1.1. Case Company 1

Tellybean Ltd. (http://www.tellybean.com/) is a small Finnish startup that develops a video calling solution for the home television set. During September 2012–December 2013, the company was a customer in three Software Factory projects with the aim of creating an infrastructure to support measurement and management of the architecture of their video calling service.

Tellybean Ltd. aims at delivering a life-like video calling experience. Their value proposition – “the new home phone as a plug-and-play experience” – is targeted at late-adopter consumer customers who are separated from their families, e.g. due to migration into urban areas, global social connections, or overseas work. The company puts special emphasis on discovering and satisfying the needs of the elderly, making ease of use the most important non-functional requirement of their product. The primary means for service differentiation in the marketplace are affordability, accessibility, and ease of use. For the first commercial launch, and to establish the primary delivery channel of their product, the company aims at partnering with telecom operators. The company had made an initial in-house architecture and partial implementation during a pre-development phase prior to the Software Factory projects. A first project was conducted to extend the platform functionality of this implementation. A second project was conducted to validate concerns related to the satisfaction of operator requirements. After this project, a technical pivot was conducted, with major portions of the implementation being changed; the first two projects contributed to this decision. A third project was then conducted to extend the new implementation with new features related to the ability to manage software on already delivered products, enabling continuous delivery. The launch strategy can be described as an MVP launch with post-development adaptation. The three projects conducted with this company are connected to establishing a continuous experimentation process and building capabilities to deliver software variations on which experiments can be conducted. They also provided early evidence regarding the feasibility of the product for specific stakeholders, such as operator partners, developers, and release management.
3.1.2. Product

The Tellybean video calling service has the basic functionalities of a home phone: it allows making and receiving video calls and maintaining a contact list. The product is based on an Android OS set-top-box (STB) that can be plugged into a modern home TV. The company maintains a backend system for mediating calls to their correct respondents. While the server is responsible for routing the calls, the actual video call is performed as a peer-to-peer connection between STBs residing in the homes of Tellybean’s customers.

The company played the role of a product owner in three Software Factory projects during September 2012–December 2013. The aim of the first two projects was to create new infrastructure for measuring and analysing usage of their product in its real environment. This information was important in order to establish the product’s feasibility for operators and for architectural decisions regarding scalability, performance, and robustness. For the present research, the first two projects were used to validate the steps required to establish a continuous experimentation process. The third project at Software Factory delivered an automated system for managing and updating the STB software remotely. This project was used to investigate factors related to the architecture needs for continuous experimentation. Table 1 summarises the goals and motivations of the projects in detail. Each project had a 3–7-person student team and a company representative accessible at all times, and the effort spent was in the range of 600 to 700 person-hours.

Table 1: Scope of each of the three Tellybean projects at Software Factory.

Project 1
High-level goal: As an operator, I want to be able to see metrics for calls made by the video call product’s customers.
Motivation: ...so that I can extract and analyse business-critical information; ...so that I can identify needs for maintenance of the product’s technical architecture.

Project 2
High-level goals: As a Tellybean developer, I want to be sure that our product’s system architecture is scalable and robust; As a Tellybean developer, I want to know technical weaknesses of the system; As a Tellybean developer, I want to receive suggestions for alternative technical architecture options.
Motivations: ...so that I know the limitations of the system; ...so that I can predict needs for scalability of the platform; ...so that I can consider future development options.

Project 3
High-level goal: As a technical manager, I want to be able to push an update to the Tellybean set-top-boxes with a single press of a button.
Motivation: ...so that I can deploy upgrades to the software on one or multiple set-top-boxes.
3.1.3. Project 1

The aim of Tellybean’s first project at the Software Factory was to build means for measuring the performance of their video calling product in its real environment. The goal was to develop a browser-based business analytics system. The team was also assigned to produce a back-end system for storing and managing data related to video calls, in order to satisfy operator monitoring requirements. The Software Factory project was carried out in seven weeks by a team of four Master’s-level computer science students. Competencies required in the project were database design, application programming, and user interface design.

The backend system for capturing and processing data was built on the Java Enterprise Edition platform, utilising the open-source Spring framework. The browser-based reporting system was built using the JavaScript frameworks D3 and NVD3 to produce vivid and interactive reporting. A cache of historical call data was implemented to ensure the performance of the system.

After the project had been completed, both the students and the customer deemed that the product had been delivered according to the customer’s requirements. Despite the fact that some of the foundational requirements changed during the project due to discoveries of new technological solutions, the customer indicated satisfaction with the end product. During the project, communication between the customer and the team was frequent and flexible.

The first project constituted a first attempt at conducting continuous experimentation. The goal of the experiment was to gain information about the performance of the system architecture and its initial implementation. The experiment arose from operator needs to monitor call volumes and system load – a requirement that Tellybean’s product developers deemed necessary to be able to partner with operators. It was clear that there existed a set of needs arising from operator requirements, but it was not clear how the information should be presented and what functionality was needed to analyse it. From a research perspective, however, the exact details of the experiment were less important than the overall process of starting experimentation.
3.1.4. Project 2

The second project executed at Software Factory aimed at performing a system-wide stress test of the company’s video calling service infrastructure. The Software Factory team of four Master’s-level students produced a test tool for simulating very high call volumes. The tool was used to run several tests against Tellybean’s existing call mediator server.

The test software suite included a tool for simulating video call traffic, implemented in the Python programming language. A browser-based visual reporting interface was also implemented to support analysis of the test results. The reporting component was created using existing JavaScript frameworks such as Highcharts.js and Underscore.js. Test data was stored in a MongoDB database to be utilised in analysis.

The experiment was a counterpart to the experiment in the first project. Whereas the first project had focused on operator needs, the second focused on their implications for developers. The initial system architecture and many of the technical decisions had been questioned. The project aimed to provide evidence for decision-making when revisiting these initial choices.

The team found significant performance bottlenecks in Tellybean’s existing proof-of-concept system and analysed their origins. Solutions for increasing the operational capacity of the current live system were proposed and some of them were also implemented. Towards the end of the project, the customer suggested that a new proof-of-concept call mediating server should be proposed by the Software Factory team. The team delivered several suggestions for a new service architecture and composed a new call mediator server. For the purposes of this study, we consider the second experiment to be another round in the continuous experimentation cycle, in which findings from the first cycle resulted in a new set of questions to experiment on.
3.1.5. Project 3

For their third project at Software Factory, Tellybean aimed to create a centralised infrastructure for updating their video calling product’s software components. The new remote software management system would allow the company to quickly deploy software updates to already delivered STBs. The functionality was business-critical to the company and its channel partners: it allowed updating the software without having to travel to each customer’s location to update their STBs. The new instrument enabled the company to establish full control of their own software and hardware assets.

The project team consisted of five Master’s-level computer science students. The team delivered a working prototype for rapid deployment of software updates. In this project, the need for a support system to deliver new features or software variations was addressed. We considered the architectural requirements for a continuous delivery system that would support continuous experimentation.
3.1.6. Case Company 2

Memory Trails Ltd. (Memory Trails) is a small Finnish startup that develops a well-being service which helps users define, track, and receive assistance with life goals. During May–July 2014, the company was a customer in a Software Factory project that aimed to develop a backend recommendation engine for the service, improve the front-end user experience, and validate central assumptions in the service strategy. Memory Trails aims at delivering the service as an HTML5-based application which is optimised for tablets but also works on other devices with an HTML5-compatible browser. The service targets adults who wish to improve their quality of life and change patterns of behaviour to reach different kinds of life goals.

Whereas the projects with the first case company focused mostly on establishing a continuous experimentation process and building capabilities to deliver software variations for experimentation, the project with the second case company focused on some of the details of deriving the experiments themselves. In particular, we sought to uncover how assumptions can be identified in initial product or service ideas. These assumptions are candidates for experiments of different kinds.
3.1.7. Project 4

Memory Trails provided an initial user interface and backend system prototype which demonstrated the general characteristics of the application from a user perspective. Users interact with photos which can be placed in different spatial patterns to depict emotional aspects of their goals. Users are guided by the application to arrange the photos as a map, showing the goal, potential steps towards it, and aspects that qualify the goals. For example, a life goal may be to travel around the world. Related photos could depict places to visit, moods to be experienced, items necessary for travel such as tickets, etc. The photos could be arranged, e.g., as a radial pattern with the central goal in the middle and the related aspects around it, or as a time-line with the end goal to the right and intermediate steps preceding it.

In the project, two high-level assumptions were identified. The customer assumed that automatic, artificial intelligence-based processing in the backend could be used to guide users towards their goals, providing triggers, motivation, and rewards on the way. The customer also assumed that the motivation for continued use of the application would come from interacting with the photo map. Since the automatic processing depended on the motivation assumption, the latter became the focus of experimentation in the project. The customer used versions of the application in user tests during which observation and interviews were used to investigate whether the assumption held. For the purposes of this study, we used the project to validate the link in our model between product vision, business model and strategy, and experiment steps.
3.2. Research process

The case study analysis was performed in order to ground the continuous experimentation model in empirical observations, not to understand or describe the projects themselves, nor to assess the business viability of the case companies. Therefore, we collected information that would help us understand the prerequisites for performing continuous experimentation, the associated constraints and challenges, and the logic of integrating experiment results into the business strategy and the development process.

We used four different sources of data in our analysis: (i) participant observation, (ii) analysis of project artefacts, (iii) group analysis sessions, and (iv) individual interviews. We subsequently discuss the details of the data collection and analysis.

During the projects, we observed the challenges the companies faced related to achieving the continuous experimentation system. At the end of each project, an in-depth debriefing session was conducted to gain retrospective insights into the choices made during the project, and the reasoning behind them. In addition to these sources, we interviewed three company representatives from Tellybean to understand their perception of the projects and to gain data which could be matched against our model. We also conducted a joint analysis session with the project team and two representatives from Memory Trails to further match insights on the experimentation process in their project with our model.
The debriefing sessions were conducted in a workshop-like manner, with one researcher leading the sessions and the project team, customer representatives, and any other project observers present. The sessions began with a short introduction by the leader, after which the attendees were asked to list events they considered important for the project. Attendees wrote down each event on a separate sticky note and placed them on a time-line which represented the duration of the project. As event notes were created, clarifying discussion about their meaning and location on the time-line took place. When attendees could not think of any more events, they were asked to systematically recount the progress of the project using the time-line with events as a guide.

The interviews with customer representatives were conducted either in person on the customer’s premises, online via video conferencing, or on the University of Helsinki’s premises. The interviews were semi-structured thematic interviews, with a mixture of open-ended and closed questions. This interview technique allows participants to freely discuss issues related to a focal theme. Thematic interviews have the advantage that they provide opportunities to discover information that researchers cannot anticipate and that would not be covered by more narrowly defined, closed questions. While they may result in the discussion straying away from the focal theme, this is not a problem in practice since the interviewer can direct the participant back to the theme and irrelevant information can be ignored in the analysis.

A minimum of two researchers were present in the interviews to ensure that relevant information was correctly extracted. All participating researchers took notes during the interviews, and the notes were compared after the interviews to ensure consistency. In the interviews, company representatives were first asked to recount their perception of their company, its goals, and its mode of operation before the three projects. Then, they were asked to consider what each project had accomplished in terms of software outcomes, learned information, and implications for the goals and mode of operation of the company. Finally, they were asked to reflect on how the company operated at the time of the interview and how they viewed the development process, especially in terms of incorporating market feedback into decision-making.

During analysis, the project data were examined for information relevant to the research question. We categorised the pieces of evidence according to whether they related to the Continuous Experimentation process or to the infrastructure. We sought to group the observations made and understanding gained during the projects with evidence from the retrospective sessions and interviews so that the evidence was triangulated and thus strengthened. Such groups of triangulated evidence were then matched with our initial model, which was similar to the sequence shown in Figure 1 and included the build-measure-learn cycle for the process, and a data repository, analysis tools, and a continuous delivery system as infrastructure components. We adjusted the model and introduced new process steps and infrastructure components that supported the needs implied by the evidence. We strived for minimal models, and when more than one need could be fulfilled with a single step or component, we did not introduce more steps or components. When all the evidence had been considered, we evaluated the result as a whole and made some adjustments and simplifications based on our understanding and judgement.
Figure 1: Sequence of RIGHT process blocks (a learning cycle of repeated Build-Measure-Learn blocks connected by learnings, resting on a shared technical infrastructure over time).
4. Results

In this section, we first describe our proposed model for continuous experimentation, and then report on the insights gained from the multiple case study and how they inform the different parts of the model.
4.1. The RIGHT model for Continuous Experimentation

By continuous experimentation, we refer to a software development approach that is based on field experiments with relevant stakeholders, typically customers or users, but potentially also other stakeholders such as investors, third-party developers, or software ecosystem partners. The model consists of repeated Build-Measure-Learn blocks, supported by an infrastructure, as shown in Figure 1. Each Build-Measure-Learn block results in learnings which are used as input for the next block. Conceptually, the model applies not only to software development but also to the design and development of software-intensive products and services more broadly. In some cases, experimentation using this model may require little or no development of software.

The Build-Measure-Learn blocks structure the activity of conducting experiments, and connect product vision, business strategy, and technological product development through experimentation. In other words, the requirements, design, implementation, testing, deployment, and maintenance phases of software development are integrated and aligned by empirical information gained through experimentation. The model can be considered a vehicle for incremental innovation as defined by Henderson and Clark [10], but the model itself and the transition to continuous experimentation in general can be considered radical, architectural innovations that require significant new organisational capabilities.
4.1.1. The RIGHT process model for Continuous Experimentation

Figure 2 expands the Build-Measure-Learn blocks and describes the RIGHT process model for Continuous Experimentation. A general vision of the product or service is assumed to exist. Following the Lean Startup methodology [26], this vision is fairly stable and is based on knowledge and beliefs held by the entrepreneur. The vision is connected to the business model and strategy, which is a description of how to execute the vision. The business model and strategy are more flexible than the vision, and consist of multiple assumptions regarding the actions required to bring a product or service to market that fulfils the vision and is sustainably profitable. However, each assumption has inherent uncertainties. In order to reduce the uncertainties, we propose to conduct experiments. An experiment operationalises an assumption and states a hypothesis that can be subjected to experimental testing in order to gain knowledge regarding the assumption. The highest-priority hypotheses are selected first. Once a hypothesis is formulated, two parallel activities can occur. The hypothesis can optionally be used to implement and deploy a Minimum Viable Product (MVP) or Minimum Viable Feature (MVF), which is used in the experiment and carries the necessary instrumentation. Simultaneously, an experiment is designed to test the hypothesis. The experiment is then executed and data from the MVP/MVF are collected in accordance with the experimental design. The resulting data are analysed, concluding the experimental activities.

Figure 2: The RIGHT process model for Continuous Experimentation (across the Build, Measure, and Learn phases: the vision and the business model and strategy; identification and prioritisation of hypotheses; experiment design, execution, and analysis on an instrumented product, service, MVP, or MVF; and decision-making that packages learnings and leads to persevering, pivoting, changing assumptions, validating further assumptions, or stopping).

Once the experiment has been conducted and the analysis performed, the analysis results are used on the strategy level to support decision-making. Again following Lean Startup terminology, the decision can be to either “pivot” or “persevere” [26], but a third alternative is also possible: to change assumptions in the light of new information. If the experiment has given support to the hypothesis, and thus to the assumption on the strategy level, a full product or feature is developed or optimised, and deployed. The strategic decision in this case is to persevere with the chosen strategy. If, on the other hand, the hypothesis was falsified, invalidating the assumption on the strategy level, the decision is to pivot and alter the strategy by considering the implications of the assumption being false. Alternatively, the tested assumption could be changed, but not completely rejected, depending on what the experiment was designed to test and what the results were.
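As a rough illustration of one pass through this flow, the following minimal sketch shows how a single Build-Measure-Learn block could be driven in code: a prioritised hypothesis is selected, an experiment is designed and executed (for example via an instrumented MVP or MVF), and the analysis result leads to a persevere, pivot, or change-assumptions decision. The names, types, and thresholds are our own illustrative assumptions and not part of the validated model.

```python
# Minimal sketch of one pass through the RIGHT process model (Figure 2).
# All names and thresholds are illustrative assumptions, not part of the model.
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    PERSEVERE = "persevere"              # hypothesis supported: fully develop/optimise and deploy
    PIVOT = "pivot"                      # hypothesis falsified: alter the strategy
    CHANGE_ASSUMPTIONS = "change assumptions"  # inconclusive: revise the tested assumption


@dataclass
class Hypothesis:
    assumption: str                      # the strategy-level assumption being operationalised
    statement: str                       # testable statement derived from the assumption
    priority: int                        # higher value = tested earlier


@dataclass
class ExperimentResult:
    support: float                       # e.g. observed effect size or uplift
    threshold: float                     # minimum support needed to count as confirmation


def run_block(backlog: List[Hypothesis],
              design_and_execute: Callable[[Hypothesis], ExperimentResult]) -> Decision:
    """One Build-Measure-Learn block: pick the highest-priority hypothesis,
    run the experiment (e.g. via an instrumented MVP/MVF), and decide."""
    hypothesis = max(backlog, key=lambda h: h.priority)
    result = design_and_execute(hypothesis)   # design, deploy, collect, analyse

    if result.support >= result.threshold:
        return Decision.PERSEVERE
    if result.support <= 0:
        return Decision.PIVOT
    return Decision.CHANGE_ASSUMPTIONS
```

In a real setting, the learnings produced by such a block would be packaged and stored so that the next block, and the roadmap, can build on them, as indicated in Figure 1.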
4.1.2. The RIGHT infrastructure architecture for Continuous Experimentation

To support conducting such experiments, an infrastructure for continuous experimentation is needed. Figure 3 sketches the RIGHT infrastructure architecture for Continuous Experimentation, with roles and associated tasks, the technical infrastructure, and information artefacts. The roles indicated here will be instantiated in different ways depending on the type of company in question. In a small company, such as a startup, a small number of persons will handle the different roles and one person may have more than one role. In a large company, the roles are handled by multiple teams. Seven roles are defined to handle four classes of tasks. A business analyst and a product owner, or a product management team, together handle the creation and iterative updating of the strategic roadmap. In order to do so, they consult existing experiment plans, results, and learnings, which reside in a back-end system. As plans and results accumulate and are stored, they may be reused in further development of the roadmap. The business analyst and product owner work with a data scientist role, which is usually a team with diverse skills, to communicate the assumptions of the roadmap and map the areas of uncertainty which need to be tested.

Figure 3: The RIGHT infrastructure architecture for Continuous Experimentation (roles – business analyst, product owner, data scientist, software developer, quality assurance, DevOps engineer, and release engineer – mapped to the tasks of creating and iterating the roadmap, designing, executing, and analysing experiments, developing the product, and deploying the product; a technical infrastructure comprising a back-end system with an experiment database, analytics tools, and an API, a front-end system with instrumentation, and continuous integration and continuous delivery systems; and information artefacts consisting of raw data, experiment plans, learnings, and roll-out status).
The data scientist designs, executes, and analyses experiments. A variety of tools are used for this purpose, which access raw data in the back-end system. Conceptually, raw data and experiment plans are retrieved, analysis is performed, and results are produced in the form of learnings, which are stored back into the back-end system.

The data scientist also communicates with a developer and a quality assurance role. These roles handle the development of MVPs, MVFs, and the final product. They first work with the data scientist to produce proper instrumentation in the front-end system, which is the part of the software that is delivered or visible to the user. In the case of a persevere decision, they work to fully develop or optimise the feature and submit it for deployment into production. MVPs, MVFs, and final products are deployed to users after first going through the continuous integration and continuous delivery systems. A DevOps engineer acts as the mediator between the development team and operations, and a release engineer may oversee and manage the releases currently in production. Importantly, the continuous delivery system provides information on software roll-out status, allowing other roles to monitor the experiment execution and, e.g., gain an understanding of the conditions under which the software was deployed to users and of the sample characteristics and response rate of the experiment. Cross-cutting concerns such as User Experience may require additional roles working with several of the roles mentioned here. To simplify the figure, we have omitted the various roles that relate to operations, such as site reliability engineer. We have also omitted a full elaboration of which information artefacts should be visible to which roles. In general, we assume that it is beneficial to visualise the state of the continuous experimentation system for all roles.
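To make the information artefacts named in Figure 3 more tangible, the following sketch lists one possible set of record types for raw data, experiment plans, learnings, and roll-out status. The field names are illustrative assumptions only; an actual organisation would define its own schema.

```python
# Illustrative record types for the information artefacts in Figure 3.
# Field names are assumptions made for this sketch, not prescribed by the model.
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ExperimentPlan:
    experiment_id: str
    hypothesis: str                   # derived from a strategic assumption
    sample_selection: Dict[str, str]  # e.g. {"segment": "new_users", "share": "0.05"}
    variants: List[str]               # variation scheme shipped with the software
    metrics: List[str]                # which instrumented measurements to collect


@dataclass
class RawDataPoint:
    experiment_id: str
    variant: str
    metric: str
    value: float
    timestamp: datetime


@dataclass
class Learnings:
    experiment_id: str
    summary: str                      # analysis outcome fed back to roadmap decisions
    decision: str                     # "persevere", "pivot", or "change assumptions"


@dataclass
class RolloutStatus:
    experiment_id: str
    deployed_to: int                  # how many users or devices received the variant
    responses: int = 0                # basis for sample characteristics and response rate
```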
The back-end system consists of an experiment database which, conceptually, stores raw data collected from the software instrumentation, experiment plans – which include programmatic features of sample selection and other logic needed to conduct the experiment – and experiment results. The back-end system and the database are accessible through an API. Here, these parts should be understood as conceptual; an actual system likely consists of multiple APIs, databases, servers, etc. The experiment database enables a product architecture where deployed software is configured for experiments at run-time. Thus it is not always required that a new version of the software or the accompanying instrumentation is shipped to users prior to an experiment; the experimental capability can be built into the shipped software as a configurable variation scheme. The shipped software fetches configuration parameters for new experiments, reconfigures itself, and sends back the resulting measurement data, eliminating the need to perform the Develop Product and Deploy Product tasks. For larger changes, a new software version may be required, and the full set of tasks performed.
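The run-time configuration mechanism described above could, for example, take the following shape on the client side: the deployed software polls the back-end API for its experiment configuration, reconfigures itself, and reports measurements back. The endpoint paths and payload fields below are hypothetical and only sketch the idea; they are not an actual API of the model.

```python
# Minimal client-side sketch of run-time experiment configuration (assumed API).
# The deployed product fetches its experiment configuration, applies the assigned
# variant, and sends measurement data back without requiring a new release.
import json
import urllib.request

BASE_URL = "https://experiments.example.com/api"   # hypothetical back-end API


def fetch_experiment_config(device_id: str) -> dict:
    """Ask the experiment database which variant (if any) this device should run."""
    with urllib.request.urlopen(f"{BASE_URL}/config?device={device_id}") as response:
        return json.load(response)


def apply_variant(config: dict) -> None:
    """Reconfigure the running software according to the fetched variation scheme."""
    variant = config.get("variant", "control")
    # ... switch feature flags, UI parameters, or behaviour according to `variant` ...
    print(f"Running variant: {variant}")


def report_measurement(experiment_id: str, device_id: str, metric: str, value: float) -> None:
    """Send one instrumented measurement back to the raw-data store."""
    payload = json.dumps({
        "experiment_id": experiment_id,
        "device_id": device_id,
        "metric": metric,
        "value": value,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/measurements", data=payload,
        headers={"Content-Type": "application/json"}, method="POST")
    urllib.request.urlopen(request).close()
```

Under these assumptions, a shipped client would fetch its configuration at start-up or on a schedule, apply the assigned variant immediately, and call the reporting function from its instrumentation points; larger changes would still flow through the continuous integration and delivery systems as described above.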
4.2. Model instantiations and lessons learned

In this subsection, we describe how the RIGHT models were instantiated in the four projects, and we describe the lessons learned. We include illustrative examples from our interview data. We note that the model was initially quite simple, similar to the sequence described in Figure 1, with a build-measure-learn cycle, a data repository, analysis tools, and a continuous delivery system. We also note that not all parts of the models were instantiated in all projects. We assume that this will be the case in other projects as well. In the first two projects, we focused on problem validation: developing an understanding of the needs in real situations that a model for continuous experimentation should address. In the two latter projects, we already had most of the model in place and focused more on validating our solution, using detailed findings from the projects in order to adjust the model.

Each of the four case projects relates to different aspects of continuous experimentation. The case findings support the need for systematic integration of all levels of software product and service development, especially when the context is rapid new product and service development. The key issue is to develop a product that customers will buy, given tight financial constraints. Startup companies operate in volatile markets and under high uncertainty. They may have to make several quick changes as they get feedback from the market. The challenge is to reach product-market fit before running out of money.
617
“You have to be flexible because of money, time and technology constraints.
618
The biggest question for us has been how to best use resources we have to
619
achieve our vision. In a startup, you are time-constrained because you have
620
a very limited amount of money. So you need to use that time and money
621
very carefully.” (Tellybean founder)
622
When making changes in the direction of the company, it is necessary to base decisions on sound evidence rather than guesswork. However, we found that it is typically not the product or service vision that needs to change. The change should rather concern the strategy by which the vision is implemented, including the features that should be implemented, their design, and the technological platform on which the implementation is based. For example, although Tellybean has had to adapt several times, the main vision of the company has not changed.
“The vision has stayed the same: lifelike video calling on your TV. It is very simple; everyone in the company knows it. The TV part doesn’t change, but the business environment is changing. The technology – the hardware and software – is changing all the time.” (Tellybean founder)
“We had to pivot when it comes to technology and prioritising features. But the main offering is still the same: it’s the new home phone and it connects to your TV. That hasn’t changed. I see the pivots more like springboards to the next level. For example, we made a tablet version to [gain a distributor partner].” (Tellybean CTO)
Also, although an experiment design may appear self-evident in hindsight, developing one based on the information available in actual software projects, especially in new product or service development, is not an easy task. There are multiple possibilities for what to experiment on, and it is not obvious how to choose the first experiment or each subsequent experiment after that. Our case projects showed that initiating the continuous experimentation process is a significant task in its own right and involves much learning. This strengthens the notion that a basic and uncomplicated model is needed to guide the process in the right direction.
4.2.1. Project 1
In the first project, the new business analytics instrument allowed Tellybean to gain insights into their system’s usage statistics, providing the company with a means for feedback. They could gain a near-real-time view of call-related activities, yielding business-critical information for deeper analysis. The call data could be used as input for informed decisions, and it also allowed learning about service quality and identifying customer call behaviour patterns. Based on the customer’s comments, such information would be crucial for decision-making regarding the scaling of the platform. Excess capacity could thus be avoided and the system would be more profitable to operate while still maintaining a good service level for end users. The primary reason for wanting to demonstrate such capabilities was the need to satisfy operator requirements. To convince operators to become channel partners, the ability to respond to fluctuations in call volumes was identified as critical. Potential investors would be more inclined to invest in a company that could convince channel operators of the technical viability of the service.
“There were benefits in terms of learning. We were able to show things to investors and other stakeholders. We could show them examples of metric data even if it was just screenshots.” (Tellybean CTO)
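To illustrate the kind of analysis such an instrument performs, the sketch below aggregates call events into an hourly view of call volume, mean duration, and dropped calls. The event format and field names are invented for illustration and are not the data model used in the project.

    from collections import defaultdict
    from datetime import datetime

    # Hypothetical call events as they might arrive from the instrumented service.
    call_events = [
        {"started": "2014-03-01T10:05:00", "duration_s": 320, "status": "completed"},
        {"started": "2014-03-01T10:40:00", "duration_s": 45, "status": "dropped"},
        {"started": "2014-03-01T11:12:00", "duration_s": 610, "status": "completed"},
    ]

    def hourly_call_summary(events):
        # Group call events per hour and compute counts, mean duration, and drops.
        buckets = defaultdict(list)
        for event in events:
            hour = datetime.fromisoformat(event["started"]).strftime("%Y-%m-%d %H:00")
            buckets[hour].append(event)
        summary = {}
        for hour, calls in sorted(buckets.items()):
            durations = [c["duration_s"] for c in calls]
            summary[hour] = {
                "calls": len(calls),
                "mean_duration_s": sum(durations) / len(durations),
                "dropped": sum(1 for c in calls if c["status"] == "dropped"),
            }
        return summary

    print(hourly_call_summary(call_events))

Per-hour summaries of this kind are the sort of near-real-time information that supports the capacity and scaling decisions discussed above.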
The high-level goal of the first project could be considered as defining a business hypothesis to test the business model from the viewpoint of the operators. The project delivered the needed metrics as well as a tool-supported infrastructure to gather the necessary data. These results could be used to set up an experiment to test the business hypothesis.
Table 2 shows the parts of our model that were instantiated in Project 1. The project instantiated a few basic elements of the RIGHT process model. The chosen business model and strategy was to offer the video calling service through operator partnerships. In order for the strategy to be successful, the company needed to demonstrate the feasibility of the service in terms of operator needs and requirements. This demonstration was directed at operators themselves but also at other stakeholders, such as investors, who assessed the business model and strategy. The hypothesis to test was not very precisely defined in the project, but could be summarised as “operators will require system performance management analysis tools in order to enter a partnership”. The experiment, which was obviously not a controlled one but rather conducted as part of investor and operator negotiations, used the analytics instrument developed in the project to assess whether the assumption was correct, thus instantiating an MVF and performing a rudimentary experiment execution and analysis. Based on this information, some decisions were made: to start investigating alternative architectures and product implementation strategies.

Table 2: Model instantiations in Project 1.

Process model instantiation
  Vision: Video calling in the home
  Business model and strategy: Offer video calling through operator partnerships (+ assumptions about architecture and product implementation strategies)
  Hypotheses: “Operators will require performance management analysis tools in order to enter a partnership”
  Design, execute, analyse: Rudimentary
  MVF: Analytics instrument
  Decision making: Start architectural pivot (continued in Project 2); start product implementation strategy pivot (continued in Project 2); validate further assumptions (regarding architecture and product implementation)

Infrastructure model instantiation (only applicable parts)
  Roles: Business analyst, product owner (played by company leadership), software developer (played by Software Factory students)
  Technical infrastructure: Analytics tools (MVF developed in project)
  Information artefacts: Learnings (not formally documented in project)
4.2.2. Project 2
In the second project, Tellybean was able to learn the limitations of the current proof-of-concept system and its architecture. An alternative call mediator server and an alternative architecture for the system were very important for the future development of the service. The lessons learned in the second project, combined with the results of the first, prompted them to pivot heavily regarding the technology, architectural solutions, and development methodology.
“The Software Factory project [...] put us on the path of ‘Lego software development’, building software out of off-the-shelf, pluggable components. It got us thinking about what else we should be doing differently. [...] We were thinking about making our own hardware. We had a lot of risk and high expenses. Now we have moved to existing available hardware. Instead of a client application approach, we are using a web-based platform. This expands the possible reach of our offering. We are also looking at other platforms. For example, Samsung just released a new SDK for Smart TVs.” (Tellybean founder)
“Choosing the right Android-based technology platform has really sped things up a lot. We initially tried to do the whole technology stack from hardware to application. The trick is to find your segment in the technology stack, work there, and source the rest from outside. We have explored several Android-based options, some of which were way too expensive. Now we have started to find ways of doing things that give us the least amount of problems. But one really important thing is that a year ago, there were no Android devices like this. Now there are devices that can do everything we need. So the situation has changed a lot.” (Tellybean CTO)
The high-level goals of the second project could be considered as defining and testing a solution hypothesis that addresses the feasibility of the proposed hardware-software solution. The project delivered an evaluation of the technical solution as well as improvement proposals. The analysis showed that the initial architecture and product implementation strategy were too resource-consuming to carry out fully. The results were used by the company to modify their strategy. Instead of implementing the hardware themselves, they opted for a strategy where they would build on top of generic hardware platforms and thus shorten time-to-market and reduce development costs. Table 3 shows the model instantiations in Project 2.

Table 3: Model instantiations in Project 2.

Process model instantiation
  Vision: Video calling in the home
  Business model and strategy: Offer video calling through operator partnerships (+ assumptions about architecture and product implementation strategies)
  Hypotheses: “Product should be developed as custom hardware-software codesign” and “Architecture should be based on Enterprise Java technology and be independent of TV set (which acts only as display)”
  Design, execute, analyse: Prototype implementation; evaluate current solution proposal
  MVF: Alternative call mediator server; alternative system architecture
  Decision making: Architectural pivot (Android-based COTS hardware and OS); product implementation strategy pivot (do not develop custom hardware)

Infrastructure model instantiation (only applicable parts)
  Roles: Business analyst, product owner (played by company leadership), software developer (played by Software Factory students)
  Technical infrastructure: Analytics tools (from previous project)
  Information artefacts: Learnings (not formally documented in project)
4.2.3. Project 3
In the third project, the capability for continuous deployment was developed. The STBs could be updated remotely, allowing new features to be pushed to customers at very low cost and with little effort. The implications of this capability are that the company is able to react to changes in their technological solution space by updating operating system and application software, and to emerging customer needs by deploying new features and testing feature variants continuously.
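As a sketch of what the remote update capability could look like on the client side, the loop below periodically asks a delivery server whether a newer software version is available and records it as the running version. The URL, version scheme, and installation step are hypothetical; the deployment pipeline built in the project is not reproduced here.

    import json
    import time
    import urllib.request

    UPDATE_URL = "https://updates.example.com/stb/latest"  # hypothetical delivery endpoint
    CHECK_INTERVAL_S = 3600  # check once per hour

    def check_for_update(current_version):
        # Return the update descriptor if a newer version is available, else None.
        with urllib.request.urlopen(UPDATE_URL) as response:
            latest = json.load(response)  # e.g. {"version": "1.5.0", "package": "https://..."}
        return latest if latest["version"] != current_version else None

    def run_update_loop(current_version):
        while True:
            update = check_for_update(current_version)
            if update:
                # install_package(update["package"])  # installation is device-specific
                current_version = update["version"]
            time.sleep(CHECK_INTERVAL_S)

With such a loop in place on the STB, pushing a new feature to customers is reduced to publishing a new package on the delivery server.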
The high-level goals of the third project could be considered as developing a capability that allows for automating the continuous deployment process. The prerequisite for this is a steady and controlled pace of development where the focus is on managing the number of work items that are open concurrently in order to limit complexity. At Tellybean, this is known as the concept of one-piece flow.
“The one-piece flow means productisation. In development, it means you finish one thing before moving on to the next. It’s a bit of a luxury in development, but since we have a small team, it’s possible. On the business side, the most important thing has been to use visual aids for business development and for prioritising. In the future we might try to manage multiple-piece flows.” (Tellybean founder)
The third project instantiated parts of our infrastructure architecture model, shown in Table 4. In particular, it focused on the role of a continuous delivery system in relation to the tasks that need to be carried out for continuous experimentation, meaning that the top and rightmost parts of Figure 3 were instantiated, as detailed in the table.

Table 4: Model instantiations in Project 3.

Process model instantiation
  Vision: Video calling in the home
  Business model and strategy: Offer video calling through operator partnerships (+ assumptions about architecture and product implementation strategies)
  Hypotheses: “Capability for automatic continuous deployment is needed for incremental product development and delivery”
  Design, execute, analyse: Project focused on instantiating parts of the infrastructure architecture model and did not include a product experiment
  MVF: Prototype for rapid deployment of software updates
  Decision making: Persevere

Infrastructure model instantiation (only applicable parts)
  Roles: Business analyst, product owner (played by company leadership), software developer (played by Software Factory students), DevOps engineer, release engineer (played by company CTO and other technical representatives; also represented by user stories with tasks for these roles)
  Technical infrastructure: Continuous integration system, continuous delivery system (MVF developed in project)
  Information artefacts: Roll-out status
4.2.4. Project 4
In the fourth project, it was initially difficult to identify what the customers considered to be the main assumptions. However, once the main assumptions became clear, it was possible to focus on validating them. This highlights the finding that although it is straightforward in theory to assume that hypotheses should be derived from the business model and strategy, it may not be straightforward in practice. In new product and service development, the business model and strategy are not finished, and, especially in the early cycles of experimentation, it may be necessary to try several alternatives and spend effort on modelling assumptions until a good set of hypotheses is obtained. We therefore found it useful to separate the identification and prioritisation of hypotheses on the strategy level from the detailed formulation of hypotheses and experiment design on the experiment level. Table 5 shows the instantiated model parts in Project 4. We note that some of these parts were introduced into the model because of our findings from Project 4.
In this project, there were two assumptions: that interaction with the photo map would retain users, and that an automated process of guiding users towards goals was feasible. The assumption that continued use of the application would come from interacting with the photo map was shown to be incorrect. Users would initially create the map, but would not spend much time interacting with it – by, e.g., adding or changing photos, rearranging the map, adding photo annotations, and so on. Instead, users reported a desire for connecting with other users to share maps and discuss life goals. Also, they expressed a willingness to connect with professional or semi-professional coaches to get help with implementing their life goals. The social aspect of the service had been overlooked. Whether this was due to familiarity with existing social media applications was left uninvestigated. In any case, the assumption was invalidated, and as a result, the assumptions regarding automated features for guiding users towards goals were also invalidated. The investigation indicated that users were motivated by the potential for interaction with other users, and that these interactions should include the process of motivating them to reach goals. It is important to note that both hypotheses could be invalidated because they were dependent. Identifying and prioritising hypotheses separately from the detailed formulation of hypotheses and experiment design makes it possible to choose the order of experiments in a way that gains the maximum amount of information with the minimum number of experiments. Testing the most fundamental assumptions first – the ones on which most other assumptions rely – makes it possible to eliminate other assumptions with no additional effort.
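The following sketch illustrates this prioritisation logic: assumptions are kept in a small dependency graph, ordered so that the most fundamental ones are tested first, and invalidating an assumption automatically invalidates everything that depends on it. The two assumptions are paraphrased from Project 4; the data structure and ordering heuristic are our own illustration rather than a prescribed part of the model.

    # Each assumption lists the assumptions it depends on.
    assumptions = {
        "photo_map_retains_users": {"depends_on": []},
        "automated_guidance_is_feasible": {"depends_on": ["photo_map_retains_users"]},
    }

    def test_order(graph):
        # Order assumptions so that the ones others rely on are tested first.
        def depth(name):
            deps = graph[name]["depends_on"]
            return 0 if not deps else 1 + max(depth(d) for d in deps)
        return sorted(graph, key=depth)

    def invalidate(graph, name, invalid=None):
        # Mark an assumption and everything that depends on it as invalidated.
        invalid = invalid if invalid is not None else set()
        invalid.add(name)
        for other, data in graph.items():
            if name in data["depends_on"] and other not in invalid:
                invalidate(graph, other, invalid)
        return invalid

    print(test_order(assumptions))
    # A single negative result on the fundamental assumption settles its dependants:
    print(invalidate(assumptions, "photo_map_retains_users"))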
The fourth project also revealed challenges involved with instrumenting the application for data collection. It was difficult to separate the process of continuous experimentation from the technical prerequisites for instrumentation. In many cases, substantial investments into technical infrastructure may be needed before experiments can be carried out. These findings led to the roles, the high-level description of the technical infrastructure, and the information artefacts in the infrastructure architecture (see Figure 3).
Table 5: Model instantiations in Project 4.

Process model instantiation
  Vision: Well-being service for defining, tracking, and receiving assistance with life goals
  Business model and strategy: Product and service recommendations, automated recommendation engine for motivating progress towards goals
  Hypotheses: “Motivation for continued use comes from interacting with photo map” and “Automatic recommendation engine will automatically guide users to reach goals” (depends on first hypothesis)
  Design, execute, analyse: User tests with observation and interviews
  MVF: HTML5-based, tablet-optimised application
  Decision making: Product implementation strategy pivot (focus on social interaction rather than automated recommendations)

Infrastructure model instantiation (only applicable parts)
  Roles: Business analyst, product owner (played by company leadership), software developer (played by Software Factory students)
  Technical infrastructure: Instrumentation, front-end system
  Information artefacts: Learnings
However, many experiments are also possible without advanced instrumentation. The fourth project indicates that experiments may typically be large, or target high-level questions, in the beginning of the product or service development cycle. They may address questions and assumptions which are central to the whole product or service concept. Later stages of experimentation may address more detailed aspects, and may be considered optimisation of an existing product or service.
5. Discussion
The continuous experimentation model developed in the previous section can be seen as a general description. Many variations are possible. For instance, experiments may be deployed to selected customers in a special test environment, and several experiments may be run in parallel. A special test environment may be needed particularly in business-to-business markets, where the implications of feature changes are broad and there may be reluctance towards having new features at all. The length of the test cycle may thus have to be longer in business-to-business markets. Direct deployment could be more suitable for consumer markets, but we note that the attitude towards continuous experimentation is likely to change as both business and consumer customers become accustomed to it.
Each project could have instantiated the RIGHT models in different ways. In the first project, the experiment could have been carried out using mockup screens to validate what metric data, visualisation, and analysis tools would have been sufficient to convince the stakeholders. However, this would have been detrimental since it would not have revealed the shortcomings in the initial architecture and implementation strategy. Although the design of the experiment left much to be desired, carrying it out using a real, programmed prototype system made it possible to discover the need to reconsider some of the previous strategy choices.
In the second project, the learnings could have been better used to define a more precise set of hypotheses after a careful analysis of the shortcomings of the previous system architecture. However, this was not necessary since the purpose was not a point-by-point comparison but rather an either-or comparison between one general approach and another. This highlights an important notion regarding continuous experimentation: it only seeks to produce enough information for a decision to be made correctly.
In the third project, only the capability for continuous delivery was instantiated. The project could also have addressed the components that are necessary to carry out actual experiments. Due to project time constraints, this was left uninvestigated in the third project, but was considered in the fourth project instead. In that project, one cycle of the full RIGHT process model was carried out, and the software was instrumented for experimentation, albeit using ready-made services such as Google Analytics.
While our ultimate aim is for our models to cover the entire breadth of continuous experimentation, we assume that not all real-life projects will need to instantiate all parts. For instance, experiments can be conducted without an MVP, especially in an early stage of product development. It may also not be necessary in all cases to have a heavy infrastructure for experimentation; such infrastructure becomes relevant if experimentation is conducted in very large volumes or when the purpose is to maintain a set of experiments that are run continuously to collect trend information while the product is incrementally changed.
In addition to the project-specific observations, we consider some more general concerns. Having several experiments run in parallel presents a particular challenge. The difficulty of interpreting online experiments has been convincingly demonstrated by Kohavi et al. [16]. Statistical interactions between experiments should be considered in order to assess the trustworthiness of the experiments. For this reason, it is important to coordinate the design and execution of experiments so that correct inferences are drawn. More generally, the issue of validity becomes important when the entire R&D organisation is experiment-driven. Incorrectly designed or implemented experiments may lead to critical errors in decision-making. Threats to validity can also stem from a failure to consider ethical aspects of experiments. Not only may unethical experiments damage company reputation, but they may cause respondents to knowingly or unconsciously bias the experimental results, leading to errors in decision-making.
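One common way to keep parallel experiments from interfering with each other at the assignment stage is to place them in separate layers and assign users to variants with an independent hash per layer, in the spirit of the overlapping experiment infrastructure described by Tang et al. [29]. The sketch below shows only the assignment step; it does not address the statistical analysis of interaction effects, and the experiment names and splits are illustrative.

    import hashlib

    def variant(user_id, experiment, splits):
        # Deterministically assign a user to a variant for one experiment (layer).
        # Hashing the user id together with the experiment name makes assignments
        # independent across experiments, so one experiment's split does not
        # systematically align with another's.
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
        bucket = (int(digest, 16) % 1000) / 1000.0  # uniform value in [0, 1)
        cumulative = 0.0
        for name, share in splits:
            cumulative += share
            if bucket < cumulative:
                return name
        return splits[-1][0]

    # Two experiments running in parallel, each with its own 50/50 split.
    print(variant("user-42", "call-screen-layout", [("A", 0.5), ("B", 0.5)]))
    print(variant("user-42", "ring-tone", [("A", 0.5), ("B", 0.5)]))

Independent assignment does not remove interaction effects in the metrics, but it keeps them estimable instead of being confounded by correlated group membership.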
Other challenges include the difficulty of prioritising where to start, that is, which assumption should be tested first. In Project 4, we identified a dependency between assumptions regarding the back-end recommendation logic and the assumption of what motivates users to keep using the application. By invalidating the latter, we automatically invalidated the former. This highlights the importance of identifying critical assumptions, as testing them first may save several unneeded experiments. We see a need for further research in this area. Also, in hardware-software co-design, illustrated by the first three projects, setting up the experimental cycle quickly is a major challenge due to both the longer release cycle of hardware and the potential synchronisation problems between hardware and software development schedules. Based on the findings presented in this paper, it may be beneficial to test a few strategic technical assumptions first, such as the viability of a certain hardware-software platform. As our case demonstrates, choosing the correct platform early can have a significant impact on the ability to proceed to actual service development.
A further set of challenges has to do with the model of sales and supplier networks. Essentially all companies are dependent on a network of suppliers and sales channels. It may be necessary to extend the model presented here to take into account the capabilities particularly of hardware suppliers to supply the needed components in a timely fashion and with the flexibility needed to programmatically vary behavioural parameters in these components. Also, when the company is not selling its products directly to end users, several levels of intermediaries may interfere with the possibilities to collect data directly from field use. If a sales partner cannot grant access to end users, other means of reaching the audience are needed. We envision using early-access and beta-test programs for this purpose, a practice that is commonly used in the computer gaming industry. Other models are possible, and there is an opening for further research in this area.
In some cases, an experimental approach may not be suitable at all. For example, certain kinds of life-critical software, or software used in environments where experimentation is prohibitively expensive, may preclude the use of experiments as a method of validation. However, it is not clear how to determine the suitability of an experimental approach in specific situations, and research on this topic could yield valuable guidelines on when to apply the model presented here.
Another question is whether continuous delivery is a strictly necessary precondition for continuous experimentation. In the beginning of the product development cycle, experimentation must occur before much software is written at all. At that stage, continuous delivery may not be necessary. Also, not all experiments require new software to be delivered to users. While a continuous delivery system may exist, the software itself may be architected for variability so that it can reconfigure itself at run-time. In such cases, no new version of the software needs to be delivered for new experiments to run. However, not all experiments are possible even with a very flexible architecture that allows for run-time reconfiguration. Continuous delivery is a good vehicle for delivering experiments to users and for ensuring quality in the development process. The model presented here is based on iterative, evolutive optimisation of product features and an incremental model of innovation. To carry out revolutionary innovation, the process needs to be extended with other means of discovering customer value. These may profoundly invalidate the business model or strategy, and may even have an impact on the overall vision.
Finally, experimentation may be conducted with several kinds of stakeholders. Apart from customers and end users, experiments could be directed towards investors, suppliers, sales channels, or distributors. Companies whose product is itself a development platform may want to conduct experiments with developers in their platform ecosystem to optimise the developer experience [9] of their tools, methods, and processes. These experiments may require other kinds of experimental artefacts than the MVP/MVF, including, e.g., processes, APIs, and documentation. Research on the types of experimental artefacts and associated experimental designs could lead to fruitful results for such application areas. Also, an open question is who should primarily lead or conduct the experimentation, especially when the development organisation is separate from the customer organisation. Some training may be needed for customers in order to ensure that they can interact with the continuous experimentation process running in the development organisation. Similarly, the development team may need additional training to be able to interact with the customer to derive assumptions, plan experiments, and report results for subsequent decision-making. Another possibility is to introduce a mediating role which connects the customer and development organisations. More generally, increasing the capability to perform experimentation and continuous software engineering requires consideration of human factors in software development teams [23]. Further research is needed to determine how the experimental process works across organisational borders, whether within or outside a single company.
A particular limitation of this study is the use of relatively short projects with student participants. Students carried out the technical software development and analysis tasks in the projects, while the researchers handled tasks related to identification of assumptions, generation of hypotheses, and higher-level planning together with customer representatives. While it is reasonable to expect that professional software developers would have reached a different level of quality and rigour in the technical tasks, we consider it likely that the findings are applicable beyond student projects, since the focus of this paper is not on the technical implementation but on the integration of experiment results in the product development cycle and the software development process. The length of the projects means that at most one experimental cycle could be carried out in a single project. Thus, the first case company completed three experimental cycles and the second case company one. In a real setting, multiple experimentation rounds would be carried out over an extended period of time, proceeding from experiments addressing the most important assumptions with the highest impact towards increasing detail and optimisation. The findings of this study should therefore be considered to apply mostly to the early stages of experimentation.
6. Conclusions
Companies are increasingly transitioning their traditional research and product development functions towards continuous experiment systems [12]. Integrating field experiments with product development on business and technical levels is an emerging challenge. There are reports of many companies successfully conducting online experiments, but there is a lack of a systematic framework for describing how such experiments should be carried out and how their results should be used in product development. Empirical studies on continuous experimentation in software product development are thus fruitful ground for further research. Software companies would benefit from clear guidelines on when and how to apply continuous experimentation in the design and development of software-intensive products and services.
In this paper, we match a model for Continuous Experimentation based on analysis of previous research against a multiple case study in the Software Factory laboratory at the University of Helsinki. The model describes the experimentation process, in which assumptions for product and business development are derived from the business strategy, systematically tested, and the results used to inform further development of the strategy and product. The infrastructure architecture for supporting the model takes into account the roles, tasks, technical infrastructure, and information artefacts needed to run large-scale continuous experiments.
A system for continuous experimentation requires the ability to release minimum viable products or features with suitable instrumentation, design and manage experiment plans, link experiment results with a product roadmap, and manage a flexible business strategy. There are several critical success factors for such a system. The organisation must be able to properly and rapidly design experiments, perform advanced instrumentation of software to collect, analyse, and store relevant data, and integrate experiment results in both the product development cycle and the software development process. Feedback loops must exist through which relevant information is fed back from experiments into several parts of the organisation. A proper understanding of what to test and why must exist, and the organisation needs a workforce with the ability to collect and analyse qualitative and quantitative data. Also, it is crucial that the organisation has the ability to properly define decision criteria and act on data-driven decisions.
In future work, we expect the model to be expanded as more use cases arise in the field. Domain-specific variants of the model may also be needed. Furthermore, there are many particular questions with regard to the individual parts of the model. Some specific areas include (i) how to prioritise assumptions and select which assumptions to test first; (ii) how to assess validity and determine how far experimental results can be trusted – especially how to ensure that experiments are trustworthy when running potentially thousands of them in parallel; (iii) how to select proper experimental methods for different levels of product or service maturity; and (iv) how to build a back-end system for continuous experimentation that can scale to the needs of very large deployments, and can facilitate and even partially automate the creation of experimental plans. Particular questions regarding automation include which parts of the model could be automated or supported through automation. Another question is how quickly a Build-Measure-Learn block can be executed, and what the performance impact of the model is on the software development process.
Acknowledgements
This work was supported by Tekes – the Finnish Funding Agency for Technology and Innovation, as part of the N4S Program of DIGILE (Finnish Strategic Centre for Science, Technology and Innovation in the field of ICT and digital business).
References
[1] Rob J. Adams, Bradee Evans, and Joel Brandt, Creating Small Products at a Big Company: Adobe’s Pipeline Innovation Process, CHI’13 Extended Abstracts on Human Factors in Computing Systems, 2013, pp. 2331–2332.

[2] V. Basili, J. Heidrich, M. Lindvall, J. Münch, M. Regardie, D. Rombach, C. Seaman, and A. Trendowicz, GQM+Strategies: A comprehensive methodology for aligning business strategies with software measurement, Proceedings of the DASMA Software Metric Congress (MetriKon 2007): Magdeburger Schriften zum Empirischen Software Engineering, 2007, pp. 253–266.

[3] V. Basili, R. Selby, and D. Hutchens, Experimentation in Software Engineering, IEEE Transactions on Software Engineering 12 (1986), no. 7, 733–743.

[4] Steve Blank, The Four Steps to the Epiphany: Successful Strategies for Products that Win, 2nd ed., K&S Ranch, 2013.

[5] Jan Bosch, Building Products as Innovation Experiment Systems, Software Business, 2012, pp. 27–39.

[6] Jan Bosch, Helena Holmström Olsson, Jens Björk, and Jens Ljungblad, The Early Stage Software Startup Development Model: A Framework for Operationalizing Lean Principles in Software Startups, Lean Enterprise Software and Systems, 2013, pp. 1–15.

[7] U. Eklund and J. Bosch, Architecture for Large-Scale Innovation Experiment Systems, Joint Working IEEE/IFIP Conference on Software Architecture (WICSA) and European Conference on Software Architecture (ECSA), 2012, pp. 244–248.

[8] F. Fagerholm, N. Oza, and J. Münch, A platform for teaching applied distributed software development: The ongoing journey of the Helsinki software factory, 3rd International Workshop on Collaborative Teaching of Globally Distributed Software Development (CTGDSD), 2013, pp. 1–5.

[9] Fabian Fagerholm and Jürgen Münch, Developer Experience: Concept and Definition, International Conference on Software and System Process, 2012, pp. 73–77.

[10] Rebecca M. Henderson and Kim B. Clark, Architectural Innovation: The Reconfiguration of Existing Product Technologies and the Failure of Established Firms, Administrative Science Quarterly 35 (1990), no. 1, 9–30.

[11] Alan R. Hevner, Salvatore T. March, Jinsoo Park, and Sudha Ram, Design Science in Information Systems Research, MIS Quarterly 28 (2004), no. 1, 75–105.

[12] Helena Holmström Olsson, Hiva Alahyari, and Jan Bosch, Climbing the “Stairway to Heaven” – A Multiple-Case Study Exploring Barriers in the Transition from Agile Development towards Continuous Deployment of Software, 39th EUROMICRO Conference on Software Engineering and Advanced Applications (2012), 392–399.

[13] Helena Holmström Olsson and Jan Bosch, The HYPEX Model: From Opinions to Data-Driven Software Development, Continuous Software Engineering, 2014, pp. 155–164.

[14] Natalia Juristo and Ana M. Moreno, Basics of Software Engineering Experimentation, Springer, 2001.

[15] Ron Kohavi, Thomas Crook, and Roger Longbotham, Online Experimentation at Microsoft, Third Workshop on Data Mining Case Studies and Practice Prize, 2009.

[16] Ron Kohavi, Alex Deng, Brian Frasca, Roger Longbotham, Toby Walker, and Ya Xu, Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 786–794.

[17] Ron Kohavi, Alex Deng, Brian Frasca, Toby Walker, Ya Xu, and Ni Pohlmann, Online Controlled Experiments at Large Scale, Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13), 2013, pp. 1168–1176.

[18] Beverly May, Applying Lean Startup: An Experience Report – Lean & Lean UX by a UX Veteran: Lessons Learned in Creating & Launching a Complex Consumer App, Agile Conference (AGILE) 2012, 2012, pp. 141–147.

[19] Jürgen Münch, Fabian Fagerholm, Patrik Johnson, Janne Pirttilahti, Juha Torkkel, and Janne Järvinen, Creating Minimum Viable Products in Industry-Academia Collaborations, Proceedings of the Lean Enterprise Software and Systems Conference (LESS 2013, Galway, Ireland, December 1-4), 2013, pp. 137–151.

[20] Jürgen Münch, Fabian Fagerholm, Petri Kettunen, Max Pagels, and Jari Partanen, Experiences and Insights from Applying GQM+Strategies in a Systems Product Development Organisation, Proceedings of the 39th EUROMICRO Conference on Software Engineering and Advanced Applications (SEAA 2013), 2013.

[21] Jim Nieters and Amit Pande, Rapid Design Labs: A Tool to Turbocharge Design-led Innovation, Interactions (2012), 72–77.

[22] Taiichi Ōno, Toyota Production System: Beyond Large-Scale Production, Productivity Press, 1988.

[23] Efi Papatheocharous, Marios Belk, Jaana Nyfjord, Panagiotis Germanakos, and George Samaras, Personalised Continuous Software Engineering, Proceedings of the 1st International Workshop on Rapid Continuous Software Engineering, 2014, pp. 57–62.

[24] Mary Poppendieck, Lean Software Development: An Agile Toolkit, Addison-Wesley Professional, 2003.

[25] Mary Poppendieck and Michael A. Cusumano, Lean Software Development: A Tutorial, IEEE Software (2012), 26–32.

[26] Eric Ries, The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation To Create Radically Successful Businesses, Crown Business, 2011.

[27] Daniel Ståhl and Jan Bosch, Modeling Continuous Integration Practice Differences in Industry Software Development, Journal of Systems and Software (2014), 48–59.

[28] Annika Steiber and Sverker Alänge, A Corporate System for Continuous Innovation: The Case of Google Inc., European Journal of Innovation Management (2013), 243–264.

[29] Diane Tang, Ashish Agarwal, Deirdre O’Brien, and Mike Meyer, Overlapping Experiment Infrastructure: More, Better, Faster Experimentation, Proceedings of the 16th Conference on Knowledge Discovery and Data Mining, 2010, pp. 17–26.

[30] Joan E. van Aken, Management Research Based on the Paradigm of the Design Sciences: The Quest for Field-Tested and Grounded Technological Rules, Journal of Management Studies 41 (2004), no. 2, 219–246.

[31] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén, Experimentation in Software Engineering, Springer, 2012.

[32] Robert Yin, Case Study Research: Design and Methods, 4th ed., SAGE Publications, Inc., 2009.