Information and Software Technology: Banu Aysolmaz, Henrik Leopold, Hajo A. Reijers, Onur Demirörs


Information and Software Technology 93 (2018) 14–29


A semi-automated approach for generating natural language requirements documents based on business process models
Banu Aysolmaz a,b,∗, Henrik Leopold a, Hajo A. Reijers a, Onur Demirörs c,d
a Vrije Universiteit Amsterdam, Department of Computer Science, De Boelelaan 1105, 1081HV Amsterdam, The Netherlands
b Maastricht University, School of Business and Economics, PO Box 616, 6200 MD, Maastricht, The Netherlands
c Izmir Institute of Technology, Department of Computer Engineering, 35430, Urla, Turkey
d University of New South Wales, School of Computer Science and Engineering, Barker St, Kensington NSW 2052, Australia

Article info

Article history:
Received 9 March 2017
Revised 6 July 2017
Accepted 18 August 2017
Available online 30 August 2017

Keywords:
Requirements elicitation
Business process model
Natural language generation

Abstract

Context: The analysis of requirements for business-related software systems is often supported by using business process models. However, the final requirements are typically still specified in natural language. This means that the knowledge captured in process models must be consistently transferred to the specified requirements. Possible inconsistencies between process models and requirements represent a serious threat for the successful development of the software system and may require the repetition of process analysis activities.
Objective: The objective of this paper is to address the problem of inconsistency between process models and natural language requirements in the context of software development.
Method: We define a semi-automated approach that consists of a process model-based procedure for capturing execution-related data in requirements models and an algorithm that takes these models as input for generating natural language requirements. We evaluated our approach in the context of a multiple case study with three organizations and a total of 13 software development projects.
Results: We found that our approach can successfully generate well-readable requirements, which do not only positively contribute to consistency, but also to the completeness and maintainability of requirements. The practical use of our approach to identify a suitable subcontractor on the market in 11 of the 13 projects further highlights the practical value of our approach.
Conclusion: Our approach provides a structured way to obtain high-quality requirements documents from process models and to maintain textual and visual representations of requirements in a consistent way.
© 2017 Elsevier B.V. All rights reserved.

∗ Corresponding author.
E-mail addresses: [email protected], [email protected] (B. Aysolmaz), [email protected] (H. Leopold), [email protected] (H.A. Reijers), [email protected] (O. Demirörs).
http://dx.doi.org/10.1016/j.infsof.2017.08.009
0950-5849/© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Business process modeling is an established method for documenting, analyzing, and improving organizational operations. What is more, it has become a widely accepted practice in software engineering [1–3]. In particular, for analyzing requirements of business-related software systems, business process modeling has proven to be an effective means [4]. Process models do not only provide an overview of the operations that must be supported by the to-be developed software systems, but also show how these operations are related to the different organizational roles and systems.
Despite this prominent role of business process modeling for requirements analysis, the actual specification of requirements is commonly conducted using natural language [5–8]. This means that the knowledge captured in process models must be consistently transferred to natural language requirements. On the one hand, this is a complex and time-consuming task [9,10]. On the other hand, updates at later stages in either the textual or the model-based requirements come with the risk of inconsistencies [11–13]. Such inconsistencies between the process model and the resulting requirements represent a serious threat for the successful development of the respective software system throughout the Software Development Lifecycle (SDLC). More specifically, they may result in a system that does not fully reflect the functionality defined in the process models.
To address this problem, we propose a semi-automated approach whose final output is a set of generated requirements documents
that integrate process model and execution-related data in an understandable fashion. As a result, organizations can systematically transfer the knowledge captured in their process models to other SDLC activities and create consistent and maintainable artifacts.
Our proposed approach consists of three main steps. In the first step, we analyze the process models that are relevant for the system to be developed and identify the set of automatable activities. In the second step, we capture execution-related data, such as responsibilities, application systems, data needs, and additional constraints in a requirements model. In the third step, we automatically generate requirements documents from the created models via a template-based natural language generation algorithm. The consistency between the processes and the requirements is by definition guaranteed by the generation feature of the approach. To evaluate the impact of our approach on other key characteristics of high-quality requirements (readability, completeness, and maintainability), we conducted a multiple case study that involved 3 different organizations and a total of 13 software development projects. We found that the requirements documents generated by our approach were considered to be well-readable, almost perfectly complete, and beneficial for improving consistency as well as maintainability. Meeting these key requirement characteristics was found to be essential to enhance the usability of the requirements by domain experts, analysts, project managers, and software developers. In 11 of the projects, the generated artifacts were used for identifying a suitable subcontractor on the market for developing the respective systems, which confirmed the usability of the approach in practical settings.
The remainder of this paper is structured as follows. In Section 2, we elaborate on the background of our research and identify the research gap that we will address. In Section 3, we introduce our semi-automatic approach for generating requirements documents. In Section 4, we present and discuss the findings of our multiple case study. In Section 5, we elaborate on the steps required for adapting the presented approach to languages other than English. In Section 6, we discuss the implications of our work before concluding the paper in Section 7.

2. Background

In this section, we discuss the background of our paper. In Section 2.1, we first clarify the relevance and the value of generating natural language requirements. In Section 2.2, we then elaborate on the use of process models in requirements engineering. We close the section by pointing out what is still missing to define an approach for automatically generating high quality requirements from process models.

2.1. The value of requirements generation

While many would argue that models are the preferred means to foster communication, others favor requirements in textual format. At its heart, the question about the value of generating natural language requirements relates to the debate whether textual or visual representations are superior in terms of communication effectiveness. Interestingly, this debate is neither new nor limited to the field of requirements engineering. The first studies addressing this controversy date back to the seventies. At this time, psychologists empirically compared the expressive power of natural language texts with matrices, spatial maps, and tree representations [14–17]. Later, many studies from the field of computer science contributed to the debate. Among others, authors compared the comprehension performance of code-based representations and flow diagrams [18–20]. The conclusions of these and other works remain, however, contradictory. Some argue in favor of text-based representations, others in favor of visual ones.
A satisfying explanation for these opposing views is provided by the Cognitive Theory of Multimedia Learning (CTML) [21], which has been developed through more than a decade of empirical research. Among others, it discusses the concept of learning preference, which suggests that both textual and visual representations should be presented at the same time. The rationale behind this concept is that people with different backgrounds may simply have different preferences and cognitive abilities. By providing both representations, they are provided with a choice.
Transferred to the field of requirements engineering, the CTML suggests that both models and natural language requirements should be used for capturing and discussing requirements. In fact, this view is supported by many researchers. For instance, Weber and Weisbrod discuss the importance of natural language requirements for communication, but also highlight that the sole use of natural language is hardly feasible for complex projects [22]. They propose the additional use of so-called requirements management information models (RMIs). In a similar way, Schatz et al. [23] and Davis [24] propose to combine text-based and model-based requirements. Nicolás and Toval even explicitly discuss the value of generation in this context [25]. They argue that generation reduces the effort and, at the same time, increases the quality and traceability of the requirements.
Recognizing the potential of automatically generating natural language requirements, we define a respective approach for process models in this paper. To highlight what is specifically missing to define such an approach, the next section reviews related work on process models in the context of requirements engineering.

2.2. Process models and requirements engineering

Many authors have emphasized the important role of process models in the context of specifying requirements of software systems [26–28]. Some authors even go so far as considering their use as mandatory [1,3]. However, the specific role of process models differs considerably among available approaches. Table 1 gives an overview of the most relevant works using process models in the context of requirements engineering. As Table 1 illustrates, we differentiate between works that use process models in a manual and in an automated way.
The related work that discusses the manual use of process models in the context of requirements engineering can be further categorized into works that elicit textual and works that elicit model-based requirements from process models.
The main insight of the works from the first subcategory that elicit textual requirements from process models is that process models represent an effective way of steering the activity of requirements elicitation and enhance the completeness, correctness, and traceability of the final requirement statements [4]. Cardoso et al. analyze the level of automation for each activity in the process models and then define a set of textual requirements for the activities to be automated [4]. In a similar manner, Ma and Jiang define a set of textual requirements for each activity of a process [7]. Mayr et al. discuss that detailed notions for requirements should be specified based on process models and they also map requirements in sentence form to the process models [28]. Li et al. propose a method to link textual requirements to activities in the process model [8]. Such links help to identify dependencies between requirements consecutively being used for discovering missing and ambiguous text-based requirements. Demirörs et al. analyze and define not only functional requirements, but also non-functional, security, and hardware requirements based on process models [29]. Lastly, Monsalve et al. elaborate on the usage of process modeling notations for eliciting and expressing user requirements on a strategic level. They find Qualigram more helpful in this respect than BPMN [30]. What all these works have in common is
Table 1
Work combining process models and requirements engineering.

Approach | Authors

Manual use of process models
  Elicitation of textual requirements
    Requirements engineering based on business process models | Cardoso et al. [4]
    Business process modeling and requirements modeling | Mayr et al. [28]
    Process-oriented information system requirements engineering | Ma and Jiang [7]
    A business process-driven approach for requirements dependency analysis | Li et al. [8]
    Utilizing business process models for requirements elicitation | Demirörs et al. [29]
    Requirements elicitation using BPM notations | Monsalve et al. [30]
  Elicitation of model-based requirements
    A goal-based approach on business process-driven requirements engineering | González and Díaz [31]
    Deriving requirements from process models via the problem frames approach | Cox et al. [32]
Automated use of process models
    Transformation of business process models to business rules | Malik and Bajwa [33]
    Supporting process model validation through natural language generation | Leopold et al. [9]
    Generating functional requirements from process models | Türetken et al. [34]
    Bridging the gap between business process modeling and software requirements analysis | Coşkunçay et al. [35]

that they exemplify how process models can support requirements elicitation. What is more, they show that process models are also useful for identifying gaps and problems, thus for validating requirements with end users.
The second subcategory of works that elicit model-based requirements from process models illustrates that process models are also useful for deriving model-based requirements. For instance, González and Díaz suggest building a goal model using the activities from process models [31]. They subsequently use the goal model to establish the use cases and their relations. However, the specification remains on the use case diagram level and the usage of the suggested role and resource models in the context of the requirements definition is left open. Cox et al. discuss that the framing of real-world problems for capturing and classifying software development problems is a difficult task in reality [32]. They define a set of steps to manually develop problem frame diagrams together with textual requirements using role activity diagrams. Rather than being an elicitation and validation tool between domain experts and modelers, the problem frames approach enables the formal analysis of requirements for verification. What both approaches have in common is that they enhance the representational capabilities of process models for requirements elicitation. However, they do not consider automated support.
Related work on the automated use of process models in the context of requirements engineering considers process models as the final requirements artifact and focuses on the benefits of verbalizing the models in the requirements elicitation and validation phases. For instance, Leopold et al. analyze the activity labels and the control flow of process models to automatically generate corresponding natural language descriptions of the models [9]. Malik and Bajwa provide a sentence generation algorithm for requirements using a template-based approach [33]. Though their approach does not include clear text structuring techniques, the consideration of the message flow between parties is an important feature to reveal requirements on system interactions. Türetken et al. include a broader set of process elements in the generated sentences, including roles, input and output data, events, and systems [34]. Consideration of such elements is important to be able to express requirements that concern other aspects than control flow. However, they rely on a certain process structure, do not consider all execution-related aspects, and only generate rudimentary sentences. The work of Coşkunçay et al. specifies the need for analyzing additional data for process automation in a separate set of models, though it lacks a description of the requirements analysis approach and a formal generation technique [35]. The studies in this group commonly express the need for the automation of natural language requirement specification based on process models.
This literature review showed that process models play an important role in the context of analyzing and representing system requirements. What is more, it showed that first approaches considering the automated generation of requirements based on process models have already been introduced. What is still missing is an approach that integrates the complete set of execution-related data and provides the user with consistent, well-readable, and also well-maintainable requirements. Recognizing this gap, we use this paper to propose a semi-automated approach that automatically generates textual requirements documents based on process models and execution-related data. We will show that our approach provides a structured way to obtain consistent requirements that are well readable, complete, and easy to maintain.

3. Conceptual approach

In this section, we introduce our approach for the semi-automated generation of requirements documents based on process models. As illustrated by Fig. 1, the approach consists of two main phases: a preparation phase and a generation phase. In the preparation phase, we first analyze the input process model(s) and identify automatable activities. Then, we analyze the requirements for the automated execution of these activities and create a requirements model for each of them. In the generation phase, these requirements models are used as the input for the automated generation of the requirements documents. In the following subsections, we introduce the details of each phase and illustrate our concepts using a running example.

3.1. Preparation phase

The starting point of the preparation phase is a set of process models. We manually analyze each of the input process models to identify automatable activities. Activities that can either be supported by the system to be developed or can be totally automated are marked and added to the list of automatable activities. Unclear cases are discussed with the respective process owners. The result of this step is a set of automatable activities that constitute the basis for associating the underlying business processes with the requirements.
To illustrate this step, consider the business process shown in Fig. 2. It describes the evaluation of project proposals by independent auditors (IAs) in the context of a grant program. It is depicted using the Event Driven Process Chain (EPC) notation, a modeling
[Figure 1: Process model(s) enter the preparation phase (identification of automatable activities, requirements analysis), which yields automatable activities and requirements models. The generation phase (sentence generation, sentence refinement, document organization) produces the generated requirements document(s).]
Fig. 1. Overview of our approach.

language widely used in industry [36]. In this paper, we use the EPC notation as an example to illustrate our approach. Note, however, that our approach can be applied to other process modeling notations such as the Business Process Model and Notation (BPMN) without adaptations. The example process from Fig. 2 is triggered when evaluations for proposals are required. The first activity is to assign the proposals to IAs. Once the proposals have been assigned to IAs, they are evaluated. Then, the proposal score is registered and the evaluation status is reviewed. In case the evaluations are not yet finished, they are evaluated by other IAs. Otherwise, the evaluation plan is updated and a status report is prepared. Upon closer inspection of the process model from Fig. 2, it becomes clear that it contains four automation candidates: “Assign proposals to IAs”, “Register proposal score”, “Update evaluation plan”, and “Prepare IA status report”. The other activities must be performed manually and are outside the scope of the system to be developed. These activities are “Evaluate proposal” and “Review evaluation status”.
The second step of the preparation phase is the requirements analysis. The main goal of this step is to specify how the activities are to be executed. This requires the identification of execution-related data for activities. Building on the insights from [3,37–39], we investigate the following four execution-related aspects for each automatable activity:
• Responsibilities: To specify the responsibilities associated with an activity, we adopt the so-called RASCI matrix [40,41]. This means that we do not only capture the different roles that are involved in the execution of the activity, but also capture their specific responsibilities, such as “carries out” or “approves”. In conformance with the RASCI concept [42], we also capture whether multiple roles share the specified responsibility (e.g. whether multiple roles may “carry out” or “approve” the activity) or whether the role has the exclusive responsibility (e.g. only that role can “carry out” or “approve” the activity).
• Data needs: As for the data needs, we specify how data entities are used by the activity [43,44]. Therefore, we adopt the CRUDL approach and capture manipulation operations (create, update, delete) and usage operations (read, use, view, list).
• System interactions: During the execution of an activity, interactions with multiple systems may take place. We identify both internal applications that are to be developed as part of the system and external applications that the system communicates with (e.g., web services). In this way, not only internal entity operations, but also data interface requirements are revealed.
• Execution constraints: In addition to the former three aspects, we also capture constraints of the application system during the execution of the considered activity. As categorized by Goedertier and Vanthienen, possible business constraints can, for instance, emerge from business regulations, business policies, costs and benefits, time, information prerequisites, and technical circumstances [45].
Typically, the information about these aspects must be obtained from domain experts who are part of the respective business processes. We propose the use of the following questions to infer the required information:
• (Q1) Who will be responsible to perform this activity and what will be the responsibility types involved?
• (Q2) What are the data entities needed to execute this activity and how are they used?
• (Q3) Which internal and external systems are interacted with for the execution of this activity?
• (Q4) What constraints and rules need to be taken into account during the execution of this activity?
Based on these questions, we elicit the relevant functional requirements from the domain experts and capture the results for each activity in a requirements model. More specifically, we use a customized version of the so-called Function Allocation Diagram (FAD) introduced as part of the ARIS method [36]. FADs are used to focus on the details of an individual activity by depicting the process elements related to that activity. For complete requirements, we need to represent the aforementioned four execution-related aspects in the requirements model. The FAD is a conceptual model that allows us to do so by adding respective model elements for the execution-related aspects. Fig. 3 shows an exemplary FAD for the activity “Register proposal score”. It shows that the activity is associated with three roles. The “Project Officer” and the “Evaluation Committee Member” are responsible for carrying it out while the “Independent Auditor” is responsible for its approval. Note that the marker “+” indicates that a responsibility can be exercised by either of the associated roles. A responsibility without a marker, therefore, represents a responsibility that is jointly exercised by all associated roles. The FAD also specifies the data needs of the activity. Among others, we can see that the “Project proposal” is read and the “Proposal status” is viewed and updated. We can also see the two systems that are relevant for the activity (the “Grant Management System” and the “IA Registration System”) and how they are connected with the data needs and operations. Lastly, we observe two constraints that are associated with the two systems. They are expressed using natural language and specify that (1) a third evaluation is requested in case two evaluations differ to a certain degree and that (2) IAs might be dropped if they continuously submit contradicting evaluations.
[Figure 2: EPC model with the events “Evaluation for proposals required”, “Evaluations are not finished”, “Evaluations are finished”, and “Evaluation finished”, and the activities “Assign proposals to IAs”, “Evaluate proposal”, “Register proposal score”, “Review evaluation status”, “Update evaluation plan”, and “Prepare IA status report”.]
Fig. 2. Exemplary EPC model for daily independent auditor evaluation process.

In the next section, we explain how such a requirements model can be used for the automated generation of a requirements document.

3.2. Requirements document generation

This section defines our approach for generating textual requirements documents from the requirements models defined in the preparation phase. In line with other natural language generation systems, we adopt the traditional pipeline concept [46]. In particular, as outlined by Fig. 1, we follow a three-step procedure. First, we generate the sentences from the requirements models. Then, we refine the generated sentences by aggregating them in a way that appeals to the user. Finally, we organize the generated sentences in the context of a document structure. In Sections 3.2.1 through 3.2.3, we explain the details of each step.

3.2.1. Sentence generation

To adequately reflect the information captured in the requirements model, we generate three types of sentences. First, we generate sentences describing which roles are involved in the execution of the activity (responsibility sentences). Second, we generate sentences specifying the usage and manipulation of the data entities (data need sentences). Third, we generate sentences describing the constraints (constraint sentences). Note that there is no dedicated sentence type for system interactions. They are either covered by data need sentences (if the system interaction relates to a data need) or constraint sentences.
To implement the generation of these different sentence types, we adopt the so-called template filling approach [47]. The rationale behind this approach is to define sentence templates which contain well-defined gaps. By filling a template with the respective information (in our case the information from a requirements model), proper sentences are constructed in an automated fashion. The advantages of such template filling approaches are their speed, the consistency of the produced sentences, and the high linguistic quality of the output. What is more, they do not require any specific knowledge related to natural language generation to adapt the system [47]. Hence, they are often considered as a viable choice for natural language generation [48]. Table 2 gives an overview of the sentence templates we defined for the three sentence types. The first three templates (R1 to R3) are used to generate sentences about the responsibilities associated with the activity and, therefore, answer question Q1. The templates on data needs (D1 and D2) serve the purpose of generating sentences with respect to questions Q2 and Q3. Lastly, the answer to the question Q4 is provided by means of the sentences generated by template C1. The gaps in the templates that need to be filled with information from the requirements model are indicated by terms between “<” and “>”. While the terms for roles, responsibilities, entities, operations, and systems are directly obtained from the labels of the model, the activity is split into an action (i.e., the verb) and an object, and the constraint is split into a condition and a consequence. Both operations can be automatically performed using available tools. Deriving action and object from activity labels is possible with the technique introduced in [49] and splitting conditional sentences can be implemented using the Stanford Parser [50]. Note that verbs may occur in different grammatical forms (i.e., base form, gerund, and participle).
Algorithm 1 formalizes the steps of our template-based sentence generation approach. The algorithm requires a requirements model (e.g. an FAD) as input. As a result, it returns a list of sentences.
The algorithm starts with the creation of a list s for the generated sentences (line 1). The first part of the algorithm is then concerned with generating the responsibility sentences (lines 2–17). It begins by checking whether the considered requirements model contains roles (line 2). If that is the case, it is checked whether all roles exclusively perform “carry out”-operations (line 3). If yes, a responsibility sentence using template R1 is created for each role (lines 4–7). To this end, the required information (role, action, and object) is derived from the requirements model. If the roles also perform other operations, a responsibility sentence using template R2 is created for each role (lines 9–12). Since this template requires the action in the gerund form (e.g. “defining” instead of “define”), we use the lexical database WordNet to derive the gerund form from the base form. In case the considered responsibility model does not contain any roles, a responsibility sentence using template R3 is created (lines 15–16). Instead of using a role description, this sentence uses the name of the main system.
The second part of the algorithm handles the generation of the data need and constraint sentences (lines 18–34). For this purpose each system from the requirements model is analyzed separately. For each system, the algorithm then analyzes the respective operations that are associated with this system (lines 19–27). If a considered operation is of type “use”, a sentence using template D1 is created (lines 20–22). For other operations than “use”, a sentence using template D2 is created (lines 23–25). This requires the place-
Table 2
Sentence templates for requirements generation.

Type | No. | Sentence template
Responsibility | R1 | The <Role> shall <Action> <Object>.
Responsibility | R2 | The <Role> shall <Responsibility> the operation of <ActionGerund> <Object>.
Responsibility | R3 | The <System> shall automatically <Action> <Object>.
Data need | D1 | While <ActionGerund> and by using the <Entity>, operations shall be performed on the <System>.
Data need | D2 | While <ActionGerund>, the <Entity> shall be <OperationParticiple> on/from the <System>.
Constraint | C1 | <ConstraintCondition> while <ActionGerund> on the <System>, <ConstraintConsequence>.
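The templates of Table 2 and the control flow that the text describes for Algorithm 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the dictionary layout of the model, the function name `generate_sentences`, and the lookup tables `GERUND` and `PARTICIPLE` (stand-ins for the lexical tooling the paper mentions) are all hypothetical, and the assignment of operations and constraints to a system in the example is chosen for illustration only.

```python
# Sentence templates from Table 2, written as Python format strings.
TEMPLATES = {
    "R1": "The {role} shall {action} {object}.",
    "R2": "The {role} shall {responsibility} the operation of {gerund} {object}.",
    "R3": "The {system} shall automatically {action} {object}.",
    "D1": "While {gerund} and by using the {entity}, operations shall be performed on the {system}.",
    "D2": "While {gerund}, the {entity} shall be {participle} on/from the {system}.",
    "C1": "{condition} while {gerund} on the {system}, {consequence}.",
}

# Hypothetical lookup tables standing in for a lexical resource.
GERUND = {"register": "registering"}
PARTICIPLE = {"create": "created", "read": "read", "update": "updated",
              "delete": "deleted", "view": "viewed", "list": "listed"}


def generate_sentences(model):
    """Fill the templates in the order described for Algorithm 1."""
    s = []
    action, obj = model["action"], model["object"]
    gerund = GERUND[action]
    roles = model.get("roles", [])

    # Part 1: responsibility sentences (R1, R2, or R3).
    if roles:
        if all(resp == "carry out" for _, resp in roles):
            for role, _ in roles:
                s.append(TEMPLATES["R1"].format(role=role, action=action, object=obj))
        else:
            for role, resp in roles:
                s.append(TEMPLATES["R2"].format(
                    role=role, responsibility=resp, gerund=gerund, object=obj))
    else:
        s.append(TEMPLATES["R3"].format(
            system=model["main_system"], action=action, object=obj))

    # Part 2: data need (D1, D2) and constraint (C1) sentences per system.
    for system in model.get("systems", []):
        name = system["name"]
        for entity, operation in system.get("operations", []):
            if operation == "use":
                s.append(TEMPLATES["D1"].format(gerund=gerund, entity=entity, system=name))
            else:
                s.append(TEMPLATES["D2"].format(
                    gerund=gerund, entity=entity,
                    participle=PARTICIPLE[operation], system=name))
        for condition, consequence in system.get("constraints", []):
            s.append(TEMPLATES["C1"].format(
                condition=condition, gerund=gerund, system=name,
                consequence=consequence))
    return s


# Illustrative model; the mapping of entities and constraints to the system is assumed.
example = {
    "action": "register",
    "object": "the proposal score",
    "roles": [("Project Officer", "carry out"), ("Independent Auditor", "approve")],
    "systems": [{
        "name": "Grant Management System",
        "operations": [("Proposal IA score", "use"), ("Proposal status", "update")],
        "constraints": [("If the score differences of the two IAs are more than 15",
                         "a third IA shall be assigned to the proposal")],
    }],
}

for sentence in generate_sentences(example):
    print(sentence)
```

Because every sentence comes from a fixed template, regenerating after a model change yields text that stays consistent with the model, which is the property the approach relies on.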

[Figure 3 (diagram). Activity: Register proposal score. Roles: Evaluation Committee Member (carries out+), Project Officer (carries out+), Independent Auditor (approves). Systems: Grant Management System, IA Registration System. Entities: project proposal, proposal IA score, proposal status, assigned IA list, IA status, IA proposal evaluation, IA repository. Operations: use, read, create, view, update, list. Constraints: “If the score differences of the two IAs are more than 15, a third IA shall be assigned to the proposal.” and “If the evaluations of an IA continuously do not comply with the other evaluation, this is marked and the IA is dropped from the available IA list.”]

Fig. 3. The FAD for Register proposal score activity in Daily IA Evaluation process.

ment of the participle form of that operation name. Finally, respective sentences for the constraints are generated (lines 28–33). In case the considered system is associated with one or more constraints, a sentence following the template C1 is created for each of these (lines 30–31). Once all available components from the requirements model are verbalized, the list of sentences is returned and the algorithm terminates.
To illustrate the effect of Algorithm 1, Table 3 provides examples for sentences generated based on the requirements model from Fig. 3.

3.2.2. Sentence refinement
In this step, we refine the generated responsibility and data need sentences to enhance their readability. We apply one aggregation technique for responsibility sentences and three aggregation techniques for data need sentences as described below.

• Role aggregation: If the same responsibility type is applicable for multiple roles, we merge the respective sentences. For instance, instead of keeping the two sentences “The Project Officer shall register the proposal score” and “The Committee Member shall register the proposal score”, we generate the refined sentence “The Project Officer or the Committee Member shall register the proposal score”. Note that depending on the specific connection (as indicated by the presence of a marker), we insert the correct conjunction to express a shared or exclusive responsibility among multiple roles. If a marker is present, we insert the conjunction “or”; otherwise, we insert the conjunction “and”. If more than one responsibility type is used, the sentences are combined to include those types. For instance, we generate “The Project Officer shall register, and the Independent Auditor shall approve the operation of registering the proposal score”.
• Object aggregation: If the model contains multiple entities with the same operations, we merge the sentences. For instance, instead of keeping “While registering the proposal score, the proposal status shall be updated on the Grant Management System” and “While registering the proposal score, the assigned IA list shall be updated on the Grant Management System”, we generate the refined sentence “While registering the proposal score, the proposal status and the assigned IA list shall be updated on the Grant Management System”.
• Operation aggregation: If the model contains multiple operations for the same entity, we apply the same procedure as for the object aggregation. For instance, we generate “the IA repository shall be listed and updated” instead of discussing these aspects in different sentences.
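The role, object, and operation aggregation steps described above can be sketched as follows. This is an illustrative Python sketch under our own assumptions, not the implementation of the prototype tool: the function names and the representation of data needs as (entity, participle) pairs are hypothetical.

```python
from collections import defaultdict


def join_list(items, conjunction="and"):
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return f"{items[0]} {conjunction} {items[1]}"
    return ", ".join(items[:-1]) + f", {conjunction} {items[-1]}"


def aggregate_roles(roles, action, obj, exclusive):
    """Role aggregation: 'or' if the marker is present, 'and' otherwise."""
    conjunction = "or" if exclusive else "and"
    subject = join_list([f"the {role}" for role in roles], conjunction)
    return f"{subject[0].upper()}{subject[1:]} shall {action} {obj}."


def aggregate_data_needs(action_gerund, system, operations):
    """Merge (entity, participle) pairs that belong to one system."""
    by_entity = defaultdict(list)   # operation aggregation
    for entity, participle in operations:
        by_entity[entity].append(participle)
    by_ops = defaultdict(list)      # object aggregation
    for entity, participles in by_entity.items():
        by_ops[tuple(participles)].append(entity)
    clauses = []
    for participles, entities in by_ops.items():
        subject = join_list([f"the {entity}" for entity in entities])
        clauses.append(f"{subject} shall be {join_list(participles)}")
    return f"While {action_gerund}, {join_list(clauses)} on the {system}."


print(aggregate_roles(["Project Officer", "Committee Member"],
                      "register", "the proposal score", exclusive=True))
print(aggregate_data_needs(
    "registering the proposal score", "Grant Management System",
    [("proposal status", "updated"), ("assigned IA list", "updated")]))
```

Grouping first by entity and then by the resulting set of operations keeps the order in which the elements appear in the model, so the merged sentence lists entities in a predictable order.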

Algorithm 1: generateSentences(RequirementsModel rm).

 1: List sentences = new List();
 2: if rm.getRoles() ≠ ∅ then                                  // responsibility sentences
 3:   if rm.getResponsibilities().containsOnly(“carry out”) = true then
 4:     for all Role r ∈ rm.getRoles() do
 5:       Sentence s = fillTemplateR1(r, rm.getAction(), rm.getObject());
 6:       sentences.add(s);
 7:     end for
 8:   else
 9:     for all Role r ∈ rm.getRoles() do
10:       Sentence s = fillTemplateR2(r, transformToGerund(rm.getAction()), rm.getObject());
11:       sentences.add(s);
12:     end for
13:   end if
14: else                                                       // no roles: automated activity
15:   Sentence s = fillTemplateR3(rm.getMainSystem(), rm.getAction(), rm.getObject());
16:   sentences.add(s);
17: end if
18: for all System sys ∈ rm.getSystems() do                    // data need and constraint sentences
19:   for all Operation o ∈ sys.getOperations() do
20:     if o = “use” then
21:       Sentence s = fillTemplateD1(o.getEntity(), sys, transformToGerund(rm.getAction()));
22:       sentences.add(s);
23:     else
24:       Sentence s = fillTemplateD2(o.getEntity(), sys, transformToGerund(rm.getAction()), transformToParticiple(o));
25:       sentences.add(s);
26:     end if
27:   end for
28:   if sys.getConstraints() ≠ ∅ then
29:     for all Constraint c ∈ sys.getConstraints() do
30:       Sentence s = fillTemplateC1(c.getCondition(), c.getConsequence(), sys, transformToGerund(rm.getAction()));
31:       sentences.add(s);
32:     end for
33:   end if
34: end for
35: return sentences;
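To make the control flow of Algorithm 1 concrete, the following Python sketch mirrors its two parts for a fragment of the model in Fig. 3. It is a simplified illustration under our own assumptions: the class and field names are hypothetical, gerund and participle forms are supplied by hand instead of being derived, and the templates are inlined as format strings.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    name: str                 # e.g. "use", "read"
    entity: str
    participle: str = ""      # e.g. "read", "updated" (supplied by hand)
    preposition: str = "on"   # "on" or "from", cf. template D2


@dataclass
class System:
    name: str
    operations: list = field(default_factory=list)
    constraints: list = field(default_factory=list)  # (condition, consequence)


@dataclass
class RequirementsModel:
    action: str          # "register"
    action_gerund: str   # "registering"
    obj: str             # "the proposal score"
    roles: dict          # role name -> responsibility verb
    main_system: str
    systems: list


def generate_sentences(rm):
    sentences = []
    activity = f"{rm.action_gerund} {rm.obj}"
    # Part 1: responsibility sentences (lines 2-17 of Algorithm 1).
    if rm.roles:
        if all(resp == "carry out" for resp in rm.roles.values()):
            for role in rm.roles:
                sentences.append(f"The {role} shall {rm.action} {rm.obj}.")
        else:
            for role, resp in rm.roles.items():
                sentences.append(
                    f"The {role} shall {resp} the operation of {activity}.")
    else:
        sentences.append(
            f"The {rm.main_system} shall automatically {rm.action} {rm.obj}.")
    # Part 2: data need and constraint sentences (lines 18-34).
    for system in rm.systems:
        for op in system.operations:
            if op.name == "use":
                sentences.append(
                    f"While {activity} and by using the {op.entity}, "
                    f"operations shall be performed on the {system.name}.")
            else:
                sentences.append(
                    f"While {activity}, the {op.entity} shall be "
                    f"{op.participle} {op.preposition} the {system.name}.")
        for condition, consequence in system.constraints:
            sentences.append(
                f"{condition} while {activity} on the {system.name}, "
                f"{consequence}.")
    return sentences


gms = System(
    "Grant Management System",
    [Operation("read", "project proposal", "read", "from")],
    [("If the score differences of the two IAs are more than 15",
      "a third IA shall be assigned to the proposal")])
ia = System("IA Registration System", [Operation("use", "proposal IA score")])
model = RequirementsModel(
    "register", "registering", "the proposal score",
    {"Project Officer": "carry out", "Independent Auditor": "approve"},
    "Grant Management System", [gms, ia])
for sentence in generate_sentences(model):
    print(sentence)
```

For this fragment, the sketch yields the five kinds of sentences that are exemplified in Table 3.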

Table 3
Exemplary sentences generated from the requirements model from Fig. 3.

Type            No.  Example
Responsibility  R1   The Project Officer shall carry out the operation of registering the proposal score.
                R2   The Independent Auditor shall approve the operation of registering the proposal score.
Data need       D1   While registering the proposal score and by using the proposal IA score, operations shall be performed on the IA Registration System.
                D2   While registering the proposal score, the project proposal shall be read from the Grant Management System.
Constraint      C1   If the score differences of the two IAs are more than 15 while registering the proposal score on the Grant Management System, a third IA shall be assigned to the proposal.

• System aggregation: If the model contains multiple systems, we further merge the previously refined sentences into a single one. For instance, we merge the sentences for two systems as “While registering the proposal score, the project proposal shall be read on the Grant Management System, and the proposal evaluation shall be created on the IA Registration System” instead of having two different sentences for each system. This is an optional refinement step and it is only applied when the requirements model includes a small number of entities.

As a result of applying four aggregation techniques to the sentences generated from the example requirements model in Fig. 3, the following refined sentences are obtained:

1. “The Project Officer or the Evaluation Committee Member shall carry out, and the Independent Auditor shall approve the operation of registering the proposal score.”
2. “While registering the proposal score, the project proposal shall be read, the proposal status shall be viewed and updated, the proposal IA score shall be created, and the assigned IA list shall be updated on the Grant Management System.”
3. “While registering the proposal score and by using the proposal IA score, the IA repository shall be listed and updated, the IA proposal evaluation shall be created, and the IA status shall be updated on the IA Registration System.”

3.2.3. Structuring of the document
Upon completion of the two phases of the generation, the requirements sentences need to be organized in the context of a document. For this purpose, we assign unique IDs to the aggregated requirements sentences. The requirements document can then be organized in two different ways as explained below.

Process-based document. In this way of organizing the document, we exploit the hierarchy of the considered process models to structure the document. Thus, the application of this type requires a hierarchical organization of the process models. Table 4 illustrates

the resulting document structure. The first level heading is derived from the process model on top of the process hierarchy. Then, the requirements sentences generated for the top-level process model are listed. Afterwards, we create a subheading for each of the process models from the levels underneath and list the requirements under the respective subheading. All sentences are organized in the same way by recursively processing all models. The order of the requirements for a specific process is derived from the order of the respective activities.

Table 4
Process-based requirements document structure.

1 <Top-level process name> Process
    REQ11 <Responsibility sentence for activity 1>
    REQ21 <Data need sentence for activity 1 and system 1>
    REQ31 <Data need sentence for activity 1 and system 2>
    ... (all data need sentences)
    REQX1 <Constraint sentence for activity 1>
    ... (all constraint sentences)
    REQ1N <Responsibility sentence for activity N>
    REQ2N <Data need sentence for activity N and system 1>
    REQ3N <Data need sentence for activity N and system 2>
    ... (all data need sentences)
    REQXN <Constraint sentence for activity N>
    ... (all constraint sentences)
1.1 <First-level process name> Process
    REQ1M <Responsibility sentence for activity M>
    REQ2M <Data need sentence for activity M and system 1>
    ...

System-based document. In this document style, we use the systems to be developed to organize the requirements sentences. That is, we list the requirement sentences under the respective headings of the systems. Responsibility sentences are placed under the heading for the main system. If this style is used, the system aggregation technique is not applied to data need sentences in the sentence refinement phase.
Once all these steps have been completed, users are provided with automatically generated requirements documents, organized in their preferred style. In the next section, we apply our semi-automated approach in the context of a case study to demonstrate the improvements in obtaining higher-quality requirements in terms of key requirement characteristics.

4. Evaluation

In order to show the feasibility of our approach in practice, we applied it in a real-world setting [51]. More specifically, we conducted a multiple case study using a set of three different organizations and 13 projects with varying characteristics [52]. In this way, we were able to improve the generalizability of the findings and to demonstrate the value of our approach [53]. The overall goal of the evaluation is to learn whether the project teams from our case study perceive the generated requirements documents as well-readable, complete, consistent, and easy to maintain. In Section 4.1, we explain the rationale for the selection of the cases and introduce them in detail. In Section 4.2, we briefly describe our implementation of the approach in the context of a prototype tool. In Section 4.3, we provide details on how we conducted the case study. In Section 4.4, we present the findings of our case study. In Section 4.5, we compare the manually created and generated requirements documents. In Section 4.6, we discuss the limitations of our evaluation.

4.1. Overview of cases

Process modeling and requirements analysis activities are performed in a wide spectrum of industry fields. This diversity required us to get involved in cases with varying characteristics. While assembling a suitable set of cases, we considered two main goals. First, we wanted to show the value of our approach in a practical setting. Thus, we required one or more organizations that were willing to perform business process and requirements analysis as part of a system development project. Second, we wanted to compare our approach to traditional requirements engineering. Therefore, we sought at least one organization that was interested in applying our approach after the actual development project, allowing us to retrospectively compare the results.
As a result of these considerations, we selected three case sets. Each of these case sets consisted of a program covering multiple projects. Projects in a program were managed by the same integrator organization, which used similar principles and practices. The integrator organization was in charge of subcontracting the projects. Altogether, the case sets included 13 projects pertaining to different process areas. The large number of projects allowed us to receive comprehensive feedback and to make wide-ranging observations. Furthermore, we were able to implement the approach in a wide variety of process areas. In line with our case selection goals, two of the case sets represented new development projects and one was a retrospective set. Table 5 gives an overview of the most important characteristics of the case sets. It shows the types of the projects belonging to each case set as well as the number of involved process models and activities. In the paragraphs below, we describe the details of each case set.

Case Set 1 e-Government. The e-Government case set was managed by the leading integrator organization for e-government projects in Turkey. The case set consists of a program comprising two projects. The program was initiated to develop two online systems for managing all processes related to the life cycle of companies (e-Company) and trademarks (e-Trademark) registered in North Cyprus. A team of three external analysts worked on two projects in parallel, together with three internal analysts and two domain experts. In addition, 15 domain experts were occasionally involved in the workshops to provide domain-specific knowledge, but did not take part in the preparation and evaluation of the outputs. The internal analysts were experienced in different modeling notations, while the domain experts had only used natural language before and had no experience with process modeling notations.

Case Set 2 Public Services. The Public Services case set was managed by the Turkish Ministry of Development, the responsible institution for regional development agencies. In total, this case set consists of nine different projects. Among others, they cover the automation of public service processes provided by the development agencies, such as grant programs and investment support, but also internal processes such as human resource management. The team included three external and four internal analysts, together with four domain experts who took part in the preparation of the outputs, and 66 domain experts who were occasionally involved in analysis activities. The domain experts were not experienced in modeling notations and the internal analysts had only used flowcharts before.

Case Set 3 Campus System. The Campus System case set was managed by the Computer Center of the Middle East Technical University (METU), which is the top-ranked university in Turkey. The whole program consists of the automation of over 90 business processes, which concern the areas of research, education, campus services, and support. From this set, we selected two representative projects for this evaluation (Announcement and Research Program Management). The involved internal analysts were experienced in BPMN and other notations. In contrast to the former two cases, this

Table 5
Overview of the case set characteristics.

Case set             Case type      Project                        #PM   #ACT
(1) e-Government     New project    e-Company                       18    125
                                    e-Trademark                      9     47
                                    Total                           27    172
(2) Public Services  New project    Auditing                         4     53
                                    Budget Management               69    470
                                    Archive Management              12     63
                                    Human Resource Management       24    218
                                    Investment                      20    123
                                    Performance Management           6     26
                                    Program Management               8     95
                                    Project Support                 91    662
                                    Stakeholder Management          18     72
                                    Total                          252   1782
(3) Campus System    Retrospective  Announcement                     3     25
                                    Research Program Management     13     42
                                    Total                           16     67

Legend: #PM = Number of process models, #ACT = Number of activities

study took place shortly after the completion of the projects. With this retrospective case set, we specifically aimed to evaluate the completeness of the requirements generated via our approach in comparison to the requirements already defined with traditional approaches.

4.2. Implementation

We developed a prototype tool to facilitate the implementation of our approach in the case study. It is available as a plug-in for the integrated development environment Eclipse and based on the Eclipse Modeling Framework (EMF) and the Eclipse Graphical Modeling Framework (GMF).¹ The tool supports the development of process model diagrams in the EPC notation, the identification of automatable activities, the development of related requirements models in conformance with the exemplified FAD notation, and, lastly, the generation of textual requirements documents in conformance with the approach explained in Section 3. As all the cases were conducted in Turkey, we implemented the generation for the Turkish language. A snapshot of the (English) tool and the generated requirements document can be seen in Fig. 4.

¹ The tool can be obtained from www.aysolmaz.com.

4.3. Conduct of the case study

Our case study consisted of three main steps: (1) the application of our semi-automated approach, (2) the analysis of the outputs, and (3) a set of feedback interviews.
The starting point of our case study was the application of our semi-automated approach by the project teams in the context of the cases. To make sure the teams could apply our approach in an effective and efficient way, we provided respective training to the teams before the start of the project. During the execution, we mainly acted as observers on how the teams applied the approach, but were also available for questions. Upon completion of the application and the generation of the requirements documents, we analyzed which changes were manually applied to the documents. After the completion of all project activities, we conducted a set of interviews with the internal and external analysts as well as with the domain experts who were involved in the projects. We chose interviews because they are the most prominent qualitative data collection method for obtaining in-depth insights [53]. Considering the number of team members involved in the case sets, interviews enabled us to develop a comprehensive understanding of the participants’ experiences related to the use of our approach in a practical setting. The interviews followed a semi-structured style, taking around 45 minutes per interviewee. The interviews covered questions about the participants’ background as well as the evaluation of our approach including the quality of the generated requirements documents. Table 6 shows the number of interviewees for each case set. We transcribed, coded, and analyzed the interviews to maintain a chain of evidence [52]. Table 7 summarizes the key figures of the case study performance. It shows the number of workshops performed (#WS), the total analysis effort spent (EFF), the number of requirements models developed (#RM), and the number of requirements generated based on these models (#REQ). In the following section, we discuss the findings of our case study.

Table 6
Interviews for cases.

Case set         Internal Analyst  External Analyst  Domain Expert
e-Government            2                 3                2
Public Services         2                 2                2
Campus System           3                 1

4.4. Findings

In this section, we discuss how our semi-automated approach for requirements generation was assessed by the project teams of the three case sets. More specifically, we discuss how they evaluated four key characteristics that have been found to contribute to high-quality requirements [6,39,54,55]: readability, completeness, consistency, and maintainability. We present our findings for each key characteristic separately for domain experts and analysts.

4.4.1. Readability
The readability (sometimes also referred to as unambiguousness) of a requirements document is one of its most important features [54].
Overall, all four domain experts found the documents informative and understandable. The domain experts who became familiar with process modeling through the training session and the workshops found the joint presentation of the process models and requirements sentences to further improve readability. For instance, one domain expert from case set 1 stated that “Studying a model and the related statements together helped me to easily understand the requirements”. Other domain experts supported this with similar statements. Despite the generally positive feedback, some domain experts also mentioned aspects for improvement. For in-

Fig. 4. A screenshot of the prototype tool and the generated requirements document.

Table 7
Key figures of case study conduct.

Case set             Project                        #WS   EFF   #RM   #REQ
(1) e-Government     e-Company                       10    76    82    363
                     e-Trademark                      6    41    36    177
                     Total                           16   117   118    540
(2) Public Services  Auditing                         2    10    24     61
                     Budget Management               28   154   339    822
                     Archive Management               4    28    52    110
                     Human Resource Management       12    46   159    336
                     Investment                       6    36   103    218
                     Performance Management          14    67    18     36
                     Program Management               3    23    72    154
                     Project Support                 41   148   457   1038
                     Stakeholder Management           9    27    54    129
                     Total                          119   539  1278   2904
(3) Campus System    Announcement                     3     6    18     65
                     Research Program Management      3     8    18     60
                     Total                            6    14    36    125

Legend: #WS = Number of workshops performed, EFF = Analysis effort in person-days, #RM = Number of requirements models, #REQ = Number of requirements sentences generated

stance, one domain expert from case set 2 stated that the fixed structure of the sentences sometimes felt mechanical. At the same time, however, he also pointed out that such a generation facilitates a standardized and mature requirements structure.
All of the analysts mentioned that the generated documents were clear and understandable. Overall, they personally preferred to examine the models instead of the documents, but they found the generated documents to fit the purpose. An internal analyst from case set 2 stated: “We needed to explain the system to various experts and the documents certainly helped us for this”. Some analysts also suggested specific changes to enhance readability. For instance, the team from case set 1 suggested merging short responsibility and data need sentences into a single sentence. The same team also asked us to remove the first part of the data need sentences (“While <ActionGerund>”). We implemented the suggested changes by updating the generation algorithm accordingly.
Altogether, we found that our approach generated well-readable requirements documents. In fact, all requested changes could be implemented by straightforward adaptations of the generation algorithm.

4.4.2. Completeness
The completeness of requirements is an important characteristic because it indicates the additional effort that has to be invested beyond the application of our approach [6].
The domain experts from case set 2 stated that the approach helped them to “recognize whether the requirements are complete”. Overall, all domain experts agreed that the final set of requirements appeared complete. Besides that, they did not have further comments on completeness.
The analysts provided further comments on completeness. One internal analyst from case set 1, for instance, pointed out she “would not be able to define such detailed requirements” in another way. An analyst from case set 2 said that “the approach was adequate to express what is required”. Emphasizing the support provided by the approach, analysts also mentioned that our approach helped “to collect the functional requirements with respect to the architectural components effectively”. Being asked for a comparison with traditional approaches, an analyst of case set 2 indicated that “it would be harder to make a complete set like this if we wrote down the requirements textually in the first place”. He explained that “we

Fig. 5. Generated requirements per process models and requirements coverage per case.
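The ratio of generated requirements per process model can be recomputed from the figures reported in Tables 5 and 7. The following Python sketch does so under the assumption that the ratio is simply #REQ divided by #PM for each project; the dictionary layout and the function name are our own.

```python
# (#PM from Table 5, #REQ from Table 7) for each project.
PROJECTS = {
    "e-Company": (18, 363),
    "e-Trademark": (9, 177),
    "Auditing": (4, 61),
    "Budget Management": (69, 822),
    "Archive Management": (12, 110),
    "Human Resource Management": (24, 336),
    "Investment": (20, 218),
    "Performance Management": (6, 36),
    "Program Management": (8, 154),
    "Project Support": (91, 1038),
    "Stakeholder Management": (18, 129),
    "Announcement": (3, 65),
    "Research Program Management": (13, 60),
}


def requirements_per_model(project: str) -> float:
    """Generated requirements sentences per process model of a project."""
    models, requirements = PROJECTS[project]
    return requirements / models


for name in PROJECTS:
    print(f"{name}: {requirements_per_model(name):.1f}")
```

For e-Company, this gives 363 / 18 ≈ 20.2, which agrees with the roughly 20 requirement sentences per process model reported for that project.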

would miss a lot of aspects of the system if we didn’t see the complete picture by means of the models”. Another analyst from case set 2 stated that the completeness was achieved as “we were able to anticipate how the system should work as a whole and see the relations between different parts by means of the process-based requirements analysis”.
The overall completeness of the generated requirements for each project is illustrated in Fig. 5. On the left-hand side, we can see the total number of requirements per process model and project. On the right-hand side, we see the coverage of the generated requirements for each project. For example, in the e-Company project of case set 1 about 20 requirement sentences were generated per process model and the generated requirements covered 91% of all requirements. The rest of the requirements were manually added by the analysts in case sets 1 and 2. Among the manually added requirements, none related to the process-related aspects of the systems. Rather, they concerned general aspects which were not directly related to the processes and included the architecture of the system, interfaces with external systems, system-wide characteristics, security and quality requirements, and software development principles. Thus, they were not expected to be covered in the generated requirements set. Case set 3, the retrospective case, played an important role in evaluating the completeness. While in case sets 1 and 2 the requirements were developed from scratch in the context of the programs, case set 3 included an existing requirements document which was prepared in a different setting. We used the existing requirements as a benchmark and performed a delta analysis for the generated requirements. For this, we prepared a mapping between the existing requirement statements and the generated ones. The results showed that 95% coverage was achieved by the approach even with respect to the requirements already developed with traditional approaches. The unmatched requirements in the existing document related to quality aspects of the system. Moreover, six additional requirements were identified that were not included in the existing document. Thus, the findings of the retrospective case confirmed that a complete set of process-related requirements can be revealed by means of our approach.

4.4.3. Consistency
Consistency is another important characteristic of a requirements set and refers to the absence of contradictions within the set [55]. Our approach inherently ensures the consistency between the models and the natural language requirements by means of the automated generation approach.
From the domain experts we received very positive feedback with respect to the consistency. In fact, they explicitly stated that they did not observe any inconsistencies in the requirements.
The analysts were also very positive. They also had more specific comments on the achieved consistency. One internal analyst from case set 1 mentioned that “especially if more than one person works on the analysis, this approach supports you to get the same quality of output from everybody”. Another analyst from case set 2 was initially critical about the usage of specific model elements for modeling the requirements, but later found that “it was helpful for ensuring quality”. Here, it should be noted that the consistency of the generated requirements is dependent on the consistency of the models. In this respect, although the use of the approach does not ensure the consistency of the generated requirements, the model-based analysis helped the analysts to avoid such problems. All external analysts emphasized that “updates would normally introduce consistency problems”, but that our approach helped to “observe cross relations and to prevent resulting inconsistencies”. Internal analysts from case set 2 stated that they “were able to define the requirements consistently although there were many different processes” by means of “the holistic view and the standardized language”.

4.4.4. Maintainability
Maintainability, sometimes also referred to as modifiability, is particularly important when it comes to changes [39]. All interviewees pointed out that they found the requirements easy to maintain. Among others, this was found to be caused by the improved traceability between process models and requirements.
One surprising finding was that even the domain experts, who typically do not develop models themselves, agreed on the improved maintenance. One domain expert from case set 1 stated that he “could better understand the effects of a change”.
The analysts provided further discussions on how the maintainability was improved by our approach. One internal analyst of case set 1 stated that “when a process was updated, it was also clear which requirements need to be changed”. While we also expected positive comments with respect to the maintainability resulting from the automated generation, we received quite unexpected feedback. All analysts stated that they did not find that the approach would save time to prepare the initial require-

Table 8
Comparison of the text structure of actual and generated requirements.

                                       Actual                         Generated
Project                        W/S    V/S    S/R    W/R       W/S    V/S    S/R    W/R
Research Program Management   16.16   1.13   1.28  20.68     13.93   1.07   1     13.93
Announcement                  14.19   1.43   1.60  22.77     13.48   1.38   1     13.48
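The word- and sentence-based metrics in Table 8 can be computed with a few lines of code. The following Python sketch is an illustration under simplifying assumptions: sentences are split at terminal punctuation, words are split at whitespace, and V/S is omitted because counting verbs requires a part-of-speech tagger. The function names and the two sample requirements are our own and are not taken from the case study data.

```python
import re


def split_sentences(text: str) -> list:
    """Split a requirement text at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def text_metrics(requirements: list) -> dict:
    """Compute W/S, S/R and W/R for a list of requirement texts."""
    sentences = [s for req in requirements for s in split_sentences(req)]
    words = sum(len(s.split()) for s in sentences)
    return {
        "W/S": words / len(sentences),
        "S/R": len(sentences) / len(requirements),
        "W/R": words / len(requirements),
    }


sample = [
    "The Project Officer shall register the proposal score.",
    "The system shall store the score. It shall notify the officer.",
]
print(text_metrics(sample))
```

By construction, W/R equals W/S multiplied by S/R. The values in Table 8 are consistent with this relation, for example 16.16 × 1.28 ≈ 20.68 for the actual requirements of the Research Program Management project.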

ments document. One internal analyst from case set 1 stated that, ated requirements to the corresponding generated requirements.
creating the process models for the requirements generation is Fig. 6 visualizes this mapping. It shows for each manually created
time-consuming: “I would be faster with traditional methods, but requirement to how many generated requirements it relates. We
I wouldn’t be able to achieve the level of completeness”. Two ex- observe that many of the manually created requirements relate
ternal analysts of case set 1 and 2, as well as the internal an- to more than a single generated requirement. The average num-
alysts of case set 2 emphasized the potential time gain for up- ber of generated requirements per manually created requirement
dates and future development phases despite the extra time spent. Another external analyst from case set 2 stated that “it may look like we spent more time, but in the long run, the time spent will be less”.

Overall, the interviews highlighted that domain experts were mainly interested in readability. Since domain experts often struggled with understanding the requirements, readability was their major concern. The analysts, by contrast, were also interested in the other three characteristics since they directly relate them to time savings and the automated support they expect from our approach. The analysts provided clear statements on how the approach enabled them to produce more complete, maintainable, and consistent requirements.

4.5. Comparison of manually created and generated requirements

The results of our case study showed that the generated requirements were positively perceived with respect to the four investigated key characteristics. An open question, however, is how exactly the manually created and generated requirements differ. To investigate this, we made use of the retrospective use case set Campus System. Our goal was to understand how the manually created and the generated texts compare with respect to text structure and how they convey the requirements content.

To investigate the text structure, we computed a set of basic sentence complexity metrics [56]:

• Average number of words per sentence (W/S)
• Average number of verbs per sentence (V/S)
• Average number of sentences per requirement (S/R)
• Average number of words per requirement (W/R)

Table 8 summarizes the results of the comparison of the text structure. A general observation is that the generated sentences follow a structure similar to that of the manually created sentences, as indicated by similar values for the metrics W/S and V/S. This means that our approach generates sentences that are structurally comparable to those created by humans. However, we also observe some differences. Most notably, the manually created requirements contain a higher number of sentences and words per requirement (see S/R and W/R). This raises the question of whether the manually created requirements are unnecessarily verbose or complex, which might explain the lower readability and comprehensibility perceived by users. A detailed analysis of the manually developed requirements indeed supports this conjecture. We identified many sentences in the manually created requirements that contained nonessential and repetitive descriptions. Among other things, we found nonessential context information, redundant descriptions of functionality, and descriptions of data attributes that were already defined in the data dictionary.

To understand how the manually created and the generated requirements convey their content, we mapped the manually cre-

is 3.9 for the Research Program Management project and 2.2 for the Announcement project. Against the background of our findings from the text structure comparison, this is quite a surprising result. While the manually created requirements tend to be more verbose and, sometimes, even provide redundant information, the generated requirements document provides more details. We analyzed the extreme cases (i.e. where a manually created requirement relates to 10 generated requirements) and found that the manually created document lacked important details with respect to responsibilities and data needs.

Fig. 6. Relationship between manually created and generated requirements.

In summary, we can say that this comparison highlighted the value of automated requirements generation. From a structural point of view, the generated requirements are very similar to the manually created requirements. The generated requirements, however, use fewer words and do not provide redundant information. From a content perspective, the comparison particularly illustrated the superiority of the generated requirements in terms of completeness.

4.6. Limitations

Despite the positive results, our evaluation has to be considered in light of some limitations. The first limitation relates to the conducted interviews. While the interviews allowed us to collect in-depth insights about the use of our approach in practice, interviews are also subjective by nature [53]. Among other things, this means that the results of interviews could have been influenced by the bias of the interviewer. To avoid such a bias as far as possible, we designed and strictly followed an interview guideline. Moreover, an independent researcher reviewed the interview transcripts and confirmed the relevance of the answers with respect to the interview guideline. By following this procedure, we tried to minimize the limitations of interviews and obtain unbiased and reliable results. The second limitation relates to the generalizability of the overall case study [52]. While we carefully collected a number of differing cases, we cannot claim that the results are representative or can be generalized to other organizations. However, since the feedback from the evaluation was consistently positive among the three cases, we are also confident that the presented approach can indeed provide considerable value for organizations.

5. Adaptation to other languages

From a conceptual perspective, the presented approach is not bound to a specific language. However, to use our approach for languages other than English, two main adaptations are required.

First, the templates must be translated and adapted to the target language. To illustrate the required steps, assume we would like to adapt the system to German and Turkish (i.e. two languages from different language families). What is more, reconsider the template “The <Role> shall <Action> <Object>” and its instantiation “The Project Officer shall carry out the operation of registering the proposal score”. If we wish to adapt the system to German, we need to translate this template and adapt it to the German grammar. By replacing the word “The” with a new slot “<Article>” (in German the article depends on the gender of the referenced noun), by translating “shall” into “soll”, and by switching the order of the action and the object slots, we obtain the template “<Article> <Role> soll <Object> <Action>”. In a similar way, a respective template for Turkish can be obtained. Since Turkish does not use articles, the article “the” is omitted and the word order is adapted to the Turkish grammar. As a result, we obtain the template “<Role> <Object> <ActionGerund> işlemini <ResponsibilityVerb>.”.

Second, respective inflection mechanisms for the target language have to be implemented. In the case of German this means that the correct article has to be determined based on the gender of the noun and that the verb must be conjugated. Both aspects can be achieved by using publicly available language processing tools such as SimpleNLG [57]. Based on this tool and respective German inputs for the slots, we are therefore able to generate a German version of the sentence: “Der Projektleiter soll die Registrierung der Angebotsbewertung vornehmen”. For Turkish, only the gerund of the verb must be obtained. This can be achieved by looking at the last vowel of the verb and concatenating “-ma” in case of hard vowel sounds (e.g. a, u) and “-me” in case of soft vowel sounds (e.g. e, ü) to the end of the input verb. In this way, we are able to also generate a Turkish version of the sentence: “Proje uzmanı teklif puanını kaydetme işlemini yürütecektir.”

These examples illustrate that the adaptation of our technique is a one-time investment that is associated with reasonable effort. Because tools for inflecting words are available for many languages, only limited technical knowledge about natural language generation will be required for the adaptation.

6. Implications

The approach we presented in this paper has several implications for research and practice.

From a research perspective, our work complements existing methods for requirements elicitation based on process models [4,29,30] by providing an automated way to obtain requirements documents. In contrast to existing approaches that consider automated support to elicit requirements, such as the ones proposed by Türetken et al. [34] and Coşkunçay et al. [35], our approach was shown in the evaluation to generate requirements that are well-readable, complete, and easy to maintain by means of the formulated requirements analysis and formalized natural language generation techniques. The consistency is ensured via automated generation. Our approach also informs methods for process model validation. In contrast to existing process model verbalization approaches [9], our approach also considers execution-related data and, thus, allows a more complete picture to be obtained.

From a practical perspective, our approach helps to improve several characteristics that contribute to high-quality requirements, thus improving their usability. Other potential benefits for practitioners include the standardization of requirements engineering activities of analysts, enhanced testability, and improved scoping of the project. Hence, our approach can help practitioners in achieving considerable improvements in the software development process. While an extra effort must be spent in the initial analysis phase, the quality of the obtained requirements might save project teams from unnecessary repetitions in the SDLC. In the long run, our approach may, thus, also help to reduce costs. Taking these benefits into account, we believe our approach has the potential to influence the way requirements elicitation is conducted in practice. In fact, two organizations from our three cases used the generated requirements document for finding a suitable software development subcontractor.

7. Conclusion

In this paper, we addressed the problem of inconsistencies between process models and natural language in the context of requirements specification. To cope with this problem, we introduced a semi-automated approach, which consists of two main phases. In the manual preparation phase, users identify the automatable activities in the input process model(s) and specify the associated responsibilities, data needs, system interactions, and execution constraints. The requirements model resulting from this analysis then serves as input for a generation algorithm, which automatically provides the user with a well-organized natural language requirements document.

We evaluated our approach by applying it in the context of a multiple case study with three organizations and a total of 13 projects. We found that our approach could be successfully applied to generate well-readable requirements that are complete, consistent, easy to maintain, and, most importantly, of practical value. The interviewed analysts and domain experts pointed out that our approach positively contributed to the completeness, consistency, and maintainability of the requirements documents. Thus, the systematic analysis as well as the automated generation helped the studied project teams to deliver requirements documents of higher quality. This is emphasized by the fact that the generated requirements documents were used for finding a suitable software development subcontractor in 11 of the 13 projects. Hence, we conclude that our approach successfully addresses the problem of inconsistency between process models and requirements documents, and provides real value to organizations.

In future work, we aim to extend our approach with the capability to automatically reflect changes of the generated requirements documents in the associated requirements and process models. In this way, the consistency between the artifacts can also be assured if changes are applied to requirements. Another aspect we wish to investigate is the specific impact of using the generated requirements. In this context, we plan to apply our method with and without the generated requirements documents. Besides that, we also plan to apply our approach in organizations that maintain English models. This will not only allow us to test our generation algorithm in another language, but also to evaluate the applicability in different cultures and settings. A final line of work we plan is to investigate the systematic transfer of the acquired requirements knowledge to the following software development phases. In this way, the benefits of the approach may also contribute to other phases of the SDLC.

Acknowledgement

This work has been partially supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 660646.

References

[1] A. Gross, J. Doerr, EPC vs. UML activity diagram - two experiments examining their usefulness for requirements engineering, in: Requir. Eng. Conf. 2009. RE ’09. 17th IEEE Int., 2009, pp. 47–56.
[2] C. Monsalve, A. Abran, A. April, Measuring software functional size from business process models, Int. J. Softw. Eng. Knowl. Eng. 21 (03) (2011) 311–338.
[3] J. Vara, M. Fortuna, J. Sánchez, C. Werner, M. Borges, A Requirements Engineering Approach for Data Modelling of Process-Aware Information Systems, in: W. Abramowicz (Ed.), Bus. Inf. Syst. SE - 12, Lecture Notes in Business Information Processing, Vol. 21, Springer Berlin Heidelberg, 2009, pp. 133–144.
[4] E.C. Cardoso, J.P.A. Almeida, G. Guizzardi, Requirements engineering based on business process models: a case study, in: EDOCW, 2009, pp. 320–327.
[5] K. Brennan, A guide to the Business Analysis Body of Knowledge (BABOK guide), version 2.0, 2nd edition, IIBA International Institute of Business Analysis, 2009.
[6] IEEE, IEEE Recommended Practice for Software Requirements Specifications, IEEE Std 830–1998, Technical Report, Software Engineering Standards Committee of the IEEE Computer Society, Piscataway, N.J., 1998.
[7] Q. Ma, Y. Jiang, Process-oriented information system requirements engineering - a case study, J. Bus. Cases Appl. 10 (2014) 1–16.
[8] J. Li, R. Jeffery, K.H. Fung, L. Zhu, Q. Wang, H. Zhang, X. Xu, A business process-driven approach for requirements dependency analysis, in: Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), Vol. 7481 LNCS, 2012, pp. 200–215.
[9] H. Leopold, J. Mendling, A. Polyvyanyy, Supporting process model validation through natural language generation, IEEE Trans. Softw. Eng. 40 (8) (2014) 818–840.
[10] A. Coskuncay, An approach for generating natural language specifications by utilizing business process models, Middle East Technical University, 2010, MSc thesis.
[11] T. Olsson, J. Grundy, Supporting traceability and inconsistency management between software artefacts, in: Proc. Int. Conf. on Software Engineering and Application, 2002.
[12] S. Winkler, J. Pilgrim, A survey of traceability in requirements engineering and model-driven development, Software and Systems Modeling (SoSyM) 9 (4) (2010) 529–565.
[13] H. van der Aa, H. Leopold, H.A. Reijers, Detecting inconsistencies between process models and textual descriptions, in: H.R. Motahari-Nezhad, J. Recker, M. Weidlich (Eds.), Bus. Process Manag. 13th Int. Conf. BPM 2015, Innsbruck, Austria, August 31 – Sept. 3, 2015, Proc., Springer International Publishing, Cham, 2015, pp. 90–105.
[14] R.S. Day, Alternative representations, Psychol. Learn. Motiv. 22 (1988) 261–305.
[15] J.M. Polich, S.H. Schwartz, The effect of problem size on representation in deductive problem solving, Mem. Cognit. 2 (4) (1974) 683–686.
[16] S.M. Schwartz, D.L. Fattaleh, Representation in deductive problem-solving: the matrix, J. Exp. Psychol. 95 (2) (1972) 343.
[17] P. Wright, F. Reid, Written information: some alternatives to prose for expressing the outcomes of complex contingencies, J. Appl. Psychol. 57 (2) (1973) 160.
[18] H.R. Ramsey, M.E. Atwood, J.R. Van Doren, Flowcharts versus program design languages: an experimental comparison, Commun. ACM 26 (6) (1983) 445–449.
[19] T.G. Moher, D. Mak, B. Blumenthal, L. Levanthal, Comparing the comprehensibility of textual and graphical programs, in: Empirical Studies of Programmers: Fifth Workshop, Ablex, Norwood, NJ, 1993, pp. 137–161.
[20] D.A. Scanlan, Structured flowcharts outperform pseudocode: an experimental comparison, Software, IEEE 6 (5) (1989) 28–36.
[21] R.E. Mayer, Multimedia Learning, second edition, Cambridge University Press, Cambridge, UK, 2009.
[22] M. Weber, J. Weisbrod, Requirements engineering in automotive development – experiences and challenges, in: Requirements Engineering, 2002. Proceedings. IEEE Joint International Conference on, IEEE, 2002, pp. 331–340.
[23] B. Schätz, A. Fleischmann, E. Geisberger, M. Pister, et al., Model-based requirements engineering with AutoRAID, in: GI Jahrestagung (2), Citeseer, 2005, pp. 511–515.
[24] A. Davis, Just Enough Requirements Management: Where Software Development Meets Marketing, Addison-Wesley, 2013.
[25] J. Nicolás, A. Toval, On the generation of requirements specifications from software engineering models: a systematic literature review, Inf. Softw. Technol. 51 (9) (2009) 1291–1307.
[26] M. Dumas, W.V. der Aalst, A. ter Hofstede, Process-Aware Information Systems: Bridging People and Software Through Process Technology, John Wiley & Sons, New Jersey, 2005.
[27] M. Indulska, P. Green, J. Recker, M. Rosemann, Business Process Modeling: Perceived Benefits, in: A. Laender, S. Castano, U. Dayal, F. Casati, J. Oliveira (Eds.), Concept. Model. - ER 2009 SE - 34, Lecture Notes in Computer Science, Vol. 5829, Springer Berlin Heidelberg, 2009, pp. 458–471.
[28] H.C. Mayr, C. Kop, D. Esberger, Business Process Modeling and Requirements Modeling, in: Digit. Soc. 2007. ICDS ’07. First Int. Conf., 2007, p. 8.
[29] O. Demirors, Ç. Gencel, A. Tarhan, Utilizing business process models for requirements elicitation, in: Euromicro Conf. 2003, 2003, pp. 1–4.
[30] C. Monsalve, A. April, A. Abran, Requirements Elicitation Using BPM Notations: Focusing on the Strategic Level Representation, in: 10th WSEAS Int. Conf. Appl. Comput. Appl. Comput. Sci., 2011, pp. 235–241.
[31] J.D.l.V. González, J. Díaz, Business process-driven requirements engineering: a goal-based approach, in: Proceedings of the 8th Workshop on Business Process Modeling, 2007, pp. 1–9.
[32] K. Cox, K.T. Phalp, S.J. Bleistein, J.M. Verner, Deriving requirements from process models via the problem frames approach, Inf. Softw. Technol. 47 (5) (2005) 319–337.
[33] S. Malik, I.S. Bajwa, Back to origin: transformation of business process models to business rules, in: M. La Rosa, P. Soffer (Eds.), Bus. Process Manag. Work. BPM 2012 Int. Work. Tallinn, Est. Sept. 3, 2012. Revis. Pap., Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 611–622.
[34] O. Turetken, O. Su, O. Demirors, Automating software requirements generation from business process models, in: Proc. 1st Conf. Princ. Softw. Eng., Buenos Aires, Argentina, 2004, pp. 1–16.
[35] A. Coskuncay, B. Aysolmaz, O. Demirors, O. Bilen, I. Dogan, Bridging the gap between business process modeling and software requirements analysis: a case study, in: MCIS 2010 Proc., 2010, p. Paper 20.
[36] R. Davis, E. Brabander, ARIS Design Platform Getting Started with BPM, Springer, London, 2007.
[37] T. Specht, J. Drawehn, M. Thränert, S. Kühne, Modeling cooperative business processes and transformation to a service oriented architecture, in: Proc. Seventh IEEE Int. Conf. E-Commerce Technol., IEEE, Munich, Germany, 2005, pp. 249–256.
[38] C.M. Chiao, V. Kunzle, M. Reichert, Integrated modeling of process- and data-centric software systems with PHILharmonicFlows, in: Commun. Bus. Process Softw. Model. Qual. Understandability, Maintainab. (CPSM), 2013 IEEE 1st Int. Work., 2013, pp. 1–10.
[39] B. Berenbach, D.J. Paulish, J. Kazmeier, A. Rudorfer, Software & Systems Requirements Engineering: In Practice, Vol. 29, McGraw-Hill, 2009.
[40] L. Hunnebeck, ITIL Service Design, 2nd edition, The Stationery Office, 2011.
[41] G. Hardy, Using IT governance and COBIT to deliver value with IT and respond to legal, regulatory and compliance challenges, Inf. Secur. Tech. Rep. 11 (1) (2006) 55–61.
[42] M.L. Smith, J. Erwin, Role and Responsibility Charting (RACI), Technical Report, Project Management Forum (PMForum), 2005.
[43] COSMIC, The COSMIC Functional Size Measurement Method Version 4.0 Measurement Manual, Technical Report, The Common Software Measurement International Consortium (COSMIC), 2014.
[44] E. Insfrán, O. Pastor, R. Wieringa, Requirements engineering-based conceptual modelling, Requir. Eng. 7 (2) (2002) 61–72.
[45] S. Goedertier, J. Vanthienen, Declarative process modeling with business vocabulary and business rules, in: R. Meersman, Z. Tari, P. Herrero (Eds.), Move to Meaningful Internet Syst. 2007 OTM 2007 Work. SE - 83, Lecture Notes in Computer Science, Vol. 4805, Springer Berlin Heidelberg, 2007, pp. 603–612.
[46] E. Reiter, R. Dale, Building applied natural language generation systems, Nat. Lang. Eng. 3 (1997) 57–87.
[47] E. Reiter, NLG vs. templates, in: Proceedings of the 5th European Workshop on Natural Language Generation, 1995, pp. 95–106.
[48] K.V. Deemter, M. Theune, E. Krahmer, Real vs. template-based natural language generation: a false opposition? Comput. Linguist. 31 (2003) 15–24.
[49] H. Leopold, S. Smirnov, J. Mendling, On the refactoring of activity labels in business process models, Inf. Syst. 37 (5) (2012) 443–459.
[50] D. Klein, C.D. Manning, Accurate unlexicalized parsing, 41st Meeting Assoc Comput. Linguist. (2003) 423–430.
[51] I. Benbasat, D.K. Goldstein, M. Mead, The case research strategy in studies of information systems, MIS Q. 11 (3) (1987) 369–386.
[52] R.K. Yin, Case Study Research: Design and Methods, 3rd Edition (Applied Social Research Methods, Vol. 5), SAGE Publications, Inc, 2002.
[53] J. Recker, Scientific Research in Information Systems: A Beginner’s Guide, Springer-Verlag Berlin Heidelberg, 2013.
[54] D. Firesmith, Specifying good requirements, J. Object Technol. 2 (4) (2003) 77–87.
[55] D. Zowghi, V. Gervasi, The three cs of requirements: consistency, completeness, and correctness, in: International Workshop on Requirements Engineering: Foundations for Software Quality, Essen, Germany: Essener Informatik Beiträge, 2002, pp. 155–164.
[56] X. Lu, Automatic analysis of syntactic complexity in second language writing, Int. J. Corpus Linguist. 15 (4) (2010) 474–496.
[57] M. Bollmann, Adapting SimpleNLG to German, in: Proceedings of the 13th European Workshop on Natural Language Generation, Association for Computational Linguistics, 2011, pp. 133–138.
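The four text-structure metrics of Section 4.5 (W/S, V/S, S/R, W/R) can be illustrated with a minimal Python sketch. It is not the analysis tooling used in the study: the sentence splitter and word tokenizer are naive regular expressions, and `is_verb` is a hypothetical caller-supplied predicate standing in for a part-of-speech tagger.

```python
import re
from statistics import mean


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence):
    """Naive tokenizer (assumption): runs of letters, allowing inner '-' or "'"."""
    return re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", sentence)


def structure_metrics(requirements, is_verb):
    """Compute W/S, V/S, S/R and W/R for a list of requirement texts.

    is_verb is a hypothetical predicate (lower-cased word -> bool); a real
    setup would rely on a part-of-speech tagger instead.
    """
    sentences_per_req = [split_sentences(r) for r in requirements]
    all_sentences = [s for sents in sentences_per_req for s in sents]
    word_counts = [len(words(s)) for s in all_sentences]
    verb_counts = [
        sum(1 for w in words(s) if is_verb(w.lower())) for s in all_sentences
    ]
    return {
        "W/S": mean(word_counts),
        "V/S": mean(verb_counts),
        "S/R": mean(len(sents) for sents in sentences_per_req),
        "W/R": mean(
            sum(len(words(s)) for s in sents) for sents in sentences_per_req
        ),
    }


if __name__ == "__main__":
    # Invented example requirements, not taken from the case study.
    reqs = [
        "The officer shall register the score. The system shall store it.",
        "The system shall notify the applicant.",
    ]
    verbs = {"register", "store", "notify"}
    print(structure_metrics(reqs, lambda w: w in verbs))
```

Per-sentence metrics are averaged over all sentences and per-requirement metrics over all requirements, so a verbose manual requirement raises S/R and W/R without necessarily changing W/S or V/S.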
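The two adaptation steps of Section 5 (translating the templates and adding an inflection mechanism) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the names `TEMPLATES`, `GERMAN_ARTICLES`, `turkish_gerund` and `render` are hypothetical, the German article is taken from a small nominative lookup table instead of SimpleNLG, and the gerund helper expects a lower-case verb stem.

```python
# Hypothetical slot templates mirroring the English, German and Turkish
# templates quoted in Section 5.
TEMPLATES = {
    "en": "The {role} shall {action} {object}",
    "de": "{article} {role} soll {object} {action}",
    "tr": "{role} {object} {action_gerund} işlemini {responsibility_verb}.",
}

# Assumption: nominative singular articles only, keyed by grammatical gender.
# The paper delegates German inflection to a tool such as SimpleNLG.
GERMAN_ARTICLES = {"m": "Der", "f": "Die", "n": "Das"}

BACK_VOWELS = set("aıou")    # "hard" vowels, take the suffix -ma
FRONT_VOWELS = set("eiöü")   # "soft" vowels, take the suffix -me


def turkish_gerund(stem):
    """Append -ma or -me depending on the last vowel of a lower-case stem."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            return stem + "ma"
        if ch in FRONT_VOWELS:
            return stem + "me"
    raise ValueError(f"no vowel found in verb stem: {stem!r}")


def render(lang, **slots):
    """Fill the template of the given language with the supplied slot values."""
    return TEMPLATES[lang].format(**slots)


if __name__ == "__main__":
    print(render("en", role="Project Officer",
                 action="carry out the operation of registering",
                 object="the proposal score"))
    print(render("de", article=GERMAN_ARTICLES["m"], role="Projektleiter",
                 object="die Registrierung der Angebotsbewertung",
                 action="vornehmen"))
    print(render("tr", role="Proje uzmanı", object="teklif puanını",
                 action_gerund=turkish_gerund("kaydet"),
                 responsibility_verb="yürütecektir"))
```

Keeping the word order in the template and the morphology in a separate helper reflects the split described in the text: supporting a further language means adding one template entry and one inflection routine, while the slot values still come from the requirements model.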

Dr. Banu Aysolmaz is an assistant professor at the Accounting and Information Management Department, Maastricht University. She worked as a post-doctoral researcher
and a Marie Curie fellow with the Department of Computer Science at the Vrije Universiteit Amsterdam. Her research interests include business process modeling, software
engineering, process model comprehension, and visualization. She obtained her PhD in information systems from Middle East Technical University (METU), Ankara, Turkey.
Her doctoral thesis received the 2014 METU thesis of the year award. She worked as a consultant in the areas of business process management and software process improvement
in many organizations in Turkey.

Dr. Henrik Leopold is an assistant professor with the Department of Computer Science at the Vrije Universiteit Amsterdam. His research interests include business process
modeling, natural language processing techniques, process model matching, and process architectures. His research has been published, among others, in Decision Support
Systems, IEEE Transactions on Software Engineering, and Information Systems. His doctoral thesis received the German Targion Award 2014 for the best dissertation in the
field of strategic information management.

Dr. Hajo A. Reijers is a full professor at Vrije Universiteit Amsterdam, where he heads the Information Management & Software Engineering group of the Computer Science
department. He is also a part-time, full professor at the Department of Mathematics & Computer Science of Eindhoven University of Technology. His expertise is in enterprise
systems, business process management, process mining, conceptual modeling, and workflow technology. Reijers has published over 150 scientific papers, chapters in edited
books, and articles in professional journals.

Dr. Onur Demirörs is a full professor at the Department of Computer Engineering, Izmir Institute of Technology and a visiting professor at the School of Computer Science
and Engineering, NSWU. He worked as the head of the software management program at the Middle East Technical University, and led the Software Management Research
Group and Bilgi Grubu Consultancy. His work focuses on software process improvement, software project management, software engineering education, software engineering
standards, software measurement, and organizational change management.
