Chapman & Hall/CRC
Computational Science Series
SERIES EDITOR
Horst Simon
Deputy Director
Lawrence Berkeley National Laboratory
Berkeley, California, U.S.A.
Software Engineering for Science
Edited by
Jeffrey C. Carver
University of Alabama, USA
George K. Thiruvathukal
Loyola University Chicago, Chicago, Illinois
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not
warrant the accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® soft-
ware or related products does not constitute endorsement or sponsorship by The MathWorks of a particular
pedagogical approach or particular use of the MATLAB® software.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit-
ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.
com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Contents
List of Figures xv
Acknowledgments xxv
Introduction xxvii
1.6 Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.7 Additional Future Considerations . . . . . . . . . . . . . . . 25
3.4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5 Testing Stochastic Software Using Pseudo-Oracles . . . . . . 77
3.5.1 The Huánglóngbìng SECI Model . . . . . . . . . . . . 78
3.5.2 Searching for Differences . . . . . . . . . . . . . . . . . 80
3.5.3 Experimental Methodology . . . . . . . . . . . . . . . 82
3.5.4 Differences Discovered . . . . . . . . . . . . . . . . . . 82
3.5.5 Comparison with Random Testing . . . . . . . . . . . 86
3.5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.7 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . 88
References 235
Index 265
Acknowledgments
Jeffrey C. Carver was partially supported by grants 1243887 and 1445344 from
the National Science Foundation. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the author(s) and do
not necessarily reflect the views of the National Science Foundation.
Neil P. Chue Hong was supported by the UK Engineering and Physical Sci-
ences Research Council (EPSRC) Grant EP/H043160/1 and EPSRC, BBSRC
and ESRC Grant EP/N006410/1 for the UK Software Sustainability Institute.
George K. Thiruvathukal was partially supported by grant 1445347 from
the National Science Foundation. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the author(s) and do
not necessarily reflect the views of the National Science Foundation.
MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please contact The MathWorks, Inc.
Introduction
General Overview
Scientific software is a special class of software that includes software devel-
oped to support various scientific endeavors that would be difficult, or impos-
sible, to perform experimentally or without computational support. Included
in this class of software are, at least, the following:
• Software that solves complex computationally- or data-intensive prob-
lems, ranging from large, parallel simulations of physical phenomena run
on HPC machines, to smaller simulations developed and used by groups
of scientists or engineers on a desktop machine or small cluster
• Applications that support scientific research and experiments, including
systems that manage large data sets
• Systems that provide infrastructure support, e.g. messaging middleware,
scheduling software
• Libraries for mathematical and scientific programming, e.g. linear alge-
bra and symbolic computing
The development of scientific software differs significantly from the devel-
opment of more traditional business information systems, from which many
software engineering best practices and tools have been drawn. These differ-
ences appear at various phases of the software lifecycle as outlined below:
• Requirements:
– Risks due to the exploration of relatively unknown scientific/engi-
neering phenomena
– Risks due to essential (inherent) domain complexity
– Constant change as new information is gathered, e.g. results of a
simulation inform domain understanding
• Design
software and which are not. Some of the ineffective practices may need further
refinements to fit within the scientific context. To increase our collective under-
standing of software engineering for science, this book consists of a collection
of peer-reviewed chapters that describe experiences with applying software
engineering practices to the development of scientific software.
Publications on this topic have grown in recent years, as evidenced by the ongoing Software Engineering for Science workshop series1 [1–5], workshops on software development held as part of the IEEE International Conference on eScience,2,3 and case studies submitted to the Working towards Sustainable Software for Science: Practice and Experiences workshop series.4,5 Books such as Practical Computing for Biologists [6] and
Effective Computation in Physics [8] have introduced the application of soft-
ware engineering techniques to scientific domains. In 2014, Nature launched a
new section, Nature Toolbox6 , which includes substantial coverage of software
engineering issues in research. In addition, this topic has been a longstanding
one in Computing in Science and Engineering (CiSE)7 , which sits at the inter-
section of computer science and complex scientific domains, notably physics,
chemistry, biology, and engineering. CiSE has also recently introduced a Software Engineering Track to focus more explicitly on these types of issues.8
EduPar is an education effort aimed at developing the specialized skill set (in
concurrent, parallel, and distributed computing) needed for scientific software
development [7]9 .
In terms of funding, the United States Department of Energy funded
the Interoperable Design of Extreme-Scale Application Software (IDEAS)
project.10 The goal of IDEAS is to improve scientific productivity of extreme-
scale science through the use of appropriate software engineering practices.
4 ...sustainable-software-for-science/
5 http://openresearchsoftware.metajnl.com/collections/special/working-towards-sustainable-software-for-science-practice-and-experiences/
6 http://www.nature.com/news/toolbox
7 http://computer.org/cise
8 https://www.computer.org/cms/Computer.org/ComputingNow/docs/2016-software-engineering-track.pdf
9 http://grid.cs.gsu.edu/~tcpp/curriculum/?q=edupar
10 http://ideas-productivity.org
The chapters underwent peer review from the editors and authors of other
chapters to ensure quality and consistency.
The chapters in this book are designed to be self-contained. That is, read-
ers can begin with whichever chapters interest them without reading the
prior chapters. In some cases, chapters have pointers to more detailed infor-
mation located elsewhere in the book. That said, Chapter 1 does provide a
detailed overview of the Scientific Software lifecycle. To group relevant ma-
terial, we organized the book into three sections. Please note that the ideas
expressed in the chapters do not necessarily reflect our own ideas. As this book
focuses on documenting the current state of software engineering in scientific
software development, we provide an unvarnished treatment of lessons learned
from a diverse set of projects.
Software Testing
This section provides examples of the use of testing in scientific software
development. The authors of chapters in this section highlight key issues as-
sociated with testing and how those issues present particular challenges for
scientific software development (e.g. test oracles). The chapters then describe
solutions and case studies aimed at applying testing to scientific software de-
velopment efforts. This section includes four chapters.
Chapter 4, Testing of Scientific Software: Impacts on Research Credibil-
ity, Development Productivity, Maturation, and Sustainability provides an
overview of key testing terminology and explains an important guiding prin-
ciple of software quality: understanding stakeholders/customers. The chapter
argues for the importance of automated testing and describes the specific
challenges presented by scientific software. Those challenges include testing
floating point data, scalability, and the domain model. The chapter finishes
with a discussion of test suite maintenance.
Chapter 5, Preserving Reproducibility through Regression Testing describes
how the practice of regression testing can help developers ensure that results
are repeatable as software changes over time. Regression testing is the prac-
tice of repeating previously successful tests to detect problems due to changes
to the software. This chapter describes two key challenges faced when testing
scientific software, the oracle problem (the lack of information about the ex-
pected output) and the tolerance problem (the acceptable level of uncertainty
in the answer). The chapter then presents a case study to illustrate how regres-
sion testing can help developers address these challenges and develop software
with reproducible results. The case study shows that without regression tests,
faults would have been more costly.
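As a minimal illustration of the idea (not the chapter's actual case study), the sketch below compares a newly computed quantity against a baseline recorded from a trusted earlier run, within an explicit tolerance; the kernel, baseline value, and tolerance are invented for the example.

#include <cmath>
#include <cstdio>
#include <cstdlib>

// Hypothetical simulation kernel whose result we want to keep stable across
// code changes. In a real code this would be a full run that writes out a
// small set of verifiable quantities.
double total_energy_after_step(double dt) {
    return 1.0 - std::exp(-dt);   // stand-in computation for the code under test
}

int main() {
    const double baseline  = 0.09516258196;   // recorded from a trusted earlier run
    const double tolerance = 1.0e-8;           // accepted uncertainty (the tolerance problem)

    const double current = total_energy_after_step(0.1);

    if (std::fabs(current - baseline) > tolerance) {
        std::fprintf(stderr, "REGRESSION: |%.12g - %.12g| > %g\n",
                     current, baseline, tolerance);
        return EXIT_FAILURE;    // the harness flags the change for review
    }
    std::puts("regression test passed");
    return EXIT_SUCCESS;
}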
Chapter 6, Building a Function Testing Platform for Complex Scientific
Code describes an approach to better understand and modularize complex
codes as well as generate functional testing for key software modules. The
chapter defines a Function Unit as a specific scientific function, which may be
implemented in one or more modules. The Function Unit Testing approach
targets code for which unit tests are sparse and aims to facilitate and expe-
dite validation and verification via computational experiments. To illustrate
the usefulness of this approach, the chapter describes its application to the
Terrestrial Land Model within the Accelerated Climate Modeling for Energy
(ACME) project.
Chapter 7, Automated Metamorphic Testing of Scientific Software ad-
dresses one of the most challenging aspects of testing scientific software, i.e.
the lack of test oracles. This chapter first provides an overview of the test or-
acle problem (which may be of interest even to readers who are not interested
in the main focus of this chapter). The lack of test oracles, often resulting from
the exploration of new science or the complexities of the expected results, leads
to incomplete testing that may not reveal subtle errors. Metamorphic testing
addresses this problem by developing test cases through metamorphic rela-
tions. A metamorphic relation specifies how a particular change to the input
should change the output. The chapter describes a machine learning approach
to automatically predict metamorphic relations which can then serve as test
oracles. The chapter then illustrates the approach on several open source sci-
entific programs as well as on in-house developed scientific code called SAXS.
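To make the idea concrete, a minimal sketch follows, using an invented function under test (a sample mean standing in for a computation without a practical oracle) and two common metamorphic relations; it is illustrative only and is not drawn from the chapter's SAXS code.

#include <cassert>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

// Function under test: a simple sample mean stands in for a scientific
// computation whose exact output has no practical oracle.
double mean(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 0.0) / x.size();
}

int main() {
    const std::vector<double> x = {3.1, -2.7, 8.4, 0.5, 1.9};
    const double base = mean(x);
    const double tol  = 1e-12;

    // Metamorphic relation 1: permuting the inputs must not change the mean.
    const std::vector<double> shuffled = {8.4, 0.5, 3.1, 1.9, -2.7};
    assert(std::fabs(mean(shuffled) - base) < tol);

    // Metamorphic relation 2: adding a constant c to every input must shift
    // the mean by exactly c.
    const double c = 10.0;
    std::vector<double> shifted = x;
    for (double& v : shifted) v += c;
    assert(std::fabs(mean(shifted) - (base + c)) < tol);

    std::puts("metamorphic relations hold");
    return 0;
}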
Experiences
This section provides examples of applying software engineering techniques
to scientific software. Scientific software encompasses not only computational
modeling, but also software for data management and analysis, and libraries
that support higher-level applications. In these chapters, the authors describe
their experiences and lessons learned from developing complex scientific soft-
ware in different domains. The challenges are both cultural and technical. The
ability to communicate and diffuse knowledge is of primary importance. This
section includes three chapters.
Chapter 8, Evaluating Hierarchical Domain-Specific Languages for Compu-
tational Science: Applying the Sprat Approach to a Marine Ecosystem Model
examines the role of domain-specific languages for bridging the knowledge
transfer gap between the computational sciences and software engineering.
The chapter defines the Sprat approach, a hierarchical model in the field of
marine ecosystem modeling. Then, the chapter illustrates how developers can
implement scientific software utilizing a multi-layered model that enables a
clear separation of concerns allowing scientists to contribute to the develop-
ment of complex simulation software.
Chapter 9, Providing Mixed-Language and Legacy Support in a Library:
Experiences of Developing PETSc summarizes the techniques developers em-
ployed to build the PETSc numerical library (written in C) to portably and
efficiently support its use from modern and legacy versions of Fortran. The
chapter provides concrete examples of solutions to challenges facing scien-
tific software library maintainers who must support software written in legacy
versions of programming languages.
Chapter 10, HydroShare — A Case Study of the Application of Mod-
ern Software Engineering to a Large, Distributed, Federally-Funded, Scien-
tific Software Development Project presents a case study on the challenges of
introducing software engineering best practices such as code versioning, con-
tinuous integration, and team communication into a typical scientific software
development project. The chapter describes the challenges faced because of
differing skill levels, cultural norms, and incentives along with the solutions
developed by the project to diffuse knowledge and practice.
Chapter 1
- The development lifecycle for scientific software must reflect stages that
are not present in most other types of software, including model devel-
opment, discretization, and numerical algorithm development.
- The requirements evolve during the development cycle because the re-
quirements may themselves be the subject of the research.
- Modularizing multi-component software to achieve separation of concerns is an important task, but it is difficult to achieve due to the monolithic nature of the software and the need for performance.
- The development of scientific software (especially multiphysics, multi-
domain software) is challenging because of the complexity of the un-
derlying scientific domain, the interdisciplinary nature of the work, and
other institutional and cultural challenges.
- Balancing continuous development with ongoing production requires
open development with good contribution and distribution policies.
Chapter 2
Chapter 3
- Scientific software is often difficult to test because it is used to answer
new questions in experimental research.
Chapter 7
- The oracle problem poses a major challenge for conducting systematic
automated testing of scientific software.
- Metamorphic testing can be used for automated testing of scientific soft-
ware by checking whether the software behaves according to a set of
metamorphic relations, which are relationships between multiple input
and output pairs.
- When used in automated unit testing, a metamorphic testing approach
is highly effective in detecting faults.
Chapter 8
- Scientists can use domain-specific languages (DSLs) to implement well-
engineered software without extensive software engineering training.
- Integration of multiple DSLs from different domains can help scientists
from different disciplines collaborate to implement complex and coupled
simulation software.
- DSLs for scientists must have the following characteristics: an appropriate level of abstraction for the meta-model, a syntax that allows scientists to experiment quickly, tool support, and working code examples as documentation.
Chapter 9
- Multi-language software, specifically combinations of Fortran, C, and C++, is still important and requires care on the part of library developers, who benefit from concrete guidance on how to call Fortran from C/C++ and how to call C/C++ from Fortran.
- Mapping of all common C-based constructs in multiple versions of For-
tran allows developers to use different versions of Fortran in multi-
language software.
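As a hedged sketch of one direction of this interoperability (not PETSc's actual interface), the example below exposes a C++ routine with C linkage so that a Fortran caller could bind to it through ISO_C_BINDING; the routine name and arguments are invented, and the corresponding Fortran interface is shown only as a comment.

#include <cstdio>

// C linkage gives the symbol an unmangled name, so a Fortran caller can bind
// to it. A matching Fortran 2003 interface might look like:
//
//   interface
//     subroutine axpy_c(n, alpha, x, y) bind(c, name="axpy_c")
//       use iso_c_binding
//       integer(c_int), value :: n
//       real(c_double), value :: alpha
//       real(c_double)        :: x(*), y(*)
//     end subroutine
//   end interface
//
// (The Fortran side is illustrative; this file compiles and runs on its own.)
extern "C" void axpy_c(int n, double alpha, const double* x, double* y) {
    for (int i = 0; i < n; ++i) {
        y[i] += alpha * x[i];   // y := alpha*x + y
    }
}

int main() {
    const double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {0.5, 0.5, 0.5};
    axpy_c(3, 2.0, x, y);                            // call as a Fortran program would
    std::printf("%g %g %g\n", y[0], y[1], y[2]);     // prints 2.5 4.5 6.5
    return 0;
}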
Chapter 10
- Use of modern software engineering practices helps increase the sus-
tainability, quality and usefulness of large scientific projects, thereby
enhancing the career of the responsible scientists.
- Use of modern software engineering practices enables software develop-
ers and research scientists to work together to make new and valuable
contributions to the code base, especially from a broader community
perspective.
- Use of modern software engineering practices on large projects increases
the overall code capability and quality of science results by propagating
these practices to a broader community, including students and post-
doctoral researchers.
Chapter 1
Software Process for Multiphysics
Multicomponent Codes
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Development Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Verification and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 Maintenance and Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.4 Performance Portability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.5 Using Scientific Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Key Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Domain Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Key Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Institutional and Cultural Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Key Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.1 FLASH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.1.1 Code Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.1.2 Verification and Validation . . . . . . . . . . . . . . . . 13
1.5.1.3 Software Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.1.4 Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.2 Amanzi/ATS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5.2.1 Multiphysics Management through Arcos . 20
1.5.2.2 Code Reuse and Extensibility . . . . . . . . . . . . . 20
1.5.2.3 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.2.4 Performance Portability . . . . . . . . . . . . . . . . . . . 22
Key Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6 Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Key Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.7 Additional Future Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.1 Introduction
Computational science and engineering communities develop complex ap-
plications to solve scientific and engineering challenges, but these communi-
ties have a mixed record of using software engineering best practices [43, 296].
Many codes developed by scientific communities adopt standard software prac-
tices when the size and complexity of an application become too unwieldy to
continue without them [30]. The driving force behind adoption is usually the
realization that without using software engineering practices, the develop-
ment, verification, and maintenance of applications can become intractable.
As more codes cross the threshold into increasing complexity, software engi-
neering processes are being adopted from practices derived outside the scien-
tific and engineering domain. Yet the state of the art for software engineering
practices in scientific codes often lags behind that in the commercial soft-
ware space [16, 36, 52]. There are many reasons: lack of incentives, support,
and funding; a reward system favoring scientific results over software develop-
ment; and limited understanding of how software engineering should be promoted
to communities that have their own specific needs and sociology [22, 35].
Some software engineering practices have been better accepted than oth-
ers among the developers of scientific codes. The ones that are used often
include repositories for code version control, licensing process, regular test-
ing, documentation, release and distribution policies, and contribution poli-
cies [21, 22, 30, 32]. Less accepted practices include code review, code depreca-
tion, and adoption of specific practices from development methodologies such
as Agile [9]. Software best practices that may be effective in commercial soft-
ware development environments are not always suited for scientific environ-
ments, partly because of sociology and partly because of technical challenges.
Sociology manifests itself as suspicion of too rigid a process or not seeing the
point of adopting a practice. The technical challenges arise from the nature
of problems being addressed by these codes. For example, multiphysics and
multicomponent codes that run on large high-performance computing (HPC)
platforms put a large premium on performance. In our experience, good perfor-
mance is most often achieved by sacrificing some of the modularity in software
architecture (e.g. [28]). Similarly, lateral interactions in physics get in the way of encapsulation (see Sections 1.3 and 1.4 for more examples and details).
This chapter elaborates on the challenges and how they were addressed in
FLASH [26, 33] and Amanzi [41], two codes with very different development
timeframes and therefore very different development paths. FLASH, whose
development began in the late 1990s, is among the first generation of codes
that adopted a software process. This was in the era when the advantages
of software engineering were almost unknown in the scientific world. Amanzi
is from the “enlightened” era (by scientific software standards), in which a minimal set of software practices is adopted by most code projects intending
long-term use. A study of the software engineering of these codes, which come from different eras of scientific software development, highlights how these practices and the communities have evolved.
FLASH was originally designed for computational astrophysics. It has been
almost continuously under production and development since 2000, with three
major revisions. It has exploited an extensible framework to expand its reach
and is now a community code for over half a dozen scientific communities.
The adoption of software engineering practices has grown with each version
change and expansion of capabilities. The adopted practices themselves have
evolved to meet the needs of the developers at different stages of development.
Amanzi, on the other hand, started in 2012 and was developed from the ground up in C++ using relatively modern software engineering practices. It still has
one major target community but is also designed with extensibility as an ob-
jective. Many other similarities and some differences are described later in the
chapter. In particular, we address the issues related to software architecture
and modularization, design of a testing regime, unique documentation needs
and challenges, and the tension between intellectual property management
and open science.
The next few sections outline the challenges that are either unique to, or
are more dominant in scientific software than elsewhere. Section 1.2 outlines
the possible lifecycle of a scientific code, followed by domain specific technical
challenges in Section 1.3. Section 1.4 describes the technical and sociological
challenges posed by the institutions where such codes are usually developed.
Section 1.5 presents a case study of FLASH and Amanzi developments. Sec-
tions 1.6 and 1.7 present general observations and additional considerations
for adapting the codes for the more challenging platforms expected in the
future.
1.2 Lifecycle
Scientific software is designed to model phenomena in the physical world.
The term physical includes chemical and biological systems since physical pro-
cesses are also the underlying building blocks for these systems. A phenomenon
may be microscopic (e.g. protein folding) or it can have extremely large or mul-
tiple scales (e.g. supernovae explosions). The physical characteristics of the
system being studied translate to mathematical models that describe their es-
sential features. These equations are discretized so that numerical algorithms
can be used to solve them. One or more parts of this process may themselves
be subjects of active research. Therefore the simulation software development
requires diverse expertise and adds many stages in the development and life-
cycle that may not be encountered elsewhere.
quantities) need to be built into the verification process. This issue is discussed
in greater detail in Chapter 4.
Testing of scientific software needs to reflect the layered complexity of the
codes. The first line of attack is to develop unit tests that isolate testing of
individual components. In scientific codes, however, dependencies often exist between components that cannot be meaningfully isolated, making unit testing more difficult. In these cases, testing should be performed with the minimal possible combination of components. In effect, these
minimally combined tests behave like unit tests because they focus on possible
defects in a narrow section of the code. In addition, multicomponent scientific
software should test various permutations and combinations of components in
different ways. Configuring tests in this manner can help verify that the con-
figurations of interest are within the accuracy and stability constraints (see
Section 1.5.1.2 for an example of testsuite configuration for FLASH).
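The following sketch, with invented component names, illustrates the minimal-combination idea: a solver that cannot be exercised without a mesh is tested together with just that mesh, and the test checks a narrow property (conservation of a summed quantity) so that a failure points at this specific pairing.

#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Two hypothetical components that cannot be meaningfully tested in
// isolation: a 1-D mesh that owns the state, and a smoothing operator
// that acts on it.
struct Mesh1D {
    std::vector<double> u;
    explicit Mesh1D(std::size_t n) : u(n, 0.0) {}
};

// Conservative averaging pass over interior cells; boundary cells untouched.
void smooth(Mesh1D& m) {
    std::vector<double> next = m.u;
    for (std::size_t i = 1; i + 1 < m.u.size(); ++i)
        next[i] = 0.5 * m.u[i] + 0.25 * (m.u[i - 1] + m.u[i + 1]);
    m.u = next;
}

double total(const Mesh1D& m) {
    double s = 0.0;
    for (double v : m.u) s += v;
    return s;
}

int main() {
    Mesh1D mesh(16);
    mesh.u[8] = 1.0;                  // localized initial condition
    const double before = total(mesh);

    for (int step = 0; step < 5; ++step) smooth(mesh);

    // The combined mesh+operator test behaves like a unit test for this
    // pairing: the summed quantity must be conserved while the profile
    // stays away from the boundaries.
    assert(std::fabs(total(mesh) - before) < 1e-12);
    std::puts("minimal-combination test passed");
    return 0;
}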
Key Insights
• Several stages of development precede software development in science,
including translation of physical processes into mathematical models,
discretization, convergence, and stability testing of numerics.
• Software users must understand both their tools and the limitations of those tools.
as these are not common in software outside of scientific domains. Section 1.5
describes how these challenges have been met by FLASH and Amanzi.
Multiphysics multiscale codes often require tight integration with third-
party software, which comes in the form of numerical libraries. Because mul-
tiphysics codes combine expertise from many domains, the numerical solvers
they use also require diverse applied mathematics expertise. It can be chal-
lenging for any one team to assemble all the necessary expertise to develop
their own software. Many, therefore, turn to third-party math libraries for
highly optimized routines. As mentioned in Section 1.2.5, the use of third-
party software does not absolve developers from understanding its appropriate use.
Additionally, information about appropriate use of third-party software within
the context of a larger code must also be communicated to the users of the
code.
Key Insights
• Multiphysics codes need modularization for separation of concerns, but
modularization can be hard to achieve because of lateral interactions
inherent in the application.
• Codes can use third-party libraries to fill their own expertise gap, but
they must understand the characteristics and limitations of the third-
party software.
carved out of scientific goal-oriented projects that have their own priorities
and timeline. This model often ends up shortchanging the software engineer-
ing. The scientific output of applications is measured in terms of publications,
which in turn depend on data produced by the simulations. Therefore, in a
project driven purely by scientific objectives, the short-term science goals can
lead to situations where quick-and-dirty triumphs over long-term planning
and design. The cost of future lost productivity may not be appreciated until
much later when the code base has grown too large to remove its deficiencies
easily. Software engineering is forcibly imposed on the code, which is at best
a band-aid solution.
Another institutional challenge in developing good software engineering
practices for scientific codes is training students and staff to use the applica-
tion properly. Multiphysics codes require a broad range of expertise in domain
science from their developers, and software engineering skill is an added re-
quirement. Often experts in a domain science who develop scientific codes are
not trained in software engineering and many learn skills on the job through
reading or talking to colleagues [43,296]. Practices are applied as the scientists
understand them, usually picking only what is of most importance for their
own development. This practice can be both good and bad: good because it sifts out the unnecessary aspects of SE practice, and bad because the aspects that are sifted out are not always truly unnecessary. It might just be that the person adopting the practice did not understand their usefulness and impact.
Institutional challenges also arise from the scarcity and stability of re-
sources apart from funding. Deep expertise in the domain may be needed to
model the phenomenon correctly, and that kind of expertise is relatively rare.
Additionally, domain and numerical algorithmic expertise is rarely replicated
in a team developing the multiphysics scientific application. Then there is
the challenge of communicating the model to the software engineer, if one is
on the team, or to team members with some other domain expertise. Such
communications require at least a few developers on the team who can act as
interpreters for various domain expertise and are able to integrate the ideas.
Such abilities take a lot of time and effort to develop, neither of which is easy in
academic institutions where these codes are typically organically grown. The
available human resources in these institutions are postdocs and students who
move on, leaving little retention of institutional knowledge about the code. A
few projects that do see the need for software professionals struggle to find
ways of funding them or providing a path for their professional growth.
These institutional challenges are among the reasons that it is hard and
sometimes even undesirable to adopt any rigid software development method-
ology in scientific application projects. For example, the principles behind the
agile manifesto apply, but not all the formalized processes do. Agile software
methods [9] are lightweight evolutionary development methods with focus on
adaptability and flexibility, as opposed to waterfall methods which are se-
quential development processes where progress is perceived as a downward
flow [11]. Agile methods aim to deliver working software as early as possible
within the lifecycle and improve it based on user feedback and changing needs.
These aims fit well with the objectives of scientific software development as
well. The code is developed by interdisciplinary teams where interactions and
collaborations are preferred over regimented process. The code is simultane-
ously developed and used for science, so that when requirements change, there
is quick feedback. For the same reason, the code needs to be in working con-
dition almost all the time. However, scarcity of resources does not allow the
professional roles in the agile process to be played out efficiently. No clear sep-
aration exists between the developer and the client; many developers of the
code are also scientists who use it for their research. Because software devel-
opment goes hand in hand with research and exploration of algorithms, doing
either within fixed timeframe is impossible. This constraint effectively elimi-
nates using agile methods such as sprints or extreme programming [22]. The
waterfall model is even less useful because it is not cost effective or even pos-
sible to have a full specification ahead of time. The code has to grow and alter
organically as the scientific understanding grows, the effects of using technologies are digested, and requirements change. A reasonable solution is to adopt
those elements of the methodologies that match the needs and objectives of
the team, adjust them where needed, and develop their own processes and
methodologies where none of the available options apply.
Because of the need for deep expertise, and the fact that the developer
of a complex physics module is almost definitely going to leave with possibly
no replacement, documentation of various kinds takes on a crucial role. It
becomes necessary to document the algorithm, the implementation choices,
and the range of operation. The generally preferred practice of writing self-
explanatory code helps but does not suffice. To an expert in the field who has
comprehensive understanding of the underlying math, such a code might be
accessible without inline documentation. But it is not to people from another
field or a software engineer in the team (if there is one) who may have reasons
to look at the code. For longevity and extensibility, a scientific code must have
inline documentation explaining the implementation logic and reasons behind
the choices made.
Key Insights
• The benefits of investment in software design or process are not appre-
ciated, and the funding model is not helpful in promoting them either.
• Development requires interdisciplinary teams with good communication,
which is difficult in academic institutions.
• Methodologies get a better foothold if they are flexible and adapt to the
needs of the development team.
1.5.1 FLASH
The FLASH code [13, 26] has been under development for nearly two
decades at the Flash Center at the University of Chicago. The code was orig-
inally developed to simulate thermonuclear runaways in astrophysics such as
novae and supernovae. It was created out of an amalgamation of three legacy
codes: Prometheus for shock hydrodynamics, PARAMESH for adaptive mesh
refinement (AMR), and locally developed equation of state and nuclear burn code. It has slowly evolved into well-architected, extensible software with a user base in over half a dozen scientific communities. FLASH has been applied
to a variety of problems including supernovae, X-ray bursts, galaxy clusters,
stellar structure, fluid instabilities, turbulence, laser-experiments design and
analysis, and nuclear reactor rods. It supports an Eulerian mesh combined
with a Lagrangian framework to cover a large class of applications. Physics
capabilities include compressible hydrodynamics and magnetohydrodynamics
solvers, nuclear burning, various forms of equations of state, radiation, laser
drive, and fluid-structure interactions.
code units. A setup tool parsed this information to configure a consistent ap-
plication. The setup tool also interpreted the configuration DSL to implement
inheritance using the directory structure. For more details about FLASH’s
object-oriented framework see [26].
FLASH is designed with separation of concerns as an objective, which is
achieved by separating the infrastructural components from physics. The ab-
straction that permits this approach is well known in scientific codes, that of
decomposing a physical domain into rectangular blocks surrounded by halo
cells copied over from the surrounding neighboring blocks. To a physics oper-
ator, the whole domain is not distinguishable from a box. Another necessary
aspect of the abstraction is not to let any of the physics modules own the
state variables. They are owned by the infrastructure that decomposes the
domain into blocks. A further separation of concern takes place within the
units handling the infrastructure, that of isolating parallelism from the bulk
of the code. Parallel operations such as ghost cell fill, refluxing, or regridding
have minimal interleaving with state update obtained from applying physics
operators. To distance the solvers from their parallel constructs, the required
parallel operations provide an API with corresponding functions implemented
as a subunit. The implementation of numerical algorithms for physics opera-
tors is sequential, interspersed with access to the parallel API as needed.
Minimization of data movement is achieved by letting the state be com-
pletely owned by the infrastructure modules. The dominant infrastructure
module is the Eulerian mesh, owned and managed by the Grid unit. The
physics modules query the Grid unit for the bounds and extent of the block
they are operating on and get a pointer to the physical data. This arrange-
ment works in most cases but gets tricky where the data access pattern does
not conform to the underlying mesh. An example is any physics dealing with
Lagrangian entities (LEs). They need a different data structure, and the data
movement is dissimilar from that of the mesh. Additionally, the LEs interact
with the mesh, so maintaining physical proximity of the corresponding mesh
cell is important in their distribution. This is an example of unavoidable lat-
eral interaction between modules. In order to advance, LEs need to get field
quantities from the mesh and then determine their new locations internally.
They may need to apply near- and far-field forces or pass some information
along to the mesh or be redistributed after advancing in time. FLASH solves
this conundrum by keeping the LE data structure extremely simple and using
argument passing by reference in the APIs. The LEs are attached to the block
in the mesh that has the overlapping cell; an LE leaves its block when its
location no longer overlaps with the block. Migration to a new block is an inde-
pendent operation from everything else that happens to the LEs. In FLASH
parlance this is the Lagrangian framework (see [29] for more details). The
combination of Eulerian and Lagrangian frameworks that interoperate well
with one another has succeeded in largely meeting the performance-critical
data management needs of the code.
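A greatly simplified sketch of this separation of concerns follows; the class and function names are illustrative and are not FLASH's actual API. The infrastructure unit owns the block-decomposed state and the parallel operations, while the physics operator only asks a block for its bounds and a pointer to its data.

#include <cstdio>
#include <vector>

// Infrastructure unit: owns the decomposed state; physics never owns it.
struct Block {
    int lo, hi;                  // index bounds of the interior cells
    std::vector<double> data;    // interior plus halo cells
};

class Grid {
public:
    Grid(int nblocks, int cells_per_block) {
        for (int b = 0; b < nblocks; ++b)
            blocks_.push_back(Block{1, cells_per_block,
                                    std::vector<double>(cells_per_block + 2, 1.0)});
    }
    int numBlocks() const { return static_cast<int>(blocks_.size()); }
    Block& block(int b) { return blocks_[b]; }
    void fillHalos() { /* parallel ghost-cell exchange would live here */ }
private:
    std::vector<Block> blocks_;
};

// Physics operator: sequential numerics, unaware of domain decomposition or
// parallelism; it queries the block for bounds and a pointer to the data.
void relax(Block& blk) {
    double* u = blk.data.data();
    for (int i = blk.lo; i <= blk.hi; ++i)
        u[i] = 0.5 * (u[i - 1] + u[i + 1]);
}

int main() {
    Grid grid(4, 8);
    grid.fillHalos();                            // infrastructure concern
    for (int b = 0; b < grid.numBlocks(); ++b)
        relax(grid.block(b));                    // physics concern, block by block
    std::puts("advanced one step on all blocks");
    return 0;
}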
correctly start from a checkpoint without any loss of state information. The
FLASH testsuite configures and builds each test every time it is run; there-
fore, build and configuration tests are built-in. Progress of a test is reported
at every stage, the final stage being the outcome of the comparison. Because
FLASH has many available application configurations whose union provides
good code coverage, the task of building a test suite is simplified to selection
among existing applications. In many applications the real challenge is picking
parameters that exercise the targeted features without running for too long.
We use a matrix to ensure maximum coverage, where infrastructure features
are placed along the rows and physics modules are placed along columns. For
each selected test all the covered features are marked off in the matrix. Mark-
ing by the same test in two or more places in the same row or same column
represents interoperability among the corresponding entities. The following
order is used for filling the matrix:
• Unit tests
• Setups used in science production runs
• Setups known to be sensitive to perturbations
• Simplest and fastest setups that fill the remaining gaps
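One way to represent the coverage matrix described above in a small script is sketched below; the feature, module, and test names are invented for illustration. Cells that no test marks off identify the gaps that the last step of the ordering is meant to fill.

#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

int main() {
    // Rows: infrastructure features; columns: physics modules.
    const std::vector<std::string> infra   = {"AMR", "UniformGrid", "IO"};
    const std::vector<std::string> physics = {"Hydro", "Gravity", "Burn"};

    // Each selected test marks off the (infrastructure, physics) cells it covers.
    std::map<std::pair<std::string, std::string>, std::set<std::string>> covered;
    covered[{"AMR", "Hydro"}].insert("testA");
    covered[{"AMR", "Gravity"}].insert("testA");   // same test twice in one row: interoperability
    covered[{"UniformGrid", "Hydro"}].insert("testB");
    covered[{"IO", "Burn"}].insert("testC");

    // Report any cell of the matrix that no test exercises.
    for (const auto& row : infra)
        for (const auto& col : physics)
            if (covered.find({row, col}) == covered.end())
                std::printf("uncovered: %s x %s\n", row.c_str(), col.c_str());
    return 0;
}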
FLASH’s testing can be broadly classified into three categories: the daily
testing, to verify ongoing correctness of the code; more targeted testing re-
lated to science production runs; and porting to and testing on new platforms.
Daily testing is performed on multiple combinations of platforms and software
stacks and uses the methodology described above. In preparing for a produc-
tion schedule, testing is a combination of scaling tests, cost estimation tests,
and looking for potential trouble spots. Scientists and developers work closely
to devise meaningful weak-scaling tests (which can be difficult because of
nonlinearity and adaptive mesh refinement), and tests that can exercise the
vulnerable code sections without overwhelming the test suite resources. Sam-
ple smaller scale production runs are also performed on the target platform to
help make informed estimates of CPU hours and disk space needed to com-
plete the simulation. For more details on simulation planning see [27]. For
porting the code to a new platform, a successful production run from the past
is used as a benchmark for exercising the code on a new platform, along with
a subset of the standard test suite.
FLASH has had some opportunities for validation against experiments. For
example, FLASH could model a variety of laboratory experiments involving
fluid instabilities [25, 38]. These efforts allowed researchers to probe the valid-
ity of models and code modules, and also bolstered the experimental efforts
by creating realistic simulation capabilities for use in experimental design.
The newer high-energy density physics (HEDP) initiative involving FLASH is
directed at simulation-based validation and design of experiments at the ma-
jor laser facilities in the United States and Europe. Other forms of validation
have been convergence tests for the flame model that is used for supernova
simulations, and validation of various numerical algorithms against analyti-
cal solutions of some known problems. For example, the Sedov [12] problem,
which is seeded by a pressure spike in the center that sends out a spherical
shock-wave into the domain, has a known analytical solution. It is used to val-
idate hydrodynamics in the code. Several other similar examples exist where a
simple problem can help validate a code capability through known analytical
solutions.
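The pattern of such a check can be sketched with a deliberately simple stand-in problem whose analytical solution is known (exponential decay integrated with forward Euler), rather than the Sedov problem itself; a real validation would compare full solution profiles, and the tolerance would reflect the method's expected discretization error.

#include <cmath>
#include <cstdio>
#include <cstdlib>

// Numerical solver under validation: forward-Euler integration of du/dt = -k*u.
double integrate(double u0, double k, double t_end, int nsteps) {
    double u = u0;
    const double dt = t_end / nsteps;
    for (int i = 0; i < nsteps; ++i) u -= dt * k * u;
    return u;
}

int main() {
    const double u0 = 1.0, k = 2.0, t_end = 1.0;
    const double exact = u0 * std::exp(-k * t_end);    // known analytical solution

    const double numeric   = integrate(u0, k, t_end, 10000);
    const double tolerance = 1.0e-4;                    // allows for discretization error

    if (std::fabs(numeric - exact) > tolerance) {
        std::fprintf(stderr, "validation failed: %g vs %g\n", numeric, exact);
        return EXIT_FAILURE;
    }
    std::puts("validation against analytical solution passed");
    return EXIT_SUCCESS;
}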
with the ongoing modifications to the older version (needed by the scientists
to do their work) turned the completion of the transition into a moving target.
Because of these lessons learned, the second transition took a completely
different approach and was much more successful. The infrastructural back-
bone/framework for the new version was built in isolation from the old version
in a new repository. The framework design leveraged the knowledge gained by
the developers about the idiosyncrasies of the solvers in earlier versions and
focused on the needs of the future version. There was no attempt at backward
compatibility with the framework of the previous version. Once the framework
was thoroughly tested, physics modules were transitioned. Here the emphasis
was on transitioning all the capabilities needed for one project at the same
time, starting with the most stable modules. Once a module was moved to the
new version, it was effectively frozen in the old version (the reason for selecting
the most stable and mature code sections). Any modification after that point
had to be made simultaneously in the new version as well. Although it sounds
like a lot of duplicate effort, in reality such instances were rare. This version
transition was adopted by the scientists quickly.
FLASH’s documentation takes a comprehensive approach with a user’s
guide, a developer’s guide, robodoc API, inline documentation, and online
resources. Each type of documentation serves a different purpose and is in-
dispensable to the developers and users of the code. Scripts are in place that
look for violations of coding standards and documentation requirements. The
user’s guide documents the mathematical formulation, algorithms used, and
instructions on using various code components. It also includes examples of
relevant applications explaining the use of each code module. The developer’s
guide specifies the design principles and coding standards with an extensive
example of the module architecture. Each function in the API is required
to have a robodoc header explaining the input/output, function, and special
features of the function. Except for the third-party software, every nontrivial
function in the code is required to have sufficient inline documentation so that
a nonexpert can understand and maintain the code.
FLASH effectively has two versions of release: internal, which is close to
the agile model, and general, which is no more than twice a year. The internal
release amounts to tagging a stable version in the repository for the internal
users of the code. This signals to the users that a forward merge into their
production branch is safe. General releases have a more rigorous process that
makes them more expensive and therefore infrequent. For a general release
the code undergoes pruning, checking for compliance with coding and docu-
mentation standards, and more stringent than usual testing. The dual model
ensures that the quality of code and documentation is maintained without un-
duly straining the team resources, while near-continuous code improvement is
still possible for ongoing projects.
1.5.1.4 Policies
In any project, policies regarding attributions, contributions and licensing
matter. In scientific domains, intellectual property rights and interdisciplinary
interactions are additional policy areas that are equally important. Some of
these policy requirements are a direct consequence of the strong gatekeeping
regimes that the majority of publicly distributed scientific software follows. Many arguments are advanced for the dominance of this model in the domain; the most compelling one relates to maintaining the quality of software. Recollect that
the developers in this domain are typically not trained in software engineering
and that software quality control varies greatly between individuals and/or
groups of developers. Because of tight, and sometimes lateral, coupling be-
tween functionalities of code modules, a lower-quality component introduced
into the code base can have a disproportionate impact on the overall relia-
bility of output produced by the code. Strong gatekeeping is desirable, and
that implies having policies in place for accepting contributions. FLASH again
differentiates between internal and external contributors in this regard. The
internal contributors are required to meet the quality requirements such as
coding standards, documentation, and code verification in all their develop-
ment. Internal audit processes minimize the possibility of poorly written and
tested code from getting into a release. The internal audit also goes through
a periodic pruning to ensure that bad or redundant code gets eliminated.
The external contributors are required to work with a member of the in-
ternal team to include their code in the released version. The minimum set
required from them is (1) code that meets coding standards and has been
used or will be used for results reported in peer-reviewed publication; (2)
at least one test that can be included in the test-suite for nightly testing;
(3) documentation for the user’s guide, robodoc documentation for any API
functions, and inline documentation explaining the flow of control; and (4)
a commitment to answer questions on a user’s mailing list. The contributors
can negotiate the terms of release; a code section can be excluded from the
release for a mutually agreed period of time in order to enable contributors
to complete their research and publish their work before the code becomes
public. This policy permits potential contributors to be freed from the ne-
cessity of maintaining their code independently while still retaining control
over their software until an agreed-upon release time. As a useful side effect
their code remains in sync with the developments in the main branch between
releases.
Another model of external contribution to FLASH involves no interven-
tion from the core gate-keeping team. In this model anyone can host any
FLASH-compatible code on their site. The code has no endorsement from
the distributing entity, the Flash Center, which does not take responsibility
for its quality. The Flash Center maintains a list of externally hosted “as-is”
code sites; the support for these code sections is entirely the responsibility of
the hosting site.
1.5.2 Amanzi/ATS
Amanzi and its sister code the Advanced Terrestrial Simulator (ATS),
provide a good contrasting example to FLASH. Developed starting in 2012
as the simulation capability for the U.S. Department of Energy’s Environ-
mental Management program, Amanzi solves equations for flow and reactive
transport in porous media, with intended applications of environmental re-
mediation for contaminated sites [42]. Built on Amanzi’s infrastructure, ATS
adds physics capability to solve equations for ecosystem hydrology, including
surface/subsurface hydrology, energy and freeze/thaw cycles, surface energy
balance and snow, and vegetation modeling [15,47]. Amanzi was initially sup-
ported by a development team of several people with dedicated development
money. ATS was largely developed by one person, postdocs, and a growing set
of collaborators from the broader community and was supported by projects
whose deliverables are ecosystem hydrology papers.
Amanzi/ATS’s history makes it a good contrast to FLASH. Developed
from the ground up in C++ using relatively modern software engineering
practices, it has few legacy code issues. Unlike FLASH, Amanzi/ATS makes
extensive use of third-party libraries, with associated advantages and disad-
vantages (currently Amanzi/ATS uses nearly 10k lines of cmake to build it and
its libraries). However, they also share a lot of commonalities. Like FLASH,
version control has played a critical role in the development process, espe-
cially because developers are spread across multiple physical locations and
networks. Like FLASH, Amanzi/ATS makes extensive use of module-level
and regression-level testing to ensure correctness and enable refactoring. And
like FLASH, Amanzi/ATS has found the open source strategy to be incred-
ibly useful; in particular, the open source nature of the code has eliminated
1.5.2.3 Testing
Testing is an extremely sensitive subject in computational software engi-
neering, so much so that it merits its own chapter (Chapter 4). Few scientific codes
are sufficiently tested by conventional software engineering (SE) standards,
and many scientific code developers are aware of the shortcoming. As dis-
cussed above, frequently scientific codes are limited to component-level tests,
because it can be difficult to write sufficiently fine-grained unit tests. SE tech-
niques such as mocking objects are almost never practiced, because mocked
objects would require nearly all of the same functionality of the real object in
order to properly test the physics component. The claim is that most physics
Key Insights
• Both FLASH and Amanzi took a long term view and are designed for
extensibility.
• Both codes have found open development beneficial for many reasons,
including robustness of results and community penetration.
• FLASH takes a broader view of unit testing; similar tests are described
as component tests by Amanzi.
• Both codes use different levels of granularity in testing to obtain cover-
age.
• FLASH adopted and evolved software engineering practices over time;
Amanzi started with many more practices in place.
• Because of its age and code accumulation over time, refactoring FLASH
is a large undertaking. It faces a big challenge in adapting to future heterogeneous platforms.
1.6 Generalization
Not all the solutions described in the earlier sections for computational sci-
ence specific challenges are generalizable to all scientific software, but the vast
majority of them are. Indeed at a workshop on community codes in 2012 [30],
all represented codes had nearly identical stories to tell about their motivation
for adopting software engineering practices and the ones that they adopted.
This was true irrespective of the science domains these codes served, the al-
gorithms and discretization methods they used, and the communities they repre-
sented. Even their driving design principles were similar at the fundamental
level, although the details differed. The codes represented the state of the art
in their respective communities in terms of both the model and algorithmic
research incorporated and the software engineering practices. Note that these
are the codes that have stood the test of time and are respected in their com-
munities. They are widely used and supported and have more credibility for
producing reproducible, reliable results than do smaller individualistic efforts.
At a minimum they provide a snapshot of the state of large scale computing
Key Insights
• High-level framework design of multiphysics codes follows componenti-
zation and composability, and is cognizant of trade-offs with raw perfor-
mance.
challenge are understood even less. The general consensus is that more pro-
gramming abstractions are necessary, not just for the extreme scale, but also
for small-scale computing. The unknown is which abstraction or combination
of abstractions will deliver the solution. Many solutions have been proposed,
for example [54] (also see [10] for a more comprehensive and updated list). Of
these, some have undergone more testing and exercise under realistic applica-
tion instances than others. Currently, no approach has been shown to provide
a general solution that is broadly applicable in the way that optimizing
compilers and MPI were in the past. This is an urgent and serious challenge
facing the scientific communities today. Future viability of scientific codes de-
pends on significant help from software engineering expertise and motivation
within the community.
Acknowledgments
This work was supported by the U.S. Department of Energy, Office of
Science, under contract number DE-AC02-06CH11357; National Energy Re-
search Scientific Computing Center, a DOE Office of Science User Facility un-
der Contract No. DE-AC02-05CH11231; Department of Energy at Los Alamos
National Laboratory under contract DE-AC52-06NA25396 and the DOE Of-
fice of Science Biological and Environmental Research (BER) program in
Subsurface Biogeochemical Research. Support was also provided through the
IDEAS scientific software productivity project (www.ideas-productivity.org),
funded by the U.S. Department of Energy Office of Science, Advanced Sci-
entific Computing Research and Biological and Environmental Research pro-
grams. One of the codes described in this work was developed in part by the DOE-
supported ASC / Alliance Center for Astrophysical Thermonuclear Flashes at
the University of Chicago under grant B523820.
Chapter 2
A Rational Document Driven Design
Process for Scientific Software
W. Spencer Smith
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 A Document Driven Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.2 Development Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2.3 Software Requirements Specification (SRS) . . . . . . . . . . . . . 34
2.2.4 Verification and Validation (V&V) Plan and Report . . . 35
2.2.5 Design Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.2.6 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2.7 User Manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.2.8 Tool Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3 Example: Solar Water Heating Tank . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3.1 Software Requirements Specification (SRS) . . . . . . . . . . . . . 42
2.3.2 Design Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4 Justification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4.1 Comparison between CRAN and Other Communities . . 48
2.4.2 Nuclear Safety Analysis Software Case Study . . . . . . . . . . 49
2.5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.1 Introduction
This chapter motivates, justifies, describes and evaluates a rational doc-
ument driven design process for scientific software. The documentation is
adapted from the waterfall model [71, 114], progressing from requirements,
to design, to implementation and testing. Many researchers have stated that
a document driven process is not used by, nor suitable for, scientific soft-
ware. These researchers argue that scientific developers naturally use an ag-
ile philosophy [55, 59, 67, 101], or an amethododical process [79], or a knowl-
edge acquisition driven process [80]. Just because a rational process is not
currently used does not prove that it is inappropriate, only that past efforts
to deal with concurrency (except for the case of parallel processing), real-time
constraints, or complex user interactions. The typical scientific software design
pattern is simply: Input ⇒ Calculate ⇒ Output. All domains struggle with
up-front requirements, but scientists should remember that their requirements
do not have to be fully determined a priori. As mentioned in the previous para-
graph, iteration is inevitable and a rational process can be faked.
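To make the pattern concrete, the following minimal C sketch (with hypothetical file and parameter names, not taken from any particular code) shows the shape that many small scientific programs take; only the structure matters here, not the physics.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical illustration of the Input => Calculate => Output pattern. */

typedef struct {
    double length;       /* problem size, read from the input file        */
    double diffusivity;  /* physical parameter, read from the input file  */
} Params;

static Params read_input(const char *filename) {            /* Input     */
    Params p;
    FILE *f = fopen(filename, "r");
    if (f == NULL || fscanf(f, "%lf %lf", &p.length, &p.diffusivity) != 2) {
        fprintf(stderr, "could not read input file %s\n", filename);
        exit(EXIT_FAILURE);
    }
    fclose(f);
    return p;
}

static double characteristic_time(const Params *p) {        /* Calculate */
    return p->length * p->length / p->diffusivity;
}

int main(void) {
    Params p = read_input("params.txt");
    printf("characteristic time: %g\n", characteristic_time(&p));  /* Output */
    return 0;
}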
Although current practice tends to neglect requirements documentation,
it does not have to be this way. To start with, when researchers say that
requirements emerge through iteration and experimentation, they are only
referring to one category of scientific software. As observed previously [80,106],
scientific software can be divided into two categories: specific physical models
and general purpose tools. When scientific software is general purpose, like a
solver for a system of linear equations, the requirements should be clear from
the start. General purpose tools are based on well understood mathematics for
the functional requirements, as shown in scientific computing textbooks [73].
Even the nonfunctional requirements, like accuracy, can be quantified and
described through error analysis and in some cases validated computing, such
as interval arithmetic.
Even specialized software, like weather prediction or structural analysis,
can be documented a priori, as long as the author’s viewpoint takes into ac-
count separation of concerns, a broad program family approach and consider-
ation for future change management. With respect to separation of concerns,
the physical models should be clearly separated from the numerical methods.
Knowing the most appropriate numerical technique is difficult at the outset,
but this is a decision for the design, not the requirements, stage. In addition,
rather than aim for a narrow specification of the model to be implemented,
the target should be a broad specification of the potential family of models. A
program family approach, where commonalities are reused and variabilities are
identified and systematically handled, is natural for scientific software [110].
As pointed out previously, at an abstract level, the modeller will know which
governing conservation equations will need to be satisfied. The challenge is
to know which simplifying assumptions are appropriate. This is where the
“experimentation” by scientists comes in. If the assumptions are documented
clearly, and explicit traceability is given to show what part of the model they
influence, then changes can be made later, as understanding of the problem
improves. Using knowledge from the field of SE, the documentation can be
built with maintainability and reusability in mind.
This chapter shows how SE templates, rules and guidelines, which have
been successful in other domains, can be adapted to handle rapid change and
complexity. The document driven approach is first described (Section 2.2)
and then illustrated via the example of software to model a solar water heat-
ing tank (Section 2.3). Justification (Section 2.4) for the document driven
process is shown through a case study where legacy nuclear safety analysis
code is re-documented, leading to the discovery of 27 issues in the original
documentation. Further justification is given through a survey of statistical
software for psychology, which shows that quality is highest for projects that
most closely follow a document driven approach.
development phase has a corresponding test plan and report. The proposed
process does not go this far; one document summarizes the V&V plan at the
crucial initial stage of development and provides an overview for the V&V of
the other phases. In the traditional waterfall, test plans are in a later stage,
but thinking about system tests early has the benefit that test cases are often
more understandable than abstract requirements. System test cases should
be considered at the same time as requirements; the tests themselves form
an alternative, but incomplete, view of the requirements. Iteratively work-
ing between requirements and system tests builds confidence that the project
is moving in the correct direction before making significant, and expensive,
decisions.
1 Reference Material 1
1.1 Table of Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Table of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Abbreviations and Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Introduction 4
2.1 Purpose of Document . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Scope of Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Organization of Document . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
5 Requirements 23
5.1 Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.2 Nonfunctional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6 Likely Changes 25
An SRS improves the software qualities listed in Table 2.1. For instance,
usability is improved via an explicit statement of the expected user character-
istics. Verifiability is improved because the SRS provides a standard against
which correctness can be judged. The recommended template [108, 109] facili-
tates verification of the theory by systematically breaking the information into
structured units, and using cross-referencing for traceability. An SRS also im-
proves communication with stakeholders. To facilitate collaboration and team
integration, the SRS captures the necessary knowledge in a self-contained
document. If a standard template is adopted for scientific software, this would
help with comparing between different projects and with reusing knowledge.
• Test cases can be selected that are a subset of the real problem for which
a closed-form solution exists. When using this approach, confidence in
the actual production code can only be built if it is the same code used
for testing; that is, nothing is gained if a separate, simpler, program is
written for testing the special cases.
• Verification test cases can be created by assuming a solution and using
this to calculate the inputs. For instance, for a linear solver, if A and x
are assumed, b can be calculated as b = Ax. Following this, Ax∗ = b can
be solved and then x and x∗, which should theoretically be equal, can be
compared (a small sketch of this idea in C appears after this list). In the
case of solving Partial Differential Equations (PDEs), this approach is
called the Method of Manufactured Solutions [99].
• Most scientific software uses floating point arithmetic, but for testing
purposes, the slower, but guaranteed correct, interval arithmetic [74] can
be employed. The faster floating point algorithm can then be verified by
ensuring that the calculated answers lie within the guaranteed bounds.
• Verification tests should include plans for convergence studies. The dis-
cretization used in the numerical algorithm should be decreased (usually
halved) and the change in the solution assessed.
• Confidence can be built in a numerical algorithm by comparing the
results to another program that overlaps in functionality. If the test
results do not agree, then one, or possibly both, of the programs is incorrect.
• The verification plan should also include test plans for nonfunctional
requirements, like accuracy, performance and portability, if these are
important implementation goals. Performance tests can be planned to
describe how the software responds to changing inputs, such as problem
size, condition number etc. Verification plans can include relative com-
parisons between the new implementation and competing products [106].
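As a concrete illustration of the "assume a solution" test mentioned in the second bullet above, the following C sketch (with hypothetical values) chooses A and x, computes b = Ax, solves Ax∗ = b by Cramer's rule, and checks that x∗ recovers x to within a tolerance; the same idea scales up to the Method of Manufactured Solutions for PDEs.

#include <math.h>
#include <stdio.h>

/* Hypothetical 2x2 example of the "assume a solution" verification test:
 * choose A and x, compute b = A x, solve A x* = b, then compare x* to x. */

static void solve2x2(const double A[2][2], const double b[2], double x[2]) {
    double det = A[0][0] * A[1][1] - A[0][1] * A[1][0];   /* assumed nonzero */
    x[0] = (b[0] * A[1][1] - b[1] * A[0][1]) / det;       /* Cramer's rule   */
    x[1] = (A[0][0] * b[1] - A[1][0] * b[0]) / det;
}

int main(void) {
    const double A[2][2] = { {4.0, 1.0}, {2.0, 3.0} };
    const double x_true[2] = { 1.0, -2.0 };               /* assumed solution */
    double b[2], x_star[2];

    /* b = A x_true */
    b[0] = A[0][0] * x_true[0] + A[0][1] * x_true[1];
    b[1] = A[1][0] * x_true[0] + A[1][1] * x_true[1];

    solve2x2(A, b, x_star);

    /* The test passes if the recovered solution matches the assumed one. */
    double err = fmax(fabs(x_star[0] - x_true[0]), fabs(x_star[1] - x_true[1]));
    printf("max abs error = %g -> %s\n", err, err < 1e-12 ? "PASS" : "FAIL");
    return err < 1e-12 ? 0 : 1;
}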
In addition to system test cases, the verification plan should outline other
testing techniques that will be used to build confidence. For instance, the plan
should describe how unit test cases will be selected, although the creation of
the unit test cases will have to wait until the design is complete. The test
plan should also identify what, if any, code coverage metrics will be used
and what approach will be employed for automated testing. If other testing
techniques, such as mutation testing, or fault testing [114], are to be employed,
this should be included in the plan. In addition to testing, the verification plan
should mention the plans for other techniques for verification, such as code
walkthroughs, code inspections, correctness proofs etc. [71, 114].
Validation is also included in the V&V plan. For validation, the document
should identify the experimental results for comparison to the simulated re-
sults. If the purpose of the code is a general purpose mathematical library,
there is no need for a separate validation phase.
Figure 2.3 shows the proposed template for capturing the V&V plan. The
first two sections cover general and administrative information, including the
composition of the testing team and important deadlines. The “Evaluation”
section fleshes out the methods, tools and techniques while the “System Test
Description” provides an example of a system test. In an actual V&V report,
there would be multiple instances of this section, each corresponding to a
different system test. In cases where validation tests are appropriate, each
validation test would also follow this template.
The corresponding document for the V&V plan is the V&V report. Once
the implementation and other documentation is complete, the V&V activities
take place. The results are summarized in the report, with enough detail
to convince a reader that all the planned activities were accomplished. The
report should emphasize those changes that were made in a response to issues
uncovered during verification and validation.
1 General Information 2
1.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Overview of Document . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Plan 4
2.1 Software Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Test Team . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Milestones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3.1 Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3.2 Dates and Deadlines . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4 Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Evaluation 6
3.1 Methods and Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.2 Extent of Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.3 Test Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.4 Testing Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Data Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.1 Data Recording . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.2 Test Progression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.3 Testing Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.4 Testing Data Reduction . . . . . . . . . . . . . . . . . . . . . . . . . 9
module. As implied in Section 2.1, design for change is valuable for scientific
software, where a certain amount of exploration is necessary.
The modular decomposition can be recorded in a Module Guide (MG) [94],
which organizes the modules in a hierarchy by their secrets. Given his interest
in embedded real time systems, the top-level decomposition from Parnas [94]
includes a hardware hiding module. For scientific software on standard hard-
ware, with serial algorithms, simplification is usually possible, since the vir-
tualization of the hardware will typically not have to be directly implemented
by the programmer, being generally available via libraries, such as stdio.h
in C. Further simplifications are available in scientific software, by taking ad-
vantage of the Input ⇒ Calculate ⇒ Output design pattern mentioned in
Section 2.1. This pattern implies the presence of an input format hiding mod-
ule, an input parameter data structure hiding module and an output format
hiding module [75]. The bulk of the difference between designs comes through
the modules dealing with calculations. Typical calculation modules hide data
structures, algorithms and the governing physics. The application of the Par-
nas approach to scientific software has been illustrated by applying it to the
example of a mesh generator [111].
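As a small, hypothetical illustration (the names below are not from the SWHS or mesh generator designs), the interfaces of the input format, input parameter and output format hiding modules might look as follows in C; each header exposes only access routines, so the file layouts and the parameter data structure remain secrets of their modules.

/* input_format.h -- input format hiding module.
 * Secret:  the structure of the input file.
 * Service: read the input file and set the state of the input parameters
 *          module so that it holds all of the required information.        */
void load_params(const char *filename);

/* input_parameters.h -- input parameter data structure hiding module.
 * Secret:  how the input parameters are stored.
 * Service: access routines for the values the calculation modules need.    */
double params_tank_diameter(void);    /* hypothetical parameter             */
double params_coil_temperature(void); /* hypothetical parameter             */

/* output_format.h -- output format hiding module.
 * Secret:  the format and structure of the output file.
 * Service: write the computed results in the chosen format.                */
void write_results(const char *filename,
                   const double *times, const double *temperatures, int n);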
Figure 2.4 shows the proposed template for the MG document. The docu-
ment begins with an explicit statement of the anticipated, or likely, changes.
These anticipated changes guide the design. If a likely change is required,
then ideally only one module will need to be re-implemented. The “Module
Decomposition” section lists the modules, organized by a hierarchy of related
secrets. The top level decomposition of the hierarchy consists of hardware
hiding, behavior hiding and software decision hiding modules [94]. For each
module the secret it encapsulates and the service it provides are listed. Care
is taken that each module lists only one secret and that secrets are in the form
of nouns, not verbs. The example modules listed in the section of Figure 2.4
are typical of scientific software. The “Traceability Matrix” section shows how
the anticipated changes map to modules, and how the requirements from the
SRS map to modules. Section 2.3.2 describes an example MG, along with the
uses hierarchy between modules.
The modular decomposition advocated here has much in common with
Object Oriented (OO) design, which also emphasizes encapsulation. However,
care must be taken with overusing OO languages, since a significant perfor-
mance penalty is possible using dynamic dispatch, especially in an inner loop.
Operator overloading should also be used with care, since the operator seman-
tics may change depending on the type of its operands.
The MG alone does not provide enough information. Each module’s in-
terface needs to be designed and documented by showing the syntax and
semantics of its access routines. This can be done in the Module Interface
Specification (MIS) [75]. The MIS is less abstract than the architectural de-
sign. However, an MIS is still abstract, since it describes what the module will
do, but not how to do it. The interfaces can be documented formally [68, 111]
or informally. An informal presentation would use natural language, together
Contents
1 Introduction 2
3 Module Hierarchy 4
5 Module Decomposition 5
5.1 Hardware Hiding Modules (M1) . . . . . . . . . . . . . . . . . . . . . . . . . 5
5.2 Behavior-Hiding Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
5.2.1 Input Format Module (M2) . . . . . . . . . . . . . . . . . . . . . . . 6
5.2.2 Input Parameters Module (M3) . . . . . . . . . . . . . . . . . . . . . 6
5.2.3 Output Format Module (M4) . . . . . . . . . . . . . . . . . . . . . . 6
5.2.4 Calculation Related Module (M5) . . . . . . . . . . . . . . . . . . . . 6
5.2.5 Another Calculation Related Module (M6) . . . . . . . . . . . . . . . 7
5.2.6 Control Module (M7) . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
5.3 Software Decision Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
5.3.1 Data Structure Module (M8) . . . . . . . . . . . . . . . . . . . . . . 7
5.3.2 Solver Module (M9) . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
5.3.3 Plotting or Visualizing Module (M10) . . . . . . . . . . . . . . . . . . 8
6 Traceability Matrix 8
with equations. The specification needs to clearly define all parameters, since
an unclear description of the parameters is one cause of reusability issues for
libraries [65]. To assist with interface design, one can take inspiration from
the common design idioms for the structures of set, sequence and tuple [75, p.
82–83]. In addition, the designer should keep in mind the following interface
quality criteria: consistent, essential, general, minimal and opaque [75, p. 83].
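A hedged sketch of what such an interface specification might look like for a single access routine is given below; the routine and module are hypothetical, but the point is the pattern of pairing the syntax (the signature) with the semantics (outputs, exceptions and state transitions, stated in the comment).

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical access routine from an input parameters module, documented
 * in the style of a Module Interface Specification.                        */

static int    initialized = 0;    /* module state: set once load_params runs */
static double t_final     = 0.0;  /* module state: final simulation time (s) */

/* get_simulation_time
 * Syntax:     double get_simulation_time(void)
 * Output:     t_final, the final simulation time in seconds.
 * Exception:  reports an error and stops if the module has not been
 *             initialized, since the value would be meaningless.
 * Transition: none; the module state is not changed.                        */
double get_simulation_time(void) {
    if (!initialized) {
        fprintf(stderr, "error: input parameters not initialized\n");
        exit(EXIT_FAILURE);
    }
    return t_final;
}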
2.2.6 Code
Comments can improve understandability, since comments “aid the under-
standing of a program by briefly pointing out salient details or by providing
a larger-scale view of the proceedings” [82]. Comments should not describe
details of how an algorithm is implemented, but instead focus on what the
algorithm does and the strategy behind it. Writing comments is one of the
best practices identified for scientific software by Wilson et al. [117]. As said
by Wilson et al., scientific software developers should aim to “write programs
for people, not computers” and “[t]he best way to create and maintain ref-
erence documentation is to embed the documentation for a piece of software
in that software” [117]. Literate Programming (LP) [84] is an approach that
takes these ideas to their logical conclusion.
$h_c = \frac{2 k_c h_b}{2 k_c + \tau_c h_b}$, (B.23)

$h_g = \frac{2 k_c h_p}{2 k_c + \tau_c h_p}$ (B.24)

⟨Calculation of heat transfer coefficient (h_c) and the gap conductance (h_g) 21⟩ ≡
/* calculation of heat transfer coefficient */
*h_c = (2 * (*k_c) * (*h_b)) / ((2 * (*k_c)) + (*tau_c * (*h_b)));
/* calculation of gap conductance */
*h_g = (2 * (*k_c) * (*h_p)) / ((2 * (*k_c)) + (*tau_c * (*h_p)));
This code is used in chunks 15 and 60.
Figure 2.6: Solar water heating tank, with heat flux qc from coil and qP to
the PCM.
Figure 2.6 provides a conceptual view of the heat flux (q) in the tank. The full set of
documents, code, and test cases for the SWHS example can be found at:
https://github.com/smiths/swhs.git.
GS3: predict the change in the energy of the water over time
GS4: predict the change in the energy of the PCM over time
Figure 2.7: Goal statements for SWHS.
general definitions and creates data definitions. The template aids in docu-
menting all the necessary information, since each section has to be considered.
This facilitates achieving completeness by essentially providing a checklist. Be-
sides requiring that section headings be filled in, the template also requires
that every equation either has a supporting external reference, or a derivation.
Furthermore, for the SRS to be complete and consistent every symbol, general
definition, data definition, and assumption needs to be used at least once.
The goal statements for SWHS, given in Figure 2.7, specify the target of
the system. In keeping with the principle of abstraction, the goals are stated
such that they describe many potential instances of the final program. As a
consequence, the goals will be stable and reusable.
As mentioned in Section 2.1, scientists often need to experiment with their
assumptions. For this reason, traceability information needs to be part of the
assumptions, as shown in Figure 2.8. As the assumptions inevitably change,
the analyst will know which portions of the documentation will potentially
also need to change.
The abstract theoretical model for the conservation of thermal energy is
presented in Figure 2.9. As discussed in Section 2.1, this conservation equation
applies for many physical problems. For instance, this same model is used in
the thermal analysis of a nuclear reactor fuel pin [103]. This is possible since
the equation is written without reference to a specific coordinate system.
T1 (Figure 2.9) can be simplified from an abstract theoretical model to
a more problem specific General Definition (GD). Figure 2.10 shows one po-
tential refinement (GD2), which can be derived using Assumptions A3–A6
(Figure 2.8). The specific details of the derivation are given in the full SRS
on-line, but not reproduced here for space reasons. This restating of the con-
servation of energy is still abstract, since it applies for any control volume
that satisfies the required assumptions. GD2 can in turn be further refined
to specific (concrete) instanced models for predicting the temperature of the
water and the PCM over time.
4.2 Solution Characteristics Specification

The instance models (ODEs) that govern SWHS are presented in Subsection 4.2.5. The
information to understand the meaning of the instance models and their derivation is also
presented, so that the instance models can be verified.

4.2.1 Assumptions

This section simplifies the original problem and helps in developing the theoretical model by
filling in the missing information for the physical system. The numbers given in the square
brackets refer to the theoretical model [T], general definition [GD], data definition [DD],
instance model [IM], or likely change [LC], in which the respective assumption is used.

A1: The only form of energy that is relevant for this problem is thermal energy. All other
forms of energy, such as mechanical energy, are assumed to be negligible [T1].
A2: All heat transfer coefficients are constant over time [GD1].
A3: The water in the tank is fully mixed, so the temperature is the same throughout the
entire tank [GD2, DD2].
A4: The PCM has the same temperature throughout [GD2, DD2, LC1].
A5: Density of the water and PCM have no spatial variation; that is, they are each constant
over their entire volume [GD2].
A6: Specific heat capacity of the water and PCM have no spatial variation; that is, they
are each constant over their entire volume [GD2].
A7: Newton's law of convective cooling applies between the coil and the water [DD1].
A8: The temperature of the heating coil is constant over time [DD1, LC2].
A9: The temperature of the heating coil does not vary along its length [DD1, LC3].
A10: Newton's law of convective cooling applies between the water and the PCM [DD2].
A11: The model only accounts for charging of the tank, not discharging. The temperature of
the water and PCM can only increase, or remain constant; they do not decrease. This
implies that the initial temperature (A12) is less than (or equal) to the temperature
of the coil [IM1, LC4].
A12: The initial temperature of the water and the PCM is the same [IM1, IM2, LC5].
A13: The simulation will start with the PCM in solid form [IM2, IM4].
A14: The operating temperature range of the system is such that the water is always in
liquid form. That is, the temperature will not drop below the melting point of water,
or rise above its boiling point [IM1, IM3].
A15: The tank is perfectly insulated so that there is no heat loss from the tank [IM1, LC6].
A16: No internal heat is generated by either the water or the PCM; therefore, the volumetric
heat generation is zero [IM1, IM2].
A17: The volume change of the PCM due to melting is negligible [IM2].
A18: The PCM is either in a liquid or solid state, but not a gas [IM2, IM4].

Figure 2.8: Sample assumptions for SWHS.

IM1 and IM2 provide the system of ODEs that needs to be solved to
determine TW and TP. If a reader would prefer a bottom up approach, as
opposed to the default top down organization of the original SRS, they can
start reading with the instance models and trace back to find any additional
information that they require. IM2 is shown in Figure 2.11. Hyperlinks are
included in the original documentation for easy navigation to the associated
data definitions, assumptions and instance models.

To achieve a separation of concerns between the requirements and the de-
sign, the SRS is abstract, as discussed in Section 2.2. The governing ODEs are
given, but not a solution algorithm. The focus is on "what" the software does,
not "how" to do it. The numerical methods are left to the design document.
This approach facilitates change, since a new numerical algorithm requires no
changes to the SRS.

If an expert reviewer is asked to "sign off" on the documentation,
he or she should find an explanation/justification for every symbol/equa-
tion/definition. This is why IM2 not only shows the equation for energy
balance to find TP, but also the derivation of the equation. (The deriva-
tion is not reproduced here for space reasons, but it can be found at
https://github.com/smiths/swhs.git.)

Table 2.3 shows an excerpt for the table summarizing the input variables
for SWHS. With the goal of knowledge capture in mind, this table includes
constraints on the input values, along with data on typical values. When new
users are learning software, they often do not have a feel for the range and
magnitude of the inputs. This table is intended to help them. It also provides
a starting point for later testing of the software. The uncertainty information
Number T1
Label Conservation of thermal energy
Equation $-\nabla \cdot q + g = \rho C \frac{\partial T}{\partial t}$
Description The above equation gives the conservation of energy for time varying heat
transfer in a material of specific heat capacity C and density ρ, where q
is the thermal flux vector, g is the volumetric heat generation, T is the
temperature, t is time, and ∇ is the gradient operator. For this equation to
apply, other forms of energy, such as mechanical energy, are assumed to be
negligible in the system (A1).
Source http://www.efunda.com/formulae/heat_transfer/conduction/
overview_cond.cfm
Ref. By GD2
Number GD2
Label Simplified rate of change of temperature
Equation $m C \frac{dT}{dt} = q_{in} A_{in} - q_{out} A_{out} + g V$
Description The basic equation governing the rate of change of temperature, for a given
volume V, with time.
m is the mass (kg).
C is the specific heat capacity (J kg^-1 °C^-1).
T is the temperature (°C) and t is the time (s).
q_in and q_out are the in and out heat transfer rates, respectively (W m^-2).
A_in and A_out are the surface areas over which the heat is being transferred
in and out, respectively (m^2).
g is the volumetric heat generated (W m^-3).
V is the volume (m^3).
Ref. By IM1, IM2

Detailed derivation of simplified rate of change of temperature

Integrating (T1) over a volume (V), we have
$-\int_V \nabla \cdot q \, dV + \int_V g \, dV = \int_V \rho C \frac{\partial T}{\partial t} \, dV.$
Applying Gauss's Divergence theorem to the first term over the surface S of the volume,
with q as the thermal flux vector for the surface, and n̂ is a unit outward normal for the
surface,
$-\int_S q \cdot \hat{n} \, dS + \int_V g \, dV = \int_V \rho C \frac{\partial T}{\partial t} \, dV. \quad (1)$
We consider an arbitrary volume. The volumetric heat generation is assumed constant. Then
(1) can be written as
$q_{in} A_{in} - q_{out} A_{out} + g V = \int_V \rho C \frac{\partial T}{\partial t} \, dV,$
where q_in, q_out, A_in, and A_out are explained in GD2. Assuming ρ, C and T are constant over
the volume, which is true in our case by assumption (A3), (A4), (A5), and (A6), we have
$\rho C V \frac{dT}{dt} = q_{in} A_{in} - q_{out} A_{out} + g V. \quad (2)$

Figure 2.10: Sample general definition.

• Likely changes for SWHS include "the format of the initial input data"
and the "algorithm used for the ODE solver." The likely changes are the
basis on which the modules are defined.

• One straightforward module is the Input Format Module (M2). This
module hides the format of the input, as discussed generically in
Section 2.2.5. It knows the structure of the input file, so that no other
module needs to know this information. The service that the Input For-
mat Module provides is to read the input data and then modify the state
of the Input Parameters Module (M3) so that it holds all of the required
information.

• Several of the modules that are documented, such as the Sequence Data
Structure Module (M8) and the ODE Solver Module (M9), are already
available in MATLAB, which is the selected implementation environ-
ment for SWHS. These modules are still explicitly included in the de-
sign, with a notation that indicates that they will be implemented by
MATLAB. They are included so that if the implementation environment
is later changed, the developer will know that they need to provide these
modules.
• The MG shows the traceability matrix between the modules and the SRS
requirements. This traceability increases confidence that the design is
complete because each requirement maps to a module, and each module
maps to at least one requirement.
Number IM2
Label Energy balance on PCM to find T_P
Input m_P, C_P^S, C_P^L, h_P, A_P, t_final, T_init, T_melt^P, T_W(t) from IM1
The input is constrained so that T_init < T_melt^P (A13)
Output T_P(t), 0 ≤ t ≤ t_final, with initial conditions, T_W(0) = T_P(0) = T_init (A12),
and T_W(t) from IM1, such that the following governing ODE is satisfied.
The specific ODE depends on T_P as follows:
$\frac{dT_P}{dt} = \begin{cases} \frac{1}{\tau_P^S}\,(T_W(t) - T_P(t)) & \text{if } T_P < T_{melt}^P \\ \frac{1}{\tau_P^L}\,(T_W(t) - T_P(t)) & \text{if } T_P > T_{melt}^P \\ 0 & \text{if } T_P = T_{melt}^P \text{ and } 0 < \phi < 1 \end{cases}$
The temperature remains constant at T_melt^P, even with the heating (or cool-
ing), until the phase change has occurred for all of the material; that is, as
long as 0 < φ < 1. φ (from DD4) is determined as part of the heat energy
in the PCM, as given in IM4.
t_melt^init, the time at which melting begins.
t_melt^final, the time at which melting ends.

Detailed derivation of the energy balance on the PCM during sensible heating phase

To find the rate of change of T_P, we look at the energy balance on the PCM. The volume
being considered is the volume of the PCM, V_P. The derivation that follows is initially for
the solid PCM. The PCM in the tank has mass m_P and specific heat capacity C_P^S. Heat
input from the water to the PCM is q_P over area A_P. There is no heat flux output. Assuming
no internal heat generated (A16), g = 0, the equation for GD2 can be written as:
$m_P C_P^S \frac{dT_P}{dt} = q_P A_P$
Using DD2 for q_P, this equation can be written as

Figure 2.11: Sample instance model.

2.4 Justification

Part of the justification for the document driven approach presented in
this chapter is an appeal to the value of a systematic, rigorous, engineering
approach. This approach has been successful in other domains, so it stands
to reason that it should be successful for scientific software. The example
of the solar water heating tank provides partial support for this, since the
documentation and code were positively reviewed by a mechanical engineer.
documentation and code were positively reviewed by a mechanical engineer.
Although only providing anecdotal evidence in support of the documenta-
tion, the reviewer liked the explicit assumptions; the careful description of
names, nomenclature and units; and, the explicit planning for change in the
design. The reviewer thought that the documentation captured knowledge
that would facilitate new project members quickly getting up to speed. The
reviewer’s main concern was the large amount of documentation for such a rel-
atively simple, and non-safety critical, problem. This concern can be mitigated
by the following observations: i) the solar water heating example was inten-
tionally treated more seriously than the problem perhaps deserves, so that
a relatively small, but still non-trivial, example could be used to illustrate
the methods proposed in this chapter; and, ii) if the community recognizes the
value of rational documentation, then tool support will follow to reduce the
documentation burden. This last point is explored further in the Concluding
Remarks (Section 2.5).
Justification by appeals to success in other domains, and by positive com-
ments from a review of SWHS, is not entirely satisfying. Maybe there really
is something about science that makes it different from other domains? The
research work presented below provides further evidence that this is not the case.
and visibility. Commercial software, for which the development process was
unknown, provided better usability, but did not show as much evidence of ver-
ifiability. With respect to usability, a good CRAN example is mokken [113].
The overall high ranking of R packages stems largely from their use of Rd,
Sweave, R CMD check and the CRAN Repository Policy. The policy and sup-
port tools mean that even a single developer project can be sufficiently well
documented and developed to be used by others. A small research project usu-
ally does not have the resources for an extensive development infrastructure
and process. By enforcing rules for structured development and documenta-
tion, CRAN is able to improve the quality of scientific software.
process and explicit traceability between theory and code, would have caught
these mistakes. The use of LP was also shown to improve the quality of verifi-
ability, since all the information for the implementation is given, including the
details of the numerical algorithms, solution techniques, assumptions and the
program flow. The understandability of LP is a great benefit for code reading,
which is a key activity for scientists verifying their code [79].
Although some of the problems in the original documentation for the case
study would likely have been found with any effort to redo the documentation,
the rational process builds confidence that the methodology itself improves
quality. The proposed SRS template assisted in systematically developing the
requirements. The template helped in achieving completeness by acting as
a checklist. Since the template was developed following the principle of sep-
aration of concerns, each section could be dealt with individually, and the
details for the document could be developed by refining from goals to in-
stanced models. The proposed template provides guidelines for documenting
the requirements by suggesting an order for filling in the details. This reduces
the chances of missing information. Verification of the documentation involves
checking that every symbol is defined; that every symbol is used at least once;
that every equation either has a derivation, or a citation to its source; that
every general definition, data definition and assumption is used by at least one
other component of the document; and, that every line of code either traces
back to a description of the numerical algorithm (in the LPM), or to a data
definition, or to an instance model, or to an assumption, or to a value from
the auxiliary constants table in the SRS.
In all software projects, there is a danger of the code and documentation
getting out of sync, which seems to have been a problem in the legacy software.
LP, together with a rigorous change management policy, mitigates this danger.
LPM develops the code and design in the same document, while maintaining
traceability between them, and back to the SRS. As changes are proposed,
their impact can be determined and assessed.
using the software for maintenance purposes, but added back in for reviewers
verifying the mathematical model. The user can specify the “recipe” for their
required documentation using the developed DSL.
Tool support will make the process easier, but practitioners should not
wait. The document driven methods as presented here are feasible today and
should be employed now to facilitate high quality scientific software. If an
approach such as that described in this chapter becomes standard, then the work
load will be reduced over time as documentation is reused and as practitioners
become familiar with the templates, rules, and guidelines.
Chapter 3
Making Scientific Software Easier to
Understand, Test, and Communicate
through Software Engineering
Matthew Patrick
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3 Challenges Faced by the Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.1 Intuitive Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3.2 Automating Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.3.3 Legacy Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 Iterative Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4.1 The Basic SEIR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4.2 Experimental Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.4.3 Initial Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.4.3.1 Sanity Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.4.3.2 Metamorphic Relations . . . . . . . . . . . . . . . . . . . . 70
3.4.3.3 Mathematical Derivations . . . . . . . . . . . . . . . . . 71
3.4.4 Exploring and Refining the Hypotheses . . . . . . . . . . . . . . . . . 71
3.4.4.1 Complexities of the Model . . . . . . . . . . . . . . . . . 72
3.4.4.2 Complexities of the Implementation . . . . . . . 73
3.4.4.3 Issues Related to Numerical Precision . . . . . 74
3.4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5 Testing Stochastic Software Using Pseudo-Oracles . . . . . . . . . . . . . . 77
3.5.1 The Huánglóngbìng SECI Model . . . . . . . . . . . . . . . . . . . . . . . . 78
3.5.2 Searching for Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.5.3 Experimental Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.5.4 Differences Discovered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.5.5 Comparison with Random Testing . . . . . . . . . . . . . . . . . . . . . . 86
3.5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.7 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.1 Introduction
When a piece of insulation foam broke off and hit the left wing of the
Space Shuttle Columbia, NASA (The National Aeronautics and Space Ad-
ministration) consulted a computational model to determine the extent of the
damage [150]. Although the model predicted debris had penetrated the left
wing, NASA decided to ignore this conclusion because their previous expe-
riences led them to believe the model was over-conservative. On the 1st of
February 2003, the Space Shuttle Columbia disintegrated as it made its de-
scent, resulting in the deaths of all seven of its crew members. It is of course
vitally important that scientific software is correct. However, in this example the
model gave the right prediction. It is therefore equally important that scien-
tists understand and have faith in the software they are using and developing.
This chapter focuses on the application of software engineering techniques to
make scientific software easier to understand, test and communicate.
Scientific researchers are increasingly turning to computational resources
to help them with their research. For example, computational models are used
to inform key decisions on a wide variety of topics, ranging from finance [120]
and health-care [133] through epidemiology [141] and conflict [126], as well
as many other important areas. The application and development of scientific
software is increasing and more than half of scientists say they develop more
software now than they did 10 years ago [129]. In a recent survey [135], 70%
of biological, mathematical and physical science researchers said they develop
software as part of their job and 80% claimed it would not be possible to
conduct their work without such software. Since scientific software is used
and developed to answer important research questions, it is crucial that it
performs correctly and the people using it understand how it works.
Scientific software can be challenging to understand because of its high
levels of essential complexity and accidental complexity [144]. Essential com-
plexity arises due to the need to represent biological, chemical, or physical
real-world systems. For example, in epidemiology, computational models are
used to describe the stochastic behavior of biological systems over a range
of spatiotemporal scales. These models frequently involve complex interac-
tions between species, with nonlinear dynamics. With these models, scientists
expect to perform Bayesian and other forms of complex statistical inferences.
Accidental complexity arises because of the disconnect between the so-
phistication and transparency of the models and the computational methods
used to implement them. Programming language syntax can be restrictive
and performance optimizations are introduced to handle the large volumes
of data and mathematical calculations. The scientists using and developing
this software are typically trained in their own field of research rather than
in software engineering, so it is not surprising they sometimes have difficulty
understanding the way the software behaves.
Software testing techniques are used to find faults. They help testers un-
derstand how the software operates and why it behaves the way it does, so
they can check it is working correctly. More than half of scientists admit they
do not have a good understanding of software testing techniques [142] and one
group retracted five papers from top level journals, such as Science, because
they later discovered their software had a fault which inverted the protein
crystal structures they were investigating [142]. Similarly, nine packages for
seismic data processing were found to produce significantly different results
due to problems such as off-by-one errors [134]. The predictions made from the
packages would have led people using them to come to different conclusions,
potentially leading to $20 million oil wells being drilled in the wrong place. We
need more effective and easy to apply testing techniques for scientific software.
Another area in which scientific software development needs to improve
is communication. Researchers often report difficulties in using code someone
else has written, partly because the software is necessarily complex, but also
because of the way it is written. Scientific software typically includes undoc-
umented assumptions and it is unclear why these were made or how they
impact the results. This increases the risk of undetected errors that could
compromise the inferences made from the models. The lack of transparency
also makes it difficult to reproduce results and constrains the use of models
by other researchers. Communication should be made an essential part of the
development process because testing and understanding are easier for other
people when the motivations behind these assumptions are explained.
Scientific software presents many unique software engineering challenges,
both technical and sociological. The errors that scientific software produces
are not always the result of programming mistakes. There may be additional
problems, for example due to the way in which experimental data is collected
and the choice of modeling approximation used [129]. However, there is a con-
siderable divide between the techniques known by the software engineering
community and the ways in which scientists actually test their software [138].
Scientists typically determine whether or not their code is working by look-
ing at the output and checking it matches what they expect. However, this
approach is likely to miss important errors, since the output may appear rea-
sonable and still be incorrect. A more rigorous approach is required.
This chapter investigates the challenges involved in making scientific soft-
ware easier to understand, test, and communicate by exploring three case stud-
ies of scientific software development in different research groups. We look at
the things these researchers are doing well and the things they are struggling
with, so we can understand how to help. We then present two new techniques:
Iterative hypothesis testing helps scientists to understand how and why their
software behaves the way it does; and search-based pseudo-oracles make it
possible to identify differences between highly complex stochastic implemen-
tations. These techniques enable scientists to communicate their software, by
separating the model and hypotheses from the program code, as well as rep-
resenting the reasoning behind each implementation assumption.
Figure 3.2: Programming languages used within the groups (number of researchers using
each): C 2; C++ 7; FORTRAN 1; JavaScript 2; Linux shell scripts 5; Maple 1; MATLAB 7;
Perl 4; Python 6; R 7; Windows batch scripts 2.
Since some of the techniques involve specific terminology, we also gave the
researchers additional explanations of the techniques in non-technical terms.
In analyzing the results of this study, it is important to take into ac-
count the languages in which the scientists are writing software. For example,
the challenges in testing scripts and compiled software are often considerably
different. Other research attempts have been conducted to investigate soft-
ware engineering practices in scientific research [129] [135], but our study is
unique in considering the impact of the programming languages used within
the groups (see Figure 3.2). Although half of the members program in C++,
scripting languages such as MATLAB (Matrix Laboratory) and R are used
by the same number of researchers. C++ is popular for back-end programs
because of its efficiency, but scripting languages are used for more everyday
usage, since they are easy to program and have useful libraries for mathemat-
ical modelling. In the Bioinformatics group, scripting languages such as R,
Python and Perl are used for everyday tasks, whereas the back-end is written
in C. The large number of languages used within these groups, the wide vari-
ety of tasks they are applied to and the diverse range of programming abilities
all pose challenges to ensuring software is correct.
Figure 3.3 shows the software engineering techniques researchers from the
Epidemiology & Modelling and Theoretical & Computational Epidemiology
groups have heard of and use. Some of the techniques were used by all the
researchers (e.g. manual testing) and some were used by none (e.g. coverage
metrics). The techniques cover topics in black box testing, white box testing
and code clarity. We interviewed group members individually after the survey
to learn more about the reasons as to why they used particular techniques.
Figure 3.3: Software engineering techniques the researchers use and know of. Manual
testing 12; Automated unit tests: 3 use, 6 know of; Assertions 10; Statistical comparisons:
5 use, 9 know of; Coverage: 0 use, 5 know of; Modularisation 11; Descriptive names 11;
Comments: 10 use, 11 know of; Boundary testing: 3 use, 5 know of; Partition testing:
1 use, 3 know of.
Researchers typically test the software they develop by running its code
with certain inputs (that are carefully chosen by hand) and then visually
checking that the outputs appear to be correct. This process is reflected in
our study by the popularity of manual testing. All 12 participants use manual
testing in their work. The outputs are checked manually in an ad-hoc way,
either by taking advantage of the researchers’ own intuitive understanding of
the expected values of the results, or by comparing the outputs to previous
results by other researchers (e.g. from published research).
The respondents have found manual testing to be useful in their work, but
it is not ideal and they were interested in using more sophisticated testing
techniques. However, the researchers were worried these techniques might be
time consuming to learn. More sophisticated techniques such as automated
unit tests, coverage metrics and boundary/partition testing are seldom used by
the researchers in these groups. Many respondents claim to be using assertions,
but interviews later revealed that they were using if-statements to check the
values of their variables. Systematic techniques may be less popular than
manual testing because they require software engineering knowledge group
members do not have, but they are also more difficult to use because they
need a more formal representation of the particular software testing problem.
Boundary testing and partition testing [136] are widely used techniques
within the field of software engineering. They help testers to produce test suites
that thoroughly cover the input domain, whilst at the same time they decrease
the number of tests that need to be executed. Very few of the researchers
we surveyed used these techniques, or were even aware of their existence.
Upon explaining the techniques to one researcher, the respondent remarked
that they could not see how these techniques could be used in their work.
However, upon digging deeper, we found that the scientists are already using
techniques similar to boundary and partition testing; they are just not aware
of the software engineering terminology.
Group members typically look for strange or unexpected results in the
output of their software (boundary testing), such as values that are too high
or frequencies that do not add up to one. This is generally performed in an
ad-hoc way and could potentially gain from being transformed into a more
standardized process. Members of the Epidemiology & Modelling group and
Theoretical & Computational Epidemiology group also make use of a concept
known as the basic reproduction number (R0) [124]. If R0 is greater than one,
the disease can be expected to spread, otherwise the disease can be expected to
die out. Members typically of these groups take advantage of this information
when manually testing their software, considering both cases in their tests.
This is essentially a form of partition testing and it might be made more
rigorous by using results from software engineering research in this area.
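A hedged sketch of how the R0 partition could be written down as explicit, repeatable test cases is shown below. The model is a deliberately simple deterministic SIR stepper, not any of the groups' actual models; the tests only check the qualitative behaviour expected on each side of R0 = 1.

#include <assert.h>
#include <stdio.h>

/* Toy deterministic SIR model (Euler stepping), used only to illustrate
 * partitioning test cases on the basic reproduction number R0 = beta/gamma. */
static double peak_infected(double beta, double gamma) {
    double S = 0.99, I = 0.01, R = 0.0, peak = I;
    const double dt = 0.01;
    for (int step = 0; step < 100000; step++) {
        double dS = -beta * S * I;
        double dI =  beta * S * I - gamma * I;
        double dR =  gamma * I;
        S += dt * dS;
        I += dt * dI;
        R += dt * dR;
        if (I > peak) peak = I;
    }
    return peak;
}

int main(void) {
    /* Partition 1: R0 > 1, so the outbreak should grow beyond its seed.  */
    assert(peak_infected(0.5, 0.1) > 0.01);          /* R0 = 5.0 */
    /* Partition 2: R0 < 1, so infection should never exceed the seed.    */
    assert(peak_infected(0.05, 0.1) <= 0.01 + 1e-9); /* R0 = 0.5 */
    printf("R0 partition tests passed\n");
    return 0;
}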
In addition to testing the software, it is also important to make sure the
data is correct. The Epidemiology & Modelling and Theoretical & Compu-
tational Epidemiology groups clean the data before they use it by checking
the quantities and distributions of host are correct and making sure there are
no hosts in unexpected locations. Members of the bioinformatics group use
data cleaning techniques too, but they also work more closely with their data
providers (wet-lab scientists) to design and conduct experiments. This pro-
vides an opportunity to ensure data correctness at multiple stages of research.
They can assist in preparing appropriate block designs and use statistical anal-
yses to check the data that is produced. It is advantageous (where possible)
for computational scientists to be involved in the data generation process.
because it is necessary to learn a new approach in order to use them and the
researchers did not know where to start in creating the automated tests.
The biggest challenge scientific researchers face in creating automated
tests is describing how the software should behave in a formalized way.
Although scientists may know what to expect when they manually check the
outputs of their software, transforming this into a set of test cases is often
non-trivial. They are able to identify potentially incorrect results intuitively,
but can struggle to describe what the outputs should be before the software is
run. It is inevitable that manual testing will remain an important part of sci-
entific software engineering, as scientific research is inherently an exploratory
process. Yet, any techniques that allow us to represent the scientists’ intuition
as automated unit tests would be valuable, as they could help scientists
to test their software more rigorously and systematically.
Since it is difficult for scientific researchers to know what the test cases
should be before they start to execute their software, it is important to make
the process of test suite generation iterative. As more information becomes
available through scientific research, the test suite can be incrementally ex-
panded to make it more rigorous. Initially, unit tests can be created of the
basic functionality of the software, but later tests can be made more sophis-
ticated to test the overall behavior of the software. This approach has some
similarities with regression testing [121] which compares the output of soft-
ware with previous results, except instead of checking whether the software
behavior has changed, the aim would be to determine whether the test suite
needs to be improved (more details of this approach are provided in Section
3.4).
One way to formalize the intuition of scientific researchers is with asser-
tions, which can be used to check the state of the software at various points
during its execution. Some group members already use assertions in conjunc-
tion with an automated test framework, but the majority check for errors
by inserting ‘if’ statements into their code and printing an error message on
the screen if an unexpected value occurs. This approach is not ideal, since
it is possible for these warnings to be missed amongst the other text that
is produced by the program. This approach is also difficult to automate and
the error information is typically not recorded in a structured way. A better
approach is to integrate the assertions into unit tests, so that a clear record is
produced of any errors that have occurred and the test cases that were used to
find them. This makes it easier to find the locations of faults in the software
and the information can be used to inform the future refinement of test cases.
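As an illustration, the following minimal sketch (in Python, using the standard unittest framework; the model function and its state are placeholders rather than code from the groups) contrasts a print-based check with the same check expressed as an assertion inside a unit test:

import unittest

def step_model(state):
    """Placeholder for a single time step of a compartmental model."""
    return dict(state)  # the real model would update the compartments here

def run_model_with_prints(state):
    new_state = step_model(state)
    if any(value < 0 for value in new_state.values()):
        # Easy to miss amongst the program's other output, and not recorded.
        print("Warning: negative compartment value!")
    return new_state

class TestModelStep(unittest.TestCase):
    def test_no_negative_compartments(self):
        state = {"S": 100.0, "E": 5.0, "I": 1.0, "R": 0.0}
        new_state = step_model(state)
        for name, value in new_state.items():
            # A failed assertion is recorded against this named test case,
            # so the offending compartment and input are easy to trace.
            self.assertGreaterEqual(value, 0.0, f"negative host in {name}")

if __name__ == "__main__":
    unittest.main()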
Even if automated tests are used, there is still the issue of knowing whether
the software has been tested sufficiently. To address this problem, structural
coverage metrics [121] have been developed to assess the quality of the test
suite and its ability to identify potential failures. So far, no-one in the group
has used coverage metrics to evaluate their test cases. One researcher pointed
out a limitation of coverage metrics in that, even if each part of the code
is covered by the test suite, the tests might still not be able to identify any
faults. Test suites are only as useful as the oracle used to check the outputs.
However, coverage metrics can still help to encourage more rigorous tests.
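To make the researcher's caveat concrete, the following sketch (in Python; the function and its fault are contrived for illustration) shows a test that executes every statement of the code under test, and would therefore be reported as giving full coverage by a tool such as coverage.py, yet cannot detect the fault because its oracle is too weak:

def recovery_fraction(recovered, total):
    # Fault: the denominator should be total, not total + 1.
    return recovered / (total + 1)

def test_weak_oracle():
    # Executes every line of recovery_fraction (100% statement coverage),
    # but only checks that the result lies between 0 and 1, so it passes
    # despite the faulty denominator.
    value = recovery_fraction(50, 100)
    assert 0.0 <= value <= 1.0

def test_strong_oracle():
    # Same coverage, but the expected value is stated explicitly,
    # so this test fails and exposes the fault.
    assert abs(recovery_fraction(50, 100) - 0.5) < 1e-12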
Beyond simple control coverage, we might consider whether more sophis-
ticated metrics, such as data flow or mutation analysis, are useful [121]. It
may also be possible to devise new metrics for scientific research that con-
sider how thoroughly we have tested our answers to the research questions.
However, since not even the structural coverage metrics are used often, any
new metrics should be introduced gradually. There is likely to be a trade-off
between the sophistication of the technique and the number of people who
are willing to learn how to use it. Nevertheless, by creating techniques that fit
well with the groups’ (scientific) ways of working, it may be possible to devise
new metrics that are both easy for the users to apply and effective.
3.3.4 Summary
Most of the researchers we surveyed test their software manually by check-
ing if the output matches their expectation. Part of the reason why automated
testing is underused is because traditional techniques are not well suited for
scientific software. They do not take into account other sources of error (e.g.
data or modelling assumptions) and they are often too technical or abstract
for scientists to apply. Another reason is that these techniques do not
fit well with the paradigm of many researchers. Scientists are typically more
interested in conducting research than thinking about software engineering.
Yet, although scientists are often not aware of advanced software testing tech-
niques, they sometimes use their own intuition to find solutions similar to
those in software engineering. This seems like a good place to start.
One approach is to manufacture the solution and then invert the equation to
determine what inputs were necessary to create it. Other approaches suggested
by Salari and Knupp [148]
include trend tests (varying the input parameters and checking the overall
pattern), symmetry tests (e.g. changing the order of inputs and checking that
the results are the same) and comparison tests (using pseudo-oracles).
Our iterative hypothesis testing technique repeatedly refines a set of hy-
potheses using random testing to search for discrepancies in the output.
Random testing is a straightforward and inexpensive software testing tech-
nique [125]. It can generate a large number of input values in a short amount
of time, then verify the results using automatic tests of our hypotheses. De-
spite its simplicity, random testing is often an efficient technique [118] and it
can sometimes be more effective than advanced tools for software testing [149].
Our iterative hypothesis testing technique is therefore straightforward to use
and it has the ability to reveal useful properties that help us to improve our
tests. We evaluate our technique by applying it to an epidemiological SEIR
model.
\[
\frac{dS}{dt} = -\beta I S, \qquad
\frac{dE}{dt} = \beta I S - \gamma E, \qquad
\frac{dI}{dt} = \gamma E - \mu I, \qquad
\frac{dR}{dt} = \mu I. \tag{3.1}
\]
H2: The total amount of host should not differ at each time step
Our model assumes a closed population, i.e. there are no births, deaths or
migration. Although host may move from one compartment to another,
it should never leave the model, nor should any new host enter it. We
therefore need to check that the amount of host in the compartments
(S, E, I and R) adds up to the same total at every time step.
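As a minimal sketch of how such hypotheses can be checked automatically under random testing, the following Python fragment integrates Equation (3.1) with SciPy's solve_ivp and asserts H1 and H2 for randomly sampled parameters; the parameter ranges, number of runs and tolerance are illustrative, and an assertion failure marks a candidate discrepancy to be investigated and, if appropriate, used to refine the hypothesis:

import numpy as np
from scipy.integrate import solve_ivp

def seir(t, y, beta, gamma, mu):
    S, E, I, R = y
    return [-beta * I * S,
            beta * I * S - gamma * E,
            gamma * E - mu * I,
            mu * I]

rng = np.random.default_rng(42)
TOL = 1e-6  # illustrative tolerance threshold

for _ in range(100):
    beta, gamma, mu = rng.uniform(0.0, 1.0, size=3)
    y0 = rng.uniform(0.0, 100.0, size=4)
    sol = solve_ivp(seir, (0.0, 50.0), y0, args=(beta, gamma, mu),
                    rtol=1e-8, atol=1e-10,
                    t_eval=np.linspace(0.0, 50.0, 200))
    # H1: no compartment should ever contain a negative amount of host.
    assert (sol.y >= -TOL).all(), (beta, gamma, mu, y0)
    # H2: the total amount of host should not differ at any time step.
    assert np.allclose(sol.y.sum(axis=0), y0.sum(), atol=TOL), (beta, gamma, mu, y0)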
approach to explore and refine these hypotheses so they can be used to con-
struct a test suite. We use random testing, coupled with boundary analysis,
and scrutinize the results to determine how the hypothesis should be changed.
[Figure panels: (a) Peak in E but not in I (H5); (b) AUC is more appropriate (H10)]
curves can have the same AUC yet have different shapes. We therefore
need to use both metrics (AUC and peak host) to characterize the difference.
Random testing is not effective at finding situations under which hypothe-
ses only fail for a specific set of input values. We therefore augmented our
random testing with boundary analysis by setting the input parameters to
zero one at a time. For example in H6, we found that when S = 0, increasing
β has no effect on the peak of E, because there is no host to move into E from
S. Similarly in H8, increasing µ has no effect on the final amount of host in
S when β = 0 because host will never leave S regardless. Our approach to
boundary analysis could miss some important cases because it does not take
the differential equations into account. It might also be possible to apply a
more advanced symbolic approach to analyze the equations directly.
[Figure panels: (a) Jumping over transitions (H11); (b) Problem with asymptote (H15)]
host), this occurs when the threshold is around 1 × 10⁻⁶ (see Figure 3.8a); for
H2 (the sum of hosts does not differ from the initial total), it occurs when
the threshold is around 2 × 10⁻⁸ (see Figure 3.8b).
It can be difficult to set the threshold appropriately in advance, since the
optimum value depends upon the hypothesis being tested. However, we dis-
covered that the numerical error in H1 is dominated by a stopping condition in
the solver (which, by default, stops at no more than 1 × 10⁻⁶ from the correct value). The
optimum threshold for H2 appears to be smaller than H1, because adding up
the compartments cancels out some of the error. Yet, in the worst case, these
errors accumulate instead of cancelling out. Error in the mathematical deriva-
tions is dominated by the use of numerical operations whose results cannot be
represented precisely by the data type. Experimentation is required to choose
an appropriate tolerance threshold for the various interacting sources of error,
but a better understanding of these errors helps us test our hypotheses.
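One simple way to carry out this experimentation is to record the worst violation observed over many random runs and use that value, with a safety margin, to inform the threshold. A sketch (with the same illustrative SEIR right-hand side as before) follows:

import numpy as np
from scipy.integrate import solve_ivp

def seir(t, y, beta, gamma, mu):
    S, E, I, R = y
    return [-beta * I * S, beta * I * S - gamma * E, gamma * E - mu * I, mu * I]

rng = np.random.default_rng(0)
worst = 0.0
for _ in range(200):
    beta, gamma, mu = rng.uniform(0.0, 1.0, size=3)
    y0 = rng.uniform(0.0, 100.0, size=4)
    sol = solve_ivp(seir, (0.0, 50.0), y0, args=(beta, gamma, mu),
                    rtol=1e-8, atol=1e-10)
    # Worst violation of H2 (conservation of total host) seen so far.
    worst = max(worst, np.abs(sol.y.sum(axis=0) - y0.sum()).max())

print(f"worst observed conservation error: {worst:.2e}")
# A tolerance threshold is then chosen somewhat above this value and revisited
# whenever the solver settings or the hypotheses change.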
There are also some issues due to the way in which the equation for I_t
in H14 is written. When γ and t are big, and µ is small, the term e^{t(γ−µ)}
will evaluate to infinity (as it is outside the range of representable numbers).
Subsequently, when multiplied by e^{−γt} (which evaluates to zero due to the
limits of precision), the result is NaN (not a number) for many time steps.
This problem can be avoided by separating out the e^{−µt} and e^{−γt} terms such
that they go to zero rather than infinity when the exponents become large:
\[
I_t = \frac{E_0\,\gamma\,(e^{-\mu t} - e^{-\gamma t}) + I_0\,(\gamma - \mu)\,e^{-\mu t}}{\gamma - \mu}.
\]
A special case must also be constructed for γ = µ, to avoid problems due to
division by zero.
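A short Python sketch (with illustrative parameter values) makes the difference between the two formulations concrete:

import numpy as np

def I_naive(t, E0, I0, gamma, mu):
    # e^{t(gamma - mu)} overflows to infinity for large t(gamma - mu); multiplied
    # by e^{-gamma t}, which underflows to zero, the product becomes NaN.
    return (E0 * gamma * (np.exp(t * (gamma - mu)) - 1.0) * np.exp(-gamma * t)
            + I0 * (gamma - mu) * np.exp(-mu * t)) / (gamma - mu)

def I_stable(t, E0, I0, gamma, mu):
    # Rearranged so that the exponentials go to zero rather than infinity.
    if np.isclose(gamma, mu):
        # Special case gamma = mu (the limit of the expression below).
        return (E0 * gamma * t + I0) * np.exp(-mu * t)
    return (E0 * gamma * (np.exp(-mu * t) - np.exp(-gamma * t))
            + I0 * (gamma - mu) * np.exp(-mu * t)) / (gamma - mu)

print(I_naive(t=200.0, E0=10.0, I0=1.0, gamma=5.0, mu=1e-3))   # nan
print(I_stable(t=200.0, E0=10.0, I0=1.0, gamma=5.0, mu=1e-3))  # finite value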
3.4.5 Summary
Iterative hypothesis testing is easy for scientists to use because it is mod-
elled on the scientific method. It found discrepancies in all but one of the
hypotheses we tested (H4) and the information generated can be used to re-
fine them (see below). In addition to testing the software, iterative hypothesis
testing allows scientists to understand their software more thoroughly and
the resulting tests are helpful for communicating the way in which it works.
However, the refined hypotheses are not guaranteed to be perfect; they may
still fail under some specific input conditions.
1. Scientists are familiar with the idea behind iterative hypothesis testing
2. We reinterpret testing this way to make it more accessible to scientists
3. It constructs rigorous tests, even starting with imperfect information
Refined Hypotheses:
H1: None of the compartments should ever contain a negative
amount of host (allowing for a suitable tolerance threshold)
H2: The total amount of host should not differ at each time step
(allowing for a suitable tolerance threshold)
H3: The amount of susceptible host (S) should never increase
(allowing for a suitable tolerance threshold)
H4: The amount of removed host (R) should never decrease
H5: The exposed host (E) peak should not occur after the infectious
host (I) peak (unless I is monotonically decreasing)
H6: Increasing the infection rate (β) should increase the peak of E
(unless S is equal to zero, allowing for a suitable tolerance threshold)
H7: Increasing the latent rate (γ) should reduce the time until the
peak of I (unless I is monotonically decreasing)
H8: Increasing µ should increase the final amount of host in S
(unless β is equal to zero, allowing for a suitable tolerance threshold)
H9: Increasing β should decrease the final amount of host in S
(unless S or I is equal to zero, allowing for a suitable tolerance threshold)
H10: Increasing the number of susceptible host (S0 ) should increase
the peak of I (unless I is equal to zero and also check AUC)
H11: I should be increasing when γE > µI, otherwise it should be
decreasing (except for points at which the direction of I transitions)
H12: If I = E = 0, the state of the model should not change
(allowing for a suitable tolerance threshold)
H13: Exact analytical solutions are available when γ = 0
(allowing for a suitable tolerance threshold)
H14: Exact analytical solutions are available when β = 0
(rearrange analytical solution and allow suitable tolerance threshold)
the same way. As we have seen earlier, computational modellers often start
by building on software that has already been developed, before extending it for
their new research. This provides an opportunity in that there are often mul-
tiple implementations of a basic model available. By widening the differences
between these implementations using search, we can identify the worst case
disagreements, use statistical analyses to assess whether they are significant
and find software faults more easily amongst the other sources of error.
The rates at which hosts move between compartments in the model depend
upon the choice of parameters (α, β, ε, γE and γC) and the number of hosts
already in each compartment (notably, the number of cryptic and infectious
hosts). The HLB model used in this chapter is a continuous-time discrete-host
spatially-explicit stochastic model. The values of γE and γC are fixed, but α,
β and ε can be adjusted by our search-based optimization technique.
When there are no cryptic or infectious trees, the amount of time before
a given susceptible tree moves to the exposed compartment is exponentially
distributed with rate parameter ε. This process reflects infectious material
coming from outside the citrus grove, and is termed ‘primary infection’. The
times spent in the exposed and cryptic compartments are also exponentially
distributed, with rate parameters γE and γC respectively. The number of cryp-
tic and infectious trees increases the rate at which susceptible trees become
infected (‘secondary infection’). The rate at which a susceptible host i becomes
infected at time t when there are some number of cryptic and/or infectious
trees is given in Equation 3.2.
\[
\phi_i(t) = \varepsilon + \frac{\beta}{\alpha^2} \sum_j k\!\left(\frac{r_{ji}}{\alpha}\right), \tag{3.2}
\]
the models are the same, we use the parameters given in M1 [146].
compartment; for consistency with M2, we set the amount of time to be distributed accord-
ing to an exponential distribution.
“more extreme” than what was actually observed. Since we are comparing
the outputs of two different implementations, the smaller the p-value for a
particular statistic, the more confident we can be that the values of that
statistic differ from one implementation to the other. By taking the minimum
of the p-values for multiple statistics, we aim to make at least one of the p-
values as small as possible. This allows our optimizer to make at least one of
the differences in outputs between implementations as large as possible.
\[
f = \min_{c \in \{S,E,C\}} \min
\left\{
\begin{aligned}
&p(\mathrm{peak\_hosts}_{c,i},\ \mathrm{peak\_hosts}_{c,j}),\\
&p(\mathrm{peak\_time}_{c,i},\ \mathrm{peak\_time}_{c,j}),\\
&p(\mathrm{AUC}_{c,i},\ \mathrm{AUC}_{c,j}),\\
&p(\mathrm{total\_hosts}_{c,i},\ \mathrm{total\_hosts}_{c,j})
\end{aligned}
\right\} \tag{3.3}
\]
(along with the means and standard deviation), to maximize the likelihood of
previously successful search steps and candidate solutions. We generate time
series t1 and t2 from model implementations M1 and M2, then characterize
them using the peak hosts ph, peak time pt and area under the curve auc
statistics, along with the total number of hosts th.
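A sketch of this fitness calculation in Python is shown below, using SciPy's ks_2samp for the Kolmogorov–Smirnov test; the helper functions and the treatment of the total-hosts statistic are illustrative simplifications rather than the original implementation:

import numpy as np
from scipy.stats import ks_2samp

def peak_hosts(runs):
    # runs: array of shape (n_runs, n_times) for one compartment.
    return runs.max(axis=1)

def peak_time(runs, times):
    return times[runs.argmax(axis=1)]

def auc(runs, times):
    dt = np.diff(times)
    return ((runs[:, :-1] + runs[:, 1:]) * 0.5 * dt).sum(axis=1)

def fitness(m1_runs, m2_runs, times):
    """m1_runs, m2_runs: dicts mapping compartment name -> (n_runs, n_times) array."""
    p_values = []
    for c in ("S", "E", "C"):
        a, b = m1_runs[c], m2_runs[c]
        p_values.append(ks_2samp(peak_hosts(a), peak_hosts(b)).pvalue)
        p_values.append(ks_2samp(peak_time(a, times), peak_time(b, times)).pvalue)
        p_values.append(ks_2samp(auc(a, times), auc(b, times)).pvalue)
    # Total hosts across all compartments at the final time step.
    total1 = sum(m1_runs[c] for c in ("S", "E", "C"))[:, -1]
    total2 = sum(m2_runs[c] for c in ("S", "E", "C"))[:, -1]
    p_values.append(ks_2samp(total1, total2).pvalue)
    # The search-based optimizer minimizes this value, driving at least one
    # of the differences between the implementations to be as large as possible.
    return min(p_values)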
the differences in output we observed. The parameter values used to find these
differences are shown in Table 3.1 and the p-values of the Kolmogorov–Smirnov
test on 100 time series for each difference are shown in Table 3.2.
While M1 and M2 should behave identically in theory, running the same
simulations on both was in practice complicated by each model having dif-
ferent additional features, parameterization, and input/output formats. We
emphasize that the differences described below do not invalidate the results or
conclusions of [146] or [123]. Rather, they highlight that two theoretically
Table 3.2: p-Values of the Kolmogorov–Smirnov Test for Each Difference

Statistic        Difference 1     Difference 2     Difference 3     Difference 4
AUC_S            0.013            5.112 × 10⁻³³    0.961            1.888 × 10⁻⁵
AUC_E            0.677            5.774 × 10⁻³⁷    0.193            0.675
AUC_C            0.556            1.212 × 10⁻⁷     0.443            1.000
PT_S             1.000            1.000            1.000            1.000
PT_E             0.047            5.113 × 10⁻³³    0.894            0.675
PT_C             0.961            1.317 × 10⁻³⁸    0.677            1.000
PH_S             1.000            1.000            1.551 × 10⁻⁴⁵    1.000
PH_E             0.140            3.963 × 10⁻¹⁶    0.961            0.675
PH_C             0.099            5.335 × 10⁻⁴²    0.677            1.000
TH_All           1.000            1.000            1.000            0.006
identical models can be implemented in slightly different ways, and show the
challenges involved in using two different implementations to simulate the
same system.
1) Start time/age category. Among the features of M1 not in M2 were
the starting time and age categories. The rate at which trees move from the
Exposed compartment to the Cryptic compartment depends on their age (i.e.
γE is different). The time at which the epidemic starts is given by t0; before
t0, ε is effectively 0, so none of the trees become infected. We presumed the
starting time was not relevant for our purposes, so we set t0 = 0. However, we
discovered that as well as controlling the starting time, t0 also affects the age
category, thus impacting the rate at which HLB spreads.
Age categories do not exist in M2, so M1 and M2 had different values for
γE until trees were old enough to change category. The differences can be seen
in Figure 3.10a and 3.10b. Notably, the Exposed curve reaches its peak earlier
for M1 than M2. This is because younger trees move from Exposed to Cryptic
more quickly. It also affects the time the cryptic curve reaches its peak and
feeds back through secondary infection to the Susceptible compartment. The
difference was picked up4 by the Kolmogorov–Smirnov test (see Table 3.2)
in the time of the exposed peak (P TE ) and area under the susceptible curve
(AU CS ). We resolved this issue by setting t0 = 1.
2) Distribution of time spent in Cryptic compartment. In M2, the time
spent in the cryptic compartment is exponentially distributed (as is common
for disease models), but we discovered it is gamma distributed in M1. Even
when M1 and M2 had identical means, the distributions were different, so the
time series were not the same (see Figure 3.10c and 3.10d). The most obvious
difference is that in M1, the cryptic curve rises further ahead of the Infectious
4 Significance is determined at the 95% confidence interval.
curve than in M2. Our technique made this difference more prominent by
maximizing β (see Table 3.1).
This was the largest difference we observed between M1 and M2. The
Kolmogorov–Smirnov test (see Table 3.2) identified a significant4 difference in
all of the statistics, apart from the susceptible peak and the total hosts, which
should not change under any circumstances. We removed this difference by
modifying the code of M1 to use an exponential distribution.
3) Initial number of Susceptibles. We started all the simulations with 2000
Susceptible hosts, so the peak of the susceptible curve should not change.
However, the Kolmogorov–Smirnov test (see Table 3.2) showed this value was
significantly4 different between M1 and M2. Upon closer inspection, we found
the initial number of Susceptible hosts in M1 was 1999. This was caused by a
mistake in the mechanism we created to make the output formats of M1 and
M2 the same. The state after the first transition from susceptible to exposed
was being copied, instead of the state before the transition.
4) Modified output/truncation. For extremely low values of ε (see Table
3.1), there are sometimes simulations in which no infections occur. In the
modified output for M1, these resulted in a single-line output comprising only
the initial state (with all hosts susceptible). It was expected to contain the
state at subsequent times as well, but these were not produced. This difference
was found by the Kolmogorov–Smirnov test (see Table 3.2) in the area under
the susceptible curve when ε was equal to 5.124 × 10⁻⁶.
Table 3.3: Comparison of Areas under p-Value Progress Curves for the
Search-Based Technique and Random Testing (significant results are in bold)
Statistic                            K-S Distance    p-Value     Effect Size
AUC for Susceptible (AUC_S)          0.22            0.1786      0.130
AUC for Exposed (AUC_E)              0.16            0.5487      0.017
AUC for Cryptic (AUC_C)              0.36            0.002835    0.391
Peak Hosts for Susceptible (PH_S)    0               1           NA
Peak Hosts for Exposed (PH_E)        0.26            0.06779     0.211
Peak Hosts for Cryptic (PH_C)        0.22            0.1786      0.130
Peak Time for Susceptible (PT_S)     0               1           NA
Peak Time for Exposed (PT_E)         0.34            0.005842    0.357
Peak Time for Cryptic (PT_C)         0.32            0.01151     0.321
Total Hosts for All (TH_All)         0.36            0.003068    0.388
Table 3.4: Comparison of p-Values Achieved after 1 Hour for the Search-
Based Technique and Random Testing (significant results are in bold)
             Median                   Standard Deviation        Wilcoxon Test
Statistic    Optimized    Random      Optimized    Random       p-Value      Effect Size
AUC_S        0.0030       0.0030      0.0041       0.0079       0.2176       0.1755
AUC_E        0.0014       0.0024      0.0037       0.0051       0.5603       0.0834
AUC_C        0.0030       0.0050      0.0045       0.0046       0.1354       0.2120
PH_S         1            1           0            0            NA           NA
PH_E         0.0082       0.0082      0.0101       0.0103       0.1437       0.2077
PH_C         0.0050       0.0082      0.0073       0.00854      0.01275      0.3491
PT_S         1            1           0            0            NA           NA
PT_E         0.0030       0.0050      0.0043       0.0037       0.2592       0.1607
PT_C         0.0050       0.0050      0.0057       0.0087       0.07965      0.2482
TH_All       0.7942       0.5560      0.2919       0.2551       0.002483     0.4201
and Exposed (peak hosts and peak time) compartments, in addition to the
total hosts statistic. The effect sizes5 are considered to be small to medium,
since they are mostly between 0.2 and 0.5 [152]. We also applied a Wilcoxon
signed rank test (see Table 3.4) to the final p-values achieved by each tech-
nique. Only the peak hosts in the cryptic compartment and the total hosts
are significantly different4 ; the effect sizes5 for these tests are also not large.
3.5.6 Summary
Stochastic software is difficult to test because the distribution of output
values may overlap even when the programs are different. We can use search-
based optimization and statistical (Kolmogorov–Smirnov) tests to find input
parameters for which the outputs of the implementations differ the most.
Using this technique, we were able to identify and remove four previously
unknown differences related to the way in which the implementations were
used by a third party that could have a significant impact on the results. The
technique presented in this section makes the causes of the differences more
readily identifiable, increases the speed at which software may be tested and
allows differences to be observed that might otherwise have been overlooked.
3.6 Conclusions
In this chapter, we explored three case studies of scientific software de-
velopment in different research groups in the Department of Plant Sciences
at the University of Cambridge (the Epidemiology and Modelling group, the
Theoretical and Computational Epidemiology group and the Bioinformatics
group). We then presented two new techniques (iterative hypothesis testing
and search-based pseudo-oracles) that help to make it easier for these re-
searchers to understand, test and communicate their scientific software.
Scientific software is difficult to test because it is not always clear what the
correct outputs should be; scientists’ perceptions change as they are exploring
the results. There are also challenges from big data, stochastic processes and
mathematical assumptions. Yet the results are important, as (for the groups
we investigated) they are used to inform government policy decisions and act
5 Two-sample Cohen’s d effect sizes calculated using d = Z √(1/n₁ + 1/n₂).
as a starting point for future research. We need to make sure, not only that
the software is correct, but that the researchers fully understand how it works
and communicate this information to other people who will use the software.
The group members we surveyed were not trained in software engineering
techniques and they tended not to test their programs methodically. Part of
the reason is that they could not see how traditional software engineering
techniques applied to their work and they were worried it would take too
much time away from their research. However, some group members were
using their intuition to discover for themselves techniques which are similar
to those recommended in software engineering. It therefore makes sense to
engage these researchers from their perspective. This is why we introduced
two techniques that correspond well with the scientific research method.
Iterative hypothesis testing allows researchers to understand the behavior
of their software in more detail. It does this by automatically finding situ-
ations that challenge their initial perceptions. It found discrepancies in all
but one of our initial hypotheses and the information generated was used to
produce a refined set that reflects a new understanding of the software. Our
search-based pseudo-oracle technique allows researchers to identify differences
between highly complex implementations that are difficult to tell apart due
to their stochastic nature. This makes it possible to understand intricate de-
tails that might otherwise be overlooked. Using this technique, we identified
and removed four differences related to the way in which the implementations
were used by a third party that could significantly affect the results.
An increasing number of researchers are using and developing scientific
software because of its benefits in scale, speed and economy. The majority of
this software will be developed by scientists rather than software engineers, so
it is important they are equipped with tools suitable for them. The techniques
described in this chapter are applicable to a wide range of scientific software
and computational models. They work in a way scientists understand.
3.7 Acknowledgments
This work was supported by the University of Cambridge/ Wellcome Trust
Junior Interdisciplinary Fellowship “Making scientific software easier to under-
stand, test and communicate through modern advances in software engineer-
ing.” We thank Andrew Craig, Richard Stutt, James Elderfield, Nik Cunniffe,
Matthew Parry, Andrew Rice and Chris Gilligan for their helpful advice and
useful discussions.
This work is based on an earlier work: Software Testing in a
Scientific Research Group, in Proc. ACM Symposium on Applied
Computing, 2016 [in press].
Chapter 4
Testing of Scientific Software:
Impacts on Research Credibility,
Development Productivity,
Maturation, and Sustainability
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2 Testing Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.1 Granularity of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.2 Types of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.2.3 Organization of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.2.4 Test Analysis Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3 Stakeholders and Team Roles for CSE Software Testing . . . . . . . . 95
4.3.1 Stakeholders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3.2 Key Roles in Effective Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.3.3 Caveats and Pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4 Roles of Automated Software Testing in CSE Software . . . . . . . . . 98
4.4.1 Role of Testing in Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.4.2 Role of Testing in Development Productivity . . . . . . . . . . . 99
4.4.3 Role of Testing in Software Maturity and Sustainability 101
4.5 Challenges in Testing Specific to CSE . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.5.1 Floating-Point Issues and Their Impact on Testing . . . . . 103
4.5.2 Scalability Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.5.3 Model Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.6 Testing Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.6.1 Building a Test Suite for CSE Codes . . . . . . . . . . . . . . . . . . . 110
4.6.2 Evaluation and Maintenance of a Test Suite . . . . . . . . . . . . 112
4.6.3 An Example of a Test Suite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.6.4 Use of Test Harnesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.6.5 Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.8 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.1 Introduction
High quality scientific software has always required extensive testing to
be a reliable tool for scientists. Good testing practices have become even
more important today, since software projects are facing significant refactoring
efforts due to increased demands for multiphysics and multiscale coupling,
as well as significant computer architecture changes, especially with a view
towards exascale computing. Regular, extensive testing of software is very
important for many reasons; it:
• Promotes high-quality software that delivers correct results and im-
proves confidence in the software;
• Increases the quality and speed of development, thereby reducing devel-
opment and maintenance costs;
• Maintains portability to a wide variety of (ever changing) systems and
compilers; and
• Facilitates refactoring and the addition of new features into library code
without unknowingly introducing new errors, or reintroducing old errors.
Poorly tested software can also have consequences that extend across whole
research communities; see Section 4.3.1 and [178]. This chapter not only in-
vestigates the multitude of reasons why software testing is so important but
also provides some information on how to perform it (and provides references
to more detailed sources of information). While much of the information can
also be applied to general scientific software, there is particular focus on soft-
ware challenges pertaining to mathematical and physical models, which are
part of many scientific and engineering codes. The chapter does not address
all software challenges for all CSE domains. A large amount of its content is
based on the authors’ experiences.
Often research teams are not aware of tools that facilitate better testing
or do not have the software skills to use them accurately or efficiently, and
gaining the knowledge and skills requires a non-trivial investment. In some
cases they might not even have access to computing resources that would al-
low them to perform regular automated testing. Consequently, code testing
requirements and practices vary greatly among projects, going from none to
excellent. To some extent they depend upon the size and visibility of the soft-
ware. Among the worst offenders are codes developed by individuals or very
small teams in science domains where the only users are internal. The best
managed and tested codes are the ones that have regular public releases and
are therefore subject to external scrutiny. Even within well tested codes there
is some variability in the types (see Section 4.2) and extent of testing. Many
scientific codes are under constant development. Some of the development re-
lates to new features or capabilities, and some relates to bug-fixing. Specific
to scientific codes, a lot of development also relates to improvements in algo-
rithms and features because of new research findings. A desirable testing cycle
for large multiphysics projects, which have the most stringent requirements,
would have the elements described in this section. Testing for many other
projects will be some proper subset of these practices. Ongoing maintenance
of CSE codes relies upon regular, preferably daily, automated building and
testing. Here, the test suites aim to provide comprehensive coverage for exist-
ing features. The developers monitor the outcome and strive to fix any failures
promptly. The same testing regime is applicable to minor feature additions or
modifications, and bug fixes. With a bug report after release, it may be nec-
essary to modify some of the tests. Larger feature additions or modifications
may involve the addition of new tests, or tweaking already existing tests (see
Section 4.6.1). Prior to a public release codes should undergo comprehensive
testing which includes regression tests and possibly acceptance tests required
by the users.
New capability development in CSE codes has its own testing require-
ments. This is where the research component of software becomes the domi-
nant determinant of the kind of tests. Some of these tests do not fall under any
of the general categories applicable to other software. For example, numerical
algorithms have to be verified for validity of range, stability and numerical ac-
curacy. They also have to be tested for the order of convergence. All of these
tests are applied along with the general verification tests as the algorithm is
classes. They are usually written by code developers before or during code de-
velopment to detect faults quickly and prevent faults from being introduced.
By definition, unit tests must build fast, run fast, and localize errors. An ex-
ample of a unit-level test in a CSE code would be a test for a quadrature
rule over a few basic element geometries. This test would pass in points from
various known functions and verify whether the results are correct, e.g., a 3rd
order quadrature rule should exactly integrate a cubic polynomial.
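A minimal sketch of such a unit-level test in Python (using NumPy's Gauss–Legendre rule, for which a 2-point rule is exact for polynomials up to degree 3) could look like this:

import numpy as np

def test_gauss_legendre_exact_for_cubic():
    points, weights = np.polynomial.legendre.leggauss(2)  # 2-point rule on [-1, 1]
    f = lambda x: 4.0 * x**3 + 3.0 * x**2 - 2.0 * x + 1.0
    numeric = np.dot(weights, f(points))
    exact = 4.0  # odd terms integrate to zero; 3x^2 and 1 each contribute 2
    assert abs(numeric - exact) < 1e-12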
Integration tests are focused on testing the interaction of larger pieces
of software but not at the full system level. Integration tests typically test
several objects from different types of classes together and typically do not
build and run as fast, or localize errors as well as unit tests. An example of
an integration-level test in CSE software would be an element calculation for
a specific set of physics over a single element or a small number of elements.
System-level tests are focused on testing the full software system at the
user interaction level. For example, a system-level test of a CFD code would
involve passing in complete input files and running the full simulation code,
and then checking the output and final solutions by some criteria. System-
level tests on their own are typically not considered a sufficient foundation
to effectively and efficiently drive code development and code refactoring ef-
forts. There are different hierarchies of system-level tests, for example, they
could be testing complete single-physics, possibly varying constitutive or sub-
scale models, which would be on a lower level than various combinations and
scenarios of the complete coupled physics code.
4.3.1 Stakeholders
High quality CSE software testing is crucial for many parties. First of all,
domain scientists need correctly operating simulation and analysis tools that
can be used to produce reliable scientific results. These results will in turn af-
fect other scientists who use them as a foundation for their research. If software
produces wrong results unknown to the scientists and software developers due
to insufficient testing, whole research communities can be negatively affected.
scoping the target and the design of the test, and are often the best choice
for implementing the test. The implementer of the test suite can be distinct
from the implementer of tests, with primary responsibility being compilation
of tests into the test suite. An effective testing process still needs a manager
with an overall vision of the testing goals and motivations. This is especially
true where interoperability of components and composability of the software
are important. A test suite implementer can often be an effective manager of
the testing process for the larger teams with some resources to devote to the
software process.
In the context of CSE software, an orthogonal categorization of roles is
necessary to fully understand the process: verification and validation tester,
performance tester, maintenance tester, and finally, manager. This classifi-
cation offers a closer reflection of expertise division within the team. Thus
verification and validation testers are experts in modeling and numerics with
a responsibility for developing and testing the mathematical aspects of the
software. A performance tester has the knowledge of the target platform and
general high performance computing. A maintenance tester is responsible for
the test suite, including running, optimization of resource use, monitoring the
state and general health of the test suite. A manager has the overall responsi-
bility for testing which includes coverage, policies, quality of verification and
validation, targeted testing for production campaigns etc.
For a successful testing regime it is important for all team members to
understand their CSE specific roles defined earlier. Every verification and val-
idation tester will fulfill the roles of designer and analyst with a partial role of
tester thrown in. This is because of the research aspects of the software where
expertise is tied to the model and the algorithm. Design, scoping and analysis
of verification, test results (which include convergence, stability and accuracy
tests), all demand expertise in the physical model being implemented. A per-
formance tester covers the roles of analyst and tester where the goal is to
define the performance targets and ensure that they are met. The remainder
of the tester role is fulfilled by the maintenance tester. This role is of interest
to everyone associated with the software development effort, all code develop-
ers and users, including external ones, if there are any. The manager’s role is
similar in both definitions, that of ensuring the quality of testing and coverage,
and determining policy and process. This role is important for all developers
and users of the code to ensure quality software.
It may seem like a small team should prioritize giving the roles of designer and analyst
to its members, but in fact one member taking on the role of the manager
is likely to be much more critical to success. This is because in the absence
of overall testing strategy and goals, it is easy to overlook some components.
With someone taking the managerial responsibility, it is much more likely that
testing itself does not get shortchanged in face of other pressures.
In CSE software teams, because interdisciplinary interactions are common,
the distribution of roles is much more challenging. The human tendency to
blame the “other” side can be particularly difficult to overcome when there are
code components that come from different disciplines. It is an entirely feasible
scenario that each component has undergone its own testing to the satisfaction
of the corresponding team, but the test design does not cover some interactions
that come into play only when combined with another component. In such a
situation, if there is no trust between the designers and analysts, the results
could be unhealthy blame apportioning.
Often the results that research software provides are more valuable than the software itself. Some research
software is written from scratch for a specific scientific study and then no
longer used. Therefore, one often is less concerned about the long-term main-
tainability of short-lived research software compared to longer-lived software.
Tests are very important even in software that is only used by researchers. De-
fects in research software can be just as damaging as in production software.
While a defect in production software used in a business environment may
cost a company money and result in damage to the company’s reputation,
a defect in research software can damage and retard scientific advancements
in major ways. Several examples of this can be found in the literature. One
example is the case of the protein folding code [178] already mentioned in
Section 4.3. Another example is the claim of the successful simulation of Cold
Fusion [182]. Therefore, the most important role that automated testing plays
in research software is the reduction of defects that damage the integrity of
the research and science itself.
Another important usage of automated tests for research software is the
demonstration of numerical properties that are claimed in research publica-
tions. For example, if some new numerical method claims to be second-order
convergent, then the research code should have tests that demonstrate second-
order convergence on a few manufactured problems. Such tests help to reduce
defects as well as test the numerical analysis presented in the paper. If the
researcher’s proof of second-order convergence is false and the method is ac-
tually only first-order convergent, it will be nearly impossible to construct
exhaustive tests that show second-order convergence. This should cause the
researcher to go back to the proof and discover the mistake. Note that such
tests can also be focused on finer-grained sub-algorithms whose numerical
properties can be demonstrated and checked as well.
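As a sketch of what such a test might look like, the following Python example checks the observed order of convergence of a standard second-order (central-difference) approximation against the manufactured solution u(x) = sin(x); the grid sizes and the tolerance band around the theoretical order of 2 are illustrative:

import numpy as np

def second_derivative_error(n):
    x = np.linspace(0.0, np.pi, n + 1)
    h = x[1] - x[0]
    u = np.sin(x)
    d2u = (u[:-2] - 2.0 * u[1:-1] + u[2:]) / h**2      # central difference
    return np.max(np.abs(d2u - (-np.sin(x[1:-1]))))    # exact u'' = -sin(x)

def test_second_order_convergence():
    e_coarse = second_derivative_error(64)
    e_fine = second_derivative_error(128)
    observed_order = np.log2(e_coarse / e_fine)
    # If the method were only first-order, the observed order would be near 1
    # and this test would fail.
    assert 1.8 < observed_order < 2.2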
Another role of testing in research software is to help provide a founda-
tion for other researchers to continue the work. By having tests in place that
demonstrate and protect the major behaviors and results of the research code,
other researchers can tinker with the implemented algorithms and methods
and have confidence that they have not damaged the existing algorithms.
Thus, other researchers can build on existing research in a way that is not
currently well supported in the CSE community.
Finally, a critical role that testing plays in research software is that it
aids in the reproducibility of the numerical results of the research. For some
classes of algorithms and methods, it is possible for the automated test suite
to actually produce the results reported in publications. In other cases where
the results require special hardware and are very expensive to run, automated
tests can test a “dumbed down” version that, with a simple change to the
inputs, can reproduce the full expensive calculations. Of course testing alone
is not sufficient to achieve reproducibility, but one can argue that it is a
necessary component of reproducibility.
test how robust the element-level calculations are when faced with these ill-
conditioned corner cases. Detecting such problems in a full simulation in a
portable way is extremely difficult. Creating entire meshes that expose all of
the various types of ill-conditioned elements is very hard and isolating the
impact of these elements is equally difficult in system-level tests. Codes that
have only system-level tests based on necessarily looser tolerances tend to
allow subtle defects to creep into the code.
Finer-grained tests force more modular design (leading to better
reuse and parallel development). That is because the process of getting
a single class or a small number of classes or functions into a unit-test harness
requires breaking entangling dependencies that often exist in software that
is not unit tested. The benefits of modular design are numerous. The more
modular classes and functions are, the easier it is for different developers
to work on different parts of the system at the same time without stepping
on each other’s toes (i.e., merge and semantic conflicts upon integrating the
different branches of development). Also, more modular code allows for easier
reuse of that code in other contexts, even outside the original application
code. This further accelerates productivity because it is often easier to reuse
an existing piece of (working) software than to write it from scratch yourself
(which is likely to result in a buggy implementation).
Finer-grained tests make porting to new platforms much more
efficient with lower risk and help to isolate problems with third-
party libraries (TPLs) on new platforms. Here porting might involve
new compilers, new processor classes, new TPL versions and implementations,
etc. For example, if a large CSE code only has system-level tests, what do
you do when you port to a new platform and some of the solves just fail,
the solvers diverge, or the code produces radically different, possibly wrong,
answers? Without finer-grained tests, one is left with a difficult and time
consuming debugging effort over thousands (to hundreds of thousands) of
lines of code to try to determine the cause of the problems. In contrast, with
good finer-grained unit-level and integration-level tests in place, the selective
failure of the finer-grained tests will pinpoint the sections of code that cause
the problem.
With respect to software written for users, automated tests help in the
communication with the users. For example, users can inject their require-
ments directly into the development process by helping to develop acceptance
tests, even before the software is written. The process of writing acceptance
tests first and then implementing the changes or additions in the software
to pass the acceptance tests is known as acceptance test driven development
(ATDD) [169].
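A sketch of what such an acceptance test might look like is given below (in Python with pytest; the simulator executable, input deck, output format and the 900 K threshold are hypothetical stand-ins for whatever the user actually specifies):

import json
import subprocess

def test_peak_temperature_requirement(tmp_path):
    result_file = tmp_path / "results.json"
    # Run the (hypothetical) simulator on the input deck supplied by the user.
    subprocess.run(["./simulator", "--input", "user_case.inp",
                    "--output", str(result_file)], check=True)
    results = json.loads(result_file.read_text())
    # Acceptance criterion agreed with the user before implementation began.
    assert results["peak_temperature"] < 900.0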
did not adequately maintain it over subsequent changes. When such documen-
tation gets too far out of sync with the code itself, it can do more harm than
good. Therefore, it may be better to invest in a clean well purposed automated
test suite than in design and implementation documentation that is not well
maintained.
While the above issues can result in different floating-point results, they
all generally result in deterministic floating-point numbers. That is, if the
CSE software and tests are well written (e.g., use the same seed for random
number generators on each test run), then the code will produce the exact
same binary result each time it is run. Determinism makes it easier to define
test suites and to debug problems when they occur. But determinism is lost
once one considers scalable high-performance multi-threaded algorithms.
In a scalable multi-threaded program, one cannot guarantee that the same
binary floating-point numbers are computed in each run of the same code.
This makes it impossible for tests to use binary reproducibility as in [165].
CSE software often contains complex linear and nonlinear models, which
present a number of challenges with regard to floating-point operations. For
example, a set of nonlinear equations can have more than one solution [179],
and algorithms that solve these equations can find any one of these solutions or
no solution at all. Nonlinear models can also possess bifurcations [157, 189],
non-convexities [179], ill-conditioning [179] and other phenomena that
can make the behavior of numerical algorithms on these problems quite un-
predictable. Therefore, if any of these factors is present, a test may result in a
solution with a difference as large as O(1) in any norm one might consider com-
pared to a previous solution. Even in the case of linear solvers, ill-conditioning
can result in large changes in the results for small differences in the order of op-
erations and how round off is performed. In addition, many of the algorithms
are iterative and are terminated once a given error tolerance is achieved. This
becomes a major issue when tests are defined that use loose tolerances and
evaluate the final solution or look at the exact number of iterations used for a
given test problem. A minor change in the order of floating-point operations
can result in a tested algorithm going from, for example, 5 to 6 iterations and
cause a large change in the results.
Also the use of operations like min() and max() can result in large changes
in the output of a numerical test for even small changes in the order or round
off of floating-point operations due to any of the above issues. For example,
if a test checks the value and position of the maximum temperature in a
discretized field of a finite-element code, a small change in floating-point op-
erations can cause a large change in the position of such a value, resulting in
a non-continuous change in the output of the code.
While in a non-floating-point code, by definition, refactoring the code
does not result in any observed change in the behavior or the output of the
program [168], this property, unfortunately, cannot be maintained for even
the most simple refactoring in a floating-point code, even for mathematically
identical code. For example, suppose one starts out with a block of code that
computes z = a + b + c. If one refactors the code to create a function add()
and uses it as z = add(a, b) + c then the floating-point result for z can be
different depending on the compiler, the processor, the sizes of the various
numbers, etc. Therefore, refactoring can become very difficult and expensive
in codes with poorly designed test suites that do not take into account the
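One underlying reason is that floating-point addition is not associative, so a change in grouping, whether introduced by a refactoring or by a compiler optimization, can change the computed value. A small Python illustration, with values chosen to make the effect visible:

a, b, c = 1.0e16, -1.0e16, 1.0

z1 = (a + b) + c    # 0.0 + 1.0 = 1.0
z2 = a + (b + c)    # b + c rounds back to -1.0e16, so the result is 0.0

print(z1, z2, z1 == z2)  # 1.0 0.0 False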
One of the challenges for weak scaling studies is the question on how to
best scale up the global problem in a way that allows for a logical comparison
between different problem sizes. For example, a refined problem can be more
or less ill-conditioned and consequently behave very differently at large scale.
One needs to consider whether one wants to include such effects in the scaling
study or not. In any case, interpreting such a scaling of the code can be
difficult, especially if one is not aware of such issues a priori.
Getting access to a large number of processors on a parallel machine for
longer time periods can be difficult and expensive. The cost of weak scaling
studies is typically greater because the cost per process can go up (somewhat)
as the number of processes is increased and, therefore, the last scaling point
(e.g., going from 32,768 processes to 65,536 processes) is more expensive than
all of the previous cases combined! This makes doing very large scale scalabil-
ity studies (e.g., O(100,000) processes) extremely expensive. Another factor
that hampers scalability testing is that supercomputing facilities that give out
time at no cost to research CSE projects generally want the cycles to be used
for science runs and not for testing and therefore discourage routine scala-
bility testing. Another difficulty is that supercomputers use batch submission
systems where users submit requests for jobs that reserve X number of pro-
cesses for a maximal wall clock time of Y hours. These requests are placed
into a queue, which is prioritized based on various criteria. Generally, jobs
that require more processes and time are pushed further back in the queue
and therefore it can take a long time before they are selected to be run, if they
are selected at all. So strategies have to be devised to submit a scaling study
in a set of smaller jobs and then to aggregate the results. The large overhead
and complexity of performing such studies on a shared supercomputer is often
enough of a deterrent to scaling studies.
The codes with the greatest challenges for scalability testing are codes
that are designed to be massively scalable, e.g., codes with structured grids.
While such codes can show excellent scalability on 10,000 processes, there is
no guarantee that they will continue to scale well at 100,000 processes or more.
For example, communication overhead that has little impact on scalability at
10,000 processes can become a large bottleneck at 100,000 processes. Also, just
the sizes of buffers and index ranges for larger problems can cause significant
slowdowns or even complete failure at large scale. Such situations are difficult
to test for or predict on smaller numbers of processes. Such defects could, e.g.,
be in the MPI implementation, which makes prediction and diagnosis even
more challenging. To be absolutely sure that a CSE code will run at large scale,
one actually has to run it. Since this is not always possible, there are ways
to mitigate the need to run full problems on larger numbers of processes. For
example, one can put targeted timers in the code around certain computations
and then observe the growth rates of the measured times. This can be helpful
in finding code segments that are too expensive or growing more rapidly than
they should. To detect problems caused by buffer sizes and index ranges, one
can simulate aspects of a large problem or run very short jobs at large scale.
However, the latter test is more of a correctness test checking for hard defects
than a scalability performance test.
In summary, scalability testing is typically extremely complex and expen-
sive, requiring a lot of compute time and large numbers of processes. But this
testing is critical to obtain highly efficient CSE codes that are capable of solving
problems of scientific and engineering interest at large scale.
voids or fractures with interior faces that are very close together, but struc-
turally separated from their neighbor, i.e., no edges connect the nodes across
a fracture. Unfortunately, this cannot be readily distinguished from a mesh
generation error in which two adjacent geometric objects were not correctly
joined to form a single domain. Instead, in this case it is necessary to provide
additional tests to users to ensure the mesh captures the geometry of their
conceptual model. Here, a simple diffusion problem with linear boundary con-
ditions could be used to verify that the mesh is properly connected, and no
unintended voids are present. Thus, we can see the value of flexible functional
representations of initial conditions, boundary conditions and source terms to
not only support verification testing but the development of tests that probe
these higher-level aspects of conceptual models.
Model complexity is also growing through a steady increase in the num-
ber and variety of processes being studied and coupled in simulations. Here
the challenge for testing is to support a scientific workflow that explores the
relative importance and representations of a wide range of processes in order
to develop scientific understanding and build confidence in particular model
configurations. In essence, the distinction between test suites used by devel-
opers and simulation campaigns used by domain scientists is blurred as we
move beyond the traditional multi-physics setting where a small number of
processes are coupled in a small set of predefined configurations to a dynamic
environment where the configurations needed by the users cannot be enumer-
ated a priori. To address this challenge, flexible and extensible frameworks
are being explored that leverage object oriented designs with well defined ab-
stract interfaces, automated dependency analysis, (semi)-automated process
couplers [163, 180], and data management capabilities [185].
Increasing model fidelity through increasing mesh resolution is another
natural approach that raises two additional challenges for testing. First, as
the mesh is refined the efficiency and scaling of the underlying algorithms
is stressed. In particular, in a time dependent simulation the time step will
be refined accordingly to maintain accuracy, implying more time steps to
reach the same target time. In addition, for processes with an elliptic com-
ponent, e.g., diffusion, implicit time-integration is necessary and each time
step requires now a larger system of equations to be solved. Thus, the chal-
lenges of scalability testing highlighted in Section 4.5.2 are prevalent here as
well. Second, as the mesh is refined, approximations to the model equations
and parameterizations of subgrid processes may need to be modified. Thus,
the interaction of computational capabilities and scientific research raises the
same challenges identified with increasing model complexity, as it effectively
increases the number of coupled processes being studied.
Finally, an important class of acceptance tests (Section 4.2.2) for models
in complex applications is benchmarking. In benchmarking, a mathematical
description of a problem is laid out in sufficient detail that the simulation
results of several codes can be compared using a variety of solution and per-
formance metrics. This has numerous benefits for the community, including
of analysis to evaluate the solution quality. The tests can take many flavors,
such as convergence tests, confronting the numerically obtained solution of
a simplified problem with an analytical or semi-analytical one, or examining
some defining features that only occur if the solution is right. Sometimes these
tests can also serve as regression tests, but often requirements for ongoing
testing have a different focus.
The first step in formulating a continuous testing regime is to take an
inventory of verification needs within the software. This process defines the
code coverage requirements for the software. It implies picking all features of
the code that are necessary for correct behavior. These features are not limited to code sections/units or even subsections; they also include interactions between code units. In CSE codes, where it is not always possible to elimi-
nate lateral coupling between code sections, code unit interactions can have
multiple facets. Some of these multi-faceted interactions may be features that
need to be included in the inventory. One of the guiding principles in taking
the inventory is to know the purpose of testing each feature. This knowledge
is critical for several reasons. It reduces effort wasted in developing tests that
have limited usefulness. It helps in mapping a specific testing need to the most
appropriate kind of test for it, and by minimizing waste it makes sure that
testing resources (human or machine) are optimally used.
The next step is to identify behaviors of the code that have detectable
response to changes. Since the purpose of testing is to be able to map a failure
easily to its source, it is also essential to be able to isolate the cause of detected
response. When a single code unit is being tested for its functionality, the
generic software engineering approach to meeting the isolation requirements is
to build a unit test for it. This is also a useful practice for CSE software where
possible. However, many times, breaking down the dependencies may not be a
feasible or worthwhile effort. For example, most physics on a discretized mesh
would need the mesh mocked up, which is less helpful than just using the
existing mesh. A work-around is to use minimally combined code units. The purpose of these tests is still the same as that of unit tests: to map a manifestation of failure to its source as quickly as possible within a minimal set of units. Testing for interactions between units is trickier because many permutations may be possible. Because the units are numerical, each particular combination may add its own regime of validity to the overall testing space. For this kind of feature testing, scientists rely on one or more no-change or bounded-change tests.
One very important requirement for such tests is that they be low-cost to run. The verification tests that are used during the
development phase of the code need not have this constraint, but the ongoing
test suite does. One way to ensure that is to select tests that respond quickly
to perturbations. Because of multi-faceted interactions the count of needed
tests can grow rapidly. Similar challenges arise when covering features that
cannot be directly tested. Cross-checking with more than one test may be
needed to pinpoint the cause of failure. In both situations one is faced with
the need to downselect from the number of available tests for the whole testing
system to remain viable. One option is to use the matrix approach with tests
and feature coverage as rows and columns of the matrix respectively. This way
any overlap in coverage becomes visible and can be leveraged to reduce the
number of tests. A detailed example of using the matrix approach is described in [155] and is also discussed in Chapter 1, “Software Process for Multiphysics Multicomponent Codes.”
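A minimal sketch of the idea follows; the test and feature names are illustrative and not taken from [155], and the redundancy check simply flags tests whose coverage is fully subsumed by the rest of the suite.

import java.util.*;

/** Coverage matrix: rows are tests, columns are the features each test exercises. */
public class CoverageMatrix {
    private final Map<String, Set<String>> coverage = new LinkedHashMap<>();

    public void add(String test, String... features) {
        coverage.put(test, new LinkedHashSet<>(Arrays.asList(features)));
    }

    /** Tests whose covered features are all covered by at least one other test. */
    public List<String> subsumedTests() {
        List<String> result = new ArrayList<>();
        for (String test : coverage.keySet()) {
            Set<String> coveredElsewhere = new HashSet<>();
            coverage.forEach((t, f) -> { if (!t.equals(test)) coveredElsewhere.addAll(f); });
            if (coveredElsewhere.containsAll(coverage.get(test))) result.add(test);
        }
        return result;
    }

    public static void main(String[] args) {
        CoverageMatrix m = new CoverageMatrix();
        m.add("hydro_only", "hydro", "eos");
        m.add("hydro_gravity", "hydro", "eos", "gravity");
        m.add("radiation_unit", "radiation");
        // Prints [hydro_only]: its coverage is already provided by hydro_gravity.
        System.out.println(m.subsumedTests());
    }
}

Each flagged test is only a candidate for removal; whether it is actually dropped still depends on how quickly it runs and how easily its failures can be traced to a cause.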
The approach to adding tests needs to be somewhat different when adding
new code to software that is already under the regime of regular testing. Here,
the concerns are limited to testing the new functionality and its interaction
with the existing code. If possible one should always add a unit test for the
new code. Sometimes tweaking some of the existing tests may provide the coverage; otherwise, one or more new tests may need to be added. In general
it is desirable to leverage the overlap as much as possible and minimize the
number of new tests added.
Unit testing frameworks, such as gtest [171] or xUnit [191], can make unit testing cost-effective when tailored to the programming language and particular software being tested.
4.6.5 Policies
A software project should have solid policies on testing practices. Listed here are some policies that have proven useful and have helped avoid issues when followed by everyone.
It is important to have a consistent policy on how to deal with failed
tests. Generally, critical issues should be fixed immediately, whereas fixing minor issues can sometimes be deferred. To make sure that such issues are not deferred indefinitely and are not forgotten, one needs an efficient method to track them. The tasks of fixing the bugs need to be assigned clearly to team members, i.e., it needs to be clear who is responsible for fixing each problem, particularly in a larger team. It could be the responsibility of the person who wrote the part of the code that is broken, since they should know the code best. This is where the policy of always assigning a second person who is familiar with that part of the software is useful, since that person can take over. Once the problem has been fixed, it is advisable to add another
regression test for this issue to avoid reintroducing it at a later time. For very
large teams and integration efforts it is important to designate one or several
people who will watch over the test suite, monitor results and address failures.
They should ensure that these failures are addressed in a timely fashion by
assigning someone to take care of the problem within a set time or fix the
problem themselves.
To preserve code quality when refactoring and adding new features, it is a good policy to require running a regression test suite before checking in new code, to avoid breaking the existing code. In addition, one should add new regression tests to regularly exercise the new features.
Making sure that there is always a second person familiar with a
particular portion of the code is another good policy. This person can then
take over in case the original developer is not available or leaves the project
for any reason. Should the developer leave, it is important that a replacement
is trained in the knowledge of the code, or this policy will eventually break
down, making it very difficult to deal with issues later, particularly if there is
insufficient documentation.
A good policy is to avoid using regression suites consisting primarily
of system-level no–change tests. While there are many instances where
no–change tests can be very useful, particularly at the unit and integration
test level, their use can also lead to serious issues. One problem is that when
the behavior of the software changes, these tests have to be rebaselined in
order to pass. This is often done without sufficient verification of the newly updated gold-standard outputs. Many CSE codes use regression test suites
that almost completely consist of no–change system-level tests. Such test suites
generally do not provide a sufficient foundation to efficiently and safely drive
future development and refactoring efforts for CSE codes. Codes that have
only system-level no–change tests based on necessarily looser tolerances can
tend to allow subtle defects to creep into the code. This has led some teams
to the extreme policy of requiring zero-diffing tests against “gold standard
output” (for example, see [165]). First off, the verification that goes into the
“gold standard output” can be tenuous. Even if it is solid, these tests do not
allow even the most basic types of valid refactorings (to improve the design
and better handle complexity) and therefore result in more complex and less
maintainable software. This trend nearly always leads to the uncontrolled
generation of software entropy which in turn leads to software that is very
difficult to change and in many cases the eventual (slow and painful) death
of the software [159, 161, 177, 181]. Also, no–change system-level tests with
higher tolerances are extremely difficult to maintain across multiple platforms.
High-quality finer-grained computational tests allow for better system-level
testing approaches while at the same time still providing a solid verification
and acceptance testing foundation in a much more portable manner. With
strong finer-grained unit and integration tests in place, the remaining system-
level tests can then be focused on gross integration issues and higher-level
verification tests and therefore allow for looser (and more portable) tolerances.
A good policy is to require a code review before releasing a test suite.
Such a review includes going over the test suite with the whole project or a
fellow programmer who can critique it and give advice on how to improve it.
Just as with a paper or other research artifact, it is easy to miss problems after having looked at the code for a long time, whereas another person might spot issues right away. The code review can be performed in a group meeting or through an online tool such as GitHub, which allows each reviewer to examine the code and comment. The evidence in the software engineering literature for the cost-effectiveness of code reviews is overwhelming (see [177]), but they are often neglected or not even considered due to a lack of sufficient developers or funding for the software. One open issue is how to minimize the overhead of code reviews while keeping them effective.
4.7 Conclusions
Automated software testing is extremely important for a variety of reasons. Not only does it ensure that a code delivers correct results; it can also significantly decrease development and maintenance costs. It facilitates refactoring and portability of the code and plays an important role in software maturity and sustainability. While these benefits apply to any type of software, there are challenges that are specific to scientific software, such as the predominant use of floating-point operations, the need to run the code at large scale on high-performance computers, and the difficulty of testing the underlying
physics models, all of which have been discussed here. In addition, we presented the stakeholders and key team roles for testing in a scientific software project, as well as various testing practices, including the development and maintenance of test suites, the use of automated testing systems, and a few helpful policies. All of these components need to be considered to produce reliable, mature scientific software of high quality, which is crucial for delivering correct scientific research results.
4.8 Acknowledgments
This work is partially supported by the Director, Office of Science, Office
of Advanced Scientific Computing Research of the U.S. Department of Energy
under Contract No. DE-AC02-05CH11231. This material is based upon work
supported by the U.S. Department of Energy, Office of Science, under con-
tract number DE-AC02-06CH11357. This work is funded by the Department of Energy at Los Alamos National Laboratory under contract DE-AC52-06NA25396 and by the DOE Office of Science Biological and Environmental Research (BER) program in Subsurface Biogeochemical Research (SBR) through the Interoperable Design of Extreme-scale Application Software (IDEAS) project.
This work was performed under the auspices of the U.S. Department of En-
ergy by Lawrence Livermore National Laboratory under Contract DE-AC52-
07NA27344. This manuscript has been authored by UT-Battelle, LLC under
Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The
United States Government retains and the publisher, by accepting the arti-
cle for publication, acknowledges that the United States Government retains a
non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce
the published form of this manuscript, or allow others to do so, for United
States Government purposes. The Department of Energy will provide pub-
lic access to these results of federally sponsored research in accordance with
the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-
plan). Sandia National Laboratories is a multi-program laboratory managed
and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed
Martin Corporation, for the U.S. Department of Energy’s National Nuclear
Security Administration under contract DE-AC04-94AL85000.
Chapter 5
Preserving Reproducibility through
Regression Testing
Daniel Hook
5.1 Introduction
This chapter describes an approach to software testing that is known to
software engineers as regression testing. To the mathematically minded scien-
tist the word “regression” is strongly associated with the statistical machinery
for fitting equations and models to data. It is therefore necessary to open this
chapter with an important clarification: software regression testing is not (nec-
essarily) a statistical approach to testing and does not involve the fitting of equations or models to data.
While statistical analysis is not required for regression testing, neither is it forbidden. In fact, a statistically literate scientist may find opportunities to advance the state of the art in software testing through the innovative application of statistical techniques.
A survey from Kanewala and Bieman [198] covers much of this work—but much of this
testing research is still in its infancy: a clear set of best practices for testing
scientific software still needs to be identified. This chapter intends to show
that regression testing should be one of the techniques included in this set of
best practices.
Note that readers who are interested in learning more about software test-
ing techniques that are not specific to scientific software should consult refer-
ence books by Myers [200] and Beizer [192].
5.1.2 Reproducibility
For a scientific result to be considered trustworthy it is necessary that it be
reproducible. It should be possible, at least in theory, to reproduce any scien-
tifically valid result by following the same procedure that was used to produce
the result in the first place. Reproducible results are more trustworthy because
they can be independently verified, and they are more widely and deeply un-
derstood by the scientific community. Reproducible results are also important
in that they act as a starting point for further research. Donoho et al. [193]
offer an excellent summary of the importance of reproducibility in computa-
tional science. They write, “The prevalence of very relaxed attitudes about
communicating experimental details and validating results is causing a large
and growing credibility gap. It’s impossible to verify most of the results that
computational scientists present at conferences and in papers.... Current com-
putational science practice doesn’t generate routinely verifiable knowledge.”
Admittedly, Donoho et al. have a broad view of reproducibility in mind—
they would like to see it become common practice for researchers to publish
their codes and data sets with their articles so that other scientists can test
the codes and results for themselves. It is a laudable and ambitious goal. The
goal of this chapter is more modest: here the aim is simply to show that
regression testing is a very powerful and important tool to aid in the ongoing
development of scientific software that produces reproducible results. To that
end, this chapter will use a less broad definition of reproducibility; let us say
that software can be described as giving reproducible results if, for a specific,
fixed set of inputs, it gives a corresponding, fixed set of outputs even as the
software is modified and distributed.
In some simple contexts software reproducibility is trivial to achieve: com-
puters are designed to be deterministic machines and, unless faulty, should
always compute the same result provided that the initial state and opera-
tional environment are constant. However, in most real-world contexts the
initial state and operational environment are not constant.
Broadly speaking, there are three main reasons that software reproducibil-
ity is lost:
1. A developer unintentionally changes the behavior of the software, that
is, someone introduces a code fault (a “bug”). This kind of change may
3. Regression tests are usually lightweight tests that are chosen to run
as quickly as is reasonably achievable while maintaining an acceptably
high level of code coverage. That is, the goal of regression testing is to
exercise the code, not to stress test the hardware or check the science.
This differs from stress tests and from many validation tests. Note that
other types of tests—e.g., unit tests—may also be lightweight.
4. Usually the tests are designed to test integrated software systems rather
than focusing on isolated routines. This differs from unit tests that are
coded alongside individual routines.3 There is, however, some overlap
between regression tests and unit tests: some regression tests may be
focused on specific modules, and a suite of unit tests can be run as a
valuable part of a regression testing procedure.
Regression tests, like most traditional software tests, are often evaluated
on a pass/fail basis where a failure is understood to have occurred when the
output from a test differs from the historical output for that test. However,
it is important to note that simple binary pass/fail evaluation can easily be
generalized to a nearly continuous set of scores by using a more advanced
metric—such as relative error—to compare the current and historical out-
puts. In fact, because of the tolerance problem, such a generalization is often
essential when one is testing scientific software.
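A hypothetical comparison routine along the following lines illustrates the generalization; it is a sketch, not part of any particular regression testing framework:

/**
 * Score the agreement between current and historical outputs as the largest
 * relative error over all output fields: 0.0 means exact agreement, larger
 * values mean larger deviations.
 */
static double regressionScore(double[] current, double[] historical) {
    if (current.length != historical.length) {
        throw new IllegalArgumentException("output vectors differ in length");
    }
    double worst = 0.0;
    for (int i = 0; i < current.length; i++) {
        // Guard against division by zero for outputs whose historical value is zero.
        double denom = Math.max(Math.abs(historical[i]), 1e-30);
        worst = Math.max(worst, Math.abs(current[i] - historical[i]) / denom);
    }
    return worst;
}

A test then fails only when the score exceeds a tolerance chosen for that particular output, rather than on any bitwise difference.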
Unit test developers typically know the expected result ahead of time, thus allowing them to hard-code the correct test result directly into the test. Integration and regression tests of scientific software, on the other hand, usually involve a level of complexity such that the expected output cannot be predicted and must be measured either from an external source (for a validation test) or from the output of the program itself (for a regression test).
Numerical results are affected by several sources of error, such as discretization error and round-off error. These errors are often unavoidable and difficult to quantify [201].
down, the record of historic results from regression tests can help one better
manage the tolerance problem by providing valuable insights into the evolu-
tion of the software’s accuracy. For example, the developers may observe that
certain outputs are more likely to be perturbed than others—this may indicate
some numerical instability that can be reduced by better solution techniques
or it may, at least, indicate that tolerances on those outputs need to be more
carefully chosen. These historic insights can then be called upon when one is
attempting to determine appropriate test tolerances.
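One simple way of turning such insights into a concrete tolerance is sketched below; it assumes that the relative deviations observed for an output during past benign changes (code improvements, compiler updates) have been archived, which is an assumption about the project's bookkeeping rather than a feature of any existing tool.

import java.util.Arrays;

/** Suggest a test tolerance for one output from its archived benign deviations. */
static double suggestTolerance(double[] benignRelativeDeviations, double safetyFactor) {
    double[] sorted = benignRelativeDeviations.clone();
    Arrays.sort(sorted);
    // Take the 95th percentile of past benign deviations and scale it by a safety
    // factor, so that ordinary numerical noise does not trigger spurious failures.
    int index = (int) Math.floor(0.95 * (sorted.length - 1));
    return safetyFactor * sorted[index];
}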
It also bears mentioning that there are other testing techniques that can
help tame the tolerance problem. Sensitivity testing is one useful approach.
the cause of the failure can be identified and rectified. This build procedure
ensures that software with detected faults will not be distributed to beta
testers and users.
The regression testing procedure at ESG was first applied to the nightly
build on May 24, 2012. In the time between implementing our regression
testing procedure and the writing of this chapter (November 1, 2015) there
have been a total of 114 regression test failures. Of these, 36 failures were due to code faults—Table 5.1 lists the 36 fault-related failures in four categories of
severity. The other 78 failures were due to unavoidable and/or expected losses
of reproducibility: 77 were due to improvements in the code and one was due
to a compiler update.
1. The existing code was not designed or built to be used with automated
testing in mind and was, therefore, difficult to pull apart and link with
third-party libraries.
2. The outputs of seismic data processing can be extensive and varied—we
currently check 140 different output fields for each test. This requires a
test evaluation system that is flexible and can provide much more detailed information than a simple pass/fail result. Using ESG software to evaluate the outputs leverages existing data manipulation routines.
3. The software reads from and writes to proprietary binary files and
databases which are most easily read and parsed by ESG routines.
Figure 5.1: Schematic of the relationship between the three regression tester tasks.
tion) and are, therefore, not checked by the regression testing engine.
5.3.4 Results
As of the writing of this chapter, regression testing has been used at ESG
for 680 weekdays (ESG’s nightly builds are not run on the weekends). As
shown in Table 5.1, 36 code faults were detected by the regression tests and
22 of these faults could have significantly hurt the accuracy of our software’s
outputs while 8 of them could have resulted in software crashes.
For the sake of comparison, it was noted that in the time since implementing regression testing there have been 28 bugs reported in Bugzilla for the three libraries that are being tested by the regression tester. (Note that bugs that are caught by regression tests are not included in Bugzilla.) An examination of the descriptions of the 28 Bugzilla bugs suggests that at least
23 of them are innate bugs, that is, they are due to problems in the initial
specification or implementation of an algorithm and so are not detectable by
regression tests. At ESG, this suggests that the rate at which code faults are
introduced during the modification of existing software is comparable to the
rate at which they are built into new developments. Regression testing has
proven its value by helping ESG detect more than half of the bugs reported
in the libraries in the last 680 days, and these bugs were detected before they
were distributed to any beta testers or users.
The other interesting observation is that there were 36 faults (i.e., mis-
takes) per 77 improvements that broke reproducibility. This suggests that for
every 10 improvements made to existing code routines developers have made
roughly 5 mistakes—if testing hadn’t been applied to detect these faults ESG
would have taken 1 step back for every 2 steps forward. Even if these results
are not representative in a wider context, it still seems reasonable to hypoth-
esize that developers make mistakes at a rate that is comparable to the rate
at which they make improvements: that is why it is critically important to
inspect and test scientific code.
The costs of developing and running the tests are insignificant when compared to the many hours of debugging
and troubleshooting that would have been spent if the faults had not been
detected before beta testing and the software’s release.
When developers are working with software that is amenable to test au-
tomation, regression testing activities are of demonstrable value, but what
about contexts where test automation is more difficult? For example, it may
be challenging to automate the testing of a software system that includes large
amounts of complex legacy code or a system that is designed to be applied to
very large data sets. In such contexts the developers may have to step back
and consider if it is worth re-working the architecture of their system in order
to make at least some parts of the system testable. Perhaps testing could be
focused on some key units or routines that are known to be prone to errors, or
maybe some artificial data sets could be created to exercise only key parts of
the code. Further research into more advanced regression testing techniques
may also be needed. While such work may be time consuming, this chapter
suggests that the resulting improvements in reproducibility and traceability
will pay dividends.
Some areas where further research and development is needed include the
following:
• Studies of test selection techniques that are specifically targeted to scientific software are needed. Some initial work by Gray and Kelly [195] suggests that randomly selected tests have some utility when testing scientific codes, but further work remains to be done. At this point, no work targeted specifically at test selection for scientific software regression testing is known to exist.
• Work could be done to develop techniques that use historical regression
test results to help tame the tolerance problem. A zero-tolerance condi-
tion is not always appropriate—e.g., when changing compiler options—
and it can be difficult to know what tolerance should be used for tests
in such cases. Perhaps a combination of sensitivity testing, statistical
techniques, and regression histories could be used to help determine ap-
propriate tolerances.
• Case studies of regression testing techniques being applied in diverse and
varied scientific contexts would be valuable. Developers and managers at ESG have been strongly convinced of the benefits of regression testing.
Case studies from scientific software developers in other fields and/or
using different tools would help to reinforce the value of reproducibility
testing in these other contexts.
• Test automation can be very challenging when dealing with large scien-
tific systems that run in complex execution environments. Test automa-
tion tools that acknowledge the oracle and tolerance problems and other
challenges of the scientific computing world need to be developed.
Chapter 6
Building a Function Testing Platform
for Complex Scientific Code
6.1 Introduction
Complex scientific codes are defined as complicated software systems that have been adopted by specific communities to address scientific questions. Many scientific codes were originally designed and developed by domain scientists without first applying basic principles of good software engineer-
ing practices. Software complexity has become a barrier that impedes further
code development of these scientific codes, including adding new features into
the code, validating the knowledge incorporated in the code, and repurposing
the code for new and emerging scientific questions. In this chapter, we describe innovative methods to (1) better understand existing scientific code, (2) modularize complex code, and (3) generate functional tests for key software modules. Finally, we present our software engineering practices
within the Accelerated Climate Modeling for Energy (ACME) and Interop-
erable Design of Extreme-scale Application Software (IDEAS) projects. We
believe our methods can benefit the broad scientific communities that are
facing the challenges of complex code.
Unit testing is a method by which individual units of source code, i.e., sets of one or more program modules together with associated control data, usage procedures, and operating procedures, are tested to determine whether they are fit for use [216]. For generic software, unit testing has proven to be an efficient method for code design, optimization, debugging, and refactoring [213]. There are several existing language-specific unit testing frameworks, such as JUnit [204], CUnit [205], and FUnit [206]. Our unique function unit testing concept targets scientific code where unit test cases are sparse, and aims to facilitate and expedite scientific module validation and verification via computational experiments.
Specifically, after experimenting with many testing tools in scientific code,
we had repeated issues with tool scalability and usability. Based on previous
experiences, we wanted to build a test platform that would: (1) be widely used
by developers and module builders to fix problems and collect variables, (2)
integrate smoothly into the existing workflow, (3) empower developers and
module builders to write and deploy their own test cases and track specific variables, and (4) produce test suites automatically and provide validation using benchmark testing cases. The ultimate goal of this effort is to generate key scientific function modules from large-scale scientific code, so that those function units can be reconfigured and reassembled to expedite the design of new computational experiments for model validation and verification. In the interest of the general readers of this book, we focus on establishing the functional testing framework rather than on scientific experiment design using those functional units.
Figure 6.1: The major software components and workflow of ACME Land Model (ALM) functional testing.
Data captured by the Data Collector inside the scientific code is used to drive the Function Unit inside the Unit Test Module. Two Output Streams from the Data Collector are sent to the Module Validator to verify the correctness of the Unit Test Module within the Unit Test Environment. Our methods consist of the following steps:
the code integrity of the targeted Unit Test Module within a unified testing
environment is validated.
Figure 6.2: Cube visualization showing the call-tree of a three-day ACME simulation running on 32 nodes (508 cores) of
the Titan machine.
Figure 6.3: Vampir’s trace visualization showing a three-day ACME simulation running on 32 nodes (508 cores) of the Titan
machine.
Vampir. This also helps to maintain the code since new components, models
or methods are added to the ACME code regularly.
Figure 6.4: The software function call graph within ALM. Each node represents a software function call.
6.6 Conclusion
In this chapter, we have presented a unique approach to testing complex large-scale scientific code with sparse unit test cases. Based on advanced development tools and modern compilers, we can build a function testing platform for scientific code, so that model developers, users, and science module builders can better understand the existing code, develop comprehensive test cases for key scientific function modules, and make module validation more convenient. In this chapter, we also presented our software engineering practices in developing a function testing platform for the land model within the Accelerated Climate Modeling for Energy (ACME) project. This practice is also part of code
refactoring method research within the Interoperable Design of Extreme-scale
Application Software (IDEAS) project. We believe our methods can be ben-
eficial to the broad scientific communities that are facing the challenges of
complex code.
Chapter 7
Automated Metamorphic Testing of
Scientific Software
Scientific programs present many challenges for testing that do not typically
appear in application software. These challenges make it difficult to conduct
systematic testing on scientific programs. Due to a lack of systematic testing, subtle errors (errors that produce incorrect outputs without crashing the program) can remain undetected.
7.1 Introduction
Custom scientific software is widely used in science and engineering. Such
software plays an important role in critical decision making in fields such as
the nuclear industry, medicine and the military. In addition, results obtained
from scientific software are used as evidence for research publications. Despite
the critical usage of such software, many studies have pointed out a lack of
systematic testing of scientific software [221]. In particular, unit testing, where the smallest testable parts of the application are individually tested to ensure they work correctly, is an important part of systematic testing.
Several studies have pointed out a lack of automated unit testing in scientific
software development [222–224].
Due to this lack of systematic testing, subtle program errors can remain
undetected. These subtle errors can produce seemingly correct outputs with-
out causing the program to crash. There are numerous reports of subtle faults
in scientific software causing losses of billions of dollars and the withdrawal of
scientific publications [225].
One of the greatest challenges for conducting systematic automated testing
is the lack of automated test oracles. Systematic testing requires an automated
test oracle that will determine whether a test case output is correct according
to the expected behavior of the program. But for many scientific programs
such automated test oracles do not exist for two main reasons: (1) Scien-
tific programs are often written to find answers that are previously unknown.
Therefore test oracles do not exist for such programs. (2) Scientific software
often produces complex outputs or performs complex calculations. This makes
developing oracles practically difficult for these programs. Weyuker identified
software that faces these types of problems as non-testable programs [226].
Metamorphic testing (MT) is a testing technique that can be used to test
non-testable programs. This technique operates by checking whether the pro-
gram behaves according to a set of properties called metamorphic relations
(MR). A metamorphic relation specifies how the output should change in response to a specific change made to the inputs. A violation of a metamorphic relation during testing indicates that the program might con-
tain a fault.
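As a simple illustration (the mean function, the helper below, and the tolerance eps are written out only for this sketch), an additive MR for an averaging routine states that adding a constant c to every input element must shift the output by exactly c:

/** Arithmetic mean of an array. */
static double mean(double[] a) {
    double sum = 0.0;
    for (double v : a) sum += v;
    return sum / a.length;
}

/** Additive MR check: mean(x + c) should equal mean(x) + c within tolerance eps. */
static boolean additiveMrHolds(double[] x, double c, double eps) {
    double[] shifted = new double[x.length];
    for (int i = 0; i < x.length; i++) {
        shifted[i] = x[i] + c;                 // follow-up input derived from the initial input
    }
    double initialOutput = mean(x);            // initial test case output
    double followUpOutput = mean(shifted);     // follow-up test case output
    return Math.abs(followUpOutput - (initialOutput + c)) <= eps;
}

If the check returns false for some input, the implementation of the mean (or the relation itself) deserves a closer look.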
One of the challenges for automating the MT process is the task of identi-
fying MRs. Usually MRs that should be satisfied by a program under test are
identified manually by the programmer using her domain knowledge about
the program. But, in this work we use MRpred : a novel automated method
for predicting likely MRs for a given program [227]. MRpred uses a super-
vised machine learning approach to train a classifier that can predict MRs for
a previously unseen function. Through the use of MRpred, we were able to automate the entire MT-based unit testing process.
In this chapter, we present the results of a case study that we conducted
to evaluate the effectiveness of the automated unit testing approach based
on MT. We used four scientific libraries in our evaluation including a custom
scientific library that is developed in-house. In each of the code libraries, we
were able to identify about 90%–99% of automatically inserted faults through automated unit testing. Our results also show that MRs differ in their fault-finding effectiveness. MRs that make changes to individual input values perform best for the scientific libraries used in this study. Further, faults introduced by altering assignment operators were detected with the highest effectiveness.
The remainder of this chapter is organized as follows: Section 7.2 describes
the oracle problem in scientific software testing, root causes of the oracle
problem, approaches used by practitioners to alleviate the oracle problem and
their limitations. Section 7.3 describes the MT process and its applications for
testing scientific software. Section 7.4 describes the MRpred approach used for
automatic prediction of MRs. Section 7.5 describes the experimental setup of
the case studies and Section 7.6 describes the results. Finally, we present our
conclusions and future work in Section 7.7.
Further, a statistical oracle cannot decide whether a single test case has passed or failed [257].
8. Reference data sets: Cox et al. created reference data sets based on the
functional specification of the program that can be used for black-box
testing of scientific programs [258].
Limitations: When using reference data sets, it is difficult to determine
whether the error is due to using unsuitable equations or due to a fault
in the code.
Figure 7.1: Function from the SAXS project described in Section 7.5.1 used
for calculating the radius of gyration of a molecule.
Consider the function in Figure 7.1 that calculates the radius of gyration of
a molecule. Randomly permuting the order of the elements in the three input
arrays should not change the output. This is the permutative MR in Table 7.3
on Page 167. Therefore, using this metamorphic relation, follow-up test cases
can be created for every initial test case and the outputs of the follow-up test
cases can be predicted using the initial test case output.
Figure 7.2 shows an automated JUnit test case written for testing the
function in Figure 7.1 using the permutative MR. The initial test case is created by randomly generating three array inputs. Other automated test generation approaches, such as coverage-based test generation, can be used to generate the initial test cases as well. This randomly generated initial
test case is executed on the function under test. Next, based on the input
relationship specified by the permutative MR, a follow-up test case is created
by randomly permuting the elements in the three array input and this follow-
up test case is also executed on the function under test. Finally, based on
the output relationship specified by the permutative MR, an assertEquals
statement is created to check whether the outputs of the initial and follow-up
test cases are equal within a given tolerance specified by the domain. As shown with this test case, this MR-based testing approach allows test cases to be generated automatically and relationships between multiple outputs to be verified without any manual intervention.
@Test
public void findGyrationRadiusRandTest() {
    Random rand = new Random();
    int arrLen = rand.nextInt(MAXSIZE) + 1;
    // Initial test case: three randomly generated coordinate arrays
    double[] iX = new double[arrLen];
    double[] iY = new double[arrLen];
    double[] iZ = new double[arrLen];
    for (int k = 0; k < arrLen; k++) {
        iX[k] = rand.nextDouble();
        iY[k] = rand.nextDouble();
        iZ[k] = rand.nextDouble();
    }
    // Follow-up test case: apply the same random permutation to all three arrays
    double[] fX = iX.clone();
    double[] fY = iY.clone();
    double[] fZ = iZ.clone();
    for (int k = arrLen - 1; k > 0; k--) {
        int j = rand.nextInt(k + 1);
        double t;
        t = fX[k]; fX[k] = fX[j]; fX[j] = t;
        t = fY[k]; fY[k] = fY[j]; fY[j] = t;
        t = fZ[k]; fZ[k] = fZ[j]; fZ[j] = t;
    }
    // Executing the initial test case on the function under test
    double initialOutput = SAXSFunctions.findGyrationRadius(iX, iY, iZ);
    // Executing the follow-up test case on the function under test
    double followUpOutput = SAXSFunctions.findGyrationRadius(fX, fY, fZ);
    // Permutative MR: the outputs must agree within the domain tolerance eps
    assertEquals(initialOutput, followUpOutput, eps);
}
Figure 7.2: JUnit test case that uses the permutative MR to test the function
in Figure 7.1.
Figure 7.3: Function for finding the maximum element in an array.
Figure 7.4: Function for finding the average of an array of numbers.
double calcRun(double[] a) {
    int size = a.length;
    if (size < 2) throw new IllegalArgumentException("array too short");
    double run = 0;
    for (int i = 1; i < size; ++i) {
        double x = a[i] - a[i - 1];
        run += x * x;   // sum of squared successive differences
    }
    return run;
}
(a) CFG for the function in Figure 7.3. (b) CFG for the function in Figure 7.4.
Figure 7.6: CFGs for the functions max, average, and calcRun.
is a product of the similarity of its nodes and edges. This concept is illustrated
in Figure 7.10. Computing this kernel requires specifying an edge kernel and a
node kernel. We used the following approach for determining the kernel value between a pair of nodes: we assign a value of 0.5 if the node labels represent two operations with similar properties, even if they are not identical. The kernel value between a pair of edges is determined using their edge labels, where we assign a value of one if the edge labels are identical and zero otherwise.
Figure 7.10: Random walk kernel computation for the graphs G1 and G2.
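A direct transcription of these node and edge kernels might look as follows; treating identical node labels as a perfect match (value 1.0) and the particular grouping of "similar" operations are illustrative assumptions rather than the exact choices made by MRpred:

import java.util.Set;

/** Node kernel: 1.0 for identical labels, 0.5 for similar operations, 0.0 otherwise. */
static double nodeKernel(String a, String b) {
    if (a.equals(b)) return 1.0;
    if (sameGroup(a, b)) return 0.5;   // e.g., "+" and "-" are both arithmetic operations
    return 0.0;
}

/** Edge kernel: 1.0 if the edge labels are identical, 0.0 otherwise. */
static double edgeKernel(String a, String b) {
    return a.equals(b) ? 1.0 : 0.0;
}

/** Illustrative grouping of operations with similar properties. */
static boolean sameGroup(String a, String b) {
    Set<String> arithmetic = Set.of("+", "-", "*", "/");
    Set<String> comparison = Set.of("<", ">", "<=", ">=", "==", "!=");
    return (arithmetic.contains(a) && arithmetic.contains(b))
        || (comparison.contains(a) && comparison.contains(b));
}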
[Figure: plot of AUC values; vertical axis ranges from 0.60 to 1.00.]
of these functions, we only use the procedural-style static Java methods from
these libraries.
We list the specific functions we used from the above four code libraries in
Table 7.1. These functions and their graph representations can be accessed via
the following URL: http://www.cs.colostate.edu/saxs/MRpred/functions.tar.gz. These functions and code libraries were selected to cover a wide range of numerical calculations commonly found in scientific software. Thus, func-
tions in the code corpus perform various calculations using sets of numbers
such as calculating statistics (e.g. average, standard deviation and kurtosis),
calculating distances (e.g. Manhattan and Chebyshev) and searching/sorting.
Table 7.2 displays the LOC and the average McCabe’s complexity for the
functions in different libraries in the code corpus. Lines of code of these func-
tions varied between 5 and 54, and the cyclomatic complexity varied between
1 and 11. The number of input parameters to each function varied between 1
and 5.
7.5.3 Setup
We used mutation analysis [275] to measure the effectiveness of the pre-
dicted metamorphic relations from MRpred in detecting faults. Mutation anal-
ysis operates by inserting faults into the program under test such that the cre-
ated faulty version is very similar to the original version of the program [276].
A faulty version of the program under test is called a mutant. If a test identifies a mutant as faulty, that mutant is said to be killed.
Mutation analysis was conducted on 100 functions from the code corpus.
We used the µJava7 mutation engine to create the mutants for the functions
in our code corpus. We used only method level mutation operators [277] to
create mutants since we are only interested in the faults at the function level.
Each mutated version of a function was created by inserting only a single
mutation. Figure 7.12 shows an example mutant generated by the tool.
µJava generates method-level mutants by modifying the source code of
the methods under consideration using 19 separate mutation operators. The
main method level mutation categories supported by µJava are described in
Table 7.4. Mutants that resulted in compilation errors, run-time exceptions
or infinite loops were removed before conducting the experiment.
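Since Figure 7.12 is not reproduced here, the following hypothetical arithmetic operator replacement (AOR) mutant conveys the flavor: a single operator is changed and the rest of the method is left untouched.

// Original method (similar in spirit to the corpus functions listed in Table 7.1).
static double dotProduct(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
}

// AOR mutant: one '*' replaced by '+'. The mutant compiles and runs,
// but silently produces incorrect results for most inputs.
static double dotProduct_mutant(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) sum += a[i] + b[i];
    return sum;
}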
For each of the mutants used in the experiment, we used MRpred to get
a set of predicted metamorphic relations. These predicted metamorphic rela-
tions are used to conduct automated unit testing on the mutants. For each
7 https://cs.gmu.edu/~offutt/mujava/
TABLE 7.1: Functions Used in the Experiment (open source project: functions used in the experiment)

SAXS: scatterSample, findGyrationRadius, calculateDistance, discreteScatter

Colt Project: min, max, covariance, durbinWatson, lag1, meanDeviation, product, weightedMean, autoCorrelation, binarySearchFromTo, quantile, sumOfLogarithms, kurtosis, pooledMean, sampleKurtosis, sampleSkew, sampleVariance, pooledVariance, sampleWeightedVariance, skew, standardize, weightedRMS, harmonicMean, sumOfPowerOfDeviations, power, square, winsorizedMean, polevl

Apache Commons Mathematics Library: errorRate, scale, eucleadianDistance, distance1, distanceInf, ebeAdd, ebeDivide, ebeMultiply, ebeSubtract, safeNorm, entropy, g, calculateAbsoluteDifferences, calculateDifferences, computeDividedDifference, computeCanberraDistance, evaluateHoners, evaluateInternal, evaluateNewton, mean, meanDifference, variance, varianceDifference, equals, checkNonNegative, checkPositive, chiSquare, evaluateWeightedProduct, partition, geometricMean, weightedMean, median, dotProduct

Functions from the previous study [227]: cosineDistance, manhattanDistance, chebyshevDistance, tanimotoDistance, hammingDistance, sum, dec, errorRate, reverse, add_values, bubble_sort, shell_sort, sequential_search, selection_sort, array_calc1, set_min_val, get_array_value, find_diff, array_copy, find_magnitude, dec_array, find_max2, insertion_sort, mean_absolute_error, check_equal_tolerance, check_equal, count_k, clip, elementwise_max, elementwise_min, count_non_zeroes, cnt_zeroes, elementwise_equal, elementwise_not_equal, select
7.6 Results
In this section we present the results of our case studies. We first dis-
cuss the overall fault detection effectiveness using the automated unit testing approach described in this chapter. Then we discuss the fault detec-
tion effectiveness of different MRs and the effectiveness of this approach in
detecting faults in different categories.
The fault-finding effectiveness for all four libraries was above 90%. For the Colt functions, 99% of the faulty versions could be found through automated MT-based unit testing.
Figure 7.18: Fault detection effectiveness across fault categories for individ-
ual MRs.
tional operator faults was least effective. Further, among the six MRs used in
this study, additive and multiplicative MRs were most effective in identifying
faults in all the categories.
This study showed that certain MRs are more effective in identifying faults.
In the future we plan to investigate this further and develop techniques to
automatically prioritize the most effective MRs for testing a given program.
We plan to add this functionality to MRpred. We also plan to incorporate
information about fault categories when making this prioritization. Further,
we plan to investigate techniques to automatically generate initial and follow-
up test cases for given MRs so that the fault-finding effectiveness can be further improved.
Chapter 8
Evaluating Hierarchical
Domain-Specific Languages for
Computational Science: Applying the
Sprat Approach to a Marine
Ecosystem Model
8.1 Motivation
When software engineers started to examine the software development
practice in computational science, they noticed a “wide chasm” [296] between
how these two disciplines view software development. Faulk et al. [293] de-
scribe this chasm between the two subjects using an allegory which depicts
computational science as an isolated island that has been colonized but then
was left abandoned for decades:
“Returning visitors (software engineers) find the inhabitants (scientific
programmers) apparently speaking the same language, but communi-
cation—and thus collaboration—is nearly impossible; the technologies,
culture, and language semantics themselves have evolved and adapted
to circumstances unknown to the original colonizers.”
The fact that these two cultures are “separated by a common language” has created a communication gap that inhibits knowledge transfer between them. As
a result, modern software engineering practices are rarely employed in com-
putational science.
So far, the most promising attempt to bridge the gap between computa-
tional science and software engineering seems to be education via workshop-
based training programs focusing on Ph.D. students, such as the ones orga-
nized by Wilson [318] and Messina [306]. While the education approach does
address the skill gap that is central to the “software chasm,” education will not
suffice alone: just exposing scientists to software engineering methods will not
be enough because these methods often fail to consider the specific character-
istics and constraints of scientific software development—i.e., the functioning
of these methods is based on (often implicit) assumptions that are violated
in the computational science context [288, 297]. We therefore conclude that—
complementary to the education approach—we have to select suitable software
engineering techniques and adapt them specifically to the needs of computa-
tional scientists.
In contrast to general-purpose languages (GPLs), which are designed to be able to implement any program that can be computed with a Turing machine, DSLs limit their expressiveness to a particular application domain. By featuring high-level domain concepts that enable modeling phenomena at the abstraction level of the domain and by providing a notation close to the target domain, DSLs can be very concise. The syntax of a DSL can be textual or graphical, and DSL programs can be executed either by means of interpretation or through generation of source code in existing GPLs. A popular example of a textual DSL is regular expressions, which target the domain of text pattern matching and allow search patterns to be modeled independently of any concrete matching engine implementation.
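As a small concrete example, the pattern below is a complete "program" in the regular expression DSL; Java's java.util.regex is used here merely as one possible execution backend, and the pattern itself is independent of it:

import java.util.regex.Pattern;

public class RegexDslExample {
    public static void main(String[] args) {
        // The DSL program: a pattern for numbers in (optional) scientific notation.
        String pattern = "[+-]?\\d+(\\.\\d+)?([eE][+-]?\\d+)?";

        // One possible execution backend for the DSL program.
        System.out.println(Pattern.matches(pattern, "6.022e23"));     // true
        System.out.println(Pattern.matches(pattern, "not a number")); // false
    }
}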
As with any other formal language, a DSL is defined by its concrete and ab-
stract syntax as well as its static and execution semantics. While the concrete
syntax defines the textual or graphical notation elements with which users of
the DSL can express models, the abstract syntax of a DSL determines the
entities of which concrete models can be comprised. These abstract model
entities (abstract syntax) together with the constraints regarding their rela-
tionships (static semantics) can again be expressed as a model of all possible
models of the DSL, which is therefore called the meta-model of the DSL.
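To make these notions concrete, the sketch below expresses the abstract syntax and a fragment of the static semantics of an invented, deliberately tiny modeling language as Java classes; it is an illustrative meta-model, not one of the Sprat languages discussed later:

import java.util.List;

// Abstract syntax: the entities of which concrete models can be comprised.
record Species(String name) {}
record Reaction(Species source, Species target, double rate) {}
record ReactionModel(List<Species> species, List<Reaction> reactions) {

    // Static semantics: constraints on how the entities may be related.
    boolean isValid() {
        return reactions.stream().allMatch(r ->
            species.contains(r.source())
                && species.contains(r.target())
                && r.rate() >= 0.0);
    }
}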
Since DSLs are designed to express solutions at the abstraction level of the
domain, they allow the scientists to care about what matters most to them:
doing science without having to deal with technical, implementation-specific
details. While they use high-level domain abstractions, they still stay in full control over their development process, as it is they who directly implement their solutions in formal and executable (e.g., through generation) programming languages. Additionally, generation from a formal language into a low-level GPL permits examining the generated code to trace what is actually computed.
DSLs can also help to reconcile the conflicting quality requirements of per-
formance on the one hand and portability and maintainability on the other
hand that are responsible for many of the difficulties experienced in scientific
software development (cf. [288]). DSL source code is maintainable because it is
often pre-structured and much easier to read than GPL code, which makes it
almost self-documenting. This almost self-documenting nature of DSL source
code and the fact that it can rely on an—ideally—well-tested generator for
program translation ensure the reliability of scientific results based on the out-
put of the software. Portability of DSL code is achieved by just replacing the
generator for the language with one that targets another hardware platform.
With DSLs, the high abstraction level does not have to result in performance losses, because the domain specificity enables domain-specific optimizations to be applied at compile time and greatly simplifies automatic parallelization.
In the way described above, DSLs integrated into a custom MDSE ap-
proach could help to improve the productivity of computational scientists and
the quality of their software. A first indicator that supports this hypothesis
can be found in the survey report of Prabhu et al. [308], who find that those
scientists who program with DSLs “report higher productivity and satisfac-
tion compared to scientists who primarily use general purpose, numerical, or
scripting languages.”
Note that an analysis algorithm is appropriate not only for the specific
model in question but for at least a whole sub-class of models in the respective
mathematical modeling framework. Therefore, the analysis algorithm can be
implemented independently of any concrete model and can be arranged in a
way that the model component makes use of the algorithm component but
not the other way around.
If additional scientific effects are to be included in the simulation, they can
usually be interpreted as extensions to a base model. If, for example, sea ice
is supposed to be included in an ocean model, it can be represented as a layer
over the entire sea surface which may contain ice of variable thickness [279].
This layer would then influence certain processes which are modeled in the
basic fluid dynamics equations.
Such model extensions introduce higher levels of abstraction and can be
implemented atop the existing base model, which remains independent of the
extension components. In this way, multiple model extensions can be stacked
on top of each other, which leads to a layered software architecture as depicted
in Figure 8.1.
Alexander and Easterbrook [279] demonstrate that it is not only theoret-
ically possible to employ the multi-layered architecture pattern in the engi-
neering of scientific software but that it is actually used by existing simulation
software. For this purpose, they analyze the software architecture of eight
global climate models that represent both ocean and atmosphere.
Regarding the boundaries between the different components of the climate
models, Alexander and Easterbrook point out that they “represent both natural boundaries in the physical world (e.g., the ocean surface), and divisions between communities of expertise (e.g., ocean science vs. atmospheric physics).” Therefore, the hierarchically arranged components in simulation software also belong to distinct scientific (sub-)disciplines. This, of course, holds true not only for climate models but also for general simulation software: the analysis algorithm, the base model, and all model extension components are separated from each other along the boundaries of different “communities of expertise.”

We will make use of the possibility of partitioning scientific simulation software along discipline boundaries into hierarchically arranged layers by constructing a DSL hierarchy that mirrors this hierarchical structure.

∂_t u = ρ                  (8.1)
∂_t ρ = δ^{ij} ∂_i v_j     (8.2)
∂_t v_i = ∂_i ρ.           (8.3)

A one-dimensional finite-difference discretization of equations (8.1)–(8.3), parallelized with OpenMP, reads:

#pragma omp parallel for
for (i = 1; i < N-1; i++) {
    dt_u[i]   = rho[i];
    dt_rho[i] = (v[i+1] - v[i-1]) / (2 * dx);
    dt_v[i]   = (rho[i+1] - rho[i-1]) / (2 * dx);
}
Figure 8.3: Multiple layers acting as domain-specific platforms for each other.
that each layer uses abstractions provided by the next lower hierarchy level
but never uses abstractions from higher levels. For a description of how the
levels of the hierarchy can interact and, thus, form domain-specific platforms
for each other, see Section 8.3.2.2.
Each level in a DSL hierarchy is associated with a modeler role which uses
the DSL of the level to model the application part of this level. Together, the
application parts of all hierarchy levels form the whole scientific simulation
application to be implemented. Note that we assign a role to each level and
not a person. This implies that a single person can fulfill multiple roles in a
DSL hierarchy and one role can be assumed by several persons at once.
By employing an individual DSL for each discipline that is involved in
an interdisciplinary scientific software project, we achieve a clear separation
of concerns. Additionally, this ensures that all participating scientists (who
assume modeler roles) are working only with abstractions that they are al-
ready familiar with from their respective domain. Due to the high specificity
of a well-designed DSL, the code of an implemented solution that uses this
language can be very concise and almost self-documenting. This simplifies
writing code that is easy to maintain and to evolve, which allows scientists
to implement well-engineered software without extensive software engineering
training.
Figure 8.4: DSL hierarchy for the Sprat Marine Ecosystem Model.
2. The DSLs have to integrate well with the tools and workflows that the
scientists are used to.
3. Candidate DSLs have to be easy to learn for domain specialists (the
concrete syntax must appear “natural” to them) and offer good tool
support. In this way, the scientists require only minimal training to use
the languages.
4. As performance is a very important quality requirement in computa-
tional science, it must be made sure that the increased level of abstrac-
tion of a candidate DSL does not compromise the runtime performance
of programs significantly. Additionally, the DSL should introduce as few
dependencies as possible.
5. The language engineers must ensure that candidate DSLs can be inte-
grated with each other vertically in a DSL hierarchy.
Clearly, the language engineers have to cooperate closely with the scientists
and obtain feedback from them continuously to make sure that the selected
DSLs actually meet the needs of the scientists and that the latter are really
willing to use the languages. For this reason, it is important for the language
engineers to know about and to respect the characteristics of software devel-
opment in computational science.
If no suitable DSLs can be identified for some or all levels of the DSL
hierarchy, the language engineers have to develop corresponding languages
by themselves. In principle, the development of DSLs for computational sci-
ence is not different from DSL engineering for other domains. Generally, the
DSL development process can be divided into a domain analysis, a language
design, and an implementation phase for which Mernik et al. [305] identify
several patterns. A more detailed approach to DSL engineering that focuses
on meta-modeling is given by Strembeck and Zdun [316]. Of course, for DSL
development the language engineers have to pay special attention to the same
factors that were already discussed above in the context of DSL selection
for scientific software development. Again, it cannot be overemphasized that
the language engineers have to work in close collaboration with the scientists
all the time and that they have to respect (and at least partially embrace)
the specific characteristics of scientific software development. For a DSL to
be accepted by the target user community, the accidental complexity (both
linguistic and technical) introduced along with it must be kept to a minimum.
Concerning the order in which the DSLs should be constructed, we gen-
erally propose to develop all languages of the language hierarchy at the same
time. Preferably, the development of each DSL takes place in an incremental
fashion using agile methods. This approach provides large flexibility because
potential incompatibilities between different languages in the DSL hierarchy
can be addressed early on. Since DSLs on higher levels of the hierarchy depend
on those on lower ones, each development iteration for each language should
begin on lower hierarchy levels moving on to higher ones.
artifacts that are necessary for carrying out a scientific software development
project with the Sprat Approach.
The only artifact that the scientists produce together with the language
engineers to communicate the development process among themselves is a
diagram of the DSL hierarchy. Therefore, all development aspects have to
be represented in this diagram. This includes even concerns that could be
modeled as orthogonal to the actual development of the software, such as the
deployment process (cf. Figure 8.4). Thus, the hierarchy diagram represents
a combination of different concerns and even mixes structural and procedural
elements (e.g., x must be present before y can be deployed). This approach
minimizes the complexity that the scientific programmers have to deal with
but still enables meaningful reasoning about the software, its development
process, and the different responsibilities of the personnel involved.
DistributedVector u, q;
ElementVectorArray F_L;
ElementMatrixArray C;
ElementMatrix D;

foreach_omp(tau, Elements(mesh), private(D), {
  foreach(i, ElementDoF(tau), {
    foreach(j, ElementDoF(tau), {
      D(i, j) = max(i.globalIndex(), j.globalIndex());
    })
  })
  F_L[tau] = C[tau] * q + D * u;
})
u *= u.dotProduct(q);
u.exchangeData();
---
- hosts: localhost
  connection: local
  vars_files:
    - ./openstack_config.yml
  tasks:
    - name: Make sure the cloud instances are running
      nova_compute:
        state: present
        hostname: sprat{{ item }}
        image_id: "{{ os_image_id }}"
      with_sequence: start=1 end={{ nInstances }}
Listing 8.3: Excerpt from the Ansible Playbook for deploying the Sprat Sim-
ulation on an OpenStack cloud.
Additionally, they would like to have a specification of the data structures and
interfaces of the DSL in order to understand how (possibly already existing)
GPL code can be combined with the constructs of the language.
The DSL developers also suggest a combination of a summary and a com-
plete language reference as learning material. However, they do not mention
the importance of complete code examples (they rather seem to think of exam-
ple snippets embedded into written text) and they generally put much more
emphasis on the reference document than the scientists do: three out of four
interviewed DSL developers name the reference first and talk about introduc-
tory material only when asked about it. For them, the basis of the reference
document should be a formalized meta-model or the abstract syntax of the
DSL, which is supposed to quickly give a top-down overview of the language.
One of the interviewees mentions the reference of the Swift programming lan-
guage [281] as exemplary in this respect. This reference is structured around
grammar rules that are grouped according to which aspect of the language
they belong to (expressions, types, etc.).
From the interviews, it can be seen that the domain scientists seem to favor
a more practical and pragmatic approach to learning a DSL for computational
science than DSL designers might think. The scientists emphasize the impor-
tance of complete documented code examples and they are interested in a
reference only as a second step when it comes to more technical aspects of the
implementation. In consequence, the utility of a formalized meta-model and
lengthy grammar rule descriptions seems questionable for such an audience.
When developing DSLs for computational scientists, DSL designers should re-
flect on their generally more formal and systematic approach to introducing
others to such a language.
Chapter 9
Providing Mixed-Language and
Legacy Support in a Library:
Experiences of Developing PETSc
9.1 Introduction
This chapter explains how numerical libraries written in C can portably and
efficiently support their use from both modern and legacy versions of Fortran.
This is done by examining, for one particular library, all the cross-language
issues that arise in mixing C and Fortran. To the chagrin of many computer
scientists, scientists and engineers continue to use Fortran to develop new
simulation codes, and the Fortran language continues to evolve with new
standards and updated compilers. The need to combine Fortran and C code will
therefore continue and will be no less important in future computing systems
that include many-core processors with a hierarchy of memories and the
integration of GPU systems with CPU systems, all the way up to exascale
systems. Thus,
numerical analysts and other developers of mathematical software libraries
must ensure that such libraries are usable from Fortran. To make the situation
more complicated, depending on the age of the Fortran application (or the
age of its developers), the Fortran source code may be Fortran 77, Fortran 90,
Fortran 2003, or Fortran 2008 (possibly with TS 29113, required by MPI-3’s
“mpi_f08” module). In fact, the same Fortran application may include source
files with the suffix .f that utilize Fortran 77 syntax and formatting (traditional
fixed format), .F files that utilize some Fortran 90 or later language features
but still have fixed format, and .F90 files that use free format. Many Fortran
application developers also resist utilizing the more advanced features of recent
Fortran standards for a variety of reasons. Thus, any interlanguage Fortran
library support must support both traditional Fortran dialects and modern
features such as derived types. See [325] for a short history of Fortran.
The Babel project [321] was an ambitious effort to develop a language-
independent object model that would allow scientific software libraries written
in several languages to be utilized from any of the languages, much as Corba
[326] was for business software. However, because of its extremely ambitious
nature, the tools (Java) selected to develop the model, and insufficient funding
for the large amount of development needed, the software could not fully serve
application and library needs.
The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a
portable C software library for the scalable solution of linear, nonlinear, and
ODE/DAE systems, and computation of adjoints (sometimes called sensitiv-
ities) of ODE systems. PETSc has been developed and supported at ANL
for the past 20 years. PETSc has always supported a uniform Fortran inter-
face [319], even in the very early phases of library development (see page 29
of [323]; see also [324]). PETSc is written using a C-based object model (in
fact, that model inspired the Babel design) with a mapping of the objects and
the methods (functions) on the objects to Fortran as well as Python. This
chapter discusses only the Fortran mapping in PETSc.
Symbol names: Fortran compilers convert the symbol names to all capitals,
all lower case, or all lower case with an underscore suffix. One variant
appends an additional underscore to symbols whose names already contain an
underscore. In PETSc we handle this name mangling using the preprocessor,
with code such as
#if defined(PETSC_HAVE_FORTRAN_CAPS)
#define matcreateseqaij_ MATCREATESEQAIJ
#elif !defined(PETSC_HAVE_FORTRAN_UNDERSCORE)
#define matcreateseqaij_ matcreateseqaij
#endif
A terser, arguably better way of managing this is to use the token-pasting ## feature
of the C preprocessor. First we define the macro FC_FUNC() based on the
Fortran symbol format.
#if defined(PETSC_HAVE_FORTRAN_CAPS)
#define FC_FUNC(name,NAME) NAME ## _
#elif !defined(PETSC_HAVE_FORTRAN_UNDERSCORE)
#define FC_FUNC(name,NAME) name
#else
#define FC_FUNC(name,NAME) name ## _
#endif
Defining each symbol then takes only a single line, such as
#define matcreateseqaij_ FC_FUNC(matcreateseqaij,MATCREATESEQAIJ)
Character strings: Since Fortran strings are not null terminated, the For-
tran compiler must generate additional code to indicate the length of each
string. Most Fortran compilers include the string length (as an integer) as
an additional argument at the end of the calling sequence; some compilers
pass the length (as an integer) immediately after the character argument. In
PETSc we handle this issue in the definition of our C stub function, again
using the preprocessor #define. One macro allocates a null-terminated copy of
the string to pass to C, and
#define FREECHAR(a,b) if (a != b) PetscFreeVoid(b);
frees the temporary string. Depending on where the Fortran compiler places
the len argument, either the PETSC_MIXED_LEN(len) or the PETSC_END_LEN(len)
macro simply removes the argument.
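A rough sketch of the idea behind this pair of string-handling macros
(hypothetical names, not the actual PETSc code) is the following: copy the
blank-padded Fortran string into a freshly allocated, null-terminated buffer
before the C call, and free that buffer afterwards.

#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not the actual PETSc macros: turn a blank-padded
   Fortran string of length len into a null-terminated C string c, and
   release it again after the C call returns. */
#define FIX_FORTRAN_STRING(f, len, c)                                   \
  do {                                                                  \
    size_t n = (len);                                                   \
    while (n > 0 && (f)[n - 1] == ' ') n--;   /* strip blank padding */ \
    (c) = (char *)malloc(n + 1);                                        \
    memcpy((c), (f), n);                                                \
    (c)[n] = '\0';                                                      \
  } while (0)

#define FREE_FORTRAN_STRING(c) free(c)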
Include files: Although the Fortran 77 standard did not provide for include
files, most Fortran compilers support include files that use the C preprocessor
(CPP) syntax; and for those systems that do not, one can always call the
C preprocessor on the Fortran source and then call the Fortran compiler on
the result. The use of include files with Fortran code makes possible many
of the techniques utilized by PETSc (and discussed below). Full C/Fortran
interoperability can be provided without requiring the use of Fortran include
files, instead, for example, utilizing Fortran modules to contain the needed
common data and values.
PetscEnum INSERT_VALUES
parameter (INSERT_VALUES=1)
PetscEnum ADD_VALUES
parameter (ADD_VALUES=2)
although care must be taken that the same integer values are used in the C
and Fortran code. Recent versions of Fortran support enums via
ENUM, BIND(C)
  ENUMERATOR :: INSERT_VALUES
  ENUMERATOR :: ADD_VALUES
END ENUM
which automatically ensures that the values assigned in Fortran match those
in the C enum.
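For illustration, the C-side declaration that these Fortran definitions must
agree with could look roughly as follows (a sketch only, not the actual PETSc
header):

/* Sketch only: the explicit values mirror the Fortran parameter
   statements / enumerators shown above. */
typedef enum {
  INSERT_VALUES = 1,
  ADD_VALUES    = 2
} InsertMode;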
subroutine PetscSetFromCommonBlock()
implicit none
#include <petsc/finclude/petscsys.h>
call PetscSetFortranBasePointers(PETSC_NULL_CHARACTER,                  &
     PETSC_NULL_INTEGER,PETSC_NULL_DOUBLE,PETSC_NULL_OBJECT,            &
     PETSC_NULL_FUNCTION)
return
end
passes the addresses within a common block to C.
PetscChar(80) PETSC_NULL_CHARACTER
PetscInt PETSC_NULL_INTEGER
PetscFortranDouble PETSC_NULL_DOUBLE
PetscObject PETSC_NULL_OBJECT
external PETSC_NULL_FUNCTION
common /petscfortran/ PETSC_NULL_CHARACTER,                             &
     PETSC_NULL_INTEGER,                                                &
     PETSC_NULL_DOUBLE,                                                 &
     PETSC_NULL_OBJECT
The following C routine called from Fortran then puts the values of the Fortran
common block addresses and external function into global C variables.
void STDCALL petscsetfortranbasepointers_(char *fnull_character
PETSC_MIXED_LEN(len),
void *fnull_integer,
void *fnull_double,
void *fnull_object,
void (*fnull_func)(void)
PETSC_END_LEN(len))
{
PETSC_NULL_CHARACTER_Fortran = fnull_character;
PETSC_NULL_INTEGER_Fortran = fnull_integer;
PETSC_NULL_DOUBLE_Fortran = fnull_double;
PETSC_NULL_OBJECT_Fortran = fnull_object;
PETSC_NULL_FUNCTION_Fortran = fnull_func;
}
Note that since traditional Fortran has no concept of a common block variable
declared as a function pointer, the PETSC_NULL_FUNCTION is simply declared
with the external marker. This construct for managing null pointer usage in
Fortran is needed because Fortran has no concept of a generic NULL. Instead,
one needs a NULL for each data type; then in the C stub called from Fortran,
the specific NULL data type is converted to the C NULL, for example,
void STDCALL matcreateseqaij_(MPI_Comm *comm,PetscInt *m,
PetscInt *n,PetscInt *nz,
PetscInt *nnz,Mat *newmat,
PetscErrorCode *ierr)
{
if ((void*)(uintptr_t)nnz == PETSC_NULL_INTEGER_Fortran)
{ nnz = NULL; }
*ierr = MatCreateSeqAIJ(MPI_Comm_f2c(*(MPI_Fint*)comm),*m,*n,
*nz,nnz,newmat);
}
PETSc also has many runtime constants in the style of MPI_COMM_WORLD,
such as PETSC_VIEWER_STDOUT_WORLD, which are handled similarly but are
compile-time constants in Fortran. In Fortran they are defined as integers via
the parameter statement.
PetscFortranAddr PETSC_VIEWER_STDOUT_WORLD
parameter (PETSC_VIEWER_STDOUT_WORLD = 8)
Then the C stub checks whether the input is one of these special values and
converts to the appropriate runtime C value, for example,
#define PetscPatchDefaultViewers_Fortran(vin,v)                              \
{                                                                             \
  if ((*(PetscFortranAddr*)vin) == PETSC_VIEWER_DRAW_WORLD_FORTRAN) {         \
    v = PETSC_VIEWER_DRAW_WORLD;                                              \
  } else if ((*(PetscFortranAddr*)vin) == PETSC_VIEWER_DRAW_SELF_FORTRAN) {   \
    v = PETSC_VIEWER_DRAW_SELF;                                               \
  } else if ((*(PetscFortranAddr*)vin) == PETSC_VIEWER_STDOUT_WORLD_FORTRAN) {\
    v = PETSC_VIEWER_STDOUT_WORLD;                                            \
  } else ...                                                                  \
  } else {                                                                    \
    v = *vin;                                                                 \
  }                                                                           \
}
void STDCALL vecview_(Vec *x,PetscViewer *vin,PetscErrorCode *ierr)
{
PetscViewer v;
PetscPatchDefaultViewers_Fortran(vin,v);
*ierr = VecView(*x,v);
}
VecGetArray() gives users direct access to the local values of a vector. Traditional Fortran
has no concept of an array pointer, which would severely limit the use of
some of PETSc’s functionality from traditional Fortran. Fortunately, again,
despite there having been no Fortran standard for this type of functionality, it
is still achievable and has been used in PETSc for over 20 years. In the user’s
Fortran code, an array of size one is declared as well as an integer long enough
to access anywhere in the memory space from that array offset (PetscOffset
is a 32-bit integer for 32-bit memory systems and a 64-bit integer for 64-bit
memory systems).
Vec X
PetscOffset lx_i
PetscScalar lx_v(1)
They then call, for example,
call VecGetArray(X,lx_v,lx_i,ierr)
call InitialGuessLocal(lx_v(lx_i),ierr)
call VecRestoreArray(X,lx_v,lx_i,ierr)
where InitialGuessLocal is defined, for example, as
subroutine InitialGuessLocal(x,ierr)
implicit none
PetscInt xs,xe,ys,ye
common /pdata/ xs,xe,ys,ye
PetscScalar x(xs:xe,ys:ye)
PetscErrorCode ierr
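Under the hood, the offset returned through the Fortran interface of
VecGetArray() is essentially the element distance between the user's dummy
array and the data array obtained from the C VecGetArray(). A minimal sketch
of this computation (assuming PetscScalar is double and a flat address space;
the actual PETSc code is more involved) is:

#include <stddef.h>

/* Minimal sketch, not the actual PETSc code.  With
     base = address of the Fortran dummy array lx_v(1), and
     addr = address of the vector data returned by the C VecGetArray(),
   the offset lx_i = addr - base (in elements) makes lx_v(lx_i + k)
   alias addr[k-1], i.e., the k-th local entry of the vector. */
static ptrdiff_t ScalarAddressToFortranOffset(const double *base,
                                              const double *addr)
{
  return addr - base;
}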
For compilers with Fortran 90 pointer-array support, PETSc also provides
VecGetArrayF90(), which hands back a proper Fortran pointer array:
PetscScalar,pointer :: lx_v(:)
Vec X
call VecGetArrayF90(X,lx_v,ierr)
This is implemented in PETSc by the C stub
void STDCALL vecgetarrayf90_(Vec *x, F90Array1d *ptr,int *ierr
PETSC_F90_2PTR_PROTO(ptrd))
{
PetscScalar *fa;
PetscInt len,one = 1;
*ierr = VecGetArray(*x,&fa); if (*ierr) return;
*ierr = VecGetLocalSize(*x,&len); if (*ierr) return;
*ierr = F90Array1dCreateScalar(fa,&one,&len,ptr
PETSC_F90_2PTR_PARAM(ptrd));
}
implicit none
#include <petsc/finclude/petscsys.h>
PetscInt start,len1
PetscScalar, target :: array(start:start+len1-1)
PetscScalar, pointer :: ptr(:)
ptr => array
end subroutine
The Portland Group Fortran compiler passes additional information about
each of the Fortran pointer arrays through final (hidden) arguments to the
called functions. For this compiler, the PETSC_F90_2PTR_PROTO(ptrd) macro is
defined to declare that hidden argument; on all other systems it expands to
nothing. The same general mechanism outlined above for PetscScalar
one-dimensional arrays also works (with modification) for multidimensional
arrays as well as arrays of integers. One would think that with support for
F90 array features there would be no need to continue supporting the
F77-compatible VecGetArray(); yet a surprisingly large number of PETSc users
continue to use the older version.
Portable Fortran source and include files: The Fortran standards pro-
vide a source format that is safe to use with all Fortran standards. This
format uses ! in the first column exclusively for comments, only numerical
values (statement labels) in the second
to fifth column, a possible continuation character of & in the sixth column,
Fortran commands in the seventh to 71st column, and a possible continuation
character of & after the 72nd column. As long as this formatting is obeyed in
the libraries’ include files and source code, the code will compile with any For-
tran compiler. Note that using C for the comment character or any symbol but
the & for the continuation character will not be portable. A related issue with
ensuring that code does not exceed the 71st column is that the CPP macro
definitions in the Fortran include files may be longer than the name of the
macro, thus pushing characters that appear to be within the 71st column past
the 71st column. For example, depending on the Fortran compiler features
and PETSc options, PetscScalar may be defined as
real(kind=selected_real_kind(10)), making user declarations such as
PetscScalar ainput,broot,ccase,dnile,erank
illegal with the fixed format.
A PETSc matrix object (Mat) contains a variety of data records as well as function pointers that implement all the
matrix functionality for a particular matrix implementation. We provide two
ways of mapping the Mat object to Fortran. In the traditional approach we
use the fact that all Fortran variables are passed by pointer (i.e., the address
of the variable is passed to the subroutine). On the Fortran side the objects
are then just
#define Mat PetscFortranAddr
where, as before, PetscFortranAddr is either a 32- or 64-bit integer depending
on the size of the memory addresses. A drawback to this approach is that in
Fortran all PETSc objects are of the same type, so that the Fortran compiler
cannot detect a type mismatch. For example, calling MatMult() with a vector
object would not be flagged as incorrect. Hence we provide an alternative
configure time approach where each PETSc object family is defined by a
Fortran derived type and utilizes modules.
use petscmat
type(Mat) A
The corresponding definition in the PETSc module is simply
type Mat
PetscFortranAddr:: v
end type Mat
Again, the simplicity of the Fortran pass-by-pointer argument handling means
that what is actually passed to a C stub is an integer large enough to hold
the PETSc object (which is, of course, a pointer). In fact, this definition allows
the same Fortran application to refer to a Mat in some files using the traditional
approach (as an integer) and in other files using the modern approach (as
a Fortran derived type). With Fortran 2003 one no longer needs to use an
appropriately sized integer to hold the C pointer in Fortran. Instead, one can
use the construct
use iso_c_binding
type(c_ptr) :: A
to directly hold the C object pointer, or one can use
use iso_c_binding
type Mat
type(c_ptr) :: v
end type Mat
9.4 Conclusion
In PETSc, we have mapped all the C-based constructs needed by users,
including enums, abstract objects, array pointers, null pointers, and function
pointers (callbacks) to equivalent traditional Fortran and modern Fortran con-
structs, allowing Fortran PETSc users to utilize almost all the functionality
of PETSc in their choice of Fortran standard. This support has substantially
enlarged the user base for PETSc. We estimate that nearly one-third of our
users work in Fortran, and we can provide them high quality numerical library
support for modern algebraic solvers. As a result of automation of much of
the process, the cost of PETSc Fortran support is significantly less than 10
percent of our development time. In addition, the Fortran support allows
applications to be written partially in C and partially in Fortran, although we
are not aware of many PETSc applications implemented in this way. Because
of user and legacy demands, it is still important to support the full suite of
F77, F90, F2003, F2008, and C interfaces.
teroperability features, while a good addition, did not fundamentally change
how PETSc supports Fortran users, nor did it allow us to discard outdated
interfacing technology. Instead, it allowed us to enhance the Fortran support
we already provided. The performance hit in using PETSc from Fortran rather
than C for any nontrivial problems, consisting of only a small extra function
call overhead, is negligible because of the granularity of the operations.
Chapter 10
HydroShare – A Case Study of the
Application of Modern Software
Engineering to a Large Distributed
Federally-Funded Scientific Software
Development Project
Abstract
HydroShare is an online collaborative system under development
to support the open sharing of hydrologic data, analytical tools,
and computer models. With HydroShare, scientists can easily dis-
cover, access, and analyze hydrologic data and thereby enhance
the production and reproducibility of hydrologic scientific results.
HydroShare also takes advantage of emerging social media func-
tionality to enable users to enhance information about and collab-
oration around hydrologic data and models.
HydroShare is being developed by an interdisciplinary collabora-
tive team of domain scientists, university software developers, and
professional software engineers from ten institutions located across
the United States. While the combination of non–co-located, di-
verse stakeholders presents communication and management chal-
lenges, the interdisciplinary nature of the team is integral to the
project’s goal of improving scientific software development and ca-
pabilities in academia.
This chapter describes the challenges faced and lessons learned
with the development of HydroShare, as well as the approach to
software development that the HydroShare team adopted on the
basis of the lessons learned. The chapter closes with recommenda-
tions for the application of modern software engineering techniques
to large, collaborative, scientific software development projects,
similar to the National Science Foundation (NSF)–funded Hy-
droShare, in order to promote the successful application of the
approach described herein by other teams for other projects.
the code. Even minor errors influence the validity of research findings; in-
deed, in some cases, papers have been retracted from scientific journals and
careers have been ruined [357]. Paper retractions and irreproducible results
due to poor-quality software impede the advancement of science and impart
huge financial repercussions. Under the worst case scenario, programming er-
rors can lead to the loss of lives if erroneous findings result in faulty medical
technologies or misdirected policies on disaster response, to provide examples.
The detection of errors in academic software is extremely challenging, how-
ever. While manuscripts submitted for journal publication must undergo a
peer review process, the software code that is used to generate the findings
presented in manuscripts is rarely subjected to a peer review process or other
measures of quality assurance. Yet, peer review and testing of software code
are critical for the credibility of science and require software engineering best
practices.
Significantly, the risk of introducing error into scientific research through
the use of low-quality software provides a little-recognized but highly
impactful incentive for the adoption of software engineering best practices in
academic scientific software development.
The HydroShare project addresses the challenges and highlights the ben-
efits of the adoption of software engineering best practices through a collab-
orative scientific software project involving a large, geographically dispersed
team of academic scientists, academic software developers, and professional
software engineers.
dents have very short-term goals (i.e., graduate in the next couple of years),
so software sustainability is not a high priority. Third, graduate students and
their faculty advisors typically have not received formal training in software
development, let alone software engineering. Fourth and lastly, the rigorous
metadata requirements necessary for reproducible science make scientific soft-
ware systems more complex than other types of software and thus require
significant time to create unit tests. This presents a paradox, as the more
complex software is, the more benefit one gets from having comprehensive
unit tests.
The team also encountered more specific technical challenges. For example,
as implementation of our HydroShare project began, the team quickly real-
ized that most members were not familiar with Git, GitHub, or continuous
integration (i.e., a development practice that requires developers to integrate
code into a shared repository on a very frequent basis). The decision was
thus made to assign only members at the lead technical institution the task
of implementing initial beta release functionalities in order to expedite cre-
ation of the code infrastructure for subsequent collaborative development and
continuous integration by the broader team members. However, this limited
HydroShare beta release functionality to only those functionalities that could
be implemented by the lead technical institution. This approach did expedite
the initial release of the system, but the approach also precluded the ability
for other team members to contribute to the development. For HydroShare,
this trade-off was acceptable as the other team members used the additional
time to get versed on continuous integration, Git, GitHub, and other specific
technologies and approaches used in HydroShare software development.
Early on in the project, the team held several in-person meetings, as well
as weekly team teleconferences, that served to achieve the development objec-
tives, including the development of a data model (i.e., a conceptual model of
how data elements relate to each other) and access control policies (i.e., policies
to restrict access to data) and thorough consideration of how to accommodate
hydrologic models within HydroShare. As implementation progressed and soft-
ware engineering principles, such as code versioning (i.e., management of revi-
sions to source code) and continuous integration, were diffused from the pro-
fessional software engineers to the hydrologists, additional challenges emerged.
For example, the distributed development team experienced difficulty achiev-
ing short-release cycles of continuous integration of the Django-based system
using Git and GitHub. Django is a large, complex, open source, python-based
web development framework, in which its customization model and content
data are stored in databases [328]. Django proved to be difficult to manage via
version control by a team with members of various skill levels. Specifically, the
challenge was how to manage multiple, distributed development teams that
were simultaneously checking out their own branch3 of HydroShare, while
3 A branch in GitHub lives separately from the production codebase, thus allowing for
Each of these items on its own presented a significant task; however, combining
all of these branches into a single release required numerous dry runs, and
small intermediate tests based on the results of those dry runs, before the
team was confident that the release was right. The team put as much time into
testing and validation as they did into coding the changes themselves. The
main lesson learned from this experience is that it is best to perform smaller,
but more frequent merges, rather than a large release with multiple complex
merges. With the former approach, the merge complexity will be reduced and
time will be saved.
3. The team’s lack of expertise was magnified when the team members that
made the decision to adopt the system left the project and a new lead
development team came onboard without any prior knowledge of the
selected technology or understanding of the previous team’s activities;
this challenge was exacerbated by lack of transition documentation to
guide the new team.
The team has since adopted a more flexible iterative approach with Hy-
droShare, one that embraces change. The conclusion is that one should expect
to throw out an early version of a software product and learn from the expe-
rience. Also, one should realize that it is so much more efficient (and easier to
accept) if this is part of the team’s plan from the start, for when planning to
throw out an early version of developed software, a team can view the experi-
ence as an exceptional opportunity to learn what works and what doesn’t from
the perspectives of software and technology integration, team communication,
meeting productivity, and process efficiency. The HydroShare team also found
it beneficial to encapsulate functionality in small, loosely coupled systems. For
example, the distributed data management system used by HydroShare can
work separately from the content management system, which can work sepa-
rately from the web applications system, and so forth. In the first iteration,
the team found that integrating systems too tightly presents limitations.
Unforeseen challenges arise in every software development project; the key is
to plan for this early on and in every facet of the project, and to expect
to throw away at least one early product.
other parts of the system and can be tested locally. Production VMs share an
allocation of fifty terabytes of project disk space and another fifty terabytes of
replicated disk space located four miles away in order to ensure fault tolerance
and disaster recovery.
and the software development infrastructure (i.e., the software and hardware
used for development). With HydroShare, the establishment of communication
protocols and development infrastructure early on in the project supported
collaboration and productivity and likely will continue to serve the team well
throughout the lifetime of the project.
For example, for weekly meetings of distributed team members, the team
employs videoconferencing software with screen sharing capability. For com-
munication outside of meetings, a team email list is used. HipChat [331], a
synchronous chat tool, was adopted as a place solely for development-centric
discussion, so as to avoid overloading subject matter experts (i.e., domain sci-
entists who do not participate in development) with extraneous information
or noise that only serves to distract from the research process. Furthermore,
the team adopted a content management system to host all documents for
the project, including meeting notes, presentations, use cases, architectural
diagrams, API documentation, policies, etc. The team also uses email lists to
disseminate community announcements (e.g., [email protected], sup-
[email protected]) and to allow people to obtain support for HydroShare.
To describe the project to interested parties, the team has created public-
facing web pages. Each of these activities has proven important to the success
of HydroShare.
10.4.8 DevOps
In addition to effective communication among team members, close collab-
oration is essential. Development Operations or DevOps is an industry concept
that can be defined as an approach to software development that emphasizes
the importance of collaboration between all stakeholders [327]. DevOps recog-
nizes that stakeholders (e.g., programmers, scientists) do not work in isolation.
This principle was adopted for HydroShare; software developers and domain
scientists work together, closely and continuously, in the development of the
HydroShare code. For HydroShare, a software engineer was selected to fill the
DevOps lead role because the role demands mastery of Git, GitHub, and coding,
and few of the team's scientist-developers were skilled in modern software
engineering techniques at the start of the project. The appointment of an experienced
software engineer as the DevOps lead allows the scientist-developers to learn
tools such as Git as they develop and contribute code. The DevOps lead fa-
cilitates this learning process by writing task automation scripts in order to
simplify and optimize code contributions in Git. With HydroShare, GitHub is
used for issue tracking in order to drive new development or track defects (i.e.
bugs). GitHub issues are also used to track the progress of code reviews, with
developers giving a simple “+1” to indicate that the code has been reviewed
and that the DevOps lead may proceed with a code merge. Task automa-
tion scripts help the DevOps lead groom the code repository and make Git’s
branching and merging processes more transparent. Together, these activities
contribute to the DevOps lead's ability to successfully ensure continuous integration.
10.6 Conclusion
The HydroShare project is a work in progress, and exploration, refinement,
and implementation of the topics herein are by no means finished. Rather, the
goal is to provide readers with insight into the HydroShare experience and
lessons learned in order to minimize the learning curve and accelerate the
development progress for other teams. The goal of this chapter is to provide
readers with a basic understanding of why good software engineering for sci-
ence is essential to the success and sustainability of a scientific research
project and why poor software engineering will detract from research time,
with more time spent managing poorly written code than actually conducting
research. In the long run, good software engineering will foster research and
one’s research career by ensuring the validity of research findings, reducing
the amount of time needed to maintain and extend code, and improving the
ease with which new features can be adopted, thus supporting software reuse
and sustainability.
Acknowledgments
Technical editorial and writing support was provided by Karamarie Fe-
cho, Ph.D. This material is based upon work supported by the NSF under
awards 1148453 and 1148090; any opinions, findings, conclusions, or recom-
mendations expressed in this material are those of the authors and do not
necessarily reflect the views of the NSF. The authors wish to thank many
who have contributed to the HydroShare project, including but not limited to:
Jennifer Arrigo, Larry Band, Christina Bandaragoda, Alex Bedig, Brian Blan-
ton, Jeff Bradberry, Chris Calloway, Claris Castillo, Tony Castronova, Mike
Conway, Jason Coposky, Shawn Crawley, Antoine deTorcey, Tian Gan, Jon
Goodall, Ilan Gray, Jeff Heard, Rick Hooper, Harry Johnson, Drew (Zhiyu) Li,
Rob Lineberger, Yan Liu, Shaun Livingston, David Maidment, Phyllis Mbewe,
Venkatesh Merwade, Setphanie Mills, Mohamed Morsy, Jon Pollak, Mauriel
Ramirez, Terrell Russell, Jeff Sadler, Martin Seul, Kevin Smith, Carol Song,
Lisa Stillwell, Nathan Swain, Sid Thakur, David Valentine, Tim Whiteaker,
Zhaokun Xue, Lan Zhao, and Shandian Zhe.
The authors wish to especially thank Stan Ahalt, Director of RENCI,
and Ashok Krishnamurthy, Deputy Director of RENCI for their continued
organizational and supplemental financial support of this project.
References
[56] Arne Beckhause, Dirk Neumann, and Lars Karg. The impact of commu-
nication structure on issue tracking efficiency at a large business software
vendor. Issues in Information Systems, X(2):316–323, 2009.
[57] Jacques Carette. Gaussian elimination: A case study in efficient gener-
icity with MetaOCaml. Science of Computer Programming, 62(1):3–24,
2006.
[58] Jacques Carette, Mustafa ElSheikh, and W. Spencer Smith. A gener-
ative geometric kernel. In ACM SIGPLAN 2011 Workshop on Partial
Evaluation and Program Manipulation (PEPM’11), pages 53–62, Jan-
uary 2011.
[59] Jeffrey C. Carver, Richard P. Kendall, Susan E. Squires, and Douglass E.
Post. Software development environments for scientific and engineer-
ing software: A series of case studies. In ICSE ’07: Proceedings of the
29th International Conference on Software Engineering, pages 550–559,
Washington, DC, USA, 2007. IEEE Computer Society.
[60] CIG. Mineos. http://geodynamics.org/cig/software/mineos/,
March 2015.
[61] CRAN. The comprehensive R archive network. https://cran.
r-project.org/, 2014.
[62] CSA. Quality assurance of analytical, scientific, and design computer
programs for nuclear power plants. Technical Report N286.7-99, Cana-
dian Standards Association, 178 Rexdale Blvd. Etobicoke, Ontario,
Canada M9W 1R3, 1999.
[63] Andrew P. Davison. Automated capture of experiment context for eas-
ier reproducibility in computational research. Computing in Science &
Engineering, 14(4):48–56, July-Aug 2012.
[64] Andrew P. Davison, M. Mattioni, D. Samarkanov, and B. Teleńczuk.
Sumatra: A toolkit for reproducible research. In V. Stodden, F. Leisch,
and R.D. Peng, editors, Implementing Reproducible Research, pages 57–
79. Chapman & Hall/CRC, Boca Raton, FL, March 2014.
[65] Paul F. Dubois. Designing scientific components. Computing in Science
and Engineering, 4(5):84–90, September 2002.
[66] Paul F. Dubois. Maintaining correctness in scientific programs. Com-
puting in Science & Engineering, 7(3):80–85, May-June 2005.
[67] Steve M. Easterbrook and Timothy C. Johns. Engineering the software
for understanding climate change. IEEE Des. Test, 11(6):65–74, 2009.
[83] Oleg Kiselyov, Kedar N. Swadi, and Walid Taha. A methodology for
generating verified combinatorial circuits. In Proceedings of the 4th ACM
International Conference on Embedded Software, EMSOFT ’04, pages
249–258, New York, NY, USA, 2004. ACM.
[84] Donald E. Knuth. Literate Programming. CSLI Lecture Notes Number
27. Center for the Study of Language and Information, 1992.
[85] Adam Lazzarato, Spencer Smith, and Jacques Carette. State of the
practice for remote sensing software. Technical Report CAS-15-03-SS,
McMaster University, January 2015. 47 pp.
[86] Friedrich Leisch. Sweave: Dynamic generation of statistical reports using
literate data analysis. In Wolfgang Härdle and Bernd Rönz, editors,
Compstat 2002 — Proceedings in Computational Statistics, pages 575–
580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9.
[87] Jon Loeliger and Matthew McCullough. Version Control with Git:
Powerful Tools and Techniques for Collaborative Software Development.
O’Reilly Media, Inc., 2012.
[88] Thomas Maibaum and Alan Wassyng. A product-focused approach to
software certification. IEEE Computer, 41(2):91–93, 2008.
[89] NASA. Software requirements DID, SMAP-DID-P200-SW, release 4.3.
Technical report, National Aeronautics and Space Agency, 1989.
[90] Nedialko S. Nedialkov. Implementing a Rigorous ODE Solver through
Literate Programming. Technical Report CAS-10-02-NN, Department
of Computing and Software, McMaster University, 2010.
[91] Suely Oliveira and David E. Stewart. Writing Scientific Software: A
Guide to Good Style. Cambridge University Press, New York, NY, USA,
2006.
[92] Linda Parker Gates. Strategic planning with critical success factors and
future scenarios: An integrated strategic planning framework. Tech-
nical Report CMU/SEI-2010-TR-037, Software Engineering Institute,
Carnegie-Mellon University, November 2010.
[93] David L. Parnas. On the criteria to be used in decomposing systems
into modules. Comm. ACM, 15(2):1053–1058, December 1972.
[94] David L. Parnas, P. C. Clement, and D. M. Weiss. The modular struc-
ture of complex systems. In International Conference on Software En-
gineering, pages 408–419, 1984.
[95] David L. Parnas and P.C. Clements. A rational design process: How and
why to fake it. IEEE Transactions on Software Engineering, 12(2):251–
257, February 1986.
[96] David Lorge Parnas. Precise documentation: The key to better software.
In The Future of Software Engineering, pages 125–148, 2010.
[97] Matt Pharr and Greg Humphreys. Physically Based Rendering: From
Theory to Implementation. Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 2004.
[98] Michael Pilato. Version Control With Subversion. O’Reilly & Asso-
ciates, Inc., Sebastopol, CA, USA, 2004.
[99] Patrick J. Roache. Verification and Validation in Computational Science
and Engineering. Hermosa Publishers, Albuquerque, NM, 1998.
[100] Padideh Sarafraz. Thermal optimization of flat plate PCM cap-
sules in natural convection solar water heating systems. Mas-
ter’s thesis, McMaster University, Hamilton, ON, Canada, 2014.
http://hdl.handle.net/11375/14128.
[101] Judith Segal. When software engineers met research scientists: A case
study. Empirical Software Engineering, 10(4):517–536, October 2005.
[102] Judith Segal and Chris Morris. Developing scientific software. IEEE
Software, 25(4):18–20, July/August 2008.
[202] Rebecca Sanders and Diane Kelly. The challenge of testing scientific
software. In CAST ’08: Proceedings of the 3rd Annual Conference of the
Association for Software Testing, pages 30–36, Toronto, ON, Canada,
2008. Association for Software Testing.
[203] Rebecca Sanders and Diane Kelly. Dealing with risk in scientific software
development. IEEE Software, 25(4):21–28, 2008.
[204] A framework to write repeatable Java tests. Available at
http://junit.org, Accessed: 12-20-2015.
[205] A Unit Testing Framework for C. Available at
http://cunit.sourceforge.net, Accessed: 12-10-2015.
[206] A Unit Testing Framework for FORTRAN. Available at
https://rubygems.org/gems/funit/versions/0.11.1, Accessed: 12-20-2015.
[207] Cube 4.x series, 2015. Version 4.3.2, available at http://www.scalasca.
org/software/cube-4.x/download.html, Accessed: 06-10-2015.
[208] D. Babic, L. Martignoni, S. McCamant, and D. Song. Statically-directed
dynamic automated test generation. pages 12–22, 2011.
[209] G. B. Bonan. The land surface climatology of the NCAR land surface
model coupled to the NCAR Community Climate Model. J. Climate,
11:1307–1326.
[210] C. Cadar, D. Dunbar, and D. Engler. Klee: Unassisted and automatic
generation of high-coverage tests for complex systems programs. pages
209–224, 2008.
[211] D. Wang, W. Wu, T. Janjusic, Y. Xu, C. Iversen, P. Thornton, and
M. Krassovisk. Scientific functional testing platform for environmen-
tal models: An application to community land model. International
Workshop on Software Engineering for High Performance Computing
in Science, 2015.
[212] R. E. Dickinson, K. W. Oleson, G. Bonan, F. Hoffman, P. Thornton,
M. Vertenstein, Z. Yang, and X. Zeng. The community land model and
its climate statistics as a component of the community climate system
model. J. Clim., 19:2302–2324, 2006.
[213] M. Feathers. Working Effectively with Legacy Code. Prentice-Hall, 2004.
[214] A. Knüpfer, C. Rössel, D. Mey, S. Biersdorf, K. Diethelm, D. Eschweiler,
M. Gerndt, D. Lorenz, A. D. Malony, W. E. Nagel, Y. Oleynik, P. Sa-
viankou, D. Schmidl, S. Shende, R. Tschüter, M. Wagner, B. Wesarg,
and F. Wolf. Score-P - A Joint Performance Measurement Run-Time
Infrastructure for Periscope, Scalasca, TAU, and Vampir. 5th Parallel
Tools Workshop, 2011.
[244] D. Kelly, R. Gray, and Y. Shao, “Examining random and designed tests
to detect code mistakes in scientific software,” Journal of Computational
Science, vol. 2, no. 1, pp. 47–56, 2011. [Online]. Available: http://www.
sciencedirect.com/science/article/pii/S187775031000075X
[245] D. Kelly, S. Thorsteinson, and D. Hook, “Scientific software testing:
Analysis with four dimensions,” Software, IEEE, vol. 28, no. 3, pp. 84–
90, May–Jun. 2011.
[254] R. Sanders and D. Kelly, “Dealing with risk in scientific software devel-
opment,” IEEE Software, vol. 25, no. 4, pp. 21–28, Jul.–Aug. 2008.
[276] Y. Jia and M. Harman, “An analysis and survey of the development of
mutation testing,” IEEE Transactions on Software Engineering, vol. 37,
pp. 649–678, 2011.
[277] Y.-S. Ma and J. Offutt, “Description of method-level mutation opera-
tors for java,” November 2005. [Online]. Available: http://cs.gmu.edu/
~offutt/mujava/mutopsMethod.pdf
[278] David Abrahams and Aleksey Gurtovoy. C++ Template Metaprogram-
ming: Concepts, Tools, and Techniques from Boost and Beyond. Addison
Wesley, 2004.
[279] Kaitlin Alexander and Stephen M. Easterbrook. The software ar-
chitecture of climate models: A graphical comparison of CMIP5 and
EMICAR5 configurations. Geoscientific Model Development Discus-
sions, 8(1):351–379, 2015.
[280] Ansible Incorporated. Ansible documentation. http://docs.ansible.
com, 2015.
[281] Apple Incorporated. The Swift programming language – lan-
guage reference. https://developer.apple.com/library/ios/
documentation/Swift/Conceptual/Swift_Programming_Language/
AboutTheLanguageReference.html, 2015.
[282] Simonetta Balsamo, Antinisca Di Marco, Paola Inverardi, and Marta
Simeoni. Model-based performance prediction in software development:
A survey. Software Engineering, 30(5):295–310, 2004.
[283] Victor R. Basili, Daniela Cruzes, Jeffrey C. Carver, Lorin M. Hochstein,
Jeffrey K. Hollingsworth, Marvin V. Zelkowitz, and Forrest Shull. Un-
derstanding the high-performance-computing community: A software
engineer’s perspective. IEEE Software, 25(4):29–36, 2008.
[284] Marco Brambilla, Jordi Cabot, and Manuel Wimmer. Model-driven soft-
ware engineering in practice. Number 1 in Synthesis Lectures on Soft-
ware Engineering. Morgan & Claypool, 2012.
[285] Susanne Brenner and L. Ridgway Scott. The Mathematical Theory of
Finite Element Methods. Springer, 3 edition, 2008.
[301] Diane Kelly. A software chasm: Software engineering and scientific com-
puting. IEEE Software, 24(6):118–120, 2007.
[302] Sarah Killcoyne and John Boyle. Managing chaos: Lessons learned devel-
oping software in the life sciences. Computing in Science & Engineering,
11(6):20–29, 2009.
[314] Judith Segal and Chris Morris. Developing scientific software. Software,
IEEE, 25(4):18–20, 2008.
[315] Thomas Stahl and Markus Völter. Model-Driven Software Development:
Technology, Engineering, Management. Wiley, 2006.
[316] Mark Strembeck and Uwe Zdun. An approach for the systematic devel-
opment of domain-specific languages. Software: Practice and Experience,
39(15):1253–1292, 2009.
[317] Gregory V. Wilson. Where’s the real bottleneck in scientific computing?
American Scientist, 94(1):5–6, 2006.
[318] Gregory V. Wilson. Software carpentry: Lessons learned.
F1000Research, 3:1–11, 2014.
[319] Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter
Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D.
Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes,
Karl Rupp, Barry F. Smith, Stefano Zampini, and Hong Zhang. PETSc
Users Manual. Technical Report ANL-95/11 - Revision 3.6, Argonne
National Laboratory, 2015.
[320] David M. Beazley. SWIG: An easy to use tool for integrating scripting
languages with C and C++. In Proceedings of the 4th USENIX Tcl/Tk
Workshop, pages 129–139, 1996.
[321] Thomas G. W. Epperly, Gary Kumfert, Tamara Dahlgren, Dietmar
Ebner, Jim Leek, Adrian Prantl, and Scott Kohn. High-performance
language interoperability for scientific computing through Babel. In-
ternational Journal of High Performance Computing Applications, page
1094342011414036, 2011.
[322] William D. Gropp. Users manual for bfort: Producing Fortran inter-
faces to C source code. Technical Report ANL/MCS-TM-208, Argonne
National Laboratory, IL (United States), 1995.
[323] William D. Gropp and Barry F. Smith. Simplified linear equation
solvers users manual. Technical Report ANL–93/8, Argonne National
Laboratory, IL (United States), 1993.