Scalable Computational Science
Carmine Spagnuolo
Università degli Studi di Salerno
Dipartimento di Informatica
Dottorato di Ricerca in Informatica e Ingegneria dell’Informazione
DOCTOR OF PHILOSOPHY
Computer Science
Carmine Spagnuolo
2017
Carmine Spagnuolo
Scalable Computational Science
Parallel and Distributed Computing. Supervisors: Prof. Vittorio Scarano and Dott. Gennaro Cordasco
Successes in computational science over the past twenty years have driven the demand for supercomputing, both to improve the performance of the solutions and to allow the growth of the models, in terms of size and quality. From a computer scientist's perspective, it is natural to consider distributing the computation required to study a complex system among multiple machines: it is well known that the speed of single-processor computers is approaching its physical limits. For these reasons, parallel and distributed computing has become the dominant paradigm for computational scientists, who need the latest developments in computing resources in order to solve their problems, and “Scalability” has been recognized as the central challenge in this science.
Interface) – are provided in order to meet both the flexibility and scalability requirements.
– Memory consistency. D-MASON provides a memory consistency mechanism at framework level which enables the modeler to design the simulation without any specific knowledge about the underlying distributed memory environment.
– Support for diverse computing environments. D-MASON was initially conceived to harness the amount of unused computing power available in common, even heterogeneous, installations like educational laboratories. Thereafter the focus moved to dedicated homogeneous installations, such as massively parallel machines or supercomputing centers. Eventually, D-MASON has been extended to provide a SIMulation-as-a-Service (SIMaaS) infrastructure that simplifies the process of setting up and running distributed simulations in a Cloud Computing environment.
• The proposal of an architecture that enables invoking code supported by a Java Virtual Machine (JVM) from code written in the C language. Swift/T is a parallel programming language that enables composing and executing a series of computational or data-manipulation steps in a scientific application. Swift/T makes it easy to execute code written in other languages, such as C, C++, Fortran, Python, R, Tcl, Julia, and Qt Script. The proposed architecture has been integrated in Swift/T (since version 1.0), enabling the support for other kinds of interpreted languages.
• The proposal of two tools that exploit the computing power of parallel systems to improve the effectiveness and the efficiency of Simulation Optimization strategies. Simulation Optimization (SO) refers to the techniques studied for ascertaining the parameters of a complex model that minimize (or maximize) given criteria (one or many), which can only be computed by performing a simulation run. Due to the high dimensionality of the search space, the heterogeneity of the parameters, and the irregular shape and stochastic nature of the objective evaluation function, the tuning of such systems is extremely demanding from the computational point of view. The proposed tools are SOF (Simulation Optimization and exploration Framework on the cloud) and EMEWS (Extreme-scale Model Exploration With Swift/T), which focus on Cloud environments and HPC systems, respectively.
• The proposal of an open-source, extensible architecture for the visualization of data in HTML pages, exploiting distributed web computing. Following the Edge-centric Computing paradigm, the data visualization is performed on the edge side, ensuring data trustworthiness, privacy, scalability and dynamic data loading. The architecture has been exploited in the Social Platform for Open Data (SPOD).
Acknowledgement
The shower is Milanese, because you wash better, use less water and waste less time. The bath, instead, is Neapolitan: an encounter with one's thoughts, an appointment with imagination.
To my mother Lena!
To my father Enzo!
To my sister Rox!
To Skipper!
To my Dream!
Contents
1 Introduction
  1.1 Computational Science
  1.2 Motivation
  1.3 What is scalability?
    1.3.1 Parallel and Distributed Programming Background
  1.4 When is a system scalable?
    1.4.1 How to Measure Scalability Efficiency
  1.5 Scalable Computational Science
    1.5.1 Real world example
  1.6 Dissertation Structure
    1.6.1 Distributed Agent-Based Simulation
    1.6.2 SWIFT/T Parallel Language and JVM scripting
    1.6.3 Simulation Optimization
    1.6.4 Scalable Web Scientific Visualization
    2.6.1 Scalability of field partitioning strategies
    2.6.2 Scalability of the Communication layer
    2.6.3 Beyond the Limits of Sequential Computation
    2.6.4 Scalability and Cost evaluation on a cloud infrastructure
6 Conclusion
Appendices
Bibliography
1 Introduction
The complexity of parallel, networked platforms and highly parallel and distributed systems is rising dramatically. Today's 1,000-processor parallel computing systems will rapidly evolve into the 100,000-processor systems of tomorrow. Hence, perhaps the greatest challenge in computational science today is software that is scalable at all hardware levels (processor, node, and system). In addition, to achieve the maximum benefit from parallel hardware configurations that require such underlying software, the software must provide enough concurrent operations to exploit multiple hardware levels gracefully and efficiently.
1. Algorithms (numerical and non-numerical) and modeling and simulation software developed to solve science (e.g., biological, physical, and social), engineering, and humanities problems;
2. Computer and information science that develops and optimizes the advanced system hardware, software, networking, and data management components needed to solve computationally demanding problems;
3. The computing infrastructure that supports both the science and engineering problem solving and the developmental computer and information science.
Figure 1.1 describes the definition of SC and its relation to this work. As shown in the Figure, SC tackles complex problems using multidisciplinary skills in combination with computational thinking. Each contribution of this work aims to face a key aspect of the SC areas. The Figure also shows the contributions of this work (see the circles) to the SC field. These contributions can be divided into three categories.
1.2 Motivation
The SC field comprises many disciplines and scientific fields. From the Computer
Science point of view, the challenge is to improve the current status of methods,
algorithms and applications in order to enhance support for SC in terms of both
efficiency and effectiveness of the solutions.
Fig. 1.1.: Computational Science areas and their relations with the contributions of this work.
For instance, one can consider what is needed for the development of new simulation software, or for a novel simulation model, to study a natural disaster or an epidemiological emergency. In order to study the effects on a population, it is desirable that our system enables experiments of increasing scale and complexity.
Unfortunately, there is no generic answer. It is not possible to describe all possible requirements; they depend on the problem itself. Nevertheless, it is important to identify methodologies and solutions that can help tackle real complex problems.
As described in [Com05], the most important constraint that should guide our research is scalability. Our solutions, software, models, algorithms and systems should be scalable according to the problem itself. This requirement concerns not only the development of software architectures and frameworks, but also the design of solutions to problems in the SC field.
Scalability is a frequently claimed attribute of systems and solutions that face complex problems using computer systems. There is no generally accepted definition of scalability (as discussed in [Hil90]). Nevertheless, we can use some of its principles to better design scalable solutions for the SC field. According to this idea, it is possible to state that Computational Science should be scalable at different software and hardware levels.
The system scalability requirement also refers to the application of very large compute clusters to solve computational problems. A compute cluster is a set of computing resources that work and collaborate together to solve a problem. In the following, we use the term supercomputing system for a system with a large number of nodes and a dedicated hardware architecture.
Supercomputing systems become more affordable day by day, and Cloud Computing systems (which offer a large number of high-performance resources at low cost) are also an attractive opportunity in the SC field. The Top500 Supercomputer site provides a detailed description of the most powerful available systems; this list is updated twice a year. Figure 1.2 shows the trend of supercomputing architectures over the period 1995–2015. We will refer to these systems as Super-scale and Extra-scale computing systems.
Fig. 1.2.: Systems Architecture Share from Top500.
In order to better understand the concept of scalability, Section 1.3.1 provides some background concepts about parallel and distributed computing.
Flynn's Taxonomy
To understand parallel computing, the first essential concept is the classification of computer architectures. An important and well-known classification scheme is Flynn's taxonomy. Figure 1.3 shows the taxonomy defined by Flynn in 1966.
Fig. 1.4.: Microprocessor Transistor counts from 1971 to 2011.
Moore's Law
Moore's Law observed that the transistor count of CPUs and RAM doubles approximately every two years. Nonetheless, due to physical limits, heat emission and the size of the atom, the growth of processor clock speeds ended around 2008, and speeds have since stabilized. That is clearly shown in Figures 1.5 and 1.6. The current trend among CPU vendors is to increase the number of cores per machine in order to obtain better performance. In other words, this means that each science has
Fig. 1.7.: Symmetric multiprocessing architecture.
Computational Models

This section describes two theoretical computational models that are independent of the parallel computing architecture used. These models aim to measure the quality of a parallel solution.

Equal Duration Model. This model measures the quality of a solution in terms of its execution time. Consider a given task that can be decomposed into n equal sub-tasks, each one executable on a different processor. Let ts (resp. tm) be the execution time of the whole task on a single processor unit (resp. concurrently on n processors). Since this model assumes that all processors execute their tasks concurrently, we have tm = ts/n. So, it is possible to compute the speedup factor S(n) of a parallel system as the ratio between the time to execute the whole computation on a single processor (ts) and the time obtained exploiting n processor units (tm):
\[
S(n) \overset{\text{def}}{=} \frac{t_s}{t_m} = \frac{t_s}{t_s/n} = n
\]
Obviously, the previous definition is not enough to describe the speedup obtained: one needs to consider the communication overhead introduced by the parallel or distributed computation. Let tc be the communication overhead; the total parallel time is given by tm = (ts/n) + tc and the speedup becomes
\[
S(n) \overset{\text{def}}{=} \frac{t_s}{t_m} = \frac{t_s}{\frac{t_s}{n} + t_c} = \frac{n}{1 + n \times \frac{t_c}{t_s}}
\]
This value, normalized by n, is named efficiency ξ and can be seen as the speedup per processor:
\[
\xi = \frac{S(n)}{n} = \frac{1}{1 + n \times \frac{t_c}{t_s}}
\]
The value of the efficiency ranges between 0 and 1. Again, this model is not realistic, because it is based on the assumption that a given task can be divided into “equal” sub-tasks among n processors. Real algorithms, on the other hand, contain some serial parts that cannot be divided among processors. Furthermore, a parallel algorithm is also characterized by some sub-tasks that cannot be executed concurrently by processors. These sub-tasks include synchronization or other special instructions, and are named critical sections. Figure 1.10 shows a program that has some code segments that can be executed in parallel, while some other segments must be executed sequentially (due to interdependencies).
This consideration is the main idea of the next model, Parallel Computation with
Serial Sections Model.
Parallel Computation with Serial Sections Model. This model assumes that a fraction f of a task must be executed serially, while the remaining fraction 1 − f can be divided into concurrent sub-tasks. Like the previous model, the total execution time yields the speedup factor
Fig. 1.10.: Example program segments.
\[
S(n) = \frac{t_s}{f\,t_s + (1-f)\frac{t_s}{n}} = \frac{n}{1 + (n-1)f} \tag{1.1}
\]
As in the equal duration model, the speedup factor considering the communication
overhead is given by
\[
S(n) = \frac{t_s}{f\,t_s + (1-f)\frac{t_s}{n} + t_c} = \frac{n}{f(n-1) + 1 + n\,\frac{t_c}{t_s}}
\]
Taking the limit on the number of processors used, the maximum speedup factor is given by
\[
\lim_{n\to\infty} S(n) = \lim_{n\to\infty} \frac{n}{f(n-1) + 1 + n\,\frac{t_c}{t_s}} = \frac{1}{f + \frac{t_c}{t_s}} \tag{1.2}
\]
According to equation (1.2), it is worth noticing that the maximum speedup factor depends on the fraction of the computation that cannot be parallelized and on the communication overhead. In this model the efficiency is given by
\[
\xi = \frac{1}{1 + (n-1)f}
\]
without considering the communication overhead, while taking into account the
communication overhead we have,
\[
\xi = \frac{1}{f + \frac{t_c}{t_s}}
\]
This section introduces two laws that aim to describe the benefits of using parallel
computing.
Grosch's Law. H. Grosch, in the 1940s, postulated that the power P of a computer system increases in proportion to the square of its cost C: P = K × C², with K a positive constant. Figure 1.11 depicts the relationship between the power and the cost of a computer system. Today this law clearly no longer holds, and research communities and computational scientists are looking for strategies to make the most of HPC and heterogeneous distributed systems. The SC field is one of the most attractive sciences from this point of view: it is involved in many real problems that can often benefit from HPC systems, in terms of achievable problem size, speedup and efficiency.
Amdahl's Law. Starting from the definition of speedup, it is possible to study the maximum speedup achievable independently of the number of processors involved in a given computation. According to equation (1.1), also known as Amdahl's law, the potential speedup using n processors is determined by the size of the sequential fraction f of the code. Amdahl's principle states that the maximum speedup factor is given by
\[
\lim_{n\to\infty} S(n) = \frac{1}{f}
\]
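As a quick worked example (the value of f is illustrative): if 5% of a program is inherently sequential, i.e., f = 0.05, then
\[
\lim_{n\to\infty} S(n) = \frac{1}{0.05} = 20,
\]
so no matter how many processors are used, the speedup can never exceed 20.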
Nevertheless, there are real problems whose sequential part f is a function of n, such that limn→∞ f(n) = 0. In these cases, the speedup limit is
\[
\lim_{n\to\infty} S(n) = \lim_{n\to\infty} \frac{n}{1 + (n-1)f(n)} = n
\]
This contradicts Amdahl's law, showing that it is possible to achieve linear speedup by increasing the problem size. This statement has been verified by researchers at Sandia National Laboratories, who showed that a linear speedup factor is achievable for some engineering problems. This reassures and supports the hypothesis that it is worth investing in the SC fields in order to improve the current solutions using parallel and distributed computing. Indeed, SC fields face complex problems also in terms of input dimensions.
Strong scalability studies the number of resources needed by a given application to complete the computation in a reasonable time. For this purpose the problem size stays fixed while the number of processing elements is increased. An application scales linearly when the speedup obtained equals the number of computational resources used, n; but linear scalability is hard to obtain, due to the communication overhead, which typically grows with n.
Given the completion time of the task on a single computational resource, t1, and the completion time tm of the same task on n computational resources, the strong scaling efficiency is given by:
\[
SS = \frac{t_1}{n \times t_m} \times 100
\]
Weak scalability defines the efficiency of an application while fixing the problem size per computational resource. This measure is useful for studying the memory or resource consumption of an application. In the case of weak scaling, linear scaling is achieved if the run time stays constant while the workload is increased in direct proportion to the number of computational resources.
Given the completion time of a single work unit, t1, and the completion time tm of n work units on n computational resources, the weak scaling efficiency is given by:
\[
WS = \frac{t_1}{t_m} \times 100
\]
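The two efficiency measures above translate directly into code. The following minimal Java sketch computes both; the timings in main are hypothetical values chosen for illustration, not measurements from this work.

public final class ScalingEfficiency {

    // Strong scaling: same problem on n resources; t1 and tm in any common unit.
    static double strong(double t1, double tm, int n) {
        return t1 / (n * tm) * 100.0;
    }

    // Weak scaling: n work units on n resources; ideally the run time stays constant.
    static double weak(double t1, double tm) {
        return t1 / tm * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical timings: 1000 s serial; 140 s on 8 nodes (same problem);
        // 1100 s for 8x the work on 8 nodes.
        System.out.printf("strong: %.1f%%%n", strong(1000, 140, 8)); // ~89.3%
        System.out.printf("weak:   %.1f%%%n", weak(1000, 1100));     // ~90.9%
    }
}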
This work builds on the idea of using scalability as a major design objective for problem solving in SC. Henceforth we call this approach Scalable Computational Science (SCS).
The complexity of the ABM for CRC and the dimensions of its parameter space are huge, and the calibration of the model would have been unimaginable without running simulations on an HPC system. This example shows the benefits of using an HPC ME framework.
Fig. 1.12.: Scaling study results for the EMEWS ABM calibration workflow on IBM Blue Gene/Q.
1.6 Dissertation Structure
This dissertation discusses Frameworks, Architectures and Parallel Languages for SCS. Chapter 2 provides the results about Distributed Agent-Based Simulation; in detail, a novel framework, D-MASON, is presented. Chapter 3 presents the contributions in developing Java scripting integration in the parallel language Swift/T. Thereafter, Chapter 4 describes two frameworks for Simulation Optimization (SO): SOF, a Simulation Optimization Framework on Cloud Computing infrastructures, and EMEWS, Extreme-scale Model Exploration with Swift/T on HPC systems. Chapter 5 discusses a scalable architecture for scientific visualization of Open Data on the Web. Finally, Chapter 6 presents a summary of the results and analyzes some directions for future research.
The development of D-MASON began in 2011; the main purpose of the project was to overcome the limits of the sequential computation of MASON by using distributed computing. D-MASON enables doing more than MASON in terms of simulation size (number of agents and complexity of agent behaviors), but it also reduces the running time of simulations written in MASON. For this reason, one of the most important features of D-MASON is that it requires a limited number of changes to MASON code in order to execute simulations on distributed systems.
[Cor+16b] Cordasco G., Spagnuolo C. and Scarano V. Toward the new version
of D-MASON: Efficiency, Effectiveness and Correctness in Parallel and Distributed
Agent-based Simulations. 1st IEEE Workshop on Parallel and Distributed
Processing for Computational Social Systems. IEEE International Parallel &
Distributed Processing Symposium 2016.
[Cor+11] Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and
Spagnuolo C. A Framework for distributing Agent-based simulations. Ninth
International Workshop Algorithms, Models and Tools for Parallel Computing
on Heterogeneous Platforms of Euro-Par 2011 conference.
Much effort has been made, on the Communication Layer, to improve the communication efficiency in the case of homogeneous systems. D-MASON is based on the Publish/Subscribe (PS) communication paradigm and uses a centralized message broker (based on the Java Message Service standard) to deal with heterogeneous systems. The communication for homogeneous systems uses the Message Passing Interface (MPI) standard and is also based on PS. In order to use MPI within Java, D-MASON uses a Java binding of MPI. Unfortunately, this binding is relatively new and does not provide all MPI functionalities. Several communication strategies were designed, implemented and evaluated. These strategies were presented in two papers:
D-MASON also provides mechanisms for the visualization and gathering of data in distributed simulations (available on the Visualization Layer). These solutions are presented in the paper:
[Cor+13b] Cordasco G., De Chiara R., Raia F., Scarano V., Spagnuolo C. and
Vicidomini L. Designing Computational Steering Facilities for Distributed Agent
Based Simulations. Proceedings of the ACM SIGSIM Conference on Principles
of Advanced Discrete Simulation 2013.
In DABS, one of the most complex problems is the partitioning and balancing of the computation. D-MASON provides, in the Distributed Simulation layer, mechanisms for partitioning and dynamically balancing the computation. D-MASON uses a field partitioning mechanism to divide the computation among the nodes of the distributed system. The field partitioning mechanism provides a nice trade-off between balancing and communication effort. Nevertheless, many ABSs are not based on 2D or 3D fields, but on a communication graph that models the relationships among the agents. In this case, the field partitioning mechanism does not ensure good simulation performance.
The field partitioning mechanism, intuitively, enables the mono- and bi-dimensional partitioning of a Euclidean space. This approach is also known as uniform partitioning. However, in some cases, e.g., simulations of urban areas using a Geographical Information System (GIS), uniform partitioning degrades the simulation performance, due to the unbalanced distribution of the agents on the field and, consequently, on the computational resources (see Appendix B). In such cases, D-MASON provides a non-uniform partitioning mechanism (inspired by the Quad-Tree data structure), presented in the following papers:
Swift/T provides an interesting feature: the ability to easily and natively call other languages (such as Python, R, Julia, and C) by using special language functions named leaf functions.
Considering the current trend of some supercomputing vendors (such as Cray Inc.) to support Java Virtual Machines (JVMs) on their systems, it is desirable to provide methods to call Java code from Swift/T as well. In particular, it is really attractive to be able to call scripting languages for the JVM such as Clojure, Scala, Groovy, JavaScript, etc.
For this purpose, a C binding to instantiate and call a JVM was designed, and is described in Chapter 3. This binding is used in Swift/T (since version 1.0) to develop leaf functions that call Java code. The code is publicly available on the GitHub project page.
The first framework is SOF (Zero Configuration Simulation Optimization Framework on the Cloud); it was designed to run SO processes in the cloud. SOF is based on the Apache Hadoop [Tur13] infrastructure and is presented in the following paper:
[Car+16b] Carillo M., Cordasco G., Scarano V., Serrapica F., Spagnuolo C.
and Szufel P. SOF: Zero Configuration Simulation Optimization Framework on
the Cloud. Parallel, Distributed, and Network-Based Processing 2016.
Open data is data freely available to everyone, without restrictions from copyright, patents or other mechanisms of control. The presented architecture allows easily visualizing data in classical HTML pages. The most important design feature concerns the rendering of the visualization, which is performed on the client side rather than on the server side, as in other architectures. This ensures scalability in terms of the number of concurrent visualizations, and the dependability of the data (because the data are dynamically loaded client side, without any server interaction).
Chapter 5 describes the proposed architecture, which has also appeared in the following papers:
[Cor+17a] G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V.
Scarano, L. Serra, C. Spagnuolo, L. Vicidomini A Scalable Data Web Visu-
alization Architecture. Parallel, Distributed, and Network-Based Processing
2017.
[Mal+16] G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V.
Scarano, L. Serra, C. Spagnuolo, L. Vicidomini An Architecture for Social
Sharing and Collaboration around Open Data Visualisation. In Poster Proc. of
2 Distributed Agent-Based Simulation
How does variation in the number of interacting units (grid size) affect the main results of an agent-based simulation?
— Claudio Cioffi-Revilla (Invariance and universality in social agent-based simulations, 2002)
2.1 Introduction
The traditional answer to the need for HPC is to invest resources in deploying a dedicated installation of dedicated computers. Such solutions can provide the computing power surge needed by highly specialized customers. Nonetheless, a large amount of computing power is available, unused, in common installations like educational laboratories, accounting departments, and library PCs.
SC, as described before, uses computer simulation (see the example in Figure 2.1) to investigate solutions and study complex real problems. This scenario of use is quite common in the context of heterogeneous computing, where different computing platforms participate in the same computing effort, each contributing its own specific capabilities. One of the most challenging aspects of parallel computing is to balance the workload among the machines that provide the computation. Indeed, due to synchronization constraints, the entire computation advances at the speed of the slowest machine, which may represent a bottleneck for the overall performance. Of course, this issue becomes particularly challenging in the context of heterogeneous computing.
Software for computer simulation in the context of SC should be able to exploit both HPC systems, characterized by a large number of homogeneous computing machines, and heterogeneous systems. This Chapter focuses on a particular class of computational models, named Agent-Based simulation Models (ABMs).
The computer science community has responded to the need for platforms that can help the development and testing of new models in each specific field by providing tools, libraries and frameworks that enable setting up and running ABMs.
As a matter of fact, research in the many fields that use simulation toolkits for ABMs is often conducted interactively, since the “generative” paradigm [Eps+07] describes an iterative methodology where models are designed, tested and refined to reach the generation of an outcome with a simple generative model. In this context, given that the scientists of the specific domain are often not computer scientists, they usually do not have long-term access to systems for high-performance computation; they have to perform preliminary studies within their limited resources and only later (if needed) run extensive tests on large supercomputing centers. In social sciences, for example, the need for “the capacity to model and make up in parallel, reactive and cognitive systems, and the means to observe their interactions and emerging effects” [CC95] clearly outlined, since 1995, the need for flexible, though powerful, tools.
Moreover, the design of ABMs is usually done by domain experts who are seldom computer scientists and have limited knowledge of managing a modern parallel
infrastructure as well as developing parallel code. In this context, our goal is to offer such scientists a setting where a simulation program can be run on one desktop first and can later harness the power of other desktops in the same laboratory, thereby allowing them to scale up the size they can treat or to significantly reduce the time needed to run the simulation. The scientist, then, is able to run extensive tests by enrolling the different machines available, perhaps during off-peak hours.
Of course, this means that the resulting distributed system, by collecting hardware from research labs, administration offices, etc., can be highly heterogeneous in nature; the challenge, then, is how to efficiently use such hardware without impacting the “legitimate” user (i.e., the owner of the desktop), both in terms of performance and of installation/customization of the machine. On the other hand, a program in MASON should not be very different from the corresponding program in D-MASON, so that the scientist can easily modify it to run over an increasing number of hosts. This also ensures an easy migration to an HPC system.
Repast
Repast HPC is the version of Repast that offers facilities to leverage the computing
power of clusters. In Repast HPC the programmer has to take into account that
some of the information of the simulation must be propagated across the cluster
to each of the workers involved in the computation. In [Che+08] the authors
present HLA_Grid_RePast, a middleware for the execution of collaborating Repast
applications on the Grid. The system considers Repast applications as services
on a Grid while the effort that allows the interoperability of models is left to the
programmer.
NetLogo
NetLogo allows the user to design the simulation by using a functional language
inspired by Logo and, for this reason, it is considered to be easily grasped by a
wider audience; furthermore NetLogo offers numerous facilities to support the
analysis of the simulation. For example, NetLogo offers BehaviorSpace that allows
to automatically running a given simulation with different parameters settings,
allowing the automatic exploration of the parameters’ space. BehaviorSpace can
exploit the parallelism of the machine by running more simulations at the same time.
These functionalities of NetLogo are similar to those described in Chapter Simulation
Optimization. The exploration of the parameters’ space can exploit the parallelism of
the machine by running more simulations at the same time but is not useful to run
massive simulation (i.e., simulating a large number of agents) and/or simulations
which deal with complex agents, that is, computationally intensive agents.
The simulation layer is the core of MASON and is mainly represented by an event scheduler and a variety of fields which hold agents in a given simulation space. MASON is based on the concept of a steppable agent: a computational entity which may be scheduled to perform some action (step), and which can interact (communicate) with other agents. The visualization layer permits both visualization and manipulation of the model. The simulation layer is independent from the visualization layer, which allows us to treat the model as a self-contained entity.
The main reasons that directed us toward a distributed version of MASON are:
• MASON is one of the most expressive and efficient libraries for ABMs (as reported by many reviews [Mat08; Naj+01; Rai+06]);
• MASON's structure, which clearly separates visualization from simulation, makes the framework particularly well suited to re-engineering into a distributed “shape” [Luk+04; Luk+05];
Therefore, the approach followed was that of explicitly managing the distribution of agents on several computing nodes (see [CN11; Men+08; PS09] for recent examples), in order to get the most from the efficiency point of view.
As an example, a simple way to partition the whole computation into different tasks is to assign a fixed number of agents to each available logical processor (LP) [HX98]. A logical processor is an abstraction of a computational resource. This approach, named agents partitioning, enables a balanced workload distribution, but it may introduce a significant communication overhead (even a very small number of agent interdependencies may result in the all-to-all communication required to synchronize the simulation).
Other strategies which partition the work in a smarter way [Cos+11] have been
proposed in order to reduce the communication and synchronization time.
In our view, there are two main opposing forces that drive the design of scalable distributed ABMs. The first one is the quest for efficiency: the ultimate goal for the simulation of extremely large and time-consuming models. This force is propelled by the well-known end-to-end principle [D.P+84], which tries to push any effective attempt to achieve efficiency higher up in the layers. In this context, the principle states that it is possible to gain substantial improvements in efficiency only by addressing the semantics of the (parallel) application itself, although additional support from the lower layers is helpful. Some examples of this approach include recent results on land use [Tan+11] and on burglary [Mal+12], as well as a comparison of the effectiveness of parallelism at the application level with respect to super-individual methods [PE08], where single individuals are aggregated by carefully changing the model itself, so that the results of the simpler simulation in the new model are consistent with the original one.
The opposite force is driven by the inherent complexity of the models and (therefore) of the simulations, which requires coordinated work by a multidisciplinary team involved in several repeated iterations of the method. It is fundamental, then,
Fig. 2.4.: Trading easiness of development for efficiency of the implementation.
The design of D-MASON is inspired by the need for efficiency in a distributed setting where computing resources are scarce, heterogeneous, not centrally managed, and used for other purposes during other periods of the workday. Moreover, the multidisciplinarity of the teams that use ABMs often places an important emphasis on ease of development, thereby suggesting our compromise between efficiency and impact by acting at the framework level. In this way, scientists can still use the same computing and storage abstractions they are familiar with in order to build a simulation from a given model. The (modified) framework is able to execute the simulation within a distributed setting, thereby achieving both efficiency and effectiveness. Finally, our approach is cost-effective since it allows a high degree of backward compatibility with MASON simulations, because of the moderate number of modifications to the source code of an existing MASON application.
Work partitioning
Fig. 2.5.: Field partitioning.
By noticing that most ABMs are inspired by natural models, where the agents' limited visibility allows bounding the range of interaction to a fixed range named the agent's Area of Interest (AOI), several field partitioning approaches have been proposed [Cos+11; Zha+10; ZZ04] in order to reduce the communication overhead. Since the AOI radius of an agent is small compared with the size of a cell, the communication is limited to local messages (messages between LPs managing adjacent spaces). With this approach agents can migrate between adjacent cells, and consequently the association between workers and agents changes during the simulation. In D-MASON, by design, agents are allowed to migrate only between adjacent cells. This is consistent with a large family of models (e.g., biology-inspired models) that do not need any kind of “teleportation”.
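As an illustration of uniform partitioning, the following hypothetical helper (not part of D-MASON's API; names and layout are assumptions for this sketch) maps an agent's position to the cell, and hence the LP, owning it, under a rows × columns decomposition of a rectangular field:

import java.lang.Math;

// Maps a position to the id of the cell that owns it under a uniform
// rows x columns partitioning of a width x height field (illustrative only).
public final class UniformPartitioning {
    static int cellOf(double x, double y,
                      double width, double height,
                      int rows, int columns) {
        double cellW = width / columns;
        double cellH = height / rows;
        int j = Math.min((int) (x / cellW), columns - 1); // column index
        int i = Math.min((int) (y / cellH), rows - 1);    // row index
        return i * columns + j;                           // linearized cell id
    }

    public static void main(String[] args) {
        // A 200 x 200 field split into 2 x 4 cells, as in Listing 2.3.
        System.out.println(cellOf(150.0, 40.0, 200.0, 200.0, 2, 4)); // prints 3
    }
}

An agent whose new position falls in a different cell than its current one triggers a migration to the corresponding LP.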
Appendix B provides an analytical study of the field partitioning methods; in particular, the study focuses on the communication effort induced by the field partitioning approach used. This study motivates the introduction of the non-uniform field partitioning approach. The scalability of field partitioning strategies will be analyzed in Section 2.6.1.
Load Balancing. The problem of a field partitioning approach is that, since agents can migrate between adjacent cells, the association between workers and agents changes during the simulation. Moreover, load balancing is not guaranteed and needs to be addressed by the application. To better exploit the computing power provided by the workers of the system, it is necessary to design the system to avoid bottlenecks. Since the simulation is synchronized after each step, the system advances at the speed of the slowest worker in the system. For this reason it is necessary to balance the load among workers.
The choice of the partitioning strategy is important for the efficiency of the whole system. Two key factors need to be considered: (i) static vs. dynamic partitioning and (ii) the granularity of the world decomposition. Dynamic partitioning can be useful, for instance, when the workload of the simulation changes over time. Unfortunately, the management of dynamic cells requires a large amount of communication between workers, which consumes bandwidth and introduces latency.
Fig. 2.7.: Non-uniform field partitioning with 25 LPs.
Similarly, the granularity of the world decomposition (that is, the cell size and, consequently, the number of cells into which a given space is partitioned) determines a trade-off between load balancing and communication overhead. The finer the granularity adopted, the higher the balancing that, ideally, can be reached by the system. However, due to agents' interdependency and system synchronizations, a finer granularity usually determines a huge amount of communication, which may harm the overall performance.
D-MASON uses a simple but efficient technique to cope with heterogeneity. The idea is to clone the software run by highly capable workers so that they can serve as multiple workers; i.e., a worker that is x times more powerful than the others can execute x virtual workers (that is, simulate several cells concurrently). D-MASON uses a static partitioning, while the granularity of the decomposition is chosen by the user according to the expected unbalancing of the model and the performance of the communication layer.
Communication
In D-MASON's Publish/Subscribe scheme, each cell is associated with a topic; LPs then simply subscribe to the topics associated with the cells which overlap with their Area of Interest (AOI) in order to receive the relevant message updates. Other topics are also used for system management and visualization.
The first version of D-MASON uses the Java Message Service (JMS) for communication between LPs, running the Apache ActiveMQ Server [Apa11] as JMS provider. These functionalities are provided by the Communication Layer of D-MASON, described in Section 2.4.2.
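For readers unfamiliar with JMS, the following minimal sketch illustrates the Publish/Subscribe pattern that the Communication Layer builds on. This is not D-MASON code: the broker URL and the topic name are assumptions for the example.

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

// Minimal JMS Publish/Subscribe sketch against a local ActiveMQ broker.
public class PubSubSketch {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://127.0.0.1:61616"); // assumed default port
        Connection conn = factory.createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("dfield1-border"); // hypothetical topic name

        // An LP subscribes to the topics overlapping its Area of Interest...
        MessageConsumer consumer = session.createConsumer(topic);
        consumer.setMessageListener(msg -> { /* apply border updates here */ });

        // ...while a neighboring LP publishes its border data on the same topic.
        MessageProducer producer = session.createProducer(topic);
        producer.send(session.createObjectMessage(new java.util.ArrayList<>()));

        conn.close();
    }
}

Each LP plays both roles at once: producer for the topics of its own cell and consumer for those of its neighbors.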
D-MASON can also be deployed on HPC systems. In order to better exploit such homogeneous environments, D-MASON also provides an MPI-based Publish/Subscribe communication (see Section 2.4.2).
Synchronization
Fig. 2.9.: D-MASON LPs’ synchronization.
To keep the behavior of the parallel implementation consistent with respect to the sequential one, each LP needs to collect information about the adjacent cells. Each simulation step is formed by two phases: communication/synchronization and simulation (see Figure 2.9). First of all, the LP w sends to its neighbors (i.e., the LPs responsible for its adjacent cells) the information about the agents lying on the borders of its cell (within the AOI) and the agents migrating to adjacent cells.
This information exchange is locally synchronized in order to let the simulation run
consistently (see Figure 2.9).
Reproducibility
2.4 D-MASON Architecture
D-MASON is designed with scalability, in all of its forms, as the guiding principle. Its architecture addresses several design requirements and is divided into layers. The design requirements are: efficiency, to exploit the hardware architecture; effectiveness, to model different kinds of ABMs; usability, from the users' experience point of view; and correctness of the results. The D-MASON functional blocks (or layers) and their interactions aim to meet these requirements. The D-MASON functional blocks (or layers) are: Distributed Simulation, Communication, Visualization and System Management. Figure 2.10 depicts the layers' interactions and how they meet the design requirements.
Figure 2.11 depicts the D-MASON architecture, showing the corresponding architectural layer for each D-MASON component. As shown, the Distributed Simulation and Communication layers depend on each other; both are needed to build a distributed simulation in D-MASON. There are, however, no dependencies between the System Management and Visualization layers and the other layers: they are independent modules of D-MASON.
The next four sections describe the D-MASON layers, while Appendix C provides a comparison between the implementations of a simulation example written in D-MASON and in MASON.
The D-MASON DS layer consists of two main packages: Engine and Field (see Figure 2.12). This choice is dictated by the aim of maintaining the same structure as MASON, in order to provide the developer with a friendly environment.
Engine
The Engine package, depicted in Figure 2.13, consists of three objects, each one being a distributed version of a corresponding one in MASON. The first, DistributedState, represents the state of the simulation in a distributed environment. These three objects are the core of D-MASON. Listing 2.1 depicts a basic example of a DistributedState. This toy simulation uses a DContinuousGrid2D (described in the following), a continuous space field corresponding to the Continuous2D field of MASON. The simulation initializes 100 agents for each LP and sets their positions randomly on the field.
1  public class DSimulation extends DistributedState<Double2D>
2  {
3    public DContinuousGrid2D sim_field;
4    public DSimulation(GeneralParam params, String prefix)
5    {
12     super.start();
13     try
14     {
15       sim_field =
           DContinuousGrid2DFactory.createDContinuous2D(10.0/1.5, 200, 200, this,
16         super.AOI, TYPE.pos_i, TYPE.pos_j, super.rows, super.columns, MODE,
           "dfield1", topicPrefix, true);
17       init_connection();
18     } catch (DMasonException e) { e.printStackTrace(); }
19
20     DAgent agent = null;
21     for (int i = 0; i < 100; i++) {
22       agent = new DAgent(this, new Double2D(0, 0), this.random.nextInt());
23       agent.setPos(sim_field.getAvailableRandomLocation());
24       sim_field.setObjectLocation(agent, agent.pos);
25       agent.setColor(Color.RED);
26       schedule.scheduleOnce(sim_field);
27     }
28
29   }
30   @Override
31   public DistributedField2D getField()
32   {
33     return sim_field;
34   }
35   @Override
36   public void addToField(RemotePositionedAgent rm, Double2D loc)
37   {
38     sim_field.setObjectLocation(rm, loc);
39
40   }
41   @Override
42   public SimState getState()
43   {
44     return this;
45   }
46 }
These two objects are subclasses of RemoteAgent and are, respectively, an instance of an agent that has a position in space (e.g., an agent in a continuous 2D space) and an instance of an agent without any positioning (e.g., an agent in a network). This hierarchy is necessary to exploit all MASON features. Indeed, some MASON features, like some types of agent visualization, are obtained by extending a visualization class. On the other hand, in D-MASON the agent class should extend a RemoteAgent class, while unfortunately Java does not support multiple inheritance. This reasoning justifies the hierarchy described above and ensures the compatibility with all MASON functionalities.
// Abstract base agent class
public abstract class RemoteDAgent<E> implements Serializable, RemotePositionedAgent<E> {

    private static final long serialVersionUID = 1L;
    public E pos;     // location of the agent
    public String id; // id of the remote agent; uniquely identifies the agent in the distributed field

    public RemoteDAgent() {}
    public RemoteDAgent(DistributedState<E> state) {
        int i = state.nextId();
        this.id = state.getType().toString() + "-" + i;
    }
    @Override
    public E getPos() { return pos; }
    @Override
    public void setPos(E pos) { this.pos = pos; }
    @Override
    public String getId() { return id; }
    @Override
    public void setId(String id) { this.id = id; }
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        RemoteDAgent<?> other = (RemoteDAgent<?>) obj;
        if (id == null) {
            if (other.id != null) return false;
        } else if (!id.equals(other.id)) return false;
        if (pos == null) {
            if (other.pos != null) return false;
        } else if (!pos.equals(other.pos)) return false;
        // ...
    }
}
Listing 2.3 depicts example code for executing a D-MASON simulation on a local machine. This code assumes that an instance of the message broker Apache ActiveMQ is running on the local machine (see Section Communication Layer). The test initializes and executes 8 LPs (DSimulation objects), using a uniform partitioning approach.
public class DTestLocalMachine {
    private static int numSteps = 100;  // number of steps
    private static int rows = 2;        // number of rows
    private static int columns = 4;     // number of columns
    private static int AOI = 10;        // AOI
    private static int CONNECTION_TYPE = ConnectionType.pureActiveMQ;
    private static String ip = "127.0.0.1"; // IP of the ActiveMQ broker
    // ...

    public static void main(String[] args)
    {
        class worker extends Thread
        {
            private DistributedState<?> ds;
            public worker(DistributedState<?> ds) {
                this.ds = ds;
                ds.start();
            }
            @Override
            public void run() {
                int i = 0;
                while (i != numSteps)
                {
                    ds.schedule.step(ds);
                    i++;
                }
                System.exit(0);
            }
        }
        // ...
As observed in [Hyb+06], ABMs are very expressive; they implement the sense-think-act cycle and offer a real agent-based programming model. Unfortunately, due to the high level of interdependencies between agents, these models [Min+96; Luk+05; Nor+07] are not easy to parallelize.
Theoretically, all the agents update their status at step t considering the status of all neighbor agents at step t − 1 (synchronous update). Specifically, in synchronous updating, the agent v at the t-th iteration updates its state based on the state of its neighbors at the (t − 1)-th iteration. Suppose agent v has k neighbors u1, u2, . . . , uk, and let Sv[t] be the state of agent v at the t-th iteration. Hence,
\[
S_v[t] = f_{upd}\left(S_v[t-1],\, S_{u_1}[t-1],\, S_{u_2}[t-1],\, \ldots,\, S_{u_k}[t-1]\right)
\]
where fupd computes the state of an agent. In a sequential environment, the agents are instead updated asynchronously, that is,
\[
S_v[t] = f_{upd}\left(S_v[t-1],\, S_{u_1}[t],\, \ldots,\, S_{u_m}[t],\, S_{u_{m+1}}[t-1],\, \ldots,\, S_{u_k}[t-1]\right)
\]
where u1, u2, . . . , um are the neighbors of v that have already performed their computation in the current iteration, while um+1, um+2, . . . , uk are the neighbors that have not yet performed it. Asynchronous execution exhibits a very strong side effect: the behavior of the model is influenced by the order of the agents' executions [LM09]. So, in order to meet the reproducibility feature (see Section 2.3.2), it is extremely important to update the agents using a deterministic strategy. Indeed, the asynchronous approach, performed on a sequential environment, adds unnecessary dependencies that harm the efficiency of the system.
Although there are several approaches, like double buffering, which enable implementing the synchronous approach even in a sequential environment, it is worth mentioning that these strategies require extra memory and a per-agent coding effort from the modeler.
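A minimal sketch of the double-buffering idea, assuming a toy agent whose state is a single integer (this is illustrative code, not D-MASON's implementation): every agent reads the value computed at the previous step while writing the new one into a shadow copy, and the two copies are swapped once all agents have stepped.

import java.util.List;

// Double buffering: reads see the previous step's state, writes go to a
// shadow copy, so the update is synchronous even in a sequential loop.
class BufferedAgent {
    private int current;  // state visible to neighbors (step t-1)
    private int next;     // state being computed (step t)

    int read() { return current; }

    void step(List<BufferedAgent> neighbors) {
        int max = current;
        for (BufferedAgent u : neighbors)
            max = Math.max(max, u.read());  // always reads step t-1 values
        next = max;
    }

    void swap() { current = next; }  // called once per step, for all agents
}

The scheduler steps every agent and only then swaps every buffer; the memory cost is twice the agent state, a cost that D-MASON's framework-level mechanism hides from the modeler.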
MCM Implementation Details. Clearly, the mechanism described above is quite hard to implement, especially for model designers with limited experience in object-oriented programming. In D-MASON this mechanism is easily affordable for every user, because D-MASON solves the problem at the framework level. A former implementation of the memory consistency mechanism was based on the Byte Code Generation Library (cglib [cgl16]) to inject, at runtime, the code needed to realize the above mechanism. cglib is a high-level API to generate and transform Java bytecode; it is used by Aspect-Oriented Programming, testing, and data-access frameworks to generate dynamic proxy objects and intercept field accesses. Unfortunately, this former implementation introduced a performance slowdown, mainly due to Java Reflection overhead (see Figure 2.14).
In order to better explain the coding effort needed to use the D-MASON memory consistency mechanism, in the following we provide a toy example. The state of each agent is represented by an integer. Agents wander in a geometric field. During each simulation step, each agent reads the state of all its neighbors (i.e., the agents within a certain distance) and updates its value as the maximum among its own state and the states of all its neighbors. The code in Listing 2.4 implements the above logic using the memory consistency mechanism; the same logic without MCM is shown in Listing 2.2.
First, the variables that compose the agent's state are declared; after that, the RemoteAgentStateMethodHandler object is declared to consistently access the agent's state (see Listing 2.4). Notice that, in the step method, the access to the state variable val is always performed using getState(. . .) and setState(. . .) of the RemoteAgentStateMethodHandler object. It is worth mentioning that, although the methods getVal and setVal are implemented in the standard way by the modeler, they will be used by the RemoteAgentStateMethodHandler to access the agents' state “consistently”.
public class DAgent extends RemoteDAgent<Double2D> {
    private int val;
    // Agent state definition
    static ArrayList<StateVariable> statevariables = new ArrayList<StateVariable>();
    static {
        statevariables.add(new StateVariable("val", int.class));
    }
    final static RemoteAgentStateMethodHandler memory =
        new RemoteAgentStateMethodHandler(DAgent.class, statevariables);
    // end agent state definition

    public DAgent() {}

    public DAgent(String id, Double2D location, Integer val) {
        this.id = id;
        // ...
    }

    @Override
    public void step(SimState state) {
        Bag b = getNeighbors((DistributedState) state);
        int max = val;
        for (Object f : b) {
            DAgent d = (DAgent) f;
            int dval = (int) memory.getState(
                (DistributedMultiSchedule) state.schedule, d, "val");
            if (max < dval) max = dval;
        }
        memory.setState((DistributedMultiSchedule) state.schedule, this, "val", max);
    }
    // ...

    public void setVal(DistributedMultiSchedule schedule, int val) {
        this.val = val;
    }
}
• Double Buffering (DB) uses the well-known double buffering strategy, implemented ad hoc in the agent.
• Java Reflection (Reflection) uses the Byte Code Generation Library (cglib [cgl16]) to inject, at runtime, code that changes the effect of the access methods, exploiting the Java Reflection Proxy to intercept all method calls.
• Method Handler uses the Method Handle mechanism of Java 8 (see the sketch after this list). Two kinds of binding have been considered:
  – Dynamic Lookup (MHDL), where the method handles for each access method are obtained dynamically;
  – Static Lookup (MHSL), where the method handles are bound in a static context, as shown in Listing 2.4.
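The following self-contained sketch (not D-MASON code; the Holder class and its field are invented for illustration) shows the java.lang.invoke API that the MHSL and MHDL variants rely on: a MethodHandle resolved once in a static initializer (static lookup) versus one resolved on every access (dynamic lookup).

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class MethodHandleSketch {
    public static class Holder {  // toy stand-in for an agent's state
        public int val;
    }

    // Static lookup: the handle is resolved once, when the class is loaded.
    static final MethodHandle GET_VAL;
    static {
        try {
            GET_VAL = MethodHandles.lookup().findGetter(Holder.class, "val", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws Throwable {
        Holder h = new Holder();
        h.val = 42;

        // Static lookup: reuse the cached handle (cheapest per access).
        int a = (int) GET_VAL.invokeExact(h);

        // Dynamic lookup: resolve the handle on every access (more overhead).
        MethodHandle dyn = MethodHandles.lookup().findGetter(Holder.class, "val", int.class);
        int b = (int) dyn.invokeExact(h);

        System.out.println(a + " " + b); // 42 42
    }
}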
Figure 2.14 depicts the performance obtained by the different memory consistency mechanisms.
Fig. 2.14.: Memory Consistency Performances: Simulation time obtained running a SIR
simulation using four different memory consistency implementations with
4, 16, 36, 64 and 144 LPs.
Field
The package Field, depicted in Figure 2.13, is the real core of D-MASON: indeed, this package defines the logic of the agents' distribution. It provides the classes common to all the fields and the distributed versions of several MASON fields. The new hierarchy of fields in Distributed MASON is based on a Java interface called DistributedField, which represents an abstraction of a distributed field (cell): all the fields that are meant to be distributed must implement this interface, exposing functionalities such as evaluating whether a given position belongs to the current cell or not, or generating a random position in the current cell. The interface also exposes a method synchro() that performs the local synchronization among cells and hence the update of the simulation (see Figure 2.9).
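A minimal sketch of the contract just described. This is an illustration only: apart from synchro() and getAvailableRandomLocation(), which appear in this chapter, the interface and method names are assumptions, not D-MASON's actual API.

// Illustrative abstraction of a distributed field (cell).
public interface DistributedFieldSketch<E> {
    boolean isMine(E position);      // does this position belong to the local cell?
    E getAvailableRandomLocation();  // random position inside the local cell
    boolean synchro();               // local synchronization with adjacent cells
}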
D-MASON provides a distributed version of almost all the MASON fields. For instance, the sub-packages grid and continuous provide two specializations of the 2D fields of MASON (SparseGrid2D, Continuous2D, IntGrid2D and DoubleGrid2D), two types of class factory to create the right field according to the type of partitioning (e.g., DSparseGrid2DXY and DContinuosGrid2DXY for uniform partitioning, and DSparseGridNonUniform and DContinuosGridNonUniform for non-uniform partitioning), and the corresponding abstract classes for the fields, DSparseGrid2D and DContinuousGrid2D.
As anticipated in Section 2.3.2, there are currently two kinds of field partitioning in D-MASON: uniform and non-uniform. Listing 2.1 shows the code of a toy simulation. Line 15 shows the code for instantiating a DContinuous2D field using the factory DContinuousGrid2DFactory, which instantiates a new distributed field using the given construction parameters. This example uses the method createDContinuous2D, which performs a uniform partitioning. On the other hand, the method createDContinuous2DNonUniform can be used to perform a non-uniform partitioning.
Why are networks important in ABM simulation? Networks are everywhere, and complex interactions between different entities play an important role in modeling real systems. D-MASON provides a distributed network field, named DNetwork (see Figure 2.15), based on METIS [KK98], a multilevel k-way graph partitioning suite developed in the Karypis lab at the University of Minnesota and evaluated for our specific purpose in [Ant+15].
DNetwork field: Usage. Listing 2.5 shows the code of the start method for a D-MASON simulation on the DNetwork field. For visualization purposes, the agents also lie on a Continuous2D field.
public DNetwork network;
public Continuous2D yard = new Continuous2D(1.0, 100, 100);

@Override
public void start()
{
    super.start();
    int commID = (TYPE.pos_i * rows) + TYPE.pos_j;
    // ...
public interface Connection {
    public boolean setupConnection(Address providerAddr) throws Exception;
    // ...
    public boolean publishToTopic(Serializable object, String topicName, String key) throws Exception;
    // ...
    public boolean asynchronousReceive(String key);
    // ...
Considering the good results obtained by the first version of D-MASON, which was mainly devoted to heterogeneous clusters of workstations with a limited number of LPs, the focus moved to dedicated installations, such as massively parallel machines or supercomputing centers. These platforms usually offer a large number of homogeneous machines which, on the one hand, simplify the issue of balancing the load among LPs but, on the other hand, provide a considerable computational power that weakens the efficiency of the centralized communication server. Indeed, centralized solutions cannot scale both in terms of the growth of computing power, which affects the amount of communication, and in terms of the number of LPs, which affects the number of communication channels. For these reasons, a novel decentralized communication mechanism, which realizes the Publish/Subscribe paradigm through a layer based on the MPI standard, was implemented in D-MASON [Cor+14b].
The current version of D-MASON, as anticipated in Section 2.3.2, provides two
kinds of communication specializations. A centralized communication, used for
general purpose architectures (heterogeneous computing or cloud computing),
exploits the Java Message Service standard, exposed by the ConnectionJMS interface.
The functionalities of the layer for homogeneous computing were tested on the
OpenMPI [Ope16c] implementation. Using MPI, the overall communication is
completely decentralized. Moreover, when the system requires some system
management functionalities, D-MASON communication is performed using a hybrid
approach: the communication among LPs is handled by the MPI infrastructure (in
order to achieve scalability) while the management messages, being asynchronous,
operate through the ActiveMQ server. Summarizing, the following connection types
are available:
1. pureActiveMQ: uses Apache ActiveMQ as message broker for both the management
   and the LPs' synchronization messages;
2. pureMPI: uses only the MPI layer and can be used only when there are no
   management services (like centralized visualization). Three different MPI
   communication mechanisms are available [Cor+14b; Cor+14a]:
   • pureMPIBcast, exploits a broadcasting mechanism between the LP groups
     (i.e., LPs managing adjacent cells);
   • pureMPIGather, using the MPI_Gather function, an LP is able to get, in a
     single step, all the information needed from its neighborhood. This
     strategy decreases the number of communication rounds but increases the
     message size;
   • pureMPIParallel, avoids collective communication routines in order to
     increase the degree of parallelism during the synchronization (see the
     Parallel strategy described later in this Section).
public final class ConnectionType implements Serializable {

    public static final int unitTestJMS = -5;

    public static final int pureMPIBcast    = 1;
    public static final int pureMPIGather   = 2;
    public static final int pureMPIParallel = 3;

    public static final int pureActiveMQ = 0;

    public static final int hybridActiveMQMPIBcast    = -1;
    public static final int hybridActiveMQMPIGather   = -2;
    public static final int hybridActiveMQMPIParallel = -3;

    private static int ConnectionType = pureActiveMQ;
}
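For illustration, a worker could select the layer by combining the constants
above; this fragment is only a sketch and is not taken from the D-MASON code base.

    // Illustrative selection of the communication layer: MPI-only when no
    // centralized management service is needed, hybrid otherwise.
    boolean needsManagementServices = false;  // e.g., no centralized visualization
    int connection = needsManagementServices
            ? ConnectionType.hybridActiveMQMPIParallel  // LP traffic on MPI, management on ActiveMQ
            : ConnectionType.pureMPIParallel;           // fully decentralized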
The design strategy that enables the use of MPI in D-MASON is based on the concept
of synchronization time, that is, the time required to exchange all the
synchronization messages at the end of a simulation step. The system associates an
MPI process to each D-MASON LP and uses MPI communication groups to define the
communication among cells. The synchronization time is partitioned into
communication rounds, each of which is dedicated to a different communication group.
MPI. MPI is a library specification for message-passing, designed for high
performance on both massively parallel machines and workstation clusters. MPI has
become a de facto standard for communication in high performance computing.
The need for thread safety. Since the architecture of D-MASON uses a thread for
each operation (send or receive) on the same cell, it is important to consider
thread safety when choosing the MPI implementation. Before MPI-2 there was no way
for users to write user-level threads, in MPI programs, able to make calls to MPI.
The MPI-2 standard has clearly defined the interaction between MPI and user-level
threads in MPI programs. There are four levels of thread safety supported by the
standard (MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED and
MPI_THREAD_MULTIPLE), which a user must explicitly select according to his/her
necessities [GT07].
In the literature there are several Java implementations of the MPI standard
[Wei+00; Haf+11; Bak+06]. MPJ Express is an open-source library for message
passing compliant with MPI-1.1, offering the same thread-safety guarantees as
MPI-2; a comparison between MPJ Express and other MPI implementations is reported
in [exp13]. The Open MPI Project [Ope13] is an open-source MPI-2 standard
implementation that is developed and maintained by a consortium of academic,
research and industry partners, combining expertise, technologies and resources
from all across the HPC community with the aim of offering the best MPI library
available. The prerelease 1.9a1r27886 of Open MPI and the current trunk on Open
MPI's SVN [Ope13] offer mpiJava, a Java binding of the MPI-1.1 standard [Bak+99;
Car+99].
Significant research has been carried out in the past for improving collective
communication using parallel algorithms based on the message size and optimized
for the specific platform. As an example, the MPI_Bcast function uses binomial
tree algorithms for small messages and, on shared memory systems, employs
pipelining to improve buffer utilization [Mam+].
Two states of the computation are present in D-MASON in which all the processes
perform the same action: the connection with the Publish/Subscribe server, because
after this phase all workers are connected to the network and the master can see
them and distribute the computation; and the end of each simulation step, because
at this point of the simulation each cell has to receive and send updates to start
the next step. The purpose of the strategy is to create an MPI infrastructure that
abstracts the Publish/Subscribe pattern used in D-MASON, ensuring the decoupling
between topics and MPI processes and the scalability of the number of topics, by
exploiting the collective communication primitives offered by the MPI standard.
The communication layer using MPI is based on the following mapping: each cell is
associated with an MPI process and a topic is represented by an MPI Group. Hence,
in order to create a topic, an MPI process creates an MPI Group, and a
subscription corresponds to an MPI Group join.
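A minimal sketch of this mapping, written against the mpiJava binding mentioned
above, could look as follows; the subscriber list is a made-up example.

    import mpi.*;

    // Topic = MPI Group, subscription = inclusion in the group,
    // publication = collective call on the group's communicator.
    public class TopicAsGroup {
        public static void main(String[] args) throws MPIException {
            MPI.Init(args);
            int[] subscribers = {0, 1, 2};          // ranks subscribed to the topic
            Group world = MPI.COMM_WORLD.Group();
            Group topic = world.Incl(subscribers);  // create the topic group
            Intracomm comm = MPI.COMM_WORLD.Create(topic);
            if (topic.Rank() != MPI.UNDEFINED) {    // only group members take part
                byte[] update = new byte[1024];     // serialized updates
                comm.Bcast(update, 0, update.length, MPI.BYTE, 0); // rank 0 publishes
            }
            MPI.Finalize();
        }
    }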
The main difference between the centralized communication and the MPI one is the
synchronization. In the centralized communication, the synchronization is
implemented at the framework level using a data structure that indexes the
updates for each step and acts as a barrier, so that each cell remains locked
until it receives all the required updates. On the other hand, the MPI strategy
takes advantage of the intrinsic synchronization of MPI, because the collective
communication primitives are blocking.
MPI_Bcast. In the first implementation, the synchronization time is divided into
minislots, one for each publisher of the topic i. In the j-th minislot, one
publisher process invokes MPI_Bcast to send updates to subscribers, while the
other processes, subscribed to topic i, invoke MPI_Bcast to receive updates from
the topic i.
MPI_Gather. By observing that, during each simulation step, each cell needs to
collect information from its neighborhood, the second implementation uses the
MPI_Gather function in order to gather all the messages needed in a single time
step. This decreases the number of communication phases while increasing the
message size. In this solution, a communication phase is divided into n
time-slots, where n is the number of MPI processes (cf. Figure 2.18). During the
i-th phase, the process i invokes the MPI_Gather function in order to receive the
updates from its whole neighborhood.
• Union Groups. For each i = 1, 2, . . . , n, the Union Group i contains all the
  MPI processes present in the MPI Groups of the topics to which the cell i is
  subscribed;
• Communicator Groups. The Communicator Group i is the communicator of the
  Union Group i.
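Continuing the sketch above, the Union Group of a cell could be built with the
Group.Union operation of the mpiJava binding; the topics array is assumed to hold
the MPI Groups of the topics the cell subscribes to.

    // Union Group i: union of the MPI Groups of all subscribed topics.
    static Intracomm unionCommunicator(Group[] topics) throws MPIException {
        Group union = topics[0];
        for (int t = 1; t < topics.length; t++)
            union = Group.Union(union, topics[t]);
        // Communicator Group i: the communicator spanning the Union Group.
        return MPI.COMM_WORLD.Create(union);
    }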
This explains why the first two strategies use the native Java serialization of
the objects, and send and receive arrays of bytes. The problem is that the
serialization/deserialization of objects can be computationally expensive; for
this reason an ad-hoc strategy, named Parallel, was developed without using any
collective communication routines, in order to increase the degree of parallelism
during the communication.
The novel strategy is depicted in Figure 2.19. The problem to be addressed is the
following: at the end of each step there are c communications to be executed
(each communication is represented by a pair ⟨sender, receiver⟩) during the
synchronization phase. Each process can invoke one function (send or receive) at
a time, so while it is receiving it cannot send anything.
Clearly the goal is to minimize the number of rounds. If one maps this problem on
a graph where there is a node for each process and a directed edge from process q
to process p if, during the synchronization phase, q has to send a message to p,
the problem above becomes the well-known edge coloring problem, which is known to
be NP-hard [MG92]. An edge coloring of a graph is an assignment of colors (rounds
in our case) to the edges of the graph so that no two adjacent edges have the same
color, using the minimum number of colors. The number of colors needed to edge
color a simple graph is at least ∆, where ∆ is the maximum degree of the graph.
Edge coloring has applications in scheduling problems and in frequency assignment
for fiber optic networks. In the literature there are several greedy strategies
that construct colorings using at most ∆ + 1 colors. The Parallel strategy uses
the following simple randomized strategy. Let E be the set of edges and C be the
set of colors (rounds). Consider a random permutation of the edges in E. For each
uncolored edge e ∈ E, check whether there is a color in C which, once assigned to
e, avoids conflicts. If no color is available, the strategy generates a new color
for e and adds it to C. This greedy algorithm runs, in the worst case, in O(|E|²)
time and uses ∆ + 1 colors on average.
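The following Java fragment is a compact sketch of this randomized greedy
coloring; it illustrates the algorithm described above and is not the actual
D-MASON code.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    // Colors correspond to communication rounds; adjacent edges
    // (communications sharing a process) must get different rounds.
    final class GreedyEdgeColoring {

        // edges[i] = {sender, receiver}; returns the round assigned to each edge.
        static int[] color(int[][] edges) {
            List<Integer> order = new ArrayList<>();
            for (int i = 0; i < edges.length; i++) order.add(i);
            Collections.shuffle(order);              // random permutation of E

            int[] round = new int[edges.length];
            Arrays.fill(round, -1);                  // -1 = uncolored
            int palette = 0;                         // |C|, grows on demand

            for (int e : order) {
                boolean[] used = new boolean[palette + 1];
                for (int f = 0; f < edges.length; f++)
                    if (round[f] >= 0 && adjacent(edges[e], edges[f]))
                        used[round[f]] = true;       // colors taken by adjacent edges
                int c = 0;
                while (c < palette && used[c]) c++;  // first conflict-free color in C
                if (c == palette) palette++;         // no color available: create one
                round[e] = c;
            }
            return round;                            // O(|E|^2) in the worst case
        }

        private static boolean adjacent(int[] a, int[] b) {
            return a[0] == b[0] || a[0] == b[1] || a[1] == b[0] || a[1] == b[1];
        }
    }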
Flockers is a model introduced in 1987 by Craig Reynolds [Rey87] that simulates
the behaviour of a flock. In this model "flockers" move according to three simple
rules: separation, they steer to avoid crowding local flockers; alignment, they
steer toward the average heading of local flockers; cohesion, they steer to move
toward the average position (center of mass) of local flockers.
Ant Foraging simulates the foraging process of ant colonies. At the start of the
simulation the ants are in the nest and no ant knows where the food source is.
Food discovery happens through an adaptive and iterative process: forager ants
randomly walk on several paths, laying pheromone trails along the visited paths,
which are used by future ants to evaluate the quality of a path; once they find
the food source, they come back to the nest reinforcing the good path with
pheromone trails, in a stigmergic fashion. This kind of simulation is
characterized by a non-uniform distribution of agents on the field: when the ants
find the food source, they will move only on the shortest path between the nest
and the food source.
Simulations were run using 4 LPs, for 100 simulation steps (Flockers) and until
the first ant reaches the food (Ants). Several configurations were considered,
varying the following parameters.
The size of the simulation field has been changed in a way that keeps constant
the field's density, defined as density = A/(w × h), where A is the number of
agents and w and h denote respectively the width and the height of the field. The
density has been chosen according to the original density of the sequential
version of the simulations available in MASON. For instance, on Flockers we start
using a 2D field of dimension 7,500 × 7,500 with 500,000 agents, up to a 2D field
of dimension 23,800 × 23,800 with 5,000,000 agents. The tests have been performed
on 4 machines configured as follows:
The following strategies were tested: JMS (centralized) and MPI (Bcast, Gather,
Parallel). Figure 2.21 depicts the comparison between the four strategies, JMS,
Bcast, Gather and Parallel, on Flockers with AOI = 10. The JMS performance is
placed on the X axis as a benchmark for the other strategies. In general, the MPI
strategies tend to perform better on massive simulations, up to a certain size
beyond which they show an additional load due to garbage collection and heavy
disk I/O bursts.
It is worth mentioning that in our small test scenario, with only 4 workers, the
benefits of the Parallel solution with respect to the other solutions do not
appear yet. On the other hand, in bigger simulations with a high number of cells
(at least 16), significant performance enhancements are expected.
Indeed, using n MPI processes with a neighborhood size (degree) ∆, the randomized
algorithm uses at most ∆ + 1 communication rounds. In a 2 × 2 grid in which each
cell has 8 neighbours (topic subscriptions), there are 8 × 4 = 32 communication
rounds using Bcast and at most (8 + 1) × 2 = 18 communication rounds using
Parallel (we multiply by 2 because each edge of the communication graph is
bidirectional): the gain is slightly less than 50% of the rounds. In a 4 × 4 grid
in which each cell has 8 neighbors, there are 8 × 16 = 128 communication rounds
using Bcast and again at most (8 + 1) × 2 = 18 communication rounds using
Parallel, so the gain is greater than 85% of the rounds.
The first version of the D-MASON system management had two disadvantages. First,
it was not fully decoupled from the simulation part; hence, adding new features
often required complex interventions with a considerable waste of time. Moreover,
the system was designed for local interactions (that is, assuming that both the
simulation management and the LPs run within the same local network).
Architecture. The novel web server components have been encapsulated into the
D-MASON Master application, and are available in the /resources/systemmanagement
folder and in the experimentals.systemmanagement package, which now comprises two
communication components: the Jetty web server and the ActiveMQ JMS broker.
When the user starts the Master application, both the ActiveMQ and the Jetty
server run on the host. In particular, the Jetty server is reachable on a TCP
port (8080 by default) and the user can access the management console via
browser. Using this approach the user can manage and monitor his/her simulations,
provided that the port 8080 of the Master node is reachable from the Internet.
It is worth mentioning that the load of the Jetty server does not harm the
performance of the system. This is especially true when the number of users is
small and the user interaction is limited: indeed, the load of the Jetty server
is only due to the activity of discovering and monitoring the LPs. In any case,
when this load increases (i.e., a huge number of users continuously interacting
with the master and/or a large number of LPs to be monitored), the master node
can be configured to use an external ActiveMQ communication server in order to
separate the communication and monitoring efforts.
The Web System Management provides four main views, selectable by a control
panel (see Figure 2.22):
1. Monitoring, enables the user to monitor the resources available on all
   connected workers (see Figure 2.23). Using such information, the user is able
   to choose appropriately the workers to be engaged for future simulations.
2. Simulations, enables the user to manage the running simulations; a logging
   mechanism has been implemented, and all the log files are available at
   run-time on the Simulation Info panel (see Figure 2.26 (right)).
3. History, enables the user to visualize the performed simulations (see Figure
   2.25). The history page also allows downloading the log files of the
   simulations.
4. Settings, enables the user to change the system configuration, for instance
   the IP address and port number of the JMS server.
Appendix C.2 describes other D-MASON features that help to execute D-MASON on a
cluster environment and/or a cloud infrastructure.
Graphics
The initial version of the visualization layer was a standalone Java application,
while in the latest version of D-MASON the visualization layer is included in the
Web System Management functionality, although the two layers are still
independent. The solutions adopted in the first implementation, described in the
next Section, have been similarly adopted in the new version.
The strategy allows changing the level of detail of the visualization, according
to the size of the cells, the level of detail needed, the network speed and the
computational power available on both the workers and the collector. Strategies
like image compression/decompression can be implemented in order to make the
system more efficient.
Zoom App. The compressed visualization provided by the Global Viewer is not
enough to ensure the usability and effectiveness requirements of D-MASON.
Assuming that it is not possible to view a detailed visualization of the whole
simulation field on a single node, D-MASON provides an additional mechanism to
view the detailed visualization of the agents of one cell at a time. The idea is
to enable the visualization of a single LP from the Global Viewer by clicking on
the corresponding space in the view, which activates the MASON visualization;
this, in turn, may be seen as a zoom operation.
The zoom application is synchronized with the simulation, due to the massive
number of agents to be visualized (for smaller simulations it is also possible to
execute the zoom in an asynchronous way). After the activation of the zoom for a
particular LP, the LP packs and sends its agents, at the end of each simulation
step, on a topic named GRAPHICS-ID_LP. The zoom application listens on this topic
and uses the MASON components to visualize the agents.
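For illustration, the subscription performed by the zoom application could look
like the following JMS fragment; the broker URL is a placeholder and D-MASON
wraps these calls in its own connection layer.

    import javax.jms.*;
    import org.apache.activemq.ActiveMQConnectionFactory;

    // Illustrative subscriber for the zoom topic of one LP.
    public class ZoomListener {
        public static void main(String[] args) throws JMSException {
            ConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://master:61616"); // placeholder URL
            Connection connection = factory.createConnection();
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic topic = session.createTopic("GRAPHICS-" + args[0]);    // GRAPHICS-ID_LP
            MessageConsumer consumer = session.createConsumer(topic);
            consumer.setMessageListener(message ->
                    // each message carries the serialized agents of one step;
                    // the real zoom app hands them to the MASON visualization
                    System.out.println("step update received: " + message));
        }
    }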
The latest version of the System Management Layer, described in Section 2.4.3,
embeds the global visualization. Figure 2.30 shows the global visualization from
the Visualization page of the Web System Management.
StarCluster is used to configure the network of the cluster, create user
accounts, enable password-less connections by sharing the SSH keys between the
cluster's nodes, and set up the NFS shares and the queuing system for the jobs.
StarCluster is also customizable via plug-ins, which enable users to configure
the cluster with their specific configuration. Plug-ins are written in Python,
exploiting the StarCluster API to interact with the nodes. The API supports the
execution of commands, the copying of files, and other OS-level operations on the
nodes. StarCluster also supports the use of Spot instances, allowing the user to
run on-demand experiments in an easy way and at affordable prices.
2.5.3 Architecture
D-MASON on the cloud has been realized with the purpose of providing a
SIMulation-as-a-Service (SIMaaS) environment. The architecture of the system is
depicted in Figure 2.31. D-MASON on the cloud is based on a modular approach,
which comprises three levels. The Infrastructure is given by Amazon EC2, which
provides a wide portfolio of instance types [Ama16] designed for different use
cases. Instance types vary by CPU performance, memory, storage (size and
performance), and networking capacity. The user is free to select an AWS cell
according to prices and availability of resources. Starting with a freely
available Amazon AMI (ami-52a0c53b), which includes a minimal software stack for
distributed and parallel computing [Sta16], an AMI specifically configured for
executing D-MASON on the cloud was realized. The D-MASON AMI, publicly available
on the Amazon infrastructure, also provides Java 8 and Maven. On top of that, we
developed a StarCluster plug-in, which exploits all the functionality provided by
the StarCluster API to deploy and configure D-MASON on the cluster nodes.

Fig. 2.31.: D-MASON on the Cloud: Architecture.
The master node runs the D-MASON Master application, the JMS message broker
(ActiveMQ) and the web system management server (see Section 2.4.3). The
other machines run the D-MASON Worker applications, which communicate using
the JMS message broker running on the Master node. Each D-MASON Worker
application provides a simulation slot for each core available on the machine. The
StarCluster D-MASON plug-in is freely available in the D-MASON GitHub source code
repository [Repce].
The D-MASON tier did not require any particular change: the engine of the system
is executed on the cloud environment and the management is performed through the
Web system management interface described in Section 2.4.3.
The performance of each test is measured in terms of the number of simulation
steps performed in a time span of 120 seconds. Since each configuration involves
some
1 D-MASON GitHub repository – Quad Tree Java implementation – https://goo.gl/9JAupK
The following describes the performance results in terms of weak and strong
scalability.
The weak scalability test explores the ability of the partitioning strategies to
scale using a fixed amount of computation for each logical processor (see Section
1.4.1). In order to do that, the number of agents is proportional to k (i.e.,
A = 28000 × k), while the dimensions of the field are set in order to keep the
agents' density constant for each weak scalability test (i.e., density =
(w × h)/A ≈ 100, where w and h denote the width and the height of the field).
Twenty configurations were tested, varying the value of k ∈ {4, 9, 16, 25, 36},
the partitioning strategy (Square and Tree) and the agents distribution (QU and
NU). Figure 2.32 depicts the weak scalability results as the number of simulation
steps performed in a time span of 120 seconds (Y-axis), for each value of
k ∈ {4, 9, 16, 25, 36} (X-axis) and for each partitioning strategy/agents
distribution pair (series).
The Tree strategy gives the best performance for both agent distributions.
Furthermore, we notice that the performance trends are affected by the
granularity of the decomposition, which impacts the communication overhead. Using
the Square strategy, the amount of communication overhead is proportional to k (8
communication channels for each cell); the Tree strategy generates more channels
(see Table 2.4.2). Hence, with small values of k, the gap is significant.
Increasing k, the improvement tends to decrease as the communication overhead
increases, especially using a centralized communication approach.
The strong scalability test explores the ability of the partitioning strategies
to scale using a fixed amount of computation (see Section 1.4.1). In our case the
amount of computation consists of 1M agents moving on a 2-dimensional field of
size 10000 × 10000. Figure 2.33 depicts the strong scalability results as the
number of simulation steps performed in a time span of 120 seconds (Y-axis), for
each value of k ∈ {4, 9, 16, 25, 36} (X-axis) and for each partitioning
strategy/agents distribution pair (series).
The Tree strategy gives the best performance for both agent distributions. The
improvement ranges from ×2 for the QU agents distribution to ×30 for the NU
agents distribution. The figure also shows that the Tree strategy is able to
counterbalance the non-uniform agents distribution: indeed, especially for k = 16
and k = 25, the performance of the Tree strategy is not affected by the agents
distribution.
Setting and goals of the Experiments. These tests were performed on a cluster of
eight nodes, each equipped as follows:
• CPUs: 2 x Intel(R) Xeon(R) CPU E5-2680 @ 2.70GHz (16 cores, 32 threads)
• RAM: 256 GB
Considering the high computational power of each node, the tests were able to run
several (up to 90) LPs on each node. Simulations have been conducted on a
scenario consisting of seven machines for computation and one for managing the
simulations and running the ActiveMQ server when needed. As in the previous
tests, the Flockers simulation was used (see Section 2.4.2). Among the three
decentralized strategies discussed and preliminarily evaluated in Section 2.4.2,
we decided to analyze only the Parallel MPI strategy, which has been shown to be
extremely efficient on simulations involving a large number of LPs.
Communication scalability test. For this experiment, the field size (10,000 ×
10,000), the number of agents (1 million) and the AOI (10) were fixed. This
experiment is defined by 16 test settings, each characterized by the field
partitioning configuration (number of rows and columns), which also determines
the number of Logical Processes (number of LPs = [R]ows × [C]olumns), and the
communication scheme (decentralized or centralized). A pair (P, S) identifies
each test setting, where
Figure 2.34 presents the results. The X-axis indicates the value of P (from left
to right the number of LPs increases), while the Y-axis indicates the overall
execution time. When the number of LPs is small, the advantage of the
decentralized communication does not appear, because the message broker is quite
efficient compared to the coarse-grained synchronization requirements of the
decentralized approach. By increasing the number of LPs, the efficiency of the
centralized message broker drops dramatically and the simulation performance
exhibits the benefits of using the decentralized communication. This trend is due
to the fact that, by increasing the number of LPs, there are many more messages
in the system, and the effort needed for the synchronization mechanism of the
decentralized communication approach is hidden by the time taken by the message
broker to deliver all the messages.
Computation scalability test. For this experiment, the density of the field (100)
and the AOI range (10) were fixed. This experiment is defined by 72 test
settings, each characterized by the field partitioning configuration, the
communication scheme and the number of agents. A triple (P, S, A) identifies each
test setting, where
Figure 2.35 presents the results. The X-axis indicates the number of agents A,
while the Y-axis indicates the overall execution time in seconds. The test starts
with a field size of 10,000 × 10,000 and one million agents; these values are
scaled up proportionally in such a way to keep a fixed density along the overall
test.
Figure 2.35 shows that, for each field configuration, the decentralized approach
performs better than the centralized one up to a certain number of agents (i.e.,
64M for the 10 × 10 configuration), that is, when the computational requirements
are significantly higher than the communication ones. However, the figure also
shows that, when this is the case, the system deserves a finer field
partitioning: indeed, better performance is obtained by increasing the number of
LPs (i.e., moving from 10 × 10 to 15 × 15).
Moreover, increasing the number of LPs requires more communication, which
increases the communication/computation ratio and consequently shifts the
"cross-point" (1024M for the 15 × 15 configuration). It is worth mentioning that
in the last field configuration (20 × 20) the cross-point has not been reached,
because the centralized server was not able to manage the communication generated
by more than 2048M agents.
This test aims to evaluate the weak scalability of D-MASON varying the number of
LPs. The amount of computation for each LP consists of around 90,000 agents.
Several tests were performed, varying the number of LPs from 4 (360,000 agents)
up to 225 (20M agents).
Figure 2.36 presents the results. The X-axis indicates the number of logical
processors, while the Y-axis indicates the number of steps performed within a
time span of 15 minutes. The system scales pretty well: the overall performance
degrades gracefully. Moreover, with this configuration the centralized approach
seems to perform better than the decentralized one.
Strong Scalability
This test aims to evaluate the strong scalability of D-MASON fixing the overall
amount of computation (20M agents) and varying the number of LPs (from 4 to
225).
Figure 2.37 presents the results. The X-axis indicates the number of logical
processors, while the Y-axis indicates the number of steps performed within a
time span of 15 minutes. The speedup provided is always better than 50% of the
ideal speedup. Again, with this configuration the centralized approach seems to
perform better than the decentralized one.
This test aims to evaluate the limits of D-MASON on a particular cluster machine.
The test is similar to the weak scalability test described above, but here the
amount of computation for each LP consists of 450,000 agents. Tests were
performed varying the number of LPs from 4 (1,800,000 agents) up to 225 (about
100M agents). The results depicted in Figure 2.38 show the limits of the
centralized approach: the two communication strategies provide similar trends up
to 125 LPs, after which the performance of the centralized communication drops
significantly.
All the tests have been performed on Flockers. Boids/agents have been simulated
on a 2D geometric field of size 6400 × 6400. For each test we executed a
reproducible simulation with 1M agents for 15 minutes; at the end of the
simulation, the number of simulation steps performed was collected. The web-based
system management described in Section 2.4.3 was used to start and stop the
simulations and to collect the log files.
c3.large, Intel Xeon E5-2680 v2 (Ivy Bridge) processor with 2 vCPUs, 3.75GB of
memory and 2 x 16GB SSD storage (cost $0.105/h, or $0.019/h for spot at the low
price range);
c3.xlarge, Intel Xeon E5-2680 v2 (Ivy Bridge) processor with 4 vCPUs, 7.5GB of
memory and 2 x 40GB SSD storage (cost $0.210/h, or $0.039/h for spot at the low
price range).
Tab. 2.2.: Cost calculation for in-house hosting of a single server with 8
dual-core Xeon processors.
Two different HPC configurations were considered: in the former, named HPC1, all
the LPs are executed on a single node, while in the latter, named HPC∗, exactly 2
LPs are executed on each machine. Hence, in this last configuration the system
uses up to 10 nodes.
Four instance types were tested (c3.large, c3.xlarge, HPC1, HPC∗) with 5
partitioning configurations (20 tests overall). Notice that all the tests have
been executed on a reproducible deterministic simulation using the same JVM
(version 1.8.0_72). Each test was executed 10 times. The results are compared
using the mean number of simulation steps performed (we observed a minimal
variance in the cloud instance results, while on the HPC instances the variance
was negligible). Results about performance and costs are reported in Table 2.3.
Analyzing the results from Table 2.3, D-MASON on the cloud provides a good degree
of scalability at very affordable prices. The HPC∗ instance provides the best
performance. This result was expected, and we believe that it is mainly due to
the quality of the dedicated interconnection network. It should be highlighted,
however, that the HPC∗ configuration is considerably more expensive. On the other
hand, the cloud instances are much cheaper than the HPC ones. Moreover, both
cloud instances scale better than HPC1, which has comparable costs. Finally, in
order to measure the trade-off between performance and cost, we computed the cost
(per step) of each test setting (see the last column of Table 2.3). The results
show that the cloud instances are much cheaper than the dedicated instances.
Figure 2.39 summarizes the results shown in Table 2.3: the number of LPs appears
on the X-axis, the number of steps performed in 15 min (avg) appears along the
Y-axis and the instance configurations are reported as series. The radii of the
bubbles are proportional to the cost (per step).
— Enrico Clementi
This chapter discusses the design and the development of a new Swift/T feature:
the ability to invoke Java Virtual Machine (JVM) based interpreted languages,
like Clojure, Groovy, JavaScript, Scala, etc. This feature is becoming more and
more attractive from the Computational Science point of view, due to the high
number of open-source scientific programming libraries. Furthermore, many vendors
of supercomputing systems, such as Cray Inc., provide in their systems the
ability to execute JVM based languages.
The next sections summarize the syntax and basic semantics of the Swift/T
language.
3.2.1 Syntax
The Swift language uses C-like syntax and conventional data types such as int,
float, and string. It also has typical control constructs such as if, for, and
foreach. Swift code can be encapsulated into functions, which can be called
recursively. As shown in Listing 3.1, Swift can perform typical arithmetic and
string processing tasks quite naturally. Swift also has a file type, which allows
dataflow processing on files.
add(int v1, int v2) {
    printf("v1+v2=%i", v1+v2);
}
int x1 = 2;
int x2 = toint("2");
add(x1, x2);
app (file o) gcc(file c, string optz) {
    "gcc" "-c" "-o" o optz c;
}
app (file x) F(file o) {
    "gcc" "-o" x o;
}
file c = input("f.c");
An example use of Swift for shell tasks is shown in Listing 3.2. This example
demonstrates a fragment of a software build mechanism: the user defines two app
functions, which respectively compile and link a C language file. Swift app
functions differ from other Swift functions because they operate primarily on
variables of type file.
Other forms of external execution in Swift/T allow the user to call into native code
(C/C++/Fortran) directly by constructing a package with SWIG. Such libraries can
be assembled with dynamic or static linking. In the static case, the Swift script and
the native code libraries are bundled into a single executable, with minimal system
dependencies (for the most efficient loading on a large-scale machine).
3.2.3 Concurrency
int X = 100, Y = 100;
int A[][];
int B[];
foreach x in [0:X-1] {
    foreach y in [0:Y-1] {
        if (check(x, y)) {
            A[x][y] = g(f(x), f(y));
        } else {
            A[x][y] = 0;
        }
    }
    B[x] = sum(A[x]);
}
that are applied to the leaf function invocation. A generic annotation takes the
form @key=value, where the key and the value denote the annotation type.
Task locations. Task locations allow the developer to specify the location of
task execution in the system. Locations are optional; by default, Swift/T places
the next task in a location determined by the task load balancer. Locations can
be used to direct computation to a part of the system for multiple reasons
[Dur+16]. In a data-intensive application, tasks can be sent to the location
containing the data to be processed. In a workflow with resident tasks [Ozi+15],
certain processes retain state from task to task, and can be queried by sending a
task to that process.
foreach i in [0:9] {
    @par=i simulate(i);
}
Python Support
Python is a widely used high-level programming language, adopted in several
scientific fields. Over the years the Python community has developed several
libraries as well as code fragments for a wide range of problems.
export PYTHONPATH=$PWD
swift-t -p python-f.swift
For this reason many users ask for tools that enable access to Python from the
top level of the scientific workflow, and optionally call down from the
interpreted level to native code. A basic example of Python usage from Swift/T is
shown in Listings 3.7, 3.8 and 3.6. In these three code fragments, a short module
is defined in F.py (Listing 3.7), which provides an addition function named f().
def f(x, y):
    return str(x+y)
A call to this function from Swift/T is shown in python-f.swift (Listing 3.8),
lines 3-5. The string containing the Python code is populated with the Pythonic
% operator, which fills in the values of x and y at the conversion specifiers %i.
The Python function F.f() receives these values, adds them, and returns the
result as a string. Swift/T receives the result in z and reports it with the
Swift/T builtin trace() function.
1 import python;
2 x = 2; y = 3;
3 z = python("import F",
4            "F.f(%i,%i)"
5            % (x, y));
6 trace(z);
Thus, data can easily be passed to and from Python with Pythonic conventions;
only string formatting is required. To execute, the user simply sets PYTHONPATH
(Listing 3.6), so that the Python interpreter can find module F, and runs swift-t.
R Support
The R support in Swift/T is similar to the Python support. An example use case is
shown in Listing 3.9. This script runs a collection of simulations in parallel,
then sends the result values to R for statistical processing. The first section
(lines 1-4) simply imports the requisite Swift packages. The second section
(lines 6-10) defines the external simulation program, which is implemented as a
call to the bash shell random number generator, seeded with the simulation number
i; the output goes to a temporary file o. The third section (lines 11-15) calls
the simulation a number of times, reading the output number from disk and storing
it in the array results. The fourth section (lines 16-19) computes the mean of
the results via R: it joins the results into a comma-separated string, builds the
R expression that computes the mean, and prints the result.
 1 import io;
 2 import string;
 3 import files;
 4 import R;
 5
 6 app (file o) simulation(int i) {
 7   "bash" "-c"
 8   ("RANDOM=%i; echo $RANDOM" % i)
 9   @stdout=o;
10 }
11 string results[];
12 foreach i in [0:9] {
13   f = simulation(i);
14   results[i] = read(f);
15 }
16 A = join(results, ",");
17 code = "m = mean(c(%s))" % A;
18 mean = R(code, "toString(m)");
19 printf(mean);
The C-JVM engine currently supports all the above languages; the support for
other languages and frameworks, like the Apache Spark library, is ongoing work.
The architecture of the C-JVM engine follows.
The C-JVM-c uses the Java Native Interface (JNI) API to initialize a new JVM and
invoke Java code that evaluates a string, containing the code of an interpreted
language, by means of the C-JVM-j. JNI is a programming framework that enables
Java code to call, and to be called by, native applications written in languages
such as C, C++ and assembly. Listing 3.10 depicts the C code of the C-JVM-c
interface. The reader will observe that the exported functions are those for
evaluating a string of code written in each of the supported interpreted
languages.
#ifndef SWIFT_JVM_H_ /* Include guard */

#define SWIFT_JVM_H_

...

#endif //SWIFT_JVM_H_
The code in Listing 3.11 shows the implementation of C-JVM-c. The C code uses JNI
to call static Java methods provided by C-JVM-j; these methods enable the
evaluation of strings of code. Two types of evaluation are supported: one is used
when the code is supposed to produce an output (as a string) and one when no
output is expected. For instance, lines 40−46 evaluate a string of Groovy code
that is supposed to produce an output. The first step, shown in line 42, is the
JVM initialization. Then, two C-JVM-j methods are invoked: the first sets the
engine of the JVM interpreter to Groovy, while the second invokes the method
eval, which returns the output string given by the evaluation of the Groovy code.
 6 /* -- Macro Definitions */
 7 /* hide */
 8
 9 JNIEnv *env;
10 JavaVM *jvm;
11
18 char *createClassPathString(char *jars_dir)
19 {
20   /* hide */
27 char *call_java_static_char_method(char *java_class_name, char *method_name, char *sengine, char *scode)
28 {
29   /* hide */
30 }
31
32 static int init_jvm() {
33   /* hide */
34 }
35 void destroy_jvm()
36 {
37   /* hide */
38 }
39 /* Evaluate Groovy code and return a char array of the stdio */
40 char *groovy(char *code)
41 {
42   if (jvm == NULL) init_jvm();
43   call_java_static_method("it/isislab/swift/interfaces/SwiftJVMScriptingEngine", "setEngine", "groovy");
44   char *tor = call_java_static_char_method("it/isislab/swift/interfaces/SwiftJVMScriptingEngine", "eval", "groovy", code);
45   return tor;
46 }
47 /* Evaluate Clojure code and return a char array of the stdio */
48 char *clojure(char *code)
49 {
50   if (jvm == NULL) init_jvm();
51   char *tor = call_java_static_char_method("it/isislab/swift/interfaces/SwiftJVMScriptingEngine", "eval", "clojure", code);
52   return tor;
53 }
54 /* Evaluate Scala code and return a char array of the stdio */
55 char *scala(char *code)
56 {
57   if (jvm == NULL) init_jvm();
58   char *tor = call_java_static_char_method("it/isislab/swift/interfaces/SwiftJVMScriptingEngine", "eval", "scala", code);
59   return tor;
60 }
61 /* Evaluate JavaScript code and return a char array of the stdio */
Listing 3.12 depicts the usage of the C-JVM-c library from C code for evaluating
a string of Groovy code.
The C-JVM-j is a Java library developed using Maven. C-JVM-j is composed of six
modules: swift-jvm-build, swift-clojure, swift-groovy, swift-scala,
swift-javascript and swift-interfaces. The swift-jvm-build module defines the
build of the library. The swift-*(language-name) modules define the Java classes
for evaluating the code of a particular interpreted language. Finally, the
swift-interfaces module is the heart of C-JVM-j, as it provides the external
library functionalities. In the following, a detailed description of the module
swift-interfaces is provided.
package it.isislab.swift.interfaces;
import javax.script.ScriptException;
 1 package it.isislab.swift.interfaces;
 2 import java.io.StringWriter;
 3 import javax.script.ScriptContext;
 4 import javax.script.ScriptEngine;
 5 import javax.script.ScriptEngineManager;
 6 import javax.script.ScriptException;
 7 import it.isislab.swift.scala.ScalaScriptEngine;
 8 import it.isislab.swiftlang.swfit_clojure.ClojureScriptEngine;
 9 public class SwiftJVMScriptingEngine {
10   public static ScriptEngine engine;
11   public static String engine_name;
12   public static void setEngine(String engine_name_given)
13   {
14     engine_name = engine_name_given;
15     try {
16       switch (engine_name) {
17         case SwiftJVMScriptingEngineNames.CLOJURE:
18           engine = new ClojureScriptEngine();
19           break;
20         case SwiftJVMScriptingEngineNames.GROOVY:
21           engine = new ScriptEngineManager().getEngineByName(SwiftJVMScriptingEngineNames.GROOVY);
22
23           break;
24         case SwiftJVMScriptingEngineNames.SCALA:
25           engine = new ScalaScriptEngine();
26           break;
27         case SwiftJVMScriptingEngineNames.JAVASCRIPT:
28           engine = new ScriptEngineManager().getEngineByName(SwiftJVMScriptingEngineNames.JAVASCRIPT);
29           break;
30         default:
31           break;
32       }
33     } catch (ScriptException e) {
...
57         default:
58           return null;
59       }
60     } catch (ScriptException e) {
61       e.printStackTrace();
62       return null;
63     }
64
65   }
66   public static String eval(String engine_name_given, String code)
67   {
68     Object output = null;
69     StringWriter writer;
70     ScriptContext context;
71     engine_name = engine_name_given;
72     try {
73       switch (engine_name) {
74         case SwiftJVMScriptingEngineNames.CLOJURE:
75           engine = new ClojureScriptEngine();
76           output = engine.eval(code, engine.getContext());
77           return output != null ? output.toString() : "";
78         case SwiftJVMScriptingEngineNames.GROOVY:
79           engine = new ScriptEngineManager().getEngineByName(SwiftJVMScriptingEngineNames.GROOVY);
80           writer = new StringWriter();
package it.isislab.swift.interfaces;
public class SwiftJVMScriptingEngineNames {
    static final String CLOJURE = "clojure";
    static final String GROOVY = "groovy";
    static final String SCALA = "scala";
    static final String JAVASCRIPT = "javascript";
}
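As an illustration, the swift-interfaces entry point can be driven from plain
Java with the same two calls the C back-end performs through JNI; the Groovy
expression is just an example.

    // Hypothetical direct use of the scripting engine from Java.
    public class EngineDemo {
        public static void main(String[] args) {
            SwiftJVMScriptingEngine.setEngine("groovy");
            String out = SwiftJVMScriptingEngine.eval("groovy",
                    "(1..4).sum().toString()");  // evaluates to "10"
            System.out.println(out);
        }
    }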
Building the C-JVM requires different tools: a Java Development Kit (JDK) version
greater than or equal to 1.7, Maven 3, gcc 4.2, autoconf (GNU Autoconf) 2.69 and
automake (GNU Automake) 1.14. The commands in Listing 3.16 build the C-JVM
(assuming that the script is executed from the root of the project).
./bootstrap
./configure
make
# change this with additional jar folder libraries
export SWIFT_JVM_USER_LIB=swift-jvm/swift-jvm-build/target/swift-jvm-build-0.0.1-bin/swift-jvm/classes/
# change this with JVM home
export LD_LIBRARY_PATH=/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server
#ifndef TCL_JVM_H
#define TCL_JVM_H
void tcl_jvm_init(Tcl_Interp *interp);
#endif
Listings 3.17 and 3.18 show the implementation of the C back-end that exploits
the C-JVM library (see Section 3.3.1) to evaluate code in Groovy, Clojure, Scala
and JavaScript. For instance, the function Clojure_Eval_Cmd (see Listing 3.18,
lines 14−31) evaluates a string of Clojure code: the C-JVM clojure function is
called (line 24) to evaluate the string code parameter (recovered at line 19).
Finally, at lines 110−113, the four functions clojure, groovy, scala and
javascript are exported to the Tcl environment (the C code is converted into a
Tcl package). Therefore, these functions can be used to define new Swift/T
functions, as shown in Listing 3.19. Listing 3.20 depicts the instructions to
build the Tcl module.
  1 #include "config.h"
  2 #if HAVE_JVM_SCRIPT==1
  3 #include "swift-t-jvm/src/swift-jvm.h"
  4 #endif
  5 #include <stdio.h>
  6 #include <tcl.h>
  7 #include <string.h>
  8 #include <list.h>
  9 #include "src/util/debug.h"
 10 #include "src/tcl/util.h"
 11 #include "tcl-jvm.h"
 12 #if HAVE_JVM_SCRIPT==1
 13 static int
 14 Clojure_Eval_Cmd(ClientData cdata, Tcl_Interp *interp,
 15                  int objc, Tcl_Obj *const objv[])
 16 {
 17   TCL_ARGS(3);
 18   // A chunk of Clojure code
 19   char *code = Tcl_GetString(objv[1]);
 20   // A chunk of Clojure code that returns a value
 21   char *expr = Tcl_GetString(objv[2]);
 22   clojure(code);
 23   // The string result from Clojure: default is empty string
 24   char *s = clojure(expr);
 25   TCL_CONDITION(s != NULL, "clojure code failed: %s", code);
 26   Tcl_Obj *result = Tcl_NewStringObj(s, strlen(s));
 27   if (strlen(s) > 0)
 28     free(s);
 29   Tcl_SetObjResult(interp, result);
 30   return TCL_OK;
 31 }
 32 static int
 33 Groovy_Eval_Cmd(ClientData cdata, Tcl_Interp *interp,
 34                 int objc, Tcl_Obj *const objv[])
 35 {
 36   TCL_ARGS(2);
 37   // A chunk of Groovy code:
 38   char *code = Tcl_GetString(objv[1]);
 39   // The string result from Groovy: default is empty string
 40   char *s = groovy(code);
 41   TCL_CONDITION(s != NULL, "groovy code failed: %s", code);
 42   Tcl_Obj *result = Tcl_NewStringObj(s, strlen(s));
 43   if (strlen(s) > 0)
 44     free(s);
 45   Tcl_SetObjResult(interp, result);
 46   return TCL_OK;
 47 }
 48 static int
 49 JavaScript_Eval_Cmd(ClientData cdata, Tcl_Interp *interp,
 50                     int objc, Tcl_Obj *const objv[])
...
105   return TCL_OK;
106 }
107 void
108 tcl_jvm_init(Tcl_Interp *interp)
109 {
110   COMMAND("clojure", Clojure_Eval_Cmd);
111   COMMAND("groovy", Groovy_Eval_Cmd);
112   COMMAND("javascript", JavaScript_Eval_Cmd);
113   COMMAND("scala", Scala_Eval_Cmd);
114 }
@dispatch=WORKER
(string output) clojure(string code, string expr)
"turbine" "0.1.0"
[ "set <<output>> [ jvm::clojure <<code>> <<expr>> ]" ];

@dispatch=WORKER
(string output) groovy(string code)
"turbine" "0.1.0"
[ "set <<output>> [ jvm::groovy <<code>> ]" ];

@dispatch=WORKER
(string output) javascript(string code) "turbine" "0.1.0"
[ "set <<output>> [ jvm::javascript <<code>> ]" ];

@dispatch=WORKER
(string output) scala(string code)
"turbine" "0.1.0"
[ "set <<output>> [ jvm::scala <<code>> ]" ];
# MODULE TCL-JVM
DIR := src/tcl/jvm
TCL_JVM_SRC := $(DIR)/tcl-jvm.c
...
# JVM scripting support: Disabled by default
HAVE_JVM_SCRIPT=0
USE_JVM_SCRIPT_HOME=0
AC_ARG_ENABLE(jvm-scripting,
  AS_HELP_STRING([--enable-jvm-scripting],
    [Enable calling JVM scripting languages]),
  [
    HAVE_JVM_SCRIPT=1
    USE_JVM_SCRIPT_HOME=swift-t-jvm
  ])
AC_ARG_WITH(jvm-scripting,
  AS_HELP_STRING([--with-jvm-scripting],
    [Use this JVM scripting plugin home directory]),
  [
    HAVE_JVM_SCRIPT=1
    USE_JVM_SCRIPT_HOME=${withval}
  ])
if (( ${HAVE_JVM_SCRIPT} ))
then
  AC_CHECK_FILE(${USE_JVM_SCRIPT_HOME}/src/swift-jvm.h, [],
    [AC_MSG_ERROR([Could not find JVM scripting header!])])
  AC_MSG_RESULT([JVM scripting enabled])
else
  AC_MSG_RESULT([JVM scripting disabled])
fi
...
# JVM HOME
AC_SUBST(JVMHOME, "/usr/lib/jvm/java-8-oracle")
AC_ARG_WITH([jvm-home],
  [AS_HELP_STRING([--with-jvm-home],
    [Set up the jvm home directory (default: /usr/lib/jvm/java-8-oracle)])],
  [AC_SUBST(JVMHOME, $withval)],
)
# JVM SWIFT-T LIBs
AC_SUBST(JVMLIB, $(pwd)"/swift-jvm/swift-jvm-build/target/swift-jvm-build-0.0.1-bin/swift-jvm/classes")
AC_ARG_WITH([swift-jvm-engine-lib],
  [AS_HELP_STRING([--with-swift-jvm-engine-lib],
    [Set up the swift jvm engine lib (default: classes)])],
  [AC_SUBST(JVMLIB, $withval)],
HAVE_JVM_SCRIPT = @HAVE_JVM_SCRIPT@
USE_JVM_SCRIPT_HOME = @USE_JVM_SCRIPT_HOME@
...
# LIBS: links to ADLB, c-utils, MPE, and MPI
...
ifeq ($(HAVE_JVM_SCRIPT),1)
  SWIFTTJVM_LIB = $(USE_JVM_SCRIPT_HOME)/src
  LIBS += -L$(SWIFTTJVM_LIB)/.libs -lswifttjvm
endif
...
### INCLUDES
...
include src/tcl/jvm/module.mk
...
TURBINE_SRC += $(JVM_SCRIPT_SRC)
TURBINE_SRC += $(TCL_JVM_SRC)
...
Listing 3.23 provides an example of code illustrating the usage of the JVM
interpreted languages support in Swift/T.
import jvm;
...
s4 = clojure("\"CLOJURE SETUP\"", "\"CLOJURE WORKS\"");
trace(s4);
— Paul G. Constantine
(Active Subspaces: Emerging Ideas for Dimension
Reduction in Parameter Studies, 2015)
4.1 Introduction
Complex system simulations are continuously gaining relevance in business and
academic fields as powerful experimental tools for research and management, in
particular for Computational Science. Simulations are mainly used to analyze
behaviours that are too complex to be studied analytically, or too
risky/expensive to be tested experimentally [Law07; TS04]. The representation of
such complex systems results in a mathematical model comprising several
parameters. Hence, there arises a need for tuning a simulation model, that is,
finding the optimal parameter values which maximize the effectiveness of the
model. Considering the multi-dimensionality of the parameter space, finding the
optimal parameter configuration is not an easy undertaking and requires extensive
computing power. Simulation Optimization (SO) [TS04; He+10] and Model Exploration
(ME) refer to the techniques studied for ascertaining the parameters of the model
that minimize (or maximize) given criteria (one or many), which can only be
computed by performing a simulation run. This work considers the SO process as a
general case of ME, where the ME is guided by some optimization algorithm.
This work is mainly focused on Agent-based models (ABMs), where the simulation is
based on a large set of independent agents, interacting with each other through
simple rules, generating a complex collective behaviour. What makes ABMs
particularly interesting is that they allow the reproduction of complex and
significant aspects of real phenomena by defining a small set of simple rules
regulating how agents interact in social structures and how information is spread
from agent to agent. ABMs have been successfully applied in several fields such
as biology, sociology, economics, military and infrastructures; for a review of
ABM applications see [MN05]. The computer science community has responded to the
need for tools and platforms that can help the development and testing of new
models in each specific field by providing libraries and frameworks that speed up
and make easier the tasks of developing and testing simulations. Some examples
are NetLogo [TW04], MASON [Luk+04; Luk+05] and Repast [Nor+07], described in
Section Agent-Based Simulation: State of Art.
It should be noted that although ABMs are governed by simple rules, the
interactions between agents generate network effects that lead to a high degree
of complex behaviour [MN05], where it is quite hard to discern any relation
between changes in the variables and changes in the resulting global behaviour.
In particular, the shape of the objective function is irregular: there are large
areas where changes in the parameters do not affect the final behaviour but, at
the same time, a small change, like the butterfly effect, may produce a
significant shift within a complex simulation [CH05]. To make matters even more
complicated, as a consequence of the stochastic character of the simulation, a
static surface does not even exist but has to be approximated by multiple
simulation runs [DK14; Law07].
In summary, complex simulations, and ABMs in particular, are powerful tools for
modeling aspects of real systems. On the other hand, due to the high
dimensionality of the search space, the heterogeneity of parameters, the
irregular shape and the stochastic nature of the objective evaluation function,
the tuning of such systems is extremely demanding from the computational point of
view. This raises the need for tools which exploit the computing power of
parallel systems to improve the effectiveness and the efficiency of SO
strategies. The crucial characteristics of such tools are: zero configuration,
ease of use, programmability and efficiency. Zero configuration and ease of use
are required because both the design and the use of SO strategies are performed
by domain experts who seldom are computer scientists and have limited knowledge
of managing modern parallel infrastructures. Programmability is mandatory because
different models usually require different SO strategies. Finally, the system
must be efficient in order to be able to exploit the computing power provided by
extreme scale architectures.
This Chapter discusses two frameworks for the SO process, designed primarily for
Cloud infrastructures and HPC systems, respectively.
The simulation optimization (SO) problem [CL10; TS04; Amm+11; Nel10] can be
presented as

    min_{x ∈ D} Γ(x),
Generally, the problem has a single objective (i.e., Γ(x) ∈ R); however,
multi-objective optimization problems (Γ(x) ∈ R^n) can also be considered. For
the remainder of the Chapter, single-objective optimization problems are
considered; however, the proposed methodology can easily be applied to
multi-objective optimization in a similar fashion, as described in [BM05]. The
stochastic nature of simulation means that the output of a simulation run is not
deterministic, and we calculate an expected value for it as E[Φ(x, ε)], where
Φ(x, ε) is the result of a stochastic simulation run on configuration x and a
random feed ε. Finally, we calculate Γ(x) = f(E[Φ(x, ε)]), where f(·) is a
function that evaluates the result of a simulation and calculates a single rank
value. For instance, in [CL10], the value of E[·] is estimated as the mean
result of r ≥ 1 simulation runs.
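Making this estimate explicit, under the definitions above the objective is
computed as

    \Gamma(x) = f\left(\mathrm{E}[\Phi(x,\epsilon)]\right),
    \qquad
    \mathrm{E}[\Phi(x,\epsilon)] \approx \frac{1}{r}\sum_{i=1}^{r}\Phi(x,\epsilon_i),

where ε_1, . . . , ε_r are independent random feeds.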
• Repast Simphony, [MN+13] is the Java based toolkit of the Repast Suite. Given
a parameter space as input, Repast Simphony’s batch run functionality can
divide that space into discrete sets of parameter values and execute simulations
over those discrete sets in parallel. The simulations can be run on a local
machine, on remote machines accessible through secure shell (ssh), in the
cloud (e.g., Amazon EC2) or on some combination of the three. Using an
InstanceRunner interface, Repast Simphony models can be launched by other
control applications such as a bash, Portable Batch System (PBS), or Swift
scripts. For example, [Ozi+14] describes how the InstanceRunner can be used to launch
Repast Simphony runs from Swift scripts.
None of the ABM toolkits on their own offer the capabilities or scope, in terms of
flexible, simple integration of external model exploration tools and performance on
massively parallel computing resources, that the SOF and EMEWS frameworks aim to
provide.
Model exploration libraries and frameworks. In the following, the existing model
exploration (ME) or in general simulation optimization (SO) libraries and frame-
works are briefly discussed. While most can be used as standalone ME tools, some of
these libraries can also be used as ME modules within the presented frameworks.
Most of the following software falls under the metaheuristics umbrella. For an
overview of metaheuristics see [Luk13], for reviews of more metaheuristics frame-
works see [Par+11] and for parallel metaheuristics frameworks see [Alb+13].
Hence this framework was designed around the assumption that SO processes
can be easily deployed using the MapReduce (MR) programming model.
Moreover, an SO process potentially requires several optimization loops in which a
large amount of data is generated. The amount of inputs and outputs generated in a
SO process, that must be managed in a distributed storage environment, is usually
quite large. The MapReduce paradigm and Apache Hadoop will be briefly described
in the following.
Hadoop defines a specification for the Map and Reduce functions: the developers
must provide the input/output specifications and the implementations of Map and
Reduce. The main components of a Hadoop MapReduce job are the following (a minimal
code sketch follows this list):
• Splitter: handles the single data source, providing input pairs (key/value) to
the mappers.
• Mapper: processes a key/value pair to generate a set of intermediate key/value
pairs.
• Combiner, also called "Local Reducer" (optional): it can help cut down the
amount of data exchanged between Mappers and Reducers.
• Partitioner, also called the "Shuffle Operation": it ensures that records with
the same key will be assigned to the same Reducer.
• Reducer: gathers the results of the computation and concludes the job,
outputting the new set of key/value pairs, typically stored on the HDFS.
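As a concrete illustration, below is a hedged Java sketch of a Hadoop job in the
spirit of SOF: each map call runs one simulation on an input parameter configuration,
and the reducer applies an evaluation to the collected outputs. The SimulationRunner
class is a hypothetical stand-in, not the actual SOF code.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class SimulationJob {

  // Hypothetical stand-in for the framework's simulation/evaluation logic.
  static class SimulationRunner {
    static String run(String configuration) { return "output:" + configuration; }
    static String evaluate(String output) { return "rating:" + output; }
  }

  // Each input record is one parameter configuration; the map phase runs
  // the corresponding simulation (Hadoop schedules map tasks in parallel).
  public static class SimulationMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text configuration, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(new Text("loop"),
                new Text(SimulationRunner.run(configuration.toString())));
    }
  }

  // The reduce phase gathers all simulation outputs of the loop and
  // applies the user-provided evaluation function to each of them.
  public static class EvaluationReducer
      extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> outputs, Context ctx)
        throws IOException, InterruptedException {
      for (Text output : outputs) {
        ctx.write(key, new Text(SimulationRunner.evaluate(output.toString())));
      }
    }
  }
}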
4.2.1 Architecture
This Section presents the Simulation exploration and Optimization Framework on the
cloud (SOF), a framework that allows us to run and collect results for two kinds of
optimization scenarios: parameter space exploration or model exploration (PSE or
ME) and simulation optimization (SO).
Figure 4.1 depicts the SOF work cycle, which comprises three phases: selection,
parallel simulations and evaluation. SOF provides a set of functionalities that
allows developers to construct their own simulation optimization strategy. The
framework was designed with these objectives in mind; the following notation is
used in the remainder of this Section:
• Θ, the parameter space;
• D ⊆ Θ, the feasible decision space;
• X ⊆ D, a set of configurations from the feasible decision space D,
X = {x1, x2, . . . : xi ∈ D};
• r, the number of simulation runs;
• Φ(x, ε), the result of a stochastic simulation run on configuration x and a
random seed ε;
• E[·], the expected result of a set of stochastic simulation runs;
• Y, the set of expected simulation results corresponding to the configurations
in X;
• t, the current optimization loop;
• Γ, the ranking values associated with the configurations in X.
INPUT: X, Φ(·, ·)
OUTPUT: Y
for each xi ∈ X parallel do
    for j ← 1 to r do
        Zj ← Φ(xi, εj)
    Yi ← E[Z1, Z2, . . . , Zr]
Y ← {Y1, Y2, . . .}
PSE Algorithm (PSE). The PSE or ME scenario describes a generic process of
simulation optimization where a fixed set of configurations X is executed and all
the corresponding expected results are collected.
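The parallel evaluation loop above can be rendered compactly; the following Java
sketch (with a hypothetical Simulation interface standing in for Φ) uses parallel
streams to evaluate each configuration r times and estimate its expected result as
the mean:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PseLoop {
  // Hypothetical hook standing in for one stochastic simulation run Φ(x, ε).
  interface Simulation { double run(double[] x, long seed); }

  static List<Double> evaluate(List<double[]> X, Simulation phi, int r) {
    return X.parallelStream()                   // "for each x_i parallel do"
        .map(x -> IntStream.range(0, r)         // r stochastic replications
            .mapToDouble(j -> phi.run(x, j))    // Z_j <- Φ(x_i, ε_j)
            .average().orElse(Double.NaN))      // Y_i = mean of Z_1..Z_r
        .collect(Collectors.toList());          // Y = {Y_1, Y_2, ...}
  }
}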
A modeler wishing to use an implementation of SO developed for SOF must provide
the following:
1. User Request. The user submits the Simulation Implementation, the Selection
Function and the Evaluation Function written using any language supported
by the cloud environment. Then s/he defines the Parameters Domain, the
Simulation Input, Output and Rating format in XML using the SOF XML
schema.
2. Selection. The system processes the request using the Selection Function and
generates a set of parameters according to the XML schema defined by the
user.
3. Spread. The generated XML inputs are dynamically assigned to the compu-
tational resources. We notice that our system delegates to the distributed
computing environment (Hadoop in our case) both scheduling and load bal-
ancing of tasks (simulations).
4. Collect. When all the simulation runs have terminated, the computation state is
synchronized and the outputs are collected according to the XML schema defined
by the user, through a set of messages exchanged between the computational
resources and the system.
5. Evaluation Phase. The system applies the evaluation function on the collected
outputs and generates the rating (again in the desired XML format).
After the evaluation phase, the system goes back to the selection phase which, also
using the evaluation results obtained during the preceding steps, generates a new
set of XML inputs. The selection function also includes a stopping rule which ends
the SO process.
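To make the contract of the Selection Function concrete, here is a hedged Java
sketch of a trivial random selection for a Fire-like scenario: a standalone program
invoked with the paths of the executed inputs, their ratings, and the destination of
the new input set; it writes the next set, or an empty one to trigger the stopping
rule. XML tags, file layout and loop counting are illustrative assumptions, not the
actual SOF schema.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;
import java.util.stream.Stream;

public class RandomSelection {
  public static void main(String[] args) throws IOException {
    Path executed = Paths.get(args[0]);  // input sets already executed
    // args[1] (the ratings path) would drive a real adaptive strategy
    Path newInput = Paths.get(args[2]);  // where to write the new set
    long loopsDone = 0;                  // count previous optimization loops
    if (Files.isDirectory(executed)) {
      try (Stream<Path> s = Files.list(executed)) { loopsDone = s.count(); }
    }
    StringBuilder xml = new StringBuilder("<inputs>\n");
    Random rnd = new Random();
    if (loopsDone < 10) {                // stopping rule: 10 loops
      for (int i = 0; i < 100; i++) {
        xml.append("  <input><density>").append(rnd.nextInt(100))
           .append("</density><seed>").append(i).append("</seed></input>\n");
      }
    }                                    // else: empty set ends the SO process
    xml.append("</inputs>\n");
    Files.write(newInput, xml.toString().getBytes());
  }
}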
During the spread phases, the framework executes a large number of simulations
in order to achieve the results of a PSE or SO scenario. The challenge is: "How
to elaborate a large number of inputs, on a distributed system, in order to ensure
fault tolerance and good performance, even for different SO processes running
concurrently?"
Apache Hadoop, briefly described in Section 4.2, provides some tools for managing
MapReduce applications and the HDFS File System. It also provides a set of Java
libraries for writing MapReduce applications. According to the language used by the
Simulation Implementation, it is possible to run the MapReduce application in
several ways. For instance, when the implementation is written in Java (e.g.,
MASON, Repast Simphony), it is possible to write a MapReduce application that
initializes the simulation at code level by using mechanisms like Java Reflection.
Other frameworks, like NetLogo, provide a Java library for executing simulations
from a Java application. Finally, in the case of generic implementations, the
setting of simulation parameters is performed using the Java Runtime to pass the
inputs as command line arguments of the executable.
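As a sketch of the code-level initialization just mentioned, the following hedged
Java fragment loads a simulation class from a jar at run time via reflection and
drives it through setter and step methods; the class and method names are purely
hypothetical, not the MASON or Repast API.

import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

public class ReflectiveLauncher {
  public static void main(String[] args) throws Exception {
    // Load the simulation jar at run time (path is a placeholder)
    try (URLClassLoader loader = new URLClassLoader(
        new URL[] { new URL("file:simulation.jar") })) {
      // Hypothetical model class and methods, for illustration only
      Class<?> model = loader.loadClass("my.model.MySimulation");
      Object instance = model.getDeclaredConstructor().newInstance();
      model.getMethod("setDensity", double.class).invoke(instance, 59.0);
      Method step = model.getMethod("step");
      for (int i = 0; i < 1000; i++) {
        step.invoke(instance); // advance the simulation by one step
      }
    }
  }
}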
The SOF architecture comprises three layers:
• the SOF front-end (client side), which is the SOF application for running and
managing the simulation on the Hadoop infrastructure;
• the Hadoop layer (remote side), which comprises the software and libraries
provided by the Hadoop infrastructure;
• the SOF core, composed of six functional blocks, that are used on both the
client and the remote side.
The core. The main objective of the SOF core is to ensure flexibility, in terms
of the ability to use any Hadoop installation on-the-fly without requiring any
specific configuration. The core comprises the following functional blocks:
• Parameters Manager: defines the XML schema of the Parameters Domain Defini-
tion, Simulation Input Definition, Simulation Output Definition and Simulation
Rating Definition. It also provides routines for creating, managing and verifying
the XML files.
• File System: defines the structure of the SOF Environment, that is the directories
hierarchy on the HDFS, Remote and Client hosts. This block exposes routines
to get the paths of a simulation, a simulation loop or for temporary files and
folders on both the remote and client hosts.
• HDFS Manager: is responsible for monitoring and creating files on the HDFS.
• MapReduce (SOF process) and Asynchronous Executor (SOF-RUNNER): allow
execution of the SO algorithm on a Hadoop environment.
• Simulation Manager: the fundamental block in the SOF architecture, providing
the routines for executing and monitoring simulations. This block uses SSH to
invoke an asynchronous execution of the SOF-RUNNER (a minimal sketch of such
an invocation follows this list). When an SO process is started, the remote
process ID is stored in the XML simulation descriptor file on the HDFS. In
this way it is always possible to monitor the SO process on the remote
machine, and it is also possible to stop/restart or abort the SO process.
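A minimal sketch of the asynchronous SSH invocation performed by the Simulation
Manager, here using the JSch library; the host, credentials, jar name and remote
command are placeholder assumptions, not the SOF configuration:

import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.JSchException;
import com.jcraft.jsch.Session;

public class RemoteRunnerLauncher {
  public static void main(String[] args) throws JSchException {
    JSch jsch = new JSch();
    Session session = jsch.getSession("user", "hadoop-master", 22);
    session.setPassword("password");               // or key-based auth
    session.setConfig("StrictHostKeyChecking", "no");
    session.connect();
    ChannelExec channel = (ChannelExec) session.openChannel("exec");
    // nohup + background so the SO process survives the SSH session;
    // the echoed PID would then be stored in the simulation descriptor
    channel.setCommand(
        "nohup java -jar SOF-RUNNER.jar sim-42 > runner.log 2>&1 & echo $!");
    channel.connect();
    channel.disconnect();
    session.disconnect();
  }
}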
The interactions with Hadoop. SOF was designed under the assumption that the
remote host is a Unix machine. Therefore, the interactions between client, remote
host and Hadoop system are made using SSH and Unix commands. An important
contribution of SOF is that it presents a novel approach to managing SO processes
by embedding them in the MapReduce paradigm. SOF is focused on ABM simulations,
and hence considers three types of simulation frameworks: MASON, NetLogo and a
generic one. The first two are among the most relevant frameworks in the ABM
community; the last refers to any application executable on the computation host.
The interactions between the SOF core and Hadoop are described in detail below.
The main events in the system are:
• User Login: after the user logs in on the Remote machine, the system
automatically builds a new SOF Environment on the Remote machine and the HDFS
and copies two programs onto the remote machine: SOF and SOF-RUNNER. SOF is
the MapReduce application specialized for executing, on Hadoop, MASON,
NetLogo or generic simulation frameworks. SOF-RUNNER is the SOF process
manager; this process is responsible for executing the PSE or SO algorithm
on the remote host.
Due to the asynchronous nature of the system and decoupling from the Hadoop
infrastructure, all states of the processes are visible only by reading the state of
the SOF Environment, which comprises the Simulation Environment of all the SO
processes in the system. On the HDFS, the SOF Environment contains the state of
the simulation and the state of the optimization loop for any SO process. On the
Remote machine the SOF Environment stores the state of the SOF-RUNNER process:
in this way, it is also possible to stop/restart or abort any SO process. SOF has
been designed for the concurrent optimization of different simulations performed by
one or many users. In order to avoid concurrency issues, SOF uses a separate
Simulation Environment with a unique identifier for each SO process.
At each optimization loop, the SOF-RUNNER performs the following steps:
1. First, it executes the selection function using the Java Runtime. The selection
function takes three arguments: the path to the input sets already executed,
the path to the rating corresponding to the input sets executed and the path
where the function creates the novel input set.
2. When the selection function ends, the SOF-RUNNER transforms the input set
from the XML format into a standard format for Hadoop MapReduce applications
and copies it onto the HDFS (this is not strictly necessary, since recent
versions of Hadoop also support XML input; however, to ensure compatibility
with a larger number of Hadoop clusters, we preferred a standard format).
3. Then it launches the SOF MapReduce application, which consists of two main
routines, map and reduce, as described in Section 4.2:
a) the map routine corresponds to executing the simulation and generating
an output XML file, which represents the final state of the executed
simulation;
b) the reduce executes the evaluation function, using the Java Runtime. The
evaluation function takes two arguments: the path of the output XML
files and the path where the function creates the rating XML file.
4. When the evaluation function ends, the reducer puts the rating XML file on the
HDFS.
4.2.4 Evaluation
This Section describes the benchmarks used to evaluate SOF scalability and gives
the results obtained on a Hadoop cluster.
The Benchmark Datasets. We tested a simple SO process on the NetLogo Fire
model [Wil97]. This model simulates the spread of a fire through a forest. It shows
that the fire’s chance of reaching the right edge of the forest depends critically on
the density of trees. This is an example of a common feature of complex systems,
the presence of a non-linear threshold or critical parameter. In particular, at 59%
density, the fire has a 50/50 chance of reaching the right edge. The Fire model has
also been used to validate the OpenMOLE platform [Reu+13].
Since we are evaluating the performance of the framework, the SO process is based
on an empty f_Evaluate(·) function, while the f_Selection(·, ·, ·) function
generates a set of n configurations for the first 10 loops and an empty set at the
end of the 10th
loop, so that the SO process terminates. Each configuration consists of the density
parameter and a seed for the random generator. All the simulations perform 1000
simulation steps.
• Hardware:
– CPUs: 2 x Intel(R) Xeon(R) CPU E5-2680 @ 2.70GHz (#core 16, #threads
32)
– RAM: 256 GB
– Network: adapters Intel Corporation I350 Gigabit
• Software:
– Ubuntu 12.04.4 LTS (GNU/Linux 3.11.0-15-generic x86_64)
– Java JDK 1.6.25
– Apache Hadoop 2.4.0
Strong and Weak Scalability. In order to better evaluate the scalability efficiency
of the framework, the weak and strong scaling efficiency tests (described in
Section 1.4) were performed. The strong scaling efficiency measures the capability
of the framework to complete a set of simulations in a reasonable amount of time.
In order to compute the strong scalability efficiency, the problem size stays fixed
while the number of processing elements is increased. The strong scalability
efficiency (as a percentage of the optimum) is given by (t1 × 100)/(p × tp)%, where
ti denotes the completion time to perform the overall set of simulations using i
cluster nodes. In our case the strong scaling efficiency ranges from 34.16%
(p = 4, n = 2000) to 95.59% (p = 2, n = 32000). The weak scaling efficiency
measures the capability of the framework to solve larger problems as the number of
processing elements increases. In order to compute the weak scalability efficiency,
the problem size assigned to each processing element stays constant and additional
elements are used to solve a larger problem. The weak scalability efficiency (as a
percentage of the optimum) is given by (t1/tp) × 100% and in our case is equal to
100% for p = 2 and 84.22% for p = 4. The speedup, on the largest problem
(n = 32000), is 1.91 for p = 2 and 3.27 for p = 4.
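As a quick cross-check of these figures, the strong scaling efficiency can be
rewritten in terms of the speedup S(p) = t1/tp:
\[
\mathit{Eff}_{\mathrm{strong}}(p) = \frac{t_1 \times 100}{p \times t_p}\%
= \frac{S(p)}{p}\times 100\%,
\qquad \frac{1.91}{2}\times 100\% = 95.5\%,
\]
consistent with the 95.59% reported above for p = 2 on the largest problem.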
Fig. 4.4.: Overview of Extreme-scale Model Exploration with Swift/T (EMEWS) framework.
Figure 4.4 illustrates the main components of the EMEWS framework. The main
user interface is the Swift script, a high-level program. The core novel
contributions of EMEWS are shown in green; these allow the Swift script to access a
running ME algorithm, which can be expressed in Python, R, C, C++, Fortran, Julia,
Tcl, or any language supported by Swift/T.
The EMEWS design aims to ease software integration while providing scalability
to the largest scale (petascale plus) supercomputers, running millions of ABMs,
thousands at a time. Initial scaling studies of EMEWS have shown robust scalability
[Ozi+15]. The tools are also easy to install and run on an ordinary laptop, requiring
only an MPI (Message Passing Interface) implementation, which can be easily
obtained from common OS package repositories.
Section 4.3.2 describes three EMEWS use cases, which exploit ME algorithms to
optimize the results of Repast simulations. These use cases are published on a
public repository.
The Repast simulator makes it easy to execute a Repast simulation from the command
line. Listing 4.1 depicts an example Bash script that executes a Repast simulation.
As shown, the most attractive feature of Repast is that it allows running a
particular parameter configuration (param_line) of a simulation model (-scenario
'$scenario').
MASON and NetLogo do not provide the same feature; hence it is quite complex to
integrate these ABM toolkits in frameworks such as EMEWS or SOF. In SOF the
integration has been made through a programming-level approach (the parameter
configuration is loaded from Java code); EMEWS, on the other hand, is based on the
Swift/T language, which does not allow executing pure Java code, and consequently
needs a different strategy. Two Java wrappers (for MASON, see Listing 4.2, and for
NetLogo, see Listing 4.3) have been designed to add to EMEWS the support for both
MASON and NetLogo, with the same features available for Repast. Chapter 3 describes
the new functionalities of Swift/T for invoking JVM-based languages (available from
Swift/T 1.0); these new functionalities allow configuring the simulation by code,
instead of executing a Bash script.
The two Java wrappers1 were designed to face this problem (the project is released
under the MIT License). Both wrappers use the ABM toolkit's Java API in order to
create, initialize and execute new simulations. Their usage is quite similar, and
changes according to the characteristics of the ABM toolkit. For instance, the
simulation model in MASON is specified using the fully qualified name (parameter
-simstate) of the MASON model class, and the simulation (parameter -m) is defined by
a Jar file (see lines 21–22 of Listing 4.2). Instead, in NetLogo (see line 21 of
Listing 4.3) the simulation (parameter -m) is a *.nlogo file, which also describes
the model. Both wrappers take the parameter configuration string (CSV format) as
input.
1 Available on https://github.com/spagnuolocarmine/swiftlangabm public repository.
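The following hedged Java sketch conveys the spirit of such a wrapper: it opens a
.nlogo model through the NetLogo headless API (assuming the NetLogo 5.x API), applies
a CSV parameter configuration, runs the model and reports a result. The parameter
names and the burned-trees reporter are assumptions tied to the Fire model, not the
published wrapper code.

import org.nlogo.headless.HeadlessWorkspace;

public class NetLogoWrapper {
  public static void main(String[] args) throws Exception {
    String modelPath = args[0];               // e.g. a Fire.nlogo file
    String[] params = args[1].split(",");     // CSV configuration: density,seed
    HeadlessWorkspace ws = HeadlessWorkspace.newInstance();
    ws.open(modelPath);                       // load the .nlogo model
    ws.command("random-seed " + params[1]);   // fix the stochastic seed
    ws.command("set density " + params[0]);   // set the model parameter
    ws.command("setup");
    ws.command("repeat 1000 [ go ]");         // run 1000 simulation steps
    System.out.println(ws.report("burned-trees")); // model-specific reporter
    ws.dispose();
  }
}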
On the R side, the EasyABC [F.+13] and ABC [Csi+12] packages provide approxi-
mate Bayesian computation (ABC) capabilities that are increasingly being applied to
ABM. For machine learning and other statistical applications, R includes packages
such as caret [Kuh08], randomForest [LW02], and many others.
The following sections describe three use cases of EMEWS in combination with Java
Repast, MASON and NetLogo simulations and ME libraries in Python and R. These
examples are published on a public repository5.
As a first demonstration ABM use case, the first example presents a Swift/T parallel
parameter sweep to explore the parameter space of a model (a PSE or ME process).
The full Use Case One (UC1) project can be found at the tutorial website where, in
addition to the Repast Simphony use case examples, the repository branch multisim
also provides NetLogo and MASON examples, developed using the ABM software
integration described in Section 4.3.1.
2 Available at http://scikit-learn.org.
3 Available at http://deeplearning.net/software/theano/.
4 Available at http://lasagne.readthedocs.io/.
5 Available at https://bitbucket.org/jozik/wintersim2016_adv_tut.
These agents are located in a two-dimensional continuous space where each agent
has x and y coordinates expressed as floating point numbers (and in a corresponding
discrete grid with integer coordinates). Movement is performed in the
continuous space and translated into discrete grid coordinates. The grid is used
for neighborhood queries (e.g. given a Zombie’s location, where are the nearest
Humans). The model records the grid coordinate of each agent as well as a count
of each agent type (Zombie or Human) at each time step and writes this data to
two files. The initial number of Zombies and Humans is specified by model input
parameters zombie_count and human_count, and the distance a Zombie or Human
can move at each time step is specified by the parameters zombie_step_size and
human_step_size .
In order for Swift/T to call an external application such as the Zombies model, the
application must be wrapped in a leaf function [fun16]. The Zombies model is
written in Java which is not easily called via Tcl and thus an app function is the
best choice for integrating the model into a Swift script. Repast Simphony provides
command line compatible functionality, via an InstanceRunner class, for passing a
set of parameters to a model and performing a single headless run of the model
using those parameters. The command line invocation of Repast Simphony was
wrapped in a bash script repast.sh that eases command line usage.
The Swift app function that calls Repast Simphony is shown in the top of 4.4. Prior to
the actual function definition, the environment variable T_PROJECT_ROOT is accessed.
This variable is used to define the project’s top level directory, relative to which other
directories (e.g., the directory that contains the Zombies model) are defined. On
line 2, the app function definition begins. The function returns two files, one for
standard output and one for standard error. The arguments are those required to
run repast.sh: the name of the script, the current run number, the directory where
the model run output should be written, and the model’s input scenario directory.
The body of the function calls the bash interpreter passing it the name of the script
file to execute and the other function arguments as well as the project root directory.
@stdout=out and @stderr=err redirect stdout and stderr to the files out and err.
It should be easy to see how any model or application that can be run from the
command line and wrapped in a bash script can be called from Swift in this way.
The full script performs a simple parameter sweep using the app function to run the
model. The parameters to sweep are defined in a file where each row of the file
contains a parameter set for an individual run. The script will read these parameter
sets and launch as many parallel runs as possible for a given process configuration,
passing each run a parameter set.
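For readers more at home in Java than Swift, this hedged sketch mirrors the sweep
logic just described: read the unrolled-parameter file, create an instance directory
per run, and launch the wrapped model in parallel. The file names and the repast.sh
invocation are assumptions mirroring the tutorial layout, not the tutorial code.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Sweep {
  public static void main(String[] args) throws Exception {
    // One parameter set per line, as in the unrolled-parameter file
    List<String> paramSets = Files.readAllLines(Paths.get("upf.txt"));
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int i = 0; i < paramSets.size(); i++) {
      final int run = i + 1;
      final String params = paramSets.get(i);
      pool.submit(() -> {
        try {
          // Instance directory into which this run writes its output
          Path dir = Files.createDirectories(Paths.get("instance_" + run));
          Files.write(dir.resolve("upf.txt"), params.getBytes());
          new ProcessBuilder("bash", "repast.sh",
              String.valueOf(run), dir.toString(), "scenario.rs")
              .inheritIO().start().waitFor();
        } catch (IOException | InterruptedException e) {
          e.printStackTrace();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}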
The Swift script is shown in 4.4. The script uses two additional functions that
have been elided to save space. The first, cp_message_center, calls the unix cp
command to copy a Repast Simphony logging configuration file into the current
working directory. The second, make_dir, calls the Unix mkdir command to create a
specified directory. Script execution begins in line 8, calling the cp_message_center
app function. In the absence of any data flow dependency, Swift statements will
execute in parallel whenever possible. However, in this case, the logging file must
be in place before a Zombie model run begins. The => symbol enforces the required
sequential execution: the code on its left-hand side must complete execution before
the code on the right-hand side begins execution.
Lines 11 and 12 parse command line arguments to the Swift script itself. The first
of these is the name of the unrolled-parameter file that contains the parameter sets
that will be passed as input to the Zombies model. Each line of the file contains a
parameter set, that is, the random_seed, zombie_count, human_count, zombie_step_size
and human_step_size values.
In line 13 the built-in Swift file_lines function is used to read the upf file into an
array of strings where each line of the file is an element in the array. The foreach
loop that begins on line 14 executes its loop iterations in parallel. In this way, the
number of model runs that can be performed in parallel is limited only by hardware
resources.
The variable s is set to an array element (that is, a single parameter set represented
as a string) while the variable i is the index of that array element. Lines 15 and
16 create an instance directory into which each model run can write its output.
The => symbol is again used to ensure that the directory is created before the actual
model run that uses that directory is performed in line 20. Lines 17 and 18 create
file objects into which the standard out and standard error streams are redirected
by the repast function (4.4). The Repast Simphony command line run allows the
parameter input to be passed in as a file, and so in line 19 the parameter string
s is written to a upf.txt file in the instance directory. Lastly, in line 20, the app
function, repast, that performs the Zombie model run is called with the required
arguments.
Script 4.4 shows how to run multiple instances of the Zombies model in parallel,
each with a different set of parameters. The next example builds on this by adding
some post-run analysis that explores the effect of simulated step size on the final
number of humans. This analysis will be performed in R and executed within the
Swift workflow. We present this in two parts. The first describes the changes to the
foreach loop to gather the output and the second briefly describes how that output
is analyzed to determine the “best” parameter combination.
6  //---------------------------------------------------------
7  ...
8  string results[];
9  foreach s, i in upf_lines {
10     ...
11     (out, err) = repast(repast_sh, ups, i+1,
12                         instance, scenario) => {
13         string code = count_humans % instance_dir;
14         results[i] = R(code, "toString(res)");
15     }
16 }
Due to the highly non-linear relationship between ABM input parameters and model
outputs, as well as feedback loops and emergent behaviors, large-parameter spaces
of realistic ABMs cannot generally be explored via brute force methods, such as
full-factorial experiments, space-filling sampling techniques, or any other a priori
determined sampling schemes. This is where adaptive, heuristics-based approaches
are useful and this is the focus of the next two use cases.
1 import EQPy;
This use case again uses the Repast Simphony JZombies demonstration model.
For resident tasks, which retain state, the location of a worker is used so that
the algorithm state can be repeatedly accessed. The EQ/Py extension provides
an interface for interacting with Python-based resident tasks at specific locations.
Listing 4.6 shows how EQ/Py is used in the current example. The extension is
imported at line 1. The deap function is defined to take the arguments py_rank
(a unique rank), iters (the number of GA iterations), trials (the number of
stochastic variations per parameter combination, or individual), pop (the number
of individuals in the GA population), and seed (the random seed to use for the
GA). A location ME is generated from ME_rank in line 4. This location is passed to
the EQPy_init_package call, along with a package name (deap_ga), which loads
the Python file named deap_ga.py (found by setting the appropriate PYTHONPATH
environment variable), initializes the input and output queues, and starts the run
function in the deap_ga.py file, before returning.
At this point the resident task is available to interact with through the EQPy_get()
and EQPy_put() calls, which get string values from and put string values into the
resident task OUT and IN queues, respectively. The first call to EQPy_get() (line 8)
retrieves the first set of candidate parameters produced by the ME algorithm.
Listing 4.7 shows the main DEAP workflow loop, a general pattern for interacting
with resident tasks. Unlike the foreach loop, which parallelizes the contents of its
loop, the Swift for loop iterates in a sequential fashion, only guided by dataflow
considerations. The for loop continues until the EQPy_get() call receives a message
“FINAL”, at which point EQPy_get() is called again to retrieve the final results
and doDEAP() exits the loop and returns (lines 12-15). Otherwise, the next set of
parameters is obtained by splitting (line 17) the string variable retrieved on line
10. The contents of the pop array are individual parameter combinations, also
referred to as individuals of a GA population. Each individual is then sent to a
summary objective function obj which creates trials stochastic variations of the
individual, evaluates their objective function (the number of Humans remaining, via
the count_humans R code from 4.5) and returns the average value (not shown here;
the full script is on the tutorial website). Lines 25-29 transform the summary
objective results
for each individual into a string representation that can be evaluated within the
Python resident task, and this value is sent to it via EQPy_put() (line 30).
The EQ/Py extension makes two functions, IN_get and OUT_put, available for the
Python resident task and these can be used to pass candidate parameters to and
get results from any Swift/T workflow. These functions are the complements to the
EQPy_get() and EQPy_put() functions on the Swift/T side.
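The queue-based handshake between the workflow and the resident task can be
illustrated with a self-contained Java sketch: two blocking queues stand in for the
IN and OUT queues, a thread plays the ME algorithm, and the string "FINAL" ends the
loop, as in the Swift code above. This mimics the pattern only; it is not the EQ/Py
implementation.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ResidentTaskDemo {
  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<String> in = new LinkedBlockingQueue<>();  // params -> task
    BlockingQueue<String> out = new LinkedBlockingQueue<>(); // results <- task

    // The "resident task": keeps its state across iterations
    Thread me = new Thread(() -> {
      try {
        for (int iter = 0; iter < 3; iter++) {
          out.put("candidate-params-" + iter); // analogue of OUT_put
          String objective = in.take();        // analogue of IN_get
          System.out.println("ME received " + objective);
        }
        out.put("FINAL");                      // signal the end of the loop
      } catch (InterruptedException ignored) { }
    });
    me.start();

    // The "workflow" side: EQPy_get / EQPy_put analogues
    while (true) {
      String msg = out.take();                 // analogue of EQPy_get
      if (msg.equals("FINAL")) break;
      in.put("objective-for-" + msg);          // analogue of EQPy_put
    }
    me.join();
  }
}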
The DEAP framework provides flexibility in defining custom components for its
GA algorithms, and EMEWS takes advantage of this by overriding the map() function
used to pass candidate parameters to the custom evaluator, with
toolbox.register("map", queue_map). The queue_map function executes calls to
OUT_put and IN_get. In this way the Python resident task is unaware of being a
component in an EMEWS workflow. The full Python resident task code (deap_ga.py),
along with the full DEAP use case, can be found in the Use Case Two (UC2) project,
which also provides MASON and NetLogo versions in the multisim branch.
In this use case, we show how to integrate a multi-process distributed native-code
model written in C++ into a Swift/T workflow. The model is a variant of the Java
Zombies model, written in C++ and using MPI and the Repast HPC toolkit [CN12]
to distribute the model across multiple processes. The complete two-dimensional
continuous space and grid span the processes, and each individual process holds some
subsection of the continuous space and grid. The Zombies and Humans behave as in the
Java version.
In contrast to the previous two examples the MPI-based HPC Zombies model is
compiled as a shared library that exposes a Swift/T Tcl interface [Woz+15]. Swift/T
runs on Tcl and thus wrapping the library in Tcl provides tighter integration than an
app function, but is also necessary in the case of multi-process distributed models
that use MPI. Such models when run as standalone applications initialize an MPI
Communicator of the correct size within which the model can be distributed and
run. Since the HPC Zombies model uses MPI, as do all Repast HPC based models, it
must be treated as an MPI library and passed an MPI communicator of the correct
size when run from Swift/T (see Section 3.2.3).
The first step in integrating the HPC Zombies model with Swift/T is to compile it
as a library, converting main() into a function that runs the model. The next step
is to make that function callable from Tcl via a SWIG-created binding. SWIG [Bea96]
is a software tool that generates the ‘glue code’ required for some target lan-
guage, such as Tcl, to call C or C++ code. The SWIG tool processes an inter-
face file and produces the ‘glue-code’ binding as a source code file. In this case,
the C++ code we want to call from Tcl and ultimately from Swift is the Zom-
bies model function: std::string zombies_model_run(MPI_Comm comm, const
std::string& config, const std::string& parameters). The function takes
the MPI communicator in which the model runs, the filename of a Repast HPC config
file, and the parameters for the current run. When called, it starts a model run using
these arguments.
The SWIG interface file is run through SWIG and the resulting source code is com-
piled with the HPC Zombies model library code. The result is a zombie_model_run
function that is callable from Tcl. The Makefile target, ./src/zombies_model_wrapper.cpp
in the Makefile template is an example of this process.
The next step is to create the Swift bindings for the library function. The Swift
bindings define how the zombie_model_run function will be called from Swift. The
Swift code is shown in 4.8. The function is annotated with @par (Section 3.2.4),
allowing it to be called as a parallel function. The @dispatch=WORKER annotation
(Section 3.2.4) directs the function to run on a worker process.
1 proc zombies_model { outs ins args } {
2     rule $ins \
3         "zombies_model_body $outs {*}$ins" \
4         {*}$args type $turbine::WORK
5 }
6
The code in the zombies_model Tcl package that the Swift function calls is shown
in 4.9. For parallel tasks, Swift/T currently requires two Tcl procedures: a
dataflow rule procedure and a body procedure that performs the actual work.
Tutorial Window. The tutorial has been designed with an innovative approach
which is able to provide within a single web IDE the description, the code and the
project structure. The tutorial window, depicted in Figure 4.5, is composed of five
panes. The tutorial text appears in pane 1. Within the tutorial text you will see
6 Available at http://www.mcs.anl.gov/~emews/tutorial/.
hyperlinks to source code that is being explained. These links are of two types. Both
types will display the content of the relevant source code file in pane 2. The second
type will also scroll the source code and highlight lines of interest. For example,
clicking on the link: swiftrun.swift will open the swiftrun.swift file in pane 2, and
clicking on this one will scroll and highlight lines 13–15 in that file. Other links
refer to sections in other tutorials, or bibliographic references. These will appear as a
modal browser window, displaying the relevant text. Pane 3 displays the source files
and project structure for the tutorial currently displayed in pane 1. Double-clicking
a folder or single-clicking the triangle symbol to the left of the folder will reveal its
contents. Pane 4 displays a table of contents for the current tutorial; single-click
an entry to scroll the tutorial text in pane 1 to that section. Pane 5 displays the
list of tutorials; double-click an entry to open that tutorial.
— John Tukey
American Mathematician
5.1 Introduction
As discussed in Section 1.1, the Computational Science field comprises research
interests in scientific data visualization. Although scientific data visualization
is often considered a subfield of computer graphics, it is an interdisciplinary
research branch which aims to visualize data, using a computational approach and/or
computer science tools, in a way that helps the reader grasp the nature of
(potentially) large datasets.
This Chapter describes an extensible and pluggable architecture for the
visualization of data on the Web, tailored particularly to Open Data. Open Data is
data freely available to everyone, without restrictions from copyright, patents or
other mechanisms of control. The advantage of Open Data is that it is freely
accessible on the Web in machine-readable formats. The architecture aims to
introduce a scalable way to easily visualize Open Data in classical HTML pages.
The most important feature of the proposed architecture is that it moves the
computation client side: data gathering, data filtering and the rendering of the
visualization are performed client side, and not server side as in other
architectures. This ensures scalability in terms of the number of concurrent
visualizations, data trustworthiness and privacy (because the data are dynamically
loaded client side, without any server interaction). These design features are the
fundamentals of a novel paradigm of Distributed Computing named Edge-centric
Computing (EcC) or Fog Computing.
Fig. 5.1.: Centralized cloud model (left) versus Edge-centric Computing (right).
In EcC most of the computation is moved client side; this affects the cloud
paradigm, which becomes a support infrastructure for the computing. This kind of
decentralization makes it easy to ensure data trustworthiness and scalability to
millions of users; moreover, it allows the development of novel kinds of
human-centered applications. Figure 5.1 depicts the difference between the classical
Cloud Computing and EcC paradigms. The EcC architecture is composed of a core and
some edges. The core is represented by smaller web servers and content distribution
networks, while the edges consist of individual human-controlled devices such as
desktop PCs, tablets, smart phones, and nano data centers. As shown in the picture,
the core (which may be a cloud infrastructure or service provider) is responsible
for a minimal part of the total computation and information sharing; the edges in
the architecture should operate without core interference in order to ensure trust
and support a massive number of operations. The EcC architecture may be seen as an
attractive and comprehensive generalization of other architectures, such as Content
Delivery Networks (CDN), Peer-to-Peer (P2P), Decentralized Cloud Architectures and
Fog Computing.
Problems of centralized computing. EcC has been designed to face three main
problems. The first fundamental problem is the loss of privacy caused by releasing
personal and social data to centralized services. A second fundamental problem is
the delegation of total control to cloud or service infrastructures, which cannot
easily ensure any kind of trust on either the data or the operations performed on
it. For instance, delegating the visualization of some data to a cloud service
requires trusting that service with both the data and its rendering.
1. Formulate the question: the first step of the data visualization process is to
be clear on the question to be answered. The answer to the question will be given
by a visualization of data. However, the question is often refined after a good
understanding, gathering and filtering of the data.
2. Data gathering: once the question has been identified, it is necessary to
identify and collect data related to the formulated question. This can be done by
using one's own data or data available elsewhere, as in the case of Open Data.
These kinds of data are accessible through service providers. Table 5.1 shows 6
popular open data software providers; many of them allow access to the data in
machine-readable formats through Web APIs.
3. Apply a visual representation: the last step corresponds to identifying the
appropriate way to visualize the data. The challenges facing visualization
researchers often involve finding innovative graphic and interactive techniques to
represent the complexity of information structures. In general, the visual
representation of data is the crucial point of the data visualization process, and
may be a complex task. It is important to have tools able to help users choose the
right way of visualizing the data.
The architecture described in this chapter aims to help in the final two steps of
the Data Visualization process, providing automatic mechanisms to gather the data,
software libraries and Web tools to apply a visual representation to the data.
5.1.3 Open Data in a Nutshell
An adequate single definition of Open Data is the Open Definition published by Open
Knowledge, in which Open means:
“Knowledge is open if anyone is free to access, use, modify, and share it — subject,
at most, to measures that preserve provenance and openness.”
The architecture we are going to present is based on the fundamentals of Open Data
[Ope16a], exploiting the standardization induced by their definition, in order to
provide an easy and scalable mechanism to visualize data on Web pages. There are
many advantages to having an architecture able to easily exploit Open Data. For
instance, it can be used to improve the transparency of governments and public
administrations, which could provide citizens with information about their
activities and tools for understanding them. Making public sector data open is
crucial for different reasons:
visualizations, and dependability of the data and privacy (because the data are
dynamically loaded client side, without any server interactions).
The DEEP architecture is compliant with the EcC architecture (see the grey
functional blocks in Fig. 5.2): Datalets guarantee the provenance of data (see
[Data Gathering] in Fig. 5.2), showing the link to the original dataset used to
create the visualization. In this way, any user can determine whether information
is trusted, and whether data have been manipulated. Datalets also ensure scalability
in terms of visualizations (see [Datalet] in Fig. 5.2): the computation is made
client side, and does not experience bottlenecks due to overloading of the core.
The core may provide other services to the edges, for instance reports, statistics
and forecasting for certain data, exploiting the Datalets usage (see
[Visualization Statistics] in Fig. 5.2).
DEEP may be seen as a cloud provider of visualizations for Open Data. An
increasing interest in public discussion around data visualization is witnessed by
several data-journalism sites. Among others, the Economist's Graphic detail blog2
regularly posts data visualizations (i.e., charts, maps and info-graphics) along
with its articles. On-line users contribute by actively commenting and sharing their
interpretations, observations, and hypotheses about the visualizations. A recent
study of the users' comments [Hul+15] observed that 42.2% of them focus on the
available content, such as the visualizations, data and articles. On this blog, the
comments are text-based, without the opportunity for users to create and share
alternative visualizations of the same dataset or other datasets. The aim of the
DEEP architecture is to enable social sharing and collaboration around data
visualizations. The idea is that users can reuse and share data visualizations to
explain their interpretation of data and support their argumentation during a
discussion.
2 The Economist's Graphic Detail blog is accessible at
http://www.economist.com/blogs/graphicdetail.
A Datalet may be seen as a reusable web widget. DEEP exploits the web components
standard, currently being produced by Google engineers in compliance with the W3C
specification [Wor16]. The goal is to use component-based software engineering
to develop a bundle of web widgets that can be used whenever needed, without
having to rewrite the common fragments shared by several pages of the platform.
The component model allows for encapsulation and interoperability of individual
HTML elements.
The web component technologies make it possible to create custom HTML elements.
Support for these components is present in some WebKit-based browsers like Google
Chrome and Opera, and in Mozilla Firefox (where it requires a manual configuration
change); Microsoft's Internet Explorer has not implemented any Web Components
specifications yet. Backward compatibility with older browsers is implemented using
JavaScript-based polyfills [Web16] (libraries imported in the web page that
implement the web component specification).
Web components specifications consist of four elements, which can be used also
separately:
1. Custom Elements3 define the method to create new types of DOM elements in a
document. An element definition consists of a custom element type, local name,
namespace, custom element prototype and lifecycle callbacks.
2. HTML Imports4 are a way to include and reuse HTML documents, typically web
component definitions, in another HTML document, called the master document. The
imported definitions are linked as external resources to the master document.
3. Templates5 describe a method to declare inert DOM subtrees in HTML and
manipulate them to instantiate document fragments with identical content.
4. Shadow DOM6 defines a method of establishing and maintaining functional
boundaries between DOM trees, and of governing how these trees interact with each
other within a document, enabling better functional encapsulation within the DOM.
The major support libraries that implement the web components standard are:
3 http://w3c.github.io/webcomponents/spec/custom/
4 http://w3c.github.io/webcomponents/spec/imports/
5 https://html.spec.whatwg.org/multipage/scripting.html#the-template-element
6 http://w3c.github.io/webcomponents/spec/shadow/
In accordance with the web components standard, the DEEP architecture uses Polymer,
the library that supports the largest number of requirements: templates, web
components, material components, data binding, filters, event handling, touch and
gestures, and AJAX/REST.
The idea is to offer the user an easy development path for including a data
visualization in web pages. However, this work does not aim to develop new
JavaScript visualization libraries, but to reuse existing libraries and provide an
innovative, transparent way to use them. Visualization JavaScript libraries are
characterized by the following features:
DEEP Datalets
A Datalet may be seen as a reusable web widget; DEEP exploits the web components
standard compliant with the W3C specification [Wor16]. According to this standard,
this architecture exploits Polymer [Goo16b] (see Section 5.2.1), a library developed
by Google engineers that supports the largest number of requirements: templates, web
components, material components, data binding, filters, event handling, touch and
gestures, and AJAX/RESTful support.
The idea is to offer the user an easy development tool for including a data
visualization in web pages. However, this work does not aim at developing new
JavaScript visualization libraries, but at reusing existing libraries, providing an
innovative, transparent way to use them.
Datalets have been designed to process any dataset as input, provided that the Data
Provider grants access to the data through a Web API in a machine-readable format.
The Architectural layer provides the behaviors common to all datalets.
The Library layer includes all behavior referred to a particular visualization library
(e.g. Highcharts [Hig16]).
The Datalet layer is the actual implementation of the web component datalet, and is
developed by composing some of the behaviors in the hierarchy.
Clearly, the datalet building process takes advantage of this hierarchy: different
datalets may share low-level behaviors, facilitating code reuse and error fixing for
datalet developers.
A datalet downloads the dataset content, filters and groups the data, and finally
renders the visualization in the web page.
Fig. 5.4 shows the whole process, which evolves from left to right; on top there are
the inputs, below the repositories. The datalet is included in the web-page source
code. When the browser loads the web page, it processes the datalet source code,
which is self-consistent with all its parameters (i.e., datalet URL, dataset URL,
filters and other configuration parameters). When the web page loads for the first
time, the browser contacts the DEEP architecture (see Figure 5.6) to download the
datalet source code from the datalets repository. The datalet source code reads the
URL of the input dataset, downloads the data in real time and performs further
processing (such as data filtering, grouping and so on).
The DEEP component is a simple RESTful service, providing the list of available
datalets (i.e., a lookup service) and the mapping between the visualization names
and their relevant URLs within a datalet repository. A datalet takes as input: a
dataset URL, a query to be performed on the data, and, optionally, a filter and/or
some additional configuration parameters. Currently, on http://deep.routetopa.eu
several kinds of datalets (bar chart, map, treemap and others) are available, and
many more are planned. All datalets are published on a public repository7.
Both the DEEP and the Datalet repository have been designed to be extensible:
they can log all the visualization requests and, as planned future work, they could
also provide aggregated statistics on user preferences and on data and their
visualizations: for instance, the most popular Datalet visualizations, the most used
datasets, the most popular visualizations for a particular dataset, the most
visualized field of a particular dataset, and so on (see Figure 5.2
[Visualization Statistics]). Such information will also be used to provide useful
suggestions to inexperienced users, on data selection, filtering and visualization.
From the user's point of view, datalets are interactive, real-time and dynamic
visualizations of Open Data. Datalets are interactive (they are not static
pictures), allowing users to zoom in and out of the data, and to move the mouse over
the visualization items to obtain additional information. Figure 5.5 is an example
of a datalet: a bar chart that shows the number of Wi-Fi antennas for each subarea
of the city of Prato (Italy).
7 DEEP Datalets code repository: https://github.com/routetopa/deep-components
Figure 5.6 depicts the work cycle for including a Datalet in a Web page: first, (1)
the Client page sends a request to DEEP for a specific datalet; then, (2) DEEP
responds with the information needed to inject the datalet into the page; finally,
(3) the Client retrieves the Datalet from the DEEP repository and includes it in the
page. When the datalet is injected into the page, the datalet's behavior is
executed.
Listing 5.1 shows the code needed to include a datalet in a classical web page.
Lines 2, 3 and 4 load the JavaScript libraries; in particular, the deepClient.js8
library enables the page to dynamically download the Datalet code from the DEEP
repository. This request is made using jQuery (lines 6–14), specifying the needed
parameters. DEEP answers with the corresponding datalet code, which is then
injected into the page.
1 <html><head>
2 <script type="text/javascript" src="js/jquery-1.11.2.min.js"></script>
3 <script type="text/javascript" src="js/webcomponents.js"></script>
4 <script type="text/javascript" src="js/deepClient.js"></script>
5 <script type="text/javascript">
6 jQuery(document).ready(function($) {
7   var datalet_params = {
8     component: "DATALET_NAME",
9     params: { "data-url": "DATA_URL", "layout-param-1": "LAYOUT-VALUE" },
10    fields: Array("FIELD1", "FIELD2"),
11    placeHolder: "HTML_PLACEHOLDER" };
12  ComponentService.deep_url = 'DEEP_URL';
13  ComponentService.getComponent(datalet_params);
14 });
15 </script></head><body>
16 <div id="HTML_PLACEHOLDER"></div>
17 </body></html>
8 DEEP Client JS:
https://github.com/routetopa/deep-client/blob/master/js/deepClient.js
• The first step is the dataset selection. The Controllet provides a list of
pre-configured datasets; when the user selects a dataset from the list, the system
shows the relevant meta-data. Of course, the user is free to select datasets
available in other external data portals by providing the data URL.
• Once the dataset has been selected, the next step is to decide what data to
display in the visualization. Practically, this means that the user must select the
columns of the dataset from which to pick the data to display. It is then possible
to filter the data (selecting the rows that satisfy some specific requirements).
Grouping operations are also available, to group together rows having the same
value in a specific column; in this case, it is essential to specify a rule that
explains how to aggregate the data.
• The final step supports the selection of the visualization. The Controllet
assists the user in the selection of a visualization compatible with the filtered
dataset. For instance, when the user selects latitude and longitude from a dataset,
the Controllet suggests the map as a possible visualization.
Within SPOD, discussions evolve using a forum-like approach, where users post
visualizations and comments and can discuss in a time-centric way. SPOD has been
designed considering the well-known issues pointed out in the literature for this
type of interaction, such as scattered content, low signal-to-noise ratio and
dysfunctional argumentation [Kle10]. In contrast, another approach is to collaborate
using a map-based visualization [KC14], where users interact by adding, moving and
modifying concepts as well as linking them. This kind of visualization aims to
support a better exploration of the problem space, to provide a rational
organization of the content, and to stimulate critical thinking. However, it also
shows its limitations, because it suppresses conversational dynamics and the usual
reply structure, therefore disrupting the way people communicate [Ian+15]. SPOD
introduces a hybrid approach where the forum-like approach sits alongside the
map-based visualization, and the two are interlinked: by selecting a Datalet
visualization node on the map, one can see all the comments that use the selected
dataset to support their argumentation.
9 DEEP Wizard: http://deep.routetopa.eu/deep_1_9_rev/COMPONENTS/demo.html
10 ROUTE-TO-PA site: http://routetopa.eu/about-project/
SPOD is built on top of Oxwall [Oxw16] and, exploiting the DEEP architecture, is
able to access external existing Open Data platforms based on CKAN [Ope16d],
OpenDataSoft [Ope16b], or any other platform. Fig. 5.7 shows the architecture, with
SPOD in the middle and the external open data portals on top. SPOD retrieves the
dataset directly from these sources any time the Datalet is visualized. SPOD belongs
to a federation with other systems; thus a central authentication server, the RAS
(ROUTE-TO-PA Authentication Server), has been provided, so that end-users have a
single username and password to gain access to the federated systems.
This thesis analyzed Frameworks, Parallel Languages and Architectures for Scalable
Computational Science (SCS), considering the scalability of the proposed solutions
as the central challenge, in order to realize our idea of a Scalable Computational
Science.
Frameworks for SCS. Problems and framework level design solutions for efficient
and scalable Distributed and Parallel Agent-Based Simulation (DPABS) have been
discussed. The proposed solutions are fully developed and tested on the DPABS
framework D-MASON, a distributed version of the well known ABS Java toolkit
MASON.
Summarizing, the proposed solutions in DPABS have been designed to face the
following issues:
Memory consistency. In an ABM, the overall system evolves in discrete events
(ideally all agents change their state simultaneously). However, the agents of a
region are updated sequentially. In this case, the system, or the modeler, must
ensure that the accesses to the states of the agents are consistent. D-MASON
solves this problem at framework-level, by exploiting the Java Method Handler
mechanism.
Scalable communication. D-MASON LPs communicate via a well-known mechanism,
based on the Publish/Subscribe (P/S) design pattern: a multicast channel is
assigned to each topic; LPs then simply subscribe to their interested topics
in order to receive relevant message updates. D-MASON is designed to be
used with any Message Oriented Middleware that implements the PS pattern.
Furthermore, D-MASON can also be deployed on HPC systems. In order
to better exploit such homogeneous environments, D-MASON provides an
MPI-based Publish/Subscribe communication.
Execution on Cloud Computing systems. The SIMulation-as-a-Service (SIMaaS)
infrastructure provides a very attractive prospect as a future environment for
executing simulations, due to its good price-performance ratio. D-MASON provides
easy-to-use system management based on Web technologies, as well as tools to
execute and visualize simulations on cloud computing systems.
The simulation method is mainly used to analyze behaviors that are too complex
to be studied analytically, or too risky/expensive to be tested experimentally. The
representation of such complex systems results in a mathematical model comprising
several parameters. Hence, there arises a need for tuning a simulation model,
that is finding optimal parameter values which maximize the effectiveness of the
model. Considering a multi-dimensionality of the parameter space, finding out the
optimal parameters configuration is not an easy undertaking and requires extensive
computing power. Simulations Optimization (SO) and Model Exploration (ME) are
used to refer to the techniques studied for ascertaining the parameters of the model
that minimize (or maximize) given criteria (one or many), which can only be
computed by performing a simulation run. Scalable solutions for such problems have
been analyzed. Moreover two frameworks for SO and ME, focusing respectively on
Cloud Computing (SOF: Simulation Optimization Framework) and on HPC systems
(EMEWS: Extreme-scale Model Exploration with Swift/T) have been presented.
Parallel languages for SCS. The EMEWS framework is built on top of a parallel
programming language for scientific workflows, named Swift/T. A scientific workflow
is designed specifically to compose and execute a series of computational or data
manipulation steps in a scientific application. One peculiarity of Swift/T is that
it makes it easy to execute code written in other languages, such as C, C++,
Fortran, Python, R, Tcl, Julia and Qt Script, and also to invoke executable
programs. EMEWS exploits this capability of Swift/T to realize SO and ME workflows,
adopting optimization algorithms expressed in a variety of languages.
Appendices
A Graph Partitioning Problem for Agent-Based Simulation
As discussed in Section 2.3.1, D-MASON adopts a framework-level parallelization approach, which allows harnessing the computational power of a parallel environment and, at the same time, hides the details of the architecture, so that users, even with limited knowledge of parallel computer programming, can easily develop and run simulation models. D-MASON allows modelers to parallelize simulations based on geometric fields. It adopts a space partitioning approach (see Section 2.3.2), which balances the workload among the resources involved in the computation with a limited communication overhead.
The most common formulation of the balanced graph partitioning problem is the following: given a graph G = (V, E) and an integer k > 1, partition V into k components such that each component contains at most (1 + ε)⌈|V|/k⌉ vertices, for a small ε ≥ 0, and the edge-cut (the number of edges whose endpoints belong to different components) is minimized.
Fig. A.1.: Graph partitioning problem for the DNetwork field.
This problem has been extensively studied (see [Bad+13] for a comprehensive presentation) and is known to be NP-hard [GJ90]. Being a hard problem, exact solutions can be found in reasonable time only for small graphs. However, the applications of this problem require partitioning much larger graphs, and so several heuristic solutions have been proposed.
The graph partitioning problem has been faced using several approaches. Two versions of the problem have been considered: the former takes into account the coordinate information of the vertices in space (this is common in graphs describing a physical domain), while in the latter vertices are coordinate free. This thesis discusses the coordinate-free problem, which better fits the ABM domain. Coordinate-based solutions work well for planar graphs but are not efficient for complex graphs.
A different approach is presented in the Kernighan-Lin (KL) algorithm [KL70] which, starting with two sets N1 and N2 (describing a partition of V), greedily improves the quality of the partition by iteratively swapping vertices between the two sets.
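The following simplified sketch illustrates the swap step on an illustrative graph; the full KL algorithm performs sequences of tentative swaps with backtracking, while here a single greedy pass is used:

import java.util.*;

public class KLSketch {
    // D(v) = external degree - internal degree of v under the partition part
    static int d(List<Set<Integer>> adj, int[] part, int v) {
        int ext = 0, in = 0;
        for (int u : adj.get(v)) { if (part[u] == part[v]) in++; else ext++; }
        return ext - in;
    }

    // one greedy pass: swap the pair with the largest positive cut reduction
    static void improve(List<Set<Integer>> adj, int[] part) {
        boolean improved = true;
        while (improved) {
            improved = false;
            int bestGain = 0, bestA = -1, bestB = -1;
            for (int a = 0; a < part.length; a++) {
                if (part[a] != 0) continue;
                for (int b = 0; b < part.length; b++) {
                    if (part[b] != 1) continue;
                    // KL gain of swapping a and b
                    int g = d(adj, part, a) + d(adj, part, b)
                          - 2 * (adj.get(a).contains(b) ? 1 : 0);
                    if (g > bestGain) { bestGain = g; bestA = a; bestB = b; }
                }
            }
            if (bestA >= 0) { part[bestA] = 1; part[bestB] = 0; improved = true; }
        }
    }

    public static void main(String[] args) {
        // path 0-1-2-3, badly partitioned as {0,2} vs {1,3}: edge-cut 3
        List<Set<Integer>> adj = List.of(
            Set.of(1), Set.of(0, 2), Set.of(1, 3), Set.of(2));
        int[] part = {0, 1, 0, 1};
        improve(adj, part);
        System.out.println(Arrays.toString(part)); // [1, 1, 0, 0]: edge-cut 1
    }
}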
The most promising techniques, which use either a multilevel approach or a distributed algorithm exploiting local search, are the ones compared below:
• Multilevel approach: METIS and METIS Relaxed (cf. Section A.1), and KaFFPa;
• Distributed local search: Ja-be-Ja.
A Random partitioning strategy is used as a baseline.
The tests performed aim to compare the analytical results obtained by each algorithm (i.e., edge-cut size, number of communication channels required, and imbalance) with the real performance (overall simulation time) in an ABM scenario.
• Hardware:
– CPUs: 2 x Intel(R) Xeon(R) CPU E5-2680 @ 2.70GHz (#core 16, #threads
32)
– RAM: 256 GB
– Network: Intel Corporation I350 Gigabit adapters
• Software:
– Ubuntu 12.04.4 LTS (GNU/Linux 3.11.0-15-generic x86_64)
– Java JDK 1.6.25
– OpenMPI 1.7.4 (feature version of Feb 5, 2014).
Simulation results on k-way partitioning have been obtained using k logical processors (one logical processor per component). Notice that, when the simulation is distributed, the communication between agents in the same component is much faster than the communication between agents belonging to different components. On the other hand, balancing is important because the simulation is synchronized and evolves at the speed of the slowest component.
The following analytical metrics were considered:
• Edge-cut size (W), the total number of edges having their incident vertices in
different components;
• Number of communication channels (E): two components U1 and U2 require a communication channel when ∃ v1 ∈ U1, v2 ∈ U2 such that (v1, v2) ∈ E. In other words, we are counting the number of edges in the supergraph SG obtained by clustering the nodes of each component into a single node.
Notice that this unconventional metric is motivated by our specific distributed ABM scenario. In this simulation environment, a communication channel between two components U1 and U2 is established when at least two vertices (agents) u1 ∈ U1 and u2 ∈ U2 share an edge. Thereafter, the same communication channel is used for every communication between U1 and U2; consequently, these additional communications have less impact on system performance;
• Imbalance (I), the minimum value of ε such that each component has size at most (1 + ε)⌈|V|/k⌉ (a sketch computing these three metrics follows the list).
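For concreteness, the following sketch computes the three metrics for a given partition; the graph and the component assignment are illustrative:

import java.util.*;

public class PartitionMetrics {
    public static void main(String[] args) {
        // illustrative 4-cycle: edges of an undirected graph
        List<int[]> edges = List.of(new int[]{0, 1}, new int[]{1, 2},
                                    new int[]{2, 3}, new int[]{3, 0});
        int[] comp = {0, 0, 1, 1}; // comp[v] in [0, k): component of vertex v
        int k = 2, n = comp.length;

        int w = 0;                            // edge-cut size (W)
        Set<Long> channels = new HashSet<>(); // communication channels (E)
        int[] size = new int[k];
        for (int v = 0; v < n; v++) size[comp[v]]++;
        for (int[] e : edges) {
            int c1 = comp[e[0]], c2 = comp[e[1]];
            if (c1 != c2) {
                w++; // endpoints in different components
                channels.add((long) Math.min(c1, c2) * k + Math.max(c1, c2));
            }
        }
        // imbalance (I): smallest eps with max size <= (1 + eps) * ceil(n / k)
        int ideal = (n + k - 1) / k;
        double eps = Math.max(0.0,
            (double) Arrays.stream(size).max().getAsInt() / ideal - 1.0);

        System.out.printf("W=%d E=%d I=%.3f%n", w, channels.size(), eps);
    }
}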
Moreover, the real performance of each strategy was evaluated by measuring the overall simulation time (T) needed to perform 10 simulation steps of the distributed SIR simulation.
Analyzing the results in Figures A.2 and A.3, we notice that the performances of the multilevel algorithms are comparable, both in terms of edge-cut size and number of communication channels. Ja-be-Ja's performance is a bit worse (probably due to the small number of iterations used in our tests, as observed in Section A.2) but always better than the random strategy.
Results on imbalance fluctuate (see Figure A.4). In general, all the algorithms provide a quite balanced partition. Apart from the random strategy, which by construction provides the optimal solution, no strategy dominates the others.
Fig. A.5.: Simulation time (T) comparison: (left) k = 4, (right) k = 64. Y-axes appear in log scale.
In order to better understand how the metrics evolve according to k, Figure A.6 depicts four plots which describe, for each algorithm, the growth of the edge-cut size (top-left), the imbalance (top-right), the number of communication channels (bottom-left) and the simulation time (bottom-right) on the fe_ocean network, as a function of the parameter k (X-axis).
In order to better evaluate the relationship between the overall simulation time and the performance of each algorithm (measured by the edge-cut size, the number of communication channels and the imbalance), we measured the correlation using a statistical metric: the Pearson product-moment correlation coefficient (PCC). The PCC quantifies the strength as well as the direction of the relationship between two variables. The correlation coefficient ranges from −1 (strong negative correlation) to 1 (strong positive correlation); a value of 0 implies that there is no correlation between the variables. The PCC between the simulation time (T) and the three analytical metrics (W, E and I) was computed for all the considered values of the parameter k.
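A minimal sketch of the PCC computation (the sample data is illustrative):

public class Pearson {
    // r(x, y) = cov(x, y) / (sigma_x * sigma_y)
    static double pcc(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - mx) * (y[i] - my);
            vx  += (x[i] - mx) * (x[i] - mx);
            vy  += (y[i] - my) * (y[i] - my);
        }
        return cov / Math.sqrt(vx * vy); // undefined when a variance is 0
    }

    public static void main(String[] args) {
        double[] t = {10, 12, 15, 21};     // e.g. simulation times
        double[] w = {100, 130, 160, 220}; // e.g. edge-cut sizes
        System.out.println(pcc(t, w));     // close to 1: strong positive correlation
    }
}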
In particular, four variables were considered, parameterized by the class N of networks (N ∈ {uk, data, 4elt, cti, t60k, wing, finan512, fe_ocean, powergrid}), the algorithm A (A ∈ {METIS, METIS Relaxed, KaFFPa, Ja-be-Ja, Random}), and the number of components k (k ∈ {2, 4, 8, 16, 32, 64}).
The following table reports the correlation r between the simulation time T and each analytical metric, for each value of k:

                        k
             2         4         8         16        32        64
r(T, W)    0.9256    0.9392    0.9431    0.9424    0.9473    0.9474
r(T, E)     N.A.¹    0.2265    0.3094    0.3349    0.3509    0.3922
r(T, I)   -0.2244   -0.2750   -0.2903   -0.3188   -0.2971   -0.3025

¹ The correlation between T(N, A, 2) and E(N, A, 2) cannot be computed since, for k = 2, all the partitioning strategies require exactly 1 communication channel, and so E(N, A, 2) has standard deviation equal to 0.

The simulation time is strongly and positively correlated with the edge-cut size, mildly and positively correlated with the number of communication channels, and negatively correlated with the imbalance. This last result is counterintuitive: theoretically, the greater the imbalance, the larger the simulation time should be, which would lead to a positive correlation. The key observation is that a small amount of imbalance has a limited impact on the simulation time but can be extremely helpful in reducing both the edge-cut size and the number of communication channels, which appears to have a noticeable payoff in terms of real performance.
This appendix describes some technical issues arising when dealing with simulations involving a high number of agents, real data, and a GIS-based environment. The goal is twofold: to analyze the computational and programming issues that arise when adding complexity to a relatively simple social simulation model, in particular showing how to use GIS data in a D-MASON simulation; and to show how distributed computing can support advances in the investigation of social issues.
At first glance, integrating GIS data into an ABM may be seen as an unnecessary level of detail; however, a simulation is a model-based approximation of a real system, so adding more details may produce better results. A GIS can be thought of as a high-tech equivalent of a geospatial map. This high-tech map provides a number of additional data about the environment and its status. This information can be used to model more complex interactions between the agents and the environment. Hence, as the complexity of the models grows, the introduction of parallel and distributed computing inevitably becomes necessary.
Many ABM tools support GIS data. Among them:
• MASON [Bal+03] toolkit (see also Section 2.2). MASON provides support for GIS data in an additional library named GeoMASON [Sul+10]. GeoMASON follows the same MASON design philosophy of being lightweight, modular, and efficient. GeoMASON represents basic GIS data as geometric shapes supported via the JTS (Java Topology Suite) API, which enables geometry-related operations. GeoMASON supports ESRI [for] shapefiles, providing a GeomVectorField Java object that represents the GIS data in memory, together with functionality to access geometries in order to read geospatial metadata and obtain the geospatial positions of objects.
• NetLogo [TW04] is a multi-agent programmable modeling environment. It is developed at The Center for Connected Learning (CCL) and Computer-Based Modeling at Northwestern University. NetLogo provides an extension to support GIS data fields. The extension allows the programmer to load vector GIS data (points, lines, and polygons) and raster GIS data (grids) into their models. The NetLogo extension also supports ESRI shapefiles.
The next sections describe how to use GIS data in a D-MASON simulation and the results obtained on a GIS simulation scenario.
GIS Campania dataset. The open-data platform of the “Regione Campania” provides a GIS dataset [Cam15] about its region. The dataset is an ESRI shapefile.
The ISTAT table contains information about 2.5M citizens living in Campania, who each day travel to work (or study) and then go back home. It contains two kinds of records: S-records and more detailed L-records. L-records expose the following data: city of residence, gender, reason to travel (work or study), city of work/study, vehicle (on foot, by car, by train, etc.), time of departure, and travel duration.
Since each record is aggregated (that is, each record represents several people with the same attributes), the time of departure and travel duration fields are quantized, meaning that data is represented in classes. For instance, the travel duration belongs to one of these classes: up to 15 minutes; from 16 to 30 minutes; from 31 to 60 minutes; 61 minutes or more. Considering this table format, we chose to randomize the data. For instance, if a record says that 100 people travel for 16-30 minutes, each of the 100 agents created will travel for a number of minutes randomly chosen in the 16-30 minutes interval.
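The following sketch illustrates this randomization; the upper bound chosen for the open class is an assumption of the sketch, not a value fixed by the model:

import java.util.Random;

public class DurationSampler {
    // ISTAT travel-duration classes (minutes); the upper bound of the open
    // class "61 minutes or more" is an assumption made for this sketch
    static final int[][] CLASSES = {{1, 15}, {16, 30}, {31, 60}, {61, 90}};

    public static void main(String[] args) {
        Random random = new Random();
        int classIdx = 1; // record: 100 people travel for 16-30 minutes
        for (int i = 0; i < 100; i++) {
            int lo = CLASSES[classIdx][0], hi = CLASSES[classIdx][1];
            int minutes = lo + random.nextInt(hi - lo + 1); // uniform in [lo, hi]
            System.out.println("agent " + i + " travels for " + minutes + " min");
        }
    }
}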
We now describe the cognitive aspect of the model. There is a set of norms (graphically depicted using different colors). For each norm, an agent holds a salience (a value in the [0,1] interval) that represents how important that norm is for the agent. The norm that is most important for an agent characterizes the agent's color (belief of the norm). Agents continuously interact with neighboring agents, trying to spread their opinion (color). When an agent meets an agent advertising a particular color, it increases the salience of that specific norm. Agents are influenced by neighbors throughout the day, but with different weights depending on the agent's state (traveling, staying at home, staying at work/study). Of course, saliences naturally decay over time.
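The following sketch summarizes this cognitive layer; the decay and reinforcement rates, as well as the state weights, are hypothetical, since their values are not fixed here:

public class NormAgent {
    // per-norm saliences in [0, 1]; index = norm (color)
    double[] salience;
    static final double DECAY = 0.995;    // multiplicative decay per step (assumed)
    static final double REINFORCE = 0.05; // base boost on contact (assumed)

    NormAgent(int norms) { salience = new double[norms]; }

    // a neighbor advertises norm n; the weight depends on the agent's state
    // (traveling, staying at home, staying at work/study)
    void meet(int n, double stateWeight) {
        salience[n] = Math.min(1.0, salience[n] + REINFORCE * stateWeight);
    }

    // called once per step: saliences naturally decay over time
    void decay() {
        for (int i = 0; i < salience.length; i++) salience[i] *= DECAY;
    }

    // the agent's color is the norm with the highest salience
    int color() {
        int best = 0;
        for (int i = 1; i < salience.length; i++)
            if (salience[i] > salience[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        NormAgent a = new NormAgent(4);
        a.meet(2, 1.0); // meets a neighbor advertising norm 2 while traveling
        a.decay();
        System.out.println(a.color()); // prints 2
    }
}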
Campania is divided into five provinces: Naples, Salerno, Benevento, Avellino and Caserta. Suppose that in each province of Campania there is a prevalent norm (i.e., 80% of the inhabitants are of that color). For instance: Naples is mostly Red; Salerno is mostly Green; Avellino, Benevento and Caserta are mostly Blue. Within the remaining 20% of the population, 15% will be Yellow (the color not advertised by any region), while the last 5% will be of a random color (different from the region's color). Back to our example: 80% of the people in the Salerno province will be Green, 15% will be Yellow, and the remaining 5% will be randomly chosen between Red and Blue.
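The following sketch illustrates the initial assignment just described; the enum values and method names are illustrative:

import java.util.Random;

public class InitialNorms {
    // within a province: 80% take the province color, 15% Yellow, 5% a random
    // color different from the province's (provinces are never Yellow here)
    enum Color { RED, GREEN, BLUE, YELLOW }

    static Color assign(Color provinceColor, Random r) {
        double u = r.nextDouble();
        if (u < 0.80) return provinceColor;
        if (u < 0.95) return Color.YELLOW;
        Color c;
        do { c = Color.values()[r.nextInt(3)]; } while (c == provinceColor);
        return c;
    }

    public static void main(String[] args) {
        // e.g. an inhabitant of the Salerno province (mostly Green)
        System.out.println(assign(Color.GREEN, new Random()));
    }
}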
The model simulates an entire day (24 hours) starting from midnight. When the simulation starts, agents are at their homes. The time of departure, travel duration, and time of stay at work/study all depend on the ISTAT data. Times and durations need to be converted into simulation steps: for instance, if the travel duration for an agent is from 16 to 30 minutes, the simulation will assign a random duration in that interval. This duration is then converted into a number of steps, according to the discretization time of the simulation (see Section B.4).
The size of the simulated field and the discretization time have a significant impact on the performance (in terms of efficiency) of the simulation. An agent moves at a speed computed by dividing its travel distance by its travel duration (in simulation steps). This gives the agent's speed, that is, the maximum distance covered in a single simulation step. The largest agents' travel distance is called the maximum agents' ride (α) and requires a certain number of steps (the maximum number of steps to perform a ride, β). We can thus compute the maximum speed (α/β): this parameter has a strong impact on the distributed model's performance (the smaller, the better).
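The following sketch illustrates the conversion; it assumes the discretization used in Section B.4, where 5 minutes of real life correspond to 200 simulation steps, i.e., 1.5 seconds per step:

public class SpeedSketch {
    // 5 minutes of real life correspond to 200 simulation steps (Section B.4)
    static final double SECONDS_PER_STEP = 300.0 / 200.0; // 1.5 s

    // agent speed = travel distance / travel duration expressed in steps,
    // i.e. the maximum distance covered in a single simulation step
    static double speedPerStep(double distanceMeters, int travelMinutes) {
        double steps = travelMinutes * 60 / SECONDS_PER_STEP;
        return distanceMeters / steps;
    }

    public static void main(String[] args) {
        // e.g. a 30 km ride lasting 25 minutes -> 1000 steps -> 30 m per step
        System.out.println(speedPerStep(30_000, 25));
    }
}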
Since citizens basically move on a map, the agents' space consists of a rectangular area. MASON includes the Continuous2D field, where the agents it contains are located by a pair of continuous coordinates ranging from point (0,0) to point (Width, Height). The distributed version embedded in D-MASON is called DContinuous2D: it retains all the features of the Continuous2D field, adding support for two approaches to distributing the field and the agents it contains: dividing the space into vertical rectangles (1-dimensional or horizontal space partitioning), or dividing it into a rows × columns matrix (2-dimensional or square space partitioning, see Figure B.3). The square (space-based) partitioning mode provides a significant speedup over the horizontal partitioning, lowering the communication effort while distributing the computational workload of the agents among the LPs.
The behavior of agents is influenced by GIS data (map, zones and cities); nevertheless, GIS data is static and does not require any synchronization among LPs. When reading ISTAT data, each LP manages an area of competence and takes care of the agents living in that area. This is done by reading each agent's home location from the ISTAT dataset, looking up the corresponding coordinates in the GIS data, and converting them into 2-dimensional D-MASON coordinates.
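The following sketch illustrates the idea of this conversion (it is not D-MASON's actual API): the region's bounding box, whose values here are merely illustrative, is scaled linearly onto the rectangular field:

public class GeoToField {
    // illustrative bounding box for the Campania region (lon/lat, degrees);
    // real code would read it from the shapefile metadata
    static final double MIN_LON = 13.7, MAX_LON = 15.9;
    static final double MIN_LAT = 39.9, MAX_LAT = 41.5;

    // linear scaling of geographic coordinates onto a [0, W] x [0, H] field
    static double[] toField(double lon, double lat, double w, double h) {
        double x = (lon - MIN_LON) / (MAX_LON - MIN_LON) * w;
        // flip latitude so that north is at the top of the field
        double y = (MAX_LAT - lat) / (MAX_LAT - MIN_LAT) * h;
        return new double[]{x, y};
    }

    public static void main(String[] args) {
        double[] p = toField(14.79, 40.68, 10_000, 10_000); // around Salerno
        System.out.println(p[0] + ", " + p[1]);
    }
}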
B.4 Experiments
Simulation Environment. A server test was performed to evaluate the performance of the model in D-MASON. Both communication strategies available in D-MASON were tested: AMQ (Apache ActiveMQ), the centralized communication strategy, and MPI, which uses the MPI standard [Cor+14a; Cor+14b]. Simulations have been performed on a cluster of eight computing nodes, each equipped as follows:
• Hardware:
– CPUs: 2 x Intel(R) Xeon(R) CPU E5-2680 @ 2.70GHz (#core 16, #threads
32)
– RAM: 256 GB
The performance trend was investigated by varying the agents' positioning over the field: Real is the positioning of the agents across the region according to the real data; Random is a strategy that places the agents uniformly at random over the whole field; CRandom places the agents uniformly at random, but only within the limits of the region.
Simulate 5 minutes of real life. In this test we are interested in evaluating how much time is needed to simulate 5 minutes of real life (which corresponds to 200 simulation steps). Four configurations were tested, using different partitioning schemes of the field into 4 × 4, 6 × 6, 8 × 8 and 10 × 10 cells assigned, respectively, to 16, 36, 64 and 100 LPs (one region per LP). Each configuration was performed with 2.5 million agents. Figure B.1 shows the results for each model (Real, Random and CRandom) and for each communication layer (MPI and AMQ). For each configuration, we show the total simulation time as well as how it is split between the communication overhead (which includes the management overhead introduced by D-MASON) and the computation time.
The Real test provides the worst performance and an unusual trend, due to an unbalanced communication overhead. We investigated this problem by analyzing the simulations with different agent positioning models, and we discovered that this trend is due to the non-uniform positioning of the agents (see Figure B.4).
Weak scalability. This experiment aims to evaluate the simulation efficiency while varying the total computation workload. In this test, four configurations were considered by changing the total number of agents: 10%, 40%, 70% and 100% (100% = 2.5M). Each configuration was performed on a 10 × 10 partitioning with 100 LPs. Figure B.2 depicts the results of the three models for each communication layer. Moreover, we also compare the performance with the sequential version of the model implemented in MASON (referred to as SEQ). The Random and CRandom tests exhibit similar behavior, showing good scalability. These results demonstrate the good performance of a 2-dimensional field partitioning under a uniform or quasi-uniform positioning density. On the other hand, the Real model exhibits the worst scalability (just a bit better than the sequential version). This result is due to the communication overhead, which is extremely irregular across the LPs. Table B.1 reports the speedups obtained during the weak scalability test. For each configuration, the minimum and maximum speedup are emphasized in bold. The best result has been obtained by the Random model with 70% of the computation amount and AMQ as communication layer; the worst performance is achieved by the Real positioning using the MPI communication layer. This confirms the hypothesis that the speedup is strongly related to the agents' distribution.
                         Workload
Test Name         10%      40%      70%      100%
AMQ - Real        3.36     2.49     1.97     1.74
MPI - Real        3.07     2.32     1.79     1.46
AMQ - Random     11.32    25.14    35.53    33.06
MPI - Random      7.63    20.01    27.29    27.78
AMQ - CRandom     7.69    21.13    29.14    31.86
MPI - CRandom     5.60    15.53    23.38    26.78

Tab. B.1.: Speedup obtained varying the workload; 10×10 partitioning, 5 minutes of real clock.
AOI ≥ max(NIR, α/β)    (B.1)

AOI ≥ √(W² + H²)/β ≥ (W + H)/(2β)    (B.3)
We can now evaluate the communication effort required by a GIS-based distributed simulation. For each region, the communication effort δc is obtained by counting:

δc = p × [ 4(W/√p)AOI + 4(H/√p)AOI + 8AOI² ] × d
   = 4p × d × AOI × (W + H)/√p + 8p × d × AOI²
   = 4√p × d × AOI × (W + H) + 8p × d × AOI²
   ≤ 8√p × d × β × AOI² + 8p × d × AOI²
   = 8√p × d × AOI² × (β + √p)    (B.4)

where the inequality follows from equation (B.3).
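A numeric sketch of the bound; the parameter values are illustrative and are not taken from the experiments:

public class CommEffort {
    // upper bound of equation (B.4): delta_c <= 8*sqrt(p)*d*AOI^2*(beta+sqrt(p))
    static double bound(int p, double d, double aoi, double beta) {
        return 8 * Math.sqrt(p) * d * aoi * aoi * (beta + Math.sqrt(p));
    }

    public static void main(String[] args) {
        // doubling the density d doubles the bound: a few very dense zones
        // dominate the communication effort
        System.out.println(bound(100, 5.0, 10.0, 200.0));
        System.out.println(bound(100, 10.0, 10.0, 200.0));
    }
}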
This analysis motivates the poor performance of the simulation in the Real agent-positioning experiment. Figure B.4 depicts the positioning of the agents over the geographical zones of the Campania region. The Real positioning produces many zones with a small number of agents, together with a small number of highly populated zones. Indeed, the density d over the field is non-uniform (the variance in the number of agents per zone is 302600129.2) and, by equation (B.4), the communication effort δc ≤ 8√p × d × AOI² × (β + √p) grows proportionally with the largest value of d.
The above results motivate the need for a non-uniform work partitioning strategy in DPABS, which has been developed and is described in Section 2.3.2. Section 2.6.1 provides a complete analysis of the performance of the non-uniform partitioning strategy in a real simulation scenario. These analyses show that the non-uniform partitioning strategy ensures better load balancing but introduces a communication overhead.
As discussed in Section 2.4, D-MASON provides distributed versions of the three main classes described above.
In D-MASON the simulation is named DParticles and is available in the D-MASON source code repository [Repce]. The Particle class, shown in Listing C.1, implements the Steppable interface and declares a constructor that takes two integer parameters, xdir and ydir (which represent the direction of a particle).
...
public class Particle implements Steppable {
    public boolean randomize = false;
    public int xdir; // -1, 0, or 1
    public int ydir; // -1, 0, or 1

    public Particle(int xdir, int ydir) { // direction of the particle
        this.xdir = xdir;
        this.ydir = ydir;
    }
...
In D-MASON, the DParticle class, shown in Listing C.2, extends RemoteParticle and declares two constructors: the first takes no parameters and is required for D-MASON object serialization; the second takes a DistributedState as parameter.
...
public class DParticle extends RemoteParticle<Int2D> {
    public int xdir; // -1, 0, or 1
    public int ydir; // -1, 0, or 1

    public DParticle() { } // required for D-MASON serialization

    public DParticle(DistributedState state) {
        super(state);
    }
...
The agent's behavior is defined, for both versions, in the method step. Listings C.3 and C.4 show, respectively, the code for the MASON and D-MASON implementations.
The reader can observe that the two implementations are basically the same; the difference consists in the way the particle movements are randomized. The changes in the two implementations are mainly due to the fact that MASON and D-MASON exploit different synchronization strategies: as discussed in Section 2.4.1, D-MASON provides a self-synchronized environment where all the agents update their status at step t considering the status of all neighboring agents at step t−1 (synchronous update); in MASON, on the other hand, agents are updated asynchronously, that is, their status reflects the changes of the agents that have already performed their computation in the current iteration. Consequently, the randomization of the movement in D-MASON is computed, synchronously, at the beginning of each agent's step method, while in MASON the randomization is performed asynchronously.
....
public void step(SimState state) {
    Tutorial3 tut = (Tutorial3) state;
    Int2D location = tut.particles.getObjectLocation(this);
    tut.trails.field[location.x][location.y] = 1.0;
    if (randomize) {
        xdir = tut.random.nextInt(3) - 1;
        ydir = tut.random.nextInt(3) - 1;
        randomize = false;
    }
    int newx = location.x + xdir;
    // ... (computation of newy, boundary handling and newloc
    //      elided in the original listing)
    Bag p = tut.particles.getObjectsAtLocation(newloc);
    if (p.numObjs > 1) {
        for (int x = 0; x < p.numObjs; x++)
            ((Particle) (p.objs[x])).randomize = true;
    }
}
....
public void step(SimState state) {
    DParticles tut = (DParticles) state;
    Int2D location = tut.particles.getObjectLocation(this);
    tut.trails.setDistributedObjectLocation(location, 1.0, state);
    Bag p = tut.particles.getObjectsAtLocation(location);
    if (p.numObjs > 1) {
        xdir = tut.random.nextInt(3) - 1;
        ydir = tut.random.nextInt(3) - 1;
    }
    int newx = location.x + xdir;
    int newy = location.y + ydir;
    if (newx < 0) { newx++; xdir = -xdir; }
    else if (newx >= tut.gridWidth) { newx--; xdir = -xdir; }
    if (newy < 0) { newy++; ydir = -ydir; }
    else if (newy >= tut.gridHeight) { newy--; ydir = -ydir; }
    Int2D newloc = new Int2D(newx, newy);
    tut.particles.setDistributedObjectLocation(newloc, this, state);
}
....
• The initialization of the simulation fields and of the agents. In Tutorial3 there are two fields: the former contains the agents, while the latter contains the trails. In the original simulation, after the initialization of the simulation fields, all agents are initialized and randomly positioned over the field. In D-MASON not all agents can be initialized and positioned, but only the agents that must be simulated by the corresponding LP. Hence, each LP initializes a portion of the agents (proportional to the size of the associated cell) and positions them in the associated cell. The method getAvailableRandomLocation enables the generation of a random position within a particular cell of a D-MASON simulation field.
• The method used to schedule agents. MASON schedules the agents using the method scheduleRepeating, which schedules an agent for every simulation step. In D-MASON, agents can migrate from one LP to another; for this reason, the method scheduleRepeating is forbidden (in a subsequent step the agent may have to be scheduled by another LP). D-MASON uses instead the method scheduleOnce, which schedules an agent only for the next simulation step. Hence, scheduleOnce has to be executed at each simulation step. This is performed by the method setDistributedObjectLocation which, by handling the migration of agents, is always aware of the agents that need to be scheduled at each simulation step.
public class Tutorial3 extends SimState {
    public DoubleGrid2D trails;
    public SparseGrid2D particles;
    ...
    public Tutorial3(long seed) {
        super(seed);
    }

    public void start() {
        ...
        for (int i = 0; i < numParticles; i++) {
            p = new Particle(random.nextInt(3) - 1,
                             random.nextInt(3) - 1); // random direction
            schedule.scheduleRepeating(p);
            ...
            particles.setObjectLocation(p, new Int2D(x, y)); // random location
        }
    }

    public static void main(String[] args) {
        doLoop(Tutorial3.class, args);
        ...
    }
}
...
public class DParticles extends DistributedState<Int2D> {
    protected DSparseGrid2D particles;
    protected DDoubleGrid2D trails;
    protected SparseGridPortrayal2D p;
    public int gridWidth;
    public int gridHeight;
    public int MODE;
    private String topicPrefix = "";

    public DParticles() { super(); }

    public DParticles(GeneralParam params, String prefix) {
        super(params, new DistributedMultiSchedule<Int2D>(),
              prefix, params.getConnectionType());
        this.MODE = params.getMode();
        this.topicPrefix = prefix;
        gridWidth = params.getWidth();
        gridHeight = params.getHeight();
    }

    @Override
    public void start() {
        super.start();
        try {
            particles = DSparseGrid2DFactory.createDSparseGrid2D(gridWidth,
                    gridHeight, this,
                    super.AOI, TYPE.pos_i, TYPE.pos_j,
                    super.rows, super.columns, MODE,
                    "particles", topicPrefix, false);
            trails = DDoubleGrid2DFactory.createDDoubleGrid2D(gridWidth,
                    gridHeight, this,
                    super.AOI, TYPE.pos_i, TYPE.pos_j,
                    super.rows, super.columns, MODE,
                    0, false, "trails", topicPrefix, false);

            init_connection();

        } catch (DMasonException e) { e.printStackTrace(); }

        DParticle p = new DParticle(this);
        int agentsToCreate = 0;
        int remainder = super.NUMAGENTS % super.NUMPEERS;
        if (remainder == 0) {
            // ... (computation of agentsToCreate elided in the original listing)
        }
        while (particles.size() != agentsToCreate) {
            p.setPos(particles.getAvailableRandomLocation());
            p.xdir = random.nextInt(3) - 1;
            p.ydir = random.nextInt(3) - 1;
            if (particles.setObjectLocation(p,
                    new Int2D(p.pos.getX(), p.pos.getY()))) {
                schedule.scheduleOnce(schedule.getTime() + 1.0, p);
                if (particles.size() != super.NUMAGENTS)
                    p = new DParticle(this);
            }
        }
        Steppable decreaser = new Steppable() {
            @Override
            public void step(SimState state) { trails.multiply(0.9); }
        };
        schedule.scheduleRepeating(Schedule.EPOCH, 2, decreaser, 1);
    }

    public static void main(String[] args) {
        doLoop(DParticles.class, args);
        System.exit(0);
    }

    @Override
    public DistributedField2D getField() { return particles; }

    @Override
    public SimState getState() { return this; }

    @Override
    public void addToField(RemotePositionedAgent<Int2D> rm, Int2D loc) {
        particles.setObjectLocation(rm, loc);
    }

    public boolean setPortrayalForObject(Object o) {
        if (p != null) {
            p.setPortrayalForObject(o,
                new sim.portrayal.simple.OvalPortrayal2D(Color.YELLOW));
            return true;
        }
        return false; // (elided in the original listing)
    }
}
C.1.3 (D)Visualization
The visualization of the simulation is provided, for MASON, in the class Tutorial3WithUI (see Listing C.7) while, for D-MASON, it is in the class DParticlesWithUI (see Listing C.8). Both classes are subclasses of the MASON class GUIState. In this case there are no significant differences between the two implementations: the visualization refers to the status of the corresponding LP, so the status of the other LPs is not required.
public class Tutorial3WithUI extends GUIState {
    ...
    public static void main(String[] args) {
        Tutorial3WithUI t = new Tutorial3WithUI();
        t.createController();
    }

    public Tutorial3WithUI() {
        super(new Tutorial3(System.currentTimeMillis()));
    }

    public Tutorial3WithUI(SimState state) {
        super(state);
    }
    ...
}
public class DParticlesWithUI extends GUIState {
    ...
    public static String name;
    ...
    public DParticlesWithUI(Object[] args) {
        super(new DParticles(args));
        name = String.valueOf(args[7]) + " " + (String.valueOf(args[8]));
    }

    public static String getName() {
        return "Peer: <" + name + ">";
    }
    ...
}
Both commands above assume that the nodes of the HPC system use SSH key-based authentication¹. Finally, it is possible to execute the simulation using the system management tool (see Section 2.4.3), which enables access to the Master web control from any web browser, where new simulations can be submitted. Simulations consist of Java JAR containers that comprise all the classes and resources required to perform a D-MASON simulation.

¹ SSH Key-Based Authentication, https://cs.calvin.edu/courses/cs/374/MPI/ssh.html, accessed on February 28, 2017.
[AK95] C.J. Alpert and A. B. Kahng. “Recent directions in netlist partitioning: A survey”.
In: Integration: The VLSI Journal (1995) (cit. on p. 173).
[And04] D.P. Anderson. “BOINC: A System for Public-Resource Computing and Storage”.
In: Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing.
2004 (cit. on p. 24).
[Arm+14] T.G. Armstrong, J.M. Wozniak, M. Wilde, and I.T. Foster. “Compiler techniques for massively scalable implicit task parallelism”. In: SC. 2014 (cit. on p. 132).
[Bad+13] D.A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner. “Graph Partitioning and
Graph Clustering - 10th DIMACS Implementation Challenge Workshop, Georgia
Institute of Technology, Atlanta, GA, USA, February 13-14, 2012. Proceedings”.
In: 2013 (cit. on pp. 54, 174).
[Bak+06] M. Baker, B. Carpenter, and A. Shafi. “MPJ Express: towards thread safe Java
HPC”. In: Cluster Computing, 2006 IEEE International Conference on. 2006 (cit.
on p. 61).
[Bak+99] M. Baker, B. Carpenter, G. Fox, S. H. Ko, and S. Lim. “mpiJava: An object-
oriented Java interface to MPI”. In: Parallel and Distributed Processing. 1999
(cit. on p. 61).
[Bea96] D. M. Beazley. “SWIG: An easy to use tool for integrating scripting languages
with C and C++”. In: USENIX Tcl/Tk Workshop. 1996 (cit. on p. 145).
[BM05] J.P. Brans and B. Mareschal. Multiple Criteria Decision Analysis: State of the Art
Surveys. Springer Science, 2005 (cit. on p. 118).
[Car+99] B. Carpenter, G.C. Fox, S.H. Ko, and S. Lim. mpijava 1.2: API Specification. 1999
(cit. on p. 61).
[CC95] R. Conte and C. Castelfranchi. Cognitive and Social Action. 1995 (cit. on p. 25).
[CH05] B. Calvez and G. Hutzler. “Parameter Space Exploration of Agent-Based Models”.
In: Knowledge-Based Intelligent Information and Engineering Systems. 2005 (cit.
on pp. 54, 116, 173).
[Che+08] D. Chen, G.K. Theodoropoulos, S.J. Turner, W. Cai, R. Minson, and Y. Zhang.
“Large scale agent-based simulation on the grid”. In: Future Gener. Comput. Syst.
(2008) (cit. on p. 27).
[CL10] C. Chen and L.H. Lee. Stochastic Simulation Optimization: An Optimal Computing
Budget Allocation. World Scientific Publishing Co., Inc., 2010 (cit. on pp. 117,
118).
[CN11] N. Collier and M. North. “A Platform for Large-scale Agent-based Modeling”. In:
W. Dubitzky, K. Kurowski, and B. Schott, eds., Large-Scale Computing Techniques
for Complex System Simulations, Wiley. 2011 (cit. on pp. 26, 27, 29).
[CN12] N. Collier and M. North. “Parallel agent-based simulation with Repast for
High Performance Computing”. In: SIMULATION: Transactions of the Society for
Modeling and Simulation International (2012) (cit. on p. 143).
[CN15] N. Collier and M. North. Repast Java Getting Started. 2015 (cit. on p. 138).
[com] GIS ABM community. URL: http://www.gisagents.org (cit. on pp. 183, 192).
[Con99] R. Conte. “Social Intelligence Among Autonomous Agents”. In: Comput. Math.
Organ. Theory (1999) (cit. on p. 25).
[Cor+14a] G. Cordasco, A. Mancuso, F. Milone, and C. Spagnuolo. “Communication Strate-
gies in Distributed Agent-Based Simulations: The Experience with D-Mason”.
In: Proceedings of the 1st Workshop on Parallel and Distributed Agent-Based
Simulations (PADABS). Euro-Par 2013. 2014 (cit. on pp. 18, 59, 83, 176, 187).
[Csi+12] K. Csillery, O. Francois, and M.G.B. Blum. “abc: an R package for approximate
Bayesian computation (ABC)”. In: Methods in Ecology and Evolution. 2012 (cit.
on p. 137).
[CW13] A.T. Crooks and S. Wise. “GIS and Agent-Based models for Humanitarian Assistance”. In: Computers, Environment and Urban Systems (2013) (cit. on pp. 184, 192).
[D.P+84] J.H. Saltzer, D.P. Reed, and D.D. Clark. “End-to-end arguments in system design”. In: ACM Trans. Comput. Syst. (1984) (cit. on p. 31).
[dat16] ISTAT Commuters 2011 dataset. 2016. URL: http : / / www . istat . it / it /
archivio/139381 (cit. on p. 185).
[Dur+16] F.R. Duro, J.G. Blas, F. Isaila, J.M. Wozniak, J. Carretero, and R. Ross. “Flexible
data-aware scheduling for workflows over an in-memory object store”. In:
CCGrid. 2016 (cit. on p. 98).
[EK10] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a
Highly Connected World. Cambridge University Press, 2010 (cit. on p. 54).
[Eps+07] J.M. Epstein, S.A. Levin, and S.H. Strogatz, eds. Generative Social Science:
Studies in Agent-Based Computational Modeling. Princeton University Press,
2007 (cit. on pp. 24, 25).
[exp13] MPJ Express Performance. 2013. URL: http://mpjexpress.org/performance.html (cit. on p. 61).
[F.+13] F. Jabot, T. Faure, and N. Dumoulin. “EasyABC: performing efficient approximate Bayesian computation sampling schemes using R”. In: Methods in Ecology and Evolution 4. 2013 (cit. on p. 137).
[For+12] F.A. Fortin, F.M.D. Rainville, M.A. Gardner, M. Parizeau, and C. Gagn. “DEAP:
Evolutionary Algorithms Made Easy”. In: Journal of Machine Learning Research.
2012 (cit. on pp. 137, 142).
[Fos95] I. Foster. Designing and Building Parallel Programs: Concepts and Tools for Parallel
Software Engineering. Addison-Wesley Longman Publishing Co., Inc., 1995 (cit.
on p. 24).
[fun16] Swift/T High Performance Dataflow Computing: Defining Leaf Functions. 2016. URL: http://swift-lang.github.io/swift-t/guide.html (cit. on p. 138).
[Ghe+03] S. Ghemawat, H. Gobioff, and S. Leung. “The Google File System”. In: ACM
Symposium on Operating Systems Principles (2003) (cit. on p. 121).
[GJ90] M.R Garey and D.S. Johnson. Computers and Intractability; A Guide to the Theory
of NP-Completeness. W. H. Freeman & Co., 1990 (cit. on pp. 54, 174).
[GT05] N. Gilbert and K. Troitzsch. Simulation for the social scientist. 2005 (cit. on
p. 183).
[Gui16] Swift/T High Performance Dataflow Computing Developer Guide. 2016. URL:
http://swift-lang.github.io/swift-t/guide.html (cit. on p. 109).
[Gup96] A. Gupta. Graph partitioning based sparse matrix orderings for interior-point
algorithms. IBM Thomas J. Watson Research Division, 1996 (cit. on p. 173).
[Haf+11] M. Hafeez, S. Asghar, U.A. Malik, A. ur Rehman, and N. Riaz. “Survey of MPI
Implementations”. In: Digital Information and Communication Technology and
Its Applications. Springer, 2011 (cit. on p. 61).
[He+10] D. He, L. Lee H., C. Chen, M. Fu, and S. Wasserkrug. “Simulation Optimization
Using the Cross-entropy Method with Optimal Computing Budget Allocation”.
In: ACM Trans. Model. Comput. Simul. (2010) (cit. on pp. 20, 115).
[Hil90] M.D. Hill. “What is Scalability?” In: SIGARCH Comput. Archit. News (1990)
(cit. on p. 4).
[HX98] K. Hwang and Z. Xu. Scalable parallel computing: technology, architecture, pro-
gramming. WCB/McGraw-Hill, 1998 (cit. on pp. 30, 33).
[Ian+15] L. Iandoli, I. Quinto, A. De Liddo, and S. Buckingham Shum. “On online collabo-
ration and construction of shared knowledge: Assessing mediation capability in
computer supported argument visualization tools”. In: Journal of the Association
for Information Science and Technology (2015) (cit. on p. 166).
[Jp] Groovy A multi-faceted language for the Java platform. URL: http : / / www .
groovy-lang.org/ (cit. on p. 102).
[KK98] G. Karypis and V. Kumar. “Multilevel k-way Partitioning Scheme for Irregular
Graphs”. In: Journal of Parallel and Distributed Computing (1998) (cit. on pp. 54,
175).
[KL70] B.W. Kernighan and S. Lin. “An Efficient Heuristic Procedure for Partitioning
Graphs”. In: The Bell Systems Technical Journal (1970) (cit. on pp. 173, 174).
[Kle10] M. Klein. “Using metrics to enable large-scale deliberation”. In: Collective intel-
ligence in organizations: A workshop of the ACM Group 2010 Conference. 2010
(cit. on p. 165).
[Kuh08] M. Kuhn. “Building Predictive Models in R Using the caret Package”. In: Journal
of Statistical Software. 2008 (cit. on p. 137).
[Law07] A.M. Law. Simulation modeling and analysis. McGraw-Hill, 2007 (cit. on pp. 20,
115–117).
[Mam+] A.R. Mamidala, R. Kumar, D. De, and D.K. Panda. MPI Collectives on Modern Mul-
ticore Clusters: Performance Optimizations and Communication Characteristics
(cit. on p. 62).
[Mat08] B. Matthew. “Review of Software Platforms for Agent Based Models”. In: (2008)
(cit. on pp. 24, 26, 28, 29).
[Mes97] Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface. 1997 (cit. on p. 62).
[MG92] J. Misra and D. Gries. “A Constructive Proof of Vizing’s Theorem.” In: Inf. Process.
Lett. (1992) (cit. on p. 65).
[MN+13] M.J. North, N.T. Collier, J. Ozik, E.R. Tatara, C.M. Macal, M. Bragen, and P. Sydelko. “Complex adaptive systems modeling with Repast Simphony”. In: Complex Adaptive Systems Modeling. 2013 (cit. on p. 118).
[MN05] C.M. Macal and M.J North. “Tutorial on Agent-based Modeling and Simulation”.
In: Proceedings of the 37th Conference on Winter Simulation. 2005 (cit. on
pp. 115, 116).
[MO15] K. Moreland and R. Oldfield. “Formal Metrics for Large-Scale Parallel Perfor-
mance”. In: ed. by J.M. Kunkel and T. Ludwig. Springer International Publishing,
2015 (cit. on p. 13).
[Mus+09] N. Mustafee, S.J.E. Taylor, K. Katsaliaki, and S. Brailsford. “Facilitating the Anal-
ysis of a UK National Blood Service Supply Chain Using Distributed Simulation”.
In: Simulation (2009) (cit. on pp. 24, 29).
[Naj+01] R. Najlis, M.A. Janssen, and D. C. Parkerx. “Software tools and communica-
tion issues”. In: Proc. Agent-Based Models of Land-Use and Land-Cover Change
Workshop. 2001 (cit. on pp. 24, 28, 29).
[Nel10] B.L. Nelson. “Optimization via simulation over discrete decision variables”. In:
Tutorials in operations research (2010) (cit. on p. 117).
[New06] M.E.J. Newman. “Modularity and community structure in networks”. In: Pro-
ceedings of the National Academy of Sciences (PNAS) (2006) (cit. on p. 173).
[Nor+07] M.J. North, T.R. Howe, N.T. Collier., and J.R. Vos. “Declarative Model Assembly
Infrastructure for Verification and Validation”. In: S. Takahashi, D.L. Sallach and
J. Rouchier, eds., Advancing Social Simulation (2007) (cit. on pp. 26, 29, 48,
116).
[opt] OptTek metaheuristic optimization. URL: http://www.opttek.com (cit. on
p. 119).
[Ozi+14] J. Ozik, M. Wilde, N. Collier, and C.M. Macal. “Adaptive Simulation with Repast
Simphony and Swift”. In: Euro-Par 2014: Parallel Processing Workshops. 2014
(cit. on p. 118).
[Ozi+15] J. Ozik, N. T. Collier, and J.M. Wozniak. “Many Resident Task Computing in
Support of Dynamic Ensemble Computations”. In: 8th Workshop on Many-Task
Computing on Clouds, Grids, and Supercomputers Proceedings. 2015 (cit. on
pp. 98, 133, 142).
[Ozi+16a] J. Ozik, N.T. Collier, J.M. Wozniak, and C. Spagnuolo. “From Desktop To Large-
scale Model Exploration with Swift/T.” In: Winter Simulation Conference (WSC)
2016. 2016 (cit. on p. 21).
[PE08] Hazel R. Parry and Andrew J. Evans. “A comparative analysis of parallel pro-
cessing and super-individual methods for improving the computational per-
formance of a large individual-based model”. In: Ecological Modelling (2008)
(cit. on p. 31).
[Rai+06] F.L. Railsback, L. Lytinen Steven, and K.S. Jackson. “Agent-based Simulation
Platforms: Review and Development Recommendations”. In: Simulation (2006)
(cit. on pp. 24, 28, 29).
[Repce] D-MASON Official GitHub Repository. Accessed October 2016. URL: https:
//github.com/isislab-unisa/dmason (cit. on pp. 81, 196, 203).
[Reu+13] R. Reuillon, M. Leclaire, and S. Rey-Coyrehourcq. “OpenMOLE, a Workflow En-
gine Specifically Tailored for the Distributed Exploration of Simulation Models”.
In: Future Gener. Comput. Syst. (2013) (cit. on pp. 119, 130).
[Rey87] C. Reynolds. “Flocks, Herds and Schools: A Distributed Behavioral Model”. In:
SIGGRAPH Comput. Graph. (1987) (cit. on pp. 66, 82).
[Set12] B. Settles. “Active Learning”. In: Synthesis Lectures on Artificial Intelligence and
Machine Learning. 2012 (cit. on pp. 137, 145).
[SS13] P. Sanders and C. Schulz. “Think Locally, Act Globally: Highly Balanced Graph
Partitioning”. In: Proceedings of the 12th International Symposium on Experimen-
tal Algorithms (SEA’13). 2013 (cit. on p. 175).
[ST96] D.A. Spielman and S.H. Teng. “Spectral partitioning works: planar graphs and finite element meshes”. In: Proceedings 37th Annual Symposium on Foundations of Computer Science. 1996 (cit. on p. 175).
[Sto11] F.J. Stonedahl. “Genetic Algorithms for the Exploration of Parameter Spaces in
Agent-based Models”. In: P.h.D. Thesis. 2011 (cit. on p. 119).
[Sul+10] K. Sullivan, M. Coletti, and S. Luke. GeoMason: GeoSpatial support for MASON.
2010 (cit. on p. 184).
[Tan+11] W. Tang, D.A. Bennett, and S. Wang. “A parallel agent-based model of land use opinions”. In: Journal of Land Use Science (2011) (cit. on p. 31).
[Tur13] G. Turkington. Hadoop Beginner’s Guide. Packt Publishing, 2013 (cit. on pp. 21,
117).
[TW04] S. Tisue and U. Wilensky. “NetLogo: A simple environment for modeling com-
plexity”. In: International Conference on Complex Systems. 2004 (cit. on pp. 21,
26, 116, 117, 119, 184).
[Wei+00] T. WeiQin, Y. Hua, and Y. WenSheng. “PJMPI: pure Java implementation of MPI”.
In: High Performance Computing in the Asia-Pacific Region, 2000. Proceedings.
The Fourth International Conference/Exhibition on. 2000 (cit. on p. 61).
[Wil+09] M. Wilde, I. Foster, K. Iskra, P. Beckman, Z. Zhang, A. Espinosa, M. Hategan, B. Clifford, and I. Raicu. “Parallel scripting for applications at the petascale and beyond”. In: Computer. 2009 (cit. on p. 95).
[Woz+15] J.M. Wozniak, T.G. Armstrong, K.C. Maheshwari, D.S. Katz, and M. Wilde.
“Interlanguage parallel scripting for distributed-memory scientific computing”.
In: WORKS at SC. 2015 (cit. on p. 145).
[ZZ04] B. Zhou and S. Zhou. “Parallel simulation of group behaviors”. In: WSC ’04:
Proceedings of the 36th conference on Winter simulation. 2004 (cit. on p. 34).
[A J16] A JavaScript library for image- and vector-tiled maps using SVG. 2016. URL:
http://polymaps.org/ (cit. on p. 158).
[A V16] A Versatile and Expandable jQuery Plotting Plugin. 2016. URL: http://www.
jqplot.com/ (cit. on p. 158).
[Bos16] Bosonic a practical collection of everyday Web Components. 2016. URL: http:
//bosonic.github.io/ (cit. on p. 157).
[D3.16] D3.js a JavaScript library for manipulating documents based on data. 2016.
URL : https://d3js.org/ (cit. on p. 158).
[Dim16] Dimple an object-oriented API for business analytics powered by D3.js. 2016.
URL : http://dimplejs.org/ (cit. on p. 159).
[Dyg16] Dygraphs a fast, flexible open source JavaScript charting library. 2016. URL:
http://dygraphs.com/ (cit. on p. 157).
[Goo16b] Google Inc. Polymer Library. 2016. URL: https://www.polymer-project.org/
1.0/ (cit. on pp. 71, 157, 159).
[Jav] JavaScript the programming language of HTML and the Web. URL: http :
//www.w3schools.com/js/ (cit. on p. 102).
[Lea16] Leaflet open-source JavaScript library interactive maps. 2016. URL: http://
leafletjs.com/ (cit. on pp. 154, 157).
[MPI13] MPI Standard Official Website. 2013. URL: http : / / www . mcs . anl . gov /
research/projects/mpi/index.htm (cit. on p. 61).
[Ope16c] Open MPI: Open Source High Performance Computing. 2016. URL: http :
//www.open-mpi.org/ (cit. on p. 59).
[Ope16d] Open Source Data Portal Sofware. 2016. URL: http : / / ckan . org/ (cit. on
pp. 153, 166).
[Rap16] Raphaël a small JavaScript library that should simplify your work with vector
graphics on the web. 2016. URL: http://dmitrybaranovskiy.github.io/
raphael/ (cit. on p. 158).
[Thece] The Graph Partitioning Archive. Accessed May 2015. URL: http://staffweb.
cms.gre.ac.uk/~wc06/partition/ (cit. on p. 175).
[X-T16] X-Tag is a Microsoft supported, open source, JavaScript library that wraps the
W3C standard Web Components. 2016. URL: http : / / x - tag . github . io/
(cit. on p. 156).
[jQu16] jQuery Visualize. HTML5 canvas charts driven by HTML table elements. 2016.
URL : https : / / github . com / filamentgroup / jQuery - Visualize (cit. on
p. 159).
List of Figures
1.1 Computational Science areas and their relations with the contributions of this work.
1.2 Systems Architecture Share from Top500.
1.3 Flynn's Taxonomy.
1.4 Microprocessor transistor counts from 1971 to 2011.
1.5 CPU performance, 1978 to 2010.
1.6 Clock speed, 1978 to 2010.
1.7 Symmetric multiprocessing architecture.
1.8 Multi-core architecture.
1.9 Multi-computers architecture.
1.10 Example program segments.
1.11 Power-cost relationship according to Grosch's law.
1.12 Scaling study results for the EMEWS ABM calibration workflow on IBM Blue Gene/Q.
2.17 MPI_Bcast approach.
2.18 MPI_Gather approach.
2.19 Possible simultaneous communications using 4 LPs and uniform partitioning mode.
2.20 Parallel approach.
2.21 Performance comparison among JMS Strategy, Bcast, Gather, Parallel. The X-axis represents the number of agents, while the Y-axis represents the time difference, expressed as a percentage, compared to the JMS Strategy (lower is better).
2.22 Master control panel.
2.23 Workers seen from the Master.
2.24 Simulation Controller.
2.25 Simulations view.
2.26 Simulation Controller (left) and Simulation Info (right).
2.27 History view.
2.28 Visualization Strategy.
2.29 Zoom App application architecture.
2.30 Global Viewer Web application.
2.31 D-MASON on the Cloud: Architecture.
2.32 Field Partitioning Strategies: Weak Scalability.
2.33 Field Partitioning Strategies: Strong Scalability.
2.34 Communication scalability.
2.35 Computation scalability.
2.36 D-MASON Weak Scalability.
2.37 D-MASON Strong Scalability.
2.38 D-MASON scalability beyond the limits of sequential computation.
2.39 D-MASON performance on the Cloud and HPC systems.
5.1 Centralized cloud model (left) versus Edge-centric Computing (right).
5.2 Edge-centric Computing Architecture of DatalEt-Ecosystem Provider.
5.3 Datalets Object-Oriented paradigm embedding. The four layers of the DEEP datalets architecture.
5.4 Datalet lifecycle.
5.5 An example of a datalet.
4.1 Completion time (s) with different test settings, where n is the number of simulations performed per loop and p is the number of cluster nodes.