IoT@Run Time Uniandes
A Dissertation
by
Iván Alfonso
Submitted to the Faculty of Engineering of the
Universidad de los Andes
in partial fulfillment of the requirements for the Degree of
Doctor in Engineering
and
Submitted to the Faculty of Computer Science, Multimedia and
Telecommunications of the
Universitat Oberta de Catalunya
in partial fulfillment of the requirements for the Degree of
Doctor in Network and Information Technologies
December 2022
Abstract
In recent years, the Internet of Things (IoT) has expanded its fields and areas of
application, becoming a key component in industrial processes and even in the ac-
tivities we perform daily. The growth of IoT has generated increasingly restrictive
requirements, mainly in systems that analyze information in real time. Similarly,
IoT system architectures have evolved to implement new strategies and patterns
(such as edge and fog computing) to meet system requirements. Traditionally, an
IoT system was composed of two layers: the device layer (sensors and actuators)
and the cloud layer for information processing and storage. Today, most IoT sys-
tems leverage edge and fog computing to bring computation and storage closer to
the device layer, decreasing bandwidth consumption and latency. Although the use
of these multi-layer architectures can improve performance, it is challenging to de-
sign them because the dynamic and changing IoT environment can impact Quality
of Service (QoS) and system operation. IoT systems are often exposed to changing
environments that induce unexpected runtime events such as signal strength in-
stability, latency growth and software failures. To cope with these events, system
adaptations should be automatically executed at runtime, i.e., IoT systems should
have self-adaptation capabilities.
In this sense, better support in the design, deployment, and self-adaptation
stages of multilayer IoT systems is needed. However, the tools and solutions found
in the literature do not address the complexity of multi-layered IoT systems, and
the languages for specifying the adaptation rules that govern the system at runtime
are limited.
Therefore, we propose a model-based approach that addresses the limitations
of existing studies to support the design, deployment, and management of
self-adaptive IoT systems. Our solution is divided into two stages:
Modeling (design time): to support the design tasks, we propose a Domain-Specific Language (DSL) that makes it possible to specify the multi-layered architecture of the IoT system, the deployment of container-based applications, and the rules for self-adaptation at runtime. Additionally, we design a code generator that produces YAML manifests for the deployment and management of the IoT system at runtime.
Resumen
In recent years, the Internet of Things (IoT) has expanded its fields and areas of application, becoming a key component in industrial processes and even in the activities we perform daily. The growth in the use of IoT has generated increasingly restrictive requirements, mainly in systems that require real-time information analysis. Likewise, IoT system architectures have evolved to implement new strategies and patterns (such as edge and fog computing) that make it possible to meet system requirements. Traditionally, an IoT system was composed of two layers: the device layer (sensors and actuators) and the cloud layer for information processing and storage. Today, most IoT systems leverage edge and fog computing to physically bring computation and storage closer to the device layer, thereby reducing bandwidth consumption and latency. Although the use of these multi-layer architectures favors performance, designing them is challenging because the dynamic and changing IoT environment can impact the Quality of Service (QoS) and the operation of the system. IoT systems are often exposed to changing environments that induce unexpected runtime events such as signal strength instability, latency growth, and software failures. To cope with these events, system adaptations should be executed automatically at runtime, i.e., IoT systems should have self-adaptation capabilities.
In this sense, support is needed in the design, deployment, and self-adaptation stages of multi-layer IoT systems. However, the tools and solutions found in the literature do not address the complexity of multi-layer IoT systems, and the languages for specifying the adaptation rules that govern the system at runtime are limited.
In this thesis, we propose a model-based solution that overcomes the limitations of existing studies to support the design, deployment, and management of self-adaptive IoT systems. Our solution is divided into two stages:
Modeling (design time): to support the design tasks, we propose
Keywords: Internet of Things, Model-Driven Engineering, Self-Adaptive System, Domain-Specific Language, Edge and Fog Computing
Acknowledgements
The development of this thesis has been possible thanks to the support of my
family, friends, and colleagues. To each and every one, thank you.
First and foremost, I would like to thank my three mentors: Kelly, Harold, and
Jordi. Thank you for giving me the opportunity to do a PhD thesis under your
supervision, for trusting me, and for dedicating your time to me. I am very grateful
to Kelly and Harold for their advice from the beginning of the thesis. They have
shaped me as a researcher and have taught me to love this beautiful work. To Jordi,
I want to thank him for his advice, guidance, patience, rigor, and friendship.
Thank you for the opportunity to meet and join your great working group (SOM
Research Lab) in Barcelona.
I would also like to thank the jury for their time and effort in evaluating this
dissertation. Your comments, questions, and suggestions will be used to further
improve this thesis.
I would also like to thank my friends at the Software Evolution Lab and the SOM Research
Lab for all their collaboration in this research and their support as friends and colleagues.
I especially want to thank Olga Vega, Jaime Chavarriaga, Edouard Batot, Marc Oriol,
Marcos Gómez, Abel Gómez, Joan Giner and the others for their friendship and
support.
This dissertation has been partially funded by a PhD scholarship from MINCIENCIAS (Becas del Bicentenario program, 2018). It has also been funded by a teaching assistantship at Universidad de los Andes and a joint project (the European TRANSACT project) with the SOM Research Lab, UOC. I am very grateful to all these institutions for the opportunity they gave me.
I would also like to thank all my family: Jesus, Marlen, Andrea, Andres, Marien,
and Maleja. Thank you for helping and supporting me during these years of work.
Without your trust and love this would not have been possible.
Last but not least, I would like to thank my wife Karen for her love, patience, and support
in all the difficult and happy moments. Thank you for always being by my side.
Contents
Abstract i
Resumen iii
Acknowledgements v
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Organization of the Document . . . . . . . . . . . . . . . . . . . . . 5
3 Overview 33
3.1 Framework Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.1 Design time stage . . . . . . . . . . . . . . . . . . . . . . . . 34
3.1.2 Runtime stage . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7 Experimental Evaluation 97
7.1 DSL Empirical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 97
7.1.1 Experimental Study 1: DSL Validation - Architectural Con-
cepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.1.2 Experimental Study 2: DSL Validation - Mining Concepts . 100
7.1.3 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . 104
7.2 Evaluation of System Self-Adaptations . . . . . . . . . . . . . . . . 104
7.2.1 Experiment Design and Setup . . . . . . . . . . . . . . . . . 105
7.2.2 Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.2.4 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . 112
7.3 Evaluation of Framework Scalability . . . . . . . . . . . . . . . . . . 115
7.3.1 Design and Setup . . . . . . . . . . . . . . . . . . . . . . . . 116
7.3.2 Experiment Protocol . . . . . . . . . . . . . . . . . . . . . . 120
7.3.3 Results and Analysis . . . . . . . . . . . . . . . . . . . . . . 121
7.3.4 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . 124
7.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Bibliography 151
Introduction
The emergence of the Internet of Things (IoT) has dramatically changed how phys-
ical objects are conceived in our society and industry. In this new IoT era, every
object becomes a complex cyber-physical system (CPS) [87], where both the physi-
cal characteristics and the software that manages them are highly intertwined. More
specifically, IoT is defined by the International Telecommunication Union (ITU) as
a "global infrastructure for the information society, enabling advanced services by
interconnecting (physical and virtual) things based on existing and evolving inter-
operable information and communication technologies" [155].
Although the term IoT was first used by Kevin Ashton in 1999 to describe a
system in which the physical world is connected to the Internet through sensors and
RFID (Radio Frequency Identification) technologies, other projects had already been
addressing the connection of devices to the Internet years earlier.
In 1982, the first thing was connected to the Internet: at Carnegie Mellon University, a Coca-Cola vending machine was connected to check the availability and temperature of its drinks. In 1990, a toaster was connected to the Internet via the TCP/IP protocol to monitor its usage time. In 1993, students at Cambridge University connected the first camera, used to monitor the availability of coffee in the department's coffee machine. In 1999, the same year the term IoT was coined, device-to-device communication was introduced by Bill Joy in his taxonomy of the Internet.
In 2000, the popularity of wireless connections began to grow, together with
the number of objects connected to the Internet; that year, LG launched the first
internet-connected refrigerator. 2008 was the first year in which the number of devices connected to the Internet exceeded the number of people connected. Although
the term IoT was already being used in closed communities, Kevin Ashton first introduced it in his article titled "That 'Internet of Things' Thing" [12]. Since then, IoT
has grown considerably, and many startups and companies now work in this area.
1.1 Motivation
The ideas behind the IoT have been especially embraced by industry in the so-called Industrial IoT (IIoT) or Industry 4.0. Currently, billions of devices are connected, with the potential to sense, communicate, and share information
about their environment. Traditional IoT systems rely on cloud-based architectures,
which allocate all processing and storage capabilities to cloud servers. Although
cloud-based IoT architectures have advantages such as reduced maintenance costs
and application development efforts, they also have limitations in bandwidth and
communication delays [89]. Given these limitations, edge and fog computing have
emerged with the goal of distributing processing and storage close to data sources
(i.e. things). Today, developers tend to leverage the advantages of edge, fog, and
cloud computing to design multi-layered architectures for IoT systems.
Nevertheless, creating such complex designs is a challenging task. Even more
challenging is managing and adapting IoT systems at runtime to ensure optimal
system performance while facing changes in environmental conditions. IoT
systems are commonly exposed to changing environments that induce unexpected
events at runtime (such as unstable signal strength, latency growth, and software
and hardware aging) that can impact their QoS. To deal with such events, a number of
runtime adaptations should be applied automatically, e.g., architectural adaptations
such as auto-scaling and offloading tasks.
In this sense, better support is needed to define and execute complex IoT systems and
their (self-)adaptation rules, so as to semi-automate the deployment and evolution process [133]. One of the most widely used approaches to deal with the complexity of these large systems is Model-Driven Engineering (MDE) [75]. The use of
models makes it possible to raise the level of abstraction and capture the aspects of interest
of a real system, helping to manage the complexity and uncertainty of real and
complex scenarios.
Models are built using domain-specific languages (DSLs). In short, a DSL offers
a set of abstractions and a vocabulary close to the ones already employed by domain
experts, facilitating the modeling of new systems in that domain. Nevertheless,
current DSLs for IoT do not typically cover multi-layered architectures [130, 67, 72,
43], let alone include a sublanguage to ease the definition of the dynamic rules
governing the adaptability of the IoT system.
This thesis focuses on the design of a model-based approach to support the
deployment and self-adaptation of multilayer IoT systems. The main research activities include a systematic literature review (SLR) to analyze the state of the art, the
design and development of our approach for modeling and managing self-adaptive
IoT systems, and experimental studies to validate our approach. The approach is
composed of a DSL to model the multi-layered IoT architecture and its self-adaptation,
a code generator, and a framework to monitor and support system adaptations at
runtime. This work addresses the following research questions:
RQ1 How to model multi-layer IoT systems including their adaptation scheme to
ensure their run-time operation in changing environments?
RQ2 How to manage real-time adaptations for multilayered IoT systems operating
in changing environments?
1.3 Contributions
The main contribution of this thesis is a model-based approach for the deployment
and management of multilayer IoT systems with self-adaptive capabilities. Specifi-
cally, it consists of:
C1 DSL. We have designed a new DSL for IoT systems focusing on three main
contributions: (1) modeling of multi-layer architectures of IoT systems, in-
cluding concepts such as IoT devices (sensors or actuators), edge, fog and
C2 Code generator. The model (built using the DSL) describing the self-adaptive
IoT system is the input to a code generator we have designed. This generator
produces several YAML manifests with two purposes: (1) to configure and
deploy the IoT system's container-based applications and (2) to configure and
deploy the tools and technologies used in the framework that supports the
execution and adaptation of the system at runtime.
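For illustration, a manifest produced for a containerized application might resemble the following sketch of a Kubernetes Deployment. All names, labels, the image reference, and the resource limits here are hypothetical; the actual manifests depend on the modeled system.

```yaml
# Hypothetical example of a generated manifest (illustrative only):
# deploys one containerized data-analysis service on an edge node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: temperature-analyzer
  labels:
    layer: edge
spec:
  replicas: 1
  selector:
    matchLabels:
      app: temperature-analyzer
  template:
    metadata:
      labels:
        app: temperature-analyzer
    spec:
      containers:
        - name: temperature-analyzer
          image: registry.example.com/temperature-analyzer:1.0
          resources:
            limits:
              cpu: "500m"      # constrain CPU on resource-scarce edge nodes
              memory: 256Mi
```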
C4 DSL extensions. We propose two extensions of our DSL focused on modeling IoT systems for two different domains: (1) a DSL extension focused on IoT systems deployed in the underground mining industry, and (2) a DSL extension for specifying IoT systems for Wastewater Treatment Plants (WWTPs). These extensions involve the definition of new domain-specific concepts (e.g., mine and tunnel concepts for the mining industry) and the design of graphical editors to model specific scenarios.
[Figure: Organization of the document — Ch. 1 Introduction; Ch. 2 State of the Art; Ch. 3 Overview; Ch. 4 Modeling Self-adaptive IoT Architectures; Ch. 5 Adapting IoT Systems at Runtime; Ch. 6 Extending the DSL for Specific Cases; Ch. 7 Experimental Validation; Ch. 8 Related Work; Ch. 9 Conclusions]
Over the past few years, the Internet of Things (IoT) has grown significantly in
relevance and is now a key component of many industrial processes, and even
a transparent participant in various activities performed in our daily life. IoT systems
are subject to changes in the dynamic environments in which they operate. These
changes (e.g., variations in bandwidth consumption or new devices joining/leaving)
may impact the Quality of Service (QoS) of the IoT system. A number of
self-adaptation strategies for IoT architectures to better deal with these changes
have been proposed in the literature. Nevertheless, they focus on isolated types of
changes; we lack a comprehensive view of the trade-offs of each proposal and of how
they could be combined to cope with dynamic situations involving simultaneous
types of events.
In this chapter, we identify, analyze, and interpret relevant studies related to the
adaptation of IoT systems, and develop a comprehensive view of the interplay between
different dynamic events, their consequences on the architecture's QoS, and the alternatives
for adaptation. To do so, we conducted a Systematic Literature Review
(SLR) of existing scientific proposals and defined a research agenda based on the
findings and weaknesses identified in the literature.
This SLR was the first research activity conducted in this doctoral thesis and the
results obtained were the inspiration to define our research objectives.
of latency and bandwidth usage, since they allow data processing at the edge rather
than in the cloud.
The MAPE-K loop, proposed by IBM for autonomic computing [92], has been
implemented in several studies for the design of self-adaptive systems. MAPE-K is
a reference model for implementing adaptation mechanisms in self-adaptive systems.
It includes four activities (monitor, analyze, plan, and execute) in an iterative
feedback loop that operates on a knowledge base (see Figure 2.3). These four
activities produce and exchange knowledge and information to apply adaptations
in response to changes in the managed element.
• Monitor: information about the current state of the system is collected, ag-
gregated, filtered, and reported. Data such as functional and non-functional
system properties are collected.
• Analyze: the collected data are analyzed against the knowledge base to determine whether the system deviates from its goals and an adaptation is required.
• Plan: according to the analysis made in the previous stage, an adaptation plan
is generated with the appropriate actions to adapt the system at run-time. The
adaptation plan contains tasks that could be either a complex workflow or
simple commands. This adaptation plan is sent to the execution component
of the next stage.
• Execute: the adaptations are applied to the system following the actions de-
fined in the adaptation plan.
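For illustration, the four MAPE-K activities can be sketched as a small feedback loop. This is a minimal sketch under assumed names (a single "cpu" metric and one hard-coded adaptation action); it is not the adaptation framework proposed in this thesis.

```python
from collections import deque

class MapeKLoop:
    """Minimal illustrative MAPE-K loop over a shared knowledge base."""

    def __init__(self, cpu_threshold=0.75):
        # Knowledge: data shared by all four activities.
        self.knowledge = {"cpu_threshold": cpu_threshold,
                          "history": deque(maxlen=10)}

    def monitor(self, metrics):
        # Collect and store the current state of the managed element.
        self.knowledge["history"].append(metrics)
        return metrics

    def analyze(self, metrics):
        # Decide whether the observed state violates a QoS goal.
        return metrics["cpu"] > self.knowledge["cpu_threshold"]

    def plan(self):
        # Produce an adaptation plan (here, one simple action).
        return ["offload_non_critical_tasks"]

    def execute(self, plan):
        # Apply the planned actions to the managed element.
        return [f"executed:{action}" for action in plan]

    def step(self, metrics):
        m = self.monitor(metrics)
        if self.analyze(m):
            return self.execute(self.plan())
        return []

loop = MapeKLoop()
print(loop.step({"cpu": 0.60}))  # → [] (no adaptation needed)
print(loop.step({"cpu": 0.90}))  # → ['executed:offload_non_critical_tasks']
```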
The notation used by users to define models can greatly impact the usability of the language. To represent concepts, a graphical DSL uses graphical objects such as connectors, blocks, axes, and arrows. In contrast, a textual DSL is based on a grammar; e.g., SQL is a textual DSL used to express database queries.
2.2.1 Method
A systematic literature review (SLR) is a methodology used for the identification,
analysis, and interpretation of relevant studies to address specific research questions
[91]. Our SLR consists of six main steps and is based on the methodology proposed
by Kitchenham et al. [94]. The steps followed in this SLR are illustrated in Figure
2.4 and documented below.
2.2. ANALYSIS OF IOT SYSTEM ADAPTATION 15
Research questions
Our goal is to identify the dynamic environmental events in the device and edge/fog
layers of an IoT system that could impact its QoS and therefore require triggering
self-adaptations of the system. In addition, we classify the strategies used to achieve
this self-adaptation. For this purpose, our SLR addresses the following two research
questions:
• SLR-RQ1. Which dynamic events present in the edge/fog and device layers
are the main causes for triggering adaptations in an IoT system?
• SLR-RQ2. How do existing solutions adapt their internal behavior and architecture in response to dynamic environmental events in the edge/fog and device layers, to ensure compliance with their non-functional requirements?
• Digital libraries: we chose four digital libraries for our search: Scopus, Web of Science (WOS), IEEE Xplore, and ACM. These libraries are frequently updated and contain a large number of studies in the area of this research.

Search query
SQ1 ("fog" OR "edge" OR "osmotic") AND ("IoT" OR "internet of things" OR "cyber-physical") AND ("architecture") AND ("adapt*" OR "self-adapt*")
SQ2 "fog" AND "adapt*" AND "architecture" AND "orchestration"
SQ3 ("orchestration" OR "choreography") AND "fog" AND "architecture" AND "dynamic"
• Search queries: as shown in Table 2.1, we defined four search queries. We used
keywords including IoT, architecture, dynamic, adapt (or variations of this
word; e.g., adaptation), fog and edge (to retrieve studies that use distributed
architectures with fog and edge computing), orchestration or choreography
(two resource management techniques in the fog layer of an architecture).
We looked for matches in the title, abstract, and keywords of the articles.
• Search results: Table 2.2 shows the search results; we obtained 557 studies, of which 223 were duplicates, leaving a total of 334 studies.
In the first screening of the titles, abstracts, and keywords, we used three exclusion criteria
to exclude 117 of the 334 studies. Then, in the second filter, we analyzed the full
texts and discarded 170 additional studies. Finally, using snowballing to check
the lists of study citations, we included three additional studies, for a total of 50 studies
(see Figure 2.4). The inclusion and exclusion criteria for each screening phase are
presented below.
First screening:
Second screening:
• (Inclusion) The study addresses a dynamic event in IoT systems that impacts
QoS.
Quality assessment
The quality assessment step consists of reading the studies in detail, and answering
the assessment questions to get a quality score for each study. We have defined five
quality assessment questions as follows:
• QA1. Are the aims clearly stated? (Yes) the purpose and objectives of the
research are clear; (Partly) the aims of the research are stated, but they are
not clear; (No) the aims of the research are not stated and cannot be clearly
identified.
• QA2. Is the research compared to related work? (Yes) the related work is
presented and compared to the proposed research; (Partly) the related work
is presented, but the contribution of the current research is not differentiated;
(No) the related work is not presented.
• QA3. Is there a clear statement of findings and do they have theoretical sup-
port? (Yes) the findings are explained clearly and concisely, and are supported
by a theoretical foundation; (Partly) the findings are clearly explained, but
they lack theoretical support; (No) findings are not clear and have no foun-
dation or theoretical support.
• QA4. Do the researchers explain future implications? (Yes) the author presents
future work; (No) future work is not presented.
• QA5. Has the proposed solution been tested in real scenarios? (Yes) The so-
lution is tested in a real scenario; (Partly) the solution is tested in a particular
test bed; (No) the solution is not tested in any scenario.
The score given to each answer was: Yes = 1, Partly = 0.5, and No = 0. We calculated the quality score for each study and excluded those that scored less than
3, in order to select the primary studies to be used for data extraction and
analysis. We analyzed 50 studies and excluded eleven because they obtained a quality score below three. In total, we obtained 39 primary studies for the
remaining steps of this SLR; the quality score of each is presented in Table
2.4. In the remainder of this chapter, we reference these studies in the text by their
assigned ID in the table.
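The scoring scheme above can be stated directly as a small computation (the function names here are our own illustration):

```python
# Quality scoring used in the SLR: Yes = 1, Partly = 0.5, No = 0.
# A study is kept as a primary study only if its score is at least 3 of 5.
SCORES = {"yes": 1.0, "partly": 0.5, "no": 0.0}

def quality_score(answers):
    """answers: one entry ('yes'/'partly'/'no') per question QA1..QA5."""
    return sum(SCORES[a.lower()] for a in answers)

def is_primary_study(answers, threshold=3.0):
    return quality_score(answers) >= threshold

print(quality_score(["yes", "partly", "yes", "no", "partly"]))   # → 3.0
print(is_primary_study(["yes", "partly", "yes", "no", "partly"]))  # → True
print(is_primary_study(["no", "no", "partly", "yes", "no"]))       # → False
```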
Data collection
The extracted information was stored in an Excel spreadsheet. Table 2.3 shows
the Data Collected (DC) for each study and the research question addressed. First,
we extracted standard information such as title, authors, and year of publication
(DC1 to DC4). Second, we extracted relevant information to address the research
questions defined in section 2.2.1. DC5 records the environmental event addressed
by the study, and this information is used to address research question SLR-RQ1.
DC6 to DC10 are data collected about proposed solutions and strategies to achieve
self-adaptations in the IoT system, and this information is used to address research
question SLR-RQ2.
Data analysis
Table 2.4 presents the list of the 39 studies relevant to this SLR, with the following
information: the assigned identification number (ID), the author, the type of publi-
cation, the year of publication, the answers to the quality questions, and the quality
score obtained. In the following sections, we will refer to primary studies by the
assigned ID code.
From the standard information extracted from the papers, we can note that the
relevant publications for this SLR are relatively recent. The largest number of stud-
ies were published in recent years: 12 studies from 2019, 16 studies from 2018, 7
studies from 2017, 3 studies from 2016, and one study from 2015. As to the type of
publication, 25 are conference publications, 10 are journal publications, and 4 are
workshop publications.
# Field RQ
DC1 Author N/A
DC2 Title N/A
DC3 Year N/A
DC4 Publication venue N/A
DC5 Environmental event addressed by the solution SLR-RQ1
DC6 Favored quality attributes SLR-RQ2
DC7 Adaptation strategies and techniques SLR-RQ2
DC8 Architecture description SLR-RQ2
DC9 Architectural styles and patterns SLR-RQ2
DC10 Key responsibilities of architectural components SLR-RQ2
or conditions surrounding the devices. The system devices may increase or decrease the frequency of data transmission due to different stimuli. The
consequences of this dynamic event in IoT systems commonly lead to
increased latency and the unavailability of system services, because the increased data
volume can congest the network and generate bottlenecks. In addition, this dynamic event implies growth in the data to be analyzed or processed by the edge
devices, which likely have limited computing resources. The edge nodes could
therefore be overloaded with processing work to the point of causing delays, downtime,
or unavailability.
processes that control the system, and unavailability of the system, among others.
This is why it is essential to ensure the security of the IoT system by designing
self-adaptation techniques to defend against attacks.
ID Adaptation Studies
A1 Data flow reconfiguration S3, S5, S8, S9, S10, S11,
S14, S15, S17, S19, S20,
S23, S25, S35, S37, S38
A2 Auto Scaling of services and applications S2, S7, S17, S18, S19,
S22, S31, S39
A3 Software deployment and upgrade S4, S13, S16, S28
A4 Offloading tasks S1, S6, S21, S26, S27, S29,
S30, S32, S33, S34, S36
vices involved in the communication, such as gateways and messaging servers, are
strategically selected to carry the data to the nodes that perform the processing.
Some authors propose reconfiguring the data flow to balance the load across
the edge/fog nodes, or to redirect the data flow to the node with the best
conditions (resource availability and lower response latency). For example, S8 proposes a framework that enables the developer to specify dynamic QoS rules. A rule
is made up of a source device (e.g., a video camera), a target device (e.g., a web
server), a rule activation event (e.g., when a system sensor detects motion), and a
QoS requirement that must be guaranteed (e.g., 200 ms communication latency between source and target). When the event configured in the rule is triggered, the
path of the data flow between the source and the destination is reconfigured to es-
tablish the optimal path through a set of switches. This architecture assumes that
there are several switches that enable communication between the device layer de-
vices and the cloud layer. However, the edge/fog layer is not included to do edge
processing, which could improve system QoS by lowering latency and bandwidth.
The system architecture proposed in S8 assumes that the edge/fog layer is composed
of devices that only serve the function of relaying the data, but the data processing
capacity in the edge devices is ignored. Additionally, it is necessary to consider us-
ing the MQTT protocol and broker for communication which offers lower power
consumption and low latency due to its very small message header and packet mes-
sage size (approximately 2 bytes) [171].
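For illustration, the rule structure described for S8 can be sketched as a simple data type with a matching function. The field names are our own illustration, not the notation used in that study.

```python
from dataclasses import dataclass

@dataclass
class QoSRule:
    source: str            # e.g. a video camera
    target: str            # e.g. a web server
    activation_event: str  # e.g. "motion_detected"
    max_latency_ms: int    # QoS requirement to guarantee

def triggered_rules(event, rules):
    """Return the rules whose activation event matches, i.e. the data
    flows whose path should be reconfigured to an optimal route."""
    return [r for r in rules if r.activation_event == event]

rules = [QoSRule("camera-1", "web-server", "motion_detected", 200)]
hits = triggered_rules("motion_detected", rules)
print([(r.source, r.target, r.max_latency_ms) for r in hits])
# → [('camera-1', 'web-server', 200)]
```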
Auto-scaling is used to ensure stable application performance, and it is one of the most widely used
techniques in web applications deployed in the cloud. Auto-scaling is also used in
IoT systems, but with additional considerations to take into account. For example,
when scaling a service on an edge or fog node, it is necessary to strategically select
a node that has the necessary computing resources available and that offers
the greatest communication latency benefits.
In S2, an auto-scaling method is proposed for a distributed intelligent urban
surveillance system. The proposed architecture has three layers: video cameras in
the device layer, desktops in the edge layer to analyze the video information, and
cloud servers that host the web application for the end user. When the video cameras detect an emergency, the frame rate of video capture increases and the image
analysis of certain objects becomes a high-priority task. The system then scales the
data analysis application by deploying virtual machines to the edge nodes closest to
the emergency site. However, deploying the application on the node closest to the
device layer does not always guarantee the best performance. Other factors
such as network latency and node specifications should be considered for applica-
tion allocation decisions. Additionally, the use of virtual machines has limitations
given the resource scarcity that characterizes edge nodes. Other virtualization tech-
nologies such as containers have advantages for deploying applications to edge/fog
layer nodes. In particular, the reduced size of the images and the low startup time
are advantages that make containers suitable for IoT systems.
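Following this observation, a node-selection decision that weighs network latency against resource headroom, instead of simply picking the closest node, could be sketched as follows. The fields and weights are illustrative assumptions, not a method taken from the reviewed studies.

```python
# Illustrative heuristic for choosing an edge/fog node on which to
# scale a containerized service: filter by available resources, then
# trade off latency against CPU headroom.
def select_node(nodes, cpu_needed, mem_needed, latency_weight=0.5):
    candidates = [
        n for n in nodes
        if n["free_cpu"] >= cpu_needed and n["free_mem"] >= mem_needed
    ]
    if not candidates:
        return None  # no suitable edge/fog node: fall back to the cloud

    def score(n):
        # Lower latency and more CPU headroom both lower the score.
        headroom = n["free_cpu"] - cpu_needed
        return latency_weight * n["latency_ms"] - (1 - latency_weight) * headroom

    return min(candidates, key=score)

nodes = [
    {"name": "edge-1", "free_cpu": 2.0, "free_mem": 512, "latency_ms": 5},
    {"name": "fog-1", "free_cpu": 8.0, "free_mem": 4096, "latency_ms": 20},
]
print(select_node(nodes, cpu_needed=1.0, mem_needed=256)["name"])  # → edge-1
```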
These QoS factors should also be considered for software allocation decisions on fog
nodes. Additionally, Foggy does not monitor the state of the running Docker containers to detect and fix failures through actions such as rolling back to the previous
stable version or redeploying the software container.
Offloading tasks
The processing tasks executed at the edge/fog nodes can be classified according to
their importance and their required response time. While there are system tasks that
do not require immediate processing, other tasks such as real-time data analysis are
critical to the system and require low response latency. It is necessary to guarantee
low latency for these critical tasks, but it is not trivial to achieve this when dynamic
events occur in the system such as increased data flow from the device layer. The
adaptation strategy Offloading tasks addresses this problem in the following way:
to guarantee low response latency for critical processing tasks performed by the
edge/fog nodes, non-critical tasks are offloaded to the cloud servers to free up capacity in the edge/fog nodes. However, it is necessary to determine when offloading
tasks to the cloud servers is actually warranted.
S6 proposes an architecture that coordinates data processing tasks between an
edge node and the cloud servers. The edge node performs processing tasks on
the data collected by IoT devices. A monitoring component frequently checks the
CPU usage of the edge node, and every time the value exceeds a usage limit (75%)
one of the non-critical tasks executed by the node is offloaded to a cloud server.
This frees up resources on the edge node for processing tasks that require low
latency. However, before moving tasks to cloud servers, offloading tasks to
neighboring edge/fog nodes that have the necessary resources available should be
considered, to take full advantage of edge and fog computing. In particular, response
latency is lower for tasks executed in the edge/fog layer rather than in the
cloud layer. Additionally, decisions to move tasks from one node to another node
or to a cloud server could be determined by other factors such as latency, RAM us-
age, power consumption, and battery level (if the node is battery powered). These
factors must be monitored and analyzed to make intelligent offloading decisions
according to the QoS requirements of the system.
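To make the discussion concrete, the decision logic sketched below prefers neighboring edge/fog nodes over the cloud when a node exceeds the 75% CPU limit reported in S6. This is a hedged illustration, not S6's implementation: the Node structure, the field names, and the latency-first selection policy are our own assumptions.

```python
from dataclasses import dataclass
from typing import Optional

CPU_LIMIT = 75.0  # CPU usage threshold (%) used by S6 to trigger offloading

@dataclass
class Node:
    name: str
    layer: str          # "edge", "fog", or "cloud"
    cpu_usage: float    # current CPU usage (%)
    latency_ms: float   # network latency to reach this node
    free_ram_mb: int

def choose_offload_target(candidates: list[Node],
                          task_ram_mb: int) -> Optional[Node]:
    """Pick a destination for a non-critical task, preferring edge/fog."""
    def fits(n: Node) -> bool:
        return n.cpu_usage < CPU_LIMIT and n.free_ram_mb >= task_ram_mb
    # 1. Neighboring edge/fog nodes with spare capacity, lowest latency first.
    neighbors = sorted((n for n in candidates
                        if n.layer in ("edge", "fog") and fits(n)),
                       key=lambda n: n.latency_ms)
    if neighbors:
        return neighbors[0]
    # 2. Only fall back to a cloud server when no neighbor can take the task.
    clouds = [n for n in candidates if n.layer == "cloud" and fits(n)]
    return clouds[0] if clouds else None

fog = Node("fog-f1", "fog", cpu_usage=40.0, latency_ms=5.0, free_ram_mb=512)
cloud = Node("cloud-1", "cloud", cpu_usage=10.0, latency_ms=60.0, free_ram_mb=8192)
print(choose_offload_target([fog, cloud], task_ram_mb=256).name)  # fog-f1
```

In a real deployment, cpu_usage, latency_ms, and free_ram_mb would be fed from the monitored metrics mentioned above rather than hard-coded.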
The conclusions above already suggest some areas that are not yet fully developed,
even if some works that address them are starting to appear.
Nevertheless, we want to highlight additional significant open challenges we
believe need to be addressed to improve current adaptation strategies.
We have classified the problems and challenges into four topics that we sum-
marize below. In particular, topic 4 (Global self-adaptive architecture) is studied in
depth in this thesis.
gorithms can also help to prevent disruptive events affecting system availability and
QoS. While there are traditional challenges in designing a learning algorithm,
such as the selection of an efficient model, the amount of data, and data cleaning,
there are also other problems related to the technologies and processes to obtain
the data or features. For example, the monitoring of non-functional properties such
as accuracy, frequency, sensitivity, and drift is one of the challenges due to the het-
erogeneity of IoT devices in the device layer.
The studies included in this SLR propose techniques and strategies to address at
most two of the dynamic events. However, in some scenarios or domains, it is
necessary to propose solutions that support multiple, simultaneous dynamic events. For
example, a smart city system synchronizes the basic functions of a city based on
seven key components, including natural resources and energy, transport and mo-
bility, buildings, life, government, economy, and people [42]. Due to the large num-
ber of IoT devices considered, a smart city system can experience all the dynamic
events that we have identified in Table 2.5.
Therefore, it is necessary to design a general architecture for IoT systems with
components to monitor, detect events, and self-adapt the system: an architecture
with the ability to adapt to various dynamic events. For example, a system that
can detect failures in software updates and perform operations such as software
rollback, while supporting new devices being added to the system. This same system
could also support other types of events such as dynamic data transfer rate and
network connectivity failures.
For designing this general self-adaptive architecture for IoT systems, some base
technologies are especially promising. For example, the MQTT communication
protocol is well suited to IoT applications since it offers advantages in terms of
scalability, asynchronism, decoupling between clients, and low bandwidth and
power consumption. Regarding virtualization technology, containerization offers
several advantages for software deployment in IoT systems. In particular, it is possible to deploy
containers on various types of hardware and operating systems, something very
useful considering the heterogeneity of nodes in the edge/fog layer. For example, it
is possible to deploy a container with an application on both a Raspberry Pi2 and a
Linux server.
2 https://www.raspberrypi.org
30 CHAPTER 2. STATE OF THE ART
This thesis focuses on addressing the challenges and concerns classified in Topic
4 of Section 2.2.5: design a general architecture/framework to support self-
adaptations in IoT systems. This general framework must support the activities
performed at design-time (to specify the system) and at runtime (to self-adapt the
system).
In this thesis, we address actions and adaptation patterns grouped in two
categories: (1) architectural adaptations (such as those identified in Table 2.7) to
guarantee system availability and performance despite dynamic events; and (2)
system actuator control to meet the functional requirements of the system,
involving system actuator management (e.g., activating/deactivating alarms,
turning lamps on/off, and increasing the power of a fan). In Chapter 4, we discuss
the specification of rules involving these two types of actions or adaptations.
2.3 Conclusion
As the first research activity of this thesis, we have conducted an SLR to study the
dynamic events that impact the QoS of IoT systems, to analyze the strategies
proposed in the literature to address them, and to identify the weaknesses
of the approaches found in the state of the art.
We identified six types of dynamic events or unexpected changes and four adap-
tation strategies in response to the events. Monitoring the resource consumption of
the edge/fog nodes is one of the most used strategies to detect some dynamic events
of the system. In particular, CPU and RAM consumption are metrics frequently
monitored to identify when a node fails or is close to failing.
We have identified open challenges that we believe need to be addressed to im-
prove current adaptation strategies. These challenges are classified into four topics:
(1) monitoring and logging the dynamic events themselves, (2) software deployment on
heterogeneous devices, (3) machine learning for self-adaptable systems, and (4) global
self-adaptive architecture. In this thesis, we focus on the challenges of topic 4 to
support the design and management of self-adaptive multi-layer IoT architectures.
The design of a DSL for the specification of these systems at design-time, and the
design of a framework to support the system at runtime are some of the tasks we
conducted to address these challenges.
Chapter 3
Overview
(e.g. real time detection of emergencies), other tasks that do not demand immedi-
ate response are executed in the cloud (e.g. generation of historical data reports).
We will use a simple Smart Building scenario as a running example to better illus-
trate our approach. Other case studies modeling real-world Underground Mining
and Wastewater Treatment Plants (WWTPs) are presented in Chapter 6. We prefer
to introduce our approach by modeling a Smart Building scenario because it has
been well studied in the literature and is therefore easier to understand.
Adopting the concept of smart building, a hotel company (Hotel Beach) wants
to reduce fire risks by automating disaster management in its hotels. A fire alarm
and monitoring system is implemented in each of the company's hotels. We will
assume that all buildings (hotels) have three floors with two rooms each. Fig. 3.2
presents an overview of the 1st floor of this building. Accordingly, the infrastruc-
ture (device, edge/fog, and cloud layers) of the company's hotel IoT system is as
follows.
• Device layer. Each room has a temperature sensor, a carbon monoxide (CO)
gas sensor, and a fire water valve. Furthermore, an alarm is deployed in the
lobby. Each sensor has a threshold measurement to activate the corresponding
alarm; e.g., a person should not be continuously exposed to a CO level of
50 parts per million (ppm) for more than 8 hours, or to 400 ppm for more
than 4 minutes.
• Edge layer. In each room, an edge node receives the information collected
by the sensors of the device layer and runs a software container (C1 and C2)
that analyzes sensor data in real time to check for the presence of smoke and
generates an alarm state that activates the actuators. A fog node (linked to the
edge nodes) is located on the 1st floor of the building. This node runs the C3
container (running App2, a machine learning model to predict fires on any
of the building's floors) and C4 (running App3, in charge of receiving and
distributing data, typically an MQTT broker, as we will see later on).
• Cloud layer. The cloud layer has a server or cloud node that runs the C5
container, a web application (App4) that displays historical information about
sensor data and fire incidents in any of the company's hotels.
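The CO exposure limits mentioned for the device layer can be sketched as a simple alarm check. This is only an illustration of the thresholds above; the function name and the threshold table are ours, not part of the system.

```python
# (ppm threshold, maximum continuous exposure in seconds) from the scenario:
# 50 ppm for more than 8 hours, or 400 ppm for more than 4 minutes.
CO_LIMITS = [
    (400, 4 * 60),
    (50, 8 * 60 * 60),
]

def co_alarm(ppm: float, exposure_s: float) -> bool:
    """Return True when the lobby alarm should be activated."""
    return any(ppm >= limit and exposure_s > max_s
               for limit, max_s in CO_LIMITS)

print(co_alarm(420, 5 * 60))   # True: above 400 ppm for 5 minutes
print(co_alarm(60, 60 * 60))   # False: the 50 ppm limit trips only after 8 hours
```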
Table 3.1 summarizes the features of the applications and the containers deployed
in the IoT system infrastructure. Ports, memory, and CPU values approximate those
of real applications. In Chapter 4 we introduce the modeling of the architecture of
this system, including the applications, containers, and adaptations.
3.3. RESEARCH METHODOLOGY 37
Figure 3.2: Overview of the smart building IoT system, first floor
information systems and computer science research areas for the creation and eval-
uation of IT artifacts [125]. DSR involves the construction of artifacts such as decision
support systems, modeling tools, governance strategies, and methods for system
evaluation. Two high-level stages are performed in the DSR methodology: build
(construct an artifact for a specific purpose) and evaluate (determine how well the
artifact behaves) [106]. Peffers et al. [125] synthesize this methodology into six
activities, as shown in Figure 3.3.
We addressed the first research activities (problem identification and definition
of research objectives) by conducting an SLR. Through this SLR (published in the
Journal of Internet Services and Applications [6]) we have identified the open chal-
lenges covered in this thesis. Then, we built the artifacts that make up our proposed
solution, defined a suitable context for their illustration, performed one or several
evaluations, and finally communicated the results through venues such as
conferences, journals, and workshops.
Developing our solution involves the design and implementation of several
software artifacts, such as metamodels, a code generator, monitors, services,
and the remaining artifacts of the architecture presented in Figure 3.1. The devel-
opment and evaluation of the artifacts have been classified into two groups: artifacts
Figure 3.3: Design Science Research Methodology (DSR) process model [125]
3.4 Conclusion
In this chapter we presented an overview of our proposal, a comprehensive ap-
proach for modeling and managing self-adaptive, multi-layer IoT systems. This ap-
proach involves multiple technologies, techniques, components and software tools
in two stages: design time for the specification of the multi-layered IoT architecture
and its adaptive behaviour, and runtime to support the operation and adaptation of
the system.
A running example of a smart building system is also presented in this chapter.
This example will be used to better illustrate our approach in the following chapters. Finally,
we presented the main steps of the Design Science research methodology adopted
to develop and evaluate the software artifacts of this thesis.
Chapter 4
Modeling IoT architectures is a complex process that must also cover the specifi-
cation of self-adaptation rules to ensure the optimized execution of the IoT system.
To facilitate this task, we propose a new IoT Domain-Specific Language (DSL) cover-
ing both the static and dynamic aspects of an IoT deployment. Our DSL is focused
on three main contributions: (1) modeling primitives covering multi-layered IoT
architectures, including IoT devices (sensors and actuators), edge, fog, and cloud
nodes; (2) modeling the deployment and grouping of container-based applications
on those nodes; and (3) a specific sublanguage to express rules.
In this chapter, we address the design of the components involved in the
design-time phase of our approach illustrated in Figure 3.1 (i.e., the DSL and the code
generator). Our DSL for modeling the static and dynamic aspects of the IoT sys-
tem is introduced as follows. First, Section 4.1 describes the abstract syntax and
the concrete syntax of the DSL elements for the specification of static aspects in-
cluding architecture and deployment of containerized applications. Then, Section
4.2 covers the dynamic ones, i.e. the specification of rules. Section 4.3 describes
the DSL implementation and Section 4.4 presents the code generator developed to
produce YAML manifests and configuration files. Finally, Section 4.5 presents an
installation and configuration guide for using the DSL, and Section 4.6 concludes
this chapter. To illustrate the concepts of the metamodel, we will use the running
example presented in Section 3.2.
41
42 CHAPTER 4. MODELING SELF-ADAPTIVE IOT ARCHITECTURES
unit of the monitored variable by a sensor can be represented through the attributes
threshold and unit.
The location of IoTDevices can be specified through geographic coordinates (lat-
itude and longitude attributes). Both Sensors and Actuators have a type represented
by the concepts SensorType and ActuatorType. For instance, following the running
example (Fig. 3.2), there are temperature and smoke type sensors, and there are
valve and alarm type actuators.
Physical (or even virtual) spaces such as rooms, stairs, buildings, or tunnels can
be represented by the concept Region. A Region can contain subregions (relationship
subregions in the metamodel). For example, region Floor1 (Fig. 3.2) contains subre-
gions Room1, Room2, Lobby, and Stairs. IoTDevices, EdgeNodes, and FogNodes are
deployed and are located in a region or subregion (represented by region relation-
ships in the metamodel). Back to the running example, the edge-a1 node is located
in the RoomA1 region of Floor1 of the Hotel Beach, while the fog-f1 node is located
in the Lobby region of Floor1.
Edge, fog and cloud nodes are all instances of Node, one of the key concepts of
the metamodel. A node has the ability to host software containers. Communi-
cation between nodes can be specified via the linkedNodes relationship, as we may
want to indicate what nodes on a certain layer could act as reference nodes in an-
other layer (e.g. what cloud node should be the first option for a fog node). Nodes
can also be grouped in clusters that work together. A Cluster has at least one mas-
ter node (represented by the master relationship) and one or several worker nodes
(represented by the workers relationship). The details of each node are expressed via
attributes such as IP address (ipAddress), operating system (OS), number of cores in
the processor (cpuCores), RAM memory (memory), storage capacity (storage), and
processor type (processor enum).
A Node can host several software containers according to its capabilities and
resources (primarily cpuCores, memory, and storage). The cpu and memory usage of
a container can be restricted through cpuLimit and memoryLimit attributes. Each
software container runs an application (represented by the concept Application) that
has a minimum of required resources specified by the attributes cpuRequired and
memoryRequired. The repository of the application image is specified through
the imageRepo attribute, and the ports used through the port and k3sPort attributes. The
container volumes and their paths (a mechanism for persisting data used and gen-
erated by containers) are represented by the Volume concept. Finally, the MQTT broker
that receives and distributes the messages can also be specified and deployed in a
software container, and its broker topics are represented by the topics relationship.
Regions
Figure 4.4 shows the specification of the Hotel Beach regions, in particular those
on Floor 1, i.e. four subregions: two Rooms, the Lobby and the Stairs. The regions
defined in this tree diagram are then referenced in the specification of the nodes,
sensors, actuators and rules of the system.
Applications
Fig. 4.5 depicts the modeling of the IoT system applications, including their techni-
cal requirements and repository addresses. The memory and CPU requirements are
primarily used to determine if the nodes that will host the application container
have the necessary resources. The port specifications are used to configure the container
ports, and the repository address to download the image of the containerized application.
4.1. MODELING OF THE IOT ARCHITECTURE 47
Nodes
For describing the system nodes, we propose a tabular notation. Figure 4.6 shows
the specification of the nodes deployed in Floor 1 of the Hotel. The node description
includes the layer it belongs to (edge, fog, or cloud), the hardware properties (such as
memory and storage resources), the regions where it is located, and the application
containers it hosts. Note that C4 is the only container that uses a volume for the
MQTT broker configuration parameters (Mosquitto1 in this example).
Clusters
To specify a cluster of nodes, at least one master node and one worker node are
required. An example of cluster modeling is shown in Figure 4.7, in which the cluster
composed of the Hotel nodes is modeled. Although one or more master nodes
(usually cloud nodes) manage the cluster, a constant Internet connection is not
mandatory in multi-layered architectures: the edge/fog nodes can operate as
standalone network nodes with limited Internet connectivity.
Broker topics
To specify the MQTT topics, the container running the broker must be selected.
Figure 4.8 shows the topics defined for the sensors and actuators deployed on Floor
1 of the Hotel, whose broker is running on the C4 container. The topics in this
example follow the nomenclature floor/room/sensor_type; however, a more
complex nomenclature could be adopted depending on the case.
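The floor/room/sensor_type nomenclature, together with standard MQTT wildcard subscriptions ('+' for one level, '#' for the remaining levels), can be sketched as follows; the helper names are ours, not part of the DSL.

```python
def topic_for(floor: str, room: str, sensor_type: str) -> str:
    """Build a topic following the floor/room/sensor_type nomenclature."""
    return f"{floor}/{room}/{sensor_type}"

def matches(pattern: str, topic: str) -> bool:
    """Minimal MQTT-style wildcard matching ('+' one level, '#' the rest)."""
    p, t = pattern.split("/"), topic.split("/")
    for i, seg in enumerate(p):
        if seg == "#":
            return True
        if i >= len(t) or (seg != "+" and seg != t[i]):
            return False
    return len(p) == len(t)

topic = topic_for("floor1", "roomA1", "smoke")
print(topic)                             # floor1/roomA1/smoke
print(matches("floor1/+/smoke", topic))  # True: any room on floor 1
print(matches("floor2/#", topic))        # False
```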
1 https://mosquitto.org/
4.2. MODELING OF RULES 49
we rely on our previous systematic literature review presented in Section 2.2. For
instance, the three architectural adaptations (offloading, scaling, and redeployment)
addressed in this study were identified in the SLR. Our language covers all of them
and even enables complex rules where policies involving several strategies can be
attempted in a given order.
Figure 4.10: Metamodel depicting rules. The concepts shaded with gray color (such as Cluster, Application, and Node)
have been previously defined in the metamodel in Figure 4.1
(to avoid firing the rule in reaction to minor disturbances) before executing the
rule. Once fired, all or some of the actions are executed in order, depending on
the allActions attribute. If set to false, only the number of Actions specified by the
attribute actionsQuantity must be executed, starting with the first one in order and
continuing until the required number of actions have been successfully applied.
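The firing semantics described above can be sketched as follows; modeling each action as a callable that reports success is our own simplification of the Action concept.

```python
from typing import Callable

def run_rule_actions(actions: list[Callable[[], bool]],
                     all_actions: bool, actions_quantity: int = 0) -> int:
    """Execute the rule's actions in order; return how many succeeded."""
    succeeded = 0
    for action in actions:
        # With allActions = false, stop once enough actions were applied.
        if not all_actions and succeeded >= actions_quantity:
            break
        if action():
            succeeded += 1
    return succeeded

ok, fail = (lambda: True), (lambda: False)
print(run_rule_actions([ok, fail, ok], all_actions=True))           # 2
print(run_rule_actions([fail, ok, ok], False, actions_quantity=1))  # 1
```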
For the sake of clarity, we have grouped the rule concepts into two categories,
Architectural Adaptation Rules and Functional Rules, but note that they could all
be combined; e.g., a sensor event could trigger a functional response, such as
triggering an alarm, and, at the same time, an automatic self-adaptation action,
such as scaling the apps related to the event to make sure the IoT system has the
capacity to collect more relevant data.
Among the Actions:
We show how to use the rule’s concrete syntax to model two rules from the smart
building example.
Secondly, we model another rule (see Fig. 4.14) to activate the alarm (a-lobby)
when any gas sensor in the Floor1 region (gas-a1 or gas-b1) detects a gas concen-
tration greater than 400 ppm for 10 seconds. The "On" message is published to the
broker topic consumed by the actuator (the a-lobby alarm). Note that there are two
ways to model this rule: while Option 1 involves all CO-type sensors on Floor 1,
Option 2 directly involves both gas sensors.
Projectional editors such as MPS enable editing of the model by means of projec-
tions of the abstract syntax, but the model is stored in a format (e.g. XML) indepen-
dent of its concrete syntax. In other words, the user interacts with these projections,
which are then translated by the editor to modify the persisted model. Some benefits
of projectional editing are discussed below [162].
• The IDE used for model editing can provide code completion, error checking
and syntax highlighting.
Defining a language in MPS involves the design of several aspects. The defi-
nition of our DSL includes six aspects: Structure to define the language concepts
(abstract syntax), Editor to define the editors for those concepts (concrete syntax),
Constraints and Type-System to define a set of type-system rules and constraints
(well-formedness rules), Behaviour to define reusable methods and functions, and
Generator to define a code generator. These aspects are described below.
3 http://mbeddr.com/
Figure 4.16: Definition of the tabular editor for the Sensor concept
to generate an error message in the model when the containers allocated to a node
exceed the node’s memory and CPU capacities (WFR7 in Figure 4.2). That is, when
the available memory or CPU of the node is less than zero. The methods
availableMemory and availableCPU are defined in the Behaviour aspect of MPS
(Section 4.3.4).
4.18). To calculate the available memory of a node, the method first computes the
total memory required (the memory_used variable) by the containerized applications
hosted on the node, iterating over the node's containers (a for loop). It then
calculates and returns the available memory by subtracting the used memory from
the node's memory.
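As a sketch of the calculation just described (the actual method is written in MPS's Behaviour aspect, not Python; modeling containers as plain dictionaries is our simplification):

```python
def available_memory(node_memory_mb: int, containers: list) -> int:
    """Node memory minus the memory required by its hosted applications."""
    memory_used = 0
    for c in containers:  # iterate the node's containers
        memory_used += c["memory_required_mb"]
    return node_memory_mb - memory_used  # negative => WFR7 violation

containers = [{"memory_required_mb": 256}, {"memory_required_mb": 512}]
print(available_memory(1024, containers))  # 256
print(available_memory(512, containers))   # -256: triggers the error message
```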
port tools, including the implementation of rules using the PromQL language. More
specifically, the generated code includes configuration and deployment files for the
following components (most of these components are detailed in Chapter 5):
• YAML manifests to deploy the monitoring tools and exporters such as kube-
state-metrics, node-exporter, and mqtt-exporter;
• the Grafana application to display the monitored data stored in the Prometheus
database.
For example, the property macro $[ip] was configured to return the IP address of
the node hosting the MQTT broker. For each adaptation, and for some expressions,
we have defined a reduction rule.
4.4.2 Templates
The template is where the transformation and code generation are performed. We
have used the PlainText Generator 9 plugin to define the templates of our
generator. The templates contain different types of macros used to calculate the
value of a property (e.g., to get the name of a container), to get the target of a refer-
ence, or to control template filling at generation time.
Figure 4.21 shows an excerpt of the template that generates the YAML code for
the deployment of container-based applications via pods (note that this template
is referenced in the root mapping rule in Figure 4.20). First, we attach two LOOP
macros to the template that contain the node.nodes and node.containers expressions,
respectively (these expressions are entered through the inspector window and are
not shown in the figure). This enables looping through the containers and generating
the deployment code of a pod for each of them. We also use the property macro (dollar
sign) to replace the properties of the container in the generated template such as
the name, the image repository, the limit and required resources, the ports, and the
volumes if any.
9 https://jetbrains.github.io/MPS-extensions/extensions/plaintext-gen/
The code generated to deploy the application container C1 in the edge-a1 node
of the running example is shown in Figure 4.22. Note that the parameters such
as image, request resources (memory and cpu), and containerPort match the App1
specification (Figure 4.5). Finally, the node that will host the pod is restricted by the
nodeSelector tag.
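The shape of such a generated manifest can be sketched as a Python dictionary (kept in Python for uniformity with the other examples; the field values below are illustrative, not the exact generated file):

```python
def pod_manifest(container: dict, node_name: str) -> dict:
    """Minimal Kubernetes pod structure with placement via nodeSelector."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": container["name"]},
        "spec": {
            "containers": [{
                "name": container["name"],
                "image": container["image"],
                "resources": {"requests": {"memory": container["memory"],
                                           "cpu": container["cpu"]}},
                "ports": [{"containerPort": container["port"]}],
            }],
            # Restrict the hosting node, as done with the nodeSelector tag.
            "nodeSelector": {"kubernetes.io/hostname": node_name},
        },
    }

m = pod_manifest({"name": "c1", "image": "repo/app1", "memory": "128Mi",
                  "cpu": "250m", "port": 1883}, "edge-a1")
print(m["spec"]["nodeSelector"])  # {'kubernetes.io/hostname': 'edge-a1'}
```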
Appendix B provides a detailed guide to install and use the DSL, generate the
code, and run the framework.
4.6 Conclusion
In this chapter we have presented a DSL for modeling multi-layered architectures of
IoT systems and their rules (architectural adaptations and functional rules). We have
introduced the abstract and concrete syntax of the DSL by illustrating its concepts
through a running example of a smart building. The abstract syntax is presented
through metamodels that capture the concepts needed to specify the multi-
layer architecture of the IoT system (including devices and nodes of the device,
edge, fog and cloud layers), the deployment of container-based applications, and
the dynamic rules that guarantee the operation of the system.
The DSL is implemented as a projectional editor created with the Jetbrains MPS
tool. This gives us the flexibility to offer, and mix, a variety of concrete notations for
the different concepts of the DSL. The DSL design includes the definition of several
10 https://github.com/SOM-Research/selfadaptive-IoT-DSL.git
11 https://blog.jetbrains.com/mps/2021/05/mps-2021-1-has-been-released/
12 http://mbeddr.com/platform.html
aspects such as Structure for the abstract syntax, Editor for the concrete syntax, and
Constraints to define well-formedness rules.
We have also presented a code generator designed in MPS to generate software
artifacts (YAML files) for the deployment and management of the IoT system at
runtime. To generate the code, M2T transformations are performed by configuring
mappings and templates in MPS. Transformation rules (such as root mapping and
reduction rules) are defined to generate templates and obtain the generated code.
The generated software artifacts include YAML manifests for deploying and con-
figuring containerized IoT applications, Kubernetes monitoring tools, Prometheus
tools, the Adaptation Engine, and Grafana.
Figure 4.21: Excerpt of the M2T transformation for the YAML code generation of
the IoT applications
Engineering IoT systems is a challenging task in part due to the dynamicity and
uncertainty of the environment [9]. IoT systems should be designed to meet their
goals by adapting to dynamic changes. In Chapter 4, we have presented our DSL
for the specification of multilayered IoT architectures, architectural adaptation and
functional rules, and the code generator. In this chapter we detail the design of our
approach to support the self-adaptation of the IoT system at runtime. This approach
is based on the MAPE-K loop, a reference model for the design of self-adaptive
systems [92].
The rest of this chapter is organized as follows: Section 5.1 describes our
framework to support IoT system adaptations at runtime. Section 5.2 illustrates our
approach through the running example of the smart building. Finally, Section 5.4
concludes the chapter.
72 CHAPTER 5. ADAPTING IOT SYSTEMS AT RUNTIME
We next describe how our architecture particularizes the generic MAPE-K con-
cepts for self-adaptive IoT systems.
5.1.1 Monitor
In the Monitor stage, information about the current state of the IoT system is col-
lected and stored. Figure 5.2 shows the Monitor stage of our framework and the
YAML manifests used for deployment and configuration. We have adopted
Prometheus Storage1 (a time-series database, TSDB) to store the information
collected by three monitors and exporters (kube-state-metrics, Node exporter, and
MQTT exporter). We have adopted a time-series database because, compared to
other types of databases (e.g., document or relational databases), Prometheus
is optimized to store information in a time-efficient format, enhancing queries
performed over time windows. These queries are necessary to verify the activation
of adaptation rules at runtime. Additionally, Prometheus contains modules and
components that facilitate the tasks performed in the later stages of our framework,
such as analysis and planning (discussed in the following sections).
Four YAML manifests are required to deploy and configure Prometheus TSDB
(other similar manifests are needed for monitors and exporters): deployment.yaml
to deploy the Prometheus TSDB inside a Kubernetes pod, service.yaml to make it
1 https://prometheus.io/docs/prometheus/latest/storage/
5.1 Runtime framework
1. Counter is an accumulative metric whose value can only increase but not
decrease.
2. Gauge is a metric that represents a numerical value that can go up and down
at any given time (e.g., processor temperature).
3. Histogram and Summary are metric types that sample observations (e.g., re-
quest durations) and expose them as counts over configurable buckets (his-
tograms) or as precomputed quantiles (summaries).
The storage of infrastructure metrics (such as CPU usage) and QoS metrics mostly
relies on Gauge metrics, but for some database queries we use Histograms and Sum-
maries (e.g., to get the CPU usage of a node over the last 5 minutes).
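Because a Counter only grows, usage is derived from its increase over a time window, as PromQL's rate() function does. The sketch below computes CPU usage from two samples of node_cpu_seconds_total in idle mode; the function name and sample values are our own illustration.

```python
def cpu_usage_percent(idle_t0: float, idle_t1: float,
                      window_s: float, cores: int) -> float:
    """CPU usage (%) derived from the increase of the idle-mode counter."""
    idle_rate = (idle_t1 - idle_t0) / window_s  # idle CPU-seconds per second
    return 100.0 * (1 - idle_rate / cores)

# Over 5 minutes (300 s), a 4-core node accumulated 900 idle CPU-seconds,
# i.e. 3 of its 4 cores were idle on average:
print(cpu_usage_percent(10_000.0, 10_900.0, 300.0, 4))  # 25.0
```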
The collected information is classified into two groups: (1) infrastructure and
QoS metrics, and (2) sensor data metrics (published in the system's MQTT broker).
These two kinds of information are aligned with the addressed types of rules, i.e.,
architectural adaptation and functional rules.
2 https://github.com/kubernetes/kube-state-metrics
3 https://github.com/prometheus/node_exporter
Table 5.1: QoS and Infrastructure metrics

DSL metric    | Prometheus time series name                                | Metric type | Exporter/Service   | Description
Availability  | up                                                         | Gauge       | Kube-state-metrics | Equal to 1 if the monitored component is available, 0 otherwise
CPU           | node_cpu_seconds_total                                     | Counter     | Node Exporter      | Counts the number of seconds the CPU has been running in a particular mode
RAM           | node_memory_MemAvailable_bytes, node_memory_MemTotal_bytes | Gauge       | Node Exporter      | Gets the available and total RAM memory of the node
Disk usage    | node_filesystem_avail_bytes, node_filesystem_size_bytes    | Gauge       | Node Exporter      | Gets the available and total disk space of the node
Bandwidth in  | node_network_receive_bytes_total                           | Counter     | Node Exporter      | Counts the number of bytes of incoming network traffic to the node
Bandwidth out | node_network_transmit_bytes_total                          | Counter     | Node Exporter      | Counts the number of bytes of outgoing network traffic from the node
apiVersion: v1
kind: ConfigMap
metadata:
  name: mqtt-exporter-config
  namespace: monitoring
data:
  conf.yaml: |
    mqtt:
      host: '192.168.10.3'
      port: 30070

    metrics:
      - name: 'floor1_roomA1_smoke'
        help: 'Topic floor1/roomA1/smoke'
        type: 'gauge'
        topic: 'floor1/roomA1/smoke'
        label_configs:
          - source_labels: ['__msg_topic__']
            separator: '/'
            regex: '(.*)'
            target_label: '__topic__'
            replacement: '\1'
            action: 'replace'
4 https://github.com/fhemberger/mqtt_exporter
5.1.2 Analyze
The information collected in the Monitor stage must be analyzed, and dynamic
events that require adaptations must be identified. To this end, we have used
Prometheus Alerting Rules5 (see Figure 5.3) to define alert conditions based on
the rules defined in the manifest rules.yaml. Prometheus Alerting Rules queries
the Prometheus TSDB using the PromQL query language. An alert is sent to the
next MAPE-K loop stage (Plan) whenever one of the rule conditions fires. Alerting
Rules is a Prometheus TSDB feature, so it does not require dedicated manifests for
deployment.
Each rule consists of a name, an expression, a time period, and labels and anno-
tations to store alert information. The code presented in Listing 5.2 is an example
of an alert rule configuration for Prometheus Alerting Rules. The expression of this
rule gets the percentage of RAM consumption (using the total and available RAM)
of the fog-f1 node and checks whether it exceeds 80%. The for tag defines how long
the expression must be true to generate the alert (one minute). Finally, we use the
labels and annotations to include information about the actions linked to the alert.
Each IoT system rule (either architectural adaptation or functional rule) specified
through the DSL is transformed into an alert rule of Prometheus.
- alert: HighMemoryConsumption
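As a sketch consistent with the description above, such a rule could look as follows; the instance selector label and the label/annotation keys are assumptions, not the exact listing:

```yaml
groups:
  - name: adaptation-rules
    rules:
      - alert: HighMemoryConsumption
        # RAM consumption of the fog-f1 node, as a percentage
        expr: (1 - (node_memory_MemAvailable_bytes{instance="fog-f1"} / node_memory_MemTotal_bytes{instance="fog-f1"})) * 100 > 80
        for: 1m                 # the condition must hold for one minute
        labels:
          action: offloading    # assumed key linking the alert to an adaptation action
        annotations:
          description: RAM consumption of fog-f1 exceeds 80%
```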
5 https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
5.1.3 Plan
According to the analysis performed in the previous stage, an adaptation plan is
generated with the appropriate actions to adapt the system at runtime. The adapta-
tion plan contains the list of actions (scaling, offloading, redeployment, or operate
actuator) that the user has defined for each rule via the DSL. In this stage (see Figure
5.4), Prometheus Alert Manager is used to handle the alerts from the previous stage
(Analyze) and routing the adaptation plan to the next stage (Execute). Notification
receivers can be configured to send the alert message to third-party systems such as
email, slack, or telegram. We configured a webhook receiver to notify the alerts and
adaptation plan to the Adaptation Engine. Therefore, the adaptation plan is sent
as an HTTP POST request in JSON6 format to the configured endpoint (i.e., to the
Adaptation Engine).
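Alertmanager delivers such webhook notifications as a JSON object containing an alerts array. A minimal sketch of how a receiver like the Adaptation Engine could extract the planned actions follows; the action label name is an assumption:

```python
import json

def extract_actions(payload: str) -> list:
    """Parse an Alertmanager-style webhook payload and collect the
    adaptation action attached to each firing alert's labels."""
    body = json.loads(payload)
    actions = []
    for alert in body.get("alerts", []):
        if alert.get("status") == "firing":
            labels = alert.get("labels", {})
            actions.append({
                "alert": labels.get("alertname"),
                "action": labels.get("action"),  # assumed label carrying the action
            })
    return actions

# Example payload in the shape Alertmanager POSTs to a webhook receiver
payload = json.dumps({
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "HighMemoryConsumption", "action": "offloading"},
    }]
})
print(extract_actions(payload))
```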
Three YAML manifests are required for the deployment and configuration of the Prometheus Alert Manager: deployment.yaml to deploy it as a container inside a pod, service.yaml to make it accessible from outside the cluster, and config-map.yaml for its configuration, which includes the notification receiver (the configured webhook).
6 JSON (JavaScript Object Notation) is a lightweight data exchange format
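A hedged sketch of what the Alert Manager configuration inside config-map.yaml could contain for such a webhook receiver; the service name, port, and path are assumptions:

```yaml
route:
  receiver: adaptation-engine
receivers:
  - name: adaptation-engine
    webhook_configs:
      # Assumed in-cluster address of the Adaptation Engine service
      - url: http://adaptation-engine.monitoring.svc:5000/alerts
```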
5.1.4 Execute
In the Execute stage (see Figure 5.5), adaptations are applied to the IoT system fol-
lowing the actions defined in the adaptation plan. To achieve this, we have built the
Adaptation Engine, an application developed using Python7 , flask8 , and the python
API9 to manage the Kubernetes or K3S orchestrator. The Adaptation Engine is freely
available in our repository10 and also the image of the container11 ready to be exe-
cuted. Similar to the Prometheus Alert Manager, three YAML manifests are needed
at this stage: deployment.yaml to deploy the Adaptation Engine as a container, ser-
vice.yaml to configure its accessibility, and clusterRole.yaml to assign management
privileges over the IoT system infrastructure (e.g. privileges to delete or create pods).
The Adaptation Engine can apply two sets of actions: (1) architectural adap-
tations through the orchestrator (e.g., autoscaling an application or offloading a
pod); and (2) system actuators control to meet system functional requirements
involving system actuator management (e.g., activating/deactivating alarms, turn-
ing on/off lamps, and increasing the power of a fan)
7 https://www.python.org/
8 https://flask.palletsprojects.com/en/2.1.x/
9 https://github.com/kubernetes-client/python
10 https://github.com/ivan-alfonso/adapter-engine.git
11 https://hub.docker.com/r/ivanalfonso/adaptation-engine
There are three architectural adaptations that the Adaptation Engine is able to perform: (1) scaling an application by deploying a new pod on one of the nodes, (2) offloading a container/pod to a different node, and (3) redeploying a pod/container. These three architectural adaptations mainly benefit IoT application availability and system performance. On the other hand, actuator control is performed by sending control messages (defined by the user) to the actuator's MQTT topic.
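As a sketch of this actuator control, a hedged helper that prepares a user-defined control message for an actuator topic; the topic layout and payload shape are assumptions, and publishing would use an MQTT client such as paho-mqtt:

```python
import json

def build_control_message(region: str, actuator: str, command: str):
    """Compose the MQTT topic and JSON payload for an actuator control action.
    The topic layout mirrors the sensor topics above (e.g. floor1/roomA1/...)."""
    topic = f"{region}/{actuator}"
    payload = json.dumps({"command": command})
    return topic, payload

topic, payload = build_control_message("floor1/roomA1", "alarm", "activate")
print(topic, payload)
```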
Using the Python API for Kubernetes, the Adaptation Engine manages the objects in the Kubernetes cluster to perform the architectural adaptations. We have defined methods for creating, deleting, and scaling pods. For example, Listing 5.3 shows an excerpt of the method used by the Adaptation Engine to create a pod without node selection preferences to host it. Line 1 imports the library. Line 3 defines the method with all the input parameters needed to create the pod (these parameters are included in the adaptation plan sent from the Plan stage). Lines 4-5 create the pod object and assign its metadata (such as its name). Line 6 defines the memory and CPU requirements of the container created in line 7. Line 8 sets the repository of the container image. Line 10 verifies that the pod has no node selection preferences. The software containers are assigned to the pod on lines 11-12, and the pod is created on line 13. Finally, we verify that the pod was created correctly (lines 14-16). We have created the verify_pod_creation method to obtain the pod status and verify its creation. To create pods that have node selection preferences, we use the Kubernetes affinity and anti-affinity specifications, which restrict the nodes that will host the pod.
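The shape of the pod object built in this way can be sketched as a plain manifest dictionary, which could also be passed as a body to the Kubernetes Python client; the names and values here are illustrative, not the exact Listing 5.3 code:

```python
def build_pod_manifest(pod_name, image, namespace, cpu, memory, node_selector=None):
    """Build a minimal Pod manifest with resource requests; node_selector
    restricts the hosting nodes when node selection preferences exist."""
    manifest = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "namespace": namespace},
        "spec": {
            "containers": [{
                "name": pod_name,
                "image": image,
                "resources": {"requests": {"cpu": cpu, "memory": memory}},
            }],
        },
    }
    if node_selector:  # e.g. {"kubernetes.io/hostname": "fog-1"}
        manifest["spec"]["nodeSelector"] = node_selector
    return manifest

# Hypothetical application image, for illustration only
m = build_pod_manifest("realtime-app", "example/realtime-app:latest",
                       "default", "500m", "256Mi")
print(m["spec"]["containers"][0]["resources"])
```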
• In the Monitoring stage, the exporters gather information about CPU con-
sumption of the fog-f1 node. This information is stored in the Prometheus
database.
• Then, in the Analysis stage, the condition of the rule is verified by executing query expressions in the PromQL language. For example, the expression (executed by Prometheus Alerting Rules) that checks whether the CPU consumption of the fog-f1 node exceeds 80% for 1 minute is presented in Listing 5.4. Note that we calculate the average amount of CPU time used, excluding the idle time of the node. If the condition is true, the alert signal is sent to the Alert Manager component of the next stage of the cycle (Plan).
• When the alert is received, the adaptation plan is built containing the two actions (offloading and scaling) and their corresponding information in JSON format. For example, Listing 5.5 shows the JSON built by the code generator for the offloading action. The information attached to the JSON object includes the name of the pod/container to be offloaded (pod_name), the image of the application running in the container (image), the Kubernetes/K3S namespace, the memory and CPU requirements of the application (requirements), and the target nodes and target regions where the C4 container would be offloaded; in this example, the edge-b1 node and the Floor 1 region.
• In the Execute stage, the Adaptation Engine component first performs the
Offloading action, and only if it fails, then the second action (Scaling) is per-
formed.
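A hedged sketch of what the offloading entry of such an adaptation plan could look like; the field names follow the description above, but the exact keys and the values for the C4 example are illustrative:

```python
import json

# Offloading action entry of the adaptation plan, in the spirit of Listing 5.5
offloading_action = {
    "action": "offloading",
    "pod_name": "C4",
    "image": "example/predictive-app:latest",  # assumed image name
    "namespace": "default",
    "requirements": {"cpu": "500m", "memory": "256Mi"},
    "target_nodes": ["edge-b1"],
    "target_regions": ["Floor 1"],
}
print(json.dumps(offloading_action, indent=2))
```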
To run the framework tools and applications on the IoT system infrastructure, the YAML manifests (built by the code generator) must be applied on the master node of the cluster. Figure 5.6 presents the directory of generated folders and files (left side) and a snippet of the start.sh script (right side) for the deployment of the IoT tools and applications. Executing the generated code creates several Kubernetes objects in the cluster, such as ConfigMaps, Deployments, Services, and Pods. To run the framework and all these Kubernetes objects, simply run the start.sh script, which uses kubectl (the command-line tool of Kubernetes/K3S).
Appendix B provides a detailed guide to install and use the DSL, generate the code, and run the framework.
5.4 Conclusion
In this chapter we have presented the runtime approach to support the operation and self-adaptation of the IoT systems specified with the DSL. This runtime framework is based on the MAPE-K loop and involves several technologies to perform the monitoring and adaptation of the system. Both the monitoring data from the infrastructure metrics and the data collected by the system sensors are stored in the Prometheus time series database. The collection of infrastructure and QoS metrics is performed with technologies such as kube-state-metrics and Node Exporter, while an MQTT exporter subscribed to the broker's topics is used to obtain the data collected by the sensors.
The analysis of the information and the triggering of alerts are performed through PromQL queries to the database and rules configured in the Prometheus Alerting Rules component. Finally, an adaptation engine developed in Python adapts the system according to an adaptation plan generated upon the detection of a dynamic event.
Chapter 6
Extending DSL for Specific Cases
Our DSL can be used as-is to model any type of multi-layered IoT system. However, it has also been designed to be easily extensible (i.e., including new concepts and updating the editors to enrich the language) so that we can further tailor it to specific types of IoT systems. This chapter presents two extensions of our DSL: (1) in Section 6.1, we describe a DSL extension to model IoT systems for underground mining, as this is a key economic sector in the local region of the PhD student and there is a need for a better way to model these systems, e.g., for analysis of regulatory compliance; (2) Section 6.2 introduces a DSL extension to model IoT systems for Wastewater Treatment Plants (WWTPs), including its process block diagram. The implementation of IoT systems in this domain is one of the case studies addressed in the European project TRANSACT1, in which we participate. Finally, Section 6.3 concludes the chapter.
1 https://transact-ecsel.eu/
86 CHAPTER 6. EXTENDING DSL FOR SPECIFIC CASES
Figure 6.1: Excerpt of the DSL extension metamodel for underground mines spec-
ification. The Region and IoTDevice concepts have been previously defined in the
metamodel in Figure 4.1
We enable a tree-based notation for modeling the relevant regions2 that make up the mine structure. Figure 6.2 presents an example of the modeling of an underground mine containing two entries (Entry A and Entry B) in each of its inclined access tunnels (Slope A and Slope B), an internal tunnel (Internal), and a room (Room) with two exploitation work fronts (W-front 1 and W-front 2). The control points are modeled using a textual+tabular notation, while the rest of the concepts representing the IoT system and the rules (architectural adaptation and functional rules) are modeled following the concrete syntax presented in Chapter 4. Figure 6.3 presents an example of modeling a control point located in the mine room. This control point contains an actuator (alarm) and three sensors: one for methane (CH4), one for carbon dioxide (CO2), and one for temperature.
2 Note that our DSL is focused on the structure and rules governing the “behaviour” of the IoT system of the mine; it does not intend to replace other types of 3D mine models.
Our DSL enables the specification of functional rules in order to address functional requirements of the system by controlling the state of the actuators, for example, activating emergency alarms, turning on the ventilation system, or disabling machinery. The functional rules are specified as explained in Section 4.2. However, in this DSL extension we added support for involving control points directly in the rules.
At each mine control point, the airflow should be controlled by the fans. While very fast air currents can produce and propagate fires, very low air currents may not be efficient in dissipating gas concentrations. This extension of the DSL enables the modeling of conditions such as (ControlPointA → airFlow) > 2 m/s. This condition checks whether any of the airflow sensors belonging to ControlPointA exceeds 2 m/s.
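The semantics of such a control-point condition ("any sensor of the control point exceeds the threshold") can be sketched as follows; the names are illustrative, not DSL-generated code:

```python
def control_point_exceeds(readings: dict, threshold: float) -> bool:
    """True if any sensor reading of the control point exceeds the threshold."""
    return any(value > threshold for value in readings.values())

# Airflow readings (m/s) of two airflow sensors at a control point
readings = {"airflow-1": 1.8, "airflow-2": 2.4}
print(control_point_exceeds(readings, 2.0))
```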
6.1. MODELING IOT SYSTEMS FOR THE UNDERGROUND MINING INDUSTRY 89
Architectural adaptations
Architectural adaptations of the system are also necessary in the underground min-
ing scenario. There are several factors that can impact system operation. For ex-
ample, the sampling frequencies of the sensors deployed in the mine may vary de-
pending on conditions such as the time of day (higher monitoring frequency during
working hours), the number of workers, or the gas concentrations detected. Very high collection frequencies increase bandwidth and resource consumption, causing failures if the system is not designed to cope with them.
Another factor that impacts system performance is sudden node unavailability.
When an edge/fog node fails, the tasks or applications it hosted should be offloaded
to nearby nodes. Node unavailability can occur due to depletion of the battery that
powers the node (when it is battery powered), node overload, or damage due to
hostile environment or emergencies (e.g. a landslide).
These scenarios that impact system performance can be addressed by specify-
ing architectural adaptation rules. The IoT system architecture and rules can be
specified using the textual and tabular editors presented in Section 4.2. For exam-
ple, Figure 6.4 shows a rule that scales the gas-detection application to 5 instances when an increase in sensor sampling frequency (i.e., an increase in input bandwidth consumption in the gateway node) is detected. The 5 new instances will be deployed on any of the nodes located in the Room region.
The DSL editor extended to the underground mining domain is freely available in our repository3.
Concrete Syntax
This DSL extension includes the definition of new editors in MPS for modeling the
mine structure (tree-view editors) and the control points (tabular editors). Figure 6.5 shows the tree-view editor for the Drift concept, composed of a Swing component, the name, the length of the Drift access, and the set of subregions represented as the branches of the tree.
3 https://github.com/SOM-Research/IoT-Mining-DSL
Java Swing components are inserted into the editor to create graphical shapes. The code of the Swing component that creates the Drift Access graphical shape is shown in Listing 6.1. The graphical shape is drawn on a JPanel object (a container holding a set of components that can generate a graphical representation) defined in line 4. The dimensions of this panel depend on the size of the font used in the model. The paintComponent method defined on line 10 contains the instructions to define the shape, lines, and colors of the drawing. We have defined a graphical shape (Swing component) for each concept that makes up the mine structure, such as Working Face, Drift Access, Room, and Mine, as shown in Figure 6.2.
Listing 6.1: Swing component code to draw the Drift Access shape
This extension of our DSL does not involve modifications to the code generator or to the framework that supports runtime adaptations of the IoT system. Therefore, the code generator and framework used with this DSL extension are the same as those presented in Section 4.4 and Chapter 5, respectively.
6.2 Modeling IoT Systems for Wastewater Treatment Plants (WWTPs)
Figure 6.6: Excerpt of the DSL extension metamodel for WWTPs specification. The
Region and IoTDevice concepts have been previously defined in the metamodel in
Figure 4.1
To model the process block diagram, we have developed a graphical editor using the MPS Diagrams plugin4. This plugin allows defining shapes using Java code to graphically represent the metamodel concepts, for example, shapes for Treatments and arrows for water and sludge Flows. Figure 6.7 shows the process block diagram of the Algemesí WWTP, in Spain (one of the plants studied in the TRANSACT project). Each shaded shape represents one of the plant Treatments, while the arrows represent the flows of water or sludge between the Treatments. The color of the shaded shape or arrow denotes the type of fluid: blue when the Treatment or Flow is water, and yellow when it is sludge.
Figure 6.7: Algemesí WWTP block process diagram specification using the DSL
In the Algemesí WWTP, there are five water treatments, three sludge treat-
ments, and several water and sludge flows between them (Figure 6.7). In each of
these treatments, several variables are monitored to supervise and control the pu-
rification processes. For the specification of the sensors and actuators for each treat-
ment, we provide a textual and tabular notation. For example, Figure 6.8 shows the
specification of the Grit Chamber, which contains three sensors to monitor the liq-
uid characteristics (pH, electrical conductivity, and total suspended solids), a sensor
to measure the tank level, and an actuator (valve) to regulate the fluid level in the
tank.
The dynamic environment also generates unexpected changes in WWTPs that must be dealt with at runtime. For example, in rainy seasons, some WWTPs exceed their treatment capacity, causing a negative impact on the quality of the environment due to overflow and discharge of wastewater into the environment [79]. Although increases in plant inflow are difficult to deal with, some actions can be taken to reduce the environmental impact of unwanted runoff, for example, the automatic generation of alarms, the control of valves, or the manipulation of other plant actuators when an unexpected event is detected.
4 https://jetbrains.github.io/MPS-extensions/extensions/diagrams
These types of scenarios can be modeled using this extension of the DSL, including rules that directly involve the variables monitored in the plant Treatments. For example, assuming that in the Grit Chamber of the Algemesí WWTP the valve should be opened when the water level exceeds the limit (300 cm, according to the sensor specification in Figure 6.8) for 5 minutes, the rule could be specified as shown in Figure 6.9.
Concrete Syntax
To enable the graphical notation of the WWTP process block diagram, we have defined graphical editors in MPS that use shapes to draw the Treatments and arrows for the water or sludge Flows. Figure 6.10 presents the graphical editor designed for the BioReactor concept. The graphical form of the concept is defined using the diagram.box instruction. In the editor section we define the displayed text, and in the shape section the graphical shape (i.e., Bio_Reactor). To define these shapes we use a DSL provided by MPS, which allows defining shapes using Java objects such as arcs, rectangles, lines, and areas. For example, Figure 6.11 shows the Bio_Reactor shape specification, formed by an area that subtracts the rectangle rect2 from rect1. The resulting shape for the BioReactor concept can be seen in Figure 6.7.
Figure 6.10: Graphical editor to represent the BioReactor concept in the process
block diagram
6.3 Conclusion
In this chapter we presented two DSL extensions for modeling self-adaptive IoT
systems. The first one focused on systems implemented in the underground coal
mining domain, and the second one focused on WWTP systems.
The first DSL extension covers the specification of concepts of the underground
coal mining domain, including the modeling of mine areas (e.g., tunnels, working
faces, and rooms) using a tree notation. Modeling of control points within the mine
and their IoT devices (sensors and actuators) involved is also addressed in this DSL
extension. These new concepts can be used for the specification of the architectural
and functional rules.
The second DSL extension addresses the modeling of WWTP process block diagrams through a graphical representation of the sequence of treatments. The treatments are modeled as shaded shapes, and the water or sludge flows are represented as directed arrows. The IoT devices involved in each water/sludge treatment are specified using a tabular notation. Similar to the first DSL extension (mining domain), the new concepts can be linked to build rules.
Chapter 7
Experimental Evaluation
Although DSLs are intended to reduce the complexity of software system develop-
ment, a poorly designed DSL can complicate its adoption by domain users. This is
why usability studies are so important in software engineering [15]. Usability is a
measure of effectiveness, efficiency and satisfaction with which users can perform
tasks with a tool.
This chapter presents the empirical evaluations we have conducted to assess the usability of the DSL and the self-adaptive capability of our approach. In Section 7.1, we present two empirical evaluations of the DSL to assess its expressiveness and ease of use. In Section 7.2, we evaluate the self-adaptation feature of our approach in three scenarios to test the three adaptations (scaling, offloading, and redeployment) that we addressed. Section 7.3 presents the evaluation of our MAPE-K-based framework to identify scalability limitations and boundaries to perform concurrent adaptations. Finally, Section 7.4 concludes this chapter.
1 https://github.com/SOM-Research/IoT-Mining-DSL
• Session 2 (40 min): We presented the basic concepts and examples for specifying system architectural adaptation rules. Then, participants performed a modeling exercise of five architectural adaptation rules involving infrastructure metrics (such as CPU consumption, RAM, and availability) and architectural adaptations such as application scaling or container offloading. Finally, questionnaire Q2 was completed to gather information about the expressiveness and ease of use perceived by the participants.
2 https://github.com/SOM-Research/IoT-Mining-DSL
7.1. DSL EMPIRICAL EVALUATION 99
Results
Figure 7.1 shows the level of knowledge reported by the participants about IoT systems (architecture, deployment, and operation) and containerization as a virtualization technology. Although most participants reported a low level of knowledge, they are familiar with monitoring QoS metrics (such as latency, availability, bandwidth, and CPU consumption) and architectural adaptations such as auto-scaling and offloading.
• Some suggestions about typos and minor interface (editor) improvements were reported and have already been addressed.
Figure 7.3: Percentage and number of right and wrong answers (experiment 2)
• Although the DSL provides textual and tabular notation for modeling the
architecture nodes, including graphical notation (such as a deployment dia-
gram) could be useful to easily follow the hierarchy of the architecture nodes.
• There may be applications that require more than one port to be exposed.
However, the DSL does not allow more than one port to be associated with
each application. The suggestion is to enable the specification of multiple
ports for a single application.
on the mine ventilation system when the methane gas sensor exceeds the threshold
value).
• Session 1 (50 min): In the first 20 minutes of Session 1, we introduced the basic concepts of IoT systems and the use of the DSL implemented in MPS to model the structure of underground mines, the control points, and the IoT devices deployed (sensors and actuators). Next, the participants performed the first modeling exercise about an underground coal mine (with the structure shown in Figure 6.2), two control points (one at each working face) with three gas sensors and an alarm, a fan, and a control door in the internal tunnel. Each participant was provided with a virtual machine configured with the necessary software to perform the modeling exercise. Finally, the participants filled out a questionnaire (Q1) about the usability and expressiveness of the DSL to model the concepts of the first exercise.
seconds, then turn on the fan and activate the alarms. Finally, participants
completed the questionnaire Q2 to report their experience modeling the rules.
Q2 also contained open-ended questions to obtain feedback on the use of the
entire tool and suggestions for improvement.
Results
Four of the participants were involved in education (either students, teachers, or re-
searchers), while the remaining four were involved in industry. Figure 7.4 presents
the level of general mining knowledge (very low, low, medium, high, and very high)
of the eight participants. All of them are aware of the terminology used in the design
and structure of underground coal mines. Only two participants were not familiar
with cyber-physical or IoT systems for mining. The modeling tools they have used in the mining context are AutoCAD4 and MineSight5 for the graphical design of the mine structure, and VentSim6 for ventilation system simulations. However, these mining modeling tools do not allow modeling self-adaptive IoT systems.
None of the participants were familiar with MPS.
Figure 7.5 presents the responses from questionnaires Q1 and Q2 related to the ease of use of the DSL. Most participants reported that modeling the mine structure, the control points, the devices (sensors and actuators), and the adaptation rules was easy. The results are positive, as also evidenced by the number of right and wrong concepts modeled by the participants (Figure 7.6). The number of errors was low (12 of 188 modeled concepts): three incorrect Rule-conditions due to a wrong selection of the unit of measure, four incorrect Actuators due to a wrong assignment of the actuator type and location within the mine, three missing Sensors that were not modeled, and two incorrect Regions (working faces) whose type was not selected.
4 https://www.autodesk.com/products/autocad
5 https://www.ici.edu.pe/brochure/cursos-personalizados/ICI-MINESIGHT-Personalizado.pdf
6 https://ventsim.com
Figure 7.6: Percentage and number of right and wrong answers (experiment 1)
• Include the specification of the coordinates for each region and control points
of the mine. Additionally, it would be useful to specify the connection be-
tween internal tunnels.
• The condition of a rule has a single time period. However, it would be useful to associate two time periods with conditions composed of two expressions.
• The mine ventilation system can be activated periodically at the same time
each day. It would be useful if the DSL could model rules whose condition is
associated with the time of day.
7.2.2 Protocol
For each type of architectural adaptation, we performed a trial that generally follows the same protocol, consisting of the following steps:
1. Model the IoT system (using our DSL) including the rule to be tested. The
model built for these experiments can be consulted in Appendix C.
2. Run the code generator using the model built in the first step.
3. Deploy the IoT applications and execute our runtime framework (which is
described in Chapter 5) using the YAML manifests built by the code generator.
Scaling an application
The experiment scenario to test the Scaling adaptation consists of two steps, as shown in Figure 7.8.
Offloading a container
The experiment scenario to test the Offloading adaptation consists of two steps as
shown in Figure 7.10.
that run the realtime-app and predictive-app applications. Then, in step 2, the system offloads the C4 container to the fog-1 node, freeing resources on the edge-1 node.
The rule modeled is shown in Figure 7.11: if the CPU usage of node edge-1 exceeds 80% for 30 seconds, then the C4 container is offloaded to node fog-1.
Redeploying a container
The scenario for testing the Redeployment adaptation is the same as shown in Fig-
ure 7.7. First we force a failure in the C1 container by logging into the pod using
the command line tool and stopping some system processes. Then, our runtime
framework, which constantly monitors the state of containers, detects container
unavailability and redeploys it using the Adaptation Engine.
The rule modeled for testing the Redeployment of a container is shown in Figure
7.12. If the container C1 is detected to be unavailable for more than 20 seconds, then
it is redeployed.
7.2.3 Results
In the test scenario for each type of adaptation, the system was monitored in two cases: (1) without implementing adaptations, and (2) self-adapting the system according to the configured adaptation rules. The results for each type of adaptation are compared below.
Scaling an application
Figure 7.13 presents the CPU usage of the edge-1 node as the sampling frequency of the sensors increases. The colored shaded areas represent the different sampling frequencies of the sensors: the blue shading indicates that the sensors publish data to the broker at a frequency of 5 data/sec, the green shading 12 data/sec, and the purple shading 30 data/sec. A rate of 30 data/sec induces 100% CPU usage on the node; this depends on the type of applications implemented and the characteristics of the node. In these tests, we deployed applications that generate a high workload on the most limited AWS instances.
In both cases (non-adaptive and self-adaptive), the CPU consumption of the node increased as the amount of data to be processed grew. However, when the CPU usage reached 100%, the edge-1 node failed at time tfail (Figure 7.13(a)) in the case without adaptations, whereas with the adaptation rule in place the system auto-scaled the realtime-app application (at time tscale), reducing the workload of the edge-1 node (Figure 7.13(b)) and preventing it from failing.
Similarly, Figure 7.14 shows the time spent by the C2 container to process the data published in the broker by the sensors. For the non-adaptive system (Figure 7.14(a)), the processing time for some data grew up to 13.5 sec until the node failed due to work overload. In contrast, the self-adaptive system (Figure 7.14(b)) reached processing times of 8.8 sec; then the system auto-scaled the realtime-app application at time tscale, and the data processing time dropped back below 1 sec.
7.2. EVALUATION OF SYSTEM SELF-ADAPTATIONS 111
(b) edge-1 node CPU usage (self-adaptive system; scaling the realtime-app application)
Offloading a container
The CPU consumption of the edge-1 node is shown in Figure 7.15. As in the results of the Scaling adaptation, the colored shades in the figure represent the different data-sending frequencies of the sensors. Note that there is a constant CPU usage (approx. 43%) before the sampling frequency of the sensors increases. This CPU usage is caused by the C4 container, which simulates the predictive algorithm. As the sampling frequency increased, the CPU consumption of the node also increased. For the non-adaptive system (Figure 7.15(a)), the node overloaded (CPU usage reached 100%) and failed 26 seconds (tfail) after the sampling rate increased from 12 to 20 data per second. In the case of the self-adaptive system, the CPU consumption of the node did not reach 100% thanks to the adaptation: the C4 container was offloaded to the fog-1 node at time toffload, reducing the workload on the edge-1 node and preventing it from failing.
Figure 7.16 shows the processing time of the C2 container data. This processing time increased considerably when the node used 100% of the CPU, as in the case of the non-adaptive system (Figure 7.16(b)), in which the processing time of some data reached 17 seconds before the node failed. In contrast, the self-adaptive system experienced processing times of less than 0.3 seconds. This is because the edge-1 node never reached high CPU consumption.
Redeploying a container
For this Redeployment test, the dependent variable is the availability of the container to be adapted (i.e., redeployed). Therefore, we present and analyze the results regarding the availability of the container.
Figure 7.17 shows the state (available or unavailable) of the C4 container for both cases: the non-adaptive and the self-adaptive system. In the case of the non-adaptive system, the container remains unavailable once its failure has been induced at time t1. In contrast, the self-adaptive system detects that the container has been unavailable for 20 seconds and starts the redeployment process. It took approximately 35 seconds to remove the C4 container and redeploy it. After this procedure, the container changed its status back to available.
AWS cloud services, which guarantee high availability. Additionally, to avoid dis-
turbances in the delivery of the data sent by sensors, we ran the scripts (simulating
the sensors) on an EC2 instance of AWS deployed in the same virtual private cloud
as the nodes. This way, we guarantee that the data generated in the device layer is
published to the broker on a regular basis and with low latency.
Measurements of dependent variables should be reliable: to ensure the
reliability of the measurement of dependent variables (e.g., latency and availability),
we have run each experiment at least three times and obtained very similar results.
To collect the data for these variables, we used the monitors and exporters deployed by the
framework and queried the information in the Prometheus time series database.
Mono-operation bias: the study should include the analysis of more than one
dependent variable. In our experiments, we analyzed various QoS and in-
frastructure metrics of nodes and containers. For example, we collected and displayed CPU
utilization, availability, and data processing latency. Additionally, for the Redeploy-
ing a container adaptation experiment, in addition to analyzing the availability met-
ric, we also captured the unavailability time of the container while it was redeployed.
Size of the test scenario: one of the threats is related to the size of the IoT
system to be modeled and tested. Since the objective of this validation was to test
the architectural adaptations individually, we proposed a small scenario composed
of an IoT system with two edge nodes and one fog node. This scenario was sufficient
to model an adaptation rule that allowed testing each adaptation. Nevertheless, we
have planned a larger scenario (as presented in Section 7.3) to validate the scalability
of our approach and the execution of concurrent adaptations.
• The edge and fog layer nodes run different applications to detect emergen-
cies, control actuators, store information locally and aggregate information
to be sent to the cloud nodes.
• The cloud layer hosts a web application to visualize incident reports and query
aggregated historical data on the environmental status of the mines. A database is
also deployed on one of the cloud nodes to store the aggregated data.
To set up the test environment, we provisioned EC2 instances from AWS: t2.micro
instances (1 vCPU and 1 GB of memory) for edge nodes, t2.small instances (1
vCPU and 2 GB of memory) for fog nodes, and t2.medium instances (2 vCPU and 4
GB of memory) for cloud nodes. The collection and sending of sensor data
was simulated using a script written in Python.
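The sensor-simulation script itself is not listed in this chapter; a minimal sketch of how such a script could generate readings is shown below. The payload fields, topic name, and gas choice are illustrative assumptions, and the MQTT publishing call (which in the real script would use a client such as paho-mqtt) is shown only as a comment since it depends on the broker setup:

```python
import json
import random
import time

def make_reading(sensor_id: str, gas: str, value: float) -> str:
    """Build one sensor reading as a JSON payload (field names are assumptions)."""
    return json.dumps({
        "sensor": sensor_id,
        "gas": gas,
        "value": round(value, 2),
        "timestamp": time.time(),
    })

def simulate(sensor_id: str, gas: str, frequency_hz: int, n: int):
    """Generate n readings at the given sampling frequency (data points per second)."""
    readings = []
    for _ in range(n):
        payload = make_reading(sensor_id, gas, random.uniform(0.0, 5.0))
        readings.append(payload)
        # In the real script an MQTT client would publish here, e.g.:
        # client.publish(f"mine1/front1/{gas}", payload)
        time.sleep(1.0 / frequency_hz)
    return readings

# Example: 20 data per second, the highest sampling rate used in the experiments
data = simulate("s1", "ch4", 20, 5)
```

Varying the `frequency_hz` argument reproduces the different data sending frequencies shown as colored shades in the result figures.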
As indicated in the legend of Figure 7.18, the containerized applications de-
ployed on the nodes include: an MQTT broker that receives and distributes all sen-
sor data; stel-app and twa-app check that the values of the monitored gases do not
exceed their allowable STEL and TWA values; temp-app checks that the temperature
at the different work fronts does not exceed the allowable limit value (depending on
wind speed); local-app is a local application for real-time querying of sensor data;
local-database stores the data locally before being aggregated and sent to the cloud;
finally, web-app and cloud-database enable storing and querying the aggregated data
in the cloud. In this experiment, the development of the full functionality
of these applications is out of scope. Instead, we have developed applications with
the basic functionality, including the container images. For example, the twa-app
application we developed subscribes to the broker to receive the sensor data and
performs a data analysis, but does not calculate the actual TWA value.

10
A large underground coal mine in Colombia could have around ten active work fronts, either
for mining, advancement, or development.
This IoT scenario for underground coal mining is intended to be close to a real
implementation following the rules established in the Colombian mining regula-
tions [132]. For example, the STEL and TWA values are suggested by this regula-
tion.
Experiment 1
Figure 7.19 shows the rule for Work Front 1 (a similar rule was specified for each
work front): if the CPU consumption of node edge-3 exceeds 80% for 60 seconds,
then container c3 is offloaded to node edge-2. In this experiment we chose the
offloading action, which implies the most effort for the Adaptation Engine, since it
involves the removal and creation of a container on a different node.
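In our framework, such a rule becomes a Prometheus Alerting Rule in the generated YAML manifests. A sketch of what this rule could look like is shown below; the metric name, label values, and annotation keys are illustrative assumptions rather than the exact output of our code generator:

```yaml
groups:
  - name: workfront1-rules
    rules:
      - alert: Edge3CpuHigh
        # Fire when the CPU usage of node edge-3 exceeds 80% for 60 seconds
        expr: node_cpu_usage_percent{node="edge-3"} > 80
        for: 60s
        labels:
          action: offloading
        annotations:
          container: c3
          target_node: edge-2
```

When the alert fires, the Alert Manager forwards it to the Adaptation Engine, which executes the offloading action.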
Experiment 2
1. Model the IoT system (using our DSL) including the rules to be tested. The
model built for these experiments can be consulted in Appendix A.
2. Run the code generator using the model built in the first step.
3. Deploy the IoT applications and execute our runtime framework using the
YAML manifests built by the code generator.
4. Execute the Python script that simulates the generation of sensor data and
publishes the messages to the broker. In this way, the necessary workload is
generated on the nodes for the rules to be fired.
• the event detection delay refers to the time the system (Prometheus Alerting
Rules) takes from the detection of the first rule's event to the detection of the last
one (e.g., for test #2, the system takes 10.30 seconds from the detection of the event
of rule 1 to the detection of the event of rule 5);
• the time the system (Prometheus Alert Manager) takes to generate the adap-
tation plan and send it to the Adaptation Engine;
• and finally, the time the system (Adaptation Engine) takes to perform the
Offloading adaptation, which involves the creation of a new container and
the deletion of the old one.
The time spent in the monitoring stage to collect QoS and infrastructure metrics
has not been measured because it is a configurable fixed value in Prometheus.
For these experiments, we have set the Prometheus monitoring frequency equal to
4 times every minute (i.e., monitoring every 15 seconds), and the rule evaluation
frequency equal to 6 times per minute (i.e., evaluation every 10 seconds). These
values were adequate to avoid generating a significant workload on the edge nodes (EC2
t2.micro instances with limited resources).
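These two frequencies map to Prometheus's global scrape and rule-evaluation intervals; the corresponding configuration fragment would look roughly as follows:

```yaml
global:
  scrape_interval: 15s      # collect metrics 4 times per minute
  evaluation_interval: 10s  # evaluate alerting rules 6 times per minute
```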
Findings from the results of this experiment are presented below.
• Ideally the event detection delay (column 4 in Table of Figure 7.21) should
be equal to zero, i.e., all events should be detected at the same time since
the increase in CPU consumption was caused at the same time in all nodes.
However, there are delays of approximately 10 to 15 seconds, due to
the configured monitoring and rule evaluation frequencies.
• In all cases (even configuring 30 rules), all actions (offloading) were performed
successfully. Approximately one second is required for Prometheus Alert
Manager to process an alert, generate the adaptation plan, and send it to the
Adaptation Engine. The adaptation time depends on the type of action: the
Adaptation Engine, via the K3S orchestrator, takes approximately 2 seconds
to create a pod (which hosts a container) and 31 seconds to delete a pod. In this
experiment, MAPE-K components did not fail. In Experiment 2 we subjected
the framework to more exhaustive tests, increasing the number of adaptation
executions (details can be found in Section 7.3.3).
• The average time taken by the adaptation engine to perform a container of-
fload is about 33 seconds, with the removal of the container being the most
time consuming task (about 31 seconds). This time is due to the grace period
(default 30 seconds) that K3S uses to perform the deletion of a pod. When K3S
receives the command or API call to terminate a pod, it immediately changes
its status to "Terminating" and stops sending traffic to the pod. When the
grace period expires, all processes within the pod are killed and the pod is
removed. Although our DSL does not currently support grace period config-
uration, we plan to include the specification of this parameter to ensure safe
termination of containers for adaptations that require it (such as offloading
or redeployment actions).
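In Kubernetes (and therefore K3S), this grace period is controlled by the `terminationGracePeriodSeconds` field of the pod specification, so supporting it in the DSL would amount to emitting this field in the generated manifests. A sketch follows (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: c4
spec:
  # Override the default 30-second grace period before the pod is killed
  terminationGracePeriodSeconds: 10
  containers:
    - name: c4
      image: registry.example.com/predictive-algorithm:latest
```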
Experiment 2
In Experiment 2, we set up rules composed of several actions (summarized in the
legend of Figure 7.22) and ran multiple tests by increasing the number of configured
rules. The results obtained are presented in Figure 7.22, including the number of
tested rules and actions, the number of failed or unsuccessful actions, the number
of nodes that failed (some tests produced high memory and cpu pressure, inducing
failed nodes), and the average time taken by the Adaptation Engine to perform the
successful actions.
• For tests 1 and 2 all actions were performed successfully. However, tests 3
and 4 presented failed actions (i.e., adaptations that could not be completed by
the Adaptation Engine component), mainly Scaling-type actions (A2, A3, and
A4). These scaling actions were not completed because there were no more
resources available on any of the mine nodes to deploy the new container
instances. Some edge nodes (5 for test 3 and 21 for test 4) even failed due to
work overload, causing some of the A6 actions to also not complete
successfully. These results demonstrate that the successful implementation
of the adaptations strongly relies on the availability of resources of the target
nodes of the actions. In this sense, one of the improvements for our DSL
could be to generate warnings to the user when insufficient resources are
detected to perform the modeled rules. In this way we could prevent the
implementation of infeasible rules that could fail due to lack of resources.
• The nodes that failed during tests 3 and 4 showed high CPU consumption
due to the number of containers assigned to them. Whenever a new pod is
deployed on a cluster of nodes (e.g., on one of the edge nodes in mine1), the
Kubernetes Scheduler11 becomes responsible for finding the best node for that
pod to run on. Although the Scheduler checks the node resources, it does not
analyze the real-time consumption of CPU and RAM memory. Therefore, for
Scaling actions in tests 3 and 4, the Scheduler assigned containers to nodes
(without checking their current state), causing them to fail. The design and
implementation of a Scheduler that analyzes real-time metrics (such as CPU
consumption) could avoid this kind of error.
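As a sketch of this idea, a metric-aware scheduling step could filter out nodes whose current CPU usage exceeds a threshold and place the pod on the least-loaded remaining node. The node representation and threshold below are illustrative assumptions, not part of Kubernetes:

```python
def pick_node(nodes, cpu_threshold=80.0):
    """Select the node with the lowest current CPU usage among those below the
    threshold; return None when every node is overloaded (names are assumptions)."""
    candidates = [n for n in nodes if n["cpu_percent"] < cpu_threshold]
    if not candidates:
        return None  # no feasible node: the adaptation should be rejected or delayed
    return min(candidates, key=lambda n: n["cpu_percent"])

nodes = [
    {"name": "edge-1", "cpu_percent": 95.0},  # overloaded: filtered out
    {"name": "edge-2", "cpu_percent": 40.0},
    {"name": "fog-1", "cpu_percent": 55.0},
]
best = pick_node(nodes)
```

Returning `None` for an infeasible placement is also what would let the DSL warn the user about rules that cannot be satisfied with the available resources.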
• The types of actions that require more time to be performed are Offloading
and Redeployment. This is because these actions involve the removal of a
pod, a task that takes about 31 seconds due to the default grace period set by
the K3S orchestrator. On the other hand, the average time it takes to perform
the Scaling action depends on the number of instances to be deployed. The
Adaptation Engine takes about six seconds to scale three pods or containers,
while it takes about four seconds to scale two pods or containers.
7.4 Conclusion
In this chapter, we presented the empirical evaluations we performed to validate our
approach: (1) experiments to validate the usability and expressiveness of the DSL, (2)
experiments to validate the functionality of the three types of architectural adap-
tations, and (3) experiments to identify scalability limitations and boundaries to
perform concurrent adaptations of our MAPE-K based framework.
The validation of usability and expressiveness of the DSL is divided into two
experiments. The first experiment was conducted with computer science partic-
ipants (doctoral students and postdoctoral researchers) to evaluate the modeling
of architectural aspects of the IoT system, container deployment, and architectural
adaptation rules. The second experiment was conducted with participants from the
mining area to evaluate the modeling of concepts (in the extended DSL for mining)
such as mine structure, control points, sensors, actuators and functional rules (e.g.
triggering of alarms due to toxic gas detection). The participants found the DSL
useful, sufficiently expressive, and easy to use. Although they were not familiar
with MPS prior to the experiment, most participants reported that the learning
curve is low. The low error rate demonstrates the ease of use of the DSL, even for
users who are not experts in Mining, IoT, MPS, or other modeling tools.
We designed three experiments to functionally validate the architectural adap-
tations and compare the availability and performance of a non-adaptive IoT sys-
tem with that of a self-adaptive IoT system that is modeled and managed using
our approach. The protocol of these experiments includes the modeling of the self-
adaptive IoT system, code generation, deployment and self-adaptation of the sys-
tem. The results of these experiments show that the framework is functionally en-
abled to execute the modeled architectural adaptations for an IoT system using our
DSL. Additionally, the results show that these runtime adaptations can help maintain
desirable values of IoT system performance and availability.
Finally, experiments to evaluate the scalability of the framework revealed some
important limitations and considerations. First, the frequency of monitoring (in-
frastructure and QoS) and rule evaluation are factors that can delay the detection
of events. Setting these frequencies to high values (e.g., 1 check per second) would
allow Prometheus to identify events in very short times. However, high monitoring
frequencies can generate considerable overheads inducing failures in nodes with
low resources (e.g., some edge nodes). Second, although the Scheduler performs a
filtering process to select the appropriate node when a new container is deployed,
this component of the orchestrator does not analyze the current CPU and
RAM memory consumption of the nodes. This can lead to node failures when the
Scheduler assigns pods to nodes that are already overloaded. One strategy to address
this concern is to design a Scheduler component that analyzes additional metrics
(such as current CPU, memory, bandwidth, and power consumption) to select the
appropriate node for deployment tasks.
Chapter 8
Related Work
Self-adaptive systems have been studied for several decades. Wong et al. [167] clas-
sify the evolution of self-adaptive systems into stages. In the first stage (1990-2002),
a theoretical model of self-adaptive systems was proposed and the first studies on
evolution, self-supervision, control theory, and run-time design emerged [36, 19].
The second stage (2003-2005) was dominated by studies proposing novel perspec-
tives but without concrete implementations. In the third stage (2006-2010), research
focused on autonomous and self-adaptive web services. Runtime solutions pre-
dominated over design-time solutions. For example, the models@runtime approach [25]
was introduced (the use of software models for adaptive mechanisms to manage
complexity in runtime environments). The last stage (2011-2022) shows a transition
between the domains of research interest. The adaptability of IoT systems and In-
frastructure as Code (IaC) becomes the focus. However, the exponential increase
and variability of IoT devices, and the unpredictable behavior of the environment,
introduce self-adaptation challenges to maintain quality levels.
In this chapter, we analyze and compare the studies published to date that are
related to our research topic. In particular, languages for specifying IoT systems
are analyzed in Section 8.1 and frameworks for supporting system adaptability at
runtime are studied in Section 8.2.
(FSM), Queuing Networks (QNs), and YAML to model aspects of the IoT system such
as its architecture, software deployment, or self-adaptive capabilities. However, to
model the complexity of self-adaptive multilayer architectures, it is necessary to
define DSLs that allow representing the entire domain. We have classified the lit-
erature studies into two groups: DSLs for modeling the IoT system architecture,
and DSLs that address the specification of self-adaptive capabilities. Some studies
belong to both groups.
1
https://www.contiki-ng.org/
nodes in different layers (edge, fog and cloud), asynchronous communication (pub-
lish/subscribe), databases, and event processing engines. SimulateIoT also includes
the modeling of rules for the generation of notifications by analyzing topic data from
sensors/actuators. For instance, a notification can be configured when the temper-
ature collected by a sensor exceeds a threshold. However, infrastructure metrics,
QoS monitoring, and architectural adaptations are not supported.
CAPS [116] is a Cyber-Physical Systems (CPS) modeling tool that addresses the
specification of software architecture, hardware configuration, and physical space.
The software architecture specification allows modeling software components and
their behavior based on events and actions. Events are responses to some internal
change in the software component (e.g., a timer fired or a message received), and
actions are atomic tasks that the component can perform (e.g., starting or stopping a
timer, or sending a message to another component). However, the modeling of the
system behavior addresses adaptations at the software component level without
supporting architectural or device-level adaptations. Furthermore, CAPS does not
cover the specification of the concepts of multi-layer architectures.
SMADA-Fog [127] is a model-based approach not exclusively focused on the
IoT domain, but it could be used to model self-adaptive IoT systems. SMADA-Fog
addresses the deployment and adaptation of container-based applications in fog com-
puting scenarios. SMADA-Fog proposes a metamodel that enables modeling the
deployment and adaptation of containers on nodes (edge, fog, and cloud). SMADA-
Fog enables the specification of consumer devices (such as laptops, smartphones,
and IoT devices) and network devices, but does not address the modeling of sensors
and actuators because it is a DSL focused on fog computing applications. SMADA-
Fog addresses the specification of rules whose conditions involve QoS metrics
and architectural adaptations such as scaling, optimizing a metric, blocking a ser-
vice, and creating/shutting down a service. The deployment and adaptation model
of the system is specified by means of environments designed in Node-RED3. How-
ever, rules for operating or controlling the system's actuators are not supported. In
addition, the physical regions and location of devices and nodes are concepts not
addressed by the metamodel.
8.1.3 Discussion
Table 8.1 presents a comparative analysis between the DSLs studied and our DSL.
The comparative information includes: (1) the notation of the language (e.g., textual,
graphical, or tabular); (2) whether the language addresses modeling of IoT devices,
including sensors and actuators; (3) whether the language addresses specification
3
Node-RED is a programming tool for wiring together hardware devices, APIs and online services
8.1. LANGUAGES AND METAMODELS FOR MODELING IOT SYSTEMS 131
• Multi-layer architectures that take advantage of edge and fog computing are
becoming increasingly popular. Modeling languages for IoT architectures
must address the specification of the concepts that enable the implementation
of these technologies. There are only three studies [55, 16, 141] that enable
the modeling of sensor/actuator devices, edge, fog and cloud nodes; and one
of these does not allow modeling the specifications (CPU, RAM, storage, etc.)
of the nodes (an important aspect considering that edge and fog nodes have
limited resources). These are aspects that we address in our DSL.
Table 8.1: Comparative analysis of DSLs and metamodels for IoT system modeling
layer of the system by manipulating the actuators (e.g., opening a window when
the temperature exceeds a limit). However, architectural rules for the other layers
of the system are not supported.
Weyns et al. [165] propose MARTA, an architecture-based adaptation approach
to automate the management of IoT systems employing runtime models and lever-
aging the MAPE-K loop. Each rule is specified by a quality model that stores a con-
dition and one or more adaptations (e.g., packet loss < 10%, minimize energy con-
sumption). Although MARTA addresses the monitoring of network metrics such as
latency and packet loss, other infrastructure metrics (such as CPU and node RAM
usage) are not collected. In addition, the monitors and effectors that adapt the system
are designed for a particular case, making reusability challenging.
A few works such as [174, 143, 85, 140] focus on system adaptations to opti-
mize the deployment of IoT applications on edge and fog nodes. These studies
propose the use of orchestrators such as Kubernetes and Docker Swarm. For exam-
ple, Yigitoglu et al. [174] present Foggy, a framework for continuous automated
deployment in fog nodes. Foggy enables the definition of four software deploy-
ment rules in Fog nodes. Foggy’s architecture is based on an orchestration server
responsible for monitoring the resources in the nodes and dynamically adapting the
software allocation according to the rules defined by the user. However, Foggy is fo-
cused on adapting the system exclusively to support the continuous deployment of
applications. IoT system adaptations caused by dynamic events other than software
deployment failures are not supported.
Discussion
Table 8.2 compares the frameworks analyzed in this section with our proposal. The
comparative information includes: (1) the application domain of the framework;
(2) the approach used for specification or modeling of the self-adaptive system; (3)
the aspects of the system that are modeled such as system architecture, software de-
ployment, or adaptation policies; (4) the actions and adaptation strategies addressed
(e.g., architectural adaptations or system actuator control); and (5) the metrics col-
lected to monitor the state of the system and detect events that trigger adaptations.
The relevant findings and differences are listed below.
5
Node-RED is a programming tool for wiring together hardware devices, APIs and online services
• System state monitoring is one of the key tasks in frameworks that support
self-adaptation. There are several types of metrics that can be collected to
monitor system state: QoS metrics such as availability, latency, and power
consumption; infrastructure metrics such as CPU consumption, RAM mem-
ory consumption, and free disk space; and sensor data metrics collected with
IoT system sensors such as temperature, humidity, motion, and gas concen-
tration. Most of the frameworks analyzed focus on monitoring only one type
of metrics. For example, IAS [112] focuses on performance, MARTA [165]
and SMADA-Fog focus on QoS, and Lee et al. [99] focus on sensor data. Our
framework uses monitoring tools that collect QoS, infrastructure, and sensor
data metrics.
Table 8.2: Comparative analysis of frameworks supporting system self-adaptation

Framework          | Domain          | Specification approach           | Modeled aspects                                  | Adaptation actions/strategies                                              | Monitored metrics
Rainbow [68]       | General purpose | Acme                             | Architecture and adaptation rules                | Depends on user-supplied effectors                                         | Depends on user-supplied monitors
IAS [112]          | IoT             | Queuing Networks (QNs)           | Architectural pattern adaptation                 | Architectural pattern adaptation                                           | Performance
Lee et al. [99]    | IoT             | Finite-state machine             | Functional rules                                 | System actuator control                                                    | Sensor data
MARTA [165]        | IoT             | Quality models                   | Functional rules                                 | System actuator control (setting the power and sampling frequency of the devices) | Packet loss, latency, and energy consumption
Muccini et al.     | IoT             | CAPS                             | Architectural patterns                           | Architectural pattern adaptation                                           | Energy consumption
Hussein et al. [83]| IoT             | SysML4IoT and finite-state machine | Functionality and adaptability                 | On/off services                                                            | Availability of sensors and services
SMADA-Fog [127]    | Fog Comp.       | Node-RED                         | Architecture, application deployment, and adaptation rules | Architectural adaptations                                        | QoS metrics
Foggy [174]        | Fog Comp.       | YAML                             | Software deployment                              | Allocation strategies                                                      | Infrastructure and latency
Our proposal       | IoT             | DSL                              | Architecture, application deployment, architectural adaptations, and functional rules | Architectural adaptations and system actuator control | Infrastructure, QoS, and sensor data
Chapter 9
Conclusions and Further Research
In this chapter, we first summarize all the contributions of this thesis
(Section 9.1). In Section 9.2, we list the publications and software artifacts de-
veloped and made available. Finally, Section 9.3 presents several perspectives for future
research.
specification of the self-adaptive IoT system, and runtime to support the operation
and adaptation of the system.
To enable the modeling of the IoT system and generate code for the deployment
and self-adaptation of the system, the design time stage is composed of:
• a DSL for IoT systems focusing on three main contributions: (1) modeling
primitives covering multi-layer architectures of IoT systems, including con-
cepts such as IoT devices (sensors or actuators), edge, fog and cloud nodes;
(2) modeling the deployment and grouping of container-based applications
on those nodes; and (3) a specific sublanguage to express architectural adap-
tation rules (to guarantee QoS, availability, and performance at runtime), and
functional rules (to address functional requirements that involve system actua-
tor control). We have implemented this DSL using a projectional editor
that allows mixing various notations to define the concrete syntax of the lan-
guage. In this way, the IoT system can be specified by a model containing
text, tables, and graphics.
• A code generator. The model (built using the DSL) describing the self-adaptive
IoT system is the input to a code generator we have designed. This generator
produces several YAML1 manifests with two purposes: (1) to configure and
deploy the IoT system container-based applications and (2) to configure and
deploy the tools and technologies used in the framework that supports the
execution and adaptation of the system at runtime.
To support the system adaptation at runtime stage of our approach, we have de-
veloped a framework to monitor and adapt the IoT system following the adaptation
plan specified in the model. This framework is based on the MAPE-K loop com-
posed of several stages including system status monitoring, data analysis, action
planning, and execution of adaptations. Our framework deploys exporters to col-
lect infrastructure metrics (such as CPU and RAM usage), QoS metrics (such as availability),
and system sensor data. These metrics are stored using Prometheus (a time series
database) and queried using the PromQL language to verify the rules. We have developed
an Adaptation Engine to perform two types of system actions when necessary: ar-
chitecture adaptations (such as offloading and scaling apps) and system actuator
control (to meet system functional requirements).
We have introduced two extensions to our DSL highlighting the extensibility ca-
pability to add new concepts in the abstract syntax. The first extension focuses on
modeling IoT systems in the underground mining industry while the second exten-
sion focuses on IoT systems implemented in wastewater treatment plants (WWTPs).
1
YAML is a data serialization language typically used in the design of configuration files
In addition to the metamodel, the projectional editors were also extended to offer
new modeling notations for underground mine specification, and for modeling of
WWTP process block diagrams.
Finally, to validate our DSL and framework, we have designed and conducted
empirical experiments: (1) to validate the expressiveness and usability of our DSL
extended to the mining domain (13 participants attended the experiment), (2) to
test the self-adaptive capability of our approach (one test scenario for each of the
architectural adaptations), and (3) to evaluate the ability and performance of our
framework to address the growth of concurrent adaptations. The reported results
demonstrate that the DSL is expressive enough to model self-adaptive IoT systems
and has a favorable learning curve. Moreover, experiments with the framework
validate its functionality and ability to self-adapt the system at runtime.
9.2.1 Publications
Conferences
• D. Prens, I. Alfonso, K. Garcés and J. Guerra-Gomez. Continuous Delivery
of Software on IoT Devices. 2019 ACM/IEEE 22nd International Conference
on Model Driven Engineering Languages and Systems Companion (MODELS-
C), 2019, pp. 734-735. This paper was one of our first steps in defining a
metamodel to support the deployment of applications in IoT systems.
• I. Alfonso, "A Software Deployment and Self-adaptation of IoT Systems" Pro-
ceedings of the XXIII Iberoamerican Conference on Software Engineering, CIbSE
2020, November 9-13, 2020, pp. 630-637. This paper contains the thesis pro-
posal presented at the CIBSE 2020 Doctoral Symposium.
• I. Alfonso, K. Garcés, H. Castro and J. Cabot. Modeling self-adaptative IoT
architectures. 2021 ACM/IEEE International Conference on Model Driven Engi-
neering Languages and Systems Companion (MODELS-C), 2021, pp. 761-766.
This paper contains the first version of our DSL and supports part of the con-
tent of Chapter 4.
• I. Alfonso, K. Garcés, H. Castro and J. Cabot. Modelado de Sistemas IoT para
la Industria en Minería Subterránea de Carbón. XXVI Jornadas de Ingeniería
del Software y Bases de Datos (JISBD), 2022. In this paper we present the ex-
tension of our DSL for modeling IoT systems in the mining industry domain.
This paper supports part of Chapter 6.1.
Journals
• I. Alfonso, K. Garcés, H. Castro and J. Cabot. Self-adaptive architectures in
IoT systems: a systematic literature review. Journal of Internet Services and
Applications, 2021, 12(1), 1-28. This paper supports the content presented in
Chapter 2.2.
Award
• Award for the work led by a doctoral student in the Model-Driven Software
Engineering (ISDM) track of the XXVI Jornadas de Ingeniería del Software y
Bases de Datos (JISBD), for the paper "Modelado de Sistemas IoT para la
Industria en Minería Subterránea de Carbón".
Mobility of devices
One of the dynamic events addressed that can affect the IoT system is device mobil-
ity. When a device changes location, a set of steps are performed: (1) new commu-
nication must be established between the device and the suitable edge/fog node; (2)
the availability of resources must be guaranteed to deploy the service in the
edge/fog nodes in order to manage that device; and (3) in case the device changes location
again, it is evaluated whether it must be connected to other edge/fog nodes that are closer
to obtain better latency. The mobility of many devices could lead to increased la-
tency, higher resource consumption, and unavailability of system services, as the
increased volume of data generated by devices can congest the network and create
bottlenecks.
Although our approach enables the configuration of rules to deal with the ef-
fects of device mobility, the DSL does not currently address the modeling of mobile
devices. We are interested in adding the necessary concepts to the metamodel
for modeling device mobility, as well as including new metrics to quickly identify
this event: for example, monitoring the number of clients connected to gateways or
edge nodes and creating adaptation rules to identify and deal with the increase of
connected devices. To achieve this, it would also be necessary to generate code to
deploy new monitors with the ability to collect these new metrics and run exporters
to translate the information into the Prometheus database format.
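As a hypothetical sketch of such an exporter, the snippet below renders a
"connected devices per gateway" metric in the Prometheus text exposition format,
which a monitor could scrape so that adaptation rules can react to mobility
events. The metric and gateway names are invented for the example and are not
part of the current framework.

```python
# Sketch: expose a connected-devices count per gateway in the Prometheus
# text exposition format. Metric/gateway names are illustrative only.

def render_connected_devices(counts):
    """Render gateway -> client counts as Prometheus text-format samples."""
    lines = [
        "# HELP iot_connected_devices Number of devices connected to a gateway.",
        "# TYPE iot_connected_devices gauge",
    ]
    for gateway, n in sorted(counts.items()):
        lines.append(f'iot_connected_devices{{gateway="{gateway}"}} {n}')
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(render_connected_devices({"edge-1": 12, "edge-2": 7}))
```

A real deployment would serve this text over HTTP so Prometheus can scrape it on
its regular interval.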
144 CHAPTER 9. CONCLUSIONS AND FURTHER RESEARCH
Regarding the implementation of the DSL using MPS, there are two directions we
would like to address.
The first activity consists of bringing IoT system modeling to the web, i.e.,
providing a web version of our DSL. There are a few tools that could be explored
for this purpose. For example, Modelix (https://modelix.github.io/) is an
open-source platform that aims to allow models of languages created in MPS to be
edited from the browser. Another alternative could be MPSServer
(https://github.com/strumenta/mpsserver), a tool to remotely access projects and
edit models in the MPS framework.
The second activity consists of designing a graphical editor to model the IoT
system architecture. Some of the participants in our DSL usability validation
would find it more comfortable to model the architecture using a graphical
notation. This graphical editor should at least include different types of shapes
to represent the IoT devices, the nodes (edge, fog, and cloud), the regions, and
the software containers, as well as arrows to represent the data flow. MPS
currently provides plugins to support graphical modeling. For example, our DSL
extension for WWTPs uses the
MPS Diagrams plugin to enable process block diagram specification using graphical
notation. These plugins could be reused to provide a graphical notation for system
architecture modeling.
AsyncAPI Integration
To achieve a high degree of scalability, improved performance, and reliability,
IoT systems often implement event-driven architectures [74]. One of the most
commonly used patterns in this type of architecture is publish/subscribe. One of
the biggest challenges of these architectures is maintaining message consistency.
That is, the topics and format of the messages published by the system’s sensors
must remain consistent throughout the life cycle of the system; a slight change
in the message format could cause a system failure. The AsyncAPI specification
(https://www.asyncapi.com/) was proposed to address this challenge. It allows
representing concepts such as message brokers, topics of interest, and the
different message formats associated with each topic. One of the future tasks is
to integrate the AsyncAPI specification with our DSL to support different formats
in the messages published by the sensors and to address the consistency issues
that may currently arise.
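The consistency check described above can be illustrated with a minimal,
AsyncAPI-inspired sketch: one channel, its declared payload schema, and a
function that verifies a published message against it. The channel name and
payload fields are invented for the example; a real integration would parse a
full AsyncAPI document instead.

```python
# Illustrative sketch only: a tiny AsyncAPI-style channel description and a
# message-consistency check. Channel and field names are invented.

SPEC = {
    "channels": {
        "sensors/temperature": {
            "payload": {"sensorId": str, "value": float, "unit": str},
        }
    }
}

def is_consistent(topic, message, spec=SPEC):
    """Check that a message matches the payload schema declared for its topic."""
    channel = spec["channels"].get(topic)
    if channel is None:
        return False                      # unknown topic
    schema = channel["payload"]
    if set(message) != set(schema):
        return False                      # missing or extra fields
    return all(isinstance(message[k], t) for k, t in schema.items())
```

A broker-side guard like this would reject messages whose format drifted from
the specification instead of letting them silently break downstream consumers.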
Deployment Patterns
Deployment patterns provide control over the deployment of new software versions
to reduce the risk of a process failure and increase reliability. Implementing these
patterns reduces application downtime in an upgrade process and enables incidents
to be managed and resolved with minimal impact to end users [18]. There are three
popular patterns for managing deployment: canary, blue-green, and rolling.
The canary pattern suggests deploying the new software version on a subgroup of
nodes (known as the canary) to evaluate this version before deploying it on all
the nodes of the system. Figure 9.1 shows the stages of canary deployment in a
cluster of three nodes: (1) the canary node(s) that will host the new software
version are chosen (commonly 30% of all nodes); (2) the canary node(s) are
deactivated to deploy the new release (App v2); (3) traffic is redirected to the
canary node(s) to evaluate functional and non-functional aspects and to determine
whether the application is stable; (4) if the assessment of the canary was
successful and the application runs properly, the new release (App v2) is
deployed on the rest of the nodes; (5) finally, traffic is redirected to all the
nodes. On the other hand, if the assessment of the canary detects errors or
inconsistencies in the application, a rollback is performed on the canary nodes
instead of deploying the application on the remaining nodes.
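The five stages above can be sketched compactly. Node names, versions, and the
evaluation function below are placeholders; an actual rollout would interact
with an orchestrator such as Kubernetes rather than a plain dictionary.

```python
# Sketch of the five canary stages; names and the evaluate() callback are
# placeholders supplied by the caller.
import math

def canary_rollout(nodes, new_version, evaluate, fraction=0.3):
    """Return the final node -> version mapping after a canary rollout."""
    previous = dict(nodes)                        # remember versions for rollback
    names = sorted(nodes)
    k = max(1, math.ceil(len(names) * fraction))  # (1) choose canary nodes (~30%)
    canary = names[:k]
    for n in canary:                              # (2) deploy the new release
        nodes[n] = new_version
    if evaluate(canary):                          # (3) assess the canary nodes
        for n in names[k:]:                       # (4) deploy on remaining nodes
            nodes[n] = new_version
    else:                                         # failed: roll the canaries back
        for n in canary:
            nodes[n] = previous[n]
    return nodes                                  # (5) traffic serves all nodes
```

On success every node ends up on the new version; on failure the cluster returns
to its previous state without the remaining nodes ever being touched.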
The blue-green pattern, also called big flip or red/black deployment [18],
consists of two different environments. The blue environment runs in production
with the current software version, and the green environment is idle, waiting for
a new deployment. After performing a new deployment in green and verifying that
it works correctly, the traffic is switched to the green environment, and the
nodes in the blue environment are put into idle mode. On the other hand, the
rolling pattern allows the software to be progressively updated (node by node) in
a group of nodes or servers. Each node is taken offline while the new software
version is deployed and evaluated. If the evaluation is successful, the node is
enabled to receive traffic, and the update proceeds to the next node.
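The node-by-node loop of the rolling pattern can be sketched as follows; the
evaluation logic and node names are placeholders for this illustration.

```python
# Sketch of the rolling pattern: update nodes one at a time, returning a node
# to the traffic pool only if its evaluation succeeds.

def rolling_update(nodes, new_version, evaluate):
    """Update each node in turn; return the list of nodes serving traffic."""
    serving = list(nodes)
    for node in list(nodes):
        serving.remove(node)           # take the node offline
        nodes[node] = new_version      # deploy the new version on it
        if evaluate(node):
            serving.append(node)       # re-enable traffic on success
        # on failure the node stays out of the pool for manual handling
    return serving
```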
Allocation Strategies
Unlike the cloud layer, the edge and fog layers are composed of nodes with
processing and storage limitations that restrict application deployment. One of
the challenges posed by this fact is making intelligent allocation decisions that
guarantee QoS. To deploy or offload an application in the system, it is important
to select edge/fog nodes that have sufficient resources to host and run the
application properly. Orchestrators commonly provide a component in charge of
making allocation decisions. For example, Kubernetes uses its scheduler component
to determine and select the appropriate node to host a pod and its containers.
However, this scheduler only analyzes the resources requested (CPU and RAM) by
the container [140]. Other factors, such as node CPU consumption, energy
consumption, network latency, reliability, and bandwidth usage, should be
considered to make allocation decisions. For example, when deploying a container
that hosts a real-time application, in which low latency is one of the essential
requirements, it is important to select the nodes that can offer the lowest
latency.
Currently, our framework uses the Kubernetes scheduler for allocation decisions,
especially when scaling or offloading adaptations are executed without defining a
target node on which to deploy the new container. For example, the Scaling
adaptation defined in the adaptation rule in Figure 4.13 defines a target region
(Beach Hotel) but not a target node. Then, when the scaling is performed, the
Kubernetes scheduler selects a node in the Hotel Beach region with the necessary
resources to host the container. However, the scheduler has some limitations, as
described above. For this reason, the design (or adoption, if one exists) of a
scheduler that considers additional factors such as node CPU consumption, power
consumption, latency, and bandwidth consumption is one of the future directions
of this thesis.
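A multi-factor allocation decision could take the shape sketched below: feasible
nodes are filtered by the requested resources, then ranked by a weighted score
over additional factors. The weights and field names are assumptions made for
the example; they are not the Kubernetes scheduler's actual policy nor part of
our framework yet.

```python
# Hypothetical multi-factor node scoring for edge/fog allocation decisions.
# Weights and metric names are invented for illustration.

WEIGHTS = {"free_cpu": 0.3, "free_ram": 0.2, "latency_ms": -0.4, "free_bw": 0.1}

def score(node):
    """Weighted sum of the node's factors; latency is penalized."""
    return sum(w * node[k] for k, w in WEIGHTS.items())

def select_node(candidates, request):
    """Pick the best-scoring node that can satisfy the resource request."""
    feasible = [
        n for n in candidates
        if n["free_cpu"] >= request["cpu"] and n["free_ram"] >= request["ram"]
    ]
    return max(feasible, key=score, default=None)
```

With a negative weight on latency, an edge node close to the devices can win
over a larger but more distant fog node, which matches the real-time example
discussed above.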
• There are a large number of attacks that can be conducted in IoT environ-
ments. These are grouped into four categories [129]: Probe (consists of ex-
ploiting network vulnerabilities), User to Root Attacks (U2R consists of ille-
gally gaining root access to a computer resource), Remote to Local Attacks
(R2L consists of exploiting vulnerabilities by sending packets to gain ille-
gal local access to resources on that network), and Denial of Service At-
tacks (DoS consists of making a service inaccessible). Several studies such
as [88, 129, 66, 60] have proposed strategies to detect these attacks by
analyzing the data stream in real time. One of our future lines focuses on
designing or reusing detection strategies for these attacks to enable the
definition of rules involving security concerns, for example, rules that define
an action when a DoS attack is detected.
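As a simple illustration of such a rule, the sketch below flags a possible DoS
attack when a single source exceeds a request-rate threshold within a sliding
time window. The threshold and window are invented for the example; the cited
studies use learned models rather than fixed thresholds.

```python
# Illustrative threshold-based DoS detection rule; parameters are invented.
from collections import defaultdict

def detect_dos(events, window=10, threshold=100):
    """events: list of (timestamp, source_ip). Return set of suspicious IPs."""
    per_source = defaultdict(list)
    for ts, ip in events:
        per_source[ip].append(ts)
    suspicious = set()
    for ip, stamps in per_source.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # shrink the window until it spans at most `window` seconds
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 > threshold:
                suspicious.add(ip)
                break
    return suspicious
```

An adaptation rule could then trigger an action (e.g., isolating the offending
source) whenever this detector reports a suspicious IP.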
152 BIBLIOGRAPHY
[10] M. Alrowaily and Z. Lu. Secure edge computing in IoT systems: Review and
case studies. In 2018 IEEE/ACM Symposium on Edge Computing (SEC), pages
440–444. IEEE, 2018.
[12] K. Ashton et al. That ‘internet of things’ thing. RFID journal, 22(7):97–114,
2009.
[15] A. Barišić, V. Amaral, and M. Goulão. Usability driven DSL development with
USE-ME. Computer Languages, Systems & Structures, 51:118–157, 2018.
[19] J. Beauquier, B. Bérard, and L. Fribourg. A new rewrite method for prov-
ing convergence of self-stabilizing systems. In International Symposium on
Distributed Computing, pages 240–255, Bratislava, 1999. Springer.
[33] A. Chehri, T. El Ouahmani, and N. Hakem. Mining and IoT-based vehicle ad-hoc
network: industry opportunities and innovation. Internet of Things, page
100117, 2019.
[34] L. Chen, P. Zhou, L. Gao, and J. Xu. Adaptive fog configuration for the
industrial internet of things. IEEE Transactions on Industrial Informatics,
14(10):4656–4664, 2018.
[35] W. Chen, C. Liang, Y. Wan, C. Gao, G. Wu, J. Wei, and T. Huang. MORE:
A model-driven operation service for cloud-based IT systems. In 2016 IEEE
International Conference on Services Computing (SCC), pages 633–640. IEEE,
2016.
[53] S. Dustdar, C. Avasalcai, and I. Murturi. Edge and fog computing: Vision
and research challenges. In 2019 IEEE Int. Conf. on Service-Oriented System
Engineering (SOSE), pages 96–9609. IEEE, 2019.
[57] D. Ernst, A. Becker, and S. Tai. Rapid canary assessment through proxying
and two-stage load balancing. In 2019 IEEE International Conference on Soft-
ware Architecture Companion (ICSA-C), pages 116–122. IEEE, 2019.
[60] N. Farnaaz and M. Jabbar. Random forest modeling for network intrusion
detection system. Procedia Computer Science, 89:213–217, 2016.
[65] M. Fowler. UML distilled: a brief guide to the standard object modeling lan-
guage. Addison-Wesley Professional, 2004.
[68] D. Garlan, S.-W. Cheng, A.-C. Huang, B. Schmerl, and P. Steenkiste. Rainbow:
Architecture-based self-adaptation with reusable infrastructure. Computer,
37(10):46–54, 2004.
[71] N. K. Giang, R. Lea, M. Blackstock, and V. C. Leung. Fog at the edge: Ex-
periences building an edge computing platform. In 2018 IEEE International
Conference on Edge Computing (EDGE), pages 9–16. IEEE, 2018.
[76] S. Gregor and A. R. Hevner. Positioning and presenting design science re-
search for maximum impact. MIS quarterly, pages 337–355, 2013.
[78] R. Guntha. IoT architectures for noninvasive blood glucose and blood pressure
monitoring. In 2019 9th International Symposium on Embedded Computing and
System Design (ISED), pages 1–5. IEEE, 2019.
[79] Z. Guo, Y. Sun, S.-Y. Pan, and P.-C. Chiang. Integration of green energy
and advanced energy-efficient technologies for municipal wastewater treat-
ment plants. International journal of environmental research and public health,
16(7):1282, 2019.
[82] G. Huang, G.-B. Huang, S. Song, and K. You. Trends in extreme learning
machines: A review. Neural Networks, 61:32–48, 2015.
[87] N. Jazdi. Cyber physical systems in the context of industry 4.0. In IEEE
Int. Conference on Automation, Quality and Testing, Robotics, pages 1–4. IEEE,
2014.
[89] Y. Jiang, Z. Huang, and D. H. Tsang. Challenges and solutions in fog comput-
ing orchestration. IEEE Network, 32(3):122–129, 2017.
[90] M. Jutila. An adaptive edge router enabling internet of things. IEEE Internet
of Things Journal, 3(6):1061–1069, 2016.
[91] S. Keele et al. Guidelines for performing systematic literature reviews in
software engineering. Technical report, Ver. 2.3, EBSE Technical Report. EBSE,
2007.
[106] S. T. March and G. F. Smith. Design and natural science research on informa-
tion technology. Decision support systems, 15(4):251–266, 1995.
[107] S. Martínez, A. Fouche, S. Gérard, and J. Cabot. Automatic generation of
security compliant (virtual) model views. In International Conference on Con-
ceptual Modeling, pages 109–117. Springer, 2018.
[108] J. Mass, C. Chang, and S. N. Srirama. Context-aware edge process manage-
ment for mobile thing-to-fog environment. In Proceedings of the 12th Euro-
pean Conference on Software Architecture: Companion Proceedings, pages 1–7,
2018.
[109] A. Mavromatis, A. P. Da Silva, K. Kondepu, D. Gkounis, R. Nejabati, and
D. Simeonidou. A software defined device provisioning framework facili-
tating scalability in internet of things. In 2018 IEEE 5G World Forum (5GWF),
pages 446–451. IEEE, 2018.
[110] C. Mechalikh, H. Taktak, and F. Moussa. A scalable and adaptive tasks
orchestration platform for IoT. In 2019 15th International Wireless Communications
& Mobile Computing Conference (IWCMC), pages 1557–1563. IEEE, 2019.
[111] B. Mishra and A. Kertesz. The use of MQTT in M2M and IoT systems: A survey.
IEEE Access, 8:201071–201086, 2020.
[112] M. T. Moghaddam, E. Rutten, P. Lalanda, and G. Giraud. IAS: an IoT
architectural self-adaptation framework. In European Conference on Software
Architecture, pages 333–351. Springer, 2020.
[113] D. Montero and R. Serral-Gracià. Offloading personal security applications to
the network edge: A mobile user case scenario. In 2016 International Wireless
Communications and Mobile Computing Conference (IWCMC), pages 96–101.
IEEE, 2016.
[114] R. Morabito and N. Beijar. A framework based on SDN and containers for
dynamic service chains on IoT gateways. In Proceedings of the Workshop on Hot
Topics in Container Networking and Networked Systems, pages 42–47. ACM,
2017.
[115] K. Morris. Infrastructure as code: managing servers in the cloud. O’Reilly
Media, Inc., 2016.
[116] H. Muccini and M. Sharaf. CAPS: Architecture description of situational aware
cyber physical systems. In 2017 IEEE International Conference on Software
Architecture (ICSA), pages 211–220. IEEE, 2017.
[121] C. Pahl, N. El Ioini, S. Helmer, and B. Lee. An architecture pattern for trusted
orchestration in IoT edge clouds. In 2018 Third International Conference on Fog
and Mobile Edge Computing (FMEC), pages 63–70. IEEE, 2018.
[122] C. Pahl and B. Lee. Containers and clusters for edge cloud architectures–a
technology review. In 2015 3rd international conference on future internet of
things and cloud, pages 379–386. IEEE, 2015.
[123] P. Patel, M. I. Ali, and A. Sheth. On using the intelligent edge for IoT analytics.
IEEE Intelligent Systems, 32(5):64–69, 2017.
[124] P. Patel and D. Cassou. Enabling high-level application development for the
internet of things. Journal of Systems and Software, 103:62–84, 2015.
[131] T. Rausch, S. Nastic, and S. Dustdar. EMMA: distributed QoS-aware MQTT
middleware for edge computing applications. In 2018 IEEE International Conference
on Cloud Engineering (IC2E), pages 191–197. IEEE, 2018.
[134] J. Rubin and D. Chisnell. Handbook of usability testing: how to plan, design
and conduct effective tests. John Wiley & Sons, New Jersey, 2008.
[137] H. Sami and A. Mourad. Towards dynamic on-demand fog computing forma-
tion based on containerization technology. In 2018 International Conference on
Computational Science and Computational Intelligence (CSCI), pages 960–965.
IEEE, 2018.
[145] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. Edge computing: Vision and
challenges. IEEE Internet of Things Journal, 3(5):637–646, 2016.
[147] S. Singh and N. Singh. Containers & Docker: Emerging roles & future of cloud
technology. In 2016 2nd International Conference on Applied and Theoretical
Computing and Communication Technology (iCATccT). IEEE, 2016.
[152] V. Theodorou and N. Diamantopoulos. GLT: Edge gateway ELT for data-driven
intelligence placement. In 2019 IEEE/ACM Joint 4th International Workshop
on Rapid Continuous Software Engineering and 1st International Workshop on
Data-Driven Decisions, Experimentation and Evolution (RCoSE/DDrEE), pages
24–27. IEEE, 2019.
[154] C.-L. Tseng and F. J. Lin. Extending scalability of IoT/M2M platforms with
fog computing. In 2018 IEEE 4th World Forum on Internet of Things (WF-IoT),
pages 825–830. IEEE, 2018.
[156] M. F. van Amstel, M. G. van den Brand, and P. H. Nguyen. Metrics for model
transformations. In Proceedings of the Ninth Belgian-Netherlands Software
Evolution Workshop (BENEVOL 2010), Lille, France (December 2010), 2010.
[169] World Wide Web Consortium (W3C). Semantic sensor network ontology.
https://www.w3.org/TR/2017/REC-vocab-ssn-20171019/, October 2017.
In this appendix, using our DSL, we model the IoT system for environment control
in underground coal mines tested in the experiments of Section 7.3. Figure 7.18
shows the scenario to be modeled: an IoT system deployed in three underground
coal mines. The description of this system in terms of the device, edge/fog,
cloud, and application layers is presented in Section 7.3.
This appendix is a guide to the installation and configuration necessary to use our
approach. Section B.1 presents the instructions for installing and configuring MPS
to use our DSL. Section B.2 presents guidelines for using the DSL, and Section B.3
contains instructions for implementing our framework.
3. Open MPS, and then open the DSL project by choosing the project folder.
4. Some plugins must be installed. Select File -> Settings -> Plugins and install
the following plugins:
Note: Some of these plugins require additional plugins that MPS will suggest
you install (if this happens, select install). For example, the com.dslfoundry.plaintext
plugin will require the Mouse Selection Support plugin.
1 https://www.jetbrains.com/mps/download
2 https://github.com/SOM-Research/selfadaptive-IoT-DSL
176 APPENDIX B. INSTALLATION AND CONFIGURATION GUIDE
a) com.mbeddr.mpsutil.treenotation
b) com.dslfoundry.plaintextgen
5. Restart MPS and you will now be able to use the DSL to model IoT systems.
In the left pane (Logical View) you will find an example of a modeled IoT system
(Hotel Beach first floor). You can open this example model by double-clicking on
it and explore the concepts modeled for an IoT system.
Three concepts (Nodes, Containers, and IoT Devices) can be modeled using two
different notations (tabular and textual). The user is free to choose the notation. To
change notation, follow the instructions below.
1. Right-click anywhere in the model workspace and select Push Editor Hints.
2. Select Use custom hints and then check Use tabular notation.
Now, you can see the model in tabular notation for Nodes, Containers, and IoT
devices.
1. Create a new solution by right-clicking on selfadaptive-IoT-DSL -> New ->
Solution
2. Then, create a new model by right-clicking on NewSolution -> New -> Model
3. When you are creating the model, you have to add IoT_runtime to Used Languages.
Some concepts of the IoT system model must be created in different models. Specifi-
cally, the types of sensors and actuators, and the metrics that make up the adaptation
rules are concepts that must be instantiated from other models. You could define
these models from scratch, but a quick alternative is to reuse our sandbox models,
which already contain predefined sensors, actuators, and metrics. To reuse these
two models, copy them (right click and copy) from the sandbox and paste them
(right click and paste) into your Solution (see Figure B.11).
To model any aspect of the IoT system, just press the Enter key in the corre-
sponding section and you will get a template with the attributes to be specified. For
example, to model an application, press Enter in the Applications section and you
will get the model portion shown in Figure B.12.
B.2. DSL USE AND CODE GENERATION 185
Some fields can be supported with the MPS autocomplete function. For example,
when creating a new node, it is necessary to select the node type. To do this,
press the Enter key in the Nodes section, and then use the autocomplete function
(by pressing Ctrl+Space on Windows or Cmd+Space on macOS). This will allow you to
select one of the three types of nodes, as shown in Figure B.13.
You can use the auto-complete function on any of the fields or attributes of a
concept. In the example of Figure B.14, we have defined two subregions. Then,
when modeling the region of an Edge node, the autocomplete function can be used
to quickly select one of the subregions defined earlier.
When the self-adaptive IoT system model is finalized, you can verify the validity
of the model and use the code generator to obtain the YAML manifests for deploying
and running the runtime framework. Right-click on the model and select Rebuild
Model (see Figure B.15).
If the model has no errors and the compilation is successful, then the generated
code can be found in the directory «Project_directory»/solutions/«name_solution».
For example, the list of files generated when compiling the sandbox model (Beach
Hotel) is shown in Figure B.16.
Once you have configured the cluster, you must run the start.sh script found
among the files built by the code generator. This script will automatically
deploy all the tools in pods using kubectl. To run the script, execute the
following commands.

1. For Linux:

./start.sh

or, if the script does not have execute permission:

bash start.sh
The time it takes to deploy the framework depends on the number of applications
modeled (it can take several minutes). To verify the deployment, you can execute
commands such as kubectl get pods from the master node to check that all pods
reach the Running state.
Note: if the IoT system model does not have adaptation rules involving sen-
sors, then the mqtt-exporter will not be deployed. This will not interfere with the
execution and operation of the framework.
Finally, you will be able to access the Prometheus and Grafana user interfaces
to configure dashboards and view the system status and adaptation rules in real
time. From the browser, enter the following URLs.
• Prometheus: http://«ip-master-node»:30000
• Grafana: http://«ip-master-node»:32000
Appendix C
In this appendix we present the modeling of the experiment scenario (IoT system)
to perform the self-adaptation validations of our approach discussed in Chapter 7.2.
To test the three architectural adaptations, we have designed the test scenario shown
in Figure C.1. The modeling of the applications, the edge-2 node, the sensors,
and the MQTT broker (using our DSL) is presented in Figures C.2, C.3, C.4, and
C.5, respectively. The modeling of the edge-1 and fog-1 nodes and the adaptation
rules changes according to the type of adaptation tested.
APPENDIX C. MODELING AN IOT SYSTEM FOR THE SELF-ADAPTATION EVALUATION
Figure C.12 shows the test scenario for testing the Redeployment adaptation. The
scenario for testing this adaptation is the same as for testing Scaling. The difference
is the adaptation rule specified and the stimulus used to generate the failure.
The stimulus for this scenario is an intentionally generated failure, and the
adaptation rule is shown in Figure C.13.
C.4. REDEPLOYMENT ADAPTATION 197