Lecture Notes in Networks and Systems 561

Kohei Arai Editor

Proceedings
of the Future
Technologies
Conference
(FTC) 2022,
Volume 3
Lecture Notes in Networks and Systems

Volume 561

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to both
the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose ([email protected]).

More information about this series at https://link.springer.com/bookseries/15179


Kohei Arai
Editor

Proceedings of the Future
Technologies Conference
(FTC) 2022, Volume 3

Editor
Kohei Arai
Faculty of Science and Engineering
Saga University
Saga, Japan

ISSN 2367-3370 ISSN 2367-3389 (electronic)


Lecture Notes in Networks and Systems
ISBN 978-3-031-18343-0 ISBN 978-3-031-18344-7 (eBook)
https://doi.org/10.1007/978-3-031-18344-7
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Editor’s Preface

We are extremely delighted and excited to present before you the seventh Future
Technologies Conference 2022 (FTC 2022), which was successfully held during
20–21 October 2022. COVID-19 necessitated this conference to be held virtually
for two years. However, as the pandemic waned and restrictions eased, we managed
to recreate the scholarly aura by holding the esteemed conference in hybrid mode,
wherein learned researchers from across the globe adorned the stage either through their
in-person presence or via the online mode. Around 250 participants from over 60
countries participated to make this event a huge academic success.
The conference provided a wonderful academic exchange platform to share the
latest research, developments, advances and new technologies in the fields of
computing, electronics, AI, robotics, security and communications. The conference
was successful in disseminating novel ideas, emerging trends as well as discussing
research results and achievements. We were overwhelmed to receive 511 papers out
of which a total of 177 papers were selected to be published in the final proceed-
ings. The papers were thoroughly reviewed and then finally selected for publishing.
Many people have collaborated and worked hard to produce a successful FTC
2022 conference. Thus, we would like to thank all the authors and distinguished
Keynote Speakers for their interest in this conference, the Technical Committee
members, who carried out the most difficult work by carefully evaluating the
submitted papers with professional reviews and prompt responses, and the Session
Chairs Committee for their efforts. Finally, we would also like to express our
gratitude to the Organizing Committee, who worked very hard to ensure high standards
and quality of keynotes, panels, presentations and discussions.
We hope that readers are able to satisfactorily whet their appetite for knowledge
in the field of AI and its useful applications across diverse fields. We also expect
more and more enthusiastic participation in this coveted event next year.
Kind Regards,

Kohei Arai
Conference Program Chair

Contents

Developing a Prototype Piezoelectric Wafer-Box for Optimal Energy
Harvesting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
S. Ahmad, K. Seonghoon, A. Mohammad, S. Junan, and B. Yong
Hybrid Meta-heuristic Genetic Algorithm: Differential Evolution
Algorithms for Scientific Workflow Scheduling in Heterogeneous
Cloud Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Faten A. Saif, Rohaya Latip, M. N. Derahman, and Ali A. Alwan
Reconfiguration of Protected Unicast Connections in Elastic Optical
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Kobenan Ali Ouattara, Adepo Joël Christian, Anoh Nogbou Georges,
and Babri Michel
Users Engagement Factors with e-Court Application Conceptual
Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Adham M. M. Alankar, Nurzi Juana Binti Mohd Zaizi,
and Hanifah Binti Abdul Hamid
On the Reusability of Machine Learning Models in Edge Computing:
A Statistical Learning Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Xenia Skotti, Kostas Kolomvatsos, and Christos Anagnostopoulos
Survey of Technology-Enhanced Learning: Novel Pedagogical
Concepts, Challenges and Future Perspectives . . . . . . . . . . . . . . . . . . . . 90
Tarandeep Kaur and Shubhpreet Kaur
True-Ed Select Enters Social Computing: A Machine Learning Based
University Selection Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Jerry Cearley and Vivek K. Pallipuram
Exploring Public Cloud-ERP Systems’ Impact on Organizational
Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Maria Øverdal, Moutaz Haddara, and Marius Langseth


A Generic Neural Network Implementation on GPU and Its
Performance Benchmark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Tristan Udby and Yun Tian
Monitoring Technologies for Animal Welfare: A Review of
Aspirations and Deployments in Zoos . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Ann Morrison and Aleksandra Novikova
Hierarchical Tucker Tensor Regression: A Case Study
on Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Quoc Tran Ngoc
Introducing Database Normal Forms to Students: A Comparison
Between Theory-First and Practice-First Educational Approaches . . . . . 196
Dakota C. Cookenmaster, Jacob A. Bahn, and Germán H. Alférez
Analysis of Load Balancing Algorithms Used in the Cloud Computing
Environment: Advantages and Limitations . . . . . . . . . . . . . . . . . . . . . . 206
Zakariyae Bouflous, Mohammed Ouzzif, and Khalid Bouragba
NeuroTower: A 3D Neuromorphic Architecture with Low-Power
TSVs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Arghavan Asad and Farah Mohammadi
Coherence Domains in Condensed Matter as Storage “Devices”
of Quantum Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Luigi Maxmilian Caligiuri
Antecedents of Software-as-a-Service Adoption for Small and Medium
Enterprise in Developing Countries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
Ahmed Mamdouh Abdelfatah Ibrahim and Norris Syed Abdullah
Software as a Service Challenges: A Systematic Literature Review . . . . 257
Ahmed Mamdouh Abdelfatah Ibrahim, Norris Syed Abdullah,
and Mahadi Bahari
A Quantum Algorithm to Locate Unknown Hashgrams . . . . . . . . . . . . . 273
Nicholas R. Allgood and Charles K. Nicholas
BUMP: Bridging Unmet Modes of Participation in the Workplace . . . . 286
Claudia B. Rebola, Diego Gomez-Enriquez, and Erwin Vargas-Alfonso
Theoretical Perspectives Towards Culture-Centered User
Engagement Design for Mobile Health in the Global South . . . . . . . . . . 295
Tochukwu Ikwunne, Lucy Hederman, and P. J. Wall
Automated Meal Planner Using Multiple User-Defined Benchmarks
for Healthy Eating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
Catherine Lyons-Rocque and Sudhanshu Kumar Semwal

A Smart Healthcare Framework: Opportunities for Integrating
Emerging Technologies (5G, IoT, AI, and GIS) . . . . . . . . . . . . . . . . . . . 325
Balakrishnan Mullachery and Sarah Alismail
Analytic Hierarchy Process Model for the Diagnosis of
Typhoid Fever . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Faith-Michael Uzoka, Chukwudi Nwokoro, Okure Obot,
Moses Ekpenyong, Aniema I. A. Udo, and Boluwaji Akinnuwesi
Gradient Boosting and Minimum Redundancy Maximum Relevance
(mRMR) Feature Selection for Diagnosis of Parkinson’s Disease
Through Patient Audio Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
Jagadeepram Maddipatla and Rishi Athavale
Optic Disk Detection in Fundus Images of Retinopathy
of Prematurity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
Monserrate Intriago-Pazmiño, Julio Ibarra-Fiallo, María Pérez-Hernández,
Adán Guzmán-Castillo, and Eddy Torres-Constante
Machine Learning Computational Framework for Alzheimer’s
Disease Stages Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
Carlos Theran-Suarez, Yohn Jairo Parra Bautista, Victor Adankai,
and Richard Aló
Critical Assessment of Current State of the Art in Wearable Sensor
Nodes with Energy Harvesting Systems for Healthcare Applications . . . 398
Alhassan E. Alattar, Ahmed Elkaseer, Steffen Scholz, and Saeed Mohsen
Identifying Severity Clusters in SLE Patients . . . . . . . . . . . . . . . . . . . . . 413
Hamza Zidoum, Sumaya AL-Sawafi, Aliya AL-Ansari,
and Batool AL-Lawati
Automated Real-Time Recognition of Non-emotional Conversational
Head-Gestures for Social Robots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
Aditi Singh and Arvind K. Bansal
An Emotional Support Robot Framework Using Emotion Recognition
as Nonverbal Communication for Human-Robot Co-adaptation . . . . . . 451
Osamah M. Al-Omair and Shihong Huang
How Does a Social Robot Analyze Emotions? . . . . . . . . . . . . . . . . . . . . 463
Pierre-André Buvet, Bertrand Fache, Wiam Fadel, and Abdelhadi Rouam
Obstacle Recognition Using Depth Estimation and RGB Data for
Autonomous Vehicle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
Jheanel Estrada, Gil Opinas Jr., and Anshuman Tripathi
Humanoids Improving the Quality of Life of Older People: The Case
of Poland . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
Katarzyna Halicka

3D Concrete Printing with Macro-micro Robots . . . . . . . . . . . . . . . . . . 493


Ian D. Walker, Venkat N. Krovi, Abdul B. Peerzada, Adhiti Raman,
Prasad Rangaraju, Matthias J. Schmid, and Manu Srivastava
Automatic Polarity Identification on Twitter Using
Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
José Carmen Morales Castro, Rafael Guzmán Cabrera, José Ruiz Pinales,
Luis Manuel Ledesma Carrillo, and Belém Priego
Sentence Structure and Boundary for Deep Neural Machine
Translation Alignment Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
Bat-Erdene Batsukh
Topic Discovery About Economy During COVID-19 Pandemic
from Spanish Tweets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
Ana Laura Lezama Sánchez, Mireya Tovar Vidal,
and José A. Reyes-Ortiz
SimLDA: A Tool for Topic Model Evaluation . . . . . . . . . . . . . . . . . . . . 534
Rebecca M. C. Taylor and Johan A. du Preez
Virtual Assistant for Querying Databases in Natural Language . . . . . . . 555
Daiga Deksne and Raivis Skadiņš
Neural Machine Translation for Native Language Aymara
to English . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
Honorio Apaza, Brisayda Aruhuanca, Mariela M. Nina, Anibal Flores,
Carlos Silva, and Euler Tito
Vocabulary Expansion for the Sub-word WFST-Based Automatic
Speech Recognition System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
Askars Salimbajevs and Jurgita Kapočiūtė-Dzikienė
A Comparative Analysis of Local Explainability of Models
for Sentiment Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
Hooria Hajiyan, Heidar Davoudi, and Mehran Ebrahimi
Persuasive Dialogue Corpus: Graph-Based Approach Combining
Persuader and Persuadee Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . 607
Meghna Allamudi and Olga Scrivner
N-Gram Based Amharic Grammar Checker . . . . . . . . . . . . . . . . . . . . . 622
Deepak Sharma, Gurjeet Singh Mattu, and Sukhdeep Sharma
The Internet of Things as a Tool Towards Smart Education:
A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
Abdulsalam K. Alhazmi, Ezzadeen Kaed, Fatima Al-Hammadi,
Nasr Alsakkaf, and Yousra Al-Hammadi

The VCDLN Mobile Learning System for Digital Learning Services in
Pandemic Covid-19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
Deni Darmawan, Dinn Wahyudin, Dian Rahadian, Andri Suryadi,
and Dianni Risda
Applying Design Thinking Approach to Improve Online Education . . . 660
Asma Alwadai and Reem Alnanih
A Universal IT Support System for Teachers for Educational
Processes, Publishing and Academic Research Using All-in-One
Educational Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
Stefan Svetsky and Oliver Moravcik
Communicating Vessels Model for the Intelligent Monitoring System
of the Service Guarantee in the New Generation of Digital Open
Universities (NG-DOU) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698
Boukar Abatchia Nicolas, Mahamadou Issoufou Tiado, Moussa Harouna,
and Ibrahim Ganaou Noura
People Skills and Online Learning: To Assume Makes an Ass Out of U
and Me . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
C. Todd Williams
Scenarios for Virtual Clinical Simulation to Train Nursing Students at
a South African University . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
Botha Benjamin Stephanus and Fourie Cecile
Learning Factory Synergy: Applied Learning and Problem-Based
Pedagogy in the Digital Transformation Ecosystem . . . . . . . . . . . . . . . . 734
Peter ChunYu Yau, Ejoe Tso, and Dennis Wong
Teacher Training Management Guidelines for Improving Green IT
Teaching Intention and Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 742
Ricky Nhlanhla Dlamini and Grant Royd Howard
Design and Implementation of an Automatic Word
Match Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 752
E. Miles Gertis and Y. Daniel Liang
The Impact of Feedback Modes on Learners’ Performance in
Paragraph Writing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766
Murad Abdu Saeed, Atef Odeh AbuSa’aleek,
and Enas Abdelwahab Eltom RahmtAllah
Metasearch: A Web-Based Application to Perform Systematic
Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775
Rafael Santos Crema, Guilherme Nunes Nogueira Neto,
and Percy Nohama

Preliminary Study on e-Collaboration Readiness and Community of
Inquiry Presences in a Higher Educational Institution . . . . . . . . . . . . . . 786
Alimatu–Saadia Yussiff, Abdul-Lateef Yussiff, Franklin Kome Amoo,
and Wan Fatimah Wan Ahmad
Utilising Gamification and Virtual Environments to Present Digitally
Enhanced Advanced Services (DEAS) for the Financial Sector . . . . . . . 802
S. Khan, V. Charissis, and D. K. Harrison

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 815


Developing a Prototype Piezoelectric Wafer-Box
for Optimal Energy Harvesting

S. Ahmad1 , K. Seonghoon1(B) , A. Mohammad2 , S. Junan3 , and B. Yong4


1 Department of Civil Engineering and Construction Management, College of Engineering and
IT, Georgia Southern University, P.O. Box 8077, Statesboro, GA 30460-8077, US
{as14483,shkim}@georgiasouthern.edu
2 Department of Electrical Engineering, College of Engineering and IT, Georgia Southern
University, Statesboro, GA 30460, US
[email protected]
3 Asphalt Research Lab in the Department of Civil Engineering and Construction Management,
College of Engineering and IT, Georgia Southern University, P.O. Box 8077, Statesboro, GA, US
[email protected]
4 Department of Civil, Construction and Environmental Engineering, Marquette University,
Milwaukee, WI 53201, US
[email protected]

Abstract. Piezoelectric energy has attracted attention alongside conventional renew-
able energy sources such as solar, wind, and geothermal power. To address the
dilemma of climatic conditions affecting the energy harvesting using Lead Zir-
conate Titanate (PZT) in pavement, wafer-boxes were used with embedded PZT
sensors, since wafer-boxes have the ability to be embedded in the pavement where
sensors are protected from any kind of physical damage. This research project was
designed to identify which shaped wafer-box produced the most electric voltage
and power. Various forms of wafer-box were developed to identify if there was
any potential difference in voltage generation due to the structural shapes of the
box. Seven different shapes of prototype wafer-boxes were designed utilizing both
a 3D printer and 3D Computer Aided Design (CAD). These wafer-boxes were cou-
pled with embedded PZT sensors, which were tested in an asphalt pavement analyzer
(APA) machine under certain load to produce electric voltage. Collected voltage
data from the APA wheel load test were analyzed using various statistical meth-
ods. The statistical analysis results indicated that, out of the seven different shaped
wafer-boxes, the right-angled triangular shaped box produced the highest average
voltage values, whereas the square shaped wafer-box produced the lowest amount of
voltage. Structural properties of a wafer-box in terms of section modulus, area
moment of inertia, extreme points, and radius of gyration were also analyzed, and
a regression analysis was conducted to identify the reasons for the different amounts
of voltage produced. These voltage values could be used to calculate the power
using formulas relating power and voltage.
The outcome helped to identify which shape is most effective for power generation
under certain circumstances. The regression analysis results indicated that, out
of the four properties, the section modulus is the most influential structural property
affecting voltage production.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


K. Arai (Ed.): FTC 2022, LNNS 561, pp. 1–15, 2023.
https://doi.org/10.1007/978-3-031-18344-7_1

Keywords: Piezoelectricity · PZT materials · Voltage · Wafer-box · Experiment


design · Section modulus

1 Introduction

Sihai Wen, 2002, suggested that the term “Piezoelectricity” refers to the alteration of the
electric polarization with stress; this change results in a generation of voltage across the
material in the direction of the polarization [1]. Piezoelectricity is related to the dielectric
behavior of material. According to Ang Hu, 1999, the dielectric constant is a material
property that is related to the dipole electric moment per volume unit [2]. Kim et al., 2011,
pointed out that piezoelectricity allows the conversion of mechanical energy generated
by mechanical vibration to electrical energy [3]. Rahman et al., 2014, mentioned that a
piezoelectric material has the ability to transform a mechanical movement like pressure,
movement of substance, or vibration into an electrical signal or electrical power and
vice versa [4]. This energy conversion can be used for the generation of electrical power.
Piezoelectric (PZ) energy harvesting technology has significant advantages over other
renewable energy sources such as solar, wind, and geothermal [5, 6]. Using the pressure
of vehicles caused by gravity, the method generates electric energy from the deformations
in the paving materials [7].
PZT is composed of a perovskite-type crystalline structure. This structure is rep-
resented by the compositional formula ABO3. This structure can achieve large piezo-
electricity when A is replaced by Pb. This feature of PZT materials can be optimized
by compositional alterations [4]. Piezo ceramics are physically active, chemically inert
and relatively inexpensive to manufacture. PZT ceramic has value because of its higher
sensitivity and operating temperature than other piezo ceramics [8]. PZT-based ceramic
materials show high performance while being used for various purposes at a relatively
low cost. An important feature of PZT is its large piezoelectricity, which tends to intensify
at the phase boundary composition between the rhombohedral and tetragonal phases in
the solid state. This phase boundary is also known as the morphotropic phase boundary
(MPB) [4]. Some electrical characteristics required for practical usage are not
necessarily high at the MPB. A tetragonal phase PZT generally has higher heat-proof
characteristics, so it is often applied in applications requiring high temperature
durability. Any of these compositions could be chosen to meet the demands of each
application. The phases of PZT are easily controlled in many situations by the
compositional change of the zirconate and titanate ratio. PZT materials can be
shaped with great flexibility. When mechanical displacement or vibration is utilized,
the performance of a piezoelectric device can be greatly altered by the device shape,
including in the case of non-resonant devices [4].
This technology has been tested for a variety of purposes, including sensors [9–12],
roadway lighting and bridge bearing [13, 14], structural health monitoring [6, 7, 10],
deicing [15] and traffic monitoring [16]. However, the amount of electric voltage pro-
duced by piezoelectric material is not as high as that of other alternative sources [3]. In addition,
the economic efficiency of producing energy using piezoelectric materials is also not
very high. Thus, it is essential to conduct more research on ways to harvest energy using
piezoelectric material on a larger scale. Researchers have conducted numerous studies
to develop simple and efficient energy harvesting devices to extract electrical energy
from vibration utilizing piezoelectric materials. As a source of energy, the researchers
were trying to use the pavement, where force comes along the vertical direction. In
the last decade, due to its high power density, architectural simplicity, and scalability,
piezoelectric energy harvesting has attracted wide attention from researchers. The PZT
sensors could be embedded into a wafer-box and placed in the pavement where sensors
can receive the vertical loads from vehicles’ wheels or human mass. Recently, several
research projects have been performed to assess the possibility of using PZ-embedded
roadways as an alternative energy source, and to identify the possible magnitude of
energy harvesting using this technology [15, 17, 18]. To protect the PZT sensors from
external forces (temperature, pressure, salt environment) wafer-box development was
identified as an important step in the process of harvesting energy from a roadway
utilizing PZT sensors. Therefore, a research project is essential to develop an ideal pro-
totype wafer-box with embedded PZT sensors producing the highest amount of electrical
energy. The outcome of the research project includes an investigation of the effect of
shape of the wafer-box on voltage harvesting production.

2 Research Objective and Scope

The objectives of this research project were: 1) the development of seven different
shaped prototype wafer-boxes utilizing a 3D printer and CAD, 2) the development of an
experimental design for conducting load wheel testing with PZT sensors embedded in a
wafer-box, 3) the identification of the wafer-box shape capable of producing the highest
energy. The research project utilized plastic materials and ceramic disk PZT electronic
sensors for the development of the wafer-boxes. The lessons learned from this research
can serve as a knowledge basis for improving methods for harvesting maximum amounts of energy
in future studies.

3 Methodology

Since no specific standards exist for conducting material testing in this research
area, the research team referred to ASTM standards and UL standards that provide
guidelines for different material testing and environmental standards. The null hypothesis
was that all the means of the different data groups are the same. The alternative
hypothesis was that not all the means of the different data groups are the same.

3.1 Experimental Design for Data Collection

Two experimental design methods were used for conducting the wheel load test. One was
the preliminary experimental design, and the second was the final experimental design.
This preliminary data collection allowed the research team to improve the final data
collection and analyses [19, 20]. For the preliminary experimental design, five circular
disk PZT sensors were randomly assigned to be embedded into a wafer-box. For easy
4 S. Ahmad et al.

identification, these five sensors were given marks ranging from 1 to 5 (Table 1). The
same sensor was not embedded in the same wafer-box twice. The wafer-box
PZT sensor combinations for the preliminary test are shown in Fig. 1.
The PZT sensor and wafer-box combinations for the final experiment are provided
in Table 2. For the final experimental design, twenty-five circular disk PZT sensors were
randomly selected. These sensors were given marks from 1 to 25 to be identified easily
(Table 2). The sensors were arranged into five blocks, B1, B2, B3, B4, and
B5, each containing five PZT sensors. The same sensor was not embedded in the same
wafer-box twice [21–23]. The wafer-box PZT sensor combinations for the
final experiment are visualized in Fig. 2.
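
As an illustration of the randomized design just described, the following sketch (not part of the original paper; the shuffling procedure and the fixed seed are assumptions added for reproducibility) enumerates the 25 sensor marks, groups them into the five blocks B1–B5, and pairs each sensor with each of the seven box shapes exactly once, giving the 175 test combinations summarized in Table 2.

```python
# Illustrative sketch (not from the paper): enumerating the randomized block
# design described above -- 25 sensors grouped into 5 blocks of 5, with each
# sensor tested once in every one of the 7 wafer-box shapes.
import random

SHAPES = ["C", "H", "S", "R", "T", "RT", "RH"]   # circular, hexagon, square, rectangle,
                                                 # triangle, right-angled triangle, rhombus

random.seed(42)                                  # assumed seed, for reproducibility only
sensors = list(range(1, 26))                     # sensor marks 1..25
random.shuffle(sensors)                          # random selection/ordering of sensors

# Split the shuffled sensors into five blocks (B1..B5) of five sensors each
blocks = {f"B{i + 1}": sensors[i * 5:(i + 1) * 5] for i in range(5)}

# Each sensor is paired with every shape exactly once (no repeated sensor-box pair)
combinations = [(block, s, f"{shape}{s}")
                for block, members in blocks.items()
                for s in members
                for shape in SHAPES]

print(blocks)
print(len(combinations))   # 25 sensors x 7 shapes = 175 test runs, as in Table 3
```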

Table 1. Possible combinations of PZT sensors and wafer-box (preliminary data collection)

PZT sensor mark | Circular wafer box: C | Hexagonal wafer box: H | Square wafer box: S | Rectangular wafer box: R | Triangular wafer: T
P1 | P1 C | P1 H | P1 S | P1 R | P1 T
P2 | P2 C | P2 H | P2 S | P2 R | P2 T
P3 | P3 C | P3 H | P3 S | P3 R | P3 T
P4 | P4 C | P4 H | P4 S | P4 R | P4 T
P5 | P5 C | P5 H | P5 S | P5 R | P5 T

Fig. 1. Wafer-Box and PZT Sensor Combination (Preliminary Experiment)

3.2 Voltage Data Collection Method from Wheel Load Test (WLT)
The wafer-boxes with embedded PZT sensors (see Fig. 3) were tested under the load
wheel of the Asphalt Pavement Analyzer (APA) machine (see Fig. 4). A vertical load of
152 lbs was applied on each wafer-box with an embedded PZT sensor by the load wheels
of the APA machine. This vertical load on the wafer-box generated mechanical energy
which was received by the PZT sensor. The test wheel completed 30 full cycles in 60 s,
meaning one complete cycle took 2 s. The wheel movement was forward and
Table 2. Possible combinations of PZT sensors and wafer-box (final experiment design)

Wafer box-PZT sensor combinations according to blocks

Block No | Sensor No | Circular (C) | Hexagon (H) | Square (S) | Rectangle (R) | Triangle (T) | Right-angled Triangle (RT) | Rhombus (RH)
B1 1 C1 H1 S1 R1 T1 RT1 RH1
2 C2 H2 S2 R2 T2 RT2 RH2
3 C3 H3 S3 R3 T3 RT3 RH3
4 C4 H4 S4 R4 T4 RT4 RH4
5 C5 H5 S5 R5 T5 RT5 RH5
B2 6 C6 H6 S6 R6 T6 RT6 RH6
7 C7 H7 S7 R7 T7 RT7 RH7
8 C8 H8 S8 R8 T8 RT8 RH8
9 C9 H9 S9 R9 T9 RT9 RH9
10 C10 H10 S10 R10 T10 RT10 RH10
B3 11 C11 H11 S11 R11 T11 RT11 RH11
12 C12 H12 S12 R12 T12 RT12 RH12
13 C13 H13 S13 R13 T13 RT13 RH13
14 C14 H14 S14 R14 T14 RT14 RH14
15 C15 H15 S15 R15 T15 RT15 RH15
B4 16 C16 H16 S16 R16 T16 RT16 RH16
17 C17 H17 S17 R17 T17 RT17 RH17
18 C18 H18 S18 R18 T18 RT18 RH18
19 C19 H19 S19 R19 T19 RT19 RH19
20 C20 H20 S20 R20 T20 RT20 RH20
B5 21 C21 H21 S21 R21 T21 RT21 RH21
22 C22 H22 S22 R22 T22 RT22 RH22
23 C23 H23 S23 R23 T23 RT23 RH23
24 C24 H24 S24 R24 T24 RT24 RH24
25 C25 H25 S25 R25 T25 RT25 RH25

backward. So one forward movement took 1 s and one backward movement took 1 s.
The test speed was set at 30 Hz, which means the wheels completed 30 full cycles in 60
s. The APA cabin temperature was kept at 30 degrees Celsius. The test conditions were
kept the same during testing of all the wafer-boxes.

Fig. 2. Wafer-boxes and PZT sensor combination (final experiment)

Fig. 3. Embedded PZT sensor in wafer-box

Fig. 4. Wafer-box arrangement in APA machine



When a rectifier is connected to the PZT sensors, it converts the AC voltage
produced by the PZT sensor to DC voltage, and the oscilloscope displays the output DC
voltage. The output from the sensors was sent through this AC-DC converter, from which
the voltage generated was read. The direct voltage/open circuit voltage measurement was obtained
by connecting the oscilloscope channel wires directly to the PZT sensor’s wires. The
RMS (Root Mean Square) voltage from the oscilloscope display was recorded as an
alternating current (AC) measurement. The RMS value of a quantity is the square root of
the mean value of the squared values of the amount taken over an interval. The RMS
value of an AC voltage is equivalent to the direct voltage (DC) that produces the same
heating effect when applied across an identical resistor [24]. The received mechanical
energy was converted to electrical power using an electric rectifier. The output energy
(voltage) was measured by an oscilloscope.
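
As a small numerical illustration of the RMS definition quoted above, the following sketch (not from the paper; the sampled waveform and its 3.5 V peak are hypothetical placeholders) computes an RMS voltage with NumPy:

```python
# Minimal sketch of the RMS definition used above: the square root of the mean
# of the squared samples taken over an interval. The waveform is a hypothetical
# placeholder, not data from the APA tests.
import numpy as np

t = np.linspace(0.0, 2.0, 2000)            # 2 s interval (one wheel cycle)
v = 3.5 * np.sin(np.pi * t)                # assumed 3.5 V peak, for illustration only

v_rms = np.sqrt(np.mean(v ** 2))           # RMS over the interval
print(f"RMS voltage: {v_rms:.3f} V")       # ~3.5 / sqrt(2) = 2.47 V for a sinusoid
```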

3.3 Statistical Methods Used for Data Analysis

This research project used several statistical methods including one-way ANOVA test,
Tukey HSD-Q test and Scheffé’s method to analyze collected data. The one-way analysis
of variance (ANOVA) was utilized to identify substantial differences among two or more
independent groups of datasets. The Tukey’s Honest Significant Difference (HSD) test
is a post-hoc test based on the standardized range distribution. An ANOVA test can show
if results are significant overall, but it does not show exactly where those differences
lie. After an ANOVA result is found to be significant, the Tukey’s HSD test
can be applied to learn which specific groups’ means (compared with each other) are
different. The test compares all possible pairs of means [25]. The Scheffé’s test is a
post-hoc test used in analysis of variance. In an analysis of variance (ANOVA) test, if the
null hypothesis is rejected, showing that the means of different data groups are not the
same, the Scheffé’s test could then be run to find out which pairs of means differ significantly.
For Scheffé’s method, a T statistic is defined as the ratio of unsigned contrast mean to
contrast standard error [26]. The basic difference between these two post-hoc tests is
the Tukey’s HSD test is used for data groups of similar sample size and the Scheffé’s
method could be used for data groups of both equal and unequal sample sizes.
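
A minimal sketch of this analysis pipeline, assuming SciPy and statsmodels are used, is shown below. The three voltage arrays simply reuse the first five Table 3 readings (rounded) for the square, circular, and right-triangle boxes, so the numbers are illustrative rather than the full 175-value dataset.

```python
# Sketch of the analysis described above: one-way ANOVA followed by Tukey's HSD.
# The voltage arrays are short illustrative extracts, not the full dataset.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

voltages = {
    "Square":         np.array([1.90, 2.01, 2.07, 2.15, 2.23]),
    "Circular":       np.array([2.85, 3.48, 3.02, 3.21, 3.48]),
    "Right triangle": np.array([4.30, 4.15, 2.47, 3.03, 4.44]),
}

# One-way ANOVA: are the group means all equal (null hypothesis)?
f_stat, p_value = f_oneway(*voltages.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Post-hoc Tukey HSD: which specific pairs of group means differ?
values = np.concatenate(list(voltages.values()))
groups = np.repeat(list(voltages.keys()), [len(v) for v in voltages.values()])
print(pairwise_tukeyhsd(values, groups, alpha=0.05))

# Scheffe's test (valid for equal or unequal group sizes) can be computed
# analogously from the contrast means and their standard errors, as in [26].
```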
Different structural properties of the wafer-boxes, including section modulus, area
moment of inertia, radius of gyration and extreme points, were utilized in linear regres-
sion modelling with an average voltage value produced by different shaped wafer-boxes.
Regression models were developed to substantiate the validity of the research outcomes.

4 Results
4.1 Analysis of the Experiment Data

The data were analyzed using statistical tools to identify the voltage produced by different
shapes of wafer-boxes. The average voltage (RMS) values produced by different shaped wafer-
boxes coupled with PZT sensors in the preliminary experiment were ranked by
average voltage value, from highest to lowest. The preliminary data indicated that the
circular shaped wafer-box had a higher voltage average than all other shapes. The
hexagonal and triangular shaped boxes produced similar voltage values to each other.
The square shape produced the lowest voltage. After the preliminary experiment, the
researchers conducted the final experiment with seven different shapes of wafer boxes.
In the preliminary experiment, five basic shapes were selected to identify which shape
produced the highest amount of voltage. Later, the research scope allowed working with two
more basic geometric shapes to determine whether they produced more voltage than the other
shapes. For the final experiment data, a regression analysis was conducted to identify
the reasons for the different amounts of voltage produced by different shaped wafer-boxes.
A simple graph (Fig. 5) represents the average voltage value produced by 25 sensors
embedded into different shaped wafer-boxes. Table 3 shows the ranking of wafer-boxes
according to the average voltage value. As shown in the graph, the right-angled triangle
shape has the highest average energy (voltage), and the square-shaped box produces the
lowest average voltage value. Table 3 shows the 175 voltage (RMS) values produced
by twenty-five PZT sensors coupled with seven different shaped wafer boxes, and the
various average values produced by different PZT sensors embedded in the wafer-box.

Fig. 5. Average voltage produced by different shaped wafer box

4.1.1 One-Way ANOVA Analysis for Final Experiment Data


A one-way analysis of variance (ANOVA) was conducted using the final experiment data.
The null hypothesis was that all means are equal, and the alternative hypothesis
was that not all means are equal. The two hypotheses were tested using an F-ratio for
a one-way ANOVA. At significance level α = .01, it was observed that the calculated test
statistic F = 40.72 > Fc = 2.398; thus, it was concluded that the null hypothesis was
rejected. At α = .05, it was observed that F = 40.72 > Fc = 3.48, and it was concluded
that the null hypothesis was rejected. So, it can be claimed that not all seven population
means are equal at significance level α = 0.05 or significance level α = 0.01. Here the
p-value corresponding to the F-statistic of one-way ANOVA is lower than 0.01, which
indicates that one or more pairs of treatments are significantly different.

Table 3. Voltage data generated by twenty-five PZT sensors (final experiment)

Block Sensor Voltage (RMS volts) produced by the different shapes


no Rectangular Circular Square Triangular Hexagonal Right Rhombus
Box Box Box Box Box Triangle Box
Box
B1 1 2.35069 2.85188 1.9028 2.96715 2.3093 4.30445 4.28547
2 2.45843 3.48145 2.0067 2.92065 2.945 4.14547 4.00295
3 2.45389 3.0158 2.07465 2.88828 3.1562 2.47016 3.84305
4 2.40305 3.21455 2.14892 2.90882 3.13517 3.02959 3.80262
5 2.41855 3.47637 2.2267 2.93443 2.68132 4.44418 3.78643
B2 6 2.55527 3.47673 1.47009 2.94454 3.10462 4.55715 3.7797
7 2.68132 3.36679 1.58473 2.94399 3.0723 2.45052 3.75748
8 2.76957 3.21578 1.65087 2.94598 3.11833 3.69215 3.73119
9 2.86102 3.51812 1.67698 2.90115 3.11625 4.13521 3.71971
10 2.96965 3.64371 1.7048 2.90574 3.08728 4.25996 3.71682
B3 11 2.30925 3.353 1.29077 2.89624 3.0912 4.27386 3.71389
12 2.30624 3.33846 1.20859 2.89171 3.22762 4.38052 3.69814
13 2.5561 3.03746 1.28922 2.89475 2.89475 2.27418 3.63347
14 3.13517 3.03515 1.32674 2.93199 2.3267 3.52155 3.61419
15 3.29134 3.02763 1.37448 2.97059 2.37448 3.46213 3.57147
B4 16 3.25516 3.16447 1.17228 3.13897 3.15207 1.81555 3.49395
17 3.19318 2.96146 1.2989 3.0192 2.7696 3.36203 3.44312
18 3.17023 3.15617 1.49626 2.99831 2.9207 3.43794 3.38618
19 3.1834 2.96715 1.38222 2.96848 2.9513 3.60015 3.33156
20 3.17928 3.09344 1.61977 2.93913 2.3093 2.60052 3.30684
B5 21 2.66515 3.09118 3.21247 2.94016 3.12783 3.39243 2.99424
22 2.56721 3.22762 2.40297 2.9524 3.10776 3.62229 2.56954
23 2.48994 3.15208 3.47403 2.95168 3.15211 3.50139 2.30761
24 2.40907 3.25625 2.23919 2.96276 2.32766 3.92994 1.93159
25 2.65109 3.20894 2.73877 2.96963 3.05189 4.30445 4.28547
Average 2.73133 3.213266 1.849756 2.947469 2.90043 3.527638 3.475883
Maximum 3.02763 3.64371 3.21247 3.13897 3.22762 4.55715 4.28547
Minimum 2.30624 2.85188 1.9028 2.97615 2.3093 1.81555 1.93159
Rank order 6 3 7 4 5 1 2
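
The summary rows of Table 3 (per-shape average, minimum, maximum, and rank order) could be reproduced with a few lines of pandas; the sketch below assumes the 25 readings per shape are stored in a hypothetical CSV file with one column per wafer-box shape.

```python
# Sketch of how the summary rows of Table 3 could be computed with pandas,
# assuming the 175 RMS voltage readings are loaded with one column per shape.
import pandas as pd

# Hypothetical file name; each column holds the 25 readings for one shape.
df = pd.read_csv("wlt_voltages.csv")   # columns: Rectangular, Circular, Square, ...

averages = df.mean()                            # per-shape average RMS voltage
rank_order = averages.rank(ascending=False)     # 1 = highest average voltage

summary = pd.DataFrame({"Average": averages,
                        "Minimum": df.min(),
                        "Maximum": df.max(),
                        "Rank order": rank_order.astype(int)})
print(summary)
```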

The Tukey HSD Test Table 4 shows the results of the Tukey HSD test analysis, comparing
pairs of wafer-boxes with embedded sensors based on the calculated Q-statistic and p-
value described in the Tukey HSD method [25]. There was a significant difference among
different data groups at the two selected confidence intervals (significance level α = 0.01
and 0.05).

Table 4. Results of Tukey HSD test for voltage data

Treatments pairs Tukey HSD Qstatistic Tukey HSD p-value Tukey HSD inference
Rectangle vs Circle 5.4053 0.0034066 Significant
Rectangle vs Square 9.3806 0.0010053 Significant
Rectangle vs 2.4996 0.5617074 Insignificant
Triangle
Rectangle vs 1.7817 0.8569824 Insignificant
Hexagon
Rectangle vs Right 8.5558 0.0010053 Significant
triangle
Rectangle vs 7.9997 0.0010053 Significant
Rhombus
Circle vs Square 15.223 0.0010053 Significant
Circle vs Triangle 2.9916 0.3486336 Insignificant
Circle vs Hexagon 3.7307 0.1213037 Insignificant
Circle vs Right 3.4428 0.1904155 Insignificant
triangle
Circle vs Rhombus 2.8715 0.401228 Insignificant
Square vs Triangle 12.2314 0.0010053 Significant
Square vs Hexagon 11.4923 0.0010053 Significant
Square vs Right 18.0685 0.0010053 Significant
triangle
Square vs Rhombus 17.4972 0.0010053 Significant
Triangle vs Hexagon 0.7391 0.8999947 Insignificant
Triangle vs Right 6.317 0.0010053 Significant
triangle
Triangle vs 5.7457 0.0013962 Significant
Rhombus
Hexagon vs Right 7.0271 0.0010053 Significant
triangle
Hexagon vs 6.4558 0.0010053 Significant
Rhombus
Right triangle vs 0.5505 0.8999947 Insignificant
Rhombus

The Scheffé’s Method Table 5 shows the results of the Scheffé’s method test analysis,
comparing pairs based on the calculated T-statistic and p-value described in Scheffé’s
method [26]. It can be identified from the statistical analysis that there was a significant
difference among different data sets at any confidence interval (Significance level α =
0.01 and 0.05).

Table 5. Results of Scheffé’s method test for voltage data

Treatments pairs Scheffe T-statistic Scheffe p-value Scheffe inference


Rectangle vs Circle 3.8221 0.027521 Significant
Rectangle vs Square 6.6331 5.10E-07 Significant
Rectangle vs Triangle 1.7675 0.7921774 Insignificant
Rectangle vs Hexagon 1.2599 0.9527256 Insignificant
Rectangle vs Right triangle 6.0498 7.85E-06 Significant
Rectangle vs Rhombus 5.6566 4.38E-05 Significant
Circle vs Square 10.7643 1.11E-16 Significant
Circle vs Triangle 2.1154 0.6134913 Insignificant
Circle vs Hexagon 2.638 0.3299587 Insignificant
Circle vs Right triangle 2.4344 0.4350667 Insignificant
Circle vs Rhombus 2.0305 0.6602698 Insignificant
Square vs Triangle 8.6489 1.04E-11 Significant
Square vs Hexagon 8.1263 2.03E-10 Significant
Square vs Right triangle 12.7764 1.11E-16 Significant
Square vs Rhombus 12.3724 1.11E-16 Significant
Triangle vs Hexagon 0.5226 0.9996051 Insignificant
Triangle vs Right triangle 4.4668 0.0039641 Significant
Triangle vs Rhombus 4.0629 0.0139646 Significant
Hexagon vs Right triangle 4.9689 0.0006795 Significant
Hexagon vs Rhombus 4.5649 0.0028558 Significant
Right triangle vs Rhombus 0.3893 0.9999293 Insignificant

4.2 Structural Property Analysis


Section modulus is a cross-sectional geometric property of structural members. This
property is used to design beams and flexural members. Radius of gyration, moment of
inertia, polar moment of inertia, area for tension and shear are some other properties
used in the design process. Structural shape has great impact on relationships among
these properties. Different structural property values are shown in Table 6.

4.2.1 Linear Regression Analysis


Using a simple linear regression method, the relationship among structural properties
of wafer-boxes and average voltage values was calculated (Table 7). The R2 value for
area moment of inertia is 0.2332. This value means area moment of inertia explains an
estimated 23.32% of the variation in average voltage data. The R2 values for radius of
gyration and extreme points are 0.1061 and 0.1803, respectively. The radius of gyration
explains 10.61% of variation in average voltage data and the extreme points explain

Table 6. Structural properties calculations

Descriptions | Circular | Triangular | Hexagonal | Rectangular | Square | Right triangle | Rhombus
Area moment of inertia section properties (inch^4) | 139.15138 | 239.2961 | 202.81994 | 142.41983 | 205.7792 | 162.96175 | 129.2496
Section modulus (inch^3) | 38.12367 | 38.59616 | 46.10552 | 48.71977 | 58.41512 | 31.69968 | 31.96175
Radius of gyration (inch) | 1.825 | 2.1948 | 2.013 | 1.69065 | 2.03745 | 1.8054 | 3.755
Extreme points (inch) | 3.65 | 6.2 | 4.39962 | 2.925 | 3.525 | 5.1 | 4.6

18.03%. The R2 value for the section modulus is 0.9494. This means the section modulus can
account for 94.94% variation in average voltage values. Thus, from regression analysis it
could be said, out of four properties, the section modulus is the most influential structural
property affecting voltage production. Lower section modulus leads to higher deflections
which translate into higher stress levels and higher voltages.

Table 7. Summary of regression calculations

Regression description: Relation analysis with average voltage value of 125 and different structural properties of wafer-box

 | Area moment of inertia | Section modulus | Radius of gyration | Extreme points
r | −0.483 | −0.9744 | 0.3258 | 0.4246
R^2 | 0.2332 | 0.9494 | 0.1061 | 0.1803
Relationship | Very weak negative relationship | Very strong negative relationship | Very weak positive relationship | Very weak positive relationship
Independent and dependent variable relationship significance | Insignificant | Insignificant | Insignificant | Insignificant
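
A minimal sketch of the strongest relationship in Table 7, assuming the section modulus values are taken from Table 6 and the per-shape average voltages from Table 3, is shown below; the resulting correlation coefficient should come out close to the reported r ≈ −0.97.

```python
# Sketch reproducing the strongest relationship in Table 7: simple linear
# regression of average RMS voltage (Table 3) on section modulus (Table 6).
from scipy.stats import linregress

# Shape order: Circular, Triangular, Hexagonal, Rectangular, Square,
#              Right triangle, Rhombus
section_modulus = [38.12367, 38.59616, 46.10552, 48.71977, 58.41512,
                   31.69968, 31.96175]          # inch^3, from Table 6
avg_voltage     = [3.213266, 2.947469, 2.90043, 2.73133, 1.849756,
                   3.527638, 3.475883]          # RMS volts, from Table 3

fit = linregress(section_modulus, avg_voltage)
print(f"r = {fit.rvalue:.4f}, R^2 = {fit.rvalue ** 2:.4f}")
print(f"slope = {fit.slope:.4f} V per inch^3, intercept = {fit.intercept:.4f} V")
# r should be close to the -0.97 reported in Table 7, i.e. a strong negative
# correlation: the higher the section modulus, the lower the average voltage.
```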

4.2.2 Section Modulus Analysis


As shown in Table 6, the right angled triangle shape had the lowest section modulus and
square shape had the highest section modulus. The rhombus, triangular, hexagonal and
rectangular shapes occupied the second, third, fourth, and fifth places, respectively,
according to those section modulus values. If the section modulus is high for any shape,
the member is more resistant to bending. When two plates of the same material are
compared according to their section modulus values, the plate with the higher section modulus can
bear more load than the plate with the lower section modulus. Alternatively, a wafer-box
with a lower section modulus will bend more when force is applied to it. Out of the
seven shapes, the right triangular shaped wafer-box bent more than the other shapes.
The square box bent the least, as it had the highest section modulus value. Because
the right angled triangular box bent more due to its lower section modulus, the PZT
sensor embedded in it bent more as well. The more a PZT sensor bends, the more
electricity is produced. According to this principle, the square shape produced the lowest
energy. The energy produced by the other five shapes (rhombus, rectangular, hexagonal,
circular, triangular) is also explained by this bending moment principle.
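
The bending-moment reasoning above can be illustrated with the textbook relation σ = M/S between applied moment, section modulus, and extreme-fiber stress. The sketch below uses standard formulas for a solid rectangle and a solid circle with placeholder dimensions (these are not the actual wafer-box geometries) to show that a lower section modulus yields a higher stress, and hence more bending of the embedded sensor, for the same wheel load.

```python
# Sketch of the bending-moment principle described above: for the same applied
# moment M, the extreme-fiber bending stress is sigma = M / S, so a cross
# section with a lower section modulus S sees higher stress (and more bending).
# Dimensions and the moment below are illustrative placeholders only.
import math

def section_modulus_rectangle(b, h):
    """S = b*h^2 / 6 for a solid rectangular section (bending about its centroid)."""
    return b * h ** 2 / 6

def section_modulus_circle(d):
    """S = pi*d^3 / 32 for a solid circular section."""
    return math.pi * d ** 3 / 32

M = 500.0                                   # assumed bending moment, lb-in
for name, S in [("rectangle 4 in x 2 in", section_modulus_rectangle(4.0, 2.0)),
                ("circle d = 2.5 in", section_modulus_circle(2.5))]:
    sigma = M / S                           # extreme-fiber bending stress, psi
    print(f"{name}: S = {S:.2f} in^3, sigma = {sigma:.1f} psi")
```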

5 Conclusions and Recommendations


The goal of this research project was to develop a prototype wafer-box producing the highest
energy. In line with that, the primary objective of this research project was to identify
the effect of wafer-box shape on energy production while coupled with PZT sensors.
The APA wheel load test results indicated that the right triangular shape produced the
highest positive effect on energy harvesting compared with other shapes. Various struc-
tural properties of the wafer-box were analyzed to substantiate the reasons behind the
energy generated by the different wafer-boxes. Structural properties including
section modulus, area moment of inertia, radius of gyration, and extreme point were
considered. A simple linear regression model was developed to identify the most influen-
tial structural property for producing the highest voltage. From regression modelling it was
found that the section modulus had a very strong negative correlation (R2 = 0.95) with
the voltage production. Using the knowledge from this research project, the research
team can improve the magnitude of energy harvesting from an optimal wafer-box.
The research team developed a prototype wafer-box coupled with the pavement
materials using a 3D printer along with CAD design. Further studies will be performed
providing insight into the difference in magnitudes of voltages of wafer-boxes. Since this
research framework was successfully developed to provide objective research results,
highway agencies or private industry can apply it to newly-developed and advanced
piezoelectric wafer-boxes that can be used in highway pavements.
This knowledge basis can be used as a standard method of energy harvesting technol-
ogy for different government agencies or private industry working with newly-developed
and advanced piezoelectric technologies. These wafer-boxes could be developed using
more advanced 3D printer with other high strength materials like aluminum and steel to
identify their effectiveness in energy harvesting.
Results of this lab-scale research project will make several major contributions to
the advancement of transportation performance and management, and highway sustain-
ability. Outcomes can be expected to: (1) increase the self-supporting energy capability
of highways, (2) increase the ability of highways to provide electricity to areas that are
remote and far from main electric lines, and (3) improve the performance of the system
to generate energy from both vertical and horizontal forces of vehicles.

6 Author Contribution Statement


Seonghoon Kim, Ph.D. is the principal investigator of this research. Dr. Kim developed
the framework of this research and supervised the entire process of this research, including
design of experiment, data collection, analyses, and future directions.
Ahmad Safayet, as the graduate student on this research, conducted data collection and statistical
analyses. Finally, he completed his thesis with this research topic. Ilan Stern, Ph.D. as
a GTRI researcher, collaborated and advised this research processes and results. Junan
Shen, Ph.D. and Mohammad Ahad, Ph.D. are co-PIs of this research project. Yong Bai,
Ph.D. reviewed this paper and advised future research plan.

Acknowledgment. This research project was funded by the Georgia Technology Research Insti-
tute (GTRI) as part of the Kennedy Space Center (KSC) Vapor Trail Walkway Project with which
GTRI contracted with Delaware North Companies (DNC) and NASA. See the research team
website. http://pzmaterialtest.s3-website-us-east-1.amazonaws.com/

References
1. Wen, S., Chung, D.D.L.: Piezoelectric cement-based materials with large coupling and voltage
coefficients. Cement Concrete Res. ELSEVIER 32(3), 5 (2002)
2. Hu, A., Fang, Y., Young, J.F., Oh, Y.-J.: Humidity dependence of apparent dielectric constant
for DSP cement materials at high frequencies. J. American Ceramic Soc. 82(7), 8 (1999)
3. Kim, H.S., Kim, J.H., Kim, J.: A review of piezoelectric energy harvesting based on vibration.
Int. J. Precision Eng. Manufact. 12(6), 1129-1141 (2011)
4. Rahman, M., et al.: 1.02 - Techniques for assessing the properties of advanced ceramic mate-
rials. In: Comprehensive Materials Processing, Hashmi, S., et al., Editors: Elsevier: Oxford,
pp. 3–34 (2014)
5. Harnessing Pavement Power: Developing Renewable Energy Technology in the Public Right-
of-Way. Federal Highway Administration, p. 2 (2013)
6. Xiong, H., et al.: Piezoelectric energy harvesting from traffic induced deformation of
pavements. Int. J. Pavement Res. Technol. 5(5), 333–337 (2012)
7. Ali, S.F., Friswell, M.I., Adhikari, S.: Analysis of energy harvesters for highway bridges. J.
Intell. Mater. Syst. Struct. 22(16), 1929–1938 (2011)
8. APC International, L. PZT Materials. PIEZO Theory 2016 07/28/2018 [cited 2018. https://
www.americanpiezo.com/piezo-theory/pzt.html
9. Gkoumas, K., Petrini, F., Bontempi, F.: Energy harvesting for the life-cycle of structures and
infrastructures: State of art, recent trends and future developments. In: Life-Cycle and Sus-
tainability of Civil Infrastructure Systems: Proceedings of the Third International Symposium
on Life-Cycle Civil Engineering (IALCCE’12), Vienna, Austria, October 3–6, 2012. CRC
Press (2012)
10. Yu, L., et al.: In-situ health monitoring on steel bridges with dual mode piezoelectric sensors.
In: Nondestructive Characterization for Composite Materials, Aerospace Engineering, Civil
Infrastructure, and Homeland Security 2013, March 11, 2013 - March 14, 2013. SPIE, San
Diego, CA, United states (2013)
11. Yu, L., et al.: Piezoelectric based sensing in wireless steel bridge health monitoring. In: Non-
destructive Characterization for Composite Materials, Aerospace Engineering, Civil Infras-
tructure, and Homeland Security 2009, March 9, 2009 - March 11, 2009. SPIE, San Diego,
CA, United states (2009)

12. Vijayaraghavan, K., Kossett, A., Rajamani, R.: Passive Roadside Reflectors and Communi-
cations Systems for Improvement of Radar Reliability, p. 54 (2006)
13. Baldwin, J.D., et al.: Energy Harvesting on Highway Bridges, p. 24 (2011)
14. Wang, M., Chang, P.C., Newcomb, R.: Power scavenging from highway bridge vibration. In:
1st International Conference on Structural Health Monitoring and Intelligent Infrastructure,
SHMII-1’2003, November 13, 2003 - November 15, 2003. Tokyo, Japan: A.A. Balkema
(2003).
15. Symeoni, A.: A review on energy harvesting from roads (2013)
16. Huang, R.-B., et al.: Technical approach and research prospect of piezoelectric energy harvest
from highway. Zhongguo Gonglu Xuebao/China J. Highway Transp. 25(6), 1–8 (2012)
17. Sun, C.-H., et al.: Designing piezoelectric harvesting unit from road vibration. In: 4th Inter-
national Conference on Manufacturing Science and Engineering, ICMSE 2013, March 30,
2013 - March 31, 2013. Dalian, China: Trans Tech Publications Ltd. (2013)
18. Zhao, H.D., Ling, J.M., Fu, P.C.: A review of harvesting green energy from road. In: 8th
International Conference on Road and Airfield Pavement Technology, ICPT 2013, July 14,
2013 - July 18, 2013. Trans Tech Publications Ltd., Taipei, Taiwan (2013)
19. Winchester, C.L., Salji, M.J., Kasivisvanathan, V.: Gathering preliminary data. J. Clinical
Urology 10(6), 568–572 (2017)
20. NCBI: Preliminary studies and pilot testing. Field Trials of Health Interventions: A Toolbox
2015 [cited 2018 07/1/18]; 3rd https://www.ncbi.nlm.nih.gov/books/NBK305518/
21. University, Y. Experimental Design. Experimentation [cited 2018 06/05/2018] (1997) http://
www.stat.yale.edu/Courses/1997-98/101/expdes.htm
22. Encyclopedia, W.t.F. Random sampling. Random assignment 02/11/2018 [cited 2018
03/10/18]; (2018). https://en.wikipedia.org/wiki/Random_assignment
23. Teaching, C.f.I.i.R.a. Types of Experimental Research. Experimental Research 07/28/2018
[cited 2018 04/28/2018]; (2018). https://cirt.gcu.edu/research/developmentresources/res
earch_ready/experimental/design_types
24. Engineering, R.A.o. The Study of Root Mean Square (RMS) Value. Mechanical, Electrical,
Electronics Engineering [cited 2018; (2018). https://www.raeng.org.uk/publications/other/
8-rms
25. Technology, N.I.o.S.a., Tukey’s Method Handbook of Statistical Methods, ed. I.T.L. (ITL).
Vol. 2018. MD, USA: NIST. Statistical method (2018)
26. Technology, N.I.o.S.a. Scheffe’s method. Engineering Statistics Hndbook [cited 2018
03/05/18]; Statistical method]. (2018). https://www.itl.nist.gov/div898/handbook/prc/sec
tion4/prc472.htm
27. Safayet, A.J.: Designing and Testing 3-D Printed Wafer-box with Embedded PZT Sensors to
Identify the Shape Effect on Energy Harvesting. Electronic Theses and Dissertations. 1751
(2018). https://digitalcommons.georgiasouthern.edu/etd/1751
Hybrid Meta-heuristic Genetic Algorithm:
Differential Evolution Algorithms for Scientific
Workflow Scheduling in Heterogeneous Cloud
Environment

Faten A. Saif1(B) , Rohaya Latip1 , M. N. Derahman1 , and Ali A. Alwan2


1 University Putra Malaysia, Selangor, Malaysia
[email protected], {rohayalt,mnoord}@upm.edu.my
2 Ramapo College of New Jersey, Mahwah, NJ, USA

[email protected]

Abstract. The giant capabilities of cloud computing in providing online services
via the Internet attract the attention of the distributed sector due to its huge abilities, which
include storage, processing, software, databases, and servers that are shared simul-
taneously over the Internet by geographically dispersed remote users. The increasing,
enormous amount of data generated through big data platforms and the use
of IoT devices connected via the network have exploited the computational power
of the cloud. However, the high utilization of the cloud leads to a longer execution
time for a specific task. This paper proposes a hybrid strategy for schedul-
ing workflows in cloud computing called Genetic Algorithm with Differential
Evolution (GA-DE). This research aims to investigate how heterogeneous cloud
computing affects workflow scheduling. This study is aimed at reducing makespan
and verifying if the metaheuristic technology is more suitable for the distributed
environment by comparing it to existing heuristics, such as HEFT-Downward
Rank,HEFT-Upward Rank,HEFT-Level Rank, and meta-heuristic algorithm GA.
The proposed algorithm is validated through extensive experiments compared to
three scientific workflows (Epigenomics,Cybershake,and Montage). Based on the
simulation result GA-DE algorithm proves its superiority against the other com-
paring algorithms in term of makespan. Furthermore, the conducted experiment
proves that montage scientific workflow is more proper for executing workflow
scheduling in heterogeneous cloud computing.

Keywords: Genetic algorithm · Hybrid meta-heuristic · Cloud computing · Workflow scheduling · Makespan · Heterogeneity

1 Introduction
Cloud computing is widely used to deliver services to end-users and enterprises that are geographically distributed. Cloud computing aims to share remote resources such as applications, storage, databases, servers, and services with clients on demand using a pay-as-you-go model. Owing to the ubiquitous nature of cloud technology, users can execute large-scale computations without exhausting network bandwidth and with few size limitations [1, 2]. In the current decade, Internet services such as Microsoft Azure, Amazon Web Services, and Google App Engine have drawn attention to cloud computing. The cloud provides access to resources in the form of Virtual Machines (VMs) [3, 4]. Cloud services fall into three main categories: Infrastructure as a Service (IaaS), as deployed in Amazon EC2; Software as a Service (SaaS), which provides online applications to users; and Platform as a Service (PaaS), which facilitates deploying applications for users and gives them control over them [5]. The adoption of cloud computing by companies of all sizes has risen in recent years owing to many factors, such as the rapid improvement of computer processors in the form of multi-core processors. Furthermore, it decreases
the cost of system hardware significantly. One of the main functions of cloud computing is allocating tasks to suitable resources in polynomial time while meeting the Quality of Service (QoS) required to satisfy end-user requirements. Scheduling is an NP-complete problem, especially for large-scale tasks, so this challenge demands approximate solutions that maintain the constraints while improving scheduling objectives such as decreasing energy consumption, communication cost, and completion time, and increasing throughput, resource utilization, load balancing, and fault tolerance, while respecting tardiness, laxity, and deadlines. Generally, task scheduling algorithms for heterogeneous resources are classified into two common categories [6]: heuristic algorithms and meta-heuristic algorithms. The heuristic approach offers near-optimal solutions; examples include Critical Path on a Processor (CPOP), Heterogeneous Earliest Finish Time (HEFT), the Graham algorithm, and Minimum Completion Time (MCP). Meta-heuristics, on the other hand, have gained popularity for obtaining good results for NP-hard problems with minimal computational effort, making them appropriate for large-scale tasks; popular algorithms include the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Honeybee, and Ant Colony Optimization (ACO). However, neither the heuristic nor the meta-heuristic approach alone has provided satisfying solutions for scheduling tasks on heterogeneous resources in the cloud environment.
In this regard, with the rapid growth of big data, cloud computing now hosts numerous workflow applications, for instance intelligent video surveillance, which requires instant processing across five modules: motion/object detection, user interface, object tracking, tilting control, and zoom monitoring. However, the long distance to cloud data centers and the limited bandwidth pose challenges to workflow scheduling in cloud computing. The significant question, therefore, is how to reduce the makespan of workflow scheduling in the cloud and maintain instant processing for critical applications that cannot afford delay. The varying specifications of the tasks handled by the machines have drawn researchers to take the heterogeneity of the system into consideration in order to satisfy users' requests effectively. Furthermore, abundant studies on scheduling in cloud computing utilize one of the two scheduling categories, heuristics or meta-heuristics, to obtain approximate solutions. Thus, another significant question is which approach is more suitable for a distributed environment.

In workflow scheduling, scholars have typically implemented one of the meta-heuristic algorithms (PSO, the Genetic algorithm, or ant colony optimization), owing to their ability to provide reasonable solutions in a short execution time and their suitability for distributed computing environments. Therefore, the main contribution of this study is merging two meta-heuristics and exploiting the advantages of both in order to fulfil the main aim of the study, namely reducing the makespan when scheduling workflows in cloud computing. The proposed meta-heuristic algorithms are Differential Evolution (DE) and the Genetic Algorithm (GA), both of which are Evolutionary Algorithms (EA) intended for solving difficult and complex problems [7]. DE is a powerful evolutionary algorithm and stochastic search technique for solving optimization problems and is widely used in science and engineering [8]. Its main strengths are its ability to reach the global optimum and its robustness relative to other meta-heuristic algorithms, and it can raise resource efficiency and reduce makespan [9]. However, DE can easily become stuck in a local optimum and faces other problems such as premature convergence and slow convergence [3]. To overcome these challenges and enhance DE, this study adopts the Genetic Algorithm, exploiting its ability to escape local optima and refine the search space. The Genetic Algorithm (GA) is commonly used in academia and industry owing to its intuitiveness, simple implementation, and high capability to solve complex, nonlinear, and mixed-integer optimization problems: it can handle continuous and discrete variables and nonlinear objectives and satisfy constraints without requiring gradient information [10]. Thus, more than one meta-heuristic algorithm is integrated to accomplish the desired objective. Furthermore, the proposed algorithm is compared against heuristic algorithms to validate that the meta-heuristic approach is more appropriate for scheduling in distributed computing.
This study discusses workflow scheduling in the heterogeneous cloud and examines how the resources can be scaled while fulfilling the requirements of a heterogeneous cloud computing environment. The proposed hybrid meta-heuristic (GA-DE) algorithm is verified with respect to makespan by conducting simulations on the scientific workflows Cybershake, Montage, and Epigenomics. The paper is organized as follows. Sect. 2 illustrates related works; Sect. 3 presents the proposed GA-DE algorithm; Sect. 4 describes the experimental setup; Sect. 5 illustrates and summarizes the results; Sect. 6 sets out the conclusion.

2 Related Works

In recent decades, the majority of research has focused on investigating issues pertinent to the cloud environment, which plays a crucial role in gathering parallel and distributed systems into a single platform. It relies on VMs instead of physical machines for processing and configuration, which facilitates sharing resources with remote users via the Internet instead of using supercomputers that require high maintenance costs [11]. Furthermore, cloud computing has an important bearing on allocating varying numbers of shared resources among distributed users over the Internet [12]. This section surveys scheduling algorithms in cloud computing and highlights the criteria that affect the scheduling process, describes their effectiveness with respect to various objectives such as delay, energy consumption, and cost, and discusses and examines previous research works relevant to scheduling in cloud computing. The review focuses on the three main types of scheduling approaches in the cloud computing environment, namely heuristic, meta-heuristic, and hybrid algorithms.
Scheduling is defined as the process of mapping tasks to the proper processors based on the required objectives, such as raising execution speed, reducing makespan, and reducing cost and delay. The main purpose of scheduling in cloud computing is to raise effectiveness, optimize system performance, and reduce overhead on the network [13]. The role of scheduling is to guide the execution of tasks (dependent or independent, such as workflows) on the shared resources, as stated in [14]: it assigns tasks to suitable resources to fulfill the user requirements and thereby enhances the system's performance. Generally, there are two models of workflow scheduling, QoS-based and best-effort-based. Best-effort scheduling depends only on reducing the makespan, while QoS-based scheduling focuses on decreasing the makespan while maintaining constraint requirements, for example reducing cost under a budget or makespan under a deadline. A workflow is a set of interdependent tasks (i.e., a task cannot execute before its predecessors finish) that facilitates the execution of complex applications deployed on heterogeneous computing resources and is represented as a directed acyclic graph (DAG) [6]. The main technique to measure the effectiveness of the scheduling process is to evaluate performance metrics such as makespan, laxity, tardiness, delay, cost, and energy consumption [15]. Recently, many studies have addressed workflow applications, which require large-scale computing; cloud computing offers a significant opportunity to execute such workflows at low cost. A workflow application can be illustrated by a Directed Acyclic Graph (DAG) whose nodes and edges represent the tasks and data dependencies, respectively. The main role of a dependency is to prevent a child node from executing until all its parent tasks have completed execution and have sent the input data the child demands. The time at which all tasks finish executing is called the schedule length, or makespan. The model used to obtain the overall execution cost of tasks can take into account storage costs, data transfer costs, and computation costs [7]. A workflow is a set of computational tasks whose pattern is repeatable and whose tasks depend on one another; workflows represent a series of activities and mechanisms used to execute a single task or a set of tasks. Input/output-intensive workflows demand massive amounts of input data and produce massive output data. The main performance objectives considered when scheduling workflows in cloud computing are makespan, cost, energy, load balancing, and resource utilization. Generally, executing large-scale workflow applications demands scalable data and capable resources [9]. Workflows can be categorized into two types, business workflows and scientific workflows: (i) a business workflow represents real work consisting of a sequence of business processes and activities, and (ii) a scientific workflow represents a scientific application whose tasks depend on other tasks and whose execution is complex. Scientific workflows assist in formulating and structuring complex processes. Many algorithms have been implemented for workflow scheduling. Generally, scheduling algorithms rely either on an optimal approach or on a heuristic approach. Optimal approaches such as Simplex techniques and Branch-and-Bound guarantee an optimal result but spend more time due to their complexity. On the other hand, the heuristic approach provides near-optimal solutions in a short execution time [14]. The effectiveness of heuristic algorithms, especially in distributed computing, encourages implementation of the heuristic approach. Researchers have developed many techniques, leading to the emergence of heuristic, meta-heuristic, and hybrid algorithms.

2.1 Heuristic Algorithms


There are many heuristic approaches, such as Min-Min, Max-Min, the Graham algorithm, Minimum Completion Time (MCP), Heterogeneous Earliest Finish Time (HEFT), Greedy search, Earliest-Due-Date (EDD) for flow-shop scheduling, and First-Come-First-Served (FCFS) [6]. The work in [16] proposes a new heuristic approach for offloading tasks across multiple sites in mobile cloud computing and is indicated as the first study to address task scheduling in mobile cloud computing. Its main idea is to transform the multiple objectives into a single objective and then use a heuristic algorithm to schedule the tasks; the outcomes show the effectiveness of the proposed algorithm by producing well-founded results in a reasonable time. The work contributed in [16] aims at developing an algorithm called Improved Heterogeneous Earliest Finish Time (IHEFT) for scheduling tasks as a DAG, based on distributing the workload over the VMs to reduce makespan. It is clear that heuristic approaches offer acceptable performance for a particular domain but not for others. The work reported in [17] proposes a heuristic algorithm called Energy-aware Cloud Workflow Scheduling with geo-distributed Data (ECWSD) for reducing electricity cost while meeting the deadline constraint. The issue of task scheduling in cloud computing has been investigated in [18], where a heuristic approach called a dynamic cloud resource provisioning and scheduling algorithm is proposed to achieve the required workflow deadline. The idea of the work relies on using the sum of a task's expected execution time and its standard deviation to estimate the real execution time of tasks. The proposed algorithm verifies the reduction of resource-renting cost while meeting the deadline, compared to the MO-HEFT heuristic on ElasticSim. The work contributed by [19] proposes a novel heuristic algorithm called Budget Deadline Aware Scheduling (BDAS) for scientific workflow scheduling with dynamic provisioning on commercial clouds, focusing on individual specifications and on executing workflows on the cloud while keeping to the deadline and cost constraints.

2.2 Meta-heuristic Algorithms


Meta-heuristic scheduling algorithms are the most commonly used because of their ability to solve NP-hard problems with low complexity, which makes them suitable for complex tasks; examples include Ant Colony Optimization (ACO), Honeybee, the Genetic Algorithm (GA), and Particle Swarm Optimization (PSO) [6]. The work presented in [20] develops a new meta-heuristic approach, namely Completion Time-Driven Hyper Heuristic (CTDHH), whose role is to reduce the cost of scientific workflows in cloud computing. The approach uses four populations and relies on meta-heuristics as low-level heuristics (LLH); the algorithm proves effective in reducing cost. The study in [21] presents a modified genetic algorithm for scheduling tasks in cloud computing, namely the N-GA algorithm, which exploits the advantages of GA along with a heuristic approach. The proposed algorithm was evaluated using the NuSMV model checker and the Process Analysis Toolkit (PAT) as an infrastructure for translating the proposed approach into SMV code; the NuSMV model checker provides a formal basis appropriate for modelling distributed and concurrent systems. Moreover, the model was used to address the gap between real applications and formal verification methods, and various behavioral models were evaluated to facilitate choosing the approach that generates the ideal result. The experimental results illustrate that the proposed approach provides better results than the compared meta-heuristic approaches on many problems. Furthermore, the work discussed in [22] develops a modified Genetic Algorithm for assigning static tasks to suitable processors in heterogeneous cloud computing. The proposed approach introduces a new operator that ensures sample variety and constant coverage of the whole search space, and it replaces the random initial population with optimized solutions in order to decrease the number of GA iterations. The results show that the proposed algorithm achieves an important improvement in makespan and cost. An attractive trend is to combine several meta-heuristic algorithms, taking advantage of their features to offer a better optimal result. Some attempts have been made to improve task scheduling performance in cloud computing [23] by proposing a prediction-based dynamic multi-objective evolutionary algorithm, namely the NN-DNSGA-II algorithm, which incorporates an artificial neural network into the NSGA-II algorithm for workflow scheduling, with the aims of reducing makespan, cost, and energy consumption and increasing reliability and utilization. In [8] the authors propose a meta-heuristic workflow scheduling algorithm named Energy-Aware, Time, and Throughput Optimization heuristic (EATTO), which relies on the bat algorithm to reduce energy consumption and makespan while maximizing throughput and satisfying the Quality of Service (QoS). The experimental outcomes show that the proposed approach provides the best global solution with respect to energy consumption, makespan, and throughput, and the simulation results verify its efficiency. Table 1 illustrates the comparison between heuristic and meta-heuristic algorithms [24].

2.3 Hybrid Strategy


Scheduling on cloud computing is an essential challenge that has drawn the attention of researchers, prompting them to offer a number of hybrid meta-heuristic-based solutions in order to locate an ideal solution that improves system performance. This method, which integrates more than one meta-heuristic algorithm and exploits their strengths to offer a better optimal solution, is a recent trend, and there have been numerous initiatives to enhance scheduling speed in the cloud computing environment. The work contributed by [25] constructs a true bi-objective scheduling approach. The idea of the work relies on developing a strategy that incorporates the HEFT algorithm and a meta-heuristic (the Gravitational Search Algorithm (GSA)) based on a new parameter termed cost time. The experimental results show that the suggested approach performs better than other algorithms with respect to makespan and execution-time reduction.

Table 1. The comparison between heuristic and meta-heuristic algorithms

Level of heuristic: Heuristics: low; Meta-heuristics: high.
Development: Heuristics: low; Meta-heuristics: high.
Level of performance: Heuristics: low; Meta-heuristics: better, but not ensured.
Domain: Heuristics: solving a specific problem (problem-dependent); Meta-heuristics: solving real-time systems and complex, non-linear, high-dimensional, and multimodal tasks (problem-independent).
Specification of the search space: Heuristics: narrow, exploitative method; Meta-heuristics: a wide range suitable for complex problems, exploratory method.
Result: Heuristics: few practical outcomes; Meta-heuristics: outcomes obtained without modifying the algorithm structures.
Core: Heuristics: approximated, requires the details of the problem; Meta-heuristics: approximated, heuristic, stochastic, and iterative, with no prior knowledge of the problem required.

The work reported in [26] introduces a new hybrid meta-heuristic approach for scheduling incoming tasks considered as bag-of-tasks (BoT) applications in an interconnected cloud environment. The approach combines the simulated annealing algorithm with a tabu search meta-heuristic to minimize the scheduling cost and improve scheduling performance. The proposed algorithm was compared with the Fastest Processor Largest Task policy based on the arrival and the running time, and the outcomes verified its effectiveness in reducing the cost and makespan of the scheduling. The problem of task scheduling in the cloud computing environment has been discussed in [27], where a new algorithm called PSO-SA is developed for resource provisioning in multi-tier cloud computing using a meta-heuristic approach: a hybrid of Particle Swarm Optimization (PSO) and Simulated Annealing (SA) that speeds up resource provisioning in the cloud environment. Furthermore, the work reported in [28] proposes a constraints-aware multi-QoS workflow scheduling approach to handle multiple QoS constraints in grid workflows, a technique based on hybridizing the PSO algorithm with a novel look-ahead strategy built on a min-max heuristic; the approach does not take accelerating the convergence into account. The study introduced in [29] addresses scheduling of virtual machines in the cloud with an approach that combines ant colony optimization and particle swarm optimization (ACOPS). The algorithm uses previous

information to anticipate the workload of incoming requests in a dynamic setting without requiring extra task information; to reduce computing time, it discards unanswered requests before scheduling. The experimental results show that the algorithm can provide reasonable load balancing in a dynamic environment. In addition, the work reported in [30] has proposed two hybrid meta-heuristic, dynamic dispatch-queue-based algorithms (TSDQ): TSDQ-FLPSO combines Fuzzy Logic with the Particle Swarm Optimization algorithm, whereas TSDQ-SAPSO combines Simulated Annealing with PSO. In terms of execution time, queue length, cost, waiting time, load balancing, and resource utilization, the simulation results suggest that TSDQ-FLPSO is more effective than the other algorithms at finding the best solution. The work presented in [31] proposes a novel bio-inspired hybrid algorithm (NBIHA) that combines modified particle swarm optimization (MPSO) and modified cat swarm optimization (MCSO) for resource management; the goal of the study is to reduce delay and optimize resource utilization. The work presented in [32] investigates task scheduling by implementing a hybrid method (Genetic algorithm + bacterial foraging (BF)) in cloud computing. The study aims to reduce execution time and power consumption, for both economic and environmental reasons, and the results show that the method outperforms the compared approaches in terms of convergence, stability, and solution diversity. The issue of optimally scheduling tasks and allocating resources to process the huge volume of task requests from end-users while keeping the QoS is addressed in [33]. The study suggests a hybrid multi-objective optimization approach called Hybrid Adaptive Particle Swarm Optimization (HAPSO), which incorporates a Genetic Algorithm and Adaptive Particle Swarm Optimization to achieve the multiple objectives of energy consumption and response time. The work reported in [34] presents a hybrid approach consisting of a non-dominance-sort-based Hybrid Particle Swarm Optimization (HPSO) algorithm and a Budget and Deadline constrained Heterogeneous Earliest Finish Time (BDHEFT) algorithm for scheduling workflows in cloud computing with the objectives of cost and makespan while maintaining the budget and deadline constraints; the method adopts the Pareto-best solution. Moreover, a Hybrid Approach for Energy-Aware scheduling of Deadline constrained workflows (HAED), adopting the Intelligent Water Drops algorithm and the Genetic Algorithm in cloud computing, is introduced in [35]. The main concern of the work is to decrease schedule length, execution cost, and power consumption while fulfilling deadline constraints; the study focuses on four different scientific workflows, namely Epigenomics, Cybershake, LIGO, and Montage. The work introduced by [36] discusses workflow scheduling in cloud computing, proposing a hybrid meta-heuristic technique called Hybrid Fuzzy Hitchcock Bird (HFHB) and a multi-objective variant of HFHB (MOHFHB). The purpose of the work is to enhance the random population of birds, set the attack regulator parameter using a fuzzy Sugeno signature, and replace dead birds with new ones; in addition, to select the Pareto-optimal solution, the study adopts crowding distance and non-dominated ranking. A hybrid solution based on a genetic algorithm and Heterogeneous Earliest Finish Time (HEFT) addresses workflow scheduling in cloud computing in [37]. The study has been carried out using data sets from real-world scientific workflows and the outcome illustrates the

effectiveness of the proposed algorithm with respect to cost and makespan. The research in [38] addresses workflow scheduling in cloud computing with a hybrid technique based on a genetic algorithm and HEFT, namely a deadline-constrained and cost-effective hybrid genetic algorithm for scientific workflow scheduling in cloud computing (DCHG-TS). All experiments involve real-world scientific workflows, and the results demonstrate that the proposed strategy performs better than previous strategies with respect to execution time and cost. Table 2 summarizes the related works that use the hybrid approach in the cloud computing environment.

Table 2. Summary of related works using the hybrid approach for workflow scheduling algorithms for cloud computing

[25] Algorithm: Gravitational Search Algorithm (GSA) and Heterogeneous Earliest Finish Time (HEFT); Main issue: workflow scheduling; Objectives: cost and schedule length; Limitation: does not consider variable bandwidth between VMs; Environment: homogeneous.
[26] Algorithm: multi-criteria meta-heuristic algorithms; Main issue: BoT scheduling; Objectives: cost and makespan; Limitation: causes overhead; Environment: heterogeneous.
[27] Algorithm: Particle Swarm Optimization and Simulated Annealing (PSOSA); Main issue: resource provisioning; Objectives: execution time; Limitation: does not consider a technique to avoid being trapped in a local optimum; Environment: homogeneous.
[28] Algorithm: PSO algorithm with a novel look-ahead strategy based on a min-max heuristic; Main issue: workflow scheduling; Objectives: violation rate and aggregate satisfaction rate; Limitation: does not take into account accelerating the convergence; Environment: homogeneous.
[29] Algorithm: Ant colony optimization with particle swarm optimization (ACOPS); Main issue: VM scheduling with load balancing; Objectives: computing time; Limitation: high cost; Environment: homogeneous.
[30] Algorithm: TSDQ-FLPSO (Fuzzy logic with PSO) and TSDQ-SAPSO (Simulated Annealing with PSO); Main issue: dynamic load balancing; Objectives: makespan, cost, queue length, waiting time, load balancing, and degree of imbalance, next to increasing resource utilization; Limitation: high energy consumption; Environment: homogeneous.
[31] Algorithm: bio-inspired hybrid algorithm (NBIHA); Main issue: task scheduling and resource management; Objectives: reliability and average response time; Limitation: delay in completion of the task; Environment: heterogeneous.
[32] Algorithm: Genetic algorithm with bacterial foraging; Main issue: workflow scheduling; Objectives: makespan time and scalability; Limitation: time-consuming; Environment: heterogeneous.
[33] Algorithm: Adaptive Particle Swarm Optimization (HAPSO); Main issue: resource allocation optimization and task scheduling; Objectives: multi-objective optimization (response time and energy consumption); Limitation: high number of iterations needed for convergence to the optimal value; Environment: homogeneous.
[34] Algorithm: Hybrid Particle Swarm Optimization (HPSO) algorithm and Budget and Deadline constrained Heterogeneous Earliest Finish Time (BDHEFT); Main issue: workflow scheduling; Objectives: makespan and cost under deadline and budget constraints; Limitation: focuses only on the public cloud without considering the hybrid cloud; Environment: homogeneous.
[35] Algorithm: Hybrid Approach for Energy-Aware scheduling of Deadline constrained workflows (HAED) using the Intelligent Water Drops Algorithm and the Genetic Algorithm; Main issue: developing the hybrid approach for energy-aware scheduling of deadline-constrained workflows; Objectives: multi-objective minimization of execution cost, energy consumption, and schedule length while meeting the deadline constraints; Limitation: does not consider fault tolerance; Environment: homogeneous.
[36] Algorithm: Hybrid Fuzzy Hitchcock Bird (HFHB) and the multi-objective form of HFHB (MOHFHB); Main issue: task scheduling; Objectives: cost and execution time; Limitation: QoS; Environment: heterogeneous.
[37] Algorithm: hybrid method based on a genetic algorithm and heterogeneous earliest finish time (HEFT); Main issue: workflow scheduling; Objectives: cost and execution time; Limitation: ignores dynamic VMs; Environment: homogeneous.
[38] Algorithm: DCHG-TS; Main issue: workflow scheduling; Objectives: execution time and cost; Limitation: does not consider the types of resources used; Environment: heterogeneous.

3 The Proposed Method


This part illustrates in detail the proposed algorithm, which merges the GA and DE algorithms along with the HEFT algorithm to schedule tasks.

3.1 Heterogeneous Earliest Finish Time HEFT


HEFT is classified as a heuristic algorithm and is commonly used for task scheduling. The strategy of this algorithm is to order the tasks by priority and then assign each task to the processor that gives it the minimum finish time. Generally, this algorithm is used for a single objective [25]. The algorithm relies on two stages: ranking and mapping. The ranking phase determines, for each task, the distance from its submission time to the end of the workflow, which guarantees that tasks with the largest number of successors are executed first. The mapping phase aligns the resources with the tasks of the workflow; this is also why the algorithm is not suitable for workflows and data centers that change dynamically [38]. The main quantities in the HEFT algorithm are EST, the earliest start time, and EFT, the earliest finish time of executing task ni on processor pj. The EST of the entry task of the DAG is equal to 0, as stated in Eq. (1), and the values of EST and EFT are obtained from Eqs. (2) and (3). To obtain the EFT of task ni, all predecessors of ni must already be scheduled. Here pred(ni) denotes the set of predecessors of task ni and avail[j] is the time at which processor pj becomes ready to execute a task. The time at which all data required by ni has arrived at pj is expressed by the inner max in Eq. (2). The EST and EFT of a task nm on processor pj become its AST (actual start time) and AFT (actual finish time) after nm has been scheduled on pj; the AFT is the smallest EFT obtained for that task, as given in Eq. (5). Finally, the makespan of the schedule is the AFT of the exit task, as in Eq. (4). Also, cm,i denotes the communication cost between nodes m and i; if the two related tasks m and i are assigned to the same processor, cm,i is taken to be zero.
 
$$EST\left(n_{entry}, p_j\right) = 0 \quad (1)$$

$$EST\left(n_i, p_j\right) = \max\left\{ avail[j],\ \max_{n_m \in pred(n_i)} \left( AFT(n_m) + c_{m,i} \right) \right\}, \quad i = 0, 1, \ldots, n \quad (2)$$

$$EFT\left(n_i, p_j\right) = \omega_{i,j} + EST\left(n_i, p_j\right) \quad (3)$$

$$makespan = AFT\left(n_{exit}\right) \quad (4)$$

$$AFT\left(n_i, p_j\right) = \min_{1 \le l \le m} EFT\left(n_i, p_l\right) \quad (5)$$

The priority in HEFT is determined by the upward rank given in Eq. (7). Here succ(ni) denotes the set of successors of task ni, $\overline{c}_{i,j}$ is the average cost of the communication edge (i, j), and $\overline{\omega}_i$ is the average computational cost of task ni, obtained from Eq. (8). The upward rank is computed recursively, starting from the exit task, whose rank is given by Eq. (6).

$$rank_u\left(n_{exit}\right) = \overline{\omega}_{exit} \quad (6)$$

$$rank_u\left(n_i\right) = \overline{\omega}_i + \max_{n_j \in succ(n_i)} \left( \overline{c}_{i,j} + rank_u\left(n_j\right) \right) \quad (7)$$

$$\overline{\omega}_i = \frac{1}{q}\sum_{j=1}^{q} \omega_{i,j} \quad (8)$$

The downward rank is computed recursively by Eq. (9), starting from the entry node of the graph, where pred(ni) denotes the set of predecessors of task ni; the downward rank of the entry node is equal to zero.

$$rank_d\left(n_i\right) = \max_{n_j \in pred(n_i)} \left( rank_d\left(n_j\right) + \overline{\omega}_j + \overline{c}_{j,i} \right) \quad (9)$$
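To make Eqs. (1)-(5) concrete, the following is a minimal Python sketch of the EST/EFT recurrence and the resulting makespan; the toy DAG, its cost values, the priority ordering, and the simple non-insertion allocation policy are assumptions made only for this example and do not reproduce the simulator implementation used later in the paper.

```python
# Minimal sketch of the EST/EFT recurrences (Eqs. 1-5) on an assumed toy DAG.
# w[i][j]  : computation cost of task i on processor j
# c[(m,i)] : communication cost of edge m -> i (counted as 0 on the same processor)

def schedule(tasks, pred, w, c, num_proc, priority):
    AFT, proc_of = {}, {}               # actual finish time and chosen processor per task
    avail = [0.0] * num_proc            # ready time of every processor
    for ni in sorted(tasks, key=lambda t: -priority[t]):   # decreasing priority (rank_u)
        best_eft, best_proc = None, None
        for pj in range(num_proc):
            # inner max of Eq. (2): time at which all required data has arrived at pj
            ready = max((AFT[nm] + (0 if proc_of[nm] == pj else c[(nm, ni)])
                         for nm in pred[ni]), default=0.0)
            est = max(avail[pj], ready)              # Eq. (2); for the entry task, Eq. (1)
            eft = w[ni][pj] + est                    # Eq. (3)
            if best_eft is None or eft < best_eft:
                best_eft, best_proc = eft, pj        # Eq. (5): keep the smallest EFT
        AFT[ni], proc_of[ni] = best_eft, best_proc
        avail[best_proc] = best_eft
    return max(AFT.values()), proc_of                # Eq. (4): makespan = AFT of exit task

if __name__ == "__main__":
    tasks = [0, 1, 2, 3]
    pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}
    w = {0: [9, 11, 10], 1: [11, 7, 9], 2: [8, 6, 4], 3: [6, 5, 7]}
    c = {(0, 1): 4, (0, 2): 3, (1, 3): 5, (2, 3): 2}
    priority = {0: 4, 1: 3, 2: 2, 3: 1}              # placeholder upward-rank ordering
    print(schedule(tasks, pred, w, c, num_proc=3, priority=priority))
```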

3.2 Task Prioritization Phase


The priority of each task in the DAG can be obtained via various techniques. Following the technique in [39], the priority of each task is specified by the upward rank and the downward rank, defined in Eqs. (7) and (9). Furthermore, task priorities can be obtained by combining the upward rank and the downward rank along with the task level [40]; Eq. (10) defines the level of a task in the graph. The average computation and communication costs are used to compute the ranks, and the resulting rank values order the tasks so that the precedence constraints are satisfied.

$$Level\left(T_i\right) = \begin{cases} 0, & \text{if } T_i = T_{entry} \\ \max_{T_j \in pred(T_i)} Level\left(T_j\right) + 1, & \text{otherwise} \end{cases} \quad (10)$$
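As an illustration of this prioritization step, the sketch below computes the average computation cost of Eq. (8), the upward rank of Eqs. (6)-(7), and the level of Eq. (10) for an assumed toy DAG; the graph and its cost values are examples only.

```python
# Sketch of the task-prioritization phase: upward rank (Eqs. 6-8) and level (Eq. 10).

def avg_cost(costs):
    return sum(costs) / len(costs)                    # Eq. (8): mean cost over processors

def upward_rank(task, succ, w, c_avg, memo):
    if task not in memo:
        memo[task] = avg_cost(w[task]) + max(
            (c_avg.get((task, s), 0) + upward_rank(s, succ, w, c_avg, memo)
             for s in succ[task]), default=0)         # Eq. (7); for the exit task, Eq. (6)
    return memo[task]

def level(task, pred, memo):
    if task not in memo:                              # Eq. (10): the entry task has level 0
        memo[task] = 0 if not pred[task] else 1 + max(level(p, pred, memo)
                                                      for p in pred[task])
    return memo[task]

if __name__ == "__main__":
    succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
    pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}
    w = {0: [9, 11, 10], 1: [11, 7, 9], 2: [8, 6, 4], 3: [6, 5, 7]}
    c_avg = {(0, 1): 4, (0, 2): 3, (1, 3): 5, (2, 3): 2}
    memo_u, memo_l = {}, {}
    print({t: upward_rank(t, succ, w, c_avg, memo_u) for t in succ})
    print({t: level(t, pred, memo_l) for t in pred})
```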

3.3 Processor Selection Phase


Various scheduling algorithms have adopted HEFT for processor allocation. Many of them use an insertion-based policy and exploit the idle-time slot between two previously scheduled tasks to insert a task. The interval between the finish time of one task and the start time of the next task executed consecutively on the same processor is an idle period of the processor, and the computation cost of the inserted task must fit within such an idle period without violating the precedence constraints.
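The insertion policy just described can be sketched as follows; representing a processor's schedule as a sorted list of (start, finish) intervals is an assumption made only for this example.

```python
# Sketch of insertion-based processor selection: find the earliest time at which a task of
# length `duration`, ready no earlier than `ready`, fits into the processor's idle slots.

def earliest_insertion_start(busy_slots, ready, duration):
    """busy_slots: list of (start, finish) intervals already scheduled, sorted by start."""
    candidate = ready
    for start, finish in busy_slots:
        if candidate + duration <= start:   # the idle gap before this interval is long enough
            return candidate
        candidate = max(candidate, finish)  # otherwise try after this busy interval
    return candidate                        # append after the last scheduled task

if __name__ == "__main__":
    slots = [(0, 4), (10, 15)]              # hypothetical busy intervals of one processor
    print(earliest_insertion_start(slots, ready=2, duration=5))   # -> 4 (fits in the 4-10 gap)
```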

3.4 Genetic Algorithm


This algorithm was developed in the 1970s by John Holland at the University of Michigan. The genetic approach is inspired by evolution and genetics and simulates the behaviour observed in biological populations. The main idea is to select the best solutions from the search space and use them to produce individuals (solutions) that fulfil the objectives [41]. The Genetic Algorithm is commonly used because of its significant strengths in optimizing highly complex tasks: it can handle continuous and discrete variables, non-linear optimization, and constraint satisfaction. It is a meta-heuristic optimization technique built around three main operations: selection, crossover, and mutation. The strengths of this algorithm are its capability to explore a wide search area, to process complex tasks, and to avoid getting trapped in a local optimum. The genetic algorithm represents each solution as a chromosome, and the algorithm starts by selecting the initial population randomly [42].

3.5 Differential Evolution (DE)


Differential Evolution is a population-based meta-heuristic algorithm with a robust stochastic search method for solving optimization problems, and it has the ability to reach the global optimum [9]. The algorithm was introduced by Storn and Price. Its technique is somewhat similar to the genetic algorithm, but the two differ in the ordering of the steps: the DE algorithm starts with mutation, then crossover, then selection, whereas the genetic algorithm proceeds by selection, crossover, and mutation [10].
Building on the work above, the approach suggested in this paper integrates the Genetic Algorithm and the Differential Evolution method to determine the ideal task schedule in cloud environments by identifying the solutions that reduce the makespan of task execution. The workflow is modelled as a directed acyclic graph G(V, E), where V (vertices) denotes the set of nodes in the graph and E (edges) indicates the precedence relationships between jobs. Each edge is weighted by the cost of communication between the two connected tasks, whereas each node is weighted by its computing time; the communication cost is zero when two jobs are allocated to the same processor. The example DAG has tasks T0, T1, ..., T10, where T0 is the entry task and T10 is the exit task. Figure 1 presents this directed acyclic graph (DAG). The proposed algorithm works as follows. The Genetic Algorithm (GA), with its selection, crossover, and mutation procedures, is first used to generate a collection of solutions. The Differential Evolution (DE) approach then uses these generated solutions as its initial population and applies the DE processes to them to produce new solutions for the GA population. The procedure continues until DE finishes processing and the updated list of all solutions is obtained. The list is ordered from left to right, although the precedence order of newly generated solutions may be violated after the production of children.

Fig. 1. Simple Directed Acyclic Graph



Table 3 shows the computation cost of each task on the machines m0, m1, and m2, and ω̄i denotes the task's average computation cost over the machines. Every task has a different computation cost on every machine, which reflects the heterogeneity of the system.

Table 3. Task Computation Cost on the Machines

Tasks | m0 | m2 | m1 | ω̄i
t0 | 9 | 11 | 10 | 10
t1 | 11 | 7 | 9 | 9
t2 | 8 | 6 | 4 | 6
t3 | 6 | 5 | 7 | 6
t4 | 9 | 17 | 10 | 12
t5 | 7 | 5 | 9 | 7
t6 | 12 | 15 | 9 | 12
t7 | 17 | 12 | 13 | 14
t8 | 8 | 12 | 10 | 10
t9 | 16 | 15 | 14 | 15
t10 | 11 | 10 | 12 | 11

Premature convergence is prevented in the suggested strategy by comparing the produced offspring to their parents: if an offspring's fitness value is better than its parent's, the parent is replaced by the child; otherwise, the child is discarded. The flowchart of the GA-DE algorithm, adapted from [43], is shown in Fig. 2. After the initial population has been established, the fitness values must be evaluated, and every iteration of the algorithm must check the termination condition; until the termination condition is met, new solutions keep being produced.
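The overall loop of Fig. 2 can be sketched as follows; the GA and DE step functions are placeholders for the operators detailed in Sects. 3.9-3.11, and the population size, the dummy fitness, and the placeholder operators in the usage example are assumptions of this sketch rather than details taken from the paper.

```python
import random

# High-level sketch of the GA-DE loop: GA produces candidate schedules, DE refines them,
# and an offspring replaces its parent only when its fitness (makespan, Eq. 11) improves.

def ga_de(initial_population, fitness, ga_step, de_step, max_iterations=1000):
    population = list(initial_population)
    for _ in range(max_iterations):                    # termination condition (Sect. 3.12)
        ga_offspring = ga_step(population)             # selection, crossover, mutation
        de_offspring = de_step(ga_offspring)           # DE mutation, crossover, selection
        population = [child if fitness(child) < fitness(parent) else parent
                      for parent, child in zip(population, de_offspring)]
    return min(population, key=fitness)                # best (minimum-makespan) schedule

if __name__ == "__main__":
    # Toy usage: individuals are task permutations; the fitness is a dummy stand-in.
    tasks = list(range(6))
    pop = [random.sample(tasks, len(tasks)) for _ in range(8)]
    dummy_fitness = lambda ind: sum(i * t for i, t in enumerate(ind))
    identity_step = lambda p: [list(ind) for ind in p]  # placeholder operators
    print(ga_de(pop, dummy_fitness, identity_step, identity_step, max_iterations=10))
```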

3.6 Production of Initial Population


The proposed method begins with the creation of the initial population, which consists of individuals (chromosomes) of fixed size. This phase is necessary to verify that the preceding heuristic priority methods yield a superior result. The priorities discovered using the three approaches, namely level rank, downward rank, and HEFT upward rank (Table 4), seed the initial population for the DAG tasks, and the remaining individuals are generated at random [43]. The first and last positions on the chromosome are linked to the entry and exit nodes, respectively; the remaining jobs are created at random and ordered according to their priorities while staying within the precedence constraints.
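A minimal sketch of this initial-population construction is shown below: a few individuals come from the heuristic priority orders and the rest are random topological orders, so precedence is never violated; the example DAG, the seed orders, and the population size are assumptions of the sketch.

```python
import random

# Sketch of initial-population generation: seed with heuristic orderings (upward rank,
# downward rank, level rank) and fill the rest with random precedence-preserving orders.

def random_topological_order(pred):
    remaining, done, order = set(pred), set(), []
    while remaining:
        ready = [t for t in remaining if all(p in done for p in pred[t])]
        task = random.choice(ready)          # random choice keeps the population diverse
        order.append(task)
        done.add(task)
        remaining.remove(task)
    return order

def initial_population(pred, heuristic_orders, size):
    population = [list(o) for o in heuristic_orders]
    while len(population) < size:
        population.append(random_topological_order(pred))
    return population

if __name__ == "__main__":
    pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}
    seeds = [[0, 1, 2, 3], [0, 2, 1, 3], [0, 1, 2, 3]]   # hypothetical heuristic priority orders
    print(initial_population(pred, seeds, size=6))
```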

3.7 Makespan
Makespan (execution time) is one of the critical objectives bearing on system performance; it refers to the duration of the whole workflow from

Fig. 2. Flowchart for GA-DE Algorithm

Table 4. List of Tasks with their Priorities

Tasks Ranku (ti ) Rankd (ti ) Level Ranku (ti ) + Rankd (ti )
t0 123 0 0 123
t1 81 22 1 103
t2 76 24 1 100
t3 96 27 1 123
t4 62 41 2 103
t5 63 45 2 108
t6 77 46 2 123
t7 39 69 3 108
t8 36 75 3 111
t9 45 78 3 123
t10 11 112 4 123

the beginning to the end, i.e., the time spent executing all tasks of the DAG. In this study the makespan is calculated for each individual: first, tasks are allocated to processors by the HEFT procedure illustrated in Sect. 3.1.

3.8 Fitness Function


This function is responsible for evaluating a solution against the objective's requirements; in this study, the fitness function selects solutions according to the minimum execution time. It determines which individuals are used to generate the next generations, so the value of the fitness function has a significant impact on the proposed algorithm. The finishing time of the exit task, the last among all tasks in the DAG, is the makespan; it is obtained from Eq. (4), and the value of the fitness function is given by Eq. (11).

$$Fitness = Makespan \quad (11)$$

3.9 Selection Operator


The impact of the selection operator shows in the convergence effectiveness of the genetic algorithm. The most widely used selection technique is the roulette wheel, which is probability-based and favours individuals with high fitness, evaluated according to Eq. (12), where Pi denotes the selection probability of each individual. Algorithm 1 presents the roulette-wheel pseudo code.

$$P_i = \frac{Fitness_i}{\sum_{j=1}^{popsize} fitness_j} \quad (12)$$
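Algorithm 1 itself is not reproduced here; the following is a minimal roulette-wheel sketch based on Eq. (12). Because the fitness of Eq. (11) is a makespan to be minimized, the sketch converts each makespan into a selection weight by inversion, which is an assumption of the example rather than a detail stated in the text.

```python
import random

# Sketch of roulette-wheel selection. Each individual's weight is taken as 1/makespan
# (an assumption of this sketch, since fitness = makespan is minimized) and then
# normalized as in Eq. (12), so shorter schedules get a larger slice of the wheel.

def roulette_wheel_select(population, makespans):
    weights = [1.0 / m for m in makespans]
    total = sum(weights)
    probabilities = [w / total for w in weights]       # P_i as in Eq. (12)
    r, cumulative = random.random(), 0.0
    for individual, p in zip(population, probabilities):
        cumulative += p
        if r <= cumulative:
            return individual
    return population[-1]                              # guard against rounding error

if __name__ == "__main__":
    pop = ["schedule_A", "schedule_B", "schedule_C"]
    print(roulette_wheel_select(pop, makespans=[202.4, 199.8, 207.9]))
```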

3.10 Crossover Operator


Crossover and mutation are the main operators of the Genetic algorithm; they assist in renewing the population and guarantee population diversity, which avoids falling into a local optimum. The crossover operator is considered a key factor in exploring the solution space. This phase aims to exchange some of one individual's genes with another's in order to produce two adequate offspring. The crossover operator employs a single point generated at random between two gene positions. If the genes of both parents from the entry node up to the crossover point do not match, the crossover is performed. The two additional offspring are produced at the single crossing point, which in the example of Fig. 3 is equal to five. On the left side, children inherit genes from their parents at the same gene positions; the selected genes of the parent are then eliminated, and the remaining genes are imported into the child from left to right. As a result, the offspring are constructed effectively and their fitness is obtained through the fitness function. The children's fitness values are compared to those of their parents, and the children replace the parents only when their values are better. The procedure of the crossover is illustrated in Fig. 3, and the details of the crossover operator are given in Algorithm 2.

Fig. 3. Crossover Operator
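A minimal sketch of a single-point, precedence-preserving crossover in the spirit of Fig. 3 is given below; Algorithm 2 is not reproduced, and the repair step that imports the remaining genes from left to right follows the textual description. If both parents are valid topological orders of the DAG, a child produced this way is also a valid topological order.

```python
import random

# Sketch of single-point crossover on task orderings: the child keeps the first parent's
# genes up to the cut point and imports the remaining tasks in the order they appear in
# the second parent, so every task occurs exactly once and precedence-valid parents
# yield a precedence-valid child.

def single_point_crossover(parent_a, parent_b):
    point = random.randint(1, len(parent_a) - 1)              # random single crossover point
    head = parent_a[:point]
    tail = [task for task in parent_b if task not in head]    # remaining genes, left to right
    return head + tail

if __name__ == "__main__":
    p1 = [0, 1, 3, 2, 4, 5]
    p2 = [0, 2, 1, 4, 3, 5]
    print(single_point_crossover(p1, p2))   # e.g. [0, 1, 3, 2, 4, 5] or [0, 1, 2, 4, 3, 5]
```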



3.11 Mutation Operator

The main role of the mutation operator is to raise the variety of the population and to facilitate exploration of the search space in order to find the optimal solution [44]. The responsibility of this component is to generate a new chromosome by modifying two genes in such a way that the precedence constraints are not broken. The procedure begins as follows. A gene ti is chosen at random, and the first successor of the chosen task, tj, located between the mutation point and the end of the chromosome, is identified. If there is an mth gene within [i + 1, j - 1] and all the predecessors of tm are located before ti, then ti and tm can be swapped with each other. If these conditions are not met, the mutation operator is restarted from the beginning. Finally, the child's fitness value is assessed, and if the child's fitness exceeds that of the parent, the child succeeds the parent. Figure 4 depicts the mutation operator's detailed method, adapted from [43].
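The swap-based mutation can be sketched as follows; for simplicity the sketch validates the whole ordering after the swap instead of applying the positional test on [i + 1, j - 1] described above, so it is an interpretation of the operator rather than a literal transcription.

```python
import random

# Sketch of the mutation operator: pick two positions at random and swap them only if the
# resulting ordering still respects every precedence constraint; otherwise retry.

def is_valid_order(order, pred):
    position = {task: idx for idx, task in enumerate(order)}
    return all(position[p] < position[t] for t in pred for p in pred[t])

def swap_mutation(individual, pred, max_attempts=20):
    for _ in range(max_attempts):
        i, j = random.sample(range(len(individual)), 2)
        mutated = list(individual)
        mutated[i], mutated[j] = mutated[j], mutated[i]
        if is_valid_order(mutated, pred):           # precedence constraints not broken
            return mutated
    return list(individual)                         # give up after a few attempts

if __name__ == "__main__":
    pred = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [0], 5: [3, 4]}
    print(swap_mutation([0, 1, 2, 4, 3, 5], pred))
```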

3.12 Termination Conditions

The evolutionary algorithms category includes both genetic algorithms and DE algo-
rithms. It is well known that these types of algorithms might run indefinitely, hence
when using them, a preset termination condition must be considered in order to end
the solution generation process. Some predetermined policies have been presented in
this research to ensure that the algorithms terminate after delivering the most appropriate
result. The fitness evaluations, the system’s operating times, and the population diversity
have all been taken into account while terminating the algorithms. The algorithm in our
suggested technique is terminated after reaching 1000 iterations.

Fig. 4. Mutation operator

4 Performance Evaluation
This section gives an overview of the simulation and sets out the main parameters of the simulation environment and the data sets.

4.1 Simulation Setting


The proposed algorithm was evaluated on the WorkflowSim-1.0 toolkit, which extends the CloudSim simulator and was used to simulate workflow-application scheduling in cloud computing and to evaluate the proposed workflow scheduling algorithm. The simulation settings are listed in Table 5, which shows the specifications of the parameters used in the simulation: the numbers of VMs, cloudlets, data centers, and CPUs, the bandwidth, the memory size (RAM), the storage capacity of the host machine, the number of hosts, the bandwidth of the host machine, the VM MIPS, and the amount of real memory. The experiments were conducted on a Windows 10 platform with 4 GB of RAM, a Pentium Core i3 processor, one data center broker, and one data center. The simulation experiments evaluated the proposed GA-DE algorithm against existing algorithms as benchmarks, namely HEFT-Level rank, HEFT upward rank, HEFT downward rank, and GA, according to the evaluation metric makespan. Workflow scheduling was implemented in the simulation environment using the scientific workflows Cybershake_30 and Cybershake_50, and a comparative evaluation among the various scientific workflows (Cybershake, Montage, and Epigenomics) was conducted to determine which one is more appropriate for implementing workflow applications. For each workflow size, we generated random DAGs with 90 tasks and 256 edges (see Table 5).

4.2 Scientific Workflow Data Sets


The DAGs were generated randomly on the CloudSim simulator, considering well-known scientific workflows, namely Cybershake, Montage, and Epigenomics (see

Table 5. The Simulation Parameter Settings

Parameter Values
Datacenter configurations
MIPS 15000–2000 (Heterogeneous)
RAM 4096 – 10240
Storage 150000
Bandwidth 15000
Number of hosts 10
Virtual Machines
MIPS 250–1000 (Heterogeneous)
RAM 256–1024
Storage 500
Bandwidth 250
Number of hosts 40
The GA-DE parameters
No. of the population (genes) 80
Selection operator (srate) 30
Crossover operator (crate) 80
Mutation operator (mrate) 20
Number of generations 1000
Population size Randomly
mutation factor (F) Randomly
crossover rate (CR) Randomly

Fig. 5). They abstract the data flow used in real applications. These scientific workflow applications were chosen for their ability to illustrate a broad range of application areas and for the diversity of resources they require.

• Cybershake: used by the Southern California Earthquake Center (SCEC). It is commonly used to execute a massive number of computations on extremely large data sets.
• Montage: created by NASA and used to produce mosaics of the sky (Astronomy), using images as input.
• Epigenomics: used to map human cells on a large scale and to process DNA methylation data (Biology).

In this study, the workflow data sets CyberShake_30.xml and CyberShake_50.xml are used to evaluate the performance of the proposed algorithm against the comparison algorithms, with 100 iterations, according to the evaluation metric makespan.

Fig. 5. Types of Workflows. Photograph by [40]

5 Experimental Results

The simulation was conducted to evaluate the GA-DE algorithm's performance compared to the three heuristic algorithms (HEFT upward rank, HEFT downward rank, and HEFT-Level rank) and the meta-heuristic Genetic Algorithm (GA), with 30 VMs and 5 hosts.
In Fig. 6 the bar charts plot various numbers of tasks on the X-axis against the corresponding makespan of executing the applications on the Y-axis. The proposed algorithm (GA-DE) was compared against the heuristic benchmark algorithms (HEFT-Upward rank, downward rank, and Level rank) with respect to makespan by generating random DAGs with 10, 50, and 100 cloudlets. The GA-DE algorithm outperforms the other algorithms in terms of makespan, even though the makespan rises steadily with the growing number of tasks.
The results of the experiments applied to the Cybershake workflow to estimate the makespan are shown in Table 6. Figure 7(a) illustrates the experimental results of the comparison among the three heuristic approaches (HEFT upward rank, HEFT downward rank, and HEFT-Level rank), the meta-heuristic Genetic Algorithm (GA), and the proposed hybrid meta-heuristic (GA-DE), conducted on Cybershake_30 cloudlets. It is obvious from the markedly different makespan results that GA-DE performs best, accomplishing the workflow scheduling with the minimum makespan and outperforming the comparison algorithms. It is followed by GA, with a reasonable result that is better than the three heuristic algorithms. It is also noticeable that HEFT downward rank obtains the worst result in this experiment, with the highest makespan, which implies higher energy consumption. Overall, this experimental result indicates that meta-heuristic approaches are more suitable than heuristic algorithms for scheduling workflow applications in a distributed environment.

Fig. 6. Comparison of Makespan Versus the Number of various Tasks in Random Graphs

Table 6. The Result of Total Executed Jobs

Algorithm | Cybershake_30 | Cybershake_50
HEFT upward rank | 202.36 | 230.48
HEFT-Level rank | 202.42 | 319.93
HEFT downward rank | 207.94 | 334.44
GA-DE | 199.78 | 227.89
GA | 200.23 | 229.37

In Fig. 7(b) the bar chart shows the comparison algorithms and their makespan, conducted on Cybershake_50 cloudlets to simulate workflow scheduling in cloud computing. Apparently, the hybrid GA-DE algorithm remains at the forefront, achieving the minimum makespan despite the increasing number of tasks processed, and performing slightly better than GA. Correspondingly, HEFT downward rank still shows the worst performance, with a higher makespan. Furthermore, it can be noticed that HEFT Level rank has a high makespan and only a small difference from HEFT downward rank, which means it loses its stability when scheduling the workflow as the number of tasks increases.

[Bar charts comparing the makespan of HEFT upward rank, HEFT downward rank, HEFT-Level rank, GA-DE, and GA.]

Fig. 7. (a) Comparison with Various Algorithms in Terms of Makespan on Cybershake_30 (b) Comparison with Various Algorithms in Terms of Makespan on Cybershake_50

The results of running the GA-DE algorithm over the various scientific workflows to calculate the makespan are shown in Table 7.

Table 7. The Simulation Result of Scientific Workflows on the GA-DE Algorithm

Proposed algorithm | Cybershake_100 | Epigenomics_100 | Montage_100 (makespan)
GA-DE | 326.64 | 42138.66 | 45.05

Figure 8 shows the second simulation experiment, with three real-world scientific application DAG structures (Montage, Epigenomics, and Cybershake), validating the results of the proposed GA-DE algorithm with respect to makespan on randomly generated DAGs with 100 cloudlets, run 100 times until the values converged.

Fig. 8. Measuring Makespan among Different Scientific Workflow

It is obvious that the proposed algorithm obtains its best result in terms of makespan on the Montage workflow, with a minimum makespan of 45.05, followed by Cybershake at a considerable distance with about 326.64. The worst performance of the proposed algorithm occurs on Epigenomics, with a very large makespan of 42138.66.
Generally, this study demonstrates that the hybrid meta-heuristic derived from merging two meta-heuristic algorithms is more effective than a single heuristic approach for task scheduling in distributed computing. A hybrid meta-heuristic aims to exploit the benefits of the individual meta-heuristics by merging them, strengthening efficiency and overcoming limitations of the algorithms such as getting stuck in a local optimum. Moreover, this approach achieves the minimum makespan, which means the lowest response time; this has a significant impact on guaranteeing the QoS and, indirectly, on the user experience. Thus, end-users can spend less money to obtain better services, which is what this study aims to achieve with the proposed GA-DE algorithm.

6 Conclusion
This study presented the hybrid meta-heuristic algorithm GA-DE and verified its efficiency for scheduling workflows in heterogeneous cloud computing in terms of makespan, which is the main objective of the study for improving scheduling in cloud computing. The proposed hybrid algorithm exploits the features of two meta-heuristic algorithms, namely the Genetic Algorithm (GA) and Differential Evolution (DE), to reduce the makespan, adopting the roulette-wheel technique to facilitate finding the best solution from the fitness values. The simulation was conducted on scientific workflows (Cybershake, Epigenomics, and Montage) using the CloudSim simulator to model the algorithm. The experiments compared the proposed algorithm to three heuristics (HEFT level rank, HEFT-Upward rank, and HEFT-downward rank) and to the GA meta-heuristic approach. The simulation results demonstrate the effectiveness of the proposed algorithm in achieving the best result among the compared approaches by providing the lowest makespan. Besides, the simulation results verify that the best-performing scientific workflow is Montage, whereas the worst result is obtained for Epigenomics. However, this study considers a single objective, namely reducing the makespan. Therefore, in the future we will address multi-objective optimization (MOP) in terms of energy consumption, delay, and resource utilization, and incorporate other meta-heuristics. The study can be further extended to adopt artificial neural networks for predicting the workload, and to consider multi-objective optimization of cost, energy, delay, and other criteria.

Reconfiguration of Protected Unicast
Connections in Elastic Optical Networks

Kobenan Ali Ouattara1(B) , Adepo Joël Christian2 , Anoh Nogbou Georges2 ,


and Babri Michel1
1 Ecole Doctorale Polytechnique de l’Institut National Polytechnique de Yamoussoukro
(INPHB), LARIT, Yamoussoukro, Côte d’Ivoire
[email protected]
2 Université Virtuelle de Côte d’Ivoire (UVCI), Unité de Recherche et d’Expertise Numérique

(UREN), Abidjan, Côte d’Ivoire

Abstract. The problem addressed in this article is to reconfigure a set of unicast connections from an initial routing to a new precomputed routing while limiting the disruption induced for the clients. The goal is to plan the reconfiguration order of the connections without interrupting the flow. This problem is difficult to solve because of the dependencies between the initial routing and the new routing. Existing works propose reconfiguration solutions that use backup path resources to temporarily move some connections, with the final routing determined as the process proceeds. In this work, we reconfigure the initial routing to a new precomputed routing. This causes new dependencies in addition to those existing between the shared backup paths, increasing the difficulty of the process. To solve this problem, we use dependency graphs and backup resources. Simulation results show that our approach is effective.

Keywords: Reconfiguration · Connection protection · Elastic optical networks

1 Introduction
The growth of data-intensive applications in transport networks, such as videoconferencing, streaming services and cloud computing, together with the development of data centers, has led telecom operators to design their networks around optical fiber. The technology of these networks is Wavelength Division Multiplexing (WDM). Today, elastic or flexible optical networks, which offer many advantages, are emerging [1]. In these networks, connections are often subject to disruptions. An approach to effective management and failure tolerance is to establish connections protected by backup paths. Fiber-optic networks, like connection-oriented networks, experience a drop in performance due to events such as traffic changes (adding or deleting connections) or maintenance operations on a link or node. To optimise them, network operators need to reconfigure the routing (i.e., provide a new configuration). Reconfiguration is an important feature for optimising the use of network resources. It is used by operators to plan path changes in the event of network disruptions. It must be transparent from the user's point of view.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


K. Arai (Ed.): FTC 2022, LNNS 561, pp. 44–58, 2023.
https://doi.org/10.1007/978-3-031-18344-7_3

Reconfiguration is a difficult problem that can be reduced to two subproblems [3]: the sub-problem of calculating the new routing and the sub-problem of switching from the current routing to the new routing. This paper addresses the second sub-problem. When the reconfiguration is not done well, it produces flow interruptions on the connections. To reduce flow interruption during the reconfiguration process, some approaches use backup path resources [1, 2]. Backup paths are used to free up resources needed by some connections in the reconfiguration process. When connections use resources needed for reconfiguring another connection, backup paths are used to temporarily switch those connections and free their resources for establishing the connections involved in the reconfiguration. The reconfiguration can be done in a single step using the backup paths; however, this consumes considerable resources [2]. Additional resources may be needed to move connections onto backup paths because the resources available on these paths are limited and do not allow simultaneous switching of the primary paths of the connections. In some works, researchers calculate the final paths [3–5]. In this paper, we consider that the connections established on an initial routing are protected, and we want to reconfigure them to a new pre-computed routing using the backup path resources. Given the dependencies between the initial and final paths and the dependencies between shared backup paths, solving this problem becomes a challenge. This work proposes a reconfiguration technique that ensures flow continuity and reduces the number of steps and the additional resources (backup paths) used during the reconfiguration process. This paper is structured as follows. In Sect. 2, we briefly present the general concepts of elastic optical networks. In Sect. 3, related work is presented. Section 4 gives the problem statement and Sect. 5 describes a basic reconfiguration algorithm using backup paths. Section 6 presents our approach. Section 7 shows the results of the evaluations.

2 General Concepts of Elastic Optical Networks

Elastic optical networks are an extension of traditional Wavelength Division Multiplex-


ing (WDM) optical networks. In this section, we will discuss the basic principles and
protection techniques in optical networks.

2.1 Basic Concepts and Architecture

Elastic optical networks were developed to overcome the limitations of Wavelength Division Multiplexing (WDM) optical networks. WDM optical networks allow multiple users to use the full bandwidth of the optical channel simultaneously at different wavelengths. In WDM, the frequency grid of the channel is subdivided into sub-bands of fixed width. The fixed allocation of frequency grids is one of the major problems of these networks. In this case, a less demanding application will be allocated a large bandwidth, which wastes the available resources. In addition, a very demanding application will be allocated insufficient bandwidth, which is inappropriate. This inefficient management of available resources has motivated elastic or flexible optical networks. This solution is suitable for future technologies. Elastic optical networks allow for more efficient resource management. The allocation of available resources depends on the modulation
format, the path length, and the bandwidth demand. The frequency sub-bands of elastic
optical networks are flexible, unlike traditional optical networks. Figure 1 below shows
the spectral difference between the two technologies (fixed-width WDM and flexible-
width optical networks). In Fig. 1(a), the frequency bands are fixed, regardless of the
bandwidth demand of the application. This solution is not suitable for efficient resource
management. Figure 1(b) shows the flexible management of sub-bands according to
the connection demand. The frequency grid consists of small fine granularities which
are allocation units called frequency slots. The size of a slot can vary from 25 GHz to
6.25 GHz depending on the policy in place, the modulation format, and other parameters
[6, 7]. Connection requests must meet the continuity and contiguity constraints of the
frequency slots.

Fig. 1. Difference between WDM and elastic optical networks

Elastic optical networks use the following basic elements: bandwidth-variable transponders (BVT), bandwidth-variable optical cross-connects (BV_OXC), and optical amplifiers (OA). Figure 2 below shows the basic architecture of elastic optical networks. Each component has a specific function in this architecture. The BVTs allow the right modulation format and the right number of frequency slots to be found, depending on the demand. The BV_OXC cross-connects on each node of the path between the source and destination nodes help to establish an end-to-end optical path. Optical amplifiers are placed at least 80 km apart to regenerate the optical signal [7].

Fig. 2. Basic architecture of elastic optical networks



We have briefly presented the basic concepts of elastic optical networks. In the next
section, we present the concept of connection protection in optical networks.

2.2 Protecting Connections in Elastic Optical Networks

Connection protection is a technique used by the research community to guard against


link or node failures and improve fault tolerance. Elastic optical networks, like traditional WDM networks, are prone to frequent disruptions that degrade network performance. To recover the signal, researchers have proposed connection protection. This
approach involves reserving an alternative backup path for an established connection in
the network. Connection protection in elastic optical networks has been the subject of
several approaches in the literature. This work was inspired by WDM networks. Several
protection techniques exist in the literature: link protection and path protection. Another
variant is segment protection. The protection can be dedicated or shared depending on
the techniques chosen. In link protection, each link in the path is protected by other links
in the network, while in path protection, the path is globally protected. The backup path
and the primary path are disjoint. The studies in [8] proposed an approach to shared path
protection in elastic optical networks. This approach protects a set of disjoint links with
one or more common links. Another variant of this approach is dedicated protection.
Some authors in particular [9] propose a hybrid approach to both approaches. Shared
path protection is widely used in the literature because of its efficiency in terms of
backup capacity use [10]. In our work, we will use the shared path protection approach
(shared protection). In the next section, we present related work on reconfiguration using
protected paths in optical networks.

3 Related Work
Reconfiguration is used to optimise resource management or when a maintenance oper-
ation is planned in the network [11]. This technique is important for connection-oriented
networks to meet certain optimization requirements. The reconfiguration is part of the
planning of network resources. It consists of changing the routes of established connections in response to disruptive phenomena (for example, maintenance operations or failures of links or nodes). During the reconfiguration process, when the resources of
the final path of a connection are used by another connection, it is necessary to tem-
porarily interrupt this connection to switch the connection to its final path. This can have
consequences if this interrupted connection is carrying very important data. To resolve
this problem, some works use the backup paths to temporarily switch over certain con-
nections using the resources necessary to reconfigure other connections. Backup paths
are therefore used to avoid interrupting connections during the reconfiguration process.
In this section, we discuss the work done on routing reconfiguration. The authors [2]
proposed a trade-off between improving network performance and packet loss during
the reconfiguration process. The trade-off to solve these two conflicting problems is to
use backup paths during the reconfiguration process. In this context, the resources of
the backup paths are reserved for restoring the flow in case of a connection failure. This
approach is resource intensive as it uses a dedicated protection where each primary path
has its backup path. The authors of [12] have done work on connection reconfiguration
in Multiprotocol Label Switching (MPLS) networks. These connections are protected
by backup paths to free up resources for establishing new connections. The authors [13]
propose an approach to reconfiguration using additional resources (i.e. backup paths)
in traditional MPLS and WDM optical networks. Reconfiguration often requires the
interruption of some connections. The objective is to propose a reconfiguration app-
roach without interrupting connections. The authors [3] propose a reconfiguration using
shared backup path resources (shared protection) for Wide Area Network (WAN) mesh
networks. This technique aims to minimise the loss of network performance due to traf-
fic losses when switching from the current routing to the new routing. The author [1]
presents a reconfiguration study in a star topology. The objective is to reduce connection
interruptions resulting in performance loss. To solve this problem, the authors proposed
to use the backup paths which are the reserved resources. The algorithm is based on five
procedures for configuring and destroying unused or faulty optical paths. In the above,
the reconfiguration technique is done without interrupting the connections. Migration is
done connection by connection using the backup paths. As this method is relatively long,
another approach is group reconfiguration. This involves migrating a set of connections
simultaneously according to a defined criterion, which reduces the number of steps in
the process and therefore the duration of the reconfiguration process. This method is pro-
posed by [10] in WDM optical networks. The basic idea is to group primary connections
with disjoint links that do not share backup paths. These connections can be migrated
simultaneously to their backup paths and then from their backup paths to their final paths.
This solution assumes that the resources of final paths are always available. Previous
work has focused on the reconfiguration of routing in WDM optical networks using
backup path resources. In the problem addressed in [10, 13], the initial routing and the
shared backup paths of the set of connections are known. The final routing is unknown
at the beginning and determined during the reconfiguration process. But network oper-
ators, to optimize their network, can determine the final routing before the process of
migration [14–16]. A recent study in [17] concerned work on the reconfiguration of the
routing with the backup paths. This approach does not produce any disruption however
these works are done for multicast connections. In this case, the problem data becomes
the initial routing, the final routing, and the backup paths of the network topology. Thus,
new dependencies arise in this problem such as the dependencies between the paths of
the initial routing and the paths of the final routing. This new type of dependency is in
addition to the existing ones between shared backup paths. Figure 3 and Fig. 4 show
these dependencies. To solve this problem, we will model the dependencies on the one
hand and propose a reconfiguration algorithm on the other hand.

4 Problem Modelling
In this section, we state the problem of routing reconfiguration in elastic optical networks. The reconfiguration problem can be reduced to two subproblems: (1) finding a new routing corresponding to the current routing and (2) switching from the current routing to the new routing [18]. In this work we address the second subproblem. We assume the initial routing and the final routing are known. The objective is to determine the configuration sequences from the current topology to the new topology without interruption.
Consider a physical network modelled by an undirected graph G = (N, L), where N is the set of physical nodes and L the set of physical links (optical fibers) between nodes. Let R0 = (Ci, Si) and Rt = (Cf), where R0 and Rt denote the initial routing and the final routing, respectively. Ci = (p1, p2, ..., pn) and Si = (b1, b2, ..., bn), where bi is the backup path of pi, i ∈ {1, ..., n}. A unicast connection of the virtual topology is characterised by a source node, a destination node and the number of slots necessary for signal transmission; it is called a lightpath. We note that (a minimal illustration of the slot constraints is given after the list):

• Each link has an optical fiber capacity K divided into frequency slots of size T;
• A connection established on a lightpath must satisfy the continuity and contiguity constraints of the frequency slots;
• An interruption during the reconfiguration process is characterized by the absence of the resources (frequency slots) necessary to establish the connection between the source node and the destination node.
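As an illustration of the continuity and contiguity constraints listed above, the following minimal Python sketch checks whether the same block of contiguous frequency slots is free on every link of a lightpath; the data layout (a per-link boolean slot map) is an assumption made for the example, not part of the formal model.

```python
def slots_available(path_links, free_slots, demand):
    """Continuity/contiguity check (sketch): the same block of `demand`
    contiguous slots must be free on every link of the lightpath.
    `free_slots[link]` is a list of booleans, one per slot index."""
    n_slots = len(next(iter(free_slots.values())))
    for start in range(n_slots - demand + 1):
        block = range(start, start + demand)
        if all(all(free_slots[link][s] for s in block) for link in path_links):
            return start              # index of the first feasible slot block
    return None                       # no common contiguous block is free

# Example: a two-link path, 8 slots per link, demand of 3 contiguous slots
free = {("A", "B"): [True] * 8, ("B", "C"): [False, False] + [True] * 6}
print(slots_available([("A", "B"), ("B", "C")], free, 3))   # -> 2
```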

We denote by Int(s, d)_i the interruption of a connection i between source s and destination d during the reconfiguration process:

    Int(s, d)_i = 1 if there is no resource for connection i, and 0 otherwise.    (1)

The total number of interruptions is defined by Eq. (2) below:

    Int_tot = Σ_{i=1}^{n} Int(s, d)_i    (2)

where n denotes the number of reconfigured connections.


The problem is to determine a series of connection sets that can be reconfigured
without interrupting the flow, while reducing the duration (number of connection sets),
and the cost of the resources used (backup and primary path resources). This problem
is difficult to solve because dependencies can exist between the resources used by the current paths and those used by the final paths.
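To make the notation above concrete, the following Python sketch shows one possible in-memory representation of a protected connection and of the interruption count of Eq. (2); the field names and the helper are illustrative assumptions rather than part of the paper's formal specification.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    """A protected unicast connection with its initial, backup and
    precomputed final lightpaths, each stored as a list of links."""
    cid: int
    primary: list    # p_i : links of the initial path
    backup: list     # b_i : links of the shared backup path
    final: list      # f_i : links of the precomputed final path
    slots: int = 8   # requested frequency slots (illustrative default)

def total_interruptions(interrupted):
    """Eq. (2): sum of the indicators Int(s, d)_i over the n connections."""
    return sum(1 for flag in interrupted if flag)
```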

5 Basic Reconfiguration Algorithm using Backup Paths


A basic approach to the reconfiguration problem is Make-Before-Break (MBB). This approach has been developed in the literature, especially in WDM optical networks, and is presented by the author of [15]. The basic idea is to make reconfiguration easy and interruption-free. A connection is reconfigured by first reserving bandwidth on its new path, then migrating the traffic from the old path to the new path. The MBB technique assumes knowledge of the initial path and the final path of the connection. The new route does not necessarily need to be disjoint from the old one [19]. This approach does not always allow reconfiguration without interruption, especially when the final resources are not available. A variant is to use the backup paths in the MBB technique to temporarily free up resources and facilitate reconfiguration. Reconfiguration is done connection by connection; this approach guarantees flow continuity, but the process becomes long when the number of connections to be reconfigured grows.
The MBB approach can be summarized through the following steps:

• Pre-establish the new path between the source and the destination. All nodes of the
new path are configured in parallel except the source;
• Configure the source to interrupt the flow on the old path to feed the new path;
• Delete the old path, all nodes of the old path are configured in parallel.

Another variant is the use of backup paths in the MBB technique. In this case, the backup paths are used during reconfiguration to avoid temporarily interrupting some connections during the process. The steps can be summarized as follows (a minimal sketch of this sequence is given after the list):

• Pre-establish the backup path of the connection which uses the resources of the final
path of the connection to be reconfigured. All nodes are configured in parallel except
the source;
• Configure the source to interrupt the flow on the old path of said connection and feed
the backup path;
• Delete the old path (the final path which uses the resources of the connection to be
reconfigured);
• Pre-establish the final path of the connection to be reconfigured by configuring all the
nodes in parallel except the source;
• Configure the source to interrupt the flow on the old path of the connection to be
reconfigured and feed the new path (final path of the connection);
• Delete the old connection path by configuring all nodes in parallel;
• Pre-establish the final path of the connection that was switched to its backup path;
• Establish its new path (its final path);
• Delete its backup path.
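The following minimal Python sketch paraphrases the MBB-with-backup-paths sequence listed above, assuming hypothetical controller primitives pre_establish, switch_source and delete_path; these names are illustrative and do not correspond to an actual controller API.

```python
def mbb_with_backup(net, conn, blocking_conn):
    """Make-Before-Break using a backup path (sketch).

    `blocking_conn` is the connection whose current path holds the
    resources needed by the final path of `conn`; `net` is a hypothetical
    controller exposing pre_establish / switch_source / delete_path.
    """
    # 1) park the blocking connection on its backup path
    net.pre_establish(blocking_conn.backup)        # all nodes but the source
    net.switch_source(blocking_conn, blocking_conn.backup)
    net.delete_path(blocking_conn.primary)         # frees the contested resources

    # 2) move the target connection onto its final path
    net.pre_establish(conn.final)
    net.switch_source(conn, conn.final)
    net.delete_path(conn.primary)

    # 3) bring the parked connection to its own final path and
    #    release its backup path
    net.pre_establish(blocking_conn.final)
    net.switch_source(blocking_conn, blocking_conn.final)
    net.delete_path(blocking_conn.backup)
```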

This approach is not effective in meeting the reconfiguration goals because it is slow. We propose a faster solution, without interrupting the flow, by using dependency graphs. We will compare our approach to the two variants of MBB.

6 Proposition of Reconfiguration Algorithm with Shared Backup


Paths

We present here our proposition.

6.1 Dependencies Modelling

First Type of Dependence: Dependence between Initial Paths and Final Paths
The dependencies between the initial paths and the final paths are modelled by a directed
graph called the dependency graph Gd. In this graph, the nodes represent the optical paths
of the main topology, and the dependencies are materialized by arcs. A dependency
between the paths of two connections i and j is defined by the arc (i, j) such that the initial path of connection j uses the resources necessary to configure the new path of connection i. The details of the construction of the dependency graph are given in [15]. Gd = (N′, L′) is the graph representing the dependencies between the initial routing paths and the final routing paths, where N′ is the set of nodes representing the connections of the network topology and L′ is the set of arcs between two nodes. The figures below illustrate
the principles for determining the dependency graph.

Fig. 3. Example of the main virtual topology of 7 connections established in the network (legend: primary paths, backup paths, final paths, network nodes)

Fig. 4. Dependency Graph between Initial and Final Paths



Second Type of Dependence: Dependence Between Shared Backup Paths


The dependencies between the shared backup paths are modelled by an undirected graph called the auxiliary graph Ga. In this graph, each node represents a backup path. Two nodes
of the graph are adjacent when the corresponding backup paths are shared (at least one
link in common). The construction of this graph is detailed in [10]. Figure 5 below
illustrates this dependency for the previous network example in Fig. 3.

Fig. 5. Auxiliary Graph with Coloured Nodes

Figure 5 above shows the auxiliary graph corresponding to the network topology in
Fig. 3.
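Both dependency structures described above can be derived from the link sets of the paths. The following Python sketch, which assumes connection objects exposing cid, primary, backup and final link lists (as in the sketch of Sect. 4), conveys the idea; it is not the exact construction of [10, 15].

```python
def build_dependency_graph(connections):
    """Gd: arc (i, j) when the initial path of connection j uses links that
    the final path of connection i needs (sketch; links must be hashable)."""
    arcs = set()
    for ci in connections:
        for cj in connections:
            if ci.cid != cj.cid and set(ci.final) & set(cj.primary):
                arcs.add((ci.cid, cj.cid))
    return arcs

def build_auxiliary_graph(connections):
    """Ga: undirected edge between two connections whose backup paths share
    at least one link (sketch)."""
    edges = set()
    for ci in connections:
        for cj in connections:
            if ci.cid < cj.cid and set(ci.backup) & set(cj.backup):
                edges.add((ci.cid, cj.cid))
    return edges
```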

6.2 Reconfiguration Algorithm

The network reconfiguration occurs at constant time intervals called observation periods [20]. The algorithm below illustrates this new approach, which reconfigures the initial routing to the precomputed final routing based on the shared backup paths.
Algorithm RABP_EON: reconfiguration with shared backup paths.

Our algorithm takes as input a set of connections whose main paths, backup paths
and final paths are known (line 1). We construct the auxiliary graph Ga using the backup
paths as described in Sect. 6.1.2 (line 2). The algorithm for colouring the nodes of the Ga
graph is then run on graph Ga to build the connection groups [2] (line 3). The connections
belonging to the same set of this colouring have their separate backup paths. Then they
can be switched as needed in parallel on these backup paths (line 4). The reconfiguration
order of the connections is determined by the dependency graph between the initial
path and the final path defined in Sect. 6.1.1. In this context, these dependencies cause
a temporary interruption of some connections during the reconfiguration process (line
6 – line 9). Leaf or isolated nodes in the dependency graph mean that their resources
are available for reconfiguration by migrating the corresponding connections directly to
their final paths. We determine in the dependency graph Gd the set of all leaf or isolated
nodes. Then, we reconfigure them in parallel as described in the algorithm and we update
the both graph (auxiliary and dependency graph) at each iteration (line 10 – line 13). If
the reconfiguration is not completed, this means that there are cycles in the dependency
graph. In this condition the auxiliary graph Ga is used to migrate simultaneously the
connections of the same group (nodes with the same colour in the graph Ga). If the
nodes are of the same colour, we simultaneously migrate the corresponding connections
to their backup paths to free up resources. The remaining connections are migrated
directly to their final paths and then the connections switched to the backup paths are
migrated to their final paths since their respective resources are now available (line 15 –
line 17). In the case where the connections are not of the same group (the Ga nodes
do not have the same colour), we migrate the connections forming the cycles to their
backup paths to free up resources, then the remaining connections are migrated to their final paths, and finally the connections that were switched to the backup paths are migrated to their final paths
since their resources are now available. We repeat this operation until the dependency
graph is empty. The algorithm stops when all connections have been reconfigured.
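The following Python-style sketch paraphrases the procedure described above (leaf/isolated-node migration and cycle breaking through backup paths); it reuses the build_dependency_graph helper sketched earlier and a hypothetical migrate(connection, target) primitive, and it is a simplified illustration rather than the authors' exact listing.

```python
import networkx as nx

def rabp_eon_sketch(connections, migrate):
    """Simplified view of the procedure described above (not the authors'
    listing). `migrate(conn, target)` is assumed to switch a connection
    onto its 'backup' or 'final' path without interrupting the flow."""
    by_id = {c.cid: c for c in connections}
    gd = nx.DiGraph(build_dependency_graph(connections))   # see Sect. 6.1
    gd.add_nodes_from(by_id)

    while gd.number_of_nodes() > 0:
        # Leaf or isolated nodes (no outgoing dependency): their final-path
        # resources are free, so these connections migrate in parallel.
        ready = [n for n in gd.nodes if gd.out_degree(n) == 0]
        if ready:
            for n in ready:
                migrate(by_id[n], "final")
            gd.remove_nodes_from(ready)
            continue
        # Only cycles remain: park one cycle's connections on their backup
        # paths (grouped by auxiliary-graph colour in the full algorithm).
        # Freeing their initial paths removes the arcs they caused, which
        # unblocks other nodes; the parked connections reach their final
        # paths once they become leaves themselves.
        cycle = next(nx.simple_cycles(gd))
        for n in cycle:
            migrate(by_id[n], "backup")
            gd.remove_edges_from(list(gd.in_edges(n)))
```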

7 Simulation and Analysis of Results

In this section, we evaluate the performance of the proposed RABP_EON algorithm through simulation. We also compare RABP_EON with the MBB_BP and MBB_WBP algorithms and with the contribution of Y. Xin et al. [10, 16]. We present the analysis of the results on the US nationwide backbone topology and the USB topology, which are commonly used in the literature in this field and include 24 nodes and 28 nodes, respectively. Figure 6 and Fig. 7 below show these topologies. The lightpaths are generated using the Dijkstra algorithm for a given pair of source and destination nodes. Each link contains 320 frequency slots of 12.5 GHz. We assume the same number of frequency slots for each connection, i.e. 8 slots per connection plus one guardband slot. First, the primary paths are generated and the links of these paths are removed from the network topology before running the Dijkstra algorithm again for the same source and destination nodes to obtain the backup paths. Each final path is calculated by considering all the links of the other primary paths and by removing the links of the backup paths and those of the corresponding primary path. We perform our simulations for 300 connections, considering the topologies of Fig. 6 and Fig. 7 and the constraints of continuity and slot contiguity. The results are
illustrated in Fig. 8, Fig. 9 and Fig. 10 below. We compare our approach in terms of
the number of process steps [9], the number of backup path resources, and the number
of interrupted connections. A reconfiguration step or sequence consists of simultane-
ously pre-establishing new paths, simultaneously establishing new paths (backup or final
paths) and deleting the old paths.
The duration of the reconfiguration process is the maximum number of steps during
the process to reconfigure all connections. The number of backup paths used in the
process is defined by the number of nodes creating a cycle in the dependency graph that
are temporarily switched before being reconfigured.
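The path-generation procedure described above can be reproduced with a standard shortest-path routine. The sketch below uses NetworkX's Dijkstra implementation on a copy of the topology; it is our own illustration of the described setup under these assumptions, not the simulator code used for the experiments.

```python
import networkx as nx

def generate_paths(topology, src, dst):
    """Primary path via Dijkstra, then a link-disjoint backup path computed
    on the topology with the primary links removed (sketch)."""
    primary = nx.dijkstra_path(topology, src, dst)
    pruned = topology.copy()
    pruned.remove_edges_from(zip(primary, primary[1:]))   # drop primary links
    backup = nx.dijkstra_path(pruned, src, dst)
    return primary, backup

# Example on a small ring so that a disjoint backup path exists
g = nx.cycle_graph(6)
print(generate_paths(g, 0, 3))   # e.g. ([0, 1, 2, 3], [0, 5, 4, 3])
```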

Fig. 6. U.S. Nationwide Backbone Network Topology

Fig. 7. 28-Node USB Topology

We run up to a maximum of 150 connections according to the continuity and frequency slot contiguity constraints, and the results are shown in Fig. 8, Fig. 9 and Fig. 10 below.
The duration of the process is evaluated in terms of the number of steps. The results in Fig. 8 show the performance of our approach in comparison with the other approaches, Make-Before-Break with backup paths (MBB_BP) and Make-Before-Break without backup paths (MBB_WBP), on the two topologies mentioned. Our approach generates fewer steps in the reconfiguration process, because a set of connections is reconfigured simultaneously in parallel, unlike the MBB approaches, which run on a per-connection basis. The Make-Before-Break algorithm with backup paths (MBB_BP) requires more steps than the MBB_WBP approach without backup paths. In addition, Xin's approach systematically uses the resources of the backup paths, which requires additional steps since the final paths are calculated as and when needed. Indeed, the use of backup paths in MBB_BP implies

Fig. 8. Duration of the Reconfiguration Process

Fig. 9. Number of Backup Paths used

Fig. 10. Number of Connections Interrupted

additional steps compared to MBB_WBP, which does not use backup paths. Connections in MBB_WBP are directly migrated onto their final paths without the additional steps caused by passing connections through their backup paths as in MBB_BP. This is observed in both topologies of our tests, where the results are substantially identical with a slight difference. Regarding interruptions, we can observe in Fig. 10 that MBB_WBP causes
interruptions since it does not use backup paths to temporarily switch connections when needed. The strength of the approaches using backup paths is that they guarantee uninterrupted connections, which is why in Fig. 10 only MBB_WBP shows interruptions. In Fig. 9, our approach uses fewer backup path resources. This is because Xin's algorithm uses the backup paths at each step to free up resources on the primary paths before computing the final path. In the MBB_BP approach, connections are randomly selected to be reconfigured on their final paths; when the resources on the final path of a connection are used by another connection, the connection is switched to its backup path before being reconfigured, so the use of backup paths is not systematic. This random selection of connections causes MBB_BP to use backup paths more frequently than our approach. The same is true for the second topology, which has practically the same characteristics.
The particularity of this work is that we use fewer additional resources, which are limited under these working conditions, than existing approaches. In addition, the operation is non-disruptive for the end user, in the sense that both signal interruptions and the reduction of the number of steps are taken into account in the reconfiguration process.

8 Conclusion
In this paper, we proposed a new algorithm to reconfigure routing with backup paths in elastic optical networks (RABP_EON). In this work, we considered that the initial routing and the final routing are known. The solution of the reconfiguration problem considered the dependencies between the initial routing and the final routing to avoid flow interruptions. Simulation results show that the proposed algorithm performs the reconfiguration with a small number of steps (process time) and without flow interruption. In this work, we assumed that the connections have the same number of slots and considered the continuity and contiguity constraints. We focused on the process of switching from the current routing to the new routing, because the problem is difficult to solve due to the new dependencies in addition to those between the shared backup paths. An interesting future research topic is to consider energy consumption in the reconfiguration process. We will try to address this crucial problem in elastic optical networks.

References
1. Ishida, S., Arakawa, S., Murata, M.: Reconfiguration of logical topologies with minimum
traffic disruptions in reliable WDM-based mesh networks. Photonic Netw. Commun. 6(3),
265–277 (2003)
2. Józsa, B.G., Makai, M.: On the solution of reroute sequence planning problem in MPLS
networks. Comput. Netw. 42(2), 199–210 (2003)
3. Takagi, H., Zhang, Y., Hua Jia, X., Takagi, H.: Reconfiguration heuristics for logical topolo-
gies in wide-area WDM networks. IEEE Global Telecommun. Conf. 2002, Globecom 2(3),
2701–2705 (2002)
4. Anoh, N., Babri, M., Kadjo, T.L.: Efficient energy routing with connections rerouting in
elastic optical networks under static connections. Int. J. Computer Science Issues 11 6(2),
89–98 (2014)
5. Marković, G.Z.: Routing and spectrum allocation in elastic optical networks using bee colony
optimization. Photon Netw. Commun. 34(3), 356–374 (2017)

6. Anoh N.G., Babri, M., Kora, A.D., Faye, R.M., Aka, B., Lishou, C.: An efficient hybrid
protection scheme with shared/dedicated backup paths on elastic optical networks. Digit.
Commun. Netw. 3(1), 11–18 (2017)
7. Chatterjee, B.C., Sarma, N., Oki, E.: Routing and Spectrum allocation in elastic optical
networks: a tutorial. IEEE Commun. Surv. Tutor. 17, 1776–1800 (2015)
8. Shen, G., Wei, Y., Bose, S.K.: Optimal design for shared backup path protected elastic optical
networks under single-link failure. J. Opt. Commun. Netw. 6(7), 649 (2014)
9. Anoh, N.G., Adépo, J.C., Babri, M., Aka, B.: Hybrid protection scheme for elastic opti-
cal networks with regenerator and efficient energy consideration. Int. J. Computer Science
Telecommun. 6(10), 7 (2015)
10. Xin, Y., Shayman, M., La, R.J., Marcus, S.I.: Reconfiguration of survivable IP over WDM
networks. Opt. Switch. Netw. 21, 93–100 (2016)
11. Atta, A.F., Adepo, J.C., Cousin, B., Oumtanaga, S.: Minimize Flow Interruptions during
reconfiguration of a set of light-trees in all-optical WDM network. Int. J. Computer Science
Network Security 20, 7 (2020)
12. Orincsay, D., Szviatovszki, B., Böhm, G.: Prompt partial path optimization in MPLS networks.
Comput. Netw. 43(5), 557–572 (2003)
13. Xin, Y., Shayman, M., La, R.J., Marcus, S.I.: OPNp1–2: Reconfiguration of survivable
MPLS/WDM networks. IEEE GLOBECOM 2006, pp. 15 (2006)
14. Cohen, N., Coudert, D., Mazauric, D., Nepomuceno, N., Nisse, N.: Tradeoffs when optimizing
lightpaths reconfiguration in WDM networks. INRIA. Research Report RR-7047 (2009)
15. Cohen, N., Coudert, D., Mazauric, D., Nepomuceno, N., Nisse, N. : Tradeoffs in routing
reconfiguration problems. 12èmes Rencontres Francophones sur les Aspects Algorithmiques
de Télécommunications (AlgoTel) (2010)
16. Adépo, J.C. : Reconfiguration du routage multicast dans les réseaux optiques WDM. Ph.D.
Thesis, Université Nangui Abrogoua, September (2016)
17. Christian, N.N., Christian, A.J., Michel, B.: Protected light-tree reconfiguration without flow interruption in elastic optical networks. Int. J. Computer Networks Appl. 8(3), 140–150 (2021)
18. Coudert, D., Huc, F., Mazauric, D., Nisse, N., Sereni, J.: Reconfiguration of the routing
in WDM networks with two classes of services. 2009 International Conference on Optical
Network Design and Modeling, pp. 16 (2009)
19. Coudert, D.: Algorithmique et optimisation dans les réseaux de télécommunications. Inria
Sophia Antipolis, 83 (2010)
20. Melidis, P., Nicopolitidis, P., Papadimitriou, G.: Reserved energy-aware virtual topology
management for IP-over-WDM optical networks. Opt. Switch. Netw. 31, 72–85 (2019)
Users Engagement Factors with e-Court
Application Conceptual Framework

Adham M. M. Alankar(B) , Nurzi Juana Binti Mohd Zaizi,


and Hanifah Binti Abdul Hamid

Universiti Sains Islam Malaysia, Sembilan, Malaysia


[email protected], {njuana,hanifah}@usim.edu.my

Abstract. This study adds to our knowledge of the elements that influence user involvement with e-court applications. The goal of this research is to find out which factors influence end-user preparedness for an e-court, to determine the structural relationship between preparedness characteristics and e-court interoperability, and to create a readiness model based on e-court end-user readiness parameters for self-assessment of organizational adoption. Based on a literature review, a qualitative approach was employed to determine the most critical variables of users' interaction with the e-court application. The results revealed four factor categories: human behavior, technological advancements, organizational structure, and legal issues. A case study technique could be used to provide a more complete explanation of the phenomenon, which would strengthen the study's conclusions; a case study in a Malaysian e-Court, for example, is proposed to explore the collected features and variables. This research can be applied to the private sector as well as other industries such as healthcare and banking.

Keywords: Malaysia · e-Court · e-Readiness · Interoperability ·


Communications technology

1 Introduction
Access to justice has become a major challenge in many court systems across the world.
Technology is increasingly being seen as a potential facilitator of access to justice,
particularly in terms of improving the justice sector’s efficiency [1]. The key functions
covered in court works are registration, indexing, and case follow-up. Case management
is a significant success factor in the legal system. Citizens’ social behavior reflects their
trust in the ability of justice institutions and courts to offer fast and effective services,
and a lack of trust reduces the function of these institutions in maintaining the rule of
law. This emphasizes the need for conducting surveys and studies to track efforts and
changes in the justice system, identify flaws, and put in place the appropriate measures
and remedies [2].
One of the key areas of interest is the automation of judicial operations; several
issues have arisen in the pursuit of justice, including delays caused by misplaced case
files at the register when a reference should be made. The process of automating the

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


K. Arai (Ed.): FTC 2022, LNNS 561, pp. 59–68, 2023.
https://doi.org/10.1007/978-3-031-18344-7_4

functioning of the judicial system has spread nearly all over the world in the wake of the
ICT revolution, as legal practice has improved in terms of technology, thanks to effective
case file management, ease of access and retrieval of information, organizational inte-
gration, and speed of justice [3]. As a result, the courts are increasingly under pressure
to keep up with technological advancements in order to provide quality service. Further-
more, public trust and confidence in judicial institutions require a focus on government
transparency [4]. Furthermore, the government should consider the confidence and plea-
sure of government customers, both residents and enterprises, through openness, cost
reduction, and easy access to government services.
Additionally, when creating interoperable information systems, e-Government agen-
cies should focus on factors that affect performance, such as availability, dependability,
standardization, flexibility, reaction time, and integration [5]. Enterprise interoperabil-
ity is a precondition for assuring collaboration in this scenario [6]. Interoperability is
defined as the ability of multiple types of information and communications technology
(ICT) systems to communicate and share data and information in a meaningful and
usable manner, according to Lallana [7]. When this skill isn’t honed, it becomes an issue
that must be addressed. When interoperability hurdles exist, interoperability concerns
develop. There are three types of barriers: conceptual, technological, and organizational. The primary conceptual hurdles are the syntactic and semantic incompatibility of the information to be exchanged. These problems arise when modeling at a high level of abstraction or
when modeling at the information level. Content, syntactic, and semantic barriers are the
three most common types of conceptual barriers. The technical obstacles deal with how
people communicate and share information using computers or ICT (Information and
Communication Technology). Organizational obstacles refer to incompatibility between
two businesses’ organizational structures and management techniques [8]. Indeed, a lack
of interoperability can have a substantial influence on business and network performance
and outcomes. Businesses should be aware of their strengths and weaknesses in terms
of interoperability in order to build such capabilities amongst systems. [9]. As a result,
these obstacles should be removed in order to avoid interoperability issues.
The purpose of this study is to determine the elements that influence end-user pre-
paredness on an e-court, the structural relationship between preparedness characteristics
and e-court interoperability, and to develop a readiness model based on e-court end-user
readiness parameters for self-assessment of organizational IT adoption.
The following research questions are being addressed in this study:

1. What are the most important elements influencing end-user preparedness for e-court
interoperability?
2. How to characterize the structural linkages between preparedness elements and e-
court interoperability?
3. How can a readiness model for self-assessment evaluation in the organization be
developed based on the readiness criteria that influence users on the interoperability
of e-courts?

2 Literature Review
People's willingness to adopt and use new technologies to achieve goals at home and at work is referred to as technology readiness.

2.1 Electronic Court Definition


An electronic court is a system that integrates informatics and technology, containing
administrative and judicial units that function through a cadre that includes judges and
their associates, considers judicial advocacy, and codifies its operations using current
technical mechanisms [10].

2.2 Electronic Court Records Management Systems (ECRMS)

Court records are extremely important in the judicial system. Legal researchers, practi-
tioners, and policymakers mostly use them to make decisions. As a result, records man-
agement has grown in popularity, as a well-organized, efficient, and structured records
management system is crucial in ensuring that courts make fair decisions based on reli-
able data. The computerization of court records raises a number of challenges, some of
which may be specific to each country’s legal system. The generation, management, and
preservation of digital records has an impact on policies, standards, copyright, metadata,
and other technical considerations. Given the nature and importance of the court, due
process, impartiality, and independence should be carefully evaluated, even as the use of
technology reduces delays, improves economy, efficiency, and effectiveness, and encour-
ages confidence in the justice system. This is especially true when there are structural and
procedural changes, such as those brought on by new technology [11]. Countries’ expe-
riences in the International Records Management Trust (IRMT) research demonstrated
that a system needs its own robust legal framework to function with authority, trustwor-
thiness, and reliability. The legal and judicial record case studies have identified several
significant issues (IRMT) [12]: (1) the need to raise the status and priority of recordkeep-
ing; (2) the need to allocate greater resources to supporting recordkeeping infrastructure,
such as storage facilities and equipment; (3) the need to adopt records management policies and standards; (4) the understanding that computerized case management sys-
tems have the potential to improve case flow management and information access; (5)
the importance of developing an information strategy and business case based on the
needs of all key stakeholders before embarking on case administration computerization;
(6) the value of pilot computerization projects in building confidence and capacity; and
(7) the importance of standardized formats and templates for common documentation.

2.3 E-Readiness Opportunities

They include cost savings and efficiency improvements, improved customer service,
openness, anti-corruption, and accountability, and improved decision-making quality.

2.4 E-Readiness Challenges

They include ICT infrastructure; lack of knowledge and resistance; change management;
poor leadership in an organization; and materials and methods.

2.5 Issues on Systems Interoperability Within the Malaysian e-Court:


Understanding User Needs for Interoperability

Interoperability in e-courts refers to the ability of various IT systems and software


applications in courts to communicate in order to share data reliably, effectively, and,
most importantly, to use the data that has been exchanged. Malaysia’s judicial system
was established to address issues such as case backlogs and delays. The electronic-
based court system, which stressed the use of mechanisms that are much faster and
efficient, improved judicial service delivery in Malaysia and had a significant impact on
the government and residents. Despite the use of numerous administrative technologies,
the Malaysian judiciary’s management is still tardy in resolving conflicts, and many
concerns and obstacles remain to be addressed: “The Malaysian e-Court used a variety
of systems, including e-Filing and a Queue Management System”.
E-Filing allows law firms to file or access case documents anytime and anywhere.
The Case Management System (CMS) is developed specifically to improve service efficiency in handling cases in court. This system can be accessed only by court staff, officers and judges and is not connected to other courts or agencies.
The Queue Management System is an electronic system that arranges the attendance of lawyers. This system requires a subscription, and parties are charged at a low rate.
The Community and Advocate Portal System (CAP) is a portal system created to enable easy communication between the courts and the public and to notify lawyers and judges of any change to the trial schedule.
Case Recording and Transcribing (CRT) is a smart system that records the whole process of a hearing before judges in open court and records the evidence of cases. Parties such as lawyers and prosecutors can get a copy of the recordings on compact disc free of charge for reference purposes.
The Video Conferencing System (VCS) allows the parties relevant to a case to communicate with each other via fixed line or mobile phone without being physically present in one place. This system is convenient for lawyers and clients as it eliminates the waiting period and saves the time and cost of travelling to court.

2.6 Research Gaps

The research gap was discovered through a review and analysis of previous publications
and studies on this topic (see Fig. 1).

Figure 1 summarizes the gaps identified in the literature, which motivate a readiness model for eCCMS (Electronic Court Case Management System) interoperability based on end-user perspectives. Readiness of electronic systems interoperability (not addressed in studies): previous studies have not investigated the factors affecting the readiness of e-court interoperability, have not defined theories to investigate the readiness of e-court application interoperability, and none has developed a readiness model for electronic court case management system interoperability in Malaysia. End-user perspectives (not addressed in studies): previous studies did not address the perspectives of users of the electronic court system, although it is crucial to examine the impact of system users (i.e., lawyers, court employees, litigants, and others) on the implementation of electronic courts, and limited studies have used a quantitative approach related to the electronic court. Overall, there is a lack of studies investigating the readiness factors that affect electronic court implementation based on end-user perspectives, and no readiness model for electronic court interoperability has been developed in Malaysia.

Fig. 1. Depicts the summary of gaps related to existing studies.

3 Methodology

The methodology involved a qualitative method, a literature analysis, and a thesis on e-Courts, e-justice systems, and e-litigation issues concerning eCCMS readiness. The process includes studies of theories of new system readiness, reviews of Multiple-Criteria Decision-Making (MCDM) and related methods, and user perspectives. In addition, a detailed analysis was conducted to better understand each of the relationships and their implications for eCCMS readiness and use.

3.1 Research Procedure and Framework


The study investigated the readiness factors for system use in courts within the Behavioral,
Technological, Organizational, and Legal contexts.

4 Results and Discussion

The variables in e-readiness and intention readiness models in the literature were exam-
ined in this study, which focused on an electronic court system and the e-government
sector. This study chose various parameters and variables based on a review of the
literature and previous investigations. Table 1 below summarizes all research variables.
The human behavior factor comprises ten variables (A1–A10 in Table 1). The strongest predictor
of intention to employ the target technology is performance expectation [13–15]. Effort
expectation is a significant variable that has been shown to have a direct and considerable
favorable impact on behavioral intention to utilize systems [15, 16]. Social influence
affects overall use intention, where it has a significantly positive impact on the behavioral
intention to use systems [14, 15, 17]. Facilitating factors influence behavioral intention
to utilize systems in a direct and meaningful way [14–16]. The cost and pricing structure
have a huge impact on customers' adoption of technology [18]. It was found that experience and
habit had insignificant influence on behavioral intention to use system [16]. Optimism
and awareness have a significant impact on behavioral intention to use e-file and optimism
is a motivator contributing to Technology Readiness (TR) [14, 19, 20]. As a result of the
research in previous studies, Innovativeness is considered as a motivator contributing to
TR [19, 20]. Discomfort is an inhibitor detracting from TR [19, 20]. Finally, Insecurity is
an inhibitor detracting from TR [19, 20]. Technological factors have four variables. ICT
Infrastructure is a significant variable, and it is one of the success factors responsible for
the effective and efficient implementation of e-Court [21, 22]. Limited accessibility is one
of the key issues and challenges found to be significant in using ICT tools: when users
have greater access to technological resources, attitudes toward technology are more
positive, and they tend to use technology to a greater extent [1, 23, 24].
The study found that consumers of e-learning systems confront numerous challenges,
including a lack of trust in ICT services and the high cost of maintaining them. As a result,
numerous aspects are required in the use of the electronic court in this context, the most
important of which is technological and legal protection [21]. Lastly, in terms of design
and development factors, it was discovered that in order to obtain positive results in the
design and execution of e-justice projects, it is critical to create principles of involvement
and cooperation with key stakeholders. In addition, the study demonstrated the need for
coordination between different stakeholder groups and different types of actors during the
design and implementation phase; the initial design phase and continuing development

Table 1. Research Variables

Factors Variables
(A) Human Behavior A1 Performance expectancy
A2 Effort expectancy
A3 Social Influence
A4 Facilitating Conditions
A5 Price value
A6 Experience and Habit
A7 Optimism
A8 Innovativeness
A9 Discomfort
A10 Insecurity
(B) Technological B1 ICT Infrastructure
B2 Accessibility and Simplicity
B3 Security and Trust
B4 Design and Development
(C) Organizational C1 Top Management Support
C2 Stakeholder Training
C3 Awareness
C4 Strategy
C5 Funds and Resources
(D) Legal D1 Governmental Regulations
D2 Procedure and Standards Unit
D3 IT Standards

scrutiny are the key risk considerations [1, 25]. Organizational factor has five variables.
According to the literature review, the biggest challenge facing court administrators in
general is to support the administration of justice and provide greater access to justice,
which will necessitate new skill sets, government will, and motivating and supportive
leaders [21, 26]. During the research, it was discovered that one of the many challenges
faced by users of e-learning systems is that they lack sufficient skills to use information
and communication technology. As a result, training stakeholders is one of the success
factors responsible for the effective and efficient implementation of the e-court, and a
lack of effective training is one of the main challenges found to be one of the issues
[21, 23, 24]. In terms of the awareness factor, the introduction of information systems
as a tool to assist in organizational structure changes the organization itself, so people in
the organization must be aware of these changes, and awareness of the electronic court
is one of the most important factors that will enable members and litigants to use the

electronic court appropriately [21, 25, 27, 28]. This study also concluded that the primary
obstacles of the electronic court are uniformity, practice, technology, and strategy, and
that strategic planning is vital and extremely significant [3, 26]. The last variable in
organizational factors is funds and resources, which the study found are required to
broadcast ICT infrastructure and increase scaling levels, despite the fact that technology
and communications investments are costly [28, 29]. The legal factor has three variables.
The first is governmental regulations: the electronic court necessitates legislative oversight
of the use of advanced information technology in judicial proceedings, so the law has to be
updated to include and cover all new technology amendments and procedures, as well as the
difficulties of working with outdated legacy systems [1, 30]. The second legal variable is
the procedure and standards unit; it is critical to enforce practice and process consistency
across the state [30]. The last variable is IT standards; the results show that the electronic
court requires legislative regulation of the process of using modern information technologies
in court work, supported by many elements, most notably technical and legal protection. The
most pressing issue is the misalignment between what technology can provide and the current
state of technology regulation in the courts [3, 31]. This research contributes to the
understanding of users' engagement factors with e-
court application. The findings of this study could be improved by employing a case study
method to provide a more detailed explanation of the phenomenon. It is suggested that
the collected components and variables be examined, and that a case study be conducted
in a public organization to find uniform multi-perspective criteria for human behavioral,
technological, organizational, and legal challenges. This phase identifies multi-criteria
perspectives for readiness aspects that may influence eCCMS adoption and usage, as well
as building a new multi-perspective decision-making procedure based on the identified
challenges.

References
1. Lupo, G., Bailey, J.: Designing and implementing e-justice systems: some lessons learned
from EU and Canadian examples. Laws. 3(2), 353–387 (2014). https://doi.org/10.3390/law
s3020353
2. Fifth Legal Monitor Report (2018)
3. Saman, W.S., Haider, A.: E-Shariah: information and communication technologies for Shariah
court management. Legal Inf. Manage. 13(2), 94–106 (2013)
4. Slowes, R.: Benefits of a modern court case management system. Thomson Reuters, pp. 1–6
(2012)
5. Sulehat, N.A., Taib, C.A.: e-Government information systems interoperability in developing
countries: the case of Jordan. J. Business and Social Rev. Emerging Economies 2(1), 49–60
(2016). https://doi.org/10.26710/jbsee.v2i1.18
6. Panetto, H., Zdravkovic, M., Jardim-Goncalves, R., Romero, D., Cecil, J., Mezgár, I.: New
perspectives for the future interoperable enterprise systems. Comput. Ind. 79, 47–63 (2016).
https://doi.org/10.1016/j.compind.2015.08.001
7. Lallana, E.: An Overview of ICT Policies and e-Strategies of Select Asian Economies. UNDP-
APDIP ICT4D Series. Elsevier (2004)
8. Vernadat, F.B.: Technical, Semantic and Organizational Issues of Enterprise Interoperability
and Networking (2010)

9. Leal, G., Guedria, W., Panetto, H.: A semi-automated system for interoperability assessment:
an ontology-based approach. Enterprise Information Systems, Taylor & Francis, in press.
https://doi.org/10.1080/17517575.2019.1678767. hal-02309347
10. Youssef, A.F.: Electronic Information Courts and Electronic Litigation. 1st Edition.
Alexandria- Egypt (2013)
11. Mosweu, T., Mosweu, O.: Electronic court records management systems: a review of literature
in selected African countries. Mousaion 36(4), 1–21 (2019)
12. International Records Management Trust (IRMT) (2004)
13. Shin, D.H.: Towards an understanding of the consumer acceptance of mobile wallet Original
Research Article. Comput. Hum. Behav. 25, 1343–1354 (2009). https://doi.org/10.1016/j.chb.
2009.06.001
14. Schaupp, L.C., Carter, L., McBride, M.E.: E-file adoption: a study of U.S. taxpayers’
intentions. Computers in Human Beh. 26(4), 636–644 (2010)
15. Chiu, Y.-T., Fang, S.-C., Tseng, C.-C.: Early versus potential adopters: exploring the
antecedents of use intention in the context of retail service innovations. Int. J. Retail
Distribution Manage. 38(6), 443–459 (2010). https://doi.org/10.1108/09590551011045357
16. Enaizan, O., et al.: Electronic medical record systems: decision support examination frame-
work for individual, security and privacy concerns using multi-perspective analysis. Heal.
Technol. 10(3), 795–822 (2020)
17. Keong, M.L., Ramayah, T., Kurnia, S., Chiun, L.M.: Explaining intention to use an enterprise
resource planning (ERP) system: an extension of the UTAUT model. Business Strategy Series
13(4), 173–180 (2012). https://doi.org/10.1108/17515631211246249
18. Venkatesh, V., Thong, J.Y.L., Xu, X.: Consumer acceptance and use of information technology:
extending the unified theory of acceptance and use of technology. MIS Q. 36(1), 157–178 (2012)
19. Sani, O.G., Pesaran, B., Shanechi, M.M.: Modeling behaviorally relevant neural dynamics
enabled by preferential subspace identification (PSID). Nature Neuroscience 24, 808154
(2019)
20. Parasuraman, A., Colby, C.L.: An updated and streamlined technology readiness index: TRI
2.0. J. Service Res. 18(1), 59–74 (2015)
21. Singh, M., Sahu, G.P., Dwivedi, Y., Rana, N.P., Tamilmani, K.: Success factors for e-court
implementation at Allahabad High-Court. In: PACIS, p. 137 (2018)
22. Mandil, A.F.: Remote litigation: a legal study. Al-Qadisiyah University. Kufa Journal for
Legal and Political Science (2014). http://uokufa.edu.iq/
23. Al-Shboul, M.A.R., Barber, K.D., Garza-Reyes, J.A., Kumar, V., Abdi, M.R.: The effect of
supply chain management practices on supply chain and manufacturing firms’ performance.
J. Manuf. Technol. Manag. 28(5), 577–609 (2017). https://doi.org/10.1108/JMTM-11-2016-
0154
24. Ghavifekr, S., Kunjappan, T., Ramasamy, L., Anthony, A.: Teaching and learning with ICT
tools: issues and challenges from teachers’ perceptions. Malaysian Online J. Educ. Technol.
4(2), 38–57 (2016)
25. Rosa, J., Teixeira, C., Pinto, J.S.: Risk factors in e-justice information systems. Gov. Inf. Q.
30(3), 241–256 (2013)
26. Dillon, M., Beresford, D.: Electronic courts and the challenges in managing evidence; a view
from the inside the international criminal court. In: IJCA, Vol. 6, p. 29 (2014)
27. Upadhyay, M.H.: E-Courts in India and E-Judiciary in India. Int. Multidisciplinary Research
J. (7637: 7), pp. 2–5 (2015)
28. Sharma, C., Mitra, A.: Corruption, governance and firm performance: evidence from Indian
enterprises. J. Policy Model. 37(5), 835–851 (2015)
29. Ojo, A., Janowski, T., Estevez, E.: Semantic Interoperability Architecture for Electronic
Government, pp. 63–72 (2009). https://doi.org/10.1145/1556176.1556192

30. Abu Taleb, J.N.: Electronic Courts: Their Procedures and the Extent of their Legal Application
in Jordan, 1st edn. Alaan publishers and distributors, Amman, Jordan (2018)
31. Stefanov, S.: Some aspects of legal regulation of the project “electronic court” during its
implementation in the legal procedure of Ukraine. Evropsky politicky a pravni diskurz 3(1),
165–171 (2016)
On the Reusability of Machine Learning
Models in Edge Computing: A Statistical
Learning Approach

Xenia Skotti1 , Kostas Kolomvatsos2 , and Christos Anagnostopoulos1(B)


1
School of Computing Science, University of Glasgow, Glasgow, UK
[email protected], [email protected]
2
Department of Computer Science and Telecommunications,
University of Thessaly, Volos, Greece
[email protected]

Abstract. The adoption of Edge Computing continues to grow with


edge nodes recording increasingly more data, which inevitably requires
that they should be processed through Machine Learning (ML) models
to speed up the production of knowledge. However, training these models
requires an increased amount of resources, which are limited, thus, the
reuse of ML models becomes of paramount importance. Given that we do
not have a pool of models to choose from, is it possible to determine which
nodes in the network require distinct models and which of them could be
reused? In this paper, we propose a solution to this question, an online
model reuse framework which is evaluated for its precision and speedup.
The framework considers all possible combinations of pairs in the network
to determine which are good reusability pairs, by adopting statistical
learning methods. Then for each pair, the node model is chosen that has
the highest data space overlap. Our comprehensive experimental analysis
in the context of both regression and classification shows the feasibility of
our solution for model reusability in Edge Computing environments.

Keywords: Edge computing · Model reusability · Machine learning

1 Introduction
Lee et al. [6] define compute reuse as ‘the partial or full utilization of already
executed computational task results by multiple users to complete a new task
while avoiding computation redundancy’. Systems that adopt compute reuse
benefit from significant performance gains motivating model reuse in Machine
Learning (ML). Model reuse [14] attempts to construct a model from other pre-
existing and pre-trained models for other tasks, in order to avoid building a
model from scratch. Exploitation of pre-existing models can set a good basis
for the training of a new model which translates into a reduced time cost, data
amount and expertise required to train a new model. Moreover, model reuse has
been used to tackle concept drift [13] and building ad-hoc analytic models [5].

Model reusability is compelling and, therefore, both theoretical [14] and


empirical [5,12] frameworks have been proposed to take advantage of it. Many
of the proposed approaches involve a two-phased framework of preprocessing
and runtime phases, i.e., the model and its data are shared in a pool from
which, in the runtime phase, the relevant ML models are identified. Consider the
case of edge computing, where given a number of nodes and their corresponding
datasets, we want to decide for which nodes to train a distinct model and for
which to reuse one. In this context, the reuse comes from the fact that we do
not train a model for all nodes but instead reuse one of the existing ones. A
framework for model reuse in edge computing requires its online presence, thus,
the aforementioned steps are merged. To the best of our knowledge no such
framework has been proposed so far in the respective literature.
One of the fundamental requirements of any model reuse framework is to
be able to choose the model that best fits the (test) data of the target domain.
One of the ways this can be achieved is by finding the model whose source
domain (training data) is drawn from the same distribution as the target domain.
Therefore, the difference between domains needs to be quantified and minimised
to find the best model. This is essentially what the Maximum Mean Discrepancy
(MMD) [3] statistic does.
In addition to measuring the similarity between two datasets, we need to
determine the direction of reusability. In other frameworks [5,12], the reused
model originated from a pool, hence there was no such requirement because there
was only one direction of reusability, the pool. In this setting though there are two
directions per pair, and we need to define a method to do so. The method needs
to measure the data space overlap between two datasets to determine potentially
which would be better suited to be used to train a replacement model for the
other.
The data space overlap can also be defined as the overlap of the inlier data
space. A predictor for inlier space overlap is the probability of correctly pre-
dicting the non-native inliers of a model. In other words, what is the overlap
between the inlier points of two datasets, the native and non-native one with
regards to the inlier detection model. The reason behind using inliers to deter-
mine the overlap is that any dataset is expected to have a few outliers and hence
some filtering needs to be applied anyway. Simultaneously, this can also be lever-
aged to determine the direction of reusability. We used the One-class Support
Vector Machines (OCSVM) [10] to determine which points are inliers. There-
fore, given two nodes and their corresponding OCSVM models, we can use each
OCSVM model to predict the other node’s inliers and then find the probability
of detecting them, hence their overlap.
The paper is organized as follows: Section II highlights the relevant research
with regards to model reuse and elaborates on our contribution. Section III
provides preliminaries of the theory behind MMD and OCSVM. In Section IV,
we introduce the reusability framework and provide the corresponding algo-
rithms. Experimental evaluation is summarized in Section V highlighting the
real datasets and classifiers used, the parameter configuration and the definitions

of metrics in the context of model reusability. Section VI concludes the paper


with discussion on the important findings along with limitations and directions
for future work.

2 Related Work
Compute reuse has been investigated in the context of edge computing by [6]
to quantify its gain. Experiments on edge-based applications showed that sys-
tems that adopt compute reuse can finish the same task up to five times faster.
Motivated by similar concerns a theoretical paradigm named ‘learnware’ was
proposed by Zhou [14]. More specifically, a learnware is a ML model that is pre-
trained and achieves good performance paired with a detailed specification. The
vision behind the paradigm was that learnware models can be shared in a pool
without their raw data, allowing the identification of pretrained models that sat-
isfy their requirements without concerns over privacy violations. Therefore, the
author identified three characteristics: reusable, evolvable and comprehensible
as fundamental for a model to be considered a learnware.
Based on this paradigm, the Reduced Kernel Mean Embedding (RKME)
[12] was presented, i.e., a two phased framework consisting of the upload and
deployment phase. During the upload phase, each model is paired with its Kernel
Mean Embedding (KME) of the dataset and added to the pool of models. Then,
in the deployment phase either a single or a combination of models is chosen
based on the RKHS distance between the testing (target) mean embedding and
reduced (source) embedding of pool models. In essence, the RKME’s deployment
phase, is similar to the MMD statistic [3], since by quantifying the distance of
the mean embedding of two populations (source and target), it ensures that the
target distribution is the same as the source.
In [14], the authors recognise transfer learning as a preliminary attempt to
reusability. A two-stage framework dubbed as Learning to Transfer (L2T) was
presented [11], which exploits previous transfer learning experiences to optimize
what and how to transfer between domains. In the first stage each transfer
learning experience is encoded into three parts and, then, are utilised to learn
a reflection function, which approximates the performance improvement ratio
and thus encrypts transfer learning skills of deciding what and how to transfer.
The improvement ratio in this framework is the difference between domains
calculated by MMD. In addition to the MMD between domains, the variance is
also calculated since a small MMD paired with an extremely high variance still
indicates little overlap. During the second stage, whenever a new pair of domains
arrives, L2T optimizes the knowledge to be transferred by maximising the value
of the learned reflection function.
Model reuse has also been used to handle concept drift. The assumption
that previous data contain some useful information, indicates that the models
corresponding to the data can be leveraged. Condor was proposed [13] as an
approach to handle concept drift through model reuse. Condor consists of two
modules, ModelUpdate and WeightUpdate which leverage previous knowledge

to build a new model, hence updating the model pool and adapting the weights
of previous models to reflect current reusability performance respectively.
Hasani et al. [5] proposed a two-phased approach, to build faster models
for a popular class of analytic queries. Similar to the other approaches [11] -
[13], there is a preprocessing and a runtime phase. During the first phase the
models, their statistics and some meta-data are stored, while in the second phase
relevant models are identified from which an approximate model is constructed.
Their approach can achieve speed-ups of several orders of magnitude on very
large datasets, however, it is only geared towards exploratory analysis purposes
and the approach is potentially less robust under concept drift.
Concerns over intellectual property (IP) infringement and vulnerability prop-
agation of deep learning models (DNN) motivated the proposal of ModelDiff [8],
a testing-based approach to DNN model similarity comparison. They compare
the decision logic of models on the test inputs represented by a decision distance
vector (DDV), a newly defined data structure in which each value is the dis-
tance between the outputs of the model produced by two inputs. These inputs
are pairs of normal and corresponding adversarial samples and thus when used
to calculate the DDV, the decision boundary is captured.
Lee et al. [6] also discuss alternative approaches and corresponding challenges
of compute reuse including in networks. They identify that reuse can be achieved
either in a distributed or centralized manner. The distributed approach involves
forwarding tasks to the compute reuse node that is responsible for the opera-
tion. This adds additional complexity to the forwarding operations of routers
resulting in a potential downgrade in performance. Reuse of results in a network
setting, undoubtedly improves performance, however speeding up the estima-
tion of parameters can also be beneficial in that regard. Nodes in a network can
collaborate to estimate parameters as discussed in [7]. More specifically, their
method takes advantage of the joint sparsity of vectors used for computations
enhancing estimation performance. Joint sparsity simply means that the indexes
of nonzero entries for all nodes are the same, but their values differ. The authors
also adopt an intertask cooperation strategy to consider intertask similarities.
Their method assumes that both the vectors of interest and their associated
noise follow a zero-mean Gaussian distribution which is a strong assumption for
the data to hold.
The contributions of this paper that, in parallel, depict its differences with
other relevant efforts in the domain are as follows:

– An online model reuse framework for edge computing consisting of two steps,
a pair similarity detector (based on MMD) followed by a direction of model
reusability (based on the inlier data space overlap).
– A decision-making algorithm which, given the results of the framework, can
maximise the number of nodes which do not require distinct models along
with a list of replacement models.
– Extensive experimental evaluation of the framework with both classification
and regression models over real datasets.

3 Background
3.1 Maximum Mean Discrepancy

MMD is a statistic that can quantify the mean discrepancy of two data distributions in a
kernel space in order to determine if two samples are drawn from different distributions [3].
Let $p$ and $q$ be two independent probability distributions, and let $\mathbb{E}_x[f(x)]$
(shorthand notation for $\mathbb{E}_{x \sim p}[f(x)]$) denote the mathematical expectation of
$f(x)$ with $x$ under the probability density $p$. The statistic between $p$ and $q$ is
defined as:

$$\mathrm{MMD}(\mathcal{F}, p, q) = \sup_{f \in \mathcal{F}} \big( \mathbb{E}_x[f(x)] - \mathbb{E}_y[f(y)] \big) = \sup_{f \in \mathcal{F}} \langle f, \mu_p - \mu_q \rangle_{\mathcal{H}} \qquad (1)$$

where the function class $\mathcal{F}$ is a unit ball in a reproducing kernel Hilbert space
(RKHS) and $\mu_p$, $\mu_q$ are the mean embeddings of $p$ and $q$, respectively, i.e., the
means of the feature mapping in the kernel space. The function class $\mathcal{F}$ is
universal, meaning that $\mathrm{MMD}(\mathcal{F}, p, q) = 0$ if and only if $p = q$.
Therefore, MMD is the largest difference in expectations over functions in $\mathcal{F}$ and
can only be zero if the two samples were drawn from the same distribution.
In practice, we use the squared MMD in order to be able to use kernel functions. Let
$X = \{x_1, \ldots, x_m\}$ and $Y = \{y_1, \ldots, y_n\}$ denote the independent and
identically distributed (i.i.d.) samples from distributions $p$ and $q$, respectively. An
unbiased estimate of $\mathrm{MMD}^2 \triangleq \| \mu_p - \mu_q \|^2_{\mathcal{H}}$ can be
obtained using a U-statistic:

$$\mathrm{MMD}^2(\mathcal{F}, p, q) = \frac{1}{m(m-1)} \sum_{i=1}^{m} \sum_{j \neq i} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} k(y_i, y_j) - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j) \qquad (2)$$

where $k(\cdot, \cdot)$ denotes the kernel function. In our model, we adopt the linear and
Gaussian RBF kernels, defined as $k(x, y) = x^{T} y$ and
$k(x, y) = \exp\!\big(-\tfrac{1}{2\sigma^{2}} \| x - y \|^{2}\big)$, where
$\sigma \in \mathbb{R}$ is a kernel parameter and $\| x - y \|$ is a dissimilarity
measure (e.g., Euclidean distance).
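As a concrete illustration of Eq. (2), the following is a minimal NumPy sketch of the unbiased
squared-MMD estimator with the two kernels mentioned above. It is not the authors'
implementation; the helper names (mmd2_unbiased, rbf_kernel, linear_kernel) and the default
sigma value are our own illustrative choices.

# Minimal sketch of the unbiased squared-MMD estimator of Eq. (2).
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq_dists / (2 * sigma**2))

def linear_kernel(A, B):
    # k(x, y) = x^T y
    return A @ B.T

def mmd2_unbiased(X, Y, kernel=rbf_kernel, **kw):
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = kernel(X, X, **kw), kernel(Y, Y, **kw), kernel(X, Y, **kw)
    # Diagonal terms are excluded (j != i), as required by the U-statistic.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()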

3.2 One-Class Support Vector Machines


OCSVM is a one-class classification technique, which aims to classify instances
into one of two classes, the inlier and outlier classes. It was first presented by
Schölkopf et al. [10] and utilizes a training dataset with normal data to learn the
boundaries of the normal data points. Therefore, data points which lie outside

the normal data region are going to be classified as outliers. OCSVMs utilize
an implicit transformation function φ (.) defined by the kernel to project data
to a higher dimensional space. The algorithm learns the decision boundary (a
hyperplane) which achieves the maximum separation of the majority of data
points. Only a small fraction of data are allowed to lie on the other side of the
decision boundary and those data are considered outliers.
The OCSVM returns a function $f$ that takes the value $+1$ for the normal region and $-1$
elsewhere. This function $f$ is called a decision function and is defined as
$f(x) = \mathrm{sign}(g(x)) = \mathrm{sign}(w^{T}\phi(x) - \rho)$, where $w$ is the vector
perpendicular to the decision boundary ($g(x) = 0$) and $\rho$ is the bias. Given that the
distance of any arbitrary data point to the decision boundary can be calculated by
$d(x) = |g(x)| / \|w\|$ and that the origin's value when plugged into $g(x)$ is $\rho$, the
distance of the origin to the decision boundary is $\rho / \|w\|$. The OCSVM essentially
attempts to maximise this distance by solving the minimisation problem of
$\|w\|^{2}/2 - \rho$, i.e.,

$$\min_{w,\, \xi \in \mathbb{R}^{N},\, \rho \in \mathbb{R}} \ \frac{\|w\|^{2}}{2} - \rho + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i \qquad (3)$$

$$\text{subject to } (w^{T} \cdot \Phi(x_i)) \geq \rho - \xi_i, \quad \xi_i \geq 0$$

where $\xi_i$ is the slack variable for a point $i$, which allows it to lie on the other side
of the decision boundary, $n$ is the size of the training dataset and $\nu \in (0, 1)$ is a
regularization parameter. As shown in (3), the objective is not only to maximise the distance
of the origin to the decision boundary but also to minimise the slack variables $\xi_i$ for
all points. $\nu$ represents an upper bound on the fraction of outliers and a lower bound on
the fraction of support vectors. In other words, $\nu$ specifies the fraction of training
points which are guaranteed to be misclassified and the fraction of training examples that
become support vectors. As mentioned above, $\nu \in (0, 1)$ and is therefore a percentage,
where a high value may lead to over-fitting and a low value to under-fitting. $\nu$ controls
the trade-off between $\xi$ and $\rho$.
To reduce the number of variables to a single vector and to utilise the kernel trick, the
primal objective is transformed into a dual objective:

$$\min_{a} \ \frac{a^{T} Q a}{2} \qquad (4)$$

$$\text{subject to: } 0 \leq a_i \leq \frac{1}{\nu n}, \quad \sum_{i=1}^{n} a_i = 1$$

where $Q$ is the kernel matrix and $a$ the Lagrange multipliers. The decision function now
becomes:

$$f(x) = \mathrm{sign}\Big( \sum_{i=1}^{n} a_i\, k(x, x_i) \Big) \qquad (5)$$
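To make the OCSVM node models concrete, below is a minimal scikit-learn sketch of fitting such
a model and reading off its $\pm 1$ predictions; the synthetic data and the $\nu$ value are
illustrative assumptions of ours, not the configuration used in this work.

# Minimal sketch: fitting an OCSVM node model and obtaining its +1/-1 decisions.
import numpy as np
from sklearn.svm import OneClassSVM

node_data = np.random.rand(200, 2)           # stand-in for one node's (filtered) dataset
ocsvm = OneClassSVM(kernel="rbf", nu=0.05)   # nu bounds the fraction of outliers (Sect. 3.2)
ocsvm.fit(node_data)

labels = ocsvm.predict(node_data)            # +1 for inliers, -1 for outliers, as in Eq. (5)
inlier_fraction = np.mean(labels == 1)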

Algorithm 1: Calculates the Average Similarity MMD (ASMMD) between the Given Nodes.
Data: kernel, bandwidth: the kernel type and scalar value to be used for
the MMD calculation, samples: dictionary associating each node with
a sample, similar nodes: nodes identified as similar to each other,
other nodes: the rest of the nodes.
Result: ASM M D
1 begin
// Calculating the baseline ASMMD
2 similar mmds ←− []
3 for x, y in get pair combos(similar nodes) do
4 sx ←− samples[x], sy ←− samples[y]
5 mmd ←− M M D(sx, sy, kernel, bandwidth)
6 similar mmds.append(mmd)
7 end
// Compare which of the other nodes are similar to the
similar nodes, using the current ASMMD in each iteration
8 for x in other nodes do
9 sx ←− samples[x]
10 for y in similar nodes do
11 sy ←− samples[y]
12 mmd ←− M M D(sx, sy, kernel, bandwidth)
13 asmmd ←− mean(similar mmds)
14 if mmd < asmmd ∗ 1.05 then
15 similar mmds.append(mmd)
16 end
17 end
18 end
// Check which of the other nodes are similar to each other
19 if len(other nodes) > 1 then
20 for x, y in get pair combos(other nodes) do
21 sx ←− samples[x], sy ←− samples[y]
22 mmd ←− M M D(sx, sy, kernel, bandwidth)
23 asmmd ←− mean(similar mmds)
24 if mmd < asmmd ∗ 1.05 then
25 similar mmds.append(mmd)
26 end
27 end
28 end
29 asmmd = mean(similar mmds)
30 end

4 ML Models Reusability Framework


Our online model reuse framework needs to be able to determine two things
given a pair of nodes. First and foremost, the pairs of nodes which have similar
datasets and then the direction of reusability.
The first objective is achieved using MMD, which measures the difference between domains;
hence, theoretically, when the MMD value is zero the two datasets are drawn from the same
distribution. However, as discussed in Sect. 3.1, in practice we utilise an estimation of the
squared MMD. As a consequence, the value is not actually zero and we need to define a
threshold below which a pair is considered similar. We have dubbed this threshold the average
similarity MMD (ASMMD), a value calculated using Algorithm 1. Algorithm 1
requires that we categorise nodes into two sets, one where all nodes are similar
to each other and the rest of them. Categorising nodes in these categories differs
when using a regression and classification dataset. We discuss this further in
Sect. 5.1.

Algorithm 2: Finds the Similar Pairs of the Dataset using MMD


Data: samples: dictionary associating each node with a sample, asmmd:
average similarity (ASMMD) calculated using Algorithm 1,
kernel, bandwidth: the kernel type and scalar value to be used for
the MMD calculation.
Result: similar pairs, pair mmds
1 begin
2 similar pairs ←− []
3 pair mmds ←− []
4 nodes ←− samples.keys()
5 for x, y in get pair combos(nodes) do
6 sx ←− samples[x], sy ←− samples[y]
7 mmd ←− M M D(sx, sy, kernel, bandwidth)
8 if mmd < asmmd ∗ 1.05 then
9 similar pairs.append((x, y))
10 pair mmds.append(mmd)
11 end
12 end
13 end

Once we have identified these two sets, we calculate a baseline ASMMD by


calculating the MMD of all pair combinations of the similar nodes. Then, we use
ASMMD (allowing for a 5% variation) to judge whether the rest of the nodes
are similar to each other or to the similar nodes. If they are, we calculate the new
ASMMD and use this to judge the next pair. Using the result of this process
we can then judge which pairs are similar for a given experiment as demonstrated
in Algorithm 2. It is worth highlighting that for the MMD implementation to
work and by extension all of the algorithms that utilize it (Algorithms 1 and 2),
the samples of each node need to be of equal size.
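For illustration, the pair-selection step of Algorithm 2 can be sketched in a few lines of
Python. This is our own hedged rendering (reusing the mmd2_unbiased helper sketched in
Sect. 3.1), not the released implementation; the function name is ours.

# Hedged sketch of Algorithm 2: keep every node pair whose MMD is below the
# ASMMD threshold, allowing the 5% variation described above.
from itertools import combinations

def find_similar_pairs(samples, asmmd, kernel, **kernel_kw):
    similar_pairs, pair_mmds = [], []
    for x, y in combinations(samples.keys(), 2):            # all pair combinations of nodes
        mmd = mmd2_unbiased(samples[x], samples[y], kernel=kernel, **kernel_kw)
        if mmd < asmmd * 1.05:                               # 5% tolerance around ASMMD
            similar_pairs.append((x, y))
            pair_mmds.append(mmd)
    return similar_pairs, pair_mmds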

Algorithm 3: Calculates the OCSVM Score of each Node per Pair


Data: samples: dictionary associating each node with a sample, models:
dictionary associating each node with its OCSVM node model,
similar pairs: the MMD identified similar pairs
Result: pair prob
1 begin
2 pair prob ←− []
3 for x, y in similar pairs do
4 sx ←− samples[x], sy ←− samples[y]
5 pred y inliers ←− get inliers(models[x], sy),
pred x inliers ←− get inliers(models[y], sx)
6 x y overlap ←− size(pred y inliers)/size(sy)
y x overlap ←− size(pred x inliers)/size(sx)
7 pair prob.append((x y overlap, y x overlap))
8 end
9 end

Once we identify the similar pairs in the network we can then calculate the
OCSVM scores of each node in each pair and hence determine the direction of
reusability per pair. The OCSVM score is essentially the probability of detecting
the inliers of the node by using the other node’s model. Therefore, given two
nodes x and y, and their corresponding OCSVM models, we use each OCSVM
model to predict the other node's inliers, and then we calculate the number of points that
were identified as inliers and divide it by the number of points in the dataset, hence the
probability. The reason we divide by the number of points in the dataset is that we expect
some form of filtering to have been applied beforehand to remove outliers if they exist;
hence all the points in the dataset are inliers. We calculate the OCSVM score for both
directions, and whichever is higher indicates the node for which we should train the model.
Algorithm 3 calculates the OCSVM scores of each node per pair.
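A minimal sketch of this scoring step is given below, assuming OCSVM node models fitted as in
the earlier sketch; the function name and return convention are our own illustrative choices.

# Hedged sketch of the OCSVM score in Algorithm 3: each node's model predicts the other
# node's (pre-filtered, all-inlier) sample, and the detected fraction is the overlap.
import numpy as np

def ocsvm_scores(model_x, model_y, sample_x, sample_y):
    x_y_overlap = np.mean(model_x.predict(sample_y) == 1)  # node x's model detecting y's inliers
    y_x_overlap = np.mean(model_y.predict(sample_x) == 1)  # node y's model detecting x's inliers
    return x_y_overlap, y_x_overlap
# The higher of the two scores determines the direction of reusability for the pair.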
The framework presented up to this point operates on the node level; however, in order to
unify the information at the network level we propose a naive decision-making algorithm
(Algorithm 4). The algorithm provides the user with information about which nodes do not
require distinct models and the respective potential replacement models. The algorithm is
naive and thus simple: its aim is to find the maximum number of nodes for which we do not
train a model. Nevertheless, the algorithm does not take into account any
performance-optimising considerations.
A visual representation of the framework being applied to a network is shown
in Fig. 1.

Fig. 1. Example of a network where the framework is applied. The letters N, D and M
followed by a number stand for node, dataset and model respectively.

Fig. 2. The result of clustering on the BM dataset.



Algorithm 4: Finds Nodes that can use a Reused Model, along with a
List of Replacements, based on the Results of the Framework
Data: pair results: dictionary associating each pair with the node whose
model to be reused i.e. the direction of reusability, nodes: the list of
nodes from the MMD identified pairs
Result: mns: modelless nodes i.e. nodes that do not require that a model is
trained for them, model mns: associates each modelless node (mn)
with a list of potential replacement node models
1 begin
2 similar pairs ←− pair results.keys()
3 mns ←− nodes.copy(), model mns ←− {}
4 for node in nodes do
5 model mns[node] ←− []
6 end
7 for node in nodes do
8 node similar pairs ←− get node similar pairs(node, similar pairs)
9 for x, y in node similar pairs do
10 model node ←− pair results[(x, y)]
11 mn ←− dif f erence(model node, (x, y))
12 if model node in mns then
13 mns.remove(model node)
14 end
15 model mns[mn].append(model node)
// ensures we do not encounter the pair again
16 similar pairs.pop((x, y))
17 end
18 end
// Remove replacement options for an mn that can be replaced
themselves
19 for node in nodes do
20 if model mns[node].count() > 1 then
21 for model node in model mns[node] do
22 if model mns[node] not empty then
23 model mns[node].remove(model node)
24 mns.append(model node)
25 end
26 end
27 end
28 end
29 end

5 Experimental Evaluation

5.1 Experimental Setup

Datasets. We have evaluated our framework for both regression and classifica-
tion models. For regression, we have used the GNFUV Unmanned Surface
Vehicles Sensor Data Set [4] which includes data from three experiments.

In each experiment there are four sets of mobile sensor readings data (humid-
ity and temperature) recorded by the Raspberry Pi’s corresponding to four
Unmanned Surface Vehicles (USVs) (see Fig. 3).
For classification, we have used the UCI Bank Marketing Dataset (BM)
[9]. The data was collected by a banking institution through phone calls as part
of a direct marketing campaign. The dataset is a binary classification dataset
of classes ‘yes’ or ‘no’, to subscribe to the product (bank term deposit). More
specifically there are 4640 ‘yes’ instances and 36548 ‘no’ instances.
We have applied Principal Component Analysis (PCA) to reduce the number of dimensions of the
dataset from 20 to 3 and subsequently used these data to execute the hypothesis testing.
In comparison to the GNFUV dataset the BM dataset has no inherent
network-node-like structure and hence it was constructed. We trained a K-means clustering
model with an equal number of yes and no instances to split the data into four clusters. This
was done to prevent class imbalance from influencing the clustering algorithm. However, we
wanted to have more available samples to split
into more nodes so instead of clustering equal amounts of instances per class, we
used three times the number of yes instances for no instances. We merged three
of these clusters into one (clusters 1,2 and 4 in Fig. 2) and then created 5 nodes
from the two clusters.
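A hedged sketch of this node-construction procedure is shown below; the exact preprocessing,
random seed and label encoding of the original study are unknown, so the helper name and
these details are our own assumptions.

# Hedged sketch of constructing the BM "nodes": PCA down to 3 components, then K-means
# (k=4) on a subset with a 3:1 no-to-yes ratio; the resulting clusters are afterwards
# merged/split into the five nodes described above.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def build_bm_clusters(X, y, ratio=3, seed=0):
    # assumes y is coded 0 ('no') / 1 ('yes'); both are assumptions for illustration
    rng = np.random.default_rng(seed)
    X3 = PCA(n_components=3).fit_transform(X)                     # 20 features -> 3 components
    yes_idx = np.where(y == 1)[0]
    no_idx = rng.choice(np.where(y == 0)[0], size=ratio * len(yes_idx), replace=False)
    subset = np.concatenate([yes_idx, no_idx])
    labels = KMeans(n_clusters=4, random_state=seed).fit_predict(X3[subset])
    return X3[subset], y[subset], labels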
It is worth mentioning we have used two data configurations per dataset.
For the GNFUV dataset, the two configurations were the original data and a
standardised version of them. For the BM Dataset, we used the node data created
from the aforementioned process as well as a balanced version of them, created by
undersampling the majority class (no) to have an equal number of instances as the
minority class (yes).
Lastly, we have drawn 100 unique samples per network, in each of which
the node data have an equal number of examples in order to comply with
the MMD implementation constraint discussed in Sect. 4. The sample size of
each node dataset is determined by the Minimum Sample Size (MSS), i.e.,
defined by the node with the minimum number of entries. The source code is
available for re-producability at https://github.com/XeniaSkotti/online model
reuse framework edge computing.

ASMMD Algorithm Parameters. As discussed in Sect. 4, the ASMMD


Algorithm takes four arguments, the sample of each node, the kernel, bandwidth,
the similar and other nodes. In this section we discuss how we set and what the
kernel, bandwidth, similar and other nodes are per dataset (and experiment in
the case of the GNFUV dataset).

Fig. 3. The relationship between humidity and temperature per experiment alongside
their distribution plots for the original GNFUV data.

The approach to identifying the similar and other nodes for each dataset
differed due to the nature of each dataset. Since the GNFUV is a regression
dataset of only two dimensions, we plotted the points of each experiment and
visually identified the pairs which we deemed as similar per experiment. Then
we used Algorithms 1 and 2 to confirm our inferences, otherwise we adjusted the
similar and other nodes sets. For the BM dataset the similar nodes are either the
nodes of the newly merged cluster or cluster 3. Similarly, we tested both possible

similar nodes sets for each data configuration (balanced and unbalanced) to
determine which one was best.

Table 1. ASMMD algorithm parameters per dataset.

Dataset  Experiment  Data configuration  Similar nodes   Other nodes     Kernel  Bandwidth
GNFUV    1           Standardised        pi2, pi3, pi4   pi5             rbf     0.5
GNFUV    1           Original            pi2, pi4        pi3, pi5        rbf     10
GNFUV    2           Standardised        pi2, pi3, pi5   pi4             rbf     1
GNFUV    2           Original            pi3, pi5        pi2, pi4        rbf     100
GNFUV    3           Standardised        pi2, pi4        pi3, pi5        rbf     1
GNFUV    3           Original            pi2, pi4        pi3, pi5        rbf     5
BM       –           Balanced            pi1, pi2, pi3   pi4, pi5        linear  0.001
BM       –           Unbalanced          pi4, pi5        pi1, pi2, pi3   linear  0.001

Once we had an initial idea of the similar and other nodes sets, we could then
use them to determine the kernel and bandwidth. The two kernels we considered
were the Radial Basis Function (rbf) and Linear kernels. We aimed to choose
the parameters which would most effectively separate the similar from dissimilar
pairs. The full parameter configuration of each dataset (experiment) and data
configuration is found in Table 1.

ML Models. For each problem type we chose distinct classifiers, namely Sup-
port Vector Regression (SVR) and Logistic Regression (LR) for regression and
classification, respectively.
Starting off with regression, we have trained SVRs to capture the relationship
between the humidity and temperature attributes of the dataset. SVRs are a
version of SVM for regression proposed by Vapnik et al. [2]. SVRs have a few
variables that should be optimised for each node model. First, we experiment
with both the linear and rbf kernels in order to evaluate how different kernels
interact with our framework. Moreover, we optimise the regularization parameter
and the epsilon in the epsilon-SVR model using grid search given a node’s dataset
to ensure we find the best ǫ-insensitive region for the data. It is worth noting that
the SVR implementation in scikit-learn reports the performance of the classifier
in terms of the coefficient of determination (R2 ).
Our classification dataset, has two classes yes and no and we have used LR [1]
specifically because it is usually a good baseline for binary classification. Hence,
the scikit-learn implementation of LR reports performance in terms of the mean
accuracy on the test dataset. As mentioned in Sect. 5.1, for the BM Dataset
we experiment with two data configurations, one which data are balanced and
another in which they are not. For the case in which the data are not balanced,
we configured the class weight parameter of LR to be balanced to deal with the

Table 2. Classifier parameter values that are fixed and optimised per dataset.

Dataset  Classifier  Fixed parameter          Grid-search optimised
GNFUV    SVR         Kernel: Linear           C: 0.01, 0.1, 1, 10; Epsilon: 0.1, 0.5, 1, 2, 5
GNFUV    SVR         Kernel: Non-linear       C: 0.01, 0.1, 1, 10; Epsilon: 0.1, 0.5, 1, 2, 5
BM       LR          Class weight: Balanced   C: 0.01, 0.1, 1, 10; Solver: “lbfgs”, “liblinear”, “saga”, “sag”
BM       LR          Class weight: None       C: 0.01, 0.1, 1, 10; Solver: “lbfgs”, “liblinear”, “saga”, “sag”

imbalance. The other parameter which we control for both data configurations is the
regularization parameter. Lastly, the scikit-learn implementation offers a variety of solver
options, hence we optimise this as well. Table 2 details which parameters were fixed and
optimised for each classifier.
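The per-node tuning described above can be illustrated with a short scikit-learn grid-search
sketch over the Table 2 grids; the cross-validation settings and variable names are our
assumptions, not the study's exact configuration.

# Hedged sketch of the per-node grid search over the Table 2 parameter grids.
# Scoring follows the scikit-learn defaults mentioned in the text: R^2 for SVR,
# mean accuracy for LogisticRegression.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR
from sklearn.linear_model import LogisticRegression

svr_search = GridSearchCV(
    SVR(kernel="rbf"),  # or kernel="linear" for the linear configuration
    {"C": [0.01, 0.1, 1, 10], "epsilon": [0.1, 0.5, 1, 2, 5]},
)
lr_search = GridSearchCV(
    LogisticRegression(class_weight="balanced", max_iter=1000),  # class_weight=None for balanced data
    {"C": [0.01, 0.1, 1, 10], "solver": ["lbfgs", "liblinear", "saga", "sag"]},
)
# svr_search.fit(...) / lr_search.fit(...) on a node's data would then return the
# best node model via best_estimator_.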

Model Reusability Metrics. Investigating the effectiveness of the framework requires that we
examine two aspects: the speedup we benefit from when we avoid training models for some nodes
in the network, and the precision of the framework in terms of the recommendations it makes.
We have defined both speedup and precision in the context of model reusability.
Starting off with precision, precision needs to be assessed across three different levels: the
precision of MMD at identifying good pairs for reusability, the precision of OCSVM at
identifying the correct node whose model to reuse, and lastly the combined precision of the
framework. In order for the MMD precision to be a meaningful measure, it is expressed in
terms of the ratio between the performance of using a proxy model and that of the true model.
We then compare this ratio against a threshold, and if it is above that threshold the decision
is considered correct. The thresholds we considered were 0.8, 0.85 and 0.9, which are
extremely high.
The OCSVM precision is calculated either strictly or with a 0.05 error margin; that is, if
the node pointed to by the direction of reusability does not yield the optimal performance,
but its performance is within 0.05 of the optimal, we consider that the framework has made
the right decision. Then, like the MMD precision, we consider that the framework made the
right decision if the ratio is above a threshold.
For the combined precision we utilized lower values for the threshold,
namely values 0.6 and 0.8 since when the components are combined this will
likely result in higher errors. Nevertheless, 0.8 is not only a high threshold but also a
common threshold across the MMD and combined precision, allowing us to track their
difference. The reason that we assess precision across three
levels is to be able to gauge how effective each component of the framework is

in isolation but also combined. Consequently, we can provide a more holistic


evaluation of the framework.
In terms of the speedup, we need to be able to quantify how much time we save by not training
some models with respect to the time we would need if we trained all of them. This requires
that we first identify the nodes for which we will not train a model. As discussed in Sect. 4,
as part of our framework we proposed a reusability-maximising decision-making algorithm
(Algorithm 4). The algorithm provides us with the nodes for which we do not need to train a
model, the model-less nodes, along with a list of potential replacement models. We utilise the
list of model-less nodes to calculate the speedup. It is worth noting that the speedup
potential varies across datasets and samples, hence we simply report it as a number.
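Since the paper reports speedup as a single number without giving its exact formula, the
following sketch only approximates it as the fraction of training work avoided, under our own
assumption that every node model costs roughly the same to train.

# Hedged approximation of the speedup metric: fraction of node models that are not
# trained, assuming (our assumption, not the paper's) uniform per-node training cost.
def approx_speedup(modelless_nodes, all_nodes):
    return len(modelless_nodes) / len(all_nodes)

# Example: 1 model-less node out of 4 GNFUV nodes gives 0.25, in the same range as
# the ~0.26 weighted average reported in Table 7.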

5.2 Performance Evaluation

As discussed in the previous Sect. 5.1, we assess the framework across two met-
rics, precision and speedup. In this section we evaluate these metric results one
by one for each dataset and provide a discussion around the effectiveness of the
framework.
In the following sections we discuss the precision results across the three lev-
els, followed by speedup. We will analyse each dataset’s precision individually
and then discuss the speedup across both datasets simultaneously. More specifi-
cally, in the case of the GNFUV dataset precision, we will provide observations
for each experiment before drawing general conclusions using the metric results.
Finally, we will draw some general conclusions on the applicability of the frame-
work in regression, the effect of the kernel and standardisation of data. Similarly,
for the BM dataset precision we will draw conclusions for the dataset, the appli-
cability of the framework in classification and the effect of using balanced and
unbalanced data.

Regression Precision. Original Data: Starting off with the GNFUV original
data, the combined precision is almost 1 for Experiment 1 and Experiment 3 if
we allow a 0.05 margin of tolerance in terms of the OCSVM predictions (non-
strict - as discussed in Sect. 5.1). The combined precision falls to 0.69 when we are
strict about the predictions because of Experiment 3. The combined precision for
Experiment 2 is low but that’s expected considering what we discussed above.
Therefore, the framework, when the threshold is set to 0.8, has a combined
precision of 0.59 with no tolerance and increases to 0.77 when there is. These
results are illustrated in Table 3. If we analyse combined precision per kernel,
the linear kernel is better suited for original data across all three experiments.
Similar trends to those discussed either when we do or do not distinguish per
kernel, can be found in the MMD precision and OCSVM precision (Table 4),
with MMD precision at 0.78 when the threshold is 0.8 and OCSVM precision at
0.79 when we are strict and 0.97 when we are not. It is wroth noting that the
OCSVM precision for Experiment 2 when the kernel is linear is as high (almost
1) as for the other two experiments which illustrates the importance of the kernel

choice. Upon further analysis, the linear kernel yields the best results on average for
the original GNFUV data, and hence the framework's high precision overall.

Table 3. GNFUV data combined precision results.

Data configuration  Experiment          Threshold  Strict = True  Strict = False
Original            1                   0.6        1.00           1.00
                                        0.8        1.00           1.00
                    2                   0.6        0.89           0.90
                                        0.8        0.46           0.47
                    3                   0.6        0.38           0.98
                                        0.8        0.38           0.98
                    Weighted average    0.6        0.77           0.95
                                        0.8        0.59           0.77
Standardised        1                   0.6        0.39           0.80
                                        0.8        0.39           0.80
                    2                   0.6        0.27           0.29
                                        0.8        0.25           0.25
                    3                   0.6        1.00           1.00
                                        0.8        1.00           1.00
                    Weighted average    0.6        0.46           0.63
                                        0.8        0.45           0.61

Standardised Data: The comments made previously for the combined precision of
Experiment 3 when we are strict cease to be true and are instead true for Exper-
iment 1, where the combined precision is low. Nevertheless, similarly to the original data,
the combined precision is extremely high for Experiments 1 and 3 when we are not strict with
OCSVM. The Experiment 2 combined precision is almost half
what it is for the original data at the 0.8 threshold. Consequently, the overall com-
bined precision of the framework drops at the threshold level 0.8, to 0.45 and 0.61
when we are strict and non-strict respectively (Table 3). Contrary to the origi-
nal data where the combined precision per kernel showed that the linear kernel is
better suited, for the standardised data the opposite is true, while this difference
is not significant. This is also true for the OCSVM precision when analysed per
kernel. Overall, the OCSVM precision for Experiment 2 drops (Table 4), hence
the OCSVM weighted average precision across experiments drops by 30%. On
the other hand, MMD precision increases slightly, by 4%, due to an increase in the
precision of Experiment 2. Upon further analysis per kernel, the MMD precision
increased 15% per kernel with the linear kernel providing much better results.

Table 4. GNFUV data OCSVM precision results.

Data configuration  Experiment          Strict = True  Strict = False
Original            1                   1.00           1.00
                    2                   0.94           0.95
                    3                   0.38           0.98
                    Weighted average    0.79           0.97
Standardised        1                   0.39           0.80
                    2                   0.29           0.33
                    3                   1.00           1.00
                    Weighted average    0.47           0.65

GNFUV Precision Performance Overall: Overall, the MMD precision of the


framework is high, however it is low for Experiment 2 across both original and
standardised data. The difference between the MMD precision of original and
standardised data is not high when the threshold is set at 0.8 (only 4%). However, as the
threshold increases so does this difference, to 10% and 13% for thresholds 0.85 and 0.9
respectively. This is because the performance on Experiment 2 is
better on standardised data and as the threshold increases it does not deteriorate
in the same way as for the original data. Considering how high of a threshold
0.9 is, having a 0.63 and 0.76 MMD precision is really good performance. The
kernel choice is not important for Experiment 1, when it comes to the MMD
precision since the performance is perfect regardless of the kernel and data con-
figuration. This is also true for Experiment 3 for the standardised data, while for
the original data the linear kernel is better suited for this experiment. For both
data configurations the linear kernel performs better for Experiment 2. Lastly,
in terms of the OCSVM precision when the original data are used, the kernel
choice is unimportant for Experiment 1, while for Experiments 2 and 3 the linear
kernel is better, even though for Experiment 3 the difference is not significant.
When the standardised data are used, the statements made for Experiments 1
and 3 are now reversed, with the slight difference that it is the rbf kernel that
is better for Experiment 1 instead of the linear one. Similarly, to Experiment 1
the rbf kernel is slightly better for Experiment 2 but the difference is only 0.07.
The framework performs better on the original data across all three levels of
precision with 0.77 (non-strict) combined precision at threshold 0.8 and a drop
of 15% for standardised ones. When analysing the combined precision results per
kernel, the linear kernel is better suited for the original data, while the opposite
is true for standardised data even though this difference is not large. The rbf
kernel models have higher performance on their native datasets compared to
linear ones, but nevertheless have higher discrepancy. Hence, on average linear
models provide better results.

Table 5. BM data combined precision results.

Data configuration  Threshold  Strict = True  Strict = False
Combined            0.6        0.55           0.98
                    0.8        0.55           0.98
Balanced            0.6        0.58           0.99
                    0.8        0.58           0.99
Unbalanced          0.6        0.56           1.00
                    0.8        0.56           1.00

Table 6. BM dataset OCSVM precision results.

Data configuration  Strict = True  Strict = False
Combined            0.55           0.98
Balanced            0.58           0.99
Unbalanced          0.56           1.00

Classification Precision. In terms of the classification performance of the


dataset, the BM Dataset results are very good. All the nodes in the BM Dataset,
have good performance on their native dataset, with balanced models having
slightly better performance. All pairs identified are good pairs for reusability on
both sides and the performance across configurations is almost identical. This is
confirmed by the combined precision depicted in Table 5 which is extremely high
regardless of whether we distinguish between the configurations or not if we are
not strict. If we are strict this performance drops at 0.55 on average and this is
a direct reflection of the OCSVM precision (Table 6). However, considering how
good the performance is overall, the real combined precision of the framework is
the one given by the non-strict measure.

Table 7. Framework speedup results.

Dataset  Experiment          Speedup (Standardised)  Speedup (Original)
GNFUV    1                   0.23                    0.26
GNFUV    2                   0.30                    0.28
GNFUV    3                   0.24                    0.23
GNFUV    Weighted average    0.26                    0.26

Dataset  Data configuration  Speedup
BM       Unbalanced          0.29
BM       Balanced            0.41

Speedup. Overall, the speedup of the framework for the particular datasets
used for regression and classification are 26%, and 29% to 41% respectively
(Table 7). These results are expected if you consider that for the GNFUV dataset
regardless of the data configuration on average there is one good pair for reusabil-
ity hence one node’s model is not trained. The two data clusters created from
the BM dataset mean that ideally we would only train two models. Nevertheless
the results are lower than this average case due to the fact we use samples of
the dataset, hence the true reusability differs from sample to sample. Hence, for
both the classification and regression cases we can argue that the framework is
effective in identifying the true number of similar pairs.

6 Discussion and Conclusions

In this paper, we presented a novel online model reuse framework in edge com-
puting. The framework considers all possible pairs of nodes in the network and
infers which are good reusability pairs as well as which of the two nodes’ model
can be used as a replacement model for the other per pair. We utilise MMD as
our dataset similarity measure and we present a newly defined algorithm which
calculates a threshold that distinguishes similar from non-similar pairs. The node
model that is chosen to be reused in each pair is the one with the highest inlier
data space overlap. Experiments in the context of both regression and classifi-
cation have shown the framework achieves good precision. Lastly, we present an
algorithm that, given the results of the framework, can maximise the number of
nodes which use reused models along with a list of potential replacement models.
The framework presented is novel, and therefore the results presented in this paper, while
encouraging, are still preliminary. We experimented with only one model per data domain and a
limited range of data configurations. Consequently, the evaluation of the framework needs to
be extended to check compatibility with more domain models and data configurations. Even
though this framework in its current form does not preserve user privacy, it could be amended
to meet this requirement. In this paper, we hypothesise that the inlier space overlap is an
indicator of the direction of reusability. However, we only consider one outlier detection
model and there are many more that could be used. Further-
more, the naive decision making algorithm proposed as part of the framework is
maximising the speedup, which does not guarantee that the solution is optimal
performance wise. Defining an algorithm which can produce either the perfor-
mance optimal or partially optimal solution is a different and challenging task
altogether.

Acknowledgment. This research has received funding from the European Union’s
Horizon 2020 research and innovation programme under Grant Agreement no.
101037247.

References
1. Cramer, J.S.: The origins of logistic regression. Tinbergen Institute, Tinbergen
Institute Discussion Papers, 01 (2002)
2. Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A., Vapnik, V.: Support vector
regression machines. In Proceedings of the 9th International Conference on Neural
Information Processing Systems, NIPS1996, pp. 155–161, Cambridge, MA, USA
(1996). MIT Press
3. Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel
two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012)
4. Harth, N., Anagnostopoulos, C.: Edge-centric efficient regression analytics. In: 2018
IEEE International Conference on Edge Computing (EDGE), pp. 93–100 (2018)
5. Hasani, S., Thirumuruganathan, S., Asudeh, A., Koudas, N., Das, G.: Efficient
construction of approximate ad-hoc ML models through materialization and reuse.
Proc. VLDB Endow. 11(11), 1468–1481 (2018)
6. Lee, J., Mtibaa, A., Mastorakis, S.: A case for compute reuse in future edge systems:
an empirical study. In: 2019 IEEE Globecom Workshops (GC Wkshps), pp. 1–6
(2019)
7. Li, C., Huang, S., Liu, Y., Zhang, Z.: Distributed jointly sparse multitask learning
over networks. IEEE Trans. Cybern. 48(1), 151–164 (2018)
8. Li, Y., Zhang, Z., Liu, B., Yang, Z., Liu, Y.: ModelDiff: testing-based DNN similar-
ity comparison for model reuse detection. In: Proceedings of the 30th ACM SIG-
SOFT International Symposium on Software Testing and Analysis, ISSTA 2021,
pp. 139–151, New York, NY, USA (2021). Association for Computing Machinery
9. Moro, S., Cortez, P., Rita, P.: A data-driven approach to predict the success of
bank telemarketing. Decis. Support Syst. 62, 22–31 (2014)
10. Schölkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., Platt, J.: Support
vector method for novelty detection. In: Proceedings of the 12th International
Conference on Neural Information Processing Systems, NIPS1999, pp. 582–588,
Cambridge, MA, USA (1999). MIT Press
11. Wei, Y., Zhang, Y., Huang, J., Yang, Q.: Transfer learning via learning to transfer.
In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference
on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, pp.
5085–5094. PMLR, 10–15 July 2018
12. Wu, X.-Z., Xu, W., Liu, S., Zhou, Z.-H.: Model reuse with reduced kernel mean
embedding specification. arXiv preprint arXiv:2001.07135 (2020)
13. Zhao, P., Cai, L.-W., Zhou, Z.-H.: Handling concept drift via model reuse. Mach.
Learn. 109(3), 533–568 (2019). https://doi.org/10.1007/s10994-019-05835-w
14. Zhou, Z.-H.: Learnware: on the future of machine learning. Front. Comp. Sci. 10(4),
589–590 (2016). https://doi.org/10.1007/s11704-016-6906-3
Survey of Technology-Enhanced Learning:
Novel Pedagogical Concepts, Challenges
and Future Perspectives

Tarandeep Kaur(B) and Shubhpreet Kaur

Lovely Professional University, Punjab, India


[email protected]

Abstract. Learning driven by technology is gaining momentum with the advent


of novel digital innovations. Currently, there is a predominant urge for facilitating
learning and teaching backed up by the neoteric digital technologies. The provi-
sion of Technology-enhanced Learning (TEL) is enabled with the presentation of
base technologies as well as proficient learning frameworks. Recently, edge com-
puting has advanced as a novel computing model that offers efficient purveying
of the learning technologies. Edge computing supports provisioning of TEL in
everyday circumstances where response time establishes the important constraint
for many applications. This paper describes the concept of TEL and the role of
edge computing in enabling and supporting it. It highlights the other key role play-
ers in facilitating TEL. As the number of devices connected across the network
and to the edge is increasing, there is a rising need for processing, storing, and
managing the large amounts of data that is generated. The paper also discusses
novel pedagogical concepts and several challenges daunting the delivery of TEL
particularly through edge-systems.

Keywords: Technology-enhanced learning · Education-as-a-service · Edge


computing · Information technology · Internet of things · Information and
communication technology

1 Introduction
The education sector is becoming increasingly integrated with Information Technology (IT) regarding the distribution of, communication around, and collaboration on learning and training materials. The IT support involves provisioning of infrastructure (such as hardware, software, networking, and storage), computing technologies, and the people who work, or will work, with the technologies connected over the Internet.
Many education systems have IT departments to deal with and manage technological advances in computers, networks, and other business operations. The IT-enabled dissemination of education helps promote knowledge-sharing opportunities for instructors and learners, as well as helping to implement the latest technological innovations [1]. Nowadays, IT has made the learning process more efficient and beneficial by offering IT support for institutions, universities, and schools with the help of software, servers,


and storage, conclusively facilitating the e-learning approach. This has increased the well-being of learners, who can take advantage of technological training methods and are able to exchange books for electronic devices (tablets and laptops). The emergence and adoption of e-learning platforms has further simplified the process. Students who cannot come to their institution for pressing reasons, and who therefore find it difficult to connect with their instructors in the classroom, find the e-platforms the best option for learning. These platforms give learners the chance to review the course material at any moment, with simpler and more concise explanations; this strengthens the learning process and, for most learners, brings better results [2].
Technology-enhanced learning (TEL) is playing a vital role in distance education as well as regular-mode online learning. It has emerged as a guide for one or more learners located at geographically distant locations. TEL is used by educational institutions to support their learning process and to provide training materials and information in real time, irrespective of locale. TEL includes several hands-on implementations of many learning technologies and facilitates the exploration of various processes for designing, implementing, and critiquing e-learning, including the pioneering notions underpinning such processes [3, 4]. Virtual learning and training are being provided extensively with the advent of novel technologies offering many benefits such as extensibility, diversification, scalability, etc. [3].
In India, the growth of e-learning in the last few years has been an important catalyst.
Owing to the present COVID-19 scenarios, TEL has completely engulfed the education
sector. There are different platforms (Moodle Cloud, Canvas, Blackboard etc.) that offer
real-time classroom environments. Several online degrees and certifications (through online learning platforms such as Coursera, edX, and Udemy) are feasible by virtue of TEL. Not only this, but most Indian companies are also adopting e-learning platforms because learning has become a strategic necessity for employees. Large companies are adopting or engaging with e-training platforms to provide their employees with short-term courses, certifications, and capacity-based training [5].

1.1 Technology-Enhanced Learning (TEL)

In recent years, the concept of Technology-Enhanced Learning (TEL) has gained significant prominence. There are technology-based training and instruction systems through which students acquire skills or knowledge, usually with the help of teachers or facilitators, learning tools, and technical resources [6]. However, being open to a very wide range of interpretations, TEL is not limited to any particular technical or educational approach. Broadly, it covers all situations where technology plays a key role in delivering training, bringing productivity, effectiveness, and enjoyable learning.
A variety of technologies can be used to enhance training. In its broad sense, "technology" can include both hardware, such as collaborative whiteboards, smart tables, handheld devices, and sculptural objects, and software, such as hardware-aided learning systems, Training Management Systems (TMS), simulation modelling tools, electronic learning materials and science fix statistics, educational games, Web 2.0 social applications, 3D virtual reality, etc. Specific examples are technologically advanced and creative small machines such as MIT's OLPC (One Laptop per Child) or Intel's Classmate PC, which have emerged as drivers of educational transformation in emerging countries. In addition to PCs, institutions and other organizations provide what can be called "smart furniture". Creators are also incorporating intelligence into classroom furniture in other ways, for example, interactive whiteboards on which content can be shown to the whole group [7].
In addition to "traditional" desktop computing, where teachers access the world of digital learning through screens, electronically embedded physical objects now enable new types of learner interaction. Subsequently, software technology is evolving and improving, and new applications continue to appear. The improvement of student learning prepares students to use such technology effectively in their future workplace and/or enables staff to be more productive [8].

1.2 Edge Computing

Edge computing has developed into a new concept in the computing landscape. Edge computing moves cloud computing data, applications, and services to the edge of a network, away from the central cloud server. It brings cloud computing services and facilities closer to the end user and is characterized by more rapid processing and faster response times [9]. Content providers and application developers can use the edge computing system to offer services to their immediate users.
According to IBM, edge computing is "a distributed computing framework that brings enterprise applications closer to data sources such as Internet of Things (IoT) devices or local edge servers. This proximity to resources and data can deliver significant business benefits: more rapid insights, improved response times, and better bandwidth availability" [10].

Fig. 1. Relationship between Edge Computing, Fog Computing and Cloud Computing

Edge computing is an innovative extension of cloud computing that brings services closer to the destination users and reduces latency. It provides cloud resources and services within the edge network and minimizes the load on the cloud. Figure 1 shows the relationship between edge computing, cloud computing, and fog computing using technologies and IoT elements such as devices, nodes, and data centers.
Fog computing represents the network links between edge computing devices and the cloud, whereas edge computing refers specifically to computing processes performed near edge devices. Therefore, fog computing involves edge computing as well as the networks needed to send the processed data to its destination; in other words, it is a standard that explains how edge computing should work. Because data is processed on the device, the fog computing model creates a faster control loop. Compared to traditional cloud-based networks, edge computing helps institutions break down the boundaries set by conventional network architecture. The exciting promise offered by edge and IoT devices is the enhanced ability to process data close to where it is collected [11].
There are some apparent benefits of edge computing over traditional technologies like cloud computing that give it superiority (particularly in provisioning TEL), as listed below and depicted in Fig. 2 [12]:

• Enhanced Learning in a Secure Environment: The architecture of cloud computing is compact and complicated, so it is exposed to outages and other cyber-attacks. Edge computing solutions, on the other hand, distribute all phases of data storage across different devices, such as sensors, transporters, cameras, and data centers, making the entire network more secure and less exposed. In edge networks, less of the complex and key data is transferred and stored remotely [13].
• Enhanced Operational Speed & Performance of Business: There is extensive collaboration of edge with IoT, where the edge-based IoT devices collect, process, and analyse locally available data close to its source. This reduces the data's physical transfer time and latency, so the data can be used in real time without delay. Because the edge-based IoT devices operate autonomously, the stability of the Internet connection is no longer critical [14].
• Scalability: Edge computing allows the entire IoT (network and computing) capability to be increased using additional IoT devices and data centers, independently of the available cloud storage. Scalability has been made easier for businesses and education through the development of edge computing. With its help, companies leverage edge systems to expand their edge network's reach and abilities.
• Reliability: By processing and managing data close to the source, edge computing moderates the amount of data flowing in and out of the primary network, resulting in lower latency and faster overall speed. Physical distance plays an important role in performance. By locating edge systems in data centers geographically close to the users and distributing the resulting processing, companies can greatly reduce the distance data must travel before services are provided. These edge networks ensure a fast, seamless experience for their users, who expect access to their content and operations on demand, anytime and anywhere.

Fig. 2. Benefits of Edge Computing

• Integration capabilities: Edge-based IoT devices, as well as edge computing software, can be easily integrated with other equipment, devices, platforms, and applications, thus providing better usability. Edge computing empowers IoT devices to store large amounts of authorized data. In contrast to cloud computing, where people need to log in to devices and interact directly with centralized cloud servers, computing devices in edge systems are always connected and continuously generating data for future analysis [15].

2 Novel Pedagogical Concepts in Technology-Enhanced Learning


Technology-enhanced learning (TEL) is a useful tool for teaching and learning. In
the wake of increasing use of TEL, “new” pedagogical concepts have been identified
and developed [16]. These concepts have informed educators in their development of
effective practices in the use of technology-enhanced learning.

2.1 Practices and Pedagogical Approaches in TEL


Many exercises and practices have been identified as effective pedagogical approaches in TEL (as shown in Fig. 3), which include [17]:

1. Promotes interaction between students and faculty in and out of the classroom.
2. Active Learning: Active learning is a conceptual framework that captures the current trend towards skills and training as opposed to classical knowledge learning. It is encouraged in the classroom using structured exercises, challenging discussions, team projects, and collaborative critiques.
3. Feedback Oriented: Digital feedback is advancing more quickly with the advent of learning analytics. Data can now be collected unobtrusively during learning activities. Learners need timely feedback on their performance to benefit from the courses.
4. It is important for students and professionals alike to learn to use time wisely.

5. Communicates high expectations.


6. Smart Pedagogy: The concept of smart pedagogy is triangular where the important
cornerstones are:

(a) Human developmental regularities, which include the conditions for the devel-
opment of cognitive processes, the conditions for sensory development, as well
as the conditions for socio-emotional development.
(b) The taxonomy of the educational process, which includes the goals to be
achieved and the regularities of the learning process needed to achieve these
goals.
(c) Technological progress, which entails the need for changes in teachers’ ped-
agogical competence, where one of the most important components of this
competence is predictive analytical competence.

7. In-group discussion enhances learning.
8. Provides a diversified delivery system.
9. Regularities of human development, which include conditions of development of
cognitive processes, conditions of sensory development, as well as conditions of
socio-emotional development.

Fig. 3. Technology Enhanced Learning Pedagogical Approaches

2.2 Technology-Enhanced Learning vs Traditional Learning


TEL is steadily and significantly replacing traditional learning methods. Persistent technological innovation has produced many gadgets for TEL, such as smartphones, tablets, and laptops, through which one can study subjects of one's choice. TEL implies that students and institutions are increasingly able to follow specific areas of study, unbundled from complete programs and degrees. Comparatively, in traditional learning, students are gathered under one roof at a specific time and place. The teaching style of the traditional education system is teacher driven. Learners discuss with their peers to clear their doubts or interact with the instructor after class to do the same. Consequently, the knowledge attained by the learner depends on the knowledge of the instructor [18]. Table 1 lists the comparisons between TEL and traditional learning.

Table 1. Comparison between Technology-Enhanced Learning and Traditional Learning

Comparison parameters         Technology-enhanced learning        Traditional learning
Learning                      Anywhere, anytime, and anyplace     Fixed time and place
Focus                         Learner-oriented                    Teacher-oriented
Discussion with teachers      Asynchronously and synchronously    Synchronously in classroom
Exercises and activities      Many                                Limited
Discussion and communication  Variety of channels                 Limited
Assessment                    Automatic by system                 Manual by teacher
Feedback                      Provided through system             Directly provided through teachers and society
Deviation from objectives     Very easy                           Not so easy

3 Key Role Players in Facilitating Technology-Enhanced Learning

TEL supports the use of technology for training and learning, with the focus on improving the quantity and quality of knowledge gained. The increasingly learner-centric evolution of learning parallels the rapid changes in TEL; there have been important modifications in the roles and duties of teachers and in education procedures. Students are supported in their education under guided expertise and can learn without time and place constraints by utilizing these technologies. This helps reduce the limitations students face in the classroom. Additionally, thriving IT innovations offer various options for the development and growth of TEL [19]. Educators need to be able to manage a variety of e-learning programs and to change the classroom from a fixed mode to a learner-centred environment with a dynamic teacher-to-student method [20]. Certain aspects form the backbone of TEL, including computing, teachers, learners, and the IT sector.

3.1 Role of Teachers


Nowadays, teachers are engaged not only in designing, redesigning, and adapting TEL, but also in creating or preparing the associated training resources and work materials. The term "design" is often used to describe a mapping method for developing specialized resources for training or learning.
Teachers also play an important role in research, which has three main areas; the research is not limited to teachers, as a growing share of it is carried out by designers of technology-enhanced training. The research areas involve design knowledge for teachers, creating evidence-based ideas to support teacher design practices, and supporting teacher designers for a variety of purposes, sometimes within multi-skilled specialist teams.
Teachers working in teams can play a variety of roles, such as adapting existing materials and activities or redesigning them completely. In addition to the various aspects of the teacher design process, individuals and teams need a variety of knowledge to engage with the process and its design products. This usually involves: (a) designing and redesigning one's personal practice, which teachers have come to understand; (b) evidence-based customization; and (c) team design within an organization [21].

3.2 Role of Information Technology


IT includes computers and related technologies; it can be used to promote information dissemination opportunities through means such as the WWW, the Internet, and video conferencing. Learners can communicate effectively with everyone in the environment, share information, exchange ideas, and work in two-way, collaborative learning situations. IT can help educators and learners stay up to date with the latest material and understanding [22].
Using IT, students can make better decisions about their training, learning time, location, and resources. Learners can work in a more supportive environment, get help from tutors and peers, and share their learning experiences and ideas in an informal way [23].
Broadly, IT has been used by companies to study, send, receive, store, and edit information; it is widely used in business organizations and now in education. Many school systems are using IT to provide students with a better understanding of difficult concepts in the classroom and at home. IT supports education in various ways:

• IT helps teachers and administrators keep an eye on all students in the classroom
• IT has made education easier for both teachers and learners
• IT enables education using digital books
• IT has made education fun and entertaining
• IT has made research and information easy to access
• IT has made group study and application easier

3.3 Role of IT-Based Edge Computing

TEL is the most important and latest trend in the present ICT era. TEL is transforming and improving learning and educational institutions beyond recognition, taking over learning in the form of different types of educational software. As the education sector learns to harness the power of classroom devices, edge computing is having a profound effect on classrooms.
Edge computing technology enhances educational applications and provides a platform for speed rather than slowing them down or shutting them out. It is a technology set for high future growth and will dramatically improve day-to-day operations for many industries, including education.
Edge computing decentralizes computing resources and brings them closer to the data source. When schools use edge computing, they prioritize connectivity and networking across multiple campuses to eliminate slow speeds, which dramatically improves the student and teacher experience [24]. With edge networks, computing and data storage sit close to the person, application, or device producing the data [25].
Edge computing is already improving several higher education applications, such as the quality of experience for end users and network traffic management. Learning and technology experts point to three ways in which edge computing gives classroom education a boost [26]: (a) augmented and virtual reality, (b) the Internet of Things, and (c) student outcomes.
Edge computing continues to evolve, using new technologies and practices to enhance its capabilities and performance. Where edge computing is often situation-specific today, the technology is expected to become more ubiquitous and change the way the Internet is used, bringing more abstraction and potential uses to edge technology. Wireless communications technologies, such as 5G, will also impact the deployment and usability of the edge in the coming years, enabling virtualization and automation capabilities that are yet to be explored, such as the migration of vehicle autonomy workloads to the edge, while making wireless networks more flexible and cost-effective [27].

3.4 Role of Learners


TEL serves a new generation of learners who have grown up with technology, developing technical skills and learning preferences along the way. Learners have become more technology-savvy than ever before, and much of their day is spent interacting with some form of technology. Learner access to technology should nevertheless not be assumed. For example, the vogue of smartphones and tablet devices (e.g., iPads, mobile phones) is undeniable. But this does not mean either that all learners have access to such a kit or that all learners have sufficient technical skills to use it well to support their learning. It is a 'dangerous assumption' to think that 'students living in new media environments automatically comprehend how to use the new technologies' [28].
Teachers make use of this growing technology savviness by using technology to improve interaction and understanding within their classrooms and lectures. Teachers are also reinforcing technology enhancement through the learners: the rise of remote and hybrid learning, learners learning at their own pace, technology keeping learners engaged, and technology being necessary to succeed in the real world.

With TEL, teachers are no longer limited to the textbooks that their organizations pro-
vide. Using other resources such as video, audio, and interactive learning, learners have
many ways to learn. Educators can find creative ways to teach their students in an inter-
esting way. Technology has changed the learning environment so that learning is more
hands-on [29].

4 Challenges in Facilitating Technology-Enhanced Learning Through Edge Computing

In edge computing, learners' and teachers' access to their files and applications depends on the availability of high-speed internet access and the reliability of the cloud. If proper authentication is not in place, anyone can access files anytime, anywhere; this is a security concern that must be addressed [30]. Similarly, other challenges may occur while facilitating TEL through edge computing. These are:

1. Strong Internet Maintenance and Availability: This is most important for technology-enhanced training in education. Several cloud-based services are used in instructive contexts, particularly for interaction and collaboration, which are very sensitive to the performance and continuity of the network. Therefore, broadband connections should be accessible for users to enjoy a proper educational experience. On the other hand, providing more broadband capacity in educational institutions can increase the cost of communication services, which compromises the promised cost savings [31].
2. Cost: Cost is also a problem with technology in the classroom because of the expense of bringing new technology into classrooms, such as computer systems and devices, networks, etc. Not all students have a budget able to sustain this combination. There are, however, several approaches that can be applied to incorporate technology into the curriculum without breaking the bank [32].
3. Issues of Privacy: The privacy of sensitive data is important in the instructive domain. Some contributors have suggested that edge computing may be more protective of these data than traditional distribution systems. Nevertheless, security remains a big challenge at the edge. Even those who believe that edge computing will localize security concerns must recognize that as the number of data processing sites increases, so does the attack surface; in addition, every data processing device runs the risk of becoming an attack vector [19, 33, 34].
4. Power: This is a big challenge for students and teachers. For multi-tenancy needs, technology enhancements will require high-power processors to provide cloud-like remote services to learners and teachers. Therefore, learning environments will need high-voltage, three-phase power, which can be difficult to provide, especially in remote areas [19, 35–38].

5 Conclusion and Future Perspectives


Today, TEL has become an essential part of our daily lives. As in every field, education must continue to change with the help of technology. When this transformation is carried out properly, we can say that the learning process will be positively affected. Technology-enhanced learning environments not only encourage the transfer of content but also support the use of robust re-evaluation methods. These environments are directed towards the active participation of teachers and students and interaction between them. The use of a technology-intensive learning environment contributes to the development of students' analytical thinking and problem-solving skills. It also allows teachers to follow the learner's progress, organize the feedback system, and monitor their own situation. In this paper, we have presented the pedagogical approaches of technology-enhanced learning and the key role players (teachers, edge computing, information technology, and learners) in TEL.
Edge computing directs cloud computing data, applications, and services to the edge
of a network away from the cloud server. It brings cloud computing services and facilities
closer to the end-user and is characterized by faster processing and faster response times.
Edge computing visualizes bringing services of cloud computing and utilities closer
to the end-user for ensuring fast processing of data-intensive applications. This paper
widely considers the essential concepts related to edge computing, presenting how edge
computing is used in education enabling TEL in education. It summarizes an analysis
of possible challenges in offering TEL through edge computing.
The future research can be directed towards offering knowledgeable material in
offline and online modes so that the future researchers can enhance their knowledge
irrespective of their location and time zones. Additionally, predictive analytics can be
integrated with the TEL such that student performances can be analysed as well as the
instructors can also utilise the predicted data for improvising their teaching approaches
or pedagogies.
Currently, the Information and Communication Technology (ICT) sector is showing increasing reliance on novel IT technologies such as cloud computing, fog computing, and edge computing. Many ICT sectors are based upon these technologies or an amalgamation of them. The TEL sector can likewise involve various educational frameworks and models based on such underlying technologies.
Moreover, the world today is facing severe environmental sustainability issues pertaining to rising global warming and carbon emissions. The IT sector is regarded as a prime carbon and greenhouse gas (GHG) emitter. TEL, being enabled through IT, must be capable of furthering environmental sustainability goals. The United Nations has also emphasized enhancing environmental sustainability through TEL. TEL can be made greener through efficient, technology-based green resource management, optimized to support a green and resource-aware learning environment and promote sustainable education for a sustainable future.

References
1. Srinivasan, A., Quadir, M.A., Vijayakumar, V.: Hybrid cloud for the educational sector.
Procedia Computer Sci. 50, 37–41 (2015)
2. Li, H., Ota, K., Dong, M.: Learning IoT on edge: deep learning for the Internet of Things
with edge computing. IEEE Network 32(1), 96–101 (2018)
3. Selviandro, N., Hasibuan, Z.A.: Cloud-based e-learning: a proposed model and benefits by
using e-learning based on cloud computing for educational institutions. In: Information and

Communication Technology-EurAsia Conference, pp. 192–201. Springer, Berlin, Heidelberg


(2013). https://doi.org/10.1007/978-3-642-36818-9_20
4. Kaur, S., Kaur, T., Sharma. A.: Cloud-Enabled Education-as-a-Service (EaaS)- a review.
In: the 6th International Conference on Information and Communication Technology for
Sustainable Development. Springer, Goa (2021). https://doi.org/10.1007/978-981-16-5987-
4_40
5. https://www.affirmednetworks.com/key-capabilities-of-edge-computing/ Accessed 29 Mar
2021
6. Wang, F., Hannafin, M.J.: Design-based research and technology-enhanced learning environ-
ments. Educ. Tech. Research Dev. 53(4), 5–23 (2005)
7. Goodyear, P., Retalis, S.: Technology-enhanced learning: Design patterns and pattern
languages. BRILL (2010)
8. Kaur, T.: Challenging the cloud. J. Gujarat Res. Society 21(6), 521–531 (2019)
9. Khan, W.Z., Ahmed, E., Hakak, S., Yaqoob, I., Ahmed, A.: Edge computing: a survey. Futur.
Gener. Comput. Syst. 97, 219–235 (2019)
10. https://internationalbanker.com/technology/the-importance-of-edge-computing/ Accessed
10 Apr 2021
11. https://www.infosys.com/services/incubating-emergingtechnologies/offerings/documents/
edge-computing.pdf Accessed 29 Mar 2021
12. https://www.byteant.com/blog/how-edge-computing-will-transform-most-industries-future/
Accessed 31 Apr 2021
13. Hassan, N., Gillani, S., Ahmed, E., Yaqoob, I., Imran, M.: The role of edge computing in the
internet of things. IEEE Commun. Mag. 56(11), 110–115 (2018)
14. https://www.bmc.com/blogs/edge-computing/ Accessed 8 Apr 2021
15. Chris, C.: How To Use Education-as-a-Service “EaaS” For A Tech Economy https://woz-
u.com/blog/how-to-leverage-education-as-a-service-eaas-and-tap-into-the-tech-economy/
Accessed 10 Apr 2021
16. Bailey, C.J., Card, K.A.: Effective pedagogical practices for online teaching: perception of
experienced instructors. Internet and Higher Educ. 12, 152–155 (2009)
17. Sen, A., Leong, C.K.C.: Technology-Enhanced Learning: Springer Nature Switzerland AG
1–7 (2020)
18. Li, F., Qi, J., Wang, G., Wang, X.: Traditional classroom vs e-learning in higher education: dif-
ference between students’ behavioral engagement. Int. J. Emerging Technologies in Learning
(iJET) 9(2), 48–51 (2014)
19. Barindra, D.: Traditional Learning Vs. Online Learning https://elearningindustry.com/tradit
ional-learning-vs-online-learning Accessed 9 Apr 2021
20. Emma, C.: What is Technology Enhanced Learning, and why is it important? https://www.
mentimeter.com/blog/interactive-classrooms/what-is-technology-enhanced-learning-and-
why-is-it-important Accessed 12 Apr 2021
21. Kali, Y., McKenney, S., Sagy, O.: Teachers as designers of technology enhanced learning.
Instr. Sci. 43(2), 173–179 (2015)
22. https://www2.deloitte.com/uk/en/pages/risk/articles/edge-computing-purpose-to-potential.
html Accessed 11 Feb 2021
23. Chana, I., Kaur, T.: Delivering IT as A Utility-A Systematic Review. arXiv preprint arXiv:
1306.1639 (2013)
24. Hussain, I., Safdar, M.: Note for editor: role of information technologies in teaching learning
process: perception of the faculty. Turkish Online J. Distance Educ. 9(2), 46–56 (2008)
25. https://edtechmagazine.com/k12/article/2018/08/how-edge-computing-could-benefit-k-12-
classrooms Accessed 9 Apr 2021
26. https://innovationatwork.ieee.org/3-ways-edge-computing-gives-classroom-learning-a-
boost/ Accessed 7 July 2022

27. https://www.techtarget.com/searchdatacenter/definition/edge-computing Accessed 7 July


2022
28. Bullock, A., de Jong, P.G.: Technology-enhanced learning. In: Understanding Medical
Education: Evidence, Theory and Practice, pp. 149–160 (2013)
29. https://www.mentimeter.com/blog/interactive-classrooms/what-is-technology-enhanced-lea
rning-and-why-is-it-important
30. Kaur, T., Chana, I.: Energy efficient cloud: trends, challenges, and future directions. In:
International Conference on Next Generation Computing and Communication Technologies
(ICNGCCT 14) (2014)
31. https://datacenterfrontier.com/how-edge-computing-solves-challenges-with-digital-transf
ormation/ Accessed 12 Apr 2021
32. https://www.cloudflare.com/en-gb/learning/serverless/glossary/what-is-edge-computing/
Accessed 30 Mar 2021
33. H880 | Technology-enhanced Learning | Open University Accessed 7 Apr 2021
34. Shilpa, T.K.: Digital healthcare: current trends, challenges and future perspectives. In: Future
Technologies Conference (FTC), Canada, October-2021
35. Kaur, T., Chana, I.: Energy efficiency techniques in cloud computing-a survey and taxonomy. ACM Comput. Surv. 48(2), 146 (2015). https://doi.org/10.1145/2742488
36. Kaur, T., Chana, I.: Energy aware scheduling of deadline-constrained tasks in cloud
computing. Clust. Comput. 19(2), 679–698 (2016). https://doi.org/10.1007/s10586-016-
0566-9
37. Kaur, T., Chana, I.: GreenSched: an intelligent energy aware scheduling for deadline-and-budget constrained cloud tasks. Simul. Model. Pract. Theory 82, 55–83 (2017). https://doi.org/10.1016/j.simpat.2017.11.008
38. Suleiman, M.M., Kaur, T., Kuliya, M.: Impact of ICT for the 21st century: a change driving
tools for tertiary education in Nigeria. Int. J. Manage. Humanities 4(10), 42-49, (2020)
True-Ed Select Enters Social Computing:
A Machine Learning Based University
Selection Framework

Jerry Cearley and Vivek K. Pallipuram(B)

University of the Pacific, Stockton, CA 95211, USA


[email protected]

Abstract. University/College selection is a daunting task for young


adults and their parents alike. This research presents True-Ed Select,
a machine learning framework that simplifies the college selection pro-
cess. The framework uses a four-layered approach comprising user sur-
vey, machine learning, consolidation, and recommendation. The first
layer collects both the objective and subjective attributes from users
that best characterize their ideal college experience. The second layer
employs machine learning techniques to analyze the objective and sub-
jective attributes. The third layer combines the results from the machine
learning techniques. The fourth layer inputs the consolidated result and
presents a user-friendly list of top educational institutions that best
match the user’s interests. We use our framework to analyze over 3500
United States post-secondary institutions and show search space reduc-
tion to top 20 institutions. This drastically reduced search space facili-
tates effective and assured college selection for end users.

Keywords: Machine learning · Social computing · Education ·


College selection

1 Introduction

Each year, many young individuals (and their parents) throughout the world
face a burning question: “Which university or college to attend? How much
tuition will I pay? Will they help me get a job? Is it worth it?” Considering
the magnitude of this decision and its impact on the prospective students, it
is reasonable to examine as many schools as possible before deciding on which
school to attend. With over 5000 schools to choose from in the United States
alone, the task of fully investigating each school’s alignment with that of the
student’s goals and desires can be daunting and unrealistic. There are far too
many institutions to examine and not enough time for anyone to review all of
them to make the most beneficial decision possible.
To address the above challenge, this paper delves into social computing
and presents True-Ed Select, a machine learning framework to facilitate uni-

versity/college1 selection. Using a four-layer approach, the framework examines


users’ goals and desires to drastically reduce the search space from thousands of
universities/colleges to a list of top 20. In this research, we specifically focus on
medical school aspirants.
In the first layer, the framework inputs five objective attributes that influence
a user’s decision. These include the distance from home, availability of financial
aid, availability of the desired major, career services, and the cost of attendance
(tuition and living expenses). The framework also inputs a subjective attribute:
a description of the user’s ideal college experience (henceforth natural language
summary).
In the second layer, the framework leverages two machine learning (ML)
techniques to study the objective and subjective attributes. These ML techniques
include the content-based filtering (CBF) [14] and convolutional deep semantic
similarity model (CDSSM) [4], respectively. CBF inputs the objective attributes
and compares them against the attributes of universities existing within the
Integrated Postsecondary Education System (IPEDS) database [1]. For each
user, CBF outputs an objective score (ranging from –4 to 5) that denotes the level
of match between the user and the institution. The higher the score, the better
the match. CDSSM analyzes the natural language summary and compares it
against the mission statements of institutions in the IPEDS database to generate
a CDSSM score between 0 and 1.
In the third layer, the framework breaks ties between institutions with similar
CBF scores. Specifically, the framework uses the institutions’ CDSSM scores as
weights to scale their CBF scores. The scaling results in True-Ed Select (TES)
scores, which are used in the fourth layer to generate a curated list of educational
institutions that best match with users’ interests.
The rest of the paper is organized as follows. Section 2 discusses the literature
in the area of college selection and machine learning. Section 3 provides a back-
ground on CBF and CDSSM machine learning techniques. Section 4 explains the
True-Ed Select framework. Section 5 shows the framework in action. The paper
concludes in Sect. 6 with conclusions and future work.

2 Related Work

Several research activities target the construction of recommendation systems


using machine learning. Sharma et al. [12] develop a web architecture that pro-
vides college recommendations at the graduate level to undergraduate users.
Their method does not address the semantic complexity inherent in string rep-
resentation for information retrieval. Nikhil et al. [9] seek to address the seman-
tic structure problem by applying convolution to a deep structured similarity
model. Their approach does not include the relevant objective attributes that
characterize the target and influence the user’s decision.

¹ The terms 'university', 'college', 'institution', and 'school' refer to a post-secondary educational institution and may be used interchangeably.

Rutkowski et al. [11] apply a neuro-fuzzy approach to a content-based recommendation system to learn user decision paths for simulation and prediction. However, their method requires a record of past user history to model the user decision paths, making it inoperable when user information is absent. Additionally, the model lacks the semantic structure present in natural language processing, which extracts the user's sentiments. Nayak et al. [8] use fuzzy cognitive maps to model university desirability and selection by gathering data from cohort groups within a single university. This approach does not account for the variability that exists across users.
Shen et al. [13] present a robust CDSSM model that produces latent semantic structures using convolutional-pooling structures applied to word-n-grams for information retrieval. Their framework does not account for objective user-feature selection such as the one used in our research.
Hu et al. [5] propose a revised analytic hierarchy process for university selection using a primitive cognitive network for criteria-importance comparison. The technique applies to objective criteria gathered from a small subset of institutional and user preference data.
Lee et al. [6] apply a rule-based inference system derived from a psychological test to produce university program recommendations. Their system, while unique, requires a substantial time investment by the user to complete the questionnaire and relies on subjective rule-based inferences obtained from the Holland Personality Test as opposed to contemporary machine learning techniques.
Powar et al. [10] propose a system based on sentiment analysis that processes user and alumni textual rankings to provide similarity scores to the user. Their proposed system, while good in theory, fails to provide any concrete evidence of the implementation and efficacy of the system.
Muladi et al. [7] analyze the performance differences between Naive Bayes and Naive Bayes plus SMOTE classification systems to determine the most accurate model for predicting university student selection. The model uses established institutional criteria for university entrance selection, improving upon Naive Bayes alone by adding SMOTE to account for imbalanced classes.
Zhao et al. [15] use a collaborative filtering approach to determine user recommendations based on user service network evaluations and trust level scores. This approach studies a broad set of similar users but fails to focus on the user's individual preferences.
Guo et al. [3] compare existing information filtering techniques, utilizing a deep learning collaborative filtering method to optimize personalized recommendation generation. The solution only partially solves the cold-start problem associated with machine learning, requiring user feedback on the initial recommendation output.
The above systems employ either objective or subjective selection processes, which prevents them from providing a well-rounded recommendation to users. Our system combines the two in an attempt to fully encompass the users' desirability. Via the content-based filtering, the framework targets objective institutional characteristics (distance, financial aid, attendance cost, etc.),

all of which highly concern the users. The CDSSM component handles the user’s
subjective natural language input to produce likely matches between the user
and the institutions. Our framework’s ability to analyze both the objective and
subjective attributes enables it to offer a well-rounded recommendation to the
end-users.

3 Background
In this section, we provide background on the content-based filtering machine learning (ML) technique and the convolutional deep semantic similarity model (CDSSM) employed in this research.

3.1 Content-Based Filtering


The Content-Based Filtering (CBF) model [14] uses matrix multiplication to produce similarity scores between feature matrices and target feature vectors. The pre-defined feature matrix is structured in an M × N row-column format of labels and their respective feature components. Additionally, the target feature vector comprises a target label and identical feature components. The feature components of the matrix and vector are unique for each respective label and typically represent a Boolean value. A matrix dot-product operation is performed between the feature matrix and the target vector, multiplying all feature elements on a label-by-label basis and summing the products, producing a unique similarity score for each label. Each label's score indicates how similar the feature label is to the target label, with higher scores implying a greater degree of similarity and lower scores a lesser degree of similarity.
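As a minimal sketch of this dot-product scoring, the snippet below assumes Boolean feature columns and NumPy; the matrices and feature values are illustrative placeholders rather than the exact data used by the framework.

import numpy as np

# Rows: candidate labels (e.g., institutions); columns: Boolean feature components.
feature_matrix = np.array([
    [1, 0, 1, 1],   # institution A
    [0, 1, 1, 0],   # institution B
    [1, 1, 1, 1],   # institution C
])

# Target feature vector with the same feature components (e.g., one user's preferences).
target = np.array([1, 0, 1, 1])

# The dot product sums the element-wise matches per label: higher means more similar.
similarity_scores = feature_matrix @ target
print(similarity_scores)   # [3 1 3]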

3.2 CDSSM
The Convolutional Deep Semantic Similarity Model (CDSSM) [4,13] uses a series
of machine learning and natural language processing techniques to find similarity
between a user’s document and existing documents. The CDSSM process can be
broken into four distinct stages: 1) user data collection, 2) data vectorization, 3)
convolution semantic layer, and 4) cosine similarity.
In the first stage, the model collects the user data and the existing documents
(from a defined database) for comparison. In the second stage, the model pro-
duces word-n-grams, obtained from a sliding contextual window run across input
word sequences. Letter-trigrams are further produced from each word-n-gram as
a letter-trigram vector. CDSSM concatenates the letter-trigrams of each word
to yield a trigram representation for each word-n-gram in the words sequence.
In the third stage, a convolution operation extracts contextual information from
the word-n-gram sentences structure. CDSSM then applies max pooling to down-
sample the features in the word-n-grams to produce fixed-length feature vectors
across all the dimensions. Next, a semantic dense layer comprising a feed-forward
neural network extracts the non-linear semantic feature vectors. CDSSM applies

the third stage to both the user document and the existing documents to gen-
erate their respective semantic feature vectors. The fourth stage computes the
cosine similarity between the user-document vector and the existing-documents
vectors to generate a list of CDSSM scores. The CDSSM score ranges from 0 to
1 and denotes the level of similarity between the user document and the existing
documents. Further details on the algorithm can be found in [4,13].
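The following is a highly simplified sketch of the vectorization and cosine-similarity stages only; the word-n-gram windowing, convolutional layer, max pooling, and semantic dense layer of the full CDSSM are omitted, and all helper names are our own illustrative choices.

from collections import Counter
import math

def letter_trigrams(word):
    # Pad the word with boundary markers and extract letter tri-grams, e.g., "#do", "dog", "og#".
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def trigram_vector(text):
    # Bag of letter tri-grams over all words in the text (a crude stand-in for the CDSSM input layer).
    counts = Counter()
    for word in text.lower().split():
        counts.update(letter_trigrams(word))
    return counts

def cosine_similarity(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

user_summary = "small classes with strong research opportunities"
mission_statement = "we offer small class sizes and hands-on research"
print(cosine_similarity(trigram_vector(user_summary), trigram_vector(mission_statement)))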

4 Methodology
Figure 1 shows the True-Ed Select framework comprising four successive lay-
ers: user-interaction, machine learning, consolidation, and recommendation. The
first layer, user-interaction extracts users’ preference for common university
attributes that influence their decision. The second layer, machine-learning (ML)
inputs the users’ preference from the first layer to compare against the Integrated
Postsecondary Education System (IPEDS) [1] database. The third layer, consol-
idation combines the results from the ML techniques in layer-2. The fourth layer,
recommendation presents a curated list of universities that are most amenable
to users’ interests. Sections 4.1–4.3 expound on these layers.

Fig. 1. Overview of the True-Ed Select framework comprising four layers: user interaction, machine learning, consolidation, and recommendation.

4.1 Layer 1: User-Interaction

Figure 2 shows layer-1, user-interaction that creates profiles for multiple users.
Specifically, it asks users questions about the attributes that influence their
choice of universities. We use five objective attributes and one subjective
attribute. The objective attributes include college location (distance from home),
financial aid, career services, medical school major availability, and cost of atten-
dance. The subjective attribute includes a natural language summary of what

Fig. 2. Layer 1: user profile construction.

Fig. 3. Layer 1: Survey questions employed by the framework to construct the user
profiles.

users desire for their ideal school experience. Figure 3 shows the survey questions
pertaining to each of the six attributes.
The location attribute asks users their willingness to relocate on a scale of
1 (not willing to move) to 10 (willing to move over 1000 miles). The financial
aid attribute asks users the importance of financial aid on a scale of 1 (not
important) to 4 (very important). To normalize this attribute, we multiply the
rating by 0.25—a normalized value equal to 0 means that aid is not critical, and
1 implies that aid is highly important. The career services attribute asks users
about the level of assistance provided by the university to secure part-time/full-
time jobs. This attribute uses the same scale as the financial aid attribute. The
medical school is a Boolean attribute that inquires about users’ interest in a
medical major. The cost of attendance attribute includes both the tuition and
cost-of-living. This attribute ranges from less than $5K to ‘do not care’. The
natural language summary attribute inputs the user’s free form response on
their expectations for an ideal college experience. For instance, a user may wish
to attend an engineering-specific school with small classroom sizes and a high
return on investments (ROI).
After extracting the user profiles, layer-1 passes this information to layer-2
for machine learning.

4.2 Layer-2: Machine Learning


The layer-2, machine learning comprises two machine learning (ML) models: 1)
content-based filtering to process the five objective attributes namely the loca-
tion/distance, financial aid, medical school, career services, and cost of atten-
dance; and 2) the convolutional deep semantic similarity model (CDSSM) to
process the subjective attribute (natural language summary).
Content based filtering—Fig. 4 shows the content-based filtering (CBF) ML
with its two stages including data-formatting and dot product. These two suc-
cessive stages produce the preference matrix that provides objective scores to
universities based on the users’ preference. The data-formatting stage inputs
the user profile generated in layer-1 and the university data existing within the
IPEDS database. The stage formats this information into the user profile (U P )
matrix and IPEDS data (ID) matrix, respectively. The dot-product stage mul-
tiplies the two matrices to yield the preference matrix (P REF ). Each cell in
P REF denotes the CBF score between a user and a given institution. In our
implementation, the score ranges from –4 to 5 where the higher the score, the
better the match.
Figure 5 details the data-formatting and dot-product stages employed by
CBF. The figure provides specific examples of the IPEDS data (ID) matrix
from the IPEDS database and the user profile (UP) matrix generated by the
framework’s layer-1. As seen in the figure, the rows of the ID matrix denote the
specific universities from the IPEDS database. The columns of the ID matrix
denote the five objective attributes for college selection: zip code for the distance
calculation, financial aid, medical school availability, career services, and cost of
attendance. The distance of users from the universities varies because users may

Fig. 4. The content-based filtering method and its two components: data-formatting
and Dot product.

apply from across the country. We employ an API [2] to calculate the distance
between a user’s home zip code and a given university’s zip code.
The rows of the UP matrix denote the specific users and the columns represent the five objective attributes provided by them. The pseudo-code (Fig. 5
bottom right) describes the dot product operation and distance calculation to
yield the preference matrix, P REF . The CBF ML passes the preference matrix
to layer-3 for further processing.
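A minimal sketch of this preference-matrix construction is given below, assuming that zip codes are resolved to pairwise distances by a hypothetical zip_distance helper (e.g., a wrapper around the distance API) and that the per-attribute weights are illustrative; the paper's exact scoring scheme, which ranges from -4 to 5, is not reproduced here.

import numpy as np

def distance_score(max_miles_willing, miles_between):
    # Illustrative scoring: reward institutions within the user's willingness-to-relocate
    # radius and penalize those outside it.
    return 1.0 if miles_between <= max_miles_willing else -1.0

def preference_matrix(user_profiles, institutions, zip_distance):
    # user_profiles: list of dicts with normalized attribute preferences and a home zip code.
    # institutions: list of dicts with the matching IPEDS attributes and a zip code.
    # zip_distance: hypothetical callable(zip_a, zip_b) -> miles.
    pref = np.zeros((len(user_profiles), len(institutions)))
    for i, user in enumerate(user_profiles):
        for j, inst in enumerate(institutions):
            score = distance_score(user["max_miles"], zip_distance(user["zip"], inst["zip"]))
            score += user["aid_weight"] * inst["financial_aid"]
            score += user["career_weight"] * inst["career_services"]
            score += user["med_school"] * inst["med_school"]
            score += 1.0 if inst["attendance_cost"] <= user["max_cost"] else -1.0
            pref[i, j] = score
    return pref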
Convolutional Deep Semantic Similarity Model (CDSSM)—Fig. 6 shows the
four stages of CDSSM namely the user survey, data vectorization, convolutional
deep semantic neural network (CDSNN), and cosine similarity. The user survey
stage parses the user’s subjective input (natural language summary) and the
mission statements of universities saved in the IPEDS database. The data vec-
torization stage converts the user input to word-n-grams to create tokens of the
words in the summary. The stage also creates the word-n-grams of the words in
the mission statements of the universities. The stage transforms each word-n-
gram into letter tri-grams, which are concatenated to form the tri-gram vector
representation for each word. These vectors constitute the letter tri-gram matrix.
The CDSNN stage performs convolution on the letter tri-gram matrix by taking
the hyperbolic tangent of the product of the convolution mask and the letter
tri-gram matrix values. Next, the stage performs max pooling to suppress the
insignificant features from the final semantic layer. The cosine similarity stage
performs a cosine similarity operation on the users’ natural language semantic
matrix and the mission statement semantic matrix to obtain a similarity matrix
containing CDSSM scores. In this matrix, the rows denote the multiple users and
columns denote the institutions from the IPEDS database. The CDSSM score
ranges from 0 to 1 and denotes the level of match between a given user’s sum-
mary and a given university’s mission statement. For a given user, CDSSM’s final

Fig. 5. Content-based filtering: the details of data-formatting and dot-product to yield
the preference matrix.

output is a list of CDSSM scores (between the user and multiple universities),
which is passed to layer-3 for further processing.
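A minimal sketch of the word-hashing (letter tri-gram) and cosine-similarity steps follows; it deliberately omits the learned convolution, max-pooling, and semantic layers of the CDSNN stage, where the actual model applies the cosine similarity to semantic-layer outputs rather than to raw tri-gram counts, and the function names and toy strings are illustrative assumptions.

```python
import numpy as np

def letter_trigrams(text: str) -> list[str]:
    """Word hashing: wrap each word in '#' boundary marks and break it into letter tri-grams."""
    grams = []
    for word in text.lower().split():
        padded = f"#{word}#"
        grams += [padded[i:i + 3] for i in range(len(padded) - 2)]
    return grams

def trigram_vector(text: str, vocab: dict[str, int]) -> np.ndarray:
    """Counts of letter tri-grams over a fixed vocabulary (a raw tri-gram representation of the text)."""
    vec = np.zeros(len(vocab))
    for gram in letter_trigrams(text):
        if gram in vocab:
            vec[vocab[gram]] += 1
    return vec

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity used to score a user's summary against a university's mission statement."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Toy usage: the summary is from Table 2; the mission-statement fragment is invented for illustration.
summary = "I want to challenge myself in both academics and life"
mission = "We challenge students to achieve excellence in academics and in life"
vocab = {g: i for i, g in enumerate(sorted(set(letter_trigrams(summary + " " + mission))))}
score = cosine_similarity(trigram_vector(summary, vocab), trigram_vector(mission, vocab))
```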

Fig. 6. Convolutional deep semantic similarity model (CDSSM) with its four stages:
user survey, data vectorization, CDSNN, and cosine similarity.

4.3 Layers 3 and 4: Consolidation and Recommendation


Layer-3, consolidation, breaks ties between universities that have similar CBF
scores. To break ties, layer-3 scales the CBF score for each university by its
corresponding CDSSM score. This scaling yields the True-Ed Select (TES) score,
which represents the level of match between a user's preferences (both objective
and subjective) and a given university. Layer-4, recommendation, provides
a user-friendly interface to view the curated list of the top 15–20 universities that
best match the user's interests.
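A minimal sketch of the consolidation and recommendation steps is given below, assuming the scaling is a plain multiplication of the two scores and that layer-4 simply ranks universities by the result; both are assumptions, since the exact rules are not spelled out here.

```python
def true_ed_select_score(cbf: float, cdssm: float) -> float:
    """Tie-breaking TES score: the objective CBF score scaled by the subjective CDSSM score."""
    return cbf * cdssm

def recommend(tes_scores: dict[str, float], k: int = 20) -> list[str]:
    """Layer-4 view (illustrative): the k universities with the highest TES scores."""
    return sorted(tes_scores, key=tes_scores.get, reverse=True)[:k]
```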

5 True-Ed Select in Action

This section shows the True-Ed Select framework in action. As a proof-of-concept,
we consider four fictitious users with varying choices in college
attributes. The five objective attributes for these users appear in Table 1. Table 2
provides the natural language summary used by the CDSSM ML.
The users in Table 1 allow us to reasonably cover the spectrum of typical
university aspirants. User-1, John Tesla, is willing to pay; however, they are not
willing to move far from home. They are interested in a medical school major
but not very keen on career services. User-2, Lindsey Croft, is willing to pay for
medical school. Additionally, they are interested in aid and career services. User-4,
Urg Golum, has aspirations similar to User-2's but is far less willing to
pay a high cost. User-3, Nicola Cina, is interested in a major other than medical
school. They are willing to pay reasonable costs and are highly interested in
career services.

Table 1. The values of the five objective attributes selected by the four fictitious users.

User            Attendance cost ($)   Distance (miles)   Med school   Career services   Aid
John Tesla      1000000               840                1            0                 0
Lindsey Croft   60000                 560                1            3                 2
Nicola Cina     35000                 2240               0            5                 1
Urg Golum       25000                 1400               1            2                 3

Table 2. The natural language summary (subjective attribute) for the users.

User            Natural language summary
John Tesla      "I want to go to a prestigious school with other elite individuals"
Lindsey Croft   "The college should be competitive, helpful, and have high academic standards"
Nicola Cina     "I want to challenge myself in both academics and life"
Urg Golum       "A school that can help me get what I want out of life and help me pay for tuition"

We use the six attributes in Tables 1–2 and IPEDS data to test the CBF and
CDSSM ML. Specifically, we use these attributes to generate the CBF, CDSSM,
and True-Ed Select scores for various universities in the IPEDS database.
As a proof-of-concept, Sect. 5.1 provides the sample scores generated by the
CBF, CDSSM, and the overall framework for the four fictitious users across the
most interesting set of universities. Section 5.2 discusses the recommendations
provided to the four users by the framework.

5.1 Proof-of-Concept

Table 3 provides the CBF score, CDSSM score, and the framework’s True-Ed
Select (TES) score across four selected universities for two users: John Tesla and
Lindsey Croft. While these universities have similar CBF scores, the framework
employs the CDSSM score as a tie-breaker. As seen in this table, the framework
recommends Auburn University to John Tesla and Methodist College to Lindsey
Croft as the top institution. Similarly, Table 4 provides the sample CBF, CDSSM,
and TES scores across four universities for Nicola Cina and Urg Golum.

Table 3. The scores obtained by the content-based filtering (CBF), CDSSM, and the
overall framework (True-Ed Select (TES) score) for John Tesla and Lindsey Croft.

Institution     John Tesla             Institution        Lindsey Croft
                CBF   CDSSM   TES                         CBF    CDSSM   TES
Auburn Univ.    3     0.226   0.68     Methodist Coll.    4.25   0.26    1.12
Emory Univ.     3     0.22    0.67     Valparaiso Univ.   4.25   0.227   0.96
Mercer Univ.    3     0.20    0.6      Sullivan Coll.     4.25   0.213   0.91
Univ. Chicago   3     0.18    0.54     Spencerian Coll.   4.25   0.205   0.8
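Assuming the layer-3 scaling is a plain multiplication of the two scores, the tabulated values can be checked directly; for example, Auburn University's TES for John Tesla is 3 × 0.226 ≈ 0.68, and Valparaiso University's TES for Lindsey Croft is 4.25 × 0.227 ≈ 0.96, matching the rounded entries above.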

Table 4. The scores obtained by the content-based filtering (CBF), CDSSM, and the
overall framework (True-Ed Select (TES) score) for Nicola Cina and Urg Golum.

Institution        Nicola Cina             Institution          Urg Golum
                   CBF   CDSSM    TES                           CBF    CDSSM   TES
Hickey Coll.       3.5   0.223    0.78     Kansas State         4.25   0.245   1.04
Univ. Mary         3.5   0.2138   0.75     Mississippi State    4.25   0.197   0.84
Clarion Univ.      3.5   0.2138   0.77     Univ. Central Fl.    3.75   0.191   0.72
Univ. Pittsburgh   3.5   0.2136   0.75     Univ. Florida        3.75   0.181   0.58

Fig. 7. University suggestions for John Tesla (top); Distribution of universities for
various scores per user’s choice of attributes (bottom).

5.2 Analysis for Individual Users

User-1, John Tesla—Figure 7 (top) shows the top colleges in the United States of
America (USA) for John Tesla based on their choice of university attributes in
Table 1. John's location is marked with a star and the selected universities appear
as dots. Because John wishes to stay near home, True-Ed Select returns nearby
colleges/universities.
Figure 7 (bottom) shows the frequency of institutions across CBF scores
considered by True-Ed Select as per the user’s choices. The distribution is
right-skewed about the score equal to 0, meaning that True-Ed Select sifts low-
performing schools to low scores and top-performing schools to high scores. This
process significantly reduces the search space for this user. Specifically, True-Ed
Select identifies 12 schools that obtain a score of 3.5 and above, which makes it
easier for John to evaluate their options versus evaluating hundreds of schools.
It is worth noting that a right-skewed distribution is the most desirable because
such a distribution only keeps the top-performing institutions at high scores.
This kind of distribution is evidently achieved via a relaxed choice of attributes,
such as the ones given by John Tesla.

Fig. 8. University suggestions for Lindsey Croft (left); distribution of universities for
various scores per user’s choice of attributes.

Fig. 9. University suggestions for Nicola Cina (Left); distribution of universities for
various scores per user’s choice of attributes.

User-2, Lindsey Croft—Fig. 8 (top) shows the top university selections for
Lindsey Croft. As seen in Table 1, Lindsey is not willing to move far away from
home; however, they are willing to pay a high cost of attendance and desire career
services and financial aid. True-Ed Select identifies schools (shown as dots) that
best match Lindsey's university attribute values.
Figure 8 (bottom) shows the distribution of universities across CBF scores.
Because Lindsey's attribute choices are firmer than John Tesla's, the distribution
appears to be symmetric about the score of 0. Nonetheless, True-Ed Select

Fig. 10. University suggestions for Urg Golum (Left); distribution of universities for
various scores per user’s choice of attributes.

identifies 14 schools (out of 3753 throughout the USA) that obtain a score above
4, which simplifies this user’s task of college selection.
User-3, Nicola Cina—Fig. 9 (top) provides the university selection for Nicola
Cina. As per Table 1, this user is willing to move over 2000 miles. Therefore,
True-Ed Select identifies universities/colleges that are far from their home yet
satisfy the other attributes.

Figure 9 (bottom) provides the frequency of universities across CBF scores.
Unlike the previous two users, this user is willing to move but unwilling to
pay a high education cost. Additionally, they have a high preference for career
services and some need for financial aid. These firm requirements lead to a
marginally right-skewed distribution. Nonetheless, True-Ed Select identifies 11
schools (out of 3753) with a score of 3.5 or above.
User-4, Urg Golum—Fig. 10 (top) shows the selected universities/schools for
this user. As per Table 1, Urg Golum is willing to move a significant distance
(1400 miles from their hometown on the northeast coast); therefore, True-Ed
Select identifies institutions that span the Midwest and the South.
Figure 10 (bottom) shows the distribution of universities across CBF scores.
Although this user is willing to move, they have strict requirements for cost-
of-attendance, career services, and financial aid. Therefore, the distribution
marginally skews to the right side. True-Ed Select identifies 14 schools (out
of 3753) with a score of 4 and above for this user, thereby significantly reducing
the search space.
This section demonstrates the effectiveness of True-Ed Select in satisfying
the user's choices of university attributes and providing a short list (fewer than
15 schools) of colleges/universities that best match the users' interests.

6 Conclusion
We present True-Ed Select, a machine learning framework to facilitate user-
friendly college/university selection. Our framework uses common objective and
subjective attributes to select a concise list of colleges/universities for the users.
The objective attributes include the distance of the schools from home, financial
aid availability, career services, choice of major, and cost of attendance (tuition
and living expenses). The subjective attribute includes a free-form response from
users describing their ideal choice of a university. The machine learning stage,
comprising content-based filtering and the convolutional deep semantic similarity
model (CDSSM), processes these attributes. For a given user,
the framework produces objective scores, the True-Ed Select scores, for different
universities within the IPEDS database. The framework sifts the low-performing
universities to low scores and keeps only a small set of schools in the high-score
range. This process effectively reduces the search space, from several thousand
schools to fewer than 20, greatly simplifying the college selection task for the users.
This framework is currently a proof-of-concept. In the future, we aim to
include additional objective attributes such as GPA (grade point average), return
on investment (ROI), and graduation rate for a well-rounded recommendation.
After obtaining pertinent approvals from the Institutional Review Board (IRB),
we aim to conduct user surveys for an in-depth analysis of our framework. While
the research presented considers only United States universities/colleges,
the framework lends itself seamlessly to universities in other countries. We
envision that this open-source framework will be a valuable addition to the field
of social computing, globally helping high-school students and their parents with
the daunting task of college selection.

Finally, we note that the universities used in this work are for research purposes
only. The analysis shown only demonstrates the framework's functionality; the
results should not be construed as actual recommendations provided by the
authors.

References
1. IPEDS: Integrated Postsecondary Education Data System. https://nces.ed.gov/
ipeds/. Accessed 14 Mar 2022
2. Sami: Zip Code Latitude Longitude City State County. MATLAB Central File
Exchange (2022). https://www.mathworks.com/matlabcentral/fileexchange/45905-
zip-code-latitude-longitude-city-state-county. Accessed 14 Mar 2022
3. Guo, W.-W., Liu, F.: Research on collaborative filtering personalized recommen-
dation algorithm based on deep learning optimization. In: 2019 International Con-
ference on Robots Intelligent System (ICRIS), pp. 90–93 (2019)
4. He, H., Lin, J.: Pairwise word interaction modeling with deep neural networks for
semantic similarity measurement. In: Proceedings of the 2016 Conference of the
North American chapter of the Association for Computational Linguistics: Human
Language Technologies, pp. 937–948 (2016)
5. Hu, O., FungYuen, K.K., Craig, P.: Towards a recommendation approach for uni-
versity program selection using primitive cognitive network process. In: 2017 Inter-
national Conference on Service Systems and Service Management, pp. 1–4 (2017)
6. Lee, C.P., Ng, Z.B., Low, Y.E., Lim, K.M.: Expert system for university program
recommendation. In: 2020 IEEE 2nd International Conference on Artificial Intel-
ligence in Engineering and Technology (IICAIET), pp. 1–6 (2020)
7. Muladi, U.P., Qomaria, U.: Predicting high school graduates using naive Bayes
in state university entrance selections. In: 2020 4th International Conference on
Vocational Education and Training (ICOVET), pp. 155–159 (2020)
8. Nayak, P.K., Madireddy, S., Case, D.M., Stylios, C.D.: Using fuzzy cognitive maps
to model university desirability and selection. In: 2017 IEEE International Confer-
ence on Systems, Man, and Cybernetics (SMC), pp. 1976–1981 (2017)
9. Nikhil, N., Srivastava, M.M.: Content based document recommender using deep
learning. In: 2017 International Conference on Inventive Computing and Informat-
ics (ICICI), pp. 486–489. IEEE (2017)
10. Powar, V., Girase, S., Mukhopadhyay, D., Jadhav, A., Khude, S., Mandlik, S.:
Analysing recommendation of colleges for students using data mining techniques.
In: 2017 International Conference on Advances in Computing, Communication and
Control (ICAC3), pp. 1–5 (2017)
11. Rutkowski, T., Romanowski, J., Woldan, P., Staszewski, P., Nielek, R., Rutkowski,
L.: A content-based recommendation system using neuro-fuzzy approach. In: 2018
IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–8 (2018)
12. Sharma, V., Trehan, T., Chanana, R., Dawn, S.: StudieMe: college recommendation
system. In: 2019 3rd International Conference on Recent Developments in Control,
Automation Power Engineering (RDCAPE), pp. 227–232 (2019)
13. Shen, Y., He, X., Gao, J., Deng, L., Mesnil, G.: A latent semantic model with
convolutional-pooling structure for information retrieval. In: Proceedings of the
23rd ACM International Conference on Conference on Information and Knowledge
Management, CIKM 2014, pp. 101–110. Association for Computing Machinery,
New York, NY, USA (2014)
14. Van Meteren, R., Van Someren, M.: Using content-based filtering for recom-
mendation. In: Proceedings of the machine learning in the new information age:
MLnet/ECML2000 Workshop, vol. 30, pp. 47–56 (2000)
15. Zhao, W., Zhang, W.: Collaborative filtering service recommendation algorithm
based on trusted user and recommendation evaluation. In: 2018 IEEE 4th Inter-
national Conference on Computer and Communications (ICCC), pp. 2248–2255
(2018)
Exploring Public Cloud-ERP Systems’ Impact
on Organizational Performance

Maria Øverdal, Moutaz Haddara(B) , and Marius Langseth

Kristiania University College, Oslo, Norway


[email protected]

Abstract. Moving enterprise resource planning (ERP) systems to the cloud seems
inevitable. Cloud-ERP systems provide many opportunities to organizations.
On the other hand, barriers and challenges to this move still exist. This research
provides an overview of relevant academic literature on public cloud-ERP migra-
tion and identifies the status quo for the body of knowledge within cloud-ERPs
delivered through software-as-a-service (SaaS). In addition, this paper explores the
motivators and inhibitors, and if cloud-ERP provides the ability to enhance orga-
nizational performance from an IT perspective. The study is motivated by a gap in
research in investigating if cloud-ERP systems’ characteristics encourage or hin-
der enterprises from migrating to the public cloud. Cloud-ERPs delivered in SaaS
models are distributed as a service over the Internet, usually through a public cloud
infrastructure with shared resources. SaaS enables organizations to pay for services
and functions they use and removes the need to maintain the complex information
technology infrastructure by client organizations. The cloud-ERP-system’s char-
acteristics found in the literature are organized, and presented based on DeLone,
and McLean’s IS success model dimensions, and the Technology-Organization-
Environment (TOE) framework. Our main findings suggest that cloud-ERP’s sys-
tem and service quality are the most discussed issues in the literature. The system
quality attributes of cloud-ERPs identified are scalability, availability, accessibil-
ity, reliability, and the ability to compose and customize web services, motivat-
ing organizations to adopt cloud-based ERP. On the other hand, service quality
attributes are found to be inhibitors for moving to the cloud due to the organization's
dependency on vendors' support and service throughout the product lifecycle, and
the security risks related to the public cloud environment.

Keywords: Cloud-ERP · ERP systems · Enterprise systems · Migration ·


Organizational performance

1 Introduction
Information technology (IT) plays an essential role in the performance of business activ-
ities and their success. The motivation for pursuing IT and information system (IS)
enhancements usually emerges from realizing a need to optimize processes, reduce
costs and resources, improve productivity and efficiency, and increase organizational
competitiveness. Today, those benefits are especially relevant as companies confront


an increasingly competitive environment with flexible and market-oriented structures
and constant technological innovations. For businesses to survive the highly competitive
environment, both large and small firms have adopted cloud-based Enterprise Resource
Planning (ERP) solutions, as these cloud computing services provide scalability, relia-
bility, availability, cost-effectiveness, time savings, and ease of updates [1, 2]. With the
advance of Internet technology and globalization, ERP systems have been web-enabled,
providing access to information and communications via the Internet as a part of global
business strategy [3].
The rise of mobility and on-demand services has contributed to the development of
web-based ERP, such as Cloud-based computing in the form of Software as a service
(SaaS). SaaS simplifies the utilization of many software applications remotely, elasti-
cally, and seamlessly [4]. Hence, the utilization of ERP systems delivered through SaaS may
outperform traditional IT offers and profoundly change the way organizations use IS.
The cloud was already mainstream in 2019, with 90 percent of companies on the cloud
[5]. However, cloud-based ERPs also involve challenges, such as lack of privacy and
data security issues due to the cloud providers’ involvement [6]. Existing literature has
studied the increasing importance of web-based services and embarked on Cloud-ERP
systems’ benefits and challenges based on the discussion above. Besides, several studies
have focused primarily on the cloud-ERP lifecycle phases – adoption and implemen-
tation [7]. Demi and Haddara [8] suggest that cloud-ERP systems are a key strategic
technology for future development. Moreover, Gartner [9] forecasted that the global
public cloud service market would grow to $214 billion by the end of 2019. However, the
global COVID-19 pandemic has accelerated the adoption of cloud and digital infrastruc-
tures worldwide due to the need for increased remote work possibilities [10]. Hence, it
is essential to study how cloud-ERP systems contribute to the demand for continuous
business improvement. The extant literature on cloud ERPs lifecycle phases provides
documented evidence on how cloud-ERP systems’ characteristics influence business and
organizational performance.
This paper adopts the DeLone and McLean [11] updated IS success model to cat-
egorize cloud-ERP characteristics in the six dimensions/components of the IS success
model to evaluate how it can enhance business and organizational performance. The IS
success model is an established framework in IS research that can be used to identify,
describe, and explain the relationships among the critical dimensions of IS success. In
addition, the widely-adopted Technology-Organization-Environment (TOE) framework
developed by Tornatzky et al. [12] is utilized in this research to uncover additional aspects
of cloud-ERP systems performance-related impacts on organizations. Accordingly, this
paper proposes the following research question: “What system characteristics of cloud-
ERP delivered through SaaS motivate or hinder enterprises from moving to the public
cloud?”
The review also aims to detect research gaps as to what variables, dimensions, or
system characteristics may be deemed important but have not gained enough attention in
existing research. The literature analysis may contribute to a greater understanding of to
what extent cloud computing technology generates opportunities and challenges when
adopting cloud-ERPs, and which factors may inhibit companies’ adoption of cloud-based
ERP systems.

2 Method
This study followed the guidelines for systematic literature review by Webster and Wat-
son [13]. Sandberg and Alvesson [14] imply that posing innovative and challenging
questions about existing literature is essential to generate or identify exciting and sig-
nificant theories. Hence, to accumulate research evidence from the existing body of
knowledge, this research has identified literature that discusses cloud-ERP character-
istics that may contribute to or inhibit enhanced business performance. Accordingly,
the research findings in the acquired literature are categorized and clustered according
to DeLone and McLean [11] updated IS success model and Tornatzky et al. [12] TOE
framework. The IS success model posits that system, information and service quality can
enhance organizational benefits. Combining TOE with the IS success model to become
an integrated model aims to identify system characteristics and measures and internal
and external influences contributing to organizational performance.
Since DeLone and McLean published the IS success model in 1992, nearly 300
articles in refereed journals have referred to and used the model to measure the dependent
variable in IS research [15]. The updated model has added “service quality” as a third
dimension to “system quality” and “information quality” as components of IS success.
Additionally, DeLone and McLean [11] believed it is more parsimonious to combine
“individual” and “organizational impacts” into a single variable, namely “net benefits.”
The authors initially used the term “impacts” to measure effectiveness and success.
However, in the ten-year update paper, they imply that “impacts” may be positive or
negative, thus leading to a possible confusion as to whether the results are good or
bad. Including “net” in “net benefits” is vital because no outcome is wholly positive
without any negative consequences. Thus, "net benefits" is considered the most accurate
descriptor of the updated success variable [11].

2.1 Literature Collection Strategy

This paper reviewed the literature published between 2015 and 2022. The selected range
ensures a timely review of the state-of-the-art technologies and up-to-date literature
studying the latest issues of cloud-ERPs adoptions. The literature search was based on
four primary scientific paper databases: ACM Digital Library, Science Direct,
Google Scholar, and Emerald. The search terms included: cloud-ERP success; cloud-ERP
system quality; cloud-ERP information quality; cloud-ERP and business performance,
as well as synonyms and combinations were used. Additionally, a secondary search
was conducted by scanning all the selected articles’ reference lists to identify additional
literature. Furthermore, data from the acquired literature were extracted in guidance by
the main research question of this study. The following research inclusion and exclusion
criteria were defined to ensure a narrow yet comprehensive literature review. The articles
needed to be published in peer-reviewed journals or conference proceedings to ensure
the quality of the literature. No limitations on the industry type or organizational size
were adopted to gather a wider set of research results. The authors read all the papers in
order to check their relevance to this research and to be able to classify the final set of
papers into the various dimensions and adopted frameworks in this research.

3 Overview of the Articles


The initial results for keyword searches were 487 articles. After screening the titles,
the number of articles was more than halved, resulting in 293 articles. After screening
keywords in the articles, 63 articles were retained. Lastly, after reading the abstract and
considering the articles' findings, only 12 articles were obtained, including theory on
cloud-ERP, characteristics affecting firm performance, and technological, environmen-
tal, and organizational factors affecting cloud-ERP usage. Table 1 depicts the databases
used, the total number of articles after applying search terms, the selected articles, and
the related journal/conference proceeding.

Table 1. An overview of selected articles

Database         Total number of articles   Selected articles              Journal/conference proceeding

Emerald          154                        1) Chang [16]                  Journal of Enterprise Information Management
                                            2) Gupta et al. [3]            International Journal of Quality & Reliability Management
                                            3) Alsharari et al. [17]       Journal of Small Business and Enterprise Development
                                            4) Alsharari [18]              Transforming Government: People, Process and Policy
Elsevier         179                        5) Abd Elmonem et al. [19]     Future Computing and Informatics Journal
                                            6) Chen et al. [20]            Applied Soft Computing
                                            7) López and Ishizaka [21]     Computers in Industry
                                            8) Sørheller et al. [22]       Procedia Computer Science
                                            9) Gupta et al. [23]           International Journal of Information Management
SSRN e-Library   109                        10) Jain and Sharma [24]       Annual Research Journal of SCMS Pune
                                            11) Tongsuksai et al. [25]     Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)
ACM              45                         12) Muslmani et al. [26]       Proceedings of the First International Conference on Data Science, E-Learning, and Information Systems

Due to the growing popularity of cloud-ERPs, the literature shows a great interest in
cloud computing in general and cloud-ERP adoptions in particular. Figure 1 below illus-
trates the research methods distribution adopted by the authors of the reviewed articles.
Three papers adopted a case study research method. Four studies have conducted quan-
titative surveys, one conceptual paper discussed cloud-ERP systems with no empirical
data, and four papers were literature reviews. It is important to note that the most recent
and relevant review on cloud-ERP systems identified in this research was published in
2019. Furthermore, based on the increased number of cloud-ERP adoptions and imple-
mentation projects during the pandemic, an updated literature review is deemed
necessary. Table 2 provides an overview of the reviewed studies and the methods adopted
in their research.

Fig. 1. Research method distribution among reviewed articles



Table 2. Overview of reviewed papers mapped with their adopted research design/method

Research method     Articles

Case study          [17], [18], [21]
Survey              [3], [16], [23], [24]
Literature review   [19], [22], [25], [26]
Conceptual paper    [20]

4 Findings
In cloud-ERP environments, system quality measures the desired characteristics of a
cloud-ERP system, and information quality measures (among others) accuracy, time-
liness, and completeness of the information provided by the IS. System and infor-
mation quality characteristics in literature are included in the technological contexts
as predictors for migrating to the cloud. Environmental contexts refer to government,
partners/providers, and industry influences. Service quality is merely a subset of sys-
tem quality. However, this instrument includes measures, such as service reliability,
responsiveness, assurance, and empathy, primarily dependent on IS employees' (system
providers') user service. Hence, service quality is included in the environmental context.
Organizational contexts refer to the characteristics and resources of the studied organi-
zations in the literature. The third and fourth IS success dimensions, intention to use and
user satisfaction, are included in the organizational context. Table 3 shows cloud-ERP
characteristics cited in the literature and contains benefits (+) and hindrances (−) for
migrating to the public cloud.

4.1 Technological Context


Composing and Customizing Web Services. Chen et al. [20] focus on enterprise
users' requirements and constraints for cloud-ERP systems delivered through SaaS.
Their study argues that the importance of cloud-ERP platforms in which enterprise cus-
tomers can select web services and customize a unique ERP system to meet their specific
needs would ultimately contribute to improved performance. The findings also suggest
that web-service vendors’ strength in composing and providing customized services adds
benefits and enhances user satisfaction. Hence, the capability of web-service composi-
tion is deemed essential for the enterprise customer's organizational value. In addition,

Table 3. ERP Characteristics Cited in Literature Mapped with Benefits (+) and Barriers (−) for
Migrating to the Public Cloud

Category                 Sub-category             Cloud-ERP characteristics                              Reference articles

Technological factors    System quality           Scalability (+)                                        [19], [24], [25]
                                                  Accessibility and availability (+)                     [20], [24], [19], [25], [17]
                                                  Reliability (+/−)                                      [21], [26], [16]
                                                  Composition of web services (+/−)                      [20], [19], [26], [16]
                         Information quality      Real-time information flow (+)                         [24], [23]
                                                  Easy updates/upgrades (+)                              [19], [24]
                                                  Data security (−)                                      [20], [19], [3], [22], [25]
Environmental factors    Service quality          Service reliability (−)                                [22], [16], [21]
                                                  Vendor dependence (−)                                  [21], [3], [25], [17]
Organizational factors   Financial benefits       Cost transparency (+)                                  [19]
                                                  Pay for use (+)                                        [19], [16]
                                                  Affordable (+)                                         [21], [17], [16]
                         Costs                    Long-term costs (−)                                    [24], [3]
                         Human resources          Collaboration capabilities (+)                         [24]
                                                  Subsequent training, use, and acceptance (+)           [21]
                                                  Employee knowledge and training of users (+)           [25]
                                                  Top management support (+)
                                                  Prior experience (+)
                                                  People resources (HR) enhance dynamic capability (+)   [23]
                         Organizational changes   New ways to handle data (−)                            [3], [22]
                                                  Resistance towards change (−)                          [23], [18]
                                                  Transparent business practices (+)
                         Intention to use/        Perceived risk (−)                                     [25], [16]
                         user satisfaction        Cloud-ERP awareness (−)                                [23]
                                                  Satisfaction with incumbent ERPs (−)                   [16]
                                                  Perceived system quality (+)                           [16]
                                                  Individual characteristics (+/−)                       [3], [25]

web-service offers can compose compatible processes by ensuring high-level
interoperability. On the other hand, Abd Elmonem et al. [19] state that cloud-ERP systems offered
in off-the-shelf packages with generic features aim to satisfy the requirements of a wide
range of enterprise customers. These ready-made packages can facilitate the imple-
mentation processes, which leads to rapid adoption and implementation of cloud-ERP
systems.
Muslmani et al. [26] suggest that the flexibility of SaaS delivery models resulted in
decreased switching costs and factors from traditional ERP systems, as the SaaS models
are provided in a one-to-many infrastructure in which one application serves multiple

users. The users can alter the service provider with any changes they require to their
applications. Nevertheless, the study also states that cloud-ERP solutions in packages
with limited customization and integration options may require organizations to add
more integration features, followed by additional costs [26]. Chang [16] implies that
cloud service providers should design more effective systems relevant to organizational
processes and tasks. On the other hand, a case study by Bjelland and Haddara [7] in the
Norwegian cloud-ERP market suggests that cloud-ERP vendors are generally reluctant
to customize cloud-ERP system implementations for their clients, as the concept goes
against the one-to-many application-infrastructure design. In addition, customizations
would increase the need for vendor support and the involvement of vendor-ERP
consultants within the adoption projects, complicate implementations, and affect the scale
and speed of cloud-ERP offerings [7].

Reliability. Research by Chang [16] illustrated various enablers and inhibitors for
switching intention/migration from on-premise ERP systems to cloud-ERP. For exam-
ple, system quality refers to the performance characteristics of cloud-ERP systems, and
reliability is presented as an essential contributing factor to increasing system quality
[16]. Moreover, Muslmani et al. [26] studied solutions for adopting Cloud-based ERP
systems and reducing the integration complexity. The need for organizations to integrate
the on-premises systems with their cloud system to ensure data synchronization remains
one of the most significant challenges since most data are saved on the cloud. They sug-
gest that reducing integration complexity before migrating to the cloud could add an extra
level of reliability and productivity. Integrating a traditional, on-premises ERP system
with a new cloud system also may improve information quality [26]. Muslmani et al. [26]
suggested that the solution is an application program interface (API) which facilitates
the integration process, as APIs specify how the system’s components should interact.
The issue is raised as every cloud service provider has its API standards which might cre-
ate conflicts when integrating the current system with the cloud service provider. Thus,
using standard APIs may avoid integration problems and increase system reliability.
Moreover, López and Ishizaka [21] compiled a list of criteria for cloud-ERPs that
organizations should aim for when considering moving to the cloud. The authors studied
a company who decided to adopt a cloud-based ERP system to improve data integration
and operate more efficiently. The list of criteria is related to system and software quality
for evaluating SaaS ERP applications. Furthermore, it was found that the “systems”
criterion, which included reliability, customization, maintainability, security, usability,
and functionality, was considered the most relevant in the cloud-ERP selection process.
A case study conducted in the UAE’s public sector suggests that governmental orga-
nizations may easily migrate from on-premise ERP and align their institutional work
processes with the inbuilt logic of cloud-ERP, resulting in successful and rapid adoption
[18]. Furthermore, Gupta et al. [3] survey found that SMEs and large organizations do
not differ in integration, security, functionality, and provider integrity.

Scalability, Availability and Accessibility. Availability and scalability are frequently
discussed in the existing literature, e.g., [25]. Chen et al. [20] suggest that system availabil-
ity is perceived as one of the most crucial system quality attributes contributing to cloud-
ERP adoption delivered through SaaS. Likewise, Abd Elmonem et al. [19] also identified

high availability and improved accessibility and scalability as perceived cloud-ERP bene-
fits. Jain and Sharma [24] survey discovered that cloud-ERP adoption improved scalabil-
ity and supported modern user experience and socially enabled businesses. Improvement
in system accessibility was another prominent feature of Cloud-ERP that helped firms
customize security services.
Alsharari et al.'s [17] findings suggest that cloud-ERP's ease of use and control and
management lead to increased flexibility of the processes' accessibility. Their study
results demonstrate that the elasticity of different operations and procedures has risen
dramatically since the beginning of the cloud-ERP integration in their case. The sys-
tem’s accessibility can also be enhanced by accessing the needed information from the
organization’s database, which became available from any online resource. Due to the
system's accessibility, the productivity in different departments was also enhanced and
boosted organizational efficiency due to the optimum-utilization services provided by
the cloud-ERP system that the company applied, which, in turn, improved the overall
organizational performance [17].
Real-Time Information Flow. Information quality refers to the characteristics of the
output (data/information) provided by cloud-ERP systems [16]. Completeness,
understandability, and relevance of data/information are considered enablers of
switching and migrating to cloud-ERPs [16]. Thus, organizations
may benefit from the accuracy of information provided by cloud-ERP systems because
information quality is related to workforce collaboration, productivity, and efficiency.
Jain and Sharma [24] identified several benefits for cloud-ERP systems, like improved
information integration for better decision making and faster response time to customer
queries as direct aspects of cloud-ERP real-time information quality. Indirect aspects of
information quality of cloud-ERP systems included better corporate image, improved
customer goodwill, and customer satisfaction [24]. Similarly, [23] also found a clear
positive impact of cloud-ERPs information quality on organizational performance. The
authors proposed that cloud-ERP acts as the catalyst for real-time information flow
between department and manufacturing processes. Cloud-ERP catalyzes supplier inte-
gration, and business integration helps organizations scale efficiently, leading to better
financial and economic performance. Their findings suggest that cloud-ERPs reduce
data losses, enable real-time cloud operations, and improve processing time. The study
concludes that overall economic, social, and environmental performance growth can be
acquired by deploying cloud-ERP [23].
Data Security. Literature frequently mentions data security as the primary concern
related to cloud-ERP systems and that data security may inhibit cloud-ERP adoption [3,
16, 19, 20, 22, 25].
Chen et al. [20] argue that successful cloud-ERP adoption does not depend on the
product itself but mainly on the vendor's support and the customer experience with
provided service. Hence, the paradigm changes from product feature to service trust
in handling the data securely. Abd Elmonem et al. [19] identified data ownership as
another challenge related to security and data management. Conversely, Alsharari et al.
[17] studied a company that believes that their cloud-ERP vendor properly secures
their organization’s data security and privacy. Data security issues might be linked to
inefficient providers of cloud-ERP rather than the system itself.

4.2 Organizational Context


Financial Benefits and Costs. Cost transparency is one of the main benefits of cloud-
ERP solutions – enterprise customers pay for what they use and for the number of users.
However, hidden costs are one of the challenges facing cloud-ERP clients, as they may
be discovered in contracts later [19]. Also, cloud-ERP works on regular periodic
subscription fees, and long-term costs usually add up over time [3]. However, Jain and
Sharma [24] recommend that enterprise customers could build an internal private cloud
to reduce ongoing hardware running costs. On the other hand, Alsharari et al. [17] and
López and Ishizaka [21] argue that cloud-ERPs are relatively lower in price in contrast
to on-premise and have zero maintenance costs. Thus, minimizing different kinds of
expenses like up-front costs, operational costs, maintenance costs, and IT professionals’
hiring and training costs can collectively result in better utilization of financial resources
[17]. Chang [16] found that organizational financial/economic benefits can depend on
the organizational contexts; however, pay per use, structured payments, and cost savings
are obvious economic advantages of cloud-ERP systems.

Human Resources and Key Users. Jain and Sharma [24] state that cloud-ERP would
be beneficial for improving resource utilization, enhancing collaboration capabilities,
reducing environmental footprint, and reducing IT infrastructure needs. The importance
of human resources is relevant in the cloud-ERP environment, as Gupta et al. [23] imply;
organizational, people, and technological factors are crucial resources of an organization,
enhancing the process of nurturing the dynamic capability. López and Ishizaka [21] argue
that key users’ active involvement in the cloud-ERP implementation process enables
subsequent training, use, and acceptance of the technology. The key users’ expertise
is critical in accomplishing successful ERP initiatives and adoptions (2017). Likewise,
other studies also suggest that employees’ IT knowledge and training of users are crucial
organizational concerns affecting cloud-ERP adoption success [25].

Organizational Change. Organizational changes include new ways
to handle the data in cloud-based ERPs. Most of the IT resources of cloud-based ERP
systems are outsourced. The cloud provider governs the IT infrastructure and its main-
tenance, backups, and security procedures. Thus, client organizations cannot monitor,
govern, or secure their systems themselves, which may cause organizational resistance
toward the changes, and the loss of their IT competencies and control over the infrastruc-
ture [3]. Sørheller et al. [22] also identified organizational change as a significant concern
related to cloud-ERPs. Conversely, Gupta et al. [23] achieved customer sustainability
demands through green and transparent business practices, transforming cloud-ERP into
a dynamic capability, which led to sustainable performance.

Intention to Use and User Satisfaction. Perceived risk of cloud-ERP systems and sat-
isfaction with and breadth of use of on-premise ERP systems hinder the adoption of
cloud-ERP [16]. However, data quality, system quality, information quality, and positive
peer-employee opinions are found to affect the perceived benefits leading to increased
intention to use cloud-ERP systems [16]. Other technical characteristics of cloud-ERP,
such as compatibility, complexity, and trialability, may also enhance the organiza-
tional comparative advantage and adoption likelihood [25]. In addition, the individual

employee characteristics, performance expectancy, effort expectancy, social influence,
and facilitating conditions can influence the intention to use cloud-ERP [25]. Gupta et al.
[3] argue that awareness is an identified challenge with adopting cloud-ERPs. Thus, the
study recommends introducing cloud-ERP awareness programs, workshops, and train-
ing sessions for the organizations to make it easier for those firms to adopt cloud-ERP.
This may increase the organizations/employees’ intentions to adopt and use the system.
Chen et al. [20] state that successful cloud-ERP adoptions rely on the support given in
the SaaS model and the customer experience with provided service rather than the ERP
product itself. Hence, the paradigm changes from product features to service trust.
The intention to use construct captures the balance of the positive and negative
impacts of cloud-ERP on employees, suppliers, customers, organizations, markets,
industries, economies, and even society. However, the literature focuses mainly on the
challenges and benefits of cloud-ERP affecting organizations as an entity. Thus, studies
focusing on weighing the benefits vs. challenges of cloud-ERPs were not identified in this
review. Several articles argue that cloud-ERP adoption would provide improved
efficiency, reduced total costs, and time savings. These benefits may be viewed as net
benefits, as they measure the outcome of inputs. Net system benefits are affected by system
use and user satisfaction with the system. DeLone and McLean [11] argue that it is
appropriate to include the dimension in this section.

4.3 Environmental Context


Service Reliability and Vendor Dependence. Service reliability is a service quality
attribute measuring a system provider’s dependableness. DeLone and McLean [11] argue
that system providers offer higher service quality to users when reacting quickly and
positively to their issues and requests. Sørheller et al. [22] rated suppliers’ reliability as
one of the top concerns related to cloud-ERPs. Other studies attempted to investigate
the measures that cloud-ERP vendors could take to enhance the service quality. For
example, Chang [16] argues that if cloud service providers can reduce system response
time, this may enhance service reliability. Tongsuksai et al. [25] suggest that
industry, competitive pressure, and the trustworthiness of service providers are highly
influencing cloud-ERP adoption’s success. Via a case study, López and Ishizaka [21]
suggest that maintenance ability and support services are deemed essential criteria in
the cloud-ERP vendor selection process.
Alsharari et al. [17] found that lower organizational independence (aka. vendor lock-
in) could hinder the adoption of cloud-ERP, as the firm becomes directly linked to the
provider’s availability and effectiveness. Their case study implies that the efficacy of
implementing a successful cloud-ERP relies mainly on the provider’s professionalism
to minimize organizational independence. Any potential damage or ineffectiveness in
the vendors’ software or hardware might influence the company’s accessibility to its
data. Accessibility is regarded as a vital system characteristic contributing to enhanced
organizational performance [17]. Gupta et al. [3] also argue that vendor
dependence (lock-in) is a significant hindrance to migrating to the cloud. So far, cloud-
ERP vendors have been providing limited applications to organizations, focusing on
the core area of ERP applications. Clients are given access to only a certain number

of modules, which may ease the trust establishment between the clients and service
providers, given the reduced complexity of the systems (2017).

Ease of Updates/Upgrades. Up-to-date hardware equipment and software applications
are considered a service quality measurement and dimension. As cloud-ERP services
are delivered through the internet, the systems are easily and frequently upgraded. The
literature considers the service quality attributes – easy and fast-to-deploy, up-to-date,
and more accessible updated functionalities as major beneficial characteristics of cloud-
ERP [19, 24].
In summary, most of the reviewed articles find that cloud-ERPs overall enhance orga-
nizational performance. For instance, Alsharari et al. [17] case study results suggest that
the cloud-ERP system's ease of use, low up-front costs, ease of control and management,
availability, accessibility, and the zero maintenance costs were the primary objectives of
adopting a cloud-based ERP in that case. The findings provide evidence that using the
Cloud ERP system is constructive to organizations’ success and improves the quality of
their decision-making process.

5 Discussion
Based on the review of cloud-ERP system characteristic benefits and inhibitors, the
following part presents the main research focus and some research gaps in the existing
literature. Although twelve articles from 2015 to 2022 are a low number of studies, this
review identified reoccurring research themes and findings across the articles (refer to
Table 3).

5.1 Research Focus


According to [16, 17, 23], SaaS generally reduces (and in some cases eliminates) hard-
ware and software-related management processes and costs. The extant literature also
emphasizes that service providers allow the end-user to access the service from anywhere,
at any time. As organizations are becoming more globalized and have subsidiaries dis-
tributed across nations, it may be necessary to implement a cloud-ERP to benefit from
the scalability, accessibility, availability, and reliability, which heavily impact cloud-ERP
adoptions by client organizations. The vendors’ infrastructure scalability potentials are
essential in responding to the increasing demands of a growing business and client. Addi-
tionally, our findings suggest that cloud-ERPs change the way organizations access data,
upgrade functionalities, and benefit from real-time information flow, enhancing organi-
zations’ performance. Various applications and services offered under the umbrella of
cloud-ERPs help organizations and employees to improve collaboration capabilities and
reduce resource and investment efforts. Our findings also support the claim that cost opti-
mization is one of the primary reasons for migration from on-premises to cloud-ERP.
Accordingly, the system characteristics show that cloud-ERP offerings have enabled
enterprise customers to see more applications and services moving to the cloud. Even
though our findings present several positive aspects of cloud services, there are also
critical concerns, such as data security risks.

Common denominators of the literature are the hindrances for utilizing cloud-ERPs,
primarily based on environmental contexts, such as vendors’ support and service, and to
what extent the organization is dependent on the vendors (vendor lock-in). This is in line
with several studies that categorized vendor lock-in as one of the major challenges of
cloud-ERP adoptions, as the service providers, host, operate, and support both the appli-
cation layer and the data layer. Additionally, customization and integration limitations
are other barriers to cloud-ERP migration. Given that Cloud-ERP providers may vary
extensively when placing governance over cloud-ERP services, it can be expected that
the utilization of cloud-ERP would provide a different degree of efficiency and service
reliability across organizations. Additionally, six out of the twelve reviewed articles pro-
pose data security as a top concern for businesses utilizing cloud services. Although data
security is categorized in the technological aspect and is considered a system quality
measure, cloud-ERP differentiates from other technical attributes since it is delivered as
a service. Thus, the literature discusses data security issues regarding vendors' trustwor-
thiness and information privacy protocols due to vendors' access to the organizational
master data and the public cloud. Hence, it may be appropriate to view data security
as a service measure. Security concerns are relevant, as cloud-ERPs may cause data
leakage and/or suffer other vulnerabilities that may affect client organizations. Some
studies (e.g. [17]) imply that security issues might be linked to incompetent providers of
cloud-ERP rather than the system itself. Moreover, client-side users’ IT experience may
improve the perception of security risks in cloud ERPs [24], as some studies suggest that
the security measures taken by cloud ERP providers may meet higher standards than what
their clients can provide themselves [2]. This may indicate that real-life case studies
and surveys conducted in IT companies or organizations with IT-savvy employees may
provide more valid/realistic evidence regarding the cloud-ERP security landscape.

5.2 Limitations and Future Research


The most obvious limitation is the small number of databases used and the small number
of articles reviewed. Although the selected databases provide quality sources and research outlets,
it may be worthwhile to extend the search to other databases and conference proceedings
that could provide other dimensions to this review that may have been overlooked.
Differences deriving from organizational factors, users, and system variations can modify
the view as to which success measures are essential. For example, enterprise cloud-
ERP users (clients) may prefer or emphasize different success measures, depending
on the type of cloud-ERP system deployed and the business’s type and size. However,
not all articles specify the size or type of the studied organization(s). Further, several
articles mention the need for composing and customizing web services. However, the
size of the organizations' needed modules and applications was not discussed. It can be
assumed that different organizational sizes and industries require various applications
and functionalities. Appropriate alignment and increased functional fit may increase
intention to use and user satisfaction. Additionally, enterprise types and business contexts
could dictate the proper specification and application of the success dimensions.
Cloud-ERPs’ benefits and challenges are frequently discussed in the literature, yet
studies that focus on the net benefits of those systems are scarce. For example, the
system and information quality attributes are expected to enhance cloud-ERP adoptions,

however, it is not evident if these benefits yield positive net benefits for the organizations
in general. For instance, cloud-ERP offers financial benefits, such as pay per use, lower
up-front costs, cost transparency, and affordability compared to incumbent ERP systems.
Nevertheless, the costs of cloud-ERPs are not subtracted to capture the actual balance
of positive and negative financial impacts. In general, studies focusing on the total cost
of ownership (TCO) of cloud-ERP systems are needed.
The on-demand feature is essential in cloud-ERPs and is cited as an accessibility-
related dimension in literature. However, some vendors may also access the data, intro-
ducing data security and privacy issues. Researchers might need to investigate the balance
between accessibility and vendor trust. Moreover, future research should investigate how
the benefits and challenges offset each other. Finally, vendors’ perspectives on data secu-
rity and how they aim to create business value and maximize trust, and the steps they take
to reduce the security concerns for their enterprise customers, maybe another interesting
future research avenue.

6 Conclusion
By utilizing DeLone and McLean's IS success model and the TOE framework, this
paper attempts to identify which system characteristics of cloud-ERP may lead to an
improved organizational performance or may hinder the migration to the public cloud.
All the reviewed articles found a positive impact of cloud-ERP on organizational perfor-
mance in different contexts. Cloud-ERPs improve collaboration capabilities, scalability,
reliability, system availability, and accessibility, enhancing organizational
efficiency. Additionally, real-time information flow and frequent system updates are con-
firmed to improve decision-making processes. These impacts are due to cloud-ERPs' system
and information quality. However, service quality concerns such as vendor lock-in
and reliability might hinder organizations from moving to the cloud. Data security issues,
integration complexity, and customization difficulties may also inhibit cloud migration.
These findings may help service providers to work on strategies to reduce or eliminate
those concerns, enhance trust, and encourage enterprises to move to a cloud-based ERP
to enhance their business performance in general.

A Generic Neural Network
Implementation on GPU and Its
Performance Benchmark

Tristan Udby(B) and Yun Tian

Eastern Washington University, Cheney, WA 99004, USA


{tudby,ytian}@ewu.edu

Abstract. Due to the parallel and computationally intensive nature of
Artificial Neural Networks, we use GPUs to implement a generic Multi-
layer Perceptron (MLP) framework and compare the speed to an imple-
mentation on the CPU. The speedup achieved increases as the size of the
network increases, but is also contingent on the hardware used. Three
GPUs are tested, the Tesla K80, the Tesla T4, and the Tesla P100. For
the largest ANNs tested, speedups ranged from 331.14× for the K80 up
to 2379.2× on the P100.

Keywords: GPU · Artificial Neural Network · Multilayer perceptron · Performance speedup

1 Introduction
As a massively parallel platform for general-purpose computing, Graphics Pro-
cessing Units (GPUs), traditionally the video cards, have been applied beyond
graphics processing. For example, GPU has been widely utilized in computa-
tional chemistry, biology, and computer vision, to accelerate problem solving
and simulations [3,7,23], thanks to its tremendous amount of computational
power that is enabled by thousands of processor cores on a GPU.
Artificial Neural Networks (ANN) are a crucial foundation for deep learn-
ing and many machine learning algorithms. However, training an ANN on the
Central Processing Unit (CPU) is quite computationally intensive. During ANN
training, for each training data sample, the Back Propagation (BP) algorithm
must traverse all neurons in hidden layer(s) and the output layer of such a net-
work [12,24]. As the number of layers, the number of neurons in each layer and
the size of the training data set increase, ANN training may dramatically slow
down.
Training an ANN is inherently parallel, thus is suitable to be parallelized
using a GPU. In an ANN, neurons in the same layer are independent of each other
during the training process. In addition, each training data sample is independent
of other samples during training, if batch mode is used [1]. Traditionally, GPU
accelerated ANN training took advantage of both levels of parallelism in two
separate designs/implementations [13].

The purpose of this paper is to explore and demonstrate the performance
improvement that a GPU can achieve over a CPU in the training of an ANN.
Several GPUs are tested to observe the effect that different hardware has on
the speedup, and testing is conducted to observe how the speedup behaves as
a function of the size of the network. Our major contributions are explicitly
presented in Sect. 2.
The code for this project can be found on the public repository on github:
https://github.com/TUdby/GPU Neural Network

2 Related Work

Early work nearly twenty years ago showed the promise of GPUs in their infancy.
In 2004 an ATI RADEON 9700 achieved a 20× speedup over the CPU [20]. Another
study in 2006 focused specifically on CNNs and found it could achieve a speedup of
between 3.1× and 4.1×. These results were promising, but such tests must be repeated
on newer hardware [5].
More recent tests looked at more updated devices. One such test looked at
devices that are available in many desktops and used python packages such as
Numba to achieve 100–250× speedups [8]. The advantage of testing on common
devices using popular languages and packages is that the benchmark provided
applies for many implementations that would be seen in practical applications.
A downside is that testing against Python (even with Numba's JIT optimizations)
will bias the results to make the CPU appear slower than it is. A different study
used cuBLAS (the CUDA Basic Linear Algebra Subprograms library) to implement large RNNs
and found 2× to 11× speedup over the CPU [16]. A benefit to this is that
cuBLAS is popular for implementing linear algebra operations through CUDA,
but the downside is that further optimization can be achieved by designing
kernels directly.
Several studies have been done to compare lower level implementations
against optimized CPU code. One such study reported 10–60× speedup depend-
ing on the size of the network when comparing the GPU to compiler optimized
C code [6]. Another study used the GTX 260 to achieve 50× speedup over opti-
mized CPU code [11], and a third tested CNNs versus C++ code using the –O3
optimization to achieve 2–24× speedup [22]. These tests provide a good bench-
mark, but utilized lower end hardware. A better GPU, the Tesla C2050, was
shown to provide up to 1095× speedup [21], showing that better GPUs can be
expected to increase the tested speedups into the thousands.
Many tests have also been conducted using mobile hardware. One such study
showed a range of 2× to 9× speedup [15], while another, testing recurrent neural
networks (RNNs), found at least 4× speedup over the mobile phone's CPU [4]. The
best tests conducted achieved a super-linear speedup of 63.45× [14], showing just
how much potential a mobile device's GPU has for machine learning.
Several studies have probed into the advantages provided by using multiple
GPUs. Source [25] used two GTX 570’s and compared the speedup displayed by
one GPU to the speedup shown by using both. The one GPU sped up over the
CPU by 11.99× while both together achieved 51.35×. A second study compared
different numbers of GPUs: training with eight Tesla P100s and a mini-batch size of
256 took 29 h, while 1,024 Tesla P100s with a mini-batch size of 32,768 took 15 min
to complete [2]. These tests had to grapple with the bottleneck posed by updating
the weights. For any large-scale machine learning
system, utilizing multiple GPUs efficiently would be necessary.
Other studies have investigated comparisons between GPUs and other hard-
ware besides a fully synchronous CPU. One such study compared the GPU
implementation to a parallel MLP implemented on a multi-core CPU [18]. While
the GPU outperformed synchronous CPU code with about 4–5.66× speedup, the
multi-core CPU outperformed the GPU with about 6–9.7× speedup. This inter-
estingly led to the conclusion that multi-core CPUs might be useful for machine
learning. A second study [17] looked at the use of FPGAs for CNNs and used a
Virtex-7 FPGA to process the network 8.3× faster than the Titan X GPU, also
with 75× more energy efficiency.
Our study designs kernels using C/CUDA based on tile matrix multiplication
and tests them against a C based implementation that is optimized using the
–O2 flag. Our generic implementation allows for an arbitrary number of nodes
and hidden layers to be specified and runs for a predetermined number of epochs.
Several GPUs are accessed using Google Colab to observe what speedups can
be achieved on different hardware. Unlike many of the studies mentioned, no
packages or higher level languages are tested, and the GPUs are high end. This
allows it to be useful as a benchmark of what can be accomplished by dedicated
systems.
Our Contributions: First, our implementation in C/CUDA successfully inte-
grates tile matrix multiplication and is tested against a sequential implementa-
tion using C that is optimized using the –O2 flag. Second, our generic imple-
mentation allows for creating an arbitrary number of neural nodes and hidden
layers and running for a predetermined number of epochs. Third, several GPUs
are accessed using Google Colab to observe what speedups can be achieved on
different GPU hardware. Unlike many of the existing studies, no packages or
higher level programming languages are used in our tests. This allows it to be
useful as a performance benchmark.
In comparison with existing research, this work has the following limita-
tions. Our work does not test using multiple GPUs to implement one ANN.
But, we test our ANN implementation on different GPUs in different experi-
ments. Also, the speedup is only compared to sequential CPU code and does
not compare it to other hardware (such as FPGAs, multi-core CPUs, TPUs,
etc.). Finally, the tests run a basic multilayer perceptron and do not test CNNs,
RNNs, or other forms of neural networks.
The outline of the rest of the paper is as follows. First we will state the
necessary math for multilayer perceptrons in matrix form in Sect. 3. Then, we
will explain how the data for these steps is stored and processed in parallel in
Sect. 4. After this, we present pseudocode in Sect. 5 for the construction of the
CUDA kernels necessary to conduct the math previously shown. In Sect. 6, we
test the code on several different GPUs and discuss our results in the performance
section.

3 Matrix Mathematical Form


Training a Multilayer Perceptron can be broken into three steps: forward prop-
agation, backpropagation, and updating the weights. The mathematical results
for these steps in terms of matrices will be stated in order to establish the nota-
tion that is used both here in the pseudocode as well as in the actual code.
Source [9] provides an explanation for this math which will not be restated here
as it is already well established. The notation used here is different than in the
source, but it is conceptually the same.

3.1 Forward Propagation

Let Yr denote the matrix of node values for layer r. The rows correspond with
the nodes in the layer and the columns correspond with training pairs. Then
let Wr be the matrix of weights going from layer r to layer r + 1. The rows
correspond to nodes in the out-layer and columns correspond with nodes from
in-layer, except for the first column, which stores the bias weights. The activation
function used in the program is the sigmoid function, denoted here simply as
f (). By appending a row of ones to the top of the matrix of node values, denoted
Y˚r , a single forward step is calculated using the formula below.

Yr+1 = f (Wr Y˚r ) (1)


Note that the appending of the ones to the nodes values is a purely logical
step and will be handled differently in the code.
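As a concrete illustration, equation (1) can be written as plain C on the CPU as in the sketch below. This is only a minimal reference sketch assuming a row-major layout with the bias weights stored in column 0 of W; the names and layout here are illustrative assumptions and are not taken from the project code.

/* Minimal CPU reference sketch of equation (1) for one layer (illustrative only).
 * W is k_next x (k_cur + 1), with the bias weights in column 0; Y_cur is
 * k_cur x batch; Y_next is k_next x batch; all matrices are row-major. */
#include <math.h>

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

void forward_step(const double *W, const double *Y_cur, double *Y_next,
                  int k_cur, int k_next, int batch)
{
    for (int j = 0; j < k_next; j++) {            /* node j in layer r+1 */
        for (int i = 0; i < batch; i++) {         /* training pair i     */
            double acc = W[j * (k_cur + 1) + 0];  /* bias weight, the logical "row of ones" */
            for (int h = 0; h < k_cur; h++)
                acc += W[j * (k_cur + 1) + (h + 1)] * Y_cur[h * batch + i];
            Y_next[j * batch + i] = sigmoid(acc);
        }
    }
}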

3.2 Backpropagation

Backpropagation consists of recursively finding the output of the delta function,
which is the product of the error with the derivative of the activation function.
Let Dr represent every delta value for layer r with each row corresponding
to a node and each column to a training pair. Calculating all delta values for
each node in the last layer, layer L, is done using the following equation.

DL = YL ◦ (J − YL ) ◦ (YL − O) (2)
The matrix J stands for an all-ones matrix, and ◦ is the Hadamard prod-
uct that defines component-wise multiplication. Also, the matrix O holds the
expected output that is being used to train the MLP. After the base case is
calculated, the deltas for every other layer can be calculated using the deltas of
the layer in front of it with the following equation.

Dr = Yr ◦ (J − Yr) ◦ Wr^T Dr+1     (3)



3.3 Weight Updates

The change in weights can be calculated using the matrix of deltas and the trans-
pose of the matrix of node values with the appended ones. Using the learning
rate µ and the matrices already defined, the change in all weights (including bias
weights) for a given layer r is given.
∆Wr = −µ Dr+1 Y˚r^T     (4)
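For completeness, equations (2)–(4) can likewise be expressed as a minimal CPU reference sketch, using the same row-major, bias-in-column-0 convention as the previous sketch. These helpers are illustrative only and are not the paper's sequential implementation.

/* Equation (2): deltas for the output layer L. */
void output_deltas(const double *Y_L, const double *O, double *D_L,
                   int k_L, int batch)
{
    for (int j = 0; j < k_L; j++)
        for (int i = 0; i < batch; i++) {
            double y = Y_L[j * batch + i];
            D_L[j * batch + i] = y * (1.0 - y) * (y - O[j * batch + i]);
        }
}

/* Equation (3): deltas for hidden layer r from the deltas of layer r+1. */
void hidden_deltas(const double *W_r, const double *Y_r, const double *D_next,
                   double *D_r, int k_r, int k_next, int batch)
{
    for (int j = 0; j < k_r; j++)
        for (int i = 0; i < batch; i++) {
            double acc = 0.0;                      /* (Wr^T Dr+1)[j][i] */
            for (int h = 0; h < k_next; h++)
                acc += W_r[h * (k_r + 1) + (j + 1)] * D_next[h * batch + i];
            double y = Y_r[j * batch + i];
            D_r[j * batch + i] = y * (1.0 - y) * acc;
        }
}

/* Equation (4): update weights and bias weights of layer r. */
void update_weights(double *W_r, const double *D_next, const double *Y_r,
                    int k_r, int k_next, int batch, double mu)
{
    for (int j = 0; j < k_next; j++)
        for (int c = 0; c < k_r + 1; c++) {
            double acc = 0.0;                      /* (Dr+1 Y˚r^T)[j][c] */
            for (int i = 0; i < batch; i++) {
                double y = (c == 0) ? 1.0 : Y_r[(c - 1) * batch + i];
                acc += D_next[j * batch + i] * y;
            }
            W_r[j * (k_r + 1) + c] -= mu * acc;
        }
}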

4 Data Representation
Before developing code, we must figure out how to store the necessary data. We
have matrices of weights, node values, and delta values. First, let us store all
matrices of a kind in an array in order of their rows.

Y = [Y1 , Y2 , . . . , YL ]
D = [D1 , D2 , . . . , DL ]
W = [W0 , W1 , W2 , . . . , WL ]
Notice that the first element in W is given an index of 0, while the first
elements of D and Y are given indices of 1. This is because layer 0 is the input
layer and W0 represents the weights going from the input layer to the first hidden
layer (or output layer if there are no hidden layers). The array D doesn't
include delta values for the input layer, as there is no error associated with the
input itself. The matrix Y could have a Y0 , but that is already held in the array
being given as input and including Y0 would only mean copying the data from
the input array into Y0 , which wastes time.
Using multidimensional arrays for this representation of data allows for a
very intuitive way of accessing the information. Let yr,j (i) be the value of node j
in layer r for training pair i, and let δr,j (i) be the delta value for that same node
and training pair. Accessing this information as it has been stored in an array
becomes Y[r−1][j][i] and D[r−1][j][i], respectively. One has to be subtracted
from the layer index because the arrays use zero-based indexing, while the first
elements of these arrays correspond to layer one. Now let wr,j,h be the weight going
from node j in layer r to node h in the next layer. Since rows of Wr correspond to
nodes in the out-layer, this can be accessed as W[r][h][j + 1]; the one is added to the
column index because the first column holds the bias weights. The bias weight
going into node h from layer r is denoted br,h and can be accessed as W[r][h][0].
For those who look over the code, a few details are necessary. When programming
on a GPU it is best to linearize multidimensional arrays into a single array. This
poses a difficulty with our data representation. When linearized, an element of a
three-dimensional array may be accessed as M[z * matrix_size + y * columns_amount + x],
which is equivalent to M[z][y][x]. However, this assumes all matrices are of the
same size, with the same number of rows and columns. This is not the case for us,
as the sizes of our matrices depend on how many nodes are in each layer. Because
of this, the starting index for each matrix in the linearized array has to be calculated
beforehand. We define two arrays: nodes_indices and w_indices. Because Y and D
have identically sized matrices corresponding to nodes and training pairs, the
calculated starting indices of the matrices in both of these arrays can be stored in
nodes_indices, while the starting indices for W are stored in w_indices. To calculate
these starting indices, and also to reference the number of columns a given matrix
has, we also need an array storing the size of each layer in order, which we call
layers. In the pseudocode presented, these lower-level details are omitted and the
arrays are treated as though they are stored as three-dimensional.
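As an illustration of this bookkeeping, the starting offsets could be precomputed from the layers array roughly as follows. This is a sketch only: the names mirror the text (layers, nodes_indices, w_indices), but the exact details are assumptions rather than the repository's code.

/* Illustrative sketch of precomputing starting offsets for the linearized
 * Y/D and W arrays. layers[0] is the input layer size; layers[1..L] are the
 * hidden and output layer sizes (num_layers = L + 1 entries in total). */
void compute_offsets(const int *layers, int num_layers, int batch_size,
                     long *nodes_indices, long *w_indices)
{
    long y_off = 0, w_off = 0;
    for (int r = 1; r < num_layers; r++) {
        nodes_indices[r - 1] = y_off;                  /* start of Y_r (and D_r) */
        y_off += (long)layers[r] * batch_size;
    }
    for (int r = 0; r < num_layers - 1; r++) {
        w_indices[r] = w_off;                          /* start of W_r */
        w_off += (long)layers[r + 1] * (layers[r] + 1); /* +1 column for the bias weights */
    }
}

/* With these offsets, the logical access Y[r-1][j][i] becomes
 * Y[nodes_indices[r-1] + j * batch_size + i], and W[r][j][c] becomes
 * W[w_indices[r] + j * (layers[r] + 1) + c]. */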

5 The Kernels
The following are pseudocode for the kernels that must implement the math
using the data structures previously described. For the sake of brevity, the pseu-
docode omits certain minor details that would be necessary to make functioning
code, but are not necessary for displaying the concept. Each kernel is built around
the tile matrix multiplication algorithm for GPUs [19].
In the pseudocode, kr is the number of nodes in layer r. Also, the variable
batch_size refers to the number of training pairs being run at a time. This
allows the kernels to be useful for mini-batch mode as well, but in the code all
the training pairs were run at once in full batch mode. Finally, the variables bx,
by, tx, and ty are used as shorthand for blockIdx.x, blockIdx.y, threadIdx.x, and
threadIdx.y.
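Since each kernel is built around the tile matrix multiplication pattern from [19], the generic skeleton below may help in reading the pseudocode. It is an illustrative sketch only: TILE_WIDTH, the array names, and the boundary handling are assumptions and are not code taken from the project repository.

// Generic tiled matrix-multiplication skeleton (C = A * B) that the three
// kernels below specialize. Illustrative sketch; names are assumptions.
#define TILE_WIDTH 16

__global__ void tiledMatMul(const float *A, const float *B, float *C,
                            int M, int K, int N)      // A is MxK, B is KxN
{
    __shared__ float As[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Bs[TILE_WIDTH][TILE_WIDTH];

    int col = blockIdx.x * TILE_WIDTH + threadIdx.x;   // "i" in the pseudocode
    int row = blockIdx.y * TILE_WIDTH + threadIdx.y;   // "j" in the pseudocode
    float acc = 0.0f;

    int num_tiles = (K + TILE_WIDTH - 1) / TILE_WIDTH;
    for (int m = 0; m < num_tiles; m++) {
        int a_col = m * TILE_WIDTH + threadIdx.x;      // "tile_col"
        int b_row = m * TILE_WIDTH + threadIdx.y;      // "tile_row"
        As[threadIdx.y][threadIdx.x] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();                               // both tiles fully loaded

        for (int x = 0; x < TILE_WIDTH; x++)
            acc += As[threadIdx.y][x] * Bs[x][threadIdx.x];
        __syncthreads();                               // done reading this tile
    }

    if (row < M && col < N)
        C[row * N + col] = acc;                        // the kernels add bias/sigmoid here
}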

5.1 Forward Propagation Kernel

The forward propagation kernel implements equation (1) for a single layer, and
will be called iteratively for each step. First, the matrix multiplication section
within the for loop handles Wr Yr . Notice that this multiplication skips the bias
nodes by adding one to the column. After the multiplication section completes,
every thread adds the bias weight to the accumulated value for its index and
passes it into the sigmoid activation function.

Algorithm 1: Forward Propagation Kernel

1: shared WV[TILE_WIDTH][TILE_WIDTH]
2: shared YV[TILE_WIDTH][TILE_WIDTH]
3:
4: i = bx * TILE_WIDTH + tx
5: j = by * TILE_WIDTH + ty
6:
7: acc = 0
8: for m = 0; m < num_tiles; m++ do
9:   WV[ty][tx] = W[r][j][tile_col+1]
10:  YV[ty][tx] = Y[r][tile_row][i]
11:
12:  syncthreads()
13:
14:  for x = 0; x < TILE_WIDTH; x++ do
15:    acc += WV[ty][x] * YV[x][tx]
16:  end for
17:
18:  syncthreads()
19: end for
20:
21: if j < kr+1 and i < batch_size then
22:   y = sigmoid(acc + W[r][j][0])
23:   Y[r][j][i] = y
24:
25:   if r = L - 1 then
26:     D[r][j][i] = y * ( 1 - y ) * ( y - O[j][i] )
27:   end if
28: end if

Also, notice that there is a final check at the end to see if we just went from
the second to last layer, L − 1, to the last layer, L. If the layer is the second to
last layer, then we just completed the final forward computation. At this point,
we actually do the calculation for the first delta values given by equation (2). We
do this because we have all the resources allocated and information calculated,
so it would be a waste to exit the kernel and dedicate a separate kernel to what
can be done in one line.
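On the host side, a launch of this kernel for the step from layer r to layer r + 1 might look as follows. This is an illustrative sketch: the kernel name, signature, and argument list are assumptions rather than the repository's interface; only the grid and block shape, one thread per (node, training pair) arranged in TILE_WIDTH × TILE_WIDTH tiles, reflects the pseudocode above.

// Illustrative host-side launch configuration for Algorithm 1 (a sketch,
// not the repository's code); forward_kernel's parameter list is assumed.
__global__ void forward_kernel(float *W, float *Y, float *D, float *O,
                               int *layers, long *nodes_indices, long *w_indices,
                               int r, int L, int batch_size);

void launch_forward(float *W, float *Y, float *D, float *O,
                    int *layers, long *nodes_indices, long *w_indices,
                    int L, int batch_size)
{
    dim3 block(TILE_WIDTH, TILE_WIDTH);
    for (int r = 0; r < L; r++) {                      // one launch per layer step
        dim3 grid((batch_size    + TILE_WIDTH - 1) / TILE_WIDTH,   // columns: training pairs
                  (layers[r + 1] + TILE_WIDTH - 1) / TILE_WIDTH);  // rows: nodes in layer r+1
        forward_kernel<<<grid, block>>>(W, Y, D, O, layers,
                                        nodes_indices, w_indices,
                                        r, L, batch_size);
        // Forward propagation is sequential across layers; launching in the
        // same stream preserves that ordering on the GPU.
    }
}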

5.2 Backpropagation Kernel


Because the first layer of deltas was calculated in the forward propagation kernel,
the backpropagation kernel only seeks to iteratively implement equation (3).
The multiplication section handles Wr^T Dr+1. Notice that the weights matrix
is meant to be transposed. Rather than actually transpose the matrix in memory,
we switch the indices we use when accessing it. This involves switching the row
and column indices, but little else needs to be changed (in the lower level code,
there are checks to ensure the indices are in the range of the desired matrix, and
these checks need alteration as well).
Once every thread has the accumulated value for its index of the matrix, we
load the necessary node value from Y and calculate the delta.

Algorithm 2: Backpropagation Kernel

1: shared WV[TILE_WIDTH][TILE_WIDTH]
2: shared DV[TILE_WIDTH][TILE_WIDTH]
3:
4: i = bx * TILE_WIDTH + tx
5: j = by * TILE_WIDTH + ty
6:
7: acc = 0
8: for m = 0; m < num_tiles; m++ do
9:   WV[ty][tx] = W[r][tile_col][j+1]
10:  DV[ty][tx] = D[r][tile_row][i]
11:
12:  syncthreads()
13:
14:  for k = 0; k < TILE_WIDTH; k++ do
15:    acc += WV[ty][k] * DV[k][tx]
16:  end for
17:
18:  syncthreads()
19: end for
20:
21: if j < kr and i < batch_size then
22:   y = Y[r-1][j][i]
23:   D[r-1][j][i] = y * (1 - y) * acc
24: end if

5.3 Weights Update Kernel

The kernel for weights updating must implement equation (4). Once we calculate
Dr+1 Y˚r^T, we only need to multiply each element by the learning rate µ and
subtract it from the respective element of the weights matrix. Both the weights and
the biases are updated at the same time. For the multiplication, we only need to
decide how to handle Y˚r^T, as we have not actually stored the row of ones with the
node values. This is solved with a brief if-else statement that shifts each thread's
assigned column down by one, and all threads assigned column 0 are given the
value 1 to create the all-ones row. Notice that Yr is transposed using the same
simple indexing trick used on the weights matrix in the backpropagation kernel.

Algorithm 3: Weights Update Kernel

1: shared DV[TILE_WIDTH][TILE_WIDTH]
2: shared YV[TILE_WIDTH][TILE_WIDTH]
3:
4: i = bx * TILE_WIDTH + tx
5: j = by * TILE_WIDTH + ty
6:
7: acc = 0
8: for m = 0; m < num_tiles; m++ do
9:   DV[ty][tx] = D[r][j][tile_col]
10:
11:  if i == 0 then
12:    YV[ty][tx] = 1
13:  else
14:    YV[ty][tx] = Y[r-1][i-1][tile_row]
15:  end if
16:
17:  syncthreads()
18:
19:  for k = 0; k < TILE_WIDTH; k++ do
20:    acc += DV[ty][k] * YV[k][tx]
21:  end for
22:
23:  syncthreads()
24: end for
25:
26: if j < kr+1 and i < kr + 1 then
27:   W[r][j][i] -= mu * acc
28: end if

6 Performance and Discussion

Three different GPUs were used to test the performance of the code: the Tesla
K80, the Tesla T4, and the Tesla P100. The GPUs and their specs are listed in Table 1
in order of performance, with the Tesla K80 being the weakest and the Tesla P100
being the strongest. The difference in performance between GPUs provides a
good demonstration of the extent to which the speedup is dependent on hard-
ware.
Two sets of tests were run on each piece of hardware. First, the CPU and GPUs
were run and timed for a number of hidden layers ranging from 2 to 16 and a
number of nodes per hidden layer from 10 to 700 (incrementing by 10). These
tests were to observe how the speedup behaved as a function of the number of
nodes and layers. Because so many networks were being run in this test, it was
not feasible to include larger networks in this mass testing, as the CPU took too
much time to complete. Instead, the second test observed the speedup for much
larger networks that were run individually. This test ran for 5000 nodes, 7500
nodes, and 10,000 nodes, each over 3 layers. The code was compiled using the
–O2 optimization flag to give the CPU code its best performance. In prior tests
that omitted any optimization, the speedup was much greater for all tests. This
detail is necessary to understand that the results displayed are a lower bound for
the performance of the GPUs. Unoptimized code, or code running in languages
other than C, is likely to afford the GPU much greater speedups over the CPU
through the use of these kernels.

Table 1. GPU specifications

GPU         Cores  Base Clock Speed (MHz)
Tesla K80   2496   562
Tesla T4    2560   585
Tesla P100  3584   1190
The following sections go over the first round of tests, then after them is a
section on the large network tests. The next three sections will each display a
figure with four parts. The upper left is a snippet of the dataframe containing the
results of the test with the specific GPU. The upper right is a multiple regression
conducted using the language R. Then the bottom two parts are scatter plots
to visualize the data. The first scatter plot is three-dimensional and displays
the relationship between the number of hidden layers, the number of nodes per
hidden layer, and the speedup. The second plot is a two-dimensional plot that
cuts out the layers to focus on the effect of the number of nodes.

6.1 Tesla K80

Observing the dataframe in Fig. 1, for the smallest networks tested the CPU
outperformed the GPU (without the optimization flag, the GPU outperformed
the CPU, but only barely). Intuitively, we can’t really expect any GPU to do
much better than this. The network being worked with is so small the CPU is
able to brute force the computation in nearly the time it takes for a GPU to set
up and tear down its kernels. Only if the clock speed of the GPU cores could
match the CPU could a significant speedup be seen.
However, as the network reaches the largest sizes tested, the speedup has
increased to 83×. The sheer quantity of cores in the K80 overpowers the quality
of the single CPU core. This can be observed visually in the scatter plots.
There is clearly a positive trend with moderate curvature.
We see diminishing returns as the amount of nodes increases. To test our
observations statistically we turn to the regression in the top right of Fig. 1.
After trying several models, a full second order model was chosen. This is because
in the 2D plot we see a change in slope for different amounts of layers which
implies an interaction term, and the diminishing returns we suspect should occur
demands the quadratic terms. The model fits very well, with R² and adjusted R² (Ra²)
values above 0.94 and all p-values at the highest significance that R will calculate.

Fig. 1. K80 Test

Looking at the estimates for the parameters, the tests allow us to be confident
that there is small negative curvature for both nodes and layers, as well as a
positive interaction between them.
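For clarity, the full second-order specification referred to above can be written out in its standard form (a restatement, not reproduced verbatim from the paper), with speedup S, nodes per hidden layer n, and hidden layers l:

S = β0 + β1 n + β2 l + β3 n·l + β4 n² + β5 l² + ε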
(*Note: The equation provided by the regression does not predict very well.
A residual analysis hints that there is a curvilinear relationship that is not
explained by the model, which might come from the warp scheduling done by the
GPU itself, or some other factor external to the code. However, the regression
still allows us to be confident about the overall effect that a change in the nodes
and layers must have on the speedup).

6.2 Tesla T4

The dataframe sample for the T4 is in Fig. 2. The smallest networks tested are
only a little faster than for the K80. The only increase in speedup comes from
the slightly faster clock speed as the increase in cores is not yet utilized. As
explained previously, not much more can be expected for such small networks,
but with the larger networks we see a more dramatic increase from the 83×
reported by the K80 to a 139.7× seen with the T4.
A look at the plots for the T4 show slightly more erratic speedup behavior for
larger networks. This was observed over several rounds of tests and only showed
up with the T4, so it is likely to originate with the hardware rather than our
code.

Fig. 2. T4 Test

The same general trends are observed that were seen in the K80. To ensure
these observations are accurate, we conduct another multiple regression on the
data. Again, the hypothesis tests for all terms show high significance. Also, the
coefficients of determination are strong, but we can see the decrease that corre-
sponds with the more erratic behavior we noted. The small negative curvature
is more gradual this time, allowing quicker growth in speedup before plateauing.

6.3 Tesla P100


Last is the P100. Below, Fig. 3 contains the dataframe and scatter plots for
the test. Immediately we see the incredible effect the increased clock speed and
greater number of cores have. For the smallest networks the speedup is not too
much better, as expected. The larger networks, however, are achieving very large
speedups with the largest recorded in the dataframe being about 699×.
Besides a few outliers, the scatter plots show what appears to be cleaner
performance than the T4 and the K80, but with the same general trend.
For larger numbers of layers there is slight negative curvature, but none
is seen for the smallest numbers of layers. This hints that we haven't fully observed the
diminishing returns, allowing us to surmise that much larger speedups may
still be possible. This will be seen in the tests for larger networks. And again we
conduct a quick multiple regression to confirm our observations on the effect the
layers and nodes have on the speedup.
Due to the greater consistency of the P100 over the T4, we see the R² and
adjusted R² values are the strongest seen so far, giving an even better fit.

Fig. 3. P100 Test

Notice that the
p-value for the quadratic nodes term has actually increased massively compared
to the prior tests, which reduces the statistical significance of that term. The large drop
in significance likely comes from our observation that the diminishing returns for
nodes were not captured in the dataframe. Because the trend occurs for much
larger networks on the P100, the statistical test doesn’t observe it with as much
confidence.

6.4 Large Network Tests


Three different sizes of large networks were tested. The first had 5,000 nodes
per hidden layer, the second had 7,500, and the third had 10,000. They all had
3 hidden layers. Table 2 shows the speedups achieved for each size and GPU.
Clearly, each GPU had lots more to give. The most impressive was the P100,
which showed a speedup of up to 2379×.

Table 2. Large network tests (speedup over the CPU, by nodes per hidden layer)

GPU          5000       7500       10000
Tesla K80    181.42×    252.76×    331.14×
Tesla T4     227.53×    303.4×     424.01×
Tesla P100   1638.71×   1930.46×   2379.2×

6.5 Discussion on Regression Results


The positive estimate for the nodes term is not surprising. The more nodes, the
larger the matrices. The larger the matrices, the more the matrix multiplication
kernels surpass the CPU performance.
The negative quadratic terms are of no surprise. There must be diminishing
returns as there are only a finite number of cores.
The significance of the number of layers is a little more nuanced. The forward
and backward propagation is serial, so more layers should not affect the
speedup; however, the weight updates for every layer can be done at the same
time. Because of this, the CPU triggers the update kernel for all layers without
synchronizing with the GPU, allowing the GPU to conduct all the matrix
multiplications as soon as possible, which is likely why more layers cause greater
speedup.
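As an illustration of this launch pattern (a sketch under assumed names, not the repository's code), the host can enqueue the weight-update kernel for every layer before synchronizing once at the end; kernel launches are asynchronous with respect to the CPU, so the loop returns immediately while the GPU works through the queued updates.

// Illustrative sketch of the asynchronous weight-update launches described
// above; the kernel name and parameters are assumptions.
__global__ void weight_update_kernel(float *W, const float *Y, const float *D,
                                     int *layers, int r, int batch_size, float mu);

void launch_weight_updates(float *W, const float *Y, const float *D,
                           int *layers, int L, int batch_size, float mu)
{
    dim3 block(TILE_WIDTH, TILE_WIDTH);
    for (int r = 0; r < L; r++) {
        dim3 grid((layers[r] + 1  + TILE_WIDTH - 1) / TILE_WIDTH,   // columns incl. bias
                  (layers[r + 1]  + TILE_WIDTH - 1) / TILE_WIDTH);  // rows: out-layer nodes
        // Each launch returns immediately on the CPU; the GPU then works
        // through the enqueued updates back-to-back.
        weight_update_kernel<<<grid, block>>>(W, Y, D, layers, r, batch_size, mu);
    }
    cudaDeviceSynchronize();   // wait once, after all updates are enqueued
}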
A point of interest is why the interaction term was shown to be significant.
The first explanation is that speedup is a ratio between the CPU and GPU
times, so networks with fewer layers are not going to increase in speedup by the
same number as networks with more layers, but rather would increase proportionally.
To a large extent this is most certainly true, but it begs investigation whether anything
else is occurring. Figures 4 and 5 show two linear regressions. The dependent variable
of the first is the time of the CPU, and that of the second is the time of the GPU.

Fig. 4. CPU Speed Regression

Both the CPU and GPU regressions find great significance in the interaction term, and
the explanation of proportions previously given no longer applies here.
It appears the increase in layers amplifies the effect the number of nodes has on
the time taken. It may just be a Type I error, but the p-values of the t-tests allow for
very high confidence. A possible explanation may be that locality is being better
exploited by the forward and backward propagation. Advice given by GPU Gems
[10] states:
“Access vertex data in a relatively sequential manner. Modern GPUs cache
memory accesses when fetching vertices. As in any memory hierarchy, spa-
tial locality of reference helps maximize hits in the cache [...]”.

Fig. 5. P100 Speed Regression

The arrays of nodes and delta values are both being accessed sequentially
along layers when propagating forward and backward. This same data structure
is used by the CPU which has powerful caching and so also benefits from locality,
so this may be the explanation. However, this remains conjecture for now.

7 Conclusion
A GPU provides significant speedup for Artificial Neural Networks. While hard-
ware clearly makes a significant difference in how much improvement can be
expected, improvement will still be achieved. As the regressions show, an increase
in the amount of nodes and layers leads to large increases in speedup. This tells
us a GPU becomes more practical the larger the network becomes, as the larger
network better exploits its parallel capabilities. Overall, the same general trends
are observed in the estimates between all GPUs tested, allowing us to conclude
that they will likely hold on other GPU hardware. A user implementing the ker-
nels shown can expect (depending on hardware) anywhere from several hundred
to several thousand times speedup for large networks.

References
1. Abraham, A.: Artificial neural networks. In: Sydenham, P., Thorn, R. (eds.) Hand-
book of Measuring System Design. John Wiley and Sons Ltd., London, pp. 901–908
(2005)
2. Akiba, T., Suzuki, S., Fukuda, K.: Extremely large minibatch SGD: training resnet-
50 on ImageNet in 15 minutes. arXiv preprint arXiv:1711.04325 (2017)
3. Beckingsale, D.A., et al.: Portable performance for large-scale scientific applica-
tions. In: 2019 IEEE/ACM International Workshop on Performance, Portability
and Productivity in HPC (P3HPC), pp. 71–81. IEEE (2019)
4. Cao, G., Balasubramanian, N., Balasubramanian, A.: MobiRNN: efficient recurrent
neural network execution on mobile GPU. In: Proceedings of the 1st International
Workshop on Deep Learning for Mobile Systems and Applications, pp. 1–6 (2017)
5. Chellapilla, K., Puri, S., Simard, P.: High performance convolutional neural net-
works for document processing. In: Lorette, G. (ed.) Tenth International Workshop
on Frontiers in Handwriting Recognition, La Baule (France), October 2006. Uni-
versité de Rennes 1, Suvisoft. https://www.suvisoft.com
6. Ciresan, D.C., Meier, U., Masci, J., Gambardella, L.M., Schmidhuber, J.: Flexi-
ble, high performance convolutional neural networks for image classification. In:
Twenty-Second International Joint Conference on Artificial Intelligence (2011)
7. Dematté, L., Prandi, D.: GPU computing for systems biology. Brief. Bioinform.
11(3), 323–333 (2010)
8. Dogaru, R., Dogaru, I.: Optimization of gpu and cpu acceleration for neural net-
works layers implemented in python. In: 2017 5th International Symposium on
Electrical and Electronics Engineering (ISEEE), pp. 1–6 (2017)
9. Dolhansky, B.: Artificial neural networks: Matrix form (Part 5), Decem-
ber 2014. https://www.briandolhansky.com/blog/2014/10/30/artificial-neural-
networks-matrix-form-part-5
10. Fernando, R.: Reducing the Cost of Vertex Transfer, Chapter 28.3.2. Addison-
Wesley (2004)
11. Guzhva, A., Dolenko, S., Persiantsev, I.: Multifold acceleration of neural network
computations using GPU. In: Alippi, C., Polycarpou, M., Panayiotou, C., Ellinas,
G. (eds.) ICANN 2009. LNCS, vol. 5768, pp. 373–380. Springer, Heidelberg (2009).
https://doi.org/10.1007/978-3-642-04274-4_39
12. Hassoun, M.H.: et al.: Fundamentals of Artificial Neural Networks. MIT Press,
Cambridge (1995)
13. Huqqani, A.A., Schikuta, E., Ye, S., Chen, P.: Multicore and GPU parallelization
of neural networks for face recognition. Procedia Comput. Sci. 18, 349–358 (2013)
14. Salar, S., Oskouei, L., Golestani, H., Hashemi, M., Ghiasi, S.: CNNdroid: GPU-
accelerated execution of trained deep convolutional neural networks on android.
In: Proceedings of the 24th ACM International Conference on Multimedia, pp.
1201–1205 (2016)
15. Lee, J., et al.: On-device neural net inference with mobile GPUs. arXiv preprint
arXiv:1907.01989 (2019)
16. Li, B., et al.: Large scale recurrent neural network on GPU. In: 2014 International
Joint Conference on Neural Networks (IJCNN), pp. 4062–4069 (2014)
17. Li, Y., Liu, Z., Kai, X., Hao, Yu., Ren, F.: A GPU-outperforming FPGA accelerator
architecture for binary convolutional neural networks. ACM J. Emerg. Technol.
Comput. Syst. (JETC) 14(2), 1–16 (2018)
18. Ma, Y., Rusu, F., Torres, M.: Stochastic gradient descent on modern hardware:
Multi-core CPU or GPU? synchronous or asynchronous? In: 2019 IEEE Interna-
tional Parallel and Distributed Processing Symposium (IPDPS), pp. 1063–1072.
IEEE (2019)
19. Nugteren, C.: Tutorial: OpenCL SGEMM tuning for Kepler (2014). https://cnugteren.
github.io/tutorial/pages/page1.html
20. Kyoung-Su, O., Jung, K.: GPU implementation of neural networks. Pattern
Recogn. 37(6), 1311–1314 (2004)
21. Pallipuram, V.K., Bhuiyan, M., Smith, M.C.: A comparative study of GPU pro-
gramming models and architectures using neural networks. J. Supercomput. 61(3),
673–718 (2012)
22. Strigl, D., Kofler, K., Podlipnig, S.: Performance and scalability of GPU-based
convolutional neural networks. In: 2010 18th Euromicro Conference on Parallel,
Distributed and Network-based Processing, pp. 317–324 (2010)
23. Vouzis, P.D., Sahinidis, N.V.: GPU-blast: using graphics processors to accelerate
protein sequence alignment. Bioinformatics 27(2), 182–188 (2011)
24. Yegnanarayana, B.: Artificial Neural Networks. PHI learning Pvt. Ltd. (2009)
25. Zhang, S., Gunupudi, P., Zhang, Q.-J.: Parallel back-propagation neural network
training technique using CUDA on multiple GPUs. In: 2015 IEEE MTT-S Interna-
tional Conference on Numerical Electromagnetic and Multiphysics Modeling and
Optimization (NEMO), pp. 1–3. IEEE (2015)
Monitoring Technologies for Animal Welfare:
A Review of Aspirations and Deployments
in Zoos

Ann Morrison(B) and Aleksandra Novikova

School of Future Environments, Auckland University of Technology, Auckland, New Zealand


[email protected]

Abstract. Focusing on zoo environments, we conducted a literature review
investigating the use of non-invasive technologies designed for monitoring the
behaviour and welfare of animals. The research question asks: What technolo-
gies or monitoring methods have been able to capture information on behaviours
and needs of animals in zoo, sanctuary, domestic or agricultural environments?
From the initial literature review, we determined progressive zoos, research labs,
institutions and companies and identified monitoring technologies developed to
improve animal welfare. We then emailed out a concise survey to those zoos to
gauge what monitoring technologies they were using and asked them to identify
where systems and their deployment could be improved. We highlight advances
and developments identified in the literature, to underline current and future mon-
itoring needs of zoo environments. We contribute to the research field by map-
ping these sought-after changes against the most relevant identified monitoring
technologies distinguished in the literature search.

Keywords: Animal welfare · Monitoring technologies · Zoos

1 Introduction
Routine monitoring of animals in captive settings is essential to provide insights into
the quality of life of the animals and for maintaining and improving exacting standards
of animal welfare in zoos. Monitoring technologies are useful for optimizing welfare
strategies with non-invasive observation of animals. Advanced tracking and monitoring
technologies are used for welfare considerations for livestock, captive, and wild animals.
Monitoring the behaviour of animals in zoos promotes science-based decision making
and future planning for best-case animal care solutions [1]. While one of the goals of this
research is to find technological solutions to take labour intensive duties off zookeepers
including smart data collection and analysis, the focus is the identification of non-invasive
monitoring technologies to enhance animal welfare.
We identified progressive zoos, research labs, institutions and companies working
with monitoring technologies to improve animal welfare [2]. We review literature on
wearable and nonwearable monitoring technologies including camera traps, remote
video camera systems (CCTV), additional technologies, software applications and digital
tools for data collection, storage, sharing, and analysis. We include monitoring technolo-
gies from sanctuary, domestic and agricultural environments as these may also prove fit
for purpose in zoo environments. We sent a questionnaire survey with five plain language
questions to zoos identified as being concerned with animal welfare [2] to determine
what monitoring technologies were already in use and what their future ‘wish list’ would
be.
In structuring the article, we have placed the method section after the introduction and
before the literature section. The method section, (Sect. 2), details the literature review
selection process to provide context for the ‘literature reviewed’ section (Sect. 3) which
forms the greater part of the of the paper. Section 4 covers zoo responses to the five-
question survey on current use, perceived limitations, issues and wish lists for improving
conditions and animal welfare. The discussion section outlines and maps these wish
lists and more general requirements besides existing, or in-technology-development-
solutions identified in the ‘literature reviewed’ section. The conclusion summarizes the
findings and recommendations for the zoo scenarios that may be more broadly applicable.

2 Literature Review Method


We conducted the literature review using samples of keyword searches via Google and
Google Scholar (e.g., zoo monitoring, behaviour monitoring, monitoring, behaviour mon-
itoring, behaviour remote) using the PICO process [3]. We found a variety of publications
and resources that we include in this article.

P (population) – Captive/contained animals


I (intervention) – Technologies, Observations, Monitoring
C (comparison) - Zoo, Sanctuary, Agricultural, Domestic Environments
O (outcome) – Understanding Best Monitoring solutions for animal welfare

We included articles, conference publications, websites, reports, blogs as dissemina-
tion formats. We incorporated technologies used for animal welfare in domestic, farm,
and wild animal settings in the search as these may also prove fit for purpose in zoo
environments. Most of the analysed papers were published within the last ten years,
even though we did not restrict the years within the search process.

3 Literature Reviewed
In this section we highlight the technologies covered in the literature review process that
relate to potential use for zoo environments.
Monitoring the behaviour of animals in zoos can provide valuable insights into animal
welfare and promote a process of science-based decision making in animal management
[1]. Monitoring relates to (remote) monitoring of the animal behaviour, control, and
studying the populations of wildlife (see Fig. 1).
Behavioural monitoring is the scientific collection of animal behaviour data to under-
stand ‘normal’ patterns of behaviour and identify changes in these patterns [4]. Used
effectively, monitoring can indicate problems compromising animal well-being.

Fig. 1. Reasons for monitoring. Diagram by Aleksandra Novikova.

We identified non-invasive wearable and non-wearable technologies as tools for
monitoring animal welfare and organised these technologies accordingly.

3.1 Wearable Technologies

Advances in sensor technology (especially miniaturization) mean multiple wearable
devices have been specifically designed or modified for animal use [5]. Devices include
smart collars and cuffs using tracking, accelerometer sensors [6], pedometers and real-
time health monitoring systems with antenna, relay routers, and base stations [7]. These
are widely used in situ conservation and in agriculture. Non-invasive wearable tech-
nologies can be used as a primary or supportive tool, depending on the animal and
their tolerance for ‘attachments’ on their body. For domestic and wild animals, wearable
sensors are primarily used to detect and track the animal’s movement. The technolo-
gies can provide insights into the behaviour and function of organisms in their natural
environments, which might ordinarily be hostile to the observer, supporting the
determination of animals' social relationships and the capture of precise movement patterns.
Bio-logging and bio-telemetry monitor physiological, behavioural, or otherwise dif-
ficult to observe or unattainable environmental information [8]. Bio-logging technol-
ogy records and stores information in an animal-borne device (archival logger), infor-
mation is downloaded once the logger is retrieved, where bio-telemetry technology
sends information to a receiver within the device [9]. Logger technologies are primarily
used for monitoring and evaluating behaviour, spatial ecology, energetics, and physi-
ology of free-living animals in their natural, harsh environments (e.g., polar regions,
aquatic/marine environments), for rapidly moving or cryptic animals, and for those that
undertake large-scale movements/migrations (e.g., birds, insects, marine mammals and
fish) [9]. Lightweight geolocators or satellite transmitters [10] have enabled modelling
of migratory routes and wintering areas for large and small birds, facilitating testing pre-
dictions on migration strategies. With light-weight radio transmitters even insects can
be tracked for at least a part of their migratory journeys [10]. With environments that
are hostile to the observer, bio-logging technology provides insights into the behaviour
and function of Sea Mammals [8]. Combining these developments changes the capacity
to conduct ecosystem-scale science and to improve the capacity of scientists to explore
unanswered ecological questions [11]. We see bio-telemetry (radio telemetry, acous-
tic telemetry, satellite tracking), biologging (archival loggers), and hybrid technologies
used for understanding the threats and causes of population decline and assessment of
endangerment status of species [9].
A wireless activity monitoring system (Wireless Sensor Network (WSN)) would
allow scientists to collect data and investigate behaviour without needing to chase and
capture animals and offers a promising solution to monitor animal behaviour [12].
The International Polar Year project [13] used a Conductivity-Temperature-Depth
Satellite Relay Data Logger with southern elephant seals to quantify how animals respond
to differences in the environment because the seals’ behaviour and population trends
signal prevailing conditions for multiple marine habitats. The research collated estimates
of population size to determine the number of southern elephant seals in the Southern
Ocean, comparing these to published numbers to determine overall change.
Twenty-six baboons were each equipped with a smart collar that embedded a tri-
axial accelerometer and GPS to identify running, walking, sitting, standing, and feeding
activities. The system fuses sensors data to perform intelligent behaviour identifica-
tion, allowing for automatic activity profiling by using the ethologists’ agreed activity
identification system and avoiding prior subjectivity in categorising activities [14].

3.2 Non-wearable Technologies


We reviewed non-wearable technologies for monitoring welfare and identified five main
categories: (1) PAM (passive acoustic monitoring), (2) camera traps, (3) remote video
camera systems including CCTV, (4) additional technologies, and (5) drones.

PAM (Passive Acoustic Monitoring)


Passive acoustic monitoring (PAM) is a non-invasive method for surveying wild ani-
mals using remote acoustic technologies such as microphone arrays, hydrophones, or
other autonomous recording devices [15]. While PAM is used effectively in wildlife and
agricultural animal welfare, it has limited use in zoos, impacted by privacy issues for
keepers and zoo visitors (conversations among the zoo staff, between the public etc.),
so we have not expanded on this technology here.

Camera Traps
Camera traps are remote devices equipped with sensors (e.g., motion, infrared) that
record images or videos automatically. They are an important wildlife research tool
that offer a practical approach to answer questions about wildlife beyond density or
estimation of animal populations [16]. For example, camera traps allow researchers to
determine the presence of rare species and sometimes reveal how to better support their
recovery [17]. When used in combination with telemetry, they are useful to examine
scavenging behaviour [18]. Camera Base software is a tool that helps biologists manage
data from multiple camera trap surveys and provides tools for data analysis including
capture-recapture, occupancy, activity patterns and diversity [18–20].
The study [20] compared the efficiency of arboreal camera trapping with line transects for
inventorying medium and large-sized arboreal mammals and assessed the viability of
using camera traps in trees to model habitat occupancy. Cameras recorded 10-s video
clips for ease of identifying species with 200–300 videos processed per hour. Videos
can be reviewed at double speed and analysed in statistical software.

Collaborative wildlife monitoring and tracking large geographical and time scales
with volunteer citizen scientists using camera-traps (motion sensitive cameras) has
expanded conservation research [21, 22]. Priorities identified for future improvement
include automated camera-trap image analysis for animal detection, tracking, species
recognition, advanced machine learning and image analysis methods to improve per-
formance and successful deployment. 2.6 million images of several North American
mammals were processed using eMammal, a biological informatics cyber-infrastructure,
which brings together citizen scientists and wildlife professionals to collect, analyse, and
manage massive camera-trap data. The system comprises: (1) software for viewing, tag-
ging, and uploading photographs, (2) expert review to ensure data quality, (3) an archive
for approved data, and (4) a website for managing the study, including the partici-
pants, and accessing and analysing the data. Macrosystem scale monitoring of wildlife
by volunteer-run camera traps could produce the data needed to address questions
concerning broadly distributed mammals and raise public awareness of conservation
science.
Using 83 camera traps (Bushnell Trophy Cam™), researchers examined the accuracy
of camera trap data to provide assessments of chimpanzees (Pan troglodytes) party size,
seasonal variation in party size, community demographic changes (births, deaths, emi-
grations, immigrations), and community composition (age/sex structure) and habituation
to camera traps [23].
A photographic capture – recapture survey used remotely triggered modified and
installed Pentax ‘point and shoot’ cameras in a waterproof plastic box with a receiver and
separate wireless passive infrared trigger. Later modifications enabled infrared images.
Remote RFID (Radio-frequency identification) scanners have been deployed in a range
of situations for passive monitoring and work well in the wild to record the diversity of
co-occurring species [24].
Bushnell camera traps with infrared sensors and low-glow LED flashes, equipped with
SD cards and lithium batteries, were left in place to take bursts of three pictures [25].
Camera trapping combined with citizen science was efficient for long-term non-invasive
monitoring at low cost.
A remote camera trapping method took images and video, providing identification
of individual free-roaming wild horses across a range of habitats and capturing multiple
animal-based welfare indicators. This was useful where horses could not be sighted
regularly, for a long enough duration or approached closely enough to enable direct
assessment of welfare. Precise, strategic camera placement and settings enhanced quality
of the data and minimised battery usage and SD card storage [26].
Comparative Review
Comparative testing of the five most frequently used camera traps [27] (Bushnell Tro-
phy Cam Aggressor, Keep Guard 680V, Ltl Acorn Ltl-5310, Scoutguard SG550BV,
and Reconyx HyperFire) identified key factors influencing the probability of successful
usable photographs. Performance differences from varied settings demonstrated cau-
tion is needed for direct comparisons between results of different experiments, or when
designing new ones [27]. The study [28] compares three commonly used camera traps
(Reconyx PC850, Scoutguard KG680v, Bushnell Trophy) used for monitoring behaviour
of fauna, general survey of fauna and detection of medium to large terrestrial animals to
improve fauna conservation.
Testing in the Zoo with Trail Cameras
Trail cameras were tested in three zoos; Auckland Zoo, Hamilton Zoo, and Currumbin
Wildlife Sanctuary to examine how red panda would respond to these cameras within the
context of gauging their usefulness for wild settings. The author [29] used two main types
of cameras: a Kinopta Blackeye BE2-W (‘Blackeye’) and two different models of trail
cameras: a Bushnell Trophy Cam Aggressor and Browning Dark Ops sub micro-series.
Direct personal observations were also taken, noting typical significant factors, such as
weather and temperature. Statistical analysis demonstrated a significant difference in
types of behaviours recorded with the two observational methods, exposing that method
does affect the type of data collected. Trail cameras affected behaviour at all zoos by
changing the way red panda spent their time, with captive red panda more active when in
trail camera presence. Temperature also had a significant impact with red panda sleeping
and resting longer at higher temperatures. As trail cameras changed the way red panda
spent their time (in a captive setting), care should be taken for using trail cameras in the
wild to compensate for inflated activity estimates.

3.3 Merging Wearable and Non-wearable

Camera trap array data (Reconyx infrared cameras) paired with data collected from GPS (wearable) tracking collars (containing a triaxial accelerometer and an ultra-high-frequency transmitter for telemetry and data download) were used to detect whether, at the population level, the spatial and temporal patterns of detections reflected the proximity of space use to sampling sites, or variability in the magnitude of animal movement across the area [30]. Not accounting for multi-species movement may bias inferences of ecological processes and result in mis-specified recommendations.
Nonwearable Wildlife Advanced Monitoring Camera (WAMCam) and wearable
(smart collar) monitoring technologies were combined [31]. WAMCam is a smart camera
unit, connected by satellite communications and backed by a system control panel to
manage a collection of deployed devices [32]. This system combines several WAMCam
smart devices, communicating over LoRaWAN with a SATCOM gateway device. The
rugged, battery-powered cameras are designed with AI onboard, capable of identifying
different species of interest. WAMCam devices monitor live animal traps and send
notifications to the end user when the trap is triggered via SMS and/or email in real time.
To minimise cost, the WAMCam system uses Iridium SBD messaging to notify the user
of the animal trap status and contents. Small, text-based messaging works for sites with
satellite visibility issues. SBD messages are received at the ground station and forwarded
to the Cerebella middleware, where they are processed and passed to the end user as
notifications. Frequency of status reports can also be configured remotely. Notifications
can include the detected species in the trap or indicate when the trap is empty and was
accidentally triggered, e.g., by a falling branch. LoRaWAN allows the user to position the
animal trap where required, without satellite visibility constraints. The system
is configured via the web-based Cerebella control panel where devices are managed, with
status updates. In use, the multi-scale modelling identified primary habitat requirements, limiting factors and the spatial scales at which organisms are strongly
associated with key habitat factors. The projected model provides crucial information
for conservation management, including the identification of suitable core habitats and
medium-quality habitats, critical to meta-population viability through provisioning of
essential connectivity corridors for dispersal and mating among core populations.
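To illustrate the notification flow this implies, the following minimal Python sketch decodes a hypothetical trap-status payload of the kind an SBD message might carry and formats it as a keeper notification; the field names and JSON encoding are our assumptions for illustration, not the actual WAMCam or Cerebella formats.

import json

def handle_sbd_payload(payload_bytes):
    # Hypothetical payload structure; the real Iridium SBD and Cerebella formats differ.
    status = json.loads(payload_bytes.decode("utf-8"))
    device = status.get("device_id", "unknown device")
    if status.get("trap_triggered"):
        species = status.get("species", "no species identified")
        return f"{device}: trap triggered ({species})"
    return f"{device}: trap empty or accidentally triggered"

# Example message as it might be forwarded from the ground station (illustrative only).
print(handle_sbd_payload(b'{"device_id": "WAM-07", "trap_triggered": true, "species": "possum"}'))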

3.4 Remote Video Camera Systems


Remote video camera systems are also useful tools for monitoring animals. Video cam-
eras can facilitate around-the-clock monitoring of animals, providing visual access to
their natural habitat where direct observation would be difficult. Observing animals via
video surveillance can provide an in-depth, intimate look into their behaviour and may
reveal unique behaviours particular to day or night [33].
Remote video camera systems, or closed-circuit television (CCTV) cameras, come
in many models, including analogue CCTV, digital CCTV, wireless/wired systems, HD-
TVI CCTV, IP cameras (advanced CCTV), portable CCTV, or trail cameras. The CCTV
camera lens focuses light onto a sensor. Electronics convert this signal to analogue video.
Some cameras include audio. Wildlife CCTV cameras often need an infrared LED light
source to cover nocturnal animals. During the day, sunlight is reflected from the animal
for the camera to produce a colour image. In low light, the camera uses its own infrared
light source to generate a black and white image.
CCTV cameras can be waterproof and weather resistant. High-end cameras use an
IP rating where IP stands for International Protection marking. The most suitable rating
range is IP65 to IP68, where the first digit, 6, denotes dust tight and the second depicts water
tolerance. For IP65, the camera withstands jets of water. For IP68, the camera can be
submerged. IP65 to IP68 are waterproof in heavy rain.
Analogue HD systems (HD-TVI CCTV) deliver high-quality video. With higher-quality images, HD-TVI supports longer lengths of cable without signal loss and delivers excellent colour saturation. The cost depends on size and signal quality. Image quality is described in terms of TV lines (TVL). Standard analogue cameras with 600 TVL are inexpensive, whereas 1,000 TVL cameras are high-resolution and more expensive.
Live feed is usually viewed through a monitor in a control room, particularly when
used for security and safety measures. Better recorders have motion detection. Trail
cameras depend on detecting changes in heat, rather than motion. Capture devices often
include software and screens for direct video viewing. More than one camera stream
requires a connected capture card or PC software to set up motion detection parameters.
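As a minimal sketch of setting motion detection parameters for a single stream, the following Python/OpenCV fragment uses frame differencing; the stream source, blur kernel, and thresholds are illustrative assumptions and would be tuned per camera and scene.

import cv2

STREAM = 0                 # camera index or RTSP URL (assumption)
PIXEL_THRESHOLD = 25       # per-pixel intensity change counted as motion
MIN_CHANGED_PIXELS = 500   # changed pixels needed to flag a motion event

cap = cv2.VideoCapture(STREAM)
ok, prev = cap.read()
prev_gray = cv2.GaussianBlur(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY), (21, 21), 0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    diff = cv2.absdiff(prev_gray, gray)                     # frame differencing
    _, mask = cv2.threshold(diff, PIXEL_THRESHOLD, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > MIN_CHANGED_PIXELS:
        print("motion event")                               # e.g., start saving a clip
    prev_gray = gray
cap.release()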
All CCTV cameras need a power source. Most use mains power (wired), but some
are wireless. A wired system requires one cable for power and one for video connected to the camera. In a wireless system, the camera signal is transmitted through
the air, but power is still needed by the camera, either by cable (removing the advantage
of wireless systems), or by battery (which needs regular recharging). Solar panels can
charge the battery and be mounted close by.
Once videos have been reviewed, selected videos can be exported in short clips and
edited in a video-editing package. Data analysis can be done manually or via machine
learning. Remote video camera systems monitor day and night cycles of animal life
with many applications for CCTV [34]. The research [34] makes recommendations on
monitoring wildlife with low-cost solutions to make CCTV more accessible to wildlife
practitioners and naturalists, while [35] provides recommendations for animal facilities
on installing systems. They outline the benefits of camera systems for sanctuaries to
facilitate animal care and observational research. Further, [36] identified costs, main-
tenance logistics, and location as issues and recommended use for easily identifiable
behaviours. The study [37] used CCTV for sleep monitoring combined with cortisol
measuring for stress testing to assess animal welfare states.

Examples of Use in Zoos


We summarise here examples from the literature review of successful application of
CCTV in zoos and the combined use of multiple behavioural observation technologies,
including camera traps, and programs like ZooMonitor [1] to gather information on
activity budgets, habitat use, and social interactions. These in turn inform management
decisions to improve the welfare of animals in their care.
Chester Zoo, UK used video surveillance system Axis IP cameras in combination
with Milestone’s XProtect video management software, enabling personnel to monitor
live views and easily search and quickly retrieve footage from recordings [38].
Birmingham Zoo, Alabama, USA used high-resolution cameras in a MOBOTIX surveillance system to enhance zoo security while collecting critical information on animal behaviour. The study [39] recorded elephants' behaviour and used a portable MOBOTIX camera to monitor a pregnant female orangutan that recently gave birth. The event and the baby orangutan's first days were available for viewing by zoo officials through remote access, providing more detailed scientific data than was previously possible. MOBOTIX is a decentralized video system that includes professional video management software supporting unlimited users, a layout editor for floor plans, and configurable interface and camera views; it reduces the number of cameras needed by incorporating a high-speed computer into every camera. This reduces network bandwidth as video footage is processed within the cameras. One MOBOTIX camera with 3.1 megapixels records more detail than traditional CCTV cameras, with larger image areas of up to 360 degrees [39].
A combination of a Genetec closed-circuit infrared camera system (CCTV) (Genetec
Security Center), five camera traps (Bushnell Trail Camera Trophy Cam HD), and ZooMonitor (mobile application software) was used for behavioural observations of one
male and six female Asian elephants in The Smithsonian’s National Zoological Park
(NZP), USA [40]. They compared video and image capture methods to examine activity
budgets, habitat use, and social interactions. They found camera traps were a reliable
technology for comprehensive, 24/7 surveillance of animals in zoos that cannot install
CCTV. Either method can be used to determine accurate activity budgets or habitat use.
30-min focal observations via ZooMonitor better described changes in social interactions
over time.
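For context, an activity budget of the kind these tools report is simply the proportion of observation scans spent in each behaviour; the short Python sketch below computes it from a list of scan records whose format and behaviour names are our own illustrative assumptions, not ZooMonitor's export schema.

from collections import Counter

# Hypothetical scan records: (time, individual, behaviour).
records = [
    ("09:00", "elephant_1", "feeding"),
    ("09:01", "elephant_1", "feeding"),
    ("09:02", "elephant_1", "locomotion"),
    ("09:03", "elephant_1", "social"),
]

def activity_budget(records):
    # Proportion of scans in which each behaviour was recorded.
    counts = Counter(behaviour for _, _, behaviour in records)
    total = sum(counts.values())
    return {behaviour: n / total for behaviour, n in counts.items()}

print(activity_budget(records))   # {'feeding': 0.5, 'locomotion': 0.25, 'social': 0.25}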
CCTV can also be used to livestream activity using live streaming capable IP cameras
to YouTube or other platforms, such as the Panda Cams (YouTube & Live PandaCam) at Zoo
Atlanta, USA [41]. The Dublin Zoo, Ireland has live webcams on their wolves, penguins,
elephants, and animals from the African Savanna area with the aim of motivating con-
servation awareness through bringing animals and humans together [16]. Baseline data
on the Dublin Zoo herd of Asian elephants added to existing knowledge on locomotory
behaviour of elephants in urban zoo environments and provides a basis for future wel-
fare recommendations [42]. These elephants displayed behaviours and travel distances
comparable to those in the wild [43]. Data was collected without disturbing elephants’
usual routines. The work promotes monitoring technology use in further zoo studies,
alleviates the need to attach sensors to animals and enables footage to be played in real
time or viewed later.
Delhi Zoo installed CCTV cameras (n = 230) on the premises and in animal enclo-
sures, for 24/7 monitoring of animal and human behaviour [44]. The zoo plans to intro-
duce virtual reality technology, to allow visitors to “get closer” to the animals, and a
GPS-based mobile application to make zoo visits more engaging and informative. The
technologies can provide dependable behavioural information 24/7 while minimising
time and resources used in long-term monitoring. Long-term behaviour data can be inte-
grated into zoo management strategies to respond to animals' changing needs arising from social, environmental, or physical changes.
The Association of Zoos & Aquariums (AZA) Animal Welfare Committee recom-
mends that zoo professionals develop tools for measuring zoo animal welfare on an indi-
vidual animal-based level. Multiple zoos and aquariums have developed their own assess-
ment tools and programs. These include EthoTrak® (developed by the Chicago Zoolog-
ical Society), EthoSearch (developed by Lincoln Park Zoo and partners), ZooMonitor
(developed by Lincoln Park Zoo and partners), WelfareTrak® (developed by the Chicago
Zoological Society and partners), and the geriatric animal quality of life assessment pro-
cess developed by San Francisco Zoo’s Wellness and Conservation Center. These tools
are provided for the zoological community to engage in on-going behavioural moni-
toring and facilitate a continual assessment of animal welfare. Some are offered free
to Accredited Organizations (Zoo, Aquarium, Sanctuary or Museum). For example,
ZooMonitor is a popular free application used in many zoos including the Smithsonian’s
National Zoological Park, North America, the sanctuary Chimp Haven, Shreveport, LA,
etc. Companies selling technology may supply their systems with inbuilt software, such
as Gview, supplied as part of the CCTV system.

Examples of Use on Farms


CCTV is also used for monitoring livestock in the farming industry. With increasing
farm size and diversity of tasks, farmers can benefit from automatic animal behavioural
surveillance [45]. A system based on the Internet of Things and machine learning, with cameras (read via cv2.VideoCapture) and environmental sensors for ambient light, NH3, H2S, CO2, temperature and humidity, was used for evaluating the health and welfare of goats in
precision goat farming to assess their daily behaviour and provide real-time monitoring
of their welfare [46]. The architecture of the on-farm monitoring system had several
components, including sensing, data transmission, application layers and Wi-Fi-enabled
communication and data transmission between the hardware node and remote server.
The Faster R-CNN algorithm detected and identified individual goats. Food or water
lines were drawn to identify eating and drinking behaviour, so goat behaviour could be
classified as drinking or eating once the goat’s head was beyond the food or water lines.
Economic gains and breeding efficiency were improved with reduced manual labour
costs, timely offering of adaptive living conditions, and growth care for goats. As a
multifaceted and multilevel monitoring system of goat welfare, this system may provide
a useful reference for future precision livestock farming and surveillance.
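The line-crossing rule described above can be sketched in a few lines of Python; the zone boundaries and the idea that the detector returns a head position in image coordinates are illustrative assumptions rather than the published system's exact implementation.

# Illustrative zone boundaries in image coordinates (not values from the cited study).
FOOD_LINE_Y = 120    # head centres above this row are over the feed trough
WATER_LINE_Y = 600   # head centres below this row are over the water line

def classify_goat_behaviour(head_x, head_y):
    # Classify a detected head position as eating, drinking, or other.
    if head_y < FOOD_LINE_Y:
        return "eating"
    if head_y > WATER_LINE_Y:
        return "drinking"
    return "other"

print(classify_goat_behaviour(head_x=310, head_y=95))   # -> "eating"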
Surveillance of farm animals and automatic detection of deviant behaviours is evolv-
ing in livestock science and farming [45]. The studies [45, 47] use two computer vision algorithms to
analyse and record the movement activity of single-housed sows. The system transforms
the signal, so sows are reliably detected and monitored, with detection levels customised
so unexpected behaviour raises alarms.

3.5 Additional Technologies and Applications


Many other technologies are used for monitoring animals' behaviour, in zoos and beyond. The author [48] used track plates to measure white-footed mouse
(Peromyscus leucopus) activity around individual trees over summer to compare track
activity to predation rates on gypsy moth pupae (Lymantria dispar) deployed on the same
trees. The behavioural response of mice to track plates was evaluated by comparing the rates at which oat grains placed on and near track plates were consumed. Acetate sheets with a graphite, alcohol, and oil coating had relatively superior water-resistance and utility.
Hence, [48] concluded track plates offer an economical and reliable quantification of
local risk of attack by terrestrial mammals without altering spatial risk distribution.
Disney personnel conducted research combining individual animal welfare monitor-
ing with measurement of environmental conditions, (comparing sound pressure levels)
to inform science-based animal management decisions [49]. The study tested foam, plastic,
and plywood barriers for efficacy. Sound reduction for all three was greater for higher
frequencies vs. lower frequencies. Animal care and science personnel developed a model
that tracked animal keepers’ daily assessments of an animal’s physical health, behaviour,
and responses to husbandry activity; these data were matched to different external stim-
uli and environmental conditions, including sound levels. This approach used elements
and tools from various existing welfare assessment programs and emphasised customi-
sation to individual animals to include daily tracking of multiple welfare measures. The
objective was to better understand how specific events in their animals’ environment
influence their welfare and use that information to inform management decisions.
Social Interactions
A collaborative study from Zoos Victoria examined social interactions with technology
use between humans and animals [50]. Researchers examined five interactive systems
with two used by visitors (Digital Signs and the Zoopermarket), two used by zoo person-
nel with visitors (Educator Screens and Volunteer iPads), and one used by zoo personnel
with animals (Apps for Apes). Use data was gathered from interviews, digital content and observations, investigating tensions between technology and the experience of viewing animals, and between technology and the 'natural' environment of the zoo. Researchers
recommended mitigation via design choices and incorporation of technology into the
naturalistic landscape of the zoo [50].
Environmental Temperature
The remote environmental monitoring system Sensaphone WSG30 (wireless monitor-
ing), alarm and event logging system with temperature and power sensors was installed
in Elmwood Park Zoo, Norristown, USA [51]. The system watches over areas that house
reptiles and monkeys. Temperature is key in this building, because reptiles and amphib-
ians are housed on the upper level and mammals on the ground floor. Each requires unique settings. If the system detects a problem, alerts are instantaneous. Additional entry and
motion sensors can operate as a whole building security system.
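A minimal sketch of the per-area threshold logic such a system implies is shown below; the area names, set-points, and the read_temperature and send_alert helpers are hypothetical stand-ins, not the Sensaphone product's configuration or API.

import random
import time

# Hypothetical acceptable ranges in degrees Celsius; real set-points are site-specific.
LIMITS = {
    "reptiles_upper_level": (26.0, 32.0),
    "mammals_ground_floor": (18.0, 24.0),
}

def read_temperature(area):
    # Stand-in for the sensor read; a deployment would use the monitoring hardware's API.
    return random.uniform(15.0, 35.0)

def send_alert(message):
    # Stand-in for the alerting channel (SMS, email, pager).
    print("ALERT:", message)

for _ in range(3):                       # a real monitor would loop indefinitely
    for area, (low, high) in LIMITS.items():
        temp = read_temperature(area)
        if temp < low or temp > high:
            send_alert(f"{area}: {temp:.1f} C outside {low}-{high} C")
    time.sleep(60)                       # poll once a minute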
Cardiopulmonary Activity
The study [52] used digital cameras for basic health checks to reduce anaesthetic use for
zoo animals. Monitoring included nine species of zoo animals: giant panda (Ailuropoda
melanoleuca), African lions (Panthera leo), Sumatran tiger (Panthera tigris sumatrae),
koala (Phascolarctos cinereus), red kangaroo (Macropus rufus), alpaca (Vicugna pacos),
little blue penguin (Eudyptula minor), Sumatran orangutan (Pongo abelii) and hamadryas
baboon (Papio hamadryas) [53]. The non-contact, non-invasive and cost-effective mon-
itoring system uses digital camera imagery to extract cardiopulmonary signals (pulse rate, PR, and breathing rate, BR) of unrestrained animals at different distances, detecting motion on the animal's body surface caused by cardiopulmonary activity. This novel method provides non-contact
physical monitoring and remotely sensed health assessment of animals, demonstrating
promise for applications in veterinary practice, conservation, game management, animal
welfare and zoological and behavioural studies.
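The underlying signal processing can be illustrated as follows: average the pixel intensity over a body-surface region in each frame, then take the dominant frequency inside a plausible physiological band. This is a simplified sketch of the general principle, assuming a 25 fps clip, a fixed region of interest and a hypothetical file name; it is not the cited authors' algorithm.

import numpy as np
import cv2

FPS = 25.0
ROI = (slice(100, 200), slice(150, 250))   # assumed body-surface region (rows, cols)

def mean_intensity_trace(video_path):
    # Mean grey level of the ROI in every frame: a crude motion/colour signal.
    cap = cv2.VideoCapture(video_path)
    trace = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        trace.append(gray[ROI].mean())
    cap.release()
    return np.asarray(trace)

def dominant_rate(trace, low_hz, high_hz):
    # Strongest frequency inside a band, returned in cycles per minute.
    trace = trace - trace.mean()
    spectrum = np.abs(np.fft.rfft(trace))
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / FPS)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

trace = mean_intensity_trace("animal_clip.mp4")          # hypothetical clip
print("BR ~", dominant_rate(trace, 0.1, 0.7), "breaths/min")
print("PR ~", dominant_rate(trace, 0.7, 3.5), "beats/min")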
Thermal (Infrared)
The author [54] worked with thermal (infrared) imaging in a sanctuary setting where
unrestrained chimpanzees were able to move freely around their enclosures. This was
coupled with an evaluation of pairing information with long-term behavioural data for
a multifactor welfare monitoring system. Thermal imaging in large and complex environments is useful where enclosure elements or conditions may otherwise occlude animals (e.g., trees, low-light conditions), or for, e.g., non-invasive documentation and tracking of wound and infection healing from a distance.
Aimed at observation of wildlife in their natural habitat, [55] provides an overview of thermal physics and the thermal imager, a manual on sound survey design, and the theory and performance characteristics of thermal imaging cameras with cooled quantum detectors and uncooled microbolometric imagers introduced in past decades.
The study [56] describes how thermal imagers (or thermographic cameras) work and
presents some examples of using this technology in a variety of contexts beyond wildlife
monitoring, including research on migrations [57], behaviour (e.g., flight patterns; [58]),
welfare and disease diagnosis [59], to avoid killing of animals (e.g., farmland bird nests,
fawns) during mowing [60], and to detect bird collisions at wind farms [61].
The contrast between the heat emitted by animals and their immediate surroundings
can help detect them efficiently and unobtrusively, particularly at night, with cryptic
background or when hidden by vegetation [62]. Complexities such as ambient tempera-
ture, insulation by fur, surface temperature vs. core body temperature, distance to target,
and field of view of the lens meant that pilot or case studies were required. For data col-
lection, thermal imaging is passive under day and nighttime conditions. It minimizes
disturbances to wildlife and detects animals which are colder, warmer, or the same as
their background temperature because it does not compare temperatures but detects heat
emissions of the animal against its background.
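A minimal sketch of that contrast-based detection, assuming a thermal frame is already available as a two-dimensional array of per-pixel temperatures, is given below; the deviation threshold is an illustrative assumption, and this simple version only captures animals warmer or colder than the background, not emissivity differences.

import numpy as np

def detect_thermal_anomalies(frame, min_deviation=2.0):
    # Flag pixels deviating from the scene background (approximated by the median)
    # by more than min_deviation, so both warmer and colder animals stand out.
    background = np.median(frame)
    return np.abs(frame - background) > min_deviation

# Synthetic example: a 20 C scene with a 27 C "animal" patch.
scene = np.full((120, 160), 20.0)
scene[40:60, 70:100] = 27.0
mask = detect_thermal_anomalies(scene)
print("candidate animal pixels:", int(mask.sum()))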
Reviews of Multiple Technologies


The research [63] reviewed four indirect noninvasive methods for primate conservation
—camera traps, acoustic monitoring, drones, and portable field labs—and improvements
in machine learning that offer rapid, reliable means of combing through the large datasets
these methods generate.
Portable field labs analyse primate faeces for endocrinological, diet, and genetic studies, revealing parasites, diagnosing diseases, etc. Genomics is increasingly valuable as a tool in wildlife conservation for species identification and understanding dynam-
ics of endangered populations [64]. It also assists in identifying inbreeding depression,
population structure, and impacts of population fragmentation [65]. Molecular epidemi-
ology from genomic data is an increasingly common tool in primate health monitoring
[66]. Miniature tools for molecular processing of field samples are now pervasive, with portable and compact USB-powered sequencers [67] enabling data to be obtained on a wide variety of primates and analysed on site. There are limitations with
infrastructure requirements, cost per sample, necessary equipment, lower throughput and
higher error rates [68]. Rapid developments in flow cell chemistry and bioinformatics
pipelines can address some of these [69–71].

Drones
Drones (also known as unmanned aerial vehicles, UAVs and remotely piloted aircraft
systems, RPAS) are remotely operated aircraft with autonomous flight capabilities.
Drone surveys allow rapid and frequent monitoring in remote and poorly-understood
areas, with data immediately accessible and rich information on habitat and conservation
related conditions [72]. The author in [73] describes a female chimpanzee making two
sweeps at an overhead drone with a branch that she held in one hand. The second sweep
successfully downed the drone, demonstrating forward planning with tool-use and in this
instance, the perceived invasiveness of the drone. Studies [74] and [75] discuss the use
of drones for wildlife conservation, including the three common types of conservation
drones, outlining the pros and cons of each version. There is much potential for drone
use in larger scale environments and for conservation purposes, to detect and monitor
arboreal mammal populations and to assess species occupancy and distribution.

3.6 Data Analyzing Applications (Software)

One of the most critical issues in using technologies in addition to data collection is data
analysis. Different applications are being developed to combine images and/or video with
analytics for smart event detection and automatic control of the technology, reducing or
eradicating the need for user interaction or participation. Some species-specific welfare
monitoring programs are being designed based on multi-institutional studies that tested
many parameters on a single species or taxa. Artificial intelligence is increasingly used to
improve wildlife identification, monitoring and analysis of large amounts of conservation
data, coming from multiple sources such as camera trap, satellite and drone images or
audio and video recordings [76]. Digital tools that increase efficiency in data collection
and visualization are becoming increasingly available. The author [49] points to the idea that welfare is unique to individual animals and contexts.
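Schematically, such automated analysis pipelines iterate a trained model over incoming media and aggregate the detections; the sketch below does this for a folder of camera-trap images, with classify_image a hypothetical stand-in for whatever model an institution actually deploys.

from collections import Counter
from pathlib import Path

def classify_image(path):
    # Hypothetical stand-in for a trained wildlife classifier; a real deployment would
    # load a model and return a species label with a confidence score.
    return "unknown", 0.0

def summarise_folder(folder, min_confidence=0.8):
    counts = Counter()
    for image in Path(folder).glob("*.jpg"):
        species, confidence = classify_image(image)
        if confidence >= min_confidence:
            counts[species] += 1
    return counts

print(summarise_folder("camera_trap_images"))   # hypothetical folder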
4 Questionnaires
To understand what monitoring technologies zoos concerned with animal welfare were
already using, and what their 'wish list' for future improvements would be, we sent out five straightforward, plain-English questions.

1. What technologies do you use for animal monitoring?


2. For what purposes do you undertake monitoring?
3. With which animals are these technologies used?
4. What brands are your technology solutions?
5. In a perfect world, what else would you like these technologies to be able to do?

Using samples from keyword searches via Google and Google Scholar (e.g., zoo monitoring behaviour remote) following the PICO process, we found a variety of publications and resources that included this list of zoos (see Table 1), identified as taking a progressive approach to animal welfare [2]. The identified zoos were:

Table 1. Zoos identified in the literature [2] as progressive in relation to animal welfare and use of technology

• Birmingham Zoo
• B Bryan Preserve
• Caldwell Zoo
• Wilder Institute/Calgary Zoo
• Zoológico de Cali
• Cameron Park Zoo
• Chicago Zoological Society (Association of Zoos and Aquariums)
• Columbus Zoo & Aquarium
• Hai Park Kiryat Motzkin
• Kaliningrad Zoo
• Kiev Zoo
• Los Angeles Zoo
• Moscow Zoo
• Nikolaev Zoo
• Indianapolis Zoo Simon Skjodt International Orangutan Center
• Zoos Victoria
• Auckland Zoo
• Lincoln Park Zoo
• North Carolina Zoo
• Point Defiance Zoo & Aquarium
• Saint Louis Zoo
• Taronga Sydney (on the list)
• WAZA (World Association of Zoos & Aquariums)
• Woodland Park Zoo
• Wuppertal Zoo

The approach to all zoos was via email or, where no email contact was available, via their online query form. We used the same request text for all enquiries.
Dear [ZOO NAME],
We are researchers at Auckland University of Technology. We are doing a study that
involves identifying the best animal welfare monitoring solutions used by the most pro-
gressive zoos and sanctuaries. We are looking at technology solutions that help identify
and address animal behaviour issues and take the workload off zookeepers.
Could you please pass on this short questionnaire to the right person/people in your
organisation? The findings from this survey will be presented in a report, a copy of which
can be sent to your organisation.
1. What technologies do you use for animal monitoring?


2. For what purposes do you undertake monitoring?
3. With which animals are these technologies used?
4. What brands are your technology solutions?
5. In a perfect world, what else would you like these technologies to be able to do?

If convenient, can you email me the answers to these questions, or I can also zoom/phone
in to discuss depending on what suits you best.
Ann Morrison (contact details etc.).

4.1 Responses
Four zoos graciously participated, and we present their responses to the questions here:

What Technologies Do You Use for Animal Monitoring?

#1 Our main method of monitoring animals is video cameras that are trained on the
enclosures 24/7.
#2 For our welfare assessments, we enter the data into ZIMS/Species 360. Keeper staff
helped decide what aspects we would like to monitor and then a form was made
for them to fill out. Once it is filled out, it is sent to the Animal Care Supervisor
of Mammals and our veterinarian for review, then entered ZIMS/Species 360. The
hard copies are kept in a file for each individual or in their information folder.
#3 ZIMS. We currently aren’t using the Care and Welfare module yet but are planning
to slowly implement in the next few months. Internet to look up info or help in
creating ethograms. Video/cameras. Thermometers/Hygrometers? For monitoring
animal environments. Metasys?
#4 The primary technology that we use for animal monitoring is the ZooMonitor app
(www.zoomonitor.org). This is an app that was originally developed by Lincoln
Park Zoo in partnership with Tracks Data Solutions, largely funded by the Institute
for Museum and Library Services. Trained observers (volunteers, interns, research
staff, keepers) watch the animals and record animal behavior and space use on tablet
devices (iPads), and the ZooMonitor software provides some basic summary data
and intuitive heat maps to visualize how animals are using their habitats. We are in
the process of expanding the app to facilitate multi-institutional animal monitoring.

In addition, we use Monnit sensors to remotely detect activity and habitat or feature
use (www.monnit.com), motion-triggered or time-triggered trail cameras (e.g., Bushnell.
com, www.Wyze.com), and small “spy” cameras (brand = Blindspot). We also have sev-
eral habitats equipped with 24-h camera surveillance. We will sometimes extract sys-
tematically collected behavior information from our primary record-keeping software,
Tracks. (www.trackssoftware.com).

For What Purposes Do You Undertake Monitoring?


#1 This enables us to monitor health, behaviour, group interaction, aggression, interac-


tion with devices etc. If and when we detect any concerning behaviour, the caretaker
is then instructed to monitor closely in person.
#2 All our animal data is entered into ZIMS/Species 360, so it made the most sense to
use that software for our assessments as well. We do not enter the assessments right
into ZIMS, in case more information is needed from the supervisor or vet.
#3 Gaining info about animal interactions, conspecifics, and mix species. Parturition.
Shifting and moving animals around habitats. Medical or dietary observations.
#4 We have an ongoing monitoring program for about 30 species at the zoo (primarily
fuelled by our trained volunteers). Some species were originally selected due to
questions about their behavior, space use or welfare, but not all. Some were chosen
to provide variability of observers, to diversify the taxonomy of our monitored
species, or for logistical reasons. Additionally, we initiate monitoring in response to
questions raised by animal managers, and in response to research questions pursued
by our scientific staff. Often times the projects that are sparked by a question will
transition into long-term monitoring, since the initial project foundation has been
established.

With Which Animals Are These Technologies Used?

#1 In principle all the enclosures are under constant passive video monitoring, but if
and when there is a particular concern we then switch to active monitoring.
#2 In the mammal department, we do assessments on all the individuals. Depending
on health and age, we will do them more often. Some individuals are twice a year,
while others are four times a year.
#3 All animals but less so with our program animal reptiles/invertebrates.
#4 ZooMonitor app has been used as part of an ongoing, long-term monitoring pro-
gram for the African lion, African penguin, Allen’s swamp monkey, American avo-
cet, Asian small-clawed otter, Bactrian camel, Black bear, Black rhino, Black-and-
white colobus, Black-necked stilt, Brush-tailed bettong, Chimpanzees, Cinereous
vulture, Crowned lemur, De Brazza’s monkey, Eastern screech owl, Egyptian fruit
bats, Giraffe, Golden-headed lion tamarin, Gorillas, Grey seal, Guam rail, Guam
kingfisher, Harbor seal, Japanese macaques, Jamaican Iguana, Klipspringer, Ornate
box turtle, Polar bear, Pygmy hippo, Red river hog, Snowy owl, Takin, Titi monkey,
Three-toed box turtle, White-faced saki monkey and others.

Trail cameras, small spy cameras, or built-in camera systems have been used to
monitor: African lions, American toads, Domestic chickens, Dwarf crocodiles, Pygmy
hippos, Polar bears, Prevost squirrels, White-blotched river stingray and others. Brush-
tailed bettongs and the Armadillo species have been monitored using remote sensors.

What Brands Are Your Technology Solutions?


#1 Provision.
#2 We use ZIMS/Species 360.
#3 Camera software Genetec security. Trail cameras all different types and brands.
ZIMS.
#4 ZooMonitor app (www.zoomonitor.org), Monnit sensors to remotely detect activity
and habitat or feature use (www.monnit.com), motion-triggered or time-triggered
trail cameras (e.g., Bushnell.com, www.Wyze.com), and small “spy” cameras
(brand = Blindspot). Extract systematically collected behavior information from
our primary record-keeping software, Tracks. (www.trackssoftware.com).

In a Perfect World, What Else Would You Like These Technologies to Be Able
to Do?

#1 Measure cortisol.


#2 ZIMS/Species 360 does everything that we currently need.

In response to further queries on remote access and alerts:

#1 You can access it from home, but I have never tried to set up any alerts. We also
have never used it for behaviour analysis. We have used ZooMonitor, but we don’t
use ZIMS/Species 360 in that form. I am sure it is possible, but we don’t use it that
way here.
#2 Audio. A perfect monitoring camera would be portable, easy to attach places,
weatherproof, have night vision, more recording capabilities, remotely con-
trolled/moveable and viewable, and audio.
#3 We are expanding the ZooMonitor functionality to support multi-institutional data
collection which we think is a step in the right direction! In a perfect world, behav-
ioral monitoring apps like ZooMonitor would have built-in analytics that indicate
real-time when welfare has likely improved or declined in quality. In a perfect
world, there would be non-invasive, accurate, automated recording of behavioral
and physiological changes in animals. The remote sensors are typically made for
larger animals, people, so more sensitivity for smaller-bodied animals, burrowing
animals, flying animals, would be great. Ability to train motion-triggered cameras to
the type of motion of interest (e.g., a moving wolf but not a moving stick) and to fol-
low that motion, view the full scene, would also be ideal, combined with automated
coding of the recorded information.

4.2 Summary of Responses to Questionnaires


While we did not expect all zoos (see Table 1) to participate, we were initially disheartened by so few responses. While understanding the limitations of such small
numbers, it was useful to get current information on the monitoring technologies in use
at these responding zoos and compare not only the differences between the systems in
use, but also what they are used for and their priority focus. Of the first three zoos, one
used the system ‘Provision’ with video cameras trained on all the enclosures 24/7
for passive video monitoring. If any concerning behaviour was detected, the system was
switched to active monitoring, coupled with manual observation from the caretaker. The
zoo uses the technology to monitor health, behaviour, group interaction, aggression,
interaction with devices etc. For future improvements, the zoo would like to add cortisol
measuring to their data gathering to get a better reading of health and stress levels of
their animals.
By contrast, Zoos #2 and #3 used ZIMS/Species 360 on the mammal population
with the monitoring also used for assessments on all individuals. How often these assessments occurred depended on the health and age of the individuals, with the more
fragile being assessed more often (e.g., four times per year versus twice a year). For
each individual animal, there was a hard copy information folder where any changes
were recorded. The data from the assessments was not entered directly into ZIMS, in case more information or assessment would be needed from the supervisor or
vet. The ZIMS/Species 360 system catered for all of Zoo #2's current needs, but the zoo was not using it for behaviour analysis. Zoo #3's priority is to gain information about animal interactions, conspecifics, and mixed species.
The fourth zoo is a major instigator in a wider problem-based solution process to
fit multiple scenarios. Their responses are comprehensive and detail their historical
and ongoing developmental solution-based approach. Their continual expansion of, e.g., ZooMonitor functionality is beneficial to many zoos that, due to this inclusive approach, also work with the system. As a key player in developing technology solutions in this field, it is useful to note their future trajectory with “non-invasive, accurate, automated
recording of behavioral and physiological changes in animals” and “automated coding of
the recorded information”: something many zoos, farms and wildlife sanctuaries are also looking to implement. In addition, multi-institutional sharing of data, also a
conservation imperative, would accelerate knowledge transfer and impact significantly
on improvements to animal welfare.

5 Discussion
We have identified developments and implementations in the reviewed literature (Sect. 3), versus deployed systems and future aspirations demonstrated in the zoo questionnaire responses.
Here, we combine advances and ambitions from these two sources and discuss limita-
tions, issues and impact, recommendations, and next steps forward. Overall, we note
a call for ‘non-invasive, accurate, automated recording of behavioral and physiological
changes in animals’ (#4 zoo).

5.1 Limitations

Since writing up the initial report and this article, we are aware that other relevant articles
will have been published that we could not include. ‘Relatively’ new to the field, we
took guidance from Auckland University of Technology librarians and conservation
researchers on refining our keyword search terms.
The small number of zoos that responded compared to those we approached (see
Table 1) is a limitation of the study. Regardless, the responses reveal a diverse set of
priorities, focus, and implemented solutions and contribute to the larger discussion.
Not all monitoring technologies are suitable for use in a zoo environment. Drones
have a limited capacity with legal and institutional restrictions regarding aviation rules
and health and safety. Noise from drones has been identified as a serious disturbance risk
for some species in the wild with future aerial survey or monitoring work requiring strict
protocols to minimize disturbance risks [77]. Recent novel work determined optimal
flight altitudes for minimizing drone disturbance for wildlife using species audiograms
[78]. While Passive Acoustic Monitoring (PAM) [15, 79] is useful technology for sound
recording and automatic sound identification of animals in the wild [56, 80], use is
restricted by privacy issues for zoo environments.

5.2 Issues and Impact

Issues that impact zoo environments more generally include:

Wi-Fi Coverage: The efficiency and capacity of Wi-Fi and the servers the systems run on impact what technology can be supported and what remote use is possible within zoos [7,
12]. Traditionally zoos’ focus was on providing ‘natural-enough’ enriched environments
for the animals and this still fits, but technologies did not play such an integral role. More
recent technology interventions require mitigation of technology integration into design
choices to augment the naturalistic landscape environments [50].

Public Institutions: Many zoos are supported by public monies and operate on public
institution networks or cloud-based services. These have standard restrictions on privacy
and data security, plus competition for resources is always a factor within the framework
of a large institutional model. Upgrading and adding new software and data analysis
systems may cause incompatibilities across entire systems, where numerous functions
and institutions need to operate securely within the one multi-serving system.

Public Facing: Keepers and zoos are aware of the need to keep up with the evolving
focus on animal welfare, successful breeding (especially for endangered species) and
education programs, as well as benefits from using enhanced technology systems. Most
important is the re-education of the public’s perception of the usefulness of technologies
to address animal welfare issues, particularly with e.g., visible wearable technologies for
this purpose. Often the public has a mixed perception and reception even of the role of
zoos, which requires Public Relations information management. This might take the form
of radio and online interviews, newspaper clips and social media promotion that focus on
animal welfare benefits. Zoo tours and information sessions already make up many zoo’s
routines and could include information on the benefits of such technologies. Research
studies that demonstrate positive welfare impacts from data gathered through wearables
and other monitoring technologies would support an informed public’s understanding
of these devices as having a positive impact on animal welfare. We also see this in
Sect. 3.4, Examples of Use in Zoos, where technologies bring animals and humans
‘closer together’ through webcam streaming, CCTV and video monitoring, camera traps
and VR technology [38–44]. These technologies feed information to the keepers and also
act to connect and bond the public to the animals whose lives they are able to witness.
Events such as the birth of an endangered species [39] provide leverage for updating
global technology coverage, promote the conservation role of the modern zoo and attract visitors.

5.3 Recommendations and Next Steps Forward


Thermal Cover: Infrared coverage contributes to non-invasive animal welfare monitoring, either in the form of lighting to increase evening image capture quality or through thermal infrared cameras' capacity for early detection of changes in the animals and their environments [54–60]. Multiple sensors take up environmental measurements non-invasively [46], with temperature alerts [51] and infrared sensors [25]. Including heat emitted by animals in data collection systems [62] ensures animal detection despite occlusion by foliage, sleeping, etc. Solar-driven heat cameras can detect animals' physiological conditions 24/7.

Combining Systems: Adaptive modular systems would enable various sensor systems to be combined, as would combining monitoring methods, e.g., mixing sleep observation with cortisol readings (#1 zoo) [37]. Continuing modification and integration of simple modular systems proves promising. For example, camera traps are mobile and motion activated, so they can be readily repositioned in response to changing activity. However, they cannot be accessed remotely and need an easy-to-use interface, extended recording capabilities, night vision, audio (#3 zoo) and sensitivity to smaller-bodied animals (#4 zoo). Adapting camera traps to manage Wi-Fi and adding a quality interface would significantly change their capacity. Smaller mobile modular solutions can sometimes be the most useful [31, 32]. Existing systems that can be updated, modified, and/or coupled with other systems offer flexibility and expanded data collection capabilities [37]. Digital cameras that track cardiopulmonary readings offer basic health checks [52]. Wearable solutions such as a leg band or collar are possible for some animals [6–8, 14] and would prove a less invasive solution. Zoo environments require ruggedized solutions that operate in restrictive conditions.

Motion Tracking: Individual ID tracking requires high-resolution cameras that allow tracking of individuals; recognising the type of motion performed and following that motion within the full scene would also be ideal (#4 zoo) [26, 45–47]. Complete coverage and adding motion detection to such systems would assist analysis, particularly when coupled with alerts. Alerts would also prove useful with drones when large zoo animals are transferred to sanctuary-type settings, as happens with the progressive zoos. Those animals still need monitoring for support, especially while in transition and adjusting to their new circumstances [38, 39, 53–55, 59, 63, 72, 74, 75, 81]. Such tracking is also useful for monitoring in wildlife sanctuaries or for wildlife per se.

Remote Access: Secure robust Wi-Fi coverage throughout zoo environments can expand
viable coverage options and solutions [46]. In turn, this would provide remote access to
monitoring systems [39, 51], reducing manual labour significantly and ensuring systems could adapt easily to the changing needs of animals synchronously. Looking through a 24-h cycle of footage (even with sampling or fast forwarding) to find anomalies is an inefficient use of keeper time. A significant improvement would be to enable alert notifications of condition changes to be reported and received instantaneously [38, 39, 51]. A
system of remotely accessible in-situ transponders would enable keepers to note trends
vs. established stress baselines. We see this in precision farming, where only deviations above defined baseline parameters of 'usual' behaviour are captured, customised levels are adaptable, and unexpected behaviours raise alarms [45, 47]. Autonomous systems to manage data col-
lection and analysis would also inform longer term welfare management strategies and
address welfare needs.
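The baseline-and-alarm idea can be sketched very simply: flag any reading that departs from an individual's rolling baseline by more than a chosen tolerance. The activity values and tolerance below are illustrative assumptions, not parameters from the cited systems.

import statistics

def exceeds_baseline(history, latest, tolerance=3.0):
    # Flag a reading more than `tolerance` standard deviations from the rolling baseline.
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history) or 1e-9   # avoid division by zero on flat baselines
    return abs(latest - mean) / sd > tolerance

baseline = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]   # hypothetical hourly activity index
if exceeds_baseline(baseline, latest=25.4):
    print("ALERT: activity outside usual range; notify keeper")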

Data Analysis: Efficient data collection, digital tools and visualisation addressing individual animals' unique welfare needs and contexts are becoming increasingly available
[49]. Zoos and technology developers have recognised the need for an Artificial Intel-
ligence system or similar to analyse large amounts of data from multiple sources [76].
Combining data capture with automated coding of the recorded information would be
ideal (#4 zoo). In addition, a long-term archive [21, 22] would map improvement or
deterioration of the different species and sanction resources more effectively for future
strategic planning, as would multi-institutional sharing of data collection.

6 Conclusion
We investigate the status of contemporary monitoring technologies for animal welfare
in a review of the literature. With a focus on zoo environments, we included agricultural
and wild environment solutions, as knowledge and applications from those contexts may
be transferrable to zoo environment requirements. Responses from zoos working with
multiple species with distinctive needs reveal current and future requirements envisaged
for the animals in their care and for streamlining workload for the keeper teams. We
discuss those expanded desires and aspirations against findings from the literature to
scope future improvement solutions for monitoring welfare in zoo environments. We
contribute findings, recommendations, and next steps from these scenarios that can be
applied more broadly to other animal welfare contexts.

Acknowledgments. We thank the four zoos for their generous responses; they remain anonymised for this article but have read and agreed that their input be published. Additionally, we acknowledge funding from the AUT Summer Research Award from the Faculty of Design and Creative Technology,
without which this research would not be possible. We also thank all who reviewed early drafts of
this research, including anonymous FTC reviewers, for their helpful comments that have improved
this publication.

References
1. Wark, J.D., et al.: Monitoring the behavior and habitat use of animals to enhance welfare
using the ZooMonitor app. Anim. Behav. Cogn. 6(3), 158–167 (2019)
2. Hawkes, N.: Animal Care Monitoring Tool Coming to ZIMS (2016). https://www.species360.
org/2018/03/animal-care-monitoring/. Accessed 10 Apr 2022
3. Methley, A.M., Campbell, S., Chew-Graham, C., McNally, R., Cheraghi-Sohi, S.: PICO,
PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools
for qualitative systematic reviews. BMC Health Serv. Res. 14(1), 579 (2014)
4. Watters, J., Margulis, S., Atsalis, S.: Behavioral monitoring in zoos and aquariums: a tool for
guiding husbandry and directing research. Zoo Biol. 28(1), 35–48 (2009)
5. Camal, L., Kirtane, A., Blanco, T., Casas, R., Rossano, F., Aksanli, B.: A wearable device net-
work to track animal behavior and relationships in the wild. In: 2019 IEEE 10th Annual Ubiq-
uitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019,
pp. 0198–0202 (2019)
6. Jukan, A., Masip-Bruin, X., Amla, N.: Smart computing and sensing technologies for animal
welfare. ACM Comput. Surv. 50(1), 1–27 (2017)
7. Kwong, K.H., et al.: Wireless sensor networks in agriculture: cattle monitoring for farming
industries. Prog. Electromagn. Res. Symp. 2, 1719–1723 (2009)
8. Boyd, I., Kato, A., Ropert-Coudert, Y.: Bio-logging science: sensing beyond the boundaries.
Mem. Natl. Inst. Polar Res. Spec. Issue 58, 1–14 (2004). Special issue (ISSN/ISBN: 03860744)
9. Cooke, S.: Biotelemetry and biologging in endangered species research and animal conser-
vation: relevance to regional, national, and IUCN Red List threat assessments. Endanger.
Species Res. 4(1–2), 165–185 (2008)
10. Hedenström, A., Lindström, Å.: Migration and flight strategies in animals: new insights
from tracking migratory journeys. In: Animal Movement Across Scales, pp. 73–89. Oxford
University Press (2014)
11. Block, B.A.: Physiological ecology in the 21st century: advancements in biologging science.
Integr. Comp. Biol. 45(2), 305–320 (2005)
12. Tan, S.-L., Ha Duy, N., Garcia-Guzman, J., Garcia-Orduna, F.: A wireless activity monitor-
ing system for monkey behavioural study. In: 2011 IEEE 15th International Symposium on
Consumer Electronics (ISCE), pp. 40–45 (2011)
13. Hindell, M., et al.: Circumpolar habitat use in the southern elephant seal: Implications for
foraging success and population trajectories. Ecosphere 7(5), e01213 (2016)
14. Leoni, J., Tanelli, M., Strada, S.C., Berger-Wolf, T.: Data-driven collaborative intelligent
system for automatic activities monitoring of wild animals. In: 2000 IEEE International
Conference on Human-Machine Systems (ICHMS), pp. 1–6 (2020)
15. Kalan, A.K., Mundry, R., Wagner, O.J.J., Heinicke, S., Boesch, C., Kühl, H.S.: Towards the
automated detection and occupancy estimation of primates using passive acoustic monitoring.
Ecol. Indic. 54, 217–226 (2015)
16. Pacheco, X.: How technology can transform wildlife conservation. In: Green Technologies
to Improve the Environment on Earth. IntechOpen (2018)
17. Pisto, K.: What do remote cameras reveal for carnivore researchers? Hike with us to find
out, 01 August 2019. https://blog.zoo.org/2019/08/what-do-remote-cameras-reveal-for.html.
Accessed 22 Feb 2022
18. Tobler, M., Zúñiga Hartley, A., Carrillo-Percastegui, S., Powell, G.: Spatiotemporal hierar-
chical modelling of species richness and occupancy using camera trap data. J. Appl. Ecol.
52(2), 413–421 (2015)
19. Tobler, M.: Camera base version 1.7 [computer program] (2015)
20. Bowler, M., Tobler, M., Endress, B., Gilmore, M., Anderson, M.: Estimating mammalian
species richness and occupancy in tropical forest canopies with arboreal camera traps. Remote
Sens. Ecol. Conserv. 3(3), 146–157 (2017)
21. He, Z., et al.: Visual informatics tools for supporting large-scale collaborative wildlife
monitoring with citizen scientists. IEEE Circuits Syst. Mag. 16(1), 73–86 (2016)
22. McShea, W.J., Forrester, T., Costello, R., He, Z., Kays, R.: Volunteer-run cameras as dis-
tributed sensors for macrosystem mammal research. Landsc. Ecol. 31(1), 55–66 (2015).
https://doi.org/10.1007/s10980-015-0262-9
23. McCarthy, M.S., et al.: An assessment of the efficacy of camera traps for studying demographic
composition and variation in chimpanzees (Pan troglodytes). Am. J. Primatol. 80(9), e22904
(2018)
24. Hogg, C., Fox, S., Pemberton, D., Belov, K.: Saving the Tasmanian Devil. CSIRO Publishing,
Melbourne (2019)
25. Rode, J., et al.: Population monitoring of snow leopards using camera trapping in Naryn State
Nature Reserve, Kyrgyzstan, between 2016 and 2019. Glob. Ecol. Conserv. 31, e01850 (2021)
26. Harvey, A.M., Morton, J.M., Ramp, D., Mellor, D.J., Russell, V., Chapple, R.S.: Use of
remote camera traps to evaluate animal-based welfare indicators in individual free-roaming
wild horses. Animals 11(7), 2101 (2021)
27. Palencia, P., Vicente, J., Soriguer, R.C., Acevedo, P.: Towards a best-practices guide for camera
trapping: assessing differences among camera trap models and settings under field conditions.
J. Zool. 316, 197–208 (2021)
28. Molloy, S.W.: A practical guide to using camera traps for wildlife monitoring in natural
resource management projects. Micronesian Megapode Project View Project Bird Ecology
and Conservation View Project (2018)
29. Bugler, K.: Monitoring the ‘original’ panda: impacts and outcomes of using infra-red trail
cameras on captive red panda (Ailurus fulgens) behaviour (2020)
30. Stewart, F.E.C., Fisher, J.T., Burton, A.C., Volpe, J.P.: Species occurrence data reflect the
magnitude of animal movements better than the proximity of animal space use. Ecosphere
9(2), e02112 (2018)
31. Macdonald, D.W., et al.: Multi-scale habitat modelling identifies spatial conservation priorities
for mainland clouded leopards (Neofelis nebulosa). Divers. Distrib. 25(10), 1639–1654 (2019)
32. Archangel Imaging: WAMCam | ESA Business Applications, August 2018. https://business.
esa.int/projects/wamcam-1. Accessed 22 Feb 2022
33. CCTV Camera World: Utilizing Cameras To Monitor Animals (2015). https://www.cctvca
meraworld.com/utilizing-cameras-to-monitor-animals.html. Accessed 22 Feb 2022
34. Young, S.: CCTV for wildlife monitoring: an introduction (2016)
35. Hansen, B.K., Fultz, A.L., Hopper, L.M., Ross, S.R.: An evaluation of video cameras for
collecting observational data on sanctuary-housed chimpanzees (Pan troglodytes). Zoo Biol.
37(3), 156–161 (2018)
36. Munita, C., Tadich, T.A., Briceño, C.: Comparison of 2 behavioral sampling methods to
establish a time budget in a captive female cheetah (Acinonyx jubatus). J. Vet. Behav. 13, 1–5
(2016)
37. Kalirathinam, U.K., Elangkovan, S., Kawi, J., Cabana, F.: Sleep monitoring of an Asian
elephant Elephas maximus calf at Night Safari, Singapore: testing whether sleep time is a
significant predictor of cortisol or the onset of positive elephant endotheliotropic herpesvirus
viraemia. Int. Zoo Yearb. 53(1), 128–137 (2019)
38. Chester Zoo and NW Security Group: Smart use of CCTV at Chester Zoo - Case Study. https://
www.nwsystemsgroup.com/sectors/visitor-attractions/chester-zoo. Accessed 22 Feb 2022
39. The Birmingham Zoo: High-resolution cameras enhance zoo security while collecting criti-
cal information on animal behaviour, July 2017. https://www.mobotix.com/sites/default/files/
2019-09/mx_CS_BirminghamZooUSA_en_2018-A4-web%2B.pdf. Accessed 22 Feb 2022
40. Fazio, J.M., Barthel, T., Freeman, E.W., Garlick-Ott, K., Scholle, A., Brown, J.L.: Utilizing
camera traps, closed circuit cameras and behavior observation software to monitor activ-
ity budgets, habitat use, and social interactions of zoo-housed Asian Elephants (Elephas
maximus). Animals 10(11), 2026 (2020)
41. Zoo Atlanta: Giant Panda Research: Giant Panda Maternal Behavior. https://zooatlanta.org/
project/giant-panda/. Accessed 22 Feb 2022
42. Brady, A., McMahon, B., Naulty, F.: Estimates of locomotion in Asian elephants Elephas
maximus using video monitoring at Dublin Zoo, Ireland. J. Zoo Aquar. Res. 9(2), 124–133
(2021)
43. Field, A., Miles, J., Field, Z.: Discovering Statistics Using SAS. SAGE Publications Ltd.,
London (2012)
44. The Times of India: Delhi zoo installs CCTV cameras to monitor animal behaviour |
Delhi News - Times of India (2020). https://timesofindia.indiatimes.com/city/delhi/delhi-zoo-
installs-cctv-cameras-to-monitor-animal-behaviour/articleshow/77051744.cms. Accessed 22
Feb 2022
45. Küster, S., Kardel, M., Ammer, S., Brünger, J., Koch, R., Traulsen, I.: Usage of computer
vision analysis for automatic detection of activity changes in sows during final gestation.
Comput. Electron. Agric. 169, 105177 (2020)
46. Rao, Y., Jiang, M., Wang, W., Zhang, W., Wang, R.: On-farm welfare monitoring system for
goats based on Internet of Things and machine learning. Int. J. Distrib. Sens. Netw. 16(7),
155014772094403 (2020)
47. Traulsen, I., Scheel, C., Auer, W., Burfeind, O., Krieter, J.: Using acceleration data to
automatically detect the onset of farrowing in sows. Sensors 18(2), 170 (2018)
48. Connors, M.J., Schauber, E.M., Forbes, A., Jones, C.G., Goodwin, B.J., Ostfeld, R.S.: Use
of track plates to quantify predation risk at small spatial scales. J. Mammal. 86(5), 991–996
(2005)
49. Orban, D.A., Soltis, J., Perkins, L., Mellen, J.D.: Sound at the zoo: using animal monitoring,
sound measurement, and noise reduction in zoo animal management. Zoo Biol. 36(3), 231–
236 (2017)
50. Webber, S., Carter, M., Smith, W., Vetere, F.: Interactive technology and human–animal
encounters at the zoo. Int. J. Hum. Comput. Stud. 98, 150–168 (2017)
51. Sensaphone Remote Monitoring Solutions: Case Studies | Remote Monitoring Solutions |
Sensaphone (2015). https://www.sensaphone.com/case-studies/2015/03/protecting-animals-
from-dangerous-temperatures-24-7. Accessed 22 Feb 2022
52. Al-Naji, A., Tao, Y., Smith, I., Chahl, J.: A pilot study for estimating the cardiopulmonary
signals of diverse exotic animals using a digital camera. Sens. (Switz.) 19(24), 5445 (2019)
53. Chahl, J.: Using digital cameras for basic health checks saves zoo animals from anesthet-
ics. PhysOrg, 13 February 2020. https://phys.org/news/2020-02-digital-cameras-basic-hea
lth-zoo.html. Accessed 22 Feb 2022
54. Ross, S.R., Lake, B.R., Fultz, A., Hopper, L.M.: An evaluation of thermal imaging as a welfare
monitoring tool for captive chimpanzees. Primates 62(6), 919–927 (2021)
55. Havens, K.J., Sharp, E.J.: Thermal Imaging Techniques to Survey and Monitor Animals in
the Wild: A Methodology. Academic Press, London (2015)
56. Lahoz-Monfort, J.J., Magrath, M.J.L.: A comprehensive overview of technologies for species
and habitat monitoring and conservation. Bioscience 71(10), 1038–1062 (2021)
57. McCafferty, D.J.: Applications of thermal imaging in avian science. Ibis (Lond. 1859) 155(1),
4–15 (2013)
58. Hristov, N.I., Betke, M., Kunz, T.H.: Applications of thermal infrared imaging for research
in aeroecology. Integr. Comp. Biol. 48(1), 50–59 (2008)
59. Cilulko, J., Janiszewski, P., Bogdaszewski, M., Szczygielska, E.: Infrared thermal imaging in
studies of wild animals. Eur. J. Wildl. Res. 59(1), 17–23 (2013)
60. Steen, K.A., Villa-Henriksen, A., Therkildsen, O.R., Green, O.: Automatic detection of
animals in mowing operations using thermal cameras. Sensors 12(6), 7587–7597 (2012)
178 A. Morrison and A. Novikova

61. Desholm, M.: Wind farm related mortality among avian migrants - a remote sensing study and
model analysis. Thesis/Dissertation, ETDEWEB. Danmarks Miljoeundersoegelser, Roskilde
(Denmark); Copenhagen Univ. (Denmark), Denmark (2006)
62. Lathlean, J., Seuront, L.: Infrared thermography in marine ecology: methods, previous
applications and future challenges. Mar. Ecol. Prog. Ser. 514, 263–277 (2014)
63. Piel, A.K., et al.: Noninvasive technologies for primate conservation in the 21st century. Int.
J. Primatol. 43, 133–167 (2021). https://doi.org/10.1007/s10764-021-00245-z
64. Mcmahon, B., Teeling, E., Höglund, J.: How and why should we implement genomics into
conservation? Evol. Appl. 7(9), 999–1007 (2014)
65. Hoban, S.M., et al.: Bringing genetic diversity to the forefront of conservation policy and
management. Conserv. Genet Resour 5, 593–598 (2013)
66. Gilardi, K., et al.: Best practice guidelines for health monitoring and disease control in great
ape populations (2015). https://doi.org/10.2305/IUCN.CH.2015.SSC-OP.56.en
67. Jain, M., Olsen, H.E., Paten, B., Akeson, M.: The Oxford Nanopore MinION: delivery of
nanopore sequencing to the genomics community. Genome Biol. 17(1), 1–11 (2016)
68. Loit, K., et al.: Relative performance of MinION (Oxford Nanopore Technologies) versus
sequel (Pacific Biosciences) third-generation sequencing instruments in identification of agri-
cultural and forest fungal pathogens. Appl. Environ. Microbiol. 85(21), 1–20, e01368-19
(2019). https://doi.org/10.1128/AEM.01368-19. PMID: 31444199; PMCID: PMC6803294
69. Baldi, P., La Porta, N.: Molecular approaches for low-cost point-of-care pathogen detection
in agriculture and forestry. Front. Plant Sci. 11, 1603 (2020)
70. Chang, J.J.M., Ip, Y.C.A., Ng, C.S.L., Huang, D.: Takeaways from mobile DNA barcoding
with BentoLab and MinION. Genes 11(10), 1121 (2020)
71. Krehenwinkel, H., Pomerantz, A., Prost, S.: Genetic biomonitoring and biodiversity assess-
ment using portable sequencing technologies: current uses and future directions. Genes
10(11), 858 (2019)
72. Bonnin, N., Van Andel, A., Kerby, J., Piel, A., Pintea, L., Wich, S.: Assessment of chimpanzee
nest detectability in drone-acquired images. Drones 2(2), 17 (2018)
73. van Hooff, J.A.R.A.M., Lukkenaar, B.: Captive chimpanzee takes down a drone: tool use
toward a flying object. Primates 56(4), 289–292 (2015). https://doi.org/10.1007/s10329-015-
0482-2
74. Wich, S.A., Koh, L.P.: Conservation Drones: Mapping and Monitoring Biodiversity, vol. 1.
Oxford University Press, Oxford (2018)
75. Koh, L.P., Wich, S.A.: Dawn of drone ecology: low-cost autonomous aerial vehicles for
conservation. Trop. Conserv. Sci. 5(2), 121–132 (2012)
76. Minh, T.C.: These new technologies could transform wildlife conservation, 04 Febru-
ary 2022. https://thehill.com/changing-america/sustainability/environment/592820-these-
new-technologies-could-transform-wildlife. Accessed 25 Feb 2022
77. Zhang, H., et al.: Thermal infrared imaging from drones can detect individuals and nocturnal
behavior of the world’s rarest primate. Glob. Ecol. Conserv. 23, e01101 (2020)
78. Duporge, I., et al.: Determination of optimal flight altitude to minimise acoustic drone dis-
turbance to wildlife using species audiograms. Methods Ecol. Evol. 12(11), 2196–2207
(2021)
79. Crunchant, A.S., Borchers, D., Kühl, H., Piel, A.: Listening and watching: do camera traps or
acoustic sensors more efficiently detect wild chimpanzees in an open habitat? Methods Ecol.
Evol. 11(4), 542–552 (2020)
80. Wrege, P.H., Rowland, E.D., Keen, S., Shiu, Y.: Acoustic monitoring for conservation in
tropical forests: examples from forest elephants. Methods Ecol. Evol. 8(10), 1292–1301
(2017)
81. Hyun, C.U., Park, M., Lee, W.Y.: Remotely piloted aircraft system (RPAS)-based wildlife
detection: a review and case studies in maritime Antarctica. Animals 10(12), 1–17 (2020)
Hierarchical Tucker Tensor Regression:
A Case Study on Classification

Quoc Tran Ngoc(B)

University of Science, John Von Neumann Institute, Vietnam National University


Ho Chi Minh City, Ho Chi Minh City, Vietnam
[email protected]

Abstract. Regression and Classification are two of the important problems
in Machine Learning. Despite the differences, there is overlap
between these two problems, so some regression methods can be used
to solve a classification task and vice versa. In this paper, we try to
make a case study on classification using Hierarchical Tucker Regression
(HTR), which is a novel generalized linear tensor regression model based
on the Hierarchical Tucker Decomposition (HTD). This compact, flexible
regression model can work well with the data that takes the form of high-
dimensional arrays, also known as tensors, with just a smaller number
of parameters than traditional vector-based regression models. In addi-
tion, we have replaced the canonical dimension tree in the original HTR
with the front-to-back splitting dimension tree, which leads to the front-
to-back splitting Hierarchical Tucker Regression. This tree structure is a
special case of Hierarchical Tucker format that can present Tensor-Train
Decomposition (TTD), which provides a condition in determining the
general rank bounds for the Hierarchical Tucker rank of HTD based on
the rank of the TTD. We evaluate the effectiveness of our approach on
both simulated and real data.

Keywords: Regression · Classification · Hierarchical Tucker decomposition ·
Front-to-back Hierarchical Tucker regression

1 Introduction
Regression analysis is a statistical method to model the relationship between
a dependent (target) variable and one or more independent (predictor) variables.
Mathematically, it is the task of approximating a mapping
function from input variables to a continuous output variable. In machine learn-
ing, in some cases, a regression problem can be converted to a classification prob-
lem by converting the response variable into discrete buckets. Logistic regression
and softmax regression are two classical examples of applying regression to classifi-
cation problems. With the arrival of the deep learning era, powerful deep learning
models have achieved many remarkable results in the field of classification.
Nowadays, the strong development of science and technology has produced
multi-dimensional, complex structured, and large-sized data.

Fig. 1. Example of a third-order tensor X ∈ R^{I_1×I_2×I_3}, where I_1 = 3, I_2 = 4, I_3 = 5, and its fibers and slices.

These data types
are often represented as multidimensional arrays and are collectively referred to
as Tensors. Classical vector-based regression models have some limitations when
working with Tensor data. They turn a large multidimensional array into a vector
and deal with a huge number of parameters. For example, an MRI image of size
256 × 256 × 256 needs 256^3 = 16,777,216 regression parameters, which takes an
extremely expensive computation. Besides, the conversion to vector form can
damage the original spatial structure of the data, causing the loss of important
information in the original data. There is no need to argue about the effectiveness
of deep learning models, but what these models and vector-based models have
in common is a large number of parameters and high complexity. The above
limitations are the motivation to find a regression model that can interact with
Tensor structured data, reducing the number of parameters but still ensuring
data integrity, and Tensor Regression can be a good choice.
Over the past few years, tensor and tensor factorization have attracted much
attention. In Tensor Regression, several approaches have been proposed. For
Tensor-Scalar Regression, Guo et al. [9] present one of the very first Tensor
Regression models. By adopting linear and support vector machine, they pro-
posed Tensor Ridge Regression and Support Tensor Regression. In both cases,
the unknown tensor is learned in an iterative manner, where at each iteration,
using the Canonical (CANDECOMP)/Parallel Factors (PARAFAC) decomposi-
tion (CP) [4], the data from the input tensors are projected along with a certain
mode and the parameters that are associated to that mode are learned by solving
a linear problem of reduced dimensionality. Zhou et al. [6] proposed a class of
generalized linear tensor regression models, which is the combination of Gener-
alized Linear Model (GLM) [3] and CP decomposition [4]. These models try to
find the low-rank approximation of coefficient tensors when the scalar output is
assumed to belong to an exponential family distribution. The resulting model is
simple but not flexible. Tucker Tensor Regression, proposed by Li et al. [5], has
overcome the inflexibility of CP regression by replacing CP decomposition with
Tucker decomposition [4], as it can admit different ranks on different modes of
Tensor. Both of the above models achieved promising results in neuroimaging
analysis. For Tensor-Tensor Regression, a representative work is the High-order
Partial Least Squares Regression (HOPLS) [13], in which the matrix Partial
Least Squares (PLS) [11,12] is generalized to High-Order Partial Least Squares
(HOPLS) to handle the tensor-output situation. The principle behind HOPLS
is to factor both the input and output tensor into a sum of Tucker tensor, with a
constraint that the extracted latent variables capture the maximum covariance
between input and output tensor. Lock et al. [15] developed a Tensor-on-Tensor
regression model that can estimate a tensor while learning the CP decompo-
sition of the tensor input and the contracted tensor product of the input and
predictor tensor. Gahrooe et al. [16] propose a general multiple tensor-on-tensor
regression approach in which each set of input data and output measurements
are represented by tensors. This work is more general and overcomes the model
in Lock [15], as it can work when input tensor and output tensor have different
ranks. The use of Tucker decomposition instead of CP decomposition makes this
model more flexible and avoids overfitting due to the estimation of a large num-
ber of parameters. In addition, several works were proposed based on interesting
ideas. Kossaifi et al. [17] introduced the Tensor Regression Networks, which can
be seen as the combination of deep learning and tensor methods. It reformulates the
fully connected layer as the coefficients of a tensor regression model and assumes this
tensor coefficient follows a low-rank Tucker format. This model takes advantage
of the information generated from the CNN layers while reducing the number
of parameters through Tucker decomposition. Zhao et al. [10] adapted the
Gaussian Process [14] to tensors and proposed the Tensor Gaussian Process to
solve nonlinear regression. All the above methods have achieved remarkable and
promising results in their specified tasks.
In this paper, we try to reuse the Hierarchical Tucker Regression (HTR),
proposed by Hou [7]. This model is similar to the CP regression [6] and Tucker
regression [5], but using the Hierarchical Tucker Decomposition (HTD) [18,19]
instead. HTR maintains the advantages of both the CP model and Tucker model
at the same time. It has the inadequacy of flexibility in the CP model and
avoids the exponential parameter growth with tensor order in the Tucker model
that makes HTR a highly compact, flexible, and scalable tensor regression. In
addition, we have modified the original block relaxation algorithm for HTR
based on the common in the tree structure between HTD and Tensor-Train
Decomposition (TTD) [20]. We make some numerical experiments to evaluate
the original and the adjusted HTR to see the effect of the difference in the tree
structure of the HTD. Finally, we perform a classification case study using HTR
and compare the performance with vector-based regressions on both simulated
and real data.
The paper is structured as follows. In Sect. 2, we want to present some useful
background and notations about tensor, as well as Hierarchical Tucker decom-
position. In Sect. 3, we review the HTR and its block relaxation algorithm used
to estimate the parameters of this model. Our modifications are also presented
in this Section. Section 4 is about numerical experiments and a case study on
classification. The conclusion is presented in Sect. 5.

2 Background and Preliminaries


2.1 Tensor Overview and Notations
Tensor, also called a multi-way or multi-dimensional array, is the higher-
order generalization of vector and matrix. A higher-order tensor with D orders
is denoted as X ∈ RI1 ×I2 ×...×ID in Calligraphy letters. Each integer d ∈
{1, 2, ..., D} represents the d-th dimension of the Tensor and is often referred
to as mode or way. In addition, a matrix is denoted by boldface capital letter X
∈ RI×J and a vector is denoted by boldface low-case letter x ∈ RI . A d-mode
vector or d-mode fiber is a vector of size Id , obtained by varying the index of
mode d while keeping other indices fixed. Similarly, a slice of Tensor is a matrix
obtained by varying indices of two specified modes while keeping others fixed.
Figure 1 shows an example of a third-order tensor and its fibers, slices.
Matricization [4]: For a Tensor X ∈ RI1 ×I2 ×...×ID , define the subset t =
{t1 , t2 , ..., tk } ⊂ {1, 2, ..., D} and rt = {1, 2, ..., D} \ t, the matricization of Tensor
X along the subset t is the matrix obtained by merging all the modes t into the
row indices and all the modes in rt into the column indices, denoted as:
\[
X_{(t)} \in \mathbb{R}^{(I_{t_1} I_{t_2} \cdots I_{t_k}) \times (I_{rt_1} I_{rt_2} \cdots I_{rt_{D-k}})}, \qquad
X_{(t)}\big[(i_{t_1}, \dots, i_{t_k}), (i_{rt_1}, \dots, i_{rt_{D-k}})\big] = \mathcal{X}[i_1, i_2, \dots, i_D] \tag{1}
\]
In case t = {d}, d ∈ {1, 2, ..., D}, X(t) becomes the mode-d matrix [4] of
X.
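As a concrete illustration of the matricization just defined, the following is a minimal NumPy sketch; the helper name, the 0-based mode indexing, and the ordering of the merged column indices are our own illustrative choices, not taken from the paper.

```python
# A minimal sketch of the matricization X_(t) from Eq. (1), using NumPy.
import numpy as np

def matricize(X, t):
    """Unfold tensor X along the subset of modes t (0-based here),
    merging the modes in t into rows and the remaining modes into columns."""
    t = list(t)
    rest = [d for d in range(X.ndim) if d not in t]
    rows = int(np.prod([X.shape[d] for d in t]))
    # Bring the modes in t to the front, then flatten the row and column groups.
    return np.transpose(X, t + rest).reshape(rows, -1)

X = np.arange(3 * 4 * 5).reshape(3, 4, 5)
print(matricize(X, [0]).shape)     # (3, 20)  -- the mode-1 unfolding
print(matricize(X, [0, 2]).shape)  # (15, 4)
```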
The Kronecker product of two matrices X ∈ R^{I×J} and Y ∈ R^{K×L},
denoted by X ⊗ Y, produces a matrix of size IK × JL, defined as follows:
\[
X \otimes Y =
\begin{bmatrix}
x_{11}Y & x_{12}Y & \cdots & x_{1J}Y \\
x_{21}Y & x_{22}Y & \cdots & x_{2J}Y \\
\vdots  & \vdots  & \ddots & \vdots  \\
x_{I1}Y & x_{I2}Y & \cdots & x_{IJ}Y
\end{bmatrix} \tag{2}
\]
The Kronecker product has a useful property called the mixed-product property. It
states that if X, Y, A and B are conformable matrices whose dimensions are
suitable to construct the matrix products XY and AB, then:
(X ⊗ A)(Y ⊗ B) = XY ⊗ AB    (3)
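The mixed-product property in (3) can be checked numerically; the sketch below uses arbitrary conformable shapes of our own choosing.

```python
# A small numerical check of the mixed-product property (3), (X ⊗ A)(Y ⊗ B) = XY ⊗ AB.
import numpy as np

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((4, 3)), rng.standard_normal((3, 2))   # XY is 4 x 2
A, B = rng.standard_normal((5, 6)), rng.standard_normal((6, 2))   # AB is 5 x 2

lhs = np.kron(X, A) @ np.kron(Y, B)
rhs = np.kron(X @ Y, A @ B)
print(np.allclose(lhs, rhs))  # True
```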
The inner product of two Tensors X, Y ∈ R^{I_1×I_2×...×I_D} returns a scalar as follows:
\[
w = \langle \mathcal{X}, \mathcal{Y} \rangle
  = \langle \mathrm{vec}(\mathcal{X}), \mathrm{vec}(\mathcal{Y}) \rangle
  = \mathrm{vec}(\mathcal{X})^{\top} \mathrm{vec}(\mathcal{Y})
  = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_D=1}^{I_D} x_{i_1,i_2,\dots,i_D}\, y_{i_1,i_2,\dots,i_D} \tag{4}
\]

Fig. 2. Left: the balanced canonical dimension tree for the HT format. Right: the front-
to-back splitting dimension tree for the TT format.

The d-mode product between a Tensor and a matrix: For a Tensor X ∈
R^{I_1×I_2×...×I_D}, to perform the d-mode product between the Tensor X and a matrix,
the matrix must be of the form U ∈ R^{K×I_d}, i.e. the number of columns of the
matrix must equal the size of mode d of the Tensor. The d-mode product between
the Tensor X ∈ R^{I_1×I_2×...×I_D} and the matrix U ∈ R^{K×I_d} is defined as:
\[
\mathcal{Y} = \mathcal{X} \times_d U \in \mathbb{R}^{I_1 \times \dots \times I_{d-1} \times K \times I_{d+1} \times \dots \times I_D} \tag{5}
\]
where each element y ∈ Y is calculated as
\[
y_{i_1,\dots,i_{d-1},k,i_{d+1},\dots,i_D} = \sum_{i_d=1}^{I_d} x_{i_1,\dots,i_d,\dots,i_D}\, u_{k i_d}
\]
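A minimal NumPy sketch of the d-mode product in (5); the function name and the 0-based mode index are illustrative assumptions.

```python
# A minimal sketch of the d-mode product in Eq. (5) with NumPy.
import numpy as np

def mode_d_product(X, U, d):
    """Contract mode d (0-based) of tensor X with the columns of U (shape K x I_d)."""
    # tensordot sums over axis d of X and axis 1 of U; the new mode of size K
    # ends up last, so move it back into position d.
    Y = np.tensordot(X, U, axes=(d, 1))
    return np.moveaxis(Y, -1, d)

X = np.random.default_rng(1).standard_normal((3, 4, 5))
U = np.random.default_rng(2).standard_normal((7, 4))
print(mode_d_product(X, U, 1).shape)  # (3, 7, 5)
```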

2.2 Hierarchical Tucker Decomposition


The Hierarchical Tucker Decomposition (HTD) [18,19], also called H-Tucker, is
a novel structured format that represents a higher-order tensor by a hierarchy
of matricizations based on subspace approximation in a multi-level fashion. The
key idea of HTD is making a recursive construction out of lower-dimensional
subspaces, i.e. recursively separating all the modes of the Tensor, which leads to a
dimension tree.

Definition 1: The dimension tree of a Tensor X ∈ R^{I_1×I_2×...×I_D} is defined
by the following conditions:

– Each node of the tree is a subset t ⊂ {1, 2, ..., D}.
– It is a finite tree with root node {1, 2, ..., D}.
– Each singleton mode t = {d} corresponds to a leaf node, while each inner
node contains a set t which is the disjoint union of the two sets located in its two
sub-branches {t_left = t_l, t_right = t_r}, satisfying t = t_l ∪ t_r and t_l ∩ t_r = ∅.
184 Q. T. Ngoc

We denote T as the dimension tree, L(T) as the set of leaf nodes, N(T) =
T \ L(T) as the set of inner nodes, and L as the highest level (depth) of the tree.

Definition 2: The set of all tensors X ∈ R^{I_1×I_2×...×I_D} of hierarchical rank at
most r, with dimension tree T, called H-Tucker tensors, is given by
\[
\text{H-Tucker}\big((r_t)_{t\in T}\big) = \{r_t = \mathrm{rank}(X_{(t)}),\ \forall t \in T\}
\]
where X_{(t)} is the hierarchical matricization or unfolding of the Tensor X ∈ R^{I_1×I_2×...×I_D}
corresponding to the node t ⊂ T.

Depending on how we split the modes of the tensor, we will get different tree
structures. The most usual is the balanced canonical dimension tree used by
Grasedyck et al. [19], obtained by splitting the modes as follows: for
each parent node t = {m, ..., m + p}, its children are defined as
t_l = {m, ..., m + ⌊p/2⌋} and t_r = {m + ⌊p/2⌋ + 1, ..., m + p}. The modes
are split evenly to achieve a balanced canonical dimension tree [18,19]. Besides,
Lubich et al. [21] introduce a front-to-back splitting dimension tree, which is
obtained by splitting the modes as follows: for each parent node t = {m, ..., p},
its children are defined as t_l = {m} and t_r = {m + 1, ..., p}. This
tree structure is a special case of the Hierarchical Tucker format, and Lubich et al.
[21] show that it can be used to represent the Tensor-Train Decomposition. Figure 2 shows
the illustrations of both dimension trees.
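The two splitting rules can be sketched in a few lines; the nested-dictionary representation and function names below are our own illustration, not the authors' code.

```python
# A minimal sketch of the two dimension-tree constructions described above.
def balanced_tree(modes):
    node = {"modes": tuple(modes)}
    if len(modes) > 1:
        half = (len(modes) + 1) // 2          # left child gets modes m .. m + floor(p/2)
        node["left"] = balanced_tree(modes[:half])
        node["right"] = balanced_tree(modes[half:])
    return node

def front_to_back_tree(modes):
    node = {"modes": tuple(modes)}
    if len(modes) > 1:
        node["left"] = front_to_back_tree(modes[:1])   # singleton {m}
        node["right"] = front_to_back_tree(modes[1:])  # {m+1, ..., p}
    return node

print(balanced_tree([1, 2, 3, 4]))       # root {1,2,3,4} -> {1,2} and {3,4}
print(front_to_back_tree([1, 2, 3, 4]))  # root -> {1} and {2,3,4} -> {2} and {3,4} ...
```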
For each matrix X_{(t)} at each leaf node t, Grasedyck et al. [19] define a
factor matrix U^t, where the number of columns of U^t equals r_t = rank(X_{(t)}).
To present the relation between the subspaces of the
matricizations of the parent and children nodes, Hou et al. [7] introduce a
link between the corresponding basis factor matrices using a so-called transfer
matrix B^t via the formula:
U^t = (U^{t_l} ⊗ U^{t_r}) B^t    (6)
where B^t ∈ R^{r_{t_l} r_{t_r} × r_t} and r_{t_l}, r_{t_r}, r_t are the ht-ranks at the nodes t_l, t_r and t, respectively.
The construction of H-Tucker proceeds by applying Eq. (6) recursively from the
leaf singletons to the root of the dimension tree.
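The recursive application of Eq. (6) can be illustrated for a fourth-order tensor with the balanced tree of Fig. 2; all mode sizes and ranks below are made-up examples, and the vec/Kronecker ordering is one possible convention.

```python
# A minimal sketch of the nestedness relation (6): U^t = (U^{t_l} ⊗ U^{t_r}) B^t,
# applied from the leaves to the root to recover vec(B).
import numpy as np

rng = np.random.default_rng(0)
I1, I2, I3, I4 = 3, 4, 5, 6
r = {"1": 2, "2": 2, "3": 2, "4": 2, "12": 3, "34": 3, "root": 1}

U = {k: rng.standard_normal((I, r[k])) for k, I in zip("1234", (I1, I2, I3, I4))}
B12 = rng.standard_normal((r["1"] * r["2"], r["12"]))
B34 = rng.standard_normal((r["3"] * r["4"], r["34"]))
Broot = rng.standard_normal((r["12"] * r["34"], r["root"]))

U12 = np.kron(U["1"], U["2"]) @ B12          # basis for the inner node {1,2}
U34 = np.kron(U["3"], U["4"]) @ B34          # basis for the inner node {3,4}
vecB = np.kron(U12, U34) @ Broot             # vec of the full coefficient tensor
print(vecB.shape)                            # (I1*I2*I3*I4, 1) = (360, 1)
```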

3 Method
In this section, we first provide a representation of Hierarchical Tucker Regression
(HTR) and the original block relaxation algorithm used to estimate the model. In
addition, we modify some calculation steps in this algorithm based on replacing
the balanced canonical dimension tree with the front-to-back splitting dimension
tree in HTR.

3.1 Hierarchical Tucker Regression


Hou et al. [7] construct the Hierarchical Tucker Regression based on the combina-
tion of Generalized Linear Model (GLM) [2,3] and Hierarchical Tucker Decom-
position with the balanced canonical dimension tree. The GLM model for Tensor input is
expressed through the formula:
\[
g(\mu) = \eta = \gamma^{\top} z + \langle \mathcal{B}, \mathcal{X} \rangle = \gamma^{\top} z + \langle \mathrm{vec}(\mathcal{B}), \mathrm{vec}(\mathcal{X}) \rangle \tag{7}
\]
where η = γ^⊤ z + ⟨B, X⟩ is the systematic part, µ is the expected value of the response, and
g(µ) is the link function of the GLM. γ ∈ R^{I_0} is the coefficient vector corresponding
to the input vector z ∈ R^{I_0}. The coefficient Tensor B has the same order and
dimensions as the input Tensor X ∈ R^{I_1×I_2×...×I_D}.
Hierarchical Tucker Regression is the GLM model for Tensor input in which the coefficient
Tensor B is assumed to follow a Hierarchical Tucker Decomposition. For the
root node, vec(B) = U^{root} (with r_root = 1), and by using the nestedness property in
(6), formula (7) can be rewritten as
\[
g(\mu) = \eta = \gamma^{\top} z + \big\langle (U^{root_l} \otimes U^{root_r}) B^{root},\ \mathrm{vec}(\mathcal{X}) \big\rangle \tag{8}
\]
Then the nestedness property in (6) is recursively applied to all other inner nodes
t ∈ N(T) by replacing U^t with its children U^{t_l}, U^{t_r} and transfer
matrix B^t until all the leaf nodes t ∈ L(T) are reached. The mixed-product
property of the Kronecker product in (3) is also exploited in this procedure.
Finally, the resulting model is obtained in the form
\[
g(\mu) = \eta = \gamma^{\top} z + \Big\langle \Big(\bigotimes_{t \in L(T)^{L}} U^{t}\Big)\Big(\bigotimes_{t \in N(T)^{L-1}} B^{t} \otimes \bigotimes_{t \in L(T)^{L-1}} I^{t}\Big) \cdots \Big(\bigotimes_{t \in T^{l}} B^{t}\Big) \cdots \big(B^{root}\big),\ \mathrm{vec}(\mathcal{X}) \Big\rangle \tag{9}
\]

where T^l = {t ∈ T : level(t) = l}, with 1 ≤ l ≤ L, denotes
the set of all nodes in the l-th level of the dimension tree and I^t ∈ R^{r_t×r_t} is the
identity matrix. Similar to Tucker Regression [5], the number of free parameters
in the HTR model is
\[
\sum_{t \in L(T)} I_t r_t + \sum_{t \in N(T)} r_{t_l} r_{t_r} r_t - \sum_{t \in T \setminus \mathrm{root}} r_t^2 \tag{10}
\]
where the term \sum_{t \in T \setminus \mathrm{root}} r_t^2 accounts for the nonsingular transforma-
tion indeterminacy [5].
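A small sketch of the parameter count in (10); the helper function and the example tree and ranks are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the free-parameter count in Eq. (10).
def htr_num_params(leaf_dims, leaf_ranks, inner_ranks):
    """leaf_dims, leaf_ranks: I_t and r_t for each leaf node.
    inner_ranks: one (r_t, r_tl, r_tr) triple per inner node, root FIRST (r_root = 1)."""
    basis = sum(I * r for I, r in zip(leaf_dims, leaf_ranks))
    transfer = sum(r * rl * rr for r, rl, rr in inner_ranks)
    # subtract r_t^2 over all non-root nodes (nonsingular transformation indeterminacy)
    nonroot_ranks = list(leaf_ranks) + [r for r, _, _ in inner_ranks[1:]]
    return basis + transfer - sum(r * r for r in nonroot_ranks)

# Example: four modes of size 256, all ranks 3, balanced tree {1,2,3,4} -> {1,2},{3,4} -> leaves.
print(htr_num_params([256] * 4, [3] * 4, [(1, 3, 3), (3, 3, 3), (3, 3, 3)]))  # 3081
```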

3.2 Estimation of Hierarchical Tucker Regression Model Parameters


The Maximum Likelihood Estimation (MLE) method is used to estimate
the parameters of HTR. Given a set of input-output data P = {(X_n ∈
R^{I_1×I_2×...×I_D}, y_n ∈ R)}_{n=1}^{N}, where the random variable generating the y_n values is
assumed to follow an exponential family distribution according to the GLM
model, and letting the tensor coefficient B ∈ R^{I_1×I_2×...×I_D} follow the HTD format, based on
MLE and the HTR model presented in (9) the tensor coefficient B can be estimated
by solving the following optimization problem:
\[
\max_{\theta = (U^{t}, B^{t}, \gamma)} \; \mathcal{L}\big(\{U^{t}\}_{t \in L(T)},\ \{B^{t}\}_{t \in N(T)},\ \gamma,\ P\big) \tag{11}
\]
Algorithm 1. BRA for Hierarchical Tucker Regression [7]

Input: set of N input-output data {X_i ∈ R^{I_1×...×I_D}, z_i ∈ R^{I_0}, y_i ∈ R}_{i=1}^{N}
Output: γ, {U^t}_{t∈L(T)}, {B^t}_{t∈N(T)}
Initialize: random γ^{[0]}, {U^{t[0]}}_{t∈L(T)}, {B^{t[0]}}_{t∈N(T)}
1:  repeat
2:    γ^{[m+1]} = argmax_γ L(γ, {U^t}^{[m]}_{t∈L(T)}, {B^t}^{[m]}_{t∈N(T)})
3:    for each t ∈ L(T) do
4:      U^{t[m+1]} = argmax_{U^t} L(γ^{[m+1]}, {U^{t'}}^{[m+1]}_{t'<t, t'∈L(T)}, U^{t[m]}, {U^{t'}}^{[m]}_{t'>t, t'∈L(T)}, {B^{t'}}^{[m]}_{t'∈N(T)})
5:    end for
6:    for l = L − 1, ..., 1 do
7:      for each t ∈ N(T)^l do
8:        B^{t[m+1]} = argmax_{B^t} L(γ^{[m+1]}, {U^{t'}}^{[m+1]}_{t'∈L(T)}, {B^{t'}}^{[m+1]}_{t'<t, t'∈N(T)}, B^{t[m]}, {B^{t'}}^{[m]}_{t'>t, t'∈N(T)})
9:      end for
10:   end for
11: until ||L(θ^{[m+1]}) − L(θ^{[m]})|| < ε

Hou et al. [7] noticed that in (9), the linear systematic part is only linear
in each U^t and each B^t separately. So they proposed an algorithm referred to as the
Block Relaxation Algorithm (BRA) [8], whose main idea is to alternately
update one basis factor (or transfer) matrix U^t (or B^t) at a time while keeping
the rest of the matrices fixed. The update steps are performed iteratively
until the convergence criterion is reached. This algorithm breaks the simultaneous
estimation of all parameters into a sequence of low-dimensional parameter
optimizations using the classical GLM. All steps of BRA for HTR are shown in
Algorithm 1. In general, there are two phases in this BRA: updating the leaf nodes
and updating the inner nodes. In the first phase, for each factor matrix U^t, the inner
product in (9) can be rewritten as
\[
\Big\langle U^{t} J^{L} \Big(\bigotimes_{t' \in L(T)\setminus t} U^{t'}\Big)^{\!\top},\ X_{(t)} \Big\rangle
= \Big\langle U^{t},\ X_{(t)} \Big(\bigotimes_{t' \in L(T)\setminus t} U^{t'}\Big)\big(J^{L}\big)^{\!\top} \Big\rangle \tag{12}
\]
where J^L = (⊗_{t'∈N(T)^{L−1}} B^{t'} ⊗ ⊗_{t'∈L(T)^{L−1}} I^{t'}) ··· (⊗_{t'∈T^l} B^{t'}) ··· (B^{root}). Similarly, in the
second phase, for each transfer matrix B^t at an intermediate level l, the inner product
in (9) can be rewritten as
\[
\Big\langle B^{t} K^{l} \Big(\bigotimes_{t' \in T^{l}\setminus t} B^{t'}\Big)^{\!\top},\ H^{l} \Big\rangle
= \Big\langle B^{t},\ H^{l} \Big(\bigotimes_{t' \in T^{l}\setminus t} B^{t'}\Big)\big(K^{l}\big)^{\!\top} \Big\rangle \tag{13}
\]
where H^l = (⊗_{t'∈T^{l+1}} B^{t'})^⊤ ··· (⊗_{t'∈L(T)} U^{t'})^⊤ vec(X)
and K^l = (⊗_{t'∈T^{l−1}} B^{t'})(⊗_{t'∈T^{l−2}} B^{t'}) ··· (B^{root}). We iteratively run this block updating
procedure from bottom to top and from left to right along each level of T
until the log likelihood defined for the classical GLM in (11) ceases to increase. The
regularization and the proof of convergence of Algorithm 1 can be found in [1,5].
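To convey the flavour of these block-relaxation updates, the following is a deliberately simplified sketch for the second-order special case B = U1 U2^T with a Gaussian identity-link GLM and no vector covariate; it is not the full H-Tucker algorithm, and all sizes, ranks, and the noise level are our own choices.

```python
# A simplified block-relaxation (alternating GLM) sketch for a rank-r matrix coefficient.
import numpy as np

rng = np.random.default_rng(0)
N, I1, I2, r = 500, 20, 20, 3
B_true = rng.standard_normal((I1, r)) @ rng.standard_normal((r, I2))
X = rng.standard_normal((N, I1, I2))
y = np.einsum("nij,ij->n", X, B_true) + 0.01 * rng.standard_normal(N)

U1, U2 = rng.standard_normal((I1, r)), rng.standard_normal((I2, r))
for _ in range(50):
    # Update U1 with U2 fixed: <U1 U2^T, X_n> = <U1, X_n U2> is linear in U1.
    Z1 = np.einsum("nij,jr->nir", X, U2).reshape(N, -1)       # predictors for vec(U1)
    U1 = np.linalg.lstsq(Z1, y, rcond=None)[0].reshape(I1, r)
    # Update U2 with U1 fixed: <U1 U2^T, X_n> = <U2, X_n^T U1> is linear in U2.
    Z2 = np.einsum("nij,ir->njr", X, U1).reshape(N, -1)       # predictors for vec(U2)
    U2 = np.linalg.lstsq(Z2, y, rcond=None)[0].reshape(I2, r)

print(np.mean((U1 @ U2.T - B_true) ** 2))  # typically small once the alternation converges
```

Each block update is an ordinary (generalized) linear regression in a small number of parameters, which is exactly the structure the full Algorithm 1 exploits node by node.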

3.3 The Front-to-Back Splitting Hierarchical Tucker Tensor Regression

We make some modifications to Hierarchical Tucker Regression (HTR). More
specifically, we replace the balanced canonical dimension tree in the original HTR
with the front-to-back splitting dimension tree presented in Sect. 2. With this
adjustment, applying the nestedness property in (6) and the mixed-product
property of the Kronecker product in (3) as in the original HTR, the model in (8) can
be rewritten as
\[
g(\mu) = \eta = \gamma^{\top} z + \Big\langle \Big(\bigotimes_{t \in L(T)^{L}} U^{t}\Big)\Big(\bigotimes_{t' \in T \setminus t} I^{t'} \otimes \bigotimes_{t \in T^{L-1}} B^{t}\Big) \cdots \Big(\bigotimes_{t' \in T \setminus t} I^{t'} \otimes \bigotimes_{t \in T^{l}} B^{t}\Big) \cdots \big(B^{root}\big),\ \mathrm{vec}(\mathcal{X}) \Big\rangle \tag{14}
\]

as there is at most 1 inner node at each level of the front-to-back splitting dimen-
sion tree. We call the model in (14) the “front-to-back splitting” Hierarchical
Tucker Regression (FTB-HTR) to distinguish it from the original HTR. Because
it is just a change of tree structure, the number of leaf nodes and inner
nodes does not change; so, ignoring the difference in the initial ht-ranks, the
number of free parameters of FTB-HTR is the same as in (10).
Similar to the original HTR model, the Maximum Likelihood Estima-
tion (MLE) method continues to be used to estimate the parameters of the
FTB-HTR model. And just like the HTR model in (9), the linear system-
atic part in the FTB-HTR model in (14) is also only linear in each Ut and
each Bt separately. So we can reuse Algorithm 1 but with changes in the
computation steps to solve the optimization problem in (14). Specifically, in
the updating leaf nodes phase (step 3 to step 5 in Algorithm 1), for each
factor matrix U^t, the inner product in (14) can be rewritten in the same way as
(12). The difference is that the element J^L in (12) now becomes
J^L = (⊗_{t''∈T\t'} I^{t''} ⊗ ⊗_{t'∈T^{L−1}} B^{t'}) ··· (⊗_{t''∈T\t'} I^{t''} ⊗ ⊗_{t'∈T^{l}} B^{t'}) ··· (B^{root}). We then
solve a GLM regression with U^t as the “parameter” and the term
X_{(t)} (⊗_{t'∈L(T)\t} U^{t'})(J^L)^⊤ as the “predictor”. The number of parameters of this
GLM regression is just I_t r_t, corresponding to the size of the factor matrix U^t.
In the updating inner nodes phase (step 6 to step 10 in Algorithm 1), the HTR
model in (9) and the FTB-HTR model in (14) differ in how the transfer matrices
are calculated, so we rewrite (13) as
\[
\Big\langle B^{t} K^{l} \Big(\bigotimes_{t' \in T^{l}\setminus t} I^{t'}\Big)^{\!\top},\ H^{l} \Big\rangle
= \Big\langle B^{t},\ H^{l} \Big(\bigotimes_{t' \in T^{l}\setminus t} I^{t'}\Big)\big(K^{l}\big)^{\!\top} \Big\rangle \tag{15}
\]
where H^l = (⊗_{t''∈T\t'} I^{t''} ⊗ ⊗_{t'∈T^{l+1}} B^{t'})^⊤ ··· (⊗_{t'∈L(T)} U^{t'})^⊤ vec(X)
and K^l = (⊗_{t''∈T\t'} I^{t''} ⊗ ⊗_{t'∈T^{l−1}} B^{t'}) ··· (B^{root}). As in the first phase, we
also solve a GLM regression with B^t as the “parameter” and the term
H^l (⊗_{t'∈T^{l}\t} I^{t'})(K^l)^⊤ as the “predictor”. The number of parameters of this GLM
regression is only r_{t_l} r_{t_r} r_t, corresponding to the size of the transfer matrix B^t.
In summary, the parameter estimation of the two models HTR and FTB-HTR
is similar. The difference in tree structure leads to differences in the systematic
parts of the models and in the calculation steps. Algorithm 1 helps break the complicated
original GLM problem with a huge number of parameters into a sequence of sub-
GLM problems which are simpler and have a smaller number of parameters.
For an input Tensor X_i ∈ R^{I_1×I_2×...×I_D}, the complexity of the vector-based
method is O(I^D) while the complexity of FTB-HTR and HTR is O(DR^3 +
DIR), where I = max{I_d}_{d=1}^{D} and R is the rank of the Hierarchical Tucker format. This
shows the ability to reduce the number of parameters as well as the flexibility
of the model, since the efficiency and complexity are highly dependent on the
tree structure and the user-defined ht-rank sets. In addition, by dividing a regres-
sion problem with a very large number of parameters into a series of regression
problems with a much smaller number of parameters, HTR and FTB-HTR help
avoid overfitting, especially when the amount of data is limited.
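As a back-of-the-envelope illustration of this reduction, assuming the MRI-sized example from the introduction and an arbitrarily chosen Hierarchical Tucker rank R = 5:

```python
# Comparison of the parameter counts implied by the complexity estimates above.
D, I, R = 3, 256, 5
vector_based = I ** D            # one coefficient per voxel after vectorisation
ht_based = D * R ** 3 + D * I * R
print(vector_based)              # 16777216
print(ht_based)                  # 4215
```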
The basis for us to make these changes comes from the work in Lubich et
al. [21] and Grasedyck et al. [22]. Concretely, Lubich et al. [21] give a dynami-
cal approximation of Hierarchical Tucker format and Tensor-Train format via a
suitable front-to-back splitting dimension tree, while Grasedyck et al. [22] intro-
duce the theories about the relationship between Hierarchical Tucker rank and
Tensor-Train rank. More specifically, Grasedyck et al. [22] state that the Hier-
archical Tucker ranks depend strongly on the permutation of modes and the
tree structure, so there is no straightforward answer to the question of which Hierarchical
Tucker format is ‘the best’. But in some cases, when we work with a task
relevant to the Tensor-Train format, we have a useful constraint: Grasedyck et al. [22]
have shown that the ranks required for the Hierarchical Tucker format based on an
arbitrary tree can always be bounded by the ranks of the Tensor-Train format,
which may help in finding better Hierarchical Tucker ranks.

4 Numerical Experiments

4.1 Parameter Estimation Ability

4.1.1 Identification of 2D Shapes


We redo the shape identification experiment on synthetic matrices from
[5]. Suppose we have y_i values generated from a normal distribution model whose
expected values are given by the formula µ = γ^⊺ z + ⟨B, X⟩. Each Ten-
sor X_i and vector z_i is generated from independent standard normals. The
coefficient vector γ has all elements equal to 1. The Tensor coefficient B is a simulated
binary matrix, with values of 1 forming a shape and values of 0 serving as the
background of the image and assumed to follow Hierarchical Tucker Decomposi-
tion. Because B is a second-order Tensor (matrix), HTR and FTB-HTR have the
same tree structure. Both dimension trees have the same form as one root node
and its two children. They also have the same ht-rank (1-r_1, r_2). In short, we
have a synthetic dataset {y_i, X_i, z_i}_{i=1}^{N}, and the goal of the experiment is to
estimate the two parameters γ̄ and B̄ in the formula µ = γ̄^⊺ z + ⟨B̄, X⟩ from this dataset.
The Tensor coefficient B has a size of 25 × 25 and the sample size is 1000. We run
with 5 sets of ht-rank {(1-r_1, r_2) | r_1 = r_2 = i}, i = 1, ..., 5. Table 1 and Table 2 show the
difference between the values of the estimated parameters and the original coefficients
in the experiment, based on the Mean Square Error (MSE) metric. Figure
3 illustrates the results of the experiment, with the leftmost column being the
original images representing the coefficient Tensor B, followed by the corresponding
images representing the B̄ parameter estimates for each ht-rank set of the
Hierarchical Tucker dimension tree. The closer the value of B̄ is to B, the closer
the estimated image will be to the original image.

Fig. 3. Simulated results of parameter estimation of the Hierarchical Tucker Regression
model on a simulated binary matrix of size 25 × 25 with sample size 1000.
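A minimal sketch of this data-generating process follows; the particular binary shape, the size of z, and the noise scale are illustrative assumptions, since the paper does not fix them here.

```python
# Synthetic data generation: y_i ~ Normal(mu_i, 1) with mu_i = gamma^T z_i + <B, X_i>,
# where B is a 25 x 25 binary shape (a centred square used here as a stand-in).
import numpy as np

rng = np.random.default_rng(0)
N, p, I1, I2 = 1000, 5, 25, 25

B = np.zeros((I1, I2)); B[8:17, 8:17] = 1.0          # binary "rectangle" coefficient
gamma = np.ones(p)                                    # all-ones vector coefficient

Z = rng.standard_normal((N, p))                       # vector covariates z_i
X = rng.standard_normal((N, I1, I2))                  # tensor covariates X_i
mu = Z @ gamma + np.einsum("nij,ij->n", X, B)
y = rng.normal(loc=mu, scale=1.0)                     # Gaussian responses
print(y.shape)                                        # (1000,)
```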

4.1.2 Identification of 3D Shapes


We keep doing the above experiment, but with a higher-order Tensor coefficient B,
on not only synthetic data but also real data. Specifically, the selected order is 3.
When the order of the tensor is greater than 2, HTR and FTB-HTR have different
tree structures and initial ht-ranks, so we need to define a different setup for each
model.
For third-order Tensors, we use the MNIST dataset, where each sample is a
handwritten digit image of size 28 × 28. The dataset is normalized by dividing all
the samples by 255. We select and combine samples to create third-order Tensors
with dimensions 2 × 28 × 28, 3 × 28 × 28 and 5 × 28 × 28, respectively. We do
the same thing as in the experiment in Sect. 4.1.1, but the Tensor coefficient B is no
longer a matrix but a third-order Tensor created from the MNIST dataset.

Fig. 4. Simulated results of parameter estimation of the Hierarchical Tucker Regression
model on third-order tensors of size 2 × 28 × 28, 3 × 28 × 28 and 5 × 28 × 28, respectively.
Left: the original Hierarchical Tucker Regression; right: the front-to-back splitting
Hierarchical Tucker Regression.

For HTR, the ht-rank has the form (1-r_12, r_3-r_1, r_2), and we run with 3 sets of ht-rank
(1-3, 2-2, 2), (1-4, 2-2, 2) and (1-4, 3-2, 2). For FTB-HTR, the ht-rank has the form
(1-r_1, r_23-r_2, r_3), and the 3 selected sets of ht-rank are (1-2, 3-2, 2), (1-2, 4-2, 2) and (1-2,
4-2, 3). These sets of ht-rank guarantee the same number of parameters for both
models. The Tensor B is the same and the sample size is 1500 for both models.
Table 3 and Table 4 show the Mean Square Error (MSE) between the value
of the estimated parameter and the original coefficient of HTR and FTB-HTR
models corresponding to each dimension. Figure 4 illustrates the results of the
experiment, with the leftmost column being the original shapes representing the
coefficient Tensor B, followed by the corresponding shapes representing the B̄
parameter estimates for each ht-rank set. The closer the estimated value of B̄ is
to B, the closer the estimated shape will be to the original one. This experiment
and the experiment in Sect. 4.1.1 show the parameter estimation ability of the
two models HTR and FTB-HTR. It can be seen that under the same conditions,
the accuracy of the estimated parameter of each model depends on the tree
structure as well as the ht-rank set.

Table 1. MSE between the estimated vector γ̄ and the original vector coefficient γ on the simulated dataset

Synthetic shape (B)  MSE(γ) on sets of ht-rank
                     (1-1, 1)  (1-2, 2)  (1-3, 3)  (1-4, 4)  (1-5, 5)
Rectangle            0.00000   0.00000   0.00000   0.00000   0.00000
Swiss                0.02262   0.00000   0.00000   0.00000   0.00000
Circle               0.01511   0.00243   0.00191   0.00121   0.00118
Triangle             0.02902   0.02321   0.01341   0.00979   0.00658
Ring                 0.02504   0.00563   0.00183   0.00102   0.00069

Table 2. MSE between the estimated tensor B̄ and the original tensor coefficient B on the simulated dataset

Synthetic shape (B)  MSE(B) on sets of ht-rank
                     (1-1, 1)  (1-2, 2)  (1-3, 3)  (1-4, 4)  (1-5, 5)
Rectangle            0.00000   0.00000   0.00000   0.00000   0.00000
Swiss                0.06667   0.00000   0.00000   0.00000   0.00000
Circle               0.02819   0.01366   0.00658   0.00339   0.00172
Triangle             0.02529   0.01331   0.00908   0.00671   0.00508
Ring                 0.08205   0.03391   0.01174   0.00531   0.00531

Table 3. MSE between the estimated tensor B̄ and the original tensor coefficient B of HTR on the MNIST dataset

Dimension    HTR
             (1-3, 2-2, 2)  (1-4, 2-2, 2)  (1-4, 3-2, 2)
2 × 28 × 28  0.06358        0.10344        0.07696
3 × 28 × 28  0.05964        0.05315        0.05910
5 × 28 × 28  0.12313        0.08294        0.07962

Table 4. MSE between the estimated tensor B̄ and the original tensor coefficient B of FTB-HTR on the MNIST dataset

Dimension    FTB-HTR
             (1-2, 3-2, 2)  (1-2, 4-2, 2)  (1-2, 4-2, 3)
2 × 28 × 28  0.10281        0.10344        0.10344
3 × 28 × 28  0.06690        0.06871        0.06871
5 × 28 × 28  0.10876        0.09805        0.09802

4.2 Case Study on Classification


4.2.1 Binary Classification
A binary classification problem can be solved by Logistic Regression; we want
to try a different solution by using Hierarchical Tucker Regression. We use the
MNIST dataset and normalize it by dividing all samples in this dataset by 255.
Then we choose a pair of labels from the 10 handwritten digits (0-9) and the images
corresponding to these labels. The Tensor coefficient B is a second-order Tensor of
size 28 × 28 because the tensor input is an image of size 28 × 28. Similar to
the experiment in Sect. 4.1.1, B is a second-order Tensor (matrix), so HTR and
FTB-HTR have the same tree structure and ht-rank and we can treat the
two models as the same. We run HTR and FTB-HTR with 3 sets of ht-rank
{(1-r_1, r_2) | r_1 = r_2 = i}, i = 3, ..., 5, based on the good estimation results in the
experiment in Sect. 4.1.1. Apart from the standard pair of labels (0, 1), we try to select pairs
of labels that are close to each other in order to create more difficulty for the
models. For each label, we use 4000 samples for the train set and 800 samples
for the test set. We also build a standard vector-based Logistic Regression as a
baseline for comparison. The results of this experiment are shown in Table 5.
It can be seen that Logistic Regression has higher prediction accuracy. The
number of parameters of Logistic Regression is 28 × 28 = 784 in this case,
a number at which a vector-based regression model still works well. That
does not mean HTR and FTB-HTR are not good. They still achieve an average
prediction accuracy of more than 94%, and the difference from the Logistic
model is acceptable in the context that the number of parameters to be processed
is much smaller. The numbers of parameters of HTR/FTB-HTR, calculated by
formula (10) for their respective ht-ranks, are 177, 240 and 305, which is considerably
smaller than 784. This shows the potential of HTR/FTB-HTR for binary
classification.
Table 5. Accuracy for binary classification on the MNIST dataset of HTR/FTB-HTR and Logistic Regression

Labels  Logistic Regression  HTR/FTB-HTR
                             (1-3, 3)  (1-4, 4)  (1-5, 5)
0-1     0.999375             0.9975    0.9975    0.9975
2-4     0.980625             0.96      0.970625  0.976875
3-7     0.980625             0.946875  0.954375  0.9625
5-6     0.98                 0.944375  0.949375  0.95625
8-9     0.97125              0.953125  0.95625   0.95875

4.2.2 Multiclass Classification


In our opinion, the Hierarchical Tucker Regression model is not suitable for mul-
ticlass classification because it is a Tensor-Scalar regression model. The output
scalar of HTR/FTB-HTR can be used to classify two classes, but when the num-
ber of classes is bigger than 2, we need a vector. To apply the HTR/FTB-HTR
to the multiclass classification problem, we use a “naive” idea: the one-vs-rest
mechanism. The one-vs-rest strategy splits a multi-class classification into
one binary classification problem per class. We reuse the MNIST data for this
experiment. As in the experiment in Sect. 4.1.1, HTR and FTB-HTR have the same
tree structure and ht-rank, and run with 3 sets of ht-rank {(1-r_1, r_2) | r_1 = r_2 = i}, i = 3, ..., 5.
For each label, we use 2000 samples for the train set and 500 samples for the
test set. We also build a standard One-vs-Rest Logistic Regression (OvRLoR)
and a standard vector-based Softmax Regression as baselines for comparison.
The results of this experiment are shown in Table 6.

Table 6. Evaluation of One-vs-Rest Logistic Regression (OvRLoR), Softmax Regression and HTR/FTB-HTR for multiclass classification on the MNIST dataset

Criterion    OvRLoR  Softmax Regression  HTR/FTB-HTR
                                         (1-3, 3)  (1-4, 4)  (1-5, 5)
Accuracy     0.8912  0.9275              0.7326    0.7759    0.8291
Free param.  7480    7480                1770      2400      3050
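A minimal sketch of the one-vs-rest wrapper described above; scikit-learn's LogisticRegression is used here purely as a stand-in for the per-class binary scorer (HTR/FTB-HTR in the experiment), and the function names are our own.

```python
# One-vs-rest: fit one binary model per class, predict by the highest positive-class score.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ovr_fit(X, y, classes):
    models = {}
    for c in classes:
        models[c] = LogisticRegression(max_iter=1000).fit(X, (y == c).astype(int))
    return models

def ovr_predict(models, X):
    scores = np.column_stack([m.predict_proba(X)[:, 1] for m in models.values()])
    labels = list(models.keys())
    return np.array([labels[i] for i in scores.argmax(axis=1)])

# Example usage: ovr_predict(ovr_fit(X_train, y_train, range(10)), X_test)
```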
It can be seen that OvRLoR and Softmax Regression have higher prediction
accuracy. Unlike the binary classification problem, the difference is significant,
which shows that HTR/FTB-HTR is not suitable for the multiclass classifica-
tion problem, as we expected. There may be a better approach than combining the
one-vs-rest strategy with HTR/FTB-HTR (or other scalar-output mod-
els), but we think there will still be a difference in performance
and stability compared with vector-output models. The good point here is that
the number of parameters to be processed by HTR/FTB-HTR is much smaller
than for OvRLoR and Softmax Regression.

5 Conclusion
In this paper, we have reviewed the Tensor-Scalar Hierarchical Tucker Regres-
sion based on the Hierarchical Tucker Decomposition. Our contribution is to propose
the front-to-back splitting Hierarchical Tucker Regression by replac-
ing the balanced canonical dimension tree in the original Hierarchical Tucker
Regression with the front-to-back splitting dimension tree. This tree structure
can be viewed as a representation of the conversion between Hierarchical Tucker
Decomposition and Tensor-Train Decomposition, which provides one more useful
condition when initializing the tree structure. The numerical experiments show
the flexibility and efficient parameter estimation of both models. We also
apply these Tensor regression models to classification problems in place of
vector-based regression models. Experimental results show the effectiveness of
HTR and FTB-HTR on the binary classification problem, achieving high
accuracy while the number of parameters is much smaller than that of the vector-based
models. The results on multiclass classification are only moderate; this is due to the fact

that HTR and FTB-HTR are Tensor-Scalar models, which makes them unsuit-
able for multiclass problems. For the future works, we are going to address this
limitation by extending the Hierarchical Tucker Regression to a Tensor-Tensor
regression model. Replacing the GLM part in Hierarchical Tucker Regression
with the Vector Generalized Linear Model [23] might be a good approach.

Acknowledgment. Quoc Tran Ngoc was funded by Vingroup Joint Stock Company
and supported by the Domestic Master/PhD Scholarship Programme of Vingroup
Innovation Foundation (VINIF), Vingroup Big Data Institute (VINBIGDATA), code
VINIF.2020.ThS.JVN.11

References
1. Hou, M.: Tensor-based regression models and applications (2017)
2. Nelder, J.A., Baker, J.: Generalized linear models. Wiley Online Library (1972)
3. McCullagh, P., Nelder, J.A.: Generalized linear models. Chapman and Hall, Mono-
graphs on statistics and applied, London (1983)
4. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review
(2009)
5. Li, X., Zhou, H., Xu, D., Li, L.: Tucker tensor regression and neuroimaging analysis.
Stat. Biosci. (2018)
6. Zhou, H., Li, L., Zhu, H.: Tensor regression with applications in neuroimaging. J.
Am. Stat. Assoc. (2013)
7. Hou, M., Chaib-draa, B.: Hierarchical tucker tensor regression: application to brain
imaging data analysis. In: IEEE International Conference on Image Processing
(ICIP 2015) (2015)
8. De Leeuw, J.: Block-relaxation algorithms in statistics. In: Information Systems
and Data Analysis. Springer (1994)
9. Guo, W., Kotsia, I., Patras, I.: Tensor learning for regression. IEEE Trans. Image
Process. 21(2), 816–827 (2012)
10. Zhao, Q., Zhou, G., Adali, T., Zhang, L., Cichocki, A.: Kernelization of tensor-based
models for multiway data analysis: processing of multidimensional structured data.
IEEE Signal Process. Mag. 30(4), 137–148 (2013)
11. Abdi, H.: Partial least squares regression and projection on latent structure regres-
sion (PLS regression). Wiley Interdisciplinary Rev. Comput. Stat. 2(1), 97–106
(2010)
12. Wold, S., Ruhe, A., Wold, H., Dunn, III, W.: The collinearity problem in linear
regression. the partial least squares (PLS) approach to generalized inverses. SIAM
J. Sci. Stat. Comput. 5(3), 735–743 (1984)
13. Zhao, Q., et al.: Higher order partial least squares (HOPLS): a generalized multilin-
ear regression method. IEEE Trans. Pattern Anal. Mach. Intell. 35(7), 1660–1673
(2013)
14. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT
Press (2005)
15. Lock, E.F.: Tensor-on-tensor regression (2017). arXiv:1701.01037
16. Gahrooei, M.R., Yan, H., Paynabar, K., Shi, J.: Multiple tensor-on-tensor regres-
sion: an approach for modeling processes with heterogeneous sources of data. Tech-
nometrics (2020). https://doi.org/10.1080/00401706.2019.1708463
Hierarchical Tucker Tensor Regression: A Case Study on Classification 195

17. Kossaifi, J., Lipton, Z.C., Khanna, A., Furlanello, T., Anandkumar, A.: Tensor
regression networks. arXiv preprint arXiv:1707.08308 (2017)
18. Hackbusch, W., Kühn, S.: A new scheme for the tensor representation. J. Fourier
Anal. Appl. 15(5), 706–722 (2009)
19. Grasedyck, L.: Hierarchical singular value decomposition of tensors. SIAM J.
Matrix Anal. Appl. 31(4), 2029–2054 (2010)
20. Oseledets, I.V.: Tensor-train decomposition. SIAM J. Sci. Comput. 33(5), 2295–
2317 (2011)
21. Lubich, C., Rohwedder, T., Schneider, R., Vandereycken, B.: Dynamical approx-
imation of hierarchical Tucker and tensor-train tensors. SIAM J. Matrix Anal.
Appl. 34(2), 470–494 (2013)
22. Grasedyck, L., Hackbusch, W.: An Introduction to Hierarchical (H-) Rank and
TT-Rank of tensors with examples. Comput. Methods Appl. Math. 11(3), 291–
304 (2011). https://doi.org/10.2478/cmam
23. Yee, T.W.: Vector Generalized Linear and Additive Models: With an Implementa-
tion in R. Springer, New York (2015)
Introducing Database Normal Forms
to Students: A Comparison Between
Theory-First and Practice-First
Educational Approaches

Dakota C. Cookenmaster, Jacob A. Bahn, and Germán H. Alférez(B)

School of Computing, Southern Adventist University,


PO Box 370, Collegedale, TN 37315-0370, USA
{dakotacookenmaster,jacobabahn,harveya}@southern.edu

Abstract. Educating the future generation of computer scientists and


engineers often proves to be challenging, and how the content is intro-
duced plays a large role in how well students will learn. One of the
primary challenges that instructors face is regarding the introduction of
important theory to students, both to show its essential nature to the
field as well as its practicality. This paper analyzes two pedagogical meth-
ods for the instruction of normal forms in database management systems,
a mandatory topic in any database course. The first of these methods
is a theory-based approach that relies on written works (i.e., theory) to
introduce the concept. The second of these focuses on a practice-based
approach (i.e., practice) which aligns with the normal form as students
implement a database schema. Through a small study, it was determined
that most students have a strong predisposition to theory-first education,
though students seemed to prefer the practice-based approach more than
the theory-first approach. This paper compares the two methodologies,
given this insight, and advises the use of an appropriate method for
future educators.

Keywords: Pedagogy · Databases · Normal forms · Theory-first


education · Practice-first education · Experiential learning

1 Introduction

Approaches to education have been pondered, adopted, reconsidered, and reap-


plied with substantial vigor over the last millennium. The academic community
has proven itself to be dedicated to providing students with educational meth-
ods and approaches that encourage true learning and problem-solving over rote
memorization as shown in [4]. In this way, it is expected that these students
will be true pioneers in their respective fields and continue the work we fight so
hard to produce today. Given this strong and ardent dedication to true educa-
tion, it is required that the community consistently reassess their instructional
methods with the intention of auditing their practices, as emphasised in [5]. The

hopeful result is a cohesive and comprehensive learning approach that adapts
and changes as necessary to best fit the needs of the students.
One of the most challenging topics for students to learn in Computer Science
(regarded in [9] as a notoriously challenging subject area) - specifically database
systems - is the concept of a “Normal Form.” For the reader unfamiliar with
database architectural patterns, database schema architects frequently follow a
tiered system of “Normal Forms,” or rather certain cascading rules designed to
protect data integrity and prevent data duplication in database schema design
[7,8]. The theoretical terminology introduced as part of these forms is often
convoluted at worst and confusing at best. Examples of this can be seen in [5].
Students seeking to learn these normal form definitions, with the explicit
goal of using them for real-world design projects, are often stumped by the
theoretical verbiage and thus hesitate to apply the knowledge in their designs.
A question thus arises: “Is the theory the problem, the instruction of the theory,
or the ordering of when theory and practice are encountered?” This is a case-
study paper that hopes to provide insight into how instructors should consider
the education of normalization in databases, with specific emphasis on student
comfort and retention.
Section 2 of this paper seeks to provide some historical background on
practice-based pedagogy in computer science and elsewhere. Section 3 describes
the concepts behind the practice-based approach we encourage educators to
use, the learning method proposed, and our hypothesis for
it. Section 4 explains the steps taken to develop and execute a small study on
the hypothesis. Section 5 details the result of the study. Section 6 discusses the
results found in the previous section. Lastly, Sect. 7 presents conclusions and
future work.

2 Literature Review
Advances in computer science tied with those in education have resulted in a
nearly ubiquitous positive change for students. In this way, in [9] researchers rec-
ognized that practical student involvement in computer science education was
fundamentally essential for student growth. This study highlighted the impor-
tance of implementing the principles of experiential learning in order to obtain
significant progress. Even between disciplines, the authors in [4] recognize the
helpful nature of experience-based pedagogy. In their research, the authors found
that this philosophy, as opposed to solely theory, assisted student farmers in
Slovak towns and was, by and large, a success. Its methods are gradually being
adopted by the general public and the broader academic community.
Relating to the concept of engineering, practice-based learning (PBL) can
successfully be applied in engineering programs, as mentioned in [6]. Also, the
focus should be placed on application and integration of knowledge rather than
on knowledge acquisition. In [5], the authors state that teachers who underwent
practice-based teacher professional development training were able to more read-
ily adapt new long-lasting positive pedagogical changes, aiding in the theory that
experiential learning is effective. The authors of [3] found that practice-based
pedagogy was able to greatly enhance teaching models. With the concept of
web design, in [2], a controlled study found that experiential learning was suited
for complicated topics and instructors were able to focus more on problem-
solving activities. Recent research thus corroborates the idea that integrating a
strong emphasis on experiential and practice-based learning is largely positive
for students and can be utilized to assist in the education of challenging topics,
particularly those in computer science.
In the context of databases, normal forms are a challenging collection of topics
related to database schema design; it is a vital component for ensuring data
integrity and limiting data redundancy. The implementation challenge arises
because database designers have to make sure that non-prime attributes do not
depend on a proper subset of the database table’s candidate key and also do not
depend on another non-prime attribute. The only way to check that the database
meets these requirements is by manually parsing through the database. This
means that a database schema needs to be constantly checked and normalized
if new tables are added to the schema [8].
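As a hypothetical illustration (ours, not drawn from the cited sources), the transitive dependency described above can be shown with plain Python dictionaries standing in for relational tables: in the flat design, advisor_office is determined by advisor, a non-prime attribute, rather than by the key student_id, which violates 3NF.

```python
# A flat "table" with a transitive dependency: student_id -> advisor -> advisor_office.
flat = [
    {"student_id": 1, "name": "Ada",  "advisor": "Dr. Lee",  "advisor_office": "B12"},
    {"student_id": 2, "name": "Ben",  "advisor": "Dr. Lee",  "advisor_office": "B12"},
    {"student_id": 3, "name": "Cara", "advisor": "Dr. Shaw", "advisor_office": "C03"},
]

# 3NF decomposition: move the attribute that depends on advisor into its own relation,
# keyed by advisor, so the office is stored exactly once and cannot become inconsistent.
students = [{"student_id": r["student_id"], "name": r["name"], "advisor": r["advisor"]}
            for r in flat]
advisors = {r["advisor"]: {"office": r["advisor_office"]} for r in flat}
print(advisors)  # {'Dr. Lee': {'office': 'B12'}, 'Dr. Shaw': {'office': 'C03'}}
```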
Although there are positive results when instructors and institutions engage
in practice-based pedagogy, we discovered that there is an absence of sufficient
research in the area of database learning methods, and particularly Normal
Forms. In fact, although [8] and [7] discuss methods for normalization, there
is no insight regarding the way normalization should be taught. Furthermore, [9]
and [6] argue for the use of experiential- and practice-based learning in computer
science and engineering, but the study of databases is not explicitly mentioned.

3 Theoretical Framework
Figure 1 explains the general concept behind this research, namely that practice-
based pedagogy should diverge from theory-based pedagogy through the explicit
inclusion of practice at the forefront of the learning experience. In the practice-
based approach, an instructor facilitates the conversation about a specific prob-
lem and students brainstorm and come up with solutions. The practice-based
approach has been applied in student-centered pedagogies, such as project-based
learning and problem-based learning [1,10]. Theory-based pedagogy often relies
firstly on theory and may potentially exclude real-world examples entirely. An
anecdotal example of this might be a simple math problem for children where
an individual walks to the supermarket to purchase some 200 watermelons.
Theory-only education will never suffice in preparing students for work in
the real world, especially considering the fact that theory-based education often
acknowledges little to no practical limitations. For database systems, this is
wholly unhelpful, as all database systems have physical constraints, policy con-
straints, and real-world data requirements. This paper does not seek to argue the
merit of theory, which is most certainly a necessity in the educational process.
Rather, this paper seeks to encourage real-world practical examples in education
first, which come to eventually rest squarely on solid, theoretical foundations.

Fig. 1. Conceptual map describing different pedagogical approaches and some related
examples.

4 Methodology

Given the pressure by the academic community to move towards a practice-


based approach, and more specifically in computer science education, it would
stand to reason that other challenging computing topics would also benefit from
experiential education. In particular, this paper analyzes the pedagogy behind
database schema design and development using normal forms. While research
conducted by authors in [7] has been undertaken to assist in making normal form
theory easier to understand, this paper hypothesizes that - given an alternative
approach to standard theory-based education - students will be even more read-
ily able to understand why normal forms and their implications and impacts
are essential, despite the proven complexity in making such database schema
determinations outright, as expressed in [8].
This section covers the steps taken to test the aforementioned hypothesis.
These steps included preparing the experiment, creating a survey, conducting
the experiment, and gathering the results.

4.1 Preparing the Experiment

The preparation of this experiment consisted of a strong consideration of the


audience, the development of optimal lecture methodologies, and focus on a
relatively challenging topic in the database sciences that was primarily unknown
to the audience. The experiment was conducted in the undergraduate Database
Management Systems class at Southern Adventist University in Collegedale,
Tennessee, USA.

1. Audience: In this experiment, the audience was composed entirely of univer-
sity students enrolled in an introductory database course. These students had
general knowledge about database design and modelling (concepts including,
but not limited to: tables, tuples, attributes, primary keys, candidate keys,
etc.), but they had not yet been introduced to database normal forms nor
concepts related to deduplication, consistency, or isolation.
2. Determining Topic: The research topic focused on educating students in the
normalization of data in a relational database system, with specific emphasis
on 3NF.
3. Lecture Methodology: The lecture methodology utilized in this research
reflects a comparison of theory-based learning and practice-based learning.
In the theory-based pedagogy conducted, theoretical concepts and terminol-
ogy were introduced first with generic (and potentially unrelated) examples
provided to explain differences between the different normal forms, and how to
identify them. Little to no emphasis was placed on walking students through
a real-world example. Rather, this pedagogical approach focused on key terms
and rote memorization over application. The practice-based learning pre-
sented in this paper, on the other hand, began with a plausible real-world
example. Students were encouraged to build new functionality into a system
given a series of requirements, and eventually the students normalized the
data in order to prevent issues that arose. Theory is not completely avoided in
this approach as terminology must still be introduced. However, the emphasis
focused mostly on developing a solution to the problem.

4.2 Creating a Survey

One of the important things taken into consideration during the creation of
the post-lecture survey was the intentional exclusion of leading questions (i.e.,
questions that elicit a specific answer). The survey initially inquires about how
comfortable the participant was with 3NF before the experiment, followed by
their level of comfort post-experiment. Also asked were questions relating to the
student’s comfort with the instructional methodology. All questions utilized fall
on an integral scale from 1 to 5.

4.3 Conducting the Experiment

A group of 16 students between the ages of 18 and 23 were broken into two
smaller groups, which this study labels Section A and Section B, respectively.
The group was split into two because there were two different teaching methodologies. Section A comprised 5 students, and Section B comprised
11 students; for a detailed breakdown, see Fig. 2. The groups varied in size due
to students arriving late to the class period. Each group was given a lecture,
approximately 20 min long, regarding the concept of Database Normal Forms.
After the lectures were complete, the students were asked to complete a survey
regarding their level of comfort with normal forms both before and after the
lecture, their perceptions on the instructional methodology provided, and their
opinions on when theory should be introduced in the classroom.
Section A was provided with a Theory-first approach to normal forms. Def-
initions on relational database terminology were provided upfront, followed by

Fig. 2. Participant data shows varied ages, majors, and grade levels among survey
groups (panels: ages, academic majors, and grade levels for Sections A and B).

specific, pointed examples of designs that violated the normal form being pre-
sented. Finally, the students thoroughly observed an example of a design which
violated Third Normal Form and were shown the proper way to correct the
violation.
Section B was provided with a practice-first approach to normal forms. The
lecture began with a real-world example where the students brainstormed how to
extend a database schema given a series of practical requirements. Following this,
the students were provided examples of database designs that violated normal

forms. Throughout these steps, they were asked targeted questions related to
normal forms and pressed to update the real-world example to be 3NF-compliant.

4.4 Gathering the Results


The results-gathering took place via an online survey, conducted on the Google
Forms platform. Students were asked various questions about their level of com-
fort with normal forms before and after the lecture as well as their beliefs on
when theory should be introduced in the classroom. Other specific information
gathered includes name, age, grade level, and academic major.

5 Results
A series of questions were asked in a post-lecture survey, which were used to help
determine whether or not students were more or less comfortable with database
normal forms after learning from a particular instructional methodology.

5.1 Level of Comfort Before the Lecture


Students were asked, on a scale from 1–5, how comfortable they were with the
concept of database normal forms before the lecture. A student selecting 1 would
indicate low comfort, and a 5 would indicate high comfort. Students in Section
A were, on average, indifferent to the concept of database normal forms prior
to the lecture. The calculated average was 3.2/5, which rounded to the nearest
integer would be: 3 - Indifferent. Students in Section B were, on average,
indifferent to the concept of database normal forms prior to the lecture. The
calculated average was 2.7/5, which rounded to the nearest integer would be: 3
- Indifferent.
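A minimal sketch of the arithmetic behind these reported figures may be useful: responses on the 1–5 scale are averaged, and the mean is rounded half up to the nearest integer label. The label names other than those quoted in the text (such as "Indifferent" and "Comfortable") are placeholders, and the example responses below are illustrative, not the actual survey data.

```python
import math

# Hypothetical label names for the 1-5 comfort scale; only "Indifferent" and
# "Comfortable" are quoted in the text, the rest are placeholders.
LABELS = {1: "Very uncomfortable", 2: "Uncomfortable", 3: "Indifferent",
          4: "Comfortable", 5: "Very comfortable"}

def summarize(responses):
    """Return (mean, rounded score, label) for a list of 1-5 responses."""
    mean = sum(responses) / len(responses)
    nearest = min(5, max(1, math.floor(mean + 0.5)))  # round half up, clamp to 1-5
    return round(mean, 2), nearest, LABELS[nearest]

# Illustrative data only: a mean of 3.2 rounds to 3 - Indifferent, as reported
# for Section A before the lecture.
print(summarize([3, 3, 4, 3, 3]))   # (3.2, 3, 'Indifferent')
```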

5.2 Level of Personal Comfort After the Lecture


Students were asked, on a scale from 1–5, how comfortable they were with the
concept of database normal forms after the lecture, specifically how comfortable
they would feel using 3NF for their own personal schema designs. A student
selecting a 1 would indicate low comfort, and a 5 would indicate high comfort.
Students in Section A were, on average, indifferent to the concept of database
normal forms after the lecture. The calculated average was 3.4/5, which rounded
to the nearest integer would be: 3 - Indifferent. While the overall result can
be reduced to a 3, it is important to note that the students felt, overall, more
comfortable than they did before the lecture. Students in Section B were, on
average, indifferent to the concept of database normal forms after the lecture.
The calculated average was 3.1/5, which rounded to the nearest integer would be:
3 - Indifferent. While the overall result can be reduced to a 3, it is important
to note that the students felt, overall, more comfortable than they did before
the lecture.

5.3 Level of Comfort Explaining 3NF to a Peer After the Lecture

Students were asked, on a scale from 1–5, how comfortable they would be explain-
ing 3NF to a peer after the lecture. A student selecting a 1 would indicate low
comfort, and a 5 would indicate high comfort. Students in Section A were, on
average, comfortable with the concept of explaining 3NF to a peer after the
lecture. The calculated average was 3.8/5, which rounded to the nearest integer
would be: 4 - Comfortable. Students in Section B were, on average, indifferent
to the concept of explaining 3NF to a peer after the lecture. The calculated aver-
age was 3.4/5, which rounded to the nearest integer would be 3 - Indifferent.

5.4 Normal Form Complexity

Students were asked, on a scale from 1–5, what their impressions were on the
complexity of database normal forms. Specifically, a statement posits: “I find
third normal form easy to understand.” A student selecting 1 would indicate
strong disagreement, and a 5 would indicate strong agreement. Students in
Section A were, on average, in agreement with this statement. The calculated
average was 3.6/5, which rounded to the nearest integer would be: 4 - Agree.
Students in Section B were, on average, in agreement with this statement. The
calculated average was 3.6/5, which rounded to the nearest integer would be: 4
- Agree.

5.5 Test Readiness

Students were asked, on a scale from 1–5, what their impressions were on their
own personal readiness for a test on 3NF. Specifically, a statement posits: “If a
test were given today on Third Normal Form, I would ace it.” A student selecting
a 1 would strongly disagree with this statement, and a 5 would indicate strong
agreement. Students in Section A were, on average, indifferent to this statement.
The calculated average was 3/5, which would be: 3 - Indifferent. Students in
Section B were, on average, indifferent to this statement. The calculated average
was 2.5/5, which rounded to the nearest integer would be: 3 - Indifferent.

5.6 Demonstration

Students were asked, on a scale from 1–5, how intuitive they found the lecture.
Specifically, a statement posits: “I found the demonstration intuitive.” A stu-
dent selecting a 1 would indicate strong disagreement with this statement, and
a 5 would indicate strong agreement. Students in Section A were, on average, in
agreement with this statement. The calculated average was 4/5, which would be
4 - Agree. Students in Section B were, on average, in agreement with this state-
ment. The calculated average was 4.2/5, which rounded to the nearest integer
would be 4 - Agree.

5.7 Educational Approach

Students were asked, on a scale from 1–5, how much they enjoyed the educational
approach. Specifically, a statement posits: “I enjoyed the educational approach
used in the demonstration.” A student selecting a 1 would indicate strong dis-
agreement, and a 5 would indicate strong agreement. Students in Section A were,
on average, indifferent to this statement. The calculated average was 3.4/5, which
rounded to the nearest integer would be 3 - Indifferent. Students in Section B
were, on average, in agreement with this statement. The calculated average was
4.2/5, which rounded to the nearest integer would be 4 - Agree.

5.8 Predisposition to Theory

Students were asked, on a scale from 1–5, how important they believed it was to
introduce theory first in education. Specifically, a
statement posits: “It is important to learn theory before engaging in practice.”
A student selecting a 1 would indicate strong disagreement with this statement,
a 5 would indicate strong agreement. Students in Section A were, on average, in
agreement with this statement. The calculated average was 4/5, which would be
4 - Agree. Students in Section B were, on average, in agreement with this state-
ment. The calculated average was 4.1/5, which rounded to the nearest integer
would be 4 - Agree.

5.9 Discussion

The authors of this paper recognize that the sample size is small and acknowledge
the limitations of these results. Classroom size limitations, student availability,
and time constraints resulted in a smaller-than-desirable set of students. How-
ever, the methodology and results presented herein can be used to fuel another,
larger study.
Based on the data collected, while it is currently indeterminate as to whether
or not students were more or less comfortable with the concept of normal forms
after the experiment, what is certainly clear is that students across both groups
have a strong predisposition to theory-first education. Specifically, across both
groups, the average was four or above (general agreement) when asked how much
they believed that it was important to introduce theory first
in education. This has lasting implications for educators, as students currently
believe that theory is important to introduce first, whether or not it actually
behooves them.
That being said, it should be noted that while students had a strong predispo-
sition to theory-first education, those in the practice-first lecture were reportedly
more comfortable on average with the educational approach afforded them, as
opposed to those in the theory-first lecture, who were on average indifferent.

6 Conclusion and Future Work


This paper explored two pedagogical methods for the instruction of normal forms
in database management systems. The first approach was theory-based and the
second was practice-based. For testing these methods, we separated a university
class's students into two groups and presented normal forms with the theory-based
approach to the first group and the practice-based approach to the second group.
While the data collected in this study has definite limitations, namely a small
sample size and a focus limited to database normalization, what is clear is the
tendency students have towards theory in education. Assisted by the preliminary
results suggesting that students may well prefer a practice-based approach, we
propose that another, larger study be conducted to test this hypothesis with
the hope that practice-based learning be more heavily tested (and eventually
adopted) in the classroom.

References
1. Alférez, G.H.: Ideas para docentes-investigadores adventistas. Publicaciones Uni-
versidad de Montemorelos (2020)
2. Jakovljevic, M., Ankiewicz, P.: Project-based pedagogy for the facilitation of web-
page design. Int. J. Technol. Des. Educ. 26(2), 225–242 (2016)
3. Jao, L., Wiseman, D., Kobiela, M., Gonsalves, A., Savard, A.: Practice-based ped-
agogy in mathematics and science teaching methods: challenges and adaptations
in context. Can. J. Sci. Math. Technol. Educ. 18(2), 177–186 (2018)
4. Slobodová Nováková, K., Giertlová, Z.: New models of theoretical and practical
education in urban environment (on example of experience-based pedagogy in Slovak
towns). Procedia Soc. Behav. Sci. 228, 305–310 (2016)
5. Pella, S.: Pedagogical reasoning and action: affordances of practice-based teacher
professional development. Teach. Educ. Quart. 42(3), 81–101 (2015)
6. Perrenet, J.C., Bouhuijs, P.A.J., Smits, J.G.M.M.: The suitability of problem-based
learning for engineering education: theory and practice. Teach. High. Educ. 5(3),
345–358 (2000)
7. Salzberg, B.: Third normal form made easy. SIGMOD Rec. 15(4), 2–18 (1986)
8. Sug, H.: A method for normalization of relation schema based on data to abide by
the third normal form. WSEAS Trans. Math. 19, 216–225 (2020)
9. Tzafilkou, K., Protogeros, N., Chouliara, A.: Experiential learning in web develop-
ment courses: examining students’ performance, perception and acceptance. Educ.
Inf. Technol. 25(6), 5687–5701 (2020)
10. Zabala, A., Arnau, L.: Métodos de enseñanza de las competencias. Graó (2014)
Analysis of Load Balancing Algorithms Used
in the Cloud Computing Environment:
Advantages and Limitations

Zakariyae Bouflous1(B) , Mohammed Ouzzif2 , and Khalid Bouragba2


1 ENSEM, Hassan 2 University of Casablanca (UH2C), Casablanca, Morocco
[email protected]
2 EST, Hassan 2 University of Casablanca (UH2C), Casablanca, Morocco

Abstract. Cloud computing, as an advanced technology in the IT infrastructure, is now
a major research concern. The challenge is no longer simply the successful on-demand
delivery of computing resources: throughput, performance, server response time, and
cost have become the metrics that underpin the quality-of-service agreement. Technically,
a cloud service provider guarantees to deliver computing resources (storage, servers and
applications) through a back-end data center, which consists of several geographically
distributed hosts that answer client requests. To ensure the service level agreement between
clients and providers, cloud infrastructure software needs to schedule and optimally manage
the workload of many demands. Here, load balancing enters as a major key, with a set of
algorithms to allocate and schedule computational resources as effectively and fairly as
possible in order to serve the large number of calling jobs. This review presents a
comparative and comprehensive study that covers the principal concepts of cloud computing
and the well-known algorithms used for load balancing, which are classified into static and
dynamic sets. The objectives of this survey are to (1) present, explain, compare and analyze
developed methods for load balancing by systematically reviewing papers from the years
2018 to 2021, (2) analyze the level of maturity of the solutions proposed in the literature,
and (3) present an insight into the current solutions which may help with future improvements.

Keywords: Cloud computing model · Internet-based SW · Load balancing ·
Virtualization · Host/VM migration · QoS measurements

1 Introduction

Cloud refers to connected IT resources, and computing refers to the work, treatment
or processing performed on those resources remotely on a pay-as-you-go basis. It is an
internet-based technology which provides various cloud-based services characterized by their
efficiency, reliability and low-cost accessibility anytime and from everywhere (Karan
D. Patel 2019, Tosal M. Bhalodia 2019) [1]. Cloud computing had no consensus definition
until 2008, and researchers continue to work towards a common definition.


NIST (National Institute of Standards and Technology) [5] defined cloud computing as
a model for enabling global, convenient and on-demand network access to a shared pool
of configurable computing resources that can be rapidly provisioned and released with
minimal management effort or service provider interaction (Ahmad Salah al-Ahmad,
Hasan Kahtan 2018) [2]. The definition explicitly describes the five major features that
constitute the essence of the cloud system's delivery of accessible computing resources:
on-demand self-service, resource pooling, broad network access, rapid elasticity, and
measured service. On-demand self-service focuses on the ability of the user to request and
configure his computing utility demand independently of a third party. Resource pooling
means that the CSP's computing resources and VMs are pooled in a distributed manner to
serve client needs without geographical constraints, especially while managing sessions
and user traffic based on the SIP protocol, in a cloudlet or at the WAN networking level.
Broad network access means that all resources, from end-point servers to software
applications running on the hosts, are available to users over the internet through a
client-server network architecture. Rapid elasticity refers to the load, traffic and level
of demand on the computing resources, which can be allocated to the requesting jobs or
released elastically based on the cloud state and the amount of client requests. At the QoS
level, those capabilities should appear to be unlimited and accessible for any quantity of
user requests at any time, and the CSP should consider the fault tolerance of the cloud
system a principal challenge, keeping all requests running using load balancing techniques
and VM migration solutions. Tolerance to faults ensures that all services continue to be
delivered even if there is a SW/HW issue with some cloud servers. Finally, the measured
service feature means that end users do not have the responsibility to control, configure
or optimally use the computing resources; the CSP automatically handles all capabilities
(servers, storage, runtime environments and applications). Both cloud consumers and
providers have the option to monitor, control and report the amount and type of the
utilized services. Because of those features, organizations and industries are migrating to
a new architecture of cloud services to enhance their business models and to encourage
remote work for their employees. Businesses also use Software as a Service (SaaS) to
access web-based applications, Infrastructure as a Service (IaaS) to sublet to other smaller
companies, or Platform as a Service (PaaS) to build their own applications (N. Manikandan,
A. Pavin 2019) [3]. At the deployment level, cloud infrastructure can be categorized into
three types:

a) Public Cloud: the service is offered publicly in a standard model, and most services
are free for the user; this type allows users to access the cloud publicly via interfaces
using web browsers on a pay-per-use basis (computing utility). However, public clouds
offer less security in comparison to the other categories.
b) Private Cloud: derived from the intranet model, it constitutes a service offered to a
selected category of users instead of the general public. It provides high-level security
because all cloud networking traffic is processed within the organization's internal DCC,
meaning that all services and resources are made available to users only at the
organizational level.
c) Hybrid Cloud: combines the advantages of the public and private cloud models. It serves
the organization's security needs and provides access to public capabilities whenever
needed by the private cloud users. Here, the private pool of servers is linked to one or
more public cloud nodes to enable accessibility. This category provides more flexibility
to enhance IT infrastructure and networking and opens the possibility for more options
(S. Sahu and M. Pandey 2019) [4]. Furthermore, the NIST mentions the existence of another
type of cloud system, the Community cloud, which is provisioned for use by a specific
community of CSCs (Cloud Service Consumers) from businesses and organizations that have
shared concerns and strategies (e.g., security requirements, policy, data privacy
monitoring systems …). One or more of the organizations within the community could own,
operate, configure and maintain the overall architecture, on or off premises.

The cloud computing model as described by the NIST is shown in Fig. 1 below:

Fig. 1. The NIST cloud computing model

At the reference architectural level, the NIST defines five major actors in the cloud
infrastructure: cloud consumer, cloud provider, cloud carrier, cloud auditor and cloud
broker. Each actor is an entity (a person or an organization) that participates in a
transaction or process and/or performs tasks in the cloud infrastructure [5]. 1) Cloud
Consumer (CSC): a person or organization that maintains a business relationship with, and
uses services from, Cloud Providers (CSPs). 2) Cloud Provider (CSP): a person, entity or
organization responsible for making a service available to interested parties. 3) Cloud
Auditor: a party that can conduct independent assessment of cloud services, information
system operations, performance and security of the cloud implementation. 4) Cloud Broker:
an entity that manages the use, performance and delivery of cloud services, and negotiates
relationships between Cloud Providers and Cloud Consumers. 5) Cloud Carrier: any
intermediary responsible for providing connectivity and transport of cloud computing
services from CSP to CSC. Figure 2 summarizes the interactions between these actors in
cloud computing:

Fig. 2. The NIST cloud computing actors

There are two service request scenarios:

A) When the CSC requests a service from the CSB instead of contacting the CSP directly.
In this case, the CSB may create a new service by combining services from multiple CSPs.
According to (R. Hentschel and S. Strahringer 2020) [6], finding appropriate cloud
services that best fit CSC requirements can be a complex and time- and cost-intensive
process, especially for small and medium organizations, and since there is no "one fits
all" CSP, companies face the challenge of selecting and combining services from
different vendors to meet their requirements.
B) When the cloud consumer requests are linked directly to a cloud provider according
to a certain SLAco (Service level agreement with consumer). In this case, the cloud
provider is linked itself to a cloud carrier according to another SLAca (Service level
agreement with carrier), that enables capability for the cloud provider to request ded-
icated and encrypted connections to ensure that the cloud services are consumed at
the consistent level mentioned on the contractual obligations with cloud consumers.
In this case, the provider may specify its requirements on capability, flexibility and
functionality in SLAca in order to provide essential requirements in SLAco.

The CC infrastructure based on those actors guarantees the delivery of computing resources
to end users through back-end data centers, which host large-scale computing resources.
Due to the increase in demand for cloud capabilities, QoS and efficient use of data
center resources have become a major concern for performance validation of the cloud
infrastructure (A. Jyoti, M. Shrimali, S. Tiwari, H. Pratap Singh 2020) [7]. Here, we
need to schedule and optimally manage the workload on the data center resources. Load
balancing can improve the Quality of Service (QoS) metrics, including response time,
cost, throughput, performance and resource utilization in cloud environments. It is a key
aspect of cloud computing as it makes it possible to avoid the situation in which some
nodes become overloaded while others are underloaded or even idle. R. Ben Hamouda,
S. Boussema, I. Ben Hafaiedh and R. Robbana 2019 [8] mentioned that it becomes imperative
to develop an algorithm which can improve system performance by balancing the workload
among different nodes. When Model Driven Engineering is combined with Multi-Agent Systems
approaches, violations of the SLA can be detected in real time to keep traceability of
the overall networking system. Indeed, the Service Level Agreement and user satisfaction
can be guaranteed by choosing excellent load balancing techniques. Hence, another entity
named 'Load balancer' should be added to the reference model of cloud computing, and the
whole picture of cloud computing is presented in Fig. 3 below:

Fig. 3. Overview of cloud computing

Considerable research has been done on LB and task scheduling for cloud environments,
and different load balancing strategies have been proposed; these will be discussed in
Sect. 2. The remaining sections are structured as follows: Sect. 3 introduces the
preliminary of the review and Sect. 4 concludes the paper.

2 Related Works

Cloud computing in the stack of web architecture and services remains one of the most
fascinating fields in the IT industry. With virtualization of hosts and the availability
of large sets of servers, it can be considered an emerging technique for providing network
computing services in the manner of water and electricity utilities. Cloud users' demands
for computing resources are increasing day after day because of the huge number of digital
gadgets sending user requests daily. The purpose of CC is to make computing services
available at any time for all requests while taking into account QoS measurements:
security, cost (pay as you go), throughput and server response time. Geeta and Shiva
Prakash 2018 [9] remind us that cloud computing is facing several challenges, the main
ones being:

• Security and Privacy: One of the biggest issues in distributed cloud infrastructure. It
depends on the nature of jobs, network data and application movements. Various security
policies have been set by CSPs to minimize the likelihood of losing control over data.
• Performance: Performance is also a big concern in CC, as it mirrors the capability of
the overall cloud infrastructure servers. CSPs could face overloading or underloading
situations due to a lack of system HW configuration assets such as memory, low CPU speed
and limited bandwidth.
• Efficient Load Balancing: Aims to distribute the workload as fairly as possible across
all the nodes in the cloud environment in order to reach client satisfaction and manage
the availability of resources, ensuring that no node sits underloaded while others are
overloaded, hence refining the throughput of the whole cloud environment.
• Resource Management and Scheduling at all the embedded stack levels of cloud
architecture: software, hardware, networking protocols, virtualization level and load
balancing techniques. It also includes the supervision of memory, threads, CPU cores,
disk space, VM images, I/O devices, etc.
• Constant Need for Fast Internet Speed: The full exploitation of cloud computing services
cannot be guaranteed without high-speed communication channels. Much research in the
field of networking exists to support CSPs on this issue.
• The Energy Consumption behind the Data Center: based on figures communicated by Amazon,
servers account for 53% of the total data center cost over a 3-year amortization period,
while power and cooling account for 42% of the total, combining infrastructure
requirements (23%, amortized over 15 years) and direct power consumption (19%).
• Scale and Quality of Service Management: It is a primordial issue for the CSP to keep
the trust of, and guarantee the SLA contract made with, cloud consumers.

This review concentrates on load balancing as a key challenge of CC. The problem
currently facing CSPs is that the number of servers cannot in any way keep up with the
huge number of calling requests. To rise above this problem, considerable research has
been done on load balancing and task scheduling for cloud environments to meet those
requirements.
M. Ala'anzi and M. Othman 2019 [10] presented a meta-study of the literature on load
balancing and server consolidation as a reference taxonomy of the most efficient
algorithms that achieve load balancing and server consolidation. They mentioned a new
classification for load balancing and server consolidation, covering aspects such as
hardware threshold, migration overhead, network traffic, and reliability. They then
described how merging load balancing techniques and server consolidation can optimize
resource utilization and enhance QoS parameters. Through their study, they presented a
clear overview of the load balancing process, discussing PM and VM migration and how the
whole process is managed by the VMM. In another section, their review describes the
methods for server consolidation and the parameters to take into account to effectively
achieve the requested performance when combined with load balancing techniques.
M. Asim Shahid, N. Islam, M. Alam, M.S. Mazliham and S. Musa 2020 [11] presented a
comprehensive study of load balancing in the cloud computing environment and identified
the need for a novel LB algorithm that employs a fault tolerance (FT) technique. Their
analysis led to the conclusion that existing traditional LB algorithms that do not take
this FT approach into account are not good enough to effectively spread the workload
across the cloud system's nodes. In their review, they discussed the current
state-of-the-art challenges in cloud computing, and then focused on the load balancing
issues related to cloud infrastructure, presenting the various LB techniques currently
available in the literature and their applied performance parameters. The research gap
currently existing in the literature on LB techniques was introduced, in addition to the
possibility of finding a new LB algorithm that can address the gaps identified. Their
survey focused essentially on the FT metric to consider for optimizing cloud environment
performance. According to their study, an LB algorithm should have this fault tolerance
capability, which significantly reduces the job make-span, produces efficient networking
exchanges and achieves high system efficiency during resource/server losses.
A. Jyoti, M. Shrimali, S. Tiwari and H. Pratap Singh [7] gathered in their review the most
useful algorithms used for cloud computing, classified into static and dynamic strategies
depending on the type of requesting application, the capability of the computing hosts
and the behavior of the cloud system. Their review was based on recent literature and
surveys on cloud computing. They focused on a comparative approach to the working
solutions for optimizing work handling most efficiently, describing some strategies
(Round Robin, weighted round robin, least connection, weighted least connection and
Random) on which cloud load handling is based in order to classify the proposed
algorithms into static and dynamic ones. They then explain, in a summarizing way, the
most useful algorithms for LB and service brokering using a taxonomy section. Another
section was introduced to discuss an overview of the CC techniques in a systematic
planning study, presenting a comparative analysis of LB and the simulation tools used
for the LB process, before discussing in the last chapters the cloud storage security
related to LB and service brokering.
A.A.A. AlKhatib, T. Sawalha and S. AlZu'bi 2020 [12] mentioned that the limitations of
network resources and the requirement for efficient server responses confirm the need
for a load balancing technique that can help distribute traffic across several resources
to improve the overall efficiency and reliability of the cloud architecture. In their
review, a complete and comparative understanding of the existing literature on several
load balancing algorithms was provided, complementary to the major LB concepts. The
review then summarizes the advantages and limitations of the major LB algorithms used
nowadays for handling the workload on a cloud system, viz. the Round robin algorithm,
least connection, Throttled load balancer, Genetic algorithm, Ant colony optimization,
Honey bee algorithm, Active monitoring load balancer, FCFS, the Generalized Priority
algorithm and the NBST algorithm. The review concludes with a classification of all the
previous techniques into static and dynamic classes, discussing the overhead metric,
complexity, advantages and limitations.
R. Ramya, S. Puspalatha, T. Hemalatha and M. Bhuvana 2018 [13] presented in their review
a performance analysis of LB algorithms using a meta-heuristics approach on the cloud
provider's side. Starting from the historical background of cloud theory and its
challenges, the SLA is the commitment tool that aims to handle, as fairly as possible,
the expectations between the cloud service provider and consumers. The challenge lies in
the difficulty for cloud providers of handling all requests at once during peak hours
while keeping up the contractual SLA measurements. When uneven requests occur, the cloud
resources may be either underutilized or overutilized. To manage this load, the LB
mechanism plays a primordial role in cloud computing. For that, the study presents a
review of both the existing static and dynamic load balancing algorithms proposed so far
and designs the implementation of a load balancer that uses a meta-heuristics approach
and the Ant Colony Optimization technique to meet the SLA criterion. In their
implementation, they consider the Amazon AWS EC2 cloud PaaS method.

3 Preliminary of the Review


This section introduces the scope of the papers related to our work. Its purpose is to
1) present an overview of the load balancing algorithms currently used for handling the
cloud workload, together with the scope of their application, and 2) outline the
strengths, weaknesses and challenges of each algorithm so that future research can focus
on the remaining issues for the continuous refinement of the cloud environment
architecture. The most frequently used LB algorithms for handling the cloud workload and
their essential strategies are presented in Fig. 4 below:

Fig. 4. Classification of LB algorithms through strategies

3.1 Static Load Balancing Algorithms

G. Srinivasa Rao, P. Charan Arur and T. Anuradha 2020 [17], in their paper analyzing
real-time cloud-based LB algorithms, confirm that static techniques require complete
data on servers and user needs in advance. These techniques have limitations regarding
response time, since they use more computational resources; they are less scalable
(meaning the virtual machines cannot respond effectively to demands as user request
traffic grows during peak hours); and they consume more energy, with lower make-span and
throughput. This type of algorithm is best suited for homogeneous environments. To sum
up, static load balancing algorithms are not well suited to the cloud environment, since
the cloud itself is dynamic and heterogeneous.

Round Robin
Round robin is a famous load balancing algorithm based on a list of identified servers;
it divides the assigned work across the same set of cloudlets and forwards each request
to the corresponding server in the ordered list. Each request is divided into a set of
quantum times (QT) among all processors. Once the last server is reached, the loop moves
back to the first server and the process restarts. The Round Robin algorithm has proved
its performance on many CPU scheduling problems and has achieved great efficiency through
fair allocation principles based on temporal multiplexing frames for every processing
request (Fig. 5). RR is characterized by preemption, meaning that running jobs are
preempted from the CPU into a queue, which avoids the starvation caused by long wait
times for processes to be allotted the CPU. If we consider N tasks, each one will be
allotted 1/N of the VM's global processing time, the waiting time will be no more than
(N − 1) * QT, and the scheduling overhead will be of the order of O(1). The main
advantage of this algorithm is its performance when distributing the load equally across
all servers, all the more so when identical servers are configured to provide exactly the
same services. However, cloud environment systems then face the challenge of longer
client waiting times in the queue; in addition, the fixed time-slicing method cannot
assure the best system performance because of the differing nature of the calling jobs.
The waiting time and the rate of context switching need to be handled cautiously. For
that, a new enhanced round robin (ERR) method was introduced by (Sanaj MS and Joe
Prathap PM 2020) [14].

Fig. 5. Round robin algorithm



Here, all tasks Ti, where i ∈ {1, 2, 3, 4, 5}, are assumed runnable for a quantum time
QT equal to the time slot defined in the Round Robin scheduling software by the cloud
administrator.
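As an illustration of the rotation described above, the following minimal Python sketch dispatches requests to servers in a fixed circular order; the server names and request count are placeholders, and per-task quantum handling is omitted.

```python
from itertools import cycle

# Fixed circular rotation over the server list: each scheduling decision is O(1)
# and independent of the servers' current load. Names are illustrative only.
servers = ["server-1", "server-2", "server-3"]
rotation = cycle(servers)            # wraps back to server-1 after server-3

assignments = {s: [] for s in servers}
for request_id in range(7):          # seven incoming requests
    assignments[next(rotation)].append(request_id)

print(assignments)
# {'server-1': [0, 3, 6], 'server-2': [1, 4], 'server-3': [2, 5]}
```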

Weighted Round Robin


This algorithm distributes the load among all server nodes with controlled sharing of
network bandwidth. It is based on the round robin approach, except that a weight is
assigned by the administrator on the CSP side to each server to reflect that application
server's traffic handling capability [7], since not all cloud servers have the same
hardware configuration. For example, if we assign weight 3 to the first server, weight 2
to server 2, and weight 1 to server 3 (the default), the traffic will be distributed as
below (Fig. 6):

Fig. 6. Weighted round robin algorithm

The main advantage of this algorithm is that it is realistic, given that hardware servers
can have different capacities, such as memory, CPU frequency and number of cores. Its
main challenge is starvation, the situation in which low-weighted servers get postponed
by the LB algorithm due to the high priority given to the other servers. Another issue is
the lack of standardized metrics to which the cloud administrator can refer when
assigning the appropriate weight to server nodes.
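The weighting idea can be sketched as follows; this naive expansion of the rotation in proportion to each weight is only one possible implementation, and the weights match the 3/2/1 example above.

```python
from itertools import cycle

# Each server appears in the rotation in proportion to its assigned weight, so
# server-1 (weight 3) receives three of every six requests. Illustrative only.
weights = {"server-1": 3, "server-2": 2, "server-3": 1}
rotation = cycle([name for name, w in weights.items() for _ in range(w)])

schedule = [next(rotation) for _ in range(12)]
print(schedule)
```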

Shortest Job Scheduling Algorithm


This scheduling algorithm uses a priority parameter and is non-preemptive, which means
that the next calling job in the queue cannot obtain the processor until the operating
cycle of the running task is complete. Shortest job first is a complex LB algorithm that
introduces a notion of preference into the scheduling process, determined by job phase
size [11]. The shortest executable job has the priority to be selected first. The
approach fully executes short jobs first so that resources can then be used to complete
heavy jobs. What makes this algorithm powerful is that the waiting time for processes is
significantly low.
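A minimal sketch of non-preemptive shortest-job-first dispatching on a single resource is shown below; job names and lengths are illustrative, and the computed average waiting time simply demonstrates why this policy keeps waits low.

```python
import heapq

# Pending jobs are kept in a priority queue ordered by estimated length; the
# shortest job is always dispatched next (non-preemptive). Illustrative data.
jobs = [("job-a", 12), ("job-b", 3), ("job-c", 7), ("job-d", 1)]
queue = [(length, name) for name, length in jobs]
heapq.heapify(queue)

order, clock, total_wait = [], 0, 0
while queue:
    length, name = heapq.heappop(queue)   # shortest remaining job first
    total_wait += clock                   # this job waited `clock` time units
    clock += length
    order.append(name)

print(order, "average wait:", total_wait / len(order))
# ['job-d', 'job-b', 'job-c', 'job-a'] average wait: 4.0
```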

Min-Min Algorithm
This algorithm is based on first determining the expected execution time of all the
waiting activities, and it proceeds in this way until the entire workload is complete. It
has shown improved productivity, response time and resource utilization, but is
challenged by high communication overhead. In the first Min-Min steps, the optimal
activities, which result in improved scheduling and an overall improvement of the global
make-span, are scheduled first; that is, the algorithm assigns each task to the
best-matching server depending on response time and hardware configuration. The smallest
runnable tasks are therefore allocated first, while the bigger tasks persist in the
holding stage, contributing to weak machine use. The algorithm uses a quick and easy
approach that improves the overall make-span; however, it suffers from starvation [11].

Max-Min Algorithm
Max–Min follows the Min-Min heuristic algorithm. In a cloud environment, this procedure
selects the task with the largest size and chooses the cloud resource (VM) with the
minimum processing capacity. After allocating the task to a VM, the algorithm removes the
task from the queue and proceeds to distribute the remaining unallocated tasks. The
Max–Min algorithm is suitable only for small-scale distributed clustered systems; it
keeps a task status table in memory for real-time VM load measurements, in addition to
the expected completion time of executing tasks. An Elastic Cloud Max–Min (ECMM)
algorithm was also proposed in the literature and proved better than the RR technique in
improving tasks' average pending time when tasks arrive in batch mode [16]. This
algorithm performs better than the Min-Min algorithm when there is a higher number of
small tasks compared to long ones; nevertheless, it also leads to starvation.
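A compact sketch of the two heuristics, in their standard textbook formulation over an expected-completion-time matrix, is given below. Task sizes and VM speeds are illustrative; the only difference between the two policies is whether the task with the smallest or the largest best-case completion time is scheduled next.

```python
# Min-Min and Max-Min over an expected-completion-time model (standard textbook
# formulation). Each iteration assigns one task to the VM giving its earliest
# completion and updates that VM's ready time. All numbers are illustrative.
def schedule(tasks, vm_speeds, pick_max=False):
    ready = [0.0] * len(vm_speeds)                 # current finish time per VM
    pending = dict(tasks)
    plan = []
    while pending:
        # best (earliest) completion of each pending task over all VMs
        best = {t: min((ready[v] + size / vm_speeds[v], v)
                       for v in range(len(vm_speeds)))
                for t, size in pending.items()}
        chooser = max if pick_max else min          # Max-Min vs Min-Min rule
        task = chooser(best, key=lambda t: best[t][0])
        finish, vm = best[task]
        ready[vm] = finish
        plan.append((task, vm))
        del pending[task]
    return plan, max(ready)                         # assignment and make-span

tasks = [("t1", 4), ("t2", 10), ("t3", 2), ("t4", 8)]
print(schedule(tasks, vm_speeds=[1.0, 2.0]))                 # Min-Min
print(schedule(tasks, vm_speeds=[1.0, 2.0], pick_max=True))  # Max-Min
```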

Two Phase (OLB + LBMM) Algorithm


This algorithm combines the opportunistic load balancing (OLB) and Min–Min load balancing
(LBMM) approaches to obtain better system execution efficiency. The working principle of
the OLB algorithm is to keep each node in working condition, in line with the aim of
cloud computing. While OLB distributes all requests in broadcast mode, LBMM reduces the
overall pending completion time of node activities. Combined, these algorithms result in
optimal resource utilization and improve the efficiency of work in a network of multiple
processors [7]. The (OLB + LBMM) scheme follows a hierarchical approach with multiple
stages: in stage one, a managing administrator handles the workloads and assigns tasks to
specific nodes; in stage two, a service manager divides the requests into sub-tasks and
relegates them to the operational nodes in question; administrative nodes then perform
the level-three tasks. This algorithm has the advantages of efficient resource
utilization and improved work competency. However, since the completion and run times of
node tasks are not considered in OLB, the overall pending time of all activities is
significantly long.

Random Algorithm
The random algorithm matches clients and servers randomly, based on a random number
generator; the load balancer spreads a large number of requests evenly across the nodes.
Like Round Robin, this algorithm has proven its efficiency for cluster nodes with similar
configurations. S. Kumar Mishra, B. Sahoo and P. Paramita Parida 2020 [16] described in
their paper that, in a cloud computing environment, load refers to the allocation of
different tasks to VMs. The LB research problems can be defined at different levels:
(1) Task allocation: the random sharing of a finite number of tasks among various
Physical Machines (PMs), which are in turn responsible for the creation of different VMs
using a hypervisor. Efficient task allocation in the cloud determines the effectiveness
of the load balancing algorithm, and access control to each service can be provided by
ABE (Attribute-Based Encryption), which is widely used with software applications that
rely on the cloud for data storage [20]. (2) VM/Task Migration Management: in a CC
environment, VM migration describes the transfer of a VM from an overloaded PM to another
one, improving resource utilization. Similarly, the movement of an actual task's state
from one virtual machine to another is called task migration. VM and task migration are a
primordial concept for load balancing in cloud computing. The random algorithm has the
advantage of reaching load balance across all system servers, with better performance
when servers have similar and high workloads of computing resources. These were the most
used strategies in the history of cloud systems. Improvements and enhancements were
introduced as variations on the same principles, giving birth to more advanced LB
algorithms, classified into static and dynamic ones. The static balancing algorithms are
mostly suitable for stable environments with homogeneous systems, while dynamic LBs are
more adaptable to complex cloud architectures, proving effective in both homogeneous and
heterogeneous environments. However, static load balancing processes present less system
overhead compared to dynamic ones.

3.2 Dynamic Load Balancing Algorithms


This type of algorithm depends on the current state of the cloud system to make
scheduling decisions. The key benefit of the dynamic strategy is task and VM migration,
which allows movement from an overloaded physical machine to an underloaded one. Dynamic
load balancing algorithms are beneficial for fault tolerance, low overhead and higher
scalability, which leads to increased CC efficiency. These strategies are versatile, and
performance is improved as a result. While processing, all dynamic LB algorithms monitor
the load of each node in real time on a regular basis. The main objective is to
interchange load amounts and state data between nodes and within nodes (interchanging
VMs) at given times, in order to stay updated about node workloads and to redistribute
the traffic between and within nodes whenever the cloud flow needs refinement [11].

Least Connection
This algorithm schedules network traffic to the server with the fewest client
connections. It is one of the dynamic load balancing scheduling algorithms, as it relies
on counting the number of connections of every server to estimate its load. The load
balancer maintains the connection count of each server in real time, increasing the
counter when a new connection is dispatched to it and decreasing it when the connection
ends. The LC algorithm transmits a web request to the server that has the fewest web
connections (G. Singh and K. Kaur) [15]. The main advantage of this strategy is that it
minimizes the likelihood of server overload by sending requests to the server with the
fewest active connections. Based on the reference review [7], the disadvantage of this
method is that the LB cannot guarantee task execution. Another limitation is that
long-lived connections can stack up on a single server, which can overload that server
when a new request is added, even if it is just one connection. The distribution of the
load is shown in Fig. 7 below:

Fig. 7. Least connection algorithm
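A minimal sketch of the counter bookkeeping described above follows; server names are placeholders and connection lifetimes are not modelled beyond explicit open/close calls.

```python
# Least connection: the balancer tracks active connections per server, routes
# each new request to the server with the fewest, and decrements the counter
# when a connection closes. Server names are illustrative.
active = {"server-1": 0, "server-2": 0, "server-3": 0}

def open_connection():
    target = min(active, key=active.get)   # fewest active connections wins
    active[target] += 1
    return target

def close_connection(server):
    active[server] -= 1

print([open_connection() for _ in range(5)])   # spreads across the three servers
```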

Weighted Least Connection


The WLC scheduling algorithm is a superset of the least-connection algorithm, in which
the cloud administrator on the provider side can assign a capacity weight to each real
server. Nodes with a higher weight value receive a larger percentage of active
connections than low-weighted ones. The default server weight is one, and the IPVS
administrator or a monitoring program can assign any weight to a real server. In the WLC
algorithm, each new network connection is given to the server with the minimum ratio of
current active connections to assigned weight. The weighted least connection scheduling
algorithm does for the least connection algorithm what the weighted round robin algorithm
does for the round robin algorithm: it introduces a "weight" based on the hardware
configuration and specifications of every server [15]. Figure 8 below describes the
behavior of a cloud system using weighted least connection:

Fig. 8. Weighted least connection algorithm

In Fig. 8, server 1 is selected by the load balancer. Here the algorithm chooses the
connection depending on the number of active connections recorded in the LB traceability
table, weighted accordingly. This algorithm has the advantage of preventing a server from
being overloaded, since the number of server connections is checked using the WLC
approach. However, the main issue with this technique is the long processing time [7].
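The selection rule can be sketched as below, in the spirit of the IPVS scheduler mentioned above: the next connection goes to the server with the smallest ratio of active connections to assigned weight. Weights and server names are illustrative.

```python
# Weighted least connection: choose the server minimizing active/weight, so
# higher-weight servers absorb proportionally more connections. Illustrative data.
servers = {"server-1": {"weight": 3, "active": 0},
           "server-2": {"weight": 2, "active": 0},
           "server-3": {"weight": 1, "active": 0}}

def pick_server():
    target = min(servers, key=lambda s: servers[s]["active"] / servers[s]["weight"])
    servers[target]["active"] += 1
    return target

print([pick_server() for _ in range(6)])
# e.g. server-1 ends up with 3 active connections, server-2 with 2, server-3 with 1
```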

Energy Aware Load Balancing Algorithm (EALB)


This process estimates the percentage of use of each computation node for the work
module, which classifies nodes into those processing requests and those switched off.
Three sections are introduced in the work module: the balancing section, the high-level
section and the low-level section. The balancing section takes responsibility for
determining the initialization process that virtual machines are to start. The second
section activates additional compute nodes, and the third, low-level section blocks
inactive computation nodes from participating in the process. Here, processing energy
bandwidth is reduced in comparison to other similar existing algorithms [7]. Q. Yang,
Y. Shao, H. Cui, Y. Fang, D. Yang and Y. Pan 2020 [18] note that this strategy selects
the host, the virtual machine to be migrated and the target host, and proceeds with the
migration to reduce the number of running servers, resulting in less power consumption.
The EALB is based on the gray prediction algorithm to predict the load rate of the
departure and target hosts. Experiments in the paper show that, using the energy-aware
algorithm, CSPs could reduce their total CC energy consumption, enhance the overall
clustering performance and scale up their environment platform.

Modified Active Monitoring Load Balancer (MAMLB)


This algorithm is centered on the availability of virtual machines (VMs) for allocation
to the next client requests. The technique is based on the active monitoring algorithm:
each incoming request is assigned to the least loaded VM, without checking the memory
usage. Whenever a request arrives from the user's station at the DCC controller, the
controller assigns the request among the VMs. MAMLB uses a table holding the different
parameters, which is scanned on each calling job's arrival to find the least loaded VM
whose state is available; when there are several available, the algorithm looks for the
VM whose memory gives it maximum priority, and then returns it [7].

Throttled Algorithm
This algorithm is similar to AMLB, using a table that contains the VMs and their current
states (available or busy). The algorithm sends a request to the control unit whenever a
virtual machine must be assigned to perform a specific task. The DCC then searches for
the optimal matching VM based on its capability qualifications. The Throttled Load
Balancer is described in Fig. 9 below:

Fig. 9. Throttled load balancing algorithm

This algorithm has the advantage that the data center searches for the VM whose
capabilities best fit the required task, which improves the performance of the cloud
structure. Nevertheless, this search starts from the beginning of the table for each new
request, which wastes time because it passes through unavailable VMs every time. For that
reason, a Modified Throttled Algorithm (MTA) was proposed in the literature, modifying
the VM selection cursor mechanism: for each request arrival, the algorithm selects the VM
index next to the already assigned VM, depending on its availability [12].
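The index-table behaviour, and the cursor change introduced by the modified variant, can be sketched as follows; the VM count and state strings are placeholders.

```python
# Throttled selection: a table records whether each VM is available or busy, and
# every request scans from index 0 for the first available VM. The modified
# variant resumes scanning just after the VM assigned last. Illustrative only.
states = ["available"] * 5     # VM state table kept by the load balancer
last = -1                      # cursor used only by the modified variant

def throttled():
    for i, state in enumerate(states):        # always scans from the start
        if state == "available":
            states[i] = "busy"
            return i
    return None                               # no VM free: request must wait

def modified_throttled():
    global last
    n = len(states)
    for step in range(1, n + 1):              # scan starting after the last hit
        i = (last + step) % n
        if states[i] == "available":
            states[i] = "busy"
            last = i
            return i
    return None
```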

Genetic Algorithm
J.M. Shah, S. Pandya, N. Joshi, K. Kotecha, D.B. Choksi and N. Joshi 2017 [19] mentioned
in their analysis that the GA model is based on the natural selection model, simulating
Darwin's theory of biological genetic operations within a dynamic software approach to
computing. This algorithm has the advantage of being adaptable to complex objective
functions.
Generally, the implementation of such an algorithm requires three steps (a toy sketch
follows the list below):

1. Selection operator, by which the procedure randomly selects the initial population.
2. Crossover operator, for finding a fit pair of individuals.
3. Mutation operator: a low-probability value called the mutation rate determines which
bits are toggled from 0 to 1 or from 1 to 0. The GA implementation is clear for
developers to understand; however, it does not meet the performance criterion where
resources are strictly bound [7].
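The toy sketch below illustrates the three operators for a task-to-VM mapping problem; the chromosome encoding, the fitness function (make-span), and the population size and rates are illustrative choices, not those of [19] or [7].

```python
import random

# Toy genetic-algorithm sketch for mapping tasks to VMs: a chromosome lists the
# VM index of each task, fitness is the resulting make-span (lower is better),
# and the loop applies selection, one-point crossover and mutation as in the
# steps listed above. All parameters are illustrative.
TASKS = [4, 7, 3, 9, 2, 6]          # task lengths
VM_SPEEDS = [1.0, 2.0]              # relative VM speeds

def makespan(chrom):
    load = [0.0] * len(VM_SPEEDS)
    for task, vm in zip(TASKS, chrom):
        load[vm] += task / VM_SPEEDS[vm]
    return max(load)

def evolve(pop_size=20, generations=50, mutation_rate=0.1):
    pop = [[random.randrange(len(VM_SPEEDS)) for _ in TASKS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)                       # selection: keep the fitter half
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(TASKS))    # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):              # mutation: re-draw a gene
                if random.random() < mutation_rate:
                    child[i] = random.randrange(len(VM_SPEEDS))
            children.append(child)
        pop = parents + children
    best = min(pop, key=makespan)
    return best, makespan(best)

print(evolve())   # best found task-to-VM mapping and its make-span
```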

Honey Bee Foraging Technique (HBFT)


This algorithm imitates the behavior of honey bees, which works to optimize throughput
volume for maximum production. Honey bees naturally take on multiple roles inside the
colony over time. The principal actor is the active foraging bee, which goes to a source
of food, searches neighboring resources, gathers food and returns to the hive, whereas
scout bees explore the world around the hive looking for new food resources. At a given
moment, a few of the active foraging bees become inactive. This foraging pattern can be
used in planning the computation activities in the queue, where food characterizes the
available servers, hosts and VMs that can accept new requests [11]. The main goal of this
algorithm is the distribution of the workload across all virtual machines, taking into
account the necessity of equity in the DCC. The algorithm selects an available VM for
allocation based on two requirements: how small the number of tasks performed by the VM
is compared to the other VMs, and the estimated time for the VM to process the task,
which should be within the average pending time of all other VMs [12]. C. Sudhakar,
R. Jain and T. Ramesh 2018 [19] affirmed in their paper that honey bees have inspired
effective request balancing strategies in cloud theory. Honey bees search for food and
alert the other bees in the beehive about the amount and quality of the food found by
performing a waggle dance. Three types of bees are noted in the literature on the
algorithm:

1. Scout bees, which look arbitrarily for food sources and perform a waggle dance to
inform the hive about the quality of the food.
2. Employed bees, which gather data about the food source and share the information
obtained with onlooker bees.
3. Onlooker bees, which calculate the fitness value to find the best food source. With
respect to load balancing of incoming requests, tasks on overloaded machines represent
the honey bees, being transferred from overloaded machines to underloaded ones. The
dynamic approach of this algorithm means that changes in load status are checked in real
time, and the updated load of the departing overloaded machine is taken into account for
the remaining tasks. However, the algorithm suffers in the exactness of its VM load
calculation, because it does not take into consideration the task transfer time between
nodes.

Ant Colony Algorithm (ACA)


The Ant Colony Optimization Algorithm (ACA) converts the cloud optimization problem into one
of finding the shortest path on a weighted graph, with the ants acting as the computational agents
(jobs, threads and processes); it minimizes the overall execution duration. ACA schedules the load
by simulating the behavior of ants. At the beginning, the ants choose a random direction and, once
the desired goal is found, they evaluate the fitness of the path and deposit pheromone on it
according to that fitness. To reach the high-fitness path, and hence the optimal solution, as quickly
as possible, the pheromone levels must be kept up to date, since the ants choose their next moves
based on the amount of pheromone on the routes. Consequently, many ants moving on the same path
produce a higher concentration of pheromone, with a low evaporation rate, on the shortest route,
which is why the ants converge on that route [7]. ACO algorithms inspect a new route whenever the
computational agents find an obstacle in the current path, and then assign the new route between
nodes. The algorithm starts by initializing the table, the data flow and the threshold level of the
nodes. While the flow passes through the servers, the ACO checks the load level of each node: if a
node is underloaded, resources are being wasted, so the maximum trailing pheromone (TP), which
points the path towards the underloaded node, is applied and the table is updated until the node
reaches its threshold. Once the threshold is met, the algorithm applies the foraging pheromone (FP)
to explore new food sources and updates the table until another underloaded node is found, and then
it reassigns the overall resources. This cycle repeats until the end of the procedure [11]. The
advantage of the ant colony method is the dynamic, non-stop transfer of tasks from one VM to
another, mimicking the living style of ants; nevertheless, the load it places on the system unit is
significantly high and the entire algorithm is highly complex to implement.
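For illustration only, the pheromone mechanism described above can be sketched with a generic ACO update rule (the evaporation and deposit formulas below are textbook conventions, and the parameter names are assumptions, not taken from the surveyed implementations):

```python
import random

def choose_next_node(pheromone, current, candidates, alpha=1.0):
    """Pick the next node with probability proportional to the pheromone on each edge."""
    weights = [pheromone[(current, n)] ** alpha for n in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def update_pheromone(pheromone, path, path_fitness, evaporation=0.1):
    """Evaporate pheromone on every edge, then reinforce the edges of a completed path."""
    for edge in pheromone:
        pheromone[edge] *= (1.0 - evaporation)
    for edge in zip(path, path[1:]):
        pheromone[edge] += path_fitness   # shorter / fitter paths deposit more pheromone
    return pheromone
```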

4 Conclusion and Future Work

Cloud computing (CC) provides powerful tools and methods for online digital processing, making
it a fascinating field of computational intelligence (CI). It consists of the network delivery of
computational services such as storage, applications and servers. The global optimization model of
the cloud is NP-hard, which motivates the use of multiple metaheuristics as part of the architectural
solution. CC faces many challenges nowadays with the growing number of cloud service users, and
the throughput criterion has become a major concern for cloud providers in meeting their SLA
commitments. Load balancing (LB) plays a major role in the availability of the cloud, since it aims
to distribute the load equally among all servers. This survey overviews cloud
computing, details its major concepts and reviews the ten most used load balancing algorithms,
identifying their main advantages and limitations in the literature as open directions for future
research. A new insight discussed here is a collaborative approach between several of the load
balancing algorithms reviewed, taking into consideration the core advantages presented by each
algorithm and finding an interoperability link that could help minimize the remaining limitations
reported in the literature, such as makespan, throughput and server response time. This approach
would lead to the design of a new algorithm that gathers all parameters about the distributed cloud
servers and incoming requests, analyzes the available data and chooses whether to apply, for
example, the Genetic or the Ant Colony algorithm. After several processing iterations, the overall
metrics for server response time and performance could be improved. Table 1 summarizes a
comparison between the load balancing algorithms discussed throughout the review.

Table 1. Comparison of existing LB algorithms based on several metrics

Metrics/Technique          Throughput  Response time  Resource utilization  Performance  Make-span  Migration time
Round Robin                Yes         Yes            Yes                   Yes          No         No
Weighted Least connection  Yes         Yes            No                    Yes          No         No
Random                     Yes         Yes            No                    No           Yes        No
Shortest job               Yes         No             Yes                   No           No         Yes
Min-Min                    Yes         Yes            Yes                   Yes          No         No
Max-Min                    Yes         Yes            Yes                   Yes          No         Yes
Two phase OLB+LBMM         No          Yes            Yes                   Yes          No         Yes
Genetic Algorithm          Yes         No             Yes                   Yes          No         No
Ant Colony                 Yes         Yes            Yes                   No           Yes        Yes
Honey Bee Foraging         No          Yes            Yes                   No           Yes        Yes

Authors' Contributions. Z.B. wrote the main manuscript except the conclusion and prepared
Figs. 5, 6, 7, 8 and 9. He contributed to gathering data on related works and to the analysis of each
algorithm. M.O. prepared Figs. 1 and 2 and the conclusion of the manuscript, and contributed to
gathering and selecting data on related works as well as to the analysis of each algorithm. K.B.
prepared Figs. 3 and 4 and contributed to gathering and selecting data on related works as well as
to the analysis of each algorithm. All authors reviewed the manuscript twice.

Funding. This research did not receive any specific grant from funding agencies in the public,
commercial, or not-for-profit sectors.

Data Availability. Data sharing not applicable to this article as no datasets were generated or
analyzed during the current study.

Competing Interests. The authors declare no competing financial interests.

List of Abbreviations
A-
ACA: Ant Colony Algorithm
C-
CSP: Cloud Service Provider
CSC: Cloud Service Consumer
CSB: Cloud Service Broker
CC: Cloud Computing
CPU: Central Processing Unit
D-
DCC: Data Center Coalition
E-
ERR: Enhanced Round Robin
EALB: Energy Aware Load Balancing
F-
FT: Fault Tolerance
FCFS: First Come First Serve
G-
GA: Genetic Algorithm
H-
HW: Hardware
I-
IaaS: Infrastructure as a Service
IT: Information Technology
I/O: Input/Output
L-
LB: Load Balancing
LC: Least Connection
LBMM: Load Balancing Min-Min
M-
MAMLB: Modified Active Monitoring Load Balancer
MTA: Modified Throttled Algorithm
N-
NIST: National Institute of Standards and Technology
O-
OLB: Opportunistic Load Balancing
P-
PaaS: Platform as a Service
PM: Physical Machine
Q-
QoS: Quality of Service
QT: Quantum Time


R-
RR: Round Robin
S-
SaaS: Software as a Service
SW: Software
SIP: Session Initiation Protocol
SLA: Service Level Agreement
V-
VM: Virtual Machine
VMM: Virtual Machine Manager
W-
WAN: Wide Area Network
WLC: Weighted Least Connection


Note
The NIST defines CC as ‘a model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable computing resources (e.g., networks,
servers, storage, applications, and services) that can be rapidly provisioned and released
with minimal management effort or service provider interaction’ [5].

References
1. Patel, K.D., Bhalodia, T.M.: An efficient dynamic load balancing algorithm for virtual machine
in cloud computing. IEEE Xplore Part Number: CFP19K34-ART (2019). ISBN: 978-1-5386-
8113-8
2. Salah al-Ahmad, A., Kahtan, H.: Cloud computing review: features and issues, 978-1-5386-
5630-3/18/$31.00. IEEE (2018)
3. Manikandan, N., Pavin, A.: Comprehensive solution of scheduling and balancing load in
cloud – a review. IEEE Xplore Part Number: CFP19OSV-ART (2019). ISBN: 978-1-7281-
4365-1
4. Sahu, S., Pandey, M.: Efficient load balancing algorithm analysis in cloud computing. In: Pro-
ceedings of the Fourth International Conference on Communication and Electronics Systems,
ICCES (2019)
5. Liu, F., et al.: NIST Cloud Computing Reference Architecture Special Publication 500-292
(2011)
6. Hentschel, R., Strahringer, S.: A broker-based framework for the recommendation of cloud
services: a research proposal. In: Hattingh, M., Matthee, M., Smuts, H., Pappas, I., Dwivedi,
Y.K., Mäntymäki, M. (eds.) I3E 2020. LNCS, vol. 12066, pp. 409–415. Springer, Cham
(2020). https://doi.org/10.1007/978-3-030-44999-5_34
7. Jyoti, A., Shrimali, M., Tiwari, S., Pratap Singh, H.: Cloud computing using load balancing
and service broker policy for IT service: a taxonomy and survey. J. Ambient Intell. Humaniz.
Comput. 11, 4785–4814 (2020)

8. Ben Hamouda, R., Boussema, S., Ben Hafaiedh, I., Robbana, R.: Performance evaluation of
dynamic load balancing protocols based on formal models in cloud environments. In: Atig,
M.F., Bensalem, S., Bliudze, S., Monsuez, B. (eds.) VECoS. LNCS, vol. 11181, pp. 64–79.
Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00359-3_5
9. Geeta, Prakash, S.: A literature review of QoS with load balancing in cloud computing environ-
ment. In: Aggarwal, V., Bhatnagar, V., Mishra, D. (eds.) Big Data Analytics. AISC, vol. 654,
pp. 667–675. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-6620-7_64
10. Ala’anzi, M., Othman, M.: Load balancing and server consolidation in cloud computing
environments: a meta-study (2019). https://doi.org/10.1109/ACCESS.2019.2944420
11. Asim Shahid, M., Islam, N., Alam, M., Mazliham, M.S., Musa, S.: A comprehensive study
of load balancing approaches in the cloud computing environment and a novel fault tolerance
approach. IEEE Access (2020). https://doi.org/10.1109/ACCESS.2020.3009184
12. AlKhatib, A.A.A., Sawalha, T., AlZu’bi, S.: Load balancing techniques in software-defined
cloud computing: an overview. In: Seventh International Conference on Software Defined
Systems (SDS) (2020)
13. Ramya, R., Puspalatha, S., Hemalatha, T., Bhuvana, M.: A survey on and performance analysis
of load balancing algorithms using meta heuristics approach in public cloud-service provider’s
perspective, 978-1-5386-9432-9/18/$31.00. IEEE (2018)
14. Sanaj, M.S., Joe Prathap, P.M: An Enhanced Round Robin (ERR) algorithm for effective and
efficient task scheduling in cloud environment, 978-1-7281-6453-3/20/$31.00. IEEE (2020)
15. Singh, G., Kaur, K.: An improved weighted least connection scheduling algorithm for load
balancing in web cluster systems. Int. Res. J. Eng. Technol. (IRJET) (2018)
16. Kumar Mishra, S., Sahoo, B., Paramita Parida, P.: Load balancing in cloud computing: a big
picture. J. King Saud Univ. Comput. Inf. Sci. 32, 149–158 (2020)
17. Srinivasa Rao, G., Charan Arur, P., Anuradha, T.: Real time cloud based load balance
algorithms and an analysis. SN Computer Science (2020)
18. Yang, Q., Shao, Y., Cui, H., Fang, Y., Yang, D., Pan, Y.: Energy-aware and load balancing
based dynamic migration strategy for virtual machine. In: 4th International Conference on
Recent Advances in Signal Processing, Telecommunications Computing (SigTelCom) (2020)
19. Sudhakar, C., Jain, R., Ramesh, T.: Cloud load balancing - honey bees inspired effec-
tive request balancing strategy. In: International Conference on Computing, Power and
Communication Technologies (GUCON) (2018)
20. Naregal, K., Kalmani, V.: Study of lightweight ABE for cloud based IoT. In: Proceedings of
the Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)
(I-SMAC) (2020)
NeuroTower: A 3D Neuromorphic Architecture
with Low-Power TSVs

Arghavan Asad(B) and Farah Mohammadi

Electrical and Computer Engineering Department, Toronto Metropolitan University, Toronto,


ON, Canada
{arghavan.asad,fmohamma}@ryerson.ca

Abstract. This paper sets out an architecture to achieve efficient processing of neural networks
through neuromorphic processing. The NeuroTower is effectively a 2D mesh-connected
network-on-chip with stacks of DRAM integrated on top as 3D stacked memory. The architecture
employs programmable neurosequence generators, which act as a medium of communication in the
system and aid the retrieval of data between the DRAM stacks and the processing elements. Our
research introduces a pruning component to exploit sparsity and reduce network-on-chip traffic, a
significant source of power consumption in many hardware accelerators. The pruning unit prevents
ineffectual operations from being executed and leaves only the effectual data required for
processing.

Keywords: Deep neural network (DNN) · 3D integration · Neuromorphic processing · Machine
learning · Sparsity exploitation · Pruning · Near memory processing

1 Introduction

The main concern regarding neural networks is their incessant need for large amounts of
energy. The ambition of neural networks is to mimic the function of the human brain, and
in doing so requires copious volumes of energy [15]. In order to speed up the process of
learning and processing while maintaining a high degree of efficiency, many accelerator
architectures have been developed. Currently, most of the success in accelerator devel-
opment has come from architectures using general purpose graphics processing units
(GPGPU), which offer a high degree of scalability but poor power efficiency. Contrast-
ingly, application specific integrated circuits (ASIC) offer better power efficiency, but
a low degree of scalability [1, 2]. This paper aims to outline an architecture to acceler-
ate neural networks with good scalability similar to GPGPU, while also providing high
power efficiency like ASIC.
The NeuroTower offers various benefits when compared to existing accelerators.
To start, the NeuroTower makes use of in-memory processing through integration of a
compute layer within a 3D high-density memory package [16–19] where a high degree
of parallelism can be realized. It also does not use a traditional instruction set to carry out
processing of the neural network. As a result, the system uses less energy and operates

with higher efficiency. Lastly, the introduction of a component called the programmable
neurosequence generator (PNG) allows the host to program state machine descriptions
into the architecture, which can give rise to abstractions within the network simplifying
the processing procedure.
This paper expands upon the already existing Neurocube, proposing a method of
reducing the NoC (network on chip) traffic previously limiting the efficiency of the
architecture [3]. This is accomplished through a pruning unit designed to exploit naturally
occurring sparsity in both interconnection weights, and states of neurons [4].
The architecture of the Neurocube is explained in several sections detailing the
function of each of the individual components as well as communication among those
components. Following this is the design for the suggested pruning unit to reduce traffic
through sparsity exploitation.

2 Proposed Method
2.1 DRAM
In the Neurocube, memory is integrated as a stack of multiple DRAM chips each sepa-
rated into 16 partitions. Along one column of partitions is a vault as shown in Fig. 1 below.
Each of these vaults has an associated vault controller which controls data movement in
and out of the vaults to other elements of the NeuroTower. Each vault is connected to one
processing element to allow for parallel processing and these connections are realized
by using high speed through silicon vias (TSVs) [5]. The DRAM stack is crucial to the
operation of the system as all the information for processing is contained here. Every
layer of the neural network, their states, and connectivity weights are stored in the vaults
of the DRAM. This implies that the data movement paths are known before beginning
processing. To make use of this, the paths are compiled into finite state machine descrip-
tions which drive the programmable neurosequence generators (PNG) [3]. To initiate
processing the host must load these state machine descriptions into the PNG which
begins the data-driven processing of each layer of the neural network.

[Figure: block diagram showing DRAM layers and vaults stacked above a logic layer containing PEs 1-16, routers, the PNG and vault controllers, with a heat spreader and heat sink.]
Fig. 1. NeuroTower architecture with depiction of stacked memory



2.2 Programmable Neurosequence Generator


The programmable neurosequence generator (PNG) is responsible for the packaging and delivery
of data. The PNG accomplishes this by creating packets which are injected into the NoC for
processing. The host initiates processing by interacting with a global controller: the configuration
registers are loaded with state machine descriptions and the processing of the first layer begins.

Fig. 2. Address generation logic

For each neuron in a layer, the PNG generates the address of connected neurons
and connectivity weights from the previous layer through the address generator. These
addresses are sent to the vault controller which makes accesses to the vault to retrieve
data located at the requested addresses [3]. Figure 2 presents the address generator,
which uses combinational logic in addition to three nested loops to generate the address
of the required neurons. The combinational logic computes the memory address of the
target neurons from the neuron and connectivity counter. As the counters increment,
the combinational logic computes the address of the target neuron and makes a request
to the vault control for the required data. This loop continues until the states of all the
neurons in the layer have been computed [3]. This loop is constantly incrementing and
sending addresses to the vault controller while processing is taking place. This ensures
that processing elements are not wasting clock cycles waiting for data to arrive.
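The exact counter logic of the PNG is given in [3]; purely as an illustration of the nested-counter idea described above, the following Python sketch walks layer, neuron and connection counters and derives state and weight addresses (the base addresses, strides and word size are assumptions):

```python
def generate_addresses(layer_bases, weight_bases, neurons_per_layer, conns_per_neuron, word_bytes=2):
    """Illustrative nested-counter address generation (layer -> neuron -> connection)."""
    for layer in range(len(layer_bases)):                     # layer counter
        for neuron in range(neurons_per_layer[layer]):        # neuron counter
            for conn in range(conns_per_neuron[layer]):       # connectivity counter
                state_addr = layer_bases[layer] + conn * word_bytes
                weight_addr = (weight_bases[layer]
                               + (neuron * conns_per_neuron[layer] + conn) * word_bytes)
                yield layer, neuron, state_addr, weight_addr  # requests sent to the vault controller

# Example: addresses for a tiny two-layer network.
for req in generate_addresses([0x1000, 0x1800], [0x2000, 0x2800], [4, 2], [3, 4]):
    print(req)
```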
As the PNG receives the data stream from the vault controller, it must also apply the
non-linear activation function to the input neuron state. In the NeuroTower, the activation
function is implemented as a look-up table (LUT). The output state is then encoded into a
packet along with other relevant information as dictated by the encapsulation logic. Once
the packet is ready, the PNG sends it to the router of the network on chip for delivery to
the processing elements [3]. Figure 3 illustrates the interactions of each element of the
PNG with each other, as well as with the vault and network on chip.
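The LUT contents and resolution are not specified here; the following Python sketch merely illustrates how a look-up table can replace an explicit non-linear activation (the 16-entry sigmoid table and its input range are assumptions):

```python
import math

LUT_SIZE = 16
IN_MIN, IN_MAX = -8.0, 8.0
# Precomputed sigmoid values over a fixed input range (assumed table format).
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(IN_MIN + i * (IN_MAX - IN_MIN) / (LUT_SIZE - 1))))
               for i in range(LUT_SIZE)]

def lut_activation(x):
    """Approximate the activation by clamping x and indexing the nearest LUT entry."""
    x = max(IN_MIN, min(IN_MAX, x))
    idx = round((x - IN_MIN) / (IN_MAX - IN_MIN) * (LUT_SIZE - 1))
    return SIGMOID_LUT[idx]

print(lut_activation(0.0), lut_activation(5.3))
```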

2.3 Packets
The processing elements require data in the form of packets to conduct arithmetic.
In addition to state and connectivity weight, each packet also contains the different
identifiers listed below and illustrated in Table 1 [3]:

• MAC-ID: Indicates which MAC unit will conduct processing and which neuron in
the current layer is under computation.

Fig. 3. a) Position of the PNG between the vault controller and router. b) Overview of the PNG
architecture.

• Operation-ID: Indicates which neuron from the previous layer is being used as an
input.
• Source-ID: Indicates which DRAM vault was accessed and is used to locate where the
neuron's state should be updated.
• Destination-ID: Indicates which processing element will conduct processing.

Table 1. Packet fields

Field    SRC     DST     DATA      MAC-ID    OP-ID
Width    4 bit   4 bit   16 bit    4 bit     8 bit
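Using the field names and widths of Table 1, a packet could be modeled as below; the bit ordering of the 36-bit encoding is an assumption made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    src: int      # 4-bit source vault ID
    dst: int      # 4-bit destination PE ID
    data: int     # 16-bit neuron state or connectivity weight
    mac_id: int   # 4-bit MAC unit / neuron-in-layer ID
    op_id: int    # 8-bit input-neuron (operation) ID

    def encode(self) -> int:
        """Pack the fields into a single 36-bit word (assumed layout, MSB first)."""
        return ((self.src << 32) | (self.dst << 28) | (self.data << 12)
                | (self.mac_id << 8) | self.op_id)

print(hex(Packet(src=3, dst=7, data=0xBEEF, mac_id=2, op_id=41).encode()))
```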

2.4 Processing Elements


The processing elements are the main arithmetic unit in the system. Since convolutional
neural networks require multiply and accumulate operations, the PEs are comprised
of multiple MAC units which are each responsible for computing the weighted sum-
mation of one neuron with the appropriate input neuron. Depending on the bandwidth
requirements of the vault, each processing element can have a different number of MAC
units per PE [6]. In a processing element, there are five different components to consider.
Figure 4 shows the micro-architecture of the PE. There is a cache memory, temporal
buffer, memory module, OP-ID counter, and MAC units contained within the processing
element [3]. According to the OP-ID counter, data from incoming packets is stored in
either the temporal buffer or cache memory. If the OP-ID of a packet is higher than the
current OP-ID indicated by the counter, data is stored in the cache memory; otherwise,
data is moved to the temporal buffer. Once the temporal buffer is filled, all the contained
data is flushed and sent to the MAC units for processing.
The following example demonstrates the function of each element in a PE:
In Fig. 5(a), the incoming packet has an OP-ID of 3, which matches the OP counter
value. Thus, data from this packet is stored in the temporal buffer.
Subsequently, Fig. 5(b) shows that a packet with an OP-ID higher than the OP counter
will have its data stored in the cache memory for future use. The memory location this
data is stored in is calculated through a modulo operation between the OP-ID and 16.

Fig. 4. Architecture of a processing element

Fig. 5. (a–d). Operation of a processing element

In part c, the OP-ID is equal to the OP counter, so data is stored in the temporal
buffer. At this stage, the temporal buffer is full with 16 weights and inputs.
Lastly, the temporal buffer is flushed, and the MAC units receive this data to conduct
processing. The operation counter increments, and relevant data (with the same OP-ID)
is retrieved from the cache memory and stored in the temporal buffer [3].
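The following Python sketch is a toy model of the buffering behaviour just described; the temporal-buffer depth of 16 and the modulo-16 cache indexing come from the text above, while the rest (class structure, flush behaviour) is an assumption for illustration.

```python
class ProcessingElement:
    """Toy model of the OP-ID-driven buffering described for a PE."""

    def __init__(self, buffer_size=16):
        self.op_counter = 0
        self.buffer_size = buffer_size
        self.temporal_buffer = []   # data for the operation currently being computed
        self.cache = {}             # slot (op_id % 16) -> data arriving early for future OP-IDs

    def receive(self, op_id, data):
        if op_id > self.op_counter:
            # Packet for a future operation: park it in the cache memory.
            self.cache.setdefault(op_id % 16, []).append(data)
        else:
            self.temporal_buffer.append(data)
        if len(self.temporal_buffer) >= self.buffer_size:
            self._flush_to_macs()

    def _flush_to_macs(self):
        batch, self.temporal_buffer = self.temporal_buffer, []
        # ... the MAC units would consume `batch` here ...
        self.op_counter += 1
        # Move any cached data belonging to the new OP-ID into the temporal buffer.
        self.temporal_buffer = self.cache.pop(self.op_counter % 16, [])
        return batch
```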

2.5 Pruner Unit

The Neurocube faces a high degree of difficulty dealing with network on chip traffic and
consumes more power than necessary as a result while also slowing down computation
[3]. As a solution to this problem, sparsity can be exploited to reduce the amount of
data transfer within the system. Though sparsity occurs naturally in activation functions
and weights, the amount can be increased without a loss of accuracy through pruning,
wherein values under a threshold are set to zero [4]. Thus, in order to address the issue
of NoC traffic, a pruner unit can be used as a medium to reduce data transfer through
sparsity exploitation [7]. This is the key difference between the NeuroTower and the
existing Neurocube.

Fig. 6. Pruner unit logic circuit

In a MAC operation, there are three parameters to consider: input state, weight, and
output state of the previous neuron. If any of these values are 0, the MAC operation is not
necessary and is labeled ineffectual [8]. On the other hand, if all the values are non-zero,
the MAC operation is effectual and must be carried out [9]. Through avoiding ineffectual
operations, the NoC traffic can be reduced while also decreasing power consumption.
The pruner unit, shown in Fig. 6, is simply a comparator used during runtime to
compare a packet’s data field to a predefined threshold. The output of the comparator
is used as an enable signal to control the function of the appropriate MAC unit. If the
data field in the packet is less than the threshold, the comparator outputs a 1-bit null flit
and disables the corresponding MAC unit. Otherwise, the MAC units are enabled, and
data is sent as it normally would. When a MAC unit is disabled from the comparator,
the output is taken to be equal to the state of the current input neuron. As a result, three
unnecessary operations are avoided which not only reduces traffic, but also opens up
space in a given MAC unit for more data to be processed earlier than it would be without
the pruner unit.
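A minimal Python sketch of the comparator-style decision described above (the threshold value and the return convention are illustrative assumptions):

```python
def prune_decision(data_value, threshold=0.0):
    """Return (mac_enable, payload) for an incoming packet's data field.

    If the magnitude does not exceed the threshold, the MAC unit is disabled and
    only a null flit is forwarded; otherwise the data is passed on unchanged.
    """
    if abs(data_value) <= threshold:
        return False, None          # null flit: ineffectual operation skipped
    return True, data_value         # effectual operation proceeds normally

# Example: only the non-zero weight would reach a MAC unit.
for w in (0.0, 0.37):
    print(prune_decision(w))
```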
The flow of operations in the NeuroTower is depicted in the flow chart in Fig. 7, covering the
cases where sparsity is and is not present.

Fig. 7. Flow chart of processing

3 Experimental Results
3.1 Experimental Setup
In order to validate the efficacy of the NeuroTower and other architectures in this work,
we used 3D-Noxim [10, 11] and the GEM5 full-system simulator [12].
We used the Caffe Model Zoo [13] framework to run various DNNs as shown in
Table 2. DNNs in this work were used to perform image classification on the ILSVRC12
dataset [13]. ILSVRC12 is a dataset with 256 × 256 images across 1000 classes.

Table 2. Summary of DNNs under consideration

Network                    AlexNet        VGG 19         GoogLeNet v1   Nin            CnnM           CnnS
Conv. layers               5              16             59             12             5              5
Pretrained model website   Model Zoo [5]  Model Zoo [5]  Model Zoo [5]  Model Zoo [5]  Model Zoo [5]  Model Zoo [5]

Traffic traces of real NN workloads, shown in Table 2, are extracted from the GEM5
full-system simulator [12]. For simulating a 3D architecture, the extracted traffic traces
from GEM5 are interfaced with 3D Noxim as a 3D interconnection. To implement


the NeuroTower, the 3D Noxim interconnection has been modified. Synopsys Design
Compiler and McPAT [14] are used to extract the power profiles of CPUs and other
components in this work.

3.2 Experimental Evaluation


In this sub-section, we evaluate two different architectures: (a) 3D Meshed-CPU, and
(b) NeuroTower. 3D Meshed-CPU is a 3D multicore platform with 16 CPUs in the core
layer and a 3D stacked memory layer. CPUs are connected with a mesh topology in
the core layer. The 3D stacked memory system in the 3D Meshed-CPU includes three
layers. Each layer includes 16 × 256 KB SRAM bank. In NeuroTower, there are 16
PEs (including MACs and cache banks) in the lower layer and a 3D stacked memory
system. There are 16 Vaults in each layer of the stacked memory system.
To continue, we examine latency and energy of the NeuroTower and 3D Meshed-
CPU.

3.3 Latency Parameter

Figure 8 shows the latency parameter, normalized to the 3D Meshed-CPU. NeuroTower


improves the latency by about 21%, on average, as compared to the 3D Meshed-CPU.
In NN workloads, the required memory capacity grows exponentially. On average, more than 80%
of the communication is between cores and LLC banks [6]. In NeuroTower, since complex routers
have been replaced with simple multiplexers, packets do not need to wait in saturated paths.
Therefore, the latency of NeuroTower is reduced when compared to the 3D Meshed-CPU with its
complex routing units.

Fig. 8. Energy consumption, normalized to the 3D meshed-CPU design.



Fig. 9. System latency, normalized to the 3D meshed-CPU design

3.4 Energy Consumption

Figure 9 shows the system energy, normalized to the 3D Meshed-CPU. Due to the higher
power consumption of interconnection support in 3D Meshed-CPU (complex routing
units and cores) in contrast to the NeuroTower, energy consumption of the 3D Meshed-
CPU is about 1.76 times that of the NeuroTower on average. Due to the low-latency direct
paths between the PEs and Vaults in the NeuroTower, energy consumption is improved
compared to the 3D Meshed-CPU.

4 Conclusion

The NeuroTower is an innovative means to employ neuromorphic processing and shows


great promise for future use. In a world dominated by the ever-expanding market for tech-
nology, neural network processing must continually satisfy these demands. This paper
expanded upon the original idea of the Neurocube to accomplish this. The NeuroTower
makes use of stacks of DRAM to store all the information for processing of the neural
network. In conjunction with the programmable address generator, packets are formed
which contain the data required to complete MAC operations in the processing elements.
Before being sent to the processing elements, the data fields of packets are pruned using
a comparator. Depending on the result of the comparison, the processing continues as
outlined in Fig. 7. By exploiting naturally occurring sparsity within the system through
a simple pruner unit, traffic congestion and processing power can both be reduced.

References
1. Ibrahim, Y., et al.: Soft errors in DNN accelerators: a comprehensive review, Micro-
electron. Reliab. 115, 113969 (2020). https://doi.org/10.1016/j.microrel.2020.113969. ISSN
0026-2714
2. Maxwell, J.C.: A Treatise on Electricity and Magnetism, vol. 2, 3rd edn., pp. 68–73.
Clarendon, Oxford (1892)

3. Kim, D., Kung, J., Chai, S., Yalamanchili, S., Mukhopadhyay, S.: Neurocube: a programmable
digital neuromorphic architecture with high-density 3D memory. In: 2016 ACM/IEEE 43rd
Annual International Symposium on Computer Architecture, pp. 380–392 (2016)
4. Mahmoud, M., et al.: TensorDash: exploiting sparsity to accelerate deep neural network
training. In: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO), pp. 781–795 (2020)
5. Panem, C., Gad, R.S., Kaushik, B.K.: Vertical traversal approach towards TSVs
optimisation over multilayer network on chip (NoC). Microelectron. J. 116, 105231 (2021).
ISSN 0026-2692
6. Höppner, S., et al.: The SpiNNaker 2 processing element architecture for hybrid digital
neuromorphic computing. arXiv [cs.AR] (2021)
7. Albericio, J., Judd, P., Jerger, N., Aamodt, T., Hetherington, T., Moshovos, A.: Cnvlutin:
ineffectual-neuron-free deep neural network computing (2016)
8. Judd, P., Delmas, A., Sharify, S., Moshovos, A.: Cnvlutin2: ineffectual-activation-and-weight-
free deep neural network computing (2017)
9. Asadikouhanjani, M., Zhang, H., Gopalakrishnan, L., Lee, H.-J., Ko, S.-B.: A real-time archi-
tecture for pruning the effectual computations in deep neural networks. IEEE Trans. Circuits
Syst. I Regul. Pap. 68(5), 2030–2041 (2021). https://doi.org/10.1109/TCSI.2021.3060945
10. Norollah, A., Derafshi, D., Beitollahi, H., Patooghy. A.: PAT-Noxim: a precise power &
thermal cycle-accurate NoC simulator. In: 2018 31st IEEE International System-on-Chip
Conference (SOCC), pp. 163–168. IEEE (2018)
11. Chen, K.-C., Wang, T.-Y.: NN-noxim: high-level cycle-accurate NoC-based neural networks
simulator. In: 2018 11th International Workshop on Network on Chip Architectures (NoCArc),
pp. 1–5. IEEE (2018)
12. The gem5 simulator. https://www.gem5.org/
13. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput.
Vis. 115(3), 211–252 (2015)
14. Joardar, B.K., Kim, R.G., Doppa, J.R., Pande, P.P., Marculescu, D., Marculescu, R.: Learning-
based application-agnostic 3D NoC design for heterogeneous manycore systems. IEEE Trans.
Comput. 68(6), 852–866 (2018)
15. Asad, A., Kaur, R., Mohammadi, F.: A survey on memory subsystems for deep neural network
accelerators. Future Internet 14(5), 146 (2022)
16. Asad, A., Al-Obaidy, F., Mohammadi, F.: Efficient power consumption using hybrid emerg-
ing memory technology for 3D CMPs. In: 2020 IEEE 11th Latin American Symposium on
Circuits and Systems (LASCAS), pp. 1–4. IEEE (2020)
17. Dorostkar, A., Asad, A., Fathy, M., Jahed-Motlagh, M.R., Mohammadi, F.: Low-power het-
erogeneous uncore architecture for future 3D chip-multiprocessors. ETRI J. 40(6), 759–773
(2018)
18. Asad, A., Ozturk, O., Fathy, M., Jahed-Motlagh, M.R.: Optimization-based power and ther-
mal management for dark silicon aware 3D chip multiprocessors using heterogeneous cache
hierarchy. Microprocess. Microsyst. 51, 76–98 (2017)
19. Al-Obaidy, F., Asad, A., Mohammadi, F.A.: Improving power-performance via hybrid cache
for chip many cores based on neural network prediction technique. Microsyst. Technol. 27(8),
2995–3006 (2020). https://doi.org/10.1007/s00542-020-05048-5
Coherence Domains in Condensed Matter
as Storage “Devices” of Quantum
Information

Luigi Maxmilian Caligiuri(B)

Foundation of Physics Research Center (FoPRC), 87100 Cosenza, Italy


[email protected]
http://www.foprc.org

Abstract. The theory of QED coherence predicts that, in condensed matter, if temperature and
density obey given constraints, the system undergoes a spontaneous phase transition to a
macroscopic quantum state, energetically favored and therefore more stable, in which the matter
field oscillates in tune with an autogenerated e.m. field. In this state, “coherent domains” form,
namely macroscopic regions in which such tuned oscillations take place, characterized by an
energy gap and low entropy. In this paper, we show that such domains can even behave as storage
“devices” of quantum information and that, when they interact with each other by forming large
networks of coherently interacting coherent domains, they further boost their storage capabilities.
Because of their coherent dynamics, the coherent domains are also characterized by high stability
and scalability, making them a suitable storage substrate for quantum computational systems. In
this perspective, the application of such results to liquid water is discussed.

Keywords: QED coherence · Coherent domains · Quantum information · Water

1 Introduction

According to Quantum Field Theory (QFT), the observed features of macro-


scopic complex systems result from the collective dynamics of their microscopic
components. Generally, these systems are characterized by the emergence of
order and stability at macroscopic space-time scales, despite the non-
ordered set of their elementary components and the quantum fluctuations charac-
terizing their behavior at microscopic scales. The stability of the system, despite
the fluctuations of its elementary components, is ensured by the Lagrangian
function L to be invariant under a change of the local phase of the quantum
field ψ (x, t), that is [1]

ψ (x, t) → ψ ′ (x, t) = exp [iαθ (x, t)] ψ (x, t) (1)


The required invariance of L under Eq. (1) implies the introduction of a


gauge field Aµ (x, t) which, at the dimensions of atoms and molecules, is just
the electromagnetic field. The Lagrangian is then invariant under a local gauge
transformation
Aµ (x, t) → A′µ (x, t) = Aµ (x, t) − ∂µ θ (x, t) (2)
and we can substitute in L the usual derivative with the co-variant one defined
as
Dµ ψ = (∂µ + iαAµ ) ψ (3)
that also transforms like Eq. (1), namely

Dµ ψ → exp [iαθ (x, t)] Dµ ψ (4)

Due to the local gauge invariance of L, the oscillations of the matter and
e.m. fields within the system are tuned with each other, for the gauge field
generates the required background in which such dynamics takes place, giving
rise to a new macroscopic coherent quantum state of the system [2,3]. More
specifically, the occurrence of suitable values of temperature and density [2–4]
makes the system to spontaneously undergo a quantum phase transition from a
non-coherent ground state (characterized by uncorrelated matter and e.m. fields
oscillations) towards a coherent ground state (CGS) in which, on the contrary,
these oscillations are phase correlated. This phenomenon can be also described
as a spontaneous symmetry breaking (SSB) according to which, in the “true”
ground state of the system, i.e. the CGS, the matter and e.m. fields are phase-
locked each other. Even more remarkably, such a mechanism also gives rise to
the formation of macroscopic spatial regions, named “coherent domains” (CD)
that are seats of such tuned oscillations. Such domains, that resemble the macro-
scopic state of a superfluid or a superconductor, are characterized by a quantum
wavefunction that is macroscopic in character [4] and given by

Ψ (x, t) = Ψ0 (x, t) exp [iΘ (x, t)] (5)

where Ψ0 (x, t) is the amplitude of the wave function and Θ (x, t) its quantum
phase, the latter being eigenfunction of a suitable quantum phase operator [5,6].
According to Eq. (5) the matter and e.m. fields of the system exhibit a collective
coherent behaviour giving rise to a quantum field resulting in the “condensa-
tion” of quasi-particles at macroscopic scale. As a very meaningful result of
such dynamics, to each coherent domain is associated an energy gap ΔE < 0
compared to the non-coherent ground state that makes it stable against envi-
ronmental decoherence (for not too high temperatures) [4]. Moreover, the phase
- correlation between matter components and e.m. field occurring within a CD,
resulting from the “phase - locking”, produces a long-range order at macroscopic
scale and a decrease of the entropy, that, in turn, allows for the “storing” of an
amount of information in the CD itself. Such information is even quantum in
nature since it arises from the coherent quantum dynamics of CDs. In particular,
the coherent oscillations taking place inside CDs, determine a rescaling of the fre-
quency of e.m. field, the latter becoming “trapped” inside them, for the photons
belonging to it are characterized by a purely imaginary mass [4]. QED coherent


dynamics then determines the spontaneous onset of macroscopic regions (the
coherent domains), characterized by long-range order and lower entropy com-
pared with the non-coherent ground state and the surrounding environment. The
case of liquid water is mostly interesting, the coherent dynamics predicting for it
the possible formation of a spectrum of excited energy levels. These arise in the
form of vortices of quasi-free electrons, representing collective oscillations of the
whole CD, corresponding to its excitations [5,6]. The resulting dynamics makes a
CD able to convert the high-entropy energy, absorbed from the surroundings, to
information stored in a low-entropy single coherent excited state of the CD. The
energy gap per molecule |ΔE|/N characterizing an isolated CD mainly results
from the two energy levels of the matter components involved in the coherent
oscillation (as well as from some parameters related to the atoms/molecules com-
posing the matter field) that are automatically selected by the dynamics itself.
It has been shown that when two or more CDs are close each other, the overall
energy gap associated to their system meaningfully increases if compared with
an ensemble of an equal number of isolated CDs. In this paper, starting from our
previous results [7], we give an expression for the amount of quantum informa-
tion that could be “stored” inside an extended network of interacting CDs, also
giving a numerical estimation of such quantity as a function of temperature if
we consider coherent domains of liquid water. We also suggest such system could
represent a storing “device” to be used for quantum computing by exploiting the
coherent dynamics of liquid water as already proposed in previous publications
[8–11].

2 A “Bird-Fly” View of QED Coherence in Matter


According to the theory of QED coherence in matter [4], a system composed
by a very high number of elementary components (atoms/molecules) is able to
spontaneously undergo a phase transition towards a special state characterized
by the presence of tuned collective oscillations of all its own matter components
with a e.m. field associated to the transition between a couple of energy levels
of such components. The dynamic equations of such system admit a limit cycle
solution (long-time stationary state) defined by the difference between the energy
levels generating the coherent oscillation, namely
E = ℏω0 = hc/λ (6)
λ being the wavelength of the tuning e.m. field, c the speed of light in vacuum
and ℏ = h/2π (in the following we adopt, unless otherwise specified, the “natural
units” ℏ = c = 1).
The new coherent and more stable state towards which the system sponta-
neously evolves is characterized by the formation of coherent domains (CDs),
whose dimension is of order of λ, given by Eq. (6), inside which the e.m. field is
confined and the coherent oscillations take place. We consider a quantum system
characterized by two energy levels, with respectively associated energies (E1 , E2 )


and wave functions ψ1 (x, t) and ψ2 (x, t), satisfying the condition
|ψ1|² + |ψ2|² = 1 (7)

Matter fields ψ1 (x, t) and ψ2 (x, t) interact, within the CD volume (of size
λ³), with an e.m. field associated to a potential vector of amplitude A (x, t). The
long-term evolution of the system is then described by the equations.

ψ1 (τ ) = cos γ exp [iθ1 (τ )]


ψ2 (τ ) = sin γ exp [iθ2 (τ )] (8)
A (τ ) = A0 exp [iϕ (τ )]

where 0 < γ < π/2 and τ = ω0 t is an adimensional time. The fields given by
Eqs. (8) also satisfy the “phase-locking” constraint [2–4]

∂ϕ/∂τ = ∂θ1/∂τ − ∂θ2/∂τ (9)
The new stable state of the system, namely the CGS, assumes, after a very
short transient time, non-vanishing amplitudes, described by Eqs. (8).
The phase-locking constraint given by Eq. (9), occurring in the coherent
state, defines a macroscopic quantum state (including a very high number N
of elementary components of the system), being an eigenstate of the quantum
phase operator with a well-defined eigenvalue Θ (x, t) [5,6] the latter defining
the oscillation of the whole CD as a single macroscopic quantum system.
The evolution from the non-coherent state to CGS implies a reduction of
the CD energy per atom/molecule given by ffiE/N > 0 so that it represents
the “true” ground state of the system that is much more stable compared to
the non-coherent “perturbative” ground state (PGS). For spherically symmetric
CDs of radius RCD , the radial profile of the energy gap (δE = −ffiE < 0) is
given by [4,12]
(δE/N)(x) = (δE/N)(0) g(x) (10)
where x ≡ ω0 r/π, r being the radial distance from the CD’s center, δE/N (0) is
the value of the energy gap at the CD’s centre (r = 0) and g (x) ≡ P 2 (x) is a
function of x giving the radial behavior of energy gap. A specific calculation [4]
shows that, for such a CD, the “radius” is given by

rcoh ≡ RCD ≃ 3π/(4ω0) (11)

so that, “inside” a CD, 0 < x < 3/4.


For an “isolated” coherent domain, the function g (x) is given by [4,12]

P(x) = sin(ρ1 x)/(√2 ρ1 x),   0 ≤ x < 3/4
P(x) = exp[−π(x − 3/4)],      x ≥ 3/4        (12)
showing the energy gap extends far beyond the CDs “borders”, due to the evanes-
cent tail of the coherent e.m. field generated by the coherent dynamics inside
the CD [4].
For this reason, when two or more CDs are close each other, within one
coherent domain we observe a superposition between the “inner” e.m. field and
the evanescent e.m. field due to the neighbouring domains. This is energetically
advantageous for the system for it becomes more stable compared to far isolated
domains. We can then interpret this increase of the energy gap as the emergence
of a “binding” energy between neighbouring CDs. This energy gain or binding
energy acquires a maximum value when the coherent domains are the most
closely packed, namely, in the case of spherically symmetric CDs, when the
interdomains distance d satisfies the condition.

d = 2RCD (13)

For a couple of such closely packed CDs, the function g (x) assumes, within
a single domain, the form
P(x) = sin(πx)/(πx) + [√2/(π(3 − 2x))] exp[−π(3/4 − x)] (14)
and the full profile of g (x) including both the two close domains is represented
in Fig. 1.

Fig. 1. Graph of the Function g (x) for a Couple of Close Interacting Domains: a)
Isolated Domains (Dotted Line); b) Close Domains (Continuous Line)
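As a purely illustrative numerical check of the close-packed profile (using Eq. (14) exactly as transcribed above, which should be verified against [4,12]), g(x) = P²(x) can be sampled as follows:

```python
import math

def P_pair(x):
    """P(x) for two closely packed domains, per Eq. (14) as transcribed above."""
    sinc_term = math.sin(math.pi * x) / (math.pi * x) if x else 1.0
    tail_term = (math.sqrt(2) / (math.pi * (3 - 2 * x))) * math.exp(-math.pi * (0.75 - x))
    return sinc_term + tail_term

def g(x):
    """Radial profile of the energy gap, g(x) = P(x)**2."""
    return P_pair(x) ** 2

# Sample the profile inside a domain (x = ω0 r / π runs from 0 to 3/4).
for x in (0.25, 0.50, 0.75):
    print(f"x = {x:.2f}  g(x) = {g(x):.4f}")
```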

All the above considerations strictly hold for absolute temperature T = 0.


On the other hand, when T > 0, the thermal energy of the atoms/molecules
would be able to desynchronize some of them pushing these out from the CD.
In this case, we can quantify the number of components belonging to coherent
(non-coherent) state by the average fraction of the total species in this state
Fcoh (T ) (Fnc (T )) such that, at a given temperature

Fcoh (T ) + Fnc (T ) = 1 (15)

The number of the atoms/molecules performing the coherent oscillations,


at temperature T , is then given by Ncoh (T ) = Fcoh (T ) N . Consequently, the
overall energy gap associated to a given coherent domain, at T > 0, is given by

δECD (T ) = Fcoh (T ) N (δE/N)coh (16)

As already mentioned, the case of water is particularly interesting since, due


to the particular energy levels involved in the coherent oscillation, the coherent
state contains a charged fluid of quasi-free electrons able to create excited energy
states of the coherent domain [5,12]. The very special features of the coherent
state of water are further emphasized in the case of interfacial water (namely
water very close to a hydrophilic surface) since the surface walls are able to
attract water molecules so stabilizing the coherent fraction close to unity and
making water almost fully coherent even at room temperature [13,14].

3 Quantum Information Stored in a Network of QED


Coherent Domains
As we have seen, the coherent state of liquid water is characterized by long-range
order, a precise value of phase (the same for all the elementary components
involved in the common oscillation) of the related macroscopic wave function
given by Eq. (5), as well as a lower value of entropy compared to non-coherent,
commonly considered, macroscopic state.
The amount of information stored in a system can be roughly estimated by
using the general definition given by [15]

I = K ln (P0 / P1) (17)
where P0 is the number of chances for the system in the initial state, in which
we assume I0 = 0 (no information), P1 (P1 < P0 ) the number of chances in the
final state in which I1 > 0, given by Eq. (17), and K = kB (the Boltzmann
constant) in thermodynamic units.
We now apply Eq. (17) to a QED coherent system by invoking the concept
of “bound information” [7,15] and assuming the whole entropy variation is con-
verted into information. We then calculate the highest information that can be
stored in a coherent system, after the transition to CGS has taken place, as

−ΔS = (kB ln 2) I (18)



with ΔS = S1 − S0 being the variation of entropy occurring during the pro-
cess and I is the associated information expressed in bits. By following [7] we
obtain the expression of the information that can be stored in a coherent system
including N elementary oscillating components

I = −(δE/N)coh · Fcoh (T ) N / (ln 2 · kB T ) (19)
In particular, Eq. (19) tells us the quantity of quantum information storable
in a coherent domain is proportional to the energy gap/molecule, to the coherent
fraction and to the total number of elementary components of the system, while
it decreases with the temperature. Furthermore, according to Eq. (19), the total
amount of information storable in a given system, including a number nCD of
different coherent domains, is directly proportional to nCD so that we can write
I (nCD ) = ICD nCD (20)
where ICD indicates the information stored per single coherent domain as given
by Eq. (19). If we consider a network of closely packed coherent domains, whose
intercentrum distances are all equal to 2RCD , we must account for the energy
gain due to such configuration in order to calculate the overall information
storable in the network of coherent domains. We firstly note (let’s consider Fig. 1
as well as Eq. (12) and Eq. (14)) that, in this configuration, at x = 3/4 the
energy gained by a couple of packed CDs is about four times larger than in the
two isolated domains. In fact, by using Eq. (14), we obtain
(δE12/N)(RCD) ≈ [32/(9π²)] / [8/(9π²)] · (δE1/N)(RCD) = 4 (δE1/N)(RCD) (21)
N N
where δE1 is the energy gap of an isolated domain and δE12 that of two close
domains. Ideally, as already suggested, we could think of the energy gap per
couple of domains as a “binding” or, even, a “potential” energy V12 associated
to the couple of coherent domains in this geometrical configuration. In first
approximation we could further assume
(V12/N)(x) = (δE12/N)(x) ≈ (δE12/N)(RCD) = [32/(9π²)] (δE1/N)(0) ≡ ΔV (22)
the overall energy gap associated to a network of closely packed CDs, each con-
taining N elementary components, in the above geometrical configuration, can
be then calculated in the same way as the potential energy of a distribution of
point-wise electric charges interacting via a Coulomb - like potential, namely,
Vtot = Σ_{i=1}^{nCD−1} Σ_{j>i} Vij (23)

that, with the assumption Vij = ΔV, gives


Vtot = Σ_{i=1}^{nCD−1} Σ_{j>i} ΔV = [32/(9π²)] (δE1/N)(0) · nCD (nCD − 1)/2 (24)

that can be compared with the expression of Ṽtot in the case of nCD “isolated”
coherent domains, that is
Ṽtot = [8/(9π²)] (δE1/N)(0) nCD (25)
As regards the total amount of information Ĩtot stored by a network of interacting coherent
domains, we obtain, by inserting Eq. (24) into Eq. (19):

Ĩtot (nCD) = −[32/(9π²)] (δE1/N)(0) · Fcoh (T ) N / (ln 2 · kB T ) · nCD (nCD − 1)/2 (26)
to be compared with the information stored by isolated coherent domains, given
by Eq. (20):
Itot (nCD) = −[8/(9π²)] (δE1/N)(0) · Fcoh (T ) N / (ln 2 · kB T ) · nCD (27)
9π N ln 2kB T
with δE1 < 0. Finally, by using Eq. (27) in Eq. (26) we can also write (nCD > 1)

Ĩtot (nCD) = 2 I nCD (nCD − 1) (28)

where I is given by Eq. (19), calculated at r = RCD (or x = 3/4).


A numerical estimation of the information stored in coherent domains can be
given in the case of water recalling that for an isolated CD at room temperature
[4,5,12] (T = 293.15 K), by assuming δE1 (0)/N ≃ −0.26 eV, N ≃ 2 · 10⁵ and
Fcoh ∼ 0.4 we obtain, for a single CD, I ≃ 1.08 · 10⁵ bits and, for nCD coherent
domains
Ĩtot (nCD) ≃ 2.16 · 10⁵ nCD (nCD − 1) bits (29)
For interfacial water, for which we can assume Fcoh ∼ 1, we have, for a single
CD, I ≃ 2.70 · 10⁵ bits and, for nCD coherent domains

Ĩtot (nCD) ≃ 5.40 · 10⁵ nCD (nCD − 1) bits (30)
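A short Python check of these order-of-magnitude figures, using Eqs. (19), (27) and (28) with the constants quoted above (the factor 8/(9π²) linking the gap at the CD border to δE1(0)/N is taken from Eq. (27); the script is an illustrative aid, not part of the original derivation):

```python
import math

K_B = 8.617e-5        # Boltzmann constant, eV/K
T = 293.15            # room temperature, K
DELTA_E1_0 = -0.26    # δE1(0)/N, eV per molecule at the CD centre
N = 2e5               # elementary components per coherent domain
GAP_FACTOR = 8 / (9 * math.pi ** 2)   # relates the gap at r = R_CD to δE1(0)/N

def info_single_cd(f_coh):
    """Information stored by one isolated CD, Eq. (19) evaluated at x = 3/4."""
    return -GAP_FACTOR * DELTA_E1_0 * f_coh * N / (math.log(2) * K_B * T)

def info_network(n_cd, f_coh):
    """Information stored by n_cd closely packed interacting CDs, Eq. (28)."""
    return 2 * info_single_cd(f_coh) * n_cd * (n_cd - 1)

print(f"bulk water, single CD:        {info_single_cd(0.4):.2e} bits")   # ~1e5 bits
print(f"interfacial water, single CD: {info_single_cd(1.0):.2e} bits")   # ~3e5 bits
print(f"bulk water, 100 CDs:          {info_network(100, 0.4):.2e} bits")
```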

We see from Eq. (20) and Eqs. (29)–(30) that the information stored in n isolated
coherent domains is O(n) while, in the case of n interacting domains, for large
n, it is O(n²). Figure 2 shows the plots of Itot (isolated domains) respectively
for bulk and interfacial water, while Fig. 3 shows the plots of Ĩtot as a function of nCD.
We note the amount of information storable by water CDs is noticeable, even
for a low number of coherent domains, both for bulk and interfacial water. For
example, we have, for nCD = 100, Itot = 1.08 · 10⁷ bits, Ĩtot = 2.14 · 10⁹ bits for
bulk water and Itot = 2.70 · 10⁷ bits, Ĩtot = 5.35 · 10⁹ bits for interfacial water.
The higher the value of nCD, the higher the ratio Ĩtot/Itot. For example, at
nCD = 10⁶ we have Ĩtot/Itot ∼ 10⁶. It is interesting to estimate the amount of
quantum information storable by the coherent domains enclosed in a given vol-
ume of water, by assuming the geometrical configuration corresponding to their
closest packing satisfying Eq. (13). This can be done by considering spherically
symmetric domains characterized by a radius RCD ≃ 375 Å. This calculation is
shown in Fig. 4.

Fig. 2. Information Itot Stored by Water Domains: (a) Bulk Water; (b) Interfacial
Water as a Function of n.

Fig. 3. Information I˜tot Stored by Water Domains: (a) Bulk Water; (b) Interfacial
Water as a Function of n.

Fig. 4. Information I˜tot Stored by Water Domains as a Function of the Overall System
Volume.

It is very remarkable to note that, even small volumes of coherent water


(2 cm³ or less in our example) could be able to store, in principle, a quantity
of quantum information up to 10³⁷ bits! This very noteworthy result is due
to the increase of the energy gap of the macroscopic coherent system as the
coherence domains become closer each other. The amount of information that
can be stored in the coherent system is proportional to its degree of coherence, in
turn related to the number of coherently oscillating elementary components. We
can recognize, in this behavior, a general feature of the QED coherent dynamics
in matter. Furthermore, in the case of water, as already shown [12], the existence
of excited states of coherent domains allows them even to interact each other by
the exchange of evanescent photons due to quantum tunneling effect.
This kind of interaction can produce, if favorable constraints are satisfied, a
synchronized oscillation between coherent domains, occurring at different space-
time scales, we can call “supercoherence” (i.e. a coherence among coherent
domains) [11,12,16,17]. Supercoherence raises the coherence level and, conse-
quently, the energy gap of the system composed by many water coherent domains
oscillating in phase. This increase of energy gap δE/N would give, in turn, a
boost to the storing capabilities of a large amount of information as shown by
Eq. (26).
A large ensemble of water CDs (supercoherent or not) then acts as a negentropic “device” able to
accumulate high-entropy energy from the environment and convert it into low-entropy energy
associated to an individual macroscopic quantum state of a coherent domain, corresponding to a
large amount of storable information that, for a large set of nCD interacting domains in the
fundamental energy state (CGS), is proportional to n²CD. As regards the use of such
results in order to realize physical devices able to exploit the coherent dynamics
of water to store large amounts of quantum information, this author has already
shown in detail how quantum information can be “accumulated” in the quantum
phase associated to the macroscopic wave function describing a coherent domain
as a whole [8,9], also suggesting some possible ways to store and retrieve such
information [9].
Without entering into a more detailed analysis of such proposals that will
be the subject of forthcoming publications, we limit ourselves to emphasize just
two key features: a) the stability of the coherent state against the environmental
decoherence, even at room temperature, especially as occurring for interfacial
water; b) the scalability of the system up to a very high number of elementary
components in order to further increase the overall storing capacity of the system.
On the other hand, the requirement of a lower limit of the volume needed to
realize the coherent dynamics, namely that corresponding to just a single coher-
ent domain, could appear as a drawback compared to its utilization in the field
of computer science and, in particular, of quantum computation. Nevertheless,
this isn’t the case if we consider the prevailing strengths offered by the proposed
approach compared to the present ones, especially those concerning, for instance,
large-scale distributed computers and quantum internet.

4 Outlook and Conclusions


The theory of QED coherence predicts the existence, for condensed matter sys-
tems, of a coherent ordered macroscopic quantum state, called CGS, formed by
coherent domains, namely defined regions characterized by phase-locked oscil-
lations of the elementary components and a non-vanishing e.m. field confined
within them.
To this quantum state, described by a macroscopic wave function with a
well-defined phase, is associated an energy gap per particle (atom/molecule)
that makes it more stable compared to the non - coherent state.
The transition of the system towards CGS is associated to a decrease of
entropy, in turn corresponding to an increase of the information “stored” by
each coherent domain in the systems itself that, consequently, is “informed”
by the transition. Furthermore, when coherent domains are closely packed each
other their “binding” energy makes the energy gap higher than in the isolated
coherent domains so further increasing the energy gap characterizing them.
Yet, liquid water exhibits a very particular behavior since its coherent
domains admit a spectrum of excited coherent states, that allow them to interact
each other through the quantum tunneling of evanescent photons, so realizing
a tuned oscillation among the interacting coherent domains themselves (super
coherence).
Such process causes a subsequent reduction of the energy gap/molecule and
an increase in the amount of quantum information potentially storable inside
coherent domains.
In summary, the QED coherent dynamics in condensed matter allows the
spontaneous formation of low-entropy and stable macroscopic quantum states,
able to store a large quantity of quantum information. In this paper we gave a
preliminary, although thorough, calculation of the amount of quantum informa-
tion that can be memorized by a network of coherent domains both isolated and
closely packed in a given volume. In particular, we have shown that, for a net-
work of n closely packed interacting domains, the amount of storable quantum
information increases as n², for large n, at a given value of temperature whereas,
for isolated domains, the quantity of information is proportional to n.
The particular and meaningful case of liquid water has been analyzed numer-
ically, showing that, even for volumes of coherent water less than 2 cm³, the
amount of information stored in the system can be of the order of 10³⁷ bits!
Furthermore, when looking at the interfacial water, the system can be consid-
ered as fully coherent (and so extremely stable against the thermal environmen-
tal destructive effects) even at room temperature, so still raising the storing
capability of the coherent domains. Finally, we’d like to remark the exploitation
of QED coherent macroscopic state of liquid water as a physical substrate to
perform quantum (hyper)computation has been already studied by this author
[8–11] so, although still in an introductory stage, our present results further show
the potentials of the QED coherence in condensed matter in the field of com-
puter science and the realization of very powerful, stable and scalable quantum
computational devices.

References
1. Maggiore, M.: A modern Introduction to Quantum Field Theory, pp. 243–247.
Cambridge University Press, Cambridge (2004)
2. Del Giudice, E., Vitiello, G.: The role of the electromagnetic field in the formation
of domains in the process of symmetry breaking phase transitions. Phys. Rev. A
74, 022105 (2006)
3. Del Giudice, E., Vitiello, G.: Quantum fluctuations, gauge freedom and meso-
scopic/macroscopic stability. J. Phys.: Conf. Series 87, 012009 (2007)
4. Preparata, G.: QED Coherence in Matter. World Scientific, Singapore (1995)
5. Del Giudice, E., Tedeschi, A.: Water and autocatalysis in living matter. Electro-
magn. Biol. Med. 28, 46–52 (2009)
6. Caligiuri, L.M.: The quantum phase operator and its role in quantum comput-
ing. In: Caligiuri, L.M. (ed.) Frontiers in Quantum Computing, pp. 39–56. NOVA
Science Publisher, New York (2020)
7. Caligiuri, L.M.: QED coherence in matter, syntropy and the coherent domains as
storing “devices”. J. Phys.: Conf. Series 2197, 012004 (2022)
8. Caligiuri, L.M.: Quantum (hyper)computation by means of water coherent domains
- Part II: the computational level. In: Caligiuri, L.M. (ed.) Frontiers in Quantum
Computing, pp. 57–102. NOVA Science Publisher, New York (2020)
9. Caligiuri L.M.: Fast and accurate control of gates for quantum hypercomputation
in coherent domains of water. J. Phys. Conf. Series 2162 012025 (2022)
10. Caligiuri L.M.: Quantum (hyper)computation through universal quantum gates in
water coherent domains. J. Phys. Conf. Series 2162 012003 (2022)
11. Caligiuri, L.M.: QED coherence and super-coherence of water in brain microtubules
and quantum hypercomputation. In: Bandyopadhyay, A., Ray, K. (eds.) Rhythmic
Advantages in Big Data and Machine Learning. SRE, pp. 225–262. Springer, Sin-
gapore (2022). https://doi.org/10.1007/978-981-16-5723-8_9
12. Caligiuri, L.M.: Quantum (hyper)computation by means of water coherent domains
- Part I: the physical level. In: Caligiuri, L.M. (ed.) Frontiers in Quantum Com-
puting, pp. 1–37. NOVA Science Publisher, New York (2020)
13. Buzzacchi, M., Del Giudice, E., Preparata, G.: Coherence of the glassy state. Int.
J. Mod. Phys. B 16(25), 3771–3786 (2001)
14. Del Giudice, E., Tedeschi, A., Vitiello, G., Voeikov, V.: Coherent structures in
liquid water close to hydrophilic surfaces. J. Phys. Conf. Series 442, 012028 (2013)
15. Brillouin, L.: Science and Information Theory, pp. 152–153. Dover Publication, New
York (1962)
16. Del Giudice, E., Spinetti, P.R., Tedeschi, A.: Water dynamics at the root of meta-
morphosis in living organisms. Water 2, 3771–3786 (2010)
17. Caligiuri, L.M.: Super-coherent quantum dynamics of zero-point field and super-
luminal interaction in matter. In: Amoroso, R.L., Kauffman, L.H., Rowlands, P.,
Albertini, G. (eds.) Unified Field Mechanics II: Formulation and Empirical Tests,
pp. 331–343. World Scientific, Singapore (2018)
Antecedents of Software-as-a-Service Adoption
for Small and Medium Enterprise in Developing
Countries

Ahmed Mamdouh Abdelfatah Ibrahim1(B) and Norris Syed Abdullah2


1 Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
[email protected]
2 Azman Hashim International Business School, Universiti Teknologi Malaysia, 81310 Skudai,
Johor, Malaysia
[email protected]

Abstract. Cloud computing Software as a Service (SaaS) has become one of


the most hotly debated topics among enterprise information technology (IT) professionals. Small and medium enterprises (SMEs) with limited budgets and
human resources are among the most frequent users of cloud computing to take
advantage of this technology. Due to their limited financial resources, it is crucial
for enterprises to adopt innovative technologies that do not require initial invest-
ment and can be deployed as and when required. The rapid improvements and
innovations in SaaS in recent years have led to many companies in various indus-
tries accepting this technology as viable. The shift to SaaS comes with a number
of challenges that go beyond the technology. The adoption of cloud computing
in SMEs is influenced by a number of factors. Before deciding to use cloud-
based solutions, these conditions need to be thoroughly assessed. The objective
of this research is to investigate the significant factors which influence Cloud computing Software as a Service (SaaS) adoption by small and medium enterprises in developing nations. Previous research has demonstrated that SaaS adoption benefits SMEs greatly. Therefore, the goal of this study is to present and
develop a research model including Relative Advantage, Complexity, Security &
Privacy, Compatibility, Top Management Support, SaaS Awareness, Organiza-
tion Culture, Competitive Pressure, Regulatory Support, Personal Innovativeness
and Prior Technology Experience factors based on the integration of the Tech-
nology Organisation Environment (TOE) and Human Organisation Technology
Fit (HOT-fit) models. The study pertains to SMEs that have previously adopted
cloud computing SaaS. The data can be obtained using a quantitative technique
via surveys of SMEs in Qatar as a developing country and analyzed using the
Smart PLS tool.

Keywords: Cloud computing · SaaS · Adoption · SME’s · TOE · HOT-fit ·


Developing countries

1 Introduction
The development of telecommunications and technology has long been considered one
of the most important growth drivers in various industries [1]. This makes it necessary for


companies to adopt and use new technologies to improve quality and increase benefits
in all industries [3]. Cloud computing has made significant progress in the last decade,
with usage reaching up to 94 per cent (Shang & Kauffman, 2020). As a result, cloud
computing has become one of the most advanced IT solutions for public and private companies around the world. According to a recent study, the cloud computing market was expected to grow by more than 17% in 2019, reaching $200 billion, and to reach $278 billion by 2022 [11].
The paper is structured as follows: the first section gives an overview of SaaS cloud computing, including the research questions and objectives. The second section presents the literature review, covering CC service delivery models, the research methodology and SaaS adoption challenges. Section three contains the proposed framework and the hypotheses, while the last section presents the main conclusions and future work.

Problem Background
Low performance and a lack of economic diversity result from the absence of a technological environment for entrepreneurship, which necessitates the use of cloud computing SaaS in small and medium enterprises (SMEs). SaaS enables enterprises with limited budgets [12] and human resources to benefit from cloud computing and to reduce the cost of IT infrastructure and expertise [13, 14], improving their economic viability.
Despite the benefits of SaaS, there are a number of barriers to adoption, and different authors have pointed out different difficulties in adopting SaaS in different situations. According to the findings of [16], these include customization, security and privacy [17], virtualization and multi-tenancy [9], lack of familiarity with the definition of cloud computing services and insufficient knowledge [10], loss of control, regional regulations [10], governance [21], the lack of a well-managed and established standard, and management resistance [22].
Despite the many benefits cloud computing can offer, companies are reluctant to adopt it [26]. In the modern environment, the pace of technological innovation and new thinking is a hot topic. Although several studies have been conducted to investigate the barriers to adoption of CC, they are insufficient [27], and the question remains: how can a technology like cloud computing (CC) help SMEs overcome these problems? In the context of developing countries, Qatar offers an excellent economic climate [28], but the majority of SMEs in Qatar still struggle with the use of cloud computing SaaS, and the adoption of cloud computing is still in its infancy, with an adoption rate of only 3%.

Research Questions
This research will address the antecedents that influence the intention to adopt cloud computing SaaS for small and medium-sized enterprises in Qatar. The aim of this research is to find answers to the questions listed below.
“How can a model be developed to enable the adoption of SaaS in SMEs?” The
following sub-questions support the primary research question:

• RQ1: What is the status of SaaS adoption in SMEs in Qatar?


• RQ2: What factors influence the adoption of SaaS in SMEs in Qatar?
• RQ3: How can a model for SaaS adoption be developed for SMEs in Qatar?

Research Objectives
The research objectives can be formulated as follows:

• RO1: To explore the current status of SaaS adoption in SMEs in Qatar.


• RO2: To identify the factors influencing the adoption of SaaS among SMEs in Qatar.
• RO3: To develop a model for the adoption of SaaS for SMEs in Qatar.

This research will address the antecedents that influence the intention to adopt cloud
computing Software as a service for SMEs in Qatar.

2 Literature Review

Cloud Computing Service Delivery Models


Cloud computing is a model for ubiquitous, convenient and on-demand network access
to a shared pool of configurable computing resources (e.g. networks, servers, storage,
applications and services) that can be rapidly provisioned and released with minimal
management effort or interaction with the service provider [4].
Previous research has found that there are three main models for cloud service delivery, referred to as the SPI model, an acronym for Software, Platform and Infrastructure [5]. Infrastructure as a Service (IaaS) is the model in which a provider offers its customers paid access to storage, networks, servers and other computing resources in the cloud [6].
Platform as a Service (PaaS) is where the provider enables access to an infrastructure
so that the user can build their own application. The platform allows customers to build
their applications that run on the service provider’s infrastructure [7]. Software-as-a-
Service (SaaS): In this model, the customer can access the provider’s infrastructure via
an interface. The most commonly used interfaces are web browsers. In this model, a
single instance on the service provider’s side supports multiple access instances on the
client’s side [8].

Research Methodology
Howell defines research methodology as the general research strategy that outlines the
way in which research is to be conducted and specifies, among other things, the methods
to be used in the process [2].
In order to answer the research questions, the survey research needs to collect data from various sources. For this research, the existing literature was also consulted and reviewed. Therefore, the research design used can be described as survey research.

Software as a Service Cloud Computing Adoption Challenges


Both developing and developed countries have adopted SaaS. However, developed coun-
tries are far ahead of emerging countries in terms of adoption and use of the technology.
Despite the many benefits of cloud computing, its value has yet to be recognized in devel-
oping countries. Many different studies [29, 30] have contributed significantly to the
adoption of cloud computing in developed countries. For example, [29] studied CCA at

CC adopter organizations in the UK using the TOE model. The study's findings revealed that while risks are frequently connected with SMEs' adoption of innovative technologies, organizational innovativeness and the associated capacities play a critical role in cloud computing adoption.
In Malaysian universities, a study was conducted to determine the factors underlying SaaS adoption [31]. The research emphasized the importance of innovation for SaaS adoption, as well as effort expectancy, social influence, performance expectancy, self-efficacy, and peer and even superior components. As stated by Awodele et al., issues in cloud
computing services can be in terms of (a) Network and Data Security, (b) Governance,
Compliance and legal, and (c) Communication interface and Virtualization Security
[18]. Haider and Selvan have identified the inability to maintain data confidentiality, owing to the huge number of access devices and applications used to store and manage data in cloud-based storage, as the most prominent issue in cloud computing [19].
Security and privacy challenges associated with SaaS can be summarized as follows [15]. According to Nema [9], security and privacy issues in cloud computing concern loss of control, lack of transparency, virtualization, and multi-tenancy. Kumar et al. [10] noted that loss of control in cloud computing can take the form of data loss and breaches, and of data storage and sharing under multiple regional regulations. Research on the use of cloud computing indicates that nearly 63% of customers would refuse to use a cloud service provider's services if the vendor failed to prevent data loss through unauthorized access.

Factors Influencing Adoption of SaaS Cloud Computing


While SaaS adoption is a strategic decision made by businesses to cut costs, reduce risk, and increase system flexibility, challenges to the adoption of SaaS can, as stated by Branco et al. [20], be categorized in terms of the advantages of cloud computing, the maturity of the business, trust in the cloud service provider, the analysis of risks, and service level agreements.
The shortage of qualified personnel and leadership abilities, high labor and space expenses [32], which are regarded as failure factors within the organization, as well as the lack of a culture of entrepreneurialism strongly based on good strategy implementation, are the most specific issues facing the small and medium-sized enterprise segment in Qatar [33]. To lower the risk, the business owner must also identify and examine the current scenario as well as the problematic areas.
As discussed in [34], various factors affect the decision to adopt SaaS services [23]. While some companies have technological concerns regarding data security, others find the adoption of cloud services complex. This is because cyber-attacks have been rising significantly in recent years; hackers deliberately look for new ways to breach security and are often successful.
In addition, because cloud computing is a comparatively advanced and ever-evolving technology, most users are largely unaware of the appropriate management and security features associated with it and hence find its management more complex than on-site information storage, maintenance and safekeeping. On the other hand, with on-site storage it is not possible for anyone to access data at their own convenience: an unauthorized user would have to enter the storage premises and acquire information from individuals with knowledge of the security features in order to breach security. Moreover, the sharing of cloud services by a number of users reduces organizations' control over their IT/IS.
While small and medium enterprises find it quite comfortable to enjoy the services of a common shared pool, larger organizations do not find it convenient. This is because a shared information pool allows users to use information stored by other users, so the costs of information collection, maintenance and security fall significantly, which in turn helps SMEs reduce costs.
Quality of service and security provisions affect government organizations' decisions to use cloud services [24]. This is because data held by some government organizations, such as national financial institutions and the defense sector, are highly confidential, and no chance of unauthorized access can be taken. The use of cloud services by financial and defense organizations could make them susceptible to cyber-attacks that could push the entire country into a financial crisis or a breach of national security.
Tehrani and Shirazi [25] identified factors that influence the adoption decision: external support, competitive pressure in the market, decision-makers' and employees' knowledge of the efficiencies of cloud computing, information intensity, potential advantages, privacy and security, innovativeness, complexity, trialability, and compatibility with business requirements, existing technologies, and company norms.
To summarize, the majority of past research has focused on businesses as a whole, with only a few studies focusing specifically on small and medium-sized businesses. Previous research has primarily examined the adoption of CC, with only a few studies focusing specifically on SaaS. While these studies may not exactly pinpoint the benefits that small and medium-sized businesses can achieve by adopting SaaS, very few studies address the reasons why their issues with cloud computing SaaS persist. Only a small amount of research has been done in Qatar to investigate the antecedents of SMEs adopting SaaS, and only 3% of Qatari SMEs have done so. Very few studies have examined human-level adoption for small and medium-sized businesses in developing or industrialized countries. In conclusion, more research is needed to determine the various antecedents that may influence SaaS adoption among SMEs in developing nations.

3 Proposed Framework and Hypotheses


This study aims to examine the antecedents that may obstruct SaaS cloud computing adoption by implementing a combined model consisting of the TOE model integrated with the HOT-fit model.
The proposed model, shown in Fig. 1, incorporates TOE (Technology-Organization-Environment) as one of the most applicable frameworks for technology innovation at the organizational level, while drawing on the DOI technological constructs (Relative Advantage, Complexity) [35]. A few newly proposed constructs (Organizational Culture, Awareness of SaaS) are added at the organizational level by customizing TOE to define the significant factors that affect cloud computing SaaS adoption in SMEs. That is, the Technology context incorporates three DOI factors (Relative Advantage, Compatibility, and Complexity) together with Security, the Organization context comprises Company Size, Awareness, and Organizational Culture, and the Environment context comprises Competitive Pressure and Government Policy, all added to the original TOE model. Furthermore, with the addition of HOT-fit, human factors such as innovativeness and technical competence are considered. A compact sketch of this construct grouping is given after the hypotheses below.

Fig. 1. The proposed research model (TOE and HOT-fit integration)

• H1: Relative Advantage would significantly influence SMEs to adopt SaaS.
• H2: Complexity of SaaS would significantly influence SMEs' SaaS adoption.
• H3: Security & Privacy of SaaS would significantly influence SMEs to adopt SaaS.
• H4: Compatibility of SaaS would significantly influence SMEs' SaaS adoption.
• H5: Top Management Support would significantly influence SMEs to adopt SaaS.
• H6: SaaS Awareness would significantly influence SMEs to adopt SaaS.
• H7: Organization Culture would significantly influence SMEs to adopt SaaS.
• H8: Competitive Pressure would significantly influence SMEs to adopt SaaS.
• H9: Regulatory Support would significantly influence SMEs to adopt SaaS.
• H10: Personal Innovativeness would significantly influence SMEs to adopt SaaS.
• H11: Prior Technology Experience would significantly influence SMEs to adopt SaaS.
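The following plain-Python sketch is illustrative only (it is not the authors' survey instrument or analysis): it summarizes the proposed model as a data structure, grouping the constructs named above under the TOE contexts and the HOT-fit human dimension, and listing hypotheses H1–H11 as factor-to-outcome pairs. The outcome label "Intention to adopt SaaS" is an assumed shorthand for the adoption intention construct.

# Illustrative summary of the proposed research model (assumed grouping, see note above).
RESEARCH_MODEL = {
    "Technology": ["Relative Advantage", "Complexity", "Security & Privacy", "Compatibility"],
    "Organization": ["Top Management Support", "SaaS Awareness", "Organization Culture"],
    "Environment": ["Competitive Pressure", "Regulatory Support"],
    "Human (HOT-fit)": ["Personal Innovativeness", "Prior Technology Experience"],
}

# Hypotheses H1-H11: each factor is hypothesized to influence the adoption intention.
FACTORS_IN_HYPOTHESIS_ORDER = [
    "Relative Advantage", "Complexity", "Security & Privacy", "Compatibility",
    "Top Management Support", "SaaS Awareness", "Organization Culture",
    "Competitive Pressure", "Regulatory Support",
    "Personal Innovativeness", "Prior Technology Experience",
]
HYPOTHESES = {f"H{i + 1}": (factor, "Intention to adopt SaaS")
              for i, factor in enumerate(FACTORS_IN_HYPOTHESIS_ORDER)}

if __name__ == "__main__":
    # Print each hypothesized path, e.g. "H1: Relative Advantage -> Intention to adopt SaaS".
    for name, (factor, outcome) in HYPOTHESES.items():
        print(f"{name}: {factor} -> {outcome}")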

4 Conclusion
Among all on-demand services, SaaS cloud computing is a technology through which computation can be carried out at extremely low cost. To encourage more companies to adopt SaaS cloud computing, service providers need to address all the challenges affecting adoption for their clients. In this paper, we have addressed and demonstrated the numerous obstacles and challenges related to SaaS cloud computing adoption.
Addressing these challenges requires future research from different areas such as informatics, statistics, risk modeling, social sciences, and physiological factors, as well as research investigating the relationships between Relative Advantage, Complexity, Security & Privacy, Compatibility, Competitive Pressure, Regulatory Support, Awareness, Top Management Support, Culture, Personal Innovativeness, Prior Technology Experience, and SaaS adoption. As future work, a study needs to be conducted to test the developed model and hypotheses by collecting data through a quantitative survey of SMEs in Qatar as a developing country and analyzing the data with the SmartPLS tool to identify the significant factors that may influence SaaS adoption in developing countries.

References
1. Gangwar, H., Date, H., Ramaswamy, R.: Understanding determinants of cloud computing
adoption using an integrated TAM-TOE model. J. Enterp. Inf. Manag. 28(1), 107–130 (2015).
https://doi.org/10.1108/JEIM-08-2013-0065
2. Howell, K.E.: Introduction to the Philosophy of Methodology. Sage Publications, London
(2013)
3. Arvanitis, S., Kyriakou, N., Loukis, E.N.: Why do firms adopt cloud computing? A compara-
tive analysis based on South and North Europe firm data. Telematics Inform. 34(7), 1322–1332
(2017). https://doi.org/10.1016/j.tele.2016.05.013
4. Mell, P.M., Grance, T.: The NIST definition of cloud computing (2011). https://doi.org/10.
6028/nist.sp.800-145
5. Hashizume, K., Rosado, D., Fernández-Medina, E., Fernandez, E.: An analysis of security
issues for cloud computing. J. Internet Serv. Appl. 4(1), 5 (2013)
6. IaaS, PaaS and SaaS – IBM Cloud service models. https://www.ibm.com/cloud/learn/iaas-
paas-saas. Accessed 24 July 2019
7. Cloud computing service and deployment models: layers and management. Choice Rev.
Online 50(07) (2013). https://doi.org/10.5860/CHOICE.50-3896
8. Alajmi, Q., Sadiq, A.S., Kamaludin, A., Al-Sharafi, M.A.: Cloud computing delivery and
delivery models: opportunity and challenges (2018). https://doi.org/10.1166/asl.2018.11537
9. Nema, S.: A survey of security and privacy challenges in cloud computing. Int. J. Adv. Res.
Comput. Commun. Eng. 5(3), 191–194 (2016)
10. Kumar, D., Samalia, H.V., Verma, P.: Exploring suitability of cloud computing for small and
medium-sized enterprises in India. J. Small Bus. Enterp. Dev. 24(4), 814–832 (2017). https://
doi.org/10.1108/jsbed-01-2017-0002
11. Gartner, Assessing the Security Risks of Cloud Computing (2008).
http://www.gartner.com/DisplayDocument?id=685308 La. Accessed July 2020
12. Mwaniki, P., Ondiek, C.: Evaluation of the effects of SaaS on SMEs in Nairobi County, Kenya.
J. Inf. Syst. Eng. Manag. 3(3), 20 (2018)
13. Fakieh, B., Blount, Y., Busch, P.: SMEs and cloud computing: the benefits to the national
economy and global competitiveness. Paper presented at the Conference: The 13th European
Mediterranean & Middle Eastern Conference on Information Systems. EMCIS (2016)
14. Trinh, T.P., Pham, C.H., Tran, D.: An adoption model of Software as a Service (SaaS) in
SMEs. Paper presented at the PACIS (2015)

15. Sakr, S., Zomaya, A.: Encyclopedia of Big Data Technologies, 1st edn. Springer, Cham (2019).
https://doi.org/10.1007/978-3-319-77525-8 . eReference ISBN 978-3-319-77525-8
16. Aleem, S., Ahmed, F., Batool, R., Khattak, A.: Empirical investigation of key factors for SaaS
architecture dimension. IEEE Trans. Cloud Comput. 9, 1037–1049 (2019)
17. Sun, Y., Zhang, J., Xiong, Y., Zhu, G.: Data security and privacy in cloud computing. Int. J.
Distrib. Sens. Netw. 10(7), 190903 (2014)
18. Awodele, O., Ominike Akpovi, A., Adebayo, A.O., Tayo, O.O.: Security and privacy issues
in cloud computing (2017). ISSN: 2394-4714
19. Haider, Y., Selvan, S.: Confidentiality issues in cloud computing and countermeasures: a
survey (2016)
20. Branco, T., Jr., de Sá-Soares, F., Rivero, A.L.: Key issues for the successful adoption of cloud
computing. Procedia Comput. Sci. 121, 115–122 (2017)
21. Awodele, O., Adebayo, A.O., Tayo, O.O.: Security and privacy issues in cloud computing.
Commun. Appl. Electron. 7(3), 14–17 (2017)
22. Ahmed, A.M., Moreton, R., Mehdi, Q.H., Elmaghraby, A.: E-government services challenges
and opportunities for developing countries: the case of Libya. Paper presented at the 2013
Second International Conference on Informatics and Applications (ICIA) (2013)
23. Hsu, C.-L., Lin, J.-C.: Factors affecting the adoption of cloud services in enterprises. IseB
14(4), 791–822 (2015). https://doi.org/10.1007/s10257-015-0300-9
24. Alsanea, M., Barth, J., Griffith, R.: Factors affecting the adoption of cloud computing in the
government sector: a case study of Saudi Arabia. Int. J. Cloud Comput. Serv. Sci., 36 (2014)
25. Tehrani, S.R., Shirazi, F.: Factors influencing the adoption of cloud computing by small
and medium size enterprises (SMEs). In: Yamamoto, S. (ed.) HCI 2014. LNCS, vol. 8522,
pp. 631–642. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07863-2_60
26. Fishman, C.: The insourcing boom. The Atlantic, pp. 44–52, December 2012
27. Gide, E., Sandu, R.: A study to explore the key factors impacting on cloud based service
adoption in Indian SMEs. In: 12th International Conference on e-Business Engineering,
pp. 387–392. IEEE (2015)
28. Arabian Business: Qatar SMEs face barriers to 2022 projects (2013). https://www.arabianbu
siness.com/qatar-smes-face-barriers-2022-projects-485087.html
29. El-Haddadeh, R.: Digital innovation dynamics influence on organisational adoption: the case
of cloud computing services. Inf. Syst. Front. 22(4), 985–999 (2019). https://doi.org/10.1007/
s10796-019-09912-2
30. Senarathna, I., Wilkin, C., Warren, M., Yeoh, W., Salzman, S.: Factors that influence adoption
of cloud computing: an empirical study of Australian SMEs. Australas. J. Inf. Syst. 22 (2018).
https://doi.org/10.3127/ajis.v22i0.1603
31. Yadegaridehkordi, E., Nilashi, M., Shuib, L., Samad, S.: A behavioral intention model for
SaaS-based collaboration services in higher education. Educ. Inf. Technol. 25(2), 791–816
(2019). https://doi.org/10.1007/s10639-019-09993-1
32. The Report: Qatar 2008 - Page 178 - Google Books Result. https://books.google.com.
qa/books?id=stSsFxDTl4QC&pg=PA178&lpg=PA178&dq=problem+of+high+cost+of+
skilled+labour+%26+spaces+in+qatar&source=bl&ots=5y1PtZWyjy&sig=ACfU3U342dib
1LmBUbGBh0Szt8oGi0AA1g&hl=en&sa=X&ved=2ahUKEwibh76v7ZroAhXo63MBHX
JHCXMQ6AEwAHoECAoQAQ#v=onepage&q=problem%20of%20high%20cost%20of%
20skilled%20labour%20%26%20spaces%20in%20qatar&f=false
33. Singh, A., Sharma, S., Kumar, S.R., Yadav, S.A.: Overview of PaaS and SaaS and its applica-
tion in cloud computing. Paper presented at the 2016 International Conference on Innovation
and Challenges in Cyber Security (ICICCS-INBUSH) (2016)
34. Hsu, C.-L., Lin, J.C.-C.: Exploring factors affecting the adoption of internet of things services.
J. Comput. Inf. Syst. 58(1), 49–57 (2016). https://doi.org/10.1080/08874417.2016.1186524
35. Rogers, E.M.: Diffusion of Innovations, 5th edn. Free Press, New York (2003)
Software as a Service Challenges: A Systematic
Literature Review

Ahmed Mamdouh Abdelfatah Ibrahim1(B) , Norris Syed Abdullah2 ,


and Mahadi Bahari2
1 Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
[email protected]
2 Azman Hashim International Business School, Universiti Teknologi Malaysia, 81310 Skudai,
Johor, Malaysia
{Norris,Mahadi}@utm.my

Abstract. Software as a Service (SaaS), a type of cloud computing, has become a contentious topic among enterprise information technology (IT) professionals. In recent decades, rapid advances and breakthroughs in SaaS have enabled many businesses in numerous industries to consider it a promising technology worth using. Nevertheless, such a new approach is up against a number of obstacles. A systematic literature review of the possible antecedents of SaaS cloud computing has therefore been conducted to describe the significant challenges facing SaaS cloud computing adoption. Articles describing the difficulties of SaaS adoption were compiled, and for a concentrated discourse on solutions, we organized the key challenges into an ontology. As a result, out of more than 68 factors that were addressed, 16 factors have been identified and discussed in depth as significant factors that may affect CC SaaS adoption. A comprehensive framework of SaaS adoption obstacles will be required to expedite the adoption of this innovation.

Keywords: Systematic review · SLR · Antecedents · Challenges · Cloud


computing · SaaS

1 Introduction
Migration to SaaS faces various obstacles that go beyond the technology itself. A mass of
antecedents influences the adoption of cloud computing SaaS. These antecedents must
be systematically evaluated prior to making the decision to adopt SaaS solutions. The
goal of this research is to find and assess previous studies on SaaS cloud computing
adoption challenges, as well as to pinpoint any gaps that may exist.
The following is a breakdown of the structure of this paper: Sect. 1 provides an overview of cloud computing, including a definition of CC, challenges to cloud expansion, the different service delivery models, and the benefits of SaaS. Section 2 delves into the research methodology, why the SLR is relevant to this study, and the scope of the study. Section 3 summarizes the results of the search and the main conclusions, while


Sects. 4 and 5 contain the primary discussion, conclusions, and future work, with explanations of the limitations and factors that may influence this research.

Problem Background
Cloud computing is the supply of services via the Internet at low rates, paying only for what you consume [1]. Cloud computing, as defined by the National Institute of Standards and Technology (NIST), is a model for delivering and implementing a shared pool of configurable computing resources (e.g. networks, servers, storage, apps, and services) with relatively minimal effort or interaction between organizations and consumers [2]. Furthermore, the moniker "cloud" derives from the cloud symbol traditionally used in flowcharts and diagrams to represent the Internet.
According to [3] and [4], there are three ‘levels’ of cloud computing delivery mod-
els from which consumers can pick. Infrastructure-as-a-Service (IaaS), Platform-as-
a-Service (PaaS), and Software-as-a-Service (SaaS) are the three services. Figure 1
depicts the differences among those levels [5]. IaaS consists of infrastructure-centric IT
resources that allow users to have complete control over their configuration and use. An
IaaS environment typically consists of computer hardware, operating systems, networks,
connectivity, and other raw IT resources.

Fig. 1. Cloud computing service delivery models

PaaS refers to a data modeling framework that is ready to use. This service level
consists of IT resources that have been configured and deployed but do not include
infrastructure [6]. SaaS provides a comprehensive solution that is run and managed by
the service provider [7]. It follows software licensing methods [8] in which customers can access apps made by others via an internet browser [9]. SaaS is a consumable cloud-based service paid for by a set of cloud users. Examples include Office 365, Jira, Google Drive, Oracle CRM, MS Azure, and HubSpot.

SaaS Benefits and Characteristics


As stated by Avram [10], cloud computing possesses a range of benefits and importance from an enterprise's point of view [11], as shown in Fig. 2. Cloud computing helps companies reduce the cost of market entry [12]; it also provides immediate access to hardware resources without any capital investment, so market entry becomes faster. By helping to lower the IT barriers that tend to hinder innovation, it becomes easier for enterprises reliant on precise information to scale their business.
With the adoption of cloud-based services, SMEs can also reduce the cost of proper maintenance and monitoring of their computer systems. Cloud computing adoption provides a high level of scalability by providing significant storage capacity and accurately managing the organization's business data [13]. It also provides a high level of privacy and security, as it has built-in security mechanisms. SMEs can gain significant benefits and increase their business efficiency [14] by adopting cloud computing.

Fig. 2. Cloud computing benefits

Despite the growing acceptance of cloud computing, entrepreneurs and scholars have been vocal about the antecedents and obstacles that this new paradigm presents. Some of the difficulties are critical in nature, such as privacy and data confidentiality. Other difficulties, like inadequate performance, vendor lock-in, and limited bandwidth, are a logical outgrowth of this innovation [15]. Those antecedents must be addressed properly to increase the level of SaaS adoption. We conducted a systematic literature review of potential SaaS issues and used this information to classify these obstacles into a categorization that can be used as a framework to encourage an international discussion on approaches and tools. The goal of our study is to learn more about the kinds of antecedents and challenges that have recently emerged.

2 Methodology
As described by Armstrong et al. [16], systematic reviews are a kind of literature research using systematic methods to gather data and evaluate previous studies. Transparency is considered one of the important principles of systematic literature reviews [17]. An SLR is meant to provide a detailed and comprehensive review of previous studies relevant to the main research question by establishing formalized and well-defined questions, then investigating the related papers [18], evaluating all relevant research, and finally synthesizing the findings. This corresponds to the three main phases shown in Fig. 3: plan, conduct, and report [19].

Fig. 3. SLR approach

Following the summary and guidelines of [20–22], a detailed set of steps was extracted to perform the SLR in this research. These steps are:

1. Identify Research Questions and Forming Query Strings: Research questions have been formulated to achieve the stated research objectives discussed in Sect. 1. From the research questions, the extracted keywords are "Cloud", "Adoption", "Cloud Computing", "Challenges", "Benefits", "Factors", "Enterprise", "Organization", "Critical", "Influence", "Intention", "Model" and "Framework", combined with the Boolean operators AND and OR (an illustrative query string is sketched after this list of steps).

Data was collected from Scopus, Thomson Reuters, Elsevier, Science Direct, Springer, IEEE, Library Genesis, Google Scholar, and WorldCat. In addition, several ranked journals, such as Telematics and Informatics, Computer Standards & Interfaces, Journal of Enterprise Information Management, Information Systems Frontiers, Information Development, Information Systems, International Journal of Information Systems and Project Management, and Australasian Journal of Information Systems, were searched to find papers published in 2015 or later.

2. Set Inclusion/Exclusion Criteria:

• Inclusion criteria used to extract the most relevant papers:

– Papers published in 2015 or later.
– Studies written in English.
– Studies published in journals, conferences, conference proceedings, or workshops.
– Studies addressing cloud computing adoption challenges.

• Exclusion criteria used to exclude non-related papers:

– Studies published before 2015.
– Studies not written in English.
– Studies not related to cloud computing adoption.
– Studies still under review.

3. Quality Assurance: For a positive and valuable analysis, quality assurance should be taken into consideration by applying the criteria below:

• Choose studies from reputable and esteemed libraries and repositories only.
• Articles from well-known journals only.
• Ranked journals only.

4. Extract Literature Based on the Predefined Criteria: Using the predefined keywords, 4872 results were found. Filtering these results systematically and applying the predefined criteria reduced them to 639. After excluding unrelated titles, 214 papers related to cloud computing adoption remained. We then filtered this number further by reading the abstract, introduction, and conclusion of each paper, leaving 79. Finally, after applying the quality assurance criteria and reading the full text of each study, seventeen papers were selected, as illustrated in Fig. 4 (and in the selection-funnel sketch after this list of steps).

Fig. 4. Final paper selection process

5. Synthesize Data and Document Outcomes: The outcome of reviewing the selected 17 papers is presented in Table 1, which shows the selected papers and the measured factors.
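The sketch below (plain Python, illustrative only) shows one possible way to assemble a Boolean query string from the keywords listed in step 1 and to record the staged selection counts reported in step 4. The grouping of the keywords into topic, aspect and context terms, and the exact query syntax, are assumptions made here for illustration; the paper does not specify the query format used for each database.

# Illustrative SLR search sketch (assumed keyword grouping and query syntax).
topic_terms = ["Cloud", "Cloud Computing"]
aspect_terms = ["Adoption", "Challenges", "Benefits", "Factors", "Critical",
                "Influence", "Intention", "Model", "Framework"]
context_terms = ["Enterprise", "Organization"]

def or_group(terms):
    # Quote each keyword and join the group with OR.
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = " AND ".join([or_group(topic_terms), or_group(aspect_terms), or_group(context_terms)])

# Staged selection funnel (counts taken from step 4 above).
funnel = [
    ("Keyword search results", 4872),
    ("After systematic filtering and inclusion/exclusion criteria", 639),
    ("After excluding unrelated titles", 214),
    ("After abstract/introduction/conclusion screening", 79),
    ("After quality assurance and full-text reading", 17),
]

if __name__ == "__main__":
    print(query)
    for stage, count in funnel:
        print(f"{stage}: {count}")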

Table 1. The selected papers and measured factors

Year & author | Country | Measured factors
2017 [23] | Spain | Top management support, communication, training, size of the firm, and technological complexity
2019 [24] | Portugal | Technological, organizational, environmental
2016 [25] | Malaysia | Data security, technology trust, vendor trust, support from the government and the external influences, ERP functionality fit
2019 [26] | United Kingdom | IT innovation-driven competitiveness, innovation IT capability, perceived innovation risks, organizational innovativeness, perceived innovation barriers
2017 [27] | India | Relative advantage, compatibility, complexity, organizational readiness, top management commitment, training and education, perceived ease of use (PEOU) and perceived usefulness (PU)
2016 [28] | Ghana | Relative advantage, security concern, compatibility, firm size, firm scope, top management support, technology readiness, competitive pressure, trading partners' pressure, regulatory support
2015 [29] | China | Behavioural and socio-technical, technological, organizational, and environmental
2018 [30] | North America & Europe | Compatibility, complexity, relative advantage, image, security & trust, attitude toward cloud
2017 [13] | India | Perceived benefits, top management support, competitive pressure, perceived concerns, security and privacy, reliability
2020 [31] | Malaysia | Technology fit, effort expectancy, social influence, performance expectancy, self-efficacy, peer and even superior elements
2017 [32] | South Korea | Relative advantage, competitive pressure, security and privacy, compatibility, trialability, observability, IT resources, culture, social influences
2018 [33] | Malaysia | Cost saving, support from top management, technology readiness, relative advantage, competitive pressure, external support
2015 [34] | United Kingdom | Relative advantage, compatibility, complexity, top management support, firm size, technology readiness
2016 [35] | Namibia | Attitude towards change, satisfaction with the existing system, lack of knowledge, compatibility, unreliable internet, data security
2015 [36] | Taiwan | CIO innovativeness, perceived technical competence, data security, complexity, compatibility, costs, relative advantage, top manager's support, adequate resources, benefits, government policy, perceived industry pressure
2017 [37] | Saudi Arabia | Quality of service, security, privacy, trust, relative advantage, compatibility, trialability, top management support, external support, culture
2018 [38] | Australia | Awareness of cloud computing, quality of service, cloud relative advantage, security, privacy, flexibility

3 Results and Findings


Out of more than 68 factors which have been addressed, many challenges have been
identified as the significant factors that may affect CC SaaS adoption. These factors are:

Security: Although most SMEs use SaaS services and enjoy the services of a common cloud space, security is considered one of the most significant challenges in adopting cloud computing [39, 40]. The logical isolation of the various virtual machines that underlies SaaS cloud computing affects vulnerability to data piracy and data security. Security and privacy challenges associated with cloud computing can be addressed as discussed in [41]. Security breaches in SMEs have occurred through unauthorized access to the data network and faulty authentication codes. Research on the use of cloud computing indicates that nearly 63% of customers would refuse to use a cloud service provider's services if the vendor failed to prevent data loss through unauthorized access.

Privacy: Data privacy in SaaS cloud computing [42] refers to the prevention of potential adversaries by cloud services when users access sensitive data. It is achieved by assessing user behavior against the user's access model. Moreover, researchers have also focused on a technology used for maintaining data privacy in cloud computing: ORAM, otherwise known as oblivious RAM [43].

Data Confidentiality: Similarly, data confidentiality in cloud computing [42] refers to the implementation of authentication and access control strategies so that private and highly confidential data, such as bank account information and access passwords, can be stored securely. Makkaoui [44] has defined data confidentiality in terms of data storage and data treatment. While confidentiality of data storage helps secure stored data against unauthorized access, confidentiality of data treatment requires data to be encrypted first and then stored in the cloud [45].

Governance: Any business that deals with a considerable amount of data needs to make sure that all assets are fully controlled and properly managed. Without appropriate data governance procedures, no SME will successfully manage its information, and the privacy of that information will be broken [46]. A further concern in cloud computing is the availability of confidential information regarding the location and security features of the data center. Haider and Selvan [15] have identified the inability to maintain data confidentiality, owing to the huge number of access devices and applications used to store and manage data in cloud-based storage, as the most prominent issue in cloud computing. In addition, it is well known that users of cloud computing store data in, and extract data from, a shared pool of computing resources.

Interoperability: The lack of well-managed and well-established standards sometimes makes it impossible for organizations to adopt cloud-based services in their data centers. In addition, the lack of proper Internet standards complicates moving from one communication service provider (CSP) to another in case of termination of Internet services. As a result, it is not easy, and indeed challenging, for employees in an organization to extract their valuable program information. The arrangement is always beneficial for the CSP, but the employee often faces various implementation risks. SMEs are always looking to achieve maximum efficiency and increase their benefits [47] through an excellent range of services.

Management Resistance: Management resistance acts in SMEs in two ways, as belief-related or regulatory challenges. The primary driver of SMEs' implementation of cloud-based services is the attitude of the business manager [61], together with the manager's experience and knowledge. Sometimes business managers in SMEs do not possess a clear understanding of ICT and cannot clearly see the opportunity behind the implementation. It is very difficult for managers to achieve their business goals without the proper application of cloud-based services [47].

Financial Resistance: This is one of the critical factors behind managers not adopting cloud computing in their businesses. The underlying reasons include the fact that many SMEs cannot spend a large amount of money on implementing cloud-based services. Poor technical knowledge, together with the lack of suitable investors to fund the cloud-based technology, gives SMEs minimal bargaining power. Cloud-based technology is also new to the market; therefore, it requires adequate knowledge and technical skills to implement, particularly for SMEs [48].

Quality Assurance Transparency: Firm contracts within the legal regulations are essential for a transparent and varied audit methodology of continuous analysis of the multiple services provided. Most major organizations find it challenging to analyze the overall performance of the business after incorporating cloud-based services [47, 49]. Infrastructure wholly owned, organized and managed by the CSP makes it very difficult for the information technology manager to adopt this technology. In addition, a service-oriented cloud-based computing model always raises various service management issues. Software quality and assurance management is a critical factor for employees working in SMEs that have incorporated cloud-based services [47].

Regulatory Requirement: Law enforcement becomes challenging when customers' personal data are held in different jurisdictions by different cloud service providers. Specific consideration of the various legal and organizational data requirements helps to secure the organization's own information [48]. In addition, careful consideration of SMEs' business data is required, because the physical location [37] of the organization's business data usually determines the law that governs the management of those data. Both parties must also pay special attention to ensuring that the different clauses address the various security-related risks, change control, and the allocation of business liability [26, 50].
Regulatory Support: This is considered the most significant factor affecting cloud adoption [51]. In addition, cloud-based computing services help SMEs manage and integrate with their various trading partners for the delivery of products to the end user [52].
Proficient Users: Today, no business can survive for an extended period without a unique, proper, and intelligent business plan. As SMEs in Qatar grow continuously, better data storage and data management procedures are needed within the organization, and the successful adoption of cloud computing requires the presence of adequately trained individuals [52, 53].
Data Confidentiality: Data confidentiality is one of the most important considerations for clients who store their confidential and private data in the cloud. Various strategies are used to ensure data confidentiality, which is a very important matter as it is associated with the idea of trustworthiness. Improper control over data stored in the cloud is a reason for clients' lack of trust in several cloud service providers. It should be kept in mind that it is not the safest idea for clients to store their confidential data directly in cloud storage.
Managing Multiple Clouds: Nowadays, many organizations use the concept of multi-cloud. Almost 80% of organizations around the world are using multiple clouds; on average, those enterprises use almost five different clouds. However, with the increasing use of multi-cloud, complications also increase. One of the main reasons it is becoming more complicated is that monitoring every cloud environment is not an easy job.
Many organizations go for the concept of multi-cloud because of its flexibility and sustainability. However, constant change is one of the key challenges of multi-cloud management. For monitoring what is happening in the cloud environments, many organizations depend on cheap or free tools provided by the vendors, but these tools are full of drawbacks. In simple words, for monitoring multiple vendors, multiple tools need to be used. The lack of cloud support in legacy tools is another key challenge for managing multiple clouds.
Most legacy tools are not designed for cloud monitoring. Although some legacy vendors have upgraded their monitoring tools to provide cloud support, one significant challenge is selecting multiplatform monitoring tools for the management of multiple clouds. One of the most common challenges an organization can face while managing multiple clouds is the lack of the required and proper skills among its employees. It is natural that, for the adoption of a new and updated technology, an organization must have trained employees, otherwise the adoption will be a failure.

Performance: There are several performance issues in cloud computing services. One is that applications that are not suitable for the cloud cannot be used, so it is very important to identify the applications most suitable for cloud computing. Another important point is being aware of which physical server an application is running on; without knowing this, no SME can find the root causes of any performance-related problems. It is also necessary for an SME or enterprise to know how much CPU is consumed by a particular application, and to ensure that services are allocated according to the priorities of the specific business.

Service Level Agreement (SLA): Since SMEs rely on the cloud service provider to handle and manipulate their data, an accepted agreement including an SLA should be signed [54]. Many users and companies resist the adoption of cloud computing solutions because of concerns about the privacy of their confidential information and the quality of the provided service. In this context, a Service Level Agreement can be used. According to Alkhater [37], service quality in the SLA plays a crucial role in increasing cloud adoption. It increases trust in the products provided to clients through a transparent form of guarantees given by the service providers to the subscribers.

Confirmation: Confirmations are provided by the service providers to clients about their services and are one type of guarantee to the clients of how good the service will be. Usually, in this context, clients receive confirmation of several security aspects, such as data security and the guarantee that no important customer information stored in the cloud will be lost.

Quality of Service: This is another confirmation that clients get from their service provider. If clients do not receive any trustworthy confirmation from the service providers, they will not take the service from them. Besides, functional testing and non-functional testing are two important areas that have an impact on the adoption of cloud computing services.

App Rating and Free Alternatives to Paid Apps: Nowadays, free apps are becoming more popular than paid apps. However, this has some drawbacks. Free apps are easy to get and access, it is true, but at the same time their quality is often not up to the mark. Paid apps usually have better quality: they provide better features and facilities than free apps. Besides quality, paid apps are also more secure to use, as the developers put more effort into developing them than into free apps. In the adoption of cloud computing, security and confidentiality are among the most important considerations. Therefore, if an app is free but provides poor security and quality, and fewer features, clients will ignore it.

Besides these, cost, service provider reliability, and vendor lock-in are some other critical challenges for the adoption of cloud computing [53]. Organizations are also influenced by some prominent factors to adopt cloud computing technology, the most prominent of which are the ability to reduce cost and the related benefits. Apart from these, further factors identified by Tehrani and Shirazi [55] influence the decision: external support, competitive pressure in the market, decision-makers' and employees' knowledge of the efficiencies of cloud computing, information intensity, potential advantages, privacy and security, innovativeness, complexity, trialability, and compatibility with business requirements, existing technologies, and company norms.
Awodele et al. [39] stated that issues in cloud computing services can be in terms of (a) network and data security, (b) governance, compliance and legal issues, and (c) communication interfaces and virtualization security. As discussed in [40], various factors affect the decision to adopt cloud services.

4 Discussion
As discussed, there are many significant challenges to adopting SaaS cloud computing. Security is considered the most significant factor affecting SaaS adoption [28, 30, 32, 37, 56, 58, 59]. While Matias and Hernandez [51] consider regulatory support to be the most significant factor affecting the adoption decision, other research [37] argues that quality of service, security, privacy, and trust have a significant impact on it.
According to some studies [27, 28, 30, 32, 34, 54, 56–58, 60], benefits such as cost saving or positive relative advantage have a significant impact on the adoption decision. Other research adds that CC awareness and innovativeness skills have a significant direct impact on the adoption decision [26, 27, 38, 59, 60, 61]. Further researchers [37, 38] added that quality of service has a significant and positive relationship with CC adoption.
However, the notable conclusion is that, no matter how many antecedents are faced in adopting SaaS, it is always beneficial for SMEs to utilize SaaS cloud computing technology [2]. One significant finding is that no comprehensive study has been conducted on the antecedents that affect cloud computing adoption: none of the reviewed studies focused on both benefits and sacrifices from a technological and behavioral perspective. Another significant finding is that none of them studied human factors (prior technology experience, personal innovativeness) or the attractiveness of alternatives. Moreover, it is noticeable that most of the previous adoption studies used TOE [26, 28, 34, 37, 51, 58, 61], or integrated TOE with TAM or DOI [13, 27, 30, 32, 38, 56, 57, 59, 60].

5 Conclusion and Future Work


Software as a Service (SaaS) cloud computing is one of the most recent digital technologies, and it has the potential to give businesses several economic benefits, including cost savings, flexibility, scalability, and ease of adoption with no CAPEX. Despite the potential benefits, employing this strategy has limitations and obstacles. In a relatively short period, a large amount of research on this topic has been published.
This paper has compiled and discussed the research approach, along with an explanation of the challenges associated with SaaS CC adoption. The research methodology was then discussed, including the importance of the SLR, identifying the research questions, forming query strings, quality assurance, and the limitations that could affect this study. It also outlined the inclusion/exclusion criteria. After that, literature was extracted based on the predefined criteria, and finally data synthesis and documentation of the outcomes were performed. As a result, researchers must, as future work, address these issues by developing a comprehensive framework for the highlighted SaaS adoption obstacles in order to grow the market share of SaaS cloud computing and make it work in practice.

References
1. Ratten, V.: Cloud computing technology innovation advances: a set of research propositions.
Int. J. Cloud Appl. Comput. 5(1), 69–76 (2015). https://doi.org/10.4018/ijcac.2015010106
2. Mell, P., Grance, T.: The NIST definition of cloud computing. National Institute of Standards
and Technology, Gaithersburg, USA, September 2011
3. QSS: Cloud Computing – Delivery and Deployment Models (2019). https://www.qsstechno
soft.com/cloud-computing-delivery-and-deployment-models/
4. Pillai, S.: Cloud computing delivery models explained (2014). https://www.ibm.com/blogs/
cloud-computing/2014/03/17/cloud-computing-delivery-models-explained/
5. www.binaryinformatics.com (2019). Cloud Service Model – Understand the Types, Charac-
teristics, & Advantages. http://blog.binaryinformatics.com/technology/what-is-the-cloud-ser
vice-model/. Accessed 03 Apr 2019
6. Jing, X., Jian-Jun, Z.: A brief survey on the security model of cloud computing. Distrib.
Comput. Appl. 34(19), 475–478 (2010)
7. Amazon AWS: What is cloud computing? (2019). https://aws.amazon.com/what-is-cloud-
computing/
8. Mitchell Grant: Software-as-a-Service (SaaS) (2020). https://www.investopedia.com/terms/
s/software-as-a-service-saas.asp
9. Mell, P., Grance, T.: The NIST definition of cloud computing version 15. National Institute of
Standards and Technology (NIST). Information Technology, Laboratory (2009). www.csrc.
nist.gov
10. Avram, M.G.: Advantages and challenges of adopting cloud computing from an enterprise
perspective. Procedia Technol. 12, 529–534 (2014)
11. Sasikala, P.: Cloud computing in higher education. Int. J. Cloud Appl. Comput. 1(2), 1–13
(2011)
12. CtrIs 2015: Top 5 Benefits of Cloud Adoption (2016). http://www.ctrls.in/blog/benefits-of-
cloud-adoption/. Accessed 14 Mar 2020
13. Kumar, D., Samalia, H.V., Verma, P.: Exploring suitability of cloud computing for small and
medium-sized enterprises in India. J. Small Bus. Enterp. Dev. 24(4), 814–832 (2017). https://
doi.org/10.1108/JSBED-01-2017-0002
14. Rath, A., Kumar, S.: Decision points for adoption cloud computing in small, medium
enterprises (SMEs). Internet Technol. Commun. 34(4), 688–691 (2012)
15. Haider, Y., Selvan, S.: Confidentiality issues in cloud computing and countermeasures: a
survey (2016)
16. Armstrong, R., Hall, B.J., Doyle, J., Waters, E.: Cochrane update. ‘Scoping the scope’ of a
cochrane review. J. Public Health 33(1), 147–150 (2011). https://doi.org/10.1093/pubmed/
fdr015.PMID21345890

17. Pittway, L.: Systematic literature reviews. In: Thorpe, R., Holt, R. (eds.) The SAGE Dictionary
of Qualitative Management Research. SAGE Publications Ltd. (2008). https://doi.org/10.
4135/9780857020109
18. Eden, J., Levit, L., Berg, A., Morton, S., et al.: Institute of medicine (US) committee on
standards for systematic reviews of comparative effectiveness research. In: Finding What
Works in Health Care: Standards for Systematic Reviews (2011). https://doi.org/10.17226/
13059. ISBN 978-0-309-16425-2. PMID 24983062
19. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering. Keele University and Durham University joint report (2007)
20. Anwer, F., Aftab, S.: Latest customizations of XP: a systematic literature review. Int. J.
Mod. Educ. Comput. Sci. (IJMECS) 9(12), 26–37 (2017). https://doi.org/10.5815/ijmecs.
2017.12.04
21. Brereton, P., Kitchenham, B.A., Budgen, D., Turner, M., Khalil, M.: Lessons from applying
the systematic literature review process within the software engineering domain. J. Syst.
Softw. 80(4), 571–583 (2007)
22. Kitchenham, B.A., et al.: Preliminary guidelines for empirical research in software engineer-
ing. IEEE Trans. Softw. Eng. 28(8), 721–734 (2002)
23. Palos-sanchez, P.R., Arenas-marquez, F.J., Aguayo-camacho, M.: Cloud computing (SaaS)
adoption as a strategic technology: results of an empirical study (2017)
24. Oliveira, T., Martins, R., Sarker, S., Thomas, M., Popovič, A.: Understanding SaaS adoption:
the moderating impact of the environment context. Int. J. Inf. Manag. 49, 1–12 (2019)
25. Salum, K., Rozan, A., Zaidi, M.: Exploring the challenge impacted SMEs to adopt cloud ERP.
Indian J. Sci. Technol. 9, 1–8 (2016). https://doi.org/10.17485/ijst/2016/v9i45/100452
26. El-Haddadeh, R.: Digital innovation dynamics influence on organisational adoption: the case
of cloud computing services. Inf. Syst. Front. 22(4), 985–999 (2019). https://doi.org/10.1007/
s10796-019-09912-2
27. Hemlata, G., Date, H., Ramaswamy, R.: Understanding determinants of cloud computing
adoption using an integrated TAM-TOE model (2017)
28. Senyo, P.K., Effah, J., Addae, E.: Preliminary insight into cloud computing adoption in a
developing country (2016). https://doi.org/10.1108/JEIM-09-2014-0094
29. Yang, Z., Sun, J., Zhang, Y., Wang, Y.: Understanding SaaS adoption from the perspective of
organizational users: a tripod readiness model. Comput. Hum. Behav. 45, 254–264 (2015)
30. Stieninger, M., et al.: Factors influencing the organizational adoption of cloud computing: a
survey among cloud workers. Int. J. Inf. Syst. Proj. Manag. 6(1), 5–23 (2018). https://doi.org/
10.12821/ijispm060101
31. Yadegaridehkordi, E., Nilashi, M., Shuib, L., Samad, S.: A behavioral intention model for
SaaS-based collaboration services in higher education. Educ. Inf. Technol. 25(2), 791–816
(2019). https://doi.org/10.1007/s10639-019-09993-1
32. Safari, F., Safari, N., Hasanzadeh, A.: The adoption of software-as-a-service (SaaS): ranking the determinants (2017)
33. Ming, C.F., et al.: The determinant factors affecting cloud computing adoption by small and medium enterprises (SMEs) in Sabah, Malaysia. J. Telecommun. Electron. Comput. Eng. (JTEC) 10(3), 83–88 (2018)
34. Gutierrez, A., Boukrami, E., Lumsden, R.: Technological, organisational and environmental
factors influencing managers’ decision to adopt cloud computing in the UK. J. Enterp. Inf.
Manag. 28(6), 788–807 (2015). https://doi.org/10.1108/JEIM-01-2015-0001
35. Hasheela Miss, V.T., Mufeti Dr, T.K.: An investigation of factors leading to the reluctance of
SaaS ERP adoption in Namibian SMEs. Afr. J. Inf. Syst. 8(4), 1 (2016)
36. Lian, J.W., Yen, D.C., Wang, Y.T.: An exploratory study to understand the critical factors
affecting the decision to adopt cloud computing in Taiwan hospital. Int. J. Inf. Manag. 34(1),
28–36 (2015)

37. Alkhater, N., et al.: An empirical study of factors influencing cloud adoption among private sector organisations. Telemat. Inform. (2017). https://doi.org/10.1016/j.tele.2017.09.017
38. Senarathna, I., Wilkin, C., Warren, M., Yeoh, W., Salzman, S.: Factors that influence adoption
of cloud computing: an empirical study of Australian SMEs. Australas. J. Inf. Syst. 22 (2018)
39. Awodele, O., Adebayo, A.O., Tayo, O.O.: Security and privacy issues in cloud computing.
Commun. Appl. Electron. 7(3), 14–17 (2017)
40. Hsu, C.-L., Lin, J.C.-C.: Exploring factors affecting the adoption of Internet of Things services.
J. Comput. Inf. Syst. 58(1), 49–57 (2016). https://doi.org/10.1080/08874417.2016.1186524
41. Sakr, S., Zomaya, A.: Encyclopedia of Big Data Technologies, 1st edn. Springer, Switzerland
(2019). https://doi.org/10.1007/978-3-319-77525-8. eReference ISBN 978-3-319-77525-8
42. Sun, Y., Zhang, J., Xiong, Y., Zhu, G.: Data security and privacy in cloud computing. Int. J.
Distrib. Sens. Netw. 10(7), 190903 (2014)
43. Gholami, A., Laure, E.: Security and privacy of sensitive data in cloud computing: a survey
of recent developments. arXiv preprint arXiv:1601.01498 (2016)
44. Makkaoui, K.E., Ezzati, A., Beni-Hssane, A., Motamed, C.: Data confidentiality in the world
of cloud. J. Theor. Appl. Inf. Technol. 84(3) (2016)
45. Aloraini, A., Hammoudeh, M.: A survey on data confidentiality and privacy in cloud com-
puting. In: Proceedings of the International Conference on Future Networks and Distributed
Systems - ICFNDS 2017 (2017). https://doi.org/10.1145/3102304.3102314
46. Vasiljeva, T., Kreslins, K., Novik, D.: Challenge of cloud computing for SMEs: a case of
baltic countries. J. Innov. Manag. Small Medium Enterp. 1–10 (2018). https://doi.org/10.
5171/2018.238581
47. Arvanitis, S., Kyriakou, N., Loukis, E.N.: Why do firms adopt cloud computing? A comparative analysis based on South and North Europe firm data. Telemat. Inform. 34(7), 1322–1332 (2017). https://doi.org/10.1016/j.tele.2016.05.013
48. Narwal, R., Sangwan, S.: Benefits, dimensions and issues of software as a service (SAAS).
Int. J. New Innov. Eng. Technol. (IJNIET), 36–40 (2013)
49. Caldeira, M.M., Ward, J.M.: Using resource-based theory to interpret the successful adoption
and use of information systems and technology in manufacturing small and medium-sized
enterprises. Eur. J. Inf. Syst. 12(2), 127–141 (2003)
50. Gashami, J.P.G., Chang, Y., Rho, J.J., Park, M.-C.: Privacy concerns and benefits in SaaS
adoption by individual users: a trade-off approach. Inf. Dev. 32(4), 837–852 (2016). https://
doi.org/10.1177/0266666915571428
51. Matias, J.B., Hernandez, A.A.: Cloud computing adoption intention by MSMEs in the
Philippines. Glob. Bus. Rev. (2019). https://doi.org/10.1177/0972150918818262
52. Singh, A., Sharma, S., Kumar, S.R., Yadav, S.A.: Overview of PaaS and SaaS and its applica-
tion in cloud computing. Paper presented at the 2016 International Conference on Innovation
and Challenges in Cyber Security (ICICCS-INBUSH) (2016)
53. Opara-Martins, J., Sahandi, R., Tian, F.: A holistic decision framework to avoid vendor lock-in
for cloud SaaS migration (2017). https://doi.org/10.5539/cis.v10n3p29
54. Chou, D.C.: Cloud computing: a value creation model. Comput. Stand. Interfaces 38, 72–77 (2015). https://doi.org/10.1016/j.csi.2014.10.001
55. Tehrani, S.R., Shirazi, F.: Factors influencing the adoption of cloud computing by small and
medium size enterprises (SMEs). Paper presented at the International Conference on Human
Interface and the Management of Information (2014)
56. Gangwar, H., Date, H., Ramaswamy, R.: Understanding determinants of cloud computing
adoption using an integrated TAM-TOE model. J. Enterp. Inf. Manag. 28(1), 107–130 (2015).
https://doi.org/10.1108/JEIM-08-2013-0065
57. AlBar, A.M., Hoque, M.R.: Factors affecting cloud ERP adoption in Saudi Arabia: an
empirical study (2017). https://doi.org/10.1177/0266666917735677

58. Hsu, C., Lin, J.C.-C.: Factors affecting the adoption of cloud services in enterprises. Inf. Syst.
e-Bus. Manag. (321) (2015). https://doi.org/10.1007/s10257-015-0300-9
59. Priyadarshinee, P., et al.: Understanding and predicting the determinants of cloud computing
adoption: a two staged hybrid SEM - neural networks approach. Comput. Hum. Behav. 76,
341–362 (2017). https://doi.org/10.1016/j.chb.2017.07.027
60. Lal, P., Bharadwaj, S.S.: Understanding the impact of cloud-based services adoption on
organizational flexibility an exploratory study (2017)
61. Dincă, V.M., Dima, A.M., Rozsa, Z.: Determinants of cloud computing adoption by Romanian
SMEs in the digital economy. J. Bus. Econ. Manag. 20(4), 798–820 (2019)
A Quantum Algorithm to Locate
Unknown Hashgrams

Nicholas R. Allgood(B) and Charles K. Nicholas

University of Maryland Baltimore County, Baltimore, MD 21250, USA


{allgood1,nicholas}@umbc.edu

Abstract. Quantum computing has evolved quickly in recent years and


is showing significant benefits in a variety of fields, especially in the
realm of cybersecurity. The combination of software used to locate the
most frequent hashes and n-grams that identify malicious software could
greatly benefit from a quantum algorithm. By loading the table of hashes
and n-grams into a quantum computer we can speed up the process of
mapping n-grams to their hashes. The first phase will be to use KiloGram
to find the top-k hashes and n-grams for a large malware corpus. From
here, the resulting hash table is then loaded into a quantum simulator. A
quantum search algorithm is then used search among every permutation
of the entangled key and value pairs to find the desired hash value. This
prevents one from having to re-compute hashes for a set of n-grams,
which can take←on average O(M N ) time, whereas the quantum algorithm
could take O( N ) in the number of table lookups to find the desired hash
values.

Keywords: Quantum computing · Malware · N-Gram · Hashgrams ·


Cybersecurity

1 Introduction
Quantum computing is rapidly evolving and each day something new is being
discovered. These discoveries are beginning to make these concepts applicable
across a variety of domains. In the late 1980s and early 1990s, quantum com-
puting was entirely theoretical and many of the early algorithms created then
have since provided a foundation on which to build other quantum algorithms.
While many of these algorithms, such as Simon’s [15] and Grover’s [7], were seen
as proof of concept algorithms, they in fact have more value on their own merits
than simply providing a foundation for other algorithms.
Though the situation is improving, one of the current limitations has to do
with availability of quantum computing. While companies such as IBM [8] and
D-Wave [5] are providing access to their quantum computers at no cost via cloud
platforms, they are still limited in the number of qubits and quantum volume
available. For that reason, much of our work is done using Qrack [16], a high-
performance quantum simulator. Simulators on classical hardware can simulate
approximately 30–32 qubits.

One of the first steps in malware analysis is to perform static analysis which
searches the suspect binary file for static information such as strings that indicate
the program’s purpose, if a binary is maliciously packed, and whether the file
is malicious [14]. It is also desirable to compare the suspect binary with other
binaries, malicious or not, to see if the suspect binary is similar to any of them.
An n-gram is a sequence of n contiguous bytes, for some small integer n. Files
that happen to have many of the same n-grams, in roughly the same proportions,
can be regarded as similar [6]. Historically, the value of n might be in the range
2–6. But unlike ordinary text, executable binaries use most if not all of the
characters in the range 0x00 to 0xFF. For n of 4, for example, that results in
2564 or roughly 4 billion possible n-grams to be tabulated. More recently, as
described below, larger values of n are also of practical value, but in tabulating
n-grams for any n larger than say 3 or 4, a hash table would be used to keep
track of which n-grams have been seen, and how often. Hash tables are usually
sized so that collisions don’t matter too much in practice. As a file is ingested,
though, a lot of n-grams are seen multiple times, and the same hash value is
computed multiple times. We will show how to improve n-gram tabulation by
calculating an n-gram’s hash once, storing the result, and using quantum search
to find the desired hash value, without recomputing it, should that n-gram be
seen again. This paper is organized as follows: in Sect. 2 we provide a review of
related work. We present the concept of quantum search as applied to n-grams
in Sect. 3. Our numerical and simulation results are presented in Sects. 4 and 5.
In Sect. 6 we summarize our results and make suggestions for future work.
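To make the classical baseline concrete, the following sketch (illustrative only, not taken from the paper; the file name is hypothetical) tabulates byte n-grams with an ordinary hash table and recomputes a hash for every occurrence of every n-gram, which is exactly the repeated work the quantum lookup is intended to avoid.

from collections import Counter

def tabulate_ngrams(data: bytes, n: int = 4, buckets: int = 2**16) -> Counter:
    # Count byte n-grams using a hash table keyed by a bucketed hash value.
    # Every occurrence triggers a fresh hash computation, even for n-grams
    # that have already been seen.
    counts = Counter()
    for i in range(len(data) - n + 1):
        gram = data[i:i + n]
        counts[hash(gram) % buckets] += 1   # hash recomputed on every occurrence
    return counts

if __name__ == "__main__":
    with open("suspect.bin", "rb") as f:    # hypothetical input file
        table = tabulate_ngrams(f.read(), n=4)
    print(table.most_common(10))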

2 Related Work
2.1 n-grams for Malware Analysis

Cybersecurity professionals are constantly under pressure to identify and neu-


tralize incoming threats. While antivirus software is essential, it is not always
able to keep up with the threat. Often a new piece of malware is released and
performs some sort of damage before its signature is identified and updates made
in the antivirus databases. Leveraging the latest techniques in machine learn-
ing, static analysis of malicious software has become a great tool in the arsenal
against malware. [14] A large variety of malware is in the form of PE32 exe-
cutable’s that target the Microsoft Windows operating systems. One example
use of n-grams would be to take sequences of bytes from a PE32 executable to
construct features to be utilized by machine learning algorithms [12]. Once such
sequences of bytes are identified, the feature selection process goes through and
eliminates duplicate or irrelevant pieces of information from the sets of data.
Using n-grams as features has proven effective in malware detection, showing up
to a 97% detection rate [12].
There are a number of machine learning techniques used for malware detec-
tion [12]. Using n-grams as features is what makes it possible to leverage auto-
mated and intelligent classification methods. n-grams can be used for data, but

also to represent a sequence of opcodes, as well as operating system API calls


such as AdjustTokenPrivileges for Win32 and execve for Linux.

2.2 KiloGram

KiloGram [10] was released as open source software in 2020.1 KiloGram takes
a set of benign and known malicious software as input data. The output will
be a list of the top-k most frequent n-grams found that are contained within
the malicious software. Benign software is any software that is considered not to contain any malicious code, whereas malware is any software that is designed to cause
harm in some fashion. We chose the KiloGram approach since it can be used for
a large number of n-grams and large values of n. Of use to us, the KiloGram
algorithm can handle n-grams that are 8 bytes or larger while keeping 1000 or
more of the most frequent entries.
In the context of malware analysis, n-grams are used to represent strings that
appear in some if not all members of a set of suspected malware specimens. These
n-grams then can be provided to other algorithms for a variety of uses, such as
classification into malware families. KiloGram was designed with these uses in
mind. Recall that the n in n-gram refers to some small integer n. For example,
if we wish to process a 4-byte string such as 0xABCD1234, you would see this called
a 4-gram. Unfortunately, one major drawback of an n-gram-based approach for malware detection is that the shorter the n-gram, the more likely the byte sequence will also appear in benign software, increasing the false positive rate. Fortunately, KiloGram was also designed to overcome this limitation by
allowing the storing of larger and more specific n-grams, increasing the likelihood
they will be unique within a variety or family of malware.
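As a rough, simplified illustration of the kind of table this produces (this sketch is not the KiloGram algorithm itself; the rolling-hash base, bucket count, and k are arbitrary choices), one can bucket every byte n-gram with a Rabin-Karp-style rolling hash and keep the k most frequent buckets:

from collections import Counter

def top_k_hashed_ngrams(files, n=8, k=1000, buckets=2**20, base=257):
    # Bucket every byte n-gram of every file with a polynomial rolling hash
    # modulo `buckets`, then return the k most frequent buckets.
    counts = Counter()
    top_power = pow(base, n - 1, buckets)
    for data in files:
        if len(data) < n:
            continue
        h = 0
        for b in data[:n]:                  # hash of the first window
            h = (h * base + b) % buckets
        counts[h] += 1
        for i in range(n, len(data)):       # roll the window one byte at a time
            h = (h - data[i - n] * top_power) % buckets
            h = (h * base + data[i]) % buckets
            counts[h] += 1
    return counts.most_common(k)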

2.3 Grover’s Algorithm

Grover’s algorithm [7] was one of the first quantum searching algorithms to be
developed. Grover’s has even been the inspiration for other quantum algorithms
such as Shor’s [13] factoring algorithm. While much attention and research has
focused specifically on Shor's algorithm with regard to quantum cryptography,
Grover’s has been used and even improved upon in recent years [18].
Grover’s search algorithm implements what is known as an amplitude ampli-
fication algorithm [3] which has been said to be a generalization of Grover’s algo-
rithm (although amplitude amplification was first discovered in 1997 by Gilles
Brassard in 1997, and then a year later by Lov Grover). The fundamental idea is
to increase (amplify) the probabilities of the desired results, and this is accom-
plished by using a sequence of reflections.2 What is occurring in the amplitude
amplification is that the reflections are rotated closer to the desired quantum
state along the Bloch Sphere. The target state is marked as sin²(Θ) so that
when the amplitude amplification algorithm is applied m times, the probability
1 https://github.com/NeuromorphicComputationResearchProgram/KiloGrams.
2 https://docs.microsoft.com/en-us/quantum/libraries/standard/algorithms.

of obtaining the correct state is sin²((2m + 1)Θ). In other words, we think of the
target state on the Bloch Sphere [2] and we keep rotating it until we find the
correct result, with each rotation getting slightly closer.
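As a concrete check of this behavior (an illustrative calculation, not the authors' code): with a single marked item among N entries, Θ = arcsin(1/√N), and the success probability after m rounds is sin²((2m + 1)Θ), which climbs toward 1 as m approaches (π/4)√N.

import math

def grover_success_probability(N: int, m: int) -> float:
    # Probability of measuring the single marked item among N entries
    # after m rounds of amplitude amplification.
    theta = math.asin(1.0 / math.sqrt(N))
    return math.sin((2 * m + 1) * theta) ** 2

N = 1024
for m in (0, 1, 5, 15, 25):
    print(m, round(grover_success_probability(N, m), 6))
# The probability starts at 1/N for m = 0 and exceeds 0.999 near m = 25.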

3 Quantum N-gram Searching


3.1 Amplitude Amplification

Referring back to the previous statements, we explain that instead of looking up


a value by key, we do a direct lookup by value. The reason is that we essen-
tially have to invert the key/value lookup problem when dealing with quantum
entanglement. Grover’s search [7] makes heavy use of quantum entanglement.
When a lookup table is loaded into a quantum
machine, Grover’s algorithm will entangle all permutations of potential key and
value pairs based upon the input. The next step is to perform what is known
as amplitude amplification to the entangled pieces of data. Prior to the actual
amplitude amplification, the oracle is queried which places a tag value equal to
our search value. As part of amplitude amplification, a tag value that equals the
search value is placed into memory and then the phase (sign) is flipped.
While amplitude amplification may sound like a phrase belonging in signal
processing, it is heavily used in quantum mechanics to describe the nature of
things, and most of those things happen to be analogue. For
practical purposes, in quantum computing amplitude amplification and phase
flipping refer to changing the sign of a value. For example, say we look at the
following matrix and we wish to locate the value at row 1, column 3:
⎡ AB CD EF ⎤
⎢ 12 97 85 ⎥
⎣ 2D 3F 9C ⎦
Once we perform the phase flip, we will get the following matrix:
⎡ AB CD -EF ⎤
⎢ 12 97  85 ⎥
⎣ 2D 3F  9C ⎦
For our purposes, this tag value is our n-gram we wish to locate and the
key is the hash provided (which is also the index value). As mentioned, the key
and value are entangled, and with each lookup (iteration) of Grover's search, we can visualize the Bloch Sphere rotating closer to the desired n-gram.
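A toy classical statevector simulation makes the tagging and amplification steps visible (an illustrative sketch only, not the Qrack implementation; the table size and marked index are arbitrary): the oracle flips the sign of the marked amplitude, and the diffusion step reflects every amplitude about the mean.

import numpy as np

def grover_iteration(state: np.ndarray, marked: int) -> np.ndarray:
    # One Grover iteration: oracle phase flip on the marked index,
    # then inversion about the mean (the diffusion / amplification step).
    state = state.copy()
    state[marked] *= -1
    return 2 * state.mean() - state

n_entries = 16                                        # toy lookup-table size
marked = 11                                           # index of the desired entry
state = np.full(n_entries, 1 / np.sqrt(n_entries))    # uniform superposition

for _ in range(3):                                    # about (pi/4) * sqrt(16) iterations
    state = grover_iteration(state, marked)

print(np.round(state**2, 3))                          # probability concentrates on index 11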
Anything written about topics such as signal processing and quantum
mechanics would be remiss if it failed to mention the Fourier Transform.3 We are
given a function f (x) and the Fourier Transform breaks down f (x) to its con-
stituent frequencies [4]. The conceptual structure of quantum mechanics defines

3 https://www.encyclopediaofmath.org/index.php/Fourier transform.

the existence of pairs of complementary variables p and q connected by the


Heisenberg uncertainty principle. We can measure a particle’s quantum mechan-
ical position, but by doing so we lose information about the particle’s momentum
[4]. Going deeper into quantum mechanics, this gets into what is known as the
wave-particle duality of nature, for which the physical state of a particle can
be described by a wave function. The wave functions are used to describe the
physical state of a particle and one can use either a function of p or a function
of q, but never both. The real vector space that is the set of all possible physical
states and which contain the p-axis and q-axis is known as a phase space.
Referring back to phase shifting and amplitude amplification as part of the
algorithm, quantum mechanics chooses a specific polarization of a defined space
and picks a subspace containing half of its dimensions. In contrast to picking all
of the points within this selected space that contains the q-axis, the quantum
Fourier transform takes the set of all complex-valued wave functions on the axis
[4]. We then examine the p-axis which while also having a valid polarisation,
has a set of possible states of a particle related to the first representation by the
Fourier transform:

Φ(p) = ∫ ψ(q) e^{2πi pq/h} dq    (1)

Physical states exist inside what is known as an L² space, which is a vector space (specifically a measure space) that contains all of the square-integrable functions. Due to this property, an L² space is more specifically a Hilbert space [11]. According to Plancherel's theorem,4 Fourier Transforms also exist inside L² spaces. Quantum mechanical operators are required to be unitary, and a Fourier Transform within an L²(Rⁿ) space applied to itself is unitary. This upholds the unitary requirement for all quantum computing operations.

4 Quantitative Results
4.1 Grover’s Circuits
Grover’s algorithm is an oracle-based algorithm and, in the majority of the
literature that discusses Grover’s algorithm, it’s typically split into four parts:
1. Initialization
2. Oracle processing
3. Amplitude amplification
4. Measurement
We now describe how a quantum simulator, in particular Qrack [16], imple-
ments both the oracle and amplification components of Grover’s search. Figure 1
is an example of a traditional quantum circuit for Grover’s search. Figure 2 is
a modified version of Grover’s search to be utilized with the Qrack quantum
simulator.
4 https://link.springer.com/article/10.1007%2FBF03014877.

Fig. 1. Example Grover's circuit

Fig. 2. Qrack implementation of Grover's circuit

A few comments on the notation: /n is shorthand to state that each of the


gates applies to n qubits. In the Qrack example, we chose /k and /v to represent
the number of qubits used for the key and value. We use a Uz to represent
a phase-flip operation. We chose this over the standard Pauli-Z gate since we
want a single permutation’s phase flipped instead of flipping the phase on every
individual |0⟩. In the Qrack implementation, we are applying the phase-flip to
all of the used qubits simultaneously. That is, we start out with setting our
qubit permutations to |0⟩. Next we apply a Hadamard gate to each of the qubits to place them into superposition, where each qubit now equals (1/√2)(|0⟩ + |1⟩) and (1/√2)(|0⟩ − |1⟩). The second step of this circuit is to place all qubits through the
oracle as defined in Grover’s algorithm. The next series of gates are CNOT gates
whose control qubits were previously placed in superposition; the superimposed values of |1⟩ will trigger a NOT operation on the target qubits. The IndexedSBC operation is in
reference to Qrack’s IndexedSBC [17] operator that we will cover later in Sect. 4.
We proceed with the amplitude-amplification part of the circuit by applying an X (NOT) gate on all qubits, followed by both a Hadamard gate and a custom
unitary phase-flip gate. Finally, we complete the circuit with another series of
Hadamard gates, followed by IndexedADC which is Qrack’s IndexedADC [17]
operation, and proceed with measurement of the resulting quantum state.
Figure 3 is an example of a Qrack implementation of the oracle or blackbox
used in Grover’s algorithm. While the oracle might look small in comparison to
the entire Grover’s circuit, it’s absolutely crucial to the algorithm. In the Qrack
[16] implementation of the oracle, we start by doing a DEC operation for all
qubits. This instruction is what starts the tag process by subtracting the target
value from a start value, typically 0. For example, if our target value is 100
then we would have 0 − 100 = −100. From here we do Uz gates on all qubits,
flipping the phases of their respective amplitudes. Practically, this translates to
flipping the sign of the bits so in the above example, this would make our value
+100. Lastly, the oracle reverts the previous DEC operation with Qrack’s INC

Fig. 3. Qrack implementation of Grover's oracle

operation to return to the original value, only with the sign flipped. To finalize
our example, we add 0 + (+100) where + is the phase, with our result being
+100.
Theoretically, Grover's algorithm requires an average of O(√N) lookups to
find a match for the specified target. While we are using a traditional lookup
table for Grover’s, the input time complexity evaluation might not be that obvi-
ous. If we dive into the bare fundamentals of Qrack/VM6502q, we notice we
have an IndexedLDA instruction [17]. This is a modified LOAD instruction that
allows loading a key with a superimposed index into a quantum register. The
IndexedLDA operation is unitary by design so it will not affect the overall quan-
tum state as it is loaded into the registers. The writing of the data with a
superimposed index will actually entangle the classical memory cache and the
index register. Knowing this, we can say that the IndexedLDA operation takes
O(1) to load data into quantum registers.
In addition to the initial loads, there will be an input time complexity of O(M/N), where M is the total number of keys in the lookup table and N is the total number of matches [9]. This yields an overall input time complexity of O(1) + O(M/N) = O(M/N).
We use the term lookups but this refers to the number of iterations of Grover’s
algorithm. To be specific, Qrack uses the following equation to determine the
number of iterations to use [17]:

floor(π / (4 arcsin(1/√(2^N))))    (2)
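Evaluated directly (a small illustrative check, not part of the Qrack code; here N is taken as the number of index qubits, so the table holds 2^N entries), this expression reproduces the 25 iterations reported for the 10-bit index example in Sect. 5.3.

import math

def grover_iterations(index_qubits: int) -> int:
    # Eq. (2): floor(pi / (4 * arcsin(1 / sqrt(2^N)))).
    return math.floor(math.pi / (4 * math.asin(1 / math.sqrt(2 ** index_qubits))))

print(grover_iterations(10))   # -> 25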

5 Simulated Results
5.1 Qrack Operations
The Qrack [16] implementation utilizes some specialized methods for implement-
ing many of the operations in the oracle and amplitude amplification portions
of the algorithm. Here are some of the most commonly used operations [17]:

IndexedLDA: Set 8-bit register bits by a superposed index-offset-based read from classical memory.
IndexedADC: Add to an entangled 8-bit register state with a superposed index-offset-based read from classical memory.
IndexedSBC: Subtract from an entangled 8-bit register state with a superposed index-offset-based read from classical memory.
INC: Integer addition without sign.
DEC: Integer subtraction without sign.
H: Hadamard gate implementation.
ZeroPhaseFlip: Controlled Z-gate implementation.
Z: Z-gate implementation, non-controlled.
X: X (NOT) gate implementation.
MReg: Measures the current state of a quantum register(s).

5.2 Benign vs. Malicious Datasets

As briefly mentioned in Sect. 2, there are some limitations with quantum simu-
lations, the most obvious being limited computing resources available for simu-
lation. While Qrack [16] can take full advantage of a GPU for processing using
OpenCL5, one typically is limited to simulating approximately 30 qubits. Qrack has some development branches of code where they are simulating 128 qubits for testing the quantum supremacy problem released by Google [1]; however, these branches are quite experimental. To better appreciate why 30 qubits is a limitation for simulation, we must recall our base formula 2^n, where n is the number of qubits we wish to simulate. 2^n specifically refers to the total number of quantum states we wish to simulate. With 30 qubits, we end up with 2^30 = 1073741824, or roughly one billion, values. But the amplitudes represented by the quantum states are complex numbers, so we must include the real and imaginary parts when factoring in memory requirements. We use 2^2 bytes for the real value and 2^2 bytes for the imaginary value. This then gives us 2^(2+2) = 16 bytes for each of those one billion values, or

2^(30+4) = 17179869184 bytes ≈ 16 GB    (3)

Using the above equation, Table 1 shows how much memory is required for simulating and encoding up to 40 qubits.

5 https://www.khronos.org/opencl/.

Table 1. Simulation memory allocation

Qubits  Real bytes  Imaginary bytes  Total memory
4       2           2                256 bytes
8       2           2                4 KB
16      2           2                1 MB
24      2           2                ≈ 268 MB
28      2           2                ≈ 4 GB
30      2           2                ≈ 16 GB
32      2           2                ≈ 64 GB
40      2           2                ≈ 17 TB
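These figures can be reproduced with a one-line calculation (illustrative only), assuming 16 bytes per complex amplitude as in Eq. (3):

def simulation_memory_bytes(qubits: int, bytes_per_amplitude: int = 16) -> int:
    # A full statevector holds 2**qubits complex amplitudes, each stored as
    # a real and an imaginary part totalling 16 bytes by default.
    return (2 ** qubits) * bytes_per_amplitude

for q in (4, 16, 30, 40):
    print(q, simulation_memory_bytes(q))
# 30 qubits -> 17179869184 bytes, the ~16 GB of Eq. (3); 40 qubits -> ~17 TB.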

In Table 2 we state our benign and malicious datasets along with the respective number of files in each dataset.

Table 2. Benign vs. Malicious software dataset

Benign Benign Files Malicious Malicious Files


Windows 7 System32 4565 Vxheaven 2015 284151
MAML 691 VirusShare 2018 131072

In Table 3 we list the number of kept n-grams when comparing malicious and
benign datasets. We also record the size of the n-grams kept, with a maximum n-gram size of 3 bytes due to limitations of the simulation hardware.

Table 3. Benign vs. Malicious Software n-grams

Benign Malicious n-gram size Kept n-grams


Windows 7 System32 Vxheaven 2015 3 bytes 64
Windows 7 System32 Vxheaven 2015 2 bytes 16384
Windows 7 System32 Vxheaven 2015 2 bytes 4096
MAML VirusShare 2018 3 bytes 64
MAML VirusShare 2018 2 bytes 2048
MAML VirusShare 2018 2 bytes 1024

The hardware and software used was a 16-core Intel Xeon E5-2630 @ 2.4 GHz with 32 GB RAM and two GeForce GTX 1660 video cards. The machine was running 64-bit Ubuntu Linux 18.04 and OpenCL 1.2. As one can see, due to the limitation of approximately 30 qubits, we had to select the number of bits for our key and value size with care. Since n-grams are typically byte sequences, we were limited to a maximum n of 3. Using 3-grams gave us 24 qubits for our n-gram value with 6 qubits remaining for our index values. Utilizing 2-grams gave us a much larger span of bits to use for our index value (14 qubits). Recall that the index for this is the hash for a specific n-gram; KiloGram utilizes Rabin-Karp hashing modulo B, where B is the KiloGram bucket size [10].
In Table 4, we provide an estimate of the number of queries Grover's algorithm requires for each number of n-grams.

Table 4. Grover’s Query’s for n-gram sizes

Number of n-grams Number of Lookups (iterations)



64 64 = 8

128 128 = 11.31 ∗ 12

256 256 = 12

512 512 = 22.63 ∗ 23

1024 1024 = 32

2048 2048 = 45.25 ∗ 46

4096 4096 = 64

8192 8192 = 90.50 ∗ 91

16384 16384 = 128

As we can see from Table 4, more n-grams require more iterations, and the
number of iterations increases by a much smaller amount as we keep a larger
number of n-grams. Using a practical example, below we describe pseudo-code
for the Qrack implementation of Grover’s algorithm in addition to showing the
output for a 2-byte n-gram with a 10-bit index. We search for an n-gram with
the value of 0xF3D7 which has an unknown hash, which we quickly find to be
0x3a9.

5.3 Example Hash Retrieval for n-gram: 0xf3d7

Figure 4 is an example where we search for an n-gram with the value of 0xF3d7
that has a hash value of 0x3a9.

0> chance of match:0.00876619


1> chance of match:0.0242241
2> chance of match:0.0471087
...
22> chance of match:0.98967
23> chance of match:0.998456
24> chance of match:0.999461
After measurement (of value, key, or both):
Chance of match:1
Ngram: f3d7
Hash: 3a9
Total Iterations: 25

Fig. 4. Searching for the Hash of n-gram 0xF3D7

5.4 Qrack Pseudocode


In the following algorithm, we show pseudo-code utilizing Qrack that is an imple-
mentation of both an oracle and amplitude amplification for Grover’s search.

idxLen = 10
valLen = 16
cryIdx = idxLen + valLen
ngrams = ngramtable[idxLen]
ngram = 0xf3d7
qReg = CreateQuantumInterface(*params)
qReg = SetPermutation(0)
qReg = H(valLen, idxLen)
qReg = IndexedLDA(valLen, idxLen, 0, valLen, ngrams)

procedure QueryOracle(tPerms, qReg, valueSt, valLen)
    qReg = DEC(tPerms, valueSt, valLen)
    qReg = ZeroPhaseFlip(tPerms, valueSt, valLen)
    qReg = INC(tPerms, valueSt, valLen)
end procedure

procedure AmplitudeAmplification
    idxLen = 10
    valLen = 16
    cryIdx = idxLen + valLen
    ngrams = ngramtable[idxLen]
    ngram = 0xf3d7
    qReg = CreateQuantumInterface(params)
    qReg = SetPermutation(0)
    qReg = H(valLen, idxLen)
    qReg = IndexedLDA(valLen, idxLen, 0, valLen, ngrams)
    for i = 0 to floor(π / (4 arcsin(1/√(2^N)))) do
        TagValue(ngram, qReg, 0, valLen)
        qReg = X(cryIdx)
        qReg = IndexedSBC(valLen, idxLen, 0, valLen, cryIdx, ngrams)
        qReg = X(cryIdx)
        qReg = H(valLen, idxLen)
        qReg = ZeroPhaseFlip(valLen, idxLen)
        qReg = H(valLen, idxLen)
        qReg = IndexedADC(valLen, idxLen, 0, valLen, cryIdx, ngrams)
    end for
end procedure

6 Conclusion
We have shown that by combining the results of efficient n-gram collection software such as KiloGram with quantum computing, we can provide a faster way of finding a previously computed, but currently unknown, hash for a known n-gram. We have compared this solution to the classical approach and have shown that, for a large number of n-grams, the quantum-based solution outperforms it substantially. When better quantum hardware is available, these concepts could be applied to cryptographic hashes such as SHA-256 or BLAKE3. We hope that our work will remain useful when better quantum computers are available. Quantum computing research continues to grow each day, and while it might seem that adequate hardware is far in the future, it will be upon us before we realize it, and cybersecurity professionals will need to be ready.

Acknowledgment. We extend our thanks to our colleagues Sam Lomonaco and


Edward Raff for their comments on an earlier version of this paper. We also extend
our sincere gratitude to Dan Strano for the development and support of the Qrack
quantum simulator.

References
1. Arute, F., Arya, K., Babbush, R., et al.: Quantum supremacy using a pro-
grammable superconducting processor. Nature 574, 505–510 (2019). https://www.
nature.com/articles/s41586-019-1666-5
2. Bloch, F.: Nuclear induction. Phys. Rev. 70, 460–474 (1946)
3. Brassard, G., Høyer, P., Mosca, M., Tapp, A.: Quantum amplitude amplification
and estimation. Quantum Computation and Information, pp. 53–74 (2002)
4. Coppersmith, D.: An approximate Fourier transform useful in quantum factoring (2002)
5. D-Wave. D-wave (2020). https://dwavesys.com
6. Damashek, M.: Gauging similarity with N-Grams. Science 267(5199), 843–848
(1995)
7. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Pro-
ceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing
- STOC 1996 (1996)
8. IBM. IBM quantum experience (2020). https://quantum-computing.ibm.com
9. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press (2000)
10. Raff, E., et al.: KiloGrams: very large N-grams for malware classification. In: Pro-
ceedings of KDD 2019 Workshop on Learning and Mining for Cybersecurity (LEM-
INCS 2019) (2019)
11. Rudin, W.: Real and Complex Analysis, 3rd Edn. McGraw-Hill, Inc., USA (1987)
12. Shalaginov, A., Banin, S., Dehghantanha, A., Franke, K.: Machine learning aided
static malware analysis: A survey and tutorial. Cyber Threat Intelligence, pp. 7–45
(2018)
13. Shor, P.W.: Algorithms for quantum computation: discrete logarithms and factor-
ing. Proceedings 35th Annual Symposium on Foundations of Computer Science,
Santa Fe, NM, pp. 124–134 (1994)

14. Sikorski, M., Honig, A.: Practical Malware Analysis. No Starch Press (2012)
15. Simon, D.R.: On the power of quantum computing. In: Proceedings of the 35th
Annual Symposium on Foundations of Computer Science, pp. 116–123 (1994)
16. Strano, D., Bollay, B.: Qrack: a comprehensive, GPU-accelerated framework for developing universal virtual quantum processors (2020). https://github.com/vm6502q/qrack
17. Strano, D., Bollay, B.: Vm6502q and qrack (2020). https://vm6502q.readthedocs.
io/en/latest/index.html
18. Wang, Y.: A quantum walk enhanced grover search algorithm for global optimiza-
tion (2017)
BUMP: Bridging Unmet Modes
of Participation in the Workplace

Claudia B. Rebola, Diego Gomez-Enriquez(B) , and Erwin Vargas-Alfonso

University of Cincinnati, Cincinnati, OH 45221, USA


[email protected], {gomezeda,vargasem}@mail.uc.edu

Abstract. This paper describes the implementation of an inclusive


alternative designed for social interaction in workspaces. Based on off-the-shelf technologies, this work proposes a creative approach in technology
iterations that encourages users’ participation regardless of digital pro-
ficiency. Furthermore, bridging Unmet Modes of Participation (BUMP)
in the workplace is also an invitation to facilitate the interaction of the
multiple types of individuals in any workspace. BUMP aims to clearly
understand how prosthetic interventions in workplaces enhance employee
interaction and positively impact work systems. Moreover, this paper
discusses how technology designs can help us bring people together at a
distance. It considers the technical aspects and the observational facts of
implementing a pilot testing of inclusive technologies in workspaces. This
project is focused on demonstrating the importance of creating inclusive
platforms for user engagement with elements of the digital era. Addition-
ally, this work explores how modern communication scenarios can help
us bridge the gap between digital proficiency and the groups occupying
common spaces by implementing seamless and effortless communication
methods while recovering from the impact of the pandemic on social
interactions. The result is a novel option to socialize in the workspace
by connecting people through virtual windows that can be placed in
multiple spaces.

Keywords: Post-pandemic future · Inclusive technology ·


Workspaces · Office environments · Technology design · Well-being ·
Accessibility · e-leisure

1 Introduction
Technologies allow users to connect at a distance. Humans are essentially social
species with the motivation to form and maintain interpersonal relationships
as a fundamental organizational principle of behavior [1]. Especially during the
COVID-19 pandemic, connecting at a distance via video conferencing and chat-
ting proved beneficial in bringing people together. Despite many technologies and
platforms that existed before the pandemic, their importance to our daily liv-
ing increased exponentially during the pandemic. People are using tools like
Zoom, Skype, and other technologies to stay in touch with work collaborators,

colleagues, and family, thus making the relevance of technology even more evi-
dent [2]. However, these systems usually require programming and coordination
to establish interaction and connection. While chat rooms and videoconferenc-
ing are continuously evolving into low-delay and high-quality platforms, both
systems can be further evolved to better increase the “social” aspect and the
capabilities to “expand” the physical space of users. Office spaces and digital
communication technologies are changing the nature of work for individuals
[3]. Users, online platforms, and spatially are interrelated and come together
in the constitution of everyday spatial encounters and experiences [4]. However,
barriers to technologies at the workplace are related to issues of adoption and
accessibility. In addition to discrepancies in digital literacy, there is imminent
segregation of groups of the population that decides to stay apart from the
continuous-evolving communication networks.
Multiple factors such as age, academic degree, and the ability to effectively
use technology [5] may affect the adherence to modernized tools to interact and
solve problems [6] while the meaning of life-like interactions is preserved [7]. This
study explores the design and implementation of accessible technologies to bridge
participation in the workplace. This paper describes the Bridging Unmet Modes
of Participation (BUMP) in the workplace, mainly focusing on the technology
development for implementation. The significance of this paper is to address the
creative approach in technology iterations based on off-the-shelf technologies, and
how the results contribute to designing inclusive technologies that can facilitate
the way users communicate with accessible technologies at a distance.
The rest of the paper is organized as follows. Section 4 discusses the techni-
cal setup, criteria, and observation facts considered for the pilot studies. Each
subsection elaborates on critical components for the project’s development. Sub-
section 4.1 explains the technological components that enable the system to run
independently without involving existing or commercial software. Subsection 4.2
describes the environment considered for the implementation of the pilot study,
focusing on the sampling size and the current space conditions where the study
took place. Subsection 4.3 focuses on the results of the pilot study conducted
in two different approaches, passive and active, to test different users’ levels of
interaction with the BUMP stations. Finally, Sect. 5 concludes the article and
discusses future research opportunities.

2 Related Work

Other prototypes have been previously developed. The authors in [8] and [9]
presented the design of a new communication method to bridge unmet partici-
pation. The project’s goal was to extend older adults’ home spaces by enabling
them to connect with families and the environment in more natural ways. Such
a project explored the possibilities of improving the environment by making a
more innovative and more supportive environment for its inhabitants.
In terms of post-pandemic scenarios, authors have remarked on the impor-
tance of finding methods to reduce the social distance created by the pandemic

on remote workers [10], as well as the necessity to propose innovative solutions


to regain some of the affordances of physical co-location [11].
Some companies have also developed proposals to link cities worldwide under
the same principle of bridging connections despite the distance. This is the case
of the company Portal Cities which has developed a virtual bridge called The
Portal Project. It virtually connects cities in the European Union. The message
behind the portal project is to make people forget about the division between
them and us and see our planet united [12].
Other more complex proposals incorporate augmented reality [13] and virtual
reality (VR) to create working environments even on the go and allow new
collaborative approaches that can help mitigate the effects of physical distance
[14].

3 Design Concept
Following the idea of “bumping into someone”, we present this project as an
alternative to meeting or introducing someone unexpectedly. Our work aims
to create unplanned connections in workspaces to promote interaction between
coworkers. As a result, a novel option to socialize in the workspace is developed
by connecting people through virtual windows that can be placed in multiple
spaces such as hallways, cafeterias, or lounges. This pilot work consists of two
physical stations located in two different spaces. These two stations are connected
using software that allows them to stream and receive video. While no interac-
tion occurs at either station, the screen switches to standby, showing a black background with the BUMP logo on top. Once either system detects movement around the station, it starts transmitting video, allowing the par-
ticipants to interact with each other in real time (Fig. 1). BUMP is also meant to be an unprecedented method of pressure relief, creating periodic breaks
that generate greater productivity, inspire creativity, and improve the positive
attitude among employees. Furthermore, this approach reduces distraction by
limiting the interaction time to 30 s.
This study also measures the impact of adding such dynamic interactions
in the workspace. Although this first approach only involves two stations, the
system could potentially be expanded to more stations inside multiple corporate
buildings.

4 Implementation
4.1 Technical Setup
The major challenges of the design were related to sensing users in the spaces
while connecting them at a distance in a timely manner. To compensate for the delay, this
project used a couple of movement sensors to locate the user several seconds
before being in front of the screen and thus create the interaction “bump”
at a distance.

Fig. 1. BUMP interaction diagram

The current prototype implemented two BUMP stations that a Raspberry Pi entirely controls. Each station has an embedded 40” HD display


that remains on standby if no interaction occurs. Two infrared sensors are con-
nected to the I/O ports and sense the surroundings’ movement. When activity
is detected in the station’s vicinity, a quick “Hey, Psst” sound is played to at-
tract potential users. The camera is activated automatically, and the interaction
begins. Via WiFi, the two modules create their own wireless TCP connection;
thus, the session is effortlessly established without authentication methods. The
overall system’s latency is very low, on the order of tens of milliseconds (see Fig. 2).

Fig. 2. Streaming components diagram

Notably, this project does not use any existing video-conference appli-
cation or software. Instead, the stations create a Point-to-Point connection by
using Netcat. Initial set-up requires the manual execution of the script that exe-
cutes a Raspivid command to stream the video to a local port, followed by a

Mplayer command that shows the other participant’s video on the local display.
The automated script is shown in Fig. 3. A Python program, start.py, plays a
coordination role as it controls the sequence and timing, plays the invitation-to-
interact sound, and reads the IR sensors connected to the I/O ports to operate
the display accordingly.

Fig. 3. Automated script
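A minimal sketch of such a coordination script is shown below. This is not the actual start.py; the GPIO pin numbers, peer address, port, sound file, and streaming flags are all assumptions made for illustration, following the Raspivid/Netcat/Mplayer pipeline described above.

import subprocess
import time
import RPi.GPIO as GPIO

IR_PINS = (17, 27)        # assumed BCM pins for the two infrared sensors
PEER = "192.168.1.42"     # assumed address of the counterpart station
PORT = 5000               # assumed TCP port for the raw video stream

GPIO.setmode(GPIO.BCM)
for pin in IR_PINS:
    GPIO.setup(pin, GPIO.IN)

def motion_detected():
    # True if either infrared sensor reports movement near the station.
    return any(GPIO.input(pin) for pin in IR_PINS)

def start_streams():
    # Send the local camera feed to the peer and show the peer's feed locally.
    send = subprocess.Popen(
        "raspivid -t 0 -w 1280 -h 720 -o - | nc %s %d" % (PEER, PORT), shell=True)
    recv = subprocess.Popen(
        "nc -l -p %d | mplayer -fs -demuxer h264es -" % PORT, shell=True)
    return send, recv

try:
    while True:
        if motion_detected():
            subprocess.call(["aplay", "hey_psst.wav"])   # invitation sound (assumed file)
            send, recv = start_streams()
            time.sleep(30)                               # 30 s interaction window
            send.terminate()
            recv.terminate()
        time.sleep(0.2)
finally:
    GPIO.cleanup()

In the deployed prototype the streaming commands were started manually, as noted above; this sketch simply folds the same sequence into a single loop for illustration.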

4.2 Workplace Pilot Studies

The BUMP prototype was tested for implementation and launched for pilot studies once the system was stable and produced no errors.
The criteria to select the ideal place to conduct the pilot study were to find
two different spots in the building. The research team had immediate access to
a workplace site with a large setting (housing approximately 2500 users) with
different floors and open spaces. Therefore, the two modules were located along
the busiest halls; both featured a precise vicinity that was noticeable and did
not block any walkable area.
The BUMP prototype installation consisted of two stations in different hall-
ways of the workplace where a high amount of passersby was expected (see
Fig. 4). High traffic areas were chosen to facilitate an optimal scenario with mul-
tiple observations from several users.

4.3 Pilot Observations

In order to assess how users interact with the device, the BUMP prototype
pilot study was conducted under two different scenarios with a duration of three
weeks each. The first approach (passive) was focused on passersby’s curiosity,
people’s acceptance, and indifference, and whether the user exhibited an inten-
tion to interact with the stations without any explicit invitation. Upon com-
pletion of the first set of trials, different behaviors were analyzed. The second
approach (active) incorporated two simple invitation methods, the exclamation
“Hey, psst!” and a screen message with the simple sentence “give a smile here”
(see Fig. 5). This modification aimed to attract more passersby and explicitly
invite them to interact with the device.

Fig. 4. BUMP standalone station

4.4 Pilot Observations Results


For the first approach, although everyone who noticed the deployment was curious
about what was happening on the screen, not all of them tried to get closer
and find out more. In general, there was a low rate of interactions for those
who showed interest in the stations. Younger participants, who seemed more
captivated, approached the station to look at what was happening on the screen
and tried to guess where the counterpart station was operating. Groups of friends
exhibited more confidence when trying to interact at both stations by waving and
grinning as remote users showed up on the screen. On the other hand, middle-
aged adults just stopped at the stations to try to find out what the functionality was, with limited intention to interact.
Similar behaviors continued for the next two weeks until people got used
to the installation. Furthermore, during the third week, people’s conversations
around the building halls were related to explaining how the system works.
New observers were walking with people that were already familiar with the

Fig. 5. Standalone station with explicit (Active) invitation

device. These people were teaching the new observers their definition of how the
system works, making people's appropriation of the device and the environment evident.
For the second approach, an increasing number of individuals started inter-
acting with the BUMP stations by waving and smiling at the person at the counterpart station. Interaction between individuals occurred more easily as people felt invited to meet and interact. In addition, the BUMP stations started to gain personality and acceptance in the building. Figure 6 exemplifies
the interaction between two users using BUMP stations at different locations.

Fig. 6. Sample interaction at remote locations

5 Conclusion and Future Work


This paper discusses how technology designs can help us bring people together
at a distance. It focuses on the technical aspects and the observational facts
of implementing a pilot testing of inclusive technologies. This project aims to
demonstrate the importance of creating inclusive platforms for users’ engage-
ment with elements of the digital era. Additionally, this work discusses how
modern communication scenarios can help us bridge the gap between digital
proficiency and the groups occupying common spaces by implementing seamless
and effortless communication methods.
Contrasting the two approaches (passive vs. active), we provide
evidence that users with a certain level of confidence and friendship will be
more willing and enthusiastic to interact with the devices. The changes in the
conditions demonstrate the importance of explicitly requesting the passerby to
engage with the modules. Further studies are necessary to record users’ interac-
tions with the system systematically. Moreover, additional scenarios should be
tested, contemplating the use of implicit appeals that promote interaction. Addi-
tionally, deploying a higher number of interconnected BUMP stations might lead

to more-qualified results with a broader set of observations. Finally, this project


also invites the research community to consider using immersive experiences as
part of inclusive technologies.

References
1. Lieberz, J., et al.: Loneliness and the social brain: how perceived social isolation impairs human interactions. Adv. Sci. 8, 2102076 (2021). https://doi.org/10.1002/advs.202102076
2. Queen, D.: Technological impact of COVID-19. Int. Wound J. 18(2), 129–130
(2021). https://doi.org/10.1111/iwj.13578
3. Bowen, T., Pennaforte, A.: The Impact of Digital Communication Technologies
and New Remote-Working Cultures on the Socialization and Work-Readiness of
Individuals in WIL Programs: Global Perspectives on the Future (2017). https://
doi.org/10.1108/S1479-367920170000032006.
4. Repenning, A.: Workspaces of Mediation: How Digital Platforms Shape Practices,
Spaces and Places of Creative Work. Tijds. voor econ. en Soc. Geog. 113, 211–224
(2022). https://doi.org/10.1111/tesg.12508
5. Davies, R.S.: Understanding technology literacy: a framework for evaluating edu-
cational technology integration. Techtrends Tech Trends 55, 45 (2011). https://
doi.org/10.1007/s11528-011-0527-3
6. Pfaltzgraf, D., Insch, G.S.: Technological illiteracy in an increasingly technological
world: methods to help employees create with rather than simply consume technol-
ogy. Dev. Learn. Organ. 35(6), 4–6. https://doi.org/10.1108/DLO-12-2020-0235
7. Petrova, K., Schulz, M.S.: Emotional experiences in technology-mediated and in-
person interactions: an experience-sampling study. Cogn. Emotion (2022). https://
doi.org/10.1080/02699931.2022.2043244
8. Chu, C., Rebola, C.B., Kao, J.: BUMP: bridging unmet modes of participation. In:
Proceedings of the 2015 British HCI Conference (British HCI 2015). Association
for Computing Machinery, New York, NY, USA, 261–262 (2015). https://doi.org/
10.1145/2783446.2783601
9. Rebola, C.B., He, S.: Project BUMP: developing communication tools for the older
adult population. In: Arai, K., Bhatia, R., Kapoor, S. (eds.) Proceedings of the
Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent
Systems and Computing, vol. 1069. Springer, Cham (2020). https://doi.org/10.
1007/978-3-030-32520-6 70
10. Bleakley, A., et al.: Bridging social distance during social distancing: exploring
social talk and remote collegiality in video conferencing. Hum.-Comput. Interact.
(2021). https://doi.org/10.1080/07370024.2021.1994859
11. Jacobs, N.J., Lindley, J.: Room for Improvement in the Video Conferencing. AoIR
Selected Papers of Internet Research (2021). https://doi.org/10.5210/spir.v2021i0.
12188
12. Portalcities.org. 2022. Portal - a Bridge to The United Planet. [online] https://
portalcities.org/. Accessed 20 June 2022
13. Lages, W.S., Bowman, D.A.: Walking with adaptive augmented reality workspaces:
design and usage patterns. In: Proceedings of the 24th International Conference on
Intelligent User Interfaces (IUI 2019). Association for Computing Machinery, New
York, NY, USA, pp. 356–366 (2019). https://doi.org/10.1145/3301275.3302278
14. Ofek, E., Grubert, J., Pahud, M., Phillips, M., Kristensson, P.O.: Towards
a practical virtual office for mobile knowledge workers (2020). arXiv preprint
arXiv:2009.02947
Theoretical Perspectives Towards
Culture-Centered User Engagement Design
for Mobile Health in the Global South

Tochukwu Ikwunne, Lucy Hederman, and P. J. Wall
ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin, Dublin,
Ireland
[email protected]
Abstract. The main objective of this article is to propose a theoretical model for
uncovering users’ socio-cultural contexts for user engagement designs, as well
as to present a set of consolidating concepts to guide future research in mobile
health (mHealth) designs to strengthen the theoretical framing of user engage-
ment designs. First, we present a brief discussion of the importance of employing
appropriate theoretical frameworks that promote user engagement. Second, we
discuss different theoretical perspectives on user engagement. Two theoretical
frameworks, activity theory and the communicative ecological framework, are
used to frame an understanding of users’ socio-cultural contexts and inform the
design of a framework called Design Process Engagement Enhancement Sys-
tem (DECENT) to support designers in mHealth engagement designs. This paper
addresses the research question of how to uncover the socio-cultural contexts of a
user group to inform the design of engaging digital artifacts (tools) by augmenting
two phases of user-centered methodology with users’ socio-cultural filtration and
socio-cultural checklists for user engagement designs. The DECENT framework
is an adapted user-centered design framework that uses activity theory and the
communicative ecological framework as theoretical frameworks for understanding
the socio-cultural complexity of users in the context of mHealth. The findings of
the paper suggest that to fully explain and design for user engagement in mHealth,
an integrated approach incorporating a variety of technological and socio-cultural
factors is required.

Keywords: Activity theory · Communicative ecological framework · Culture-centered design · Mobile health · mHealth · User engagement
1 Introduction

In the design of content, products, systems, and services, engagement is a universal
goal, and every designer strives to engage users [1]. Insufficient engagement may result
from an imbalance in the integration of appropriate theoretical content and features that
sustain user interest [2, 3]. Ikwunne et al. [4] conducted a systematic review of design
processes for user engagement and identified a lack of consideration of socio-cultural
contexts in the design of mHealth interventions and suggested that such socio-cultural
contexts be considered and addressed systematically by identifying a design process for
engaging users in mHealth interventions [5].
New products are frequently developed for international markets in our globalized
economy. Because user characteristics and needs differ significantly across regions [6],
product development for global markets necessitates organizations to take these differ-
ences in user characteristics and needs into account when designing products. These
differences are frequently influenced by national and ethnic cultures. According to Shen
et al. [7, p. 7] “Successful interface metaphors should be developed or adapted through
cultural requirements by or regarding, representatives of the culture for which they are
intended”. Culture refers to the similar patterns of thinking, feeling, and acting of people
who belong to the same group but differ from other groups in these patterns [8]. This
type of ‘acting’ is largely based on unwritten rules and habits that are passed down from
generation to generation [9]. Thus, this paper focuses on the research question - How to
uncover the socio-cultural complexities of a user group to inform the design of engaging
digital artifacts?
Honold [10] asserts that the approach to learning how to use a new mobile phone, for
example, may differ depending on national culture. According to the findings from Hon-
old’s study, German users prefer to use a user manual, whereas Italians prefer learning
by doing. Furthermore, while the salesperson is an important source of information for
the Chinese, the entire family is involved in the knowledge acquisition process for Indian
users. Honold’s study demonstrates that culture influences patterns of user-product inter-
action and engagement, which implies that designers should take these differences in
knowledge acquisition approaches into account. However, determining the nature of this
impact may be difficult without theorization and a deeper understanding of how users
actually engage with technologies [11]. This emphasizes the importance of theoreti-
cal models that provide a framework for assessing cultural differences and generating
design recommendations. Activity theory (AT) and the communicative ecology frame-
work (CEF) are suitable models used in the paper for uncovering users’ socio-cultural
contexts for user engagement designs for many reasons.
AT is a well-known theoretical model for examining and understanding the human
use of technology. The use of AT to uncover the socio-cultural contexts of users helps
establish a context for human activity and provides key insights into the potential meaning
and significance of user engagement designs. AT provides a framework for investigating
the complexities of interactions between people and their environments by identifying
the units of an activity system, how they are related, their various voices, their history,
flaws, and changes [12]. This paper uses the Engeström model of AT (as described
in Sect. 3) for uncovering users’ socio-cultural contexts for user engagement designs.
The Engeström model of AT provides a conceptual framework for understanding the
interconnections between activities, actions, operations, and artifacts, as well as aspects
of the social, cultural, and societal contexts in which these activities are framed [12].
In the case of mHealth technology engagement design as an example, AT can be
used to illustrate why a user relates with mHealth technology but may not investigate in detail
what causes a specific user to relate with mHealth technologies through his or her daily
communication in broader socio-cultural contexts. As a result, the CEF (as defined by
Foth and Hearn [13]), which integrates three layers of interpretation (technical, social,
and discursive), is used in this paper to provide a rich description of how mHealth is
structured in a social context. Section 3 provides a detailed example of the use of AT and
the CEF to uncover socio-cultural contexts of a user group for user engagement designs
that inform the design process engagement enhancement system (DECENT) framework.
To clarify the theoretical perspective taken to uncover the socio-cultural contexts of
users, it is necessary to outline the conceptualization of user engagement and theories of
engagement used as described in Sect. 2. In this paper, the definition of user engagement is
a context-dependent, individual-specific psychological state that emerges from two-way
interaction with an object, such as an app [14–16].
Overall, the research in mHealth design for user engagement demonstrates that the
impact of culture on design should not be overlooked. Furthermore, theoretical models
are useful in guiding the product design process of user engagement designs. The fol-
lowing section presents a theoretical framework for user engagement. Other sections of
the article cover DECENT and the use of AT and the CEF to inform DECENT tools, as
well as the DECENT framework, its phases, and conclusion.

2 Theoretical Framing of User Engagement
In the design of content, products, systems, and services, engagement is a universal goal,
and every designer strives to engage users [1]. Insufficient engagement may result from
an imbalance in the integration of appropriate theoretical content and features that sus-
tain user interest [2, 3]. In various attempts to understand engagement, researchers have
drawn on and developed a wide range of theories which are primarily concerned with
engagement. Table 1 summarizes the disciplinary foundations, descriptions, and con-
ceptualizations of engagement as they relate to each of these theories and models. Eight
theoretical perspectives that focus on engagement were identified and they are Flow
Theory, Motivation Theory, O’Brien and Tom’s Model of Engagement, Sidner et al.’s
Model of Engagement, Technology Acceptance Model, Unified Theory of Acceptance
and Use of IT, Short et al.’s Model of User Engagement and Ludden et al.’s ‘PERMA’
Framework. Table 1 indicates that the theoretical conceptions of engagement can be sep-
arated into several categories, including academic, cognitive, intellectual, institutional,
emotional, behavioral, social, and psychological. These distinctions are made to assist
a specific type of socio-cultural context-dependent analysis and design, rather than to provide
objective knowledge of engagement as a universal neurophysiological or social real-
ity. These reasons indicate the ontological perspectives on engagement that have been
brought to bear and the choice of using AT and the CEF as suitable models used in the
paper for uncovering users’ socio-cultural contexts for user engagement designs. For
example, Engeström’s [29] model of AT, derived from psychology, information sys-
tems design, human-computer interactions, organizational learning, and cultural studies
emerged as a result of the socio-cultural perspective. It investigates the complexities of
human-environment interactions by identifying the units of an activity system, how they
are related, their various voices, their history, flaws, and changes [12]. The elements of
an activity system are the subject, object, and community, while the artifacts used in
context-determining activities are tools, rules, and divisions of labour.
Table 1. Theoretical perspectives to frame user engagement

Theory/model: Flow theory, Cowley et al. [17]
Disciplinary foundation: Derived from the field of health psychology
Model description: Flow theory postulates the existence of an optimal and pleasurable state characterized by a tractable challenge, immersion, control, freedom, clarity, immediate feedback, temporal insensitivity, and changes in one’s sense of identity [17]
Engagement conceptualization: Flow and engagement are frequently expressed in the same way. Engagement has also been defined as a subset of flow and a more passive state, which makes it preferable to flow when the user has less control [18]

Theory/model: Motivation Theory, Seddon et al. [19]
Disciplinary foundation: Derived from the field of health psychology
Model description: Motivation is described as a dynamic process in which sustained effort is applied to pursue goals that satisfy needs that are “subject to cognitive processes and set against values” [19]. Motivational factors are thought to include reasons for action (needs and values), interest, a favourable social context, effective feedback, and a sense of agency derived from choice, control, and a positive expectation of success (self-efficacy and self-worth) [19]
Engagement conceptualization: When planning online collaborations, for example, the motivational model is used to provide a structure to promote the interaction and motivation required for long-term engagement. The reasons for activities, goals, sense of agency, rewards, and social contexts of users have all been identified as key elements of motivation

Theory/model: O’Brien and Tom’s Model of Engagement [20]
Disciplinary foundation: Information systems design
Model description: Proposed four stages of user engagement: ‘Point of Engagement’, ‘Engagement’, ‘Dis-engagement’ and ‘Re-engagement’. The following attributes are associated with each stage. Novelty, aesthetics, interest, and motivation are important at the first stage, ‘Point of Engagement,’ while control, feedback, positive and negative effect, challenge, and connectedness are also important at the second stage, ‘Engagement.’ The model also includes a ‘Disengagement’ stage that is influenced by user demands, affect, time, and usability, as well as a ‘Re-engagement’ stage that can be short or long term and occur multiple times during use
Engagement conceptualization: Engagement is characterized as “a type of user experience characterized by attributes such as challenge, positive affect, durability, aesthetic and sensory appeal, attention, feedback, variety/novelty, interactivity, and perceived user control.”

Theory/model: Sidner et al.’s Model of Engagement [21]
Disciplinary foundation: Information systems design
Model description: This model explicitly depicted engagement as a process with three distinct phases: a beginning, a suspensive period, and an end. These phases are then subdivided into a set of user actions. This framing implicitly situates the engagement within the context of a conversation between at least two agents, in which the user is portrayed as an active and receptive participant, and engagement as a continuous, synchronous process with a clearly defined beginning and end
Engagement conceptualization: The process by which two (or more) participants establish, maintain, and terminate their perceived connection is referred to as engagement. This process entails making initial contact, negotiating a collaboration, ensuring that the other party is still participating in the interaction, deciding whether to stay involved, and deciding when to cut the connection

Theory/model: Short et al.’s Model of User Engagement [22]
Disciplinary foundation: Health psychology, social psychology, persuasive technology, information technology, and business
Model description: The model incorporates the Elaboration Likelihood Model [23]; Persuasive Systems Design [24]; the Internet Intervention Model [25]; and the Conceptual Model of User Engagement [20] to unify growing empirical evidence that individual, environmental, and design factors influence engagement with web-based interventions
Engagement conceptualization: This model describes engagement as “the quality of a user experience, characterized by increased attention, positive affect, sensory and intellectual satisfaction, and mastery.” Sustained engagement occurs when a user perceives an intervention to be usable, relevant, interactive, motivating, and persuasive

Theory/model: Unified Theory of Acceptance and Use of IT, Venkatesh et al. [26]
Disciplinary foundation: Psychology and information systems design
Model description: This model combines eight social cognitive theories into four key constructs that explain IT acceptance and use. These constructs are ‘performance expectancy’ (the degree to which an individual believes the system will assist him or her in achieving gains in job performance), ‘effort expectancy’ (the degree of ease associated with the use of the system), ‘social influence’ (the degree to which an individual perceives that important others believe he or she should use the new systems), and ‘facilitating conditions’ (perceptions of the resources and support available to perform a behaviour). These constructs, which serve as direct determinants of acceptance and usage behaviour, are moderated to varying degrees by age, gender, and voluntariness of use
Engagement conceptualization: The model is centered on the adoption and sustained use of technological interventions

Theory/model: Technology Acceptance Model, Davis [27]
Disciplinary foundation: Psychology and information systems design
Model description: The model is based on the idea that the user’s attitude toward technology influences the adoption and usage of a technological platform
Engagement conceptualization: The Technology Acceptance Model, which is similar to the Unified Theory of Acceptance and Use of IT, is used to explain the adoption and sustained use of a technological platform

Theory/model: Ludden et al.’s ‘PERMA’ Framework [28]
Disciplinary foundation: Persuasive Technology, Positive Psychology, and Information Technology
Model description: It is proposed that effective, appealing, and compelling design can improve adherence to web-based interventions. Positive emotion, engagement, relationships, meaning, and accomplishment are five components that are said to be relevant to the design of web-based interventions (PERMA)
Engagement conceptualization: Engagement is incorporated into the model as a component that can improve participant well-being
While AT can be used to explain why an individual user interacts with mHealth
artifacts in his or her daily communication, the Altheide [30] model of CEF provides
a rich description of how each individual interacts with mHealth artifacts in broader
socio-cultural contexts to inform DECENT tools (as detailed in Sect. 3.1). Thus, AT and
CEF are used as a lens for uncovering the socio-cultural contexts of a user group for
user engagement designs.

3 DECENT Framework and the Use of AT & CEF to Inform the DECENT Tools
The DECENT framework is intended to be a process-oriented user-centered design pro-
cess that takes a socio-cultural approach, has a critical understanding of the importance
of design in socio-cultural contexts, and focuses on aesthetic, socio-cultural, and contex-
tual values. The DECENT framework has six phases and is adapted from a user-centered
framework, as described in detail in Sect. 4. Phase 1: Socio-cultural filtration – This
phase involves a collection of sets of tools to understand users’ socio-cultural contexts.
Phase 1 consists of six tools as stated earlier: (1) personas, (2) capture cards and post-
cards, (3) communication delight, (4) self-built guide, (5) contextual inquiry, and (6)
ethics. Other phases of DECENT include analysis of needs assessment of users, design
solution, socio-cultural checklist, evaluation, and implementation. The phases of the
DECENT framework are described in Sect. 4.1.
AT and CEF perform as broad theoretical frameworks to inform DECENT tools.
AT and CEF provide a holistic context with which to inform DECENT tools (Fig. 1).
In this case of mHealth technology engagement design, AT can illustrate why a user
relates with mHealth technology, but it lacks a vocabulary to investigate what causes a specific
user to relate with mHealth technologies through his or her daily communication in
broader socio-cultural contexts. As a result, applying AT does not provide insight into
how users’ engagement with mHealth technologies is structured in social contexts. To
address this limitation, CEF (as defined by Foth and Hearn [13]) will be used, which
integrates three layers of interpretation (technical, social, and discursive) to provide a
rich description of how mHealth is structured in a social context.
Fig. 1. Activity system and CEF to inform the DECENT tools
These units of activity systems can be used as an organizing principle for understand-
ing the socio-cultural complexity of users [12]. Oers [31, p. 71] defined an activity in AT
as “any motivated and object-oriented human enterprise, with roots in cultural history
and depending for its actual occurrence on specific goal-oriented actions”. Deliver-
ing mobile training to community health workers (CHWs), for example, is an activity;
CHWs can replay important training content without the need for additional classroom
presence, which helps to manage care for vulnerable populations. The AT framework,
according to Good and Omisade [32, p. 54], “uses activity as the basic unit for studying
human practices”. “Activity, or ‘what people do,’ is reflected in actions as people interact
with their environment.” Components are embodied in activity. The theory’s components
are the subject, object, and community, while the artifacts used in the activities to deter-
mine the context are tools, rules, and divisions of labour. Activity is carried out by
a subject (e.g., mobile trainers) who carries out activity toward the solution from the
activity (object, e.g., trained CHWs) and is mediated by tools (e.g., training modules) in
collaboration with others (community). Cultural factors such as conventions (rules) and
social divisions (a division of labour) within the context shape and constrain the structure
of the activity [33]. AT also emphasizes context factors and interpersonal interactions,
arguing that some context must be considered in the analysis of human actions because
the ultimate cause of human activities is needs [34]. AT provides a strong framework
for investigating contextual factors and demonstrates the complexities and fluidity of
activities in context.

3.1 EXAMPLES of AT & CEF as a Theoretical Lens to Inform DECENT Tools
Use AT to Analyze Effective Personas: AT serves as an analytical lens in understanding
how to achieve the goal of creating effective personas to understand users. The instance
in Fig. 2 depicts the pattern in which developers interact with their tools to achieve their
goals of building an effective persona.

Fig. 2. A schematic diagram for analyses of building effective personas
Use AT to Analyze Capture and Postcards: AT serves as an analytical lens in under-
standing how to achieve the goal of creating capture and postcards to understand the ways
users engage or disengage with mHealth products in Fig. 3. The capture and postcards
(Fig. 3) are ones in which users are involved in improvising new approaches and solutions.
The users are referred to as “innovators” as they are individuals deeply involved in co-
creating their user engagement designs with the designers. They share more information
about different points of engagement/disengagement with mHealth products that they
have used previously with developers. The information will aid in user engagement
designs.

Fig. 3. A schematic diagram for analyses of users as innovators (capture and postcards)
This approach is applied as a DECENT tool to capture and post, in the form of cards,
special moments of users engaging with mHealth products and the different points of
engagement/disengagement, uncovering insights about users’ lifestyles and the inherent
cultural roots they often portray while engaging with mHealth products, to inform user
engagement designs.
Use CEF to Analyze Communicative Delight: The concept of communicative ecol-
ogy arose in response to concerns that studies attempting to identify causal relationships
between discrete technologies and social impacts overlook variables critical to the suc-
cessful implementation and adoption of technologies in situ [35]. The technical, social,
and discursive layers embodied by CEF will help illustrate how uses of technologies are
structured in social contexts by solidifying contextual factors in the division of labour
and community components of AT. According to Foth and Hearn [13], the three layers
of CEF are as follows: The technology and media layer describes the methods used
to communicate between various people and groups, and it includes all communica-
tion devices, distribution systems (either digital or analogue), and the technical systems
that enable them (e.g. software). The discursive layer is ideational and focuses on the
actual communication content, such as stories, understandings, beliefs, and symbols that
define – in this case – design culture and design practices for user engagement. The peo-
ple layer describes the various individuals and groups involved, as well as their social
relationships and the social institutions and structures that link them. Figure 4 shows the
analyses of communicative delight with CEF.
Fig. 4. A schematic diagram for analyses of communicative delight
Communicating visually via technology occurs to us every day and we should be
aware that it is happening. The technology and media layer explains the means used to
communicate between the different people and groups and includes all communication
devices, distribution systems (either digital or analogue), and the technical systems that
enable them (either software or mechanical). By information technology, we simply
mean those external devices and procedures that are used to help create, organize,
transmit, store, and retrieve information.
The discursive layer is ideational and focuses on the actual content of communication
such as the stories, understandings, beliefs, and symbols that define – in this case – design
culture and design practices for user engagement. The ecology or network metaphor
underlying this framework supports “the possibility of network analyses of relationships
between agents in the ecology” (Foth and Hearn, [13], p. 9), and supports a nuanced
approach to uncovering these “communication activities” because “to understand one
aspect of communication within a particular setting, you need to understand how it fits
into the wider communicative ecology” (Tacchi [36], p. 6).
Use AT to Analyze Self-Built Guide: Users build personal information based on their
experience engaging with a mHealth product in the form of stories.

Fig. 5. A schematic diagram of the self-built pattern.
This may help developers become more informed about users’ values and interests.
Figure 5 shows a schematic diagram of the self-built guide pattern.
4 DECENT Framework
This study is about improving user engagement designs and development. McCurdie
et al. [37] argue that mHealth technologies have not done enough to engage users.
It has been established in this paper that the ways users engage with mHealth technol-
ogy and behave are greatly influenced by their previous experience and socio-cultural
background. It is through a better understanding of users’ perceptions and socio-cultural
values that software designers/developers will move into a new paradigm of quality where
technological products add value, meet users’ true needs, and make their experi-
ence more meaningful [38]. Thus, we augmented two phases of the user-centered design
with “socio-cultural filtration” and “socio-cultural checklist” to inform the DECENT
tools. Thus, DECENT is an adapted model of user-centered design that provides tools
for establishing socio-cultural contexts. The DECENT framework has six phases and
is adapted from a user-centered framework (details in Sect. 4.1).
The presented “socio-cultural filtration” and “socio-cultural checklist” would fit well
into two phases of the user-centered design framework (Fig. 6). The “socio-cultural
filtration” phase is focused on enabling mHealth designers to understand the socio-
cultural contexts of the user to bring input to the first phase of user-centered design, the
“analysis of needs assessment of user” which focuses on the understanding of the user’s
needs and values.

Fig. 6. DECENT FRAMEWORK - adapted user-centered design framework extended by our
method. Changes are in blue.
The second phase of the user-centered design framework that would benefit from the
introduction of the presented socio-cultural checklist is the “design” phase. This phase
is focused on incorporating the data gathered from the “analysis of needs assessment
of user” and design of the solution. To this end, the “socio-cultural checklist” aids in
determining whether the quality of the solution is consistent with the socio-cultural
values of the designed solution.
As shown in Fig. 6, the “socio-cultural filtration” phase and “socio-cultural checklist”
phase extends the user-centered design framework in two specific phases to get to the
overall phases of the DECENT framework. Thus, DECENT has 6 phases: socio-cultural
filtrations, analysis, design, socio-cultural checklists, evaluations, and implementation.
The goal of DECENT is to help designers become more aware of, and sensitive to, users’
socio-cultural contexts and to capture them in user engagement designs. According to ([7],
p.12), the closer the similarity in the socio-cultural background between the user and
the designer, the stronger the assurance of a successful human-computer interaction.
Whether designers/developers share the same socio-cultural origin with users or not,
designers are required to be sensitive to the users’ socio-cultural contexts and be able to
view them using DECENT tools.
The DECENT tools for capturing socio-cultural contexts of the user group are shown
in Fig. 7. It comprises six steps.

Fig. 7. DECENT’s tools of socio-cultural context filtrations.
They comprise (I) contextual inquiry, (II) personas, (III) capture cards and postcards,
(IV) self-built guide, (V) communication delight, and (VI) ethics. These six steps are
explained in the next section of the phases of DECENT.
In understanding the Korean context of cross-cultural design practice in Korean edu-
cational design, Lee [39] conducted a comparison of ‘Before’ and ‘After’ the application
of the Cross-Cultural Design education method in Korean design education to create a
more user-centered design, which is summarized in Table 2.

Table 2. Summary of Lee [39]’s comparison of ‘Before’ and ‘After’ the application of the cross-cultural design

Before | After
Outcome-oriented design process | Process-oriented design process
Function, solution approach | Socio-cultural approach
Focus majorly on aesthetic values | Focus both on aesthetic, socio-cultural, and contextual values
Lack of critical understanding of the importance of design in socio-cultural contexts | Critical understanding of the importance of design in socio-cultural contexts

4.1 Phases of the DECENT Framework

DECENT has six phases: phase 1: socio-cultural filtration; phase 2: analysis of the needs
assessments of users; phase 3: design of the solution; phase 4: socio-cultural checklist;
phase 5: evaluation against requirements; and phase 6: implementation. The phases of
DECENT are described in detail as follows:

Phase 1: Socio-Cultural Filtration: This element of DECENT consists of tools to
enable mHealth developers to understand the socio-cultural complexity of the users.
Each tool for capturing the socio-cultural contexts of the user group is shown from I to
VI.
Contextual Inquiry: DECENT employs a contextual inquiry tool to elicit users’ socio-
cultural backgrounds by probing for examples of user behaviour when interacting with
a mHealth app, stories or images about user acceptance and use of mHealth apps, and
changes in users’ lifestyles because of their use of mHealth apps.

Personas: The DECENT framework uses Personas as a tool to represent a particular group
of people, based on their interests and behaviour. Collecting insights about the socio-
cultural background of the users, feelings of frustration, and their goals will aid in devel-
oping a broad knowledge of the users. Personas are synthesised from the trend obser-
vation [40]. One of the widely described benefits of personas in literature is improved
communication about the target users within the design team and with other stakehold-
ers [41–43]. Chapman and Milham [44] observed that personas still lack solid empirical
grounding. According to Miaskiewicz and Kozar [45], a name and a picture are selected
to represent the fictional representative in a persona and a persona is described in narra-
tive form. The narrative has two goals: (1) to make the persona seem like a real person,
and (2) to provide a story about what the persona needs in the context of the product
being designed. This narrative of a persona begins with a description of the type of
individual that the persona is, interests and frustrations, occupation, and so forth.

Capture and Postcard: The capture and postcard tool of the DECENT framework is
inspired by Lee [39]. Lee [39] engaged in a cross-cultural design programme called
“Bon-Voyage” that aimed to develop designs based on the understanding of Eastern and
Western cultures, and anchor ideas on the differences between Korean and British cul-
ture to inspire a unique product design that captures tourist experience while traveling.
Capture and postcard were important tools in uncovering the socio-cultural contexts of
tourists while traveling. ‘Capture Cards’ enable capturing of special objects encountered
while traveling, in a postcard format, thus allowing sharing the experience with others. In
the work, capture and postcard were identified as key mechanisms for understanding cul-
ture and cross-culture, identifying cross-cultural design strategies, and adding insights
into how cross-cultural design can benefit design communities. This idea is applied to
capture points of engagement, disengagement, re-engagement, and self-management of
one’s health that does not involve mHealth technology.
Self-Built Guide: Lee [39], in his paper created a self-built guidebook that allows users to
build a body of travel information based on their own experiences. The book was aimed
at assisting people in creating fun memories as they encounter locations that tourists
are not normally aware of, giving them the impression that they are on a treasure hunt
to discover information and stories to share and send to friends or keep as a personal
reminder. Self-built, as a DECENT tool, allows users to build a body of engagement
stories with app information based on their experience. It assists users in capturing the
memorable moments they encounter while engaging with a previous mobile app, moments
that designers are normally not aware of, which they can share with friends or keep as
personal memories.
Communication Delight: The DECENT framework incorporates communication delight
tools to share ideas and facilitate communication in the form of images between two or
more cultures. This helps bridge the gap in cultural differences between the software
designers and users in the design of mHealth technologies. According to Lee [39], designs
might emerge as a result of improved communication or as a result of advancements in
communication- it can work both ways. Lee [39] developed a tool Emotional Blind, that
gives British people the experience of Korean community culture by presenting how
people can freely express their emotions visually and communicate with neighbours. In
the same vein, communication delight as a DECENT tool enables designers and users
to share ideas and communicate via images and visual diagrams. The idea of using
visual images to communicate between two or more cultures arose from the
researchers’ interactions with software developers on ways to uncover the socio-cultural
background of users. Thus, communication delight is used as a DECENT tool to allow
mHealth users to communicate with designers or other users by using images rather than
words when interacting with them to share their experiences with mHealth products. As
a result, language barriers may no longer irritate the user.
Ethics: Every procedure for collecting data from participants should be ethical and
should not infringe on the participants’ rights.
Phase 2. Analysis: This phase incorporates the data gathered in the first phase (phase
1) and entails mapping out all of the necessary stakeholders as well as empathizing with
users.
Phase 3. Design: This is the phase in which ideas are generated. Ideas are generated
and can be improved through brainstorming. Team members build on each other’s ideas
before deciding on the best one and prototyping it.
Phase 4. Socio-Cultural Checklist: This phase involves determining whether the qual-
ity of the developed concepts/prototypes is consistent with the socio-cultural values of
the designed solution.
Phase 5. Evaluation: The evaluation phase entails testing the solution prototype with
users to learn how they feel about it.
Phase 6. Implementation: The final phase focuses on how to put the final solution into
action.
5 Conclusion
Much research has been conducted to investigate the concept of user engagement. This
paper identified and described eight theoretical perspectives pertinent to understand-
ing user engagement, namely the Flow theory, Motivation theory, O’Brien and Tom’s
Model of Engagement, Sidner et al.’s Model of Engagement, Short et al.’s Model of
User Engagement, Unified Theory of Acceptance and Use of IT, Technology Accep-
tance Model, and ‘PERMA’ framework. According to this paper, an interdisciplinary
approach incorporating a variety of technological and socio-cultural factors is required
to comprehensively model user engagement in mHealth-based interventions. This study
can be used to guide future research in mHealth designs by providing a set of consolidat-
ing concepts to strengthen the theoretical framing of user engagement designs. Further
evaluation is also required to determine the extent to which the core proposals of the
two theoretical perspectives – AT and CEF as a lens in uncovering users’ socio-cultural
contexts – are supported by empirical evidence in the implementation of the DECENT
tool. The plan for the future is to develop, refine and test DECENT using the theoretical
knowledge of AT and the CEF in a specific application of mHealth interventions.
Acknowledgments. This research was conducted with the financial support of Science Founda-
tion Ireland under Grant Agreement No. 18/CRT/6222 at the ADAPT SFI Research Centre
at Trinity College Dublin. The ADAPT SFI Centre for Digital Content Technology is funded by
Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under
the European Regional Development Fund (ERDF) through Grant #13/RC/2106_P2.
References
1. Doherty, K., Doherty, G.: Engagement in HCI: conception, theory, and measurement. ACM
Comput. Surv. (CSUR) 51(5), 1–39 (2018)
2. Hingle, M., Patrick, H.: There are thousands of apps for that: navigating mobile technology
for nutrition education and behavior. J. Nutr. Educ. Behav. 48(3), 213–218 (2016)
3. Tang, J., Abraham, C., Stamp, E., Greaves, C.: How can weight-loss app designers best engage
and support users? A qualitative investigation. Br. J. Health. Psychol. 20(1), 151–171 (2015)
4. Ikwunne, T., Hederman, L., Wall, P.J.: Design processes for user engagement with mobile
health: a systematic review. Int. J. Adv. Comput. Sci. Appl. 13(2), 291–303 (2022)
5. Ikwunne, T., Hederman, L., Wall, P.J.: Understanding user engagement in information and
communications technology for development: an exploratory study. In: Stephanidis, C., Mar-
cus, A., Rosenzweig, E., Rau, PL.P., Moallem, A., Rauterberg, M. (eds.) HCII 2020. LNCS,
vol. 12423, pp. 710–721. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60114-
0_46
6. Ono, M.M.: Emergent strategies for designing new products facing cultural diversity, within
the globalisation context. In: 2nd Conference on Innovative Research in Management,
Stockholm (2002)
7. Shen, S.T., Woolley, M., Prior, S.: Towards culture-centered design. Interact. Comput. 18(4),
820–852 (2006)
8. Hofstede, G.: Cultures and Organizations: Software of the Mind. McGraw-Hill, New York
(1991)
9. De Angeli, A., Kyriakoullis, L.: Globalisation vs. localisation in e-commerce: cultural-
aware interaction design. In: Proceedings of the Working Conference on Advanced Visual
Interfaces, pp. 250–253 (2006)
10. Honold, P.: Learning how to use a cellular phone: comparison between German and Chinese
users. Tech. Commun. 46(2), 196–205 (1999)
11. Sonderegger, A., Sauer, J.: The influence of socio-cultural background and product value in
usability testing. Appl. Ergon. 44(3), 341–349 (2013)
12. Frambach, J.M., Driessen, E.W., van der Vleuten, C.P.M.: Using activity theory to study
cultural complexity in medical education. Perspect. Med. Educ. 3(3), 190–203 (2014). https://
doi.org/10.1007/s40037-014-0114-3
13. Foth, M., Hearn, G.: Networked individualism of urban residents: discovering the com-
municative ecology in inner-city apartment buildings. Inf. Commun. Soc. 10(5), 749–772
(2007)
14. Brodie, R.J., Hollebeek, L.D., Jurić, B., Ilić, A.: Customer engagement: conceptual domain,
fundamental propositions, and implications for research. J. Serv. Res. 14(3), 252–271 (2011)
15. Brodie, R.J., Ilic, A., Juric, B., Hollebeek, L.: Consumer engagement in a virtual brand
community: an exploratory analysis. J. Bus. Res. 66(1), 105–114 (2013)
16. Hollebeek, L.: Exploring customer brand engagement: definition and themes. J. Strateg. Mark.
19(7), 555–573 (2011)
17. Cowley, B., Charles, D., Black, M., Hickey, R.: Toward an understanding of flow in video
games. Comput. Entertain. (CIE) 6(2), 1–27 (2008)
18. Webster, J., Ahuja, J.S.: Enhancing the design of web navigation systems: the influence of
user disorientation on engagement and performance. MIS Q., 661–678 (2006)
19. Seddon, K., Skinner, N.C., Postlethwaite, K.C.: Creating a model to examine the motivation
for sustained engagement in online communities. Educ. Inf. Technol. 13(1), 17–34 (2008)
20. O’Brien, H.L., Toms, E.G.: What is user engagement? A conceptual framework for defining
user engagement with technology. J. Am. Soc. Inform. Sci. Technol. 59(6), 938–955 (2008)
21. Sidner, C.L., Lee, C., Kidd, C.D., Lesh, N., Rich, C.: Explorations in engagement for humans
and robots. Artif. Intell. 166(1–2), 140–164 (2005)
22. Short, C., Rebar, A., Plotnikoff, R., Vandelanotte, C.: Designing engaging online behaviour
change interventions: a proposed model of user engagement (2015)
23. Petty, R.E., Cacioppo, J.T.: The elaboration likelihood model of persuasion. In: Petty, R.E.,
Cacioppo, J.T (eds.) Communication and Persuasion. SSSP, pp. 1–24. Springer, New York
(1986). https://doi.org/10.1007/978-1-4612-4964-1_1
24. Oinas-Kukkonen, H., Harjumaa, M.: Persuasive systems design: key issues, process model,
and system features. Commun. Assoc. Inf. Syst. 24(1), 28 (2009)
25. Ritterband, L.M., Thorndike, F.P., Cox, D.J., Kovatchev, B.P., Gonder-Frederick, L.A.: A
behavior change model for internet interventions. Ann. Behav. Med. 38(1), 18–27 (2009)
26. Venkatesh, V., Morris, M.G., Davis, G.B., Davis, F.D.: User acceptance of information
technology: toward a unified view. MIS Q., 425–478 (2003)
27. Davis, F.D.: A technology acceptance model for empirically testing new end-user information
systems: theory and results, Doctoral dissertation, Massachusetts Institute of Technology
(1985)
28. Ludden, G.D., Van Rompay, T.J., Kelders, S.M., van Gemert-Pijnen, J.E.: How to increase
reach and adherence of web-based interventions: a design research viewpoint. J. Med. Internet
Res. 17(7), e4201 (2015)
29. Engeström, Y., Miettinen, R., Punamäki, R. (eds.) Perspectives on Activity Theory. Cambridge
University Press, Cambridge (1999)
30. Altheide, D.L.: An ecology of communication: toward a mapping of the effective environment.
Sociol. Q. 35(4), 665–683 (1994)
31. van Oers, B.: Educational forms of initiation in mathematical culture. In: Kieran, C., Forman,
E., Sfard, A. (eds.) Learning Discourse, pp. 59–85. Springer, Dordrecht (2002). https://doi.
org/10.1007/0-306-48085-9_2
32. Good, A., Omisade, O.: Linking activity theory with user centred design: a human computer
interaction framework for the design and evaluation of. Appl. Interdiscip. Theory Health
Inform. Knowl. Base Practitioners 263, 49 (2019)
33. Kang, S.: Designing for design activity. In: Undisciplined! Design Research Society
Conference 2008, pp. 16–19. Sheffield Hallam University, Sheffield, UK (2009)
34. O’Leary, D.: An activity theory framework for DSS for extreme events: with a hurricane
example. In: Pre-ICIS SIG DSS Workshop (2007)
35. Dourish, P.: What we talk about when we talk about context. Pers. Ubiquit. Comput. 8(1),
19–30 (2004)
36. Tacchi, J.A.: Studying communicative ecologies: an ethnographic approach to information
and communication technologies (ICTs). In: 56th Annual Conference of the International
Communication Association (2006)
37. McCurdie, T., Taneva, S., Casselman, M., Yeung, M., McDaniel, C., Ho, W., et al.: mHealth
consumer apps: the case for user-centered design. Biomed. Instrum. Technol. 46(s2), 49–56
(2012). https://doi.org/10.2345/0899-8205-46.s2.49
38. Marzano, S.: New Values for the Millennium: Philips Corporate Design. V+ K Publishing,
Eindhoven (2000)
39. Lee, D.Y.: Interaction of cultures through design’ cross-cultural design (CCD) learning model:
the development and implementation of CCD design education in South Korean higher
education, Doctoral dissertation, Goldsmiths, University of London (2016)
40. Roussou, M., Katifori, A., Pujol, L., Vayanou, M., Rennick-Egglestone, S.J.: A life of their
own: museum visitor personas penetrating the design lifecycle of a mobile experience. In:
CHI 2013 Extended Abstracts on Human Factors in Computing Systems pp. 547–552 (2013)
41. Cooper, A., Reimann, R.M.: About Face 2.0. Indianapolis. Wiley, Hoboken (2002)
42. Grudin, J., Pruitt, J.: Personas, participatory design and product development: an infrastructure
for engagement. In: Proceedings of the Participatory Design Conference, pp. 144–161. ACM
Press (2002)
43. Ma, J., LeRouge, C.: Introducing user profiles and personas into information systems
development. In: Proceedings of the Americas Conference on Information Systems. AIS
(2007)
44. Chapman, C.N., Milham, R.P.: The persona’s new clothes: methodological and practical
arguments against a popular method. In: Proceedings of the Human Factors and Ergonomics
Society, pp. 634–636. HFES (2006)
45. Miaskiewicz, T., Kozar, K.A.: Personas and user-centered design: how can personas benefit
product design processes? Des. Stud. 32(5), 417–430 (2011)
Automated Meal Planner Using Multiple
User-Defined Benchmarks for Healthy Eating

Catherine Lyons-Rocque and Sudhanshu Kumar Semwal
Department of Computer Science, University of Colorado, Colorado Springs, USA
[email protected]

Abstract. “What’s for dinner?” is an age-old question asked in households across
the world on a daily basis. In a world where people are both increasingly busy
and increasingly health conscious, it is becoming more and more cumbersome to
create meal plans quickly, using user-defined recipes that meet nutritional value
targets. This paper seeks to solve that problem by allowing users to define recipes,
including each recipe’s nutritional value, and then generating a meal plan by randomly
selecting foods that fit the user-defined criteria in order to provide dietary variety.

Keywords: Meal planning · User interface design
1 Introduction
1.1 Problem Statement
For families, it can be cumbersome to balance nutrition, cost, and the time needed to make
the plan. While there are several tools widely available on a variety of platforms, there
seems to be no option that generates a plan for the user, unless that plan is driven by the
app’s proprietary recipes. According to the Bureau of Labor Statistics, the percentage
of households where both adults work is between 52% and 58% [1]. This increasing
busyness in day-to-day life has led people to turn to digital options for handling
simple, but time-consuming tasks. Meal planning is something that everyone needs to
do. Whether it is done meal-to-meal, day-to-day, week-to-week or month-to-month,
everyone at some point needs to decide what to eat. Cooking at home is cheaper than
dining out, and with health and fitness as a multi-billion-dollar industry, it is clear that
individuals are interested in trying to ensure their meals fit in with their dietary and
nutritional goals. Meal planning relies on individuals to juggle a number of factors: how
many individuals are they cooking for, does the plan for each day fit with dietary and
nutritional needs and exactly which recipes should be served when. Generating shopping
lists from these recipes without relying on digital tools is cumbersome, requiring the
individual to go through every recipe they plan on cooking and copying the ingredients
into a list.
The application created here was designed to make meal planning and shopping list
generation much easier for end users by providing a randomly generated meal plan
that fits within the provided nutritional and cost criteria, complete with a pre-generated
menu.
1.2 Desired Functionality
This tool was initially created to serve a modern working family where meal planning
was taking longer than desired. Trying to juggle nutrition goals manually was difficult
and time-consuming. Often, variability was ignored in favor of repeating the same meals
week after week since the calculations were already done. Another common issue was
making plans driven by cravings or struggling when none of the family recipes were
appealing at the time the meal plan was created.
Many meal planner application options currently available lack one or more of the
features that were desired. For example, the planner may only offer calories as a criterion,
ignoring other macro nutrients, and the family budget. Or the planner did not allow for
the user to input their own recipes, instead pushing their own recipes that may or may
not have been desirable. Or users could not manually set meals on given days, such as a
weekly family breakfast or favorite Friday night meal. Or the user could not get a new
plan for days that just did not feel pleasing, e.g., when a recipe had been overused.
Certain desired behaviors were defined for the scope of this project:

• Easy to use, generating a meal plan in seconds
• Use user-driven recipes
• Add new recipes to the selector
• Adaptability as dietary and nutrition needs adjusted via criteria (calories, macros,
cost)
• Ranked criteria where if a plan meeting all criteria on a given day could not be found,
least valued criteria could be removed, and the day regenerated (see the sketch after this list)
• Ability to select a new dinner if a system provided dinner was not desired
• Ability to set certain meals on certain days instead of generating the meal randomly,
while still randomly generating any meals not directly set by the user
• Provide a shopping list for the week’s meal plan to ease buying groceries
• Break the shopping list into “Pantry” items that are multi-use (such as milk, spices,
oils) and “Buy Weekly” items that are consumed in the recipe
• Option to include sides with dinner as not every user may want to plan for sides as
well as the meal
• Option to remove the previous week’s dinner recipes from the list of possible recipes
to force variation.
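To make the ranked-criteria behaviour above concrete, the sketch below shows one way such a fallback loop could be structured in C#. It is illustrative only: the names (RankedCriteriaPlanner, PlanDay, tryGenerateDay) are assumptions rather than identifiers taken from the application described in this paper.

// Illustrative sketch (not the authors' implementation): keep regenerating a day,
// dropping the least-valued criterion whenever no plan satisfies all active criteria.
using System;
using System.Collections.Generic;

public static class RankedCriteriaPlanner
{
    // 'criteriaByImportance' is ordered from least to most important, mirroring the
    // ranking stored in the user's preferences. 'tryGenerateDay' attempts to build a
    // day's plan that meets the currently active criteria and returns null on failure.
    public static TDay PlanDay<TDay>(IReadOnlyList<string> criteriaByImportance,
                                     Func<ISet<string>, TDay> tryGenerateDay)
        where TDay : class
    {
        var active = new HashSet<string>(criteriaByImportance);
        foreach (var least in criteriaByImportance)
        {
            var day = tryGenerateDay(active);
            if (day != null) return day;   // all currently active criteria were met
            active.Remove(least);          // relax the least important criterion and retry
        }
        // Last attempt with whatever criteria remain (possibly none).
        return tryGenerateDay(active)
               ?? throw new InvalidOperationException("No suitable recipes available.");
    }
}

In the application described below, tryGenerateDay would correspond to randomly drawing recipes and checking the day's totals against the user-defined thresholds.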

2 State of the Art
There are several popular existing apps to handle meal planning, but none that meets all
of the criteria listed above. This section will cover several popular meal planning apps.
Popularity is based on the number of downloads in the Google Play Store. While each of these
apps have several features to reduce the stress of meal planning, none provide all of the
functionality listed in the previous section.

Paprika
This app stores your recipes, allowing you to sort your recipes into user-defined cate-
gories. Recipes can be added manually or imported directly from websites. It generates
shopping lists sorted into categories (such as “Dairy” or “Produce”). It also includes
“Pantry” items that users would keep in stock, with the ability to check off if it is in
stock, when it was purchased, and when it expires. It allows you to save reusable menus
for your favorite meals [2].
However, it still requires manual meal planning, returning to the same issue of having
to select recipes when the user may be driven by cravings or a “nothing sounds good” mood.
Additionally, nutritional value of the recipes is not included (whether calculated or
entered by the user), leaving it to the user to track the criteria of their meal plan manually.
While it presents an easy-to-use interface, it does not meet the need of reducing time
spent meal planning.

Mealime
Mealime constructs meal plans for the user based on dietary restrictions and preferences.
User profiles boast “200 personalization options”: plan type (such as “Classic” or “Veg-
etarian”), saved allergies and dislikes, and adjustment of recipes for the number of servings
desired. Nutritional data for each recipe is provided, and the app helps manage the cost
of the meal plan [3].
Where this app falls short is that there is no mechanism for user-defined recipes. They
provide only their own recipes, with no option to add additional recipes. The user is
limited to whatever recipes are available in the app. It also relies on the user to build the
plan manually, rather than providing a randomly generated plan. Nutritional information
is not totaled for the day, leaving total calculations for the user.

Whisk: Recipes and Meal Planner
Whisk allows users to import meal plans, provides nutritional calculations for each
recipe, has a catalog of recipes for trying new things. It allows you to scale the number
of servings up or down. It provides an easy-to-use shopping list. It provides a direct
connection to online shopping services such as Amazon, Walmart, InstaCart and more.
This allows users to automatically order their groceries with just a few clicks [4].
But as is the case with the other meal planners, it is still up to the user to take the time
to select meals. It also does not separate the list between items the user may have in stock
versus what needs to be purchased each time a recipe is cooked. No daily nutritional
value or weekly cost sums are available, leaving calculations to be made by the user.

3 Necessary Data and Tools Used – and Problem Solution
Prior to software design, it was important to identify what data would be necessary,
and the most appropriate way to store it. First, the criteria for how to select recipes was
narrowed down to the primary targets of daily nutrition: number of calories, amount in
grams of carbohydrates, protein and fat. Additionally, managing costs is important to
many households, so maintaining a weekly cost limit would aid users in staying within
their food budget. A weekly cost criterion was chosen over a daily cost criterion to
allow more expensive recipes to be included, while offsetting the weekly cost with less
expensive recipes on other days of the week. Visual Studio with WinForms classes was
used for the initial design. This provided simple tools to drag, drop, and change GUI
elements quickly and easily. Both the GUI and the underlying processing are coded in
C#. For data storage, csv and text files are used. Csv files provide simple two-dimensional
tables with minimal overhead, and built-in library tools exist for reading and
writing to such files. Additionally, as a standard file type, there is no concern for any
changes to APIs from using more robust database tools. In the future, the vision for this
project is to become a mobile app. Csv and text files have a low storage footprint, and
do not require internet connectivity to use. They also do not require additional software
to be installed on the system. It was determined that these file types provided all the
functionality necessary without loss of data clarity.
Three files in total are required for this application to work. First, Recipes.csv, which
holds recipe names, nutritional values, cost, and the active Boolean. Second, Ingredients.csv,
which holds the ingredients of each recipe, with each ingredient as a single line
containing the recipe name, ingredient name, units, quantity, and the pantry Boolean.
The “isPantry” Boolean marks an ingredient as the sort that does not need
to be purchased every time it is called for, for example milk or olive oil. These are items
that the user would have “in their pantry”, and the user simply needs to check whether the
quantity they have is enough for the week’s recipes.
This value is user-defined in the Input Recipe GUI, which allows users to tailor items
based on their own shopping habits. For example, some users may buy pasta in bulk and
would thus label it a “pantry” item, while others may simply buy a package of pasta
every time the meal plan includes a pasta dish.
The final file required for processing is the Preferences.txt file. This contains the
maximum thresholds for calories, carbohydrates (in grams), protein (in grams), fat (in
grams), and maximum weekly cost. Additionally, it contains a string list of each of these
criteria, ordered from least important to most important to the user. Each of these values
can be edited via the Preferences GUI.

Algorithm Details
In this section, the key algorithms developed in this application are reviewed. Not all
methods are covered, only those the author felt were significant or that were significantly coded
by the author. Before discussing the algorithms, an overview of the classes is needed.
The “Recipe” class defines a recipe by name, which meal of the day it is associated with
(an enumeration defined as Breakfast, Lunch, Dinner, Side, and Snack), the number
of calories in the recipe, the amount of carbohydrates (grams), the amount of protein
(grams), the amount of fat (grams), a Boolean indicating if the recipe is active, a list of
ingredients in the recipe, and the cost as a double. Though output uses the “$” symbol
to denote American dollars, the field itself is not tied to any currency, and users can
enter values in whatever local currency they prefer. This class does no manipulation or
calculations of data. The only methods available are simple getters.
The “Ingredient” class defines an ingredient by name, quantity, unit of measure (an
enumeration defined as Unit, lbs, tsp, tbsp, oz, cup, kg, g, ml, and l), and a Boolean
indicating whether the item is a “pantry” item or not. No calculations are performed in
this class. Methods consist of simple getters, and a method to convert the ingredient to a
string based on the layout of the Ingredients.csv file (quantity, unit and name separated
by tab characters) for use in the IO class.
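To make this data model concrete, a minimal sketch of the two classes described above is given
below in C#. The property names, constructor signatures, and the exact tab-separated layout are
illustrative assumptions based on this description, not the authors' verbatim code.

using System.Collections.Generic;

// Enumerations as described in the text.
public enum Meal { Breakfast, Lunch, Dinner, Side, Snack }
public enum Unit { Unit, lbs, tsp, tbsp, oz, cup, kg, g, ml, l }

public class Ingredient
{
    public string Name { get; }
    public double Quantity { get; }
    public Unit Unit { get; }
    public bool IsPantry { get; }   // true if the item does not need to be bought every week

    public Ingredient(string name, double quantity, Unit unit, bool isPantry)
    {
        Name = name; Quantity = quantity; Unit = unit; IsPantry = isPantry;
    }

    // Matches the tab-separated layout described for Ingredients.csv (quantity, unit, name).
    public override string ToString() => $"{Quantity}\t{Unit}\t{Name}";
}

public class Recipe
{
    public string Name { get; }
    public Meal Meal { get; }
    public int Calories { get; }
    public double Carbs { get; }     // grams
    public double Protein { get; }   // grams
    public double Fat { get; }       // grams
    public bool IsActive { get; }
    public double Cost { get; }      // currency-agnostic, per the description above
    public List<Ingredient> Ingredients { get; }

    public Recipe(string name, Meal meal, int calories, double carbs, double protein,
                  double fat, bool isActive, double cost, List<Ingredient> ingredients)
    {
        Name = name; Meal = meal; Calories = calories; Carbs = carbs; Protein = protein;
        Fat = fat; IsActive = isActive; Cost = cost; Ingredients = ingredients;
    }
}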

The “Day” class defines the “Day” object, which contains a recipe for breakfast,
lunch and dinner, a list of recipes for snacks, an enumeration for the day of the week,
and a Boolean indicating if it is a “good” day, i.e., one that meets all of the criteria as defined
by the user, minus any criteria removed from calculation consideration. This class does not
change data outside itself. The methods consist of simple getters/setters,
and methods that total the calories, carbohydrates, protein, fat, and cost of the recipes
held by the Day object.
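Continuing the sketch, a Day class consistent with this description might look as follows. The
Side slot and the exact method names are assumptions, while the totaling methods mirror the
ones described above.

using System;
using System.Collections.Generic;
using System.Linq;

public class Day
{
    public DayOfWeek DayOfWeek { get; set; }               // which day of the week this plan is for
    public Recipe Breakfast { get; set; }
    public Recipe Lunch { get; set; }
    public Recipe Dinner { get; set; }
    public Recipe Side { get; set; }                        // hypothetical slot for the optional dinner side
    public List<Recipe> Snacks { get; set; } = new List<Recipe>();
    public bool Good { get; set; }                          // true if the day meets the (remaining) criteria

    // All recipes currently held by the day, skipping any meal that has not been set yet.
    private IEnumerable<Recipe> All() =>
        new[] { Breakfast, Lunch, Dinner, Side }.Where(r => r != null).Concat(Snacks);

    public double TotalCalories() => All().Sum(r => r.Calories);
    public double TotalCarbs()    => All().Sum(r => r.Carbs);
    public double TotalProtein()  => All().Sum(r => r.Protein);
    public double TotalFat()      => All().Sum(r => r.Fat);
    public double TotalCost()     => All().Sum(r => r.Cost);
}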
The “Preferences” class defines the user preferences of the application. It holds max
calories, max carbohydrates, max protein, max fat, max weekly cost, the ranking of the
aforementioned criteria, a Boolean indicating whether the user would like to remove the
previous week’s dinners from the list of options (preventing a recipe from appearing more
than once every other week to increase variety), and a Boolean indicating whether the
user would like to include dinner side dishes in their meal plan. Beyond the simple getter
and setter methods, this class provides the mechanism to both read in the preferences
from the “Preferences.txt” file, and to save updated preferences back into the same file.
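The read/save mechanism could be sketched as below. The exact line layout of Preferences.txt is
not documented here, so a simple one-value-per-line format followed by the ranked criteria list is
assumed purely for illustration.

using System.Collections.Generic;
using System.IO;
using System.Linq;

public class Preferences
{
    public double MaxCalories, MaxCarbs, MaxProtein, MaxFat, MaxWeeklyCost;
    public bool RemoveLastWeeksDinners;                         // avoid repeating last week's dinners
    public bool UseDinnerSides;                                 // include side dishes with dinner
    public List<string> CriteriaRanking = new List<string>();   // least important to most important

    // Assumed layout: five numeric thresholds, two Booleans, then the ranked criteria names.
    public static Preferences Load(string path)
    {
        var lines = File.ReadAllLines(path);
        return new Preferences
        {
            MaxCalories   = double.Parse(lines[0]),
            MaxCarbs      = double.Parse(lines[1]),
            MaxProtein    = double.Parse(lines[2]),
            MaxFat        = double.Parse(lines[3]),
            MaxWeeklyCost = double.Parse(lines[4]),
            RemoveLastWeeksDinners = bool.Parse(lines[5]),
            UseDinnerSides         = bool.Parse(lines[6]),
            CriteriaRanking        = lines.Skip(7).ToList()
        };
    }

    public void Save(string path)
    {
        var lines = new List<string>
        {
            MaxCalories.ToString(), MaxCarbs.ToString(), MaxProtein.ToString(),
            MaxFat.ToString(), MaxWeeklyCost.ToString(),
            RemoveLastWeeksDinners.ToString(), UseDinnerSides.ToString()
        };
        lines.AddRange(CriteriaRanking);
        File.WriteAllLines(path, lines);
    }
}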
The “IO” class handles the input and output processing necessary for the application,
including manipulation of the data into output-ready format. The “IO.CSVRecipeInput”
method reads the “Recipes.csv” file and parses out the columns to create a list of Recipe
objects, including converting Meal integers to the proper enumeration value (0 = Break-
fast, 1 = Lunch, 2 = Dinner, 3 = Snack, 4 = Side) and converting “TRUE” or “FALSE”
text to a Boolean for “isActive”. The contents of the Recipes.csv are shown in Fig. 1
with example recipes.

Fig. 1. Recipes.csv file

For each recipe it reads in, it calls “IO.IngredientInput”, which returns a list of ingre-
dients associated with the recipe. These ingredients are stored in the “Ingredients.csv”
file, which contains all the necessary information required for an Ingredient object,
including the name of the recipe it is associated with, the name of the ingredient, the
quantity, the unit, and a Boolean value indicating if it is a pantry item. Figure 2 shows
the Ingredients.csv file with the ingredients for several recipes. Even simple foods
consumed as single ingredients (such as coffee) can be entered to be included in the
meal plan. Additional ingredients (such as coffee creamer) can be added to the “Coffee”
recipe to ensure proper calculations.
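Building on the class sketch above, a simplified version of this input step might look like the
following. The column order of Recipes.csv and the helper names are assumptions inferred from
this description (the authoritative layout is the one shown in Fig. 1), and any header row is
assumed absent.

using System;
using System.Collections.Generic;
using System.IO;

public static class IOHelper   // stand-in for the paper's "IO" class
{
    // Reads Recipes.csv and builds Recipe objects; the column order used here is assumed.
    public static List<Recipe> CSVRecipeInput(string recipesPath, string ingredientsPath)
    {
        var recipes = new List<Recipe>();
        foreach (var line in File.ReadLines(recipesPath))
        {
            var cols = line.Split(',');
            int mealCode = int.Parse(cols[1]);
            // 0 = Breakfast, 1 = Lunch, 2 = Dinner, 3 = Snack, 4 = Side
            Meal meal = mealCode == 0 ? Meal.Breakfast
                      : mealCode == 1 ? Meal.Lunch
                      : mealCode == 2 ? Meal.Dinner
                      : mealCode == 3 ? Meal.Snack
                      : Meal.Side;
            bool isActive = cols[7].Trim().Equals("TRUE", StringComparison.OrdinalIgnoreCase);
            recipes.Add(new Recipe(
                cols[0], meal,
                int.Parse(cols[2]),                        // calories
                double.Parse(cols[3]),                     // carbs (g)
                double.Parse(cols[4]),                     // protein (g)
                double.Parse(cols[5]),                     // fat (g)
                isActive,
                double.Parse(cols[6]),                     // cost
                IngredientInput(ingredientsPath, cols[0])));
        }
        return recipes;
    }

    // Returns the ingredients whose first column matches the given recipe name.
    public static List<Ingredient> IngredientInput(string ingredientsPath, string recipeName)
    {
        var result = new List<Ingredient>();
        foreach (var line in File.ReadLines(ingredientsPath))
        {
            var cols = line.Split(',');   // recipe name, ingredient name, unit, quantity, isPantry
            if (cols[0] != recipeName) continue;
            result.Add(new Ingredient(
                cols[1],
                double.Parse(cols[3]),
                (Unit)Enum.Parse(typeof(Unit), cols[2], ignoreCase: true),
                bool.Parse(cols[4])));
        }
        return result;
    }
}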

CalculateDay Method
Calculator.CalculateDay is the primary method where most calculations are completed.
For each day of the week, it calculates a meal plan meeting the user’s criteria. If a meal
cannot be found that meets the criteria, it will start removing criteria from the calculations
by setting the threshold variable to an absurdly high amount of 500,000 calories, and 5,000 each
for carbs, protein, fat, and cost.

Fig. 2. Ingredients.csv file

Figure 3 shows the entry to the CalculateDay method, which takes in the Day object
of the day of the week being calculated.

Fig. 3. CalculateDay beginning

After reading the preference values and ensuring that there are remaining dinner
recipes to choose from, the CalculateDay method begins with calculating dinner. Dinner
is usually the largest meal of the day for most American families as parents work during
the day and children are typically in school.
Because dinner is the largest meal, it was decided that it should be entered first, as the other, smaller
meals would be easier to fit within the criteria values remaining after choosing dinner.
As dinner is the first meal, the method does not need to check if the recipe calories and
macros will fit. It is assumed that the user will not input a dinner whose calories, carbs,
protein and fat are higher than the daily maximum.

If Day.Dinner is not null, it indicates that the user utilized the “Set Meal” functionality
to set a specific recipe for that day’s dinner.
If the Day object has a “null” dinner recipe, the dinner is chosen by calling “SelectMeal”,
which takes a list of recipes as input, selects a random index, and returns
the recipe in that index. After a recipe is chosen, the recipe is removed from the list of
dinners to prevent duplicates appearing in a single week.
After the dinner is removed, UpdateRemaining is called, which takes the recipe as input
and, as output parameters, the variables containing the remaining calories, carbs,
protein, and fat. UpdateRemaining subtracts the recipe’s values from the totals remaining
for the day, as well as subtracting the cost from the remainingWeeklyCost variable, which
is held at the class level (and therefore does not need to be passed in or out of the method).
With dinner set, and again assuming that the user would not enter a dinner recipe with
calorie, carb, protein, or fat values that exceed the daily target, the Day.Good Boolean
is set to True indicating that the recipe fits in the day.
After the dinner is set, if there is at least one Side recipe, and the user has selected to
enable using dinner sides in the meal plan, the side dish is selected via SelectMeal, and
then UpdateRemaining is called. Again, no check against the criteria totals is done as it
is highly unlikely that a dinner plus side dish would consume all calories, carbs, protein
and fat for the day.
The code for the dinner portion of the algorithm is shown in Fig. 4.

Fig. 4. CalculateDay - dinner logic
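Since Fig. 4 reproduces the code as an image, the following is an approximation of the dinner
step as described in the prose, not the authors' exact code; SelectMeal, UpdateRemaining, the
remainingWeeklyCost field, and the Side slot follow the names used or implied above.

using System;
using System.Collections.Generic;

public class CalculatorSketch   // stand-in for the paper's Calculator class
{
    private static readonly Random Rng = new Random();
    private double remainingWeeklyCost;   // weekly cost budget, tracked at the class level

    // Picks a random recipe from the given list.
    private static Recipe SelectMeal(List<Recipe> options) => options[Rng.Next(options.Count)];

    // Subtracts a recipe's values from the remaining daily budgets and the weekly cost.
    private void UpdateRemaining(Recipe r, ref double calories, ref double carbs,
                                 ref double protein, ref double fat)
    {
        calories -= r.Calories;
        carbs    -= r.Carbs;
        protein  -= r.Protein;
        fat      -= r.Fat;
        remainingWeeklyCost -= r.Cost;
    }

    // Dinner step: use a pre-set dinner if one exists, otherwise pick one at random,
    // remove it from the weekly dinner pool, update the budgets, and optionally add a side.
    private void ChooseDinner(Day day, List<Recipe> dinners, List<Recipe> sides, bool useSides,
                              ref double calories, ref double carbs, ref double protein, ref double fat)
    {
        if (day.Dinner == null)               // not pre-set via "Set Meal"
            day.Dinner = SelectMeal(dinners);

        dinners.Remove(day.Dinner);           // no duplicate dinners within the week
        UpdateRemaining(day.Dinner, ref calories, ref carbs, ref protein, ref fat);
        day.Good = true;                      // dinner is assumed to fit the daily targets

        if (useSides && sides.Count > 0)
        {
            day.Side = SelectMeal(sides);     // no fit check: dinner plus side is assumed to fit
            UpdateRemaining(day.Side, ref calories, ref carbs, ref protein, ref fat);
        }
    }
}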

Lunch logic is summarized below (a minimal sketch of this selection loop follows the list): as the
algorithm moves into the lunch selection, the program needs to start checking whether each
remaining meal fits within the criteria for the day once its values are added to those of the
dinner selection. If the user added the lunch manually, the following process is skipped.

• A Boolean “lunchOk” is used to track whether the lunch fits into the meal plan criteria
for the day. A second list is created so recipes that do not fit can be removed. Currently,
only dinners need to be unique, and so the lunch recipes are not removed from
the main list of lunch recipes after selection, allowing them to be repeated during the
week. If the lunch was not set by the user, after randomly selecting the lunch recipe
via SelectMeal, the RecipeOk method is called, passing in the recipe and remaining
criteria values. RecipeOk checks if the remaining values minus the values of the
selected lunch recipe are greater than zero, and sets “lunchOk” to the returned value.
• If the lunch fits, UpdateRemaining is called, the Day’s Lunch variable is set to the
chosen recipe, the day is marked as good, and the algorithm breaks from the loop. If
the lunch does not fit, the Day.Good value is set to false, and the recipe is removed
from the list of possible lunches for that day.
• If the “todayLunches” list gets emptied due to no lunch fitting in the criteria, the
CalculateDay method is exited, returning to the Calculate method, which then calls
UpdateCriteria to remove the least important criteria remaining from the list of criteria,
and attempts to calculate the day again.
• As the Dinner reference on the day is already set, it will try to find a lunch that now
fits based on the updated criteria (where the variable tracking the max threshold for the
least important remaining criterion is set high enough that the check always returns
TRUE when any recipe’s values are subtracted from it).
• If the lunch is already set, the values are subtracted from the remaining totals, and the
Day.Good value is set to TRUE.
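Continuing the CalculatorSketch members above, a minimal version of the lunch loop and of
RecipeOk might look as follows; it is an approximation under the same assumed names.

// Additional members of the CalculatorSketch class from the previous sketch.

// Returns true if subtracting the recipe keeps every remaining budget at or above zero.
private static bool RecipeOk(Recipe r, double calories, double carbs, double protein,
                             double fat, double cost)
{
    return calories - r.Calories >= 0 && carbs - r.Carbs >= 0 &&
           protein  - r.Protein  >= 0 && fat   - r.Fat   >= 0 &&
           cost     - r.Cost     >= 0;
}

// Lunch step: lunches may repeat across the week, so only the per-day working copy shrinks.
// Returns false when no lunch fits, so the caller can relax the least important criterion and retry.
private bool ChooseLunch(Day day, List<Recipe> lunches,
                         ref double calories, ref double carbs, ref double protein, ref double fat)
{
    if (day.Lunch != null)                    // lunch was pre-set via "Set Meal"
    {
        UpdateRemaining(day.Lunch, ref calories, ref carbs, ref protein, ref fat);
        day.Good = true;
        return true;
    }

    var todayLunches = new List<Recipe>(lunches);
    while (todayLunches.Count > 0)
    {
        Recipe candidate = SelectMeal(todayLunches);
        if (RecipeOk(candidate, calories, carbs, protein, fat, remainingWeeklyCost))
        {
            UpdateRemaining(candidate, ref calories, ref carbs, ref protein, ref fat);
            day.Lunch = candidate;
            day.Good = true;
            return true;
        }
        day.Good = false;
        todayLunches.Remove(candidate);       // this candidate no longer fits today
    }
    return false;                             // todayLunches exhausted
}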

The maximum number of snacks per day is three, leaving a possible range of snacks
chosen from zero to three.
First, a list of snacks to hold the snacks chosen for the day is created. Then, a
temporary list of snacks duplicates the snacks list to avoid deletion of snacks from
the main snacks list, similar to todayLunches and todayBreakfasts lists used in their
respective portions of the code. Snacks can not only be repeated day-to-day, but can be
repeated within the day as well.
If any snacks have already been set by the user in the “Set Meal” interface, those
snacks update the total remaining values for the criteria, and are added to the “daySnacks”
list. While there are snacks in the “tempSnacks” list and there are fewer than three snacks
in the “daySnacks” list, all snacks remaining in “tempSnacks” are checked to see whether they
would be acceptable (i.e., will not overshoot the criteria). Any snack that no longer fits
after each snack is added is removed from the list.
This prevents a snack that does not fit from being chosen, and
breaks out of the loop if the candidate list is exhausted prior to reaching the three-snack
maximum. As this filtering occurs before the snack is randomly selected, the selected
snack does not need to be checked again prior to adding it to the list.
After a snack is chosen, the remaining criteria values are updated, the snack is added
to the list of snacks for the day, and the process is completed. This continues until either
there are no snacks that will fit the remaining criteria for the day, or until the maximum
three snacks are reached.
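The snack step can be sketched in the same style (again an approximation, not the authors'
code): presets are counted first, unfitting candidates are filtered out before each random pick,
and the loop stops at three snacks or when nothing fits.

// Additional member of the CalculatorSketch class from the previous sketches.
private void ChooseSnacks(Day day, List<Recipe> snacks,
                          ref double calories, ref double carbs, ref double protein, ref double fat)
{
    var daySnacks = new List<Recipe>(day.Snacks);            // snacks already set via "Set Meal"
    foreach (var preset in daySnacks)
        UpdateRemaining(preset, ref calories, ref carbs, ref protein, ref fat);

    var tempSnacks = new List<Recipe>(snacks);                // working copy; the main list is untouched
    while (tempSnacks.Count > 0 && daySnacks.Count < 3)
    {
        // Drop candidates that would overshoot the remaining budgets before selecting.
        for (int i = tempSnacks.Count - 1; i >= 0; i--)
            if (!RecipeOk(tempSnacks[i], calories, carbs, protein, fat, remainingWeeklyCost))
                tempSnacks.RemoveAt(i);
        if (tempSnacks.Count == 0) break;

        Recipe snack = SelectMeal(tempSnacks);                // repeats within the day are allowed
        UpdateRemaining(snack, ref calories, ref carbs, ref protein, ref fat);
        daySnacks.Add(snack);
    }
    day.Snacks = daySnacks;
}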

Reroll Method
The Reroll method is called when the user is unhappy with the dinner choices for a given
day and wants to generate a new random choice.
To start, the method checks if the list of dinner recipes is empty. This occurs when
the user rerolls more times than they have dinner recipes, and so no recipe is left to be
chosen. The error message prompts the user to exit the application and try again with
the list of dinners reloaded.
If the list of dinner recipes is not empty, the program ensures that the chosen recipe
is removed from the list. It should have been removed when the dinner was set, but a
simple call to List.Remove verifies that it is no longer in the list of options.
A new dinner is chosen via the SelectMeal method, and that recipe is removed from the
list of potential dinners. Before that meal is set to the Day.Dinner property, the program
first finds the remaining allotment for calories, carbs, protein, and fat by taking the max
threshold value from the preferences object, subtracting the amount from the day (which
is totaled across all recipes held by that day), and adding back the values of the dinner
recipe being changed. The same is also done with cost, but the arithmetic is simpler as
it is held at the weekly level as a class variable. As such, only the cost of the changing
dinner recipe need be added, without the need to query the Day object.
After these variables have been calculated, the RecipeOk method is called to verify
the dinner will work within the confines of the remaining threshold values. If it fits the
requirements, the recipe is set to the Day.Dinner property.
This entire process is repeated until either the list of dinner recipes is exhausted, or
a suitable replacement dinner recipe is found and is added to the meal plan.
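The core of the Reroll calculation, adding the outgoing dinner's values back into the allotment
before checking the replacement, might look roughly like this; the Day total methods and the
Preferences fields follow the earlier sketches, and the exact names are assumptions.

// Additional member of the CalculatorSketch class; returns false if no replacement dinner fits.
private bool Reroll(Day day, List<Recipe> dinners, Preferences prefs)
{
    if (dinners.Count == 0)
        return false;                            // caller shows the "restart to reload dinners" message

    dinners.Remove(day.Dinner);                  // make sure the current dinner cannot be re-chosen

    while (dinners.Count > 0)
    {
        Recipe candidate = SelectMeal(dinners);
        dinners.Remove(candidate);

        // Remaining allotment = daily maximum - the day's current totals + the outgoing dinner's values.
        double calories = prefs.MaxCalories - day.TotalCalories() + day.Dinner.Calories;
        double carbs    = prefs.MaxCarbs    - day.TotalCarbs()    + day.Dinner.Carbs;
        double protein  = prefs.MaxProtein  - day.TotalProtein()  + day.Dinner.Protein;
        double fat      = prefs.MaxFat      - day.TotalFat()      + day.Dinner.Fat;
        double cost     = remainingWeeklyCost + day.Dinner.Cost;  // cost is tracked at the weekly level

        if (RecipeOk(candidate, calories, carbs, protein, fat, cost))
        {
            remainingWeeklyCost = cost - candidate.Cost;
            day.Dinner = candidate;
            return true;
        }
    }
    return false;
}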

SetMeal Method
The SetMeal method allows the user to set a specific recipe to a specific meal on a specific
day. It takes the meal, recipe name, and the day (as an int to convert to the enumeration).
This method is called from the GUI, and the user’s choices are passed in. If it finds the
recipe in the list of recipes, it sets the recipe to the appropriate meal on the appropriate
day and returns true. If the meal is not found, the method returns false.
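As described, SetMeal reduces to a lookup and an assignment; a hedged sketch follows, in which
the meal-type filter and the week list are illustrative assumptions.

// Additional member of the CalculatorSketch class; "week" holds the seven Day objects.
public bool SetMeal(Meal meal, string recipeName, int dayIndex, List<Recipe> recipes, List<Day> week)
{
    Recipe recipe = recipes.Find(r => r.Name == recipeName && r.Meal == meal);
    if (recipe == null)
        return false;                            // recipe not found

    Day day = week[dayIndex];                    // dayIndex converts to the day-of-week enumeration
    switch (meal)
    {
        case Meal.Breakfast: day.Breakfast = recipe; break;
        case Meal.Lunch:     day.Lunch     = recipe; break;
        case Meal.Dinner:    day.Dinner    = recipe; break;
        case Meal.Snack:     day.Snacks.Add(recipe); break;
        default:             day.Side      = recipe; break;
    }
    return true;
}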

Implementation
This section reviews the implementation details of the Meal Planner tool. First, the
GUI and each of the features available on it are described. Following
the GUI description, a review of the algorithms used to generate the meal plan for the
user is provided.
The user interface is simple. The program starts up with several options presented to the user.
“Shopping” (displays the shopping list), “Reroll Dinner” (replaces dinner on a given day
with a randomly generated alternative) and “Save Plan” (saves a text file to a user-selected
location) are all disabled.
These options require a meal plan to have been generated, and they remain disabled until after
the meal plan is generated to avoid user confusion.

The GUI starts with the following options available to the user: “Generate” (generates
the meal plan and outputs it to the screen), “Set Meal” (allows the user to set a specific
meal), “Preferences” (allows the user to set and update preferences), “Add Recipe”
(allows user to add a new recipe to the planner), and “Update Active/Inactive Recipes”
(allows user to reactivate or deactivate selected recipes). (Fig. 5).

GUI Description
Main UI

Fig. 5. Meal Planner GUI upon start up

Fig. 6. Preferences GUI

The “Preferences” GUI allows the user to set different options that impact the criteria
the system uses. The user can update their preferred maximum value for their calories,
carbohydrates (carbs), protein, fat, and cost. Once the user has the required number of
recipes available, the “Generate” button will activate the calculator and will output a
meal plan to the text box located on the main GUI (Fig. 6).
Figure 7 shows how a meal plan appears to the user after it is generated. First, any
criteria that had to be removed in order to generate the plan are shown to the user, so
they can see which criteria were ignored.
Next, the total cost of ingredients for all of the week’s recipe entries is displayed.
Then each day’s plan is displayed. The name of the day and a breakdown of
the nutritional criteria for that day are provided, allowing the user to see their nutritional
values immediately.
Each meal is then listed, including any snacks that were added to the day (max three,
assuming the snacks fit the criteria). If “Use Dinner Sides” is checked, the meal plan
will list dinner as “Dinner <recipe> and: <side>” (see Fig. 8).

Fig. 7. Main GUI - post plan generation

Fig. 8. Meal plan with dinner sides

4 Future Work
This work had a few features that were out of scope of this project but should be added to
increase usability. The ability to check off ingredients as the user purchased them would
make tracking the shopping easier. Unit conversions to condense shopping list entries
would make the shopping list more readable (e.g., “1.25 cups of flour” instead of “1 cup
flour, 4 tbsp flour”). The ability to edit recipes would allow for tweaks without requiring
the user to re-enter a recipe with a new name and deactivate the previous recipe, or to
edit the recipe file directly. Finally, converting the tool into a mobile application would make
it easier for the user to access the items from anywhere via the user’s smart phone.
Applications such as this are infinitely extensible. There are many ways of using this
app, and as such many new features that could be added to support the needs of the end
user. This section describes some of the features that could be added, but is by no means
an exhaustive list:

• We imagine statistical and deep learning techniques could also be used to add new
features based on possible patterns observed in connecting meals to possible future
health predictions.
• Adding tags that could categorize recipes and allow certain meals to be restricted to
certain tags would allow users to tailor their meal plan to different goals, audiences, and
dietary needs. For example, a “Vegetarian” tag could be used for “Meatless Mondays”.
“Meatless Mondays” is a popular way to reduce a family’s carbon footprint and meat
consumption without going entirely vegetarian. For blended families where custody of
children is shared with another parent, a “Kids” tag could be used for meals when the
children were with the user. “Vegan” or “Allergen” tags could be used to accommodate
friends, family or other guests who have dietary restrictions that the user only needs
to follow when those guests join for dinner.
• Some families may consistently have leftovers, so having an option to plan on leftovers
for lunch the next day would reduce waste. This is especially useful if a limited number
of family members are tracking their nutrition (such as one member monitoring what
they eat for weight loss).
• Providing connection to other apps and services would allow users to synchronize their
usage across apps. Recipes could be imported from tools used to calculate nutritional
value of a recipe. Meal plans could be sent directly to food tracking applications.
Shopping lists could be sent directly to the grocery store app or website to auto-
populate the online shopping cart. Integration with digital assistants, such as Alexa
or Siri would allow the user to use voice commands to generate, order, and save meal
plans. Commands to ask “What’s for Dinner Tonight” etc. can be added to allow
for easy reminders without needing to refer to the saved meal plan text file. New
“smart” refrigerators can be integrated to automatically check for items such as milk,
heavy cream, etc. In time, the ability to use that same camera and food identification
technology could be added to pantries and cupboards to check the stock of all items.
• Providing a means for user accounts to communicate: allowing users to trade recipes or
share meal plans within their network of friends and family could also be added.
• Although we have tested the working of the program, quantitative and qualitative
testing by several groups of people would be beneficial in the future.

5 Conclusion

This project addresses a modern family’s need to save time and energy, while still
planning healthy meals that meet health and fitness goals. Modern families can struggle
with the age-old question “What’s for dinner?”.
Typically, one-button meal plan generators push their proprietary recipes and diets
onto the user. Tools that allow the user to enter their own recipes do not provide
one-click meal planning; they rely on the user to manually plan their meals, and instead
offer plan storage and tools like one-click shopping.
But no tool was found that automatically generated a meal plan based on user-defined
recipes and food. This tool allows users to plan their favorite recipes and foods while
still keeping to their nutritional goals.
This tool allows users to generate nutritional goal-driven meal plans quickly and
easily, without requiring the user to manually calculate if their goals are being met. If
goals cannot be met with the foods available, it removes the least important criteria. Final
totals are provided to the user, allowing them to validate their meal plan themselves.

Acknowledgment. This program was developed by the first author under the supervision of the
second author to satisfy part of the MS project requirement and is based on [5]. Both authors would
like to thank the MS project Committee Members, and the reviewers of the FTC 2022 conference, as their
insightful comments improved our work. Thank you.

References
1. Bureau of Labor Statistics: Bureau of Labor Statistics, 5 March 2022. https://www.bls.gov/
opub/mlr/2020/article/comparing-characteristics-and-selected-expenditures-of-dual-and-sin
gle-income-households-with-children.htm
2. Mealime: Mealime, 5 March 2022. https://www.mealime.com/
3. Paprika: Paprika, 5 March 2022. https://www.paprikaapp.com/
4. Whisk: Whisk, 5 March 2022. https://whisk.com/
5. Lyons-Rocque, C.: Automated meal planner using multiple user-defined benchmarks for
healthy living, MS project, advisor: Dr. Sudhanshu Semwal, Department of Computer Science,
University of Colorado, Colorado Springs, pp. 1–61, 27 March 2022
A Smart Healthcare Framework: Opportunities
for Integrating Emerging Technologies (5G, IoT,
AI, and GIS)

Balakrishnan Mullachery1,3(B) and Sarah Alismail2,3


1 Esri, Redlands, CA 92373, USA
[email protected]
2 City of Hope, Duarte, CA 91010, USA
3 Claremont Graduate University, Claremont, CA 91711, USA

Abstract. A connected society is a much-discussed topic in the research com-


munity. In science and technology, the connected society is interpreted as the
manner in which information enables the society to enrich its knowledge recovery
process, hence, allowing the pursuit of higher living standards and greater social
rights. Information and communications technology (ICT) proved its ability to
provide solutions for many unresolved questions in social science, economics,
business, science, engineering, medicine, and healthcare. The pervasiveness of
ICT, along with the recent advancement in technologies such as Artificial Intelli-
gence (AI), the Internet of Things (IoT), and the Fifth Generation (5G) wireless
communication paradigm, are changing the landscape of technology implemen-
tation. This work-in-progress paper proposes a model for smart healthcare based on
the convergence of 5G, IoT, GIS and AI technologies and discusses how this convergence
helps to develop a sustainable eco-system for the future smart-connected
society, emphasizing healthcare and well-being.

Keywords: Smart healthcare · 5G · AI · IoT · GIS

1 Introduction
With the continued advancement in information and communications technology (ICT),
the world has become digitally transformed and moving towards a connected society. A
connected society is a much-discussed topic in the research community. Citizen science
discusses the connected society in multiple dimensions. In political science, the con-
nected society is discussed as egalitarian empowerment and bringing power to people
for engagement in progressive politics [1]. In geography, it is interpreted as the spatial and
topological connectivity of objects that people require for well-being and good living.
Distributed wealth among people is also discussed in economics. However, in science
and technology, the connected society is interpreted as the manner in which information
enables the society to enrich its knowledge recovery process, hence, allowing the pursuit
of higher living standards and greater social rights. Social connectivity is indeed essential for the
collective co-existence of society, and the technical nuances of implementation are in
the hands of policymakers [1].
The ubiquity of smarter technologies in real-time communication is the foundation of
a connected society. The concept of “connected society” is used as a motivating scenario,
where an intelligent platform augments the ability of humans to transfer their knowl-
edge in real-time with each other or with machines around them and provides potential
efficiency gains. Trustworthiness between humans and machines and the autonomous
operation of machines with self-learning capability enhances competitive advantages
in terms of continuous performance and quality enhancement. Imagine a connected
community with an intelligent platform where chronic disease patients, hospitals, care-
givers, smart devices, and technologies are connected to each other through intuitive
technological features and efficient communication systems for information sharing.
This type of system can alleviate sickness and emotional stress, and it can also improve
self-care, quality of life (QoL), active daily life, and support mitigating life risks.
The increasing number of wireless communication network users and innovative
emerging wireless services have led to increasing demand for wireless communication
networks worldwide [2]. Supporting the growing potential for disruptive internet of
things (IoT) applications along with the growing need for broadband services provided
through mobile networks are two main motivators underlying the development of fifth
generation (5G) wireless networks [3].
The inability to detect an early sign of health risk can lead to potential adverse health
outcomes. Earlier research showed the potential of using technology-based application
solutions in remote patient monitoring to avoid unnecessary use of hospital admissions
and resources [4]. Nevertheless, there is a lack of a collaborative, inter-connected plat-
form that orchestrates various medical and logistics requirements to notify/direct people
who need medical support. Such a smart healthcare platform should have capabilities in
three major aspects of an enterprise system: system of records, system of insights, and
system of engagement. Hence, to address the need for designing and building such tech-
nological solutions, this paper looks at how to combine spatial science, IoT, AI and
5G communication systems to develop a next-generation healthcare framework.
In this work-in-progress paper, we proposed a model that supports the technological
advancement of ICT with the convergence of artificial intelligence (AI), IoT, geographic
information system (GIS) and the 5G wireless communication paradigm for developing
an eco-system for the future connected society emphasizing the healthcare and well-
being under the framework of Healthcare 4.0 [5]. Each of these technological tools (i.e.,
AI, IoT, GIS, and 5G) has its own strengths and weaknesses. Leveraging the convergence
of these technologies will provide the opportunities that the strengths of one tool can
offset the weakness of another tool. Despite the strength that one tool has, they all depend
on one another for an effective solution. Thus, the objectives of this paper were to: 1)
explore the affordances of the convergence of AI, IoT, GIS, and 5G in smart healthcare;
2) discuss how to optimize the efficiency and reliability of healthcare services using ICT
solutions designed and operated through 5G along with other technologies; 3) design a
model for a patient-centric software platform with the convergence of AI, IoT, GIS, and
5G in smart healthcare.

Main contributions of this work-in-progress paper to literature and practice are three-
fold. First, it discusses current and potential applications of collaborative, interconnected
technological solutions to improve healthcare activities and services taking into account
three different dimensions: healthcare stakeholders, local level computing, and regional
level computing services. Second, it proposes a comprehensive model of smart healthcare
that converges IoT, AI, GIS, and 5G that may be used to guide researchers through a com-
prehensive solution development process. Third, the proposed model can also be used
by solution providers in the healthcare sector to facilitate their business development.
The remainder of this paper is structured as follows. First, a literature review and
description section of each of the technologies: AI, IoT, 5G, and GIS was provided.
Next, a model for the convergence of 5G, IoT, AI, and GIS for a smart healthcare was
presented. A use case was then presented to demonstrate the proposed model by providing
scenarios and examples from the scholarly literature of the individual technologies and
their integration to support smart healthcare. Finally, expert opinions of the model’s
value and utility were summarized.

2 Literature Review
2.1 Artificial Intelligence
AI is today one of the most discussed buzzwords of the technology industry. It refers to
a self-learning system based on data. It augments the human decision-making process
using the experiences or observations of a human brain with machine-learned patterns
from data. The two main components of AI are data and mathematical algorithms. The data
can be structured or unstructured. AI algorithms are classified into supervised, unsupervised,
and reinforcement learning processes based on how the algorithm learns patterns from data.
A supervised learning algorithm uses human-labeled data for training.
In the unsupervised learning process, the algorithm creates labels from data and
trains itself. The third class, reinforcement learning, uses iterative interactions between an agent
and the environment, in which actions change the state and return either a reward or a penalty.
Through this iteration of actions and rewards/penalties, the agent learns the environment [1].
The patient-centric treatment model has been discussed in the industry for a decade.
Technological barriers to fully implementing the patient-centric model have yet to be
overcome. One of the main characteristics of the patient-centric model is the real-time
information sharing process. In a real-time decision-making process for mission-critical
applications such as healthcare and for the development of an intelligent connected
society for the knowledge sharing process, AI will be part of a technological ecosystem.
For instance, chronic disease patients’ data would be available from patient medical
records at a hospital. The hospital can develop an AI tool to identify the patterns from
the data to support patients and treatments. However, such information and decisions
are based on a clinical aspect. Assuming a shared social platform between patients and
hospitals, IoT devices (e.g., wearable sensors, drones, and mobile phones) automatically
capture patient health information and share it through a technology platform.
AI-based real-time analytics can provide new information to patients directly by perform-
ing data analytics on the collated data from medical records, patient-level time-series
data, patient-to-patient communication data, and the geolocation aspects of patients. For
this type of technological integration, computing and communication infrastructure rep-
resents a limitation to instant knowledge sharing geographically. An enhanced wireless
communication platform with cloud computing and AI can provide predictive knowl-
edge to a patient at risk. A 5G wireless network can reduce the gap in the real-time
data delivery process from patients to the computing platform and from the computing
platform to patients.

2.2 Internet of Things (IoT)


IoT is a digital network with connected things as part of web services for real-time
data sharing. It has the potential to break traditional boundaries and allow individuals to
self-organize, self-align, and self-act based on remote or self-learned instructions [5]. A
thing can refer to devices, equipment, technologies, or people, and connectivity can mean
sharing from people to things, people to people, or things to things. The internet provides
connectivity services to users anywhere at any time [6]. These things are connected in
a physical core network that uses a heterogeneous complex model-to-model system so
that information can be exchanged [7]. The model-to-model system communicates in
dynamic interactions, and information is processed in deterministic communication for
real-time information delivery for things (devices) to perform actions or reactions. Sev-
eral factors such as network latency, bandwidth, and reliability, determine the real-time
and dynamic interactions of things [8]. The devices are connected to the physical network
using wireless networks. Furthermore, the location of the devices, data propagation, and
network traffic over time are important for the efficient design of the model-to-model
and real-time communication protocols. However, current wireless technology lacks
security, reliability, and low latency for machine critical applications compared to wired
networks. Nevertheless, mobility and cost will be advantages of wireless technology
for numerous industries interfacing IoT devices.
The IoT is a big data source of information with favorable characteristics, including
high volume, velocity, and variety. A sophisticated wireless communication system is
needed to transfer information to the computing infrastructure. A 5G wireless commu-
nication network can augment real-time data transfer or information delivery processes
with the required network throughput and latency limits. These 5G characteristics make
IoT versatile in real-time, machine-critical applications such as smart healthcare
as part of Healthcare 4.0.

2.3 Fifth Generation Wireless (5G)


Since its inception, wireless communication has been undergoing changes and enhance-
ments based on technological advancements and innovations in communication net-
working and computing technology. Wireless technology has transitioned from 1G to
4G-LTE. Now 5G is expected to be deployed for public use during 2020, to accommodate
the growing number of connected devices and the future wireless networking demand
to support vertical industries such as eHealth, agriculture, and automated transportation
[9]. Between 5G devices and the telecommunication industries, one study projects that
16 billion devices will be connected to the internet by the year 2021 [10]. The 5G is
defined as “an end-to-end ecosystem to enable a fully mobile and connected society. It
empowers value creation towards customers and partners, through existing and emerg-
ing use cases delivered with consistent experiences and enabled by sustainable business
models” [9:1].
The 5G infrastructure provides an ecosystem for every internet-enabled device
and can be configured in a state-of-the-art fashion, including a network-as-a-service
for infrastructure efficiencies. The core parts of the 5G are: Software Defined Net-
work (SDN), Network Function Virtualization (NFV), and cloud computing. SDN and
NFV provide capabilities for logically virtualizing the network (aka network slicing).
SDN is agile and flexible, and the programmatically configurable data plane and con-
trol plane enable the operators to provide a faster infrastructure service. NFV enables
network function as software which dynamically scales the resources. Five G is capa-
ble of providing three generic services: enhanced mobile broadband (eMBB), massive
machine-type communications (mMTC); and ultra-reliable and low-latency communi-
cations (URLLC) in mission-critical communication [9, 11]. These applications suggest
new performance criteria for latency, reliability, connection and capacity density, spec-
tral efficiency, energy efficiency and peak throughput that need to be addressed with
the 5G technology [11]. Non-standard architecture and eMBB standard [12] and the
standard architecture version for 5G New Radio (5G NR) standards [13] provide the
guidelines for 5G implementation. Figure 1 depicts the different requirements of 5G
wireless network [9].

Fig. 1. Five G requirements [4]

Five G users can access the service from anywhere at any time at a data rate of 1 Gbps
and a latency of 1 ms with a quality of service and user experience [9]. The spectrum band
is less than 6 GHz. Five G network is expected to provide faster data rates, higher con-
nection density, and lower latency. Also, device-to-device communication, better battery
consumption, and improved wireless coverages are characteristics of the 5G communi-
cation system. The max speed of the 5G system is expected over 20 times faster than a
4G communication system [6]. It is estimated that 5G can support one billion devices
in 1 sq.km and a mobile device at a speed of 500 km/h. The reliability and availability
of the services are expected to be up to 99.99% [9]. The 5G communication platform
can overcome many limitations in the healthcare automation process by providing uni-
fied communication platforms. The low latency and high throughput of 5G, along with
fog/cloud computing and AI analytics, can help a smarter healthcare platform
share mission-critical information.

2.4 Geographic Information System

Pervasive spatial sciences and ICT are increasingly being used to uncover hid-
den questions related to society and to find solutions to many unresolved social problems,
and they are becoming popular in the knowledge recovery process. Studies have shown
that 80% of the data from various sources (i.e., big data) available today can be georefer-
enced [14]. Georeferenced data can provide its association within a geographic context.
A new interdisciplinary field, Geospatial Artificial Intelligence (GeoAI), which learns
and predicts or forecasts an event in a geographic frame of reference, has emerged for
knowledge recovery using spatial analysis with the integration of AI. For example, in the
field of epidemiology, the disease spread prediction model was developed using GeoAI
algorithms [14].
GeoAI can analyze environmental exposure and develop exposure modeling related
to health. Many studies have proved that air pollution, including PM2.5, NO2 , SO2 , and
PM10, influences the mortality and hospitalization of CHF patients [15], and increases
the number of COPD, cancer, and CHF cases in urban and sub-urban areas [16–19].
IoT plays a role in geolocation services and the information-gathering process. In
the context of healthcare, information such as patient locations, hospital and caregiver
locations, routing and tracking of an ambulances, time-series and static environmental
factors of a location, and patients’ geographic movements is important. At the same
time, sharing and processing this information in a real-time application are very process-
intensive. The technologies of 5G, IoT and AI contribute to transferring and processing
geolocations with environmental factors in real-time scenarios to create smart healthcare.

3 Design and Development


The 5G system provides an ecosystem that enables a fully mobile and connected soci-
ety [20], and it will increase user demand and data rate demand manifold. An efficient
demand and performance management system is thus required to monitor demand, per-
formance, and higher data rate at a lower cost. Five G is envisioned as a user-oriented
unified platform for providing seamless connectivity at a higher data rate. Therefore,
an efficient resource allocation process meeting Quality of Services (QoS) is required
[20]. For maximum resource utilization, AI algorithms can be helpful in determining the
scheme optimization for allocating resources to different users who share the network.
Five G network is deployed in the Multiple Input Multiple Output (MIMO) framework,
where AI technology can be implemented for channel optimization and detection error
rate minimization [7].
Machine to Machine (M2M) and device to device (D2D) communications are char-
acteristics of IoT devices. Billions of addressable devices on the network require massive
amounts of data transfer through the network [12]. This will pose a challenge to inter-
net and mobile communications in terms of location refresh and network congestion.
The efficient throughput and low latency of 5G will augment the effectiveness of M2M
communication. This calls for a need to have a comprehensive model to facilitate the
development of such collaborative, interconnected solutions.
In this section, we proposed such a model to support current and future researchers
and practitioners in designing ICT-based solutions. Figure 2 illustrates the proposed
model for smart healthcare platforms under 5G network with IoT, AI and GIS tech-
nologies in a connected society. In this platform, patients, hospitals, caregivers, medical
equipment, and healthcare infrastructures are equipped with IoT or connected to the 5G
network. The network is sliced and optimized using technologies such as SDN, NFV,
fog/edge computing, and cloud. The D2D, M2M, and URLLC capture and compute
the information without manual intervention or patient knowledge. The health-related
feedback will be passed on to the patient, caregivers, and/or medical equipment (e.g.,
drone or an ambulance) for immediate assistance. Such a system can increase the QoL
of the patient, lessen the burden on healthcare facilities, and reduce overall cost.
The model illustrates a geography-centric approach to the solution design. Spatial
information is becoming increasingly popular for efficient management of infrastructure
and resource location prediction models. The infrastructure and resource locations (e.g.,
healthcare providers, 5G communication towers) are directly proportional to the pop-
ulation of an area. The population can be better defined using geography and the base
of this solution is spatial technology or GIS. The concept behind 5G is having multiple
frequency-based towers. High-frequency towers are used for faster data services. The
clusters of low frequency and high-frequency towers provide better connectivity for
machine critical application usage which was a limitation in the existing telecommuni-
cation networks (3G, 4G, or LTE). The large number of IoT devices generates huge volumes
of data in a local area, which can be computed in a fog computing node instead of being
sent to the cloud. This enables faster data storage, computing, and dissemination of
information.
The basic concept of this design is a mashup of technology-based patient-centric
and provider-centric healthcare systems. Every patient is equipped with IoT devices and
smartphones that collect and transfer current information with near-zero lag time into a
cloud computing center. The data is spatiotemporal, repeated at constant intervals, and in
an environment where the subject is live. This follows the theory of ecological momen-
tary assessment (EMA) [21], random ecological momentary assessment (R-EMA), or
context-sensitive ecological momentary assessment (CS-EMA), all of which are very
popular theories in determining patients’ psychological and mental health. This helps to
forecast health determinants before they appear.

The fog computing center processes information using AI technology locally and
determines the appropriate actions to message back to a patient. It then sends the infor-
mation back to the cloud for a better AI prediction using more data available from
historical archives as well as from other fog nodes. It also sends messages to the patient
and providers, where the providers can see all processed information in a dashboard.
This helps to determine the level of urgency. The cloud platform also integrates mul-
tiple healthcare systems that can provide patient-specific clinical and subjective data.
The geolocation services can process and provide routing by incorporating agents such as a patient’s
location, medical facilities, resources, medical devices, and medical equipment for the
efficient provision of services. This information is available for other research centers,
agencies, universities, and providers under a common government governance model.

Fig. 2. Smart healthcare model

3.1 Use Case: Smart Healthcare Solutions to Healthcare Crises

The world faced a novel healthcare crisis in 2020. Technological innovations were needed
to help remediate the delivery of healthcare services during the coronavirus (COVID-
19) outbreak. The public health crisis surrounding COVID-19 is used as a use case to
demonstrate and discuss how technology, such as IoT, AI, 5G and GIS, contributes to
healthcare services. More specifically, this section discusses a use case of a patient-
centric smart healthcare system for chronic disease patients by mapping examples from
scholarly literature of the individual technologies and their integration to support smart
healthcare.
One of the advantages of a connected society is “people power”. The connectiv-
ity between people using technology makes them more knowledgeable about their
environment and gives them learning power. Five-G technology could augment fur-
ther network coverage for underserved areas and create a wider-connected society of
patients, healthcare providers, and caregivers while allowing for real-time data transfer
with URLLC.
Home healthcare allows patients to live in an environment where they are most
comfortable. The ICT is enabled by IoT, AI and 4G or 5G. Such a system replaces a
caregiver and behaves as a system in charge of the patient’s health to read and interpret
real-time health data and provide instructions or connect to a remote healthcare provider
for further health advice and services. This can impact the cost, time, and quality of
life of a patient and place less burden on healthcare services, providing a balanced
service to patients that satisfies their needs [22]. Hence, technology-driven, patient-
centric healthcare services can provide cost-effective quality self-care for suppressed
and diseased communities. Healthcare is a broad area within the connected society
paradigm.
The rapid increase of chronic illnesses poses a significant threat to human popula-
tions in terms of health economics [23]. According to the Centers for Disease Control
and Prevention’s (CDC) 2019 statistics, 60% of Americans live with a chronic illness.
Currently, 90% of the $3.3 trillion of healthcare expenditure is spent on treating chronic
and mental illnesses. A chronic patient needs ongoing medical attention. The illness
cannot be permanently cured; however, medical care attempts to mitigate the symptoms
and complications of the disease.
One of the solutions for this growing problem is keeping chronic disease patients at
home by providing prolonged and efficient healthcare support either through caregivers
or self-management. Researchers have been studying long-term, at-home self-care using
multidisciplinary approaches by including clinical, medical, and behavioral sciences
(e.g., [24–26]). With the advent of information systems and technology (IS&T), researchers
have developed telemedicine and telemonitoring systems for healthcare services (e.g.
[4]). The study of this concept explores the possibilities of home-based health monitoring
systems through the integration of home-to-clinic or home-to-hospital ICT to enhance
communication between patients and caregivers remotely by using IoT devices. This
becomes especially important in delivering healthcare services during a crisis like a
pandemic. The approach of monitoring patients remotely while they are at home can be
a less expensive option in terms of healthcare resource management and utilization as
well as a safer option in situations where in-person medical visits are unsafe, such as
during the COVID-19 pandemic. Figure 3 depicts the convergence of these technologies
and the intersection of 5G, IoT, AI and GIS in healthcare where future studies can be
performed.
The advent of high-performance computing, big data storage (BDS), data mining
(DM) and advanced data analytics using AI including machine learning (ML), deep
learning (DL), and text mining (TM), has further enhanced the application of ICT.
These pervasive forms of technology, along with spatial data, are becoming popular in the
Fig. 3. IoT, GIS, AI, and 5G intersection

field of knowledge recovery to find discernible patterns and answer questions of interest
using advanced data analytics. An interdisciplinary research focus integrating the
technologies described above can produce new insights about chronic
illnesses and self-care and assist in identifying patterns that are helpful for forecasting
QoL and risks for a patient with chronic condition(s) and/or disease(s).
Internet of things (IoT)-based mobile health (mHealth) applications are becoming
very popular with faster information collection and dissemination in healthcare. In the
case of chronic disease patients, there is a knowledge gap between their living experi-
ences and the shared understanding available from different care management medical
models. The main purpose of a patient-centered application is to provide the patient with
the ability to make an informed decision about their health and self-manage the risks
during their active daily life [27]. The effective use of the mHealth application along
with other forms of ICT helps to understand wellness and manage chronic diseases [28].
Virtual healthcare service centers have gained popularity due to the current COVID-
19 pandemic and increase in population density together with the lack of sufficient
healthcare resources and awareness of disease prevention and management. Youm and
Park [29] studied the advantages and disadvantages of ubiquitous healthcare (u-Health)
service centers. The monitoring and care management of cardiovascular disease patients
has been successfully performed by means of assessments obtained via unmanned kiosks
without a face-to-face interaction with a physician [30]. Telemedicine has been success-
ful in analyzing the risk of heart failure for patients at home and has proven to be a
potential IT solution for the reduction of hospital readmissions [4]. A cross-sectional
study reports the self-assessment of acute stroke patients treated without hospitaliza-
tion using a mobile phone application-based scale [28]. Considering the abundance of
mHealth application devices in the market, Ruiz-Fernández [31] studied the effective
use of the business process management (BPM) paradigm by the integration of patient
data collecting devices, and clinical processes. The fundamental objectives of a BPM
model are collaboration and coordination of multiple technologies, techniques, and IT
principles to empower patients and increase treatment adherence. Empirical evaluation
of the usability of mHealth systems available for self-care management has also been
reported in the literature [32].
These studies describe the background and proliferation of IoT-based mHealth and its
applications. The key concept highlighted in these studies is the novelty in the design of an
mHealth application for maximum utilization from the patient perspective. The success
of a mHealth application system is based on several factors that affect the continuation of
its use. These factors are typically based on perceived usefulness and patient satisfaction,
which are measured in terms of individual literacy, social support, information quality,
and service quality [33]. An enhanced communication system is essential for faster data
transfer; it analyzes the data and delivers feedback to the connected patients that are
geographically distributed across a wide area by accessing a large knowledge base. For
instance, if a patient is at-risk, the system is expected to automatically analyze the data
by using readings for current and previous health conditions and comparing them against
the knowledge base, both locally (fog computing), and globally (cloud computing). This
is to be followed by the execution of protocols to provide instructions to the patient
directly or through caregivers (if any), send medicine via drone, or send an ambulance
(ground or air) for faster medical support. Ambulances or drones should have the most
efficient routing system to provide faster service. These analytics and actions must be
executed without any interaction between the patient and the system. Such essential
communication protocols for D2D and M2M critical applications can be provided by
5G wireless communication with URLLC characteristics.
Time series data is an equal-interval type of data and is collected in a sequence
of orders from a single object. This data will help to monitor an event or object in
time-space and explain the underlying structure or mechanism exhibited by that object
or event. In healthcare, time series data can reveal trends by emphasizing the study of
the real-world effectiveness of complex, personalized clinical interventions, a patient-
centered salutogenic focus, or engagement with nonmedical diagnostic and treatment
frameworks. Time series-based clinical data from patient electronic records are widely
used in clinical utility studies [34]. For a congestive heart failure (CHF) patient, accurate
detection of cardiac health is essential to improve the quality of life, and an ML model—
support vector machines (SVM)—was used by [35] to classify electrocardiogram (ECG)
signals to study the heart rate variability that leads to cardiac arrhythmias.
AI and ML models such as linear regression (LR), recurrent neural networks (RNN),
regularized regression (LASSO), and gradient boosting machines (GBM) have been used
to forecast healthcare expenditures [36]. Seasonal Autoregressive Integrated Moving
Average (SARIMA) machine learning has been used to understand and enhance the
opportunities for resource allocation and patient care at periods of elevated risk [37].
Multiple ML models from simple to complex decision trees and ensemble models were
used by [38] for feature extraction in the classification of ventricular tachycardia risk
based on device-measured time series data.
Kwon et al. [39] developed a deep learning-based early warning system to predict
the possibility of cardiac arrest for hospitalized patients using a ML model RNN, the
periodic records extracted from the patients’ electronic database, and clinical vital sign
features: systolic blood pressure, heart rate, respiratory rate, and body temperature. In
terms of disease detection, smartphones’ built-in inertial measurement unit sensor data
has been used to detect cardiovascular diseases [40]. In addition, AI models have been
tested in tracking and predicting diseases using IoT, including mobile and wearable
devices [41].

3.2 Value and Utility of the Model

We collected research and industry expert opinions about the potential value and utility of
the proposed model. We conducted qualitative phone-based semi-structured interviews,
which took about 20 to 30 min to complete. We used a convenience purposeful sampling
of five experts. The experts we interviewed were two professors in the field of health
information systems and technology, a computer scientist, a healthcare finance specialist,
and a healthcare professional. The feedback we obtained from the expert interviews was
used to generate ideas for improving and enhancing the proposed model as well as to
propose ideas for future work.
Most experts expressed that the proposed model promises great value in providing
a comprehensive view of how all these different technologies will ultimately converge.
A health information systems and technology professor noted that given the fact that
we are still in the process of adopting the 5G network, what constitutes the novelty
of this model is that it addresses the convergence of 5G network with other existing
technologies along with being based on GIS. She further expressed that “we do have
now robust infrastructure offered by [the] 5G network, so why don’t we take advantage
of that and start collecting more data from different nodes whether it is on the individual
level or household level” (Expert 2). Another expert in the field of health information
systems and technology noted that to the best of her knowledge, she has not seen models
that really integrate all these cutting-edge technologies. She believes that this can be
used as a guidance model for how the technologies can converge together to ultimately
benefit patients, physicians, and/or the research community. Both the integration of 5G with
other cutting-edge technologies and the benefit they promise for healthcare stakeholders
show the value of the proposed model.
Given the COVID-19 pandemic the world is currently facing, a health information
systems and technology professor mentioned that,
“I am hoping to see more and more of harnessing cutting-edge technologies such as
the contact tracing apps…that are currently used in some countries such as China and
Singapore to help with the surveillance… so instead of using the traditional contract
tracing, which means that a person who have shown some symptoms of COVID-19, he
is going to wait 14 days to see if these symptoms developed and then he is going to be
approached by a healthcare facility to share names of people who has been in contact
with. Instead of this long cycle, they can use what is called the digital contact tracing,
which is a mobile app that is based on the location of the person/user where the geo-
location feature will be enabled in those apps, which mean that automatically whenever
the app user meet any person, the app will show all the contacts of that user, given that
this person and whoever he is been in contact with during these 14 days both have the
contact tracing app” (Expert 1).
Considering future work, another expert in computer science noted the importance
of considering privacy at the local area computing level when these types of technologies
are deployed. According to her, the number one issue that researchers and practitioners
are facing with these smart devices is how we can make sure that these technologies
preserve the privacy of the individuals. The issue of security is also critical to consider.
Health information is highly sensitive and transmitting it over unsecured networks like
the POTS has troubled privacy experts for decades. Only with the advent of legislation
like HIPAA and the incorporation of encryption technologies has the privacy situation
become somewhat tenable. However, bringing IoT devices into the proposed model not
only makes communication vulnerable, but it also jeopardizes end user devices through
malware and uninvited remote access. This would be a worthwhile area to explore in
future research.
A healthcare finance specialist viewed the proposed model through the lens of
finance. The feedback was that such a collaborative system is essential to fulfilling
the requirements of interoperability between multiple systems for common healthcare
expenses and budgeting. She further mentioned, “the options available in the model can
cut insurance cost and instead of Medicare for all, this will improve healthcare systems
effectively in all ways” (Expert 4).
Another healthcare professional who is working in telemedicine communication
as a nurse practitioner expressed an interest in having such a fully automated system for
effective use of the telemedicine concept. In her experience, patients must upload the
requested health data through a manual process, which takes weeks to be processed into
the system. This delays the appropriate medical consultation, and sometimes the situation
becomes worse than expected. She also noted that the trend analysis available in the proposed
model can offer analysis for many common diseases. Examples of these use cases
include food and nutrition requirements based on medication, medicine dosage
adjustment, monitoring pregnant women and providing instructions, analysis of bleeding
versus health vitals during trauma to determine risks, unprecedented pandemics and
abnormal situations, and early-stage detection of depression and suicide risk in mental
health. She further noted that, “the model proposed, once implemented, can bring
big changes in healthcare services due to the collaboration of various data. This can also
help to avoid unwanted health testing and screening processes and during COVID 19
the resources could have been better managed and forecasted” (Expert 5).

4 Conclusion
This work-in-progress paper presents a proposal of a model converging AI, IoT, 5G
network, and GIS in a concept that contributes to smart healthcare. We present a review
of technologies including AI, IoT, 5G network, and GIS; and highlight some of the chal-
lenges of 5G implementation. The increasing potential of those technologies individually
and in tandem to improve the process and delivery of healthcare services constitutes the
significance of our work. The transformative power of these technologies to provide effi-
cient and inexpensive healthcare services illustrates the potential benefits of designing
and developing such technological solutions. Geolocation and intelligence built around
these technologies can enable and optimize the smart healthcare deployment landscape.
It would be interesting to learn from future research projects or solutions in the realm of
healthcare that could be organized, implemented, and/or evaluated using the proposed
model. This work is part of a larger project on smart healthcare. The proposed model
is a starting point for designing and developing technological solutions to solve real-
world problems that will be further assessed for their efficacy and effectiveness using
empirical studies. The proposed model was preliminarily evaluated with academic and
medical industry experts to understand the usefulness and acceptance from the industry
and research community for the study continuation. We have received favorable encouragement
for having such a system to reduce the current hospital and patient burdens
and provide satisfactory health services. Future studies are encouraged to continue to
build on the conceptual model, articulate the role of GIS, how it integrates with other tech-
nologies, how the location information can be mined in and of itself, or can be combined
with other patient attributes to provide spatiotemporal location intelligence. Another
interesting opportunity is to examine patterns and relationships between the location
and non-location attributes of patients with the overall smart healthcare framework.

References
1. Allen, D.: A connected society. Soundings 53(53), 103–113 (2013). https://doi.org/10.3898/
136266213806045719
2. Wang, C.-X., Di Renzo, M., Stanczak, S., Wang, S., Larsson, E.G.: Artificial intelligence
enabled wireless networking for 5G and beyond: recent advances and future challenges.
IEEE Wirel. Commun. 27(1), 16–23 (2020)
3. Henry, S., Alsohaily, A., Sousa, E.S.: 5G is real: evaluating the compliance of the 3GPP 5G
new radio system with the ITU IMT-2020 requirements. IEEE Access 8, 42828–42840 (2020)
4. Alnosayan, N., Chatterjee, S., Alluhaidan, A., Lee, E., Feenstra, L.H.: Design and usability
of a heart failure Mhealth system: a pilot study. JMIR Hum. Factors 4(1), e9 (2017)
5. Chanchaichujit, J., Tan, A., Meng, F., Eaimkhong, S.: Healthcare 4.0. Springer, Singapore
(2019). https://doi.org/10.1007/978-981-13-8114-0
6. Yamin, M.: Information technologies of 21st century and their impact on the society. Int. J.
Inf. Technol. 11(4), 759–766 (2019)
7. Yang, W., et al.: Narrowband wireless access for low-power massive Internet of Things: a
bandwidth perspective. IEEE Wirel. Commun. 24(3), 138–145 (2017)
8. Ekudden, E.: Five Technology Trends Augmenting the Connected Society (2018). https://
www.ericsson.com/en/ericsson-technology-review/archive/2018/technology-trends-201
9. Liyanage, M., Ahmad, I., Abro, A.B., Gurtov, A., Ylianttila, M.: A Comprehensive Guide to
5G Security. Wiley, Hoboken (2018)
10. Ericsson: Be the First to Deliver 5G Access (2020) https://www.ericsson.com/en/networks/
offerings/5g
11. You, X., Zhang, C., Tan, X., Jin, S., Wu, H.: AI for 5G: research directions and paradigms.
Sci. China Inf. Sci. 62(2), 21301 (2019)
12. ITU: Minimum requirements related to technical performance for IMT-2020 radio inter-
face(S), November (2017)
13. 3rd Generation Partnership Project (2019). https://www.3gpp.org/release-15. Accessed 15
14. VoPham, T., Hart, J.E., Laden, F., Chiang, Y.-Y.: Emerging trends in geospatial artificial
intelligence (geoAI): potential applications for environmental epidemiology. Environ. Health
17(1), 40 (2018)
15. Sovuthy, C.: Imess 2018: Focus on IoT, AI, and 5G [conference reports]. IEEE Solid State
Circuits Mag. 11(1), 88–91 (2019)
16. Cakmak, S., Hebbern, C., Vanos, J., Crouse, D.L., Tjepkema, M.: Exposure to traffic and
mortality risk in the 1991–2011 Canadian census health and environment cohort (CanCHEC).
Environ. Int. 124, 16–24 (2019)
17. Cerrone, M., Cantile, M., Sacco, O., Botti, G.: Geo-location of oncological diseases in the
extra-urban areas of naples and creation of territorial biobanks: an important tool to study
potential connections between environmental factors and cancer. Anticancer Res. 38(11),
6459–6463 (2018)
18. Han, X., et al.: Estimating the spatial distribution of environmental suitability for female
lung cancer mortality in china based on a novel statistical method. Environ. Sci. Pollut. Res.
26(10), 10083–10096 (2019)
19. Zhang, Z.: Prediction model for patients with acute respiratory distress syndrome: use of a
genetic algorithm to develop a neural network model. PeerJ 7, e7719 (2019)
20. Tayyaba, S.K., Shah, M.A.: Resource allocation in SDN Based 5G cellular networks. Peer
Peer Netw. Appl. 12(2), 514–538 (2019)
21. Stone, A.A., Shiffman, S.: Ecological momentary assessment (EMA) in behavioral medicine.
Ann. Behav. Med. 16(3), 199–202 (1994). https://doi.org/10.1093/abm/16.3.199
22. Lin, T.-S., Liu, P.-Y., Lin, C.-C.: Home healthcare matching service system using the Internet
of Things. Mob. Netw. Appl. 24(3), 736–747 (2018). https://doi.org/10.1007/s11036-018-
1087-y
23. Centers for Disease Control and Prevention (2019). Health and Economic Costs of Chronic
Disease. https://www.cdc.gov/chronicdisease/about/costs/index.htm
24. Bosworth, H.B., Steinhauser, K., Orr, M., Lindquist, J., Grambow, S., Oddone, E.: Con-
gestive heart failure patients’ perceptions of quality of life: the integration of physical and
psychosocial factors. Aging Ment. Health 8(1), 83–91 (2004)
25. Gallacher, K., May, C.R., Montori, V.M., Mair, F.S.: Understanding patients’ experiences of
treatment burden in chronic heart failure using normalization process theory. Ann. Fam. Med.
9(3), 235–243 (2011)
26. Juenger, J., et al.: Health related quality of life in patients with congestive heart failure:
comparison with other chronic diseases and relation to functional variables. Heart 87(3),
235–241 (2002)
27. Evangelista, L.S., et al.: Examining the effects of remote monitoring systems on activation,
self-care, and quality of life in older patients with chronic heart failure. J. Cardiovasc. Nurs.
30(1), 51 (2015)
28. Chang, H., et al.: Mobile phone application for self-assessment of acute stroke patients: a
tool for extended care and follow-up. Medicine 97(26) (2018)
29. Youm, S., Park, S.-H.: How the awareness of u-healthcare service and health conditions affect
healthy lifestyle: an empirical analysis based on a u-healthcare service experience. Telemed.
E-Health 21(4), 286–295 (2015)
30. Bahadin, J., Shum, E., Ng, G., Tan, N., Sellayah, P., Tan, S.W.: Follow-up consultation
through a healthcare kiosk for patients with stable chronic disease in a primary care setting:
a prospective study. J. Gen. Intern. Med. 32(5), 534–539 (2017)
31. Bradway, M., Pfuhl, G., Joakimsen, R., Ribu, L., Grøttland, A., Årsand, E.: Analysing
mhealth usage logs in RCTs: explaining participants’ interactions with type 2 diabetes
self-management tools. PLoS ONE 13(8) (2018)
32. Georgsson, M., Staggers, N., Weir, C.: A modified user-oriented heuristic evaluation of a
mobile health system for diabetes self-management support. Comput. Inform. Nurs. 34(2),
77 (2016)
33. Wu, W., et al.: Unsupervised phenotyping of severe asthma research program participants
using expanded lung data. J. Allergy Clin. Immunol. 133(5), 1280–1288 (2014)
34. Sherman, E., Gurm, H., Balis, U., Owens, S., Wiens, J.: Leveraging clinical time-series data for
prediction: a cautionary tale. In: AMIA Annual Symposium Proceedings: American Medical
Informatics Association, p. 1571 (2017)
35. Ashtiyani, M., Lavasani, S.N., Alvar, A.A., Deevband, M.: Heart rate variability classification
using support vector machine and genetic algorithm. J. Biomed. Phys. Eng. 8(4), 423 (2018)
36. Yang, L., MacEachren, A.M., Mitra, P., Onorati, T.: Visually-enabled active deep learning for
(Geo) text and image classification: a review. ISPRS Int. J. Geo-Inf. 7(2), 65 (2018)
37. McCoy, T.H., Pellegrini, A.M., Perlis, R.H.: Assessment of time-series machine learning
methods for forecasting hospital discharge volume. JAMA Netw. Open 1(7), e184087–
e184087 (2018)
38. Marzec, L., et al.: Device-measured physical activity data for classification of patients with
ventricular arrhythmia events: a pilot investigation. PLoS ONE 13(10), e0206153 (2018)
39. Kwon, J.M., Kim, K.H., Jeon, K.H., Park, J.: Deep learning for predicting in-hospital mortality
among heart disease patients based on echocardiography. Echocardiography 36(2), 213–218
(2019)
40. Dubey, A.K., Gupta, U., Jain, S.: Epidemiology of lung cancer and approaches for its
prediction: a systematic review and analysis. Chin. J. Cancer 35(1), 71 (2016)
41. Sheth, A., Jaimini, U., Yip, H.Y.: How will the Internet of Things enable augmented
personalized health? IEEE Intell. Syst. 33(1), 89–97 (2018)
Analytic Hierarchy Process Model
for the Diagnosis of Typhoid Fever

Faith-Michael Uzoka1(B) , Chukwudi Nwokoro2 , Okure Obot2 , Moses Ekpenyong2 ,


Aniema I. A. Udo3 , and Boluwaji Akinnuwesi4
1 Department of Mathematics and Computing, Mount Royal University, Calgary, Canada
[email protected]
2 Department of Computer Science, University of Uyo, Uyo, Nigeria
[email protected]
3 Department of Internal Medicine, University of Uyo, Uyo, Nigeria
4 Department of Computer Science, University of Eswatini, Kwaluseni, Eswatini

[email protected]

Abstract. Typhoid fever is a global health problem, which seems neglected. Still,
it is responsible for significant levels of morbidity in many regions of the world,
with about 12 million cases annually, and about 600,000 fatalities. Diagnosis of
typhoid poses a lot of challenges because its clinical presentation is confused with
those of many other febrile infections such as malaria, yellow fever, etc. In addi-
tion, most developing countries do not have adequate bacteriology laboratories
for further investigations. Decision support systems (DSSs) have been known to
increase the efficiency and effectiveness of the diagnosis process, in addition to
improving access; however, most existing decision support models for the diag-
nosis of diseases have largely focused on ‘non-tropical’ conditions. An effective
decision support model for the diagnosis of tropical diseases can only be devel-
oped through the engineering of experiential knowledge of physicians who are
experts in the management of such conditions. In this study, we mined the experi-
ential knowledge of twenty-five tropical disease specialist physicians to develop
a decision support system based on the Analytic Hierarchy Process (AHP). The
resulting model was tested based on 2044 patient data. Our model successfully
determined the occurrence (or otherwise) of typhoid fever in 78.91% of the cases,
demonstrating the utility of AHP in the diagnosis of typhoid fever.

Keywords: Typhoid · Tropical diseases · Diagnosis · Analytic hierarchy process

1 Introduction

The World Health Organization [1] estimates the global typhoid fever burden at between
11 and 21 million cases. Also, up to 161,000 associated deaths have been reported
annually, and a greater proportion of this statistic comes from poor and vulnerable com-
munities such as South and South-East Asia including sub-Saharan Africa. Without
B. Akinnuwesi—Formerly University of Swaziland.

appropriate tests, typhoid fever could be misdiagnosed, leading to complications and
possible death [2] since it presents with symptoms (e.g. fever, headache, fatigue, chills
and loss of appetite) that overlap with other febrile diseases. Most developing countries
plagued with infectious diseases lack adequate medical facilities to conduct appropriate
laboratory tests. They are also characterized by low health care expenditure–as the aver-
age per capita health expenditure in Africa, for instance, is $105, compared to $3,599
in the Americas [3]. Furthermore, the number of doctors per 10,000 population is esti-
mated at 2.7 and 5.9 in Africa and South-East Asia, respectively, compared to 32.1
in Europe. This disturbing statistic corroborates the prevalence of tropical infectious
diseases in these regions and is responsible for the growing rate of self-medication and
other alternative sources of health care, which often prove counterproductive.
Some diseases are often neglected even when they cause serious fatalities to the
populace and pose a significant burden on public health and the economic stability of
societies around the world. One such disease is typhoid fever, which is not given the
attention accorded to other tropical diseases like malaria, hepatitis, and cholera.
Typhoid fever is a disease that is caused by bacteria called Salmonella Typhi. It is
also known as Enteric fever and has numerous symptoms such as fever, constipation,
diarrhoea, abdominal pain and many more. Unfortunately, typhoid fever is responsible
for a great number of morbidities in Africa and other tropical regions of the world,
as travellers to these regions suffer a great deal of infectious symptoms. Worse, the
lack of access to medical facilities and shortage of medical personnel in the few health
facilities have hugely contributed to high rates of fatalities from febrile tropical diseases.
Hence, accurate and timely diagnosis/therapy are essential conditions for the reduction
of complications associated with most tropical diseases [4]. Medical diagnosis, like other
diagnostic processes, is made more complex because of the level of imprecision involved.
Patients may not be able to describe exactly what has happened to them or how they feel;
doctors and other health care practitioners may not understand or interpret exactly their
observations; laboratory reports are not instantaneous and may come with some degree
of error [5]. This conundrum is compounded when a pathological process presents with
ambiguous symptoms like those of other conditions, as in the case of several tropical
diseases, or in situations where expert medical practitioners are inexperienced or in short
supply and pressured [6]. This corroborates the urgent need to increase access to
healthcare in developing countries.
The shortage of qualified doctors has necessitated the training and use of frontline
health workers (FHWs), such as midwives, nurses, and community health workers,
to improve patient care and access to life-critical interventions [1]. Studies have shown
that these workers are able to diagnose some common diseases using established manual
processes [7]. The WHO [8] has used teams located in nearby towns to reach conflict
zones by helicopter, setting up mobile clinics run by FHWs with basic screening tools,
and characterized by long waiting lines. The use of procedure manuals by FHWs is often
a slow process that could lead to diagnosis errors and delays. A very crucial aspect of
medical diagnosis is the process of gathering data from a patient. During questioning
by a medical doctor, a patient is rarely prevented from divulging details about his or her state
of health. It is the responsibility of the doctor to decipher and separate the tangible from
the intangible. This, he does by comparing one piece of information with another and
based on his experience of the disease he determines the degree of importance of the
pair he compares. This task becomes daunting in the face of so many patients waiting to
be attended to by few medical doctors. In attending to so many patients, inexperienced
doctors (or FHWs) are prone to diagnosis errors, especially if the suspected disease is
of the type that presents confusable symptoms. Typhoid fever is one such disease, whose
symptoms are often confused or in conflict with those of diseases like malaria, hepatitis,
urinary tract infection and others. The outcome is misdiagnosis
and attendant consequences of late diagnosis resulting in high morbidity and mortality
rate.
The need for a robust analytic hierarchy process model presented in this paper is
justified by the following major issues of concern in the course of diagnosing Typhoid
fever: (1) Typhoid fever is identified among the neglected diseases that pose a global
health problem and is responsible for significant levels of morbidity in many regions
of the world, with about 12 million cases annually, and about 600,000 fatalities; (2)
Diagnosis of typhoid poses a great deal of challenge because its clinical presentation
is confused with those of many other febrile infections such as malaria, yellow fever,
and many others; (3) Most developing countries do not have adequate bacteriology
laboratories for further investigations; (4) Most patients suffering from typhoid fever find
it difficult to express how they feel, making it difficult for a medical doctor to decipher the
cause of their illness; and (5) Typhoid fever is one of the often-misdiagnosed diseases
in low-to-middle income countries (LMICs) due to self-diagnosis resulting from poor
access to quality health care and lack of access to pathogenic testing.
This study proposes the use of the Analytic Hierarchy Process (AHP) in the devel-
opment of a decision support system for diagnosing typhoid fever. AHP [9] provides a
suitable mechanism for evaluating complex multicriteria decision variables, such as that
presented in the diagnosis of tropical diseases, which in most cases could be challenging
in terms of the combinatorial analysis of symptoms and their degrees of intensity in
the diagnosis process. The AHP technique has been applied in various facets of human
endeavour, including health care [10–12] and provides a mechanism for evaluating con-
sistency in the pairwise comparison of decision variables in the knowledge engineering
process [13]. In [14], AHP is seen as a theory of measurement through pairwise compar-
ison and relies on the judgment of experts to derive priority scales. These scales measure
intangibles in relative terms.
The model reported in this paper helps to successfully determine the occurrence
(or otherwise) of typhoid fever in 78.91% of the cases, demonstrating the utility of
AHP in the diagnosis of typhoid fever. In Sect. 2, we review literature on the use of
decision support systems in the diagnosis of some tropical diseases and the conventional
method of diagnosis of typhoid fever is also discussed. Section 3 presents the Study
Methodology. The results and discussion are presented in Sect. 4. Some conclusions are
drawn in Sect. 5.

2 Literature Review
The first efforts at creating decision support tools for medical diagnosis began with
the pioneering works of [15, 16]. These works attempted a paradigm shift from purely
engineering approaches toward a deeper ‘cognitive model’ consideration that explains
physicians’ thinking processes and reasoning in medical diagnosis. However, it was later
observed that purely rule-based systems were only good for narrow domains of medicine,
because most serious diagnostic problems were so broad and complex that straightfor-
ward attempts to chain together larger sets of rules encountered major difficulties, hence
such systems lacked the model of the disease or clinical reasoning [5]. As research
in the application of DSS in medical diagnosis deepened, emphasis shifted to the rep-
resentation and utilization of unstructured, imprecise, and dynamic knowledge. It is
noted in [17] that uncertainty and imprecision characterize the sources of information
available to medical DSSs. These sources include the patient, physician, laboratory and
other technical methods of evaluation, including the mathematical models that simulate
the diagnostic process; thus, medical DSS researchers have resorted to soft-computing
techniques for the management of issues of uncertainty and imprecision in medical
diagnosis [18]. Medical decision support systems have gained significant attention and
utilization in the past decade, bringing to actualization a series of improvements over
the past four decades. Soft-computing technologies and multicriteria decision-support
methodologies have been variously harnessed in the development of medical decision
support systems. Marsh et al. in [19] assessed the value of health care interventions using
multi-criteria decision analysis by reviewing the literature on the approaches adopted. The focus of
their search was on EMBASE and MEDLINE where 40 studies were identified with 41
examples of MCDA in healthcare. The studies were observed to have been undertaken
in 18 different countries, and the majority of the studies focused on designs to support
investment (56%). Research on MCDA was found to be mostly done in Europe (46.3%),
with South America (0.02%) and Australia (0.02%) as the least. The combination of
experts’ opinions and literature (44%) was the most used method for measuring criteria,
while AHP (26.8%) was the most used tool for eliciting weights. Value measurement
(93%) was found to be the most used comparison approach. Grosan,
Abraham & Tigan in [20] proposed a multicriteria programming methodology for med-
ical diagnosis and treatment of neurectomy, while Hancerliogullari et al. in [21], used
multi-criteria decision-making models (Fuzzy Analytic Hierarchy Process) in evaluating
anaesthesia method options in circumcision surgery.
The alarming mortality rate of tropical diseases such as malaria, pneumonia, tuber-
culosis, cholera and others due to shortage of medical staff prompted a proposal of
an Android-based expert system for diagnosis of selected tropical diseases [22]. Two
soft-computing methodologies (fuzzy logic and AHP) were employed in the design.
Fuzzy logic was used to derive membership functions of each of the diseases and to
generate fuzzy rules to drive the inference engine of the system. AHP was used for pair-
wise comparison of the symptoms in order to select principal symptoms based on the
weights assigned by medical experts to each of the symptoms. Ajenahughrure, Sujatha
and Akazue in [23] proposed a system driven by fuzzy logic to classify symptoms for
the differential diagnosis of tropical febrile diseases. Their system was evaluated and
found acceptable by system users, especially with respect to cost and potential for improved
diagnosis effectiveness. A number of tropical diseases decision support systems pow-
ered by fuzzy logic and/or AHP or their variants have proven to covary with diagnosis
outputs by human experts e.g. [11, 24, 25], because of the ability of fuzzy logic to han-
dle vagueness in symptom elicitation and the strength of AHP in the development of
multi-criteria models.
Typhoid fever is one of the often-misdiagnosed diseases in low-to-middle income
countries (LMICs) due to self-diagnosis resulting from poor access to quality health care
and lack of access to pathogenic testing. Most poor communities in LMICs have high
incidence of malaria (due to poor vector control) and typhoid fever (due to poor sanitation,
drug resistance, and self-diagnosis) [26]. Fever is a commonly reported symptom in
several tropical diseases, and without localized features and appropriate tests, diagnosis
often erroneously defaults to malaria [27] without consideration to other pathogens [28].
In the absence of accurate laboratory tests, presumptive diagnosis would require a careful
analysis of symptoms presentation and other clinical/non-clinical parameters [29].
Decision support systems have been previously proposed/developed for the diagno-
sis of typhoid fever. Oguntimilehin et al. in [30] proposed a machine learning approach,
using 18 symptoms, 100 training datasets and 50 testing datasets, with a 95% detection
rate. Though the results are impressive, the number of data sets utilized for training and
testing was small. Moreover, using 18 symptoms would likely reduce the efficiency of
diagnosis. Santosa et al. in [31] applied fuzzy logic Sugeno methods to the diagnosis of
typhoid fever and Dengue hemorrhagic fever. The similarity of the symptoms of these
two diseases necessitated the use of soft-computing methods, with 80.2% diagnostic
accuracy, but again with a small dataset of 86 records. Several other researchers have developed
hybrid systems for the diagnosis of typhoid with varying degrees of diagnostic accu-
racy. For example, Asogbon et al. in [32] deployed an enhanced neuro-fuzzy system
combined with a genetic algorithm for medical diagnosis. Their aim was to optimize the
performance of an Adaptive Neuro-Fuzzy Inference System (ANFIS) in terms of its
connection weights, which are usually computed by trial and error when the system is used
to diagnose typhoid fever. The study used a Genetic Algorithm (GA) to automatically
evolve the optimum connection weights needed to efficiently train the ANFIS model for
typhoid fever diagnosis; 104 medical records of patients aged 15 to 75 were adopted for
the study and used to test the performance of the multi-technique decision support system.
70% of the dataset was used for training, 15% for validation, and the remaining 15% to
observe the performance of the proposed system. The Genetic Adaptive Neuro-Fuzzy
Inference System (GANFIS) gave an
average diagnosis accuracy of 92.7% compared to 85.5% recorded by the ANFIS.
Most of the studies on the use of soft-computing and multicriteria methods for
the diagnosis of typhoid fever produced encouraging results in terms of the matching
diagnoses; however, they mostly used small datasets, which makes the outputs difficult
to generalize. In addition, they failed to provide the false positive and false negative
values. The false positive (FP) is a Type-I error because it indicates that the patient
actually has the disease, whereas a confirmatory test proves the initial test result to be
false. A false negative (FN) result is a Type-II error, whereby the diagnosis fails to
reject a false null hypothesis. The existence of false positive and false negative results
underscores the need for further confirmatory investigations with a higher degree of
sensitivity. According to Ioannidis, Tarone and McLaughlin in [33] the FP and FN
results do not necessarily lead to the same consequences, and their relative importance
may vary in different investigations, which indicates that the acceptable threshold may
also vary. In general, an accuracy of 0.5 corresponds to a random classifier, and accuracy
values closer to 1 are better. Cooper in [34] established that the threshold issue can be
addressed by selecting symptom-based cut-off points to distinguish between disorder and
normality; these cut-off points may be more or less wisely chosen so that the results
obtained are widely accepted.
Typhoid fever is one of the diseases responsible for the high mortality rate in the
tropical regions of the world. The high mortality rate is occasioned by misdiagnosis
due to its confusable symptoms that overlap with symptoms of other febrile diseases.
Inadequate medical facilities and personnel have also contributed to the high mortality and
morbidity, as patients resort to self-medication, which complicates the symptoms and
results in deaths. Most patients suffering from typhoid fever find it difficult to express how
they feel, making it difficult for a medical doctor to decipher the cause of their illness. This
uncertainty and imprecision, though not peculiar to typhoid fever, has necessitated the use
of soft computing techniques for the management and processing of these uncertainties
and imprecisions in medical diagnosis. AHP is found to be the most used tool for eliciting
weights of disease symptoms, while fuzzy logic is known to be the most popular tool
for managing uncertainty and imprecision. A strong correlation has been found between
most AHP/fuzzy logic diagnostic systems and human experts’ diagnoses, though most such
systems use small datasets. Such systems also do not report false positive and
false negative values, which underscores the need for further confirmatory investigations
with a higher degree of sensitivity. In the light of this, the results produced by such systems
cannot be generalised.

2.1 Overview of Conventional Method of Diagnosis of Typhoid Fever


Typhoid fever presents with clinical features that are similar to many other febrile ill-
nesses in developing countries. Thus, making a diagnosis requires more than just good
clinical acumen, as there is an array of other infectious diseases presenting with similar
symptoms. In a study by Andrews et al. in [35] only 4.1% of patients with an empirical
diagnosis of enteric fever had positive blood cultures for typhoidal Salmonella organ-
isms. The implication of their study is that >90% of the patients clinically diagnosed
with typhoid fever had a febrile illness from other causes. Febrile illnesses like malaria,
dengue, sepsis, leptospirosis and many others are difficult to differentiate from typhoid
fever in endemic countries without diagnostic tests. This highlights the need for better
diagnostic approaches to limit the inappropriate use of antibiotics and to ensure adequate
treatment of other causes of febrile illnesses [36, 37].

Microbiological Cultures
The isolation of the causative organism, Salmonella enterica serovar Typhi (Salmonella
Typhi), is the gold standard for the diagnosis [1]. Body fluids like blood, bone marrow,
stool, urine, rose spots, gastric and intestinal secretions may be cultured. Blood culture
gives a definitive diagnosis. However, the rates of positive culture are usually higher
when using bone marrow aspirates for the culture [38]. In a systematic review by Mogasale
et al. in [39], the proportion of Salmonella Typhi detection was 61% from blood cultures
compared to 96% from bone marrow aspirate cultures. The use of bacteriological cultures
for the diagnosis of typhoid infection is cost-intensive and technically difficult, hence
the need for other diagnostic tests.

Antibody Detection Tests


These are rapid serologic tests designed for early and easy point-of-care use. The Widal
Test is based on the measurement of antibodies (agglutinins) against somatic (O) and
flagellar (H) antigens of Salmonella typhi in the sera of patients. Diagnosis is made by
demonstrating a four-fold increase in the antibody titre in paired samples collected 10–
14 days apart. Although widely used in many developing countries because of its low
cost, the Widal test is limited by the lack of standardized methods of assay and misinterpretation
of results [38, 40]. This has led to the overestimation of the number of patients presenting
with acute febrile illnesses diagnosed with Typhoid fever [41, 42]. A systematic review
by Mengist and Tilahun in [43] revealed poor reliability, low sensitivity and specificity
of the Widal test. Alternative serologic tests detecting S. typhi specific antibodies have
been developed. There are many types using different methods of serologic assay like
the rapid dipstick assays, dot enzyme immuno-assays and agglutination inhibition tests
[38].
Typhidot (Malaysian Biodiagnostic Research SDN BHD, Kuala Lumpur, Malaysia),
is an Enzyme-Linked Immunosorbent Assay (ELISA)-based method, modified into an
immunodot test format; TUBEX (IDL Biotech, Sollentuna, Sweden) detects antibodies
using agglutination inhibition tests; and Enterocheck-WB (Zephyr Biomedicals, Goa,
India), a dipstick test, detects IgM antibodies. Many other rapid test kits are available,
but their use is limited by low sensitivity, low specificity and cost [36].

Antigen Detection Tests


Many methods have been employed to detect S. typhi antigens in body fluids like serum
and urine. Monoclonal and polyclonal antibodies targeting somatic, flagellar and Vi
antigens found on S. typhi [36] are evaluated using enzyme immuno-assay, counter
immune electrophoresis and co-agglutination tests. These tests also have low specificity
and sensitivity when compared to blood cultures [44].

Molecular Assay
The need to overcome the challenges posed by the inadequacies of using serologic tests
and cultures has led to the exploration of molecular methods for the diagnosis of typhoid
fever. DNA-based detection methods, such as Polymerase Chain Reaction (PCR), have
shown better sensitivity and specificity than blood cultures. The results are even better
with the use of nested multiplex PCR [38, 45].

3 Methodology
3.1 Data Collection

Data for the development and testing of the typhoid fever model were obtained
in Nigeria, which is a tropical country with a high population and a fairly significant
prevalence of tropical diseases. Two data collection instruments were designed for the
purpose of the study. The first instrument obtained experiential knowledge from 25 physi-
cians, experienced in the diagnosis of tropical diseases, for the development of models
to diagnose the following tropical diseases: malaria, typhoid, chicken pox, measles,
hepatitis B, yellow fever and UTI. In this paper, we report on the model for the diagnosis of
typhoid fever. The knowledge extraction instrument also elicited the following physician
demographic information: age range, gender, professional experience, type of clinic they
work in (public or private) and experience in diagnosing and treating the tropical dis-
eases under consideration. The instrument required the physicians to carry out a pairwise
comparison of various symptoms (obtained through literature search) that are associ-
ated with the diseases on a nine-point linguistic scale. Prior to the administration of the
AHP questionnaire, we employed the assistance of a physician and an epidemiologist
in reviewing our model to ensure that the correct symptoms are captured for each of
the diseases. Overall, ten symptoms were considered relating to the diagnosis of
typhoid: fever, headache, abdominal pain, fatigue, vomiting, coughing, loss of appetite,
chills, rash, and diarrhoea. The second instrument was administered to 40 physicians
who provided patient consultation and diagnosis data for 2199 patients, for purposes
of model testing – 2044 were found usable after data cleaning. In addition to the tests
with real life patient data, we requested 13 physicians to do a validation of the results
generated by our model.

3.2 Processing

This study adopted the classical AHP methodology [46] in the development of a model
for the diagnosis of typhoid fever. The AHP modeling was based on the group decision
analysis using an online Excel template (https://bpmsg.com/ahp-excel-template/). Based
on the results of the AHP computation, we developed the diagnosis model. The key
elements of the AHP are: pairwise comparison of variables; measurement of consistency,
and priorities derivation, all of which are detailed below:

Pairwise Comparison of Variables


One distinguishing trait of AHP is its ability to permit the evaluation of quantitative as
well as qualitative criteria and alternatives on the same preference scale of nine levels.
Let $A_1, A_2, \ldots, A_n$ be the evaluation variables, and let the priority of $A_i$ over $A_j$ be represented
by an $n \times n$ matrix

$$A' = \left[ a_{ij} \right], \quad i, j = 1, \ldots, n \qquad (1)$$

Then the entries are defined by the following rules:

Rule 1: If $a_{ij} = p$, then $a_{ji} = p^{-1}$, $p > 0$. \qquad (2)

Rule 2: If $A_i$ is judged to be of equal relative importance/intensity as $A_j$,
then $a_{ij} = a_{ji} = 1$, which is symmetric in nature; in particular, $a_{ii} = 1$ for all $i$.

Measurement of Consistency
The levels of consistency and consensus of judgments by experts in AHP decision
modelling are crucial pointers to the model’s reliability and reflection of the dependability
of the expert judgments in relation to the pairwise comparison of the decision variables.
A consistency check must be conducted since priorities make sense only if they are
derived from consistent or near consistent matrices. Saaty in [46] proposed a consistency
ratio, which is related to the eigenvalue method. Deviations from the consistency are
represented by the consistency index (CI). Related to the CI is the consistency ratio (CR),
which is the ratio of the CI to a random consistency index (RI). CI is calculated as:
$$CI = \frac{\lambda_{max} - n}{n - 1} \qquad (3)$$
and the consistency ratio is given as:

$$CR = CI / RI \qquad (4)$$

where $\lambda_{max}$ is the maximal eigenvalue and $n$ is the number of variables in the pairwise comparison
matrix.
RI is the random index determined by Saaty in [46] as follows:

n 3 4 5 6 7 8 9 10
RI 0.58 0.9 1.12 1.24 1.32 1.41 1.45 1.49

A consistency ratio of 0.1 is the maximum acceptable value [14].
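As a concrete illustration of Eqs. (3)–(4), the following is a minimal Python sketch (not the study's actual tooling, which was an online Excel template); the 3 × 3 reciprocal matrix and the function name are hypothetical, chosen only for demonstration.

```python
# Minimal sketch of the AHP consistency check (Eqs. 3-4); illustrative only.
import numpy as np

# Saaty's random consistency index (RI) for n = 3..10, as tabulated above.
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(pwc):
    """Return CR = CI / RI for a reciprocal pairwise comparison matrix."""
    n = pwc.shape[0]
    # lambda_max is the principal (largest) eigenvalue of the PWC matrix.
    lambda_max = np.linalg.eigvals(pwc).real.max()
    ci = (lambda_max - n) / (n - 1)   # Eq. (3)
    return ci / RI[n]                 # Eq. (4)

# Hypothetical 3 x 3 reciprocal matrix (a_ji = 1/a_ij, a_ii = 1).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
print(round(consistency_ratio(A), 3))  # well below the 0.1 acceptance threshold
```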

Priorities Derivation Procedure


The online AHP template performs the synthesis of the pairwise comparison judgments.
It involves the computation of the eigenvector, which presents linear relationships among
the evaluation variables; thus, establishing the priority model. For each PWC matrix,
priorities are calculated based on the eigenvalue method to produce a priority vector P,
given as:
$$P = \begin{pmatrix} P_1 \\ P_2 \\ \vdots \\ P_n \end{pmatrix} \quad \text{and} \quad \sum_{i=1}^{n} P_i = 1 \qquad (5)$$

$p_i$ is generated as

$$p_i = \frac{1}{n} \sum_{j=1}^{n} v_{ij} \qquad (6)$$

$v_{ij}$ is the eigenvalue corresponding to element $a_{ij}$ of the PWC matrix. This is obtained
from the matrix of eigenvectors. The matrix of eigenvectors $V$ is computed as:

$$V = \begin{bmatrix} \dfrac{a_{11}}{\sum_{i=1}^{n} a_{i1}} & \cdots & \dfrac{a_{1n}}{\sum_{i=1}^{n} a_{in}} \\ \vdots & \ddots & \vdots \\ \dfrac{a_{n1}}{\sum_{i=1}^{n} a_{i1}} & \cdots & \dfrac{a_{nn}}{\sum_{i=1}^{n} a_{in}} \end{bmatrix} \qquad (7)$$
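The sketch below illustrates the column-normalisation reading of Eqs. (5)–(7): the matrix V is obtained by dividing each entry by its column sum, and each priority is the mean of the corresponding row of V. The matrix and names are illustrative assumptions, not the study's data.

```python
# Sketch of the priority derivation in Eqs. (5)-(7); illustrative values only.
import numpy as np

def priority_vector(pwc):
    V = pwc / pwc.sum(axis=0)   # Eq. (7): normalise each column to sum to 1
    return V.mean(axis=1)       # Eq. (6): average each row; priorities sum to 1 (Eq. 5)

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
print(priority_vector(A).round(3))  # approximately [0.648, 0.230, 0.122]
```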
If there are lower levels in the hierarchy, then the global priority is obtained by
factoring in the eigenvector value of the priority at the level above the current hierarchy.
If $\mu_i$ is the eigenvector value associated with the upper-level criteria directly above the
set of variables $(s_i)$ under consideration, then the global priorities would be given as:

$$GP_i = \mu_i \, \overrightarrow{p_{i.}\, x_{i.}} \qquad (8)$$

where $GP_i$ is the global priority associated with the vector of variable and weight pairs
$\overrightarrow{p_{i.}\, x_{i.}}$. The variables are $x_{i1}, x_{i2}, \ldots, x_{in}$, while $p_{i.}$ represents the lower-level priority
weights $(p_{i1}, p_{i2}, \ldots, p_{in})$ associated with $x_{i1}, x_{i2}, \ldots, x_{in}$.
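Below is a hedged sketch of one common application of Eq. (8): the local priorities obtained under each upper-level criterion are scaled by that criterion's eigenvector weight and summed to give global priorities. The criterion weights and local priorities are invented for illustration and are not taken from this study.

```python
# Sketch of global priority aggregation in the spirit of Eq. (8); assumed values.
import numpy as np

mu = np.array([0.6, 0.4])              # weights of two hypothetical upper-level criteria
local_p = np.array([[0.5, 0.3, 0.2],   # local priorities of 3 variables under criterion 1
                    [0.2, 0.5, 0.3]])  # local priorities of the same variables under criterion 2
global_p = mu @ local_p                # scale by mu_i and sum over criteria
print(global_p.round(2))               # -> [0.38 0.38 0.24]; global priorities sum to 1
```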

4 Results and Discussion


The AHP questionnaire provided a basis for developing the typhoid diagnosis model
based on the experiential knowledge of the 25 physicians involved in the knowledge
definition process. The expert judgments were entered into the online AHP template
to produce the pairwise comparison matrix shown in Table 1, with 77.4% consensus
and 0.135 level of consistency. Karapetrovic and Rosenbloom in [47] found that it is
possible to answer rationally and consistently and still obtain a consistency ratio above 0.1.
A number of studies (e.g. [48]) have adopted a consistency cut-off of 0.2 because of
the large number of comparisons and the variations in experts’ institutional and
disciplinary backgrounds in terms of the emphasis placed on each evaluation component. In
this study, we have adopted a consistency cut-off of 0.2 due to the number of variables
under consideration, the confusable and overlapping nature of tropical disease symptoms
and the number of experts involved in the study.

Table 1. PWC Matrix (Relative Importance) with Respect to the Typhoid Symptoms

Fever Headache Fatigue Abdominal pain Vomiting Chills Diarrhoea Coughing Rash Loss of appetite
Fever 1.00 0.79 0.75 0.49 0.70 0.81 0.50 0.00 0.23 0.78
Headache 1.00 0.00 0.00 0.47 0.00 0.00 0.00 0.00 0.00
Fatigue 1.00 0.46 0.68 0.00 0.00 0.00 0.00 0.00
Abdominal pain 1.00 0.69 0.38 0.71 0.00 0.30 0.59
Vomiting 1.00 0.00 0.00 0.00 0.00 0.00
Chills 1.00 0.00 0.00 0.00 0.00
Diarrhoea 1.00 0.00 0.00 0.00
Coughing 1.00 0.37 0.35
Rash 1.00 0.00
Loss of appetite 1.00

Synthesis involves the computation of eigenvalues and the eigenvector. Synthesis
yields the percentage of relative priorities, which is expressed in a linear form to give the
eigenvector. The implication of the eigenvector is that it expresses the relative importance
of a symptom over another relating to the diagnosis of typhoid fever in the minds of the
physician. Figure 1 shows the relative priorities (relevance) of symptoms in the diagnosis
of typhoid fever, while the linear model (typhoid fever diagnosis factor index – TFDFI)
is shown in Eq. (9).

TFDFI = (FeverScore ∗ 0.269) + (HeadacheScore ∗ 0.184) + (AbdominalPainScore ∗ 0.118)
+ (FatigueScore ∗ 0.137) + (VomitingScore ∗ 0.052) + (CoughingScore ∗ 0.016)
+ (LossOfAppetiteScore ∗ 0.083) + (ChillsScore ∗ 0.081) + (RashScore ∗ 0.017)
+ (DiarrhoeaScore ∗ 0.044) (9)

Fig. 1. Priorities graph for typhoid symptoms (relative priorities: Fever 0.269, Headache 0.184,
Fatigue 0.137, Abdominal pain 0.118, Loss of appetite 0.083, Chills 0.081, Vomiting 0.052,
Diarrhoea 0.044, Rash 0.017, Coughing 0.016)

The TFDFI model shows that typhoid fever manifests mostly with fever (26.9%),
headache (18.4%), fatigue (13.7%), abdominal pain (11.8%), loss of appetite (8.3%) and
chills (8.1%). These are in agreement with the results obtained in [27, 49, 50]. Fever and
headache are two symptoms that manifest across most tropical diseases. The confusable
nature of symptom manifestations in these diseases calls for methodical approaches to
isolate each disease based on other peculiar symptoms. Our research shows that a combination
of abdominal pain, chills, fatigue and loss of appetite, in addition to headache
and/or fever, is a strong pointer to the possibility of typhoid presence, though a number
of these symptoms could present more at the later stages of typhoid infection [51,
52]. Several researchers have revealed that the primary symptoms of typhoid start with
fever lasting for more than 48 hours, thereafter accompanied by intense headache
(reported in about 43–90% of presentations), followed by gastrointestinal symptoms including
abdominal pain/cramps, nausea and vomiting, and constipation or diarrhoea. All of these
symptoms present the same way for both children and adults [53–56].
Our model was tested using data from the 2044 patients, based on an aggregation
procedure shown in Fig. 2. The patients are assessed on each of the symptoms based on
a six-point linguistic scale as follows: none = 0, mild = 1, moderate = 2, strong = 3,
very strong = 4, extreme = 5.

Fig. 2. Diagnosis weight aggregation

We determined that on a linguistic scale any Dw ≥ 2 (moderate or above) is considered
non-trivial and as such points to the presence of typhoid fever in some degree. This was
compared with the patient confirmatory tests conducted by the physicians to determine
the matching diagnoses between our system and those conducted by the physician. We
also conducted the false positive and false negative analysis. The summary results are
shown in Table 2.
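To make the aggregation concrete, the sketch below applies the TFDFI weights of Eq. (9) to a hypothetical patient's symptom intensities on the six-point scale and then applies the Dw ≥ 2 rule. It assumes Dw is simply the TFDFI-weighted sum of the 0–5 scores (the exact aggregation in Fig. 2 is not reproduced here); the patient scores are invented for demonstration.

```python
# Sketch of TFDFI scoring (Eq. 9) and the Dw >= 2 decision rule; assumed patient data.
WEIGHTS = {
    "fever": 0.269, "headache": 0.184, "fatigue": 0.137, "abdominal_pain": 0.118,
    "loss_of_appetite": 0.083, "chills": 0.081, "vomiting": 0.052,
    "diarrhoea": 0.044, "rash": 0.017, "coughing": 0.016,
}

def tfdfi(scores):
    """Weighted sum of symptom intensities (0 = none .. 5 = extreme)."""
    return sum(WEIGHTS[s] * scores.get(s, 0) for s in WEIGHTS)

patient = {"fever": 4, "headache": 3, "fatigue": 2, "abdominal_pain": 2,
           "loss_of_appetite": 1, "chills": 3}
dw = tfdfi(patient)
print(round(dw, 3), "-> typhoid suspected" if dw >= 2 else "-> typhoid not indicated")
```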

Table 2. Summary of diagnosis results

Parameter Number Percent (%)


Acceptable classifications 1613 78.91
False positives 208 10.18
False negatives 223 10.91
Total number of cases 2044 100

The results show 78.91% matching classifications of typhoid fever, with 10.18%
false positives and 10.91% false negatives. These results align with a number of studies,
which have recorded false positives and false negatives of between 3% and 15% [57].
We approached 13 medical doctors to evaluate our model in terms of the results
obtained and feasibility of utilizing a computer application that would be developed
based on the AHP model for the diagnosis of typhoid fever. Most of the physicians were
of the opinion that computational methods, such as the use of AHP, could be viable in the
diagnosis of typhoid fever. There was a general opinion that the AHP model is complex
to understand by non-computing (medical) experts; however, the results were considered
highly encouraging, and the symptom weights obtained through the AHP model align
strongly with what exists in practice. Some physicians’ comments are shown in Table 3:

Table 3. Physicians’ Evaluation of the AHP Model

Years of experience Hospital Comments


Less than 5 years Save Alive Hospital Port Harcourt The new model will help to save
Rivers State waiting time of patient suffering
from these symptoms
Less than 5 years Dalhatu Araf Specialist Hospital Typhoid is very common in the
Lafia. Nasarwa State North area; as such this model will
be of added advantage when
adapted into the health system.
Such a mobile application will
easily drive diagnosis
11–15 years University of Port Harcourt Strongly agree that there are
Teaching Hospital (UPTH) improved methods of attending to
Rivers State patients with typhoid, but this new
method should be clear
16–20 years University of Port Harcourt Good and interesting work, the
Teaching Hospital (UPTH) model should be applied to more
Rivers State common diseases like pneumonia
and tuberculosis should be
considered too. Typhoid is not quite
common among the paediatrics
population. An interesting work!
16–20 years Leads General Infirmary, Leeds Typhoid fever is not common
University Teaching Hospital Trust within UK region, expect in Africa
Leeds UK countries. The study is a good one,
it will be better if it is incorporated
into a mobile application for easy
detection, also other symptoms
should be incorporated
Above 20 years Zion Medical Centre Ahuoda Port The study is relevant, but there are
Harcourt improved methods of typhoid
diagnosis, this new method with
AHP is highly computational
Above 20 years Green Medical Centre Port Strongly agreed that there are
Harcourt conventional and improved
methods for the diagnosis of
typhoid fever; the new method will
be an added advantage

Though our sensitivity results (FP and FN) are within fairly acceptable thresholds
[58, 59], there is a need to reduce the FP and FN levels. This could be accomplished
through: i) increase in the number of physicians providing experiential knowledge for
the model development; and ii) use of Delphi method for refining the physicians’ expert
judgments in pairwise comparison of symptoms of diseases. The physicians further
pointed out the need to use our syndromic test as a first-stage diagnostic tool to isolate
cases for further tests. Since a number of the further tests could be expensive [60, 61],
especially in low-to-middle-income countries, a computational syndromic diagnosis
tool could be a veritable means of methodically isolating cases for further laboratory
tests. Previous studies (e.g. [62]) have emphasized the utility of soft-computing tools in
aiding inexperienced physicians and front-line health workers in syndromic diagnosis
of tropical confusable diseases.

5 Conclusion and Future Research Perspective


Typhoid fever is known to cause significant morbidity and mortality in LMICs, with
inaccurate estimates recorded in affected countries, especially in South and South-East
Asia and sub-Saharan Africa. Furthermore, assessment of the disease burden
appears limited and is often constrained by the sensitivity and specificity of most
rapid diagnostic tests. In this study, we developed an AHP model for mining experi-
ential knowledge, to power decision support systems for efficient diagnosis of typhoid
disease. Our results are in alignment with existing knowledge (e.g. [10, 63]) of the
ability of AHP to support medical diagnosis modelling. In addition, our study adds the
following contributions to knowledge:

1. Pure domain knowledge mining: Mining knowledge from experience provides
opportunities for developing robust cognitive systems. This study mined experi-
ential knowledge from domain experts, as building blocks to an efficient decision
support system. Our model showed 78.91% effectiveness in the classification
of typhoid fever. Several diagnostic models have been developed for typhoid fever,
among them the works of Antillón et al. in [64], Zhang et al. in [65], and Hosoglu et al.
in [66]; these basically considered laboratory experiments and machine learning to predict
typhoid fever, and their results contributed to the body of knowledge. However, our model
will be useful because we considered the strength of each of the symptoms.
2. Confusable symptoms discrimination: Confusable symptoms present a dangerous
trajectory to failed treatments and misdiagnosis. Our AHP model, therefore, provides
an approach with a consistent threshold for ranking symptoms according to their
relative importance. With this, a trade-off between prominent symptoms can be
established, and the exact symptoms effectively isolated.
3. Cost-effective solution to disease diagnosis: Making health care solutions affordable
would impact positively on the healthcare system and maintain access to quality
treatments. This study serves as a pre-diagnostic toolkit that enables the detection of
typhoid fever. It is cost-effective because the trial-and-error detection of confusable
diseases would not only be minimized but the path to quality disease diagnosis is
certain.

The results of our study and their generalizability can be improved upon by increas-
ing the number of domain experts (physicians) involved in the knowledge definition, and
implementing mechanisms that could improve the consistency and consensus in pair-
wise comparisons by the domain experts. The consistency of the pairwise comparison
could be improved by methods such as the adaptive AHP approach [A3 ] [67], and the
linguistic preference relations [Fuzzy LinPreRa] [68, 69], which also improves consen-
sus. Additional utilization of a Delphi process would also refine the experts’ pairwise
comparison results [70], while AHP hybridization with fuzzy logic could potentially
increase the predictive ability of the model by dealing with the fuzzy nature of data
that could arise during expert pairwise comparison judgment, and patient consultation.
We note that typhoid co-infects with some other febrile diseases such as malaria [71,
72]. It will be desirable to develop a multi-criteria diagnosis system that assists in the
differential diagnosis of febrile diseases, recognizing co-infection.

References
1. World-Health-Organization: Typhoid vaccine: WHO position paper. Weekly epidemiological
record 93, pp. 153–172 (2018). http://www.who.int/wer. Accessed 23 July 2020
2. Iheukwumere, I., Nwachukwu, C.N., Kanu, M.A.: Manifestations, mismanagement and diag-
nostic challenges of malaria and typhoid fever. Malar Chemoth Cont Elimin. 2(109), 38–41
(2013)
3. World Health Organization: World Health Statistics 2015. World
Health Organization, Geneva, Switzerland (2015). https://www.who.int/gho/publications/
world_health_statistics/2015/en/. Accessed 20 Apr 2017
4. Djam, X., Wajiga, G., Kimbi, Y., Blamah, N.: A fuzzy expert system for the management of
malaria (2011)
5. Szolovits, P., Patil, R.S., Schwartz, W.B.: Artificial intelligence in medical diagnosis. Ann.
Intern. Med. 108(1), 80–87 (1988)
6. Driver, C.: Malaria and its avoidance. Pract. Nurse 37(8), 19–24 (2009)
7. Kayemba, C.N., et al.: Introduction of newborn care within integrated community case
management in Uganda. Am. J. Trop. Med. Hyg. 87(5 Suppl), 46 (2012)
8. World-Health-Organization: WHO teams assist people in hard-to-reach areas of Nige-
ria (2017). https://www.who.int/news-room/feature-stories/detail/who-teams-assist-people-
in-hard-to-reach-areas-of-nigeria. Accessed 12 May 2019
9. Wind, Y., Saaty, T.L.: Marketing applications of the analytic hierarchy process. Manag. Sci.
26(7), 641–658 (1980)
10. Liberatore, M.J., Nydick, R.L.: The analytic hierarchy process in medical and health care
decision making: a literature review. Eur. J. Oper. Res. 189(1), 194–207 (2008)
11. Uzoka, F.-M.E., Obot, O., Barker, K., Osuji, J.: An experimental comparison of fuzzy logic and
analytic hierarchy process for medical decision support systems. Comput. Methods Programs
Biomed. 103(1), 10–27 (2011)
12. Agapova, M., et al.: Using the analytic hierarchy process for prioritizing imaging tests in
diagnosis of suspected appendicitis. Acad. Radiol. 24(5), 530–537 (2017)
13. Zyoud, S.H., Fuchs-Hanusch, D.: A bibliometric-based survey on AHP and TOPSIS
techniques. Expert Syst. Appl. 78, 158–181 (2017)
14. Saaty, T.L.: Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 1(1), 83–98
(2008)

15. Kulikowski, C.A.: Pattern recognition approach to medical diagnosis. IEEE Trans. Syst. Sci.
Cybern. 6(3), 173–178 (1970)
16. Shortliffe, E.H.: MYCIN: a rule-based computer program for advising physicians regarding
antimicrobial therapy selection, Stanford Univ Calif Dept of Computer Science (1974)
17. Kaeding, A.-K., Flor, T.: Processing unexact information in a medical used multiparadigm
system. pp. 590–592 (1995)
18. Song, Q., Ma, T., Kasabov, N.: A novel generic higher-order TSK fuzzy model for prediction
and applications for medical decision support, pp. 241–245 (2003)
19. Marsh, K., Lanitis, T., Neasham, D., Orfanos, P., Caro, J.: Assessing the value of healthcare
interventions using multi-criteria decision analysis: a review of the literature. Pharmacoeco-
nomics 32(4), 345–365 (2014)
20. Grosan, C., Abraham, A., Tigan, S.: Multicriteria programming in medical diagnosis and
treatments. Appl. Soft Comput. 8(4), 1407–1417 (2008)
21. Hancerliogullari, G., Hancerliogullari, K.O., Koksalmis, E.: The use of multi-criteria decision
making models in evaluating anesthesia method options in circumcision surgery. BMC Med.
Inform. Decis. Mak. 17(1), 1–13 (2017)
22. Olaniyan, O.M., Alegbeleye, O.: An android-based expert system for diagnosis of selected
tropical diseases using fuzzy-analytical hierarchy process. Int. J. Innov. Res. Educ. Technol.
Soc. Strateg. 6(1), 149–155 (2019)
23. Ajenaghughrure, I.B., Sujatha, P., Akazue, M.I.: Fuzzy based multi-fever symptom classifier
diagnosis model. Int. J. Technol. Comput. Sci. 10(1), 13–28 (2017)
24. Prihatini, P.M., Putra, I.K.G.D.: Fuzzy knowledge-based system with uncertainty for tropical
infectious disease diagnosis. Int. J. Comput. Sci. Issues (IJCSI) 9(4), 157 (2012)
25. Obot, O., Inyang, U.: ANFIS based fuzzy clustering system for differential diagnosis of
confusable diseases. World 6(2), 160–165 (2014)
26. Ajibola, O., Omisakin, O.A., Eze, A.A., Omoleke, S.A.: Self-medication with antibiotics,
attitude and knowledge of antibiotic resistance among community residents and undergraduate
students in Northwest Nigeria. Diseases 6(2), 32 (2018)
27. Crump, J.A., Luby, S.P., Mintz, E.D.: The global burden of typhoid fever. Bull. World Health
Organ. 82(5), 346–353 (2004)
28. Acestor, N., et al.: Mapping the aetiology of non-malarial febrile illness in Southeast Asia
through a systematic review—terra incognita impairing treatment policies (2012)
29. Luvira, V., et al.: Etiologies of acute undifferentiated febrile illness in Bangkok, Thailand.
Am. J. Trop. Med. Hyg. 100(3), 622 (2019)
30. Oguntimilehin, A., Adetunmbi, A., Abiola, O.: A machine learning approach to clinical
diagnosis of typhoid fever. Mach. Learn. Approach Clin. Diagn. Typhoid Fever 2(4), 1–6
(2013)
31. Santosa, I., Rahmanita, E., A’Yuni, T., Novianti, T.: Application of fuzzy logic Sugeno
methods for diagnosis typhoid fever disease and dengue hemorrhagic fever, pp. 24–10 (2018)
32. Asogbon, M., Samuel, O., Omisore, M., Awonusi, O.: Enhanced neuro-fuzzy system based
on genetic algorithm for medical diagnosis. J. Med. Diagn. Meth. 5(205), 2 (2016)
33. Ioannidis, J.P., Tarone, R., McLaughlin, J.K.: The false-positive to false-negative ratio in
epidemiologic studies. Epidemiology, 450–456 (2011)
34. Cooper, R.V.: Avoiding false positives: Zones of rarity, the threshold problem, and the DSM
clinical significance criterion. Can. J. Psychiatry 58(11), 606–611 (2013)
35. Andrews, J.R., et al.: High rates of enteric fever diagnosis and lower burden of culture-
confirmed disease in peri-urban and rural Nepal. J. Infect. Dis. 218(suppl_4), S214–S221
(2018)
36. Parry, C.M., Wijedoru, L., Arjyal, A., Baker, S.: The utility of diagnostic tests for enteric
fever in endemic locations. Expert Rev. Anti Infect. Ther. 9(6), 711–725 (2011)

37. Andrews, J.R., Ryan, E.T.: Diagnostics for invasive Salmonella infections: current challenges
and future directions. Vaccine 33, C8–C15 (2015)
38. Sultana, S., Al Maruf, M.A., Sultana, R., Jahan, S.: Laboratory diagnosis of enteric fever: a
review update. Bangladesh J. Infect. Dis. 3(2), 43–51 (2016)
39. Mogasale, V., Ramani, E., Mogasale, V.V., Park, J.: What proportion of Salmonella Typhi
cases are detected by blood culture? A systematic literature review. Ann. Clin. Microbiol.
Antimicrob. 15(1), 1–8 (2016)
40. Bharmoria, A., Shukla, A., Sharma, K.: Typhoid fever as a challenge for developing countries
and elusive diagnostic approaches available for the enteric fever. Int. J. Vaccine Res. 2(2),
1–16 (2017)
41. Ammah, A., Nkuo-Akenji, T., Ndip, R., Deas, J.: An update on concurrent malaria and typhoid
fever in Cameroon. Trans. R. Soc. Trop. Med. Hyg. 93(2), 127–129 (1999)
42. Nsutebu, E.F., Ndumbe, P.M., Koulla, S.: The increase in occurrence of typhoid fever in
Cameroon: overdiagnosis due to misuse of the Widal test? Trans. R. Soc. Trop. Med. Hyg.
96(1), 64–67 (2002)
43. Mengist, H., Tilahun, K.: Diagnostic value of Widal test in the diagnosis of typhoid fever: a
systematic review. J. Med. Microbiol. Diagn. 6(01), 1–4 (2017)
44. Ajibola, O., Mshelia, M.B., Gulumbe, B.H., Eze, A.A.: Typhoid fever diagnosis in endemic
countries: a clog in the wheel of progress? Medicina 54(2), 23 (2018)
45. Srivastava, K.R., Awasthi, S., Mishra, P.K., Srivastava, P.K.: Biosensors/molecular tools for
detection of waterborne pathogens. Waterborne Pathog., 237–277 (2020)
46. Saaty, T.L.: A scaling method for priorities in hierarchical structures. J. Math. Psychol. 15(3),
234–281 (1977)
47. Karapetrovic, S., Rosenbloom, E.: A quality control approach to consistency paradoxes in
AHP. Eur. J. Oper. Res. 119(3), 704–718 (1999)
48. Cook, M., Angus, A., Gottberg, A., Smith, R., Longhurst, P.: Promoting sustainable resource
use through product service systems. In: CIWM Conference, Waste: A Global Resource.
Technical Session 5, Resource Recovery. Paignton, Torbay, UK, pp. 12–15 (2007)
49. Bhan, M., Bahl, R., Bhatnagar, S.: Typhoid and paratyphoid fever. Lancet 366(9487), 749–762
(2005)
50. Mouton, F., Ohuoba, E.I., Evans, F.M., Desalu, I., Wilson, C.: Typhoid enteric fever–part.
Update Anaesth. 32, 13 (2017)
51. Sanhueza Palma, N.C., Farías Molina, S., Calzadilla Riveras, J., Hermoso, A.: Typhoid fever:
case report and literature review. Medwave 16(05) (2016)
52. Buzğan, T., Evirgen, Ö., Irmak, H., Karsen, H., Akdeniz, H.: A case of typhoid fever presenting
with multiple complications. Eur. J. Gen. Med. 4(2), 83–86 (2007)
53. Zein, U.: Management of severe typhoid fever, pp. 1–6 (2017). https://www.researchgate.net/
publication/321144926_Management_of_Severe_Typhoid_Faver
54. Bhutta, Z.A.: Current concepts in the diagnosis and treatment of typhoid fever. BMJ
333(7558), 78–82 (2006)
55. Stephens, I., Levine, M.M.: Management of typhoid fever in children. Pediatr. Infect. Dis. J.
21(2), 157–159 (2002)
56. Woodward, T.E., Smadel, J.E.: Management of typhoid fever and its complications. Ann.
Intern. Med. 60(1), 144–157 (1964)
57. Lee, J.-H., et al.: False-positive results for rapid diagnostic tests for malaria in patients with
rheumatoid factor. J. Clin. Microbiol. 52(10), 3784–3787 (2014)
58. Hjalmarsson, V.: Machine learning and multi-criteria decision analysis in healthcare: a
comparison of machine learning algorithms for medical diagnosis (2018)
59. Dhouib, S., Kharrat, A., Chabchoub, H.: A multi-start threshold accepting algorithm for
multiple objective continuous optimization problems. Int. J. Numer. Meth. Eng. 83(11), 1498–
1517 (2010)

60. Lehmann, L.E., Herpichboehm, B., Kost, G.J., Kollef, M.H., Stüber, F.: Cost and mortality
prediction using polymerase chain reaction pathogen detection in sepsis: evidence from three
observational trials. Crit. Care 14(5), 1–10 (2010)
61. Bartlett, J., Stirling, D.: A short history of the polymerase chain reaction. In: Bartlett, J.,
Stirling, D. (eds.) PCR Protocols. Methods in Molecular Biology™, vol. 226, pp. 3–6. Humana
Press (2003). https://doi.org/10.1385/1-59259-384-4:3
62. Uzoka, F.-M.E., Nwokoro, C., Debele, F., Akinnuwesi, B., Olaniyan, M.: AHP model for
diagnosis of tropical confusable diseases, pp. 1758–1763 (2017)
63. Khanmohammadi, S., Rezaeiahari, M.: AHP based classification algorithm selection for
clinical decision support system development. Procedia Comput. Sci. 36, 328–334 (2014)
64. Antillón, M., et al.: The burden of typhoid fever in low-and middle-income countries: a
meta-regression approach. PLoS Negl. Trop. Dis. 11(2), e0005376 (2017)
65. Zhang, X., Liu, Y., Yang, M., Zhang, T., Young, A.A., Li, X.: Comparative study of four
time series methods in forecasting typhoid fever incidence in China. PLoS ONE 8(5), e63116
(2013)
66. Hosoglu, S., Geyik, M.F., Akalin, S., Ayaz, C., Kokoglu, O.F., Loeb, M.: A simple validated
prediction rule to diagnose typhoid fever in Turkey. Trans. R. Soc. Trop. Med. Hyg. 100(11),
1068–1074 (2006)
67. Lin, C.-C., Wang, W.-C., Yu, W.-D.: Improving AHP for construction with an adaptive AHP
approach (A3). Autom. Constr. 17(2), 180–187 (2008)
68. Wang, T.-C., Chen, Y.-H.: Applying fuzzy linguistic preference relations to the improvement
of consistency of fuzzy AHP. Inf. Sci. 178(19), 3755–3765 (2008)
69. Wu, Z., Huang, S., Xu, J.: Multi-stage optimization models for individual consistency and
group consensus with preference relations. Eur. J. Oper. Res. 275(1), 182–194 (2019)
70. Abdel-Basset, M., Mohamed, M., Sangaiah, A.K.: Neutrosophic AHP-Delphi group decision
making model based on trapezoidal neutrosophic numbers. J. Ambient. Intell. Humaniz.
Comput. 9(5), 1427–1443 (2017). https://doi.org/10.1007/s12652-017-0548-7
71. Baba, M., et al.: Evidence of arbovirus co-infection in suspected febrile malaria and typhoid
patients in Nigeria. J. Infect. Dev. Ctries. 7(01), 051–059 (2013)
72. Odikamnoro, O., et al.: Incidence of malaria/typhoid co-infection among adult population in
Unwana community, Afikpo north local government area, Ebonyi state, Southeastern Nigeria.
Afr. J. Infect. Dis. 12(1), 33–38 (2018)
Gradient Boosting and Minimum Redundancy
Maximum Relevance (mRMR) Feature Selection
for Diagnosis of Parkinson’s Disease Through
Patient Audio Data

Jagadeepram Maddipatla1 and Rishi Athavale2(B)


1 Rock Ridge High School, Ashburn, USA
2 Academy of Engineering and Technology, Ashburn, USA

[email protected]

Abstract. Parkinson's Disease (PD) consistently ranks as one of the most common neurodegenerative diseases in the U.S., second only to Alzheimer's. While research into the underlying causes of PD is underway, diagnosis remains a pertinent issue due to the absence of formal lab-based tests. Patients with PD often develop abnormal speech patterns, such as the slurring of words. This study aims to utilize and compare various Gradient
Boosting models to diagnose PD based on audio data taken from a patient. Addi-
tionally, this study also tests these models with mRMR feature selection, Principal
Component Analysis (PCA) dimensionality reduction, Min-Max normalization,
and standardization. The dataset used in this study was the UCI Machine Learn-
ing Repository’s Parkinsons Data Set. The dataset includes 195 voice recordings
from 31 subjects with 22 biomedical voice measurements per recording. 48 of the
voice recordings in the dataset were from patients without PD and 147 of the voice
recordings were from patients with PD. 60% of the voice recordings were used for
model training, 20% for cross validation, and 20% for the test set. The HistGra-
dientBoosting model with Min-Max normalization and mRMR feature selection
was found to perform the best on the cross validation set, in which it had an accu-
racy of 94.8718%, a sensitivity of 96.7742%, a specificity of 87.5000%, an AUC
score of 0.921371, and an F1 score of 0.967742. This same model also achieved
an accuracy of 89.7436%, a sensitivity of 96.6667%, a specificity of 66.6667%,
an AUC score of 0.816667, and an F1 score of 0.935484 on the test set. The results
in this study show that Gradient Boosting models have the potential to provide
quick, efficient, and accurate diagnoses for PD in a clinical setting so patients can
receive treatment sooner.

Keywords: Parkinson's disease · Gradient boosting models · Maximum relevance feature selection · mRMR · Principal component analysis

1 Introduction
Parkinson's Disease (PD) is a neurological disease which occurs typically amongst the elderly, though not exclusively. While the exact causes of the disease vary amongst the


affected population, all exhibit injury within the basal ganglia and substantia nigra por-
tions of the brain [8]. These regions are most closely correlated with voluntary movement
and dopamine assembly; thus, excessive damage and inhibition of the neurons can lead
to noticeable manifestations of PD. For example, common in a majority of PD patients
is an involuntary tremor within the hands and sometimes feet [11]. Motor control and
movement is inhibited as well, with sudden bouts of muscle rigidity preventing typical
bodily actions [11].
Currently, doctors and laboratories tasked with diagnosing PD base their reports
upon symptom descriptions and brain scans, particularly dopamine mapping. In particu-
lar, tremors and muscle stiffness symptoms typically reported by PD patients result in the
official diagnosis of the disease due to its connection to the substantia nigra [3]. Rigorous
PD diagnosis, however, is unable to be conducted properly simply due to the unavail-
ability of clinical PD tools and unique symptoms of PD. While the disease results in the
manifestation of numerous symptoms, these are oftentimes not related solely to PD, and
can also be attributed to a variety of other diseases and disorders. Utilizing dopamine
mapping as the basis for PD diagnosis also leads to dramatically limited accessibility to
patients worldwide due to the lack of proper imaging tools.
In conjunction with the aforementioned symptoms, PD patients also experience a
change in speech patterns, often with slight variations in enunciation [3]. While such
changes are typically too insignificant to be noticed by the human observer on individual
cases, tell-tale patterns are clear within frequency metrics that are derived from vocal
recordings. The prevalence of such a symptom within the PD population as a whole
allows for an opportunity to differentiate active PD patients through a machine learning
approach. Unlike other widely considered symptoms, voice discrepancies are specific to PD, and a diagnostic tool based around voice fluctuations could prove to
be comparatively rigorous in the diagnostic process.

2 Methods
2.1 Dataset
This study utilized biomedical voice measurements of voice recordings to both train and evaluate Gradient Boosting models to detect PD. This data was taken from the UCI Machine Learning Repository's Parkinsons Data Set [10].
This dataset includes a total of 195 voice recordings taken from 31 subjects, 23
of whom have PD and 8 of whom are healthy. Subjects who have PD are labeled
with a 1 and healthy subjects are labeled with a 0. For each voice recording,
22 biomedical voice measurements are included. These biomedical voice measure-
ments include the average vocal fundamental frequency (MDVP:Fo(Hz)), the max-
imum vocal fundamental frequency (MDVP:Fhi(Hz)), the minimum vocal funda-
mental frequency (MDVP:Flo(Hz)), five measures of variation in fundamental fre-
quency (MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP),
six measures of variation in amplitude (MDVP:Shimmer, MDVP:Shimmer(dB), Shim-
mer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA), two measures of ratio of
noise to tonal components in the voice (NHR, HNR), two nonlinear dynamical complex-
ity measures (RPDE, D2), signal fractal scaling exponent (DFA), and three nonlinear

measures of fundamental frequency variation (spread1, spread2, PPE). The dataset is then split, with 60% of the voice recordings (117 voice recordings) going to the training
set, 20% of the voice recordings (39 voice recordings) going to the cross validation
set, and 20% of the voice recordings (39 voice recordings) going to the test set. The
training set is used to train the Gradient Boosting models and the cross validation set is
used to compare them. Once the best model is determined based on cross validation set
performance, it is then evaluated on the test set to measure its performance on new data
outside of the training or cross validation sets.
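For concreteness, a minimal sketch of this 60/20/20 split is shown below; the file name "parkinsons.data" and the "name"/"status" column names are assumptions based on the standard UCI distribution of the dataset, not details given by the authors, and the split shown is not stratified.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the UCI Parkinsons data; the file and column names are assumptions.
df = pd.read_csv("parkinsons.data")
X = df.drop(columns=["name", "status"])  # 22 biomedical voice measurements
y = df["status"]                          # 1 = PD, 0 = healthy

# 60% training, 20% cross validation, 20% test (117 / 39 / 39 recordings).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
```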

2.2 Gradient Boosting

Gradient Boosting is an ensemble machine learning model that utilizes numerous decision trees [1]. Gradient Boosting starts with an initial prediction F_0(x), shown below:

F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)    (1)

For Eq. (1), L is the loss function and yi is the ith label.
Once an initial prediction is made, regression trees are then constructed based on the
pseudo residuals of the previous prediction [1]. The equation for calculating the pseudo
residuals is shown below:
 
r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)}  for i = 1, …, n    (2)

For Eq. (2), r im is the pseudo residual for sample i. This pseudo residual will be used to
create regression tree m.
To create regression tree m, a regression tree is fitted to the pseudo residuals [1]. The
terminal regions of the regression tree are denoted by Rjm , where j is the number of the
terminal region in the regression tree and m is the number of the regression tree [1]. The
output for each leaf node of the tree is then computed with the following equation, for j = 1 … J_m:

\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)    (3)

For Eq. (3), J m is the number of terminal regions for regression tree m.
Using the outputs from the tree, the predictions (denoted by F_m(x)) are now updated:

F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm} \, I(x \in R_{jm})    (4)

For Eq. (4), ν is the learning rate.


This process is then repeated M times, with each iteration generating a new tree [1]. The final output is denoted by F_M(x).
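As an illustration of Eqs. (1)–(4), the sketch below implements a bare-bones gradient boosting loop for binary classification with logistic loss. It simplifies Eq. (3) by using the mean pseudo-residual of each leaf (the exact argmin only for squared-error loss), so it is a didactic approximation rather than the algorithm used by any of the libraries in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, M=100, nu=0.1, max_depth=2):
    """Didactic gradient boosting for binary labels y in {0, 1} with logistic loss."""
    p = np.clip(np.mean(y), 1e-6, 1 - 1e-6)
    f0 = np.log(p / (1 - p))                         # Eq. (1): constant initial prediction F_0
    F = np.full(len(y), f0)
    trees = []
    for _ in range(M):
        residuals = y - 1.0 / (1.0 + np.exp(-F))     # Eq. (2): pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                       # terminal regions R_jm; leaf values are mean residuals (simplified Eq. 3)
        F = F + nu * tree.predict(X)                 # Eq. (4): shrunken additive update
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(f0, trees, X, nu=0.1):
    F = np.full(X.shape[0], f0)
    for tree in trees:
        F = F + nu * tree.predict(X)
    return (1.0 / (1.0 + np.exp(-F)) >= 0.5).astype(int)   # F_M(x) thresholded at 0.5
```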

There are multiple other variations of Gradient Boosting which introduce improve-
ments to the base algorithm. XGBoost (eXtreme Gradient Boosting) is a version of Gra-
dient Boosting that improves on scalability [12]. LightGBM (Light Gradient Boosting)
decreases memory usage and training time [4]. CatBoost allows for automatic handling
of categorical features and reduces overfitting [7].

2.3 Model Implementation

This study utilized five different Gradient Boosting models: XGBoost, HistGradient-
Boosting, GradientBoosting, LightGBM, and CatBoost. XGBoost was implemented
using the XGBClassifier class from the xgboost Python library. HistGradientBoosting and Gra-
dientBoosting were implemented using sklearn’s HistGradientBoostingClassifier and
GradientBoostingClassifier classes. LightGBM was implemented using the LGBM-
Classifier class from the lightgbm Python library. CatBoost was implemented using
the CatBoostClassifier class from the catboost Python library.
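A minimal sketch of how these five classifiers might be instantiated and trained is given below; the hyperparameters are library defaults, since the paper does not list its settings, and X_train/y_train are assumed to be the arrays from the split sketch in Sect. 2.1.

```python
from xgboost import XGBClassifier
from sklearn.ensemble import HistGradientBoostingClassifier, GradientBoostingClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

def train_models(X_train, y_train):
    """Fit the five Gradient Boosting variants with default settings (assumed)."""
    models = {
        "XGBoost": XGBClassifier(),
        "HistGradientBoosting": HistGradientBoostingClassifier(),
        "GradientBoosting": GradientBoostingClassifier(),
        "LightGBM": LGBMClassifier(),
        "CatBoost": CatBoostClassifier(verbose=0),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models
```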

2.4 Data Preprocessing

The biomedical voice measurements in the data were normalized using Min-Max scaling. For a given feature, Min-Max scaling subtracts the minimum value for that feature and then divides the result by the range (the maximum minus the minimum) of that feature. The equation for Min-Max scaling is shown below:

x_i' = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}    (5)

For Eq. (5), x i is the ith feature. To prevent data leakage, the maximum and minimum
feature values used were taken from the training set.
Standardization was also tested as an additional data preprocessing method. Stan-
dardization works similarly to Min-Max scaling, except that in Standardization the mean
of the values for a feature are subtracted from each feature value and the result is then
divided by the standard deviation for the feature values.
x_i' = \frac{x_i - \mu}{\sigma}    (6)

For Eq. (6), μ represents the mean of x_i and σ represents the standard deviation of x_i. For standardization, the mean and standard deviation values used were taken from the training
set.
Min-Max scaling and Standardization were each tested for every model, both with and without mRMR feature selection and PCA.
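A sketch of how the two scalers could be fitted on the training split only (to avoid leakage) and then applied to all three splits is shown below; it assumes the X_train/X_cv/X_test arrays from the split sketch above.

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def scale_splits(X_train, X_cv, X_test, method="minmax"):
    """Fit Min-Max scaling (Eq. 5) or standardization (Eq. 6) on the training set
    and apply the same statistics to the cross validation and test sets."""
    scaler = MinMaxScaler() if method == "minmax" else StandardScaler()
    scaler.fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_cv), scaler.transform(X_test)
```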

2.5 mRMR Feature Selection

Minimum redundancy maximum relevance (mRMR) feature selection is an algorithm that identifies the best group of K features in a dataset [13]. Unlike other feature selection

algorithms like Boruta that seek to identify features that have any predictive capabil-
ity, mRMR identifies a small subset of features that will be the most useful [13]. For
this dataset in particular, which includes numerous redundant features (such as the six
different measures of variation in amplitude), decreasing the number of features to the
most essential will help to eliminate redundant features and potentially improve model
performance. For this study, mRMR feature selection was used to reduce the number of
features from 22 to 20. Additionally, to prevent data leakage, the features will be selected
based on training data. The models will be tested both with and without mRMR feature
selection.
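The authors do not state which mRMR implementation they used; the sketch below is one common greedy variant (mutual-information relevance minus mean absolute correlation redundancy) that selects 20 of the 22 features using only the training data.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def mrmr_select(X_train: pd.DataFrame, y_train, k=20):
    """Greedy mRMR sketch: at each step pick the feature with the largest
    (relevance - redundancy) score, where relevance is mutual information with
    the label and redundancy is the mean absolute correlation with the features
    already selected."""
    relevance = pd.Series(mutual_info_classif(X_train, y_train), index=X_train.columns)
    corr = X_train.corr().abs()
    selected, remaining = [], list(X_train.columns)
    for _ in range(k):
        if not selected:
            best = relevance[remaining].idxmax()
        else:
            redundancy = corr.loc[remaining, selected].mean(axis=1)
            best = (relevance[remaining] - redundancy).idxmax()
        selected.append(best)
        remaining.remove(best)
    return selected
```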

2.6 PCA Dimensionality Reduction

Similar to mRMR, Principal Component Analysis (PCA) reduces the number of features
inputted to the model [2]. However, unlike mRMR which selects features to utilize, PCA
condenses features that correlate with one another into a new feature [2]. This allows
PCA to both reduce the number of dimensions and minimize the amount of information
lost in the process [2]. For this study, PCA was used to reduce the number of dimensions
from 22 to 20. Additionally, to prevent data leakage, PCA was performed based on data
from the training set. The models will be tested both with and without PCA.
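A corresponding PCA sketch, again fitted on the training data only, might look as follows.

```python
from sklearn.decomposition import PCA

def pca_splits(X_train, X_cv, X_test, n_components=20):
    """Fit PCA on the training set and project all splits onto 20 components."""
    pca = PCA(n_components=n_components)
    pca.fit(X_train)
    return pca.transform(X_train), pca.transform(X_cv), pca.transform(X_test)
```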

2.7 Model Training and Evaluation

The five different Gradient Boosting models were trained on the data with different combinations of Min-Max scaling, Standardization, mRMR, and PCA: one group was trained on data with Min-Max scaling, one group on data with Min-Max scaling and mRMR, one group on data with Min-Max scaling and PCA, one group on data with Standardization, one group on data with Standardization and mRMR, and one group on data with Standardization and PCA. In total, 30 models were trained on the training set and then evaluated and
compared on the cross validation set. The models were evaluated on the cross validation
set using the metrics of accuracy, sensitivity, specificity, AUC score, and F1 score.

Sensitivity and Specificity: Sensitivity is the model's accuracy on positive examples (examples where the patient has PD). Specificity is the model's accuracy on negative
examples (examples where the patient is healthy). These metrics are useful because they
can specifically identify how a model does on a specific class of data. This is especially
useful for this study because the dataset that is being utilized has a high degree of class
imbalance in favor of positive examples.

AUC Score: The AUC score is the area under the Receiver Operating Characteristic
(ROC) Curve. An ROC Curve is generated by varying the model’s threshold and plotting
the different false positive and true positive rates. The area under this curve works as a
measure of how likely a model is to output a higher probability for a positive example
than a negative example. For example, an AUC score of 0.7 would represent that if given
a positive example and a negative example, the model will output a higher probability
for the positive example than the negative one 70% of the time.

F1 Score: A model’s F1 score is the harmonic mean of the model’s precision and recall.
Precision is the likelihood that if the model predicts that a given example is positive that
the example is actually positive. This metric is also known as the Positive Predictive Value (PPV). Recall is the same as sensitivity (the model's accuracy on positive exam-
ples). The term recall is used in this context because recall is most commonly used when
concerning F1 score. The equation for the F1 score is shown below:

F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}    (7)

For Eq. (7), it is common to also add a value ε to the denominator. ε is often a very small value (such as 1e−100) and serves to prevent dividing by zero. Equation (7) with the ε term included is shown below:

F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall} + \varepsilon}    (8)
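The five metrics can be computed for a fitted classifier roughly as sketched below; the helper name and the use of sklearn.metrics are assumptions, not the authors' code.

```python
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score, confusion_matrix

def evaluate(model, X, y):
    """Accuracy, sensitivity, specificity, AUC score, and F1 score for binary labels."""
    y_pred = model.predict(X)
    y_prob = model.predict_proba(X)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    return {
        "accuracy": accuracy_score(y, y_pred),
        "sensitivity": tp / (tp + fn),   # accuracy on positive (PD) examples
        "specificity": tn / (tn + fp),   # accuracy on negative (healthy) examples
        "auc": roc_auc_score(y, y_prob),
        "f1": f1_score(y, y_pred),
    }
```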

3 Results

The five models were trained on the training set with different combinations of Min-
Max scaling, Standardization, mRMR feature selection, and PCA. The models were then
evaluated on the cross validation set based on their accuracy, sensitivity, specificity, AUC
score, and F1 score. The results for each model on the cross validation set are shown in
Tables 1, 2, 3, 4, 5 and 6.

Table 1. Gradient boosting models on validation set (Min-Max)

Model Accuracy (%) Sensitivity (%) Specificity (%) AUC Score F1 Score
XGBoost 87.1795 90.3226 75.0000 0.826613 0.918033
HistGradientBoosting 92.3077 93.5484 87.5000 0.905242 0.950820
GradientBoosting 82.0513 87.0968 62.5000 0.747984 0.885246
LightGBM 92.3077 93.5484 87.5000 0.905242 0.950820
CatBoost 84.6154 87.0968 75.0000 0.810484 0.900000

Table 2. Gradient boosting models on validation set (Min-Max + PCA)

Model Accuracy (%) Sensitivity (%) Specificity (%) AUC Score F1 Score
XGBoost 76.9231 83.8710 50.0000 0.669355 0.852459
HistGradientBoosting 89.7436 96.7742 62.5000 0.796371 0.937500
GradientBoosting 79.4872 83.8710 62.5000 0.731855 0.866667
LightGBM 87.1795 93.5484 62.5000 0.780242 0.920635
CatBoost 89.7436 100.000 50.0000 0.750000 0.939394

Table 3. Gradient boosting models on validation set (Min-Max + mRMR)

Model Accuracy (%) Sensitivity (%) Specificity (%) AUC Score F1 Score
XGBoost 89.7436 90.3226 87.5000 0.889113 0.933333
HistGradientBoosting 94.8718 96.7742 87.5000 0.921371 0.967742
GradientBoosting 76.9231 80.6452 62.5000 0.715726 0.847458
LightGBM 92.3077 93.5484 87.5000 0.905242 0.950820
CatBoost 87.1795 90.3226 75.0000 0.826613 0.918033

Table 4. Gradient boosting models on validation set (Standardization)

Model Accuracy (%) Sensitivity (%) Specificity (%) AUC Score F1 Score
XGBoost 87.1795 93.5484 62.5000 0.780242 0.920635
HistGradientBoosting 87.1795 93.5484 62.5000 0.780242 0.920635
GradientBoosting 84.6154 90.3226 62.5000 0.764113 0.903226
LightGBM 84.6154 93.5484 50.0000 0.717742 0.906250
CatBoost 82.0513 87.0968 62.5000 0.747984 0.885246

Table 5. Gradient boosting models on validation set (Standardization + PCA)

Model Accuracy (%) Sensitivity (%) Specificity (%) AUC Score F1 Score
XGBoost 84.6154 96.7742 37.5000 0.671371 0.909091
HistGradientBoosting 89.7436 100.0000 50.0000 0.750000 0.939394
GradientBoosting 87.1795 93.5484 62.5000 0.780242 0.920635
LightGBM 87.1795 100.0000 37.5000 0.687500 0.925373
CatBoost 89.7436 100.0000 50.0000 0.750000 0.939394

Table 6. Gradient boosting models on validation set (Standardization + mRMR)

Model Accuracy (%) Sensitivity (%) Specificity (%) AUC Score F1 Score
XGBoost 84.6154 93.5484 50.0000 0.717742 0.906250
HistGradientBoosting 89.7436 96.7742 62.5000 0.796371 0.937500
GradientBoosting 84.6154 90.3226 62.5000 0.764113 0.903226
LightGBM 84.6154 90.3226 62.5000 0.764113 0.903226
CatBoost 84.6154 90.3226 62.5000 0.764113 0.903226

The HistGradientBoosting model with Min-Max and mRMR performed the best on
the cross validation set because, as shown in Table 3, it obtained the highest accuracy,
specificity, AUC score, and F1 score and obtained the second highest sensitivity. This
model was then evaluated on the test set. Its performance on the test set is shown in
Table 7 and Fig. 1.

Table 7. HistGradientBoosting (Min-Max + mRMR) model on test set

Model Accuracy (%) Sensitivity (%) Specificity (%) AUC Score F1 Score
HistGradientBoosting (Min-Max + mRMR) 89.7436 96.6667 66.6667 0.816667 0.935484

Fig. 1. HistGradientBoosting (Min-Max + mRMR) confusion matrix on test set

4 Discussion and Conclusion

PD currently affects roughly 1 million people in the U.S. alone, with 60,000 U.S. citizens
being positively diagnosed for the disease annually [9]. With this statistic only expected to
rise in the future, it is becoming increasingly important to diagnose PD in its early stages.
Prolonged diagnosis delays have proven to be catastrophic in the livelihood of families
and patients due to a lack of proper medication and attention. While alternatives for
diagnosis such as dopamine screening and symptom checklists exist, they require medical
professionals and extensive equipment to allow for proper execution; not only is this not
accessible to many populations, but it can also be extremely expensive. Furthermore,
many of the tested symptoms of PD also overlap with the known symptoms for other
diseases. Creating a viable and accurate solution for the rapid diagnosis of PD is an
essential asset in the race to stem disease progression. Voice analysis, which targets a PD-specific symptom, can easily be scaled to meet diagnostic needs due to the prevalence of voice changes in
positively diagnosed patients. A machine learning algorithm to detect discrepancies in
patient voices for diagnosis tackles the issues of both accessibility and cost by creating a
readily available software solution. Voice is also unique with regards to PD, and so can
be used as a relatively accurate metric for diagnosis.
To improve the accuracy and efficiency of speech-based PD diagnosis, this study
aims to apply Gradient Boosting to classify a patient as either having PD or being
healthy based on various biomedical voice measurement features. After training on
the biomedical voice measurements from 117 voice recordings, the best performing
Gradient Boosting method was found to be HistGradientBoosting with Min-Max feature
scaling and mRMR feature selection. On the cross validation set, this model achieved
an accuracy of 94.8718%, a sensitivity of 96.7742%, a specificity of 87.5000%, an
AUC score of 0.921371, and an F1 score of 0.967742. When tested on the test set,

this model was found to have an accuracy of 89.7436%, a sensitivity of 96.6667%, a specificity of 66.6667%, an AUC score of 0.816667, and an F1 score of 0.935484. This
relatively high performance on a limited amount of data highlights Gradient Boosting’s
high applicability to speech-based PD diagnosis and usefulness in clinical practice.
Gradient Boosting often produces models that take little memory and are able to
both run and train very quickly. This would further increase their accessibility, as they
wouldn’t require intensive hardware to run. Additionally, since they only require an
audio sample from a user, they may also be applicable to smart phone applications so
that users may obtain diagnoses from their home. The performance of Gradient Boosting
on classifying PD also indicates that it may also have applications in diagnosing other
neurological diseases, such as Alzheimer’s Disease, from audio samples. Based on the
findings of this study, Gradient Boosting has the potential to provide accessible and
efficient diagnosis for PD and possibly many other neurological diseases.
There are a variety of other methods that may improve on these results and that
weren’t implemented in this study. This study had to work with a limited number of
voice recordings (195) that were taken from a small range of patients (31). Due to
the differences in speech based on language and accent, voice samples from a large
and diverse number of subjects would help make a more universal model. Other than
collecting more data from more participants, which may be time consuming and costly, it
may also be possible to increase the diversity in the dataset by utilizing machine learning
algorithms that convert speech samples to different accents. Additionally, since this study
found success by using mRMR feature selection, other feature selection methods such as
Boruta and Fisher’s Score may be worth testing to see how they may perform differently
than mRMR. Finally, other boosting algorithms such as AdaBoost may be worth testing
on this problem based on the performance of Gradient Boosting.
Based on Gradient Boosting’s relatively impressive results on a small dataset of 195
voice recordings, Gradient Boosting is a promising method for providing accessible and
efficient PD diagnosis throughout the world, especially following further research.

References
1. Natekin, A., Knoll, A.: Gradient boosting machines, a tutorial. https://www.researchgate.net/publication/259653472_Gradient_Boosting_Machines_A_Tutorial. Accessed 16 Feb 2022
2. Gewers, F.L., et al.: Principal component analysis: A natural approach to data exploration, 19
June 2018 arXiv.org. https://arxiv.org/abs/1804.02502. Accessed 16 Feb 2022
3. How parkinson’s disease is diagnosed. Johns Hopkins Medicine. (n.d.). https://www.hopkin
smedicine.org/health/treatment-tests-and-therapies/how-parkinson-disease-is-diagnosed. 16
Feb 2022
4. Ke, G., et al.: LightGBM: A highly efficient gradient boosting decision tree. Microsoft
Research, 6 August 2019. https://www.microsoft.com/en-us/research/publication/lightgbm-
a-highly-efficient-gradient-boosting-decision-tree/. Accessed 16 Feb 2022
5. Niccolini, F., Su, P., Politis, M.: Dopamine receptor mapping with PET imaging in parkinson’s
disease. J. Neurol. December 2014. https://pubmed.ncbi.nlm.nih.gov/24627109/. 16 Feb 2022
6. Mayo Foundation for Medical Education and Research. Parkinson’s disease. Mayo Clinic,
14 January 2022. https://www.mayoclinic.org/diseases-conditions/parkinsons-disease/dia
gnosis-treatment/drc-20376062. 16 Feb 2022

7. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: Unbiased
boosting with categorical features, 20 January 2019. arXiv.org, https://arxiv.org/abs/1706.
09516. 16 Feb 2022
8. Spine, M.B. (n.d.). Parkinson’s disease. Parkinson’s Disease (PD) Mayfield Brain & Spine
Cincinnati, Ohio. https://mayfieldclinic.com/pe-pd.htm. Accessed 16 Feb 2022
9. Statistics. Parkinson’s Foundation. (n.d.). https://www.parkinson.org/Understanding-Par
kinsons/Statistics#:~:text=Nearly%20one%20million%20people%20in,to%201.2%20mill
ion%20by%202030. Accessed 16 Feb 2022
10. UCI Machine Learning Repository: Parkinsons data set. (n.d.). https://archive.ics.uci.edu/ml/
datasets/parkinsons. Accessed 16 Feb 2022
11. U.S. Department of Health and Human Services. (n.d.). Parkinson’s disease. National Institute
on Aging. https://www.nia.nih.gov/health/parkinsons-disease. 16 Feb 2022
12. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. https://www.researchgate.net/publication/310824798_XGBoost_A_Scalable_Tree_Boosting_System. Accessed 16 Feb 2022
13. Zhao, Z., Anand, R., Wang, M.: Maximum relevance and minimum redundancy feature selec-
tion methods for a marketing machine learning platform, 15 August 2019. arXiv.org. https://
arxiv.org/abs/1908.05376. 16 Feb 2022
Optic Disk Detection in Fundus Images
of Retinopathy of Prematurity

Monserrate Intriago-Pazmiño1,2(B) , Julio Ibarra-Fiallo3 , María Pérez-Hernández1 ,


Adán Guzmán-Castillo1 , and Eddy Torres-Constante3
1 Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional,
Quito, Ecuador
[email protected]
2 Biomedical Informatics Group, Universidad Politécnica de Madrid, Madrid, Spain
3 Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito, Cumbayá, Ecuador

Abstract. Children born prematurely may suffer from an eye disease called retinopathy of prematurity (ROP). To estimate the severity of this disease, physicians need to typify, among other things, the extent of the disease on images that are often of poor quality. The extent of ROP is measured from the optic disc. Therefore, it is essential to have an automatic method that locates and segments the optic disc. In order to contribute a computational method that detects the optic disc in children's pathological images, a fast-processing method is presented in this work. This method creates a template based on a local binary pattern histogram. Next, the method recognizes candidate optic disk windows from regional maxima. Then, the template is used to choose the correct optic disk. This method used thirty images from the ROPFI dataset, which contains infant pathological images. The optic disk has been manually labeled. The optic disk identification test achieved a sensitivity of 0.95.

Keywords: Optic disk detection · Retinopathy of prematurity · Medical image processing

1 Introduction
Retinopathy of prematurity is an eye pathology that could occur in children born pre-
maturely. In extreme severity, it can cause blindness. This pathology is typified in the
International Classification of Retinopathy of Prematurity within stages, extent, pre-
plus, and plus-disease [1, 2]. ROP is the infantile eye pathology that, in recent years,
has gained more effort to be diagnosed using computational solutions [3], including the
use of artificial intelligence approaches [4]. However, researchers face several difficulties: on the one hand, the limited availability of modern cameras for obtaining high-quality images and the lack of quantitative agreements that would facilitate the implementation of algorithms, and, on the other hand, the few public datasets accessible for replicating research [3–5].
Returning to the fact that this pathology is diagnosed by extent (zone I, II, and
III) and that the extension is measured by taking the optic disc as a reference, there is


a fundamental interest in automatically locating the optical disc through programmed algorithms.
In the following section, it is observed that most of the related work has proposed methods for detecting the optic disc in adult images. For that reason, this work also presents an essential contribution to the scientific community by addressing this need for children's pathological images. Additionally, it is of interest that this computational solution can be run on an ordinary computer. In this sense, we propose a simple method for detecting the optic disk in an infantile image, which can be executed with minimal computational resources and with a high sensitivity rate. The proposed method is based on preprocessing and modeling digital image processing techniques that meet these objectives.
The rest of the article is organized as follows. Section 2 details related works.
Section 3 depicts the proposed method. In Sect. 4, the results are presented. Section 5
discusses this proposal and related works. Finally, in Sect. 6, some conclusions and
future work are stated.

2 Literature Review
In this section, we review proposals for localization of the optic disc in fundus images
of both adults and children.
An automatic method for segmenting the optic disk is presented in [6]. The proce-
dure for OD segmentation applied some techniques like Principal Component Analysis
(PCA), Mathematical Morphology, and Circular Hough Transform. The input image is
well presented using PCA and allowed to convert the image to grayscale. Then, blood
vessels are removed from the image using mathematical morphology. Finally, the Cir-
cular Hough Transform is applied for the OD separation. This proposal was tested in the
MESSIDOR database. It contains fundus images with the presence of diabetic retinopa-
thy. Authors reported that OD was effectively segmented in some images, whereas, in
others, the proposed approach failed to segment OD. Nevertheless, this work did not
report quantitative results.
The optic disk is automatically segmented in fundus adult images using a morpho-
logical approach, as is detailed in [7]. This research uses a low pass finite impulse
response (FIR) filter to suppress blood vessel dominance and improve the OD area in
fundus images. Optimized grayscale morphological dilation and median filtering oper-
ations are used to segment the OD area. This method was tested in four public datasets
(DRIVE, DIRATEDB0, DIRATEDB1, and DRIONS). Considering the four datasets,
the sensitivity reached values of up to 0.8707.
An automatic optic disk detection is given in [8]. The authors propose an approach
combining lexicography representation and a support vector machine classifier (SVM).
Local feature spectrum analysis (LFSA) is used for pre-processing. The sparse dictionary
selection strategy was used to choose optic disc candidate windows in LFSA. And an
SVM implementation realizes the final classification. This proposal was tested in the
MESSIDOR dataset, previously described. This approach achieved an accuracy of 0.9975.
An optic disk and optic cup (OC) location method is proposed in [9]. The method
is a personalized fully convolutional network (FCN) and Inception building blocks of

GoogleNet. This method is developed for adult fundus images healthy and with the
presence of glaucoma. It was tested in two datasets REFUGE and a private dataset
from the Second Affiliated Hospital of Zhejiang University School of Medicine. The
performance of this method is presented in terms of Intersection-over-Union (IOU) and
Dice scores.
In [10], the optic disc segmentation is achieved with a convolutional neural network
(CNN). The Eye Hospital of Wenzhou Medical University provided manually labeled
images, creating a private dataset of 541 images. The CNN was trained using 487 images,
and testing was achieved with 54 images. The sensitivity was reported equal to 0.94.
In [11], the optic disc is detected and localized, and then the presence of glaucoma is
stated. Regions with Convolutional Neural Networks (RCNN) are used for OD detection,
which is responsible for locating and extracting the optic disc from a fundus image. The
method is tested on several adult datasets ORIGA, HRF, OCT & CFI, DIARETDB,
DRIVE, DRIONS DB, and MESSIDOR. The results show that more than 96% of the ODs are located with more than 50% of the labeled disk present in the prediction.
In [12], a Locally Statistical Active Contour Model (LSACM) is introduced, which
allows the modeling of inhomogeneous objects as Gaussian distributions of different
means and variances. To achieve the modeling, a sliding window is used within the
original image to another domain, in which the intensity distribution of each object with
intensity non-homogeneity has less overlap statistically. Then, a maximum likelihood
energy function is solved to approximate the correct image signal. The results obtained
for this proposed method, on the adult fundus images DRISHTI-GS, averaged 0.95 for
the F-Score metric and 8.32 for the boundary-based distance metric.
In [13], LSACM is also applied for optic disc and optic cup segmentation in the pres-
ence of fundus intensity non-homogeneity. This method was applied to the DRISHTI-GS
and RIM-ONE R2 adult patient datasets. The results obtained for optic disc and optic
cup performance in F-score are 0.5% and 2.2% higher than baseline, respectively.
At the time of writing this article, we have identified only one research paper propos-
ing the detection of the optic disc in fundus images of infant patients [10]. The other
papers propose methods applied for imaging adult patients. For motivations of our
research, this article details a method for detecting the optic disc in images of preterm
infants. These images present challenges due to their low quality, aiming to contribute to
the later stages of the computational processing to assist in Retinopathy of Prematurity
(ROP) diagnosis.

3 Method
Given the significance of identifying the optical disk in the fundus images from premature
births, an effective method for recognizing the optical disk is proposed. The difficulty
of the automatic digital processing of these pathological images is considered. The
Retinopathy of Prematurity Fundus Images (ROPFI) dataset is used in this research [14].
ROPFI has sixty-four pathological images. However, only thirty images were suitable
to work on for this research. Images that did not contain the complete optical disc were
discarded.
The method is a template matching proposal. In advance, a template is created
based on a histogram representation. Then, areas with regional maxima are found, the

template is compared with each region, and the most similar or least different region is the most
probable optic disk (see the method’s graph in Fig. 1). In order to create the template
for matching, optical disk samples from ten images were used. The template is based
on the local binary pattern histogram (LBP) developed in [15]. Having the template,
the process of identifying the optical disk on any fundus image can be executed. The
entry is a color fundus image and its mask. The mask is a binary image that delimits the
region of the vascular network. It allows optimizing the processing by sidestepping the
nonvascular area, which could be black background or an abnormal area of the retina.
Then, the fundus color image is enhanced in contrast, brightness, and gamma correction using the method proposed in [14]. The image must be converted to grayscale for the next step. Using the grayscale image, regional maxima are computed. The maximal regions represent the lightest, closest-to-white pixels, which are likely to be the optical disk. The center of mass of each maximal region is calculated, and a window
is created from that center. The LBP is calculated and compared with the template using
the structural similarity index measure (SSIM) proposed in [16]. The histogram with the
most significant similarity to the template has the highest probability of being the optical
disk. Finally, the optical disk is marked on the original picture. The detail of each step
is described below.

3.1 Enhancement of Color Fundus Image

The original images present poor contrast. In order to facilitate the later steps of process-
ing, it is required to alter the contrast and brightness. Therefore, the proposed method
in [14] was selected. This method performs an adaptive improvement using a feedfor-
ward artificial neural network to choose the filters’ parameter. The filters used are basic
contrast and brightness, gamma correction, and contrast-limited adaptive histogram
equalization. In addition, Gaussian smoothing increases the distinction of connected
pixels in the same region [17]. See an example image in Fig. 2.

Fig. 1. The method proposed to recognize the optic disk from fundus images with the presence
of retinopathy of prematurity. Contrast has been modified in some images for visibility proposals.

(a) (b)
Fig. 2. A color fundus image from the ROPFI dataset. (a) Original image. (b) Its corresponding
image where contrast and brightness are enhanced.

3.2 Optic Disk Template

The optic disk template is created in terms of a local binary pattern histogram (LBPH)
[15]. It has been created using ten images of the ROPFI set. Images were improved
using the method mentioned in the previous subsection. To create the template, only
the optic disk is required; for this reason, the optic disk was cropped in a window of
100x100 pixels. A sample is presented in Fig. 3. The LBP histogram of each image was

computed, and the template is the result of the average. The histograms of each model image and the resulting template are shown in Fig. 3 and Fig. 4.

Fig. 3. Sample of three optic disks used for generating the template.

Fig. 4. Local binary pattern histograms (LBPH) of ten optic disks utilized to create the template.
The last graph is the template’s LBPH.
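The authors implemented the method in Matlab; as a rough Python equivalent of this template-building step, the sketch below computes uniform LBP histograms of the cropped 100x100 optic disk windows and averages them. The LBP parameters P = 8 and R = 1 are assumptions, since the paper does not state them.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_window, P=8, R=1):
    """Normalized uniform-LBP histogram of one grayscale optic disk window."""
    lbp = local_binary_pattern(gray_window, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)  # P + 2 uniform codes
    return hist

def build_template(sample_windows):
    """Average the LBP histograms of the ten sample windows to obtain the template."""
    return np.mean([lbp_histogram(w) for w in sample_windows], axis=0)
```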

3.3 Regional Maximums and Their Centroids


A new image from which the optical disk is to be detected must first be preprocessed, as mentioned in Sect. 3.1. After being prepared, it is exposed to this phase of localization of possible regions containing the optical disk. This is achieved by
identifying the regional maxima and their centroids.

The main features of the optical disk are a ring-shaped region, the clearest region, and the thickest blood vessels. Given these characteristics, the clearest regions are searched for. They are known as regional maxima. They are computed with eight-
connected neighbors. Then, the centroid is identified for each region. A window of the
same size as the template is drawn from each centroid. Its LBP feature vector is generated
and compared with the template, and the optic disk is decided by the region with the
most significant similarity to the LBP template. The similarity measure is described in
the following subsection.
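Again as a Python approximation of this step (the original is in Matlab), the sketch below finds 8-connected regional maxima inside the vascular mask and returns the centroid of each maximal region as a candidate optic disk location.

```python
import numpy as np
from skimage.morphology import local_maxima
from skimage.measure import label, regionprops

def candidate_centroids(gray, mask):
    """Regional maxima (8-connected) restricted to the mask, reduced to centroids."""
    maxima = local_maxima(gray, connectivity=2) & mask.astype(bool)
    regions = regionprops(label(maxima, connectivity=2))
    return [(int(r.centroid[0]), int(r.centroid[1])) for r in regions]
```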

3.4 Structural Similarity Index

A measure of proximity between an object to be analyzed and the template is required. This could be measured by how different or similar the objects are. The difference and
similarity between the template and each model’s object are presented in Fig. 5 and
Fig. 6, respectively. The mean squared error (SQE) measure has been used to calculate
the difference. In comparison, the structural similarity index measure (SSIM) has been used for likeness. The SSIM measure allows a better distinction of objects closer to the template
and is invariant to rotations since it considers texture [16]. SSIM is the multiplication of
luminance, contrast, and structural term, as in (1).

SSIM(I, T) = [l(I, T)]^{\alpha} \cdot [c(I, T)]^{\beta} \cdot [s(I, T)]^{\gamma}    (1)

where α, β, γ are parameters to set the weight of the three terms. Luminance is driven
by Weber’s law, contrast is compared by the mean of intensity standard deviation, and
structure comparison is performed by cross-covariance. The development of this metric
is available in [16].
This method has been implemented in Matlab 2021b. Most of the algorithms have
been coded, as well as the similarity and dissimilarity measures.
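Since the authors' Matlab code is not available, the sketch below illustrates Eq. (1) in a global (single-window) form for comparing a candidate's LBP histogram with the template histogram; the small eps constant and the choice alpha = beta = gamma = 1 are assumptions made for numerical stability and simplicity, not values taken from the paper.

```python
import numpy as np

def ssim_vectors(h1, h2, alpha=1.0, beta=1.0, gamma=1.0, eps=1e-12):
    """Global SSIM between two LBP histograms, following the three terms of Eq. (1)."""
    mu1, mu2 = h1.mean(), h2.mean()
    s1, s2 = h1.std(), h2.std()
    cov = np.mean((h1 - mu1) * (h2 - mu2))
    luminance = (2 * mu1 * mu2 + eps) / (mu1 ** 2 + mu2 ** 2 + eps)
    contrast = (2 * s1 * s2 + eps) / (s1 ** 2 + s2 ** 2 + eps)
    structure = (cov + eps) / (s1 * s2 + eps)
    return (luminance ** alpha) * (contrast ** beta) * (structure ** gamma)

def pick_optic_disk(candidate_hists, template_hist):
    """Return the index and score of the candidate most similar to the template."""
    scores = [ssim_vectors(h, template_hist) for h in candidate_hists]
    return int(np.argmax(scores)), float(np.max(scores))
```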

4 Results

This research work used the ROPFI image set, containing only pathologic images. Suc-
cessive filters and the definition of an effective template achieved a 95% true positive
rate or sensitivity. Sensitivity is defined in [18] and mathematically expressed in (2).

\text{Sensitivity} = \frac{TP}{P}    (2)
where TP means true-positive, and it is the number of correctly recognized optic discs,
and P is the total number of test images. In this case, there are no negative classes (N)
because all images contain an optic disc.
A comparison with related work is presented in Table 1. Two recent works about
adult images and only one about children’s images are summarized. The F-score metric
used in one related work can be reviewed at [18].
The nature of fundus imaging varies between adults and children; even so, the performance of our model is quite close to that reported in imaging studies of adult patients.

Fig. 5. Mean square error measure between images used for getting the template and the template.
Images five and six are the most and less different, respectively.

Fig. 6. Structural similarity index measure between images used for getting the template and the
template. Images six and nine are the most and less similar, respectively.

5 Discussion

In this research, we have chosen to present the proposal as if it were two since the
results vary according to the similarity metric chosen: mean squared error (SQE) and
structural similarity index measure (SSIM). This decision made it possible to find the
best metric to refine the method’s performance. In this case, the metric SSIM gave the
best performance.

Table 1. Performance of recent proposals to detect the optic disk. SQE: mean squared error,
SSIM: structural similarity index measure

Proposal | Dataset | Type of images | Comparison's metric | Sensitivity | F-score
[11] | Adult datasets ORIGA, HRF, OCT & CFI, DIARETDB, DRIVE, DRIONS DB, and MESSIDOR | Other diseases | – | 0.96 | –
[12] | Adult dataset DRISHTI-GS | Other diseases | – | – | 0.5
[10] Deep convolutional neural networks | Children's private no-named dataset | Diseased with ROP | – | 0.94 | –
This proposal (image processing) | Children's private dataset ROPFI | Diseased with ROP | SQE | 0.80 | –
This proposal (image processing) | Children's private dataset ROPFI | Diseased with ROP | SSIM | 0.95 | –

Another significant contribution of our work is that it requires minimal computational resources, whereas the solution most closely related to ours [10] presents a deep neural network solution. As is well known, CNNs need more computational resources, such as a high-performance graphics card, in order to achieve short processing times.
Perhaps the greatest weakness of our work is the limited number of images, as well
as the source of capture. All ROPFI images were captured with the same camera. We
are trying to obtain images from other related work or to prepare our own dataset with
more images to identify opportunities to improve our method.

6 Conclusion

In this research work, a straightforward technique for optic disk recognition has been introduced, whose fundamental contributions are that it focuses on pathological images of infants, that its simplicity makes it possible to run on any computer, and that its performance is slightly superior to other related work.
The proposed method is based on image processing techniques. First, a template
describing the optical disc is established. Then, candidate regions are identified, and the
region with the best match with the template is chosen. This simple solution is suitable
for the needs of many physicians.

In future work, the method should be tested on other datasets to refine its model if necessary; a dataset with a larger number of images should be acquired or created; and new methods should be provided to achieve a complete typification of the ROP disease.

References
1. Zhao, J., et al.: A deep learning framework for identifying zone I in RETCAM images. IEEE
Access. 7, 103530–103537 (2019). https://doi.org/10.1109/ACCESS.2019.2930120
2. An international committee for the classification of retinopathy of prematurity: an inter-
national classification of retinopathy of prematurity revisited. Arch. Ophthalmol. 102,
1130–1134 (1984). https://doi.org/10.1001/archopht.1984.01040030908011
3. Reid, J.E., Eaton, E.: Artificial intelligence for pediatric ophthalmology. Curr. Opin.
Ophthalmol. 30, 337–346 (2019). https://doi.org/10.1097/ICU.0000000000000593
4. Scruggs, B.A., Paulchan, R. V., Kalpathy-Cramer, J., Chiang, M.F., Peter Campbell, J.: Arti-
ficial intelligence in retinopathy of prematurity diagnosis. Trans. Vis. Sci. Technol. 9, 1–10
(2020). https://doi.org/10.1167/tvst.9.2.5
5. Shen, Y., et al.: Domain-invariant interpretable fundus image quality assessment. Med. Image
Anal. 61, 101654, (2020). https://doi.org/10.1016/j.media.2020.101654
6. Akhade, S.B., Deshmukh, V.U., Deosarkar, S.B.: Automatic optic disc detection in digital
fundus images using image processing techniques. In: 2014 International Conference on
Information Communication and Embedded Systems, ICICES 2014. (2015). https://doi.org/
10.1109/ICICES.2014.7034118
7. Bharkad, S.: Automatic segmentation of optic disk in retinal images. Biomed. Signal Process.
Control 31, 483–498 (2017). https://doi.org/10.1016/J.BSPC.2016.09.009
8. Zhou, W., Wu, H., Wu, C., Yu, X., Yi, Y.: Automatic optic disc detection in color retinal
images by local feature spectrum analysis. Comput. Math. Meth. Med. 2018 (2018). https://
doi.org/10.1155/2018/1942582
9. Qin, P., Wang, L., Lv, H.: Optic disc and cup segmentation based on deep learning. In: Pro-
ceedings of 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation
Control Conference, ITNEC 2019, pp 1835–1840 (2019). https://doi.org/10.1109/ITNEC.
2019.8729455
10. Mao, J., et al.: Automated diagnosis and quantitative analysis of plus disease in retinopathy of
prematurity based on deep convolutional neural networks. Acta Ophthalmol. 98, e339–e345
(2020). https://doi.org/10.1111/aos.14264
11. Bajwa, M.N., Malik, M.I., Siddiqui, S.A., et al.: Two-stage framework for optic disc local-
ization and glaucoma classification in retinal fundus images using deep learning. BMC Med
Inform Decis Mak 19, 136 (2019). https://doi.org/10.1186/s12911-019-0842-8
12. Gao, Y., Yu, X., Wu, C., Zhou, W., Wang, X., Chu, H.: Accurate and efficient segmentation of
optic disc and optic cup in retinal images integrating multi-view information. IEEE Access.
7, 148183–148197 (2019). https://doi.org/10.1109/ACCESS.2019.2946374
13. Jiang, Y., Tan, N., Peng, T.: Optic disc and cup segmentation based on deep convolutional
generative adversarial networks. IEEE Access. 7, 64483–64493 (2019). https://doi.org/10.
1109/ACCESS.2019.2917508
14. Intriago-Pazmino, M., Ibarra-Fiallo, J., Crespo, J., Alonso-Calvo, R.: Enhancing vessel vis-
ibility in fundus images to aid the diagnosis of retinopathy of prematurity. Health Inform. J.
1–15 (2020). https://doi.org/10.1177/1460458220935369
15. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant
texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24,
971–987 (2002). https://doi.org/10.1109/TPAMI.2002.1017623

16. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error
visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004). https://doi.
org/10.1109/TIP.2003.819861
17. Hsiao, P.Y., Chou, S.S., Huang, F.C.: Generic 2-D gaussian smoothing filter for noisy image
processing. In: IEEE Region 10 Annual International Conference, Proceedings/TENCON,
pp. 1–4 (2007). https://doi.org/10.1109/TENCON.2007.4428941
18. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874 (2006).
https://doi.org/10.1016/j.patrec.2005.10.010
Machine Learning Computational
Framework for Alzheimer’s Disease
Stages Classification

Carlos Theran-Suarez(B) , Yohn Jairo Parra Bautista, Victor Adankai,


and Richard Aló

Florida A&M University, Tallahassee, USA


{carlos.theran,yohn.parrabautista,victor.adankai,richard.alo}@famu.edu

Abstract. Alzheimer’s Disease (AD) is a neurodegenerative disorder


primarily characterized by deteriorating cognitive functions. In 2016 an
estimated 40 million people were diagnosed with AD, and the expectation
for 2050 is 131 million. Therefore, healthcare systems require detecting
and confirming AD at its different stages to provide adequate and accu-
rate treatments. Recently, Machine Learning (ML) models have been
used to classify AD’s stages. It has become a priority to develop a frame-
work for AD’s stages detection based on ML and imputation methods
capable of handling datasets with missing values while providing high
accuracy. We propose a ML computational framework that integrates
data processing, feature selection, imputation methods and 5 different
ML models. The performance of the proposed framework has been eval-
uated using the main metrics for classification problem; accuracy, F1-
score, recall, and precision. As a results of the proposed process, our
framework classifies the AD’s onsets with an accuracy of 99%.

Keywords: Machine learning techniques · Alzheimer’s disease ·


Classification problem · Missing data

1 Introduction
Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by dete-
riorating cognitive functions and neuropsychiatric symptoms. AD is a progres-
sive disease, typically resulting in episodic memory loss and behavioral changes.
Symptoms occur because nerve cells (neurons) in parts of the brain involved
in thinking, learning, and memory (cognitive function) have been damaged or
destroyed. The damage to the brain is usually irreversible. Eventually, neurons in
parts of the brain that enable a person to carry out basic bodily functions, such
as walking and swallowing, are affected. Individuals eventually become bedbound and
require around-the-clock care; the disease is ultimately fatal [2,29]. Between 50% and 75% of
dementia cases worldwide have been characterized as AD. It is the sixth-leading cause
of death in the United States and the fifth-leading cause of death among Amer-
icans aged 65 and older, with over 6.2 million Americans affected as of 2021,

and it is expected that the number will increase up to 13.8 million by 2060. It is
reported that deaths from stroke, heart disease, and HIV have decreased, whereas reported
deaths from AD have increased by about 145% [2]. Research has shown that the progression
of AD may be improved if it is detected early and treatment is taken at the ini-
tial stages [14,21]. AD is thought to begin 20 years or more before symptoms
arise [2]. After AD is detected, patients typically die within about 3–10 years [13].
The detection of AD at its different stages is therefore a
priority for medical practices to provide adequate and accurate treatments.
Machine Learning (ML) techniques have been widely used in biomedical
research due to their ability to capture complex patterns from challenging datasets
[18,22]. Several works have been reported where ML algorithms are used to
predict the different stages of Alzheimer's disease based on different biomark-
ers [3,6,28,34]. For example, a boosting-based ML predictive model for
accurate prediction of the AD age of onset was investigated in individuals for potential
treatments and therapeutic interventions, where biomarkers were characterized
by extracellular deposits of the beta-amyloid (Aβ) peptide, the formation of intra-
cellular neurofibrillary tangles of hyperphosphorylated tau protein (p-Tau), and
the impairment of neurons and synaptic connections in the cerebral cortex and
hippocampus. In this case, the performance of the model was evaluated using the
Root Mean Square Error (RMSE), obtaining 1.79 as the best result [34].
ML models have been applied to Magnetic Resonance Imaging (MRI), con-
tributing to a faster diagnosis of AD as well as predicting the evolution of the
disease using longitudinal brain MRI features. A framework of supervised learning
classifiers for categorizing dementia subjects as either AD or non-AD based
on MRI features was proposed, obtaining an accuracy of 97.58% [6]. The perfor-
mance of genetic-based ML multivariate strategies in predicting late-onset AD,
and in describing the main genetic features associated with the risk of developing
late-onset AD, has also been studied; individuals with either a Cognitively Normal
or an Alzheimer's Disease diagnosis were selected [28]. On the other hand, a generalized
classification schema was proposed in which all stages of the disease are considered
simultaneously in the classification and decision-making processes for the prediction of
AD onsets, while missing information is handled at the same time. As a result, an accuracy
of 80.52% was achieved [3].
To study the stages of this disease, different biomarkers have been integrated
and studied to improve the diagnosis of AD and its different stages [1,19,31]. For
example, the eight most common biomarkers used to predict and identify AD's onsets
are: main cognitive tests (CDR Sum of Boxes, ADAS11, ADAS13, MMSE, RAVLT,
MoCA, ECog); MRI ROI measures (volumes, cortical thicknesses, surface areas); FDG
PET ROI averages (measuring cell metabolism, where cells affected by AD show reduced
metabolism); AV45 PET ROI averages (measuring the amyloid-beta load in the brain,
where amyloid-beta is a protein that misfolds, which then leads to AD); AV1451 PET
ROI averages (measuring the tau load in the brain, where tau is another protein which,
when abnormal, damages neurons and thus leads to AD); DTI ROI measures
(microstructural parameters related to cells and axons, such as radial and axonal
diffusivity); and CSF biomarkers (amyloid and tau levels in the cerebrospinal fluid,
as opposed to the cerebral cortex).
Given the need to study AD and its stages, different frameworks
based on deep learning models have been proposed [8,16,26]. However, it is well known
that deep learning models need large amounts of data to provide good perfor-
mance, and access to data in the field of neurodegenerative disorders is limited.
In this paper, a ML computational framework that integrates different classical
machine learning models, as well as imputation methods to handle missing data,
is proposed. It is known that the performance of ML methods depends on the
quality of the data. In the experimental results, we provide the performance
of five different ML models for multiclass prediction of AD's stages, discrimi-
nating among cognitively normal (CN), mild cognitive impairment (MCI), and
Alzheimer's disease (AD). These models are evaluated in terms of accuracy,
precision, and F1-score. Also, an analysis of three imputation methods to handle
the missing-value problem is presented. A general schema that integrates the ML
models for AD's stages multiclass prediction is proposed, achieving an average
accuracy of 99%.
This paper is organized as follows. Section 2 presents a description of the
ensemble learning model and feature selection. Section 3 describes the machine
learning models used to predict the stages of Alzheimer’s disease. Section 4 pro-
vides a performance analysis of the models using the metrics accuracy, precision,
and F1-score. Section 5 discusses the results, and Sect. 6 provides the conclusions.

2 Ensemble Learning Model with Feature Selection


There are different techniques available to predict the stages of Alzheimer's Disease (AD).
Accurate predictions depend on indicators that can compensate for the missing data in longi-
tudinal studies. The traditional, experience-based approach of doctors relies on clinical his-
tory data and visual ratings of brain scans [3]. These studies focus on the treatment
of missing data in particular variables, including magnetic resonance imaging
(MRI), fluorodeoxyglucose positron emission tomography (FDG-PET), diffusion
tensor imaging (DTI), and cerebrospinal fluid (CSF). However, a comparative
study of imputation across observations and biomarkers using different imputa-
tion techniques is lacking in other relevant ML research.
We used data provided by the Alzheimer's Disease Prediction of Longitudinal
Evolution challenge, which works together with the Alzheimer's Disease Neuroimaging Initia-
tive (ADNI), to evaluate the proposed ML computational framework. Table 1
describes the features of our dataset, which are divided among the following
groups: cognitive tests, MRI, PET, genetic, and CSF. The dataset has 28% miss-
ing data, which makes the analysis challenging: the missing values affect the
variables such that only 20% of the rows are complete.
Figure 1 shows the different stages of the analysis using imputation methods for
the missing values. New studies using ML algorithms to perform imputations across
cognitive tests in Alzheimer's research have been adopted [32]. For instance, measuring
the imputation error of different techniques and evaluating the impact on clas-
sification performance is a well-established approach when the missingness is significant
[12]. We can assert that biomarkers allow us to accurately monitor Alzheimer's


disease (AD) in the early stages. Therefore, detecting changes requires biomark-
ers that can couple methods from different groups [36]. For instance, finding
differences between AD and MCI subjects is challenging.
Our computational framework starts with the process of transforming the
data. Medical datasets have the property of being small in volume but high-
dimensional [25]. Selecting irrelevant attributes will make the models less
efficient and, in most cases, will not increase accuracy. Consequently, classifying this data
is complicated by many factors, including dimensionality and the definition of the
prediction objective itself [7]. For instance, the ADNI dataset guidelines encourage
the use of many attributes to better understand the lack of data in particular
features.
Data normalization is an important part of the process for obtaining data
quality, together with feature selection [33]. This work uses the Boruta algo-
rithm to select relevant features by comparing the original
attributes' importance with the importance achievable at random. The Boruta algorithm
was used as a wrapper around a random forest classifier to
perform the feature selection [24].

Fig. 1. Flowchart of the computational framework

Table 1 shows the important ADNI biomarkers used in the analysis. The Boruta
algorithm consists of the following steps [23] (a simplified code sketch is given after the list):

Table 1. ADNI biomarkers

Multimodal source Features Description


Cognitive test EcogPtMem Condition memory
Cognitive test EcogPtLang Condition language
Cognitive test EcogPtVisspat Visuospatial abilities
Cognitive test EcogPtPlan Everyday planning
Cognitive test EcogPtOrgan Everyday organization
Cognitive test EcogPtDivatt Everyday divided attention
MRI Ventricles Network of cavities measurements
MRI Hippocampus Hippocampal volume measurements
MRI WholeBrain Functional magnetic resonance imaging
MRI Enthorhinal Network of perception measurements
MRI Fusiform Network of object visual processing
PET FDG Glucose being used in the brain measurement
PET PIB amyloid Amyloid burden measurement
PET AV45 amyloid Amyloid radiopharmaceutical measurement
PET CDRSB Clinical dementia rating scale sum of boxes
Genetic APOE4 Apolipoprotein involved in healthy blood vessels
CSF Ab1 Cerebrospinal fluid Abeta 1
CSF T-tau Cerebrospinal fluid phosphorylated

1. Extend the information system by adding copies of all variables (the infor-
mation system is always extended by at least 5 shadow attributes, even if the
number of attributes in the original set is lower than 5).
2. Shuffle the added attributes to remove their correlations with the response.
3. Run a random forest classifier on the extended information system and gather
the Z scores computed.
4. Find the maximum Z score among shadow attributes (MZSA), and then
assign a hit to every attribute that scored better than MZSA.
5. For each attribute with undetermined importance perform a two-sided test
of equality with the MZSA.
6. Deem the attributes which have importance significantly lower than MZSA as
‘unimportant’ and permanently remove them from the information system.
7. Deem the attributes which have importance significantly higher than MZSA
as ‘important’.
8. Remove all shadow attributes.
9. Repeat the procedure until the importance is assigned for all the attributes,
or the algorithm has reached the previously set limit of the random forest
runs.
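As an illustration of the shadow-attribute idea behind steps 1–4, the sketch below shows one simplified round in Python with scikit-learn; it is not the Boruta R package used in [23, 24], it uses raw random forest importances instead of Z scores, and the statistical test of steps 5–7 is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def boruta_round(X, y, seed=0):
    """One simplified Boruta-style round: append shuffled (shadow) copies of every
    feature, fit a random forest, and flag the features that beat the best shadow."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    shadows = np.apply_along_axis(rng.permutation, 0, X)   # step 2: break correlations
    X_ext = np.hstack([X, shadows])                        # step 1: extended system
    forest = RandomForestClassifier(n_estimators=300, random_state=seed).fit(X_ext, y)
    imp = forest.feature_importances_                      # step 3: importances
    real, shadow = imp[: X.shape[1]], imp[X.shape[1]:]
    return real > shadow.max()                             # step 4: hit against MZSA
```

Repeating such rounds and applying the two-sided test of steps 5–7 yields the Confirmed decisions of the kind reported in Table 2.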

We can observe in Table 2 how the variables are confirmed to be essential for
the ensemble model. Higher mean importances (MeanImp) indicate more important features, and
comparing MaxImp against the mean confirmed their importance for the model. A wrapper
method allows forward and backward elimination to be applied to the subset of ADNI features.
We draw inferences from previous training data to progressively find irrelevant features.
However, we found that all subset attributes have importance significantly higher than the
maximum Z score among shadow attributes (MZSA); therefore, they are confirmed to be
necessary for subsequent classification tasks.

Table 2. Feature selection results

Features MeanImp MedianImp MinImp MaxImp normHits Decision


DXbl 27.19598 27.30384 26.04956 28.37973 1 Confirmed
EcogPtMembl 40.71083 40.02669 38.20191 43.00827 1 Confirmed
EcogPtLangbl 38.68324 38.93262 33.44646 43.01334 1 Confirmed
EcogPtVisspatbl 32.42445 31.78419 29.12277 37.66257 1 Confirmed
EcogPtPlanbl 31.78993 31.38005 26.93096 38.94865 1 Confirmed
EcogPtOrganbl 34.60144 36.16615 26.53411 39.82835 1 Confirmed
EcogPtDivattbl 32.85267 32.68401 31.01130 34.83885 1 Confirmed
EcogPtTotalbl 43.86562 43.59127 40.01838 47.02400 1 Confirmed
EcogSPMembl 35.27871 34.90675 28.76055 40.14536 1 Confirmed
EcogSPLangbl 23.10888 22.50946 20.94001 28.19491 1 Confirmed
EcogSPVisspatbl 30.24635 30.46110 25.63408 37.30937 1 Confirmed
EcogSPPlanbl 33.47799 34.14166 29.22066 35.86551 1 Confirmed
EcogSPOrganbl 33.79228 34.42899 29.61909 38.05332 1 Confirmed
EcogSPDivattbl 34.31047 34.44297 29.04899 37.31827 1 Confirmed
EcogSPTotalbl 35.41085 34.21652 32.05605 41.49297 1 Confirmed
FAQbl 32.36046 31.54027 29.67207 38.49800 1 Confirmed
MOCAbl 23.81650 23.69553 23.08375 24.88196 1 Confirmed
RAVLTimmediatebl 39.58123 39.26829 35.91302 41.70472 1 Confirmed
Hippocampusbl 53.24983 53.42204 50.95167 55.69558 1 Confirmed
WholeBrainbl 43.35859 43.34009 41.62744 45.29308 1 Confirmed
MidTempbl 32.93979 32.90083 32.23282 33.52757 1 Confirmed
Entorhinalbl 33.64190 33.70358 31.65124 35.46157 1 Confirmed
Ventriclesbl 63.78429 63.85642 60.80048 67.38507 1 Confirmed
Fusiformbl 39.19080 39.13170 38.24670 40.26875 1 Confirmed
ICVbl 44.14537 44.24441 41.49815 46.56004 1 Confirmed
FDGbl 35.06548 34.70946 33.56355 37.51593 1 Confirmed
AV45bl 31.16417 31.01355 29.79520 32.77149 1 Confirmed
CDRSBbl 41.24595 41.46690 39.03736 44.31880 1 Confirmed
ABETAbl 39.02237 39.16411 37.55478 40.75990 1 Confirmed
TAUbl 41.96499 41.51367 37.39941 47.22169 1 Confirmed
PTAUbl 45.88015 46.20613 42.21296 48.47499 1 Confirmed
APOE4 38.97598 38.72304 37.17893 41.76816 1 Confirmed

3 Machine Learning Models and Imputation Methods


This section covers the ML techniques used for the Alzheimer's disease onset
classification problem. The ML algorithms adopted include Decision
Tree Classifier, XGBoost, Random Forest, Multi-Layer Perceptron (MLP), and
Support Vector Machine (SVM). In addition, missing values
were handled using three different imputation methods: median, KNN, and linear
regression.
Several classifiers based on statistical, probabilistic, and linear programming
theory have been used to predict Alzheimer's disease onsets [3,4,15,20], but
a detailed analysis of the performance of ML has not been reported. Also, only a
few authors have reported the contribution of imputation methods to address miss-
ing values [3]. For example, two categories (nondemented and demented) were
considered to analyze the performance of different ML methods, which limited
the problem to a binary classification task [4]. It is well known that the chal-
lenge increases when the problem involves multiple categories or classes. In other work,
three different classes (AD, MCI, and NC) were considered to categorize poten-
tial patients with Alzheimer's disease, where the classifiers were trained using
one feature at a time. In particular, the reported measurement features are
subcortical volume, cortical volume, cortical thickness, surface area, and dif-
ferent volumes of hippocampal structures [15]. Consequently, the analysis and
the performance of the model are based only on accuracy metrics and on
one feature at a time for the training process, meaning that the feature space
is one-dimensional instead of n-dimensional, which is a limitation for
classification problems.
This section first describes the mathematical background of each method
and how these models learn from the input data and set their learnable
parameters.

3.1 Decision Tree


Decision Tree for classification problems is one of the most common methods to esti-
mate a discrete-valued target function, where the nodes of the decision tree repre-
sent the learned function. This model has been shown to provide good performance in
diagnosing medical cases [5]. In the literature, different Decision Tree algorithms have
been reported: Iterative Dichotomiser 3 (ID3), C4.5 (an improvement
of ID3), C5.0, and CART (Classification and Regression Trees). Our implemen-
tation uses scikit-learn modules, which use an optimized version of the CART
algorithm, so the mathematical description is based on the CART algorithm [10,35].
CART is very similar to C4.5, but it differs in supporting numerical target vari-
ables (regression). Like many other classifier methods, a decision tree finds the
probability of a sample belonging to a category or class by defining splitting rules
[11]. In particular, CART can classify the samples, or define the splitting rules,
using different measures; for example, the entropy and the Gini index (or Gini impu-
rity) are the most common measures used to create the splitting rules. For a
binary classification problem, the Gini measure of impurity is defined as Eq. 1

G(t) = 1 - p(t)^2 - (1 - p(t))^2    (1)

where p(t) is the relative frequency of the first class in the node t. Defining G(t) instead through the entropy, as in Eq. 2,

G(t) = -p(t) \ln p(t) - (1 - p(t)) \ln(1 - p(t))    (2)

the improvement obtained by splitting the parent node P into left and right children L and
R is defined by Eq. 3

I(P) = G(P) - q G(L) - (1 - q) G(R)    (3)

where q is the probability of going to the left of the tree. Another well-known
measure to build the splitting rules is the twoing rule, which is based on a direct
comparison of the target attribute distribution in the two child nodes:

I(\mathrm{split}) = \left\{ 0.25\, \bigl(q(1-q)\bigr)^{u} \sum_{k} \bigl| p_L(k) - p_R(k) \bigr| \right\}^{2}    (4)

where k indexes the target classes, and p_L(\cdot) and p_R(\cdot) are the probability distributions
of the target in the left and right child nodes.
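For concreteness, the short sketch below (illustrative only, not the scikit-learn internals) evaluates the binary Gini impurity of Eq. (1) and the split improvement of Eq. (3).

```python
import numpy as np

def gini(labels):
    """Binary Gini impurity, Eq. (1): G(t) = 1 - p^2 - (1 - p)^2."""
    p = np.mean(labels)          # relative frequency of class 1 in the node
    return 1.0 - p**2 - (1.0 - p)**2

def split_improvement(parent, left, right):
    """Impurity decrease of a split, Eq. (3): I(P) = G(P) - q G(L) - (1 - q) G(R)."""
    q = len(left) / len(parent)  # probability of going to the left child
    return gini(parent) - q * gini(left) - (1.0 - q) * gini(right)

# Tiny example: a pure split of four samples gives the maximum improvement of 0.5.
parent = np.array([0, 0, 1, 1])
print(split_improvement(parent, parent[:2], parent[2:]))
```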

3.2 XGBoost

Several ML methods are derived from numerical optimization theory, viewed from the
perspective of numerical optimization in function space. If we consider F(x) as a
well-defined function whose value at each point x is a parameter, we seek to minimize Eq. 5:

F^* = \arg\min_{F} E_{y,x} L(y, F(x))    (5)

where L(y, F) is the loss function, defined as the negative binomial log-likelihood
\log(1 + e^{-2yF}) when y \in \{-1, 1\} for a classification problem, and E_{y,x} denotes the
expectation over the joint distribution of y and x. It is possible to minimize this objective,
finding a global or local minimum, using different numerical methods such as gradient descent.
For a multiclass classification problem, the gradient-descent boosting algorithm for
K classes is defined in [17] as follows:

L(\{y_k, F_k(x)\}_{1}^{K}) = - \sum_{k=1}^{K} y_k \log(p_k(x))    (6)

where y_k \in \{0, 1\}, k = 1, 2, 3, \ldots, K, and we define p_k(x) = \Pr(y_k = 1 \mid x).
Transforming Eq. 6, Eq. 7 is obtained:

F_k(x) = \log(p_k(x)) - \frac{1}{K} \sum_{l=1}^{K} \log(p_l(x))    (7)

Eq. 7 can be reformulated as Eq. 8:

p_k(x) = \frac{\exp(F_k(x))}{\sum_{l=1}^{K} \exp(F_l(x))}    (8)

Substituting (8) into (6) gives Eq. 9:

L(\{y_k, F_k(x)\}_{1}^{K}) = - \sum_{k=1}^{K} y_k \left[ F_k(x) - \log\left( \sum_{l=1}^{K} \exp(F_l(x)) \right) \right]    (9)

Taking the first derivative of Eq. (9), K trees are generated at each iteration
m to predict the current residual for each class on the probability scale.
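The probability transform of Eq. (8) and the multinomial loss of Eq. (9) can be written directly; the NumPy sketch below is a didactic illustration of those two equations (not the boosting procedure itself), with F holding the K per-class scores for each sample and Y the one-hot labels.

```python
import numpy as np

def softmax_probs(F):
    """Eq. (8): p_k(x) = exp(F_k(x)) / sum_l exp(F_l(x)); F has shape (n_samples, K)."""
    e = np.exp(F - F.max(axis=1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def multinomial_loss(Y, F):
    """Eq. (9) with one-hot Y: equivalent to -sum_k y_k log p_k(x), summed over samples."""
    return -np.sum(Y * np.log(softmax_probs(F)))

# The negative gradient of this loss, y_k - p_k(x), is the per-class residual that
# each of the K trees fits at every boosting iteration m.
```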

3.3 Random Forest


Random Forest is a classifier or predictor built from M randomized regression
trees. Given a set of training samples \Omega_n = ((X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)) with
X \in \chi \subset R^p, p \in N, the objective is to find a classifier, or classification rule,
f_n : \chi \rightarrow R that satisfies the condition of consistency in terms of its probability
of error, Eq. (10):

E(f_n) = P[f_n(X) \neq Y] \rightarrow E^* \quad \text{as } n \rightarrow \infty    (10)

where E^* represents the minimal error. For a binary classification problem we have

f^*(x) = \begin{cases} 1 & \text{if } P[Y = 1 \mid X = x] > P[Y = 0 \mid X = x] \\ 0 & \text{otherwise} \end{cases}    (11)

Random forest classification is defined using a majority vote among the classifi-
cation trees: each tree casts a vote for the most popular class at input x, and
the class with the most votes wins [9].
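The majority vote can be made explicit as follows; this is an illustrative sketch on synthetic data (scikit-learn's own predict averages the per-tree class probabilities, which normally agrees with the hard vote shown here).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data and forest purely for illustration.
X, y = make_classification(n_samples=200, n_classes=3, n_informative=5, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each tree casts a vote for its most popular class at the input x.
votes = np.stack([tree.predict(X[:5]) for tree in forest.estimators_]).astype(int)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print(majority)   # class with the most votes for each of the first five samples
```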

3.4 Multi-layer Perceptron


Multi-layer Perceptron (MLP) is a supervised machine learning model
[27,30]. The MLP mimics the human brain using multiple layers that contain activa-
tion functions, called nodes or neurons, which communicate between layers through
weighted connections (parameters). There are three main types of layers: input, inter-
mediate, and output; the intermediate (hidden) layers can be as many as the problem
needs. Figure 2 shows the structure of an MLP.
The objective of the MLP is to estimate the set of weights w using a set of
training samples (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n), where x_i \in R^r, r is the number of
features or characteristics, and y_i \in \{1, 2, \ldots, k\}, with k the number of classes. The
input vector x_i is fed to the model to predict the desired output y_i. The information
between layers' connections (weight estimation) is computed using Eq. (12):

Fig. 2. One hidden layer MLP


x_j^{h+1} = \sum_i y_i^h w_{ij}^h - \theta_j^{h+1}    (12)

where y_i^h is the response of the i-th neuron from the (h − 1)-th layer, w_{ij}^h is the
weight from the i-th neuron in layer h to the j-th neuron in layer h + 1, and \theta_j^{h+1} is
the threshold of the j-th neuron in layer h + 1. The output of each neuron is the
result of an activation function (a nonlinear function); for example, the well-known sigmoid
activation function is defined by Eq. (13):

y_j^h = \frac{1}{1 + e^{-x_j^h}}    (13)

To find the optimum weight values w, the least mean square error (LMS) is
used on the output vectors. For a given weight vector w, the LMS error is defined as follows:

E(w) = \frac{1}{2} \sum_j \bigl( y_{j,c}^h(w) - \hat{y}_j \bigr)^2

where y_{j,c}^h(w) is the output of node j in layer h, and \hat{y}_j is its real target.
We apply the gradient descent method to minimize the function E(w).
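A forward pass through one hidden layer, together with the squared-error objective E(w), can be sketched as below (illustrative NumPy only; the thresholds θ of Eq. (12) are folded into bias vectors, which is a sign-convention simplification).

```python
import numpy as np

def sigmoid(z):
    """Eq. (13): logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer forward pass (cf. Fig. 2 and Eq. (12)).
    x: (n_features,), W1: (n_features, n_hidden), W2: (n_hidden, n_outputs)."""
    hidden = sigmoid(x @ W1 + b1)     # responses of the hidden-layer neurons
    return sigmoid(hidden @ W2 + b2)  # responses of the output-layer neurons

def lms_error(y_pred, y_true):
    """Squared-error objective E(w) from Sect. 3.4, minimized by gradient descent."""
    return 0.5 * np.sum((y_pred - y_true) ** 2)
```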

3.5 Support Vector Machine


Support Vector Machine (SVM) is a classical ML method widely used in classification prob-
lems. In essence, given a set of points (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n), it finds a
hyperplane that separates the points into different categories or classes while maximizing
the separation margin. Given the complexity of this method, the formulation here
is constrained to two classes \{-1, +1\}. The training samples are defined as
X = \{x^t, r^t\} where

if r^t = +1 then x^t \in C_1, and if r^t = -1 then x^t \in C_2    (14)

where C_1 and C_2 are the two classes. The objective is to find a w and w_0 such
that

w^T x^t + w_0 \geq +1 \ \text{for} \ r^t = +1, \qquad w^T x^t + w_0 \leq -1 \ \text{for} \ r^t = -1    (15)

The samples x^t closest to the hyperplane are called support vectors and define
the margin that needs to be maximized. The distance between the hyperplane
and a sample x^t is defined as

\frac{|w^T x^t + w_0|}{\lVert w \rVert}    (16)

When r^t \in \{-1, +1\}, Eq. (16) is transformed into Eq. (17):

\frac{r^t (w^T x^t + w_0)}{\lVert w \rVert}    (17)

Given a threshold \rho > 0, we look for \frac{r^t (w^T x^t + w_0)}{\lVert w \rVert} \geq \rho. To maximize the margin
distance, we need to maximize \rho. Fixing \rho \lVert w \rVert = 1, the margin becomes 1/\lVert w \rVert, so to
maximize the margin we need to minimize \lVert w \rVert. This generates a constrained optimization
problem, described as follows:

\min \frac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad r^t (w^T x^t + w_0) \geq +1, \ \forall t    (18)

Solving Eq. (18), we find the optimal hyperplane w.
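As a small illustration of Eq. (18), the sketch below fits a linear SVM with a large penalty C (approximating the hard-margin problem) on a separable toy set and reports the support vectors and the margin 2/||w||; the data are made up for the example and are not the paper's configuration.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative toy data only: two linearly separable classes with labels in {-1, +1}.
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 1.0], [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])
r = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin problem of Eq. (18).
clf = SVC(kernel="linear", C=1e6).fit(X, r)
w, w0 = clf.coef_[0], clf.intercept_[0]

print("support vectors:\n", clf.support_vectors_)        # samples closest to the hyperplane
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
print("r_t (w.x_t + w0):", r * (X @ w + w0))              # >= 1 for all t, ~1 on support vectors
```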

4 Results
The proposed ML computational framework for Alzheimer's disease stages clas-
sification has been evaluated using the ADNI dataset, which is a well-established
dataset in the study of neurodegenerative disorders. In particular, the
ADNI-merge data were considered for the evaluation. A total of 1853
subjects were considered, where each subject belongs to one of the groups (AD,
CN, EMCI, LMCI) illustrated in Table 3; this provides a total of 13182
samples.

Table 3. ADNI dataset components

Group Subjects Age Education years


AD 367 75.02 15.21
CN 512 74.75 16.31
EMCI 353 71.19 15.96
LMCI 621 73.99 15.86

4.1 Experimental Procedure

This section describes the procedure used to evaluate the classification performance of
the proposed framework. The framework starts by normalizing the data (standardization), a
common requirement for different ML models; otherwise, the models might mis-
behave if the features' distribution is not close to a standard normal distribution
(µ = 0, σ = 1). Different imputation methods were adopted due to the amount
of missing values present in the dataset (28% of missing values). The impu-
tation methods were applied per category instead of imputing the whole
dataset, which helps to avoid inserting possible noise into the processed data due
to mixing information from the different categories. Then, the processed data
were divided into training and testing sets, with 80% for training and 20% for
testing. To obtain the information presented in Table 4, a k-fold
(cross-validation) strategy was adopted to evaluate the performance of the mod-
els integrated into the proposed framework and to avoid data contamination
during the training and testing process (i.e., using samples for testing that were
also used to train the model).
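A minimal version of this preprocessing-and-evaluation pipeline might look as follows. This is a sketch with scikit-learn, using median imputation and a random forest as stand-ins for the full set of imputers and models; the label column name "DX", the DataFrame layout, and the assumption that every feature has at least one observed value per group are illustrative choices, not the authors' code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

def impute_per_class(df, label_col="DX"):
    """Median-impute each feature within its diagnostic group (AD, CN, EMCI, LMCI)
    so that values from one category do not leak into another."""
    parts = []
    for _, group in df.groupby(label_col):
        feats = group.drop(columns=[label_col])
        filled = pd.DataFrame(SimpleImputer(strategy="median").fit_transform(feats),
                              columns=feats.columns, index=feats.index)
        filled[label_col] = group[label_col]
        parts.append(filled)
    return pd.concat(parts).sort_index()

def evaluate(df, label_col="DX"):
    df = impute_per_class(df, label_col)
    X = StandardScaler().fit_transform(df.drop(columns=[label_col]))
    y = df[label_col].to_numpy()
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              stratify=y, random_state=0)
    model = RandomForestClassifier(n_estimators=300, random_state=0)
    # 10-fold cross-validation on the training split, as in Sect. 4.1.
    return cross_val_score(model, X_tr, y_tr, cv=10, scoring="accuracy")
```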

4.2 Experimental Results

To evaluate the performance of the proposed strategy for AD stage
classification, four different metrics were adopted: accuracy, precision, F1-score,
and recall. These metrics quantify the uncertainty of the model, that is, how
well the models classify patients over the four different categories (AD,
CN, LMCI, and EMCI); a good classifier provides high values within the range
[0, 100%]. In addition, the Receiver Operating Characteristic (ROC) curve
was computed to show the true positive rate (TPR) against the false positive rate
(FPR), which measures the capability of the models to distinguish the patterns
among classes.
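These quantities can be computed with scikit-learn as in the short helper below (illustrative only; macro averaging and a one-vs-rest ROC AUC are choices made here, not necessarily the exact settings used by the authors).

```python
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             roc_auc_score)

def classification_metrics(model, X_test, y_test):
    """Accuracy, macro precision/recall/F1, and one-vs-rest ROC AUC for a fitted
    multiclass classifier over the four categories (AD, CN, LMCI, EMCI)."""
    y_pred = model.predict(X_test)
    prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="macro")
    auc = roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr")
    return {"accuracy": accuracy_score(y_test, y_pred),
            "precision": prec, "recall": rec, "f1": f1, "roc_auc_ovr": auc}
```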
Table 4 shows the performance of the adopted ML models, which were
trained using the preprocessed data. These results show that the proposed strat-
egy can provide quality data that helps the different models to identify and gen-
eralize the learned patterns to classify new data with the same characteristics. The
trained models provide an average of 99% for the four different metrics over the
10-fold cross-validation process.
Figure 3 shows how well the models distinguish among all the positive and
negative class points. For each category, the area under the curve (AUC) is equal
to one, which means that the models perfectly discriminate the samples
among the categories using the preprocessed data. In particular, Fig. 3 shows
the ROC curves for SVM (right figure) and decision tree (left figure).
On the other hand, Table 5 compares the performance of our proposed frame-
work, illustrated in Fig. 1, against the performance of the approach proposed by
Aghili et al. in [3]. Our proposed framework for AD classification is superior, in our
best case by 19%, in terms of accuracy, precision, F1-score, and recall.

Table 4. Ensemble learning results with imputation

Classifier | Imputation | Acc (%) | Prec (%) | F1 (%) | Recall (%)
Decision tree | Median | 99.76 | 99.76 | 99.76 | 99.76
Decision tree | Linear regression | 99.56 | 99.56 | 99.56 | 99.56
Decision tree | KNN | 99.57 | 99.57 | 99.57 | 99.57
XG Boost | Median | 99.90 | 99.90 | 99.90 | 99.90
XG Boost | Linear regression | 99.78 | 99.78 | 99.78 | 99.78
XG Boost | KNN | 99.72 | 99.72 | 99.72 | 99.72
Random forest | Median | 99.92 | 99.92 | 99.92 | 99.92
Random forest | Linear regression | 99.64 | 99.64 | 99.64 | 99.64
Random forest | KNN | 99.70 | 99.70 | 99.70 | 99.70
MLP | Median | 99.68 | 99.68 | 99.68 | 99.68
MLP | Linear regression | 99.21 | 99.21 | 99.21 | 99.21
MLP | KNN | 99.57 | 99.57 | 99.57 | 99.57
SVM | Median | 99.00 | 99.00 | 99.00 | 99.00
SVM | Linear regression | 98.71 | 98.71 | 98.71 | 98.71
SVM | KNN | 98.03 | 98.05 | 98.03 | 98.03

Fig. 3. ROC curve for SVM (Right Image) and decision tree (Left Image) from the
ensemble learning model.

5 Discussion
The proposed framework can be defined as a sequence of processes that start with
data preparation and end with the implementation of different ML models for a
classification task. The strength of this framework lies in its data-centric approach,
which is the crucial step in providing quality data and therefore improving
the performance of well-known ML methods. Consequently, our framework

Table 5. Performance comparison between the framework proposed in [3] and the
one proposed in this work.

Classifier Approach Acc (%) Precc (%) F1 (%) Recall (%)


XG boost Ours 99.92 99.92 99.92 99.92
[3] 80.52 80.62 80.31 80.51
Random forest Ours 99.70 99.70 99.70 99.70
[3] 77.17 76.98 77.95 77.17
SVM Ours 98.71 98.71 98.71 98.71
[3] 74.58 75.98 74.40 74.58

includes feature selection, data transformation, imputation methods, and five dif-
ferent ML models, which were discussed in Sect. 3. In particular, this framework
has been developed for the Alzheimer's Disease classification problem, handling the
28% of missing values present in the dataset. The effectiveness of this pro-
posed framework was illustrated in Table 4, showing a performance of around 99%
in accuracy, precision, F1-score, and recall. In terms of performance, XGBoost
performs best out of the five implemented ML models, followed by Decision
Tree. Both models achieved their highest performance when the median impu-
tation method was adopted to handle the missing data and improve the data
quality. Comparing the presented results against the ones found in the literature
[3], an improvement of about 19% in accuracy was obtained.

6 Conclusions
A ML computational framework was developed that integrates data processing,
feature selection, imputation methods, and five different ML models. The per-
formance of the proposed framework was evaluated, showing that the framework
provides a strategy capable of handling missing data values and providing quality
preprocessed data that enables the ML models to learn and distinguish
the patterns gathered from the data. The accuracy, F1-score, recall, and precision
metrics were computed, showing that the framework classifies the AD stages with
an accuracy of 99%. Also, this paper shows that classical ML algorithms such
as Decision Tree Classifier, XGBoost, Random Forest, Multi-Layer Perceptron
(MLP), and Support Vector Machine (SVM) can achieve high accuracy given
quality data. This paper advances knowledge in the AD field by comparing
different ML models within the proposed ensemble learning computational frame-
work, as well as by promoting the use of traditional machine learning models for
AD's stages classification.
The dataset used contains 39 features among cognitive tests, MRI, PET,
and genetic information. It is well known that the probability of having miss-
ing values increases when the number of features increases. Thus, there is
still a need to analyze the proposed framework when a significant percentage
of data is missing (higher than 28%); in other words, how sensitive the pro-
posed framework is in managing missing values to improve data quality and
provide high-accuracy classification. On the other hand, due to the complexity
of this framework, the design of an interface is required for those unfamiliar with or
not knowledgeable in programming. Such an interface can facilitate the use of this
framework.

Acknowledgment. The authors would like to acknowledge NIH BioMed Grant Num-
ber 150108136 under Florida A&M University, and CI-New: Cognitive Hardware and
Software Ecosystem Community Infrastructure for allowing us to run our application on
their infrastructure (Nautilus).

References
1. Biomarkers of Alzheimer’s disease: Neurobiol. Dis. 35(2), 128–140 (2009). Biomark-
ers of Neuropsychiatric Disease
2. Alzheimer’s disease facts and figures: Alzheimer’s & Dementia 17(3), 327–406
(2021)
3. Aghili, M., et al.: Prediction modeling of Alzheimer’s disease and its prodromal
stages from multimodal data with missing values. Int. J. Med. Health Sci. 13(2),
36–40 (2019)
4. Antor, M.B., et al.: A comparative analysis of machine learning algorithms to
predict Alzheimer’s disease. J. Healthc. Eng. 2021, 1–12 (2021)
5. Bae, J.-M.: Clinical decision analysis using decision tree. Epidemiol. Health 36,
e2014025 (2014). https://doi.org/10.4178/epih/e2014025. Korean Society of Epi-
demiology
6. Battineni, G., et al.: Improved Alzheimer’s disease detection by MRI using multi-
modal machine learning algorithms. Diagnostics 11(11), 2103 (2021)
7. Bhagwat, N., Viviano, J.D., Voineskos, A.N., Chakravarty, M.M., Alzheimer’s Dis-
ease Neuroimaging Initiative, et al.: Modeling and prediction of clinical symptom
trajectories in Alzheimer’s disease using longitudinal data. PLoS Comput. Biol.
14(9), e1006376 (2018)
8. Bhatkoti, P., Paul, M.: Early diagnosis of Alzheimer’s disease: a multi-class deep
learning framework with modified k-sparse autoencoder classification. In: 2016
International Conference on Image and Vision Computing New Zealand (IVCNZ),
pp. 1–5 (2016)
9. Biau, G., Scornet, E.: A random forest guided tour. TEST 25(2), 197–227 (2016).
https://doi.org/10.1007/s11749-016-0481-7
10. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regres-
sion Trees. Routledge (2017)
11. Buntine, W., Niblett, T.: A further comparison of splitting rules for decision-tree
induction. Mach. Learn. 8(1), 75–85 (1992)
12. Campos, S., Pizarro, L., Valle, C., Gray, K.R., Rueckert, D., Allende, H.: Evaluat-
ing imputation techniques for missing data in ADNI: a patient classification study.
In: CIARP 2015. LNCS, vol. 9423, pp. 3–10. Springer, Cham (2015). https://doi.
org/10.1007/978-3-319-25751-8 1
13. Chávez-Gutiérrez, L., et al.: The mechanism of γ-secretase dysfunction in familial
Alzheimer disease. EMBO J. 31(10), 2261–2274 (2012)

14. Crous-Bou, M., Minguillón, C., Gramunt, N., Molinuevo, J.L.: Alzheimer’s disease
prevention: from risk factors to early intervention. Alzheimer’s Res. Therapy 9(1)
(2017). https://doi.org/10.1186/s13195-017-0297-z
15. Fan, Z., Fanyu, X., Qi, X., Li, C., Yao, L.: Classification of Alzheimer’s disease
based on brain MRI and machine learning. Neural Comput. Appl. 32(7), 1927–
1936 (2019)
16. Feng, Q., Zhu, D., Yang, J., Li, B.: Multisource hyperspectral and lidar data fusion
for urban land-use mapping based on a modified two-branch convolutional neural
network. ISPRS Int. J. Geo-Inf. 8, 28 (2019)
17. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann.
Stat. 29(5), 1189–1232 (2001). http://www.jstor.org/stable/2699986. Institute of
Mathematical Statistics. ISSN 00905364
18. Gao, H., Li, Y., Zhang, Z., Zhao, W.: Editorial: machine learning used in biomedical
computing and intelligence healthcare, volume i. Frontiers in Genetics, 12 May 2021
19. Humpel, C.: Identifying and validating biomarkers for Alzheimer’s disease. Trends
Biotechnol. 29(1), 26–32 (2011)
20. Joshi, S., Shenoy, D., Simha, G.G.V., Rrashmi, P.L., Venugopal, K.R., Pat-
naik, L.M.: Classification of Alzheimer’s disease and Parkinson’s disease by using
machine learning and neural network methods. In: 2010 Second International Con-
ference on Machine Learning and Computing, pp. 218–222 (2010)
21. Kalaria, R.N., et al.: Alzheimer’s disease and vascular dementia in developing
countries: prevalence, management, and risk factors. Lancet Neurol. 7(9), 812–826
(2008)
22. Koohy, H.: The rise and fall of machine learning methods in biomedical research.
F1000Research, 6:2012, January 2018
23. Kursa, M.B., Jankowski, A., Rudnicki, W.R.: Boruta-a system for feature selection.
Fundamenta Informaticae 101(4), 271–285 (2010)
24. Kursa, M.B., Rudnicki, W.R.: Feature selection with the Boruta package. J. Stat.
Softw. 36, 1–13 (2010)
25. Li, D.-C., Liu, C.-W., Hu, S.C.: A fuzzy-based data transformation for feature
extraction to increase classification performance with small medical data sets. Artif.
Intell. Med. 52(1), 45–52 (2011)
26. Mahendran, N., PM, D.R.V.: A deep learning framework with an embedded-based
feature selection approach for the early detection of the Alzheimer’s disease. Com-
put. Biol. Med. 141, 105056 (2022)
27. Murtagh, F.: Multilayer perceptrons for classification and regression. Neurocom-
puting 2(5–6), 183–197 (1991)
28. De Velasco Oriol, J., Vallejo, E.E., Estrada, K., Peña, J.G.T., The Alzheimer’s
Disease Neuroimaging Initiative: Benchmarking machine learning models for late-
onset Alzheimer’s disease prediction from genomic data. BMC Bioinformat. 20(1),
1–17 (2019)
29. Reitz, C., Mayeux, R.: Alzheimer disease: epidemiology, diagnostic criteria, risk
factors and biomarkers. Biochem. Pharmacol. 88(4), 640–651 (2014)
30. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and
organization in the brain. Psychol. Rev. 65(6), 386 (1958)
31. Sharma, N.: Exploring biomarkers for Alzheimer’s disease. JCDR 10, KE01 (2016)
32. Shishegar, R., et al. Using imputation to provide harmonized longitudinal measures
of cognition across AIBL and ADNI. Sci. Rep. 11(1), 1–11 (2021)
33. Singh, D., Singh, B.: Investigating the impact of data normalization on classifica-
tion performance. Appl. Soft Comput. 97, 105524 (2020)

34. Vélez, J.I., et al.: A comprehensive machine learning framework for the exact pre-
diction of the age of onset in familial and sporadic Alzheimer’s disease. Diagnostics
11(5), 887 (2021)
35. Wu, X., et al.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37
(2007)
36. Yang, W., et al.: Independent component analysis-based classification of
Alzheimer’s disease MRI data. J. Alzheimer’s Dis. 24(4), 775–783 (2011)
Critical Assessment of Current State of the Art
in Wearable Sensor Nodes with Energy
Harvesting Systems for Healthcare Applications

Alhassan E. Alattar1 , Ahmed Elkaseer2,3,4(B) , Steffen Scholz2,4,5 , and Saeed Mohsen1


1 Electronics and Communications Engineering Department, Al-Madina Higher Institute for
Engineering and Technology, Giza 12947, Egypt
[email protected]
2 Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology, 76344
Karlsruhe, Germany
[email protected]
3 Department of Production Engineering and Mechanical Design, Faculty of Engineering, Port
Said University, Port Fuad 42526, Egypt
4 Karlsruhe Nano Micro Facility, Karlsruhe Institute of Technology, 76344
Eggenstein-Leopoldshafen, Germany
5 Future Manufacturing Research Institute, College of Engineering, Swansea University,

Swansea SA1 8EN, UK

Abstract. This paper presents recent advances in different architectures of wear-
able sensor nodes for healthcare monitoring applications, and pays special atten-
tion to three examples. The first node examined was originally developed to mea-
sure two vital parameters of the human body, i.e. body temperature and heart rate.
The second node added a blood oxygen level sensor to the archi-
tecture of the first node, while the third node enhanced the former architecture
by adding a body acceleration sensor. The three proposed nodes are powered by
different energy harvesting systems. Energy harvesting started with
only a photovoltaic cell to power the first and second nodes; a thermoelectric
generator was then added to the photovoltaic cell to power the third node. In the harvest-
ing operations, the utilized energy storages varied between Li-ion rechargeable
batteries and super-capacitors, the super-capacitors being found to have a lower energy
density than the batteries. Across the three nodes, the vital data transmission also varies,
between Bluetooth Low Energy (BLE) and Wi-Fi: whilst BLE is preferred for
personal use, Wi-Fi is needed in hospitals. The conducted review reveals that
although the first node has the longest lifetime, i.e. energy storage capacity,
compared with the other two nodes, the third node is found to be more sustainable in
terms of energy harvester type. Finally, considering the performance of the three
nodes, a tradeoff between the power consumption and the complexity of each
proposed architecture is emphasized.

Keywords: Wearable sensor node · Healthcare applications · Energy harvesting


system · Sustainability factor · Internet of Things · Photovoltaic


1 Introduction
The internet of things (IoT) technology has recently contributed to the medical field, where
new healthcare devices termed “smart wearable sensor nodes” have emerged. Such smart wearable
devices and sensors help patients to monitor their vital conditions through their own
smart mobile phones, which enable proper display and smart analysis of the measured
data while reducing the consumed power. Presently, these nodes are utilized by hospitals
and doctors to follow the health status of patients directly and remotely, without
any need for manual measurement, and to store the vital data of patients for later needs.
These data are sent via different wireless technologies, such as Wi-Fi or Bluetooth.
One of the challenges facing wearable devices is the continuous measurement of the
data, which can be unnecessary and consumes extra power for no sensible reason. Therefore,
software algorithms are developed to measure the data discretely, once every defined period of
time, and to switch to a sleep/off mode when data measurement is not needed. However,
securing the energy sources used is a challenge, and thus energy harvesting techniques
have appeared, to simultaneously collect the energy required to operate the devices/nodes
without the need for manual recharging. The most preferred energy sources are the
solar and thermal ones, the latter depending on the temperature difference between the human
body and the ambient temperature. On the other hand, wearable nodes are utilized for
seamless and remote diagnosis of patients, where the sensor data can be transmitted
automatically to hospitals in case of critical vital data or even for further analysis by
specialists.
In the literature, three types of wearable sensor nodes were proposed to monitor
the vital signs of patients. In [1], the node measured the heart rate and body tempera-
ture, transmitted by a BLE. In [2], the node measured the heart rate, temperature, and
oxygen concentration in the blood (SpO2 ), transmitted by a Wi-Fi. In [3], the node
measured the heart rate, temperature, oxygen concentration, and acceleration, and was also
connected by BLE. The energy storage of these nodes varied between batteries [1, 2]
and super-capacitors [3]. These storages can be recharged by energy harvesting systems
using only a photovoltaic (PV) cell [1, 2] or a hybrid harvester comprising a PV cell
with a thermoelectric generator (TEG) [3]. In [4], a sensor node was powered by a BQ25570
energy harvester, used to increase energy conversion efficiency via automatic setting of
its internal input impedance to achieve maximum power transfer. This node can harvest
energy using a solar panel. It can be worn as a ring on the finger and used to measure
the blood oxygen level, and it employs an accelerometer to measure movements during
measurements and correct motion artifacts. In [5], the reported node is a hand-wear-
able device that has a BQ27441 fuel gauge chip, which performs the same function.
This device contains a blood pressure sensor, a 9-axis motion sensor, a microphone, and an
ultra-low-power ECG/EMG sensor with a bio-impedance analog front-end. The device
is also equipped with a low-power galvanic skin response front-end, which was used to
collect vital data and apply artificial intelligence to these data for stress detection,
where the data can be transmitted by Bluetooth. In [6], a wearable sensor node was
designed based on a maximum power point tracking (MPPT) algorithm, employing a
solar panel for harvesting energy. This node can be worn as a smart patch
sewn on clothes. It measures the electrocardiogram (ECG) using an ECG acquisition cir-
cuit, which utilizes an AD8232 chip for ECG data acquisition. It uses a Bluetooth module

for transmission of data to be shown on a personal computer (PC) or a mobile phone.


The node in [7] utilized the MPPT algorithm that was also applied in [6],
but an LT3119 harvester was used to step down the generated voltage to a level suitable
for recharging a battery. The proposed architecture included battery protection to turn off
the power when the battery voltage decreases. It described an energy-harvesting system
integrated into clothing that can be used by mountain rescuers, who rely on
power-consuming devices such as global positioning system (GPS) receivers
and radios whose communication modules consume considerable energy.
In [8], a DC-DC converter chip was used for energy management, utilizing an
LTC3108 IC to power an ordinary wristwatch from thermoelectric energy harvested
from human body heat. In [9], a wearable IoT-cloud based health monitoring sys-
tem (WISE) was developed to enable people to register on an online server using
smart wearable devices which support this technology, with a radio frequency identifi-
cation (RFID) card as an identity for every user; it avoided utilizing a mobile phone as
a processing unit by sending the vital data directly to the server. In [10], a wireless sensor
system measured blood pressure based on electrocardiography (ECG) and photo-
plethysmography (PPG), where the pulse arrival time and heart rate were used to calculate
the blood pressure, utilizing two sensors: one to collect the ECG signal and the
other to measure the PPG signal. This system can be worn on the chest. The
blood pressure was also measured using a mercury sphygmomanometer with a cuff,
similar to traditional devices. In [11], a node was proposed to harvest power from solar
cells, employing an MPPT to extract maximum power. It is made with a flexible printed
circuit board (PCB), compatible with the body contour, as well as a flexible
solar panel. It is integrated with an on-board ADXL362 accelerometer to measure
body acceleration, a MAX30205 temperature sensor for measuring temperature, and a
PPG sensor consisting of an APDS-9008 light photo sensor and an MCP6001 amplifier to
measure the heart rate. The data were transmitted to a smartphone using a BLE module. In
[12], a sensor system had an energy harvesting subsystem that collected solar and
thermoelectric energy based on a BQ25570 low-power IC, which exploited a
boost converter, an MPPT, over-voltage and under-voltage protection, and a buck
converter. It included an accelerometer, a temperature sensor, a microphone, and a camera,
as well as a display based on e-paper technology and an NFC/RFID tag transceiver.
The node reported in [13] had an accelerometer to measure body tilts. For measuring
the heart rate, it exploited electrodes with a conditioning circuit to improve the cardiac
signal quality, and the heart rate was then obtained from the interval between peaks. The
respiration activity was measured by the inductance variation of inductive sensors, which
occurs due to the variation of coil dimensions with rib cage movements during inhala-
tion and exhalation. To measure the positions and tilts of the chest, an accelerometer
was used. This node harvested energy from solar cells and sent data wirelessly
to an external reading unit connected to a PC. In [14], the wearable node
measured ECG and PPG signals, employing a solar harvester to recharge a battery
and a BLE module to transmit the data to a smartphone. The node
proposed in [15] is a PPG sensor that can be worn on the earlobe, using a button cell battery;
however, the battery lasted only 1.5 h, which is a short period, and the node had no
harvester. It exploited Bluetooth to transmit data. The node in [16] presented a miniaturized
sensor platform which measured the sweat pH, using a smart textile that changes its color
according to the pH value together with a color sensor, as well as the body temperature.
The data from this node were sent through Bluetooth to a PC.
This paper presents a comprehensive review of the sensor nodes reported in
the literature. Nevertheless, three nodes will receive special attention in the following
discussion. The three nodes will be compared with respect to different aspects, for
instance power consumption, lifetime, charging time, and energy harvesting. The paper
is organized as follows: Sects. 2, 3, and 4 present the architecture descriptions of the three
nodes. Section 5 discusses the results of the three nodes. Section 6 presents suggestions
to improve the three wearable nodes. Finally, the conclusion is provided in Sect. 7.

2 The First Node


This node measures healthcare data and is worn on the wrist, with sensors to be placed on
the human fingers. The node is mainly based on an Arduino Lily-
Pad, which comprises an ATMega328P microcontroller manufactured by Atmel®, to
process the healthcare data from the sensors, consuming a current of 6.1 mA. To measure
the heart rate, an SEN-11574 heartbeat sensor manufactured by SparkFun Electronics® is
used, consuming 3.1 mA. The MAX30205 manufactured by Maxim® is used to mea-
sure the body temperature, consuming 1.3 mA. An HM-10 BLE module, manufactured
by Texas Instruments®, is used to transmit data to mobile phones within a range of 100 m,
consuming 9.15 mA. The power source of this node is a 4.8 V Li-ion battery with a
capacity of 4800 mAh, whose voltage is reduced to 3.3 V, the operating voltage of
the microcontroller and sensors, by an MCP1700 low-dropout (LDO) regulator drawing
a current of 1.6 µA. The battery is recharged by a solar energy harvester comprising a
flexible amorphous MPT 4.8–7.5 photovoltaic panel with an area of 7.2 cm ×
6 cm, which outputs a maximum voltage of 4.8 V and a power of 240 mW at 1000 W/m2,
with a TP4056 charging controller from TPOWER Semiconductor®, which draws 2
µA. The microcontroller outputs the measurements of the physiological data (heart
rate and temperature) to be shown on a display, and sends the measurements to a
mobile phone through Bluetooth or to a cloud-based online service through the internet.

2.1 Work Mechanism


Figure 1 illustrates the block diagram of the first node [1]. This node has an energy
harvester which collects energy from the sun and stores it in the battery, which powers the
node in the absence of light. The wearable node has sensors to measure body temperature
and heart rate; the measurements are delivered to the microcontroller, which processes the
sensor data, and they are then sent through the BLE module, which is paired with a mobile phone
to visualize the data on the phone. The setup of this node is shown in Fig. 2 [1].

2.2 Software Implementation


Due to unnecessity of measurement of the vital data in some times, which may waste
some power, a software algorithm was developed, so the node measures for predefined
402 A. E. Alattar et al.

Energy Harvester Mobile


Phone
Flexible PV Panel Charging Controller

Rechargeable BaƩery

Microcontroller Unit

Body Heart Bluetooth


Temperature Pulse Low Energy
Sensor Sensor (BLE)

Wearable Sensor Node

Fig. 1. Block diagram of the first node [1].

Fig. 2. The setup of the first wearable node [1].

intervals and sleeps the rest of the time, for a definite work cycle. A time of 15 min was
chosen as a repeated cycle, where the device is initialized to wake up and measure the
data for 15 s (0.25 min), then it sleeps for 885 s (14.75 min), and so on every 15 min.
This algorithm was developed using C programming language. The data is transferred
by a Bluetooth to a mobile phone to be visualized using HMBLE Terminal application,
which can be downloaded through Google Play Store.

2.3 Energy Consumption


In the case of working without the energy harvesting system, this node works on a voltage of 3.3 V and draws a current of 19.65 mA in wake-up mode and 1.15 mA in sleep mode. If a time period of 1 h is assumed, the average current drawn in 1 h is 1.458 mA. The node consumes an average power of 4.811 mW, i.e., 17.32 J per hour. Since the battery capacity is 4800 mAh, the lifetime is 137.17 days. Practically, it consumes 4.97 mW. In the case of working with energy harvesting, the average power generated by the photovoltaic panel is 230 mW. If the panel is illuminated for 6 h daily, the generated energy is 4968 J per day. The battery stores 20.16 Wh, which is equivalent to 72576 J, so the charging time is 14.6 days, which is shorter than the lifetime (node operation time). Assuming an illumination of 1 h per day, the energy generated per day is 828 J, so the charging time is 87.65 days, which is still less than the lifetime (137.17 days). Thus, a sustainable work performance of the device is achieved even with short illumination times. Assuming the use of a battery with lower voltage and capacity, 3.7 V and 280 mAh, its lifetime is 192 h (8 days). This battery stores an energy of 1.036 Wh. The power generated by the PV panel is 230 mW (0.23 W), so assuming that the panel is illuminated 1 h daily, the generated energy per day is 0.23 Wh (828 J). The time required to recharge this battery is 4.5 days.
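The duty-cycle arithmetic above can be reproduced in a few lines. The following Python snippet is a minimal sketch (not part of the original work) that recomputes the average current, average power, lifetime, and solar charging time from the figures quoted in the text, assuming the battery energy is taken as capacity × 4.2 V, as implied by the 20.16 Wh value.

```python
# Re-derivation of the first node's energy budget from the figures in the text.
V_NODE = 3.3                               # operating voltage (V)
I_WAKE, I_SLEEP = 19.65e-3, 1.15e-3        # wake/sleep currents (A)
T_WAKE, T_SLEEP = 15, 885                  # seconds per 15-min cycle

i_avg = (I_WAKE * T_WAKE + I_SLEEP * T_SLEEP) / (T_WAKE + T_SLEEP)  # ~1.46 mA
p_avg = i_avg * V_NODE                     # ~4.81 mW
e_hour = p_avg * 3600                      # ~17.3 J consumed per hour

lifetime_days = (4800e-3 / i_avg) / 24     # 4800 mAh battery -> ~137 days
battery_joules = 4.2 * 4800e-3 * 3600      # 20.16 Wh = 72576 J
harvest_j_day = 0.230 * 6 * 3600           # 230 mW for 6 h -> 4968 J/day
charging_days = battery_joules / harvest_j_day   # ~14.6 days

print(f"{i_avg*1e3:.3f} mA, {p_avg*1e3:.2f} mW, {e_hour:.1f} J/h")
print(f"lifetime {lifetime_days:.1f} d, charging {charging_days:.1f} d")
```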

3 The Second Node


This wearable node utilizes internet of things (IoT) technology to facilitate the measurement of vital signs, and sends the patient's vital data to be monitored directly by the hospital and doctors. It is wearable and rechargeable, so it offers more mobility to patients. The node is worn on the upper arm, which makes it more comfortable and frees the hands of the patient. It measures healthcare data and sends them to an online server through a Wi-Fi connection. It works on a NodeMCU board, which acts as a microcontroller to process the data and has a Wi-Fi module with a 400 m range to send data to the Ubidots server. The NodeMCU board comprises an ESP8266 microcontroller with an ESP-12 module to enable Wi-Fi connectivity. This board consumes 70 mA in wake-up mode and 0.02 mA in sleep mode. The node uses a MAX30205 sensor to measure body temperature, which draws a current of 1.3 mA. A MAX30100 pulse oximeter sensor is used to measure heart rate and blood oxygen saturation (SpO2), drawing a 1.2 mA current. The sensors and the NodeMCU operate at 3.3 V. The harvesting unit comprises two amorphous flexible photovoltaic panels (MPT 4.8–7.5) with a total area of 6 cm × 14.4 cm, a total maximum current of 100 mA, and a voltage of 4.8 V. It uses a TP4056 charging controller, which draws 2 µA and whose input voltage ranges from 4.5 to 6 V. To supply the node, an 18650 Li-ion rechargeable battery with a capacity of 3800 mAh, a nominal voltage of 3.7 V, and a fully-charged voltage of 4.2 V is used. The voltage is regulated to 3.3 V using an MCP1700 LDO regulator.

3.1 Work Mechanism


Figure 3 shows the block diagram of the second node [2]. The harvester has two photovoltaic panels which collect solar energy and store it in the battery. The temperature sensor measures the body temperature, while the pulse oximeter sensor measures the blood oxygen level and heart rate; the vital data are then sent to the Ubidots cloud service to be visualized on a graphical user interface (GUI). The server can be accessed by a doctor or anyone who has credentials for the Ubidots service to diagnose and monitor the patient remotely. The setup of this node is shown in Fig. 4 [2].

3.2 Software Implementation


A cycle of one minute is assigned for the node, where it wakes up for 5 s and sleeps for 55 s. In one hour, the wake-up time is 300 s (5 min) and the sleep time is 3300 s (55 min). The node is initialized in wake-up mode for 5 s, during which the sensors measure body temperature, blood oxygen percentage, and heartbeats; it then sends the vital data to the server through the Wi-Fi module and sleeps for 55 s.

Fig. 3. Block diagram of the second node [2].

Fig. 4. The setup of the second wearable node [2].

3.3 Energy Consumption

The node works on a voltage of 3.3 V and consumes 72.5 mA and 0.02 mA in wake-up and sleep modes, respectively. Assuming a period of 1 h, the wake-up time is 300 s and the sleep time is 3300 s. The average current consumed in one hour is 6.06 mA, corresponding to an average power of 19.99 mW. Practically, it consumes 20.23 mW. The battery stores an energy of 15.96 Wh, so the lifetime of the battery is 798.39 h (33.26 days). In the case of using the harvester, the weather is assumed sunny for 2 h, during which a power of 414 mW is generated, cloudy for 2 h, during which a power of 266 mW is generated, and the rest of the day is night. Thus, the energy generated per day is 1.36 Wh, so the charging time is 11.73 days. The average energy generated in one illuminated hour is 0.34 Wh. When an illumination of only one hour per day is assumed, the charging time is 46.94 days. This is longer than the node's lifetime (33.26 days), which means that the node may not be sustainable under short daily illumination times.
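As a companion to the first node's budget, the following short Python sketch (an illustration, not code from the paper) recomputes the mixed-weather charging estimate for this node from the figures above.

```python
# Charging-time estimate for the second node under the stated weather profile.
battery_wh = 4.2 * 3.8                         # 3800 mAh at 4.2 V = 15.96 Wh
daily_harvest_wh = 0.414 * 2 + 0.266 * 2       # 2 h sunny + 2 h cloudy = 1.36 Wh/day
charging_days = battery_wh / daily_harvest_wh  # ~11.7 days

hourly_wh = daily_harvest_wh / 4               # average 0.34 Wh per illuminated hour
charging_days_1h = battery_wh / hourly_wh      # ~46.9 days, longer than the 33.26-day lifetime

print(f"{charging_days:.2f} days; {charging_days_1h:.2f} days at 1 h/day")
```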

4 The Third Node

Figure 5 demonstrates the block diagram of the third node [3]. It is a wearable sensor node worn on the hand wrist, with its sensors placed on the fingers. It works on an Arduino LilyPad board and uses an ATMega328P microcontroller. It utilizes the MAX30205 sensor to measure body temperature and the MAX30100 to measure blood oxygen and heart rate. An ADXL335 accelerometer is also added. It uses the HM-10 BLE module to send the vital data to a mobile phone, where they are shown using an application; here, however, the application is designed using MIT App Inventor. To power this node, two (100 F, 2.7 V) super-capacitors from AVX® are used, connected in series to achieve a capacitance of 50 F and a voltage of 5.4 V. The MCP1700 LDO regulator is used to regulate the super-capacitor voltage down to the node's operating voltage of 3.3 V. For harvesting energy, the harvester is a hybrid of two energy sources: a photovoltaic panel and a thermoelectric generator (TEG) module (SP1848–27145) [17]. The TEG module generates electric power from the body heat of the patient, with a thermally conducting coat used between the module and the skin. To regulate the voltage output of the PV cells and the TEG module, a DC-DC converter based on the low-power LTC3105 is used [18]. The health information revealed includes heart beats, blood pressure, blood oxygen level, temperature, and acceleration. Continuous monitoring may be needed for some groups of patients: hospital patients who are not in intensive care and quarantined patients need monitoring, as mentioned before. Home-quarantined patients can also be monitored; it is suggested that health authorities or hospitals open a server where home-quarantined patients can register their sensor nodes, so that the health authorities can monitor their health and emergency services can be called automatically in critical cases.

Fig. 5. Block diagram of the third node [3].



4.1 Work Mechanism


The harvester collects energy from sunlight and from the patient's body heat to recharge the super-capacitors. Thermoelectric generators work by the Seebeck effect, where the Seebeck coefficient (α) differs between materials. The generated voltage depends on the material and on the temperature difference ΔT, where V = αΔT. The node has sensors through which the temperature, heart rate, blood oxygen percentage, and acceleration are measured. The data are processed by the microcontroller, sent through the BLE module, and visualized on an Android mobile phone application. Figure 6 shows the setup of this node [3].

Fig. 6. The setup of the third wearable node [3].
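To illustrate the Seebeck relation above, the following Python snippet is a rough, hedged estimate: the effective module coefficient is an assumed illustrative value (roughly consistent with a multi-couple Bi2Te3 module such as the SP1848-27145), not a figure taken from the paper; only the ~100 mW output at a 20 °C difference and the 6 h of daily operation come from the text.

```python
# Illustrative Seebeck estimate for the TEG harvester (assumed coefficient).
ALPHA_MODULE = 0.05          # V/K, assumed effective coefficient of the whole module
DELTA_T = 37 - 17            # body (37 C) minus ambient (17 C) temperature

v_open_circuit = ALPHA_MODULE * DELTA_T          # V = alpha * delta_T  (~1 V)
p_teg_w = 0.100                                   # module output quoted in the paper (W)
energy_per_day_j = p_teg_w * 6 * 3600             # 6 h of operation -> 2160 J/day

print(f"~{v_open_circuit:.1f} V open-circuit, {energy_per_day_j:.0f} J per day")
```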

4.2 Software Implementation


Similarly to [1, 2], this node works in wake-up and sleep modes to save the energy consumed by unnecessary measurement of the vital data. A cycle of 20 min is chosen, where
the system wakes up for 10 s, then sleeps for 1190 s. The algorithm of the software is
developed using C programming language. The system is initialized in wake-up mode,
then the vital data “body temperature, heart rate, blood oxygen level, and accelera-
tion” are measured through the sensors and sent through Bluetooth to a mobile phone.
The Android application is implemented using MIT App Inventor. The application has
options to scan, connect and disconnect devices. It has text boxes to show the data of
each sensor separately. First, the app scans for available devices and checks the BLE module status; the received data are split into three parts and shown in separate text boxes.

4.3 Energy Consumption


The system operates on 3.3 V. It draws 18.1 mA in wake-up mode and 0.45 mA in sleep mode, with an average current of 0.6 mA, consuming an average power of 1.97 mW. The super-capacitors have a capacitance of 50 F and store an energy of 352.75 J (approximately 0.097 Wh). The energy consumed per cycle is 2.36 J, so the capacitors power the node for approximately 149 cycles, and the lifetime is 2980 min (2.07 days). The practical consumption is 2.13 mW, so the lifetime is then 46 h (1.91 days). The harvester combines two methods to generate power: a photovoltaic cell and a thermoelectric generator (TEG). Assuming a PV cell with an area of 43.2 cm², a conversion efficiency of 7%, irradiated at 1000 W/m² for 6 h a day, it generates 302.4 mW, so the generated energy in 6 h is 6531.84 J, and the super-capacitors are charged by the PV cell alone in 0.054 days (1.29 h). For the thermoelectric generator, it is assumed that the body temperature is 37 °C and the ambient temperature is 17 °C, so the temperature difference is 20 °C. The TEG module then generates a power of 100 mW. Assuming it works for 6 h, it generates 2160 J per day and recharges the capacitors in 0.16 days (3.9 h). Assuming the two generators work simultaneously, they generate 402.2 mW together, i.e., 8687.52 J per day, so the recharging time is 0.04 days (approximately 0.97 h). Considering the case of charging per illuminated hour, the PV cell generates 0.3022 Wh in one hour, the TEG module generates 0.1 Wh, and both generators together produce 0.4022 Wh; the charging time is therefore 0.32 h for PV only (0.32 days if generating for only 1 h per day), 0.97 h for TEG only (0.97 days at 1 h per day), and 0.24 h for both PV and TEG together (0.24 days at 1 h per day). It is noticed that the charging time is very short and less than the lifetime, which proves sustainable work performance.
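A minimal sketch of the cycle-count arithmetic above, using only the figures quoted in the text (illustrative, not code from the paper):

```python
# Super-capacitor lifetime estimate for the third node.
stored_j = 352.75                 # usable super-capacitor energy quoted above (J)
p_avg_w = 1.97e-3                 # theoretical average power (W)
cycle_s = 20 * 60                 # one 20-min wake/sleep cycle

energy_per_cycle_j = p_avg_w * cycle_s          # ~2.36 J
cycles = stored_j / energy_per_cycle_j          # ~149 cycles
lifetime_min = cycles * 20                      # ~2980 min (~2.07 days)

lifetime_h_practical = stored_j / 2.13e-3 / 3600   # ~46 h with the measured 2.13 mW

print(f"{cycles:.0f} cycles, {lifetime_min:.0f} min, practical {lifetime_h_practical:.0f} h")
```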

5 Discussion

The node in [1] consumes an average power of 4.811 mW, while the node in [2] consumes
19.99 mW and the node in [3] practically consumes 2.13 mW. Looking at the lifetimes,
the lifetime in [1] is 137.17 days, lifetime in [2] is 33.26 days, and lifetime in [3] is
1.91 days. For the charging times, [1] is charged in 14.6 days, [2] is charged in 11.73 days,
while [3] is charged in 0.04 days. It is noticed from these data that node [3] has the lowest power consumption, [1] has the longest lifetime, and [2] has the shortest charging time. Figure 7 shows a comparison of the specifications of the three nodes. For activity times, it is noticed that [1] measures the vital data for 15 s and sleeps for 885 s, i.e., four times an hour. In [2], it measures for 5 s and sleeps for 55 s, which is equivalent to 60 times an hour. The node in [3] measures for 10 s and sleeps for 1190 s, which means 3 times per hour. The most precise node is [2] because it is the most active among the three. To find the most sustainable node, looking at the lifetime and recharging time alone is not enough. A figure of merit can be obtained by taking the ratio of the discharging time (lifetime) to the charging time, here termed a sustainability factor and denoted by Q. In [1], Q = 137.17/14.6 = 9.39, while in [2], Q = 33.26/11.73 = 2.83, and in [3], Q = 1.91/0.04 = 47.75. The sustainability factor is the highest in [3], so it is
the most sustainable. The method used to transmit data differs. In [1, 3], the vital data
are transmitted by a BLE to a mobile phone app, where a Bluetooth terminal app is used
in [1], but in [3] a manually-designed app is used, where the data are shown on a mobile
phone. In [2], the method used to transmit the vital data is a Wi-Fi internet connection, with the data accessed through a cloud service by someone who has credentials for the service. Table 1 shows a comparison between the three wearable sensor nodes, and Table 2 shows a comparison between the sensor nodes in [4–6, 9–11].

Fig. 7. Comparison of the specifications of the three nodes.
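The sustainability factor defined above is simple enough to compute directly; the following Python lines (an illustration, not code from the paper) reproduce the three Q values from the lifetimes and charging times summarized in Table 1.

```python
# Sustainability factor Q = lifetime / charging time for the three nodes.
nodes = {"[1]": (137.17, 14.6), "[2]": (33.26, 11.73), "[3]": (1.91, 0.04)}
for ref, (lifetime_d, charging_d) in nodes.items():
    print(f"{ref}: Q = {lifetime_d / charging_d:.2f}")
# -> roughly 9.4 for [1], 2.8 for [2] and 47.8 for [3]: the third node is the most sustainable.
```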

6 Suggestions
For harvesting energy, it is preferable to diversify the energy sources rather than use only photovoltaic energy, in order to reduce the charging time as much as possible and to ensure continuous charging if some sources are unavailable. In [3], a photovoltaic source is combined with a thermal source, but [2] uses two photovoltaic sources, so charging stops completely in the absence of illumination; in [3], the use of a thermoelectric source ensures continuity of charging even if the PV panel is not illuminated. Regarding energy storage, [1, 2] use Li-ion batteries, while [3] uses super-capacitors of low capacity, which are therefore less preferred; Li-ion batteries are the preferred storage. Regarding charging circuits, the DC-DC boost converter is preferred in order to extract the maximum power. For precision of measurement, it is preferable to increase the activity times; in [1], the algorithm can be modified to measure for 5 s and sleep for 295 s, i.e., to measure every 5 min. A similar modification can be made in [3] to measure the vital data for 5 s and sleep for 595 s, i.e., to measure every 10 min. A button can also be added so that a measurement is taken on demand, even during sleep time. Regarding the wireless communication, BLE is best when the node is used personally, but a Wi-Fi connection is better when the device is used in hospitals. BLE and Wi-Fi may be used together, where Wi-Fi gives the hospital access to the patient's data and BLE gives the patient access to his own data; alternatively, the patient can be given credentials to access his own data only, according to the use case. The use of Wi-Fi to send data to a server is useful in epidemics such as the COVID-19 pandemic, where the data are sent to a server that can be accessed by doctors. The main advantage of using Wi-Fi is that it can serve home-quarantined patients, whose vital data are monitored by hospitals so that critically ill patients can be helped as soon as possible. An option can be added to automatically alert the doctors when the device measures a critical value of heart rate, blood oxygen level, or temperature. Regarding the wearing location, [1, 3] are worn on the wrist with the sensors placed on the fingers, while [2] is worn on the upper arm; the best choice is to wear the node on the upper arm, to keep the patient's hands free. Regarding additional vital data, the sensor node that measures the most vital parameters is [3], which measures heart rate, blood oxygen level, temperature, and acceleration; it is suggested to add a blood pressure sensor to measure the blood pressure.

Table 1. Comparison between the three wearable sensor nodes.

Reference | [1] | [2] | [3]
Sensors | Temperature sensor and heart pulse sensor | Temperature sensor and pulse oximeter sensor | Temperature sensor, pulse oximeter sensor, accelerometer sensor
Vital data | Temperature and heart rate | Temperature, heart rate and blood oxygen level | Temperature, heart rate, blood oxygen level and acceleration
Energy storage | Li-ion battery 4.2 V 4800 mAh | Li-ion battery 4.2 V 3800 mAh | Super-capacitors 5.4 V 50 F
Energy harvester | One photovoltaic cell with TP4056 charging controller | Two photovoltaic cells in series with TP4056 charging controller | Photovoltaic cell parallel with thermoelectric generator module and DC-DC boost converter
Power consumption (mW) | 4.97 | 20.23 | 2.13
Theoretical lifetime (days) | 137 | 33.26 | 1.91
Charging time (days) | 14.6 | 11.73 | 0.04
Times of activity per hour | 4 | 60 | 3
Wake-up-sleep period (minutes) | 15 | 1 | 20
Active time per period (seconds) | 15 | 5 | 10
Sleep time per period (seconds) | 885 | 55 | 1190
Sustainability factor | 9.38 | 2.84 | 47.75
Wireless technology | BLE 100 m | Wi-Fi 400 m | BLE 100 m
Wearability | Wrist with sensors on fingers | Upper arm | Wrist with temperature and pulse oximeter sensors on fingers
Data monitoring | Android phone application | Online server | Android phone application

Table 2. Comparison between sensor nodes reported in the literature.

Reference | [4] | [5] | [6] | [9] | [10] | [11]
Sensors | Pulse oximeter sensor, accelerometer sensor | 9-axis motion sensor, pressure sensor, microphone, ECG AFE | ECG data acquisition circuit | Heartbeat sensor, blood pressure sensor, temperature sensor | ECG sensor, PPG sensor | PPG sensor, accelerometer, gyroscope
Vital data | Heart rate, blood oxygen level and acceleration | Stress | ECG | Heart rate, blood pressure, temperature | Blood pressure | PPG, acceleration, angular velocity
Energy storage | Li-ion battery 3.7 V 40 mAh | LiPo battery 120 mAh | Li-ion battery 2.4 V 240 mAh | Li-ion battery 3.6 V 120 mAh | N/A | Super-capacitors 5.4 V 12.5 F
Energy harvester | Solar energy harvester | Solar energy harvester and TEG energy harvester | Solar energy harvester | N/A | N/A | Photovoltaic panel parallel with TEG module and DC-DC boost converter
Wireless technology | BLE | BLE | BLE | Wi-Fi | BLE | BLE
Wearability | Finger | Wrist | Sewed on clothes | N/A | Placed on chest | N/A

7 Conclusion

This paper has presented a detailed review of a number of wearable nodes, among which
three architectures were given special and detailed discussion. These three architectures
utilized algorithms to reduce the measurement of vital parameters at unnecessary times and thus reduce their power consumption. These nodes have various energy harvesters to fulfill sustainable work performance in terms of power supply. The first node [1] was
wearable on a hand wrist, measured the temperature and heart rate, worked on a Li-ion
battery charged by a photovoltaic (PV) cell, transmitted the vital data by a Bluetooth
low energy (BLE), and it had the highest lifetime. The second node [2] was upper-arm-
wearable, measured the heart rate, blood oxygen level, and temperature, supplied by a
Li-ion battery charged by two parallel PV cells, transmitted the vital data via a Wi-Fi to a
Ubidots server, and it is active the most often. The third node [3] was wrist-wearable,
measured the body acceleration, heart rate, blood oxygen level and temperature, powered
by two series super-capacitors charged by a PV cell with a thermoelectric generator, and
transmitted the vital data by a BLE. Regarding future work, one could diversify the energy sources and increase the activity times of the nodes. New
sensors can also be added to measure other vital parameters.

Acknowledgment. This work was carried out with the support of the Karlsruhe Nano Micro
Facility (KNMFi, www.knmf.kit.edu) a Helmholtz Research Infrastructure at Karlsruhe Institute
of Technology (KIT, www.kit.edu) and under the Helmholtz Research Programme MSE (Materials
Systems Engineering) at KIT.

References
1. Mohsen, S., Zekry, A., Youssef, K., Abouelatta, M.: An autonomous wearable sensor node
for long-term healthcare monitoring powered by a photovoltaic energy harvesting system.
Int. J. Electr. Telecommun. 66(2), 267–272 (2020)
2. Mohsen, S., Zekry, A., Youssef, K., Abouelatta, M.: On architecture of self-sustainable wear-
able sensor node for IoT healthcare applications. Wireless Pers. Commun. 119(1), 657–671
(2021)
3. Mohsen, S., Zekry, A., Youssef, K., Abouelatta, M.: A self-powered wearable wireless sensor
system powered by a hybrid energy harvester for healthcare applications. Wireless Pers.
Commun. 116(4), 3143–3164 (2021)
4. Magno, M., Salvatore, G.A., Jokic, P., Benini, L.: Self-sustainable smart ring for long-term
monitoring of blood oxygenation. IEEE Access 7, 115400–115408 (2017)
5. Magno, M., Wang, X., Eggimann, M., Gavigelli, L., Benini, L.: InfiniWolf: energy effi-
cient smart bracelet for edge computing with dual source energy harvesting. In: 2020th
Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 342–345. IEEE,
Grenoble, France (2020)
6. Wu, T., Redouté, J.-M., Yuce, M.: A wearable, low-power, real-time ECG monitor for smart
T-shirt and IoT healthcare applications. In: Fortino, G., Wang, Z. (eds.) Advances in Body
Area Networks I. Internet of Things (Technology, Communications and Computing) 2019,
vol. 2019, pp. 165–173. Springer, Cham (2019)
7. D˛abrowska, A., Bartkowiak, G., P˛ekosławski, B., Starzak, Ł: Comprehensive evaluation of a
photovoltaic energy harvesting system in smart clothing for mountain rescuers. IET Renew.
Power Gener. 14(16), 3200–3208 (2020)
8. Ivanov, K.: Design, realization and study of thermoelectric watch. In: 21st International Sym-
posium on Electrical Apparatus & Technologies (SIELA), pp. 1–4. IEEE, Bourgas, Bulgaria
(2020)
9. Wan, J., et al.: Wearable IoT enabled real-time health monitoring system. EURASIP J. Wire.
Commun. Netw. 298, 1–10 (2018)
10. Qiu, C., Wu, T., Redouté, J-M., and Yuce, M. R.: A wireless wearable sensor patch for the
real-time estimation of continuous beat-to-beat blood pressure. In: 41st Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 6842–
6845. IEEE, Berlin, Germany (2019)
11. Wu, T., Wu, F., Redouté, J., Yuce, M.R.: An autonomous wireless body area network imple-
mentation towards IoT connected healthcare applications. IEEE Access 5, 11413–11422
(2017)
12. Magno, M., et al.: InfiniTime: multi-sensor wearable bracelet with human body harvesting.
Sustain. Comput. Inform. Syst. 11, 38–49 (2016)
13. Dionisi, A., Marioli, D., Sardini, E., and Serpelloni. M.: Autonomous wearable system for
vital signs measurement with energy-harvesting module. IEEE Trans. Instrum. Measure. 65
(6), 1423–1434 (2016)
14. Tran, T.V., Chung, W.: High-efficient energy harvester with flexible solar panel for a wearable
sensor device. IEEE Sens. J. 16(24), 9021–9028 (2016)
15. Sonoda, K., et al.: Wearable photoplethysmographic sensor system with PSoC microcon-
troller. Int. J. Intell. Comput. Med. Sci. Image Process. 5(1), 45–55 (2013)

16. Caldara, M., Colleoni, C., Guido, E., Re, V., Rosace, G., Vitali, A.: A wearable sweat ph and
body temperature sensor platform for health, fitness, and wellness applications. In: Di Natale,
C., Ferrari, V., Ponzoni, A., Sberveglieri, G., Ferrari, M. (eds.) Sensors and Microsystems.
Lecture Notes in Electrical Engineering, vol. 268. Springer, Cham (2014). https://doi.org/10.
1007/978-3-319-00684-0_82
17. Mohsen, S.: Hybrid energy harvester for medical sensor node toward real-time healthcare
monitoring. Proc. Eng. Technol. Innov. 18, 43–48 (2021)
18. Mohsen, S.: A solar energy harvester for a wireless sensor system toward environmental
monitoring. Proc. Eng. Technol. Innov. 21, 10–19 (2022)
Identifying Severity Clusters in SLE
Patients

Hamza Zidoum1(B), Sumaya AL-Sawafi1, Aliya AL-Ansari2, and Batool AL-Lawati3

1 Department of Computer Science, Sultan Qaboos University, Muscat, Oman
[email protected]
2 Department of Biology, Sultan Qaboos University, Muscat, Oman
3 Department of Medicine, Sultan Qaboos University, Muscat, Oman
http://www.squ.edu.om

Abstract. Machine learning (ML) has had a successful impact on healthcare data mining. We use unsupervised ML methods to extract features and identify subgroups of Systemic Lupus Erythematosus (SLE) patients related to disease severity. We analyze the similarity between SLE patients within these clusters. Finally, we evaluate the clustering results using two types of cluster validation: internal cluster validation and external cluster validation. The clustering analysis results show two separate patient clusters, a mild and a severe subgroup. Patients in the severe subgroup have a higher prevalence of renal disorder, hemolytic anemia, anti-dsDNA antibody, and low complements (C3, C4). The severe subgroup of patients suffers from malar rash and proteinuria, with higher use of cyclophosphamide, mycophenolate mofetil, and azathioprine. The second cluster corresponds to mild disease activity and is associated with joint pain, low complements (C3, C4), and a positive anti-dsDNA antibody.

Keywords: Clustering · Data analytics · Systemic Lupus


Erythematosus (SLE) · Biomedical informatics · Healthcare

1 Introduction
Systemic lupus erythematosus (SLE) is a complex immune multi-system disease affecting various organs and tissues [1]. The production of autoantibodies by the immune system against DNA is considered a hallmark of SLE; as a result, the immune system attacks its own organs and tissues [2]. The main symptoms of SLE are rash and joint pain, in addition to more severe manifestations such as renal disease and autoimmune hemolytic anaemia.
The disease distribution is worldwide in both genders, with a nine times higher ratio in women than in men [1]. The prevalence and incidence of SLE vary according to gender, age, and ethnicity [1,3]. The exact etiology of SLE is unknown; a complex interaction between multiple factors such as genetics, environment, and hormonal factors causes the disease [1].

SLE is a heterogeneous disease [1] and some of its symptoms overlap with other autoimmune pathologies [4,5]. SLE is associated with unpredictable exacerbations and remissions with a wide range of symptoms [1,5].
The nature of SLE led researchers to propose SLE as a syndrome [1]. Clinical signs do not always occur together at the same time and may develop at any phase of the disease. The extreme heterogeneity of SLE constitutes many challenges for patients and clinicians. At present, the diagnosis is based on the experience of doctors [1], who evaluate symptoms and the results of laboratory analysis.
The course of this disease is characterized by dynamic episodes of activity interchanging with remissions [6]; consequently, there is a problem in defining an efficient line of treatment for the cumulative symptoms. Older treatment methods often implied a reduced life span for SLE patients due to organ malfunctions or the toxic effects of therapy. At this stage, SLE disease does not have a clear path that would help clinicians follow up and manage patients.
The complex nature of SLE makes measuring disease activity accurately a challenging task. SLE disease activity measurement is essential to evaluate the patients' status; the result affects the decisions on therapeutic strategies. It also gives an index to predict chronic damage and mortality [1]. Disease activity needs to be monitored and controlled to avoid the development of chronic damage and to enhance the management of SLE patients.
To identify clinically significant changes in SLE, different disease activity indices were developed [1,5,7]. The SLE Disease Activity Index (SLEDAI) is an internationally widely used index, developed to calculate disease activity in the preceding 10 days. It is considered a clinical index and includes 24 weighted clinical and laboratory variables covering nine organ systems [5,8]. Another global activity index, the European Consensus Lupus Activity Measurement (ECLAM), involves 5 weighted clinical and serological items related to the previous 30 days [5].
Physicians face different challenges in defining, diagnosing, and managing SLE patients, and considerable effort and time are devoted to this. Thus, there is a need to reduce the effort of diagnosis and management. Disease activity is a fundamental issue that needs to be monitored and assessed to facilitate disease management, reduce health care costs (e.g., medications), and reduce the harmful effects of the medicines. To the best of our knowledge, no research has used clustering methods to identify severity clusters in Omani patients with SLE and detect the features associated with disease severity.
In this paper we aim at (a) identifying severity clusters in Omani patients with Systemic Lupus Erythematosus, (b) detecting features related to disease severity,
and (c) examining the correlation between the disease activity index (SLEDAI) and the physician global assessment (PGA) with each resulting subgroup.
This paper is organized as follows. Section 2 discusses related work that uses the same methods to solve different problems. Section 3 introduces clustering methods. Section 4 defines the materials and methods used to solve our problem. Section 5 presents the experimental results and evaluation. Section 6 contains our clustering result validation. Section 7 provides the discussion, and Sect. 8 presents the conclusion and future work.

2 Literature Review
Clustering analysis has been used in several medical fields, for instance, brain image segmentation [9,10], partitioning patients with Alzheimer's disease [11], and clustering autoantibodies of lupus patients [12].
The authors in [13] used the concept of clustering to identify damage clusters in SLE patients. Around 1130 patients' data were clustered using the K-means algorithm with Euclidean distance. The patients' data were grouped into three clusters: cluster 1 with the least damage, cluster 2 high in renal and ocular damage, and cluster 3 with neuropsychiatric and musculoskeletal damage. It is observed that most of the symptoms are closely related to each other. This study introduced a new approach to evaluating SLE patients by damage; it showed that neuropsychiatric and renal disorders are present in different clusters and that patients with this damage had the highest risk of death.
The authors in [14] used clustering to group symptoms of SLE in children. They created five clusters to group data on five major SLE symptoms in children, each cluster having an associated set of factors. All participants in the study were below 18 years of age. The study was conducted to find other problems that may arise due to existing SLE in children below the age of 18 and to identify the symptoms occurring in patients with childhood onset. 75 patients were included, and five separate clusters were identified using agglomerative hierarchical clustering with the centroid method and a cosine similarity measure. Cluster 1 had symptoms related to pain (joint pain, headache, and painful muscles) and itching, cluster 2 had symptoms related to bruises and stomach complaints, cluster 3 symptoms related to weight gain, cluster 4 symptoms related to white fingers in cold weather, hair loss, and sensitivity to sunlight, and cluster 5 symptoms related to fatigue. A limitation of this study is that the number of participants was small and the participants were heterogeneous.
The authors in [15] collected data on 150 SLE patients from Egypt. Three different clusters were created on the basis of associated features. In this study, the K-means clustering algorithm was used with Euclidean distance as the similarity measure. The study helped in understanding the disease pattern in Egyptian SLE patients; such studies help in disease prediction and precaution. Three distinct clusters were identified: cluster 1 was significantly high in mucocutaneous and arthritis manifestations, cluster 2 was more frequent in renal and hematological manifestations, and cluster 3 had a high prevalence of mucocutaneous manifestations, serositis, hematologic manifestations, and renal involvement.

To the best of our knowledge, no research has used clustering methods to identify severity clusters in Omani patients with SLE and detect the features associated with disease severity. Furthermore, our method is robust, as we used cluster validation to evaluate the goodness of the clustering algorithm results. In this research, we used both internal cluster validation and external cluster validation.

3 Clustering Methods
Several clustering methods have been proposed [16]: partitioning, density-based, and hierarchical [17]. Partitioning algorithms separate the data set into a specified number of clusters according to the similarity or distance among the data samples. Hierarchical algorithms compose the clusters in a hierarchical structure. Density-based algorithms find the dense regions among data samples to form clusters, and low-density regions create boundaries between the clusters. The choice of a specific method depends on data size, data dimensionality, and time complexity [18].
Partitioning methods start by considering all data points in the data set as a single cluster and split it until a stopping criterion is met, while hierarchical methods start by considering each data point in the data set as a cluster and aggregate them to form a hierarchical cluster structure (larger clusters) [19]. There are many clustering methods that cluster datasets based on calculating similarity. The following subsections describe the clustering methods in detail.

3.1 Hierarchical Clustering Methods

Hierarchical clustering seeks to construct a hierarchy of clusters. Based on how the decomposition is formed, hierarchical clustering methods can be categorized as agglomerative (bottom-up approach) or divisive (top-down approach) [20].
Hierarchical clustering methods do not require prespecifying the number of desired clusters. The result of this method is a tree-based representation (dendrogram) of the objects. In general, it allows cutting the hierarchy at the desired level, and this characteristic distinguishes it from other clustering methods.
Divisive clustering techniques follow a top-down flow which starts from a single cluster containing all data points and then splits this cluster recursively into smaller clusters until each data point forms a disjoint cluster (each cluster contains one data point) [21].
As the opposite of divisive clustering, the agglomerative clustering method (bottom-up approach) starts by considering each data point in the data set as a cluster containing only itself. Clusters are then successively merged according to a similarity criterion until they are all merged into one final cluster containing all data points [18,22].

3.2 Partitional Clustering


The k-means clustering method is one of the most widely used unsupervised learning algorithms for solving the clustering problem. It clusters the objects into mutually exclusive clusters and takes the number of desired clusters as input. The algorithm works as follows.
First, initialize the cluster centers by choosing K (the number of clusters desired). Once we have these cluster centers, each point can be assigned to a cluster based on the minimum distance to the cluster center. The next step assigns each data point to the closest cluster center. Then, the cluster centers are updated based on the points assigned to each cluster. The assignment and update steps are repeated until no point changes clusters and the centroids remain the same, in other words, until the centroids do not move anymore.
k-means clustering has some limitations associated with identifying the number of clusters k, so the results produced depend on the initial value of k. It is also sensitive to data noise and outliers [19].
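The assignment/update loop described above can be written compactly. The following NumPy sketch is a minimal illustration of Lloyd's algorithm (our own simplified example, not the implementation used in this study, which relies on Scikit-learn).

```python
import numpy as np

def kmeans(X, k, max_iter=300, seed=0):
    """Minimal Lloyd's algorithm: assign each point to its nearest centre,
    recompute the centres, and stop when the assignments no longer change."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: nearest centre by Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # centroids no longer move
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```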

3.3 Spectral Clustering Algorithm


Spectral clustering methods are based on graph and matrix theories [23], where the approach is used to identify communities of nodes in a graph based on the edges connecting them. The method is flexible and allows us to cluster non-graph data as well. In spectral clustering, the data points are treated as nodes of a graph. The three main steps to perform spectral clustering [23] are as follows, with a short illustrative sketch given after the list:

1. Pre-Processing: creates a similarity graph for all data points (undirected


graph). The graph can be represented as an adjacency matrix, where the
row and column indices represent the nodes, and the entries represent the
absence or presence of an edge between the nodes.
2. Spectral Representation: from the resulting similarity graph, the associated
Laplacian matrix is computed by subtracting the weight matrix from the
(diagonal) degree matrix. Then, compute the eigenvectors of the Laplacian
matrix to define a feature vector for each object.
3. Clustering: to obtain a discrete solution from eigenvectors, a standard clus-
tering algorithm, such as the K-means method, is applied in the spectral
space.
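The three steps above can be sketched directly with NumPy and Scikit-learn; the snippet below is an illustrative outline (a k-nearest-neighbour graph and the unnormalized Laplacian are assumed here for simplicity; they are not necessarily the exact choices used in this study).

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

def spectral_clusters(X, k=2, n_neighbors=10):
    # 1. Pre-processing: build a symmetric k-nearest-neighbour similarity graph.
    A = kneighbors_graph(X, n_neighbors=n_neighbors, include_self=False).toarray()
    A = np.maximum(A, A.T)
    # 2. Spectral representation: Laplacian L = D - A, then use the eigenvectors
    #    of the k smallest eigenvalues as a feature vector for each object.
    D = np.diag(A.sum(axis=1))
    eigvals, eigvecs = np.linalg.eigh(D - A)
    features = eigvecs[:, :k]
    # 3. Clustering: run standard k-means in the spectral space.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
```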

3.4 Cluster Validation


Cluster validation is used to design the procedure of evaluating the goodness
of clustering algorithm results. In this research, we use two kinds of cluster
validation, internal cluster validation and external cluster validation.

3.4.1 Internal Cluster Validation Internal cluster validation uses the internal information of the clustering process to evaluate the goodness of a clustering structure without reference to external information. It can also be used
for estimating the number of clusters and the appropriate clustering algorithm
without any external data.
1. The silhouette score was used to determine the appropriate clustering algorithm. The silhouette helps to evaluate the correctness of a data object's assignment to a particular cluster rather than another by measuring both inter-cluster separation and intra-cluster cohesion [24]. Negative silhouette values show the incorrect placement of objects, while positive values represent better object placement [24].
   This technique goes through three steps; assume the data have been clustered via any technique into k clusters.
   (a) Compute, for every data point i, the mean distance a(i) between i and all other data points in the same cluster (Eq. 1):

   a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j)    (1)

   We divide by |C_i| - 1 because the distance d(i, i) is not included in the sum; d(i, j) is the distance between data points i and j in cluster C_i.
   (b) For the i-th object, compute the object's average distance to all the objects in each of the other clusters, and take the minimum value over those clusters. Let this value be b(i) (Eq. 2):

   b(i) = \min_{k \neq i} \frac{1}{|C_k|} \sum_{j \in C_k} d(i, j)    (2)

   (c) The silhouette coefficient of one data point i is then defined as (Eq. 3):

   s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}    (3)
2. The Calinski-Harabasz score (CH) is one of the internal validity measures commonly used for evaluating clustering solutions [25]. CH measures the two criteria simultaneously with the help of the average between-cluster and within-cluster sums of squares. It is described by

   CH(k) = \frac{B_c(k)/(k - 1)}{W_c(k)/(n - k)}    (4)

   where n is the number of data points and k is the number of clusters. B_c and W_c denote the between-cluster and within-cluster sums of squares, respectively, given by

   B_c = \sum_{k=1}^{K} |C_k| \, \lVert \bar{c}_k - \bar{x} \rVert^2    (5)

   W_c = \sum_{k=1}^{K} \sum_{i=1}^{N} w_{k,i} \, \lVert x_i - \bar{c}_k \rVert^2    (6)

   where \bar{c}_k is the centroid of cluster k, \bar{x} is the overall mean of the data points, and w_{k,i} is 1 if point x_i belongs to cluster k and 0 otherwise.
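In practice, both indices are available in Scikit-learn, so scores such as those later reported in Table 4 can be obtained with a few lines of code. The sketch below is illustrative; parameter choices such as Ward linkage and the random seeds are assumptions for the example, not necessarily those of the study.

```python
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering
from sklearn.metrics import silhouette_score, calinski_harabasz_score

def internal_validation(X, k_values=(2, 3, 4)):
    """Silhouette and Calinski-Harabasz scores for each algorithm and k."""
    algorithms = {
        "Hierarchical": lambda k: AgglomerativeClustering(n_clusters=k, linkage="ward"),
        "K-Means": lambda k: KMeans(n_clusters=k, max_iter=300, n_init=10, random_state=0),
        "Spectral": lambda k: SpectralClustering(n_clusters=k, random_state=0),
    }
    for name, make in algorithms.items():
        for k in k_values:
            labels = make(k).fit_predict(X)
            print(f"{name:12s} k={k}: silhouette={silhouette_score(X, labels):.2f}, "
                  f"CH={calinski_harabasz_score(X, labels):.2f}")
```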

3.4.2 External Cluster Validation External cluster validation consists of comparing the results of a cluster analysis to an externally known result, such as externally provided class labels. It measures the extent to which cluster labels match the externally supplied class labels. Since the proper cluster number is known in advance, this approach is mainly used for selecting the right clustering algorithm for a specific data set.
In this study we use two external references: the systemic lupus erythematosus disease activity index (SLEDAI) and the physician global assessment (PGA). PGA is a visual score that reflects the clinician's judgment of overall SLE disease activity, while SLEDAI, described earlier, is the most widely used disease activity measure internationally. PGA and SLEDAI were evaluated for the patients by a clinician (domain expert).

4 Materials and Methods


4.1 Research Datasets

The dataset was collected from Sultan Qaboos University Hospital (SQUH) for
studies on SLE and approved by the Ethics Committee of the College of Medicine
and Health Science in the Sultan Qaboos University (SQU) (MERC # 1418 and
1650). To maintain confidentiality, every patient was assigned a unique label,
and the data were analyzed anonymously.
For this research work, we included only Omani adult patients (15–55 years
old) who were diagnosed with SLE and followed up in a Rheumatology clinic in
SQUH from 2006 to 2019. Patients who meet the American College of Rheuma-
tology (ACR) classification criteria were included in the research. Research par-
ticipants were 138 SLE patients. The data extracted from SQUH include several files: a demographic data file, a clinical notes file, a laboratory test results file, and a medication data file.

4.2 Data Preprocessing

This section describes the data preprocessing phase applied to our dataset before clustering. Data preprocessing is an important step in applying cluster analysis methods to the dataset and increases its quality. It consists of a sequence of steps that depend on the data itself, for example normalizing, scaling, and transforming feature data. The preprocessing was implemented using the Python language. The following preprocessing steps were applied to our dataset:

1. Encoding categorical features: The first step of preprocessing the data is encoding categorical features. Features can be numerical or categorical. Most of our features are categorical and require encoding to numerical values. In label encoding, we replace each categorical value with a numeric value between 0 and the number of classes minus 1.

2. Feature scaling: The second data preprocessing step was feature scaling. In this work, the min-max normalization method was used: each feature was linearly rescaled to the fixed range between 0 and 1. We used this technique to ensure that feature values lie in the same range, so that none of them biases the clustering results. Equation 7 represents the normalization formula:

   X_{normalized} = \frac{x - x_{min}}{x_{max} - x_{min}}    (7)

   where x_{max} and x_{min} are the maximum and minimum values of the feature, respectively.
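The two preprocessing steps can be combined in a short helper. The following sketch uses Scikit-learn's LabelEncoder and MinMaxScaler; the toy data frame at the bottom is a hypothetical example, not patient data from this study.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Label-encode categorical columns, then min-max scale every column to [0, 1]."""
    out = df.copy()
    for col in out.select_dtypes(include=["object", "category"]).columns:
        # Each category is replaced by an integer in 0 .. n_classes - 1.
        out[col] = LabelEncoder().fit_transform(out[col].astype(str))
    scaled = MinMaxScaler().fit_transform(out)   # (x - x_min) / (x_max - x_min)
    return pd.DataFrame(scaled, columns=out.columns, index=out.index)

# Hypothetical toy example with the kind of binary clinical features used here.
toy = pd.DataFrame({"sex": ["F", "F", "M"], "renal": [1, 0, 1], "low_C3": [1, 1, 0]})
print(preprocess(toy))
```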

4.3 Clustering and Experimental Design

We attempted to identify clinical patterns of symptoms in these patients by using clustering methods and comparing the prevalence of various clinical, immunological, and medication features among the patients' clusters. The clustering methods we apply are k-means, agglomerative hierarchical, and spectral clustering (among the most commonly used clustering methods in data analysis), representing three categories of algorithms: partitioning, hierarchical, and spectral, respectively.
In this section, we describe the factors related to the experiments: the experimental environment, the experimental parameters and how we tuned them, and the experimental procedures. We used all features in our dataset as input to the clustering models. For the other experiment settings, we mostly used the default values provided by the Scikit-learn library for the clustering algorithms.
Distance measures are used to determine the similarity between objects in the dataset; the objective is to find similar objects and group them in the same cluster. For our experiments, we used the Euclidean distance, a non-negative measure that calculates the distance between two points (Fig. 1).

Fig. 1. Euclidean distance between two points



The distance between two points is quantified based on the Pythagorean theorem: if (x_1, y_1) and (x_2, y_2) are points in two-dimensional space, then the Euclidean distance between them is calculated using Eq. 8:

ED = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}    (8)
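Putting the experimental design together, the sketch below is an illustrative outline of how the three algorithms can be run on the preprocessed feature matrix and how per-cluster feature prevalence (as reported later in Tables 1, 2 and 3) can be derived; the parameter values mirror the defaults mentioned above but remain an assumption.

```python
import pandas as pd
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering

def cluster_and_profile(X: pd.DataFrame, k: int = 2) -> dict:
    """Run the three clustering algorithms and report per-cluster feature prevalence."""
    models = {
        "hierarchical": AgglomerativeClustering(n_clusters=k, linkage="ward"),
        "k-means": KMeans(n_clusters=k, max_iter=300, n_init=10, random_state=0),
        "spectral": SpectralClustering(n_clusters=k, random_state=0),
    }
    profiles = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        # Mean of each 0/1 feature per cluster = prevalence, as in the tables below.
        profiles[name] = X.groupby(labels).mean().round(2)
    return profiles
```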

5 Experimental Results and Evaluation


5.1 Optimal Number of Clusters
When we cluster a dataset with no labels, we do not know the right number of clusters in advance. In this work, the Elbow method is used to determine the number of clusters; it measures the within-cluster sum of squared errors (WSS) for different numbers of clusters.
Figure 2 shows that the WSS decreases rapidly as K increases from 1 to 2 and then decreases slowly after that. The elbow of this curve therefore suggests that two clusters is the right number: the distortion drops rapidly until K = 2 and slowly afterwards, hence the number of clusters needed for this dataset is K = 2.

Fig. 2. Elbow plot for optimal number of clusters
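A sketch of the elbow computation (illustrative; the actual plot in Fig. 2 was produced by the authors):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_plot(X, k_max=8):
    """Within-cluster sum of squared errors (WSS, the KMeans inertia) for k = 1..k_max."""
    wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
           for k in range(1, k_max + 1)]
    plt.plot(range(1, k_max + 1), wss, marker="o")
    plt.xlabel("Number of clusters k")
    plt.ylabel("WSS")
    plt.show()
```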

5.2 The First Experiment: Hierarchical Clustering


Figure 3 shows the dendrogram resulting from the hierarchical agglomerative clustering method, built on the similarity between clusters using Ward's method. The height axis displays the distance between groups; the horizontal bars indicate the points at which two clusters merge.
Looking at the dendrogram in Fig. 3, we can extract different numbers of clusters. As we move up, the number of clusters decreases as more objects are combined; after level 33, all objects are connected under one big cluster. The desired number of clusters is obtained by cutting the dendrogram at the proper distance (level).

Fig. 3. Dendrograms of hierarchical clustering
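For reference, a dendrogram like Fig. 3 and the corresponding two-cluster cut can be obtained with SciPy; the snippet below is a sketch, with X standing for the preprocessed feature matrix (replaced here by random data for illustration).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).random((138, 22))   # placeholder for the real feature matrix
Z = linkage(X, method="ward")                    # agglomerative merge tree (Ward's method)
dendrogram(Z)                                    # tree plot analogous to Fig. 3
plt.show()
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two clusters
```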

5.2.1 Results Analysis for K = 2 Figure 3 shows that cutting the dendrogram at level 25 gives us two separate clusters. Cluster 1 (n = 84) was the largest cluster, with two males and 82 females, while the second cluster had 54 patients, 51 females and three males. The radar chart in Fig. 4 shows the feature distribution in Cluster 1 and Cluster 2 using the hierarchical method; it shows that Cluster 2 (severe) has the highest prevalence of features (lupus nephritis, hemolytic anemia, low complement (C3 and C4), and positive anti-dsDNA).

Fig. 4. Features distribution in two clusters using the hierarchical method.

Renal disorders were more prevalent among Cluster 2 patients (0.87) than among Cluster 1 patients (0.18). Of the 54 patients in Cluster 2, a proportion of 0.41 had a fever, compared with 0.23 of the patients in Cluster 1. Cluster 2 patients were significantly higher in low C3 (0.96) and low C4 (0.93) than those in Cluster 1 (0.51 and 0.54). The majority of patients in Cluster 2 had significantly more anti-dsDNA (0.85) than those in Cluster 1 (0.67). Cluster 2 also had significantly more acute cutaneous lupus (0.65) than Cluster 1 (0.33). Interestingly, Cluster 2 in general tends to be more expensive compared to the other cluster. Table 1 shows the prevalence of features detected using the hierarchical method, where the maximum is one and the minimum is zero.

Table 1. Prevalence of features detected using a hierarchical method where the max-
imum is one and minimum is zero

No. Feature Cluster 1 (n = 84) Cluster 2 (n = 54)


1 Male 0.02 0.06
2 Female 0.98 0.94
3 Fever 0.23 0.41
4 ACL 0.33 0.65
5 CCL 0.04 0.04
6 Oral Ulcers 0.15 0.30
7 Alopecia 0.39 0.44
8 Joint pain 0.88 0.87
9 Serositis 0.06 0.07
10 Renal disorders 0.18 0.87
11 Proteinuria 0.14 0.72
12 Vasculitis 0.07 0.11
13 Neurologic 0.11 0.15
14 Hemolytic Anemia 0.20 0.56
15 Combs 0.21 0.56
16 Leukopenia 0.14 0.11
17 Thrombocytopenia 0.06 0.11
18 Anti-dsDNA 0.67 0.85
19 Anti-Sm 0.10 0.17
20 Anti-Phospholipid 0.31 0.37
21 Low C3 0.51 0.96
22 Low C4 0.54 0.93

5.3 The Second Experiment: K-Means Clustering

The K-Means clustering method uses to group the patients into a different num-
ber of clusters. First, we set up the desired number of clusters to 2,3, and 4.
Next, via the maxiter parameter, we specify the maximum number of iterations
for every run (maxiter = 300).

5.3.1 K-Means Results Analysis for K = 2 In the first experiment, we set the number of desired clusters to two. Cluster 1 was the smaller cluster, with 54 patients (53 females and one male), compared with the second cluster of 84 patients (80 females and four males). A proportion of 0.89 of Cluster 1 patients were females, and they suffered from renal disorder, hemolytic anemia, anti-dsDNA antibody, and low complements (C3, C4). The radar chart in Fig. 5 shows the feature distribution in Cluster 1 and Cluster 2 using the K-Means method; it shows that Cluster 1 (severe) has the highest prevalence of features (lupus nephritis, hemolytic anemia, low complement (C3 and C4), and positive anti-dsDNA) compared to Cluster 2.

Fig. 5. Features distribution in two clusters using K-Means

Cluster 1 is characterized by high proportions of renal involvement (0.76) and hemolytic anemia (0.59). Patients in Cluster 1 had a higher prevalence of low C3 (0.96) and low C4 (0.94) than Cluster 2 (0.51 and 0.52, respectively). Cluster 1 also had a higher prevalence of acute cutaneous lupus (0.59) and oral ulcers (0.30) than the other cluster. A proportion of 0.91 of the patients in Cluster 1 suffered from renal disorders compared to 0.15 in Cluster 2. Table 2 shows the prevalence of features detected using the K-means method, where the maximum is one and the minimum is zero.

5.4 The Third Experiment: Spectral Clustering

5.4.1 Results Analysis for K = 2 Two separate clusters were identified using the spectral method; the first cluster included 44 patients (one male and 43 females), while the second cluster contained 94 patients (4 males and 90 females). The radar chart in Fig. 6 shows the feature distribution in Cluster 1 and Cluster 2 using the spectral method; it shows that Cluster 1 (severe) has the highest prevalence of features (lupus nephritis, hemolytic anemia, low complement (C3 and C4), and positive anti-dsDNA) compared to Cluster 2.
Table 3 shows the prevalence of features detected using the spectral method, where the maximum is one and the minimum is zero. Cluster 2 (n = 93) was the largest cluster, with the lowest prevalence of renal disorder symptoms (0.21) compared to Cluster 1 (0.95). Cluster 1 patients were significantly higher in low C3 (0.98) and low C4 (0.95) than those in Cluster 2 (0.55 for low C3 and 0.56 for low C4). Cluster 1 had a higher prevalence of acute cutaneous lupus (0.64) and hemolytic anemia (0.70) than Cluster 2 (0.37 for acute cutaneous lupus and 0.17 for hemolytic anemia). However, there were no differences in the prevalence of leukopenia between the two clusters.

Table 2. Prevalence of features detected using K-Mean method in clusters where the
maximum is one and minimum is zero

No. Feature Cluster 1 (n = 54) Cluster 2 (n = 84)


1 Male 0.02 0.05
2 Female 0.98 0.95
3 Fever 0.39 0.24
4 ACL 0.59 0.37
5 CCL 0.04 0.04
6 Oral Ulcers 0.30 0.15
7 Alopecia 0.44 0.39
8 Joint pain 0.87 0.88
9 Serositis 0.07 0.06
10 Renal disorders 0.91 0.15
11 Proteinuria 0.76 0.12
12 Vasculitis 0.11 0.07
13 Neurologic 0.17 0.10
14 Hemolytic Anemia 0.59 0.18
15 Combs 0.59 0.19
16 Leukopenia 0.15 0.12
17 Thrombocytopenia 0.13 0.05
18 Anti-dsDNA 0.81 0.69
19 Anti-Sm 0.15 0.11
20 Anti-Phospholipid 0.37 0.31
21 Low C3 0.96 0.51
22 Low C4 0.94 0.52

6 Clustering Validation
Clustering validation has long been recognized as one of the vital issues essential
to the success of clustering applications. As described in Sect. 3.4 we will use
two kinds of cluster validation, internal cluster validation and external cluster
validation.

6.1 Internal Cluster Validation


As seen in Fig. 2, the drop in the sum of squared distances starts to slow down after k = 2. We can verify this by calculating the silhouette coefficient and CH score. Table 4 shows the validity results with the silhouette measure and the Calinski-Harabasz index (CH); we observe that both the silhouette score and the CH score reach their maximum value at K = 2 for the three clustering methods. We can therefore conclude that K = 2 is the optimal number of clusters for this dataset.

Fig. 6. The features distribution in the two clusters using spectral clustering

Table 3. Prevalence of features detected using spectral method in clusters where the
maximum is one and minimum is zero

No. Feature Cluster 1 (n = 45) Cluster 2 (n = 93)


1 Male 0.02 0.04
2 Female 0.98 0.96
3 Fever 0.45 0.22
4 ACL 0.64 0.37
5 CCL 0.02 0.04
6 Oral Ulcers 0.30 0.17
7 Alopecia 0.39 0.43
8 Joint pain 0.82 0.90
9 Serositis 0.09 0.05
10 Renal disorders 0.95 0.21
11 Proteinuria 0.82 0.16
12 Vasculitis 0.09 0.09
13 Neurologic 0.16 0.11
14 Hemolytic Anemia 0.70 0.17
15 Combs 0.70 0.18
16 Leukopenia 0.14 0.13
17 Thrombocytopenia 0.14 0.05
18 Anti-dsDNA 0.82 0.70
19 Anti-Sm 0.14 0.12
20 Anti-Phospholipid 0.41 0.30
21 Low C3 0.98 0.55
22 Low C4 0.95 0.56

6.2 External Cluster Validation


The clinician evaluated twenty-two patients using the physician global assessment (PGA) and the disease activity index (SLEDAI) discussed in Sect. 3.4.2.

Table 4. Silhouette coefficient and Calinski Harabasz score

Method No. cluster Silhouette score CH score


Hierarchical clustering 2 0.15 22.71
3 0.12 18.64
4 0.11 14.83
K-Means clustering 2 0.13 21.34
3 0.09 15.92
4 0.09 13.67
Spectral clustering 2 0.13 20.05
3 0.07 11.59
4 0.06 10.48

PGA categorizes patients into four categories: severe, medium, mild, or none (remission), while SLEDAI has five categories depending on the SLEDAI score: no activity when the score is zero, mild activity when the score is between one and five, moderate activity when the score is between six and ten, high activity when the score is between eleven and nineteen, and very high activity when the score is greater than or equal to twenty.
Table 5 shows the clinician evaluation (PGA and SLEDAI) and the clustering result for the evaluated SLE patients. All three methods group the "none" cases evaluated using PGA in the same cluster except for two cases. The hierarchical agglomerative and K-means methods group all mild cases in the same cluster except one case, while the spectral method groups two cases in one cluster and the other two cases in another cluster. All medium cases are clustered in the same cluster by the three clustering methods. All severe cases are clustered in the same group by the hierarchical agglomerative and K-means methods, while the spectral method groups all cases in the same cluster except one case.
The evaluation of the clustering results against SLEDAI shows that all very high activity and high activity cases are clustered in the same group by the three methods. All moderate activity cases are clustered in the same group by the hierarchical agglomerative and K-means methods, while the spectral method groups all cases in the same group except one case. Regarding the no-activity cases, all three methods group the cases in the same cluster except one case.
The reason for missing classified some patients in the same cluster was the
period between collecting data and assessing patients; The data was collected in
different periods. Most of it in the diagnosis period, while the patients were
recently evaluated. When patients’ diagnosed with SLE, some patients were
severe in the age-onset and then alternating to remissions or vice versa.

7 Discussion
To the best of our knowledge, this is the first study aimed to establish SLE
Omani patients clusters, which leads to identify clusters in patients based on
Table 5. Clustering result validation using physician global assessment and disease activity index

Assessment | Cases | Patients | Hierarchical agglomerative | K-means | Spectral | Percentage
PGA | None | 8 | All cases in the same cluster except two (SLE 54, SLE 55) | All cases in the same cluster except two (SLE 54, SLE 55) | All cases in the same cluster except two (SLE 54, SLE 55) | 75%
PGA | Mild | 4 | All cases in the same cluster except one (SLE 5) | All cases in the same cluster except one (SLE 5) | Two cases in one cluster (SLE 41, SLE 50) and two cases in another cluster (SLE 40, SLE 46) | 75%
PGA | Medium | 5 | All cases in the same cluster | All cases in the same cluster | All cases in the same cluster | 100%
PGA | Severe | 5 | All cases in the same cluster | All cases in the same cluster | All cases in the same cluster except one (SLE 44) | 100%
SLEDAI | No activity | 6 | All cases in the same cluster except one (SLE 54) | All cases in the same cluster except one (SLE 54) | All cases in the same cluster except one (SLE 54) | 83.3%
SLEDAI | Mild activity | 5 | Two cases in one cluster (SLE 40, SLE 51) and three cases in another (SLE 41, SLE 46, SLE 55) | Two cases in one cluster (SLE 40, SLE 51) and three cases in another (SLE 41, SLE 46, SLE 55) | Two cases in one cluster (SLE 41, SLE 55) and three cases in another (SLE 40, SLE 46, SLE 51) | 60%
SLEDAI | Moderate activity | 4 | All cases in the same cluster | All cases in the same cluster | All cases in the same cluster except one (SLE 44) | 100%
SLEDAI | High activity | 3 | All cases in the same cluster | All cases in the same cluster | All cases in the same cluster | 100%
SLEDAI | Very high activity | 4 | All cases in the same cluster | All cases in the same cluster | All cases in the same cluster | 100%

The clustering results give researchers an improved way to group heterogeneous patients. For instance, patients that cluster together may develop the same symptoms and therefore share the same cause and/or underlying immunopathology. We explored the prevalence
of the symptoms in Omani SLE patients. The most frequent symptoms were
joint pain, followed by acute cutaneous lupus (ACL), renal disorder, hemolytic
anemia, anti-dsDNA antibody, alopecia, and low complements (C3, C4).
In the present study, the patients are clustered into two clusters according to the optimal number of clusters. We found that severe and medium cases clustered in the same group. These patients suffered from rash, lupus nephritis, hemolytic anemia, low complement (C3 and C4), and positive anti-dsDNA. Remission and mild cases also clustered in the same group; these patients suffered mainly from joint pain. 90% of severe and medium cases were grouped together, as were 81.8% of none and mild cases.
We call the first cluster the severe cluster and the second the mild cluster. The
severe cluster has a high prevalence of renal disorder, hemolytic anemia, anti-
dsDNA antibody, and low complements (C3, C4), which indicate disease severity
in SLE patients in Oman. The mild cluster was associated with joint pain, low
complements (C3, C4), and a positive anti-dsDNA antibody.
By identifying the patient clusters, we confirmed that renal disorder,
hemolytic anemia, and low complements (C3, C4) are related features. These
features were highest in the severe cluster (high disease activity). Joint pain was not a distinctive symptom of disease severity, but we observed that it tends to be one of the most common symptoms in Omani patients with SLE.

8 Conclusion and Future Work


In this study, we used the most common types of clustering techniques, namely k-means, agglomerative hierarchical clustering, and spectral clustering, in order to identify severity clusters in Omani patients with SLE. We collected the dataset for this research from SQUH and preprocessed it for the clustering experiments. The results of the clustering experiments were validated using two types of cluster validation: internal cluster validation and external cluster validation.
Two separate patient clusters were identified: a severe cluster and a mild cluster. The results demonstrated the relation between symptom prevalence and disease activity. Our findings show that the symptoms renal disorder, hemolytic anemia, low complement (C3, C4), and positive anti-dsDNA are associated with severe cases of SLE disease activity. These findings can be helpful for research purposes and patient management. In the future, we plan to expand on this research as follows:
1. Collect longitudinal data to create a model in order to predict chronic damage
in SLE patients.
2. Use unsupervised feature selection methods to eliminate irrelevant and redundant features and optimize the clustering results.
3. Investigate the impact of other distance metrics on the results of clustering
techniques.

Automated Real-Time Recognition
of Non-emotional Conversational Head-Gestures
for Social Robots

Aditi Singh(B) and Arvind K. Bansal

Department of Computer Science, Kent State University, Kent, OH 44242, USA


{asingh37,akbansal}@kent.edu

Abstract. Social robotics promises to augment human caretakers in the medical industry, elderly care industry, entertainment industry, education industry and
space and deep-sea explorations in the future. Automated understanding of human
facial expression, speech, non-emotional conversational gestures and their integra-
tion are necessary for human-robot interaction. Conversational gestures comprise
conversational head-gestures, hand-gestures, gaze, lip movements, and their syn-
chronous integration with speech. In this paper, we implement a synchronous col-
ored Petri net model for automated recognition of non-emotional conversational
head-gestures. The scheme integrates head-motion analysis, eye-focus analysis,
and their synchronization. The technique performs video analysis to derive x and
y coordinates of facial feature points, stillness-vector, and silence-vector in real-
time. These vectors are analyzed to derive a signature comprising meta-attribute
values of the corresponding synchronous Petri net graph for each gesture. These
signatures are matched against archived signatures to recognize and label the actual
gestures in real-time. An algorithm using dynamic matrix-based implementation
has been presented. Conversational head-gestures have been partitioned into mul-
tiple classes based upon the combination of type of head-motions, eye-focus,
repeated motion, and associated speech to reduce ambiguities in gesture-labeling
caused by sensor inaccuracies, sampling interval choices and various threshold
limitations. A confusion matrix for a subset of gestures shows that signatures and
classification on major attributes achieve a high percentage of recall in gesture
recognition.

Keywords: Human-robot interaction · Gesture recognition · Petri net · Social robotics · Synchronization

1 Introduction

In recent years, developed countries have been facing a growing aging population [1]. It is
anticipated that the world population will soon plateau and start declining by 2050 [1].
This will cause a severe lack of availability of human caretakers. Social or assistive
robots will be needed to provide mental, physical, and social support for special needs,
including medical industry or elderly care industry [2] for therapy or care [3], education


industry as teachers [4] and space or deep-sea exploration [5]. Social robots will share
a common space with humans. To improve acceptability and interact, social robots will
have to comprehend human emotions, speech, and gestures, and generate natural human
comprehensible gestures and speech [6, 7].
Human-robot interaction, including conversation, is mostly non-emotional.
Researchers have developed many models for the analysis and generation of multi-
pose conversational head-gestures based upon a combination of head-nod, headshake,
head-tilt and eye-focus for dynamic adaptation and learning [8, 23]. Researchers have
also developed a synchronized colored Petri net-based model for multimedia gesture
analysis and training of social robots, one gesture at a time [9]. The training model uses silence, motion, stillness, and synchronization between motions and speech [9]. However, that work does not propose an implementation of the model that continuously detects conversational gestures during real-time interaction.
In this paper, we analyze a combination of motion sequence, eye-focus, cyclic
motions, and their synchronization associated with non-emotional conversational head-
gestures and implement the synchronous colored Petri net for real-time automatic recog-
nition of gesture boundaries and gestures in the wild. We extract vector of meta-attributes
for each gesture and match vectors for the incoming gestures and vectors in the knowl-
edge base to identify gestures in the wild. We also use head-motion and eye-focus based
gesture classification for better disambiguation between similar gestures.
The implemented model has five modules, as illustrated in Fig. 1. The first module
derives motion, stillness and silence vector using video analysis of facial feature-points
and speech intensity analysis. The second module derives places, transitions, delays, and
synchronization for each gesture using dynamically generated matrices for the corre-
sponding Petri net graphs. The third module derives the meta-attributes of each derived
gesture-graph. Meta-attributes include types of head-motion, eye-focus, types of syn-
chronization, and number of places, transitions, and cycles. The fourth module matches
the signatures derived from meta-attributes of analyzed gestures with archived signa-
tures to label the gestures being analyzed. The fifth module performs gesture disambiguation to handle errors introduced in the places and transitions by small head and eye motions.
The major contributions of this research are:

1. Real time recognition of multiple conversational head gestures in the wild.


2. Meta-attributes based gesture matching for fast gesture recognition.
3. Classification based upon major attributes: head-motion, eye-focus, repeated actions,
and speech to enhance labeling accuracy of actual gesture.

The overall organization of the paper is: Sect. 2 presents the background of conver-
sational head gestures, synchronous gesture modeling, and the derivation of the basic
elements of Petri net graphs. Section 3 describes the related work. Section 4 discusses
synchronous colored petri net modeling of conversational head-gestures and the deriva-
tion of corresponding meta-properties. Section 5 describes matching of meta-properties
to derive similar gestures to resolve ambiguity. Section 6 describes implementations and
algorithms. Section 7 describes the experimentation and discusses the results. Section 8
concludes the paper and describes future work.

Fig. 1. Overall gesture recognition model: video source → video analysis for gesture-boundary, places, and transitions → creating synchronous colored Petri net graph using matrices → meta-attribute and synchronization analysis → meta-attributes matching and similarity analysis → ambiguity resolution → recognized gesture.

2 Background
2.1 Conversational Non-emotional Head Gestures

Conversational head gestures comprise a sequence of combinations of different head-motion types and head postures, combined synchronously with spoken phrases. Conver-
sational head-gestures refer to entities, express warmth when combined with haptics, and
augment speech and facial expressions. The 36 conversational head-gestures described
by researchers [9] are: acceptance/appreciation, argumentation, arrogance, avoidance,
backchanneling, confidence, confusion, discouragement, disagreement, defensive, defi-
ance, denial, depressed, dominate, encourage, expectation, frustration, greeting, inclu-
sion, interest, interject, interrogation, permission, persuasion, pleading, questioning,
rejection, relaxation, request, ridicule, seeking attention, submission, unsure, and veiled
disagreement.
Major attributes needed to classify and label gestures are head-nod, head-shake,
head-tilt, eye-focus, repeated motion, and synchronization between motions of multiple
organs and speech.
The lack of synchronization between head-motions, eye-movement and speech
causes distortion and incorrect labeling of the actual gesture [9]. Temporal synchroniza-
tion between dialogs and conversational head-motions exhibits six different Allen’s syn-
chronization [10, 11]: sequential, concurrent, start synchronization, end synchronization,
strict synchronization and during [9].

2.2 Synchronized Colored Petri Net

A Petri net is a directed cyclic graphical token-based model of embedded concurrent


processes [12]. The graph comprises two types of nodes: places and transitions. Places are connected to transitions, and transitions are connected to places. A transition fires once the number of tokens in its input places reaches the number of tokens required to fire.
Tokens flow from source place to destination place through a transition node. Repetitions
of a set of processes and transitions are modeled as cycles. Colored Petri nets fold the
similar parts of the Petri net using colors. For example, motion in 3D-plane is modeled
using three colors, one for each dimension.
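As a toy illustration of this token-firing semantics (not the paper's model, and with all names being ours), a transition fires only when each of its input places holds the required tokens, consuming them and producing tokens in its output places.

```python
# Toy illustration of token flow in a Petri net; not the paper's model.
def fire(tokens, transition):
    """tokens: {place: count}; transition: (input_weights, output_weights)."""
    inputs, outputs = transition
    if all(tokens.get(p, 0) >= w for p, w in inputs.items()):
        for p, w in inputs.items():
            tokens[p] -= w                      # consume input tokens
        for p, w in outputs.items():
            tokens[p] = tokens.get(p, 0) + w    # produce output tokens
        return True
    return False                                # transition not enabled yet

tokens = {"P1": 1}
t1 = ({"P1": 1}, {"P2": 1})      # transition from place P1 to place P2
fire(tokens, t1)
print(tokens)                    # {'P1': 0, 'P2': 1}
```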

Fig. 2. Modeling synchronization and cycles in a synchronous colored Petri net: (a) start synchronization, (b) end synchronization, (c) during synchronization (δ > +1), (d) simple cycle.

Temporal synchronization between two coordinating concurrent processes is mod-


eled using triggering nodes and starting delays or ending delays associated with places,
transitions, or edges [9]. A trigger node (see Fig. 2) sends tokens from a place to multiple
transitions to start multiple concurrent processes in a synchronized manner. By associ-
ating delay along the edges, a process is regulated to start after a fixed delay. Delays at
the destination place during a concurrent process simulate the effect of waiting for other
processes to terminate, allowing end synchronization [9].

2.3 Deriving Basic Elements of Petri Net Graph

Video analysis of face motion is used to derive a head-motion. Head-motion measure-


ments return values of x-coordinates and y-coordinates of facial feature-points [9, 23].
A head is still if the change in x-coordinate and y-coordinate is below a threshold for an
empirically derived longer time-interval [9].
A head moves if one or more coordinates change above a threshold during the
next sampling interval. Head-motion corresponds to head-nod if y-coordinate changes
above a threshold and the x-coordinate movement is below a threshold in the next sample.

Head-motion corresponds to head-shake if the x-coordinate changes above a threshold and the y-coordinate movement is below a threshold in the next sample. During head-tilt, both
x-coordinate and y-coordinate change above the threshold.
A place is detected if a head is in the relaxed-state, still-state, changing head-motion
type or reversing direction to repeat the previous motion. A transition occurs between
two places, caused by head-motion or speech. In the relaxed-state, head remains still
and silent for the duration above an empirically derived longer temporal threshold. A
conversational gesture occurs between two adjacent relaxed-states. Repeated motions
such as repeated head-shake are modeled by cyclic transitions between two places. To
find a cycle in a Petri net, Euclidean distance between new coordinates and coordinates
of one of the visited places should be below a threshold.
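A minimal sketch of these coordinate-difference tests is shown below; the function is our own illustration, not the authors' code, and the thresholds are taken from the empirical still-head limits reported later in Sect. 7.

```python
# Illustrative coordinate-difference tests for head-motion type; thresholds
# follow the empirical still-head limits reported later in the paper.
X_STILL = 4.0   # max |dx| for a still head
Y_STILL = 3.0   # max |dy| for a still head

def classify_motion(prev_xy, curr_xy, x_thresh=X_STILL, y_thresh=Y_STILL):
    """Classify the head state between two consecutive samples of the
    facial feature-point centroid."""
    dx = abs(curr_xy[0] - prev_xy[0])
    dy = abs(curr_xy[1] - prev_xy[1])
    if dx <= x_thresh and dy <= y_thresh:
        return "still"        # candidate place if the state persists long enough
    if dy > y_thresh and dx <= x_thresh:
        return "head-nod"     # vertical motion only
    if dx > x_thresh and dy <= y_thresh:
        return "head-shake"   # horizontal motion only
    return "head-tilt"        # both coordinates change above their thresholds
```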

3 Related Works
Video analysis for gesture recognition is classified as a subclass of Human Action Recog-
nition (HAR) [13]. HAR has many applications such as video surveillance, healthcare,
and human-robot interaction [14–17]. Researchers have used 2D Convolutional Neu-
ral Network (CNN) for HAR [14–17] and 3D convolution kernel and depth sensors to
extract spatio-temporal data [18–22].
Conversational gesture analysis differs from HAR tasks such as identifying walking and running because of the embedded nonverbal communication and interaction and the inherent
synchronization of multiple actions such as head-motion, hand-motion, eye-focusing,
lip-movements, and speech.
The current research on head-gesture recognition is limited to 1) video analysis
for basic head-motion (head-nod and head-shake) detection [23]; and 2) hand-motion
analysis during conversation using various artificial intelligence methodologies such as
Hidden Markov Model (HMM), Dynamic Bayesian Network (DBN), and Long Short-
Term Memory (LSTM) [23, 24]. However, the research does not capture the temporal
synchronization of concurrent motions and speech needed for conversational gesture
analysis.
Automatic head-nod detection methods have been developed in the context of
Human-Computer Interaction (HCI) to facilitate affirmation [25]. Researchers have
proposed designs to track the facial points in combination with finite state machines
(FSM) and HMM for head-nod detection using facial feature-detections, including mul-
tipose detections [23–29]. Researchers also combine head-posture analysis with head-
motion analysis [23, 24]. However, these methods do not derive the combination of
head-movements for different conversational head-gestures as identified by behavioral
and clinical psychologists.

3.1 Gesture Recognition

The existing methods of gesture recognition comprise 3D pose-estimation, motion-


estimation, and temporal sequence analysis [30, 31]. Facial pose has been used to
estimate frontal pose estimate [32, 33]. The temporal sequence analysis of gestures
Automated Real-Time Recognition 437

was employed using a neural network [30]. The limitations of face-pose estimations are
that they are not sensitive to small head-pose changes.
Automated co-speech gesture recognition combines multimedia (video stream,
including sound) analysis. Pioneering research has been done on exploiting cognitive
models of conversational gesture recognition and generation, along with limited con-
current speech generation capability in ASIMO robots [34]. However, the model lacks
Allen’s synchronization in movements between multiple organs, and synchronization
in speech and organ movements for realistic, human-like interactions. Lack of synchro-
nization causes perceptual distortion. They also do not have a formal declarative model
required for real-time learning and adaptation.
In recent years, researchers have also used deep neural networks (DNN) such as CNN
[35] or fusion of CNN and LSTM [36] to classify and label gestures from head-motions.
Both studies suffer from incomplete classification of head-movements in co-speech ges-
tures and do not take synchronization and cycles into account. Ad hoc grouping of head-
motion classes [36] is not as concise as synchronous Petri net based abstractions, which are also invariant to individual-specific and gesture-specific motion-speed variations.

3.2 Petri Net Models for Gesture Recognition


Current research on gesture analysis focuses on Human Activity Recognition (HAR)
and recognizing subsets of hand-gestures using varied combinations of lidar or Kinect
based depth analysis; silhouette analysis of still images; image-segmentation based upon
color, shape, and motion; machine learning techniques such as HMM, DBN, integration
of convolution with LSTM and Petri nets [37–43]. These studies do not analyze inter-
organ gesture synchronization and synchronization between speech and other organs
involved in conversational gestures.
Researchers have also developed a Petri net model of conversational head-gestures that uses silence and stillness vectors to train social robots, one gesture at a time [9]. However, the proposed methodology is based upon user-based labeling of
gestures, assumes training one gesture at a time, and does not propose any scheme to
match acquired Petri net with archived Petri Net for gesture recognition in the wild.
In this research, the relaxed state is automatically identified; a dynamic matrix along
with a vector of meta-attributes model synchronous colored Petri net; the dynamically
generated matrix is analyzed to derive meta-attributes. The archived vectors of meta-
attributes are matched with the vector of meta-attributes of gesture being analyzed to
derive the best match. We also use the semantic association of the head-nod with affirma-
tion, head-shake with negation, head-tilt with emphasis and eye-focus with attentiveness
along with the derived meta-properties to develop a decision tree for better ambiguity
resolution between two gestures with similar meta-attributes. To our knowledge, this
is the first effort to recognize the sequence of complex non-emotional conversational
head-gestures that supports temporal synchronization of motions and speech.

4 Modeling Synchronous Colored Petri Net


Implementing a synchronous colored Petri net comprises an (m + n) × (m + n) matrix and an (m + n) vector of property-tuples, where m is the number of places and n is the number of transitions. Each cell of the matrix is a 5-tuple of the form (connectivity, edge-delay,
colors, part-of-cycle, type-of-synchronization). Connectivity is ‘1’ if the corresponding
nodes are connected; edge-delay is non-zero in the case of starting delay of processes in
‘during synchronization’; colors describe the dimensions in which motion takes place;
part-of-cycle marks the edge as part of a cycle; and type-of-synchronization describes
the type of synchronization associated with the edge. For example, for head-shake color-
entry is ‘x-dimension;’ for head-nod color-entry is ‘y-dimension’; for head-tilt color-entry
is {x-dimension, y-dimension}. The matrix is built dynamically during video analysis,
one node at a time based upon motion, stillness, and silence analysis. All the place-labels
are coalesced consecutively followed by all the transition-labels.
The property tuple for each node is {node-type, in-degree, out-degree, cycles, syn-
chronization, delays}. Node-type could be ‘place’, ‘transition’, or ‘trigger-node’. In-
degree is the number of incoming edges to the node. In-degree is greater than 1 when
(1) two concurrent processes merge; and (2) presence of a cycle is caused by a repeated
sequence of head-motions. Out-degree is greater than 1 when (1) two concurrent pro-
cesses are spawned; and 2) non-deterministic junction where a motion may repeat,
change the motion-type, or terminate motion.
Synchronization in Petri net is derived using in-degree and out-degree analysis in
the associated connected nodes. A transition node with in-degree ≤1 and out-degree >1
indicates the beginning of synchronization. A place with in-degree >1 and out-degree
≤1 indicates termination of synchronization.
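One possible realization of the cell 5-tuples and node property-tuples described above is sketched below; the field and function names are our own assumptions, not the authors' implementation.

```python
# Illustrative data structures for the dynamic Petri net matrix; names are ours.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class EdgeCell:                       # the 5-tuple stored in each matrix cell
    connectivity: int = 0             # 1 if the two nodes are connected
    edge_delay: float = 0.0           # non-zero starting delay ('during' sync)
    colors: Set[str] = field(default_factory=set)   # {"x"}, {"y"}, or {"x", "y"}
    part_of_cycle: bool = False       # edge belongs to a repeated-motion cycle
    sync_type: str = "none"           # start / end / strict / during / none

@dataclass
class NodeProps:                      # the property tuple kept for each node
    node_type: str                    # 'place', 'transition', or 'trigger-node'
    in_degree: int = 0
    out_degree: int = 0
    cycles: int = 0
    synchronization: str = "none"
    delays: float = 0.0

def empty_matrix(m: int, n: int) -> List[List[EdgeCell]]:
    """(m + n) x (m + n) matrix with place labels coalesced before transitions."""
    size = m + n
    return [[EdgeCell() for _ in range(size)] for _ in range(size)]
```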

Example 1. Gesture ‘appreciate’


Figure 3(a) describes a synchronous colored Petri net graph for the conversational head-
gesture ‘appreciate’. In appreciation, first the head is tilted, then the head moves down
synchronously with a speech phrase such as ‘good’. After the agent becomes silent, the
head is moved back from the tilted-down position to a relaxed position.
Gesture ‘appreciate’ starts with relaxed head (place P1 ) followed by a transition
head-tilt Tr 1 leading to tilted-head P2 . The trigger Tr 2 enables the start-synchronization,
leading to a tilted head-down position (place P3 ) and speech P4 .
After the synchronized motion, the places speech and a tilted head-down position are
followed nondeterministically by transitions to place tilted head with silence (transition
Tr 4 followed by place P5 ). The place P5 is followed by a transition to relaxed head
position with silence (transition Tr 5 followed by place P6 ).
The corresponding Petri net graph is acyclic and has six places (P1 to P6 ), five
transitions (Tr 1 to Tr 5 ), one strict synchronization between P2 and P5 . The transition
node Tr 2 is a trigger-node that starts a strict-synchronization, and node P5 ends the
synchronization. It creates an 11 × 11 matrix and a vector of 11 tuples.

Example 2. Gesture ‘disagreement’


Figure 3(b) describes a synchronous colored Petri net graph for the conversational head-
gesture disagreement. During disagreement, first the head shakes left, then the head
moves right synchronously along with a speech phrase such as ‘no’. The head shakes
repeatedly from left to right in a simple cycle, and then the head moves back to the
relaxed position, and the gesture ends.

Fig. 3. Synchronous colored Petri net for the gestures 'appreciate' (a) and 'disagreement' (b).

The corresponding Petri net has six places (P1 to P6 ), five transitions (Tr 1 to Tr 5 ),
one strict synchronization between P2 and P5 . The transition node Tr 1 is a trigger-node
that starts a synchronization, node P2 is the entry point of the cycle and node P4 is the
end of the cycle. Tr 2 and Tr 3 are transitions involved in a cycle. It creates an 11 × 11
matrix.

5 Signature and Gesture Recognition

Each gesture has a signature. Each signature-tuple defines ((head-nod, direction), (head-
shake, direction), (head-tilt, direction), eye-focus, number of places, number of tran-
sitions, number of start-synchronization, number of end-synchronization, number of
strict-synchronization, number of during-synchronization, number of concurrent asyn-
chronous actions, number of cycles, speech). Head-nod, head-shake, Head-tilt, eye-
focus and speech have binary values, and other attributes are multivalued integers. The
signature of conversational head-gestures is given in Table 1.

Table 1. Signature of the non-emotional conversational head-gestures.

Gesture Class Signature


Appreciation 7 ((1, 1), (0, 0), (1, 1), 0, 6, 5, 1, 0, 0, 0, 0, 0, *)
Agreement 8 ((1, 1), (0, 0), (1, 1), 1, 9, 8, 1, 0, 0, 0, 0, 1, *)
Arrogance 6 ((1, 1), (0, 0), (1, 1), 0, 9, 8, 0, 0, 0, 0, 0, 0, 0)
Avoid 9 ((0, 0), (1, 1), (0, 0), 0, 5, 4, 1, 0, 0, 0, 0, 0, 0)
Backchannel 5 ((1, 1), (0, 0), (0, 0), 1, 7, 6, 1, 0, 0, 0, 0, 1, *)
Confusion 12 ((0, 0), (1, 1), (1, 1), 0, 7, 6, 1, 0, 0, 0, 0, 0, *)
Defensive 15 ((0, 0), (1, 1), (1, 1), 1, 9, 8, 1, 0, 0, 0, 0, 0, *)
Defiance 14 ((0, 0), (1, 1), (1, 1), 1, 9, 8, 1, 0, 0, 0, 0, 0, 0)
Denial 11 ((0, 0), (1, 1), (0, 0), 1, 5, 4, 0, 0, 0, 1, 0, 1, *)
Depressed 1 ((1, 1), (0, 0), (0, 0), 0, 2, 1, 0, 0, 0, 0, 0, 0, 0)
Disagreement 11 ((0, 0), (1, 1), (0, 0), 0, 6, 5, 1, 0, 0, 0, 0, 1, *)
Discourage 15 ((0, 0), (1, 1), (1, 1), 1, 6, 5, 1, 0, 0, 1, 0, 0, *)
Dominate 7 ((1, 1), (0, 0), (1, 1), 1, 10, 9, 2, 0, 0, 0, 0, 0, *)
Encourage 4 ((1, 1), (0, 0), (0, 0), 1, 6, 5, 0, 0, 0, 1, 0, 0, *)
Expect 3 ((1, 1), (0, 0), (0, 0), 0, 3, 1, 1, 0, 0, 0, 0, 0, 0)
Frustrated 13 ((0, 0), (1, 1), (1, 1), 1, 5, 4, 1, 0, 0, 1, 0, 0, *)
Greet 8 ((1, 1), (0, 0), (1, 1), 1, 10, 9, 1, 0, 0, 1, 0, 0, *)
Include 8 ((1, 1), (0, 0), (1, 1), 1, 8, 7, 1, 0, 0, 1, 0, 0, *)
Interested 7 ((1, 1), (0, 0), (1, 1), 0, 7, 6, 0, 0, 1, 0, 0, 0, *)
Interject 2 ((1, 1), (0, 0), (0, 0), 0, 5, 4, 0, 0, 0, 0, 0, 0, *)
Interrogation 5b ((1, 1), (0, 0), (0, 0), 1, 7, 7, 0, 0, 1, 0, 0, 1, *)
Permit 2 ((1, 1), (0, 0), (0, 0), 0, 5, 4, 1, 0, 0, 0, 0, 0, *)
Persuade 5b ((1, 1), (0, 0), (0, 0), 1, 8, 7, 1, 0, 1, 0, 0, 0, *)
Plead 2 ((1, 1), (0, 0), (0, 0), 0, 5, 4, 1, 0, 0, 0, 0, 0, *)
Question 4 ((1, 1), (0, 0), (0, 0), 0, 5, 4, 1, 0, 0, 0, 0, 0, *)
Reject 11 ((0, 0), (1, 1), (0, 0), 1, 3, 1, 0, 0, 0, 1, 0, 0, *)
Relax 1 ((1, 1), (0, 0), (0, 0), 0, 4, 1, 0, 0, 0, 0, 1, 0, 0)
Request 7 ((1, 1), (0, 0), (1, 1), 0, 6, 5, 0, 0, 1, 0, 0, 0, *)
Ridicule 15 ((0, 0), (1, 1), (1, 1), 1, 6, 5, 1, 0, 0, 1, 0, 0, *)
Seek attention 3 ((1, 1), (0, 0), (0, 0), 1, 8, 7, 2, 0, 0, 0, 0, 0, 0)
Submit 1 ((1, 1), (0, 0), (0, 0), 0, 6, 5, 0, 0, 0, 0, 0, 0, 0)
Veiled disagreement 10 ((0, 0), (1, 1), (0, 0), 0, 5, 4, 1, 0, 0, 0, 0, 0, 0)

The absence of head-nod, head-shake or head-tilt is denoted by ‘0’. The direction of


head-nod = ‘1’ denotes upward direction; head-nod = ‘0’ denotes downward direction.
The direction of head-shake = ‘1’ denotes head turning right; head-shake = ‘0’ denotes
head turning left. The direction of head-tilt in the right direction is denoted by ‘1’; head-
tilt in the left direction is denoted by ‘0’. A non-specific head-movement is denoted as
a ‘*’; ‘*’ matches both ‘0’ and ‘1’. An unfocused eye is denoted as ‘0’, and a focused eye is
denoted as ‘1’. The absence of a specific synchronization is denoted by ‘0’.
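A minimal sketch of matching an observed signature against an archived one under this encoding, with '*' acting as a wildcard, is given below; the flattening step and the function names are assumptions on our part, not the authors' matching code.

```python
# Illustrative wildcard matching of gesture signatures; names are ours.
def flatten(signature):
    """Flatten ((nod, dir), (shake, dir), (tilt, dir), rest...) into one list."""
    flat = []
    for item in signature:
        if isinstance(item, tuple):
            flat.extend(item)
        else:
            flat.append(item)
    return flat

def signatures_match(observed, archived):
    """Element-wise match in which '*' matches either '0' or '1'."""
    obs, arc = flatten(observed), flatten(archived)
    return len(obs) == len(arc) and all(
        o == a or "*" in (o, a) for o, a in zip(obs, arc)
    )

# Archived signature of 'Appreciation' (class 7) from Table 1:
appreciation = ((1, 1), (0, 0), (1, 1), 0, 6, 5, 1, 0, 0, 0, 0, 0, "*")
observed     = ((1, 1), (0, 0), (1, 1), 0, 6, 5, 1, 0, 0, 0, 0, 0, 1)
print(signatures_match(observed, appreciation))   # True
```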

5.1 Sampling Error and Ambiguity Resolution

If a place is missed due to a sampling error because sampling time ≥ the stillness thresh-
old, the erroneous signature may not match with the signature of the actual gesture, and
the actual gesture may be mislabeled. This labeling error is reduced using a classification
tree that groups gestures based on major attributes.
The classification tree is based on the semantics of the head-motion-type, eye-focus,
cycle (repeated motion) and speech. Head-nod and head-shake are mutually exclusive.

Table 2. Subclasses of gestures based upon major attributes

Major classification attributes Gestures


Class # Nod Shake Tilt Cycles Eye-focus Speech
1 1 0 0 0 0 0 Depressed; Relax; Submission
2 1 0 0 0 0 1 Interject; Permit; Plead
3 1 0 0 0 1 0 Expect; Seek Attention
4 1 0 0 0 1 1 Encourage; Question
5a 1 0 0 1 1 0 Backchannel
5b 1 0 0 1 1 1 Backchannel; Interrogation;
Persuade
6 1 0 1 0 0 0 Arrogance
7 1 0 1 0 0 1 Appreciate; Interested; Request;
Dominate
8 1 0 1 1 0 1 Agreement; Greet; Include
9 0 1 0 0 0 0 Avoid
10 0 1 0 1 0 0 Veiled disagreement
11 0 1 0 1 0 1 Denial; Disagreement; Reject
12 0 1 1 0 0 0 Confusion
13 0 1 1 0 0 1 Frustration
14 0 1 1 0 1 0 Defiance
15 0 1 1 1 1 1 Defensive; Discourage; Ridicule

Single motion (cycle = 0) and repeated motions (cycle = 1) are mostly mutually exclu-
sive. Similarly, focused-eye and unfocussed-eye are mutually exclusive. This mutual
exclusion helps in reducing labeling error in identified gestures.
Based upon major attributes, we classified gestures into fifteen classes. Within the
same class, gestures are separated further using motion direction, synchronization infor-
mation, place information and transition information for further resolution. The classes
are illustrated in Table 2.
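The class lookup implied by Table 2 can be expressed as a dictionary keyed on the six major attributes; the excerpt below mirrors a few rows of Table 2 and is only an illustration, not the authors' implementation.

```python
# Illustrative lookup of Table 2 classes from the six major attributes
# (nod, shake, tilt, cycles, eye_focus, speech); only a few classes are shown.
GESTURE_CLASSES = {
    (1, 0, 0, 0, 0, 0): ("1",  ["Depressed", "Relax", "Submission"]),
    (1, 0, 0, 0, 0, 1): ("2",  ["Interject", "Permit", "Plead"]),
    (0, 1, 0, 1, 0, 1): ("11", ["Denial", "Disagreement", "Reject"]),
    (0, 1, 1, 1, 1, 1): ("15", ["Defensive", "Discourage", "Ridicule"]),
}

def candidate_gestures(nod, shake, tilt, cycles, eye_focus, speech):
    """Return (class id, gestures) sharing the observed major attributes."""
    return GESTURE_CLASSES.get((nod, shake, tilt, cycles, eye_focus, speech),
                               ("unknown", []))

print(candidate_gestures(0, 1, 0, 1, 0, 1))   # ('11', ['Denial', 'Disagreement', 'Reject'])
```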

6 Automated Identification Algorithm

A dynamically constructed matrix is used to model the Petri net graph. Petri net graphs
are modeled using: 1) a matrix comprising (m + n) × (m + n) where m is the number
of places, and n is the number of transitions; 2) a vector of associated meta-properties
derived by matrix-analysis. Since there is no edge between places, and there is no edge
between transitions, the corresponding matrix segments have zero entries.
Facial video is analyzed to derive the sequence of facial feature-point coordinates.
Feature-point coordinates are analyzed to derive transition coordinates, stillness vector
stl and silence vector sil. Using changes in transition coordinates, various head-motion
types are derived.
Dynamic matrix is built by incrementing the place-label counter i or the transition-
label counter j after identifying a place or transition. The dynamic matrix is also analyzed
to derive synchronization information and cycles in the Petri net.
A simplified abstract algorithm for building Petri net matrix and continuous gesture
identification is given in Fig. 4. The algorithm uses stillness vector, silence vector, and the
coordinates of the feature-points during motion. For simplicity, the algorithm illustrates
synchronization between head-motions and speech.
A place is identified when the stillness vector stl transitions from ‘0’ to ‘1’. Head is
still if the distances between centroids of facial feature-points for consecutive sampling
time are below a statistically derived threshold. If a place, based on coordinates compar-
ison, has not already been visited, a new node-label Pi+1 is created, an edge is marked
between the previous label LM Prev → Pi+1 , place-counter i is incremented by one, and
previous label LM Prev is updated to new Pi .
A place is part of a cycle if the place has already been visited. Before creating a new
place-label, coordinates of the next place Pi+1 are compared with the coordinates of the
visited places in a memo table S place and searched using a similarity-based analysis. If
the Euclidean distance between the two coordinates is below a threshold, the new place
is the same as the visited place, and the label of the visited place is used instead of the
new label Pi+1 , and the new place index i is not incremented.
A cycle detection algorithm is initiated using the set S place to identify the places and
transitions involved in the cycle. All the nodes (places and transitions) in the cycle are
marked, and the corresponding meta-attribute vector is updated.
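A sketch of the visited-place lookup against the memo table S place is given below; the proximity threshold value and all names are illustrative assumptions rather than the paper's tuned values.

```python
# Illustrative lookup of a previously visited place; the threshold value is assumed.
import math

PROXIMITY_THRESHOLD = 5.0   # assumed; the paper tunes this proximity empirically

def find_visited_place(coord, s_place, threshold=PROXIMITY_THRESHOLD):
    """Return the label of a visited place whose stored coordinates lie within
    `threshold` (Euclidean distance) of `coord`; otherwise return None so that a
    new place label P_{i+1} is allocated."""
    for label, visited_coord in s_place.items():
        if math.dist(coord, visited_coord) <= threshold:
            return label        # reuse the visited place -> start of a cycle
    return None
```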
A transition is identified when the stillness vector stl transitions from ‘1’ to ‘0’. If a
transition is not a part of the cycle, then a new transition node Tr j +1 is created. An edge
LM prev → Tr j+1 is marked between the previous node (a place Pi ) and the new transition

Tr j+1 . The transition-counter j is incremented by one, and the new label Tr j is stored as
LM prev .
When the stillness-vector stl transitions from 1 → 0 and silence vector sil transitions
from 1 → 0 within a short time-lag δ, there is a concurrent occurrence of speech,
which indicates either start-synchronization or duration-synchronization. If the silence-
vector transitions within |δ| ≤ 1 time-unit, a Boolean variable start-synch is set, and
start_sync_count is incremented by one.

Fig. 4. A simplified algorithm for building a dynamic matrix for gesture recognition

A time-lag δ > +1 time-unit is treated as during synchronization, and the during-sync count is incremented by one. In both cases, a transition-trigger Tr j+1 is created.
The corresponding edge LM prev → Tr j+1 is created, and the transition-counter j is incre-
mented by one. The new transition Tr j is stored as LM prev . Two additional places Pi+1
(for motion) and Pi+2 (for speech) are created to separate the concurrent processes. Edges
LM prev → Pi+1 (for motion) and LM prev → Pi+2 (for speech) are set to mark the start of
concurrent actions. The place counter i is incremented by two. The motion place Pi+1 is
stored as LM prev , and the speech place Pi+2 is stored as LS prev . After the silence-vector
sil transitions from 0 → 1, a new transition Tr j+1 is created to mark the end of speech.
An edge is created between LS prev → Tr j+1 . LS prev is updated to speech transition Tr j+1 .
The transition counter j is incremented by one.
The endSynCount is incremented when silence-vector sil goes from 0 → 1, and
stillness-vector stl transitions from 0 → 1 within a time-delay δ end−sync . A new motion
place Pi+1 , is created. Two edges (LM prev → Pi+1 ) and (LS prev → Pi+1 ) are created. The
place-counter i is incremented by one, and the previous node LM prev is updated to Pi+1 .
If Boolean variables startSync or duringSync are true, then the variable strictSync-
Count is incremented by one, and all the places and transitions between transition-trigger
and the place ending synchronization are included in strict-synchronization.
A gesture ends if the head reaches the relaxed-state or the stillness-vector stl and the silence vector sil end. In the relaxed state, both the stillness vector stl and the silence vector sil have the value of 1 for a prolonged time above a statistically derived temporal
threshold.
After a gesture-boundary is identified, motion analysis is done based upon transition
colors, synchronization count, number of places and number of transitions, and the meta-
attribute vector mv is updated. At the end of each gesture-boundary, the dynamic matrix,
meta-attribute vector mv, and all the variables (except time-index t) are re-initialized for
the detection of the next gesture.
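The event detection driving the matrix construction reduces to edge detection on the stillness and silence vectors plus a time-lag test for synchronization. The sketch below is a simplified illustration of those two steps only (no matrix updates), and all names are ours.

```python
# Illustrative edge detection on the stillness (stl) and silence (sil) vectors.
def detect_events(stl, sil):
    """Return (time, event) pairs from the two 0/1 vectors."""
    events = []
    for t in range(1, len(stl)):
        if stl[t - 1] == 0 and stl[t] == 1:
            events.append((t, "place"))         # head becomes still
        elif stl[t - 1] == 1 and stl[t] == 0:
            events.append((t, "transition"))    # head starts moving
        if sil[t - 1] == 1 and sil[t] == 0:
            events.append((t, "speech-start"))  # speech begins
        elif sil[t - 1] == 0 and sil[t] == 1:
            events.append((t, "speech-end"))    # speech ends
    return events

def classify_sync(motion_start, speech_start):
    """Start synchronization if speech begins within one sample of the motion;
    a later onset (delta > +1) is treated as 'during' synchronization."""
    delta = speech_start - motion_start
    if abs(delta) <= 1:
        return "start-sync"
    return "during-sync" if delta > 1 else "none"
```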

7 Implementation and Discussion

The algorithms have been implemented in Python, interfaced with Visual Studio Code
(1.55.2), using the python library for face-detection and OpenCV library [44], PyAudio
library for speech analysis [45] and Pydub audio library for silence analysis [46]. The
software was executed on a machine with Intel(R) Core (TM) i7-6500U CPU @ 2.50 GHz
2.60 GHz, 64-bit, with 8 GB RAM. The system was developed using Python libraries.
Video frames were sampled 10 times/sec with a time interval between two frames
being 100 ms. The upper threshold for silence detection was 45 dB. The empirical
analysis of recorded data showed that the maximum threshold of random head-motion
for still head x-coordinates is ±4.0 and y-coordinates is ±3.0. The relaxed-state was
found to be in the region (x-origin ±8.5, y-origin ±4.0). Increasing the sampling rate
improves the recall percentage slightly at the cost of added computational overhead.
During the experimentation, there were instances where the software failed to detect the face, so the coordinates of the feature-points were unavailable. In such cases, we assumed continuity of motion, according to the Gestalt theory of cognition [47], and predicted the coordinate values xt = (xt-1 + xt+1)/2 and yt = (yt-1 + yt+1)/2.
This assumption leads to some measurement error if the video-frame for the endpoint
of motion is not sampled. We also identified perceptual time for gesture boundaries and
synchronization distortion.
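This continuity assumption amounts to averaging the neighbouring samples whenever a frame is dropped; a one-line helper (names ours) makes the rule explicit.

```python
def fill_dropped_frame(prev_xy, next_xy):
    """Predict a missing feature-point sample as the mean of its neighbours,
    i.e. x_t = (x_{t-1} + x_{t+1}) / 2, and likewise for y."""
    return tuple((a + b) / 2.0 for a, b in zip(prev_xy, next_xy))
```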
The experiment was repeated 500 times for each gesture. Table 3 shows a confusion
matrix (in percentage) to describe the accuracy and mislabeling of gestures appreciate,
interest and request (class 7), question (class 4) and interrogation (class 5b).

Table 3. Confusion matrix (in percent) for labelling some similar gestures

Actual gesture (class)   Labeled as other   Appreciate   Interest   Request   Question   Interrogate
Appreciate (7)           7.8                81.5         1.5        0.0       0.0        1.2
Interest (7)             0.0                0.0          89.4       1.4       0.0        3.4
Request (7)              2.1                0.0          1.5        87.0      2.5        0.0
Question (4)             6.4                0.0          0.0        0.8       85.0       1.7
Interrogate (5b)         3.2                0.0          3.7        0.0       0.9        91.0

The gestures ‘question’ and ‘interrogation’ are similar and share five major attributes:
head-nod, no head-shake, no head-tilt, eyes focused and speech (see Table 1 and Table 2).
However, ‘interrogation’ has a cycle and ‘question’ does not have a cycle. In addition,
additional attributes such as start-synchronization (in ‘question’), number of places and
transitions also differ.
The result shows a high percentage of recall (correct labeling of observed gestures)
for all five gestures. Recall varies from 81.5% for the actual gesture ‘appreciate’ to 91%
for the actual gesture ‘interrogate’. This is significant, as it establishes the importance
of motion analysis for accurate gesture identification.
There are many mislabelings: actual gesture ‘appreciate’ mislabeled as other gestures
(7.8%), as ‘interest’ (1.8%) and as ‘interrogate’ (1.2%); the actual gesture ‘interest’
mislabeled as ‘request’ (1.4%) and as ‘interrogation’ (3.4%); the actual gesture ‘request’
mislabeled as other gestures (2.1%), as ‘interest’ (1.5%) and as ‘question’ (3.4%).
The mislabeling between the gestures ‘appreciate’ and ‘interest’ stems from the
choice of threshold to derive start and end synchronization (see Table 1), the lack of
comprehension of the meaning of the associated dialogs and sensing inaccuracies. Larger
threshold for start synchronization treats during synchronization as start synchronization,
and a smaller threshold misses the start synchronization. Sensing inaccuracies are caused
by sampling instance of the motion and speech-digitization.
The mislabeling between the gestures ‘interest’ and ‘request’ is caused by missing
a place and error in the relaxed-state temporal threshold that may insert a place (see
Table 1) and the lack of dialog analysis. The mislabeling between the gestures ‘request’
and ‘interest’ can be reduced by dialog analysis.
There are mislabeling errors in the gestures ‘question’ and ‘interrogation’: ‘inter-
rogation’ is mislabeled as ‘question’ 0.9% of the time, and ‘question’ is mislabeled as

‘interrogation’ 1.7% of the time. Signature analysis in Table 1 shows that ‘interrogation’
being mislabeled as ‘question’ is caused by a missing cycle during error in the proximity
threshold to identify visited places. The mislabeling of the gesture ‘question’ as interro-
gation is caused by mixing of a small amount of ‘backchanneling’ with ‘question’. The
mislabeling can be reduced by dialog analysis and better tuning of proximity threshold.
The mislabeling of the gesture ‘interest’ as ‘interrogation’ (3.4%) and the gesture
‘interrogation’ as ‘interest’ (3.7%) are quite striking due to two gestures being in the
different classes based upon major attributes’ classification (see Table 2). The mislabel-
ing of ‘interrogation’ as ‘interest’ is caused by the differences in one or more of three
attributes (see Tables 1 and 2): 1) the missing of a cycle due to error in the proximity
threshold to identify visited places; 2) presence of the focused-eye during ‘interroga-
tion’ compared to unfocussed eye in ‘interested’; 3) the presence of slight tilt during
‘interrogation’. This mislabeling can be reduced by dialog analysis.
Mislabeling is also caused by missing small undetectable motions in gestures and
missing feature-points and frames in video analysis, resulting in missing places and
the corresponding transitions. Larger angular threshold for eye-focus causes mislabeling
such as disagreement being labeled as denial or discourage (see Table 1). Signatures
also lack the results of speech analysis, dialog-context, motion-speed, facial expression
analysis and number of cycles. To improve the accuracy, motion analysis needs to be
augmented with head-motion sequence, dialog context, facial expression analysis, and
dialog understanding.
An error in the measurement of actual changes in x and y coordinates during tilt
+ an acyclic head-shake can also be mislabeled as an acyclic head-nod + tilt in the
corresponding signature. Similarly, small tilt may be treated as ‘no tilt’ because of the
limitations in video analysis and feature-point detection.

8 Conclusion and Future Work

A generalized scheme has been described to analyze head-motions to label psycho-


logically derived sequence of non-emotional conversational head-gestures in real time.
The scheme is based upon integrating video analysis of head-motions to derive coor-
dinates of facial feature-points, dynamic matrix-based implementation of synchronous
colored Petri net model, summarizing the meta-attributes of the resulting dynamic-
matrix as signatures and matching the derived signatures with archived signatures using
similarity-based analysis.
The scheme is affected by the choices of sampling interval, thresholds, sensor inac-
curacies to detect a change in the coordinates of feature-points, which can miss some
places, resulting in gesture recognition errors based upon signatures that capture lim-
ited meta-attributes. Since the gesture-time depends upon age, diseases, and individuals’
gesture-patterns, there is a need for adaptation to adjust to individual behavior patterns.
Error is also caused by the change of both x-coordinate and y-coordinate values in head-
tilt, conflict between threshold values, the lack of feature-point changes in small motion,
missing the repeated motions due to small motions, lack of speech detection for lower
decibel sounds, the lack of facial expression analysis and dialog analysis.

These errors cause actual gestures to be mislabeled around 9%–17% of the time.
Despite mislabeling, the combination of motion analysis, synchronization information cap-
ture and cycles in the Petri net graph gives around 83%–91% accurate recognition of the
gestures.
Currently, we are looking at patterns of head-motion analysis and LSTM based anal-
ysis of stillness vectors and coordinate stream to reduce the error caused by sampling
intervals. Thresholds for start and end synchronization affect the detection of synchro-
nization and signatures accordingly. The threshold for the same gesture varies for indi-
viduals. Threshold and motion-speed also change with gestures [48]. Hence, thresholds
must be adaptive based upon gesture prediction. We are also looking at transfer learning
for adapting the learnt gesture for different age and gender. We are also looking into
separating head-motions involved in deictic gestures from the head-motions involved in
co-speech gestures. We are also doing speech analysis to separate co-speech gestures
such as ridicule and denial from gestures such as disagreement.

References
1. Yenilmez, M.I.: Economic and social consequences of population aging the dilemmas and
opportunities in the twenty-first century. Appl. Res. Qual. Life 10(4), 735–752 (2015). https://
doi.org/10.1007/s11482-014-9334-2
2. Agrigoroaie, R.M., Tapus, A.: Developing a healthcare robot with personalized behaviors and
social skills for the elderly. In: Proceedings of the 11th ACM/IEEE International Conference
on Human-Robot Interaction (HRI), pp. 589–590. Christchurch, New Zealand (2016). https://
doi.org/10.1109/HRI.2016.7451870
3. García, D.H., Esteban, P.G., Lee, H.R., Romeo, M., Senft, E., Billing, E.: Social robots in
therapy and care. In: Proceedings of the 14th ACM/IEEE International Conference on Human-
Robot Interaction (HRI), pp. 669–670. Daegu, Korea (2019). https://doi.org/10.1109/HRI.
2019.8673243
4. Rosenberg-Kima, R., Koren, Y., Yachini M., Gordon, G.: Human-robot-collaboration (HRC):
social robots as teaching assistants for training activities in small groups. In: Proceedings of the
14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 522–523.
Daegu, South Korea (2019). https://doi.org/10.1109/HRI.2019.8673103
5. Diftler, M.A., et al.: Robonaut 2 – the first humanoid robot in space. In: IEEE International
Conference on Robotics and Automation, pp. 2178–2183, Shanghai, China (2011)
6. Glas, D.F., Minato, T., Ishi, C.T., Kawahara, T., Ishiguro, H.: ERICA: the ERATO intelligent
conversational android. In: Proceedings of the 25th IEEE International Symposium on Robot
and Human Interactive Communication (RO-MAN), pp. 22–29, New York (2016)
7. Kendon, A.: Gesture: Visible Actions as Utterance. Cambridge University Press, Cambridge,
UK (2004)
8. Singh, A., Bansal, A.K.: Declarative modeling and implementation of robotic head-based
gestures for human-robot interaction. Int. J. Comput. Appl. 16(2), 49–66 (2019)
9. Singh, A., Bansal, A.K.: Towards synchronous model of non-emotional conversational gesture
generation in humanoids. In: K. Arai (ed.) Intelligent Computing. LNNS, vol. 283(1), pp. 737–
756. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-80119-9
10. Allen, J.F.: Maintaining knowledge about temporal intervals. Commun. ACM 26(11), 832–
843 (1983). https://doi.org/10.1145/182.358434
11. Singh, A., Bansal, A.K.: Towards modeling gestures for non-emotional conversational inter-
action by humanoid robots. In: Proceedings of the 31st International Conference on Computer
Applications in Industry and Engineering, pp. 59–64. New Orleans, LA, USA (2018)

12. David R., Alla, H.: Petri Nets & Grafcet, Tools for Modelling Discrete Event Systems, Prentice
Hall, New York, USA (1992)
13. Liu, H., Wang, L.: Gesture recognition for human-robot collaboration: a review. Int. J. Ind.
Ergon. 68, 355–367 (2018). https://doi.org/10.1016/j.ergon.2017.02.004
14. Wang, P., Li, Z., Hou, Y., Li, W.: Action recognition based on joint trajectory maps using
convolutional neural networks. In: Proceedings of the 24th International ACM Conference
on Multimedia, pp. 102–106. New York (2016) https://doi.org/10.1145/2964284.2967191
15. Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: a review. ACM Comput. Surv. 43(3),
Article 16 (2011). https://doi.org/10.1145/1922649.1922653
16. Liu, M., Liu, H., Chen, C.: Enhanced skeleton visualization for view invariant human
action recognition. Pattern Recogn. 68, 346–362 (2017). https://doi.org/10.1016/j.patcog.
2017.02.030
17. Gholamrezaii, M., Almodarresi, S.M.T.: Human activity recognition using 2D convolutional
neural networks. In: Proceedings of the 27th Iranian Conference on Electrical Engineer-
ing (ICEE), pp. 1682–1686. Yazd, Iran (2019). https://doi.org/10.1109/IranianCEE.2019.878
6625
18. Qiu, Z., Yao, T., Mei, T.: Learning spatio-temporal representation with pseudo-3D residual
networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV),
pp. 5534–5542. Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.590
19. Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recogni-
tion. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013). https://doi.org/10.1109/
TPAMI.2012.59
20. Arunnehru, J., Chamundeeswari, G., Bharathi, S.P.: Human action recognition using 3D con-
volutional neural networks with 3D motion cuboids in surveillance videos. Procedia Comput.
Sci. 133, 471–477 (2018). https://doi.org/10.1016/j.procs.2018.07.059
21. Yang, H., Yuan, C., Li, B., Du, Y., Xing, J., Hu, W., et al.: Asymmetric 3D convolutional
neural networks for action recognition. Pattern Recogn. 85, 1–12 (2019). https://doi.org/10.
1016/j.patcog.2018.07.028
22. Li, C., Zhong, Q., Xie, D., Pu, S.: Co-occurrence feature learning from skeleton data for action
recognition and detection with hierarchical aggregation. In: Proceedings of the International
Joint Conference on Artificial Intelligence (IJCAI), pp. 786–792. Stockholm, Sweden (2018).
https://doi.org/10.24963/ijcai.2018/109
23. Dong, L., Jin, Y., Tao, L., Xu, G.: Recognition of multi-pose head gestures in human con-
versations. In: Proceedings of the Fourth International Conference on Image and Graphics
(ICIG), pp. 650–654. Chengdu, China (2007). https://doi.org/10.1109/ICIG.2007.176
24. Thafar, M., Ghayoumi, M., Bansal, A.K.: A formal approach for multimodal integration to
derive emotions. J. Vis. Lang. Sent. Syst. 2, 48–54 (2016). https://doi.org/10.18293/DMS201
6030
25. Ishi, C.T., Liu, C., Ishiguro, H., Hagita, N.: Head motion during dialogue speech and nod
timing control in humanoid robots. In: Proceedings of the 5th ACM/IEEE International Con-
ference on Human-Robot Interaction (HRI), pp. 293–300. Osaka, Japan (2010). https://doi.
org/10.1109/HRI.2010.5453183
26. Kapoor, A., Picard, R.W.: A real-time head nod and shake detector. In: Proceedings of the
Workshop on Perceptive User Interfaces (ICMI-PUI), pp. 1–5. Orlando, FL, USA (2001).
https://doi.org/10.1145/971478.971509
27. Tan, W., Rong, G.: A real-time head nod and shake detector using HMMs. Expert Syst. Appl.
25(3), 461–466 (2003). https://doi.org/10.1016/S0957-4174(03)00088-5
28. Morency, L. P., Sidner, C., Lee, C., Darrell, T.: Contextual recognition of head gestures. In:
Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 18–24.
Trento, Italy (2005). https://doi.org/10.1145/1088463.1088470

29. Saunders, J., Syrdal, D.S., Koay, K.L., Burke, N., Dautenhahn, K.: Teach me–show me-end-
user personalization of a smart home and companion robot. IEEE Trans. Hum.-Mach. Syst.
46(1), 27–40 (2016). https://doi.org/10.1109/THMS.2015.2445105
30. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York, NY, USA
(2006)
31. Murase, H., Nayar, S.K.: Visual learning and recognition of 3-D objects from appearance.
Int. J. Comput. Vision 14(1), 5–24 (1995). https://doi.org/10.1007/BF01421486
32. Tang, J., Nakatsu, R.: A head gesture recognition algorithm. In: International Conference of
Multimedia Interfaces (ICMI), Beijing, China 2000, LNCS, vol. 1948, pp. 72–80. Springer,
Heidelberg (2000). https://doi.org/10.1007/3-540-40063-X_10
33. Lu, P., Zhang, M., Zhu, X., Wang, Y.: Head nod and shake recognition based on multi-
view model and Hidden Markov Model. In: Proceedings of the International Conference on
Computer Graphics, Imaging and Visualization (CGIV), pp. 61–64. Beijing, China (2005).
https://doi.org/10.1109/CGIV.2005.41
34. Ng-Thow-Hing, V., Luo, P., Okita, S.: Synchronized gesture and speech production for
humanoid robots. In: Proceedings of the IEEE/RSJ International Conference on Intelligent
Robots and Systems, pp. 4617–4624. Taipei, Taiwan (2010)
35. Otsuka, K., Tsumore, M.: Analyzing multifunctionality of head movements in face-to-face
conversations using deep convolutional neural networks. IEEE Access 8, 217169–217195
(2020). https://doi.org/10.1109/ACCESS.2020.3041672
36. Sharma, M., Ahmetovic, D., Jeni, L.A., Kitani, K.M., Recognizing visual signatures of spon-
taneous head gestures. In: Proceedings of the IEEE Winter Conference on Applications of
Computer Vision (WACV), pp. 400–408, Lake Tahoe, NV, USA (2018). https://doi.org/10.
1109/WACV.2018.00050
37. McGlaun, G., Althoff, F., Lang, M., Rigoll, G.: Robust video-based recognition of dynamic
head gestures in various domains - comparing a rule-based and a stochastic approach. In:
Antonio, C., Volpe, G. (eds.) 5TH International Gesture Workshop On Gesture-Based Com-
munication In Human-Computer Interaction (GW) 2003, LNAI, vol. 2915, pp. 180–197.
Springer-Verlag, Berlin Heidelberg (2004)
38. Lavee, G., Borzin, A., Rivlin, E., Rudzsky, M.: Building petri nets from video event ontologies.
In: Bebis, G., Tanveer S.-M., et al. (eds.) International Conference on Advances in Visual
Computing (ISVC) 2007. LNCS, vol. 4841, pp. 442–445. Springer-Verlag, Heidelberg (2007).
https://doi.org/10.1007/978-3-540-76858-6_44
39. Ghanem, N., DeMenthon, D., Doermann, D., Davis, L.: Representation and recognition of
events in surveillance video using Petri nets. In: Proceedings of the Second IEEE Workshop
on Event Mining, Computer Vision and Pattern Recognition, International Conference on
Computer Vision and Pattern Recognition, p. 112 (2004). https://doi.org/10.1109/CVPR.200
4.430
40. Mancas, M., Glowinski, D., Volpe, G., Coletta, P., Camurri, A.: Gesture saliency: a context-
aware analysis. In: Kopp, S., Wachsmuth, I. (eds.) GW 2009. LNCS (LNAI), vol. 5934,
pp. 146–157. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12553-9_13
41. Qiu, J., Wang, L., Wang, Y., Hu, Y.H.: Multi-event modeling and recognition using extended
petri nets. IEEE Access 8, 37879–37890 (2020). https://doi.org/10.1109/ACCESS.2020.297
5095
42. Beddiar, D.R., Nini, B., Sabokrou, M., Hadid, A.: Vision-based human activity recognition:
a survey. Multimedia Tools Appl. 79(41–42), 30509–30555 (2020). https://doi.org/10.1007/
s11042-020-09004-3
43. Vrigkas, M., Nikou, C., Kakadiaris, I.A.: A review of human activity recognition methods.
Front. Robot. AI 2(28), Article 28 (2015). https://doi.org/10.3389/frobt.2015.00028
44. Open CV. https://opencv.org. Accessed 29 Apr 2022
450 A. Singh and A. K. Bansal

45. PyAudio. https://people.csail.mit.edu/hubert/pyaudio/docs/. Accessed 29 Apr 2022


46. Pydub. https://pypi.org/project/pydub/. Accessed 29 Apr 2022
47. Ellis, W.D., (ed.): A Source Book of Gestalt Psychology. Kegan Paul, Trench, Trubner &
Company, (1938). https://doi.org/10.1037/11496-000
48. McClave, E.Z.: Linguistic functions of head movements in the context of speech. J. Pragmat.
32(7), 855–878 (2000)
An Emotional Support Robot Framework Using Emotion Recognition as Nonverbal Communication for Human-Robot Co-adaptation

Osamah M. Al-Omair1(B) and Shihong Huang2

1 King Faisal University, 31982, Al-Ahsa, Saudi Arabia
[email protected]
2 Florida Atlantic University, Boca Raton, FL 33431, USA

Abstract. Human emotion is an essential nonverbal communication tool. Granting machines this type of ability will improve our communication with technology significantly, thus giving us a more natural experience while interacting with machines. Software systems should have the ability to adapt to such nonverbal cues. The focus of our research is the incorporation of human emotions in co-adaptive software systems, specifically how emotionally aware systems should react to human emotions. One of the numerous application areas for this promising technology is affective robotics. In this paper, we propose a framework for a co-adaptive Emotional Support Robot. This framework adopts facial expression recognition as the main method for detecting emotions. In addition, this human-centric framework has a strong emphasis on the personalization of the user experience. We adopt a personalized emotion recognition approach, as not all humans show emotions in the same way, and we also personalize the system's adaptive reactions, based on a reinforcement learning approach where the system assesses its own actions.

Keywords: Human robot interaction · Affective computing · Emotion recognition · Affective robotics · Facial expression recognition

1 Introduction
Showing affection is an ability machines have yet to master. Technologies are advanc-
ing and the way we communicate with them is becoming more natural as if we are
heading towards a symbiotic relationship. Human emotions contain so much meaning
and, in some cases, can be more expressive than words. Technology was invented and
developed to make our lives easier. For this reason, computers should be affective in
every aspect: in showing emotion and in perceiving it. This ability is referred to as affective computing, which [1] defines as "computing that relates to, arises from, or deliberately influences emotions." However, perceiving human emotion is not enough; the system also needs to know how to react and adapt to these emotions. A system is
considered to be adaptive if it has the ability to change its behavior based on a change in its
environment or requirements. We aim for software systems to be adaptive to their users,
leading to a more desirable user experience.
Affective computing adds tremendous enhancements to the field of Human-Robot
Interaction (HRI). According to [2], HRI “is the interdisciplinary study of interaction
dynamics between humans and robots.” Robots have reached new heights in their tech-
nologies and their importance has risen as well. Affection in robotics aims to make robots more socially acceptable; such robots are referred to as affective social robots [3] or socially adaptive robots [4]. These types of robots are strong candidates to replace emotional support animals. An emotional support animal is an untrained animal that gives comfort through companionship and is usually intended to help patients with mental and emotional illnesses [5]. A study by [6] concludes that emotional support animals can be beneficial to mental health patients.
An emotional support robot would be easier to maintain than a real animal and can
be programmed to do exactly what they’re supposed to do. For example, a patient with
Alzheimer’s disease or dementia could forget to feed their emotional support animal.
With an emotional support robot, this can be avoided as the robots can be taught to be
self-sufficient, such as going to their charging station when they’re running low on power.
Some users might enjoy the responsibility of having an animal, and an emotional support
robot can also be programmed to be more dependent on its owner. Not all people have
the same taste or enjoy being treated in the same manner. This is the reason behind our
emphasis on personalization, as one size shouldn't fit all. We are all different, each with our own tastes and opinions.
We propose an emotional support robot that adapts to its owner’s emotions. In this
framework we focus on personalization in two aspects. First, the personalization of
emotion recognition as in [7]. Personalizing emotion recognition will give higher detec-
tion accuracy, as not all humans portray emotions similarly. Second, personalizing the
emotional support robot’s responses to their owner, where we adopt a reinforcement
learning approach in which the robot learns more about its owner over time, much like a young emotional support animal gradually learning its owner's likes and dislikes. The longer the emotional support robot is used, the more familiar it becomes
with its owner. In the following section we go over a number of Emotion Recognition
technologies that are applicable to our proposed framework.

2 Emotion Recognition

Human emotion is rather challenging for machines to interpret. Even as humans, it is not always easy to fully comprehend another person's emotions. Several factors account for this, such as race and cultural differences [8]. It can also be due to differences in personality, as some people show their emotions more than others. Numerous technologies exist for emotion detection, such as neural input, voice/tone analysis,
heart rate tracking, and facial expression recognition. Neural input of brain waves can
be achieved by electroencephalography (EEG), the most commonly used technology for
Brain Computer Interface (BCI) [9]. BCI “is a system that measures activity of the cen-
tral nervous system and converts it into artificial output [10].” Emotions can be detected
using BCI, as done in [11], by extracting features based on fractal dimensions from EEG signals and then classifying those features into emotions. However, extracting human emo-
tions using neural input requires specific hardware that is not commonly owned or easily
accessible by the general public. Heart rate monitoring also requires specific hardware but has become more widespread over the years. The majority of smart watches today have the ability to monitor their user's heart rate. Shu et al. [12] utilized a wearable smart bracelet to recognize emotions based on heart rate data. In addition, speech emotion recognition is a widely researched area with promising results. Detecting emotions from speech can be achieved using classification-based machine learning, as in [13].
We focus our research mainly on facial expression recognition using standard cam-
eras. Computer vision and Machine Learning are both used to achieve this task. Further-
more, there are different methods and algorithms to classify emotions from images, as
discussed and compared in [7]. Generally, a cascade classifier is used to first detect any
faces in an image. This can be accomplished with the Open Source Computer Vision
Library (OpenCV) [14]. Then facial features are extracted, compared, and classified into emotions. In addition, the dlib machine learning library [15] can be utilized to extract facial features. Within this library is a shape predictor that detects facial landmarks, developed by [16]. Calculations can be made among these facial landmarks to differentiate between facial expressions, as done in [7]. In addition, cloud-based image classification
is also an option for recognizing facial expression within an image. In [17], the cloud
based emotion recognition services of Amazon, Google, and Microsoft are compared.
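As a rough illustration of this kind of pipeline (a sketch under our own assumptions, not the exact algorithm of [7]), the following Python fragment detects a face with an OpenCV cascade classifier, locates dlib's 68 facial landmarks, and derives a few distance ratios; the landmark predictor file name and the chosen ratios are assumptions made only for the example.

import cv2
import dlib
import numpy as np

# Haar cascade for face detection and dlib's 68-point landmark predictor
# (the .dat file is assumed to be available locally).
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def expression_features(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    shape = predictor(gray, dlib.rectangle(int(x), int(y), int(x + w), int(y + h)))
    pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=float)
    face_width = np.linalg.norm(pts[16] - pts[0])     # jaw extremes, used as a scale reference
    mouth_open = np.linalg.norm(pts[66] - pts[62])    # inner-lip vertical gap
    mouth_width = np.linalg.norm(pts[54] - pts[48])   # distance between mouth corners
    brow_raise = np.linalg.norm(pts[19] - pts[37])    # left eyebrow to left upper eyelid
    # Normalising by face width makes the ratios roughly invariant to camera distance.
    return np.array([mouth_open, mouth_width, brow_raise]) / face_width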
A personalized approach, discussed and tested in [7], shows potential for higher emo-
tion detection accuracy. This approach entails that the system learns how to detect the
emotions of a specific person. This requires the system to run a preliminary training
phase, where the human is asked to purposely show the system how they portray cer-
tain emotions. The system captures a number of images per emotion supported by the
system. These images are labeled based on emotion and trained using a machine learn-
ing algorithm. A drawback with this approach is that the training process may be time
consuming and tedious for the user. Therefore, we suggest a hybrid approach, where
the system has a baseline emotion recognition method that is not personalized with the
capability of personalization overtime. This way, the robot can still function before any
personalized emotion detection training is accomplished. The robot can then perform
the personalized training in smaller increments over time.

3 Emotional Adaptation

The main objective behind this research is for an emotionally adaptive system where the
basis of adaptation is the human. Our goal is for an emotional support robot that adapts
its behavior based on its owner’s emotions. Adaptive systems can also be considered
autonomic systems which are, according to [18], “computing systems that can manage
themselves”. As described in [18], the main structure of an autonomic element consists
of a managed element and an autonomic manager. An autonomic system can consist of
more than one autonomic element. The autonomic manager consists of a feedback control
loop known as the MAPE-K loop [18]. This feedback control loop is considered the
most significant reference control model for autonomic systems [19]. The components
of the MAPE-K loop are Monitor, Analyze, Plan, Execute and Knowledge. We map
the components of our proposed framework model to the MAPE-K loop, as presented
in Fig. 1, as evidence of autonomy. We define each component of the MAPE-K loop
according to our proposed model as follows:
Sensor is the physical element that captures the external factors that affect the adaptation of the system. In our case, the sensor is the robot's camera that views the human. More than one sensor can be implemented. Using more than one sensor, especially for emotion recognition, will increase the adaptability of the system, leading to a more effective outcome; for example, a heart rate monitor can be used alongside the camera.
Monitor is the component of the system that gathers data from the sensor(s). In most
cases, monitoring should be continuous, unless otherwise specified in the adaptation
rules. The collected data is stored in the Knowledge component to be shared with the
other components of the MAPE-K loop. In our proposed model, the monitor component
is responsible for monitoring the human's facial expressions. If the sensor is a heart rate monitor, the monitor element continuously records the human's heart rate.
Analyze is the component that is responsible for analyzing the data acquired by the
monitor component. This analysis will determine if adaptation is required or not. In our
case this component will analyze the facial expressions to extract emotions with the aid
of the personalized emotion recognition data that is stored in the knowledge component.
According to the adaptation rules, the analyze component determines whether a detected
emotion requires the robot to take an adaptive action or not.
Plan is executed only if the analyze component determines that an adaptive action
needs to be performed. This component may also determine if more than one action
needs to be performed. The plan component is in charge of determining the actual action
that should be performed. Action determination is based on the analysis data from the
analyze component and, in our case, the semi-random pool of actions that is stored in
the shared knowledge component. The semi-random pool of actions is further discussed
in Sect. 4.

Fig. 1. The ESR framework modelled to the MAPE-K loop


Execute is the component that is responsible for executing the action or actions
determined by the plan component. The execute component uses the effectors of the
managed element to physically perform the action(s). In our case, the execute component
prepares the step-by-step instructions for the robot to perform the required action(s).
Effector, also known as actuator, is the physical element or elements that carry out the
instructions given by the execute component. For our model, these would be the robot’s
physical elements. These elements may include, but are not limited to, the robot’s arms,
hands, legs, wheels, screen, or speaker. This differs based on the capabilities and physical elements of the implemented robot.
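A compressed sketch of one such loop in Python (the collaborating objects camera, classifier, pool, and robot, as well as the adaptation rule shown, are hypothetical interfaces introduced only for illustration):

def mape_k_cycle(camera, classifier, pool, robot, knowledge):
    # Monitor: gather data from the sensor (the robot's camera).
    frame = camera.capture()
    # Analyze: detect the emotion using the personalized model in the Knowledge component.
    emotion = classifier.detect(frame, knowledge["personal_model"])
    knowledge["last_emotion"] = emotion
    if emotion == "neutral":                 # example adaptation rule: no action needed
        return
    # Plan: choose an action from the semi-random pool of actions (see Sect. 4).
    action = pool.select(emotion)
    # Execute: turn the chosen action into instructions carried out by the effectors.
    robot.execute(action)
    # Assess the action by observing the human again and updating the pool's ratings.
    new_emotion = classifier.detect(camera.capture(), knowledge["personal_model"])
    pool.update(action, emotion, positive=new_emotion in ("happy", "contentment"))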
Autonomous systems may have more than one MAPE-K loop depending on the nature of the system, specifically on how many properties of adaptation the system has. We discussed the emotional adaptation property of the Emotional Support Robot framework. For an actual implementation, the ESR should have various adaptation properties; hence, more than one MAPE-K loop of autonomy can be applied. In addition, other methods of emotion recognition can be implemented within the same loop of autonomy, leading to a more effective system. In the following section, we discuss our proposed framework and its components.

4 Emotional Support Robot Framework


In this section we discuss the various components of the Emotional Support Robot framework. We aim to give software systems the ability to understand human emotion. In this human-centric framework we place emphasis on personalization, as a personal robot should be tailored to the needs of its owner. There are two main sources of personalization in this framework: the personalization of emotion detection
and the personalization of emotional reaction. Figure 2 illustrates the proposed frame-
work and its different components. The framework consists of three main modules: the
emotion recognition module, the human module, and the reaction module. Next, we
discuss each of these modules and their components.

4.1 The Emotion Recognition Module


This module is in charge of acquiring the human's face and extracting data that leads to the detection of the human's correct emotion. The motivation behind this framework is to
provide a solution for mentally ill or elderly patients. The components of the emotion
recognition module are as follows.

Camera. The robot’s camera is the sensor that monitors the human and also the robot’s
environment. It captures a number of facial images that would be processed within the
module. Other sensors can be implemented as well, such as an EEG for brain-wave monitoring; as previously discussed, this would add another degree of emotion recognition and higher emotion detection accuracy. After images of the human's face are captured, the
images are sent to the facial features extraction component.

Facial Features Extraction. This component is in charge of locating the face within
the images acquired by the camera. This can be done with the Open Source Computer
Vision Library (OpenCV) [14], as discussed in Sect. 2. In addition, facial features can be extracted using the dlib library [15].

Emotion Detection Algorithm. This component is responsible for predicting the human's emotion based on the data from the facial features extraction component. Typ-
ically, this is a Machine Learning algorithm that has been trained to classify facial
features into emotions. The training files are stored in the personal emotion recognition
data component of the human module.
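For illustration only, a minimal sketch of how such a trained model could be produced and stored for one user (the scikit-learn SVM and the file name are our own choices; the framework itself only assumes some classifier persisted as the personal emotion recognition data):

import joblib
import numpy as np
from sklearn.svm import SVC

def train_personal_model(feature_vectors, emotion_labels, path="personal_emotion_model.joblib"):
    # feature_vectors: (n_samples, n_features) array of facial-feature measurements
    # emotion_labels: one emotion name per sample, e.g. "happy", "sadness"
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(np.asarray(feature_vectors), np.asarray(emotion_labels))
    joblib.dump(clf, path)   # stored as the user's personal emotion recognition data
    return clf

def predict_emotion(clf, feature_vector):
    return clf.predict([feature_vector])[0]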

4.2 The Human Module


Generally, any data that is specific to the human is stored within this module, in particular the human's personalized data. The components of this module are as follows.

Human. The human is the owner of the Emotional Support Robot. The framework is centered on this component. The human shows emotions that are acquired by the emotion recognition
module and receives or is shown an action by the robot that is determined within the
reaction module.

Personal Emotion Recognition Data. This component is responsible for storing the
trained machine learning model for recognizing the emotions of the human. The emotion
recognition module depends on this component to know specifically how the human
portrays emotions. This file is specific to the owner of the robot, personalized to their
way of showing emotions.

Semi-Random Pool of Actions. This component is essentially a database of actions that are correlated with the number of times the user showed a positive reaction when
that given action was performed by the robot. At first, when the robot is new, these correlated numbers are set to 1 by default and change over time. This means the robot needs some time to adjust to its new owner and over time becomes more familiar with its owner's likes and dislikes. Table 1 shows an example Semi-Random Pool of Actions matrix, where each action has a rating value per emotion. The rating of an action per emotion is updated every time the robot performs that action in response to the human having that emotion. If the human is pleased and shows a positive emotion, the rating is incremented; if the human shows a negative emotion, the rating is decreased. For example, in Table 1 a rating of 3 is shown for "Play dance music" with sadness; this is the number of times the user was pleased with the ESR's action "Play dance music" when the user was sad. The reason behind this approach is to help the robot select an appropriate action to perform for the user.

4.3 The Reaction Module

This module is in charge of instructing the robot which action is to be performed based on the human's current emotion. The components of this module are discussed below.
Fig. 2. The emotional support robot framework.

Table 1. An example of the semi-random pool of actions

Action\Emotion           Anger   Happy   Neutral   Sadness
Turn on TV                 2       1        1         3
Play "calming" Music       5       1        2         2
Make coffee                2       1        2         3
Bring cold beverage        1       2        1         2
Bring flowers              2       1        2         4
Play "dance" Music         1       5        3         3

Emotion Processing. This component is given the human’s current emotion by the
emotion recognition module. It is responsible for registering and analyzing the human’s
emotion for action selection and action assessment.

Action Selection. An appropriate action is semi-randomly selected to be performed for the human based on their current emotion. The semi-random pool of actions, described previously, contains the different possible actions that can be performed. The selection
process is random, but each action has a weight of possibility. It resembles a raffle,
but with the weights representing multiple raffle entries. Therefore, actions with higher
weight values have a bigger chance to be selected. After an action is chosen it is sent to
be executed.

Action Execution. This component is in charge of taking the selected action and pro-
ducing step-by-step instructions for the robot to perform. As discussed in Sect. 3, these
instructions are to be performed by the robot’s effectors.

Action Assessment. After an action is performed by the robot, the action assessment
component is responsible for measuring feedback of that action from the human. This
assessment is based on measuring the human’s emotions after the action is performed. If
the new emotion is positive, the action assessment component increments the weight of
that action and initial emotion within the semi-random pool of actions. If the new emotion is negative, the weight is decreased.
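The selection and assessment steps together amount to a weighted raffle followed by a reward update; below is a minimal sketch over the ratings of Table 1 (the dictionary layout, the example entries, and the floor of 1 on decremented ratings are our own reading of the description):

import random

ratings = {   # action -> {emotion: rating}; every rating starts at 1 for a new robot
    "Turn on TV":           {"anger": 2, "happy": 1, "neutral": 1, "sadness": 3},
    "Play 'calming' music": {"anger": 5, "happy": 1, "neutral": 2, "sadness": 2},
    "Play 'dance' music":   {"anger": 1, "happy": 5, "neutral": 3, "sadness": 3},
}

def select_action(emotion):
    actions = list(ratings)
    weights = [ratings[a][emotion] for a in actions]   # higher rating = more raffle entries
    return random.choices(actions, weights=weights, k=1)[0]

def assess_action(action, emotion, reaction_is_positive):
    if reaction_is_positive:
        ratings[action][emotion] += 1                   # the human showed a positive emotion
    else:
        ratings[action][emotion] = max(1, ratings[action][emotion] - 1)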
To clarify the idea of how the ESR functions we illustrate the step-by-step process
in Fig. 3. In the following section we discuss a number of related works in affective
robotics.

Fig. 3. The step-by-step process of the emotional support robot framework.

5 Related Works

The field of affective robotics is promising, and a great deal of research has gone into it.
In this section we discuss several studies that are related to our research. Chumkamon
et al. [20] propose a model for an emotional animal robot. Their model focuses on
emotional motivation, where the robot’s mood is stimulated based on its environment
which determines the robot’s level of motivation. Each level of motivation is correlated
with a number of behaviors, such as sleep, hate, gaze around. They also take the user’s
facial expressions into account for how the robot shows emotions for better social inter-
action. However, they only use the robot’s eyes to show emotions. Admoni et al. [21]
propose using eye gaze as a nonverbal communication cue to predict a user's intention. Their research focuses on shared autonomy that can be used for assistive care. Furthermore, recognizing human gestures and eye gaze is proposed by [22] to model nonverbal behavior for socially adaptive robots.
An empathetic virtual agent, developed by [23], showed positive results in their study.
The empathetic virtual agent mimics a health counselor to help with health intervention.
Their system adapts its own behavior to mimic empathy, also referred to as facial mimicry,
by portraying facial expressions in response to the user’s facial expressions during a
conversation with the virtual agent. This is referred to as reflective listening, which is
conventional in the way we humans communicate among ourselves. The authors of [24] also apply facial mimicry in their model for empathetic virtual agents, with a focus on showing different
levels of empathy.

6 Conclusion and Future Work

The proposed framework was designed for the purpose of enabling software systems to understand humans' nonverbal cues. In our case, we focus on human emotions. We
discussed a number of Emotion Recognition options that can be applied to the proposed
framework. Applying more than one method for recognizing the human’s emotion would
lead to a more holistic solution. Our approach focuses on the detection of emotion from
facial expressions. This requires the robot to monitor the human to detect their emotion.
Adding a smart internet of things device, such as a smart bracelet, to monitor heart rate
would give a more continuous option for detecting a change in the human’s emotions.
Other methods, such as neural input, require more expensive hardware, making them less easily applicable. For future work we aim to add more levels of emotion recognition
that would lead to more adaptation rules. In addition, other peripherals can be added for functionality beyond emotion recognition; for example, a heart rate monitor could track the well-being of the patient, and the robot could contact the health authorities in case of an emergency.
Fig. 4. A Screenshot of the ESR Simulation.

We plan to validate our framework with a simple action/reaction experiment to test the accuracy and applicability of our approach. In this experiment, we will simulate
a scenario where the robot predicts what their owner desires based on their current
emotion. The robot will be shown a facial image from a publicly available facial dataset.
The Emotional Support Robot should detect the emotion within it, respond accordingly,
then assess its own action. Figure 4 shows a screenshot of a working prototype for this
experiment.

Acknowledgment. This work was supported by the Deanship of Scientific Research, Vice Presi-
dency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Project
No. GRANT669].

References
1. Picard, R.W.: Affective computing. Pattern Anal. Appl. 1, 71–73 (1997). https://doi.org/10.
1007/BF01238028
2. Feil-Seifer, D., Matarić, M.J.: Human robot interaction. In: Encyclopedia of Complexity and
Systems Science, pp. 4643–4659. Springer, New York (2009)
3. Kirby, R., Forlizzi, J., Simmons, R.: Affective social robots. Rob. Auton. Syst. 58, 322–332
(2010). https://doi.org/10.1016/j.robot.2009.09.015
4. François, D., Polani, D., Dautenhahn, K.: Towards socially adaptive robots: a novel method for
real time recognition of human-robot interaction styles. In: 2008 8th IEEE-RAS International
Conference Humanoid Robot Humanoids 2008, pp. 353–359 (2008). https://doi.org/10.1109/
ICHR.2008.4756004
5. Carroll, J.D., Mohlenhoff, B.S., Kersten, C.M., et al.: Laws and ethics related to emotional
support animals. J. Am. Acad. Psychiatry Law 48(4), 509–518 (2020) https://doi.org/10.
29158/JAAPL.200047-20
6. Brooks, H.L., Rushton, K., Lovell, K., et al.: The power of support from companion animals
for people living with mental health problems: a systematic review and narrative synthesis
of the evidence. BMC Psychiatry 18(1), 1–12 (2018). https://doi.org/10.1186/s12888-018-
1613-2
7. Al-Omair, O.M., Huang, S.: A comparative study of algorithms and methods for facial expres-
sion recognition. In: IEEE International Systems Conference (SysCon), pp. 1–6. Orlando, FL
(2019)
8. Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc.
Psychol. 17, 124–129 (1971). https://doi.org/10.1037/h0030377
9. Huang, S., Miranda, P.: Incorporating human intention into self-adaptive systems. In: Pro-
ceedings IEEE International Conference on Software Engineering, vol. 2, pp. 571–574 (2015).
https://doi.org/10.1109/ICSE.2015.196
10. Hill, N.J., Wolpaw, J.R.: Brain–Computer Interface. Ref. Modul. Biomed. Sci. (2016). https://
doi.org/10.1016/B978-0-12-801238-3.99322-X
11. Kaur, B., Singh, D., Roy, P.P.: EEG based emotion classification mechanism in BCI. Procedia
Comput. Sci. 132, 752–758 (2018). https://doi.org/10.1016/J.PROCS.2018.05.087
12. Shu, L., Yu, Y., Chen, W., et al.: Wearable emotion recognition using heart rate data from a
smart bracelet. Sensors 20(3), 718 (2020). https://doi.org/10.3390/s20030718
13. Nwe, T.L., Foo, S.W., De Silva, L.C.: Speech emotion recognition using hidden Markov
models. Speech Commun. 41, 603–623 (2003). https://doi.org/10.1016/S0167-6393(03)000
99-2
14. Introduction — OpenCV 3.0.0-dev documentation. https://docs.opencv.org/3.0-beta/mod
ules/core/doc/intro.html. Accessed 30 Jan 2018
15. dlib C++ Library. http://dlib.net/. Accessed 10 Jan 2018
16. Kazemi, V., Sullivan, J.: One millisecond face alignment with an ensemble of regression trees.
In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1867–1874. IEEE
(2014)
17. Al-Omair, O.M., Huang, S.: A comparative study on detection accuracy of cloud-based
emotion recognition services. In: The International Conference on Signal Processing and
Machine Learning. Shanghai, China, pp. 142–148 (2018)
18. Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Comput. (Long Beach Calif)
36, 41–50 (2003). https://doi.org/10.1109/MC.2003.1160055
19. Arcaini, P., Scandurra, P.: Modeling and analyzing MAPE-K feedback loops for self-adaptation (2015)
20. Chumkamon, S., Masato, K., Hayashi, E.: Facial expression of social interaction based on
emotional motivation of animal robot. In: Proceedings - 2015 IEEE International Conference
on Systems, Man, and Cybernetics, SMC 2015, pp .185–190. IEEE (2016)
21. Admoni, H., Srinivasa, S.S.: Predicting user intent through eye gaze for shared autonomy. In:
Proceedings of the 2016 AAAI Fall Symposium: Shared Autonomy in Research and Practice.
pp. 298–303 (2016)
22. Admoni, H., Scassellati, B.: Nonverbal behavior modeling for socially assistive robots. In:
Proceedings of the 2014 AAAI Fall Symposium: Artificial Intelligence for Human-Robot
Interaction (AI-HRI), pp. 7–9 (2014)
23. Lisetti, C., Amini, R., Yasavur, U., Rishe, N.: I can help you change! an empathic virtual
agent delivers behavior change health interventions. ACM Trans. Manag. Inf. Syst. 4, 1–28
(2013). https://doi.org/10.1145/2544103
24. Boukricha, H., Wachsmuth, I., Carminati, M.N., Knoeferle, P.: A computational model of
empathy: empirical evaluation. In: Proceedings - 2013 Humaine Association Conference on
Affective Computing and Intelligent Interaction, ACII 2013, pp. 1–6. IEEE (2013)
How Does a Social Robot Analyze Emotions?

Pierre-André Buvet1(B), Bertrand Fache2, Wiam Fadel3, and Abdelhadi Rouam4

1 Sorbonne Paris Nord University, 99 Avenue Jean Baptiste Clément, 93430 Villetaneuse, France
[email protected]
2 TeamNet, 10 Rue Mercoeur, 75011 Paris, France
[email protected]
3 Ecole Nationale des Sciences Appliquées d'Oujda, 17, BP 669, 60000 Oujda, Morocco
[email protected]
4 Ontomantics, 959 Rue de la Bergeresse, 45160 Olivet, France
[email protected]

Abstract. We present a study on the multimodal analysis of emotions. It will be used to create an emotion detector that can be implemented in a social robot. First,
we specify the nature of the emotions studied. Secondly, we detail the main stages
of the research: constitution of the dataset; extraction of the dataset from an image
stream, a sound stream, and a text stream; classification of facial expressions and
prosody in image and sound streams; labeling of the emotions of the text flow with
a predefined typology; labeling the emotions of the image and sound stream classes
with the tagged text stream. Thirdly, we discuss the protocol used to evaluate the
research work. Finally, we present the perspectives of the research.

Keywords: Robotics · Emotion recognition · Machine learning

1 Introduction
We present an experiment on the automatic processing of emotions in a multimodal
framework. The results will be implemented in a social robot. They will take the form
of an emotion detector that simultaneously processes image, voice, and text. For a robot
to be empathetic, it must identify the emotions of its users. Empathy is defined as the 'intuitive ability to put oneself in the place of others, to perceive what they feel' (source: Larousse dictionary). Being empathetic does not mean sharing the same emotional state; it consists in manifesting an affective state that shows understanding of the other person's feelings. It is an appropriate response to an emotional reaction. For example, we show
compassion when someone is sad or complicity when someone is happy. The multimodal
approach is necessary to automatically analyze emotions because the only processing of
the textual content is sometimes insufficient to identify the emotional state of the person
who speaks. For example, the French expression ça va (in English it’s ok) is interpreted
as a positive or negative emotion depending on the context. The prosodic analysis of this
utterance helps to disambiguate it. Pronounced with a falling melodic pattern in the infra-low register and a jerky tempo, it implies a negative emotion, while produced with a rising-falling intonation pattern and a slower tempo it will be interpreted as a positive emotion
[1]. Similarly, the analysis of facial expressions also makes it possible to disambiguate
this language expression [2].
For nearly four years, we have been developing UKKO, an intelligent dialogue
system [3–5]. This system will become a component of the social robot BUDDY [6]
inter alia. This component will allow the robot to have conversations with its users. The
creative aspect of the UKKO system sets it apart from other dialogue systems. Instead of
reproducing stored human formulations, UKKO produces appropriate utterances thanks
to an algorithm that uses linguistic resources to simulate human language. The way the
system works is based on human language modeling [7]. Its development is based on a
mixed approach, called extended intelligence, which combines a symbolic approach and
a numerical approach [8]. The first approach, called linguistic intelligence, gives rise
to metalinguistic rules which allow to: 1) decode the incoming message; 2) encode the
outgoing message, ensuring the adequacy of the two types of messages; 3) chain these
messages in a conversational form. The second approach, called artificial intelligence,
contributes to automatically producing the linguistic descriptions that are necessary for
the application of the metalinguistic rules developed as part of the first approach. The
complementarity between linguistic intelligence and artificial intelligence is central in
the development of the system.
The evaluation of verbal interactions between a social robot and its users has pro-
vided good results. Misunderstandings of incoming messages are rare. They are easily
correctable. The generation of outgoing messages is well controlled. The use of a social
robot is facilitated if its users forget that they are interacting with a machine [9]. The
humanoid form of the robot and its discursive competence contribute to the personifica-
tion of the robot. This is also true when the robot has an empathic skill [10]. Artificial
empathy is a set of marks of interest shown by a robot for its users [11]. Affective
computing studies this category of interaction [12]. This area of research has two com-
ponents: there is, on the one hand, the automatic recognition of human affective states
and, on the other hand, the automatic creation of marks of empathy. This paper deals
with the first part only.
In recent years, many works have been conducted in affective computing [13]. These
studies deal, inter alia, with multimodal emotion recognition, based on deep learning
[14, 15]. However, in these works, the number of emotions is quantitatively limited and
multimodality is confined either to the analysis of facial features and voice, or to the
analysis of voice features and the content of what is said.
We proceeded differently than in the works mentioned above. First, we automatically
classified facial and voice features. Second, we projected the emotions identified and
labeled automatically onto what is said about the classes of facial and voice features
of the first step when the three modalities are activated at the same time in the videos
studied. Third, we used a semi-supervised learning method based on deep learning. This
method exploits the results of the second step to label the other classes of facial and
voice features. These classes are not associated with verbally expressed emotions (this
is the most frequent case in our corpus). Our approach aims to automatically recognize
a wide variety of emotions on the face and voice in order to get closer to the variety of
emotions expressed verbally.
First, we specify the issues of the research and the methodology used. Secondly, we
present the experimental protocol used to test the research work hypothesis. Third, we
discuss the interpretation of the obtained results. Finally, we specify what will be the
extension of this work.

2 Research Problem
Emotions are a subject of study in several scientific disciplines. Among the many studies
in philosophy on emotions, there are those of René Descartes. Emotions, called by
the philosopher “passions of the soul” transcend the Cartesian body-mind dichotomy,
because as subjects of emotions, true human beings are necessarily aggregates of body
and spirit [16]. Studies in psychology on the emotions of William James and Carle
Lange at the end of the 19th century are also essential. They define emotions as felt
bodily reactions: the triggering of emotion would be determined by the perception of
a peripheral activation pattern [17]. This analysis of emotions was challenged at the
beginning of the 20th century by physiologists Walter Bradford Cannon and Philip Bard:
the triggering of emotion is determined by the processing of a stimulus at the level of the
central nervous system, the peripheral activation pattern being neither specific nor causal
[18]. More recent studies in neuroscience distinguish two categories of affect: emotions
and feelings [19]. The first category is physiological, an emotion appearing essentially
on the body, for example, slumped shoulders. The second category is neuropsychological
since feeling is considered a cognitive process. A causal continuity, in which the emotion
precedes the feeling, is established between the two kinds of affect.
First, it emerges from all these studies that the physiological properties of emotions
are undeniable. Therefore, these properties are necessary for the automatic detection
of emotions. Their physiological aspect is integrated into the processing presented here
because of its multimodal nature: two somatic indicators are analyzed: facial expressions
and voice. These studies focus on the inner point of view instead of the outer point of
view.The inner point of view is that of the subject who feels the affects. It explains how
emotions work by modeling the human mechanisms that produce them. The external
point of view is that of the observer. It is adopted in this study since its purpose is to
detect the emotions of users of a social robot.
Detecting emotions involves categorizing them according to specific markers: bodily
attitudes, facial expressions, the voice, and the messages it delivers. When the observer
is a human being, the physiological aspects are fundamental because they echo those
that characterize his own emotions: a large part of our interactions with the surrounding
environment and our emotional behaviors depend on our ability to perceive and under-
stand the emotions of others [20]. This is the first challenge of the research presented
here.
A language is a tool used by human beings to communicate. It makes it possible to describe emotions in great depth. The literature has described affective feelings in detail. Therefore, languages have a very complete lexicon to signify them. In a more famil-
iar, even very familiar register, they also have a large stock of expressions to express
their emotions; for example, c’est le fun (in English it’s fun) to express satisfaction. The
second challenge of the research is to rely on the description of the vocabulary to desig-
nate the emotional markers that are physiological. The starting point is a very detailed
analysis of affect predicates [21]. We rely on this analysis to define the physiological
markers of emotions from their textual markers. According to this theory, utterances
proceed from predicate-argument structures. The predicates are the linguistic forms of
oriented relations between entities corresponding to their arguments. Affects are cogni-
tive processes. Their particularity is to be centered on the psychic interiority of people.
They are distinct from sensations. These have the particularity of being centered on the
physiological interiority of people. This distinction is not absolute because there are
interactions between the two kinds of interiority. The feeling of cold can go hand in
hand with annoyance. Similarly, disgust can lead to nausea. It is these interactions that
explain the physiological aspect of emotions in the context of the evolution of species
[22]. Affects are distinguished from cognitive processes that are not centered on the
interiority of people, those that contribute to processing information from the outside
world. Affects correspond to moods, emotions, and feelings. Moods and emotions are
conceived as reflexive relations: their point of departure and their point of arrival are the
same. Feelings are conceived as oriented binary relationships: their starting point is a
human being, and their point of arrival is another human being. The linguistic forms of
these affects are predicates. For example, the French adjectives morne (in English bleak), morose (in English gloomy), sombre (in English depressed) and the French phraseologism avoir le moral dans les chaussettes (literally 'to have one's morale in the socks') are mood predicates of the class MOROSITY; the French nouns frayeur (in English fright), frousse (in English jitters), peur (in English fear) and the French phraseologism les avoir à zéro (literally 'to have them at zero') are emotion predicates of the class FEAR; and the French verbs détester (in English to dislike), exécrer (in English to execrate) and haïr (in English to hate) and the French phraseologism ne pas pouvoir le voir en peinture (literally 'not being able to see him in painting') are feeling predicates of the class HATE.
The French lexicon of affect predicates contains at least 5000 monolexical or polylexical
units.
Affect predicates are adjectives, adverbs, nouns, and verbs corresponding to simple
words or complex words. Their identification and categorization in terms of emotion,
mood, and feeling result from their ontological properties specified above. The subdi-
vision of the three main categories into sub-categories, called classes, results from a
thorough linguistic analysis of their lexical items. This is based on both their semantic
properties and their distributional properties. Items of the same class are quasi-synonyms
and share the same contexts. From here on, we discuss only emotions. For these, 11 classes
are identified. They are distributed on a first axis according to their tonality (negative
tonality versus positive tonality). From this point of view, the class SURPRISE is neu-
tral while the class ANGER is the most negative and the class JOY is the most positive.
Between these poles, the other classes are distributed as follows (from the negative tone to
the positive tone): FEAR, DISGUST, DISSATISFACTION, BURDEN, SADNESS, and
CONFUSION (classes of negative emotion); CONTENTMENT and APPRECIATION
(classes of positive emotion).
Each class of emotion is subdivided into subclasses according to their intensity
(low intensity versus high intensity). For each class, the subclasses are listed from low
intensity to high intensity. The class ANGER subsumes the subclasses IRRITATION
(low intensity) and RESENTMENT, VARIOUS ANGER (high intensity). The class
FEAR subsumes the subclasses WORRY and APPREHENSION (low intensity) and
VARIOUS FEAR and TERROR (high intensity). The class DISGUT subsumes the sub-
classes DISPLEASURE (low intensity) and VARIOUS DISGUT (high intensity). The
class DISSATISFACTION subsumes the subclasses FRUSTRATION, CONTRARIETY
(low intensity), and DISAPPOINTMENT, VARIOUSDISSATISFACTION, MORTI-
FICATION, and INDIGNATION (high intensity). The class BURDEN subsumes the
subclasses DISENCHANTMENT and WEARINESS (low intensity) and VARIOUS
BURDEN, DISCOURAGEMENT, DESPAIR (high intensity). The class SADNESS
subsumes the subclasses VARIOUS SADNESS and UNHAPPINESS (strong inten-
sity). The class CONFUSION subsumes the subclasses DISCOMFORT (low inten-
sity) and VARIOUS CONFUSION, DISORIENTATION, TROUBLE (high intensity).
The class SURPRISE subsumes the subclasses ASTONISHMENT (low intensity) and
VARIOUS SURPRISE, STUPEFACTION (high intensity). The class CONTENTMENT
subsumes the subclasses VARIOUS CONTENTMENT (low intensity) and ENTHUSI-
ASM and SATISFACTION (high intensity). The class APPRECIATION subsumes the
subclasses VARIOUS APPRECIATION, PLEASURE, ADMIRATION, and FASCI-
NATION (strong intensity). The class JOY subsumes the subclasses VARIOUS JOY,
HAPPINESS, and EXALTATION (strong intensity). An excerpt from the emotions list
is provided in Table 1.

Table 1. Extract from the emotions classes and subclasses list.

Class        Anger           Contentment
Subclasses   Irritation      Various contentment
             Resentment      Enthusiasm
             Various anger   Satisfaction

The projection of emotion subclasses on the tone axis (x-axis) and the intensity
axis (y-axis) provides a categorical and dimensional representation of these affects.
Therefore, a subclass corresponds to a point in this two-dimensional space. It is defined by
coordinates related to its tone and intensity. Figure 1 shows this representation of emotions.
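As an illustration, such a placement can be stored as a pair of coordinates per subclass; the numeric values below are invented placeholders, and only the two axes (tonality and intensity) come from the classification above:

# Each emotion subclass as a point (tonality, intensity); values are placeholders.
EMOTION_SPACE = {
    "IRRITATION":   (-1.0, 0.2),   # class ANGER, low intensity
    "TERROR":       (-0.8, 0.9),   # class FEAR, high intensity
    "WORRY":        (-0.8, 0.2),   # class FEAR, low intensity
    "ASTONISHMENT": ( 0.0, 0.2),   # class SURPRISE (neutral tonality), low intensity
    "SATISFACTION": ( 0.7, 0.8),   # class CONTENTMENT, high intensity
    "HAPPINESS":    ( 1.0, 0.9),   # class JOY, high intensity
}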
The third challenge is to exploit this linguistic knowledge about emotions to automat-
ically obtain new knowledge about the physiological aspect of emotions (facial expres-
sions and voice). The goal is to integrate them into a device based on artificial intelli-
gence. From this point of view, the chosen approach is based on extended intelligence
(see above).
Our study starts from the observation of the recovered video recordings. Emotions that are physiologically signified appear much more frequently than emotions that are expressed orally. Therefore, to detect emotions in a multimodal way, we made the following hypothesis: it is possible to infer the emotions signified by facial expressions and the voice from the emotions verbally expressed by the people who feel them. This hypothesis is associated with three questions: 1) Which data can be used to automatically process emotions in a multimodal setting? 2) Which unit of analysis should be chosen for automatic processing? 3) Is the categorization of the text data relevant for the categorization of the image and sound data?
Fig. 1. Categorical and dimensional representation of emotions.

To answer the above questions, it is also necessary to consider the particularities of the data. It is a set of videos from which three streams are extracted: an image stream,
a sound stream, and a text stream. French is the language spoken in these videos. These
are reports, interviews, and documentaries retrieved from the Web. The need to process
good quality data explains this choice: authentic emotions are preferred to emotions
played by actors. Then, we developed an experimental protocol: 1) data preprocessing;
2) data analysis; 3) interpretation of the results obtained. Phase 1 involves indexing the
three streams in a coordinated way to align image type, sound type, and text type data.
This alignment is essential for interpreting emotions that are physiologically signified
from emotions that are verbally expressed. In addition, data from image and sound
streams need to be filtered. Filtering allows the processing to focus only on the relevant
properties from the point of view of the analysis of emotions. The data from the filtered
video stream consist of facial expressions related to geometric shapes; those of the
sound stream are prosodic markers of the voice. In a second step, the textual data are
labeled in terms of emotions using a supervised method. This uses training data that
have been processed beforehand. Thirdly, the image stream, the sound stream, and the
text stream are merged to detect all the emotions expressed both verbally and non-
verbally. Phase 3 corresponds to the evaluation of the results obtained. It establishes the degree of relevance of the analyses of phase 2. The evaluation validates or invalidates the hypothesis proposed in this study.

3 Experimentation
The first step is the constitution of the corpus. It is a set of videos from the web. It
is subdivided into three subcorpora. The first sub-corpus is a test corpus. Its role is to
use data to develop analytical tools for experimentation. The second sub-corpus is a
validation corpus. Its role is to use data to verify the quality of the analysis tools we have
developed for the experiment. The third sub-corpus is an evaluation corpus. Its role is
to use data to verify the quality of the results of the analyses.
The sound quality of the input file and the quality of its images are necessary condi-
tions for efficient information processing. The selection of the sources of information is
a preliminary work to the analysis of the contents. Three criteria are used for profiling
the corpus: 1) people are facing the camera and their face is uncovered; 2) the soundtrack
does not contain extraneous noise; 3) the words recorded are those of the people filmed.
The first phase of the experiment aims to segment the raw corpus according to each of
the comments made by the people filmed. We chose VOSK [23] as our speech-to-text
tool. It is the most efficient for representing a monologue as a series of utterances. Table 2 illustrates the processing of VOSK. The input represents the continuous sound
stream (its language content is indicated in italics). The output is the representation of
this stream in a textual form. It is the segmentation of the initial stream into utterances.
Sound streams (WAV format) are extracted from the various videos in the corpus.
They are obtained by using the software FFmpeg [24]. This software was chosen because
it provides a sound file compatible with the processing performed by VOSK. The results
of the speech-to-text tool are then used to split video files, using the Python language’s
moviepy module, and sound files, using the Python language’s pydub module. The size
of the split files is not identical; each file is delimited by an utterance of the text file. Each utterance determines the segmentation of the text file and the segmentation of the sound and image files from which it comes. In this way, it is possible to align the image, sound, and text streams extracted from the original video file. The average length of the split files is 5 s.

Table 2. Input and output of a sound file processed by VOSK.

INPUT (continuous sound stream):
c'était le premier livre d'enquête bon voilà donc on peut être européen et défendre ses intérêts nationaux c'est pas incompatible et voilà ce n'est pas se poser cette question là ce n'est pas à un problème justement le souci en france c'est qu'on a tendance à faire des beaux débats sur des concepts donc on se réfugie derrière il y a la casaque bleue rouge

OUTPUT (one JSON object per utterance):
{ "text" : "c'était le premier livre d'enquête" }
{ "text" : "bon voilà donc on peut être européen et défendre ses intérêts nationaux c'est pas incompatible" }
{ "text" : "et voilà ce n'est pas se poser cette question là ce n'est pas à un problème justement" }
{ "text" : "le souci en france c'est qu'on a tendance à faire des beaux débats sur des concepts" }
{ "text" : "donc on se réfugie derrière il y a la casaque bleue rouge" }
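A condensed sketch of this preprocessing chain in Python (it assumes a local French VOSK model directory, a 16 kHz mono WAV already extracted with FFmpeg, and the moviepy 1.x API; paths and parameters are illustrative):

import json
import wave
from vosk import Model, KaldiRecognizer
from moviepy.editor import VideoFileClip
from pydub import AudioSegment

def split_by_utterance(video_path, wav_path, model_dir="vosk-model-fr"):
    # Transcribe the WAV with VOSK, keeping word-level timestamps.
    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
    rec.SetWords(True)
    utterances = []
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            res = json.loads(rec.Result())
            if res.get("result"):
                start, end = res["result"][0]["start"], res["result"][-1]["end"]
                utterances.append((res["text"], start, end))
    # Cut the video and the audio into one clip per recognised utterance.
    video = VideoFileClip(video_path)
    audio = AudioSegment.from_wav(wav_path)
    for i, (text, start, end) in enumerate(utterances):
        video.subclip(start, end).write_videofile(f"clip_{i}.mp4")
        audio[int(start * 1000):int(end * 1000)].export(f"clip_{i}.wav", format="wav")
    return utterances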

Then, the data from the image and sound streams are pre-processed to
make them more manipulable. The image stream data are filtered using the
shape_predictor_68_face_landmarks shape predictor. It is pre-trained on the ibug 300-
W dataset. The tool only analyses facial features: the eyes, eyebrows, mouth, and nose. These features are associated with geometric shapes and represented in a digital format. They are then automatically processed to segment facial expressions. Table 3 shows this representation of facial features.

Table 3. Digital representation of facial expressions.

(One row of normalised values per video frame for the features left_eye/face, right_eye/face, left_eye, right_eye, all_mouth, open_mouth, nose/mouth, left_eyebrow, right_eyebrow.)

The sound stream is filtered to extract prosody markers from the voice recording (for example, voice intensity, voice pitch, or speech rhythm). The raw data are filtered using the openSMILE tool. It identifies all kinds of voice characteristics, but only the characteristics of a prosodic nature are exploited, since they reveal emotions [25]. The prosodic features identified by the tool are represented in a digital format. Then, they are processed automatically to segment the prosodic markers. Table 4 shows this representation of prosody.
The prosodic variations are shown in the three right columns of Table 4. The left column specifies the time course.
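A small sketch using the opensmile Python wrapper (the choice of the ComParE 2016 configuration and the keyword filter below are assumptions; the exact low-level descriptor names depend on the configuration used):

import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)

def prosodic_markers(wav_path):
    # process_file returns a pandas DataFrame with one row per analysis frame.
    lld = smile.process_file(wav_path)
    # Keep only prosody-related descriptors (pitch, voicing, loudness/energy).
    keep = [c for c in lld.columns
            if any(k in c.lower() for k in ("f0", "voicing", "loudness", "energy"))]
    return lld[keep]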
Text stream preprocessing consists of preparing training data for a supervised learn-
ing method. The data are obtained by tagging a portion of the text stream with the
Table 4. Numerical representation of prosodic markers.

          F0final_sma   voicingFinalUnclipped_sma   pcm_loudness_sma
220076    276,13089     0,785414                    0,317043
220077    275,943726    0,76355                     0,335212
220078    321,704712    0,739038                    0,348994
220079    367,660492    0,724123                    0,41338
220080    413,781342    0,717396                    0,393543

semantic analysis engine of the UKKO system. The engine is configured to qualify and
detect verbally expressed emotions. The analysis integrates the centering of the emo-
tions on the person who speaks and not on another person. For example, the sequence
j’ai de la peine(in English I feel sorry) is labeled SADNESS while the sequence il a de
la peine (in English he feels sorry) is not. Therefore, it is not only keywords that are
identified but also their context. A posteriori human verification validates the quality of
the semantic labeling carried out. This allows any faulty labels to be corrected. Table 5
presents emotion labels inserted into the text to provide training data.

Table 5. Training data for text stream processing.

Content emotion
"on passe d'ailleurs plus il ne pèse plus grange et qui ne neutral
pèse plus grand-chose"
"mais en fait comme ils sont toujours en contentieux" neutral
"je vais être vulgaire je suis désolé on est en train de contrariety
se faire botter le cul"
"oui ils sont inquiets par la présence de la chine en neutral
afrique"
"d'ailleurs le fils débit et plus c'est compliqué confusion
l'histoire mais enfin en tout cas le lien de la famille
plus que des institutions tchadiennes "
"oui mais déjà c'est voilà ils sont pas très formés pas neutral
très compétents généralement lors des fouilles et des
surprises d'ailleurs"
"comme c'est des sujets complexes plupart du temps" confusion
"et c'est parfois un reproche qu'on me fait sur mes confusion
enquêtes c'est trop complexe"
"c'est trop compliqué vous a trouvez ça très simple" confusion
"il est neuf cent trente mille abonnés" neutral
"de mon point de vue ce qui m'inquiète c'est que" worry
"ils sont abonnés à des lettres confidentielles" neutral
"c'est pas très sexy c'est compliqué" confusion
"elles sont même quand elles sont posées les enjeux sont neutral
très mal décrypté et c'est une responsabilité collective"
"moi ce qui m'a frappé après mon passage mon premier surprise
passage chez vous c'est que"
"mais on voit bien que ça ne fonctionne plus et ça disenchantment
fonctionne plus puisque la cinquième république totalement
verticale dans un monde qui est aussi"

The second step is the processing of the image, sound, and text streams. The streams
are analyzed separately. The goal is to obtain classes of facial expressions and classes of
prosodic markers with the image and sound streams. The classes are labeled during the
third step. The flows are segmented with the iterative K-means algorithm. The number
of segments is determined heuristically with the Elbow method for the optimal K. The
number of segments proposed is 11 for the processing of the image stream and 5 for the
processing of the sound stream. This number is insufficient for the detailed analysis of
emotions. Consequently, the number of segments chosen is 12 for the processing of the
two streams. This number corresponds to the 11 categories of emotions presented above
and to the neutral category, i.e. the absence of emotion. This number will then be
increased empirically to integrate the sub-categories of emotions. Table 6 shows an
extract of the segmentation performed on the image stream.
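A minimal sketch of this segmentation step with scikit-learn is given below; the feature matrix is a random placeholder standing in for the facial-feature vectors, while the choice of K = 12 follows the paper.

# Sketch: elbow heuristic plus K-means segmentation of the feature vectors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image_features = rng.random((5000, 9))          # placeholder for the real ratios

# Inertia values over a range of K; the "elbow" of this curve suggests K.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0)
               .fit(image_features).inertia_ for k in range(2, 20)}

# The paper finally fixes K = 12 (11 emotions + neutral).
kmeans = KMeans(n_clusters=12, n_init=10, random_state=0).fit(image_features)
clusters = kmeans.labels_                       # one segment label per frame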

Table 6. Segmentation of facial expressions (12 Segments).

       left_eye/  right_eye/  left_     right_    all_      open_     nose/     left_      right_     clus-
       face       face        eye       eye       mouth     mouth     mouth     eyebrow    eyebrow    ters

12730  0,079038   0,063911    0,298398  0,30904   0,372033  0,058824  0,579345  0,316228   0,274874   4
12731  0,079038   0,063911    0,298398  0,32596   0,372033  0,041595  0,579345  0,316228   0,285886   4
12732  0,047434   0,047434    0,189737  0,256307  0,372033  0,041595  0,565747  0,187135   0,189737   3
12733  0,031623   0,035355    0,11931   0,19104   0,353553  0,087706  0,596464  0,117444   0,136335   3
12734  0,031778   0,035529    0,126491  0,19104   0,335346  0,083189  0,565747  0,124757   0,149071   3

Figure 2 is a two-dimensional representation of the segmentation of Table 6, performed
on the image stream and obtained with the t-SNE (t-distributed Stochastic Neighbor
Embedding) algorithm.
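Such a projection can be produced with scikit-learn's t-SNE implementation, as in the following sketch; the data, the perplexity value and the colouring are illustrative assumptions.

# Sketch: 2-D t-SNE projection of the segmented feature vectors (as in Fig. 2).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

X = np.random.default_rng(0).random((2000, 9))          # placeholder features
labels = KMeans(n_clusters=12, n_init=10, random_state=0).fit_predict(X)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap="tab20")
plt.title("Segmentation of facial expressions (12 segments)")
plt.show()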

Fig. 2. Representation of the segmentation of facial expressions (12 Segments).

Table 7 shows an extract of the segmentation performed on the sound stream.

Table 7. Segmentation of prosodic markers (12 Segments).

F0final_sma voicingFinalUnclipped_sma pcm_loudness_sma Kmeans_cluster


16 −1,00639 −0,914327 −0,913639 5
17 −0,92758 −0,102571 −0,498767 2
18 −0,307951 0,196993 −0,433225 2
19 0,634308 1,257509 1,6602 1
20 3,329189 0,6405 1,065743 8
21 1,041436 0,998476 1,655949 1

Figure 3 is a two-dimensional representation of the segmentation performed on the


sound stream obtained with the T-SNE (T-distributed Stochastic Neighbor Embedding)
algorithm.
For the text stream, the goal is to identify emotions, extract and categorize them from
the training data previously provided. The stream is labeled with a recurrent neural net-
work based on the BiLSTM method (Bidirectional Long Short-Term Memory Network).
The BiLSTM method differs from the LSTM method by its data exploration mode: for-
ward and backward. The prediction quality of the model is 95%. Table 8 presents an
extract of the results obtained.
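A minimal BiLSTM classifier of this kind could be built as in the sketch below (Keras is assumed here; the vocabulary size, layer widths and training settings are illustrative, as the paper does not publish its architecture details).

# Sketch: a bidirectional LSTM text classifier, standing in for the paper's
# BiLSTM emotion tagger. Hyperparameters are illustrative assumptions.
import tensorflow as tf

NUM_EMOTIONS = 12                                  # 11 emotions + neutral
vectorize = tf.keras.layers.TextVectorization(max_tokens=20000,
                                              output_sequence_length=40)
# vectorize.adapt(train_sentences)                 # fit on the labeled corpus

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Embedding(20000, 128, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # forward + backward
    tf.keras.layers.Dense(NUM_EMOTIONS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_sentences, train_labels, validation_split=0.1, epochs=10)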

Fig. 3. Representation of the segmentation of prosodic markers (12 Segments).

Table 8. Labeling of emotions in the text stream.

Text Emotion
1 tout le monde est dans l’invective et tout le monde dans la morale cela ANGER
me met en rage de constater cela
2 C’était gênant de savoir qu’elle en souffrait CONFUSION
3 savoir qu’elle est heureuse de vivre me comble de joie JOY
4 quand elle m’a dit oui j’étais content CONTENTMENT
5 sa réaction m’a étonné SURPRISE

The third step consists in merging the new data obtained during the previous step
and detecting all the emotions expressed by a facial expression or a voice. We specify
the key principles of the processing of this data. The learning method is semi-supervised
[26]. First, the data are merged. For each of the split file triplets during the first step,
the emotion detected in the text file is assigned to the facial expression class and the
prosodic marker class of the corresponding video and sound files. Secondly, the result
of the fusion is exploited with a deep semi-supervised learning algorithm to predict
the emotions expressed in video files and sound files that have not been labeled from
a text file. The neural network used is based on the SGAN model (SGAN stands for
Semi-supervised Generative Adversarial Network). It predicts non-verbal emotions.
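The fusion step can be pictured as in the following sketch, which propagates the text-derived emotion of each file triplet to the corresponding facial-expression and prosodic clusters and keeps the remaining triplets unlabeled for the semi-supervised stage; the data structures are illustrative assumptions, not the authors' storage format.

# Sketch of the fusion step: the emotion detected in a triplet's text file is
# attached to the facial-expression and prosodic clusters of the same triplet.
# Triplets with no verbally expressed emotion stay unlabeled for the SGAN-style
# semi-supervised stage.
def fuse_labels(triplets, text_emotions):
    """triplets: {triplet_id: {"face_cluster": int, "voice_cluster": int}}
    text_emotions: {triplet_id: "JOY" | "SADNESS" | ...} for labeled triplets."""
    labeled, unlabeled = [], []
    for tid, feats in triplets.items():
        emotion = text_emotions.get(tid)        # None when no verbal emotion found
        sample = {"id": tid, **feats, "emotion": emotion}
        (labeled if emotion else unlabeled).append(sample)
    return labeled, unlabeled

labeled, unlabeled = fuse_labels(
    {"clip_001": {"face_cluster": 4, "voice_cluster": 2},
     "clip_002": {"face_cluster": 7, "voice_cluster": 7}},
    {"clip_001": "JOY"},
)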

4 Interpretation
Interpreting the results of the experiment consists in answering the questions associated
with the initial hypothesis. The answers obtained confirm, or invalidate, the hypothesis.
The interpretation depends on the evaluation of the analyses presented in the previous
section.
The evaluation relies on its own specific corpus. To avoid hypothesis confirmation
bias (processing previously validated information), this corpus is necessarily different
from the test and validation corpora. Corpus profiling is essential for evaluation. The
videos retrieved must be sufficiently representative of the phenomena studied. The goal
of the evaluation is to test the relevance of the semantic categorization of image, sound,
and text flows in terms of emotion.
The evaluation of the model used for voice and face analysis aims to measure the
quality of segmentation using the Silhouette index:

s = (b − a) / max(a, b)
The Silhouette index is defined for each sample. It is composed of two scores a,
average distance between a sample and all the other points of the same cluster, and b,
average distance between a sample and all the other points of the closest cluster.
After segmentation, fusion is performed to label the data. A part of labeled textual
data is used to calculate the similarity of the results obtained using the Jaccard index:
J(A, B) = |A ∩ B| / |A ∪ B|
The Jaccard index is used to measure the similarity between sets of finite samples.
It is the size of the intersection divided by the size of the union of the sample sets.
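Both scores can be computed with scikit-learn, as in this short sketch; the arrays are small illustrative placeholders rather than the evaluation corpus.

# Sketch: Silhouette index for the clustering and Jaccard index for the labels.
import numpy as np
from sklearn.metrics import silhouette_score, jaccard_score

X = np.random.default_rng(0).random((500, 9))          # placeholder feature vectors
clusters = np.random.default_rng(1).integers(0, 12, size=500)
print("Silhouette:", silhouette_score(X, clusters))    # mean of (b - a) / max(a, b)

y_true = ["JOY", "NEUTRAL", "WORRY", "JOY", "CONFUSION"]
y_pred = ["JOY", "NEUTRAL", "JOY", "JOY", "CONFUSION"]
print("Jaccard:", jaccard_score(y_true, y_pred, average="micro"))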

Table 9. Evaluation of the labeling of emotions for each modality

Modality Index Scoring


face (image stream) Silhouette 0,75
voice (sound stream) Silhouette 0,89
speech (text stream) Jaccard 0,91

Table 9 shows that the results obtained validate our hypothesis: it is possible to
automatically infer the emotions signified by facial expressions and the voice from the
emotions that are expressed verbally by those who feel them.
From the results, the model is deployed in an HDF file (HDF stands for Hierarchical
Data Format), type HDF5. The file includes the configuration of the model and the
weights used for labeling.

5 Perspectives
The research presented here shows that automatic, multimodal, and detailed analysis
is possible. It is fundamental to apprehend the full emotional variety instead of only the
five fundamental emotions. An empathetic response is different when someone shows
fear and when they are terrified.
Incorporating a system that identifies the emotions into a social robot requires the
machine to respond to those signals appropriately. The empathic words formulated by
the robot must correspond as closely as possible to the emotions felt by its users. Any
discrepancy between an emotional manifestation of a human being and an empathetic
reaction of a robot tends to discredit its use. This is why it is necessary to broaden the
spectrum of automatically detected emotions.
Once the research results are stabilized, we will develop an emotion detector that
analyzes multimodality (image, voice, and text) and emotional variety. The signals from
the detector will be integrated into the knowledge base of the intelligent dialogue system
called UKKO. The role of this knowledge base is to contextualize the verbal interactions
between the machine and its users. It integrates extralinguistic information in the analysis
of incoming messages and the synthesis of outgoing messages. Informing the system
about the emotions of users will ultimately make the robot empathetic and, consequently,
increase its sociability.

References
1. Lacheret, A.: Le corps en voix ou l’expression prosodique des émotions. Evolutions
Psychomotrices, Fédération Européenne des Psychomotriciens 23(90), 25–37 (2011)
2. Bassil, J.: Facial motion in the perception of faces and of emotional expression. J. Experimental
Psychology 4(3), 373–379 (1978)
3. Buvet, P.-A., Fache, B., Rouam, A.: How does a robot speak? about the man-machine ver-
bal interaction. In: Proceedings of The 3rd International Workshop on the Applications of
Knowledge Representation and Semantic Technologies in Robotics ((AnSWeR19), CEUR.
http://ceur-ws.org/Vol-2487/(2019)
4. Buvet, P.-A., Fache, B., Rouam, A.: Interview with a robot: How to Equip the Elderly Compan-
ion Robots with Speech?. In: Proceedings of the Future Technologies Conference (FTC 2020,
2, pp. 310–326, Springer, Heidelberg (2020). https://doi.org/10.1007/978-3-030-63089-8_20
5. Buvet, P.-A., Fache, B., Rouam. A.: Which intelligence for human-machine dialogue sys-
tems?. In: Proceedings of the Future Technologies Conference (FTC 2021), 1, pp. 121–133,
Springer, Heidelberg (2021). https://doi.org/10.1007/978-3-030-89906-6_10
6. http://www.bluefrogrobotics.com/robot/
7. Buvet, P.-A.: Linguistique et intelligence, Etudes de linguistique appliquée, Klincksieck (in
press)
8. Buvet, P.-A.: Prédication et relation: l’exemple de la détermination, in Le prédicat en
questions, Champion, Paris (in press)
9. Tisseron, S. : Le jour où mon robot m’aimera, Albin Michel, Paris (2015)
10. Devillers, L.: Les Robots émotionnels. Santé, surveillance, sexualité… : et l’éthique dans tout
ça ? Editions de l’Observatoire, Paris
11. Bensoussan, J., Bensoussan A.: IA, robots et droit, Liège (2019)
12. Pelachaud, C. (ed.): Systèmes d’interaction Émotionnelle. Lavoisier, Paris (2010)

13. Stock-Homburg, R.: Survey of emotions in human-robot interactions perspectives from


robotic psychology on 20 years of research. Int. J. Sociologic Robotics 14, 389–411 (2022)
14. Kansizoglou, I., Bampis, L., Gasteratos, A.: An active learning paradigm for online audio-
visual emotion recognition. IEEE Transactions on Affective Computing (2019). https://doi.
org/10.1109/TAFFC.2019.2961089
15. Guanghui, C., Xiaoping, Z.: Multi-modal emotion recognition by fusing correlation features
of speech-visual. IEEE Signal Process. Lett. 28, 533–537 (2021)
16. Amatayakul, S., AlbertNicole, G.: Surmonter ses émotions, conquérir son destin. Réflexions
sur l’éthique de Descartes. Diogène 237(1), 109–120 (2012)
17. Coppin, G., Sander, D.: Théories et Concepts Contemporains en Psychologie de l’émotion in
Systèmes d’interaction Émotionnelle, pp. 25–56. Lavoisier, Paris (2010)
18. Damasio, R.A.: Looking for Spinoza: Joy, Sorrow and the Feeling Brain. Harcourt, Boston
(2003)
19. Rizzolatti, G., Sinigaglia, C.: Les Neurones Miroirs. Odile Jacob, Paris (2008)
20. Buvet, P.-A., Girardin, C., Gross, G., Groud, C.: Les prédicats d’affect, LIDIL, 32, Presses
Universitaires de Grenoble, pp. 125–143 (2005)
21. Gross, M.: Les bases empiriques de la notion de prédicat sémantique, Langages, 63, Larousse,
pp. 7–52 (1981)
22. Damasio, R.A.: Self Comes to Mind: Constructing the Conscious Brain. Knopf Doubleday
Publishing Group, New York (2002)
23. VOSK Offline Speech Recognition API: https://alphacephei.com/vosk/ Accessed 28 Mar
2022
24. https://ffmpeg.org/
25. Mary, L.: Extraction and Representation of Prosody for Speaker, Speech and Language
Recognition. Springer, Heidelberg (2015). https://doi.org/10.1007/978-1-4614-1159-8
26. Chapelle, O., Scholkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press, Cambridge
(2006)
Obstacle Recognition Using Depth Estimation
and RGB Data for Autonomous Vehicle

Jheanel Estrada1,2(B) , Gil Opinas Jr.2 , and Anshuman Tripathi2


1 Technological Institute of the Philippines, Manila, Philippines
[email protected]
2 Energy Research Institute @ NTU, Singapore, Singapore

Abstract. In the past few years, autonomous vehicles have been the subject of
innovation and technology amidst the challenges and difficult situations surround-
ing urban driving and infrastructure. Recent developments in sensors
and embedded systems have made way for a deeper analysis of the cost and needs
of AVs. This study aims to make use of these new technologies and sensors to build
a better recognition model of the relevant road actors surrounding autonomous
vehicles. The results of this study made it possible to use depth and RGB data to
recognize obstacles in an urban road driving scenario. The mean average preci-
sion of the model across all labels shows acceptable results running at ten frames
per second. The model was deployed in an autonomous golf picker buggy
using Robotic Operating System.

Keywords: Computer vision · Object recognition · Stereo camera · AV ·


Unmanned vehicle

1 Introduction

In the past few years, researchers found interest in the evolving field of Autonomous
Vehicles due to the enormous capabilities of available sensors nowadays [1]. These
self-driving vehicles heavily rely on its capabilities in motion planning, path planning,
perception, localization and controls. Over the
last five decades, several companies are competing to achieve Level 5 Full Automation.
Taking this into a high note, this requires the vehicle to sense the local environment,
classify important objects on the road that detects in real-time for both day and night-
time even under rain conditions which also requires very big data. A part of the Level 5
Full Automation is the application of deep neural networks in classifying road actors.
Automation in the transport industry offers a wide range of advantages. To achieve
their full potential in the market, autonomous vehicles deal with the safety and comfort
of their users. Self-driving vehicles have been the talk of many studies exploring the
potential use cases of future mobility solutions [1–3].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


K. Arai (Ed.): FTC 2022, LNNS 561, pp. 478–484, 2023.
https://doi.org/10.1007/978-3-031-18344-7_32

One of the common techniques in this task of object detection and recognition is
Deep Neural Network (DNN). It can be utilized in two famous techniques, namely,
one-stage detection and two-stage detection. One-stage detection includes You Only
Look Once (YOLO) [5] and Single Shot Detection (SSD) [4]. On the other hand, two-
stage detection includes Region Convolutional Neural Network (R-CNN) [7] and Spatial
Pyramid Pooling (SPP) [6]. The advantage of one-stage detection is that it detects at
higher speed and in real time, but with a slight disadvantage in recognition precision.
2D and/or 3D images captured by high-powered cameras and
highly programmed lidar sensors could be used for this task.
This study aims to utilize a convolutional neural network implemented with YOLOv4 to
recognize obstacles using RGB and depth image data. Both the RGB and depth image
data will be captured by a stereo camera placed in a golf picker buggy.

2 Methods
2.1 Preparation
Although there are many datasets that could be used for obstacle recognition, the study
developed its own dataset. The images in the dataset were captured using two cameras –
FLIR and Mynteye. These cameras were installed in an autonomous vehicle that ran
for more than 200 h on the road, both in daytime and at night. Both cameras run on
Linux and ROS and provide a rosbag after the recording process. As seen in Fig. 1, the
cameras were attached to the autonomous vehicle presented below.

Fig. 1. Cameras attached in the autonomous vehicles

The camera is positioned at the front of the vehicle, approximately one
meter above the ground. The setup was intended to capture images of obstacles such as
pedestrians, cars and movables.

2.2 Data Gathering Tool


The data gathering tool is composed of a Mynteye AI D1000-IR-120. Its lens provides
a stereo resolution of 2560 × 720 at 60 fps while its depth resolution captures at 1280
× 720 at 60 fps. This camera provides both depth and an RGB data (see Fig. 2 below).

2.3 Data Gathering

For the data gathering, the team was able to develop a comprehensive, high-density
and heavily occluded image dataset which comprises eight labels. A total of
700,000 annotations were completed across all labels. The annotation was done using
labelImg, an opensource tool for different annotation formats such as YOLO, JSON and
others.

Fig. 2. Depth and RGB Data of Mynteye Stereo Camera

2.4 Pre-requisites

The study used YOLO (You Only Look Once) and LabelImg for annotation. The YOLO
format includes center of the bounding box, width and height of the bounding box and
the class or label of the object. Annotation of at least 200,000 images was done manually
using LabelImg. A total of 700,000 annotations were gathered. This sums up as the initial
dataset for this study.
The software implementation was developed in Robotic Operating System using the
C++ language, with the pre-trained YOLOv3 architecture and the Darknet and PyTorch
frameworks and libraries. In addition, the mynteye SDK and modules were used. This
provides a more convenient and efficient way of working with the mynteye camera and OpenCV
libraries. Initially, we applied YOLO, which works by splitting an image into an n × n grid
of cells (usually 19 × 19). For each cell that represents a certain part of an object,
there will be predicted bounding boxes, confidence scores, and class probabilities. The
confidence is calculated using an IOU (intersection over union) metric that measures
how much a detected object overlaps with the ground truth as a fraction of the total area
spanned by the two together (the union).
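For illustration, the IoU just described can be computed for two axis-aligned boxes (xmin, ymin, xmax, ymax) as follows (this is a generic sketch, not code from the study):

# Sketch: intersection-over-union for two axis-aligned bounding boxes.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((50, 50, 150, 150), (100, 100, 200, 200)))   # ≈ 0.143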
YOLO aims to predict a class of an object in an image by the use of a bounding box
(see Fig. 3 below). Each bounding box has four descriptors namely:
Center of the bounding box (bxby) or (x, y)
Width (bw)
Height (bh)
C = class or label
YOLO predicts whether there is an object in the image instead of searching for
regions of interest. It splits the image into an S × S grid of cells. Each cell is responsible
for predicting n bounding boxes, each alongside a confidence value. If a grid cell does
not contain an object, its confidence value must be zero. Most of these cells do not
contain an object; therefore, boxes with low object probability are removed and, among
overlapping boxes, only the one with the highest confidence is kept. This process is
called non-max suppression. If the center of an object falls into a grid cell, that cell is
mainly responsible for the detection.
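A generic sketch of this greedy non-max suppression procedure is given below; the confidence and IoU thresholds are illustrative assumptions.

# Sketch: greedy non-max suppression over detections given as
# ((xmin, ymin, xmax, ymax), confidence) tuples.
def non_max_suppression(detections, iou_thresh=0.5, conf_thresh=0.25):
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    # Keep high-confidence boxes first; drop any later box that overlaps a kept one.
    for box, conf in sorted([d for d in detections if d[1] >= conf_thresh],
                            key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_thresh for kept_box, _ in kept):
            kept.append((box, conf))
    return kept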

Before non-max suppression After non-max suppression

Fig. 3. YOLO architecture

3 Results
The results of the training are shown in Table 1 below. Given the generous number
of samples in the dataset, the model was able to achieve an accuracy of 92.48% for
pedestrian, 82.22% for car and 49.96% for movables which includes traffic cones. The
true positive and false positive metric was also shown below.

Table 1. Model training performance

Class Performance metrics


Accuracy TP FP
Pedestrian 92.48 11160 1356
Bicycle 62.27 8288 3174
Motorcycle 69.36 7969 1800
Car 82.22 188692 33501
Bus 61.87 4125 844
Truck 62.46 23401 7018
Movable 49.96 87947 54307

Once the model training was complete and an acceptable model was reached, the
mynteye camera was connected to a Linux-based industrial PC with ROS Kinetic
installed on Ubuntu 16.04. The mynteye camera has a downloadable SDK; for this
version of mynteye, SDK D is needed. Once the mynteye camera is up and running, it
publishes multiple topics. The topics needed for this object recognition task are:
/camera/image_raw == RGB data
/camera/dmap == depth data

Fig. 4. Depth data and the Topics Associated with the Synchronization

As seen in Fig. 4 above, the depth image data from the RGB-D camera is shown with the
associated topics presented in rqt_graph. Among the topics listed, two topics captured
from the RGB-D camera, image_raw_color and depth_registered, are the inputs for the
darknet_ros node. These two topics were synchronized using an image_sync function.
This module is enabled for each grab call, and the synchronized images are fed into the
AI module (the darknet_ros node), which outputs the detected objects for each frame.
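As a rough illustration of this synchronization, the sketch below subscribes to the two camera topics quoted earlier with rospy and message_filters; the actual system uses the C++ darknet_ros node, and the slop and queue sizes here are arbitrary example values.

# Sketch: approximate time synchronization of the RGB and depth topics.
import rospy
import message_filters
from sensor_msgs.msg import Image

def on_synced_frames(rgb_msg, depth_msg):
    # Both images share (approximately) the same timestamp; this is the point
    # where they would be handed to the detection module for the current frame.
    rospy.loginfo("synced pair at t=%.3f", rgb_msg.header.stamp.to_sec())

rospy.init_node("rgbd_sync_example")
rgb_sub = message_filters.Subscriber("/camera/image_raw", Image)
depth_sub = message_filters.Subscriber("/camera/dmap", Image)
sync = message_filters.ApproximateTimeSynchronizer([rgb_sub, depth_sub],
                                                   queue_size=10, slop=0.05)
sync.registerCallback(on_synced_frames)
rospy.spin()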

4 Discussion
The next step is to publish this information as a topic. The information is transferred
as a message for the AV. The target contents of the message are the following (see
Table 2 below):

Table 2. Ros message contents

Data type Message contents


Name Description
Float64 probability Confidence level of the recognized object
Int64 xmin Bounding box minimum x coordinate (pixels)
Int64 ymin Bounding box minimum y coordinate (pixels)
Int64 xmax Bounding box maximum x coordinate (pixels)
Int64 ymax Bounding box maximum y coordinate (pixels)
Int16 id Identifier of the detection
String Class Label’s name
Float32 z Distance from the camera
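A simplified publishing sketch is shown below; since the exact custom message type is not given in the paper, the fields of Table 2 are serialized to JSON on a std_msgs/String topic as a stand-in, and the topic name is hypothetical.

# Sketch: publishing the per-detection information of Table 2 as a JSON string.
import json
import rospy
from std_msgs.msg import String

rospy.init_node("detection_publisher_example")
pub = rospy.Publisher("/obstacle_detections", String, queue_size=10)
rospy.sleep(1.0)                      # give subscribers time to connect

detection = {
    "probability": 0.92,              # confidence of the recognized object
    "xmin": 120, "ymin": 80, "xmax": 260, "ymax": 310,   # bounding box (pixels)
    "id": 0,                          # detection identifier
    "Class": "pedestrian",            # label name
    "z": 4.7,                         # estimated distance from the camera (m)
}
pub.publish(String(data=json.dumps(detection)))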

Fig. 5. Actual Ros message being published

The sample output of the framework is shown in Fig. 5 above.


Figure 5 shows the actual Ros message that was being published by the
darknet_ros node. This includes the name of the class (object), confidence level of
detection and the bounding box parameters and lastly the estimated distance of the class
to the camera. Figure 6 below shows the actual detection and recognition of the objects.

Fig. 6. Actual detection and recognition

5 Conclusion
The study was able to develop a system prototype of object recognition task using 2D
and depth information with distance estimation. The prototype was implemented in an
AV. This study recommends the use of lidar and camera for better sensing capabilities
of the vehicle since lidar is also a powerful tool for object recognition task.

References
1. Rohr, C., Ecola, L., Zmud, J., Dunkerley, F., Black, J., Baker, E.: Travel in Britain in 2035:
Future scenarios and their implications for technology innovation (2016). In Innovate UK from
https://www.rand.org/pubs/research_reports/RR1377.html

2. Trommer, S., Kolarova, V., Frädrich, E., et al.: Autonomous driving The impact of vehicle
automation on mobility behavior (2016). https://www.ifmo.de/publications.html?t=45
3. Urry, J.: What is the Future. Polity Press, Cambridge (2016)
4. Liu, W., Anguelov, D., et al.: SSD: single shot multi-box detector. In: ECCV (2016). https://
doi.org/10.1007/978-3-319-46448-0_2
5. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv:1804.02767 (2018)
6. He, K., Zhang, X., et al.: Spatial pyramid pooling in deep convolutional networks for visual
recognition. In CVPR (2014)
7. Girshick, R.: Fast r-cnn. In: ICCV (2015)
Humanoids Improving the Quality of Life
of Older People: The Case of Poland

Katarzyna Halicka(B)

Bialystok University of Technology, Bialystok, Poland


[email protected]

Abstract. Population ageing is a long-term global trend that began several


decades ago and became especially prominent in Europe. This trend is visible
in the changing age structure of the population as the proportion of older people
is growing while the proportion of working-age people is reducing. The demo-
graphic changes in the EU result in the decreasing percentage of working-age
people and the increasing relative number of pensioners. The coming decades will
see a significant rise in older adult numbers in the total population. The increas-
ing life expectancy and ageing are putting a strain on the economy while also
demanding improvements in society’s well-being, meeting the needs of older people and
making use of their potential and abilities. Older adults require support in the form
of care, especially in the case of a low independence level. One technology that
can significantly improve the quality of life of older people is humanoid robot (for
example the Rudy Robot). This paper aims to analyse and evaluate this technology
in terms of different groups of criteria: competitiveness, demand, technical, social
and ethical, ecological, and ease of use. These criteria were initially identified
and then examined to see which of them the respondents thought were the most
important. According to the respondents the most important criteria are demand
and technical aspects. Therefore, the article analyses the Rudy Robot in more
detail in terms of these two groups of criteria. It was also checked whether age,
gender of the respondent, place of residence and education had an impact on the
assessment the Rudy Robot. The survey was conducted in Poland at the turn of
2020 and 2021 with people over 40 years of age. The representative research sam-
ple consisted of 1152 respondents (all respondents were Polish citizens from all
voivodships). This type of research has not been conducted in Poland so far.

Keywords: Humanoids · Robots · Ageing population · Older adults ·


Technology assessment

1 Purpose
The scientific objective of the study is to analyse and evaluate the humanoid robotic
technology Rudy Robot for its capacity to improve the quality of life of older people. The
technology was evaluated against different groups of criteria. The examined demographic
characteristics included age, gender, place of residence and education. The main research
problem was formulated in the form of the following questions:

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


K. Arai (Ed.): FTC 2022, LNNS 561, pp. 485–492, 2023.
https://doi.org/10.1007/978-3-031-18344-7_33

1. What are the most important criteria for evaluating the Rudy Robot?
2. How was the humanoid assessed against different criteria?
3. Does age influence the assessment of humanoid robotic technologies used in the care
of older adults?
4. Does gender influence the assessment of humanoid robotic technologies used in the
care of older adults?
5. Does education affect the assessment of humanoid robotic technologies?
6. Does the place of residence affect the assessment of the Rudy Robot?

The article consists of five sections: literature review, research method, results, dis-
cussion, conclusions. Initially, based on the literature review, examples of humanoid
robot technologies are discussed and their main features indicated. In the next section,
the research process is presented. Four main research tasks are described and methods
used in the whole process are characterised. Further, the result of the research conducted
within the article is described. Then, the results are discussed and the article ends with
a short summary. Limitations of the performed research and plans for further research
are also presented.

2 Literature Review

The development of civilisation would have been impossible without inventions. For
more than two decades, the proportion of working-age people in the EU-28 has been
steadily declining, while the relative number of people in retirement has increased.
In 2021, compared to 2020, the proportion of people aged 65 and over increased in
all Member States except Lithuania, where it remained unchanged. The proportion of
the population aged 65 and over is increasing in every EU Member State, EFTA and
candidate countries. In the last decade, the increase ranged from 5.2 p.p. in Finland, 5.1
p.p. in Poland, 4.7 p.p. in Liechtenstein and 4.6 p.p. in the Czech Republic, to 1.3 p.p.
in Germany and 0.7 p.p. in Luxembourg. Over the last decade (2011–2021), there has
been an increase of 3 p.p. for the EU as a whole [1].
The proportion of the working-age population is estimated to continue decreasing
between 2021 and 2100, while older adults are likely to represent an increasing pro-
portion of the total population: people aged 65 or over will represent 31.3% of the EU
population by 2100, compared to 20.8% in 2021. The median age is projected to increase
by 4.9 years, rising from 44.1 in 2021 to 48.8 in 2100.
The increase in the number of older people is also associated with the need to provide
institutional support in the form of care, particularly in the case of a low level of inde-
pendence. The need exists for pro-health and digital education, the development of care
services, the creation of safe and functional housing and access to public transport. Tech-
nical solutions must be developed to support the functioning of older people. However,
such solutions require recognising knowledge from such fields as anthropotechnics (in
the scope of human-computer relations), cognitive psychology, neurobiology, artificial
intelligence, and IT, electrical and communication engineering. Robots can be one of the
major ways of helping older people. The purpose of robots is to help older people to live
and function as independently as possible [2]. Older people can use robots to lift, grip,
carry, remind them to take medication, recognise and assess health conditions, monitor
gait, motivate walking, and meet social needs through interaction. Pearl is an example of
a humanoid robot developed by Carnegie Mellon University. It is a mobile robot that can
help older people navigate a care facility. Pearl can follow patients, communicate via a
graphical touchscreen and serve as a telehealth device. It is equipped with two comput-
ers, sonar, stereo camera systems and wireless Ethernet. It can recognise and synthesise
speech using microphones and speakers [3]. Pearl reminds older people of their daily
activities, such as eating, drinking, taking medicine or using the bathroom, and helps to
navigate their environments. Robovie is a humanoid robot designed to communicate with
humans and weighs approximately 40 kg. It is equipped with various sensors, including
skin and touch, microphones, vision sensors and ultrasonic obstacle sensors installed on
the mobile platform. The combination of sensors and various actuators for moving eyes,
head and arms facilitate meaningful behaviour [4].
Another example is the Olivia Robot, acting as a personal assistant and companion for
older people. The robot was developed at the A*STAR social robotics lab in Singapore.
Olivia version 2.1 is equipped with a pair of cameras for the eyes, and a third camera on the
forehead, targeting the face of the interlocutor and determining from the movement of the
mouth whether the person is speaking to it. Then, using the built-in eight microphones,
it starts listening to the caller while pointing its mechanised face towards the person’s
face. The stationary version of Olivia stands 1.6 m tall while weighing 152 kg, and the
researchers plan to mount it on a moving platform and equip it with gripping three-finger
manipulators.
Twendy-One is an extremely agile and intelligent humanoid robot designed to help
older and disabled people. It was designed and built by Japanese students and researchers
at Waseda University in Tokyo. Twendy-One can hold limited conversations. It uses a
built-in camera to locate designated objects. It can greet you, bring you breakfast on a
tray, wish you a tasty meal, then help you get out of bed and give you clothes or a walking
stick. The Twendy-One robot, equipped with soft hands that respond to human touch,
could be a recipe for overcrowded retirement homes. It can sensitively and effectively
help people get out of a chair or bed, and interact with its owner, responding to touch
and pressure accordingly, thanks to its sensors. The Twendy-One motor system has 47
degrees of freedom. It stands 147 cm tall and weighs 111 kg.
Another example of a humanoid robot is the RUDY robot. The RUDY robot can
remind a person to take medication and can also dispense it. It can call for help in an
emergency, be a good companion, reducing the feeling of loneliness.
Some characteristic features of humanoid robots and their functional capabilities have
been presented in the literature. On the basis of the literature review, it should be stated
that so far there have been no studies in which the humanoid robot technology and its
usefulness have been assessed. Therefore, it seems important to investigate what features
of robots according to current and future users are very important. It is also important
to find out whether the age, gender, etc. of the respondent affects the assessment of this
technology.

3 Research Method
The entire research process consists of four main steps. The first research task is to iden-
tify, based on a literature review, criteria for evaluating the humanoid robotic technology
Rudy Robot.
Another research task is to evaluate the individual six groups of criteria by the
respondents, organise these groups and select the top two. In the study, the research
method was a diagnostic survey using the CAWI (Computer-Assisted Web Interview)
survey technique. Groups of criteria were evaluated by 1152 respondents from Poland,
aged over 40 years.
The third step of the research was the evaluation of the technology (Rudy Robot) in
the context of the two highest-rated criteria: demand and technical aspects.
The last step of the research was to investigate whether age, gender, education and
place of residence impacted the technology assessment (Rudy Robot) in the context
of the two groups of criteria: demand and technical aspects. It was checked how the
humanoid robotic technology Rudy Robot would be assessed by older adults in the
context of the highest-rated groups of criteria. The study used the non-parametric Mann–
Whitney U test to determine the effect of gender on the technology assessment. The
ANOVA Kruskal–Wallis test was used to examine the influence of age, education and
place of residence on the evaluation of the Rudy Robot.
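For reference, the two tests can be reproduced with SciPy as sketched below; the ratings are random placeholders (only the group sizes follow the sample described in this paper), and the actual analysis was performed in Statistica 13.

# Sketch: Mann-Whitney U test (gender) and Kruskal-Wallis test (age groups)
# on illustrative 7-point Likert ratings of one criterion.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
women = rng.integers(1, 8, size=625)        # ratings by women (n = 625)
men = rng.integers(1, 8, size=527)          # ratings by men (n = 527)
u, p_gender = stats.mannwhitneyu(women, men)
print("Mann-Whitney U p-value:", p_gender)

age_40_49 = rng.integers(1, 8, size=303)
age_50_59 = rng.integers(1, 8, size=329)
age_60_plus = rng.integers(1, 8, size=520)
h, p_age = stats.kruskal(age_40_49, age_50_59, age_60_plus)
print("Kruskal-Wallis p-value:", p_age)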
The survey was conducted in Poland at the turn of 2020 and 2021 with people over
40 years of age. The representative research sample consisted of 1152 respondents (all
respondents were Polish citizens of all voivodships). In terms of age, the largest group
of respondents was aged over 60 (520 people, 45.1%). The second-largest group was
aged 50–59 and consisted of 329 people (28.6%), while the smallest group was aged
40–49 (303 people, 26.3%). In terms of gender, women accounted
for over half of all respondents (625 people, 54.3%), and men comprised 45.7% (527
people). Over 42% (493 people) of all respondents had primary education, 31.1% (358
people)—higher education, 22.3% (257 people)—vocational education and 3.8% (44
people)—basic education.
The respondents evaluated the assessment of the analysed humanoid robotic tech-
nology using a seven-point Likert scale, where one meant “it definitely means I do not
agree with the given statement” and seven—“I definitely agree.”

4 Results

First, based on the literature review, the following six criteria groups were identified:
competitiveness, demand criteria, technical criteria, social and ethical criteria, ecological
criteria, and ease of use [5–7]. The criteria were formulated in the form of statements.
37 criteria were identified with eight statements related to the demand for technologies
(D1–D8) and the ecological aspect (E1–E8), six—to social and ethical aspects (SEC1–
SEC6) and competitiveness of technology (C1–C6), five—to technical criteria (T1–T5)
and four—to ease of use (EU1–EU4).
Next, the six groups of criteria were evaluated, where one meant “very unimportant”
and seven—“very important”. The highest-rated were demand and technical criteria. The
list of the highest-rated group of criteria used to assess the humanoid robotic technology
Rudy Robot is presented in Table 1.

Table 1. Catalogue of highest-ranked technology assessment criteria

Acronym Name of the criterion Mean scores


DEMAND
D1 There is a need for the Rudy Robot in institutions responsible for 4.85
the care of older adults (e.g., nursing homes)
D2 The Rudy Robot will bring users additional benefits unavailable 5.03
through other solutions
D3 The popularisation of the Rudy Robot corresponds to forecasts 5.05
concerning technology development directions and the
expectations of older adults
D4 The Rudy Robot has higher ease of use and operation simplicity 5.00
than the technologies used so far
D5 The use of the Rudy Robot is consistent with the previous habits 4.26
of older adults
D6 Changes in the environment (e.g., new legal regulations, 4.73
consumption trends or technological standards) make the Rudy
Robot more attractive for older adults
D7 The infrastructure necessary for the efficient use of the Rudy 4.15
Robot is available
D8 Potential users are ready to pay for the Rudy Robot considering 3.65
the prices of technologies used so far
TECHNICAL CRITERIA
T1 The Rudy Robot is implemented and successfully used by older 3.80
adults
T2 Serious technical problems are likely to occur during the 5.01
development of the Rudy Robot
T3 The widespread use of the Rudy Robot depends on the use of 5.20
hard-to-reach materials
T4 The Rudy Robot can complement the solutions currently 4.15
available on the market
T5 There is a great potential for further improvement of the Rudy 3.65
Robot

Next, the age and gender of respondents were checked for the influence on the tech-
nology assessment of the humanoid robotic technology Rudy Robot. A critical signifi-
cance level was assumed at p = 0.1. The study used the non-parametric Mann–Whitney
U test [8, 9] to determine the effect of gender on the technology assessment. The ANOVA
Kruskal–Wallis test was used to examine the influence of age, education and place of res-
idence on the evaluation of the Rudy Robot. The research used the Statistica 13 software.
The respondents assessed the demand for the humanoid robotic technology Rudy Robot.
The statistical values for the technology assessment in terms of demand are presented
in Table 2.

Table 2. Statistical values of the Rudy Robot’s technology assessment by older adults for demand

Acronym Gender Age Education Residence


p T p T p T P
D1 0.00887 1.559 0.458 8.491 0.369 2.647 0.754
D2 0.01587 7.867 0.019 1.779 0.619 10.155 0.071
D3 0.01951 5.523 0.063 2.686 0.443 6.113 0.295
D4 0.27676 6.434 0.040 5.997 0.1127 7.100 0.213
D5 0.18436 0.55 0.759 8.901 0.306 5.820 0.324
D6 0.01123 8.052 0.018 14.590 0.002 3.433 0.634
D7 0.99766 1.652 0.438 8.369 0.385 6.861 0.231
D8 0.73719 0.497 0.780 8.641 0.345 10.989 0.052

The respondents also assessed technical aspects of the humanoid robotic technology
Rudy Robot (Table 3).

Table 3. Statistical values of the Rudy Robot’s technology assessment by older adults for the
technical aspect

Acronym Gender Age Education Residence


p p T p T p T
T1 0.15486 0.667 14.906 0.002 13.900 0.016 0.667
T2 0.09695 0.361 4.867 0.187 3.548 0.616 0.361
T3 0.01099 0.435 2.886 0.409 6.856 0.231 0.435
T4 0.10666 0.5455 7.247 0.644 12.027 0.034 0.5455
T5 0.20400 0.284 2.684 0.443 9.759 0.082 0.284

5 Discussion

Table 2 shows significant assessment differences (D1, D2, D3 and D6) between genders
in their rating of the demand level for the humanoid robotic technology Rudy Robot (p <
0.1). Also, Table 2 demonstrates statistically significant (p < 0.1) differences depending
on age (the acceptance of drivers D2, D3, D4 and D6), education (D6) and the place of
residence (D2 and D8).
Table 3 provides that statistically significant differences between genders in this
technology assessment for the technical aspect occur only in statement T3 (p < 0.1).
Such differences depending on education appear in the acceptance of statement T1 (p
< 0.1) and in terms of the place of residence—statements T1, T4 and T5 (p < 0.1).
No significant differences were observed depending on age in the assessment of these
statements.
A 90% probability exists that a respondent’s gender influences the assessment of
technology in terms of demand and technical criteria, i.e., statements D1 (“There is a
need for the Rudy Robot in institutions responsible for the care of older adults e.g.,
nursing homes”), D2 (“The Rudy Robot will bring users additional benefits unavailable
through other solutions”), D3 (“The popularisation of the Rudy Robot corresponds to
forecasts concerning technology development directions and the expectations of older
adults”), D6 (“Changes in the environment make the Rudy Robot more attractive for
older adults”) and T3 (“The widespread use of the Rudy Robot depends on the use
of hard-to-reach materials”). The research results also allow observing with the same
probability of 90% that the age of a respondent influences the technology assessment in
terms of demand and technical criteria, i.e., statements D2, D3, D4 (“The Rudy Robot
has higher ease of use and operation simplicity than the technologies used so far”) and
D6. The age of the respondent has no influence on the evaluation of the technical aspect
of humanoid robotic technologies.
Also, it was found that a respondent’s education influences the technology assessment
of demand and technical criteria, i.e., statements D6 and T1 (The Rudy Robot is imple-
mented and successfully used by older adults). In turn, place of residence influenced
the assessment of humanoid robotic technologies in terms of the criteria D2, D8, T1,
T4 (“The Rudy Robot can complement the solutions currently available on the market”)
and T5 (“There is a great potential for further improvement of the Rudy Robot”).

6 Conclusions
The article assesses humanoid robotic technologies (Rudy Robot) that improve the qual-
ity of life of older people. The research mainly aimed to find answers to the following
questions: (1) What are the most important criteria for evaluating the Rudy Robot? (2)
How would a humanoid be assessed against different criteria? (3) Does age influence
the assessment of humanoid robotic technologies used in the care of older adults? (4)
Does gender influence the assessment of humanoid robotic technologies used in the
care of older adults? (5) Does education affect the assessment of humanoid robotic
technologies? (6) Does the place of residence affect the assessment of the Rudy Robot?
The conducted research showed that the humanoids for older adults received the
highest rating in terms of demand (average of all scores on a scale from 1 to 7 – 4.67).
This technology also received high ratings for the technical aspects (average of all
scores on a scale from 1 to 7 – 4.36).
The research indicated that the age of the respondent does not affect the evaluation
of the Rudy Robot technology in terms of technical aspects, for each criterion of this
group a critical significance level was greater than 0.1 (p > 0.1). Also, the education
and gender of the respondent mostly do not affect the evaluation of this technology in
terms of technical aspects.
However, age and gender have an impact on the evaluation of this technology (the
Rudy Robot) in terms of demand.
The study carried out has the following limitations:

1. only respondents over 40 years of age made an assessment of the technology, younger
people were not considered;
2. the survey was conducted on the territory of only one country;
3. all criteria had the same weighting.

In her next research, the author plans to extend the research to a larger sample and
other countries. She also intends to consider a different weight of criteria and a different
weight of decision makers (e.g. taking into account the age of respondents according
to the assessment criterion). Furthermore, in the author’s opinion, other technology
assessment criteria should also be taken into account, such as technology readiness
levels or technology life cycle analysis.

Acknowledgments. This research was funded by the Ministry of Science and Higher Education,
grant number W/WIZ/1/2019. The publication of the article for 11th International Conference on
Engineering, Project, and Production Management - EPPM2021 was financed in the framework of
the contract no. DNK/ SN/465770/2020 by the Ministry of Science and Higher Education within
the "Excellent Science" programme.

References
1. Eurostat. https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Population_stru
cture_and_ageing#Past_and_future_population_ageing_trends_in_the_EU Accessed 29 Mar
2022
2. Zhou, D., Barakova, E.I., An, P., Rauterberg, M.: Assistant robot enhances the perceived
communication quality of people with dementia: a proof of concept. IEEE Transactions on
Human-Machine Systems (2021). https://doi.org/10.1109/THMS.2021.3112957
3. Pollack, M.E., et al.: A mobile robotic assistant for the elderly. Paper presented at the AAAI
Workshop on Automation as Eldercare, July 29, 2002, Edmonton, Alberta, Canada (2002)
4. Kanda, T., Ishiguro, H., Imai, M., Ono, T.: Development and evaluation of interactive humanoid
robots. Proc. IEEE 92, 1839–1850 (2004)
5. Ejdys, J.: Innovativeness of residential care services in Poland in the context of strategic
orientation. Procedia. Soc. Behav. Sci. 213, 746–752 (2015)
6. Nazarko, J., et al.: Foresight study of road pavement technologies. Procedia Eng. 122, 129–136
(2015)
7. Radziszewski, P., et al.: Future trends in road pavement technologies development in the context
of environmental protection. Baltic J. Road Bridge Eng. 11(2), 160–168 (2016)
8. Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bulletin 1, 80–83 (1945)
9. Mann, H.B., Whitney, D.R.: On a test of whether one of two random variables is stochastically
larger than the other. Ann. Math. Stat. 18, 50–60 (1947)
3D Concrete Printing with Macro-micro Robots

Ian D. Walker(B) , Venkat N. Krovi, Abdul B. Peerzada, Adhiti Raman,


Prasad Rangaraju, Matthias J. Schmid, and Manu Srivastava

Clemson University, Clemson, SC 29634, USA


[email protected]

Abstract. We discuss an innovative approach to automating a key process in the


construction industry. Specifically, we describe the design and development of a
robotic system designed to interactively assist construction workers to dexterously
deploy concrete-delivery hoses in congested spaces for 3D printing of concrete.
The approach is based on an intelligent novel cable-driven macro/micro co-robot,
where a cable driven parallel robot acts as the macro-base, and a cable-driven
continuum robot (integrated with an industrial concrete delivery hose) serves as
the micro-unit. The macro system acts as a relatively light and easily deployable
base unit, with the micro-unit providing the delivery hose with dexterity, by con-
verting it into a continuum robot. Overall, the system is kinematically redundant,
enabling the use of the redundant degrees of freedom to enhance the performance
of the system. The robotics research is enabled by materials research in cementi-
tious materials, developing concrete with suitable properties for 3D printing. We
describe the design, development, and initial testing of the system.

Keywords: Robotics · 3D printing · Construction

1 Introduction
1.1 Motivation and Overview
The construction industry is vital for the national economy, accounting for approximately
5% of US Gross Domestic Product (GDP). In 2017, the average gross output by the
construction industry in the first three quarters was reported at $1.463 trillion [1], creating
employment for 6.7 million workers [2]. However, construction is also one of least
automated industries in the world [3]. Productivity within the industry is hindered by
the lack of automation tools. Concrete operations are a foundational element within
construction, and automation of those operations could improve efficiency, productivity,
and safety.
In this work, we discuss the augmentation of the cement delivery process, specifi-
cally the robotization of the concrete delivery process. The goal is to develop a relatively
portable, cable-actuated system, to position and maneuver the cement hose in the pres-
ence of obstacles, including construction workers, in the workspace. A further goal is to
provide additional dexterity at the nozzle, to allow for non-traditional pouring of cement,
to enable, for example, repair operations requiring non-vertical deposition.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


K. Arai (Ed.): FTC 2022, LNNS 561, pp. 493–498, 2023.
https://doi.org/10.1007/978-3-031-18344-7_34

In the following sections we describe and discuss the robot system design and imple-
mentation, as well as related materials research (to produce cement suitable for robotic
printing) and experiments conducted to date with an early prototype of the robotic hose.

2 Robot System Design and Implementation


2.1 Macro-system Parallel Cable Driven Robot Base
The macro-system, which is a cable-driven parallel robot (CDPR), maneuvers the (now
itself robotic) cement delivery hose within the construction site. The design concept is
for the (vehicle-) portable system to be transported to the site and quickly assembled. A
supporting frame structure houses the winches that actuate the cables of the CDPR. The
winches vary the length of the cables, and in addition their bases can actively translate
along the beams of the supporting structure. This arrangement provides redundancy in
the CDPR system. The payload for the CDPR, to which the cables are terminated in
the interior of the system, is the micro-system cable hose (discussed in the following
subsection).
As part of this research, we have adopted a new modeling framework for prototyping
the CDPR design. We have performed a wrench space analysis of end-effector properties
of general reconfigurable CDPRs, and subsequently developed redundancy resolution
schemas to optimize alternative parameters in the system. In particular, by suitably
utilizing the redundancy, the stiffness of the CDPR payload can be optimized in given
(desirable) directions.
We are also developing a framework for failure mode identification and recovery
using the redundancy in the CDPR system. This allows the system to identify the most
likely location of cable failures, and to use the redundancy in the system to recover from
the failure, where physically possible.

2.2 Micro-system Cable Driven Continuum Robot Hose


The micro-system is a robotized version of the industrial cement hose. Actively actuated
by externally mounted cable tendons, the hose effectively becomes a continuous back-
bone “continuum” robot [4], with two sections (independently controllable lengths of
the backbone). Conversion of the hose to a cable driven robot was achieved by fitting a
series of collars through which cable tendons were routed, around the hose. The collars
were evenly spaced along the last 2 m of the hose – the part for which robotic activation
was desired. See Fig. 1.

Fig. 1. Tendon-routing collars mounted on industrial cement hose.
A set of three tendons, spaced at 120 degree separation radially, were terminated at
the tip of the hose, and another three tendons were terminated approximately half way
along the robotized hose. Differential tensions on each of the three tendon sets produces
controllable bending, in two dimensions, of that section of the hose. Consequently, the
hose has four controllable degrees of freedom, two per section. This provides a single
degree of redundancy for orientation of the hose tip, which can be used to support avoidance
of obstacles in the workspace.
The tendons were actuated by electric motors, which were mounted in an assembly at
the base (proximal end) of the robotized part of the hose. Grooved capstans over which
the cables were routed formed a key part of this assembly. Bearings were mounted
within the collars to reduce the effects of friction – a challenge for tendon-operated
continuum robots in configurations featuring high bending angles in practice. Encoder
measurements at the motors were transformed to hose shape, via cable lengths, with
continuum kinematics [5]. The resulting system is shown in Fig. 2.
Control of the hose robot was initially implemented utilizing Constant Curvature
(CC) kinematics, the standard approach for continuum robots [4, 5]. However, the non-
uniform construction of the cement hose, featuring a heavy nozzle, produced section
curvatures that deviated significantly from the constant curvature assumption. This led
to significant errors in the end effector positions and orientation achieved. Consequently,
we subsequently developed and implemented an Euler curve based Variable Curvature
(VC) kinematic model for multiple (two, in the hardware implementation) continuum
robot sections. The VC approach significantly improved end effector accuracy compared
to the CC model. Details of this approach can be found in [6].
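For context, the baseline constant-curvature model maps a section's curvature, bending-plane angle and arc length to a tip position as in the following sketch (a generic textbook formulation [4, 5], not the authors' VC implementation):

# Sketch: constant-curvature forward kinematics for one continuum section.
# kappa: curvature (1/m), phi: bending-plane angle (rad), length: arc length (m).
import numpy as np

def cc_tip_position(kappa, phi, length):
    if abs(kappa) < 1e-9:                      # straight-section limit
        return np.array([0.0, 0.0, length])
    r = 1.0 / kappa                            # bending radius
    return np.array([r * (1 - np.cos(kappa * length)) * np.cos(phi),
                     r * (1 - np.cos(kappa * length)) * np.sin(phi),
                     r * np.sin(kappa * length)])

# Example: a 1 m section bent through 90 degrees (kappa * length = pi/2).
print(cc_tip_position(kappa=np.pi / 2, phi=0.0, length=1.0))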
The assembled prototype system with the cable hose integrated is illustrated in Fig. 3.
We are currently implementing a statics-based model that ensures uniform EE velocity
and curvature at all bending planes developed for a single section. This approach further
improves upon on the VC kinematics model. Multi-section versions of the model are
currently under development.

Fig. 2. Cable-driven robotic cement hose micro-system

Fig. 3. Robotic cement hose system prototype



3 Materials Research and Experiments


3.1 Materials Research

In order to produce cementitious materials suitable for 3D printing with the robot system,
fundamental rheological research was conducted. Specifically, we developed 3D print-
able mixtures of Portland cement with slag and metakaolin. We further investigated the
influence of aggregate shape characteristics on the rheological behavior of 3D printable
mixtures, supplemented with preliminary work on shrinkage, mechanical and transport
behavior of 3D printable mixtures. We are evaluating the impact of chemical accelerators
on properties of printable mixes.
We are also developing dynamic rheology control for cementitious materials for
3D printing. We have designed and manufactured an active-rheology control test setup,
conducting material trials utilizing rheometer and flow rate measurements. The active-
stiffening test setup designed is currently under manufacture.

3.2 Printing Experiments

Initial experiments with the prototype, using the new cementitious mixes pumped into
and through the hose, demonstrated the ability of the hose to maneuver to print a series
of shapes in the horizontal plane. See Fig. 4.

Fig. 4. Initial pumping of cement with robotic cement hose system.

The results further validated the ability of the research in rheology to generate mixes
which could both be smoothly pumped through the hose without congestion, but also
hardened sufficiently quickly to support printing multiple additional layers in the vertical
direction as the robot re-traversed the trajectory. Additional details are provided in [6].

4 Conclusions
We describe a novel intelligent cable-driven macro/micro co-robot system aimed at 3D
printing of concrete in the construction industry. A cable-driven parallel robot acts
as a macro-base, maneuvering the cable-driven continuum robot (integrated with the
concrete delivery hose in the application) which serves as the micro-unit, providing the
hose with controllable dexterity.
Both the macro and micro robotic elements possess redundant degrees of freedom.
Redundancy resolution for the cable-driven macro robot system allows controllable
stiffness of its payload. Variable curvature kinematics are used for motion planning
and control of the micro robot hose. Fundamental research in rheology of 3D-printable
concrete has enabled the research by identifying and evaluating concrete mixes suitable
for robotic printing.
Experiments with an initial prototype demonstrated the ability to 3D print closed
structures, and also highlighted the need for improved control of the system. Future
work will exploit the overall redundancy in the system, treating the macro and micro
units as a single coupled system. We are currently working on IMU-based sensing to
support end-effector-based control. The wider goals for the research are to provide field
intelligence by adding situational awareness and physical-assist capabilities.

References
1. Bureau of Economic Analysis, Department of Commerce (2017). https://www.bea.gov/
Accessed 20 Apr 2022
2. Bureau of Labor Statistics, Department of Labor (2017). https://www.bls.gov/ Accessed 20
Apr 2022
3. Agarwal, R., Chandrashekaran, S., Sridhar, M.: Imagining Construction’s Digital
Future (2016). https://www.mckinsey.com/industries/capital-projects-and-infrastructure/our-
insights/imagining-constructions-digital-future Accessed 20 Apr 2020
4. Webster, R.J., III., Jones, B.A.: Design and kinematic modeling of constant curvature
continuum robots: a review. Int. J. Robot. Res. 29(13), 1661–1683 (2010)
5. Jones, B.A., Walker, I.D.: Kinematics for multisection continuum robots. IEEE Trans. Rob.
22(1), 43–55 (2006)
6. Srivastava, M., Ammons, J., Peerzada, A.B., Krovi, V.N., Rangaraju, P., Walker, I.D.: 3D
Printing of Concrete with a Continuum Robot Hose Using Variable Curvature Kinematics. to
appear. In: IEEE International Conference on Robotics and Automation (ICRA), Philadelphia,
PA (2022)
Automatic Polarity Identification on Twitter
Using Machine Learning

José Carmen Morales Castro1 , Rafael Guzmán Cabrera1(B) , José Ruiz Pinales1 ,
Luis Manuel Ledesma Carrillo1 , and Belém Priego2
1 Departamento de estudios multidisciplinarios, Colonia Yacatitas, Universidad de Guanajuato,
Yuriria, Guanajuato, México
{jc.moralescastro,guzmanc,pinaleslm.ledesma}@ugto.mx
2 Departamento de Sistemas, Reynosa Tamaulipas universidad Autónoma Metropolitana
Unidad Azcapotzalco, Azcapotzalco, Ciudad de México, México
[email protected]

Abstract. This work presents a study of emotions aimed at analyzing the polarity of a dataset extracted from Twitter, considering the different linguistic resources a language offers and the feelings they can convey, such as irony, sarcasm, and happiness, among others. This analysis helps classify the polarity of each tweet in the corpus studied in this work. Experimental results are presented for several machine learning methods: Support Vector Machines, Naïve Bayes, Logistic Regression, KNN, and Random Forest, with which a classification system based on cross-validation was implemented. All experiments were performed in Python. Results are reported for two different corpora: the first set is made up of 10,653 tweets in total, divided equally into 3,551 tweets each with a positive, negative, or neutral label, while the second set comprises 10% of all the tweets contained in the database described in the article. The first set reaches a polarity accuracy of 74.9%, with Logistic Regression as the best classifier under cross-validation, while the second set reaches an accuracy of 78.5%, with Random Forest as the best classifier, also under cross-validation.

Keywords: Sentimental polarity · Sentiment analysis · Machine learning

1 Introduction
Currently microblogging websites have become digital spaces of varied information,
where users publish and disseminate information in real time related to a wide variety of
topics where opinions can be expressed through texts that implicitly carry an emotional
charge. This means that opinions carry an emotional charge that becomes a positive or
negative opinion about people, products, or services that are conducted in daily life.
Several companies, organizations and institutions have made use of this type of
media to obtain feedback, promote themselves, or simply to convert the opinion of users
into an improvement network that has begun to poll micro blogs to get an idea about the

general sentiment of their users, products, and services [1]. In this context, Twitter has in recent years seen important growth in the so-called "social panoramas", used both as a broadcasting system and as a conversation tool [2]. For this reason, this social network is now widely used in numerous investigations, including sentiment analysis, also known as opinion mining, where sentiment analysis is defined as the process of determining opinions based on attitudes, evaluations, and emotions about specific topics [3].
Some research works, such as [4], describe opinion mining as the automatic treatment of opinions contained in a sentence; this allows determining the polarity or feeling expressed, whether positive, negative, or mixed, and enables the automatic extraction of features that reveal the perception users have of specific topics and aspects.
The emotions that users express in tweets are related to the feelings of the person, and the polarity (positive, negative, or neutral) is the measure of the emotion expressed in a sentence. Generally, the polarity ranges from negative (−1) to positive (1), passing through neutral (0); this last value means that no feeling or opinion has been expressed [5].
The structure of this research work is briefly described below. Section 2 discusses related work to give a better understanding of the orientation of the project and states the research question we address. Section 3 explains the proposed methodology and the process that allowed us to develop the project. The results and conclusions of this work are presented in Sects. 4 and 5, respectively, followed by the references used to carry out this research.

2 Related Work

Authors such as [6] describe sentiment analysis as the task of identifying and classifying different points of view and opinions on a particular issue, which may concern an object, a person, or an activity, among others. It relies on natural language processing (NLP) to identify people's states of mind by collecting comments, reactions, and messages from social networks, with the main objective of analyzing online documents and classifying their sentiment as positive or negative; when no sentiment is present, they are classified as neutral. Research on sentiment analysis in social networks has been growing, and its classification depends heavily on the keywords used in the texts; one factor that can cause problems is the information contained in graphics, videos, or images, since they may include information not found in the accompanying text.
In [7] the authors present some of the techniques used in sentiment analysis, which help determine the polarity of a text automatically. The most common are those based on machine learning, an important part of Artificial Intelligence, since it develops programs through learning algorithms and the generation of knowledge

capable of learning to solve problems; within the possible applications that can become
as useful as they are different, which is why it is currently still an open research topic
in which very attractive contributions continue to be made and interesting in the area
of sentiment analysis, it is worth mentioning that sentiment analysis not only focuses
on the part of identifying polarity in opinions expressed through subjective texts, since
this task can go much further, allowing even to carry out the identification of particular
feelings, such as the classification of primary feelings such as joy, sadness, anger, fear,
among others.
Another technique used in sentiment analysis is semantic orientation, which is responsible for extracting opinions. In [8] the authors explain that the semantic orientation of a word can be positive when it is expressed as praise, or negative when it is presented as criticism. It uses a learning technique that does not necessarily have to be supervised, since it does not require initial training, that is, it does not require manually labeled instances to conduct the learning process. Authors like [9] show how to adapt such a semantic orientation system to perform sentiment analysis in a new language by building support vector machine (SVM) classifiers; their machine learning approach is based on the fact that the classifiers can be trained in any language, and they carried out cross-validation tests using an SVM-based classifier.
Our aim is to conduct the automatic identification of sentiments in tweets, using the best features and an architecture that combines base classifiers and lexical resources, and to define automatic tools capable of extracting subjective information from natural language texts, such as opinions or feelings, in order to create structured and actionable knowledge to be used by a decision-making system.
The question we ask in this research work is whether it is possible to identify polarity in unstructured texts using machine learning techniques.

3 Methodology
In the present work, the classification of tweets was conducted. The evaluation uses two datasets of opinions issued on Twitter, drawn from a collection of around 163,000 tweets labeled by polarity of opinion as positive, negative, or neutral. The first dataset is made up of 10,653 tweets in total, divided equally: 3,551 tweets with a positive label, 3,551 with a neutral label, and 3,551 with a negative label. The second dataset uses 10% of all the tweets contained in the database. These tweets and comments concern Narendra Modi and other leaders, as well as public opinion about the next prime minister of the nation (in the context of the general elections held in India in 2019). We take this database as an object of study because it is interesting to see how the media perception of a public figure can be measured through opinions issued on social networks, which can undoubtedly help the person in question correct or moderate their discourse on a particular topic.
Selecting the database was our first step in building the classifier. The texts are labeled with values from −1 to 1, where:

• 0 indicates a neutral Tweet/comment


• 1 indicates a positive sentiment
• −1 indicates a negative tweet/comment

It is worth mentioning that this is a standard database that is available on the internet1 .
In Fig. 1, the diagram that illustrates the method implemented in this work is shown.
In our case, we use the classification scenario based on cross-validation, which is one
of the most widely used resampling methods to evaluate the generalization capacity of
predictive models and thus estimate the true prediction error and parameter adjustment
[10].

Fig. 1. Methodology implemented in this work

Next, each of the elements that make up the proposed method is briefly described.
For data entry, the first step was to replace the numerical value in the database with the corresponding polarity label: positive, negative, or neutral.
The experiments were conducted in Python; we loaded the corpus, and the text of the tweets, that is, the comments or opinions made by the users, was used as the learning features.
In the preprocessing part, the stop words were eliminated. To create the stop-word lists, a word cloud is generated to visualize the most frequently repeated words that may be irrelevant for the analysis; these are words that are empty of content but that serve to structure sentences and express ourselves correctly. Moreover, since the classification system represents the text as a matrix, removing these words also reduces its dimensionality.
1 https://www.kaggle.com/cosmos98/twitter-and-reddit-sentimental-analysis-dataset.

Once this process is done, a word cloud is created again, as shown in Fig. 2, to verify that, with the stop words removed, mentions now appear as the most prominent words.

Fig. 2. Word cloud with stop words removed
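As an illustration of this preprocessing step, the following is a minimal sketch of stop-word removal followed by word-cloud inspection; the file name, column name, and the short stop-word list are hypothetical placeholders, not the exact resources used in the paper.

```python
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Hypothetical file and column names for the Kaggle dataset referenced above
df = pd.read_csv("twitter_sentiment.csv")
texts = df["clean_text"].astype(str)

# Illustrative short stop-word list; the paper uses lists of 173 and 665 stop words
stop_words = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "that", "this"}

def remove_stop_words(text):
    # Keep only tokens that are not in the stop-word list
    return " ".join(w for w in text.lower().split() if w not in stop_words)

cleaned = texts.apply(remove_stop_words)

# Word cloud of the remaining vocabulary, analogous to Fig. 2
wc = WordCloud(width=800, height=400, background_color="white").generate(" ".join(cleaned))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```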

After this adjustment, the documents are converted into a bag-of-words representation, where IDF (inverse document frequency) weighting is selected for the document frequencies. As the next step of the proposed method, the label variable to be analyzed is set; in this case it is the field where the positive, negative, and neutral labels are found.
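A minimal sketch of this bag-of-words step with IDF weighting, using scikit-learn; the data frame and column names continue the hypothetical names from the preprocessing sketch and are not taken from the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Bag of words with IDF weighting; `cleaned`, `df`, and `stop_words` continue the
# hypothetical names introduced in the preprocessing sketch above.
vectorizer = TfidfVectorizer(stop_words=list(stop_words), use_idf=True)
X = vectorizer.fit_transform(cleaned)  # sparse document-term matrix

# Hypothetical numeric label column mapped to the three polarity classes
y = df["category"].map({1: "positive", 0: "neutral", -1: "negative"})
print(X.shape)
print(y.value_counts())
```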
An analysis is conducted to see which classifier yields the best precision results. The following learning methods, widely known in the state of the art, were used: Support Vector Machines (SVM), a learning-based method that supports solving classification and regression problems through training and resolution phases and proposes an answer (output) to an established problem.
Logistic Regression (LR), which is defined in [11] as a machine learning classification algorithm used to predict probabilities, requiring the dependent variable to be binary.
Naïve Bayes (NB), a classifier that calculates the probability of an event given information about it, based on Bayes' theorem and additional independence hypotheses [12]. Random Forest, according to Breiman, is a classifier that consists of a combination of tree classifiers, where each tree is generated using a random vector sampled independently of the input vector; each tree casts a unitary vote for the most popular class when classifying an input vector.
González in [13] explains that KNN is a supervised, non-parametric machine learning classification method that estimates the probability density function, or directly the probability that an element belongs to a class, from the information provided by the set of prototypes. It classifies values by looking for the most similar data points learned in the training stage and making predictions for new points based on that similarity.
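The comparison of these classifiers under cross-validation can be sketched as follows with scikit-learn; the hyperparameters and the number of folds are illustrative assumptions, since the paper does not report them.

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# X and y come from the TF-IDF sketch above; the fold count and settings are illustrative.
models = {
    "SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```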

Fig. 3. Sequence diagram of the work done in Python

As evaluation metrics, the following were used:

1. Area under the curve (AUC) is calculated as the area under the ROC curve; the larger the area, the more accurate the predictor. Formally, the AUC is represented by Eq. (1):

   AUC = \int_0^1 f(x)\,dx \qquad (1)

   where f(x) represents the receiver operating characteristic (ROC) curve function; however, since f(x) usually does not have a closed form that can be integrated analytically, several authors suggest using approximation methods to calculate the AUC [14].
2. Accuracy is the degree of closeness to the true value; it refers to a measurement with both true and consistent results. The formula to calculate the accuracy is represented by Eq. (2):

   Accuracy = \frac{tp + tn}{tp + tn + fp + fn} \qquad (2)

   where tp represents a true-positive value, tn a true-negative value, fp a false-positive value, and fn a false-negative value.
3. F1 is a measure of a test's accuracy that is calculated from the precision and recall of the test being conducted. In a nutshell, F1 is the harmonic mean of precision and recall, as shown in Eq. (3):

   F_1 = \frac{tp}{tp + \frac{1}{2}(fp + fn)} \qquad (3)

   where tp is a true-positive value, fp is a false-positive value, and fn is a false-negative value.

4. Precision is a performance metric applied to data retrieved from a collection, corpus, or sample space; it is also known as the positive predictive value, the fraction of relevant instances among the retrieved instances, as shown in Eq. (4):

   Precision = \frac{tp}{tp + fp} \qquad (4)

   where tp equals a true-positive value and fp equals a false-positive value.
5. Recall, also known as sensitivity, is the fraction of relevant instances that were successfully retrieved, as shown in Eq. (5):

   Recall = \frac{tp}{tp + fn} \qquad (5)

   where tp represents a true-positive value and fn represents a false-negative value [15].
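As an illustration, the metrics of Eqs. (1)–(5) can be computed with scikit-learn on a held-out split; the macro averaging used to extend them to three classes is our assumption, not a choice stated in the paper.

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hold-out split for illustration; X and y come from the sketches above.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)

# Macro averaging is one way to extend Eqs. (2)-(5) to three classes (our choice).
print("Accuracy :", accuracy_score(y_te, y_pred))
print("Precision:", precision_score(y_te, y_pred, average="macro"))
print("Recall   :", recall_score(y_te, y_pred, average="macro"))
print("F1       :", f1_score(y_te, y_pred, average="macro"))
print("AUC      :", roc_auc_score(y_te, y_prob, multi_class="ovr"))
```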

4 Results
Following the sequence of work shown in Fig. 3 and using Python, the results obtained in the experiments are presented below.
The experiment was carried out in two stages, using two lists of stop words that were eliminated from the documents under study. The first list consists of 173 stop words, and the second list contains a total of 665 stop words; it is available on the internet2 and can also be found online3. The results obtained for the database with 10% of the tweets are shown in Table 1 for cross-validation and in Table 2 for the training and test sets. Likewise, the results for the database with an equal number of tweets for each polarity are shown in Table 3 for cross-validation and in Table 4 for the training and test sets.

Table 1. Evaluation metrics database at 10% for cross validation

Model Baseline Short stopwords Long stopwords


KNN 0.409 0.405 0.499
SVM 0.566 0.485 0.485
Random Forest 0.746 0.785 0.757
Naïve Bayes 0.679 0.687 0.685
Logistic Regression 0.763 0.748 0.730

As a result, we can observe that for the first database, which contains 10% of the total tweets, the best result is given by Random Forest with 78.5% accuracy under cross-validation, while for the second database, which contains the same number of tweets for each polarity, the Logistic Regression classifier gives 74.9% accuracy under the same classification scenario, cross-validation.
2 https://github.com/manishkanadje/reuters21578/blob/master/stopwords.txt.
3 https://www.ranks.nl/stopwords.

Table 2. Evaluation metrics database at 10% for training and test sets

Model Baseline Short stopwords Long stopwords


KNN 0.400 0.398 0.491
SVM 0.570 0.517 0.496
Random Forest 0.758 0.780 0.762
Naïve Bayes 0.686 0.689 0.685
Logistic Regression 0.756 0.739 0.721

Table 3. Evaluation metrics equitable database for cross validation

Model Baseline Short stopwords Long stopwords


KNN 0.544 0.524 0.562
SVM 0.536 0.519 0.494
Random Forest 0.717 0.735 0.714
Naïve Bayes 0.641 0.646 0.639
Logistic Regression 0.766 0.749 0.725

Table 4. Evaluation metrics equitable database for training and test sets

Model Baseline Short stopwords Long stopwords


KNN 0.546 0.547 0.558
SVM 0.546 0.514 0.497
Random Forest 0.717 0.740 0.714
Naïve Bayes 0.645 0.643 0.645
Logistic Regression 0.762 0.747 0.724

5 Conclusions and Perspectives


The fundamental objective of sentiment analysis is to define automatic tools capable
of extracting subjective information from natural language texts, such as opinions or
sentiments, to create structured and actionable knowledge to be used by a decision-
making system.
In conclusion, it can be said that conducting the identification of sentiments in
unstructured texts, such as those found on Twitter, is a non-trivial task that is increasingly
used by both companies and government institutions. Based on the results obtained, we
can observe that the best result in the identification of feelings is obtained by using the
short list of stop words, thus easing the processing of large volumes of information,

and allowing the ability to identify areas of opportunity for improvement in the case of
negative opinions.
As future work, new text processing techniques could be tested to reduce the margin
of error in terms of classification, as well as testing with different classification methods
in order to be able to make the comparison with the other methods that were used in the
study.

References
1. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.J.: Sentiment analysis of twitter
data. In: Proceedings of the Workshop on Language in Social Media (LSM 2011), pp. 30–38
(2011)
2. Jackson, J., Gettings, S., Metcalfe, A.J.N.: “The power of Twitter”: Using social media at a
conference with nursing students. 68. Elsevier, pp. 188–191 (2018)
3. Fiorini, P.M., Lipsky, L.R.: Search Marketing Traffic and Performance Models. 34(6), 517–
526 (2012)
4. Fernandez, J., Boldrini, E., Manuel Gomez, J., Martinez-Barco, P.J.P.D.L.N.: Sentiment
Analysis and Opinion Mining: The EmotiBlog Corpus. 47, pp. 179–187 (2011)
5. Reyes, A., Rosso, P., Veale, T.: A Multidimensional Approach for Detecting Irony in Twitter
47(1), 239–268 (2013)
6. Saberi, B., Saad, S.: Sentiment Analysis or Opinion Mining: A Review. 7(5), 1660–1666
(2017)
7. Hierons, R.:Machine learning. Tom M. Mitchell. Published by McGraw-Hill, Maidenhead,
UK, International Student Edition, 1997. ISBN: 0-07-115467-1, 414 pages. Price: UK£ 22.99,
soft cover, ed: Wiley Online Library (1999)
8. Chaovalit, P., Zhou, L.: Movie review mining: A comparison between supervised and unsu-
pervised classification approaches. In: Proceedings of the 38th Annual Hawaii International
Conference on System Sciences, pp. 112c-112c: IEEE (2005)
9. Brooke, J., Tofiloski, M., Taboada, M.: Cross-linguistic sentiment analysis: from english to
Spanish. In: Proceedings of the International Conference RANLP-2009, pp. 50–54 (2009)
10. Refaeilzadeh, P., Tang, L., Liu, H.: Cross-validation. 5, p. 532–538 (2009)
11. Wright, R.E.: Logistic Regression (1995)
12. Castro, W.M., Cabrera, S.G.: Tuberculosis: Diagnosis by Image Processing. 24(2) (2020)
13. González, R.H., Morell, C., Blanco, A.: Regresión lineal local con reducción de rango para
problemas de predicción con salidas compuestas. Revista Cubana de Ciencias Informáticas
10(4), 184–193 (2016)
14. Bowers, A.J., Zhou, R.: Receiver operating characteristic (ROC) area under the curve (AUC):
a diagnostic measure for evaluating the accuracy of predictors of education outcomes. 24(1),
20–46 (2019)
15. Makhoul, J., Kubala, F., Schwartz, R., Weischedel, R.: Performance measures for information
extraction. In: Proceedings of DARPA Broadcast News Workshop, 249–252 Herndon, VA
(1999)
Sentence Structure and Boundary for Deep
Neural Machine Translation Alignment Model

Bat-Erdene Batsukh(B)

University of the Humanities, Ulaanbaatar, Mongolia


[email protected]

Abstract. In an era of social globalization, the world is more connected than ever due to the widespread use of digital technologies, and translation is a gateway for this communication. Whenever we want to understand, study, or express information prepared in a language other than our native one, no matter where in the world we are, we need translation. Due to social media usage, consumers are increasingly exposed to information written by other users in foreign languages, and researchers need to conduct research in many languages and publish the results in a second language. Because professional translation requires so much hard work, automated translation, known as machine translation, has played a vital role in helping millions of consumers understand information written in a foreign language. In addition to being used by ordinary users for everyday translation, it can also help professional translators translate quickly. Modern neural machine translation not only performs better than systems that consider sentence structure, but is also able to find complex relationships among translation candidates. It offers simpler modeling that makes it easier to implement. Neural machine translation no longer requires intermediate steps such as word alignment, which is a key component of a system that uses word and sentence structure. While this simplicity can be counted as a benefit, from another point of view the absence of careful word alignment is a loss of control over the translation. On the other hand, neural machine translation is more flexible for translations that do not exactly match the training data. The prevalent use of neural machine translation in translation systems has the benefit of allowing users to translate certain terms and to translate untrained data to a positive extent, but in some cases it results in distorted sentence structure and boundaries. This paper aims to address issues such as neural machine translation control, more precise translation of unrecognized data, and correct sentence structure and boundaries.

Keywords: Mongolian-English Translation · Sentence Structure · Deep NMT ·


Sentence Boundary · Hierarchical Triple Model

1 Introduction

Neural machine translation refers to machine translation based on several neural network
models [1]. It differs from a system that takes into account word and sentence structure

in its translation based on previously trained neural network models. The difference
between the two methods is as follows.
Neural machine translation generates translation hypotheses based on neural net-
work scores, and machine translation generates hypotheses using count-based language
models that take into account word and sentence structure. The neural network can be
used in combination with a system that takes into account word and sentence structure,
and can be used to calculate points directly or to calculate the N-best evaluation sequence
to rank previously established translation hypotheses based on k shortest paths [2].
This means that when neural network-based training is performed in machine trans-
lation, no intermediate training steps are required to generate the information needed to
train the neural network. Models are taught using only the ordered pairs of source and
target sentences. However, systems that take into account word and sentence structure
are based on prepared language patterns based on word position. Since it is often not
possible to use word matching, first, word matching training is conducted separately on
the parallel training fund.
Another difference between the two methods is the compatibility between the training
phase and the decoding phase. Just as standard neural machine translation uses training
sentence pair evaluation models during the training phase, it also evaluates translation
hypotheses generated during decoding. Systems that take into account word and sentence
structure often include a combination of independently prepared templates. Therefore,
decoding is not directly compatible with training, so there is a difference between training
and decoding.
Standard neural machine translation may require some modifications to normalize
the length between training and decoding, and to use assessment criteria that are different
from the learning objective function. Neural machine translation is more consistent with
trends that take into account word and sentence structure, and is much simpler in the
learning process. Neural network-based model architecture is a complex concept that
differs significantly from the count-based models used in systems that take into account
word and sentence structure. Simple single-layer neural networks are, in theory, universal function approximators that can model any functional relationship, but in practice the quality of these models depends on how well the network is trained, how the input data are sampled, and, in addition, whether word and sentence structure are taken into account. Therefore, designing a neural network usually means finding and experimenting with architectures that can be trained in practice, and improving the results as much as possible. There are mixed methods of combining models that take into account word and sentence structure in neural
machine translation search [3]. These systems, which require significant changes to the
decoding algorithm, include many models, so it is considered necessary to fine-tune the
weight of the model. Although these technologies have improved the quality of machine
translation to some extent, they have not yet reached the level of fully automatic transla-
tion. Irregular sentences are usually created automatically and there is no guarantee that
the meaning of the sentence will be preserved. Alternatively, an automatic translation
system can be used to help professional translators translate high quality. Initiated by
Bender et al. [4], an interactive translation tool that allows translators to translate written
sentences in real time and complete them, if necessary, has led to a change in the strategy
for creating a translation system. This is because as soon as the user starts typing, he

or she is faced with the problem of finding the best translation of a given sentence that
matches the word or sentence he or she is writing.
Compared to structures that take into account the structure of words and sentences,
the neural machine translation does not require extra intermediate steps, such as word
dependency, and produces direct results using a trained model. Lacking a specific word
connection can make it problematic to link the target words and sentence structure.

2 Related Works
We can identify three different approaches in machine translation [5]. The first is the tendency to translate directly from the source text into the target language. The next is the "transfer method", a step-by-step method of translating between the source text and an abstract representation of the target text. The final approach is to translate the text into a non-linguistic, language-independent representation, from which the target text is extracted. Machine translation can also be separated into rule-based and data-based approaches. Rule-based methods focus on manually defined translation rules for a given language pair. For example, phrase structure trees [6] decompose a sentence into several smaller phrases and even single words, while a dependency tree, which reflects the relationship between words and sentence structure, determines the relationship between individual words [7]. Usually, they contain only one type of node, and the
relationship between the parent node and the dependent node is indicated by specially
marked notation. The process of defining translation rules requires a great deal of human
knowledge and involvement. On the other corner, a data-based approach, for example a
statistical machine translation, does not need such human knowledge, but translates the
model based on the data example.
Statistical machine translation is a data-based method established in the late 1980s
[8]. Its core idea is to develop a translation template that can be taught using a collection of
source data and target data. Early statistical machine translation systems were word-based, and each translation step consisted of generating one target word
[9, 10]. In the early 2000s, a system that considered word and sentence structure was
proposed [11–13]. Later, neural machine translation [14, 15] became the leading trend
in machine translation. To successfully implement these models, minimum error rate training [16] is used. Variants of attention mechanisms are used to address the limitations of encoder-decoder models [17–19]. On the other hand, neural machine translation
is more flexible for translation that does not exactly match the training data. The key
solution is to divide words into sub-word units and use them in machine translation [20]. This aspect delivers more opportunities for such models, but
limits the translation to predefined constraints. Without a specific word connection, it
will be difficult to connect the target words to the source word. Therefore, research
into the design and development of neural machine translation models has been widely
conducted in the field of applied and computational linguistics in the form of mixtures
and hierarchies based on basic statistical translation models.

3 Sentence Structure and Boundary for Deep Neural Machine


Translation Alignment Model

We consider two different approaches to machine translation: statistical machine translation (SMT) [8] and neural machine translation (NMT) [14, 15]. Statistical machine translation systems are based on the models proposed in [12] and the approach introduced by [10]. All of these models are word-based and generate one word per step. Later, a phrase-based approach was proposed [11]. Models that take into account word and sentence structure differ from word-based models in that they score a whole phrase at each step. As an example, let us take the sentence "What are you doing right now?".
Using a Bayesian decision rule together with minimum error rate training [16], each word alignment is described as follows (see Fig. 1).

Fig. 1. Word alignment

Sentence endings do not need to be taken into account when determining sentence structure and scope; we define this boundary using an algorithm developed at Stanford University [21]. Word-based models must model a long context to generate such a sentence, and the search must be flexible enough not to prune the partial hypotheses that lead to such a translation (see Fig. 2).

Fig. 2. Word-based model



Nevertheless, phrase-based systems that take into account word and sentence structure are able to store such entries in the phrase table. Throughout the search, each phrase can be treated as a single atomic unit (see Fig. 3).

Fig. 3. Phrase-based model

Recurrent neural networks can be used for decoding in translation systems that take into account word and sentence structure, or for making decisions by rescoring N-best lists. On the topic of combining recurrent models with decoding that takes into account word and sentence structure, Auli suggests keeping the hidden recurrent states in the search state, and proposes a way to recombine states and decide whether they match when search states are compared [22]. Although state recombination is then no longer exact, the recurrent model approximates the recombination of a node, since only the hidden state corresponding to the best path is retained when the node is recombined. Schwenk, on the other hand, uses feed-forward models to calculate additional language model scores [23], while Le and his colleagues use short lists [24] to evaluate translation models with class-based output layers and feed-forward networks. Kalchbrenner and Blunsom used recurrent neural networks to encode the source sentence using sequential alignments over the source sentence [14]. The source representations feed into the recurrent hidden layer over the target words. The best translation is obtained by scoring all possible translations and their key phrases. For instance, if the source sequence of sentences in a text of length K is M = m_1^K = m_1 m_2 … m_K, then the equivalent MOSE-format sequence, that is, the sequence of sentences in the target language of length L, must be E = e_1^L = e_1 e_2 … e_L. In our case, we want to translate from Mongolian to English, so we obtain an ordered pair (M, E). Based on this, let t_1^L = t_1 t_2 … t_L be the alignment path from the position of each word in the target language to the position of the words in the source language, let s_1^K = s_1 s_2 … s_K be the corresponding alignment for the source positions [25, 26], and let g_1^K = g_1 g_2 … g_K be the sentence structure and boundary. In fact, finding the structure and position of a translated sentence is a matter of probability theory: determining the distribution p(·) of the translation pattern corresponding to the sentence to be translated from the unknown true distribution Pr(·). These models can be classified into three basic categories: (1) the translation and alignment model, which includes source and target information, (2) the language model, which contains only target language information, and (3) the inter-phrase reordering model, which takes into account word and sentence structure.
Modern neural network-based models are able to learn these three models on their own
from the language model and the parallel corpus. However, in some cases, grammar and
sentence structure are not taken into account, which can lead to problems in translating a
given text, such as misinterpretation, synthesis, or omission of sentences. To address this
issue, we have added the sentence structure and boundary as extra model (see Fig. 4).

Fig. 4. General diagram of proposed neural machine translation

Once we have identified the models, we need to solve the search problem, and this process is called decoding. The search finds the best translation based on sentence structure, boundary, and word placement. It is performed by the max and argmax operators, which select the best translation among the candidate translations. In doing so, length normalization is used to balance the probabilities of long and short sentences (Eq. 1).

\hat{e}_1^{\hat{L}}(m_1^K) = \arg\max_{L,\, e_1^L} \left\{ \frac{1}{L} \sum_{l=1}^{L} \log p\left(e_l \mid e_1^{l-1}, m_1^K\right) \right\} \qquad (1)
1 l=1

The search for this model, which combines the three models we propose in a hierarchical manner, is as follows (Eq. 2).

\hat{e}_1^{\hat{L}}(m_1^K) = \arg\max_{L,\, e_1^L} \max_{t_1^L} \left\{ \frac{1}{L} \sum_{l=1}^{L} \Big[ \lambda \log p\left(e_l \mid e_1^{l-1}, t_1^{l}, g_1^K, m_1^K\right) + (1-\lambda) \log p\left(t_l \mid e_1^{l-1}, t_1^{l-1}, g_1^K, m_1^K\right) \Big] \right\} \qquad (2)

Here λ is the weight of the lexical model, and (1 − λ) is the weight of the hierarchical word-alignment, sentence structure, and boundary model. When modeling grammar and sentence boundaries, the overall dependency structure of Mongolian sentences is designed first. Given the sentence "Барак Обама Хавайд төрсөн" ("Barack Obama was born in Hawaii"), the diagram looks as follows (see Fig. 5).
For us, the UD (Universal Dependencies) representation, which combines Mongolian grammar and sentence boundaries, is inspired by Stanford's method [27], which learns neural network-based word and

Fig. 5. Dependency tree

sentence structures and relations. For instance, for "Барак Обама Хавайд төрсөн", the Stanford dependency analysis for Mongolian is as follows (see Fig. 6).

Fig. 6. Stanford dependency
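As an illustration only, Stanford's graph-based neural dependency parser [27] is distributed today through the Stanza library; the sketch below parses the English gloss of the example, since the availability of a Mongolian model is an assumption we do not verify here.

```python
import stanza

# Download and build an English pipeline with the neural dependency parser.
# 'en' is used purely for illustration of the dependency output format.
stanza.download("en")
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

doc = nlp("Barack Obama was born in Hawaii.")
for sent in doc.sentences:
    for word in sent.words:
        # word.head is the 1-based index of the governing word (0 = ROOT)
        head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"{word.text:10s} <--{word.deprel:10s}-- {head}")
```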

When training the grammar and sentence boundary model for a total of 1000 steps, the sentence structure and boundary recognition loss was reduced to 0.02 (see Fig. 7).

Fig. 7. Training loss reduction

During training, development scores were automatically evaluated every 200 steps, and the final development score reached 98,653 (see Fig. 8).

Fig. 8. Training score improvement

By including this dependency information in the search of the neural translation model, we open a gateway to a better understanding of sentence structure and boundaries.
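A minimal sketch of how the λ-weighted, length-normalized combination of Eq. (2) can be scored for a candidate translation is shown below; the per-position log-probabilities and the value of λ are placeholders, not outputs of our trained models.

```python
def score_hypothesis(lexical_logprobs, alignment_logprobs, lam=0.7):
    """Length-normalized, lambda-weighted score of one candidate translation,
    following the structure of Eq. (2). The per-position log-probabilities are
    placeholders for the outputs of the lexical and alignment/structure models."""
    assert len(lexical_logprobs) == len(alignment_logprobs)
    L = len(lexical_logprobs)
    total = sum(lam * lp_e + (1.0 - lam) * lp_t
                for lp_e, lp_t in zip(lexical_logprobs, alignment_logprobs))
    return total / L  # 1/L length normalization, as in Eqs. (1) and (2)

# Toy comparison of two candidate translations (the scores are made up)
cand_a = score_hypothesis([-0.2, -0.5, -0.1], [-0.4, -0.3, -0.2])
cand_b = score_hypothesis([-0.3, -0.9, -0.4], [-0.2, -0.6, -0.5])
print("best candidate:", "A" if cand_a > cand_b else "B")
```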

4 Methods and Results

An effort was made to combine neural network results with a model that takes into account word and sentence structure, and a re-alignment model that changes the position of words was proposed for the first time [25]. In fact, this integrated model of neural machine translation uses phrases to train the neural networks. For this training, we created a local Mongolian-English mixed bilingual corpus by translating the following corpora (see Tables 1 and 2).

Table 1. Translated sentences from following bilingual corpus.

Corpus File size Translated sentences


United Nations Parallel Corpus [28] 3.44 gb 25,173,399
Wikimatrix [29] 227 mb 1,661,908
OpenSubtitles [30] 832 mb 25,910,106

To present the results of the study more clearly and in more detail, we measured some statistical indicators. The average number of words in the original data of 2,402,138 sentences prepared in Mongolian was 15.919942984124976, the average number of characters was 112.4927664438929, the smallest line consisted of 2 characters with 1 word, and the longest line consisted of 2,149 words (index 2,039,369).
For model selection, two and three different models were compared, trained on the same 2.4 million sentences, and the translation test was performed with a 95% confidence level. Also, when the alpha level was chosen

Table 2. Mongolian-English Mixed Bilingual Corpus via Back Translation [31–33].

Corpus Mongolian English


train Sentences 2,402,138
Words 39,298,174 43,170,480
dev Sentences 300,000
Words 4,895,610 5,378,414
test Sentences 300,000
Words 4,893,721 5,382,746

to be 0.05 and the t-test was performed, the t value was 6.889030645 > 1.961889826, which rejects the null hypothesis. Comparing a data size of one million sentences against 2.4 million sentences, the means of 0.8445 for one million sentences and 0.9514 for 2.4 million sentences differed significantly at the 95% confidence level. In addition, when the alpha level was set to 0.05 and a t-test was performed, the value of 11.2556322 > 1.962023587, well above the critical value, again rejects the null hypothesis. To evaluate our model, we generated a sample paragraph drawn from the test set. The following results were obtained by comparing and evaluating the translation quality on this text (see Table 3).

Table 3. Quality comparison on Mongolian-English Translation.

Method BLEU TER


OpenNMT RNN 0.0 0.6348547717842324
Alignment-based NMT [25, 26] 0.3544063928399769 0.44813278008298757
Proposed model 0.4036094327844361 0.4419087136929461
Google translate 0.3468626172713963 0.47302904564315357
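The significance tests reported before Table 3 follow the usual two-sample t-test pattern; a hedged sketch with purely illustrative score lists is shown below.

```python
from scipy import stats

# Illustrative two-sample t-test like the ones reported above (alpha = 0.05).
# The score lists are placeholders for per-segment evaluation scores of the two
# configurations being compared (e.g., 1M vs. 2.4M training sentences).
scores_small = [0.83, 0.85, 0.84, 0.86, 0.84]
scores_large = [0.95, 0.96, 0.94, 0.95, 0.96]

t_stat, p_value = stats.ttest_ind(scores_small, scores_large)
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the difference is significant.")
```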

The most important feature of our study is the introduction of a completely new triple hierarchical model that adds attention to the neural network model that takes into account word and sentence structure, correctly defining the sentence structure and boundary according to the grammar. We started this experiment precisely to build this model. To develop it, we formulated a general definition of the model and explained it in terms of probability theory. The search steps were identified by examining the translation model, the lexical model, and the count-based language model [34]. The results of the neural network were then staged in a three-step model that works by correctly defining the sentence structure and boundary and linking it to a model that takes into account word and sentence structure. This also significantly increased the speed, by running the model only on the best translations using the N-best list rating, without having to re-score the entire search space. Finally, we developed a hierarchical neural machine translation model.

The diagram illustrates the three models step by step: how the learning process works, how to translate with the help of an already trained model, and how sentence structure and boundary are used.

Fig. 9. Proposed model

By modeling the outcomes of the neural network in a step-by-step manner that takes into account the word-alignment, sentence structure, and boundary models, we are able to accurately determine sentence structure, meaning, word position, and sentence boundary, making Mongolian translation more accurate (see Fig. 9). The evaluation above used the translations of three professional translators as reference translations. The three types of assessment, including accuracy, BLEU [35], and TER [36], have shown that the quality of our model's translations has improved to some extent.
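For reference, BLEU [35] and TER [36] can be computed against professional reference translations with the sacrebleu library; the example strings below are illustrative only, and the API shown assumes sacrebleu 2.x.

```python
from sacrebleu.metrics import BLEU, TER

# Hypotheses are system outputs; each reference stream corresponds to one
# professional translator, as in the evaluation described above (strings are illustrative).
hypotheses = ["the committee approved the draft resolution"]
references = [["the committee adopted the draft resolution"],   # translator 1
              ["the committee approved the resolution draft"]]  # translator 2

bleu = BLEU().corpus_score(hypotheses, references)
ter = TER().corpus_score(hypotheses, references)
print(bleu)
print(ter)
```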

5 Conclusion
Neural machine translation has recently become the new paradigm dominating machine translation research, and this type of translation model and its methodical

study have entered the field of computational linguistics. By introducing the neural
machine translation alone or in two stages, it reduces the control of the system’s output,
which takes into account the structure and boundaries of words, sentences, and grammar.
Therefore, we developed a neural machine translation system with three different models
of hierarchical connections to improve sentence structure and grammar boundary. This
study was the first attempt to use neural machine translation as a hierarchical system
of sentence structure, word placement, and sentence coverage. Furthermore, a neural
machine translator can produce direct translations without waiting for a whole input
sentence, allowing the user to translate directly or in synchronously, even when the user
is translating. The main thing that makes it possible to translate like this is the sentence
structure and boundary.

References
1. Sutskever, I., Vinyals, O., Le, Q.: Sequence to sequence learning with neural networks. CoRR,
vol. abs/1409.3215 (2014). http://arxiv.org/abs/1409.3215
2. Eppstein, D.: Finding the k Shortest Paths. SIAM J. Comput. 652–673 (1997). Accessed 20
Jan 2022. http://www.ics.uci.edu/
3. Dahlmann, L., Matusov, E., Petrushkov, P., Khadivi, S.: Neural machine translation leveraging
phrase-based models in a hybrid search. In: Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing, September 2017, pp. 1411–1420 (2017) https://
doi.org/10.18653/v1/D17-1148
4. Bender, O., Hasan, S., Vilar, D., Zens, R., Ney, H.: Comparison of generation strategies for
interactive machine translation. In: EAMT, pp. 33–40 (2005)
5. Vauquois, B.: A survey of formal grammars and algorithms for recognition and transformation
in mechanical translation (1968)
6. Chomsky, N.: Three models for the description of language. IRE Trans. Inform. Theory 2,
11–124 (1956)
7. Tesnière, L.: Eléments de syntaxe structurale ´Editions Klincksieck, vol. 6, no. 1. Cambridge
University Press (1959). https://doi.org/10.1017/S0008413100018922
8. Brown, P.F., et al.: A statistical approach to machine translation. Comput. Linguist. 79–85
(1990)
9. Brown, P.F., della Pietra, S.A., della Pietra, V.J., Mercer, R.L.: The mathematics of statistical
machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993). https://
aclanthology.org/J93-2003
10. Vogel, S., Ney, H., Tillmann, C.: HMM-based word alignment in statistical translation. In:
International Conference on Computational Linguistics, pp. 836–841 (1996)
11. Och, F.J., Ney, H.: Improved statistical alignment models. In: Proceedings of the 38th Annual
Meeting of the Association for Computational Linguistics, October 2000, pp. 440–447. https://
doi.org/10.3115/1075218.1075274
12. Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: Proceedings of the
2003 Human Language Technology Conference of the North American Chapter of the Asso-
ciation for Computational Linguistics, pp. 127–133 (2003). https://aclanthology.org/N03-
1017
13. Zens, R., Ney, H.: Improvements in dynamic programming beam search for phrase-based
statistical machine translation. In: International Workshop on Spoken Language Translation,
pp. 195–205 (2008)

14. Kalchbrenner, N., Blunsom, P.: Recurrent continuous translation models. In: Proceedings of
the 2013 Conference on Empirical Methods in Natural Language Processing, October 2013,
pp. 1700–1709. https://aclanthology.org/D13-1176
15. Tan, Z., et al.: Neural machine translation: a review of methods, resources, and tools. AI Open
1, 5–21 (2020). https://doi.org/10.1016/j.aiopen.2020.11.001
16. Och, F.J.: Minimum error rate training in statistical machine translation. In: Proceedings
of the 41st Annual Meeting of the Association for Computational Linguistics, July 2003,
pp. 160–167. https://doi.org/10.3115/1075096.1075117
17. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align
and translate. CoRR, vol. 1409.0473 (2014)
18. Vaswani, A., et al.: Attention Is All You Need. In: Advances in Neural Information Processing
Systems, pp. 5998–6008 (2017)
19. Cheng, J., Dong, L., Lapata, M.: Long short-term memory-networks for machine reading. In:
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing,
November 2016, pp. 551–561. https://doi.org/10.18653/v1/D16-1053
20. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword
units. In: Proceedings of the 54th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), August 2016, pp. 1715–1725. https://doi.org/10.18653/
v1/P16-1162
21. Wang, H., Huang, Y.: Bondec-A sentence boundary detector (2003)
22. Auli, M., Galley, M., Quirk, C., Zweig, G.: Joint language and translation modeling with
recurrent neural networks. In: Proceedings of the 2013 Conference on Empirical Methods in
Natural Language Processing, October 2013, pp. 1044–1054. https://aclanthology.org/D13-
1106
23. Schwenk, H.: Continuous space translation models for phrase-based statistical machine trans-
lation. In: Proceedings of COLING 2012: Posters, December 2012, pp. 1071–1080. https://
aclanthology.org/C12-2104
24. Le, H.S., Allauzen, A., Yvon, F.: Continuous space translation models with neural networks.
In: Proceedings of the 2012 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, June 2012, pp. 39–48. https://
aclanthology.org/N12-1005
25. Wang, W., Alkhouli, T., Zhu, D., Ney, H.: Hybrid neural network alignment and lexicon
model in direct HMM for statistical machine translation. In: Proceedings of the 55th Annual
Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), July
2017, pp. 125–131. https://doi.org/10.18653/v1/P17-2020
26. Alkhouli, T., Bretschner, G., Peter, J.-T., Hethnawi, M., Guta, A., Ney, H.: Alignment-based
neural machine translation. In: Proceedings of the First Conference on Machine Translation:
Volume 1, Research Papers, August 2016, pp. 54–65. https://doi.org/10.18653/v1/W16-2206
27. Dozat, T., Qi, P., Manning, C.D.: Stanford’s graph-based neural dependency parser at the
CoNLL 2017 shared task. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual
Parsing from Raw Text to Universal Dependencies, August 2017, pp. 20–30. https://doi.org/
10.18653/v1/K17-3002
28. Ziemski, M., Junczys-Dowmunt, M., Pouliquen, B.: The United Nations parallel Corpus v1.0.
In: Proceedings of the Tenth International Conference on Language Resources and Evaluation
(LREC 2016), May 2016, pp. 3530–3534. https://aclanthology.org/L16-1561
29. Schwenk, H., Chaudhary, V., Sun, S., Gong, H., Guzmán, F.: WikiMatrix: mining 135M
parallel sentences in 1620 language pairs from Wikipedia. CoRR, vol. abs/1907.05791 (2019).
http://arxiv.org/abs/1907.05791
30. Lison, P., Tiedemann, J.: OpenSubtitles2016: extracting large parallel corpora from movie
and TV subtitles (2016). http://www.opensubtitles.org. Accessed 20 Jan 2022

31. Graça, M., Kim, Y., Schamper, J., Khadivi, S., Ney, H.: Generalizing back-translation in
neural machine translation. In: Proceedings of the Fourth Conference on Machine Translation
(Volume 1: Research Papers), August 2019, pp. 45–52. https://doi.org/10.18653/v1/W19-
5205
32. Cotterell, R., Kreutzer, J.: Explaining and generalizing back-translation through wake-sleep.
arXiv preprint, vol. 1806.04402 (2018)
33. Edunov, S., Ott, M., Auli, M., Grangier, D.: Understanding back-translation at scale. In:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,
October 2018, pp. 489–500. https://doi.org/10.18653/v1/D18-1045
34. Ma, S., Sun, X., Wang, Y., Lin, J.: Bag-of-words as target for neural machine translation.
ACL, vol. 1805.04871 (2018). Accessed 20 Jan 2022. https://github.com/
35. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: Bleu: a method for automatic evaluation
of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics, pp. 311–318 (2002). https://doi.org/10.3115/1073083.1073135
36. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate
with targeted human annotation. In: Proceedings of the 7th Conference of the Association for
Machine Translation in the Americas: Technical Papers, pp. 223–231 (2006). https://aclant
hology.org/2006.amta-papers.25
Topic Discovery About Economy During
COVID-19 Pandemic from Spanish
Tweets

Ana Laura Lezama Sánchez1 , Mireya Tovar Vidal1 ,


and José A. Reyes-Ortiz2(B)
1
Faculty of Computer Science, Benemerita Universidad Autonoma de Puebla, 72590
Puebla, Mexico
[email protected], [email protected]
2
Departamento de Sistemas, Universidad Autónoma Metropolitana,
Azcapotzalco, 02200 Ciudad de México, Mexico
[email protected]

Abstract. Automatic topic discovery from natural language texts has


been a challenging and widely studied problem. The ability to discover
the topics present in a collection of text documents is essential for infor-
mation systems. Topic discovery has been used to obtain a compact rep-
resentation of documents for grouping, classification, and retrieval. Some
tasks that can benefit from topic discovery: recommendation systems,
tracking misinformation, writing summaries, and text clustering. How-
ever, topic discovery from Spanish texts has been somewhat neglected.
For this reason, this work proposes analyzing the behavior of topic dis-
covery tasks in texts in Spanish, specifically in tweets about the Mex-
ican economy during the COVID-19 pandemic, under three different
approaches. A comparison was conducted, achieving promising results
because the topic coherence metric indicates coherent topics. The high-
est score of 1.22 was obtained using PLSA with 50 topics, concluding
that the topics encompassed the study domain.

Keywords: Spanish tweets · Topic discovery · COVID-19 pandemic ·


Mexican economy

1 Introduction
The main aim of topic discovery in text documents is to extract the text’s mean-
ing, imitating human capacity automatically. This area of study is investigated
within Natural Language Processing (NLP), which allows the automatic extrac-
tion of the meaning of texts, identifying recurring topics automatically, and cre-
ating algorithms capable of interpreting human language. The goal of topic dis-
covery is to extract information by identifying recurring topics and thus finding
the central topic. The purpose of topic discovery is to show relevant information
for other information systems.

The accelerated growth of social networks has been significant in recent years, since it has allowed obtaining large amounts of text written in natural language that can contain valuable information for other purposes. This text is ready to be analyzed to obtain non-explicit knowledge valuable for companies, associations, and general users.
There are many works in the literature on topic discovery in diverse domains for various applications. However, there is a lack of work that carries out experiments with texts in Spanish, reflecting the absence of computational resources. Therefore, this paper discovers topics from Spanish texts.
Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) responsible for generating computational models so that a computer can understand human language. A tweet written by a person in their mother tongue can be processed so that a computer understands it and can discern whether it is a tweet that talks about diseases or a positive comment. NLP provides the computer with the information that must be analyzed to transform the language into computer-readable representations. Information extraction is an NLP area for obtaining structured information from text without any specific format (unstructured text).
Automatic topic discovery is a branch of NLP that extracts the central idea of a collection of text documents. A document talks about a central idea, that
is, around a particular topic. Humans can read a document and identify the
topics present using previous knowledge. The existence of social networks has
revolutionized communication and expression between their users. The volume
of information has been increasing, and for a human to carry out the task of topic
discovery manually is tedious, time-consuming, and almost impossible given the
vast amount of information currently available on social media. It has been nec-
essary for NLP researchers to provide systems capable of analyzing any number
of texts in a short time compared to humans.
This paper compares algorithms LDA, LSA and PLSA (explained in the next
section) for topic discovery in Spanish tweets.
The rest of this paper is organized as follows. Section 2 presents the definition of topic discovery and some of the techniques used for this task. In Sect. 3, some of the existing works in the literature that deal with texts in Spanish and discover the topics present in a corpus are reviewed. Section 4 presents the proposed approach. Section 5 presents the experimental analysis used in the paper. In Sect. 6, the dataset and the obtained results are presented. Section 7 presents the conclusions.

2 Topic Discovery

Topic discovery consists of analyzing a large amount of text and finding the topics discussed in it. Some methods or techniques in the literature perform this task automatically, but it is still necessary to develop new methods or improve existing ones.
The task of topic discovery is an essential part of computer systems that
require the analysis of large amounts of text automatically in a reasonable time,
compared to the time that the human would spend on the same task. Several
algorithms in the literature can quickly extract the topics present in large volumes of information, such as Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, and Latent Dirichlet Allocation.
Computational systems that perform topic discovery in large volumes of text have the characteristic that they can dissect the text as a human being does. In the literature, some techniques can provide a system with the ability to examine a significant number of texts in a short time, aided by mathematical procedures that together uncover the topics [6]. Some of the most used techniques are Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), and Probabilistic Latent Semantic Analysis (PLSA). It is worth mentioning that the authors who have worked with texts in English have mainly applied LDA or some method developed by them. The Spanish language has the disadvantage that it has been little investigated.
Latent Dirichlet Allocation (LDA) is a generative probabilistic model for analyzing discrete data collections. This hierarchical Bayesian model with three levels (document, word, and topic) considers a topic distribution over a vocabulary. The model specifies the number of topics and defines the words that belong to those topics [5]. Furthermore, Latent Semantic Analysis (LSA) discovers words from the same semantic field (those that form a group of words sharing characteristics in their meaning). LSA is based on the linear factorization known as Singular Value Decomposition (SVD) [4]. Finally, Probabilistic Latent Semantic Analysis (PLSA) descends from LSA and discovers the semantics of hidden topics in documents using the bag-of-words representation (each document is represented ignoring the order of its words) [4].
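As a rough, hedged sketch of how LDA and LSA can be run on tokenized tweets (not necessarily the implementation used in this work), the gensim library provides both models; PLSA is not part of gensim and would come from a separate package. The example tokens and parameter values below are placeholders.

from gensim import corpora
from gensim.models import LdaModel, LsiModel

# Placeholder tokenized tweets (after pre-processing and PoS filtering).
tokenized_tweets = [
    ["vacuna", "economia", "crisis"],
    ["contagios", "ola", "impuestos"],
    ["gobierno", "finanzas", "pandemia", "vacuna"],
]

dictionary = corpora.Dictionary(tokenized_tweets)
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenized_tweets]

# This work extracts 20, 50 and 100 topics; 20 is shown here.
lda = LdaModel(bow_corpus, id2word=dictionary, num_topics=20, passes=10)
lsa = LsiModel(bow_corpus, id2word=dictionary, num_topics=20)

print(lda.show_topics(num_topics=5, num_words=10))
print(lsa.show_topics(num_topics=5, num_words=10))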
Some authors have not used traditional techniques for this task. The use of
neural networks, cosine similarity, clustering algorithms, Principal Component
Analysis, among others, are applied by authors such as [6,8]. They have not
used LDA, LSA or PLSA, but they have discovered topics with their proposed
algorithm.
Regarding text pre-processing, some tools provide morphological analysis,
recognition of named entities, PoS-tagging, disambiguation of words’ meaning,
and lemmatization. The Freeling Analyzer [14] provides these types of analysis
for English, Spanish, Portuguese, Italian, French, German, and Russian texts.
This paper carries out topic discovery in Spanish texts, namely tweets about the Mexican economy with COVID-19 mentions. We implemented the three techniques most used in the literature: Latent Dirichlet Allocation, Latent Semantic Analysis, and Probabilistic Latent Semantic Analysis, with three different approaches using textual features and morphological information of the texts.
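The morphological filtering used in this work relies on Freeling; as a hedged stand-in for readers without a Freeling installation, the following sketch keeps only nouns, adjectives, and verbs with a spaCy Spanish model. The model name, example sentence, and printed output are illustrative assumptions, not taken from this work.

import spacy

# Assumed stand-in for Freeling's PoS tags; requires the small Spanish model:
#   pip install spacy && python -m spacy download es_core_news_sm
nlp = spacy.load("es_core_news_sm")

def content_words(text, keep=("NOUN", "ADJ", "VERB")):
    """Return lowercased lemmas of nouns, adjectives and verbs only."""
    return [tok.lemma_.lower() for tok in nlp(text) if tok.pos_ in keep]

print(content_words("La vacuna contra el COVID impulsa la economía mexicana"))
# e.g. ['vacuna', 'covid', 'impulsar', 'economía', 'mexicano']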

3 Related Work
This section presents some of the works carried out by other authors on topic
discovery in Spanish texts.
In [10], a method is presented to detect polarity in opinions by topics in texts
in Spanish about opinions of Andalusian hotels on the TripAdvisor site. The
topic discovery was based on semantic processing by applying an agglomerative
hierarchical clustering algorithm. The authors used Spanish SentiWordNet as
a source of sentiment score values. In addition, they evaluated various fuzzy
operators to aggregate sentiment scores and thus calculate the sentiment polarity score.
The method includes pre-processing that features tokenization and stopword
removal. A stemming process was added, and they used a part-of-speech tagger
to determine the corresponding tag for each word present in the sentences. The
topic discovery was based on grouping the sentences present in the corpus and
a procedure to determine the most relevant phrases of each group discovered.
Opinion polarity was detected through an unsupervised lexical approach. The
method was able to identify feelings such as positivity, negativity, and neutrality.
The authors evaluated the approach with accuracy, precision, recall, and F1 -
measure.
Sridhar et al. [17] propose a method for topic discovery for short texts in
English, Spanish, French, Portuguese and Russian. The method employs a low-
dimensional semantic vector space model represented by vectors of dense words.
The authors incorporated Gaussian mixing models that learned representations
in long context windows that overcome the problem of sparse word matching
patterns. The method proposed by the authors outperformed the Latent Dirich-
let Analysis (LDA) in short texts through a subjective and objective evaluation.
The results obtained allowed the authors to conclude that the proposed method
reliably learns latent topics, and that it is possible to use them to categorize
short texts with high fidelity compared to the results provided by LDA and the
Biterm Topic Model (BTM).
In [9], a system is shown that relies on topic discovery with LDA to train
a support vector machine model to recognize metaphorical text. The system
locates instances of metaphors in text in English and Spanish about news and
blogs on political issues. For each concept, they incorporate a label and a set
of initial words that represent that concept. The result obtained by LDA is 100
topics, where each topic is a probability distribution over the vocabulary. The
evaluation was carried out with recall, F1 -measure, and Kappa.
Navarro-Colorado [13] applies the LDA method to a corpus of Spanish
sonnets from the Golden Age. The topics obtained were manually analyzed to
define the relationship between topics and poems. The authors propose filtering
the empty words and conducting a lemmatization on the corpus. The authors
conducted experiments with standard LDA implemented in MALLET and LDA
version based on word embeddings LF-LDA. The evaluation was carried out with
the topic coherence metric and the intrusive word technique.
A model for topic discovery in health documents is proposed in [12]. The
authors used a corpus of 220 digital documents written in Spanish on different
health problems from Biomedical Abbreviation Recognition and Resolution 2nd
Edition. The authors propose to carry out a pre-processing that consists of tok-
enization, elimination of stopwords, and stemming. Later, they built a matrix
of document terms through TF-IDF. In addition, they built a medical glossary
of three online dictionaries that served as the basis for adding, or not adding
extra weight to an existing word in those dictionaries. The authors obtained a
final matrix of 220 rows representing the documents and 5987 columns repre-
senting the words or terms. In the resulting matrix, the authors applied LDA,
obtaining two new matrices. They also applied the cosine similarity metrics, the
Kullback-Leibler method, and the harmonic mean. The evaluation was made by
carrying out a comparison between topic terms and keywords present in some
documents. Values greater than or equal to 0.7 in the document-topic matrix
were observed, and these topics were searched in the topic-term matrix. Subse-
quently, they extracted the most relevant topics and terms for each document
to create a table with the name of the document, its keywords, and the terms
obtained with LDA.
Fuentes-Pineda et al. [8] present a method for topic discovery in corpora
such as 20-newsgroup, Reuters, Wikipedia in English and Spanish. The exposed
method is based on min-hashing, an alternative approach to topic discovery. The
main idea is to generate multiple random partitions of the corpus vocabulary
in the word occurrence space spanning inverted file lists to find sets of concur-
rent words subsequently grouped to produce the final topics. The approach put
forward by the authors was well adapted to both the size of the texts and the
number of words in the vocabulary.
In [7], topic discovery in Spanish is carried out using the LDA algorithm.
The corpus used is news in Spanish from the newspaper ABC, automatically
extracted. A second corpus used was collected manually from the newspaper El
Paı́s. LDA was used as a topic discovery technique with MALLET, extracting
30 topics for each corpus. The results obtained for the two newspapers are very
similar (the same topics and with similar intensity).
A method for topic discovery in tweets about COVID in Spanish is
presented in [1]. The set of tweets used was extracted from January 1, 2019, to
April 20, 2020. The authors carried out a three-phase analysis of tweets about
COVID in Spain. The pre-crisis, the outbreak of the disease, and the confine-
ment are the three phases analyzed by the authors. The tweets were collected
manually and grouped into different topics using LDA, and later they extracted
key phrases and more representative sentences for each topic.
[3] presents a topic discovery from Facebook conversations about COVID-
19. The data is extracted between January 1, 2020, and May 15, 2020, divided
into three periods in seven different languages: English, Arabic, Italian, Spanish,
French, German and Japanese. The objective of the work was to analyze the cog-
nitive development of people on the COVID-19 pandemic. The authors included
a pre-processing that encompasses the removal of HTML tags and stopwords
according to the language in question. The discovery of the existing topics in
the corpus was made with LDA in MALLET. Later they build a representation
graph based on the topics and words obtained.
An analysis of the topics present in the PubMed document on COVID-19 is
proposed in [2]. The topics were extracted with the Latent Dirichlet Allocation
model and also carried out trend analysis to understand the changes in the
topics, the impact factor, and geographic origin present in the data set during
the investigation. The authors discovered 14 main topics. The most common
were health care responses and clinical manifestations. To obtain the data set
with which they carried out their experiments, the authors carried out a search in
PubMed on June 1, 2020, with terms such as covid or covid-19 without language
or date restrictions. They excluded the term coronavirus, using the Biopython
package. The corpus consisted of the title, keywords, abstract, date of the last
revision, the author’s affiliation list, the name of the journal, and the PubMed
identification number for each publication. The corpus was preprocessed: uppercase letters were converted to lowercase; double spaces, special characters, stopwords, and numbers were removed; and stemming was applied. The evaluation was carried out with the evaluation metrics of perplexity,
probability of exclusion, and PCA. The authors decided the final number of
topics based on the evaluations of the three evaluation metrics, as well as the
authors’ mastery of COVID-19 and medical research.
In [11], it is proposed a topic discovery study and sentiment analysis from
tweets about the COVID-19 vaccine. The method distinguishes between changes
in topics and feelings over time in the population. The authors built a corpus
of tweets on COVID-19 dating from March 11, 2020, the day the World Health
Organization declared COVID-19 a pandemic, through January 31, 2021. The
key phrases they used to download these tweets were CoronavirusPandemic,
COVID-19, 2019nCoV, CoronaOutbreak, coronavirus, WuhanVirus, covid19,
coronavirus pandemic, covid-19, 2019ncov, coronaoutbreak, and wuhanvirus. The
authors used R software to pre-process and preserve the tweets that contained
the keywords vaccination, vaccines, vaccine, vaccines, immunization, vaccinate,
and vaccinated. The authors applied Latent Dirichlet Allocation for topic modeling, and sentiment and emotion analysis using the National Research Council of Canada's emotion lexicon. The analysis yielded 16 topics, which
were grouped into five general topics. Based on the results obtained, the most
discussed topic was vaccination and how to obtain it. Regarding sentiment anal-
ysis, they showed that sentiment was increasingly optimistic.
An analysis of Twitter narratives around decision making, obtained by applying a
dynamic topic model to tweets, is presented in [16]. The authors downloaded a set
of COVID-19 related tweets about governors and members of the US presiden-
tial cabinet, with a total of 73 politicians. The tweets were downloaded from
January 1, 2020, to April 7, 2020. The corpus obtained had 7,881 tweets related
to COVID-19 of the 73 politicians ranked in ascending order over time. The
model used was the Network Hawkes Binomial Topic Model to track evolving
subtopics around COVID-19. The authors built networks of influence among
government officials using Granger causality. Based on experimental results, the
authors found themes about risks, working from home, staying at home, school
closings, and social distancing.
A study that aims to understand the discourse and psychological reactions
of Twitter users on COVID-19 is proposed in [18]. The authors selected a list
of 19 trending hashtags related to COVID-19. The proposed study managed
to identify 11 topics, including confirmed cases, mortality, cases outside and
within China, the COVID-19 outbreak in South Korea, early signs of the outbreak in
New York, Diamond Princess cruise ship, economic impact, preventive measures,
authorities, and supply chain. The results obtained did not reveal topics related
to treatments and symptoms as frequently as the topics on Twitter. In addi-
tion, they applied a sentiment analysis that showed that fear of the unknown
nature of coronavirus was dominant in all topics. The authors applied an obser-
vational study design and an intentional sampling approach to select all Tweets
containing defined hashtags related to COVID-19 on Twitter.
In general, one of the limitations when working with texts in Spanish is the lack of tools for their treatment. For this reason, work that addresses the discovery of topics in Spanish texts is scarce; added to this, the processing of texts from social networks represents an important challenge in tasks of this magnitude. Therefore, in this work, it was proposed to discover the topics present in Spanish texts while working only with nouns, adjectives, and verbs, since it is considered that these three elements are the ones that provide the information needed to carry out this task.
This work proposes to discover the latent topics in a Spanish corpus extracted from Twitter with the three topic discovery techniques most used in the literature. The corpus was pre-processed traditionally, and the topics were subsequently extracted. After using the Freeling analyzer, the dependency graph was obtained, providing information about where the nouns, adjectives, and verbs are found. The objective was to analyze the behavior of LDA, LSA, and PLSA on a corpus in Spanish; when Freeling was used, the coherence levels of each technique improved compared to the first approach. The proposed evaluation is topic coherence aided by an external corpus, in this case Wikipedia, with a size of 1,495,246 documents.

4 Proposed Method
In this work, three different approaches are proposed for topic discovery in Spanish. In the first, the corpus is pre-processed by removing stop words, punctuation marks, non-ASCII symbols, and non-printable symbols, and converting uppercase to lowercase. The second approach consists of removing non-ASCII symbols and, with the help of Freeling, extracting the dependency graph and working with nouns and adjectives. The third approach works only with adjectives, verbs, and nouns.
The following steps describe the proposed experiments:
1. Corpus pre-processing: For the first approach, this stage includes the following
actions:
(a) Mentions, number symbols, emoticons, punctuation marks, and language accents are removed.
2. For the second and third approaches, we remove the non-ASCII symbols and
extract the parts of speech with the help of Freeling. The dependency graph is
obtained from the original corpus. It is presented as an algorithm as follows.
Begin
  According to option do:
    2.2: Adjectives and nouns for the second approach.
    2.3: Verbs, nouns and adjectives for the third approach.
  End According
3. Apply the topical discovery method: LDA, LSA, or PLSA, and 20, 50, and
100 topics are extracted with ten main words each.
4. Evaluation: The evaluation of the obtained results is carried out with the topic coherence metric, which uses Eq. 1 to measure how coherent the recovered topics are and, in our case, also to evaluate the effect of working only with adjectives, nouns, and verbs according to each proposed approach.
It should be noted that this metric uses an external corpus for its operation. In
our case, we use a corpus of Wikipedia in Spanish with 1,495,246 documents
[15].
\mathrm{PMI}(w_i, w_j) = \frac{2}{T(T-1)} \sum_{1 \le i < j \le T} \log \frac{p(w_i, w_j)}{p(w_i)\, p(w_j)}   (1)

where T is the number of top words, p(w_i) (resp. p(w_j)) is the probability that the word w_i (resp. w_j) appears in a text window of a given size, and p(w_i, w_j) denotes the probability that w_i and w_j co-occur in the same window.
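A minimal sketch of Eq. (1), assuming the external reference corpus has already been split into text windows (each represented here as a set of words); the function and variable names are illustrative, not the exact implementation used in this work.

import math
from itertools import combinations

def topic_coherence(top_words, windows, eps=1e-12):
    """Average PMI over all pairs of top words, estimated from text windows."""
    n = len(windows)
    p_w = {w: sum(w in win for win in windows) / n for w in top_words}
    total = 0.0
    for wi, wj in combinations(top_words, 2):
        p_ij = sum((wi in win) and (wj in win) for win in windows) / n
        total += math.log((p_ij + eps) / (p_w[wi] * p_w[wj] + eps))
    T = len(top_words)
    return 2.0 / (T * (T - 1)) * total

# Toy windows standing in for the Spanish Wikipedia reference corpus.
windows = [{"vacuna", "covid"}, {"vacuna", "economia"}, {"covid", "contagios"}]
print(topic_coherence(["vacuna", "covid", "contagios"], windows))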

5 Experimental Analysis
The results obtained with the proposed method provided a view of the three
selected elements: adjectives, nouns, and verbs. Although the results so far are
not the best compared to the literature, they did indicate that the proposed
method can provide coherent topics about language and domain. The proposed
method is applied to the LDA, LSA, and PLSA techniques, combining the num-
ber of topics parameters 20, 50, and 100 to be discovered, respectively, to observe
the method’s behavior according to the number of topics extracted. The follow-
ing section presents in detail the results obtained and evaluated with the topic
coherence metric.

6 Results
This section presents the dataset used and the results obtained with the proposed
procedure.

6.1 Dataset
This section presents the dataset used (Table 1) and the experimental results
with the proposed procedure. The effects and differences when applying LDA,
LSA, and PLSA on a set of tweets are analyzed. The tweets were extracted between May 2020 and November 2021, filtered by Spanish language and Mexican territory, and contain #COVID-19 mentions in the economic domain. We used the Twitter API to extract and filter
them. The dataset was pre-processed in two different ways for three different
approaches. For the first approach, stop words, non-ASCII symbols, URLs, mentions, punctuation marks, and symbols such as #, %, and & were removed. With the pre-
processed corpus, LDA, LSA, and PLSA were applied and evaluated with topic
coherence. Later, returning to the original corpus, only non-ASCII symbols are
removed and using the Freeling analyzer, certain parts of each sentence that
make up the corpus are obtained. The dataset information is shown in Table 1,
where D represents the number of documents in the dataset, and T is the total
vocabulary, including stop words.

Table 1. Dataset

Dataset             D (Documents)   T (Vocabulary)
Tweets in Spanish   38,000          449,634
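The sketch below illustrates the first pre-processing approach described above (lowercasing, removal of mentions, URLs, symbols, accents, and stop words); the stop-word list and regular expressions are assumptions for illustration, not the exact rules used in this work.

import re
import unicodedata

STOPWORDS = {"de", "la", "el", "en", "y", "que", "los", "del", "por"}  # partial list

def preprocess(tweet):
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)   # URLs and mentions
    tweet = re.sub(r"[#%&]", " ", tweet)               # symbols such as #, %, &
    # strip accents and any remaining non-ASCII characters
    tweet = unicodedata.normalize("NFKD", tweet).encode("ascii", "ignore").decode()
    tokens = re.findall(r"[a-z]+", tweet)              # drops punctuation and numbers
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("La economía y el #COVID19: más contagios https://t.co/x @user"))
# e.g. ['economia', 'covid', 'mas', 'contagios']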

6.2 Experimental Results


The experimental results obtained with implementing the proposed procedure
will be presented below.
The first approach is to test the three topic discovery methods on the corpus that has been pre-processed by removing accents, commas, and symbols. Table 2 shows the experimental results obtained with the topic coherence evaluation metric; 20, 50, and 100 topics were obtained from each method, with ten main words each. As shown in Table 2, the highest scores are for LDA with 20 topics and for LSA and PLSA with 100 topics: for LDA, the fewer the topics, the more related the words that make up each topic are, whereas LSA and PLSA tend to obtain words that are more related to each other when the number of topics is larger.

Table 2. Results obtained with topic coherence and normal preprocessing

Method   20 topics   50 topics   100 topics
LDA      1.15        1.11        0.94
LSA      0.93        0.93        0.96
PLSA     1.07        1.11        1.12

Table 3 shows the results obtained when working on the second sub-corpus created. The second approach was formed by extracting only the adjectives
and nouns labeled by Freeling from the corpus. It is observed that the results
vary a little. In the case of LDA, when 20 topics are discovered with their ten
most representative words, it obtains higher coherence levels than when 50 and
100 topics are discovered. The same behavior is observed for LSA, but when 100 topics are obtained, and for PLSA when 20 and 50 topics are obtained. This behavior arises because, on the original corpus, it was decided not to eliminate hashtags, under the assumption that they would provide important information for each discovered topic. For instance, “#vacuna” and “#sputnik” are valuable information when discovering topics, but hashtags such as “#forolatibex” and “#mercomunaiztapalapa” are misspelled and are not relevant information, even though Freeling labels them as adjectives or nouns.

Table 3. Results obtained with topic coherence and the adjectives and nouns recognized by Freeling

Method   20 topics   50 topics   100 topics
LDA      0.94        0.92        0.84
LSA      1.00        1.02        1.05
PLSA     1.19        1.19        1.18

Table 4 shows the results obtained with the topic coherence metric when
evaluating LDA, LSA, and PLSA, with the third proposed approach, extracting
nouns, adjectives, and verbs labeled by Freeling from the original corpus. In this
experiment, the highest results were obtained for LDA with 20 topics compared
to the results obtained with 50 and 100 topics. The same happens with LSA with 100 topics and with PLSA with 50 topics.
Comparing Table 4 with Table 3, the result for LDA with 20 topics is higher because verbs were incorporated; however, the nouns are the same, which prevented LDA from exceeding the results obtained previously. LSA did not exceed the results obtained in Table 3, but it did exceed those obtained in Table 2, which shows that in this case incorporating adjectives, nouns, and verbs provided information that LSA incorporated into the discovered topics, and therefore its coherence levels were higher. On the other hand, PLSA obtained the highest results compared with those previously reported, which shows that for PLSA incorporating nouns, adjectives, and verbs eliminated much of the noise present during the first approach.

Table 4. Results obtained with topic coherence and the adjectives, nouns and verbs recognized by Freeling

Method   20 topics   50 topics   100 topics
LDA      1.06        0.88        0.86
LSA      1.00        1.01        1.04
PLSA     1.00        1.22        1.19

Among the topics recovered were the words vacuna, sputnik, ola, contagios, casos, financieros, política, and mexicanos. Table 5 shows 5 of the 20 topics obtained with LDA using adjectives, verbs, and nouns, each with its five top words. Topic 1 refers to health measures and vaccination in companies. Topic 2 refers to the financial crisis caused by the pandemic. Topic 3 refers to vaccination against COVID in Mexican territory. Topic 4 refers to financial activity and positive COVID cases. Topic 5 refers to taxes in the country and the increase in infections during the pandemic.

Table 5. Topics obtained with LDA and the adjectives, verbs and nouns recognized by Freeling

Topic     Top word 1   Top word 2   Top word 3   Top word 4   Top word 5
Topic 1   negocio      vacunación   empleados    medidas      trabajadores
Topic 2   salud        finanzas     gobierno     crisis       pandemia
Topic 3   territorio   oaxaca       sputnik      petróleo     vacuna
Topic 4   positivo     financiero   fondo        compra       ola
Topic 5   delta        impuestos    contagios    aumento      tercera

7 Conclusions and Future Work


In this work, an analysis was carried out, using three different approaches, of the behavior of three traditional techniques from the topic discovery literature on tweets in Spanish written by inhabitants of Mexican territory. The tweets were taken from the social network Twitter and contain comments on the COVID-19 pandemic in the economic sphere. The three approaches implemented explore the morphological information of texts, namely adjectives, nouns, and verbs. In addition, three traditional techniques were compared: LDA, PLSA, and LSA.
The purpose is to obtain topics from texts in Spanish and analyze the results obtained, given the lack of work in this language. The results showed that it
is possible to obtain important information from Spanish texts with algorithms
traditionally applied in English, thus supporting research on texts in Spanish.
The main contributions of this work are a) to discover latent topics in Spanish texts; b) to provide knowledge and approaches to bridge the gap between text processing in Spanish and English; c) a comparison of three techniques with
three approaches using morphological information from tweets; d) a set of topics
on the Mexican economy in the context of the COVID pandemic.
This work will be useful for analysts of topics in large sets of texts, since it provides an approach to topic discovery in Spanish with techniques such as LDA, LSA, and PLSA, which have mainly been applied to corpora in English. The obtained results are variable because each technique creates its own dictionary and because of the presence of noise, as well as erroneous labels generated by Freeling. Nonetheless, the discovered topics can be beneficial for
text analysts and, in general, users who want to know the topics discussed on
social networks in this pandemic situation over a specific region.
The purpose of this article was topic discovery in Spanish tweets with LDA, LSA, and PLSA, using only features such as verbs, adjectives, and nouns. The obtained results show that this aim was accomplished: topics were obtained from Spanish tweets about COVID-19, showing that it is possible to obtain important information about the pandemic situation from the social network Twitter.
As future work, it is necessary to consider deeper information from the texts, such as relationships or semantic roles, and to implement deep learning techniques to discover topics with better coherence.

Acknowledgment. The authors would like to thank Universidad Autónoma Metropolitana, Azcapotzalco. The present work has been funded by the research project SI001-18 at UAM Azcapotzalco, and by the Consejo Nacional de Ciencia y Tecnología (CONACYT) with the scholarship number 788155. The authors thankfully acknowledge computer resources, technical advice and support provided by Laborato-
rio Nacional de Supercómputo del Sureste de México (LNS), a member of the CONA-
CYT national laboratories, with project No 202103090C and partly by project VIEP
2021 at BUAP.

References
1. Agüero-Torales, M.M., Vilares, D., López-Herrera, A.G.: Discovering topics in twit-
ter about the COVID-19 outbreak in Spain. Procesamiento del Lenguaje Natural
66, 177–190 (2021)
2. Älga, A., Eriksson, O., Nordberg, M.: Analysis of scientific publications during the
early phase of the COVID-19 pandemic: topic modeling study. J. Med. Internet
Res. 22(11), e21559 (2020)
3. Amara, A., Hadj Taieb, M.A., Ben Aouicha, M.: Multilingual topic modeling for
tracking COVID-19 trends based on Facebook data analysis. Appl. Intell. 51(5),
3052–3073 (2021)
4. Anaya, L.H.: Comparing Latent Dirichlet Allocation and Latent Semantic Analysis
as Classifiers. ERIC (2011)
5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn.
Res. 3, 993–1022 (2003)
6. Bougteb, Y., Ouhbi, B., Frikh, B., et al.: Deep learning based topics detection. In:
2019 Third International Conference on Intelligent Computing in Data Sciences
(ICDS), pp. 1–7. IEEE (2019)
7. Figuerola, C.G.: Applying topic modeling techniques to degraded texts: Spanish
historical press during the transición (1977-1982). In: Proceedings of the Sixth
International Conference on Technological Ecosystems for Enhancing Multicultur-
ality, pp. 857–862 (2018)
8. Fuentes-Pineda, G., Meza-Ruiz, I.V.: Topic discovery in massive text corpora based
on min-hashing. Expert Syst. Appl. 136, 62–72 (2019)
9. Heintz, I., et al.: Automatic extraction of linguistic metaphors with LDA topic
modeling. In: Proceedings of the First Workshop on Metaphor in NLP, pp. 58–66
(2013)
10. Hernández, A.R., Lorenzo, M.M.G., Simón-Cuevas, A., Arco, L., Serrano-Guerrero,
J.: A semantic approach for topic-based polarity detection: a case study in the
Spanish language. Procedia Comput. Sci. 162, 849–856 (2019)
11. Lyu, J.C., Le Han, E., Luli, G.K.: COVID-19 vaccine-related discussion on twitter:
topic modeling and sentiment analysis. J. Med. Internet Res. 23(6), e24435 (2021)
12. Mena, A., Reátegui, R.: Topic identification from Spanish unstructured health
texts. In: Botto-Tobar, M., Montes León, S., Camacho, O., Chávez, D., Torres-
Carrión, P., Zambrano Vizuete, M. (eds.) ICAT 2020. CCIS, vol. 1388, pp. 351–362.
Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71503-8_27
13. Navarro-Colorado, B.: On poetic topic modeling: extracting themes and motifs
from a corpus of Spanish poetry. Front. Digit. Humanit. 5, 15 (2018)
14. Padró, L., Stanilovsky, E.: Freeling 3.0: towards wider multilinguality. In: LREC
2012 (2012)
15. Saorı́n, T.: Wikipedia de la A a la W, vol. 8. Editorial UOC (2012)
16. Sha, H., Hasan, M.A., Mohler, G., Brantingham, P.J.: Dynamic topic modeling
of the COVID-19 twitter narrative among US governors and cabinet executives.
arXiv preprint arXiv:2004.11692 (2020)
17. Sridhar, V.K.R.: Unsupervised topic modeling for short texts using distributed
representations of words. In: Proceedings of the 1st Workshop on Vector Space
Modeling for Natural Language Processing, pp. 192–200 (2015)
18. Xue, J., Chen, J., Chen, C., Zheng, C., Li, S., Zhu, T.: Public discourse and sen-
timent during the COVID 19 pandemic: using latent dirichlet allocation for topic
modeling on twitter. PLoS ONE 15(9), e0239441 (2020)
SimLDA: A Tool for Topic Model Evaluation

Rebecca M. C. Taylor and Johan A. du Preez

Department of Electrical and Electronic Engineering, Stellenbosch University, Stellenbosch 7602, South Africa
[email protected], [email protected]

Abstract. Topic model evaluation is a well-studied field. Two classes
of metrics are typically used to evaluate the quality of extracted top-
ics, namely held-out perplexity and coherence measures. Although these
metrics have been improved and refined, they still have drawbacks. In
this paper we propose using simulated data generated from our flexible
corpus generation tool, SimLDA, combined with an exact measure of
dissimilarity, the average Kullback-Leibler divergence (KLD), to achieve
a more fine-grained method for detecting differences in topic quality. In
this work, we use our proposed approach to evaluate and compare topics
extracted from synthetic data using two inference algorithms for latent
Dirichlet allocation (LDA), namely, variational Bayes (VB) and collapsed
Gibbs sampling. We then evaluate the extracted topics using a coherence
measure (the Cv score). Using the same two inference algorithms we then
extract topics from the popular 20 Newsgroups data set and evaluate the
extracted topics based on the Cv score. Through these three steps, we
show that although collapsed Gibbs sampling consistently outperforms
VB, the use of simulated data (evaluated using both coherence measures
and KLD) provides more insight into the quality of the extracted top-
ics and allows us to examine performance differences of the inference
algorithms.

Keywords: Topic model evaluation · Latent Dirichlet allocation · Variational Bayes · Collapsed Gibbs sampling · Divergence measure · Topic coherence

1 Introduction
In supervised learning models, the ability of a trained model to predict a target
variable is evaluated using a test set. Evaluating the performance of unsupervised
learning algorithms such as topic models, is less straightforward and a measure
of success needs to be defined. Typically, to evaluate topic models, the metrics
discussed below can be utilised.

1.1 Standard Measures for Evaluating LDA Performance


Held-out perplexity [14] has been the most popular evaluation metric for topic
models such as latent Dirichlet allocation (LDA) [23]. Although much work has
been done to improve the estimators [36], held-out perplexity does not give suffi-
ciently fine-grained resolution: Minka and Lafferty address similar concerns [25].
They demonstrate that held-out perplexity for two different models can be
almost identical but when inspected (using simulated data where the word-topic
and topic-document distributions are known), large performance differences are
seen [25]. Furthermore, a large-scale human topic labeling study by Chang et al.
[14] demonstrated that low held-out perplexity is often poorly correlated with
interpretable latent spaces.
In more recent work, coherence measures are typically preferred in topic
evaluation [29]. Coherence, unlike held-out perplexity, is highly correlated with
human interpretability of topics [28]. In a comprehensive study of multiple coher-
ence measures, the CV coherence score had the highest correlation with human
topic ratings [28]. This measure is a combination of three measures: the indi-
rect cosine measure, the Boolean sliding window and the normalised pointwise
mutual information score, CNPMI , which performed almost as well as the CV
score. Other well-known coherence measures evaluated in their analysis include
CUCI and CUMass [24,28]. The CV score (used in this article) and the simpler
CNPMI score, are now popular for evaluating topic modelling results.
These coherence measures, however, are not without their drawbacks since
they take only the top words per topic into account, and not the full distributions
over topics. Consequently, much detail of the learnt distributions is discarded.
Because these measures are not comprehensive evaluation tools, it is good
practice to inspect the topics extracted (read through the words in each topic)
where the metrics indicate good performance [14]. Here we propose using sim-
ulated data along with a Kullback-Leibler divergence (KLD) measure to replace
extensive use of this tedious process and show how this metric gives more fine-
grained results than the CV coherence score for the same simulated data sets.
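For reference, one common way to compute the CV score in practice is gensim's CoherenceModel; the sketch below uses placeholder texts and topics and is not necessarily the exact evaluation pipeline used later in this paper.

from gensim.corpora import Dictionary
from gensim.models import CoherenceModel

# Placeholder tokenized reference texts and extracted topic word lists.
texts = [["cat", "dog", "pet"], ["car", "engine", "wheel"], ["dog", "pet", "vet"]]
dictionary = Dictionary(texts)
topics = [["dog", "pet", "cat"], ["car", "wheel", "engine"]]

cv = CoherenceModel(topics=topics, texts=texts, dictionary=dictionary,
                    coherence="c_v")
print(cv.get_coherence())            # single aggregated score
print(cv.get_coherence_per_topic())  # one score per topic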

1.2 A More Exact Measure of Topic Model Performance for Simulated Data

To avoid the problems mentioned above, we implement a corpus simulation sys-
tem based on the generative LDA graphical model. In this corpus simulation
system, the underlying distributions are known, allowing a more fine-grained
approach to evaluating algorithm performance.
A distance measure (forward KLD) is then used to compare the approximate
distributions learned from the topic model and the true distributions. An error
value, taking into account the error over all topics, is generated per run of each
model.
A further advantage of using simulated data is that an array of data sets
with a range of hyperparameters, like number of documents, number of topics,
and number of topics per document can be generated. This allows us to evaluate
a variety of corpus types.
1.3 Overview

In Sect. 2 (Background), we introduce LDA and define the distributions used in
the LDA graphical model. We also introduce the two inference methods that will
be used to extract topics from the simulated corpora.
In Sect. 3, we present SimLDA and describe its use in relation to topic model
evaluation. In Sect. 4 we describe the two simulated data sets that are used in
this article as example data sets, as well as the hyperparameters used in the
topic extraction experiments.
The topic modelling results are presented in Sect. 5. Using box plots we sum-
marise average KLD values obtained using the two algorithms for each data set.
Word-topic plots are included for closer scrutiny of the results from individual
corpora. We then present the topic coherence over a range of topic numbers so
as to compare the coherence and KLD results. To show the typical usage of
coherence metrics on a non-simulated data set, we also compare the performance of the two inference algorithms on a real, well-known text corpus, the 20 Newsgroups corpus.
In Sect. 6, we discuss our results and motivate the use of our topic model
evaluation methodology. We conclude this paper and present ideas for future
work in Sect. 7.

2 Background
In this section, we introduce latent Dirichlet allocation (LDA) and the two
approximate inference techniques that will be used to showcase our topic model
performance evaluation methodology.

2.1 Latent Dirichlet Allocation (LDA)

Although many types of topic models exist, ranging from latent semantic indexing (LSI) [20] and its probabilistic counterpart, probabilistic LSI (pLSI) [18], to correlated topic models (CTM) [8], latent Dirichlet allocation (LDA) is still one of the most popularly used topic models [32].
While the LDA model can extract latent topics of any type from a wide range
of inputs, it is most commonly known for its ability to extract latent semantic
information from text corpora (collections of documents).
By applying LDA to text corpora, we can extract topics, each consisting of
a list of words, where each word in the vocabulary has a probability of being in
that topic. Similarly, after running the inference algorithm, each document in
the corpus is represented as a probability distribution over topics. The notation
used to represent these word-topic and topic-word distributions, as well as the
other distributions that characterise LDA, are listed in Table 1.
Fig. 1. Plate model of the LDA system as a Bayes net. The symbols used in this figure are explained in Table 1.

Table 1. Symbols used for the LDA model shown in Fig. 1.

Symbol   Description
M        Total number of documents
m        Current document
N        Number of words in the current document
n        Current word (in document)
K        Total number of topics
k        Current topic
Km       Number of topics per document
V        Total number of words in the vocabulary
v        Current word (in vocabulary)
v        Observed word (in vocabulary)
θm       Topic-document Dirichlet for document m
Zm,n     Topic-document categorical for word n in document m
Wm,n     Word-topic conditional categorical for word n in document m
φk       Word-topic Dirichlet for topic k

We use the LDA model in this article to perform topic modelling, and com-
pare the topic extraction results using two different approximate inference tech-
niques that are introduced in the following section.

2.2 Approximate Inference for LDA

Exact inference is intractable for many useful graphical models such as LDA
[7,9,10]. In fact, one cannot perform exact inference on any graphical model
where continuous parent distributions have discrete children [26]. A range of
approximation techniques can be used to overcome this difficulty. These tech-
niques vary in performance, based on the models to which they are applied [19].
Particle based approaches, such as collapsed Gibbs sampling [16,17], are com-
putationally expensive [33] and convergence rates can be slow, though asymp-
totic convergence is guaranteed. Because larger data sources are now readily
available, faster and equally effective approaches such as variational Bayes (VB)
have gained popularity [4,5,11].
Collapsed Gibbs sampling and VB are currently two of the most frequently
used inference techniques for LDA, and in this work, we use our topic model
evaluation approach to compare these inference algorithms for two simulated
data sets. To demonstrate how our results compare with standard coherence
measures, we also show how the two inference algorithms perform on a text
corpus, namely the 20 Newsgroups corpus [2].
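As a hedged illustration of how these two inference routes are commonly run in Python (the packages and settings below are assumptions for the sketch; the specific tools used in this work are described in later sections), collapsed Gibbs sampling is available through the lda package and VB through gensim:

import numpy as np
import lda
from gensim import matutils
from gensim.models import LdaModel

X = np.random.randint(0, 3, size=(50, 100))   # placeholder document-term counts

# Collapsed Gibbs sampling on the count matrix.
gibbs = lda.LDA(n_topics=7, n_iter=5000, random_state=0).fit(X)
gibbs_topics = gibbs.topic_word_              # shape (n_topics, vocab)

# Variational Bayes; gensim expects a sparse bag-of-words corpus.
corpus = matutils.Dense2Corpus(X.T)           # documents as columns
vb = LdaModel(corpus, num_topics=7, passes=150)
vb_topics = vb.get_topics()                   # shape (n_topics, vocab)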

3 SimLDA as a Tool for Generating Simulated Documents
In this section we present our corpus simulation tool, SimLDA, and describe
our method of measuring topic model performance based on the extracted and
ground truth topics. We also discuss the implementation details of SimLDA.

3.1 Generation of Simulated Documents


Our corpus simulation system outputs a corpus after input of the following
parameters: number of documents, corpus vocabulary, number of words per doc-
ument, number of topics in the corpus, number of topics per document (these
will be assigned random proportions that sum to 1 within a document), and a
measure of overlap.
Each corpus is generated as follows (a minimal sampling sketch is given after this list):
1. For each topic, generate its word distribution.
2. For each document, generate its topic distribution.
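A minimal sketch of this generative process, assuming the plain LDA sampling scheme with placeholder parameter values (SimLDA additionally constrains the number of topics per document and the topic shapes, which is not reproduced here):

import numpy as np

rng = np.random.default_rng(0)
V, K, M, N = 100, 7, 50, 100      # vocabulary size, topics, documents, words/doc

phi = rng.dirichlet(np.full(V, 0.1), size=K)     # word distribution per topic
theta = rng.dirichlet(np.full(K, 0.5), size=M)   # topic distribution per document

corpus = []
for m in range(M):
    z = rng.choice(K, size=N, p=theta[m])                 # topic for each word slot
    corpus.append([rng.choice(V, p=phi[k]) for k in z])   # word index given topic

print(corpus[0][:10])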
To facilitate graphical evaluation of the results of topic models, the words in
the corpus are all word indices, so that they can be reordered and plotted for
visual inspection (see Fig. 2(a) for ordered words and (c) for unordered words).
The words are organised in a circular arrangement i.e., the last one is adjacent
to the first one, and a topic is represented by a collection of words centered
around a particular position on this circle with a Laplace, Fig. 2(a), or Gaussian,
Fig. 2(b), decline to both sides, depending on the data set. Note that in our
modelling we do not make use of this particular distribution of words in the
topics—it merely serves to illustrate the results in an understandable way. We
show this in Fig. 2(c), where we display the unordered vocabulary on the x-axis.
As noted by Blei et al. in [10], LDA handles documents in a bag-of-words [13]
manner, which implies that the actual sequence of words or topics is not taken
into account by the LDA model.
(a) Generated word-topic distributions for a Laplace-distributed data set with 7 topics, the 7th topic being the function-words topic. (b) Generated word-topic distributions for a Gaussian-distributed data set with 15 topics; this data set has a narrow width (support) per document. (c) Shuffled word-topic distributions for (a), illustrating that no reliance is made on the sequence of words within a topic or on the fact that adjacent topics are significantly more likely to share words. [Each panel plots the probability of each word per topic against the words in the dictionary for the corpus.]
Fig. 2. Plots of generated word-topic distributions from which samples are drawn in
the simulation of documents. The width of the word-topic distributions relates to the
support of each distribution.

To the topics mentioned above, we add an additional topic, non-overlapping
with the others, but occurring in all documents (the rightmost flat topic in
Fig. 2(a) and (b)). The addition of these words makes the task of learning the
word-topic and topic-document distributions significantly more challenging. This
is one of the challenges when applying LDA to true text corpora and it is typically
handled by applying pre-processing techniques before running LDA (such as
removing stop words and using the TF-IDF) [35] or by post-processing (removing
“context-free” words after extracting topics [25]). By including these stop words,
we aim to make our simulations more difficult and realistic.

3.2 Measuring Performance

All word-topic and topic-document distributions are Dirichlet distributions. One
can easily calculate the forward Kullback-Leibler divergence (KLD) between two
Dirichlet distributions.
Unfortunately, the Gensim implementation of the VB algorithm allows access
only to the mean of these Dirichlet distributions, not to the distributions them-
selves. Fortunately, in LDA, the mean of these distributions is, in fact, the prob-
ability of finding a word in a topic. We therefore calculate the forward KLD
between the actual word-topic distributions p and approximate word-topic dis-
tribution q for each topic,
\mathrm{KL}(p \parallel q) = \sum_i p_i \ln \frac{p_i}{q_i}   (1)
To match up the extracted topic to the ground truth topic, we compare each
extracted topic with the ground truth topic and choose the extracted topic that
is closest to the ground truth topic based on KLD. We repeat this process for
all ground truth topics and the average KLD over all topics is taken to be the
error for each model. It is important to note that when generating a corpus, we
are sampling from the underlying true distributions. We compare the extracted
distributions with the ground truth distributions from which we sample, and not
from the sampled distributions.
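A minimal sketch of this matching and averaging procedure, assuming the true and extracted word-topic distributions are available as rows of NumPy arrays; the greedy nearest-match (without enforcing a one-to-one assignment) mirrors the description above.

import numpy as np

def forward_kld(p, q, eps=1e-12):
    """Forward Kullback-Leibler divergence KL(p || q) between two distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def average_topic_kld(true_topics, extracted_topics):
    """For each ground-truth topic, take the closest extracted topic by KLD."""
    per_topic = [min(forward_kld(p, q) for q in extracted_topics) for p in true_topics]
    return float(np.mean(per_topic))

# Placeholder distributions (7 topics over a 100-word vocabulary).
true_topics = np.random.dirichlet(np.ones(100), size=7)
extracted_topics = np.random.dirichlet(np.ones(100), size=7)
print(average_topic_kld(true_topics, extracted_topics))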

3.3 Implementation

SimLDA was developed using EMDW, a C++ library for Bayesian statistics from
Stellenbosch University [12,22,30,31], and can be used directly from Python. It
has also been Dockerised so that it can be used on any machine (see Fig. 3). It
can be used as an HTTP API (accepting a PUT request with JSON payload),
through the LDA wrapper package or directly from the console. Below we show
an example of input that is written to a JSON file (with Python-style comments),

1 SimLDA_json = {
2 ‘‘ topics_per_doc " : 3 , # topics per doc
3 ‘‘ number_of_docs " : 5 0 0 0 , # number of documents
4 ‘‘ total_topics " : 5 0 , # topics per doc
5 ‘‘ words_per_doc " : 1 5 0 , # words per doc
6 ‘‘ total_vocab " : 1 0 0 0 0 0 , # number words in corpus
7 ‘‘ weightingfactor " : 0 . 2 5 , # scale the width of the
distribution
8 ‘‘ tag " : " api " , # the tag is be used in
path
9 ‘‘ laplace " : False , # True for Laplace , else
Gaussian
10 }

If the API is used, the documents are returned in JSON format, along with a
dictionary. If SimLDA is used natively, the documents are written to compressed
text files locally.
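A hedged sketch of calling SimLDA through its HTTP API with the same JSON payload; the endpoint URL is a placeholder assumption, not a published address.

import requests

payload = {
    "topics_per_doc": 3,
    "number_of_docs": 5000,
    "total_topics": 50,
    "words_per_doc": 150,
    "total_vocab": 100000,
    "weightingfactor": 0.25,
    "tag": "api",
    "laplace": False,
}

# Assumed local deployment of the Docker container (see Fig. 3); adjust the URL.
response = requests.put("http://localhost:8080/simlda", json=payload)
response.raise_for_status()
result = response.json()   # documents plus dictionary, per the description above
print(list(result.keys()))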
(a) Deployment of the Docker container on Amazon Web Services (AWS) Elastic Container Registry (ECR). (b) Local or remote deployment of the Docker container. [In both panels, the lda_wrap package and Docker container take input documents from a notebook or Python script and return the document-topic and topic-word distributions.]
Fig. 3. Diagram showing how SimLDA can be made available either on (a) a cloud service such as Amazon Web Services (AWS) or (b) on a server or local machine.

Once the simulated documents are created and made available, our LDA
wrapper package can be used to parse the created documents, and to interface
with the topic models. The LDA wrapper package also allows us to run a number
of iterations for each corpus type for the simulated data sets. On completion,
SimLDA writes the generated documents to file, or, if used as an API, returns
the documents as a JSON payload.
4 Method
Here we describe the method used to showcase SimLDA and our custom topic
modelling evaluation metric. We start by describing the simulated data sets, and
then describe the hyperparameters that are used in the experiments.

4.1 Simulated Data Sets

We chose two small synthetic data sets to illustrate the functionality of SimLDA.
Each data set consists of 20 groups of corpora, where each group contains corpora
consisting of a set number of documents per corpus. We generate multiple cor-
pora per data set so that we can compare performance over a number of samples
to have an idea of how performance varies with small changes to a corpus.
These data sets are small by real-word text topic extraction standards (in
terms of number of documents, and words per document), which makes it harder
for LDA to learn their underlying distributions—they contain less information.
By choosing harder data sets, differences between topic models are often more
apparent.
Furthermore, smaller corpora require less processing time. Choosing small
corpora allows us to:

1. Run collapsed Gibbs sampling for long chains and take multiple samples.
2. Generate many corpora per corpus generation parameters setting (such as
document length, number of topics per document, etc.).
3. Iterate over multiple hyperparameters for LDA (such as the Dirichlet hyper-
parameters, and number of epochs).

We now describe the two simulated corpora that are used in this work.
Smaller Simulated Data Set: For each corpus we use the following corpus
generation parameters (see Table 1): V = 100, N = 100, K = 7 and Km = 3.
This data set is smaller than the other in terms of number of topics and
vocabulary length. There are 100 words per document, which makes the total
number of observed words low—which would be the case even with many docu-
ments.
The ratio of topics per document to total topics is reasonably high (about
1:2) when compared to text topic extraction data sets. When performing LDA
on text corpora, we typically expect fewer topics within each document (often
only one or two, such as in the 20 Newsgroups corpus), but expect many more
topics for the entire corpus.
Larger Simulated Data Set: For each corpus we use the following corpus
generation parameters: V = 500, N = 120, K = 10 and Km = 5. This data set
has a larger vocabulary, though considerably smaller than most text corpora.
Each document contains five topics out of the 10 available topics.
4.2 Hyperparameter Selection for Simulated Data Sets

Here we provide details about the hyperparameters that are chosen to be used
for our experiments.

Epochs: For the implementations of VB and collapsed Gibbs sampling that
are used, one does not have access to the internal distributions at each epoch.
We therefore test convergence by running LDA a number of times for various
numbers of epochs and inspecting the average result. For VB, performance is sig-
nificantly worse at 70 epochs, even for the smaller simulated data set but shows
no improvement at 200 epochs for either data set. For the larger simulated data
set, for VB, we use 150 epochs for all runs. For collapsed Gibbs sampling, 5,000 samples are used since poor results are obtained when using 2,000 iterations. This is significantly more than the 2,000 samples recommended in the Python package [1] and the 1,000 used by Zeng et al. [37].

Dirichlet Hyperparameters: A grid search is applied to choose the appropri-
ate Dirichlet hyperparameters for each corpus. The hyperparameters α = β = 0.1
do well over both algorithms for the larger simulated data set and α = β = 0.5
yield the best results for the smaller simulated data set.

We now present the topic extraction results for these two simulated data sets,
as well as for a well known text corpus, the 20 Newsgroups corpus [2].

5 Results
To objectively determine the degree to which the estimated topic-word distribu-
tions differ from the actual distributions from which the simulated data are gen-
erated, we present average KLD values for each of the two algorithms. For each
group of 20 corpora (each group consisting of a different number of documents
per corpus M —with the other hyperparameters fixed), we compute average KLD
over all topics for the two algorithms.
Using box plots, we show the average KLD against the number of documents
per corpus. This allows the median KLD and interquartile ranges (the latter
indicating the degree of variability in the data) of the algorithms to be compared
visually. These results are summarised in Fig. 4 (smaller simulated data set) and
Fig. 9 (larger simulated data set).
We also, for select corpora, plot the word-topic distributions inferred by the
algorithms, superimposed on the true distributions from which the corpora are
sampled. Average KLD over all topics is provided in these plots (which we
call word-topic plots), as an objective indication of the extent to which the
true and extracted distributions agree. Algorithm performance can also be visu-
ally assessed by examining the differences between the true distributions and
extracted distributions. In Fig. 4, we show the summary box plots for the exper-
iments performed on this data set. For corpora containing fewer documents, col-
lapsed Gibbs sampling outperforms VB in terms of both variability and median
value.
[Figure 4 box plot: average KL-divergence for topics (per run) on the y-axis against the number of documents per run (50, 100, 200, 300, 500) on the x-axis, for collapsed Gibbs sampling and VB.]
Fig. 4. Box plot showing the average KLD values for collapsed gibbs sampling and VB
as the number of documents per run increases for the Smaller Simulated Data Set. The
average KLD is computed over all topics for 20 runs. For smaller corpora, collapsed
gibbs sampling performs best. For larger corpora, VB starts to perform as well or even
better than collapsed gibbs sampling. (Color figure online)

We show only one example of poorer performance and one example of better
performance (based on KLD scores provided in each figure) of each algorithm.
In each plot, the ground truth topics are represented by red lines, and extracted
topics are represented by different colours. The closer the coloured curves are to
the red lines over all topics, the better the performance of the algorithm.

5.1 Smaller Simulated Data Set

For corpora with 200 documents each, VB starts to outperform collapsed Gibbs
sampling in terms of the median value, but not in terms of variability. For corpora
with more than 200 documents, VB outperforms collapsed Gibbs sampling in
terms of median value, and the variability starts to decrease to a level that seems
to be nearing that of collapsed Gibbs sampling.
Inspecting the topic extraction of individual corpora containing 50 documents
each (Figs. 5 and 6) allows us to compare the extracted topics (the coloured
curves) with the ground truth topics (as defined in SimLDA). It is clear that
collapsed Gibbs sampling extracts topics more correctly than VB does: in
Fig. 6 we see that the coloured curves do not match the red curves, and this
is reflected in the high KLD values of 0.49 and (at best) 0.19, compared with
the KLD values of 0.16 and 0.12 for the examples shown in Fig. 5 as extracted
by collapsed Gibbs sampling.

(a) KLD = 0.16 (Average over All Topics). This is One of the Corpora where Collapsed Gibbs Sampling Performed the Worst (although it is still good performance).
(b) KLD = 0.12 (Average over All Topics). This is an Example of Good Topic Extraction by Collapsed Gibbs Sampling.

Fig. 5. True versus extracted topics identified by collapsed Gibbs sampling from two
simulated corpora derived from the Smaller Simulated Data Set (Color figure online)

Although we have only presented results in this manner for a few select corpora, one can inspect the results for each corpus. This is valuable when developing either new topic modelling techniques or new inference algorithms.

(a) KLD = 0.49 (Average over All Topics). This is an Example of Typical Topic Extraction by VB.
(b) KLD = 0.19 (Average over All Topics). This Shows Exceptionally Successful Topic Extraction by VB.

Fig. 6. True versus extracted topics identified by VB from two simulated corpora
derived from the Smaller Simulated Data Set. Each corpus contains 50 documents.
The result in (a) is a typical result, not an extreme one. In (b) this result for VB is
in fact the KLD outlier that can be seen in the summary box plot in Fig. 4 (plotted
where M = 50 on the x-axis).

We now compare these results with the standard Cv coherence score. By
extracting topics for this data set for values of K other than the true number of
topics, we can use the standard way of plotting coherence for a range of topics to
evaluate the data set (for a specific corpus group). In Fig. 7 and 8 we show the
coherence scores for M = 100 and M = 500 respectively. In both figures, the
highest Cv scores are shown for the correct number of topics (K = 7).
In Fig. 7, collapsed Gibbs sampling performs better than VB only for the
correct number of topics, and only marginally so. When comparing this with the
KLD score shown in Fig. 4 at M = 100, we can see that the KLD score shows
that VB performs much worse than collapsed Gibbs sampling.
For M = 500 (see Fig. 8), collapsed Gibbs sampling performs better than
VB for 8 and 9 topics, but worse for lower numbers of topics. At 7 topics, the
correct number based on the underlying distributions, the algorithms perform
very similarly. This is similar to what is seen using the KLD measure in Fig. 4.


Fig. 7. Cv scores for the two algorithms for the Smaller Simulated Data set for corpora
containing 100 documents.

Fig. 8. Cv scores for the two algorithms for the Smaller Simulated Data set for corpora
containing 500 documents.

5.2 Larger Simulated Data Set

Here the inference problem is harder to solve than when performed on the smaller
simulated data set, since there are more topics per document (6 topics, instead
of 3), which implies greater topic overlap within each document.
Over all the groups of corpora (from those containing 100 to those containing
500 documents each), collapsed Gibbs sampling outperforms VB with a large
margin in terms of variability as well as median value.
The word-topic plots show more detail with regard to these summarised
results. In Fig. 10, we show topics extracted using VB on two corpora containing
100 documents each. In (a) the topic extraction performance is very poor. In (b)
we can see that the algorithm identifies most of the underlying topics, but not
well.
Figure 11 shows topic extraction by collapsed Gibbs sampling. For these cor-
pora, collapsed Gibbs sampling successfully identifies the topics.

Fig. 9. Box plot showing the average KLD values for the two algorithms as the number
of documents per run increases for the larger simulated data set. KLD is computed over
all topics for 20 runs. It is clear that VB is the worst performing algorithm over this
range of corpora.

(a) KLD = 1.3 (Average over All Topics). This is an Example of Poor Topic Extraction by VB.
(b) KLD = 0.77 (Average over All Topics). This is an Example of Good Topic Extraction by VB.

Fig. 10. True versus extracted topics identified by VB for two simulated corpora
derived from the Larger Simulated Data Set. Each corpus contains 100 documents.
(a) KLD = 0.32 (Average over All Topics). This is a Typical Topic Extraction by Collapsed Gibbs Sampling.
(b) KLD = 0.3 (Average over All Topics). This is Another Typical Topic Extraction by Collapsed Gibbs Sampling.

Fig. 11. True versus extracted topics identified by collapsed Gibbs sampling for two
simulated corpora derived from the Larger Simulated Data Set. Each corpus contains
100 documents.

To compare our KLD metric with coherence, we chose the corpus group
where M = 200, and plot the Cv coherence scores in box plot form in Fig. 12.
Collapsed Gibbs sampling performs better than VB for the correct number of
topics (K = 10), as well as where (K = 9). For other numbers of topics, VB
either performs similarly or better than collapsed Gibbs sampling. It is also
interesting to note that the correct number of topics does not give the highest
coherence score.


Fig. 12. Cv scores for the two algorithms for the Larger Simulated Data Set for corpora
containing 200 documents.
We now evaluate the two inference algorithms by extracting topics from a commonly used text corpus, the 20 Newsgroups corpus, and comparing the coherence scores for these two algorithms.

5.3 Evaluation of the Inference Algorithms Using the 20 Newsgroups and Coherence
The well-known 20 Newsgroups corpus [2,3,15,34] has been generated by
extracting posts from 20 different newsgroups, each typically covering a specific
logical topic.
Before applying topic modelling to this corpus, standard pre-processing steps
are applied, using a combination of regular expressions and functions available
from The Natural Language Toolkit (NLTK) [6,21] and Gensim [27].
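As an illustration only (the exact cleaning steps and regular expressions are not specified in the text), a preprocessing and Cv coherence computation of this kind could look as follows with NLTK and Gensim:

import re
from nltk.corpus import stopwords            # requires the NLTK stopword corpus
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel

def preprocess(doc):
    # Illustrative cleaning only; the paper's exact regular expressions are not given.
    stop = set(stopwords.words("english"))
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in stop and len(t) > 2]

raw_documents = [
    "The space shuttle launch was delayed by NASA engineers.",
    "The hockey team won the game in overtime last night.",
]
texts = [preprocess(d) for d in raw_documents]
dictionary = Dictionary(texts)

# Top words per extracted topic (placeholders; in practice these come from the
# fitted LDA model of either inference algorithm).
topics = [["space", "shuttle", "launch", "nasa"],
          ["hockey", "team", "game", "overtime"]]

cv = CoherenceModel(topics=topics, texts=texts,
                    dictionary=dictionary, coherence="c_v").get_coherence()
print(cv)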
In Fig. 13, the Cv scores are shown over a range of K values for the 20
Newsgroups corpus. Collapsed Gibbs sampling clearly performs better than VB,
and shows the highest coherence at K = 20 topics. Because we do not know
the true number of topics, it is hard to objectively determine which algorithm
is better at topic extraction.
Given that collapsed Gibbs sampling consistently provides higher coherence
values over the range of K, one could conclude, based on these results, that
collapsed Gibbs sampling performs better for this data set. This is in keeping
with our results for the simulated data sets, and also with other research [4].
Without the ground truth distributions, however, it is harder to quantify the
differences in performance than when we know the true number of latent topics.

Fig. 13. Cv scores for the two algorithms. The performance is similar for K = 13, but
for other values of K, collapsed Gibbs sampling performs much better than VB.

6 Discussion
SimLDA allows very large numbers of simulated documents to be created with
a wide range of hyperparameters. By varying these hyperparameters such as
number of topics per document and topic width, one can compare topic model
performance over a wide range of corpora. In this article, we demonstrate this
for the two simulated data sets.
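For illustration, the sketch below simulates corpora from the standard LDA generative process with Dirichlet-sampled ground-truth topics; note that SimLDA itself shapes its topics (Gaussian or Laplace), so this is only meant to convey the general idea of generating corpora from known topic-word distributions.

import numpy as np

rng = np.random.default_rng(0)

def simulate_corpus(M, V=100, K=7, alpha=0.5, beta=0.5, doc_len=50):
    """Generate M documents from the standard LDA generative process."""
    phi = rng.dirichlet([beta] * V, size=K)            # ground-truth topic-word distributions
    docs = []
    for _ in range(M):
        theta = rng.dirichlet([alpha] * K)             # topic mixture of this document
        z = rng.choice(K, size=doc_len, p=theta)       # topic assignment per word slot
        docs.append([rng.choice(V, p=phi[k]) for k in z])  # word drawn from its topic
    return phi, docs

true_phi, corpus = simulate_corpus(M=50)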
Because the ground truth distribution of the simulated corpora is known,
we can easily compare the extracted topics with the word-topic distributions
used to create the corpora in the first place. By using an average forward KLD
over all the topics, we can quantify the error that a topic model makes for a
specific corpus. Since many corpora can be extracted using the same underlying
distributions, we can apply LDA to a number of these corpora, and inspect the
variability of the results. This gives an indication of the stability of the topic
model, inference technique used for topic extraction, or hyperparameters chosen.
For example, we see that in both the smaller simulated data set and the larger
simulated data set (Fig. 4 and 9), collapsed Gibbs sampling shows less variability
than VB does.
By inspecting these box plots, we can also see that although the general
performance of collapsed Gibbs sampling is better than that of VB by a large
margin, there are times when VB starts to do better than collapsed Gibbs sam-
pling. This can also be seen by looking at the coherence plot in Fig. 8. Had
one only looked at specific text corpora (such as the 20 Newsgroups corpus,
shown in Fig. 13), this effect could have been missed.
In contrast to our results using SimLDA and KLD, plots of Cv scores reveal
that differences between the two algorithms appear to be very small, with a
large amount of variability in scores at each topic number setting. In the larger
simulated data set, the highest scores for both algorithms could not clearly
identify the correct number of topics. Our KLD metric can show the performance
differences between topic models more clearly than the standard Cv score because
we use the ground truth distributions in the KLD metric, and we work with
probabilities and not merely the word rank.
The visual nature of the word-topic plots is another advantage of our topic
modelling performance evaluation methodology. By using these plots we can see
the probabilities of a word being assigned to a topic, compared with the underly-
ing probability of that word in the topic (as part of the word-topic distributions
from which the corpus was generated). These word-topic plots can, moreover, be
inspected after every few epochs, allowing one to visually compare convergence
for different inference algorithms for the same corpus, or to compare convergence
for corpora with various hyperparameters.

7 Conclusion and Future Work


In this article, we present SimLDA and show how it can be used to evaluate topic
models. We use two popular approximate inference techniques, collapsed Gibbs
552 R. M. C. Taylor and J. A. du Preez

sampling and VB, to perform topic modelling using LDA, and calculate the topic
modelling performance of these algorithms using a forward KLD measure. This
measure utilises the posterior word-topic distributions as well as the original
word-topic distributions from which the corpora were generated.
We plot the results using box plots which show the median values for both
inference algorithms over a range of corpus sizes for both simulated data sets.
Collapsed Gibbs sampling performs better than VB in both data sets overall,
but in the smaller simulated data set, when the number of documents is higher,
VB does marginally better than collapsed Gibbs sampling. This is a function of
the hyperparameters chosen for inference, as well as the corpus hyperparameters.
Being able to identify cases like this is one of the advantages of SimLDA.
We also provide word-topic plots to inspect the results of individual corpora
visually. These plots give a more detailed view of the information provided in the
box plots, and allow the user to see exactly where the topic modelling does well,
and where topics are incorrectly learned. The Cv scores are also computed over a
range of K for the two simulated data sets and compared with the KLD metric.
Coherence scores were not able to discriminate between the two algorithms as
well as the custom KLD metric does.
As future work, the use of synthetic data generated using SimLDA, together
with our KLD measure, could find application in research involving new topic
models or for comparing existing models and inference algorithms over a wider
range of corpora. Expanding the scope of these methods to include corpora
with diverse characteristics and data distributions could present opportunities
for future work and advance current understanding on which models are most
useful for specific types of datasets. SimLDA currently supports only topics that
have a Gaussian or Laplace shaped distribution. Future work could include the
addition of distributions having other properties. Additionally, SimLDA could
be extended to generate data for other similar graphical models.

References
1. Python Package Index - PyPI
2. 20 newsgroups dataset, empty
3. Albishre, K., Albathan, M., Li, Y.: Effective 20 newsgroups dataset cleaning. In:
2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelli-
gent Agent Technology (WI-IAT), vol. 3, pp. 98–101. IEEE (2015)
4. Asuncion, A., Welling, M., Smyth, P., Teh, Y.W.: On smoothing and inference for
topic models. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in
Artificial Intelligence, pp. 27–34. AUAI Press (2009)
5. Attias, H.: A variational Bayesian framework for graphical models. In: Advances in
Neural Information Processing Systems, pp. 209–215 (2000)
6. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing
Text with the Natural Language Toolkit. O’Reilly Media Inc., Sebastopol (2009)
7. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York
(2006)
8. Blei, D., Lafferty, J.: Correlated topic models. Adv. Neural. Inf. Process. Syst. 18,
147 (2006)

9. Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for
statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)
10. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn.
Res. 3(Jan), 993–1022 (2003)
11. Braun, M., McAuliffe, J.: Variational inference for large-scale models of discrete
choice. J. Am. Stat. Assoc. 105(489), 324–335 (2010)
12. Brink, D.: Using probabilistic graphical models to detect dynamic objects for
mobile robots (2016)
13. Cao, L., Fei-Fei, L.: Spatially coherent latent topic model for concurrent segmen-
tation and classification of objects and scenes. In: 2007 IEEE 11th International
Conference on Computer Vision, pp. 1–8. IEEE (2007)
14. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J., Blei, D.: Reading tea leaves:
how humans interpret topic models. In: Advances in Neural Information Processing
Systems, pp. 288–296 (2009)
15. Elberrichi, Z., Rahmoun, A., Bentaalah, M.A.: Using wordnet for text categoriza-
tion. Int. Arab J. Inf. Technol. (IAJIT) 5(1) (2008)
16. Griffiths, T.: Gibbs sampling in the generative model of latent dirichlet allocation
(2002)
17. Griffiths, T.: Gibbs sampling in the generative model of latent dirichlet allocation-
gruffydd@ psych (2004)
18. Hofmann, T.: Probabilistic latent semantic analysis. arXiv preprint
arXiv:1301.6705 (2013)
19. Knowles, D.A., Minka, T.: Non-conjugate variational message passing for multi-
nomial and binary regression. In: Advances in Neural Information Processing Sys-
tems, pp. 1701–1709 (2011)
20. Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic
analysis theory of acquisition, induction, and representation of knowledge. Psychol.
Rev. 104(2), 211 (1997)
21. Loper, E., Bird, S.: NLTK: the natural language toolkit. arXiv preprint cs/0205028
(2002)
22. Louw, E.J.: A probabilistic graphical model approach to multiple object tracking
(2018)
23. Mcauliffe, J.D., Blei, D.M.: Supervised topic models. In: Advances in Neural Infor-
mation Processing Systems, pp. 121–128 (2008)
24. Mimno, D., Wallach, H., Talley, E., Leenders, M., McCallum, A.: Optimizing
semantic coherence in topic models. In: Proceedings of the 2011 Conference on
Empirical Methods in Natural Language Processing, pp. 262–272 (2011)
25. Minka, T., Lafferty, J.: Expectation-propagation for the generative aspect model.
In: Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelli-
gence, pp. 352–359. Morgan Kaufmann Publishers Inc. (2002)
26. Murphy, K.P.: Dynamic Bayesian networks: representation, inference and learning,
dissertation. Ph.D. thesis, UC Berkeley, Department of Computer Sciences (2002)
27. Rehurek, R., Sojka, P.: Software framework for topic modelling with large cor-
pora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP
Frameworks. Citeseer (2010)
28. Röder, M., Both, A., Hinneburg, A.: Exploring the space of topic coherence mea-
sures. In: Proceedings of the Eighth ACM International Conference on Web Search
and Data Mining, pp. 399–408. ACM (2015)
29. Stevens, K., Kegelmeyer, P., Andrzejewski, D., Buttler, D.: Exploring topic coher-
ence over many models and many topics. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 952–961 (2012)
30. Streicher, S., du Preez, J.: Graph coloring: comparing cluster graphs to factor
graphs. In: Proceedings of the ACM Multimedia 2017 Workshop on South African
Academic Participation, pp. 35–42 (2017)
31. Streicher, S., du Preez, J.: Strengthening probabilistic graphical models: the purge-
and-merge algorithm. IEEE Access 9, 149423–149432 (2021)
32. Vayansky, I., Kumar, S.A.P.: A review of topic modeling methods. Inf. Syst. 94,
101582 (2020)
33. Wainwright, M.J., Jordan, M.I., et al.: Graphical models, exponential families, and
variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)
34. Wallach, H.M.: Topic modeling: beyond bag-of-words. In: Proceedings of the 23rd
International Conference on Machine Learning, pp. 977–984 (2006)
35. Wallach, H.M., Mimno, D.M., McCallum, A.: Rethinking LDA: why priors matter.
In: Advances in Neural Information Processing Systems, pp. 1973–1981 (2009)
36. Wallach, H.M., Murray, I., Salakhutdinov, R., Mimno, D.: Evaluation methods
for topic models. In: Proceedings of the 26th Annual International Conference on
Machine Learning, pp. 1105–1112 (2009)
37. Zeng, J., Cheung, W.K., Liu, J.: Learning topic models by belief propagation. IEEE
Trans. Pattern Anal. Mach. Intell. 35(5), 1121–1134 (2012)
Virtual Assistant for Querying Databases
in Natural Language

Daiga Deksne1,2(B) and Raivis Skadiņš1,2


1 Tilde, Riga, Latvia
{daiga.deksne,raivis.skadins}@tilde.lv
2 Faculty of Computing, University of Latvia, Riga, Latvia

Abstract. This paper reports on creating virtual assistants (VA) that enable users
to query a database in the natural language. Building SQL queries from the natural
language is a complicated task. We build the query via a conversation between
the user and the virtual assistant allowing the users to describe their needs during
a more detailed conversation. The VA uses information about the schema of the
data source to guide the user. The query is built incrementally. To test the proposed
method, we implemented a dialogue system for querying a part of the Open Food
Facts database. The evaluation results show that users successfully completed the
task in most cases. The easiest task was completed by 72% of users, the most
sophisticated task was completed by 58% of users. To finish the tasks, users had
to provide parameters that the VA prompted for, to sort the records, and to add
filtering conditions using natural language. The proposed approach allows the
building of similar VAs for different databases.

Keywords: Virtual assistants · Semantic parsing · Machine learning

1 Introduction
Nowadays, a wide range of open data stored in various databases is available. Frequently,
information from databases cannot be accessed by people who need it, as databases
can be queried either through limited, pre-built user interfaces or by writing queries in
SQL, SPARQL, or other query languages. Thus, for a user without in-depth technical
knowledge full access to the data is impossible. Typical users who could benefit from
the knowledge base information include non-IT researchers, journalists, entrepreneurs,
etc., and typical knowledge bases include open data archives, sales data, etc. Currently,
there is no easy solution. The most typical solution involves asking the IT specialists for
help, trying to load data in Excel, or using graphical query building tools.
In this research, we are looking for solutions that would help people without in-
depth technical knowledge in accessing the information stored in various databases. We
propose to build database queries in a conversation between the user and the virtual
agent. We allow the users to describe their needs in dialog during which the users can
provide more details, if needed, and the VA can help the users by asking questions and
guiding them.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


K. Arai (Ed.): FTC 2022, LNNS 561, pp. 555–564, 2023.
https://doi.org/10.1007/978-3-031-18344-7_39

To answer a user's question by relying on a database, a number of sequential steps
must be completed: 1) the user's request must be analyzed by employing natural language
understanding methods; 2) a query in a query language must be created by relying on
the information provided by the user; 3) the query must be executed; 4) a table or a list of
results should be presented to the user if there are several records, or an answer must be
created by natural language generation methods if there is one single result.
The rest of the paper is organized as follows. Section 2 contains previous work about
building SQL queries from the natural language. Then we continue with the description
of the research method used in this research in Sect. 3. In Sect. 4 we propose the concept
of our solution that involves constructing SQL queries as a conversation between the
user and the virtual agent, in Sect. 5 we describe the implementation of the VA prototype.
In Sect. 6 we evaluate the prototype by asking respondents to solve tasks by interacting
with the VA. Finally, in Sect. 7 we draw conclusions and outline possible directions for
future research.

2 Previous Work

There have been many attempts to implement solutions that allow writing the queries in
the natural language that are later automatically converted into the SQL or other technical
query language. Typically, queries are generated using the natural language processing
workflow components. The study [10] offered to use a workflow with six components for
translating natural language questions into structured SPARQL queries. However, each
of the workflow components can create noise, thereby reducing the ability to generate
a correct query. The author [7] proposed to use a simpler workflow that tried to rely
only on keyword detection. Also, [13] developed a SPARQL query generator capable of
coping with noisy inputs. After generating query hypotheses, the system ranked them
based on their structural similarity to the input question.
In a study by [1], 24 systems with a natural language interface for databases have
been evaluated. There are systems that 1) use keywords, 2) use samples (pattern-based),
3) parse text, and 4) use grammar. Each of the systems has been evaluated as to its ability
to interpret 10 questions with a varying degree of complexity. For example, systems had
to find answers to the question All movies starring Brad Pitt from 2000 until 2010 or
All movies with the same genres as ‘Sin City’. Testing results demonstrate that keyword
systems are sufficient to understand simple questions. However, to deal with questions
whose interpretation requires generation of sub-queries, parsing systems that clarify the
structure of the question are more suitable. Overall, grammar-based systems are the most
powerful, but are highly dependent on manually designed rules.
As methods based on neural network algorithms become more popular, researchers
have begun to study end-to-end techniques for generating queries from questions posed
in the natural language. The architecture of solutions based on neural network algorithms
is very diverse. Inputs of the Seq2SQL model [14] contain a question and names of the
table columns, but the output has three components that match the parts of an SQL
query: aggregate function(s), column(s), and filter condition(s). An enhanced learning
algorithm has been employed: it uses the result of the generated query as a reward. The
SQLNet [9] uses the sequence-to-set architecture based on the query template (sketch)

to be filled in with column names and values. The column attention mechanism is used
to determine the columns. With this approach, no query structure needs to be generated.
The sequence-to-SQL approach is also used in [8]. The input of the model contains a
question and a table consisting of column names and cells. At each time step, a channel
is selected for predicting an SQL keyword, a column name, or a cell. The study [4] uses
a two-step neural model, first, by generating the SQL Query Template (sketch) from
the question, and second, by generating a full SQL query by relying both on the text
query and the acquired sketch. There is also a different way to answer questions by
relying on knowledge bases: instead of generating knowledge base queries, [6] train the
memory-to-sequence model for a task-oriented dialog.
The recent advances in automatic speech recognition have promoted development
of a voice-based interface for database querying. The EchoQuery developed by [5] uses
voice command device Echo from Amazon and the voice command service Alexa to
provide a stateful dialogue-based query interface between the user and the database.
There are several labeled datasets that are used for training and testing the systems
that address the challenge of employing the natural language for retrieving information
from relational datasets such as ATIS [3], WikiSQL[14], Spider [11], and CoSQL [12]
which is the dialogue version of Spider. The ATIS Corpus contains information on air
traffic in the United States. ATIS0 Pilot has 2884 questions in the natural language
about information from 28 tables (125 fields). The WikiSQL corpus includes 80,654
queries about the information in 24,241 Wikipedia tables. The Spider dataset contains
200 databases in 138 domains, and 5,693 SQL queries corresponding to 10,181 questions
in the natural language.
Questions for querying information from various domains can be very different.
Besides, the question labeling process can be expensive, timely, and requires expert
knowledge. Many previous solutions assume that the entire query will be described
by a single expression in the natural language (a sentence or a few), but it may be
very challenging for a human to describe complex queries in this manner. Unlike the
approaches described above, we offer to take a different approach: we propose building
the query as a conversation between the users and the virtual agent. We want to allow
the users to describe their needs in a more detailed conversation during which the users
can provide more details, if needed, and the VA can help the users by asking questions
and guiding them.

3 Methodology

In this research we are conducting a feasibility study, and the research question of our paper
is whether it is possible to create a virtual assistant that helps users without deep technical
knowledge to build SQL queries and access the necessary information from databases.
We are also looking for a solution that would be easy to adapt to different databases and
would not require collecting and annotating large datasets.
To answer this question, we are building a prototype that demonstrates how we can
build an example virtual assistant that allows users to access one particular database.
The creation of such a prototype would open opportunities for other researchers to build
virtual assistants for other databases using techniques similar to those we propose.

To validate the prototype, we conduct a user study to understand whether the
prototype allows users to make queries to the database. In this study, we analyze
user behavior, determine which tasks are easier and which are more difficult, and
count in what percentage of cases users succeeded with the task.

4 Proposed Solution
The VA uses knowledge about the data source to query. Although there have been
attempts to convert any natural language query into a SQL query, in this research we use
database schema information that describes the structure of the database – tables, fields,
field types, links, indexes, etc.
The VA builds the query incrementally. Initially, we start with a query template and
add query elements by analyzing the user's input with natural language understanding
(NLU) techniques: intent detection and named entity recognition. We use an intent
detection component that is based on fastText word embeddings and a convolutional
neural network [2]. The input of the classifier is the embedding vector for the user's
utterance, and the output is a probability distribution over all possible intents.
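A minimal Keras sketch of such a classifier is shown below; the layer sizes, sequence length, and the use of Conv1D over pretrained fastText word vectors are illustrative assumptions and do not reproduce the exact architecture of [2].

import tensorflow as tf

NUM_INTENTS = 14              # e.g. the 14 intents described in Sect. 5
MAX_LEN, EMB_DIM = 20, 300    # illustrative; fastText vectors are 300-dimensional

# Input: the sequence of pretrained word vectors for one utterance
# (looked up from a fastText model, which is not shown here).
inputs = tf.keras.Input(shape=(MAX_LEN, EMB_DIM))
x = tf.keras.layers.Conv1D(128, kernel_size=3, activation="relu")(inputs)
x = tf.keras.layers.GlobalMaxPooling1D()(x)
outputs = tf.keras.layers.Dense(NUM_INTENTS, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(utterance_vectors, intent_labels, epochs=..., validation_split=...)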
The query is built up with each input from the user. Typical intents allow the user to specify
a query template, fields for selection, the sorting order, filtering conditions, the number
of records to retrieve, and typical entities used with these intents include table and field
names, and filtering values. The query is built up by relying on the detected intent as a
command for the query builder, and entity values are the attributes of these commands.
During the conversation, the parts of the query are stored in conversation context vari-
ables, and the SQL query is generated only when we need to execute it. This approach
allows us to focus the query building process on describing the data we want to get and
not the syntax of the query language.
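The following sketch illustrates this idea; the intent names, entity keys, and context layout are hypothetical and are only meant to show how intents can act as commands that update the stored query parts, with SQL generated only at execution time.

# Hypothetical sketch: intent names, entity keys and the context layout are
# invented for illustration and are not the authors' implementation.
class QueryContext:
    def __init__(self, table):
        self.table = table
        self.fields = ["*"]
        self.filters = []          # e.g. ("salt_100g", "<=", 0.5)
        self.order_by = None       # e.g. ("sugars_100g", "DESC")
        self.limit = None

    def apply(self, intent, entities):
        # Each detected intent acts as a command; entity values are its attributes.
        if intent == "select_fields":
            self.fields = entities["fields"]
        elif intent == "add_filter":
            self.filters.append((entities["field"], entities["op"], entities["value"]))
        elif intent == "sort_order":
            self.order_by = (entities["field"], entities.get("direction", "ASC"))
        elif intent == "set_limit":
            self.limit = entities["count"]

    def to_sql(self):
        # The SQL string is generated only when the query has to be executed.
        sql = f"SELECT {', '.join(self.fields)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"{f} {op} ?" for f, op, _ in self.filters)
        if self.order_by:
            sql += f" ORDER BY {self.order_by[0]} {self.order_by[1]}"
        if self.limit is not None:
            sql += f" LIMIT {self.limit}"
        return sql, [v for _, _, v in self.filters]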
The dialogue between the user and the VA is guided by a specific scenario managed
by the Dialog Manager (see Fig. 1). A combined dialogue style is used: it combines the
features of the guided and the free dialogue. In the course of the guided dialogue, the
VA determines the course of the conversation by asking questions or asking the user
to choose one of the provided options. In the course of the free dialogue, the initiative
is given to the user. The users express their wishes or ask for something, and the VA
responds accordingly.
The solution consists of two parts. One part does not depend on a particular data
source and the other part contains specific data for a particular database. The dialogue
scenario, the functions that encapsulate the API calls, and the VA responses are three
database-independent parts of the solution, they are the same for any database. Entities,
query templates, the way how users express their intents to use these templates, and
information about the database schema depend on a specific database.
Building the query in the form of a conversation also helps to deal with the problem
of multilingualism, as we can use the same intents and named entities for all languages
and can independently train the NLU models for each language. Thus, multilingualism
is handled at the NLU level, and query building and SQL generation is language inde-
pendent. Initially, we focus on the SQL data sources because they are the most popular,
but the same method can be later adapted to access SPARQL, CKAN, GraphQL, and
other information sources.

Fig. 1. The architecture of the virtual assistant

We are also investigating the option of building multilingual intent detection and
named entity recognition models, as this approach would help us in building models for
less-resourced languages by leveraging data from well-resourced languages (in particular
English).

5 Prototype
We have implemented a prototype VA to evaluate the suitability of the proposed solution.
For our experiments, we selected a popular open data source - the Open Food Facts1
database. This database represents a typical use case that we address in this research, the
database contains information that is valuable for many non-IT specialists or the general
public, but this database cannot be queried by non-IT specialists without knowledge of
SQL.
In the process of developing a VA for querying databases, one first needs to define
templates that will be offered to the user. Templates reflect the most common tasks that
users usually wish to accomplish. Internally, we represent each template as consisting of
four parts that correspond to the SQL query as follows:

• The SELECT part with the names of fields or variables.
• The FROM part with the names of tables.
• The WHERE part with filter criteria.
• The ORDER BY part containing record ordering information.

For each template, we define a list of parameters whose values need to be acquired
from the user during a conversation with the VA.
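As an illustration, one possible representation of such a template and its parameters is sketched below; the field and table names are invented for the example and are not taken from the actual implementation.

# Hypothetical template representation; field and table names are illustrative.
vitamins_template = {
    "select":   ["product_name", "{nutrient_field}"],
    "from":     ["products"],
    "where":    ["{nutrient_field} IS NOT NULL"],
    "order_by": ["{nutrient_field} DESC"],
    # Parameters the VA must collect from the user before the query can be run.
    "parameters": ["nutrient_field"],
}

def instantiate(template, values):
    """Fill the template's placeholders with parameter values collected in dialog."""
    return {key: [part.format(**values) for part in parts]
            for key, parts in template.items()}

print(instantiate(vitamins_template, {"nutrient_field": "vitamin_c_100g"}))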
Five types of entities are identified in user utterances: table names, field names,
number of records, sorting order, field alias names.
The only intents that depend on the data source are the intents that express the user’s
desire to proceed with a certain type of query template. Other intents do not depend on
the data source. The user can ask a question about the database, table, or field. Even if
1 https://world.openfoodfacts.org/.

the question contains a table/field name associated with the data source (recognized as
an entity), the form of the question does not depend on the data source. There are intents
that allow the user to change the list of fields, search in a different table, change the
sorting order of records, choose the number of records to display, and change filtering
conditions. The intent classifier is trained using 5-fold cross-validation. It recognizes 14
intents with an accuracy of 79.14%. The training data contains 163 utterance examples,
11 examples per intent on average. Some examples for the intent sort order: change
sorting order to descending, can I order by x field ascending, I would prefer sorting in
descending order, show me the records having the largest value of a, show records with
the least amount of x.
The dialog scenario consists of dialog states and transitions. One can move forward
in a conversation between various states of the scenario if a specific intent is recognized
in the user’s utterance or if any other condition defined for the transition is fulfilled.
The conversation starts with a guided dialog:

• The user is prompted to select one of the predefined query templates.


• The user is asked to specify the values of all parameters required for the template.
• The user is asked to specify the number of records to display.
• The user is offered to specify the sorting order of records by a specific field.

Next, the control is passed to the user. The user can adjust the parts of the query in
natural language. The VA processes the user’s input to detect the intent and entities, and
updates the query accordingly.
In this prototype we have implemented the following questions or commands that
the user can give at any time during the conversation:

• The user can ask what tables are included in the database.
• The user can ask what information is included in table X.
• The user can ask what fields are included in table X.
• The user can filter the query by specific field values and remove the filter from a field.
• The user can specify the number of records in the query.
• The user can specify which fields have to be included or removed from the query.
• The user can specify which table to search.
• The user can specify by which field name and in which order the records should be
sorted.
• The user can start composing a new query.

As the Open Food Facts database contains products from all over the world and
its data completeness varies, we selected a subset of the database containing 383,725
records for products available in the United States and the United Kingdom. This subset
was imported into a SQL database. In the original database, there were approximately
200 fields most of which were empty. We used only 20 fields containing information
about product category, energy, salt, sugars, fat, carbohydrates, proteins, fiber, vitamins
A, B12, C, D, and minerals iron and magnesium.
We implemented three templates from which the user can choose:

• Nutritional facts about a single product,
• Products filtered by a specific category,
• Products containing specific vitamins or minerals.

Table 1. Fragment of a dialog between the user and the VA based on the Food Facts Database

A sample conversation between the user and the VA is provided in Table 1. When
the user selects nutritional facts of a specific product, the VA asks to provide the name of
the product. When the user selects a template concerning a product that contains specific
vitamins or minerals, the VA asks to provide the name of the mineral or the vitamin field.
When the user selects a template about the products filtered by a specific category, the
VA asks to provide the name of the category.
For now, we use only textual input and output. As the result retrieved from the SQL
Server might contain several rows and fields, the voice output would not be efficiently
consumable by the user.

6 Evaluation
To evaluate the implemented VA prototype, 15 respondents were asked to solve three
tasks in a conversation with the VA (see Table 2). During every task, the users were
required to provide values of template parameters and to modify the query by asking the
VA to sort the records or to change specific filtering conditions. Not every respondent
tried all three tasks.
Qualitative evaluation of the errors allowed us to determine the main causes that led
to the failure to complete the tasks.

• Some users did not follow the instructions and did not use the ‘help’ option. The first
mandatory step of each dialog scenario was to choose one of the predefined templates.

Table 2. Tasks for evaluating VA.

Task                                                  Number of respondents   Successful completion of the task (%)
Find a product containing less than 8.0 g of iron     12                      58%
What is the maximum amount of salt in orange juice    11                      72%
Find the top 5 sugar richest beverages                11                      64%

Some users started by giving tasks to the VA prior to choosing a template. As no basic
template was selected, composing the query failed. To avoid such initial failure, some
restrictions were introduced in the dialog scenario to prevent the user from proceeding
without selecting a template.
• The second cause was the users’ wish to refer to fields with simplified names though
the VA presented a list of valid field names, e.g., the field ‘vitamin_c_100g’ was called
just ‘vitamin C’. This problem was solved by training the named entity recognizer to
recognize the field alias names and to map them depending on the field names in the
database schema.
• The third cause was the users’ inability to adjust the initial query. All possible actions
were described in the help section, and individual tips were given to users to advise
on how to proceed.
• On some occasions, the users provided just keywords, e.g., top5 beverages, sug-
ars_100g. The intent detection module failed to find a correct intent for inputs like
that. In fact, such inputs contained three intents: the number of records to show, results
filtered by product category that matches the beverages, and also a request to include
field ‘sugars_100g’ in the selection. This problem has not yet been solved.

7 Conclusion and Future Work

In this paper, we have presented the VA that helps users in composing a SQL statement
for querying the database in the natural language. The solution allows adjusting the
dialogue system for querying any database using various query languages. As most
parts of our system do not depend on a particular data source only intents for choosing
query templates, a set of named entities and query templates corresponding to a different
data schema must be adjusted.
To test our approach, we have implemented the VA prototype to query the Open
Food Facts database, but the same method can be applied also to other databases. 15
respondents were asked to perform three tasks. Analysis of conversations demonstrates
that this approach can succeed, if the dialog is very intuitive and, as users do not like
to read long instructions, short tips displayed during the dialog help to achieve the
desired result. The lowest success rate (58%) was achieved during the task that required

specifying filtering conditions. Other tasks required specifying the sorting order of the
records.
The created VA prototype clearly demonstrates that it is possible to build a VA that
allows its users without deep technical knowledge to build SQL queries and access the
necessary information from databases. Although the prototype VA needs some informa-
tion about the database schema and query templates, it can be adapted to other databases,
and to implement such VA, we do not need to collect and annotate large datasets. Standard
VA development techniques have been used in developing the VA prototype - rule-based
dialogue scenarios, intent detection, and entity recognition. This opens opportunities for
other researchers to build VAs for other databases using an approach similar to what we
propose.
There are still some limitations and issues that have not yet been addressed:

• When the values of the template parameters are collected or filtering conditions are
added, no checking is done if the value entered by the user is valid for the field type. It
should be checked in the future. A message explaining the error should be displayed
if an invalid value is provided.
• When the SELECT or WHERE part of the query is adjusted, the user can specify only
table fields and not the calculated fields that use mathematical functions. Calculated
fields should be defined in the initial query template.
• Two types of predicates can be entered for filtering conditions: the standard comparison
operators (=, !=, <, >, <=, >=) and the operator BETWEEN. The operator can be
expressed in words; for example, should not exceed would be translated as <= (a simple
mapping of this kind is sketched below). All filtering conditions set out in separate
utterances of a query are joined with the logical operator AND. If the OR operator is
required, the user must specify it in a single utterance, e.g., the value of energy_100g is
larger than 20.5 or less than 15.4 will be translated as energy_100g > 20.5 OR
energy_100g < 15.4.
• Records can be sorted only by a single field.
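The operator mapping referenced above could, for example, be realized with a small phrase-to-operator table such as the following sketch; the phrases and patterns shown are illustrative assumptions and not the actual rules of the prototype.

import re

# Illustrative phrase-to-operator mapping; the actual patterns used by the VA
# are not specified in the paper.
PREDICATE_PHRASES = {
    r"should not exceed|at most|no more than": "<=",
    r"at least|no less than": ">=",
    r"less than|lower than": "<",
    r"larger than|greater than|more than": ">",
    r"equal to|exactly": "=",
}

def phrase_to_operator(utterance):
    for pattern, operator in PREDICATE_PHRASES.items():
        if re.search(pattern, utterance.lower()):
            return operator
    return None

print(phrase_to_operator("the amount of salt should not exceed 0.5"))  # prints "<="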

These limits are not related to the approach proposed, but rather to the scale of the
experiment. The methods already described can be used to make the prototype more
complete. The listed issues can be addressed in the future, and we are also planning to
work on more advanced SQL features, such as grouping, counting, joining tables etc.

Acknowledgments. The research leading to these results has received funding from the research
project “Competence Centre of Information and Communication Technologies” of EU Structural
funds, contract No. 1.2.1.1/18/A/003 signed between IT Competence Centre and Central Finance
and Contracting Agency, Research No. 2.3 “Neural network machine learning techniques for
automated creating of virtual assistant dialog scenarios”.

References
1. Affolter, K., Stockinger, K., Bernstein, A.: A comparative survey of recent natural language
interfaces for databases. VLDB J. 28(5), 793–819 (2019). https://doi.org/10.1007/s00778-
019-00567-8

2. Balodis, K., Deksne, D.: FastText-based intent detection for inflected languages. Information
10(5), 161 (2019)
3. Hemphill, C.T., Godfrey, J.J., Doddington, G.R.: The ATIS spoken language systems pilot
corpus. In: Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley,
Pennsylvania (1990)
4. Hosu, I.A., Iacob, R.C.A., Brad, F., Ruseti, S., Rebedea, T.: Natural language interface for
databases using a dual-encoder model. In: Bender, E.M., Derczynski, L., Isabelle, P. (eds.)
Proceedings of the 27th International Conference on Computational Linguistics, pp. 514–524.
ACL, Santa Fe (2018)
5. Lyons, G., Tran, V., Binnig, C., Cetintemel, U., Kraska, T.: Making the case for query-by-
voice with EchoQuery. In: SIGMOD 2016: Proceedings of the 2016 International Conference
on Management of Data, pp. 2129–2132. ACM, New York (2016)
6. Madotto, A., Wu, C.S., Fung, P.: Mem2Seq: effectively incorporating knowledge bases into
end-to-end task-oriented dialog systems. arXiv preprint arXiv:1804.08217 (2018)
7. Shekarpour, S., Auer, S., Ngomo, A.C.N., et al.: Keyword-driven SPARQL query gen-
eration leveraging background knowledge. In: WI-IAT 2011: Proceedings of the 2011
IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent
Technology, vol. 1, pp. 203–210. IEEE Computer Society, New York (2011)
8. Sun, Y., Tang, D., Duan, N., et al.: Semantic parsing with syntax-and table-aware SQL gen-
eration. In: Proceedings of the 56th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), pp. 361–372. ACL, Stroudsburg, PA (2018)
9. Xu, X., Liu, C., Song, D.: SQLNet: generating structured queries from natural language
without reinforcement learning. arXiv preprint arXiv:1711.04436 (2017)
10. Yahya, M., Berberich, K., Elbassuoni, S., Ramanath, M., Tresp, V., Weikum, G.: Deep answers
for naturally asked questions on the web of data. In: Proceedings of the 21st International
Conference on World Wide Web, pp. 445–449. Association for Computing Machinery, New
York (2012)
11. Yu, T., Zhang, R., Yang, K., et al.: Spider: a large-scale human-labeled dataset for complex
and cross-domain semantic parsing and text-to-SQL task. In: Proceedings of EMNLP 2018,
pp. 3911–3921. ACL, Stroudsburg, PA (2018)
12. Yu, T., Zhang, R., Er, H., et al.: CoSQL: a conversational text-to-SQL challenge towards cross-
domain natural language interfaces to databases. In: Proceedings of EMNLP-IJCNLP 2019,
pp. 1961–1979. ACL, Stroudsburg, PA (2019)
13. Zafar, H., Napolitano, G., Lehmann, J.: Formal query generation for question answering over
knowledge bases. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 714–728.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_46
14. Zhong, V., Xiong, C., Socher, R.: Seq2SQL: generating structured queries from natural
language using reinforcement learning. arXiv preprint arXiv:1709.00103 (2017)
Neural Machine Translation for Native
Language Aymara to English

Honorio Apaza1(B) , Brisayda Aruhuanca2 , Mariela M. Nina1 , Anibal Flores1 ,


Carlos Silva1 , and Euler Tito1
1
Universidad Nacional de Moquegua, Moquegua, Peru
{hapazaa,2020204032,afloresg,csilvad,etitoc}@unam.edu.pe
2
Universidad Cayetano Heredia, Lima, Peru
[email protected]

Abstract. In Latin America, there is a culture called Aymara, which
has its own language, also named Aymara. It is a native language in
danger of extinction, as declared by UNESCO, and a heritage of the
Peruvian nation. Neural machine translation has, since its appearance,
been able to translate many languages of the world; however, it has not
been well researched for native languages. In this work we experiment
for the first time with automatic translation from the native Aymara
language to English using the seq2seq model. First interesting results
were obtained that could open up new research projects.

Keywords: Neural machine translator · Natural language processing · Native language · Aymara · English

1 Introduction
The Aymara language (ISO codes ayc, ays) is traditionally spoken in the southern
zone of Peru, in Bolivia, and in northern Argentina and Chile. In the language
itself, the correct spelling is Aymara [1]; its domain area can be seen in Fig. 1.
In the field of computing, machine translation based on rules originated
around 1950; it then evolved into example-based machine translation around
1980, statistical machine translation appeared around 1990, and finally, around
2015, the field moved to a new kind of translation machine, neural machine
translation. The latter has very interesting applications with very good results,
for instance in industry, where it is best known through the Google translator
application, to cite one example. Machine translation is an important research
topic in the field of artificial intelligence, which allows machines to learn to
automatically translate one language into another [3].
There are many translation studies for the most famous languages in the
world, such as English, Spanish, Chinese, and Portuguese. However, the problem
is that there are no studies with native languages of Latin America, because they
are poorly known and/or in danger of disappearing, and there is no corpus of
translation data to train neural machine translation (NMT) models.

Fig. 1. Aymara language domain area [2].

One of the factors to take into account is the bilingual corpus; this issue becomes
a critical resource since such corpora are the basis of any state-of-the-art machine
translation system. Furthermore, building a parallel corpus is usually a complex
and very expensive operation [4]. Another important thing to consider is the
existing models: today, existing NMT systems use sequence-to-sequence neural
networks to generate the target translation word by word, so that the word
generated at each time step is as consistent as possible with its counterpart in
the references [5]. This type of translation requires a set of example translations
between the languages we want to translate, but so far such systems are the
most used and give good results.
The objective of this work is to build a small Aymara-English data set and to
study the behavior and operation of the sequence-to-sequence model with an RNN
architecture for translation from the native Aymara language to English.
The present work focuses on three main parts: 1) due to the lack of a data
set, we began by collecting texts of conversations in Aymara; 2) the next step is
to create a data set structure with natural language processing (NLP) techniques
to standardize the scripts and finally format them as input to a recurrent neural
network (RNN); 3) finally, the translations (the set of collected data) are used to
train the seq2seq model, translation tests on examples written in Aymara are
carried out, and the model returns the translation in its English version.

2 Methodology
The procedure of this research work is according to Fig. 2.

Fig. 2. Research pipeline (collection of documents with conversations in Aymara/Spanish → manual translation of the Spanish texts into English → preprocessing → machine translation modeling).

2.1 Collection of Documents

This is the stage of collecting documents that contain conversations in the native
Aymara language and their respective translations. The next step is to extract
this information to fit the input format of the training model.

2.2 Manual Translation

In the document [6], the texts of the conversations are written in Aymara with
their respective translations in Spanish. The Spanish part was translated into
English by three of the authors, whose mother tongue is Aymara.

2.3 Preprocessing

This is the stage where natural language processing (NLP) techniques are applied,
which includes standardizing the writing, transforming characters into ASCII-compatible
equivalents, cleaning strange characters, and transforming the data into the input
format of the recurrent neural network (RNN).

2.4 Machine Translate Modeling

Computers must receive input in a specific format so that they can under-
stand natural languages as humans do [7]. In this stage, the seq2seq model is
trained with the previously preprocessed data in the input format of the neural
network.

3 Aymara
According to Ministerial Resolution No. 1218-85-ED of November 18, 1985, the
Aymara alphabet was established with 32 spellings (a, ä, ch, chh, ch', i, ï, j, k, kh, k',
l, ll, m, n, ñ, p, ph, p', q, qh, q', r, s, t, th, t', u, ü, w, x, y) [1]. However, the varieties
of Aymara from Chile, Tacna, Moquegua, and Jacaru, which have the velar nasal
sound nh, were not taken into account; that is why we have to keep in mind that
Peru has 44 languages, 3 in the mountains, 1 on the coast and 40 in the jungle, as
well as their varieties [8].
One of the characteristics of the Aymara language is the influence of the
Spanish language, because in the countries where the native Aymara language
is found, Spanish is also declared the official language. Globalization and the
migration of the Aymara people themselves to the cities have generated considerable
influence from Spanish on Aymara; therefore, today Aymara is spoken with words
borrowed from Spanish.
The linguistic study of Aymara is still in progress; many characteristics of
Aymara writing and speech still need to be defined. According to existing studies of
the typology of the Aymara language, we present in Tables 1, 2, 3 and 4 some
rules of morphophonological operation. According to [8], Aymara lacks the voiced
stop consonants /b/, /d/, /g/ and the fricative consonants /f/ and /θ/ of Spanish.
Hence, we note that, due to a substratum process (aimarization), Hispanic terms
are borrowed such as:

Table 1. Lacks voiced stop consonants.

Spanish Aymara English


Dios yusa/iyusa God
bueno winu Good/well
Gusto wustu Taste
compadre kumpäri Buddy
comadre kumäri Midwife
eucalipto yukalïtu/ükalitu Eucalyptus

According to [8], Aymara has 140 suffixes: 40 verbal derivations (DV), 31 verbal
inflections (FV), 15 nominal derivations (DN), 25 nominal inflections (FN), 17
independent suffixes (SI) and 12 fossilized suffixes (SF).

Table 2. Absence of final consonant in nouns, verbs and suffixes.

Spanish Aymara English


Pantalón Pantaluna Pants
Reloj Reluju Watch
Higos Jiwusa Figs
Doctor Tuktura Doctor

Table 3. Lacks the diphthongs /ue/ and /ie/, or different vowel sequences.

Spanish Aymara English


Fuentes Phuntisa Source
Fierro jiru Iron
Miercoles Mirkulisa Wednesday

Table 4. Absence of consonant clusters in word initial and syllable.

Spanish Aymara English


Platanos palatanusa/latanusa Bananas
Brigida pirijira/pirjicha Brigida (name)
Compadre kumpäri /kumpayri Buddy
Comadre kumäri/kumayri Midwife

4 Machine Neural Translator

The input texts can contain various characters, such as initial capital letters,
different writing characters, etc.; therefore, it is very important to standardize the
input text. The first step is Unicode normalization to split accented characters
and replace compatibility characters with their ASCII equivalents. In this step,
we use the tensorflow_text functions of the TensorFlow library.
Tokenization is then performed before further natural language processing; it is
the delimitation of sequences of words in a document, and there are usually two
ways to build a tokenizer: first, following the lexicographer’s experience and,
second, following personal experience [9]. For this procedure we use the
preprocessing.TextVectorization layer from the TensorFlow library.
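For illustration, a minimal sketch of this standardization and tokenization step is given below. It follows the pattern of the TensorFlow tutorial the model is based on [11], but the function name, the character whitelist, and the vocabulary size are our own assumptions, not the authors' exact code.

import tensorflow as tf
import tensorflow_text as tf_text

def standardize(text):
    # Split accented characters and map compatibility characters (Unicode NFKD).
    text = tf_text.normalize_utf8(text, 'NFKD')
    text = tf.strings.lower(text)
    # Keep only a small whitelist of characters (an assumption on our side).
    text = tf.strings.regex_replace(text, "[^ a-z.?!,']", '')
    # Separate punctuation from words and trim surrounding whitespace.
    text = tf.strings.regex_replace(text, '[.?!,]', r' \0 ')
    text = tf.strings.strip(text)
    # Add explicit start/end markers used by the seq2seq model.
    return tf.strings.join(['[START]', text, '[END]'], separator=' ')

# Recent TensorFlow versions expose the layer as tf.keras.layers.TextVectorization;
# older versions use tf.keras.layers.experimental.preprocessing.TextVectorization.
vectorizer = tf.keras.layers.TextVectorization(standardize=standardize, max_tokens=5000)
# vectorizer.adapt(train_texts)  # train_texts: a tf.data.Dataset of raw strings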
Figure 3 shows a very general visualization of the model. The result of the
decoder output is combined with a sum over the encoded input, for the prediction
of the next word [10].
The most basic way to understand NMT is by recognizing two main steps:
a) the encoder, which computes a representation for each source sentence, and b)
the decoder, which generates one target word at a time and hence decomposes the
conditional probability [10].

Fig. 3. Encoder and decoder Model seq2seq.

4.1 Encoder

Attention allows the decoder to focus on selected parts of the input sequence.
The attention layer takes a sequence of vectors as input for each instance and
returns an “attention” vector for each instance. The equations presented below
are taken from the TensorFlow neural machine translation with attention
tutorial [11], Effective Approaches to Attention-based Neural Machine Translation [10]
and Neural Machine Translation by Jointly Learning to Align and Translate [12].

\alpha_{ts} = \frac{\exp(\mathrm{score}(h_t, h_s))}{\sum_{s'=1}^{S} \exp(\mathrm{score}(h_t, h_{s'}))}   (1)

\mathrm{score}(h_t, h_s) = h_t^{\top} W h_s \quad \text{(multiplicative)} \qquad \mathrm{score}(h_t, h_s) = v_a^{\top} \tanh(W_1 h_t + W_2 h_s) \quad \text{(additive)}   (2)
Equation 1 computes the attention weights as a softmax across the encoder’s
output sequence. Equation 2 is the score function, which assigns a scalar logit to
each key-query pair; either the multiplicative (Luong) or the additive (Bahdanau)
form can be used.

Where:

– s is the encoder index.
– t is the decoder index.
– α_{ts} are the attention weights.
– h_s is the sequence of encoder outputs being attended to (the attention “key”
and “value” in transformer terminology).
– h_t is the decoder state attending to the sequence (the attention “query” in
transformer terminology).
– c_t is the resulting context vector.
– a_t is the final output combining the “context” and “query”.


c_t = \sum_{s} \alpha_{ts} h_s   (3)

Equation 3 computes the context vector c_t as the weighted sum of the encoder
outputs.

4.2 Decoder

Equation 4 shows the decoder’s job, which is to generate the prediction for the
next output token. The decoder receives the complete output of the encoder. It
uses an RNN to keep track of what has been predicted so far and queries the
attention over the encoder output, producing the context vector. It then combines
the RNN output and the context vector to generate the attention vector.

a_t = f(c_t, h_t) = \tanh(W_c [c_t; h_t])   (4)
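As an illustration of Eqs. 1-4, the following is a minimal sketch of a Bahdanau-style (additive) attention layer in TensorFlow; it is not the authors' exact implementation, and the class, the argument names (units, query, values) and the final comment are our own assumptions.

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention implementing Eqs. 1-3; Eq. 4 is applied by the decoder."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # applied to the decoder query h_t
        self.W2 = tf.keras.layers.Dense(units)  # applied to the encoder outputs h_s
        self.v = tf.keras.layers.Dense(1)       # produces the scalar score of Eq. 2

    def call(self, query, values):
        # query: [batch, units]; values: [batch, time, units]
        query = tf.expand_dims(query, 1)                   # broadcast over time
        # Eq. 2 (additive form): score = v^T tanh(W1 h_t + W2 h_s)
        score = self.v(tf.nn.tanh(self.W1(query) + self.W2(values)))
        # Eq. 1: attention weights as a softmax over the encoder time axis
        attention_weights = tf.nn.softmax(score, axis=1)   # [batch, time, 1]
        # Eq. 3: context vector as the weighted sum of encoder outputs
        context_vector = tf.reduce_sum(attention_weights * values, axis=1)
        return context_vector, attention_weights

# In the decoder, Eq. 4 would then combine context and RNN output, e.g.:
# a_t = tf.nn.tanh(Wc(tf.concat([context_vector, rnn_output], axis=-1)))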

5 Dataset
The first step was to collect conversations in Aymara together with their respective
translations into Spanish. The translation into English was then carried out by
the Aymara-speaking authors; the conversation texts correspond to the material
AYMARA ARUSKIPAWINAKA (Conversations in Aymara) [6].
In total, 1915 conversations have been collected in text format, which can be
found at: https://github.com/Honorio-apz/AYMARA ARUSKIPAWINAKA;
the data set format is as shown in Table 5.

Table 5. Data set format of the English - Aymara conversations translation.

N Aymara English
1 Aski urukı̈pan kullaka good day sister
2 Aski urukı̈panay kullaka good day sister
3 Kamisaki? how are you?
... ... ...
... ... ...
... ... ...
1912 Anupax allqawa Her dog is 2 colors
1913 Jurpürkam kullaka Until the day after tomorrow sister
1914 Jurpürkamay jilata see you the day after tomorrow brother

6 Results
6.1 Training
The training consists of three main parts: 1) a function to compute the loss and
another to perform the optimization, 2) an update method applied to each
input/target batch at every training step, and 3) a training loop that saves a
checkpoint at each step.
Specifically, we read the input texts and convert them into tokens and masks,
then run the encoder to obtain the encoder output and state. Next, the decoder
is executed in a loop over the target tokens: the decoder runs step by step, the
loss is computed at each step, and the average loss is accumulated.
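The following sketch illustrates, under our own assumptions, one such training step with teacher forcing; encoder, decoder, tokenizer, optimizer and loss_fn are hypothetical stand-ins for the components described above, not the authors' actual objects.

import tensorflow as tf

def train_step(inp_text, tgt_text, encoder, decoder, tokenizer, optimizer, loss_fn):
    # 1) Convert the raw input/target texts into token ids.
    inp_tokens = tokenizer(inp_text)
    tgt_tokens = tokenizer(tgt_text)
    with tf.GradientTape() as tape:
        # 2) Run the encoder once to obtain its output sequence and final state.
        enc_output, enc_state = encoder(inp_tokens)
        dec_state = enc_state
        loss = 0.0
        steps = tgt_tokens.shape[1] - 1
        # 3) Loop over the target tokens with teacher forcing: the ground-truth
        #    token at step t is fed in and the token at t + 1 is predicted.
        for t in range(steps):
            logits, dec_state = decoder(tgt_tokens[:, t], dec_state, enc_output)
            loss += loss_fn(tgt_tokens[:, t + 1], logits)
        loss /= steps  # average the per-step losses
    # 4) Update the encoder and decoder parameters.
    variables = encoder.trainable_variables + decoder.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss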
A good sanity check for a new model is that it can overfit a small amount of input
data, meaning that the loss values should quickly approach zero; see Fig. 4a. The
jumps visible on the graph occur at the epoch boundaries, see Fig. 4b.

(a) Batch Loss. (b) The Epoch Limits.

Fig. 4. Detail of the training result.



6.2 Test Translations Aymara-English

Table 6 shows some example translations from Aymara to English that we used
to test the trained model; in theory, the translations should be correct. Figure 5
shows the Python/Flask code and the respective translations produced by our
trained model. It can be noted that most of the translations are exact; in
examples 3 and 5 the model chose to write different words, but they mean the
same thing, since the context of the sentence implies the same meaning as the
original text.
The examples are simple sentences, like the sentences in the data set used for
training; however, it was noted that the model does not work well for sentences
composed of many words.

Table 6. Example of translations.

N Aymara English
1 Aski urukı̈pan kullaka Good morning/day sister
2 Kamisaki? How are you?
3 Waliki I am good
4 Juman sutimax kunasa What is your name?
5 Jumax uywanitati? Do you have animals?
6 Jikisiñkamay kullaka see you later sister

Fig. 5. Translation testing interface.



In theory, the attention values should show where the model focuses when
generating new translations: the attention values for each output step should
sum to one, and the alignment between input and output words should
approximate a diagonal line. The results for examples 4 and 5 can be seen in
Fig. 6, where the attention clearly does not align diagonally; improving this
could be future work.
The translations produced were evaluated against the original text and by a
language specialist. The model was trained with teacher forcing, entering the
correct tokens at each step regardless of the model’s predictions; the model could
be made more robust if it were sometimes fed with its own predictions.
It could also be improved by collecting feedback from users who validate the
good and erroneous translations, to be taken into account in future predictions,
as is the case with Google Translate.

(a) 4th Translation Example. (b) 5th Translation Example.

Fig. 6. Detail of the attention results for the fourth and fifth translation examples.

7 Conclusion

From this work it can be concluded that:

– The model works quite well with basic/simple sentences but has difficulty
translating complex sentences, a limitation also noted in the TensorFlow tutorial
on which the model is based.

– The data set is very small: in total, the model was trained with 1914 examples of
conversations in the native Aymara language. This must be improved if we want
to improve the model’s translations and to make a much more realistic analysis
of the translation behaviour from Aymara to English.
– The fact of studying cases of machine translation from native languages to
the world’s most well-known languages can bring improvements to the current
NMT models.

8 Future Work

– Work needs to be done on ways to improve the model for more complex
sentence translations.
– It is necessary to work on collecting a larger data set of conversations in the
Aymara language and then on producing the respective translations into English.
– Study how to improve the alignment of the attention weights toward a diagonal
form, which would help in predicting longer sentences.

References
1. Ministerio de la cultura del Peru, Base de datos de pueblos indiginas u originarios
(2022)
2. Albó, X., et al.: Raices de América: el mundo aymara, 1a ed., Alianza Editorial
(1988). ISBN 84-206-4213-4
3. Zhou, M., Secha, J., Cai, R.: Domain adaptation for Tibetan-Chinese neural
machine translation. In: 2020 3rd International Conference on Algorithms, Com-
puting and Artificial Intelligence (ACAI 2020). Association for Computing Machin-
ery, New York, NY, USA, Article 77, 1–5 (2020). https://doi.org/10.1145/3446132.
3446404
4. Tse, R., Mirri, S., Tang, S.-K., Pau, G., Salomoni, P.: Building an Italian-Chinese
parallel corpus for machine translation from the web. In: Proceedings of the 6th
EAI International Conference on Smart Objects and Technologies for Social Good
(GoodTechs 2020). Association for Computing Machinery, New York, NY, USA,
pp. 265–268 (2020). https://doi.org/10.1145/3411170.3411258
5. Duan, C., et al.: Modeling future cost for neural machine translation. IEEE/ACM
Trans. Audio, Speech and Lang. Proc. 29 (2021), 770-781 (2021). https://doi.org/
10.1109/TASLP.2020.3042006
6. Aruskipawinaka, A.: Conversaciones en aimara, Román Pairumani Ajacopa and
Alejandra Bertha Carrasco Lima, Centro de Apoyo en Investigación y Educación
Multidisciplinaria - CAIEM (2022)
7. Zanini, N., Dhawan, V.: Text Mining: An introduction to theory and some appli-
cations. Research Matters, pp. 38–44 (2015)
8. Huayhua Pari, F.: Normas para el buen uso de la ortografía aimara. Lengua Y
Sociedad, 12(1), 167–176 (2017). Retrieved from http://revista.letras.
unmsm.edu.pe/index.php/ls/article/view/428

9. Webster, J.J., Kit, C.: Tokenization as the initial phase in NLP. In: Proceedings
of the 14th Conference on Computational linguistics - Volume 4, COLING ’92.
Association for Computational Linguistics, USA, pp. 1106–1110 (1992). https://
doi.org/10.3115/992424.992434
10. Luong, M.-T., Pham, H., Manning, C.D.: Effective approaches to attention-based
neural machine translation (2015). https://doi.org/10.48550/arxiv.1508.04025
11. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous sys-
tems (2015). Software available from tensorflow.org
12. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning
to align and translate (2014). https://doi.org/10.48550/arxiv.1409.0473
Vocabulary Expansion for the Sub-word
WFST-Based Automatic Speech
Recognition System

Askars Salimbajevs1,2 and Jurgita Kapočiūtė-Dzikienė3,4(B)


1 Tilde SIA, Vienibas Street 75a, 1004 Riga, Latvia
[email protected]
2 University of Latvia, Raina blvd. 19, 1050 Riga, Latvia
3 Tilde IT, Naugarduko Street 100, 03160 Vilnius, Lithuania
[email protected]
4 Faculty of Informatics, Vytautas Magnus University, Vileikos Street 8, 44404 Kaunas, Lithuania

Abstract. This paper reports the improvement of the Lithuanian Automatic
Speech Recognition (ASR) system focusing on “vocabulary expansion”, i.e.
enabling the ASR system to recognize words never seen during
training. These unseen words are called out-of-vocabulary (OOV) words
and involve: 1) regular Lithuanian words appearing due to different top-
ics or domains not covered in training; 2) complicated cases, i.e., foreign
names, brand names, and loanwords pronounced not according to reg-
ular Lithuanian pronunciation rules. In weighted finite-state transducer
(WFST) ASR OOV problem is typically solved by applying one of the
following solutions: (1) making ASR vocabulary unlimited by performing
recognition on sub-word level, (2) adding words directly to the WFST
decoding graph or (3) by reconstruction of OOV words from ASR result.
Our baseline Lithuanian ASR system already follows the first approach,
however many OOV words are still not being recognized, because of low
probability of corresponding sub-word sequences. Therefore, our offered
approach can be seen as a combination of the first two solutions: we
boost probabilities of sequences of sub-words (corresponding to words
being “added” to vocabulary) in a sub-word weighted finite-state trans-
ducer (WFST) ASR system. In such way the vocabulary of the ASR is
being “expanded”. The proposed approach allowed to achieve significant
improvement over the baseline: the percentage of misrecognized out-of-
vocabulary words dropped by ∼7%, while F1 reached 85.6%.

Keywords: Speech recognition · Out-of-vocabulary · Vocabulary expansion · Sub-word units · The Lithuanian language

1 Introduction
Since a human language constantly evolves, ASR systems need to correctly rec-
ognize newly occurring words that are unseen during training. This problem is

especially apparent in spoken language, which is full of loanwords, foreign
surnames, brand names, etc. Moreover, for languages that are morphologically
rich and highly inflected, any fixed vocabulary of reasonable size will not cover
all surface forms (even without considering newly added words). Our research
object is the Lithuanian language, which is a good example of a language having
all these problems.
A typical example of newly occurring words in Lithuanian are words of foreign
origin, which typically retain their original pronunciation and spelling (e.g., Joe
Biden, Facebook, Coca cola, etc.) and are pronounced not according to the
conventional Lithuanian spelling rules. Besides, such words can be even further
“Lithuanized” by adding appropriate Lithuanian endings and inflection forms
(e.g., Emmanuelis Macronas pronounced as Imanuelis Makronas, Joe Bidenas
as Džo Baidenas, etc.). Also, spoken language tends to contain a lot of unique
terms and jargon which are specific to a particular domain, company, institution,
etc. These words cause many issues in ASR systems. Such unseen or newly
occurring words are called out-of-vocabulary (OOV) words.
A typical solution would be adding more and more training data contain-
ing these new words and/or memorizing them as exceptions. However, (1) this
can require retraining of ASR models, and (2) it is not possible to cover every
possible scenario or domain. Thus, this solution does not work if the ASR needs to
be dynamically adapted by end-users. For example, voice dialing ASR should
be able to recognize a new name immediately after it is added to the contact
list. Moreover, this adaptation has to be performed without requiring a lot of
memory and computing resources.
The baseline Lithuanian ASR system is a weighted finite-state transducer
(WFST) based speech recognition system. In such systems the OOV problem is
typically solved by applying one of the following solutions: (1) making ASR
vocabulary unlimited by performing recognition on sub-word level, (2) adding
words directly to the WFST decoding graph or (3) by reconstruction of OOV
words from the ASR result. Our baseline Lithuanian ASR system already
implements the first solution and is designed to recognize right-marked BPE (byte-pair
encoding) sub-word units [1] and uses special adapted WFST which allows only
legal sub-word combinations [2]. The system can potentially recognize any previ-
ously unseen word, however, if the word is not in the training data and acoustic
conditions are challenging, then the correct word will have a very low probability
and most likely won’t be recognized.
Therefore, in this paper we focus on improving the Lithuanian WFST ASR
system through “vocabulary expansion”, i.e. enabling the ASR system to
recognize words never seen during training. We offer several solutions for how to
“update” the vocabulary of the sub-word ASR system and boost the probability
of the new words without retraining the ASR models.
The remainder of this paper is structured as follows. Section 2 presents
previous studies on the OOV problem in the ASR field in general and for the
Lithuanian language specifically. Section 3 describes WFST-based speech
recognition. Section 4 presents the baseline Lithuanian speech recognition system,
the proposed solution for vocabulary expansion in the sub-word WFST
system, and the evaluation metrics. Results of the evaluation are given in

Sect. 5. Section 6 is devoted to discussion of the achieved results. Finally, Sect. 7


concludes the paper.

2 Related Work
The problem of out-of-vocabulary words (OOV) is typical for any speech recog-
nition system. Most systems are usually constructed to recognize a fixed set of
words and rarely can include all the words that will be encountered during the
exploitation of the system. Instead, the system will try to find the (acoustically)
closest in-vocabulary (IV) word, affect the surrounding context, and confuse the
end-user or downstream models like machine translation or intent detection.
Character or grapheme-based end-to-end (E2E) systems [3] seem like the per-
fect solution to our problem: they use neural models mapping audio (acoustic
features) to text (graphemes) directly. E2E systems perform global optimiza-
tion in a data-driven fashion and reduce the complexity compared to traditional
hybrid ASR systems. Since E2E systems have mechanisms for jointly learning
pronunciation and language information as a single model, it makes them espe-
cially robust in coping with open vocabulary problems. However, despite open
vocabulary advantage, grapheme-based E2E systems are significantly outper-
formed by sub-word or word-based systems [4–6].
Also, E2E systems require much more training data to outperform hybrid
ones. Comparative experiments on irregularly spelled English demonstrate E2E
superiority over hybrid ASR systems only with more than 10,000 h of training
data [7]; and with fewer data (∼100-1,000 h) hybrid systems guarantee much
better performance [8]. The word error rate (WER) with the E2E system on
Turkish and Georgian languages with much smaller datasets (73.4 and 50.2 h)
is high > 38.9% and > 46.3%, respectively, and may not be sufficient for some
tasks [9]. Whereas comparative experiments performed under the same experi-
mental conditions demonstrate the drop in WER to 32.2% for the hybrid ASR
system [10] on the same Georgian dataset.
The Lithuanian language has several publicly available corpora: LIEPA1 [11],
SEIMAS [12], and LIEPA2 (see footnote 1), which make up ∼1,300 h in total.
However, at the time when the baseline Lithuanian ASR was trained, only about
∼300 h were available (no LIEPA2). Except for several consonant assimilation rules, the Lithuanian
language has relatively regular spelling, which theoretically means that E2E
systems should not require as much training data as English to learn how to
recognize regular Lithuanian words. However, the problem we are tackling in
this research is not only regular Lithuanian words but also surnames, brand
names, and other complicated cases that appear in different domains/topics
and are pronounced not according to the Lithuanian rules. Unfortunately, the
publicly available Lithuanian corpora lack these critical examples: they should
be collected specifically for different customization tasks.
Considering resources available for the Lithuanian language together with
findings by other researchers, we come up with the decision of using a hybrid
1 https://xn--ratija-ckb.lt/liepa-2/infrastrukturines-paslaugos/garsynas/.

ASR system for our OOV problem-solving, especially having in mind that we
already have the background in this direction.
There are multiple approaches to deal with OOVs in hybrid ASR depend-
ing on the application. In some applications it is enough just to detect these
occasions [13–17] while other applications require a mechanism to recover OOV
words. The most primitive way is by adding these words directly into the lan-
guage model and pronunciation model. Unigram probabilities can be set to some
default value or trained on a small number of examples, while the pronuncia-
tion model can be updated either manually or automatically [18,19]. There has
been a lot of research on how to achieve this in WFST-based speech recognition
without rebuilding the decoding graph and retraining ASR models [20–24].
Another popular approach is to use language models containing <unknown>
token that can represent any OOV word and another generic (phonemic) lan-
guage model trained on a lexicon of words with low counts. During the recovery
process, the OOV word is aligned with the <unknown> token from the language
model and recognized as the sequence of phones from the phonemic language
model [25–27]. Usually, both learned word and phonemic language models are
static. However, some authors (e.g., [28]) overcome this limitation by offering
solutions for how to dynamically recover recognized phoneme sequences as OOV
words: with the second pass decoding, the vocabulary is dynamically expanded
by calibrating OOV candidates’ language model scores (considering their pro-
nunciation, spelling, empirical frequency, and overall OOV rate which cannot be
done during the first pass).
However, taking into account the Lithuanian language specifics (high inflec-
tion) word-based approaches would require having a very large vocabulary (hun-
dreds of thousands of units) as each surface form will be represented as a separate
entry in the vocabulary. This creates challenges for accurate language modeling,
as it greatly increases the sparsity of n-grams and requires special solutions for
the state-of-the-art neural network language models (most of such models can
not be efficiently trained with such a large output layer).
One workaround for the large vocabulary problem is the sub-word-based
model. The approach is based on the assumption that theoretically each word
can be composed as a sequence of sub-word units, the number of which is much
smaller and fixed. The comparative experiments [29] between word-based and
sub-word-based approaches on the English and German (which is inflective and
full of compound words) languages show no improvement for the English lan-
guage but significant improvement for German. Similar improvements have also
been demonstrated for agglutinative languages [2].
The sub-word approach potentially can also solve the OOV problem. There
are several groups in this family of methods. Some approaches are straightfor-
ward: the language model is trained on the variable-length sub-word units which
make such ASR system open vocabulary [2,30,31]. Another group represents the
hybrid language models which combine both word and sub-word units. During
the decoding, sub-words would have higher posterior probabilities at the regions
of OOV words [32–34]. Despite theoretically sub-word units (especially shorter)

having the potential to “recover” any word in the Lithuanian language, they
still fail to recover quite a lot of words, especially correct written forms of non-
Lithuanian origin words. The ability of sub-word systems to recognize unseen
words can be improved using regularization during training [35,36]. However, we
are not aware of research on boosting the probabilities of such words in sub-word
WFST ASR systems without retraining.
The related work reviewed above covers research done for languages other than
Lithuanian. In contrast, [37] presents a comprehensive 15-
year overview of various attempts to create Lithuanian word and sub-word ASR
systems. The authors also experimentally investigated several approaches with a
rather small dictated speech corpus of 50 h containing readings from books. Their
investigation claims the superiority of phone-based mappings over grapheme-
based by proving that the best results are achieved with a phoneme-based lex-
icon that explicitly models syllable stress and represents diphthongs as single
phonetic units. Despite the comprehensive overview and interesting comparative
experiments, the authors do not pay specific attention to the OOV or vocabulary
expansion problems.
The other currently available Lithuanian ASR systems [38] also do not have
mechanisms able to dynamically treat the OOV problem. Consequently, this
research will be mainly focused on this type of problem for the Lithuanian lan-
guage. The contribution of our research is two-fold:
– We focus on “vocabulary expansion” and recognition of new words in the open-
vocabulary sub-word ASR system. Since the feature is intended to be used
by end-users, two important conditions are considered: 1) the customized
vocabulary is “expanded” without acoustic and language model retraining:
only by boosting probabilities of new words; 2) the system’s adaptation is
performed in the production environment without requiring a lot of memory
and computing resources.
– For the first time, the OOV problem is being solved in the Lithuanian ASR
system. Besides, we tackle not only regular Lithuanian words, but also more
complicated cases (i.e., foreign surnames, brand names, etc.).

3 WFST-Based Speech Recognition


The speech recognition problem is typically formulated mathematically as follows:

\hat{W} = \arg\max_{W} P(W \mid X)

where X is the acoustic signal and \hat{W} is the recognized word sequence. In other
words, we are trying to find a word sequence that has the maximal conditional
probability given the acoustic signal X. For this, we need to train a statistical
model P(W|X). Unfortunately, it is practically impossible to estimate this
probability directly; however, the speech recognition problem can be solved by
evaluating this probability indirectly using Bayes’ rule, and thus the problem is
rewritten as follows:

\hat{W} = \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
where P (X|W ) is the conditional probability of acoustic signal X given word
sequence W , or acoustic model, P (W ) is unconditional probability of word
sequence W , or language model, and P (X) is unconditional probability of acous-
tic signal X.
Because the optimal value of W is independent of P(X), this probability can
be ignored in the optimization process, and the decoder works only with non-
normalized probabilities. However, some estimate of this probability might be
necessary if one wants to calculate the normalized probability of \hat{W} given the
acoustic signal X, for example when calculating the confidence of the decoder for
a given recognized word sequence.
A finite-state transducer (FST) is a finite-state machine with two memory
tapes, following the terminology for Turing machines: an input tape and an out-
put tape. An FST is a type of finite-state automaton (FSA) that maps between
two sets of symbols. An FST will read a set of strings on the input tape and
generate a set of relations on the output tape. An FST can be thought of as a
translator or relater between strings in a set. Finite State Transducers can be
weighted, where each transition is labeled with a weight in addition to the input
and output labels.
In weighted finite-state transducer (WFST) based speech recognition [40],
the search network for a decoder is composed out of four separate finite-state
transducers that each provide one part of the mapping from sounds to words. The
hidden Markov model FST (H) maps emission distributions from the acoustic
model (which can be neural network or Gaussian model) to context-dependent
phones. After that, the context FST (C) maps these context-dependent phones to
context-independent phones. The third part is the lexicon FST (L) which maps
phone sequences to words and inserts appropriate silences on word boundaries.
The last FST is more like an acceptor; the grammar or language model FST
(G) gives appropriate probabilities to the word sequences. All these transducers
are composed together into a single optimized transducer HCLG (usually called
decoding graph), which is used for speech recognition.

4 Proposed Solution

4.1 Baseline

Our baseline Lithuanian ASR system is a WFST-based hybrid ASR system,


which uses a sub-word approach to tackle the vocabulary size problem. It is
designed to recognize right-marked BPE sub-word units [1] and uses special
adapted WFST which allows only legal sub-word combinations [2].
The acoustic model is a hybrid Hidden Markov and a Time-delay Deep Neural
Network (TDNN) model, which is trained on LIEPA1 and SEIMAS speech cor-
pora (about 300 h in total, ≈1,500 h after speed and reverb augmentation [41]).

The language model is implemented on the sub-word level and consists of 4-grams,
6-grams, and a recurrent neural network language model (RNNLM). Decoding is
performed with the 4-gram model, and the other models are used for rescoring.
All models are trained on the same Lithuanian web news text corpus containing
about 55 million sentences and 769 million words. Before training, the corpus is
pre-processed: normalized, filtered, and split into sub-words.

4.2 Vocabulary Expansion for the Sub-Word WFST ASR


Let W be the set of OOV words that we would like our ASR to be able to
recognize. Some of these words can be of non-Lithuanian origin, and therefore
their pronunciation differs from their spelling. Such words (e.g. airlines) are
rewritten with regular Lithuanian spelling (e.g. eirlains) before being added to W.
Let R be the set of mappings between the “Lithuanized” and original spellings. The
rewriting can be done automatically by a grapheme-to-phoneme algorithm or
specified manually.
Then, each word in W can be represented as a sequence of sub-word units,
that can be translated to a path in the sub-word G transducer. Let P be the set
of such paths. Then, the idea of our proposed solution is to use a “path boosting”
transducer B, which is composed with the decoding graph of the WFST ASR
and lowers the weights of specific paths P in the sub-word G transducer. As the
result, the probabilities of words W are boosted.
The boosting FST B can be defined with the following grammar:
<text>  = "*bypass*" | <words> ;
<words> = "bol+" "tik" | "e+" "ir+" "la+" "ins" | ... ;

where "*bypass*" is a special symbol that matches any input sub-word.


This grammar is then compiled into a classic FST with “*bypass*” arcs
expanded to cover all possible sub-words (see Fig. 1). Arcs of boosted paths
are assigned negative weight -a and “*bypass*” arcs have zero weight (so they
will not interfere with usual decoding). While it is also possible to specify indi-
vidual boosting weights, for simplicity in this work we will use a single weight
for all words. Finally, the FST is converted into a closure to allow zero or more
repetitions of the words.
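To make the construction concrete, the following is an illustrative, library-free sketch of how the boosting FST B could be laid out; it only shows the topology and weights (tropical-semiring costs, so a negative cost of -a boosts a path). In the actual system the result would be built with an FST toolkit and composed with G or HCLG; all function and variable names are our own assumptions.

def build_boosting_fst(boosted_paths, a=1.0):
    """boosted_paths: sub-word sequences, e.g. [["bol+", "tik"], ["e+", "ir+", "la+", "ins"]]."""
    arcs = []           # (source_state, target_state, sub_word_label, weight)
    start = final = 0   # one state is both start and final, which yields the closure
    # Zero-weight "*bypass*" self-loop: stands for the arcs over every other
    # sub-word, so ordinary decoding is not affected.
    arcs.append((start, start, "*bypass*", 0.0))
    next_state = 1
    for path in boosted_paths:
        src = start
        for i, subword in enumerate(path):
            if i == len(path) - 1:
                dst = final          # the last sub-word returns to the start/final state
            else:
                dst = next_state
                next_state += 1
            arcs.append((src, dst, subword, -a))  # negative cost boosts this path
            src = dst
    return {"start": start, "finals": {final}, "arcs": arcs}

# Example corresponding to the grammar above ("Boltik", "eirlains"):
B = build_boosting_fst([["bol+", "tik"], ["e+", "ir+", "la+", "ins"]], a=1.0)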
There are multiple ways how to integrate the resulting boosting FST into a
WFST decoding progress. In this research we investigated the following avenues:
(a) Static composition with HCLG decoding graph, e.g. HCL ⊗ (G ⊗ B);
(b) Hybrid composition: static composition with G and dynamic composition
with HCL, HCL ⊗ (G ⊗ B);
(c) Dynamic composition with HCLG decoding graph, e.g. HCLG ⊗ B.
In (a), we first statically combine language model FST G with boosting
FST B. Then, the result is composed with other speech recognition transducers
as usual. The advantage of this method is that it uses only the usual static
composition, which allows fast decoding and easy implementation. However, the
static composition itself requires a lot of computational and memory resources.

Fig. 1. Boosting FST to cover possible sub-words.

In (b), static composition is used to combine G and B, and then dynamic
composition is used to build the final decoding graph. This method should lower
the hardware requirements. For dynamic composition, we
use lookahead composition by [39].
Finally, in (c) the boosting FST is dynamically composed with the static HCLG.
This approach should theoretically have the lowest hardware requirements and
the lowest latency, allowing it to be used in real-time speech recognition.
During decoding with the composed graph, the probabilities of the sub-word paths
P are boosted. After decoding, the sub-words are glued back into words and reverse
rewriting is performed using the mappings R. Therefore, the probability of
recognizing the OOV words W is increased, making their recognition possible. Note
that G itself is not modified, so when rescoring is performed only the G weights
are subtracted; this means that boosting continues to work during rescoring, as
the boosting weights remain in the lattice.

4.3 Evaluation

For evaluation of the proposed solution, we use the 4-hour, 1,417 utterance data
set, made mainly from the news broadcast recordings. These utterances contain
named entities of non-Lithuanian origin (e.g., Thomasas Walkupas pronounced
as Tomasas Volkapas, Baltic MG as Boltik emdži, Facebook as Feisbuk ) or other
special terms, like car body types (e.g., bečbekas). From these named entities
and terms, we created a list of 54 OOV words (together with manually assigned

“Lithuanized” pronunciations) that were not seen during language model train-
ing. These 54 OOV words appear 352 times in total or ∼6.52 times per word on
average.
From these 54 words, an additional synthetic audio test set was created by
applying the Lithuanian speech synthesizer (see footnote 2) to the pronunciations
of these words.
All three variations of the proposed method were evaluated and compared
with the baseline using the following metrics:
– ASR WER and CER (Character Error Rate) on both test sets.
– Percentage of OOV words missing in the ASR transcript on both test sets.
– For each OOV word, the numbers of true positives, false positives, and false
negatives were collected and used to compute micro-averaged precision, recall, and
F1 score on the real test set (a minimal sketch of this computation is given after this list).
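A minimal sketch of the micro-averaging used for the last metric is given below; this is our own illustration, and the per-word (tp, fp, fn) counts are assumed to come from aligning the ASR transcripts with the references.

def micro_prf1(counts):
    """counts: dict mapping each OOV word to a (tp, fp, fn) tuple."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1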
As mentioned in the Introduction section, not only the accuracy but also the
speed and computational resources are important in our work. Therefore, we
also measured how much time and memory resources are needed to update the
vocabulary using the proposed method and to perform the ASR decoding.

5 Results
Firstly, evaluation experiments on the synthetic audio test set were performed.
Table 1 contains the best WERs (after optimizing the language model (LM)
weight scores from the range [7, 14]) obtained by each booster composition
method. We have used a = 1 as an initial boosting weight in B.

Table 1. Results on the synthetic dataset.

Method Miss, % WER,%


Baseline 59 71.48
Boosting with (a) 40 42.87
Boosting with (b) 40 42.87
Boosting with (c) 40 42.87

The results show that the baseline sub-word system could not recognize about
59% (see the Miss, % column) of the OOV words (with WER = 71.48%).
Recognizing the remaining 41% is already a good result considering that these
words were not seen during language model training: it shows that in good
acoustic conditions the sub-word approach alone can recognize a significant
amount of unseen words. With the proposed boosting FST, the miss rate is
reduced to 40% (WER = 42.87%), which is a significant improvement. Since all
composition strategies achieved the same result, we conclude that the offered
approach is effective and stable.
2 https://xn--ratija-ckb.lt/liepa-2/paslaugos-vartotojams/interneto-naujienu-skaitytuvas/.

Table 2. Evaluation on the real dataset containing human speech recordings.

LM weight Method Results with OOV words Results with all words
Precision, % Recall, % F1, % Miss, % WER, % CER, %
7 baseline 98.6 65.1 78.4 43 13.0 4.5
(a) and (b) 98.6 67.9 80.5 39 12.9 4.5
(c) 98.7 71.8 83.1 39 12.9 4.4
8 baseline 97.9 65.6 78.5 43 12.2 4.3
(a) and (b) 97.9 67.9 80.2 39 12.2 4.3
(c) 97.4 72.7 83.3 39 12.1 4.2
9 baseline 98.0 68.9 80.9 43 11.6 4.1
(a) and (b) 98.0 71.8 82.9 37 11.6 4.1
(c) 97.4 73.7 83.9 39 11.5 4.1
10 baseline 97.4 70.8 82.0 37 11.3 4.0
(a) and (b) 97.5 73.7 83.9 31 11.3 4.0
(c) 97.5 75.5 85.2 31 11.2 4.0
11 baseline 97.4 71.8 82.6 37 11.3 4.0
(a) and (b) 97.5 74.6 84.6 31 11.3 4.1
(c) 97.7 76.1 84.8 31 11.3 4.0
12 baseline 96.8 72.2 82.7 37 11.3 4.1
(a) and (b) 96.9 75.6 84.9 31 11.2 4.1
(c) 95.8 75.6 84.5 31 11.2 4.1
13 baseline 96.8 72.7 83.1 37 11.4 4.2
(a) and (b) 96.4 76.1 85.0 31 11.4 4.2
(c) 95.2 76.1 84.6 30 11.4 4.2
14 baseline 96.2 71.8 82.2 39 11.6 4.3
(a) and (b) 95.2 75.1 84.2 33 11.6 4.3
(c) 94.6 75.6 84.0 30 11.6 4.3

The second set of experiments is performed on the real test set and boosts the
same list of OOV words. This allows the evaluation of sub-word path boosting
on real audio with human pronunciation and context around each OOV word.
For each OOV word, numbers of true positives, false positives, and false
negatives were collected and used to compute micro-average precision, recall,
and F1 score reported in Table 2. Besides, the percentage of OOV words that
the ASR system failed to recognize are also presented. The last two columns
(WER and CER) represent the overall ASR recognition results on all words
(including OOV words) in the test set. The table also reports the results for
different language model weights = [7, 14] and booster composition approaches
(none/baseline, static (a), hybrid (b), dynamic (c)). Since boosting with static
and hybrid composition achieved the same result, they are presented as the
single row next to each language model weight. The boosting weight is a = 1 as
previously.

Table 3. Performance and memory requirements of different composition methods.

Method Composition time RAM Decoding time RT factor Rescoring time
Static (a) 240 s 8 GB 393 s 0.09 633 s
Hybrid (b) 40 s <2 GB 1022 s 0.24 627 s
Dynamic (c) 1.5 s <0.1 GB 1002 s 0.24 575 s

The comparative analysis of baseline results achieved on the synthetic and


real datasets allows us to conclude that the additional context around OOV
words positively impacts their recognition: in the best scenario, only 37% of OOV
words cannot be recognized in the real dataset, compared to 59% in synthetic.
The best F1 score for the OOV words with the baseline approach is 83.1%.
Similar trends were observed during the analysis of results with the boosting
FST: i.e., contextual information helps to better recognize the OOV words (the
percentage of OOV words that the system failed to recognize drops from 40% to
∼30% for synthetic and real data, respectively). Besides, regardless of the language
model weight used, boosting helps to increase the F1 score values over the
baseline. With the optimal language model weights, the increase is from ∼83.1%
to ∼85.2%. Such improvement mainly comes from the better recall (i.e., lower
false-negative rate: the lower number of the incorrectly indicated absence of an
OOV word when that word is present).
While (a) and (b) composition approaches give absolutely the same result
(which is to be expected), the composition (c) differs slightly and achieves the
best overall F1 score with the language model weight = 10. Further investigation
is needed to understand the cause of such results.
Speed and computational resource investigation was performed on the real
testing dataset (details are summarized in Table 3). The static composition
enables the fastest decoding equal to the speed of the baseline; however, it
requires the biggest amount of computation resources to prepare the decod-
ing graph, i.e., about 4 min and 8 GB of RAM, which makes it impossible to use
in most dynamic adaptation scenarios.
When using the hybrid or dynamic composition, the decoding graph is being
constructed at the run-time, which makes the recognition process almost 3 times
slower. Still, it is faster than the real-time (RT factor is 0.24). Results show
both hybrid and dynamic composition require less time and memory than static
composition to prepare the decoding graph. Dynamic composition is the most
efficient, only 1.5 s and 100 MB are necessary to prepare the boosting FST, while
decoding performance is equal to hybrid. Such a small delay allows using this
method in the online or streaming speech recognition scenario.
Finally, several experiments with the dynamic composition were performed to
determine how different values of the boosting weight “a” affect the performance
of the ASR. Table 4 reports different tested “a” values with the best F1 scores
achieved by optimizing language model weight values from the interval [7, 14].

Table 4. Effect of different boosting weight values.

a Results with OOV words Results with all words


Precision, % Recall, % F1, % Miss, % WER, % CER, %
0.5 97.5 76.1 85.5 33 11.3 4.1
1.0 97.5 75.5 85.2 31 11.2 4.0
1.5 96.4 76.6 85.3 31 11.3 4.0
2.0 94.8 78.0 85.6 30 11.3 4.1
2.5 91.6 78.0 84.2 28 11.3 4.0
3.0 90.2 78.9 84.2 26 11.3 4.0
4.0 88.7 82.8 85.6 20 11.3 4.0

6 Discussion
Overall, the experimental results have proved that the proposed approach
enables ASR to recognize more OOV words than a simple sub-word system.
It has demonstrated robustness even recognizing complicated cases, i.e., words
that are not pronounced accordingly to regular Lithuanian pronunciation rules
(e.g., Facebook, Baltic MG, etc.). However, there are several limitations: (1) there
is a performance penalty for booster FST initialization and decoding, (2) only
the main form of the word is boosted, each inflection form has to be manually
added to the boosting FST, (3) recall is still far from 100%.
The static composition enables the fastest decoding but requires much more
graph-preparation time and RAM, which makes it unsuitable for use in production
by typical end-users. Although decoding with the dynamic or hybrid composition
is almost 3 times slower, it is still faster than real time and can therefore be used
in practice. Moreover, the dynamic composition does not require
resource-consuming preparation of the decoding graph and can be used even in
online recognition. It is important to mention, that these features are in line
with our goals of using the system with real customers.
It can be seen from Table 4 that there is a direct correlation between the boosting
weight “a” and the recall: increasing “a” increases the recall, and thus more OOV
words are recognized (therefore the percentage of OOV words that the system
fails to recognize is lowered). On the contrary, increasing “a” simultaneously
increases the number of false positives and degrades the precision. The F1 score
peaks at a = 2; a further increase of “a” degrades the F1 score. However, we
believe that the optimal value of “a” depends on the specific usage scenario: in
some cases, a higher recall might be more important than precision, or vice versa.

7 Conclusion
This paper presents a method for vocabulary expansion in a sub-word WFST
ASR system that enables the recognition of newly added words that were unseen or
OOV during training. The method works by creating a boosting FST from a
list of words to be added and optionally their pronunciations. Then, this FST
is composed with the decoding graph to increase the probabilities of the added
words. Different FST composition techniques are evaluated experimentally on the
Lithuanian ASR. The OOV word list used in evaluation includes both the spe-
cific terminology and complicated cases (i.e., words that are pronounced not
according to regular Lithuanian pronunciation rules, e.g., Facebook pronounced
as Feisbuk ; Thomasas Walkupas as Tomasas Volkapas). The research is novel
because this problem for the Lithuanian language has never been solved before.
The evaluation shows that the proposed approach achieved its aim. Improvements
over the sub-word ASR baseline were shown on both synthetic and real
evaluation data. On the latter, the percentage of misrecognized out-of-vocabulary
words dropped by ∼7% and the F1 score improved from 83.1% to 85.6% compared
with the baseline sub-word WFST system.
However, there are still many cases where our approach fails. We believe that
sub-word regularization should improve the boosting performance. Another
problem is that currently each inflection form has to be added to the boosting
list separately. These issues will be addressed in future research.

Acknowledgments. This research has been supported by the ICT Competence Cen-
tre (www.itkc.lv) within the project “2.8. Automated voice communication solutions
for the healthcare industry” of EU Structural funds, ID no 1.2.1.1/18/A/003.

References
1. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with
subword units. In: Proceedings of the 54th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), pp. 1715–1725 (2016)
2. Smit, P., Virpioja, S., Kurimo, M., et al.: Improved subword modeling for WFST-
based speech recognition. In: Interspeech, pp. 2551–2555 (2017)
3. Wang, S., Li, G.: Overview of end-to-end speech recognition. J. Phys: Conf. Ser.
1187(5), 052068 (2019). https://doi.org/10.1088/1742-6596/1187/5/052068
4. Rao, K., Sak, H., Prabhavalkar, R.: Exploring architectures, data and units for
streaming end-to-end speech recognition with RNN-transducer. 2017 IEEE Auto-
matic Speech Recognition and Understanding Workshop (ASRU), pp. 193–199.
IEEE (2017)
5. Chiu, Ch.-Ch., et al.: State-of-the-art speech recognition with sequence-to-sequence
models. In: 2018 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pp. 4774–4778. IEEE (2018)
6. Zenkel, T., Sanabria R., Metze, F., Waibel, A.: Subword and crossword Units for
CTC acoustic models. In: Proceedings of the Interspeech 2018, pp. 396–400 (2018).
https://doi.org/10.21437/Interspeech.2018-2057
7. Sainath, T.N., et al.: No need for a lexicon? Evaluating the value of the pronunci-
ation lexica in end-to-end models. CoRR, arXiv:abs/1712.01864 (2017)
8. Lüscher, Ch., et al.: RWTH ASR systems for LibriSpeech: hybrid vs attention
Interspeech 2019, ISCA (2019). https://doi.org/10.21437/interspeech.2019-1780

9. Laptev, A., Andrusenko, A., Podluzhny, I., Mitrofanov, A., Medennikov, I.,
Matveev, Y.: Dynamic acoustic unit augmentation with BPE-dropout for low-
resource end-to-end speech recognition. Sensors (9), 3063 (2021). MDPI AG .
https://doi.org/10.3390/s21093063
10. Alumäe, T., et al: The 2016 BBN Georgian telephone speech keyword spotting sys-
tem. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Pro-
cessing (ICASSP), pp. 5755–5759 (2017). https://doi.org/10.1109/ICASSP.2017.
7953259
11. Laurinčiukaitė, S., Telksnys, L., Kasparaitis, P., Kliukienė, R. Paukštytė, V.:
Lithuanian speech corpus liepa for development of human-computer interfaces
working in voice recognition and synthesis mode. Informatica 29(3), 487–498
(2018). https://doi.org/10.15388/Informatica.2018.177. Vilnius University Insti-
tute of Data Science and Digital Technologies
12. Salimbajevs, A., Kapočiūtė-Dzikienė, J.: General-purpose lithuanian automatic
speech recognition system. In: Human Language Technologies – The Baltic Per-
spective – Proceedings of the Eighth International Conference Baltic HLT, vol.
307, pp. 150–157. IOS Press (2018). https://doi.org/10.3233/978-1-61499-912-6-
150
13. Rastrow, A., Sethy, A., Ramabhadran, B.: A new method for OOV detection using
hybrid word/fragment system. In: 2009 IEEE International Conference on Acous-
tics, Speech and Signal Processing, pp. 3953–3956. IEEE (2009)
14. White, Ch., Zweig, G., Burget, L., Schwarz, P., Hermansky, H.: Confidence estima-
tion, OOV detection and language id using phone-to-word transduction and phone-
level alignments. In: 2008 IEEE International Conference on Acoustics, Speech and
Signal Processing, pp. 4085–4088. IEEE (2008)
15. Kumar, R., et al.: Detecting OOV named-entities in conversational speech. In:
Thirteenth Annual Conference of the International Speech Communication Asso-
ciation (2012)
16. Lin, H., Bilmes, J., Vergyri, D., Kirchhoff, K: OOV detection by joint word/phone
lattice alignment. In: 2007 IEEE Workshop on Automatic Speech Recognition &
Understanding (ASRU), pp. 478–483. IEEE (2007)
17. Asami, T., Masumura, R., Aono, Y., Shinoda, K.: Recurrent out-of-vocabulary
word detection based on distribution of features. Comput. Speech Lang. 58, 247–
259 (2019)
18. Lee, Ch-y., Zhang, Y., Glass, J.: Joint learning of phonetic units and word pronun-
ciations for ASR. In: Proceedings of the 2013 Conference on Empirical Methods in
Natural Language Processing, Seattle, Washington, USA, pp. 182–192, Association
for Computational Linguistics (2013). https://aclanthology.org/D13-1019
19. Lee, Ch.-y., O’Donnell, T. J., Glass, J.: Unsupervised Lexicon Discovery from
Acoustic Input. Transactions of the Association for Computational Linguistics,
Cambridge, MA, vol. 3, pp. 389–403. MIT Press (2015). https://doi.org/10.1162/
tacl_a_00146
20. Aleksic, P., Allauzen, C., Elson, D., Kracun, A., Casado, D.M., Moreno, P.:
Improved recognition of contact names in voice commands. In: 2015 IEEE Inter-
national Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.
5172–5175. IEEE (2015)
21. Allauzen, C., Riley, M.: Rapid vocabulary addition to context-dependent decoder
graphs. In: Sixteenth Annual Conference of the International Speech Communica-
tion Association (2015)

22. Bulusheva, A., Zatvornitskiy, A., Korenevsky, M.: An efficient method for vocab-
ulary addition to WFST graphs. In: Sojka, P., Horák, A., Kopeček, I., Pala, K.
(eds.) TSD 2016. LNCS (LNAI), vol. 9924, pp. 452–458. Springer, Cham (2016).
https://doi.org/10.1007/978-3-319-45510-5_52
23. Horndasch, A., Kaufhold, C., Nöth, E.: How to add word classes to the Kaldi
speech recognition toolkit. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.)
TSD 2016. LNCS (LNAI), vol. 9924, pp. 486–494. Springer, Cham (2016). https://
doi.org/10.1007/978-3-319-45510-5_56
24. Liu, J., Zhu, J., Kathuria, V., Peng, F.: Efficient dynamic WFST decoding for
personalized language models. arXiv preprint, arXiv:1910.10670 (2019)
25. Bazzi, I.: Modelling OOV words for robust speech recognition. Ph.D. thesis, Mas-
sachusetts Institute of Technology, Cambridge, MA, USA (2002)
26. Kombrink, S., Hannemann, M., Burget, L., Heřmanský, H.: Recovery of
Rare Words in Lecture Speech. In: Sojka, P., Horák, A., Kopeček,
I., Pala, K. (eds.) TSD 2010. LNCS (LNAI), vol. 6231, pp. 330–
337. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15760-8_42
https://www.fit.vut.cz/research/publication/9323
27. Alumäe, T., Tilk, O., Ullah, A.: Advanced Rich Transcription System for Estonian
Speech. CoRR, arXiv:abs/1901.03601 (2019)
28. Zhang, X., Povey, D., Khudanpur, S.: OOV recovery with efficient 2nd pass decod-
ing and open-vocabulary word-level RNNLM rescoring for hybrid ASR. ICASSP,
pp. 6334–6338. IEEE (2020)
29. Braun, R.A., Madikeri, S.R., Motlícek, P.: A comparison of methods for OOV-word
recognition on a new public dataset. CoRR. arXiv:abs/2107.08091 (2021)
30. Hirsimäki, T., Pylkkönen, J., Kurimo, M.: Importance of high-order n-gram models
in morph-based speech recognition. IEEE Trans. Speech Audio Process. 17(4),
724–732 (2009)
31. Siivola, V., Hirsimäki, T., Creutz, M., Kurimo, M.: Unlimited vocabulary speech
recognition based on morphs discovered in an unsupervised manner. In: Interspeech
(2003)
32. Klakow, D., Rose, G., Aubert, X.L.: OOV-detection in large vocabulary system
using automatically defined word-fragments as fillers. In: Eurospeech, ISCA (1999)
33. Bisani, M., Ney, H.: Open vocabulary speech recognition with flat hybrid models.
In: Interspeech [and] Eurospeech, 9th European Conference on Speech Communi-
cation and Technology, pp. 725–728 (2005). https://publications.rwth-aachen.de/
record/113162
34. Kombrink, S., Hannemann, M., Burget, L.: Out-of-vocabulary word detection and
beyond. In: Weinshall, D., Anemüller, J., van Gool, L. (eds.) Detection and Identi-
fication of Rare Audiovisual Cues. Studies in Computational Intelligence, vol. 384,
pp. 57–65. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-24034-
8_4
35. Drexler, J., Glass, J.: Subword regularization and beam search decoding for end-
to-end automatic speech recognition. In: ICASSP 2019-2019 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6266–6270,
IEEE (2019)
36. Lakomkin, E., Heymann, J. Sklyar, I., Wiesler, S.: Subword regularization: an anal-
ysis of scalability and generalization for end-to-end automatic speech recognition.
In: Proceedings of the Interspeech 2020, pp. 3600–3604 (2020). https://doi.org/10.
21437/Interspeech.2020-1569

37. Raškinis, G., Paškauskaitė, G., Saudargienė, A., Kazlauskienė, A., Vaičiūnas, A.:
Comparison of phonemic and graphemic word to sub-word unit mappings for
lithuanian phone-level speech transcription. Informatica 30(3), 573–593 (2019).
https://doi.org/10.15388/Informatica.2019.219
38. Alumäe, T., Ottokar, T.: Automatic speech recognition system. In: Human Lan-
guage Technologies–The Baltic Perspective: Proceedings of the Seventh Interna-
tional Conference Baltic HLT 2016, vol. 238, pp. 39. IOS Press (2016)
39. Allauzen, C., Riley, M., Schalkwyk, J.: A generalized composition algorithm for
weighted finite-state transducers. In: Proceedings of the Interspeech 2009, pp.
1203–1206 (2009). https://doi.org/10.21437/Interspeech.2009-348
40. Mohri, M., Pereira, F., Riley, M.: Weighted finite-state transducers in speech recog-
nition. Comput. Speech Language 16(1), 69–88 (2002)
41. Ko, T., Peddinti, V., Povey, D., Khudanpur, S.: Audio augmentation for speech
recognition. In: Interspeech, pp. 3586–3589, ISCA (2015). http://dblp.uni-trier.de/
db/conf/interspeech/interspeech2015.html#KoPPK15
A Comparative Analysis of Local
Explainability of Models for Sentiment
Detection

Hooria Hajiyan, Heidar Davoudi, and Mehran Ebrahimi(B)

Faculty of Science, Ontario Tech University, Oshawa, ON, Canada


{Hooria.Hajiyan,Heidar.Davoudi,Mehran.Ebrahimi}@ontariotechu.ca

Abstract. Sentiment analysis is one of the crucial tasks in Natural Lan-


guage Processing (NLP) which refers to classifying natural language sen-
tences by their positive or negative sentiments. In many existing deep
learning-based models, providing an explanation of a sentiment might be
as necessary as the prediction itself. In this study, we use four different
classification models applied to the sentiment analysis of the Internet
Movie Database (IMDB) reviews, and investigate the explainability of
results using Local Interpretable Model-agnostic Explanation (LIME).
Our results reveal how the attention-based models, such as Bidirectional
LSTM (BiLSTM) and fine-tuned Bidirectional Encoder Representations
from Transformers (BERT) would focus on the most relevant keywords.

Keywords: Explainable Artificial Intelligence · LIME · Sentiment


Analysis · Attention-based model

1 Introduction
Machine learning has been widely used in various fields such as video caption-
ing [1], big data analysis [2], Natural Language Processing (NLP) [3], text clas-
sification [4], and sentiment analysis [5] which has led to remarkable growth in
Artificial Intelligence (AI) research [6]. Sentiment Analysis is a sub-field of NLP
that combines tools and techniques from linguistics and computer science to sys-
tematically identify, extract, and study emotional states and personal opinions
in natural languages. However, the main question is how an algorithm can detect
if a text is expressing positive or negative sentiment. Despite the high accuracy
and satisfactory performance of machine learning models in sentiment detection,
the models can still be so complicated that they provide no information about how the
sentiment classification task is performed [7]. However, as these models are fre-
quently used to predict people’s preferences in recommendation engines, it may
be useful to inspect how they learned this prediction knowledge. In social net-
works, negative sentiments can be shared quickly, which may become a problem
if recommendation systems cannot explain the reasons behind their recommen-
dations. Explanations supporting the output of a black-box model are crucial

where experts require more information about the decision than a simple pre-
diction. Recent research shows that most of the machine learning models used
in NLP, text classification, and sentiment analysis work as black-box models,
and the non-transparent structure of these models has led to different types of
research in this field using XAI [7–10].
Explainable AI (XAI) emphasizes understanding the cause and effect
within the AI system by examining the sensitivities of the output to changes in the
parameter inputs without needing to understand the complex computation of the
model [6]. Furthermore, the explanations are helpful in supporting collaboration
between AI agents and human experts in many applications [11–13].
Transformer models, which rely on the attention mechanism, have become very
popular in the development of NLP tasks [14,15]. Other well-known approaches,
such as random forest, Support Vector Machine (SVM), Naive Bayes, and
K-nearest neighbors, can also produce quite good performance in sentiment
analysis, but their results are not explainable [16]. In this study, we compare
the performance of attention-based models in sentiment analysis with random
forest and multinomial Naive Bayes, and the sentiment explanation results at
the feature level show how relevant or irrelevant keywords contribute to
true-positive and false-positive predictions.
To do so, we chose LIME as an explanation tool to evaluate the performance
of four different classifiers. As LIME is a model agnostic Post-hoc explanation
tool, it can be easily applied to any classification model, no matter if the decision
process is interpretable by itself or not. Therefore, we can investigate and com-
pare the sensitivity of attention-based models, and any other simple classifier in
sentiment analysis, to locally check the importance of each token/word in the
decision process.
In Sect. 2, explanation tools are categorized into local or global approaches
and by whether generating the explanation requires post-processing. Related
work and the most common XAI tools in NLP tasks are discussed in Sect. 3.
Section 4 presents the four classification algorithms, along with the state-of-the-art
XAI tool used in this study. Quantitative results of each model and sentiment
explanations for selected instances are provided in Sect. 5. Finally, we draw
conclusions from the explanation results in Sect. 6.

2 Background
2.1 Transparency of a Black-Box Model
Transparency often refers to interpretability, which means a model is understand-
able by a human, such as a regression or decision tree model [17]. However,
explainability is associated with the notion of explanation as an interface between
humans and a decision-maker [18]. Interpretability and explainability are often
misused interchangeably in the literature. The notable difference between these
concepts is that interpretability refers to a passive characteristic of a model,
namely the level at which a given model makes sense to a human, which is also
expressed as transparency. In contrast to transparency, explainability refers to an active
characteristic of a model, concerning its internal functions [13]. In other words, a model
can be explained, but the interpretability of the model is something that comes
from the design of the model itself [13].

2.2 Categorization of Explanation Methods

Explanations are often categorized into two main aspects [12,18]. The first one
distinguishes whether the explanation is for an individual prediction, called local
explanation, or the model’s prediction process as a whole, called global expla-
nation. Local explanation provides information or justification for the model’s
prediction on a specific input. Global explanation provides a similar justification
by revealing how the model’s predictive process works. In other words, global
explanation describes the whole decision process as a human term independent
of any particular input [8].
Whether the explanation is local or global, explanations also differ on whether
they arise as part of the prediction process or whether their generation requires
post-processing after the model makes a prediction [8].

Ante-Hoc. An Ante-hoc approach, which may also be referred to as directly


interpretable [19], generates the explanation at the same time as the prediction is
made; decision trees and rule-based models are examples of Ante-hoc explanation
models.

Post-Hoc. In contrast, a Post-hoc approach requires an additional operation


after the predictions are made. Local Interpretable Model-agnostic Explanations
(LIME) [11] is an example of producing a local explanation using a surrogate
model applied following the predictor’s operation. LIME can be applied to any
black-box model without the need to understand, change, or modify the decision
process. In this research, we choose LIME as a model agnostic Post-hoc explana-
tion method to evaluate four different classification models in sentiment analysis.

3 Related Work

To interpret a prediction of a model in NLP tasks, prior research has focused on


developing post-hoc analysis algorithms to select or modify particular instances
and explain the behavior of the model [20,21]. One of the post-hoc explana-
tion tools which estimates the input contribution towards the output by comput-
ing the partial derivative is called first derivative saliency [22]. As suggested by
its name and definition, first derivative saliency can enable feature importance
explainability, especially on word/token-level features [8,23]. Gradient-weighted
Class Activation Mapping (Grad-CAM) is an attribution method to calculate
each input’s contribution to the output in text classification problems and NLP
tasks [24,25]. Another approach, which reveals the relevance attributed to features
computed in an intermediate layer of a neural network, is Layer-wise Relevance Propagation
(LRP). LRP has been used to enable feature importance explainability in fully
connected, convolution, and recurrent layers [26,27]. However, the main issue is
the need for access to the inner structure of the model. For this reason, model-agnostic
explanation methods, specifically perturbation-based ones such as Local Interpretable
Model-Agnostic Explanations (LIME), have been widely used for explaining text
classification problems such as sentiment analysis [7,11,28], as they are easy to
understand and do not require access to the inner structure of the model [21]. In
other words, model-agnostic explanations probe the black-box model by observing
the probability change on the predicted class when erasing a certain word [11,29].
Although several studies have been conducted to provide an explanation for
the results of sentiment analysis by LIME, they have been mostly focused on
explaining a single model, specifically a fine-tuned attention-based model [30],
using LIME to improve the explainability of a sentiment classifier with aug-
mented data [30], or comparing sentiment analysis methods and then choosing
the most accurate one to explain the results of correct predictions with a high
accuracy score [31]. More specifically, there is no work investigating how local
explanations can reflect the accuracy of sentiment analysis methods in detecting
positive or negative sentiments at the features level.

4 Methodology

4.1 Classification Methods

For this study, we considered four classification methods trained on IMDB review
dataset to compare the results of a sentiment analysis, which could either be
positive or negative.
Random Forest: Random forest is a classification algorithm consisting of many
decision trees. It uses bagging and feature randomness when building each tree
to create an uncorrelated forest of trees whose prediction is more accurate than
that of any individual tree. Many studies using random forest have reported
quite good performance in sentiment analysis on a large number of sentiments
from online purchasing, movie reviews, YouTube, and Twitter social media
[32–35]. Nevertheless, trusting the predictions and explaining the results remain
the main issues.
Multinomial Naive Bayes: Naive Bayes is one of the most popular algorithms
used in a variety of classification problems because of its fast processing time
and high level of effectiveness [36]. This algorithm uses statistical methods to
calculate the probability of a class based on its attributes, then finds the highest
probability value to classify the data into the most appropriate category. Due to
its basic concept, the Naive Bayes algorithm is often used in text classification
and sentiment analysis, as it combines word probabilities to categorize docu-
ments [37,38]. The Multinomial Naive Bayes (Multinomial
NB) method is a probability-based algorithm suitable for classification with
discrete features such as word counts for text classification. This approach con-
siders the term frequency and calculates the probability of each label given the
input text [39,40].
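As an illustration only (not the authors’ exact pipeline), a Multinomial NB sentiment classifier over word counts can be sketched with scikit-learn; the training-data variable names are placeholders.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# train_reviews: list of raw review strings; train_labels: 0 = negative, 1 = positive (placeholders)
nb_clf = make_pipeline(CountVectorizer(), MultinomialNB())
nb_clf.fit(train_reviews, train_labels)      # word counts drive the per-class posteriors
predicted = nb_clf.predict(test_reviews)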
Bidirectional LSTM: Bidirectional Recurrent Neural Network (BRNN) is a


kind of recurrent neural network that can be trained using all available input
information in the past and future of a specific time frame. The idea is to split the
state neurons of a regular RNN in a part that is responsible for the positive time
direction, forward states, and a part for the negative time direction, backward
states [41]. These kinds of networks have been the most suitable method in text
classification, NLP, text mining, and sentiment analysis in recent years [42–
47]. Because in RNNs the gradient vanishes over long sequences, a Bidi-
rectional LSTM Network (BiLSTM) has been proposed as a solution to this
problem [48]. BiLSTM has the ability to extract the contextual information
from the feature sequences by dealing with both forward and backward depen-
dencies. In this study, we used BiLSTM with a forward LSTM, which processes
the sequence in chronological order, and a backward LSTM, which handles the
sequence in reverse order. The output is then the concatenation of the states of
the forward and backward LSTM, followed by a classifier layer to provide the
probability of each positive or negative sentiment.
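The paper does not specify the implementation framework; as a minimal sketch only, a BiLSTM classifier of this shape could be written in Keras, where the vocabulary size and layer widths are illustrative assumptions.

import tensorflow as tf

VOCAB_SIZE = 20000                      # illustrative value, not reported in the paper
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    # forward and backward LSTMs; their final states are concatenated
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    # classifier layer producing the probability of the positive sentiment
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])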
Bidirectional Encoder Representations from Transformers-BERT:
BERT is designed to pre-train deep bidirectional representations from unlabeled
text by jointly conditioning on both left and right context in all layers. As a
result, the pre-trained BERT model can be fine-tuned with just one additional
output layer to create state-of-the-art models for a wide range of tasks [49].
BERT improves the fine-tuning based approaches by using a Masked Language
Model (MLM) pre-training objective inspired by Cloze test [50]. The MLM ran-
domly masks some of the tokens from the input, and the objective is to predict
the original vocabulary of the masked word based only on its context. The MLM
objective also enables the representation to fuse the left and the right context,
which allows us to pre-train a deep bidirectional transformer. In other words,
BERT models are usually pre-trained on a large corpus of text, then fine-tuned
for specific tasks. We chose BERT because it has been wildly successful on a
variety of tasks in NLP, text classification, and sentiment analysis [4,39,51–54].
We use this pre-trained model and modify the output layer, which acts as a
classifier to predict the probability of each class, and then train the model on
the sentiment analysis dataset.
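For illustration, loading a pre-trained BERT with a two-label classification head can be sketched with the Hugging Face transformers library; the checkpoint name and label count are assumptions, since the paper does not name the exact model used.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed checkpoint
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# the added output layer produces one logit per sentiment class; fine-tuning updates all weights
inputs = tokenizer("The film is pretty bad.", return_tensors="pt", truncation=True)
logits = model(**inputs).logits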

4.2 LIME

Since the post-hoc explanation methods have received increasing attention in


recent years, and model agnostic explanation has been widely used in sentiment
analysis and NLP tasks [7,11,28,29], we chose LIME as a local explanation
approach in this research. LIME is a post-hoc explanation tool that is easy to
understand; it can be applied to any classification method after the prediction
process is done, and it is model-agnostic as it does not need access to the inner
structure of a complex model [11]. LIME can faithfully explain the predictions
of any black-box model: it takes an instance, creates some neighbours around it,
called perturbations, and then builds a
locally linear model around the predictions of an opaque model to explain it.
The number of perturbations around each instance and the total number of fea-
tures represented in the output are two main parameters of LIME that should be
fine-tuned based on the given problem. According to the literature, the number
of perturbations will guarantee the stability of the resulting explanation [55],
and it should be 10 times larger than the number of words [56]. As the instances
are of different lengths, we adjusted this parameter based on the size of the given
instance. The linear model is then fitted on these perturbations and returns the
n most important features. We decided to highlight 10% of the length of each instance
as the most important words detected by LIME.
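As a sketch of how such an explanation can be produced (the review variable and the classifier’s probability function are placeholders), the lime package exposes a text explainer whose two parameters match those discussed above.

from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["negative", "positive"])
num_words = len(review.split())                    # review: the input instance (placeholder)
explanation = explainer.explain_instance(
    review,
    predict_proba,                                 # probability function of the trained model
    num_features=max(1, num_words // 10),          # highlight about 10% of the tokens
    num_samples=10 * num_words,                    # perturbation budget scaled to instance length
)
print(explanation.as_list())                       # (token, weight) pairs toward the positive label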

5 Experiments
This study was conducted on the IMDB review dataset for binary sentiment
analysis, containing 25,000 reviews for training and 25,000 for testing. In addition,
there are 50,000 unlabeled reviews for unsupervised use (Table 1).

Table 1. IMDB Movie Dataset.

IMDB review (negative - positive)


Train (‘text’, ‘label’) 25,000
Test (‘text’, ‘label’) 25,000
Unsupervised 50,000

We trained four classification methods on this dataset, and the classification


reports for training and test are provided in Table 2. In order to avoid the over-
fitting problem on a large dataset, we trained random forest using Grid Search
Cross-Validation (GridSearchCV) with a set of possible values for each param-
eter. GridSearchCV loops through all possible values and combinations of the
hyperparameter and fits the model on the training data. Finally, it will return
a suitable combination of parameter values. The chosen parameters are as fol-
lows: n-estimators = 800, max-depth = 10, and max-features = 100. Although
we expected the attention models, such as BRNN and BERT, to have higher
accuracy than random forest and Multinomial NB, the main purpose of this
study is to interpret the decision made by each classifier using LIME. In other
words, the results of LIME will represent which words/tokens the model was
looking at to make a prediction.
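A minimal sketch of this search with scikit-learn is given below; the candidate values other than the selected ones, and the feature matrix X_train, are illustrative assumptions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [400, 800],       # candidate values are illustrative
    "max_depth": [5, 10],
    "max_features": [50, 100],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)          # X_train, y_train: vectorized reviews and labels (placeholders)
print(search.best_params_)            # e.g. {'max_depth': 10, 'max_features': 100, 'n_estimators': 800}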

5.1 Quantitative Results


This section provides the classification reports shown in Table 2. The results show
that BiLSTM and BERT make the most accurate predictions in text classifica-
tion. In the next section, we explain the predictions made by classifiers for an
individual instance and reveal the performance of each method at the feature level.
Table 2. Classification accuracy.

Train Test
Random forest 0.80 0.79
Multinomial NB 0.85 0.82
Bidirectional RNN 0.87 0.85
BERT 0.93 0.86

5.2 Sentiment Explanation Results

To explain the sentiment predictions, we have evaluated the performance of


each model using LIME on two instances of positive and negative sentiment.
The instances show a true positive and a false positive, allowing us to discuss the accuracy of
each method at the feature level, and the explanations are shown visually. More
precisely, we chose two instances to reveal how a false-positive prediction may arise
from looking at irrelevant words, and, on the other hand, how a correct prediction can
stem from an inaccurate model, where the prediction cannot be explained in
a meaningful way. Note that first the input instance is fed into the model, and
then LIME creates some neighbours or perturbations by masking some tokens
or words. These perturbations are new input instances, and the model predicts
the labels. Finally, a linear model has been fitted on these neighbours, and
LIME returns the coefficients, called weights, showing the importance of tokens
toward each label. LIME represents the weight of tokens that affect the prediction
positively and negatively.
The first sentiment is: I really tried, but movie just didn’t work for me. The
action scenes were dull, acting was surprisingly poor, and some of these char-
acters were TOO stereotypical to even be funny. Pam Grier tries, but when you
have nothing to work with, even her considerable talent can not prevent a disas-
ter. Even by standards of weak genre, film is pretty bad.
The actual label of this review is negative. The predicted label made by ran-
dom forest and Multinomial NB is positive, which is wrong. However, BiLSTM
and BERT made a correct prediction by considering the whole context and look-
ing at the words which affected the prediction toward the negative label. The
prediction probabilities and local explanations are provided in Fig. 1, 2, 3, and 4,
respectively. Figure 1 and 2 show that random forest and Multinomial NB have
looked at some irrelevant words and considered each word/token as a feature
instead of the whole context. This is an example of a false-positive error made
by these two methods, and the predicted probabilities are shown for each
sentiment. Although it is not a long sentence, this explanation
shows why the models made a wrong prediction by looking at some stop words,
or irrelevant tokens, as these classification models look at each token/word indi-
vidually without considering the contextual meaning. We removed those stop
words and then passed sentences to the same models, but there was no spe-
cific improvement in the results, and again, some irrelevant words/tokens were
highlighted by LIME. Therefore, we keep the sentences as they are to have more
consistency in our comparison.
From another point of view, the predicted probabilities in Figs. 1 and 2, and the
weight of each word toward the positive and negative sentiments, are slightly dif-
ferent; however, the highlighted words remain the same. Therefore, these words
seem to be the most frequent ones in the input instance, and the behavior of
both of these methods would be alike in this case. Figures 3 and 4 show that the
attention models made the correct prediction for the same instance by looking
at the most relevant words.

Fig. 1. Actual: negative, predicted label by random forest: positive.

Fig. 2. Actual: negative, predicted label by multinomial NB: positive.

As can be seen, the attention-based models are trying to capture the contex-
tual meaning of a text. Figures 3 and 4 show the true-negative sentiment analysis
made by BiLSTM and BERT, and the probabilities are about 1. Though the
highlighted words and their weights detected by LIME are slightly different in
BiLSTM and BERT, the whole context conveys the same sentiment, which is
what we hope to see in attention-based methods.

Fig. 3. Actual: negative, predicted label by BiLSTM: negative.


Fig. 4. Actual: negative, predicted label by BERT: negative.

The second sentiment is: “I first saw this movie on IFC. Which is a great
network by the way to see underground films. I watched this movie and was
thinking it was going to be pure drama and a story line that doesn’t hold water.
But it really was a worth while watch. The main character is in such rough shape,
and you hate to see him deny help, but no matter what you just can’t hate him.
His devotion to The Beatles and John Lennon is a great metaphor for his life
and the helplessness he feels. The atmosphere of the film is also great. At times,
you feel like you can see what he sees, feel what he feels in some situations.
This movie does not leave you wanting to know more, or disliking a loophole
in the plot. There are NO loopholes (in my opinion). I have always been a fan
of foreign films, especially now with movies being made so poorly in America.
I really enjoy the foreign settings because I feel it can take you on a trip, and
sometimes understand a different culture. This movie did all those things to me
and more. Please watch this movie and if you’re new to foreign films, this is a
great start.”
The actual label of this instance is positive. The prediction made by each
classifier, along with the local explanations, is as follows.
According to Figs. 5 and 6, although the predictions of random forest and
Multinomial NB are correct and the probability of the correct label is significantly
higher than that of the opposite one, again as in the first instance, most of the
highlighted words detected by LIME seem to be irrelevant to the sentiment
analysis task, such as the most frequent stop words a, is, and, etc. The point is
that a model can make a correct prediction, but the prediction does not seem
meaningful if we explain the reason behind it. That is why we cannot easily trust
the model’s prediction; moreover, it raises the question of whether this correct
prediction happened accidentally.
The same instance has been fed into BiLSTM and BERT. The prediction
probabilities and explanation results are shown in Figs. 7 and 8. The prediction
probabilities of the actual label are close to 1, and the highlighted parts of the
sentence are more precise. In Figs. 7 and 8, the word great received more weight,
in addition to other relevant words, such as worth and enjoy, detected by LIME. Although these
attention-based models consider the contextual meaning of the given instance,
these highlighted words are the most affecting ones towards the positive label.
Fig. 5. Actual: positive, predicted label by random forest: positive.

Fig. 6. Actual: positive, predicted label by multinomial NB: positive.

Fig. 7. Actual: positive, predicted label by BiLSTM: positive.

Fig. 8. Actual: positive, predicted label by BERT: positive.


6 Conclusion
Sentiment analysis has become very popular in both research and business due
to the availability of vast amounts of opinions currently produced by users on
social media. Standard sentiment analysis deals with classifying the overall sen-
timent of a text by considering the importance of each word within a context.
This study investigated the accuracy of four classification models for sentiment
analysis using local explanations. Specifically, we applied four different classifi-
cation methods, random forest, Multinomial NB, BiLSTM, and BERT, to the
IMDB reviews, and then used LIME to explain the predictions. We investi-
gated two case studies from the given dataset chosen as true positive and false
positive predictions, then revealed the importance of each keyword affecting the
predicted label. The results showed that although random forest and Multino-
mial NB may predict the actual label, the prediction might not be precise in
long sentences, where some irrelevant words are highlighted in the explanations.
In contrast, attention-based models like BiLSTM and BERT accurately pre-
dicted the correct label by focusing on the most relevant parts of the sentence.
This study shows how a correct prediction, even with a high prediction proba-
bility, cannot be accepted blindly. In other words, a correct prediction may arise
from an inaccurate model when we explain the model’s behavior at the features
level. Although many classification methods have been used in sentiment anal-
ysis, attention-based models represent the high performance through a decision
process, when the results are explained. For further research, we can examine
the sensitivity of attention-based models to different words through an expla-
nation framework by replacing the most affecting tokens with their synonyms
and then explaining the decisions again to see if the new words can change the
contextual meaning of a sentence or not. Furthermore, since LIME is a model-
agnostic explanation technique, we can combine it with other NLP tasks, e.g.
summarizing, extraction, and question-answering, which we leave for our future
work.

Acknowledgments. This work has been supported in part by the Natural Sciences
and Engineering Research Council of Canada (NSERC).

References
1. Zhou, L., Zhou, Y., Corso, J.J., Socher, R., Xiong, C.: End-to-end dense video
captioning with masked transformer. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 8739–8748 (2018)
2. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R.,
Muharemagic, E.: Deep learning applications and challenges in big data analytics. J. Big Data 2(1), 1–21 (2015)
3. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed repre-
sentations of words and phrases and their compositionality. In: Advances in Neural
Information Processing Systems, pp. 3111–3119 (2013)
4. Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., Gao, J.:
Deep learning-based text classification: a comprehensive review. ACM Comput.
Surv. (CSUR) 54, 1–40 (2021)
5. Hasan, A., Moin, S., Karim, A., Shamshirband, S.: Machine learning-based senti-
ment analysis for twitter accounts. Math. Comput. Appl. 23, 11 (2018)
6. Linkov, I., Galaitsi, S., Trump, B.D., Keisler, J.M., Kott, A.: Cybertrust: from
explainable to actionable and interpretable artificial intelligence. IEEE (2020)
7. Bodria, F., Panisson, A., Perotti, A., Piaggesi, S.: Explainability methods for nat-
ural language processing: applications to sentiment analysis (Discussion Paper)
(2020)
8. Danilevsky, M., Qian, K., Aharonov, R., Katsis, Y., Kawas, B., Sen, P.: A sur-
vey of the state of explainable AI for natural language processing arXiv preprint
arXiv:2010.00711 (2020)
9. Liu, H., Yin, Q., Wang, W.Y.: Towards explainable NLP: a generative explanation
framework for text classification arXiv preprint arXiv:1811.00196 (2018)
10. Wiegreffe, S., Marasović, A.: Teach me to explain: a review of datasets for explain-
able NLP. arXiv preprint arXiv:2102.12060 (2021)
11. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” Explaining the
predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD Interna-
tional Conference on Knowledge Discovery and Data Mining, pp. 1135–1144
(2016)
12. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable
artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
13. Arrieta, A.B., et al.: Explainable Artificial Intelligence (XAI): concepts, tax-
onomies, opportunities and challenges toward responsible AI. Information Fusion,
vol. 58. Elsevier (2020)
14. Grimsley, C., Mayfield, E., Bursten, J.: Why attention is not explanation: surgical
intervention and causal reasoning about neural models (2020)
15. Brunner, G., Liu, Y., Pascual, D., Richter, O., Ciaramita, M., Wattenhofer, R.:
On identifiability in transformers. arXiv preprint arXiv:1908.04211 (2019)
16. Daeli, N.O.F., Adiwijaya, A.: Sentiment analysis on movie reviews using Informa-
tion gain and K-nearest neighbor. J. Data Sci. Appl. 3, 1–7 (2020)
17. Lipton, Z.C.: The mythos of model interpretability: in machine learning, the concept
of interpretability is both important and slippery. ACM Queue 16(3), 31–57 (2018)
18. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A
survey of methods for explaining black box models. In: ACM Computing Surveys
(CSUR) (2018)
19. Arya, V., et al.: One explanation does not fit all: a toolkit and taxonomy of AI
explainability techniques. arXiv preprint arXiv:1909.03012 (2019)
20. Kenny, E.M., Keane, M.T.: Twin-systems to explain artificial neural networks using
case-based reasoning: comparative tests of feature-weighting methods in ANN-
CBR twins for XAI. In: Twenty-Eighth International Joint Conferences on Artifi-
cial Intelligence (IJCAI), Macao (2019)
21. Keane, M.T., Smyth, B.: Good counterfactuals and where to find them: a case-
based technique for generating counterfactuals for explainable AI (XAI). In: Inter-
national Conference on Case-Based Reasoning (2020)
22. Saltelli, A., et al.: Global sensitivity analysis: the primer (2008)
23. Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In:
International Conference on Machine Learning (2017)
24. Gorski, L., Ramakrishna, S., Nowosielski, J.M.: Towards grad-cam based explain-
ability in a legal text processing pipeline. arXiv preprint arXiv:2012.09603 (2020)
25. Lertvittayakumjorn, P., Toni, F.: Human-grounded evaluations of explanation
methods for text classification. arXiv preprint arXiv:1908.11355 (2019)
26. Poerner, N., Roth, B., Schütze, H.: Evaluating neural network explanation
methods using hybrid documents and morphological agreement. arXiv preprint
arXiv:1801.06422 (2018)
27. Croce, D., Rossini, D., Basili, R.: Explaining non-linear classifier decisions within
kernel-based deep architectures. In: Proceedings of the 2018 EMNLP Workshop
BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (2018)
28. Alvarez-Melis, D. Jaakkola, T.S.: A causal framework for explaining the predictions
of black-box sequence-to-sequence models. arXiv preprint arXiv:1707.01943 (2017)
29. Chen, H., Zheng, G., Ji, Y.: Generating hierarchical explanations on text classifi-
cation via feature interaction detection. arXiv preprint arXiv:2004.02015 (2020)
30. Chen, H., Ji, Y.: Improving the explainability of neural sentiment classifiers via
data augmentation. arXiv preprint arXiv:1909.04225 (2019)
31. Aljuhani, S.A., Alghamdi, N.S.: A comparison of sentiment analysis methods on
Amazon reviews of Mobile Phones. Int. J. Adv. Comput. Sci. Appl. 10, 608–617
(2019)
32. Karthika, P., Murugeswari, R., Manoranjithem, R.: Sentiment analysis of social
media network using random forest algorithm. In: 2019 IEEE International Con-
ference on Intelligent Techniques in Control, Optimization and Signal Processing
(INCOS) (2019)
33. Singh, J., Tripathi, P.: Sentiment analysis of Twitter data by making use of SVM,
Random Forest and Decision Tree algorithm. In: 2021 10th IEEE International
Conference on Communication Systems and Network Technologies (CSNT) (2021)
34. Munshi, A., Arvindhan, M., Thirunavukkarasu, K.: Random forest application of
twitter data sentiment analysis in online social network prediction. In: Emerging
Technologies for Healthcare: Internet of Things and Deep Learning Models (2021)
35. Aufar, M., Andreswari, R., Pramesti, D.: Sentiment analysis on YouTube social
media using decision tree and random forest algorithm: a case study. In: 2020
International Conference on Data Science and Its Applications (ICoDSA) (2020)
36. Novendri, R., Callista, A.S., Pratama, D.N., Puspita, C.E.: Sentiment analysis of
YouTube movie trailer comments using Naı̈ve Bayes. Bull. Comput. Sci. Electr.
Eng. 1, 26–32 (2020)
37. Dey, S., Wasif, S., Tonmoy, D.S., Sultana, S., Sarkar, J., Dey, M.: A comparative
study of support vector machine and Naive Bayes classifier for sentiment analysis
on Amazon product reviews. In: 2020 International Conference on Contemporary
Computing and Applications (IC3A) (2020)
38. Li, Z., Li, R., Jin, G.: Sentiment analysis of danmaku videos based on Naı̈ve Bayes
and sentiment dictionary. IEEE Access (2020)
39. Dhola, K., Saradva, M.: A comparative evaluation of traditional machine learning
and deep learning classification techniques for sentiment analysis. In: 2021 11th
International Conference on Cloud Computing, Data Science & Engineering (Con-
fluence) (2021)
40. Rahman, R., Masud, M.A., Mimi, R.J., Dina, M.N.S.: Sentiment analysis on ben-
gali movie reviews using multinomial Naı̈ve Bayes. In: 2021 24th International
Conference on Computer and Information Technology (ICCIT) (2021)
41. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans.
Signal Process. 45, 2673–2681 (1997)
42. Nistor, S.C., Moca, M., Moldovan, D., Oprean, D.B., Nistor, R.L.: Building a
twitter sentiment analysis system with recurrent neural networks. Sensors 21, 2266
(2021)
43. Islam, M.S., Sultana, S., Roy, U.K., Al Mahmud, J., Jahidul, S.: HARC-new hybrid
method with hierarchical attention based bidirectional recurrent neural network
with dilated convolutional neural network to recognize multilabel emotions from
text. Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI) (2021)
44. Abid, F., Li, C., Alam, M.: Multi-source social media data sentiment analysis
using bidirectional recurrent convolutional neural networks. Comput. Commun.
157, 102–115 (2020)
45. Cai, Y., Huang, Q., Lin, Z., Xu, J., Chen, Z., Li, Q.: Recurrent neural network
with pooling operation and attention mechanism for sentiment analysis: a multi-
task learning approach. Knowl.-Based Syst. 203, 105856 (2020)
46. Turek, J., Jain, S., Vo, V., Capotă, M., Huth, A., Willke, T.: Approximating stacked
and bidirectional recurrent architectures with the delayed recurrent neural network.
In: International Conference on Machine Learning (2020)
47. Elfaik, H., et al.: Deep bidirectional LSTM network learning-based sentiment anal-
ysis for Arabic text. J. Intell. Syst. 30, 395–412 (2021)
48. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9,
1735–1780 (1997)
49. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: pre-training of
deep bidirectional transformers for language understanding. arXiv preprint
arXiv:1810.04805 (2018)
50. Taylor, W.L.: “Cloze procedure”: a new tool for measuring readability. J. Quart.
30, 415–433 (1953)
51. Habimana, O., Li, Y., Li, R., Gu, X., Yu, G.: Sentiment analysis using deep learning
approaches: an overview. Sci. China Inf. Sci. 63, 1–36 (2020)
52. Chauhan, P., Sharma, N., Sikka, G.: The emergence of social media data and
sentiment analysis in election prediction. J. Ambient. Intell. Humaniz. Comput.
12, 2601–2627 (2021)
53. Karimi, A., Rossi, L., Prati, A.: Adversarial training for aspect-based sentiment
analysis with Bert. In: 2020 25th International Conference on Pattern Recognition
(ICPR) (2021)
54. Hoang, M., Bihorac, O.A., Rouces, J.: Aspect-based sentiment analysis using
BERT. In: Proceedings of the 22nd NORDIC Conference on Computational Lin-
guistics (2019)
55. Zhou, Z., Hooker, G., Wang, F.: S-lime: stabilized-lime for model explanation. In:
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &
Data Mining (2021)
56. Garreau, D., Mardaoui, D.: What does LIME really see in images? In: International
Conference on Machine Learning (2021)
Persuasive Dialogue Corpus:
Graph-Based Approach Combining
Persuader and Persuadee Perspectives

Meghna Allamudi and Olga Scrivner(B)

Rose-Hulman Institute of Technology, Terre Haute, USA


{allamum,scrivner}@rose-hulman.edu

Abstract. Persuasion is omnipresent in our daily communication. As a


mechanism for changing or forming one’s opinion or behavior, persuasive
dialogues and their strategies have gained interest for developing intelli-
gent conversational systems. Given the complexity of this task, persua-
sion systems, especially dealing in conversations that require ‘no action’
by the user but rather a change in opinion or belief, require specialized
annotated corpora and the understanding of logical structure, natural
language, and persuasive strategies. The sparsity of available annotated
data and a wide range of proposed models make it challenging for devel-
oping strategic chatbots specific to user needs. To address these issues,
this study introduces a novel framework combining a replicable data col-
lection tool and a topic-independent annotation schema for designing an
argument-graph corpus and incorporating both persuader and persuadee
perspectives, essential for building smart conversational agents.

Keywords: Persuasion · Conversational agent · Annotation schema ·


Neo4j · Graph-based corpus · Chatbot

1 Introduction
Persuasion is the act of convincing a person to believe in or act on some-
thing, whether it is making a donation, voting for a particular candidate, or
following healthier habits [9]. Understanding persuasion is one of the keys for
building smart AI-powered applications (e.g. chatbot assistants, tutors, gaming
avatars) able to recognize and predict persuasive strategies [1,6,12]. Due to its
nature, persuasion is a complex natural language phenomenon, often inconsis-
tent or irrational, which makes it difficult to classify into underlying constructs.
Recently, identifying persuasion as well as improving persuasive dialogue systems
has become an active area of Natural Language Processing (NLP) and Natural
Language Generation (NLG) research [7,11]. Current topics are concentrated on
i) argument mining to gather data, ii) annotation schemes to categorize each
argument and make persuasion more normative and iii) models trained on these

schemes to predict persuasion technique. While most research focuses on one-


directional modeling (only persuader responses), there is a need to develop a hier-
archical [13] and bi-directional model involving both persuader and persuadee
perspectives (including a categorization of both responses from the persuadee
and those from the persuader). In addition to cognitive strategies (e.g. emotional
or logical appeal), persuasive dialogues can also be classified into a binary prag-
matic act: ‘action’ versus ‘no-action’. While the success of ‘action’ is determined
by the actual action, such as donating to charity [14], the ‘no-action’ dialogue
(e.g. changing a personal opinion) poses challenges. Such conversations may lead
to one of the following: i) a draw (where both parties remain unconvinced), ii) a
partial success (where one party agrees to certain points made by the opposing
party but not entirely to their opinion), or iii) a complete success (where one
party is entirely convinced that the opposing party is correct) [4]. With these
kinds of conversations, it is important for the persuader to understand the point
of view of the persuadee in order to make arguments that are better tailored
to their line of thinking, and there may not be a clear-cut definition of success
(no action required, but some sort of change in opinion). Finally, persuasive dia-
logues are often topic-dependent and the scarcity of available annotated corpora
for training creates another challenge for modeling conversational agents.
To address these challenges, our research focuses on designing i) a topic-
independent graph-based framework to encode persuasive ‘no-action’ strategies
from both perspectives (persuader and persuadee) and ii) a web-based corpus
building tool to facilitate data collection. Overall, this work aims to make the fol-
lowing contributions to the persuasive conversation research: i) revising existing
annotation schema for arguments [14] and extending it to incorporate both the
persuader and the persuadee, ii) developing a graph-based corpus using Neo4j
that can be used to train dialogue systems that engage in similar ‘no action’
persuasive conversations, and iii) introducing a reproducible process for collect-
ing dialogue data for training.
The paper is organized as follows. Section 2 presents the overview of related
work and existing persuasive dialogue corpora. Section 3 introduces our frame-
work for developing persuasive data collection tool and creating an annotated
graph-based corpus. In Sect. 4, we present our corpus iterations developed with
Neo4J. Section 5 discusses and provides an analysis of the data we have col-
lected as well as the graphs we have constructed. Finally, we present our future
directions and study limitations in Sect. 6.

2 Related Work

Persuasion has been extensively studied in many fields (e.g. linguistics, cogni-
tive science, gaming) and there are many different ways to represent persuasion.
From a cognitive perspective, the behavior of persuadee can be influenced by
using the following acts: command, convincing, and suggestion [2]. An example
of a command would be when the persuader is simply commanding the user to
donate using some online forum or by providing credit card details. Convincing
requires altering certain desires of the user. To add convincing arguments, the
persuader could describe some benefits to the persuadee that come with making
a donation, such as tax exemption for the persuadee. Lastly, suggestion involves
giving power back to the persuadee to decide if they want to make the donation.
Furthermore, there are multiple factors that can make a persuadee more sus-
ceptible to changing their mind [9], for example, acknowledging perceived social
norms, conforming to social pressure, and having an emotional attachment to the
topic being discussed. Persuasive dialogues can also be used in various domains,
such as emotional reasoning and gaming, where parents or game players must
strategize a way to get their child or another player to perform a task [10].
First, with many factors at play during an argumentative discussion and a
variety of contexts in which these discussions may take place, persuasion is very
difficult to capture. Current ways of representing persuasion in text include
establishing a set of categories for each argument. In this way, a model over
the categorized dataset can be created to classify new arguments. To test this
method, a simulation is often created where persuader arguments are generated
from the model and tested on persuadees that are played by real people (users
of the simulation). For example, a chatbot where the bot is the persuader is a
type of simulation used in recent work [3,4]. It is important to note, however,
that developing a chatbot with an artificial persuader requires having a balanced
corpus. It is even more helpful if said corpus categorizes each argument so that
the chatbot can pull an argument from a relevant category when listening to the
user and understanding their perspective [5,14]. It is also important to construct
a dataset for possible persuadee responses so that the chatbot can be more
responsive and can directly address the user’s points in a way that is appealing
to them [6]. This highlights the importance for creating an annotation schema
to represent the persuadee.
Second, there exist only a few available persuasive dialogue corpora. Data
collection and conversational corpus development are time-consuming and
often topic-dependent tasks. Despite the availability of several conversational
datasets [7], we focus our review on only two corpora that would serve as a
foundation for our framework, a combined approach of incorporating annotation
schemas along with graph-based connections between dialogue taken directly
from persuasive conversations. The first corpus has been developed from conver-
sations between assigned persuaders and persuadees recruited on the Amazon
Mechanical Turk platform [6,14], where the persuader was tasked with trying to convince
the persuadee to donate to a charity (‘action’ persuasive conversation). This
corpus has 1017 data entries with annotated schemes tagging each persuadee
response and persuader argument with a specific category type [6,14]. For exam-
ple, argument categories include logical appeal, emotion appeal, and credibility
appeal, while persuadee categories include request for organization information,
inquiries about donation procedure, and positive/negative reactions. This cor-
pus, however, does not encode the sequence of the conversations collected. It
solely provides categories for which to sort arguments and responses in an ‘action’
conversation (persuasive conversations aimed at convincing the persuadee to do
something). We are more interested in persuasive conversations aimed at chang-


ing the opinion or belief of the persuadee (‘no-action’ persuasive conversations)
in this work.
The second corpus is created as an argument graph [3–5]. Nodes represent
arguments made by people who believe that tuition for university in the
United Kingdom either should or should not be lowered (the goal is a change in the other person’s
opinion on tuition, thus a ‘no action’ persuasive conversation). Google Forms is
used as a collection tool. In this representation, each counterargument is entered
as a subnode (child) to the argument described in the parent node. An example is
shown in Fig. 1. The graph corpus provides 1,288 arguments and each argument
has an average depth of five arguments [4]. The main issue with this dataset
is that arguments and counterarguments are collected in a non-conversational
context. This means that it is missing certain semantics and sentiment expressed
in each response that we would usually see in a natural conversation between
two humans.

Fig. 1. Graph-based corpus: arguments B and C argue against the point made in their
parent, argument A. Argument D agrees with the point made in argument A, but is a
counterargument to the argument made in argument B. The graph shows the flow of
a potential, back-and-forth debate between two people [3, p.2].

Furthermore, each persuasive dialogue applies its own model architecture.


The first corpus uses a hybrid RCNN neural network along with sentence embed-
ding, context embedding, turn position embedding, sentiment analysis, and char-
acter embedding to classify each argument into a specific category [14]. This
model performs with an accuracy of 74.8%, and misclassification mainly occurs
between the emotional appeal and the personal story categories [14]. The findings
also reveal the importance of persuadee’s personality and persuasion strategies.
The strongest strategies used by the persuader are shown when they told a per-
sonal story related to the charity being discussed. The second argument-based
corpus uses two chatbots for model evaluation, and no annotations/categories are
used in the graph-based corpus. The first chatbot (the baseline chatbot) does
not use the graph-based corpus and acts as an echo bot, listening to the user’s
responses and sometimes providing surface-level arguments. The second chatbot
(the strategic chatbot), however, uses the graph-based corpus to produce more
dynamic arguments against the user’s point [4]. In addition to testing the two
chatbots with the number of changes in stance points (a point is awarded if the
user’s opinion is changed from what it originally was before the conversation
started), users are also asked whether they have been satisfied. It is hypothe-
sized that the users were not satisfied with how well they were understood and
how their points were addressed by the strategic chatbot because this chatbot
could have come across as more stubborn, ignoring how the user felt; this is
why consideration of the persuadee perspective is important.
In regards to the persuadee perspective, multiple models and their own
predefined categories based on the first corpus are used to represent per-
suadee responses in [6]. Persuadee categories include: ask-org-info, ask-donation-
procedure, positive reaction, agree donation, etc. A Transformer-based model
with extended CRF (Conditional Random Field) is used to build a persuasive
strategy recognition model. This model (Transformers-ExtCRF) proves to be
more accurate when categorizing persuader responses according to the defined
categories in [14]. Finally, HARGAN (Heterogeneous Argument Attention Net-
work) uses a graph tree to learn argument structure for both the persuader’s and
the persuadee’s stance predictions using the ChangeMyView dataset [8].
Thus, as previous research has shown, it is important that the persuadee
perspective be considered and annotated thoroughly. Additionally, combining
annotation methods with the graph-based corpus method can lead to a more
strategic and informative corpus that tracks arguments and counterarguments.

3 Corpus Development Framework


This section introduces three stages developed to support our framework:

1. The creation of a publicly available data collection tool to help gather con-
versational data,
2. The gathering of data with ‘no-action’ persuasive conversations,
3. The design of a graph-based schema for Neo4j, a graph database platform.

3.1 Data Collection Instrument

To achieve the first milestone, we have followed previous work by [4,5] and
designed our collection instrument specific to participating students who would
play a role of either the persuadee or persuader. It is important to note that
there is a lack of dialogue collection tools and original corpora made available in
the NLP community (without the use of web-scraping from sites such as Reddit),
so the developed data collection tool and our overall data collection process are
a contribution on their own. Figure 2 illustrates the process used to gather data
from students on campus.
As illustrated in the second step of the flowchart in Fig. 2, we developed a
data collection site where two students can chat simultaneously. The interactive
Fig. 2. Flowchart process for collecting conversational data from students.

web application, built with React (a front-end JavaScript library) and Firebase
(a back-end cloud service) platform, introduces a novel way to create and store
conversational data. Additionally, the web app code can be reproduced and used
for small and large datasets. The dialogues are stored via Firestore, a document
database in Firebase platform, and exported to CSV after each conversation.
Once the topic for conversation is determined, it is placed at the top of the
chatting web interface, where users can anonymously log in for each conversation.
Each dialogue is stored as a separate document within Firestore. Each data entry
(persuader or persuadee) has its unique ID, a timestamp, and a text field. When
data is exported to CSV, the user can manually add labels and annotations. The
user-interface of the web collection tool is shown in Fig. 3.

Fig. 3. Web application for data collection: a screenshot of chatting area where partic-
ipants can engage in synchronous conversation. The deployed app is available online -
https://dialogue-data-collect-site.netlify.app/.
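As a minimal sketch of the export step described in Sect. 3.1 (the collection name and field names are assumptions, since the paper does not list the exact Firestore schema), the stored dialogue entries could be pulled into a CSV with the Python Firestore client:

import csv
from google.cloud import firestore

db = firestore.Client()                              # authenticated against the Firebase project
rows = []
for doc in db.collection("dialogues").stream():      # "dialogues" is an assumed collection name
    entry = doc.to_dict()
    rows.append({"id": doc.id,
                 "timestamp": entry.get("timestamp"),
                 "text": entry.get("text")})

with open("conversation.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "timestamp", "text"])
    writer.writeheader()
    writer.writerows(rows)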

3.2 Data Gathering: Details of Study


The study, approved by IRB (RHS0383), took place in Spring 2022 at Rose-
Hulman Institute of Technology. Data has been collected from 42 undergraduate
STEM students (each conversation took place with exactly two students) yielding
a total of 21 conversations with 228 dialogues. For this study, we have chosen the
following conversation topic: “Is online learning beneficial academically?” since
every student has experience with both online and in-person learning, ensuring
that this topic can be discussed by any two students. The students participating
in each conversation chose one of the two positions to take on: persuader or
persuadee. The persuader would argue that online learning is beneficial and
the persuadee would engage in the conversation and state whether they are
completely convinced, partially convinced, or not convinced at all at the end of
the conversation. The position of persuadee and persuader was determined at


the beginning of each conversation. If one person was passionate about a specific
side, they could choose that position because the more authentic the arguments
are, the better. If neither participant had a preference, a coin flip was used to
determine the persuader role. Each conversation lasted 5 min (or longer if the
students were finishing up their last points). To help diversify the conversational
strategies, the persuader was provided with a list of talking points, illustrated
in Fig. 4. In addition to existing persuasive strategies and examples, we have
added additional categories based on suggestion from study participants, namely,
the scenario-based inquiries, experience-related inquiries, clarifying inquiries and
opening/closing remarks.

Fig. 4. Talking points for the persuader used during the study (Adapted from [14]).
Additional strategies include scenario-based inquiries, experience-related inquiries, clar-
ifying inquiries and opening/closing remarks. Each persuader can choose any of the
strategies.
3.3 Graph-Based Design


Several recent studies have shown that the graph-based and hierarchical tree-
structure representations of arguments are able to capture the various depths of
argumentation dialogues [4,8], portraying a variety of persuasive routes. In our
framework, there are a total of eleven persuasive strategies generating possible
paths from Opening to Closing. The schema in Fig. 5 represents the possible
orders that could be used by any given persuader in the conversation.

Fig. 5. Basic design schema for persuasive strategies: the schema represents possible
paths of persuasive strategies that can be taken to convince the persuadee that online
learning is more beneficial. In this case, starting from opening remarks, the persuader
could potentially use any other strategy to begin or follow their arguments.

The schema is used as a design outline for the Neo4j network database. It is
important to note that, given the current study limitations, no node for outcome
(draw, partial success, or complete success) is incorporated. In this schema, the
opening remarks strategy is kept in the center, as it is the beginning of every
conversation. Any of the other strategies (except for a clarifying inquiry or a
closing remark) can potentially follow an opening remark. From any of those
eight strategies can follow any other strategy including a closing remark if some
conclusion has been reached (whether it is a draw, partial success, or complete
success) and a clarifying question.
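One way to read the schema is as an adjacency map from each strategy to its allowed successors. The sketch below is only an illustrative encoding of Fig. 5, using the strategy names adopted in this study; the successors of a clarifying inquiry are not fixed by the schema and are an assumption.

CORE = [
    "Logical appeal", "Emotional appeal", "Credibility appeal", "Personal story",
    "Thought-provoking inquiry", "Scenario-based inquiry",
    "Experience-related inquiry", "Personal-related inquiry",
]
ALLOWED_NEXT = {
    # an opening remark can be followed by any of the eight core strategies
    "Opening remarks": list(CORE),
}
for strategy in CORE:
    # each core strategy may lead to any other strategy, a clarifying inquiry, or a closing remark
    ALLOWED_NEXT[strategy] = CORE + ["Clarifying inquiry", "Closing remarks"]
# assumption: a clarifying inquiry may be followed by any core strategy or a closing remark
ALLOWED_NEXT["Clarifying inquiry"] = CORE + ["Closing remarks"]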

4 Persuasion Graph Corpus: Neo4j Implementation


To develop a graph-based corpus, we used Neo4j database, Python, and Cypher
(Neo4j query language). Our code is made available to the public on GitHub:
https://github.com/MeghnaAllamudi/Neo4JDatabaseDev.
Our algorithms can be adjusted so that commonalities or differences between


certain conversations can be represented based on the researcher’s needs.
The Persuasion Neo4j corpus is developed using several iterations: i) per-
suader strategies only, ii) persuader and persuadee strategies, iii) sequence of
dialogues. The first iteration only includes the sequence of persuader strategies
used in all 21 conversations. The dataset is grouped by the conversation number
so that each sequence is carefully routed and traced in the graph (the LEADTO
links), illustrated in Fig. 6. Logical appeal seems to be a common persuasive
strategy in our dataset.

Fig. 6. Neo4j graph with persuader strategies in sequence. Logical appeal is the largest
node since many routes intersect with that persuasive strategy.
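A sketch of how such LEADTO links can be written with the Neo4j Python driver (version 5.x API) is given below; the connection details and the example strategy pairs are illustrative and not the repository’s exact code.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # illustrative

def link_strategies(tx, current, following, conversation):
    # MERGE keeps one node per strategy and adds a LEADTO edge tagged with the conversation number
    tx.run(
        "MERGE (a:Strategy {name: $current}) "
        "MERGE (b:Strategy {name: $following}) "
        "MERGE (a)-[:LEADTO {conversation: $conversation}]->(b)",
        current=current, following=following, conversation=conversation,
    )

with driver.session() as session:
    # consecutive persuader strategies within one conversation become one edge each
    session.execute_write(link_strategies, "Opening remarks", "Logical appeal", 4)
    session.execute_write(link_strategies, "Logical appeal", "Logical appeal", 4)
driver.close()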

The second iteration includes both the sequence of persuader and persuadee
strategies used in all 21 conversations. In this schema, the data is grouped by con-
versation number, and each route of strategies, regardless of position, is mapped
in this graph, as illustrated in Fig. 7.
Figure 8 demonstrates the graph for conversations 10 through 13 with a
closer look at logical appeal node, showing that many points are leading to that
strategy.
Figure 9 shows a zoomed-in version of iteration number one of the graph
that only maps conversation number 4 to show it in a more comprehensible
light. Here, we see that the persuader starts at opening remarks, circles around
logical appeal a few times, and then ends with getting a full user agreement.
Finally, the last iteration includes every single dialogue as a separate node and
each node also includes the strategy used.
Fig. 7. Neo4j graph with persuader and persuadee strategies in sequence.

Fig. 8. A close-up graph of the conversation numbers 10–13: many conversation points
lead to logical appeal strategy.
Fig. 9. Persuader strategies for the conversation number 4: persuader started with an
opening remark, uses three logical appeal arguments, and ends with a user agreement.

5 Discussion

Our current corpus includes 21 conversations yielding in total 228 dialogues


between students, each annotated with a specific persuader or persuadee argu-
ment/response category. Initially, we labeled each persuadee response with the
strategy used by the persuader beforehand and added “user response” to capture
all potential ways a response could be phrased to any given persuasive strategy.
However, we realized that each persuadee response could be placed in its own
category, which helped us draw more valuable conclusions regarding the behav-
ior of the persuadee. For example, in some cases, when an emotional or personal
story argument was used by the persuader, the persuadee responded with a
logical argument that counteracted the point previously made. This indicated
that the persuadees in these particular conversations were more logic-leaning
and that emotional arguments were not going to be as effective. Ultimately,
these conversations ended in a draw. Table 1 shows the frequency of each strat-
egy including response categories that were established during our study for
the persuadee. Each response category for the persuadee is labeled ending with
‘user response’, while the other categories are those used by the persuader.
Logical appeal is the most frequent strategy by persuaders (52), followed by
thought-provoking inquiry (13). Interestingly, persuadees responded using log-
ical appeal only in 38 dialogues, followed by thought-provoking strategy (9).
Note that a “Thought-provoking User Response” is a persuadee’s answer to a
thought-provoking inquiry by a persuader, while a “Thought-provoking Inquiry
User Response” is a thought-provoking question asked by the persuadee to the
persuader. In contrast, emotional appeal strategy is used only in 6 dialogues,
whereas personal story strategy is used in 8 dialogues.

Table 1. Breakdown of categories for each dialogue in a dataset of 228. Note that a
“Thought-Provoking User Response” is a persuadee’s answer to a thought-provoking
inquiry by a persuader, while a “Thought-Provoking Inquiry User Response” is a
thought-provoking question asked by the persuadee to the persuader.

Persuader category Count Persuadee category Count


Logical appeal 52 Logical appeal user response 38
Opening remarks 17 Opening remarks user response 12
Closing remarks 13 Closing remarks user response 7
Thought-provoking inquiry 13 Thought-provoking user response 9
Thought-provoking inquiry user response 11
Personal Story 8 Personal story user response 12
Experience-related inquiry 8 Experience-related user response 6
Emotional appeal 6 Emotional appeal user response 1
Credibility appeal 1 Credibility appeal user response 2
Personal-related Inquiry 1 Personal-related inquiry user response 0
Clarifying inquiry 1 Clarifying inquiry user response 3
Scenario-based inquiry 4 Scenario-based user response 4
User agreement 2

Additionally, we were able to graph the first thirteen conversations (each one
represented in a different color) and two had resulted in a complete success (see
Fig. 10). If we follow the yellow and blue lines that resulted in a complete success
with the persuadee agreeing completely that online learning is more beneficial
for students academically, we see that the majority of the arguments made in
each pipeline are logical, which shows that the persuadees engaged in both of
these conversations (two different students) are both logic-leaning. We have yet
to see a persuadee that is more emotion-leaning or responds better to personal
stories.
Finally, we present a map of clustered (by their annotations) arguments and
counterarguments to online learning along with their specific category annota-
tion. This will create a collection of various arguments and annotations that will
be more flexible and understanding of different persuadee perspectives. In the
future, with more data and more observations made on persuadee behavior, it
will also give us some more insight into the best strategies. For example, Fig. 11
is a snippet of what we wanted our end-goal to be before we finished collecting
data and constructing our final Neo4J iteration. Notice here that a logical user
response can be refuted with either a thought-provoking inquiry or a personal
story made by the persuader. It all depends on what kind of persuadee person-
ality the persuader is working with. Our strategy graphs should be able to help
determine what paths work for certain persuadees, especially when we begin to
build conversational agents that can use it.

Fig. 10. A graph of the first thirteen conversations: the graph shows paths to complete
success and any other paths (we did not encode partial success in this schema yet).
(Color figure online)

Fig. 11. Example graph with dialogue based on collected data. Arguments A, C and D
are made by the persuader, while argument B is made by the persuadee and in response
to argument A. In this case, depending on the kind of persuadee participating in
the conversation, a conversational agent would decide whether to take a more logical
approach or a more personal and emotional approach.

6 Conclusion
Currently our work in progress is focused on creating Neo4j graph-based
databases to highlight which argument strategies in ‘no-action’ persuasive con-
versations prove to be optimal and on developing more in-depth profiles of poten-
tial persuadees to help persuaders be more understanding of their perspective.
Thus far, we have collected 228 dialogues and plan on adapting our graphs and
annotation schemas to larger ‘no-action’ conversational datasets in the future.
These 228 dialogues have provided insights into which strategy seems to be the
most used and which sequence of strategies seems to work for the persuadees.
Moreover, the collected data has helped us develop an annotation schema for
the persuadees. This is an important contribution in determining how to inter-
pret success among persuadees and how to categorize their responses. It can
help us understand what strategies each persuadee is more likely to resonate
with and what kind of responses they will tend to use in their retort. Addition-
ally, we developed a reproducible conversational data collection instrument (https://github.com/MeghnaAllamudi/Thesis-Data-Collection) and a Neo4j corpus with the collected dataset (https://github.com/MeghnaAllamudi/Neo4JDatabaseDev).
Our study also has several limitations. The first limitation is the amount of
data we have collected. In order to create a more representative graph-based
corpus, we need more data points. However, with a smaller dataset, we have
been able to create an annotation schema for the persuadee perspective for this
specific ‘no-action’ persuasive conversation. We have also been able to construct
several algorithms for developing the Neo4J graphs presented in Sect. 4. These
are important contributions that we believe can now be adapted to a larger ‘no-
action’ persuasive conversation (this will be our future work). Additionally, the
pool of students consists of STEM-majoring students. Therefore, the demograph-
ics of our users are difficult to diversify, which means our data, to an extent, is
not diverse. Another limitation to our current project is that we have manually
labeled each dialogue. We understand that it is possible for an argument to be
partially logical and partially appealing to the emotions of the user, so in the
future it will be best if we change our categorization technique from a 1:1 map-
ping (from dialogue to category) to a scale-based categorization. For example,
a persuader argument could then potentially be 70% logical appeal and 30%
emotional appeal.
Finally, it is difficult to understand or measure the full impact of our data-
storing and representation approach at the moment. Therefore, our next step
is to evaluate our algorithms and data-labeling processes on a larger, existing
conversational dataset.

References
1. Benner, D., Schöbel, S., Janson, A.: Exploring the state-of-the-art of persuasive
design for smart personal assistants. In: International Conference on Wirtschaftsin-
formatik (WI) (2021)

2. Boella, G., Hulstijn, J., Van Der Torre, L.: Persuasion strategies in dialogue. In:
The ECAI Workshop on Computational Models of Natural Argument (CMNA
2004) (2004)
3. Chalaguine, L.A., Hunter, A.: Chatbot design for argument harvesting. Front.
Artif. Intell. Appl. 305, 457–458 (2018)
4. Chalaguine, L.A., Hunter, A.: Knowledge acquisition and corpus for
argumentation-based chatbots. In: Proceedings of the 3rd Workshop on Advances
in Argumentation in Artificial Intelligence, pp. 1–14 (2019)
5. Chalaguine, L.A., Hunter, A., Potts, H.W.W., Hamilton, F.L.: Impact of argument
type and concerns in argumentation with a chatbot. In: IEEE 31st International
Conference on Tools with Artificial Intelligence, pp. 1557–1562 (2019)
6. Chen, H., Ghosal, D., Majumder, N., Hussain, A., Poria, S.: Persuasive dialogue
understanding: the baselines and negative results. Neurocomputing 431, 47–56
(2021)
7. Duerr, S., Gloor, P.A.: Persuasive natural language generation - a literature review
(2018), 1–17 (2021)
8. Huang, K.-Y., Huang, H.-H., Chen, H.-H.: HARGAN: heterogeneous argument
attention network for persuasiveness prediction. In: Proceedings of the AAAI Con-
ference on Artificial Intelligence, vol. 35, no. 14, pp. 13045–13054 (2021)
9. Hunter, A.: Towards a framework for computational persuasion with applications
in behaviour change. Argument Comput. 9(1), 15–40 (2018)
10. Kacprzak, M.: Persuasive strategies in dialogue games with emotional reasoning.
In: Polkowski, L., et al. (eds.) IJCRS 2017. LNCS (LNAI), vol. 10314, pp. 435–453.
Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60840-2 32
11. Lipa-Urbina, E., Condori-Fernandez, N., Suni-Lopez, F.: Towards an automatic
generation of persuasive messages. In: Ali, R., Lugrin, B., Charles, F. (eds.) PER-
SUASIVE 2021. LNCS, vol. 12684, pp. 55–62. Springer, Cham (2021). https://doi.
org/10.1007/978-3-030-79460-6 5
12. Oduor, M., Alahaivala, T., Oinas-Kukkonen, H.: Software design patterns for per-
suasive computer-human dialogue: reminder, reward, and instant feedback. In: Lit-
tle, L., Sillence, E., Joinson, A. (eds.) Behavior Change Research and Theory:
Psychological and Technological Perspectives, pp. 47–67. Elsevier Science (2017)
13. Sakai, K., Higashinaka, R., Yoshikawa, Y., Ishiguro, H., Tomita, J.: Hierarchical
argumentation structure for persuasive argumentative dialogue generation. IEICE
Trans. Inf. Syst. E103D(2), 424–434 (2020)
14. Wang, X., et al.: Persuasion for good: towards a personalized persuasive dialogue
system for social good. In: ACL 2019 - 57th Annual Meeting of the Association for
Computational Linguistics, Proceedings of the Conference, pp. 5635–5649 (2020)
N-Gram Based Amharic Grammar Checker

Deepak Sharma(B) , Gurjeet Singh Mattu, and Sukhdeep Sharma

Desh Bhagat Foundation Group of Institutions, Ferozepur Road, Moga, India


[email protected]

Abstract. Objective: To investigate the applicability of all possible sentence-level grammatical features, along with grammatical agreement rules, for grammatical error detection and correction of textual documents without language restriction. Methods: This work used sentence-level tagged n-grams and word n-grams with the grammatical agreement rules of a language to develop a system. To demonstrate the language independence of the model, we used an Amharic language corpus taken from HaBiT (Harvesting big text data for under-resourced languages). Findings: This work indicates the relevance of sentence-level tag and word features for the detection and correction of grammatical errors. The results of this model outperformed the existing fixed N-gram approaches (i.e. Bigram and Trigram), and this can serve as a basis for further investigation by researchers. Novelty: This work indicates the weight of all possible sentence-level grammatical features in boosting the effectiveness of grammar proofing systems, particularly in a multilingual textual document setting.

Keywords: Natural language processing · Grammar checker · Language independent · N-gram · Sentence level

1 Introduction
Nowadays, the demand for producing high-quality texts is increasing. Automated tools that check and correct sentence errors contribute to improvements in writing high-quality texts [1]. This is one area of Natural Language Processing (NLP) concerned with creating proofing systems. Error-free text is achieved at different levels: morphology to define the structure of words, syntax to determine the composition of sentences, and semantics to determine the meaning [2].
Several research works have been conducted in this area; the work by [2] used an ontology to define a logical description of the rules of Arabic grammar and to generate all possible syntactically correct sentences for each word extracted from the target sentence. Afterwards, the target sentence is compared with all possible generated sentences to detect any grammatical mistakes, followed by a correction phase. This approach requires detailed grammatical and structural knowledge of a language to build the ontology as a knowledge source. As a result, this approach is not suitable for a model that operates without language restriction.
On the other hand, most research works used fixed n-gram (i.e. Bigram, Trigram, or a combination) tag sequence probabilities as the language model for grammatical error detection and correction of a language [3–6]. In these research works, the grammatical


features are extracted at Bigram and Trigram level and this limits extracted features to
train model about the grammatical properties of language. As result, the model is not
effective on detection and correction of grammatical errors.
Besides this, all the above-mentioned research works are designed for a particular language. Given the de facto multilingualism of textual content on the web, tools that operate beyond language barriers are increasingly required. In this research area, one work by [7] attempted a grammar checker for text written in any language using statistical data. In that work, the model used Trigram tag sequences to learn the grammatical properties of the language. As a result, this approach is not effective at grammatical error detection and correction, since the extracted grammatical features are limited to a maximum of Trigram tag sequences. This also increases the probability of tag feature sequences that are out of the training model. Finally, checking the grammatical correctness of a text by considering only three neighbouring tags ignores the normal scenario of natural languages, since in natural language the grammatical correctness of a text is checked by considering all tag sequences per sentence [8].

Table 1. Statistical details of language

Language   #Word        #Sentence   #Tag   Grammatically correct sentences   Grammatically incorrect sentences
Amharic    17,320,000   1,208,926   33     400                               300

Therefore, in this study all possible sentence-level grammatical features are used to teach the model the grammatical properties of a language. For this purpose, we formulated the following research question: evaluate the effectiveness of grammatical error detection and correction in a multilingual setting (i) using fixed n-gram (i.e. Bigram and Trigram) grammatical features, and (ii) using all sentence-level grammatical features (sentence-level n-grams). To demonstrate the model, we adopt textual documents written in under-resourced languages such as Amharic, Afaan Oromo and Tigrigna.

2 Methodology
2.1 Data Selection
For part-of-speech tagging we adopt TreeTagger, and to train this tool we used a corpus for each supported language from HaBiT (Harvesting big text data for under-resourced languages) [9, 10]. The training data for TreeTagger is described in Table 1. To train the language properties for the grammar detection and correction module, we used word n-gram and tag n-gram data sets; for this, 408,920 Amharic sentence tag and word n-grams are used, respectively. Finally, to train the grammatical disagreement error detection and correction module, all possible word-class agreement combinations and words are extracted from the tagged training corpus for each domain language. For testing, we adopt the practice of many researchers, which is to create a test set artificially by randomly replacing words in correct sentences. The statistical details of the grammatically incorrect text units used to evaluate our model are described in Table 1.

2.2 Proposed Model

As shown in Fig. 1, the proposed model is structured into seven main modules: language selection, sentence segmentation and indexing, POS tagging, POS label normalization, n-gram extraction, grammar error detection and grammar error correction. The model takes as input the text whose grammar should be checked, at any level (i.e. phrase, sentence, etc.). First, the language of the written text is identified so that further operations can be performed. Once the language is known, the text is split into a set of statements with index information and passed to POS tagging to assign each word to its word class category. This grammatical information is very important for grammatical error detection and correction. However, the POS training corpora used for POS tagging and for generating the language model may have different POS label representations, and to handle this complexity we include POS label normalization. The tagged sentence is split into a set of n-grams (i.e. fixed n-grams or sentence-level n-grams) for both word and tag sequence features. All possible tag n-grams of a sentence are checked against the language model, and if none of them is found, the sentence is considered a grammatically erroneous text unit. To provide corrections for detected grammatical errors, the proposed model uses the word n-gram language model.
The annotated corpus is used to generate a list of POS tags, and among the generated tag sequences some will be very common, while others will probably not occur at all. Commonly occurring sequences are considered correct; in other words, uncommon sequences will lead to errors. In this study we adopt TreeTagger, where words are represented by the word itself, its lemma and its POS tag [10]. We trained this tagger using a manually tagged training corpus. For TreeTagger we used POS-annotated training corpora having different POS representations, which would yield a different POS outcome for each TreeTagger corpus. For the best match, the POS labels of the test text and the training POS labels should have the same representation; otherwise the model is not effective. To handle this, we include a module that converts the original POS label of each word into a standard POS label representation.
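As a minimal sketch of this normalization step, assuming a simple dictionary mapping (the tag names below are illustrative, not the tagset actually used), tagger-specific labels can be mapped onto one standard representation before n-gram extraction:

```python
# Minimal sketch of POS label normalization. The mapping and tag names are
# assumptions for illustration only, not the authors' actual tagset.
NORMALIZATION_MAP = {
    "NOUN": "N", "NN": "N",
    "VERB": "V", "VB": "V",
    "ADVERB": "ADV", "RB": "ADV",
    "NUM": "NUMCR", "CD": "NUMCR",
    "PUNCT": "ENDPUNC",
}

def normalize_tags(tagged_tokens):
    """tagged_tokens: list of (word, tagger_specific_label) pairs."""
    # Unknown labels are passed through unchanged.
    return [(word, NORMALIZATION_MAP.get(tag, tag)) for word, tag in tagged_tokens]

if __name__ == "__main__":
    print(normalize_tags([("lamoc", "NOUN"), ("agebac", "VERB"), (".", "PUNCT")]))
```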

2.3 N-gram Extraction

We adopt fixed n-gram and sentence-level n-gram grammatical feature extraction techniques. In the case of fixed n-grams, the number of grammatical features extracted to learn the grammatical properties of a language is limited by the specified size of N (i.e. trigram and bigram), and this reduces the effectiveness of the model. Example 1 shows the context feature extraction technique with an Amharic text unit.

Each word of the given text unit is analyzed with the POS tagger and assigned its word class category as follows:

Fig. 1. Design of language independent Amharic grammar checker

The possible tag n-grams extracted from the above Amharic sentence are the following tag trigram patterns:

Similarly, all possible tag bigrams are also extracted to enhance the probability of grammatical error detection and correction for a given tagged text.
We incorporate a technique that extracts rich grammatical features from a given tagged text unit. In this technique, we extract all possible sentence-level n-gram grammatical features, and we call this the sentence-level tag n-gram grammatical feature

extraction technique. This enables the proposed model to learn more about the grammatical properties of the language.
All possible tag n-gram sequences extracted at the sentence level for Example 1 are listed below.

As shown above, all possible tag n-grams, from the highest tag n-gram sequence down to the tag bigram sequence, are extracted.
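The following sketch (an assumed implementation, not the authors' code) contrasts fixed n-gram extraction with the sentence-level extraction described above, producing every contiguous tag n-gram from the full sentence length down to bigrams; the tag sequence is hypothetical:

```python
# Minimal sketch of fixed versus sentence-level tag n-gram extraction.
def fixed_ngrams(tags, n):
    """Fixed n-gram extraction (e.g. n=2 for bigrams, n=3 for trigrams)."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def sentence_level_ngrams(tags, min_n=2):
    """All contiguous n-grams of `tags`, from full length down to min_n."""
    ngrams = []
    for n in range(len(tags), min_n - 1, -1):      # longest first
        for i in range(len(tags) - n + 1):
            ngrams.append(tuple(tags[i:i + n]))
    return ngrams

if __name__ == "__main__":
    # Hypothetical tag sequence for an Amharic sentence (tags assumed).
    tags = ["ADV", "N", "NUMCR", "N", "V", "ENDPUNC"]
    print(len(fixed_ngrams(tags, 3)), "trigrams")
    print(len(sentence_level_ngrams(tags)), "sentence-level n-grams")
```

Even for this short six-tag example, the sentence-level technique yields fifteen features against four trigrams, which is the effect the text above attributes to it.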

2.4 Grammar Error Detection

2.4.1 POS Tag Order Grammar Error Detection


This module is designed to detect grammatical errors based on the grammatical features extracted with either the fixed n-gram or the sentence-level n-gram feature extraction technique. In both cases, all possible non-checked tag n-gram features (i.e. tag n-gram sequences containing the last tag of the text unit) are looked up in the target tag n-gram language model, and the checking process starts from the highest non-checked tag n-gram and moves to lower non-checked tag n-gram sequences (i.e. the tag bigram). Once all possible tag n-grams are checked, if none of them is found, the given text unit is considered a grammatical error. Example 2:

In Example 2, all possible tag n-gram sequences are extracted to enhance the grammatical mistake detection capability of the proposed model. However, for efficiency we only extract tag n-gram sequences which contain the last word's tag information.

Grammar checkers are interactive systems which check the grammatical errors of a given text unit before any further input word. In Example 2, grammatical error detection and correction has already been done for the tag n-gram sequences without the last tag "<ENDPUNC>" (i.e. "<ADV><N><NUMCR><N><V>", "<ADV><N><NUMCR><N>", etc.). For the given text unit, the grammar checker module validates each non-checked tag sequence by looking it up in the target tag n-gram language model. It first checks the highest non-checked tag n-gram (i.e. the hexagram) and, if it is not found, the module further checks the availability of the lower non-checked n-grams. This checking process continues down to the non-checked bigram grammatical feature, and if none of them is found in the language model, the "last tag" is considered suspicious grammatical information.
We use the word n-gram model to formulate text unit variations by replacing the last word of the original text unit with all available words. After the grammatically erroneous word is replaced by the other variation words extracted from the language model, the relative probability is computed. When the relative probability of a newly formulated text unit is higher than that of the original text, the proposed model verifies that the previously detected grammatical error is a real error.
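A minimal sketch of this detection step is shown below, assuming the tag n-gram language model is simply a set of attested tag n-grams; the tags and model contents are illustrative only:

```python
# Minimal sketch of tag-order error detection: only n-grams containing the
# last tag are looked up, from the longest down to the bigram; if none is
# found, the last tag is flagged as suspicious. Model contents are assumed.
def detect_tag_order_error(tags, tag_ngram_model):
    """Return True if the tag sequence is flagged as a grammatical error."""
    last = len(tags)
    for n in range(last, 1, -1):                  # longest n-gram first
        candidate = tuple(tags[last - n:last])    # n-gram ending in the last tag
        if candidate in tag_ngram_model:
            return False                          # sequence attested -> no error
    return True                                   # nothing found -> suspicious

if __name__ == "__main__":
    # Hypothetical model containing a few attested tag n-grams.
    model = {("ADV", "N"), ("N", "NUMCR", "N"), ("N", "V", "ENDPUNC")}
    print(detect_tag_order_error(["ADV", "N", "NUMCR", "N", "V", "ENDPUNC"], model))
    print(detect_tag_order_error(["ADV", "N", "NUMCR", "V", "N", "ENDPUNC"], model))
```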

2.5 Grammatical Disagreement Error Detection

2.5.1 Adjective-Noun Disagreement


Any natural language has adjectives that modify nouns. These adjectives may mark the number (singular and plural) and gender (feminine and masculine) of the noun they modify. When an adjective marks the number and gender of the noun, the marker should agree with the number and gender of the noun. The position of adjectives in a given sentence may differ across natural languages, but in any language an adjective should agree in number and gender with the noun it modifies.
Example3: - “ ”/“ tlqlamoc” “big cows”.
The possible pairs of adjective and noun tag tokens are extracted from the same text unit, and to check their agreement, their occurrence is looked up in the target POS agreement training set, as shown in Fig. 2. When a pair of tokens is not found in the training dataset, it is considered a grammatical disagreement error. Example 3 is considered grammatically incorrect because it contains a singular marker on the adjective while the noun it modifies is a plural noun.
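A minimal sketch of this lookup, assuming the agreement training set is stored as a set of (adjective, noun) token pairs, is given below; the romanized tokens are hypothetical placeholders for the Amharic forms:

```python
# Minimal sketch of adjective-noun agreement checking: adjective/noun pairs
# from the same text unit are looked up in a POS agreement training set and
# unattested pairs are flagged. Data structures and tokens are assumptions.
def check_adj_noun_agreement(tagged_tokens, agreement_pairs):
    """tagged_tokens: list of (word, tag); agreement_pairs: set of (adj, noun)."""
    errors = []
    adjectives = [w for w, t in tagged_tokens if t == "ADJ"]
    nouns = [w for w, t in tagged_tokens if t == "N"]
    for adj in adjectives:
        for noun in nouns:
            if (adj, noun) not in agreement_pairs:
                errors.append((adj, noun))        # unattested pair -> disagreement
    return errors

if __name__ == "__main__":
    # Hypothetical romanized tokens and training pairs for illustration only.
    training_pairs = {("tlq", "lam"), ("tlqlqu", "lamoc")}
    print(check_adj_noun_agreement([("tlq", "ADJ"), ("lamoc", "N")], training_pairs))
```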

2.5.2 Adverb-Verb Disagreement


An adverb is defined as the category which is to a verb what an adjective is to a noun. In most natural languages, an adverb usually modifies the first verb that comes after it. In many languages, adverbs are classified into subclasses such as adverbs of time, place, circumstance, etc. Time adverb and verb disagreement is one of the common grammatical errors in languages, since the correct type of adverb should be used with the verb and vice versa.
Consider the Amharic sentence “htEbemiqeTlewsamnt ‘agbac’”. When adverb and verb tagged tokens occur in the tagged test text unit, it is fed to this module for agreement checking. The adverb-verb agreement between words in a text is checked using word co-occurrence information. We therefore extract all adverb and verb tag tokens in the text and build a language model with their relative probabilities. This

language model is used to look up the adverb and verb agreement in the test tagged text; the agreement is valid when the pair is found in the model. In the above sentence, the adverb and verb are not found in the training model and are therefore considered an adverb-verb agreement error.

2.6 Grammar Error Correction

2.6.1 Tag Order Grammatical Error Correction


Possible candidate texts are generated using the grammatical error detected in the original text as a clue, particularly the n-grams that include the grammatically mistaken word, since the tokens in the middle of the n-gram have already been checked and replaced earlier by the most probable alternatives. The proposed approach is interactive at any text level. Once a mistaken word is detected in the original text, new text units are generated from the word n-gram language model by replacing the grammatically mistaken word with all its resembling words extracted from the lexicon.
Let us take a wrong Amharic sentence:

When the given Amharic text is analyzed by the POS tagger:

To detect the grammatical mistakes of the above tagged text, all possible tag n-gram features are extracted with either the fixed or the sentence-level extraction technique, and their occurrence is checked in the target n-gram language model. According to the above example, the last tag n-gram order is detected as an error (V and N). As a result, to suggest grammatical corrections, the first step is to generate newly formulated candidate text variations as follows:

All of the newly formulated candidate texts are analysed with the POS tagger, and each word in a text is labelled with its corresponding word class, as shown below.

In Example 4, each word of the candidate texts is assigned to its word class category, and our model further requires the selection of the top relevant suggestions.
To suggest texts nearly similar to what the user wants to write, we rank the candidate tagged suggestions based on their relevance to the original text. We compute the probability of each tag

n-gram sequence extracted from each tagged candidate text via the tag n-gram language model as follows:

\[
\mathrm{relativeProb}(\text{tag-}n\text{-gram}) = \frac{\text{occurrence of the tag-}n\text{-gram}}{\sum \text{occurrences of all target tag-}n\text{-grams}} \tag{1}
\]

where relativeProb(tag-n-gram) is the probability of the tag n-gram in the target tag n-gram language model, the numerator is the frequency of the tag n-gram in the target tag n-gram language model, and the denominator is the sum of the frequencies of all tag n-grams in the target language model.
Finally, the proposed model provides grammatical error suggestions to the user if and only if the probability of one of the candidate texts is greater than that of the original grammatically mistaken text. To provide grammatical error suggestions, there are two alternative ways: in the case of a fully automated grammar checker system, the grammatically mistaken text is replaced with the suggestion text having the highest relative probability; in the case of an interactive system, the top K (where K is an integer) ranked suggestion texts are offered to the user based on their relevance to the original grammatically erroneous text. In this investigation, the right to choose the correct suggestion text is given to the end user, and this resolves the problem of false positives.
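The sketch below illustrates this ranking step under two assumptions: the tag n-gram language model is a frequency dictionary, and a candidate's overall score is the average of the relative probabilities (Eq. 1) of its tag n-grams, since the paper does not specify how per-n-gram probabilities are combined:

```python
# Minimal sketch of candidate ranking with Eq. (1). The frequency dictionary,
# tags and averaging of per-n-gram probabilities are assumptions.
from collections import Counter

def relative_prob(ngram, tag_ngram_counts, total):
    """Eq. (1): frequency of the tag n-gram over all tag n-gram occurrences."""
    return tag_ngram_counts.get(ngram, 0) / total

def score(tags, tag_ngram_counts, total, n=3):
    """Average relative probability of a candidate's fixed tag n-grams."""
    ngrams = [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    if not ngrams:
        return 0.0
    return sum(relative_prob(g, tag_ngram_counts, total) for g in ngrams) / len(ngrams)

def rank_candidates(original_tags, candidate_tag_seqs, tag_ngram_counts, k=3):
    total = sum(tag_ngram_counts.values())
    baseline = score(original_tags, tag_ngram_counts, total)
    ranked = sorted(candidate_tag_seqs,
                    key=lambda c: score(c, tag_ngram_counts, total),
                    reverse=True)
    # Suggest only candidates that beat the original, and at most the top k.
    return [c for c in ranked if score(c, tag_ngram_counts, total) > baseline][:k]

if __name__ == "__main__":
    counts = Counter({("ADV", "N", "V"): 5, ("N", "V", "ENDPUNC"): 4,
                      ("ADV", "V", "N"): 1})
    original = ["ADV", "V", "N", "ENDPUNC"]
    candidates = [["ADV", "N", "V", "ENDPUNC"], ["ADV", "V", "V", "ENDPUNC"]]
    print(rank_candidates(original, candidates, counts))
```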

2.6.2 Grammatical Disagreement Error Correction


In some cases, the input text's tag sequence may be correct while there is still a grammatical disagreement, and for such cases the proposed model incorporates a grammatical disagreement error correction module. As stated earlier, a grammatical disagreement error is detected by checking the occurrence of tag token pairs against the POS agreement training model, as shown in Fig. 2. Once the pairs of tag tokens extracted from a single text unit are not found in the grammatical agreement training dataset, it is considered a grammatical disagreement error. This module uses this information as a clue, and to propose a correction for this error, all possible variation words that co-occur with one of the tag tokens are extracted from the grammatical agreement training set.
The sentence “htEbemiqeTlewsamnt ‘agbac’” is detected as a grammatical disagreement error, specifically as an adverb and verb agreement problem in time. Hence, the tokens that frequently co-occur with either the adverb or the verb are extracted. All possible tokens are then ranked according to their co-occurrence frequency, and in this example the grammatical error is corrected as either “htEbemiqeTlewsamnt tagebalec” or “htEbalefewsamnt agebac”. In these two candidate suggestions, the adverb and verb agree in time.
We implement two kinds of word class agreement checking: the agreement between an adverb and a verb, as well as the agreement between an adjective and a noun. However, our model is flexible enough to be adapted to other word class agreement checks as long as a training dataset is available.
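A minimal sketch of this correction step is shown below, assuming the agreement training set is reduced to a table of (adverb, verb) co-occurrence counts; the romanized tokens are hypothetical placeholders:

```python
# Minimal sketch of disagreement correction: words that co-occur with one
# member of the failing pair are ranked by co-occurrence frequency as
# replacement candidates for the other member. Data is assumed, not real.
from collections import Counter

def suggest_agreement_fixes(pair, cooccurrence_counts, top_k=2):
    """pair: (adverb, verb) flagged as a disagreement error.
    cooccurrence_counts: Counter over (adverb, verb) pairs from training data."""
    adverb, verb = pair
    verb_candidates = Counter({v: c for (a, v), c in cooccurrence_counts.items()
                               if a == adverb})
    adverb_candidates = Counter({a: c for (a, v), c in cooccurrence_counts.items()
                                 if v == verb})
    return (verb_candidates.most_common(top_k),
            adverb_candidates.most_common(top_k))

if __name__ == "__main__":
    # Hypothetical romanized adverb/verb co-occurrences for illustration only.
    counts = Counter({("bemiqeTlewsamnt", "tagebalec"): 7,
                      ("balefewsamnt", "agebac"): 5})
    print(suggest_agreement_fixes(("bemiqeTlewsamnt", "agebac"), counts))
```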

Fig. 2. Sentence level tag N-gram grammatical error detection

3 Results and Discussion

We adopt recall, precision, and F-measure to measure the effectiveness of the proposed model. Three different experiments are used to observe the effectiveness of the proposed model in detecting and correcting grammatical mistakes with different features. Experiment 1 is conducted with the fixed n-gram grammatical feature extraction technique. Experiment 2 is conducted with the sentence-level n-gram grammatical feature extraction technique. Finally, Experiment 3 combines the sentence-level n-gram feature extraction technique with grammatical agreement features (i.e. Noun-verb, Adverb-verb and Adjective-noun) to observe the effectiveness and performance of the proposed model.
From these experimental results, we observe that the proposed model, combining the sentence-level grammatical feature extraction technique with grammatical agreement features, achieves a better result for grammatical error detection. This is because the sentence-level grammatical feature extraction technique is able to extract more features even for short text inputs, which enables the proposed model to learn more about the grammatical properties of the target language. In the case of fixed n-grams, however, fewer grammatical features are extracted, which reduces the grammatical capability of the model. Grammatical errors are not only tag order problems; there are also grammatical disagreement mistakes in terms of gender, number, etc. To enhance the effectiveness

of detecting such grammatical errors, the grammatical disagreement checker module via sentence-level grammatical feature extraction is incorporated, and this enables Experiment 3 to achieve a more promising result than the others.
To evaluate the effectiveness of the proposed model in providing grammatical corrections for detected grammatically mistaken text inputs, we evaluate the above three experimental techniques. Experiment 3 incorporates the sentence-level grammatical feature extraction technique together with the grammatical disagreement checker module and achieves a better result in providing grammatical corrections. As discussed previously, the combination of these two features enables the proposed model to learn more about the grammatical properties of the target language.

4 Conclusion
In this study, we design and implement automatic grammatical error detection and correction without any language restriction. To do this, we incorporate seven high-level modules: language selection, sentence segmentation and indexing, POS tagging, POS label normalization, n-gram extraction, grammar error detection and grammar error correction. Both supervised and unsupervised corpora of the domain language were collected and used to train the proposed grammar checker. To evaluate the proposed approach, erroneous test sets were created by exchanging the positions of words in a random manner, due to the lack of an actual, well-organized test set.
In this study we conducted three experiments: Experiment 1 aims to evaluate the effectiveness of the proposed model with the fixed n-gram feature extraction technique. Experiment 2 was conducted to evaluate the effectiveness of the proposed model with the sentence-level n-gram feature extraction technique, and Experiment 3 was conducted to evaluate the proposed model using the combination of the sentence-level n-gram feature extraction technique and the disagreement rules. The experimental results indicate that the proposed model via Experiment 3 performs better for both grammatical mistake detection and correction. In this investigation, for demonstration purposes the number of supported languages is limited, but our model is flexible and modular, so it is possible to extend it to other languages. Therefore, future work is directed towards extending the model to other languages and improving the performance of the system.

References
1. McCarthy, K.S., Roscoe, R.D., Likens, A.D., McNamara, D.S.: Checking it twice: does adding
spelling and grammar checkers improve essay quality in an automated writing tutor? In:
Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds.) AIED 2019.
LNCS (LNAI), vol. 11625, pp. 270–282. Springer, Cham (2019). https://doi.org/10.1007/
978-3-030-23204-7_23
2. Chouaib, M., Tragha, A., El Habib, B.A., Almalki, T.: An innovative approach to autocor-
recting grammatical errors in Arabic texts. J. King Saud Univ. Comput. Inf. Sci. 33, 476–488
(2019). https://doi.org/10.1016/j.jksuci.2019.02.005
3. Jindal, L., Singh, H., Sharma, S.: A framework for grammatical error detection and correction
system for Punjabi language using stochastic approach. EAI Endorsed Trans. Scalable Inf.
Syst. 8 (2021). https://doi.org/10.4108/eai.27-4-2021.169421

4. Leekha, J., Vijay, R., Sanjeev, K.: N-gram statistical grammar checker for an Indian language.
Int. J. Adv. Sci. Technol. 29, 3098–3106 (2020). http://sersc.org/journals/index.php/IJAST/
article/view/4541
5. Nahid, H., Salekul, I., Mohammad, N.: Development of Bangla spell and grammar checkers:
resource creation and evaluation. Inst. Electr. Electron. Eng. 9 (2021). Digital Object Identifier.
https://doi.org/10.1109/ACCESS.2021.3119627
6. Riazur, R., Tarek, H., Sadekur, R., Shaon, B., Mohammad, S.: An investigative design based
statistical approach for determining Bangla sentence validity. Int. J. Comput. Sci. Netw. Secur.
16, 30–37 (2016). http://paper.ijcsns.org/07_book/201611/20161106.pdf
7. Verena, H., Timo, R.: LIS Grammar Checker: Language Independent Statistical Gram-
mar Checking (2009). https://www.ru.is/kennarar/hrafn/students/MasterThesis_HenrichRe
uter.pdf
8. Debela, T.: A rule-based Afan Oromo grammar checker. Int. J. Adv. Comput. Sci. Appl.
(IJACSA) 2(8) (2011). https://doi.org/10.14569/IJACSA.2011.020823
9. Suchomel, V., Baisa, V., Jakubíček, M., Kovář, V., Nevěřilová, Z., Rambousek, A.: HaBiT -
harvesting big text data for under-resourced languages. Habit-project.eu. (2014). http://habit-
project.eu/. Accessed 20 Nov 2019
10. Helmut, S.: Probabilistic part-of-speech tagging using decision trees. In: International Con-
ference on New Methods in Language Processing, pp. 44–49 (1994). http://citeseerx.ist.psu.
edu/viewdoc/similar?doi=10.1.1.28.1139&type=cc
The Internet of Things as a Tool Towards Smart
Education: A Systematic Review

Abdulsalam K. Alhazmi1(B) , Ezzadeen Kaed2 , Fatima Al-Hammadi3 ,


Nasr Alsakkaf1 , and Yousra Al-Hammadi4
1 Faculty of Engineering and Computing, University of Science and Technology, Main Campus,
Aden, Yemen
[email protected], [email protected]
2 Faculty of Informatics and Computing, UniSZA, Terengganu, Malaysia
[email protected]
3 International Programmes, University of Science and Technology, Main Campus, Aden,
Yemen
[email protected]
4 IT Department, YonderKen UK, London, UK
[email protected]
http://www.ust.edu/

Abstract. IoT’s adoption has grown exponentially across a vast number of indus-
tries; with each industry that IoT is applied in being characterised by a unique set
of prospects and challenges. In education, advancements in new technologies,
guided by the advent of artificial intelligence and IoT, have seen the learning environment transform from traditional-based learning to digital-based learning. Leveraging all of the big data generated from IoT applications is a process that education institutions could adopt to address the challenges of implementing IoT solutions, as well as challenges in the education industry. We surveyed the litera-
ture to identify new developments, trends and applications of IoT in the education
industry. To achieve this we used the Scopus database and retrieved related articles
using key words, for example IoT in education, IoT and education, IoT application
in education, IoT in teaching and learning, IoT and online learning, Implication
of IoT in Education. IoT and distance learning, IoT and monitoring in education.
We established that the IoT’s application in the education field has expanded in
recent years, where applications have already been adopted that have been devel-
oped in different institutions globally. The results of the content analysis were
classified into four main categories, namely, Application, Potential, Factors and
Challenges of IoT in education. The directions and recommendations for future
work concerning IoT’s implementation in education have been presented.

Keywords: IoT applications · IoT in education · Smart education · Smart campus · E-learning · Education management · Industry 4.0


1 Introduction
1.1 IoT Overview

The Internet of things (IoT) is defined as a set of electronic devices that are connected via
the internet or intranet. Such devices and objects include sensors, electronics and soft-
ware. This technology enables connection between devices (things), people and environ-
ments in order to collect data by embedding actuators and sensors, then transmitting such
data to specialised applications to create useful and actionable information. The extant
literature has adopted several terms to define IoT, for instance, the Internet of Anything
(IoA), Internet of Everything (IoE), Web of Things, Industrial Internet of Things (IIoT),
or Machine-to-Machine communication. IoT affects numerous spheres of life including
education, social, health, transport, communication, environmental monitoring, business
and society.
The concept of IoT is deemed to be a gateway to the digital society. IoT’s innova-
tions will increase based on continuous advances in cloud computing, nanoelectronics,
communications, sensors, big data, as well as smart objects. IoT is a particular aspect of
the Internet that permits the connection of humans to each other, connection of human
and things, in addition to connections between things. Consequently, the emergence of
the IoT has facilitated the establishment of giant intelligent systems.

1.2 IoT in Education

Novel technologies’ continuing advancement, guided by the advent of artificial intelli-


gence (AI), the IoT and robotics, among others, has led to the learning environment’s
transformation from traditional-based learning to digital-based learning [1]. The IoT will
connect people, processes, devices and data, thereby enabling stakeholders in education
to find simpler means of transforming the data collected from sensors and portable
devices into valuable information, thus undertaking significant actions based on that
information [2].
According to [3], the IoT is significant in education as a means of enhancing students’
participation, motivation, attention, immediate assessment and feedback. In the educa-
tion sector, IoT is transforming how teachers and students collaborate. For instance, the
IoT enables collection of big data in the school context, which may be used to monitor
and track activities. Nevertheless, given that many of these devices are connected to the
internet, challenges arise relating to the source of the internet, power sources, security,
privacy and cost [4]. The IoT is enhancing distance learning because barriers are being
eliminated, including physical ones. Smart classrooms, intelligent classrooms, or smart
learning, is enhancing students’ class experiences, because they are able to learn from
virtual reality (VR), animations and share resources online.
The COVID-19 pandemic has caused fast-paced changes in education, forcing ICT’s
integration in higher education. Regardless, the IoT remains at the nascent stage of
incorporation into the education system, with the effect of its implementation far from
being thoroughly comprehended [2]. Concurrently, the IoT will lead to multiple changes
in the education sphere, for example technological changes (Cloud/Fog Computing,
instructional technologies and mobile apps), education reform, changes in teaching and

learning, practical and experimental changes, campus changes, changes in security and
confidentiality, quality and ethics, changes of a financial nature, in addition to other types
of changes [2].
The IoT’s adoption in various fields has helped to revolutionise them. One such field
is higher education, which has begun adopting the IoT as a means of enhancing learning,
training, management, experimentation and so forth [1]. However, the IoT’s adoption
and its applications remains at a growth stage across industries. Given that the IoT is
at a nascent stage of widespread implementation in the education field, it is significant
to investigate the challenges and variables affecting its implementation, future potential
use, as well as overall benefit in education.

2 Methodology
This paper aims to review the literature with a focus on new developments, trends and
the application of the IoT in education. Thus, a range of IoT-focused literature was
searched, with relevant peer reviewed research articles being retrieved from the Scopus
database. From over 100 papers retrieved, only 54 fulfilled the inclusion criteria, namely
being published between 2017 and 2021, having the keywords in the title, as well as
being peer reviewed. In accordance with this paper’s scope, a range of related keywords
and phrases were applied to retrieve the related articles, with keywords being “IoT in
education”, “IoT and education”, “IoT application in education”, “IoT in teaching and
learning”, “IoT and online learning”, “Implication of IoT in Education”, as well as “IoT
and distance learning”. Subsequently, thematic analysis was applied to identify key
emerging themes [5], followed by summarising and organising of data [6]. The selected
papers data were broadly divided into the themes of “potential”, “challenges”, “factors”
and “application”.

3 Review
The IoT concerns technology transformation in various aspects. Smart cities, smart
homes, smart transportation and smart industries are such transformations stemming
from the IoT. Considerable crucial research and investigations have been undertaken with
the aim of enhancing technology via the IoT. Even so, substantial challenges and issues
remain that require resolution to attain the IoT’s full potential. This research presents a
systemic review of recent academic research published in scientific journals concerning
the IoT’s application in the education field. The paper’s results have been classified into
four sections: the technologies and application; benefits and potential; challenges, as
well as factors. Finally, the paper summarises the literature review directions, while also
providing recommendations regarding how future research could elaborate on the trends
and research developments identified in the extant literature reviewed.

3.1 IoT Application in Education


Various studies have been conducted in relation to IoT-based applications adoption in
the educational context, which may be classified into three categories. First, IoT appli-
cations used in smart classrooms and campuses. Second, IoT systems for the teaching
environment and management constructions. Third, IoT applications used in e-learning.

Smart Campus is one of the newest concepts linked to the IoT’s application to the
education field. A few related terms have been adopted in the extant literature, including
smart classroom, smart library and smart books, all of which pertains to IoT’s integration
with campus related technologies. For example, the IOT may be used to collect mass
data through wearable devices, sensors and actuators, embedded sensors and QR codes.
These technologies can promote and enable the smart campus through the IoT being
adopted to manage related functions of university campuses, including temperature-
controlled devices, light power, security cameras and access to buildings, simplified
access control, enhanced security, classroom monitoring and notification, automated
attendance processes, integration of the IoT and open data in school books, smart boards,
smart libraries and numerous other applications. Meanwhile, the IoT offers convenience
in relation to future smart campus design, construction, teaching, as well as overall
management.
The IoT enhances how schools monitor students’ behaviour, performance, loca-
tions, health and social behaviours, with applicability in the use of beacon chips as
a form of student identity. This technology ensures simplification of facial identifica-
tion challenges through the use of biometrics. Additionally, the IoT guarantees student
monitoring activities’ accuracy, thereby ensuring smart school applications [7]. Smart
operations management through the IoT can reduce costs for a sustained campus, because
it enables smarter service delivery. Higher education institutions are privileged by devel-
oping smart solutions that are achievable through smart services [3, 8–10]. Concerning
teaching and learning, the IoT’s most recent applications in the education field enable
the simplification of pedagogical methods. This technology enhances teacher-student
relationships, providing the teacher with novel ways of realising students’ deep learning
abilities. Schools are concerned with comprehending how the teaching environment’s
overall intelligence may be enhanced in order to strengthen learners’ outcomes.
Currently, it is economical to manage students, staff, researchers and lecturers
through sharing data and functionalities, the coexistence of old and new systems, in
addition to the elimination of major drawbacks that challenge school management. Tech-
nologies including sensor modules, micro controllers’ boards, digital payment services
and other infrastructure, enhance schools’ sustainability. IoT improves the traditional
education system through an innovative technology-guided learning strategy. In this
case, students, teachers and staff may collaborate to share ideas, materials, projects,
screens and communications. This ensures transversal combinations across actors along
the education value chain, where a common language is tangible for all stakeholders.
Flipped classrooms and online classes are further means through which students engaged
in long-distance learning can collaborate online.
The research reported applications pertaining to the IoT and eLearning, including
IoT technologies adopted for enhancing online learning through IoT data driven analy-
sis, gamification for making the learning experience engaging and effective, as well as
intelligent systems combining IoT, AI and VR tools. This assists instructors with super-
vising the students while presenting lessons and during their exams. Overall attempts
to fully automate the learning process have been made by connecting IoT devices with
cutting-edge learning technologies. The summary of IoT applications in education is
presented in Table 1.

Table 1. IoT applications in the education field

Application Description Reference


Smart Campus/Smart Adopting IOT to collect mass data [9, 11–18]
Library/Smart Classroom using wearable devices, sensors and
actuators, embedded sensors, as well
as QR codes. These technologies can
promote and enable smart campuses
through the IoT being used to manage
related functions of the university
campuses, including temperature
controlled devices, light power,
security cameras and building access,
simplified access control, enhanced
security, classroom monitoring and
notification, automated attendance
processes, integration of the IoT and
open data in school books, smart
boards, smart libraries to keep track of
library books, alongside numerous
other applications
Smart Teaching/Learning IoT-based technologies can enable the [4, 5, 13, 16, 19–25]
gap between the virtual and physical
world to be bridged, as well as
innovation of teaching and learning
strategies and methods. This includes
enhanced user interface, support for
Human–Robot Interaction (HRI),
verification and real-time monitoring,
flipped learning, real-time information
and feedback for students during
learning activities, IoT assessment of
learning processes to collect and track
students’ performance, smart LMS
enabling delivery and evaluation of
instructional content, as well as
tracking of time, student behaviours
and interactions
E-Learning IoT and eLearning-related [26–31]
applications, including IoT
technologies, may be adopted as a
means of enhancing online learning,
for example through IoT data analysis,
gamification to make the learning
experience engaging and effective, as
well as intelligent systems combining
IoT, AI and VR tools to assist the
instructors with supervising the
students while presenting lessons and
during exams. Moreover, instant
feedback between teachers and
students is possible through dynamic
digital signal modulation; this enables
insertion of signals for brain
stimulation, thus providing attention
control to create the optimal learning
pattern for each person. Overall
attempts to fully automate the learning
processes have been made, through
connecting IoT devices with
cutting-edge learning technologies

3.2 IoT Benefits and Potential in Education


Generally, IoT is considered to enhance students’ performance. The traditional edu-
cation system typically fails to capitalise on the needs of each student. Activity-based
performance is largely overlooked by the traditional school system, given that the princi-
pal focus is usually academic attainment. The modern system powered by IoT facilities
has automated and facilitated students’ performance evaluations. IoT has created an
enhanced educational environment irrespective of age, finances and location. IoT guar-
antees students’ productivity, engagement, activeness and safety, strengthening learners’
performance particularly in relation to personalised learning [9, 16, 32–34]. A further
advantage is that IoT ensures smart operation management, thereby reducing the expen-
diture involved in a sustained campus because the service delivery becomes smarter.
Ultimately, higher education institutions are privileged by developing smart solutions,
which are achievable through smart services [3, 8–10].
The IoT has the potential to significantly alter education processes in relation to a
variety of aspects, for example AR learning, automated processes, personalised interac-
tions, operations’ real time management, improved teaching and learning experiences,
smart cards and biometrics, cloud applications, in addition to sensor based decision mak-
ing [16, 19, 35, 36]. A summary of the IoT’s potential in the education field is presented
in Table 2.

Table 2. IoT’s potential in the education field.

Potential/Benefits Description Reference


IoT facilitates smart classroom Smart lesson planning, [1, 7, 37]
and smart campus technology monitoring of attendance,
monitoring ventilation, heating
and lighting, smart whiteboards,
smart books, as well as automated
translation services
Monitoring of students Movement, location, safety, [3, 9, 26, 33]
performance, feedback, illnesses,
sleeping behaviours, social
behaviours
Smart operation management that Energy and rubbish management, [4, 20, 26, 38]
reduces the costs of a sustained smart ticketing, bus tracking,
campus parking sensors, biometric entry,
wireless door locks, situational
awareness, optimised resource
usage, instant responses,
automated operations,
sensor-driven decision making,
optimised operations, data
collection, storage and analysis, as
well as direct and instant support
Enhanced student, faculty, Shared access to materials, shared [2, 34, 39, 16, 40]
stakeholder and teachers’ screens, shared lecture halls,
collaboration telecasting, online repository, real
time interactions, project
collaborations, mobile access to
material, connected campuses, as
well as data sharing
Improved student performance Increased accountability, [2, 3, 9, 40, 33, 22, 41]
improved learning experiences,
learning at one’s own pace, online
availability of materials, instant
communication with teachers,
improved learning comfort,
personalised learning, cultivated
learning abilities, combined
theoretical and practical learning,
material access from multiple
devices, flexible learning, VR
learning and animations, voice
interaction between student and
devices, acquisition of practical
experience, facilitation of special
education to cater to special
educational needs, absence of
physical obstacles
Reduction of the majority of Location, language, equity, [16, 35, 42]
barriers to education economies, as well as social
human integrity
Streamlining all management This will affect school business [1, 7, 10, 43, 44]
operations models, for example fee
payments, reporting and
assessments
Will facilitate green IoT Enhanced sustainability of IoT [9, 20, 33, 38, 45]
resources and environmental care
Application to practical courses, Wearable glasses, enhancement of [18, 34, 42]
e.g. medicine a new learning paradigms,
combining online and offline
educational methods, as well as
facilitating exploratory learning

3.3 Challenges Limiting Implementation of IoT in Education

Regardless of the IoT’s popularity in the education field having increased due to it
offering empowerment, while being swift and effective, it has not yet been comprehen-
sively implemented. The challenges limiting its implementation are financial constraints,
complexity, privacy, security, trust and ethics. Furthermore, there is limited expertise,
meaning a dearth of guidance and standard authentication. This leads to the incompatibility of devices, while auditing standards are not defined for IoT components and interfaces remain restricted. Additional challenges include a dearth of skills among users, in addition to poor acceptability and scalability.

Security and privacy are among the fundamental
difficulties confronting the IoT in the education field, due to there being a lack of secu-
rity, limited device update improvements, in addition to poor user awareness concerning
security [19, 34, 39]. Cyber security attacks originate from objects’ massive intercon-
nectivity online, thus making it accessible by anonymous and untrusted users. Users’
privacy rights are fundamental in ensuring confidence in interconnected devices.
Further challenges include a dearth of skills among users, poor acceptability and
scalability, alongside poor power and internet connections, especially in developing
nations. Ultimately, it is significant to ensure safety and reliability, as well as a dual
computer backup system combined with other management strategies.
Big data comes hand in hand with the IoT, meaning that a large domain of interacting
objects will generate big data. It is anticipated that the IoT’s scalability will be an issue
with regard to the virtual classroom’s size, sensors and actuators, among other virtual
and physical objects. Key resources are expensive to acquire, for example accreditations,
buildings and faculty members, databases, in addition to other IoT technologies.

Table 3. Challenges of the IoT in relation to the education field

Challenge | Description | Reference
High costs | Installation, expertise, maintenance, necessary complementary infrastructure including power and internet, training, scholarly material, orientation | [19, 41, 42]
Cyber security risks | The results of cyber-attacks mean costly layers of security are required | [1, 8, 15, 46]
Privacy risks | Misuse of personal data by hackers, loss of important data | [16, 18, 32, 34, 40, 47]
Poor acceptability | Results from the complexity of IoT technology, the digital gap between rich and poor, negative reactions from students due to their monitored usage, stakeholder reluctance, as well as devices' incompatibility | [37, 43]
Limited scalability | Results from limited research and development, poor acceptance, poor power and internet coverage, absence of full compatibility with practical subjects, limited stakeholders' goodwill, lack of moral role, dearth of awareness, insufficient battery life and poor curricula development | [3, 9, 26, 38]
Laws and regulations | These laws govern how data is collected, stored and utilised | [18, 32, 34, 40]
Lack of skills | Users may lack digital or ICT skills, having minimal expertise | [16, 22, 32, 34]

Despite the IoT enabling savings on future expenditure, maintenance and installation costs are nevertheless substantial. Additionally, curriculum divergence is a challenge, because it disadvantages students with limited credit hours and less profound IoT skills and knowledge. This results in deficient practical skills, which are generally lacking across numerous campuses. Table 3 presents a summary of the difficulties linked with the IoT's implementation in the education field.

4 Discussion
Several studies have been undertaken in relation to the IoT in the educational context.
Research has considered the prospects of the IoT for transforming the educational system,
for example [16, 17, 19, 48]. Additional studies have been undertaken investigating users’
acceptance of IoT-based applications [39, 49] being adopted in the educational context
[1, 24, 36]. The results revealed that such technologies have tremendous potential to
transform the education paradigm, although the IoT’s implementation continues to be
linked with particular difficulties, for example security and educational management
issues [20].
A study by [50] investigated solutions to problems pertaining to the curriculum,
human resources, financial restrictions, distance learning and cultural challenges, as
well as how to have confidence in the examination process. Such solutions include inter-
active text books, 3D positioning technologies to solve security issues, IoT end-devices
attendance data and intelligent camera vision, which are all usable on campus. Further-
more, [51] appraised the educational management decision-making process. The IoT has
the potential to shift the educational system’s design to be more responsive to students’
needs. For IoT-based application studies, the results revealed the most significant IoT
applications, including smart tools (pens, stopwatches, glasses), experimental learning,
smart notifications, instant feedback and monitoring, security and control equipment,
students’ behaviour and interaction systems, monitoring, student attendance, big data
analysis, in addition to information sharing.
Additional research has attempted to devise innovative concepts, as well as having
devised or recommended novel approaches. For instance, the Internet of robotic things
manages the interaction between the physical and virtual worlds [46], while [40] sought
to identify how each aspect of the educational system may be automated with IoT chips.
Further studies have concerned online education, for example research [27] that developed an amended hybrid blended learning model, enabling educators and learners to co-create knowledge and enjoy online learning by combining the advantages of online learning with the traditional face-to-face approach. Additional studies have focused on the IoT and gamification, including IoT for e-learning using gamification [29], as well as an education game based on the IoT. The research [31] discussed the IoT's role in an effective dis-
tance learning process, aiming to devise a model to enable teachers’ provision of instant
feedback to students, thus overcoming the challenges of distance learning compared
with face-to-face learning. Study [52] proposed a framework for measuring students’
behaviour and attentiveness by observing facial expressions, while [18] proposed a novel
model for integrating educational environment objectives with virtual academic com-
munities. Study [26] adopted an application orientated architecture (AOA) alongside
FIWARE technologies to provide a mechanism for integrating multiple applications and for data sharing, thus ensuring coexistence between new systems and traditional legacy systems.
Furthermore, [4] provided a novel monitoring and irregularity detection framework capable of performing more effectively than other contemporary decision-making methods in terms of efficiency, data classification, irregularity prediction and system stability. The author [21] combined the IoT and cloud computing to locate student identities and locations for real-time monitoring of physical education. Another study [30] proposed a new intelligent system called VIRICTA, an end-to-end solution with a technology stack integrating the IoT and AI. The system aims to offer a valuable learning experience, providing efficient, interactive and proactive context-aware smart learning services. This can facilitate improved interactivity and comprehension of the student context.
A study [25] proposed a hybrid educational environment based on RFID technology
and the IoT, with this system enabling children to play traditionally yet also in connection
with a PC and dedicated software. [53] coined a novel concept called Fabrication-as-a-
Service (FaaS), which adopts the IoT as a means of democratising access to Fab Labs.
This involves enabling a broad learning community to remotely access these computer-
controlled tools and equipment online. A two-tier architecture is employed, comprising a hub deployed in the cloud, in addition to a network of distributed Fab Labs.
Overall, it may be concluded from the literature that there is massive potential and
future prospects for the IoT’s application within the education field. With the develop-
ment of personalised learning, teachers are able to monitor individual students’ needs and
track their progress. The technology will lead to full automation of manual processes with
the use of sensors, thus there are strong prospects of contributing to seamless learning
experiences [1, 19, 35, 54].
With such advances, schools will feel the need to adapt and accept the technology. The
research has suggested that the IoT will enhance students’ comprehension of the content,
ensuring that schools save on the expenditure involved in running the institutions, while
enabling schools to invest in reusable resources to soon make classrooms and campuses
smarter.

5 Recommendations and Future Studies

The IoT's potential to support education has been significantly emphasised in the extant literature, although the majority of research continues to be of a theoretical nature rather than dealing with practical aspects. Most studies used qualitative methodologies [1], primarily literature reviews, while other studies have used case studies. A more limited number of studies have adopted quantitative methodologies such as surveys, while a few have developed innovative technologies and proposed novel applications employing the IoT in the education field. Accordingly, it is recommended that further studies be undertaken into the various IoT technologies' applications in the learning environment.
In the e-learning context, popular learning management systems (LMS) must adopt the IoT as part of the new emerging technologies in the education field [55], thereby providing students with rich experiences of informal and lifelong learning. The author in [56] recommended that future research should concentrate on constructing the smart education
environment, curriculum teaching, as well as data-driven education evaluation. Overall, in the LMS and e-learning system, it is recommended that future studies analyse the optimal practices for integrating LMS with IoT technology and tools.
Adopting the IoT and its associated applications remains at a nascent stage, while various difficulties have also been encountered [57]. Therefore, further research is highly encouraged, as suggested by [3, 57]. Also, [13] mentioned that challenges hinder the IoT's adoption in learning environments, primarily because of limited preparedness for implementing IoT technologies. This results in high implementation expenditure, alongside fear and uncertainty when introducing innovative technologies to the field. Hence, future research should be undertaken to mitigate the obstacles hindering the IoT's adoption in the education field. The study [12] suggested that, in future, the smart campus will represent a substantial breakthrough in information collection, chip research and programmed algorithms. The interaction between objects is an issue that requires further research according to [18]. Other research has recommended that future work should consider security when using IoT applications in education [14, 17, 26].
Regarding the IoT and its application to teaching and learning, there is a dearth of
empirical literature analysing the effect of IoT applications on students’ learning perfor-
mance [3]. Accordingly, there is abundant scope for scholars to investigate the variables
affecting IoT applications’ deployment in the education field. In [38] the author rec-
ommends considering humanisation as a prominent challenge for software architecture
in the forthcoming years, both in IoT-based systems and the education domain. Also,
[50] provided recommendations for future research for developing an IoT technology
approach to resolving educational challenges.
Further studies should analyse the role of the IoT in the preservation and transmission
of social values to achieve sustainable development, implementation of an intelligent
educational management system, the infrastructure of IoT technology implementation,
in addition to developing technology-orientated solutions for security, interoperability,
management and privacy of the IoT in the educational context.

References
1. Chweya, R., Ibrahim, O.: Internet of Things (IoT) implementation in learning institutions:
a systematic literature review. Pertanika J. Sci. Technol. 29(1), 471–517 (2021). Universiti
Putra Malaysia Press. https://doi.org/10.47836/pjst.29.1.26
2. Mircea, M., Stoica, M., Ghilic-Micu, B.: Investigating the impact of the Internet of Things
in higher education environment. IEEE Access 9, 33396–33409 (2021). https://doi.org/10.
1109/ACCESS.2021.3060964
3. Al-Emran, M., Malik, S.I., Al-Kabi, M.N.: A survey of Internet of Things (IoT) in education:
opportunities and challenges. In: Hassanien, A.E., Bhatnagar, R., Khalifa, N.E.M., Taha,
M.H.N. (eds.) Toward Social Internet of Things (SIoT): Enabling Technologies, Architectures
and Applications. SCI, vol. 846, pp. 197–209. Springer, Cham (2020). https://doi.org/10.1007/
978-3-030-24513-9_12
4. Verma, A., Singh, A., Anand, D., Aljahdali, H.M., Alsubhi, K., Khan, B.: IoT inspired intelli-
gent monitoring and reporting framework for education 4.0. IEEE Access 9, 131286–131305
(2021). https://doi.org/10.1109/ACCESS.2021.3114286
5. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101
(2006). https://doi.org/10.1191/1478088706qp063oa

6. Maguire, M., Delahunt, B.: Doing a thematic analysis: a practical, step-by-step guide for
learning and teaching scholars (2017)
7. Banica, L., Burtescu, E., Enescu, F.: The Impact of Internet-of-Things in Higher Education.
http://www.gartner.com/newsroom/id/2819918
8. Charmonman, S., Mongkhonvanit, P., Dieu, V.N., van der Linden, N.: Applications of Internet
of Things in E-learning. Int. J. Comput. Internet Manag. 23(3), 1–4. www.charm.SiamTechU.net
9. Martins, P., Lopes, S.I., da Cruz, A.M.R., Curado, A.: Towards a smart & sustainable campus:
an application-oriented architecture to streamline digitization and strengthen sustainability in
academia. Sustainability 13(6) (2021). https://doi.org/10.3390/su13063189
10. McRae, L., Ellis, K., Kent, M.: The Internet of Things (IoT): Education and Technology.
http://www.curtin.edu.au/
11. Shehzad, K., Xiaoxing, L., Sarfraz, M., Zulfiqar, M.: Signifying the imperative nexus between
climate change and information and communication technology development: a case from
Pakistan. Environ. Sci. Pollut. Res. 27(24), 30502–30517 (2020). https://doi.org/10.1007/s11
356-020-09128-x
12. Majeed, A., Ali, M.: How Internet-of-Things (IoT) making the university campuses smart?
QA higher education (QAHE) perspective. In: 2018 IEEE 8th Annual Computing and Com-
munication Workshop and Conference, CCWC 2018, vol. 2018, pp. 646–648, January 2018.
https://doi.org/10.1109/CCWC.2018.8301774
13. Domínguez, F., Ochoa, X.: Smart objects in education: an early survey to assess opportu-
nities and challenges. In: 2017 4th International Conference on eDemocracy eGovernment,
ICEDEG 2017, pp. 216–220 (2017). https://doi.org/10.1109/ICEDEG.2017.7962537
14. Hardyanto, H.: Smartclass design based on Internet of Things. In: International Confer-
ence on Education and Science, Icons, pp. 959–962 (2017)
15. Bagheri, M., Movahed, S.H.: The effect of the Internet of Things (IoT) on education business
model. In: Proceedings - 12th International Conference on Signal Image Technology and
Internet-Based Systems, SITIS 2016, pp. 435–441 (2017). https://doi.org/10.1109/SITIS.201
6.74
16. Ramlowat, D.D., Pattanayak, B.K.: Exploring the Internet of Things (IoT) in education: a
review. In: Satapathy, S.C., Bhateja, V., Somanah, R., Yang, X.-S., Senkerik, R. (eds.) Infor-
mation Systems Design and Intelligent Applications. AISC, vol. 863, pp. 245–255. Springer,
Singapore (2019). https://doi.org/10.1007/978-981-13-3338-5_23
17. Alalade, A.M., Ejemeyovwi, J.O., Ekong, E.E., Adeyemo, D.: Internet of Things as a tool
for enhancement of education administration and delivery. Int. J. Mech. Eng. Technol. 10(5),
48–62 (2019)
18. Marquez, J., Villanueva, J., Solarte, Z., Garcia, A.: IoT in education: integration of objects
with virtual academic communities. In: Advances in Intelligent Systems and Computing, vol.
444, pp. 201–212 (2016). https://doi.org/10.1007/978-3-319-31232-3_19
19. Rodney, B.D.: Understanding the paradigm shift in education in the twenty-first century: the
role of technology and the Internet of Things. Worldw. Hosp. Tour. Themes 12(1), 35–47
(2020). https://doi.org/10.1108/WHATT-10-2019-0068
20. He, X., Guo, H., Cheng, X.: Blockchain-based privacy protection scheme for IoT-assisted
educational big data management. Wirel. Commun. Mob. Comput. 2021 (2021). https://doi.
org/10.1155/2021/3558972
21. Guo, J., Sun, C.: Real-time monitoring of physical education classroom in colleges and universi-
ties based on open IoT and cloud computing. J. Intell. Fuzzy Syst. 40(4), 7397–7409 (2021).
https://doi.org/10.3233/JIFS-189563
22. Paganelli, F., Mylonas, G., Cuffaro, G.: A RESTful rule management framework for internet
of things applications. IEEE Access 8, 217987–218001 (2020). https://doi.org/10.1109/ACC
ESS.2020.3041321

23. Herlianto, H.R., Kusuma, G.P.: IoT-based student monitoring system for smart school appli-
cations. Int. J. Emerg. Trends Eng. Res. 8(9), 6423–6430 (2020). https://doi.org/10.30534/ije
ter/2020/242892020
24. Jasim, N.A., Salim AlRikabi, H.T., Farhan, M.S.: Internet of Things (IoT) application
in the assessment of learning process. In: IOP Conference Series: Materials Science and
Engineering, vol. 1184, no. 1, p. 012002 (2021). https://doi.org/10.1088/1757-899x/1184/1/
012002
25. Miglino, O., Di Fuccio, R., Di Ferdinando, A., Ricci, C.: BlockMagic, a hybrid educational
environment based on RFID technology and Internet of Things concepts. In: Giaffreda, R.,
et al. (eds.) IoT360 2014. LNICSSITE, vol. 150, pp. 64–69. Springer, Cham (2015). https://
doi.org/10.1007/978-3-319-19656-5_10
26. Magyari, A., Chen, Y.: FPGA remote laboratory using IoT approaches. Electronics 10(18)
(2021). https://doi.org/10.3390/electronics10182229
27. Njeru, A.M., Omar, M.S., Yi, S., Paracha, S., Wannous, M.: Using IoT technology to improve
online education through data mining. In: Proceedings of the 2017 IEEE International Con-
ference on Applied System Innovation: Applied System Innovation for Modern Technology,
ICASI 2017, pp. 515–518 (2017). https://doi.org/10.1109/ICASI.2017.7988469
28. Shinghal, K., Saxena, A., Saxena, N., Misra, R.: IoT based modified hybrid blended learning
model for education. In: Proceedings of the 2020 International Conference on Advances
in Computing, Communication and Materials, ICACCM 2020, pp. 229–232 (2020). https://
doi.org/10.1109/ICACCM50413.2020.9213049
29. AjazMoharkan, Z., Choudhury, T., Gupta, S.C., Raj, G.: Internet of Things and its applications
in E-learning. Int. J. Eng. Technol. 7, 422–427 (2017). https://doi.org/10.1109/CIACT.2017.
7977333
30. Zaguia, A., Ameyed, D., Haddar, M., Cheikhrouhou, O., Hamam, H.: Cognitive IoT-based
e-learning system: enabling context-aware remote schooling during the pandemic. J. Healthc.
Eng. 2021 (2021). https://doi.org/10.1155/2021/7358874
31. Yakoubovsky, R., Sarian, V.: IoT in effective distance learning process. In: 2019 10th IFIP
International Conference on New Technologies, Mobility and Security, NTMS 2019, pp. 1–4
(2019). https://doi.org/10.1109/NTMS.2019.8763805
32. Hassan, R.H., Hassan, M.T., Naseer, S., Khan, Z., Jeon, M.: ICT enabled TVET education:
a systematic literature review. IEEE Access (2021). https://doi.org/10.1109/ACCESS.2021.
3085910
33. Hayashi, V.T., Arakaki, R., Ruggiero, W.V.: OKIoT: trade off analysis of smart speaker archi-
tecture on open knowledge IoT project. Internet Things 12 (2020). https://doi.org/10.1016/j.
iot.2020.100310
34. Riekki, J., Mammela, A.: Research and education towards smart and sustainable world. IEEE
Access 9, 53156–53177 (2021). https://doi.org/10.1109/ACCESS.2021.3069902
35. Shi, W., Haga, A., Okada, Y.: Web-based 3D and 360° VR materials for IoT security education
and test supporting learning analytics. Internet Things. 15, 100424 (2021). https://doi.org/10.
1016/j.iot.2021.100424
36. Pour, M.J., Hosseinzadeh, M., Rafiei, K.: Identifying and prioritizing applications of Inter-
net of Things (IOT) in educational learning using Interval Best-Worst Method (BWM). In:
Proceedings of the 4th International Conference on Smart City, Internet Things Applications,
SCIoT 2020, pp. 1–6 (2020). https://doi.org/10.1109/SCIOT50840.2020.9250206.
37. A.-L. Enterprise: The Internet of Things in Education: improve learning and teaching experiences by leveraging IoT on a secure foundation. Solution Brief, IoT in Education
38. Kassab, M., Neto, V.V.G., Allian, A.: Investigating quality requirements from a human per-
spective in IoT-based software architectures for education. In: PervasiveHealth: Pervasive
Computing Technologies for Healthcare, vol. 2, pp. 241–244 (2019). https://doi.org/10.1145/
3344948.3344978

39. Ionescu-Feleaga, L., Ștefan Ionescu, B., Bunea, M.: The IoT technologies acceptance in
education by the students from the economic studies in Romania. Amfiteatru Econ. 23(57),
342–359 (2021). https://doi.org/10.24818/EA/2021/57/342
40. Tripathi, G., Ahad, M.A.: IoT in education: an integration of educator community to promote
holistic teaching and learning. In: Nayak, J., Abraham, A., Krishna, B.M., Chandra Sekhar,
G.T., Das, A.K. (eds.) Soft Computing in Data Analytics. AISC, vol. 758, pp. 675–683.
Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0514-6_64
41. Bajracharya, B., Blackford, C., Chelladurai, J.: Number 1, vol. 6
42. Kumar, S.R., et al.: IoT based cloud integrated smart classroom and sustainable campus. Int. Adv. Res. J. Sci. Eng. Technol. 8(5) (2021). https://doi.org/10.17148/IARJSET.2021.8560
43. Gómez, J., Huete, J.F., Hoyos, O., Perez, L., Grigori, D.: Interaction system based on Internet
of Things as support for education. Procedia Comput. Sci. 21, 132–139 (2013). https://doi.
org/10.1016/j.procs.2013.09.019
44. Kiryakova, G., Yordanova, L., Angelova, N.: Can we make Schools and universities smarter
with the Internet of Things? TEM J. 6(1), 80–84 (2017). https://doi.org/10.18421/TEM61-11
45. Kumar, A., Vengatesan, K., Rajesh, M., Singhal, A.: Teaching literacy through animation &
multimedia. Int. J. Innov. Technol. Explor. Eng. 8(5), 73–76 (2019)
46. Romeo, L., Petitti, A., Marani, R., Milella, A.: Internet of robotic things in smart domains:
applications and challenges. Sensors 20(12), 1–23 (2020). MDPI AG. https://doi.org/10.3390/
s20123355
47. Gutiérrez-Martínez, Y., et al.: A challenge-based learning experience in industrial engineering
in the framework of education 4.0. Sustainability 13(17) (2021). https://doi.org/10.3390/su1
3179867
48. Shrestha, S.K., Furqan, F.: IoT for smart learning/education (2021)
49. Abed, S., Alyahya, N., Altameem, A.: IoT in education: its impacts and its future in saudi
universities and educational environments. In: Luhach, A.K., Kosa, J.A., Poonia, R.C., Gao,
X.-Z., Singh, D. (eds.) First International Conference on Sustainable Technologies for Com-
putational Intelligence. AISC, vol. 1045, pp. 47–62. Springer, Singapore (2020). https://doi.
org/10.1007/978-981-15-0029-9_5
50. Mohammadian, H.D.: IoT - a solution for educational management challenges. In: IEEE
Global Engineering Education Conference, EDUCON, pp. 1400–1406, April 2019. https://
doi.org/10.1109/EDUCON.2019.8725213
51. Silva, R., de Pontes Bernardo, C., Watanabe, C.Y.V., da Silva, R.M.P., da Silva Neto, J.M.:
Contributions of the internet of things in education as support tool in the educational man-
agement decision-making process. Int. J. Innov. Learn. 27(2), 175–196 (2020). https://doi.
org/10.1504/IJIL.2020.105077
52. Mahmood, S., Palaniappan, S., Hasan, R., Sarker, K.U., Abass, A., Rajegowda, P.M.: Rasp-
berry PI and role of IoT in education. In: 2019 4th MEC International Conference on Big
Data and Smart City, ICBDSC 2019, pp. 1–6 (2019). https://doi.org/10.1109/ICBDSC.2019.
8645598
53. Cornetta, G., Touhafi, A., Togou, M.A., Muntean, G.M.: Fabrication-as-a-service: a web-
based solution for STEM education using Internet of Things. IEEE Internet Things J. 7(2),
1519–1530 (2020). https://doi.org/10.1109/JIOT.2019.2956401
54. Fang, A.D., Xie, S.C., Cui, L., Harn, L.: Research on the structure and practice of internet
environment of things based on big data analysis. Ekoloji 28(107), 4239–4247 (2019)
55. Alhazmi, A.K., Imtiaz, A., Al-Hammadi, F., Kaed, E.: Success and failure aspects of LMS
in e-learning systems. Int. J. Interact. Mob. Technol. 15(11), 133–147 (2021). https://doi.org/
10.3991/ijim.v15i11.20805

56. Dai, Z., Zhang, Q., Zhu, X., Zhao, L.: A Comparative study of Chinese and foreign research
on the Internet of Things in Education: bibliometric analysis and visualization. IEEE Access
9, 130127–130140 (2021). https://doi.org/10.1109/ACCESS.2021.3113805
57. Kumar, S., Tiwari, P., Zymbler, M.: Internet of Things is a revolutionary approach for future
technology enhancement: a review. J. Big Data 6(1), 1–21 (2019). https://doi.org/10.1186/
S40537-019-0268-2/FIGURES/9
The VCDLN Mobile Learning System for Digital
Learning Services in Pandemic Covid-19

Deni Darmawan1(B) , Dinn Wahyudin1 , Dian Rahadian2 , Andri Suryadi3 ,


and Dianni Risda1
1 Universitas Pendidikan Indonesia, Jl. Dr. Setiabudhi No. 229, Bandung 40154, West Java,
Indonesia
[email protected]
2 Institut Pendidikan Indonesia, Garut, West Java, Indonesia
3 Universitas Terbuka, Jakarta, Indonesia

Abstract. During the COVID-19 pandemic, educators in remote villages required the development of the VCDLN (Virtual Community Digital Learning Nusantara) system. The objectives of this research are as follows: (1) integrating the Distance Learning system with multiple "Hand on Hand Technology" Learning Resources; (2) developing the CBT (Computer Based Test) system for e-assessment needs; and (3) measuring the opinions of the VCDLN community members, which include the Mini Market, Village Office, District Police Office, Integrated Healthcare Center, School, Sub-District Office, Community Health Centers, and Military Rayon Command, on the implementation of the program as a Multiplatform Distance Learning model during the COVID-19 pandemic. A mixed method (qualitative and quantitative) was used in this study. The findings revealed that the CBT system supported the development of mobile VCDLN as a portable VCDLN service system. Furthermore, the VCDLN community's influence, either partially or simultaneously, positively impacts the program's implementation.

Keywords: VCDLN · Mobile learning system · Digital learning · Pandemic


Covid-19

1 Introduction
The COVID-19 pandemic, which impacts service delivery and direct learning interac-
tions, requires new studies for policymakers, scientists, and industrial partners provid-
ing digital learning platforms to collaborate in the development of an e-learning system
[1]. According to research, several professional educator organizations such as Subject
Teacher Consultations (STC) and Teacher Working Groups (TWG), Indonesian Teach-
ers Association, and even UNESCO, quickly implemented the necessary innovations.
Everyone has realized how important it is to establish a robust communication system
and strategy [2] in learning services right away. Also, new techniques and strategies are
urgently needed considering the development of the educational world, which has seen
a shift from face-to-face teaching to learning from home.


As part of its strategic response to the COVID-19 pandemic in Indonesia, the ministry
developed an online digital mobile learning innovation policy [3]. This is a new force in
the “New Normal Education” era’s learning revolution. As a result, this research aims to
create a “Virtual Community Digital Learning Nusantara in the COVID-19 pandemic”.
It is expected to accommodate all innovations and revolutions in learning through mobile systems [4] and healthy learning communication strategies within a virtual, community-based, digital, online, mobile, electronic, distance learning framework that is packaged in the form of digital mobile television. This is then developed and utilized by the VCDLN community, which includes educators, industry, local governments,
schools, minimarkets, police stations, village, and sub-district heads, all of whom work
together to serve students in remote parts of the archipelago. Specifically, the objectives
of this research are as follows (1) to integrate the distance learning system with Learning
Resources multiple “Hand on hand Technology”; (2) to develop a CBT system for e-
assessment needs; and (3) to measure the opinions of the VCDLN community members
which include minimarket, village office, district police office, integrated healthcare cen-
ter, school, sub-district office, community health centers and military Rayon Command
on the implementation of the program as a model for Multiplatform Distance Learning,
during the COVID-19 pandemic.

2 Literature Review
2.1 Element of the VCDLN System
In retrospect, some essential elements in implementing the VCDLN system can be analyzed in terms of several objects or target subjects that are often used in educational practice, such as software, hardware, brainware, and environment ware. Likewise, the analysis makes it possible to quickly put into practice a new mobile learning concept or model adapted from the research of [5, 6].

Fig. 1. Element of the VCDLN system (labels: hardware, software, brainware, environment ware, learning practices, technological education institution (Europe, USA, Asia), virtual community, virtual communication technology (Spain), social media and website)



In the implementation paradigm, mobile education communication system and strategy services are needed [7], and the learning innovation in this research is called the Virtual Community Digital Learning Nusantara (VCDLN) system. The analysis results of these elements can be seen in Fig. 1 above.
The elements of the VCDLN system shown in Fig. 1 become the main object of study under the regulation noted above, in which current rules and demands call for a return to normal under new conditions. The implementation of this system must therefore be harmonized with the "New Normal" regulation, following the evaluation of the mobile technology expert system [8]. As a method of implementing VCDLN in the context of realizing this new normal, Mobile Blended Learning becomes possible. In addition, this method is governed by the Minister of Research, Technology, and Higher Education Regulation Number 51 of 2018. Thus, it is a concrete form of education and learning policy with face-to-face and distance learning systems that use online databases as a research study [4], primarily through the television broadcast program.

2.2 Television Broadcast-Based Mobile VCDLN


Today's e-learning systems range from the simple and inexpensive to the complex, expensive and meticulously designed, including educational television broadcasts. A study on the level of proficiency or digital literacy was conducted by [9]. According to that research, the competencies possessed by the millennial generation or generation Z are assumed to enable them to become designers of digital learning information system paths [10].
Based on an analysis of the regulation policy of the Minister of Higher Education at the time regarding the targets for implementing fully online distance learning, it was confirmed that such learning would deliver 80% of the knowledge [11]. This is also compatible with [12] if performed on mobile devices. Thus, the remaining 20% can be completed using blended learning television as one of the VCDLN models in the school learning corridor during the "New Normal" [1], allowing for 100% success.
All of these regulations, however, must be adequately and compactly implemented.
Consequently, the VCDLN concept will become a superior platform and trend for all
parties involved in implementing education and learning in a “New Normal” condition.
The initiators of the VCDLN Community are in remote areas, including educators who
work face-to-face in the context of learning communication through mobile digital tele-
vision. Additionally, the atmosphere created during their meeting will be a forum for
exchanging experiences or working together to design the preparation and development
of VCDLN learning content when they carry out inter-community learning services.
Moreover, the power of content presented through access to a built and equipped ICT
center with an educational television platform will be less costly. Figure 2 below illustrates the VCDLN mobile roadmap scheme, using TVUPI broadcasts as the core model in this research.
Through this learning television LCJ (Live Campus Journalist), the VCDLN program has evolved into a medium and database of live learning materials and rebroadcasts from the online learning community. Gradually, in line with [13] on the production of online teaching materials, a Virtual Community Digital Learning Nusantara is realized that can

Fig. 2. VCDLN mobile roadmap scheme based on television broadcasting technology

be facilitated through access to LCJ-TVUPI for dissemination to remote areas of the country.

3 Research Method
This research employed a mixed method (qualitative and quantitative) [14], applied across the study years (2021, 2022, and 2023). The qualitative approach was used to integrate Distance Learning Services with multiple "Hand on Hand Technology" Learning Resources as a mobile television system adaptable to elementary, middle, and higher education levels. It was also used to develop a CBT system for VCDLN's e-assessment needs. Meanwhile, the quantitative approach was utilized to assess the impact of the opinions of VCDLN community members on the implementation of the VCDLN service program. In addition, multiple regression analysis was used as the test statistic.
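For readers who want to see how such an analysis is typically set up, the sketch below shows one possible way to run the multiple regression in Python with statsmodels, producing quantities analogous to the model summary, ANOVA and coefficient tables reported in Sect. 4.3. The file name and column labels are hypothetical placeholders, since the authors' survey data are not published here.

# Minimal sketch (not the authors' code): multiple regression of the
# VCDLN implementation score Y on the eight community variables X1-X8.
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names; the real survey instrument is not published here.
df = pd.read_csv("vcdln_survey.csv")            # the paper reports 52 respondents
X = df[["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"]]
X = sm.add_constant(X)                          # adds the intercept term
y = df["Y"]

model = sm.OLS(y, X).fit()
print(model.rsquared, model.rsquared_adj)       # analogous to the model summary table
print(model.fvalue, model.f_pvalue)             # analogous to the ANOVA table
print(model.params)                             # analogous to the B column of the coefficients table
print(model.tvalues, model.pvalues)             # t and Sig. columns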

4 Result and Discussion

4.1 Integrating the Distance Learning Service System with Multiple Learning
Resources “Hand on Hand Technology”

The VCDLN system, which was built as a dedicated website with the domain name VCDLN Learning, was developed as an e-learning educational database that accommodates the learning video products [7]. Besides being a database system, it was also created in the form of Mobile-VCDLN learning through an APK application that can be accessed and downloaded via Android, drawing on research from [15]. The goal is to design a multiplatform system that will benefit both educators and students. Furthermore, to utilize ready-made
learning video products, the last system takes the form of an official YouTube channel named VCDLN access, which is integrated with TVUPI and can be found at https://www.youtube.com/watch?v=hGDec-Jpm4E&t=206s.
In this pandemic era, the three systems that are integrated into the mobile distance learning model are constructed as shown in Fig. 3 below.

Fig. 3. Integration of VCDLN development products for mobile distance learning services in
“hand on hand technology.”

The need for integration arises from the desire to serve all students who have varying
ownership levels of digital learning infrastructure. Considering that not all pupils have
mobile phones, VCDLN technology tries to provide a variety of devices, channels,
and learning resources, citing the research [8]. Furthermore, one of the most important
reasons for incorporating the VCDLN product is to address differences in the region and
geographical conditions in which students live. Indonesia has regional differences, and some regions still have limited access and signal coverage. As a result, the solution developed by VCDLN, the TVUPI satellite broadcast, will become an option in distance learning services, with research references dating back to [16].

4.2 Developing CBT System for e-Assessment Needs in VCDLN

This research also developed an evaluation tool to measure students’ success in learning
using the Multiplatform VCDLN. Because the learning system is digital, online, and
mobile-based, the evaluation system is also digital, online, and mobile using the CBT
concept (Computer Based Test). This CBT system is placed as an e-Assessment function,
with architectural considerations for online digital services that mobile online clients can
access.

Fig. 4. Client-server system

Figure 4 above shows the client-server communication system. The application has two login page views, for test takers and administrators. Participants can only register and take tests, whereas VCDLN administrators (in this case, teachers) can manage system elements such as questions, texts, grades, and students' data [1]. In addition, students can register, take tests, and view test results in this system, while educators or administrators can manage texts, types of questions, users, and test results [16].
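To make this role separation concrete, the following minimal sketch outlines one hypothetical way the participant and administrator operations described above could be modelled. It is an illustration only, not the authors' implementation, and all class and method names are invented.

# Illustrative sketch only: a minimal role model for the CBT described above.
# Participants register and take tests; administrators (teachers) manage
# questions and export results.
from dataclasses import dataclass, field

@dataclass
class Question:
    text: str
    choices: list[str]
    answer_index: int

@dataclass
class CBT:
    questions: list[Question] = field(default_factory=list)
    results: dict[str, float] = field(default_factory=dict)   # participant -> score

    # Administrator-only operations
    def add_question(self, q: Question) -> None:
        self.questions.append(q)

    def export_scores(self) -> dict[str, float]:
        return dict(self.results)

    # Participant operation: answer every question and receive a percentage score
    def take_test(self, participant: str, answers: list[int]) -> float:
        correct = sum(1 for q, a in zip(self.questions, answers)
                      if a == q.answer_index)
        score = 100.0 * correct / max(len(self.questions), 1)
        self.results[participant] = score
        return score

cbt = CBT()
cbt.add_question(Question("2 + 2 = ?", ["3", "4", "5"], 1))
print(cbt.take_test("student01", [1]))   # prints 100.0 for this hypothetical test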
The first step in developing the CBT for VCDLN is making a data flow diagram (DFD) by creating a context diagram. This illustration depicts the system as a whole, with the external entities visible. The system is then subdivided, moving from DFD level 0 to DFD level 1, which consists of the processes that occur within the context diagram and provides a more detailed view, as shown in Fig. 5 below.

Fig. 5. Data flow diagram level 0 (external entities: participants as users and teachers as administrators of the Computer Based Test for the VCDLN multiplatform; the diagram covers registration, login, taking the CBT, and managing home text, participants, questions, guidelines and test results)

DFD level 0 shows the CBT process traffic for each educator during the online evaluation process for their students, to which this development refers [17]. A level 1 DFD was then developed to explain how the CBT system works. The interrelations between all elements in the level 1 data flow diagram can be seen in Fig. 6 below.

Fig. 6. Data flow diagram level 1 (processes include registration, login, manage users, manage participants, manage item tests, manage questions, manage guidelines, show results, show evaluation, and export scores)

4.3 The VCDLN Community's Influence on the VCDLN Program's Implementation as a Multiplatform Distance Learning Model in the COVID-19 Pandemic

This research assessed the opinions of the VCDLN community to determine the impact
of the VCDLN program’s implementation based on the findings of a previous study [18].
The communities that have been designated as members of the VCDLN are asked to
contribute their thoughts on the influence of changes caused by the program’s integration
in their area. Moreover, the measurement process was carried out using multiple regres-
sion test statistics. The following measurement output shows the results of calculations
regarding the effect of the eight community variables (X1, X2, X3, X4, X5, X6, X7, and X8) on the success variable of the VCDLN program implementation (Y). More details can be seen in Table 1 below.

Table 1. Model summary

Model | R | R square | Adjusted R square | Std. error of the estimate | R square change | F change | df1 | df2
1 | .970a | .940 | .929 | .13440 | .940 | 84.591 | 8 | 43

The table above depicts the relationship between the independent variables, consisting of (X1), (X2), (X3), (X4), (X5), (X6), (X7), and (X8), and the dependent variable (Y), namely the VCDLN program implementation, with a coefficient of 0.794. The total contribution of the Military Rayon Command, Community Health Centers, Sub District Offices, Integrated Healthcare Centers, District Police Offices, and schools to implementing the program is 0.59. Thus, the successful implementation of the VCDLN program is determined by the contribution of the eight X variables, namely KP x 100% = 0.590 x 100% = 59.0%. In contrast, the remaining 41% is influenced by other variables not examined. This finding demonstrates that the strength of each community's leadership will determine the success of the VCDLN program in the field [19].
Furthermore, to test the significance of the simultaneous effect of the independent variables (X1), (X2), (X3), (X4), (X5), (X6), (X7), and (X8) on the success of the VCDLN program implementation (Y), the ANOVA test with the F-count formula is used as follows.
F-count = (R² / K) / ((1 − R²) / (n − K − 1))    (1)
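As a quick sanity check, Eq. (1) can be evaluated directly from the reported values. The short Python sketch below assumes K = 8 predictors and n = 52 respondents (so the denominator degrees of freedom are 43, matching df1 and df2 in Table 2 below) and uses the rounded R² from Table 1, so it only approximately reproduces the reported F of 84.591.

# Hedged sketch: recompute F from the reported R-squared using Eq. (1).
from scipy.stats import f as f_dist

r2, k, n = 0.940, 8, 52            # n - k - 1 = 43, matching df2 in the ANOVA table
F = (r2 / k) / ((1 - r2) / (n - k - 1))
p = f_dist.sf(F, k, n - k - 1)     # right-tail p-value of the overall F test
print(round(F, 3), p)              # about 84.2 (reported 84.591; gap comes from rounding R^2)
                                   # p is effectively 0, consistent with Sig. = .000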
The SPSS output results can be seen in the ANOVA table below.

Table 2. Anova

Model | Sum of squares | df | Mean square | F | Sig.
1 Regression | 12.223 | 8 | 1.528 | 84.591 | .000b
Residual | .777 | 43 | .018 | |
Total | 13.000 | 51 | | |
a. Dependent Variable: Implementation Program of VCDLN (Y).
b. Predictors: (Constant), Mini Market (X1), Village Office (X2), District Police Office (X3),
Integrated Healthcare Center (X4), School (X5), Sub District Office (X6), Community Health
Centers (X7), Military Rayon Command (X8).

According to Table 2 above, the F-count value is 84.591, with a significance value of 0.000. This value is greater than the F-table value of 4.57, and the F significance value is less than α = 5%. It follows that the null hypothesis is rejected and the alternative hypothesis is accepted, which means that the variables (X1), (X2), (X3), (X4), (X5), (X6), (X7), and (X8), which represent community strengths, simultaneously have a significant effect on the Y variable [20].
Based on the coefficients in Table 3, the multiple regression equation can be formulated as Y = 0.300 + 0.106X1 + 0.368X2 + 0.222X3 + 0.352X4 + 0.552X5 + 0.286X6 + 0.520X7 + 0.282X8. The positive constant value of 0.300 indicates that all independent variables positively affect the dependent variable. If an independent variable increases by one unit, the implementation of the VCDLN program will increase. This finding is consistent with the studies of [21] and [22].

Table 3. Coefficients

Model | Unstandardized coefficients (B) | Std. error | Standardized coefficients (Beta) | t | Sig.
1 (Constant) | .300 | .198 | | 2.516 | .037
Mini Market (X1) | .106 | .075 | .105 | 2.309 | .046
Village Office (X2) | .368 | .064 | .073 | 2.555 | .027
District Police Office (X3) | .222 | .081 | .222 | 2.260 | .004
Integrated Healthcare Center (X4) | .352 | .057 | .056 | 2.909 | .039
School (X5) | .552 | .063 | .162 | 4.402 | .021
Sub District Office (X6) | .286 | .083 | .086 | 2.039 | .015
Community Health Centers (X7) | .520 | .073 | .518 | 7.111 | .000
Military Rayon Command (X8) | .282 | .048 | .086 | 1.692 | .048
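As a usage note, the fitted equation reported above can be written directly as a small function. The coefficient values are taken verbatim from Table 3, while the example input values are purely hypothetical.

# Sketch: the reported multiple regression equation, coefficients from Table 3.
def predict_vcdln_implementation(x1, x2, x3, x4, x5, x6, x7, x8):
    """Predicted implementation score Y from the eight community variables."""
    return (0.300 + 0.106 * x1 + 0.368 * x2 + 0.222 * x3 + 0.352 * x4
            + 0.552 * x5 + 0.286 * x6 + 0.520 * x7 + 0.282 * x8)

# Hypothetical example: one unit on every community variable.
print(predict_vcdln_implementation(1, 1, 1, 1, 1, 1, 1, 1))  # 0.300 + 2.688 = 2.988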

According to the table above, the regression coefficient of the Mini Market variable (X1) is 0.106. It indicates that if its score increases by one unit, the implementation of the VCDLN program will rise by 10.6% as a result of the minimarket community's influence. The impact of the other variables, in turn, is as follows: the Village Office variable (X2) with an increase of 0.368 or 36.8%; District Police Office (X3) of 0.222 or 22.2%; Integrated Healthcare Center (X4) of 0.352 or 35.2%; School (X5) of 0.552 or 55.2%; Sub District Office (X6) of 0.286 or 28.6%; Community Health Centers (X7) of 0.520 or 52.0%; and Military Rayon Command (X8) of 0.282 or 28.2%. These findings support research from [23] on the power of understanding value co-creation in virtual communities.
Furthermore, to test the significance level of each effect of the independent variables on the dependent variable, the t-count value is used. The table above shows that all t-counts are greater than the t-table value of 2.021. As a result, H0 is rejected and H1 is accepted, indicating that the regression coefficients of the (Constant), Military Rayon Command (X8), School (X5), Village Office (X2), Integrated Healthcare Center (X4), Community Health Centers (X7), Mini Market (X1), District Police Office (X3), and Sub District Office (X6) variables have a significant influence on the VCDLN program implementation (Y). These vital variables are expected to be able to become the basis for distance learning services ranging from early childhood education to university, as shown in research from [21, 24–27].
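For reference, the two-tailed critical t-value used in such comparisons can be obtained programmatically. The minimal sketch below (assuming scipy is available) computes it for the residual degrees of freedom reported in the ANOVA table; the text quotes 2.021, which corresponds to a slightly smaller number of degrees of freedom.

# Sketch: two-tailed critical t-value at alpha = 0.05 for the residual df (43).
from scipy.stats import t as t_dist

alpha, df_resid = 0.05, 43
t_crit = t_dist.ppf(1 - alpha / 2, df_resid)
print(round(t_crit, 3))   # about 2.017 for df = 43; the text uses 2.021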

5 Conclusion
The implementation of this VCDLN innovation program achieved the targets of (1) integrating the distance learning system with multiple "Hand on Hand Technology" learning resources, (2) developing the CBT system for e-assessment needs, and (3) measuring
the opinions of the VCDLN community members which include Mini Market, Vil-
lage Office, District Police Office, Integrated Healthcare Center, School, Sub-District
Office, Community Health Centers, and Military Rayon Command on the implementa-
tion of the program as a Multiplatform Distance Learning model during the COVID-19
Pandemic. They have all provided comprehensive benefits in delivering solutions to dis-
tance learning services throughout Indonesia during the pandemic. The measurements
revealed that the contribution and influence of all the VCDLN community’s opinions on
the implementation of the programs are positive. With the help of the learning service
community, this VCDLN Mobile, which was built and tested, can be a multi-platform
distance learning service solution.

References
1. Tawafak, R.M., et al.: A combined model for continuous intention to use e-learning system.
Int. J. Interact. Mob. Technol. 15(03), 113–129 (2021)
2. Landicho, J.A.: VOISEE COMMUNICATOR: an android mobile application for hearing-
impaired and blind communications. Int. J. Interact. Mob. Technol. 10(4), 26 (2016). https://
doi.org/10.3991/ijim.v10i4.5859
3. Mun, S.H., et al.: Active learning using digital smart board to enhance primary school students’
learning. Int. J. Mob. Technol. 13(7), 4–16 (2019)
4. Rimale, Z., El Habib, B., Tragha, A., El Guemmat, K.: Survey on the use of the mobile
learning based on mobile cloud computing. Int. J. Interact. Mob. Technol. 10(3), 35 (2016).
https://doi.org/10.3991/ijim.v10i3.5672
5. Chohan, A.H., Mohd Affandi, H., Awad, J., Che-Ani, A.I.: A methodology to develop a
mobile application model to appraise housing design quality. Int. J. Interact. Mob. Technol.
11(6), 4 (2017). https://doi.org/10.3991/ijim.v11i6.6379
6. Almeatani, M., Alotaibi, H., Alasmari, E., Meccawy, M., Alghamdi, B.: Thesis supervision
mobile system for enhancing student-supervisor communication, pp. 4–14 (2019)
7. Karkar, A., Al Ja’am, J.: An educational ontology-based m-learning system. Int. J. Interact.
Mob. Technol. 10(4), 48 (2016). https://doi.org/10.3991/ijim.v10i4.6011
8. Divayana, D.G.H., et al.: An evaluation of instructional process of expert system course
program by using mobile technology-based CSE-UCLA model. Int. J. Interact. Mob. Technol.
11(6), 18 (2017). https://doi.org/10.3991/ijim.v11i6.6697
9. Villa-martinez, H.A.: Digital learning tools for mobile devices for accomplish hypothesis
testing of statistical parameters. Int. J. Interact. Mob. Technol. 13(6), 15–26 (2019)
10. Sharma, K., Mangaroska, K., van Berkel, N., Giannakos, M., Kostakos, V.: Information flow
and cognition affect each other: evidence from digital learning. Int. J. Hum. Comput. Stud.
146, 102549 (2021). https://doi.org/10.1016/j.ijhcs.2020.102549
11. Deng, C., Ji, X., Rainey, C., Zhang, J., Lu, W.: Integrating machine learning with human
knowledge. iScience 23(11), 101656 (2020). https://doi.org/10.1016/j.isci.2020.101656
12. Kattayat, S., Josey, S., Asha, J.V.: Mobile learning apps in instruction and students achieve-
ment. Int. J. Interact. Mob. Technol. 11(1), 143–147 (2017). https://doi.org/10.3991/ijim.
v11i1.6420

13. Zhao, H.: A summary of the research on the teaching mode of MOOCs, pp. 96–109 (2019).
https://doi.org/10.4236/jss.2019.72007
14. Creswell, J.D., Creswell, J.W.: Research Design: Qualitative, Quantitative, and Mixed
Methods Approaches. Sage Publ. (2017)
15. Zhampeissova, K., Kosareva, I., Borisova, U.: Collaborative mobile learning with smartphones
in higher education. Int. J. Interact. Mob. Technol. 14(21), 4 (2020). https://doi.org/10.3991/
ijim.v14i21.18461
16. Haddad, M.E.O., Ferreira, N.S.C., Faria, A.A.: The use of educational technologies in distance
education—enabling the appropriation of teaching and learning process. Open J. Soc. Sci.
02(01), 54–58 (2014). https://doi.org/10.4236/jss.2014.21006
17. Kraleva, R.: Designing an interface for a mobile application based on children’s opinion. Int.
J. Interact. Mob. Technol. 11(1), 53–70 (2017). https://doi.org/10.3991/ijim.v11i1.6099
18. Kattayat, S., Josey, S., Asha, J.V.: Mobile learning apps in instruction and students achieve-
ment. Int. J. Interact. Mob. Technol. 11(1), 143 (2017). https://doi.org/10.3991/ijim.v11i1.
6420
19. Hamzah, M.I.M., Jamil, M.F.: The relationship of distributed leadership and professional
learning community. Creat. Educ. 10(12), 2730–2741 (2019). https://doi.org/10.4236/ce.
2019.1012199
20. Gómez, R.L., Suárez, A.M.: Extending impact beyond the community: protocol for a scoping
review of evidence of the impact of communities of practice on teaching and learning in
higher education. Int. J. Educ. Res. Open 2, 100048 (2021). https://doi.org/10.1016/j.ijedro.
2021.100048
21. Strunga, A.: The integration of virtual learning communities into universities’ knowledge
management Models. Procedia Soc. Behav. Sci. 197, 2430–2434 (2015). https://doi.org/10.
1016/j.sbspro.2015.07.306
22. Strungă, A.: Using virtual learning communities in shaping the professional identity of pri-
mary and preschool pedagogy specialization students: a knowledge management approach.
Procedia Soc. Behav. Sci. 180, 460–467 (2015). https://doi.org/10.1016/j.sbspro.2015.02.145
23. Rodríguez-López, N.: Understanding value co-creation in virtual communities: the key role
of complementarities and trade-offs. Inf. Manag. 58(5) (2021). https://doi.org/10.1016/j.im.
2021.103487
24. Amponsah, E., Fusheini, A., Adam, A.: Influence of information, education and communi-
cation on prenatal and skilled delivery in the Tano North District, Ghana: a cross-sectional
study. Heliyon 7(6), e07245 (2021). https://doi.org/10.1016/j.heliyon.2021.e07245
25. Ahmady, S., Kohan, N., Bagherzadeh, R., Rakshhani, T., Shahabi, M.: Validity testing of
classroom community scale in virtual environment learning: a cross sectional study. Ann.
Med. Surg. 36, 256–260 (2018). https://doi.org/10.1016/j.amsu.2018.08.021
26. Liu, X., Zhang, J.: Foreign language learning through virtual communities. Energy Procedia
17, 737–740 (2012). https://doi.org/10.1016/j.egypro.2012.02.165
27. Aderibigbe, S.A., Dias, J.M., Abraham, M.S.: Understanding issues affecting students’ com-
mitment to online discussion forums in undergraduate courses. Int. J. Interact. Mob. Technol.
15(1), 4–23 (2021). https://doi.org/10.3991/IJIM.V15I01.17939
Applying Design Thinking Approach to Improve
Online Education

Asma Alwadai and Reem Alnanih(B)

Computer Science Department, Faculty of Computing and Information Technology, King


Abdulaziz University, Jeddah, Saudi Arabia
[email protected]

Abstract. COVID-19 has transformed face-to-face learning interactions and


forced children to stay home and connect through online education. During
COVID-19, the Ministry of Education in Saudi Arabia established the Madrasty
platform as the new gateway for distance teaching and learning for the 1st to 12th
grade for the 2020–2021 school year. However, students have faced many issues
with the platform related to usability and features. This paper aims to improve the design of the Madrasty educational website in Saudi Arabia for online education by applying the Design Thinking approach as a foundation to extract the requirement set for a design prototype with a complete set of features to support remote learning. The paper also investigates the level of depression under physical school attendance and under remote school attendance in Saudi Arabia; this comparison informs whether a mental health state feature should be added to the educational website for remote learning. The authors studied the issues in Madrasty and proposed a newly updated platform, including a set of new features to resolve the problems in the existing platform. Usability testing was conducted with two types of users: teachers and students. The usability testing consists of two phases: 1) a list of tasks to measure the features' usability, and 2) a post-test questionnaire to measure users' level of satisfaction. The results show an excellent rating from the participants in both groups regarding the number of clicks and the time taken to perform the tasks.

Keywords: Design Thinking · Child depression level · Online education ·


Usability testing · Educational website design · Madrasty

1 Introduction
The massive worldwide expansion of online education in response to COVID-19 teaches
many lessons about the value of distributed learning; equally important, it reminds us of
its limits. For example, all the education facilities in the Kingdom of Saudi Arabia (KSA) moved
to 100% online learning during the COVID-19 crisis. Squinting at a tiny display is more
taxing and less enjoyable than conventional in-person interactions for many professors,
parents, and students who hold and conduct their lessons online. Although there are not
any comprehensive data or surveys capturing this phenomenon yet, many teachers and
students report that they cannot spend as much time learning online as they could in
person without being overwhelmed [1].
The COVID-19 pandemic has also led to depression, anxiety, and mental issues
among school children [2]. During the current pandemic situation, it has been important
to continue children’s education by implementing measures for remote online learn-
ing. However, the change in routine and limited access to their courses affect students’
behavior, alter their moods, and can cause acute depression [1]. Children are unable to
cope with online education, as they have difficulties managing challenges caused by
the pandemic. They are unable to attend school in person and are restricted to online
education. They can often become depressed when they are unable to complete their
tasks at home, and see online education as a burden [3].
Many optimization algorithms have been rendered unusable over time because not all stakeholders' interests are addressed when creating a unique technological solution. Recognizing this, Design Thinking is a strategic, human-centered approach to improving problem solving through invention in different fields [4].
Design Thinking is “a discipline that uses the designer’s sensibility and methods to
match peoples’ needs with what is technologically feasible and what a viable business
strategy can convert into customer value and market opportunity” [5]. It underlines a
thoughtful and specific process for identifying issues in the system and developing or
coming up with potential solutions. It is predicated on the transformative yet simple
notion that individuals who face problems daily would have a high likelihood of holding
the key to solving them. Design thinkers often collaborate with multiple stakeholders
and actively identify problems and remedies, so the resultant solutions are the product of
thought, collaboration, and iterative effort from different perspectives. Design Thinking
revolves around three aspects: desirability (desirable conditions are available to help
understand the scenario), viability (the ability to grow and observe conditions), and
feasibility (conditions matching with the people’s needs) [6].
Online education causes depression, anxiety, and frustration among youth and children. About 13.4% of those aged 5 to 24 years in the USA have mental health disorders [7]. Studies in Saudi Arabia showed that depression was present among 6.7% of individuals aged 14–25 years and 11.3% of children aged 7–9 years [8].
Since COVID-19, the Ministry of Education in Saudi Arabia established the
Madrasty platform as the new gateway for distance teaching and learning for the 1st
to 12th grade for the 2020–2021 school year. However, students have faced many issues
with the platform. The authors studied these issues as part of this research.
This paper aims to improve the design of the Madrasty educational website in Saudi Arabia for online education by applying the Design Thinking approach as a foundation to extract the requirement set for a design prototype with complete options to support remote learning. It also investigates the level of depression under physical school attendance and under remote school attendance in Saudi Arabia. This comparison informs the addition of a mental health state feature to the educational website for remote learning.
The rest of the paper is organized as follows: Sect. 2 presents the related work in
applying Design Thinking in education and the relation between depression in children
and online education. Section 3 illustrates the approach of Design Thinking workflow.

Section 4 addresses the emphasis phase, including the data collection and analysis.
Section 5 identifies the problem and Sect. 6 defines the idea. Section 7 presents the
prototype. Section 8 describes the test phase, and Sect. 9 shows the result analysis in
detailed tabular form. Section 10 discusses the work and concludes in Sect. 11.

2 Literature Review
The literature presents the related work from two views: 1) Design Thinking in education
and 2) children’s depression in remote learning.

2.1 Design Thinking in Education


In the twenty-first century, students desire to be equipped with meta-competencies that
go beyond academic ability. As a result, education must shift from transmitting infor-
mation to improving student prospects through social constructivism. Teachers may use
Design Thinking as a collaborative learning approach to enhance practice-oriented and
comprehensive constructive learning in assignments. Research has affirmed that adopt-
ing Design Thinking improves the teaching experience for both instructors and students
[7]. This results in a more favorable attitude about constructive learning and greater use
in education. Fan [8] argued that a wide vision of STEAM (an educational approach to
learning that uses Science, Technology, Engineering, the Arts, and Mathematics as access
points for guiding student inquiry, dialogue, and critical thinking) beyond arts integra-
tion would be beneficial, and discussed the possibilities of Design Thinking in STEAM.
Despite widespread interest in STEAM, some teachers find it difficult to incorporate it
into their online course material. They claimed that Design Thinking, as a multidisci-
plinary intersection, serves as a natural link between aesthetics, mathematics, and other
disciplines. It can provide a unique system and innate predisposition for instructors to
develop STEAM-based courses and combine them as an integral component of students’
development.

2.2 The Relation Between Children’s Depression and Online Education


The COVID-19 pandemic has caused an increased focus on individuals’ mental health. It is understood that pandemics are widespread and trigger new stressors, including fears or worries about one’s own health, restrictions on physical mobility and social interaction due to quarantine, and abrupt and drastic lifestyle changes. A recent analysis of the effects of
outbreaks of viruses and pandemics has identified stressors including disease, frustration,
and depression [9]. Research on the current pandemic’s pathological or mental health
effects on students, considered a vulnerable population, is limited, especially from Saudi
Arabia [10].
Rajab et al. reported pandemic-related anxiety and depression in a cross-sectional sample of Saudi Arabian medical students, which was related to the difficulties of online learning [1]. Duraku and Hoxha demonstrated a lack of concentration and enthusiasm for learning and studying during the COVID-19 pandemic [11]. A decline in morale, self-efficacy, and cognitive commitment has been correlated with the shift to remote
learning. Schaeffer and Konetes found that students learning remotely are more likely
to leave their studies than traditional education students [12]. Furthermore, the major
factor affecting students’ studying skills was social isolation during online learning [13].
Generally, depression and anxiety are higher in children with ASD (autism spectrum
disorder), which may be due to school closures and working from home. Moreover,
depression has increased among parents due to a lack of professional support. Althiabi et al. proposed a methodology to explore the anxiety and attitudes of children and parents during COVID-19 [14]; that study helped identify factors for government involvement to relieve the burden of working from home and stabilize people’s mental health.

3 The Approach of Design Thinking


Design Thinking refers to the development of “design systems” ascertaining the prac-
tical, solution-based, strategic, and cognitive processes for coping with problems. By
analyzing existing models based on the Design Thinking framework, a design model is
outlined that helps define the attributes that affect depression in the selected age group.
Figure 1 shows the five key steps of the Design Thinking approach. Each step is
described in the following sections.

Fig. 1. Design Thinking 5-stage process

4 Empathize
Empathy-driven development aims to improve user involvement, engagement, and moti-
vation; as a result, it has the capability to mitigate some of the drawbacks of traditional
methods. The empathic design considers the entire end-to-end user experience in addition
to the core issue, its relevance, and the needs of multiple users [4].
The first step of the Design Thinking methodology is to empathize, providing the researchers with real data on children’s needs in order to improve the current design of educational websites so that it is compatible with remote learning needs and requirements. This phase
is achieved through two sequential steps; the output from the first step is input for the
second step. The details of these two steps are described below.

4.1 Step 1

The researchers conducted a questionnaire to evaluate the current educational site and to verify the need for adding features that support remote learning. The target users were the parents of children in grades 1 and 2 (7–9 years old) and the Madrasty platform teachers. A questionnaire was distributed to parents and teachers to measure their opinions about the Madrasty platform in virtual learning. A total of 350 questionnaires were distributed, 74.3% to parents and 25.7% to teachers. A list of proposed features related to the Madrasty platform and the results are presented in Table 1.

• Regarding a “Tutorial guide” video on how to use the Madrasty platform to help users better understand how the platform works, about 81.4% of the participants agreed (57.4% of the parents and 24% of the teachers).
• Regarding the preference for an option for a student to display new “Notifications” on the platform, such as new assignments and tasks to be completed, 90% preferred it (66.9% of the parents and 23.1% of the teachers).
• Regarding the preference for “Recorded classes” of the lessons for reference when studying, about 82.6% of the total participants preferred it (64% of the parents and 18.6% of the teachers).
• Regarding the preference for an option for a student to display the “Grade Center”, which gathers the different assessments (assignment, project, test) for a course in one place so that they can be followed up, about 82% preferred it (59.4% of the parents and 22.6% of the teachers).

Table 1. Features in questionnaire analysis

Features Parent Teachers Total


Tutorial guide Yes 57.4% 24.0% 81.43%
To some extent 11.7% 1.7% 13.43%
No 5.1% 0.0% 5.14%
Notifications Yes 66.9% 23.1% 90.0%
To some extent 3.1% 2.3% 5.4%
No 4.3% 0.3% 4.6%
Recorded classes Yes 64.0% 18.6% 82.6%
To some extent 6.3% 3.1% 9.4%
No 4.0% 4.0% 8.0%
Grade Center Yes 59.4% 22.6% 82.0%
To some extent 10.0% 2.3% 12.3%
No 4.9% 0.9% 5.7%

• Regarding the most important reasons that negatively affect the effectiveness of online education, Table 2 shows that psychological reasons (such as depression and anxiety) were the most important, selected by about 41.7% of the total participants (30.3% of the parents and 11.4% of the teachers). Family reasons and health reasons each accounted for about 16% of the total participants (32% combined), while material reasons accounted for 14.9%. The remaining 11.4% were other reasons that negatively affect the effectiveness of online education.

Table 2. Mental health feature analysis

Question: What are the most reasons that negatively affect the effectiveness of online education (from your point of view)?

Reason          Parents (Freq. / %)   Teachers (Freq. / %)   Total (Freq. / %)
Psychological   106 / 30.3%           40 / 11.4%             146 / 41.7%
Family          39 / 11.1%            17 / 4.9%              56 / 16.0%
Health          48 / 13.7%            8 / 2.2%               56 / 16.0%
Material        34 / 9.7%             18 / 5.1%              52 / 14.9%
Other           33 / 9.4%             7 / 2.0%               40 / 11.4%
Total           260 / 74.3%           90 / 25.7%             350 / 100%

From this step, the authors conclude that adding the aforementioned features to the Madrasty platform is required, and that it is important to consider students’ mental health state in the Madrasty platform.

4.2 Step 2

Based on the previous step, the researchers conducted a questionnaire to examine children’s mental health status in relation to physical school attendance and remote school attendance. The sample population was chosen from the Asir and Riyadh regions of Saudi Arabia because these regions were easily accessible to the researchers during 2021. The target users were the parents of young children in grades 1 and 2 (7–9 years old). The questionnaire was structured into three parts, as follows:

Part 1: Demographic information.


Part 2: Children’s depression level in attending school.
Part 3: Children’s depression level in the online school.

Before distributing the questionnaire, a test questionnaire was conducted with expert
users to ensure the items were clear. The questionnaire was distributed online through
family and friends, and responses from the two target regions were extracted. The
researchers received 1455 responses, 843 came from the Asir and Riyadh regions; 595
participants had children aged 7–9 years, and this is the sample considered. The sample
size was decided based on a confidence level of 95% and an error margin of 5%. Reliability was tested to measure the degree to which the research tool could be relied upon to give the same results in repeated applications. The results indicate that the indicators (Cronbach’s alpha and split-half) were 0.747 and 0.679 for parts 2 and 3, respectively, which is considered high.
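For readers who wish to reproduce such reliability checks, the following Python sketch computes Cronbach’s alpha and a Spearman-Brown corrected split-half coefficient. It is a minimal illustration, not the authors’ code: the responses are assumed to be available as a participants-by-items array, and the odd/even item split is our assumption rather than the authors’ stated procedure.

# Minimal sketch: reliability indicators of the kind reported above.
# `responses` is assumed to be shaped (participants, items) with Likert scores.
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """Internal consistency of a multi-item scale."""
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)
    total_var = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def split_half(responses: np.ndarray) -> float:
    """Spearman-Brown corrected split-half reliability (odd vs. even items)."""
    odd = responses[:, 0::2].sum(axis=1)
    even = responses[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)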

Part 1: Demographic Information Analysis


Of the 595 participants, 53.9% of the parents were 31–40 years old, 25.4% were 41–
50 years old, 17.5% were 20–30 years old, and 2.7% were more than 50 years old.
In terms of region, 60.8% of the participants were from the Asir region, while 39.2%
were from Riyadh. Regarding the number of children, 8.9% had one child, 23.5% had
two, 20% had three, 21% had four, and 26.6% had five or more children. Most of
the participants answered that their children attended kindergarten or preschool, while
16.8% of participants did not. Thirty-three percent of participants answered that their
children “sometimes” used computer-based activities to support their education before
COVID-19, 7.8% responded “usually,” and 5.9% responded “always.” However, 23.8%
answered that their children “never” used computer-based activities to support their education before COVID-19.

Part 2: Children’s Depression Level in Attending the School


This part of the survey consisted of nine items adapted from the PHQ-9 international standard test to the Arabic context and to online learning, to be answered by parents regarding their children’s mental health when attending school in person [15]. Items were designed using a Likert-type scale where “Not at all” = 0, “several days” = 1, “more than half the days” = 2, and “nearly every day” = 3. These scores were used to calculate the sum for each item. Table 3 shows the descriptive measurements for the 595 participants. The most prevalent item was item #1 (sum = 783, mean = 1.32, SD = 0.99), and the least prevalent was item #6 (sum = 48, mean = 0.08, SD = 0.39).
With the nine items and the four-alternative Likert-type scale, the possible total scores range from 0 to 27, indicating the degree of depression, classified into three levels based on Tönnies et al. [16], as shown in Table 4.
Based on the level of depression calculated individually for each participant, Table 5
shows that 50.3% of the students did not have depression when attending school in per-
son, 44.4% suffered from moderate depression, and 5.4% suffered from severe depres-
sion. Based on these results, most students aged 7 to 9 years did not suffer from depression
when attending school in person.

Part 3: Children’s Depression Level in the Online School


Part 3 consisted of the same nine items as before, but in the context of learning online.
Table 6 shows the descriptive measurement analysis for the 595 participants. The most
prevalent item was #9 (sum = 761, mean = 1.28, SD = 1.11), and the least prevalent item was #6 (sum = 86, mean = 0.14, SD = 0.51).
Based on the classification levels of depression (Table 4), Table 7 shows that 42.5%
of students do not experience depression in attending school online, 48.4% suffer from

Table 3. Descriptive measurements of items of the physical depression scale

Item (N and % of responses scored 0 / 1 / 2 / 3; Sum; Mean; SD)

1. Your child has little interest in completing his or her schoolwork: N 126 / 256 / 112 / 101; % 21.2 / 43.0 / 18.8 / 17.0; Sum 783; Mean 1.32; SD 0.99
2. Your child has a hard time falling asleep during school time: N 336 / 172 / 43 / 44; % 56.5 / 28.9 / 7.2 / 7.4; Sum 390; Mean 0.66; SD 0.90
3. Your child is lacking energy while preparing for school: N 197 / 255 / 85 / 58; % 33.1 / 42.9 / 14.3 / 9.7; Sum 599; Mean 1.01; SD 0.93
4. Your child feels unsuccessful in the school environment: N 410 / 132 / 35 / 18; % 68.9 / 22.2 / 5.9 / 3.0; Sum 256; Mean 0.43; SD 0.74
5. Your child is having a hard time concentrating on the lessons: N 414 / 122 / 41 / 18; % 69.6 / 20.5 / 6.9 / 3.0; Sum 258; Mean 0.43; SD 0.75
6. Your child is thinking of hurting himself when he encounters a problem at school: N 565 / 17 / 8 / 5; % 95.0 / 2.9 / 1.3 / 0.8; Sum 48; Mean 0.08; SD 0.39
7. Your child feels down while attending school classes: N 371 / 160 / 33 / 31; % 62.4 / 26.9 / 5.5 / 5.2; Sum 319; Mean 0.54; SD 0.82
8. Your child feels bad about himself during face-to-face learning at school: N 443 / 109 / 24 / 19; % 74.5 / 18.3 / 4.0 / 3.2; Sum 214; Mean 0.36; SD 0.71
9. Your child increased his movement while taking classes: N 317 / 185 / 48 / 45; % 53.3 / 31.1 / 8.1 / 7.6; Sum 416; Mean 0.70; SD 0.91

Table 4. The classification of the levels of depression based on PHQ-9 scores

Score        Level of depression

<= 4         No depression
5–14         Moderate depression
>= 15        Severe depression
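As a minimal illustration of how the depression level is derived for each respondent from the nine Likert items and the cut-offs of Table 4, the following sketch sums the item scores and maps the total to a band; the function name and coding are ours, not the authors’, and only reproduce what the text describes.

def classify_phq9(item_scores):
    """Sum the nine item scores (each 0-3) and map the total to Table 4."""
    total = sum(item_scores)      # possible range 0-27
    if total <= 4:
        return "No depression"
    if total <= 14:
        return "Moderate depression"
    return "Severe depression"

# Example: a child scoring 1 on every item has a total of 9 -> moderate.
print(classify_phq9([1] * 9))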

Table 5. Children’s depression levels while attending school in person

Level of depression Frequency Percent


No depression 299 50.3%
Moderate depression 264 44.4%
Severe depression 32 5.4%
Total 595 100%

moderate depression, and 9.1% suffer from severe depression; more than half of the
sample suffered from moderate or severe depression when studying online.
Figure 2 compares the results between children’s level of depression when attending
school in-person versus online. The results clearly show that children’s depression was
higher during remote learning than attending school in person.

5 Definition
The second step of the Design Thinking approach is to define the problem. The empathize step identified the problem of depression existing during remote learning. It indicates a need to design and add a set of features to meet children’s requirements during online education. The authors collected the issues with the Madrasty website, labeled them PM# (Madrasty problem #), and list them in Table 8.

6 Ideate

The third step of the Design Thinking approach is to ideate. For this purpose, the authors
explored various ideas related to the platform and its issues. The proposed ideas are
mainly based on the following criteria related to design principles:

• Simplicity: Making the design easy to understand, regardless of the users’ experience,
knowledge, language skills, or current concentration level, and adding instructions
with illustrations and text.
• Usability: Considering usability as the most critical factor in assessing the quality of a web application’s user interface, where users are mainly concerned with finding information quickly and want a platform that is easy to navigate, along with its design and content; for example, initializing camera capture whenever a student is potentially depressed.
• Accessibility: Designing an inclusive and equitable online learning environment for
diverse users to improve access to course content for all learners.
• Satisfaction: Making a product with the overall ease of use, comfortable learning,
ease of set-up and installation, accessibility, table of contents, help, graphics, and so
on.

Table 6. Descriptive measurements of items of the online depression scale

Item (N and % of responses scored 0 / 1 / 2 / 3; Sum; Mean; SD)

1. Your child has little interest in completing his or her schoolwork online: N 164 / 264 / 91 / 76; % 27.6 / 44.4 / 15.3 / 12.8; Sum 674; Mean 1.13; SD 0.96
2. Your child has a hard time falling asleep during online school time: N 347 / 157 / 41 / 50; % 58.3 / 26.4 / 6.9 / 8.4; Sum 389; Mean 0.65; SD 0.93
3. Your child is lacking energy while preparing for online classes: N 238 / 235 / 71 / 51; % 40.0 / 39.5 / 11.9 / 8.6; Sum 530; Mean 0.89; SD 0.92
4. Your child feels unsuccessful in the online education environment: N 399 / 133 / 36 / 27; % 67.1 / 22.4 / 6.1 / 4.5; Sum 286; Mean 0.48; SD 0.80
5. Your child is having a hard time concentrating on the lessons during online class: N 247 / 212 / 77 / 59; % 41.5 / 35.6 / 12.9 / 9.9; Sum 543; Mean 0.91; SD 0.97
6. Your child is thinking of hurting himself when he encounters a problem during online school: N 541 / 31 / 14 / 9; % 90.9 / 5.2 / 2.4 / 1.5; Sum 86; Mean 0.14; SD 0.51
7. Your child feels down while taking virtual classes: N 351 / 165 / 48 / 31; % 59.0 / 27.7 / 8.1 / 5.2; Sum 354; Mean 0.59; SD 0.85
8. Your child feels bad about himself during virtual learning: N 422 / 111 / 36 / 26; % 70.9 / 18.7 / 6.1 / 4.4; Sum 261; Mean 0.44; SD 0.79
9. Your child increased his movement while taking online classes: N 179 / 196 / 95 / 125; % 30.1 / 32.9 / 16.0 / 21.0; Sum 761; Mean 1.28; SD 1.11

The authors developed three ideas and examined the proposed solutions’ applicability
based on the above criteria. The approved idea is described as follows:

Table 7. Children’s depression levels while attending school online

Level of depression Frequency Percent


No depression 253 42.5%
Moderate depression 288 48.4%
Severe depression 54 9.1%
Total 595 100%

Fig. 2. Comparative graph of child depression during in-person and online school (no depression: 50.3% in person vs. 42.5% online; moderate depression: 44.4% vs. 48.4%; severe depression: 5.4% vs. 9.1%)

Table 8. Madrasty problems

Problem #  Feature and description

PM1  Tutorial guide missing: Students and teachers do not have sufficient information about all the platform’s available features.
PM2  Notifications missing: There are no notifications to alert students to new tasks or lessons.
PM3  Recorded classes missing: There is no option to record the class and automatically upload it for students.
PM4  Grade Center not clear: There is no grade center where parents can follow students’ marks for each course.
PM5  Students’ mental health state missing: The platform has no features for supporting students’ health, especially mental health.

The authors planned to design a platform similar to Madrasty with the previously missing features added. Teachers and students can log in with their Madrasty credentials. On the platform, students can view all the features missing in PM#s 1–4 and access online lectures and academic data. PM5 is related to the teacher’s view only. The redesign of Madrasty is called Madrasty 2, and it includes the newly proposed features.
This step concludes that there is a difference in children’s mental health between physical and online schooling: the level of depression in online school is higher than in physical school. This indicates the importance of adding mental health features to the educational site.

7 Design Prototype
Figure 3 shows the prototype screens for the Madrasty 2 platform, including a set of
new features to resolve the problems in the existing Madrasty. The main design looks
the same as the existing one. The new features that have been added are as follows.

1. Notification center: This will notify students of new tasks, events, and deadlines for assignments (Fig. 3A).
2. Grade center: This will help students and parents keep track of course grades (Fig. 3B).
3. Tutorial video: This will provide users with information on how to use the website (Fig. 3C).
4. Student’s mental health status: This lets parents and instructors look into a student’s mental health state (Fig. 3D).
5. Recordings of previous classes: This helps students download previous classes, as shown in Fig. 4.

Fig. 3. Madrasty 2 main page



Fig. 4. Madrasty 2 course page

8 Pilot Test

The last step of the Design Thinking approach is testing. The proposed prototype for the
Madrasty 2 website with the new features was used to perform the test. Usability testing
was conducted with two types of users as follows:

1. Students: 30 students aged 7–9 years who have previously used the Madrasty
platform.
2. Teachers: 20 instructors in the teaching domain to evaluate the newly added features.

The usability testing for teachers and students consisted of two parts. First, a list of tasks (Table 9) was prepared to measure task performance and the number of incorrect actions. Second, a post-test questionnaire was administered to measure their satisfaction. To define the benchmarks, a domain expert with background knowledge of the new features in the platform performed the tasks, and the time and number of clicks needed to perform each task were measured. The expert determined that a completion time of 25 s for a
given task constituted excellent performance, 35 s constituted acceptable performance,
and 45 s constituted unacceptable performance. It was also determined that 1 click to do
a task constituted excellent performance, 2–3 clicks were acceptable, and 4 or more was
unacceptable. The same method was used for the students, but an additional 10 s was
added for each performance time to balance the differences in age and abilities, although
the number of clicks was the same as for teachers.
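A short sketch of the benchmark rules just described; the thresholds follow the text and Tables 10 and 13, while the function names are illustrative only and not part of the study’s materials.

def rate_time(seconds, role="teacher"):
    """Teachers: <=25 s excellent, <=35 s acceptable; students: <=35 s / <=50 s."""
    excellent, acceptable = (25, 35) if role == "teacher" else (35, 50)
    if seconds <= excellent:
        return "excellent"
    if seconds <= acceptable:
        return "acceptable"
    return "not acceptable"

def rate_clicks(clicks):
    """1 click excellent, 2-3 acceptable, 4 or more not acceptable (both groups)."""
    if clicks <= 1:
        return "excellent"
    if clicks <= 3:
        return "acceptable"
    return "not acceptable"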
The prototype testing was run on a MacBook Pro individually for each student, and each student’s comments were recorded immediately. The test was given during real Madrasty platform time, between classes.

9 Results
9.1 Teacher Performance Tasks and Satisfaction Questionnaire
The usability testing for the 20 teachers consisted of five tasks (measuring the number of clicks per task), shown in Table 9. The average share of “excellent” clicks, 75.8%, speaks in favor of the new features. The best performance was on Task 1 and Task 2, with 100% and 85% of teachers, respectively, performing at “acceptable” or better.

Table 9. Teachers’ task performance (Clicks) (n = 20)

Tasks Excellent Acceptable Not acceptable


Task 1: If you want to check your grade for the last 85% 15% 0
homework, which feature do you click?
Task 2: If you want to check new notifications, 75% 10% 15%
which feature do you click?
Task 3: If you want to watch the guide video, which 75% 10% 15%
feature do you click?
Task 4: If you want to check a recorded class, 70% 10% 20%
which feature do you click?
Task 5: If you want to check a student’s mental 70% 15% 15%
state, which feature do you click?
Average 75% 12% 13%

Table 10 shows the completion times the teachers needed for the tasks, divided into three groups: excellent (≤25 s), acceptable (26–35 s), and not acceptable (≥36 s). The combined share of “excellent” and “acceptable” times, 72%, reflects the ease of use of the new features.

Table 10. Time classification of teachers needed to perform tasks successfully

Tasks Acceptable Excellent Not acceptable


Task 1 30% 50% 20%
Task 2 30% 50% 20%
Task 3 15% 55% 35%
Task 4 35% 30% 30%
Task 5 25% 45% 35%
Average 26% 46% 27%

Table 11 shows the results of the teachers’ post-test questionnaire (measuring their satisfaction). The results were obtained using relative weights (RW), a way to quantify the relative importance of correlated predictor variables in regression analysis [17]. RW is computed as (mean / 3) * 100%, where 3 is the number of answer categories; a value above 50% is considered a positive weight. The relative weight of all the results was about 78%, which indicates a positive result. The notification center on the home page had the highest relative weight, 98%. It was followed by the preference for displaying the “Grade Center” of different assessments (assignment, project, test) for one course in one place for ease of tracking, with a result of 83% (2.5 / 3 * 100 = 83%). An option to note the student’s psychological state in the main menu scored 82%. The item with the lowest result was the “Class recording”, with 67%.
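The relative-weight calculation can be reproduced as follows. The answer coding Yes = 3, Nearly = 2, No = 1 is our inference (it reproduces the reported means, e.g., 2.95 for question 1); only RW = (mean / 3) * 100% is stated explicitly in the text.

def relative_weight(yes, no, nearly):
    """RW = (mean / 3) * 100%, assuming answers coded Yes=3, Nearly=2, No=1."""
    n = yes + no + nearly
    mean = (3 * yes + 2 * nearly + 1 * no) / n
    return mean / 3 * 100        # values above 50% count as a positive weight

# Question 1 of the teachers' questionnaire: 19 Yes, 0 No, 1 Nearly -> about 98%.
print(round(relative_weight(19, 0, 1)))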

Table 11. Results for teachers’ usability testing questionnaire

Question (Yes; No; Nearly; Mean; Relative weight)

1. Do you prefer having the “Notification” option on the home page? Yes: 19 (95.0%); No: –; Nearly: 1 (5.0%); Mean: 2.95; RW: 98%
2. Do you prefer having an option “Grade Center” to display the different assessments (assignment, project, test) for one course in one place? Yes: 13 (65.0%); No: 3 (15.0%); Nearly: 4 (20.0%); Mean: 2.50; RW: 83%
3. Do you prefer having the “Mental Health” option to note the student’s psychological state within the main menu? Yes: 11 (55.0%); No: 2 (10.0%); Nearly: 7 (35.0%); Mean: 2.45; RW: 82%
4. Do you prefer having the “Class recorded” for reference when studying? Yes: 6 (30.0%); No: 6 (30.0%); Nearly: 8 (40.0%); Mean: 2.00; RW: 67%
Average RW: 78%
Note: relative weight = (mean / 3) * 100%.

9.2 Student Performance Tasks and Satisfaction Questionnaire


The performance tasks for the 30 students are shown in Table 12. The average share of “excellent” clicks, 60%, supports the suitability of the new features for the students. The best performance was on Task 2, with 90% of students performing at “acceptable” or better; the weakest was Task 4, where 37% of students performed at an unacceptable level.

Table 12. Students’ task performance (clicks) (n = 30)

Tasks Excellent Acceptable Not acceptable


Task 1: If you want to check your grade for the last 80% 10% 10%
homework, which feature do you click?
Task 2: If you want to check for new notifications, 83% 7% 10%
which feature do you click?
Task 3: If you want to watch the guide video, which 80% 7% 13%
feature do you click?
Task 4: If you want to check a recorded class, 43% 20% 37%
which feature do you click?
Average 71.5% 11% 17.5%

Table 13 shows the performance times for the students to complete each task, divided into three groups: excellent (≤35 s), acceptable (36–50 s), and not acceptable (>50 s). The combined share of “excellent” and “acceptable” times, 83%, reflects the ease of use of the new features.

Table 13. Time classification of students needed to perform tasks successfully

Tasks Excellent Acceptable Not acceptable


Task 1 80% 7% 13%
Task 2 87% 3% 10%
Task 3 80% 3% 17%
Task 4 43% 40% 17%
Task 5 53% 20% 27%
Average 68% 15% 17%

The students’ post-test questionnaire (measuring their satisfaction) is shown in Table 14. The relative weight of all the results was about 85%, which indicates a positive result. The preference for displaying the “Grade Center” of different assessments (assignment, project, test) had the highest support, with 86.7% of students answering “Yes”. This was followed by the notification center on the home page, with 73.3%, and the preference for “Class recorded”, with 60%.

Table 14. Results for students’ usability testing questionnaires

Question (Yes; No; Nearly; Mean; Relative weight)

1. Do you prefer having the “Notification” option on the home page? Yes: 22 (73.3%); No: 4 (13.3%); Nearly: 4 (13.3%); Mean: 2.60; RW: 87%
2. Do you prefer having an option “Grade Center” to display the different assessments (assignment, project, test) for one course in one place? Yes: 26 (86.7%); No: 1 (3.3%); Nearly: 3 (10.0%); Mean: 2.83; RW: 94%
3. Do you prefer having the “Class recorded” for reference when studying? Yes: 18 (60.0%); No: 12 (40.0%); Nearly: –; Mean: 2.20; RW: 73%
Average RW: 85%

10 Discussion

Regarding the analysis of the teachers’ responses to the performance tasks, it can be stated that, despite there being room for improvement, the percentage of correct actions is high, at 87% (Table 9). The average proportion of teachers performing all the tasks within an acceptable time was 72% (46% + 26%), indicating that well over half of the sample could perform all the tasks in an acceptable period. Table 15 shows that the average time the teachers needed to perform all tasks is about 34.2 s, the success rate for all tasks is 81%, the average number of clicks for all tasks is about 1.11, and the correlation between the average time needed for the tasks and the average number of clicks is about 0.94; a correlation value close to 1.0 indicates a strong positive relationship between time and clicks.
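The time-click correlations reported in Tables 15 and 16 appear to be computed between completion time and click count for a given task; a minimal Pearson-correlation sketch under that assumption (variable names are placeholders, not the authors’ data structures):

import numpy as np

def time_click_correlation(times_s, clicks):
    """Pearson correlation between per-participant completion time and clicks."""
    return float(np.corrcoef(times_s, clicks)[0, 1])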
For students, the overall success rate of correct actions during the performance tasks was 82.5% (Table 12), indicating that the majority could perform all the tasks with an excellent number of clicks. The average proportion of students performing the tasks within an acceptable time was 83% (68% + 15%), indicating that most students performed all the tasks in an excellent or acceptable period. Table 16 shows that the average time the students needed to perform all tasks is about 39.35 s, the success rate for all tasks is 74.75%, the average number of clicks for all tasks is about 1.28, and the correlation between the average time needed for the tasks
Table 15. Correlation between average time and average clicks for teachers

Task Time of task Done Clicks Correlation between time and clicks
Task 1 29.9 100% 0.55 0.812
Task 2 29.9 85% 1.00 0.954
Task 3 34.4 80% 1.00 0.988
Task 4 39.2 70% 1.40 0.970
Task 5 37.8 70% 1.60 0.943
Average 34.2 81% 1.11 0.940

and the average number of clicks is about 0.931, which indicates a positive relationship between these two variables, since the value is close to 1.0. For Task 1, the average time needed is about 34.5 s, with a success rate of about 83%; the average number of clicks is 1, which maps to excellent as defined above, and the correlation between the time needed for the task and the number of clicks is about 0.898.

Table 16. Correlation between average time and average clicks for students

Task Time of task Done Clicks Correlation between time and clicks
Task 1 34.5 83% 1 0.898
Task 2 32.7 90% 0.9 0.929
Task 3 34.5 83% 0.97 0.959
Task 4 55.7 43% 2.27 0.948
Average 39.35 74.75% 1.28 0.931

11 Conclusion

Design Thinking is a method for coming up with solutions to problems that already exist.
These solutions are always tailored to the demands of users and have a beneficial impact.
Design Thinking is an organized and iterative approach. This paper aimed to apply the Design Thinking approach to improve the educational website and supply the missing required features: 1) adding a grade center to display the grades for the different assessment methods; 2) adding a notification feature to alert students to updated notes; 3) supporting the website with a tutorial guide for all the different types of students; and 4) supporting the website with class recordings so that students can return to them when needed. Finally, the paper highlighted the importance of adding a mental health feature to the website panel, visible to the instructors and invisible to the students, to record any abnormal observations during online education, with the aim of investigating children’s level of depression during online education compared to in-person school. The researchers confirmed that the level of depression in children attending school online is higher than when attending in person. This result suggests that the authors should redesign the existing Madrasty platform and implement the new features, integrating mental health support into the educational website. The approach uses Design Thinking to anticipate the proposed prototype’s issues. The users’ experience acts as a map in this approach, with an emphasis on the following criteria: 1) usability, so that tasks can be performed quickly and without much instruction; 2) accessibility, with clear headings that organize the content’s structure and a keyboard-friendly platform; and 3) satisfaction, with limited options for increased interaction, appealing colors, and a focus on straightforward navigability. Plans should be developed in the future to more completely address children’s psychological needs and to manage depression during continued online education due to COVID-19.

Acknowledgment. The authors gratefully acknowledge all the participants in the experiment test
for their time and effective feedback.

References
1. Althiabi, Y.: Attitude, anxiety and perceived mental health care needs among parents of
children with autism spectrum disorder (ASD) in Saudi Arabia during COVID-19 pandemic.
Res. Dev. Disabil. 111 (2021). https://doi.org/10.1016/j.ridd.2021.103873
2. Gul, H., Iqbal, S.Z., Saqib, M.: Usability evaluation of an educational website in Saudi Arabia.
VAWKUM Trans. Comput. Sci. 8(2) (2015). https://doi.org/10.21015/vtcs.v8i2.382
3. AlAzzam, M., Abuhammad, S., Abdalrahim, A., Hamdan-Mansour, A.M.: Predictors of
depression and anxiety among senior high school students During COVID-19 pandemic:
the context of home quarantine and online education. J. Sch. Nurs. 37(4) (2021). https://doi.
org/10.1177/1059840520988548
4. Scholten, H., Granic, I.: Use of the principles of design thinking to address limitations of
digital mental health interventions for youth: viewpoint. J. Med. Internet Res. 21(1) (2019).
https://doi.org/10.2196/11528
5. Böhm, M., et al.: Fluid status telemedicine alerts for heart failure: a randomized controlled
trial. Eur. Heart J. 37(41) (2016). https://doi.org/10.1093/eurheartj/ehw099
6. Langkamp, D.L., McManus, M.D., Blakemore, S.D.: Telemedicine for children with develop-
mental disabilities: a more effective clinical process than office-based care. Telemed. e-Health
21(2) (2015). https://doi.org/10.1089/tmj.2013.0379
7. Olfson, M., Druss, B.G., Marcus, S.C.: Trends in mental health care among children and
adolescents. N. Engl. J. Med. 372(21) (2015). https://doi.org/10.1056/NEJMsa1413512
8. Fan, Y.: Research on feature extraction of EEG signals using MSE-PCA and sleep staging
(2018). https://doi.org/10.1109/ICSPCC.2018.8567757
9. Tung, K., Liu, P.K., Chuang, Y.C., Wang, S.H., Wu, A.Y.: Entropy-assisted multi-modal
emotion recognition framework based on physiological signals (2019). https://doi.org/10.
1109/IECBES.2018.8626634
10. Herman, K.C., et al.: Does child likeability mediate the link between academic competence
and depressive symptoms in early elementary school? Child Dev. 91(2) (2020). https://doi.
org/10.1111/cdev.13214
11. Duraku, Z.L., Hoxha, L.: The impact of COVID-19 on higher education: a study of interaction
among students’ mental health, attitudes toward online learning, study skills, and changes in
students’ life. Researchgate.net, May 2020
12. Schaeffer, C.E., Konetes, G.D.: Impact of learner engagement on attrition rates and student
success in online learning. Int. J. Instr. Technol. Distance Learn. 7, 3–9 (2010)
13. Hopkins, K., Crosland, P., Elliott, N., Bewley, S.: Diagnosis and management of depression
in children and young people: summary of updated nice guidance. BMJ 350 (2015). https://
doi.org/10.1136/bmj.h824
14. Venkataraman, D., Parameswaran, N.S.: Extraction of facial features for depression detection
among students. http://www.ijpam.eu
15. Belfer, M.L.: Child and adolescent mental disorders: the magnitude of the problem across
the globe. J. Child Psychol. Psychiatry Allied Discip. 49(3) (2008). https://doi.org/10.1111/
j.1469-7610.2007.01855.x
16. Tönnies, J., et al.: Mental health specialist video consultations for patients with depression or
anxiety disorders in primary care: protocol for a randomised controlled feasibility trial. BMJ
Open 9(9) (2019). https://doi.org/10.1136/bmjopen-2019-030003
17. Tonidandel, S., LeBreton, J.M., Johnson, J.W.: Determining the statistical significance of
relative weights. Psychol. Methods 14(4) (2009). https://doi.org/10.1037/a0017735
A Universal IT Support System for Teachers
for Educational Processes, Publishing
and Academic Research Using All-in-One
Educational Software

Stefan Svetsky(B) and Oliver Moravcik

Slovak University of Technology in Bratislava, Bratislava, Slovak Republic


{svetsky,oliver.moravcik}@stuba.sk

Abstract. Current learning technologies do not meet the needs of teachers


and individuals. There is an information overload and the academic field has
also become technology driven. Rather than technology operating according to
the teacher’s needs, individuals are required to adapt to the existing software.
Due to the incompatibilities between software, hardware and the formats of com-
puter files, information chaos is growing to huge proportions. Unless educational
algorithms (i.e., what is done with educational content) are defined, computer
algorithms, software, and systems for the integration of IT into teaching cannot
be designed. As part of our research into the automation of knowledge-based
processes, which includes educational processes, we have managed to solve the
problem of how to simulate human knowledge and pass it on to a computer, so
that it can ‘understand’ it. Our solution is a model of virtual knowledge that can be
processed quickly by a computer. The computer ‘understands’ this as a universal
representation of knowledge, while from a teacher’s point of view, it is an ordinary
table into which the teacher inserts educational content. This virtual knowledge
(having the structure of a database table) acts as a kind of knowledge container that
isomorphically connects the mental processes of the teacher with the physical pro-
cesses of the computer. Our WPad educational software is programmed to control the structure (content), so it is possible to create educational knowledge tables and a personal knowledge base for any activity that the teacher performs during teaching or research. The teacher does not need to adapt to the technology; instead,
the technology adapts to the teacher’s activities. Since the software runs on every
Windows computer and works as a multiple-in-one educational IT tool for a vari-
ety of lessons, it is probably the most efficient and cheapest technology solution.
It is used to support the integration of technology into classroom and distance
learning. Future research will focus on the creation of multilingual educational
packages.

Keywords: Learning technologies · Technology-enhanced learning ·


Educational software · Human-centered computing · Educational knowledge
management

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


K. Arai (Ed.): FTC 2022, LNNS 561, pp. 680–697, 2023.
https://doi.org/10.1007/978-3-031-18344-7_48

1 Introduction
The automation of teacher activities and the integration of IT into teaching forms
a complex interdisciplinary problem. Terms such as e-learning, learning technology,
technology-enhanced learning, educational technology, technology in education, and
digital learning can be encountered in the scientific literature, and these differ in princi-
ple based on whether a technology-driven or educationally driven approach is empha-
sized. Unlike in the recent past, a university teacher now needs to bulk process a much
larger volume of educational content in digital form and with more software that was
not developed for educational purposes. The automation of these activities is hampered
by information overload and the incompatibility between current software, hardware,
and computer files in various formats (such as text, image, audio, and video files). The
practical impact on educational processes is that the teacher, and in fact users in general,
must adapt to technology rather than the technology being used as a support tool to
automate learning processes, i.e., for the fastest possible and most efficient processing
of educational content.
Among general sources, Edutopia notes that “It is sometimes difficult to describe how technology can impact learning because the term technology integration is such a broad umbrella that covers so many varied tools and practices” [1]. However, from the teacher’s point of view, the use of these tools is based
on a technology-driven approach rather than an educationally driven one. The same applies when a reader encounters the statement that “Learning technology encompasses the full range of tools and media that can be used to facilitate teaching and learning” [2]. The web page for the Learning Technology Toolkit of the University of Saskatchewan lists over 20 technological items (e.g., Microsoft 365, OneDrive, Canvas, Mobius, Zoom). In terms of a different division of technology categories, the university has Approved Academic Tools by Function, which is more understandable to the university teacher, e.g., assessment, course management, content creation, STEM student practice, or open textbook creation/sharing.
For comparison, in the ICT-22-2016 call of the European Union research program “Technologies for Learning and Skills”, the focus was on the “innovation of learning technology” and on the challenge “to create an innovation ecosystem that will facilitate use of digital content, tools and services for personalized learning and teaching.”
From this short introduction, we can see that there is considerable chaos and incon-
sistency in the definitions of technologies that are suitable for IT integration. Although
attempts have been made to highlight the differences between terms, e.g. “The difference
between technology of education and technology in education,” as in [3], ambiguities are
also observed in general sources; for example, the link to technology-enhanced learn-
ing on Wikipedia redirects the user to the page on educational technology [4], which
states that this term “is not restricted to high technology but is anything that enhances
classroom learning in the utilization of blended, face to face, or online learning.”
In other words, these are basically only synonyms from the point of view of the aver-
age teacher, i.e., the same technology is referred to as e-learning, technology-enhanced
learning, learning technology, digital learning, or educational technology. The average
teacher needs to choose only a few tailor-made supporting tools from a wide portfolio of
existing technological tools and needs to focus on those that will allow for the creation,
management, and communication of educational content, and can automate teaching or the activities performed at a given time in the classroom or digital space.
Useful information on the integration of IT into teaching is provided by several
scientific monographs that have focused on technology-enhanced learning or educational
technology [5–9]. These consider real pedagogical practice, in which the educational
technology is subordinated to the teacher and serves as a support tool for streamlining
his activities and expertise. According to Stošič [10], educational technology has three
domains of use: (1) as a tutor (where a computer gives instructions and guides the user);
(2) as a teaching tool; and (3) as a learning tool, while simultaneously emphasizing
the importance and use of educational technology in the classroom. It should be noted
that technology is not a panacea and should be derived primarily from education [5, 7].
Namely, the current technocentric approach to technology-enhanced learning means that
teachers and students should adapt to existing technologies and test whether they can be
used for teaching. Martens also highlights the lack of educational software in the field
of technology-enhanced learning [11]. Such techno-centrism has also been the target of
relatively frequent criticism in scientific publications [12–16], which have emphasized
that technology should reflect so-called Technological Pedagogical Content Knowledge
(TPACK) [7, 17]. From this point of view, the all-in-one educational software WPad can
also be considered to be TPACK software for teaching and academic research.
Because the automation of teacher activities represents an interdisciplinary issue that
combines pedagogy (didactics) and informatics (information technologies), our research
deals with issues such as batch processing of educational information and knowledge,
the creation of educational packages, adaptation to existing software, hardware, networks,
and clouds. Since similar approaches have not been described in the literature, references
cannot be given, meaning that only general comparisons can be made. In addition, there
is no similar educational all-in-one software as described here (its novelty is confirmed
by the registration of our two utility models with the Slovak Patent Office).
In the following sections, we discuss the pedagogical aspects of the use of technology, our motivation and research focus on technology integration, and the research outcomes, explaining the universal approach to IT support using the WPad educational software (design of virtual knowledge and knowledge tables; basic and advanced levels of IT support using WPad; basic classroom support as a combination of face-to-face and online teaching; and the advanced level of IT support using WPad and the IT infrastructure).

2 Pedagogical Aspects of the Use of Technology in Education

It is interesting that no interdisciplinary approach to the application of technology in


education appears in the scientific literature (based on a search of Springer, Wiley, Sci-
ence Direct, and Emerald), although this is an area in which pedagogy (didactics) and
technology intersect. The technology-driven approach is preferred over an educationally
driven one. In this context, one very useful monograph on Technology Enhanced Learn-
ing (TEL) with such an interdisciplinary approach should be mentioned [5]. Although it
was written 20 years ago, its conclusions on the integration of technology are still valid
today. The scientific monograph emphasizes that “learning, not technology, should be
the driver of any educational innovation”, and “the design is not driven by technology
but by the painstaking analysis of human learning processes and of the requirements of
a particular task.” The ineffective implementation of technology-related change is also
mentioned, in relation to “resistance to change from students, professors, administra-
tors”. In the context of educational software, the authors mention that future software
applications should be concerned with three aspects of learning: lectures, laboratories,
and libraries. Particularly pertinent is the statement: “That is why we find it necessary
to have a design and technology team behind every professor.” Exactly the same view
is put forward today by the authors of [6], who state that “learning support services are
extremely important, so the instructors or tutors have to understand the learning difficul-
ties and the learning environment of the learners so as to have effective communication
with them.” It is also interesting to argue that “with the advent of visual technologies,
students lose the motivation to make their own notes.” A similar approach is found in
another monograph [7], which emphasizes that technology is not a simple panacea for
education and that a teacher is always a key player in the process of teaching and learn-
ing, in terms of creating and managing educational content. Specific emphasis is placed
on the TPACK model, which introduced the concept of Technological and Pedagogical
Content Knowledge (TPACK) as a framework for “integrating technology in teachers’
knowledge.” According to Mishra and Koehler, TPACK is an emergent form of knowl-
edge that goes beyond all three components (content, pedagogy, and technology), and
is “different from knowledge of a disciplinary or technology expert and also from the
general pedagogical knowledge shared by teachers across disciplines.”
The TPACK framework, with its seven types of knowledge, is still popular in educa-
tional technology, and is explained by many of the internet sources for the educational
community (see for example in the literature review in [18]). In regard to the TPACK
model, the authors of [7] emphasize that “teachers not only need to know the content
they are teaching but also must recognize how to integrate technology into pedagogy to
achieve greatest impact on desired outcomes.” Another model of classroom-based sce-
nario is discussed in [8] and is called the “Turn around Technology Integration Pedagogy
and Planning” (TTIPP) model; this includes phases based on the analysis of learning
and teaching assets and needs, design of the integration framework, and post-instruction
analysis and revisions. A great deal of attention is paid to a specific integrating tech-
nology across certain disciplines (e.g., science, engineering, mathematics, second and
foreign languages). The principle of the TPACK model is aligned with Laurillard’s state-
ment [9] that the optimal solution to technology-enhanced learning can be achieved in
practice if the teacher, researcher and designer work closely with each other.
In a paper on educational technology [6], which is mainly related to the area of learning activity design, it is emphasized that, from the perspective of learners, each learning activity includes four aspects: (1) the learning tasks (which allow the learners
to explicitly understand what they should do); (2) learning resources (non-digital and
digital materials that provide the learner with the necessary information and content);
(3) evaluation methods (which should allow for adequate examination of the completion
of learning activities); and (4) learning support services (where the instructors or tutors
should understand the learning difficulties and environment of the learners, in order to
facilitate effective communication with them). Several theories have been put forward on
the better design of learning activities, such as Bloom’s taxonomy, Sweller’s cognitive
load theory, and Mayer’s principles of multimedia learning. In the last approach, the idea
is that students can learn more deeply with multimedia than they could have with words or
pictures alone, and that multimedia instruction should “encourage the learner to construct
a coherent mental representation of the material” in order to “construct new knowledge”
[19]. One important argument is that “a technology need not be a specific device, as a
technology could be generally understood to be a systematic and disciplined application
of knowledge”. This question of knowledge is a key aspect of the all-in-one software
WPad, which can be considered an adaptive learning software. The cybernetics idea on
which the software is based was published in an AI journal, as it allows for knowledge
extraction and representation, even in natural language, usable by lay users [20]. As
will be explained, WPad is based on a specific model of knowledge representation, whereas in the aforementioned monograph [7] knowledge representation is discussed only in general terms and without any definition. Sensors, graphs, drawing and painting programs, and hypermedia are declared there to be technologies that represent knowledge. However, computers do not know what knowledge is or how to use it unless it is defined in computational terms. Such a representation of knowledge is presented, for example, by Syed [21] for next-generation knowledge machines, where knowledge is represented in the form of graphs as a quantifiable and dimensioned entity. This is only a theory and is far removed from the work of a teacher in the realm of natural language within educational settings.
The related pedagogical context can also be drawn from a newer monograph on the design of technology-enhanced learning [8]. The TPACK framework approach is
emphasized, as in previous studies, and the pedagogical aspects of technology-enhanced
learning are also clarified. From the point of view of the function of the WPad educa-
tional software, the important aspect of representing and sharing content is mentioned
in relation to conceptualizing content in the Anderson-Krathwohl taxonomy of learning,
teaching, and assessing. In the Anderson-Krathwohl taxonomy (i.e., a revised Bloom’s
taxonomy), factual, conceptual, procedural, and metacognitive knowledge are taken into
consideration [22].
The pedagogical aspects discussed above are rarely followed in practice, since com-
puters were invented for calculations rather than for teaching. As a result, the current
state of the technology has not yet reached the level required to support the teacher.
Existing technologies are still not optimal for practical teaching; users have to adapt to
the technology, and to check whether it is suitable for their educational needs. As set
out in the introduction, such a huge range of technological tools is now available that
ordinary teachers are likely to find this disorienting.

3 Purpose/Goal
3.1 Motivation and Research Focus on Technology Integration
Technology that is suitable for integration into teaching and selected pedagogical aspects are discussed here to clarify the purposes and goals of our research. As mentioned above,
our approach focuses on several interdisciplinary elements that are not described else-
where in the scientific literature. Although the scientific monographs discussed above
are very useful, from the point of view of the teacher or researcher, it is interesting
that they do not pay more attention to the factor of time, i.e., the speed of processing
educational knowledge and content, which is a key element in the automation of any educational process and fundamentally affects a teacher’s performance, and hence the learning outcomes in general. In addition, it is well known that if a teacher creates
educational materials, these need to be updated after a certain time, which poses a signif-
icant problem in practice. There is also little mention of the fact that although teachers
typically work for 10–20 years, the lifespan of software and hardware is only a few
years (for example, laptops and mobile phones often fail after 2–3 years, a programming
language may change, and operating systems and software are continually updated).
These practical issues are mentioned because universal software must be ‘resistant’ to
any changes in software and hardware. In this respect, there was a particular focus in
the development of WPad software on adaptation to the Windows operating system and
the most common Internet browsers. Compatibility with Microsoft Office packages and
other software used in education and the ability to switch from the program environment
to other software and online portals and environments are also advantageous.
From our point of view, however, the issue of mass processing of information and
knowledge is much more important, and this is not mentioned in the related scien-
tific literature. This issue formed the basis of our vision, published in 2007–2008, that a
knowledge worker (such as a teacher or researcher) needs to process such a large amount
of information in the course of teaching and research that they need to be technologi-
cally equipped like a “contemporary soldier” [23]. Since no suitable software was on
the market at the time, the designer of WPad began developing an all-in-one software
for undergraduates, based on a batch information and knowledge processing paradigm.
The progress made in terms of integrating technology into teaching was a subject on
which we continuously published papers in conferences and scientific journals in the
field of technology-enhanced learning. Our original empirical research (which was ini-
tially based on a technology-driven approach and then an educationally driven one) was
transformed into the current interdisciplinary research, including the registration with
the Slovak patent office of a utility model for the conversion of uncertain and unstruc-
tured data into semi-structured data. This is related to our model of virtual knowledge,
i.e., a specific data structure operating on the cybernetic principle of isomorphism of
physical computer processes and mental activities [24].
In the context of finding solutions for future learning technologies, our motivation
and focus is presently on the automation of knowledge-based educational processes
(based on our academic research). In practice, when a teacher aims to develop various
training activities, this mainly requires solving the following key issues:

• Accelerating the transfer of educational knowledge to computers (due to the lack of


switching between mental processes and machine).
• Customizing computer outputs in publishing, teaching, research (in terms of form and
compatibility).
• Solving the problem of the concentration of knowledge so that it is immediately usable
(for a given purpose and specific activity).
• Managing knowledge, exchanging it and transferring it between offline and online
environments (laptops, clouds, networks, Internet services, file repositories).
• Dealing with adaptation to operating systems, software, and hardware (for Windows,
Internet browsers, Web portals, Internet resources, clouds, and networks).
• Creating educational algorithms and synchronizing them with computer algorithms (without defined teaching algorithms, it is impossible to write programs).
• Turning a typical user's work with a computer into bulk work with information and knowledge (i.e., making work with the computer easier by naturally marking content and files, and by multitasking).

The main aim of this paper is to give an overview of how all these key elements
have been addressed in our academic research over about 15 years, with a focus on the
integration of IT into teaching to support the teacher (researcher, designer) as a key player
in the educational process. The secondary objectives are to exchange experience with the
academic community and to outline the challenges for solutions of future technologies. In
the following sections, several outcomes will be presented based on case study examples
drawn from academic practice.

3.2 Summary of Research Outcomes

A milestone in our research and the key outcome was the invention of the informatics
data structure mentioned above, which simulated human knowledge; this was predicted
in the author’s habilitation thesis, which focused on the mass construction of educational
content and e-learning materials [25]. This virtual knowledge was invented by addressing
the issue of how one computer program could function in academic practice as an all-in-
one program that was suitable for teaching, research, and publishing. As mentioned in the
previous section, it can replace numerous other software packages that the teacher would
need to use for the same educational activities. For a lay person, this can be explained by
the fact that it is sufficient for a teacher, student, researcher, or other user to find a way
of transferring tacit and explicit human knowledge into virtual knowledge (which takes
the form of an ordinary table containing plain text that can be edited). In this case, the
computer can process the virtual knowledge extremely quickly and give the teacher the
desired output in relation to classroom teaching or other educational activities. In doing so, a never-ending range of hundreds of possible outputs was opened up. Numerous categories of teachers' activities can now be supported by the WPad-based IT support system within a personalized IT infrastructure covering teaching, research, and publishing (at the level of personal or collaborative outcomes, including distance learning).
Within our research, the different categories of educational activities were supported
from the perspective of providing knowledge in teaching (lectures, exercises, self-study,
collaborative learning) as follows:

• Learning content for several study programs (for which the outcomes were published
in global conferences and scientific journals).
• Outcomes from cooperation with international consortia that have submitted proposals
for projects related to the integration of IT into teaching (FP7 and Horizon 2020 calls
for IT).
• The WPad educational software, a shared IT infrastructure (including WEB, cloud, vir-
tual servers), communication channels, and a batch knowledge processing paradigm.
• Outcomes from the V4+ACARDC project of the International Visegrad Fund (including the collaborative creation of multilingual educational content).
• Outputs from current non-project research focused on human-centered computing.
• Teaching methodologies and a non-relational database paradigm that can be used for
self-study, research, and publishing.

4 A Universal System for IT Support Using WPad Educational Software

4.1 Virtual Knowledge and Knowledge Tables
Universal IT support for educational processes, academic research, and publishing is based on a default data structure, known as virtual knowledge, which is controlled by our WPad software. This structure simulates the way in which people understand human knowledge. Virtual knowledge is defined as meta-information, which identifies the content and is combined with the (educational) content within one row of a common database table, as illustrated in Fig. 1. A set of rows forms a (virtual) knowledge table, so a teacher or other user can create many categories of their own knowledge tables and manage the tables in the same way as computer files (e.g., combining, selecting, transmitting, exchanging, copying, saving).

Fig. 1. Virtual knowledge representation for the automation of knowledge-based processes

Each user (a lay person, teacher, student, researcher, or expert) needs to find their
own style and way of inserting their tacit or explicit knowledge into the knowledge
tables using plain text. The content field of the table has a simple text editor, which
enables a user to manually input shorter texts or to paste in a larger amount of text. It
therefore functions as a container for the content and uses hypertext to directly connect
the knowledge tables to the Internet and the personal folders on the user’s computer. For
example, undergraduate students worked on computers in the classroom on which WPad had been installed and created their own learning tables when performing collaborative
teaching tasks during lectures or exercises. From a pedagogical (didactic) point of view,
the added value is the possibility of concentrating the teacher’s or student’s knowledge
from many offline/online resources in one place and processing it very quickly.
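To make the structure concrete, a minimal sketch is given below (in Python with SQLite, purely for illustration: WPad itself is built on the Visual FoxPro database platform, and the meta-information strings shown here are invented examples). It shows how one row combines the identifying meta-information with the plain-text content, and how such a table can be filtered like an ordinary file:

import sqlite3

# One row = meta-information identifying the content + the plain-text content itself.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE knowledge (meta TEXT, content TEXT)")
con.executemany(
    "INSERT INTO knowledge VALUES (?, ?)",
    [
        ("PAPER | FTC | abstract", "Draft abstract text for the conference paper ..."),
        ("LECTURE | STEM | Fe-C diagram", "Notes on the iron-carbon diagram; link: C:/courses/fe_c.jpg"),
    ],
)

# A knowledge table can be filtered, combined or exported in bulk, much like a computer file.
for meta, content in con.execute("SELECT meta, content FROM knowledge WHERE meta LIKE ?", ("%LECTURE%",)):
    print(meta, "->", content)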
The basic functions provided by WPad software are illustrated in Fig. 2. The table
is interlinked with some other tables and has both offline and online links related to
publishing using hypertext. The left-hand window shows part of the meta-information,
while the right-hand window shows the related (educational) content.

Fig. 2. Example of a (Virtual) knowledge table entitled PAPER, which is used to support
publishing in a WPad work environment

From a user's point of view, it is important to be able to follow hypertext links directly from the knowledge table, both to folders on the personal computer and to Internet paths, without needing to open a browser or Windows Explorer and type in the paths. WPad also functions as a simple HTML editor, so by simply pressing CTRL-F1, the table can be converted into a mirrored HTML format, as shown in Fig. 3.

Fig. 3. Example of the conversion of a knowledge table entitled PAPER into HTML format

In other words, a user can produce HTML tables with concentrated content, where
one row represents one Web page. Since the knowledge table can contain many rows, this enabled the development of the batch information and knowledge processing paradigm, i.e., a way of
applying mass processing to large amounts of educational content using a minimum number of interfaces at the level of an individual.
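As an illustration of the "one row = one Web page" principle, the short sketch below renders the rows of a hypothetical knowledge table as an HTML table. WPad performs the equivalent conversion internally (via CTRL-F1), so this is only a rough approximation of the idea, not the actual WPad code:

import html

def rows_to_html(rows):
    # Render (meta, content) rows of a knowledge table as one mirrored HTML table.
    cells = "\n".join(
        "  <tr><td>%s</td><td>%s</td></tr>" % (html.escape(meta), html.escape(content))
        for meta, content in rows
    )
    return "<table border='1'>\n" + cells + "\n</table>"

print(rows_to_html([
    ("PAPER | references", "Instructions for authors, journal links ..."),
    ("PAPER | figures", "Fig. 1 virtual knowledge representation ..."),
]))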
Figure 4 illustrates the plethora of different educational situations that can be handled using WPad and allows the reader to understand why and how WPad functions as a universal all-in-one tool. This means that this single software tool can perform activities
for which a teacher or other user would normally have to use several types of software.
Moreover, a teacher can use WPad to build a personal knowledge base in the form of a
system of interconnected knowledge tables.

Fig. 4. Model of the virtual knowledge table function

The teacher inserts educational content into the table and the computer ‘understands’
it as an IT data type that can be processed extremely quickly, with outputs provided in a
form that is comprehensible to humans. It should be emphasized that the computer does
not perform the mental work in the place of humans, but simply supports our mental
processes.

4.2 Basic and Advanced Levels of IT Support Using WPad


WPad can be used in either basic mode (for educational activities) as illustrated in Fig. 5,
or in advanced mode (for research or publishing) as illustrated in Fig. 6.
Figure 5 shows that the software can be installed in the classroom on a regular
Windows computer. If there is an Internet connection, it is also possible to use the
communication channels and virtual educational environment on the faculty server, and
there is a connection to the Academic Information System (AIS).
Figure 6 illustrates the advanced mode, in which all of the WPad functions and
a combination of offline and online environments can be used, for example, to create
e-learning content for a virtual learning environment, carry out collaborative activities
on a virtual server, provide multilingual language support, work with many computer
files (e.g., to create a content visualization for STEM), or to transfer (virtual) knowledge
tables and computer files over the teacher’s offline/online IT infrastructure.
Fig. 5. Example of the basic level of WPad, as used in the classroom teaching of undergraduates

Fig. 6. A teacher’s personal “hybrid” Offline/Online IT infrastructure, based on the advanced use
of our educational WPad software, including a communication channel (PIKS)

4.2.1 Basic Classroom Support: A Combination of Face-to-Face and Online Teaching
The basic method of using our WPad educational software, as shown schematically in
Fig. 5, evolved over several years of use in teaching undergraduates as part of multiple
study programs. Our approach is probably the cheapest way of supporting teachers and students, as an individual only needs a computer with the Windows operating system and an installation of WPad. It is important that WPad works directly with
Windows Explorer as well as the other Windows features that the teacher is familiar
with. In practice, students typically use WPad in combination with an Internet browser
and Microsoft Office programs. Thanks to its high compatibility with Windows, an
individual can also create educational and e-learning sets for the Web in bulk and can
interconnect the learning texts with images or audio files, etc. From a pedagogical point
of view, this is also in line with the Mayer model, which states that a student will
understand learning material more quickly if the learning content is a combination of
texts and images, and possibly in a multimedia format.
The teacher found that the WPad tables created by the students functioned as their notebooks; they made notes on lectures and exercises directly in the knowledge tables, and many of them would not have taken notes without WPad as a supporting tool. Another pedagogical (didactic) advantage was that the teacher could collect and evaluate the students' notes from the class computers, combine them into a single table, and place it in the faculty's shared virtual learning space. In this way, a collaborative activity was used to create new study material, which was also used for self-study by other undergraduates in subsequent years.

4.2.2 Advanced Level of IT Support Using WPad and IT Infrastructure


As can be seen from Figs. 4, 5 and 6, it is possible to handle a virtually unlimited number of
IT support situations using WPad. Our research has focused on the automation of
knowledge-based processes, as university teaching relates to the creation, dissemination,
presentation, and management of knowledge.
The advantage of (virtual) knowledge tables is that their data structure is not depen-
dent on the database platform, so such tables can also be used in an online environment (WPad itself runs only offline, or on the cloud via a virtual machine). This is also the focus of
recent research, which has an application focus on the creation of PIKS channels that
function as a PHP/MySQL web application. The difference is that while WPad runs on
the Visual FoxPro database platform (which has its own programming language and the
educational content is transferred with the tables), PIKS runs on the MySQL database
platform and the content of the tables is controlled by PHP source codes (the disadvan-
tage is thus that MySQL tables cannot be sent as regular computer files). On the other
hand, the advantage is the possibility to create table content and to share tables on the
Internet, i.e., for teachers and/or students to collaboratively insert learning content into
the same table. This is also advantageous for addressing questions in the field of CSCL
research (Computer Supported Collaborative Learning).
Unlike the basic level of technology integration using WPad, interdisciplinary
research requires a sophisticated approach and a more comprehensive IT infrastruc-
ture that combines offline and online tools and environments. For example, the standard
programming in which the user menu is positioned at the top of the screen is no longer
optimal, as it already covers the entire screen. Since it contains numerous optional items,
it can often be confusing, even for the author of the program. Therefore, a part of the user
menu contains additional application menu items, i.e., the application menu is composed
of sequences of simple menu items. Examples of simple menu items include Save/Save
As, a search of the table, ordering or filtering rows, transferring a row to another tab,
etc. The application menu can be understood as a sequence of simple menu items that
are selected with one click, so that in practice they function as a black box. The function
of the application menu can be illustrated by the following combinations of one-click
activities:
• Go to the IEEE journal page, create a table with links to yearly issues of the journal,
select the option to download it to a computer, open it, convert to HTML format, open
it in the internet browser, or if necessary, synchronize the transfer to the BOX cloud,
which is shared with several researchers.
• Copy the text from the conference proceedings to a row in the knowledge table, create
a corpus table and enter search keywords, e.g., keywords or stylistic phrases to support
writing an article in English.
• Copy the RTF output from the university’s publication server to a row in the knowledge
table (e.g., for the years 2010 to 2021), create a corpus table from it and search for
a list of your publications or any publications from the department, institute, or the
whole faculty.
• Make a list of all PDF files on the computer, USB, or backup disk, and add them as a
new line at the end of the opened table (SHIFT-F9).
• Write source code in a row of the table that will do something with the rows or content,
so it can be used instead of the standard command window or console, and the user can
enter the source code into the same table in which the educational content is stored.

IT support for publishing is also being developed, which is based on inserting various
content (e.g., multilingual annotations, links to journals, instructions for authors, pdf-
articles, and various custom or e-resources) into the text field of the table. This advanced
mode of operation is schematically illustrated in Fig. 7.

Fig. 7. Relationships between virtual knowledge/knowledge tables and computer files (after
loading or linking files to an empty table, the tables contain domain educational knowledge)

It should be emphasized that the transfer of WPad tables containing virtual knowl-
edge between notebooks, client computer folders or online servers is radically different
from the transfer of computer files as commonly used by teachers and other users. This is
not generally understood by reviewers of scientific journals with a focus on educational
technology and database specialists (as these users are familiar only with the relational
database paradigm). Computer files are processed in batches by file management meth-
ods, while in knowledge tables, the batches consist of groups of rows. A monthly table created manually by an individual contains about 20–50 rows, whereas a table created automatically with WPad can have a million rows.
Figure 8 illustrates the content of row 4887 from a table that has 676,896 rows.
The table was created automatically and contains a list of paths to all the existing files in
the teacher’s notebook. These can be opened directly from the table; for example, after
clicking on the path in this row, a picture of the Fe-C diagram from the STEM course will
be displayed. This principle of offline hypertext, which can be used as a menu item of the
user menu of WPad, represents added value in terms of file management in Windows.
In this context, Fig. 9 illustrates a file management process. The result table contains
a list of files output from a search of the BOX folder; this folder is synchronized with
the online BOX cloud, meaning that the researcher does not have to search in the BOX
cloud and can instead search offline.

Fig. 8. Knowledge table automatically created from the backup folder of the teacher's personal computer (knowledge base: 676,896 rows, 800 MB; opening/closing takes 20–40 s)

Fig. 9. Results of an offline search for an explanation of a file management function used in WPad
In our case, learning and teaching content can be stored both in the knowledge tables
and in computer files with various formats (TXT, HTML, PDF, DOC, JPG, MP3, MP4,
PHP, CPP, etc.). A regular user typically has many windows (different types of software, browsers, e-mail accounts, etc.) open at one time, and must work with numerous computer files and interfaces, switching between them by clicking with the mouse. In comparison, using the knowledge tables, the user can visit websites and local folders and open software or browsers directly, meaning that the number of mouse clicks required is significantly lower when using WPad. It can be estimated that an individual working with information and knowledge can save tens of thousands of clicks per year in this way. Since only selected learning content is inserted into the knowledge tables, a knowledge base consisting of an individual's knowledge tables is drastically smaller than the total size of the corresponding computer files. As WPad is an all-in-one software package, it is not possible to describe
all the cases that the authors have dealt with over the years of research. The following
screenshots illustrate some of them.
Figure 10 shows some of the teacher's tables in WPad, i.e., tables with direct links to
the academic information system without using a browser (shown at the top) and tables
used for assessment, with automatic evaluation and grading (shown at the bottom). For
the lower image in Fig. 10, it should be noted that at this time, handwritten work by
students was scanned and evaluated by scoring three areas, meaning that the computer
was able to automatically sum these and insert the result into the table (although the
addition of the points was done in the text field).

Fig. 10. Teacher’s tables linking to the academic information system and student assessments

Figure 11 schematically presents two cases from research on speech recognition and on modeling the creation of educational packages by an international team.
Fig. 11. Testing speech recognition software for controlling source code via voice (Left); scheme
for educational packages creation by an international team (Right)

5 Conclusion
In this paper, we have described a solution for integrating IT into educational processes based on the design of our own educational software, which serves as a universal tool for all the common activities of a teacher in teaching, publishing, and research. The teacher
does not have to adapt to existing technology, but the software and the IT infrastructure
are built according to the needs of the teacher and the students. WPad software was
explained in terms of its use as a universal interdisciplinary all-in-one educational tool.
From an informatics point of view, it can be used (1) for the processing of educational
texts; (2) creating a large amount of e-learning and educational materials, as it also
functions as a simple personal HTML editor; (3) as an editor and corpus when teaching
programming languages (C++, C, PHP); and (4) as a supporting tool for pre-service
teachers for their diploma theses, and in a wide variety of situations in the realm of
teaching and learning.
As WPad allows the teacher to process large amounts of educational content, it
has also been tested as a tool for processing large volumes of information contained in
computer files. Indeed, thanks to today’s technology, teachers have a “small internet”
on their computers. Therefore, the research focuses on aggregating educational content
from all offline/online sources and reducing it to the form of a personal knowledge base.
This approach allows teachers to minimize the current information overload. There are
also technological limitations, e.g., when transferring a very large number of computer files between offline and online environments, or limitations related to the technology
lifecycle, which is shorter than teachers need in practice. From a pedagogical point of
view, it is important whether the teacher is able to formulate the educational algorithms
needed to write the appropriate informatics algorithms. This is particularly important for
automating the creation of educational content in the form of learning packages. Future
work could therefore focus on the design of a shared virtual server for teaching students,
or the creation of an educational portal with language support. In terms of future plans,
the research will focus on interdisciplinary aspects such as synchronization of teaching
algorithms and computer algorithms. In this context, research is limited by the level of
available technology (e.g., the planned use of speech recognition technologies depends on the possibility of using them for languages other than English).

References
1. Edutopia: Technology integration (2007). https://www.edutopia.org/technology-integration-
guide-description
2. Learning technologies: Teaching with technology. https://teaching.usask.ca/strategies/lea
rning-technologies.php#Usingtechnology
3. Technology of education vs technology in education (2011). https://www.differencebetween.
com/difference-between-technology-of-education-and-vs-technology-in-education
4. Wikipedia: Educational technology. https://en.wikipedia.org/wiki/Technology-Enhanced_
Learning
5. Goodman, S.P., et al.: Technology-Enhanced Learning: Opportunities for Change. Lawrence
Erlbaum Associates, Mahwah, NJ, USA (2002)
6. Huang, R., Kinshuk, Jemni, M., Chen, N.-S., Spector, J.M. (eds.): Lecture Notes in
Educational Technology Series (2021). https://www.springer.com/series/11777
7. Roblyer, M.D., Doering, A.H.: Integrating Educational Technology into Teaching, 6th edn.
Pearson (2013)
8. Bower, M.: Design of Technology-Enhanced Learning: Integrating Research and Practice.
Emerald Group Publishing – Education (2017)
9. Balacheff, N., Ludvigsen, S., Jong, T, Lazonder, A., Barnes, S. (eds.): Technology-Enhanced
Learning. Principles and Products. Springer, vol. XXVI, 326 p (2009)
10. Stošić, L.: The importance of educational technology in teaching. Int. J. Cogn. Res. Sci. Eng.
Educ. 3(1), 111–114 (2015). https://doi.org/10.23947/2334-8496-2015-3-1-111-114
11. Martens, A.: Software engineering and modelling in TEL. In: Huang, R., Kinshuk, N.-S.C.
(eds.) The New Development of Technology Enhanced Learning: Concept, Research and
Best Practices, LNET, pp. 27–40. Springer, Heidelberg (2014). https://doi.org/10.1007/978-
3-642-38291-8_2
12. Oliver, M.: Learning technology: theorising the tools we study. Br. J. Edu. Technol. 44, 31–43
(2013)
13. Kinchin, I.: Avoiding technology-enhanced non-learning. Br. J. Edu. Technol. 43(2), 43–48
(2012)
14. Walker, R., Voce, J., Swift, E., Ahmed, J., Jenkins, M., Vincent, P.: 2016 Survey of Technology
Enhanced Learning for Higher Education in the UK. UCISA TEL Survey Report 2016.
University of Oxford (2016)
15. Lundie, D.: Authority, autonomy and automation: the irreducibility of pedagogy to informa-
tion transactions. Stud. Philos. Educ. 35(3), 279–291 (2016)
16. Svetsky, S., Moravcik, O.: Some barriers regarding the sustainability of digital technology
for long-term teaching. In: Arai, K., Bhatia, R., Kapoor, S. (eds.) FTC 2018. AISC, vol. 880,
pp. 950–961. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-02686-8_71
17. Mishra, P., Koehler, M.J.: Technological pedagogical content knowledge: a framework for
integrating technology in teachers’ knowledge. Teach. Coll. Rec. 108(6), 1017–1054 (2006)
18. Zhang, W., Tang, J.: Teachers’ TPACK development: a review of literature. Open J. Soc. Sci.
9, 367–380 (2021). https://doi.org/10.4236/jss.2021.97027
19. Mayer, R.: Cognitive Theory of Multimedia Learning, pp. 43–71. The Cambridge Handbook
of Multimedia Learning, Cambridge University Press, Cambridge, UK (2014)
20. Svetsky, S., Moravcik, O.: The automation of teaching processes based on knowledge
processing. Trans. Mach. Learn. Artif. Intell. 2(5) (2014)
21. Syed, V.A.: Next generation knowledge machine: design and architecture Page xiii. Elsevier.
https://www.sciencedirect.com/science/article/pii/B9780124166295000153
22. Biology discussion: Anderson and Krathwohl’s taxonomy (with comprehensive view)
| Biology. https://www.biologydiscussion.com/living-organism/taxonomy-living-organism/
anderson-and-krathwohls-taxonomy-with-comprehensive-view-biology/85945
23. Svetsky, S., Moravcik, O., Tanuska, P., Rehakova, A., Ruskova, D.: The implementation of
technology enhanced learning at dislocated university workplace. In: ICETA International
Conference on Emerging e-Learning Technologies (2008)
24. Svetsky, S., Moravcik, O.: The utility model UV 7340-2014: The linked unstructured data
processing system using a specific data structure. Industrial Property Office of the Slovak
Republic (2016)
25. Svetsky, S.: The practical aspect of knowledge construction and automation of teaching
processes within technology-enhanced learning and eLearning. Habilitation thesis, Slovak
University of Technology (2012)
Communicating Vessels Model for the Intelligent
Monitoring System of the Service Guarantee
in the New Generation of Digital Open
Universities (NG-DOU)

Boukar Abatchia Nicolas, Mahamadou Issoufou Tiado, Moussa Harouna,


and Ibrahim Ganaou Noura

Department of Mathematics and Computer Science, Research Team on Network and Telecommunication, University of Abdou Moumouni, BP 10662, Niamey, Niger
[email protected]

Abstract. The method of course delivery is improved through technological evolution. Examples include distance education, electronic learning, mobile learning (m-learning), and cloud learning. Using these techniques, some recent works highlight the advantages of building a New Generation of Digital Open Universities (DOUNG). This new model uses a hybrid architecture based on the interconnection between the Internet and GSM (Global System for Mobile Communications). It then becomes necessary to establish the thresholds of the parameters that can compromise the live course delivery service because of multimedia traffic constraints. This paper aims to contribute by highlighting significant parameters integrated into the knowledge base of the Intelligent Interface of Monitoring the Service Guarantee (IIMSG). From the network constraints and the lecture warehouse complexity, the parameters are set for the need of a "service guarantee", which mainly derives from the synchronous access mode, where resource mobilization can reach a critical threshold. We use the communicating vessels model to illustrate the system operation and to identify all the important parameters.

Keywords: m-learning · Digital open university · Service guarantee · Quality of Service (QoS)

1 Introduction

The DOUNG is an improved model defined in [1] that allows a learner to follow lectures using a laptop or a cell phone. The learner can use the synchronous mode and/or asynchronous file transfer over multiple available channels. However, mobile devices have capacity limitations in terms of processing, storage, and data display that force the system to operate under many constraints. A response is given by the authors of [2] through the Advanced Text Reading System (ATRS), which allows courses available in text format to be converted into audio. It is possible to improve the efficiency of the DOUNG service, which includes the VPN (Virtual Private Network) [3–6] and the m-learning [7]

model. For this purpose, this paper starts with the definition of the “service guarantee”
concept and its problem in the DOUNG system. Then the area covered is extended
with the complexity calculation of the audio/video lecture warehouse for asynchronous
access mode. The communicating vessels model is used to highlight the system opera-
tion in synchronous mode with significant QoS parameters that populate the Intelligent
Interface of Monitoring the Service Guarantee (IIMSG).

2 The “Service Guarantee” Concept


The IEEE Threshold of Error Rate
The "service guarantee" concept was integrated into the computer network domain early on. An example is the IEEE (Institute of Electrical and Electronics Engineers) call for proposals to standardize local area networks, which defined a tolerable error-rate threshold equal to 10⁻¹⁴. Thus, researchers and industrialists were engaged to specify solutions capable of conveying 10¹⁴ bits through the local area network before the modification of one of them occurs. In this example, the service guarantee is defined as an allowed error rate of the network. The conveying mechanism is considered normal if errors occur at no more than the specified rate.

The Allowed Rate of the Internet Service Provider (ISP)


In a second example, an agreement based on an allowed rate in bits per second can be established between an Internet Service Provider (ISP) and a client. The ISP uses a fixed period to determine the average number of bits put into the network by the client. For an agreed rate of X kb/s (kilobits per second) and a measurement period of Y seconds, the number of bits put into the network during that period is divided by Y. When the average is below X, the service guarantee is respected by both partners. Conversely, if it is greater, the ISP slows down the client's rate by discarding some of the next packets without prejudice, since the protocols' retransmission management recovers them. The aim of that mechanism is to manage time so as to balance the accepted rate. By monitoring the network activity, the ISP obliges the client to follow the agreement and verifies that the service guarantee is effective on both sides. The monitoring system used by the ISP can be based on the volume of data or on the duration of the connection, as in the telephone network. Like that system, the IIMSG is proposed to monitor the service guarantee between the DOUNG and the learners. It is used to verify that the lecture delivery activity matches their commitment. The parameters used to set the verification mechanism are integrated in the knowledge base of the IIMSG.
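A minimal sketch of such an average-rate check is given below; the period of 10 s, the agreed rate of 512 kb/s and the measured traffic volume are invented values used only to illustrate the mechanism:

def average_rate_ok(bits_in_period: int, period_s: float, allowed_kbps: float) -> bool:
    # The average rate over the period must stay at or below the agreed rate X.
    average_kbps = bits_in_period / period_s / 1000  # using 1 kb = 1000 bits
    return average_kbps <= allowed_kbps

bits_in_period = 4_800_000  # bits observed entering the network during the last 10 s
if average_rate_ok(bits_in_period, period_s=10, allowed_kbps=512):
    print("service guarantee respected on the client side")
else:
    # The ISP may discard some of the next packets; the transport protocol retransmits them.
    print("rate exceeded: throttle by discarding upcoming packets")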

The Traditional Mail Exchange of the Postal Service


Let us consider a third example: traditional mail exchange through the postal service, taking a stamped letter in its envelope as a bit. If, in a first scheme, one letter out of two reaches its destination while the second is lost, the service is far too unreliable and will drive the post office's clients away. Conversely, in the second scheme, if millions of letters reach their destination before one is lost, that transmission error will not compromise the reliability of the service. Generally, the service guarantee
depends on a set of identified parameters. Their values can strengthen or compromise the
reliability of the service, the latter case meaning that the provider has failed to respect its commitment. In some cases, such a violation engages the civil liability of the service provider, often with the obligation to compensate the client.

3 The DOUNG “Service Guarantee” Problematic

When transposing the postal service model to the distance education service offered
by the DOUNG, for a learner following a lecture in real time, the service reliability is
compromised if a large percentage of the lecture is lost. The loss can occur in the system
or according to the vagaries of the network link. Particularly, the volatile nature of the
wireless link integrated in the DOUNG architecture can compromise the transmission
of the teacher’s message and can alter the understanding of the lecture. To ensure the
reliability of its lecture delivery service, the DOUNG can use the IIMSG to indicate
the service guarantee level attached to each learner access type. The reliability of the
service helps to achieve the goal of increasing the attendance and influences the learner’s
choice of the service access mode according to many constraints such as the type of
communication and of Internet access, the using of cell phones and the crossing of the
GSM network [8].
Generally, mobile networks such as GSM are characterized by a low data rate, high latency during information exchange, low battery autonomy, and high cost. In addition, the availability of the channel is volatile, which strongly affects the option of following a DOUNG lecture in real time. It becomes more complex to set the cost of each kind of course access because of the nature of the mobile networks, in addition to the no less binding constraints of current cell devices. In many multimedia cases, the real-time system implements the anticipation window concept. To support a large buffer size, it is necessary to combine the storage limitation constraint of the cell phones with the high latency constraint described in the channel model, which assumes repeated link failures of the GSM network. Combining the two constraints leads the IIMSG to invoke inference rules and determine the rank from which downloading a multimedia file or another file format (web page, formatted text, plain text, or PDF) becomes more efficient than following the course live.

4 The Course Content Visualization Problematic on Cell Devices

The nature of the cell device determines the local environment of the learner. The heavy requirements of the visualization application in terms of storage, processing, and browsing capacity contrast with the cell phones' limited interfaces, low processing capacity, poor ergonomics, and weak storage capacity. As the DOUNG offers multiple options for lecture delivery, an additional constraint is accessing the lecture live, or the lecture warehouse, regardless of the requirements of the hybrid platforms used. The cell device environment requires a global approach to specifying the lecture formats and a particular management of the lecture
warehouse. One solution is based on the use of learning objects that are not bound to a platform. The concept of systems interoperating easily with each other becomes of paramount interest. A learning object can be designed as a cloneable model that incorporates different options and can be used by software applications according to their needs. In the DOUNG case, the learning object will incorporate the content delivery system to implement modular learning. The learner can access the DOUNG lecture warehouse and download a learning object ready to be played and visualized on a conforming system. In addition, evolving web browsers improve by integrating the capacity and the connectivity of mobile devices.
To achieve the goal of using a mobile learning solution on a wide range of devices, the content delivery must be independent of the mobile devices. The content must be separated from the format, so that devices do not have to implement specific solutions. The XML language (eXtensible Markup Language) [9–12] illustrates a way of specifying the content independently of its visualization on multiple types of mobile devices.
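The following short sketch illustrates the idea of separating content from presentation with XML; the element names used (learning_object, unit, text, audio) are hypothetical and do not correspond to any DOUNG or m-learning standard:

import xml.etree.ElementTree as ET

# The course content is described once, independently of any device or layout.
lo = ET.Element("learning_object", id="algebra-01", lang="en")
unit = ET.SubElement(lo, "unit", title="Matrices")
ET.SubElement(unit, "text").text = "Definition and properties of square matrices."
ET.SubElement(unit, "audio", href="matrices.mp3")

print(ET.tostring(lo, encoding="unicode"))

The same content tree can then be rendered as HTML, plain text or an audio playlist by a device-specific renderer, without changing the course data itself.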

5 Definition of the First List of Parameters Used to Monitor the Service Guarantee

The Complexity of the Audio Lecture Warehouse


The DOUNG asynchronous option for following courses requires a lecture ware-
house. The complexity of the space needed by the warehouse is integrated in the service
guarantee parameters for the opening of a new branch of study. Once started, the process of providing education must be led to its end, and resolving the storage space issue becomes a priority for ensuring service reliability. The amount of space devoted to the storage of all lessons is predetermined using the theoretical parameters of the IIMSG and real, calculated parameters. The former are projections; the latter are calculated during lecture delivery.
We consider an hour as the unit of the lecture duration in association with the average
throughput of the multimedia device. The average space needed to store a learning object
is then instantiated as that teaching unit. Its complexity is deduced for a lecture given in
synchronous mode by determining the network transfer duration.
Let us denote by Ta the number of bits per second generated by an audio capture application. The rate Ta = 96 kb/s (kilobits per second) is an example of an audio format used by some common applications. As a second example, MP3 (Moving Picture Experts Group – Audio Layer 3) generates about 1 megabyte (Mb) per minute at a rate of 128 kb/s. Taking the duration of a teaching unit as equal to one hour, the equation below gives the ratio Ca in Mb, which is the complexity rate of one teaching unit in audio format:

Ca = (3600 ∗ Ta) / (8 ∗ 1024 ∗ 1024) = Ta / 2330    (1)

The Complexity of the Video Lecture Warehouse


Unlike audio, the video stream also integrates metadata, namely the image resolution and the frame rate. For example, the AVI (Audio Video Interleave) video format uses a technique that encapsulates the audio and video streams in continuous mode. The required synchronization between the two flows facilitates the simultaneous playing of the sound and the image sequences, with various formats for the video. The OpenDML format, developed subsequently, allows the 2 GB limit set by the basic AVI format to be exceeded. The evolution of the AVI format also led to the AMV (Anime Music Video) format, created for MP3 and MP4 players, with a ratio of 4 pixels per byte (Pbb) instead of the 10 Pbb ratio generated by MPEG-2 (Moving Picture Experts Group). The resolution range used by the AMV format goes from 96 × 96 pixels to 208 × 176 pixels. The images-per-second (Ibs) cadence varies from 10 or 12 to 16 images. For a resolution of 128 × 96 using 12 Ibs, a video stream of thirty minutes generates around 80 Mb. Before the advent of the MPEG format with its high compression ratio, the evolution of multimedia technology produced M-JPEG (Motion-Joint Photographic Experts Group), with video capture devices able to process a 29 Mbps flow rate.
Let us denote by Tv the number of bits per second generated by a video capture application. Let Nx × Ny be the image resolution, with Nx the number of pixels per line and Ny the number of pixels per column. Let Ni be the Ibs cadence used by the video capture device and Np the number of Pbb. Taking the duration of the teaching unit as equal to one hour, the equations below give the complexity rate Cv in Mb of one teaching unit in video format.
The number of bits of every image is given by:

(Nx ∗ Ny) / (8 ∗ Np)    (2)

Tv = (Nx ∗ Ny ∗ Ni) / (8 ∗ Np)    (3)

Cv = Tv / 2330, by (1) and (3)    (4)

Ca and Cv allow the space required in the lecture warehouse, and on a learner device that uses the complete download option, to be determined. Thus, the space required on the cell device can be deduced, as well as the capacity of the external storage device to be used. The amount of space required for opening a distance education branch of study in the DOUNG derives from these two parameters by applying university norms, and efficient projections can be made for the equipment to be mobilized. Ca and Cv are integrated in the IIMSG, which is used as a dashboard. In addition, the decision to open a new branch of study requires resolving the asynchronous and synchronous connection throughput with the necessary additional parameters.

6 Determination of the Lecture Warehouse Complexity


Recently, all academic backgrounds have been normalized and converted into the uni-
versal “Licence-Master-Doctorat” (LMD) standard. The previous Ca and Cv parameters
indicate the amount of space required to store one hour of lecture in the lecture ware-
house. To determine some next equations, let’s consider Cas as the amount of space
required to store all audio courses in a DOUNG branch of study and Cvs its equivalent
for multimedia courses. The standard duration for the exemption of a lecture varies from
one hour and fifteen minutes to two hours including the break time and the questions
and answers period between the teacher and the learners. The academic standard sets
the duration of a license branch of study to three years of 600 h per year. These hours include the Lecture (LT) period, the Assignments (AS) period and the Practice Work (PW) period. The AS and PW are mainly conducted by the learners, and the LT by the teacher. The 600 h of a year are divided into three periods for the LT, AS and PW. Let us assume that the three periods are equivalent. The LT periods are used to determine the required amount of storage space. Thus, during the whole three years of a license branch of study, 600 h are spent in the standard curriculum on LT. The previous Cas and Cvs are then
adapted to that specific case to become the new Casl and Cvsl parameters calculated in the
following (5) and (6) equations. They indicate the complexity of the lecture warehouse
for a license branch of study in the DOUNG according to the audio or video format of
the lectures:

Casl = 600 ∗ Ca (5)

Cvsl = 600 ∗ Cv (6)

The same scheme is applied to the two years of the Master branch of study. It produces 400 h of LT when the internship period is left out of consideration. The Casm and Cvsm parameters indicating the complexity of the lecture warehouse for a master branch of study are calculated in the following Eqs. (7) and (8).

Casm = 400 ∗ Ca (7)

Cvsm = 400 ∗ Cv (8)
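A small numerical sketch of Eqs. (1) and (4)-(8) is given below; the audio and video stream rates used are example values only, and the video rate Tv is taken directly as an input rather than recomputed from Eqs. (2)-(3):

def teaching_unit_mb(stream_bps: float) -> float:
    # Space in Mb for one one-hour teaching unit: C = 3600 * rate / (8 * 1024 * 1024) = rate / 2330.
    return stream_bps / 2330

Ca = teaching_unit_mb(96_000)    # audio stream Ta = 96 kb/s, Eq. (1)
Cv = teaching_unit_mb(295_000)   # example video stream rate Tv, Eq. (4)

Casl, Cvsl = 600 * Ca, 600 * Cv  # license branch of study: 600 h of LT, Eqs. (5)-(6)
Casm, Cvsm = 400 * Ca, 400 * Cv  # master branch of study: 400 h of LT, Eqs. (7)-(8)

print(f"audio: {Ca:.1f} Mb/hour, license {Casl:.0f} Mb, master {Casm:.0f} Mb")
print(f"video: {Cv:.1f} Mb/hour, license {Cvsl:.0f} Mb, master {Cvsm:.0f} Mb")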

7 Description of the Communicating Vessels Model

Additional parameters are identified when considering the synchronous transfer mode and the use of the anticipation window by the learner device. When that system is operating, both buffers (DOUNG and learner) must prevent the multimedia stream from being exhausted. The information residing in the DOUNG buffer is waiting to be conveyed in real time to the learner destination through the network. Every information unit should reach the learner's anticipation buffer completely before the demand from the learner application occurs, even during its transfer process. We use the communicating vessels model to describe all the parameters required to monitor the service guarantee.
The classic communicating vessels model associates two vessels whose content moves from one to the other through a pipe used as the conveying channel. The amount of content subtracted from one is equal to the amount that reaches the other. Thus, when providing a real-time lecture, the first vessel is the buffer created by the camera stream at the digital university side. The channel is composed of the DOUNG protocol stack at the sender side, the network, and then the learner protocol stack at the destination. The second vessel is the learner buffer created by the anticipation window of the application.
Fig. 1. The communicating vessels model of the synchronous access mode

Figure 1 below illustrates that model and allows all the additional IIMSG parameters to be identified. It helps to build their equations and makes their interdependence explicit.
Note: The DOUNG and learner buffers operate in FIFO (First In First Out) mode. For the DOUNG buffer, the stream enters from the top (input) and goes out (output) from the bottom of the vessel. Conversely, the input access point of the learner anticipation buffer is the bottom, while the stream is output from the top.
Legend of the parameters:

Tv: Camera bit rate (in bits per second)
Fs: Full level of the sending buffer (in number of bits)
Es: Empty level of the sending buffer (in number of bits)
Vtc: End-to-end theoretical bit rate of the DOUNG channel (in bits per second)
Fr: Full level of the receiving anticipation window (in number of bits)
Er: Empty level of the receiving anticipation window (in number of bits)
Rv: Data displaying bit rate (in bits per second)

The Round Trip Time (RTT) is an additional parameter that shows the time spent by
the learner’s application to reach the DOUNG and to get back the requested data.

8 Resolution of the Transfer Time Problematic in Synchronous Mode

Integrating the Network Delays


Lecture delivery in synchronous mode needs to consider the DOUNG initial session start time (STs). Crossing the network causes the learner to synchronize with a delay, at their own initial time STv. Let us assume δ is equal to the number of bits required by the anticipation buffer before the learner application starts playing the video. Then the difference between STv and STs is extended by the time used for the transfer of δ bits through the network. The value of δ is given as the required size of the anticipation buffer. In addition, the network throughput and the type of traffic help to determine the shift time before synchronization between the teacher and the learner. For example, the Constant Bit Rate (CBR) [13] model provides an uninterrupted information stream, contrary to the exponential traffic variation model, which describes alternating traffic with peaks and low-activity periods. Some other models integrate traffic interruption periods to differentiate the continuous and the discontinuous (discrete) nature of the information stream.
We are using average values of the CBR for monitoring the service guarantee dur-
ing the lecture delivery. Thus, at every current time (CT), the amount of transferred
information is used to calculate the effective values of the parameters that populate the
IIMSG.

Using the Communicating Vessels


When a lecture is started, the two communicating buffers are initially empty. At the DOUNG side, the camera tends to fill the sending buffer while the network has the effect of emptying it. Conversely, the network tends to fill the learner anticipation buffer while the visualization application has the effect of emptying it. During lecture delivery, the amount of information available in the two buffers at time CT becomes an important parameter for establishing the probability of a synchronization failure between the teacher and the learner. Thus, the service guarantee is subject to the values of the two parameters Qs and Qr, related to the amount of information available in the two communicating vessels at time CT. The CBR traffic model helps to determine the formula for the average value of these parameters. Their effective values are established in real time from the effective amount of exchanged information.

Integrating the Data Time to Live (TTL)


The IIMSG integrates the TTL of the data in the DOUNG buffer, a value which is given following the routing protocol model. For its calculation, the RTT is considered, allowing the learner application to operate the recovery of lost information for a maximum of Tmax attempts by using the advantage offered by the anticipation window. The Tmax value is set following the maximum number of attempts of the well-known Ethernet CSMA/CD (Carrier Sense Multiple Access with Collision Detection) [14, 15]. Deriving from the TTL, the TTLmax parameter is defined at the current time CT as the maximal transit duration of information in the DOUNG buffer regardless of the occurrence of recoveries. Any recoveries that occur may extend that duration by adding the time taken to cross the network multiplied by the number of transferred packets from the beginning to CT. The maximal transit duration allows it to be established whether the DOUNG configuration gives learners a chance to operate successful recoveries of lost information before that information
expires. That philosophy avoids the loss of synchronization and the loss of the thread of
the lecture by the learner. Such a loss impacts the understanding of the teacher message.

Integrating the Recovery Time of the Receiver (RTR)


The RTR is another IIMSG parameter showing the recovery duration for lost data. It varies according to the traffic state, the network throughput, the traffic nature (CBR, for example) and the Smax parameter (in number of bits), which indicates the maximal threshold of information that can be stored in the DOUNG buffer. The value of Smax is set according to the value of Tv, the camera throughput, and to Tmax, the maximum number of allowed recovery attempts. For example, the threshold Smax for asynchronous access is equal to the amount of information generated by the camera during one hour of lecture, so that it can be stored in full. The RTR problematic is discussed later and allows the PRTR (Boolean) parameter to be determined, indicating whether a recovery has a chance of succeeding. The threshold FRTR can be set to limit the number of unsuccessful recovery occurrences indicated by PRTR during a period. That period depends on the content available in the anticipation buffer. When the value of FRTR is reached, an inference rule advises the learner to choose the asynchronous mode, which is becoming the better match for the service guarantee constraints. The effective value of PRTR is calculated by the monitoring system during lecture delivery. The retransmissions of a reliable transport layer protocol, or the same mechanism implemented by the applications, help to achieve that goal. Thus, the occurrence of a recovery failure can be monitored in an efficient manner.

Integrating the Effective Throughput of the Network


We consider Vrc as a parameter of the IIMSG giving the effective throughput of the network. Its value is determined from the amount of data received by the learner. Monitoring the entrance of the anticipation buffer facilitates that calculation.

9 Summary of the IIMSG Parameters from the Communicating Vessels Model
The previous parameters of the IIMSG are calculated as follows. The current time CT is used to calculate the number of seconds elapsed since the beginning of the lecture. This number is called T and is presented in Eq. (9) below:

T = CT − STs    (9)
The following equations, (10) to (12), use Vtc as the theoretical throughput of the network, while Vrc is the effective throughput. To determine their values in the interconnection between the Internet and the GSM network, the slowest channel imposes its throughput. The TTLmax parameter (in number of seconds) calculated below gives the maximal time to live of the data in the DOUNG buffer without any recovery. Considering the FIFO principle and the CBR traffic model, TTLmax is the time taken by the network to evacuate the preceding stream injected into the DOUNG buffer by the camera, without recovery, before reaching the information injected at time T.
TTLmax = (Tv ∗ T) / Vtc (10)
The theoretical data amount available in the DOUNG buffer at CT, without any
recovery, is Qs (in number of bits):

Qs = (Tv − Vtc ) ∗ T (11)

The theoretical data amount available in the anticipation buffer of the learner at CT,
without any recovery, is Qr (in number of bits):

Qr = (Vtc − Rv ) ∗ T (12)
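The following sketch evaluates Eqs. (9)-(12) at an arbitrary instant; the camera, channel and display rates are placeholder values chosen only to show how the buffer levels evolve under the CBR assumption:

def buffer_state(CT, STs, Tv, Vtc, Rv):
    T = CT - STs             # seconds elapsed since the lecture started, Eq. (9)
    TTLmax = (Tv * T) / Vtc  # maximal time to live in the DOUNG buffer, Eq. (10)
    Qs = (Tv - Vtc) * T      # bits waiting in the DOUNG sending buffer, Eq. (11)
    Qr = (Vtc - Rv) * T      # bits available in the learner anticipation buffer, Eq. (12)
    return T, TTLmax, Qs, Qr

# Camera at 295 kb/s, channel at 256 kb/s, display at 250 kb/s, 120 s into the lecture.
print(buffer_state(CT=120, STs=0, Tv=295_000, Vtc=256_000, Rv=250_000))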

10 Recovery Time (RTR) Influence and Determination of PRTR


In the communicating vessels model, the DOUNG buffer is lowered by data being conveyed to the learner before it is exhausted. When a data loss occurs, it generates an overhead that increases the fill level of the DOUNG buffer by the network MTU (Maximum Transmission Unit) number of bits, in addition to the camera throughput. The same event decreases the
amount of bits available in the anticipation buffer. Following the transport layer reliable
protocols principle, when the recovery time expires, another recovery is initiated, and
so on, until the maximal number of attempts is reached. Every unsuccessful attempt
increases the data level in the DOUNG buffer because of the camera incoming stream
and the remaining data. It decreases the amount of available data in the anticipation
buffer of the learner by the browser emptying effect. The increasing and lowering effect
will accelerate according to the time taken by the system to overcome that recovery
issue, with the additional constraint to convey the normal stream after the resolution of
the data loss problem. The recovery will accumulate data in the DOUNG buffer and
extend their living time. Thus, a recovery can be successful only if the stream injected
by the camera will not exceed Smax . If not, the loss becomes effective with the expiration
of the data living time in the buffer. The FIFO mechanism will replace the old data by
the new camera stream. Let us assume ϕ is the number of unsuccessful attempts before the resolution of the recovery problem, and let us suppose that each retransmission retry period takes a constant time ε beyond the RTT. If the retransmission timer uses growing intervals between attempts, ε is replaced by the function that generates the growing values used. The time taken by the camera to reach the Smax threshold, in addition
to the available content in the buffer, becomes then of paramount importance. The same
philosophy can be considered at the anticipation buffer side, to avoid the reaching of
its critical threshold that is also defined following the δ parameter. From the DOUNG
side, a recovery at the time T can succeed only if the stream injected by the camera will
not bring the content of the buffer to exceed Smax . Under that condition, the information
loss can be avoided. Thus, QRTT can be calculated as the amount of bits injected by the
camera during that period:

QRTT = Tv ∗ (RTT + ε) ∗ ϕ    (13)

An inference rule of the IIMSG can be stated: the "true" value of PRTR allows the recovery to be initiated, while "false" indicates that it is prohibited. The value of PRTR
depends on Qs, the amount of data available in the buffer, added to QRTT and compared to Smax:

If (Qs + QRTT) < Smax then PRTR = "true" else PRTR = "false"    (14)

The values of Qs and Qr are exchanged between the two sides of the system. The
TCP (Transmission Control Protocol) flow control mechanism is used as a model; the "window" field of TCP can be used to limit the overhead in the communicating vessels system.
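A sketch of the inference rule of Eqs. (13)-(14) is shown below; the RTT, the retry margin ε, the number of failed attempts ϕ and the threshold Smax are illustrative inputs, with Smax set to one hour of camera stream as suggested above:

def recovery_allowed(Qs, Tv, rtt, epsilon, phi, Smax):
    # PRTR is true only if the stream injected during the retries keeps the buffer below Smax.
    Q_rtt = Tv * (rtt + epsilon) * phi  # bits injected by the camera during the retries, Eq. (13)
    return (Qs + Q_rtt) < Smax          # Eq. (14)

Smax = 295_000 * 3600  # one hour of camera stream, in bits
P_RTR = recovery_allowed(Qs=4_680_000, Tv=295_000, rtt=0.4, epsilon=0.1, phi=3, Smax=Smax)
print("recovery allowed" if P_RTR else "advise the asynchronous mode")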

11 Conclusion
The DOUNG is bound to respect its commitment to deliver a complete service of distance
education delivery following multiple constraints such as the mobile network weakness,
the volatile availability of its channel with repeated link failure. The constraints are
extended by the no more less binding of the actual cell devices having storage limitation,
low processing and browsing capacities. All these constraints impact the reliability of
the service and influence the learner’s choice of a service access mode.
This paper helps to identify the service guarantee parameters. The goal is to make
their values available for the DOUNG and the learner during a course delivery, showing
the level of service guarantee achieved. Some of the parameters that populate the knowledge base of the IIMSG derive from the synchronous mode, which has a restrictive transmission character. We study that real-time lecture delivery by using the communicating vessels model to achieve an efficient design for that problem. In addition, the parameters are extended by those of the asynchronous access mode or those required for opening a new branch of study.
Upcoming work is to run simulations according to the traffic type so that the variation of the buffer levels becomes explicit. This will help to set significant thresholds for the parameters that fill the inference base of the IIMSG.

People Skills and Online Learning: To Assume
Makes an Ass Out of U and Me

C. Todd Williams(B)

Southeastern Oklahoma State University, 425 W. University Blvd., Durant, OK 74701 (Morrison
220-C), USA
[email protected]

Abstract. Education is an enterprise that is rich with interactions amongst people


from varied and diverse backgrounds. Online education has become increasingly
popular with teachers and others who desire an advanced degree that will allow
them entry into administrative positions in school leadership. It is assumed that
graduates of strictly online programs do not possess the necessary people skills
that will allow for success as school administrators. The research conducted in
this study is relevant for the simple fact that school leaders need to have a certain
amount of emotional intelligence in order to perform their jobs in a quality manner.
Findings from this study suggest that supervisors of graduates of an online degree
program from a regional university are pleased with the employees they have hired,
thereby confirming that the curriculum being taught is relevant and cognizant of
the concepts related to interpersonal relationships.

Keywords: Interpersonal relations · People skills · Emotional intelligence ·


Online education · Rural education · Educational leadership

1 Introduction
Due to the popularity and convenience of online degree programs, there is little doubt that
participation in online learning programs has skyrocketed in recent years. About 46%
of college students in the United States have taken at least one online course according
to statistics from eLearning Industry [14]. Likewise, research on the effectiveness of
online learning, as reported in The Future of State Universities analysis [19] suggests
that “the growth of online learning is also in response to the new college student who
is older, more technologically savvy, and in need of an accessible, low-cost educational
option.” According to Allen and Seaman, almost 30% of all enrollments are now in
online courses [1].
However, not all of academia is convinced that online learning is the answer to cure
all ills of higher education. The Babson Study [13] put it this way:

“The proportion of chief academic leaders reporting online learning is critical to


their long-term strategy reached a new high of 70.8%. At the same time, only 28%
of academic leaders say that their faculty accept the value and legitimacy of online
education.”


In addition to questions about academic rigor and the legitimacy of online learning,
there are questions about grading. Research completed by Littlefield [13] clearly supports
the notion that “students who took all or part of their class online performed better, on
average, than those taking the same course through traditional face-to-face instruction.”
In some cases, there is general distrust when it comes to professors of traditional, face-to-
face instructional delivery and those who teach strictly online. The ever-present concern
that your university is becoming a “diploma mill” looms large in the minds of college
professors who care deeply about their field of study and are passionate about delivering
curriculum that is relevant, meaningful, and helps train students who will make a positive
impact on others.
The individual writing this paper is a college professor teaching online courses for
Southeastern Oklahoma State University in Durant, Oklahoma. He was hired in 2017
and has been teaching educational administration courses for 4.5 years. Most of the
courses taught by the writer/researcher have been delivered online. To be completely
honest with you, the reader, the writer/researcher would prefer to teach face-to-face
in the physical presence of his students. In a somewhat dated although very important
contribution to the field of education, Paulo Freire [9] indicated that a relationship with
a caring, supportive teacher is critical to student success. This viewpoint is shared by
the researcher as he has tried to be a supportive educator who not only can empathize
with his students but one who tries to teach in a way that prepares students for success
once they have graduated from our program.
During this journey as a college professor, he has made a habit of not only listening
to students but also listening to the people who are going to (hopefully) hire them.
One concern that has consistently reared its ugly head is the idea that graduates of
online programs do not have the requisite people skills necessary for success as a school
administrator. In fact, the researcher has been told by a prominent leader of an educational
service provider that some school superintendents “will not hire any more graduates of
an online program” due to the perceived lack of people skills that they have witnessed in
the graduates they have hired from such institutions. You can imagine how the previous
statement has caused not just a little bit of anxiety as our faculty has tried to navigate
the conundrum of trying to avoid this reputation and design learning activities that are
relevant yet cognizant of the need to develop an awareness of and sensitivity to others.
In the paragraphs that follow, the researcher will explain how he went about dealing
with this problem. A survey was developed for local administrators to evaluate the people
skills of known graduates of our program at Southeastern. Admittedly, the survey size
is small (33 participants) yet the results are insightful. (One reason for the small sample
size is due to the relatively recent development of our online degree. Finding school
administrators from our area who could evaluate our graduates is limited due to the rural
nature of our campus/area coupled with the fact that school administration jobs can be
hard to get).
The specific question that this research attempts to address is this: “Do graduates
of the online master’s in education (MED) program offered at Southeastern Oklahoma
State University possess the necessary interpersonal skills that allow them to be suc-
cessful school leaders?” As you can tell from the question above, the researcher targeted
emotional intelligence and the reader will understand this better as you see the questions

related to the survey. Goleman [10] estimated that “close to 90% of a leader’s success is
attributable to emotional intelligence”.
Education is a people intensive enterprise, requiring school leaders to have a skill
set that includes sensitivity to others - especially children. In addition to the obvious
overtures about people skills, another item the researcher was hoping to target is whether
or not we, as a staff, need to address some of these issues in our curriculum and possibly
update our approach to instruction as it relates to these matters.
The setting for the research conducted in this study was a group of public schools
in the state of Oklahoma near a regional university. What prompted the research was a
desire to know the answers to the following questions:

1. Do the graduates of the online MED (master’s in education) at Southeastern Okla-


homa State University have the people skills necessary for success as school
leaders?
2. Do we (as a staff) at Southeastern Oklahoma State University need to focus more of
our curriculum on the development of emotional intelligence in our students in the
MED program?

2 Literature Review
Interpersonal skills are referred to as soft skills in today’s business world. Soft skills
include uniquely human relational skills such as listening, empathy, communication,
compassion, and a caring attitude towards others. Based on a study by the Society
for Human Resource Management [16], 51% of its members reported that “education
systems have done little or nothing to help address the skills shortage.” In addition, human
resource professionals targeted soft skills such as professionalism, business acumen,
critical thinking, and lifelong learning as skills that are lacking in job candidates and
potential employees [16].
According to Chamorro-Premuzic and Frankiewicz [3], the demand for colleges and
universities to stress soft skills is becoming more important and necessary. They stated
the need this way:

“…universities could substantially increase the value of the college degree if they
spent more time teaching their students critical soft skills. Recruiters and employ-
ers are unlikely to be impressed by candidates unless they can demonstrate a certain
degree of people-skills. This is perhaps one of the biggest differences between
what universities and employers look for in applicants. While employers want
candidates with higher levels of EQ (emotional intelligence), resilience, empathy,
and integrity, those are rarely attributes that universities nurture or select for in
admissions. As the impact of AI (artificial intelligence) and disruptive technology
grows, candidates who can perform tasks that machines cannot are becoming more
valuable—and that underscores the growing importance of soft skills, which are
hard for machines to emulate.” [3].

In a survey of 2,600 hiring managers and human resource professionals, 71% stated
they valued emotional intelligence more than intelligence; 75% stated they were more

likely to promote a worker who is highly emotionally intelligent; and 59% mentioned
they would not hire a candidate with a high IQ but low EQ [8].
Deutschendorf [8] listed seven reasons why emotionally intelligent candidates are
so valuable:

• They can handle pressure healthily.


• They understand and cooperate with others.
• They’re good listeners.
• They’re more open to feedback.
• They’re empathetic.
• They set an example for others to follow.
• They make more thoughtful and thorough decisions.

As mentioned previously in this article, the researcher is a college professor teaching


educational administration courses at a regional university in the state of Oklahoma. Our
main focus is the training of teachers for jobs in school administration as principals,
superintendents, and other central office positions. One of the questions that comes to
mind is whether or not our students, through quality interactions with the faculty on a
routine basis, are being led in a manner that fosters an awareness of others and breeds an
attitude of empathy. Another dilemma revolves around this question: can these personal
qualities be passed on to others via online learning?
The challenge of establishing a personal relationship with each student is made
much more difficult due to the fact that online teachers are speaking to a computer
with a camera and their actions are being broadcast via the internet. Although one might
assume that this is impossible, Song et al. [17] found otherwise in their research regarding
the relationship between college professors and students in their classes. As defined by
Derlega, Metts, Petronio, and Margulis [7], self-disclosure is the act of revealing personal
information to others and is a fundamental starting point from which quality interpersonal
relationships are built. Teacher self-disclosure is defined as “conscious and deliberate
disclosures about one’s self, aspects of one’s professional practice, world or personal
views, personal history, and responses to ongoing classroom events” [15]. Through their
research, Song et al. [17] found that an analysis of the results suggested that teacher
self-disclosure and student emotional response toward teacher self-disclosure enhanced
student perceptions about the teacher-student relationship. This also yielded an increase
in student perceptions regarding knowledge gained and class satisfaction.
Social presence theory is an attempt to explain how digital interfaces in human-
computer interactions are influenced by the sense of being with another person. Accord-
ing to Cui et al. [5], social presence is worthy of much research due to the “asynchronous
nature of online learning and communication issues between online instructors and stu-
dents.” Tackie [18] pointed out that availability and access have to be intentionally
established by the teacher in the virtual classroom before intimacy can be achieved.
Song et al. [17] further concluded that teachers of online classes who can generate a
greater social presence by improving students’ perceptions of closeness in spite of the
lack of physical proximity ultimately improve the student-teacher relationship while
simultaneously producing more efficacious learning experiences.

Tackie [18] also explained the importance of social presence and how it impacts the
online learning environment: “Effective social presence enables students to recognize
their teachers’ humanity. By conveying personal information, or making themselves
readily available, teachers establish human connection, which in turn leads students to
more deeply engage in the classroom and motivates enhanced communication.”
A relatively recent development that has occurred due to the advent of online learn-
ing is TPACK or Technological Pedagogical And Content Knowledge. According to
tpack.org, [20] TPACK “attempts to identify the nature of knowledge required by teach-
ers for technology integration in their teaching, while addressing the complex, multi-
faceted and situated nature of teacher knowledge.” The TPACK framework consists of
seven components and is illustrated in Fig. 1:

Fig. 1. Reproduced by permission of the publisher, © 2012 by tpack.org

From the graphic represented above, one can see that the optimal level of student
learning occurs when the components of technological knowledge (TK), content knowl-
edge (CK), and pedagogical knowledge (PK) intersect. For teachers who deliver instruc-
tion in a purely online format, this model has significant implications. It is easy to see that
what is important to the instructor becomes important to the student. Personal character-
istics and qualities that are deemed to be essential interpersonal skills by the instructor
are emphasized in learning activities that lead to the student making an emotional evalu-
ation as to their significance and deciding whether these qualities will be adopted by the

learner. These values are not only “taught” but are effectively “caught” by the students
as the instructor leads the class.
A quote by Maya Angelou serves as a good example. She once said, “People may
not remember what you said but they will always remember how you made them feel.”
The importance of this mindset is used as the basis for a class discussion via Zoom in
the writer’s classes as a way of stressing the importance of genuine personal interactions
which lead to deeper levels of trust, sensitivity to others, and mutual respect. Now, the
reader might assume that due to the impersonal setting of a virtual classroom meeting
online, personal concepts such as those mentioned above cannot be transmitted to a
class full of digital natives who are participating via the internet from their own homes.
However, just the opposite was found by simply reading through the student evaluations
for the writer/researcher in addition to the numerous studies about this phenomenon (Cui
et al. [5]; Song et al. [17]; Tackie [18]).
Effective leaders possess a high degree of empathy (Greenleaf [11]; Culver [6]).
Effective leaders also demonstrate humility and are focused on the needs of others
(Collins [4]; Blanchard and Hodges [2]). Therefore, it was decided to survey a group of
school administrators who were working alongside a graduate of our program and had
the responsibility of evaluating these graduates in order to test the hypotheses mentioned
previously in this article.

3 Research Methodology

With this knowledge in mind, it was decided to survey administrators of local school
districts who had known graduates of the online program at Southeastern. The purpose
of the survey was to evaluate the emotional intelligence of our graduates in an attempt
to measure two things:

1. Do the graduates of the online MED (master’s in education) at Southeastern Okla-


homa State University have the people skills necessary for success as school
leaders?
2. Do we (as a staff) at Southeastern Oklahoma State University need to focus more of
our curriculum on the development of emotional intelligence in our students in the
MED program?

4 Participants
The sample size for this study included supervisors of graduates of our program who
had secured employment as a school administrator since 2017, when our program went
online. Thirty-three supervisors responded to the survey which was completed online via
Survey Monkey. Admittedly, the sample size is small, due mainly to the fact that it takes
time for a person to secure a job as a school administrator. However, the results do reveal
some interesting insights about people skills, emotional intelligence, and the perceptions
supervisors have about our graduates. Certainly, further study of this phenomenon is
necessary.

5 Design and Procedure


Data for this study was collected through the analysis of supervisor perceptions of
graduates of the online MED (master’s in education) program offered at Southeast-
ern Oklahoma State University located in Durant, Oklahoma. The survey, designed in
a Likert Scale format and administered through Survey Monkey, was sent to school
administrators in the local area who were the known supervisors of graduates of our
program.
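As a purely illustrative aside, Likert responses exported from a survey tool are typically summarised into the kind of percentage breakdown reported in Appendix I; a minimal sketch of such a tabulation is shown below. It is not part of the study: the file name, column names, and the use of pandas are our assumptions.

```python
# Hypothetical sketch: turning raw Likert responses (one row per respondent,
# one column per survey item) into the percentages reported in Table 2.
import pandas as pd

LIKERT = ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"]

responses = pd.read_csv("survey_export.csv")  # hypothetical export file

def likert_breakdown(df: pd.DataFrame, item: str) -> pd.Series:
    """Percentage of respondents choosing each Likert option for one item."""
    counts = df[item].value_counts()
    return (counts.reindex(LIKERT, fill_value=0) / len(df) * 100).round(2)

print(likert_breakdown(responses, "Q6"))
```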

6 Results

Demographics. Surveys were sent via email to 60 school administrators in our local
area who were currently supervising known graduates of the online MED program at
Southeastern Oklahoma State University (SOSU). Responses were received from 33
school administrators who agreed to participate in the study. Among the school admin-
istrators who responded, four (12%) were Superintendents, four (12%) were Assistant
Superintendents, 12 (37%) were Principals, six (18%) were Assistant Principals, and
seven (21%) were from the category “Other”, meaning curriculum director or other
district/campus leader.
Of the respondents, a little over half (18 or 55%) had served 21 or more years in
education, while roughly a fourth (8 or 24%) had served 11–20 years in education.
Five (15%) had served 6–10 years and two (6%) had served 0–5 years in education,
respectively.
In terms of years of experience, 55% (18) reported they had spent 0–5 years in their
current position. Fully 30% (10) reported they had 6–10 years of experience in their
current position while 12% (4) expressed they had spent 16–20 years of experience in
their current position. Only one (3%) indicated they had 21–25 years of experience in
their current position.
In describing their community, 52% (17) indicated they worked in rural areas. Twelve
respondents (36%) identified their community as suburban and four (12%) reported they
worked in an urban setting. In terms of level of education, five respondents (15%) had
earned a doctoral degree while 28 (85%) had earned a master’s degree.

Findings of the Study. Overall, the respondents for this survey were generally positive
about their experiences with graduates of our online program. Questions 7, 8, and 13 were
designed to measure the degree to which respondents’ view online learning as a legitimate
form of instructional delivery. Question 8 was intentionally worded in a negative tone in
order to verify the reliability of responses for a similarly worded question. The results
for these three questions confirm that the respondents do not view online learning in a
negative way.
The remaining questions on the survey (6, 9–12, and 14–22) all dealt with items
related to interpersonal skills and emotional intelligence. The results of these items on
the survey also verify that school leaders who have a supervisory role relating to the eval-
uation of graduates of our program indicated they were generally satisfied when it comes
to these areas. Multiple questions were designed to measure perceptions about emotional

intelligence and interpersonal skills for these sections of the survey and administrator
responses about those questions were generally favorable. One key takeaway for the
researcher is that people skills can be conveyed during online class sessions and based
on the evidence from the study, we are doing a good job of stressing the importance of
social and emotional learning (SEL). The entire survey and the results for each question can be found in Appendices I and II.
As stated previously in this article, more research in this area is warranted and future
study will allow for ongoing evaluation of our program. The conclusions drawn from this
research can only “whet the appetite” for more discovery of what works when it comes
to online education in a consistent approach to monitoring for continuous improvement.
According to Kolloff [12], “The design role becomes important in that the majority of the
instructor’s time is spent in determining how the course is to be implemented.” With that
in mind, we (as a staff) can begin to consider and evaluate what is critically important
as we design learning activities that serve our students well in the preparation for their
future roles as school leaders who possess personal skills which lead to success in the
performance of their job duties.

7 Recommendations and Road Map for Future Study


Recommendations for this research include continuing to stress to students the need for
building trust with school stakeholders (teachers, co-workers, students, and parents) in
order to be successful as school leaders. Self-evaluations, as part of the student self-
assessment of current level of goal attainment, will stress the importance of empathy
and emotional intelligence as we continue to monitor our curriculum at Southeastern
Oklahoma State University. The ongoing model of continuous improvement that is part of
our annual evaluation process will also help us in our effort to better serve the students in
our program by continuing to adjust to the TPACK framework related to online learning.
As mentioned throughout this article, further research of this nature is necessary for
the purpose of identifying what works and what needs to be excluded in the preparation
of school leaders. With time, the same survey can be sent to more school leaders who
are responsible for evaluating our graduates, thereby allowing a method for generalizing
the results to an expanded population. The research conducted in this survey has proven
to be valuable to the researcher as it represents something that is worthy of attention:
social – emotional learning (SEL). The SEL model can be more fully explained and
evaluated with the next round of surveys.
The findings of this study are anecdotal due to the small survey size (n = 33).
Additional research is necessary which will help guide the efforts of the faculty in the
Educational Instruction and Leadership department at Southeastern Oklahoma State
University. Research of this nature is critical to the development of school leaders who
demonstrate empathy towards others as they perform the duties of their jobs and who
possess the necessary emotional intelligence required for successful school leadership.

Appendix I
See Tables 1 and 2

Table 1. Background data

Position: Superintendent 12.12%; Assistant superintendent 12.12%; Principal 36.36%; Assistant principal 18.18%; Other 21.21%
Years served in education: 0–5 years 6.06%; 6–10 years 15.15%; 11–15 years 9.09%; 16–20 years 15.15%; 21–25 years 27.27%; 26 or more years 27.27%
Years in current position: 0–5 years 54.55%; 6–10 years 30.30%; 11–15 years 0.00%; 16–20 years 12.12%; 21–25 years 3.03%; 26 or more years 0.00%
School and community: Urban 12.12%; Suburban 36.36%; Rural 51.52%
Current level of education: Bachelor's 0.00%; Master's 84.85%; Doctorate 15.15%

Table 2. Administrator responses (SA / A / N / D / SD)

Q6. When thinking about graduates of the online MED program at SEOSU, they are more likely to demonstrate empathy towards other individuals in the performance of their job as a school leader: 15.15% / 60.61% / 24.24% / 0.00% / 0.00%
Q7. When thinking about online degree programs in general, I believe they are a valuable asset to our employees as they allow for convenience and expediency: 54.55% / 39.39% / 3.03% / 3.03% / 0.00%
Q8. When thinking about online degree programs in general, I believe they are a waste of time as students are not exposed to the experiences necessary for personal development and personal interaction which allow for professional growth as they earn their degree: 0.00% / 3.30% / 3.03% / 54.55% / 39.39%
Q9. When thinking about graduates of the online MED program at SEOSU, I am more likely to hire a graduate from this program as our school district has benefitted from the curriculum being taught in this program: 12.12% / 48.48% / 36.36% / 3.03% / 0.00%
Q10. When thinking about graduates of the online MED program at SEOSU, I believe there should be a greater emphasis on interpersonal skills than what I've seen thus far: 0.00% / 15.15% / 36.36% / 42.42% / 6.06%
Q11. Graduates of the online MED program at SEOSU are generally reliable and dependable professionals who consistently meet the demands of the position in which they are employed: 30.30% / 60.61% / 9.09% / 0.00% / 0.00%
Q12. When presented with an opportunity to hire someone for a school leadership position, I feel confident in recommending an individual who has completed the online MED program at SEOSU: 30.30% / 54.55% / 15.15% / 0.00% / 0.00%
Q13. I prefer traditional, face-to-face learning experiences as opposed to the current trend towards online learning when it comes to preparing individuals for their roles as school leaders: 9.09% / 15.15% / 27.27% / 36.36% / 12.12%
Q14. Graduates of the online MED program at SEOSU have demonstrated the ability to build trust with others as they perform the duties of their job: 36.36% / 54.55% / 9.09% / 0.00% / 0.00%
Q15. Graduates of the online MED program at SEOSU display humility and authenticity as they perform the duties of their job: 33.33% / 54.55% / 9.09% / 3.03% / 0.00%
Q16. Graduates of the online MED program at SEOSU exhibit emotional intelligence as they perform the duties of their job: 30.30% / 54.55% / 15.15% / 0.00% / 0.00%
Q17. Graduates of the online MED program at SEOSU develop positive relationships with others as they perform the duties of their job: 36.36% / 57.58% / 6.06% / 0.00% / 0.00%
Q18. Graduates of the online MED program at SEOSU demonstrate courage as they perform the duties of their job: 33.33% / 57.58% / 9.09% / 0.00% / 0.00%
Q19. Graduates of the online MED program at SEOSU use data to make informed decisions: 33.33% / 57.58% / 9.09% / 0.00% / 0.00%
Q20. Graduates of the online MED program at SEOSU hold themselves and others accountable for their actions: 30.30% / 63.64% / 6.06% / 0.00% / 0.00%
Q21. Graduates of the online MED program at SEOSU use resources wisely: 27.27% / 63.64% / 9.09% / 0.00% / 0.00%

Appendix II

See Figs. 2, 3, 4, 5 and 6



Fig. 2. Years of service in education

Fig. 3. Years in current position



Fig. 4. Years of experience

Fig. 5. School’s community



Fig. 6. People skills

References
1. Allen, I.E., Seaman, J.: Class differences: Online education in the United States. Sloan
Consortium (NJ1) (2010)
2. Blanchard, K., Hodges, P.: Lead Like Jesus. Thomas Nelson, Nashville, TN (2005)
3. Chamorro-Premuzic, T., Frankiewicz, B.: Does higher education still prepare people for jobs? Harvard Business Review (2019)
4. Collins, J.C.: Good to Great: Why Some Companies Make the Leap…and Others Don’t.
Harper Collins, New York, NY (2001)
5. Cui, G., Lockee, B., Meng, C.: Building modern online social presence: a review of social
presence theory and its instructional design implications for future trends. Educ. Inf. Technol.
18, 661–685 (2013). https://doi.org/10.1007/s10639-012-9192-1
6. Culver, M.K.: Applying Servant Leadership in Today’s Schools. Routledge, New York, NY
(2009)
7. Derlega, V.J., Metts, S., Petronio, S., Margulis, S.T.: Self-disclosure. Sage Publications, Inc.
(1993)
8. Deutschendorf, H.: 7 reasons why emotional intelligence is one of the fastest-growing job
skills. Fast Company. https://www.fastcompany.com/3059481/7-reasons-why-emotional-int
elligence-is-one-of-the-fastest-growing-job-skills. Accessed 18 March 2022
9. Freire, P.: Pedagogy of the Oppressed. Simon Fraser University Library (2018)
10. Goleman, D.: Working with Emotional Intelligence. Bantam Books, New York, NY (1998)
11. Greenleaf, R.K.: The Servant as Leader. The Robert Greenleaf Center, Indianapolis, IN (1991)
12. Kolloff, M.: Strategies for Effective Student-To-Student Interaction in Online Courses.
University of Wisconsin System Board of Regents, Madison, WI (2001)
13. Littlefield, J.: What Does Research Say About Online Learning? (2020). https://www.thoughtco.
com/what-research-says-about-online-learning-1098012
14. Pappas, C.: E-learning: Top 10 e-Learning Statistics for 2014 You Need To Know. eLearn-
ing Industry. https://elearningindustry.com/top-10-e-learning-statistics-for-2014-you-need-
to-know. Accessed 18 March 2022

15. Rasmussen, B.M., Mishna, F.: A fine balance: instructor self-disclosure in the classroom. J.
Teach. Soc. Work. 28(1–2), 191–207 (2008)
16. SHRM: The Global Skills Shortage. SHRM (2019). https://www.shrm.org/hr-today/trends-
and-forecasting/research-and-surveys/Pages/default.aspx. Accessed 18 March 2022
17. Song, H., et al.: Teacher–student relationship in online classes: a role of teacher self-
disclosure. Comput. Hum. Behav. 54, 436–443 (2016). https://doi.org/10.1016/j.chb.2015.
07.037
18. Tackie, H.N.: (Dis)Connected: establishing social presence and intimacy in teacher-student
relationships during emergency remote learning. AERA Open. (2022). https://doi.org/10.
1177/23328584211069525
19. The Future of State Universities: Research on the effectiveness of online learning (2011).
https://www.learningfront.com/Media/Research_Online_Learning.pdf. Accessed 18 March
2022
20. tpack.org
Scenarios for Virtual Clinical Simulation
to Train Nursing Students at a South African
University

Botha Benjamin Stephanus(B) and Fourie Cecile

School of Nursing, University of the Free State, Bloemfontein, Free State, South Africa
[email protected]

Abstract. With the COVID-19 pandemic, nursing students were left in the dark
when it came to clinical practice and skills acquisition; suddenly, limited placement
and practical skill opportunities became even more limited. The University of
the Free State in South Africa was no exception, which forced rapid innovation
and expansion of digital systems to assist nursing students in practising skills
and integrating theory in practice. To address the need for theory and practice
integration, the researchers sought free-to-use VCS platforms and scenarios that
might be used by students to practice their skills and integrate their theory and
practice. During this research, it became clear that nursing does not have a lot of
support in the open-source and free-to-use world of software, as most platforms
and scenarios are aimed at medical doctors. There were, however, some platforms
and scenarios which could be included and linked to outcomes in the Bachelor of
Nursing programme at the University of the Free State.

Keywords: Virtual reality · Nursing simulation · Simulation · Virtual clinical


simulation

1 Introduction
In the light of the recent COVID-19 pandemic, innovative online solutions are being
sought to provide nursing students at the University of the Free State (UFS) with oppor-
tunities to integrate their theory and practice. The COVID-19 pandemic further increased
the gap between theory and practice integration because of even more restricted access
to clinical placements [1], which has always been a global challenge [2, 3].
The identified problem is that there are limited available virtual reality (VR) appli-
cations for students to practice clinical skills in nursing. To try and address the lack
of VR applications, the researchers identified free-to-use online clinical skills training
solutions in different health science fields which might be applicable for practicing nurs-
ing skills. The researchers sought to evaluate free-to-use desktop-based VR applications
for students to use as training opportunities for bridging the theory and practice gap by
comparing the outcomes of the available scenarios and skills to those that are required
throughout the four-year Bachelors in Nursing (B.Nur) programme presented by the
UFS.


The outcomes of the evaluations of the free-to-use desktop-based VR applications are presented in this paper, along with possible application areas in healthcare education and, more specifically, in the field of nursing.

2 Related Work
Various studies have emphasised the gap between theory and practice, for example, Choi
et al., Gilbert and Johnson, Howard, Scully and Van Zyl [4–8]. The gap between theory
and practice is mainly due to limited accredited clinical placement sites where students
can apply and transfer their theoretical knowledge in practical situations, especially in a
developing country like South Africa [8, 9]. Innovative teaching and learning strategies
may help bridge the gap between theory and practice and improve patient safety in
clinical environments [7, 8, 10].
Innovative teaching and learning strategies that have been identified to try and address the gap between theory and practice are the use of Computerised Human Patient Simulation (CHPS) and Virtual Clinical Simulation (VCS) [11, 12]. CHPS is an effective, but
expensive method to assist in bridging the gap between theory and practice and to help
students develop their skills [13, 14].
VCS has been investigated by various researchers to determine its viability as a
modality for training nursing students, for example, managing a patient with a foreign
object in the right lung [15], diagnosing patients via an Artificial Learning Interface
for Clinical Education (A.L.I.C.E.) [16], patient safety [17], medication administration
[18] and enhancing men’s awareness of testicular diseases [19], to mention a few. All the
aforementioned research also found VCS to be an adequate modality for skills acquisition
in nursing education.
For this research, the spotlight falls on desktop-based VCS, also referred to as non-immersive VCS, which is VCS utilising interactions in a virtual environment (VE) through a mouse and keyboard or a touchscreen on a mobile device [18, 20]. One issue with the available research is that the software was, in most cases, created by the researchers themselves and is not always freely available. There were, however, free-to-use applications that could be considered for evaluation during this research, to determine whether they could assist with the acquisition of nursing skills. This study aims to evaluate free-to-use desktop VR applications for skills acquisition in the four-year Bachelors in Nursing (B.Nur) programme presented by the UFS.

3 Materials and Methods


The researchers sought out viable, free-to-use applications that might assist nursing
students at the UFS to develop various skills required during the four-year B.Nur pro-
gramme. The free-to-use applications were sourced from the literature and from various searches via Google. The search terms used during the Google searches are listed below.

• Free Online Nursing Simulation/Online Nursing Simulation


• Free Nursing simulation Games/Nursing simulation Games
• Free Healthcare games/Healthcare games

• Free Nursing Games/Nursing Games


• Free Healthcare Simulation Games/Healthcare Simulation Games
• Free Virtual Clinical simulation/Virtual Clinical simulation
• Free Virtual Clinical Simulation Games/Virtual Clinical Simulation Games

Each search was run both with and without the word "free". From all the searches, only the platforms and scenarios that could be freely accessed and used were included. Set out in Table 1 are the platforms and their scenarios that were found from the aforementioned search terms and from the research articles to which the researchers had access.

Table 1. Free-to-use desktop-based VCS scenarios and platforms

Platform and link Available scenarios


Full Code emergency simulation [21]: • Cough, Shortness of breath, diarrhoea
(Click Here to go to platform) • Fever and Cough
• Paediatric Fever
• Abdominal pain in a child (Fig. 1)
• Acute dyspnoea
• Pelvic pain
• Wheezing
Stanford Sepsis [22]: • Classify the epidemiology of sepsis syndrome and
(Click Here to go to platform) differentiate between the different forms of sepsis
syndromes (simple, severe and septic shock)
• Integrate best evidence practices, clinical expertise
and diagnostic test results for early identification and
optimal management of septic states using
evidence-based guidelines and clinical decision
support tools (e.g., best practice alerts.)
• Demonstrate specific best practice strategies such as
fluid resuscitation, early identification with
laboratory markers and screening and transfer of a
patient to higher care with sepsis
• Describe priority actions for establishing and
implementing early goal-directed therapies for the
septic patients along the continuum of care
• Develop and apply communication skills related to
the identification and management of sepsis when
working among healthcare teams. (eg. Calling for
help early)
Surgery Squad [23]: • Retinal Detachment Eye Surgery
(Click Here to go to platform) • Cataract Eye Surgery
• All About Dental Braces
• Colonoscopy Procedure – What you should know
• Natural Childbirth
• Childbirth with Epidural
• Adenoidectomy
• Teeth Whitening
• Laparoscopic Gallbladder Surgery
• Teeth Cleaning
• Ingrown Toenail Removal
• Saline Breast Implants
• Carpal Tunnel Release
• Laparoscopic Appendectomy
• Silicone Breast Implants
• Double Mastectomy
• Breast Cancer Awareness
• Botox Injections
• Lumpectomy
• Mastectomy
• Root Canal Procedure
• Dental Crown Placement
• Wisdom Tooth Extraction
• Dental Filling
• Laser Tattoo Removal
• C-Section
• Hair Transplant Procedure
• Laser Hair Removal
• Tumescent Liposuction
• Tonsillectomy
• Laparoscopic Gastric Bypass
• LASIK Eye Surgery
• Rhinoplasty
Medicactiv [24]: • Heart failure progression after hospitalisation for
(Click Here to go to platform) Atrial Fibrillation
• Hospitalisation for Heart Failure (demo)
• Cardiological progression of cardiovascular disease
(demo)
• Cardiovascular progression (demo)
Breakawaygames [25]: • Community Health Nursing
(Click Here to go to platform) • Community Health Nursing – Loggerton
• Knowledge Club Challenge NCLEX
• Knowledge Club Countdown
• NCLEX
• SPS - Difficulty Breathing
• SPS - Medication Series 1
• SPS - Painful Headache
• vHealthcare Advanced Trauma Life Support (ATLS)
- Car Accident
• vHealthcare Paediatric Advanced Life Support
(PALS)– Anaphylaxis
• vHealthcare Paediatric Advanced Life Support
(PALS) – Diabetic Ketoacidosis (DKA)
• Vital Signs ER - demo shift

Once the platforms with their scenarios were listed, the principal researcher determined the technical viability of each, for example, whether the platform works on various device types (mobile and desktop) and what its technical requirements are. This was done because not all students have high-end computers or smart mobile devices. From the available platforms and scenarios, the unviable options were excluded before the evaluation commenced (see Table 2).

Table 2. Platforms and scenarios excluded

Surgery Squad (all scenarios listed in Table 1): excluded because the platform requires Adobe Flash Player, which has been discontinued and is no longer supported in any of the three most used browsers, namely Google Chrome, Mozilla Firefox, and Microsoft Edge.
Breakawaygames (Community Health Nursing; Knowledge Club Challenge NCLEX; Knowledge Club Countdown; NCLEX; SPS - Difficulty Breathing; SPS - Medication Series 1; SPS - Painful Headache): excluded because these are not virtual simulations per se but trivia-based games, which do not fit the purpose of this research.

Once the initial exclusions were completed, two reviewers evaluated the scenarios and platforms for inclusion: one reviewer is a nurse educator who specialises in simulation for nursing students and the other is a specialist in various simulation technologies. Both reviewers determined which platforms and scenarios could be included based on the outcomes of the B.Nur programme at the UFS. The outcomes are available from the yearbook published on the website of the UFS [26].

Fig. 1. Abdominal pain scenario taken from Full Code

The reviewers played each of the VCS games selected for evaluation together and compared the outcomes of each VCS scenario with those set out in the curricula for the B.Nur programme. Figure 1 shows one of the scenarios available from Full Code that was evaluated. The reviewers then determined for which year group or groups the scenario is applicable and which outcome it satisfies, as will be discussed in the results and discussion that follow.

4 Results and Discussion

After evaluation of the games, the reviewers noted that most scenarios are aimed at medical doctors. Even though there were aspects applicable to nursing, it would be somewhat difficult to split the aspects applicable to nursing skills from the main scenarios because of the nature of the platforms and the scenarios they house, which require the student to address all the outcomes in order to complete the scenarios and receive feedback. There were, however, some platforms and scenarios that could be included and coded to outcomes of the B.Nur programme, as can be seen in Table 3.
From all the scenarios and platforms a total of two platforms with ten scenarios
between them could be coded to the B.Nur programme. However, the reviewers also
found that some platforms and scenarios might be applicable for use in the future for
certain post-graduate diplomas as can be seen in Table 4.

Table 3. Platforms and scenarios coded to B.Nur outcomes

Platform and scenario, with linked outcomes and applicable year groups:

Mediactiv: Cardiovascular progression (demo) - Situation 1
• Ankle brachial pressure index (ABPI) (3rd Year B.Nur students)
• Hypertension (2nd and 3rd Year B.Nur students)

Mediactiv: Cardiovascular progression (demo) - Situation 2
• Hypertension (2nd and 3rd Year B.Nur students)

Mediactiv: Hospitalisation for heart failure (demo) - Situation 1
• Pulmonary oedema (3rd Year B.Nur students)

Full Code emergency simulation: Cough, Shortness of breath, diarrhoea (COVID); Viral symptoms with hypoxia (COVID)
• Interpret assessment findings (1st Year B.Nur students)
• Vital signs: blood pressure measurement, temperature measurement, oxygen saturation (1st Year B.Nur students)
• Infection control measures (1st and 2nd Year B.Nur students)
• Respiratory conditions: pneumonia, cough and a cold (1st Year B.Nur students)
• Assessment of a patient (history taking, physical examination, side room investigation, collecting appropriate samples for laboratory testing) (1st Year B.Nur students)
• Interpretation of assessment findings (1st Year B.Nur students)
• Oxygen therapy/nebulisation/spacer (1st Year B.Nur students)
• Respiratory failure (3rd Year B.Nur students)

Full Code emergency simulation: Abdominal pain with poor appetite in child
• Acute abdominal pain (3rd Year B.Nur students)
• Appendicitis (3rd Year B.Nur students)

Full Code emergency simulation: MV Collision, Abdominal pain
• Secondary trauma assessment (4th Year B.Nur students)

Full Code emergency simulation: Gunshot Wound in the Neck
• Secondary trauma assessment (4th Year B.Nur students)

Full Code emergency simulation: Viral symptoms with spreading rash
• Infectious diseases and their transmission (2nd Year B.Nur students)
• Childhood illnesses (IMCI) (1st Year B.Nur students)

Full Code emergency simulation: Confusion after attending a party
• Suppressed consciousness (4th Year B.Nur students)
• Raised intracranial pressure (4th Year B.Nur students)

Full Code emergency simulation: Weakness with speech changes
• Stroke (4th Year B.Nur students)

Table 4. Platform and scenario with linked post-graduate diploma

• vHealthcare ATLS - Car Accident (Breakaway Games) → ICU Nursing Diploma: Advanced Trauma Life Support (ATLS)
• vHealthcare PALS - Anaphylaxis (Breakaway Games) → Pediatric Nursing Diploma: Anaphylaxis
• vHealthcare PALS - DKA: Pediatric Advanced Life Support (PALS) in the Emergency Department - DKA → Pediatric Nursing Diploma and ICU Nursing Diploma: Diabetic Ketoacidosis (DKA)
• Vital Signs ER - 2 h shift (Breakaway Games) → Pediatric Nursing Diploma and ICU Nursing Diploma: Emergency Room
• Septris (Stanford) → ICU Nursing Diploma: Sepsis

The reviewers determined that there is a limited number of platforms that could be used for the nursing students in the B.Nur programme; however, they can still assist with some of the outcomes, as mentioned previously.
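As a small illustration of how this coding could be kept reusable, for instance to list the scenarios relevant to a given year group, the sketch below captures a few rows of Table 3 in a simple Python structure. The data structure and helper function are ours; only a subset of the table is shown.

```python
# Hypothetical sketch: part of the Table 3 coding captured as data so that
# scenarios can be looked up per B.Nur year group.
scenario_outcomes = {
    "Mediactiv: Cardiovascular progression (demo) - Situation 1": [
        ("Ankle brachial pressure index (ABPI)", [3]),
        ("Hypertension", [2, 3]),
    ],
    "Full Code: Abdominal pain with poor appetite in child": [
        ("Acute abdominal pain", [3]),
        ("Appendicitis", [3]),
    ],
    "Full Code: Weakness with speech changes": [
        ("Stroke", [4]),
    ],
}

def scenarios_for_year(year: int) -> list[str]:
    """Scenarios coded to at least one outcome of the given B.Nur year."""
    return sorted(
        scenario
        for scenario, outcomes in scenario_outcomes.items()
        if any(year in years for _, years in outcomes)
    )

print(scenarios_for_year(3))  # both Mediactiv and the abdominal pain scenario
```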

5 Conclusion

In conclusion, the research showed that there is a limited number of free-to-use platforms and scenarios, and even fewer for nursing students, as the focus seems to be on training medical doctors. This is a big problem, especially seeing that for 90% of the world population the first contact with healthcare is with a nurse [27, 28]. Even though the evaluated scenarios can help the students in the B.Nur programme at the UFS, the available platforms for nursing still need to be expanded, especially open-source options. The reason is that developing countries do not have the financial capacity to pay licensing fees for virtual platforms.
This research can also provide insights to other researchers on possible available
platforms and scenarios which could be expanded on in future research and tested to
determine their effectiveness in providing nursing students with the needed skills as
per the outcomes of their nursing programme. Furthermore, this research can provide
opportunities for medical doctors to research the effect of free-to-use VCS for skills
acquisition on the platforms and scenarios that are aimed at them.

6 Ethical Clearance
Ethical clearance was granted by the relevant ethics committee under a project with
ethical clearance number: UFS-HSD2020/1313.

7 Future Research

The platforms and scenarios were evaluated but not tested by the students, which could be a valuable future research endeavour. VCS could also be combined with CHPS by having VCS precede the simulation activities; research can then be done to determine whether this helps prepare students better for CHPS.
Another issue was that the post-graduate diplomas are still being designed and must be approved by the regulating bodies in South Africa, which meant the reviewers based the inclusion of scenarios for post-graduate diplomas on the old outcomes, which might differ from the new post-graduate diploma outcomes once these are complete and approved.
For this research, the researchers assumed that the platforms and scenarios will contribute positively to skills acquisition for nursing students, given the body of research indicating the positive effect of VCS. The effect of these VCS platforms and scenarios can, however, be tested empirically to determine their impact on the skills acquisition of nursing students, which is one of the upcoming research endeavours that flowed from this research.

References
1. Dolan, H., Amidon, B.J., Gephart, S.M.: Evidentiary and theoretical foundations for virtual
simulation in nursing education. J. Prof. Nurs. 37, 810–815 (2021). https://doi.org/10.1016/
j.profnurs.2021.06.001
2. Niederhauser, V., Schoessler, M., Gubrud-Howe, P.M., Magnussen, L., Codier, E.: Creating
innovative models of clinical nursing education. J. Nurs. Educ. (2012). https://doi.org/10.
3928/01484834-20121011-02
3. Poikela, P., Ruokamo, H., Teräs, M.: Comparison of meaningful learning characteristics in
simulated nursing practice after traditional versus computer-based simulation method: a qual-
itative videography study. Nurse Educ. Today. 35 (2015). https://doi.org/10.1016/j.nedt.2014.
10.009
4. Choi, W., et al.: Engagement and learning in simulation: recommendations of the Simnovate
Engaged Learning Domain Group. BMJ Simul. Technol. Enhanc. Learn. 3, S23–S32 (2017).
https://doi.org/10.1136/bmjstel-2016-000177
5. Gilbert, K.A., Johnson, C.W.: Increasing self-efficacy through immersive simulations: leading
professional learning communities. J. Leadership Educ. 17, 72–93 (2018). https://doi.org/10.
12806/V17/I4/R5
6. Howard, S.: Increasing Fidelity and Realism in Simulation (2018)
7. Scully, N.J.: The theory-practice gap and skill acquisition: an issue for nursing education.
Collegian 18, 93–98 (2011). https://doi.org/10.1016/j.colegn.2010.04.002
8. Van Zyl, A.E.: Exploring the potential theory-practice gap in the teaching methods of nurse
educators (2014)

9. Waldner, M.H., Olson, J.K.: Taking the patient to the classroom: applying theoretical frame-
works to simulation in nursing education. Int. J. Nurs. Educ. Scholarsh. 4, Article18 (2007).
https://doi.org/10.2202/1548-923X.1317
10. Alinier, G., Platt, A.: International overview of high-level simulation education initiatives in
relation to critical care. Nurs. Critical Care 19 (2013). https://doi.org/10.1111/nicc.12030
11. Botha, B.S., Hugo-van Dyk, L., Nyoni, C.N.: The reality of virtual reality at a South African
university during the COVID-19 pandemic. African J. Heal. Prof. Educ. 13, 199–200 (2021).
https://doi.org/10.7196/AJHPE.2021.v13i3.1503
12. Verkuyl, M., Atack, L., Mastrilli, P., Romaniuk, D.: Virtual gaming to develop students’
pediatric nursing skills: a usability test. Nurse Educ. Today. 46, 81–85 (2016). https://doi.org/
10.1016/j.nedt.2016.08.024
13. Lapkin, S., Levett-Jones, T.: A cost-utility analysis of medium vs. high-fidelity human patient
simulation manikins in nursing education. J. Clin. Nurs. 20, 3543–3552 (2011). https://doi.
org/10.1111/j.1365-2702.2011.03843.x
14. Pywell, M.J., Evgeniou, E., Highway, K., Pitt, E., Estela, C.M.: High fidelity,
low cost moulage as a valid simulation tool to improve burns education. Burns 42, 844–852
(2016). https://doi.org/10.1016/j.burns.2015.12.013
15. Botha, B.S., de Wet, L., Botma, Y.: Usability of a foreign body object scenario in VR for
nursing education. In: IEEE (ed.) 2020 IEEE Conference on Virtual Reality and 3D User
Interfaces Abstracts and Workshops (VRW). pp. 787–788. IEEE, Atlanta (2020)
16. Kleinert, R., Wahba, R., Chang, D.H., Plum, P., Hölscher, A.H., Stippel, D.L.: 3D immersive
patient simulators and their impact on learning success: a thematic review (2015)
17. Butt, A.L., Kardong-Edgren, S., Ellertson, A.: Using game-based virtual reality with haptics
for skill acquisition. Clin. Simul. Nurs. 16, 25–32 (2018). https://doi.org/10.1016/j.ecns.2017.
09.010
18. Dubovi, I., Levy, S.T., Dagan, E.: Now I know how! The learning process of medication
administration among nursing students with non-immersive desktop virtual reality simulation.
Comput. Educ. 113, 16–27 (2017). https://doi.org/10.1016/j.compedu.2017.05.009
19. Saab, M.M., Hegarty, J., Murphy, D., Landers, M.: Incorporating virtual reality in nurse
education: a qualitative study of nursing students’ perspectives. Nurse Educ. Today 105,
105045 (2021). https://doi.org/10.1016/j.nedt.2021.105045
20. Choi, D.H., Dailey-Hebert, A., Estes, J.S.: Emerging tools and applications of virtual reality
in education (2016)
21. Full Code: Emergency Medicine Simulation. https://app.full-code.com/Player/Player.html
22. Stanford University School of Medicine: Septris. http://septris.stanford.edu//game/SeptrisTi
tle.html
23. Surgery Squad: Surgery Games | Surgery Squad. http://www.surgerysquad.com/category/sur
gery-games/page/2/
24. MedicActiV: MedicActiV. https://app.medicactiv.com/?redirect=%2Fhome
25. Breakawaygames: vHealthCare. https://store.breakawaygames.com/Home/Index
26. University of the Free State: Rule book – Courses. https://www.ufs.ac.za/health/departments-
and-divisions/school-of-nursing-home/general/courses
27. Knowles, M.: Survey: 90% of nurses admit they do not have enough time to prop-
erly care for patients. https://www.beckershospitalreview.com/quality/survey-90-of-nurses-
admit-they-do-not-have-enough-time-to-properly-care-for-patients.html
28. University of Texas Arlington: The Nurse’s Role in Global Health. https://academicpartner
ships.uta.edu/articles/healthcare/nurses-role-in-global-health.aspx
Learning Factory Synergy: Applied Learning
and Problem-Based Pedagogy in the Digital
Transformation Ecosystem

Peter ChunYu Yau1 , Ejoe Tso2 , and Dennis Wong3(B)


1 University of Glasgow, Glasgow, UK
[email protected]
2 Hong Kong Institute of Vocational Education, Hong Kong, China
[email protected]
3 Macao Polytechnic University, Macao, China

[email protected]

Abstract. Manufacturing processes have changed drastically over the last decade as a result of advances in both computing hardware and software. It is therefore worth investigating how these technologies can be brought together in a single workplace so that the manufacturing business process, from learning to production and further development, can be fully exploited.
In this paper, we discuss the challenges of aligning workplace synergy with academia in the form of research centers. We surveyed key personnel who have partnered with higher education institutes on collaborative work and, based on their experience, we identify and discuss the core factors that make academic-industrial collaboration successful. We examine the project plan, the partner relationship, and the knowledge-sharing process between industry supporters, academic staff, and students, including the pedagogy used, how digital transformation takes place in the learning factory ecosystem, and how the output is transferred to the real world.
We conclude that four elements are essential for good workplace synergy in the learning factory ecosystem: a real-world scenario, a work-based learning pedagogy, a long-term industry partner, and a knowledgeable manager who is a professional with commercial experience and a technical background. We believe that vocational education, applied learning, and work-based learning and teaching are critical educational elements for enhancing economic growth.

Keywords: Learning factory · Digital transformation · Applied learning · Problem-based learning · Pedagogy · Workplace synergy · Training · Internship

1 Introduction

The first industrial revolution [1] opened its chapter in the 1760s, when coal was burned to power steam engines, while the second industrial revolution [2] opened its chapter in the 1870s through the rapid development of mass-production techniques, standardization, and industrialization. After that, the digital revolution (the advancement of technology through the introduction of microelectronics and information technology) shifted mechanical and analog engineering toward binary digital electronics in the late twentieth century, a period regarded as the third industrial revolution [3, 4].
Now, in 2022, we are moving toward the fourth industrial revolution, also known as Industry 4.0 or 4IR [5, 6], characterized by Cyber-Physical Systems (CPS), the Industrial Internet of Things (IIoT), 3D printing, cloud computing, cognitive computing, artificial intelligence, and related technologies. How do these technologies work together and change our lives [7]? And how do they relate to academia from a training perspective, where vocational schools, learning factories, and research centers can best exploit these technologies for educational purposes [8]? This paper discusses the workplace ecosystem and its synergy with academia: how digital applications and pedagogy work together under Industry 4.0 in the form of the learning factory.

1.1 Insights from World Economic Forum – Resetting the Future of Work
Agenda

In 2020, the World Economic Forum (WEF) conducted global research on how businesses planned to respond to the effects of COVID-19 [9]. 84% of respondents replied that they would accelerate the digitalization of their workplaces and business processes, 83% responded that they would provide more opportunities to work remotely, and 50% addressed the importance of process automation and indicated that extra effort would be put into accelerating it in the workplace (Fig. 1).
What does this mean for the learning factory? One of the major impacts is the shift from the physical environment to the online virtual environment. COVID-19 sped up the digital transformation process and moved much work from physical, face-to-face settings to the digital world. Whether because of safety concerns or because of the resources saved and the convenience gained, more people prefer to work remotely (from home) instead of spending time commuting. So, when we talk about digital applications and pedagogy in the learning factory ecosystem, what should we pay attention to in light of the insights above? The five imperatives for resetting the future of work agenda discussed in the WEF white paper may give us some hints (Fig. 2).
Embracing stakeholder capitalism creates and enforces a closer working relationship between industry and academia through a win-win, sustainable ecosystem that drives innovation from the research center to commercial areas [10]. Funding and the equitable sharing of risks and rewards allow a better atmosphere in the workplace. Aligning new technologies and skillsets gives students, trainees, and current employees an opportunity to develop their hidden talents to the next level [11]. Transforming organization design and workflow facilitates the digital transformation trend that has already been taking place during the COVID-19 period [12].

Fig. 1. Planned business measures in response to COVID-19 (partial) (Source: World Economic Forum, The Future of Jobs Report 2020; Image source: World Economic Forum, Resetting the Future of Work Agenda: Disruption and Renewal in a Post-COVID World, 2020).

Fig. 2. Five imperatives for resetting the future of work agenda (Source: World Economic Forum and Mercer, 2020; Image source: World Economic Forum, Resetting the Future of Work Agenda: Disruption and Renewal in a Post-COVID World, 2020).

1.2 Feedback from the Students and Feedforward from the Industry Partners

What do we need to do to plan ahead and make the insights mentioned above work in the ecosystem? It is common to receive feedback when we complete a task or a project; it is becoming popular to use feedforward as the reverse approach, managing expectations in advance to achieve better outcomes for an event.
Take the survey results in the previous section as an example: more people became concerned about digital transformation in the workplace during the COVID-19 period. How can students understand the importance of digital transformation in advance when they participate in learning factory activities? How can industry partners give advance guidance both to the teaching staff when they design the simulation and curriculum, and to the students about what they can expect after participating in the training program? In a well-matched and balanced situation, expectations and outputs can be managed in a harmonized way. In the next section, we look at how digital transformation and work-based learning pedagogy work together to make the learning factory ecosystem effective in the new era.

2 Digital Transformation and Work-Based Learning Pedagogy


Many new technologies are emerging, and a fast-moving society requires enterprises to deliver results as soon as possible. How can we strike a balance and embed this reality in the learning factory training and education ecosystem?
First, we need to recognize that education partners are important. Society requires formally trained personnel who work in a professional way, and formal education allows students to acquire new knowledge step by step. Identifying an academic partner is therefore an important element in building a successful ecosystem. Historically, and especially in the engineering domain, we have been shifting from analogue to digital processes, and, as discussed, COVID-19 accelerated digital transformation during the pandemic. For reasons of safety, economic growth, and many other factors, making the pedagogy available anytime and anywhere, and making it more applied and vocational (i.e., work-based learning), fits the requirements and expectations of society.
To understand how industry partners think about learning factory design, workflow, training models, and actual usefulness, we invited two working professionals for an interview; they are currently or were previously in charge of training programs affiliated with universities and with their companies. The goal of this interview was to understand how useful they found the training program to industry and what kind of improvements could be made in the learning factory ecosystem.

3 Methodology, Questions and Interview


A qualitative study was conducted for the research question: how should digital transformation and pedagogy in the learning factory ecosystem be designed to achieve workplace synergy with academia?

3.1 Goal and Methodology

The goal of our interview was to understand, in depth, how the management of a supporting business partner thinks about the educational body and responds to the technological changes taking place in the real world, and what kind of supporting facilities we should have in order to bring a positive result to the industry partner. The interviewees are resource owners and sponsors of part of the learning factory design, so they are accountable for the actual usefulness of the resources they have spent.
Due to the continuing development of COVID-19, the interviews were conducted via an online video conference system, and open-ended questions were asked. The interviewees were told that no personal information would be revealed in the research process; all identifiable information, including the names of the person and company and all business-related data, would be concealed. The background of the company, the professional domain, and other data important to the success and implications of the research would be fully disclosed.

3.2 Questions
Four questions were discussed in the interview and were asked in order, with no time limit set for any of them. The interviewees were informed that the discussion would last around 60 min. The two interviewees were introduced to each other at the beginning of the interview and could share comments at any time; this arrangement allows more communication to build on the other's sharing, if any.

1. What do you think about the learning factory nowadays in academia?
2. What do you expect from the school training? How is it related to your company/business?
3. Do you think that digital transformation, blockchain, metaverse, and all these kinds of new technological keywords are related to the training, and to the students?
4. How can we do better as a partner, as an ecosystem?

3.3 Interview

The interviewees are Mr K (K) and Ms N (N). Mr K is a manager who works for a real-estate development company that would now like to develop corporate social responsibility (CSR) activities together with sustainable knowledge-transfer development in the civil engineering and property management areas.
Ms N is an academic manager who previously worked for a global business school; she is now a co-founder of a business consulting company focused on job matching, specifically for business professionals with a technical background. Below is an extract highlighting some of the sharing in the interview.

What Do You Think About the Learning Factory Nowadays in Academia?
N: As a manager in a higher education institute, I am fortunate to have the opportunity
to visit different kinds of laboratories and learning factories in different schools and
universities. They are attractive, fun and interesting. I can see that the arrangement is
well-designed. I would love to see if this kind of facility can be fully utilized.
K: For me, I just want my expectations to be fulfilled. Well, as you know, we are
accountable for the ROI, no matter whether it is a commercial project or not.

What Do You Expect from the School Training? How Is It Related to Your Com-
pany/Business?
K: Job-ready student, less supervision in the workplace is the best. I hope that they come
to the workplace to solve our problems.

N: School should provide comprehensive training. I think that a good student should
be both equipped with hands-on knowledge and theoretical knowledge. This is the best
scenario.

Do You Think that Digital Transformation, Blockchain, Metaverse; and All These
Kinds of New Technological Keywords are Related to the Training, and to the Stu-
dents?
K: We want more digital things, just like we want to try 3D construction on the site. But we lack the skills, and it looks expensive in research as well. I don't know. There
are just lots of new technologies coming out in recent years. Many of them I just don’t
know what it is.
N: I have no idea either. I guess students should think about whether they would like
to go wide or go deep. They should have a good mentor who allows them to understand
both the business and technical knowledge.

How Can We Do Better as a Partner, As an Ecosystem?


K: Right now, the project done in the school is just too short. Students should investigate
a topic for a longer duration. We can see some of the results in the cooperation project,
but I don’t think that it can totally fit in the actual environment.
N: I believe people are the key, so for a company, I believe this applies to the school
as well. How you can form a long, or longer partnership with a training partner is a
pathway to success. Just like what K mentioned, if we want something to happen, it
takes time to incubate.

4 Discussion
We analysed the viewpoints shared by the interviewees and found that four elements are especially important: (i) a real-world scenario, (ii) a work-based learning pedagogy, (iii) a long-term industry partner, and (iv) a knowledgeable manager. Here we discuss each of the identified elements and the reasons why each is important.
We found that a real-world scenario gives the exact detail [13] about where the problem is, how it happens, and what could be done to solve it. This gives educators a precise direction when they plan and design the teaching. Simulations in the learning factory can be built on such scenarios, and trainees can mock up the entire process safely and with long-term sustainability. A work-based learning pedagogy meets the needs of both society and the students; whether for economic reasons or family financial issues, a work-based teaching methodology largely diverges students into two streams, hands-on first or theory first [14]. With suitable design, the learning factory can provide the hands-on-first training while the academic partner fills the gap by showing how theory backs up the practical knowledge the students have just exercised in the simulation.
A good research item takes time to investigate, study, plan, and develop. This is what the interviewees addressed: although output is important, a balance sometimes needs to be struck between the number of countable outputs delivered and the quality of the results (i.e., a performance indicator may not apply in every scenario) [15]. This message is especially important to bring to the trainees: the digital transformation process often speeds things up many times over, but this is not guaranteed; technologies can be helpful in many situations, and they can also be destructive in many others. Know-how managers, ideally working in a pair of a senior and a junior, are recommended to guide the students through the entire learning process. A know-how manager is a person who knows the real-world problems in the workplace well and knows exactly what is needed in the research output, both to give value to the company and to serve as a KPI measuring how sustainably this cooperation works [16].

5 Conclusion
Technologies should be chosen selectively for the projects picked by the industry partner. Industry partners should create a feedforward mechanism and real-world problems for academia to work on and to plan the simulations in the learning factory. Among the many forms of new technology, digital transformation is one process that cannot be missed because of its global impact. The work-based learning approach should be considered by the faculty, who also need to balance it with theoretical education to bridge the gap between the practical and the theoretical. An ecosystem can be achieved through a long-term partnership and the appointment of a know-how manager, who is directly responsible for setting up the questions and examining the research output.

Acknowledgments. This research is supported by the Macao Polytechnic University research grant (Project code: RP/FCA-02/2022). The research of the third author is also supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT), Korea (No. 2020R1F1A1A01070666).

References
1. Deane, P.M.: The First Industrial Revolution. Cambridge University Press (1979)
2. Mokyr, J., Strotz, R.H.: The second industrial revolution, 1870–1914. Storia dell’economia
Mondiale 21945(1) (1998)
3. Janicke, M., Jacob, K.: A third industrial revolution. Long-term governance for social-
ecological change, pp. 47–71 (2013)
4. Cooper, C., Kaplinsky, R.: Technology and Development in the Third Industrial Revolution.
Routledge (2005)
5. Bai, C., Dallasega, P., Orzes, G., Sarkis, J.: Industry 4.0 technologies assessment: a sustainability perspective. Int. J. Product. Econ. 229, 107776 (2020). https://doi.org/10.1016/j.ijpe.2020.107776
6. Wikipedia contributors: Fourth Industrial Revolution (2022). https://en.wikipedia.org/wiki/
Fourth_Industrial_Revolution
7. Panel, E.: 10 Ways Technology Has Changed Team Communication. Forbes (2018). https://
www.forbes.com/sites/forbesbusinessdevelopmentcouncil/2018/08/02/10-ways-technology-
has-changed-team-communication/
8. Gronau, N., Ullrich, A., Teichmann, M.: Development of the industrial IoT competences
in the areas of organization, process, and interaction based on the learning factory concept.
Procedia Manuf. 9, 254–261 (2017)

9. Resetting the Future of Work Agenda: Disruption and Renewal in a Post-COVID World.
World Economic Forum (2020). https://www.weforum.org/whitepapers/resetting-the-future-
of-work-agenda-disruption-and-renewal-in-a-post-covid-world
10. Drobyazko, S., Okulich-Kazarin, V., Rogovyi, A., Goltvenko, O., Marova, S.: Factors of
influence on the sustainable development in the strategy management of corporations. Acad.
Strateg. Manag. J. 18, 1–5 (2019)
11. Green, A.: The COVID-19 Crisis and Implications for Skills Development and the Skills System. In: Productivity and the Pandemic. Edward Elgar Publishing (2021)
12. Priyono, A., Moin, A., Putri, V.N.A.O.: Identifying digital transformation paths in the business
model of SMEs during the COVID-19 pandemic. J. Open Innov. Technol. Market Compl.
6(4), 104 (2020)
13. Okuda, S.M., Runco, M.A., Berger, D.E.: Creativity and the finding and solving of real-world
problems. J. Psychoeduc. Assess. 9(1), 45–53 (1991)
14. Black, J.S., Mendenhall, M.: A practical but theory-based framework for selecting cross-
cultural training methods. Hum. Resour. Manage. 28(4), 511–539 (1989)
15. Marr, B.: Key Performance Indicators (KPI): the 75 measures every manager needs to know.
Pearson UK (2012)
16. Petrosjan, L.A., Zenkevich, N.A.: Conditions for sustainable cooperation. Autom. Remote.
Control. 76(10), 1894–1904 (2015). https://doi.org/10.1134/S0005117915100148
Teacher Training Management Guidelines
for Improving Green IT Teaching Intention
and Behavior

Ricky Nhlanhla Dlamini and Grant Royd Howard(B)

University of South Africa (Unisa), 28 Pioneer Avenue, Florida, Roodepoort 1709, South Africa
[email protected], [email protected]

Abstract. The study is positioned at the intersection of teaching, learning and Green Information Technology (Green IT) and would fall within the domain
of science, technology, engineering and mathematics (STEM)/science, technol-
ogy, engineering, arts and mathematics (STEAM) education. The study aims to
address the real-world problem of schoolteachers’ and student schoolteachers’
unpreparedness to teach sustainability, which includes Green IT. In particular, the
study addressed the lack of teacher training management guidelines for improv-
ing the Green IT teaching intention and behavior of student schoolteachers by
building on prior empirical quantitative work. The study extended prior structural equation modeling (SEM) with one-way analysis of variance (ANOVA).
An original contribution to the Green IT and education fields was made through
empirical evidence of the significant and relevant constructs and demographic vari-
ables influencing Green IT teaching intention and behavior, which is important for
promoting, propagating and improving sustainability and the sustainable develop-
ment goals (SDGs). Teacher training management guidelines were developed to
provide teacher training management with valuable insight into the management
and design of student teacher training and courses for improving student teachers’
Green IT competencies, teaching and teaching motivation.

Keywords: Education · Environmental sustainability · Green computing · Green information technology (Green IT) · Student teachers · Teaching and learning · Theory of reasoned action (TRA) · Theory of planned behavior (TPB)

1 Introduction

1.1 Background and Context

The United Nations sustainable development goals (SDGs) provide an urgent call for the
global community to address society’s most urgent challenges [1]. A major challenge is
the protection and sustainable use of the natural environment to provide for the needs of
the present and future generations, and the SDGs expose the vital and indispensable role
that the natural environment plays in the well-being of all people on Earth. Nevertheless,

people continue to use the natural environment unsustainably, causing severe environ-
mental depletion and degradation that is evident in many forms, such as pollution, global
warming, ocean acidification, loss of biodiversity and deforestation [2].
Information Technology (IT), incorporating Information and Communications Technologies (ICTs), also has an impact on the natural environment. IT use has become ubiquitous, resulting in extensive non-renewable resource consumption during IT manufacture, with associated air, water, and soil pollution [3]; increased global warming through carbon emissions due to energy consumption during IT use [4]; and considerable ground and water pollution from millions of tons of hazardous electronic waste (e-waste) annually at IT disposal [3]. To address these negative environmental impacts, the theory and practice of Green Information Technology (Green IT) were developed [5]. Notably,
the term ‘green’ is typically associated with nature, corresponds to plants, grass and trees
and is used to denote environmental sustainability. In addition, the concept of Green IT
is considered to have conceptual equivalence to Green ICT, IT for Sustainability, Green
Computing, Sustainable IT and Environmentally Sustainable Computing.
Environmental sustainability necessitates human attitude and behavior changes
toward sustainable ways of living and interacting with the natural environment [6].
To enable attitude and behavior changes, people require the knowledge and skills to
understand sustainability problems and solutions, make sustainable decisions and take
sustainable actions. To this end, education has a vital role to play in the teaching and
learning of green competencies, including green awareness, knowledge, skills, abilities,
attitudes and behaviors [7].
However, before teaching can proceed, teachers themselves need to acquire green
competencies and be motivated to teach sustainability. In this regard, the literature reports
that teachers often lack these requirements resulting in inconsistent and inadequate sus-
tainability education in schools [8]. A key opportunity for developing teachers’ green
competencies and motivation is during teacher training or when student teachers are
being trained about what and how to teach. Yet, the literature reports that student teach-
ers also do not have the appropriate green competencies and feel unprepared to teach
sustainability [9].

1.2 Research Problem, Question and Objective


The general real-world problem was schoolteachers’ and student schoolteachers’ unpre-
paredness to teach sustainability, which includes Green IT. Specifically, the literature
has not clarified how to effectively influence student schoolteachers’ Green IT teaching
intention and behavior but has rather explained aspects such as the intention to use Green
IT [10] and the intention to practice Green IT [11]. Also, where prior research consid-
ered variables that affect Green IT behavior [12], they did not involve student teachers or
teaching. Furthermore, prior research has not sufficiently engaged with the developing
[13] and African country context in which the study was based, which is a context that
is particularly susceptible to global warming and environmental degradation [14].
Consequently, the research problem was the lack of teacher training management
guidelines for improving the Green IT teaching intention and behavior of student
schoolteachers. Developing and presenting such guidelines provides an original con-
tribution to scientific knowledge and especially the sustainability, Green IT, education
and IT fields. Additionally, this knowledge provides teacher training management with
valuable insight into the management and design of student teacher training and courses
for improving student teachers’ Green IT competencies and teaching motivation. Thus,
the research question was what guidelines should teacher training management follow
to improve their student schoolteachers’ Green IT teaching intention and behavior?
Correspondingly, the research objective was to develop teacher training management
guidelines by building on prior empirical work [15].
The paper is structured into five sections, with the first contextualizing the study, the second reviewing applicable frameworks, theories, and models in the literature, and the third providing justification for the research methodology. The results are presented in the fourth section, and the study concludes in the final section.

2 Literature Review
To achieve the research objective, applicable frameworks, theories and models were
searched for in the Green IT and IT literature to establish a basis for an empirically
testable research model from which to develop teacher training management guidelines.
Nine prominent frameworks, theories and models were evident, namely the IT Gover-
nance and Green IT model (ITGM) [16], the Green-readiness framework (G-readiness)
[17], the adoption model for Green IT [18], the Green IT adoption model (GITAM)
[19], a readiness self-assessment model for implementing Green lean initiatives [20],
the belief-action-outcome framework [21], the theory of planned behavior (TPB) [22],
the theory of reasoned action (TRA) [23] and the decomposed theory of planned behavior
(DTPB) [24].
It was evident that the first six are mostly applicable at an organizational level and
DTPB is more suitable for new technology adoption and utilization, resulting in their
exclusion. However, the TRA had applicability for targeting behavioral change strategies
and the TPB exposed variables affecting behavioral intention. Hence, TRA and TPB were
selected and formed the basis of the study’s research model as they could be used to
address the research problem, answer the research question and had previously provided
useful insight and explanations about the variables involved in behavioral intention and
behavior in prior Green IT, IT and sustainability research [25–27].
Subsequently, these theories guided the study’s research model development result-
ing in ten constructs, namely behavioral beliefs (BB), normative beliefs (NB), con-
trol beliefs (CB), level of awareness (LA), attitude toward behavior (ATB), subjective
norm (SN), perceived behavioral control (PBC), person-related beliefs (PRB), behavioral
intention (BI) and behavior (B).
To elaborate, BB relates to the level of acceptance by a student teacher that Green IT
teaching results in improved Green IT practices, NB relates to the level of acceptance by
a student teacher that Green IT teaching is expected by important people in the education
domain and CB relates to the level of acceptance by a student teacher of his/her discretion
to teach Green IT. LA relates to the level of Green IT knowledge, ATB relates to how
a student teacher feels about Green IT teaching, SN relates to the level of approval a
student teacher expects from people significant to him/her about Green IT teaching, PBC
relates to the perceived level of difficulty or ease of teaching Green IT, PRB relates to
how important student teachers think their role is in promoting Green IT practice.
BI relates to the resolve of a student teacher to teach Green IT and B relates to Green IT
teaching.

3 Methodology
For answering the research question and addressing the research problem, the study was
appropriately guided by a positivist philosophy. As an epistemology, positivism indicates
that knowledge can be objectively acquired using the scientific method and observed
empirical quantitative data and analyses. Theory and hypothesis testing are common
characteristics. Hence, the study conducted an online anonymous questionnaire survey
to measure the research model constructs and test their relationships.
Following advice in the literature, the study used purposive sampling for relevant
and knowledgeable respondents representing the key research problem categories [28].
This method is efficient and replicable. A total of three hundred responses were collected
from student teachers across three teacher training tertiary institutions in Swaziland that
demonstrated a broad set of demographics, teaching grades and related qualifications.
Ethics clearance was approved by the University of South Africa (Unisa) following
formal permission from all the teacher training institutions and each respondent provided
informed consent.

4 Data Analysis and Findings

The questionnaire was administered after acceptable measurement item reliability scores had been developed over two pilot tests and after construct, discriminant and convergent validity had been confirmed via exploratory factor analysis, demonstrating that the questionnaire
items adequately measured the research model constructs. Thereafter, structural equa-
tion modeling (SEM) was applied to analyze the hypothesized relationships amongst the
research model constructs. The results are provided in Fig. 1 and demonstrate the sig-
nificant positive influence of LA, PBC and PRB on BI and BI on B. A solid line denotes
a statistically significant influence, a dashed line no statistically significant influence
and influences are in the direction of the arrows. The line values are the magnitudes or
standardized estimate values of the influences with the hypothesis numbers in adjacent
brackets.
Building on the SEM, a one-way ANOVA or analysis of variance was carried out
to ascertain whether any statistically significant differences were evident amongst the
demographic variables, namely gender, home language, age, the year level of most of
a respondent’s courses being 1st , 2nd or 3rd year level, the grade range that a respon-
dent planned to teach once he/she finished his/her training, the subject category that a
respondent planned to teach once he/she finished his/her training, the qualification for
which a respondent was registered and a respondent’s months/years of practical teaching
experience. ANOVA was suitable since it exposes systematic variances in groups of two
or more through a comparison of the variance between and within groups on a particular
variable.
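For reference, the one-way ANOVA F statistic formalizes this comparison of between-group and within-group variance; in standard textbook notation (not tied to this study's data),

F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{\sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{Y})^2 / (k-1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 / (N-k)},

where k is the number of groups, n_i is the size of group i, N is the total sample size, \bar{Y}_i is the mean of group i, and \bar{Y} is the grand mean.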

Fig. 1. SEM results for the study’s research model [15].

Conducting the ANOVA was also important because prior research [27] had noted
material differences that could impact teacher training management guidelines. For
instance, it would be important to know whether there was a significant difference in
Green IT teaching attitude between males and females or a significant difference in
Green IT teaching intention between student schoolteachers of different ages.
The ANOVA was carried out using the statistical software platform SPSS and examined all demographic variables for each of the ten research model constructs [29]. However, ANOVA, like many other statistical procedures, has certain requirements. One of these is homogeneity of variance, which was assessed using Levene's test. If the significance of Levene's test is below five percent, the null hypothesis of equal variances is rejected, meaning that the homogeneity of variance assumption is violated; this signifies likely misleading ANOVA results, and those ANOVA results should not be interpreted. If the significance of Levene's test is at or above five percent, the null hypothesis is not rejected and there is no violation of the homogeneity of variance assumption.
Notably, ANOVA's benefit is that it uses one procedure to investigate all comparisons simultaneously, but its disadvantage is its inability to indicate which groups differ on a variable. For this information, the post hoc procedure called Tukey's honestly significant difference (HSD) test is required. It can occur that the ANOVA reports a statistically significant difference while Tukey's HSD does not, because Tukey's HSD requires a greater difference before significance, due to its control of the Type I error, which occurs when the null hypothesis is actually true but is rejected.
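The study's analysis itself was run in SPSS. Purely as an illustrative sketch of the same procedure, and using invented example scores rather than the study's data, a one-way ANOVA for three hypothetical qualification groups could be computed with the Apache Commons Math library as shown below; Levene's test and Tukey's HSD are not provided by that library and are omitted here.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.commons.math3.stat.inference.OneWayAnova;

public class GreenItAnovaSketch {
    public static void main(String[] args) {
        // Hypothetical BI (Green IT teaching intention) scores, one array per group.
        // These numbers are invented for illustration only.
        double[] primaryDiploma   = {16, 15, 17, 16, 15, 16, 17};
        double[] earlyChildhood   = {14, 15, 14, 13, 15, 14, 14};
        double[] specialInclusive = {13, 14, 14, 13, 14, 13, 15};

        List<double[]> groups = Arrays.asList(primaryDiploma, earlyChildhood, specialInclusive);

        OneWayAnova anova = new OneWayAnova();
        double f = anova.anovaFValue(groups);  // F statistic (between/within variance ratio)
        double p = anova.anovaPValue(groups);  // corresponding p-value

        System.out.printf("F = %.3f, p = %.4f%n", f, p);
        // Reject the null hypothesis of equal group means at the 5% level?
        System.out.println("Significant at 0.05: " + anova.anovaTest(groups, 0.05));
    }
}
```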
Proceeding with the ANOVA and the demographic variable gender, Levene's test showed that ANOVA could be conducted for the constructs PRB, PBC, CB, ATB, BB, BI and LA. However, the corresponding ANOVA for these constructs indicated a statistically significant (SS) difference on the construct ATB only, and Tukey's HSD indicated no SS difference. Thus, there were no meaningful differences on gender.
For home language, Levene's test showed that ANOVA could be conducted for all constructs excluding PRB. The corresponding ANOVA indicated a SS difference on the construct BB only, but Tukey's HSD was not conducted since one or more groups had fewer than two responses, namely the Zulu home language group with only one response. To clarify, there were six English, one Zulu and 293 Swazi responses on home language, so it was not meaningful to compare such uneven groups, and there were therefore no meaningful differences on home language.
For age, Leven’s test showed that ANOVA could be conducted for all constructs
and a SS difference on construct LA only was indicated on the corresponding ANOVA.
Subsequently, on the construct LA, Tukey’s HSD indicated SS differences amongst the
20 - 24 and I do not want to answer this question age response options, and the 25 - 29
and I do not want to answer this question age response options. It was concluded that
there were no meaningful differences on age because then I do not want to answer this
question response option provides little useful information and insight.
For course year level, Levene's test showed that ANOVA could be conducted for all constructs excluding B and LA, but the resulting ANOVA showed no SS differences for course year level.
For planned teaching grades, Levene's test showed that ANOVA could be conducted for all constructs excluding LA, B and NB, but the corresponding ANOVA showed a SS difference on the construct BI only, on which Tukey's HSD indicated a SS difference
between the primary school grades and early childhood grades groups. In conclusion,
Green IT teaching intention differed significantly between those respondents planning
primary school teaching, which had a mean of 15.560, and those planning early childhood
teaching, which had a lower mean of 14.138. Those planning early childhood teaching
would possibly benefit from adapted Green IT training to improve their Green IT teaching
intention.
For planned teaching subject, Levene's test showed that ANOVA could be conducted for all constructs excluding LA, NB and SN. Subsequently, the ANOVA indicated a SS difference on the construct PRB only, but Tukey's HSD was not conducted since one or more groups had fewer than two responses. Thus, there were no meaningful
differences on planned teaching subject.
For the registered qualification, Levene's test showed that ANOVA could be conducted for all constructs excluding BB, PBC, B, NB and LA, and the ANOVA showed a SS
difference on BI only. Tukey’s HSD, similar to planned teaching grades, indicated a
SS difference between the Primary Teacher’s Diploma and Early Childhood Education
Diploma groups on BI. Tukey’s HSD also indicated a SS difference between the Primary
Teacher’s Diploma and the Bachelor of Special and Inclusive Education groups on
BI. Thus, Green IT teaching intention differed significantly between those respondents
registered for the Primary Teacher's Diploma, with a mean of 15.9077, those registered for the Early Childhood Education Diploma, with a lower mean of 14.5309, and those
registered for the Bachelor of Special and Inclusive Education, with the lowest mean of
13.7857. Hence, those registered for the Early Childhood Education Diploma and the
Bachelor of Special and Inclusive Education would possibly benefit from adapted Green
IT training to improve their Green IT teaching intention.
For practical teaching experience, Levene's test showed that ANOVA could be conducted for all the constructs excluding B and LA, but the ANOVA showed no SS differences between the different practical teaching experience groups.

5 Conclusion
5.1 Management Guidelines and Recommendations
Following the empirical work, teacher training management guidelines were developed
with the aim of improving student schoolteachers’ Green IT teaching intention and
behavior, as follows:

• With the strong empirical link demonstrated between Green IT teaching intention and
Green IT teaching, it would be important for teacher training management to address
the Green IT teaching resolve of student teachers directly and this could be done by
formally explaining to the student teachers the many global and local risks of not
practicing and teaching general environmental sustainability and Green IT, and also
formally explaining the benefits of practicing and teaching general environmental
sustainability and Green IT.
• A central focus by teacher training management should be on raising the student
teachers’ Green IT awareness since the analysis indicated that Green IT awareness
has the greatest positive influence on Green IT teaching intention. Green IT awareness
could be improved by formally integrating Green IT information and knowledge into
current teacher training courses and curriculums, including problems and solutions.
Also, informal methods including social media and events could augment formal
Green IT awareness.
• Another important aspect is the student teachers’ perceived behavioral control or their
perceptions about how difficult or easy it would be to teach Green IT, as the analysis
showed this aspect’s positive influence on Green IT teaching intention. This aspect
could be practically improved through the provision of teaching methods training
specifically designed for teaching Green IT content and creating opportunities for the
student teachers to pilot those methods with their planned teaching grades.
• Also essential are the person-related beliefs of the student teachers, given their positive
influence on Green IT teaching intention. Person-related beliefs relate to the student
teachers’ perceptions of their part in improving the practice of Green IT by others.
Improving their person-related beliefs could involve training the student teachers in
the design of Green IT assessments to assess the Green IT behavior changes in their
pupils over time to provide evidence of their part in affecting Green IT change in
others.
• In addition, it would be essential to adapt and augment the Green IT training for
student teachers who plan to teach early childhood grades as these students showed a
significantly lower Green IT teaching intention. This could require teaching content
adapted for early childhood age groups given this age group’s young learning stage,
development and particular use of IT.
• Similarly, the student teachers enrolled in the Bachelor of Special and Inclusive Edu-
cation and Early Childhood Education Diploma would require adapted and special-
ized Green IT teaching content and teaching methods that fit the particular learning
development stages, capabilities and IT use contexts of these learners.

5.2 The Problem Addressed, Limitations and Future Research


Positivism provided an epistemological basis for objective knowledge acquisition to
address the research problem and answer the research question. Thus, an original con-
tribution to the Green IT and education fields was made through empirical evidence
of the significant and relevant constructs and demographic variables influencing Green
IT teaching intention and behavior, which is important for promoting, propagating and
improving sustainability and the SDGs. In addition, teacher training management guide-
lines for improving the Green IT teaching intention and behavior of student schoolteach-
ers were developed to provide teacher training management with valuable insight for
the management and design of student teacher training and courses to improve student
teachers’ Green IT competencies and teaching motivation.
Nevertheless, the study had limitations, such as its purposive sampling method which
may be open to subjective bias and constrain generalizability or external validity. Future
research would benefit from employing random sampling where possible. In addition,
future research could collect data from other contexts outside of Africa, in both develop-
ing and developed countries to verify, test and refine the findings and research model. Fur-
thermore, longitudinal studies may provide useful information over time about Green IT
teaching intention and behavior and effective and ineffective Green IT teaching methods
and content.

References
1. UN: Transforming our world: The 2030 agenda for sustainable development (2015). https://
sdgs.un.org/publications/transforming-our-world-2030-agenda-sustainable-development-
17981. Accessed 08 Apr 2022
2. UNEP: Frontiers 2022: Noise, blazes and mismatches: Emerging issues of environmen-
tal concern. Nairobi, Kenya (2022). https://www.unep.org/resources/frontiers-2022-noise-bla
zes-and-mismatches. Accessed 08 Apr 2022
3. Krumay, B., Brandtweiner, R.: Measuring the environmental impact of ICT hardware. Int. J.
Sustain. Dev. Plan. 11, 1064–1076 (2016). https://doi.org/10.2495/SDP-V11-N6-1064-1076
4. Bekaroo, G., Bokhoree, C., Pattinson, C.: Impacts of ICT on the natural ecosystem: a grassroot
analysis for promoting socio-environmental sustainability. Renew. Sustain. Energy Rev. 57,
1580–1595 (2016). https://doi.org/10.1016/j.rser.2015.12.147
5. Murugesan, S., Gangadharan, G.R. (eds.): Harnessing Green IT: Principles and Practices.
John Wiley and Sons Ltd, Chichester, United Kingdom (2012)
6. Cabral, C., Dhar, R.L.: Green competencies: insights and recommendations from a systematic
literature review. Benchmarking: An Int. J. 28(1), 66–105 (2021). https://doi.org/10.1108/BIJ-
11-2019-0489
7. Erhabor, N.I.: Developing leaders through mentoring in environmental education. Electr.
Green J. 1, 1–9 (2018). https://doi.org/10.5070/G314134454
8. Kalsoom, Q., Qureshi, N.: Impact of sustainability-focused learning intervention on teachers' agency to teach for sustainable development. Int. J. Sust. Dev. World 28, 540–552 (2021). https://doi.org/10.1080/13504509.2021.1880983
9. Alvarez-García, O., Sureda-Negre, J., Comas-Forgas, R.: Assessing environmental compe-
tencies of primary education pre-service teachers in Spain. Int. J. Sustain. High. Educ. 19,
15–31 (2018). https://doi.org/10.1108/IJSHE-12-2016-0227
10. Dezdar, S.: Green information technology adoption: influencing factors and extension of
theory of planned behavior. Soc. Responsib. J. 13, 292–306 (2017). https://doi.org/10.1108/
SRJ-05-2016-0064
11. Dalvi-Esfahani, M., Alaedini, Z., Nilashi, M., Samad, S., Asadi, S., Mohammadi, M.: Stu-
dents’ green information technology behavior: beliefs and personality traits. J. Clean. Prod.
257, 1–12 (2020). https://doi.org/10.1016/j.jclepro.2020.120406
12. Soroya, S.H., Mahmood, K., Soroya, M.S., Hussain, S., Ilyas, A.: Green computing intent and behavior of Pakistani academic librarians: PLS-SEM analysis. Library Hi Tech, ahead-of-print (2021). https://doi.org/10.1108/LHT-01-2021-0001
13. Nanath, K., Pillai, R.R.: Individual and organizational factors affecting the implementation
of Green IT: a case study of an Indian business school. Electr. J. Inf. Syst. Dev. Countries 87,
1–15 (2021). https://doi.org/10.1002/isd2.12163
14. Filho, W.L., et al.: The influence of ecosystems services depletion to climate change adaptation
efforts in Africa. Sci. Total Environ. 779, 146414 (2021). https://doi.org/10.1016/j.scitotenv.
2021.146414
15. Dlamini, R.N., Howard, G.R.: Investigating the antecedents to teaching green information
technology (Green IT): a survey of student teachers in Swaziland. In: Proceedings of the
2018 Annual Research Conference of the South African Institute for Computer Scientists and
Information Technologists (SAICSIT), pp. 108–117. Association for Computing Machinery
(ACM), Port Elizabeth, South Africa (2018). https://doi.org/10.1145/3278681.3278695
16. Hardin-Ramanan, S., Chang, V., Issa, T.: A green information technology governance model
for large mauritian companies. J. Clean. Prod. 198, 488–497 (2018). https://doi.org/10.1016/
j.jclepro.2018.07.047
17. Molla, A., et al.: E-readiness to G-readiness: developing a green information technology
readiness framework. In: Proceedings of the 19th Australasian Conference on Information
Systems (ACIS), pp. 669–678. University of Canterbury (2008)
18. Asadi, S., Nilashi, M., Samad, S., Rupani, P.F., Kamyab, H., Abdullah, R.: A proposed adop-
tion model for Green IT in manufacturing industries. J. Clean. Prod. 297, 1–16 (2021). https://
doi.org/10.1016/j.jclepro.2021.126629
19. Molla, A.: GITAM: a model for the adoption of green IT. In: Proceedings of the 19th
Australasian Conference on Information Systems (ACIS), Christchurch, New Zealand,
pp. 658–668. Association for Information Systems AIS Electronic Library (AISeL) (2008)
20. Cherrafi, A., Garza-Reyes, J.A., Belhadi, A., Kamble, S.S., Elbaz, J.: A readiness self-
assessment model for implementing green lean initiatives. J. Clean. Prod. 309, 1–17 (2021).
https://doi.org/10.1016/j.jclepro.2021.127401
21. Melville, N.P.: Information systems innovation for environmental sustainability. MIS Q. 34,
1–21 (2010). https://doi.org/10.2307/20721412
22. Ajzen, I.: The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 50, 179–211
(1991). https://doi.org/10.1016/0749-5978(91)90020-T
23. Fishbein, M., Ajzen, I.: Belief, Attitude, Intention, and Behavior: An Introduction to Theory
and Research. Addison-Wesley, Reading, MA (1975)
24. Sadaf, A., Newby, T.J., Ertmer, P.A.: Exploring factors that predict preservice teachers’ inten-
tions to use web 2.0 technologies using decomposed theory of planned behavior. J. Res.
Technol. Educ. 45, 171–196 (2012). https://doi.org/10.1080/15391523.2012.10782602
25. Chen, Y., Shi, S., Chow, W.S.: Investigating users’ extrinsic motivation for green personal
computing. J. Comput. Inf. Syst. 56, 70–78 (2016). https://doi.org/10.1080/08874417.2015.
11645803
26. de Leeuw, A., Valois, P., Ajzen, I., Schmidt, P.: Using the theory of planned behavior to identify
key beliefs underlying pro-environmental behavior in high-school students: implications for
educational interventions. J. Environ. Psychol. 42, 128–138 (2015). https://doi.org/10.1016/
j.jenvp.2015.03.005
27. Mishra, D., Akman, I., Mishra, A.: Theory of reasoned action application for green information
technology acceptance. Comput. Hum. Behav. 36, 29–40 (2014). https://doi.org/10.1016/j.
chb.2014.03.030
28. Tongco, M.D.C.: Purposive sampling as a tool for informant selection. Ethnobotany Res.
Appl. 5, 147–158 (2007). https://doi.org/10.17348/era.5.0.147-158
29. Tredoux, C., Durrheim, K. (eds.): Numbers, Hypotheses & Conclusions: A Course in Statistics
for the Social Sciences. UCT Press, Cape Town, South Africa (2005)
Design and Implementation of an Automatic
Word Match Generator

E. Miles Gertis and Y. Daniel Liang(B)

Georgia Southern University, Savannah, GA 31419, USA


{eg08014,yliang}@georgiasouthern.edu

Abstract. An Automatic Word Match Generator is a software tool that can be used to generate word-matching interactives automatically. The purpose of a word-
matching interactive is to provide students with the mechanism to learn new vocab-
ulary and improve their reading comprehension skills. This paper will present the
design and implementation of an Automatic Word Match Generator, as well as
the research and algorithms used in the program.

Keywords: Automatic programming · Computer science education · Online learning · Programming synthesis · Word matching

1 Introduction

In this paper, we address a common problem that instructors frequently encounter. Instructors use word-matching interactives to teach new vocabulary to students. Typically, these word-matching interactives have had to be developed by hand.
We have developed more than sixty word-matching exercises, each of which is a representation of a word-matching interactive. These interactives are embedded in interactive eBooks as shown in [1–3]. The interactives in the eBooks have received good reviews [4, 5], and they help students learn and grasp key terms. Each of the word-matching interactives was programmed manually. Creating word-matching interactives requires programming skill and takes a lot of time and effort. To give instructors the ability to create word-matching exercises, we created an Automatic Word Match Generator.
The Automatic Word Match Generator enables the user to enter key terms and their
descriptions and generates a Web page for a word-matching interactive as shown in
Fig. 1.
In the following sections, we demonstrate word-matching interactives and the use of the Automatic Word Match Generator, present the model behind the generator along with its design and implementation, and finally discuss lessons learned from this project and future work.


Fig. 1. The word match generator generates a word matching interactive

2 Word-Matching Interactives

A word-matching interactive is a Web Page for students to learn key terms by matching
key terms with descriptions. Figure 2 shows an example of a word-matching interactive,
which can be viewed from https://liveexample.pearsoncmg.com/wordmatch/Section1_
2.html. Figure 3 shows the result after the user drags the key terms to match their
descriptions. A Congratulations dialog (see Fig. 4) is displayed when all key terms are
matched to their descriptions.

Fig. 2. A word-match interactive before dragging terms to match descriptions.

3 The Use of the Automatic Word Match Generator

We designed a simple and intuitive user interface for an instructor to use. The first step in developing the Automatic Word Match Generator was to create a method for generating the static HTML, CSS, and JavaScript word-matching interactive template. The next phase of developing the Automatic Word Match Generator was to save the generated code to the internal server by clicking the Post button. The Automatic Word Match Generator creates an HTML file to store the generated HTML code for the exercise and then displays a View button.

Fig. 3. A word-match interactive after dragging terms to match descriptions.

Fig. 4. A congratulations dialog box is displayed.
The View button serves two purposes. First, it renders the HTML code for the exer-
cise. Second, it shows the URL for the exercise on the server. The instructor can give
this URL to the student.
To use the Automatic Word Match Generator, go to http://livelab.georgiasouthern.
edu/wordmatchgenerator as shown in Fig. 5.

Fig. 5. The initial screen for the automatic word match generator.

Now enter a title, Key Term 1, Description for Key Term 1, Key Term 2, and Descrip-
tion for Key Term 2. You can click the Add More button to create more entries for key
terms and their descriptions. For example, to create the word matching exercise in Fig. 2,
you can enter the following entries in Fig. 6.
Now click the Generate HTML button to display the generated HTML code for this
word matching exercise. The generated HTML is shown in Fig. 7.

Fig. 6. The instructor UI with key terms and description inputs

Click the Post button to post the word match exercise to the server. Note that the
descriptions are randomly ordered. The Post button saves the generated HTML file for the
exercise on the server and creates a URL for the generated exercise. After the generated
HTML file is posted, a View button is displayed, as shown in Fig. 8. Clicking the View
button displays the exercise using the URL, as shown in Fig. 9. The instructor can give
the URL for this exercise to the student.

Fig. 7. The instructor UI after Clicking the Generate HTML Button.

Fig. 8. The instructor UI after Clicking the Save Button



Fig. 9. A rendered word-matching interactive.

4 A High-Level Model for the Automatic Word Match Generator


Automatic programming is used to write a program that generates another program based
on certain specifications. For example, a compiler is an automatic program that takes a
source code and generates an executable code. In a broad sense, automatic programming
can be classified into two types:

1. Generative Programming: the application of reusing code for a new function or
software.
2. Code Generation: a mechanism to produce the executable form of a program.

The Automatic Word Match Generator is an example of code generation. It takes
key terms and their descriptions as input and generates an HTML source code.
The research of automatic programming started in the 70s. The initial goal was
to provide a specification and let the computer automatically generate a program that
meets the specification. Unfortunately, the task of automatically generating a program is
harder than expected. Formal specifications were proposed to give precise requirements
in a mathematical structure [6–11]. Experimental systems were developed that take
the requirements written in formal specification and generate a program automatically.
However, these systems are not used in industry, because there is a wide gap between
the high-level specification and target implementation [12].
In recent years, domain-specific automatic programming systems have been devel-
oped. A system called “Wrex” was created to automatically generate Python code for
analyzing data [8]. A system called “Falx” was created to automatically generate R
programming code for visualizing data [7]. A system called “Scythe” was created to
generate certain types of SQL statements [9].
Inspired by the current development in the domain-specific automatic programming
systems, we developed an Automatic Word Match Generator that automatically generates
a word-matching interactive. We propose a generic model for generating a web page as
shown in Fig. 10.
In the following section, we discuss the implementation of the model. In the case of
the Automatic Word Match Generator, the input is entered from text fields and stored
in arrays. The validation might be simply to check if key terms or their descriptions are
empty. Process Data randomly shuffles data and maps key terms to descriptions. Create
Web Page uses HTML, CSS, and JavaScript to create a word-matching interactive. Post
Web Page automatically posts the generated web page on a web server so the page can
be viewed on the Internet.

Fig. 10. A generic model for generating a web page.
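To make the Process Data and Create Web Page steps concrete, the sketch below shows, in plain JavaScript, how the descriptions might be shuffled and a minimal matching fragment emitted. The function names, CSS classes, and markup are illustrative placeholders, not the generator's actual template.

```javascript
// Illustrative sketch of the Process Data / Create Web Page steps.
// All identifiers and markup here are placeholders, not the real template.
function shuffle(items) {
  // Fisher-Yates shuffle so descriptions appear in random order
  for (let i = items.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [items[i], items[j]] = [items[j], items[i]];
  }
  return items;
}

function createWordMatchHtml(terms, descriptions) {
  // keep the term -> description mapping, then shuffle only the displayed order
  const pairs = terms.map((term, i) => ({ term, description: descriptions[i] }));
  const termList = pairs
    .map(p => `<li class="term" draggable="true">${p.term}</li>`)
    .join("");
  const descList = shuffle(pairs.map(p => p.description))
    .map(d => `<li class="description">${d}</li>`)
    .join("");
  return `<ul id="terms">${termList}</ul><ul id="descriptions">${descList}</ul>`;
}

// Example with two key terms and their descriptions
console.log(createWordMatchHtml(
  ["compiler", "loop"],
  ["translates source code into executable code", "repeats a block of statements"]
));
```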

5 Design and Implementation


We will describe the custom multi-tiered application that we used to implement the
Automatic Word Match Generator shown in Fig. 11 in a sequence of steps:

1. The dispatch servlet processes the request from the browser.


2. The mapping handler then routes the request to the correct controller class.
3. The WordMatchController then processes the request which then calls the appropri-
ate method from the WordMatchService.
4. The appropriate wordmatch jsp file is returned to the user.

Fig. 11. A diagram of the custom multi-tiered application

The methods used in the custom WordMatchController class facilitate the following
operations:

1. Retrieve word-matching interactives from the server.


2. Save the word-matching interactives on the server.

The functions saveWordMatchJSP and getWordMatch handle the controller inter-
actions listed above. The saveWordMatchJSP method was designed to save the content
sent by an HTTP POST request from the client. The model from the custom model class,
View.java, was designed to facilitate the sending and receiving of data related to display-
ing word-matching interactives. The view, wordmatch.jsp, was designed to follow the
original structure of the first word-matching interactive introduced at the beginning of
the project as shown in Fig. 1. In Fig. 12, we can see how the request is sent to the server
by using the chrome developer tools. The request body used in the request is shown in
Fig. 12. The response from the request is shown in Fig. 13.
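The paper names the controller methods (saveWordMatchJSP and getWordMatch) but does not list their signatures or routes, so the Spring Boot sketch below is only an approximation of how such a controller might be wired; the annotations, paths, parameter types, and service API are assumptions.

```java
// Approximate sketch only: routes, parameters, and the service interface are
// assumptions, since the paper gives just the class and method names.
import org.springframework.web.bind.annotation.*;

interface WordMatchService {
    String saveJSP(String html);       // named in the paper; signature assumed
    String getWordMatch(String id);    // assumed helper for retrieval
}

@RestController
@RequestMapping("/wordmatch")
public class WordMatchController {

    private final WordMatchService service;

    public WordMatchController(WordMatchService service) {
        this.service = service;
    }

    // Post button: save the generated HTML on the server and return the exercise URL.
    @PostMapping("/save")
    public String saveWordMatchJSP(@RequestBody String generatedHtml) {
        return service.saveJSP(generatedHtml);
    }

    // View button: return the stored exercise so it can be rendered for the student.
    @GetMapping("/{id}")
    public String getWordMatch(@PathVariable String id) {
        return service.getWordMatch(id);
    }
}
```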

Fig. 12. An example of an XMLHttpRequest being sent to the server.

Fig. 13. An example of an XMLHttpRequest body shown in chrome developer tools.

Fig. 14. An example response returned from the server.



After the response is returned, an instructor can give a student the id associated with
the word-matching interactive.
Before implementing our project, a survey of possible solutions was conducted. In
this phase various programming languages were evaluated. We decided to use Java for
two reasons: it supports a set of frameworks for implementing web applications and
we had more experience using it. Using Spring Boot, we would be able to create an
application that could be used to generate, serve, and store word-matching interactives
as static HTML pages.
An overview of the steps used to develop the Automatic Word Match Generator is
shown below:

1. Selected languages for developing the program.


2. Identified an expected output for the program.
3. Created a proof of concept that collected the data needed to generate the derived
data to create a word-matching interactive
4. Developed a user interface for collecting the input data.
5. Implemented an HTML generating function to collect the input data and generate
the derived data.
6. Implemented an HTML render function to display the expected output with the
derived data.
7. Developed a system for displaying the output to the user.

The implementation of the Automatic Word Match Generator consisted of four
phases (a brief client-side sketch follows the list):

1. Developed a JavaScript method to capture the values from HTML input elements.
2. Logged the input values to the JavaScript console as shown in Fig. 14.
3. Displayed the concatenated string from step two in a textarea element below the
instructor GUI as shown in Fig. 14.
4. Rendered the output from the textarea box in a separate window as shown in Fig. 15.
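A minimal sketch of these phases is shown below; the element ids and class names are hypothetical, and createWordMatchHtml refers to the earlier sketch.

```javascript
// Illustrative sketch of the four phases; element ids and classes are hypothetical.
function generateHtml() {
  // Phase 1: capture the values from the instructor's input fields
  const terms = Array.from(document.querySelectorAll(".key-term"), el => el.value);
  const descriptions = Array.from(document.querySelectorAll(".description"), el => el.value);

  // Phase 2: build the exercise markup and log it to the console
  const html = createWordMatchHtml(terms, descriptions);  // see the earlier sketch
  console.log(html);

  // Phase 3: display the generated code in a textarea below the instructor GUI
  document.getElementById("generated-code").value = html;

  // Phase 4: render the output in a separate window
  const preview = window.open("", "_blank");
  preview.document.write(html);
}
```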

The specific classes used in the implementation of our microservice are shown in
Fig. 16, 17, 18, and 19.

Fig. 15. The console log output from the generated HTML function.

Fig. 16. An initial attempt at rendering the HTML output from the generated HTML

The primary purpose of the controller class shown in Fig. 17 is to provide URL
routes for our application. The purpose of the classes shown in Fig. 18 and Fig. 19 is to
provide models which can be used to transport data throughout the service. The View
class shown in Fig. 19 was designed to keep track of the word-matching
interactives saved on the server. The WordMatch class shown in Fig. 18 was used as a
model to transport the word-matching interactive. The model exists entirely on the server.
The method saveJSP shown in Fig. 20 was used to save a word-matching interactive
as a JSP (Java Server Pages) file on the server, and saveHTML was used to convert a
JSP file to a static HTML file.

Fig. 17. The UML diagram for the custom WordMatchController class.

Fig. 18. The UML diagram for the custom WordMatch model class.

Fig. 19. The UML diagram for the custom View model class.

Fig. 20. The UML diagram for the custom WordMatchService class.

6 Lessons Learned
We created many word matching exercises manually. It was time-consuming to create
each exercise and maintain it. Now, with this tool, it is a simple process to create a
word matching exercise without writing any code. In retrospect, we should have created
this tool earlier to save hundreds of hours of writing word matching exercises manually.
When we first designed the tool, we generated the HTML code and displayed the
code in a text area. We expected the instructor to copy and paste the code. We found that this
limited the adoption of this tool. So we added the Post button to save the generated
HTML code to a server and create a URL for the instructor to access it directly without
any extra work.

7 Future Work
At present, the generated exercises are not associated with a user. We plan to let instructors
create accounts so that they can create and store exercises in a database. An instructor will be
able to view all created exercises and delete them as well. With a user account, the keys
and their descriptions for each exercise will be saved in the database and regenerated.
The instructor will not need to re-enter the keys and descriptions if new functionality or
a new user interface is added to the generated HTML file.
Another direction of future work is to create multiple word matching exercises at
once. This idea was proposed by an instructor who wishes to create an XML
file that stores information for multiple exercises. For each exercise, it specifies the
title, key terms, and their descriptions. The Automatic Word Match Generator takes
the information from the XML file and automatically generates an HTML file for each
exercise specified in the XML file.

8 Conclusions
This paper presented a Web-based tool for automatically generating a word-matching
interactive. Instructors can enter the terms and their descriptions to generate an HTML
page and share the URL with students. The tool is freely available from
http://livelab.georgiasouthern.edu/wordmatchgenerator.
We proposed a generic model for automatically generating web pages. The Automatic
Word Match Generator is a demonstration of a concrete implementation for this generic
model. We believe that many other web page generation projects can be implemented
using similar approaches. Our Automatic Word Match Generator project serves as a
stepping stone in the field of automatic programming for generating web pages.
The Automatic Word Match Generator removes the pain that instructors typically
face when they have to create word-matching games. Before implementing the Automatic
Word Match Generator all word-matching interactives had to be developed manually.
The process of manually creating these exercises was a large waste of time for instructors.
Our tool provides instructors with the ability to create word-matching interactives
without having to write any code. The first iteration of our tool required instructors to
at least copy and paste their code from the text area onto the server. The manual effort
required to copy and paste resulted in poor adoption, so we added a Post button to save
the generated HTML code. Once the content is saved onto the server, a URL is created
for the instructor to access the exercise directly without any extra work.
The contribution from our research is a web-based tool that can automatically gener-
ate a word-matching interactive. Now instructors can enter their terms and descriptions
to create fun word-matching interactives which can be shared with students by sending
them a URL. The tool is freely available from http://livelab.georgiasouthern.edu/wor
dmatchgenerator.

References
1. Liang, Y.D.: REVEL™ for Introduction to Java Programming and Data Structures. Pearson
Education (2016). ISBN-13: 978–0134167008
2. Liang, Y.D.: REVEL™ for Introduction to C++ Programming and Data Structures. Pearson
Education (2018). ISBN-13: 978–0134669854
3. Liang, Y.D.: REVEL™ for Introduction to Python Programming and Data Structures. Pearson
Education (2018). ISBN-13: 978–0135187753
4. REVEL™ educator study observes homework and exam grades at University of Louisiana,
Spring (2016). http://www.pearsoned.com/results/revel-educator-study-observes-homework-
exam-grades-university-louisiana/. Accessed 16 May 2022
5. REVEL educator study assesses quiz, exam, and final course grades at Central Michigan Uni-
versity, Fall (2015). http://www.pearsoned.com/results/revel-educator-study-assesses-quiz-
exam-final-course-grades-central-michigan-university/. Accessed 16 May 2022
6. Olsson, R.: Inductive functional programming using incremental program transformation.
Artif. Intell. 74(1), 55–81 (1995)
7. Wang, C., Feng, Y., Bodik, R., Dillig, I., Cheung, A., Ko, A.J.: Falx: synthesis-powered
visualization authoring. In: Proceedings of the 2021 CHI Conference on Human Factors in
Computing Systems, pp. 1–15 (2021)
8. Balzer, R.: A 15 year perspective on automatic programming. IEEE Trans. Softw. Eng. 11,
1257–1268 (1985)
9. Jazayeri, M.: Formal specification and automatic programming. In: Proceedings of the 2nd
International Conference on Software Engineering, pp. 293–296 (1976)

10. Whalen, M.W., Heimdahl, M.P.E.: An approach to automatic code generation for safety-
critical systems. In: Proceedings of the 14th IEEE International Conference on Automated
Software Engineering, pp. 315–318. IEEE (1999)
11. Sun, S.Y.: A translator description language tdl for specification languages. J. Inf. Process.
3(3) (1990)
12. Palshikar, G.K.: Applying formal specifications to real-world software development. IEEE
Softw. 18(6), 89–97 (2001)
The Impact of Feedback Modes on Learners’
Performance in Paragraph Writing

Murad Abdu Saeed1(B), Atef Odeh AbuSa’aleek2,
and Enas Abdelwahab Eltom RahmtAllah1
1 Department of English, Unaizah College of Sciences & Arts, Qassim University, Qassim,
Saudi Arabia
[email protected]
2 Department of English, College of Education, Majmaah University, Al-Majmaah 11952,
Saudi Arabia
[email protected]

Abstract. Despite the increasing attention devoted to digital feedback modes in
teacher feedback, the impact of these diverse feedback modes on learners’ writing
performance has not been sufficiently addressed. Therefore, the current study, by
assigning sixty English as a foreign language (EFL) undergraduates to four feed-
back mode conditions: oral/spoken, electronic (e-)text, voice, and audio-visual,
examined the effect of these four feedback modes on paragraph writing perfor-
mance. The results obtained from the four-mode groups’ pretest-posttest writing
tasks through sample paired t-tests and one-way ANOVA indicate that oral, voice
and audio-visual feedback modes enhanced learners’ performance in paragraph
writing. Based on the findings, useful pedagogical and research implications are
offered for writing teachers and instructors.

Keywords: Feedback modes · Students’ writing-performance · Teacher
feedback

1 Introduction
Teacher feedback is an integral part of English as a second/foreign language (ESL/EFL)
writing classrooms [1]. However, although feedback has been acknowledged to be useful
for ESL/EFL learners, it may not be well understood and effectively utilized by learners
in revising their texts and improving their writing performance [2, 3]. Therefore, research
has called for enhancing how teachers formulate and give feedback [4–6]. In this regard,
there is a line of research that supports the role of digital technology in improving
teachers’ feedback-formulating and giving processes as teachers have become able to
formulate and provide feedback in different digital modes varying from text/written
comments to voice notes and even audio-visual formats (e.g. [7–11]).
Some previous studies exploring teacher feedback-giving through digital modes have
focused on how these digital modes impact teachers’ feedback types and patterns (e.g.
[12]. However, these studies have not addressed how such digital modes affect learners’
use of teacher feedback in improving their writing. Some studies have measured the
impact of two different digital modes (e.g. text and voice) (e.g. [13–16]) and text and
audio-visual [9, 17] on learners’ text revisions. A few studies have compared more than
two modes: text, voice, and audio-visual modes [8, 18], oral, text, voice, and screencast
modes [7]. They have reported contradictory findings. Since the purpose of digital mode
in feedback delivery is to enhance learners’ uptake and write performance, it is mandatory
to see how and to what extent learners will take up such feedback given in different
digital modes and will enhance their writing performance [2, 8]. The author argues that
replication research should be based on sound methodological and analytical practices to
advance ESL/EFL theory and inform pedagogy. Therefore, this study aims to determine
the effect of teacher feedback modes on EFL learners’ text revisions and performance
in paragraph writing. It addresses the question of which modes are more effective in
enhancing learners’ performance in paragraph writing.

2 Literature Review

There are different modes of teacher feedback. Starting with the traditional or non-digital
feedback modes, oral feedback is corrective and evaluative information orally given
by writing teachers on students’ written texts in face-to-face (FTF) classroom settings,
which might take the form of dialogue [19]. A few studies have compared oral feedback to
other feedback modalities. For instance, [20] reported that oral metalinguistic feedback
was more effective than written feedback in enhancing learners’ use of subject-verb
agreement in English. According to [21], oral feedback was more efficient than written
feedback on writing for Turkish EFL learners. A few other studies have highlighted
the potential of teacher feedback provided in the oral modality due to the occurrence
of teacher-learner dialogue around feedback [22, 23]. Yet, the efficacy of teacher oral
feedback has not been explored in comparison to other feedback modalities [19].
Because of the widespread application of technology in writing courses, teacher
feedback has been increasingly digital. It is formulated and provided through digital
tools such as Google Docs comments, voice records, and screencast capture records.
This has resulted in diversifying digital feedback modes from text to voice and audio-
visual modalities [2, 7, 12, 24, 25]. While text feedback is provided in the form of written
comments/notes inserted into learners’ written texts using online writing tools, such as
Google Docs, voice feedback is usually recorded through audio recording programs
and provided in voice notes on learners’ writing. In addition, audio-visual feedback
is recorded through screencast capture tools, thus making it multimodal feedback. It
consists of oral or voice comments and visual elements (e.g., mouse color effect, mouse
pointer, text display, etc.) [7, 12].
Concerning empirical research comparing these different digital modalities and their
effect on learners’ writing, some studies have compared two digital modes: text and voice
modes [13, 16] and text and audio-visual modes [9, 17, 26, 27]. The findings of the first
group of studies seem inconclusive. While one study supported the efficacy of voice
feedback in enhancing learners’ content and ideas, organization, and style [16], the
other study reported that feedback mode was not an essential factor affecting learners’
tasks [13]. For the second group of studies, some provided evidence of the effectiveness
of the audio-visual feedback modality on learners’ writing [9, 26, 27], while another
study found that students’ writing improved regardless of the feedback mode used.
Results of other studies comparing text and audio-visual feedback modes appeared
mixed. While audio-visual feedback was found to trigger a higher number of learners’
successful text revisions of macro-level errors such as content, organization, and
structure [28] as well as appropriate vocabulary use [29], text feedback was found
more effective in eliciting revisions of micro-level errors, such as linguistic errors,
vocabulary choice, and punctuation [28, 29].
A few recent studies have compared three digital feedback modalities: text, voice,
and audio-visual [8, 18] and even four modalities: oral, text, voice, and audio-visual [7].
The first two studies found that no digital feedback mode was more efficient than the
others in enhancing learners’ texts. On the other hand, the latter study reported that most
of the successful text revisions made by learners were elicited by audio-visual feedback.
In contrast, the least was elicited by text feedback. In general, some of the above studies
support the efficacy of audio-visual feedback due to its multimodal composition that
makes the information easier to understand and to use by learners to improve their
writing.

3 Method
3.1 Participants

The present study used a pretest-posttest design to measure the effectiveness of teacher
feedback modes on learners’ paragraph writing performance. It was conducted among
60 EFL undergraduates joining a writing course in a Saudi public university over five
weeks. The writing course introduces learners to paragraph writing of different genres:
descriptive, narrative, argumentative, comparison, and contrast. However, the present
study focused on narrative writing. The writing course instructor taught the course using
English as the medium of instruction and feedback, though he shifted to Arabic in some
cases to simplify the information.

3.2 The Feedback Treatment

Before the experiment, the students were randomly recruited into four groups consisting
of fifteen members. The four groups were labeled according to the feedback mode
conditions: oral feedback group (OFG), text feedback group (TFG), voice feedback
group (VFG), and audio-visual feedback group (A-VFG). During the first week, the
students were assigned to write narrative paragraphs on the topic of ‘an experience in
my life’. The writing task was initiated by writing the first draft (the 60 first drafts are referred
to as the pretest in this study). After collecting the first drafts, the teacher read the drafts
and gave feedback using a different mode for each group of students (OFG-oral feedback,
TFG-text feedback, VFG-voice feedback, and A-VFG-audiovisual feedback) for four
weeks. The oral feedback was given in the classroom setting in the form of dialogue,
and it was also recorded using mobile audio records so that it could be later used as
data. However, the feedback in the latter three modes was provided by the instructor
using different digital media/tools: Blackboard Forum commenting box (Snapshot 1),
WhatsApp audio/voice records (Snapshot 2), and Bandicam screencast recorder software
(Snapshot 3), respectively. In addition, the voice and audio-visual feedback records were
shared with the course WhatsApp group (Snapshots 2 & 4) as shown in Fig. 1.
During the last week, the students were requested to revise their first drafts of narrative
writing based on the teacher’s feedback. They were also asked to submit their final drafts
at the end of the week (n = 60). The writing instructor read and checked them as the post-test
in this study.

Fig. 1. An illustration of teacher audio, screencast and written feedback modes

3.3 Data Collection and Analysis


The data was collected from students’ pretest-posttest written texts. Each group’s texts
were organized in a separate folder for scoring and comparison. To determine the effect
of the four feedback modes on learners’ paragraph writing performance and find which
modes were more effective, the first drafts and final drafts of the four different feedback
mode conditions were scored by the instructor based on the evaluation rubrics: content
and idea development, unity and organization, accurate grammar and structure, appropri-
ate vocabulary selection and correct spelling and punctuations. Then they were compared
by performing paired-samples t-tests and one-way ANOVAs. First, each student’s first
and final drafts were assessed and scored out of 20 marks based on the writing task
rubric specified in the course. Then, these pretest–post-test scores in paragraph writing
were compared for each feedback group independently using descriptive (mean values)
and inferential statistics (a paired sample t-test) to determine the effect of each feedback
on learners’ paragraph writing. In addition, the mean values of the post-test scores in
paragraph writing of the four groups were compared against each other to determine
the significance level of differences between and within groups using one-way ANOVA.
To find out where these significant differences lay or the location of these statistically
significant differences, the Scheffe test, an ANOVA post-hoc test, was performed in this
study.
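The paper does not state which statistical package was used; as an illustration only, the paired t-tests and one-way ANOVA described above could be computed with a short script such as the following, where the score arrays are hypothetical placeholders rather than the study's data.

```python
# Illustrative sketch of the analysis procedure (paired t-tests per group, then a
# one-way ANOVA on post-test scores). The scores below are placeholders, not the
# study's data, and the software actually used by the authors is not reported.
from scipy import stats

groups = {
    "OFG":   ([12, 11, 13, 12, 11], [16, 17, 16, 15, 17]),   # (pretest, post-test)
    "TFG":   ([12, 13, 11, 14, 12], [13, 12, 14, 12, 13]),
    "VFG":   ([13, 12, 14, 13, 12], [16, 15, 17, 16, 15]),
    "A-VFG": ([11, 10, 12, 11, 10], [17, 16, 17, 16, 17]),
}

# Paired-samples t-test comparing each group's pretest and post-test scores
for name, (pre, post) in groups.items():
    t, p = stats.ttest_rel(pre, post)
    print(f"{name}: t = {t:.3f}, p = {p:.3f}")

# One-way ANOVA comparing the four groups' post-test scores
f, p = stats.f_oneway(*(post for _, post in groups.values()))
print(f"ANOVA: F = {f:.3f}, p = {p:.3f}")
# A post-hoc test (e.g. Scheffe) would then locate the pairwise differences.
```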

4 Results and Discussion

To determine the level of impact of each feedback mode on learners’ performance in
paragraph writing, a paired sample t-test was performed on each group’s scores in the
first and final drafts. Table 1 illustrates differences in the mean values of each feedback
group’s first draft and final draft scores. The mean differences for OFG,
VFG, and A-VFG are statistically significant (p = .000 < .05), but the mean difference
for the TFG is not statistically significant (p = .563 > .05).

Table 1. Paired sample T-test of scores of first and final drafts for each group

Paired t-test
Group Drafts Mean Std. deviation t Sig. (2-tailed)
OFG First draft 11.8000 1.14642 −8.646 .000
Final draft 16.3333 1.83874
TFG First draft 12.1333 2.66905 −.592 .563
Final draft 12.7333 2.73774
VFG First draft 12.9333 2.68506 −4.641 .000
Final draft 16.0667 2.43389
A-VFG First draft 10.6667 2.09307 −16.568 .000
Final draft 16.6000 1.99284

As shown in Table 2, a one-way ANOVA was performed to compare these four feedback
groups concerning their final draft or post-test scores in paragraph writing. Table 2 shows
that the mean square between groups (49.311) is higher than the mean square within
groups (5.193). In addition, the observed F of the writing post-test was 9.496, with 3
degrees of freedom between groups. The four groups differed in their performance on the
writing post-test. In other words, overall, the difference is statistically significant among
the four groups (p = .000 < .005), which suggests that the feedback treatment affected
learners’ paragraph writing performance.

Table 2. One-way ANOVA results of the writing post-test

Sum of squares DF Mean square F Sig.


Between groups 147.933 3 49.311 9.496 .000
Within groups 290.800 56 5.193
Total 438.733 59

To determine where this significant difference lies or which feedback groups outper-
formed the others in the post-test, a posthoc analysis was performed using the Scheffe
test as shown in Table 3. The results show significant differences between OFG and TFG
(p = .001 < .005), between VFG and TFG (p = .003 < .005) and between A-VFG and
TFG (p = .000 < .005). This suggests that the OFG, VFG, and A-VFG outperformed the
TFG in the post-test paragraph. In other words, the three feedback modes: oral, voice,
and audio-visual are more effective than the text feedback mode in enhancing learners’
performance in paragraph writing.

Table 3. Scheffe test results for the writing post-test

Feedback group comparisons Mean difference Std. error Sig.


OFG TFG 3.60000* .83209 .001
VGF .26667 .83209 .991
A-VFG −.26667 .83209 .991
TFG OFG −3.60000* .83209 .001
VGF −3.33333* .83209 .003
A-VFG −3.86667* .83209 .000
VFG OFG −.26667 .83209 .991
TFG 3.33333* .83209 .003
A-VFG −.53333 .83209 .938
A-VFG OFG .26667 .83209 .991
TFG 3.86667* .83209 .000
VFG .53333 .83209 .938

On the other hand, the differences between OFG and VFG (p = .991 > .005),
between OFG and A-VFG (p = .991 > .005), and between VFG and A-VFG (p = .938 >
.005) are not statistically significant. This suggests that no feedback group outperformed
its counterpart in each of these three clusters of group comparisons (e.g. OFG when
compared to VFG). Such results also indicate that, of these three feedback modes (oral,
voice, and audio-visual), no mode is more effective than the others in improving
learners’ paragraph writing performance.

Despite the increased interest in enhancing feedback provision through various
modes, including digital modes, research on whether and how such modes affect learn-
ers’ performance in writing has been limited, and results have been inconclusive [7,
8]. The present study provided evidence of the significant effect of feedback modes on
learners’ writing performance in the post-test task except for the text feedback mode.
This result contradicts [8] that no mode of the three compared modes: audio/voice, text,
and audio-visual, was more effective in enhancing learners’ writing. In addition, the cur-
rent study result seems to be contradictory to the result of a previous study [15] which
indicates that the feedback mode does not affect learners’ performance. However, this
previous study reported results derived from learners’ perspectives rather than scores in
writing tasks. In addition, the result of the present study was supported by the one-way
ANOVA, which illustrated that the oral, voice, and audio-visual feedback modes are
more effective than the text feedback mode. Although this result seems to corroborate
[7] results on the effectiveness of audio-visual mode in enhancing learners’ uptake of
teacher feedback and thus to lead to higher amounts of successful text revisions, this
previous study was based on simple descriptive statistics (e.g. counting the number and
percentage of feedback leading to successful text revisions). Therefore, the present study
supports the efficacy of audio-visual feedback based on sound methodological analyses.
This result could be explained by some affordances or merits of these three modes in providing
clearer, detailed, and more explicit feedback, which benefits learners in improving their
performance in writing. Yet, this does not mean teachers should not use other feedback
modes as oral feedback is effective, especially when practiced in dialogue, and voice
feedback is efficient for its detailed explanation and clarity. However, the implication of
this result is that, because screencasting allows teachers to formulate detailed and precise
feedback on learners’ errors and enables learners to listen to such detailed oral comments
while also watching the visual elements (e.g. mouse pointer) on their errors, teachers
should use screencast capture technology to make their feedback more effective when
the purpose of feedback-giving is to improve their learners’ writing.

5 Conclusion

The present study addressed issues related to the effectiveness of teacher feedback on stu-
dents’ paragraph writing. Specifically, it focused on enhancing feedback-giving practices
to maximize its use among students in enhancing their writing. By exploring the effect
of four different feedback modes on learners’ narrative writing, the study revealed that
the oral, voice, and audio-visual feedback modes are more efficient than text feedback
in improving learners’ performance in writing.
Although the study contributes to earlier research on feedback-giving practices and
their effect on learners’ writing, it has some limitations. The first limitation is that one
writing course instructor gave the feedback. Therefore, future studies should look at the
effect of these different feedback modes among different writing instructors to compare
and better understand such effect variation. In addition, the study was exclusive to one
writing course, which focuses on paragraph writing. Therefore, future research may also
look at this effect of feedback modes in different writing courses, including essay writing
courses. In addition, future studies may also look at this effect of feedback modes across
different writing topics and genres as feedback mode may not be the only factor affecting
learners’ writing performance. Finally, students’ views on feedback modes as receivers
of feedback should be considered to enrich and support such results.

References
1. Dressler, R., Chu, M.W., Crossman, K., Hilman, B.: Quantity and quality of uptake: examining
surface and meaning-level feedback provided by peers and an instructor in a graduate research
course. Assess. Writ. 1(39), 14–24 (2019)
2. Ene, E., Upton, T.A.: Synchronous and asynchronous teacher electronic feedback and learner
uptake in ESL composition. J. Second. Lang. Writ. 41, 1–13 (2018)
3. Kilickaya, F.: Pre-service English teachers’ views on coursebook evaluation and designing
supplementary materials. Kastamonu Eğitim Dergisi 27(2), 523–536 (2019)
4. Bahari, A.: Computer-mediated feedback for L2 learners: challenges versus affordances. J.
Comput. Assist. Learn. 37, 24–38 (2020)
5. Novakovich, J.: Fostering critical thinking and reflection through blog-mediated peer
feedback. J. Comput. Assist. Learn. 32(1), 16–30 (2016)
6. Storch, N., Wigglesworth, G.: Learners’ processing, uptake, and retention of corrective
feedback on writing: case studies. Stud. Second. Lang. Acquis. 32(2), 303–334 (2010)
7. Alharbi, M.A.: Impact of teacher written vs. audio feedback on EFL undergraduates’ writing.
Kıbrıslı Eğitim Bilimleri Dergisi 16(3), 1141–1154 (2021)
8. Bakla, A.: A mixed-methods study of feedback modes in EFL writing. Lang. Learn. Technol.
24(1), 107–128 (2020)
9. Cheng, D., Li, M.: Screencast video feedback in online TESOL classes. Comput. Compos.
58, 102612 (2020)
10. Ko, M.H.: Students’ reactions to using smartphones and social media for vocabulary feedback.
Comput. Assist. Lang. Learn. 32(8), 920–944 (2019)
11. Ma, X.: Writing in a Task-Based Individualized Curriculum: Effectiveness of Direct and
Indirect Written Corrective Feedback (Doctoral dissertation, Georgetown University) (2020)
12. Mohammed, M.A.S.: Does teacher feedback mode matter for language students? Asian EFL
J. 28(11), 2021 (2021)
13. Gleaves, A., Walker, C.: Richness, redundancy or relational salience? a comparison of the
effect of textual and aural feedback modes on knowledge elaboration in higher education
students’ work. Comput. Educ. 62, 249–261 (2013)
14. Johnson, W.F., Stellmack, M.A., Barthel, A.L.: Format of instructor feedback on student
writing assignments affects feedback quality and student performance. Teach. Psychol. 46(1),
16–21 (2019)
15. Morris, C., Chikwa, G.: Audio versus written feedback: exploring learners’ preference and
the impact of feedback format on students’ academic performance. Act. Learn. High. Educ.
17(2), 125–137 (2016)
16. Solhi, M., Eğinli, İ: The effect of recorded oral feedback on EFL learners’ writing. Dil ve
Dilbilimi Çalışmaları Dergisi 16(1), 1–13 (2020)
17. Elola, I., Oskoz, A.: Supporting second language writing using multimodal feedback. Foreign
Lang. Ann. 49(1), 58–74 (2016)
18. Espasa, A., Mayordomo, R.M., Guasch, T., Martinez-Melo, M.: Does the type of feedback
channel used in online learning environments matter? Students’ perceptions and impact on
learning. Active Learn. High. Educ. 23(1), 49–63 (2019). https://doi.org/10.1177/1469787419891307

19. Schuldt, L.C.: Feedback in action: examining teachers’ oral feedback to elementary writers.
Teach. Teach. Educ. 83, 64–76 (2019)
20. Mansourizadeh, K., Abdullah, K.I.: The effects of oral and written meta-linguistic feedback
on ESL students writing. 3L the SE Asian J. Engl. Lang. Stud. 20(2), 117–126 (2014)
21. Küçükali, E.: The effect of oral vs. written feedback in EFL writing. J. Appl. Linguist. Lang.
Res. 4(7), 47–67 (2017)
22. Merkel, W.: Role reversals: a case study of dialogic interactions and feedback on L2 writing.
J. Second. Lang. Writ. 39, 16–28 (2018)
23. Steen-Utheim, A., Wittek, A.L.: Dialogic feedback and potentialities for student learning.
Learn. Cult. Soc. Interact. 15, 18–30 (2017)
24. Johnson, G.M., Cooke, A.: Self-regulation of learning and preference for written versus
audio-recorded feedback by distance education students. Distance Educ. 37(1), 107–120
(2016)
25. Orlando, J.: A comparison of text, voice, and screencasting feedback to online students. Am.
J. Distance Educ. 30(3), 156–166 (2016)
26. Cavaleri, M., Kawaguchi, S., Di Biase, B., Power, C.: How recorded audio-visual feedback
can improve academic language support. J. Univ. Teach. Learn. Pract. 16(4), 6 (2019)
27. Özkul, S., Ortactepe, D.: The use of video feedback in teaching process-approach EFL writing.
TESOL J. 8(4), 862–877 (2017)
28. Cunningham, K.J.: Student perceptions and use of technology-mediated text and screencast
feedback in ESL writing. Comput. Compos. 52, 222–241 (2019)
29. Ducate, L., Arnold, N.: Computer-mediated feedback: effectiveness and student perceptions
of screen-casting software versus the comment function. Technol. Across Writ. Contexts and
Tasks 10, 31–56 (2012)
Metasearch: A Web-Based Application
to Perform Systematic Reviews

Rafael Santos Crema(B) , Guilherme Nunes Nogueira Neto, and Percy Nohama

Pontifícia Universidade Católica do Paraná - PUCPR, Curitiba, PR, Brazil


[email protected]

Abstract. This article presents some of the available features dedicated to per-
forming systematic reviews, their limitations and importance, followed by a novel
tool for a scientific article search engine, fully integrated into a robust system to
perform systematic reviews. It has helpful and desirable features such as Database
Integration; Duplicate Removal; Collaboration and Reviewers; Validation Pro-
cess; Automated Criteria Creation; and Cost. This tool, called Metasearch, is a
web-based application focused on performing systematic reviews with automatic
search and metadata retrieval from databases, removing duplicates in just one
click, performing complex rules for excluding criteria quickly, and using filters
in many metadata or tags, as well as work validation by third-party reviewers.
The tool was developed following the scientific method of performing system-
atic reviews, always focusing on saving time and helping researchers with smart
tools and a friendly interface, providing an integrated set of tools to reach these
objectives.

Keywords: Systematic review · Academic search engine · Literature search ·
Search builder

1 Introduction
Internet search engines used in the processes of data search, retrieval, storage, and report-
ing for systematic reviews have a negative influence [1–6]. Their use is associated with a
lack of guidance on making internet searching reproducible, and they fail to identify results
without introducing bias, bringing more harm than benefit [7]. Considering this, specialized
search engines (such as Rayyan [8]) or applications (such as Publish or Perish [9]) are
reliable and safe alternatives for searching and retrieving data for systematic reviews.
However, even with these options available, such tools do not contain a desirable fully
integrated set of features, including article metadata retrieval or collaboration and
sharing tools.
In this article, we present some of the available tools, their limitations and importance,
followed by the methods used to develop a novel scientific article search engine, fully
integrated into a robust system for performing systematic reviews and containing helpful
and desirable features, and, finally, the conclusions and benefits reached by developing it.


2 Available Tools and Limitations


There are tools focused on developing systematic reviews, such as Rayyan, Covidence,
EPPI-Reviewer, CADIMA, and Distiller. Systematic reviews are large, complex projects;
depending on the purpose, they can be quite expensive to conduct [10].
Therefore, tools focused on helping researchers carry out this kind of work are helpful
and desirable. There is also another set of software tools not created specifically for
this task but that can also be used in this type of research, such as Mendeley and EndNote,
which offer features like assistance in managing references and libraries [11] and
internal library search and review [12]. Considering this, we evaluate the main benefits
and limitations of the tools available specifically for systematic reviews.
To better explore the main features that support the development of systematic reviews,
following important points like transparency and reproducibility, and also considering
that this kind of work can be time-consuming because of its requirements, we selected
a set of features focused on supporting these goals: Database Integration; Duplicate
Removal; Collaboration and Reviewers; Validation Process; Automated Criteria
Creation; and Cost.

2.1 Analysing Features


Each feature was selected for specific reasons, which are described below.
The first feature is Database Integration, which consists of the tool automatically
connecting to and integrating with the article databases to import their metadata. This
is not possible for all available academic databases, but some of them allow this kind of
integration by providing an integration interface. This feature was selected based on the
first step of every systematic review, where the author accesses all the databases chosen
for the work, performs the search using Boolean logic, and then exports or copies all
retrieved articles. This process of delivering search results into some tool can take up
to 40 h [13]; if it can be performed automatically by a tool, it can be done in less than
an hour.
Duplicate Removal was selected as the second important feature because researchers
working on systematic reviews search different databases and face a high prevalence of
duplicates, for example among PubMed, EMBASE, and the Cochrane Library [14].
The third feature is Collaboration and Reviewers, which focuses on systematic reviews
with multiple authors, who need to work together on all development steps, and also on
reviewers such as a professor adviser, who follows the work and helps the author with
instructions, comments, and improvements.
Validation Process was selected as a useful feature not specifically for authors, but
with a focus on paper reviewers and publishers. When they analyse and review a
systematic review, the provision of the protocol and methods facilitates its replication
[15], so this feature can support the reviewing process.
When analysing the articles retrieved in a systematic review, researchers create exclusion
criteria to remove non-relevant articles. This process requires performing searches and
sometimes reading the full article, so we considered it desirable to have a feature that
can automatically create exclusion criteria based on search rules, something we called
Automated Criteria Creation.
Finally, we considered Cost as an important feature (or characteristic) because,
depending on the tool's cost, it can be used by more, or fewer, researchers.
After selecting and considering all the main features, we analysed the cited tools,
looking for those features and presenting the results in Table 1.

Table 1. Tools comparative table of features

Feature                       Rayyan     Covidence  EPPI-Reviewer  CADIMA  DistillerSR
Database Integration          No         No         PubMed         No      PubMed
Duplicate Removal
Collaboration and Reviewers   No         Yes        Yes            No      Yes
Validation Process            No         No         No             No      No
Automated Criteria Creation   No         No         No             No      No
Cost                          Free-Paid  Paid       Paid           Free    Paid

As the table shows, none of the evaluated tools offers all the selected features together.
Some features are not available in any tool, such as Automated Criteria Creation and
Validation Process, which are very important to decrease the time spent on exclusion
criteria creation and to guarantee a transparent evaluation of the work by reviewers.
Providing this set of features together is important to ease the exhausting work required
to perform systematic reviews, leaving more time for article reading, criteria
development, and search validation. This paper aims to present a complete solution
that combines all the listed features.

3 Metasearch Tool
The Metasearch tool is a web-based application focused on developing systematic
reviews, reducing the time spent on the search stages in article databases, on article
sorting with exclusion criteria, and on work validation. It also helps researchers
collaborate and share their work.
The motivation to develop the tool was based on the personal needs of the authors
to develop systematic reviews in a faster and more trackable way. Considering that
more than three-quarters of all studies used in systematic reviews are found in electronic
databases [16–20], and that Boolean logic is already used in systematic reviews because
it allows complex query formulation [21], the main idea for the tool was to provide
together the desirable features selected previously: database integration, duplicate
removal, collaboration and reviewers, validation process, automated criteria creation,
and low cost.

4 Tool Development
The development started by designing a workflow based on how systematic reviews
are performed, including all stages. To better understand the process, the PRISMA
statement was used as a reference, with its steps of Identification, Screening, Eligibility,
and Inclusion [22].
After understanding the PRISMA flow, we created a flow diagram for the tool,
resulting in the main process it should support (see Fig. 1).

Fig. 1. Tool workflow diagram.

The first three steps of the diagram inside the Metasearch lane focus on the Identification
stage, starting with "Perform Search on Databases with Boolean Logic", which allows the
user to perform searches in different databases directly from the tool; the tool should be
integrated with the databases in order to "List Search Results", a step that lists all
databases and the number of articles retrieved in each one for the query. After the results
are listed, the user should "Select databases to include in the Systematic Review",
meaning the user selects the desired ones to be included, and then the system will
"Import Articles Metadata": using the database integration, the tool imports all metadata
from the retrieved articles, which is possible because each database can "List Articles
with Metadata" through its integration interface.
After performing the first steps, the user should have a started systematic review inside
the tool, which leads to the fourth main step, "View Results"; this step should include the
possibility of listing all the imported articles with their metadata, with filters to better
find specific articles. From this step the tool user should be able to perform three main
actions: Create Exclusion Criteria; Fill Missing Articles Metadata; and Create Articles Tags.
Create Articles Tags means that the tool needs to provide a tag system, where the
user can create, add, and remove tags for each individual article, allowing the articles
to be categorized and helping the Screening step of PRISMA. Fill Missing Articles
Metadata refers to the action of providing missing metadata for articles, which should
be possible because sometimes the database integration interface is not able to retrieve
everything from the article. Finally, Create Exclusion Criteria is the possibility of
creating the planned exclusion criteria for the systematic review inside the tool,
automatically setting the matching articles as excluded. It should be possible to perform
this in two ways: a regular manual one, by selecting the desired articles for the exclusion
criterion, and a smart, automatic one, focused on excluding articles based on rules, for
example articles with some specific word present in the title or abstract, articles with a
publication date older than some year, and so on.
Ending the flow, there are two final steps, Export and Share Results, where the
tool should provide means to export the article list with each article's status (included or
excluded) and some way to share the results with others.
To achieve the proposed workflow, the technical approach chosen was to use web-based
technologies, developing the tool with the PHP programming language and the Nginx web
server running on a Linux server, with the MySQL database management system defined
for the database. Furthermore, as a tool for academic purposes, all the technology selected
for its development consisted of open-source solutions, avoiding unnecessary costs with
software licenses.

5 Results
After the development work, the tool was released in a beta version covering all the
proposed workflow stages. This first beta version made it possible to create systematic
reviews with the developed tool, using all the benefits of a set of fully integrated features
and internal tools, following the workflow explained next.

5.1 Search Articles Databases


The first feature available for the system user is the database search, where he/she can use
Boolean searches to find articles directly from IEEE, PLOS ONE, PubMed, PubMed
Central and Springer (see Fig. 2). The article metadata found are directly imported
from the databases into Metasearch using their integration tools (APIs) and include the
primary data used for systematic reviews: Origin/Database; Title; Publication Date and
Year; DOI; Publication Name; Publisher; Authors; Keywords; and Abstract. The tool
also allows uploading articles from ScienceDirect (or other databases) using article
metadata files in RIS format.
Once the user imports all the articles into Metasearch, they can check metadata that
was not imported automatically and fill the missing data directly. This can happen in
some situations where the database API could not retrieve complete information.

5.2 Sorting
Once the articles are imported to the database, the sorting stage can be done inside the
Metasearch tool. First, the articles are listed with Database, Year, and Title information.
It also offers the possibility of creating tags for each article, suggesting already used tags
as the user starts typing; these tags can be useful for creating exclusion criteria, as
mentioned below, and for filtering the article list.

Fig. 2. Searching interface showing results in all integrated databases.

Other features are: Article
Status, which shows if the article is still included in the systematic review or is already
excluded by some of the exclusion criteria; Alerts, which present possible missing article
metadata; and Edit option, where missing data from Alerts can be filled in to help the
researcher on the sorting stage.
The sorting of articles is performed by creating exclusion criteria and can use the
tool features developed for this task. The first is the Remove Duplicates function, which
looks for articles with the same DOI number and removes them. The exclusion is
based on a scientific relevance ranking provided by the tool, excluding the duplicated
article that comes from the lower-ranked database.
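As an illustration of how such DOI-based de-duplication might work, the sketch below keeps, for each DOI, the copy coming from the highest-ranked database and excludes the rest; the table, column, and status names are hypothetical, since the actual Metasearch schema is not published.

```php
<?php
// Illustrative sketch only: table, column, and status names are hypothetical.
// Assumes $pdo is an open PDO connection and $reviewId identifies the review.
$stmt = $pdo->prepare(
    "SELECT id, doi
       FROM articles
      WHERE review_id = :review AND doi <> ''
   ORDER BY doi, source_rank ASC"   // source_rank 1 = most relevant database
);
$stmt->execute([':review' => $reviewId]);

$keptDois = [];
$exclude  = $pdo->prepare("UPDATE articles SET status = 'excluded' WHERE id = :id");

foreach ($stmt as $row) {
    if (isset($keptDois[$row['doi']])) {
        // a copy from a better-ranked database was already kept; exclude this one
        $exclude->execute([':id' => $row['id']]);
    } else {
        $keptDois[$row['doi']] = true;
    }
}
```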
Custom exclusion criteria can be created using two types of features, Manual and
Smart Criteria. The Manual option shows the user all the articles that have not been
excluded yet and lets them select which ones to remove from the review, creating an
exclusion criterion just like a regular sorting.
On the other hand, using the Smart Criteria function, the user can create smart filters
to sort all the remaining articles using countless combinations of fields and rules (see
Fig. 3). The filters can be created for the fields Title, Abstract, Authors, Keywords, Year,
Tags, and Content Type, using operators such as CONTAIN, NOT CONTAIN, START
WITH, NOT START WITH, END WITH, NOT END WITH, IS and IS NOT, also
GREATER THAN, LOWER THAN, EQUAL TO for the Year field. Another detail about
the filters is that they can be applied if ALL or ANY of the conditions are TRUE/FALSE,
allowing the user to create Smart Criteria with many possibilities. After the articles are
filtered using all conditions created by the user, the smart exclude criteria are created
automatically with one click, saving precious time selecting articles based on conditions.

Fig. 3. Smart criteria creation view.
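To illustrate how a Smart Criterion might be applied, the sketch below translates a list of (field, operator, value) rules into a single SQL update; the schema, helper names, and the subset of operators shown are hypothetical, since the real implementation is not published, and the field names are assumed to come from a fixed whitelist.

```php
<?php
// Illustrative sketch: hypothetical schema and helper names; only a subset of the
// operators listed above is shown, and $field is assumed to come from a whitelist.
function conditionToSql(string $field, string $op, string $value, PDO $pdo): string {
    switch ($op) {
        case 'CONTAIN':      return "$field LIKE "     . $pdo->quote("%$value%");
        case 'NOT CONTAIN':  return "$field NOT LIKE " . $pdo->quote("%$value%");
        case 'START WITH':   return "$field LIKE "     . $pdo->quote("$value%");
        case 'END WITH':     return "$field LIKE "     . $pdo->quote("%$value");
        case 'IS':           return "$field = "        . $pdo->quote($value);
        case 'IS NOT':       return "$field <> "       . $pdo->quote($value);
        case 'GREATER THAN': return "$field > "        . $pdo->quote($value);
        case 'LOWER THAN':   return "$field < "        . $pdo->quote($value);
        default: throw new InvalidArgumentException("Unsupported operator: $op");
    }
}

// Apply one Smart Criterion: combine rules with ALL (AND) or ANY (OR) and exclude matches.
function applySmartCriterion(PDO $pdo, int $reviewId, int $criterionId,
                             array $rules, string $mode = 'ALL'): int {
    $glue  = ($mode === 'ALL') ? ' AND ' : ' OR ';
    $parts = array_map(
        fn($r) => conditionToSql($r['field'], $r['op'], $r['value'], $pdo),
        $rules
    );
    $sql = "UPDATE articles SET status = 'excluded', criterion_id = $criterionId"
         . " WHERE review_id = $reviewId AND status = 'included'"
         . " AND (" . implode($glue, $parts) . ")";
    return $pdo->exec($sql);   // number of articles excluded by this criterion
}
```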

Other methods to create Smart Criteria are using the two shortcuts on the main
systematic review screen. The user can create criteria for a specific Tag or Content Type
directly with just one click.
After all criteria are created and undesirable articles excluded from the systematic
review, everything can be checked by viewing the result on the main screen (see Fig. 4).
It presents the total number of articles retrieved, excluded, and included, as well as the
exclusion criteria list ordered by creation sequence, with the number of exclusions that
occurred under each criterion.
Completing the main screen functionality, there is a specific feature where it is possi-
ble to inactivate/activate the exclude criteria. This feature can be helpful in understanding
and testing the criteria in an easy and aggregated view of results.

5.3 Collaboration

Thinking about multi-author systematic reviews and also about reviewers’ validation and
other collaboration possibilities, Metasearch was designed to allow the systematic review
owner/author to give other users access to it with three possible roles: Owner/Author,
who can collaborate with the same access rights as the original owner/author; Reviewer,
who can leave comments on criteria, article metadata, or the review itself; and finally,
Viewer, which allows viewing the work done without performing any action.

Fig. 4. Review main screen with main information regarding the systematic review

5.4 Export and Timeline Link


After the user has finished excluding articles and reading the remaining ones, Metasearch
has two options to share the results. First, the articles list can be exported in CSV (comma
separated values) format with all metadata from each article. A second option is to create
a sharable link with a password to access the Timeline View with all systematic review
articles and the exclusion criteria used to remove unwanted articles (with the possibility
to expand and see which articles were excluded on each criterion). This second option
can be shared with other researchers or paper reviewers to validate the method used to
perform the systematic review.

6 Discussion
The advantages of the Metasearch tool became apparent after the beta version was
released and the first systematic review was performed as a test. It was clearly helpful to
have the article databases directly integrated into the system, allowing the researcher not
only to perform the planned query for the review but also to try different queries with
Boolean logic, something that can help in finding the best keyword match during the
planning of the systematic review. This database integration also indicated a great saving
of time, retrieving and importing the metadata of almost a thousand articles in less than
half an hour.
The Duplicate Removal feature was very helpful, finding duplicated articles based
on DOI, but it also suggested a possible future improvement for the Metasearch tool:
looking for duplicated articles not only by DOI number but also by identical titles, or
perhaps by creating a similarity index that can help in cases where an article receives
small updates between different publishers.
Regarding Collaboration and Reviewers, the tool was successful in allowing
collaboration between multiple authors, who can work together on the same review,
sharing tag categorization and comments and reviewing their work; on the other hand,
reviewers such as a professor advisor can follow the systematic review progress and
share comments with the author.
One of the most interesting features turned out to be the Validation Process, where the
tool user can not only export all the metadata and input data to a CSV file but also, and
most usefully, create a password-protected share link through which paper reviewers, or
anyone else the user chooses, can access the systematic review in a simple timeline
screen presenting all articles and their status (included or excluded) and all exclusion
criteria, each with its excluded articles and the rule used to filter them (in the case of
smart criteria), which can be very helpful for sharing the work done.
Automated criteria creation proved to be a great ally for researchers when applying the planned exclusion criteria of the systematic review, performing this process quickly with manual criteria and even faster with smart criteria, given the countless combinations of filters possible.
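As an illustration of how an exclusion criterion could be applied automatically, the sketch below encodes a criterion as a simple predicate over article metadata and splits the article list into kept and excluded sets; the specific fields and the rule shown are hypothetical examples, not criteria taken from the tool.

    def smart_criterion(article):
        """Exclude an article if it is older than 2015 or its abstract lacks the planned keyword."""
        too_old = article.get("year", 0) < 2015
        off_topic = "systematic review" not in article.get("abstract", "").lower()
        return too_old or off_topic

    def apply_criterion(articles):
        kept = [a for a in articles if not smart_criterion(a)]
        excluded = [a for a in articles if smart_criterion(a)]
        return kept, excluded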
Finally, as a free beta version built only with open-source technologies, the tool has a very low cost to keep online and in use.
After some tests and validations, the tool also showed that it can be used in a second way: as a simple search engine. This use focuses on prospecting information in a given area or subarea. However, unlike regular search engines, the search can be performed across many databases simultaneously and the smart criteria can be used to filter the results and find the desired articles.
Some limitations of this work are that the current version is a beta and not open to the public; that not all available databases are integrated, only those mentioned before; and that the tool currently works only in English, offering no support for other languages at this point.
Future work will require the inclusion of new databases, allowing a comprehensive set of journals to be used in the systematic reviews; this will also allow researchers from different areas of science to use the platform. Similarly, importing articles from the user's local files in different file formats, such as CSV and XLSX, will be helpful. Another possible development already mentioned is the improvement of the Duplicate Removal tool, allowing it to work with better accuracy. Finally, one piece of future work already being planned focuses on comparing the performance of Metasearch against other tools and against conducting a review without any tool.

7 Conclusion
The Metasearch tool was developed following the scientific method of performing systematic reviews, always focusing on saving time and helping researchers with smart tools and a friendly interface. The original aspect of this work was to provide integrated features to be used in the development of systematic reviews or in regular article searches. This was achieved by combining database integration, duplicate removal, collaboration and reviewers, a validation process, automated criteria creation, and low cost, innovating specifically with the validation of the systematic review by reviewers through a timeline feature and with automatic criteria creation using smart filters. With all these features, the described solution provides the desired tool in an integrated way.

Preliminary Study on e-Collaboration Readiness
and Community of Inquiry Presences
in a Higher Educational Institution

Alimatu–Saadia Yussiff1(B) , Abdul-Lateef Yussiff1 , Franklin Kome Amoo1 ,


and Wan Fatimah Wan Ahmad2
1 Department of Computer Science and Information Technology, University of Cape Coast,
Cape Coast, Ghana
{asyussiff,ayussiff,amoo.franklin}@ucc.edu.gh
2 Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Perak,
Malaysia
[email protected]

Abstract. The current nature of large class sizes in Higher Educational Institutions (HEI), the recent COVID-19 pandemic, and, more importantly, the fact that lecturer-student relationships mostly terminate right after the class session have presented educators with many new challenges. Based on these, educators have found it imperative to change the pedagogical and didactical approaches to teaching by integrating Information and Communication Technologies (ICT) into the classroom. e-Collaboration is one of the pedagogical approaches that enables two or more people to work together using technology to help achieve a goal. This study introduced students to e-collaboration platforms via a Learning Management System (LMS) and Piazza. The present research focuses on finding out the experience and readiness of e-collaboration in a HEI. Both qualitative and quantitative approaches were employed in the study. Results indicated that the majority of participants in the study have a positive attitude towards e-collaboration, that their attitudes vary significantly with gender, and that there are positive correlations among the Community of Inquiry (CoI) constructs at r = 0.75, n = 75, p = 0.01. In addition, the majority of participants would like to use e-collaboration in the future at M = 3.95. Thus, both males and females have positive attitudes towards e-collaboration at M = 3.82, SD = 0.74. The research brings to light the usefulness and the possibilities of e-collaboration for effective teaching and learning in HEI.

Keywords: Electronic collaboration · e-Collaboration readiness · Community of


inquiry · Social presence · Teaching presence · Cognitive Presence · CoI ·
COVID-19 · University of Cape Coast · Department of Computer Science

1 Introduction
1.1 Background
The emerging concept of the digital-native or student-K [1–5] and the continuous use of the concept have awakened researchers to the nature of students in Higher Educational
Institutions (HEI). Prensky [6] coined the term digital-native in 2001. The term has
since then gained popularity among researchers and educators. Other concepts used as
synonyms include Generation-Y by Stanat [7], and net-generation by Tapscott [8].
Current students are described as digital natives or student-K because they are a generation born into an era proliferated by smartphones, computers, mobile digital devices, video games, cell phones, laptops, and video cameras. These electronic devices, among others, have become part of their lives. They enjoy using these devices for longer hours than they spend sitting in a lecture theatre. The ubiquitous nature of these devices has therefore given today's college students different capabilities for thinking and processing information compared to their predecessors [9]. According to Prensky [2, 10], digital natives are fast in receiving information, prefer graphics to text-based content, and favour multitasking, networking, parallel processing and frequent rewards.
The nature of current students and their enthusiasm for the adoption and use of technology therefore require educators to change their teaching methodology, the design of learning content, and the integration of appropriate technologies into teaching and learning activities. According to Popescu and Cioiu [11], traditional teaching methods should be adapted to accommodate the learning needs of the new generation of digital native students by integrating Web 2.0 tools to support social learning in educational settings. Thus, e-collaborative teaching and learning approaches would pave the way for students to learn socially, collaboratively and in a more engaging way.
With the advent of Information and Communication Technology (ICT) and social
software advancements, educators have started integrating technology into the class-
room. ICT, including social software, has opened up opportunities for students and lecturers without the limitations of time, location and space. More importantly, students can change the way they interact with communities of inquiry in online environments. ICT can improve the quality of collaboration and facilitate social interaction between teachers and students. As a result, there has been a move towards using technology to motivate people to develop interactions and connections with individuals. ICT has also opened up opportunities for experimenting with various teaching and learning methods online.
e-Collaboration is one of the pedagogical approaches that enable two or more people
to work together using technology to help achieve a goal [12–14]. It is therefore an
extension of the regular class online where both students and lecturers can have in-depth
discussions that may have eluded them during the regular class sessions. Accordingly, the e-collaboration effort has risen as one of the most encouraging approaches to improving learning. e-Collaboration is the right environment in which students assume a key role in the learning process. It provides an interactive, simulation-based, innovative, and comprehensive learning experience. In addition, e-collaboration improves the effectiveness and performance of the learning experience [15, 16].
The purpose of e-collaboration practices is to use social technologies to promote discussions and communication among groups and peers, especially in higher educational institutions. It trains students for the demands of the present global industry, where staff participating in group projects are geographically distributed across time and space. Currently, there is an assortment of tools (such as PIAZZA, Facebook, Zoom, Google Meet, Schoology, Moodle, etc.) to facilitate e-collaboration on specialized networks using a variety of multimedia and hypermedia applications.
The current nature of large class sizes in higher educational institutions in Ghana, the recent COVID-19 pandemic, and, more importantly, the fact that lecturer-student relationships are most often terminated right after the class session have, among other factors, presented educators with many new challenges. Based on these, educators are therefore challenged to change the way they teach, to adapt the curriculum to meet new standards, to change the pedagogy and didactics of online teaching, to change assessment practices, and to integrate technology into teaching and learning processes. The Department of Computer Science and Information Technology at the University of Cape Coast (DCSIT-UCC) has introduced students to e-collaboration platforms, the UCC Learning Management System and Piazza (UCC-LMS-Piazza).
The present research focuses on investigating the readiness and experience of e-
collaboration in higher education institution, the case of DCSIT-UCC. The four ques-
tions (RQ) that guided this research are (RQ1) What are students’ attitudes toward e-
collaboration through piazza in addition to face-to-face meetings in the class? (RQ2) Are
students more likely to communicate effectively and ask for help on the e-collaboration
platform than in the traditional classroom? (RQ3) What recommendations do students
have for future use of e-collaboration? (RQ4) What are the relationships among the key
constructs of the instrument?
Thus, in this paper, we present the background information, theoretical and related
work, methodology, results and discussion, and finally the conclusion, limitations and
future works.

2 Theoretical Foundation
2.1 e-Collaboration Teaching and Learning
e-Collaboration is an educational approach that engages students, or students and teachers together, in a joint academic effort online using electronic devices [17, 18]. It is a teaching method involving groups of students collaborating, with the aid of electronic devices, to solve a problem, complete an assignment, or produce a product. In a community of inquiry, the key to e-collaboration is the principle of social dependence, in which participants communicate freely and contribute to the achievement of goals [18–20], thereby shifting the teaching model from a teacher-centered to a student-centered one. In the teacher-centered learning model, learners play a passive role, receiving the knowledge communicated to them. Nonetheless, e-collaboration encourages greater success across all age groups and subject areas than other forms of individualistic teacher-centered learning, and it enables students and their lecturers to collaborate more intensively on topics during or after class sessions.
Compared to the conventional learning approach, an e-collaboration learning environment incorporates a social constructivist teaching and learning approach, self-accountability, and personal responsibility. More importantly, e-collaborative teaching and learning is more beneficial than traditional (teacher-centered) learning, and it has a clear impact on the success and performance of students inside and outside the lecture halls. e-Collaboration work not only affects the performance of students but also develops students' skills such as learning and communication, interacting with others, and working effectively in a group (Béres and Turcsányi-Szabó [15]). Group participants use a variety of techniques to solve problems and understand the needs of tasks, which increases the retention of all members of the group. “Students on the set can recognize their abilities, strengths, and weaknesses when performing the required tasks” [21]. In this study, we sought to investigate the benefits and readiness of e-collaboration in a HEI by experimenting with the didactics in six courses.

2.2 Didactics for e-Collaboration

Didactics is defined as the “art of teaching, or teaching methods” [22]. Didactics, as


opposed to open learning or experiential learning (where people can learn by themselves
in an unstructured form), is a teaching or instructional method that follows a structured
scientific approach or educational style to impart knowledge or skill to the learner. It
includes various structured teaching or instructional styles, strategies or activities. It
encompasses the activities of educating or instructing or the various activities that are
used to impart knowledge. Thus, “an electronic learning environment is a didactics tool
to accomplish some pedagogic goal” [22].
According to Reigeluth [23] the main purpose of instructional theory, consisting
of different forms of didactics is to “help people learn better” [23]. Similarly, Béres
and Turcsányi-Szabó [15] posit that the incorporation of suitable learning and teaching
methods into e-learning would result in effective collaboration. In addition, easy access
to content, the context of the learning environment, as well as guidance and feedback
mechanisms have to be considered in designing the teaching and learning methods.
In this study, we specifically utilized constructivist didactics, that is, constructivist teaching methods that follow a structured scientific approach or educational style to impart knowledge or skill to the learner. This method of teaching assumes that learners have prior knowledge/schemas and that learning takes place through the active participation of learners rather than through passively receiving knowledge. Active participation
will motivate learners, enable them to think critically, and make them independent to
take control of their own learning [24, 25]. Through collaboration, they will experience
others’ viewpoints, negotiate, discuss, discover, analyze and make a conclusive decision.
They will also develop meta-cognition (the ability to learn how to learn) resulting in a
motivated and independent individual. Constructivist didactics also defines the roles of
educators and learners.
One drawback of this approach is the belief that extroverted students in a group may dominate group discussions and conclusions [26]. However, Mayer [27] suggested
the use of guided discovery, which is a mix of direct instruction and hands-on activity,
rather than pure discovery, in order to derive the full advantage of constructivist learning.
Other drawbacks include “misleading or contradicting known findings [28]”. This can be
minimized or eradicated with adequate planning, implementation strategies and finally
through content validation. Hence, these drawbacks have to be observed and mitigated.

2.3 Community of Inquiry

A model of the online collaborative learning process is the Community of Inquiry (CoI)
model originally developed by [29]. The CoI model is a social constructivist model of learning processes in online and blended environments which emphasizes that effective online learning, especially in higher education, needs a community of inquiry [30]. The CoI model is a complex model of key elements that are essential for the promotion of social development and research in all educational institutions. Since its development, the CoI model has introduced some key research methods and guidelines for online learning. Overall, the CoI model has become the most widely referenced and the leading theory for the study and design of effective e-learning.
According to Garrison (2016), in order to create a community of learners, computer-based conferencing should incorporate text-based, asynchronous discussions to connect learners to one another. This is different from the traditional individualistic distance teaching and learning process.
As shown in Fig. 1, the three key elements of the Community of Inquiry are Cog-
nitive Presence (CP), Social Presence (SP), and Teaching Presence (TP). Figure 1 also illustrates how the intersection among the three elements of the CoI model leads to deep and meaningful learning outcomes. TP is defined as “the design, facilitation, and
direction of cognitive and social processes to realize personally meaningful and educa-
tionally worthwhile learning outcomes” [31]. In online learning settings, TP involves the
(1) instructional design and organization of the course and activities, (2) facilitation of
the course and activities, and (3) facilitating or directing discussions to achieve desired
learning outcomes [18].

Fig. 1. Community of inquiry model (T. Anderson et al., 2000)

CP is the extent to which learners are able to construct and confirm meaning through sustained reflection and discourse. The ultimate goal of the Community of Inquiry is to build a solid foundation of social presence and teaching presence to stimulate cognitive presence in a course. It is the ability of collaborators to construct knowledge through student-to-facilitator, student-to-student and faculty interactions. SP refers to the ability to perceive others in an online environment as “real” and to project oneself as a real person. Social presence involves open communication, affective expression, and group cohesion [31]. It is about feeling like ‘real people’ through social communication [32]. Social presence directly affects community development and online course collaboration, and it is an important part of the CoI framework. [33] defined social presence as “the degree to which a person is perceived as ‘real’ in mediated communication” and defined three categories of indicators of social presence:

1. Affective expression where students share personal expressions of emotions,


feelings, beliefs, and values;
2. Open communication, where students form and maintain a sense of group commit-
ment;
3. Group cohesion, where students discuss common intellectual activities and responsibilities.

Overall, the CoI model suggests a meaningful collaborative learning experience when CP, SP and TP are developed in a balanced way [18]. Communication in a trusting environment contributes to the construction of knowledge by developing interpersonal relationships [34]. Current research has been examining the relationships among the three main constructs of the CoI model, learning outcomes and satisfaction.
This research will contribute to the body of knowledge by investigating the relationship among the CoI constructs and the overall attitudes of students towards e-collaboration. To achieve this goal, the researchers followed the methodology outlined below.

3 Methodology

3.1 Population

The target population for this study comprised University of Cape Coast, Department of Computer Science and Information Technology (UCC-DCSIT) students ranging from level 200 to 400 and enrolled in the following courses: Web Technologies I, Multimedia Computing, Software Development Practices, Web Technologies II, Human Computer Interface, and System Security and Administration. The research was a purposive study of students who registered for courses in DCSIT. It was observed that each student was enrolled in at least two courses, giving a total of N = 75 participants.

3.2 Instrumentation
UCC-LMS was used for communication and posting of teaching and learning materials.
Piazza was used for posting teaching and learning materials, group formation and e-
collaboration. In addition, a questionnaire was design to collect data on the purpose
of this study. The questionnaire consists of four parts. The first was on respondent
demographic information consisting of 6 items; the second was on general attitudinal
questions consisting of 12 items; the third was on CoI standardized questionnaire by
792 A.-S. Yussiff et al.

[35] consisting of 34 items; and the fourth was on open questions on the effect of using
e-learning in higher education consisting of 1 item. Overall, the questionnaire for the
study comprised of 52 items. A reliability coefficient of the items is; α = 0.958, which
is more than 0.70 indicating that the questionnaire used in the study is reliable.
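For readers who wish to reproduce such a reliability check outside SPSS, the following is a minimal sketch of Cronbach's alpha; the randomly generated responses are placeholders for illustration only, not the study's data.

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_respondents x n_items) matrix of item scores."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]                          # number of items (52 in this study)
        item_var = items.var(axis=0, ddof=1).sum()  # sum of item variances
        total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
        return (k / (k - 1)) * (1 - item_var / total_var)

    rng = np.random.default_rng(0)
    responses = rng.integers(1, 6, size=(75, 52))   # illustrative 5-point Likert data
    print(round(cronbach_alpha(responses), 3))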

3.3 Procedure
At the beginning of the first and second semesters of the 2019–2020 academic year, students were introduced to the concepts of e-collaboration, the course outline, textbooks and tools (UCC-LMS and PIAZZA) in the first week. In addition, teaching and learning materials and activities were posted on both UCC-LMS and PIAZZA. This was followed by forming groups of collaborators consisting of 3–5 members in the second week. From the third week, and on a weekly basis, collaboration activities in the form of problem-based tasks and questions were posted and moderated by the facilitator on the Piazza system. The system notified students whenever something was posted for participation. At the end of six weeks of e-collaboration, the research instrument (Appendix I) was designed in Google Forms and students were invited to participate in the survey. The following presents the data analysis criteria.

3.4 Data Analysis

Responses from the study were coded and analyzed using SPSS 16.0. First, a reliability analysis was calculated, followed by descriptive statistics such as the mean, standard deviation and frequency. Finally, we derived the scatterplots and Pearson product-moment correlation coefficients. Results from the analysis are presented and discussed below.

4 Results and Discussions


4.1 Piazza e-Collaboration Platform

Piazza is an e-collaboration platform that supports collaborators in various ways, including:

• Save time and help students learn using the power of community
• Wiki style format enables collaboration in a single space
• Features LaTeX editor, highlighted syntax and code blocking
• Questions and posts needing immediate action are highlighted
• Instructors endorse answers to keep the class on track
• Anonymous posting encourages every student to participate
• Highly customizable online polls
• Integrates with every major LMS

Figure 2 below presents the Piazza home page.



Fig. 2. Piazza homepage

Figure 3 presents the interface of the CSC312 class. It shows the class at a glance, with an indication of 27 total posts and 317 total contributions.

Fig. 3. CSC312 e-collaboration class at a glance



4.2 Reliability Analysis Results


In order to examine the internal consistency of our instrument, a reliability analysis was run on the data. The result was then interpreted using the scale from [36–38]. The reliability result is shown in Fig. 4, with a Cronbach's alpha coefficient of α = 0.958, which is above the acceptable threshold of 0.7. This demonstrates that the internal consistency of the 52-item scale was excellent.

Fig. 4. Reliability analysis result

4.3 Descriptive Statistics of the Participants


Figure 5 shows that there were seventy-five (N = 75) participants in the study, made up of 62 males (82.7%) and 13 females (17.3%). In addition, the participants ranged from level 200 (44%) and level 300 (30.7%) to level 400 (25.3%).

Fig. 5. Gender and level of education

4.4 Students Attitudes Towards Using Piazza for e-Collaboration


The standard deviations and mean scores of students' attitudes towards the use of Piazza for e-collaboration are shown in Fig. 6. The highest rated statement was “I would like to be able to easily view course materials (syllabus, notes, assignment) on piazza” (M = 4.28; SD = .763). This was followed by the statement, “I believe that using Piazza to learn will increase the flexibility to learn inside and outside the classroom” (M = 4.08; SD = .928). The majority also believed that “implementing and using piazza as part of teaching and learning tools will make the educational process easier and more enjoyable” (M = 4.00; SD = .944). The statement, “I would like my lecturer to integrate piazza in my class in addition to face-to-face meetings in the class”, scored (M = 3.93; SD = 1.018). The overall mean score and standard deviation were M = 3.82 and SD = 0.739, respectively. This demonstrates that the majority of participants in the study from DCSIT-UCC have an overall positive attitude towards e-collaboration and that their attitudes vary significantly with gender. These findings help to answer Research Question (RQ1), “What are students' attitudes toward e-collaboration through piazza in addition to face-to-face meetings in the class?”.

Fig. 6. Students attitudes towards using piazza for e-collaboration

In addition, Fig. 7 illustrates a grouped bar chart of the perceived attitude of students towards e-collaboration by gender and level. The output suggests that females had higher perceived overall attitude scores than males in level 200 and level 300. However, we observed the opposite trend in level 400, where males had higher perceived overall attitude scores than their female counterparts. Overall, since the difference between the two genders is small at every level, it can be concluded that both males and females have a positive attitude towards e-collaboration.
RQ2, “Are students more likely to communicate effectively and ask for help on the
e-collaboration platform than in the traditional classroom?”.
Two survey questions from Fig. 6 help to address this issue. The first is “I believe that implementing piazza in the educational process will increase communication between lecturer and student” (M = 3.87; SD = 1.82). The second is “I would be more likely to ask for help if I could communicate using piazza” (M = 3.87; SD = .977). The mean scores on the five-point scale are high, which demonstrates that students are more likely to communicate effectively, participate in discussion and ask for help on the e-collaboration platform than in the traditional classroom.

Fig. 7. Bar chart of overall attitude of students toward e-collaboration

4.5 Relationship Results

The scatterplot in Fig. 8, and the correlation results in Fig. 9, respectively illustrate the
relationship results.

Scatterplot Results
In order to explore the relationship between two continuous variables and to determine whether they are linearly or curvilinearly related, it is important to generate a scatterplot before calculating correlations [36, 38]. This is because only linearly related variables qualify for correlation analysis.

Fig. 8. Scatterplot of key constructs



The scatterplot also helps to identify positively or negatively related variables as well as the strength of the relationships. Figure 8 illustrates the scatterplot results for the main constructs of the study (overall attitude, teaching presence, cognitive presence and social presence). The results indicate that there was a significant positive relationship among the constructs.

Correlation Results
Since the scatterplot does not give us definite answer, we need to follow it up with
Pearson product-moment correlation coefficient and the output of our analysis is shown
in Fig. 9. A correlation is statistically significant if its “Sig. (2-tailed)” < 0.05. The
result in Fig. 9 indicated that there was a significant positive association among all the
constructs. The highest correlation result was between cognitive presence and teaching
presence at r = 0.745, n = 75, p = 0.01. On the other hand, the lowest correlation
result was found between cognitive presence and overall Attitude at r = 0.601, n = 75,
p = 0.01. The scatterplot in Fig. 8 further summarizes the results. Conclusively, this
demonstrated that there was a strong, positive correlation among all the constructs. The
result also illustrated that an increase in one construct leads to an increase in the other. A
demonstration of strong relationships among the constructs. The results of the correlation
therefore, give answer to research question (RQ4), “what are the relationships among
the key constructs of the instrument?”.

Fig. 9. Correlation results
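A minimal sketch of the same Pearson analysis is given below, assuming per-participant construct scores (e.g. the mean of the relevant items) are available as arrays; the simulated numbers are illustrative and do not reproduce the study's data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    teaching = rng.normal(4.0, 0.5, 75)                  # hypothetical teaching presence scores
    cognitive = 0.7 * teaching + rng.normal(0, 0.3, 75)  # hypothetical cognitive presence scores

    r, p = stats.pearsonr(cognitive, teaching)
    print(f"r = {r:.3f}, n = {teaching.size}, p = {p:.3g}")  # significant if p < 0.05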

4.6 Recommendation for Future Use of e-Collaboration


In Fig. 6, the mean score on the statement, “I would like my lecturer to integrate piazza in my class in addition to face-to-face meetings in the class” was M = 3.93 on a 5-point scale. In addition, the mean score on the statement, “I am looking forward to use piazza in my other courses” was M = 3.68 on a 5-point scale. This demonstrates that the majority of students would like e-collaboration teaching and learning in future courses.

In addition, Table 1 presents both positive and negative recommendations for future
use of e-collaboration through Piazza and other forms of social media.

Table 1. Recommendation for Future use of e-Collaboration through LMS and Piazza

Positive Recommendation
• When used wisely, helps in delivering information
• It’s a very nice initiative
• The use of piazza helped me to get more insight into my course of study
• Social media has made learning convenient and efficient and therefore enhanced our learning capabilities. The hypothetical student can reach out to course materials regardless of geographical location. This would have been inconceivable at the turn of the 19th century
• Social media helps student far ways interact
• Zoom help my group members to have distance discussions
• Wikipedia helped my learning a lot
• A picture is worth a thousand words they say. Most read articles on Facebook and this improved my English writing abilities way back in high school and made me abreast with trending issues
• I believe online social media learning is a thing of the future and has come to stay. It’s, therefore, a good idea to allow lecturers and students to interact using online resources
• I believe that online/virtual schooling should be heavily integrated into our current education system
• Easily contacted course mates outside the classroom
• I think piazza should include videos
• I think piazza should include videos on lectures
• We should be introduced to the piazza at convenient times
• Social media has helped a lot…even in this COVID-19 pandemic…various apps where lectures use to communicate with their students and staff…social media has helped in higher education a lot
• I think it should be practical
• I think it would help improve the learning experience
• It was quite okay using the piazza

Negative Recommendation
• Inadequate Wi-Fi on campus
• Data is expensive
• It is distracting and derails students from being productive
• The use of social media in higher education is costly. Also, internet connectivity is most of the time very bad
• Using social media in higher education is helpful. The only problem is when students don’t have the proper internet connection and the necessary device to participate
• If we want to incorporate social media into our learning we make sure if we do it well and right and not as if it’s a necessity for the student else he or she will fail
• It is interactive but mostly absorbs much of student time
• Should not be implemented in our classroom

Therefore, the results from the open-ended question in Table 1 support the quantitative results in Fig. 9. This affirms the answer to RQ3, “What recommendations do students have for future use of e-collaboration?”.

5 Conclusions, Research Contributions, Limitations and Future


Works
5.1 Conclusion
This study explored students' overall attitude and its relationships with the Community of Inquiry (CoI) presences in the use of UCC-LMS-Piazza in an e-collaboration environment.

The e-collaboration learning experience was implemented through a blended learning approach with N = 75 students. Both qualitative and quantitative data were collected to support this research. The reliability result was α = 0.958, indicating that the internal consistency of the 52 items in the instrument was excellent.
Overall, the results indicated that the majority of the participants in the study from DCSIT-UCC have a positive attitude towards e-collaboration, that their attitudes vary significantly with gender, and that there are positive correlations among the CoI constructs at r = 0.75, n = 75, p = 0.01, as shown in Fig. 7. In addition, the majority of participants would like to use e-collaboration in the future at M = 3.95. Thus, both males and females have positive attitudes towards e-collaboration at M = 3.82 and SD = 0.74.
This research has clearly shown that collaboration among students and lecturers can
be a powerful means for examining and improving classroom practice. It also helps
to support creative thinking, problem solving, risk-taking, and innovation (as shown in
Table 1).

5.2 Research Contributions


The research brings to light the usefulness and the possibilities of e-collaboration for effective teaching and learning in DCSIT-UCC. It opens opportunities for more research in the area of readiness and attitudes of students toward e-collaboration in higher education institutions. The findings also serve as reference material for interested individuals who want to acquire knowledge about enhancing teaching and learning with e-collaboration. Future research shall be extended to other courses and programmes.

5.3 Limitations
As we proceeded with this research, some of the challenges encountered were technological in nature. Some students experienced slow response times on the platform, making it difficult to engage effectively.
In addition, the sample population for this research was restricted to the Department of Computer Science and Information Technology, University of Cape Coast. A larger sample population would ultimately have increased the number of responses we received.
Finally, poor internet connectivity was an issue faced by some students, reducing their participation on the platform or preventing it altogether.

5.4 Future Works


The findings of the present study offer the following suggestions for improving the quality of e-collaboration in future research:

1. More research is required to explore the attitudes and comprehension of students


and lecturers in delivering material to e-collaboration environments.
2. Identify ways to promote e-collaboration and shape new learning dimensions: such as
social networking, a community of inquiry, practice group, learning, and technology
network and adaptive, personal learning methods.

3. More work is needed to determine the long-term sustainability of electronic


collaboration and how it can adapt to future technological and educational changes.

References
1. Prensky, M.: Digital natives, digital immigrants part 1. On the horizon 9(5), 1–6 (2001)
2. Prensky, M.: Don’t Bother Me, Mom, I’m Learning!: How Computer and Video Games are
Preparing Your Kids for 21st Century Success and how You Can Help! Paragon House, New
York (2006)
3. Bennett, S., Maton, K., Kervin, L.: The ‘digital natives’ debate: a critical review of the
evidence. Br. J. Edu. Technol. 39(5), 775–786 (2008)
4. Brown, C., Czerniewicz, L.: Debunking the ‘digital native’: beyond digital apartheid, towards
digital democracy. J. Comput. Assist. Learn. 26(5), 357–369 (2010)
5. Teo, T., Kabakçı Yurdakul, I., Ursavaş, Ö.F.: Exploring the digital natives among pre-service
teachers in Turkey: a cross-cultural validation of the digital native assessment scale. Interactive
Learning Environments, (ahead-of-print): pp. 1–14 (2014)
6. Prensky, M.: Digital natives, digital immigrants part 2: Do they really think differently? On
the horizon (2001)
7. Stanat, M.: China’s generation Y: Understanding the future leaders of the world’s next
superpower. Homa & Sekey Books (2006)
8. Tapscott, D.: Grown up digital: How the net generation is changing your world HC. McGraw-
Hill (2008)
9. Suto, H., Sakamoto, M.: Developing an Education Material for Robot Literacy, in Human
Interface and the Management of Information. Information and Knowledge in Applications
and Services. Springer. pp. 99–108 (2014). https://doi.org/10.1007/978-3-319-07863-2_11
10. Prensky, M.R.: From digital natives to digital wisdom: Hopeful essays for 21st century
learning. Corwin Press (2012)
11. Popescu, E., Cioiu, D.: eMUSE-integrating Web 2.0 tools in a social learning environment.
Advances in Web-Based Learning-ICWL 2011, p. 41–50 (2011)
12. Chebil, R., Lejouad-Chaari, W., Cerri, S.A.: An e-collaboration new vision and its effects on
performance evaluation. Int. J. Computer Inf. Systems Industrial Manage. Appl. 3, 560–567
(2011)
13. Kock, N., Nosek, J.: Expanding the boundaries of e-collaboration. Professional Communica-
tion, IEEE Trans. 48(1), 1–9 (2005)
14. Razmerita, L., Kirchner, K.: Social media collaboration in the classroom: a study of group
collaboration. In: Baloian, N., Burstein, F., Ogata, H., Santoro, F., Zurita, G. (eds.) CRIWG
2014. LNCS, vol. 8658, pp. 279–286. Springer, Cham (2014). https://doi.org/10.1007/978-3-
319-10166-8_25
15. Béres, I., Turcsányi-Szabó, M.: Added value model of collaboration in higher education.
Interdisciplinary J. E-Learning and Learning Objects 6(1), 203–215 (2010)
16. Jara, C.A., et al.: Synchronous collaboration of virtual and remote laboratories. Computer
Appl. Eng. Educ. 20(1), 124–136 (2012)
17. Maddrell, J.A., Morrison, G.R., Watson, G.S.: Community of inquiry framework and learner achievement. In: Annual Meeting of the Association of Educational Communications & Technology, Jacksonville, FL. http://www.jennifermaddrell.com/papers.2011
18. Garrison, D.R., Anderson, T., Archer, W.: Critical inquiry in a text-based environment:
computer conferencing in higher education. Internet Higher Education 2(2), 87–105 (2000)
19. Garrison, D.R., Anderson, T., Archer, W.: The first decade of the community of inquiry
framework: a retrospective. Internet Higher Educ. 13(1), 5–9 (2010)
20. Akyol, Z., Garrison, D.R.: The development of a community of inquiry over time in an
online course: Understanding the progression and integration of social, cognitive and teaching
presence (2014)
21. Smith, B.L., MacGregor, J.T.: What is collaborative learning. Towards the Virtual University:
International Online Learning Perspectives, pp. 217-232 (1992).
22. Schoenmakers, S., Plugge, L., Kirschner, P.: Criteria for the evaluation of electronic learning environments. Report of MMI/Learning Lab, Maastricht (2000). http://members.home.nl/laplugge1/Plugge/publications/papers/UNESCO%20Criteria%20for%20the%20Evaluation%20of%20Electronic%20Learning.pdf [27/07/2007]
23. Reigeluth, C.M.: Instructional design theories and models: An overview of their current status.
Routledge (2013)
24. Nie, Y., Lau, S.: Differential relations of constructivist and didactic instruction to students’
cognition, motivation, and achievement. Learn. Instr. 20(5), 411–423 (2010)
25. Nie, Y., et al.: The roles of teacher efficacy in instructional innovation: its predictive relations
to constructivist and didactic instruction. Educ. Res. Policy Pract. 12(1), 67–77 (2013)
26. Corporation, E.B.: Constructivism as a Paradigm for Teaching and Learning (2004)
27. Mayer, R.E.: Should there be a three-strikes rule against pure discovery learning? Am. Psychol.
59(1), 14 (2004)
28. Anderson, J.R., Reder, L.M., Simon, H.A.: Applications and misapplications of cognitive
psychology to mathematics education. ERIC Clearinghouse (1999)
29. Anderson, T., et al.: Methodological Issues in the Content Analysis of Computer Conference
Transcripts (2000)
30. Swan, K., Garrison, D., Richardson, J.C.: A constructivist approach to online learning: The
community of inquiry framework, in Information technology and constructivism in higher
education: Progressive learning frameworks., IGI global. pp. 43–57 (2009)
31. Anderson, T., et al.: Assessing Teaching Presence in a Computer Conferencing Context (2001)
32. Garrison, D.R., Arbaugh, J.B.: Researching the community of inquiry framework: review,
issues, and future directions. Int. Higher Educ. 10(3), 157–172 (2007)
33. Gunawardena, C.N., Zittle, F.J.: Social presence as a predictor of satisfaction within a
computer-mediated conferencing environment. American J. Distance Educ. 11(3), 8–26
(1997)
34. Garrison, D.R.: Online community of inquiry review: social, cognitive, and teaching presence
issues. J. Asynchronous Learning Networks 11(1), 61–72 (2007)
35. Arbaugh, J.B., et al.: Developing a community of inquiry instrument: testing a measure of the
community of inquiry framework using a multi-institutional sample. Internet Higher Educ.
11(3–4), 133–136 (2008)
36. Pallant, J.: Survival Manual. A Step by Step Guide to Data Analysis Using SPSS, p. 4 (2011)
37. Gerber, S.B., Finn, K.V.: Using SPSS for Windows: Data analysis and graphics. Springer
(2013). https://doi.org/10.1007/0-387-27604-1
38. Green, S.B., Salkind, N.J.: Using SPSS for Windows and Macintosh. Pearson Upper Saddle
River, NH (2013)
Utilising Gamification and Virtual Environments
to Present Digitally Enhanced Advanced
Services (DEAS) for the Financial Sector

S. Khan1(B) , V. Charissis1 , and D. K. Harrison2


1 Virtual Reality and Simulation Laboratory, School of Computing, Engineering and Built
Environment, Glasgow Caledonian University, Glasgow, UK
{Soheeb.khan,Vassilis.Charissis}@gcu.ac.uk
2 Department of Mechanical Engineering, School of Computing, Engineering and Built
Environment, Glasgow Caledonian University, Glasgow, UK
[email protected]

Abstract. Servitization offers a fresh opportunity for manufacturing and finance


companies to incorporate additional services to the main product, as part of an
extended maintenance scheme or for insurance purposes. Parametric insurance products based on new technological innovations in the financial sector related to weather risk management can potentially offer preferable and better solutions for the construction industry than traditional insurance models. Such smart contract solutions/policies can potentially help customers manage weather-related risks more effectively, settle insurance claims faster and overcome some of the limitations associated with traditional insurance. However, these can be complex for customers and stakeholders to understand, limiting their ability to express their needs and to see the true value of such DEAS offers. This paper presents a prototype serious game application designed for the financial sector which enables customers to experience a variety of construction scenarios and weather conditions that affect the progress and cost of the in-game building. Their choices, risks and results are presented in a two-fold gaming system that offers both a 3D Virtual Environment and the relevant information reflecting the appropriate offers. Preliminary results from a specialists' focus group of ten users are presented and their feedback is discussed in the paper, forming a future plan for the development of similar applications for additional sectors that require Digitally Enhanced Advanced Services (DEAS).

Keywords: Gamification · Serious games · Servitization · User experience ·


Virtual environment · DEAS

1 Introduction

The servitization in the current manufacturing and building industries presents a unique
opportunity to convey additional and future services to the customers. This aims to
ensure a long term, positive customer/user experience whilst enhancing the provider’s
revenues when offering additional services as part of the initial deal [1–3]. However,
the explanation of the various benefits that these services could yield for the customers
poses a major issue due to their complexity and bespoke nature.
To this end, technological advances and the increased popularity of smartphones
and tablets provided a new conduit to present and visualise information to the general
public. In particular, the use of emerging technologies such as 3D visualisation, Virtual
and Augmented Reality (VR/AR) and serious games was employed to present complex
information, training and simulations, in diverse domains such as medical training, envi-
ronmental sciences, defence and commercial electronics, to individuals and companies
[4–7].
Transferring this know-how to services provided for the manufacturing and financial
sector was achieved through a new approach namely Digitally Enhanced Advanced
Services (DEAS) [8, 9].
With the growing number of organisations and businesses adopting innovative technologies to offer advanced services rather than just selling products, financial service providers have also had to consider the utilisation of DEAS as a long-term business model [10, 11].
To investigate this further, this project developed a prototype online 3D serious game
in close collaboration with the EHAB group - servitization designers for the building
and financial sector - focusing on enhancing the understanding and education of their
servitization offers in the aforementioned domains. The project was designed with a two-fold approach: (a) to provide a complete and realistic simulation of a building construction and (b) to seamlessly embed and explain the DEAS offer of the real-life provider (EHAB). This game design mantra offered positive outcomes in previous studies [12].
During the initial stages of the project, it was observed that servitization offers were particularly convoluted and difficult for customers to understand. One of the challenges faced by this financial servitization design team was to help end-users see the limitations imposed by the current method used for pricing risk.
The following sections will present the design and development process as well as
as the challenges of the proposed serious game. The paper will elaborate on the game
design and provide the feedback of a specialists’ focus group after extensive gameplay
testing. The paper will conclude with a tentative plan of work for the development of an
example of a building positioned in a 3D/VR segment of a real-life UK city following
previous studies that employed simulated cities and gamification [13].

2 Digital Transformation Through Gamification


Serious games/gamification is an innovative and creative way to communicate with, engage and educate users. As well as raising awareness and providing information, a serious game can potentially communicate complex information in an enjoyable and simple manner. Previous studies have shown that gamification and serious games can have positive outcomes in various other industries and in education, particularly in cases where complex information has to be conveyed [14–18].
It has yet to be determined, though, whether the above methods could be employed to enhance the communication, education and engagement of DEAS offers for financial service providers [12, 14].

Typical practices supporting servitization include paper-based documentation and


specialised offers drafted for each customer. Current attempts at digital transformation have primarily focused on transferring the paper-based material to digital-format documentation or, in some cases, to online documents with a predetermined modular structure based on the customer's choices. Yet these digital outputs have had limited success, as they still require extensive presentation by the provider's staff before they manage to convey the servitization offer to the customers [19, 20].
The financial products are significantly more complex. In this particular case, the
EHAB products are designed to customise the insurance requirements for various con-
struction projects taking into account a complex set of data related to previous weather
patterns and their effects on buildings as illustrated in Fig. 1 below.

Fig. 1. Screenshot of EHAB serious game showing weather simulation and risks.

3 Proposed 3D Serious Game


The proposed 3D serious game application aims to demystify, for the customers, the parametric insurance products offered by the EHAB group through an interactive 3D serious game that guides the user through their own choices and experiences. The offers are contrasted with typical insurance offers to highlight the benefits of the DEAS offers. Traditional construction projects tended to carry a minimum umbrella of insurance policies based on the constructor's previous experience, essentially guessing or gambling on the probability of adverse weather conditions and the potential damage or delays these could cause to the buildings. The provision of bespoke insurance plans defined by machine learning (ML) was not readily accepted or understood by typical construction companies, being deemed superfluous or unnecessary. As such, it was considered essential for the digital transformation through DEAS to explore new options, such as serious games, that could entice users and better explain the benefits of the aforementioned products.

3.1 Game Design and Virtual Environment

The game was designed as a simulation of construction processes and random weather conditions spanning multiple months, depending on the size of the construction project. In addition, the game design focused on motivating the user through gradual reward schemes embedded in the game that reflect the user's decisions [21, 22]. In particular, the game focused on EHAB's servitization offer, the Weather Ledger Platform, which was at the centre of the game; like its real-life counterpart, the offer works in-game in a similar way, providing improvements to the player and simulating the benefits.

Fig. 2. Screenshot of EHAB serious game operated on a tablet device

The selection of different options and timelines was accommodated in half of the
screen whilst the other half presented the 3D visualisation of the different construction
stages and the weather conditions as presented in Fig. 2.
The user, however, could change the screen size ratio and customize the panels to
maintain the development of more than one building as shown in Fig. 3. The customiza-
tion of the operating environment, as well as the provision of multiple choices, enhances the experience for each individual user [23, 24]. To further immerse the user in the process and in the perks that the DEAS offers, the game design introduced a key feature of the service, the Enhanced Planner. The latter helps the user to mitigate risk and make better predictions of future weather events. The benefits of utilising the Enhanced Planner were directly mapped into the game.

3.2 User Interface (UI) Design


The game’s User Interface (UI) presents all the information in a minimalistic design
approach supported by colour-coded green and grey tones to avoid user distraction from the main virtual environment. However, the time-sensitive information is presented with different colour intensity or highlighted by red coloured dots and frames as seen in Fig. 3.

Fig. 3. The game offers the option to build and monitor more than one construction site.
The options and activities panels that present the simulation facts and support the decision-making process of the user are illustrated in Fig. 4. The interface reveals the probability of weather days per month when the user updates the time risk allowance on the monthly calendar, as presented in Fig. 5. This feature gives the player the flexibility to change and update the insurance in every round based on predicted weather events, imitating the function of the real-world counterpart, which provides a consistent and accurate approach that streamlines how the user can plan for and avoid unforeseen issues.
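A minimal sketch of how such a per-round update could be derived from a monthly weather probability and the chosen time risk allowance is shown below; the probabilities, day rate and function names are assumptions made for the example rather than the platform's real pricing.

```python
def expected_weather_days(p_weather_day: float, working_days: int = 20) -> float:
    """Expected number of working days lost in a month for a given daily probability."""
    return p_weather_day * working_days

def round_premium(p_weather_day: float, time_risk_allowance_days: float,
                  day_rate: float = 3_000.0) -> float:
    """Price cover only for the expected delay that exceeds the time risk
    allowance the player has already budgeted into the plan (illustrative)."""
    uncovered = max(0.0, expected_weather_days(p_weather_day) - time_risk_allowance_days)
    return uncovered * day_rate

# A wetter month with the same allowance costs more to insure than a dry one.
print(round_premium(0.375, time_risk_allowance_days=5.0))  # 7500.0
print(round_premium(0.25, time_risk_allowance_days=5.0))   # 0.0
```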

3.3 User Experience (UX) Design


To enhance the user experience (UX) beyond typical scoring methods, it was considered useful to replicate and simulate the more abstract notion of personal or company reputation. Users' reputation would increase based on how many sites they completed, and how efficiently.
The purpose of this was to demonstrate to the user the impact of either a good or a bad reputation for a construction company. A player without a reputation faces reduced functionality and an inability to bid for the bigger and more profitable projects, even if they have the funds. This was introduced to demonstrate to the player the direct implications of not building a reputation and the negative impact of not preparing for and mitigating weather risk appropriately.
The reputation score was defined by three reputation areas, namely risk management, experience, and cost, as presented below.

• Risk Management: points awarded for insurance successes and buying site upgrades (to help insurance)
• Experience: points awarded for progressing and completing sites
• Cost: points awarded for spending funds efficiently on insurance

Fig. 4. The UI design presents a simple and colour-coded panel that guides the user through the different options and activities.

Fig. 5. User interface (UI) design that allows the user to monitor closely the weather patterns and the risks involved in contrast to the insurance services enabled.

The player's reputation points are also mapped to an overall star system, where the player earns points to increase the number of stars they have. This is implemented using a curve that determines how many stars the player should have based on the total number of points (e.g. 1 star requires 100 points, 4 stars require 900 points, etc.).
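A compact sketch of this mapping is given below; the three reputation areas follow the paper, while the intermediate point thresholds (beyond the 100-point and 900-point examples quoted above) are assumed for illustration.

```python
from bisect import bisect_right

# Thresholds for 1-5 stars; only the 100- and 900-point values come from the
# text, the remaining values are assumed for this sketch.
STAR_THRESHOLDS = [100, 300, 600, 900, 1300]

def reputation_points(risk_management: int, experience: int, cost: int) -> int:
    """Total reputation is the sum of the three reputation-area scores."""
    return risk_management + experience + cost

def stars(points: int) -> int:
    """Map total points onto a 0-5 star scale via the threshold curve."""
    return bisect_right(STAR_THRESHOLDS, points)

# 250 + 500 + 150 = 900 points, which reaches the 4-star threshold.
print(stars(reputation_points(risk_management=250, experience=500, cost=150)))
```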

4 Focus Group Game-Play Evaluation

4.1 Evaluation Method

For the evaluation of this project, the team opted to develop a Technology Acceptance Model (TAM), based on previous projects that evaluated prototype systems and technologies with particular groups of the public, as presented in Fig. 6 [25–27]. The TAM aims to identify whether, and to what extent, users will accept new technologies to complement or replace existing practices [28, 29].
This TAM followed a similar structure to previous studies on introducing emerging technologies to diverse areas, aiming to acquire users' feedback on the following user experience areas [26, 30]:

• Relatability to content (RC)
• Simulated Learning Experience (SLE)
• Perceived Usefulness (PU)
• Perceived Ease of Use (PEU)
• Attitude towards Usage (AU)
• Behavioural Intention to Use (BI)
• Functionality/Controls (FC)
• Previous Gaming Experience (PGE)
• Accessibility/Platform (AP)

The users responded to a pre-questionnaire aiming to identify their demographic information and prior knowledge of gaming, DEAS and computing overall. The users then played the game with the task of completing a medium-sized, two-floor residential building. After completing the task, the users responded to several statements corresponding to the above areas of interest.
As this was the preliminary evaluation of the application, one of the main points of interest was Perceived Usefulness (PU), which is presented in this paper as illustrated in Fig. 6. To identify the users' experience with the latter, three statements were designed, as presented in Table 1.

Fig. 6. Customised technology acceptance model (TAM)

Table 1. Perceived usefulness (PU) statements for custom TAM

Perceived Usefulness (PU)
PU1: The use of this serious game helped me understand EHAB servitisation offer
PU2: The use of this serious game simulated the servitisation offer effectively
PU3: The use of these serious games offered a better opportunity to learn about the servitization offer

4.2 Participants
The evaluation was performed by ten users (five female, five male) who were specialists in the field and formed our initial focus group for game-play testing the application. The participants volunteered to test and evaluate the game.

5 Results and Discussion


One of the main challenges for this application was to address the limitations of current risk management with the customer. The game design and the educational goals needed to educate the end user about the limitations of the current systems, as well as simulate and demonstrate the benefits of the servitization offer. The responses to the pre-test questionnaire highlighted that the customers/users did not always perceive the limitations imposed by the current system.

The users' feedback on the perceived usefulness (PU) regarding their experience and understanding of the aforementioned aim offered encouraging results, as illustrated in Fig. 7. In particular, the users responded positively, with 80% (Strongly Agree, Moderately Agree and Somewhat Agree) for PU1 (The use of this serious game helped me understand EHAB servitisation offer) and with 50% of the responses being Strongly Agree. This positive feedback was of major importance for the project, as the complexity of the financial products, and especially of the parametric insurance offers, was particularly challenging for the users.
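For clarity, the aggregation behind these percentages can be expressed as a short sketch, assuming a seven-point Likert scale from Strongly Disagree to Strongly Agree; the response list used here is only an illustrative distribution consistent with the reported PU1 figures, not the study's raw data.

```python
from collections import Counter

# Positive end of the assumed seven-point agreement scale.
AGREE = {"Somewhat Agree", "Moderately Agree", "Strongly Agree"}

def agreement_summary(responses: list[str]) -> dict[str, float]:
    """Share of positive answers and share of the top rating, as percentages."""
    counts = Counter(responses)
    n = len(responses)
    return {
        "positive_%": 100 * sum(counts[r] for r in AGREE) / n,
        "strongly_agree_%": 100 * counts["Strongly Agree"] / n,
    }

# Illustrative PU1-style distribution for ten participants (8 of 10 positive).
pu1 = ["Strongly Agree"] * 5 + ["Moderately Agree"] * 2 + ["Somewhat Agree"] + ["Neutral"] * 2
print(agreement_summary(pu1))  # {'positive_%': 80.0, 'strongly_agree_%': 50.0}
```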

Fig. 7. Participants' feedback on the statements related to perceived usefulness (PU) of the game.

The second statement, PU2 (The use of this serious game simulated the servitisation offer effectively), received similar feedback, also scoring 80%, while 20% of the users responded neutrally. Post-questionnaire discussions with the users highlighted that the simulation, although it presented the construction process correctly, should have taken into consideration additional factors that might affect the delivery time of a building. This was an interesting suggestion, and the game could be enriched with additional construction issues in following versions. However, this specific work was concerned with the adverse weather conditions that could damage and delay the construction of the building, presenting mainly the Weather Ledger Platform.
On the third statement, PU3 (The use of these serious games offered a better opportunity to learn about the servitization offer), the users responded positively with 90% (Strongly Agree, Moderately Agree and Somewhat Agree), with only 10% being neutral and no negative responses. Notably, there were no Strongly Disagree responses from the participants.
The above results highlight an initial appreciation of the potential users' tendency to utilise these technologies and methods (i.e. 3D visualisation and gamification) in the particular field. The different hypotheses that link the nine TAM constructs illustrated in Fig. 6 are not analysed in this paper, as the limited number of participants offered mainly indicative results that could not support a full TAM analysis [27–29].
However, the users’ responses related to the Perceived Usefulness (PU) of the pro-
posed system are on par with other studies that utilised gamification to support the
servitization of various products and industries or investigated the impact of gamifica-
tion on clients [12, 16, 31–34]. This confirms the initial hypothesis that the gamification
approach for the servitization of financial products will have comparable outcomes to
other studies that focus primarily on manufacturing servitization [12, 35, 36].
As this study investigates an uncommon area of servitization, one that is not directly linked to the manufacturing domain but employs gamification to present financial servitization offers, no other studies using the same methods and metrics were found. Remote similarities could be found in only one study, which customised an existing board game, namely snakes and ladders, to convey different servitization offers [16].
In addition, the customised TAM, based on previous projects which aimed to investigate the impact of emerging technologies on customers' uptake of new products, produced responses similar to the current study's results [27, 29]. The project's design, which was supported by industry collaboration and continuous feedback throughout the development, was reflected in the users' responses to the PU statements. This established a baseline of areas of interest that need to be covered in such applications and suggested a selection of UI structures and actions that successfully convey the complex financial products to the customers.
At this stage of the project, this output was deemed essential for the continuation of
the development and expansion of the particular system. In addition, the above results
and analysis of this preliminary evaluation highlight the potential use of these structures
and methods for the development of other similar systems that employ gamification and
3D visualisation for enhancing the presentation of information and user engagement
with other servitization offers.

6 Conclusions
This paper presented the design considerations and challenges of a novel 3D serious game developed to support customers' understanding of financial and insurance choices in the construction sector. Both the virtual environment and the game design focused on simplifying the information provided to the user/customer whilst offering a holistic overview and real-time visualisation of construction projects.
The game was based on the EHAB insurance servitization offers and their Weather Ledger Platform, which support the decision-making process of various construction projects in the UK.
The application was evaluated by ten volunteers who responded to pre- and post-test questionnaires designed to inform a custom TAM. The results of this part of the TAM were overall positive yet indicative, and additional user trials with larger cohorts are required to define the exact level of learning outcomes achieved through this serious game application.

A plan for further enriching this serious game with additional variables and construction drawbacks will be formed in the next stage. To further identify the impact on the particular industry, a larger-cohort evaluation will be essential.
The long-term impact could highlight the potential of video games outside entertainment; this could encourage other businesses to collaborate with game developers and/or other innovative technology practitioners to create solutions for their management and marketing issues related to DEAS.

Acknowledgments. The authors would like to thank EHAB for their involvement as an industrial
partner and the provision of vital information for the development of this system. Furthermore, the
authors would like to thank Lyall Campbell for his work on this project. This project was funded
by EPSRC.

References
1. Schroeder, A., Naik, P., Ziaee Bigdeli, A., Baines, T.: Digitally enabled advanced services:
a socio-technical perspective on the role of the internet of things (IoT). Int. J. Oper. Prod.
Manag. 40, 1243–1268 (2020)
2. Kowalkowski, C., Bigdeli, A.Z., Baines, T.: Guest editorial: the future of servitization in a
digital era. J. Serv. Manag. 33(1), 59–69 (2022). https://doi.org/10.1108/JOSM-01-2022-450
3. Du, W., Sepasgozar, S.M.E., Romero, J.S.G.: Measuring virtual reality (VR) technology
application and adoption in chinese construction risk management. Environ. Sci. Proceed.
12(1), 18 (2021). https://doi.org/10.3390/environsciproc2021012018
4. Lagoo, R., Charissis, V., Harrison, D.: Mitigating driver’s distraction with the use of augmented reality head-up display and gesture recognition system. IEEE Consumer Electronics Magazine 8(5), 79–85 (2019)
5. Liu, X., Zhang, J., Hou, G., Wang, Z.: Virtual reality and its application in military. In: IOP
Conference Series: Earth Environmental Science 170(3), 032155 (2018)
6. Ward, B.M., Charissis, V., Rowley, D., Anderson, P., Brady, L.: An evaluation of prototype
VR medical training environment: applied surgical anatomy training for malignant breast
disease. Stud. Health Technol. Inform. 2008(132), 550–555 (2008)
7. Huang, J., Lucash, M.S., Scheller, R.M., Klippel, A.: Visualizing ecological data in virtual
reality. In: 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 1311–
1312 (2019). https://doi.org/10.1109/VR.2019.8797771
8. Alfalah, S., Harisson, D.K., Charissis, V., Evans, D.: Investigation of multimodal interaction
and 3D simulation environment for prototype healthcare system. In: Journal of Enterprise
Information Management (JEIM), Mustafee N., Katsaliaki K. (Eds.): 26(1/2), 183 – 197
(2013). ISSN: 1741–0398
9. DEAS NetworkPlus: Digitally Enhanced Advanced Services EPSRC NetworkPlus Manufac-
turing Theme Research Agenda 2019; University of Westminster: London UK; ISBN 978
185449 478 8; Available online: www.deas.ac.uk Accessed 20 April 2022 (2019)
10. Baines, T., Lightfoot, H.W.: Servitization of the manufacturing firm. Int. J. Oper. Prod. Manag.
34(1), 2–35 (2014)
11. Wood, Z., Godsiff, P.: Establishing the core principles of servitisation for application outside
manufacturing. Compet. Advant. Digit. Econ. 2021, 125–130 (2021). https://doi.org/10.1049/
icp.2021.2425

12. Khan, M.S., et al.: Improving user experience and communication of digitally enhanced
advanced services (DEAS) offers in manufacturing sector. Multimodal Technol. Interact. 6,
21 (2022). https://doi.org/10.3390/mti6030021
13. Wang, S., Charissis, V., Harrison, D.K.: Augmented reality prototype HUD for passenger
infotainment in a vehicular environment. Advances Science, Technol. Eng. Syst. J. 2(3),
634–641 (2017)
14. Romero-Rodriguez, L.M., Ramirez-Montoya, M.S., Gonzalez, J.R.V.: Gamification in
MOOCs: Engagement application test in energy sustainability courses. IEEE Access 7,
32093–32101 (2019). https://doi.org/10.1109/access.2019.2903230
15. Abuhammad, A., et al.: “MedChemVR”: a virtual reality game to enhance medicinal chemistry
education. Multimodal. Technol. Interact. 5, 10 (2021). https://doi.org/10.3390/mti5030010
16. Andrews, D., Dmitrijeva, J., Bigdeli, A.Z., Baines, T.: Snakes and ladders in servitization: using a game to capture inhibitors and enablers of transformation. Res. Technol. Manag. 61, 1–12 (2018). https://doi.org/10.1080/08956308.2018.1516930
17. Falah, J., et al.: Identifying the characteristics of virtual reality gamification for complex
educational topics. Multimodal. Technol. Interact. 5(9), 53 (2021). https://doi.org/10.3390/
mti5090053
18. Khan, M.S., Charissis, V., Harrison, D.: Development and preliminary evaluation of a serious
game to communicate digitally enhanced advance service (DEAS) offers; servitization: a
pathway towards a resilient, productive and sustainable future. In: Proceedings of the Spring
Servitization Conference 2021, Virtual Conference, 10–12 May, p. 287 (2021)
19. Gebauer, H., Paiola, M., Saccani, N., Rapaccini, M.: Digital servitization: crossing the per-
spectives of digitization and servitization. Ind. Mark. Manag. 93, 382–388 (2021). https://doi.
org/10.1016/j.indmarman.2020.05.011
20. Marcon, E., Marcon, A., Le Dain, M.A., Ayala, N.F., Frank, A.G., Matthieu, J.: Barriers
for the digitalization of servitization. Procedia CIRP 83, 254–259 (2019). https://doi.org/10.
1016/j.procir.2019.03.129
21. Kohtamäki, M., Parida, V., Patel, P.C., Gebauer, H.: The relationship between digitalization
and servitization: the role of servitization in capturing the financial potential of digitalization.
Technol. Forecast. Soc. Chang. 151, 119804 (2020). https://doi.org/10.1016/j.techfore.2019.
119804
22. Alsawaier, R.: The effect of gamification on motivation and engagement. Int. J. Inf. Learn.
Technol. (2018). https://doi.org/10.1108/IJILT-02-2017-0009
23. García-Magro, C., Soriano-Pinar, I., Re, U., Carlos, J.: Design of services in servitized firms:
gamification as an adequate tool. J. Bus. Ind. Mark., pp. 575–585 (2019). https://doi.org/10.
1108/JBIM-12-2018-0413
24. Altarteer, S., Charissis, V., Harrison, D., Chan, W.: Product customisation: virtual reality and new opportunities for luxury brands online trading. In: International Conference on 3D Web Technology / ACM SIGGRAPH, 22–24, Anaheim, California, USA (2016)
25. Kharoub, H., Lataifeh, M., Ahmed, N.: 3D user interface design and usability for immersive
VR. Applied Sciences 9(22), 4861 (2019). https://doi.org/10.3390/app9224861
26. Al-Emran, M.: Evaluating the use of smartwatches for learning purposes through the inte-
gration of the technology acceptance model and task-technology fit. Int. J. Human-Computer
Interaction 37(19), 1874–1882 (2021). https://doi.org/10.1080/10447318.2021.1921481
27. Altarteer, S., Charissis, V.: Technology acceptance model for 3D virtual reality system in
luxury brands online stores. IEEE Access 7, 64053–64062 (2019)
28. Marangunić, N., Granić, A.: Technology acceptance model: a literature review from 1986 to
2013. Univ. Access Inf. Soc. 14, 81–95 (2015)

29. Lee, Y., Larsen, K.R.T.: The technology acceptance model: past, present, and future. Com-
munications of the Association for Information Systems 12 (2003). https://doi.org/10.17705/
1CAIS.01250
30. Vanduhe, V.Z., Nat, M., Hasan, H.F.: Continuance intentions to use gamification for training
in higher education: Integrating the technology acceptance model (TAM), social motivation
and task technology fit (TTF). IEEE Access 8, 21473–21484 (2020)
31. Eisingerich, A.B., Marchand, A., Fritze, M.P., Dong, L.: Hook vs. Hope: how to enhance
customer engagement through gamification. International J. Research in Marketing 36(2),
200-215 (2019)
32. Xi, N., Hamari, J.: Does gamification affect brand engagement and equity? a study in online
brand communities. J. Bus. Res. 109, 449–460 (2020)
33. Baird, A., Raghu, T.: Associating consumer perceived value with business models for digital
services. Eur. J. Inf. Syst. 24(1), 4–22 (2015). https://doi.org/10.1057/ejis.2013.12
34. García-Magro, C., Soriano-Pinar, I.: Design of services in servitized firms: gamification as
an adequate tool. J. Business Ind. Marketing 35(3), 575–585 (2020). https://doi.org/10.1108/
JBIM-12-2018-0413
35. Shi, V.G., Ridgway, K., Baldwin, J., et al.: Gamification for servitization. In: Baines, T., Clegg, B., Harrison, D. (eds.) Growth through Servitization: Frameworks and Analytical Techniques (2014)
36. Baines, T., Shi, V.G.: A Delphi study to explore the adoption of servitization in UK companies.
Production Planning & Control 26(14–15), 1171–1187 (2015). https://doi.org/10.1080/095
37287.2015.1033490
Author Index

A B
Abdullah, Norris Syed, 249, 257 Bahari, Mahadi, 257
AbuSa’aleek, Atef Odeh, 766 Bahn, Jacob A., 196
Adankai, Victor, 381 Bansal, Arvind K., 432
Ahmad, S., 1 Batsukh, Bat-Erdene, 508
Akinnuwesi, Boluwaji, 341 Bautista, Yohn Jairo Parra, 381
Alankar, Adham M. M., 59 Bouflous, Zakariyae, 206
AL-Ansari, Aliya, 413 Bouragba, Khalid, 206
Alattar, Alhassan E., 398 Buvet, Pierre-André, 463
Alférez, Germán H., 196
Al-Hammadi, Fatima, 633
C
Al-Hammadi, Yousra, 633
Cabrera, Rafael Guzmán, 499
Alhazmi, Abdulsalam K., 633
Caligiuri, Luigi Maxmilian, 237
Alismail, Sarah, 325
Carrillo, Luis Manuel Ledesma, 499
Allamudi, Meghna, 607
Castro, José Carmen Morales, 499
AL-Lawati, Batool, 413
Cearley, Jerry, 103
Allgood, Nicholas R., 273
Cecile, Fourie, 724
Alnanih, Reem, 660
Charissis, V., 802
Aló, Richard, 381
Christian, Adepo Joël, 44
Al-Omair, Osamah M., 451
Cookenmaster, Dakota C., 196
Alsakkaf, Nasr, 633
Crema, Rafael Santos, 775
AL-Sawafi, Sumaya, 413
Alwadai, Asma, 660
Alwan, Ali A., 16 D
Amoo, Franklin Kome, 786 Darmawan, Deni, 649
Anagnostopoulos, Christos, 69 Davoudi, Heidar, 593
Apaza, Honorio, 565 Deksne, Daiga, 555
Aruhuanca, Brisayda, 565 Derahman, M. N., 16
Asad, Arghavan, 227 Dlamini, Ricky Nhlanhla, 742
Athavale, Rishi, 359 du Preez, Johan A., 534


E M
Ebrahimi, Mehran, 593 Maddipatla, Jagadeepram, 359
Ekpenyong, Moses, 341 Mattu, Gurjeet Singh, 622
Elkaseer, Ahmed, 398 Michel, Babri, 44
Estrada, Jheanel, 478 Mohammad, A., 1
Mohammadi, Farah, 227
F Mohsen, Saeed, 398
Fache, Bertrand, 463 Moravcik, Oliver, 680
Fadel, Wiam, 463 Morrison, Ann, 155
Flores, Anibal, 565 Mullachery, Balakrishnan, 325

N
G
Neto, Guilherme Nunes Nogueira, 775
Georges, Anoh Nogbou, 44
Ngoc, Quoc Tran, 179
Gertis, E. Miles, 752
Nicholas, Charles K., 273
Gomez-Enriquez, Diego, 286
Nicolas, Boukar Abatchia, 698
Guzmán-Castillo, Adán, 370
Nina, Mariela M., 565
Nohama, Percy, 775
H Noura, Ibrahim Ganaou, 698
Haddara, Moutaz, 121 Novikova, Aleksandra, 155
Hajiyan, Hooria, 593 Nwokoro, Chukwudi, 341
Halicka, Katarzyna, 485
Hamid, Hanifah Binti Abdul, 59 O
Harouna, Moussa, 698 Obot, Okure, 341
Harrison, D. K., 802 Opinas Jr., Gil, 478
Hederman, Lucy, 295 Ouattara, Kobenan Ali, 44
Howard, Grant Royd, 742 Ouzzif, Mohammed, 206
Huang, Shihong, 451 Øverdal, Maria, 121

I P
Ibarra-Fiallo, Julio, 370 Pallipuram, Vivek K., 103
Ibrahim, Ahmed Mamdouh Abdelfatah, 249, 257 Peerzada, Abdul B., 493
Ikwunne, Tochukwu, 295 Pérez-Hernández, María, 370
Intriago-Pazmiño, Monserrate, 370 Pinales, José Ruiz, 499
Priego, Belém, 499
J
R
Junan, S., 1
Rahadian, Dian, 649
RahmtAllah, Enas Abdelwahab Eltom, 766
K Raman, Adhiti, 493
Kaed, Ezzadeen, 633 Rangaraju, Prasad, 493
Kapočiūtė-Dzikienė, Jurgita, 577 Rebola, Claudia B., 286
Kaur, Shubhpreet, 90 Reyes-Ortiz, José A., 521
Kaur, Tarandeep, 90 Risda, Dianni, 649
Khan, S., 802 Rouam, Abdelhadi, 463
Kolomvatsos, Kostas, 69
Krovi, Venkat N., 493 S
Saeed, Murad Abdu, 766
L Saif, Faten A., 16
Langseth, Marius, 121 Salimbajevs, Askars, 577
Latip, Rohaya, 16 Schmid, Matthias J., 493
Lezama Sánchez, Ana Laura, 521 Scholz, Steffen, 398
Liang, Y. Daniel, 752 Scrivner, Olga, 607
Lyons-Rocque, Catherine, 312 Semwal, Sudhanshu Kumar, 312

Seonghoon, K., 1 U
Sharma, Deepak, 622 Udby, Tristan, 138
Sharma, Sukhdeep, 622 Udo, Aniema I. A., 341
Silva, Carlos, 565 Uzoka, Faith-Michael, 341
Singh, Aditi, 432
Skadiņš, Raivis, 555 V
Vargas-Alfonso, Erwin, 286
Skotti, Xenia, 69
Srivastava, Manu, 493
W
Stephanus, Botha Benjamin, 724 Wahyudin, Dinn, 649
Suryadi, Andri, 649 Walker, Ian D., 493
Svetsky, Stefan, 680 Wall, P. J., 295
Wan Ahmad, Wan Fatimah, 786
Williams, C. Todd, 710
T Wong, Dennis, 734
Taylor, Rebecca M. C., 534
Y
Theran-Suarez, Carlos, 381
Yau, Peter ChunYu, 734
Tiado, Mahamadou Issoufou, 698
Yong, B., 1
Tian, Yun, 138 Yussiff, Abdul-Lateef, 786
Tito, Euler, 565 Yussiff, Alimatu–Saadia, 786
Torres-Constante, Eddy, 370
Tovar Vidal, Mireya, 521 Z
Tripathi, Anshuman, 478 Zaizi, Nurzi Juana Binti Mohd, 59
Tso, Ejoe, 734 Zidoum, Hamza, 413
