Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3
Lecture Notes in Networks and Systems, Volume 561
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and other. Of particular value to both
the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose ([email protected]).
Editor
Kohei Arai
Faculty of Science and Engineering
Saga University
Saga, Japan
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Editor’s Preface
We are extremely delighted and excited to present before you the seventh Future
Technologies Conference 2022 (FTC 2022), which was successfully held during
20–21 October 2022. COVID-19 necessitated that this conference be held virtually for two years. However, as the pandemic waned and restrictions eased, we managed to recreate the scholarly aura by holding the esteemed conference in hybrid mode, wherein learned researchers from across the globe adorned the stage either through their in-person presence or via the online mode. Around 250 participants from over 60
countries participated to make this event a huge academic success.
The conference provided a wonderful academic exchange platform to share the
latest research, developments, advances and new technologies in the fields of
computing, electronics, AI, robotics, security and communications. The conference
was successful in disseminating novel ideas, emerging trends as well as discussing
research results and achievements. We were overwhelmed to receive 511 papers out
of which a total of 177 papers were selected to be published in the final proceed-
ings. The papers were thoroughly reviewed and then finally selected for publishing.
Many people have collaborated and worked hard to produce a successful FTC
2022 conference. Thus, we would like to thank all the authors and distinguished
Keynote Speakers for their interest in this conference, the Technical Committee
members, who carried out the most difficult work by carefully evaluating the
submitted papers, with professional reviewing and prompt response and to Session
Chairs Committee for their efforts. Finally, we would also like to express our
gratitude to Organizing Committee who worked very hard to ensure high standards
and quality of keynotes, panels, presentations and discussions.
We hope that readers are able to satisfactorily whet their appetite for knowledge in the field of AI and its useful applications across diverse fields. We also expect even more enthusiastic participation in this coveted event next year.
Kind Regards,
Kohei Arai
Conference Program Chair
1 Introduction
Sihai Wen (2002) suggested that the term "piezoelectricity" refers to the alteration of the electric polarization with stress; this change results in the generation of a voltage across the material in the direction of the polarization [1]. Piezoelectricity is related to the dielectric behavior of a material. According to Ang Hu (1999), the dielectric constant is a material property related to the dipole electric moment per unit volume [2]. Kim et al. (2011) pointed out that piezoelectricity allows the conversion of mechanical energy generated by mechanical vibration to electrical energy [3]. Rahman et al. (2014) mentioned that a piezoelectric material has the ability to transform a mechanical movement such as pressure, movement of a substance, or vibration into an electrical signal or electrical power, and vice versa [4]. This energy conversion can be used for the generation of electrical power.
Piezoelectric (PZ) energy harvesting technology has significant advantages over other
renewable energy sources such as solar, wind, and geothermal [5, 6]. Using the pressure
of vehicles caused by gravity, the method generates electric energy from the deformations
in the paving materials [7].
PZT is composed of a perovskite-type crystalline structure. This structure is rep-
resented by the compositional formula ABO3. This structure can achieve large piezo-
electricity, when A is replaced by Pb. This feature of PZT materials can be optimized
by compositional alterations [4]. Piezo ceramics are physically active, chemically inert
and relatively inexpensive to manufacture. PZT ceramic has value because of its higher
sensitivity and operating temperature than other piezo ceramics [8]. PZT-based ceramics
materials show high performance while being used for various purposes at a relatively
low cost. An important feature of PZT is its large piezoelectricity, which intensifies at the phase-boundary composition between the rhombohedral and tetragonal phases in the solid state; this boundary is known as the morphotropic phase boundary (MPB) [4]. Some electrical characteristics required for practical use are not necessarily highest at the MPB. A tetragonal-phase PZT generally has better heat resistance and is often applied in applications requiring high-temperature durability. Any of these compositions can be chosen to meet the demands of each application, and the phases of PZT are easily controlled in many situations by changing the zirconate-to-titanate ratio. PZT materials can also be shaped with good flexibility. When mechanical displacement or vibration is utilized, the performance of a piezoelectric device can be greatly altered by the device shape, including in the case of non-resonant devices [4].
This technology has been tested for a variety of purposes, including sensors [9–12],
roadway lighting and bridge bearing [13, 14], structural health monitoring [6, 7, 10],
deicing [15] and traffic monitoring [16]. However, the amount of electric voltage pro-
duced by piezoelectric material is not as high as other alternative sources [3]. In addition,
the economic efficiency of producing energy using piezoelectric materials is also not
very high. Thus, it is essential to conduct more research on ways to harvest energy using piezoelectric materials.
The objectives of this research project were: 1) the development of seven differently shaped prototype wafer-boxes utilizing a 3D printer and CAD, 2) the development of an experimental design for conducting wheel load testing with PZT sensors embedded in a wafer-box, and 3) the identification of the wafer-box shape capable of producing the highest energy. The research project utilized plastic materials and ceramic-disk PZT electronic sensors for the development of the wafer-boxes. The lessons learned from this research can serve as a knowledge base for improving methods of harvesting maximum amounts of energy in future studies.
3 Methodology
Since no specific standards exist for conducting material testing in this research area, the research team referred to ASTM and UL standards that provide guidelines for different material testing and environmental standards. The null hypothesis was that all the means of the different data groups are the same. The alternative hypothesis was that not all the means of the different data groups are the same.
Two experimental design methods were used for conducting the wheel load test. One was
the preliminary experimental design, and the second was the final experimental design.
This preliminary data collection allowed the research team to improve the final data collection and analyses [19, 20]. For the preliminary experimental design, five circular-disk PZT sensors were randomly assigned to be embedded into a wafer-box. For easy identification, these five sensors were marked 1 to 5 (Table 1). The same sensor was not embedded in the same wafer-box twice. The wafer-box and PZT sensor combinations for the preliminary test are shown in Fig. 1.
The PZT sensor and wafer-box combinations for the final experiment are provided in Table 2. For the final experimental design, twenty-five circular-disk PZT sensors were randomly selected. These sensors were marked 1 to 25 for easy identification (Table 2). They were arranged into five blocks (B1, B2, B3, B4, and B5), each containing five PZT sensors. The same sensor was not embedded in the same wafer-box twice [21–23]. The wafer-box and PZT sensor combinations for the final experiment are visualized in Fig. 2.
Table 1. Possible combinations of PZT sensors and wafer-box (preliminary data collection)
3.2 Voltage Data Collection Method from Wheel Load Test (WLT)
The wafer-boxes with embedded PZT sensors (see Fig. 3) were tested under the load wheel of the Asphalt Pavement Analyzer (APA) machine (see Fig. 4). A vertical load of 152 lbs was applied to each wafer-box with an embedded PZT sensor by the load wheels of the APA machine. This vertical load on the wafer-box generated mechanical energy, which was received by the PZT sensor. The test wheel completed 30 full cycles in 60 s, meaning one complete cycle took 2 s. The wheel movement was forward and
Table 2. Possible combinations of PZT sensors and wafer-box (final experiment design)
backward, so one forward movement took 1 s and one backward movement took 1 s. The test speed was set at 30 Hz, which means the wheels completed 30 full cycles in 60 s. The APA cabin temperature was kept at 30 degrees Celsius. The test conditions were maintained the same during testing of all the wafer-boxes.
This research project used several statistical methods, including the one-way ANOVA test, the Tukey HSD test, and Scheffé's method, to analyze the collected data. The one-way analysis of variance (ANOVA) was utilized to identify substantial differences among two or more independent groups of datasets. Tukey's Honest Significant Difference (HSD) test is a post-hoc test based on the studentized range distribution. An ANOVA test can show whether results are significant overall, but it does not show exactly where those differences lie. After an ANOVA result is found to be significant, Tukey's HSD test can be applied to learn which specific group means (compared with each other) are different; the test compares all possible pairs of means [25]. Scheffé's test is a post-hoc test used in analysis of variance. If the null hypothesis is rejected in an ANOVA test, showing that the means of the different data groups are not the same, Scheffé's test can then be run to find out which pairs of means differ significantly. For Scheffé's method, a T statistic is defined as the ratio of the unsigned contrast mean to the contrast standard error [26]. The basic difference between these two post-hoc tests is that Tukey's HSD test is used for data groups of similar sample size, while Scheffé's method can be used for data groups of both equal and unequal sample sizes.
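For readers wishing to reproduce this kind of analysis, the following is a minimal sketch (not the project's actual analysis script) of running a one-way ANOVA followed by Tukey's HSD post-hoc test in Python; the voltage readings and group sizes are purely illustrative.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Illustrative RMS voltage readings (volts) for three wafer-box shapes;
# the actual study compared seven shapes using 25 sensors.
circle = np.array([2.1, 2.3, 2.2, 2.4, 2.0])
square = np.array([1.1, 1.0, 1.2, 0.9, 1.1])
triangle = np.array([1.8, 1.9, 1.7, 2.0, 1.8])

# One-way ANOVA: is at least one group mean different from the others?
f_stat, p_value = stats.f_oneway(circle, square, triangle)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Tukey HSD post-hoc test: which specific pairs of means differ?
values = np.concatenate([circle, square, triangle])
groups = ["circle"] * 5 + ["square"] * 5 + ["triangle"] * 5
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```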
Different structural properties of the wafer-boxes, including section modulus, area moment of inertia, radius of gyration, and extreme points, were used in linear regression modelling against the average voltage values produced by the different shaped wafer-boxes. Regression models were developed to substantiate the validity of the research outcomes.
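As an illustration of this regression step, the sketch below (with hypothetical numbers, not the measured data) fits a simple linear model of average voltage against section modulus.

```python
from scipy import stats

# Hypothetical (section modulus in inch^3, average RMS voltage in volts) pairs;
# the real analysis used the wafer-box section properties and measured voltages.
section_modulus = [31.7, 32.0, 38.1, 38.6, 46.1, 48.7, 58.4]
avg_voltage = [2.6, 2.5, 2.2, 2.0, 1.8, 1.6, 1.1]

# Simple linear regression of voltage on section modulus.
fit = stats.linregress(section_modulus, avg_voltage)
print(f"r = {fit.rvalue:.4f}, R^2 = {fit.rvalue ** 2:.4f}")
print(f"voltage ~= {fit.slope:.4f} * S + {fit.intercept:.4f}")
```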
4 Results
4.1 Analysis of the Experiment Data
The data were analyzed using statistical tools to identify the voltage produced by the different shapes of wafer-boxes. The average (RMS) voltages produced by the different shaped wafer-boxes coupled with PZT sensors in the preliminary experiment were ranked from highest to lowest. The preliminary data indicated that the circular wafer-box had a higher average voltage than all other shapes. The
hexagonal and triangular shaped boxes produced similar voltage values to each other.
The square shape produced the lowest voltage. After the preliminary experiment, the researchers conducted the final experiment with seven different shapes of wafer-boxes. In the preliminary experiment, five basic shapes had been selected to identify which shape produced the highest amount of voltage; the research scope later allowed two more basic geometric shapes to be added to substantiate whether they produced more voltage than the other shapes. For the final experiment data, a regression analysis was conducted to identify the reasons for the different amounts of voltage produced by the different shaped wafer-boxes.
A simple graph (Fig. 5) represents the average voltage value produced by the 25 sensors embedded into the different shaped wafer-boxes. Table 3 shows the ranking of wafer-boxes according to the average voltage value. As shown in the graph, the right-angled triangle shape has the highest average energy (voltage), and the square-shaped box produces the lowest average voltage value. Table 3 shows the 175 voltage (RMS) values produced by the twenty-five PZT sensors coupled with the seven different shaped wafer-boxes, and the average values produced by the different PZT sensors embedded in each wafer-box.
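The RMS (root-mean-square) values referred to here summarize the alternating voltage signal produced during a wheel pass; a minimal sketch of the computation, with made-up sample values, is shown below.

```python
import numpy as np

# Made-up voltage samples (volts) from one PZT sensor during a wheel pass;
# real values would come from the data-acquisition system used with the APA.
samples = np.array([0.0, 1.2, 2.8, 1.5, -0.9, -2.1, -1.0, 0.3])

# Root-mean-square value of the sampled signal.
rms = np.sqrt(np.mean(samples ** 2))
print(f"RMS voltage: {rms:.3f} V")
```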
The Tukey HSD Test. Table 4 shows the results of the Tukey HSD test analysis, comparing pairs of wafer-boxes with embedded sensors based on the calculated Q-statistic and p-value described in the Tukey HSD method [25]. There was a significant difference among the different data groups at both selected significance levels (α = 0.01 and 0.05).
Treatment pairs | Tukey HSD Q-statistic | Tukey HSD p-value | Tukey HSD inference
Rectangle vs Circle | 5.4053 | 0.0034066 | Significant
Rectangle vs Square | 9.3806 | 0.0010053 | Significant
Rectangle vs Triangle | 2.4996 | 0.5617074 | Insignificant
Rectangle vs Hexagon | 1.7817 | 0.8569824 | Insignificant
Rectangle vs Right triangle | 8.5558 | 0.0010053 | Significant
Rectangle vs Rhombus | 7.9997 | 0.0010053 | Significant
Circle vs Square | 15.223 | 0.0010053 | Significant
Circle vs Triangle | 2.9916 | 0.3486336 | Insignificant
Circle vs Hexagon | 3.7307 | 0.1213037 | Insignificant
Circle vs Right triangle | 3.4428 | 0.1904155 | Insignificant
Circle vs Rhombus | 2.8715 | 0.401228 | Insignificant
Square vs Triangle | 12.2314 | 0.0010053 | Significant
Square vs Hexagon | 11.4923 | 0.0010053 | Significant
Square vs Right triangle | 18.0685 | 0.0010053 | Significant
Square vs Rhombus | 17.4972 | 0.0010053 | Significant
Triangle vs Hexagon | 0.7391 | 0.8999947 | Insignificant
Triangle vs Right triangle | 6.317 | 0.0010053 | Significant
Triangle vs Rhombus | 5.7457 | 0.0013962 | Significant
Hexagon vs Right triangle | 7.0271 | 0.0010053 | Significant
Hexagon vs Rhombus | 6.4558 | 0.0010053 | Significant
Right triangle vs Rhombus | 0.5505 | 0.8999947 | Insignificant
The Scheffé's Method. Table 5 shows the results of the Scheffé's method analysis, comparing pairs based on the calculated T-statistic and p-value described in Scheffé's method [26]. It can be identified from the statistical analysis that there was a significant difference among the different data sets at both selected significance levels (α = 0.01 and 0.05).
Description | Circular | Triangular | Hexagonal | Rectangular | Square | Right triangle | Rhombus
Area moment of inertia (inch^4) | 139.15138 | 239.2961 | 202.81994 | 142.41983 | 205.7792 | 162.96175 | 129.2496
Section modulus (inch^3) | 38.12367 | 38.59616 | 46.10552 | 48.71977 | 58.41512 | 31.69968 | 31.96175
Radius of gyration (inch) | 1.825 | 2.1948 | 2.013 | 1.69065 | 2.03745 | 1.8054 | 3.755
Extreme points (inch) | 3.65 | 6.2 | 4.39962 | 2.925 | 3.525 | 5.1 | 4.6
18.03%. The R² value for the section modulus is 0.9494, meaning the section modulus can account for 94.94% of the variation in average voltage values. Thus, from the regression analysis it can be said that, out of the four properties, the section modulus is the most influential structural property affecting voltage production. A lower section modulus leads to higher deflections, which translate into higher stress levels and higher voltages.
Regression analysis of the average voltage values (125) against different structural properties of the wafer-box
Property | Area moment of inertia | Section modulus | Radius of gyration | Extreme points
r | −0.483 | −0.9744 | 0.3258 | 0.4246
R² | 0.2332 | 0.9494 | 0.1061 | 0.1803
Relationship | Very weak negative | Very strong negative | Very weak positive | Very weak positive
Independent–dependent variable relationship significance | Insignificant | Insignificant | Insignificant | Insignificant
according to those section modulus values. If the section modulus is high for a given shape, the member is more resistant to bending. When two plates of the same material are compared according to their section modulus values, the plate with the higher section modulus can bear more load than the plate with the lower section modulus. Conversely, a wafer-box with a lower section modulus will bend more when force is applied to it. Out of the seven shapes, the right-triangular wafer-box bent more than the other shapes, and the square box bent the least, as it had the highest section modulus value. Because the right-angled triangular box bent more due to its lower section modulus, the PZT sensors embedded in it bent more as well. The more the PZT sensors bend, the more electricity is produced. According to this principle, the square shape produced the lowest energy, and the energy produced by the other five shapes (rhombus, rectangular, hexagonal, circular, and triangular) is also explained by this bending-moment principle.
Acknowledgment. This research project was funded by the Georgia Technology Research Insti-
tute (GTRI) as part of the Kennedy Space Center (KSC) Vapor Trail Walkway Project with which
GTRI contracted with Delaware North Companies (DNC) and NASA. See the research team
website. http://pzmaterialtest.s3-website-us-east-1.amazonaws.com/
References
1. Wen, S., Chung, D.D.L.: Piezoelectric cement-based materials with large coupling and voltage
coefficients. Cement Concrete Res. ELSEVIER 32(3), 5 (2002)
2. Hu, A., Fang, Y., Young, J.F., Oh, Y.-J.: Humidity dependence of apparent dielectric constant
for DSP cement materials at high frequencies. J. American Ceramic Soc. 82(7), 8 (1999)
3. Kim, H.S., Kim, J.H., Kim, J.: A review of piezoelectric energy harvesting based on vibration.
Int. J. Precision Eng. Manufact. 12(6), 1129-1141 (2011)
4. Rahman, M., et al.: 1.02 - Techniques for assessing the properties of advanced ceramic mate-
rials. In: Comprehensive Materials Processing, Hashmi, S., et al., Editors: Elsevier: Oxford,
pp. 3–34 (2014)
5. Harnessing Pavement Power: Developing Renewable Energy Technology in the Public Right-
of-Way. Federal Highway Administration, p. 2 (2013)
6. Xiong, H., et al.: Piezoelectric energy harvesting from traffic induced deformation of
pavements. Int. J. Pavement Res. Technol. 5(5), 333–337 (2012)
7. Ali, S.F., Friswell, M.I., Adhikari, S.: Analysis of energy harvesters for highway bridges. J.
Intell. Mater. Syst. Struct. 22(16), 1929–1938 (2011)
8. APC International, Ltd.: PZT materials. Piezo Theory (2016). [cited 2018]. https://www.americanpiezo.com/piezo-theory/pzt.html
9. Gkoumas, K., Petrini, F., Bontempi, F.: Energy harvesting for the life-cycle of structures and
infrastructures: State of art, recent trends and future developments. In: Life-Cycle and Sus-
tainability of Civil Infrastructure Systems: Proceedings of the Third International Symposium
on Life-Cycle Civil Engineering (IALCCE’12), Vienna, Austria, October 3–6, 2012. CRC
Press (2012)
10. Yu, L., et al.: In-situ health monitoring on steel bridges with dual mode piezoelectric sensors.
In: Nondestructive Characterization for Composite Materials, Aerospace Engineering, Civil
Infrastructure, and Homeland Security 2013, March 11, 2013 - March 14, 2013. SPIE, San
Diego, CA, United states (2013)
11. Yu, L., et al.: Piezoelectric based sensing in wireless steel bridge health monitoring. In: Non-
destructive Characterization for Composite Materials, Aerospace Engineering, Civil Infras-
tructure, and Homeland Security 2009, March 9, 2009 - March 11, 2009. SPIE, San Diego,
CA, United states (2009)
12. Vijayaraghavan, K., Kossett, A., Rajamani, R.: Passive Roadside Reflectors and Communi-
cations Systems for Improvement of Radar Reliability, p. 54 (2006)
13. Baldwin, J.D., et al.: Energy Harvesting on Highway Bridges, p. 24 (2011)
14. Wang, M., Chang, P.C., Newcomb, R.: Power scavenging from highway bridge vibration. In:
1st International Conference on Structural Health Monitoring and Intelligent Infrastructure,
SHMII-1’2003, November 13, 2003 - November 15, 2003. Tokyo, Japan: A.A. Balkema
(2003).
15. Symeoni, A.: A review on energy harvesting from roads (2013)
16. Huang, R.-B., et al.: Technical approach and research prospect of piezoelectric energy harvest
from highway. Zhongguo Gonglu Xuebao/China J. Highway Transp. 25(6), 1–8 (2012)
17. Sun, C.-H., et al.: Designing piezoelectric harvesting unit from road vibration. In: 4th Inter-
national Conference on Manufacturing Science and Engineering, ICMSE 2013, March 30,
2013 - March 31, 2013. Dalian, China: Trans Tech Publications Ltd. (2013)
18. Zhao, H.D., Ling, J.M., Fu, P.C.: A review of harvesting green energy from road. In: 8th
International Conference on Road and Airfield Pavement Technology, ICPT 2013, July 14,
2013 - July 18, 2013. Trans Tech Publications Ltd., Taipei, Taiwan (2013)
19. Winchester, C.L., Salji, M.J., Kasivisvanathan, V.: Gathering preliminary data. J. Clinical
Urology 10(6), 568–572 (2017)
20. NCBI: Preliminary studies and pilot testing. In: Field Trials of Health Interventions: A Toolbox, 3rd edn. (2015). [cited 1 July 2018]. https://www.ncbi.nlm.nih.gov/books/NBK305518/
21. Yale University: Experimental design. Experimentation (1997). [cited 5 June 2018]. http://www.stat.yale.edu/Courses/1997-98/101/expdes.htm
22. Wikipedia, the Free Encyclopedia: Random assignment (2018). [cited 10 March 2018]. https://en.wikipedia.org/wiki/Random_assignment
23. Center for Innovation in Research and Teaching: Types of experimental research. Experimental Research (2018). [cited 28 April 2018]. https://cirt.gcu.edu/research/developmentresources/research_ready/experimental/design_types
24. Royal Academy of Engineering: The study of root mean square (RMS) value. Mechanical, Electrical and Electronics Engineering (2018). [cited 2018]. https://www.raeng.org.uk/publications/other/8-rms
25. National Institute of Standards and Technology: Tukey's method. In: Handbook of Statistical Methods. NIST, MD, USA (2018)
26. National Institute of Standards and Technology: Scheffé's method. In: Engineering Statistics Handbook (2018). [cited 5 March 2018]. https://www.itl.nist.gov/div898/handbook/prc/section4/prc472.htm
27. Safayet, A.J.: Designing and Testing 3-D Printed Wafer-box with Embedded PZT Sensors to
Identify the Shape Effect on Energy Harvesting. Electronic Theses and Dissertations. 1751
(2018). https://digitalcommons.georgiasouthern.edu/etd/1751
Hybrid Meta-heuristic Genetic Algorithm: Differential Evolution Algorithms for Scientific Workflow Scheduling in Heterogeneous Cloud Environment
1 Introduction
Cloud computing is widely used to deliver services to end-users and enterprises which
are distributed geographically regardless of whether or not they are in the cloud. Cloud
computing aims to share remote resources with clients such as applications, storage,
databases, servers, and services on-demand using a pay-as-you-go method. Due to the
ubiquitous nature of cloud technology, users are able to execute large-scale computations without running out of network bandwidth and with few size limitations [1, 2]. In the current decade, Internet services such as Microsoft Azure, Amazon Web Services, and Google App Engine have drawn attention to cloud computing. The cloud provides access to resources in the form of Virtual Machines (VMs) [3, 4]. Cloud services fall into three main categories: Infrastructure as a Service (IaaS), as deployed in Amazon EC2; Software as a Service (SaaS), which provides online applications for users; and Platform as a Service (PaaS), which facilitates deploying applications for users and provides them with control over them [5]. The attraction of
adopting cloud computing technology by companies of various sizes has risen in recent years owing to many factors, such as the rapid improvement of computer processors in the form of multi-core processors, which also significantly decreases the cost of system hardware. One of the main functions of cloud computing is allocating tasks to suitable resources in polynomial time while meeting the Quality of Service (QoS) needed to satisfy end-user requirements. Scheduling is an NP-complete problem, especially for large-scale tasks, so this challenge demands an approximate solution that maintains the constraints while improving scheduling objectives such as energy consumption, communication cost, and completion time, as well as throughput, resource utilization, load balancing, fault tolerance, tardiness, laxity, and deadlines. Generally, task scheduling algorithms for heterogeneous resources fall into two common classes [6]: heuristic algorithms and metaheuristic algorithms. The heuristic approach offers a fast, problem-specific solution that is not guaranteed to be optimal; examples include Critical Path on a Processor (CPOP), Heterogeneous Earliest Finish Time (HEFT), the Graham algorithm, and Minimum Completion Time (MCT). Meta-heuristics, on the other hand, have gained popularity for obtaining good results for NP-hard problems with minimal computational effort, making them appropriate for large-scale tasks; popular algorithms include the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Honeybee, and Ant Colony Optimization (ACO). However, neither the heuristic nor the meta-heuristic approach alone has provided fully satisfying solutions for scheduling tasks on heterogeneous resources in the cloud environment.
In this regard, with the rapid growth of big data, cloud computing now hosts numerous workflow applications, for instance intelligent video surveillance, which requires instant processing across five modules: motion and object detection, the user interface, object tracking, tilt control, and zoom monitoring. However, the long distance to cloud data centers and the limited bandwidth pose challenges to workflow scheduling in cloud computing. From this point, the significant question is how to reduce the makespan of workflow scheduling in the cloud while maintaining instant processing for critical applications that cannot afford delay. The varied specifications of the tasks handled by the machines have become a hot spot, prompting researchers to take the heterogeneity of the system into consideration in order to meet users' requests effectively. Furthermore, abundant studies have been conducted on scheduling in cloud computing that utilize one of the scheduling categories, heuristics or metaheuristics, to obtain approximate solutions. Thus, the significant question is which approach is more suitable for a distributed environment.
In workflow scheduling, scholars have implemented metaheuristic algorithms (PSO, the genetic algorithm, and ant colony optimization) because of their ability to provide reasonable solutions in a short execution time, which makes them appropriate for the distributed computing environment. Therefore, the main contribution of this study is merging two metaheuristics to take advantage of both algorithms and fulfill the main aim of the study, which is reducing the makespan when scheduling workflows in cloud computing. The proposed metaheuristic algorithms are Differential Evolution (DE) and the Genetic Algorithm (GA), both of which are Evolutionary Algorithms (EAs) intended for solving difficult and complex problems [7]. DE is a powerful evolutionary algorithm and stochastic search technique for solving optimization problems and is commonly utilized in science and engineering [8]. The main feature of this algorithm is its ability to approach the global optimal solution more strongly than other metaheuristic algorithms, and it can raise resource efficiency and reduce the makespan [9]. However, DE easily becomes stuck in a local optimum and faces other problems such as premature convergence and slow convergence [3]. Thus, to overcome this challenge and enhance DE, this study adopts a genetic algorithm, exploiting its ability to escape local optima and refine the search space. The Genetic Algorithm (GA) is commonly used in academia and industry owing to its intuitiveness, simple implementation, and high capability to solve complex, nonlinear, and mixed-integer optimization problems, because it can handle continuous and discrete variables, nonlinear objectives, and constraints without requiring gradient information [10]. The two meta-heuristic algorithms are therefore integrated to accomplish the desired objective, and the proposed algorithm is compared against heuristic algorithms to validate that the metaheuristic approach is more appropriate for scheduling in distributed computing.
This study discusses the scheduling of workflows in the heterogeneous cloud and verifies how the resources can be scaled to fulfill the requirements in a heterogeneous cloud computing environment. The proposed hybrid meta-heuristic (GA-DE) algorithm is verified with respect to makespan by conducting simulations on scientific workflows (CyberShake, Montage, and Epigenomics). The paper is organized as follows. Sect. 2 illustrates related works; Sect. 3 presents the proposed GA-DE algorithm; Sect. 4 describes the experimental setup; Sect. 5 illustrates and summarizes the results; and Sect. 6 sets out the conclusion.
2 Related Works
In recent decades, the majority of research has focused on investigating issues pertinent to the cloud environment, which has a crucial role in gathering parallel and distributed systems onto a single platform. It relies on VMs instead of physical machines for processing and configuration, which facilitates sharing resources with remote users via the Internet instead of utilizing supercomputers that require costly maintenance [11]. Furthermore, cloud computing has an important bearing on allocating varying numbers of shared resources to distributed users over the Internet [12]. This section reviews scheduling algorithms in cloud computing and highlights the criteria that affect the scheduling process, describes effective operation with respect to various objectives such as delay, energy consumption, and cost, and discusses and examines previous research works relevant to this research area in cloud computing. The review focuses on the three main types of scheduling approaches in the cloud computing environment, namely heuristic, meta-heuristic, and hybrid algorithms.
Scheduling is defined as the process of mapping tasks to the proper processors based on the required objectives, such as raising the execution speed, reducing the makespan, and reducing cost and delay. The main purpose of scheduling in cloud computing is to raise effectiveness in order to optimize system performance and reduce the overhead on the network [13]. The role of scheduling is to guide the execution of tasks (dependent or independent, as in workflows) on the shared resources, as stated in [14]; it assigns tasks to suitable resources to fulfill the user requirements and thereby enhance the system's performance. Generally, there are two models of workflow scheduling, called QoS-based and best-effort-based. Best-effort-based scheduling depends on reducing the makespan alone, while QoS-based scheduling focuses on decreasing the makespan while maintaining the constraints' requirements, for example reducing the cost under a budget or the makespan under a deadline. A workflow is a set of interdependent tasks (i.e., a task cannot execute before the previous ones finish) that facilitates executing complex applications deployed on heterogeneous computing resources, and it is represented as a directed acyclic graph (DAG) [6]. The main technique to
measure the performance effectiveness of the scheduling process is through performance metrics such as makespan, laxity, tardiness, delay, cost, and energy consumption [15]. Recently, many studies on workflow applications, which involve large-scale computing, have shown that cloud computing offers a significant chance to execute the workflows at low cost. A workflow application can be illustrated by a Directed Acyclic Graph (DAG) whose nodes and edges represent the tasks and the data dependencies, respectively. The main role of a dependency is to stop a child node from executing until all of its parent tasks have completed execution and the child's required input data have been sent. The time at which all tasks finish executing is called the schedule length or makespan. The model for the overall execution cost of the tasks can take into account storage costs, data transfer costs, and computation costs [7]. A workflow is a set of computational tasks whose pattern is repeatable and built from dependent tasks; workflows represent a series of activities and mechanisms utilized for executing a single task or a set of tasks. Input/output-intensive workflows demand a massive amount of input data and produce massive output data. The main performance objectives considered in workflow scheduling in cloud computing are makespan, cost, energy, load balancing, and resource utilization. Generally, executing large-scale workflow applications demands scalable data and capable resources [9]. Workflows can be categorized into two types, called business workflows and scientific workflows: (i) business workflows represent real work consisting of a sequence of business processes and activities, and (ii) scientific workflows represent scientific applications that depend on other tasks and have a complex execution. Scientific workflows assist in the formulation and structuring of complex processes. Many algorithms have been implemented for workflow scheduling. Generally, scheduling algorithms depend on an optimization approach, either the optimal approach or heuristic
algorithms, with respect to makespan and execution-time reduction. The work reported in [26] introduces a new hybrid meta-heuristic approach for scheduling incoming tasks, treated as a bag-of-tasks (BoT) application, in an interconnected cloud environment. The approach combines the simulated annealing algorithm with a tabu search meta-heuristic to minimize the scheduling cost and improve scheduling performance. The proposed algorithm was compared with the Fastest Processor Largest Task heuristic based on arrival and running times, and the outcomes verified its effectiveness in reducing the cost and the makespan of scheduling. The problem of task scheduling in the cloud computing environment has also been discussed in [27], where the authors developed a new algorithm called PSO-SA for resource provisioning in multi-tier cloud computing using a meta-heuristic approach; the method hybridizes PSO with simulated annealing (SA), which speeds up the provisioning of resources in the cloud environment. Furthermore, the work reported in [28] proposes a constraints-aware multi-QoS workflow scheduling approach to handle multiple QoS constraints in grid workflows, a technique based on hybridizing the PSO algorithm with a novel look-ahead strategy built on a min-max heuristic; however, the approach does not take into account accelerating the convergence. The study introduced in [29] addressed the topic of cloud computing scheduling: to address scheduling on virtual machines in the cloud, the authors developed an approach that incorporates ant colony optimization and particle swarm optimization (ACOPS). The algorithm uses previous
effectiveness of the proposed algorithm with respect to cost and makespan. The research in [38] proposed scheduling workflows in cloud computing by providing a hybrid technique based on a genetic algorithm and Heterogeneous Earliest Finish Time (HEFT), namely a deadline-constrained and cost-effective hybrid genetic algorithm for scientific workflow scheduling in cloud computing (DCHG-TS). All experiments were carried out using real-world scientific workflows, and the results demonstrate that the proposed strategy performs better than the previous strategies with respect to execution time and cost. Table 2 summarizes the related works on hybrid approaches for workflow scheduling in the cloud computing environment.
Table 2. Summary of related works using the hybrid approach for workflow scheduling
algorithms for cloud computing
to guarantee execution of the tasks such that those with the maximum number of successors are executed first. The mapping phase aligns the resources with the tasks of the workflow; however, this is also why the algorithm is not suitable for dynamically changing workflows and data centers [38]. The main quantities in the HEFT algorithm are EST, the earliest start time, and EFT, the earliest finish time, of executing task n_i on processor p_j. The EST of the entry task in the DAG is equal to 0, as given in Eq. (1), and the values of EST and EFT are obtained from Eqs. (2) and (3). To obtain the EFT of task n_i, all the predecessors of n_i must already have been scheduled. Here pred(n_i) denotes the set of all predecessors of task n_i, and avail[j] refers to the ready time of processor p_j for task execution. The time at which all the data required by n_i have arrived at p_j is given by the inner max term in Eq. (2). The EST and EFT of task n_m on processor p_j become its AST (actual start time) and AFT (actual finish time) after task n_m has been scheduled on p_j; the AFT is the smallest EFT obtained for that task, as presented in Eq. (5). Finally, the makespan of the schedule is obtained from the AFT of the exit task, as in Eq. (4). Also, c_{m,i} denotes the communication cost between node m and node i; if the two related tasks m and i are assigned to the same processor, c_{m,i} is assumed to be zero.
$EST(n_{entry}, p_j) = 0$  (1)

$EST(n_i, p_j) = \max\left\{ avail[j],\ \max_{n_m \in pred(n_i)} \left( AFT(n_m) + c_{m,i} \right) \right\}$, where $i = 0, 1, \ldots, n$  (2)

$EFT(n_i, p_j) = \omega_{i,j} + EST(n_i, p_j)$  (3)
The priority in HEFT is determined based on the upward rank, given in Eq. (7). Here succ(n_i) denotes the set of successors of task n_i, the average communication cost of edge (i, j) is used, and ω̄_i is the average computational cost of task n_i, obtained through Eq. (8). The upward rank is computed recursively, starting from the exit task, whose rank is obtained from Eq. (6).
$rank_u(n_i) = \overline{\omega}_i + \max_{n_j \in succ(n_i)} \left( \overline{c}_{i,j} + rank_u(n_j) \right)$  (7)

$\overline{\omega}_i = \sum_{j=1}^{q} \omega_{i,j} / q$  (8)
The downward rank is computed recursively by Eq. (9), starting from the entry node of the graph, where pred(n_i) denotes the set of predecessors of task n_i. The value of the downward rank of the entry node is equal to zero.
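To make the rank computation concrete, here is a small self-contained sketch of the upward rank of Eq. (7) for an illustrative four-task DAG; the cost and communication values are hypothetical and are not those of the paper's example.

```python
from functools import lru_cache

# Illustrative DAG: successors of each task, average computation cost (Eq. (8)),
# and average communication cost of each edge. All values are hypothetical.
succ = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"], "t3": []}  # t3 is the exit task
avg_cost = {"t0": 10, "t1": 9, "t2": 6, "t3": 11}
comm = {("t0", "t1"): 4, ("t0", "t2"): 3, ("t1", "t3"): 5, ("t2", "t3"): 2}

@lru_cache(maxsize=None)
def rank_u(task):
    """rank_u(n_i) = avg_cost(n_i) + max over successors of (comm + rank_u), Eq. (7)."""
    tail = max((comm[(task, s)] + rank_u(s) for s in succ[task]), default=0)
    return avg_cost[task] + tail

# HEFT scheduling priority: tasks in decreasing order of upward rank.
print(sorted(succ, key=rank_u, reverse=True))
```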
of the steps: the DE algorithm starts with mutation, then crossover, then selection, whereas the genetic algorithm's processes are selection, crossover, and mutation [10].
The work described above informed the approach proposed in this paper. The suggested approach is based on integrating the meta-heuristics of the Genetic Algorithm and the Differential Evolution method to determine the ideal task schedule in cloud environments by identifying optimal solutions that reduce the makespan of task execution. The workflow is modeled as a directed acyclic graph G(V, E), where V (vertices) denotes the set of nodes in the graph and E (edges) indicates the precedence relationships between jobs. Each edge is weighted by the communication cost between the two connected tasks, whereas each node is weighted by its computation time; the communication cost is zero when two jobs are allocated to the same processor. The example DAG has tasks T0, T1, ..., T10, where T0 is the entry task and T10 is the exit task. Figure 1 presents this directed acyclic graph (DAG). The proposed algorithm works as follows. The Genetic Algorithm (GA), which includes selection, crossover, and mutation procedures, is first used to generate a collection of solutions. The Differential Evolution (DE) approach then uses these generated solutions as its initial population, and the DE processes are applied to them to generate new solutions for the GA population. The procedure continues until the DE finishes processing and an updated list of all solutions is obtained; the list is ordered from left to right, although the precedence ordering of newly generated solutions may be violated after the production of children.
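To make the hybrid flow concrete, the following is a deliberately simplified, self-contained sketch, not the authors' implementation: it drops precedence constraints and the HEFT ordering for brevity, treats an individual as a task-to-VM assignment, uses a few illustrative cost values in the spirit of Table 3, and in each generation feeds the GA step's output into a discrete DE step. All parameter values and helper names are illustrative.

```python
import random

# Cost of task i on VM j (illustrative values only).
TASK_COST = [[9, 11, 10], [11, 7, 9], [8, 6, 4], [6, 5, 7],
             [9, 17, 10], [7, 5, 9], [12, 15, 9]]
N_TASKS, N_VMS = len(TASK_COST), 3

def makespan(assignment):
    """Finish time of the busiest VM for a task -> VM assignment (no precedence)."""
    loads = [0.0] * N_VMS
    for task, vm in enumerate(assignment):
        loads[vm] += TASK_COST[task][vm]
    return max(loads)

def ga_step(pop, crossover_rate=0.8, mutation_rate=0.2):
    """Roulette-wheel selection, single-point crossover, and point mutation."""
    weights = [1.0 / (1.0 + makespan(ind)) for ind in pop]
    children = []
    while len(children) < len(pop):
        a, b = random.choices(pop, weights=weights, k=2)
        child = list(a)
        if random.random() < crossover_rate:
            cut = random.randrange(1, N_TASKS)
            child = list(a[:cut]) + list(b[cut:])
        if random.random() < mutation_rate:
            child[random.randrange(N_TASKS)] = random.randrange(N_VMS)
        children.append(child)
    return children

def de_step(pop, F=0.5, CR=0.9):
    """Discrete DE variant: mutation and crossover build a trial; greedy selection keeps it."""
    new_pop = []
    for target in pop:
        r1, r2, r3 = random.sample(pop, 3)
        trial = list(target)
        for j in range(N_TASKS):
            if random.random() < CR:
                trial[j] = int(r1[j] + F * (r2[j] - r3[j])) % N_VMS
        new_pop.append(trial if makespan(trial) < makespan(target) else target)
    return new_pop

def hybrid_ga_de(pop_size=20, generations=200):
    pop = [[random.randrange(N_VMS) for _ in range(N_TASKS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop = de_step(ga_step(pop))  # the GA's output seeds the DE population
    return min(pop, key=makespan)

best = hybrid_ga_de()
print("best assignment:", best, "makespan:", makespan(best))
```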
Table 3 shows the computation cost of the tasks on machines m0, m1, and m2, and ω̄_i denotes the average computation cost of each task across the machines. Every task has a different computation cost on each machine, which reflects the heterogeneity of the system.
Tasks | m0 | m2 | m1 | ω̄_i
t0 | 9 | 11 | 10 | 10
t1 | 11 | 7 | 9 | 9
t2 | 8 | 6 | 4 | 6
t3 | 6 | 5 | 7 | 6
t4 | 9 | 17 | 10 | 12
t5 | 7 | 5 | 9 | 7
t6 | 12 | 15 | 9 | 12
t7 | 17 | 12 | 13 | 14
t8 | 8 | 12 | 10 | 10
t9 | 16 | 15 | 14 | 15
t10 | 11 | 10 | 12 | 11
3.7 Makespan
Makespan (execution time) is one of the critical objectives bearing on system performance; it refers to the duration of the whole workflow from beginning to end, that is, the time spent executing all the tasks of the DAG.
Tasks Ranku (ti ) Rankd (ti ) Level Ranku (ti ) + Rankd (ti )
t0 123 0 0 123
t1 81 22 1 103
t2 76 24 1 100
t3 96 27 1 123
t4 62 41 2 103
t5 63 45 2 108
t6 77 46 2 123
t7 39 69 3 108
t8 36 75 3 111
t9 45 78 3 123
t10 11 112 4 123
In this study, the makespan is calculated for each individual. First, tasks must be allocated to processors by HEFT, as illustrated in Sect. 3.1.
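As a concrete illustration of this fitness evaluation, the sketch below (a simplified, assumption-laden version, not the paper's code) walks tasks in priority order, places each on its assigned processor, and applies Eqs. (2)-(3) to obtain the makespan; the DAG, costs, and assignment are made up, and the insertion-based slot search of full HEFT is omitted.

```python
# Illustrative DAG: predecessors, per-processor computation costs, and edge
# communication costs. All values are hypothetical.
pred = {"t0": [], "t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}
cost = {"t0": [9, 11], "t1": [11, 7], "t2": [8, 6], "t3": [6, 5]}
comm = {("t0", "t1"): 4, ("t0", "t2"): 3, ("t1", "t3"): 5, ("t2", "t3"): 2}

def schedule_makespan(order, assignment):
    """order: tasks in descending upward rank; assignment: task -> processor index."""
    avail = [0.0, 0.0]            # ready time of each processor
    aft = {}                      # actual finish time of each scheduled task
    for task in order:
        proc = assignment[task]
        # EST = max(processor ready time, arrival of all predecessor data), Eq. (2)
        data_ready = max(
            (aft[p] + (0 if assignment[p] == proc else comm[(p, task)])
             for p in pred[task]),
            default=0.0,
        )
        est = max(avail[proc], data_ready)
        aft[task] = est + cost[task][proc]       # EFT, Eq. (3)
        avail[proc] = aft[task]
    return max(aft.values())                     # makespan = AFT of the exit task

print(schedule_makespan(["t0", "t1", "t2", "t3"], {"t0": 0, "t1": 1, "t2": 0, "t3": 1}))
```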
avoid falling into a local optimum solution. The crossover operator is considered a key factor in exploring the solution space. This phase aims to exchange some of one individual's genes with another's in order to produce two adequate offspring. The crossover operator employs a single random point generated between gene n and gene n + 1. If the genes of both parents from the entry node to the crossover point do not match, the crossover is performed. The two additional offspring are produced by crossing at the single point, which in the example is equal to five. On the left side, children inherit genes from their parents at the same gene positions; the selected genes are then eliminated from the other parent, and the remaining genes are imported into the child from left to right. As a result, the offspring's development will be effective, and their fitness is obtained through the fitness function. The children's fitness values are compared with those of their parents, and if they are better, the children replace their parents. The procedure of the crossover is illustrated in Fig. 3, and the details of the crossover operator are given in Algorithm 2.
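A minimal sketch of one plausible reading of this single-point crossover on task orderings is given below; it is an assumption about the operator's intent rather than the authors' exact procedure, and it keeps children precedence-feasible provided both parents are valid topological orders.

```python
import random

def single_point_crossover(parent1, parent2, point=None):
    """Child 1 copies parent1 up to the cut point, then appends the remaining tasks
    in the order they appear in parent2 (and symmetrically for child 2)."""
    if point is None:
        point = random.randrange(1, len(parent1))
    head1 = parent1[:point]
    child1 = head1 + [t for t in parent2 if t not in head1]
    head2 = parent2[:point]
    child2 = head2 + [t for t in parent1 if t not in head2]
    return child1, child2

# Two illustrative topological orders of the same five tasks.
p1 = ["t0", "t1", "t3", "t2", "t4"]
p2 = ["t0", "t2", "t1", "t4", "t3"]
print(single_point_crossover(p1, p2, point=2))
```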
The main role of the mutation operator is to raise the diversity of the population and facilitate exploration of the search space in order to find the optimal solution [44]. The responsibility of this component is to generate a new chromosome by modifying two genes in such a way that the precedence constraints are not broken. The procedure begins as follows: a gene is chosen at random, and the first successor of the chosen task (t_j) between the mutation point and the end is identified. If there is an m-th gene in the interval [i + 1, j - 1] whose predecessors are not positioned before t_i, then t_i and t_j can be swapped with each other. If these conditions are not met in the mutation function, the mutation operator algorithm is restarted from the beginning. Finally, the child's fitness value is assessed, and if the child's fitness exceeds that of the parent, the child replaces the parent. Figure 4 depicts the mutation operator's detailed procedure, which was adapted from [43].
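The sketch below shows a simplified swap mutation in the same spirit; it is an interpretation of the rule described above rather than the authors' code, and it simply retries random swaps until one preserves all precedence constraints.

```python
import random

def is_topological(order, pred):
    """True if every task appears after all of its predecessors."""
    position = {t: i for i, t in enumerate(order)}
    return all(position[p] < position[t] for t in order for p in pred[t])

def swap_mutation(order, pred, max_tries=20):
    """Swap two randomly chosen tasks, keeping the result only if it is still
    a valid topological order; otherwise retry, falling back to the parent."""
    for _ in range(max_tries):
        child = list(order)
        i, j = random.sample(range(len(order)), 2)
        child[i], child[j] = child[j], child[i]
        if is_topological(child, pred):
            return child
    return list(order)

# Illustrative precedence relation and one feasible ordering.
pred = {"t0": [], "t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}
print(swap_mutation(["t0", "t1", "t2", "t3"], pred))
```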
The evolutionary algorithms category includes both genetic algorithms and DE algo-
rithms. It is well known that these types of algorithms might run indefinitely, hence
when using them, a preset termination condition must be considered in order to end
the solution generation process. Some predetermined policies have been presented in
this research to ensure that the algorithms terminate after delivering the most appropriate
result. The fitness evaluations, the system’s operating times, and the population diversity
have all been taken into account while terminating the algorithms. The algorithm in our
suggested technique is terminated after reaching 1000 iterations.
4 Performance Evaluation
This section illustrates an overview of the simulation, setting the main parameters of the
simulation environment and data set.
Parameter Values
Datacenter configurations
MIPS 15000–2000 (Heterogeneous)
RAM 4096 – 10240
Storage 150000
Bandwidth 15000
Number of hosts 10
Virtual Machines
MIPS 250–1000 (Heterogeneous)
RAM 256–1024
Storage 500
Bandwidth 250
Number of hosts 40
The GA-DE parameters
No. of the population (genes) 80
Selection operator (srate) 30
Crossover operator (crate) 80
Mutation operator (mrate) 20
Number of generations 1000
Population size Randomly
mutation factor (F) Randomly
crossover rate (CR) Randomly
Fig. 5). They abstract the data flow that is utilized in real applications. The scientific workflow applications were chosen for their ability to represent a broad area of applications and their need for a diversity of resources.
In this study, the workflow data sets are CyberShake_30.xml and Cyber-
Shake_50.xml tasks to evaluate the performance of the proposed algorithm with the
5 Experimental Results
The simulation was conducted to evaluate the GA-DE algorithm's performance compared to three heuristic algorithms (HEFT upward rank, HEFT downward rank, and HEFT-Level rank) and the metaheuristic Genetic Algorithm (GA), with 30 VMs and 5 hosts.
In Fig. 6, bar charts are drawn with the various numbers of tasks on the X-axis and the corresponding makespan of executing the applications on the Y-axis. The proposed algorithm (GA-DE) was compared against other heuristic algorithms as benchmarks (HEFT upward rank, downward rank, and Level rank) with respect to makespan by generating random DAGs with 10, 50, and 100 cloudlets. The GA-DE algorithm outperforms the other algorithms in terms of makespan, even though the makespan rises steadily with the growing number of tasks.
The results of the experiments applied to the Cybershake workflow to estimate the makespan are shown in Table 6. Figure 7(a) illustrates the experimental comparison among the three heuristic approaches (HEFT upward rank, HEFT downward rank, and HEFT-Level rank), the metaheuristic Genetic Algorithm (GA), and the proposed hybrid metaheuristic (GA-DE), conducted with the Cybershake_30 cloudlets. It is obvious from the markedly different results for the makespan metric that GA-DE gives the best performance, accomplishing the workflow scheduling with the minimum makespan and outperforming the comparison algorithms. It is followed by GA, which gives a reasonable result that is better than the three heuristic algorithms. It is also noticeable that HEFT downward rank obtains the worst result in this experiment, with the highest makespan, which implies higher energy consumption. Overall, this experimental result shows that metaheuristic approaches are more suitable for scheduling workflow applications in a distributed environment than the heuristic algorithms.
Fig. 6. Comparison of Makespan Versus the Number of various Tasks in Random Graphs
In Fig. 7(b), the bar chart shows the comparison algorithms and their makespan, this time conducted with Cybershake_50 cloudlets to simulate workflow scheduling in cloud computing. Apparently, the hybrid GA-DE algorithm remains at the forefront, achieving the minimum makespan despite the increasing number of tasks being processed, and performing slightly better than GA. Correspondingly, HEFT downward rank still shows the worst performance, with a higher makespan. Furthermore, it can be noticed that HEFT-Level rank has a high makespan with only a small difference from HEFT downward rank, which means it loses its stability in scheduling the workflow as the number of tasks increases.
Fig. 7. (a) Comparison of various algorithms in terms of makespan with Cybershake_30. (b) Comparison of various algorithms in terms of makespan with Cybershake_50.
The results of the experiments running the GA-DE algorithm over the various scientific workflows to calculate the makespan are shown in Table 7.
Figure 8 shows the second set of simulation experiments, with three real-world scientific application DAG structures (Montage, Epigenomics, and Cybershake), used to validate the results of the proposed GA-DE algorithm with respect to makespan based on randomly generated DAGs with 100 cloudlets, run 100 times until the values converged.
It is obvious that the proposed algorithm obtains the best result in terms of makespan when running the Montage workflow, with a minimum makespan of 45.05, followed by Cybershake with a vast difference, at about 326.64. The worst performance of the proposed algorithm occurs when running Epigenomics, with a massively larger value of 42138.66.
Generally, this study demonstrates the effectiveness of a hybrid metaheuristic derived from merging two meta-heuristic algorithms, compared with a single heuristic approach, for task scheduling in distributed computing. A hybrid metaheuristic aims to exploit the benefits of the individual metaheuristics by merging them to strengthen efficiency and overcome their limitations, such as getting stuck in a local optimum. Moreover, this approach achieves a minimum makespan, which means the lowest response time; this has a significant impact on guaranteeing QoS and indirectly affects the user experience. Thus, end-users can spend less money to obtain better services, which is what this study aims to achieve with the proposed GA-DE algorithm.
6 Conclusion
This study applied the hybrid meta-heuristic algorithm GA-DE to verify the efficiency of the proposed algorithm for scheduling workflows in heterogeneous cloud computing in terms of makespan, which is the main objective of the study for improving scheduling in cloud computing. The proposed hybrid algorithm exploits the features of the two meta-heuristic algorithms, namely the Genetic Algorithm (GA) and Differential Evolution (DE), to reduce the makespan, adopting the roulette wheel technique to facilitate finding the best solution according to the fitness value. The simulation was conducted on scientific workflows (Cybershake, Epigenomics, and Montage) using the CloudSim simulator for modeling the algorithm. The experiment was compared against three heuristics (HEFT-Level rank, HEFT-Upward rank, and HEFT-Downward rank) and the GA meta-heuristic approach. The simulation results demonstrate the effectiveness of the proposed algorithm, which achieves the best result compared to the others in reducing the makespan. Besides, the simulation results show that the best result among the scientific workflows is obtained for the Montage workflow, whereas the worst result is for Epigenomics. However, this study considers a single objective, which is reducing the makespan. Therefore, in the future we will address multi-objective optimization (MOP) in terms of energy consumption, delay, cost, and resource utilization, and incorporate other meta-heuristics. This study can also be further extended to adopt artificial neural networks for predicting the workload.
Reconfiguration of Protected Unicast
Connections in Elastic Optical Networks
1 Introduction
The growth of data-intensive applications in transport networks, such as videoconferencing, streaming services, and cloud computing, together with the development of data centers, has led telecom operators to design their networks around optical fiber. The established technology of these networks is Wavelength Division Multiplexing (WDM). Today, elastic or flexible optical networks are emerging and offer many advantages [1]. In these networks, connections are often subject to disruptions. An approach to effective management and failure tolerance is to establish connections protected by backup paths.
Fiber-optic networks, like other connection-oriented networks, experience a drop in performance due to events such as traffic changes (adding or deleting connections) or maintenance operations on a link or node. To optimise the network, operators need to reconfigure the routing, i.e., provide a new configuration. Reconfiguration is an important feature for optimising the use of network resources. It is used by operators to plan path changes in the event of network disruptions, and it must be transparent from the user's point of view.
Reconfiguration is a difficult problem that can be reduced to two sub-problems [3]: calculating the new routing, and switching from the current routing to the new routing. This paper addresses the second sub-problem. When the reconfiguration is not performed carefully, it interrupts the flows of the connections. To reduce flow interruption during the reconfiguration process, some approaches use backup-path resources [1, 2]. Backup paths are used to free up resources needed by some connections in the reconfiguration process: when connections occupy resources needed to reconfigure another connection, they are temporarily switched to their backup paths so that those resources can be used to establish the paths required by the reconfiguration. The reconfiguration can be done in a single step using the backup paths; however, this consumes considerable resources [2]. Additional resources may be needed to free up connections on backup paths because the resources available on these paths are limited and do not allow all primary paths to be switched simultaneously. In some works, researchers calculate the final paths [3–5]. In this paper, we consider that the connections established on an initial routing are protected, and we want to reconfigure them to a new pre-computed routing using the backup-path resources. In this work, we want to reconfigure a set of protected unicast connections using the backup paths. Given the dependencies between the initial and final paths and the dependencies between shared backup paths, solving this problem is a challenge. This work proposes a reconfiguration technique that ensures flow continuity and reduces the number of steps and the additional resources (backup paths) used during the reconfiguration process. This paper is structured as follows. In Sect. 2, we briefly present the general concepts of elastic optical networks. In Sect. 3, related work is presented. Section 4 gives the problem statement, and Sect. 5 describes a basic reconfiguration algorithm using backup paths. Section 6 presents our approach. Section 7 shows the evaluation results.
2 Elastic Optical Networks
Elastic optical networks were developed in response to the limitations of Wavelength Division Multiplexing (WDM) optical networks. WDM optical networks allow multiple users to use the full bandwidth of the optical channel simultaneously at different wavelengths. In WDM, the frequency grid of the channel is subdivided into sub-bands of fixed width. This fixed allocation of the frequency grid is one of the major problems of these networks: a less demanding application will be allocated a large bandwidth, which wastes the available resources, while a very demanding application will be allocated insufficient bandwidth, which is inappropriate. This inefficient management of available resources motivated the introduction of elastic or flexible optical networks, a solution suitable for future technologies. Elastic optical networks allow for more efficient resource management. The allocation of available resources depends on the modulation
format, the path length, and the bandwidth demand. The frequency sub-bands of elastic optical networks are flexible, unlike those of traditional optical networks. Figure 1 shows the spectral difference between the two technologies (fixed-grid WDM and flexible-grid optical networks). In Fig. 1(a), the frequency bands are fixed regardless of the bandwidth demand of the application; this is not suitable for efficient resource management. Figure 1(b) shows the flexible management of sub-bands according to the connection demand. The frequency grid consists of fine-granularity allocation units called frequency slots. The size of a slot can vary from 25 GHz down to 6.25 GHz depending on the policy in place, the modulation format, and other parameters [6, 7]. Connection requests must meet the continuity and contiguity constraints of the frequency slots.
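As an illustration of these two constraints, the sketch below (our own simplified example, not taken from the paper) searches for a block of `demand` contiguous slot indices (contiguity) that is free on every link of a candidate path (continuity); the link names, slot counts, and data layout are assumptions.

def find_spectrum(path_links, free_slots, demand, total_slots):
    """Return the first contiguous block of `demand` slot indices that is
    free on every link of the path, or None if no such block exists.

    path_links  : list of link identifiers along the candidate path
    free_slots  : dict mapping link id -> set of free slot indices
    demand      : number of contiguous frequency slots required
    total_slots : number of slots per fiber (capacity K / slot width T)
    """
    for start in range(total_slots - demand + 1):
        block = set(range(start, start + demand))                    # contiguity
        if all(block <= free_slots[link] for link in path_links):    # continuity
            return block
    return None

# usage: a 3-link path needing 4 contiguous slots
free = {"A-B": set(range(0, 10)), "B-C": set(range(2, 12)), "C-D": set(range(0, 8))}
print(find_spectrum(["A-B", "B-C", "C-D"], free, 4, 12))   # {2, 3, 4, 5}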
Elastic optical networks use the following basic elements: variable bandwidth transponders (BVT), variable bandwidth optical cross-connects (BV_OXC), and optical amplifiers (OA). Figure 2 shows the basic architecture of elastic optical networks. Each component has a specific function in this architecture. The BVTs select the right modulation format and the right number of frequency slots depending on the demand. The BV_OXC cross-connects on each node of the path between the source and destination nodes help to establish an end-to-end optical path. Optical amplifiers are placed approximately every 80 km to regenerate the optical signal [7]. We have briefly presented the basic concepts of elastic optical networks. The next section presents related work.
3 Related Work
Reconfiguration is used to optimise resource management or when a maintenance operation is planned in the network [11]. This technique is important for connection-oriented networks to meet certain optimization requirements. Reconfiguration is part of the planning of network resources. It consists of changing the routes of established connections in response to disruptive events (for example, maintenance operations or failures of links or nodes). During the reconfiguration process, when the resources of the final path of a connection are used by another connection, it is necessary to temporarily interrupt the occupying connection in order to switch the first connection to its final path. This can have serious consequences if the interrupted connection is carrying important data. To resolve this problem, some works use the backup paths to temporarily switch over the connections that occupy the resources necessary to reconfigure other connections. Backup paths are therefore used to avoid interrupting connections during the reconfiguration process.
In this section, we discuss the work done on routing reconfiguration. The authors of [2] proposed a trade-off between improving network performance and limiting packet loss during the reconfiguration process; the trade-off between these two conflicting objectives is resolved by using backup paths during reconfiguration. In this context, the resources of the backup paths are reserved for restoring the flow in case of a connection failure. This approach is resource intensive as it uses dedicated protection, where each primary path
has its own backup path. The authors of [12] studied connection reconfiguration in Multiprotocol Label Switching (MPLS) networks; the connections are protected by backup paths, which are used to free up resources for establishing new connections. The authors of [13] propose a reconfiguration approach using additional resources (i.e., backup paths) in traditional MPLS and WDM optical networks. Since reconfiguration often requires the interruption of some connections, their objective is to propose a reconfiguration approach that does not interrupt connections. The authors of [3] propose a reconfiguration using shared backup-path resources (shared protection) for Wide Area Network (WAN) mesh networks. This technique aims to minimise the loss of network performance due to traffic losses when switching from the current routing to the new routing. The authors of [1] present a reconfiguration study in a star topology; the objective is to reduce the connection interruptions that result in performance loss. To solve this problem, they propose to use the backup paths, which are reserved resources. The algorithm is based on five procedures for configuring and destroying unused or faulty optical paths. In the works above, the reconfiguration is performed without interrupting the connections, and migration is done connection by connection using the backup paths. As this method is relatively slow, another approach is group reconfiguration, which migrates a set of connections simultaneously according to a defined criterion; this reduces the number of steps and therefore the duration of the reconfiguration process. This method is proposed by [10] for WDM optical networks. The basic idea is to group primary connections with disjoint links that do not share backup paths. These connections can be migrated simultaneously to their backup paths and then from their backup paths to their final paths. This solution assumes that the resources of the final paths are always available.
Previous work has thus focused on the reconfiguration of routing in WDM optical networks using backup-path resources. In the problem addressed in [10, 13], the initial routing and the shared backup paths of the set of connections are known; the final routing is unknown at the beginning and is determined during the reconfiguration process. However, network operators may wish to determine the final routing before the migration process in order to optimize their network [14–16]. A recent study [17] addressed routing reconfiguration with backup paths; this approach produces no disruption, but it targets multicast connections. In this case, the problem data become the initial routing, the final routing, and the backup paths of the network topology. Thus, new dependencies arise in this problem, such as the dependencies between the paths of the initial routing and the paths of the final routing, in addition to the existing dependencies between shared backup paths. Figure 3 and Fig. 4 show these dependencies. To solve this problem, we model the dependencies on the one hand and propose a reconfiguration algorithm on the other.
4 Problem Modelling
In this section, we state the problem of routing reconfiguration in an elastic optical network. The reconfiguration problem can be reduced to two sub-problems: (1) finding a new routing corresponding to the current routing, and (2) switching from the current routing to the new routing [18]. In this work we address the second sub-problem. We assume that the initial routing and the final routing are known. The objective is to determine a sequence of configurations from the current topology to the new topology without interruption.
The physical network is modelled by an undirected graph G = (N, L), where N is the set of physical nodes and L the set of physical links (optical fibers) between them. Let R0 = (Ci, Si) and Rt = (Cf), where R0 and Rt denote the initial routing and the final routing, respectively. Ci = (p1, p2, ..., pn) and Si = (b1, b2, ..., bn), where bi is the backup path of pi, i ∈ {1, ..., n}. A unicast connection of the virtual topology, called a lightpath, is characterised by a source node, a destination node, and the number of slots necessary for signal transmission. We note that:
• Each link is an optical fiber of capacity K, divided into frequency slots of size T;
• A connection established on a lightpath must satisfy the continuity and contiguity constraints of the frequency slots;
• An interruption during the reconfiguration process is characterized by the absence of the resources (frequency slots) necessary to establish the connection between its source node and its destination node.
The basic Make-Before-Break (MBB) technique switches a connection to a new path as follows:
• Pre-establish the new path between the source and the destination. All nodes of the new path are configured in parallel except the source;
• Configure the source to interrupt the flow on the old path and feed the new path;
• Delete the old path; all nodes of the old path are configured in parallel.
Another variant uses the backup paths within the MBB technique. In this case, the backup paths are used during reconfiguration to avoid temporarily interrupting some connections during the process. The steps can be summarized as follows (an illustrative sketch in code follows the list):
• Pre-establish the backup path of the connection whose current path uses the resources of the final path of the connection to be reconfigured. All nodes are configured in parallel except the source;
• Configure the source to interrupt the flow on the old path of that connection and feed its backup path;
• Delete its old path (the path that occupies the resources needed by the final path of the connection to be reconfigured);
• Pre-establish the final path of the connection to be reconfigured by configuring all the nodes in parallel except the source;
• Configure the source to interrupt the flow on the old path of the connection to be reconfigured and feed the new path (its final path);
• Delete the old path of that connection by configuring all nodes in parallel;
• Pre-establish the final path of the connection that was switched to its backup path;
• Configure its source to switch the flow from the backup path to its new (final) path;
• Delete its backup path.
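The following sketch mirrors these steps with illustrative data structures and callbacks; `configure_node` and `delete_path` stand in for whatever control-plane operations the network actually exposes, and the whole listing is an assumption-laden illustration rather than the paper's procedure.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Path:
    nodes: List[str]          # ordered node identifiers, source first

@dataclass
class Connection:
    name: str
    current_path: Path
    backup_path: Path
    final_path: Path

def mbb_switch(conn: Connection, new_path: Path,
               configure_node: Callable, delete_path: Callable) -> None:
    """Make-before-break: pre-establish new_path, switch the source, delete the old path."""
    for node in new_path.nodes[1:]:               # pre-establish all nodes except the source
        configure_node(node, new_path)
    configure_node(new_path.nodes[0], new_path)   # source switch: stop old flow, feed new path
    delete_path(conn.current_path)
    conn.current_path = new_path

def reconfigure_with_backup(blocking: Connection, target: Connection,
                            configure_node: Callable, delete_path: Callable) -> None:
    """Free the final-path resources of `target` by parking `blocking` on its backup path."""
    mbb_switch(blocking, blocking.backup_path, configure_node, delete_path)   # steps 1-3
    mbb_switch(target, target.final_path, configure_node, delete_path)        # steps 4-6
    mbb_switch(blocking, blocking.final_path, configure_node, delete_path)    # steps 7-9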
First Type of Dependence: Dependence between Initial Paths and Final Paths
The dependencies between the initial paths and the final paths are modelled by a directed
graph called the dependency graph Gd. In this graph, the nodes represent the optical paths
of the main topology, and the dependencies are materialized by arcs. A dependency
between the paths of two connections i and j is defined by the arc (i, j) such that the
initial path of connection j uses the resources necessary to configure the new path of connection i. The details of the construction of the dependency graph are defined in [15].
Gd = (N', L') is the graph representing the dependencies between the initial routing paths and the final routing paths, where N' is the set of nodes representing the connections of the network topology and L' is the set of arcs between nodes. The figures below illustrate the principles for determining the dependency graph.
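A minimal sketch of this construction (a simplification loosely inspired by the procedure detailed in [15], with resources abstracted as sets of (link, slot) pairs; all names are illustrative) could look as follows:

def build_dependency_graph(initial_resources, final_resources):
    """Gd as a set of arcs (i, j): the initial path of connection j occupies
    resources (e.g. (link, slot) pairs) needed by the final path of connection i.

    initial_resources, final_resources: dict  connection id -> set of resources.
    """
    arcs = set()
    for i, needed in final_resources.items():
        for j, occupied in initial_resources.items():
            if i != j and needed & occupied:
                arcs.add((i, j))
    return arcs

# toy example: the final path of c1 needs a slot currently held by c2's initial path
initial = {"c1": {("A-B", 3)}, "c2": {("B-C", 5)}}
final   = {"c1": {("B-C", 5)}, "c2": {("C-D", 1)}}
print(build_dependency_graph(initial, final))   # {('c1', 'c2')}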
Fig. 3. Example of the main virtual topology of 7 connections established in the network (legend: primary paths, backup paths, final paths, network nodes)
Figure 5 above shows the auxiliary graph corresponding to the network topology in
Fig. 3.
The network reconfiguration occurs at constant time intervals called observation periods [20]. The algorithm below illustrates our new approach for reconfiguring the initial routing to the precomputed final routing based on shared backup paths.
Our algorithm takes as input a set of connections whose primary paths, backup paths and final paths are known (line 1). We construct the auxiliary graph Ga using the backup paths as described in Sect. 6.1.2 (line 2). The node-colouring algorithm is then run on graph Ga to build the connection groups [2] (line 3). The connections belonging to the same set of this colouring have disjoint backup paths, so they can be switched in parallel onto these backup paths whenever needed (line 4). The reconfiguration order of the connections is determined by the dependency graph between the initial paths and the final paths defined in Sect. 6.1.1; these dependencies would otherwise cause a temporary interruption of some connections during the reconfiguration process (lines 6–9). Leaf or isolated nodes in the dependency graph indicate that the resources of the corresponding final paths are available, so the corresponding connections can be migrated directly to their final paths. We determine in the dependency graph Gd the set of all leaf or isolated nodes, reconfigure them in parallel as described in the algorithm, and update both graphs (auxiliary and dependency graphs) at each iteration (lines 10–13). If the reconfiguration is not completed, this means that there are cycles in the dependency graph. In this case the auxiliary graph Ga is used to migrate simultaneously the connections of the same group (nodes with the same colour in the graph Ga). If the nodes are of the same colour, we simultaneously migrate the corresponding connections to their backup paths to free up resources; the remaining connections are migrated directly to their final paths, and then the connections switched to the backup paths are migrated to their final paths since their respective resources are now available (lines 15–17). In the case where the connections are not of the same group (the Ga nodes do not have the same colour), we migrate the connections forming the cycles to their backup paths to free up resources, then the remaining connections are migrated to their final paths, and finally the connections switched to the backup paths are migrated to their final paths since their resources are now available. We repeat this operation until the dependency graph is empty. The algorithm stops when all connections have been reconfigured.
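The listing below is an illustrative, simplified rendering of this loop; the `migrate` and `use_backup` callbacks, the representation of Gd as a set of arcs, and the colour groups derived from Ga are assumptions standing in for the paper's actual data structures.

def reconfigure(connections, gd_arcs, groups, migrate, use_backup):
    """Simplified reconfiguration loop (illustration only).

    connections  : iterable of connection ids to reconfigure
    gd_arcs      : set of arcs (i, j) of the dependency graph Gd
    groups       : dict id -> colour from the auxiliary-graph (Ga) colouring;
                   the same colour means disjoint backup paths
    migrate(c)   : switch connection c to its final path
    use_backup(c): temporarily switch connection c to its backup path
    """
    pending = set(connections)
    while pending:
        out_deg = {c: 0 for c in pending}
        for i, j in gd_arcs:
            if i in pending and j in pending:
                out_deg[i] += 1
        free = {c for c in pending if out_deg[c] == 0}   # leaf or isolated nodes of Gd
        if free:
            for c in free:
                migrate(c)                               # migrated in parallel
        else:
            # only cycles remain: park one colour group (disjoint backups) on
            # its backup paths to free the contested resources
            colour = groups[next(iter(pending))]
            parked = {c for c in pending if groups[c] == colour}
            for c in parked:
                use_backup(c)
            for c in pending - parked:
                migrate(c)                               # remaining connections -> final paths
            for c in parked:
                migrate(c)                               # backup paths -> final paths
            free = pending
        pending = pending - free
        gd_arcs = {(i, j) for i, j in gd_arcs if i in pending and j in pending}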
illustrated in Fig. 8, Fig. 9 and Fig. 10 below. We compare our approach in terms of the number of process steps [9], the number of backup-path resources used, and the number of interrupted connections. A reconfiguration step (or sequence) consists of simultaneously pre-establishing new paths, simultaneously switching connections onto the new paths (backup or final), and deleting the old paths.
The duration of the reconfiguration process is the maximum number of steps needed to reconfigure all connections. The number of backup paths used in the process is the number of nodes forming a cycle in the dependency graph that are temporarily switched before being reconfigured.
additional steps compared to MBB_WBP, which does not use any additional step. Connections in MBB_WBP are migrated directly to their final paths, without the additional steps caused by passing connections through their backup paths as in MBB_BP. This is observed in both topologies of our tests, where the results are substantially identical with a slight difference. Regarding interruptions, we can observe in Fig. 10 that MBB_WBP causes
interruptions since it does not use backup paths to temporarily switch connections when needed. The strength of the approaches using backup paths is that they guarantee uninterrupted connections, which is why in Fig. 10 only MBB_WBP shows interruptions. In Fig. 9, our approach uses fewer backup-path resources. This is because Xin's algorithm uses the backup paths at each step to free up resources on the primary paths before computing the final path. In the MBB_BP approach, connections are randomly selected to be reconfigured on their final paths: when the resources on the final path of a connection are used by another connection, the connection is switched to its backup path before being reconfigured, so the use of backup paths is not systematic. This random selection of connections makes MBB_BP use backup paths more frequently than our approach. The same is true for the second topology, which has practically the same characteristics.
The particularity of this work is that we use fewer additional resources than existing approaches, which matters because these resources are limited under the considered working conditions. In addition, the operation is non-disruptive for the end user, in the sense that the avoidance of signal interruptions and the reduction of the number of steps are both taken into account in the reconfiguration process.
8 Conclusion
In this paper, we proposed a new algorithm to reconfigure routing with backup paths in elastic optical networks (RABP_EON). In this work, we considered that the initial routing and the final routing are known. The solution of the reconfiguration problem takes into account the dependencies between the initial routing and the final routing in order to avoid flow interruptions. Simulation results show that the proposed algorithm performs the reconfiguration with a small number of steps (short process time) and no flow interruption. In this work, we assumed that the connections have the same number of slots and considered the continuity and contiguity constraints. We focused on the process of switching from the current routing to the new routing because the problem is difficult to solve due to the new dependencies added to those between shared backup paths. An interesting future research topic is to consider energy consumption in the reconfiguration process. We will try to address this crucial problem in elastic optical networks.
References
1. Ishida, S., Arakawa, S., Murata, M.: Reconfiguration of logical topologies with minimum traffic disruptions in reliable WDM-based mesh networks. Photonic Netw. Commun. 6(3), 265–277 (2003)
2. Józsa, B.G., Makai, M.: On the solution of reroute sequence planning problem in MPLS networks. Comput. Netw. 42(2), 199–210 (2003)
3. Takagi, H., Zhang, Y., Hua Jia, X., Takagi, H.: Reconfiguration heuristics for logical topologies in wide-area WDM networks. IEEE Global Telecommun. Conf. 2002, Globecom 2(3), 2701–2705 (2002)
4. Anoh, N., Babri, M., Kadjo, T.L.: Efficient energy routing with connections rerouting in elastic optical networks under static connections. Int. J. Computer Science Issues 11(6), 89–98 (2014)
5. Marković, G.Z.: Routing and spectrum allocation in elastic optical networks using bee colony optimization. Photon Netw. Commun. 34(3), 356–374 (2017)
6. Anoh, N.G., Babri, M., Kora, A.D., Faye, R.M., Aka, B., Lishou, C.: An efficient hybrid protection scheme with shared/dedicated backup paths on elastic optical networks. Digit. Commun. Netw. 3(1), 11–18 (2017)
7. Chatterjee, B.C., Sarma, N., Oki, E.: Routing and spectrum allocation in elastic optical networks: a tutorial. IEEE Commun. Surv. Tutor. 17, 1776–1800 (2015)
8. Shen, G., Wei, Y., Bose, S.K.: Optimal design for shared backup path protected elastic optical networks under single-link failure. J. Opt. Commun. Netw. 6(7), 649 (2014)
9. Anoh, N.G., Adépo, J.C., Babri, M., Aka, B.: Hybrid protection scheme for elastic optical networks with regenerator and efficient energy consideration. Int. J. Computer Science Telecommun. 6(10), 7 (2015)
10. Xin, Y., Shayman, M., La, R.J., Marcus, S.I.: Reconfiguration of survivable IP over WDM networks. Opt. Switch. Netw. 21, 93–100 (2016)
11. Atta, A.F., Adepo, J.C., Cousin, B., Oumtanaga, S.: Minimize flow interruptions during reconfiguration of a set of light-trees in all-optical WDM network. Int. J. Computer Science Network Security 20, 7 (2020)
12. Orincsay, D., Szviatovszki, B., Böhm, G.: Prompt partial path optimization in MPLS networks. Comput. Netw. 43(5), 557–572 (2003)
13. Xin, Y., Shayman, M., La, R.J., Marcus, S.I.: OPNp1–2: Reconfiguration of survivable MPLS/WDM networks. IEEE GLOBECOM 2006, pp. 1–5 (2006)
14. Cohen, N., Coudert, D., Mazauric, D., Nepomuceno, N., Nisse, N.: Tradeoffs when optimizing lightpaths reconfiguration in WDM networks. INRIA Research Report RR-7047 (2009)
15. Cohen, N., Coudert, D., Mazauric, D., Nepomuceno, N., Nisse, N.: Tradeoffs in routing reconfiguration problems. 12èmes Rencontres Francophones sur les Aspects Algorithmiques de Télécommunications (AlgoTel) (2010)
16. Adépo, J.C.: Reconfiguration du routage multicast dans les réseaux optiques WDM. Ph.D. Thesis, Université Nangui Abrogoua, September 2016
17. Christian, N.N., Christian, A.J., Michel, B.: Protected light-tree reconfiguration without flow interruption in elastic optical networks. Int. J. Computer Networks Appl. 8(3), 140–150 (2021)
18. Coudert, D., Huc, F., Mazauric, D., Nisse, N., Sereni, J.: Reconfiguration of the routing in WDM networks with two classes of services. 2009 International Conference on Optical Network Design and Modeling, pp. 1–6 (2009)
19. Coudert, D.: Algorithmique et optimisation dans les réseaux de télécommunications. Inria Sophia Antipolis, 83 (2010)
20. Melidis, P., Nicopolitidis, P., Papadimitriou, G.: Reserved energy-aware virtual topology management for IP-over-WDM optical networks. Opt. Switch. Netw. 31, 72–85 (2019)
Users Engagement Factors with e-Court
Application Conceptual Framework
Abstract. This study adds to our knowledge of the elements that influence user involvement with e-court applications. The goal of this research is to find out which factors influence end-user preparedness for an e-court, to determine the structural relationship between preparedness characteristics and e-court interoperability, and to create a readiness model based on e-court end-user readiness parameters for self-assessment of organizational adoption. Based on literature research, a qualitative approach was employed to determine the most critical variables of users' interaction with the e-court application. The results revealed four factor categories: human behavior, technological advancement, organizational structure, and legal issues. A case study technique might be used to provide a more complete explanation of the phenomenon, which would improve the study's conclusions; a case study in a Malaysian e-Court, for example, is proposed to explore the collected features and variables. This research can also be applied to the private sector and to other industries such as healthcare and banking.
1 Introduction
Access to justice has become a major challenge in many court systems across the world.
Technology is increasingly being seen as a potential facilitator of access to justice,
particularly in terms of improving the justice sector's efficiency [1]. The key functions covered in court work are registration, indexing, and case follow-up. Case management is a significant success factor in the legal system. Citizens' social behavior reflects their trust in the ability of justice institutions and courts to offer fast and effective services, and a lack of trust undermines the role of these institutions in maintaining the rule of law. This emphasizes the need to conduct surveys and studies to track efforts and changes in the justice system, identify flaws, and put in place the appropriate measures and remedies [2].
One of the key areas of interest is the automation of judicial operations; several
issues have arisen in the pursuit of justice, including delays caused by misplaced case
files at the register when a reference should be made. The process of automating the
functioning of the judicial system has spread nearly all over the world in the wake of the
ICT revolution, as legal practice has improved in terms of technology, thanks to effective
case file management, ease of access and retrieval of information, organizational inte-
gration, and speed of justice [3]. As a result, the courts are increasingly under pressure
to keep up with technological advancements in order to provide quality service. Further-
more, public trust and confidence in judicial institutions require a focus on government
transparency [4]. Furthermore, the government should consider the confidence and satisfaction of government customers, both residents and enterprises, through openness, cost reduction, and easy access to government services.
Additionally, when creating interoperable information systems, e-Government agen-
cies should focus on factors that affect performance, such as availability, dependability,
standardization, flexibility, reaction time, and integration [5]. Enterprise interoperabil-
ity is a precondition for assuring collaboration in this scenario [6]. Interoperability is
defined as the ability of multiple types of information and communications technology
(ICT) systems to communicate and share data and information in a meaningful and
usable manner, according to Lallana [7]. When this capability is not developed, it becomes an issue that must be addressed: interoperability concerns arise wherever interoperability barriers exist. There are three types of barriers: conceptual, technological, and organizational. The primary conceptual hurdles are the syntactic and semantic incompatibility of the information to be exchanged. These problems arise when modeling at a high level of abstraction or at the information level. Content, syntactic, and semantic barriers are the three most common types of conceptual barriers. The technical obstacles deal with how
people communicate and share information using computers or ICT (Information and
Communication Technology). Organizational obstacles refer to incompatibility between
two businesses’ organizational structures and management techniques [8]. Indeed, a lack
of interoperability can have a substantial influence on business and network performance
and outcomes. Businesses should be aware of their strengths and weaknesses in terms
of interoperability in order to build such capabilities amongst systems [9]. As a result,
these obstacles should be removed in order to avoid interoperability issues.
The purpose of this study is to determine the elements that influence end-user pre-
paredness on an e-court, the structural relationship between preparedness characteristics
and e-court interoperability, and to develop a readiness model based on e-court end-user
readiness parameters for self-assessment of organizational IT adoption.
The following research questions are being addressed in this study:
1. What are the most important elements influencing end-user preparedness for e-court
interoperability?
2. How can the structural linkages between preparedness elements and e-court interoperability be characterized?
3. How can a readiness model for self-assessment evaluation in the organization be
developed based on the readiness criteria that influence users on the interoperability
of e-courts?
2 Literature Review
People's willingness to adopt and use new technologies to achieve goals at home and at work is referred to as technology readiness.
Court records are extremely important in the judicial system. Legal researchers, practi-
tioners, and policymakers mostly use them to make decisions. As a result, records man-
agement has grown in popularity, as a well-organized, efficient, and structured records
management system is crucial in ensuring that courts make fair decisions based on reli-
able data. The computerization of court records raises a number of challenges, some of
which may be specific to each country’s legal system. The generation, management, and
preservation of digital records has an impact on policies, standards, copyright, metadata,
and other technical considerations. Given the nature and importance of the court, due
process, impartiality, and independence should be carefully evaluated, even as the use of
technology reduces delays, improves economy, efficiency, and effectiveness, and encour-
ages confidence in the justice system. This is especially true when there are structural and
procedural changes, such as those brought on by new technology [11]. Countries’ expe-
riences in the International Records Management Trust (IRMT) research demonstrated
that a system needs its own robust legal framework to function with authority, trustwor-
thiness, and reliability. The legal and judicial record case studies have identified several
significant issues (IRMT) [12]: (1) the need to raise the status and priority of recordkeep-
ing; (2) the need to allocate greater resources to supporting recordkeeping infrastructure,
such as storage facilities and equipment; (3) the need to adopt records management policies and standards; (4) the understanding that computerized case management sys-
tems have the potential to improve case flow management and information access; (5)
the importance of developing an information strategy and business case based on the
needs of all key stakeholders before embarking on case administration computerization;
(6) the value of pilot computerization projects in building confidence and capacity; and
(7) the importance of standardized formats and templates for common documentation.
The benefits of such systems include cost savings and efficiency improvements, improved customer service, openness, anti-corruption and accountability, and improved decision-making quality.
The challenges include ICT infrastructure; lack of knowledge and resistance; change management; poor leadership in an organization; and materials and methods.
The research gap was identified through a review and analysis of previous publications and studies on this topic (see Fig. 1).
There is a lack of studies investigating the readiness factors that affect electronic court implementation from end-user perspectives, and no readiness model for electronic court interoperability has been developed in Malaysia.
3 Methodology
This involved a qualitative method, a literature analysis, and a thesis on e-Courts, e-justice systems, and e-litigation concerns regarding eCCMS readiness. The process includes
The variables in e-readiness and intention readiness models in the literature were exam-
ined in this study, which focused on an electronic court system and the e-government
sector. This study chose various parameters and variables based on a review of the
literature and previous investigations. Table 1 below summarizes all research variables.
Human conduct can be broken down into eleven variables. The strongest predictor
of intention to employ the target technology is performance expectation [13–15]. Effort
expectation is a significant variable that has been shown to have a direct and considerable
favorable impact on behavioral intention to utilize systems [15, 16]. Social influence
affects overall use intention, where it has a significantly positive impact on the behavioral
intention to use systems [14, 15, 17]. Facilitating factors influence behavioral intention
to utilize systems in a direct and meaningful way [14–16]. The cost and pricing structure
have a huge impact on customers' technology use [18]. It was found that experience and habit had an insignificant influence on behavioral intention to use systems [16]. Optimism
and awareness have a significant impact on behavioral intention to use e-file and optimism
is a motivator contributing to Technology Readiness (TR) [14, 19, 20]. As a result of the
research in previous studies, Innovativeness is considered as a motivator contributing to
TR [19, 20]. Discomfort is an inhibitor detracting from TR [19, 20]. Finally, Insecurity is
an inhibitor detracting from TR [19, 20]. Technological factors have four variables. ICT
Infrastructure is a significant variable, and it is one of the success factors responsible for
the effective and efficient implementation of e-Court [21, 22]. Limited accessibility is one of the key issues and challenges found to be significant in using ICT tools [1, 23, 24]. When users have greater access to technological resources, attitudes toward technology are more positive, and they tend to use technology to a greater extent.
The study found that consumers of e-learning systems confront numerous challenges, including a lack of trust in ICT services and the high cost of maintaining them. As a result, numerous aspects are required in this context for the use of the electronic court, the most important of which are technological and legal protection [21]. Lastly, in terms of the design and development factor, it was discovered that in order to obtain positive results in the design and execution of e-justice projects, it is critical to establish principles of involvement and cooperation with key stakeholders. In addition, the study demonstrated the need for coordination between different stakeholder groups and different types of actors during the design and implementation phase; the initial design phase and continuing development scrutiny are the key risk considerations [1, 25].
Table 1. Research factors and variables
(A) Human Behavior: A1 Performance expectancy; A2 Effort expectancy; A3 Social influence; A4 Facilitating conditions; A5 Price value; A6 Experience and habit; A7 Optimism; A8 Innovativeness; A9 Discomfort; A10 Insecurity
(B) Technological: B1 ICT infrastructure; B2 Accessibility and simplicity; B3 Security and trust; B4 Design and development
(C) Organizational: C1 Top management support; C2 Stakeholder training; C3 Awareness; C4 Strategy; C5 Funds and resources
(D) Legal: D1 Governmental regulations; D2 Procedure and standards unit; D3 IT standards
The organizational factor has five variables.
According to the literature review, the biggest challenge facing court administrators in general is to support the administration of justice and provide greater access to justice, which necessitates new skill sets as well as government will and motivated, supportive leaders [21, 26]. During the research, it was discovered that one of the many challenges faced by users of e-learning systems is that they lack sufficient skills to use information and communication technology. As a result, stakeholder training is one of the success factors responsible for the effective and efficient implementation of the e-court, and a lack of effective training is one of the main challenges [21, 23, 24]. In terms of the awareness factor, the introduction of information systems as a tool to assist the organizational structure changes the organization itself, so people in the organization must be aware of these changes; awareness of the electronic court is one of the most important factors that will enable members and litigants to use the
electronic court appropriately [21, 25, 27, 28]. This study also concluded that the primary
obstacles of the electronic court are uniformity, practice, technology, and strategy, and
that strategic planning is vital and extremely significant [3, 26]. The last variable in
organizational factors is funds and resources, which the study found are required to
broadcast ICT infrastructure and increase scaling levels, despite the fact that technology
and communications investments are costly [28, 29]. Legal factor has three variables; the
first is governmental regulations; The electronic court necessitates legislative oversight
of the use of advanced information technology in judicial proceedings, so the law has
to be updated to include and cover all new technology amendments and procedures,
and difficulties of working with old systems that are out of date [1, 30]. The second
legislative variable is procedure and standards unit; it is critical to enforce practice
and process consistency across the state [30]. The last variable is IT Standards; results
show the electronic court requires legislative regulation of the process of using modern
information technologies in court work, many elements, most notably technical and
legal protection, and legislative regulation of the process of using modern information
technologies in court work. The most pressing issue is the misalignment between what
technology can provide and the current state of technology regulation in the courts [3,
31]. This research contributes to the understanding of user’s engagement factors with e-
court application. The findings of this study could be improved by employing a case study
method to provide a more detailed explanation of the phenomenon. It is suggested that
the collected components and variables be examined, and that a case study be conducted
in a public organization to find uniform multi-perspective criteria for human behavioral,
technological, organizational, and legal challenges. This phase identifies multi-criteria
perspectives for readiness aspects that may influence eCCMS adoption and usage, as well
as building a new multi-perspective decision-making procedure based on the identified
challenges.
References
1. Lupo, G., Bailey, J.: Designing and implementing e-justice systems: some lessons learned
from EU and Canadian examples. Laws. 3(2), 353–387 (2014). https://doi.org/10.3390/law
s3020353
2. Fifth Legal Monitor Report (2018)
3. Saman, W.S., Haider, A.: E-Shariah: information and communication technologies for Shariah court management. Legal Inf. Manage. 13(2), 94–106 (2013)
4. Slowes, R.: Benefits of a modern court case management system. Thomson Reuters, pp. 1–6
(2012)
5. Sulehat, N.A., Taib, C.A.: e-Government information systems interoperability in developing
countries: the case of Jordan. J. Business and Social Rev. Emerging Economies 2(1), 49–60
(2016). https://doi.org/10.26710/jbsee.v2i1.18
6. Panetto, H., Zdravkovic, M., Jardim-Goncalves, R., Romero, D., Cecil, J., Mezgár, I.: New
perspectives for the future interoperable enterprise systems. Comput. Ind. 79, 47–63 (2016).
https://doi.org/10.1016/j.compind.2015.08.001
7. Lallana, E.: An Overview of ICT Policies and e-Strategies of Select Asian Economies. UNDP-
APDIP ICT4D Series. Elsevier (2004)
8. Vernadat, F.B.: Technical, Semantic and Organizational Issues of Enterprise Interoperability
and Networking (2010)
9. Leal, G., Guedria, W., Panetto, H.: A semi-automated system for interoperability assessment: an ontology-based approach. Enterprise Information Systems, Taylor & Francis, in press. https://doi.org/10.1080/17517575.2019.1678767. hal-02309347
10. Youssef, A.F.: Electronic Information Courts and Electronic Litigation. 1st Edition.
Alexandria- Egypt (2013)
11. Mosweu, T., Mosweu, O.: Electronic court records management systems: a review of literature
in selected African countries. Mousaion 36(4), 1–21 (2019)
12. International Records Management Trust (IRMT) (2004)
13. Shin, D.H.: Towards an understanding of the consumer acceptance of mobile wallet Original
Research Article. Comput. Hum. Behav. 25, 1343–1354 (2009). https://doi.org/10.1016/j.chb.
2009.06.001
14. Schaupp, L.C., Carter, L., McBride, M.E.: E-file adoption: a study of U.S. taxpayers’
intentions. Computers in Human Beh. 26(4), 636–644 (2010)
15. Chiu, Y.-T., Fang, S.-C., Tseng, C.-C.: Early versus potential adopters: exploring the
antecedents of use intention in the context of retail service innovations. Int. J. Retail
Distribution Manage. 38(6), 443–459 (2010). https://doi.org/10.1108/09590551011045357
16. Enaizan, O., et al.: Electronic medical record systems: decision support examination frame-
work for individual, security and privacy concerns using multi-perspective analysis. Heal.
Technol. 10(3), 795–822 (2020)
17. Keong, M.L., Ramayah, T., Kurnia, S., Chiun, L.M.: Explaining intention to use an enterprise
resource planning (ERP) system: an extension of the UTAUT model. Business Strategy Series
13(4), 173–180 (2012). https://doi.org/10.1108/17515631211246249
18. Venkatesh, V., Thong, J.Y.L., Xu, X.: Consumer acceptance and use of information technology: extending the unified theory of acceptance and use of technology. MIS Q. 36(1), 157–178 (2012)
19. Sani, O.G., Pesaran, B., Shanechi, M.M.: Modeling behaviorally relevant neural dynamics
enabled by preferential subspace identification (PSID). Nature Neuroscience 24, 808154
(2019)
20. Parasuraman, A., Colby, C.L.: An updated and streamlined technology readiness index: TRI
2.0. J. Service Res. 18(1), 59–74 (2015)
21. Singh, M., Sahu, G.P., Dwivedi, Y., Rana, N.P., Tamilmani, K.: Success factors for e-court
implementation at Allahabad High-Court. In: PACIS, p. 137 (2018)
22. Mandil, A.F.: Remote litigation: a legal study. Al-Qadisiyah University. Kufa Journal for
Legal and Political Science (2014). http://uokufa.edu.iq/
23. Al-Shboul, M.A.R., Barber, K.D., Garza-Reyes, J.A., Kumar, V., Abdi, M.R.: The effect of
supply chain management practices on supply chain and manufacturing firms’ performance.
J. Manuf. Technol. Manag. 28(5), 577–609 (2017). https://doi.org/10.1108/JMTM-11-2016-
0154
24. Ghavifekr, S., Kunjappan, T., Ramasamy, L., Anthony, A.: Teaching and learning with ICT
tools: issues and challenges from teachers’ perceptions. Malaysian Online J. Educ. Technol.
4(2), 38–57 (2016)
25. Rosa, J., Teixeira, C., Pinto, J.S.: Risk factors in e-justice information systems. Gov. Inf. Q.
30(3), 241–256 (2013)
26. Dillon, M., Beresford, D.: Electronic courts and the challenges in managing evidence; a view
from the inside the international criminal court. In: IJCA, Vol. 6, p. 29 (2014)
27. Upadhyay, M.H.: E-Courts in India and E-Judiciary in India. Int. Multidisciplinary Research
J. (7637: 7), pp. 2–5 (2015)
28. Sharma, C., Mitra, A.: Corruption, governance and firm performance: evidence from Indian enterprises. J. Policy Modeling 37(5), 835–851 (2015)
29. Ojo, A., Janowski, T., Estevez, E.: Semantic Interoperability Architecture for Electronic
Government, pp. 63–72 (2009). https://doi.org/10.1145/1556176.1556192
30. Abu Taleb, J.N.: Electronic Courts: Their Procedures and the Extent of their Legal Application
in Jordan, 1st edn. Alaan publishers and distributors, Amman, Jordan (2018)
31. Stefanov, S.: Some aspects of legal regulation of the project "electronic court" during its
implementation in the legal procedure of Ukraine. Evropsky politicky a pravni diskurz 3(1),
165–171 (2016)
On the Reusability of Machine Learning
Models in Edge Computing: A Statistical
Learning Approach
1 Introduction
Lee et al. [6] define compute reuse as ‘the partial or full utilization of already
executed computational task results by multiple users to complete a new task
while avoiding computation redundancy'. Systems that adopt compute reuse
benefit from significant performance gains, motivating model reuse in Machine
Learning (ML). Model reuse [14] attempts to construct a model from other pre-existing
models pre-trained for other tasks, in order to avoid building a model from
scratch. Exploiting pre-existing models can set a good basis for training a new
model, which translates into a reduction in the time, data, and expertise required
to train it. Moreover, model reuse has been used to tackle concept drift [13] and
to build ad-hoc analytic models [5].
2 Related Work
Compute reuse has been investigated in the context of edge computing by [6]
to quantify its gain. Experiments on edge-based applications showed that sys-
tems that adopt compute reuse can finish the same task up to five times faster.
Motivated by similar concerns, a theoretical paradigm named 'learnware' was
proposed by Zhou [14]. More specifically, a learnware is an ML model that is pre-trained
and achieves good performance, paired with a detailed specification. The
vision behind the paradigm is that learnware models can be shared in a pool
without their raw data, allowing users to identify pretrained models that satisfy
their requirements without concerns over privacy violations. Therefore, the
author identified three characteristics: reusable, evolvable and comprehensible
as fundamental for a model to be considered a learnware.
Based on this paradigm, the Reduced Kernel Mean Embedding (RKME) [12] was presented, i.e., a two-phase framework consisting of an upload and a deployment phase. During the upload phase, each model is paired with the Kernel Mean Embedding (KME) of its dataset and added to the pool of models. Then, in the deployment phase, either a single model or a combination of models is chosen based on the RKHS distance between the testing (target) mean embedding and the reduced (source) embedding of the pool models. In essence, the RKME's deployment phase is similar to the MMD statistic [3], since by quantifying the distance of the mean embeddings of two populations (source and target), it ensures that the target distribution is the same as the source.
In [14], the authors recognise transfer learning as a preliminary attempt at reusability. A two-stage framework dubbed Learning to Transfer (L2T) was presented [11], which exploits previous transfer learning experiences to optimize what and how to transfer between domains. In the first stage, each transfer learning experience is encoded into three parts, which are then utilised to learn a reflection function; this function approximates the performance improvement ratio and thus encodes the transfer learning skills of deciding what and how to transfer. The improvement ratio in this framework is the difference between domains calculated by MMD. In addition to the MMD between domains, the variance is also calculated, since a small MMD paired with an extremely high variance still indicates little overlap. During the second stage, whenever a new pair of domains arrives, L2T optimizes the knowledge to be transferred by maximising the value of the learned reflection function.
Model reuse has also been used to handle concept drift. The assumption that previous data contain some useful information indicates that the models corresponding to those data can be leveraged. Condor was proposed [13] as an approach to handle concept drift through model reuse. Condor consists of two modules, ModelUpdate and WeightUpdate, which leverage previous knowledge to build a new model, hence updating the model pool, and adapt the weights of previous models to reflect current reusability performance, respectively.
Hasani et al. [5] proposed a two-phase approach to build faster models for a popular class of analytic queries. Similar to the other approaches [11-13], there is a preprocessing and a runtime phase. During the first phase, the models, their statistics and some metadata are stored, while in the second phase relevant models are identified from which an approximate model is constructed. Their approach can achieve speed-ups of several orders of magnitude on very large datasets; however, it is only geared towards exploratory analysis purposes and is potentially less robust under concept drift.
Concerns over intellectual property (IP) infringement and vulnerability propagation of deep neural network (DNN) models motivated the proposal of ModelDiff [8], a testing-based approach to DNN model similarity comparison. The authors compare
the decision logic of models on the test inputs represented by a decision distance
vector (DDV), a newly defined data structure in which each value is the dis-
tance between the outputs of the model produced by two inputs. These inputs
are pairs of normal and corresponding adversarial samples and thus when used
to calculate the DDV, the decision boundary is captured.
Lee et al. [6] also discuss alternative approaches to compute reuse, and the corresponding challenges, including in networks. They identify that reuse can be achieved either in a distributed or a centralized manner. The distributed approach involves forwarding tasks to the compute reuse node that is responsible for the operation. This adds complexity to the forwarding operations of routers, resulting in a potential downgrade in performance. Reuse of results in a network setting undoubtedly improves performance; however, speeding up the estimation of parameters can also be beneficial in that regard. Nodes in a network can collaborate to estimate parameters, as discussed in [7]. More specifically, their method takes advantage of the joint sparsity of the vectors used for computations to enhance estimation performance. Joint sparsity simply means that the indexes of nonzero entries are the same for all nodes, but their values differ. The authors also adopt an intertask cooperation strategy to consider intertask similarities. Their method assumes that both the vectors of interest and their associated noise follow a zero-mean Gaussian distribution, which is a strong assumption on the data.
The contributions of this paper, which in parallel highlight its differences from other relevant efforts in the domain, are as follows:
– An online model reuse framework for edge computing consisting of two steps, a pair similarity detector (based on MMD) followed by a determination of the direction of model reusability (based on the inlier data space overlap).
– A decision-making algorithm which, given the results of the framework, maximises the number of nodes that do not require distinct models, along with a list of replacement models.
– Extensive experimental evaluation of the framework with both classification and regression models over real datasets.
3 Background
3.1 Maximum Mean Discrepancy
MMD is a statistic that quantifies the mean discrepancy of two data distributions in a kernel space in order to determine whether two samples are drawn from different distributions [3]. Let p and q be two independent probability distributions, and let E_x[f(x)] (shorthand notation for E_{x∼p}[f(x)]) denote the mathematical expectation of f(x) with x drawn under the probability density p. The statistic between p and q is defined as:
\[ \mathrm{MMD}(\mathcal{F}, p, q) = \sup_{f \in \mathcal{F}} \big( \mathbb{E}_{x}[f(x)] - \mathbb{E}_{y}[f(y)] \big) = \lVert \mu_p - \mu_q \rVert_{\mathcal{H}} \tag{1} \]
where the function class F is the unit ball in a reproducing kernel Hilbert space (RKHS) and μ_p, μ_q are the mean embeddings of p and q, respectively, i.e., the means of the feature mappings in the kernel space. The function class F is universal, meaning that MMD(F, p, q) = 0 if and only if p = q. Therefore, MMD is the largest difference in expectations over functions in F and can only be zero if the two samples were drawn from the same distribution.
In practice, we use the squared MMD in order to be able to use kernel functions. Let X = {x_1, ..., x_m} and Y = {y_1, ..., y_n} denote the independent and identically distributed (i.i.d.) samples from distributions p and q, respectively. An unbiased estimate of MMD² = ‖μ_p − μ_q‖²_H can be obtained using a U-statistic:
\[ \mathrm{MMD}^{2}(\mathcal{F}, p, q) = \frac{1}{m(m-1)} \sum_{i=1}^{m} \sum_{j \neq i} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} k(y_i, y_j) - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j) \tag{2} \]
where k(·,·) denotes the kernel function. In our model, we adopt the linear and Gaussian RBF kernels, defined as k(x, y) = xᵀy and k(x, y) = exp(−‖x − y‖² / (2σ²)), where σ ∈ ℝ is a kernel parameter and ‖x − y‖ is a dissimilarity measure (e.g., the Euclidean distance).
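For concreteness, the following is a minimal NumPy sketch (not the authors' implementation) of the unbiased estimator in Eq. (2) together with the two kernels above; the function names and the default value of σ are assumptions.

import numpy as np

def linear_kernel(A, B):
    # k(x, y) = x^T y
    return A @ B.T

def rbf_kernel(A, B, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); sigma=1.0 is an assumed default
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, kernel=rbf_kernel):
    # X: (m, d) sample from p, Y: (n, d) sample from q
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = kernel(X, X), kernel(Y, Y), kernel(X, Y)
    # the within-sample sums exclude the diagonal (j != i), as in the U-statistic of Eq. (2)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.sum() / (m * n)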
3.2 One-Class Support Vector Machines
One-Class SVMs (OCSVMs) [10] learn a boundary around the normal data region; data points that fall outside the normal data region are going to be classified as outliers. OCSVMs utilize an implicit transformation function φ(·), defined by the kernel, to project data to a higher-dimensional space. The algorithm learns the decision boundary (a hyperplane) which achieves the maximum separation of the majority of data points. Only a small fraction of data points are allowed to lie on the other side of the decision boundary, and those points are considered outliers.
The OCSVM returns a function f that takes the value +1 for the normal
region and −1 elsewhere. This function f is called a decision function being
defined as f(x) = sign(g(x)) = sign(wᵀφ(x) − ρ), where w is the vector perpendicular to the decision boundary (g(x) = 0) and ρ is the bias. Given that the distance of any arbitrary data point to the decision boundary can be calculated by d(x) = |g(x)| / ‖w‖, and that the origin's value when plugged into g(x) is −ρ, the distance of the origin to the decision boundary is ρ / ‖w‖. The OCSVM essentially attempts to maximise this distance by solving the minimisation problem of ‖w‖²/2 − ρ, i.e.,
\[ \min_{w,\ \xi \in \mathbb{R}^{N},\ \rho \in \mathbb{R}} \ \frac{\lVert w \rVert^{2}}{2} - \rho + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i \tag{3} \]
where ξ_i is the slack variable for a point i, which allows it to lie on the other side of the decision boundary, n is the size of the training dataset and ν ∈ (0, 1) is a regularization parameter. As shown in (3), the objective is not only to maximise the distance of the origin to the decision boundary but also to minimise the slack variables ξ_i for all points. ν represents an upper bound on the fraction of outliers and a lower bound on the fraction of support vectors. In other words, ν specifies the fraction of training points which are guaranteed to be misclassified and the fraction of training examples that become support vectors. As mentioned above, ν ∈ (0, 1) and is therefore a percentage, where a high value may lead to over-fitting and a low value to under-fitting. ν controls the trade-off between ξ and ρ.
To reduce the number of variables to a single vector and to utilise the kernel trick, the primal objective is transformed into the dual objective:
\[ \min_{a} \ \frac{a^{\mathsf{T}} Q a}{2} \quad \text{subject to} \quad 0 \le a_i \le \frac{1}{\nu n}, \quad \sum_{i=1}^{n} a_i = 1 \tag{4} \]
where Q is the kernel matrix and a are the Lagrange multipliers. Now, the decision function becomes:
\[ f(x) = \mathrm{sign}\Big( \sum_{i=1}^{n} a_i\, k(x, x_i) \Big) \tag{5} \]
Once we identify the similar pairs in the network, we can then calculate the OCSVM score of each node in each pair and hence determine the direction of reusability per pair. The OCSVM score is essentially the probability of detecting the inliers of one node by using the other node's model. Therefore, given two nodes x and y and their corresponding OCSVM models, we use each OCSVM model to predict the other node's inliers, and then we calculate the number of points identified as inliers divided by the number of points in the dataset, hence a probability. The reason we divide by the number of points in the dataset is that we expect some form of filtering to have been applied beforehand to remove any outliers, hence all the points in the dataset are inliers. We calculate the OCSVM score for both directions, and whichever is higher indicates the node for which we should train the model. Algorithm 3 calculates the OCSVM scores of each node per pair.
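As an illustrative sketch (assumed, not the authors' Algorithm 3), the OCSVM score for a pair of nodes can be computed with scikit-learn's OneClassSVM as follows; the ν and kernel values are placeholder assumptions, and each node's data is assumed to be pre-filtered to inliers.

import numpy as np
from sklearn.svm import OneClassSVM

def fit_ocsvm(data, nu=0.1, kernel="rbf", gamma="scale"):
    # one OCSVM per node, trained on that node's (inlier-only) data
    return OneClassSVM(nu=nu, kernel=kernel, gamma=gamma).fit(data)

def ocsvm_score(model, other_data):
    # fraction of the other node's points predicted as inliers (+1) by this node's model
    return np.mean(model.predict(other_data) == 1)

def direction_of_reusability(data_x, data_y):
    model_x, model_y = fit_ocsvm(data_x), fit_ocsvm(data_y)
    score_x = ocsvm_score(model_x, data_y)   # how well x's model covers y's inliers
    score_y = ocsvm_score(model_y, data_x)   # how well y's model covers x's inliers
    return ("x", score_x) if score_x >= score_y else ("y", score_y)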
The framework presented up to this point operates on the node level; however, in order to unify the information at the network level, we propose a naive decision-making algorithm (Algorithm 4). The algorithm provides the user with information about which nodes do not require distinct models and the respective potential replacement models. The algorithm is naive, and thus simple; its aim is to find the maximum number of nodes for which we do not train a model. Nevertheless, the algorithm does not take into account any performance-optimising considerations.
A visual representation of the framework being applied to a network is shown
in Fig. 1.
Fig. 1. Example of a network where the framework is applied. The letters N, D and M followed by a number stand for node, dataset and model, respectively.
Algorithm 4: Finds nodes that can use a reused model, along with a list of replacements, based on the results of the framework
Data: pair_results: dictionary associating each pair with the node whose model is to be reused, i.e. the direction of reusability; nodes: the list of nodes from the MMD-identified pairs
Result: mns: modelless nodes, i.e. nodes that do not require that a model is trained for them; model_mns: associates each modelless node (mn) with a list of potential replacement node models
begin
    similar_pairs ← pair_results.keys()
    mns ← nodes.copy(); model_mns ← {}
    for node in nodes do
        model_mns[node] ← []
    end
    for node in nodes do
        node_similar_pairs ← get_node_similar_pairs(node, similar_pairs)
        for (x, y) in node_similar_pairs do
            model_node ← pair_results[(x, y)]
            mn ← difference(model_node, (x, y))
            if model_node in mns then
                mns.remove(model_node)
            end
            model_mns[mn].append(model_node)
            // ensures we do not encounter the pair again
            similar_pairs.pop((x, y))
        end
    end
    // Remove replacement options for an mn that can be replaced themselves
    for node in nodes do
        if model_mns[node].count() > 1 then
            for model_node in model_mns[node] do
                if model_mns[node] is not empty then
                    model_mns[node].remove(model_node)
                    mns.append(model_node)
                end
            end
        end
    end
end
5 Experimental Evaluation
Datasets. We have evaluated our framework for both regression and classifica-
tion models. For regression, we have used the GNFUV Unmanned Surface
Vehicles Sensor Data Set [4] which includes data from three experiments.
In each experiment there are four sets of mobile sensor readings (humidity and temperature) recorded by the Raspberry Pis corresponding to four Unmanned Surface Vehicles (USVs) (see Fig. 3).
For classification, we have used the UCI Bank Marketing Dataset (BM) [9]. The data was collected by a banking institution through phone calls as part of a direct marketing campaign. It is a binary classification dataset with classes 'yes' and 'no', indicating whether the client subscribed to the product (a bank term deposit). More specifically, there are 4640 'yes' instances and 36548 'no' instances.
We have applied Principal Component Analysis (PCA) to reduce the number of dimensions of the dataset from 20 to 3 and subsequently used these data to execute the hypothesis testing.
In comparison to the GNFUV dataset, the BM dataset has no inherent network-node-like structure, and hence one was constructed. We trained a K-means clustering model with an equal number of 'yes' and 'no' instances to split the data into four clusters; this was done to prevent class imbalance from influencing the clustering algorithm. However, we wanted more available samples to split into more nodes, so instead of clustering equal amounts of instances per class, we used three times as many 'no' instances as 'yes' instances. We merged three of these clusters into one (clusters 1, 2 and 4 in Fig. 2) and then created 5 nodes from the two resulting clusters.
It is worth mentioning that we have used two data configurations per dataset. For the GNFUV dataset, the two configurations were the original data and a standardised version of them. For the BM dataset, we used the node data created from the aforementioned process as well as a balanced version of them, obtained by undersampling the majority class ('no') to have an equal number of instances as the minority class ('yes').
Lastly, we have drawn 100 unique samples per network, in each of which the node data have an equal number of examples in order to comply with the MMD implementation constraint discussed in Sect. 4. The sample size of each node dataset is determined by the Minimum Sample Size (MSS), i.e., defined by the node with the minimum number of entries. The source code is available for reproducibility at https://github.com/XeniaSkotti/online_model_reuse_framework_edge_computing.
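A small sketch (assumed, not the released code) of drawing one such equally sized sample per node based on the MSS is the following; the dictionary layout of node_datasets is an assumption.

import numpy as np

def draw_network_sample(node_datasets, rng):
    # node_datasets: dict mapping node id -> (k, d) array; MSS = size of the smallest node dataset
    mss = min(len(data) for data in node_datasets.values())
    return {node: data[rng.choice(len(data), size=mss, replace=False)]
            for node, data in node_datasets.items()}

# e.g., 100 unique samples per network:
# samples = [draw_network_sample(node_datasets, np.random.default_rng(s)) for s in range(100)]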
Fig. 3. The relationship between humidity and temperature per experiment alongside
their distribution plots for the original GNFUV data.
The approach to identifying the similar and other nodes differed for each dataset due to its nature. Since GNFUV is a regression dataset of only two dimensions, we plotted the points of each experiment and visually identified the pairs which we deemed similar per experiment. Then we used Algorithms 1 and 2 to confirm our inferences; otherwise we adjusted the similar and other node sets. For the BM dataset, the similar nodes are either the nodes of the newly merged cluster or those of cluster 3. Similarly, we tested both possible similar node sets for each data configuration (balanced and unbalanced) to determine which one was best.
Once we had an initial idea of the similar and other nodes sets, we could then
use them to determine the kernel and bandwidth. The two kernels we considered
were the Radial Basis Function (rbf) and Linear kernels. We aimed to choose
the parameters which would most effectively separate the similar from dissimilar
pairs. The full parameter configuration of each dataset (experiment) and data
configuration is found in Table 1.
ML Models. For each problem type we chose a distinct model, namely Support Vector Regression (SVR) for regression and Logistic Regression (LR) for classification.
Starting with regression, we have trained SVRs to capture the relationship between the humidity and temperature attributes of the dataset. SVR is a version of the SVM for regression proposed by Drucker et al. [2]. SVRs have a few hyperparameters that should be optimised for each node model. First, we experiment with both the linear and rbf kernels in order to evaluate how different kernels interact with our framework. Moreover, we optimise the regularization parameter and the epsilon of the epsilon-SVR model using grid search on a node's dataset, to ensure we find the best ε-insensitive region for the data. It is worth noting that the SVR implementation in scikit-learn reports the performance of the regressor in terms of the coefficient of determination (R²).
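A minimal sketch (assumed, not the authors' exact grid) of tuning such an epsilon-SVR per node with scikit-learn's grid search; the grid values are placeholder assumptions.

from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

param_grid = {
    "kernel": ["linear", "rbf"],          # both kernels are evaluated against the framework
    "C": [0.1, 1.0, 10.0, 100.0],         # regularisation parameter (grid values assumed)
    "epsilon": [0.01, 0.05, 0.1, 0.5],    # width of the epsilon-insensitive region
}
search = GridSearchCV(SVR(), param_grid, scoring="r2", cv=5)
# search.fit(humidity, temperature)       # per-node humidity -> temperature fit (placeholders)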
Our classification dataset has two classes, 'yes' and 'no', and we have used LR [1] specifically because it is usually a good baseline for binary classification. The scikit-learn implementation of LR reports performance in terms of the mean accuracy on the test dataset. As mentioned in Sect. 5.1, for the BM dataset we experiment with two data configurations, one in which the data are balanced and another in which they are not. For the unbalanced case, we set the class_weight parameter of LR to 'balanced' to deal with the imbalance. The other parameter which we control for both data configurations is the regularization parameter. Lastly, the scikit-learn implementation offers a variety of solver options, hence we optimise the solver as well. Table 2 details which parameters were fixed and which were optimised for each classifier.
Table 2. Classifier parameter values that are fixed and optimised per dataset.
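As an assumed illustration of the LR configuration just described (not the paper's exact parameter values):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],               # regularisation strength (grid values assumed)
    "solver": ["lbfgs", "liblinear", "saga"],  # the solver is optimised as well
}
# class_weight='balanced' compensates for the unbalanced BM configuration
lr = LogisticRegression(class_weight="balanced", max_iter=1000)
search = GridSearchCV(lr, param_grid, scoring="accuracy", cv=5)
# search.fit(X_node, y_node)                   # a node's training data (placeholders)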
As discussed in the previous Sect. 5.1, we assess the framework across two met-
rics, precision and speedup. In this section we evaluate these metric results one
by one for each dataset and provide a discussion around the effectiveness of the
framework.
In the following sections we discuss the precision results across the three lev-
els, followed by speedup. We will analyse each dataset’s precision individually
and then discuss the speedup across both datasets simultaneously. More specifi-
cally, in the case of the GNFUV dataset precision, we will provide observations
for each experiment before drawing general conclusions using the metric results.
Finally, we will draw some general conclusions on the applicability of the frame-
work in regression, the effect of the kernel and standardisation of data. Similarly,
for the BM dataset precision we will draw conclusions for the dataset, the appli-
cability of the framework in classification and the effect of using balanced and
unbalanced data.
Regression Precision. Original Data: Starting off with the GNFUV original
data, the combined precision is almost 1 for Experiment 1 and Experiment 3 if
we allow a 0.05 margin of tolerance in terms of the OCSVM predictions (non-
strict - as discussed in Sect. 5.1). The combined precision falls to 0.69 when we are
strict about the predictions because of Experiment 3. The combined precision for
Experiment 2 is low but that’s expected considering what we discussed above.
Therefore, the framework, when the threshold is set to 0.8, has a combined precision of 0.59 with no tolerance, which increases to 0.77 when tolerance is allowed. These results are illustrated in Table 3. If we analyse combined precision per kernel, the linear kernel is better suited for the original data across all three experiments. Similar trends to those discussed, whether or not we distinguish per kernel, can be found in the MMD precision and OCSVM precision (Table 4), with MMD precision at 0.78 when the threshold is 0.8, and OCSVM precision at 0.79 when we are strict and 0.97 when we are not. It is worth noting that the OCSVM precision for Experiment 2 when the kernel is linear is as high (almost 1) as for the other two experiments, which illustrates the importance of the kernel choice. Upon further analysis, the linear kernel yields the best results on average for the original GNFUV data, hence the framework's high precision overall.
Standardised Data: The comments made previously about the combined precision of Experiment 3 under the strict setting cease to be true and instead hold for Experiment 1, whose combined precision is low. Nevertheless, similarly to the original data, the combined precision is extremely high for Experiments 1 and 3 when we are not strict with the OCSVM. The Experiment 2 combined precision is almost half of what it is for the original data at the 0.8 threshold. Consequently, the overall combined precision of the framework at the 0.8 threshold drops to 0.45 and 0.61 when we are strict and non-strict, respectively (Table 3). Contrary to the original data, where the combined precision per kernel showed that the linear kernel is better suited, for the standardised data the opposite is true, although the difference is not significant. This is also true for the OCSVM precision when analysed per kernel. Overall, the OCSVM precision for Experiment 2 drops (Table 4), hence the OCSVM weighted average precision across experiments drops by 30%. On the other hand, MMD precision increases slightly, by 4%, due to an increase in the precision of Experiment 2. Upon further analysis per kernel, the MMD precision increased by 15% per kernel, with the linear kernel providing much better results.
Speedup per dataset and data configuration:

Dataset: GNFUV
Experiment          Standardised   Original
1                   0.23           0.26
2                   0.30           0.28
3                   0.24           0.23
Weighted average    0.26           0.26

Dataset: BM
Unbalanced: 0.29    Balanced: 0.41
Speedup. Overall, the speedup of the framework for the particular datasets used for regression and classification is 26%, and 29% to 41%, respectively (Table 7). These results are expected if one considers that for the GNFUV dataset, regardless of the data configuration, on average there is one good pair for reusability, hence one node's model is not trained. The two data clusters created from the BM dataset mean that ideally we would only train two models. Nevertheless, the results are lower than this average case due to the fact that we use samples of the dataset, hence the true reusability differs from sample to sample. Therefore, for both the classification and regression cases we can argue that the framework is effective in identifying the true number of similar pairs.
In this paper, we presented a novel online model reuse framework in edge com-
puting. The framework considers all possible pairs of nodes in the network and infers which are good reusability pairs, as well as which of the two nodes' models can be used as a replacement for the other in each pair. We utilise MMD as
our dataset similarity measure and we present a newly defined algorithm which
calculates a threshold that distinguishes similar from non-similar pairs. The node
model that is chosen to be reused in each pair is the one with the highest inlier
data space overlap. Experiments in the context of both regression and classifi-
cation have shown the framework achieves good precision. Lastly, we present an
algorithm that, given the results of the framework, can maximise the number of
nodes which use reused models along with a list of potential replacement models.
The framework presented is novel, and therefore the results presented in this paper, while encouraging, are still preliminary. We experimented with only one model per data domain and a limited range of data configurations. Consequently, the evaluation of the framework needs to be extended to check its compatibility with more domain models and data configurations. Even though this framework in its current form does not preserve user privacy, it could be amended to meet this requirement. In this paper, we hypothesise that the inlier space overlap is an indicator of the direction of reusability. However, we only consider one outlier detection model, and there are many more that could be used. Furthermore, the naive decision-making algorithm proposed as part of the framework maximises the speedup, which does not guarantee that the solution is optimal performance-wise. Defining an algorithm which can produce either the performance-optimal or a partially optimal solution is a different and challenging task altogether.
Acknowledgment. This research has received funding from the European Union’s
Horizon 2020 research and innovation programme under Grant Agreement no.
101037247.
References
1. Cramer, J.S.: The origins of logistic regression. Tinbergen Institute, Tinbergen
Institute Discussion Papers, 01 (2002)
2. Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A., Vapnik, V.: Support vector regression machines. In: Proceedings of the 9th International Conference on Neural Information Processing Systems, NIPS 1996, pp. 155–161. MIT Press, Cambridge, MA, USA (1996)
3. Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel
two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012)
4. Harth, N., Anagnostopoulos, C.: Edge-centric efficient regression analytics. In: 2018
IEEE International Conference on Edge Computing (EDGE), pp. 93–100 (2018)
5. Hasani, S., Thirumuruganathan, S., Asudeh, A., Koudas, N., Das, G.: Efficient
construction of approximate ad-hoc ML models through materialization and reuse.
Proc. VLDB Endow. 11(11), 1468–1481 (2018)
6. Lee, J., Mtibaa, A., Mastorakis, S.: A case for compute reuse in future edge systems:
an empirical study. In: 2019 IEEE Globecom Workshops (GC Wkshps), pp. 1–6
(2019)
7. Li, C., Huang, S., Liu, Y., Zhang, Z.: Distributed jointly sparse multitask learning
over networks. IEEE Trans. Cybern. 48(1), 151–164 (2018)
8. Li, Y., Zhang, Z., Liu, B., Yang, Z., Liu, Y.: ModelDiff: testing-based DNN similarity comparison for model reuse detection. In: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021, pp. 139–151. Association for Computing Machinery, New York, NY, USA (2021)
9. Moro, S., Cortez, P., Rita, P.: A data-driven approach to predict the success of
bank telemarketing. Decis. Support Syst. 62, 22–31 (2014)
10. Schölkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., Platt, J.: Support vector method for novelty detection. In: Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS 1999, pp. 582–588. MIT Press, Cambridge, MA, USA (1999)
11. Wei, Y., Zhang, Y., Huang, J., Yang, Q.: Transfer learning via learning to transfer.
In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference
on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, pp.
5085–5094. PMLR, 10–15 July 2018
12. Wu, X.-Z., Xu, W., Liu, S., Zhou, Z.-H.: Model reuse with reduced kernel mean
embedding specification. arXiv preprint arXiv:2001.07135 (2020)
13. Zhao, P., Cai, L.-W., Zhou, Z.-H.: Handling concept drift via model reuse. Mach.
Learn. 109(3), 533–568 (2019). https://doi.org/10.1007/s10994-019-05835-w
14. Zhou, Z.-H.: Learnware: on the future of machine learning. Front. Comp. Sci. 10(4),
589–590 (2016). https://doi.org/10.1007/s11704-016-6906-3
Survey of Technology-Enhanced Learning:
Novel Pedagogical Concepts, Challenges
and Future Perspectives
1 Introduction
The education sector is becoming increasingly integrated with Information Technology (IT) regarding the distribution, communication, and collaboration of learning and training materials. IT support involves provisioning of infrastructure (such as hardware, software, networking, and storage), computing technologies, and the people who work, or will work, with these technologies connected over the Internet.
Many education systems have IT departments to deal with and manage technological advances in computers, networks, and other business areas. IT in education helps promote knowledge-dissemination opportunities for instructors and learners and helps in implementing the latest technological innovations [1]. Nowadays, IT has made the learning process more efficient and beneficial by offering IT support for institutions, universities, and schools through software, servers, and storage, conclusively facilitating the e-learning approach. This has increased the well-being of learners, who can take advantage of technological training methods and are able to exchange books for electronic devices (tablets and laptops). The emergence and adoption of e-learning platforms has further simplified the process. Students who cannot come to institutions due to urgent reasons and face difficulty in connecting with their instructors in the classroom find e-platforms to be the best option for learning. These platforms give learners the chance to review courses at any moment, with modest and more concise explanations; this strengthens the learning process and, for most learners, brings better results [2].
Technology-enhanced learning (TEL) is playing a vital role in distance education as well as in regular-mode online learning. It has emerged as a guide for one or more learners located at geographically distant locations. TEL is used by educational institutions to support their learning process and to provide training materials and information in real time, irrespective of locale. TEL includes several hands-on implementations of many learning technologies and facilitates the exploration of various processes for designing, implementing, and critiquing the concept of e-learning, inducing the pioneering notions underpinning such processes [3, 4]. Virtual learning and training are being provided extensively with the advent of novel technologies, offering many benefits such as extensibility, diversification, scalability, etc. [3].
In India, the growth of e-learning in the last few years has been an important catalyst. Owing to the present COVID-19 scenario, TEL has completely engulfed the education sector. There are different platforms (Moodle Cloud, Canvas, Blackboard, etc.) that offer real-time classroom environments. Several online degrees and certifications (through online learning platforms such as Coursera, edX, Udemy) are feasible by virtue of TEL. Not only this, but most Indian companies are also adopting e-learning platforms because learning has become a strategic necessity for employees. Large companies are adopting or engaging with e-training platforms to provide their employees with short-term courses, certifications, and capacity-based training [5].
In recent years, the concept of Technology-Enhanced Learning (TEL) has gained significant prominence. There are technology-based training and instruction systems through which students acquire skills or knowledge, usually with the help of teachers or facilitators, learning tools, and technical resources [6]. However, being open to a very wide range of interpretations, TEL is not limited to any particular technical or educational approach. The use of technology in all situations where it plays a key role in formulating training brings productivity, effectiveness, and enjoyable learning.
A variety of technologies can be used to enhance training. In its broad sense, "technology" can include both hardware, such as collaborative whiteboards, smart tables, handheld technology, and sculptural objects, and software, such as hardware-aided learning systems, Training Management Systems (TMS), simulation modelling tools, electronic learning materials and science statistics, educational games, Web 2.0 social applications, 3D virtual reality, etc. Specific examples are technologically advanced and creative small machines such as MIT's OLPC (One Laptop per Child) or Intel's Classmate PC.
Edge computing has emerged as a new concept in the computing landscape. Edge computing directs cloud computing data, applications, and services to the edge of a network, away from the cloud server. It brings cloud computing services and facilities closer to the end user and is characterized by more rapid processing and faster response times [9]. Content providers and application developers can use the edge computing system to offer services to their immediate users.
According to IBM, edge computing is described as a "distributed computing framework that brings enterprise applications closer to data sources such as Internet of Things (IoT) devices or local edge servers. Edge computing's proximity to resources and data can deliver significant business benefits: faster insights, improved reaction times, and better bandwidth availability" [10].
Fig. 1. Relationship between Edge Computing, Fog Computing and Cloud Computing
Edge computing is an innovative extension of cloud computing that involves bringing services closer to destination users and reduces latency. Edge computing provides cloud resources and services within the edge network and minimizes the load on the cloud. Figure 1 shows the relationship between edge computing, cloud computing, and fog computing using technologies and IoT elements such as devices, nodes, and data centers.
Fog computing represents the network links between edge computing devices and the cloud, while edge computing refers specifically to computing processes performed near edge devices. Therefore, fog computing involves edge computing as well as the networks that are needed to send the processed data to the destination. In other words, it is a standard that explains how edge computing should work. The fog computing model creates a faster control loop, as data is processed on or near the device. Compared to traditional cloud-based networks, edge computing helps institutions break down established boundaries with a new approach to network architecture. The exciting promise offered by edge and IoT devices enhances the ability to process data collected close to the source [11].
There are some apparent benefits of edge computing over traditional technologies like cloud computing that give it superiority (particularly in provisioning TEL), as listed below and depicted in Fig. 2 [12]:
1. Promotes interaction between students and faculty in and out of the classroom.
2. Active Learning: Active learning is a conceptual framework that captures the current trend towards skills and training as opposed to classical knowledge learning. It is encouraged in the classroom using structured exercises, challenging discussions, team projects, and collaborative critiques.
3. Feedback Oriented: Digital feedback is advancing more quickly with the advent of learning analytics. Data can now be collected unobtrusively during learning activities. Learners need timely feedback on their performance to benefit from the courses.
4. It is important for students and professionals alike to learn to use time wisely.
(a) Human developmental regularities, which include the conditions for the devel-
opment of cognitive processes, the conditions for sensory development, as well
as the conditions for socio-emotional development.
(b) The taxonomy of the educational process, which includes the goals to be
achieved and the regularities of the learning process needed to achieve these
goals.
(c) Technological progress, which entails the need for changes in teachers’ ped-
agogical competence, where one of the most important components of this
competence is predictive analytical competence.
such as smartphones, tablets, laptops, etc. through which one can study subjects of
one’s choice. TEL implies that students and institutions are increasingly able to follow
specific areas of study, unbundled from complete programs and degrees. Comparatively,
in traditional learning, students are gathered under a roof at a specific time and specific
place. The teaching style of the traditional education system is teacher-driven. Learners discuss with their peers to clear their doubts or interact with the instructor after class to do the same. Subsequently, the knowledge attained by the learner depends on the knowledge of the instructor [18]. Table 1 lists the comparisons between TEL and traditional learning.
TEL supports the use of technology for training and learning, with a focus on improving the quantity and quality of knowledge in learning. The increasingly learner-centric evolution of learning parallels the rapid changes in TEL; there have been important modifications in the roles and duties of teachers and in education procedures. Students are supported in their education under guided expertise and can get educated without time and place constraints by utilizing these technologies. This helps in reducing the limitations students face in classrooms. Additionally, emerging IT innovations aid in offering various options for the development and growth of TEL [19]. Educators need to be able to manage a variety of e-learning programs; they need to change the classroom from a fixed
• IT helps teachers and administrators keep an eye on all students in the classroom
• IT has made education easier for both teachers and learners
• Education using digital books
• IT has made education fun and entertaining
• IT has made it easy to access research and information
• IT has made group study and application easier
TEL is the most important and latest trend in the present ICT era. TEL is transforming and improving learning and educational institutions beyond recognition through the different types of educational software that are taking over learning. As the education sector learns to harness the power of classroom devices, edge computing is having a profound effect on classrooms.
Edge computing technology enhances educational applications and provides a platform for speed rather than slowing them down or shutting them off. It is a technology set for high future growth, and it will dramatically improve day-to-day operations for many industries, including education.
Edge computing decentralizes computing resources and brings them closer to the data source. When schools use edge computing, they prioritize connectivity and networking across multiple campuses to eliminate slow speeds, which dramatically improves the student and teacher experience [24]. With edge networks, computing and data storage sit close to the person, application, or device producing the data [25].
Edge computing is already improving several higher education applications, such as the quality of experience for end users and network traffic management. Three learning and technology experts share how edge computing gives classroom education a boost [26]: (a) augmented and virtual reality, (b) the Internet of Things, and (c) student outcomes.
Edge computing continues to evolve, using new technologies and practices to enhance its capabilities and performance. Where edge computing is often situation-specific today, the technology is expected to become more ubiquitous and to change the way the Internet is used, bringing more abstraction and potential uses to edge technology. Wireless communication technologies, such as 5G, will also impact the deployment and usability of edge computing in the coming years, enabling virtualization and automation capabilities that are yet to be explored, such as migrating vehicle-autonomy workloads to the edge, while making wireless networks more flexible and cost-effective [27].
With TEL, teachers are no longer limited to the textbooks that their organizations pro-
vide. Using other resources such as video, audio, and interactive learning, learners have
many ways to learn. Educators can find creative ways to teach their students in an inter-
esting way. Technology has changed the learning environment so that learning is more
hands-on [29].
properly, we can say that the learning process will be positively affected. Technology-
enhanced learning environments not only encourage the transfer of content but also
support the use of robust re-evaluation methods. These environments are directed towards
the active participation of teachers and students and interaction between them. The
use of a technology-intensive learning environment contributes to the development of
students’ analytical thinking and problem-solving skills. It also allows teachers to follow
the learner’s position, organize the feedback system, and monitor their own situation.
In this paper, we have presented the pedagogical approaches of technology-enhanced learning and highlighted the key roles played by teachers, learners, edge computing, and information technology in TEL.
Edge computing directs cloud computing data, applications, and services to the edge of a network, away from the cloud server. It brings cloud computing services and facilities closer to the end user and is characterized by faster processing and faster response times. Edge computing envisions bringing cloud computing services and utilities closer to the end user to ensure fast processing of data-intensive applications. This paper has considered the essential concepts related to edge computing, presenting how edge computing can be used to enable TEL in education. It also summarizes an analysis of possible challenges in offering TEL through edge computing.
Future research can be directed towards offering knowledgeable material in offline and online modes so that future researchers can enhance their knowledge irrespective of their location and time zone. Additionally, predictive analytics can be integrated with TEL such that student performance can be analysed, and instructors can also utilise the predicted data for improving their teaching approaches or pedagogies.
Currently, the Information and Communication Technology (ICT) sector is witnessing an important reliance on novel IT technologies such as cloud computing, fog computing and edge computing. Many ICT sectors are based upon these technologies or an amalgamation of them. The TEL sector can involve various educational frameworks and models based on such impeccable underlying technologies.
Moreover, the world today is facing severe environmental sustainability issues pertaining to rising global warming and carbon emissions. The IT sector is regarded as a prime emitter of carbon and greenhouse gases (GHG). TEL, being enabled through IT, must be capable of supporting environmental sustainability goals. The United Nations has also emphasized enhancing environmental sustainability through TEL. TEL can be made greener through efficient, technology-based green resource management, optimized to support a green and resource-aware learning environment and promote sustainable education for a sustainable future.
References
1. Srinivasan, A., Quadir, M.A., Vijayakumar, V.: Hybrid cloud for the educational sector.
Procedia Computer Sci. 50, 37–41 (2015)
2. Li, H., Ota, K., Dong, M.: Learning IoT on edge: deep learning for the Internet of Things
with edge computing. IEEE Network 32(1), 96–101 (2018)
3. Selviandro, N., Hasibuan, Z.A.: Cloud-based e-learning: a proposed model and benefits by
using e-learning based on cloud computing for educational institutions. In: Information and
1 Introduction
Each year, many young individuals (and their parents) throughout the world
face a burning question: “Which university or college to attend? How much
tuition will I pay? Will they help me get a job? Is it worth it?” Considering
the magnitude of this decision and its impact on the prospective students, it
is reasonable to examine as many schools as possible before deciding on which
school to attend. With over 5000 schools to choose from in the United States
alone, the task of fully investigating each school’s alignment with that of the
student’s goals and desires can be daunting and unrealistic. There are far too
many institutions to examine and not enough time for anyone to review all of
them to make the most beneficial decision possible.
To address the above challenge, this paper delves into social computing and presents True-Ed Select, a machine learning framework to facilitate university selection.
2 Related Work
1 The terms ‘university’, ‘college’, ‘institution’, and ‘school’ refer to a post-secondary educational institution and may be used interchangeably.
all of which highly concern the users. The CDSSM component handles the user’s
subjective natural language input to produce likely matches between the user
and the institutions. Our framework’s ability to analyze both the objective and
subjective attributes enables it to offer a well-rounded recommendation to the
end-users.
3 Background
In this section, we provide background on the content-based filtering machine learning (ML) and the convolutional deep semantic similarity model (CDSSM) employed in this research.
3.2 CDSSM
The Convolutional Deep Semantic Similarity Model (CDSSM) [4,13] uses a series
of machine learning and natural language processing techniques to find similarity
between a user’s document and existing documents. The CDSSM process can be
broken into four distinct stages: 1) user data collection, 2) data vectorization, 3)
convolution semantic layer, and 4) cosine similarity.
In the first stage, the model collects the user data and the existing documents
(from a defined database) for comparison. In the second stage, the model pro-
duces word-n-grams, obtained from a sliding contextual window run across input
word sequences. Letter-trigrams are further produced from each word-n-gram as
a letter-trigram vector. CDSSM concatenates the letter-trigrams of each word to yield a trigram representation for each word-n-gram in the word sequence. In the third stage, a convolution operation extracts contextual information from the word-n-gram sentence structure. CDSSM then applies max pooling to downsample the features in the word-n-grams to produce fixed-length feature vectors across all the dimensions. Next, a semantic dense layer comprising a feed-forward neural network extracts the non-linear semantic feature vectors. CDSSM applies the third stage to both the user document and the existing documents to generate their respective semantic feature vectors. The fourth stage computes the cosine similarity between the user-document vector and the existing-document vectors to generate a list of CDSSM scores. The CDSSM score ranges from 0 to 1 and denotes the level of similarity between the user document and the existing documents. Further details on the algorithm can be found in [4,13].
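As an illustration of two of the stages described above, the following sketch (assumed, not the CDSSM implementation) shows letter tri-gram vectorisation of a text and cosine similarity between the resulting vectors; the convolution, max-pooling and semantic dense layers are omitted, and the example sentences are invented.

from collections import Counter
import math

def letter_trigrams(text):
    # word boundaries are marked with '#' before tri-grams are extracted
    grams = []
    for word in text.lower().split():
        padded = f"#{word}#"
        grams += [padded[i:i + 3] for i in range(len(padded) - 2)]
    return Counter(grams)

def cosine_similarity(a, b):
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

user_doc = "small classes with a high return on investment"
mission = "we offer small class sizes and strong career outcomes"
score = cosine_similarity(letter_trigrams(user_doc), letter_trigrams(mission))  # value in [0, 1]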
4 Methodology
Figure 1 shows the True-Ed Select framework comprising four successive layers: user-interaction, machine learning, consolidation, and recommendation. The first layer, user-interaction, extracts users' preferences for common university attributes that influence their decision. The second layer, machine learning (ML), inputs the users' preferences from the first layer to compare against the Integrated Postsecondary Education Data System (IPEDS) [1] database. The third layer, consolidation, combines the results from the ML techniques in layer-2. The fourth layer, recommendation, presents a curated list of universities that are most amenable to users' interests. Sections 4.1–4.3 expound on these layers.
Fig. 1. Overview of the true-Ed select framework comprising four layers: user interac-
tion, machine learning, consolidation, and recommendation.
Figure 2 shows layer-1, user-interaction, which creates profiles for multiple users. Specifically, it asks users questions about the attributes that influence their choice of universities. We use five objective attributes and one subjective attribute. The objective attributes include college location (distance from home), financial aid, career services, medical school major availability, and cost of attendance. The subjective attribute includes a natural language summary of what
users desire for their ideal school experience. Figure 3 shows the survey questions pertaining to each of the six attributes.
Fig. 3. Layer 1: Survey questions employed by the framework to construct the user profiles.
The location attribute asks users their willingness to relocate on a scale of
1 (not willing to move) to 10 (willing to move over 1000 miles). The financial
aid attribute asks users the importance of financial aid on a scale of 1 (not
important) to 4 (very important). To normalize this attribute, we multiply the
rating by 0.25—a normalized value equal to 0 means that aid is not critical, and
1 implies that aid is highly important. The career services attribute asks users
about the level of assistance provided by the university to secure part-time/full-
time jobs. This attribute uses the same scale as the financial aid attribute. The
medical school is a Boolean attribute that inquires about users’ interest in a
medical major. The cost of attendance attribute includes both the tuition and
cost-of-living. This attribute ranges from less than $5K to ‘do not care’. The
natural language summary attribute inputs the user’s free form response on
their expectations for an ideal college experience. For instance, a user may wish
to attend an engineering-specific school with small classroom sizes and a high
return on investments (ROI).
After extracting the user profiles, layer-1 passes this information to layer-2
for machine learning.
Fig. 4. The content-based filtering method and its two components: data-formatting
and Dot product.
apply from across the country. We employ an API [2] to calculate the distance between a user's home zip code and a given university's zip code.
The rows of the UP matrix denote the specific users and the columns represent the five objective attributes provided by them. The pseudo-code (Fig. 5, bottom right) describes the dot product operation and the distance calculation that yield the preference matrix, PREF. The CBF ML passes the preference matrix to layer-3 for further processing.
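A hypothetical sketch of the dot-product step just described, in which user-preference rows are scored against university attribute columns; the matrix contents, normalisation and the distance handling are assumptions rather than the paper's exact pseudo-code.

import numpy as np

# UP: one row per user, one column per objective attribute (toy values)
UP = np.array([[0.2, 1.00, 0.75, 0.0, 0.50],
               [0.9, 0.50, 1.00, 1.0, 0.25]])
# UNI: one row per university, same attribute order (random toy data standing in for IPEDS-derived values)
UNI = np.random.default_rng(0).random((3753, 5))

PREF = UP @ UNI.T                    # preference matrix: rows = users, columns = universities
best_per_user = PREF.argmax(axis=1)  # highest-scoring university index for each user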
Convolutional Deep Semantic Similarity Model (CDSSM)—Fig. 6 shows the
four stages of CDSSM namely the user survey, data vectorization, convolutional
deep semantic neural network (CDSNN), and cosine similarity. The user survey
stage parses the user’s subjective input (natural language summary) and the
mission statements of universities saved in the IPEDS database. The data vec-
torization stage converts the user input to word-n-grams to create tokens of the
words in the summary. The stage also creates the word-n-grams of the words in
the mission statements of the universities. The stage transforms each word-n-
gram into letter tri-grams, which are concatenated to form the tri-gram vector
representation for each word. These vectors constitute the letter tri-gram matrix.
The CDSNN stage performs convolution on the letter tri-gram matrix by taking
the hyperbolic tangent of the product of the convolution mask and the letter
tri-gram matrix values. Next, the stage performs max pooling to suppress the
insignificant features from the final semantic layer. The cosine similarity stage
performs a cosine similarity operation on the users’ natural language semantic
matrix and the mission statement semantic matrix to obtain a similarity matrix
containing CDSSM scores. In this matrix, the rows denote the multiple users and
columns denote the institutions from the IPEDS database. The CDSSM score
ranges from 0 to 1 and denotes the level of match between a given user’s sum-
mary and a given university’s mission statement. For a given user, CDSSM’s final
output is a list of CDSSM scores (between the user and multiple universities),
which is passed to layer-3 for further processing.
Fig. 6. Convolutional deep semantic similarity model (CDSSM) with its four stages:
user survey, data vectorization, CDSNN, and cosine similarity.
which presents the level of match between a user's preferences (both objective and subjective) and a given university. Layer-4, recommendation, provides a user-friendly interface presenting the curated list of the top 15–20 universities that best match the user's interests.
Table 1. The values of five objective attributes selected by four randomized users.
Table 2. The natural language summary (subjective attribute) for the users.
We use the six attributes in Tables 1 and 2 and the IPEDS data to test the CBF and CDSSM ML. Specifically, we use these attributes to generate the CBF, CDSSM, and True-Ed Select scores for various universities in the IPEDS database.
As a proof-of-concept, Sect. 5.1 provides the sample scores generated by the
CBF, CDSSM, and the overall framework for the four fictitious users across the
most interesting set of universities. Section 5.2 discusses the recommendations
provided to the four users by the framework.
5.1 Proof-of-Concept
Table 3 provides the CBF score, CDSSM score, and the framework’s True-Ed
Select (TES) score across four selected universities for two users: John Tesla and
Lindsey Croft. While these universities have similar CBF scores, the framework
employs the CDSSM score as a tie-breaker. As seen in this table, the framework
recommends Auburn University to John Tesla and Methodist College to Lindsey
Croft as the top institution. Similarly, Table 4 provides the sample CBF, CDSSM,
and TES scores across four universities for Nicola Cina and Urg Golum.
Table 3. The scores obtained by the content-based filtering (CBF), CDSSM, and the
overall framework (true-Ed select (TES) score) for John Tesla and Lindsey Croft.
Table 4. The scores obtained by the content-based filtering (CBF), CDSSM, and the
overall framework (true-Ed select (TES) score) for Nicola Cina and Urg Golum.
Fig. 7. University suggestions for John Tesla (top); Distribution of universities for
various scores per user’s choice of attributes (bottom).
User-1, John Tesla—Figure 7 (top) shows the top colleges in the United States of America (USA) for John Tesla based on their choice of university attributes in Table 1. John's location is marked with a star and the university selections appear as dots. Because John wishes to stay near home, True-Ed Select obtains nearby colleges/universities.
Figure 7 (bottom) shows the frequency of institutions across CBF scores
considered by True-Ed Select as per the user’s choices. The distribution is
right-skewed about the score equal to 0, meaning that True-Ed Select sifts low-
performing schools to low scores and top-performing schools to high scores. This
process significantly reduces the search space for this user. Specifically, True-Ed
Select identifies 12 schools that obtain a score of 3.5 and above, which makes it
easier for John to evaluate their options versus evaluating hundreds of schools.
It is worth noting that a right-skewed distribution is the most desirable because
such a distribution only keeps the top-performing institutions at high scores.
This kind of distribution is evidently achieved via a relaxed choice of attributes,
such as the ones given by John Tesla.
Fig. 8. University suggestions for Lindsey Croft (left); distribution of universities for
various scores per user’s choice of attributes.
Fig. 9. University suggestions for Nicola Cina (Left); distribution of universities for
various scores per user’s choice of attributes.
User-2, Lindsey Croft—Fig. 8 (top) shows the top university selections for
Lindsey Croft. As seen in Table 1, Lindsey is not willing to move far away from home; however, they are willing to pay a high cost of attendance and desire career services and financial aid. True-Ed Select identifies schools (shown as dots) that best match Lindsey's university attribute values.
Figure 8 (bottom) shows the distribution of universities across CBF scores.
Because Lindsey’s attribute choices are more firm than John Tesla, the distribu-
tion appears to be symmetric about the score of 0. Nonetheless, True-Ed Select
identifies 14 schools (out of 3753 throughout the USA) that obtain a score above 4, which simplifies this user's task of college selection.
Fig. 10. University suggestions for Urg Golum (top); distribution of universities for various scores per user's choice of attributes (bottom).
User-3, Nicola Cina—Fig. 9 (top) provides the university selection for Nicola
Cina. As per Table 1, this user is willing to move over 2000 miles. Therefore, True-Ed Select identifies universities/colleges that are far from their home and yet satisfy the other attributes.
6 Conclusion
We present True-Ed Select, a machine learning framework to facilitate user-
friendly college/university selection. Our framework uses common objective and
subjective attributes to select a concise list of colleges/universities for the users.
The objective attributes include the distance of the schools from home, financial
aid availability, career services, choice of major, and cost of attendance (tuition
and living expenses). The subjective attribute includes a free-form response from
users describing their ideal choice of a university. The machine learning stage, comprising content-based filtering and a convolutional deep semantic similarity model (CDSSM), takes these attributes as input. For a given user, the framework produces objective scores (True-Ed Select scores) for different universities within the IPEDS database. The framework sifts the low-performing universities to low scores and keeps only a small set of schools in the high-score range. This process effectively reduces the search space from several thousand schools to fewer than 20, greatly simplifying the college selection task for the users.
This framework is currently a proof-of-concept. In the future, we aim to
include additional objective attributes such as GPA (grade point average), return
on investment (ROI), and graduation rate for a well-rounded recommendation.
After obtaining pertinent approvals from the Institutional Review Board (IRB),
we aim to conduct user surveys for an in-depth analysis of our framework. While
the research presented only considers universities/colleges in the United States, the framework lends itself seamlessly to universities in other countries. We
envision that this open-source framework will be a valuable addition to the field
of social computing, globally helping high-school students and their parents with
the daunting task of college selection.
Finally, we note that the universities used in this work are for research pur-
poses only. The analysis shown only demonstrates the framework's functional-
ity; the results should not be construed as actual recommendations provided by
the authors.
References
1. IPEDS: Integrated Postsecondary Education Data System. https://nces.ed.gov/
ipeds/. Accessed 14 Mar 2022
2. Sami: Zip Code Latitude Longitude City State County (2022). MATLAB Central File Exchange. https://www.mathworks.com/matlabcentral/fileexchange/45905-zip-code-latitude-longitude-city-state-county. Accessed 14 Mar 2022
3. Guo, W.-W., Liu, F.: Research on collaborative filtering personalized recommen-
dation algorithm based on deep learning optimization. In: 2019 International Con-
ference on Robots Intelligent System (ICRIS), pp. 90–93 (2019)
4. He, H., Lin, J.: Pairwise word interaction modeling with deep neural networks for
semantic similarity measurement. In: Proceedings of the 2016 Conference of the
North American chapter of the Association for Computational Linguistics: Human
Language Technologies, pp. 937–948 (2016)
5. Hu, O., FungYuen, K.K., Craig, P.: Towards a recommendation approach for uni-
versity program selection using primitive cognitive network process. In: 2017 Inter-
national Conference on Service Systems and Service Management, pp. 1–4 (2017)
6. Lee, C.P., Ng, Z.B., Low, Y.E., Lim, K.M.: Expert system for university program
recommendation. In: 2020 IEEE 2nd International Conference on Artificial Intel-
ligence in Engineering and Technology (IICAIET), pp. 1–6 (2020)
7. Muladi, U.P., Qomaria, U.: Predicting high school graduates using naive Bayes
in state university entrance selections. In: 2020 4th International Conference on
Vocational Education and Training (ICOVET), pp. 155–159 (2020)
8. Nayak, P.K., Madireddy, S., Case, D.M., Stylios, C.D.: Using fuzzy cognitive maps
to model university desirability and selection. In: 2017 IEEE International Confer-
ence on Systems, Man, and Cybernetics (SMC), pp. 1976–1981 (2017)
9. Nikhil, N., Srivastava, M.M.: Content based document recommender using deep
learning. In: 2017 International Conference on Inventive Computing and Informat-
ics (ICICI), pp. 486–489. IEEE (2017)
10. Powar, V., Girase, S., Mukhopadhyay, D., Jadhav, A., Khude, S., Mandlik, S.:
Analysing recommendation of colleges for students using data mining techniques.
In: 2017 International Conference on Advances in Computing, Communication and
Control (ICAC3), pp. 1–5 (2017)
11. Rutkowski, T., Romanowski, J., Woldan, P., Staszewski, P., Nielek, R., Rutkowski,
L.: A content-based recommendation system using neuro-fuzzy approach. In: 2018
IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–8 (2018)
12. Sharma, V., Trehan, T., Chanana, R., Dawn, S.: StudieMe: college recommendation
system. In: 2019 3rd International Conference on Recent Developments in Control,
Automation Power Engineering (RDCAPE), pp. 227–232 (2019)
13. Shen, Y., He, X., Gao, J., Deng, L., Mesnil, G.: A latent semantic model with
convolutional-pooling structure for information retrieval. In: Proceedings of the
23rd ACM International Conference on Conference on Information and Knowledge
Management, CIKM 2014, pp. 101–110, New York, NY, USA. Association for
Computing Machinery (2014)
14. Van Meteren, R., Van Someren, M.: Using content-based filtering for recom-
mendation. In: Proceedings of the machine learning in the new information age:
MLnet/ECML2000 Workshop, vol. 30, pp. 47–56 (2000)
15. Zhao, W., Zhang, W.: Collaborative filtering service recommendation algorithm
based on trusted user and recommendation evaluation. In: 2018 IEEE 4th Inter-
national Conference on Computer and Communications (ICCC), pp. 2248–2255
(2018)
Exploring Public Cloud-ERP Systems’ Impact
on Organizational Performance
Abstract. Moving enterprise resource planning (ERP) systems to the cloud seems
inevitable. Cloud-ERP systems provide many opportunities to organizations. On the other hand, barriers and challenges to this move still exist. This research provides an overview of relevant academic literature on public cloud-ERP migration and identifies the status quo of the body of knowledge on cloud-ERPs delivered through software-as-a-service (SaaS). In addition, this paper explores the motivators and inhibitors, and whether cloud-ERP can enhance organizational performance from an IT perspective. The study is motivated by a gap in research investigating whether cloud-ERP systems' characteristics encourage or hinder enterprises from migrating to the public cloud. Cloud-ERPs delivered in SaaS models are distributed as a service over the Internet, usually through a public cloud infrastructure with shared resources. SaaS enables organizations to pay for the services and functions they use and removes the need for client organizations to maintain complex information technology infrastructure. The cloud-ERP system characteristics found in the literature are organized and presented based on DeLone and McLean's IS success model dimensions and the Technology-Organization-Environment (TOE) framework. Our main findings suggest that cloud-ERP system and service quality are the most discussed issues in the literature. The system quality attributes of cloud-ERPs identified are scalability, availability, accessibility, reliability, and the ability to compose and customize web services, motivating organizations to adopt cloud-based ERP. On the other hand, service quality attributes are found to be inhibitors of moving to the cloud, due to the organization's dependency on the vendor's support and service throughout the product lifecycle and the security risks related to the public cloud environment.
1 Introduction
Information technology (IT) plays an essential role in the performance of business activ-
ities and their success. The motivation for pursuing IT and information system (IS)
enhancements usually emerges from realizing a need to optimize processes, reduce
costs and resources, improve productivity and efficiency, and increase organizational
competitiveness. Today, those benefits are especially relevant as companies confront
2 Method
This study followed the guidelines for systematic literature review by Webster and Wat-
son [13]. Sandberg and Alvesson [14] imply that posing innovative and challenging
questions about existing literature is essential to generate or identify exciting and sig-
nificant theories. Hence, to accumulate research evidence from the existing body of
knowledge, this research has identified literature that discusses cloud-ERP character-
istics that may contribute to or inhibit enhanced business performance. Accordingly,
the research findings in the acquired literature are categorized and clustered according to DeLone and McLean's [11] updated IS success model and Tornatzky et al.'s [12] TOE framework. The IS success model posits that system, information, and service quality can enhance organizational benefits. Combining TOE with the IS success model into an integrated model aims to identify system characteristics and measures, as well as internal and external influences, that contribute to organizational performance.
Since DeLone and McLean published the IS success model in 1992, nearly 300
articles in refereed journals have referred to and used the model to measure the dependent
variable in IS research [15]. The updated model added "service quality" as a third dimension alongside "system quality" and "information quality" as components of IS success.
Additionally, DeLone and McLean [11] believed it is more parsimonious to combine
“individual” and “organizational impacts” into a single variable, namely “net benefits.”
The authors initially used the term “impacts” to measure effectiveness and success.
However, in the ten-year update paper, they imply that “impacts” may be positive or
negative, thus leading to a possible confusion as to whether the results are good or
bad. Including “net” in “net benefits” is vital because no outcome is wholly positive
without any negative consequences. Thus, "net benefits" is considered the most accurate descriptor of the updated success variable [11].
This paper reviewed the literature published between 2015 and 2022. The selected range
ensures a timely review of the state-of-the-art technologies and up-to-date literature
studying the latest issues of cloud-ERP adoptions. The literature search was based on four primary scientific databases: ACM Digital Library, ScienceDirect, Google Scholar, and Emerald. The search terms included cloud-ERP success, cloud-ERP system quality, cloud-ERP information quality, and cloud-ERP and business performance, as well as synonyms and combinations of these terms. Additionally, a secondary search
was conducted by scanning all the selected articles’ reference lists to identify additional
literature. Furthermore, data from the acquired literature were extracted guided by the main research question of this study. The following inclusion and exclusion
criteria were defined to ensure a narrow yet comprehensive literature review. The articles
needed to be published in peer-reviewed journals or conference proceedings to ensure
the quality of the literature. No limitations on the industry type or organizational size
were adopted, to gather a wider set of research results. The authors read all the papers to check their relevance to this research and to classify the final set of papers into the various dimensions of the adopted frameworks.
Due to the growing popularity of cloud-ERPs, the literature shows great interest in cloud computing in general and cloud-ERP adoptions in particular. Figure 1 below illustrates the distribution of research methods adopted by the authors of the reviewed articles. Three papers adopted a case study research method. Four studies conducted quantitative surveys, one conceptual paper discussed cloud-ERP systems with no empirical data, and four papers were literature reviews. It is important to note that the most recent and relevant review on cloud-ERP systems identified in this research was published in 2019. Furthermore, given the increased number of cloud-ERP adoption and implementation projects during the pandemic, an updated literature review is needed. Table 2 provides an overview of the reviewed studies and the methods adopted in their research.
Fig. 1. Distribution of research methods adopted in the reviewed articles (case study, survey, literature review, conceptual paper).
Table 2. Overview of reviewed papers mapped with their adopted research design/method
4 Findings
In cloud-ERP environments, system quality measures the desired characteristics of a
cloud-ERP system, and information quality measures (among others) accuracy, time-
liness, and completeness of the information provided by the IS. System and information quality characteristics in the literature are included in the technological context as predictors of migrating to the cloud. Environmental contexts refer to government, partner/provider, and industry influences. Service quality is merely a subset of system quality. However, this instrument includes measures such as service reliability, responsiveness, assurance, and empathy, which depend primarily on the user service of IS employees (system providers). Hence, service quality is included in the environmental context. Organizational contexts refer to the characteristics and resources of the studied organizations in the literature. The third and fourth IS success dimensions, intention to use and user satisfaction, are included in the organizational context. Table 3 shows cloud-ERP
characteristics cited in the literature and contains benefits (+) and hindrances (−) for
migrating to the public cloud.
Table 3. ERP Characteristics Cited in Literature Mapped with Benefits (+) and Barriers (−) for
Migrating to the Public Cloud
users. The users can alert the service provider to any changes they require to their applications. Nevertheless, the study also states that cloud-ERP solutions in packages with limited customization and integration options may require organizations to add more integration features, with additional costs [26]. Chang [16] implies that cloud service providers should design more effective systems relevant to organizational processes and tasks. On the other hand, a case study by Bjelland and Haddara [7] in the Norwegian cloud-ERP market suggests that cloud-ERP vendors are generally reluctant to customize cloud-ERP system implementations for their clients, as the concept goes against the one-to-many application-infrastructure design. In addition, customizations would increase the need for vendor support and involvement of vendor-ERP consultants within adoption projects, complicate implementations, and affect the scale and speed of cloud-ERP offerings [7].
Reliability. Research by Chang [16] illustrated various enablers and inhibitors for
switching intention/migration from on-premise ERP systems to cloud-ERP. For exam-
ple, system quality refers to the performance characteristics of cloud-ERP systems, and
reliability is presented as an essential contributing factor to increasing system quality
[16]. Moreover, Muslmani et al. [26] studied solutions for adopting Cloud-based ERP
systems and reducing the integration complexity. The need for organizations to integrate
the on-premises systems with their cloud system to ensure data synchronization remains
one of the most significant challenges since most data are saved on the cloud. They sug-
gest that reducing integration complexity before migrating to the cloud could add an extra
level of reliability and productivity. Integrating a traditional, on-premises ERP system
with a new cloud system may also improve information quality [26]. Muslmani et al. [26] suggested that the solution is an application programming interface (API), which facilitates the integration process, as APIs specify how the system's components should interact. The issue arises because every cloud service provider has its own API standards, which might create conflicts when integrating the current system with the cloud service provider's. Thus, using standard APIs may avoid integration problems and increase system reliability.
Moreover, López and Ishizaka [21] compiled a list of criteria for cloud-ERPs that
organizations should aim for when considering moving to the cloud. The authors studied a company that decided to adopt a cloud-based ERP system to improve data integration
and operate more efficiently. The list of criteria is related to system and software quality
for evaluating SaaS ERP applications. Furthermore, it was found that the “systems”
criterion, which included reliability, customization, maintainability, security, usability,
and functionality, was considered the most relevant in the cloud-ERP selection process.
A case study conducted in the UAE’s public sector suggests that governmental orga-
nizations may easily migrate from on-premise ERP and align their institutional work
processes with the inbuilt logic of cloud-ERP, resulting in successful and rapid adoption
[18]. Furthermore, Gupta et al.'s [3] survey found that SMEs and large organizations do not differ regarding integration, security, functionality, and provider integrity.
high availability and improved accessibility and scalability as perceived cloud-ERP bene-
fits. Jain and Sharma's [24] survey discovered that cloud-ERP adoption improved scalability and supported modern user experiences and socially enabled businesses. Improvement in system accessibility was another prominent feature of cloud-ERP that helped firms customize security services.
Alsharari et al.'s [17] findings suggest that cloud-ERP's ease of use, control, and management leads to increased flexibility in the accessibility of processes. Their study results demonstrate that the elasticity of different operations and procedures has risen dramatically since the beginning of the cloud-ERP integration in their case. The system's accessibility is also enhanced because the needed information in the organization's database became available from any online resource. Due to this accessibility and the optimum-utilization services provided by the adopted cloud-ERP system, productivity in different departments was also enhanced, boosting organizational efficiency and, in turn, improving overall organizational performance [17].
Real-Time Information Flow. Information quality refers to the characteristics of the output (data/information) provided by cloud-ERP systems [16]. Completeness, understandability, and relevance of data/information are considered enablers of switching and migrating to cloud-ERPs [16]. Thus, organizations may benefit from the accuracy of information provided by cloud-ERP systems because information quality is related to workforce collaboration, productivity, and efficiency. Jain and Sharma [24] identified several benefits of cloud-ERP systems, such as improved information integration for better decision-making and faster response times to customer queries, as direct aspects of cloud-ERP real-time information quality. Indirect aspects of the information quality of cloud-ERP systems included better corporate image, improved customer goodwill, and customer satisfaction [24]. Similarly, [23] also found a clear positive impact of cloud-ERPs' information quality on organizational performance. The authors proposed that cloud-ERP acts as the catalyst for real-time information flow between departments and manufacturing processes. Cloud-ERP catalyzes supplier and business integration, which helps organizations scale efficiently, leading to better financial and economic performance. Their findings suggest that cloud-ERPs reduce data losses, enable real-time cloud operations, and improve processing time. The study concludes that overall economic, social, and environmental performance growth can be achieved by deploying cloud-ERP [23].
Data Security. The literature frequently mentions data security as the primary concern related to cloud-ERP systems and notes that data security may inhibit cloud-ERP adoption [3, 16, 19, 20, 22, 25].
Chen et al. [20] argue that successful cloud-ERP adoption does not depend on the product itself but mainly on the vendor's support and the customer experience with the provided service. Hence, the paradigm shifts from product features to trust in the service's secure handling of data. Abd Elmonem et al. [19] identified data ownership as another challenge related to security and data management. Conversely, Alsharari et al. [17] studied a company that believes its cloud-ERP vendor adequately protects the organization's data security and privacy. Data security issues might be linked to inefficient providers of cloud-ERP rather than the system itself.
Human Resources and Key Users. Jain and Sharma [24] state that cloud-ERP would
be beneficial for improving resource utilization, enhancing collaboration capabilities,
reducing environmental footprint, and reducing IT infrastructure needs. The importance
of human resources remains relevant in the cloud-ERP environment, as Gupta et al. [23] imply: organizational, people, and technological factors are crucial resources of an organization, enhancing the process of nurturing dynamic capability. López and Ishizaka [21] argue that key users' active involvement in the cloud-ERP implementation process enables subsequent training, use, and acceptance of the technology. The key users' expertise is critical in accomplishing successful ERP initiatives and adoptions [21]. Likewise,
other studies also suggest that employees’ IT knowledge and training of users are crucial
organizational concerns affecting cloud-ERP adoption success [25].
Intention to Use and User Satisfaction. Perceived risk of cloud-ERP systems and sat-
isfaction with and breadth of use of on-premise ERP systems hinder the adoption of
cloud-ERP [16]. However, data quality, system quality, information quality, and positive
peer-employee opinions are found to affect the perceived benefits leading to increased
intention to use cloud-ERP systems [16]. Other technical characteristics of cloud-ERP,
such as compatibility, complexity, and trialability, may also enhance the organiza-
tional comparative advantage and adoption likelihood [25]. In addition, the individual
of modules, which may ease the trust establishment between the clients and service
providers, given the reduced complexity of the systems (2017).
5 Discussion
Based on the review of cloud-ERP system characteristic benefits and inhibitors, the
following part presents the main research focus and some research gaps in the existing
literature. Although twelve articles from 2015 to 2022 is a small number of studies, this review identified recurring research themes and findings across the articles (refer to
Table 3).
Common denominators of the literature are the hindrances to utilizing cloud-ERPs, primarily based on environmental contexts, such as vendors' support and service, and the extent to which the organization is dependent on the vendors (vendor lock-in). This is in line with several studies that categorized vendor lock-in as one of the major challenges of cloud-ERP adoption, as the service providers host, operate, and support both the application layer and the data layer. Additionally, customization and integration limitations are other barriers to cloud-ERP migration. Given that cloud-ERP providers may vary extensively in how they place governance over cloud-ERP services, it can be expected that the utilization of cloud-ERP would provide different degrees of efficiency and service reliability across organizations. Additionally, six out of the twelve reviewed articles
propose data security as a top concern for businesses utilizing cloud services. Although data security is categorized under the technological aspect and is considered a system quality measure, in cloud-ERP it differs from other technical attributes since the system is delivered as a service. Thus, the literature discusses data security issues in terms of vendors' trustworthiness and information privacy protocols, due to vendors' access to the organizational master data and the public cloud. Hence, it may be appropriate to view data security as a service measure. Security concerns are relevant, as cloud-ERPs may cause data leakage and/or suffer other vulnerabilities that may affect client organizations. Some studies (e.g., [17]) imply that security issues might be linked to incompetent providers of cloud-ERP rather than the system itself. Moreover, client-side users' IT experience may improve the perception of security risks in cloud-ERPs [24], as some studies suggest that the security measures taken by cloud-ERP providers may be of a higher standard than what their clients can provide themselves [2]. This may indicate that real-life case studies and surveys conducted in IT companies or organizations with IT-savvy employees may provide more valid and realistic evidence regarding the cloud-ERP security landscape.
however, it is not evident whether these benefits yield positive net benefits for the organizations in general. For instance, cloud-ERP offers financial benefits, such as pay per use, lower up-front costs, cost transparency, and affordability compared to on-premise ERP systems. Nevertheless, the costs of cloud-ERPs are not subtracted from these benefits to capture the actual balance of positive and negative financial impacts. In general, studies focusing on the total cost of ownership (TCO) of cloud-ERP systems are needed.
The on-demand feature is essential in cloud-ERPs and is cited as an accessibility-
related dimension in literature. However, some vendors may also access the data, intro-
ducing data security and privacy issues. Researchers might need to investigate the balance
between accessibility and vendor trust. Moreover, future research should investigate how
the benefits and challenges offset each other. Finally, vendors’ perspectives on data secu-
rity and how they aim to create business value and maximize trust, and the steps they take
to reduce the security concerns for their enterprise customers, may be another interesting future research avenue.
6 Conclusion
By utilizing DeLone and McLean's IS success model and the TOE framework, this paper attempts to identify which system characteristics of cloud-ERP may lead to improved organizational performance or may hinder the migration to the public cloud. All the reviewed articles found a positive impact of cloud-ERP on organizational performance in different contexts. Cloud-ERPs improve collaboration capabilities, scalability, reliability, system availability, and accessibility, enhancing organizational efficiency. Additionally, real-time information flow and frequent system updates are confirmed to improve decision-making processes. These impacts are due to cloud-ERPs' system and information quality. However, service quality features such as vendor lock-in
and reliability might hinder organizations from moving to the cloud. Data security issues,
integration complexity, and customization difficulties may also inhibit cloud migration.
These findings may help service providers to work on strategies to reduce or eliminate
those concerns, enhance trust, and encourage enterprises to move to a cloud-based ERP
to enhance their business performance in general.
References
1. Christiansen, V., Haddara, M., Langseth, M.: Factors affecting cloud ERP adoption decisions
in organizations. Procedia Comput. Sci. 196, 255–262 (2022)
2. Sædberg, A., Haddara, M.: An exploration of adoption factors for cloud-based ERP systems
in the public sector. In: NOKOBIT, vol. 24, no. 1 (2016)
3. Gupta, S., Misra, S.C., Singh, A., Kumar, V., Kumar, U.: Identification of challenges and
their ranking in the implementation of cloud ERP: a comparative study for SMEs and large
organizations. Int. J. Qual. Reliabil. Manag. (2017). https://doi.org/10.1108/ijqrm-09-2015-
0133
4. Wang, X.V., Xu, X.W.: An interoperable solution for cloud manufacturing. Robot. Comput.
Integr. Manuf. 29(4), 232–247 (2013). https://doi.org/10.1016/j.rcim.2013.01.005
5. Galov, N.: 25 cloud computing statistics in 2020 - will AWS domination continue? In:
HostingTribunal (2020)
6. Saa, P., Moscoso-Zea, O., Costales, A.C., Luján-Mora, S.: Data security issues in cloud-based
Software-as-a-Service ERP. In: 2017 12th Iberian Conference on Information Systems and
Technologies (CISTI), pp. 1–7. IEEE (2017). https://doi.org/10.23919/cisti.2017.7975779
7. Bjelland, E., Haddara, M.: Evolution of ERP systems in the cloud: a study on system updates.
Systems 6(2), 22 (2018)
8. Demi, S., Haddara, M.: Do cloud ERP systems retire? An ERP lifecycle perspective. Procedia
Comput. Sci. 138, 587–594 (2018). https://doi.org/10.1016/j.procs.2018.10.079
9. Gartner: Gartner forecasts worldwide public cloud revenue to grow 17.5 percent in 2019
(2019)
10. Haddara, M., Staaby, A.: RFID applications for patient safety in the healthcare sector. In:
Quality of Healthcare in the Aftermath of the COVID-19 Pandemic, pp. 155–179. IGI Global
(2022)
11. DeLone, W.H., McLean, E.R.: The DeLone and McLean model of information systems suc-
cess: a ten-year update. J. Manag. Inf. Syst. 19(4), 9–30 (2003). https://doi.org/10.1080/074
21222.2003.11045748
12. Tornatzky, L.G., Fleischer, M., Chakrabarti, A.K.: Processes of Technological Innovation.
Lexington Books, Lexington (1990)
13. Webster, J., Watson, R.T.: Analyzing the past to prepare for the future: writing a literature
review. MIS Q., xiii–xxiii (2002)
14. Sandberg, J., Alvesson, M.: Ways of constructing research questions: gap-spotting or
problematization? Organization 18(1), 23–44 (2011). https://doi.org/10.1177/135050841037
2151
15. DeLone, W.H., McLean, E.R.: Measuring e-commerce success: applying the DeLone &
McLean information systems success model. Int. J. Electron. Commer. 9(1), 31–47 (2004).
https://doi.org/10.1080/10864415.2004.11044317
16. Chang, Y.-W.: What drives organizations to switch to cloud ERP systems? The impacts of
enablers and inhibitors. J. Enterp. Inf. Manag. (2020). https://doi.org/10.1108/jeim-06-2019-
0148
17. Alsharari, N.M., Al-Shboul, M., Alteneiji, S.: Implementation of cloud ERP in the SME:
evidence from UAE. J. Small Bus. Enterp. Dev. (2020). https://doi.org/10.1108/jsbed-01-
2019-0007
18. Alsharari, N.M.: Cloud computing and ERP assimilation in the public sector: institutional
perspectives. Transf. Gov. People Process Policy 16, 97–109 (2021)
19. Abd Elmonem, M.A., Nasr, E.S., Geith, M.H.: Benefits and challenges of cloud ERP systems–
a systematic literature review. Future Comput. Inform. J. 1(1–2), 1–9 (2016). https://doi.org/
10.1016/j.fcij.2017.03.003
20. Chen, C.-S., Liang, W.-Y., Hsu, H.-Y.: A cloud computing platform for ERP applications.
Appl. Soft Comput. 27, 127–136 (2015). https://doi.org/10.1016/j.asoc.2014.11.009
21. López, C., Ishizaka, A.: GAHPSort: a new group multi-criteria decision method for sorting a
large number of the cloud-based ERP solutions. Comput. Ind. 92, 12–25 (2017). https://doi.
org/10.1016/j.compind.2017.06.007
22. Sørheller, V.U., Høvik, E.J., Hustad, E., Vassilakopoulou, P.: Implementing cloud ERP solu-
tions: a review of sociotechnical concerns. Procedia Comput. Sci. 138, 470–477 (2018).
https://doi.org/10.1016/j.procs.2018.10.065
23. Gupta, S., Meissonier, R., Drave, V.A., Roubaud, D.: Examining the impact of Cloud ERP on
sustainable performance: a dynamic capability view. Int. J. Inf. Manag. 51, 102028 (2020).
https://doi.org/10.1016/j.ijinfomgt.2019.10.013
24. Jain, D., Sharma, Y.: Cloud computing with ERP-a push business towards higher efficiency.
Ann. Res. J. SCMS Pune 4 (2016). https://doi.org/10.2139/ssrn.2755457
25. Tongsuksai, S., Mathrani, S., Taskin, N.: Cloud enterprise resource planning implementation:
a systematic literature review of critical success factors. In: 2019 IEEE Asia-Pacific Confer-
ence on Computer Science and Data Engineering (CSDE), pp. 1–8. IEEE (2019). https://doi.
org/10.1109/csde48274.2019.9162373
26. Muslmani, B.K., Kazakzeh, S., Ayoubi, E., Aljawarneh, S.: Reducing integration complexity
of cloud-based ERP systems. In: Proceedings of the First International Conference on Data
Science, E-learning and Information Systems, pp. 1–6 (2018). https://doi.org/10.1145/327
9996.3280033
A Generic Neural Network
Implementation on GPU and Its
Performance Benchmark
1 Introduction
As a massively parallel platform for general-purpose computing, Graphics Pro-
cessing Units (GPUs), traditionally used as video cards, have been applied beyond graphics processing. For example, GPUs have been widely utilized in computational chemistry, biology, and computer vision to accelerate problem solving and simulations [3,7,23], thanks to the tremendous computational power enabled by thousands of processor cores on a GPU.
Artificial Neural Networks (ANNs) are a crucial foundation for deep learning and many machine learning algorithms. However, training an ANN on the Central Processing Unit (CPU) is quite computationally intensive. During ANN training, for each training data sample, the Back Propagation (BP) algorithm must traverse all neurons in the hidden layer(s) and the output layer of such a network [12,24]. As the number of layers, the number of neurons in each layer, and the size of the training data set increase, ANN training may slow down dramatically.
Training an ANN is inherently parallel and is thus well suited to parallelization on a GPU. In an ANN, neurons in the same layer are independent of each other during the training process. In addition, each training data sample is independent of other samples during training if batch mode is used [1]. Traditionally, GPU-accelerated ANN training took advantage of both levels of parallelism in two separate designs/implementations [13].
2 Related Work
Early work nearly twenty years ago showed the promise of GPUs in their infancy. In 2004, an ATI RADEON 9700 achieved a 20× speedup over the CPU [20]. Another study in 2006 tested CNNs specifically and found a speedup of between 3.1× and 4.1× [5]. These tests were promising, but such tests must be redone on newer hardware.
More recent tests looked at more up-to-date devices. One such test looked at devices that are available in many desktops and used Python packages such as Numba to achieve 100–250× speedups [8]. The advantage of testing on common devices using popular languages and packages is that the benchmark provided applies to many implementations that would be seen in practical applications. A downside is that testing against Python (even with Numba's JIT optimizations) will bias the results to make the CPU appear slower than it is. A different study used cuBLAS (the CUDA Basic Linear Algebra Subprograms library) to implement large RNNs and found 2× to 11× speedup over the CPU [16]. A benefit is that cuBLAS is popular for implementing linear algebra operations through CUDA, but the downside is that further optimization can be achieved by designing kernels directly.
Several studies have compared lower-level implementations against optimized CPU code. One such study reported 10–60× speedup depending on the size of the network when comparing the GPU to compiler-optimized C code [6]. Another study used the GTX 260 to achieve 50× speedup over optimized CPU code [11], and a third tested CNNs against C++ code using the –O3 optimization to achieve 2–24× speedup [22]. These tests provide a good benchmark but utilized lower-end hardware. A better GPU, the Tesla C2050, was shown to provide up to 1095× speedup [21], showing that better GPUs can be expected to increase the tested speedups into the thousands.
Many tests have also been conducted using mobile hardware. One such study showed a range of 2× to 9× speedup [15], while another, testing recurrent neural networks (RNNs), found at least 4× speedup over the mobile phone's CPU [4]. The best tests conducted achieved a super-linear speedup of 63.45× [14], showing just how much potential a mobile device's GPU has for machine learning.
Several studies have probed the advantages provided by using multiple GPUs. Source [25] used two GTX 570s and compared the speedup obtained with one GPU to the speedup obtained using both. The single GPU achieved an 11.99× speedup over the CPU, while both together achieved 51.35×. A second study compared
different numbers of GPUs: using eight Tesla P100s with a mini-batch size of 256 took 29 h, while 1024 Tesla P100s with a mini-batch size of 32,768 took 15 min to complete [2]. These tests had to grapple with the bottleneck created by updating the weights. For any large-scale machine learning system, utilizing multiple GPUs efficiently would be necessary.
Other studies have investigated comparisons between GPUs and other hard-
ware besides a fully synchronous CPU. One such study compared the GPU
implementation to a parallel MLP implemented on a multi-core CPU [18]. While
the GPU outperformed synchronous CPU code with about 4–5.66× speedup, the
multi-core CPU outperformed the GPU with about 6–9.7× speedup. This inter-
estingly led to the conclusion that multi-core CPUs might be useful for machine
learning. A second study [17] looked at the use of FPGAs for CNNs and used a Virtex-7 FPGA to process the network 8.3× faster than the Titan X GPU, also with 75× better energy efficiency.
Our study designs kernels using C/CUDA based on tile matrix multiplication and tests them against a C-based implementation that is optimized using the –O2 flag. Our generic implementation allows an arbitrary number of nodes and hidden layers to be specified and runs for a predetermined number of epochs. Several GPUs are accessed using Google Colab to observe what speedups can be achieved on different hardware. Unlike many of the studies mentioned, no packages or higher-level languages are tested, and the GPUs are high end. This allows our results to be useful as a benchmark of what can be accomplished by dedicated systems.
Our Contributions: First, our implementation in C/CUDA successfully integrates tile matrix multiplication and is tested against a sequential implementation in C that is optimized using the –O2 flag. Second, our generic implementation allows for creating an arbitrary number of neural nodes and hidden layers and running for a predetermined number of epochs. Third, several GPUs are accessed using Google Colab to observe what speedups can be achieved on different GPU hardware. Unlike many of the existing studies, no packages or higher-level programming languages are used in our tests. This allows the results to be useful as a performance benchmark.
In comparison with existing research, this work has the following limitations. Our work does not test using multiple GPUs to implement one ANN; rather, we test our ANN implementation on different GPUs in different experiments. Also, the speedup is only compared to sequential CPU code and not to other hardware (such as FPGAs, multi-core CPUs, TPUs, etc.). Finally, the tests run a basic multilayer perceptron and do not test CNNs, RNNs, or other forms of neural networks.
The outline of the rest of the paper is as follows. First, we state the necessary math for multilayer perceptrons in matrix form in Sect. 3. Then, we explain how the data for these steps are stored and processed in parallel in Sect. 4. After this, we present pseudocode in Sect. 5 for the construction of the CUDA kernels necessary to carry out the math previously shown. In Sect. 6, we test the code on several different GPUs and discuss our results.
Let Yr denote the matrix of node values for layer r. The rows correspond with
the nodes in the layer and the columns correspond with training pairs. Then
let Wr be the matrix of weights going from layer r to layer r + 1. The rows
correspond to nodes in the out-layer and columns correspond with nodes from
in-layer, except for the first column, which stores the bias weights. The activation
function used in the program is the sigmoid function, denoted here simply as
f(). By appending a row of ones to the top of the matrix of node values, denoted Y˚r, a single forward step is calculated using the formula below.

Yr+1 = f(Wr Y˚r) (1)
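Here f() denotes the sigmoid (logistic) activation applied element-wise; its standard definition, not restated in the text, is

f(x) = 1 / (1 + e^(−x))

and its derivative f(x)(1 − f(x)) is what produces the Yr ◦ (J − Yr) factors in the delta equations of Sect. 3.2.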
3.2 Backpropagation
DL = YL ◦ (J − YL ) ◦ (YL − O) (2)
The matrix J stands for an all-ones matrix, and ◦ is the Hadamard prod-
uct that defines component-wise multiplication. Also, the matrix O holds the
expected output that is being used to train the MLP. After the base case is
calculated, the deltas for every other layer can be calculated using the deltas of the layer in front of it with the following equation, where the bias column of Wr is dropped before transposing so that the dimensions match those of Yr.

Dr = Yr ◦ (J − Yr) ◦ (Wr^T Dr+1) (3)
The change in weights can be calculated using the matrix of deltas and the trans-
pose of the matrix of node values with the appended ones. Using the learning
rate µ and the matrices already defined, the change in all weights (including bias
weights) for a given layer r is given.
∆Wr = −µ Dr+1 Y˚r^T (4)
4 Data Representation
Before developing code, we must figure out how to store the necessary data. We
have matrices of weights, node values, and delta values. First, let us store all matrices of a kind in an array, in order of their layers.
Y = [Y1 , Y2 , . . . , YL ]
D = [D1 , D2 , . . . , DL ]
W = [W0 , W1 , W2 , . . . , WL ]
Notice that the first element in W is given an index of 0, while the first
elements of D and Y are given indices of 1. This is because layer 0 is the input
layer and W0 represents the weights going from the input layer to the first hidden
layer (or output layer if there are no hidden layers). The array D does not include delta values for the input layer, as there is no error associated with the input itself. The array Y could have a Y0, but that data is already held in the input array, and including Y0 would only mean copying the data from the input array into Y0, which wastes time.
Using multidimensional arrays for this representation of data allows for a
very intuitive way of accessing the information. Let yr,j (i) be the value of node j
in layer r for training pair i, and let δr,j (i) be the delta value for that same node
and training pair. Accessing this information as it has been stored in an array becomes Y[r−1][j][i] and D[r−1][j][i], respectively. The one has to be subtracted from the layer index because the array uses zero-based indexing, but the first element of these arrays starts with index one. Now let wr,j,h be the weight going from node j in layer r to node h in the next layer. This can be accessed as W[r][h][j + 1]. The one is added because the first column holds the bias weights. The bias weight going into node h from layer r is denoted br,h and can be accessed as W[r][h][0].
For those who look over the code, a few details are necessary. When programming on a GPU, it is best to linearize multidimensional arrays into a single array. This poses a difficulty for our data representation. When linearized, an element of a three-dimensional array may be accessed as M[z * matrix size + y * columns amount + x], which is equivalent to M[z][y][x]. However, this assumes all matrices are of the same size, with the same number of rows and columns. This is not the case for us, as the sizes of our matrices depend on how many nodes are in each layer. Because of this, the starting index for a matrix in the
vector has to be calculated beforehand. We define two arrays: nodes indices and w indices. Because Y and D have identically sized matrices corresponding to nodes and training pairs, the calculated starting indices of the matrices in both of these arrays can be stored in the nodes indices array. The starting indices for W will be stored in w indices. To calculate these starting indices, and also to reference the number of columns a given matrix has, we also need an array storing the size of each layer in order, which will be called layers. In the pseudocode presented, these lower-level details are omitted and the arrays are treated as though they are stored as three-dimensional.
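To make the indexing concrete, the following C sketch shows one way the starting-index arrays and element accesses could be written (the helper names build_indices, get_node, and get_weight are our own; the paper's actual code may differ in its details):

/* layers[0..L] holds the size of each layer; layer 0 is the input layer,
   so Y and D store one matrix per layer 1..L and W one matrix per transition 0..L-1. */
void build_indices(const int *layers, int L, int batch_size,
                   int *nodes_indices, int *w_indices) {
    nodes_indices[0] = 0;                      /* start of Y1 (and D1) */
    w_indices[0] = 0;                          /* start of W0 */
    for (int r = 1; r < L; r++) {
        /* Yr / Dr are layers[r] x batch_size, stored row-major */
        nodes_indices[r] = nodes_indices[r - 1] + layers[r] * batch_size;
        /* W(r-1) is layers[r] x (layers[r-1] + 1); column 0 holds the bias weights */
        w_indices[r] = w_indices[r - 1] + layers[r] * (layers[r - 1] + 1);
    }
}

/* Y[r][j][i]: node j of layer r (r >= 1) for training pair i */
float get_node(const float *Y, const int *nodes_indices, int batch_size,
               int r, int j, int i) {
    return Y[nodes_indices[r - 1] + j * batch_size + i];
}

/* W[r][h][c]: row h is the destination node in layer r+1; column c is 0 for the
   bias weight, or j + 1 for source node j in layer r */
float get_weight(const float *W, const int *w_indices, const int *layers,
                 int r, int h, int c) {
    return W[w_indices[r] + h * (layers[r] + 1) + c];
}

Presumably the same offset arrays are made available on the device, so that kernels can index the single linear allocations in the same way.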
5 The Kernels
The following is pseudocode for the kernels that implement the math using the data structures previously described. For the sake of brevity, the pseudocode omits certain minor details that would be necessary for functioning code but are not needed to display the concept. Each kernel is built around
the tile matrix multiplication algorithm for GPUs [19].
In the pseudocode, kr is the number of nodes in layer r. Also, the variable batch size refers to the number of training pairs being run at a time. This allows the kernels to be useful for mini-batch mode as well, although in our code all the training pairs were run at once in full batch mode. Finally, the variables bx, by, tx, and ty are used as shorthand for blockIdx.x, blockIdx.y, threadIdx.x, and threadIdx.y.
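Because the actual listings appear as figures in the published paper, the following CUDA sketch shows the shared-memory tiling pattern that each kernel is built around, in the spirit of [19]. It is a generic tiled matrix multiplication, not the paper's exact code; TILE is an assumed tile width such as 16, and the names and bounds checks are illustrative.

#define TILE 16

/* C = A * B, with A of size m x k and B of size k x n, all row-major. */
__global__ void matmul_tiled(const float *A, const float *B, float *C,
                             int m, int k, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;   /* by * TILE + ty */
    int col = blockIdx.x * TILE + threadIdx.x;   /* bx * TILE + tx */
    float acc = 0.0f;

    for (int t = 0; t < (k + TILE - 1) / TILE; t++) {
        /* Each thread stages one element of A and one of B into shared memory,
           padding with zeros at the matrix edges. */
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < m && a_col < k) ? A[row * k + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < k && col < n) ? B[b_row * n + col] : 0.0f;
        __syncthreads();

        for (int i = 0; i < TILE; i++)
            acc += As[threadIdx.y][i] * Bs[i][threadIdx.x];
        __syncthreads();
    }

    if (row < m && col < n)
        C[row * n + col] = acc;
}

The kernels sketched below reuse this loop structure, adjusting the loads to skip the bias column or to read a matrix in transposed order.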
The forward propagation kernel implements equation (1) for a single layer, and
will be called iteratively for each step. First, the matrix multiplication section
within the for loop handles Wr Yr . Notice that this multiplication skips the bias
nodes by adding one to the column. After the multiplication section completes,
every thread adds the bias weight to the accumulated value for its index and
passes it into the sigmoid activation function.
Also, notice that there is a final check at the end to see if we have just gone from the second-to-last layer, L − 1, to the last layer, L. If so, we have just completed the final forward computation. At this point,
we actually do the calculation for the first delta values given by equation (2). We
do this because we have all the resources allocated and information calculated,
so it would be a waste to exit the kernel and dedicate a separate kernel to what
can be done in one line.
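A minimal sketch of this forward step, assuming the storage layout of Sect. 4, is given below. The tiling of the inner product is collapsed into a plain loop for brevity, and names such as sigmoidf and the is_last flag are illustrative rather than taken from the paper's listing.

__device__ float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }

/* One forward step Yr+1 = f(Wr * Y˚r) for a single layer.
   Wr is krp1 x (kr + 1) with the bias weights in column 0; Yr is kr x batch_size.
   When is_last != 0, the output layer was just produced, so DL is filled
   using equation (2) with the expected outputs O. */
__global__ void forward_step(const float *Wr, const float *Yr, float *Yrp1,
                             float *DL, const float *O,
                             int kr, int krp1, int batch_size, int is_last) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;   /* node in layer r+1 */
    int col = blockIdx.x * blockDim.x + threadIdx.x;   /* training pair */
    if (row >= krp1 || col >= batch_size) return;

    float acc = 0.0f;
    for (int j = 0; j < kr; j++)                       /* skip the bias column: j + 1 */
        acc += Wr[row * (kr + 1) + (j + 1)] * Yr[j * batch_size + col];
    acc += Wr[row * (kr + 1)];                         /* add the bias weight (column 0) */

    float y = sigmoidf(acc);
    Yrp1[row * batch_size + col] = y;

    if (is_last)                                       /* DL = YL ◦ (J − YL) ◦ (YL − O) */
        DL[row * batch_size + col] = y * (1.0f - y) * (y - O[row * batch_size + col]);
}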
The kernel for weights updating must implement equation (4). Once we calculate Dr+1 Y˚r^T, we only need to multiply each element by the learning rate µ and subtract it from the respective element of the weights matrix. Both the weights and the biases are updated at the same time. For the multiplication, we only need to figure out how to deal with Y˚r^T, as we have not stored the row of ones with the node values. This is solved with a brief if-else statement that shifts the assigned column down by one for each thread, and all threads assigned column 0 are given the value 1 to create the all-ones row. Notice that Yr is transposed using the same simple indexing trick used on the weights matrix in the backpropagation kernel.
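A corresponding sketch of the weight-update step follows, again with the tiled accumulation written as a plain loop for readability; the column-shifting if-else mirrors the description above, and all names are illustrative.

/* Applies equation (4): Wr -= mu * (Dr+1 * Y˚r^T).
   Wr is krp1 x (kr + 1); column 0 holds the bias weights, so column c > 0
   corresponds to node c - 1 of layer r. */
__global__ void update_weights(float *Wr, const float *Drp1, const float *Yr,
                               int kr, int krp1, int batch_size, float mu) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;   /* node in layer r+1 */
    int col = blockIdx.x * blockDim.x + threadIdx.x;   /* column of Wr (0 = bias) */
    if (row >= krp1 || col >= kr + 1) return;

    float acc = 0.0f;
    for (int i = 0; i < batch_size; i++) {
        /* Column 0 multiplies against the implicit row of ones in Y˚r;
           otherwise shift the assigned column down by one to index Yr. */
        float y = (col == 0) ? 1.0f : Yr[(col - 1) * batch_size + i];
        acc += Drp1[row * batch_size + i] * y;
    }
    Wr[row * (kr + 1) + col] -= mu * acc;
}

One thread per weight element (including the bias column) keeps the update fully parallel across Wr.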
6 Performance

Three different GPUs were used to test the performance of the code: the Tesla K80, the Tesla T4, and the Tesla P100. The GPUs and their specs are listed in Table 1 in order of performance, from the Tesla K80 (the weakest) to the Tesla P100 (the strongest). The difference in performance between GPUs provides a good demonstration of the extent to which the speedup is dependent on hardware.
Two sets of tests were run for each piece of hardware. First, the CPU and GPUs were run and timed for a number of hidden layers ranging from 2 to 16 and a number of nodes per hidden layer from 10 to 700 (in increments of 10). These tests were meant to observe how the speedup behaved as a function of the number of nodes and layers. Because so many networks were being run in this test, it was not feasible to include larger networks in this mass testing, as the CPU took too much time to complete. Instead, the second test observed the speedup for much
larger networks that were run individually. This test ran for 5000 nodes, 7500
nodes, and 10,000 nodes, each over 3 layers. The code was compiled using the
–O2 optimization flag to give the CPU code its best performance. In prior tests
that omitted any optimization, the speedup was much greater for all tests. This
detail is necessary to understand that the results displayed are a lower bound for
the performance of the GPUs. Unoptimized code, or code written in languages other than C, is likely to afford the GPU much greater speedups over the CPU through the use of these kernels.
The following sections go over the first round of tests, followed by a section on the large-network tests. The next three sections each display a figure with four parts. The upper left is a snippet of the dataframe containing the results of the test with the specific GPU. The upper right is a multiple regression conducted using the language R. The bottom two parts are scatter plots to visualize the data. The first scatter plot is three-dimensional and displays the relationship between the number of hidden layers, the number of nodes per hidden layer, and the speedup. The second plot is a two-dimensional plot that cuts out the layers to focus on the effect of the number of nodes.
6.1 Tesla K80

Observing the dataframe in Fig. 1, for the smallest networks tested the CPU outperformed the GPU (without the optimization flag, the GPU outperformed the CPU, but only barely). Intuitively, we cannot really expect any GPU to do much better than this. The networks being worked with are so small that the CPU is able to brute-force the computation in nearly the time it takes for a GPU to set up and tear down its kernels. Only if the clock speed of the GPU cores could match the CPU could a significant speedup be seen.
However, as the network reaches the largest sizes tested, the speedup increases to 83×. The sheer quantity of cores in the K80 overpowers the quality of the CPU core. This can be observed visually in the scatter plots. There is clearly a positive trend with moderate curvature.
We see diminishing returns as the number of nodes increases. To test our observations statistically, we turn to the regression in the top right of Fig. 1. After trying several models, a full second-order model was chosen. This is because in the 2D plot we see a change in slope for different numbers of layers, which implies an interaction term, and the diminishing returns we expect to occur demand the quadratic terms. The model fits very well, with R2 and adjusted R2 (Ra2) values
above 0.94 and all p-values at the highest significance that R will calculate.
Looking at the estimates for the parameters, the tests allow us to be confident
that there is small negative curvature for both nodes and layers, as well as a
positive interaction between them.
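In our notation (not spelled out explicitly in the text), with n the number of nodes per hidden layer and l the number of hidden layers, the fitted full second-order model has the form

speedup = β0 + β1·n + β2·l + β3·n·l + β4·n² + β5·l² + ε

where the interaction term n·l captures the change in slope across layer counts and the quadratic terms capture the diminishing returns.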
(Note: the equation provided by the regression does not predict very well. A residual analysis hints that there is a curvilinear relationship that is not explained by the model, which might come from the warp scheduling done by the GPU itself, or some other factor external to the code. However, the regression still allows us to be confident about the overall effect that a change in the nodes and layers has on the speedup.)
6.2 Tesla T4
The dataframe sample for the T4 is in Fig. 2. The smallest networks tested are only a little faster than on the K80. The only increase in speedup comes from the slightly faster clock speed, as the increase in cores is not yet utilized. As explained previously, not much more can be expected for such small networks, but with the larger networks we see a more dramatic increase, from the 83× reported for the K80 to the 139.7× seen with the T4.
A look at the plots for the T4 shows slightly more erratic speedup behavior for larger networks. This was observed over several rounds of tests and only showed up with the T4, so it is likely to originate with the hardware rather than our code.
Fig. 2. T4 Test
The same general trends observed for the K80 are seen here. To ensure these observations are accurate, we conduct another multiple regression on the data. Again, the hypothesis tests for all terms show high significance. Also, the coefficients of determination are strong, but we can see the decrease that corresponds with the more erratic behavior we noted. The small negative curvature is more gradual this time, allowing quicker growth in speedup before plateauing. The p-value for the quadratic nodes term has actually increased massively compared to the prior test, which reduces the statistical significance of that term. The large drop in significance likely comes from our observation that the diminishing returns for nodes were not captured in the dataframe. Because the trend occurs for much larger networks on the P100, the statistical test does not observe it with as much confidence.
Both GPUs find great significance in the interaction term, and the explanation of proportions previously given no longer applies here. It appears the increase in layers amplifies the effect that the number of nodes has on the time taken. It may just be a type 1 error, but the p-values of the t-tests allow for very high confidence. A possible explanation may be that locality is being better exploited by the forward and backward propagation.
exploited by the forward and backward propagation. Advice given by GPU Gems
[10] states:
“Access vertex data in a relatively sequential manner. Modern GPUs cache
memory accesses when fetching vertices. As in any memory hierarchy, spa-
tial locality of reference helps maximize hits in the cache [...]”.
The arrays of nodes and delta values are both being accessed sequentially
along layers when propagating forward and backward. This same data structure is used by the CPU, which has powerful caching and so also benefits from locality,
so this may be the explanation. However, this remains conjecture for now.
7 Conclusion
A GPU provides significant speedup for Artificial Neural Networks. While hardware clearly makes a significant difference in how much improvement can be expected, improvement will still be achieved. As the regressions show, an increase in the number of nodes and layers leads to large increases in speedup. This tells us a GPU becomes more practical the larger the network becomes, as the larger network better exploits its parallel capabilities. Overall, the same general trends are observed in the estimates across all GPUs tested, allowing us to conclude that they will likely hold on other GPU hardware. A user implementing the kernels shown can expect (depending on hardware) anywhere from several hundred to several thousand times speedup for large networks.
References
1. Abraham, A.: Artificial neural networks. In: Sydenham, P., Thorn, R. (eds.) Hand-
book of Measuring System Design. John Wiley and Sons Ltd., London, pp. 901–908
(2005)
2. Akiba, T., Suzuki, S., Fukuda, K.: Extremely large minibatch SGD: training resnet-
50 on ImageNet in 15 minutes. arXiv preprint arXiv:1711.04325 (2017)
3. Beckingsale, D.A., et al.: Portable performance for large-scale scientific applica-
tions. In: 2019 IEEE/ACM International Workshop on Performance, Portability
and Productivity in HPC (P3HPC), pp. 71–81. IEEE (2019)
4. Cao, G., Balasubramanian, N., Balasubramanian, A.: MobiRNN: efficient recurrent
neural network execution on mobile GPU. In: Proceedings of the 1st International
Workshop on Deep Learning for Mobile Systems and Applications, pp. 1–6 (2017)
5. Chellapilla, K., Puri, S., Simard, P.: High performance convolutional neural net-
works for document processing. In: Lorette, G. (ed.) Tenth International Workshop
on Frontiers in Handwriting Recognition, La Baule (France), October 2006. Uni-
versité de Rennes 1, Suvisoft. https://www.suvisoft.com
6. Ciresan, D.C., Meier, U., Masci, J., Gambardella, L.M., Schmidhuber, J.: Flexi-
ble, high performance convolutional neural networks for image classification. In:
Twenty-Second International Joint Conference on Artificial Intelligence (2011)
7. Dematté, L., Prandi, D.: GPU computing for systems biology. Brief. Bioinform.
11(3), 323–333 (2010)
8. Dogaru, R., Dogaru, I.: Optimization of GPU and CPU acceleration for neural networks layers implemented in Python. In: 2017 5th International Symposium on
Electrical and Electronics Engineering (ISEEE), pp. 1–6 (2017)
9. Dolhansky, B.: Artificial neural networks: Matrix form (Part 5), December 2014. https://www.briandolhansky.com/blog/2014/10/30/artificial-neural-networks-matrix-form-part-5
10. Fernando, R.: Reducing the Cost of Vertex Transfer, Chapter 28.3.2. Addison-
Wesley (2004)
11. Guzhva, A., Dolenko, S., Persiantsev, I.: Multifold acceleration of neural network
computations using GPU. In: Alippi, C., Polycarpou, M., Panayiotou, C., Ellinas,
G. (eds.) ICANN 2009. LNCS, vol. 5768, pp. 373–380. Springer, Heidelberg (2009).
https://doi.org/10.1007/978-3-642-04274-4_39
12. Hassoun, M.H., et al.: Fundamentals of Artificial Neural Networks. MIT Press,
Cambridge (1995)
13. Huqqani, A.A., Schikuta, E., Ye, S., Chen, P.: Multicore and GPU parallelization
of neural networks for face recognition. Procedia Comput. Sci. 18, 349–358 (2013)
14. Oskouei, S.S.L., Golestani, H., Hashemi, M., Ghiasi, S.: CNNdroid: GPU-
accelerated execution of trained deep convolutional neural networks on android.
In: Proceedings of the 24th ACM International Conference on Multimedia, pp.
1201–1205 (2016)
15. Lee, J., et al.: On-device neural net inference with mobile GPUs. arXiv preprint
arXiv:1907.01989 (2019)
16. Li, B., et al.: Large scale recurrent neural network on GPU. In: 2014 International
Joint Conference on Neural Networks (IJCNN), pp. 4062–4069 (2014)
17. Li, Y., Liu, Z., Xu, K., Yu, H., Ren, F.: A GPU-outperforming FPGA accelerator
architecture for binary convolutional neural networks. ACM J. Emerg. Technol.
Comput. Syst. (JETC) 14(2), 1–16 (2018)
18. Ma, Y., Rusu, F., Torres, M.: Stochastic gradient descent on modern hardware:
Multi-core CPU or GPU? synchronous or asynchronous? In: 2019 IEEE Interna-
tional Parallel and Distributed Processing Symposium (IPDPS), pp. 1063–1072.
IEEE (2019)
19. Nugteren, C.: Tutorial: OpenCL SGEMM tuning for Kepler (2014). https://cnugteren.
github.io/tutorial/pages/page1.html
20. Oh, K.-S., Jung, K.: GPU implementation of neural networks. Pattern
Recogn. 37(6), 1311–1314 (2004)
21. Pallipuram, V.K., Bhuiyan, M., Smith, M.C.: A comparative study of GPU pro-
gramming models and architectures using neural networks. J. Supercomput. 61(3),
673–718 (2012)
22. Strigl, D., Kofler, K., Podlipnig, S.: Performance and scalability of GPU-based
convolutional neural networks. In: 2010 18th Euromicro Conference on Parallel,
Distributed and Network-based Processing, pp. 317–324 (2010)
23. Vouzis, P.D., Sahinidis, N.V.: GPU-BLAST: using graphics processors to accelerate
protein sequence alignment. Bioinformatics 27(2), 182–188 (2011)
24. Yegnanarayana, B.: Artificial Neural Networks. PHI Learning Pvt. Ltd. (2009)
25. Zhang, S., Gunupudi, P., Zhang, Q.-J.: Parallel back-propagation neural network
training technique using CUDA on multiple GPUs. In: 2015 IEEE MTT-S Interna-
tional Conference on Numerical Electromagnetic and Multiphysics Modeling and
Optimization (NEMO), pp. 1–3. IEEE (2015)
Monitoring Technologies for Animal Welfare:
A Review of Aspirations and Deployments
in Zoos
1 Introduction
Routine monitoring of animals in captive settings is essential to provide insights into
the quality of life of the animals and for maintaining and improving exacting standards
of animal welfare in zoos. Monitoring technologies are useful for optimizing welfare
strategies with non-invasive observation of animals. Advanced tracking and monitoring
technologies are used for welfare considerations for livestock, captive, and wild animals.
Monitoring the behaviour of animals in zoos promotes science-based decision making
and future planning for best-case animal care solutions [1]. While one of the goals of this research is to find technological solutions to take labour-intensive duties off zookeepers, including smart data collection and analysis, the focus is on identifying non-invasive monitoring technologies to enhance animal welfare.
We identified progressive zoos, research labs, institutions and companies working
with monitoring technologies to improve animal welfare [2]. We review literature on
wearable and nonwearable monitoring technologies including camera traps, remote
video camera systems (CCTV), additional technologies, software applications and digital
tools for data collection, storage, sharing, and analysis. We include monitoring technolo-
gies from sanctuary, domestic and agricultural environments as these may also prove fit
for purpose in zoo environments. We sent a questionnaire survey with five plain language
questions to zoos identified as being concerned with animal welfare [2] to determine
what monitoring technologies were already in use and what their future ‘wish list’ would
be.
In structuring the article, we have placed the method section after the introduction and before the literature section. The method section (Sect. 2) details the literature review selection process to provide context for the 'literature reviewed' section (Sect. 3), which forms the greater part of the paper. Section 4 covers zoo responses to the five-question survey on current use, perceived limitations, issues and wish lists for improving conditions and animal welfare. The discussion section outlines and maps these wish lists and more general requirements against existing solutions, or solutions in development, identified in the 'literature reviewed' section. The conclusion summarizes the findings and recommendations from the zoo scenarios that may be more broadly applicable.
3 Literature Reviewed
In this section we highlight the technologies covered in the literature review process that
relate to potential use for zoo environments.
Monitoring the behaviour of animals in zoos can provide valuable insights into animal
welfare and promote a process of science-based decision making in animal management
[1]. Monitoring relates to (remote) monitoring of animal behaviour, control, and the study of wildlife populations (see Fig. 1).
Behavioural monitoring is the scientific collection of animal behaviour data to under-
stand ‘normal’ patterns of behaviour and identify changes in these patterns [4]. Used
effectively, monitoring can indicate problems compromising animal well-being.
used for understanding the threats and causes of population decline and assessment of
endangerment status of species [9].
A wireless activity monitoring system (Wireless Sensor Network, WSN) would allow scientists to collect data and investigate behaviour without needing to chase and capture animals, offering a promising solution for monitoring animal behaviour [12].
The International Polar Year project [13] used a Conductivity-Temperature-Depth
Satellite Relay Data Logger with southern elephant seals to quantify how animals respond
to differences in the environment because the seals’ behaviour and population trends
signal prevailing conditions for multiple marine habitats. The research collated estimates
of population size to determine the number of southern elephant seals in the Southern
Ocean, comparing these to published numbers to determine overall change.
Twenty-six baboons were each equipped with a smart collar that embedded a tri-axial accelerometer and GPS to identify running, walking, sitting, standing, and feeding activities. The system fuses sensor data to perform intelligent behaviour identification, allowing for automatic activity profiling by using the ethologists' agreed activity identification system and avoiding prior subjectivity in categorising activities [14].
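To make the sensor-fusion idea concrete, the following is a minimal, illustrative sketch of window-based activity recognition from collar accelerometer and GPS features. The feature set, window length, synthetic data and classifier choice are assumptions for the example, not the pipeline of [14].

```python
# Illustrative sketch of collar-based activity recognition: summarise tri-axial
# accelerometer windows and GPS speed into features and train a classifier.
# Feature choices, window length and the random data are assumptions for the
# example, not the actual system described in [14].
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(acc_xyz: np.ndarray, gps_speed: float) -> np.ndarray:
    """Features for one window: per-axis means/stds, acceleration magnitude stats, speed."""
    mag = np.linalg.norm(acc_xyz, axis=1)
    return np.concatenate([acc_xyz.mean(axis=0), acc_xyz.std(axis=0),
                           [mag.mean(), mag.std(), gps_speed]])

# Synthetic labelled windows (e.g., 2 s at 50 Hz) for three behaviours.
rng = np.random.default_rng(0)
X, y = [], []
for label, (noise, speed) in zip(["resting", "walking", "running"],
                                 [(0.05, 0.0), (0.3, 1.2), (0.8, 4.0)]):
    for _ in range(40):
        acc = rng.normal(0, noise, (100, 3)) + [0, 0, 1]   # gravity on the z-axis
        X.append(window_features(acc, speed + rng.normal(0, 0.1)))
        y.append(label)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(np.array(X), y)
print(clf.predict([window_features(rng.normal(0, 0.8, (100, 3)) + [0, 0, 1], 4.1)]))
```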
Camera Traps
Camera traps are remote devices equipped with sensors (e.g., motion, infrared) that
record images or videos automatically. They are an important wildlife research tool
that offer a practical approach to answer questions about wildlife beyond density or
estimation of animal populations [16]. For example, camera traps allow researchers to
determine the presence of rare species and sometimes reveal how to better support their
recovery [17]. When used in combination with telemetry, they are useful to examine
scavenging behaviour [18]. Camera Base software is a tool that helps biologists manage
data from multiple camera trap surveys and provides tools for data analysis including
capture-recapture, occupancy, activity patterns and diversity [18–20].
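As a simple illustration of the capture-recapture calculations that such tools automate, the sketch below implements the textbook Lincoln-Petersen estimator with Chapman's correction; the counts are invented and this is not Camera Base's actual implementation.

```python
# Illustrative Lincoln-Petersen capture-recapture estimate (with Chapman's
# correction), the textbook calculation behind capture-recapture analyses;
# the counts below are made up.
def lincoln_petersen(n1: int, n2: int, recaptured: int) -> float:
    """Estimate population size from two survey occasions.

    n1         -- individuals identified on the first occasion
    n2         -- individuals identified on the second occasion
    recaptured -- individuals seen on both occasions
    """
    # Chapman's correction avoids division by zero and reduces small-sample bias.
    return (n1 + 1) * (n2 + 1) / (recaptured + 1) - 1

print(f"estimated population: {lincoln_petersen(n1=30, n2=25, recaptured=10):.0f}")
```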
The study [20] compared the efficiency of arboreal camera trapping with line transects for inventorying medium and large-sized arboreal mammals and assessed the viability of using camera traps in trees to model habitat occupancy. Cameras recorded 10-s video clips for ease of identifying species, with 200–300 videos processed per hour; videos
can be reviewed at double speed and analysed in statistical software.
Collaborative wildlife monitoring and tracking across large geographical and time scales with volunteer citizen scientists using camera traps (motion-sensitive cameras) has expanded conservation research [21, 22]. Priorities identified for future improvement include automated camera-trap image analysis for animal detection, tracking, and species recognition, with advanced machine learning and image analysis methods to improve performance and successful deployment. Some 2.6 million images of several North American mammals were processed using eMammal, a biological informatics cyber-infrastructure,
which brings together citizen scientists and wildlife professionals to collect, analyse, and
manage massive camera-trap data. The system comprises: (1) software for viewing, tag-
ging, and uploading photographs, (2) expert review to ensure data quality, (3) an archive
for approved data, and (4) a website for managing the study, including the partici-
pants, and accessing and analysing the data. Macrosystem scale monitoring of wildlife
by volunteer-run camera traps could produce the data needed to address questions
concerning broadly distributed mammals and raise public awareness of conservation
science.
Using 83 camera traps (Bushnell Trophy Cam™), researchers examined the accuracy
of camera trap data to provide assessments of chimpanzees (Pan troglodytes) party size,
seasonal variation in party size, community demographic changes (births, deaths, emi-
grations, immigrations), and community composition (age/sex structure) and habituation
to camera traps [23].
A photographic capture–recapture survey used remotely triggered, modified Pentax 'point and shoot' cameras installed in a waterproof plastic box with a receiver and a separate wireless passive infrared trigger. Later modifications enabled infrared images.
Remote RFID (Radio-frequency identification) scanners have been deployed in a range
of situations for passive monitoring and work well in the wild to record the diversity of
co-occurring species [24].
Bushnell camera traps with infrared sensors and low-glow LED flashes, equipped with SD cards and lithium batteries, were left in place to take bursts of three pictures [25].
Camera trapping combined with citizen science was efficient for long-term non-invasive
monitoring at low cost.
A remote camera trapping method took images and video, providing identification
of individual free-roaming wild horses across a range of habitats and capturing multiple
animal-based welfare indicators. This was useful where horses could not be sighted regularly, observed for a long enough duration, or approached closely enough to enable direct assessment of welfare. Precise, strategic camera placement and settings enhanced the quality of the data and minimised battery usage and SD card storage [26].
Comparative Review
Comparative testing of the five most frequently used camera traps [27] (Bushnell Tro-
phy Cam Aggressor, Keep Guard 680V, Ltl Acorn Ltl-5310, Scoutguard SG550BV,
and Reconyx HyperFire) identified key factors influencing the probability of obtaining usable photographs. Performance differences arising from varied settings demonstrated that caution is needed for direct comparisons between results of different experiments, or when designing new ones [27]. The study [28] compares three commonly used camera traps
(Reconyx PC850, Scoutguard KG680v, Bushnell Trophy) used for monitoring behaviour
of fauna, general survey of fauna and detection of medium to large terrestrial animals to
improve fauna conservation.
Testing in the Zoo with Trail Cameras
Trail cameras were tested in three zoos: Auckland Zoo, Hamilton Zoo, and Currumbin Wildlife Sanctuary, to examine how red panda would respond to these cameras within the
context of gauging their usefulness for wild settings. The author [29] used two main types
of cameras: a Kinopta Blackeye BE2-W (‘Blackeye’) and two different models of trail
cameras: a Bushnell Trophy Cam Aggressor and Browning Dark Ops sub micro-series.
Direct personal observations were also taken, noting typical significant factors, such as
weather and temperature. Statistical analysis demonstrated a significant difference in the types of behaviours recorded with the two observational methods, showing that the method does affect the type of data collected. Trail cameras affected behaviour at all zoos by changing the way red panda spent their time, with captive red panda more active in the presence of trail cameras. Temperature also had a significant impact, with red panda sleeping and resting longer at higher temperatures. As trail cameras changed the way red panda spent their time (in a captive setting), care should be taken when using trail cameras in the wild to account for inflated activity estimates.
Camera trap array data (Reconyx infrared cameras) were paired with data collected from GPS (wearable) tracking collars (containing a triaxial accelerometer and an ultra-high-frequency transmitter for telemetry and data download) to detect whether, at the population level, the spatial and temporal patterns of detections reflected the proximity of space use to sampling sites or variability in the magnitude of animal movement across the area [30]. Not accounting for multi-species movement may bias inferences of ecological processes and result in mis-specified recommendations.
Nonwearable Wildlife Advanced Monitoring Camera (WAMCam) and wearable
(smart collar) monitoring technologies were combined [31]. WAMCam is a smart camera
unit, connected by satellite communications and backed by a system control panel to
manage a collection of deployed devices [32]. This system combines several WAMCam
smart devices, communicating over LoRaWAN with a SATCOM gateway device. The
rugged, battery-powered cameras are designed with AI onboard, capable of identifying different species of interest. WAMCam devices monitor live animal traps and send real-time notifications to the end user via SMS and/or email when a trap is triggered. To minimise cost, the WAMCam system uses Iridium SBD messaging to notify the user of the animal trap status and contents; small, text-based messaging works for sites with satellite visibility issues. SBD messages are received at the ground station and forwarded to the Cerebella middleware, where they are processed and passed to the end user as notifications (a minimal sketch of this flow is given after this paragraph). The frequency of status reports can also be configured remotely. Notifications can include the detected species in the trap or indicate when the trap is empty and was accidentally triggered, e.g., by a falling branch. LoRaWAN allows the user to position the animal trap where required, unconstrained by satellite visibility. The system is configured via the web-based Cerebella control panel, where devices are managed and status updates received. Use of the system demonstrated that multi-scale modelling identified primary habitat
requirements, limiting factors and the spatial scales at which organisms are strongly
associated with key habitat factors. The projected model provides crucial information
for conservation management, including the identification of suitable core habitats and
medium-quality habitats, critical to meta-population viability through provisioning of
essential connectivity corridors for dispersal and mating among core populations.
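The sketch below illustrates the trap-status notification flow described above: a small text-based SBD message is decoded and turned into a human-readable alert. The message layout, field names and notification function are illustrative assumptions, not the actual WAMCam or Cerebella implementation.

```python
# Hypothetical sketch of the trap-status notification flow described above.
# The JSON layout, field names and notification_text() behaviour are illustrative
# assumptions, not the actual WAMCam/Cerebella implementation.
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrapStatus:
    device_id: str
    triggered: bool
    species: Optional[str]  # species identified by the onboard AI, if any

def parse_sbd_payload(payload: bytes) -> TrapStatus:
    """Decode a small text-based status message received via Iridium SBD."""
    msg = json.loads(payload.decode("utf-8"))
    return TrapStatus(device_id=msg["device"],
                      triggered=bool(msg["triggered"]),
                      species=msg.get("species"))

def notification_text(status: TrapStatus) -> str:
    """Build the human-readable alert forwarded to the end user by SMS or email."""
    if not status.triggered:
        return f"{status.device_id}: trap empty, no activity."
    if status.species is None:
        return f"{status.device_id}: trap triggered, no species identified (possible false trigger)."
    return f"{status.device_id}: trap triggered, detected {status.species}."

if __name__ == "__main__":
    payload = b'{"device": "wamcam-07", "triggered": true, "species": "stoat"}'
    print(notification_text(parse_sbd_payload(payload)))
```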
monitoring wildlife with low-cost solutions to make CCTV more accessible to wildlife
practitioners and naturalists. The study [35] provides recommendations for animal facilities on installing systems, outlining the benefits of camera systems for sanctuaries to facilitate animal care and observational research. Further, [36] identified costs, maintenance logistics, and location as issues and recommended use for easily identifiable
behaviours. The study [37] used CCTV for sleep monitoring combined with cortisol
measuring for stress testing to assess animal welfare states.
behaviour of elephants in urban zoo environments and provides a basis for future wel-
fare recommendations [42]. These elephants displayed behaviours and travel distances
comparable to those in the wild [43]. Data was collected without disturbing elephants’
usual routines. The work promotes monitoring technology use in further zoo studies,
alleviates the need to attach sensors to animals and enables footage to be played in real
time or viewed later.
Delhi Zoo installed CCTV cameras (n = 230) on the premises and in animal enclo-
sures, for 24/7 monitoring of animal and human behaviour [44]. The zoo plans to intro-
duce virtual reality technology, to allow visitors to “get closer” to the animals, and a
GPS-based mobile application to make zoo visits more engaging and informative. The
technologies can provide dependable behavioural information 24/7 while minimising
time and resources used in long-term monitoring. Long-term behaviour data can be integrated into zoo management strategies to respond to the changing needs of animals arising from social, environmental, or physical changes.
The Association of Zoos & Aquariums (AZA) Animal Welfare Committee recom-
mends that zoo professionals develop tools for measuring zoo animal welfare on an indi-
vidual animal-based level. Multiple zoos and aquariums have developed their own assess-
ment tools and programs. These include EthoTrak® (developed by the Chicago Zoolog-
ical Society), EthoSearch (developed by Lincoln Park Zoo and partners), ZooMonitor
(developed by Lincoln Park Zoo and partners), WelfareTrak® (developed by the Chicago
Zoological Society and partners), and the geriatric animal quality of life assessment pro-
cess developed by San Francisco Zoo’s Wellness and Conservation Center. These tools
are provided for the zoological community to engage in on-going behavioural moni-
toring and facilitate a continual assessment of animal welfare. Some are offered free
to Accredited Organizations (Zoo, Aquarium, Sanctuary or Museum). For example,
ZooMonitor is a popular free application used in many zoos including the Smithsonian’s
National Zoological Park, North America, the sanctuary Chimp Haven, Shreveport, LA,
etc. Companies selling technology may supply their systems with inbuilt software, such
as Gview, supplied as part of the CCTV system.
multifaceted and multilevel monitoring system of goat welfare, this system may provide
a useful reference for future precision livestock farming and surveillance.
Surveillance of farm animals and automatic detection of deviant behaviours is evolv-
ing in livestock science and farming [45]. The studies [45, 47] use two computer vision algorithms to analyse and record the movement activity of single-housed sows. The system transforms the signal so that sows are reliably detected and monitored, with detection levels customised so that unexpected behaviour raises alarms.
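The following minimal sketch conveys the general idea of such activity monitoring with a customisable alarm level: a frame-differencing activity index is compared against an expected baseline and strong deviations raise an alarm. The specific measure, thresholds and synthetic frames are assumptions for illustration, not the published algorithms of [45, 47].

```python
# Minimal sketch of activity monitoring with a customisable alarm threshold,
# in the spirit of the sow-monitoring systems described above [45, 47].
# The frame-differencing measure and threshold values are illustrative
# assumptions, not the published algorithms.
import numpy as np

def activity_index(prev_frame: np.ndarray, frame: np.ndarray) -> float:
    """Fraction of pixels whose grey value changed noticeably between frames."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(np.mean(diff > 20))  # 20 grey levels ~ a "noticeable" change

def monitor(frames, baseline: float, tolerance: float = 3.0):
    """Yield an alarm whenever activity deviates strongly from the expected baseline."""
    frames = iter(frames)
    prev = next(frames)
    for i, frame in enumerate(frames, start=1):
        act = activity_index(prev, frame)
        if abs(act - baseline) > tolerance * baseline:
            yield i, act  # frame index and activity level that triggered the alarm
        prev = frame

# Example with synthetic 8-bit greyscale frames: a quiet scene, then sudden movement.
rng = np.random.default_rng(0)
quiet = [rng.integers(100, 104, (120, 160), dtype=np.uint8) for _ in range(5)]
burst = [rng.integers(0, 255, (120, 160), dtype=np.uint8) for _ in range(2)]
for idx, level in monitor(quiet + burst, baseline=0.05):
    print(f"alarm at frame {idx}: activity {level:.2f}")
```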
reptiles and monkeys. Temperature is key in this building, because reptiles and amphibians are housed on the upper level and mammals on the ground floor, and each requires unique settings. If the system detects a problem, alerts are instantaneous. Additional entry and motion sensors can operate as a whole-building security system.
Cardiopulmonary Activity
The study [52] used digital cameras for basic health checks to reduce anaesthetic use for
zoo animals. Monitoring included nine species of zoo animals: giant panda (Ailuropoda
melanoleuca), African lions (Panthera leo), Sumatran tiger (Panthera tigris sumatrae),
koala (Phascolarctos cinereus), red kangaroo (Macropus rufus), alpaca (Vicugna pacos),
little blue penguin (Eudyptula minor), Sumatran orangutan (Pongo abelii) and hamadryas
baboon (Papio hamadryas) [53]. The non-contact, non-invasive and cost-effective monitoring system uses digital camera imagery to extract cardiopulmonary signals (pulse rate, PR, and breathing rate, BR) of unrestrained animals at different distances by detecting motion on the animal's body surface caused by cardiopulmonary activity. This novel method provides non-contact
physical monitoring and remotely sensed health assessment of animals, demonstrating
promise for applications in veterinary practice, conservation, game management, animal
welfare and zoological and behavioural studies.
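A minimal sketch of the underlying idea is given below: the pixel signal is averaged over a region of interest on the animal's body and the dominant frequency in a physiological band is reported in cycles per minute. The frame rate, band limits and synthetic signal are illustrative assumptions, not the method of [52, 53].

```python
# Minimal sketch of the idea behind camera-based cardiopulmonary estimation:
# average the pixel signal over a region of interest on the animal's body, then
# find the dominant frequency in a physiological band. Band limits, frame rate
# and the synthetic signal are illustrative assumptions, not the published method.
import numpy as np

def dominant_rate_per_minute(signal: np.ndarray, fps: float,
                             band_hz=(0.1, 1.0)) -> float:
    """Return the strongest periodic component of `signal` (in cycles per minute)."""
    x = signal - signal.mean()                      # remove the DC offset
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)    # frequency axis in Hz
    power = np.abs(np.fft.rfft(x)) ** 2
    mask = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    return 60.0 * freqs[mask][np.argmax(power[mask])]

# Synthetic example: 30 s of video at 25 fps, chest movement at 0.4 Hz (24 breaths/min).
fps = 25.0
t = np.arange(0, 30, 1 / fps)
roi_mean = 0.5 * np.sin(2 * np.pi * 0.4 * t) + 0.05 * np.random.default_rng(1).normal(size=t.size)
print(f"estimated breathing rate: {dominant_rate_per_minute(roi_mean, fps):.1f} per minute")
```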
Thermal (Infrared)
The author [54] worked with thermal (infrared) imaging in a sanctuary setting where
unrestrained chimpanzees were able to move freely around their enclosures. This was
coupled with an evaluation of pairing information with long-term behavioural data for
a multifactor welfare monitoring system. Thermal imaging is useful in large and complex environments where enclosure elements or conditions may otherwise obscure animals (e.g., trees, low-light conditions), and for purposes such as non-invasive documentation and tracking of wound and infection healing from a distance.
Covering observation of wildlife in their natural habitat and an overview of thermal physics and thermal imagers, [55] includes a manual on sound survey design and the theory and performance characteristics of thermal imaging cameras with cooled quantum detectors and uncooled microbolometric imagers introduced in past decades.
The study [56] describes how thermal imaging (or thermographic) cameras work and presents some examples of using this technology in a variety of contexts beyond wildlife monitoring, including research on migrations [57], behaviour (e.g., flight patterns; [58]), welfare and disease diagnosis [59], avoiding the killing of animals (e.g., in farmland bird nests, fawns) during mowing [60], and detecting bird collisions at wind farms [61].
The contrast between the heat emitted by animals and their immediate surroundings
can help detect them efficiently and unobtrusively, particularly at night, with cryptic
background or when hidden by vegetation [62]. Complexities such as ambient temperature, insulation by fur, surface temperature versus core body temperature, distance to target, and field of view of the lens meant that pilot studies/case studies were required. For data collection, thermal imaging is passive under both day and nighttime conditions. It minimizes disturbances to wildlife and can detect animals that are colder, warmer, or the same temperature as their background, because it does not compare temperatures directly but detects the heat emissions of the animal against its background.
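A minimal sketch of contrast-based detection in a single thermal frame is shown below: pixels deviating from the background temperature by more than a chosen margin are grouped into candidate regions. The threshold, minimum region size and synthetic frame are illustrative assumptions rather than a field-calibrated detector.

```python
# Minimal sketch of detecting animals in a thermal frame by their contrast with
# the background. The 2 degC margin, minimum region size and synthetic frame are
# illustrative assumptions, not a field-calibrated detector.
import numpy as np
from scipy import ndimage

def detect_contrast_regions(frame_c: np.ndarray, delta_c: float = 2.0, min_pixels: int = 20):
    """Return bounding slices of connected regions whose temperature deviates from
    the median background by more than `delta_c` and that contain >= `min_pixels`."""
    background = np.median(frame_c)
    mask = np.abs(frame_c - background) > delta_c
    labels, _ = ndimage.label(mask)
    regions = ndimage.find_objects(labels)
    return [r for r in regions if (labels[r] > 0).sum() >= min_pixels]

# Synthetic 10 degC scene with one 35 degC "animal".
frame = np.full((100, 100), 10.0) + np.random.default_rng(2).normal(0, 0.3, (100, 100))
frame[40:52, 60:75] = 35.0
print(f"detected {len(detect_contrast_regions(frame))} region(s)")
```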
Drones
Drones (also known as unmanned aerial vehicles, UAVs, and remotely piloted aircraft systems, RPAS) are remotely operated aircraft with autonomous flight capabilities. Drone surveys allow rapid and frequent monitoring in remote and poorly understood
areas, with data immediately accessible and rich information on habitat and conservation
related conditions [72]. The author in [73] describes a female chimpanzee making two
sweeps at an overhead drone with a branch that she held in one hand. The second sweep
successfully downed the drone, demonstrating forward planning with tool-use and in this
instance, the perceived invasiveness of the drone. Studies [74] and [75] discuss the use
of drones for wildlife conservation, including the three common types of conservation
drones, outlining the pros and cons of each version. There is much potential for drone
use in larger scale environments and for conservation purposes, to detect and monitor
arboreal mammal populations and to assess species occupancy and distribution.
Beyond data collection, one of the most critical issues in using these technologies is data analysis. Different applications are being developed to combine images and/or video with analytics for smart event detection and automatic control of the technology, reducing or eradicating the need for user interaction or participation. Some species-specific welfare monitoring programs are being designed based on multi-institutional studies that tested many parameters on a single species or taxon. Artificial intelligence is increasingly used to improve wildlife identification, monitoring and analysis of large amounts of conservation data coming from multiple sources such as camera trap, satellite and drone images or audio and video recordings [76]. Digital tools that increase efficiency in data collection and visualization are becoming increasingly available. The author [49] points to ideas surrounding welfare that are unique to individual animals and contexts.
4 Questionnaires
To understand what monitoring technologies zoos concerned with animal welfare were already using, and what their 'wish list' for future improvements would be, we sent out five straightforward plain-English questions.
Using samples from keyword searches via Google and Google Scholar (e.g., zoo monitoring behaviour remote) and the PICO process, we found a variety of publications and resources that included this list of zoos (see Table 1), identified as taking a progressive approach to animal welfare [2]. The identified zoos were:
Table 1. Zoos identified in the literature [2] as progressive in relation to animal welfare and use of technology
All zoos were approached via email or, where no email contact was available, via their online query form. We used the same request text for all enquiries.
Dear [ZOO NAME],
We are researchers at Auckland University of Technology. We are doing a study that
involves identifying the best animal welfare monitoring solutions used by the most pro-
gressive zoos and sanctuaries. We are looking at technology solutions that help identify
and address animal behaviour issues and take the workload off zookeepers.
Could you please pass on this short questionnaire to the right person/people in your
organisation? The findings from this survey will be presented in a report, a copy of which
can be sent to your organisation.
If convenient, can you email me the answers to these questions, or I can also zoom/phone
in to discuss depending on what suits you best.
Ann Morrison (contact details etc.).
4.1 Responses
Four zoos graciously participated, and we present their responses to the questions here:
#1 Our main method of monitoring animals is video cameras that are trained on the
enclosures 24/7.
#2 For our welfare assessments, we enter the data into ZIMS/Species 360. Keeper staff
helped decide what aspects we would like to monitor and then a form was made
for them to fill out. Once it is filled out, it is sent to the Animal Care Supervisor
of Mammals and our veterinarian for review, then entered ZIMS/Species 360. The
hard copies are kept in a file for each individual or in their information folder.
#3 ZIMS. We currently aren’t using the Care and Welfare module yet but are planning
to slowly implement in the next few months. Internet to look up info or help in
creating ethograms. Video/cameras. Thermometers/Hygrometers? For monitoring
animal environments. Metasys?
#4 The primary technology that we use for animal monitoring is the ZooMonitor app
(www.zoomonitor.org). This is an app that was originally developed by Lincoln
Park Zoo in partnership with Tracks Data Solutions, largely funded by the Institute
for Museum and Library Services. Trained observers (volunteers, interns, research
staff, keepers) watch the animals and record animal behavior and space use on tablet
devices (iPads), and the ZooMonitor software provides some basic summary data
and intuitive heat maps to visualize how animals are using their habitats. We are in
the process of expanding the app to facilitate multi-institutional animal monitoring.
In addition, we use Monnit sensors to remotely detect activity and habitat or feature use (www.monnit.com), motion-triggered or time-triggered trail cameras (e.g., Bushnell.com, www.Wyze.com), and small "spy" cameras (brand = Blindspot). We also have several habitats equipped with 24-h camera surveillance. We will sometimes extract systematically collected behavior information from our primary record-keeping software, Tracks (www.trackssoftware.com).
#1 In principle all the enclosures are under constant passive video monitoring, but if
and when there is a particular concern we then switch to active monitoring.
#2 In the mammal department, we do assessments on all the individuals. Depending
on health and age, we will do them more often. Some individuals are twice a year,
while others are four times a year.
#3 All animals but less so with our program animal reptiles/invertebrates.
#4 ZooMonitor app has been used as part of an ongoing, long-term monitoring pro-
gram for the African lion, African penguin, Allen’s swamp monkey, American avo-
cet, Asian small-clawed otter, Bactrian camel, Black bear, Black rhino, Black-and-
white colobus, Black-necked stilt, Brush-tailed bettong, Chimpanzees, Cinereous
vulture, Crowned lemur, De Brazza’s monkey, Eastern screech owl, Egyptian fruit
bats, Giraffe, Golden-headed lion tamarin, Gorillas, Grey seal, Guam rail, Guam
kingfisher, Harbor seal, Japanese macaques, Jamaican Iguana, Klipspringer, Ornate
box turtle, Polar bear, Pygmy hippo, Red river hog, Snowy owl, Takin, Titi monkey,
Three-toed box turtle, White-faced saki monkey and others.
Trail cameras, small spy cameras, or built-in camera systems have been used to
monitor: African lions, American toads, Domestic chickens, Dwarf crocodiles, Pygmy
hippos, Polar bears, Prevost squirrels, White-blotched river stingray and others. Brush-
tailed bettongs and the Armadillo species have been monitored using remote sensors.
#1 Provision.
#2 We use ZIMS/Species 360.
#3 Camera software genetic security. Trail cameras all different types and brands.
ZIMS.
#4 ZooMonitor app (www.zoomonitor.org), Monnit sensors to remotely detect activity
and habitat or feature use (www.monnit.com), motion-triggered or time-triggered
trail cameras (e.g., Bushnell.com, www.Wyze.com), and small “spy” cameras
(brand = Blindspot). Extract systematically collected behavior information from our primary record-keeping software, Tracks (www.trackssoftware.com).
In a Perfect World, What Else Would You Like These Technologies to Be Able
to Do?
#1 You can access it from home, but I have never tried to set up any alerts. We also
have never used it for behaviour analysis. We have used ZooMonitor, but we don’t
use ZIMS/Species 360 in that form. I am sure it is possible, but we don’t use it that
way here.
#2 Audio. A perfect monitoring camera would be portable, easy to attach places,
weatherproof, have night vision, more recording capabilities, remotely con-
trolled/moveable and viewable, and audio.
#3 We are expanding the ZooMonitor functionality to support multi-institutional data
collection which we think is a step in the right direction! In a perfect world, behav-
ioral monitoring apps like ZooMonitor would have built-in analytics that indicate
real-time when welfare has likely improved or declined in quality. In a perfect
world, there would be non-invasive, accurate, automated recording of behavioral
and physiological changes in animals. The remote sensors are typically made for
larger animals, people, so more sensitivity for smaller-bodied animals, burrowing
animals, flying animals, would be great. Ability to train motion-triggered cameras to
the type of motion of interest (e.g., a moving wolf but not a moving stick) and to fol-
low that motion, view the full scene, would also be ideal, combined with automated
coding of the recorded information.
Zoo #1 used the system 'Provision' with video cameras trained on all the enclosures 24/7 for passive video monitoring. If any concerning behaviour was detected, the system was switched to active monitoring, coupled with manual observation from the caretaker. The
zoo uses the technology to monitor health, behaviour, group interaction, aggression,
interaction with devices etc. For future improvements, the zoo would like to add cortisol
measuring to their data gathering to get a better reading of health and stress levels of
their animals.
By contrast, Zoos #2 and #3 used ZIMS/Species 360 on the mammal population, with the monitoring also used for assessments on all individuals. How often these assessments occurred depended on the health and the age of the individuals, with the more fragile being assessed more often (e.g., four times per year versus twice a year). For each individual animal, there was a hard-copy information folder where any changes were recorded. The data from the assessments was not entered directly into ZIMS, in case more information or assessment was needed from the supervisor or vet. ZIMS/Species 360 catered for all of Zoo #2's current needs, but the zoo was not using the system for behaviour analysis. Zoo #3's priority is to gain information about animal interactions, conspecifics, and mixed species.
The fourth zoo is a major instigator in a wider problem-based solution process intended to fit multiple scenarios. Their responses are comprehensive and detail their historical and ongoing developmental, solution-based approach. Their continual expansion of, e.g., ZooMonitor functionality benefits the many zoos which, thanks to this inclusive approach, also work with the system. As a key player in developing technology solutions in this field, it is useful to note their future trajectory towards "non-invasive, accurate, automated recording of behavioral and physiological changes in animals" and "automated coding of the recorded information", something many zoos, farms and wildlife sanctuaries are also looking to implement. In addition, multi-institutional sharing of data, also a conservation imperative, would accelerate knowledge transfer and impact significantly on improvements to animal welfare.
5 Discussion
We have identified developments and implementations in the reviewed literature (Sect. 3) and set them against the deployed technologies and future aspirations demonstrated in the zoo questionnaire responses (Sect. 4). Here, we combine advances and ambitions from these two sources and discuss limitations, issues and impact, recommendations, and next steps forward. Overall, we note a call for 'non-invasive, accurate, automated recording of behavioral and physiological changes in animals' (Zoo #4).
5.1 Limitations
Since writing up the initial report and this article, we are aware that other relevant articles will have been published that we could not include. Being relatively new to the field, we took guidance from Auckland University of Technology librarians and conservation researchers on refining our keyword search terms.
The small number of zoos that responded compared to those we approached (see
Table 1) is a limitation of the study. Regardless, the responses reveal a diverse set of
priorities, focus, and implemented solutions and contribute to the larger discussion.
Not all monitoring technologies are suitable for use in a zoo environment. Drones have limited applicability, with legal and institutional restrictions regarding aviation rules and health and safety. Noise from drones has been identified as a serious disturbance risk for some species in the wild, with future aerial survey or monitoring work requiring strict protocols to minimize disturbance risks [77]. Recent novel work determined optimal flight altitudes for minimizing drone disturbance to wildlife using species audiograms [78]. While Passive Acoustic Monitoring (PAM) [15, 79] is a useful technology for sound recording and automatic sound identification of animals in the wild [56, 80], its use in zoo environments is restricted by privacy issues.
Wi-Fi Coverage: The efficiency and capacity of Wi-Fi and the servers the systems run on impact what technology can be supported and what remote use is possible within zoos [7, 12]. Traditionally, zoos' focus was on providing 'natural-enough' enriched environments for the animals, and this still fits, but technologies did not play such an integral role. More recent technology interventions require design choices that integrate the technology while augmenting, rather than detracting from, the naturalistic landscape environments [50].
Public Institutions: Many zoos are supported by public monies and operate on public
institution networks or cloud-based services. These have standard restrictions on privacy
and data security, plus competition for resources is always a factor within the framework
of a large institutional model. Upgrading and adding new software and data analysis
systems may cause incompatibilities across entire systems, where numerous functions
and institutions need to operate securely within the one multi-serving system.
Public Facing: Keepers and zoos are aware of the need to keep up with the evolving
focus on animal welfare, successful breeding (especially for endangered species) and
education programs, as well as benefits from using enhanced technology systems. Most important is re-educating the public about the usefulness of technologies for addressing animal welfare issues, particularly where, e.g., visible wearable technologies are used for this purpose. Often the public has a mixed perception and reception even of the role of
zoos, which requires Public Relations information management. This might take the form
of radio and online interviews, newspaper clips and social media promotion that focus on
animal welfare benefits. Zoo tours and information sessions already make up many zoos'
routines and could include information on the benefits of such technologies. Research
studies that demonstrate positive welfare impacts from data gathered through wearables
and other monitoring technologies would support an informed public’s understanding
of these devices as having a positive impact on animal welfare. We also see this in
Sect. 3.4, Examples of Use in Zoos, where technologies bring animals and humans
‘closer together’ through webcam streaming, CCTV and video monitoring, camera traps
and VR technology [38–44]. These technologies feed information to the keepers and also
act to connect and bond the public to the animals whose lives they are able to witness.
Events such as the birth of an endangered species [39] provide leverage for updating global technology coverage, promote the conservation role of the modern zoo and attract visitors.
Combining Systems: Adaptive modular systems that enable various sensor systems to be combined would prove useful, as would combining monitoring methods, e.g., mixing sleep observation with cortisol readings (Zoo #1) [37]. Continuing modification and integration of simple modular systems looks promising. For example, camera traps are mobile and motion activated, so they can be readily repositioned in response to changing activity. However, they cannot be accessed remotely and need an easy-to-use interface, extended recording capabilities, night vision, audio (Zoo #3) and sensitivity to smaller-bodied animals (Zoo #4). Adapting camera traps to manage Wi-Fi and adding a quality interface would significantly change their capacity. Smaller mobile modular solutions can sometimes be the most useful [31, 32]. Existing systems that can be updated, modified, and/or coupled with other systems offer flexibility and expanded data collection capabilities [37]. Digital cameras that track cardiopulmonary readings offer basic health checks [52]. Wearable solutions such as a leg band or collar are possible for some animals [6–8, 14] and would prove a less invasive solution. Zoo environments require ruggedized solutions to operate in restrictive conditions.
Remote Access: Secure robust Wi-Fi coverage throughout zoo environments can expand
viable coverage options and solutions [46]. In turn, this would provide remote access to
monitoring systems [39, 51], reducing manual labour significantly and ensuring systems
could adapt easily to the changing needs of animals synchronously. Looking through a 24-h cycle of footage (even with sampling or fast forwarding) to find anomalies is an inefficient use of keeper time. A significant improvement would be to enable alert notifications of condition changes to be reported and received instantaneously [38, 39, 51]. A system of remotely accessible in-situ transponders would enable keepers to note trends against established stress baselines. We see this in precision farming, where only deviations above defined baseline parameters of 'usual' behaviour are captured, customised levels are adaptable, and unexpected behaviours raise alarms [45, 47]. Autonomous systems to manage data collection and analysis would also inform longer-term welfare management strategies and address welfare needs.
Data Analysis: Efficient data collection, digital tools and visualisation addressing individual animals' unique welfare needs and contexts are becoming increasingly available
[49]. Zoos and technology developers have recognised the need for an Artificial Intel-
ligence system or similar to analyse large amounts of data from multiple sources [76].
Combining data capture with automated coding of the recorded information would be
ideal (#4 zoo). In addition, a long-term archive [21, 22] would map improvement or
deterioration of the different species and sanction resources more effectively for future
strategic planning, as would multi-institutional sharing of data collection.
6 Conclusion
We investigate the status of contemporary monitoring technologies for animal welfare
in a review of the literature. With a focus on zoo environments, we included agricultural
and wild environment solutions, as knowledge and applications from those contexts may
be transferrable to zoo environment requirements. Responses from zoos working with
multiple species with distinctive needs reveal current and future requirements envisaged
for the animals in their care and for streamlining workload for the keeper teams. We
discuss those expanded desires and aspirations against findings from the literature to
scope future improvement solutions for monitoring welfare in zoo environments. We
contribute findings, recommendations, and next steps from these scenarios that can be
applied more broadly to other animal welfare contexts.
Acknowledgments. We thank the four zoos for their generous responses; they remain anonymised in this article but have read and agreed that their input be published. Additionally, we acknowledge funding from the AUT Summer Research Award from the Faculty of Design and Creative Technology, without which this research would not be possible. We also thank all who reviewed early drafts of
this research, including anonymous FTC reviewers, for their helpful comments that have improved
this publication.
References
1. Wark, J.D., et al.: Monitoring the behavior and habitat use of animals to enhance welfare
using the ZooMonitor app. Anim. Behav. Cogn. 6(3), 158–167 (2019)
2. Hawkes, N.: Animal Care Monitoring Tool Coming to ZIMS (2016). https://www.species360.org/2018/03/animal-care-monitoring/. Accessed 10 Apr 2022
3. Methley, A.M., Campbell, S., Chew-Graham, C., McNally, R., Cheraghi-Sohi, S.: PICO,
PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools
for qualitative systematic reviews. BMC Health Serv. Res. 14(1), 579 (2014)
4. Watters, J., Margulis, S., Atsalis, S.: Behavioral monitoring in zoos and aquariums: a tool for
guiding husbandry and directing research. Zoo Biol. 28(1), 35–48 (2009)
5. Camal, L., Kirtane, A., Blanco, T., Casas, R., Rossano, F., Aksanli, B.: A wearable device net-
work to track animal behavior and relationships in the wild. In: 2019 IEEE 10th Annual Ubiq-
uitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019,
pp. 0198–0202 (2019)
6. Jukan, A., Masip-Bruin, X., Amla, N.: Smart computing and sensing technologies for animal
welfare. ACM Comput. Surv. 50(1), 1–27 (2017)
7. Kwong, K.H., et al.: Wireless sensor networks in agriculture: cattle monitoring for farming
industries. Prog. Electromagn. Res. Symp. 2, 1719–1723 (2009)
8. Boyd, I., Kato, A., Ropert-Coudert, Y.: Bio-logging science: sensing beyond the boundaries.
Mem. Natl. Inst. Polar Res. Spec. Issue 58, 1–14 (2004). Special issue (ISSN/ISBN: 03860744)
9. Cooke, S.: Biotelemetry and biologging in endangered species research and animal conser-
vation: relevance to regional, national, and IUCN Red List threat assessments. Endanger.
Species Res. 4(1–2), 165–185 (2008)
10. Hedenström, A., Lindström, Å.: Migration and flight strategies in animals: new insights
from tracking migratory journeys. In: Animal Movement Across Scales, pp. 73–89. Oxford
University Press (2014)
11. Block, B.A.: Physiological ecology in the 21st century: advancements in biologging science.
Integr. Comp. Biol. 45(2), 305–320 (2005)
12. Tan, S.-L., Ha Duy, N., Garcia-Guzman, J., Garcia-Orduna, F.: A wireless activity monitor-
ing system for monkey behavioural study. In: 2011 IEEE 15th International Symposium on
Consumer Electronics (ISCE), pp. 40–45 (2011)
13. Hindell, M., et al.: Circumpolar habitat use in the southern elephant seal: Implications for
foraging success and population trajectories. Ecosphere 7(5), e01213 (2016)
14. Leoni, J., Tanelli, M., Strada, S.C., Berger-Wolf, T.: Data-driven collaborative intelligent
system for automatic activities monitoring of wild animals. In: 2020 IEEE International
Conference on Human-Machine Systems (ICHMS), pp. 1–6 (2020)
15. Kalan, A.K., Mundry, R., Wagner, O.J.J., Heinicke, S., Boesch, C., Kühl, H.S.: Towards the
automated detection and occupancy estimation of primates using passive acoustic monitoring.
Ecol. Indic. 54, 217–226 (2015)
16. Pacheco, X.: How technology can transform wildlife conservation. In: Green Technologies
to Improve the Environment on Earth. IntechOpen (2018)
17. Pisto, K.: What do remote cameras reveal for carnivore researchers? Hike with us to find
out, 01 August 2019. https://blog.zoo.org/2019/08/what-do-remote-cameras-reveal-for.html.
Accessed 22 Feb 2022
18. Tobler, M., Zúñiga Hartley, A., Carrillo-Percastegui, S., Powell, G.: Spatiotemporal hierar-
chical modelling of species richness and occupancy using camera trap data. J. Appl. Ecol.
52(2), 413–421 (2015)
19. Tobler, M.: Camera base version 1.7 [computer program] (2015)
20. Bowler, M., Tobler, M., Endress, B., Gilmore, M., Anderson, M.: Estimating mammalian
species richness and occupancy in tropical forest canopies with arboreal camera traps. Remote
Sens. Ecol. Conserv. 3(3), 146–157 (2017)
21. He, Z., et al.: Visual informatics tools for supporting large-scale collaborative wildlife
monitoring with citizen scientists. IEEE Circuits Syst. Mag. 16(1), 73–86 (2016)
22. McShea, W.J., Forrester, T., Costello, R., He, Z., Kays, R.: Volunteer-run cameras as dis-
tributed sensors for macrosystem mammal research. Landsc. Ecol. 31(1), 55–66 (2015).
https://doi.org/10.1007/s10980-015-0262-9
23. McCarthy, M.S., et al.: An assessment of the efficacy of camera traps for studying demographic
composition and variation in chimpanzees (Pan troglodytes). Am. J. Primatol. 80(9), e22904
(2018)
24. Hogg, C., Fox, S., Pemberton, D., Belov, K.: Saving the Tasmanian Devil. CSIRO Publishing,
Melbourne (2019)
25. Rode, J., et al.: Population monitoring of snow leopards using camera trapping in Naryn State
Nature Reserve, Kyrgyzstan, between 2016 and 2019. Glob. Ecol. Conserv. 31, e01850 (2021)
26. Harvey, A.M., Morton, J.M., Ramp, D., Mellor, D.J., Russell, V., Chapple, R.S.: Use of
remote camera traps to evaluate animal-based welfare indicators in individual free-roaming
wild horses. Animals 11(7), 2101 (2021)
27. Palencia, P., Vicente, J., Soriguer, R.C., Acevedo, P.: Towards a best-practices guide for camera
trapping: assessing differences among camera trap models and settings under field conditions.
J. Zool. 316, 197–208 (2021)
28. Molloy, S.W.: A practical guide to using camera traps for wildlife monitoring in natural resource management projects (2018)
29. Bugler, K.: Monitoring the ‘original’ panda: impacts and outcomes of using infra-red trail
cameras on captive red panda (Ailurus fulgens) behaviour (2020)
30. Stewart, F.E.C., Fisher, J.T., Burton, A.C., Volpe, J.P.: Species occurrence data reflect the
magnitude of animal movements better than the proximity of animal space use. Ecosphere
9(2), e02112 (2018)
31. Macdonald, D.W., et al.: Multi-scale habitat modelling identifies spatial conservation priorities
for mainland clouded leopards (Neofelis nebulosa). Divers. Distrib. 25(10), 1639–1654 (2019)
32. Archangel Imaging: WAMCam | ESA Business Applications, August 2018. https://business.
esa.int/projects/wamcam-1. Accessed 22 Feb 2022
33. CCTV Camera World: Utilizing Cameras To Monitor Animals (2015). https://www.cctvcameraworld.com/utilizing-cameras-to-monitor-animals.html. Accessed 22 Feb 2022
34. Young, S.: CCTV for wildlife monitoring: an introduction (2016)
35. Hansen, B.K., Fultz, A.L., Hopper, L.M., Ross, S.R.: An evaluation of video cameras for
collecting observational data on sanctuary-housed chimpanzees (Pan troglodytes). Zoo Biol.
37(3), 156–161 (2018)
36. Munita, C., Tadich, T.A., Briceño, C.: Comparison of 2 behavioral sampling methods to
establish a time budget in a captive female cheetah (Acinonyx jubatus). J. Vet. Behav. 13, 1–5
(2016)
37. Kalirathinam, U.K., Elangkovan, S., Kawi, J., Cabana, F.: Sleep monitoring of an Asian
elephant Elephas maximus calf at Night Safari, Singapore: testing whether sleep time is a
significant predictor of cortisol or the onset of positive elephant endotheliotropic herpesvirus
viraemia. Int. Zoo Yearb. 53(1), 128–137 (2019)
38. Chester Zoo and NW Security Group: Smart use of CCTV at Chester Zoo - Case Study. https://
www.nwsystemsgroup.com/sectors/visitor-attractions/chester-zoo. Accessed 22 Feb 2022
39. The Birmingham Zoo: High-resolution cameras enhance zoo security while collecting criti-
cal information on animal behaviour, July 2017. https://www.mobotix.com/sites/default/files/
2019-09/mx_CS_BirminghamZooUSA_en_2018-A4-web%2B.pdf. Accessed 22 Feb 2022
40. Fazio, J.M., Barthel, T., Freeman, E.W., Garlick-Ott, K., Scholle, A., Brown, J.L.: Utilizing
camera traps, closed circuit cameras and behavior observation software to monitor activ-
ity budgets, habitat use, and social interactions of zoo-housed Asian Elephants (Elephas
maximus). Animals 10(11), 2026 (2020)
41. Zoo Atlanta: Giant Panda Research: Giant Panda Maternal Behavior. https://zooatlanta.org/
project/giant-panda/. Accessed 22 Feb 2022
42. Brady, A., McMahon, B., Naulty, F.: Estimates of locomotion in Asian elephants Elephas
maximus using video monitoring at Dublin Zoo, Ireland. J. Zoo Aquar. Res. 9(2), 124–133
(2021)
43. Field, A., Miles, J., Field, Z.: Discovering Statistics Using SAS. SAGE Publications Ltd.,
London (2012)
44. The Times of India: Delhi zoo installs CCTV cameras to monitor animal behaviour |
Delhi News - Times of India (2020). https://timesofindia.indiatimes.com/city/delhi/delhi-zoo-
installs-cctv-cameras-to-monitor-animal-behaviour/articleshow/77051744.cms. Accessed 22
Feb 2022
45. Küster, S., Kardel, M., Ammer, S., Brünger, J., Koch, R., Traulsen, I.: Usage of computer
vision analysis for automatic detection of activity changes in sows during final gestation.
Comput. Electron. Agric. 169, 105177 (2020)
46. Rao, Y., Jiang, M., Wang, W., Zhang, W., Wang, R.: On-farm welfare monitoring system for
goats based on Internet of Things and machine learning. Int. J. Distrib. Sens. Netw. 16(7),
155014772094403 (2020)
47. Traulsen, I., Scheel, C., Auer, W., Burfeind, O., Krieter, J.: Using acceleration data to
automatically detect the onset of farrowing in sows. Sensors 18(2), 170 (2018)
48. Connors, M.J., Schauber, E.M., Forbes, A., Jones, C.G., Goodwin, B.J., Ostfeld, R.S.: Use
of track plates to quantify predation risk at small spatial scales. J. Mammal. 86(5), 991–996
(2005)
49. Orban, D.A., Soltis, J., Perkins, L., Mellen, J.D.: Sound at the zoo: using animal monitoring,
sound measurement, and noise reduction in zoo animal management. Zoo Biol. 36(3), 231–
236 (2017)
50. Webber, S., Carter, M., Smith, W., Vetere, F.: Interactive technology and human–animal
encounters at the zoo. Int. J. Hum. Comput. Stud. 98, 150–168 (2017)
51. Sensaphone Remote Monitoring Solutions: Case Studies | Remote Monitoring Solutions |
Sensaphone (2015). https://www.sensaphone.com/case-studies/2015/03/protecting-animals-
from-dangerous-temperatures-24-7. Accessed 22 Feb 2022
52. Al-Naji, A., Tao, Y., Smith, I., Chahl, J.: A pilot study for estimating the cardiopulmonary
signals of diverse exotic animals using a digital camera. Sens. (Switz.) 19(24), 5445 (2019)
53. Chahl, J.: Using digital cameras for basic health checks saves zoo animals from anesthetics. PhysOrg, 13 February 2020. https://phys.org/news/2020-02-digital-cameras-basic-health-zoo.html. Accessed 22 Feb 2022
54. Ross, S.R., Lake, B.R., Fultz, A., Hopper, L.M.: An evaluation of thermal imaging as a welfare
monitoring tool for captive chimpanzees. Primates 62(6), 919–927 (2021)
55. Havens, K.J., Sharp, E.J.: Thermal Imaging Techniques to Survey and Monitor Animals in
the Wild: A Methodology. Academic Press, London (2015)
56. Lahoz-Monfort, J.J., Magrath, M.J.L.: A comprehensive overview of technologies for species
and habitat monitoring and conservation. Bioscience 71(10), 1038–1062 (2021)
57. McCafferty, D.J.: Applications of thermal imaging in avian science. Ibis (Lond. 1859) 155(1),
4–15 (2013)
58. Hristov, N.I., Betke, M., Kunz, T.H.: Applications of thermal infrared imaging for research
in aeroecology. Integr. Comp. Biol. 48(1), 50–59 (2008)
59. Cilulko, J., Janiszewski, P., Bogdaszewski, M., Szczygielska, E.: Infrared thermal imaging in
studies of wild animals. Eur. J. Wildl. Res. 59(1), 17–23 (2013)
60. Steen, K.A., Villa-Henriksen, A., Therkildsen, O.R., Green, O.: Automatic detection of
animals in mowing operations using thermal cameras. Sensors 12(6), 7587–7597 (2012)
61. Desholm, M.: Wind farm related mortality among avian migrants - a remote sensing study and
model analysis. Thesis/Dissertation, ETDEWEB. Danmarks Miljoeundersoegelser, Roskilde
(Denmark); Copenhagen Univ. (Denmark), Denmark (2006)
62. Lathlean, J., Seuront, L.: Infrared thermography in marine ecology: methods, previous
applications and future challenges. Mar. Ecol. Prog. Ser. 514, 263–277 (2014)
63. Piel, A.K., et al.: Noninvasive technologies for primate conservation in the 21st century. Int.
J. Primatol. 43, 133–167 (2021). https://doi.org/10.1007/s10764-021-00245-z
64. Mcmahon, B., Teeling, E., Höglund, J.: How and why should we implement genomics into
conservation? Evol. Appl. 7(9), 999–1007 (2014)
65. Hoban, S.M., et al.: Bringing genetic diversity to the forefront of conservation policy and
management. Conserv. Genet Resour 5, 593–598 (2013)
66. Gilardi, K., et al.: Best practice guidelines for health monitoring and disease control in great
ape populations (2015). https://doi.org/10.2305/IUCN.CH.2015.SSC-OP.56.en
67. Jain, M., Olsen, H.E., Paten, B., Akeson, M.: The Oxford Nanopore MinION: delivery of
nanopore sequencing to the genomics community. Genome Biol. 17(1), 1–11 (2016)
68. Loit, K., et al.: Relative performance of MinION (Oxford Nanopore Technologies) versus
sequel (Pacific Biosciences) third-generation sequencing instruments in identification of agri-
cultural and forest fungal pathogens. Appl. Environ. Microbiol. 85(21), 1–20, e01368-19
(2019). https://doi.org/10.1128/AEM.01368-19. PMID: 31444199; PMCID: PMC6803294
69. Baldi, P., La Porta, N.: Molecular approaches for low-cost point-of-care pathogen detection
in agriculture and forestry. Front. Plant Sci. 11, 1603 (2020)
70. Chang, J.J.M., Ip, Y.C.A., Ng, C.S.L., Huang, D.: Takeaways from mobile DNA barcoding
with BentoLab and MinION. Genes 11(10), 1121 (2020)
71. Krehenwinkel, H., Pomerantz, A., Prost, S.: Genetic biomonitoring and biodiversity assess-
ment using portable sequencing technologies: current uses and future directions. Genes
10(11), 858 (2019)
72. Bonnin, N., Van Andel, A., Kerby, J., Piel, A., Pintea, L., Wich, S.: Assessment of chimpanzee
nest detectability in drone-acquired images. Drones 2(2), 17 (2018)
73. van Hooff, J.A.R.A.M., Lukkenaar, B.: Captive chimpanzee takes down a drone: tool use
toward a flying object. Primates 56(4), 289–292 (2015). https://doi.org/10.1007/s10329-015-
0482-2
74. Wich, S.A., Koh, L.P.: Conservation Drones: Mapping and Monitoring Biodiversity, vol. 1.
Oxford University Press, Oxford (2018)
75. Koh, L.P., Wich, S.A.: Dawn of drone ecology: low-cost autonomous aerial vehicles for
conservation. Trop. Conserv. Sci. 5(2), 121–132 (2012)
76. Minh, T.C.: These new technologies could transform wildlife conservation, 04 Febru-
ary 2022. https://thehill.com/changing-america/sustainability/environment/592820-these-
new-technologies-could-transform-wildlife. Accessed 25 Feb 2022
77. Zhang, H., et al.: Thermal infrared imaging from drones can detect individuals and nocturnal
behavior of the world’s rarest primate. Glob. Ecol. Conserv. 23, e01101 (2020)
78. Duporge, I., et al.: Determination of optimal flight altitude to minimise acoustic drone dis-
turbance to wildlife using species audiograms. Methods Ecol. Evol. 12(11), 2196–2207
(2021)
79. Crunchant, A.S., Borchers, D., Kühl, H., Piel, A.: Listening and watching: do camera traps or
acoustic sensors more efficiently detect wild chimpanzees in an open habitat? Methods Ecol.
Evol. 11(4), 542–552 (2020)
80. Wrege, P.H., Rowland, E.D., Keen, S., Shiu, Y.: Acoustic monitoring for conservation in
tropical forests: examples from forest elephants. Methods Ecol. Evol. 8(10), 1292–1301
(2017)
81. Hyun, C.U., Park, M., Lee, W.Y.: Remotely piloted aircraft system (RPAS)-based wildlife
detection: a review and case studies in maritime Antarctica. Animals 10(12), 1–17 (2020)
Hierarchical Tucker Tensor Regression:
A Case Study on Classification
1 Introduction
Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables. Mathematically, it is the task of approximating a mapping function from input variables to a continuous output variable. In machine learning, a regression problem can in some cases be converted into a classification problem by discretizing the response variable into buckets; logistic regression and softmax regression are two classical examples of applying regression to classification problems. With the advent of the deep learning era, powerful deep learning models have achieved many remarkable results in classification.
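As a minimal, hypothetical illustration of the bucket-conversion idea mentioned above (the data and all names here are invented, and a plain NumPy softmax fit is used rather than any method from this paper), a continuous response can be binned into discrete classes and fitted with a softmax (multinomial logistic) model:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                                  # hypothetical predictors
y_cont = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=500)
y = np.digitize(y_cont, bins=np.quantile(y_cont, [1/3, 2/3]))  # 3 discrete buckets

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((3, 3))                  # one weight vector per class
Y = np.eye(3)[y]                      # one-hot targets
for _ in range(2000):                 # batch gradient descent on the cross-entropy
    P = softmax(X @ W)
    W -= 0.1 * X.T @ (P - Y) / len(X)

print("train accuracy:", (softmax(X @ W).argmax(axis=1) == y).mean())
```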
Nowadays, the strong development of science and technology has produced
multi-dimensional, complex structured, and large-sized data. These data types
simple but not flexible. Tucker Tensor Regression, proposed by Li et al. [5], overcomes the inflexibility of CP regression by replacing the CP decomposition with the Tucker decomposition [4], as it can admit different ranks on the different modes of the tensor. Both of the above models achieved promising results in neuroimaging analysis. For Tensor-Tensor Regression, a representative work is the Higher-Order Partial Least Squares (HOPLS) regression [13], in which matrix Partial Least Squares (PLS) [11,12] is generalized to handle the tensor-output situation. The principle behind HOPLS is to factor both the input and output tensors into a sum of Tucker tensors, with the constraint that the extracted latent variables capture the maximum covariance between the input and output tensors. Lock [15] developed a Tensor-on-Tensor regression model that can estimate a tensor while learning the CP decomposition of the tensor input and the contracted tensor product of the input and
predictor tensor. Gahrooei et al. [16] proposed a general multiple tensor-on-tensor regression approach in which each set of input data and output measurements is represented by a tensor. This work is more general and overcomes the model in Lock [15], as it can work when the input and output tensors have different ranks. The use of Tucker decomposition instead of CP decomposition makes this model more flexible and avoids the overfitting caused by estimating a large number of parameters. In addition, several works have been proposed based on other interesting ideas. Kossaifi et al. [17] introduced Tensor Regression Networks, which can be seen as a combination of deep learning and tensor methods: the fully connected layer is reformulated as the coefficients of a tensor regression model, and this coefficient tensor is assumed to follow a low-rank Tucker format. This model takes advantage of the information generated by the CNN layers while reducing the number of parameters through the Tucker decomposition. Zhao et al. [10] adapted the Gaussian Process [14] to tensors and proposed the Tensor Gaussian Process to solve nonlinear regression. All the above methods have achieved remarkable and promising results in their specified tasks.
In this paper, we revisit the Hierarchical Tucker Regression (HTR) proposed by Hou [7]. This model is similar to CP regression [6] and Tucker regression [5], but uses the Hierarchical Tucker Decomposition (HTD) [18,19] instead. HTR maintains the advantages of both the CP model and the Tucker model at the same time: it avoids the inflexibility of the CP model and the exponential growth of parameters with tensor order in the Tucker model, which makes HTR a highly compact, flexible, and scalable tensor regression. In addition, we modify the original block relaxation algorithm for HTR based on the tree structure shared by HTD and the Tensor-Train Decomposition (TTD) [20]. We conduct numerical experiments to evaluate the original and the adjusted HTR and to see the effect of the difference in the tree structure of the HTD. Finally, we perform a classification case study using HTR and compare its performance with vector-based regressions on both simulated and real data.
The paper is structured as follows. In Sect. 2, we present some useful background and notation on tensors, as well as the Hierarchical Tucker decomposition. In Sect. 3, we review the HTR and its block relaxation algorithm used to estimate the parameters of this model; our modifications are also presented in this section. Section 4 covers the numerical experiments and a case study on classification. The conclusion is presented in Sect. 5.
Fig. 2. Left: the balanced canonical dimension tree for the HT format. Right: the front-to-back splitting dimension tree for the TT format.
Definition 2: The set of all tensors $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_D}$ of hierarchical rank at most $r$, with dimension tree $T$, called H-Tucker tensors, is given by
Depending on how we split the modes of the tensor, we obtain different tree structures. The most common is the balanced canonical dimension tree used by Grasedyck et al. [19], which is obtained by splitting the modes as follows: for each parent node $t = \{m, \ldots, m+p\}$, its children are defined as $t_l = \{m, \ldots, m+\lfloor p/2 \rfloor\}$ and $t_r = \{m+\lfloor p/2 \rfloor + 1, \ldots, m+p\}$. As usual, the modes are split in a balanced way to achieve a balanced canonical dimension tree [18,19]. Besides, Lubich et al. [21] introduced a front-to-back splitting dimension tree, which is obtained by splitting the modes as follows: for each parent node $t = \{m, \ldots, p\}$, its children are defined as $t_l = \{m\}$ and $t_r = \{m+1, \ldots, p\}$. This tree structure is a special case of the Hierarchical Tucker format, and Lubich et al. [21] note that it can be used to represent the Tensor-Train Decomposition. Figure 2 shows illustrations of both dimension trees.
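The two splitting rules just described can be written down directly. The following sketch (the helper names are ours, not from the paper) builds both trees as nested tuples of mode index sets:

```python
def balanced_tree(modes):
    # balanced canonical split: t = {m,...,m+p} -> {m,...,m+floor(p/2)} and the rest
    if len(modes) == 1:
        return modes
    p = len(modes) - 1
    half = p // 2
    return (modes, balanced_tree(modes[:half + 1]), balanced_tree(modes[half + 1:]))

def front_to_back_tree(modes):
    # front-to-back split: t = {m,...,p} -> {m} and {m+1,...,p}
    if len(modes) == 1:
        return modes
    return (modes, modes[:1], front_to_back_tree(modes[1:]))

print(balanced_tree((1, 2, 3, 4)))
# ((1, 2, 3, 4), ((1, 2), (1,), (2,)), ((3, 4), (3,), (4,)))
print(front_to_back_tree((1, 2, 3, 4)))
# ((1, 2, 3, 4), (1,), ((2, 3, 4), (2,), ((3, 4), (3,), (4,))))
```

The second output is the degenerate "caterpillar" tree that corresponds to the Tensor-Train ordering of the modes.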
For the matricization $X_{(t)}$ at each leaf node $t$, Grasedyck et al. [19] define a factor matrix $U_t$ whose number of columns equals $r_t = \mathrm{rank}(X_{(t)})$. To express the relation between the subspaces of the matricizations at a parent node and at its children, Hou et al. [7] introduce a link between the corresponding basis factor matrices through a so-called transfer matrix $B_t$ via the formula

$$U_t = (U_{t_l} \otimes U_{t_r})\, B_t \qquad (6)$$

where $B_t \in \mathbb{R}^{r_{t_l} r_{t_r} \times r_t}$ and $r_{t_l}, r_{t_r}, r_t$ are the ht-ranks at nodes $t_l$, $t_r$ and $t$, respectively. The construction of the H-Tucker format proceeds by applying Eq. (6) recursively from the leaf singletons to the root of the dimension tree.
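A quick NumPy check of the nestedness relation (6) under hypothetical dimensions: given children bases and a transfer matrix of the stated sizes, the parent basis produced by the Kronecker product has $n_l n_r$ rows and $r_t$ columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n_l, n_r = 4, 5            # row dimensions of the children bases (hypothetical)
r_l, r_r, r_t = 2, 3, 2    # ht-ranks at nodes t_l, t_r and t (hypothetical)

U_tl = rng.normal(size=(n_l, r_l))
U_tr = rng.normal(size=(n_r, r_r))
B_t = rng.normal(size=(r_l * r_r, r_t))   # transfer matrix B_t in R^{r_l r_r x r_t}

U_t = np.kron(U_tl, U_tr) @ B_t           # Eq. (6): U_t = (U_tl ⊗ U_tr) B_t
print(U_t.shape)                          # (20, 2): n_l*n_r rows, r_t columns
```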
3 Method
In this section, we first present the Hierarchical Tucker Regression (HTR) and the original block relaxation algorithm used to estimate the model. In addition, we modify some calculation steps in this algorithm by replacing the balanced canonical dimension tree in HTR with the front-to-back splitting dimension tree.
position with a balanced canonical dimension tree. The GLM model for tensors is expressed through the formula

$$g(\mu) = \eta = \gamma^{\top} z + \langle \mathcal{B}, \mathcal{X} \rangle = \gamma^{\top} z + \langle \mathrm{vec}(\mathcal{B}), \mathrm{vec}(\mathcal{X}) \rangle \qquad (7)$$

where $\eta = \gamma^{\top} z + \langle \mathcal{B}, \mathcal{X} \rangle$ is the systematic part, $\mu$ is the expected value of the response, and $g(\mu)$ is the link function of the GLM. $\gamma \in \mathbb{R}^{I_0}$ is the coefficient vector corresponding to the input vector $z \in \mathbb{R}^{I_0}$. The coefficient tensor $\mathcal{B}$ has the same order and dimensions as the input tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_D}$.
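The systematic part in (7) is just an inner product between vectorized tensors; a minimal NumPy illustration with hypothetical shapes and a logit link chosen purely for concreteness:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=6)            # scalar/vector covariates, I0 = 6 (hypothetical)
gamma = rng.normal(size=6)        # their coefficient vector
X = rng.normal(size=(4, 5, 3))    # input tensor X
B = rng.normal(size=(4, 5, 3))    # coefficient tensor B, same shape as X

eta = gamma @ z + np.dot(B.ravel(), X.ravel())   # eta = gamma^T z + <vec(B), vec(X)>
mu = 1.0 / (1.0 + np.exp(-eta))                  # e.g. logit link for a binary response
print(eta, mu)
```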
Hierarchical Tucker Regression is the GLM model for tensors in which the coefficient tensor $\mathcal{B}$ is assumed to follow a Hierarchical Tucker Decomposition. For the root node, $\mathrm{vec}(\mathcal{B}) = U_{\mathrm{root}}$ (with $r_{\mathrm{root}} = 1$), and by using the nestedness property in (6), formula (7) can be rewritten as

$$g(\mu) = \eta = \gamma^{\top} z + \big\langle (U_{\mathrm{root}_l} \otimes U_{\mathrm{root}_r})\, B_{\mathrm{root}},\ \mathrm{vec}(\mathcal{X}) \big\rangle \qquad (8)$$

We then recursively apply the nestedness property in (6) to all other inner nodes $t \in N(T)$, replacing $U_t$ with its children $U_{t_l}$, $U_{t_r}$ and the transfer matrix $B_t$, until all the leaf nodes $t \in L(T)$ are reached. The mixed-product property of the Kronecker product in (3) is also exploited in this procedure. In the end, the resulting model takes the form
$$g(\mu) = \eta = \gamma^{\top} z + \Big\langle \Big(\bigotimes_{t \in L(T)_L} U_t\Big)\Big(\bigotimes_{t \in N(T)_{L-1}} B_t \otimes \bigotimes_{t \in L(T)_{L-1}} I_t\Big)\cdots\Big(\bigotimes_{t \in T^{l}} B_t\Big)\cdots\big(B_{\mathrm{root}}\big),\ \mathrm{vec}(\mathcal{X}) \Big\rangle \qquad (9)$$

where the term $\sum_{t \in T \setminus \mathrm{root}} r_t^2$ accounts for the nonsingular transformation indeterminacy [5].
Hou et al. [7] noticed that the linear systematic part in (9) is linear in each $U_t$ and each $B_t$ separately. They therefore proposed an algorithm referred to as the Block Relaxation Algorithm (BRA) [8], whose main idea is to alternately update one basis factor (or transfer) matrix $U_t$ (or $B_t$) at a time while keeping the rest of the matrices fixed. The update steps are performed iteratively until the convergence criterion is reached. This algorithm breaks the simultaneous estimation of all parameters into a sequence of low-dimensional parameter optimizations using the classical GLM. All steps of BRA for HTR are shown in Algorithm 1. In general, the BRA has two phases: updating the leaf nodes and updating the inner nodes. In the first phase, for each factor matrix $U_t$, the inner product in (9) can be rewritten as
$$\Big\langle U_t J_L \Big(\bigotimes_{t' \in L(T)\setminus t} U_{t'}\Big)^{\!\top},\ X_{(t)} \Big\rangle = \Big\langle U_t,\ X_{(t)} \Big(\bigotimes_{t' \in L(T)\setminus t} U_{t'}\Big) (J_L)^{\top} \Big\rangle \qquad (12)$$

where $J_L = \Big(\bigotimes_{t' \in N(T)_{L-1}} B_{t'} \otimes \bigotimes_{t' \in L(T)_{L-1}} I_{t'}\Big)\cdots\Big(\bigotimes_{t' \in T^{l}} B_{t'}\Big)\cdots\big(B_{\mathrm{root}}\big)$. Similarly, in the second phase, for each transfer matrix $B_t$ at intermediate level $l$, the inner product in (9) can be rewritten as

$$\Big\langle B_t K_l \Big(\bigotimes_{t' \in T^{l}\setminus t} B_{t'}\Big)^{\!\top},\ H_l \Big\rangle = \Big\langle B_t,\ H_l \Big(\bigotimes_{t' \in T^{l}\setminus t} B_{t'}\Big) (K_l)^{\top} \Big\rangle \qquad (13)$$

where $H_l = \Big(\bigotimes_{t' \in T^{l+1}} B_{t'}\Big)^{\!\top}\cdots\Big(\bigotimes_{t' \in L(T)} U_{t'}\Big)^{\!\top} \mathrm{vec}(\mathcal{X})$ and $K_l = \Big(\bigotimes_{t' \in T^{l-1}} B_{t'}\Big)\Big(\bigotimes_{t' \in T^{l-2}} B_{t'}\Big)\cdots\big(B_{\mathrm{root}}\big)$. We iteratively run this block updating procedure from bottom to top and from left to right along each level of $T$ until the log-likelihood defined for the classical GLM in (11) ceases to increase. The regularization and the proof of convergence of Algorithm 1 can be found in [1, 5].
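A high-level sketch of the block relaxation idea (not the authors' Algorithm 1 verbatim): every leaf factor $U_t$ and every transfer matrix $B_t$ is refit in turn by a small GLM in which that block is the parameter and everything else is folded into the design matrix, stopping when the log-likelihood stalls. The `build_design_for_block` and `fit_glm_block` arguments are placeholders standing in for the contractions in (12)–(13) and for a standard GLM solver.

```python
import numpy as np

def block_relaxation(blocks, build_design_for_block, log_likelihood,
                     fit_glm_block, tol=1e-6, max_iter=100):
    """Alternately refit each block (a leaf factor U_t or transfer matrix B_t)
    while holding all other blocks fixed, until the log-likelihood stalls."""
    ll_old = -np.inf
    for _ in range(max_iter):
        for name in blocks:                               # bottom-to-top, left-to-right order
            design = build_design_for_block(name, blocks)  # e.g. X_(t)(⊗ U_t')(J_L)^T
            blocks[name] = fit_glm_block(design)           # small classical GLM fit
        ll_new = log_likelihood(blocks)
        if ll_new - ll_old < tol:                          # convergence criterion
            break
        ll_old = ll_new
    return blocks
```

Each inner fit involves only $I_t r_t$ or $r_{t_l} r_{t_r} r_t$ parameters, which is what makes the overall estimation tractable.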
$$g(\mu) = \eta = \gamma^{\top} z + \Big\langle \Big(\bigotimes_{t \in L(T)_L} U_t\Big)\Big(\bigotimes_{t' \in T\setminus t} I_{t'} \otimes \bigotimes_{t \in T^{L-1}} B_t\Big)\cdots\Big(\bigotimes_{t' \in T\setminus t} I_{t'} \otimes \bigotimes_{t \in T^{l}} B_t\Big)\cdots\big(B_{\mathrm{root}}\big),\ \mathrm{vec}(\mathcal{X}) \Big\rangle \qquad (14)$$
as there is at most one inner node at each level of the front-to-back splitting dimension tree. We call the model in (14) the “front-to-back splitting” Hierarchical Tucker Regression (FTB-HTR) to distinguish it from the original HTR. Because only the tree structure changes, the numbers of leaf nodes and inner nodes do not change, so if we ignore the difference in the initial ht-ranks, the number of free parameters of FTB-HTR is the same as in (10).
Similar to the original HTR model, the Maximum Likelihood Estimation (MLE) method is used to estimate the parameters of the FTB-HTR model. Just like the HTR model in (9), the linear systematic part of the FTB-HTR model in (14) is linear in each $U_t$ and each $B_t$ separately, so we can reuse Algorithm 1 with changes in the computation steps to solve the optimization problem in (14). Specifically, in the leaf-node updating phase (steps 3 to 5 in Algorithm 1), for each factor matrix $U_t$ the inner product in (14) can be rewritten in the same form as (12); the difference is that the element $J_L$ in (12) becomes

$$J_L = \Big(\bigotimes_{t'' \in T\setminus t'} I_{t''} \otimes \bigotimes_{t' \in T^{L-1}} B_{t'}\Big)\cdots\Big(\bigotimes_{t'' \in T\setminus t'} I_{t''} \otimes \bigotimes_{t' \in T^{l}} B_{t'}\Big)\cdots\big(B_{\mathrm{root}}\big).$$

We then solve a GLM regression with $U_t$ as the “parameter” and the term $X_{(t)} \big(\bigotimes_{t' \in L(T)\setminus t} U_{t'}\big)(J_L)^{\top}$ as the “predictor”. The number of parameters of this GLM regression is just $I_t r_t$, corresponding to the size of the factor matrix $U_t$.
In the inner-node updating phase (steps 6 to 10 in Algorithm 1), the HTR model in (9) and the FTB-HTR model in (14) differ in how the transfer matrices are calculated, so we rewrite (13) as

$$\Big\langle B_t K_l \Big(\bigotimes_{t' \in T^{l}\setminus t} I_{t'}\Big)^{\!\top},\ H_l \Big\rangle = \Big\langle B_t,\ H_l \Big(\bigotimes_{t' \in T^{l}\setminus t} I_{t'}\Big)(K_l)^{\top} \Big\rangle \qquad (15)$$

where $H_l = \Big(\bigotimes_{t'' \in T\setminus t'} I_{t''} \otimes \bigotimes_{t' \in T^{l+1}} B_{t'}\Big)^{\!\top}\cdots\Big(\bigotimes_{t' \in L(T)} U_{t'}\Big)^{\!\top}\mathrm{vec}(\mathcal{X})$ and $K_l = \Big(\bigotimes_{t'' \in T\setminus t'} I_{t''} \otimes \bigotimes_{t' \in T^{l-1}} B_{t'}\Big)\cdots\big(B_{\mathrm{root}}\big)$. As in the first phase, we solve a GLM regression with $B_t$ as the “parameter” and the term $H_l \big(\bigotimes_{t' \in T^{l}\setminus t} I_{t'}\big)(K_l)^{\top}$ as the “predictor”. The number of parameters of this GLM regression is only $r_{t_l} r_{t_r} r_t$, corresponding to the size of the transfer matrix $B_t$.
In summary, parameter estimation for the two models HTR and FTB-HTR is similar; the difference in tree structure leads to differences in the systematic parts of the models and in the calculation steps. Algorithm 1 breaks the complicated original GLM problem with a huge number of parameters into a sequence of sub-GLM problems that are simpler and have far fewer parameters.
For an input tensor $\mathcal{X}_i \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_D}$, the complexity of the vector-based method is $O(I^D)$, while the complexity of FTB-HTR and HTR is $O(DR^3 + DIR)$, where $I = \max\{I_d\}_{d=1}^{D}$ and $R$ is the rank of the Hierarchical Tucker format. This shows the ability of the models to reduce the number of parameters as well as their flexibility, since efficiency and complexity depend strongly on the tree structure and the user-defined ht-rank sets. In addition, by dividing a regression problem with a very large number of parameters into a series of regression problems with a much smaller number of parameters, HTR and FTB-HTR help avoid overfitting, especially when the amount of data is limited.
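To make the reduction concrete under hypothetical numbers loosely matching the experiments (a 5 × 28 × 28 input and small ht-ranks), the sketch below compares the vector-based parameter count with a hierarchical-Tucker-style count of leaf factors plus transfer matrices; the paper's exact count (10) additionally corrects for the transformation indeterminacy, which is omitted here.

```python
import numpy as np

dims = (5, 28, 28)                     # input tensor dimensions (hypothetical example)
vector_based = int(np.prod(dims))      # one coefficient per tensor entry: 3920

# hypothetical ht-ranks: one rank per leaf, and (r_l, r_r, r_parent) per inner node
leaf_ranks = {0: 2, 1: 2, 2: 2}
inner_nodes = [(2, 2, 2), (2, 2, 1)]   # e.g. an inner node and the root

leaf_params = sum(dims[i] * r for i, r in leaf_ranks.items())        # sum of I_t * r_t
transfer_params = sum(rl * rr * rt for rl, rr, rt in inner_nodes)    # sum of r_tl r_tr r_t
print(vector_based, leaf_params + transfer_params)                   # 3920 vs 134
```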
The basis for these changes comes from the work of Lubich et al. [21] and Grasedyck et al. [22]. Concretely, Lubich et al. [21] give a dynamical approximation of the Hierarchical Tucker and Tensor-Train formats via a suitable front-to-back splitting dimension tree, while Grasedyck et al. [22] introduce the theory of the relationship between Hierarchical Tucker rank and Tensor-Train rank. More specifically, Grasedyck et al. [22] show that the Hierarchical Tucker ranks depend strongly on the permutation of modes and the tree structure, so there is no straightforward answer to the question of which Hierarchical Tucker format is ‘the best’. However, when working with tasks relevant to the Tensor Train, we have a useful constraint: Grasedyck et al. [22] have shown that the ranks required for the Hierarchical Tucker format based on an arbitrary tree can always be bounded by the ranks in the Tensor-Train format, which may help in finding better Hierarchical Tucker ranks.
4 Numerical Experiments
Fig. 3. Simulated parameter estimation results of the hierarchical Tucker regression model on a simulated binary matrix of size 25 × 25 with a sample size of 1000.
Fig. 4. Simulated parameter estimation results of the hierarchical Tucker regression model on third-order tensors of size 2 × 28 × 28, 3 × 28 × 28 and 5 × 28 × 28, respectively. Left: the original hierarchical Tucker regression; right: the front-to-back splitting hierarchical Tucker regression.
For HTR, the ht-rank takes the form $(1 - r_{12}\,r_3 - r_1\,r_2)$; we run with 3 sets of ht-rank: (1-3, 2-2, 2), (1-4, 2-2, 2) and (1-4, 3-2, 2). For FTB-HTR, the ht-rank takes the form $(1 - r_1\,r_{23} - r_2\,r_3)$; the 3 selected sets of ht-rank are (1-2, 3-2, 2), (1-2, 4-2, 2) and (1-2, 4-2, 3). These sets of ht-rank guarantee the same number of parameters for both models. The coefficient tensor B is kept consistent and the sample size is 1500 for both models.
Table 3 and Table 4 show the Mean Square Error (MSE) between the value
of the estimated parameter and the original coefficient of HTR and FTB-HTR
models corresponding to each dimension. Figure 4 illustrates the results of the
experiment, with the leftmost column being the original shapes representing the
coefficient Tensor B, followed by the corresponding shapes representing the B̄
parameter estimates for each ht-rank set. The closer the estimated value of B̄ is
to B, the closer the estimated shape will be to the original one. This experiment
and the experiment in Sect. 4.1.1 show the parameter estimation ability of the
two models HTR and FTB-HTR. It can be seen that under the same conditions,
the accuracy of the estimated parameter of each model depends on the tree
structure as well as the ht-rank set.
Table 3. MSE of the HTR parameter estimates for each ht-rank set.

Dimension      (1-3, 2-2, 2)   (1-4, 2-2, 2)   (1-4, 3-2, 2)
2 × 28 × 28    0.06358         0.10344         0.07696
3 × 28 × 28    0.05964         0.05315         0.05910
5 × 28 × 28    0.12313         0.08294         0.07962

Table 4. MSE of the FTB-HTR parameter estimates for each ht-rank set.

Dimension      (1-2, 3-2, 2)   (1-2, 4-2, 2)   (1-2, 4-2, 3)
2 × 28 × 28    0.10281         0.10344         0.10344
3 × 28 × 28    0.06690         0.06871         0.06871
5 × 28 × 28    0.10876         0.09805         0.09802
5 Conclusion
In this paper, we have reviewed the Tensor-Scalar Hierarchical Tucker Regression based on the Hierarchical Tucker Decomposition. Our contribution is the front-to-back splitting Hierarchical Tucker Regression, obtained by replacing the balanced canonical dimension tree in the original Hierarchical Tucker Regression with the front-to-back splitting dimension tree. This tree structure can be viewed as a representation of the conversion between the Hierarchical Tucker Decomposition and the Tensor-Train Decomposition, which provides one more useful condition when initializing the tree structure. The numerical experiments show the flexibility and efficient parameter estimation of both models. We also applied these tensor regression models to classification problems in place of vector-based regression models. Experimental results show the effectiveness of HTR and FTB-HTR on the binary classification problem, achieving high accuracy with far fewer parameters than the vector-based models. The results on multiclass classification are only moderate; this is because HTR and FTB-HTR are Tensor-Scalar models, which makes them unsuitable for multiclass problems. In future work, we plan to address this limitation by extending the Hierarchical Tucker Regression to a Tensor-Tensor regression model. Replacing the GLM part in Hierarchical Tucker Regression with the Vector Generalized Linear Model [23] might be a good approach.
Acknowledgment. Quoc Tran Ngoc was funded by Vingroup Joint Stock Company
and supported by the Domestic Master/PhD Scholarship Programme of Vingroup
Innovation Foundation (VINIF), Vingroup Big Data Institute (VINBIGDATA), code
VINIF.2020.ThS.JVN.11
References
1. Hou, M.: Tensor-based Regression Models and Applications (2017)
2. Nelder, J.A., Baker, J.: Generalized linear models. Wiley Online Library (1972)
3. McCullagh, P., Nelder, J.A.: Generalized Linear Models. Monographs on Statistics and Applied Probability. Chapman and Hall, London (1983)
4. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review
(2009)
5. Li, X., Zhou, H., Xu, D., Li, L.: Tucker tensor regression and neuroimaging analysis.
Stat. Biosci. (2018)
6. Zhou, H., Li, L., Zhu, H.: Tensor regression with applications in neuroimaging. J.
Am. Stat. Assoc. (2013)
7. Hou, M., Chaib-draa, B.: Hierarchical tucker tensor regression: application to brain
imaging data analysis. In: IEEE International Conference on Image Processing
(ICIP 2015) (2015)
8. De Leeuw, J.: Block-relaxation algorithms in statistics. In: Information Systems and Data Analysis. Springer (1994)
9. Guo, W., Kotsia, I., Patras, I.: Tensor learning for regression. IEEE Trans. Image
Process. 21(2), 816–827 (2012)
10. Zhao, Q., Zhou, G., Adali, T., Zhang, L., Cichocki, A.: Kernelization of tensor-based models for multiway data analysis: processing of multidimensional structured data. IEEE Signal Process. Mag. 30(4), 137–148 (2013)
11. Abdi, H.: Partial least squares regression and projection on latent structure regres-
sion (PLS regression). Wiley Interdisciplinary Rev. Comput. Stat. 2(1), 97–106
(2010)
12. Wold, S., Ruhe, A., Wold, H., Dunn, III, W.: The collinearity problem in linear
regression. the partial least squares (PLS) approach to generalized inverses. SIAM
J. Sci. Stat. Comput. 5(3), 735–743 (1984)
13. Zhao, Q., et al.: Higher order partial least squares (HOPLS): a generalized multilin-
ear regression method. IEEE Trans. Pattern Anal. Mach. Intell. 35(7), 1660–1673
(2013)
14. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT
Press (2005)
15. Lock, E.F.: Tensor-on-tensor regression. arXiv preprint arXiv:1701.01037 (2017)
16. Gahrooei, M.R., Yan, H., Paynabar, K., Shi, J.: Multiple tensor-on-tensor regres-
sion: an approach for modeling processes with heterogeneous sources of data. Tech-
nometrics (2020). https://doi.org/10.1080/00401706.2019.1708463
17. Kossaifi, J., Lipton, Z.C., Khanna, A., Furlanello, T., Anandkumar, A.: Tensor
regression networks. arXiv preprint arXiv:1707.08308 (2017)
18. Hackbusch, W., Kühn, S.: A new scheme for the tensor representation. J. Fourier
Anal. Appl. 15(5), 706–722 (2009)
19. Grasedyck, L.: Hierarchical singular value decomposition of tensors. SIAM J.
Matrix Anal. Appl. 31(4), 2029–2054 (2010)
20. Oseledets, I.V.: Tensor-train decomposition. SIAM J. Sci. Comput. 33(5), 2295–
2317 (2011)
21. Lubich, Ch., Rohwedder, T., Schneider, R., Vandereycken, B.: Dynamical approximation of hierarchical Tucker and tensor-train tensors. SIAM J. Matrix Anal. Appl. 34(2), 470–494 (2013)
22. Grasedyck, L., Hackbusch, W.: An Introduction to Hierarchical (H-) Rank and
TT-Rank of tensors with examples. Comput. Methods Appl. Math. 11(3), 291–
304 (2011). https://doi.org/10.2478/cmam
23. Yee, T.W.: Vector Generalized Linear and Additive Models: With an Implementa-
tion in R. Springer, New York (2015)
Introducing Database Normal Forms
to Students: A Comparison Between
Theory-First and Practice-First
Educational Approaches
1 Introduction
2 Literature Review
Advances in computer science tied with those in education have resulted in a
nearly ubiquitous positive change for students. In this way, in [9] researchers rec-
ognized that practical student involvement in computer science education was
fundamentally essential for student growth. This study highlighted the impor-
tance of implementing the principles of experiential learning in order to obtain
significant progress. Even between disciplines, the authors in [4] recognize the
helpful nature of experience-based pedagogy. In their research, the authors found
that this philosophy, as opposed to solely theory, assisted student farmers in
Slovak towns and was, by and large, a success. Its methods are gradually being
adopted by the general public and the broader academic community.
Relating to the concept of engineering, practice-based learning (PBL) can
successfully be applied in engineering programs, as mentioned in [6]. Also, the
focus should be placed on application and integration of knowledge rather than
on knowledge acquisition. In [5], the authors state that teachers who underwent
practice-based teacher professional development training were able to more read-
ily adapt new long-lasting positive pedagogical changes, aiding in the theory that
experiential learning is effective. The authors of [3] found that practice-based
pedagogy was able to greatly enhance teaching models. With the concept of
web design, in [2], a controlled study found that experiential learning was suited
for complicated topics and instructors were able to focus more on problem-
solving activities. Recent research thus corroborates the idea that integrating a
strong emphasis on experiential and practice-based learning is largely positive
for students and can be utilized to assist in the education of challenging topics,
particularly those in computer science.
In the context of databases, normal forms are a challenging collection of topics related to database schema design; they are a vital component for ensuring data integrity and limiting data redundancy. The implementation challenge arises because database designers have to make sure that non-prime attributes do not depend on a proper subset of the table's candidate key and also do not depend on another non-prime attribute. The only way to check that the database meets these requirements is by manually parsing through the database. This means that a database schema needs to be re-checked and normalized whenever new tables are added to the schema [8].
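As a hypothetical illustration of the transitive-dependency problem described above (the table and attribute names are invented, and plain Python dictionaries stand in for relations only to keep the example self-contained), a department name that depends on a department id, which in turn depends on the key, is moved into its own relation:

```python
# A relation violating 3NF: dept_name depends on dept_id, not directly on the key student_id.
students = [
    {"student_id": 1, "name": "Ada",   "dept_id": 10, "dept_name": "Computing"},
    {"student_id": 2, "name": "Grace", "dept_id": 10, "dept_name": "Computing"},
    {"student_id": 3, "name": "Alan",  "dept_id": 20, "dept_name": "Math"},
]

# 3NF decomposition: move the transitively dependent attribute into its own relation.
student_table = [{k: row[k] for k in ("student_id", "name", "dept_id")} for row in students]
department_table = {row["dept_id"]: row["dept_name"] for row in students}

print(student_table)
print(department_table)   # {10: 'Computing', 20: 'Math'} -- each fact stored only once
```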
Although there are positive results when instructors and institutions engage
in practice-based pedagogy, we discovered that there is an absence of sufficient
research in the area of database learning methods, and particularly Normal
Forms. In fact, although [8] and [7] discuss methods for normalization, there
is no insight regarding the way normalization should be taught. Furthermore, [9]
and [6] argue for the use of experiential- and practice-based learning in computer
science and engineering, but the study of databases is not explicitly mentioned.
3 Theoretical Framework
Figure 1 explains the general concept behind this research, namely that practice-
based pedagogy should diverge from theory-based pedagogy through the explicit
inclusion of practice at the forefront of the learning experience. In the practice-
based approach, an instructor facilitates the conversation about a specific prob-
lem and students brainstorm and come up with solutions. The practice-based
approach has been applied in student-centered pedagogies, such as project-based
learning and problem-based learning [1,10]. Theory-based pedagogy often relies
firstly on theory and may potentially exclude real-world examples entirely. An
anecdotal example of this might be a simple math problem for children where
an individual walks to the supermarket to purchase some 200 watermelons.
Theory-only education will never suffice in preparing students for work in
the real world, especially considering the fact that theory-based education often
acknowledges little to no practical limitations. For database systems, this is
wholly unhelpful, as all database systems have physical constraints, policy con-
straints, and real-world data requirements. This paper does not seek to argue the
merit of theory, which is most certainly a necessity in the educational process.
Rather, this paper seeks to encourage real-world practical examples in education
first, which come to eventually rest squarely on solid, theoretical foundations.
Fig. 1. Conceptual map describing different pedagogical approaches and some related
examples.
4 Methodology
etc.), but they had not yet been introduced to database normal forms nor
concepts related to deduplication, consistency, or isolation.
2. Determining Topic: The research topic focused on educating students in the
normalization of data in a relational database system, with specific emphasis
on 3NF.
3. Lecture Methodology: The lecture methodology utilized in this research
reflects a comparison of theory-based learning and practice-based learning.
In the theory-based pedagogy conducted, theoretical concepts and terminol-
ogy were introduced first with generic (and potentially unrelated) examples
provided to explain differences between the different normal forms, and how to
identify them. Little to no emphasis was placed on walking students through
a real-world example. Rather, this pedagogical approach focused on key terms
and rote memorization over application. The practice-based learning pre-
sented in this paper, on the other hand, began with a plausible real-world
example. Students were encouraged to build new functionality into a system
given a series of requirements, and eventually the students normalized the
data in order to prevent issues that arose. Theory is not completely avoided in
this approach as terminology must still be introduced. However, the emphasis
focused mostly on developing a solution to the problem.
One of the important considerations during the creation of the post-lecture survey was the intentional exclusion of leading questions (i.e., questions that elicit a specific answer). The survey first asks how comfortable the participant was with 3NF before the experiment, followed by their level of comfort post-experiment. Questions relating to the student's comfort with the instructional methodology were also asked. All questions use an integer scale from 1 to 5.
A group of 16 students between the ages of 18 and 23 was broken into two smaller groups, which this study labels Section A and Section B. The group was split into two because there were two different teaching methodologies. Section A comprised 5 students, and Section B comprised 11 students; for a detailed breakdown, see Fig. 2. The groups varied in size due to students arriving late to the class period. Each group was given a lecture, approximately 20 min long, on the concept of database normal forms.
After the lectures were complete, the students were asked to complete a survey
regarding their level of comfort with normal forms both before and after the
lecture, their perceptions on the instructional methodology provided, and their
opinions on when theory should be introduced in the classroom.
Section A was provided with a Theory-first approach to normal forms. Def-
initions on relational database terminology were provided upfront, followed by
Fig. 2. Participant data shows varied ages, majors, and grade levels among survey groups (panels: academic majors for Sections A and B; grade levels for Sections A and B).
specific, pointed examples of designs that violated the normal form being pre-
sented. Finally, the students thoroughly observed an example of a design which
violated Third Normal Form and were shown the proper way to correct the
violation.
Section B was provided with a practice-first approach to normal forms. The
lecture began with a real world example where the students brainstormed how to
extend a database schema given a series of practical requirements. Following this,
the students were provided examples of database designs that violated normal
forms. Throughout these steps, they were asked targeted questions related to
normal forms and pressed to update the real world example to be 3NF-compliant.
5 Results
A series of questions were asked in a post-lecture survey, which were used to help
determine whether or not students were more or less comfortable with database
normal forms after learning from a particular instructional methodology.
Students were asked, on a scale from 1–5, how comfortable they would be explain-
ing 3NF to a peer after the lecture. A student selecting a 1 would indicate low
comfort, and a 5 would indicate high comfort. Students in Section A were, on
average, comfortable with the concept of explaining 3NF to a peer after the
lecture. The calculated average was 3.8/5, which rounded to the nearest integer
would be: 4 - Comfortable. Students in Section B were, on average, indifferent
to the concept of explaining 3NF to a peer after the lecture. The calculated aver-
age was 3.4/5, which rounded to the nearest integer would be 3 - Indifferent.
Students were asked, on a scale from 1–5, what their impressions were on the
complexity of database normal forms. Specifically, a statement posits: “I find
third normal form easy to understand.” A student selecting 1 would indicate
strong disagreement, and a 5 would indicate strong agreement. Students in
Section A were, on average, in agreement with this statement. The calculated
average was 3.6/5, which rounded to the nearest integer would be: 4 - Agree.
Students in Section B were, on average, in agreement with this statement. The
calculated average was 3.6/5, which rounded to the nearest integer would be: 4
- Agree.
Students were asked, on a scale from 1–5, what their impressions were on their
own personal readiness for a test on 3NF. Specifically, a statement posits: “If a
test were given today on Third Normal Form, I would ace it.” A student selecting
a 1 would strongly disagree with this statement, and a 5 would indicate strong
agreement. Students in Section A were, on average, indifferent to this statement.
The calculated average was 3/5, which would be: 3 - Indifferent. Students in
Section B were, on average, indifferent to this statement. The calculated average
was 2.5/5, which rounded to the nearest integer would be: 3 - Indifferent.
5.6 Demonstration
Students were asked, on a scale from 1–5, how intuitive they found the lecture.
Specifically, a statement posits: “I found the demonstration intuitive.” A stu-
dent selecting a 1 would indicate strong disagreement with this statement, and
a 5 would indicate strong agreement. Students in Section A were, on average, in
agreement with this statement. The calculated average was 4/5, which would be
4 - Agree. Students in Section B were, on average, in agreement with this state-
ment. The calculated average was 4.2/5, which rounded to the nearest integer
would be 4 - Agree.
Students were asked, on a scale from 1–5, how much they enjoyed the educational
approach. Specifically, a statement posits: “I enjoyed the educational approach
used in the demonstration.” A student selecting a 1 would indicate strong dis-
agreement, and a 5 would indicate strong agreement. Students in Section A were,
on average, indifferent to this statement. The calculated average was 3.4/5, which
rounded to the nearest integer would be 3 - Indifferent. Students in Section B
were, on average, in agreement with this statement. The calculated average was
4.2/5, which rounded to the nearest integer would be 4 - Agree.
Students were asked, on a scale from 1–5, how strongly they believed theory should be introduced first in education. Specifically, a
statement posits: “It is important to learn theory before engaging in practice.”
A student selecting a 1 would indicate strong disagreement with this statement,
a 5 would indicate strong agreement. Students in Section A were, on average, in
agreement with this statement. The calculated average was 4/5, which would be
4 - Agree. Students in Section B were, on average, in agreement with this state-
ment. The calculated average was 4.1/5, which rounded to the nearest integer
would be 4 - Agree.
5.9 Discussion
The authors of this paper recognize that the sample size is small and acknowledge
the limitations of these results. Classroom size limitations, student availability,
and time constraints resulted in a smaller-than-desirable set of students. How-
ever, the methodology and results presented herein can be used to fuel another,
larger study.
Based on the data collected, while it is currently indeterminate as to whether
or not students were more or less comfortable with the concept of normal forms
after the experiment, what is certainly clear is that students across both groups
have a strong predisposition to theory-first education. Specifically, across both groups, the average was four or above (general agreement) when asked how strongly they believed theory should be introduced first in education. This has lasting implications for educators, as students currently
believe that theory is important to introduce first, whether or not it actually
behooves them.
That being said, it should be noted that while students had a strong predispo-
sition to theory-first education, those in the practice-first lecture were reportedly
more comfortable on average with the educational approach afforded them, as
opposed to those in the theory-first lecture, who were on average indifferent.
References
1. Alférez, G.H.: Ideas para docentes-investigadores adventistas. Publicaciones Uni-
versidad de Montemorelos (2020)
2. Jakovljevic, M., Ankiewicz, P.: Project-based pedagogy for the facilitation of web-
page design. Int. J. Technol. Des. Educ. 26(2), 225–242 (2016)
3. Jao, L., Wiseman, D., Kobiela, M., Gonsalves, A., Savard, A.: Practice-based ped-
agogy in mathematics and science teaching methods: challenges and adaptations
in context. Can. J. Sci. Math. Technol. Educ. 18(2), 177–186 (2018)
4. Nováková, K.S., Giertlová, Z.: New models of theoretical and practical education in urban environment (on example of experience-based pedagogy in Slovak towns). Procedia Soc. Behav. Sci. 228, 305–310 (2016)
5. Pella, S.: Pedagogical reasoning and action: affordances of practice-based teacher
professional development. Teach. Educ. Quart. 42(3), 81–101 (2015)
6. Perrenet, J.C., Bouhuijs, P.A.J., Smits, J.G.M.M.: The suitability of problem-based
learning for engineering education: theory and practice. Teach. High. Educ. 5(3),
345–358 (2000)
7. Salzberg, B.: Third normal form made easy. SIGMOD Rec. 15(4), 2–18 (1986)
8. Sug, H.: A method for normalization of relation schema based on data to abide by
the third normal form. WSEAS Trans. Math. 19, 216–225 (2020)
9. Tzafilkou, K., Protogeros, N., Chouliara, A.: Experiential learning in web develop-
ment courses: examining students’ performance, perception and acceptance. Educ.
Inf. Technol. 25(6), 5687–5701 (2020)
10. Zabala, A., Arnau, L.: Métodos de enseñanza de las competencias. Graó (2014)
Analysis of Load Balancing Algorithms Used
in the Cloud Computing Environment:
Advantages and Limitations
1 Introduction
“Cloud” refers to connected IT resources, and “computing” refers to the work, treatment or processing performed on those resources remotely on a pay-as-you-go basis. It is an internet-based technology which provides various cloud-based services characterized by their efficiency, reliability and low-cost accessibility anytime and from everywhere (Karan D. Patel 2019, Tosal M. Bhalodia 2019) [1]. Cloud computing had no consensus definition until 2008, and researchers continue to refine a common definition.
NIST (National Institute of Standards and Technology) [5] defined cloud computing as a model for enabling global, convenient and on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction (Ahmad Salah al-Ahmad, Hasan Kahtan 2018) [2]. The definition explicitly describes the five major features that constitute the essence of the cloud system for providing access to computing resources: on-demand self-service, resource pooling, broad network access, rapid elasticity, and measured service. On-demand self-service refers to the ability of the user to request and configure his computing utility demand independently of a third party. Resource pooling means that the CSP's computing resources and VMs are pooled in a distributed manner to serve client needs without geographical constraints, especially while managing sessions and user traffic based on the SIP protocol, in a cloudlet or at the WAN networking level. Broad network access means that all resources, from end-point servers to software applications running on the hosts, are available to users over the internet through a client-server network architecture. Rapid elasticity refers to the load, traffic and level of demand on the computing resources, which can be allocated to requesting jobs or released elastically based on the cloud state and the volume of client requests. At the QoS level, those capabilities should appear unlimited and accessible for any quantity of user requests at any time, and the CSP should consider the fault tolerance of the cloud system as a principal challenge, keeping all requests running using load balancing techniques and VM migration solutions. Tolerance to faults ensures that all services continue to be delivered even if there is a SW/HW issue with some cloud servers. Finally, the measured service feature means that end users do not have the responsibility to control, configure or optimally use the computing resources; the CSP automatically handles all capabilities (servers, storage, runtime environments and applications). Both cloud consumers and providers have the option to monitor, control and report the amount and type of the utilized services. Because of those features, organizations and industries are migrating to a new architecture of cloud services to enhance their business model and to encourage remote work for their employees. Businesses also use Software as a Service (SaaS) to access web-based applications, Infrastructure as a Service (IaaS) to sublet to other smaller companies, or Platform as a Service (PaaS) to build their own applications (N. Manikandan, A. Pavin 2019) [3]. At the deployment level, cloud infrastructure can be categorized into 3 types:
a) Public Cloud: the service is public in a standard model and most services are free for the user; this type allows users to access the cloud publicly via web-browser interfaces on a pay-per-use basis (computing utility). However, public clouds offer less security in comparison to the other categories.
b) Private Cloud: derived from the intranet model, it constitutes a service offered to a selected category of users rather than the general public. It provides a high level of security because all cloud networking traffic is processed within the organization's internal DCC, meaning that all services and resources are made available to users only at the organizational level.
c) Hybrid Cloud: combines the advantages of both the public and private cloud models. It serves the organization's security needs and provides access to public capabilities whenever needed by the private cloud users. Here, the private pool of servers is linked to one or more public cloud nodes to enable accessibility. This category provides more flexibility to enhance IT infrastructure and networking and opens the possibility for more options (S. Sahu and M. Pandey 2019) [4]. Furthermore, NIST mentions the existence of another type of cloud system, the Community cloud, which is provisioned for use by a specific community of CSCs (Cloud Service Consumers) from businesses and organizations that have shared concerns and strategies (e.g., security requirements, policy, data privacy monitoring systems …). One or more of the organizations within the community may own, operate, configure and maintain the overall architecture, on or off premises.
The cloud computing model as described by NIST is shown in Fig. 1 below:
At the reference architecture level, NIST defines five major actors in the cloud infrastructure: cloud consumer, cloud provider, cloud carrier, cloud auditor and cloud broker. Each actor is an entity (a person or an organization) that participates in a transaction or process and/or performs tasks in the cloud infrastructure [5]. 1) Cloud Consumer (CSC): a person or organization that maintains a business relationship with, and uses services from, Cloud Providers (CSPs). 2) Cloud Provider (CSP): a person, entity or organization responsible for making a service available to interested parties. 3) Cloud Auditor: a party that can conduct independent assessment of cloud services, information system operations, performance and security of the cloud implementation. 4) Cloud Broker: an entity that manages the use, performance and delivery of cloud services, and negotiates relationships between Cloud Providers and Cloud Consumers. 5) Cloud Carrier: any intermediary responsible for providing connectivity and transport of cloud computing services from the CSP to the CSC. Figure 2 summarizes the interactions between these actors in cloud computing:
A) The CSC requests a service from the CSB instead of contacting the CSP directly. In this case, the CSB may create a new service by combining services from multiple CSPs. According to (R. Hentschel and S. Strahringer 2020) [6], finding appropriate cloud services that best fit CSC requirements can be a complex and time- and cost-intensive process, especially for small and medium organizations, and since there is no “one fits all” CSP, companies face the challenge of selecting and combining services from different vendors to meet their requirements.
B) The cloud consumer's requests are linked directly to a cloud provider according to a certain SLAco (service level agreement with the consumer). In this case, the cloud provider is itself linked to a cloud carrier according to another SLAca (service level agreement with the carrier), which enables the cloud provider to request dedicated and encrypted connections to ensure that the cloud services are consumed at the consistent level stated in the contractual obligations with cloud consumers. In this case, the provider may specify its requirements on capability, flexibility and functionality in the SLAca in order to meet the essential requirements in the SLAco.
keep traceability of the overall networking system. Indeed, the Service Level Agreement and user satisfaction can be guaranteed by choosing good load balancing techniques. Hence, another entity named ‘Load Balancer’ should be added to the reference model of cloud computing; the whole picture of cloud computing is presented in Fig. 3 below:
Much research has been done on LB and task scheduling for cloud environments, and different load balancing strategies have been proposed; these are discussed in Sect. 2. The remaining sections are structured as follows: Sect. 3 introduces the preliminaries of the review and Sect. 4 concludes the paper.
2 Related Works
Cloud computing in the stack of web architecture and services remains one of the most fascinating fields in the IT industry. With virtualization of hosts and the availability of sets of servers, it can be considered an emerging technique for providing network computing services like water and electricity utilities. Cloud users' demands for computing resources are increasing day after day because of the huge number of digital gadgets sending user requests daily. The purpose of CC is to make computing services available at any time for all requests, taking into account QoS measurements: security, cost (pay as you go), throughput and server response time. Geeta and Shiva Prakash 2018 [9] remind us that cloud computing faces several challenges, the main ones being:
• Security and Privacy: one of the biggest issues in a distributed cloud infrastructure. It depends on the nature of the jobs, network data and application movements. Various security policies have been set by CSPs to minimize the likelihood of loss of control over data.
• Performance: performance is also a big concern in CC, as it mirrors the capability of the overall cloud infrastructure servers. A CSP may face overloading or underloading situations due to a lack of system HW configuration assets such as memory, low CPU speed and limited bandwidth.
• Efficient Load Balancing: aims to distribute the workload as fairly as possible across all the nodes in the cloud environment in order to reach client satisfaction and manage the availability of resources, ensuring that no node is underloaded while others are overloaded, hence refining the throughput of the whole cloud environment.
• Resource Management and Scheduling at all the embedded stack levels of the cloud architecture: software, hardware, networking protocols, the virtualization level and load balancing techniques. It also includes the supervision of memory, threads, CPU cores, disk space, VM images, I/O devices, etc.
• Constant Need for Fast Internet Speed: the full exploitation of cloud computing services cannot be guaranteed without high-speed communication channels. Much research in the field of networking supports CSPs on this issue.
• The Energy Consumption of the Data Center: based on figures communicated by Amazon, servers account for 53% of its data center cost over a 3-year amortization period, while cooling- and power-related costs account for 42% of the total, including both infrastructure requirements (23%, over a 15-year amortization period) and direct power consumption (19%).
• Scale and Quality of Service Management: it is a primordial issue for the CSP to keep the trust and guarantee the SLA contract made with cloud consumers.
This review concentrates on load balancing, a key challenge in CC. The problem currently facing CSPs is that the number of servers cannot keep up with the huge number of incoming requests. To overcome this problem, much research has been done on load balancing and task scheduling for cloud environments to meet those requirements.
M. Ala’anzi and M. Othman 2019 [10] presented a meta-study of the literature on load balancing and server consolidation as a reference taxonomy of the most efficient algorithms that achieve load balancing and server consolidation. They proposed a new classification for load balancing and server consolidation, covering aspects such as hardware thresholds, migration overhead, network traffic, and reliability. They then described how merging load balancing techniques with server consolidation can optimize resource utilization and enhance QoS parameters. Through their study, they presented a clear overview of the load balancing process, discussing PM and VM migration and how the whole process is managed by the VMM. In another section, their review describes the methods for server consolidation and the parameters to take into account to effectively achieve the requested performance when combined with load balancing techniques.
M. Asim Shahid, N. Islam, M. Alam, M.S. Mazliham and S. Musa 2020 [11] presented a comprehensive study of load balancing in the cloud computing environment and identified the need for a novel LB algorithm that employs a fault tolerance (FT) technique. Their analysis led to the conclusion that existing traditional LB algorithms that do not take this FT approach into account are not good enough to effectively spread the workload across the cloud system's nodes. In their review, they discussed the current state-of-the-art challenges in cloud computing and then focused on the load balancing issues related to cloud infrastructure, presenting the various LB techniques currently available in the literature and their applied performance parameters. The research gap existing in the literature on LB techniques was introduced, in addition to the possibility of finding a new LB algorithm that can address the gaps identified. Their survey focused essentially on the FT metric to consider for optimizing cloud environment performance. According to their study, an LB algorithm should have this fault tolerance capability, which significantly reduces the job makespan, produces efficient networking exchanges and achieves high system efficiency during resource/server losses.
Jyoti, M. Shrimali, S. Tiwari, H. Pratap Singh [7] gathered in their review the most useful algorithms used for cloud computing, classified into static and dynamic strategies depending on the type of requesting application, the capability of the computing hosts and the behavior of the cloud system. Their review was based on recent studies and surveys on cloud computing. Through this literature, they focused their review on a comparative approach to working solutions for optimizing workload handling most efficiently, describing some strategies (Round Robin, weighted round robin, least connection, weighted least connection and Random) on which cloud load handling is based in order to classify the proposed algorithms into static and dynamic algorithms. They then summarize the most useful algorithms for LB and service brokering in a taxonomy section. Another section discusses an overview of CC techniques in a systematic planning study, presenting a comparative analysis of LB and the simulation tools used for the LB process, and the last chapters discuss cloud storage security related to LB and service brokering.
A.A.A. AlKhatib, T. Sawalha, S. AlZu’bi 2020 [12] mentioned that the limitations of network resources and the requirement for efficient server responses confirm the need for a load balancing technique that could help distribute traffic across several resources, improving the efficiency and reliability of the overall cloud architecture. In their review, a complete and comparative understanding of the existing literature on several load balancing algorithms was proposed, complementary to the major LB concepts. The review then summarizes the advantages and limitations of the major LB algorithms used nowadays for handling the workload on cloud systems, viz. the Round Robin algorithm, least connection, Throttled load balancer, Genetic algorithm, Ant colony optimization, Honey bee algorithm, Active monitoring load balancer, FCFS, Generalized Priority algorithm and NBST algorithm. The review concludes with a classification of all the previous techniques into static and dynamic classes, discussing the overhead metric, complexity, advantages and limitations.
R. Ramya, S. Puspalatha, T. Hemalatha, M. Bhuvana 2018 [13] presented in their review a performance analysis of LB algorithms using a meta-heuristics approach at the cloud provider's side. Starting from the historical background of cloud theory and its challenges, the SLA commitment tool aims to handle the expectations between the cloud service provider and consumers as fairly as possible. The challenge resides in the difficulty of handling all requests at once during peak hours while keeping up with the contractual SLA measurements. When an uneven request pattern occurs, the cloud resources may be either underutilized or overutilized. In order to manage this load, the LB mechanism plays a primordial role in cloud computing. For that, the study presents a review of both the existing static and dynamic load balancing algorithms proposed so far and designs the implementation of a load balancer that uses a meta-heuristics approach and the Ant Colony Optimization technique to meet the SLA criteria. They consider in their implementation the Amazon AWS EC2 cloud PaaS method.
G. Srinivasa Rao, P. Charan Arur and T. Anuradha 2020 [17] confirm in their paper
analyzing real-time cloud-based LB algorithms that static techniques need complete data
about servers and user needs in advance. These techniques have several limitations: the
response time suffers because more computational resources are used, they are less scalable
(meaning virtual machines cannot respond to demand as user request traffic grows during
peak hours), they consume more energy, and they achieve a poorer make-span and lower
throughput. This type of algorithm is best suited for homogeneous environments. To sum up,
static load balancing algorithms are not appropriate for the cloud environment, since the
cloud itself is dynamic and heterogeneous.
Round Robin
Round Robin is a well-known load balancing algorithm based on an ordered list of identified
servers; it divides the assigned work into a set of cloudlets and forwards each request to the
corresponding server in the ordered list. Each request is divided into quantum times (QT)
among all processors. Once the last server is reached, the loop moves back to the first
server and the process restarts. The Round Robin algorithm has proved its performance
in many CPU scheduling problems and achieves great efficiency through fair allocation,
based on temporal multiplexing of frames for every processing request (Fig. 5). RR is
preemptive, meaning that running jobs are preempted from the CPU into a queue, which
avoids starvation caused by processes waiting too long to be allotted the CPU. If we
consider N tasks, then each one is allotted 1/N of the VM’s global processing time, the
waiting time is no more than (N − 1) × QT, and the scheduling overhead is of the order
of O(1). The main advantage of this algorithm is its performance when distributing the load
equally across all servers, especially when identical servers are configured to provide exactly
the same services. However, cloud systems face the challenge of longer client waiting times
in the queue; in addition, the fixed time-slicing method cannot guarantee the best system
performance because of the differing nature of the calling jobs. The waiting time and the
context-switching ratio need to be handled carefully. For that reason, an Enhanced Round
Robin (ERR) method was introduced by Sanaj MS and Joe Prathap PM 2020 [14].
Here, all tasks Ti, i ∈ {1, 2, 3, 4, 5}, are assumed runnable for a quantum time QT equal to
the time slot defined in the Round Robin SW algorithm by the cloud administrator.
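As an illustration of the fixed-quantum rotation just described, the following minimal Python sketch cycles an ordered server list and assigns each incoming task to the next server in turn; the server names, the number of tasks and the quantum value are purely illustrative assumptions, not part of the reviewed algorithms.

from itertools import cycle

# Hypothetical ordered server list and quantum time (ms); values are illustrative only.
SERVERS = ["server-1", "server-2", "server-3"]
QUANTUM_MS = 100

def round_robin_dispatch(requests, servers=SERVERS):
    """Assign each request to the next server in the ordered list,
    wrapping back to the first server after the last one."""
    rotation = cycle(servers)
    return [(req, next(rotation)) for req in requests]

if __name__ == "__main__":
    tasks = [f"T{i}" for i in range(1, 6)]   # five tasks T1..T5, as in the example above
    for task, server in round_robin_dispatch(tasks):
        print(f"{task} -> {server} (runs for at most {QUANTUM_MS} ms per turn)")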
The main advantage of this weighted algorithm is that it is realistic, given that HW servers
can have different configuration capacities, such as memory, CPU frequency and number
of cores. Its main challenge is starvation, the situation in which low-weighted servers are
postponed by the LB algorithm because of the higher priority given to the other servers.
Another issue is the lack of standardized metrics that the cloud database administrator
could use to assign the appropriate weight to server nodes.
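A minimal sketch of the weighted variant discussed above, in which servers with larger capacities receive proportionally more requests; the interleaved cycle used here is just one possible realization, and the weight values are hypothetical.

def weighted_round_robin(requests, weights):
    """Distribute requests proportionally to static server weights.
    Each server is repeated 'weight' times inside one scheduling cycle."""
    cycle_order = [srv for srv, w in weights.items() for _ in range(w)]
    return [(req, cycle_order[i % len(cycle_order)]) for i, req in enumerate(requests)]

# Hypothetical weights reflecting different HW capacities (cores, memory, CPU frequency).
weights = {"big-server": 3, "medium-server": 2, "small-server": 1}
print(weighted_round_robin([f"req{i}" for i in range(12)], weights))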
Min-Min Algorithm
This algorithm first determines the average execution time of all waiting activities and
proceeds in this way until the entire workload is complete. It has shown improved
productivity, response time and resource utilization, but is challenged by high communication
overhead. In the first Min-Min steps, the optimal activities, which yield improved scheduling
and an overall improvement of the global make-span, are scheduled first; that is, the algorithm
assigns each task to the best matching server depending on response time and hardware
configuration. The smallest runnable tasks are therefore allocated first, while the larger
tasks remain in the holding stage, which contributes to weak machine utilization. The
algorithm uses a quick and simple approach that improves the overall make-span; however,
it suffers from starvation [11].
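The following sketch illustrates the Min-Min idea under simplifying assumptions: the completion time of a task on a VM is approximated as task size divided by VM speed, and all sizes and speeds are hypothetical.

def min_min(tasks, vm_speeds):
    """tasks: {task_id: size}; vm_speeds: {vm_id: processing speed}.
    Repeatedly pick the unscheduled task with the smallest achievable
    completion time and assign it to the VM where that minimum occurs."""
    ready_time = {vm: 0.0 for vm in vm_speeds}   # time at which each VM becomes free
    schedule, pending = [], dict(tasks)
    while pending:
        best = None                               # (completion_time, task, vm)
        for task, size in pending.items():
            for vm, speed in vm_speeds.items():
                completion = ready_time[vm] + size / speed
                if best is None or completion < best[0]:
                    best = (completion, task, vm)
        completion, task, vm = best
        ready_time[vm] = completion
        schedule.append((task, vm, completion))
        del pending[task]
    return schedule

# Hypothetical workload: small tasks go first, large ones wait (the starvation risk noted above).
print(min_min({"t1": 4, "t2": 20, "t3": 2}, {"vm1": 1.0, "vm2": 2.0}))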
Max-Min Algorithm
Max–Min follows the Min-Min heuristic. In a cloud environment, this procedure selects the
task with the largest size and chooses the cloud resource (VM) with the minimum processing
capacity. After allocating the task to a VM, the algorithm removes the task from the queue
and proceeds to distribute the remaining unallocated tasks. The Max–Min algorithm is
suitable only for small-scale distributed clustered systems; it keeps a task status table in
memory for real-time VM load measurements, in addition to the expected completion time
while executing tasks. An Elastic Cloud Max–Min (ECMM) algorithm was also proposed
in the literature and proved better than the RR technique, improving the average pending
time of tasks arriving in batch mode [16]. This algorithm performs better than the Min-Min
algorithm when there are many more small tasks than long ones; nevertheless, it also leads
to starvation.
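A companion sketch of Max-Min under the same simplifying assumptions as the Min-Min sketch above; the only change is that the task with the largest minimum completion time is scheduled first, so long tasks are not postponed behind small ones.

def max_min(tasks, vm_speeds):
    """Like Min-Min, but the task with the LARGEST minimum completion time
    is scheduled first on the VM where that minimum is achieved."""
    ready_time = {vm: 0.0 for vm in vm_speeds}
    schedule, pending = [], dict(tasks)
    while pending:
        candidates = []
        for task, size in pending.items():
            vm = min(vm_speeds, key=lambda v: ready_time[v] + size / vm_speeds[v])
            candidates.append((ready_time[vm] + size / vm_speeds[vm], task, vm))
        completion, task, vm = max(candidates)    # pick the largest of the per-task minima
        ready_time[vm] = completion
        schedule.append((task, vm, completion))
        del pending[task]
    return schedule

print(max_min({"t1": 4, "t2": 20, "t3": 2}, {"vm1": 1.0, "vm2": 2.0}))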
of work in the network of multiple processors [7]. The (OLB + LBMM) measurement
follows the approach of the specialists. The algorithm has multiple stages: in stage one,
a challenging administrator manages the workloads and assigns tasks to the specific nodes.
In stage two, a service manager divides the requests into sub-tasks and relegates them to
the operational nodes in question. It also consists of administrative nodes that perform the
level-three tasks. This algorithm presents the advantage of efficient resource utilization
and improved work competency. However, since the completion and run times of node
tasks are not considered in OLB, the overall pending time of all activities is significantly longer.
Random Algorithm
The random algorithm matches clients and servers randomly, based on a random number
generator; the load balancer spreads a large number of requests roughly evenly over the
nodes. Like Round Robin, this algorithm has proven its efficiency for cluster nodes with
similar configurations. S. Kumar Mishra, B. Sahoo and P. Paramita Parida 2020 [16]
described in their paper that, in the cloud computing environment, load refers to the
allocation of different tasks to VMs. The LB research problems can be defined at different
levels: (1) Task allocation: the random distribution of a finite number of tasks onto various
Physical Machines (PMs), which are in turn responsible for creating different VMs using
a hypervisor. The efficiency of task allocation in the cloud determines the effectiveness of
the load balancing algorithm, and access control to each service can be provided by ABE
(Attribute-Based Encryption), which is widely used by SW applications that use the cloud
for data storage [20]. (2) VM/Task migration management: in the CC environment, VM
migration describes the transfer of a VM from an overloaded PM to another one to improve
resource utilization. Similarly, moving the actual state of a task from one virtual machine
to another is called task migration. VM and task migration are primordial concepts for
load balancing in cloud computing. The algorithm has the advantage of reaching load
balance across all system servers and better performance when servers carry similar, high
workloads of computing resources.
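Returning to the random strategy itself, a minimal sketch of uniform random assignment of requests to VMs; the VM names and the fixed seed are illustrative only.

import random

def random_balance(requests, vms, seed=None):
    """Assign each incoming request to a VM chosen uniformly at random.
    Works reasonably well when the VMs have similar configurations, as noted above."""
    rng = random.Random(seed)
    return [(req, rng.choice(vms)) for req in requests]

print(random_balance(["r1", "r2", "r3", "r4"], ["vm-a", "vm-b", "vm-c"], seed=42))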
These were the most used strategies in the history of cloud systems. Improvements and
enhancements were introduced as variations on the same principles, giving birth to more
advanced LB algorithms, classified into static and dynamic ones. Static balancing algorithms
are mostly suitable for stable environments with homogeneous systems, while dynamic LBs
are more adaptable to complex cloud architectures and prove effective in both homogeneous
and heterogeneous environments. However, static load balancing processes present less
system overhead than dynamic ones.
algorithms monitor the load of each node in real time on a regular basis. The main objective
is to exchange load and state data between nodes and within nodes (between VMs) at given
times, in order to stay updated about node workloads and to redistribute the traffic between
and within nodes whenever the cloud flow needs refinement [11].
Least Connection
This algorithm schedules network traffic to the server with the fewest client connections.
It is one of the dynamic load balancing scheduling algorithms, as it counts the number of
connections of every server to estimate its load. The load balancer maintains the connection
count of each server in real time, incrementing the counter when a new connection is
dispatched to the server and decrementing it when the connection ends. The LC algorithm
transmits a web request to the server that has the smallest number of web connections
(G. Singh and K. Kaur) [15]. The main advantage of this strategy is that it minimizes the
likelihood of server overload by sending requests to the server with the fewest active
connections. Based on the review in [7], the disadvantage of this method is that the LB
cannot guarantee task execution. Another limitation is that long-lived connections can
stack up on a single server, which can overload it when a new request is added, even if it
is just one connection. The distribution of the load is shown in Fig. 7 below:
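Beyond the distribution pictured in Fig. 7, the following sketch shows one possible way to maintain the per-server connection counters described above; the server names are placeholders.

class LeastConnectionBalancer:
    """Minimal least-connection balancer: a counter per server is incremented
    when a connection is opened and decremented when it closes."""

    def __init__(self, servers):
        self.connections = {srv: 0 for srv in servers}

    def open_connection(self):
        server = min(self.connections, key=self.connections.get)  # fewest active connections
        self.connections[server] += 1
        return server

    def close_connection(self, server):
        self.connections[server] -= 1

lb = LeastConnectionBalancer(["s1", "s2", "s3"])
opened = [lb.open_connection() for _ in range(5)]
lb.close_connection(opened[0])
print(opened, lb.connections)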
in comparison to low-weighted ones. The default server weight is one, and the IPVS
administrator or a monitoring program can assign any weight to a real server. In the WLC
algorithm, each new network connection is given to the server with the minimum ratio of
current active connections to its weight. The Weighted Least Connection scheduling
algorithm does to the Least Connection algorithm what the Weighted Round Robin algorithm
does to Round Robin: it introduces a “weight” based on the hardware configuration and
specifications of every server [15]. Figure 8 below describes the behavior of a cloud system
using Weighted Least Connection.
In Fig. 8, server 1 is selected by the load balancer. Here the algorithm chooses the destination
for the request connection depending on the number of active connections in the LB
traceability table, weighted accordingly. This algorithm has the advantage of preventing a
server from being overloaded by checking the number of server connections with the WLC
approach. However, the main issue with this technique is the long processing time [7].
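A minimal sketch of the weighted least connection rule described above, selecting the server with the smallest ratio of active connections to weight; the weight values are hypothetical and the default weight of one is applied when a server has no explicit weight.

class WeightedLeastConnectionBalancer:
    """Weighted least connection: pick the server with the minimum ratio of
    active connections to its configured weight (default weight = 1)."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.active = {srv: 0 for srv in weights}

    def open_connection(self):
        server = min(self.active, key=lambda s: self.active[s] / self.weights.get(s, 1))
        self.active[server] += 1
        return server

    def close_connection(self, server):
        self.active[server] -= 1

wlc = WeightedLeastConnectionBalancer({"s1": 3, "s2": 1})
print([wlc.open_connection() for _ in range(4)], wlc.active)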
based on the gray prediction algorithm to predict the load rate of the departure and target
hosts. Experiments in the paper show that, using the energy-aware algorithm, CSPs can
reduce their total CC energy consumption, enhance the overall clustering performance
and scale their environment platform further.
Throttled Algorithm
This algorithm is similar to AMLB, using a table that contains the VMs and their current
states (available or busy). The algorithm sends a request to the control unit whenever a
virtual machine must be assigned to perform a specific task. The DCC then searches for
the optimal matching VM based on its capabilities. The Throttled load balancer is
described in Fig. 9 below.
This algorithm has the advantage that the data center searches for the VM whose capabilities
best fit the required task, which improves the performance of the cloud structure.
Nevertheless, this search starts from the beginning of the table for each new request, which
wastes time because it passes through unavailable VMs every time. For that reason, a
Modified Throttled Algorithm (MTA) was proposed in the literature, modifying the VM
selection cursor mechanism: for each arriving request, the algorithm selects the VM index
next to the already assigned VM, depending on its availability [12].
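A behavioural sketch of the throttled index table and of the modified cursor described above; the table size and the available/busy encoding are assumptions about one possible realization, not the implementation of [12].

class ThrottledBalancer:
    """Keeps an index table of VM states ('available' / 'busy').
    allocate() scans from the start of the table (classic throttled);
    the modified variant resumes from the VM after the last assignment."""

    def __init__(self, vm_count, modified=False):
        self.state = ["available"] * vm_count
        self.modified = modified
        self.cursor = 0

    def allocate(self):
        n = len(self.state)
        start = self.cursor if self.modified else 0
        for offset in range(n):
            idx = (start + offset) % n
            if self.state[idx] == "available":
                self.state[idx] = "busy"
                self.cursor = (idx + 1) % n
                return idx
        return None  # all VMs busy: the request must wait

    def release(self, idx):
        self.state[idx] = "available"

tb = ThrottledBalancer(4, modified=True)
print([tb.allocate() for _ in range(3)], tb.state)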
Genetic Algorithm
J.M. Shah, S. Pandya, N. Joshi, K. Kotecha, D.B. Choksi and N. Joshi 2017 [19] mentioned
in their analysis that the GA model is based on natural selection, simulating Darwin’s theory
of biological genetic operations, and follows a dynamic SW computing approach. This
algorithm has the advantage of being adaptable to complex objective functions.
Generally, the implementation of such an algorithm requires three steps (a sketch follows the list):
1. Selection operator: the procedure randomly selects the initial population.
2. Crossover operator: finds fit pairs of individuals.
3. Mutation operator: a low-probability value called the mutation value determines which
bits are toggled from 0 to 1 or from 1 to 0. The GA implementation is easy for developers
to understand; however, it does not meet the performance criterion where resources are
strictly bounded [7].
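The sketch announced above applies the three operators to a toy task-to-VM mapping problem, using make-span as the fitness; the population size, mutation rate and workload are all hypothetical choices, not values taken from the cited works.

import random

def ga_schedule(task_sizes, vm_speeds, pop_size=20, generations=50, mutation_rate=0.05, seed=0):
    """Toy genetic algorithm for task-to-VM mapping.
    A chromosome is a list: chromosome[i] = VM index of task i.
    Fitness: shorter make-span is better."""
    rng = random.Random(seed)
    n_tasks, n_vms = len(task_sizes), len(vm_speeds)

    def makespan(chrom):
        load = [0.0] * n_vms
        for task, vm in enumerate(chrom):
            load[vm] += task_sizes[task] / vm_speeds[vm]
        return max(load)

    population = [[rng.randrange(n_vms) for _ in range(n_tasks)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=makespan)                  # selection: keep the fittest half
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_tasks)            # single-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_tasks):                   # mutation: re-assign a gene with low probability
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_vms)
            children.append(child)
        population = parents + children
    best = min(population, key=makespan)
    return best, makespan(best)

print(ga_schedule([5, 3, 8, 2, 7], [1.0, 2.0]))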
1. Scout bees, which look arbitrarily for a food source and perform a waggle dance to
signal the quality of the food.
2. Employed bees, which gather data about the food source and share the information
obtained with onlooker bees.
3. Onlooker bees, which calculate the fitness value to find the best food source. With
respect to LB, the incoming requests and the tasks from overloaded machines represent
the honey bees, being transferred from overloaded machines to underloaded ones. The
dynamic approach of this algorithm means that changes in the load status are checked in
real time, and the updated load of the overloaded departure machine is taken into account
for the remaining tasks. However, the algorithm suffers from inaccuracy when calculating
the VM load, because it does not take into consideration the task transfer time between
nodes (a sketch of this rebalancing idea follows).
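A minimal sketch of the honey-bee-inspired idea described by the three bee roles above: tasks leaving overloaded machines act as bees and are placed on the currently least-loaded machine; the capacity threshold and workloads are hypothetical.

def honey_bee_rebalance(vm_tasks, capacity):
    """Honey-bee-inspired rebalancing sketch: tasks removed from an overloaded VM
    act as 'bees' and are placed on the least-loaded other VM.
    vm_tasks: {vm: [task sizes]}; capacity: load threshold per VM (hypothetical)."""
    def load(vm):
        return sum(vm_tasks[vm])

    moves = []
    for vm in list(vm_tasks):
        others = [v for v in vm_tasks if v != vm]
        while others and load(vm) > capacity and vm_tasks[vm]:
            task = vm_tasks[vm].pop()              # a 'bee' leaving the overloaded machine
            target = min(others, key=load)         # onlooker step: least-loaded destination
            vm_tasks[target].append(task)
            moves.append((task, vm, target))
    return moves

vms = {"vm1": [4, 4, 4], "vm2": [1], "vm3": []}
print(honey_bee_rebalance(vms, capacity=6), vms)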
Cloud computing (CC) provides powerful tools and methods for on-line digital processing,
making it a fascinating field of computational intelligence (CI). It consists of the network
delivery of computational services such as storage, applications and servers. The global
optimization model of the cloud is qualified as NP-hard, calling for the use of multiple
meta-heuristics as part of the architectural solution. Many challenges face CC nowadays
with the increasing number of cloud service users, and the throughput criterion has become
a major concern for cloud providers in meeting the contractual SLA negotiations. Load
balancing (LB) plays a major role in the availability of the cloud, because it aims to
distribute the load equally among all servers. This survey gave an overview of cloud
computing, detailed its major concepts and reviewed the ten most used load balancing
algorithms, identifying their main advantages and limitations in the literature for future
open research. A new insight could be discussed here: a collaborative approach between the
several load balancing algorithms discussed in the review, which takes into consideration
the core advantages presented by each algorithm and finds an interoperability link that
could help minimize the remaining limitations reported in the literature, such as make-span,
throughput and server response time. This approach would lead to the design of a new
algorithm that gathers all parameters about the distributed cloud servers and the calling
requests, analyzes the available data and chooses whether to apply, for example, Genetic
or Ant Colony algorithms. After several processing iterations, the overall metrics for server
response time and performance could be improved. Table 1 summarizes a comparison
between the load balancing algorithms discussed throughout the review.
Authors’ Contributions. Z.B. wrote the main manuscript except the conclusion, prepared Figs. 5,
6, 7, 8 and 9, and contributed to gathering the data of related works and to the analysis of each algorithm.
M.O. prepared Figs. 1 and 2 and the conclusion of the manuscript, and contributed to gathering
and choosing the data of related works as well as to the analysis of each algorithm in the article.
K.B. prepared Figs. 3 and 4 and contributed to gathering and choosing the data of related works,
as well as to the analysis of each algorithm in the article.
All authors reviewed the manuscript twice.
Funding. This research did not receive any specific grant from funding agencies in the public,
commercial, or not-for-profit sectors.
Data Availability. Data sharing not applicable to this article as no datasets were generated or
analyzed during the current study.
List of Abbreviations
A-
ACA: Ant Colony Algorithm
C-
CSP: Cloud Service Provider
CSC: Cloud Service Consumer
CSB: Cloud Service Broker
CC: Cloud Computing
CPU: Central Processing Unit
D-
DCC: Data Center Coalition
E-
ERR: Enhanced Round Robin
EALB: Energy Aware Load Balancing
F-
FT: Fault Tolerance
FCFS: First Come First Serve
G-
GA: Genetic Algorithm
H-
HW: Hardware
I-
IaaS: Infrastructure as a Service
IT: Information Technology
I/O: Input/Output
L-
LB: Load Balancing
LC: Least Connection
LBMM: Load Balancing Min-Min
M-
MAMLB: Modified Active Monitoring Load Balancer
MTA: Modified Throttled Algorithm
N-
NIST: National Institute of Standards and Technology
O-
OLB: Opportunistic Load Balancing
P-
PaaS: Platform as a Service
PM: Physical Machine
Q-
QoS: Quality of Service
Note
The NIST defines CC as ‘a model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable computing resources (e.g., networks,
servers, storage, applications, and services) that can be rapidly provisioned and released
with minimal management effort or service provider interaction’ [5].
References
1. Patel, K.D., Bhalodia, T.M.: An efficient dynamic load balancing algorithm for virtual machine
in cloud computing. IEEE Xplore Part Number: CFP19K34-ART (2019). ISBN: 978-1-5386-
8113-8
2. Salah al-Ahmad, A., Kahtan, H.: Cloud computing review: features and issues, 978-1-5386-
5630-3/18/$31.00. IEEE (2018)
3. Manikandan, N., Pavin, A.: Comprehensive solution of scheduling and balancing load in
cloud – a review. IEEE Xplore Part Number: CFP19OSV-ART (2019). ISBN: 978-1-7281-
4365-1
4. Sahu, S., Pandey, M.: Efficient load balancing algorithm analysis in cloud computing. In: Pro-
ceedings of the Fourth International Conference on Communication and Electronics Systems,
ICCES (2019)
5. Liu, F., et al.: NIST Cloud Computing Reference Architecture Special Publication 500-292
(2011)
6. Hentschel, R., Strahringer, S.: A broker-based framework for the recommendation of cloud
services: a research proposal. In: Hattingh, M., Matthee, M., Smuts, H., Pappas, I., Dwivedi,
Y.K., Mäntymäki, M. (eds.) I3E 2020. LNCS, vol. 12066, pp. 409–415. Springer, Cham
(2020). https://doi.org/10.1007/978-3-030-44999-5_34
7. Jyoti, A., Shrimali, M., Tiwari, S., Pratap Singh, H.: Cloud computing using load balancing
and service broker policy for IT service: a taxonomy and survey. J. Ambient Intell. Humaniz.
Comput. 11, 4785–4814 (2020)
8. Ben Hamouda, R., Boussema, S., Ben Hafaiedh, I., Robbana, R.: Performance evaluation of
dynamic load balancing protocols based on formal models in cloud environments. In: Atig,
M.F., Bensalem, S., Bliudze, S., Monsuez, B. (eds.) VECoS. LNCS, vol. 11181, pp. 64–79.
Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00359-3_5
9. Geeta, Prakash, S.: A literature review of QoS with load balancing in cloud computing environ-
ment. In: Aggarwal, V., Bhatnagar, V., Mishra, D. (eds.) Big Data Analytics. AISC, vol. 654,
pp. 667–675. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-6620-7_64
10. Ala’anzi, M., Othman, M.: Load balancing and server consolidation in cloud computing
environments: a meta-study (2019). https://doi.org/10.1109/ACCESS.2019.2944420
11. Asim Shahid, M., Islam, N., Alam, M., Mazliham, M.S., Musa, S.: A comprehensive study
of load balancing approaches in the cloud computing environment and a novel fault tolerance
approach. IEEE Access (2020). https://doi.org/10.1109/ACCESS.2020.3009184
12. AlKhatib, A.A.A., Sawalha, T., AlZu’bi, S.: Load balancing techniques in software-defined
cloud computing: an overview. In: Seventh International Conference on Software Defined
Systems (SDS) (2020)
13. Ramya, R., Puspalatha, S., Hemalatha, T., Bhuvana, M.: A survey on and performance analysis
of load balancing algorithms using meta heuristics approach in public cloud-service provider’s
perspective, 978-1-5386-9432-9/18/$31.00. IEEE (2018)
14. Sanaj, M.S., Joe Prathap, P.M: An Enhanced Round Robin (ERR) algorithm for effective and
efficient task scheduling in cloud environment, 978-1-7281-6453-3/20/$31.00. IEEE (2020)
15. Singh, G., Kaur, K.: An improved weighted least connection scheduling algorithm for load
balancing in web cluster systems. Int. Res. J. Eng. Technol. (IRJET) (2018)
16. Kumar Mishra, S., Sahoo, B., Paramita Parida, P.: Load balancing in cloud computing: a big
picture. J. King Saud Univ. Comput. Inf. Sci. 32, 149–158 (2020)
17. Srinivasa Rao, G., Charan Arur, P., Anuradha, T.: Real time cloud based load balance
algorithms and an analysis. SN Computer Science (2020)
18. Yang, Q., Shao, Y., Cui, H., Fang, Y., Yang, D., Pan, Y.: Energy-aware and load balancing
based dynamic migration strategy for virtual machine. In: 4th International Conference on
Recent Advances in Signal Processing, Telecommunications Computing (SigTelCom) (2020)
19. Sudhakar, C., Jain, R., Ramesh, T.: Cloud load balancing - honey bees inspired effec-
tive request balancing strategy. In: International Conference on Computing, Power and
Communication Technologies (GUCON) (2018)
20. Naregal, K., Kalmani, V.: Study of lightweight ABE for cloud based IoT. In: Proceedings of
the Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)
(I-SMAC) (2020)
NeuroTower: A 3D Neuromorphic Architecture
with Low-Power TSVs
1 Introduction
The main concern regarding neural networks is their incessant need for large amounts of
energy. The ambition of neural networks is to mimic the function of the human brain, and
doing so requires copious volumes of energy [15]. In order to speed up the process of
learning and processing while maintaining a high degree of efficiency, many accelerator
architectures have been developed. Currently, most of the success in accelerator devel-
opment has come from architectures using general purpose graphics processing units
(GPGPU), which offer a high degree of scalability but poor power efficiency. Contrast-
ingly, application specific integrated circuits (ASIC) offer better power efficiency, but
a low degree of scalability [1, 2]. This paper aims to outline an architecture to acceler-
ate neural networks with good scalability similar to GPGPU, while also providing high
power efficiency like ASIC.
The NeuroTower offers various benefits when compared to existing accelerators.
To start, the NeuroTower makes use of in-memory processing through integration of a
compute layer within a 3D high-density memory package [16–19] where a high degree
of parallelism can be realized. It also does not use a traditional instruction set to carry out
processing of the neural network. As a result, the system uses less energy and operates
with higher efficiency. Lastly, the introduction of a component called the programmable
neurosequence generator (PNG) allows the host to program state machine descriptions
into the architecture, which can give rise to abstractions within the network simplifying
the processing procedure.
This paper expands upon the already existing Neurocube, proposing a method of
reducing the NoC (network on chip) traffic previously limiting the efficiency of the
architecture [3]. This is accomplished through a pruning unit designed to exploit naturally
occurring sparsity in both interconnection weights, and states of neurons [4].
The architecture of the Neurocube is explained in several sections detailing the
function of each of the individual components as well as communication among those
components. Following this is the design for the suggested pruning unit to reduce traffic
through sparsity exploitation.
2 Proposed Method
2.1 DRAM
In the Neurocube, memory is integrated as a stack of multiple DRAM chips each sepa-
rated into 16 partitions. Along one column of partitions is a vault as shown in Fig. 1 below.
Each of these vaults has an associated vault controller which controls data movement in
and out of the vaults to other elements of the NeuroTower. Each vault is connected to one
processing element to allow for parallel processing and these connections are realized
by using high speed through silicon vias (TSVs) [5]. The DRAM stack is crucial to the
operation of the system as all the information for processing is contained here. Every
layer of the neural network, their states, and connectivity weights are stored in the vaults
of the DRAM. This implies that the data movement paths are known before beginning
processing. To make use of this, the paths are compiled into finite state machine descrip-
tions which drive the programmable neurosequence generators (PNG) [3]. To initiate
processing the host must load these state machine descriptions into the PNG which
begins the data-driven processing of each layer of the neural network.
[Fig. 1 labels: DRAM, Layers, Vault, PE16]
For each neuron in a layer, the PNG generates the address of connected neurons
and connectivity weights from the previous layer through the address generator. These
addresses are sent to the vault controller which makes accesses to the vault to retrieve
data located at the requested addresses [3]. Figure 2 presents the address generator,
which uses combinational logic in addition to three nested loops to generate the address
of the required neurons. The combinational logic computes the memory address of the
target neurons from the neuron and connectivity counter. As the counters increment,
the combinational logic computes the address of the target neuron and makes a request
to the vault control for the required data. This loop continues until the states of all the
neurons in the layer have been computed [3]. This loop is constantly incrementing and
sending addresses to the vault controller while processing is taking place. This ensures
that processing elements are not wasting clock cycles waiting for data to arrive.
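A simplified sketch of the three nested counters and the combinational address arithmetic described above, for a fully connected layer; the base addresses, the layer sizes and the flat address layout are assumptions, not the actual Neurocube memory map.

def generate_addresses(layer_sizes, state_base=0x0000, weight_base=0x8000):
    """Sketch of the PNG address generator for fully connected layers:
    three nested counters (layer, output neuron, input connection) drive
    simple combinational address arithmetic. Addresses and bases are hypothetical."""
    for layer in range(1, len(layer_sizes)):                   # layer counter
        n_in, n_out = layer_sizes[layer - 1], layer_sizes[layer]
        for neuron in range(n_out):                            # neuron counter (target neuron)
            for conn in range(n_in):                           # connectivity counter (source neuron)
                state_addr = state_base + conn                 # address of the input neuron state
                weight_addr = weight_base + neuron * n_in + conn   # address of the connection weight
                yield layer, neuron, conn, state_addr, weight_addr

# First few memory requests for a tiny 3-2 network (illustrative only).
for req in list(generate_addresses([3, 2]))[:4]:
    print(req)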
As the PNG receives the data stream from the vault controller, it must also apply the
non-linear activation function to the input neuron state. In the NeuroTower, the activation
function is implemented as a look-up table (LUT). The output state is then encoded into a
packet along with other relevant information as dictated by the encapsulation logic. Once
the packet is ready, the PNG sends it to the router of the network on chip for delivery to
the processing elements [3]. Figure 3 illustrates the interactions of each element of the
PNG with each other, as well as with the vault and network on chip.
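A minimal sketch of a look-up-table activation, assuming a sigmoid sampled at a hypothetical resolution; the real PNG table width and quantization are not specified here.

import math

# Hypothetical 256-entry LUT for a sigmoid over the input range [-8, 8].
LUT_SIZE, IN_MIN, IN_MAX = 256, -8.0, 8.0
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(IN_MIN + i * (IN_MAX - IN_MIN) / (LUT_SIZE - 1))))
               for i in range(LUT_SIZE)]

def lut_activation(x):
    """Approximate the activation function by indexing the precomputed table."""
    x = max(IN_MIN, min(IN_MAX, x))                            # clamp to the table range
    idx = round((x - IN_MIN) / (IN_MAX - IN_MIN) * (LUT_SIZE - 1))
    return SIGMOID_LUT[idx]

print(lut_activation(0.0), lut_activation(4.2))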
2.3 Packets
The processing elements require data in the form of packets to conduct arithmetic.
In addition to state and connectivity weight, each packet also contains the different
identifiers listed below and illustrated in Table 1 [3]:
• MAC-ID: Indicates which MAC unit will conduct processing and which neuron in
the current layer is under computation.
Fig. 3. a) Position of the PNG between the vault controller and router. b) Overview of the PNG
architecture.
• Operation-ID: Indicates which neuron from the previous layer is being used as an
input.
• Source-ID: Indicates which DRAM vault was accessed and is used to locate where the
neuron’s state should be updated.
• Destination-ID: Indicates which processing element will conduct processing.
In part c, the OP-ID is equal to the OP counter, so data is stored in the temporal
buffer. At this stage, the temporal buffer is full with 16 weights and inputs.
Lastly, the temporal buffer is flushed, and the MAC units receive this data to conduct
processing. The operation counter increments, and relevant data (with the same OP-ID)
is retrieved from the cache memory and stored in the temporal buffer [3].
The Neurocube faces a high degree of difficulty dealing with network on chip traffic and
consumes more power than necessary as a result while also slowing down computation
[3]. As a solution to this problem, sparsity can be exploited to reduce the amount of
data transfer within the system. Though sparsity occurs naturally in activation functions
and weights, the amount can be increased without a loss of accuracy through pruning,
wherein values under a threshold are set to zero [4]. Thus, in order to address the issue
of NoC traffic, a pruner unit can be used as a medium to reduce data transfer through
sparsity exploitation [7]. This is the key difference between the NeuroTower and the
existing Neurocube.
In a MAC operation, there are three parameters to consider: input state, weight, and
output state of the previous neuron. If any of these values are 0, the MAC operation is not
necessary and is labeled ineffectual [8]. On the other hand, if all the values are non-zero,
the MAC operation is effectual and must be carried out [9]. Through avoiding ineffectual
operations, the NoC traffic can be reduced while also decreasing power consumption.
The pruner unit, shown in Fig. 6, is simply a comparator used during runtime to
compare a packet’s data field to a predefined threshold. The output of the comparator
is used as an enable signal to control the function of the appropriate MAC unit. If the
data field in the packet is less than the threshold, the comparator outputs a 1-bit null flit
and disables the corresponding MAC unit. Otherwise, the MAC units are enabled, and
data is sent as it normally would. When a MAC unit is disabled from the comparator,
the output is taken to be equal to the state of the current input neuron. As a result, three
unnecessary operations are avoided which not only reduces traffic, but also opens up
space in a given MAC unit for more data to be processed earlier than it would be without
the pruner unit.
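A behavioural sketch of the pruner comparator: a packet whose weight or input state falls below the threshold disables the corresponding MAC operation, and the accumulator is left unchanged (a simplification of the disabled-MAC behaviour described above); the threshold value and packet layout are hypothetical.

THRESHOLD = 1e-3   # hypothetical pruning threshold

def mac_with_pruner(packets, threshold=THRESHOLD):
    """Each packet carries (mac_id, weight, input_state, accumulator).
    If the weight or input state is below the threshold, the MAC is disabled
    and the ineffectual multiply-accumulate is skipped."""
    results = {}
    for mac_id, weight, in_state, acc in packets:
        enable = abs(weight) >= threshold and abs(in_state) >= threshold
        results[mac_id] = acc + weight * in_state if enable else acc
    return results

packets = [(0, 0.5, 0.2, 1.0), (1, 0.0, 0.7, 2.0), (2, 0.3, 0.0005, 0.0)]
print(mac_with_pruner(packets))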
The flow of operations in the NeuroTower is depicted through the flow chart in Fig. 7,
with the cases of sparsity existing and not.
3 Experimental Results
3.1 Experimental Setup
In order to validate the efficacy of the NeuroTower and the other architectures in this work,
we used 3D-Noxim [10, 11] and the GEM5 full-system simulator [12].
We used the Caffe Model Zoo [13] framework to run various DNNs as shown in
Table 2. DNNs in this work were used to perform image classification on the ILSVRC12
dataset [13]. ILSVRC12 is a dataset with 256 × 256 images across 1000 classes.
Traffic traces of real NN workloads, shown in Table 2, are extracted from the GEM5
full-system simulator [12]. For simulating a 3D architecture, the extracted traffic traces
Figure 9 shows the system energy, normalized to the 3D Meshed-CPU. Due to the higher
power consumption of interconnection support in 3D Meshed-CPU (complex routing
units and cores) in contrast to the NeuroTower, energy consumption of the 3D Meshed-
CPU is about 1.76 times that of the NeuroTower on average. Due to the low-latency direct
paths between the PEs and Vaults in the NeuroTower, energy consumption is improved
compared to the 3D Meshed-CPU.
4 Conclusion
References
1. Ibrahim, Y., et al.: Soft errors in DNN accelerators: a comprehensive review, Micro-
electron. Reliab. 115, 113969 (2020). https://doi.org/10.1016/j.microrel.2020.113969. ISSN
0026-2714
2. Maxwell, J.C.: A Treatise on Electricity and Magnetism, vol. 2, 3rd edn., pp. 68–73.
Clarendon, Oxford (1892)
3. Kim, D., Kung, J., Chai, S., Yalamanchili, S., Mukhopadhyay, S.: Neurocube: a programmable
digital neuromorphic architecture with high-density 3D memory. In: 2016 ACM/IEEE 43rd
Annual International Symposium on Computer Architecture, pp. 380–392 (2016)
4. Mahmoud, M., et al.: TensorDash: exploiting sparsity to accelerate deep neural network
training. In: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO), pp. 781–795 (2020)
5. Panem, C., Gad, R.S., Kaushik, B.K.: Vertical traversal approach towards TSVs
optimisation over multilayer network on chip (NoC). Microelectron. J. 116, 105231 (2021).
ISSN 0026-2692
6. Höppner, S., et al.: The SpiNNaker 2 processing element architecture for hybrid digital
neuromorphic computing. arXiv [cs.AR] (2021)
7. Albericio, J., Judd, P., Jerger, N., Aamodt, T., Hetherington, T., Moshovos, A.: Cnvlutin:
ineffectual-neuron-free deep neural network computing (2016)
8. Judd, P., Delmas, A., Sharify, S., Moshovos, A.: Cnvlutin2: ineffectual-activation-and-weight-
free deep neural network computing (2017)
9. Asadikouhanjani, M., Zhang, H., Gopalakrishnan, L., Lee, H.-J., Ko, S.-B.: A real-time archi-
tecture for pruning the effectual computations in deep neural networks. IEEE Trans. Circuits
Syst. I Regul. Pap. 68(5), 2030–2041 (2021). https://doi.org/10.1109/TCSI.2021.3060945
10. Norollah, A., Derafshi, D., Beitollahi, H., Patooghy. A.: PAT-Noxim: a precise power &
thermal cycle-accurate NoC simulator. In: 2018 31st IEEE International System-on-Chip
Conference (SOCC), pp. 163–168. IEEE (2018)
11. Chen, K.-C., Wang, T.-Y.: NN-noxim: high-level cycle-accurate NoC-based neural networks
simulator. In: 2018 11th International Workshop on Network on Chip Architectures (NoCArc),
pp. 1–5. IEEE (2018)
12. The gem5 simulator. https://www.gem5.org/
13. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput.
Vis. 115(3), 211–252 (2015)
14. Joardar, B.K., Kim, R.G., Doppa, J.R., Pande, P.P., Marculescu, D., Marculescu, R.: Learning-
based application-agnostic 3D NoC design for heterogeneous manycore systems. IEEE Trans.
Comput. 68(6), 852–866 (2018)
15. Asad, A., Kaur, R., Mohammadi, F.: A survey on memory subsystems for deep neural network
accelerators. Future Internet 14(5), 146 (2022)
16. Asad, A., Al-Obaidy, F., Mohammadi, F.: Efficient power consumption using hybrid emerg-
ing memory technology for 3D CMPs. In: 2020 IEEE 11th Latin American Symposium on
Circuits and Systems (LASCAS), pp. 1–4. IEEE (2020)
17. Dorostkar, A., Asad, A., Fathy, M., Jahed-Motlagh, M.R., Mohammadi, F.: Low-power het-
erogeneous uncore architecture for future 3D chip-multiprocessors. ETRI J. 40(6), 759–773
(2018)
18. Asad, A., Ozturk, O., Fathy, M., Jahed-Motlagh, M.R.: Optimization-based power and ther-
mal management for dark silicon aware 3D chip multiprocessors using heterogeneous cache
hierarchy. Microprocess. Microsyst. 51, 76–98 (2017)
19. Al-Obaidy, F., Asad, A., Mohammadi, F.A.: Improving power-performance via hybrid cache
for chip many cores based on neural network prediction technique. Microsyst. Technol. 27(8),
2995–3006 (2020). https://doi.org/10.1007/s00542-020-05048-5
Coherence Domains in Condensed Matter
as Storage “Devices” of Quantum
Information
1 Introduction
Due to the local gauge invariance of L, the oscillations of the matter and
e.m. fields within the system are tuned with each other, for the gauge field
generates the required background in which such dynamics takes place, giving
rise to a new macroscopic coherent quantum state of the system [2,3]. More
specifically, the occurrence of suitable values of temperature and density [2–4]
makes the system spontaneously undergo a quantum phase transition from a
non-coherent ground state (characterized by uncorrelated matter and e.m. fields
oscillations) towards a coherent ground state (CGS) in which, on the contrary,
these oscillations are phase correlated. This phenomenon can be also described
as a spontaneous symmetry breaking (SSB) according to which, in the “true”
ground state of the system, i.e. the CGS, the matter and e.m. fields are phase-
locked to each other. Even more remarkably, such a mechanism also gives rise to
the formation of macroscopic spatial regions, named “coherent domains” (CD)
that are seats of such tuned oscillations. Such domains, that resemble the macro-
scopic state of a superfluid or a superconductor, are characterized by a quantum
wavefunction that is macroscopic in character [4] and given by
$\Psi(x, t) = \Psi_0(x, t)\, e^{i\Theta(x, t)}$ (5)
where Ψ0 (x, t) is the amplitude of the wave function and Θ (x, t) its quantum
phase, the latter being eigenfunction of a suitable quantum phase operator [5,6].
According to Eq. (5) the matter and e.m. fields of the system exhibit a collective
coherent behaviour giving rise to a quantum field resulting in the “condensa-
tion” of quasi-particles at macroscopic scale. As a very meaningful result of
such dynamics, to each coherent domain is associated an energy gap ΔE < 0
compared to the non-coherent ground state that makes it stable against envi-
ronmental decoherence (for not too high temperatures) [4]. Moreover, the phase
correlation between matter components and e.m. field occurring within a CD,
resulting from the “phase locking”, produces a long-range order at macroscopic
scale and a decrease of the entropy, which, in turn, allows for the “storing” of an
amount of information in the CD itself. Such information is even quantum in
nature since it arises from the coherent quantum dynamics of CDs. In particular,
the coherent oscillations taking place inside CDs, determine a rescaling of the fre-
quency of e.m. field, the latter becoming “trapped” inside them, for the photons
Matter fields ψ1 (x, t) and ψ2 (x, t) interact, within the CD volume (of size λ³),
with an e.m. field associated with a vector potential of amplitude A (x, t). The
long-term evolution of the system is then described by the equations
where 0 < γ < π/2 and τ = ω0 t is an adimensional time. The fields given by
Eqs. (8) also satisfy the “phase-locking” constraint [2–4]
$\frac{\partial \varphi}{\partial \tau} = \frac{\partial \theta_1}{\partial \tau} - \frac{\partial \theta_2}{\partial \tau}$ (9)
The new stable state of the system, namely the CGS, assumes, after a very
short transient time, non-vanishing amplitudes, described by Eqs. (8).
The phase-locking constraint given by Eq. (9), occurring in the coherent
state, defines a macroscopic quantum state (including a very high number N
of elementary components of the system), being an eigenstate of the quantum
phase operator with a well-defined eigenvalue Θ (x, t) [5,6] the latter defining
the oscillation of the whole CD as a single macroscopic quantum system.
The evolution from the non-coherent state to the CGS implies a reduction of
the CD energy per atom/molecule given by ΔE/N > 0, so that it represents
the “true” ground state of the system, much more stable than the non-coherent
“perturbative” ground state (PGS). For spherically symmetric CDs of radius
RCD, the radial profile of the energy gap (δE = −ΔE < 0) is
given by [4,12]
$\frac{\delta E}{N}(x) = \frac{\delta E}{N}(0)\, g(x)$ (10)
where x ≡ ω0 r/π, r being the radial distance from the CD’s center, δE/N(0) is
the value of the energy gap at the CD’s centre (r = 0) and g(x) ≡ P²(x) is a
function of x giving the radial behavior of the energy gap. A specific calculation [4]
shows that, for such a CD, the “radius” is given by
$r_{\mathrm{coh}} \equiv R_{CD} \simeq \frac{3\pi}{4\omega_0}$ (11)
showing that the energy gap extends far beyond the CD’s “borders”, due to the evanes-
cent tail of the coherent e.m. field generated by the coherent dynamics inside
the CD [4].
For this reason, when two or more CDs are close to each other, within one
coherent domain we observe a superposition between the “inner” e.m. field and
the evanescent e.m. field due to the neighbouring domains. This is energetically
advantageous for the system, for it becomes more stable compared to far, isolated
domains. We can then interpret this increase of the energy gap as the arising
of a “binding” energy between neighbouring CDs. This energy gain, or binding
energy, acquires its maximum value when the coherent domains are the most
closely packed, namely, in the case of spherically symmetric CDs, when the
interdomain distance d satisfies the condition
d = 2RCD (13)
For a couple of such closely packed CDs, the function g (x) assumes, within
a single domain, the form
$P(x) = \frac{\sin(\pi x)}{\pi x} + \frac{\sqrt{2}}{\pi(3-2x)}\,\exp\!\left[-\pi\left(\tfrac{3}{4}-x\right)\right]$ (14)
and the full profile of g (x) including both the two close domains is represented
in Fig. 1.
Fig. 1. Graph of the Function g (x) for a Couple of Close Interacting Domains: a)
Isolated Domains (Dotted Line); b) Close Domains (Continuous Line)
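A short numerical check of Eq. (14) as reconstructed above: at the CD border x = 3/4 the isolated-domain profile gives g = 8/(9π²), while the closely packed profile is four times larger, consistent with Eq. (21). The Python script below is only a sanity check of that reconstruction.

import math

def p_close(x):
    """P(x) for a CD with one closely packed neighbour, as in Eq. (14)."""
    return math.sin(math.pi * x) / (math.pi * x) + \
           math.sqrt(2) / (math.pi * (3 - 2 * x)) * math.exp(-math.pi * (3 / 4 - x))

def p_isolated(x):
    """P(x) for an isolated CD: only the sinc-like term survives."""
    return math.sin(math.pi * x) / (math.pi * x)

x = 3 / 4                                   # CD border, r = R_CD
g_iso, g_close = p_isolated(x) ** 2, p_close(x) ** 2
print(g_iso, 8 / (9 * math.pi ** 2))        # both ≈ 0.090
print(g_close / g_iso)                      # ≈ 4, the gain quoted in Eq. (21)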
would be able to desynchronize some of them, pushing them out of the CD.
In this case, we can quantify the number of components belonging to the coherent
(non-coherent) state by the average fraction of the total species in that state,
Fc (T) (Fnc (T)), such that, at a given temperature
with ΔS = S1 − S0 being the variation of entropy occurring during the pro-
cess and I the associated information expressed in bits. By following [7] we
obtain the expression of the information that can be stored in a coherent system
including N elementary oscillating components
$I = -\left(\frac{\delta E}{N}\right)_{\mathrm{coh}} \frac{F_{\mathrm{coh}}(T)\, N}{\ln 2\, k_B T}$ (19)
In particular, Eq. (19) tells us the quantity of quantum information storable
in a coherent domain is proportional to the energy gap/molecule, to the coherent
fraction and to the total number of elementary components of the system, while
it decreases with the temperature. Furthermore, according to Eq. (19), the total
amount of information storable in a given system, including a number nCD of
different coherent domains, is directly proportional to nCD so that we can write
I (nCD ) = ICD nCD (20)
where ICD indicates the information stored per single coherent domain as given
by Eq. (19). If we consider a network of closely packed coherent domains, whose
intercentrum distances are all equal to 2RCD , we must account for the energy
gain due to such configuration in order to calculate the overall information
storable in the network of coherent domains. We first note (considering Fig. 1
as well as Eq. (12) and Eq. (14)) that, in this configuration, at x = 3/4 the
energy gained by a couple of packed CDs is about four times larger than in the
two isolated domains. In fact, by using Eq. (14), we obtain
$\frac{\delta E_{12}}{N}(R_{CD}) \approx \frac{32/(9\pi^{2})}{8/(9\pi^{2})}\,\frac{\delta E_{1}}{N}(R_{CD}) = 4\,\frac{\delta E_{1}}{N}(R_{CD})$ (21)
where δE1 is the energy gap of an isolated domain and δE12 that of two close
domains. Ideally, as already suggested, we could think of the energy gap per
couple of domains as a “binding” or even a “potential” energy V12 associated
with the couple of coherent domains in this geometrical configuration. To a first
approximation we could further assume
$\frac{V_{12}}{N}(x) = \frac{\delta E_{12}}{N}(x) \approx \frac{\delta E_{12}}{N}(R_{CD}) = \frac{32}{9\pi^{2}}\,\frac{\delta E_{1}}{N}(0) \equiv \Delta V$ (22)
The overall energy gap associated with a network of closely packed CDs, each con-
taining N elementary components, in the above geometrical configuration can
then be calculated in the same way as the potential energy of a distribution of
point-wise electric charges interacting via a Coulomb-like potential, namely
$V_{tot} = \sum_{i=1}^{n_{CD}-1} \sum_{j>i}^{n_{CD}} V_{ij}$ (23)
that can be compared with the expression of Ṽtot in the case of nCD “isolated”
coherent domains, that is
$\tilde{V}_{tot} = \frac{8}{9\pi^{2}}\,\frac{\delta E_{1}}{N}(0)\, n_{CD}$ (25)
As regards the total amount of information I˜tot stored by a network of
interacting coherent domains, we obtain, by inserting Eq. (24) into Eq. (19):
We see from Eq. (20) and Eqs. (29)–(30) that the information stored in n isolated
coherent domains is O(n) while, in the case of n interacting domains, for large
n, it is O(n²). Figure 2 shows the plots of Itot (isolated domains) for bulk and
interfacial water, while Fig. 3 shows the plots of I˜tot as a function of nCD.
We note that the amount of information storable by water CDs is considerable, even
for a low number of coherent domains, both for bulk and interfacial water. For
example, for nCD = 100 we have Itot = 1.08 × 10^7 bits and I˜tot = 2.14 × 10^9 bits for
bulk water, and Itot = 2.70 × 10^7 bits and I˜tot = 5.35 × 10^9 bits
for interfacial water.
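The following sketch evaluates Eq. (19) and the linear scaling of Eq. (20); the physical inputs passed to the function are dummy illustrative values, and the per-domain information used for the scaling is simply back-computed from the bulk-water example quoted above (Itot ≈ 1.08 × 10^7 bits at nCD = 100), not an independent calculation.

import math

K_B = 1.380649e-23   # Boltzmann constant, J/K

def info_per_domain(delta_e_per_molecule_J, f_coh, n_molecules, temperature_K):
    """Quantum information storable in one CD, Eq. (19):
    I = -(delta E / N)_coh * F_coh(T) * N / (ln 2 * k_B * T)."""
    return -delta_e_per_molecule_J * f_coh * n_molecules / (math.log(2) * K_B * temperature_K)

# Purely hypothetical inputs, only to show how Eq. (19) is evaluated.
print(info_per_domain(-1.6e-20, 0.3, 5e6, 300))

# Linear scaling of Eq. (20) for isolated domains, using the per-domain value
# implied by the bulk-water example in the text.
I_CD_bulk = 1.08e7 / 100
for n_cd in (100, 1_000, 10_000):
    print(n_cd, I_CD_bulk * n_cd)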
The higher the value of nCD, the higher the ratio I˜tot/Itot. For example, at
nCD = 10^6 we have I˜tot/Itot ∼ 10^6. It is interesting to estimate the amount of
quantum information storable by the coherent domains enclosed in a given vol-
ume of water, assuming the geometrical configuration corresponding to their
closest packing satisfying Eq. (13). This can be done by considering spherically
symmetric domains characterized by a radius RCD ≃ 375 Å. This calculation is
shown in Fig. 4.
Fig. 2. Information Itot Stored by Water Domains: (a) Bulk Water; (b) Interfacial
Water as a Function of n.
Fig. 3. Information I˜tot Stored by Water Domains: (a) Bulk Water; (b) Interfacial
Water as a Function of n.
Fig. 4. Information I˜tot Stored by Water Domains as a Function of the Overall System
Volume.
References
1. Maggiore, M.: A modern Introduction to Quantum Field Theory, pp. 243–247.
Cambridge University Press, Cambridge (2004)
2. Del Giudice, E., Vitiello, G.: The role of the electromagnetic field in the formation
of domains in the process of symmetry breaking phase transitions. Phys. Rev. A
74, 022105 (2006)
3. Del Giudice, E., Vitiello, G.: Quantum fluctuations, gauge freedom and meso-
scopic/macroscopic stability. J. Phys.: Conf. Series 87, 012009 (2007)
4. Preparata, G.: QED Coherence in Matter. World Scientific, Singapore (1995)
5. Del Giudice, E., Tedeschi, A.: Water and autocatalysis in living matter. Electro-
magn. Biol. Med. 28, 46–52 (2009)
6. Caligiuri, L.M.: The quantum phase operator and its role in quantum comput-
ing. In: Caligiuri, L.M. (ed.) Frontiers in Quantum Computing, pp. 39–56. NOVA
Science Publisher, New York (2020)
7. Caligiuri, L.M.: QED coherence in matter, syntropy and the coherent domains as
storing “devices”. J. Phys.: Conf. Series 2197, 012004 (2022)
8. Caligiuri, L.M.: Quantum (hyper)computation by means of water coherent domains
- Part II: the computational level. In: Caligiuri, L.M. (ed.) Frontiers in Quantum
Computing, pp. 57–102. NOVA Science Publisher, New York (2020)
9. Caligiuri L.M.: Fast and accurate control of gates for quantum hypercomputation
in coherent domains of water. J. Phys. Conf. Series 2162 012025 (2022)
10. Caligiuri L.M.: Quantum (hyper)computation through universal quantum gates in
water coherent domains. J. Phys. Conf. Series 2162 012003 (2022)
11. Caligiuri, L.M.: QED coherence and super-coherence of water in brain microtubules
and quantum hypercomputation. In: Bandyopadhyay, A., Ray, K. (eds.) Rhythmic
Advantages in Big Data and Machine Learning. SRE, pp. 225–262. Springer, Sin-
gapore (2022). https://doi.org/10.1007/978-981-16-5723-8 9
12. Caligiuri, L.M.: Quantum (hyper)computation by means of water coherent domains
- Part I: the physical level. In: Caligiuri, L.M. (ed.) Frontiers in Quantum Com-
puting, pp. 1–37. NOVA Science Publisher, New York (2020)
13. Buzzacchi, M., Del Giudice, E., Preparata, G.: Coherence of the glassy state. Int.
J. Mod. Phys. B 16(25), 3771–3786 (2001)
14. Del Giudice, E., Tedeschi, A., Vitiello, G., Voeikov, V.: Coherent structures in
liquid water close to hydrophilic surfaces. J. Phys. Conf. Series 442, 012028 (2013)
15. Brillouin, L.: Science and Information Theory, pp.152-153. Dover Publication, New
York (1962)
16. Del Giudice, E., Spinetti, P.R., Tedeschi, A.: Water dynamics at the root of meta-
morphosis in living organisms. Water 2, 3771–3786 (2010)
17. Caligiuri, L.M.: Super-coherent quantum dynamics of zero-point field and super-
luminal interaction in matter. In: Amoroso, R.L., Kauffman, L.H., Rowlands, P.,
Albertini, G. (eds.) Unified Field Mechanics II: Formulation and Empirical Tests,
pp. 331–343. World Scientific, Singapore (2018)
Antecedents of Software-as-a-Service Adoption
for Small and Medium Enterprise in Developing
Countries
1 Introduction
The development of telecommunications and technology has long been considered one
of the most important growth drivers in various industries [1]. This makes it necessary for
companies to adopt and use new technologies to improve quality and increase benefits
in all industries [3]. Cloud computing has made significant progress in the last decade,
with usage reaching up to 94 per cent (Shang & Kauffman, 2020). As a result, cloud
computing has become one of the most advanced IT solutions for public and private
companies around the world. According to a recent study, the cloud computing market
will grow by more than 17% in 2019, reaching $200 billion in 2019 and $278 billion by
2022 [11].
The following is a summary of the paper’s structure: the first section gives an overview
of SaaS cloud computing, including the research question and objectives. The second
section digs into the literature review, which covers CC service delivery models, the
research methodology and SaaS adoption challenges. Section three contains the proposed
framework and the hypotheses, while the last section presents the main conclusions and
the future work.
Problem Background
Low performance and lack of economic diversity result from the absence of a tech-
nological environment for entrepreneurship, which necessitates the use of cloud computing
SaaS in small and medium enterprises (SMEs). The aim is to enable enterprises with limited
budgets [12] and human resources to benefit from SaaS cloud computing and reduce the
cost of IT infrastructure and expertise [13, 14], improving their economic viability.
Despite the benefits of SaaS, there are a number of barriers to adoption. Different
authors have pointed out different difficulties in adopting SaaS in different situations.
According to the findings of [16], these are customization, security and privacy [17],
virtualization and multi-tenancy [9], lack of familiarity with the definition of cloud
computing services and insufficient knowledge [10], loss of control, regional regulations
[10], governance [21], lack of a well-managed and established standard, and management
resistance [22].
Despite the many benefits cloud computing can offer, companies are reluctant to
adopt it [26]. In a modern environment, the pace of technological innovation and new
thinking is a hot topic. Although several studies have been conducted to investigate the
barriers to adoption of CC, they are insufficient [27], and the question remains: How can
a technology like cloud computing (CC) help SMEs overcome the previous problems?
In the context of developing countries, Qatar offers an excellent economic climate
[28], but the majority of SMEs in Qatar still struggle with the use of cloud computing
SaaS, and the adoption of cloud computing is still in its infancy, with a rate of only 3%.
Research Questions
This research will address the antecedents that influence intention to adopt cloud com-
puting SaaS for small and medium-sized enterprise in Qatar. The aim of this research is
to find answers to the questions listed below.
“How can a model be developed to enable the adoption of SaaS in SMEs?” The
following sub-questions support the primary research question:
Research Objectives
The research objectives can be formulated as follows:
This research will address the antecedents that influence the intention to adopt cloud
computing Software as a service for SMEs in Qatar.
2 Literature Review
Research Methodology
Howell defines research methodology as the general research strategy that outlines the
way in which research is to be conducted and specifies, among other things, the methods
to be used in the process [2].
In order to answer the research question, survey research needs to collect data from
various sources. For this research, the existing literature was also consulted and reviewed.
Therefore, the research design used can be called survey research.
CC adopter organizations in the UK using the TOE model. The study’s findings revealed
that, while risks are frequently connected with SMEs’ adoption of innovative technolo-
gies, organizational innovativeness and the associated capacities play a critical role in
cloud computing adoption.
A study conducted in universities in Malaysia determined the differences based on
SaaS adoption [31]. The research emphasized the importance of innovation for SaaS
adoption, as well as effort expectancy, social influence, performance expectancy, self-
efficacy, and peer and even superior components. As stated by Awodele et al., issues in cloud
computing services can concern (a) network and data security, (b) governance,
compliance and legal matters, and (c) communication interfaces and virtualization security
[18]. Haider and Selvan addressed the inability to maintain data confidentiality,
because of the huge number of access devices and applications used to store and manage data
in cloud-based storage, as the most prominent issue in cloud computing [19].
Security and privacy challenges associated with SaaS can be addressed as follows [15].
Following the findings of (Nema, S. 2016), security and privacy issues in cloud computing
concern loss of control, lack of transparency, virtualization, and multi-tenancy [9].
Kumar et al. noted that loss of control in cloud computing can take the form of data loss
and breaches, and of data storage and sharing under several multiple regional regulations
[10]. Research on the use of cloud computing suggests that nearly 63% of customers refuse
to use the services of cloud providers if the vendor fails to prevent data loss through
unauthorized access.
having knowledge of security features in order to breach the security. Moreover, sharing
cloud services among a number of users reduces the control of organizations over IT/IS.
While small and medium enterprises find it quite comfortable to use the services of
a common shared pool, larger organizations do not find it convenient. This is because
a shared information pool allows users to access information stored by other users, so the
costs of information collection, maintenance and security decrease significantly, which in
turn helps reduce the costs of SMEs.
Quality of service and security provisions affect the decision of government organizations
to use cloud services [24]. This is because some government organizations, such as
national financial institutions and those handling defense-related data, are highly
confidential and hence cannot take the chance of any kind of unauthorized access. The use
of any kind of cloud service by financial and defense organizations could make them
susceptible to cyber attacks and could push the entire country into a financial crisis or a
breach of national security.
Tehrani and Shirazi [25] identified factors that influence the adoption decision.
These factors are: external support, pressure of competition in the market, knowledge
of decision-makers and employees about the efficiencies of cloud computing, information
intensity, potential advantages, privacy and security, innovativeness, complexity, triala-
bility, and compatibility with business requirements, the technologies used, and company
norms.
To summarize, the majority of past research focused on businesses as a whole, with
only a few studies focusing specifically on small and medium-sized businesses. Previous
research has primarily focused on the adoption of CC, with only a few studies focusing
specifically on SaaS. While these studies may not be able to pinpoint exactly the benefits
that small and medium-sized businesses can achieve by adopting SaaS, very few studies
are able to address the reasons why their issues with cloud computing SaaS persist.
Only a small amount of research has been done in Qatar to investigate the antecedents
of SMEs adopting SaaS, and only 3% of Qatari SMEs have done so. Very few studies
also look at human-level adoption for small and medium-sized businesses in developing
or industrialized countries. In conclusion, more research is needed to determine the various
antecedents that may influence SaaS adoption among SMEs in developing nations.
4 Conclusion
SaaS cloud computing is a technology with which computation, across all services and
especially on request, can be carried out at extremely low cost. To encourage more companies
to adopt SaaS cloud computing, service providers need to address all the challenges
affecting adoption for their clients. In this paper, we have addressed and demonstrated
the numerous obstacles and challenges related to SaaS cloud computing.
Addressing these challenges requires future research from different areas such as
informatics, statistics, risk modeling, social sciences, and physiological factors. Future
work should also investigate the relationship between relative advantage, complexity,
security and privacy, compatibility, competitive pressure, regulatory support, awareness,
top management support, culture, personal innovativeness, prior technology experience,
and SaaS adoption. As future work, research needs to be conducted to test the developed
model and the hypotheses by collecting data using a quantitative technique via surveys of
SMEs in Qatar as a developing country, then analyzing the data using the SmartPLS tool
to sort out the significant factors that may influence SaaS adoption in developing countries.
References
1. Gangwar, H., Date, H., Ramaswamy, R.: Understanding determinants of cloud computing
adoption using an integrated TAM-TOE model. J. Enterp. Inf. Manag. 28(1), 107–130 (2015).
https://doi.org/10.1108/JEIM-08-2013-0065
2. Howell, K.E.: Introduction to the Philosophy of Methodology. Sage Publications, London
(2013)
3. Arvanitis, S., Kyriakou, N., Loukis, E.N.: Why do firms adopt cloud computing? A compara-
tive analysis based on South and North Europe firm data. Telematics Inform. 34(7), 1322–1332
(2017). https://doi.org/10.1016/j.tele.2016.05.013
4. Mell, P.M., Grance, T.: The NIST definition of cloud computing (2011). https://doi.org/10.
6028/nist.sp.800-145
5. Hashizume, K., Rosado, D., Fernández-Medina, E., Fernandez, E.: An analysis of security
issues for cloud computing. J. Internet Serv. Appl. 4(1), 5 (2013)
6. IaaS, PaaS and SaaS – IBM Cloud service models. https://www.ibm.com/cloud/learn/iaas-
paas-saas. Accessed 24 July 2019
7. Cloud computing service and deployment models: layers and management. Choice Rev.
Online 50(07) (2013). https://doi.org/10.5860/CHOICE.50-3896
8. Alajmi, Q., Sadiq, A.S., Kamaludin, A., Al-Sharafi, M.A.: Cloud computing delivery and
delivery models: opportunity and challenges (2018). https://doi.org/10.1166/asl.2018.11537
9. Nema, S.: A survey of security and privacy challenges in cloud computing. Int. J. Adv. Res.
Comput. Commun. Eng. 5(3), 191–194 (2016)
10. Kumar, D., Samalia, H.V., Verma, P.: Exploring suitability of cloud computing for small and
medium-sized enterprises in India. J. Small Bus. Enterp. Dev. 24(4), 814–832 (2017). https://
doi.org/10.1108/jsbed-01-2017-0002
11. Gartner: Assessing the Security Risks of Cloud Computing (2008).
http://www.gartner.com/DisplayDocument?id=685308. Accessed July 2020
12. Mwaniki, P., Ondiek, C.: Evaluation of the effects of SaaS on SMEs in Nairobi County, Kenya.
J. Inf. Syst. Eng. Manag. 3(3), 20 (2018)
13. Fakieh, B., Blount, Y., Busch, P.: SMEs and cloud computing: the benefits to the national
economy and global competitiveness. Paper presented at the Conference: The 13th European
Mediterranean & Middle Eastern Conference on Information Systems. EMCIS (2016)
14. Trinh, T.P., Pham, C.H., Tran, D.: An adoption model of Software as a Service (SaaS) in
SMEs. Paper presented at the PACIS (2015)
15. Sakr, S., Zomaya, A.: Encyclopedia of Big Data Technologies, 1st edn. Springer, Cham (2019).
https://doi.org/10.1007/978-3-319-77525-8. eReference ISBN 978-3-319-77525-8
16. Aleem, S., Ahmed, F., Batool, R., Khattak, A.: Empirical investigation of key factors for SaaS
architecture dimension. IEEE Trans. Cloud Comput. 9, 1037–1049 (2019)
17. Sun, Y., Zhang, J., Xiong, Y., Zhu, G.: Data security and privacy in cloud computing. Int. J.
Distrib. Sens. Netw. 10(7), 190903 (2014)
18. Awodele, O., Ominike Akpovi, A., Adebayo, A.O., Tayo, O.O.: Security and privacy issues
in cloud computing (2017). ISSN: 2394-4714
19. Haider, Y., Selvan, S.: Confidentiality issues in cloud computing and countermeasures: a
survey (2016)
20. Branco, T., Jr., de Sá-Soares, F., Rivero, A.L.: Key issues for the successful adoption of cloud
computing. Procedia Comput. Sci. 121, 115–122 (2017)
21. Awodele, O., Adebayo, A.O., Tayo, O.O.: Security and privacy issues in cloud computing.
Commun. Appl. Electron. 7(3), 14–17 (2017)
22. Ahmed, A.M., Moreton, R., Mehdi, Q.H., Elmaghraby, A.: E-government services challenges
and opportunities for developing countries: the case of Libya. Paper presented at the 2013
Second International Conference on Informatics and Applications (ICIA) (2013)
23. Hsu, C.-L., Lin, J.-C.: Factors affecting the adoption of cloud services in enterprises. IseB
14(4), 791–822 (2015). https://doi.org/10.1007/s10257-015-0300-9
24. Alsanea, M., Barth, J., Griffith, R.: Factors affecting the adoption of cloud computing in the
government sector: a case study of Saudi Arabia. Int. J. Cloud Comput. Serv. Sci., 36 (2014)
25. Tehrani, S.R., Shirazi, F.: Factors influencing the adoption of cloud computing by small
and medium size enterprises (SMEs). In: Yamamoto, S. (ed.) HCI 2014. LNCS, vol. 8522,
pp. 631–642. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07863-2_60
26. Fishman, C.: The insourcing boom. The Atlantic, pp. 44–52, December 2012
27. Gide, E., Sandu, R.: A study to explore the key factors impacting on cloud based service
adoption in Indian SMEs. In: 12th International Conference on e-Business Engineering,
pp. 387–392. IEEE (2015)
28. Arabian Business: Qatar SMEs face barriers to 2022 projects (2013). https://www.arabianbu
siness.com/qatar-smes-face-barriers-2022-projects-485087.html
29. El-Haddadeh, R.: Digital innovation dynamics influence on organisational adoption: the case
of cloud computing services. Inf. Syst. Front. 22(4), 985–999 (2019). https://doi.org/10.1007/
s10796-019-09912-2
30. Senarathna, I., Wilkin, C., Warren, M., Yeoh, W., Salzman, S.: Factors that influence adoption
of cloud computing: an empirical study of Australian SMEs. Australas. J. Inf. Syst. 22 (2018).
https://doi.org/10.3127/ajis.v22i0.1603
31. Yadegaridehkordi, E., Nilashi, M., Shuib, L., Samad, S.: A behavioral intention model for
SaaS-based collaboration services in higher education. Educ. Inf. Technol. 25(2), 791–816
(2019). https://doi.org/10.1007/s10639-019-09993-1
32. The Report: Qatar 2008 - Page 178 - Google Books Result. https://books.google.com.
qa/books?id=stSsFxDTl4QC&pg=PA178&lpg=PA178&dq=problem+of+high+cost+of+
skilled+labour+%26+spaces+in+qatar&source=bl&ots=5y1PtZWyjy&sig=ACfU3U342dib
1LmBUbGBh0Szt8oGi0AA1g&hl=en&sa=X&ved=2ahUKEwibh76v7ZroAhXo63MBHX
JHCXMQ6AEwAHoECAoQAQ#v=onepage&q=problem%20of%20high%20cost%20of%
20skilled%20labour%20%26%20spaces%20in%20qatar&f=false
33. Singh, A., Sharma, S., Kumar, S.R., Yadav, S.A.: Overview of PaaS and SaaS and its applica-
tion in cloud computing. Paper presented at the 2016 International Conference on Innovation
and Challenges in Cyber Security (ICICCS-INBUSH) (2016)
34. Hsu, C.-L., Lin, J.C.-C.: Exploring factors affecting the adoption of internet of things services.
J. Comput. Inf. Syst. 58(1), 49–57 (2016). https://doi.org/10.1080/08874417.2016.1186524
35. Rogers, E.M.: Diffusion of Innovations, 5th edn. Free Press, New York (2003)
Software as a Service Challenges: A Systematic
Literature Review
1 Introduction
Migration to SaaS faces various obstacles that go beyond the technology itself. A mass of
antecedents influences the adoption of cloud computing SaaS. These antecedents must
be systematically evaluated prior to making the decision to adopt SaaS solutions. The
goal of this research is to find and assess previous studies on SaaS cloud computing
adoption challenges, as well as to pinpoint any gaps that may exist.
The following is a breakdown of the structure of this paper: Sect. 1 provides an overview of
cloud computing, including a definition of CC, challenges to cloud expansion, the different
service delivery models, and the benefits of SaaS. Section 2 delves into the research
methodology, why an SLR is relevant to this study, and the scope of the study. Section 3
summarizes the results of the search and the main conclusions, while Sects. 4 and 5 contain
the main discussion, conclusion, and future work, with explanations of the limitations and
factors that may influence this research.
Problem Background
Cloud computing is the delivery of services over the Internet at low cost, paying only for
what is consumed [1]. Cloud computing, as defined by the National Institute of Standards
and Technology (NIST), is a model for delivering and implementing a shared pool of
configurable computing resources (e.g. networks, servers, storage, applications, and
services) with relatively minimal effort or interaction between organizations and consumers
[2]. The name "cloud" comes from the cloud symbol historically used in flowcharts and
diagrams to represent the Internet.
According to [3] and [4], there are three ‘levels’ of cloud computing delivery mod-
els from which consumers can pick. Infrastructure-as-a-Service (IaaS), Platform-as-
a-Service (PaaS), and Software-as-a-Service (SaaS) are the three services. Figure 1
depicts the differences among those levels [5]. IaaS consists of infrastructure-centric IT
resources that allow users to have complete control over their configuration and use. An
IaaS environment typically consists of computer hardware, operating systems, networks,
connectivity, and other raw IT resources.
PaaS refers to a ready-to-use data modeling framework. This service level consists of IT
resources that have been configured and deployed but do not include the underlying
infrastructure [6]. SaaS provides a comprehensive solution that is run and managed by the
service provider [7]. It is a software licensing model [8] in which customers access
applications built by others through a web browser [9]. SaaS is a consumable cloud-based
service paid for by a set of cloud users. Examples include Office 365, Jira, Google Drive,
Oracle CRM, MS Azure, and HubSpot.
Despite the growing acceptance of cloud computing, entrepreneurs and scholars have been
vocal about the antecedents and obstacles that this new paradigm has presented. Some of
the difficulties are critical in nature, such as privacy and data confidentiality. Other
difficulties, like inadequate performance, vendor lock-in, and limited bandwidth, are a
logical outgrowth of this innovation [15]. Those antecedents must be addressed properly to
increase the level of SaaS adoption. We conducted a systematic literature study of potential
SaaS issues and used this information to classify these obstacles into a categorization that
can be used as a framework to encourage an international discussion on approaches and
tools. The goal of our study is to learn more about the kinds of antecedents and challenges
that have recently emerged.
2 Methodology
As narrated by Armstrong et al. [16], systematic reviews are a form of literature research
that uses systematic methods for gathering data and evaluating previous studies.
Transparency is considered one of the important principles of systematic literature reviews
[17]. An SLR is meant to provide a detailed and comprehensive answer to the main research
question from previous studies by establishing formalized and well-defined questions, then
searching the related papers [18], evaluating all relevant research, and finally synthesizing
the findings. This corresponds to the three main phases shown in Fig. 3: plan, conduct, and
report [19].
Following the summary and guidelines in [20–22], the detailed steps used to perform the
SLR in this research are as follows:
Data was collected from Scopus, Thomson Reuters, Elsevier, Science Direct, Springer,
IEEE, Library Genesis, Google Scholar, and WorldCat. In addition, ranked journals such as
Telematics and Informatics, Computer Standards & Interfaces, Journal of Enterprise
Information Management, Information Systems Frontiers, Information Development,
Information Systems, International Journal of Information Systems and Project
Management, and Australasian Journal of Information Systems were searched for papers
published in 2015 or later.
• Inclusion criteria, used to extract the most relevant papers.
• Exclusion criteria, used to exclude non-related papers.
3. Quality Assurance: for a sound and valuable analysis, quality assurance was applied
using the following criteria:
• Studies chosen from reputable and esteemed libraries and repositories only.
• Articles from well-known journals only.
• Ranked journals only.
5. Synthesize Data and Document Outcomes: the outcome of reviewing the 17 selected
papers is presented in Table 1, which lists the selected papers and the measured factors.
Security: Although most SMEs using SaaS services enjoy the services of a common cloud
space, security is considered one of the most significant challenges to adopting cloud
computing [39, 40]. The logical isolation between virtual machines that underpins SaaS
cloud computing affects vulnerability to data piracy and data security. Security and privacy
challenges associated with cloud computing can be addressed as follows [41]. Security
breaches in SMEs have occurred through unauthorized access to the data network and
faulty authentication codes. Research on the use of cloud computing indicates that nearly
63% of customers would decline to use a cloud service provider if the vendor fails to
prevent loss of data through unauthorized access.
Privacy: Data privacy in SaaS cloud computing [42] refers to preventing potential
adversaries within cloud services from observing users' access to sensitive data. This is
done by assessing user behavior against the user's access patterns. Researchers have also
focused on a technology for maintaining data privacy in cloud computing known as
ORAM, or oblivious RAM [43].
Governance: Any business that deals with a considerable amount of data needs to make
sure that all assets are fully controlled and properly managed. Without an appropriate data
governance procedure, no SME will successfully manage its information, and the privacy
of that information will be compromised [46]. In cloud computing, confidential information
about the location and security features of the data center may also be exposed. Haider and
Selvan [15] identify the inability to maintain data confidentiality, caused by the huge
number of access devices and applications that store and manage data in cloud-based
storage, as the most prominent issue in cloud computing. In addition, it is well known that
users of cloud computing store data in, and extract data from, a shared pool of computing
resources.
Financial Resistance: This is one of the critical factors behind managers' decisions not to
adopt cloud computing in their workplaces. Many SMEs cannot spend a large amount of
money on implementing cloud-based services. A lack of technical knowledge, together with
the absence of investors to fund cloud-based technology, gives SMEs minimal bargaining
power. Cloud-based technology is also relatively new to the market, so it requires adequate
knowledge and technical skills to implement, particularly for SMEs [48].
Performance: There are several performance issues in cloud computing services. One is that
applications unsuited to the cloud cannot simply be moved there, so it is very important to
identify the applications most suitable for cloud computing. Another important point is
knowing on which physical server an application is running; without this knowledge, no
SME can find the root causes of performance problems. It is also necessary for an SME or
enterprise to know how much CPU a particular application consumes and to ensure that
services are allocated according to the priorities of the business.
Service Level Agreement (SLA): Because SMEs rely on the cloud service provider to
handle and process their data, an accepted agreement, including an SLA, should be signed
[54]. Many users and companies resist adopting cloud computing solutions out of concern
for the privacy of their confidential information and the quality of the provided service; in
this context, a service level agreement can be used. According to Alkhater [37], service
quality in the SLA plays a crucial role in increasing cloud adoption. It increases trust in the
products provided to clients through a transparent form of guarantee given by the service
providers to their subscribers.
Confirmation: Confirmations are assurances provided by the service providers to the clients
about their services, a type of guarantee of how good the service will be. Usually, clients
receive confirmation of several security aspects, such as data security and the guarantee that
no important customer information stored in the cloud will be lost.
Quality of Service: This is another confirmation that clients receive from their service
provider. If clients do not receive any trustworthy confirmation from a service provider,
they will not take its service. Besides this, functional and non-functional testing are two
important areas that affect the adoption of cloud computing services.
App Rating and Free Alternatives to Paid Apps: Nowadays, free apps are becoming more
popular than paid apps, but this has drawbacks. Free apps are easy to obtain and access, yet
their quality is often not up to the mark. Paid apps usually have better quality, and it can
easily be argued that they provide better features and facilities than free apps. Beyond
quality, paid apps tend to be more secure, as developers put more effort into developing
them than into free apps. In the adoption of cloud computing, security and confidentiality
are among the most important considerations; therefore, if an app is free but provides poor
security, poor quality, and fewer features, clients will ignore it.
Besides these, cost, service provider reliability, and vendor lock-in are other critical
challenges for the adoption of cloud computing [53]. Organizations are also influenced by
some prominent factors to adopt cloud computing technology, the most prominent being
the ability to reduce costs and the related benefits. Apart from these, further factors
identified by Tehrani and Shirazi [55] influence the decision: external support, competitive
pressure in the market, decision-makers' and employees' knowledge of the efficiencies of
cloud computing, information intensity, potential advantages, privacy and security,
innovativeness, complexity, trialability, and compatibility with business requirements,
existing technologies, and company norms.
Awodele et al. [39] stated that issues in cloud computing services can concern (a) network
and data security, (b) governance, compliance, and legal matters, and (c) communication
interfaces and virtualization security. As discussed by [40], various factors affect the
decision to adopt cloud services.
4 Discussion
As discussed, there are many significant challenges to adopting SaaS cloud computing.
Security is considered the most significant factor affecting SaaS adoption [28, 30, 32, 37,
56, 58, 59]. While Matias and Hernandez [51] consider regulatory support the most
significant factor affecting the adoption decision, other research [37] argues that quality of
service, security, privacy, and trust have a significant impact on it.
According to some studies [27, 28, 30, 32, 34, 54, 56–58, 60], benefits such as cost savings,
or positive relative advantage, have a significant impact on the adoption decision. Other
research adds that CC awareness and innovativeness skills have a significant direct impact
on the adoption decision [26, 27, 38, 59, 60, 61]. Further studies [37, 38] add that quality of
service has a significant positive relationship with CC adoption.
However, the key conclusion is that no matter how many antecedents SMEs face in
adopting SaaS, it is always beneficial for them to use SaaS cloud computing technology [2].
One significant finding is that no comprehensive study has been conducted on the
antecedents that affect cloud computing adoption; none of those studies considered both
benefits and sacrifices from a technological and behavioral perspective. Another significant
finding is that none of them studied human factors (prior technology experience, personal
innovation) or the attractiveness of alternatives. Moreover, most of the previous adoption
studies use TOE [26, 28, 34, 37, 51, 58, 61] or integrate TOE with TAM or DOI [13, 27, 30,
32, 38, 56, 57, 59, 60].
Questions, forming the query strings, quality assurance, and the limitations that could affect
this study. It also outlined the inclusion and exclusion criteria, after which the literature was
extracted based on the predefined criteria, and finally the data was synthesized and the
outcomes were documented. As a result, researchers must, as future work, address these
issues by developing a comprehensive framework that tackles the highlighted SaaS
adoption obstacles in order to grow the market share of SaaS cloud computing and make it
work in practice.
References
1. Ratten, V.: Cloud computing technology innovation advances: a set of research propositions.
Int. J. Cloud Appl. Comput. 5(1), 69–76 (2015). https://doi.org/10.4018/ijcac.2015010106
2. Mell, P., Grance, T.: The NIST definition of cloud computing. National Institute of Standards
and Technology, Gaithersburg, USA, September 2011
3. QSS: Cloud Computing – Delivery and Deployment Models (2019). https://www.qsstechno
soft.com/cloud-computing-delivery-and-deployment-models/
4. Pillai, S.: Cloud computing delivery models explained (2014). https://www.ibm.com/blogs/
cloud-computing/2014/03/17/cloud-computing-delivery-models-explained/
5. www.binaryinformatics.com (2019). Cloud Service Model – Understand the Types, Charac-
teristics, & Advantages. http://blog.binaryinformatics.com/technology/what-is-the-cloud-ser
vice-model/. Accessed 03 Apr 2019
6. Jing, X., Jian-Jun, Z.: A brief survey on the security model of cloud computing. Distrib.
Comput. Appl. 34(19), 475–478 (2010)
7. Amazon AWS: What is cloud computing? (2019). https://aws.amazon.com/what-is-cloud-
computing/
8. Mitchell Grant: Software-as-a-Service (SaaS) (2020). https://www.investopedia.com/terms/
s/software-as-a-service-saas.asp
9. Mell, P., Grance, T.: The NIST definition of cloud computing version 15. National Institute of
Standards and Technology (NIST). Information Technology, Laboratory (2009). www.csrc.
nist.gov
10. Avram, M.G.: Advantages and challenges of adopting cloud computing from an enterprise
perspective. Procedia Technol. 12, 529–534 (2014)
11. Sasikala, P.: Cloud computing in higher education. Int. J. Cloud Appl. Comput. 1(2), 1–13
(2011)
12. CtrIs 2015: Top 5 Benefits of Cloud Adoption (2016). http://www.ctrls.in/blog/benefits-of-
cloud-adoption/. Accessed 14 Mar 2020
13. Kumar, D., Samalia, H.V., Verma, P.: Exploring suitability of cloud computing for small and
medium-sized enterprises in India. J. Small Bus. Enterp. Dev. 24(4), 814–832 (2017). https://
doi.org/10.1108/JSBED-01-2017-0002
14. Rath, A., Kumar, S.: Decision points for adoption cloud computing in small, medium
enterprises (SMEs). Internet Technol. Commun. 34(4), 688–691 (2012)
15. Haider, Y., Selvan, S.: Confidentiality issues in cloud computing and countermeasures: a
survey (2016)
16. Armstrong, R., Hall, B.J., Doyle, J., Waters, E.: Cochrane update. ‘Scoping the scope’ of a
cochrane review. J. Public Health 33(1), 147–150 (2011). https://doi.org/10.1093/pubmed/
fdr015.PMID21345890
17. Pittway, L.: Systematic literature reviews. In: Thorpe, R., Holt, R. (eds.) The SAGE Dictionary
of Qualitative Management Research. SAGE Publications Ltd. (2008). https://doi.org/10.
4135/9780857020109
18. Eden, J., Levit, L., Berg, A., Morton, S., et al.: Institute of medicine (US) committee on
standards for systematic reviews of comparative effectiveness research. In: Finding What
Works in Health Care: Standards for Systematic Reviews (2011). https://doi.org/10.17226/
13059. ISBN 978-0-309-16425-2. PMID 24983062
19. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in
software engineering. Keele University and Durham University joint report (2007)
20. Anwer, F., Aftab, S.: Latest customizations of XP: a systematic literature review. Int. J.
Mod. Educ. Comput. Sci. (IJMECS) 9(12), 26–37 (2017). https://doi.org/10.5815/ijmecs.
2017.12.04
21. Brereton, P., Kitchenham, B.A., Budgen, D., Turner, M., Khalil, M.: Lessons from applying
the systematic literature review process within the software engineering domain. J. Syst.
Softw. 80(4), 571–583 (2007)
22. Kitchenham, B.A., et al.: Preliminary guidelines for empirical research in software engineer-
ing. IEEE Trans. Softw. Eng. 28(8), 721–734 (2002)
23. Palos-sanchez, P.R., Arenas-marquez, F.J., Aguayo-camacho, M.: Cloud computing (SaaS)
adoption as a strategic technology: results of an empirical study (2017)
24. Oliveira, T., Martins, R., Sarker, S., Thomas, M., Popovič, A.: Understanding SaaS adoption:
the moderating impact of the environment context. Int. J. Inf. Manag. 49, 1–12 (2019)
25. Salum, K., Rozan, A., Zaidi, M.: Exploring the challenge impacted SMEs to adopt cloud ERP.
Indian J. Sci. Technol. 9, 1–8 (2016). https://doi.org/10.17485/ijst/2016/v9i45/100452
26. El-Haddadeh, R.: Digital innovation dynamics influence on organisational adoption: the case
of cloud computing services. Inf. Syst. Front. 22(4), 985–999 (2019). https://doi.org/10.1007/
s10796-019-09912-2
27. Hemlata, G., Date, H., Ramaswamy, R.: Understanding determinants of cloud computing
adoption using an integrated TAM-TOE model (2017)
28. Senyo, P.K., Effah, J., Addae, E.: Preliminary insight into cloud computing adoption in a
developing country (2016). https://doi.org/10.1108/JEIM-09-2014-0094
29. Yang, Z., Sun, J., Zhang, Y., Wang, Y.: Understanding SaaS adoption from the perspective of
organizational users: a tripod readiness model. Comput. Hum. Behav. 45, 254–264 (2015)
30. Stieninger, M., et al.: Factors influencing the organizational adoption of cloud computing: a
survey among cloud workers. Int. J. Inf. Syst. Proj. Manag. 6(1), 5–23 (2018). https://doi.org/
10.12821/ijispm060101
31. Yadegaridehkordi, E., Nilashi, M., Shuib, L., Samad, S.: A behavioral intention model for
SaaS-based collaboration services in higher education. Educ. Inf. Technol. 25(2), 791–816
(2019). https://doi.org/10.1007/s10639-019-09993-1
32. Safari, F., Safari, N., Hasanzadeh, A.: The adoption of software-as-a-service (SaaS): ranking
the determinants (2017)
33. Ming, C.F., et al.: The determinant factors affecting cloud computing adoption by small and
medium enterprises (SMEs) in Sabah, Malaysia. J. Telecommun. Electron. Comput. Eng.
(JTEC) 10(3), 83–88 (2018)
34. Gutierrez, A., Boukrami, E., Lumsden, R.: Technological, organisational and environmental
factors influencing managers’ decision to adopt cloud computing in the UK. J. Enterp. Inf.
Manag. 28(6), 788–807 (2015). https://doi.org/10.1108/JEIM-01-2015-0001
35. Hasheela, V.T., Mufeti, T.K.: An investigation of factors leading to the reluctance of
SaaS ERP adoption in Namibian SMEs. Afr. J. Inf. Syst. 8(4), 1 (2016)
36. Lian, J.W., Yen, D.C., Wang, Y.T.: An exploratory study to understand the critical factors
affecting the decision to adopt cloud computing in Taiwan hospital. Int. J. Inf. Manag. 34(1),
28–36 (2015)
37. Alkhater, N., et al.: An empirical study of factors influencing cloud adoption
among private sector organisations. Telemat. Inform. (2017). https://doi.org/10.1016/j.tele.
2017.09.017
38. Senarathna, I., Wilkin, C., Warren, M., Yeoh, W., Salzman, S.: Factors that influence adoption
of cloud computing: an empirical study of Australian SMEs. Australas. J. Inf. Syst. 22 (2018)
39. Awodele, O., Adebayo, A.O., Tayo, O.O.: Security and privacy issues in cloud computing.
Commun. Appl. Electron. 7(3), 14–17 (2017)
40. Hsu, C.-L., Lin, J.C.-C.: Exploring factors affecting the adoption of Internet of Things services.
J. Comput. Inf. Syst. 58(1), 49–57 (2016). https://doi.org/10.1080/08874417.2016.1186524
41. Sakr, S., Zomaya, A.: Encyclopedia of Big Data Technologies, 1st edn. Springer, Switzerland
(2019). https://doi.org/10.1007/978-3-319-77525-8. eReference ISBN 978-3-319-77525-8
42. Sun, Y., Zhang, J., Xiong, Y., Zhu, G.: Data security and privacy in cloud computing. Int. J.
Distrib. Sens. Netw. 10(7), 190903 (2014)
43. Gholami, A., Laure, E.: Security and privacy of sensitive data in cloud computing: a survey
of recent developments. arXiv preprint arXiv:1601.01498 (2016)
44. Makkaoui, K.E., Ezzati, A., Beni-Hssane, A., Motamed, C.: Data confidentiality in the world
of cloud. J. Theor. Appl. Inf. Technol. 84(3) (2016)
45. Aloraini, A., Hammoudeh, M.: A survey on data confidentiality and privacy in cloud com-
puting. In: Proceedings of the International Conference on Future Networks and Distributed
Systems - ICFNDS 2017 (2017). https://doi.org/10.1145/3102304.3102314
46. Vasiljeva, T., Kreslins, K., Novik, D.: Challenge of cloud computing for SMEs: a case of
baltic countries. J. Innov. Manag. Small Medium Enterp. 1–10 (2018). https://doi.org/10.
5171/2018.238581
47. Arvanitis, S., Kyriakou, N., Loukis, E.N.: Why do firms adopt cloud computing? A compara-
tive analysisbased on South and North Europe firm data. Telemat. Inform. 34(7), 1322–1332
(2017). https://doi.org/10.1016/j.tele.2016.05.013
48. Narwal, R., Sangwan, S.: Benefits, dimensions and issues of software as a service (SAAS).
Int. J. New Innov. Eng. Technol. (IJNIET), 36–40 (2013)
49. Caldeira, M.M., Ward, J.M.: Using resource-based theory to interpret the successful adoption
and use of information systems and technology in manufacturing small and medium-sized
enterprises. Eur. J. Inf. Syst. 12(2), 127–141 (2003)
50. Gashami, J.P.G., Chang, Y., Rho, J.J., Park, M.-C.: Privacy concerns and benefits in SaaS
adoption by individual users: a trade-off approach. Inf. Dev. 32(4), 837–852 (2016). https://
doi.org/10.1177/0266666915571428
51. Matias, J.B., Hernandez, A.A.: Cloud computing adoption intention by MSMEs in the
Philippines. Glob. Bus. Rev. (2019). https://doi.org/10.1177/0972150918818262
52. Singh, A., Sharma, S., Kumar, S.R., Yadav, S.A.: Overview of PaaS and SaaS and its applica-
tion in cloud computing. Paper presented at the 2016 International Conference on Innovation
and Challenges in Cyber Security (ICICCS-INBUSH) (2016)
53. Opara-Martins, J., Sahandi, R., Tian, F.: A holistic decision framework to avoid vendor lock-in
for cloud SaaS migration (2017). https://doi.org/10.5539/cis.v10n3p29
54. Chou, D.C.: Cloud computing: a value creation model. Comput. Stand. Interfaces 38, 72–77
(2015). https://doi.org/10.1016/j.csi.2014.10.001
55. Tehrani, S.R., Shirazi, F.: Factors influencing the adoption of cloud computing by small and
medium size enterprises (SMEs). Paper presented at the International Conference on Human
Interface and the Management of Information (2014)
56. Gangwar, H., Date, H., Ramaswamy, R.: Understanding determinants of cloud computing
adoption using an integrated TAM-TOE model. J. Enterp. Inf. Manag. 28(1), 107–130 (2015).
https://doi.org/10.1108/JEIM-08-2013-0065
57. AlBar, A.M., Hoque, M.R.: Factors affecting cloud ERP adoption in Saudi Arabia: an
empirical study (2017). https://doi.org/10.1177/0266666917735677
58. Hsu, C., Lin, J.C.-C.: Factors affecting the adoption of cloud services in enterprises. Inf. Syst.
e-Bus. Manag. (321) (2015). https://doi.org/10.1007/s10257-015-0300-9
59. Priyadarshinee, P., et al.: Understanding and predicting the determinants of cloud computing
adoption: a two staged hybrid SEM - neural networks approach. Comput. Hum. Behav. 76,
341–362 (2017). https://doi.org/10.1016/j.chb.2017.07.027
60. Lal, P., Bharadwaj, S.S.: Understanding the impact of cloud-based services adoption on
organizational flexibility an exploratory study (2017)
61. Dincă, V.M., Dima, A.M., Rozsa, Z.: Determinants of cloud computing adoption by Romanian
SMEs in the digital economy. J. Bus. Econ. Manag. 20(4), 798–820 (2019)
A Quantum Algorithm to Locate
Unknown Hashgrams
1 Introduction
Quantum computing is rapidly evolving, and each day something new is being discovered.
These discoveries are beginning to make quantum concepts applicable across a variety of
domains. In the late 1980s and early 1990s, quantum computing was entirely theoretical,
and many of the early algorithms created then have since provided a foundation on which
to build other quantum algorithms.
While many of these algorithms, such as Simon’s [15] and Grover’s [7], were seen
as proof of concept algorithms, they in fact have more value on their own merits
than simply providing a foundation for other algorithms.
Though the situation is improving, one of the current limitations has to do
with availability of quantum computing. While companies such as IBM [8] and
D-Wave [5] are providing access to their quantum computers at no cost via cloud
platforms, they are still limited in the number of qubits and quantum volume
available. For that reason, much of our work is done using Qrack [16], a high-
performance quantum simulator. Simulators on classical hardware can simulate
approximately 30–32 qubits.
One of the first steps in malware analysis is static analysis, which searches the suspect
binary file for static information such as strings that indicate the program's purpose,
whether the binary is maliciously packed, and whether the file is malicious [14]. It is also
desirable to compare the suspect binary with other binaries, malicious or not, to see if the
suspect binary is similar to any of them.
An n-gram is a sequence of n contiguous bytes, for some small integer n. Files that happen
to have many of the same n-grams, in roughly the same proportions, can be regarded as
similar [6]. Historically, the value of n might be in the range 2–6. But unlike ordinary text,
executable binaries use most if not all of the byte values in the range 0x00 to 0xFF. For
n = 4, for example, that results in 256^4, or roughly 4 billion, possible n-grams to be
tabulated. More recently, as described below, larger values of n are also of practical value,
but in tabulating n-grams for any n larger than, say, 3 or 4, a hash table would be used to
keep track of which n-grams have been seen, and how often. Hash tables are usually sized
so that collisions don't matter too much in practice. As a file is ingested, though, many
n-grams are seen multiple times, and the same hash value is computed multiple times. We
will show how to improve n-gram tabulation by calculating an n-gram's hash once, storing
the result, and using quantum search to find the desired hash value, without recomputing it,
should that n-gram be seen again.
This paper is organized as follows: in Sect. 2 we provide a review of related work. We
present the concept of quantum search as applied to n-grams in Sect. 3. Our numerical and
simulation results are presented in Sects. 4 and 5. In Sect. 6 we summarize our results and
make suggestions for future work.
2 Related Work
2.1 n-grams for Malware Analysis
2.2 KiloGram
KiloGram [10] was released as open source software in 2020 (see footnote 1). KiloGram
takes a set of benign and known malicious software as input data. The output is a list of the
top-k most frequent n-grams contained within the malicious software. Benign software is
any software considered to contain no malicious code, whereas malware is any software
designed to cause harm in some fashion. We chose the KiloGram approach since it can be
used for a large number of n-grams and large values of n. Usefully for us, the KiloGram
algorithm can handle n-grams that are 8 bytes or larger while keeping 1000 or more of the
most frequent entries.
In the context of malware analysis, n-grams are used to represent strings that appear in
some if not all members of a set of suspected malware specimens. These n-grams can then
be provided to other algorithms for a variety of uses, such as classification into malware
families. KiloGram was designed with these uses in mind. Recall that the n in n-gram refers
to some small integer n; for example, a 2-byte string such as 0xABCD would be called a
2-gram. Unfortunately, one major drawback of an n-gram based approach to malware
detection is that the shorter the n-gram, the more likely the byte sequence will also be found
in benign software, increasing the false positive rate. Fortunately, KiloGram was also
designed to overcome this limitation by allowing larger and more specific n-grams to be
stored, increasing the likelihood that they will be unique within a variety or family of
malware.
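To make the classical baseline concrete, the sketch below (our own illustration in Python, not the KiloGram code; the helper names are our own, and a cryptographic hash stands in for whatever rolling hash the real pipeline uses) counts n-gram hashes in a hash table and keeps the most frequent ones. Note that the hash is recomputed for every occurrence of an n-gram, which is exactly the repeated classical work the quantum lookup developed in this paper is meant to avoid.

from collections import Counter
import hashlib

def top_k_ngram_hashes(data: bytes, n: int = 8, k: int = 1000):
    # Slide an n-byte window over the file, hash each n-gram, and count the
    # hashes in a hash table; return the k most frequent (hash, count) pairs.
    counts = Counter()
    for i in range(len(data) - n + 1):
        gram = data[i:i + n]
        counts[hashlib.blake2b(gram, digest_size=8).hexdigest()] += 1
    return counts.most_common(k)

# Example: the 10 most frequent 4-gram hashes of a toy byte string.
sample = bytes(range(256)) * 4
print(top_k_ngram_hashes(sample, n=4, k=10))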
Grover’s algorithm [7] was one of the first quantum searching algorithms to be
developed. Grover’s has even been the inspiration for other quantum algorithms
such as Shor’s [13] factoring algorithm. While much attention and research has
been specifically around Shor’s algorithm with regards to quantum cryptography,
Grover’s has been used and even improved upon in recent years [18].
Grover’s search algorithm implements what is known as an amplitude ampli-
fication algorithm [3] which has been said to be a generalization of Grover’s algo-
rithm (although amplitude amplification was first discovered in 1997 by Gilles
Brassard in 1997, and then a year later by Lov Grover). The fundamental idea is
to increase (amplify) the probabilities of the desired results, and this is accom-
plished by using a sequence of reflections.2 What is occurring in the amplitude
amplification is that the reflections are rotated closer to the desired quantum
state along the Bloch Sphere. The target state is marked as sin2 (Θ) so that
when the amplitude amplification algorithm is applied m times, the probability
1
https://github.com/NeuromorphicComputationResearchProgram/KiloGrams.
2
https://docs.microsoft.com/en-us/quantum/libraries/standard/algorithms.
276 N. R. Allgood and C. K. Nicholas
of obtaining the correct state is sin2 ((2m + 1)Θ) In other words, we think of the
target state on the Bloch Sphere [2] and we keep rotating it until we find the
correct result, with each rotation getting slightly closer.
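As a brief worked example of this rotation picture (our own addition, following the standard analysis of Grover's algorithm rather than any derivation given in [3] or [7]): for a single marked item among N entries, the initial success probability is sin²(θ) = 1/N, so θ = arcsin(1/√N). Choosing the number of iterations m so that (2m + 1)θ ≈ π/2 drives the success probability sin²((2m + 1)θ) close to 1, which gives

m ≈ π/(4θ) − 1/2 ≈ floor(π / (4 arcsin(1/√N))) ≈ (π/4)√N.

For instance, with a 10-bit index (N = 1024 and θ = arcsin(1/32) ≈ 0.0313), this yields m = 25 iterations and a success probability of sin²(51θ) ≈ 0.999, consistent with the iteration formula used by Qrack in Sect. 4.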
4 Quantitative Results
4.1 Grover’s Circuits
Grover’s algorithm is an oracle based algorithm and in the majority of the
literature that discusses Grover’s algorithm, it’s typically split into four parts:
1. Initialization
2. Oracle processing
3. Amplitude amplification
4. Measurement
We now describe how a quantum simulator, in particular Qrack [16], imple-
ments both the oracle and amplification components of Grover’s search. Figure 1
is an example of a traditional quantum circuit for Grover’s search. Figure 2 is
a modified version of Grover’s search to be utilized with the Qrack quantum
simulator.
[Fig. 1. Example of a traditional quantum circuit for Grover's search: Hadamard initialization of the register, the oracle, amplitude amplification (H, X, controlled-Z, H), and measurement into a classical register.]
[Fig. 2. Grover's search modified for the Qrack simulator: key and value registers initialized with H, an indexed load from classical memory (Cl. Mem), and an oracle built from DEC, a phase flip (U_Z), and INC, followed by measurement.]
operation to return to the original value, only with the sign flipped. To finalize our example,
we add 0 + (+100), where + is the phase, with our result being +100.
Theoretically, Grover's algorithm requires an average of O(√N) lookups to find a match for
the specified target. While we are using a traditional lookup table with Grover's, the input
time complexity evaluation might not be that obvious. If we dive into the bare fundamentals
of Qrack/VM6502q, we notice that it has an IndexedLDA instruction [17]. This is a
modified LOAD instruction that allows loading a key with a superposed index into a
quantum register. The IndexedLDA operation is unitary by design, so it will not affect the
overall quantum state as the data is loaded into the registers. Writing the data with a
superposed index actually entangles the classical memory cache and the index register.
Knowing this, we can say that the IndexedLDA operation takes O(1) time to load data into
the quantum registers. In addition to the initial loads, there will be an input time complexity
of O(√(M/N)), where M is the total number of keys in the lookup table and N is the total
number of matches [9]. This yields an overall input time complexity of
O(1) + O(√(M/N)) = O(√(M/N)).
We use the term lookups, but this refers to the number of iterations of Grover's algorithm.
To be specific, Qrack uses the following equation to determine the number of iterations to
use [17]:

floor( π / (4 arcsin(1/√N)) )     (2)
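To make Eq. (2) concrete, the short Python sketch below (our own illustration, not part of the Qrack code base) evaluates the iteration count for a table of 2^index_bits entries; for the 10-bit index used in the example later in the paper it gives 25 iterations, essentially the familiar (π/4)√N estimate.

import math

def grover_iterations(index_bits: int) -> int:
    # Number of Grover iterations per Eq. (2) for a table of 2**index_bits entries.
    n_entries = 2 ** index_bits
    return math.floor(math.pi / (4 * math.asin(1 / math.sqrt(n_entries))))

for bits in (8, 10, 16):
    print(bits, grover_iterations(bits))  # 8 -> 12, 10 -> 25, 16 -> 201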
5 Simulated Results
5.1 Qrack Operations
The Qrack [16] implementation utilizes some specialized methods for implement-
ing many of the operations in the oracle and amplitude amplification portions
of the algorithm. Here are some of the most commonly used operations [17]:
As briefly mentioned in Sect. 2, there are some limitations with quantum simulations, the
most obvious being the limited computing resources available for simulation. While Qrack
[16] can take full advantage of a GPU for processing using OpenCL (see footnote 5), one is
typically limited to simulating approximately 30 qubits. Qrack has some development
branches of code that simulate 128 qubits for testing the quantum supremacy problem
released by Google [1]; however, these branches are quite experimental. To better
appreciate why 30 qubits is a limitation for simulation, we must recall our base formula 2^n,
where n is the number of qubits we wish to simulate: 2^n is the total number of quantum
states (amplitudes) we wish to simulate. With 30 qubits, we end up with
2^30 = 1073741824, or roughly one billion values. But the amplitudes of the quantum states
are complex numbers, so we must include the real and imaginary parts when calculating
memory requirements. We use 2^3 = 8 bytes for the real part and 8 bytes for the imaginary
part, which gives 2^4 = 16 bytes for each of those one billion values, or roughly 16 GB of
memory.
Using the above formula, Table 1 shows how much memory is required for simulating and
encoding up to 40 qubits.
5 https://www.khronos.org/opencl/
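A small Python sketch of this memory estimate (our own illustration; it simply reproduces the 2^n amplitudes × 16 bytes calculation described above, which is presumably what Table 1 tabulates):

def statevector_memory_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    # A full state vector holds 2**n complex amplitudes, each stored as an
    # 8-byte real part plus an 8-byte imaginary part (16 bytes total).
    return (2 ** num_qubits) * bytes_per_amplitude

for n in (20, 30, 40):
    print(f"{n} qubits: {statevector_memory_bytes(n) / 2**30:.3f} GiB")
    # 20 qubits -> 0.016 GiB, 30 qubits -> 16 GiB, 40 qubits -> 16384 GiB (16 TiB)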
In Table 2 we list our benign and malicious datasets along with the respective number of
files in each dataset. In Table 3 we list the number of kept n-grams when comparing the
malicious and benign datasets. We also record the size of the n-grams kept, with a
maximum n-gram size of 3 bytes due to limitations of the simulation hardware.
The hardware and software used was a 16-core Intel Xeon E5-2630 @ 2.4 GHz with 32 GB
RAM and two GeForce GTX 1660 video cards. The machine was running 64-bit Ubuntu
Linux 18.04 and OpenCL 1.2.
As we can see from Table 4, keeping more n-grams requires more iterations, but the number
of iterations grows by a much smaller amount as we keep a larger number of n-grams. As a
practical example, below we give pseudocode for the Qrack implementation of Grover's
algorithm, together with the output for a 2-byte n-gram with a 10-bit index. We search for
an n-gram with the value 0xF3D7, whose hash is unknown; we quickly find it to be 0x3a9.
Figure 4 shows an example where we search for the n-gram 0xF3D7 that has a hash value of
0x3a9.
idxLen = 10
valLen = 16
cryIdx = idxLen + valLen
ngrams = ngramtable[idxLen]
ngram = 0xf3d7
qReg = CreateQuantumInterface(*params)
qReg = SetPermutation(0)
qReg = H(valLen, idxLen)
qReg = IndexedLDA(valLen, idxLen, 0, valLen, ngrams)
procedure QueryOracle(tPerms, qReg, valueSt, valLen)
    qReg = DEC(tPerms, valueSt, valLen)
    qReg = ZeroPhaseFlip(tPerms, valueSt, valLen)
    qReg = INC(tPerms, valueSt, valLen)
end procedure
procedure AmplitudeAmplification
    idxLen = 10
    valLen = 16
    cryIdx = idxLen + valLen
    ngrams = ngramtable[idxLen]
    ngram = 0xf3d7
    qReg = CreateQuantumInterface(params)
    qReg = SetPermutation(0)
    qReg = H(valLen, idxLen)
    qReg = IndexedLDA(valLen, idxLen, 0, valLen, ngrams)
end procedure
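For readers without access to Qrack, the following self-contained Python/NumPy sketch (our own classical state-vector illustration, not the authors' implementation and not the Qrack API) mirrors the same search: a lookup table plays the role of the classical memory loaded by IndexedLDA, the oracle flips the phase of every index whose stored value matches the target n-gram, and inversion about the mean performs the amplitude amplification.

import numpy as np

def grover_lookup(table, target):
    # Simulate Grover's search for the index whose table entry equals target.
    # The table length is assumed to be a power of two (2**index_bits).
    n = len(table)
    state = np.full(n, 1.0 / np.sqrt(n))      # uniform superposition over indices
    iterations = int(np.floor(np.pi / (4 * np.arcsin(1.0 / np.sqrt(n)))))
    matches = np.array([v == target for v in table])
    for _ in range(iterations):
        state[matches] *= -1.0                # oracle: phase-flip matching indices
        state = 2 * state.mean() - state      # diffusion: inversion about the mean
    return int(np.argmax(state ** 2))         # most probable measurement outcome

# Toy example: a 10-bit index table where the n-gram 0xF3D7 is stored at hash 0x3a9.
table = [0] * 1024
table[0x3A9] = 0xF3D7
print(hex(grover_lookup(table, 0xF3D7)))      # prints 0x3a9

With a 10-bit index this runs the 25 iterations given by Eq. (2) and recovers the stored hash 0x3a9, matching the output described above.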
6 Conclusion
We have shown that by combining the results of efficient n-gram collection software such
as KiloGram with quantum computing, we can provide a faster way of finding a previously
computed, but currently unknown, hash for a known n-gram. We have compared this
solution to the classical approach and have shown that for a large number of n-grams, the
quantum-based solution outperforms it substantially. When better quantum hardware is
available, these concepts could be applied to cryptographic hashes such as SHA-256 or
BLAKE3. We hope that our work will remain useful when better quantum computers are
available. Quantum computing research continues to grow each day, and while adequate
hardware might seem far in the future, it will be upon us before we realize it, and
cybersecurity professionals will need to be ready.
References
1. Arute, F., Arya, K., Babbush, R., et al.: Quantum supremacy using a pro-
grammable superconducting processor. Nature 574, 505–510 (2019). https://www.
nature.com/articles/s41586-019-1666-5
2. Bloch, F.: Nuclear induction. Phys. Rev. 70, 460–474 (1946)
3. Brassard, G., Høyer, P., Mosca, M., Tapp, A.: Quantum amplitude amplification
and estimation. Quantum Computation and Information, pp. 53–74 (2002)
4. Coppersmith, D.: An approximate Fourier transform useful in quantum factoring (2002)
5. D-Wave. D-wave (2020). https://dwavesys.com
6. Damashek, M.: Gauging similarity with N-Grams. Science 267(5199), 843–848
(1995)
7. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Pro-
ceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing
- STOC 1996 (1996)
8. IBM. IBM quantum experience (2020). https://quantum-computing.ibm.com
9. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge
University Press (2000)
10. Raff, E., et al.: KiloGrams: very large N-grams for malware classification. In: Pro-
ceedings of KDD 2019 Workshop on Learning and Mining for Cybersecurity (LEM-
INCS 2019) (2019)
11. Rudin, W.: Real and Complex Analysis, 3rd Edn. McGraw-Hill, Inc., USA (1987)
12. Shalaginov, A., Banin, S., Dehghantanha, A., Franke, K.: Machine learning aided
static malware analysis: A survey and tutorial. Cyber Threat Intelligence, pp. 7–45
(2018)
13. Shor, P.W.: Algorithms for quantum computation: discrete logarithms and factor-
ing. Proceedings 35th Annual Symposium on Foundations of Computer Science,
Santa Fe, NM, pp. 124–134 (1994)
14. Sikorski, M., Honig, A.: Practical Malware Analysis. No Starch Press (2012)
15. Simon, D.R.: On the power of quantum computing. In: Proceedings of the 35th
Annual Symposium on Foundations of Computer Science, pp. 116–123 (1994)
16. Strano, D., Bollay, B.: Qrack: a comprehensive, GPU-accelerated framework for developing
universal virtual quantum processors (2020). https://github.com/vm6502q/qrack
17. Strano, D., Bollay, B.: Vm6502q and qrack (2020). https://vm6502q.readthedocs.
io/en/latest/index.html
18. Wang, Y.: A quantum walk enhanced Grover search algorithm for global optimization (2017)
BUMP: Bridging Unmet Modes
of Participation in the Workplace
1 Introduction
Technologies allow users to connect at a distance. Humans are an essentially social species,
with the motivation to form and maintain interpersonal relationships as a fundamental
organizational principle of behavior [1]. Especially during the COVID-19 pandemic,
connecting at a distance via video conferencing and chatting proved beneficial in bringing
people together. Many of these technologies and platforms existed before the pandemic, but
their importance to our daily living increased exponentially during it. People are using tools
like Zoom, Skype, and other technologies to stay in touch with work collaborators,
colleagues, and family, thus making the relevance of technology even more evident [2].
However, these systems usually require programming and coordination to establish
interaction and connection. While chat rooms and videoconferencing are continuously
evolving into low-delay, high-quality platforms, both can be evolved further to increase the
"social" aspect and the ability to "expand" users' physical space. Office spaces and digital
communication technologies are changing the nature of work for individuals [3]. Users,
online platforms, and spatiality are interrelated and come together in the constitution of
everyday spatial encounters and experiences [4]. However, barriers to technologies in the
workplace are related to issues of adoption and accessibility. In addition to discrepancies in
digital literacy, there is an imminent segregation of groups of the population that decide to
stay apart from the continuously evolving communication networks.
Multiple factors such as age, academic degree, and the ability to effectively use technology
[5] may affect adherence to modernized tools for interacting and solving problems [6] while
the meaning of life-like interactions is preserved [7]. This study explores the design and
implementation of accessible technologies to bridge participation in the workplace. This
paper describes Bridging Unmet Modes of Participation (BUMP) in the workplace, focusing
mainly on the technology developed for implementation. The significance of this paper is
to present a creative approach to technology iterations based on off-the-shelf technologies,
and to show how the results contribute to designing inclusive technologies that can
facilitate the way users communicate with accessible technologies at a distance.
The rest of the paper is organized as follows. Section 4 discusses the technical setup,
criteria, and observations considered for the pilot studies. Each
subsection elaborates on critical components for the project’s development. Sub-
section 4.1 explains the technological components that enable the system to run
independently without involving existing or commercial software. Subsection 4.2
describes the environment considered for the implementation of the pilot study,
focusing on the sampling size and the current space conditions where the study
took place. Subsection 4.3 focuses on the results of the pilot study conducted
in two different approaches, passive and active, to test different users’ levels of
interaction with the BUMP stations. Finally, Sect. 5 concludes the article and
discusses future research opportunities.
2 Related Work
Other prototypes have been previously developed. The authors in [8] and [9]
presented the design of a new communication method to bridge unmet partici-
pation. The project’s goal was to extend older adults’ home spaces by enabling
them to connect with families and the environment in more natural ways. Such
a project explored the possibilities of improving the environment by making a
more innovative and more supportive environment for its inhabitants.
In terms of post-pandemic scenarios, authors have remarked on the impor-
tance of finding methods to reduce the social distance created by the pandemic
3 Design Concept
Following the idea of "bumping into someone", we present this project as an alternative
way of meeting or being introduced to someone unexpectedly. Our work aims to create
unplanned connections in workspaces to promote interaction between coworkers. The result
is a novel option for socializing in the workspace, connecting people through virtual
windows that can be placed in multiple spaces such as hallways, cafeterias, or lounges. This
pilot work consists of two physical stations located in two different spaces. The two stations
are connected using software that allows them to stream and receive video. When no
interaction is occurring at either station, the screen switches to stand-by, showing a black
background with the BUMP logo on top. Once either system detects movement around the
station, it starts transmitting video, allowing the participants to interact with each other in
real time (Fig. 1). BUMP is also meant to be an unprecedented method of pressure relief,
creating periodic breaks that generate greater productivity, inspire creativity, and improve
positive attitudes among employees. Furthermore, this approach reduces distraction by
limiting the interaction time to 30 s.
This study also measures the impact of adding such dynamic interactions to the workspace.
Although this first approach involves only two stations, the system could potentially be
expanded to more stations across multiple corporate buildings.
4 Implementation
4.1 Technical Setup
The major design challenges were related to sensing users in the spaces while connecting
them at a distance in a timely manner. To compensate for the delay, this project used a pair
of movement sensors to detect the user several seconds before they are in front of the
screen and thus create the interaction "bump" at a distance. The current prototype
implemented two BUMP stations.
Notably, this project does not use any existing video-conferencing application or software.
Instead, the stations create a point-to-point connection using Netcat. The initial set-up
requires manually executing a script that runs a Raspivid command to stream the camera
video to a local port, followed by an Mplayer command that shows the other participant's
video on the local display. The automated script is shown in Fig. 3. A Python program,
start.py, plays a coordinating role: it controls the sequence and timing, plays the
invitation-to-interact sound, and reads the IR sensors connected to the I/O ports to operate
the display accordingly.
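A minimal sketch of such a coordination script is shown below. This is our own illustration of the pipeline described above, not the authors' start.py; the peer host name, port, video parameters, and the sensor and 30-second timing logic are assumptions.

import subprocess
import time

PEER_HOST = "bump-station-2.local"  # assumed address of the other BUMP station
PORT = 5000                         # assumed streaming port

def start_outgoing_stream():
    # Raspivid writes raw H.264 video to stdout; Netcat forwards it to the peer.
    return subprocess.Popen(
        f"raspivid -t 0 -w 1280 -h 720 -fps 25 -o - | nc {PEER_HOST} {PORT}",
        shell=True)

def start_incoming_stream():
    # Netcat listens for the peer's stream and pipes it into Mplayer for display.
    return subprocess.Popen(
        f"nc -l -p {PORT} | mplayer -fps 25 -cache 1024 -demuxer h264es -",
        shell=True)

if __name__ == "__main__":
    outgoing = start_outgoing_stream()
    incoming = start_incoming_stream()
    try:
        # In the actual prototype the IR motion sensors on the GPIO ports wake the
        # display and limit each interaction to roughly 30 seconds; here the two
        # pipelines are simply kept alive.
        while True:
            time.sleep(1)
    finally:
        outgoing.terminate()
        incoming.terminate()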
The BUMP prototype was then tested in a real deployment. It was launched for pilot studies
once the system was stable and produced no errors. The criterion for selecting the place to
conduct the pilot study was to find two different spots in the building. The research team
had immediate access to a large workplace site (housing approximately 2500 users) with
different floors and open spaces. The two modules were therefore located along the busiest
halls; both were placed in a noticeable spot that did not block any walkable area.
The BUMP prototype installation consisted of two stations in different hallways of the
workplace where a high number of passersby was expected (see Fig. 4). High-traffic areas
were chosen to provide an optimal scenario with multiple observations from many users.
In order to assess how users interact with the device, the BUMP prototype pilot study was
conducted under two different scenarios, each lasting three weeks. The first (passive)
approach focused on passersby's curiosity, acceptance, and indifference, and on whether
users showed an intention to interact with the stations without any explicit invitation. Upon
completion of the first set of trials, the different behaviors were analyzed. The second
(active) approach incorporated two simple invitation methods: the exclamation "Hey, psst!"
and a screen message with the simple sentence "give a smile here" (see Fig. 5). This
modification aimed to attract more passersby and explicitly invite them to interact with the
device.
BUMP 291
device. These people taught new observers their own understanding of how the system works, making users' appropriation of the device and the environment evident.
During the second approach, an increasing number of individuals started interacting with the BUMP stations by waving and smiling at the person at the counterpart station. Interaction between individuals occurred more easily as people felt invited to meet and engage. In addition, the BUMP stations began to acquire a personality and gain acceptance in the building. Figure 6 exemplifies the interaction between two users at BUMP stations in different locations.
Theoretical Perspectives Towards
Culture-Centered User Engagement Design
for Mobile Health in the Global South
ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin, Dublin,
Ireland
[email protected]
Abstract. The main objective of this article is to propose a theoretical model for
uncovering users’ socio-cultural contexts for user engagement designs, as well
as to present a set of consolidating concepts to guide future research in mobile
health (mHealth) designs to strengthen the theoretical framing of user engage-
ment designs. First, we present a brief discussion of the importance of employing
appropriate theoretical frameworks that promote user engagement. Second, we
discuss different theoretical perspectives on user engagement. Two theoretical
frameworks, activity theory and the communicative ecological framework, are
used to frame an understanding of users’ socio-cultural contexts and inform the
design of a framework called Design Process Engagement Enhancement Sys-
tem (DECENT) to support designers in mHealth engagement designs. This paper addresses the research question of how to uncover the socio-cultural contexts of a user group to inform the design of engaging digital artifacts (tools), by augmenting two phases of the user-centered methodology with users' socio-cultural filtration and socio-cultural checklists for user engagement designs. The DECENT framework is an adapted user-centered design framework that uses activity theory and the communicative ecological framework as theoretical frameworks for understanding the socio-cultural complexity of users in the context of mHealth. The findings of the paper suggest that to fully explain and design for user engagement in mHealth, an integrated approach incorporating a variety of technological and socio-cultural factors is required.
1 Introduction
contexts in the design of mHealth interventions and suggested that such socio-cultural
contexts be considered and addressed systematically by identifying a design process for
engaging users in mHealth interventions [5].
New products are frequently developed for international markets in our globalized
economy. Because user characteristics and needs differ significantly across regions [6],
product development for global markets requires organizations to take these differences in user characteristics and needs into account when designing products. These
differences are frequently influenced by national and ethnic cultures. According to Shen
et al. [7, p. 7] “Successful interface metaphors should be developed or adapted through
cultural requirements by or regarding, representatives of the culture for which they are
intended”. Culture refers to the similar patterns of thinking, feeling, and acting of people
who belong to the same group but differ from other groups in these patterns [8]. This
type of ‘acting’ is largely based on unwritten rules and habits that are passed down from
generation to generation [9]. Thus, this paper focuses on the research question: how can the socio-cultural complexities of a user group be uncovered to inform the design of engaging digital artifacts?
Honold [10] asserts that the approach to learning how to use a new mobile phone, for
example, may differ depending on national culture. According to the findings from Hon-
old’s study, German users prefer to use a user manual, whereas Italians prefer learning
by doing. Furthermore, while the salesperson is an important source of information for
the Chinese, the entire family is involved in the knowledge acquisition process for Indian
users. Honold’s study demonstrates that culture influences patterns of user-product inter-
action and engagement, which implies that designers should take these differences in
knowledge acquisition approaches into account. However, determining the nature of this
impact may be difficult without theorization and a deeper understanding of how users
actually engage with technologies [11]. This emphasizes the importance of theoreti-
cal models that provide a framework for assessing cultural differences and generating
design recommendations. Activity theory (AT) and the communicative ecology framework (CEF) are, for several reasons, suitable models and are used in this paper to uncover users' socio-cultural contexts for user engagement designs.
AT is a well-known theoretical model for examining and understanding the human
use of technology. The use of AT to uncover the socio-cultural contexts of users helps
establish a context for human activity and provides key insights into the potential meaning
and significance of user engagement designs. AT provides a framework for investigating
the complexities of interactions between people and their environments by identifying
the units of an activity system, how they are related, their various voices, their history,
flaws, and changes [12]. This paper uses the Engeström model of AT (as described
in Sect. 3) for uncovering users’ socio-cultural contexts for user engagement designs.
The Engeström model of AT provides a conceptual framework for understanding the
interconnections between activities, actions, operations, and artifacts, as well as aspects
of the social, cultural, and societal contexts in which these activities are framed [12].
In the case of mHealth engagement design, for example, AT can be used to illustrate why a user engages with an mHealth technology, but it may not investigate in detail what causes a specific user to engage with mHealth technologies through his or her daily communication in broader socio-cultural contexts. As a result, the CEF (as defined by
Foth and Hearn, [13]), which integrates three layers of interpretation (technical, social,
and discursive), is used in this paper to provide a rich description of how mHealth is
structured in a social context. Section 3 provides a detailed example of the use of AT and
the CEF to uncover socio-cultural contexts of a user group for user engagement designs
that inform the design process engagement enhancement system (DECENT) framework.
To clarify the theoretical perspective taken to uncover the socio-cultural contexts of
users, it is necessary to outline the conceptualization of user engagement and theories of
engagement used, as described in Sect. 2. In this paper, user engagement is defined as a context-dependent, individual-specific psychological state that emerges from two-way interaction with an object, such as an app [14–16].
Overall, the research in mHealth design for user engagement demonstrates that the
impact of culture on design should not be overlooked. Furthermore, theoretical models
are useful in guiding the product design process of user engagement designs. The fol-
lowing section presents a theoretical framework for user engagement. Other sections of
the article cover DECENT and the use of AT and the CEF to inform DECENT tools, as
well as the DECENT framework, its phases, and conclusion.
While AT can be used to explain why an individual user interacts with mHealth
artifacts in his or her daily communication, the Altheide [30] model of CEF provides
a rich description of how each individual interacts with mHealth artifacts in broader
socio-cultural contexts to inform DECENT tools (as detailed in Sect. 3.1). Thus, AT and
CEF are used as a lens for uncovering the socio-cultural contexts of a user group for
user engagement designs.
These units of activity systems can be used as an organizing principle for understand-
ing the socio-cultural complexity of users [12]. Oers [31, p. 71] defined an activity in AT
as “any motivated and object-oriented human enterprise, with roots in cultural history
and depending for its actual occurrence on specific goal-oriented actions”. Deliver-
ing mobile training to community health workers (CHWs), for example, is an activity;
CHWs can replay important training content without the need for additional classroom
presence, which helps to manage care for vulnerable populations. The AT framework,
according to Good and Omisade [32, p. 54], “uses activity as the basic unit for studying
human practices”. “Activity, or ‘what people do,’ is reflected in actions as people interact
with their environment.” Components are embodied in activity: the theory's components are the subject, object, and community, while the artifacts used in the activities to determine the context are tools, rules, and divisions of labour. An activity is carried out by a subject (e.g., mobile trainers) working toward an outcome (the object, e.g., trained CHWs), mediated by tools (e.g., training modules) and performed in collaboration with others (the community). Cultural factors such as conventions (rules) and
social divisions (a division of labour) within the context shape and constrain the structure
of the activity [33]. AT also emphasizes context factors and interpersonal interactions,
arguing that some context must be considered in the analysis of human actions because
the ultimate cause of human activities is needs [34]. AT provides a strong framework
for investigating contextual factors and demonstrates the complexities and fluidity of
activities in context.
The users are referred to as “innovators” because they are individuals deeply involved in co-creating their user engagement designs with the designers. They share with developers information about the different points of engagement and disengagement with mHealth products they have used previously. This information aids user engagement designs.
Fig. 3. A schematic diagram for analyses of users as innovators (capture and postcards)
This approach is applied as a DECENT tool to capture and post, in the form of cards, special moments of users engaging with mHealth products and the different stages of engagement and disengagement. The aim is to uncover insights about users' lifestyles, often rooted in their cultural backgrounds, while they engage with mHealth products, in order to inform user engagement designs.
Use AT to Analyze the Self-Built Guide: Users build personal information, in the form of stories, based on their experience engaging with an mHealth product. This may make developers better informed about users' values and interests. Figure 5 shows a schematic diagram of the self-built guide pattern.
4 DECENT Framework
This study is about improving user engagement designs and development. McCurdie
et al. [37] argue that mHealth technologies have not done enough to engage users. It has been established in this paper that the ways users engage with mHealth technol-
ogy and behave are greatly influenced by their previous experience and socio-cultural
background. It is through a better understanding of users’ perceptions and socio-cultural
values that software designers and developers will move into a new paradigm of quality in which technological products have added value, meet users' true needs, and make their experi-
ence more meaningful [38]. Thus, we augmented two phases of the user-centered design
with “socio-cultural filtration” and “socio-cultural checklist” to inform the DECENT
tools. Thus, DECENT is an adapted model of user-centered design that provides tools
for establishing socio-cultural contexts. The DECENT framework has six phases and
is adapted from a user-centered framework (details in Sect. 4.1).
The presented “socio-cultural filtration” and “socio-cultural checklist” would fit well
into two phases of the user-centered design framework (Fig. 6). The “socio-cultural
filtration” phase is focused on enabling mHealth designers to understand the socio-
cultural contexts of the user to bring input to the first phase of user-centered design, the
“analysis of needs assessment of user” which focuses on the understanding of the user’s
needs and values.
The second phase of the user-centered design framework that would benefit from the
introduction of the presented socio-cultural checklist is the “design” phase. This phase
is focused on incorporating the data gathered from the “analysis of needs assessment
of user” and design of the solution. To this end, the “socio-cultural checklist” aids in
determining whether the quality of the designed solution is consistent with the socio-cultural values of the users.
As shown in Fig. 6, the “socio-cultural filtration” and “socio-cultural checklist” phases extend the user-centered design framework at two specific points to arrive at the overall phases of the DECENT framework. Thus, DECENT has six phases: socio-cultural filtration, analysis, design, socio-cultural checklist, evaluation, and implementation. The goal of DECENT is to help designers become more aware of, and sensitive to, users' socio-cultural contexts and to capture them in user engagement designs. According to Shen et al. [7, p. 12], the closer the similarity in the socio-cultural background between the user and
the designer, the stronger the assurance of a successful human-computer interaction.
Whether designers/developers share the same socio-cultural origin with users or not,
designers are required to be sensitive to the users’ socio-cultural contexts and be able to
view them using DECENT tools.
The DECENT tools for capturing the socio-cultural contexts of the user group are shown in Fig. 7. They comprise six steps: (I) contextual inquiry, (II) personas, (III) capture cards and postcards, (IV) self-built guide, (V) communication delight, and (VI) ethics. These six steps are explained in the next section on the phases of DECENT.
To understand the Korean context of cross-cultural design practice in Korean educational design, Lee [39] compared Korean design education ‘Before’ and ‘After’ the application of the cross-cultural design education method to create a more user-centered design; the comparison is summarized in Table 2.
Table 2. Summary of Lee [39]'s comparison of ‘Before’ and ‘After’ the application of the cross-cultural design

Before | After
Outcome-oriented design process | Process-oriented design process
Function, solution approach | Socio-cultural approach
Focus majorly on aesthetic values | Focus both on aesthetic, socio-cultural, and contextual values
Lack of critical understanding of the importance of design in socio-cultural contexts | Critical understanding of the importance of design in socio-cultural contexts
Contextual Inquiry: DECENT employs a contextual inquiry tool to elicit users’ socio-
cultural backgrounds by probing for examples of user behaviour when interacting with
a mHealth app, stories or images about user acceptance and use of mHealth apps, and
changes in users’ lifestyles because of their use of mHealth apps.
Capture and Postcard: The capture and postcard tool of DECENT framework is
inspired by Lee [39]. Lee [39] engaged in a cross-cultural design programme called
“Bon-Voyage” that aimed to develop designs based on the understanding of Eastern and
Western cultures, and anchor ideas on the differences between Korean and British cul-
ture to inspire a unique product design that captures tourist experience while traveling.
Capture and postcard were important tools in uncovering the socio-cultural contexts of
tourists while traveling. ‘Capture Cards’ enable capturing special objects encountered while traveling, in a postcard format, thus allowing the experience to be shared with others. In
the work, capture and postcard were identified as key mechanisms for understanding cul-
ture and cross-culture, identifying cross-cultural design strategies, and adding insights
into how cross-cultural design can benefit design communities. This idea is applied to
capture points of engagement, disengagement, re-engagement, and self-management of
one’s health that does not involve mHealth technology.
Self-Built Guide: Lee [39], in his paper created a self-built guidebook that allows users to
build a body of travel information based on their own experiences. The book was aimed
at assisting people in creating fun memories as they encounter locations that tourists
are not normally aware of, giving them the impression that they are on a treasure hunt
to discover information and stories to share and send to friends or keep as a personal
reminder. The self-built guide, as a DECENT tool, allows users to build a body of engagement stories about an app based on their experience. It assists users in recording memorable moments they encounter while engaging with a previous mobile app, moments that designers are normally not aware of, and encourages them to share these stories with friends or keep them as personal memories.
Communication Delight: DECENT framework incorporates communication delight
tools to share ideas and facilitate communication in the form of images between two or
more cultures. This helps bridge the gap in cultural differences between the software
designers and users in the design of mHealth technologies. According to Lee [39], designs
might emerge as a result of improved communication or as a result of advancements in
communication; it can work both ways. Lee [39] developed a tool, Emotional Blind, that
gives British people the experience of Korean community culture by presenting how
people can freely express their emotions visually and communicate with neighbours. In
the same vein, communication delight as a DECENT tool enables designers and users
to share ideas and communicate via images and visual diagrams. The idea of using
visual images to communicate between two or more cultures arose from the
researchers’ interactions with software developers on ways to uncover the socio-cultural
background of users. Thus, communication delight is used as a DECENT tool to allow
mHealth users to communicate with designers or other users by using images rather than
words when interacting with them to share their experiences with mHealth products. As
a result, language barriers may no longer irritate the user.
Ethics: Every procedure for collecting data from participants should be ethical and
should not infringe on the participants’ rights.
Phase 2. Analysis: This phase incorporates the data gathered in the first phase (phase
1) and entails mapping out all of the necessary stakeholders as well as empathizing with
users.
Phase 3. Design: This is the phase in which ideas are generated. Ideas are generated
and can be improved through brainstorming. Team members build on each other’s ideas
before deciding on the best one and prototyping it.
Phase 4. Socio-Cultural Checklist: This phase involves determining whether the qual-
ity of the developed concepts/prototypes is consistent with the socio-cultural values of
the designed solution.
Phase 5. Evaluation: The evaluation phase entails testing the solution prototype with
users to learn how they feel about it.
Phase 6. Implementation: The final phase focuses on how to put the final solution into
action.
5 Conclusion
Much research has been conducted to investigate the concept of user engagement. This
paper identified and described eight theoretical perspectives pertinent to understand-
ing user engagement, namely the Flow theory, Motivation theory, O’Brien and Toms’
Model of Engagement, Sidner et al.’s Model of Engagement, Short et al.’s Model of
User Engagement, Unified Theory of Acceptance and Use of IT, Technology Accep-
tance Model, and ‘PERMA’ framework. According to this paper, an interdisciplinary
approach incorporating a variety of technological and socio-cultural factors is required
to comprehensively model user engagement in mHealth-based interventions. This study
can be used to guide future research in mHealth designs by providing a set of consolidat-
ing concepts to strengthen the theoretical framing of user engagement designs. Further
evaluation is also required to determine the extent to which the core proposals of the
two theoretical perspectives – AT and CEF as a lens in uncovering users’ socio-cultural
contexts – are supported by empirical evidence in the implementation of the DECENT
tool. The plan for the future is to develop, refine and test DECENT using the theoretical
knowledge of AT and the CEF in a specific application of mHealth interventions.
Acknowledgments. This research was conducted with the financial support of Science Founda-
tion Ireland under Grant Agreement No. 18/CRT/6222 at the ADAPT SFI Research Centre
at Trinity College Dublin. The ADAPT SFI Centre for Digital Content Technology is funded by
Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under
the European Regional Development Fund (ERDF) through Grant #13/RC/2106_P2.
References
1. Doherty, K., Doherty, G.: Engagement in HCI: conception, theory, and measurement. ACM
Comput. Surv. (CSUR) 51(5), 1–39 (2018)
2. Hingle, M., Patrick, H.: There are thousands of apps for that: navigating mobile technology
for nutrition education and behavior. J. Nutr. Educ. Behav. 48(3), 213–218 (2016)
3. Tang, J., Abraham, C., Stamp, E., Greaves, C.: How can weight-loss app designers best engage
and support users? A qualitative investigation. Br. J. Health. Psychol. 20(1), 151–171 (2015)
4. Ikwunne, T., Hederman, L., Wall, P.J.: Design processes for user engagement with mobile
health: a systematic review. Int. J. Adv. Comput. Sci. Appl. 13(2), 291–303 (2022)
5. Ikwunne, T., Hederman, L., Wall, P.J.: Understanding user engagement in information and
communications technology for development: an exploratory study. In: Stephanidis, C., Mar-
cus, A., Rosenzweig, E., Rau, PL.P., Moallem, A., Rauterberg, M. (eds.) HCII 2020. LNCS,
vol. 12423, pp. 710–721. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60114-
0_46
6. Ono, M.M.: Emergent strategies for designing new products facing cultural diversity, within
the globalisation context. In: 2nd Conference on Innovative Research in Management,
Stockholm (2002)
7. Shen, S.T., Woolley, M., Prior, S.: Towards culture-centered design. Interact. Comput. 18(4),
820–852 (2006)
8. Hofstede, G.: Cultures and Organizations: Software of the Mind. McGraw-Hill, New York
(1991)
9. De Angeli, A., Kyriakoullis, L.: Globalisation vs. localisation in e-commerce: cultural-
aware interaction design. In: Proceedings of the Working Conference on Advanced Visual
Interfaces, pp. 250–253 (2006)
10. Honold, P.: Learning how to use a cellular phone: comparison between German and Chinese
users. Tech. Commun. 46(2), 196–205 (1999)
11. Sonderegger, A., Sauer, J.: The influence of socio-cultural background and product value in
usability testing. Appl. Ergon. 44(3), 341–349 (2013)
12. Frambach, J.M., Driessen, E.W., van der Vleuten, C.P.M.: Using activity theory to study
cultural complexity in medical education. Perspect. Med. Educ. 3(3), 190–203 (2014). https://
doi.org/10.1007/s40037-014-0114-3
13. Foth, M., Hearn, G.: Networked individualism of urban residents: discovering the com-
municative ecology in inner-city apartment buildings. Inf. Commun. Soc. 10(5), 749–772
(2007)
14. Brodie, R.J., Hollebeek, L.D., Jurić, B., Ilić, A.: Customer engagement: conceptual domain,
fundamental propositions, and implications for research. J. Serv. Res. 14(3), 252–271 (2011)
15. Brodie, R.J., Ilic, A., Juric, B., Hollebeek, L.: Consumer engagement in a virtual brand
community: an exploratory analysis. J. Bus. Res. 66(1), 105–114 (2013)
16. Hollebeek, L.: Exploring customer brand engagement: definition and themes. J. Strateg. Mark.
19(7), 555–573 (2011)
17. Cowley, B., Charles, D., Black, M., Hickey, R.: Toward an understanding of flow in video
games. Comput. Entertain. (CIE) 6(2), 1–27 (2008)
18. Webster, J., Ahuja, J.S.: Enhancing the design of web navigation systems: the influence of
user disorientation on engagement and performance. MIS Q., 661–678 (2006)
19. Seddon, K., Skinner, N.C., Postlethwaite, K.C.: Creating a model to examine the motivation
for sustained engagement in online communities. Educ. Inf. Technol. 13(1), 17–34 (2008)
20. O’Brien, H.L., Toms, E.G.: What is user engagement? A conceptual framework for defining
user engagement with technology. J. Am. Soc. Inform. Sci. Technol. 59(6), 938–955 (2008)
21. Sidner, C.L., Lee, C., Kidd, C.D., Lesh, N., Rich, C.: Explorations in engagement for humans
and robots. Artif. Intell. 166(1–2), 140–164 (2005)
22. Short, C., Rebar, A., Plotnikoff, R., Vandelanotte, C.: Designing engaging online behaviour
change interventions: a proposed model of user engagement (2015)
23. Petty, R.E., Cacioppo, J.T.: The elaboration likelihood model of persuasion. In: Petty, R.E.,
Cacioppo, J.T (eds.) Communication and Persuasion. SSSP, pp. 1–24. Springer, New York
(1986). https://doi.org/10.1007/978-1-4612-4964-1_1
24. Oinas-Kukkonen, H., Harjumaa, M.: Persuasive systems design: key issues, process model,
and system features. Commun. Assoc. Inf. Syst. 24(1), 28 (2009)
25. Ritterband, L.M., Thorndike, F.P., Cox, D.J., Kovatchev, B.P., Gonder-Frederick, L.A.: A
behavior change model for internet interventions. Ann. Behav. Med. 38(1), 18–27 (2009)
26. Venkatesh, V., Morris, M.G., Davis, G.B., Davis, F.D.: User acceptance of information
technology: toward a unified view. MIS Q., 425–478 (2003)
27. Davis, F.D.: A technology acceptance model for empirically testing new end-user information
systems: theory and results, Doctoral dissertation, Massachusetts Institute of Technology
(1985)
28. Ludden, G.D., Van Rompay, T.J., Kelders, S.M., van Gemert-Pijnen, J.E.: How to increase
reach and adherence of web-based interventions: a design research viewpoint. J. Med. Internet
Res. 17(7), e4201 (2015)
29. Engeström, Y., Miettinen, R., Punamäki, R. (eds.) Perspectives on Activity Theory. Cambridge
University Press, Cambridge (1999)
30. Altheide, D.L.: An ecology of communication: toward a mapping of the effective environment.
Sociol. Q. 35(4), 665–683 (1994)
31. van Oers, B.: Educational forms of initiation in mathematical culture. In: Kieran, C., Forman,
E., Sfard, A. (eds.) Learning Discourse, pp. 59–85. Springer, Dordrecht (2002). https://doi.
org/10.1007/0-306-48085-9_2
32. Good, A., Omisade, O.: Linking activity theory with user centred design: a human computer
interaction framework for the design and evaluation of. Appl. Interdiscip. Theory Health
Inform. Knowl. Base Practitioners 263, 49 (2019)
33. Kang, S.: Designing for design activity. In: Undisciplined! Design Research Society
Conference 2008, pp. 16–19. Sheffield Hallam University, Sheffield, UK (2009)
34. O’Leary, D.: An activity theory framework for DSS for extreme events: with a hurricane
example. In: Pre-ICIS SIG DSS Workshop (2007)
35. Dourish, P.: What we talk about when we talk about context. Pers. Ubiquit. Comput. 8(1),
19–30 (2004)
36. Tacchi, J.A.: Studying communicative ecologies: an ethnographic approach to information
and communication technologies (ICTs). In: 56th Annual Conference of the International
Communication Association (2006)
37. McCurdie, T., Taneva, S., Casselman, M., Yeung, M., McDaniel, C., Ho, W., et al.: mHealth
consumer apps: the case for user-centered design. Biomed. Instrum. Technol. 46(s2), 49–56
(2012). https://doi.org/10.2345/0899-8205-46.s2.49
38. Marzano, S.: New Values for the Millennium: Philips Corporate Design. V+ K Publishing,
Eindhoven (2000)
39. Lee, D.Y.: Interaction of cultures through design’ cross-cultural design (CCD) learning model:
the development and implementation of CCD design education in South Korean higher
education, Doctoral dissertation, Goldsmiths, University of London (2016)
40. Roussou, M., Katifori, A., Pujol, L., Vayanou, M., Rennick-Egglestone, S.J.: A life of their
own: museum visitor personas penetrating the design lifecycle of a mobile experience. In:
CHI 2013 Extended Abstracts on Human Factors in Computing Systems pp. 547–552 (2013)
41. Cooper, A., Reimann, R.M.: About Face 2.0. Indianapolis. Wiley, Hoboken (2002)
42. Grudin, J., Pruitt, J.: Personas, participatory design and product development: an infrastructure
for engagement. In: Proceedings of the Participatory Design Conference, pp. 144–161. ACM
Press (2002)
43. Ma, J., LeRouge, C.: Introducing user profiles and personas into information systems
development. In: Proceedings of the Americas Conference on Information Systems. AIS
(2007)
44. Chapman, C.N., Milham, R.P.: The persona’s new clothes: methodological and practical
arguments against a popular method. In: Proceedings of the Human Factors and Ergonomics
Society, pp. 634–636. HFES (2006)
45. Miaskiewicz, T., Kozar, K.A.: Personas and user-centered design: how can personas benefit
product design processes? Des. Stud. 32(5), 417–430 (2011)
Automated Meal Planner Using Multiple
User-Defined Benchmarks for Healthy Eating
1 Introduction
1.1 Problem Statement
For families, it can be cumbersome to balance nutrition, cost, and the time needed to make
the plan. While there are several tools widely available on a variety of platforms, there
seems to be no option that generates a plan for the user, unless that plan is driven by the
app’s proprietary recipes. According to the Bureau of Labor Statistics, the percentage
of households where both adults work is between 52% and 58% [1]. This increasing busyness in day-to-day life has led people to turn to digital options for handling simple but time-consuming tasks. Meal planning is something that everyone needs to do. Whether it is done meal-to-meal, day-to-day, week-to-week, or month-to-month,
everyone at some point needs to decide what to eat. Cooking at home is cheaper than
dining out, and with health and fitness as a multi-billion-dollar industry, it is clear that
individuals are interested in trying to ensure their meals fit in with their dietary and
nutritional goals. Meal planning requires individuals to juggle a number of factors: how many people they are cooking for, whether each day's plan fits their dietary and nutritional needs, and exactly which recipes should be served when. Generating shopping
lists from these recipes without relying on digital tools is cumbersome, requiring the
individual to go through every recipe they plan on cooking and copying the ingredients
into a list.
The application created here was designed to make meal planning and shopping list generation much easier for end users by providing a randomly generated meal plan that fits within the provided nutritional and cost criteria, complete with a pre-generated menu.
Option to remove the previous week’s dinner recipes from the list of possible recipes
to force variation.
Paprika
This app stores your recipes, allowing you to sort your recipes into user-defined cate-
gories. Recipes can be added manually or imported directly from websites. It generates
shopping lists sorted into categories (such as “Dairy” or “Produce”). It also includes
“Pantry” items that users would keep in stock, with the ability to check off if it is in
stock, when it was purchased, and when it expires. It allows you to save reusable menus
for your favorite meals [2].
However, it still requires manual meal planning, returning to the same issue of having to select recipes when the user may be driven by cravings or a “nothing sounds good” mood. Additionally, the nutritional value of the recipes is not included (whether calculated or entered by the user), leaving the user to track the criteria of their meal plan manually.
While it presents an easy-to-use interface, it does not meet the need of reducing time
spent meal planning.
Mealime
Mealime constructs meal plans for the user based on dietary restrictions and preferences.
User profiles boast “200 personalization options”: a plan type (such as “Classic” or “Vegetarian”), saved allergies and dislikes, and recipes adjusted for the number of servings desired. Nutritional data for each recipe is provided, and the app helps manage the cost
of the meal plan [3].
Where this app falls short is that there is no mechanism for user-defined recipes. It provides only its own recipes, with no option to add more. The user is
limited to whatever recipes are available in the app. It also relies on the user to build the
plan manually, rather than providing a randomly generated plan. Nutritional information
is not totaled for the day, leaving total calculations for the user.
elements quickly and easily. Both the GUI and the underlying processing are coded in
C#. For data storage, CSV and text files are used. CSV files provide simple two-dimensional relational tables with minimal overhead, and built-in library tools are available for reading and writing such files. Additionally, because CSV is a standard file type, there is no concern about API changes that would come with more robust database tools. In the future, the vision for this project is to become a mobile app. CSV and text files have a low storage footprint and
do not require internet connectivity to use. They also do not require additional software
to be installed on the system. It was determined that these file types provided all the
functionality necessary without loss of data clarity.
Three files in total are required for this application to work. First, Recipes.csv, which
holds recipe names, nutritional value, cost, and the active Boolean. Second, Ingredi-
ents.csv which holds the ingredients of each recipe, with each ingredient as a single line
containing the recipe name, ingredient name, units, quantity, and the pantry Boolean.
The “isPantry” Boolean defines an ingredient as the sort of ingredient that does not need
to be purchased every time it is called for, for example milk or olive oil. These are items
that the user would have “in their pantry” and for which they simply need to check whether the quantity they have is enough for the week's recipes.
This value is user-defined in the Input Recipe GUI, which allows users to tailor items based on their own shopping habits. For example, some users may buy pasta in bulk and would thus label it a “pantry” item, while others may simply buy a package of pasta
every time the meal plan includes a pasta dish.
The final file required for processing is the Preferences.txt file. This contains the
maximum thresholds for calories, carbohydrates (in grams), protein (in grams), fat (in
grams), and maximum weekly cost. Additionally, it contains a string list of each of these
criteria, ordered from least important to most important to the user. Each of these values
can be edited via the Preferences GUI.
Algorithm Details
In this section, the key algorithms developed in this application are reviewed. Not all methods are covered, only those the author felt were significant or were significantly coded
by the author. Before discussing the algorithms, an overview of the classes is needed.
The “Recipe” class defines a recipe by name, which meal of the day it is associated with
(an enumeration defined as Breakfast, Lunch, Dinner, Side, and Snack), the number
of calories in the recipe, the amount of carbohydrates (grams), the amount of protein
(grams), the amount of fat (grams), a Boolean indicating if the recipe is active, a list of
ingredients in the recipe, and the cost as a double. Though output uses the “$” symbol
to denote American dollars, the field itself is not tied to any currency, and users can
enter values in whatever local currency they prefer. This class does no manipulation or
calculations of data. The only methods available are simple getters.
The “Ingredient” class defines an ingredient by name, quantity, unit of measure (an
enumeration defined as Unit, lbs, tsp, tbsp, oz, cup, kg, g, ml, and l), and a Boolean
indicating whether the item is a “pantry” item or not. No calculations are performed in
this class. Methods consist of simple getters, and a method to convert the ingredient to a
string based on the layout of the Ingredients.csv file (quantity, unit and name separated
by tab characters) for use in the IO class.
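As a concrete illustration, a minimal C# sketch of these two data classes is given below. The property names, constructor signatures, and the numeric values of the Meal enumeration (matching the 0–4 encoding described later for Recipes.csv) are assumptions based on the description, not the project's actual source.

```csharp
// Hedged sketch of the Recipe and Ingredient data classes described above. Property names,
// constructor signatures, and the numeric Meal values (matching the 0-4 encoding used in
// Recipes.csv, described in the Implementation section) are assumptions.
using System.Collections.Generic;

public enum Meal { Breakfast = 0, Lunch = 1, Dinner = 2, Snack = 3, Side = 4 }
public enum Unit { Unit, Lbs, Tsp, Tbsp, Oz, Cup, Kg, G, Ml, L }

public class Ingredient
{
    public string Name { get; }
    public double Quantity { get; }
    public Unit UnitOfMeasure { get; }
    public bool IsPantry { get; }               // pantry items are not re-purchased every week

    public Ingredient(string name, double quantity, Unit unit, bool isPantry)
    {
        Name = name; Quantity = quantity; UnitOfMeasure = unit; IsPantry = isPantry;
    }

    // Layout described in the text: quantity, unit and name separated by tab characters.
    public string ToFileString() => $"{Quantity}\t{UnitOfMeasure}\t{Name}";
}

public class Recipe
{
    public string Name { get; }
    public Meal MealType { get; }
    public int Calories { get; }
    public double Carbs { get; }                // grams
    public double Protein { get; }              // grams
    public double Fat { get; }                  // grams
    public bool IsActive { get; }
    public double Cost { get; }                 // plain double, not tied to any currency
    public List<Ingredient> Ingredients { get; }

    public Recipe(string name, Meal meal, int calories, double carbs, double protein,
                  double fat, bool isActive, double cost, List<Ingredient> ingredients)
    {
        Name = name; MealType = meal; Calories = calories; Carbs = carbs;
        Protein = protein; Fat = fat; IsActive = isActive; Cost = cost;
        Ingredients = ingredients;
    }
}
```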
The “Day” class defines the “Day” object, which contains a recipe for breakfast,
lunch and dinner, a list of recipes for snacks, an enumeration for the day of the week,
and a Boolean indicating if it is a “good” day i.e., meets all of the criteria as defined
by the user, minus any criteria removed from calculation consideration. This class does not modify data outside itself. The methods consist of simple getters/setters,
and methods that total the calories, carbohydrates, protein, fat, and cost of the recipes
held by the Day object.
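A corresponding sketch of the Day object, with the totaling helpers described above, might look as follows. Member names are again assumptions, and the Side property is not listed in the text; it is added only so that the dinner-side logic discussed later has somewhere to store the selected side dish.

```csharp
// Hedged sketch of the Day class; member names are assumptions based on the description.
using System;
using System.Collections.Generic;
using System.Linq;

public class Day
{
    public DayOfWeek Weekday { get; set; }      // built-in enum used as a stand-in
    public Recipe Breakfast { get; set; }
    public Recipe Lunch { get; set; }
    public Recipe Dinner { get; set; }
    public Recipe Side { get; set; }            // assumption: not listed in the text
    public List<Recipe> Snacks { get; } = new List<Recipe>();
    public bool Good { get; set; }              // true when the day meets all remaining criteria

    // Every recipe currently assigned to the day.
    private IEnumerable<Recipe> AllRecipes() =>
        new[] { Breakfast, Lunch, Dinner, Side }.Where(r => r != null).Concat(Snacks);

    public int TotalCalories() => AllRecipes().Sum(r => r.Calories);
    public double TotalCarbs() => AllRecipes().Sum(r => r.Carbs);
    public double TotalProtein() => AllRecipes().Sum(r => r.Protein);
    public double TotalFat() => AllRecipes().Sum(r => r.Fat);
    public double TotalCost() => AllRecipes().Sum(r => r.Cost);
}
```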
The “Preferences” class defines the user preferences of the application. It holds max
calories, max carbohydrates, max protein, max fat, max weekly cost, the ranking of the
aforementioned criteria, a Boolean indicating whether the user would like to remove the
previous week’s dinners from the list of options (preventing a recipe from appearing more
than once every other week to increase variety), and a Boolean indicating whether the
user would like to include dinner side dishes in their meal plan. Beyond the simple getter
and setter methods, this class provides the mechanism to both read in the preferences
from the “Preferences.txt” file, and to save updated preferences back into the same file.
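A hedged sketch of such a Preferences class is shown below. The paper does not specify the on-disk layout of Preferences.txt, so the simple key=value format used here is purely an assumption, as are the property names.

```csharp
// Hedged sketch of the Preferences class. The on-disk layout of Preferences.txt is not
// specified in the paper, so the "key=value" format used here is an assumption.
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class Preferences
{
    public double MaxCalories { get; set; }
    public double MaxCarbs { get; set; }
    public double MaxProtein { get; set; }
    public double MaxFat { get; set; }
    public double MaxWeeklyCost { get; set; }
    public List<string> CriteriaRanking { get; set; } = new List<string>(); // least to most important
    public bool RemoveLastWeeksDinners { get; set; }
    public bool UseDinnerSides { get; set; }

    public static Preferences Load(string path)
    {
        // Read "key=value" lines into a dictionary and map them onto the properties.
        var values = File.ReadAllLines(path)
                         .Where(l => l.Contains("="))
                         .ToDictionary(l => l.Split('=')[0].Trim(), l => l.Split('=')[1].Trim());
        return new Preferences
        {
            MaxCalories = double.Parse(values["MaxCalories"]),
            MaxCarbs = double.Parse(values["MaxCarbs"]),
            MaxProtein = double.Parse(values["MaxProtein"]),
            MaxFat = double.Parse(values["MaxFat"]),
            MaxWeeklyCost = double.Parse(values["MaxWeeklyCost"]),
            CriteriaRanking = values["CriteriaRanking"].Split(',').ToList(),
            RemoveLastWeeksDinners = bool.Parse(values["RemoveLastWeeksDinners"]),
            UseDinnerSides = bool.Parse(values["UseDinnerSides"])
        };
    }

    public void Save(string path)
    {
        File.WriteAllLines(path, new[]
        {
            $"MaxCalories={MaxCalories}",
            $"MaxCarbs={MaxCarbs}",
            $"MaxProtein={MaxProtein}",
            $"MaxFat={MaxFat}",
            $"MaxWeeklyCost={MaxWeeklyCost}",
            $"CriteriaRanking={string.Join(",", CriteriaRanking)}",
            $"RemoveLastWeeksDinners={RemoveLastWeeksDinners}",
            $"UseDinnerSides={UseDinnerSides}"
        });
    }
}
```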
The “IO” class handles the input and output processing necessary for the application,
including manipulation of the data into output-ready format. The “IO.CSVRecipeInput”
method reads the “Recipes.csv” file and parses out the columns to create a list of Recipe
objects, including converting Meal integers to the proper enumeration value (0 = Break-
fast, 1 = Lunch, 2 = Dinner, 3 = Snack, 4 = Side) and converting “TRUE” or “FALSE”
text to a Boolean for “isActive”. The contents of the Recipes.csv are shown in Fig. 1
with example recipes.
For each recipe it reads in, it calls “IO.IngredientInput”, which returns a list of ingre-
dients associated with the recipe. These ingredients are stored in the “Ingredients.csv”
file, which contains all the necessary information required for an Ingredient object,
including the name of the recipe it is associated with, the name of the ingredient, the
quantity, the unit, and a Boolean value indicating if it is a pantry item. Figure 2 shows
the Ingredients.csv file with the ingredients for several recipes. Even simple foods
consumed as single ingredients (such as coffee) can be entered to be included in the
meal plan. Additional ingredients (such as coffee creamer) can be added to the “Coffee”
recipe to ensure proper calculations.
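The following sketch illustrates how this input processing could be written. The exact column order and separators of Recipes.csv and Ingredients.csv are inferred from the description and are therefore assumptions, as is the class name IOHelper used here in place of the paper's IO class.

```csharp
// Hedged sketch of the CSV input routines; column order and separators are assumptions.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class IOHelper   // stand-in for the paper's "IO" class
{
    public static List<Recipe> CSVRecipeInput(string recipesPath, string ingredientsPath)
    {
        var recipes = new List<Recipe>();
        foreach (var line in File.ReadAllLines(recipesPath))
        {
            var cols = line.Split(',');
            // Assumed column order: name, meal, calories, carbs, protein, fat, cost, isActive
            var name = cols[0];
            var meal = (Meal)int.Parse(cols[1]);                 // 0 = Breakfast, 1 = Lunch, ...
            var isActive = cols[7].Trim().ToUpper() == "TRUE";   // "TRUE"/"FALSE" text to bool
            recipes.Add(new Recipe(name, meal,
                int.Parse(cols[2]), double.Parse(cols[3]), double.Parse(cols[4]),
                double.Parse(cols[5]), isActive, double.Parse(cols[6]),
                IngredientInput(ingredientsPath, name)));
        }
        return recipes;
    }

    // Returns the ingredients whose first column matches the given recipe name.
    public static List<Ingredient> IngredientInput(string ingredientsPath, string recipeName)
    {
        return File.ReadAllLines(ingredientsPath)
                   .Select(l => l.Split('\t'))   // assumed tab-separated, per the text
                   .Where(c => c[0] == recipeName)
                   // Assumed columns: recipe, ingredient, quantity, unit, isPantry
                   .Select(c => new Ingredient(c[1], double.Parse(c[2]),
                                               (Unit)Enum.Parse(typeof(Unit), c[3], true),
                                               c[4].Trim().ToUpper() == "TRUE"))
                   .ToList();
    }
}
```

In the real application, the file layouts follow the examples shown in Figs. 1 and 2 rather than the column order assumed here.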
CalculateDay Method
Calculator.CalculateDay is the primary method where most calculations are completed.
For each day of the week, it calculates a meal plan meeting the user’s criteria. If a meal
cannot be found that meets the criteria, it will start removing criteria from the calculations
by setting the threshold variable to an absurdly high amount of 500,000 cal, and 5000
each for carbs, protein, fat, and cost.
Figure 3 shows the entry to the CalculateDay method, which takes in the Day object
of the day of the week being calculated.
After reading the preference values and ensuring that there are remaining dinner
recipes to choose from, the CalculateDay method begins with calculating dinner. Dinner
is usually the largest meal of the day for most American families as parents work during
the day and children are typically in school.
As the largest meal, it was decided that it should be entered first as the other, smaller
meals would be easier to fit around the criteria values remaining after choosing dinner.
As dinner is the first meal, the method does not need to check if the recipe calories and
macros will fit. It is assumed that the user will not input a dinner whose calories, carbs,
protein and fat are higher than the daily maximum.
If Day.Dinner is not null, it indicates that the user utilized the “Set Meal” functionality
to set a specific recipe for that day’s dinner.
If the Day’s object has a “null” dinner recipe, the dinner is chosen by calling “Se-
lectMeal”, which takes a list of recipes as input, selects a random index, and returns
the recipe in that index. After a recipe is chosen, the recipe is removed from the list of
dinners to prevent duplicates appearing in a single week.
After the dinner is removed, UpdateRemaining is called, which passes in the recipe,
and as output variables passes the variables containing the remaining calories, carbs,
protein, and fat. UpdateRemaining subtracts the recipe’s values from the totals remaining
for the day, as well as subtracting the cost from the remainingWeeklyCost variable, which
is held at the class level (and therefore does not need to be passed in or out of the method).
With dinner set, and again assuming that the user would not enter a dinner recipe with
calorie, carb, protein, or fat values that exceed the daily target, the Day.Good Boolean
is set to True indicating that the recipe fits in the day.
After the dinner is set, if there is at least one Side recipe, and the user has selected to
enable using dinner sides in the meal plan, the side dish is selected via SelectMeal, and
then UpdateRemaining is called. Again, no check against the criteria totals is done as it
is highly unlikely that a dinner plus side dish would consume all calories, carbs, protein
and fat for the day.
The code for the dinner portion of the algorithm is shown in Fig. 4.
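Since Fig. 4 is not reproduced here, the sketch below illustrates the dinner portion of the logic as a stand-alone helper inside the calculator class. The SelectMeal and UpdateRemaining helpers and the class-level remainingWeeklyCost field are assumed to behave as described in the text; all signatures are assumptions.

```csharp
// Hedged sketch of the dinner portion of CalculateDay (cf. Fig. 4), written as a helper
// for readability. SelectMeal, UpdateRemaining and remainingWeeklyCost are assumed to
// exist as described in the text; the signatures are assumptions.
private void CalculateDinner(Day day, List<Recipe> dinners, List<Recipe> sides,
                             Preferences prefs,
                             ref double remCal, ref double remCarbs,
                             ref double remProtein, ref double remFat)
{
    // If the user already fixed a dinner via "Set Meal", keep it; otherwise pick one at random.
    if (day.Dinner == null)
    {
        day.Dinner = SelectMeal(dinners);   // returns the recipe at a random index
    }
    dinners.Remove(day.Dinner);             // dinners are unique within a single week

    // Subtract the dinner's values from what remains for the day; UpdateRemaining also
    // subtracts the cost from the class-level remainingWeeklyCost field.
    UpdateRemaining(day.Dinner, ref remCal, ref remCarbs, ref remProtein, ref remFat);

    // A single dinner is assumed never to exceed the daily maxima, so the day is good so far.
    day.Good = true;

    // Optionally add a side dish; again, no criteria check is performed here.
    if (prefs.UseDinnerSides && sides.Count > 0)
    {
        day.Side = SelectMeal(sides);
        UpdateRemaining(day.Side, ref remCal, ref remCarbs, ref remProtein, ref remFat);
    }
}
```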
Lunch logic is summarized below (a sketch of the selection loop follows the list). As the algorithm moves into the lunch selection, the program needs to start checking whether the remaining meals fit within the criteria for the day when their values are added to those of the dinner selection. If the user added the lunch manually, the following process is skipped.
• A Boolean “lunchOk” is used to track whether the lunch fits into the meal plan criteria
for the day. A second list is created so recipes that do not fit can be removed. Currently,
only dinners need to be unique, so the lunch recipes are not removed from
the main list of lunch recipes after selection, allowing them to be repeated during the
week. If the lunch was not set by the user, after randomly selecting the lunch recipe
via SelectMeal, the RecipeOk method is called, passing in the recipe and remaining
criteria values. RecipeOk checks if the remaining values minus the values of the
selected lunch recipe are greater than zero, and sets “lunchOk” to the returned value.
• If the lunch fits, UpdateRemaining is called, the Day’s Lunch variable is set to the
chosen recipe, the day is marked as good, and the algorithm breaks from the loop. If
the lunch does not fit, the Day.Good value is set to false, and the recipe is removed
from the list of possible lunches for that day.
• If the “todayLunches” list gets emptied due to no lunch fitting in the criteria, the
CalculateDay method is exited, returning to the Calculate method, which then calls
UpdateCriteria to remove the least important remaining criterion from the list of criteria,
and attempts to calculate the day again.
• As the Dinner reference on the day is already set, it will try to find a lunch that now
fits based on the updated criteria (where the variable tracking the max threshold for the least important remaining criterion is set to an amount high enough to always return TRUE when any recipe's values are subtracted from it).
• If the lunch is already set, the values are subtracted from the remaining totals, and the
Day.Good value is set to TRUE.
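The sketch below illustrates this lunch-selection loop. Helper names follow the text, while the signatures and the return convention (false signalling that the caller should relax a criterion and retry) are assumptions.

```csharp
// Hedged sketch of the lunch-selection loop; helper names follow the text, signatures and
// the boolean return convention are assumptions.
private bool CalculateLunch(Day day, List<Recipe> lunches,
                            ref double remCal, ref double remCarbs,
                            ref double remProtein, ref double remFat)
{
    if (day.Lunch != null)   // lunch fixed by the user via "Set Meal": just book its values
    {
        UpdateRemaining(day.Lunch, ref remCal, ref remCarbs, ref remProtein, ref remFat);
        day.Good = true;
        return true;
    }

    // Work on a copy so lunches can still repeat on other days of the week.
    var todayLunches = new List<Recipe>(lunches);
    while (todayLunches.Count > 0)
    {
        Recipe candidate = SelectMeal(todayLunches);
        bool lunchOk = RecipeOk(candidate, remCal, remCarbs, remProtein, remFat);
        if (lunchOk)
        {
            UpdateRemaining(candidate, ref remCal, ref remCarbs, ref remProtein, ref remFat);
            day.Lunch = candidate;
            day.Good = true;
            return true;                    // lunch found, leave the loop
        }
        day.Good = false;
        todayLunches.Remove(candidate);     // this lunch cannot fit today, try another
    }

    // No lunch fits: the caller (Calculate) relaxes the least important criterion and retries.
    return false;
}
```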
The maximum number of snacks per day is three, leaving a possible range of snacks
chosen from zero to three.
First, a list of snacks to hold the snacks chosen for the day is created. Then, a
temporary list of snacks duplicates the snacks list to avoid deletion of snacks from
the main snacks list, similar to todayLunches and todayBreakfasts lists used in their
respective portions of the code. Snacks can not only be repeated day-to-day, but can be
repeated within the day as well.
If any snacks have already been set by the user in the “Set Meal” interface, those
snacks update the total remaining values for the criteria, and are added to the “daySnacks”
list. While there are snacks in the “tempSnacks” list and fewer than three snacks in the “daySnacks” list, all snacks remaining in “tempSnacks” are checked to see whether they are still acceptable (i.e., would not overshoot the criteria). Any snacks that no longer fit after each snack is added are removed from the list.
This prevents any snack that does not fit from being chosen for the list, and the loop exits if the candidate list is exhausted prior to reaching the three-snack
maximum. As this processing occurs before the snack is randomly selected, the selected
snack does not need to be checked again prior to adding it to the list.
After a snack is chosen, the remaining criteria values are updated, the snack is added
to the list of snacks for the day, and the process is completed. This continues until either
there are no snacks that will fit the remaining criteria for the day, or until the maximum
three snacks are reached.
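A sketch of this snack-selection step is shown below; again, the names follow the description and everything else (signatures, exact control flow) is an assumption.

```csharp
// Hedged sketch of the snack-selection step; the real implementation may differ.
private void CalculateSnacks(Day day, List<Recipe> snacks,
                             ref double remCal, ref double remCarbs,
                             ref double remProtein, ref double remFat)
{
    // Snacks already fixed via "Set Meal" are booked against the remaining criteria first.
    foreach (Recipe preset in day.Snacks)
        UpdateRemaining(preset, ref remCal, ref remCarbs, ref remProtein, ref remFat);

    // Copy the list so snacks may repeat across days and even within the same day.
    var tempSnacks = new List<Recipe>(snacks);

    while (tempSnacks.Count > 0 && day.Snacks.Count < 3)
    {
        // Local copies, because ref parameters cannot be captured by the lambda below.
        double cal = remCal, carbs = remCarbs, protein = remProtein, fat = remFat;

        // Drop every snack that would overshoot the remaining criteria *before* selecting,
        // so the selected snack never needs to be re-checked.
        tempSnacks.RemoveAll(s => !RecipeOk(s, cal, carbs, protein, fat));
        if (tempSnacks.Count == 0)
            break;                          // nothing fits any more

        Recipe snack = SelectMeal(tempSnacks);
        UpdateRemaining(snack, ref remCal, ref remCarbs, ref remProtein, ref remFat);
        day.Snacks.Add(snack);
    }
}
```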
Reroll Method
The Reroll method is called when the user is unhappy with the dinner choices for a given
day and wants to generate a new random choice.
To start, the method checks if the list of dinner recipes is empty. This occurs when
the user rerolls more times than they have dinner recipes, and so no recipe is left to be
chosen. The error message prompts the user to exit the application and try again with
the list of dinners reloaded.
If the list of dinner recipes is not empty, the program ensures that the chosen recipe
is removed from the list. It should have been removed when the dinner was set, but a
simple call to List.Remove verifies that it is no longer in the list of options.
A new dinner is chosen via the SelectMeal recipe, and that recipe is removed from the
list of potential dinners. Before that meal is set to the Day.Dinner property, the program
first finds the remaining allotment for calories, carbs, protein, and fat by taking the max
threshold value from the preferences object, subtracting the amount from the day (which
is totaled across all recipes held by that day), and adding back the values of the dinner
recipe being changed. The same is also done with cost, but the arithmetic is simpler as
it is held at the weekly level as a class variable. As such, only the cost of the changing
dinner recipe need be added, without the need to query the Day object.
After these variables have been calculated, the RecipeOk method is called to verify
the dinner will work within the confines of the remaining threshold values. If it fits the
requirements, the recipe is set to the Day.Dinner property.
This entire process is repeated until either the list of dinner recipes is exhausted, or
a suitable replacement dinner recipe is found and is added to the meal plan.
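The following sketch illustrates the Reroll logic; ShowError is a placeholder for the error message described above, and all signatures are assumptions based on the description.

```csharp
// Hedged sketch of the Reroll method; member names follow the text, ShowError is a
// placeholder, and the signatures are assumptions.
public bool Reroll(Day day, List<Recipe> dinners, Preferences prefs)
{
    if (dinners.Count == 0)
    {
        // The user has rerolled more times than there are dinner recipes.
        ShowError("No dinner recipes left; restart the application to reload the dinner list.");
        return false;
    }

    Recipe oldDinner = day.Dinner;
    dinners.Remove(oldDinner);   // normally already removed; List.Remove simply verifies it

    while (dinners.Count > 0)
    {
        Recipe candidate = SelectMeal(dinners);
        dinners.Remove(candidate);

        // Remaining allotment = daily maximum - what the day already uses + the dinner being replaced.
        double remCal = prefs.MaxCalories - day.TotalCalories() + oldDinner.Calories;
        double remCarbs = prefs.MaxCarbs - day.TotalCarbs() + oldDinner.Carbs;
        double remProtein = prefs.MaxProtein - day.TotalProtein() + oldDinner.Protein;
        double remFat = prefs.MaxFat - day.TotalFat() + oldDinner.Fat;
        // Cost is tracked weekly at the class level, so only the old dinner's cost is added back.
        double remCost = remainingWeeklyCost + oldDinner.Cost;

        if (candidate.Cost <= remCost &&
            RecipeOk(candidate, remCal, remCarbs, remProtein, remFat))
        {
            day.Dinner = candidate;
            return true;         // suitable replacement found
        }
    }
    return false;                // dinner list exhausted without finding a replacement
}
```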
SetMeal Method
The SetMeal method allows the user to set a specific recipe to a specific meal on a specific
day. It takes the meal, recipe name, and the day (as an int to convert to the enumeration).
This method is called from the GUI, and the user’s choices are passed in. If it finds the
recipe in the list of recipes, it sets the recipe to the appropriate meal on the appropriate
day and returns true. If the recipe is not found, the method returns false.
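A minimal sketch of SetMeal might look as follows; the parameter types and the week array are assumptions.

```csharp
// Hedged sketch of SetMeal; parameter types and the week array are assumptions.
public bool SetMeal(Meal meal, string recipeName, int dayIndex, List<Recipe> recipes, Day[] week)
{
    Recipe recipe = recipes.Find(r => r.Name == recipeName && r.MealType == meal);
    if (recipe == null)
        return false;                           // recipe not found

    Day day = week[dayIndex];                   // the int selects the day of the week
    switch (meal)
    {
        case Meal.Breakfast: day.Breakfast = recipe; break;
        case Meal.Lunch: day.Lunch = recipe; break;
        case Meal.Dinner: day.Dinner = recipe; break;
        case Meal.Snack: day.Snacks.Add(recipe); break;
        default: day.Side = recipe; break;
    }
    return true;
}
```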
Implementation
This section will review the implementation details of the Meal Planner tool. First, the
GUI is described, and each of the features available on the GUI is described. Following
the GUI description, a review of the algorithms used to generate the meal plan for the
user is provided.
The user interface is simple. The program starts up with several options to the user.
“Shopping” (displays shopping list), “Reroll Dinner” (replace dinner on a given day
with a randomly generated alternative) and “Save Plan” (saves text file to a user-selected
location) are all disabled.
These options require a meal plan to have been generated and are disabled until after the meal plan is generated, to avoid user confusion.
The GUI starts with the following options available to the user: “Generate” (generates
the meal plan and outputs it to the screen), “Set Meal” (allows the user to set a specific
meal), “Preferences” (allows the user to set and update preferences), “Add Recipe”
(allows user to add a new recipe to the planner), and “Update Active/Inactive Recipes”
(allows user to reactivate or deactivate selected recipes). (Fig. 5).
The “Preferences” GUI allows the user to set different options that impact the criteria
the system uses. The user can update their preferred maximum value for their calories,
carbohydrates (carbs), protein, fat, and cost. Once the user has the required number of
recipes available, the “Generate” button will activate the calculator and will output a
meal plan to the text box located on the main GUI (Fig. 6).
Figure 7 shows how a meal plan appears to the user after it is generated. First, any
criteria that had to be removed in order to generate the plan are listed for the user so they can see which criteria were ignored.
Next, the total cost of the ingredients for all of the week's recipes is displayed. Then each day's plan is shown: the name of the day and a breakdown of the nutritional criteria for that day are provided, allowing the user to see their nutritional totals immediately.
Each meal is then listed, including any snacks that were added to the day (max three,
assuming the snacks fit the criteria). If “Use Dinner Sides” is checked, the meal plan
will list dinner as “Dinner <recipe> and: <side>” (see Fig. 8).
4 Future Work
This work identified a few features that were out of scope for this project but should be added to
increase usability. The ability to check off ingredients as the user purchased them would
make tracking the shopping easier. Unit conversions to condense shopping list entries
would make the shopping list more readable (e.g., “1.25 cups of flour” instead of “1 cup
flour, 4 tbsp flour”). The ability to edit recipes would allow for tweaks without requiring the user to re-enter a recipe under a new name and deactivate the previous one, or to edit the recipe file directly. Finally, converting the tool into a mobile application would make it easier for the user to access their meal plans from anywhere via their smartphone.
Applications such as this are highly extensible. There are many ways of using this app, and as such many new features could be added to support the needs of the end user. This section describes some of the features that could be added, but it is by no means an exhaustive list:
• We imagine statistical and deep learning techniques could also be used to add new
features based on possible patterns observed in connecting meals to possible future
health predictions.
• Adding tags that could categorize recipes and allow certain meals to be restricted to
certain tags would allow users to tailor their meal plan to different goals, audiences, and
dietary needs. For example, a “Vegetarian” tag could be used for “Meatless Mondays”.
“Meatless Mondays” is a popular way to reduce a family’s carbon footprint and meat
consumption without going entirely vegetarian. For blended families where custody of
children is shared with another parent, a “Kids” tag could be used for meals when the
children were with the user. “Vegan” or “Allergen” tags could be used to accommodate
friends, family or other guests who have dietary restrictions that the user only needs
to follow when those guests join for dinner.
• Some families may consistently have leftovers, so having an option to plan on leftovers
for lunch the next day would reduce waste. This is especially useful if a limited number
of family members are tracking their nutrition (such as one member monitoring what
they eat for weight loss).
• Providing connection to other apps and services would allow users to synchronize their
usage across apps. Recipes could be imported from tools used to calculate nutritional
value of a recipe. Meal plans could be sent directly to food tracking applications.
Shopping lists could be sent directly to the grocery store app or website to auto-
populate the online shopping cart. Integration with digital assistants, such as Alexa
or Siri would allow the user to use voice commands to generate, order, and save meal
plans. Commands such as asking "What's for Dinner Tonight?" could be added to allow
for easy reminders without needing to refer to the saved meal plan text file. New
“smart” refrigerators can be integrated to automatically check for items such as milk,
heavy cream, etc. In time, the ability to use that same camera and food identification
technology could be added to pantries and cupboards to check the stock of all items.
• Providing a means for user accounts to communicate. Allowing users to trade recipes
or share meal plans within their network of friends and family could also be added.
• Although we have tested the working of the program ourselves, quantitative and
qualitative testing with several groups of users would be beneficial in the future.
5 Conclusion
This project addresses a modern family’s need to save time and energy, while still
planning healthy meals that meet health and fitness goals. Modern families can struggle
with the age-old question “What’s for dinner?”.
Typically, one-button meal plan generators push their proprietary recipes and diets
onto the user. Tools that allow the user to enter their own recipes do not provide
one-click meal planning; they rely on the user to plan meals manually, while storing
the plan and offering conveniences such as one-click shopping.
No tool was found that automatically generates a meal plan based on user-defined
recipes and foods. The tool presented here allows users to plan their favorite recipes
and foods while still keeping to their nutritional goals.
This tool allows users to generate nutritional goal-driven meal plans quickly and
easily, without requiring the user to manually calculate if their goals are being met. If
goals cannot be met with the foods available, it removes the least important criteria. Final
totals are provided to the user, allowing them to validate their meal plan themselves.
Acknowledgment. This program was developed by the first author under the supervision of the
second author to satisfy part of the MS project requirement and is based on [5]. Both authors would
like to thank the MS project committee members and the reviewers of the FTC 2022 conference,
whose insightful comments improved our work. Thank you.
References
1. Bureau of Labor Statistics: Bureau of Labor Statistics, 5 March 2022. https://www.bls.gov/opub/mlr/2020/article/comparing-characteristics-and-selected-expenditures-of-dual-and-single-income-households-with-children.htm
2. Mealime: Mealime, 5 March 2022. https://www.mealime.com/
3. Paprika: Paprika, 5 March 2022. https://www.paprikaapp.com/
4. Whisk: Whisk, 5 March 2022. https://whisk.com/
5. Lyons-Rocque, C.: Automated meal planner using multiple user-defined benchmarks for
healthy living, MS project, advisor: Dr. Sudhanshu Semwal, Department of Computer Science,
University of Colorado, Colorado Springs, pp. 1–61, 27 March 2022
A Smart Healthcare Framework: Opportunities
for Integrating Emerging Technologies (5G, IoT,
AI, and GIS)
1 Introduction
With the continued advancement in information and communications technology (ICT),
the world has become digitally transformed and is moving towards a connected society. A
connected society is a much-discussed topic in the research community. Citizen science
discusses the connected society in multiple dimensions. In political science, the connected
society is discussed as egalitarian empowerment, bringing power to people for engagement
in progressive politics [1]. In geography, it is interpreted as the spatial and topological
connectivity of objects that people require for well-being and good living. In economics,
it is discussed in terms of distributing wealth among people. In science and technology,
however, the connected society is interpreted as the manner in which information enables
society to enrich its knowledge recovery process, allowing the pursuit of higher living
standards and greater social rights. Social connectivity is indeed for the
The main contributions of this work-in-progress paper to literature and practice are
threefold. First, it discusses current and potential applications of collaborative, interconnected
technological solutions to improve healthcare activities and services, taking into account
three different dimensions: healthcare stakeholders, local-level computing, and regional-level
computing services. Second, it proposes a comprehensive model of smart healthcare that
converges IoT, AI, GIS, and 5G and that may be used to guide researchers through a
comprehensive solution development process. Third, the proposed model can also be used
by solution providers in the healthcare sector to facilitate their business development.
The remainder of this paper is structured as follows. First, a literature review and
description of each of the technologies (AI, IoT, 5G, and GIS) is provided. Next, a
model for the convergence of 5G, IoT, AI, and GIS for smart healthcare is presented.
A use case is then presented to demonstrate the proposed model, providing scenarios
and examples from the scholarly literature of the individual technologies and their
integration to support smart healthcare. Finally, expert opinions of the model's value
and utility are summarized.
2 Literature Review
2.1 Artificial Intelligence
AI is today one of the most discussed buzzwords in the technology industry. It refers to
a self-learning system based on data. It augments the human decision-making process
by combining the experiences or observations of a human brain with machine-learned
patterns from data. The two main components of AI are data and mathematical algorithms.
The data can be structured or unstructured. AI algorithms are classified into supervised,
unsupervised, and reinforcement learning processes based on how the algorithm learns
patterns from data. A supervised learning algorithm uses human-labeled data for training.
In the unsupervised learning process, the algorithm creates labels from the data and
trains itself. The third class, reinforcement learning, uses iterative interactions between
an agent and its environment, in which each action changes the state and returns either
a reward or a penalty. Through this repeated cycle of actions and rewards/penalties,
the agent learns the environment [1].
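As a minimal illustration of the agent-environment loop just described (our own sketch, not drawn from the cited sources), the following Python fragment implements tabular Q-learning on a toy five-state corridor: each action changes the state, the environment returns a reward or penalty, and repeated iterations let the agent learn which actions lead to the goal.

import random

n_states, actions = 5, [-1, +1]          # toy 1-D corridor; the goal is state 4
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # explore occasionally, otherwise act greedily on the current estimates
        a = random.choice(actions) if random.random() < epsilon else max(actions, key=lambda x: Q[(s, x)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else -0.01   # reward at the goal, small penalty otherwise
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
        s = s_next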
The patient-centric treatment model has been discussed in the industry for a decade.
Technological barriers to fully implementing the patient-centric model have yet to be
overcome. One of the main characteristics of the patient-centric model is the real-time
information sharing process. In a real-time decision-making process for mission-critical
applications such as healthcare and for the development of an intelligent connected
society for the knowledge sharing process, AI will be part of a technological ecosystem.
For instance, chronic disease patients’ data would be available from patient medical
records at a hospital. The hospital can develop an AI tool to identify the patterns from
the data to support patients and treatments. However, such information and decisions
are based on a clinical aspect. Assuming a shared social platform between patients and
hospitals, IoT devices (e.g., wearable sensors, drones, and mobile phones) automatically
capture patient health information and share it through a technology platform.
AI-based real-time analytics can provide new information to patients directly by perform-
ing data analytics on the collated data from medical records, patient-level time-series
data, patient-to-patient communication data, and the geolocation aspects of patients. For
this type of technological integration, computing and communication infrastructure rep-
resents a limitation to instant knowledge sharing geographically. An enhanced wireless
communication platform with cloud computing and AI can provide predictive knowl-
edge to a patient at risk. A 5G wireless network can reduce the gap in the real-time
data delivery process from patients to the computing platform and from the computing
platform to patients.
5G is defined as "an end-to-end ecosystem to enable a fully mobile and connected society. It
empowers value creation towards customers and partners, through existing and emerging
use cases delivered with consistent experiences and enabled by sustainable business
models" [9:1].
The 5G infrastructure provides an ecosystem for every internet-enabled device
and can be configured in a state-of-the-art fashion, including a network-as-a-service
for infrastructure efficiencies. The core parts of 5G are Software Defined Networking
(SDN), Network Function Virtualization (NFV), and cloud computing. SDN and NFV
provide capabilities for logically virtualizing the network (also known as network slicing).
SDN is agile and flexible, and its programmatically configurable data and control planes
enable operators to provide faster infrastructure services. NFV implements network
functions as software that can dynamically scale resources. 5G is capable of providing
three generic services: enhanced mobile broadband (eMBB), massive machine-type
communications (mMTC), and ultra-reliable and low-latency communications (URLLC)
for mission-critical communication [9, 11]. These applications suggest new performance
criteria for latency, reliability, connection and capacity density, spectral efficiency, energy
efficiency, and peak throughput that need to be addressed by 5G technology [11]. The
non-standalone architecture with the eMBB standard [12] and the standalone architecture
version of the 5G New Radio (5G NR) standards [13] provide the guidelines for 5G
implementation. Figure 1 depicts the different requirements of the 5G wireless network [9].
5G users can access the service from anywhere at any time with a data rate of 1 Gbps
and a latency of 1 ms, together with quality of service and user experience guarantees [9].
The spectrum band is less than 6 GHz. The 5G network is expected to provide faster data
rates, higher connection density, and lower latency. Device-to-device communication,
lower battery consumption, and improved wireless coverage are also characteristics of
the 5G communication system. The maximum speed of the 5G system is expected to be
over 20 times faster than that of a 4G communication system [6]. It is estimated that 5G
can support one billion devices in 1 sq. km and a mobile device moving at a speed of
500 km/h. The reliability and availability of the services are expected to be up to 99.99%
[9]. The 5G communication platform can overcome many limitations in the healthcare
automation process by providing unified communication platforms. The low latency and
high throughput of 5G, along with fog/cloud computing and AI analytics, can enable a
smarter healthcare platform to share mission-critical information.
Spatial sciences and ICT are increasingly pervasive; they are used to uncover hidden
questions related to society and to find solutions to many unresolved social problems,
and they are becoming popular in the knowledge recovery process. Studies have shown
that 80% of the data from various sources (i.e., big data) available today can be georeferenced
[14]. Georeferenced data can be associated with a geographic context.
A new interdisciplinary field, Geospatial Artificial Intelligence (GeoAI), which learns
and predicts or forecasts an event in a geographic frame of reference, has emerged for
knowledge recovery using spatial analysis with the integration of AI. For example, in the
field of epidemiology, the disease spread prediction model was developed using GeoAI
algorithms [14].
GeoAI can analyze environmental exposure and develop exposure modeling related
to health. Many studies have shown that air pollution, including PM2.5, NO2, SO2, and
PM10, influences the mortality and hospitalization of CHF patients [15] and increases
the number of COPD, cancer, and CHF cases in urban and sub-urban areas [16–19].
IoT plays a role in geolocation services and the information-gathering process. In
the context of healthcare, information such as patient locations, hospital and caregiver
locations, the routing and tracking of ambulances, time-series and static environmental
factors of a location, and patients' geographic movements is important. At the same
time, sharing and processing this information in real-time applications is very processing-
intensive. The technologies of 5G, IoT, and AI contribute to transferring and processing
geolocations with environmental factors in real-time scenarios to create smart healthcare.
scheme optimization for allocating resources to different users who share the network.
The 5G network is deployed in a Multiple Input Multiple Output (MIMO) framework,
in which AI technology can be implemented for channel optimization and minimization
of the detection error rate [7].
Machine to Machine (M2M) and device to device (D2D) communications are char-
acteristics of IoT devices. Billions of addressable devices on the network require massive
amounts of data transfer through the network [12]. This will pose a challenge to inter-
net and mobile communications in terms of location refresh and network congestion.
The efficient throughput and low latency of 5G will augment the effectiveness of M2M
communication. This calls for a need to have a comprehensive model to facilitate the
development of such collaborative, interconnected solutions.
In this section, we propose such a model to support current and future researchers
and practitioners in designing ICT-based solutions. Figure 2 illustrates the proposed
model for smart healthcare platforms under a 5G network with IoT, AI, and GIS
technologies in a connected society. In this platform, patients, hospitals, caregivers, medical
equipment, and healthcare infrastructures are equipped with IoT or connected to the 5G
network. The network is sliced and optimized using technologies such as SDN, NFV,
fog/edge computing, and the cloud. D2D, M2M, and URLLC links capture and compute
the information without manual intervention or patient knowledge. Health-related
feedback is passed on to the patient, caregivers, and/or medical equipment (e.g., a
drone or an ambulance) for immediate assistance. Such a system can increase the quality
of life (QoL) of the patient, lessen the burden on healthcare facilities, and reduce overall cost.
The model illustrates a geography-centric approach to the solution design. Spatial
information is becoming increasingly important for efficient infrastructure management
and resource location prediction models. The density of infrastructure and resources (e.g.,
healthcare providers, 5G communication towers) is directly proportional to the population
of an area. The population can be better characterized using geography, and the base
of this solution is spatial technology, or GIS. The concept behind 5G is having multiple
frequency-based towers. High-frequency towers are used for faster data services. Clusters
of low-frequency and high-frequency towers provide better connectivity for
mission-critical application usage, which was a limitation of existing telecommunication
networks (3G, 4G, or LTE). The large number of IoT devices generates huge volumes
of data in a local area that can be computed in a fog computing node instead of being sent
to the cloud. This enables faster data storage, computing, and dissemination of information.
The basic concept of this design is a mashup of technology-based patient-centric
and provider-centric healthcare systems. Every patient is equipped with IoT devices and
smartphones that collect and transfer current information with near-zero lag time into a
cloud computing center. The data is spatiotemporal, repeated at constant intervals, and
collected in the environment where the subject lives. This follows the theory of ecological
momentary assessment (EMA) [21], random ecological momentary assessment (R-EMA), or
context-sensitive ecological momentary assessment (CS-EMA), all of which are popular
theories for determining patients' psychological and mental health. This helps to
forecast health determinants before they appear.
The fog computing center processes information using AI technology locally and
determines the appropriate actions to message back to a patient. It then sends the infor-
mation back to the cloud for a better AI prediction using more data available from
historical archives as well as from other fog nodes. It also sends messages to the patient
and providers, where the providers can see all processed information in a dashboard.
This helps to determine the level of urgency. The cloud platform also integrates multiple
healthcare systems that can provide patient-specific clinical and subjective data.
The geolocation services can process and route information across agents, such as a patient's
location, medical facilities, resources, medical devices, and medical equipment, for the
efficient provision of services. This information is available to other research centers,
agencies, universities, and providers under a common government governance model.
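The fog-node workflow described above can be summarized in a short, hypothetical Python sketch; the function names, thresholds, and urgency levels are assumptions made for illustration only and are not taken from the proposed model itself.

from dataclasses import dataclass

@dataclass
class Reading:
    patient_id: str
    heart_rate: int
    spo2: float

def local_ai_assess(reading):
    """Stand-in for a locally deployed AI model; returns an urgency level."""
    return "urgent" if reading.spo2 < 0.90 or reading.heart_rate > 130 else "routine"

def fog_node_handle(reading, notify_patient, forward_to_cloud, update_dashboard):
    """Process locally, message the patient, then forward to the cloud and dashboard."""
    urgency = local_ai_assess(reading)
    notify_patient(reading.patient_id, f"Status: {urgency}")
    forward_to_cloud(reading, urgency)          # richer prediction is done upstream
    update_dashboard(reading.patient_id, urgency)
    return urgency

fog_node_handle(Reading("p-001", 142, 0.88),
                notify_patient=lambda pid, msg: print(pid, msg),
                forward_to_cloud=lambda r, u: None,
                update_dashboard=lambda pid, u: print("dashboard:", pid, u))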
The world faced a novel healthcare crisis in 2020. Technological innovations were needed
to help remediate the delivery of healthcare services during the coronavirus (COVID-
19) outbreak. The public health crisis surrounding COVID-19 is used as a use case to
demonstrate and discuss how technology, such as IoT, AI, 5G and GIS, contributes to
healthcare services. More specifically, this section discusses a use case of a patient-
centric smart healthcare system for chronic disease patients by mapping examples from
scholarly literature of the individual technologies and their integration to support smart
healthcare.
One of the advantages of a connected society is “people power”. The connectiv-
ity between people using technology makes them more knowledgeable about their
environment and gives them learning power. 5G technology could further extend network
coverage to underserved areas and create a wider connected society of patients,
healthcare providers, and caregivers while allowing real-time data transfer with URLLC.
Home healthcare allows patients to live in the environment where they are most
comfortable. This ICT is enabled by IoT, AI, and 4G or 5G. Such a system replaces a
caregiver and behaves as a system in charge of the patient's health, reading and interpreting
real-time health data and providing instructions or connecting to a remote healthcare
provider for further health advice and services. This can positively impact the cost, time,
and quality of life of a patient and place less of a burden on healthcare services, providing
a balanced service to patients that satisfies their needs [22]. Hence, technology-driven,
patient-centric healthcare services can provide cost-effective, quality self-care for
underserved and diseased communities. Healthcare is a broad area within the connected
society paradigm.
The rapid increase in chronic illnesses poses a significant threat to human populations
in terms of health economics [23]. According to the Centers for Disease Control
and Prevention's (CDC) 2019 statistics, 60% of Americans live with a chronic illness.
Currently, 90% of the $3.3 trillion in healthcare expenditure is spent on treating chronic
and mental illnesses. A chronic disease patient needs ongoing medical attention. The
illness cannot be permanently cured; however, medical care attempts to mitigate the
symptoms and complications of the disease.
One of the solutions for this growing problem is keeping chronic disease patients at
home by providing prolonged and efficient healthcare support either through caregivers
or self-management. Researchers have been studying long-term, at-home self-care using
multidisciplinary approaches by including clinical, medical, and behavioral sciences
(e.g., [24–26]). With the advent of information systems and technology (IS&T), researchers
have developed telemedicine and telemonitoring systems for healthcare services (e.g.,
[4]). The study of this concept explores the possibilities of home-based health monitoring
systems through the integration of home-to-clinic or home-to-hospital ICT to enhance
communication between patients and caregivers remotely by using IoT devices. This
becomes especially important in delivering healthcare services during a crisis like a
pandemic. The approach of monitoring patients remotely while they are at home can be
a less expensive option in terms of healthcare resource management and utilization as
well as a safer option in situations where in-person medical visits are unsafe, such as
during the COVID-19 pandemic. Figure 3 depicts the convergence of these technologies
and the intersection of 5G, IoT, AI and GIS in healthcare where future studies can be
performed.
The advent of high-performance computing, big data storage (BDS), data mining
(DM), and advanced data analytics using AI, including machine learning (ML), deep
learning (DL), and text mining (TM), has further enhanced the application of ICT.
These pervasive forms of technology, along with spatial data, are becoming popular in the
field of knowledge recovery to find discernible patterns and answer questions of interest
using advanced data analytics. An interdisciplinary research focus that integrates the
technologies described above can produce new insights about chronic illnesses and
self-care and assist in identifying patterns that are helpful for forecasting QoL and risks
for a patient with chronic condition(s) and/or disease(s).
Internet of Things (IoT)-based mobile health (mHealth) applications are becoming
very popular, enabling faster information collection and dissemination in healthcare. In the
case of chronic disease patients, there is a knowledge gap between their living experi-
ences and the shared understanding available from different care management medical
models. The main purpose of a patient-centered application is to provide the patient with
the ability to make an informed decision about their health and self-manage the risks
during their active daily life [27]. The effective use of the mHealth application along
with other forms of ICT helps to understand wellness and manage chronic diseases [28].
Virtual healthcare service centers have gained popularity due to the current COVID-
19 pandemic and increase in population density together with the lack of sufficient
healthcare resources and awareness of disease prevention and management. Youm and
Park [29] studied the advantages and disadvantages of ubiquitous healthcare (u-Health)
service centers. The monitoring and care management of cardiovascular disease patients
has been successfully performed by means of assessments obtained via unmanned kiosks
without a face-to-face interaction with a physician [30]. Telemedicine has been success-
ful in analyzing the risk of heart failure for patients at home and has proven to be a
potential IT solution for the reduction of hospital readmissions [4]. A cross-sectional
study reports the self-assessment of acute stroke patients treated without hospitaliza-
tion using a mobile phone application-based scale [28]. Considering the abundance of
mHealth application devices in the market, Ruiz-Fernández [31] studied the effective
use of the business process management (BPM) paradigm by integrating patient data
collection devices and clinical processes. The fundamental objectives of a BPM
model are collaboration and coordination of multiple technologies, techniques, and IT
principles to empower patients and increase treatment adherence. Empirical evaluation
of the usability of mHealth systems available for self-care management has also been
reported in the literature [32].
These studies describe the background and proliferation of IoT-based mHealth and its
applications. The key concept highlighted in these studies is the novelty in the design of an
mHealth application for maximum utilization from the patient perspective. The success
of an mHealth application system is based on several factors that affect the continuation of
its use. These factors are typically based on perceived usefulness and patient satisfaction,
which are measured in terms of individual literacy, social support, information quality,
and service quality [33]. An enhanced communication system is essential for faster data
transfer; it analyzes the data and delivers feedback to the connected patients that are
geographically distributed across a wide area by accessing a large knowledge base. For
instance, if a patient is at-risk, the system is expected to automatically analyze the data
by using readings for current and previous health conditions and comparing them against
the knowledge base, both locally (fog computing), and globally (cloud computing). This
is to be followed by the execution of protocols to provide instructions to the patient
directly or through caregivers (if any), send medicine via drone, or send an ambulance
(ground or air) for faster medical support. Ambulances or drones should have the most
efficient routing system to provide faster service. These analytics and actions must be
executed without any interaction between the patient and the system. Such essential
communication protocols for D2D and M2M critical applications can be provided by
5G wireless communication with URLLC characteristics.
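To make the escalation logic concrete, the following hedged Python sketch compares current vitals against a patient's own historical baseline (standing in for the local knowledge base) and then chooses a response protocol; the three-standard-deviation rule and the protocol names are illustrative assumptions, not part of the cited systems.

def assess_risk(current, baseline_mean, baseline_std):
    """Flag any vital sign deviating more than 3 standard deviations from baseline."""
    flags = {k: abs(current[k] - baseline_mean[k]) > 3 * baseline_std[k] for k in current}
    return any(flags.values()), flags

def execute_protocol(at_risk, flags, caregivers=None):
    """Pick an action without any interaction between the patient and the system."""
    if not at_risk:
        return "log-only"
    if flags.get("spo2") or flags.get("systolic_bp"):
        return "dispatch-ambulance"             # severe: ground or air ambulance
    return "notify-caregiver" if caregivers else "send-medication-drone"

current = {"heart_rate": 128, "spo2": 0.87, "systolic_bp": 95}
baseline_mean = {"heart_rate": 72, "spo2": 0.97, "systolic_bp": 120}
baseline_std = {"heart_rate": 8, "spo2": 0.01, "systolic_bp": 7}
at_risk, flags = assess_risk(current, baseline_mean, baseline_std)
print(execute_protocol(at_risk, flags, caregivers=["home-caregiver"]))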
Time series data is an equal-interval type of data collected in sequential order from a
single object. These data help to monitor an event or object in time and space and to
explain the underlying structure or mechanism exhibited by that object
or event. In healthcare, time series data can reveal trends by emphasizing the study of
the real-world effectiveness of complex, personalized clinical interventions, a patient-
centered salutogenic focus, or engagement with nonmedical diagnostic and treatment
frameworks. Time series-based clinical data from patient electronic records are widely
used in clinical utility studies [34]. For a congestive heart failure (CHF) patient, accurate
detection of cardiac health is essential to improve the quality of life, and an ML model—
support vector machines (SVM)—was used by [35] to classify electrocardiogram (ECG)
signals to study the heart rate variability that leads to cardiac arrhythmias.
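A hedged sketch of this kind of analysis is shown below: a support vector machine classifying heart-rate-variability (HRV) feature vectors with scikit-learn. The synthetic features and labels are placeholders for illustration and are not the data or pipeline used in [35].

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # e.g. SDNN, RMSSD, LF/HF ratio, mean RR
y = (X[:, 2] + 0.5 * X[:, 0] > 0).astype(int)  # synthetic arrhythmia-risk label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))  # scale, then RBF-SVM
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))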
AI and ML models such as linear regression (LR), recurrent neural networks (RNN),
regularized regression (LASSO), and gradient boosting machines (GBM) have been used
to forecast healthcare expenditures [36]. Seasonal Autoregressive Integrated Moving
Average (SARIMA) modeling has been used to understand and enhance the
opportunities for resource allocation and patient care during periods of elevated risk [37].
Multiple ML models from simple to complex decision trees and ensemble models were
used by [38] for feature extraction in the classification of ventricular tachycardia risk
based on device-measured time series data.
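As an illustration of the seasonal time-series forecasting mentioned above (in the spirit of the SARIMA model in [37], but not its actual data or parameters), the sketch below fits a SARIMAX model with weekly seasonality to a synthetic daily series and produces a two-week forecast.

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

days = pd.date_range("2021-01-01", periods=180, freq="D")
weekly = 10 * np.sin(2 * np.pi * np.arange(180) / 7)        # weekly seasonal pattern
series = pd.Series(100 + weekly + np.random.default_rng(1).normal(0, 3, 180), index=days)

model = SARIMAX(series, order=(1, 1, 1), seasonal_order=(1, 1, 1, 7))
fit = model.fit(disp=False)
print(fit.forecast(steps=14))   # a two-week-ahead, discharge-volume-style forecast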
Kwon et al. [39] developed a deep learning-based early warning system to predict
the possibility of cardiac arrest for hospitalized patients, using an RNN model, periodic
records extracted from the patients' electronic database, and clinical vital sign
features: systolic blood pressure, heart rate, respiratory rate, and body temperature. In
terms of disease detection, smartphones' built-in inertial measurement unit sensor data
have been used to detect cardiovascular diseases [40]. In addition, AI models have been
tested in tracking and predicting diseases using IoT, including mobile and wearable
devices [41].
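A minimal, illustrative PyTorch sketch of an RNN-based early-warning score over vital-sign sequences (systolic blood pressure, heart rate, respiratory rate, body temperature) is given below. The architecture and random inputs are assumptions loosely mirroring the setup in [39], not a reproduction of it.

import torch
import torch.nn as nn

class EarlyWarningRNN(nn.Module):
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, time, features)
        _, (h, _) = self.rnn(x)
        return torch.sigmoid(self.head(h[-1]))  # probability of deterioration

model = EarlyWarningRNN()
vitals = torch.randn(8, 24, 4)                  # 8 patients, 24 hourly readings
risk = model(vitals)                            # (8, 1) risk scores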
We collected research and industry expert opinions about the potential value and utility of
the proposed model. We conducted qualitative phone-based semi-structured interviews,
which took about 20 to 30 min to complete. We used a convenience purposeful sampling
of five experts. The experts we interviewed were two professors in the field of health
information systems and technology, a computer scientist, a healthcare finance specialist,
and a healthcare professional. The feedback we obtained from the expert interviews was
used to generate ideas for improving and enhancing the proposed model as well as to
propose ideas for future work.
Most experts expressed that the proposed model promises great value in providing
a comprehensive view of how all these different technologies will ultimately converge.
A health information systems and technology professor noted that given the fact that
we are still in the process of adopting the 5G network, what constitutes the novelty
of this model is that it addresses the convergence of 5G network with other existing
technologies along with being based on GIS. She further expressed that “we do have
now robust infrastructure offered by [the] 5G network, so why don’t we take advantage
of that and start collecting more data from different nodes whether it is on the individual
level or household level” (Expert 2). Another expert in the field of health information
systems and technology noted that to the best of her knowledge, she has not seen models
that really integrate all these cutting-edge technologies. She believes that this can be
used as a guidance model for how the technologies can converge to ultimately benefit
patients, physicians, and/or the research community. Both the integration of 5G with
other cutting-edge technologies and the benefit they promise for healthcare stakeholders
show the value of the proposed model.
Given the COVID-19 pandemic the world is currently facing, a health information
systems and technology professor mentioned that,
“I am hoping to see more and more of harnessing cutting-edge technologies such as
the contact tracing apps…that are currently used in some countries such as China and
Singapore to help with the surveillance… so instead of using the traditional contract
tracing, which means that a person who have shown some symptoms of COVID-19, he
is going to wait 14 days to see if these symptoms developed and then he is going to be
approached by a healthcare facility to share names of people who has been in contact
with. Instead of this long cycle, they can use what is called the digital contact tracing,
which is a mobile app that is based on the location of the person/user where the geo-
location feature will be enabled in those apps, which mean that automatically whenever
the app user meet any person, the app will show all the contacts of that user, given that
this person and whoever he is been in contact with during these 14 days both have the
contact tracing app” (Expert 1).
Considering future work, another expert in computer science noted the importance
of considering privacy at the local area computing level when these types of technologies
are deployed. According to her, the number one issue that researchers and practitioners
are facing with these smart devices is how we can make sure that these technologies
preserve the privacy of the individuals. The issue of security is also critical to consider.
Health information is highly sensitive, and transmitting it over unsecured networks like
the POTS has troubled privacy experts for decades. Only with the advent of legislation
like HIPAA and the incorporation of encryption technologies has the privacy situation
become somewhat tenable. However, bringing IoT devices into the proposed model not
only makes communication vulnerable, but it also jeopardizes end user devices through
malware and uninvited remote access. This would be a worthwhile area to explore in
future research.
A healthcare finance specialist viewed the proposed model through the lens of
finance. The feedback was that such a collaborative system is essential to fulfilling
the requirements of interoperability between multiple systems for common healthcare
expenses and budgeting. She further mentioned, “the options available in the model can
cut insurance cost and instead of Medicare for all, this will improve healthcare systems
effectively in all ways” (Expert 4).
Another healthcare professional, who works in telemedicine communication as a
nurse practitioner, expressed interest in having such a fully automated system for
effective use of the telemedicine concept. In her experience, patients must upload the
requested health data through a manual process, which takes weeks to be processed into
the system. This delays the appropriate medical consultation, and sometimes the situation
becomes worse than expected. She also noted that the trend analysis available in the proposed
model can offer analysis for many common diseases. Some examples of these use cases
include food and nutrition requirements based on medication, medicine dosage
adjustment, monitoring pregnant women and providing instructions, analyzing bleeding
versus health vitals during a trauma to determine risks, handling unprecedented pandemics
and abnormal situations, and detecting mental health conditions such as depression and
suicidal tendencies at an early stage. She further noted that, “the model proposed, once implemented, can bring
big changes in healthcare services due to the collaboration of various data. This can also
help to avoid unwanted health testing and screening processes and during COVID 19
the resources could have been better managed and forecasted” (Expert 5).
4 Conclusion
This work-in-progress paper presents a proposed model converging AI, IoT, the 5G
network, and GIS in a concept that contributes to smart healthcare. We present a review
of technologies including AI, IoT, the 5G network, and GIS, and highlight some of the
challenges of 5G implementation. The increasing potential of those technologies, individually
and in tandem, to improve the process and delivery of healthcare services constitutes the
significance of our work. The transformative power of these technologies to provide efficient
and inexpensive healthcare services illustrates the potential benefits of designing
and developing such technological solutions. Geolocation and the intelligence built around
these technologies can enable and optimize the smart healthcare deployment landscape.
It would be interesting to learn from future research projects or solutions in the realm of
healthcare that could be organized, implemented, and/or evaluated using the proposed
model. This work is part of a larger project on smart healthcare. The proposed model
is a starting point for designing and developing technological solutions to solve real-
world problems that will be further assessed for their efficacy and effectiveness using
empirical studies. The proposed model was preliminarily evaluated with academic and
medical industry experts to understand the usefulness and acceptance from the industry
and research community for continuing the study. We have received favorable encouragement
for having such a system to reduce the current hospital and patient burdens
and to provide satisfactory health services. Future studies are encouraged to continue to
build on the conceptual model, articulate the role of GIS, how it integrates with other
technologies, and how the location information can be mined in and of itself or combined
with other patient attributes to provide spatiotemporal location intelligence. Another
interesting opportunity is to examine patterns and relationships between the location
and non-location attributes of patients with the overall smart healthcare framework.
References
1. Allen, D.: A connected society. Soundings 53(53), 103–113 (2013). https://doi.org/10.3898/
136266213806045719
2. Wang, C.-X., Di Renzo, M., Stanczak, S., Wang, S., Larsson, E.G.: artificial intelligence
enabled wireless networking for 5G and beyond: recent advances and future challenges.
IEEE Wirel. Commun. 27(1), 16–23 (2020)
3. Henry, S., Alsohaily, A., Sousa, E.S.: 5G is real: evaluating the compliance of the 3GPP 5G
new radio system with the ITU IMT-2020 requirements. IEEE Access 8, 42828–42840 (2020)
4. Alnosayan, N., Chatterjee, S., Alluhaidan, A., Lee, E., Feenstra, L.H.: Design and usability
of a heart failure Mhealth system: a pilot study. JMIR Hum. Factors 4(1), e9 (2017)
5. Chanchaichujit, J., Tan, A., Meng, F., Eaimkhong, S.: Healthcare 4.0. Springer, Singapore
(2019). https://doi.org/10.1007/978-981-13-8114-0
6. Yamin, M.: Information technologies of 21st century and their impact on the society. Int. J.
Inf. Technol. 11(4), 759–766 (2019)
7. Yang, W., et al.: Narrowband wireless access for low-power massive Internet of Things: a
bandwidth perspective. IEEE Wirel. Commun. 24(3), 138–145 (2017)
8. Ekudden, E.: Five Technology Trends Augmenting the Connected Society (2018). https://
www.ericsson.com/en/ericsson-technology-review/archive/2018/technology-trends-201
9. Liyanage, M., Ahmad, I., Abro, A.B., Gurtov, A., Ylianttila, M.: A Comprehensive Guide to
5G Security. Wiley, Hoboken (2018)
10. Ericsson: Be the First to Deliver 5G Access (2020) https://www.ericsson.com/en/networks/
offerings/5g
11. You, X., Zhang, C., Tan, X., Jin, S., Wu, H.: AI for 5G: research directions and paradigms.
Sci. China Inf. Sci. 62(2), 21301 (2019)
12. ITU: Minimum requirements related to technical performance for IMT-2020 radio inter-
face(s), November (2017)
13. 3rd Generation Partnership Project (2019). https://www.3gpp.org/release-15. Accessed 15
14. VoPham, T., Hart, J.E., Laden, F., Chiang, Y.-Y.: Emerging trends in geospatial artificial
intelligence (geoAI): potential applications for environmental epidemiology. Environ. Health
17(1), 40 (2018)
15. Sovuthy, C.: Imess 2018: Focus on IoT, AI, and 5G [conference reports]. IEEE Solid State
Circuits Mag. 11(1), 88–91 (2019)
16. Cakmak, S., Hebbern, C., Vanos, J., Crouse, D.L., Tjepkema, M.: Exposure to traffic and
mortality risk in the 1991–2011 Canadian census health and environment cohort (CanCHEC).
Environ. Int. 124, 16–24 (2019)
17. Cerrone, M., Cantile, M., Sacco, O., Botti, G.: Geo-location of oncological diseases in the
extra-urban areas of naples and creation of territorial biobanks: an important tool to study
potential connections between environmental factors and cancer. Anticancer Res. 38(11),
6459–6463 (2018)
18. Han, X., et al.: Estimating the spatial distribution of environmental suitability for female
lung cancer mortality in china based on a novel statistical method. Environ. Sci. Pollut. Res.
26(10), 10083–10096 (2019)
19. Zhang, Z.: Prediction model for patients with acute respiratory distress syndrome: use of a
genetic algorithm to develop a neural network model. PeerJ 7, e7719 (2019)
20. Tayyaba, S.K., Shah, M.A.: Resource allocation in SDN Based 5G cellular networks. Peer
Peer Netw. Appl. 12(2), 514–538 (2019)
21. Stone, A.A., Shiffman, S.: Ecological momentary assessment (EMA) in behavioral medicine.
Ann. Behav. Med. 16(3), 199–202 (1994). https://doi.org/10.1093/abm/16.3.199
22. Lin, T.-S., Liu, P.-Y., Lin, C.-C.: Home healthcare matching service system using the Internet
of Things. Mob. Netw. Appl. 24(3), 736–747 (2018). https://doi.org/10.1007/s11036-018-
1087-y
23. Centers for Disease Control and Prevention (2019). Health and Economic Costs of Chronic
Disease. https://www.cdc.gov/chronicdisease/about/costs/index.htm
24. Bosworth, H.B., Steinhauser, K., Orr, M., Lindquist, J., Grambow, S., Oddone, E.: Con-
gestive heart failure patients’ perceptions of quality of life: the integration of physical and
psychosocial factors. Aging Ment. Health 8(1), 83–91 (2004)
25. Gallacher, K., May, C.R., Montori, V.M., Mair, F.S.: Understanding patients’ experiences of
treatment burden in chronic heart failure using normalization process theory. Ann. Fam. Med.
9(3), 235–243 (2011)
26. Juenger, J., et al.: Health related quality of life in patients with congestive heart failure:
comparison with other chronic diseases and relation to functional variables. Heart 87(3),
235–241 (2002)
27. Evangelista, L.S., et al.: Examining the effects of remote monitoring systems on activation,
self-care, and quality of life in older patients with chronic heart failure. J. Cardiovasc. Nurs.
30(1), 51 (2015)
28. Chang, H., et al.: Mobile phone application for self-assessment of acute stroke patients: a
tool for extended care and follow-up. Medicine 97(26) (2018)
29. Youm, S., Park, S.-H.: How the awareness of u-healthcare service and health conditions affect
healthy lifestyle: an empirical analysis based on a u-healthcare service experience. Telemed.
E-Health 21(4), 286–295 (2015)
30. Bahadin, J., Shum, E., Ng, G., Tan, N., Sellayah, P., Tan, S.W.: Follow-up consultation
through a healthcare kiosk for patients with stable chronic disease in a primary care setting:
a prospective study. J. Gen. Intern. Med. 32(5), 534–539 (2017)
31. Bradway, M., Pfuhl, G., Joakimsen, R., Ribu, L., Grøttland, A., Årsand, E.: Analysing
mhealth usage logs in RCTs: explaining participants’ interactions with type 2 diabetes
self-management tools. PLoS ONE 13(8) (2018)
32. Georgsson, M., Staggers, N., Weir, C.: A modified user-oriented heuristic evaluation of a
mobile health system for diabetes self-management support. Comput. Inform. Nurs. 34(2),
77 (2016)
33. Wu, W., et al.: Unsupervised phenotyping of severe asthma research program participants
using expanded lung data. J. Allergy Clin. Immunol. 133(5), 1280–1288 (2014)
34. Sherman, E., Gurm, H., Balis, U., Owens, S., Wiens, J.: Leveraging clinical time-series data for
prediction: a cautionary tale. In: AMIA Annual Symposium Proceedings: American Medical
Informatics Association, p. 1571 (2017)
35. Ashtiyani, M., Lavasani, S.N., Alvar, A.A., Deevband, M.: Heart rate variability classification
using support vector machine and genetic algorithm. J. Biomed. Phys. Eng. 8(4), 423 (2018)
36. Yang, L., MacEachren, A.M., Mitra, P., Onorati, T.: Visually-enabled active deep learning for
(Geo) text and image classification: a review. ISPRS Int. J. Geo-Inf. 7(2), 65 (2018)
37. McCoy, T.H., Pellegrini, A.M., Perlis, R.H.: Assessment of time-series machine learning
methods for forecasting hospital discharge volume. JAMA Netw. Open 1(7), e184087–
e184087 (2018)
38. Marzec, L., et al.: Device-measured physical activity data for classification of patients with
ventricular arrhythmia events: a pilot investigation. PLoS ONE 13(10), e0206153 (2018)
39. Kwon, J.M., Kim, K.H., Jeon, K.H., Park, J.: Deep learning for predicting in-hospital mortality
among heart disease patients based on echocardiography. Echocardiography 36(2), 213–218
(2019)
40. Dubey, A.K., Gupta, U., Jain, S.: Epidemiology of lung cancer and approaches for its
prediction: a systematic review and analysis. Chin. J. Cancer 35(1), 71 (2016)
41. Sheth, A., Jaimini, U., Yip, H.Y.: How will the Internet of Things enable augmented
personalized health? IEEE Intell. Syst. 33(1), 89–97 (2018)
Analytic Hierarchy Process Model
for the Diagnosis of Typhoid Fever
Abstract. Typhoid fever is a global health problem that remains largely neglected. Still,
it is responsible for significant levels of morbidity in many regions of the world,
with about 12 million cases and about 600,000 fatalities annually. Diagnosis of
typhoid poses many challenges because its clinical presentation is confused with
those of many other febrile infections such as malaria and yellow fever. In addition,
most developing countries do not have adequate bacteriology laboratories
for further investigations. Decision support systems (DSSs) have been known to
increase the efficiency and effectiveness of the diagnosis process, in addition to
improving access; however, most existing decision support models for the diag-
nosis of diseases have largely focused on ‘non-tropical’ conditions. An effective
decision support model for the diagnosis of tropical diseases can only be devel-
oped through the engineering of experiential knowledge of physicians who are
experts in the management of such conditions. In this study, we mined the experi-
ential knowledge of twenty-five tropical disease specialist physicians to develop
a decision support system based on the Analytic Hierarchy Process (AHP). The
resulting model was tested on records of 2044 patients. Our model successfully
determined the occurrence (or otherwise) of typhoid fever in 78.91% of the cases,
demonstrating the utility of AHP in the diagnosis of typhoid fever.
1 Introduction
The World Health Organization [1] estimates the global typhoid fever burden at between
11 and 21 million cases. Up to 161,000 associated deaths have been reported annually,
and a greater proportion of this statistic comes from poor and vulnerable communities
in regions such as South and South-East Asia and sub-Saharan Africa. Without
B. Akinnuwesi—Formerly University of Swaziland.
based on his experience of the disease, he determines the degree of importance of the
pair he compares. This task becomes daunting when many patients are waiting to be
attended to by few medical doctors. In attending to most of them, inexperienced
doctors (or FHWs) are prone to diagnostic errors, especially if the suspected disease
presents confusable symptoms. Typhoid fever is one such disease: its clinical presentation
is often confused with, or in conflict with, that of diseases like malaria, hepatitis, and
urinary tract infection. The outcome is misdiagnosis, and the attendant consequences of
late diagnosis result in high morbidity and mortality rates.
The need for the robust analytic hierarchy process model presented in this paper is
justified by the following major concerns in the course of diagnosing typhoid
fever: (1) Typhoid fever is identified among the neglected diseases that pose a global
health problem, and it is responsible for significant levels of morbidity in many regions
of the world, with about 12 million cases and about 600,000 fatalities annually; (2)
Diagnosis of typhoid poses a great deal of challenge because its clinical presentation
is confused with those of many other febrile infections such as malaria, yellow fever,
and many others; (3) Most developing countries do not have adequate bacteriology
laboratories for further investigations; (4) Most patients suffering from typhoid fever find
it difficult to express how they feel, making it difficult for a medical doctor to decipher the
cause of their illness; and (5) Typhoid fever is one of the often-misdiagnosed diseases
in low-to-middle income countries (LMICs) due to self-diagnosis resulting from poor
access to quality health care and lack of access to pathogenic testing.
This study proposes the use of the Analytic Hierarchy Process (AHP) in the development
of a decision support system for diagnosing typhoid fever. AHP [9] provides a
suitable mechanism for evaluating complex multicriteria decision variables, such as those
presented in the diagnosis of tropical diseases, which in most cases is challenging
in terms of the combinatorial analysis of symptoms and their degrees of intensity in
the diagnosis process. The AHP technique has been applied in various facets of human
endeavour, including health care [10–12], and provides a mechanism for evaluating
consistency in the pairwise comparison of decision variables in the knowledge engineering
process [13]. In [14], AHP is seen as a theory of measurement through pairwise comparison
that relies on the judgment of experts to derive priority scales. These scales measure
intangibles in relative terms.
The model reported in this paper helps to successfully determine the occurrence
(or otherwise) of typhoid fever in 78.91% of the cases, demonstrating the utility of
AHP in the diagnosis of typhoid fever. In Sect. 2, we review the literature on the use of
decision support systems in the diagnosis of some tropical diseases; the conventional
methods of diagnosing typhoid fever are also discussed. Section 3 presents the study
methodology. The results and discussion are presented in Sect. 4. Some conclusions are
drawn in Sect. 5.
2 Literature Review
The first efforts at creating decision support tools for medical diagnosis began with
the pioneering works of [15, 16]. These works attempted a paradigm shift from purely
outputs by human experts e.g. [11, 24, 25], because of the ability of fuzzy logic to han-
dle vagueness in symptom elicitation and the strength of AHP in the development of
multi-criteria models.
Typhoid fever is one of the often-misdiagnosed diseases in low-to-middle income
countries (LMICs) due to self-diagnosis resulting from poor access to quality health care
and lack of access to pathogenic testing. Most poor communities in LMICs have high
incidence of malaria (due to poor vector control) and typhoid fever (due to poor sanitation,
drug resistance, and self-diagnosis) [26]. Fever is a commonly reported symptom in
several tropical diseases, and without localized features and appropriate tests, diagnosis
often erroneously defaults to malaria [27] without consideration of other pathogens [28].
In the absence of accurate laboratory tests, presumptive diagnosis would require a careful
analysis of symptoms presentation and other clinical/non-clinical parameters [29].
Decision support systems have previously been proposed and developed for the diagnosis
of typhoid fever. Oguntimilehin et al. [30] proposed a machine learning approach
using 18 symptoms, 100 training records, and 50 testing records, with a 95% detection
rate. Though the results are impressive, the number of records used for training and
testing was small. Moreover, using 18 symptoms would likely reduce the efficiency of
diagnosis. Santosa et al. [31] applied the Sugeno fuzzy logic method to the diagnosis of
typhoid fever and Dengue hemorrhagic fever. The similarity of the symptoms of these
two diseases necessitated the use of soft-computing methods, achieving 80.2% diagnostic
accuracy, but again with a small set of 86 records. Several other researchers have developed
hybrid systems for the diagnosis of typhoid with varying degrees of diagnostic
accuracy. For example, Asogbon et al. [32] deployed an enhanced neuro-fuzzy system
combined with a genetic algorithm for medical diagnosis. Their aim was to optimize
the performance of an Adaptive Neuro-Fuzzy Inference System (ANFIS) in terms of
its connection weights, which are usually computed by trial and error when used
to diagnose typhoid fever. The study used a Genetic Algorithm (GA) to automatically
evolve the optimum connection weights needed to efficiently train the ANFIS
model used for typhoid fever diagnosis. 104 medical records of patients aged 15 to 75
were adopted for the study and used to test the performance of the multi-technique
decision support system: 70% of the dataset was used for training, 15% for validation,
and the remaining 15% to observe the performance of the proposed system. The Genetic
Adaptive Neuro-Fuzzy Inference System (GANFIS) gave an average diagnosis accuracy
of 92.7%, compared to 85.5% recorded by ANFIS.
Most of the studies on the use of soft-computing and multicriteria methods for
the diagnosis of typhoid fever produced encouraging results in terms of the matching
diagnoses; however, they mostly used small datasets, which makes the outputs difficult
to generalize. In addition, they failed to provide the false positive and false negative
values. A false positive (FP) is a Type-I error: the test indicates that the patient has the
disease, whereas a confirmatory test proves the initial result to be false. A false negative
(FN) is a Type-II error, whereby the diagnosis fails to reject a false null hypothesis.
The existence of false positive and false negative results underscores the need for further
confirmatory investigations with a higher degree of sensitivity. According to Ioannidis,
Tarone and McLaughlin [33], FP and FN results do not necessarily lead to the same
consequences, and their relative importance may vary in different investigations, which
indicates that the acceptable threshold may also vary. In general, accuracy ranges from
0.5 for a random classifier upward, and values close to 1 are better. Cooper [34]
established that the threshold can be addressed by selecting symptom-based cut-off
points to distinguish between disorder and normality; these cut-offs may be more or less
wisely chosen so that the results obtained are widely accepted.
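The following small Python illustration (not taken from the reviewed studies) shows why reporting FP and FN matters: two classifiers can share the same accuracy while differing sharply in the Type-I and Type-II errors that drive the clinical consequences.

def metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # 1 - false-negative rate (Type-II errors)
        "specificity": tn / (tn + fp),   # 1 - false-positive rate (Type-I errors)
    }

print(metrics(tp=80, fp=20, tn=80, fn=20))  # accuracy 0.80, balanced errors
print(metrics(tp=95, fp=35, tn=65, fn=5))   # accuracy 0.80, but very different FP/FN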
Typhoid fever is one of the diseases responsible for high mortality rates in the
tropical regions of the world. The high mortality rate is occasioned by misdiagnosis
due to its confusable symptoms, which overlap with the symptoms of other febrile diseases.
Inadequate medical facilities and personnel have also contributed to the high mortality and
morbidity, as patients resort to self-medication, which complicates the symptoms and
results in deaths. Most patients suffering from typhoid fever find it difficult to express how
they feel, making it difficult for a medical doctor to decipher the cause of their illness. This
uncertainty and imprecision, though not peculiar to typhoid fever, has necessitated the use
of soft computing techniques for managing and processing these uncertainties
and imprecisions in medical diagnosis. AHP is found to be the most used tool for eliciting
the weights of disease symptoms, while fuzzy logic is known to be the most popular tool
for managing uncertainty and imprecision. A strong correlation has been found between most
AHP/fuzzy logic diagnostic systems and human experts' diagnoses, though most such
systems use small datasets. Such systems are also not found to report false positive and
false negative values, which underscores the need for further confirmatory investigations
with a higher degree of sensitivity. In light of this, the results produced by such systems
cannot be generalised.
Microbiological Cultures
The isolation of the causative organism, Salmonella enterica serovar Typhi (Salmonella
Typhi), is the gold standard for the diagnosis [1]. Body fluids like blood, bone marrow,
stool, urine, rose spots, gastric and intestinal secretions may be cultured. Blood culture
gives a definitive diagnosis. However, the rates of positive culture are usually higher
when using bone marrow aspirates for the culture [38]. In a systematic review by Mogasale
et al. in [39], the proportion of Salmonella Typhi detection was 61% from blood cultures
compared to 96% from bone marrow aspirate cultures. The use of bacteriological cultures
for the diagnosis of typhoid infection is cost-intensive and technically difficult, hence
the need for other diagnostic tests.
Molecular Assay
The need to overcome the challenges posed by the inadequacies of serologic tests
and cultures has led to the exploration of molecular methods for the diagnosis of typhoid
fever. DNA-based detection methods, such as the Polymerase Chain Reaction (PCR), have
shown better sensitivity and specificity than blood cultures. The results are even better
with the use of nested multiplex PCR [38, 45].
3 Methodology
3.1 Data Collection
Data for the development and testing of the typhoid fever model were collected
in Nigeria, which is a tropical country with a high population and a fairly significant
prevalence of tropical diseases. Two data collection instruments were designed for the
purpose of the study. The first instrument obtained experiential knowledge from 25 physicians,
experienced in the diagnosis of tropical diseases, for the development of models
to diagnose the following tropical diseases: malaria, typhoid, chicken pox, measles,
hepatitis B, yellow fever, and UTI. In this paper, we report on the model for the diagnosis of
typhoid fever. The knowledge extraction instrument also elicited the following physician
demographic information: age range, gender, professional experience, type of clinic they
work in (public or private) and experience in diagnosing and treating the tropical dis-
eases under consideration. The instrument required the physicians to carry out a pairwise
comparison of various symptoms (obtained through literature search) that are associ-
ated with the diseases on a nine-point linguistic scale. Prior to the administration of the
AHP questionnaire, we employed the assistance of a physician and an epidemiologist
in reviewing our model to ensure that the correct symptoms are captured for each of
the diseases. Overall, ten symptoms were considered relating to the diagnosis of
typhoid: fever, headache, abdominal pain, fatigue, vomiting, coughing, loss of appetite,
chills, rash, and diarrhoea. The second instrument was administered to 40 physicians
who provided patient consultation and diagnosis data for 2199 patients, for purposes
of model testing – 2044 were found usable after data cleaning. In addition to the tests
with real life patient data, we requested 13 physicians to do a validation of the results
generated by our model.
3.2 Processing
This study adopted the classical AHP methodology [46] in the development of a model
for the diagnosis of typhoid fever. The AHP modeling was based on the group decision
analysis using an online Excel template (https://bpmsg.com/ahp-excel-template/). Based
on the results of the AHP computation, we developed the diagnosis model. The key
elements of the AHP are: pairwise comparison of variables, measurement of consistency,
and derivation of priorities, all of which are detailed below.
Measurement of Consistency
The levels of consistency and consensus of judgments by experts in AHP decision
modelling are crucial pointers to the model’s reliability and reflection of the dependability
of the expert judgments in relation to the pairwise comparison of the decision variables.
A consistency check must be conducted since priorities make sense only if they are
derived from consistent or near consistent matrices. Saaty in [46] proposed a consistency
ratio, which is related to the eigenvalue method. Deviations from consistency are
represented by the consistency index (CI). Related to the CI is the consistency ratio (CR),
which is the ratio of the CI to a random consistency index (RI). CI is calculated as:
CI = (λmax − n) / (n − 1)  (3)
and the consistency ratio is given as:
CR = CI /RI (4)
where λmax is the maximal eigenvalue and n is the number of variables in the pairwise
comparison matrix.
RI is the random index determined by Saaty in [46] as follows:

n   3     4     5     6     7     8     9     10
RI  0.58  0.90  1.12  1.24  1.32  1.41  1.45  1.49
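To make Eqs. (3) and (4) concrete, the following minimal Python sketch (not part of the original study, which used the BPMSG Excel template) computes λmax, CI and CR for a reciprocal pairwise-comparison matrix; the RI lookup table reproduces the Saaty values above.

```python
import numpy as np

# Saaty's random consistency index for n = 3..10 (values from the table above)
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(A):
    """Return (CI, CR) for an n x n pairwise-comparison matrix A (Eqs. 3 and 4)."""
    n = A.shape[0]
    lam_max = np.max(np.linalg.eigvals(A).real)  # maximal (principal) eigenvalue
    ci = (lam_max - n) / (n - 1)                 # Eq. (3)
    return ci, ci / RI[n]                        # Eq. (4)

# Example with a perfectly consistent 3 x 3 matrix (CR should be ~0)
A = np.array([[1.0, 2.0, 4.0],
              [0.5, 1.0, 2.0],
              [0.25, 0.5, 1.0]])
print(consistency_ratio(A))
```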
p_i is generated as: p_i = (Σ_{j=1}^{n} v_ij) / n  (6)
vij is the eigenvalue corresponding to element aij of the PWC matrix. This is obtained
from the matrix of eigenvectors. The matrix of eigenvectors V is computed as:
$$V = \begin{bmatrix} \dfrac{a_{11}}{\sum_{i=1}^{n} a_{i1}} & \cdots & \dfrac{a_{1n}}{\sum_{i=1}^{n} a_{in}} \\ \vdots & \ddots & \vdots \\ \dfrac{a_{n1}}{\sum_{i=1}^{n} a_{i1}} & \cdots & \dfrac{a_{nn}}{\sum_{i=1}^{n} a_{in}} \end{bmatrix} \quad (7)$$
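As a companion illustration, the column normalisation of Eq. (7) and the row averaging of Eq. (6) can be sketched in a few lines of Python; this is an illustrative reconstruction, not the authors' code.

```python
import numpy as np

def ahp_priorities(A):
    """Derive the priority vector from a pairwise-comparison matrix A.

    Each column of A is normalised by its column sum (Eq. 7), and the
    priorities p_i are the row averages of the normalised matrix (Eq. 6).
    """
    V = A / A.sum(axis=0, keepdims=True)  # matrix of eigenvectors V, Eq. (7)
    return V.mean(axis=1)                 # priority vector p, Eq. (6)
```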
If there are lower levels in the hierarchy, then the global priority is obtained by
factoring in the eigenvector value of the priority at the level above the current hierarchy.
If µi is the eigenvector value associated with the upper-level criteria directly above the
set of variables (si ) under consideration, then the global priorities would be given as:
GP_i = μ_i (p⃗_i · x⃗_i)  (8)
where GP_i is the global priority associated with the vector of variable–weight pairs
p⃗_i · x⃗_i. The variables are x_i1, x_i2, …, x_in, while p_i represents the lower-level priority
weights (p_i1, p_i2, …, p_in) associated with x_i1, x_i2, …, x_in.
Table 1. PWC Matrix (Relative Importance) with Respect to the Typhoid Symptoms

                  Fever  Headache  Fatigue  Abdominal  Vomiting  Chills  Diarrhoea  Coughing  Rash  Loss of
                                            pain                                                   appetite
Fever             1.00   0.79      0.75     0.49       0.70      0.81    0.50       0.00      0.23  0.78
Headache                 1.00      0.00     0.00       0.47      0.00    0.00       0.00      0.00  0.00
Fatigue                            1.00     0.46       0.68      0.00    0.00       0.00      0.00  0.00
Abdominal pain                              1.00       0.69      0.38    0.71       0.00      0.30  0.59
Vomiting                                               1.00      0.00    0.00       0.00      0.00  0.00
Chills                                                           1.00    0.00       0.00      0.00  0.00
Diarrhoea                                                                1.00       0.00      0.00  0.00
Coughing                                                                            1.00      0.37  0.35
Rash                                                                                           1.00  0.00
Loss of appetite                                                                                     1.00
The implication of the eigenvector is that it expresses the relative importance
of one symptom over another relating to the diagnosis of typhoid fever in the minds of the
physicians. Figure 1 shows the relative priorities (relevance) of symptoms in the diagnosis
of typhoid fever, while the linear model (typhoid fever diagnosis factor index – TFDFI)
is shown in Eq. (9).
Fig. 1. Relative priorities (relevance) of symptoms in the diagnosis of typhoid fever: Fever 0.269, Headache 0.184, Fatigue 0.137, Abdominal pain 0.118, Loss of appetite 0.083, Chills 0.081, Vomiting 0.052, Diarrhoea 0.044, Rash 0.017, Coughing 0.016.
The TFDFI model shows that typhoid fever manifests mostly with fever (26.9%),
headache (18.4%), fatigue (13.7%), abdominal pain (11.8%), loss of appetite (8.3%) and
chills (8.1%). These are in agreement with the results obtained in [27, 49, 50]. Fever and
headache are two symptoms that manifest across most tropical diseases. The confusable
nature of symptom manifestations in these diseases calls for methodical approaches to
isolate each disease based on other peculiar symptoms. Our research shows that a com-
bination of abdominal pain, chills, fatigue and loss of appetite, in addition to headache
and/or fever, is a strong pointer to the possibility of typhoid presence, though a num-
ber of these symptoms could present more at the later stages of typhoid infection [51,
52]. Several researchers have revealed that the primary symptoms of typhoid
start with fever lasting for more than 48 hrs, thereafter accompanied by intense headache
(with about 43–90% presentation), followed by gastrointestinal symptoms which include
abdominal pain/cramps, nausea and vomiting, and constipation or diarrhoea. All of these
symptoms present the same way for both children and adults [53–56].
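Equation (9) itself is not reproduced in this excerpt, but based on the priorities in Fig. 1 the TFDFI can reasonably be read as a weighted sum of symptom scores. The sketch below illustrates that interpretation; the symptom scores in the example are hypothetical.

```python
# AHP-derived symptom priorities taken from Fig. 1
weights = {
    "fever": 0.269, "headache": 0.184, "fatigue": 0.137, "abdominal_pain": 0.118,
    "loss_of_appetite": 0.083, "chills": 0.081, "vomiting": 0.052,
    "diarrhoea": 0.044, "rash": 0.017, "coughing": 0.016,
}

def tfdfi(symptom_scores):
    """Weighted sum of patient symptom scores (assumed to lie in [0, 1])."""
    return sum(w * symptom_scores.get(s, 0.0) for s, w in weights.items())

# Hypothetical patient presenting mainly with fever, headache and fatigue
print(tfdfi({"fever": 1.0, "headache": 0.8, "fatigue": 0.6}))
```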
Our model was tested using data from the 2044 patients, based on an aggregation
procedure shown in Fig. 2. The patients are assessed on each of the symptoms based on
The results show 78.91% matching classifications of typhoid fever, with 10.18%
false positives and 10.91% false negatives. The results align with a number of results,
which have recorded false positives and false negatives of between 3% and 15% [57].
We approached 13 medical doctors to evaluate our model in terms of the results
obtained and feasibility of utilizing a computer application that would be developed
based on the AHP model for the diagnosis of typhoid fever. Most of the physicians were
of the opinion that computational methods, such as the use of AHP, could be viable in the
diagnosis of typhoid fever. There was a general opinion that the AHP model is complex
Though our sensitivity results (FP and FN) are within fairly acceptable thresholds
[58, 59], there is need to reduce the FP and FN levels. This could be accomplished
through: i) increase in the number of physicians providing experiential knowledge for
the model development; and ii) use of Delphi method for refining the physicians’ expert
judgments in pairwise comparison of symptoms of diseases. The physicians further
pointed out the need to use our syndromic test as a first-stage diagnostic tool to isolate
cases for further tests. Since a number of the further tests could be expensive [60, 61],
especially in low-to-middle-income countries, a computational syndromic diagnosis
tool could be a veritable means of methodically isolating cases for further laboratory
tests. Previous studies (e.g. [62]) have emphasized the utility of soft-computing tools in
aiding inexperienced physicians and front-line health workers in syndromic diagnosis
of tropical confusable diseases.
The results of our study and their generalizability can be improved upon by increas-
ing the number of domain experts (physicians) involved in the knowledge definition, and
implementing mechanisms that could improve the consistency and consensus in pair-
wise comparisons by the domain experts. The consistency of the pairwise comparison
could be improved by methods such as the adaptive AHP approach [A3 ] [67], and the
linguistic preference relations [Fuzzy LinPreRa] [68, 69], which also improves consen-
sus. Additional utilization of a Delphi process would also refine the experts’ pairwise
comparison results [70], while AHP hybridization with fuzzy logic could potentially
increase the predictive ability of the model by dealing with the fuzzy nature of data
that could arise during expert pairwise comparison judgment, and patient consultation.
We note that typhoid co-infects with some other febrile diseases such as malaria [71,
72]. It will be desirable to develop a multi-criteria diagnosis system that assists in the
differential diagnosis of febrile diseases, recognizing co-infection.
References
1. World-Health-Organisation: Typhoid vaccine: WHO position paper. Weekly epidemiological
record 93, pp. 153–172 (2018). http://www.who.int/wer. Accessed 23 July 2020
2. Iheukwumere, I., Nwachukwu, C.N., Kanu, M.A.: Manifestations, mismanagement and diag-
nostic challenges of malaria and typhoid fever. Malar Chemoth Cont Elimin. 2(109), 38–41
(2013)
3. World Health Organization: World Health Statistics 2015. World Health Organization,
Geneva, Switzerland (2015). https://www.who.int/gho/publications/
world_health_statistics/2015/en/. Accessed 20 Apr 2017
4. Djam, X., Wajiga, G., Kimbi, Y., Blamah, N.: A fuzzy expert system for the management of
malaria (2011)
5. Szolovits, P., Patil, R.S., Schwartz, W.B.: Artificial intelligence in medical diagnosis. Ann.
Intern. Med. 108(1), 80–87 (1988)
6. Driver, C.: Malaria and its avoidance. Pract. Nurse 37(8), 19–24 (2009)
7. Kayemba, C.N., et al.: Introduction of newborn care within integrated community case
management in Uganda. Am. J. Trop. Med. Hyg. 87(5 Suppl), 46 (2012)
8. World-Health-Organization: WHO teams assist people in hard-to-reach areas of Nige-
ria (2017). https://www.who.int/news-room/feature-stories/detail/who-teams-assist-people-
in-hard-to-reach-areas-of-nigeria. Accessed 12 May 2019
9. Wind, Y., Saaty, T.L.: Marketing applications of the analytic hierarchy process. Manag. Sci.
26(7), 641–658 (1980)
10. Liberatore, M.J., Nydick, R.L.: The analytic hierarchy process in medical and health care
decision making: a literature review. Eur. J. Oper. Res. 189(1), 194–207 (2008)
11. Uzoka, F.-M.E., Obot, O., Barker, K., Osuji, J.: An experimental comparison of fuzzy logic and
analytic hierarchy process for medical decision support systems. Comput. Methods Programs
Biomed. 103(1), 10–27 (2011)
12. Agapova, M., et al.: Using the analytic hierarchy process for prioritizing imaging tests in
diagnosis of suspected appendicitis. Acad. Radiol. 24(5), 530–537 (2017)
13. Zyoud, S.H., Fuchs-Hanusch, D.: A bibliometric-based survey on AHP and TOPSIS
techniques. Expert Syst. Appl. 78, 158–181 (2017)
14. Saaty, T.L.: Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 1(1), 83–98
(2008)
15. Kulikowski, C.A.: Pattern recognition approach to medical diagnosis. IEEE Trans. Syst. Sci.
Cybern. 6(3), 173–178 (1970)
16. Shortliffe, E.H.: MYCIN: a rule-based computer program for advising physicians regarding
antimicrobial therapy selection, Stanford Univ Calif Dept of Computer Science (1974)
17. Kaeding, A.-K., Flor, T.: Processing unexact information in a medical used multiparadigm
system. pp. 590–592 (1995)
18. Song, Q., Ma, T., Kasabov, N.: A novel generic higher-order TSK fuzzy model for prediction
and applications for medical decision support, pp. 241–245 (2003)
19. Marsh, K., Lanitis, T., Neasham, D., Orfanos, P., Caro, J.: Assessing the value of healthcare
interventions using multi-criteria decision analysis: a review of the literature. Pharmacoeco-
nomics 32(4), 345–365 (2014)
20. Grosan, C., Abraham, A., Tigan, S.: Multicriteria programming in medical diagnosis and
treatments. Appl. Soft Comput. 8(4), 1407–1417 (2008)
21. Hancerliogullari, G., Hancerliogullari, K.O., Koksalmis, E.: The use of multi-criteria decision
making models in evaluating anesthesia method options in circumcision surgery. BMC Med.
Inform. Decis. Mak. 17(1), 1–13 (2017)
22. Olaniyan, O.M., Alegbeleye, O.: An android-based expert system for diagnosis of selected
tropical diseases using fuzzy-analytical hierarchy process. Int. J. Innov. Res. Educ. Technol.
Soc. Strateg. 6(1), 149–155 (2019)
23. Ajenaghughrure, I.B., Sujatha, P., Akazue, M.I.: Fuzzy based multi-fever symptom classifier
diagnosis model. Int. J. Technol. Comput. Sci. 10(1), 13–28 (2017)
24. Prihatini, P.M., Putra, I.K.G.D.: Fuzzy knowledge-based system with uncertainty for tropical
infectious disease diagnosis. Int. J. Comput. Sci. Issues (IJCSI) 9(4), 157 (2012)
25. Obot, O., Inyang, U.: ANFIS based fuzzy clustering system for differential diagnosis of
confusable diseases. World 6(2), 160–165 (2014)
26. Ajibola, O., Omisakin, O.A., Eze, A.A., Omoleke, S.A.: Self-medication with antibiotics,
attitude and knowledge of antibiotic resistance among community residents and undergraduate
students in Northwest Nigeria. Diseases 6(2), 32 (2018)
27. Crump, J.A., Luby, S.P., Mintz, E.D.: The global burden of typhoid fever. Bull. World Health
Organ. 82(5), 346–353 (2004)
28. Acestor, N., et al.: Mapping the aetiology of non-malarial febrile illness in Southeast Asia
through a systematic review—terra incognita impairing treatment policies (2012)
29. Luvira, V., et al.: Etiologies of acute undifferentiated febrile illness in Bangkok, Thailand.
Am. J. Trop. Med. Hyg. 100(3), 622 (2019)
30. Oguntimilehin, A., Adetunmbi, A., Abiola, O.: A machine learning approach to clinical
diagnosis of typhoid fever. Mach. Learn. Approach Clin. Diagn. Typhoid Fever 2(4), 1–6
(2013)
31. Santosa, I., Rahmanita, E., A’Yuni, T., Novianti, T.: Application of fuzzy logic Sugeno
methods for diagnosis typhoid fever disease and dengue hemorrhagic fever, pp. 24–10 (2018)
32. Asogbon, M., Samuel, O., Omisore, M., Awonusi, O.: Enhanced neuro-fuzzy system based
on genetic algorithm for medical diagnosis. J. Med. Diagn. Meth. 5(205), 2 (2016)
33. Ioannidis, J.P., Tarone, R., McLaughlin, J.K.: The false-positive to false-negative ratio in
epidemiologic studies. Epidemiology, 450–456 (2011)
34. Cooper, R.V.: Avoiding false positives: Zones of rarity, the threshold problem, and the DSM
clinical significance criterion. Can. J. Psychiatry 58(11), 606–611 (2013)
35. Andrews, J.R., et al.: High rates of enteric fever diagnosis and lower burden of culture-
confirmed disease in peri-urban and rural Nepal. J. Infect. Dis. 218(suppl_4), S214–S221
(2018)
36. Parry, C.M., Wijedoru, L., Arjyal, A., Baker, S.: The utility of diagnostic tests for enteric
fever in endemic locations. Expert Rev. Anti Infect. Ther. 9(6), 711–725 (2011)
37. Andrews, J.R., Ryan, E.T.: Diagnostics for invasive Salmonella infections: current challenges
and future directions. Vaccine 33, C8–C15 (2015)
38. Sultana, S., Al Maruf, M.A., Sultana, R., Jahan, S.: Laboratory diagnosis of enteric fever: a
review update. Bangladesh J. Infect. Dis. 3(2), 43–51 (2016)
39. Mogasale, V., Ramani, E., Mogasale, V.V., Park, J.: What proportion of Salmonella Typhi
cases are detected by blood culture? A systematic literature review. Ann. Clin. Microbiol.
Antimicrob. 15(1), 1–8 (2016)
40. Bharmoria, A., Shukla, A., Sharma, K.: Typhoid fever as a challenge for developing countries
and elusive diagnostic approaches available for the enteric fever. Int. J. Vaccine Res. 2(2),
1–16 (2017)
41. Ammah, A., Nkuo-Akenji, T., Ndip, R., Deas, J.: An update on concurrent malaria and typhoid
fever in Cameroon. Trans. R. Soc. Trop. Med. Hyg. 93(2), 127–129 (1999)
42. Nsutebu, E.F., Ndumbe, P.M., Koulla, S.: The increase in occurrence of typhoid fever in
Cameroon: overdiagnosis due to misuse of the Widal test? Trans. R. Soc. Trop. Med. Hyg.
96(1), 64–67 (2002)
43. Mengist, H., Tilahun, K.: Diagnostic value of Widal test in the diagnosis of typhoid fever: a
systematic review. J. Med. Microbiol. Diagn. 6(01), 1–4 (2017)
44. Ajibola, O., Mshelia, M.B., Gulumbe, B.H., Eze, A.A.: Typhoid fever diagnosis in endemic
countries: a clog in the wheel of progress? Medicina 54(2), 23 (2018)
45. Srivastava, K.R., Awasthi, S., Mishra, P.K., Srivastava, P.K.: Biosensors/molecular tools for
detection of waterborne pathogens. Waterborne Pathog., 237–277 (2020)
46. Saaty, T.L.: A scaling method for priorities in hierarchical structures. J. Math. Psychol. 15(3),
234–281 (1977)
47. Karapetrovic, S., Rosenbloom, E.: A quality control approach to consistency paradoxes in
AHP. Eur. J. Oper. Res. 119(3), 704–718 (1999)
48. Cook, M., Angus, A., Gottberg, A., Smith, R., Longhurst, P.: Promoting sustainable resource
use through product service systems. In: CIWM Conference, Waste: A Global Resource.
Technical Session 5, Resource Recovery. Paignton, Torbay, UK, pp. 12–15 (2007)
49. Bhan, M., Bahl, R., Bhatnagar, S.: Typhoid and paratyphoid fever. Lancet 366(9487), 749–762
(2005)
50. Mouton, F., Ohuoba, E.I., Evans, F.M., Desalu, I., Wilson, C.: Typhoid enteric fever–part.
Update Anaesth. 32, 13 (2017)
51. Sanhueza Palma, N.C., Farías Molina, S., Calzadilla Riveras, J., Hermoso, A.: Typhoid fever:
case report and literature review. Medwave 16(05) (2016)
52. Buzğan, T., Evirgen, Ö., Irmak, H., Karsen, H., Akdeniz, H.: A case of typhoid fever presenting
with multiple complications. Eur. J. Gen. Med. 4(2), 83–86 (2007)
53. Zein, U.: Management of severe typhoid fever, pp. 1–6 (2017). https://www.researchgate.net/
publication/321144926_Management_of_Severe_Typhoid_Faver
54. Bhutta, Z.A.: Current concepts in the diagnosis and treatment of typhoid fever. BMJ
333(7558), 78–82 (2006)
55. Stephens, I., Levine, M.M.: Management of typhoid fever in children. Pediatr. Infect. Dis. J.
21(2), 157–159 (2002)
56. Woodward, T.E., Smadel, J.E.: Management of typhoid fever and its complications. Ann.
Intern. Med. 60(1), 144–157 (1964)
57. Lee, J.-H., et al.: False-positive results for rapid diagnostic tests for malaria in patients with
rheumatoid factor. J. Clin. Microbiol. 52(10), 3784–3787 (2014)
58. Hjalmarsson, V.: Machine learning and multi-criteria decision analysis in healthcare: a
comparison of machine learning algorithms for medical diagnosis (2018)
59. Dhouib, S., Kharrat, A., Chabchoub, H.: A multi-start threshold accepting algorithm for
multiple objective continuous optimization problems. Int. J. Numer. Meth. Eng. 83(11), 1498–
1517 (2010)
60. Lehmann, L.E., Herpichboehm, B., Kost, G.J., Kollef, M.H., Stüber, F.: Cost and mortality
prediction using polymerase chain reaction pathogen detection in sepsis: evidence from three
observational trials. Crit. Care 14(5), 1–10 (2010)
61. Bartlett, J., Stirling, D.: A short history of the polymerase chain reaction. In: Bartlett, J.,
Stirling, D. (eds.) PCR Protocols. Methods in Molecular Biology™, vol. 226, pp. 3–6. Humana
Press (2003). https://doi.org/10.1385/1-59259-384-4:3
62. Uzoka, F.-M.E., Nwokoro, C., Debele, F., Akinnuwesi, B., Olaniyan, M.: AHP model for
diagnosis of tropical confusable diseases, pp. 1758–1763 (2017)
63. Khanmohammadi, S., Rezaeiahari, M.: AHP based classification algorithm selection for
clinical decision support system development. Procedia Comput. Sci. 36, 328–334 (2014)
64. Antillón, M., et al.: The burden of typhoid fever in low-and middle-income countries: a
meta-regression approach. PLoS Negl. Trop. Dis. 11(2), e0005376 (2017)
65. Zhang, X., Liu, Y., Yang, M., Zhang, T., Young, A.A., Li, X.: Comparative study of four
time series methods in forecasting typhoid fever incidence in China. PLoS ONE 8(5), e63116
(2013)
66. Hosoglu, S., Geyik, M.F., Akalin, S., Ayaz, C., Kokoglu, O.F., Loeb, M.: A simple validated
prediction rule to diagnose typhoid fever in Turkey. Trans. R. Soc. Trop. Med. Hyg. 100(11),
1068–1074 (2006)
67. Lin, C.-C., Wang, W.-C., Yu, W.-D.: Improving AHP for construction with an adaptive AHP
approach (A3). Autom. Constr. 17(2), 180–187 (2008)
68. Wang, T.-C., Chen, Y.-H.: Applying fuzzy linguistic preference relations to the improvement
of consistency of fuzzy AHP. Inf. Sci. 178(19), 3755–3765 (2008)
69. Wu, Z., Huang, S., Xu, J.: Multi-stage optimization models for individual consistency and
group consensus with preference relations. Eur. J. Oper. Res. 275(1), 182–194 (2019)
70. Abdel-Basset, M., Mohamed, M., Sangaiah, A.K.: Neutrosophic AHP-Delphi group decision
making model based on trapezoidal neutrosophic numbers. J. Ambient. Intell. Humaniz.
Comput. 9(5), 1427–1443 (2017). https://doi.org/10.1007/s12652-017-0548-7
71. Baba, M., et al.: Evidence of arbovirus co-infection in suspected febrile malaria and typhoid
patients in Nigeria. J. Infect. Dev. Ctries. 7(01), 051–059 (2013)
72. Odikamnoro, O., et al.: Incidence of malaria/typhoid co-infection among adult population in
Unwana community, Afikpo north local government area, Ebonyi state, Southeastern Nigeria.
Afr. J. Infect. Dis. 12(1), 33–38 (2018)
Gradient Boosting and Minimum Redundancy
Maximum Relevance (mRMR) Feature Selection
for Diagnosis of Parkinson’s Disease Through
Patient Audio Data
1 Introduction
Parkinson’s Disease (PD) is a neurological disease which persists typically amongst
the elderly, though not entirely. While the exact causes of the disease vary amongst the
affected population, all exhibit injury within the basal ganglia and substantia nigra por-
tions of the brain [8]. These regions are most closely correlated with voluntary movement
and dopamine production; thus, excessive damage and inhibition of the neurons can lead
to noticeable manifestations of PD. For example, common in a majority of PD patients
is an involuntary tremor within the hands and sometimes feet [11]. Motor control and
movement is inhibited as well, with sudden bouts of muscle rigidity preventing typical
bodily actions [11].
Currently, doctors and laboratories tasked with diagnosing PD base their reports
upon symptom descriptions and brain scans, particularly dopamine mapping. In particu-
lar, tremors and muscle stiffness symptoms typically reported by PD patients result in the
official diagnosis of the disease due to its connection to the substantia nigra [3]. Rigorous
PD diagnosis, however, is difficult to conduct properly due to the unavailability of
clinical PD tools and the lack of symptoms unique to PD. While the disease results in the
manifestation of numerous symptoms, these are oftentimes not related solely to PD, and
can also be attributed to a variety of other diseases and disorders. Utilizing dopamine
mapping as the basis for PD diagnosis also leads to dramatically limited accessibility to
patients worldwide due to the lack of proper imaging tools.
In conjunction with the aforementioned symptoms, PD patients also experience a
change in speech patterns, often with slight variations in enunciation [3]. While such
changes are typically too insignificant to be noticed by the human observer on individual
cases, tell-tale patterns are clear within frequency metrics that are derived from vocal
recordings. The prevalence of such a symptom within the PD population as a whole
allows for an opportunity to differentiate active PD patients through a machine learning
approach. Unlike other widely considered symptoms, PD voice discrepancies are specific
to the PD disease, and a diagnosis tool based around voice fluctuations could prove to
be comparatively rigorous in the diagnostic process.
2 Methods
2.1 Dataset
This study utilized biomedical voice measurements from voice recordings to both
train and evaluate Gradient Boosting models to detect PD. The data were taken
from the UCI Machine Learning Repository's Parkinsons Data Set [10].
This dataset includes a total of 195 voice recordings taken from 31 subjects, 23
of whom have PD and 8 of whom are healthy. Subjects who have PD are labeled
with a 1 and healthy subjects are labeled with a 0. For each voice recording,
22 biomedical voice measurements are included. These biomedical voice measure-
ments include the average vocal fundamental frequency (MDVP:Fo(Hz)), the max-
imum vocal fundamental frequency (MDVP:Fhi(Hz)), the minimum vocal funda-
mental frequency (MDVP:Flo(Hz)), five measures of variation in fundamental fre-
quency (MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP),
six measures of variation in amplitude (MDVP:Shimmer, MDVP:Shimmer(dB), Shim-
mer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA), two measures of ratio of
noise to tonal components in the voice (NHR, HNR), two nonlinear dynamical complex-
ity measures (RPDE, D2), a signal fractal scaling exponent (DFA), and three nonlinear
measures of fundamental frequency variation (spread1, spread2, PPE).
Gradient Boosting begins with an initial constant prediction, F_0(x) = argmin_γ Σ_{i=1}^{n} L(y_i, γ)  (1).
For Eq. (1), L is the loss function and y_i is the ith label.
Once an initial prediction is made, regression trees are then constructed based on the
pseudo residuals of the previous prediction [1]. The equation for calculating the pseudo
residuals is shown below:
r_im = −[∂L(y_i, F(x_i)) / ∂F(x_i)]_{F(x) = F_{m−1}(x)}  for i = 1 … n  (2)
For Eq. (2), r_im is the pseudo residual for sample i. This pseudo residual will be used to
create regression tree m.
To create regression tree m, a regression tree is fitted to the pseudo residuals [1]. The
terminal regions of the regression tree are denoted by R_jm, where j is the number of the
terminal region in the regression tree and m is the number of the regression tree [1]. The
output for each leaf node of the tree is then computed with the following equation, for j = 1 … J_m:
γ_jm = argmin_γ Σ_{x_i ∈ R_jm} L(y_i, F_{m−1}(x_i) + γ)  (3)
For Eq. (3), J_m is the number of terminal regions for regression tree m.
Using the outputs from the tree, the predictions (denoted by F_m(x)) are now updated.
F_m(x) = F_{m−1}(x) + ν Σ_{j=1}^{J_m} γ_jm I(x ∈ R_jm)  (4)
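To make Eqs. (2)–(4) concrete, the following minimal sketch performs one boosting iteration for the squared-error loss, for which the pseudo residuals of Eq. (2) reduce to y − F_{m−1}(x); it is an illustration only, not the implementation used in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_iteration(X, y, F_prev, learning_rate=0.1):
    """One gradient-boosting step for the squared-error loss."""
    residuals = y - F_prev                     # pseudo residuals, Eq. (2)
    tree = DecisionTreeRegressor(max_depth=3)  # regression tree m fitted to the residuals
    tree.fit(X, residuals)
    gamma = tree.predict(X)                    # leaf means act as the gamma_jm of Eq. (3)
    return F_prev + learning_rate * gamma      # shrunken update with factor v, Eq. (4)
```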
There are multiple other variations of Gradient Boosting which introduce improve-
ments to the base algorithm. XGBoost (eXtreme Gradient Boosting) is a version of Gra-
dient Boosting that improves on scalability [12]. LightGBM (Light Gradient Boosting)
decreases memory usage and training time [4]. CatBoost allows for automatic handling
of categorical features and reduces overfitting [7].
This study utilized five different Gradient Boosting models: XGBoost, HistGradient-
Boosting, GradientBoosting, LightGBM, and CatBoost. XGBoost was implemented
using the XGBClassifier class from the xgboost Python library. HistGradientBoosting and Gra-
dientBoosting were implemented using sklearn’s HistGradientBoostingClassifier and
GradientBoostingClassifier classes. LightGBM was implemented using the LGBM-
Classifier class from the lightgbm Python library. CatBoost was implemented using
the CatBoostClassifier class from the catboost Python library.
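A minimal sketch of how the five classifiers can be instantiated and fitted is shown below; hyperparameters are left at their library defaults because the study's settings are not stated here, and the training arrays are synthetic stand-ins.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.ensemble import HistGradientBoostingClassifier, GradientBoostingClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 22))            # synthetic stand-in features
y_train = rng.integers(0, 2, size=100)          # synthetic stand-in PD labels

models = {
    "XGBoost": XGBClassifier(),
    "HistGradientBoosting": HistGradientBoostingClassifier(),
    "GradientBoosting": GradientBoostingClassifier(),
    "LightGBM": LGBMClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)                 # each model is trained on the same training split
```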
The biomedical voice measurements in the data were also normalized using Min-Max
scaling. For a given feature, Min-Max scaling subtracts the minimum value for that
feature and then divides the result by the range (the maximum minus the minimum) for that feature. The equation
for Min-Max scaling is shown below:
x_i′ = (x_i − min(x_i)) / (max(x_i) − min(x_i))  (5)
For Eq. (5), x_i is the ith feature. To prevent data leakage, the maximum and minimum
feature values used were taken from the training set.
Standardization was also tested as an additional data preprocessing method. Stan-
dardization works similarly to Min-Max scaling, except that in Standardization the mean
of the values for a feature are subtracted from each feature value and the result is then
divided by the standard deviation for the feature values.
x_i′ = (x_i − μ) / σ  (6)
For Eq. (6), μ represents the mean of x_i and σ represents the standard deviation of x_i. For
standardization, the mean and standard deviation values used were taken from the training
set.
Min-Max scaling and Standardization were each tested for every model, both with and
without mRMR feature selection and PCA.
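A brief sketch of this leakage-safe preprocessing with scikit-learn is shown below; the scaler statistics are learned on the training split only and reused unchanged on the held-out data (the arrays here are synthetic stand-ins).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 22))        # synthetic stand-in training features
X_cv = rng.normal(size=(40, 22))            # synthetic stand-in cross-validation features

scaler = MinMaxScaler()                     # swap in StandardScaler() for Eq. (6)
X_train_s = scaler.fit_transform(X_train)   # min/max (or mean/std) taken from the training set only
X_cv_s = scaler.transform(X_cv)             # applied unchanged to the held-out data
```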
In contrast to algorithms such as Boruta that seek to identify all features with any predictive capabil-
ity, mRMR identifies a small subset of features that will be the most useful [13]. For
this dataset in particular, which includes numerous redundant features (such as the six
different measures of variation in amplitude), decreasing the number of features to the
most essential will help to eliminate redundant features and potentially improve model
performance. For this study, mRMR feature selection was used to reduce the number of
features from 22 to 20. Additionally, to prevent data leakage, the features will be selected
based on training data. The models will be tested both with and without mRMR feature
selection.
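The study does not specify its mRMR implementation, so the sketch below illustrates the idea with a simple greedy variant: relevance is estimated with mutual information and redundancy with the mean absolute correlation to already-selected features.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def mrmr_select(X: pd.DataFrame, y, k=20):
    """Greedy mRMR-style selection: maximise relevance minus mean redundancy."""
    relevance = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
    corr = X.corr().abs()
    selected, remaining = [], list(X.columns)
    for _ in range(k):
        if selected:
            redundancy = corr.loc[remaining, selected].mean(axis=1)  # similarity to chosen features
            score = relevance[remaining] - redundancy
        else:
            score = relevance[remaining]                             # first pick: pure relevance
        best = score.idxmax()
        selected.append(best)
        remaining.remove(best)
    return selected
```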
Similar to mRMR, Principal Component Analysis (PCA) reduces the number of features
inputted to the model [2]. However, unlike mRMR which selects features to utilize, PCA
condenses features that correlate with one another into a new feature [2]. This allows
PCA to both reduce the number of dimensions and minimize the amount of information
lost in the process [2]. For this study, PCA was used to reduce the number of dimensions
from 22 to 20. Additionally, to prevent data leakage, PCA was performed based on data
from the training set. The models will be tested both with and without PCA.
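A corresponding leakage-safe PCA sketch with scikit-learn is shown below; the components are fitted on (already scaled) training data and then applied to the held-out split, with synthetic stand-in arrays used for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train_s = rng.normal(size=(100, 22))      # synthetic stand-in for scaled training features
X_cv_s = rng.normal(size=(40, 22))          # synthetic stand-in for scaled held-out features

pca = PCA(n_components=20)                  # reduce 22 dimensions to 20
X_train_p = pca.fit_transform(X_train_s)    # components learned from the training split only
X_cv_p = pca.transform(X_cv_s)
```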
The five different Gradient Boosting models were trained on the data with different
combinations of Min-Max scaling, Standardization, mRMR, and PCA: one group was trained
on data with Min-Max scaling, one on data with Min-Max scaling and mRMR, one on
data with Min-Max scaling and PCA, one on data with Standardization, one on data
with Standardization and mRMR, and one on data with Standardization and PCA.
In total, 30 models were trained on the training set and then evaluated and
compared on the cross validation set. The models were evaluated on the cross validation
set using the metrics of accuracy, sensitivity, specificity, AUC score, and F1 score.
AUC Score: The AUC score is the area under the Receiver Operating Characteristic
(ROC) Curve. An ROC Curve is generated by varying the model’s threshold and plotting
the different false positive and true positive rates. The area under this curve works as a
measure of how likely a model is to output a higher probability for a positive example
than a negative example. For example, an AUC score of 0.7 would represent that if given
a positive example and a negative example, the model will output a higher probability
for the positive example than the negative one 70% of the time.
F1 Score: A model’s F1 score is the harmonic mean of the model’s precision and recall.
Precision is the likelihood that, if the model predicts that a given example is positive,
the example is actually positive. This metric is also known as the Positive Predictive
Value (PPV). Recall is the same as sensitivity (the model's accuracy on positive exam-
ples). The term recall is used in this context because recall is most commonly used when
concerning F1 score. The equation for the F1 score is shown below:
F1 = 2 × (precision × recall) / (precision + recall)  (7)
For Eq. (7), it is common to also add a value ε to the denominator. ε is often a very small
value (such as 1e−100) and serves to prevent dividing by zero. Equation (7) with the ε
term included is shown below:
F1 = 2 × (precision × recall) / (precision + recall + ε)  (8)
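For reference, the five evaluation metrics can be computed from a fitted classifier with scikit-learn as sketched below; `model`, `X_cv`, and `y_cv` are placeholders for a trained model and the cross-validation split.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, f1_score

def evaluate(model, X_cv, y_cv):
    """Accuracy, sensitivity, specificity, AUC and F1 on a held-out split."""
    y_prob = model.predict_proba(X_cv)[:, 1]        # probability of the PD (positive) class
    y_pred = (y_prob >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_cv, y_pred).ravel()
    return {
        "accuracy": accuracy_score(y_cv, y_pred),
        "sensitivity": tp / (tp + fn),              # recall on positive examples
        "specificity": tn / (tn + fp),
        "auc": roc_auc_score(y_cv, y_prob),
        "f1": f1_score(y_cv, y_pred),
    }
```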
3 Results
The five models were trained on the training set with different combinations of Min-
Max scaling, Standardization, mRMR feature selection, and PCA. The models were then
evaluated on the cross validation set based on their accuracy, sensitivity, specificity, AUC
score, and F1 score. The results for each model on the cross validation set are shown in
Tables 1, 2, 3, 4, 5 and 6.
The HistGradientBoosting model with Min-Max and mRMR performed the best on
the cross validation set because, as shown in Table 3, it obtained the highest accuracy,
specificity, AUC score, and F1 score and obtained the second highest sensitivity. This
model was then evaluated on the test set. Its performance on the test set is shown in
Table 7 and Fig. 1.
PD currently affects roughly 1 million people in the U.S. alone, with 60,000 U.S. citizens
being positively diagnosed for the disease annually [9]. With this statistic only expected to
rise in the future, it is becoming increasingly important to diagnose PD in its early stages.
Prolonged diagnosis delays have proven to be catastrophic for the livelihoods of families
and patients due to a lack of proper medication and attention. While alternatives for
diagnosis such as dopamine screening and symptom checklists exist, they require medical
professionals and extensive equipment to allow for proper execution; not only is this not
accessible to many populations, but it can also be extremely expensive. Furthermore,
many of the tested symptoms of PD also overlap with the known symptoms for other
diseases. Creating a viable and accurate solution for the rapid diagnosis of PD is an
essential asset in the race to stem disease progression. Voice change, being a PD-specific
symptom, can easily be scaled for analysis due to its prevalence in
positively diagnosed patients. A machine learning algorithm to detect discrepancies in
patient voices for diagnosis tackles the issues of both accessibility and cost by creating a
readily available software solution. Voice is also unique with regards to PD, and so can
be used as a relatively accurate metric for diagnosis.
To improve the accuracy and efficiency of speech-based PD diagnosis, this study
aims to apply Gradient Boosting to classify a patient as either having PD or being
healthy based on various biomedical voice measurement features. After training on
the biomedical voice measurements from 117 voice recordings, the best performing
Gradient Boosting method was found to be HistGradientBoosting with Min-Max feature
scaling and mRMR feature selection. On the cross validation set, this model achieved
an accuracy of 94.8718%, a sensitivity of 96.7742%, a specificity of 87.5000%, an
AUC score of 0.921371, and an F1 score of 0.967742. When tested on the test set,
References
1. Gradient boosting machines, a tutorial. ResearchGate. https://www.researchg
ate.net/publication/259653472_Gradient_Boosting_Machines_A_Tutorial. Accessed 16 Feb
2022
2. Gewers, F.L., et al.: Principal component analysis: A natural approach to data exploration, 19
June 2018 arXiv.org. https://arxiv.org/abs/1804.02502. Accessed 16 Feb 2022
3. How parkinson’s disease is diagnosed. Johns Hopkins Medicine. (n.d.). https://www.hopkin
smedicine.org/health/treatment-tests-and-therapies/how-parkinson-disease-is-diagnosed. 16
Feb 2022
4. Ke, G., et al.: LightGBM: A highly efficient gradient boosting decision tree. Microsoft
Research, 6 August 2019. https://www.microsoft.com/en-us/research/publication/lightgbm-
a-highly-efficient-gradient-boosting-decision-tree/. Accessed 16 Feb 2022
5. Niccolini, F., Su, P., Politis, M.: Dopamine receptor mapping with PET imaging in parkinson’s
disease. J. Neurol. December 2014. https://pubmed.ncbi.nlm.nih.gov/24627109/. 16 Feb 2022
6. Mayo Foundation for Medical Education and Research. Parkinson’s disease. Mayo Clinic,
14 January 2022. https://www.mayoclinic.org/diseases-conditions/parkinsons-disease/dia
gnosis-treatment/drc-20376062. 16 Feb 2022
7. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: Unbiased
boosting with categorical features, 20 January 2019. arXiv.org, https://arxiv.org/abs/1706.
09516. 16 Feb 2022
8. Spine, M.B. (n.d.). Parkinson’s disease. Parkinson’s Disease (PD) Mayfield Brain & Spine
Cincinnati, Ohio. https://mayfieldclinic.com/pe-pd.htm. Accessed 16 Feb 2022
9. Statistics. Parkinson’s Foundation. (n.d.). https://www.parkinson.org/Understanding-Par
kinsons/Statistics#:~:text=Nearly%20one%20million%20people%20in,to%201.2%20mill
ion%20by%202030. Accessed 16 Feb 2022
10. UCI Machine Learning Repository: Parkinsons data set. (n.d.). https://archive.ics.uci.edu/ml/
datasets/parkinsons. Accessed 16 Feb 2022
11. U.S. Department of Health and Human Services. (n.d.). Parkinson’s disease. National Institute
on Aging. https://www.nia.nih.gov/health/parkinsons-disease. 16 Feb 2022
12. XGBoost: A scalable tree boosting system. ResearchGate. https://www.researchg
ate.net/publication/310824798_XGBoost_A_Scalable_Tree_Boosting_System. Accessed 16
Feb 2022
13. Zhao, Z., Anand, R., Wang, M.: Maximum relevance and minimum redundancy feature selec-
tion methods for a marketing machine learning platform, 15 August 2019. arXiv.org. https://
arxiv.org/abs/1908.05376. 16 Feb 2022
Optic Disk Detection in Fundus Images
of Retinopathy of Prematurity
Abstract. Children born prematurely may suffer from an eye disease called
retinopathy of prematurity (ROP). To estimate the severity of this disease, physi-
cians need to type, among other things, the extent of the disease on images that
are often of poor quality. The extent of ROP is measured from the optical disc.
Therefore, it is essential to have an automatic method that locates and segments
the optical disc. In order to contribute to a computational method that detects the
optic disc in children’s pathological images, a fast-processing method is presented
in this work. This method creates a template-based local binary pattern histogram.
Next, the method recognizes candidate optic-disk windows from
regional maxima. Then, the template is used to choose the correct optic disk. This
method used thirty images from the ROPFI dataset, which contains infant patholog-
ical images. The optic disk was manually labeled in each image. The optic disk
identification test achieved a sensitivity of 0.95.
1 Introduction
Retinopathy of prematurity is an eye pathology that could occur in children born pre-
maturely. In extreme severity, it can cause blindness. This pathology is typified in the
International Classification of Retinopathy of Prematurity within stages, extent, pre-
plus, and plus-disease [1, 2]. ROP is the infantile eye pathology that, in recent years,
has attracted the most effort toward diagnosis using computational solutions [3], including the
use of artificial intelligence approaches [4]. However, researchers face several
difficulties: on the one hand, the limited availability of modern cameras to obtain
high-quality images and the lack of quantitative agreements that would facilitate the implementation
of algorithms, and, on the other hand, the few public datasets accessible for replicating
research [3–5].
Returning to the fact that this pathology is diagnosed by extent (zone I, II, and
III) and that the extension is measured by taking the optic disc as a reference, there is
a clear need for an automatic method that locates the optic disc in these images.
2 Literature Review
In this section, we review proposals for localization of the optic disc in fundus images
of both adults and children.
An automatic method for segmenting the optic disk is presented in [6]. The proce-
dure for OD segmentation applied some techniques like Principal Component Analysis
(PCA), Mathematical Morphology, and Circular Hough Transform. The input image is
represented using PCA, which allows converting the image to grayscale. Then, blood
vessels are removed from the image using mathematical morphology. Finally, the Cir-
cular Hough Transform is applied for the OD separation. This proposal was tested in the
MESSIDOR database. It contains fundus images with the presence of diabetic retinopa-
thy. Authors reported that OD was effectively segmented in some images, whereas, in
others, the proposed approach failed to segment OD. Nevertheless, this work did not
report quantitative results.
The optic disk is automatically segmented in fundus adult images using a morpho-
logical approach, as is detailed in [7]. This research uses a low pass finite impulse
response (FIR) filter to suppress blood vessel dominance and improve the OD area in
fundus images. Optimized grayscale morphological dilation and median filtering oper-
ations are used to segment the OD area. This method was tested in four public datasets
(DRIVE, DIARETDB0, DIARETDB1, and DRIONS). Considering the four datasets,
the sensitivity reached values of up to 0.8707.
An automatic optic disk detection is given in [8]. The authors propose an approach
combining lexicography representation and a support vector machine (SVM) classifier.
Local feature spectrum analysis (LFSA) is used for pre-processing. The sparse dictionary
selection strategy was used to choose optic disc candidate windows in LFSA, and an
SVM implementation realizes the final classification. This proposal was tested on the
MESSIDOR dataset, previously described. This approach achieved an accuracy
of 0.9975.
An optic disk and optic cup (OC) location method is proposed in [9]. The method
is a customized fully convolutional network (FCN) with Inception building blocks from
GoogLeNet. This method is developed for adult fundus images, both healthy and with the
presence of glaucoma. It was tested on two datasets: REFUGE and a private dataset
presence of glaucoma. It was tested in two datasets REFUGE and a private dataset
from the Second Affiliated Hospital of Zhejiang University School of Medicine. The
performance of this method is presented in terms of Intersection-over-Union (IOU) and
Dice scores.
In [10], the optic disc segmentation is achieved with a convolutional neural network
(CNN). The Eye Hospital of Wenzhou Medical University provided manually labeled
images, creating a private dataset of 541 images. The CNN was trained using 487 images,
and testing was achieved with 54 images. The sensitivity was reported equal to 0.94.
In [11], the optic disc is detected and localized, and then the presence of glaucoma is
stated. Regions with Convolutional Neural Networks (RCNN) are used for OD detection,
which is responsible for locating and extracting the optic disc from a fundus image. The
method is tested on several adult datasets ORIGA, HRF, OCT & CFI, DIARETDB,
DRIVE, DRIONS DB, and MESSIDOR. The results show that more than 96% of the
ODs are located with more than 50% of the labeled disk present in the prediction.
In [12], a Locally Statistical Active Contour Model (LSACM) is introduced, which
allows the modeling of inhomogeneous objects as Gaussian distributions of different
means and variances. To achieve the modeling, a sliding window is used to map the
original image to another domain, in which the intensity distribution of each object with
intensity non-homogeneity has less statistical overlap. Then, a maximum likelihood
energy function is solved to approximate the correct image signal. The results obtained
for this proposed method, on the adult fundus images DRISHTI-GS, averaged 0.95 for
the F-Score metric and 8.32 for the boundary-based distance metric.
In [13], LSACM is also applied for optic disc and optic cup segmentation in the pres-
ence of fundus intensity non-homogeneity. This method was applied to the DRISHTI-GS
and RIM-ONE R2 adult patient datasets. The results obtained for optic disc and optic
cup performance in F-score are 0.5% and 2.2% higher than baseline, respectively.
At the time of writing this article, we have identified only one research paper propos-
ing the detection of the optic disc in fundus images of infant patients [10]. The other
papers propose methods applied to images of adult patients. Motivated by this gap, this
article details a method for detecting the optic disc in images of preterm
infants. These images present challenges due to their low quality; the method aims to contribute to
the later stages of the computational processing to assist in Retinopathy of Prematurity
(ROP) diagnosis.
3 Method
Given the significance of identifying the optical disk in the fundus images from premature
births, an effective method for recognizing the optical disk is proposed. The difficulty
of the automatic digital processing of these pathological images is considered. The
Retinopathy of Prematurity Fundus Images (ROPFI) dataset is used in this research [14].
ROPFI has sixty-four pathological images. However, only thirty images were suitable
to work on for this research. Images that did not contain the complete optical disc were
discarded.
The method is a template matching proposal. In advance, a template is created
based on a histogram representation. Then regions of regional maxima are found, the
template is compared with each region, and the most similar (or least different) region is the most
probable optic disk (see the method's graph in Fig. 1). In order to create the template
for matching, optic disk samples from ten images were used. The template is based
on the local binary pattern (LBP) histogram developed in [15]. Having the template,
the process of identifying the optic disk on any fundus image can be executed. The
input is a color fundus image and its mask. The mask is a binary image that delimits the
region of the vascular network. It optimizes the processing by sidestepping the
nonvascular area, which could be black background or an abnormal area of the retina.
Then, the color fundus image is enhanced in contrast, brightness, and gamma correction
using the method proposed in [14]. The image is then converted to grayscale for the
next step, and regional maxima are computed from the grayscale image. The
maximum regions represent the lightest (closest to white) pixels, which are likely to be
the optic disk. The center of mass of each maximum region is calculated, and a window
is created from that center. The LBP is calculated and compared with the template using
the structural similarity index measure (SSIM) proposed in [16]. The histogram with the
most significant similarity to the template has the highest probability of being the optic
disk. Finally, the optic disk is marked on the original picture. The detail of each step
is described below.
The original images present poor contrast. In order to facilitate the later steps of process-
ing, it is required to alter the contrast and brightness. Therefore, the proposed method
in [11] was selected. This method performs an adaptive improvement using a feedfor-
ward artificial neural network to choose the filters' parameters. The filters used are basic
contrast and brightness, gamma correction, and contrast-limited adaptive histogram
equalization. In addition, Gaussian smoothing increases the distinction of connected
pixels in the same region [17]. See an example image in Fig. 2.
Fig. 1. The method proposed to recognize the optic disk from fundus images with the presence
of retinopathy of prematurity. Contrast has been modified in some images for visibility purposes.
Fig. 2. A color fundus image from the ROPFI dataset. (a) Original image. (b) Its corresponding
image where contrast and brightness are enhanced.
The optic disk template is created in terms of a local binary pattern histogram (LBPH)
[15]. It has been created using ten images of the ROPFI set. Images were improved
using the method mentioned in the following subsection. To create the template, only
the optic disk is required; for this reason, the optic disk was cropped in a window of
100x100 pixels. A sample is presented in Fig. 3. The LBP histogram of each image was
computed, and the template is the result of the average. The histograms of each model
image and the resulting template are shown in Fig. 4 (samples of the cropped optic disks are shown in Fig. 3).
Fig. 3. Sample of three optic disks used for generating the template.
Fig. 4. Local binary pattern histograms (LBPH) of ten optic disks utilized to create the template.
The last graph is the template’s LBPH.
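The original implementation is in Matlab; the Python sketch below (using scikit-image) illustrates how such a template could be built by averaging uniform LBP histograms of the cropped optic-disc windows. The neighbourhood parameters P and R are assumptions, as they are not given in the text.

```python
import numpy as np
from skimage.feature import local_binary_pattern

P, R = 8, 1  # assumed LBP neighbourhood size and radius (not stated in the paper)

def lbp_histogram(window):
    """Normalised uniform-LBP histogram of a grayscale window."""
    codes = local_binary_pattern(window, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=np.arange(0, P + 3), density=True)
    return hist

def build_template(optic_disc_windows):
    """Average the LBP histograms of the manually cropped 100x100 optic-disc samples."""
    return np.mean([lbp_histogram(w) for w in optic_disc_windows], axis=0)
```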
The main features of the optical disk are a ring-shaped region, the clearest region, and
the thickest blood vessels. Given these characteristics, the clearest regions
are searched for. They are known as regional maxima and are computed with eight-
connected neighbors. Then, the centroid is identified for each region. A window of the
same size as the template is drawn from each centroid. Its LBP feature vector is generated
and compared with the template, and the optic disk is decided by the region with the
most significant similarity to the LBP template. The similarity measure is described in
the following subsection.
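A sketch of this candidate-selection step, building on the `lbp_histogram` helper above, is shown below. Regional maxima and their centroids are obtained with scikit-image/SciPy, and here candidate histograms are compared to the template with the mean squared error (the SQE variant reported in the paper); the SSIM comparison would follow the same structure.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import local_maxima

def detect_optic_disc(gray, template, win=100):
    """Return the centre (row, col) of the window whose LBP histogram best matches the template."""
    mask = local_maxima(gray)                                     # brightest (regional-maximum) pixels
    labels, n = ndimage.label(mask)
    centers = ndimage.center_of_mass(mask, labels, range(1, n + 1))
    best, best_score = None, np.inf
    for cy, cx in centers:
        y0 = max(int(cy) - win // 2, 0)
        x0 = max(int(cx) - win // 2, 0)
        window = gray[y0:y0 + win, x0:x0 + win]
        if window.shape != (win, win):
            continue                                              # skip windows clipped by the border
        score = np.mean((lbp_histogram(window) - template) ** 2)  # SQE between candidate and template
        if score < best_score:
            best, best_score = (int(cy), int(cx)), score
    return best
```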
where α, β, γ are parameters to set the weight of the three terms. Luminance is driven
by Weber’s law, contrast is compared by the mean of intensity standard deviation, and
structure comparison is performed by cross-covariance. The development of this metric
is available in [16].
This method has been implemented in Matlab 2021b. Most of the algorithms have
been coded, as well as the similarity and dissimilarity measures.
4 Results
This research work used the ROPFI image set, containing only pathologic images. Suc-
cessive filters and the definition of an effective template achieved a 95% true positive
rate or sensitivity. Sensitivity is defined in [18] and mathematically expressed in (2).
Sensitivity = TP / P  (2)
where TP means true-positive, and it is the number of correctly recognized optic discs,
and P is the total number of test images. In this case, there are no negative classes (N)
because all images contain an optic disc.
A comparison with related work is presented in Table 1. Two recent works about
adult images and only one about children’s images are summarized. The F-score metric
used in one related work can be reviewed at [18].
The nature of fundus imaging varies between adults and children; even so, the result
of our model is quite close to imaging studies of adult patients.
Fig. 5. Mean square error measure between images used for getting the template and the template.
Images five and six are the most and less different, respectively.
Fig. 6. Structural similarity index measure between images used for getting the template and the
template. Images six and nine are the most and less similar, respectively.
5 Discussion
In this research, we have chosen to present the proposal as two variants, since the
results vary according to the similarity metric chosen: mean squared error (SQE) and
structural similarity index measure (SSIM). This decision made it possible to find the
best metric for refining the method's performance. In this case, the SSIM metric gave the
best performance.
Table 1. Performance of recent proposals to detect the optic disk. SQE: mean squared error,
SSIM: structural similarity index measure
6 Conclusion
In this research work, a straightforward technique for optic disc recognition has been
introduced. Its main contributions are that it focuses on dealing with
pathological images of infants, its simplicity makes it possible to run on any computer,
and its performance is slightly superior to other related work.
The proposed method is based on image processing techniques. First, a template
describing the optical disc is established. Then, candidate regions are identified, and the
region with the best match with the template is chosen. This simple solution is suitable
for the needs of many physicians.
In future work, the method should be tested on other datasets to refine its model if
necessary; a dataset with a larger number of images should be acquired or created; and new
methods should be provided to achieve a complete typification of ROP disease.
References
1. Zhao, J., et al.: A deep learning framework for identifying zone I in RETCAM images. IEEE
Access. 7, 103530–103537 (2019). https://doi.org/10.1109/ACCESS.2019.2930120
2. An international committee for the classification of retinopathy of prematurity: an inter-
national classification of retinopathy of prematurity revisited. Arch. Ophthalmol. 102,
1130–1134 (1984). https://doi.org/10.1001/archopht.1984.01040030908011
3. Reid, J.E., Eaton, E.: Artificial intelligence for pediatric ophthalmology. Curr. Opin.
Ophthalmol. 30, 337–346 (2019). https://doi.org/10.1097/ICU.0000000000000593
4. Scruggs, B.A., Paulchan, R. V., Kalpathy-Cramer, J., Chiang, M.F., Peter Campbell, J.: Arti-
ficial intelligence in retinopathy of prematurity diagnosis. Trans. Vis. Sci. Technol. 9, 1–10
(2020). https://doi.org/10.1167/tvst.9.2.5
5. Shen, Y., et al.: Domain-invariant interpretable fundus image quality assessment. Med. Image
Anal. 61, 101654, (2020). https://doi.org/10.1016/j.media.2020.101654
6. Akhade, S.B., Deshmukh, V.U., Deosarkar, S.B.: Automatic optic disc detection in digital
fundus images using image processing techniques. In: 2014 International Conference on
Information Communication and Embedded Systems, ICICES 2014. (2015). https://doi.org/
10.1109/ICICES.2014.7034118
7. Bharkad, S.: Automatic segmentation of optic disk in retinal images. Biomed. Signal Process.
Control 31, 483–498 (2017). https://doi.org/10.1016/J.BSPC.2016.09.009
8. Zhou, W., Wu, H., Wu, C., Yu, X., Yi, Y.: Automatic optic disc detection in color retinal
images by local feature spectrum analysis. Comput. Math. Meth. Med. 2018 (2018). https://
doi.org/10.1155/2018/1942582
9. Qin, P., Wang, L., Lv, H.: Optic disc and cup segmentation based on deep learning. In: Pro-
ceedings of 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation
Control Conference, ITNEC 2019, pp 1835–1840 (2019). https://doi.org/10.1109/ITNEC.
2019.8729455
10. Mao, J., et al.: Automated diagnosis and quantitative analysis of plus disease in retinopathy of
prematurity based on deep convolutional neural networks. Acta Ophthalmol. 98, e339–e345
(2020). https://doi.org/10.1111/aos.14264
11. Bajwa, M.N., Malik, M.I., Siddiqui, S.A., et al.: Two-stage framework for optic disc local-
ization and glaucoma classification in retinal fundus images using deep learning. BMC Med
Inform Decis Mak 19, 136 (2019). https://doi.org/10.1186/s12911-019-0842-8
12. Gao, Y., Yu, X., Wu, C., Zhou, W., Wang, X., Chu, H.: Accurate and efficient segmentation of
optic disc and optic cup in retinal images integrating multi-view information. IEEE Access.
7, 48183–148197 (2019). https://doi.org/10.1109/ACCESS.2019.2946374
13. Jiang, Y., Tan, N., Peng, T.: Optic disc and cup segmentation based on deep convolutional
generative adversarial networks. IEEE Access. 7, 64483–64493 (2019). https://doi.org/10.
1109/ACCESS.2019.2917508
14. Intriago-Pazmino, M., Ibarra-Fiallo, J., Crespo, J., Alonso-Calvo, R.: Enhancing vessel vis-
ibility in fundus images to aid the diagnosis of retinopathy of prematurity. Health Inform. J.
1–15 (2020). https://doi.org/10.1177/1460458220935369
15. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant
texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24,
971–987 (2002). https://doi.org/10.1109/TPAMI.2002.1017623
16. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error
visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004). https://doi.
org/10.1109/TIP.2003.819861
17. Hsiao, P.Y., Chou, S.S., Huang, F.C.: Generic 2-D gaussian smoothing filter for noisy image
processing. In: IEEE Region 10 Annual International Conference, Proceedings/TENCON,
pp. 1–4 (2007). https://doi.org/10.1109/TENCON.2007.4428941
18. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874 (2006).
https://doi.org/10.1016/j.patrec.2005.10.010
Machine Learning Computational
Framework for Alzheimer’s Disease
Stages Classification
1 Introduction
Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by dete-
riorating cognitive functions and neuropsychiatric symptoms. AD is a progres-
sive disease, typically resulting in episodic memory loss and behavioral changes.
Symptoms occur because nerve cells (neurons) in parts of the brain involved
in thinking, learning, and memory (cognitive function) have been damaged or
destroyed. The damage to the brain is usually irreversible. Eventually, neurons in
parts of the brain that enable a person to carry out basic bodily functions, such
as walking and swallowing, are affected. Also, individuals become bedbound and
require around-the-clock care; the disease is ultimately fatal [2,29]. Between 50% and 75% of
dementia cases worldwide have been characterized as AD. It is the sixth-leading cause
of death in the United States and the fifth-leading cause of death among Amer-
icans aged 65 and older, with over 6.2 million Americans affected as of 2021,
and it is expected that the number will increase up to 13.8 million by 2060. It is
reported that deaths from stroke, heart disease, and HIV decreased, but reported
deaths from AD increased by about 145% [2]. Research has shown that the progression
of AD may improve if it can be detected early and treatment is taken at the ini-
tial stages [14,21]. AD is thought to begin 20 years or more before symptoms
arise [2]. Once patients are diagnosed with AD, after about 3–10 years the
patient will eventually die [13]. The detection of AD at its different stages is a
priority for medical practices to provide adequate and accurate treatments.
Machine Learning (ML) techniques have been widely used in biomedical research due to their ability to capture complex patterns in challenging datasets [18,22]. Several works have applied ML algorithms to predict the different stages of Alzheimer's disease from different biomarkers [3,6,28,34]. For example, a boosting-based ML predictive model was investigated for accurate prediction of the age of AD onset in individuals, to inform potential treatments and therapeutic interventions; the biomarkers were characterized by extracellular deposits of the beta-amyloid (Aβ) peptide, the formation of intracellular neurofibrillary tangles of hyperphosphorylated tau protein (p-Tau), and the impairment of neurons and synaptic connections in the cerebral cortex and hippocampus. In that case the performance of the model was evaluated using the Root Mean Square Error (RMSE), with 1.79 as the best result [34].
ML models have also been applied to Magnetic Resonance Imaging (MRI), contributing to faster diagnosis of AD and to predicting the evolution of the disease from longitudinal brain MRI features. A framework of supervised learning classifiers that categorizes dementia subjects as either AD or non-AD based on MRI features was proposed, obtaining an accuracy of 97.58% [6]. The performance of genetic-based multivariate ML strategies in predicting late-onset AD, and in describing the main genetic features associated with the risk of developing it, has also been studied; individuals labeled as either Cognitively Normal or Alzheimer's Disease were selected [28]. In addition, a generalized classification schema was proposed in which all stages of the disease are considered simultaneously in the classification and decision-making processes for predicting AD onset, while missing information is handled at the same time; it achieved an accuracy of 80.52% [3].
To study the stages of the disease, different biomarkers have been integrated and studied to improve the diagnosis of AD and its different stages [1,19,31]. For example, the most common biomarkers used to predict and identify AD onset are: main cognitive tests (CDR Sum of Boxes, ADAS11, ADAS13, MMSE, RAVLT, MoCA, Ecog); MRI ROI measures (volumes, cortical thicknesses, surface areas); FDG PET ROI averages (cell metabolism, where cells affected by AD show reduced metabolism); AV45 PET ROI averages (amyloid-beta load in the brain, where amyloid-beta is a protein that misfolds and leads to AD); AV1451 PET ROI averages (tau load in the brain, where tau is another protein that, when abnormal, damages neurons and thus leads to AD); DTI ROI measures (microstructural parameters related to cells and axons, such as radial and axonal diffusivity); and CSF biomarkers (amyloid and tau levels in the cerebrospinal fluid, as opposed to the cerebral cortex).
Given the need to study AD and its stages, different frameworks based on deep learning models have been proposed [8,16,26]. However, deep learning models are well known to require large amounts of data to perform well, and access to data in the field of neurodegenerative disorders is limited. In this paper, an ML computational framework is proposed that integrates different classical machine learning models, as well as imputation methods to handle missing data. ML methods are known to perform well when the data quality is good. In the experimental results, we provide the performance of five different ML models for multiclass prediction of AD stages, discriminating cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer's disease (AD). These models are evaluated in terms of accuracy, precision, and F1-score. The analysis of three imputation methods for handling the missing-value problem is also presented. A general schema that integrates ML models for multiclass prediction of AD stages is proposed, achieving an average accuracy of 99%.
This paper is organized as follows. Section 2 presents a description of the ensemble learning model and feature selection. Section 3 describes the machine learning models used to predict the stages of Alzheimer's disease. Section 4 provides a performance analysis of the models using accuracy, precision, and F1-score as metrics, Sect. 5 discusses the results, and Sect. 6 provides the conclusions.
Table 2 shows the important ADNI biomarkers used in the analysis. The Boruta algorithm consists of the following steps [23] (a minimal sketch follows the list):
1. Extend the information system by adding copies of all variables (the infor-
mation system is always extended by at least 5 shadow attributes, even if the
number of attributes in the original set is lower than 5).
2. Shuffle the added attributes to remove their correlations with the response.
3. Run a random forest classifier on the extended information system and gather
the Z scores computed.
4. Find the maximum Z score among shadow attributes (MZSA), and then
assign a hit to every attribute that scored better than MZSA.
5. For each attribute with undetermined importance perform a two-sided test
of equality with the MZSA.
6. Deem the attributes which have importance significantly lower than MZSA as
‘unimportant’ and permanently remove them from the information system.
7. Deem the attributes which have importance significantly higher than MZSA
as ‘important’.
8. Remove all shadow attributes.
9. Repeat the procedure until the importance is assigned for all the attributes,
or the algorithm has reached the previously set limit of the random forest
runs.
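As a minimal illustration of the shadow-attribute comparison in steps 1–4, the following Python sketch (assuming scikit-learn and NumPy, with synthetic placeholder data instead of the actual ADNI features) performs a single pass; the full Boruta procedure repeats it, as in steps 5–9, using Z scores accumulated over many random forest runs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data standing in for the ADNI feature matrix (Table 2 features).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

rng = np.random.default_rng(0)
X_shadow = X.copy()
for j in range(X_shadow.shape[1]):                 # step 2: shuffle each copied attribute
    X_shadow[:, j] = rng.permutation(X_shadow[:, j])   # independently, breaking its link with y
X_ext = np.hstack([X, X_shadow])                   # step 1: extended information system

# Step 3: run a random forest on the extended system and gather importances
# (raw impurity importances here, instead of the Z scores used by full Boruta).
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_ext, y)
real_imp = rf.feature_importances_[: X.shape[1]]
shadow_imp = rf.feature_importances_[X.shape[1]:]

mzsa = shadow_imp.max()                            # step 4: maximum shadow importance (MZSA)
hits = np.where(real_imp > mzsa)[0]                # attributes scoring better than MZSA get a hit
print("attributes with a hit this run:", hits)
```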
We can observe in Table 2 how the variables are confirmed to be essential for the ensemble model: higher mean importance values indicate more important features, and comparing MaxImp against the mean confirms their importance for the model. A wrapper method allows forward and backward elimination to be applied to the subset of ADNI features; we draw inferences from previous training runs to progressively discard irrelevant features. However, we found that all attributes in the subset have importance significantly higher than the maximum Z score among the shadow attributes (MZSA), and they are therefore confirmed as necessary for the subsequent classification tasks.
where p(t) is the relative frequency of the first class in node t. G(t) is then defined as in Eq. 2, where k indexes the target class and pL(·) and pR(·) are the probability distributions of the target in the left and right child nodes, respectively.
3.2 XGBoost
where L(y, F) is the loss function, defined as the negative binomial log-likelihood $\log(1 + e^{-2yF})$ with $y \in \{-1, 1\}$ for the classification problem, and $E_{y,x}$ denotes the expectation over the joint distribution of y and x. The function can be minimized to find a global or local minimum F* using different numerical methods such as gradient descent. For the multiclass classification problem, the gradient-descent boosting algorithm for K classes is defined in [17] as follows.
$$L\left(\{y_k, F_k(x)\}_{1}^{K}\right) = -\sum_{k=1}^{K} y_k \log p_k(x) \qquad (6)$$
Taking the first derivative of Eq. (9), K trees are generated at each iteration m to predict the current residuals for each class on the probability scale.
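The following sketch shows this multiclass boosting objective in practice, assuming the xgboost Python package is available; the data, class labels, and hyperparameters are illustrative placeholders rather than the configuration used in this work.

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder 4-class problem standing in for the AD/CN/EMCI/LMCI labels.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 'multi:softprob' optimizes the multinomial log-loss of Eq. (6),
# growing K trees per boosting round (one per class).
clf = xgb.XGBClassifier(objective="multi:softprob", n_estimators=200,
                        max_depth=4, learning_rate=0.1)
clf.fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)          # p_k(x) on the probability scale
print("test accuracy:", (proba.argmax(axis=1) == y_te).mean())
```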
$$x_j^{h+1} = \sum_i y_i^h w_{ij}^h - \theta_j^{h+1} \qquad (12)$$
where $y_i^h$ is the response of the $i$th neuron in layer $h$, $w_{ij}^h$ is the weight from the $i$th neuron in layer $h$ to the $j$th neuron in layer $h+1$, and $\theta_j^{h+1}$ is the threshold of the $j$th neuron in layer $h+1$. The output of each neuron is the result of an activation function (a nonlinear function); for example, the well-known sigmoid activation function is defined by Eq. (13):
$$y_j^{h} = \frac{1}{1 + e^{-x_j^{h}}} \qquad (13)$$
To find the optimum weight values w, the least mean square (LMS) error over the output vector is used. For a given weight vector w, the LMS error is defined as
$$E(w) = \frac{1}{2}\sum_j \left(y_{j,c}^{h}(w) - \hat{y}_j\right)^2$$
where $y_{j,c}^{h}(w)$ is the output of node $j$ in layer $h$ and $\hat{y}_j$ is its real target. The gradient descent method is applied to minimize the function E(w).
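A small NumPy sketch of the forward pass and error of Eqs. (12) and (13) for a single layer follows; the layer sizes, weights, and target values are illustrative assumptions, not the network configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)                    # outputs y_i^h of the previous layer
W = rng.normal(size=(5, 3))               # weights w_ij^h toward the next layer
theta = rng.normal(size=3)                # thresholds theta_j^{h+1}
target = np.array([0.0, 1.0, 0.0])        # desired outputs \hat{y}_j

def sigmoid(z):                           # Eq. (13): 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

net = x @ W - theta                       # Eq. (12): weighted sum minus threshold
y_out = sigmoid(net)
E = 0.5 * np.sum((y_out - target) ** 2)   # LMS error E(w)
grad = (y_out - target) * y_out * (1 - y_out)   # dE/dnet, used by gradient descent
print("E(w) =", E, "gradient w.r.t. net:", grad)
```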
where $C_1$ and $C_2$ are the two classes. The objective is to find $w$ and $w_0$ such that
$$w^T x^t + w_0 \ge +1 \ \text{ for } r^t = +1, \qquad w^T x^t + w_0 \le -1 \ \text{ for } r^t = -1 \qquad (15)$$
The samples $x^t$ closest to the hyperplane are called support vectors and define the margin that needs to be maximized. The distance between the hyperplane and a sample $x^t$ is defined as
$$\frac{|w^T x^t + w_0|}{\|w\|} \qquad (16)$$
When $r^t \in \{-1, +1\}$, Eq. (16) can be rewritten as Eq. (17):
$$\frac{r^t (w^T x^t + w_0)}{\|w\|} \qquad (17)$$
Given a threshold $\rho > 0$, we require $\frac{r^t (w^T x^t + w_0)}{\|w\|} \ge \rho$ for all $t$. To maximize the margin we need to maximize $\rho$; fixing $\rho\|w\| = 1$, the margin becomes $\frac{1}{\|w\|}$, so maximizing the margin amounts to minimizing $\|w\|$. This yields a constrained optimization problem, described as follows:
$$\min \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad r^t (w^T x^t + w_0) \ge +1, \ \forall t \qquad (18)$$
Solving Eq. (18) yields the optimal hyperplane defined by $w$.
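In practice the constrained problem of Eq. (18) is solved numerically by an off-the-shelf optimizer; the following sketch, assuming scikit-learn, a linear kernel, and synthetic data, illustrates the idea.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two roughly separable placeholder classes (their labels play the role of r^t).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

svm = SVC(kernel="linear", C=1.0)        # solves the (soft-margin) version of Eq. (18)
svm.fit(X, y)

w, w0 = svm.coef_[0], svm.intercept_[0]
margin = 2.0 / np.linalg.norm(w)         # distance between the two supporting hyperplanes
print("support vectors:", len(svm.support_vectors_), "margin width:", margin)
```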
4 Results
The proposed ML computational framework for Alzheimer's disease stage classification has been evaluated using the ADNI dataset, which is a well-established dataset for the study of neurodegenerative disorders. In particular, the ADNI-merge data were used for the evaluation. A total of 1853 subjects were considered, each belonging to one of the groups (AD, CN, EMCI, LMCI) illustrated in Table 3, providing a total of 13182 samples.
To evaluate the performance of the proposed strategy for AD stage classification, four metrics were adopted: accuracy, precision, F1-score, and recall. These metrics quantify how well the models classify patients over the four categories (AD, CN, LMCI, and EMCI); a good classifier yields values close to the top of the [0, 100%] range. In addition, the Receiver Operating Characteristic (ROC) curve was computed, plotting the true positive rate (TPR) against the false positive rate (FPR), which measures the capability of the models to distinguish the patterns among classes.
Table 4 shows the performance of the adopted ML models, which were trained on the preprocessed data. These results show that the proposed strategy provides quality data that help the different models identify and generalize the learned patterns to classify new data with the same characteristics. The trained models achieve an average of 99% on the four metrics over a 10-fold cross-validation process.
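A sketch of how such a 10-fold evaluation could be reproduced with scikit-learn follows; the estimator and the synthetic data stand in for the preprocessed ADNI-merge features and are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Placeholder data standing in for the preprocessed ADNI-merge samples.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)

scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
scores = cross_validate(RandomForestClassifier(random_state=0), X, y,
                        cv=10, scoring=scoring)
for metric in scoring:                     # mean of each metric over the 10 folds
    print(metric, scores[f"test_{metric}"].mean())
```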
Figure 3 shows how well the models distinguish between the positive and negative class points. For each category the area under the curve (AUC) is equal to one, which means that the models perfectly discriminate the samples among the categories using the preprocessed data. In particular, Fig. 3 shows the ROC curves for SVM (right) and decision tree (left).
Table 5 compares the performance of our proposed framework, illustrated in Fig. 1, with that of the approach proposed by Aghili et al. [3]. In the best case, our framework for AD classification is superior by 19% in terms of accuracy, precision, F1-score, and recall.
Fig. 3. ROC curve for SVM (Right Image) and decision tree (Left Image) from the
ensemble learning model.
5 Discussion
The proposed framework can be defined as a sequence of processes that starts with data preparation and ends with the application of different ML models to a classification task. The strength of this framework lies in its data-centric approach, which is the crucial step in providing quality data and therefore improving the performance of well-known ML methods.

Table 5. Performance comparison between the framework proposed in [3] and the one proposed in this work.
Consequently, our framework includes feature selection, data transformation, imputation methods, and five different ML models, which were discussed in Sect. 3. In particular, this framework has been developed for the Alzheimer's disease classification problem, handling the 28% of missing values present in the dataset. The effectiveness of the proposed framework is illustrated in Table 4, showing performance of around 99% in accuracy, precision, F1-score, and recall. Among the five implemented ML models, XGBoost performs best, followed by Decision Tree. Both models achieved their highest performance when the median imputation method was adopted to handle the missing data and improve data quality (a minimal sketch of this step follows). Compared with the results found in the literature [3], an improvement of about 19% in accuracy was obtained.
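A minimal sketch of the median imputation step, assuming scikit-learn; the toy matrix below merely illustrates the mechanism, not the actual ADNI features.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix with missing entries (np.nan), mimicking partial missingness.
X = np.array([[1.0,    np.nan, 3.0],
              [4.0,    5.0,    np.nan],
              [np.nan, 8.0,    9.0],
              [7.0,    2.0,    6.0]])

imputer = SimpleImputer(strategy="median")   # replaces NaNs with per-column medians
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```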
6 Conclusions
An ML computational framework was developed that integrates data processing, feature selection, imputation methods, and five different ML models. The performance of the proposed framework was evaluated, showing that it provides a strategy capable of handling missing data values and producing quality preprocessed data that allows the ML models to learn and distinguish the patterns present in the data. The accuracy, F1-score, recall, and precision metrics were computed, showing that the framework classifies the AD stages with an accuracy of 99%. This paper also shows that classical ML algorithms such as the Decision Tree classifier, XGBoost, Random Forest, the Multi-Layer Perceptron (MLP), and the Support Vector Machine (SVM) can achieve high accuracy when the data quality is good. The paper advances knowledge in the AD field by comparing different ML models within the proposed ensemble learning computational framework, and by promoting the use of traditional machine learning models for AD stage classification.
The dataset used contains 39 features covering cognitive tests, MRI, PET, and genetic information. It is well known that the probability of missing values increases as the number of features increases. Therefore, there is still a need to analyze the proposed framework when a significant percentage of the data is missing (higher than 28%), in other words, to assess how sensitive the framework is in managing missing values while improving data quality and providing high-accuracy classification. In addition, due to the complexity of the framework, an interface is required for those unfamiliar with or not knowledgeable in programming; such an interface would facilitate the use of the framework.
Acknowledgment. The authors would like to acknowledge NIH BioMed Grant Number 150108136 under Florida A&M University, and CI-New: Cognitive Hardware and Software Ecosystem Community Infrastructure, for allowing us to run our application on their infrastructure (Nautilus).
References
1. Biomarkers of Alzheimer’s disease: Neurobiol. Dis. 35(2), 128–140 (2009). Biomark-
ers of Neuropsychiatric Disease
2. Alzheimer’s disease facts and figures: Alzheimer’s & Dementia 17(3), 327–406
(2021)
3. Aghili, M., et al.: Prediction modeling of Alzheimer’s disease and its prodromal
stages from multimodal data with missing values. Int. J. Med. Health Sci. 13(2),
36–40 (2019)
4. Antor, M.B., et al.: A comparative analysis of machine learning algorithms to
predict Alzheimer’s disease. J. Healthc. Eng. 2021, 1–12 (2021)
5. Bae, J.-M.: Clinical decision analysis using decision tree. Epidemiol. Health 36,
e2014025 (2014). https://doi.org/10.4178/epih/e2014025. Korean Society of Epi-
demiology
6. Battineni, G., et al.: Improved Alzheimer’s disease detection by MRI using multi-
modal machine learning algorithms. Diagnostics 11(11), 2103 (2021)
7. Bhagwat, N., Viviano, J.D., Voineskos, A.N., Chakravarty, M.M., Alzheimer’s Dis-
ease Neuroimaging Initiative, et al.: Modeling and prediction of clinical symptom
trajectories in Alzheimer’s disease using longitudinal data. PLoS Comput. Biol.
14(9), e1006376 (2018)
8. Bhatkoti, P., Paul, M.: Early diagnosis of Alzheimer’s disease: a multi-class deep
learning framework with modified k-sparse autoencoder classification. In: 2016
International Conference on Image and Vision Computing New Zealand (IVCNZ),
pp. 1–5 (2016)
9. Biau, G., Scornet, E.: A random forest guided tour. TEST 25(2), 197–227 (2016).
https://doi.org/10.1007/s11749-016-0481-7
10. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regres-
sion Trees. Routledge (2017)
11. Buntine, W., Niblett, T.: A further comparison of splitting rules for decision-tree
induction. Mach. Learn. 8(1), 75–85 (1992)
12. Campos, S., Pizarro, L., Valle, C., Gray, K.R., Rueckert, D., Allende, H.: Evaluat-
ing imputation techniques for missing data in ADNI: a patient classification study.
In: CIARP 2015. LNCS, vol. 9423, pp. 3–10. Springer, Cham (2015). https://doi.
org/10.1007/978-3-319-25751-8 1
13. Chávez-Gutiérrez, L., et al.: The mechanism of γ-secretase dysfunction in familial
Alzheimer disease. EMBO J. 31(10), 2261–2274 (2012)
14. Crous-Bou, M., Minguillón, C., Gramunt, N., Molinuevo, J.L.: Alzheimer’s disease
prevention: from risk factors to early intervention. Alzheimer’s Res. Therapy 9(1)
(2017). https://doi.org/10.1186/s13195-017-0297-z
15. Fan, Z., Fanyu, X., Qi, X., Li, C., Yao, L.: Classification of Alzheimer’s disease
based on brain MRI and machine learning. Neural Comput. Appl. 32(7), 1927–
1936 (2019)
16. Feng, Q., Zhu, D., Yang, J., Li, B.: Multisource hyperspectral and lidar data fusion
for urban land-use mapping based on a modified two-branch convolutional neural
network. ISPRS Int. J. Geo-Inf. 8, 28 (2019)
17. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann.
Stat. 29(5), 1189–1232 (2001). http://www.jstor.org/stable/2699986. Institute of
Mathematical Statistics. ISSN 00905364
18. Gao, H., Li, Y., Zhang, Z., Zhao, W.: Editorial: machine learning used in biomedical
computing and intelligence healthcare, volume i. Frontiers in Genetics, 12 May 2021
19. Humpel, C.: Identifying and validating biomarkers for Alzheimer’s disease. Trends
Biotechnol. 29(1), 26–32 (2011)
20. Joshi, S., Shenoy, D., Simha, G.G.V., Rrashmi, P.L., Venugopal, K.R., Pat-
naik, L.M.: Classification of Alzheimer’s disease and Parkinson’s disease by using
machine learning and neural network methods. In: 2010 Second International Con-
ference on Machine Learning and Computing, pp. 218–222 (2010)
21. Kalaria, R.N., et al.: Alzheimer’s disease and vascular dementia in developing
countries: prevalence, management, and risk factors. Lancet Neurol. 7(9), 812–826
(2008)
22. Koohy, H.: The rise and fall of machine learning methods in biomedical research.
F1000Research, 6:2012, January 2018
23. Kursa, M.B., Jankowski, A., Rudnicki, W.R.: Boruta-a system for feature selection.
Fundamenta Informaticae 101(4), 271–285 (2010)
24. Kursa, M.B., Rudnicki, W.R.: Feature selection with the Boruta package. J. Stat.
Softw. 36, 1–13 (2010)
25. Li, D.-C., Liu, C.-W., Hu, S.C.: A fuzzy-based data transformation for feature
extraction to increase classification performance with small medical data sets. Artif.
Intell. Med. 52(1), 45–52 (2011)
26. Mahendran, N., PM, D.R.V.: A deep learning framework with an embedded-based
feature selection approach for the early detection of the Alzheimer’s disease. Com-
put. Biol. Med. 141, 105056 (2022)
27. Murtagh, F.: Multilayer perceptrons for classification and regression. Neurocom-
puting 2(5–6), 183–197 (1991)
28. De Velasco Oriol, J., Vallejo, E.E., Estrada, K., Peña, J.G.T., The Alzheimer’s
Disease Neuroimaging Initiative: Benchmarking machine learning models for late-
onset Alzheimer’s disease prediction from genomic data. BMC Bioinformat. 20(1),
1–17 (2019)
29. Reitz, C., Mayeux, R.: Alzheimer disease: epidemiology, diagnostic criteria, risk
factors and biomarkers. Biochem. Pharmacol. 88(4), 640–651 (2014)
30. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and
organization in the brain. Psychol. Rev. 65(6), 386 (1958)
31. Sharma, N.: Exploring biomarkers for Alzheimer’s disease. JCDR 10, KE01 (2016)
32. Shishegar, R., et al. Using imputation to provide harmonized longitudinal measures
of cognition across AIBL and ADNI. Sci. Rep. 11(1), 1–11 (2021)
33. Singh, D., Singh, B.: Investigating the impact of data normalization on classifica-
tion performance. Appl. Soft Comput. 97, 105524 (2020)
34. Vélez, J.I., et al.: A comprehensive machine learning framework for the exact pre-
diction of the age of onset in familial and sporadic Alzheimer’s disease. Diagnostics
11(5), 887 (2021)
35. Wu, X., et al.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37
(2007)
36. Yang, W., et al.: Independent component analysis-based classification of
Alzheimer’s disease MRI data. J. Alzheimer’s Dis. 24(4), 775–783 (2011)
Critical Assessment of Current State of the Art
in Wearable Sensor Nodes with Energy
Harvesting Systems for Healthcare Applications
1 Introduction
The Internet of Things (IoT) has recently contributed to the medical field with new healthcare devices termed "smart wearable sensor nodes". Such smart wearable devices and sensors help patients monitor their vital signs through their own smartphones, which enable proper display and smart analysis of the measured data while reducing the consumed power. Presently, these nodes are utilized by hospitals and doctors to follow the health status of patients directly and remotely, without any need for manual measurement, and to store the patients' vital data for later use. These data are sent via different wireless technologies such as Wi-Fi or Bluetooth. One of the challenges facing wearable devices is continuous measurement of the data, which can be unnecessary and consume power for no good reason. Therefore, software algorithms are developed to measure the data discretely at defined intervals and switch to a sleep/off mode when measurement is not needed. However, securing the required energy sources is a challenge, and thus energy harvesting techniques have appeared to collect the energy required to operate the devices/nodes without the need for manual recharging. The most preferred energy sources are solar and thermal; the latter depends on the temperature difference between the human body and the ambient temperature. Wearable nodes are also utilized for seamless and remote diagnosis of patients, where the sensor data can be transmitted automatically to hospitals in case of critical vital signs or for further analysis by specialists.
In the literature, three types of wearable sensor nodes have been proposed to monitor the vital signs of patients. In [1], the node measured the heart rate and body temperature, transmitted over BLE. In [2], the node measured the heart rate, temperature, and blood oxygen saturation (SpO2), transmitted over Wi-Fi. In [3], the node measured the heart rate, temperature, oxygen saturation, and acceleration, and was also connected over BLE. The energy storage of these nodes varied between batteries [1, 2] and super-capacitors [3]. These storages can be recharged by energy harvesting systems using only a photovoltaic (PV) cell [1, 2] or a hybrid harvester comprising a PV cell and a thermoelectric generator (TEG) [3]. In [4], a sensor node was powered by a BQ25570 energy harvester, used to increase energy conversion efficiency via automatic setting of its internal input impedance to achieve maximum power transfer. This node can harvest energy using a solar panel; it can be worn as a ring on the finger, measures the blood oxygen level, and employs an accelerometer to detect movements during measurements and correct movement artifacts. In [5], the reported node is a hand-wearable device with a BQ27441 fuel gauge chip that performs the same function. This device contains a blood pressure sensor, a 9-axis motion sensor, a microphone, and an ultra-low-power ECG/EMG sensor with a bio-impedance analog front-end. The device is also equipped with a low-power galvanic skin response front-end, which was used to collect vital data and apply artificial intelligence to them for stress detection, where the data can be transmitted over Bluetooth. In [6], a wearable sensor node was designed based on a maximum power point tracking (MPPT) algorithm and employed a solar panel for harvesting energy. This node can be worn as a smart patch sewn onto clothes. It measures the electrocardiogram (ECG) using an acquisition circuit based on the AD8232 chip and uses a Bluetooth module. The node also measured sweat pH, using a smart textile that changes its color according to the pH value together with a color sensor, as well as the body temperature. The data from this node were sent through Bluetooth to a PC.
This paper presents a comprehensive review of the sensor nodes reported in the literature. Three nodes receive special attention in the following discussion and are compared with respect to different aspects, for instance power consumption, lifetime, charging time, and energy harvesting. The paper is organized as follows: Sects. 2, 3, and 4 describe the architectures of the three nodes, Sect. 5 discusses their results, Sect. 6 presents suggestions for improving the three wearable nodes, and the conclusion is provided in Sect. 7.
(Block diagram of the first node: rechargeable battery, microcontroller unit, …)
intervals and sleeps the rest of the time, following a defined work cycle. A time of 15 min was chosen as the repeated cycle: the device wakes up and measures the data for 15 s (0.25 min), then sleeps for 885 s (14.75 min), and so on every 15 min. This algorithm was developed in the C programming language. The data are transferred over Bluetooth to a mobile phone and visualized using the HMBLE Terminal application, which can be downloaded from the Google Play Store.
With an illumination of 1 h per day, the energy generated per day is 828 J, so the charging time is 87.65 days, which is less than the lifetime (137.17 days); thus, sustainable operation of the device is achieved even with short illumination times. Assuming a battery of lower voltage and capacity, 3.7 V and 280 mAh, its lifetime is 192 h (8 days). This battery stores an energy of 1.036 Wh. The power generated by the PV panel is 230 mW (0.23 W), so assuming the panel is illuminated for 1 h daily, the generated energy per day is 0.23 Wh (828 J), and the time required to recharge the battery is 4.5 days.
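The lifetime and recharging figures above follow from a simple energy budget; the short sketch below reproduces the arithmetic in Python, with the values reported for this node plugged in as given.

```python
# Energy budget for a duty-cycled node, using the figures reported for node [1].
avg_power_w = 4.811e-3           # average consumed power (W)
battery_wh = 3.7 * 0.280         # 3.7 V, 280 mAh battery -> about 1.036 Wh
pv_power_w = 0.230               # PV panel output while illuminated (W)
sun_hours_per_day = 1.0          # assumed daily illumination

lifetime_days = battery_wh / avg_power_w / 24.0          # close to the ~8 days quoted
harvest_wh_per_day = pv_power_w * sun_hours_per_day      # 0.23 Wh (828 J)
charge_days = battery_wh / harvest_wh_per_day            # about 4.5 days

print(f"lifetime ~= {lifetime_days:.1f} days, recharge ~= {charge_days:.1f} days")
```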
(Block diagram of the second node: energy harvester, rechargeable battery, microcontroller unit, body temperature sensor, pulse oximeter sensor, Wi-Fi module, online server.)
The node works on a voltage of 3.3 V and consumes 72.5 mA and 0.02 mA in wake-up and sleep modes, respectively. Assuming a period of 1 h, the wake-up time is 300 s and the sleep time is 3300 s. The average current consumed in one hour is 6.06 mA, corresponding to an average power of 19.99 mW; practically, it consumes 20.23 mW. The battery stores an energy of 15.96 Wh, so the lifetime of the battery is 798.39 h (33.26 days). When the harvester is used, the weather is assumed sunny for 2 h, during which a power of 414 mW is generated, cloudy for 2 h, during which 266 mW is generated, and the rest of the day is night. Thus, the energy generated per day is 1.36 Wh, so the charging time is 11.73 days. The average energy generated per hour is 0.34 Wh; when an illumination of only one hour per day is assumed, the charging time becomes 46.94 days. This exceeds the lifetime (discharging time), which means that the node may not be sustainable with short daily illumination times.
Figure 5 demonstrates the block diagram of the third node [3]. It is a sensor node worn on the wrist, with its sensors placed on the fingers; it runs on an Arduino LilyPad board with an ATmega328P microcontroller. It utilizes the MAX30205 sensor to measure body temperature and the MAX30100 to measure blood oxygen and heart rate; an ADXL335 accelerometer is also added. It uses the HM-10 BLE module to send the vital data to a mobile phone, where they are shown in an application designed using MIT App Inventor. To power this node, two (100 F, 2.7 V) super-capacitors from AVX® are used, connected in series to achieve a capacitance of 50 F and a voltage of 5.4 V. The LDO regulator MCP1700 is used to regulate the super-capacitor voltage down to the operating voltage of the node, 3.3 V. For energy harvesting, the harvester is a hybrid of two energy sources: a photovoltaic panel and a thermoelectric generator (TEG) module (SP1848-27145) [17]. The TEG part of the hybrid harvester generates electric power from the patient's body heat, with a thermally conducting coating placed between the module and the skin. To regulate the output voltage of the PV cells and the TEG module, a DC-DC converter based on the low-power LTC3105 is used [18]. The health information obtained includes heart rate, blood pressure, blood oxygen level, temperature, and acceleration. Continuous monitoring may be needed for some groups of patients, such as hospital patients who are not in intensive care and quarantined patients, as mentioned before. Home-quarantined patients can also be monitored; it is suggested that health authorities or hospitals open a server where home-quarantined patients can register their sensor nodes, so that the authorities can monitor their health and emergency services can be called automatically in critical cases.
(Fig. 5. Block diagram of the third node: DC-DC converter, super-capacitors, pulse oximeter sensor, body temperature sensor, accelerometer sensor, microcontroller unit, Bluetooth Low Energy (BLE).)
practical consumption is 2.13 mW, so the lifetime is 46 h (1.91 days). The harvester combines two methods of generating power: a photovoltaic cell and a thermoelectric generator (TEG). Assuming a PV cell with an area of 4.32 cm2 and a conversion efficiency of 7%, irradiated by 1000 W/m2 for 6 h a day, it generates 302.4 mW, so the energy generated in 6 h is 6531.84 J and the capacitors are charged by the PV cell alone in 0.054 days (1.29 h). For the thermoelectric generator, it is assumed that the body temperature is 37 °C and the ambient temperature is 17 °C, giving a temperature difference of 20 °C; the TEG module then generates a power of 100 mW. Assuming it works for 6 h, it generates 2160 J per day and recharges the capacitors in 0.16 days (3.9 h). If the two generators work simultaneously, together they generate 402.2 mW, producing 8687.52 J per day, so the recharging time is 0.04 days (approximately 0.97 h). Considering the case of continuous charging, the PV cell generates 0.3022 Wh in one hour, the TEG module generates 0.1 Wh, and both generators together produce 0.4022 Wh; the charging time is therefore 0.32 h for the PV alone (or 0.32 days if it generates for only 1 h per day), 0.97 h for the TEG alone (or 0.97 days), and 0.24 h for PV and TEG together (or 0.24 days). It is noticeable that the charging time is very short and less than the lifetime, which demonstrates sustainable operation.
5 Discussion
The node in [1] consumes an average power of 4.811 mW, while the node in [2] consumes 19.99 mW and the node in [3] practically consumes 2.13 mW. Looking at the lifetimes, the lifetime of [1] is 137.17 days, that of [2] is 33.26 days, and that of [3] is 1.91 days. For the charging times, [1] is charged in 14.6 days, [2] in 11.73 days, and [3] in 0.04 days. From these data, node [3] has the lowest power consumption, [1] the longest lifetime, and [2] the shortest charging time. Figure 7 shows a comparison of the specifications of the three nodes. Regarding activity times, the node in [1] measures the vital data for 15 s and sleeps for 885 s, i.e., four times an hour; the node in [2] measures for 5 s and sleeps for 55 s, equivalent to 60 times an hour; and the node in [3] measures for 10 s and sleeps for 1190 s, i.e., 3 times per hour. The most precise node is [2], because it is the most active. To determine the most sustainable node, looking at the lifetime and recharging time separately is not enough. A figure of merit can be defined as the ratio of the discharging time (lifetime) to the charging time, here termed the sustainability factor and denoted by Q. For [1], Q = 137.17/14.6 = 9.39; for [2], Q = 33.26/11.73 = 2.83; and for [3], Q = 1.91/0.04 = 47.5. The sustainability factor is highest for [3], so it is the most sustainable. The methods used to transmit data also differ. In [1, 3], the vital data are transmitted over BLE to a mobile phone app: a Bluetooth terminal app is used in [1], while in [3] a custom-designed app shows the data on the phone. In [2], the vital data are transmitted over a Wi-Fi internet connection and accessed through a cloud service by someone with credentials for the service. Table 1 shows a comparison between the three wearable sensor nodes, and Table 2 a comparison between the sensor nodes in [4–6, 9–11].
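The sustainability factor can be computed directly from the reported lifetimes and charging times; a brief Python sketch follows.

```python
# Sustainability factor Q = lifetime / charging time for the three nodes.
nodes = {
    "[1]": {"lifetime_days": 137.17, "charge_days": 14.60},
    "[2]": {"lifetime_days": 33.26,  "charge_days": 11.73},
    "[3]": {"lifetime_days": 1.91,   "charge_days": 0.04},
}
for name, n in nodes.items():
    q = n["lifetime_days"] / n["charge_days"]
    print(f"node {name}: Q = {q:.2f}")
# Q for [1] and [2] matches the quoted 9.39 and 2.83; [3] comes out near the quoted 47.5.
```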
6 Suggestions
For energy harvesting, it is preferable to diversify the energy sources rather than use only photovoltaic energy, to reduce the charging time as much as possible and to ensure continuous charging when some sources are unavailable. The node in [3] uses a photovoltaic source together with a thermal source, but the node in [2] uses two photovoltaic sources, so charging stops completely in the absence of illumination; in [3], the thermoelectric source ensures continuity of charging even when the PV panel is not illuminated. Regarding energy storage, [1, 2] use Li-ion batteries, while [3] uses super-capacitors of low capacity; such capacitors are not preferred, and Li-ion batteries are the preferred storage. Regarding charging circuits, a DC-DC boost converter is preferred in order to extract the maximum power. For measurement precision, it is preferable to increase the activity times: in [1], the algorithm can be modified to measure for 5 s and sleep for 295 s, i.e., to measure every 5 min, and a similar modification can be made in [3] to measure the vital data for 5 s and sleep for 595 s, i.e., every 10 min. A button can also be added so that a measurement is taken when it is pressed, even during the sleeping time. Regarding wireless communication, BLE is best when the node is used personally, but a Wi-Fi connection is better when the device is used in hospitals. BLE and Wi-Fi may be used together, where Wi-Fi gives the hospital access to the patient's data and BLE lets the patient access his own data, or the patient can be given credentials to access only his own data, according to the use case. Sending data to a server over Wi-Fi is useful in epidemics such as the COVID-19 pandemic, where the data are sent to a server that can be accessed by doctors. The main advantage of using Wi-Fi is that it can serve home-quarantined patients, whose vital data are monitored by hospitals so that critically ill patients can be helped as soon as possible. An option can be added to automatically alert the doctors when the device measures a critical value of heart rate, blood oxygen level, or temperature. Regarding the wearing place, [1, 3] are worn on the wrist with the sensors placed on the fingers, while [2] is worn on the upper arm; the upper arm is the best choice because it keeps the patient's hand free. Regarding additional vital data, the sensor node that measures the most vital parameters is [3], which measures heart rate, blood oxygen level, temperature, and acceleration; it is suggested to add a blood pressure sensor to measure blood pressure.
7 Conclusion
This paper has presented a detailed review of a number of wearable nodes, among which three architectures were given special and detailed discussion. These three architectures utilize algorithms that avoid measuring the vital parameters at unnecessary times, thus reducing their power consumption. The nodes have various energy harvesters to achieve sustainable operation in terms of power supply. The first node [1] is worn on the wrist, measures temperature and heart rate, works on a Li-ion battery charged by a photovoltaic (PV) cell, transmits the vital data over Bluetooth Low Energy (BLE), and has the longest lifetime. The second node [2] is worn on the upper arm, measures heart rate, blood oxygen level, and temperature, is supplied by a Li-ion battery charged by two parallel PV cells, transmits the vital data via Wi-Fi to a Ubidots server, and is active the most often. The third node [3] is worn on the wrist, measures body acceleration, heart rate, blood oxygen level, and temperature, is powered by two series super-capacitors charged by a PV cell together with a thermoelectric generator, and transmits the vital data over BLE. Regarding future work, one could diversify the energy sources and increase the activity times of the nodes; new sensors can also be added to measure other vital parameters.
Acknowledgment. This work was carried out with the support of the Karlsruhe Nano Micro Facility (KNMFi, www.knmf.kit.edu), a Helmholtz Research Infrastructure at Karlsruhe Institute of Technology (KIT, www.kit.edu), and under the Helmholtz Research Programme MSE (Materials Systems Engineering) at KIT.
References
1. Mohsen, S., Zekry, A., Youssef, K., Abouelatta, M.: An autonomous wearable sensor node
for long-term healthcare monitoring powered by a photovoltaic energy harvesting system.
Int. J. Electr. Telecommun. 66(2), 267–272 (2020)
2. Mohsen, S., Zekry, A., Youssef, K., Abouelatta, M.: On architecture of self-sustainable wear-
able sensor node for IoT healthcare applications. Wireless Pers. Commun. 119(1), 657–671
(2021)
3. Mohsen, S., Zekry, A., Youssef, K., Abouelatta, M.: A self-powered wearable wireless sensor
system powered by a hybrid energy harvester for healthcare applications. Wireless Pers.
Commun. 116(4), 3143–3164 (2021)
4. Magno, M., Salvatore, G.A., Jokic, P., Benini, L.: Self-sustainable smart ring for long-term
monitoring of blood oxygenation. IEEE Access 7, 115400–115408 (2017)
5. Magno, M., Wang, X., Eggimann, M., Gavigelli, L., Benini, L.: InfiniWolf: energy effi-
cient smart bracelet for edge computing with dual source energy harvesting. In: 2020th
Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 342–345. IEEE,
Grenoble, France (2020)
6. Wu, T., Redouté, J.-M., Yuce, M.: A wearable, low-power, real-time ECG monitor for smart
T-shirt and IoT healthcare applications. In: Fortino, G., Wang, Z. (eds.) Advances in Body
Area Networks I. Internet of Things (Technology, Communications and Computing) 2019,
vol. 2019, pp. 165–173. Springer, Cham (2019)
7. D˛abrowska, A., Bartkowiak, G., P˛ekosławski, B., Starzak, Ł: Comprehensive evaluation of a
photovoltaic energy harvesting system in smart clothing for mountain rescuers. IET Renew.
Power Gener. 14(16), 3200–3208 (2020)
8. Ivanov, K.: Design, realization and study of thermoelectric watch. In: 21st International Sym-
posium on Electrical Apparatus & Technologies (SIELA), pp. 1–4. IEEE, Bourgas, Bulgaria
(2020)
9. Wan, J., et al.: Wearable IoT enabled real-time health monitoring system. EURASIP J. Wire.
Commun. Netw. 298, 1–10 (2018)
10. Qiu, C., Wu, T., Redouté, J-M., and Yuce, M. R.: A wireless wearable sensor patch for the
real-time estimation of continuous beat-to-beat blood pressure. In: 41st Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 6842–
6845. IEEE, Berlin, Germany (2019)
11. Wu, T., Wu, F., Redouté, J., Yuce, M.R.: An autonomous wireless body area network imple-
mentation towards IoT connected healthcare applications. IEEE Access 5, 11413–11422
(2017)
12. Magno, M., et al.: InfiniTime: multi-sensor wearable bracelet with human body harvesting.
Sustain. Comput. Inform. Syst. 11, 38–49 (2016)
13. Dionisi, A., Marioli, D., Sardini, E., and Serpelloni. M.: Autonomous wearable system for
vital signs measurement with energy-harvesting module. IEEE Trans. Instrum. Measure. 65
(6), 1423–1434 (2016)
14. Tran, T.V., Chung, W.: High-efficient energy harvester with flexible solar panel for a wearable
sensor device. IEEE Sens. J. 16(24), 9021–9028 (2016)
15. Sonoda, K., et al.: Wearable photoplethysmographic sensor system with PSoC microcon-
troller. Int. J. Intell. Comput. Med. Sci. Image Process. 5(1), 45–55 (2013)
16. Caldara, M., Colleoni, C., Guido, E., Re, V., Rosace, G., Vitali, A.: A wearable sweat ph and
body temperature sensor platform for health, fitness, and wellness applications. In: Di Natale,
C., Ferrari, V., Ponzoni, A., Sberveglieri, G., Ferrari, M. (eds.) Sensors and Microsystems.
Lecture Notes in Electrical Engineering, vol. 268. Springer, Cham (2014). https://doi.org/10.
1007/978-3-319-00684-0_82
17. Mohsen, S.: Hybrid energy harvester for medical sensor node toward real-time healthcare
monitoring. Proc. Eng. Technol. Innov. 18, 43–48 (2021)
18. Mohsen, S.: A solar energy harvester for a wireless sensor system toward environmental
monitoring. Proc. Eng. Technol. Innov. 21, 10–19 (2022)
Identifying Severity Clusters in SLE
Patients
1 Introduction
Systemic lupus erythematosus (SLE) is a complex autoimmune multi-system disease affecting various organs and tissues [1]. The production of autoantibodies against DNA by the immune system is considered a hallmark of SLE; as a result, the immune system attacks its own organs and tissues [2]. The main symptoms of SLE are rash and joint pain, in addition to more severe manifestations such as renal disease and autoimmune hemolytic anaemia.
The disease is distributed worldwide in both genders, with a nine times higher ratio in women than in men [1]. The prevalence and incidence of SLE vary according to gender, age, and ethnicity [1,3]. The exact etiology of SLE is unknown; a complex interaction between multiple factors such as genetics, environment, and hormones causes the disease [1].
SLE is a heterogeneous disease [1], and some of its symptoms overlap with other autoimmune pathologies [4,5]. SLE is characterized by unpredictable exacerbations and remissions with a wide range of symptoms [1,5]. The nature of SLE has led researchers to propose SLE as a syndrome [1]: clinical signs do not always occur together at the same time and may develop at any phase of the disease. The extreme heterogeneity of SLE poses many challenges for patients and clinicians. At present, diagnosis is based on the experience of doctors [1], who evaluate symptoms and the results of laboratory analyses.
The course of the disease is characterized by dynamic episodes of activity alternating with remissions [6]; consequently, it is difficult to define an efficient line of treatment for the cumulative symptoms. Older treatment methods often implied a reduced life span for SLE patients due to organ malfunction or the toxic effects of therapy. At this stage, SLE does not follow a clear path that would help clinicians follow up and manage patients.
The complex nature of SLE makes measuring disease activity accurately a
challenging task. SLE disease activity measurement is essential to evaluate the
patients’ status; the result affects the decisions of therapeutic strategies. Also,
it gives an index to predict chronic damage and mortality [1]. Disease activity
needs to be monitored and controlled to avoid chronic damage development and
enhance SLE patients’ management.
To identify clinically significant changes in SLE, different disease activity indices have been developed [1,5,7]. The SLE Disease Activity Index (SLEDAI) is an internationally widely used index, developed to calculate disease activity in the preceding 10 days. It is considered a clinical index, comprising 24 weighted clinical and laboratory variables of nine organ systems [5,8]. Another global activity index, the European Consensus Lupus Activity Measurement (ECLAM), involves 5 weighted clinical and serological items related to the previous 30 days [5].
Physicians face different challenges in defining, diagnosing, and managing SLE patients, and considerable effort and time are devoted to this; thus, there is a need to reduce the effort of diagnosis and management. Disease activity is a fundamental issue that needs to be monitored and assessed to facilitate disease management, reduce healthcare costs (e.g., medications), and reduce the harmful effects of the medicines. To the best of our knowledge, no research has used clustering methods to identify severity clusters in Omani patients with SLE and to detect the features associated with disease severity.
In this paper we aim at (a) identifying severity clusters in Omani patients with systemic lupus erythematosus, (b) detecting features related to disease severity,
and (c) examining the correlation of the disease activity index (SLEDAI) and the physician global assessment (PGA) with each resulting subgroup.
This paper is organized as follows. Section 2 discusses related work that uses the same methods to solve different problems. Section 3 introduces clustering methods. Section 4 describes the materials and methods used. Section 5 presents the experimental results and their evaluation. Section 6 contains our clustering result validation. Section 7 provides the discussion, and Sect. 8 the conclusion and future work.
2 Literature Review
Clustering analysis has been used in several medical fields, for instance brain image segmentation [9,10], partitioning patients with Alzheimer's disease [11], and clustering autoantibodies of lupus patients [12].
The authors in [13] used clustering to identify damage clusters in SLE patients. Data from around 1130 patients were clustered using the K-means algorithm with Euclidean distance. The patients were grouped into three clusters: cluster 1 with the least damage, cluster 2 high in renal and ocular damage, and cluster 3 with neuropsychiatric and musculoskeletal damage. It was observed that most of the symptoms are closely related to each other. This study introduced a new approach to evaluating SLE patients by damage; it showed that neuropsychiatric and renal disorders appear in different clusters and that patients with this damage had the highest risk of death.
The authors in [14] used clustering to group the symptoms of SLE in children, creating five clusters of the five major SLE symptoms in children, each with an associated set of factors. All participants in the study were below 18 years of age. The study was conducted to find other problems that may arise due to existing SLE in children below the age of 18 and to identify the symptoms occurring in patients with childhood onset. 75 patients were included, and five separate clusters were identified using agglomerative hierarchical clustering with the centroid method and the cosine similarity measure. Cluster 1 had symptoms related to pain (joint pain, headache, and painful muscles) and itching, cluster 2 symptoms related to bruises and stomach complaints, cluster 3 symptoms related to weight gain, cluster 4 symptoms related to white fingers in cold weather, hair loss, and sensitivity to sunlight, and cluster 5 symptoms related to fatigue. The limitations of the study are the small number of participants and their heterogeneity.
The authors in [15] collected data from 150 SLE patients in Egypt. Three different clusters were created on the basis of associated features, using the K-means clustering algorithm with Euclidean distance as the similarity measure. This study helped in understanding the disease pattern in Egyptian SLE patients; such studies help in disease prediction and precaution. Three distinct clusters were identified: cluster 1 significantly high in mucocutaneous and arthritis manifestations, cluster 2 more frequent in renal and hematological manifestations, and cluster 3 with a high prevalence of mucocutaneous manifestations, serositis, hematologic manifestations, and renal involvement.
To the best of our knowledge, no research has used clustering methods to identify severity clusters in Omani patients with SLE and to detect the features associated with disease severity. Furthermore, our method is robust, as we used cluster validation to evaluate the goodness of the clustering results; in this research, both internal and external cluster validation were used.
3 Clustering Methods
Several clustering methods have been proposed [16]: partitioning, density-based, and hierarchical [17]. Partitioning algorithms separate the data set into a specified number of clusters according to the similarity or distance among the data samples. Hierarchical algorithms compose the clusters in a hierarchical structure. Density-based algorithms find dense regions among the data samples to form clusters, while low-density regions create boundaries between the clusters. The choice of a specific method depends on data size, data dimensionality, and time complexity [18].
Partitioning methods start by considering all data points in the data set as a single cluster and split it until a stopping criterion is met, whereas hierarchical (agglomerative) methods start by considering each data point as a cluster and aggregate them to form a hierarchical cluster structure (larger clusters) [19]. There are many clustering methods that cluster datasets based on calculated similarity; the following subsections describe them in detail.
for estimating the number of clusters and the appropriate clustering algorithm
without any external data.
1. The silhouette score was used to determine the appropriate clustering algorithm. The silhouette helps to evaluate the correctness of a data object's assignment to a particular cluster rather than another by measuring both inter-cluster separation and intra-cluster cohesion [24]. Negative silhouette values indicate incorrect placement of objects, while positive values represent better placement [24].
Using this technique, we go through three steps; assume the data have been clustered via any technique into k clusters.
(a) For every data point i, compute a(i), the mean distance between i and all other data points in the same cluster (Eq. 1):
$$a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\ j \neq i} d(i, j) \qquad (1)$$
(b) Compute b(i), the smallest mean distance from i to the points of any other cluster (Eq. 2).
(c) The silhouette coefficient of a data point i is then defined as (Eq. 3):
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (3)$$
2. The Calinski-Harabasz score (CH) is one of the internal validity measures commonly used for evaluating clustering solutions [25]. CH measures the two criteria simultaneously through the average between- and within-cluster sums of squares. It is described by
$$CH(k) = \frac{B_c(k)/(k-1)}{W_c(k)/(n-k)} \qquad (4)$$
where n is the number of data points and k the number of clusters. $B_c$ and $W_c$ denote the between- and within-cluster sums of squares, respectively, given by
$$B_c = \sum_{k=1}^{K} |C_k|\,\|\bar{c}_k - \bar{x}\|^2 \qquad (5)$$
$$W_c = \sum_{k=1}^{K} \sum_{i=1}^{N} w_{k,i}\,\|x_i - \bar{c}_k\|^2 \qquad (6)$$
where $\bar{c}_k$ is the centroid of cluster $C_k$ and $\bar{x}$ is the overall mean of the data. A usage sketch of both measures follows.
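A usage sketch of both internal validity measures, assuming scikit-learn and a synthetic feature matrix in place of the SLE dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Toy data standing in for the preprocessed SLE feature matrix.
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, max_iter=300, n_init=10, random_state=0).fit_predict(X)
    # Higher silhouette and CH values indicate a better clustering for this k.
    print(k, silhouette_score(X, labels), calinski_harabasz_score(X, labels))
```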
The dataset was collected from Sultan Qaboos University Hospital (SQUH) for
studies on SLE and approved by the Ethics Committee of the College of Medicine
and Health Science in the Sultan Qaboos University (SQU) (MERC # 1418 and
1650). To maintain confidentiality, every patient was assigned a unique label,
and the data were analyzed anonymously.
For this research work, we included only Omani adult patients (15–55 years
old) who were diagnosed with SLE and followed up in a Rheumatology clinic in
SQUH from 2006 to 2019. Patients who met the American College of Rheumatology (ACR) classification criteria were included in the research; the participants were 138 SLE patients. The data extracted from SQUH comprise several files: a demographic data file, a clinical notes file, a laboratory test results file, and a medication data file.
This section describes the data preprocessing phase applied to our dataset before clustering. Data preprocessing is an important step for applying cluster analysis methods to the dataset and increasing its quality. It consists of a sequence of steps that depend on the data itself, for example normalizing, scaling, and transforming the feature data. The preprocessing was implemented in Python. The following preprocessing steps were applied to our dataset:
The distance between two points is quantified based on the Pythagorean theorem: if $(x_1, y_1)$ and $(x_2, y_2)$ are points in 2-dimensional space, then the Euclidean distance between them is calculated using Eq. 8.
$$ED = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \qquad (8)$$
Renal disorders were more prevalent among Cluster 2 patients (0.87) than among Cluster 1 patients (only 0.18). Of the 54 patients in Cluster 2, a proportion of 0.41 had a fever, while 0.23 of the patients in Cluster 1 had a fever. Cluster 2 patients were significantly higher in low C3 (0.96) and low C4 (0.93) than those in Cluster 1 (0.51 and 0.54). The majority of patients in Cluster 2 were significantly more often anti-dsDNA positive (0.85) than those in Cluster 1 (0.67), and Cluster 2 had significantly more acute cutaneous lupus (0.65) than Cluster 1 (0.33). Interestingly, Cluster 2 in general tends to be more expensive compared to the other cluster. Table 1 shows the prevalence of features detected using the hierarchical method, where the maximum is one and the minimum is zero.
Table 1. Prevalence of features detected using a hierarchical method where the max-
imum is one and minimum is zero
The K-Means clustering method is used to group the patients into different numbers of clusters. First, we set the desired number of clusters to 2, 3, and 4. Next, via the maxiter parameter, we specify the maximum number of iterations for every run (maxiter = 300); a minimal sketch follows.
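A minimal sketch of this K-Means configuration, assuming scikit-learn and a placeholder binary feature matrix (one row per patient); the data shown are synthetic, not the SQUH records.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder binary feature matrix: 138 patients x a few clinical/immunology features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(138, 10)).astype(float)

kmeans = KMeans(n_clusters=2, max_iter=300, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)            # Euclidean distance is the default metric

for c in np.unique(labels):
    members = X[labels == c]
    # Per-cluster feature prevalence, analogous to Tables 1-3 (0 = absent, 1 = present).
    print(f"cluster {c}: n = {len(members)}, prevalence = {members.mean(axis=0).round(2)}")
```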
5.3.1 K-Means Results Analysis for K=2. In the first experiment, we set the desired number of clusters to two. Cluster 1 was the smaller cluster, with 54 patients (53 females and one male), compared to the second cluster with 84 patients (80 females and four males). Of the Cluster 1 patients, 0.89 were female, and they suffered from renal disorder, hemolytic anemia, anti-dsDNA antibodies, and low complements (C3, C4). The radar chart in Fig. 5 shows the feature distribution in Cluster 1 and Cluster 2 using the K-Means method; it shows that Cluster 1 (severe) has the highest prevalence of features (lupus nephritis, hemolytic anemia, low complement (C3 and C4), and positive anti-dsDNA) compared to Cluster 2.
5.4.1 Results Analysis for K=2. Two separate clusters were identified using the spectral method; the first cluster included 44 patients (one male and 43 females), while the second cluster contained 94 patients (4 males and 90 females). The radar chart in Fig. 6 shows the feature distribution in Cluster 1 and Cluster 2 using the spectral method; it shows that Cluster 1 (severe) has the highest prevalence of features (lupus nephritis, hemolytic anemia, low complement (C3 and C4), and positive anti-dsDNA) compared to Cluster 2.
Table 3 shows the prevalence of features detected using the spectral method, where the maximum is one and the minimum is zero. Cluster 2 (n = 93) was the largest cluster, with the lowest prevalence of renal disorder symptoms (0.21) compared to Cluster 1 (0.95). Cluster 1 patients were significantly higher in low C3 (0.98) and low C4 (0.95) than those in Cluster 2 (0.55 for low C3 and 0.56 for low C4). Cluster 1 had a higher prevalence of acute cutaneous lupus (0.64) and hemolytic anemia (0.70) than Cluster 2 (0.37 for acute cutaneous lupus and 0.17 for hemolytic anemia). However, there were no differences in the prevalence of leukopenia between the two clusters.
Table 2. Prevalence of features detected using the K-Means method in clusters, where the maximum is one and the minimum is zero
6 Clustering Validation
Clustering validation has long been recognized as one of the vital issues essential to the success of clustering applications. As described in Sect. 3.4, we use two kinds of cluster validation: internal cluster validation and external cluster validation.
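As a hedged example of internal validation, indices such as the silhouette coefficient and the Calinski-Harabasz index can be computed directly on the cluster labels; the scikit-learn calls and the placeholder data below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(138, 10)).astype(float)  # placeholder features

labels = KMeans(n_clusters=2, max_iter=300, n_init=10, random_state=0).fit_predict(X)
print("silhouette:", round(silhouette_score(X, labels), 3))
print("calinski-harabasz:", round(calinski_harabasz_score(X, labels), 1))
```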
Fig. 6. The features distribution in the two clusters using spectral clustering
Table 3. Prevalence of features detected using the spectral method in clusters, where the maximum is one and the minimum is zero
categorizes patients into four categories: severe, medium, mild, or none (remission), while SLEDAI defines five categories depending on the SLEDAI score: no activity when the score is zero, mild activity when the score is between one and five, moderate activity when the score is between six and ten, high activity when the score is between eleven and nineteen, and very high activity when the score is greater than or equal to twenty.
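The SLEDAI banding described above can be expressed directly as a small helper; this is only a sketch, with the thresholds taken from the text.

```python
def sledai_category(score: int) -> str:
    """Map a SLEDAI score to a disease-activity band as described above."""
    if score == 0:
        return "no activity"
    if score <= 5:
        return "mild activity"
    if score <= 10:
        return "moderate activity"
    if score <= 19:
        return "high activity"
    return "very high activity"  # score >= 20

print(sledai_category(7))  # moderate activity
```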
Table 5 shows the clinician evaluation (PGA and SLEDAI) and the clustering results for the evaluated SLE patients. All three methods group the PGA 'none' cases in the same cluster except for two cases. The hierarchical agglomerative and K-means methods group all mild cases in the same cluster except one case, while the spectral method places two cases in one cluster and the other two cases in another. All medium cases are clustered together by all three clustering methods. All severe cases are clustered in the same group by the hierarchical agglomerative and K-means methods, while the spectral method groups all cases in the same cluster except one.
Evaluating the clustering results against SLEDAI shows that all very high activity and high activity cases are clustered in the same group by all three methods. All moderate activity cases are clustered in the same group by the hierarchical agglomerative and K-means methods, while the spectral method groups all cases in the same cluster except one. Regarding the no-activity cases, all three methods group the cases in the same cluster except one.
The main reason some patients were not placed in the expected cluster was the gap between data collection and patient assessment: the data was collected at different times, mostly around the diagnosis period, while the patients were evaluated recently. When diagnosed with SLE, some patients were severe at onset and later alternated to remission, or vice versa.
7 Discussion
To the best of our knowledge, this is the first study that aims to establish clusters of Omani SLE patients, identifying patient groups based on symptoms and immunological tests.
Table 5. Clustering result validation using physician global assessment and disease activity index
The clustering results give researchers an improved way to group heterogeneous patients. For instance, symptoms that tend to occur together may share the same cause and/or underlying immunopathology. We explored the prevalence
of the symptoms in Omani SLE patients. The most frequent symptoms were
joint pain, followed by acute cutaneous lupus (ACL), renal disorder, hemolytic
anemia, anti-dsDNA antibody, alopecia, and low complements (C3, C4).
In the present study, the patients are grouped into two clusters according to the optimal number of clusters. We found that severe and medium cases clustered in the same group; these patients suffered from rash, lupus nephritis, hemolytic anemia, low complement (C3 and C4), and positive anti-dsDNA. Remission and mild cases also clustered in the same group; these patients suffered mainly from joint pain. Overall, 90% of severe and medium cases were grouped together, as were 81.8% of none and mild cases.
We call the first cluster the severe cluster and the second the mild cluster. The severe cluster has a high prevalence of renal disorder, hemolytic anemia, anti-dsDNA antibody, and low complements (C3, C4), which indicates disease severity in SLE patients in Oman. The mild cluster is associated with joint pain, low complements (C3, C4), and a positive anti-dsDNA antibody.
By identifying the patient clusters, we confirmed that renal disorder, hemolytic anemia, and low complements (C3, C4) are related features. These features were highest in the severe cluster (high disease activity). Joint pain was not a distinguishing symptom of disease severity, but we observed that it tends to be one of the most common symptoms in Omani patients with SLE.
Automated Real-Time Recognition
of Non-emotional Conversational Head-Gestures
for Social Robots
1 Introduction
In recent years, developed countries have been facing growth in their aging populations [1]. It is anticipated that the world population will soon plateau and start declining by 2050 [1]. This will cause a severe lack of availability of human caretakers. Social or assistive robots will be needed to provide mental, physical, and social support for special needs, including the medical and elderly-care industries [2] for therapy or care [3], the education industry as teachers [4], and space or deep-sea exploration [5]. Social robots will share a common space with humans. To improve acceptability and interaction, social robots will have to comprehend human emotions, speech, and gestures, and generate natural, human-comprehensible gestures and speech [6, 7].
Human-robot interaction, including conversation, is mostly non-emotional.
Researchers have developed many models for the analysis and generation of multi-
pose conversational head-gestures based upon a combination of head-nod, headshake,
head-tilt and eye-focus for dynamic adaptation and learning [8, 23]. Researchers have
also developed a synchronized colored Petri net-based model for multimedia gesture
analysis and training of social robots, one gesture at a time [9]. The training model uses
silence, motion, stillness and synchronization between motions and speech to train social
robots one gesture at a time [9]. However, they do not propose an implementation of
the proposed model that continuously detects conversational gestures during real-time
interaction.
In this paper, we analyze a combination of motion sequence, eye-focus, cyclic
motions, and their synchronization associated with non-emotional conversational head-
gestures and implement the synchronous colored Petri net for real-time automatic recog-
nition of gesture boundaries and gestures in the wild. We extract a vector of meta-attributes for each gesture and match the vectors of incoming gestures against the vectors in the knowledge base to identify gestures in the wild. We also use head-motion and eye-focus based gesture classification for better disambiguation between similar gestures.
The implemented model has five modules, as illustrated in Fig. 1. The first module
derives motion, stillness and silence vector using video analysis of facial feature-points
and speech intensity analysis. The second module derives places, transitions, delays, and
synchronization for each gesture using dynamically generated matrices for the corre-
sponding Petri net graphs. The third module derives the meta-attributes of each derived
gesture-graph. Meta-attributes include types of head-motion, eye-focus, types of syn-
chronization, and number of places, transitions, and cycles. The fourth module matches
the signatures derived from meta-attributes of analyzed gestures with archived signa-
tures to label the gestures being analyzed. The fifth module performs gesture resolution for errors introduced in the places and transitions due to small head and eye motions.
The major contributions of this research are:
The overall organization of the paper is: Sect. 2 presents the background of conver-
sational head gestures, synchronous gesture modeling, and the derivation of the basic
elements of Petri net graphs. Section 3 describes the related work. Section 4 discusses
synchronous colored petri net modeling of conversational head-gestures and the deriva-
tion of corresponding meta-properties. Section 5 describes matching of meta-properties
to derive similar gestures to resolve ambiguity. Section 6 describes implementations and
algorithms. Section 7 describes the experimentation and discusses the results. Section 8
concludes the paper and describes future work.
[Fig. 1: module pipeline from video source through ambiguity resolution to the recognized gesture]
2 Background
2.1 Conversational Non-emotional Head Gestures
[Fig. 2. Petri net building blocks: (2a) start sync, (2b) end sync, (2c) during sync (δ > +1), (2d) simple cycle]
3 Related Works
Video analysis for gesture recognition is classified as a subclass of Human Action Recog-
nition (HAR) [13]. HAR has many applications such as video surveillance, healthcare,
and human-robot interaction [14–17]. Researchers have used 2D Convolutional Neu-
ral Network (CNN) for HAR [14–17] and 3D convolution kernel and depth sensors to
extract spatio-temporal data [18–22].
Conversational gesture analysis differs from HAR tasks such as identifying walking and running because of the embedded nonverbal communication and interaction, and the inherent synchronization of multiple actions such as head-motion, hand-motion, eye-focus, lip-movement, and speech.
The current research on head-gesture recognition is limited to 1) video analysis for basic head-motion (head-nod and head-shake) detection [23]; and 2) hand-motion analysis during conversation using various artificial intelligence methodologies such as Hidden Markov Models (HMM), Dynamic Bayesian Networks (DBN), and Long Short-Term Memory (LSTM) [23, 24]. However, this research does not capture the temporal synchronization of concurrent motions and speech needed for conversational gesture analysis.
Automatic head-nod detection methods have been developed in the context of
Human-Computer Interaction (HCI) to facilitate affirmation [25]. Researchers have
proposed designs to track the facial points in combination with finite state machines
(FSM) and HMM for head-nod detection using facial feature-detections, including mul-
tipose detections [23–29]. Researchers also combine head-posture analysis with head-
motion analysis [23, 24]. However, these methods do not derive the combination of
head-movements for different conversational head-gestures as identified by behavioral
and clinical psychologists.
was employed using a neural network [30]. The limitation of face-pose estimation is that it is not sensitive to small head-pose changes.
Automated co-speech gesture recognition combines multimedia (video stream, including sound) analysis. Pioneering research has been done on exploiting cognitive models of conversational gesture recognition and generation, along with limited concurrent speech generation capability, in ASIMO robots [34]. However, the model lacks Allen's synchronization in movements between multiple organs, and synchronization between speech and organ movements, for realistic, human-like interactions. Lack of synchronization causes perceptual distortion. It also lacks a formal declarative model required for real-time learning and adaptation.
In recent years, researchers have also used deep neural networks (DNN) such as CNN [35] or a fusion of CNN and LSTM [36] to classify and label gestures from head-motions. Both studies suffer from incomplete classification of head-movements in co-speech gestures and do not take synchronization and cycles into account. Ad hoc grouping of head-motion classes [36] is not as concise as synchronous Petri net based abstractions, which are also invariant to individual-specific and gesture-specific motion-speed variations.
transitions. Each cell of the matrix is a 5-tuple of the form (connectivity, edge-delay,
colors, part-of-cycle, type-of-synchronization). Connectivity is ‘1’ if the corresponding
nodes are connected; edge-delay is non-zero in the case of starting delay of processes in
‘during synchronization’; colors describe the dimensions in which motion takes place;
part-of-cycle marks the edge as part of a cycle; and type-of-synchronization describes
the type of synchronization associated with the edge. For example, for head-shake the color entry is 'x-dimension'; for head-nod the color entry is 'y-dimension'; for head-tilt the color entry is {x-dimension, y-dimension}. The matrix is built dynamically during video analysis,
one node at a time based upon motion, stillness, and silence analysis. All the place-labels
are coalesced consecutively followed by all the transition-labels.
The property tuple for each node is {node-type, in-degree, out-degree, cycles, syn-
chronization, delays}. Node-type could be ‘place’, ‘transition’, or ‘trigger-node’. In-
degree is the number of incoming edges to the node. In-degree is greater than 1 when (1) two concurrent processes merge, or (2) a cycle is present due to a repeated sequence of head-motions. Out-degree is greater than 1 when (1) two concurrent processes are spawned, or (2) at a non-deterministic junction where a motion may repeat, change motion-type, or terminate.
Synchronization in Petri net is derived using in-degree and out-degree analysis in
the associated connected nodes. A transition node with in-degree ≤1 and out-degree >1
indicates the beginning of synchronization. A place with in-degree >1 and out-degree
≤1 indicates termination of synchronization.
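A minimal sketch of this representation is shown below: each cell of the (m + n) × (m + n) matrix as a 5-tuple, and the degree-based synchronization test from the preceding paragraph. The class and function names are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    """One cell of the (m + n) x (m + n) Petri net matrix."""
    connectivity: int = 0              # 1 if the two nodes are connected
    edge_delay: float = 0.0            # non-zero for 'during synchronization'
    colors: frozenset = frozenset()    # motion dimensions, e.g. {'x'}, {'y'}, {'x', 'y'}
    part_of_cycle: bool = False
    sync_type: str = ""                # 'start', 'end', 'during', 'strict', or ''

def in_out_degree(matrix, node):
    """In-degree and out-degree of a node in the adjacency matrix of Edge cells."""
    in_deg = sum(matrix[src][node].connectivity for src in range(len(matrix)))
    out_deg = sum(matrix[node][dst].connectivity for dst in range(len(matrix)))
    return in_deg, out_deg

def classify_sync(matrix, node, node_type):
    """A transition with out-degree > 1 starts a synchronization;
    a place with in-degree > 1 terminates one."""
    in_deg, out_deg = in_out_degree(matrix, node)
    if node_type == "transition" and in_deg <= 1 and out_deg > 1:
        return "start-synchronization"
    if node_type == "place" and in_deg > 1 and out_deg <= 1:
        return "end-synchronization"
    return "none"

# Toy matrix: node 0 (a transition) spawns edges to nodes 1 and 2.
m = [[Edge() for _ in range(3)] for _ in range(3)]
m[0][1] = Edge(connectivity=1)
m[0][2] = Edge(connectivity=1)
print(classify_sync(m, 0, "transition"))  # start-synchronization
```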
Fig. 3. Synchronous colored Petri net for the gestures 'appreciate' and 'disagreement'
The corresponding Petri net has six places (P1 to P6), five transitions (Tr1 to Tr5), and one strict synchronization between P2 and P5. The transition node Tr1 is a trigger-node that starts a synchronization, node P2 is the entry point of the cycle, and node P4 is the end of the cycle. Tr2 and Tr3 are the transitions involved in the cycle. The graph yields an 11 × 11 matrix.
Each gesture has a signature. Each signature-tuple defines ((head-nod, direction), (head-shake, direction), (head-tilt, direction), eye-focus, number of places, number of transitions, number of start-synchronizations, number of end-synchronizations, number of strict-synchronizations, number of during-synchronizations, number of concurrent asynchronous actions, number of cycles, speech). Head-nod, head-shake, head-tilt, eye-focus, and speech have binary values, while the other attributes are multivalued integers. The signatures of the conversational head-gestures are given in Table 1.
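The signature-tuple can be represented, for example, as a named tuple whose exact match against archived signatures labels an incoming gesture. The field names follow the text, but the code and the sample values are only a sketch; they are not the actual Table 1 entries.

```python
from collections import namedtuple

Signature = namedtuple("Signature", [
    "head_nod", "nod_direction", "head_shake", "shake_direction",
    "head_tilt", "tilt_direction", "eye_focus",
    "places", "transitions",
    "start_sync", "end_sync", "strict_sync", "during_sync",
    "concurrent_async", "cycles", "speech",
])

def match(observed: Signature, archive: dict) -> str:
    """Return the label of the archived signature equal to the observed one."""
    for label, sig in archive.items():
        if sig == observed:
            return label
    return "unknown"

# Placeholder archive entry with illustrative values.
archive = {"question": Signature(1, "down", 0, "", 0, "", 1, 3, 2, 1, 0, 0, 0, 0, 0, 1)}
print(match(archive["question"], archive))  # question
```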
If a place is missed due to a sampling error because sampling time ≥ the stillness thresh-
old, the erroneous signature may not match with the signature of the actual gesture, and
the actual gesture may be mislabeled. This labeling error is reduced using a classification
tree that groups gestures based on major attributes.
The classification tree is based on the semantics of the head-motion-type, eye-focus,
cycle (repeated motion) and speech. Head-nod and head-shake are mutually exclusive.
Single motion (cycle = 0) and repeated motions (cycle = 1) are mostly mutually exclu-
sive. Similarly, focused-eye and unfocussed-eye are mutually exclusive. This mutual
exclusion helps in reducing labeling error in identified gestures.
Based upon major attributes, we classified gestures into fifteen classes. Within the
same class, gestures are separated further using motion direction, synchronization infor-
mation, place information and transition information for further resolution. The classes
are illustrated in Table 2.
A dynamically constructed matrix is used to model the Petri net graph. Petri net graphs are modeled using: 1) an (m + n) × (m + n) matrix, where m is the number of places and n is the number of transitions; and 2) a vector of associated meta-properties derived by matrix analysis. Since there are no edges between places and no edges between transitions, the corresponding matrix segments have zero entries.
Facial video is analyzed to derive the sequence of facial feature-point coordinates.
Feature-point coordinates are analyzed to derive transition coordinates, stillness vector
stl and silence vector sil. Using changes in transition coordinates, various head-motion
types are derived.
Dynamic matrix is built by incrementing the place-label counter i or the transition-
label counter j after identifying a place or transition. The dynamic matrix is also analyzed
to derive synchronization information and cycles in the Petri net.
A simplified abstract algorithm for building Petri net matrix and continuous gesture
identification is given in Fig. 4. The algorithm uses stillness vector, silence vector, and the
coordinates of the feature-points during motion. For simplicity, the algorithm illustrates
synchronization between head-motions and speech.
A place is identified when the stillness vector stl transitions from '0' to '1'. The head is still if the distances between the centroids of facial feature-points at consecutive sampling times are below a statistically derived threshold. If a place, based on coordinate comparison, has not already been visited, a new node-label Pi+1 is created, an edge LMprev → Pi+1 is marked from the previous label, the place-counter i is incremented by one, and the previous label LMprev is updated to the new place Pi.
A place is part of a cycle if the place has already been visited. Before creating a new
place-label, coordinates of the next place Pi+1 are compared with the coordinates of the
visited places in a memo table S place and searched using a similarity-based analysis. If
the Euclidean distance between the two coordinates is below a threshold, the new place
is the same as the visited place, and the label of the visited place is used instead of the
new label Pi+1 , and the new place index i is not incremented.
A cycle detection algorithm is initiated using the set S place to identify the places and
transitions involved in the cycle. All the nodes (places and transitions) in the cycle are
marked, and the corresponding meta-attribute vector is updated.
A transition is identified when the stillness vector stl transitions from '1' to '0'. If the transition is not part of a cycle, a new transition node Trj+1 is created. An edge LMprev → Trj+1 is marked between the previous node (a place Pi) and the new transition Trj+1. The transition-counter j is incremented by one, and the new label Trj is stored as LMprev.
When the stillness-vector stl transitions from 1 → 0 and silence vector sil transitions
from 1 → 0 within a short time-lag δ, there is a concurrent occurrence of speech,
which indicates either start-synchronization or duration-synchronization. If the silence-
vector transitions within |δ| ≤ 1 time-unit, a Boolean variable start-synch is set, and
start_sync_count is incremented by one.
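A compressed sketch of the detection loop described above (and summarized in Fig. 4) is given below: places are created on 0→1 stillness transitions, transitions on 1→0, and revisited places are detected by Euclidean proximity. The function names, thresholds, and toy data are illustrative assumptions.

```python
import math

def build_nodes(stillness, coords, proximity_threshold=5.0):
    """Emit a sequence of place/transition labels from a stillness vector and
    the per-sample feature-point centroid coordinates."""
    nodes, visited_places = [], []           # visited_places acts as the memo table
    place_i = trans_j = 0
    for t in range(1, len(stillness)):
        if stillness[t - 1] == 0 and stillness[t] == 1:     # motion -> still: a place
            for label, pos in visited_places:
                if math.dist(coords[t], pos) < proximity_threshold:
                    nodes.append(label)                      # revisited place => cycle
                    break
            else:
                place_i += 1
                label = f"P{place_i}"
                visited_places.append((label, coords[t]))
                nodes.append(label)
        elif stillness[t - 1] == 1 and stillness[t] == 0:    # still -> motion: a transition
            trans_j += 1
            nodes.append(f"Tr{trans_j}")
    return nodes

# Toy run: still, move, still, move, then still again near the first place (a cycle).
stl = [1, 0, 0, 1, 0, 1]
xy  = [(0, 0), (5, 1), (9, 2), (10, 2), (6, 1), (9, 3)]
print(build_nodes(stl, xy))  # ['Tr1', 'P1', 'Tr2', 'P1']
```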
Fig. 4. A simplified algorithm for building a dynamic matrix for gesture recognition
The algorithms have been implemented in Python, interfaced with Visual Studio Code (1.55.2), using the OpenCV library for face detection [44], the PyAudio library for speech analysis [45], and the Pydub audio library for silence analysis [46]. The software was executed on a 64-bit machine with an Intel(R) Core(TM) i7-6500U CPU at 2.50 GHz (up to 2.60 GHz) and 8 GB RAM.
Video frames were sampled 10 times/sec with a time interval between two frames
being 100 ms. The upper threshold for silence detection was 45 dB. The empirical
analysis of recorded data showed that the maximum threshold of random head-motion
for still head x-coordinates is ±4.0 and y-coordinates is ±3.0. The relaxed-state was
found to be in the region (x-origin ±8.5, y-origin ±4.0). Increasing the sampling rate
improves the recall percentage slightly at the cost of added computational overhead.
During the experimentation, there were instances where the software failed to detect the face, so the coordinates of the feature-points were unavailable. In such cases, we assumed continuity of motion, according to the Gestalt theory of cognition [47], and predicted the coordinate values as $x_t = (x_{t-1} + x_{t+1})/2$ and $y_t = (y_{t-1} + y_{t+1})/2$. This assumption leads to some measurement error if the video-frame for the endpoint of a motion is not sampled. We also identified the perceptual time for gesture boundaries and synchronization distortion.
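The continuity assumption for frames with a missed face detection amounts to linearly interpolating the neighbouring samples, roughly as in this sketch:

```python
def interpolate_missing(points):
    """Fill a single missing (None) sample by averaging its neighbours,
    i.e. x_t = (x_{t-1} + x_{t+1}) / 2, and likewise for y."""
    filled = list(points)
    for t in range(1, len(points) - 1):
        if points[t] is None and points[t - 1] and points[t + 1]:
            (x0, y0), (x1, y1) = points[t - 1], points[t + 1]
            filled[t] = ((x0 + x1) / 2, (y0 + y1) / 2)
    return filled

print(interpolate_missing([(2, 3), None, (4, 7)]))  # [(2, 3), (3.0, 5.0), (4, 7)]
```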
The experiment was repeated 500 times for each gesture. Table 3 shows a confusion
matrix (in percentage) to describe the accuracy and mislabeling of gestures appreciate,
interest and request (class 7), question (class 4) and interrogation (class 5b).
Table 3. Confusion matrix (in percent) for labelling some similar gestures
The gestures 'question' and 'interrogation' are similar and share five major attributes: head-nod, no head-shake, no head-tilt, focused eyes, and speech (see Table 1 and Table 2). However, 'interrogation' has a cycle and 'question' does not. In addition, attributes such as start-synchronization (present in 'question') and the numbers of places and transitions also differ.
The result shows a high percentage of recall (correct labeling of observed gestures)
for all five gestures. Recall varies from 81.5% for the actual gesture ‘appreciate’ to 91%
for the actual gesture ‘interrogate’. This is significant, as it establishes the importance
of motion analysis for accurate gesture identification.
There are many mislabelings: actual gesture ‘appreciate’ mislabeled as other gestures
(7.8%), as ‘interest’ (1.8%) and as ‘interrogate’ (1.2%); the actual gesture ‘interest’
mislabeled as ‘request’ (1.4%) and as ‘interrogation’ (3.4%); the actual gesture ‘request’
mislabeled as other gestures (2.1%), as ‘interest’ (1.5%) and as ‘question’ (3.4%).
The mislabeling between the gestures 'appreciate' and 'interest' stems from the choice of threshold used to derive start and end synchronization (see Table 1), the lack of comprehension of the meaning of the associated dialogs, and sensing inaccuracies. A larger threshold for start synchronization treats during-synchronization as start-synchronization, while a smaller threshold misses the start synchronization. Sensing inaccuracies are caused by the sampling instants of the motion and by speech digitization.
The mislabeling between the gestures ‘interest’ and ‘request’ is caused by missing
a place and error in the relaxed-state temporal threshold that may insert a place (see
Table 1) and the lack of dialog analysis. The mislabeling between the gestures ‘request’
and ‘interest’ can be reduced by dialog analysis.
There are mislabeling errors between the gestures 'question' and 'interrogation': 'interrogation' is mislabeled as 'question' 0.9% of the time, and 'question' is mislabeled as 'interrogation' 1.7% of the time. Signature analysis in Table 1 shows that 'interrogation' being mislabeled as 'question' is caused by a cycle being missed due to an error in the proximity threshold used to identify visited places. The mislabeling of the gesture 'question' as 'interrogation' is caused by a small amount of 'backchanneling' mixing with 'question'. This mislabeling can be reduced by dialog analysis and better tuning of the proximity threshold.
The mislabeling of the gesture 'interest' as 'interrogation' (3.4%) and of 'interrogation' as 'interest' (3.7%) is quite striking, given that the two gestures are in different classes under the major-attribute classification (see Table 2). The mislabeling of 'interrogation' as 'interest' is caused by differences in one or more of three attributes (see Tables 1 and 2): 1) a missed cycle due to an error in the proximity threshold used to identify visited places; 2) the presence of focused eyes during 'interrogation' compared to unfocused eyes in 'interest'; 3) the presence of a slight tilt during 'interrogation'. This mislabeling can be reduced by dialog analysis.
Mislabeling is also caused by missing small, undetectable motions in gestures and by missing feature-points and frames in video analysis, resulting in missing places and corresponding transitions. A larger angular threshold for eye-focus causes mislabeling such as 'disagreement' being labeled as 'denial' or 'discourage' (see Table 1). Signatures also lack the results of speech analysis, dialog context, motion-speed, facial expression analysis, and the number of cycles. To improve accuracy, motion analysis needs to be augmented with head-motion sequence, dialog context, facial expression analysis, and dialog understanding.
Due to errors in measuring the actual changes in the x and y coordinates, a tilt combined with an acyclic head-shake can also be mislabeled as an acyclic head-nod with tilt in the corresponding signature. Similarly, a small tilt may be treated as 'no tilt' because of the limitations of video analysis and feature-point detection.
These errors cause actual gestures to be mislabeled around 9%–17% of the time. Despite this mislabeling, the combination of motion analysis, synchronization information capture, and cycles in the Petri net graph gives around 83%–91% accurate recognition of the gestures.
Currently, we are looking at patterns of head-motion analysis and LSTM-based analysis of stillness vectors and coordinate streams to reduce the error caused by sampling intervals. The thresholds for start and end synchronization affect the detection of synchronization and, accordingly, the signatures. The threshold for the same gesture varies across individuals, and threshold and motion-speed also change with gestures [48]. Hence, thresholds must be adaptive based upon gesture prediction. We are also looking at transfer learning for adapting the learned gestures to different ages and genders, at separating head-motions involved in deictic gestures from those involved in co-speech gestures, and at speech analysis to separate co-speech gestures such as ridicule and denial from gestures such as disagreement.
References
1. Yenilmez, M.I.: Economic and social consequences of population aging the dilemmas and
opportunities in the twenty-first century. Appl. Res. Qual. Life 10(4), 735–752 (2015). https://
doi.org/10.1007/s11482-014-9334-2
2. Agrigoroaie, R.M., Tapus, A.: Developing a healthcare robot with personalized behaviors and
social skills for the elderly. In: Proceedings of the 11th ACM/IEEE International Conference
on Human-Robot Interaction (HRI), pp. 589–590. Christchurch, New Zealand (2016). https://
doi.org/10.1109/HRI.2016.7451870
3. García, D.H., Esteban, P.G., Lee, H.R., Romeo, M., Senft, E., Billing, E.: Social robots in
therapy and care. In: Proceedings of the 14th ACM/IEEE International Conference on Human-
Robot Interaction (HRI), pp. 669–670. Daegu, Korea (2019). https://doi.org/10.1109/HRI.
2019.8673243
4. Rosenberg-Kima, R., Koren, Y., Yachini M., Gordon, G.: Human-robot-collaboration (HRC):
social robots as teaching assistants for training activities in small groups. In: Proceedings of the
14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 522–523.
Daegu, South Korea (2019). https://doi.org/10.1109/HRI.2019.8673103
5. Diftler, M.A., et al.: Robonaut 2 – the first humanoid robot in space. In: IEEE International
Conference on Robotics and Automation, pp. 2178–2183, Shanghai, China (2011)
6. Glas, D.F., Minato, T., Ishi, C.T., Kawahara, T., Ishiguro, H.: ERICA: the ERATO intelligent
conversational android. In: Proceedings of the 25th IEEE International Symposium on Robot
and Human Interactive Communication (RO-MAN), pp. 22–29, New York (2016)
7. Kendon, A.: Gesture: Visible Actions as Utterance. Cambridge University Press, Cambridge,
UK (2004)
8. Singh, A., Bansal, A.K.: Declarative modeling and implementation of robotic head-based
gestures for human-robot interaction. Int. J. Comput. Appl. 16(2), 49–66 (2019)
9. Singh, A., Bansal, A.K.: Towards synchronous model of non-emotional conversational gesture
generation in humanoids. In: K. Arai (ed.) Intelligent Computing. LNNS, vol. 283(1), pp. 737–
756. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-80119-9
10. Allen, J.F.: Maintaining knowledge about temporal intervals. Commun. ACM 26(11), 832–
843 (1983). https://doi.org/10.1145/182.358434
11. Singh, A., Bansal, A.K.: Towards modeling gestures for non-emotional conversational inter-
action by humanoid robots. In: Proceedings of the 31st International Conference on Computer
Applications in Industry and Engineering, pp. 59–64. New Orleans, LA, USA, (2018)
12. David R., Alla, H.: Petri Nets & Grafcet, Tools for Modelling Discrete Event Systems, Prentice
Hall, New York, USA (1992)
13. Liu, H., Wang, L.: Gesture recognition for human-robot collaboration: a review. Int. J. Ind.
Ergon. 68, 355–367 (2018). https://doi.org/10.1016/j.ergon.2017.02.004
14. Wang, P., Li, Z., Hou, Y., Li, W.: Action recognition based on joint trajectory maps using
convolutional neural networks. In: Proceedings of the 24th International ACM Conference
on Multimedia, pp. 102–106. New York (2016) https://doi.org/10.1145/2964284.2967191
15. Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: a review. ACM Comput. Surv. 43(3),
Article 16 (2011). https://doi.org/10.1145/1922649.1922653
16. Liu, M., Liu, H., Chen, C.: Enhanced skeleton visualization for view invariant human
action recognition. Pattern Recogn. 68, 346–362 (2017). https://doi.org/10.1016/j.patcog.
2017.02.030
17. Gholamrezaii, M., Almodarresi, S.M.T.: Human activity recognition using 2D convolutional
neural networks. In: Proceedings of the 27th Iranian Conference on Electrical Engineer-
ing (ICEE), pp. 1682–1686. Yazd, Iran (2019). https://doi.org/10.1109/IranianCEE.2019.878
6625
18. Qiu, Z., Yao, T., Mei, T.: Learning spatio-temporal representation with pseudo-3D residual
networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV),
pp. 5534–5542. Venice, Italy (2017). https://doi.org/10.1109/ICCV.2017.590
19. Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recogni-
tion. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013). https://doi.org/10.1109/
TPAMI.2012.59
20. Arunnehru, J., Chamundeeswari, G., Bharathi, S.P.: Human action recognition using 3D con-
volutional neural networks with 3D motion cuboids in surveillance videos. Procedia Comput.
Sci. 133, 471–477 (2018). https://doi.org/10.1016/j.procs.2018.07.059
21. Yang, H., Yuan, C., Li, B., Du, Y., Xing, J., Hu, W., et al.: Asymmetric 3D convolutional
neural networks for action recognition. Pattern Recogn. 85, 1–12 (2019). https://doi.org/10.
1016/j.patcog.2018.07.028
22. Li, C., Zhong, Q., Xie, D., Pu, S.: Co-occurrence feature learning from skeleton data for action
recognition and detection with hierarchical aggregation. In: Proceedings of the International
Joint Conference on Artificial Intelligence (IJCAI), pp. 786–792. Stockholm, Sweden (2018).
https://doi.org/10.24963/ijcai.2018/109
23. Dong, L., Jin, Y., Tao, L., Xu, G.: Recognition of multi-pose head gestures in human con-
versations. In: Proceedings of the Fourth International Conference on Image and Graphics
(ICIG), pp. 650–654. Chengdu, China (2007). https://doi.org/10.1109/ICIG.2007.176
24. Thafar, M., Ghayoumi, M., Bansal, A.K.: A formal approach for multimodal integration to
derive emotions. J. Vis. Lang. Sent. Syst. 2, 48–54 (2016). https://doi.org/10.18293/DMS201
6030
25. Ishi, C.T., Liu, C., Ishiguro, H., Hagita, N.: Head motion during dialogue speech and nod
timing control in humanoid robots. In: Proceedings of the 5th ACM/IEEE International Con-
ference on Human-Robot Interaction (HRI), pp. 293–300. Osaka, Japan (2010). https://doi.
org/10.1109/HRI.2010.5453183
26. Kapoor, A., Picard, R.W.: A real-time head nod and shake detector. In: Proceedings of the
Workshop on Perceptive User Interfaces (ICMI-PUI), pp. 1–5. Orlando, FL, USA (2001).
https://doi.org/10.1145/971478.971509
27. Tan, W., Rong, G.: A real-time head nod and shake detector using HMMs. Expert Syst. Appl.
25(3), 461–466 (2003). https://doi.org/10.1016/S0957-4174(03)00088-5
28. Morency, L. P., Sidner, C., Lee, C., Darrell, T.: Contextual recognition of head gestures. In:
Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 18–24.
Trento, Italy (2005). https://doi.org/10.1145/1088463.1088470
29. Saunders, J., Syrdal, D.S., Koay, K.L., Burke, N., Dautenhahn, K.: Teach me–show me-end-
user personalization of a smart home and companion robot. IEEE Trans. Hum.-Mach. Syst.
46(1), 27–40 (2016). https://doi.org/10.1109/THMS.2015.2445105
30. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York, NY, USA
(2006)
31. Murase, H., Nayar, S.K.: Visual learning and recognition of 3-D objects from appearance.
Int. J. Comput. Vision 14(1), 5–24 (1995). https://doi.org/10.1007/BF01421486
32. Tang, J., Nakatsu, R.: A head gesture recognition algorithm. In: International Conference of
Multimedia Interfaces (ICMI), Beijing, China 2000, LNCS, vol. 1948, pp. 72–80. Springer,
Heidelberg (2000). https://doi.org/10.1007/3-540-40063-X_10
33. Lu, P., Zhang, M., Zhu, X., Wang, Y.: Head nod and shake recognition based on multi-
view model and Hidden Markov Model. In: Proceedings of the International Conference on
Computer Graphics, Imaging and Visualization (CGIV), pp. 61–64. Beijing, China (2005).
https://doi.org/10.1109/CGIV.2005.41
34. Ng-Thow-Hing, V., Luo, P., Okita, S.: Synchronized gesture and speech production for
humanoid robots. In: Proceedings of the IEEE/RSJ International Conference on Intelligent
Robots and Systems, pp. 4617–4624. Taipei, Taiwan (2010)
35. Otsuka, K., Tsumore, M.: Analyzing multifunctionality of head movements in face-to-face
conversations using deep convolutional neural networks. IEEE Access 8, 217169–217195
(2020). https://doi.org/10.1109/ACCESS.2020.3041672
36. Sharma, M., Ahmetovic, D., Jeni, L.A., Kitani, K.M., Recognizing visual signatures of spon-
taneous head gestures. In: Proceedings of the IEEE Winter Conference on Applications of
Computer Vision (WACV), pp. 400–408, Lake Tahoe, NV, USA (2018). https://doi.org/10.
1109/WACV.2018.00050
37. McGlaun, G., Althoff, F., Lang, M., Rigoll, G.: Robust video-based recognition of dynamic
head gestures in various domains - comparing a rule-based and a stochastic approach. In:
Antonio, C., Volpe, G. (eds.) 5TH International Gesture Workshop On Gesture-Based Com-
munication In Human-Computer Interaction (GW) 2003, LNAI, vol. 2915, pp. 180–197.
Springer-Verlag, Berlin Heidelberg (2004)
38. Lavee, G., Borzin, A., Rivlin, E., Rudzsky, M.: Building petri nets from video event ontologies.
In: Bebis, G., Tanveer S.-M., et al. (eds.) International Conference on Advances in Visual
Computing (ISVC) 2007. LNCS, vol. 4841, pp. 442–445. Springer-Verlag, Heidelberg (2007).
https://doi.org/10.1007/978-3-540-76858-6_44
39. Ghanem, N., DeMenthon, D., Doermann, D., Davis, L.: Representation and recognition of
events in surveillance video using Petri nets. In: Proceedings of the Second IEEE Workshop
on Event Mining, Computer Vision and Pattern Recognition, International Conference on
Computer Vision and Pattern Recognition, p. 112 (2004). https://doi.org/10.1109/CVPR.200
4.430
40. Mancas, M., Glowinski, D., Volpe, G., Coletta, P., Camurri, A.: Gesture saliency: a context-
aware analysis. In: Kopp, S., Wachsmuth, I. (eds.) GW 2009. LNCS (LNAI), vol. 5934,
pp. 146–157. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12553-9_13
41. Qiu, J., Wang, L., Wang, Y., Hu, Y.H.: Multi-event modeling and recognition using extended
petri nets. IEEE Access 8, 37879–37890 (2020). https://doi.org/10.1109/ACCESS.2020.297
5095
42. Beddiar, D.R., Nini, B., Sabokrou, M., Hadid, A.: Vision-based human activity recognition:
a survey. Multimedia Tools Appl. 79(41–42), 30509–30555 (2020). https://doi.org/10.1007/
s11042-020-09004-3
43. Vrigkas, M., Nikou, C., Kakadiaris, I.A.: A review of human activity recognition methods.
Front. Robot. AI 2(28), Article 28 (2015). https://doi.org/10.3389/frobt.2015.00028
44. Open CV. https://opencv.org. Accessed 29 Apr 2022
1 Introduction
Showing affection is an ability machines have yet to master. Technologies are advancing, and the way we communicate with them is becoming more natural, as if we are heading towards a symbiotic relationship. Human emotions carry a great deal of meaning and, in some cases, can be more expressive than words. Technology was invented and developed to make our lives easier. For this reason, computers should be affective in every aspect, both in showing emotion and in perceiving it. This ability is referred to as affective computing, which [1] defines as "computing that relates to, arises from, or deliberately influences emotions." However, a system that perceives human emotion is not enough; the system needs to know how to react and how to adapt to these emotions. A system is considered adaptive if it has the ability to change its behavior based on a change in its environment or requirements. We aim for software systems to be adaptive to their users, leading to a more desirable user experience.
Affective computing adds tremendous enhancements to the field of Human-Robot Interaction (HRI). According to [2], HRI "is the interdisciplinary study of interaction dynamics between humans and robots." Robots have reached new heights in their technologies, and their importance has risen as well. Affection in robotics aims to make robots more socially acceptable, referred to as affective social robots [3] or socially adaptive robots [4]. These types of robots are strong candidates to replace emotional support animals. An emotional support animal is an untrained animal that gives comfort through companionship and is usually targeted at helping patients with mental and emotional illnesses [5]. A study [6] concludes that emotional support animals can be beneficial to mental health patients.
An emotional support robot would be easier to maintain than a real animal and can be programmed to do exactly what it is supposed to do. For example, a patient with Alzheimer's disease or dementia could forget to feed their emotional support animal. With an emotional support robot, this can be avoided, as the robot can be taught to be self-sufficient, such as going to its charging station when it is running low on power. Some users might enjoy the responsibility of having an animal, and an emotional support robot can also be programmed to be more dependent on its owner. Not all people have the same taste or enjoy being treated in the same manner. This is the reason behind our emphasis on personalization, as one size should not fit all: we are different and each have our own tastes and opinions.
We propose an emotional support robot that adapts to its owner's emotions. In this framework we focus on personalization in two aspects. First, the personalization of emotion recognition, as in [7]; personalizing emotion recognition gives higher detection accuracy, as not all humans portray emotions similarly. Second, personalizing the emotional support robot's responses to its owner, where we adopt a reinforcement learning approach in which the robot learns more about its owner over time, much like a young emotional support animal learning its owner's likes and dislikes. The longer the emotional support robot is used, the more familiar it becomes with its owner. In the following section we go over a number of emotion recognition technologies that are applicable to our proposed framework.
2 Emotion Recognition
Human emotion is rather challenging for machines to interpret. Even for humans, it is not always easy to fully comprehend another person's emotions. Several factors account for this, such as race and cultural differences [8]. It can also be due to differences in personality, as some people show more or fewer emotions. Numerous technologies exist for emotion detection, such as neural input, voice/tone analysis, heart rate tracking, and facial expression recognition. Neural input of brain waves can be acquired by electroencephalography (EEG), the most commonly used technology for Brain-Computer Interfaces (BCI) [9]. A BCI "is a system that measures activity of the central nervous system and converts it into artificial output [10]." Emotions can be detected using BCI, as done in [11], by extracting features from EEG signals using fractal dimensions and then classifying those features into emotions. However, extracting human emotions from neural input requires specific hardware that is not commonly owned or easily accessible by the general public. Heart rate monitoring also requires specific hardware but has become more widespread over the years; the majority of smart watches today can monitor their user's heart rate. Shu et al. [12] utilized a wearable smart bracelet to recognize emotions based on heart rate data. In addition, speech emotion recognition is a widely researched area with promising results. Detecting emotions from speech can be achieved using classification-based machine learning, as in [13].
We focus our research mainly on facial expression recognition using standard cameras. Computer vision and machine learning are both used to achieve this task. Furthermore, there are different methods and algorithms for classifying emotions from images, as discussed and compared in [7]. Generally, a cascade classifier is used first to detect any faces in an image. This can be accomplished with the Open Source Computer Vision Library (OpenCV) [14]. Then facial features are extracted, compared, and classified into emotions. In addition, the Deep Learning Library [15] can be utilized to extract facial features; within this library is a shape predictor that detects facial landmarks, developed by [16]. Calculations can be made among these facial landmarks to differentiate between facial expressions, as done in [7]. Cloud-based image classification is also an option for recognizing the facial expression within an image; in [17], the cloud-based emotion recognition services of Amazon, Google, and Microsoft are compared.
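As a hedged sketch of this pipeline (not the authors' implementation), the snippet below combines OpenCV's Haar cascade face detector with the dlib 68-point shape predictor; the use of dlib and the model file name are assumptions, and the model file is assumed to be downloaded separately.

```python
import cv2
import dlib

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed local file

def facial_landmarks(image_path):
    """Detect the first face in an image and return its 68 landmark points."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return []
    x, y, w, h = faces[0]
    shape = predictor(gray, dlib.rectangle(x, y, x + w, y + h))
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```

Distances or angles computed between these landmark points can then serve as the features fed to an emotion classifier.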
A personalized approach, discussed and tested in [7], shows potential for higher emotion detection accuracy. This approach entails that the system learns how to detect the emotions of a specific person. It requires the system to run a preliminary training phase, where the human is asked to purposely show the system how they portray certain emotions. The system captures a number of images per emotion supported by the system; these images are labeled by emotion and trained using a machine learning algorithm. A drawback of this approach is that the training process may be time consuming and tedious for the user. Therefore, we suggest a hybrid approach, where the system has a baseline emotion recognition method that is not personalized, with the capability of personalization over time. This way, the robot can still function before any personalized emotion detection training is accomplished. The robot can then perform the personalized training in smaller increments over time.
3 Emotional Adaptation
The main objective of this research is an emotionally adaptive system in which the basis of adaptation is the human. Our goal is an emotional support robot that adapts its behavior based on its owner's emotions. Adaptive systems can also be considered autonomic systems, which are, according to [18], "computing systems that can manage themselves". As described in [18], the main structure of an autonomic element consists of a managed element and an autonomic manager; an autonomic system can consist of more than one autonomic element. The autonomic manager contains a feedback control loop known as the MAPE-K loop [18], which is considered the most significant reference control model for autonomic systems [19]. The components
of the MAPE-K loop are Monitor, Analyze, Plan, Execute and Knowledge. We map
the components of our proposed framework model to the MAPE-K loop, as presented
in Fig. 1, as evidence of autonomy. We define each component of the MAPE-K loop
according to our proposed model as follows:
Sensor is the physical element that captures the external factors that affect the adaptation of the system. In our case, the sensor is the robot's camera that views the human. More than one sensor can be implemented; using more than one sensor, especially for emotion recognition, will increase the adaptability of the system and lead to a more effective outcome, for example a heart rate monitor used together with the camera.
Monitor is the component of the system that gathers data from the sensor(s). In most cases, monitoring should be continuous, unless otherwise specified in the adaptation rules. The collected data is stored in the Knowledge component to be shared with the other components of the MAPE-K loop. In our proposed model, the monitor component is responsible for monitoring the human's facial expressions; if the sensor were a heart-rate monitor, the monitor component would continuously record the human's heart rate.
Analyze is the component responsible for analyzing the data acquired by the monitor component. This analysis determines whether adaptation is required. In our case, this component analyzes the facial expressions to extract emotions with the aid of the personalized emotion recognition data stored in the knowledge component. According to the adaptation rules, the analyze component determines whether a detected emotion requires the robot to take an adaptive action.
Plan is executed only if the analyze component determines that an adaptive action
needs to be performed. This component may also determine if more than one action
needs to be performed. The plan component is in charge of determining the actual action
that should be performed. Action determination is based on the analysis data from the
analyze component and, in our case, the semi-random pool of actions that is stored in
the shared knowledge component. The semi-random pool of actions is further discussed
in Sect. 4.
Execute is the component that is responsible for executing the action or actions
determined by the plan component. The execute component uses the effectors of the
managed element to physically perform the action(s). In our case, the execute component
prepares the step-by-step instructions for the robot to perform the required action(s).
Effector, also known as an actuator, is the physical element or elements that carry out the instructions given by the execute component. For our model, these are the robot's physical elements, which may include, but are not limited to, the robot's arms, hands, legs, wheels, screen, and speaker. This differs based on the capabilities and physical elements of the implemented robot.
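A minimal, hedged skeleton of one such loop, following the mapping above, is shown below; the class and method names are illustrative, and a real implementation would plug in the emotion recognizer, the adaptation rules, the action pool, and the robot effectors.

```python
class MapeKLoop:
    """One autonomic loop: Monitor -> Analyze -> Plan -> Execute, over shared Knowledge."""

    def __init__(self, sensor, recognizer, rules, actions, effector):
        self.sensor = sensor            # e.g. the robot's camera
        self.recognizer = recognizer    # personalized emotion recognition model
        self.rules = rules              # adaptation rules: emotion -> adaptation needed?
        self.actions = actions          # semi-random pool of actions (see Sect. 4)
        self.effector = effector       # robot hardware interface
        self.knowledge = {}            # shared state between components

    def monitor(self):
        self.knowledge["frame"] = self.sensor.capture()

    def analyze(self):
        emotion = self.recognizer.predict(self.knowledge["frame"])
        self.knowledge["emotion"] = emotion
        return self.rules.get(emotion, False)   # does this emotion require adaptation?

    def plan(self):
        return self.actions.select(self.knowledge["emotion"])

    def execute(self, action):
        self.effector.perform(action)

    def step(self):
        self.monitor()
        if self.analyze():
            self.execute(self.plan())
```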
Autonomous systems may have more than one MAPE-K loop depending on the nature of the system, specifically on how many properties of adaptation the system has. We discussed the emotional adaptation property of the Emotional Support Robot framework, whereas an actual implementation of the ESR would have various adaptation properties, so more than one MAPE-K loop of autonomy can be applied. In addition, other methods of emotion recognition can be implemented in the same loop of autonomy, leading to a more effective system. In the following section, we discuss our proposed framework and its components.
Camera. The robot's camera is the sensor that monitors the human and the robot's environment. It captures a number of facial images that are processed within the module. Other sensors can be implemented as well, such as an EEG for brain-wave monitoring; this would add another degree of emotion recognition and higher emotion detection accuracy, as previously discussed. After images of the human's face are captured, they are sent to the facial features extraction component.
Facial Features Extraction. This component is in charge of locating the face within
the images acquired by the camera. This can be done with the Open Source Computer
Vision Library (OpenCV) [14], as discussed in Sect. 2. In addition, facial features can
be extracted by using the Deep Learning Library [15].
Human. The human is the owner of the emotional support robot, and the framework is centered on this component. The human shows emotions that are acquired by the emotion recognition module and receives or is shown an action by the robot, determined within the reaction module.
Personal Emotion Recognition Data. This component is responsible for storing the
trained machine learning model for recognizing the emotions of the human. The emotion
recognition module depends on this component to know specifically how the human
portrays emotions. This file is specific to the owner of the robot, personalized to their
way of showing emotions.
This component is in charge of instructing the robot which action is to be performed based on the human's current emotion. The components of this module are discussed below.
Emotion Processing. This component is given the human’s current emotion by the
emotion recognition module. It is responsible for registering and analyzing the human’s
emotion for action selection and action assessment.
process is random, but each action has a weight of possibility. It resembles a raffle,
but with the weights representing multiple raffle entries. Therefore, actions with higher
weight values have a bigger chance to be selected. After an action is chosen it is sent to
be executed.
Action Execution. This component is in charge of taking the selected action and pro-
ducing step-by-step instructions for the robot to perform. As discussed in Sect. 3, these
instructions are to be performed by the robot’s effectors.
Action Assessment. After an action is performed by the robot, the action assessment
component is responsible for measuring feedback of that action from the human. This
assessment is based on measuring the human’s emotions after the action is performed. If
the new emotion is positive, the action assessment component increments the weight of
that action and initial emotion within the semi-random pool of action. If the new emotion
is negative, the weight is decreased.
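The semi-random pool and its feedback update can be sketched as a weighted draw followed by a weight adjustment; the class below covers the pool for a single initial emotion, and the action names, step size, and weight floor are placeholders rather than the paper's actual values.

```python
import random

class ActionPool:
    """Semi-random pool for one initial emotion: weights act like raffle entries."""

    def __init__(self, actions, initial_weight=1.0):
        self.weights = {a: initial_weight for a in actions}

    def select(self):
        actions = list(self.weights)
        return random.choices(actions, weights=[self.weights[a] for a in actions], k=1)[0]

    def assess(self, action, new_emotion_is_positive, step=1.0, floor=0.1):
        """Increase the weight after positive feedback, decrease it after negative."""
        delta = step if new_emotion_is_positive else -step
        self.weights[action] = max(floor, self.weights[action] + delta)

pool = ActionPool(["play music", "tell a joke", "approach owner"])
chosen = pool.select()
pool.assess(chosen, new_emotion_is_positive=True)
print(chosen, pool.weights)
```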
To clarify the idea of how the ESR functions we illustrate the step-by-step process
in Fig. 3. In the following section we discuss a number of related works in affective
robotics.
5 Related Works
The field of affective robotics is promising, and a great deal of research has gone into it.
In this section we discuss several studies that are related to our research. Chumkamon et al. [20] propose a model for an emotional animal robot. Their model focuses on emotional motivation, where the robot's mood is stimulated by its environment, which determines the robot's level of motivation. Each level of motivation is correlated with a number of behaviors, such as sleeping, hating, and gazing around. They also take the user's facial expressions into account for how the robot shows emotions, for better social interaction; however, they only use the robot's eyes to show emotions. Admoni et al. [21] propose using eye gaze as a nonverbal communication cue to predict a user's intention; their research focuses on shared autonomy that can be used for assistive care. Furthermore, recognizing human gestures and eye gaze is proposed in [22] to model nonverbal behavior for socially adaptive robots.
An empathetic virtual agent, developed by [23], showed positive results in their study.
The empathetic virtual agent mimics a health counselor to help with health intervention.
Their system adapts its own behavior to mimic empathy, also referred to as facial mimicry,
by portraying facial expressions in response to the user’s facial expressions during a
conversation with the virtual agent. This is referred to as reflective listening, which is
common in the way we humans communicate with one another. The authors of [24] also apply facial
mimicry in their model for empathetic virtual agents, with a focus on showing different
levels of empathy.
The proposed framework was designed to enable software systems to understand
humans' nonverbal cues. In our case, we focus on human emotions. We discussed a
number of emotion recognition options that can be applied to the proposed framework.
Applying more than one method for recognizing the human's emotion would lead to a
more holistic solution. Our approach focuses on the detection of emotion from facial
expressions, which requires the robot to monitor the human to detect their emotion.
Adding a smart Internet of Things device, such as a smart bracelet, to monitor heart rate
would give a more continuous option for detecting a change in the human's emotions.
Other methods, such as neural input, require more expensive hardware, making them
less easily applicable. For future work we aim to add more levels of emotion recognition
that would lead to more adaptation rules. In addition, other peripherals can be added
for functionality beyond emotion recognition; for example, a heart rate monitor could
track the well-being of the patient so that the robot can contact the health authorities
in case of an emergency.
Acknowledgment. This work was supported by the Deanship of Scientific Research, Vice Presi-
dency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Project
No. GRANT669].
References
1. Picard, R.W.: Affective computing. Pattern Anal. Appl. 1, 71–73 (1997). https://doi.org/10.
1007/BF01238028
2. Feil-Seifer, D., Matarić, M.J.: Human robot interaction. In: Encyclopedia of Complexity and
Systems Science, pp. 4643–4659. Springer, New York (2009)
3. Kirby, R., Forlizzi, J., Simmons, R.: Affective social robots. Rob. Auton. Syst. 58, 322–332
(2010). https://doi.org/10.1016/j.robot.2009.09.015
4. François, D., Polani, D., Dautenhahn, K.: Towards socially adaptive robots: a novel method for
real time recognition of human-robot interaction styles. In: 2008 8th IEEE-RAS International
Conference Humanoid Robot Humanoids 2008, pp. 353–359 (2008). https://doi.org/10.1109/
ICHR.2008.4756004
5. Carroll, J.D., Mohlenhoff, B.S., Kersten, C.M., et al.: Laws and ethics related to emotional
support animals. J. Am. Acad. Psychiatry Law 48(4), 509–518 (2020) https://doi.org/10.
29158/JAAPL.200047-20
6. Brooks, H.L., Rushton, K., Lovell, K., et al.: The power of support from companion animals
for people living with mental health problems: a systematic review and narrative synthesis
of the evidence. BMC Psychiatry 18(1), 1–12 (2018). https://doi.org/10.1186/s12888-018-
1613-2
7. Al-Omair, O.M., Huang, S.A.: Comparative study of algorithms and methods for facial expres-
sion recognition. In: IEEE International Systems Conference (SysCon), pp. 1–6. Orlando, FL
(2019)
8. Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc.
Psychol. 17, 124–129 (1971). https://doi.org/10.1037/h0030377
9. Huang, S., Miranda, P.: Incorporating human intention into self-adaptive systems. In: Pro-
ceedings IEEE International Conference on Software Engineering, vol. 2, pp. 571–574 (2015).
https://doi.org/10.1109/ICSE.2015.196
10. Hill, N.J., Wolpaw, J.R.: Brain–computer interface. Ref. Modul. Biomed. Sci. (2016). https://doi.org/10.1016/B978-0-12-801238-3.99322-X
11. Kaur, B., Singh, D., Roy, P.P.: EEG based emotion classification mechanism in BCI. Procedia
Comput. Sci. 132, 752–758 (2018). https://doi.org/10.1016/J.PROCS.2018.05.087
12. Shu, L., Yu, Y., Chen, W., et al.: Wearable emotion recognition using heart rate data from a
smart bracelet. Sensors 20(3), 718 (2020). https://doi.org/10.3390/s20030718
13. Nwe, T.L., Foo, S.W., De Silva, L.C.: Speech emotion recognition using hidden markov
models. Speech Commun. 41, 603–623 (2003). https://doi.org/10.1016/S0167-6393(03)000
99-2
14. Introduction — OpenCV 3.0.0-dev documentation. https://docs.opencv.org/3.0-beta/mod
ules/core/doc/intro.html. Accessed 30 Jan 2018
15. dlib C++ Library. http://dlib.net/. Accessed 10 Jan 2018
16. Kazemi, V., Sullivan, J.: One millisecond face alignment with an ensemble of regression trees.
In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1867–1874. IEEE
(2014)
17. Al-Omair, O.M., Huang, S.: A comparative study on detection accuracy of cloud- based
emotion recognition services. In: The International Conference on Signal Processing and
Machine Learning. Shanghai, China, pp. 142–148 (2018)
18. Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Comput. (Long Beach Calif)
36, 41–50 (2003). https://doi.org/10.1109/MC.2003.1160055
19. Arcaini, P., Scandurra, P.: Modeling and Analyzing MAPE-K Feedback Loops for Self-
Adaptation - IEEE Xplore Document (2015)
20. Chumkamon, S., Masato, K., Hayashi, E.: Facial expression of social interaction based on
emotional motivation of animal robot. In: Proceedings - 2015 IEEE International Conference
on Systems, Man, and Cybernetics, SMC 2015, pp .185–190. IEEE (2016)
21. Admoni, H., Srinivasa, S.S.: Predicting user intent through eye gaze for shared autonomy. In:
Proceedings of the 2016 AAAI Fall Symposium: Shared Autonomy in Research and Practice.
pp. 298–303 (2016)
22. Admoni, H., Scassellati, B.: Nonverbal behavior modeling for socially assistive robots. In:
Proceedings of the 2014 AAAI Fall Symposium: Artificial Intelligence for Human-Robot
Interaction (AI-HRI), pp. 7–9 (2014)
23. Lisetti, C., Amini, R., Yasavur, U., Rishe, N.: I can help you change! an empathic virtual
agent delivers behavior change health interventions. ACM Trans. Manag. Inf. Syst. 4, 1–28
(2013). https://doi.org/10.1145/2544103
24. Boukricha, H., Wachsmuth, I., Carminati, M.N., Knoeferle, P.: A computational model of
empathy: empirical evaluation. In: Proceedings - 2013 Humaine Association Conference on
Affective Computing and Intelligent Interaction, ACII 2013, pp. 1–6. IEEE (2013)
How Does a Social Robot Analyze Emotions?
1 Introduction
We present an experiment on the automatic processing of emotions in a multimodal
framework. The results will be implemented in a social robot. They will take the form
of an emotion detector that simultaneously processes image, voice, and text. For a robot
to be empathetic, it must identify the emotions of its users. Empathy is defined as:
‘intuitive ability to put oneself in the place of others, to perceive what they feel’ (source
Larousse dictionary). Being empathetic does not mean sharing the same emotional state
but consists in manifesting an affective state testifying to the understanding of the feelings
of the other. It is an appropriate response to an emotional reaction. For example, we show
compassion when someone is sad or complicity when someone is happy. The multimodal
approach is necessary to automatically analyze emotions because the only processing of
the textual content is sometimes insufficient to identify the emotional state of the person
who speaks. For example, the French expression ça va (in English it’s ok) is interpreted
as a positive or negative emotion depending on the context. The prosodic analysis of this
utterance helps to disambiguate it. Pronounced with a falling melodic pattern in the infra-low
register and a jerky tempo, it implies a negative emotion, while produced with a rising-falling
intonation pattern and a slower tempo it will be interpreted as a positive emotion
[1]. Similarly, the analysis of facial expressions also makes it possible to disambiguate
this language expression [2].
For nearly four years, we have been developing UKKO, an intelligent dialogue
system [3–5]. This system will become a component of the social robot BUDDY [6]
inter alia. This component will allow the robot to have conversations with its users. The
creative aspect of the UKKO system sets it apart from other dialogue systems. Instead of
reproducing stored human formulations, UKKO produces appropriate utterances thanks
to an algorithm that uses linguistic resources to simulate human language. The way the
system works is based on human language modeling [7]. Its development is based on a
mixed approach, called extended intelligence, which combines a symbolic approach and
a numerical approach [8]. The first approach, called linguistic intelligence, gives rise
to metalinguistic rules which allow to: 1) decode the incoming message; 2) encode the
outgoing message, ensuring the adequacy of the two types of messages; 3) chain these
messages in a conversational form. The second approach, called artificial intelligence,
contributes to automatically producing the linguistic descriptions that are necessary for
the application of the metalinguistic rules developed as part of the first approach. The
complementarity between linguistic intelligence and artificial intelligence is central in
the development of the system.
The evaluation of verbal interactions between a social robot and its users has pro-
vided good results. Misunderstandings of incoming messages are rare. They are easily
correctable. The generation of outgoing messages is well controlled. The use of a social
robot is facilitated if its users forget that they are interacting with a machine [9]. The
humanoid form of the robot and its discursive competence contribute to the personifica-
tion of the robot. This is also true when the robot has an empathic skill [10]. Artificial
empathy is a set of marks of interest shown by a robot for its users [11]. Affective
computing studies this category of interaction [12]. This area of research has two com-
ponents: there is, on the one hand, the automatic recognition of human affective states
and, on the other hand, the automatic creation of marks of empathy. This paper deals
with the first part only.
In recent years, many works have been conducted in affective computing [13]. These
studies deal, inter alia, with multimodal emotion recognition, based on deep learning
[14, 15]. However, in these works, the number of emotions is quantitatively limited and
multimodality is confined either to the analysis of facial features and voice, or to the
analysis of voice features and the content of what is said.
We proceeded differently than in the works mentioned above. First, we automatically
classified facial and voice features. Second, we projected the emotions identified and
labeled automatically onto what is said about the classes of facial and voice features
of the first step when the three modalities are activated at the same time in the videos
studied. Third, we used a semi-supervised learning method based on deep learning. This
method exploits the results of the second step to label the other classes of facial and
voice features. These classes are not associated with verbally expressed emotions (this
is the most frequent case in our corpus). Our approach aims to automatically recognize
a wide variety of emotions on the face and voice in order to get closer to the variety of
emotions expressed verbally.
First, we specify the issues of the research and the methodology used. Secondly, we
present the experimental protocol used to test the research work hypothesis. Third, we
discuss the interpretation of the obtained results. Finally, we specify what will be the
extension of this work.
2 Research Problem
Emotions are a subject of study in several scientific disciplines. Among the many studies
in philosophy on emotions, there are those of René Descartes. Emotions, called by
the philosopher “passions of the soul” transcend the Cartesian body-mind dichotomy,
because as subjects of emotions, true human beings are necessarily aggregates of body
and spirit [16]. The studies in psychology on emotions by William James and Carl
Lange at the end of the 19th century are also essential. They define emotions as felt
bodily reactions: the triggering of emotion would be determined by the perception of
a peripheral activation pattern [17]. This analysis of emotions was challenged at the
beginning of the 20th century by physiologists Walter Bradford Cannon and Philip Bard:
the triggering of emotion is determined by the processing of a stimulus at the level of the
central nervous system, the peripheral activation pattern being neither specific nor causal
[18]. More recent studies in neuroscience distinguish two categories of affect: emotions
and feelings [19]. The first category is physiological, an emotion appearing essentially
on the body, for example, slumped shoulders. The second category is neuropsychological
since feeling is considered a cognitive process. A causal continuity, in which the emotion
precedes the feeling, is established between the two kinds of affect.
First, it emerges from all these studies that the physiological properties of emotions
are undeniable. Therefore, these properties are necessary for the automatic detection
of emotions. Their physiological aspect is integrated into the processing presented here
because of its multimodal nature: two somatic indicators are analyzed: facial expressions
and voice. These studies focus on the inner point of view instead of the outer point of
view. The inner point of view is that of the subject who feels the affects. It explains how
emotions work by modeling the human mechanisms that produce them. The external
point of view is that of the observer. It is adopted in this study since its purpose is to
detect the emotions of users of a social robot.
Detecting emotions involves categorizing them according to specific markers: bodily
attitudes, facial expressions, the voice, and the messages it delivers. When the observer
is a human being, the physiological aspects are fundamental because they echo those
that characterize his own emotions: a large part of our interactions with the surrounding
environment and our emotional behaviors depend on our ability to perceive and under-
stand the emotions of others [20]. This is the first challenge of the research presented
here.
A language is a tool used by human beings to communicate. It allows describing the
emotions in a very deep way. The literature has described affective feelings in detail.
Therefore, languages have a very complete lexicon to signify them. In a more famil-
iar, even very familiar register, they also have a large stock of expressions to express
their emotions; for example, c’est le fun (in English it’s fun) to express satisfaction. The
second challenge of the research is to rely on the description of the vocabulary to desig-
nate the emotional markers that are physiological. The starting point is a very detailed
analysis of affect predicates [21]. We rely on this analysis to define the physiological
markers of emotions from their textual markers. According to this theory, utterances
proceed from predicate-argument structures. The predicates are the linguistic forms of
oriented relations between entities corresponding to their arguments. Affects are cogni-
tive processes. Their particularity is to be centered on the psychic interiority of people.
They are distinct from sensations. These have the particularity of being centered on the
physiological interiority of people. This distinction is not absolute because there are
interactions between the two kinds of interiority. The feeling of cold can go hand in
hand with annoyance. Similarly, disgust can lead to nausea. It is these interactions that
explain the physiological aspect of emotions in the context of the evolution of species
[22]. Affects are distinguished from cognitive processes that are not centered on the
interiority of people, those that contribute to processing information from the outside
world. Affects correspond to moods, emotions, and feelings. Moods and emotions are
conceived as reflexive relations: their point of departure and their point of arrival are the
same. Feelings are conceived as oriented binary relationships: their starting point is a
human being, and their point of arrival is another human being. The linguistic forms of
these affects are predicates. For example, the French adjectives morne (in English bleak),
morose (in English gloomy), sombre (in English depressed) and the French phraseolo-
gism avoir le moral dans les chaussettes (literally ‘to have moral in the socks’) are
mood predicates of the class MOROSITY, the French nouns frayeur (in English fright),
frousse (in English jitters), peur (in English fear) and the French phraseologism les
avoir à zero (literally ‘to have them to zero’) are emotion predicates of the class FEAR
and the French verbs détester (in English to dislike), exécrer (in English to execrate) and
haïr (in English to hate) and the French phraseologism ne pas pouvoir le voir en peinture
(literally 'not being able to see it in painting') are feeling predicates of the class HATE.
The French lexicon of affect predicates contains at least 5000 monolexical or polylexical
units.
Affect predicates are adjectives, adverbs, nouns, and verbs corresponding to simple
words or complex words. Their identification and categorization in terms of emotion,
mood, and feeling result from their ontological properties specified above. The subdi-
vision of the three main categories into sub-categories, called classes, results from a
thorough linguistic analysis of their lexical items. This is based on both their semantic
properties and their distributional properties. Items of the same class are quasi-synonyms
and share the same contexts. Now we are discussing only emotions. For these, 11 classes
are identified. They are distributed on a first axis according to their tonality (negative
tonality versus positive tonality). From this point of view, the class SURPRISE is neu-
tral while the class ANGER is the most negative and the class JOY is the most positive.
Between these poles, the other classes are distributed as follows (from the negative tone to
the positive tone): FEAR, DISGUST, DISSATISFACTION, BURDEN, SADNESS, and
CONFUSION (classes of negative emotion); CONTENTMENT and APPRECIATION
(classes of positive emotion).
Each class of emotion is subdivided into subclasses according to their intensity
(low intensity versus high intensity). For each class, the subclasses are listed from low
intensity to high intensity. The class ANGER subsumes the subclasses IRRITATION
(low intensity) and RESENTMENT, VARIOUS ANGER (high intensity). The class
FEAR subsumes the subclasses WORRY and APPREHENSION (low intensity) and
VARIOUS FEAR and TERROR (high intensity). The class DISGUST subsumes the sub-
classes DISPLEASURE (low intensity) and VARIOUS DISGUST (high intensity). The
class DISSATISFACTION subsumes the subclasses FRUSTRATION, CONTRARIETY
(low intensity), and DISAPPOINTMENT, VARIOUS DISSATISFACTION, MORTI-
FICATION, and INDIGNATION (high intensity). The class BURDEN subsumes the
subclasses DISENCHANTMENT and WEARINESS (low intensity) and VARIOUS
BURDEN, DISCOURAGEMENT, DESPAIR (high intensity). The class SADNESS
subsumes the subclasses VARIOUS SADNESS and UNHAPPINESS (high inten-
sity). The class CONFUSION subsumes the subclasses DISCOMFORT (low inten-
sity) and VARIOUS CONFUSION, DISORIENTATION, TROUBLE (high intensity).
The class SURPRISE subsumes the subclasses ASTONISHMENT (low intensity) and
VARIOUS SURPRISE, STUPEFACTION (high intensity). The class CONTENTMENT
subsumes the subclasses VARIOUS CONTENTMENT (low intensity) and ENTHUSI-
ASM and SATISFACTION (high intensity). The class APPRECIATIONsubsumes the
subclasses VARIOUS APPRECIATION, PLEASURE, ADMIRATION, and FASCI-
NATION (strong intensity). The class JOY subsumes the subclasses VARIOUS JOY,
HAPPINESS, and EXALTATION (strong intensity). An excerpt from the emotions list
is provided in Table 1.
The projection of emotion subclasses on the tone axis (x-axis) and the intensity
axis (y-axis) provides a categorical and dimensional representation of these affects.
Therefore, a subclass corresponds to a point in this two-dimensional space. It is defined by
coordinates related to its tone or intensity. Table 1 shows this representation of emotions.
The third challenge is to exploit this linguistic knowledge about emotions to automat-
ically obtain new knowledge about the physiological aspect of emotions (facial expres-
sions and voice). The goal is to integrate them into a device based on artificial intelli-
gence. From this point of view, the chosen approach is based on extended intelligence
(see above).
Our study starts from the observation of the retrieved video recordings. Emotions that
are physiologically signified appear much more frequently than emotions that are expressed
orally. Therefore, to detect emotions in a multimodal way, we made the following hypoth-
esis: it is possible to infer the emotions signified by facial expressions and the voice from
the emotions verbally expressed by the people who feel them. This hypothesis is associated with three
questions: 1) Which data can be used to automatically process emotions in a multimodal
setting?; 2) Which unit of analysis should be chosen for automatic processing?; 3) Is the
categorization of data text relevant for the categorization of image and sound data?
3 Experimentation
The first step is the constitution of the corpus. It is a set of videos from the web. It
is subdivided into three subcorpora. The first sub-corpus is a test corpus. Its role is to
use data to develop analytical tools for experimentation. The second sub-corpus is a
validation corpus. Its role is to use data to verify the quality of the analysis tools we have
developed for the experiment. The third sub-corpus is an evaluation corpus. Its role is
to use data to verify the quality of the results of the analyses.
The sound quality of the input file and the quality of its images are necessary condi-
tions for efficient information processing. The selection of the sources of information is
a preliminary work to the analysis of the contents. Three criteria are used for profiling
the corpus: 1) people are facing the camera and their face is uncovered; 2) the soundtrack
does not contain extraneous noise; 3) the words recorded are those of the people filmed.
The first phase of the experiment aims to segment the raw corpus according to each of
the comments made by the people filmed. We chose VOSK [23] as our speech-to-text
tool. This is the most efficient for representing a monologue as a series of utterances.
Figure 1 illustrates the processing of VOSK. The input represents the continuous sound
stream (its language content is indicated in italics). The output is the representation of
this stream in a textual form. It is the segmentation of the initial stream into utterances.
Sound streams (WAV format) are extracted from the various videos in the corpus.
They are obtained by using the software FFmpeg [24]. This software was chosen because
it provides a sound file compatible with the processing performed by VOSK. The results
of the speech-to-text tool are then used to split video files, using the Python language’s
moviepy module, and sound files, using the Python language’s pydub module. The size
of the split files is not identical. It is delimited by the utterance of the text file. Each
utterance determines the segmentation of the text file and the segmentation of the sound
and image files from which it comes. In this way, it is possible to align the image, sound,
and text streams extracted from the original video file. The average length of split files
is 5 s.
INPUT (continuous sound stream; the language content is indicated in italics):
c'était le premier livre d'enquête bon voilà donc on peut être européen et défendre ses intérêts nationaux c'est pas incompatible et voilà ce n'est pas se poser cette question là ce n'est pas à un problème justement le souci en france c'est qu'on a tendance à faire des beaux débats sur des concepts donc on se réfugie derrière il y a la casaque bleue rouge
OUTPUT (representation of the stream as a series of utterances):
{ "text" : "c'était le premier livre d'enquête" }
{ "text" : "bon voilà donc on peut être européen et défendre ses intérêts nationaux c'est pas incompatible" }
{ "text" : "et voilà ce n'est pas se poser cette question là ce n'est pas à un problème justement" }
{ "text" : "le souci en france c'est qu'on a tendance à faire des beaux débats sur des concepts" }
{ "text" : "donc on se réfugie derrière il y a la casaque bleue rouge" }
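For illustration, the segmentation pipeline described above can be sketched in Python as follows. The sketch assumes FFmpeg on the command line and the vosk, moviepy (1.x) and pydub packages; the file and model names are placeholders, not those of the actual system.

import json
import subprocess
import wave
from vosk import Model, KaldiRecognizer        # speech-to-text
from moviepy.editor import VideoFileClip       # video splitting
from pydub import AudioSegment                 # sound splitting

VIDEO, WAV = "interview.mp4", "interview.wav"  # placeholder file names

# 1) Extract a mono 16 kHz WAV stream with FFmpeg, as expected by VOSK.
subprocess.run(["ffmpeg", "-y", "-i", VIDEO, "-vn", "-ac", "1",
                "-ar", "16000", "-acodec", "pcm_s16le", WAV], check=True)

# 2) Transcribe with VOSK, keeping word timestamps to delimit each utterance.
rec = KaldiRecognizer(Model("vosk-model-small-fr-0.22"), 16000)
rec.SetWords(True)
utterances = []
with wave.open(WAV, "rb") as wf:
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        if rec.AcceptWaveform(data):           # end of an utterance reached
            res = json.loads(rec.Result())
            if res.get("result"):
                utterances.append((res["result"][0]["start"],
                                   res["result"][-1]["end"], res["text"]))
final = json.loads(rec.FinalResult())
if final.get("result"):
    utterances.append((final["result"][0]["start"],
                       final["result"][-1]["end"], final["text"]))

# 3) Split the video and sound files on the utterance boundaries so that the
#    image, sound and text streams of each segment stay aligned.
clip, audio = VideoFileClip(VIDEO), AudioSegment.from_wav(WAV)
for i, (start, end, text) in enumerate(utterances):
    clip.subclip(start, end).write_videofile("segment_%d.mp4" % i, logger=None)
    audio[int(start * 1000):int(end * 1000)].export("segment_%d.wav" % i,
                                                    format="wav")
    with open("segment_%d.txt" % i, "w", encoding="utf-8") as f:
        f.write(text)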
Then, the data from the image and sound streams are pre-processed to
make them more manipulable. The image stream data are filtered using the
shape_predictor_68_face_landmarks shape predictor. It is pre-trained on the ibug 300-
W dataset. The tool only analyses facial features: eyes, eyebrows, mouth, and nose.
These strokes are associated with geometric shapes and represented in a digital format.
They are then automatically processed to segment facial expressions. Table 2 shows this
representation of facial features.
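A minimal sketch of how such geometric descriptors can be derived from the 68 landmarks returned by dlib's shape predictor follows; the specific ratios computed here are illustrative rather than the authors' exact feature set.

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def polygon_area(points):
    # Shoelace formula for the area enclosed by a group of landmarks.
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def facial_features(gray_frame):
    """Return simple geometric ratios for the first detected face, or None."""
    faces = detector(gray_frame, 1)
    if not faces:
        return None
    face = faces[0]
    shape = predictor(gray_frame, face)
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])
    face_area = float(face.width() * face.height())
    return {
        "left_eye/face": polygon_area(pts[36:42]) / face_area,
        "right_eye/face": polygon_area(pts[42:48]) / face_area,
        "mouth/face": polygon_area(pts[48:60]) / face_area,
    }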
The sound stream is filtered to extract prosody markers from the voice recording
(for example, voice intensity; voice pitch, or speech rhythm). The raw data are filtered
using the openSMILE tool. It identifies all kinds of voice characteristics. Only the char-
acteristics of a prosodic nature are exploited. They reveal emotions [25]. The prosodic
features identified by the tool are represented in a digital format. Then, they are pro-
cessed automatically to segment the prosodic markers. Table 3 shows this representation
of prosody.
The prosodic variations are shown in the three right columns of Table 3. The left
column specifies the time course.
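For illustration, prosodic low-level descriptors can be extracted with the opensmile Python package as sketched below; the feature set, the filtering of prosody-related columns and the file name are assumptions.

import opensmile

# eGeMAPS low-level descriptors include prosodic features such as pitch (F0)
# and loudness, extracted frame by frame over the recording.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)
features = smile.process_file("segment_0.wav")      # placeholder split file

# Keep only prosody-related columns; names follow the eGeMAPS feature set.
prosody_cols = [c for c in features.columns
                if "F0" in c or "loudness" in c.lower()]
print(features[prosody_cols].head())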
Text stream preprocessing consists of preparing training data for a supervised learn-
ing method. The data are obtained by tagging a portion of the text stream with the
semantic analysis engine of the UKKO system. The engine is configured to qualify and
detect verbally expressed emotions. The analysis integrates the centering of the emo-
tions on the person who speaks and not on another person. For example, the sequence
j’ai de la peine(in English I feel sorry) is labeled SADNESS while the sequence il a de
la peine (in English he feels sorry) is not. Therefore, it is not only keywords that are
identified but also their context. A posteriori human verification validates the quality of
the semantic labeling carried out. This allows any faulty labels to be corrected. Table 4
presents emotion labels inserted into the text to provide training data.
Content / Emotion
"on passe d'ailleurs plus il ne pèse plus grange et qui ne pèse plus grand-chose" neutral
"mais en fait comme ils sont toujours en contentieux" neutral
"je vais être vulgaire je suis désolé on est en train de se faire botter le cul" contrariety
"oui ils sont inquiets par la présence de la chine en afrique" neutral
"d'ailleurs le fils débit et plus c'est compliqué l'histoire mais enfin en tout cas le lien de la famille plus que des institutions tchadiennes" confusion
"oui mais déjà c'est voilà ils sont pas très formés pas très compétents généralement lors des fouilles et des surprises d'ailleurs" neutral
"comme c'est des sujets complexes plupart du temps" confusion
"et c'est parfois un reproche qu'on me fait sur mes enquêtes c'est trop complexe" confusion
"c'est trop compliqué vous a trouvez ça très simple" confusion
"il est neuf cent trente mille abonnés" neutral
"de mon point de vue ce qui m'inquiète c'est que" worry
"ils sont abonnés à des lettres confidentielles" neutral
"c'est pas très sexy c'est compliqué" confusion
"elles sont même quand elles sont posées les enjeux sont très mal décrypté et c'est une responsabilité collective" neutral
"moi ce qui m'a frappé après mon passage mon premier passage chez vous c'est que" surprise
"mais on voit bien que ça ne fonctionne plus et ça fonctionne plus puisque la cinquième république totalement verticale dans un monde qui est aussi" disenchantment
The second step is the processing of the image, sound, and text streams. The streams
are analyzed separately. The goal is to obtain classes of facial expressions and classes of
prosodic markers with the image and sound streams. The classes are labeled during the
third step. The flows are segmented with the iterative K-means algorithm. The elbow
method for optimal K heuristically determined the number of segments: 11 for the
processing of the image stream and 5 for the processing of the sound stream. This
number is insufficient for the detailed analysis of emotions. Therefore, the number of segments chosen is 12 for the
processing of the two streams. This number corresponds to the 11 categories of emotions
presented above and to the neutral category, i.e. the absence of emotion. This number
will then be increased empirically to integrate the sub-categories of emotions. Table 5
shows an extract of the segmentation performed on the image stream.
Table 5 (excerpt) columns: left_eye/face, right_eye/face, left_eye, right_eye, all_mouth, open_mouth, nose/mouth, left_eyebrow, right_eyebrow, clusters.
Text Emotion
1 tout le monde est dans l’invective et tout le monde dans la morale cela ANGER
me met en rage de constater cela
2 C’était gênant de savoir qu’elle en souffrait CONFUSION
3 savoir qu’elle est heureuse de vivre me comble de joie JOY
4 quand elle m’a dit oui j’étais content CONTENTMENT
5 sa réaction m’a étonné SURPRISE
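A minimal sketch of the K-means segmentation with the elbow heuristic described above, assuming scikit-learn and a pre-computed matrix of facial (or prosodic) descriptors; file names are placeholders.

import numpy as np
from sklearn.cluster import KMeans

# One row per split file, one column per facial (or prosodic) descriptor.
X = np.load("image_stream_features.npy")            # placeholder file

# Elbow heuristic: inspect how the inertia decreases as K grows.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(2, 20)}

# Number of segments finally retained: 12 (11 emotion classes plus neutral).
kmeans = KMeans(n_clusters=12, n_init=10, random_state=0).fit(X)
clusters = kmeans.labels_                            # one cluster id per split file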
The third step consists in merging the new data obtained during the previous step
and detecting all the emotions expressed by a facial expression or a voice. We specify
the key principles of the processing of this data. The learning method is semi-supervised
[26]. First, the data are merged. For each of the split file triplets during the first step,
the emotion detected in the text file is assigned to the facial expression class and the
prosodic marker class of the corresponding video and sound files. Secondly, the result
of the fusion is exploited with a deep semi-supervised learning algorithm to predict
the emotions expressed in video files and sound files that have not been labeled from
a text file. The neural network used is based on the SGAN model (SGAN stands for
Semi-supervised Generative Adversarial Network). It predicts non-verbal emotions.
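For illustration, the fusion that transfers text-derived emotion labels onto the corresponding facial-expression and prosodic-marker classes could be sketched as follows; all identifiers and values are illustrative, and the subsequent SGAN step is not shown.

from collections import Counter, defaultdict

# One entry per split-file triplet from step 1, with the cluster ids of step 2
# and the emotion label detected in the text (None when nothing is verbalized).
triplets = [
    {"face_cluster": 3, "voice_cluster": 1, "text_emotion": "worry"},
    {"face_cluster": 3, "voice_cluster": 4, "text_emotion": None},
    {"face_cluster": 7, "voice_cluster": 1, "text_emotion": "confusion"},
]

face_votes, voice_votes = defaultdict(Counter), defaultdict(Counter)
for t in triplets:
    if t["text_emotion"]:                 # only verbally expressed emotions vote
        face_votes[t["face_cluster"]][t["text_emotion"]] += 1
        voice_votes[t["voice_cluster"]][t["text_emotion"]] += 1

# Majority label per cluster; clusters left unlabeled are handled by the SGAN.
face_labels = {c: v.most_common(1)[0][0] for c, v in face_votes.items()}
voice_labels = {c: v.most_common(1)[0][0] for c, v in voice_votes.items()}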
4 Interpretation
Interpreting the results of the experiment consists in answering the questions associated
with the initial hypothesis. The answers obtained confirm, or invalidate, the hypothesis.
The interpretation depends on the evaluation of the analyses presented in the previous
section.
The evaluation relies on its own dedicated corpus. To avoid hypothesis con-
firmation bias (processing previously validated information), it is necessarily different
from the test and validation corpora. Corpus profiling is essential for evaluation. The
videos retrieved must be sufficiently representative of the phenomena studied. The goal
of the evaluation is to test the relevance of the semantic categorization of image, sound,
and text flows in terms of emotion.
The evaluation of the model used for voice and face analysis aims to measure the
quality of segmentation using the Silhouette index:
s = (b − a) / max(a, b)
The Silhouette index is defined for each sample. It is composed of two scores a,
average distance between a sample and all the other points of the same cluster, and b,
average distance between a sample and all the other points of the closest cluster.
After segmentation, fusion is performed to label the data. A part of labeled textual
data is used to calculate the similarity of the results obtained using the Jaccard index:
J(A, B) = |A ∩ B| / |A ∪ B|
The Jaccard index is used to measure the similarity between sets of finite samples.
It is the size of the intersection divided by the size of the union of the sample sets.
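A minimal sketch of both measures, using scikit-learn for the mean Silhouette index; the feature and label files are placeholders and the sets in the Jaccard example are illustrative.

import numpy as np
from sklearn.metrics import silhouette_score

X = np.load("image_stream_features.npy")     # placeholder feature matrix
labels = np.load("cluster_labels.npy")       # placeholder cluster assignments

# Mean Silhouette index over all samples, i.e. the average of (b - a) / max(a, b).
print("silhouette:", silhouette_score(X, labels))

def jaccard(a, b):
    """Jaccard index between two finite sample sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# e.g. segments labeled WORRY by the text analysis versus by the fused model.
print("jaccard:", jaccard({"seg_1", "seg_4", "seg_9"}, {"seg_1", "seg_9"}))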
Table 9 shows that the results obtained validate our hypothesis: it is possible to
automatically infer the emotions signified by facial expressions and the voice from the
emotions that are expressed verbally by those who feel them.
From the results, the model is deployed in an HDF file (HDF stands for Hierarchical
Data Format), type HDF5. The file includes the configuration of the model and the
weights used for labeling.
5 Perspectives
The research presented here shows that automatic, multimodal, and detailed analysis
is possible. It is fundamental to apprehend the variety of emotions rather than only the five
fundamental emotions. An empathetic response is different when someone shows fear
and when someone is terrified.
Incorporating a system that identifies the emotions into a social robot requires the
machine to respond to those signals appropriately. The empathic words formulated by
the robot must correspond as closely as possible to the emotions felt by its users. Any
discrepancy between an emotional manifestation of a human being and an empathetic
reaction of a robot tends to discredit its use. This is why it is necessary to broaden the
spectrum of automatically detected emotions.
Once the research results are stabilized, we will develop an emotion detector that
analyzes multimodality (image, voice, and text) and emotional variety. The signals from
the detector will be integrated into the knowledge base of the intelligent dialogue system
called UKKO. The role of this knowledge base is to contextualize the verbal interactions
between the machine and its users. It integrates extralinguistic information in the analysis
of incoming messages and the synthesis of outgoing messages. Informing the system
about the emotions of users will ultimately make the robot empathetic and, consequently,
increase its sociability.
References
1. Lacheret, A.: Le corps en voix ou l’expression prosodique des émotions. Evolutions
Psychomotrices, Fédération Européenne des Psychomotriciens 23(90), 25–37 (2011)
2. Bassil, J.: Facial motion in the perception of faces and of emotional expression. J. Experimental
Psychology 4(3), 373–379 (1978)
3. Buvet, P.-A., Fache, B., Rouam, A.: How does a robot speak? about the man-machine ver-
bal interaction. In: Proceedings of The 3rd International Workshop on the Applications of
Knowledge Representation and Semantic Technologies in Robotics (AnSWeR19), CEUR.
http://ceur-ws.org/Vol-2487/ (2019)
4. Buvet, P.-A., Fache, B., Rouam, A.: Interview with a robot: How to Equip the Elderly Compan-
ion Robots with Speech? In: Proceedings of the Future Technologies Conference (FTC 2020),
vol. 2, pp. 310–326. Springer, Heidelberg (2020). https://doi.org/10.1007/978-3-030-63089-8_20
5. Buvet, P.-A., Fache, B., Rouam, A.: Which intelligence for human-machine dialogue sys-
tems? In: Proceedings of the Future Technologies Conference (FTC 2021), vol. 1, pp. 121–133,
Springer, Heidelberg (2021). https://doi.org/10.1007/978-3-030-89906-6_10
6. http://www.bluefrogrobotics.com/robot/
7. Buvet, P.-A.: Linguistique et intelligence, Etudes de linguistique appliquée, Klincksieck (in
press)
8. Buvet, P.-A.: Prédication et relation: l’exemple de la détermination, in Le prédicat en
questions, Champion, Paris (in press)
9. Tisseron, S. : Le jour où mon robot m’aimera, Albin Michel, Paris (2015)
10. Devillers, L.: Les Robots émotionnels. Santé, surveillance, sexualité… : et l’éthique dans tout
ça ? Editions de l’Observatoire, Paris
11. Bensoussan, J., Bensoussan A.: IA, robots et droit, Liège (2019)
12. Pelachaud, C. (ed.): Systèmes d’interaction Émotionnelle. Lavoisier, Paris (2010)
Abstract. In the past few years, autonomous vehicles have been the subject of
innovation and technology amidst the challenges and difficult situations surrounding
urban driving and infrastructure. The recent developments in sensors and embedded
systems have made way for a deeper analysis of the cost and needs of AVs. This study
aims to make use of these new technologies and sensors to build a better recognition
model of the relevant road actors surrounding autonomous vehicles. The results of this
study made it possible to use depth and RGB data to recognize obstacles in an urban
road driving scenario. The mean average precision of the model across all labels shows
acceptable results running at ten frames per second. The model was deployed in an
autonomous golf picker buggy using the Robot Operating System (ROS).
1 Introduction
In the past few years, researchers found interest in the evolving field of Autonomous
Vehicles due to the enormous capabilities of available sensors nowadays [1]. These
self-driving vehicles heavily rely on their capabilities in motion planning, path planning,
perception, localization and controls. Automation in the transport industry offers a wide
range of advantages. To achieve their full potential in the market, autonomous vehicles
deal with the safety and comfort of their users. Self-driving vehicles have been the talk
of many studies exploring the potential use cases of future mobility solutions [1–3]. Over the
last five decades, several companies have been competing to achieve Level 5 Full Automation.
This requires the vehicle to sense its local environment and classify important objects
on the road in real time, both during the day and at night and even under rain conditions,
which also requires very large amounts of data. A part of Level 5 Full Automation is the
application of deep neural networks in classifying road actors.
One of the common techniques in this task of object detection and recognition is
Deep Neural Network (DNN). It can be utilized in two famous techniques, namely,
one-stage detection and two-stage detection. One-stage detection includes You Only
Look Once (YOLO) [5] and Single Shot Detection (SSD) [4]. On the other hand, two-
stage detection includes Region Convolutional Neural Network (R-CNN) [7] and Spatial
Pyramid Pooling (SPP) [6]. The advantage of one-stage detection is that it detects at
a higher speed and on a real-time basis, but at a slight disadvantage in recognition
precision. 2D and/or 3D images captured by high-powered cameras and
highly programmed lidar sensors could be used for this task.
This study aims to utilize a convolutional neural network done using YOLOv4 to
recognize obstacles using an RGB and depth image data. Both the RGB and depth image
data will be captured by a stereo camera placed in a golf picker buggy.
2 Methods
2.1 Preparation
Although there are many datasets that could be used for obstacle recognition, the study
developed its own dataset. The images in the dataset were captured using two cameras –
FLIR and Mynteye. These cameras were installed in an autonomous vehicle that runs
for more than 200 h on the road, both during daytime and night time. Both cameras run under
Linux and ROS and provide a rosbag after the recording process. As seen in Fig. 1, the
cameras were attached to the autonomous vehicle presented below.
The camera is positioned on the front side of the vehicle, approximately one
meter above the ground. The setup was intended to take images of obstacles such as
pedestrian, car and movables.
For the data gathering, the team was able to develop a comprehensive, high-
density and heavily occluded image dataset which comprises eight labels. A total of
700,000 annotations were completed across all labels. The annotation was done using
labelImg, an opensource tool for different annotation formats such as YOLO, JSON and
others.
2.4 Pre-requisites
The study used YOLO (You Only Look Once) and LabelImg for annotation. The YOLO
format includes center of the bounding box, width and height of the bounding box and
the class or label of the object. Annotation of at least 200,000 images was done manually
using LabelImg. A total of 700,000 annotations were gathered. This sums up as the initial
dataset for this study.
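For illustration, a YOLO-format label file contains one line per annotated object, with coordinates normalized to the image size; the class ids and values below are purely illustrative.

<class_id> <x_center> <y_center> <width> <height>
0 0.512 0.430 0.210 0.365
1 0.845 0.610 0.140 0.120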
The software implementation was developed in the Robot Operating System (ROS) in the
C++ language, using the pre-trained YOLOv3 architecture with the Darknet and PyTorch
frameworks and libraries. In addition to this, the mynteye SDK and modules were used. This
provides a more convenient and efficient way of working with the mynteye camera and OpenCV
libraries. Initially, we applied YOLO, which works by splitting an image into
n grid cells (usually 19x19). For each cell that represents a certain part of an object,
there will be predicted bounding boxes, confidence scores, and class probabilities. The
confidence is calculated using an IOU (intersection over union) metric that measures
how much a detected object overlaps with the ground truth as a fraction of the total area
spanned by the two together (the union).
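For illustration, a minimal Python implementation of the IoU measure; the box coordinates in the example are illustrative.

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection overlapping a ground-truth box by roughly 22% of their union.
print(iou((10, 10, 60, 60), (30, 30, 80, 80)))   # ~0.22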
YOLO aims to predict a class of an object in an image by the use of a bounding box
(see Fig. 3 below). Each bounding box has four descriptors namely:
Center of the bounding box (bxby) or (x, y)
Width (bw)
Height (bh)
C = class or label
YOLO predicts that there is an object in the image instead of searching for
regions of interest. It splits an image into S x S grid cells. Each cell is responsible for
predicting n bounding boxes. Each grid cell predicts a bounding box alongside
a confidence value. If a grid cell does not contain a bounding box, its confidence value
must be zero. Most of these cells do not contain the object; therefore YOLO removes
boxes with a low object probability and keeps the bounding boxes with the highest
confidence. This process is called non-max suppression. If the center of an object falls into
a grid cell, that cell is mainly responsible for the detection.
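A minimal sketch of non-max suppression; the IoU helper from the previous sketch is repeated compactly so the snippet runs on its own, and the detections are illustrative.

def iou(a, b):
    # Same intersection-over-union helper as in the previous sketch.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    return inter / ((a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter)

def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-confidence box per object; drop overlapping lower-scored ones.

    detections: list of (box, confidence, class_id), with box = (x1, y1, x2, y2).
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept

dets = [((10, 10, 60, 60), 0.9, 0), ((12, 12, 58, 62), 0.6, 0),
        ((100, 40, 160, 90), 0.8, 1)]
print(non_max_suppression(dets))     # the overlapping 0.6 box is suppressed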
3 Results
The results of the training are shown in Table 1 below. Given the generous number
of samples in the dataset, the model was able to achieve an accuracy of 92.48% for
pedestrian, 82.22% for car and 49.96% for movables which includes traffic cones. The
true positive and false positive metrics are also shown below.
Once the model training is done and an acceptable and desirable model is reached,
the mynteye camera is connected to a Linux-based industrial PC with ROS Kinetic
installed on Ubuntu 16.04. The mynteye camera has a downloadable SDK; for this
version of mynteye, SDK D is needed. Once the mynteye camera is up and running,
it shows multiple topics being published. Some of the topics needed for this object
recognition task are:
/camera/image_raw (RGB data)
/camera/dmap (depth data)
Fig. 4. Depth data and the Topics Associated with the Synchronization
As seen in Fig. 4 above, the depth image data from the RGB-D camera is shown with the
associated topics presented in rqt_graph. Among the topics listed, two topics are captured
from the RGB-D camera, image_raw_color and depth_registered. These two
topics are the inputs for the darknet_ros node. These two topics were synchronized using
an image_sync function. This module is enabled for each grab call and these images will
be fed into the AI module (darknet_ros node) that will output the detected objects for
each frame.
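For illustration, the synchronization of the two camera topics can be sketched with ROS message_filters as below; the topic names follow those listed above, while the node name, queue size and slop value are assumptions.

#!/usr/bin/env python
import rospy
import message_filters
from sensor_msgs.msg import Image

def synced_callback(rgb_msg, depth_msg):
    # Both images carry (approximately) the same timestamp and can be passed
    # together to the detection node (darknet_ros) for that frame.
    rospy.loginfo("synchronized pair at t=%.3f", rgb_msg.header.stamp.to_sec())

rospy.init_node("rgbd_sync")
rgb_sub = message_filters.Subscriber("/camera/image_raw", Image)
depth_sub = message_filters.Subscriber("/camera/dmap", Image)

# Approximate-time policy pairs the streams even if their stamps differ slightly.
sync = message_filters.ApproximateTimeSynchronizer([rgb_sub, depth_sub],
                                                   queue_size=10, slop=0.05)
sync.registerCallback(synced_callback)
rospy.spin()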
4 Discussion
The next step is to publish the detection results as a topic with the respective and relevant
information. This information is transferred as a message to the AV. The target
contents of the message are the following (see Table 2 below):
5 Conclusion
The study was able to develop a system prototype for the object recognition task using 2D
and depth information with distance estimation. The prototype was implemented in an
AV. This study recommends the use of lidar and camera together for better sensing capabilities
of the vehicle, since lidar is also a powerful tool for object recognition tasks.
References
1. Rohr, C., Ecola, L., Zmud, J., Dunkerley, F., Black, J., Baker, E.: Travel in Britain in 2035:
Future scenarios and their implications for technology innovation. Innovate UK (2016).
https://www.rand.org/pubs/research_reports/RR1377.html
2. Trommer, S., Kolarova, V., Frädrich, E., et al.: Autonomous driving: the impact of vehicle
automation on mobility behavior (2016). https://www.ifmo.de/publications.html?t=45
3. Urry, J.: What is the Future. Polity Press, Cambridge (2016)
4. Liu, W., Anguelov, D., et al.: SSD: single shot multi-box detector. In: ECCV (2016). https://
doi.org/10.1007/978-3-319-46448-0_2
5. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv:1804.02767 (2018)
6. He, K., Zhang, X., et al.: Spatial pyramid pooling in deep convolutional networks for visual
recognition. In CVPR (2014)
7. Girshick, R.: Fast r-cnn. In: ICCV (2015)
Humanoids Improving the Quality of Life
of Older People: The Case of Poland
Katarzyna Halicka(B)
1 Purpose
The scientific objective of the study is to analyse and evaluate the humanoid robotic
technology Rudy Robot for its capacity to improve the quality of life of older people. The
technology was evaluated against different groups of criteria. The examined demographic
characteristics included age, gender, place of residence and education. The main research
problem was formulated in the form of the following questions:
1. What are the most important criteria for evaluating the Rudy Robot?
2. How was the humanoid assessed against different criteria?
3. Does age influence the assessment of humanoid robotic technologies used in the care
of older adults?
4. Does gender influence the assessment of humanoid robotic technologies used in the
care of older adults?
5. Does education affect the assessment of humanoid robotic technologies?
6. Does the place of residence affect the assessment of the Rudy Robot?
The article consists of five sections: literature review, research method, results, dis-
cussion, and conclusions. First, based on the literature review, examples of humanoid
robot technologies are discussed and their main features indicated. In the next section,
the research process is presented. Four main research tasks are described and methods
used in the whole process are characterised. Further, the result of the research conducted
within the article is described. Then, the results are discussed and the article ends with
a short summary. Limitations of the performed research and plans for further research
are also presented.
2 Literature Review
The development of civilisation would have been impossible without inventions. For
more than two decades, the proportion of working-age people in the EU-28 has been
steadily declining, while the relative number of people in retirement has increased.
In 2021, compared to 2020, the proportion of people aged 65 and over increased in
all Member States except Lithuania, where it remained unchanged. The proportion of
the population aged 65 and over is increasing in every EU Member State, EFTA and
candidate countries. In the last decade, the increase ranged from 5.2 p.p. in Finland, 5.1
p.p. in Poland, 4.7 p.p. in Liechtenstein and 4.6 p.p. in the Czech Republic, to 1.3 p.p.
in Germany and 0.7 p.p. in Luxembourg. Over the last decade (2011–2021), there has
been an increase of 3 p.p. for the EU as a whole [1].
The proportion of the working-age population is estimated to continue decreasing
between 2021 and 2100, while older adults are likely to represent an increasing pro-
portion of the total population: people aged 65 or over will represent 31.3% of the EU
population by 2100, compared to 20.8% in 2021. The median age is projected to increase
by 4.9 years, rising from 44.1 in 2021 to 48.8 in 2100.
The increase in the number of older people is also associated with the need to provide
institutional support in the form of care, particularly in the case of a low level of inde-
pendence. The need exists for pro-health and digital education, the development of care
services, the creation of safe and functional housing and access to public transport. Tech-
nical solutions must be developed to support the functioning of older people. However,
such solutions require recognising knowledge from such fields as anthropotechnics (in
the scope of human-computer relations), cognitive psychology, neurobiology, artificial
intelligence, and IT, electrical and communication engineering. Robots can be one of the
major ways of helping older people. The purpose of robots is to help older people to live
and function as independently as possible [2]. Older people can use robots to lift, grip,
carry, remind them to take medication, recognise and assess health conditions, monitor
gait, motivate walking, and meet social needs through interaction. Pearl is an example of
a humanoid robot developed by Carnegie Mellon University. It is a mobile robot that can
help older people navigate a care facility. Pearl can follow patients, communicate via a
graphical touchscreen and serve as a telehealth device. It is equipped with two comput-
ers, sonar, stereo camera systems and wireless Ethernet. It can recognise and synthesise
speech using microphones and speakers [3]. Pearl reminds older people of their daily
activities, such as eating, drinking, taking medicine or using the bathroom, and helps to
navigate their environments. Robovie is a humanoid robot designed to communicate with
humans and weighs approximately 40 kg. It is equipped with various sensors, including
skin and touch, microphones, vision sensors and ultrasonic obstacle sensors installed on
the mobile platform. The combination of sensors and various actuators for moving eyes,
head and arms facilitate meaningful behaviour [4].
Another example is the Olivia Robot, acting as a personal assistant and companion for
older people. The robot was developed at the A*STAR social robotics lab in Singapore.
Olivia version 2.1 is equipped with a pair of cameras for the eyes, and a third camera on the
forehead, targeting the face of the interlocutor and determining from the movement of the
mouth whether the person is speaking to it. Then, using the built-in eight microphones,
it starts listening to the caller while pointing its mechanised face towards the person’s
face. The stationary version of Olivia stands 1.6 m tall while weighing 152 kg, and the
researchers plan to mount it on a moving platform and equip it with gripping three-finger
manipulators.
Twendy-One is an extremely agile and intelligent humanoid robot designed to help
older and disabled people. It was designed and built by Japanese students and researchers
at Waseda University in Tokyo. Twendy-One can hold limited conversations. It uses a
built-in camera to locate designated objects. It can greet you, bring you breakfast on a
tray, wish you a tasty meal, then help you get out of bed and give you clothes or a walking
stick. The Twendy-One robot, equipped with soft hands that respond to human touch,
could be a recipe for overcrowded retirement homes. It can sensitively and effectively
help people get out of a chair or bed, and interact with its owner, responding to touch
and pressure accordingly, thanks to its sensors. The Twendy-One motor system has 47
degrees of freedom. It stands 147 cm tall and weighs 111 kg.
Another example of a humanoid robot is the RUDY robot. The RUDY robot can
remind a person to take medication and can also dispense it. It can call for help in an
emergency, be a good companion, reducing the feeling of loneliness.
Some characteristic features of humanoid robots and their functional capabilities have
been presented in the literature. On the basis of the literature review, it should be stated
that, so far, there have been no studies in which humanoid robot technology and its
usefulness have been assessed. Therefore, it seems important to investigate which features
of robots are most important according to current and future users. It is also important
to find out whether the age, gender, etc. of the respondent affects the assessment of this
technology.
3 Research Method
The entire research process consists of four main steps. The first research task is to iden-
tify, based on a literature review, criteria for evaluating the humanoid robotic technology
Rudy Robot.
Another research task is to evaluate the individual six groups of criteria by the
respondents, organise these groups and select the top two. In the study, the research
method was a diagnostic survey using the CAWI (Computer-Assisted Web Interview)
survey technique. Groups of criteria were evaluated by 1152 respondents from Poland,
aged over 40 years.
The third step of the research was the evaluation of the technology (Rudy Robot) in
the context of the two highest-rated criteria: demand and technical aspects.
The last step of the research was to investigate whether age, gender, education
and place of residence affected the technology assessment (Rudy Robot) in the context
of the two groups of criteria: demand and technical aspects. It was checked how the
humanoid robotic technology Rudy Robot would be assessed by older adults in the
context of the highest-rated groups of criteria. The study used the non-parametric Mann–
Whitney U test to determine the effect of gender on the technology assessment. The
Kruskal–Wallis ANOVA test was used to examine the influence of age, education and
place of residence on the evaluation of the Rudy Robot.
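The study itself used the Statistica 13 software (see Sect. 4); purely for illustration, the same two tests can be run with SciPy as sketched below, where the data file, column names and grouping variables are hypothetical.

import pandas as pd
from scipy.stats import mannwhitneyu, kruskal

# One row per respondent: demographics plus a 1-7 Likert rating, e.g. for D1.
df = pd.read_csv("survey_responses.csv")

# Mann-Whitney U test: effect of gender (two groups) on the rating of D1.
u_stat, p_gender = mannwhitneyu(df.loc[df.gender == "F", "D1"],
                                df.loc[df.gender == "M", "D1"])

# Kruskal-Wallis ANOVA by ranks: effect of age group (three or more groups).
groups = [g["D1"].values for _, g in df.groupby("age_group")]
h_stat, p_age = kruskal(*groups)

print("gender p=%.3f, age p=%.3f" % (p_gender, p_age))   # significant if p < 0.1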
The survey was conducted in Poland at the turn of 2020 and 2021 with people over
40 years of age. The representative research sample consisted of 1152 respondents (all
respondents were Polish citizens of all voivodships). In terms of age, the largest group
of respondents was aged over 60 (520 people, 45.1%). The second-largest group was
aged 50–59 and consisted of 329 people (28.6%), while the smallest group was aged
40–49 (303 people, 26.3%). Analysing the gender of the respondents, women accounted
for over half of all respondents (625 people, 54.3%), and men comprised 45.7% (527
people). Over 42% (493 people) of all respondents had primary education, 31.1% (358
people)—higher education, 22.3% (257 people)—vocational education and 3.8% (44
people)—basic education.
The respondents evaluated the analysed humanoid robotic technology using a
seven-point Likert scale, where one meant "I definitely do not agree with the given
statement" and seven meant "I definitely agree."
4 Results
First, based on the literature review, the following six criteria groups were identified:
competitiveness, demand criteria, technical criteria, social and ethical criteria, ecological
criteria, and ease of use [5–7]. The criteria were formulated in the form of statements.
37 criteria were identified: eight statements related to the demand for technologies
(D1–D8), eight to the ecological aspect (E1–E8), six to social and ethical aspects
(SEC1–SEC6), six to the competitiveness of technology (C1–C6), five to technical
criteria (T1–T5) and four to ease of use (EU1–EU4).
Next, the six groups of criteria were evaluated, where one meant “very unimportant”
and seven—“very important”. The highest-rated were demand and technical criteria. The
list of the highest-rated group of criteria used to assess the humanoid robotic technology
Rudy Robot is presented in Table 1.
Next, the influence of the respondents' age, gender, education and place of residence
on the assessment of the humanoid robotic technology Rudy Robot was checked. A
critical significance level was assumed at p = 0.1. The study used the non-parametric
Mann–Whitney U test [8, 9] to determine the effect of gender on the technology
assessment. The Kruskal–Wallis ANOVA test was used to examine the influence of
age, education and place of residence on the evaluation of the Rudy Robot. The research
used the Statistica 13 software.
The respondents assessed the demand for the humanoid robotic technology Rudy Robot.
The statistical values for the technology assessment in terms of demand are presented
in Table 2.
Table 2. Statistical values of the Rudy Robot’s technology assessment by older adults for demand
The respondents also assessed technical aspects of the humanoid robotic technology
Rudy Robot (Table 3).
Table 3. Statistical values of the Rudy Robot’s technology assessment by older adults for the
technical aspect
5 Discussion
Table 2 shows significant assessment differences (D1, D2, D3 and D6) between genders
in their rating of the demand level for the humanoid robotic technology Rudy Robot (p <
0.1). Also, Table 2 demonstrates statistically significant (p < 0.1) differences depending
on age (the acceptance of drivers D2, D3, D4 and D6), education (D6) and the place of
residence (D2 and D8).
Table 3 shows that statistically significant differences between genders in this
technology assessment for the technical aspect occur only for statement T3 (p < 0.1).
Such differences depending on education appear in the acceptance of statement T1 (p
< 0.1) and in terms of the place of residence—statements T1, T4 and T5 (p < 0.1).
No significant differences were observed depending on age in the assessment of these
statements.
A 90% probability exists that a respondent’s gender influences the assessment of
technology in terms of demand and technical criteria, i.e., statements D1 (“There is a
need for the Rudy Robot in institutions responsible for the care of older adults e.g.,
nursing homes”), D2 (“The Rudy Robot will bring users additional benefits unavailable
through other solutions”), D3 (“The popularisation of the Rudy Robot corresponds to
forecasts concerning technology development directions and the expectations of older
adults”), D6 (“Changes in the environment make the Rudy Robot more attractive for
older adults”) and T3 (“The widespread use of the Rudy Robot depends on the use
of hard-to-reach materials”). The research results also allow observing with the same
probability of 90% that the age of a respondent influences the technology assessment in
terms of demand and technical criteria, i.e., statements D2, D3, D4 (“The Rudy Robot
has higher ease of use and operation simplicity than the technologies used so far”) and
D6. The age of the respondent has no influence on the evaluation of the technical aspect
of humanoid robotic technologies.
Also, it was found that a respondent’s education influences the technology assessment
of demand and technical criteria, i.e., statements D6 and T1 (The Rudy Robot is imple-
mented and successfully used by older adults). In turn, place of residence influenced
the assessment of humanoid robotic technologies in terms of the criteria D2, D8, T1,
T4 (“The Rudy Robot can complement the solutions currently available on the market”)
and T5 (“There is a great potential for further improvement of the Rudy Robot”).
6 Conclusions
The article assesses humanoid robotic technologies (Rudy Robot) that improve the qual-
ity of life of older people. The research mainly aimed to find answers to the following
questions: (1) What are the most important criteria for evaluating the Rudy Robot? (2)
How would a humanoid be assessed against different criteria? (3) Does age influence
the assessment of humanoid robotic technologies used in the care of older adults? (4)
Does gender influence the assessment of humanoid robotic technologies used in the
care of older adults? (5) Does education affect the assessment of humanoid robotic
technologies? (6) Does the place of residence affect the assessment of the Rudy Robot?
The conducted research showed that the humanoids for older adults received the
highest rating in terms of demand (average of all scores on a scale from 1 to 7: 4.67).
This technology also received high ratings for the technical aspects (average of all
scores on a scale from 1 to 7: 4.36).
The research indicated that the age of the respondent does not affect the evaluation
of the Rudy Robot technology in terms of technical aspects, for each criterion of this
group, the critical significance level was greater than 0.1 (p > 0.1). The education
and gender of the respondent also mostly do not affect the evaluation of this technology in
terms of technical aspects.
However, age and gender have an impact on the evaluation of this technology (the
Rudy Robot) in terms of demand.
The study has the following limitations:
1. only respondents over 40 years of age made an assessment of the technology, younger
people were not considered;
2. the survey was conducted on the territory of only one country;
3. all criteria had the same weighting.
In future research, the author plans to extend the study to a larger sample and
other countries. She also intends to consider different weights of criteria and of
decision makers (e.g., taking into account the age of respondents for each assessment
criterion). Furthermore, in the author's opinion, other technology
assessment criteria should also be taken into account, such as technology readiness
levels or technology life cycle analysis.
Acknowledgments. This research was funded by the Ministry of Science and Higher Education,
grant number W/WIZ/1/2019. The publication of the article for 11th International Conference on
Engineering, Project, and Production Management - EPPM2021 was financed in the framework of
the contract no. DNK/SN/465770/2020 by the Ministry of Science and Higher Education within
the "Excellent Science" programme.
References
1. Eurostat. https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Population_stru
cture_and_ageing#Past_and_future_population_ageing_trends_in_the_EU Accessed 29 Mar
2022
2. Zhou, D., Barakova, E.I., An, P., Rauterberg, M.: Assistant robot enhances the perceived
communication quality of people with dementia: a proof of concept. IEEE Transactions on
Human-Machine Systems (2021). https://doi.org/10.1109/THMS.2021.3112957
3. Pollack, M.E., et al.: A mobile robotic assistant for the elderly. Paper presented at the AAAI
Workshop on Automation as Eldercare, July 29, 2002, Edmonton, Alberta, Canada (2002)
4. Kanda, T., Ishiguro, H., Imai, M., Ono, T.: Development and evaluation of interactive humanoid
robots. Proc. IEEE 92, 1839–1850 (2004)
5. Ejdys, J.: Innovativeness of residential care services in Poland in the context of strategic
orientation. Procedia. Soc. Behav. Sci. 213, 746–752 (2015)
6. Nazarko, J., et al.: Foresight study of road pavement technologies. Procedia Eng. 122, 129–136
(2015)
7. Radziszewski, P., et al.: Future trends in road pavement technologies development in the context
of environmental protection. Baltic J. Road Bridge Eng. 11(2), 160–168 (2016)
8. Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bulletin 1, 80–83 (1945)
9. Mann, H.B., Whitney, D.R.: On a test of whether one of two random variables is stochastically
larger than the other. Ann. Math. Stat. 18, 50–60 (1947)
3D Concrete Printing with Macro-micro Robots
1 Introduction
1.1 Motivation and Overview
The construction industry is vital for the national economy, accounting for approximately
5% of US Gross Domestic Product (GDP). In 2017, the average gross output by the
construction industry in the first three quarters was reported at $1.463 trillion [1], creating
employment for 6.7 million workers [2]. However, construction is also one of the least
automated industries in the world [3]. Productivity within the industry is hindered by
the lack of automation tools. Concrete operations are a foundational element within
construction, and automation of those operations could improve efficiency, productivity,
and safety.
In this work, we discuss the augmentation of the cement delivery process, specifi-
cally the robotization of the concrete delivery process. The goal is to develop a relatively
portable, cable-actuated system to position and maneuver the cement hose in the pres-
ence of obstacles, including construction workers, in the workspace. A further goal is to
provide additional dexterity at the nozzle, to allow for non-traditional pouring of cement,
to enable, for example, repair operations requiring non-vertical deposition.
In the following sections we describe and discuss the robot system design and imple-
mentation, as well as related materials research (to produce cement suitable for robotic
printing) and experiments conducted to date with an early prototype of the robotic hose.
Collars through which the cables were routed formed a key part of this assembly. Bearings were mounted
within the collars to reduce the effects of friction – a challenge for tendon-operated
continuum robots in configurations featuring high bending angles in practice. Encoder
measurements at the motors were transformed to hose shape, via cable lengths, with
continuum kinematics [5]. The resulting system is shown in Fig. 2.
Control of the hose robot was initially implemented utilizing Constant Curvature
(CC) kinematics, the standard approach for continuum robots [4, 5]. However, the non-
uniform construction of the cement hose, featuring a heavy nozzle, produced section
curvatures that deviated significantly from the constant curvature assumption. This led
to significant errors in the end effector positions and orientation achieved. Consequently,
we subsequently developed and implemented an Euler curve based Variable Curvature
(VC) kinematic model for multiple (two, in the hardware implementation) continuum
robot sections. The VC approach significantly improved end effector accuracy compared
to the CC model. Details of this approach can be found in [6].
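As background for the kinematics discussed above, the following is a minimal, hedged sketch of standard constant-curvature forward kinematics for a single continuum section (arc parameters to end-effector pose), in the spirit of [4, 5]. It is not the authors' implementation, conventions for the section frame vary in the literature, and the Euler-curve variable-curvature model of [6] is not reproduced here.

```python
# Minimal sketch: constant-curvature (CC) forward kinematics for one
# continuum section, parameterized by curvature kappa, bending-plane
# angle phi, and arc length s. Illustrative only; one common convention.
import numpy as np

def cc_section_pose(kappa: float, phi: float, s: float) -> np.ndarray:
    """Return a 4x4 homogeneous transform for a single CC section."""
    # Rotation of the bending plane about the section's base z-axis
    rot_phi = np.array([[np.cos(phi), -np.sin(phi), 0.0],
                        [np.sin(phi),  np.cos(phi), 0.0],
                        [0.0,          0.0,         1.0]])
    if abs(kappa) < 1e-9:            # straight section: pure translation along z
        p_plane = np.array([0.0, 0.0, s])
        rot_bend = np.eye(3)
    else:
        theta = kappa * s            # total bending angle
        r = 1.0 / kappa              # radius of curvature
        p_plane = np.array([r * (1.0 - np.cos(theta)), 0.0, r * np.sin(theta)])
        rot_bend = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                             [ 0.0,           1.0, 0.0],
                             [-np.sin(theta), 0.0, np.cos(theta)]])
    T = np.eye(4)
    T[:3, :3] = rot_phi @ rot_bend
    T[:3, 3] = rot_phi @ p_plane
    return T

# Example: a 1 m section bent to curvature 1.5 1/m in the plane phi = 30 deg
print(cc_section_pose(kappa=1.5, phi=np.radians(30), s=1.0))
```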
The assembled prototype system with the cable hose integrated is illustrated in Fig. 3.
We are currently implementing a statics-based model that ensures uniform EE velocity
and curvature at all bending planes developed for a single section. This approach further
improves upon on the VC kinematics model. Multi-section versions of the model are
currently under development.
In order to produce cementitious materials suitable for 3D printing with the robot system,
fundamental rheological research was conducted. Specifically, we developed 3D print-
able mixtures of Portland cement with slag and metakaolin. We further investigated the
influence of aggregate shape characteristics on the rheological behavior of 3D printable
mixtures, supplemented with preliminary work on shrinkage, mechanical and transport
behavior of 3D printable mixtures. We are evaluating the impact of chemical accelerators
on properties of printable mixes.
We are also developing dynamic rheology control for cementitious materials for
3D printing. We have designed and manufactured an active-rheology control test setup,
conducting material trials utilizing rheometer and flow rate measurements. The active-
stiffening test setup designed is currently under manufacture.
Initial experiments with the prototype, using the new cementitious mixes pumped into
and through the hose, demonstrated the ability of the hose to maneuver to print a series
of shapes in the horizontal plane. See Fig. 4.
The results further validated the ability of the research in rheology to generate mixes
which could both be smoothly pumped through the hose without congestion, but also
hardened sufficiently quickly to support printing multiple additional layers in the vertical
direction as the robot re-traversed the trajectory. Additional details are provided in [6].
4 Conclusions
We describe a novel intelligent cable-driven macro/micro co-robot system aimed at 3D
printing of concrete in the construction industry. A cable-driven parallel robot acts
as a macro-base, maneuvering the cable-driven continuum robot (integrated with the
concrete delivery hose in the application) which serves as the micro-unit, providing the
hose with controllable dexterity.
Both the macro and micro robotic elements possess redundant degrees of freedom.
Redundancy resolution for the cable-driven macro robot system allows controllable
stiffness of its payload. Variable curvature kinematics are used for motion planning
and control of the micro robot hose. Fundamental research in rheology of 3D-printable
concrete has enabled the research by identifying and evaluating concrete mixes suitable
for robotic printing.
Experiments with an initial prototype demonstrated the ability to 3D print closed
structures and also highlighted the need for improved control of the system. Future
work will exploit the overall redundancy in the system, treating the macro and micro
units as a single coupled system. We are currently working on IMU-based sensing to
support end-effector-based control. The wider goals for the research are to provide field
intelligence by adding situational awareness and physical-assist capabilities.
References
1. Bureau of Economic Analysis, Department of Commerce (2017). https://www.bea.gov/
Accessed 20 Apr 2022
2. Bureau of Labor Statistics, Department of Labor (2017). https://www.bls.gov/ Accessed 20
Apr 2022
3. Agarwal, R., Chandrashekaran, S., Sridhar, M.: Imagining Construction’s Digital
Future (2016). https://www.mckinsey.com/industries/capital-projects-and-infrastructure/our-
insights/imagining-constructions-digital-future Accessed 20 Apr 2020
4. Webster, R.J., III., Jones, B.A.: Design and kinematic modeling of constant curvature
continuum robots: a review. Int. J. Robot. Res. 29(13), 1661–1683 (2010)
5. Jones, B.A., Walker, I.D.: Kinematics for multisection continuum robots. IEEE Trans. Rob.
22(1), 43–55 (2006)
6. Srivastava, M., Ammons, J., Peerzada, A.B., Krovi, V.N., Rangaraju, P., Walker, I.D.: 3D
printing of concrete with a continuum robot hose using variable curvature kinematics. In:
IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA (2022,
to appear)
Automatic Polarity Identification on Twitter
Using Machine Learning
José Carmen Morales Castro1 , Rafael Guzmán Cabrera1(B) , José Ruiz Pinales1 ,
Luis Manuel Ledesma Carrillo1 , and Belém Priego2
1 Departamento de estudios multidisciplinarios Colonia Yacatitas, Universidad de Guanajuato,
Yuriria, Gto., Yuriria, Guanajuato, México
{jc.moralescastro,guzmanc,pinaleslm.ledesma}@ugto.mx
2 Departamento de Sistemas, Reynosa Tamaulipas universidad Autónoma Metropolitana
Unidad Azcapotzalco, Azcapotzalco, Ciudad de México, México
[email protected]
1 Introduction
Currently, microblogging websites have become digital spaces of varied information,
where users publish and disseminate real-time information on a wide variety of topics.
Opinions expressed in these texts implicitly carry an emotional charge that translates
into a positive or negative opinion about people, products, or services encountered in
daily life.
Several companies, organizations and institutions use this type of media to obtain
feedback, promote themselves, or simply turn user opinions into an improvement
resource, polling microblogs to get an idea of the general sentiment about their users,
products and services [1]. Twitter in particular has grown substantially in recent years
within the so-called "social panoramas", used both as a broadcasting system and as a
conversation tool [2]. For this reason, this social network is currently widely used in
numerous investigations, including sentiment analysis (also known as opinion mining),
which is defined as the process of determining opinions based on attitudes, evaluations
and emotions about specific topics [3].
Some research works, like [4], describe opinion mining as the automatic treatment
of opinions contained in a sentence. This allows determining the polarity or feeling that
is expressed, whether positive, negative, or mixed, and also allows the automatic
extraction of features that help to understand the perception users have of specific
topics and aspects.
The emotions that users express in tweets are related to the feelings of the person,
and the polarity (positive, negative, or neutral) is the measure of the emotions expressed
in a sentence. Generally, the polarity goes from negative (-1) to positive (1) passing
through neutral (0); this last value means that no feeling or opinion has been
expressed [5].
The structure of this research work is briefly described below. Section 2 discusses
related work in order to give a better understanding of the orientation of the project.
Section 3 explains the problem that this research work addresses and the hypothesis to be
verified against the results obtained. Section 4 explains the proposed methodology and
the process that allowed us to develop the project in a competitive and efficient manner.
The results and conclusions delivered by this research work are shown in
Sects. 5 and 6, respectively, followed by the references used to carry out this
research work.
2 Related Work
Authors like [6] describe sentiment analysis as a task responsible for identifying
and classifying different points of view and opinions on a particular issue, which may
be an object, a person, or an activity, among others. It is based on natural language
processing (NLP) to identify people's state of mind by collecting comments, reactions and
messages through social networks, where the main objective is the analysis of online
documents and their classification by sentiment: positive or negative, with the
possibility that no sentiment exists, in which case the document is classified as neutral.
Research on sentiment analysis in social networks has been growing, and its classification
depends heavily on the use of keywords in the texts; a factor that can cause problems is the
information stored in graphics, videos, or images, since they can include information
not found in the accompanying text.
In [7], the authors present some techniques used for sentiment analysis that help
automatically determine the polarity of a text, the most common being those
based on machine learning, an important part of Artificial Intelligence, since
it develops programs through learning algorithms and the generation of knowledge
capable of learning to solve problems. Its possible applications are as useful as they
are varied, which is why it remains an open research topic in which very attractive
and interesting contributions continue to be made in the area of sentiment analysis.
It is worth mentioning that sentiment analysis does not only focus on identifying
polarity in opinions expressed through subjective texts; this task can go much further,
even allowing the identification of particular feelings, such as the classification of
primary emotions like joy, sadness, anger and fear, among others.
Another technique used for reviewing sentiment analysis is semantic orientation,
which is responsible for extracting opinions. In [8] the authors explain that the semantic
orientation of a word can become positive when it is shown through praise or negative,
when it is presented as criticism. It uses a learning technique that does not necessar-
ily have to be supervised since it does not require initial training, that is, it does not
require manually labeled instances to conduct the learning process. Authors like [9]
describe how to adapt this semantic orientation approach, for example, to perform
sentiment analysis in a new language by building support vector machine (SVM)
classifiers. Their approach relies on machine learning of a text classifier, based on the
fact that such classifiers can be trained in any language; to this end, they carried out
cross-validation tests using a classifier based on the SVM learning method.
The aim of this work is to conduct the automatic identification of sentiment in tweets,
using the best features and an architecture that combines base classifiers and lexical
resources, and thereby to define automatic tools capable of extracting subjective
information, such as opinions or feelings, from texts in natural language in order to
create structured and actionable knowledge to be used by a decision-making system.
The question we ask in the development of this research work is whether it is
possible to identify polarity in unstructured texts using machine learning techniques.
3 Methodology
In the present work, the classification of tweets was conducted. The evaluation uses
two datasets corresponding to opinions issued on Twitter, containing around 163,000
tweets labeled with their opinion polarity: positive, negative, or neutral. The first
dataset is made up of 10,653 tweets in total, divided equally: 3,551 tweets carry a
positive label, 3,551 a neutral label and 3,551 a negative label. The second dataset uses
10% of all the tweets contained in the database. These tweets and comments were made
about Narendra Modi and other leaders, as well as society's opinion about the next prime
minister of the nation (in the context of the general elections held in India in 2019),
and the goal is to classify them automatically. We take this database as an object of
study because it is interesting to see how the media perception of a public figure can
be measured through opinions issued on social networks, which can undoubtedly help the
person in question correct or moderate their speech in relation to a particular topic.
Selecting the database was our first step to build the classifier. The texts are labeled
with values from −1 to 1, where −1 corresponds to a negative opinion, 0 to neutral, and 1 to positive.
It is worth mentioning that this is a standard database that is available on the internet1 .
In Fig. 1, the diagram that illustrates the method implemented in this work is shown.
In our case, we use the classification scenario based on cross-validation, which is one
of the most widely used resampling methods to evaluate the generalization capacity of
predictive models and thus estimate the true prediction error and parameter adjustment
[10].
Next, each of the elements that make up the proposed method is briefly described.
For data entry, the first step was to replace the numerical value in the database,
setting the polarity as positive, negative, or neutral according to the
existing numerical label.
The experiments were conducted in Python. We loaded the corpus, where the text of
the tweets was used as learning features, that is, the comments or opinions that
the users made.
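A minimal sketch of this relabeling step is shown below, assuming the Kaggle CSV has a numeric column named "category" and a text column named "clean_text"; these names and the 10% sampling call are assumptions for illustration only.

```python
# Hedged sketch: map numeric polarity labels to class names with pandas.
import pandas as pd

# Hypothetical file/column names for the Kaggle Twitter sentiment dataset
df = pd.read_csv("Twitter_Data.csv")          # columns assumed: clean_text, category
label_map = {-1.0: "negative", 0.0: "neutral", 1.0: "positive"}
df["polarity"] = df["category"].map(label_map)

# Optional: draw a 10% sample, as done for the second dataset in the study
df_small = df.sample(frac=0.10, random_state=42)
print(df["polarity"].value_counts())
```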
In the preprocessing part, the stop words were eliminated. To build the stop-word
lists, a word cloud was generated to visualize the most frequently repeated words that
may be irrelevant for the analysis; these are content-empty words that nevertheless
serve to structure sentences and express ourselves correctly.
However, since the classification system turns the task into a matrix problem, fewer
1 https://www.kaggle.com/cosmos98/twitter-and-reddit-sentimental-analysis-dataset.
elements reduce the dimensionality of the matrix. Once this process was done, a word
cloud was created again to verify the result, as shown in Fig. 2, showing which words
remained as headwords once the stop words were removed.
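The sketch below illustrates this kind of stop-word removal and word-cloud check in Python; the stop-word file, data frame columns and plotting details are assumptions, and the actual lists used in the study are the ones cited in Sect. 4.

```python
# Hedged sketch: remove stop words from tweets and visualize a word cloud.
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Hypothetical inputs: a tweet column and a plain-text stop-word list
df = pd.read_csv("Twitter_Data.csv")                      # column assumed: clean_text
with open("stopwords.txt", encoding="utf-8") as fh:
    stop_words = {w.strip().lower() for w in fh if w.strip()}

def remove_stop_words(text: str) -> str:
    tokens = str(text).lower().split()
    return " ".join(t for t in tokens if t not in stop_words)

df["filtered"] = df["clean_text"].apply(remove_stop_words)

# Word cloud of the filtered corpus to check which headwords remain
cloud = WordCloud(width=800, height=400).generate(" ".join(df["filtered"]))
plt.imshow(cloud)
plt.axis("off")
plt.show()
```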
After adjusting the documents, they are converted into a bag-of-words representation,
where the IDF (inverse document frequency) is selected as the document-frequency
weighting. As the next step of the proposed method, the target label variable is set;
in this case, it is the field where the positive, negative, and neutral labels are
found.
An analysis is conducted to see which classifier yields the best accuracy results.
The following learning methods, widely known in the state of the art, were used.
Support Vector Machines (SVM) is a learning-based method that supports solving
classification and regression problems; it is based on training and resolution phases
and proposes an answer (output) to an established problem.
Logistic Regression (RL) is defined in [11] as a machine learning classification
algorithm used to predict a probability, requiring the dependent
variable to be binary.
Naïve Bayes (NB) is a classifier that helps us calculate the probability of
an event given prior information about it, based on Bayes' theorem and additional
hypotheses [12]. Random Forest, according to Breiman, is a classifier consisting of a
combination of tree classifiers, where each one is generated using a randomly sampled
vector, independent of the input vector; each tree casts a unit vote for the most
popular class in order to classify an input vector.
González in [13] explains that KNN is a supervised machine learning non-parametric
classification method that estimates the value of the probability density function or
directly the probability that an element belongs to a class from the information provided
by the set of prototypes. It is used to classify values by looking for the most similar data
points learned in the training stage and making guesses of the new points based on that
classification.
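A minimal sketch of how these classifiers could be compared under cross-validation in Python with scikit-learn is shown below; the file name, column names, folds and hyperparameters are assumptions and do not reproduce the study's exact configuration.

```python
# Hedged sketch: TF-IDF features + several classifiers compared
# by cross-validated accuracy, in the spirit of the described method.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("Twitter_Data.csv").dropna()   # columns assumed: clean_text, polarity
X = TfidfVectorizer(stop_words="english").fit_transform(df["clean_text"])
y = df["polarity"]

classifiers = {
    "SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```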
To compare the classifiers, the following evaluation metrics were considered:
1. Area under the curve (AUC) is calculated as the area under the ROC curve; the
larger the area, the more accurate the predictor. Formally, the AUC is given by Eq. (1):

AUC = \int_{0}^{1} f(x)\, dx    (1)
where f(x) represents the receiver operating characteristic (ROC) curve function,
however since f(x) tends not to have an integration form like a parabola; several
authors suggest using approximation methods to calculate AUC [14].
2. Accuracy is the degree of closeness to the true value; it refers to a measurement with
both true and consistent results. The formula to calculate the Accuracy is represented
by Eq. (2):
Accuracy = \frac{tp + tn}{tp + tn + fp + fn}    (2)
where tp represents a true-positive value, tn a true-negative value, fp a false-positive
value, and fn a false-negative value.
3. F1 is a measure of precision in a test that is calculated from the precision and recall
of the test that is being conducted. In a nutshell F1 is the harmonic mean of the
precision and recall, which is shown in the Eq. (3):
F1 = \frac{tp}{tp + \frac{1}{2}(fp + fn)}    (3)
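As a quick, hedged illustration of Eqs. (2) and (3), the snippet below computes accuracy and F1 directly from confusion-matrix counts (the counts are made up for the example); AUC would normally be obtained from predicted scores, e.g. via sklearn.metrics.roc_auc_score.

```python
# Hedged sketch: accuracy and F1 computed from confusion-matrix counts,
# mirroring Eqs. (2) and (3). The counts below are illustrative only.
tp, tn, fp, fn = 850, 790, 120, 140

accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (2)
f1 = tp / (tp + 0.5 * (fp + fn))                    # Eq. (3)

print(f"Accuracy = {accuracy:.3f}, F1 = {f1:.3f}")

# Equivalent checks with scikit-learn on label vectors y_true, y_pred:
# from sklearn.metrics import accuracy_score, f1_score
# accuracy_score(y_true, y_pred), f1_score(y_true, y_pred, average="macro")
```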
4 Results
Following the workflow shown in Fig. 3 and using Python, the results obtained
in the experiments are shown below.
This experiment was carried out in two stages, in which two lists of stop words were
used and eliminated from the documents under study. The first list consists of
173 stop words, and the second list contains a total of 665 stop words and is available
on the internet2; it can also be found online3. The results obtained for the database with 10%
of tweets, are shown in Table 1 for cross validation and in Table 2, for the training and
testing sets. As well, those results for the database with the number of tweets in equal
content for each polarity, are shown in Table 3 for cross validation and in Table 4 for the
training and testing sets.
As a result, we can observe that for the first database, which contains 10% of the total
tweets, the best result is given by Random Forest with 78.5% accuracy under cross
validation, while for the second database, which contains the same number of tweets for
each polarity, the Logistic Regression classifier gives 74.9% accuracy under
the same classification scenario (cross validation).
2 https://github.com/manishkanadje/reuters21578/blob/master/stopwords.txt.
3 https://www.ranks.nl/stopwords.
Table 2. Evaluation metrics database at 10% for training and test sets
Table 4. Evaluation metrics equitable database for training and test sets
and allowing the ability to identify areas of opportunity for improvement in the case of
negative opinions.
As future work, new text processing techniques could be tested to reduce the
classification error, as well as different classification methods, in order to
compare them with the methods used in this
study.
References
1. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.J.: Sentiment analysis of twitter
data. In: Proceedings of the Workshop on Language in Social Media (LSM 2011), pp. 30–38
(2011)
2. Jackson, J., Gettings, S., Metcalfe, A.J.N.: “The power of Twitter”: Using social media at a
conference with nursing students. 68. Elsevier, pp. 188–191 (2018)
3. Fiorini, P.M., Lipsky, L.R.: Search Marketing Traffic and Performance Models. 34(6), 517–
526 (2012)
4. Fernandez, J., Boldrini, E., Manuel Gomez, J., Martinez-Barco, P.J.P.D.L.N.: Sentiment
Analysis and Opinion Mining: The EmotiBlog Corpus. 47, pp. 179–187 (2011)
5. Reyes, A., Rosso, P., Veale, T.: A Multidimensional Approach for Detecting Irony in Twitter
47(1), 239–268 (2013)
6. Saberi, B., Saad, S.: Sentiment Analysis or Opinion Mining: A Review. 7(5), 1660–1666
(2017)
7. Hierons, R.:Machine learning. Tom M. Mitchell. Published by McGraw-Hill, Maidenhead,
UK, International Student Edition, 1997. ISBN: 0-07-115467-1, 414 pages. Price: UK£ 22.99,
soft cover, ed: Wiley Online Library (1999)
8. Chaovalit, P., Zhou, L.: Movie review mining: A comparison between supervised and unsu-
pervised classification approaches. In: Proceedings of the 38th Annual Hawaii International
Conference on System Sciences, pp. 112c-112c: IEEE (2005)
9. Brooke, J., Tofiloski, M., Taboada, M.: Cross-linguistic sentiment analysis: from english to
Spanish. In: Proceedings of the International Conference RANLP-2009, pp. 50–54 (2009)
10. Refaeilzadeh, P., Tang, L., Liu, H.: Cross-validation. 5, p. 532–538 (2009)
11. Wright, R.E.: Logistic Regression (1995)
12. Castro, W.M., Cabrera, S.G.: Tuberculosis: Diagnosis by Image Processing. 24(2) (2020)
13. González, R.H., Morell, C., Blanco, A.: Regresión lineal local con reducción de rango para
problemas de predicción con salidas compuestas. Revista Cubana de Ciencias Informáticas
10(4), 184–193 (2016)
14. Bowers, A.J., Zhou, R.: Receiver operating characteristic (ROC) area under the curve (AUC):
a diagnostic measure for evaluating the accuracy of predictors of education outcomes. 24(1),
20–46 (2019)
15. Makhoul, J., Kubala, F., Schwartz, R., Weischedel, R.: Performance measures for information
extraction. In: Proceedings of DARPA Broadcast News Workshop, 249–252 Herndon, VA
(1999)
Sentence Structure and Boundary for Deep
Neural Machine Translation Alignment Model
Bat-Erdene Batsukh(B)
1 Introduction
Neural machine translation refers to machine translation based on neural network
models [1]. It differs from systems that take word and sentence structure into account
in that its translation is based on previously trained neural network models. The
difference between the two methods is as follows.
Neural machine translation generates translation hypotheses based on neural net-
work scores, and machine translation generates hypotheses using count-based language
models that take into account word and sentence structure. The neural network can be
used in combination with a system that takes into account word and sentence structure,
and can be used to calculate points directly or to calculate the N-best evaluation sequence
to rank previously established translation hypotheses based on k shortest paths [2].
This means that when neural network-based training is performed in machine trans-
lation, no intermediate training steps are required to generate the information needed to
train the neural network. Models are trained using only ordered pairs of source and
target sentences. However, systems that take word and sentence structure into account
are based on prepared language patterns that depend on word position. Since word
alignment is often not available directly, word alignment training is first conducted
separately on the parallel training corpus.
Another difference between the two methods is the compatibility between the training
phase and the decoding phase. Just as standard neural machine translation uses training
sentence pair evaluation models during the training phase, it also evaluates translation
hypotheses generated during decoding. Systems that take into account word and sentence
structure often include a combination of independently prepared templates. Therefore,
decoding is not directly compatible with training, so there is a difference between training
and decoding.
Standard neural machine translation may require some modifications to normalize
the length between training and decoding, and to use assessment criteria that are different
from the learning objective function. Neural machine translation is more consistent with
trends that take into account word and sentence structure, and is much simpler in the
learning process. Neural network-based model architecture is a complex concept that
differs significantly from the count-based models used in systems that take into account
word and sentence structure. Simple single-layer neural networks are theoretically
universal function approximators that can model any functional relationship, but in
practice the quality of these models depends on whether the network is well designed,
how the input data are sampled and, in addition, whether word and sentence structure
are taken into account. Therefore, designing a neural network usually means finding
and experimenting with architectures that can be trained in practice, and improving the
results as much as possible. There are mixed meth-
ods of combining models that take into account word and sentence structure in neural
machine translation search [3]. These systems, which require significant changes to the
decoding algorithm, include many models, so it is considered necessary to fine-tune the
weight of the model. Although these technologies have improved the quality of machine
translation to some extent, they have not yet reached the level of fully automatic transla-
tion. Irregular sentences are usually created automatically, and there is no guarantee that
the meaning of the sentence will be preserved. Alternatively, an automatic translation
system can be used to help professional translators produce high-quality translations. Initiated by
Bender et al. [4], an interactive translation tool that allows translators to translate written
sentences in real time and complete them, if necessary, has led to a change in the strategy
for creating a translation system. This is because as soon as the user starts typing, he
or she is faced with the problem of finding the best translation of a given sentence that
matches the word or sentence he or she is writing.
Compared to structures that take into account the structure of words and sentences,
the neural machine translation does not require extra intermediate steps, such as word
dependency, and produces direct results using a trained model. Lacking a specific word
connection can make it problematic to link the target words and sentence structure.
2 Related Works
We can identify three different trends in machine translation [5]. First, the propensity
to translate directly from the source text into the target language. The next is the "transfer
method", which is a step-by-step method of translating between the source text and an abstract
representation of the target text. The final approach is to translate the text into a non-
linguistic, interlingual representation, from which the target text is extracted.
Machine translation can also be separated into rule-based and data-
based machine translation. Rule-based methods focus on manually defined translation
rules for a given bilingual. For example, phrase structure trees [6] analyze a sentence
from a sentence into several small phrases and even a single word, while a tree that
shows the relationship between words and sentence structure determines the relationship
between individual words [7]. Usually, they contain only one type of node, and the
relationship between the parent node and the dependent node is indicated by specially
marked notation. The process of defining translation rules requires a great deal of human
knowledge and involvement. On the other hand, a data-based approach, for example
statistical machine translation, does not need such human knowledge, but learns the
translation model from example data.
Statistical machine translation is a data-based method established in the late 1980s
[8]. Its core idea is to develop a translation template that can be taught using a collection of
source data and target data. Early systems of statistical machine translation
were word-based, and each translation step consisted of the creation of one target word
[9, 10]. In the early 2000s, systems that considered word and sentence structure were
proposed [11–13]. Later, neural machine translation [14, 15] became the leading trend
in machine translation. To successfully implement these models, minimum error rate
training [16] is used. Variants of attention mechanisms are used to address the limitations
of encoder-decoder models [17–19]. On the other hand, neural machine translation
is more flexible for translation that does not exactly match the training data. A key
solution is to divide words into sub-words and use them in
machine translation [20]. This delivers more opportunities for such models, but
limits the translation to predefined constraints. Without a specific word connection, it
will be difficult to connect the target words to the source word. Therefore, research
into the design and development of neural machine translation models has been widely
conducted in the field of applied and computational linguistics in the form of mixtures
and hierarchies based on basic statistical translation models.
Sentence endings do not need to be taken into account when determining sentence
structure and scope. We define this boundary using an algorithm developed by Stanford
University [21]. Word-based models must model a long context to generate such a
sentence, and the search must be flexible enough not to prune the partial hypotheses that
lead to such a translation (see Fig. 2).
Nevertheless, phrase-based systems that take word and sentence structure into
account can simply store such entries in the phrase table. Throughout the search, each
expression can be treated as a single atomic unit (see Fig. 3).
Decoding with recurrent neural networks in translations that take word and sentence
structure into account can be done either directly in the search or by making decisions
with N-best rescoring. On the topic of combining recurrent models in decoding that takes
word and sentence structure into account, Auli suggests keeping hidden recurrent states
in the search state, and suggests a way to reconcile the states and decide whether they
are equivalent when comparing search states [22]. Although state recombination is then
no longer exact, the recurrent model is used to approximate it, as only the hidden state
corresponding to the best path is retained when a node is recombined. Schwenk, on the
other hand, uses continuous space translation models to calculate additional scores [23],
while Le and his colleagues use short lists [24] to evaluate translation models using
class-based output layers. Kalchbrenner and Blunsom have used recurrent neural networks
to encode the source sentence, obtained by using sequential alignments in the source
sentence [14]. The source-sentence representations feed into the recurrent hidden layer
over the target words. The best translation is formed by segmenting all possible
translations and their key phrases. For instance, if the source sentence of length K is
$M = m_1^K = m_1 m_2 \ldots m_K$, then the equivalent sequence in the target language,
of corresponding length L, must be $E = e_1^L = e_1 e_2 \ldots e_L$. In our case,
translating from Mongolian to English, we obtain an ordered pair (M, E). Based on this,
let $t_1^L = t_1 t_2 \ldots t_L$ be the alignment path mapping the position of each word
in the target language to the position of the words in the source sentence
$s_1^K = s_1 s_2 \ldots s_K$ [25, 26], and let $g_1^K = g_1 g_2 \ldots g_K$ be the
sentence structure and boundary. In fact, finding the structure and position of a
translated sentence is a matter of probability theory: determining the distribution
$p(\cdot)$ of the translation model corresponding to the sentence being translated,
as an approximation of the unknown probability distribution $Pr(\cdot)$. These models
can be classified into three basic categories: (1) the translation and alignment model,
which includes source and target information, (2) the language model, which contains
only target-language information, and (3) the inter-phrase reordering model, which
includes word and sentence structure.
Modern neural network-based models are able to learn these three models on their own
from the language model and the parallel corpus. However, in some cases, grammar and
sentence structure are not taken into account, which can lead to problems in translating a
given text, such as misinterpretation, synthesis, or omission of sentences. To address this
issue, we have added the sentence structure and boundary as an extra model (see Fig. 4).
Once we have identified the models, we need to solve the search problem, and this
process is called decoding. The search finds the best translation based on sentence
structure, boundary and word placement. It is performed with the max and argmax
operators, which select the best translation among the different candidate translations.
In doing so, length normalization is used to balance the probabilities of long and short
sentences (Eq. 1).
$m_1^K \rightarrow \hat{e}_1^{\hat{L}}(m_1^K) = \underset{L,\, e_1^L}{\arg\max} \left\{ \frac{1}{L} \sum_{l=1}^{L} \log p\!\left(e_l \mid e_1^{l-1}, m_1^K\right) \right\}$    (1)
The search for this model, which combines the three models we propose in a
hierarchical manner, is as follows (Eq. 2).
$m_1^K \rightarrow \hat{e}_1^{\hat{L}}(m_1^K) = \underset{L,\, e_1^L}{\arg\max}\; \underset{t_1^L}{\max} \left\{ \frac{1}{L} \sum_{l=1}^{L} \left( \lambda \log p\!\left(e_l \mid e_1^{l-1}, t_1^{l}, g_1^K, m_1^K\right) + (1-\lambda) \log p\!\left(t_l \mid e_1^{l-1}, t_1^{l-1}, g_1^K, m_1^K\right) \right) \right\}$    (2)
Here, λ is the weight of the lexical model, and (1 − λ) is the weight of the hierarchical
model of word placement, sentence structure and boundary. When modeling grammar and
sentence boundary, the overall connection of sentences in Mongolian is designed first.
Assuming the sentence "Барак Обама Хавайд төрсөн" ("Barack Obama was born in Hawaii"),
the diagram looks like this (see Fig. 5).
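A minimal, hedged sketch of the interpolation in Eq. (2) is given below: it scores one translation hypothesis by combining per-position lexical-model and alignment-model log-probabilities with weight λ and length normalization. The function and argument names are illustrative assumptions; the actual decoder is not reproduced here.

```python
# Hedged sketch: length-normalized hierarchical score of Eq. (2) for one
# hypothesis, given per-position log-probabilities from the two sub-models.
from typing import Sequence

def hierarchical_score(lex_logprobs: Sequence[float],
                       align_logprobs: Sequence[float],
                       lam: float = 0.7) -> float:
    """Combine lexical and alignment log-probs with weight lam, normalized by length."""
    assert len(lex_logprobs) == len(align_logprobs) and lex_logprobs
    L = len(lex_logprobs)
    total = sum(lam * lp_e + (1.0 - lam) * lp_t
                for lp_e, lp_t in zip(lex_logprobs, align_logprobs))
    return total / L

# Example: pick the better of two candidate hypotheses (made-up numbers)
h1 = hierarchical_score([-0.3, -1.2, -0.5], [-0.4, -0.9, -0.6])
h2 = hierarchical_score([-0.2, -2.0, -0.4, -0.7], [-0.5, -1.1, -0.3, -0.8])
best = max([("hyp1", h1), ("hyp2", h2)], key=lambda kv: kv[1])
print(best)
```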
For us, the UD representation, which combines Mongolian grammar and sentence boundaries,
is inspired by Stanford's method [27], which studies neural network-based word and
sentence structures and relations. For instance, for the sentence "Барак Обама Хавайд
төрсөн", the Stanford dependency structure of the Mongolian language is as follows
(see Fig. 6).
When training grammar and sentence boundary for a total of 1000 steps, the sentence
structure and boundary recognition loss was reduced to 0.02 (see Fig. 7).
During the training, development scores were automatically evaluated every 200
steps, and the final development score reached 98,653 (see Fig. 8).
By including this dependency in the search of the neural translation model, we gain
a better understanding of sentence structure and boundary.
An effort was made to combine neural network results with a model that takes word and
sentence structure into account, and for the first time a re-alignment model that
changes the position of words was proposed [25]. In fact, this integrated model of neural
machine translation uses phrases to train the neural networks. For this training, we
created a local Mongolian-English mixed bilingual corpus by translating the following
corpora (see Table 1) (see Table 2).
In order to present the results of the study more clearly and in more
detail, we measured some statistical indicators. For the original data of 2,402,138
sentences prepared in Mongolian, the average number of words was 15.919942984124976,
the average number of characters was 112.4927664438929, the shortest line consisted
of 2 characters with 1 word, and the line with the most words consisted of 2149 words,
at index 2039369.
The model choice was made between two or three different models, trained on the same
data size of two million four hundred thousand sentences, and the translation test
was performed with a 95% confidence level. When the alpha level was chosen
to be 0.05 and a t-test was performed, the t value of 6.889030645 >
1.961889826 rejects the null hypothesis. Comparing data sizes of one million
sentences and two million four hundred thousand sentences, the mean of 0.8445 for one
million sentences and 0.9514 for two million four hundred thousand sentences differed
significantly at a 95% confidence interval. In addition, when the alpha level
was chosen to be 0.05 and a t-test was performed, the value of 11.2556322 > 1.962023587,
which is well above the critical point, again rejects the null hypothesis. To
evaluate our model, we generated sample paragraphs drawn from the test
package. The following results were obtained by comparing and evaluating the quality
of the translations of those texts (see Table 3).
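As an illustration of this kind of significance check (not the authors' actual script), the sketch below runs an independent two-sample t-test on two sets of per-sentence evaluation scores; the score arrays are synthetic and made up for the example.

```python
# Hedged sketch: two-sample t-test on per-sentence translation scores,
# comparing two model/data configurations at alpha = 0.05.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
scores_1m = rng.normal(loc=0.84, scale=0.05, size=500)    # illustrative scores
scores_24m = rng.normal(loc=0.95, scale=0.05, size=500)

t_stat, p_value = ttest_ind(scores_24m, scores_1m, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the mean scores differ significantly.")
```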
The most important feature of our study is the introduction of a completely new triple-
hierarchical model that adds focus to the neural network model that takes into account
word and sentence structure, correctly defining the sentence structure and boundary
according to the grammar. This experiment was started specifically to build this model. To develop
it, we formulated a general definition of the model and explained it in terms
of probability theory. The search steps were identified by examining the translation
model, the lexical model, and the count-based language model [34]. The results of the
neural network were then staged in a three-step model that works by correctly defining
the sentence structure and boundary and linking it to a model that takes word
and sentence structure into account. This significantly increased speed by running the model
on the best translations, without having to re-calculate the search source data, using the
N-best list rating. Finally, we developed a hierarchical neural machine translation model.
The diagram illustrates how the three models are applied step by step, how the learning process
works, how to translate with the help of an already trained model, and how sentence
structure and boundary are used.
5 Conclusion
Neural machine translation has recently become the new paradigm dominating
machine translation research. This type of translation model and its methodical
study have entered the field of computational linguistics. By introducing neural
machine translation alone, or in two stages, control over the system's output with
respect to the structure and boundaries of words, sentences, and grammar is reduced.
Therefore, we developed a neural machine translation system with three different models
of hierarchical connections to improve sentence structure and grammar boundary. This
study was the first attempt to use neural machine translation as a hierarchical system
of sentence structure, word placement, and sentence coverage. Furthermore, a neural
machine translator can produce direct translations without waiting for a whole input
sentence, allowing the user to translate directly or synchronously, even while the user
is translating. The main thing that makes such translation possible is the sentence
structure and boundary.
References
1. Sutskever, I., Vinyals, O., Le, Q.: Sequence to sequence learning with neural networks. CoRR,
vol. abs/1409.3215 (2014). http://arxiv.org/abs/1409.3215
2. Eppstein, D.: Finding the k Shortest Paths. SIAM J. Comput. 652–673 (1997). Accessed 20
Jan 2022. http://www.ics.uci.edu/
3. Dahlmann, L., Matusov, E., Petrushkov, P., Khadivi, S.: Neural machine translation leveraging
phrase-based models in a hybrid search. In: Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing, September 2017, pp. 1411–1420 (2017) https://
doi.org/10.18653/v1/D17-1148
4. Bender, O., Hasan, S., Vilar, D., Zens, R., Ney, H.: Comparison of generation strategies for
interactive machine translation. In: EAMT, pp. 33–40 (2005)
5. Vauquois, B.: A survey of formal grammars and algorithms for recognition and transformation
in mechanical translation (1968)
6. Chomsky, N.: Three models for the description of language. IRE Trans. Inform. Theory 2,
11–124 (1956)
7. Tesnière, L.: Eléments de syntaxe structurale ´Editions Klincksieck, vol. 6, no. 1. Cambridge
University Press (1959). https://doi.org/10.1017/S0008413100018922
8. Brown, P.F., et al.: A statistical approach to machine translation. Comput. Linguist. 79–85
(1990)
9. Brown, P.F., della Pietra, S.A., della Pietra, V.J., Mercer, R.L.: The mathematics of statistical
machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993). https://
aclanthology.org/J93-2003
10. Vogel, S., Ney, H., Tillmann, C.: HMM-based word alignment in statistical translation. In:
International Conference on Computational Linguistics, pp. 836–841 (1996)
11. Och, F.J., Ney, H.: Improved statistical alignment models. In: Proceedings of the 38th Annual
Meeting of the Association for Computational Linguistics, October 2000, pp. 440–447. https://
doi.org/10.3115/1075218.1075274
12. Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: Proceedings of the
2003 Human Language Technology Conference of the North American Chapter of the Asso-
ciation for Computational Linguistics, pp. 127–133 (2003). https://aclanthology.org/N03-
1017
13. Zens, R., Ney, H.: Improvements in dynamic programming beam search for phrase-based
statistical machine translation. In: International Workshop on Spoken Language Translation,
pp. 195–205 (2008)
Sentence Structure and Boundary 519
14. Kalchbrenner, N., Blunsom, P.: Recurrent continuous translation models. In: Proceedings of
the 2013 Conference on Empirical Methods in Natural Language Processing, October 2013,
pp. 1700–1709. https://aclanthology.org/D13-1176
15. Tan, Z., et al.: Neural machine translation: a review of methods, resources, and tools. AI Open
1, 5–21 (2020). https://doi.org/10.1016/j.aiopen.2020.11.001
16. Och, F.J.: Minimum error rate training in statistical machine translation. In: Proceedings
of the 41st Annual Meeting of the Association for Computational Linguistics, July 2003,
pp. 160–167. https://doi.org/10.3115/1075096.1075117
17. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align
and translate. CoRR, vol. 1409.0473 (2014)
18. Vaswani, A., et al.: Attention Is All You Need. In: Advances in Neural Information Processing
Systems, pp. 5998–6008 (2017)
19. Cheng, J., Dong, L., Lapata, M.: Long short-term memory-networks for machine reading. In:
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing,
November 2016, pp. 551–561. https://doi.org/10.18653/v1/D16-1053
20. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword
units. In: Proceedings of the 54th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), August 2016, pp. 1715–1725. https://doi.org/10.18653/
v1/P16-1162
21. Wang, H., Huang, Y.: Bondec-A sentence boundary detector (2003)
22. Auli, M., Galley, M., Quirk, C., Zweig, G.: Joint language and translation modeling with
recurrent neural networks. In: Proceedings of the 2013 Conference on Empirical Methods in
Natural Language Processing, October 2013, pp. 1044–1054. https://aclanthology.org/D13-
1106
23. Schwenk, H.: Continuous space translation models for phrase-based statistical machine trans-
lation. In: Proceedings of COLING 2012: Posters, December 2012, pp. 1071–1080. https://
aclanthology.org/C12-2104
24. Le, H.S., Allauzen, A., Yvon, F.: Continuous space translation models with neural networks.
In: Proceedings of the 2012 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, June 2012, pp. 39–48. https://
aclanthology.org/N12-1005
25. Wang, W., Alkhouli, T., Zhu, D., Ney, H.: Hybrid neural network alignment and lexicon
model in direct HMM for statistical machine translation. In: Proceedings of the 55th Annual
Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), July
2017, pp. 125–131. https://doi.org/10.18653/v1/P17-2020
26. Alkhouli, T., Bretschner, G., Peter, J.-T., Hethnawi, M., Guta, A., Ney, H.: Alignment-based
neural machine translation. In: Proceedings of the First Conference on Machine Translation:
Volume 1, Research Papers, August 2016, pp. 54–65. https://doi.org/10.18653/v1/W16-2206
27. Dozat, T., Qi, P., Manning, C.D.: Stanford’s graph-based neural dependency parser at the
CoNLL 2017 shared task. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual
Parsing from Raw Text to Universal Dependencies, August 2017, pp. 20–30. https://doi.org/
10.18653/v1/K17-3002
28. Ziemski, M., Junczys-Dowmunt, M., Pouliquen, B.: The United Nations parallel Corpus v1.0.
In: Proceedings of the Tenth International Conference on Language Resources and Evaluation
(LREC 2016), May 2016, pp. 3530–3534. https://aclanthology.org/L16-1561
29. Schwenk, H., Chaudhary, V., Sun, S., Gong, H., Guzmán, F.: WikiMatrix: mining 135M
parallel sentences in 1620 language pairs from Wikipedia. CoRR, vol. abs/1907.05791 (2019).
http://arxiv.org/abs/1907.05791
30. Lison, P., Tiedemann, J.: OpenSubtitles2016: extracting large parallel corpora from movie
and TV subtitles (2016). http://www.opensubtitles.org. Accessed 20 Jan 2022
520 B.-E. Batsukh
31. Graça, M., Kim, Y., Schamper, J., Khadivi, S., Ney, H.: Generalizing back-translation in
neural machine translation. In: Proceedings of the Fourth Conference on Machine Translation
(Volume 1: Research Papers), August 2019, pp. 45–52. https://doi.org/10.18653/v1/W19-
5205
32. Cotterell, R., Kreutzer, J.: Explaining and generalizing back-translation through wake-sleep.
arXiv preprint, vol. 1806.04402 (2018)
33. Edunov, S., Ott, M., Auli, M., Grangier, D.: Understanding back-translation at scale. In:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,
October 2018, pp. 489–500. https://doi.org/10.18653/v1/D18-1045
34. Ma, S., Sun, X., Wang, Y., Lin, J.: Bag-of-words as target for neural machine translation.
ACL, vol. 1805.04871 (2018). Accessed 20 Jan 2022. https://github.com/
35. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: Bleu: a method for automatic evaluation
of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics, pp. 311–318 (2002). https://doi.org/10.3115/1073083.1073135
36. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate
with targeted human annotation. In: Proceedings of the 7th Conference of the Association for
Machine Translation in the Americas: Technical Papers, pp. 223–231 (2006). https://aclant
hology.org/2006.amta-papers.25
Topic Discovery About Economy During
COVID-19 Pandemic from Spanish
Tweets
1 Introduction
The main aim of topic discovery in text documents is to extract the text's
meaning, automatically imitating human capacity. This area of study is investigated
within Natural Language Processing (NLP), which allows the automatic extraction
of the meaning of texts, identifying recurring topics automatically, and creating
algorithms capable of interpreting human language. The goal of topic discovery is to
extract information by identifying recurring topics and thus finding the central topic,
and to make this relevant information available to other information systems.
2 Topic Discovery
Topic discovery consists of analyzing a large amount of text and finding the topics
discussed in a set of documents. Some methods and techniques in the literature perform
this task automatically, but it is still necessary to develop new methods or
improve existing ones.
The task of topic discovery is an essential part of computer systems that
must analyze large amounts of text automatically in a reasonable time,
compared to the time a human would spend on the same task. Several
algorithms in the literature, such as Latent Semantic Analysis, Probabilistic
Latent Semantic Analysis, and Latent Dirichlet Allocation, can quickly extract
the topics present in large volumes of information.
Computational systems that perform topic discovery on large volumes of text
are characterized by their ability to break a text down as a human being
would. In the literature, some techniques can give a system the ability
to examine a significant number of texts in a short time, aided by techniques
that incorporate mathematical procedures that together uncover the topics [6].
Some of the most used techniques are Latent Dirichlet Allocation (LDA), Latent
Semantic Analysis (LSA), and Probabilistic Latent Semantic Analysis (PLSA).
It is worth mentioning that authors who have worked with English texts
have mainly applied LDA or methods of their own development; the Spanish
language has the disadvantage that it has been little investigated.
Latent Dirichlet Allocation (LDA) is a generative probabilistic model for analyzing
discrete data collections. This hierarchical Bayesian model with three levels
(document, word, and topic) considers a topic distribution over a vocabulary;
the model explains the number of topics and defines the words that belong to
those topics [5]. Furthermore, Latent Semantic Analysis (LSA) discovers words
from the same semantic field (those that form a group of words sharing characteristics
in their meaning); LSA is based on the linear factorization known
as Singular Value Decomposition (SVD) [4]. Finally, Probabilistic Latent
Semantic Analysis (PLSA), a descendant of LSA, discovers the semantics of hidden
topics in documents using the bag-of-words representation (each document is
represented ignoring the order of its words) [4].
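To make these three techniques concrete, the following minimal Python sketch shows how LDA and LSA can be applied to a tokenized corpus with the gensim library. The example tweets and parameter values are illustrative assumptions rather than the authors' configuration, and PLSA, which gensim does not provide, would require a separate implementation.

    # Minimal sketch of topic discovery with gensim; tokens and parameters are
    # illustrative assumptions, not the configuration used in this paper.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel, LsiModel

    docs = [
        ["vacuna", "economia", "contagios", "casos"],
        ["impuestos", "crisis", "financiera", "pandemia"],
    ]  # pre-processed, tokenized tweets (hypothetical)

    dictionary = Dictionary(docs)
    bow_corpus = [dictionary.doc2bow(doc) for doc in docs]

    # the paper experiments with 20, 50, and 100 topics
    lda = LdaModel(corpus=bow_corpus, id2word=dictionary, num_topics=20, passes=10)
    lsa = LsiModel(corpus=bow_corpus, id2word=dictionary, num_topics=20)

    print(lda.show_topics(num_topics=5, num_words=10))  # ten top words per topic
    print(lsa.show_topics(num_topics=5, num_words=10))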
Some authors have not used these traditional techniques for this task. Neural
networks, cosine similarity, clustering algorithms, and Principal Component
Analysis, among others, are applied by authors such as [6,8]; they have not
used LDA, LSA, or PLSA, but they have discovered topics with their proposed
algorithms.
Regarding text pre-processing, some tools provide morphological analysis,
recognition of named entities, PoS-tagging, disambiguation of words’ meaning,
and lemmatization. The Freeling Analyzer [14] provides these types of analysis
for English, Spanish, Portuguese, Italian, French, German, and Russian texts.
This paper proposes to carry out topic discovery in Spanish texts: tweets
about the Mexican economy that mention COVID-19. We implemented the three
techniques most used in the literature, Latent Dirichlet Allocation, Latent
Semantic Analysis, and Probabilistic Latent Semantic Analysis, with three different
approaches using textual features and morphological information of the texts.
3 Related Work
This section presents some of the works carried out by other authors on topic
discovery in Spanish texts.
In [10], a method is presented to detect the polarity of opinions by topic in
Spanish texts containing reviews of Andalusian hotels on the TripAdvisor site.
the investigation. The authors discovered 14 main topics; the most common
were health care responses and clinical manifestations. To obtain the data set
for their experiments, the authors carried out a search in PubMed on June 1, 2020,
with terms such as covid or covid-19, without language or date restrictions;
the term coronavirus was excluded, using the Biopython package. The corpus
consisted of the title, keywords, abstract, date of the last revision, the author's
affiliation list, the name of the journal, and the PubMed identification number
of each publication. The corpus was preprocessed: uppercase letters were converted
to lowercase; double spaces, special characters, stopwords, and numbers were removed;
and stemming was applied. The evaluation was carried out with perplexity,
probability of exclusion, and PCA. The authors decided the final number of
topics based on these three evaluation metrics, as well as on their own
expertise in COVID-19 and medical research.
In [11], a topic discovery and sentiment analysis study of tweets about the
COVID-19 vaccine is proposed. The method traces changes in topics and sentiments
in the population over time. The authors built a corpus of tweets on COVID-19
dating from March 11, 2020, the day the World Health Organization declared
COVID-19 a pandemic, through January 31, 2021. The key phrases they used to
download these tweets were CoronavirusPandemic, COVID-19, 2019nCoV, CoronaOutbreak,
coronavirus, WuhanVirus, covid19, coronavirus pandemic, covid-19, 2019ncov,
coronaoutbreak, and wuhanvirus. The authors used the R software to pre-process
the data and to preserve the tweets that contained the keywords vaccination,
vaccines, vaccine, vaccines, immunization, vaccinate, and vaccinated. The authors
applied Latent Dirichlet Allocation for topic modeling, and sentiment and emotion
analysis using the National Research Council Canada (NRC) Emotion Lexicon.
The analysis yielded 16 topics, which were grouped into five general topics.
Based on the results obtained, the most discussed topic was vaccination and how
to obtain it. Regarding sentiment analysis, they showed that sentiment was
increasingly optimistic.
An analysis of Twitter narratives around decision making, carried out by applying
a dynamic topic model to tweets, is presented in [16]. The authors downloaded a set
of COVID-19-related tweets about governors and members of the US presidential
cabinet, a total of 73 politicians. The tweets were downloaded from
January 1, 2020, to April 7, 2020. The corpus obtained had 7,881 COVID-19-related
tweets of the 73 politicians, ordered in ascending order over time. The
model used was the Network Hawkes Binomial Topic Model to track evolving
subtopics around COVID-19. The authors built networks of influence among
government officials using Granger causality. Based on experimental results, the
authors found themes about risks, working from home, staying at home, school
closings, and social distancing.
A study that aims to understand the discourse and psychological reactions
of Twitter users on COVID-19 is proposed in [18]. The authors selected a list
of 19 trending hashtags related to COVID-19. The proposed study managed
to identify 11 topics, including confirmed cases, mortality, cases outside and
within China, the COVID-19 outbreak in South Korea, early signs of the outbreak in
New York, the Diamond Princess cruise ship, economic impact, preventive measures,
authorities, and supply chain. The results did not reveal topics related
to treatments and symptoms as frequently as the other topics discussed on
Twitter. In addition, they applied a sentiment analysis that showed that fear
of the unknown nature of the coronavirus was dominant in all topics. The authors
applied an observational study design and an intentional sampling approach to
select all tweets containing defined hashtags related to COVID-19 on Twitter.
In general, one of the limitations when working with texts in Spanish is the
lack of tools for their processing. For this reason, work that addresses topic
discovery in Spanish texts is scarce; in addition, processing texts from
social networks represents an important challenge in tasks of this magnitude.
Therefore, in this work it was proposed to discover the topics present in
Spanish texts while working only with nouns, adjectives, and verbs, since these
three elements are considered to provide the information needed to carry out this
task.
This work proposes to discover the latent topics in a Spanish corpus
extracted from Twitter with the three topic discovery techniques most used
in the literature. The corpus was pre-processed traditionally, and
the topics were subsequently extracted. After using the Freeling analyzer, the
dependency graph was obtained, providing information about where the nouns,
adjectives, and verbs are found. The objective was to analyze the behavior of
LDA, LSA, and PLSA on a Spanish corpus; when Freeling was used,
the coherence levels of each technique improved compared to the
first approach. The proposed evaluation is topic coherence aided by an external
corpus, in this case Wikipedia, with a size of 1,495,246 documents.
4 Proposed Method
In this work, three different approaches to topic discovery in Spanish are
proposed. In the first approach, the corpus is pre-processed by removing stop words,
punctuation marks, non-ASCII symbols, and non-printable symbols, and converting
uppercase letters to lowercase. The second approach consists of removing non-ASCII
symbols and, with the help of Freeling, extracting the dependency graph
and working with nouns and adjectives. The third approach works only with
adjectives, verbs, and nouns.
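The pre-processing routes just described can be sketched as follows. spaCy's Spanish model is used here only as a stand-in for the Freeling analyzer employed by the authors, and the regular expressions, model name, and example tweet are assumptions for illustration.

    # Sketch of the pre-processing for the three approaches; spaCy stands in for
    # Freeling, and the regular expressions are illustrative assumptions.
    import re
    import unicodedata
    import spacy

    nlp = spacy.load("es_core_news_sm")  # Spanish model, assumed to be installed

    def clean_first_approach(tweet: str) -> str:
        """Approach 1: remove mentions, symbols, punctuation, accents, non-printable text."""
        tweet = re.sub(r"[@#]\w+", " ", tweet)                    # mentions and hashtags
        tweet = unicodedata.normalize("NFKD", tweet)
        tweet = tweet.encode("ascii", "ignore").decode("ascii")   # drop accents / non-ASCII
        tweet = re.sub(r"[^\w\s]", " ", tweet)                    # punctuation marks
        return tweet.lower()

    def pos_filter(tweet: str, keep=("NOUN", "ADJ", "VERB")) -> list:
        """Approaches 2 and 3: keep only selected parts of speech (Freeling in the paper).
        Use keep=("NOUN", "ADJ") for the second approach."""
        return [tok.text.lower() for tok in nlp(tweet) if tok.pos_ in keep]

    print(pos_filter("La vacuna llegó a México durante la crisis económica"))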
The following steps describe the proposed experiments:
1. Corpus pre-processing: For the first approach, this stage includes the following
actions:
(a) Mentions, number symbols, emoticons, punctuation marks, and language
accents are removed.
2. For the second and third approaches, we remove the non-ASCII symbols and
extract the parts of speech with the help of Freeling. The dependency graph is
obtained from the original corpus. It is presented as an algorithm as follows.
Begin
According to option do:
where T is the set of top words per topic, p(wi) (resp. p(wj)) is the probability
that word wi (resp. wj) appears in a text window of a given size, while p(wi, wj)
denotes the probability that wi and wj co-occur in the same window.
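The computation sketched by this definition can be implemented as follows; the window handling and the smoothing constant are assumptions, and the reference windows would be taken from the external Wikipedia corpus mentioned above.

    # PMI-style topic coherence over the top words of one topic; `windows` is a
    # list of token sets built from the external reference corpus. The smoothing
    # constant eps is an assumption to avoid log(0).
    import math
    from itertools import combinations

    def topic_coherence(top_words, windows, eps=1e-12):
        n = len(windows)

        def p(*words):  # probability that all given words occur in one window
            return sum(all(w in win for w in words) for win in windows) / n

        scores = []
        for wi, wj in combinations(top_words, 2):
            pij, pi, pj = p(wi, wj), p(wi), p(wj)
            scores.append(math.log((pij + eps) / (pi * pj + eps)))
        return sum(scores) / len(scores)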
5 Experimental Analysis
The results obtained with the proposed method provided a view of the three
selected elements: adjectives, nouns, and verbs. Although the results so far are
not the best compared to the literature, they did indicate that the proposed
method can provide coherent topics for this language and domain. The proposed
method is applied with the LDA, LSA, and PLSA techniques, using 20, 50, and 100
topics to be discovered, in order to observe the method's behavior according to
the number of topics extracted. The following section presents in detail the
results obtained, evaluated with the topic coherence metric.
6 Results
This section presents the dataset used and the results obtained with the proposed
procedure.
6.1 Dataset
This section presents the dataset used (Table 1) and the experimental results
with the proposed procedure. The effects and differences when applying LDA,
LSA, and PLSA on a set of tweets are analyzed. The tweets were extracted between
May 2020 and November 2021, filtered for the Spanish language and Mexican territory,
and contain #COVID-19 in an economic context. We used the Twitter API to extract
and filter them. The dataset was pre-processed in two different ways for the three
different approaches. For the first approach, stop words, non-ASCII symbols, URLs,
mentions, punctuation marks, and symbols such as #, %, and & were removed. With the
pre-processed corpus, LDA, LSA, and PLSA were applied and evaluated with topic
coherence. Later, returning to the original corpus, only non-ASCII symbols were
removed, and the Freeling analyzer was used to obtain certain parts of each sentence
that make up the corpus. The dataset information is shown in Table 1,
where D represents the number of documents in the dataset, and T is the total
vocabulary, including stop words.
Table 1. Dataset

Dataset   20     50     100
LDA       1.15   1.11   0.94
LSA       0.93   0.93   0.96
PLSA      1.07   1.11   1.12
Table 3 shows the results obtained when working on the second sub-corpus.
The second approach was formed by extracting from the corpus only the adjectives
and nouns labeled by Freeling. It is observed that the results vary a little.
In the case of LDA, when 20 topics are discovered with their ten most
representative words, higher coherence levels are obtained than when 50 and
100 topics are discovered, respectively. In the case of LSA, the highest coherence
is obtained with 100 topics, and in the case of PLSA with 20 and 50 topics.
This behavior in the results is due to the original corpus. It was
Table 3. Results obtained with topic coherence and the adjectives and nouns recognized by Freeling

Dataset   20     50     100
LDA       0.94   0.92   0.84
LSA       1.00   1.02   1.05
PLSA      1.19   1.19   1.18
Table 4 shows the results obtained with the topic coherence metric when
evaluating LDA, LSA, and PLSA with the third proposed approach, extracting
the nouns, adjectives, and verbs labeled by Freeling from the original corpus.
In this experiment, the highest result for LDA was obtained with 20 topics,
compared to the results obtained with 50 and 100 topics. The same happens with
LSA at 100 topics and with PLSA at 50 topics.
Compared with Table 3, the results in Table 4 are higher for LDA with 20 topics
because verbs were incorporated; however, the nouns are the same, which prevented
it from obtaining results higher than those obtained previously. LSA did not
exceed the results obtained in Table 3; however, it did exceed those obtained in
Table 2, which shows that in this case incorporating adjectives, nouns, and verbs
provided information that LSA included in the discovered topics, and therefore
its coherence levels were higher. On the other hand, PLSA obtained the highest
results compared to those previously reported, which shows that for PLSA,
incorporating nouns, adjectives, and verbs eliminated much of the noise present
during the first approach.
Table 4. Results obtained with topic coherence and the adjectives, nouns and verbs recognized by Freeling

Dataset   20     50     100
LDA       1.06   0.88   0.86
LSA       1.00   1.01   1.04
PLSA      1.00   1.22   1.19
Among the topics recovered were the words vacuna (vaccine), sputnik, ola (wave),
contagios (infections), casos (cases), financieros (financial), política (politics),
and mexicanos (Mexicans). Table 5 shows 5 of the 20 topics obtained with LDA using
adjectives, verbs, and nouns, with only the five top words each. Topic 1 refers to
health measures as well as vaccination in companies. Topic 2 refers to the financial
crisis due to the pandemic. Topic 3 refers to vaccination against COVID in Mexican
territory. Topic 4 refers to financial activity and positive COVID cases. And
topic 5 refers to taxes in the country and the increase in infections during the
pandemic.
Table 5. Topics obtained with LDA and the adjectives, verbs and nouns recognized
by Freeling
text analysts and, in general, users who want to know the topics discussed on
social networks in this pandemic situation over a specific region.
The purpose of this article was topic discovery in Spanish tweets with LDA,
LSA, and PLSA using only features such as verbs, adjectives, and nouns. The
results obtained show that this aim was accomplished: extracting topics from
Spanish tweets about COVID-19 showed that it is possible to obtain important
information about the pandemic situation on the social network Twitter.
As future work, it is necessary to consider deeper information from
the texts, such as relationships or semantic roles. Additionally, deep
learning techniques could be implemented to discover topics with better coherence.
References
1. Agüero-Torales, M.M., Vilares, D., López-Herrera, A.G.: Discovering topics in twit-
ter about the COVID-19 outbreak in Spain. Procesamiento del Lenguaje Natural
66, 177–190 (2021)
2. Älga, A., Eriksson, O., Nordberg, M.: Analysis of scientific publications during the
early phase of the COVID-19 pandemic: topic modeling study. J. Med. Internet
Res. 22(11), e21559 (2020)
3. Amara, A., Hadj Taieb, M.A., Ben Aouicha, M.: Multilingual topic modeling for
tracking COVID-19 trends based on Facebook data analysis. Appl. Intell. 51(5),
3052–3073 (2021)
4. Anaya, L.H.: Comparing Latent Dirichlet Allocation and Latent Semantic Analysis
as Classifiers. ERIC (2011)
5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn.
Res. 3, 993–1022 (2003)
6. Bougteb, Y., Ouhbi, B., Frikh, B., et al.: Deep learning based topics detection. In:
2019 Third International Conference on Intelligent Computing in Data Sciences
(ICDS), pp. 1–7. IEEE (2019)
7. Figuerola, C.G.: Applying topic modeling techniques to degraded texts: Spanish
historical press during the transición (1977-1982). In: Proceedings of the Sixth
International Conference on Technological Ecosystems for Enhancing Multicultur-
ality, pp. 857–862 (2018)
8. Fuentes-Pineda, G., Meza-Ruiz, I.V.: Topic discovery in massive text corpora based
on min-hashing. Expert Syst. Appl. 136, 62–72 (2019)
9. Heintz, I., et al.: Automatic extraction of linguistic metaphors with LDA topic
modeling. In: Proceedings of the First Workshop on Metaphor in NLP, pp. 58–66
(2013)
10. Hernández, A.R., Lorenzo, M.M.G., Simón-Cuevas, A., Arco, L., Serrano-Guerrero,
J.: A semantic approach for topic-based polarity detection: a case study in the
Spanish language. Procedia Comput. Sci. 162, 849–856 (2019)
11. Lyu, J.C., Le Han, E., Luli, G.K.: COVID-19 vaccine-related discussion on twitter:
topic modeling and sentiment analysis. J. Med. Internet Res. 23(6), e24435 (2021)
12. Mena, A., Reátegui, R.: Topic identification from Spanish unstructured health
texts. In: Botto-Tobar, M., Montes León, S., Camacho, O., Chávez, D., Torres-
Carrión, P., Zambrano Vizuete, M. (eds.) ICAT 2020. CCIS, vol. 1388, pp. 351–362.
Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71503-8_27
13. Navarro-Colorado, B.: On poetic topic modeling: extracting themes and motifs
from a corpus of Spanish poetry. Front. Digit. Humanit. 5, 15 (2018)
14. Padró, L., Stanilovsky, E.: Freeling 3.0: towards wider multilinguality. In: LREC
2012 (2012)
15. Saorín, T.: Wikipedia de la A a la W, vol. 8. Editorial UOC (2012)
16. Sha, H., Hasan, M.A., Mohler, G., Brantingham, P.J.: Dynamic topic modeling
of the COVID-19 twitter narrative among US governors and cabinet executives.
arXiv preprint arXiv:2004.11692 (2020)
17. Sridhar, V.K.R.: Unsupervised topic modeling for short texts using distributed
representations of words. In: Proceedings of the 1st Workshop on Vector Space
Modeling for Natural Language Processing, pp. 192–200 (2015)
18. Xue, J., Chen, J., Chen, C., Zheng, C., Li, S., Zhu, T.: Public discourse and sen-
timent during the COVID 19 pandemic: using latent dirichlet allocation for topic
modeling on twitter. PLoS ONE 15(9), e0239441 (2020)
SimLDA: A Tool for Topic Model
Evaluation
1 Introduction
In supervised learning models, the ability of a trained model to predict a target
variable is evaluated using a test set. Evaluating the performance of unsupervised
learning algorithms, such as topic models, is less straightforward, and a measure
of success needs to be defined. Typically, to evaluate topic models, the metrics
discussed below can be utilised.
been done to improve the estimators [36], held-out perplexity does not give suffi-
ciently fine-grained resolution: Minka and Lafferty address similar concerns [25].
They demonstrate that held-out perplexity for two different models can be
almost identical but when inspected (using simulated data where the word-topic
and topic-document distributions are known), large performance differences are
seen [25]. Furthermore, a large-scale human topic labeling study by Chang et al.
[14] demonstrated that low held-out perplexity is often poorly correlated with
interpretable latent spaces.
In more recent work, coherence measures are typically preferred in topic
evaluation [29]. Coherence, unlike held-out perplexity, is highly correlated with
human interpretability of topics [28]. In a comprehensive study of multiple coher-
ence measures, the CV coherence score had the highest correlation with human
topic ratings [28]. This measure combines three components: the indirect cosine
measure, the Boolean sliding window, and the normalised pointwise mutual
information score; the simpler CNPMI score on its own performed almost as well
as the CV score. Other well-known coherence measures evaluated in their analysis
include CUCI and CUMass [24,28]. The CV score (used in this article) and the
simpler CNPMI score are now popular for evaluating topic modelling results.
These coherence measures, however, are not without their drawbacks since
they take only the top words per topic into account, and not the full distributions
over topics. Consequently, much detail of the learnt distributions is discarded.
Because these measures are not comprehensive evaluation tools, it is good
practice to inspect the extracted topics (read through the words in each topic)
where the metrics indicate good performance [14]. Here we propose using simulated
data along with a Kullback-Leibler divergence (KLD) measure to replace
extensive use of this tedious process, and we show how this metric gives more
fine-grained results than the CV coherence score for the same simulated data sets.
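In practice, the CV score can be computed with gensim's CoherenceModel; the documents, dictionary, and topic word lists below are placeholders rather than data from this study.

    # Minimal sketch of computing the CV coherence score with gensim.
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel

    texts = [["cat", "dog", "pet"], ["economy", "market", "trade"]]   # tokenized documents
    dictionary = Dictionary(texts)
    topics = [["cat", "dog"], ["economy", "market"]]                  # top words per extracted topic

    cv = CoherenceModel(topics=topics, texts=texts, dictionary=dictionary, coherence="c_v")
    print(cv.get_coherence())            # one CV score, averaged over topics
    print(cv.get_coherence_per_topic())  # per-topic scores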
1.3 Overview
2 Background
In this section, we introduce latent Dirichlet allocation (LDA) and the two
approximate inference techniques that will be used to showcase our topic model
performance evaluation methodology.
Although many types of topic models exist, ranging from latent semantic indexing
(LSI) [20] and its probabilistic counterpart, probabilistic LSI (pLSI) [18], to
correlated topic models (CTM) [8], latent Dirichlet allocation (LDA) is still one
of the most popular topic models [32].
While the LDA model can extract latent topics of any type from a wide range
of inputs, it is most commonly known for its ability to extract latent semantic
information from text corpora (collections of documents).
By applying LDA to text corpora, we can extract topics, each consisting of
a list of words, where each word in the vocabulary has a probability of being in
that topic. Similarly, after running the inference algorithm, each document in
the corpus is represented as a probability distribution over topics. The notation
used to represent these word-topic and topic-document distributions, as well as the
other distributions that characterise LDA, is listed in Table 1.
Fig. 1. Plate model of the LDA system as a Bayes net. The symbols used in this figure are
explained in Table 1.
Symbol   Description
M        Total number of documents
m        Current document
N        Number of words in the current document
n        Current word (in document)
K        Total number of topics
k        Current topic
Km       Number of topics per document
V        Total number of words in the vocabulary
v        Current word (in vocabulary)
v        Observed word (in vocabulary)
θm       Topic-document Dirichlet for document m
Zm,n     Topic-document categorical for word n in document m
Wm,n     Word-topic conditional categorical for word n in document m
φk       Word-topic Dirichlet for topic k
We use the LDA model in this article to perform topic modelling, and com-
pare the topic extraction results using two different approximate inference tech-
niques that are introduced in the following section.
Exact inference is intractable for many useful graphical models such as LDA
[7,9,10]. In fact, one cannot perform exact inference on any graphical model
where continuous parent distributions have discrete children [26]. A range of
[Figure 2: three word-topic distribution plots; x-axis: Words in dictionary for corpus.]
(a) Generated Word-Topic Distributions for a Laplace Distributed Data Set with Number of Topics being 7, with the 7th Topic as the Function Words Topic.
(b) Generated Word Distributions on a Gaussian Distributed Data Set with 15 Topics. This Data Set has a Narrow Width (Support) per Document.
(c) Shuffled Word-Topic Distributions for (a). This Illustrates that no Reliance is made on the Sequence of Words within a Topic or the Fact that Adjacent Topics are Significantly more Likely to Share Words.
Fig. 2. Plots of generated word-topic distributions from which samples are drawn in
the simulation of documents. The width of the word-topic distributions relates to the
support of each distribution.
To match each extracted topic to a ground truth topic, we compare the
extracted topics with the ground truth topic and choose the extracted topic that
is closest to it based on KLD. We repeat this process for all ground truth topics,
and the average KLD over all topics is taken to be the error for each model. It
is important to note that when generating a corpus, we are sampling from the
underlying true distributions. We compare the extracted distributions with the
ground truth distributions from which we sample, and not with the sampled
(empirical) distributions.
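A minimal sketch of this matching procedure is given below, assuming that both the true and the extracted topics are available as word-probability vectors over the same vocabulary.

    # Match each ground-truth topic to its closest extracted topic and average the KLD.
    import numpy as np
    from scipy.stats import entropy   # entropy(p, q) computes KL(p || q)

    def average_topic_kld(true_topics, extracted_topics):
        """true_topics, extracted_topics: arrays of shape (K, V) whose rows are
        word-topic distributions over the same vocabulary."""
        klds = []
        for true in true_topics:
            divergences = [entropy(true, ext) for ext in extracted_topics]
            klds.append(min(divergences))   # closest extracted topic for this ground truth
        return float(np.mean(klds))         # the model's error for this corpus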
3.3 Implementation
SimLDA was developed using EMDW, a C++ library for Bayesian statistics from
Stellenbosch University [12,22,30,31], and can be used directly from Python. It
has also been Dockerised so that it can be used on any machine (see Fig. 3). It
can be used as an HTTP API (accepting a PUT request with JSON payload),
through the LDA wrapper package or directly from the console. Below we show
an example of the input that is written to a JSON file (shown with Python-style comments):
    SimLDA_json = {
        "topics_per_doc": 3,         # topics per document
        "number_of_docs": 5000,      # number of documents
        "total_topics": 50,          # total number of topics in the corpus
        "words_per_doc": 150,        # words per document
        "total_vocab": 100000,       # number of words in the corpus vocabulary
        "weightingfactor": 0.25,     # scales the width of the distribution
        "tag": "api",                # the tag is used in the output path
        "laplace": False,            # True for Laplace, else Gaussian topics
    }
If the API is used, the documents are returned in JSON format, along with a
dictionary. If SimLDA is used natively, the documents are written to compressed
text files locally.
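When the HTTP API is used, the payload above can be submitted with an ordinary PUT request, for example with the requests library; the endpoint URL and the keys of the returned JSON are hypothetical placeholders.

    import requests

    # Hypothetical endpoint; the real host depends on where the container is deployed.
    response = requests.put("http://localhost:8080/simlda", json=SimLDA_json, timeout=600)
    result = response.json()
    documents = result["documents"]    # simulated documents (key name assumed)
    vocabulary = result["dictionary"]  # dictionary returned with the corpus (key name assumed)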
[Figure 3: block diagrams showing the lda_wrap package, input documents, a notebook or Python script, and the resulting document-topic and topic-word distributions.]
(a) Deployment of the Docker Container on Amazon Web Services (AWS) Elastic Container Registry (ECR)
(b) Deployment of the Docker Container on a Server or Local Machine
Fig. 3. Diagram showing how the SimLDA can be made available either on (a) a cloud
service such as amazon web services (AWS) or (b) on a server or local machine.
Once the simulated documents are created and made available, our LDA
wrapper package can be used to parse the created documents, and to interface
with the topic models. The LDA wrapper package also allows us to run a number
of iterations for each corpus type for the simulated data sets. On completion,
SimLDA writes the generated documents to file, or, if used as an API, returns
the documents as a JSON payload.
4 Method
Here we describe the method used to showcase SimLDA and our custom topic
modelling evaluation metric. We start by describing the simulated data sets, and
then describe the hyperparameters that are used in the experiments.
We chose two small synthetic data sets to illustrate the functionality of SimLDA.
Each data set consists of 20 groups of corpora, where each group contains corpora
consisting of a set number of documents per corpus. We generate multiple cor-
pora per data set so that we can compare performance over a number of samples
to have an idea of how performance varies with small changes to a corpus.
These data sets are small by real-world text topic extraction standards (in
terms of number of documents and words per document), which makes it harder
for LDA to learn their underlying distributions, since they contain less information.
By choosing harder data sets, differences between topic models are often more
apparent.
Furthermore, smaller corpora require less processing time. Choosing small
corpora allows us to:
1. Run collapsed Gibbs sampling for long chains and take multiple samples.
2. Generate many corpora per corpus generation parameters setting (such as
document length, number of topics per document, etc.).
3. Iterate over multiple hyperparameters for LDA (such as the Dirichlet hyper-
parameters, and number of epochs).
We now describe the two simulated corpora that are used in this work.
Smaller Simulated Data Set: For each corpus we use the following corpus
generation parameters (see Table 1): V = 100, N = 100, K = 7 and Km = 3.
This data set is smaller than the other in terms of number of topics and
vocabulary length. There are 100 words per document, which makes the total
number of observed words low—which would be the case even with many docu-
ments.
The ratio of topics per document to total topics is reasonably high (about
1:2) when compared to text topic extraction data sets. When performing LDA
on text corpora, we typically expect fewer topics within each document (often
only one or two, such as in the 20 Newsgroups corpus), but expect many more
topics for the entire corpus.
Larger Simulated Data Set: For each corpus we use the following corpus
generation parameters: V = 500, N = 120, K = 10 and Km = 5. This data set
has a larger vocabulary, though considerably smaller than most text corpora.
Each document contains five of the 10 available topics.
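The generative procedure behind such a corpus can be sketched as follows. The Gaussian-shaped word-topic distributions and the symmetric Dirichlet used for the per-document topic mixture are simplifying assumptions about SimLDA's internals, and the number of documents M is chosen arbitrarily for illustration.

    # Sketch of generating one simulated corpus with the larger data set parameters.
    import numpy as np

    rng = np.random.default_rng(0)
    V, N, K, Km, M = 500, 120, 10, 5, 100          # M (documents) is an assumption

    # Gaussian-shaped word-topic distributions centred at different vocabulary positions.
    centres = np.linspace(0, V, K, endpoint=False) + V / (2 * K)
    width = 0.25 * V / K                            # plays the role of the weighting factor
    vocab = np.arange(V)
    phi = np.exp(-((vocab[None, :] - centres[:, None]) ** 2) / (2 * width ** 2))
    phi /= phi.sum(axis=1, keepdims=True)           # rows are word-topic distributions

    corpus = []
    for _ in range(M):
        doc_topics = rng.choice(K, size=Km, replace=False)   # Km topics per document
        theta = rng.dirichlet(np.ones(Km))                   # topic-document mixture
        z = rng.choice(doc_topics, size=N, p=theta)          # a topic for every word slot
        words = [int(rng.choice(V, p=phi[k])) for k in z]    # draw each observed word
        corpus.append(words)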
Here we provide details about the hyperparameters that are chosen to be used
for our experiments.
We now present the topic extraction results for these two simulated data sets,
as well as for a well known text corpus, the 20 Newsgroups corpus [2].
5 Results
To objectively determine the degree to which the estimated topic-word distribu-
tions differ from the actual distributions from which the simulated data are gen-
erated, we present average KLD values for each of the two algorithms. For each
group of 20 corpora (each group consisting of a different number of documents
per corpus M —with the other hyperparameters fixed), we compute average KLD
over all topics for the two algorithms.
Using box plots, we show the average KLD against the number of documents
per corpus. This allows the median KLD and interquartile ranges (the latter
indicating the degree of variability in the data) of the algorithms to be compared
visually. These results are summarised in Fig. 4 (smaller simulated data set) and
Fig. 9 (larger simulated data set).
We also, for select corpora, plot the word-topic distributions inferred by the
algorithms, superimposed on the true distributions from which the corpora are
sampled. Average KLD over all topics is provided in these plots (which we
call word-topic plots), as an objective indication of the extent to which the
true and extracted distributions agree. Algorithm performance can also be visu-
ally assessed by examining the differences between the true distributions and
extracted distributions. In Fig. 4, we show the summary box plots for the exper-
iments performed on this data set. For corpora containing fewer documents, col-
lapsed Gibbs sampling outperforms VB in terms of both variability and median
value.
[Figure 4: box plot; y-axis: Average KL-Divergence for topics (per run); legend: Gibbs, VB.]
Fig. 4. Box plot showing the average KLD values for collapsed gibbs sampling and VB
as the number of documents per run increases for the Smaller Simulated Data Set. The
average KLD is computed over all topics for 20 runs. For smaller corpora, collapsed
gibbs sampling performs best. For larger corpora, VB starts to perform as well or even
better than collapsed gibbs sampling. (Color figure online)
We show only one example of poorer performance and one example of better
performance (based on KLD scores provided in each figure) of each algorithm.
In each plot, the ground truth topics are represented by red lines, and extracted
topics are represented by different colours. The closer the coloured curves are to
the red lines over all topics, the better the performance of the algorithm.
For corpora with 200 documents each, VB starts to outperform collapsed Gibbs
sampling in terms of the median value, but not in terms of variability. For corpora
with more than 200 documents, VB outperforms collapsed Gibbs sampling in
terms of the median value, and the variability starts to decrease to a level that
seems to be nearing that of collapsed Gibbs sampling.
Inspecting the topic extraction of individual corpora containing 50 documents
each (Figs. 5 and 6) allows us to compare the extracted topics (the coloured
curves) with the ground truth topics (as defined in SimLDA). It is clear that
collapsed Gibbs sampling extracts topics more correctly than VB does: in
Fig. 6 we see that the coloured curves do not match the red curves, and this
is reflected in the high KLD values of 0.49 and (at best) 0.19, compared with
the KLD values of 0.16 and 0.12 for the examples shown in Fig. 5 as extracted
by collapsed Gibbs sampling.
[Figure 5: two word-topic plots; y-axis: Probability of word per topic; x-axis: word index 0–100.]
(a) KLD = 0.16 (Average over All Topics). This is One of the Corpora where Collapsed Gibbs Sampling Performed the Worst (although it is still good performance).
(b) KLD = 0.12 (Average over All Topics). This is an Example of Good Topic Extraction by Collapsed Gibbs Sampling.
Fig. 5. True versus extracted topics identified by collapsed gibbs sampling from two
simulated corpora derived from the Smaller Simulated Data Set (Color figure online)
Although we have only presented results in this manner for a few select cor-
pora, one can inspect the results for each corpus. This is valuable when devel-
oping either new topic modelling techniques or when developing a new inference
algorithm.
[Figure 6: two word-topic plots; y-axis: Probability of word per topic; x-axis: word index 0–100.]
(a) KLD = 0.49 (Average over All Topics). This is an Example of Typical Topic Extraction by VB.
(b) KLD = 0.19 (Average over All Topics). This Shows Exceptionally Successful Topic Extraction by VB.
Fig. 6. True versus extracted topics identified by VB from two simulated corpora
derived from the Smaller Simulated Data Set. Each corpus contains 20 documents.
The result in (a) is a typical result, not an extreme one. In (b) this result for VB is
in fact the KLD outlier that can be seen in the summary box plot in Fig. 4 (Plotted
where M = 50 on the x-Axis).
[Figure 7: box plot; y-axis: Coherence score Cv (0.30–0.70); x-axis: Number of topics (4–9); legend: Gibbs, VB.]
Fig. 7. Cv scores for the two algorithms for the Smaller Simulated Data set for corpora
containing 100 documents.
[Figure 8: box plot; y-axis: Coherence score Cv (0.30–0.70); x-axis: Number of topics (4–9); legend: Gibbs, VB.]
Fig. 8. Cv scores for the two algorithms for the Smaller Simulated Data set for corpora
containing 500 documents.
Here the inference problem is harder to solve than when performed on the smaller
simulated data set, since there are more topics per document (6 topics, instead
of 3), which implies greater topic overlap within each document.
Over all the groups of corpora (from those containing 100 to those containing
500 documents each), collapsed Gibbs sampling outperforms VB with a large
margin in terms of variability as well as median value.
The word-topic plots show more detail with regard to these summarised
results. In Fig. 10, we show topics extracted using VB on two corpora containing
100 documents each. In (a) the topic extraction performance is very poor. In (b)
we can see that the algorithm identifies most of the underlying topics, but not
well.
Figure 11 shows topic extraction by collapsed Gibbs sampling. For these cor-
pora, collapsed Gibbs sampling successfully identifies the topics.
[Figure 9: box plot; y-axis: average KLD (0.2–1.0).]
Fig. 9. Box plot showing the average KLD values for the two algorithms as the number
of documents per run increases for the larger data set. KLD is computed over
all topics for 20 runs. It is clear that VB is the worst performing algorithm over this
range of corpora.
[Figure 10: two word-topic plots; y-axis: Probability of word per topic; x-axis: word index 0–500.]
(a) KLD = 1.3 (Average over All Topics). This is an Example of Poor Topic Extraction by VB.
(b) KLD = 0.77 (Average over All Topics). This is an Example of Good Topic Extraction by VB.
Fig. 10. True versus extracted topics identified by VB for two simulated corpora
derived from the Larger Simulated Data Set. Each corpus contains 100 documents.
[Figure 11: two word-topic plots; y-axis: Probability of word per topic; x-axis: word index 0–500.]
(a) KLD = 0.32 (Average over All Topics). This is a Typical Topic Extraction by Collapsed Gibbs Sampling.
(b) KLD = 0.3 (Average over All Topics). This is Another Typical Topic Extraction by Collapsed Gibbs Sampling.
Fig. 11. True versus extracted topics identified by collapsed gibbs sampling for two
simulated corpora derived from the Larger Simulated Data Set. Each corpus contains
100 documents.
To compare our KLD metric with coherence, we chose the corpus group
where M = 200, and plot the Cv coherence scores in box plot form in Fig. 12.
Collapsed Gibbs sampling performs better than VB for the correct number of
topics (K = 10), as well as for K = 9. For other numbers of topics, VB
performs either similarly to or better than collapsed Gibbs sampling. It is also
interesting to note that the correct number of topics does not give the highest
coherence score.
[Figure 12: box plot; y-axis: Coherence score Cv (0.30–0.70); x-axis: Number of topics (6–14); legend: Gibbs, VB.]
Fig. 12. Cv scores for the two algorithms for the Larger Simulated Data Set for corpora
containing 200 documents.
Fig. 13. Cv scores for the two algorithms. The performance is similar for K = 13, but
for other values of K, collapsed gibbs sampling performs much better than VB.
6 Discussion
SimLDA allows very large numbers of simulated documents to be created with
a wide range of hyperparameters. By varying these hyperparameters such as
number of topics per document and topic width, one can compare topic model
performance over a wide range of corpora. In this article, we demonstrate this
for the two simulated data sets.
Because the ground truth distribution of the simulated corpora is known,
we can easily compare the extracted topics with the word-topic distributions
used to create the corpora in the first place. By using an average forward KLD
over all the topics, we can quantify the error that a topic model makes for a
specific corpus. Since many corpora can be extracted using the same underlying
distributions, we can apply LDA to a number of these corpora, and inspect the
variability of the results. This gives an indication of the stability of the topic
model, inference technique used for topic extraction, or hyperparameters chosen.
For example, we see that in both the smaller simulated data set and the larger
simulated data set (Fig. 4 and 9), collapsed Gibbs sampling shows less variability
than VB does.
By inspecting these box plots, we can also see that although the general
performance of collapsed Gibbs sampling is better than that of VB by a large
margin, there are times when VB starts to do better than collapsed Gibbs sam-
pling. This can also be seen by looking at the coherence plot in Fig. 8. Should
one have only looked at specific text corpora (such as the 20 Newsgroup corpus,
shown in Fig. 13), this effect could have been missed.
In contrast to our results using SimLDA and KLD, plots of Cv scores reveal
that differences between the two algorithms appear to be very small, with a
large amount of variability in scores at each topic number setting. In the larger
simulated data set, the highest scores for both algorithms did not clearly
identify the correct number of topics. Our KLD metric can show the performance
differences between topic models more clearly than the standard Cv score because
we use the ground truth distributions in the KLD metric, and we work with
probabilities and not merely the word rank.
The visual nature of the word-topic plots is another advantage of our topic
modelling performance evaluation methodology. By using these plots we can see
the probabilities of a word being assigned to a topic, compared with the underly-
ing probability of that word in the topic (as part of the word-topic distributions
from which the corpus was generated). These word-topic plots can, moreover, be
inspected after every few epochs, allowing one to visually compare convergence
for different inference algorithms for the same corpus, or to compare convergence
for corpora with various hyperparameters.
sampling and VB, to perform topic modelling using LDA, and calculate the topic
modelling performance of these algorithms using a forward KLD measure. This
measure utilises the posterior word-topic distributions as well as the original
word-topic distributions from which the corpora were generated.
We plot the results using box plots which show the median values for both
inference algorithms over a range of corpus sizes for both simulated data sets.
Collapsed Gibbs sampling performs better than VB in both data sets overall,
but in the smaller simulated data set, when the number of documents is higher,
VB does marginally better than collapsed Gibbs sampling. This is a function of
the hyperparameters chosen for inference, as well as the corpus hyperparameters.
Being able to identify cases like this is one of the advantages of SimLDA.
We also provide word-topic plots to inspect the results of individual corpora
visually. These plots give a more detailed view of the information provided in the
box plots, and allow the user to see exactly where the topic modelling does well,
and where topics are incorrectly learned. The Cv scores are also computed over a
range of K for the two simulated data sets and compared with the KLD metric.
Coherence scores were not able to discriminate between the two algorithms as
well as the custom KLD metric does.
As future work, the use of synthetic data generated using SimLDA, together
with our KLD measure, could find application in research involving new topic
models or for comparing existing models and inference algorithms over a wider
range of corpora. Expanding the scope of these methods to include corpora
with diverse characteristics and data distributions could present opportunities
for future work and advance current understanding on which models are most
useful for specific types of datasets. SimLDA currently supports only topics that
have a Gaussian or Laplace shaped distribution. Future work could include the
addition of distributions having other properties. Additionally, SimLDA could
be extended to generate data for other similar graphical models.
References
1. Python Package Index - PyPI
2. 20 Newsgroups dataset
3. Albishre, K., Albathan, M., Li, Y.: Effective 20 newsgroups dataset cleaning. In:
2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelli-
gent Agent Technology (WI-IAT), vol. 3, pp. 98–101. IEEE (2015)
4. Asuncion, A., Welling, M., Smyth, P., Teh, Y.W.: On smoothing and inference for
topic models. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in
Artificial Intelligence, pp. 27–34. AUAI Press (2009)
5. Attias, H.: A variational Bayesian framework for graphical models. In: Advances in
Neural Information Processing Systems, pp. 209–215 (2000)
6. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing
Text with the Natural Language Toolkit. O’Reilly Media Inc., Sebastopol (2009)
7. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York
(2006)
8. Blei, D., Lafferty, J.: Correlated topic models. Adv. Neural. Inf. Process. Syst. 18,
147 (2006)
SimLDA: A Tool for Topic Model Evaluation 553
9. Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for
statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)
10. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn.
Res. 3(Jan), 993–1022 (2003)
11. Braun, M., McAuliffe, J.: Variational inference for large-scale models of discrete
choice. J. Am. Stat. Assoc. 105(489), 324–335 (2010)
12. Brink, D.: Using probabilistic graphical models to detect dynamic objects for
mobile robots (2016)
13. Cao, L., Fei-Fei, L.: Spatially coherent latent topic model for concurrent segmen-
tation and classification of objects and scenes. In: 2007 IEEE 11th International
Conference on Computer Vision, pp. 1–8. IEEE (2007)
14. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J., Blei, D.: Reading tea leaves:
how humans interpret topic models. In: Advances in Neural Information Processing
Systems, pp. 288–296 (2009)
15. Elberrichi, Z., Rahmoun, A., Bentaalah, M.A.: Using wordnet for text categoriza-
tion. Int. Arab J. Inf. Technol. (IAJIT) 5(1) (2008)
16. Griffiths, T.: Gibbs sampling in the generative model of latent dirichlet allocation
(2002)
17. Griffiths, T.: Gibbs sampling in the generative model of latent dirichlet allocation-
gruffydd@ psych (2004)
18. Hofmann, T.: Probabilistic latent semantic analysis. arXiv preprint
arXiv:1301.6705 (2013)
19. Knowles, D.A., Minka, T.: Non-conjugate variational message passing for multi-
nomial and binary regression. In: Advances in Neural Information Processing Sys-
tems, pp. 1701–1709 (2011)
20. Landauer, T.K., Dumais, S.T.: A solution to Plato’s problem: the latent semantic
analysis theory of acquisition, induction, and representation of knowledge. Psychol.
Rev. 104(2), 211 (1997)
21. Loper, E., Bird, S.: NLTK: the natural language toolkit. arXiv preprint cs/0205028
(2002)
22. Louw, E.J.: A probabilistic graphical model approach to multiple object tracking
(2018)
23. Mcauliffe, J.D., Blei, D.M.: Supervised topic models. In: Advances in Neural Infor-
mation Processing Systems, pp. 121–128 (2008)
24. Mimno, D., Wallach, H., Talley, E., Leenders, M., McCallum, A.: Optimizing
semantic coherence in topic models. In: Proceedings of the 2011 Conference on
Empirical Methods in Natural Language Processing, pp. 262–272 (2011)
25. Minka, T., Lafferty, J.: Expectation-propagation for the generative aspect model.
In: Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelli-
gence, pp. 352–359. Morgan Kaufmann Publishers Inc. (2002)
26. Murphy, K.P.: Dynamic Bayesian networks: representation, inference and learning.
Ph.D. thesis, UC Berkeley, Department of Computer Science (2002)
27. Rehurek, R., Sojka, P.: Software framework for topic modelling with large cor-
pora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP
Frameworks. Citeseer (2010)
28. Röder, M., Both, A., Hinneburg, A.: Exploring the space of topic coherence mea-
sures. In: Proceedings of the Eighth ACM International Conference on Web Search
and Data Mining, pp. 399–408. ACM (2015)
29. Stevens, K., Kegelmeyer, P., Andrzejewski, D., Buttler, D.: Exploring topic coher-
ence over many models and many topics. In: Proceedings of the 2012 Joint Confer-
Abstract. This paper reports on creating virtual assistants (VA) that enable users
to query a database in natural language. Building SQL queries from natural
language is a complicated task. We build the query via a conversation between
the user and the virtual assistant, allowing the users to describe their needs during
a more detailed conversation. The VA uses information about the schema of the
data source to guide the user. The query is built incrementally. To test the proposed
method, we implemented a dialogue system for querying a part of the Open Food
Facts database. The evaluation results show that users successfully completed the
task in most cases. The easiest task was completed by 72% of users, the most
sophisticated task was completed by 58% of users. To finish the tasks, users had
to provide parameters that the VA prompted for, to sort the records, and to add
filtering conditions using natural language. The proposed approach allows the
building of similar VAs for different databases.
1 Introduction
Nowadays, a wide range of open data stored in various databases is available. Frequently,
information from databases cannot be accessed by people who need it, as databases
can be queried either through limited, pre-built user interfaces or by writing queries in
SQL, SPARQL, or other query languages. Thus, for a user without in-depth technical
knowledge, full access to the data is impossible. Typical users who could benefit from
the knowledge base information include non-IT researchers, journalists, entrepreneurs,
etc., and typical knowledge bases include open data archives, sales data, etc. Currently,
there is no easy solution. The most typical solution involves asking the IT specialists for
help, trying to load data in Excel, or using graphical query building tools.
In this research, we are looking for solutions that would help people without in-
depth technical knowledge in accessing the information stored in various databases. We
propose to build database queries in a conversation between the user and the virtual
agent. We allow the users to describe their needs in dialog during which the users can
provide more details, if needed, and the VA can help the users by asking questions and
guiding them.
2 Previous Work
There have been many attempts to implement solutions that allow writing queries in
natural language that are later automatically converted into SQL or another technical
query language. Typically, queries are generated using natural language processing
workflow components. The study [10] proposed a workflow with six components for
translating natural language questions into structured SPARQL queries. However, each
of the workflow components can create noise, thereby reducing the ability to generate
a correct query. The author of [7] proposed a simpler workflow that tried to rely
only on keyword detection. Also, [13] developed a SPARQL query generator capable of
coping with noisy inputs. After generating query hypotheses, the system ranked them
based on their structural similarity to the input question.
In a study by [1], 24 systems with a natural language interface for databases have
been evaluated. There are systems that 1) use keywords, 2) use samples (pattern-based),
3) parse text, and 4) use grammar. Each of the systems has been evaluated as to its ability
to interpret 10 questions with a varying degree of complexity. For example, systems had
to find answers to the question All movies starring Brad Pitt from 2000 until 2010 or
All movies with the same genres as ‘Sin City’. Testing results demonstrate that keyword
systems are sufficient to understand simple questions. However, to deal with questions
whose interpretation requires generation of sub-queries, parsing systems that clarify the
structure of the question are more suitable. Overall, grammar-based systems are the most
powerful, but are highly dependent on manually designed rules.
As methods based on neural network algorithms have become more popular, researchers
have begun to study end-to-end techniques for generating queries from questions posed
in natural language. The architecture of solutions based on neural network algorithms
is very diverse. The inputs of the Seq2SQL model [14] contain a question and the names
of the table columns, while the output has three components that match the parts of an
SQL query: aggregate function(s), column(s), and filter condition(s). A reinforcement
learning algorithm has been employed: it uses the result of the generated query as a reward. The
SQLNet [9] uses the sequence-to-set architecture based on the query template (sketch)
to be filled in with column names and values. The column attention mechanism is used
to determine the columns. With this approach, no query structure needs to be generated.
The sequence-to-SQL approach is also used in [8]. The input of the model contains a
question and a table consisting of column names and cells. At each time step, a channel
is selected for predicting an SQL keyword, a column name, or a cell. The study [4] uses
a two-step neural model, first, by generating the SQL Query Template (sketch) from
the question, and second, by generating a full SQL query by relying both on the text
query and the acquired sketch. There is also a different way to answer questions by
relying on knowledge bases: instead of generating knowledge base queries, [6] train the
memory-to-sequence model for a task-oriented dialog.
The recent advances in automatic speech recognition have promoted the development
of voice-based interfaces for database querying. The EchoQuery system developed by [5] uses
voice command device Echo from Amazon and the voice command service Alexa to
provide a stateful dialogue-based query interface between the user and the database.
There are several labeled datasets that are used for training and testing systems
that address the challenge of employing natural language for retrieving information
from relational datasets, such as ATIS [3], WikiSQL [14], Spider [11], and CoSQL [12],
which is the dialogue version of Spider. The ATIS corpus contains information on air
traffic in the United States. The ATIS0 Pilot has 2,884 questions in natural language
about information from 28 tables (125 fields). The WikiSQL corpus includes 80,654
queries about the information in 24,241 Wikipedia tables. The Spider dataset contains
200 databases in 138 domains, and 5,693 SQL queries corresponding to 10,181 questions
in the natural language.
Questions for querying information from various domains can be very different.
Besides, the question labeling process can be expensive, time-consuming, and requires expert
knowledge. Many previous solutions assume that the entire query will be described
by a single expression in the natural language (a sentence or a few), but it may be
very challenging for a human to describe complex queries in this manner. Unlike the
approaches described above, we offer to take a different approach: we propose building
the query as a conversation between the users and the virtual agent. We want to allow
the users to describe their needs in a more detailed conversation during which the users
can provide more details, if needed, and the VA can help the users by asking questions
and guiding them.
3 Methodology
In this research, we are conducting a feasibility study, and the research question of our
paper is whether it is possible to create a virtual assistant that helps users without deep
technical knowledge build SQL queries and access the necessary information from databases.
We are looking for a solution that would be easy to adapt to different databases and
would not require collecting and annotating large datasets.
To answer this question, we are building a prototype that demonstrates how we can
build an example virtual assistant that allows users to access one particular database.
The creation of such a prototype would open opportunities for other researchers to build
virtual assistants for other databases using techniques similar to those we propose.
To validate the prototype, we are conducting a user study to understand whether the
created prototype allows users to make queries to the database. In this study, we analyze
user behavior, we analyze which tasks are easier and which are more difficult, and
we count in what percentage of cases users succeeded with the task.
4 Proposed Solution
The VA uses knowledge about the data source to query. Although there have been
attempts to convert any natural language query into a SQL query, in this research we use
database schema information that describes the structure of the database – tables, fields,
field types, links, indexes, etc.
The VA builds the query incrementally. Initially, we start with a query template and
add query elements by analyzing the user's input with natural language understanding
(NLU) techniques: intent detection and named entity recognition. We use an intent
detection component that is based on fastText word embeddings and a convolutional
neural network [2]. The input of the classifier is the embedding vector of the user's
utterance; the output is a probability distribution over all possible intents.
The query is built up with each input from the user. Typical intents allow the user to specify
a query template, fields for selection, the sorting order, filtering conditions, the number
of records to retrieve, and typical entities used with these intents include table and field
names, and filtering values. The query is built up by relying on the detected intent as a
command for the query builder, and entity values are the attributes of these commands.
During the conversation, the parts of the query are stored in conversation context vari-
ables, and the SQL query is generated only when we need to execute it. This approach
allows us to focus the query building process on describing the data we want to get and
not the syntax of the query language.
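To make this mechanism concrete, the following is a minimal sketch of how detected intents and entity values could be accumulated in conversation context variables and turned into SQL only at execution time; the class, function, and field names are illustrative and do not reproduce the actual implementation of the prototype.

    # Minimal sketch of incremental query building from NLU output (illustrative names).
    class QueryContext:
        """Conversation context variables that describe the query being built."""
        def __init__(self, table=None):
            self.table = table        # set by the chosen query template
            self.fields = ["*"]       # filled by 'add/remove field' intents
            self.filters = []         # (field, operator, value) tuples
            self.order_by = None      # (field, "ASC"/"DESC")
            self.limit = None         # number of records to retrieve

        def to_sql(self):
            """Generate the SQL statement only when it has to be executed."""
            sql = f"SELECT {', '.join(self.fields)} FROM {self.table}"
            if self.filters:
                sql += " WHERE " + " AND ".join(
                    f"{f} {op} {v!r}" for f, op, v in self.filters)
            if self.order_by:
                sql += f" ORDER BY {self.order_by[0]} {self.order_by[1]}"
            if self.limit:
                sql += f" LIMIT {self.limit}"
            return sql

    def apply_intent(ctx, intent, entities):
        """Treat the detected intent as a command and the entity values as its arguments."""
        if intent == "select_fields":
            ctx.fields = entities["field_names"]
        elif intent == "add_filter":
            ctx.filters.append((entities["field_name"],
                                entities.get("operator", "="), entities["value"]))
        elif intent == "sort_order":
            ctx.order_by = (entities["field_name"], entities.get("direction", "ASC"))
        elif intent == "record_count":
            ctx.limit = int(entities["number"])
        return ctx

    # Example turns: "show the 5 UK products with the most sugar"
    ctx = QueryContext(table="products")
    apply_intent(ctx, "select_fields", {"field_names": ["product_name", "sugars_100g"]})
    apply_intent(ctx, "add_filter", {"field_name": "countries", "value": "United Kingdom"})
    apply_intent(ctx, "sort_order", {"field_name": "sugars_100g", "direction": "DESC"})
    apply_intent(ctx, "record_count", {"number": "5"})
    print(ctx.to_sql())
    # SELECT product_name, sugars_100g FROM products WHERE countries = 'United Kingdom'
    # ORDER BY sugars_100g DESC LIMIT 5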
The dialogue between the user and the VA is guided by a specific scenario managed
by the Dialog Manager (see Fig. 1). A combined dialogue style is used: it combines the
features of the guided and the free dialogue. In the course of the guided dialogue, the
VA determines the course of the conversation by asking questions or asking the user
to choose one of the provided options. In the course of the free dialogue, the initiative
is given to the user. The users express their wishes or ask for something, and the VA
responds accordingly.
The solution consists of two parts. One part does not depend on a particular data
source and the other part contains specific data for a particular database. The dialogue
scenario, the functions that encapsulate the API calls, and the VA responses are three
database-independent parts of the solution; they are the same for any database. Entities,
query templates, the way users express their intents to use these templates, and
information about the database schema depend on a specific database.
Building the query in the form of a conversation also helps to deal with the problem
of multilingualism, as we can use the same intents and named entities for all languages
and can independently train the NLU models for each language. Thus, multilingualism
is handled at the NLU level, while query building and SQL generation are language
independent. Initially, we focus on SQL data sources because they are the most popular,
but the same method can be later adapted to access SPARQL, CKAN, GraphQL, and
other information sources.
We are also investigating the option of building multilingual intent detection and
named entity recognition models, as this approach would help us in building models for
less-resourced languages by leveraging data from well-resourced languages (in particular
English).
5 Prototype
We have implemented a prototype VA to evaluate the suitability of the proposed solution.
For our experiments, we selected a popular open data source - the Open Food Facts1
database. This database represents a typical use case that we address in this research: the
database contains information that is valuable for many non-IT specialists or the general
public, but this database cannot be queried by non-IT specialists without knowledge of
SQL.
In the process of developing a VA for querying databases, one first needs to define
templates that will be offered to the user. Templates reflect the most common tasks that
users usually wish to accomplish. Internally, we represent each template as consisting of
four parts that correspond to the SQL query as follows:
For each template, we define a list of parameters whose values need to be acquired
from the user during a conversation with the VA.
Five types of entities are identified in user utterances: table names, field names,
number of records, sorting order, field alias names.
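Purely as an illustration (the actual templates of the prototype are not reproduced here), such a template could be stored as four SQL fragments plus the list of parameters to be collected during the conversation; the table, field, and parameter names below are hypothetical.

    # Hypothetical template: "products of a given category, ranked by a nutrient field".
    # The four parts mirror the SELECT / FROM / WHERE / ORDER BY clauses of the SQL query.
    template = {
        "select":   "product_name, {field}",
        "from":     "products",
        "where":    "categories LIKE {category}",
        "order_by": "{field} DESC",
        # parameters whose values the VA must ask for during the dialogue
        "parameters": ["field", "category"],
    }

    # Filling the template once the parameter values have been collected:
    filled = (f"SELECT {template['select']} FROM {template['from']} "
              f"WHERE {template['where']} ORDER BY {template['order_by']}")
    print(filled.format(field="vitamin_c_100g", category="'%beverages%'"))
    # SELECT product_name, vitamin_c_100g FROM products
    # WHERE categories LIKE '%beverages%' ORDER BY vitamin_c_100g DESC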
The only intents that depend on the data source are the intents that express the user’s
desire to proceed with a certain type of query template. Other intents do not depend on
the data source. The user can ask a question about the database, table, or field. Even if
1 https://world.openfoodfacts.org/.
the question contains a table/field name associated with the data source (recognized as
an entity), the form of the question does not depend on the data source. There are intents
that allow the user to change the list of fields, search in a different table, change the
sorting order of records, choose the number of records to display, and change filtering
conditions. The intent classifier is trained using 5-fold cross-validation. It recognizes 14
intents with an accuracy of 79.14%. The training data contains 163 utterance examples,
11 examples per intent on average. Some examples for the intent sort order: change
sorting order to descending, can I order by x field ascending, I would prefer sorting in
descending order, show me the records having the largest value of a, show records with
the least amount of x.
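For reference, an intent classifier in the spirit of the fastText-embedding CNN of [2] could be sketched in Keras as below; the layer sizes, sequence length, and embedding dimensionality are assumptions rather than the configuration actually used in the prototype.

    import tensorflow as tf

    NUM_INTENTS = 14     # number of intents in the prototype
    MAX_TOKENS = 20      # assumed maximum utterance length
    EMB_DIM = 300        # dimensionality of pre-trained fastText word vectors

    # The input is the sequence of fastText embeddings for one utterance; a 1-D
    # convolution followed by max-pooling produces the utterance representation.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(MAX_TOKENS, EMB_DIM)),
        tf.keras.layers.Conv1D(filters=128, kernel_size=3, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_INTENTS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()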
The dialog scenario consists of dialog states and transitions. One can move forward
in a conversation between various states of the scenario if a specific intent is recognized
in the user’s utterance or if any other condition defined for the transition is fulfilled.
The conversation starts with a guided dialog:
Next, the control is passed to the user. The user can adjust the parts of the query in
natural language. The VA processes the user’s input to detect the intent and entities, and
updates the query accordingly.
In this prototype we have implemented the following questions or commands that
the user can give at any time during the conversation:
• The user can ask what tables are included in the database.
• The user can ask what information is included in table X.
• The user can ask what fields are included in table X.
• The user can filter the query by specific field values and remove the filter from a field.
• The user can specify the number of records in the query.
• The user can specify which fields have to be included or removed from the query.
• The user can specify which table to search.
• The user can specify by which field name and in which order the records should be
sorted.
• The user can start composing a new query.
As the Open Food Facts database contains products from all over the world and
its data completeness varies, we selected a subset of the database containing 383,725
records for products available in the United States and the United Kingdom. This subset
was imported into a SQL database. In the original database, there were approximately
200 fields most of which were empty. We used only 20 fields containing information
about product category, energy, salt, sugars, fat, carbohydrates, proteins, fiber, vitamins
A, B12, C, D, and minerals iron and magnesium.
We implemented three templates from which the user can choose:
Table 1. Fragment of a dialog between the user and the VA based on the Food Facts Database
A sample conversation between the user and the VA is provided in Table 1. When
the user selects nutritional facts of a specific product, the VA asks to provide the name of
the product. When the user selects a template concerning a product that contains specific
vitamins or minerals, the VA asks to provide the name of the mineral or the vitamin field.
When the user selects a template about the products filtered by a specific category, the
VA asks to provide the name of the category.
For now, we use only textual input and output. As the result retrieved from the SQL
Server might contain several rows and fields, the voice output would not be efficiently
consumable by the user.
6 Evaluation
To evaluate the implemented VA prototype, 15 respondents were asked to solve three
tasks in a conversation with the VA (see Table 2). During every task, the users were
required to provide values of template parameters and to modify the query by asking the
VA to sort the records or to change specific filtering conditions. Not every respondent
tried all three tasks.
Qualitative evaluation of the errors allowed us to determine the main causes that led
to the failure to complete the tasks.
• Some users did not follow the instructions and did not use the ‘help’ option. The first
mandatory step of each dialog scenario was to choose one of the predefined templates.
Some users started by giving tasks to the VA prior to choosing a template. As no basic
template was selected, composing the query failed. To avoid such initial failure, some
restrictions were introduced in the dialog scenario to prevent the user from proceeding
without selecting a template.
• The second cause was the users’ wish to refer to fields with simplified names though
the VA presented a list of valid field names, e.g., the field ‘vitamin_c_100g’ was called
just ‘vitamin C’. This problem was solved by training the named entity recognizer to
recognize the field alias names and to map them to the corresponding field names in the
database schema.
• The third cause was the users’ inability to adjust the initial query. All possible actions
were described in the help section, and individual tips were given to users to advise
on how to proceed.
• On some occasions, the users provided just keywords, e.g., top5 beverages, sug-
ars_100g. The intent detection module failed to find a correct intent for inputs like
that. In fact, such inputs contained three intents: the number of records to show, results
filtered by product category that matches the beverages, and also a request to include
field ‘sugars_100g’ in the selection. This problem has not yet been solved.
In this paper, we have presented a VA that helps users compose an SQL statement
for querying a database in natural language. The solution allows adapting the
dialogue system to query any database using various query languages. As most
parts of our system do not depend on a particular data source, only the intents for choosing
query templates, the set of named entities, and the query templates corresponding to a
different data schema must be adjusted.
To test our approach, we have implemented the VA prototype to query the Open
Food Facts database, but the same method can be applied also to other databases. 15
respondents were asked to perform three tasks. Analysis of conversations demonstrates
that this approach can succeed, if the dialog is very intuitive and, as users do not like
to read long instructions, short tips displayed during the dialog help to achieve the
desired result. The lowest success rate (68%) was achieved during the task that required
specifying filtering conditions. Other tasks required specifying the sorting order of the
records.
The created VA prototype clearly demonstrates that it is possible to build a VA that
allows its users without deep technical knowledge to build SQL queries and access the
necessary information from databases. Although the prototype VA needs some informa-
tion about the database schema and query templates, it can be adapted to other databases,
and to implement such a VA, we do not need to collect and annotate large datasets. Standard
VA development techniques have been used in developing the VA prototype - rule-based
dialogue scenarios, intent detection, and entity recognition. This opens opportunities for
other researchers to build VAs for other databases using an approach similar to what we
propose.
There are still some limitations and issues that have not yet been addressed:
• When the values of the template parameters are collected or filtering conditions are
added, no checking is done if the value entered by the user is valid for the field type. It
should be checked in the future. A message explaining the error should be displayed
if an invalid value is provided.
• When the SELECT or WHERE part of the query is adjusted, the user can specify only
table fields and not the calculated fields that use mathematical functions. Calculated
fields should be defined in the initial query template.
• Two types of predicates can be entered for filtering conditions: the standard comparison
operators (=, !=, <, >, <=, >=) and the operator BETWEEN. The operator can be
expressed in words; for example, should not exceed is translated as <=. All filtering
conditions set out in separate utterances of a query are joined with the logical operator
AND. If the OR operator is required, the user must specify it in a single utterance, e.g.,
the value of energy_100g is larger than 20.5 or less than 15.4 is translated as
energy_100g > 20.5 OR energy_100g < 15.4 (a small sketch of this mapping is given after this list).
• Records can be sorted only by a single field.
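The sketch below illustrates one possible way to map such predicate phrases to SQL operators and to join conditions from separate utterances with AND; the phrase table and function names are illustrative, not the prototype's actual rules.

    import re

    # Illustrative phrase-to-operator table.
    PHRASE_TO_OP = {
        "is larger than": ">",
        "is less than": "<",
        "should not exceed": "<=",
        "is at least": ">=",
        "is equal to": "=",
    }

    def parse_condition(utterance):
        """Turn one filtering utterance into a (field, operator, value) triple."""
        for phrase, op in PHRASE_TO_OP.items():
            m = re.search(rf"(\w+)\s+{phrase}\s+([\d.]+)", utterance)
            if m:
                return (m.group(1), op, float(m.group(2)))
        return None

    def combine(conditions):
        """Conditions from separate utterances are joined with AND."""
        return " AND ".join(f"{f} {op} {v}" for f, op, v in conditions)

    conds = [parse_condition("the value of energy_100g is larger than 20.5"),
             parse_condition("sugars_100g should not exceed 15")]
    print(combine(conds))   # energy_100g > 20.5 AND sugars_100g <= 15.0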
These limitations are related not to the proposed approach but to the scale of the
experiment. The methods already described can be used to make the prototype more
complete. The listed issues can be addressed in the future, and we are also planning to
work on more advanced SQL features, such as grouping, counting, joining tables etc.
Acknowledgments. The research leading to these results has received funding from the research
project “Competence Centre of Information and Communication Technologies” of EU Structural
funds, contract No. 1.2.1.1/18/A/003 signed between IT Competence Centre and Central Finance
and Contracting Agency, Research No. 2.3 “Neural network machine learning techniques for
automated creating of virtual assistant dialog scenarios”.
References
1. Affolter, K., Stockinger, K., Bernstein, A.: A comparative survey of recent natural language
interfaces for databases. VLDB J. 28(5), 793–819 (2019). https://doi.org/10.1007/s00778-
019-00567-8
2. Balodis, K., Deksne, D.: FastText-based intent detection for inflected languages. Information
10(5), 161 (2019)
3. Hemphill, C.T., Godfrey, J.J., Doddington, G.R.: The ATIS spoken language systems pilot
corpus. In: Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley,
Pennsylvania (1990)
4. Hosu, I.A., Iacob, R.C.A., Brad, F., Ruseti, S., Rebedea, T.: Natural language interface for
databases using a dual-encoder model. In: Bender, E.M., Derczynski, L., Isabelle, P. (eds.)
Proceedings of the 27th International Conference on Computational Linguistics, pp. 514–524.
ACL, Santa Fe (2018)
5. Lyons, G., Tran, V., Binnig, C., Cetintemel, U., Kraska, T.: Making the case for query-by-
voice with EchoQuery. In: SIGMOD 2016: Proceedings of the 2016 International Conference
on Management of Data, pp. 2129–2132. ACM, New York (2016)
6. Madotto, A., Wu, C.S., Fung, P.: Mem2Seq: effectively incorporating knowledge bases into
end-to-end task-oriented dialog systems. arXiv preprint arXiv:1804.08217 (2018)
7. Shekarpour, S., Auer, S., Ngomo, A.C.N., et al.: Keyword-driven SPARQL query gen-
eration leveraging background knowledge. In: WI-IAT 2011: Proceedings of the 2011
IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent
Technology, vol. 1, pp. 203–210. IEEE Computer Society, New York (2011)
8. Sun, Y., Tang, D., Duan, N., et al.: Semantic parsing with syntax-and table-aware SQL gen-
eration. In: Proceedings of the 56th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), pp. 361–372. ACL, Stroudsburg, PA (2018)
9. Xu, X., Liu, C., Song, D.: SQLNet: generating structured queries from natural language
without reinforcement learning. arXiv preprint arXiv:1711.04436 (2017)
10. Yahya, M., Berberich, K., Elbassuoni, S., Ramanath, M., Tresp, V., Weikum, G.: Deep answers
for naturally asked questions on the web of data. In: Proceedings of the 21st International
Conference on World Wide Web, pp. 445–449. Association for Computing Machinery, New
York (2012)
11. Yu, T., Zhang, R., Yang, K., et al.: Spider: a large-scale human-labeled dataset for complex
and cross-domain semantic parsing and text-to-SQL task. In: Proceedings of EMNLP 2018,
pp. 3911–3921. ACL, Stroudsburg, PA (2018)
12. Yu, T., Zhang, R., Er, H., et al.: CoSQL: a conversational text-to-SQL challenge towards cross-
domain natural language interfaces to databases. In: Proceedings of EMNLP-IJCNLP 2019,
pp. 1961–1979. ACL, Stroudsburg, PA (2019)
13. Zafar, H., Napolitano, G., Lehmann, J.: Formal query generation for question answering over
knowledge bases. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 714–728.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_46
14. Zhong, V., Xiong, C., Socher, R.: Seq2SQL: generating structured queries from natural
language using reinforcement learning. arXiv preprint arXiv:1709.00103 (2017)
Neural Machine Translation for Native
Language Aymara to English
1 Introduction
The Aymara language (ISO codes ayc, ays) is traditionally spoken in the southern zone of
Peru, in Bolivia, and in northern Argentina and Chile. In the language itself, the correct
spelling is Aymara [1]; more precisely, it can be seen in Fig. 1.
In computing, rule-based machine translation originated around 1950; example-based
machine translation followed in the 1980s, statistical machine translation appeared in the
1990s, and around 2015 the field converted to a new kind of translation machine, neural
machine translation. The latter has very interesting applications and very good results,
for example in industry, where it is best known through the Google Translate application.
Machine translation is an important research topic in the field of artificial intelligence,
which allows machines to learn to automatically translate one language into another [3].
There are many translation studies for the most widely spoken languages in the world,
such as English, Spanish, Chinese, and Portuguese. However, the problem is that there
are hardly any studies on the native languages of Latin America, because they are poorly
known and/or endangered, and there is no corpus of translation data to train neural
machine translation (NMT) models. One of the factors
to take into account is the bilingual corpus; it is a critical resource, since such corpora
are the basis of any state-of-the-art machine translation system, and building a parallel
corpus is usually a complex and very expensive operation [4]. Another important thing
to consider is the existing models: current NMT systems use sequence-to-sequence neural
networks to generate the target translation word by word, so that the word generated at
each time step and its counterpart in the reference are as consistent as possible [5]. This
type of translation requires a set of example translations between the languages we want
to translate, but such models are so far the most widely used and give good results.
The objective of this work is to build a small Aymara-English data set to study the
behavior and operation of a sequence-to-sequence model with an RNN architecture for
translation from the native Aymara language to English.
The present work consists of three main parts: 1) due to the lack of a data set, we began
by collecting texts of conversations in Aymara; 2) the next step was to create a data set
structure with natural language processing (NLP) techniques, to standardize the scripts
and format them as input for a recurrent neural network (RNN); 3) finally, the seq2seq
model was trained on the collected translations, translation tests were carried out on
examples written in Aymara, and the model returned the translation in its English version.
2 Methodology
The procedure of this research work is according to Fig. 2.
In the document [6], the texts of the conversations are written in Aymara with their
respective translations in Spanish. The Spanish part was translated into English by three
of the authors, whose mother tongue is Aymara.
The next stage applies natural language processing (NLP) techniques, which include
standardizing the writing, transforming characters into ASCII-compatible equivalents,
cleaning strange characters, and converting the data into the input format of the
recurrent neural network (RNN).
Computers must receive input in a specific format so that they can understand natural
languages as humans do [7]. In the final stage, the seq2seq model is trained with the
previously pre-processed data in the input format of the neural network.
3 Aymara
According to Ministerial Resolution No. 1218-85-ED of November 18, 1985, the Aymara
alphabet was standardized with 32 graphemes (a, ä, ch, chh, ch', i, ï, j, k, kh, k', l, ll, m,
n, ñ, p, ph, p', q, qh, q', r, s, t, th, t', u, ü, w, x, y) [1]. However, the varieties of Aymara
from Chile, Tacna, Moquegua, and Jacaru, which have the velar nasal sound nh, were not
taken into account. It should also be kept in mind that Peru has 44 languages, 3 in the
mountains, 1 on the coast and 40 in the jungle, as well as their varieties [8].
One of the characteristics of the Aymara language is the influence of the
Spanish language, because in the countries where the native Aymara language
is found, the Spanish language is also declared the official language. Therefore,
globalization and the migration of Aymara speakers to the cities have generated
considerable influence from Spanish on Aymara, and today Aymara is spoken with
words borrowed from Spanish.
The linguistic study of Aymara is still in progress, and many characteristics of Aymara
writing and speech remain to be defined. Based on existing studies of the typology of the
Aymara language, we present in Tables 1, 2, 3 and 4 some rules of morphophonological
operation. According to [8], Aymara lacks the voiced stop consonants /b/, /d/, /g/ and
the fricative consonants /f/ and /θ/ of Spanish; hence, we note that, due to a substratum
process (aimarization), Hispanic terms are adapted when borrowed, such as:
According to [8], Aymara has 140 suffixes: 40 verbal derivations (DV), 31 verbal
inflections (FV), 15 nominal derivations (DN), 25 nominal inflections (FN), 17
independent suffixes (SI), and 12 fossilized suffixes (SF).
Table 3. Lacks the diphthongs /ue/ and /ie/, or different vowel sequences.
The input texts can contain various characters, such as initial capital letters, different
writing characters, etc.; therefore, it is very important to standardize the input text. The
first step is Unicode normalization to split accented characters and replace compatibility
characters with their ASCII equivalents. In this step, we use the tensorflow_text package
of the TensorFlow library.
Tokenization is the first step before natural language processing; it is the delimitation
of sequences of words in a document, and there are usually two ways to do it, first
following the lexicographer's experience and second following personal experience [9].
For this procedure we use the preprocessing.TextVectorization function from the
TensorFlow library.
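A sketch of this preprocessing step, following the standard TensorFlow text pipeline, is shown below; the exact regular expression and vocabulary size used in this work are assumptions.

    import tensorflow as tf
    import tensorflow_text as tf_text

    def normalize(text):
        # NFKD normalization splits accented characters so that compatibility
        # characters can be reduced to ASCII-friendly equivalents.
        text = tf_text.normalize_utf8(text, "NFKD")
        text = tf.strings.lower(text)
        # Keep letters, digits and basic punctuation; drop other strange characters.
        text = tf.strings.regex_replace(text, "[^ a-z0-9.?!,']", "")
        # Mark sentence boundaries for the seq2seq model.
        return tf.strings.join(["[START]", tf.strings.strip(text), "[END]"],
                               separator=" ")

    # Tokenization: map each sentence to a padded sequence of integer token ids.
    vectorizer = tf.keras.layers.TextVectorization(standardize=normalize,
                                                   max_tokens=5000)
    vectorizer.adapt(["Aski urukïpan kullaka", "Kamisaki?"])
    print(vectorizer(["Kamisaki?"]))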
Figure 3 shows a very general visualization of the model. The result of the
decoder output is combined with a sum over the encoded input, for the prediction
of the next word [10].
The most basic way to understand NMT is by recognizing two main steps: a) the encoder,
which computes a representation for each source sentence, and b) the decoder, which
generates one target word at a time and hence decomposes the conditional probability [10].
4.1 Encoder
Through attention, the decoder attends to the input sequence piecewise. Attention takes a
sequence of vectors as input for each instance and returns an “attention” vec-
tor for each instance. The equations presented below are extracted from the
Tensorflow Neural machine translation with attention tutorial [11], Effective
Approaches to Attention-based Neural Machine Translation [10] and Neural
Machine Translation by Jointly Learning to Align and Translate [12].
\alpha_{ts} = \frac{\exp(\mathrm{score}(h_t, \bar{h}_s))}{\sum_{s'=1}^{S} \exp(\mathrm{score}(h_t, \bar{h}_{s'}))}    (1)

\mathrm{score}(h_t, \bar{h}_s) = h_t^{\top} W \bar{h}_s \quad \text{or} \quad v_a^{\top} \tanh(W_1 h_t + W_2 \bar{h}_s)    (2)

c_t = \sum_{s} \alpha_{ts} \bar{h}_s    (3)

Eq. (2) calculates a scalar logit-score for each key-query pair, i.e., for the decoder state
h_t and each encoder output \bar{h}_s (in a multiplicative and an additive form,
respectively). Eq. (1) computes the attention weights as a softmax of these scores across
the encoder's output sequence, and Eq. (3) calculates the context vector c_t as the
weighted sum of the encoder outputs.
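A direct implementation of the additive form of Eqs. (1)-(3), in the style of the TensorFlow attention tutorial [11], could look as follows; the layer size and tensor shapes are illustrative.

    import tensorflow as tf

    class BahdanauAttention(tf.keras.layers.Layer):
        """Additive attention: score = v_a^T tanh(W1 h_t + W2 h_s), Eq. (2)."""
        def __init__(self, units):
            super().__init__()
            self.W1 = tf.keras.layers.Dense(units)   # applied to the decoder query h_t
            self.W2 = tf.keras.layers.Dense(units)   # applied to the encoder outputs h_s
            self.V = tf.keras.layers.Dense(1)

        def call(self, query, values):
            # query: (batch, units) decoder state; values: (batch, src_len, units)
            query = tf.expand_dims(query, 1)                               # (batch, 1, units)
            score = self.V(tf.nn.tanh(self.W1(query) + self.W2(values)))  # (batch, src_len, 1)
            weights = tf.nn.softmax(score, axis=1)                        # Eq. (1)
            context = tf.reduce_sum(weights * values, axis=1)             # Eq. (3)
            return context, tf.squeeze(weights, -1)

    # Toy shapes: batch of 2 sentences, 7 source positions, 16-dimensional states.
    attn = BahdanauAttention(units=16)
    context, weights = attn(tf.random.normal((2, 16)), tf.random.normal((2, 7, 16)))
    print(context.shape, weights.shape)   # (2, 16) (2, 7)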
4.2 Decoder
Equation (4) describes the decoder's job, which is to generate the prediction for the next
output token. The decoder receives the complete output of the encoder. It uses an RNN to
keep track of the predictions, queries the attention over the encoder output to produce the
context vector, and combines the RNN output with the context vector to generate the
attention vector.
5 Dataset
The first step was to recover conversations in Aymara together with their respective
translations into Spanish. Later, the translation into English was carried out by the
Aymara speakers; the conversation texts correspond to the material AYMARA
ARUSKIPAWINAKA (Conversations in Aymara) [6].
In total, 1,915 conversations have been collected in text format and can be found at
https://github.com/Honorio-apz/AYMARA ARUSKIPAWINAKA; the data set format
is as shown in Table 5.
N Aymara English
1 Aski urukïpan kullaka good day sister
2 Aski urukïpanay kullaka good day sister
3 Kamisaki? how are you?
... ... ...
... ... ...
... ... ...
1912 Anupax allqawa Her dog is 2 colors
1913 Jurpürkam kullaka Until the day after tomorrow sister
1914 Jurpürkamay jilata see you the day after tomorrow brother
6 Results
6.1 Training
The training consists of three specific parts: 1) a function to calculate the loss and an
optimizer, 2) an update method applied to each input/target batch at each training step,
and 3) a training loop that saves a checkpoint at each step.
Specifically, we read the input texts, convert them into tokens and masks, and run the
encoder to obtain the encoded input tokens and the encoder state. Next, the decoder
function loops over the target tokens: the decoder is executed step by step, the loss is
calculated at each step, and the average loss is accumulated.
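A compressed sketch of such a training step with teacher forcing is given below; the encoder and decoder objects, their call signatures, and the padding mask are simplified placeholders for the actual model components.

    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam()
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction="none")   # masked and averaged manually

    def train_step(encoder, decoder, input_tokens, target_tokens):
        """One update per batch: run the encoder, then the decoder step by step.
        encoder/decoder are placeholders; the decoder is assumed to return
        (batch, vocab) logits and its next state."""
        total_loss = 0.0
        with tf.GradientTape() as tape:
            enc_output, enc_state = encoder(input_tokens)
            dec_state = enc_state
            # Teacher forcing: feed the correct previous target token at each step.
            for t in range(target_tokens.shape[1] - 1):
                logits, dec_state = decoder(target_tokens[:, t:t + 1],
                                            enc_output, dec_state)
                step_loss = loss_fn(target_tokens[:, t + 1], logits)
                mask = tf.cast(target_tokens[:, t + 1] != 0, step_loss.dtype)
                total_loss += tf.reduce_mean(step_loss * mask)
            avg_loss = total_loss / tf.cast(target_tokens.shape[1] - 1, tf.float32)
        variables = encoder.trainable_variables + decoder.trainable_variables
        optimizer.apply_gradients(zip(tape.gradient(avg_loss, variables), variables))
        return avg_loss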
A good sign for a new model is that it can overfit a batch of input, meaning that the loss
values should quickly approach zero; see Fig. 4a. The jumps visible on the graph are at
the epoch boundaries, see Fig. 4b.
N Aymara English
1 Aski urukïpan kullaka Good morning/day sister
2 Kamisaki? How are you?
3 Waliki I am good
4 Juman sutimax kunasa What is your name?
5 Jumax uywanitati? Do you have animals?
6 Jikisiñkamay kullaka see you later sister
In theory, the attention values should show where the model focuses when it generates
new translations: the attention weights should sum to one, and the alignment between
input and output words should approximate a diagonal line. The results for examples 4
and 5 can be seen in Fig. 6, where the attention clearly does not align diagonally;
improving this could be future work.
The example translations have been evaluated against the original text and by a language
specialist. The model was trained with teacher forcing, i.e., by feeding the correct tokens
at each step regardless of the model's predictions; it could be made more robust if it were
sometimes fed its own predictions.
It could also be improved by collecting feedback from users, who would validate the
correct and erroneous translations to be taken into account in future predictions, as is
the case with Google Translate.
7 Conclusion
– The model works quite well with basic/simple sentences but has difficulty translating
complex sentences, as is also noted in the TensorFlow tutorial for this model.
– The data set is very small: in total, the model was trained with 1,914 example
conversations in the native Aymara language. This must be improved if we want to
improve the model's translations and to make a much more realistic analysis of the
translation behavior from Aymara to English.
– Studying machine translation from native languages to the world's most widely known
languages can bring improvements to the current NMT models.
8 Future Work
– Work needs to be done on ways to improve the model for more complex
sentence translations.
– It is necessary to keep working on the collection of the data set of conversations in the
Aymara language, and afterwards on making the respective translations into English.
– Study how to improve the diagonal alignment of the attention weights, which would
help predict longer sentences.
References
1. Ministerio de Cultura del Perú: Base de datos de pueblos indígenas u originarios
(2022)
2. Albó, X., et al.: Raices de América: el mundo aymara, 1a ed., Alianza Editorial
(1988). ISBN 84-206-4213-4
3. Zhou, M., Secha, J., Cai, R.: Domain adaptation for Tibetan-Chinese neural
machine translation. In: 2020 3rd International Conference on Algorithms, Com-
puting and Artificial Intelligence (ACAI 2020). Association for Computing Machin-
ery, New York, NY, USA, Article 77, 1–5 (2020). https://doi.org/10.1145/3446132.
3446404
4. Tse, R., Mirri, S., Tang, S.-K., Pau, G., Salomoni, P.: Building an Italian-Chinese
parallel corpus for machine translation from the web. In: Proceedings of the 6th
EAI International Conference on Smart Objects and Technologies for Social Good
(GoodTechs 2020). Association for Computing Machinery, New York, NY, USA,
pp. 265–268 (2020). https://doi.org/10.1145/3411170.3411258
5. Duan, C., et al.: Modeling future cost for neural machine translation. IEEE/ACM
Trans. Audio, Speech and Lang. Proc. 29 (2021), 770-781 (2021). https://doi.org/
10.1109/TASLP.2020.3042006
6. Pairumani Ajacopa, R., Carrasco Lima, A.B.: Aymara Aruskipawinaka (Conversaciones
en aimara). Centro de Apoyo en Investigación y Educación Multidisciplinaria - CAIEM
(2022)
7. Zanini, N., Dhawan, V.: Text Mining: An introduction to theory and some appli-
cations. Research Matters, pp. 38–44 (2015)
8. Huayhua Pari, F.: Normas para el buen uso de la ortografía aimara. Lengua y Sociedad
12(1), 167–176 (2017). Retrieved from http://revista.letras.unmsm.edu.pe/index.php/ls/
article/view/428
9. Webster, J.J., Kit, C.: Tokenization as the initial phase in NLP. In: Proceedings
of the 14th Conference on Computational linguistics - Volume 4, COLING ’92.
Association for Computational Linguistics, USA, pp. 1106–1110 (1992). https://
doi.org/10.3115/992424.992434
10. Luong, M.-T., Pham, H., Manning, C.D.: Effective approaches to attention-based
neural machine translation (2015). https://doi.org/10.48550/arxiv.1508.04025
11. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous sys-
tems (2015). Software available from tensorflow.org
12. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning
to align and translate (2014). https://doi.org/10.48550/arxiv.1409.0473
Vocabulary Expansion for the Sub-word
WFST-Based Automatic Speech
Recognition System
1 Introduction
Since a human language constantly evolves, ASR systems need to correctly rec-
ognize newly occurring words that are unseen during training. This problem is
2 Related Work
The problem of out-of-vocabulary words (OOV) is typical for any speech recog-
nition system. Most systems are usually constructed to recognize a fixed set of
words and can rarely include all the words that will be encountered during the operation
of the system. Instead, the system will try to find the (acoustically)
closest in-vocabulary (IV) word, affect the surrounding context, and confuse the
end-user or downstream models like machine translation or intent detection.
Character or grapheme-based end-to-end (E2E) systems [3] seem like the per-
fect solution to our problem: they use neural models mapping audio (acoustic
features) to text (graphemes) directly. E2E systems perform global optimiza-
tion in a data-driven fashion and reduce the complexity compared to traditional
hybrid ASR systems. Since E2E systems have mechanisms for jointly learning
pronunciation and language information as a single model, it makes them espe-
cially robust in coping with open vocabulary problems. However, despite open
vocabulary advantage, grapheme-based E2E systems are significantly outper-
formed by sub-word or word-based systems [4–6].
Also, E2E systems require much more training data to outperform hybrid
ones. Comparative experiments on irregularly spelled English demonstrate E2E
superiority over hybrid ASR systems only with more than 10,000 h of training
data [7]; and with fewer data (∼100-1,000 h) hybrid systems guarantee much
better performance [8]. The word error rate (WER) with the E2E system on
Turkish and Georgian languages with much smaller datasets (73.4 and 50.2 h)
is high (>38.9% and >46.3%, respectively) and may not be sufficient for some tasks [9],
whereas comparative experiments performed under the same experimental conditions
demonstrate a drop in WER to 32.2% for the hybrid ASR system [10] on the same
Georgian dataset.
The Lithuanian language has several publicly available corpora: LIEPA [11],
SEIMAS [12], and LIEPA2¹, which together make up ∼1,300 h. However, at the time
when the baseline Lithuanian ASR was trained, only about ∼300 h were available
(no LIEPA2). Except for several consonant assimilation rules, the Lithuanian
language has relatively regular spelling, which theoretically means that E2E
systems should not require as much training data as English to learn how to
recognize regular Lithuanian words. However, the problem we are tackling in
this research is not only regular Lithuanian words but also surnames, brand
names, and other complicated cases that appear in different domains/topics
and are pronounced not according to the Lithuanian rules. Unfortunately, the
publicly available Lithuanian corpora lack these critical examples: they should
be collected specifically for different customization tasks.
Considering the resources available for the Lithuanian language together with the
findings of other researchers, we made the decision to use a hybrid
¹ https://xn--ratija-ckb.lt/liepa-2/infrastrukturines-paslaugos/garsynas/.
ASR system for solving our OOV problem, especially having in mind that we already
have a background in this direction.
There are multiple approaches to deal with OOVs in hybrid ASR depend-
ing on the application. In some applications it is enough just to detect these
occasions [13–17] while other applications require a mechanism to recover OOV
words. The most primitive way is by adding these words directly into the lan-
guage model and pronunciation model. Unigram probabilities can be set to some
default value or trained on a small number of examples, while the pronuncia-
tion model can be updated either manually or automatically [18,19]. There has
been a lot of research on how to achieve this in WFST-based speech recognition
without rebuilding the decoding graph and retraining ASR models [20–24].
Another popular approach is to use language models containing <unknown>
token that can represent any OOV word and another generic (phonemic) lan-
guage model trained on a lexicon of words with low counts. During the recovery
process, the OOV word is aligned with the <unknown> token from the language
model and recognized as the sequence of phones from the phonemic language
model [25–27]. Usually, both learned word and phonemic language models are
static. However, some authors (e.g., [28]) overcome this limitation by offering
solutions on how to dynamically recover recognized phoneme sequences as OOV
words: with the second pass decoding, the vocabulary is dynamically expanded
by calibrating OOV candidates’ language model scores (considering their pro-
nunciation, spelling, empirical frequency, and overall OOV rate which cannot be
done during the first pass).
However, taking into account the specifics of the Lithuanian language (high inflection),
word-based approaches would require a very large vocabulary (hundreds of thousands of
units), as each surface form will be represented as a separate
entry in the vocabulary. This creates challenges for accurate language modeling,
as it greatly increases the sparsity of n-grams and requires special solutions for
the state-of-the-art neural network language models (most of such models can
not be efficiently trained with such a large output layer).
One workaround for the large vocabulary problem is the sub-word-based
model. The approach is based on the assumption that theoretically each word
can be composed as a sequence of sub-word units, the number of which is much
smaller and fixed. The comparative experiments [29] between word-based and
sub-word-based approaches on the English and German (which is inflective and
full of compound words) languages show no improvement for the English lan-
guage but significant improvement for German. Similar improvements have also
been demonstrated for agglutinative languages [2].
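As an illustration of the sub-word idea, the snippet below uses the SentencePiece toolkit, one common way of obtaining such a fixed inventory of units; the corpus file name, vocabulary size, and the shown segmentation are placeholders and do not correspond to the models actually used in this work.

    import sentencepiece as spm

    # Train a small BPE sub-word model on a text corpus (one sentence per line).
    spm.SentencePieceTrainer.train(
        input="lithuanian_corpus.txt",   # placeholder corpus file
        model_prefix="subword",
        vocab_size=8000,                 # fixed and much smaller than the word vocabulary
        model_type="bpe")

    sp = spm.SentencePieceProcessor(model_file="subword.model")
    # Even a word unseen during training decomposes into known sub-word units,
    # so the vocabulary is effectively open.
    print(sp.encode("nematytas", out_type=str))   # e.g. ['▁ne', 'matyt', 'as']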
The sub-word approach potentially can also solve the OOV problem. There
are several groups in this family of methods. Some approaches are straightfor-
ward: the language model is trained on the variable-length sub-word units which
make such ASR system open vocabulary [2,30,31]. Another group represents the
hybrid language models which combine both word and sub-word units. During
the decoding, sub-words would have higher posterior probabilities at the regions
of OOV words [32–34]. Although sub-word units (especially shorter ones) theoretically
have the potential to “recover” any word in the Lithuanian language, they still fail to
recover quite a lot of words, especially the correct written forms of non-
Lithuanian origin words. The ability of sub-word systems to recognize unseen
words can be improved using regularization during training [35,36]. However, we
are not aware of research on boosting the probabilities of such words in sub-word
WFST ASR systems without retraining.
The related work reviewed above concerns languages other than Lithuanian,
whereas [37] presents a comprehensive 15-
year overview of various attempts to create Lithuanian word and sub-word ASR
systems. The authors also experimentally investigated several approaches with a
rather small dictated speech corpus of 50 h containing readings from books. Their
investigation claims the superiority of phone-based mappings over grapheme-
based by proving that the best results are achieved with a phoneme-based lex-
icon that explicitly models syllable stress and represents diphthongs as single
phonetic units. Despite the comprehensive overview and interesting comparative
experiments, the authors do not pay specific attention to the OOV or vocabulary
expansion problems.
The other currently available Lithuanian ASR systems [38] also do not have
mechanisms able to dynamically treat the OOV problem. Consequently, this
research will be mainly focused on this type of problem for the Lithuanian lan-
guage. The contribution of our research is two-fold:
– We focus on “vocabulary expansion” and recognition of new words in the open-
vocabulary sub-word ASR system. Since the feature is intended to be used
by end-users, two important conditions are considered: 1) the customized
vocabulary is “expanded” without acoustic and language model retraining:
only by boosting probabilities of new words; 2) the system’s adaptation is
performed in the production environment without requiring a lot of memory
and computing resources.
– For the first time, the OOV problem is being solved in the Lithuanian ASR
system. Besides, we tackle not only regular Lithuanian words, but also more
complicated cases (i.e., foreign surnames, brand names, etc.).
\hat{W} = \arg\max_{W} \frac{P(X|W)\,P(W)}{P(X)}
where P(X|W) is the conditional probability of the acoustic signal X given the word
sequence W (the acoustic model), P(W) is the unconditional probability of the word
sequence W (the language model), and P(X) is the unconditional probability of the
acoustic signal X.
Because the optimal value of W is independent of P (X), this probability can
be ignored in the optimization process and the decoder works only with non-
normalized probabilities. However, some estimate of this probability might be
necessary if one would want to calculate the normalized probability of Ŵ given
acoustic signal X. For example, for calculating the confidence of the decoder for
a given recognized word sequence.
A finite-state transducer (FST) is a finite-state machine with two memory
tapes, following the terminology for Turing machines: an input tape and an out-
put tape. An FST is a type of finite-state automaton (FSA) that maps between
two sets of symbols. An FST will read a set of strings on the input tape and
generate a set of relations on the output tape. An FST can be thought of as a
translator or relater between strings in a set. Finite State Transducers can be
weighted, where each transition is labeled with a weight in addition to the input
and output labels.
In weighted finite-state transducer (WFST) based speech recognition [40],
the search network for a decoder is composed out of four separate finite-state
transducers that each provide one part of the mapping from sounds to words. The
hidden Markov model FST (H) maps emission distributions from the acoustic
model (which can be neural network or Gaussian model) to context-dependent
phones. After that, the context FST (C) maps these context-dependent phones to
context-independent phones. The third part is the lexicon FST (L) which maps
phone sequences to words and inserts appropriate silences on word boundaries.
The last FST is more like an acceptor; the grammar or language model FST
(G) gives appropriate probabilities to the word sequences. All these transducers
are composed together into a single optimized transducer HCLG (usually called
decoding graph), which is used for speech recognition.
4 Proposed Solution
4.1 Baseline
In variant (b), static composition is used to combine G and B, but dynamic composition
is then used to build the final decoding graph. This method should allow lowering the
hardware requirements. For dynamic composition, we
use lookahead composition by [39].
Finally, in variant (c), the boosting FST is dynamically composed with the static HCLG.
This approach should theoretically have the lowest hardware requirements and
lowest latency, allowing it to be used in real-time speech recognition.
During decoding with such a composed graph, the probabilities of the sub-word paths P
are boosted. After decoding, the sub-words are glued back into words and reverse
rewriting is performed using the mappings R. Therefore, the probability of recognizing
the OOV words W is increased, making their recognition possible. Note that G itself is
not modified, so when rescoring is performed only the G weights are subtracted; this
means that boosting continues to work during rescoring too, as the boosting weights
remain in the lattice.
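To make the idea concrete, the following library-free sketch shows how a boosting table could be derived from the OOV list and applied to sub-word paths; the segmentation function, the rewrite mapping R, and the weight handling are deliberately simplified stand-ins for the actual FST construction and composition.

    # Schematic sketch of sub-word path boosting (not the actual WFST implementation).
    BOOST_WEIGHT_A = 1.0   # the boosting weight "a" used in the experiments

    def lithuanize(word, rewrite_rules):
        """Rewrite a word to its 'Lithuanized' pronunciation (mapping R), if given."""
        return rewrite_rules.get(word, word)

    def segment(word):
        """Placeholder for the sub-word segmentation used by the ASR system."""
        return [word[i:i + 3] for i in range(0, len(word), 3)]   # toy fixed-length pieces

    def build_boosting_table(oov_words, rewrite_rules):
        """For every new word, store its sub-word path and the score bonus to apply."""
        table = {}
        for word in oov_words:
            path = tuple(segment(lithuanize(word, rewrite_rules)))
            table[path] = -BOOST_WEIGHT_A   # lower cost = higher probability in the graph
        return table

    rules = {"Facebook": "Feisbuk"}          # example from the evaluation set
    print(build_boosting_table(["Facebook"], rules))
    # {('Fei', 'sbu', 'k'): -1.0}

    # During decoding, whenever a hypothesis extends one of these sub-word paths its
    # cost is reduced accordingly; after words are glued back together and reverse
    # rewriting with R is applied, the OOV word appears in the transcript.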
4.3 Evaluation
For evaluation of the proposed solution, we use the 4-hour, 1,417 utterance data
set, made mainly from the news broadcast recordings. These utterances contain
named entities of non-Lithuanian origin (e.g., Thomasas Walkupas pronounced
as Tomasas Volkapas, Baltic MG as Boltik emdži, Facebook as Feisbuk ) or other
special terms, like car body types (e.g., bečbekas). From these named entities
and terms, we created a list of 54 OOV words (together with manually assigned
“Lithuanized” pronunciations) that were not seen during language model train-
ing. These 54 OOV words appear 352 times in total or ∼6.52 times per word on
average.
From these 54 words an additional synthetic audio test set was created by
applying the Lithuanian speech synthesizer2 on pronunciations of these words.
All three variations of the proposed method were evaluated and compared
with the baseline using the following metrics:
– ASR WER and CER (Character Error Rate) on both test sets.
– Percentage of OOV words missing in the ASR transcript on both test sets.
– For each OOV word, the numbers of true positives, false positives, and false negatives
were collected and used to compute micro-averaged precision, recall, and F1 score on
the real test set (a small computation sketch is given after this list).
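Micro-averaging here means pooling the per-word counts before computing the ratios; a minimal illustrative helper:

    def micro_prf(counts):
        """counts: list of (true_positives, false_positives, false_negatives) per OOV word."""
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        fn = sum(c[2] for c in counts)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # Toy counts for three words; real counts come from aligning ASR output with references.
    print(micro_prf([(5, 0, 2), (3, 1, 1), (7, 0, 0)]))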
As mentioned in the Introduction section, not only the accuracy but also the
speed and computational resources are important in our work. Therefore, we
also measured how much time and memory resources are needed to update the
vocabulary using the proposed method and to perform the ASR decoding.
5 Results
Firstly, evaluation experiments on the synthetic audio test set were performed.
Table 1 contains the best WERs (after optimizing the language model (LM)
weight scores from the range [7, 14]) obtained by each booster composition
method. We have used a = 1 as an initial boosting weight in B.
The results show that the baseline sub-word system failed to recognize about 59% of the
OOV words (see the Miss, % column), with WER = 71.48%. Recognizing the remaining
∼41% is nevertheless a good result, considering that these words were not seen during
language model training; it shows that, in good acoustic conditions, the sub-word
approach alone can recognize a significant amount of unseen words. With the proposed
boosting FST, the miss rate can be reduced to 40% (WER = 42.87%), which is a
significant improvement. Since all composition strategies achieved the same result, we
can conclude that the proposed approach is effective and stable.
² https://xn--ratija-ckb.lt/liepa-2/paslaugos-vartotojams/interneto-naujienu-skaitytuvas/.
Table 2. Results on the real test set for different language model (LM) weights and booster composition methods. The first four result columns refer to the OOV words (Precision, %; Recall, %; F1, %; Miss, %), the last two to all words (WER, %; CER, %).
LM weight Method Precision, % Recall, % F1, % Miss, % WER, % CER, %
7 baseline 98.6 65.1 78.4 43 13.0 4.5
(a) and (b) 98.6 67.9 80.5 39 12.9 4.5
(c) 98.7 71.8 83.1 39 12.9 4.4
8 baseline 97.9 65.6 78.5 43 12.2 4.3
(a) and (b) 97.9 67.9 80.2 39 12.2 4.3
(c) 97.4 72.7 83.3 39 12.1 4.2
9 baseline 98.0 68.9 80.9 43 11.6 4.1
(a) and (b) 98.0 71.8 82.9 37 11.6 4.1
(c) 97.4 73.7 83.9 39 11.5 4.1
10 baseline 97.4 70.8 82.0 37 11.3 4.0
(a) and (b) 97.5 73.7 83.9 31 11.3 4.0
(c) 97.5 75.5 85.2 31 11.2 4.0
11 baseline 97.4 71.8 82.6 37 11.3 4.0
(a) and (b) 97.5 74.6 84.6 31 11.3 4.1
(c) 97.7 76.1 84.8 31 11.3 4.0
12 baseline 96.8 72.2 82.7 37 11.3 4.1
(a) and (b) 96.9 75.6 84.9 31 11.2 4.1
(c) 95.8 75.6 84.5 31 11.2 4.1
13 baseline 96.8 72.7 83.1 37 11.4 4.2
(a) and (b) 96.4 76.1 85.0 31 11.4 4.2
(c) 95.2 76.1 84.6 30 11.4 4.2
14 baseline 96.2 71.8 82.2 39 11.6 4.3
(a) and (b) 95.2 75.1 84.2 33 11.6 4.3
(c) 94.6 75.6 84.0 30 11.6 4.3
The second set of experiments is performed on the real test set and boosts the
same list of OOV words. This allows the evaluation of sub-word path boosting
on real audio with human pronunciation and context around each OOV word.
For each OOV word, numbers of true positives, false positives, and false
negatives were collected and used to compute micro-average precision, recall,
and F1 score reported in Table 2. Besides, the percentage of OOV words that
the ASR system failed to recognize are also presented. The last two columns
(WER and CER) represent the overall ASR recognition results on all words
(including OOV words) in the test set. The table also reports the results for
different language model weights = [7, 14] and booster composition approaches
(none/baseline, static (a), hybrid (b), dynamic (c)). Since boosting with static
and hybrid composition achieved the same result, they are presented as a single row for
each language model weight. The boosting weight is a = 1, as previously.
6 Discussion
Overall, the experimental results have proved that the proposed approach
enables ASR to recognize more OOV words than a simple sub-word system.
It has demonstrated robustness even recognizing complicated cases, i.e., words
that are not pronounced according to regular Lithuanian pronunciation rules
(e.g., Facebook, Baltic MG, etc.). However, there are several limitations: (1) there
is a performance penalty for booster FST initialization and decoding, (2) only
the main form of the word is boosted, each inflection form has to be manually
added to the boosting FST, (3) recall is still far from 100%.
The static composition enables the fastest decoding but requires much more
decoding-graph preparation time, rescoring time, and RAM, which makes it unsuitable
for use in production by typical end-users. Although decoding with the dynamic or
hybrid composition is almost 3 times slower, it is still faster than real-time and therefore
can be used in practice. Moreover, the dynamic composition does not require
resource-consuming preparation of the decoding graph and can be used even in
online recognition. It is important to mention, that these features are in line
with our goals of using the system with real customers.
It can be seen from Table 4 that there is a direct correlation between the boosting weight
“a” and the recall: increasing “a” increases the recall, and thus more OOV words are
recognized (so the percentage of OOV words that fail to be recognized is lowered). On
the other hand, increasing “a” simultaneously increases the number of false positives and
degrades the precision. The F1 score peaks at a = 2; increasing “a” further degrades the
F1 score. However, we believe that the optimal value of “a” depends on the specific
usage scenario: in some cases, a higher recall might be more important than precision,
or vice versa.
7 Conclusion
This paper presents a method for vocabulary expansion in a sub-word WFST ASR
system that enables the recognition of newly added words that were unseen or OOV
during training. The method works by creating a boosting FST from a
list of words to be added and optionally their pronunciations. Then, this FST
is composed with the decoding graph to increase the probabilities of the added words.
Different FST composition techniques are evaluated on the experimental results of the
Lithuanian ASR. The OOV word list used in the evaluation includes both the spe-
cific terminology and complicated cases (i.e., words that are pronounced not
according to regular Lithuanian pronunciation rules, e.g., Facebook pronounced
as Feisbuk ; Thomasas Walkupas as Tomasas Volkapas). The research is novel
because this problem for the Lithuanian language has never been solved before.
The evaluation shows that the proposed approach achieved its aim. Improve-
ments over the sub-word ASR baseline were shown on both synthetic and real
evaluation data. On the latter, the percentage of misrecognized out-of-vocabulary words
dropped by ∼7% and F1 improved from 83.1% to 85.6% compared with the baseline
sub-word WFST system.
However, there are still many different cases where our approach failed.
We believe that sub-word regularization should improve boosting performance.
Another problem is that currently each inflection form has to be added to the boosting
list separately. These issues will be addressed in future research.
Acknowledgments. This research has been supported by the ICT Competence Cen-
tre (www.itkc.lv) within the project “2.8. Automated voice communication solutions
for the healthcare industry” of EU Structural funds, ID no 1.2.1.1/18/A/003.
References
1. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with
subword units. In: Proceedings of the 54th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), pp. 1715–1725 (2016)
2. Smit, P., Virpioja, S., Kurimo, M., et al.: Improved subword modeling for WFST-
based speech recognition. In: Interspeech, pp. 2551–2555 (2017)
3. Wang, S., Li, G.: Overview of end-to-end speech recognition. J. Phys: Conf. Ser.
1187(5), 052068 (2019). https://doi.org/10.1088/1742-6596/1187/5/052068
4. Rao, K., Sak, H., Prabhavalkar, R.: Exploring architectures, data and units for
streaming end-to-end speech recognition with RNN-transducer. 2017 IEEE Auto-
matic Speech Recognition and Understanding Workshop (ASRU), pp. 193–199.
IEEE (2017)
5. Chiu, Ch.-Ch., et al.: State-of-the-art speech recognition with sequence-to-sequence
models. In: 2018 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pp. 4774–4778. IEEE (2018)
6. Zenkel, T., Sanabria R., Metze, F., Waibel, A.: Subword and crossword Units for
CTC acoustic models. In: Proceedings of the Interspeech 2018, pp. 396–400 (2018).
https://doi.org/10.21437/Interspeech.2018-2057
7. Sainath, T.N., et al.: No need for a lexicon? Evaluating the value of the pronunci-
ation lexica in end-to-end models. CoRR, arXiv:abs/1712.01864 (2017)
8. Lüscher, Ch., et al.: RWTH ASR systems for LibriSpeech: hybrid vs attention. In:
Interspeech 2019. ISCA (2019). https://doi.org/10.21437/interspeech.2019-1780
9. Laptev, A., Andrusenko, A., Podluzhny, I., Mitrofanov, A., Medennikov, I.,
Matveev, Y.: Dynamic acoustic unit augmentation with BPE-dropout for low-
resource end-to-end speech recognition. Sensors (9), 3063 (2021). MDPI AG .
https://doi.org/10.3390/s21093063
10. Alumäe, T., et al: The 2016 BBN Georgian telephone speech keyword spotting sys-
tem. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Pro-
cessing (ICASSP), pp. 5755–5759 (2017). https://doi.org/10.1109/ICASSP.2017.
7953259
11. Laurinčiukaitė, S., Telksnys, L., Kasparaitis, P., Kliukienė, R. Paukštytė, V.:
Lithuanian speech corpus liepa for development of human-computer interfaces
working in voice recognition and synthesis mode. Informatica 29(3), 487–498
(2018). https://doi.org/10.15388/Informatica.2018.177. Vilnius University Insti-
tute of Data Science and Digital Technologies
12. Salimbajevs, A., Kapočiūtė-Dzikienė, J.: General-purpose lithuanian automatic
speech recognition system. In: Human Language Technologies – The Baltic Per-
spective – Proceedings of the Eighth International Conference Baltic HLT, vol.
307, pp. 150–157. IOS Press (2018). https://doi.org/10.3233/978-1-61499-912-6-
150
13. Rastrow, A., Sethy, A., Ramabhadran, B.: A new method for OOV detection using
hybrid word/fragment system. In: 2009 IEEE International Conference on Acous-
tics, Speech and Signal Processing, pp. 3953–3956. IEEE (2009)
14. White, Ch., Zweig, G., Burget, L., Schwarz, P., Hermansky, H.: Confidence estima-
tion, OOV detection and language id using phone-to-word transduction and phone-
level alignments. In: 2008 IEEE International Conference on Acoustics, Speech and
Signal Processing, pp. 4085–4088. IEEE (2008)
15. Kumar, R., et al.: Detecting OOV named-entities in conversational speech. In:
Thirteenth Annual Conference of the International Speech Communication Asso-
ciation (2012)
16. Lin, H., Bilmes, J., Vergyri, D., Kirchhoff, K: OOV detection by joint word/phone
lattice alignment. In: 2007 IEEE Workshop on Automatic Speech Recognition &
Understanding (ASRU), pp. 478–483. IEEE (2007)
17. Asami, T., Masumura, R., Aono, Y., Shinoda, K.: Recurrent out-of-vocabulary
word detection based on distribution of features. Comput. Speech Lang. 58, 247–
259 (2019)
18. Lee, Ch-y., Zhang, Y., Glass, J.: Joint learning of phonetic units and word pronun-
ciations for ASR. In: Proceedings of the 2013 Conference on Empirical Methods in
Natural Language Processing, Seattle, Washington, USA, pp. 182–192, Association
for Computational Linguistics (2013). https://aclanthology.org/D13-1019
19. Lee, Ch.-y., O’Donnell, T. J., Glass, J.: Unsupervised Lexicon Discovery from
Acoustic Input. Transactions of the Association for Computational Linguistics,
Cambridge, MA, vol. 3, pp. 389–403. MIT Press (2015). https://doi.org/10.1162/
tacl_a_00146
20. Aleksic, P., Allauzen, C., Elson, D., Kracun, A., Casado, D.M., Moreno, P.:
Improved recognition of contact names in voice commands. In: 2015 IEEE Inter-
national Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.
5172–5175. IEEE (2015)
21. Allauzen, C., Riley, M.: Rapid vocabulary addition to context-dependent decoder
graphs. In: Sixteenth Annual Conference of the International Speech Communica-
tion Association (2015)
Vocabulary Expansion for the Sub-word WFST-Based ASR 591
22. Bulusheva, A., Zatvornitskiy, A., Korenevsky, M.: An efficient method for vocab-
ulary addition to WFST graphs. In: Sojka, P., Horák, A., Kopeček, I., Pala, K.
(eds.) TSD 2016. LNCS (LNAI), vol. 9924, pp. 452–458. Springer, Cham (2016).
https://doi.org/10.1007/978-3-319-45510-5_52
23. Horndasch, A., Kaufhold, C., Nöth, E.: How to add word classes to the Kaldi
speech recognition toolkit. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.)
TSD 2016. LNCS (LNAI), vol. 9924, pp. 486–494. Springer, Cham (2016). https://
doi.org/10.1007/978-3-319-45510-5_56
24. Liu, J., Zhu, J., Kathuria, V., Peng, F.: Efficient dynamic WFST decoding for
personalized language models. arXiv preprint, arXiv:1910.10670 (2019)
25. Bazzi, I.: Modelling OOV words for robust speech recognition. Ph.D. thesis, Mas-
sachusetts Institute of Technology, Cambridge, MA, USA (2002)
26. Kombrink, S., Hannemann, M., Burget, L., Heřmanský, H.: Recovery of
Rare Words in Lecture Speech. In: Sojka, P., Horák, A., Kopeček,
I., Pala, K. (eds.) TSD 2010. LNCS (LNAI), vol. 6231, pp. 330–
337. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15760-8_42
https://www.fit.vut.cz/research/publication/9323
27. Alumäe, T., Tilk, O., Ullah, A.: Advanced Rich Transcription System for Estonian
Speech. CoRR, arXiv:abs/1901.03601 (2019)
28. Zhang, X., Povey, D., Khudanpur, S.: OOV recovery with efficient 2nd pass decod-
ing and open-vocabulary word-level RNNLM rescoring for hybrid ASR. ICASSP,
pp. 6334–6338. IEEE (2020)
29. Braun, R.A., Madikeri, S.R., Motlícek, P.: A comparison of methods for OOV-word
recognition on a new public dataset. CoRR. arXiv:abs/2107.08091 (2021)
30. Hirsimäki, T., Pylkkönen, J., Kurimo, M.: Importance of high-order n-gram models
in morph-based speech recognition. IEEE Trans. Speech Audio Process. 17(4),
724–732 (2009)
31. Siivola, V., Hirsimäki, T., Creutz, M., Kurimo, M.: Unlimited vocabulary speech
recognition based on morphs discovered in an unsupervised manner. In: Interspeech
(2003)
32. Klakow, D., Rose, G., Aubert, X.L.: OOV-detection in large vocabulary system
using automatically defined word-fragments as fillers. In: Eurospeech, ISCA (1999)
33. Bisani, M., Ney, H.: Open vocabulary speech recognition with flat hybrid models.
In: Interspeech [and] Eurospeech, 9th European Conference on Speech Communi-
cation and Technology, pp. 725–728 (2005). https://publications.rwth-aachen.de/
record/113162
34. Kombrink, S., Hannemann, M., Burget, L.: Out-of-vocabulary word detection and
beyond. In: Weinshall, D., Anemüller, J., van Gool, L. (eds.) Detection and Identi-
fication of Rare Audiovisual Cues. Studies in Computational Intelligence, vol. 384,
pp. 57–65. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-24034-
8_4
35. Drexler, J., Glass, J.: Subword regularization and beam search decoding for end-
to-end automatic speech recognition. In: ICASSP 2019-2019 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6266–6270,
IEEE (2019)
36. Lakomkin, E., Heymann, J. Sklyar, I., Wiesler, S.: Subword regularization: an anal-
ysis of scalability and generalization for end-to-end automatic speech recognition.
In: Proceedings of the Interspeech 2020, pp. 3600–3604 (2020). https://doi.org/10.
21437/Interspeech.2020-1569
592 A. Salimbajevs and J. Kapočiūtė-Dzikienė
37. Raškinis, G., Paškauskaitė, G., Saudargienė, A., Kazlauskienė, A., Vaičiūnas, A.:
Comparison of phonemic and graphemic word to sub-word unit mappings for
lithuanian phone-level speech transcription. Informatica 30(3), 573–593 (2019).
https://doi.org/10.15388/Informatica.2019.219
38. Alumäe, T., Ottokar, T.: Automatic speech recognition system. In: Human Lan-
guage Technologies–The Baltic Perspective: Proceedings of the Seventh Interna-
tional Conference Baltic HLT 2016, vol. 238, pp. 39. IOS Press (2016)
39. Allauzen, C., Riley, M. Schalkwyk, J.: A generalized composition algorithm for
weighted finite-state transducers In:. Proceedings of the Interspeech 2009, pp.
1203–1206 (2009). https://doi.org/10.21437/Interspeech.2009-348
40. Mohri, M., Pereira, F., Riley, M.: Weighted finite-state transducers in speech recog-
nition. Comput. Speech Language 16(1), 69–88 (2002)
41. Ko, T., Peddinti, V., Povey, D., Khudanpur, S.: Audio augmentation for speech
recognition. In: Interspeech, pp. 3586–3589, ISCA (2015). http://dblp.uni-trier.de/
db/conf/interspeech/interspeech2015.html#KoPPK15
A Comparative Analysis of Local
Explainability of Models for Sentiment
Detection
1 Introduction
Machine learning has been widely used in various fields such as video caption-
ing [1], big data analysis [2], Natural Language Processing (NLP) [3], text clas-
sification [4], and sentiment analysis [5] which has led to remarkable growth in
Artificial Intelligence (AI) research [6]. Sentiment Analysis is a sub-field of NLP
that combines tools and techniques from linguistics and computer science to sys-
tematically identify, extract, and study emotional states and personal opinions
in natural languages. However, the main question is how an algorithm can detect
if a text is expressing positive or negative sentiment. Despite the high accuracy and satisfactory performance of machine learning models in sentiment detection, the models can be so complicated that they provide no information about how the sentiment classification task is performed [7].
quently used to predict people’s preferences in recommendation engines, it may
be useful to inspect how they learned this prediction knowledge. In social net-
works, negative sentiments can be shared quickly, which may become a problem
if recommendation systems cannot explain the reasons behind their recommen-
dations. Explanations supporting the output of a black-box model are crucial
where experts require more information about the decision than a simple pre-
diction. Recent research shows that most of the machine learning models used
in NLP, text classification, and sentiment analysis work as black-box models,
and the non-transparent structure of these models has led to different types of research in this field using XAI [7–10].
Explainable AI (XAI) emphasizes understanding the cause and effect within the AI system by examining the sensitivity of the output to changes in the input parameters, without needing to understand the complex computation of the
model [6]. Furthermore, the explanations are helpful in supporting collaboration
between AI agents and human experts in many applications [11–13].
Although transformer models, which rely on an attention-based mechanism, have become very popular for NLP tasks [14,15], other well-known approaches such as random forest, Support Vector Machine (SVM), Naive Bayes, and K-nearest neighbors can also achieve quite good performance in sentiment analysis; however, their results are not explainable [16]. In this study, we compare the performance of attention-based models in sentiment analysis with that of random forest and multinomial Naive Bayes, and we use feature-level sentiment explanations to show how relevant and irrelevant keywords contribute to true-positive and false-positive predictions.
To do so, we chose LIME as an explanation tool to evaluate the performance
of four different classifiers. As LIME is a model-agnostic post-hoc explanation
tool, it can be easily applied to any classification model, no matter if the decision
process is interpretable by itself or not. Therefore, we can investigate and com-
pare the sensitivity of attention-based models, and any other simple classifier in
sentiment analysis, to locally check the importance of each token/word in the
decision process.
In Sect. 2, explanation tools are categorized into local or global approaches and by whether generating the explanation requires post-processing. Related work and the most common XAI tools for NLP tasks are discussed in Sect. 3. Section 4 discusses four different classification algorithms, along with the
state-of-the-art XAI tool used in this study. Quantitative results of each model
and sentiment explanations for some instances are provided in Sect. 5. Finally,
we conclude the explanation results in Sect. 6.
2 Background
2.1 Transparency of a Black-Box Model
Transparency often refers to interpretability, which means a model is understand-
able by a human, such as regression, or decision tree [17]. However, explainability
is associated with the notion of explanation as an interface between humans and
a decision-maker [18]. Interpretability and explainability are often misused interchangeably in the literature. The notable difference between these concepts is that interpretability refers to a passive characteristic of a model, namely the
level at which a given model makes sense for a human, which is also expressed
as transparency. In contrast to transparency, explainability refers to an active
characteristic of a model with its internal functions [13]. In other words, a model
can be explained, but the interpretability of the model is something that comes
from the design of the model itself [13].
Explanations are often categorized into two main aspects [12,18]. The first one
distinguishes whether the explanation is for an individual prediction, called local
explanation, or the model’s prediction process as a whole, called global expla-
nation. Local explanation provides information or justification for the model’s
prediction on a specific input. Global explanation provides a similar justification
by revealing how the model’s predictive process works. In other words, global
explanation describes the whole decision process in human terms, independent
of any particular input [8].
Whether the explanation is local or global, explanations differ on whether
they arise as part of the prediction process or whether their generation requires
post-processing after the model makes a prediction [8].
3 Related Work
connected, convolution, and recurrent layers [26,27]. However, the main issue is
needing access to the inner structure of the model. For this reason, model agnostic
explanation methods, specifically perturbation-based, such as Local Interpretable
Model-Agnostic Explanations (LIME), have been widely used for explaining text classification problems such as sentiment analysis [7,11,28], as they are easy to understand and do not require access to the inner structure of the model [21]. In other words, model-agnostic explanations probe the black-box model by observing the change in the probability of the predicted class when a certain word is erased [11,29].
Although several studies have used LIME to explain the results of sentiment analysis, they have mostly focused on explaining a single model, specifically a fine-tuned attention-based model [30], on using LIME to improve the explainability of a sentiment classifier with augmented data [30], or on comparing sentiment analysis methods and then explaining the correct predictions of the most accurate one [31]. To the best of our knowledge, there is no work investigating how local explanations can reflect the accuracy of sentiment analysis methods in detecting positive or negative sentiments at the feature level.
4 Methodology
For this study, we considered four classification methods trained on the IMDB review dataset and compared their sentiment analysis results, which can be either positive or negative.
Random Forest: Random forest is a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each tree to create an uncorrelated forest of trees whose prediction is more accurate than that of any individual tree. Many studies using random forest have achieved quite good performance in sentiment analysis on large collections of opinions from online purchasing, movie reviews, YouTube, and Twitter [32–35]. Nevertheless, trusting the predictions and explaining the results remain the main issues.
Multinomial Naive Bayes: Naive Bayes is one of the most popular algorithms
used in a variety of classification problems because of its fast processing time
and high level of effectiveness [36]. This algorithm uses statistical methods to
calculate the probability of a class based on its attributes and then select the highest probability value to classify the data into the most appropriate category. Owing to this basic concept, the Naive Bayes algorithm is often used in text classification and sentiment analysis, as it combines word probabilities to categorize documents [37,38]. The Multinomial Naive Bayes (Multinomial NB) method is a probability-based algorithm suitable for classification with discrete features, such as word counts in text classification. This approach con-
siders the term frequency and calculates the probability of each label given the
input text [39,40].
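As an illustration only, the sketch below shows how these two classical baselines could be trained on simple bag-of-words features with scikit-learn; the vectorizer settings and hyperparameters are assumptions made for the sketch, not details reported in this study.

```python
# Minimal sketch (assumed setup): random forest and Multinomial NB baselines
# trained on bag-of-words counts, as described above.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def train_baselines(train_texts, train_labels):
    """Fit bag-of-words pipelines for the two classical baselines."""
    rf = make_pipeline(CountVectorizer(stop_words="english"),
                       RandomForestClassifier(n_estimators=200, random_state=0))
    nb = make_pipeline(CountVectorizer(),  # raw term counts, as Multinomial NB expects
                       MultinomialNB())
    rf.fit(train_texts, train_labels)
    nb.fit(train_texts, train_labels)
    return rf, nb
```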
4.2 LIME
LIME fits a locally linear model around the predictions of an opaque model to explain it.
The number of perturbations around each instance and the total number of fea-
tures represented in the output are two main parameters of LIME that should be
fine-tuned based on the given problem. According to the literature, the number
of perturbations will guarantee the stability of the resulting explanation [55],
and it should be 10 times larger than the number of words [56]. As the instances
are of different lengths, we adjusted this parameter based on the size of the given
instance. The linear model is then fitted on these perturbations and returns the n most important features. We decided to highlight 10% of the length of each instance as the most important words detected by LIME.
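The following minimal sketch illustrates this setup with the lime package, scaling num_samples and num_features per instance as described above; the class names and helper function are assumptions, not the authors' code.

```python
from lime.lime_text import LimeTextExplainer

def explain_review(text, classifier):
    """Explain one review with LIME, scaling the parameters per instance."""
    n_words = len(text.split())
    explainer = LimeTextExplainer(class_names=["negative", "positive"])
    explanation = explainer.explain_instance(
        text,
        classifier.predict_proba,            # any fitted pipeline exposing predict_proba
        num_features=max(1, n_words // 10),  # highlight roughly 10% of the words
        num_samples=10 * n_words,            # roughly 10 perturbations per word
    )
    return explanation.as_list()             # [(word, weight), ...]
```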
5 Experiments
This study was conducted on the IMDB review dataset for binary sentiment analysis, which contains 25,000 reviews for training and 25,000 for testing. In addition, 50,000 unlabeled reviews are available for unsupervised use (Table 1).
Model              Train  Test
Random forest      0.80   0.79
Multinomial NB     0.85   0.82
Bidirectional RNN  0.87   0.85
BERT               0.93   0.86
As can be seen, the attention-based models try to capture the contextual meaning of a text. Figures 3 and 4 show the true-negative sentiment predictions made by BiLSTM and BERT, where the probabilities are close to 1. Though the highlighted words and their LIME weights differ slightly between BiLSTM and BERT, the whole context conveys the same sentiment, which is what we hope to see in attention-based methods.
The second sentiment is: “I first saw this movie on IFC. Which is a great
network by the way to see underground films. I watched this movie and was
thinking it was going to be pure drama and a story line that doesn’t hold water.
But it really was a worth while watch. The main character is in such rough shape,
and you hate to see him deny help, but no matter what you just can’t hate him.
His devotion to The Beatles and John Lennon is a great metaphor for his life
and the helplessness he feels. The atmosphere of the film is also great. At times,
you feel like you can see what he sees, feel what he feels in some situations.
This movie does not leave you wanting to know more, or disliking a loophole
in the plot. There are NO loopholes (in my opinion). I have always been a fan
of foreign films, especially now with movies being made so poorly in America.
I really enjoy the foreign settings because I feel it can take you on a trip, and
sometimes understand a different culture. This movie did all those things to me
and more. Please watch this movie and if you’re new to foreign films, this is a
great start.”
The actual label of this instance is positive. The prediction made by each classifier, together with the local explanations, is as follows.
According to Figs. 5 and 6, although the predictions of random forest and Multinomial NB are correct and the probability of the correct label is significantly higher than that of the opposite one, as in the first instance most of the highlighted words detected by LIME seem irrelevant to the sentiment analysis task, such as the most frequent stop words (a, is, and, etc.). The point is that a model can make a correct prediction that does not seem meaningful once we explain the reason behind it. That is why we cannot easily trust the model's prediction; it also raises the question of whether the correct prediction happened accidentally.
The same instance was fed into BiLSTM and BERT. The prediction probabilities and explanation results are shown in Figs. 7 and 8. The prediction probabilities of the actual label are close to 1, and the highlighted parts of the sentence are more precise. In Figs. 7 and 8 the word great receives the largest weight, alongside other relevant words such as worth and enjoy detected by LIME. As these attention-based models consider the contextual meaning of the given instance, the highlighted words are the ones that contribute most towards the positive label.
6 Conclusion
Sentiment analysis has become very popular in both research and business due to the vast amount of opinions currently produced by users on
social media. Standard sentiment analysis deals with classifying the overall sen-
timent of a text by considering the importance of each word within a context.
This study investigated the accuracy of four classification models for sentiment
analysis using local explanations. Specifically, we applied four different classification methods, namely random forest, Multinomial NB, BiLSTM, and BERT, to the IMDB review dataset, and then used LIME to explain the predictions.
gated two case studies from the given dataset chosen as true positive and false
positive predictions, then revealed the importance of each keyword affecting the
predicted label. The results showed that although random forest and Multinomial NB may predict the actual label, the prediction might not be reliable for long sentences, since irrelevant words are highlighted in the explanations.
In contrast, attention-based models like BiLSTM and BERT accurately pre-
dicted the correct label by focusing on the most relevant parts of the sentence.
This study shows how a correct prediction, even with a high prediction proba-
bility, cannot be accepted blindly. In other words, a correct prediction may arise
from an inaccurate model when we explain the model’s behavior at the features
level. Although many classification methods have been used in sentiment analysis, attention-based models retain their high performance when their decision process is examined through explanations. For further research, we can examine
the sensitivity of attention-based models to different words through an expla-
nation framework by replacing the most affecting tokens with their synonyms
and then explaining the decisions again to see if the new words can change the
contextual meaning of a sentence or not. Furthermore, since LIME is a model-
agnostic explanation technique, we can combine it with other NLP tasks, e.g.
summarizing, extraction, and question-answering, which we leave for our future
work.
Acknowledgments. This work has been supported in part by the Natural Sciences
and Engineering Research Council of Canada (NSERC).
References
1. Zhou, L., Zhou, Y., Corso, J.J., Socher, R., Xiong, C.: End-to-end dense video
captioning with masked transformer. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 8739–8748 (2018)
2. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R.,
Muharemagic, E.: Deep learning applications and challenges in big data analytics. J. Big Data 2(1), 1–21 (2015)
3. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed repre-
sentations of words and phrases and their compositionality. In: Advances in Neural
Information Processing Systems, pp. 3111–3119 (2013)
4. Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., Gao, J.:
Deep learning-based text classification: a comprehensive review. ACM Comput.
Surv. (CSUR) 54, 1–40 (2021)
5. Hasan, A., Moin, S., Karim, A., Shamshirband, S.: Machine learning-based senti-
ment analysis for twitter accounts. Math. Comput. Appl. 23, 11 (2018)
6. Linkov, I., Galaitsi, S., Trump, B.D., Keisler, J.M., Kott, A.: Cybertrust: from
explainable to actionable and interpretable artificial intelligence. IEEE (2020)
7. Bodria, F., Panisson, A., Perotti, A., Piaggesi, S.: Explainability methods for nat-
ural language processing: applications to sentiment analysis (Discussion Paper)
(2020)
8. Danilevsky, M., Qian, K., Aharonov, R., Katsis, Y., Kawas, B., Sen, P.: A sur-
vey of the state of explainable AI for natural language processing arXiv preprint
arXiv:2010.00711 (2020)
9. Liu, H., Yin, Q., Wang, W.Y.: Towards explainable NLP: a generative explanation framework for text classification. arXiv preprint arXiv:1811.00196 (2018)
10. Wiegreffe, S., Marasović, A.: Teach me to explain: a review of datasets for explain-
able NLP. arXiv preprint arXiv:2102.12060 (2021)
11. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144
(2016)
12. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable
artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
13. Arrieta, A.B., et al.: Explainable Artificial Intelligence (XAI): concepts, tax-
onomies, opportunities and challenges toward responsible AI. Information Fusion,
vol. 58. Elsevier (2020)
14. Grimsley, C., Mayfield, E., Bursten, J.: Why attention is not explanation: surgical
intervention and causal reasoning about neural models (2020)
15. Brunner, G., Liu, Y., Pascual, D., Richter, O., Ciaramita, M., Wattenhofer, R.:
On identifiability in transformers. arXiv preprint arXiv:1908.04211 (2019)
16. Daeli, N.O.F., Adiwijaya, A.: Sentiment analysis on movie reviews using Informa-
tion gain and K-nearest neighbor. J. Data Sci. Appl. 3, 1–7 (2020)
17. Lipton, Z.C.: The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue 16(3), 31–57. ACM, New York, NY, USA (2018)
18. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A
survey of methods for explaining black box models. In: ACM Computing Surveys
(CSUR) (2018)
19. Arya, V., et al.: One explanation does not fit all: a toolkit and taxonomy of AI
explainability techniques. arXiv preprint arXiv:1909.03012 (2019)
20. Kenny, E.M., Keane, M.T.: Twin-systems to explain artificial neural networks using
case-based reasoning: comparative tests of feature-weighting methods in ANN-
CBR twins for XAI. In: Twenty-Eighth International Joint Conferences on Artifi-
cial Intelligence (IJCAI), Macao (2019)
21. Keane, M.T., Smyth, B.: Good counterfactuals and where to find them: a case-
based technique for generating counterfactuals for explainable AI (XAI). In: Inter-
national Conference on Case-Based Reasoning (2020)
22. Saltelli, A., et al.: Global sensitivity analysis: the primer (2008)
23. Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In:
International Conference on Machine Learning (2017)
24. Gorski, L., Ramakrishna, S., Nowosielski, J.M.: Towards grad-cam based explain-
ability in a legal text processing pipeline. arXiv preprint arXiv:2012.09603 (2020)
25. Lertvittayakumjorn, P., Toni, F.: Human-grounded evaluations of explanation
methods for text classification. arXiv preprint arXiv:1908.11355 (2019)
26. Poerner, N., Roth, B., Schütze, H.: Evaluating neural network explanation
methods using hybrid documents and morphological agreement. arXiv preprint
arXiv:1801.06422 (2018)
27. Croce, D., Rossini, D., Basili, R.: Explaining non-linear classifier decisions within
kernel-based deep architectures. In: Proceedings of the 2018 EMNLP Workshop
BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (2018)
28. Alvarez-Melis, D. Jaakkola, T.S.: A causal framework for explaining the predictions
of black-box sequence-to-sequence models. arXiv preprint arXiv:1707.01943 (2017)
29. Chen, H., Zheng, G., Ji, Y.: Generating hierarchical explanations on text classifi-
cation via feature interaction detection. arXiv preprint arXiv:2004.02015 (2020)
30. Chen, H., Ji, Y.: Improving the explainability of neural sentiment classifiers via
data augmentation. arXiv preprint arXiv:1909.04225 (2019)
31. Aljuhani, S.A., Alghamdi, N.S.: A comparison of sentiment analysis methods on
Amazon reviews of Mobile Phones. Int. J. Adv. Comput. Sci. Appl. 10, 608–617
(2019)
32. Karthika, P., Murugeswari, R., Manoranjithem, R.: Sentiment analysis of social
media network using random forest algorithm. In: 2019 IEEE International Con-
ference on Intelligent Techniques in Control, Optimization and Signal Processing
(INCOS) (2019)
33. Singh, J., Tripathi, P.: Sentiment analysis of Twitter data by making use of SVM,
Random Forest and Decision Tree algorithm. In: 2021 10th IEEE International
Conference on Communication Systems and Network Technologies (CSNT) (2021)
34. Munshi, A., Arvindhan, M., Thirunavukkarasu, K.: Random forest application of
twitter data sentiment analysis in online social network prediction. In: Emerging
Technologies for Healthcare: Internet of Things and Deep Learning Models (2021)
35. Aufar, M., Andreswari, R., Pramesti, D.: Sentiment analysis on YouTube social
media using decision tree and random forest algorithm: a case study. In: 2020
International Conference on Data Science and Its Applications (ICoDSA) (2020)
36. Novendri, R., Callista, A.S., Pratama, D.N., Puspita, C.E.: Sentiment analysis of
YouTube movie trailer comments using Naïve Bayes. Bull. Comput. Sci. Electr.
Eng. 1, 26–32 (2020)
37. Dey, S., Wasif, S., Tonmoy, D.S., Sultana, S., Sarkar, J., Dey, M.: A comparative
study of support vector machine and Naive Bayes classifier for sentiment analysis
on Amazon product reviews. In: 2020 International Conference on Contemporary
Computing and Applications (IC3A) (2020)
38. Li, Z., Li, R., Jin, G.: Sentiment analysis of danmaku videos based on Naïve Bayes
and sentiment dictionary. IEEE Access (2020)
39. Dhola, K., Saradva, M.: A comparative evaluation of traditional machine learning
and deep learning classification techniques for sentiment analysis. In: 2021 11th
International Conference on Cloud Computing, Data Science & Engineering (Con-
fluence) (2021)
40. Rahman, R., Masud, M.A., Mimi, R.J., Dina, M.N.S.: Sentiment analysis on ben-
gali movie reviews using multinomial Naïve Bayes. In: 2021 24th International
Conference on Computer and Information Technology (ICCIT) (2021)
41. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans.
Signal Process. 45, 2673–2681 (1997)
42. Nistor, S.C., Moca, M., Moldovan, D., Oprean, D.B., Nistor, R.L.: Building a
twitter sentiment analysis system with recurrent neural networks. Sensors 21, 2266
(2021)
43. Islam, M.S., Sultana, S., Roy, U.K., Al Mahmud, J., Jahidul, S.: HARC-new hybrid
method with hierarchical attention based bidirectional recurrent neural network
with dilated convolutional neural network to recognize multilabel emotions from
text. Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI) (2021)
44. Abid, F., Li, C., Alam, M.: Multi-source social media data sentiment analysis
using bidirectional recurrent convolutional neural networks. Comput. Commun.
157, 102–115 (2020)
45. Cai, Y., Huang, Q., Lin, Z., Xu, J., Chen, Z., Li, Q.: Recurrent neural network
with pooling operation and attention mechanism for sentiment analysis: a multi-
task learning approach. Knowl.-Based Syst. 203, 105856 (2020)
46. Turek, J., Jain, S., Vo, V., Capotă, M., Huth, A., Willke, T.: Approximating stacked
and bidirectional recurrent architectures with the delayed recurrent neural network.
In: International Conference on Machine Learning (2020)
47. Elfaik, H., et al.: Deep bidirectional LSTM network learning-based sentiment anal-
ysis for Arabic text. J. Intell. Syst. 30, 395–412 (2021)
48. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9,
1735–1780 (1997)
49. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: pre-training of
deep bidirectional transformers for language understanding. arXiv preprint
arXiv:1810.04805 (2018)
50. Taylor, W.L.: “Cloze procedure”: a new tool for measuring readability. J. Quart.
30, 415–433 (1953)
51. Habimana, O., Li, Y., Li, R., Gu, X., Yu, G.: Sentiment analysis using deep learning
approaches: an overview. Sci. China Inf. Sci. 63, 1–36 (2020)
52. Chauhan, P., Sharma, N., Sikka, G.: The emergence of social media data and
sentiment analysis in election prediction. J. Ambient. Intell. Humaniz. Comput.
12, 2601–2627 (2021)
53. Karimi, A., Rossi, L., Prati, A.: Adversarial training for aspect-based sentiment
analysis with Bert. In: 2020 25th International Conference on Pattern Recognition
(ICPR) (2021)
54. Hoang, M., Bihorac, O.A., Rouces, J.: Aspect-based sentiment analysis using
BERT. In: Proceedings of the 22nd NORDIC Conference on Computational Lin-
guistics (2019)
55. Zhou, Z., Hooker, G., Wang, F.: S-lime: stabilized-lime for model explanation. In:
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &
Data Mining (2021)
56. Garreau, D., Mardaoui, D.: What does LIME really see in images? In: International
Conference on Machine Learning (2021)
Persuasive Dialogue Corpus:
Graph-Based Approach Combining
Persuader and Persuadee Perspectives
1 Introduction
Persuasion is the act of convincing a person to believe in or act on some-
thing, whether it is making a donation, voting for a particular candidate, or
following healthier habits [9]. Understanding persuasion is one of the keys for
building smart AI-powered applications (e.g. chatbot assistants, tutors, gaming
avatars) able to recognize and predict persuasive strategies [1,6,12]. Due to its
nature, persuasion is a complex natural language phenomenon, often inconsis-
tent or irrational, which makes it difficult to classify into underlying constructs.
Recently, identifying persuasion as well as improving persuasive dialogue systems
has become an active area of Natural Language Processing (NLP) and Natural
Language Generation (NLG) research [7,11]. Current topics are concentrated on
i) argument mining to gather data, ii) annotation schemes to categorize each
argument and make persuasion more normative and iii) models trained on these
2 Related Work
Persuasion has been extensively studied in many fields (e.g. linguistics, cogni-
tive science, gaming) and there are many different ways to represent persuasion.
From a cognitive perspective, the behavior of the persuadee can be influenced by
using the following acts: command, convincing, and suggestion [2]. An example
of a command would be when the persuader is simply commanding the user to
donate using some online forum or by providing credit card details. Convincing
requires altering certain desires of the user. To add convincing arguments, the
persuader could describe some benefits to the persuadee that come with making
a donation, such as tax exemption for the persuadee. Lastly, suggestion involves
giving power back to the persuadee to decide if they want to make the donation.
Furthermore, there are multiple factors that can make a persuadee more sus-
ceptible to changing their mind [9], for example, acknowledging perceived social
norms, conforming to social pressure, and having an emotional attachment to the
topic being discussed. Persuasive dialogues can also be used in various domains,
such as emotional reasoning and gaming, where parents or game players must
strategize a way to get their child or another player to perform a task [10].
First, with many factors at play during an argumentative discussion and a
variety of contexts in which these discussions may take place, persuasion is very
difficult to capture. Current ways of representing persuasion in text include establishing a set of categories for each argument. In this way, a model over
the categorized dataset can be created to classify new arguments. To test this
method, a simulation is often created where persuader arguments are generated
from the model and tested on persuadees that are played by real people (users
of the simulation). For example, a chatbot where the bot is the persuader is a
type of simulation used in recent work [3,4]. It is important to note, however,
that developing a chatbot with an artificial persuader requires having a balanced
corpus. It is even more helpful if said corpus categorizes each argument so that
the chatbot can pull an argument from a relevant category when listening to the
user and understanding their perspective [5,14]. It is also important to construct
a dataset for possible persuadee responses so that the chatbot can be more
responsive and can directly address the user’s points in a way that is appealing
to them [6]. This highlights the importance of creating an annotation schema
to represent the persuadee.
Second, there exist only a few available persuasive dialogue corpora. Data
collection and conversational corpus development are time-consuming and
often topic-dependent tasks. Despite the availability of several conversational
datasets [7], we focus our review on only two corpora that would serve as a
foundation for our framework, a combined approach of incorporating annotation
schemas along with graph-based connections between dialogue taken directly
from persuasive conversations. The first corpus has been developed from conver-
sations between assigned persuaders and persuadees recruited on the Amazon Mechanical Turk platform [6,14], where the persuader was tasked with trying to convince
the persuadee to donate to a charity (‘action’ persuasive conversation). This
corpus has 1017 data entries with annotated schemes tagging each persuadee
response and persuader argument with a specific category type [6,14]. For exam-
ple, argument categories include logical appeal, emotion appeal, and credibility
appeal, while persuadee categories include request for organization information,
inquiries about donation procedure, and positive/negative reactions. This cor-
pus, however, does not encode the sequence of the conversations collected. It
solely provides categories for which to sort arguments and responses in an ‘action’
conversation (persuasive conversations aimed at convincing the persuadee to do
Fig. 1. Graph-based corpus: arguments B and C argue against the point made in their
parent, argument A. Argument D agrees with the point made in argument A, but is a
counterargument to the argument made in argument B. The graph shows the flow of
a potential, back-and-forth debate between two people [3, p.2].
chatbots with the number of changes in stance points (a point is awarded if the
user’s opinion is changed from what it originally was before the conversation
started), users are also asked whether they were satisfied. It is hypothesized that the users were not satisfied with how well they were understood and how their points were addressed by the strategic chatbot because this specific chatbot could have come across as more stubborn, ignoring how the user felt, which is why consideration of the persuadee perspective is important.
With regard to the persuadee perspective, multiple models and their own
predefined categories based on the first corpus are used to represent per-
suadee responses in [6]. Persuadee categories include: ask-org-info, ask-donation-
procedure, positive reaction, agree donation, etc. A Transformer-based model
with extended CRF (Conditional Random Field) is used to build a persuasive
strategy recognition model. This model (Transformers-ExtCRF) proves to be
more accurate when categorizing persuader responses according to the defined
categories in [14]. Finally, HARGAN (Heterogenous Argument Attention Net-
work) uses a graph tree to learn argument structure for both persuader’s and
persuadee's stance predictions using the ChangeMyView dataset [8].
Thus, as previous research has shown, it is important that the persuadee
perspective be considered and annotated thoroughly. Additionally, the combi-
nation of annotated methods with the graph-based corpus method to create a
corpus can lead to a more strategic and informative corpus tracking arguments
and counterarguments.
1. The creation of a publicly available data collection tool to help gather con-
versational data,
2. The gathering of data with ‘no-action’ persuasive conversations,
3. The design of a graph-based schema for Neo4j, a graph database platform.
To achieve the first milestone, we followed previous work by [4,5] and designed our collection instrument specifically for participating students, who would play the role of either the persuadee or the persuader. It is important to note that
there is a lack of dialogue collection tools and original corpora made available in
the NLP community (without the use of web-scraping from sites such as Reddit),
so this developed data collection tool and our overall data collection process will
be a contribution on its own. Figure 2 illustrates the process used to gather data
from students on campus.
As illustrated in the second step of the flowchart in Fig. 2, we developed a
data collection site where two students can chat simultaneously. The interactive
web application, built with React (a front-end JavaScript library) and the Firebase platform (a back-end cloud service), introduces a novel way to create and store conversational data. Additionally, the web app code can be reproduced and used for small and large datasets. The dialogues are stored via Firestore, a document database in the Firebase platform, and exported to CSV after each conversation.
Once the topic for conversation is determined, it is placed at the top of the
chatting web interface, where users can anonymously log in for each conversation.
Each dialogue is stored as a separate document within Firestore. Each data entry
(persuader or persuadee) has its unique ID, a timestamp, and a text field. When
data is exported to CSV, the user can manually add labels and annotations. The
user-interface of the web collection tool is shown in Fig. 3.
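For illustration only, the sketch below mirrors the assumed per-entry document structure (unique ID, timestamp, text field) and the CSV export step; the field and file names are hypothetical, and the deployed app writes to Firestore through the web SDK rather than through a script like this.

```python
import csv
from datetime import datetime, timezone

# One dialogue document, with one entry per chat message (structure is assumed).
dialogue = {
    "topic": "Online learning is more beneficial for students",
    "entries": [
        {"id": "msg-001", "speaker": "persuader", "text": "Thanks for chatting with me today!",
         "timestamp": datetime.now(timezone.utc).isoformat()},
        {"id": "msg-002", "speaker": "persuadee", "text": "Happy to be here.",
         "timestamp": datetime.now(timezone.utc).isoformat()},
    ],
}

# Export to CSV so that labels and annotations can be added manually afterwards.
with open("conversation_export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "speaker", "timestamp", "text", "label"])
    writer.writeheader()
    for entry in dialogue["entries"]:
        writer.writerow({**entry, "label": ""})
```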
Fig. 3. Web application for data collection: a screenshot of chatting area where partic-
ipants can engage in synchronous conversation. The deployed app is available online -
https://dialogue-data-collect-site.netlify.app/.
Fig. 4. Talking points for the persuader used during the study (Adapted from [14]).
Additional strategies include scenario-based inquiries, experience-related inquiries, clar-
ifying inquiries and opening/closing remarks. Each persuader can choose any of the
strategies.
Fig. 5. Basic design schema for persuasive strategies: the schema represents possible
paths of persuasive strategies that can be taken to convince the persuadee that online
learning is more beneficial. In this case, starting from opening remarks, the persuader
could potentially use any other strategy to begin or follow their arguments.
The schema is used as a design outline for the Neo4j network database. It is
important to note that given the current study limitation, no node for outcome
(draw, partial success, or complete success) is incorporated. In this schema, the
opening remarks strategy is kept in the center, as it is the beginning of every
conversation. Any of the other strategies (except for a clarifying inquiry or a
closing remark) can potentially follow an opening remark. Any of those eight strategies can then be followed by any other strategy, including a clarifying question or a closing remark if some conclusion has been reached (whether a draw, partial success, or complete success).
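As a rough sketch of how such a schema could be populated, the snippet below uses the official Neo4j Python driver to merge strategy nodes and FOLLOWED_BY relationships for one conversation; the node label, relationship type, and credentials are assumptions rather than the study's actual schema.

```python
from neo4j import GraphDatabase

# Connection details are placeholders for a local Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def load_strategy_path(conversation_id, strategies):
    """Store one conversation's ordered strategies as a chain of FOLLOWED_BY edges."""
    with driver.session() as session:
        for current, nxt in zip(strategies, strategies[1:]):
            session.run(
                "MERGE (a:Strategy {name: $a}) "
                "MERGE (b:Strategy {name: $b}) "
                "MERGE (a)-[:FOLLOWED_BY {conversation: $cid}]->(b)",
                a=current, b=nxt, cid=conversation_id,
            )

load_strategy_path(4, ["opening remark", "logical appeal", "logical appeal",
                       "logical appeal", "user agreement"])
```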
Fig. 6. Neo4j graph with persuader strategies in sequence. Logical appeal is the largest
node since many routes intersect with that persuasive strategy.
The second iteration includes the sequences of both persuader and persuadee strategies used in all 21 conversations. In this schema, the data is grouped by conversation number, and each route of strategies, regardless of position, is mapped in the graph, as illustrated in Fig. 7.
Figure 8 demonstrates the graph for conversations 10 through 13 with a
closer look at logical appeal node, showing that many points are leading to that
strategy.
Figure 9 shows a zoomed-in version of the first iteration of the graph that maps only conversation number 4, to present it in a more comprehensible way. Here, we see that the persuader starts with opening remarks, circles around logical appeal a few times, and then ends with a full user agreement.
Finally, the last iteration includes every single dialogue as a separate node and
each node also includes the strategy used.
Fig. 8. A close-up graph of the conversation numbers 10–13: many conversation points
lead to logical appeal strategy.
Fig. 9. Persuader strategies for the conversation number 4: persuader started with an
opening remark, uses three logical appeal arguments, and ends with a user agreement.
5 Discussion
Table 1. Breakdown of categories for each dialogue in a dataset of 228. Note that a
“Thought-Provoking User Response” is a persuadee’s answer to a thought-provoking
inquiry by a persuader, while a “Thought-Provoking Inquiry User Response” is a
thought-provoking question asked by the persuadee to the persuader.
Additionally, we were able to graph the first thirteen conversations (each one represented in a different color), two of which resulted in complete success (see Fig. 10). If we follow the yellow and blue lines that resulted in complete success, with the persuadee agreeing completely that online learning is more beneficial for students academically, we see that the majority of the arguments made in each pipeline are logical, which shows that the persuadees engaged in both of these conversations (two different students) are logic-leaning. We have yet to see a persuadee who is more emotion-leaning or responds better to personal stories.
Finally, we present a map of clustered (by their annotations) arguments and
counterarguments to online learning along with their specific category annota-
tion. This will create a collection of various arguments and annotations that will
be more flexible and understanding of different persuadee perspectives. In the
future, with more data and more observations made on persuadee behavior, it
will also give us more insight into the best strategies. For example, Fig. 11 is a snippet of what we wanted our end goal to be before we finished collecting data and constructing our final Neo4j iteration. Notice here that a logical user response can be refuted with either a thought-provoking inquiry or a personal story from the persuader; it all depends on what kind of persuadee personality the persuader is working with. Our strategy graphs should be able to help determine what paths work for certain persuadees, especially when we begin to build conversational agents that can use them.
Fig. 10. A graph of the first thirteen conversations: the graph shows paths to complete
success and any other paths (we did not encode partial success in this schema yet).
(Color figure online)
Fig. 11. Example graph with dialogue based on collected data. Arguments A, C and D
are made by the persuader, while argument B is made by the persuadee and in response
to argument A. In this case, depending on the kind of persuadee is participating in
the conversation, a conversational agent would decide whether to take a more logical
approach or a more personal and emotional approach.
6 Conclusion
Currently our work in progress is focused on creating Neo4j graph-based
databases to highlight which argument strategies in ‘no-action’ persuasive con-
versations prove to be optimal and on developing more in-depth profiles of poten-
tial persuadees to help persuaders be more understanding of their perspective.
Thus far, we have collected 228 dialogues and plan on adapting our graphs and
annotation schemas to larger ‘no-action’ conversational datasets in the future.
These 228 dialogues have provided insights into which strategy seems to be the
most used and which sequence of strategies seems to work for the persuadees.
Moreover, the collected data has helped us develop an annotation schema for the persuadees. This is an important contribution in determining how to interpret success among persuadees and how to categorize their responses. It can help us understand which strategies each persuadee is more likely to resonate with and what kinds of responses they tend to use in their retort. Additionally, we developed a reproducible conversational data collection instrument2 and a Neo4j corpus with the collected dataset3.
Our study also has several limitations. The first limitation is the amount of
data we have collected. In order to create a more representative graph-based
corpus, we need more data points. However, with a smaller dataset, we have
been able to create an annotation schema for the persuadee perspective for this
specific ‘no-action’ persuasive conversation. We have also been able to construct
several algorithms for developing the Neo4J graphs presented in Sect. 4. These
are important contributions that we believe can now be adapted to a larger ‘no-
action’ persuasive conversation (this will be our future work). Additionally, the
pool of participants consists of STEM-majoring students. Therefore, the demograph-
ics of our users are difficult to diversify, which means our data, to an extent, is
not diverse. Another limitation to our current project is that we have manually
labeled each dialogue. We understand that it is possible for an argument to be
partially logical and partially appealing to the emotions of the user, so in the
future it will be best if we change our categorization technique from a 1:1 map-
ping (from dialogue to category) to a scale-based categorization. For example,
a persuader argument could then potentially be 70% logical appeal and 30%
emotional appeal.
Finally, it is difficult to understand or measure the full impact of our data-storing and representation approach at the moment. Therefore, our next step is to evaluate our algorithms and data-labeling processes on a larger, existing conversational dataset.
References
1. Benner, D., Schöbel, S., Janson, A.: Exploring the state-of-the-art of persuasive
design for smart personal assistants. In: International Conference on Wirtschaftsin-
formatik (WI) (2021)
2 https://github.com/MeghnaAllamudi/Thesis-Data-Collection
3 https://github.com/MeghnaAllamudi/Neo4JDatabaseDev
2. Boella, G., Hulstijn, J., Van Der Torre, L.: Persuasion strategies in dialogue. In:
The ECAI Workshop on Computational Models of Natural Argument (CMNA
2004) (2004)
3. Chalaguine, L.A., Hunter, A.: Chatbot design for argument harvesting. Front.
Artif. Intell. Appl. 305, 457–458 (2018)
4. Chalaguine, L.A., Hunter, A.: Knowledge acquisition and corpus for
argumentation-based chatbots. In: Proceedings of the 3rd Workshop on Advances
in Argumentation in Artificial Intelligence, pp. 1–14 (2019)
5. Chalaguine, L.A., Hunter, A., Potts, H.W.W., Hamilton, F.L.: Impact of argument
type and concerns in argumentation with a chatbot. In: IEEE 31st International
Conference on Tools with Artificial Intelligence, pp. 1557–1562 (2019)
6. Chen, H., Ghosal, D., Majumder, N., Hussain, A., Poria, S.: Persuasive dialogue
understanding: the baselines and negative results. Neurocomputing 431, 47–56
(2021)
7. Duerr, S., Gloor, P.A.: Persuasive natural language generation - a literature review
(2018), 1–17 (2021)
8. Huang, K.-Y., Huang, H.-H., Chen, H.-H.: HARGAN: heterogeneous argument
attention network for persuasiveness prediction. In: Proceedings of the AAAI Con-
ference on Artificial Intelligence, vol. 35, no. 14, pp. 13045–13054 (2021)
9. Hunter, A.: Towards a framework for computational persuasion with applications
in behaviour change. Argument Comput. 9(1), 15–40 (2018)
10. Kacprzak, M.: Persuasive strategies in dialogue games with emotional reasoning.
In: Polkowski, L., et al. (eds.) IJCRS 2017. LNCS (LNAI), vol. 10314, pp. 435–453.
Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60840-2_32
11. Lipa-Urbina, E., Condori-Fernandez, N., Suni-Lopez, F.: Towards an automatic
generation of persuasive messages. In: Ali, R., Lugrin, B., Charles, F. (eds.) PER-
SUASIVE 2021. LNCS, vol. 12684, pp. 55–62. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79460-6_5
12. Oduor, M., Alahaivala, T., Oinas-Kukkonen, H.: Software design patterns for per-
suasive computer-human dialogue: reminder, reward, and instant feedback. In: Lit-
tle, L., Sillence, E., Joinson, A. (eds.) Behavior Change Research and Theory:
Psychological and Technological Perspectives, pp. 47–67. Elsevier Science (2017)
13. Sakai, K., Higashinaka, R., Yoshikawa, Y., Ishiguro, H., Tomita, J.: Hierarchical
argumentation structure for persuasive argumentative dialogue generation. IEICE
Trans. Inf. Syst. E103D(2), 424–434 (2020)
14. Wang, X., et al.: Persuasion for good: towards a personalized persuasive dialogue
system for social good. In: ACL 2019 - 57th Annual Meeting of the Association for
Computational Linguistics, Proceedings of the Conference, pp. 5635–5649 (2020)
N-Gram Based Amharic Grammar Checker
1 Introduction
Nowadays, the demand for producing high-quality texts is increasing. Automated tools that check and correct sentence errors contribute to improvements in writing such texts [1]. This is an area of Natural Language Processing (NLP) concerned with creating proofing systems. Error-free text can be pursued at different levels: morphology, which defines the structure of words; syntax, which determines the composition of sentences; and semantics, which determines the meaning [2].
Several research works have been conducted in this area. The work by [2] used an ontology to define a logical description of the rules of Arabic grammar and to generate, for the words extracted from a target sentence, all possible sentences that are syntactically correct. Afterwards, the target sentence is compared with all generated sentences to detect any grammatical mistakes, followed by a correction phase. This approach requires detailed grammatical and structural knowledge of a language to build the ontology as a knowledge source. As a result, it is not suitable for a model that operates without language restrictions.
On the other hand, most research works used fixed n-gram (i.e., bigram, trigram, or a combination) tag sequence probabilities as the language model for grammatical error detection and correction [3–6]. In these works, the grammatical features are extracted only at the bigram and trigram level, which limits the features available to train the model on the grammatical properties of the language. As a result, the model is not effective at detecting and correcting grammatical errors.
Besides this, all the above-mentioned research works are designed for a particular language. Given the de facto multilingualism of textual content on the web, tools that operate beyond language barriers are increasingly required. In this research area, one work by [7] attempted a grammar checker for text written in any language using statistical data. In this work, the model used trigram tag sequences to learn the grammatical properties of the language. This approach is not effective at grammatical error detection and correction, since the extracted grammatical features are limited to at most trigram tag sequences. It also increases the probability of encountering tag feature sequences that are outside the training model. Finally, checking the grammatical correctness of a text by considering only three neighbouring tags ignores the normal scenario of natural languages, where the grammatical correctness of a text is checked by considering all tag sequences in a sentence [8].
Therefore, in this study all possible sentence-level grammatical features are used to teach the model the grammatical properties of a language. For this purpose, we formulated the following research questions: how effective are grammatical error detection and correction in a multilingual setting (i) using fixed n-gram (i.e., bigram and trigram) grammatical features, and (ii) using all sentence-level grammatical features (sentence-level n-grams)? To demonstrate the model, we adopt textual documents written in under-resourced languages such as Amharic, Afaan Oromo, and Tigrigna.
2 Methodology
2.1 Data Selection
For part-of-speech tagging we adopt TreeTagger, and to train this tool we used a corpus for each supported language from HaBit (Harvesting big text data for under-resourced languages) [9, 10]. The training data for TreeTagger are described in Table 1. To train the language properties for the grammar detection and correction module, we used word n-gram and tag n-gram data sets; for this, 408,920 Amharic sentence tag and word n-grams are used, respectively. Finally, to train the grammatical disagreement error detection and correction module, all possible word-class agreement combinations and words are extracted from the tagged training corpus for each language. For testing, we follow the practice of many researchers and create a test set artificially by randomly replacing words in correct sentences. The statistical details of the grammatically incorrect text units used to evaluate our model are described in Table 1.
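A minimal sketch of this test-set construction, under the assumption that one randomly chosen word per sentence is replaced by a random word from the corpus vocabulary:

```python
import random

def corrupt_sentence(tokens, vocabulary, rng=random):
    """Return a copy of the sentence with one word replaced by a random vocabulary word."""
    corrupted = list(tokens)
    position = rng.randrange(len(corrupted))   # pick one position to corrupt
    corrupted[position] = rng.choice(vocabulary)
    return corrupted

# Example: corrupt_sentence("he reads the book".split(), ["runs", "blue", "quickly"])
```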
As shown in Fig. 1, the proposed model is structured into seven main modules: language selection, sentence segmentation and indexing, POS tagging, POS label normalization, n-gram extraction, grammar error detection, and grammar error correction. The model takes as input the text whose grammar should be checked, at any level (i.e., phrase, sentence, etc.). First, the language of the written text is identified so that further operations can be performed. Once the language is known, the text is split into a set of statements with index information and passed to POS tagging, which assigns each word to its word class category. This grammatical information is very important for grammatical error detection and correction. However, the POS training corpus used for POS tagging and the corpus used for generating the language model have different POS label representations, and to handle this complexity we include POS label normalization. The tagged sentence is then split into a set of n-grams (i.e., fixed n-grams or sentence-level n-grams) for both word and tag sequence features. All possible tag n-grams of a sentence are checked against the language model, and if none of them is found, the text unit is considered grammatically erroneous. To provide a correction for a detected grammatical error, the proposed model uses the word n-gram language model.
The annotated corpus is used to generate lists of POS tags; among the generated tag sequences, some will be very common while others will probably not occur at all. Commonly occurring sequences are considered correct, whereas uncommon sequences will lead to errors. In this study we adopt TreeTagger, in which each word is represented by the word itself, its lemma, and its POS tag [10]. We trained this tagger using a manually tagged training corpus. Since the POS-annotated training corpora have different POS representations, the POS output differs for each TreeTagger corpus. For the best match, the POS labels of the test text and the training POS labels should have the same representation; otherwise, the model is not effective. To handle this, we include a module that converts the original POS label of each word into a standard POS label representation.
We adopt fixed n-gram and sentence level n-gram grammatical feature extraction tech-
niques. In case of fixed n-gram number of grammatical features extracted to learn
grammatical properties of a language is limited with specified size of N (i.e. tri-gram
and bigram) and this reduce effectiveness of model. Example1: show context feature
extraction techniques with Amharic text unit
Each word of the given text unit is analyzed with the POS tagger and assigned its word-class category as follows:
The possible tag n-grams extracted from the above Amharic sentence are as follows.
Possible tag trigram patterns:
Similarly, all possible tag bigrams are also extracted to increase the probability of grammatical error detection and correction for a given tagged text.
We incorporate a technique that extracts rich grammatical features from a given tagged text unit. In this technique, we extract all possible sentence-level n-gram grammatical features, and we call this the sentence-level tag n-gram grammatical feature extraction technique. This enables the proposed model to learn more about the grammatical properties of a language.
All possible tag n-gram sequences extracted at sentence level for Example 1:
As shown above, all possible tag n-grams, from the highest-order tag n-gram sequence down to the tag bigram sequence, are extracted.
In Example 2, all possible tag n-gram sequences are extracted to enhance the grammatical mistake detection capability of the proposed model. However, for efficiency we extract only the tag n-gram sequences that contain the last word's tag information, as illustrated in the sketch below.
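The following minimal Python sketch (not the authors' implementation; the tag sequence and function names are assumed for the example) enumerates all tag n-grams of a tagged sentence from the full sentence length down to bigrams, optionally keeping only those that end with the last word's tag:

```python
def sentence_level_tag_ngrams(tags, only_with_last_tag=False):
    """Extract all tag n-grams (length 2 .. len(tags)) from one tagged sentence.

    tags: list of POS tags for one sentence, e.g. ["ADV", "N", "NUMCR", "N", "V", "ENDPUNC"]
    only_with_last_tag: if True, keep only n-grams that end at the last tag,
    which is the efficiency shortcut described in the text.
    """
    n = len(tags)
    ngrams = []
    for size in range(n, 1, -1):                  # from the highest order down to bigrams
        for start in range(0, n - size + 1):
            if only_with_last_tag and start + size != n:
                continue                          # skip n-grams not containing the last tag
            ngrams.append(tuple(tags[start:start + size]))
    return ngrams

# Example with a hypothetical tag sequence in the paper's notation:
tags = ["ADV", "N", "NUMCR", "N", "V", "ENDPUNC"]
for gram in sentence_level_tag_ngrams(tags, only_with_last_tag=True):
    print(gram)
```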
Grammar checkers are interactive systems that check the grammatical errors of a given text unit before any further word is entered. In Example 2, grammatical error detection and correction has already been performed for the tag n-gram sequences without the last tag "<ENDPUNC>" (i.e., "<ADV><N><NUMCR><N><V>", "<ADV><N><NUMCR><N>", etc.). For the given text unit, the grammar checker module validates the unchecked tag sequences by looking them up in the target tag n-gram language model. It first checks the highest-order unchecked tag n-gram (i.e., the hexagram) and, if it is not found, the module further checks for the availability of lower-order unchecked n-grams. This checking process continues down to the unchecked bigram grammatical feature, and if none of them is found in the language model, the "last tag" is considered suspicious grammatical information.
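The back-off lookup described above can be sketched as follows, assuming the tag n-gram language model is available as a set of tag-sequence tuples seen in the training corpus (an illustrative simplification, not the authors' code):

```python
def last_tag_is_suspicious(tags, tag_ngram_model):
    """Check the unchecked tag n-grams ending at the last tag, from the highest
    order down to the bigram; if none is found in the model, the last tag is
    flagged as suspicious grammatical information.

    tag_ngram_model: a set of tag-sequence tuples observed in the training corpus.
    """
    n = len(tags)
    for size in range(n, 1, -1):              # e.g. hexagram, pentagram, ..., bigram
        gram = tuple(tags[n - size:])         # the n-gram containing the last tag
        if gram in tag_ngram_model:
            return False                      # a known sequence covers the last tag
    return True                               # no sequence found: suspicious last tag

# Illustrative model and query:
model = {("ADV", "N"), ("N", "NUMCR", "N"), ("N", "V", "ENDPUNC")}
print(last_tag_is_suspicious(["ADV", "N", "NUMCR", "N", "V", "ENDPUNC"], model))
```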
We used a word n-gram model to formulate text unit variations by replacing the last word of the original text unit with all available words. After the grammatical error word is replaced by other variation words extracted from the language model, the relative probability is computed. When the relative probability of a newly formulated text unit is higher than that of the original text, the proposed model verifies that the previously detected grammatical error is a real error. The language model is also used to look up adverb and verb agreement in the test tagged text; an agreement is valid when it is found in the model. In the above sentence, the words " " and " " are not found in the training model and are considered an adverb and verb agreement error.
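The following sketch illustrates this idea under simplifying assumptions (a bigram-count word model, English placeholder words instead of the Amharic examples, and hypothetical helper names): candidates are formed by replacing the last word, and a variation whose relative probability exceeds that of the original confirms the detected error.

```python
from collections import Counter

def generate_candidates(words, vocabulary):
    """Replace the last word of the text unit with every other word from the vocabulary."""
    return [words[:-1] + [w] for w in vocabulary if w != words[-1]]

def relative_probability(words, bigram_counts):
    """Rough relative probability: product of bigram relative frequencies (assumed form)."""
    total = sum(bigram_counts.values()) or 1
    score = 1.0
    for i in range(len(words) - 1):
        score *= bigram_counts.get((words[i], words[i + 1]), 0) / total
    return score

# Hypothetical data: bigram counts from a training corpus and an input text unit.
counts = Counter({("he", "runs"): 5, ("she", "runs"): 4})
original = ["he", "run"]
candidates = generate_candidates(original, vocabulary={"runs", "ran"})
best = max(candidates, key=lambda c: relative_probability(c, counts))
if relative_probability(best, counts) > relative_probability(original, counts):
    print("confirmed error; suggestion:", " ".join(best))
```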
To detect grammatical mistakes in the above tagged text, all possible tag n-gram features are extracted with either the fixed or the sentence-level extraction technique, and their occurrence is checked in the target n-gram language model. According to the above example, the last tag n-gram order is detected as an error (V and N). As a result, to suggest grammatical corrections the first step is to generate newly formulated candidate text variations as follows.
All the newly formulated candidate texts are analysed with the POS tagger, and each word in a text is labelled with its corresponding word class as shown below.
In Example 4, each word of the candidate texts is assigned its word-class category, and our model then requires selection of the top relevant suggestions.
To suggest texts close to what the user intends to write, we rank the candidate tagged suggestions based on their relevance to the original text. The probability of the tag n-gram sequence extracted from each tagged candidate text is computed via the tag n-gram language model as follows:
$$\text{relativeProb}_{\text{tag-ngram}} = \frac{\text{occurrence of tag-ngram}}{\text{occurrence of all target tag-ngrams}} \tag{1}$$
where
relativeProb_tag-ngram is the probability degree of the tag n-gram in the target tag n-gram language model;
occurrence of tag-ngram is the frequency of the tag n-gram in the target tag n-gram language model;
occurrence of all target tag-ngrams is the sum of the frequencies of all tag n-grams in the target language model.
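A minimal illustration of Eq. (1) and the ranking of candidate suggestions by tag n-gram relative probability; the counts and candidate tag sequences below are invented for the example and are not taken from the study's data:

```python
def relative_prob(tag_ngram, tag_ngram_counts):
    """Eq. (1): frequency of the tag n-gram divided by the total frequency of
    all tag n-grams in the target language model."""
    total = sum(tag_ngram_counts.values())
    return tag_ngram_counts.get(tag_ngram, 0) / total if total else 0.0

def score_candidate(candidate_tags, tag_ngram_counts, order=3):
    """Sum the relative probabilities of all tag n-grams of the chosen order."""
    grams = [tuple(candidate_tags[i:i + order])
             for i in range(len(candidate_tags) - order + 1)]
    return sum(relative_prob(g, tag_ngram_counts) for g in grams)

# Invented counts and candidates, for illustration only:
counts = {("ADV", "N", "V"): 12, ("N", "NUMCR", "N"): 7, ("NUMCR", "N", "V"): 3}
candidates = {"cand1": ["ADV", "N", "V"], "cand2": ["N", "N", "V"]}
ranked = sorted(candidates, key=lambda k: score_candidate(candidates[k], counts), reverse=True)
print(ranked)  # the top-K suggestions would be taken from the front of this list
```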
Finally, the proposed model provides grammatical error suggestions to the user if and only if the probability of at least one candidate text is greater than that of the original grammatically mistaken text. There are two alternative ways to provide grammatical error suggestions: in a fully automated grammar checker system, the grammatically mistaken text is replaced with the suggestion text having the highest relative probability, whereas in an interactive system the top K (where K is an integer) ranked suggestion texts are presented to the user based on their relevance to the original erroneous text. In this investigation, the right to choose the correct suggestion is given to the end user, which mitigates the problem of false positives.
4 Conclusion
In this study, we design and implement an automatic grammatical error detection and correction model without any language restrictions. To do this, we incorporate seven high-level modules: language selection, sentence segmentation and indexing, POS tagging, POS label normalization, n-gram extraction, grammar error detection, and grammar error correction. Both supervised and unsupervised corpora of the target languages were collected and used to train the proposed grammar checker. To evaluate the proposed approach, erroneous test sets were created by randomly exchanging the positions of words, owing to the lack of an actual, well-organized test set.
In this study we conducted three experiments: Experiment 1 evaluates the effectiveness of the proposed model with the fixed n-gram feature extraction technique, Experiment 2 evaluates the effectiveness of the proposed model with the sentence-level n-gram feature extraction technique, and Experiment 3 evaluates the proposed model using the sentence-level n-gram feature extraction technique combined with disagreement rules. The experimental results indicate that the proposed model in Experiment 3 performs better for both grammatical mistake detection and correction. In this investigation the number of supported languages is limited for demonstration purposes, but our model is flexible and modular, so it can be extended to other languages. Therefore, future work is directed towards extending the model to other languages and improving the performance of the system.
References
1. McCarthy, K.S., Roscoe, R.D., Likens, A.D., McNamara, D.S.: Checking it twice: does adding
spelling and grammar checkers improve essay quality in an automated writing tutor? In:
Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds.) AIED 2019.
LNCS (LNAI), vol. 11625, pp. 270–282. Springer, Cham (2019). https://doi.org/10.1007/
978-3-030-23204-7_23
2. Chouaib, M., Tragha, A., El Habib, B.A., Almalki, T.: An innovative approach to autocor-
recting grammatical errors in Arabic texts. J. King Saud Univ. Comput. Inf. Sci. 33, 476–488
(2019). https://doi.org/10.1016/j.jksuci.2019.02.005
3. Jindal, L., Singh, H., Sharma, S.: A framework for grammatical error detection and correction
system for Punjabi language using stochastic approach. EAI Endorsed Trans. Scalable Inf.
Syst. 8 (2021). https://doi.org/10.4108/eai.27-4-2021.169421
4. Leekha, J., Vijay, R., Sanjeev, K.: N-gram statistical grammar checker for an Indian language.
Int. J. Adv. Sci. Technol. 29, 3098–3106 (2020). http://sersc.org/journals/index.php/IJAST/
article/view/4541
5. Nahid, H., Salekul, I., Mohammad, N.: Development of Bangla spell and grammar checkers:
resource creation and evaluation. Inst. Electr. Electron. Eng. 9 (2021). Digital Object Identifier.
https://doi.org/10.1109/ACCESS.2021.3119627
6. Riazur, R., Tarek, H., Sadekur, R., Shaon, B., Mohammad, S.: An investigative design based
statistical approach for determining Bangla sentence validity. Int. J. Comput. Sci. Netw. Secur.
16, 30–37 (2016). http://paper.ijcsns.org/07_book/201611/20161106.pdf
7. Verena, H., Timo, R.: LIS Grammar Checker: Language Independent Statistical Gram-
mar Checking (2009). https://www.ru.is/kennarar/hrafn/students/MasterThesis_HenrichRe
uter.pdf
8. Debela, T.: A rule-based Afan Oromo grammar checker. Int. J. Adv. Comput. Sci. Appl.
(IJACSA) 2(8) (2011). https://doi.org/10.14569/IJACSA.2011.020823
9. Suchomel, V., Baisa, V., Jakubíček, M., Kovář, V., Nevěřilová, Z., Rambousek, A.: HaBiT - harvesting big text data for under-resourced languages. Habit-project.eu (2014). http://habit-project.eu/. Accessed 20 Nov 2019
10. Helmut, S.: Probabilistic part-of-speech tagging using decision trees. In: International Con-
ference on New Methods in Language Processing, pp. 44–49 (1994). http://citeseerx.ist.psu.
edu/viewdoc/similar?doi=10.1.1.28.1139&type=cc
The Internet of Things as a Tool Towards Smart
Education: A Systematic Review
Abstract. IoT adoption has grown exponentially across a vast number of industries, with each industry in which IoT is applied being characterised by a unique set of prospects and challenges. In education, advancements in new technologies, guided by the advent of artificial intelligence and IoT, have seen the learning environment transform from traditional learning to digital learning. Leveraging the big data generated from IoT applications is a process that education institutions could adopt to address the challenges of implementing IoT solutions, as well as challenges in the education industry. We surveyed the literature to identify new developments, trends and applications of IoT in the education industry. To achieve this we used the Scopus database and retrieved related articles using keywords, for example "IoT in education", "IoT and education", "IoT application in education", "IoT in teaching and learning", "IoT and online learning", "Implication of IoT in Education", "IoT and distance learning", and "IoT and monitoring in education". We established that the IoT's application in the education field has expanded in recent years, with applications developed in different institutions globally having already been adopted. The results of the content analysis were classified into four main categories, namely Application, Potential, Factors and Challenges of IoT in education. Directions and recommendations for future work concerning IoT's implementation in education are also presented.
1 Introduction
1.1 IoT Overview
The Internet of things (IoT) is defined as a set of electronic devices that are connected via
the internet or intranet. Such devices and objects include sensors, electronics and soft-
ware. This technology enables connection between devices (things), people and environ-
ments in order to collect data by embedding actuators and sensors, then transmitting such
data to specialised applications to create useful and actionable information. The extant
literature has adopted several terms to define IoT, for instance, the Internet of Anything
(IoA), Internet of Everything (IoE), Web of Things, Industrial Internet of Things (IIoT),
or Machine-to-Machine communication. IoT affects numerous spheres of life including
education, social, health, transport, communication, environmental monitoring, business
and society.
The concept of IoT is deemed to be a gateway to the digital society. IoT’s innova-
tions will increase based on continuous advances in cloud computing, nanoelectronics,
communications, sensors, big data, as well as smart objects. IoT is a particular aspect of
the Internet that permits the connection of humans to each other, connection of human
and things, in addition to connections between things. Consequently, the emergence of
the IoT has facilitated the establishment of giant intelligent systems.
In education, the IoT brings changes in teaching and learning, practical and experimental changes, campus changes, changes in security and confidentiality, quality and ethics, changes of a financial nature, in addition to other types of changes [2].
The IoT’s adoption in various fields has helped to revolutionise them. One such field
is higher education, which has begun adopting the IoT as a means of enhancing learning,
training, management, experimentation and so forth [1]. However, the IoT’s adoption
and its applications remain at a growth stage across industries. Given that the IoT is at a nascent stage of widespread implementation in the education field, it is important to investigate the challenges and variables affecting its implementation, future potential
use, as well as overall benefit in education.
2 Methodology
This paper aims to review the literature with a focus on new developments, trends and
the application of the IoT in education. Thus, a range of IoT-focused literature was
searched, with relevant peer reviewed research articles being retrieved from the Scopus
database. From over 100 papers retrieved, only 54 fulfilled the inclusion criteria, namely
being published between 2017 and 2021, having the keywords in the title, as well as
being peer reviewed. In accordance with this paper’s scope, a range of related keywords
and phrases were applied to retrieve the related articles, with keywords being “IoT in
education”, “IoT and education”, “IoT application in education”, “IoT in teaching and
learning”, “IoT and online learning”, “Implication of IoT in Education”, as well as “IoT
and distance learning”. Subsequently, thematic analysis was applied to identify key
emerging themes [5], followed by summarising and organising of data [6]. The selected
papers' data were broadly divided into the themes of "potential", "challenges", "factors", and "application".
3 Review
The IoT concerns technology transformation in various aspects. Smart cities, smart
homes, smart transportation and smart industries are such transformations stemming
from the IoT. Considerable crucial research and investigations have been undertaken with
the aim of enhancing technology via the IoT. Even so, substantial challenges and issues
remain that require resolution to attain the IoT’s full potential. This research presents a
systematic review of recent academic research published in scientific journals concerning
the IoT’s application in the education field. The paper’s results have been classified into
four sections: the technologies and application; benefits and potential; challenges, as
well as factors. Finally, the paper summarises the literature review directions, while also
providing recommendations regarding how future research could elaborate on the trends
and research developments identified in the extant literature reviewed.
Smart Campus is one of the newest concepts linked to the IoT's application to the education field. A few related terms have been adopted in the extant literature, including smart classroom, smart library and smart books, all of which pertain to the IoT's integration with campus-related technologies. For example, the IoT may be used to collect mass data through wearable devices, sensors and actuators, embedded sensors and QR codes. These technologies can promote and enable the smart campus when the IoT is adopted to manage related functions of university campuses, including temperature-controlled devices, lighting, security cameras and building access, simplified access control, enhanced security, classroom monitoring and notification, automated attendance processes, integration of the IoT and open data in school books, smart boards, smart libraries and numerous other applications. Meanwhile, the IoT offers convenience in relation to future smart campus design, construction, teaching, as well as overall management.
The IoT enhances how schools monitor students’ behaviour, performance, loca-
tions, health and social behaviours, with applicability in the use of beacon chips as
a form of student identity. This technology ensures simplification of facial identifica-
tion challenges through the use of biometrics. Additionally, the IoT guarantees student
monitoring activities’ accuracy, thereby ensuring smart school applications [7]. Smart
operations management through the IoT can reduce costs for a sustainable campus, because it enables smarter service delivery. Higher education institutions are well placed to develop smart solutions that are achievable through smart services [3, 8–10]. Concerning
teaching and learning, the IoT’s most recent applications in the education field enable
the simplification of pedagogical methods. This technology enhances teacher-student
relationships, providing the teacher with novel ways of realising students’ deep learning
abilities. Schools are concerned with comprehending how the teaching environment’s
overall intelligence may be enhanced in order to strengthen learners’ outcomes.
Currently, it is economical to manage students, staff, researchers and lecturers
through sharing data and functionalities, the coexistence of old and new systems, in
addition to the elimination of major drawbacks that challenge school management. Tech-
nologies including sensor modules, microcontroller boards, digital payment services
and other infrastructure, enhance schools’ sustainability. IoT improves the traditional
education system through an innovative technology-guided learning strategy. In this
case, students, teachers and staff may collaborate to share ideas, materials, projects,
screens and communications. This ensures transversal combinations across actors along
the education value chain, where a common language is tangible for all stakeholders.
Flipped classrooms and online classes are further means through which students engaged
in long-distance learning can collaborate online.
The research reported applications pertaining to the IoT and eLearning, including
IoT technologies adopted for enhancing online learning through IoT data driven analy-
sis, gamification for making the learning experience engaging and effective, as well as
intelligent systems combining IoT, AI and VR tools. This assists instructors with super-
vising the students while presenting lessons and during their exams. Overall attempts
to fully automate the learning process have been made by connecting IoT devices with
cutting-edge learning technologies. The summary of IoT applications in education is
presented in Table 1.
Although the IoT's popularity in the education field has increased because it offers empowerment while being swift and effective, it has not yet been comprehensively implemented. The challenges limiting its implementation are financial constraints, complexity, privacy, security, trust and ethics. Furthermore, there is limited expertise, meaning a dearth of guidance and standard authentication. This leads to incompatibility of devices, poorly defined auditing standards for IoT components, and restricted interfaces. Additional challenges include a dearth of skills among users, in addition to poor acceptability and scalability.
The challenges limiting the IoT’s implementation are financial constraints, complex-
ity, privacy, security, trust and ethics. Security and privacy are among the fundamental
difficulties confronting the IoT in the education field, due to there being a lack of secu-
rity, limited device update improvements, in addition to poor user awareness concerning
security [19, 34, 39]. Cyber security attacks originate from objects’ massive intercon-
nectivity online, thus making it accessible by anonymous and untrusted users. Users’
privacy rights are fundamental in ensuring confidence in interconnected devices.
Further challenges include a dearth of skills among users, poor acceptability and
scalability, alongside poor power and internet connections, especially in developing
nations. Ultimately, it is significant to ensure safety and reliability, as well as a dual
computer backup system combined with other management strategies.
Big data comes hand in hand with the IoT, meaning that a large domain of interacting
objects will generate big data. It is anticipated that the IoT’s scalability will be an issue
with regard to the virtual classroom’s size, sensors and actuators, among other virtual
and physical objects. Key resources are expensive to acquire, for example accreditations,
buildings and faculty members, databases, in addition to other IoT technologies. Despite
the IoT enabling savings to be made on future expenditure, maintenance and installation
expenditure is nevertheless substantial. Additionally, curriculum divergence is a chal-
lenge, because it disadvantages students with limited credit hours and less profound IoT
skills and knowledge. This results in deficient practical skills that are generally lacking
across numerous campuses. Table 3 presents a summary of the difficulties linked with
the IoT’s implementation in the education field.
4 Discussion
Several studies have been undertaken in relation to the IoT in the educational context.
Research has considered the prospects of the IoT for transforming the educational system,
for example [16, 17, 19, 48]. Additional studies have been undertaken investigating users’
acceptance of IoT-based applications [39, 49] being adopted in the educational context
[1, 24, 36]. The results revealed that such technologies have tremendous potential to
transform the education paradigm, although the IoT’s implementation continues to be
linked with particular difficulties, for example security and educational management
issues [20].
A study by [50] investigated solutions to problems pertaining to the curriculum,
human resources, financial restrictions, distance learning and cultural challenges, as
well as how to have confidence in the examination process. Such solutions include inter-
active text books, 3D positioning technologies to solve security issues, IoT end-devices
attendance data and intelligent camera vision, which are all usable on campus. Further-
more, [51] appraised the educational management decision-making process. The IoT has
the potential to shift the educational system’s design to be more responsive to students’
needs. For IoT-based application studies, the results revealed the most significant IoT
applications, including smart tools (pens, stopwatches, glasses), experimental learning,
smart notifications, instant feedback and monitoring, security and control equipment,
students’ behaviour and interaction systems, monitoring, student attendance, big data
analysis, in addition to information sharing.
Additional research has attempted to devise innovative concepts and has proposed or recommended novel approaches. For instance, the Internet of robotic things
manages the interaction between the physical and virtual worlds [46], while [40] sought
to identify how each aspect of the educational system may be automated with IoT chips.
Further studies have concerned online education, for example research [27] that
developed an amended hybrid blended learning model, enabling educators and learners
to co-create knowledge and enjoy online learning, combining the advantages of face-to-
face learning with the traditional approach. Additional studies have focused on the IoT
and Gamification, IoT for e-learning using gamification [29], as well as an education
game based on the IoT. The research [31] discussed the IoT’s role in an effective dis-
tance learning process, aiming to devise a model to enable teachers’ provision of instant
feedback to students, thus overcoming the challenges of distance learning compared
with face-to-face learning. Study [52] proposed a framework for measuring students’
behaviour and attentiveness by observing facial expressions, while [18] proposed a novel
model for integrating educational environment objectives with virtual academic com-
munities. Study [26] adopted an application orientated architecture (AOA) alongside
The IoT's potential to support education has been significantly emphasised in the extant literature, although the majority of research continues to be of a theoretical nature rather than dealing with practical aspects. Most studies used qualitative methodologies [1], primarily literature reviews, while other studies have used case studies. A more limited number of studies have adopted quantitative methodologies such as surveys, while a few have developed innovative technologies and proposed novel applications employing the IoT in the education field. Accordingly, it is recommended that further studies be undertaken into the applications of the various IoT technologies in the learning environment.
In the e-learning context, popular learning management systems (LMS) must adopt the IoT as part of the new emerging technologies in the education field [55], thereby providing students with rich experiences of informal and lifelong learning. The author in [56] recommended that future research should concentrate on constructing the smart education
References
1. Chweya, R., Ibrahim, O.: Internet of Things (IoT) implementation in learning institutions:
a systematic literature review. Pertanika J. Sci. Technol. 29(1), 471–517 (2021). Universiti
Putra Malaysia Press. https://doi.org/10.47836/pjst.29.1.26
2. Mircea, M., Stoica, M., Ghilic-Micu, B.: Investigating the impact of the Internet of Things
in higher education environment. IEEE Access 9, 33396–33409 (2021). https://doi.org/10.
1109/ACCESS.2021.3060964
3. Al-Emran, M., Malik, S.I., Al-Kabi, M.N.: A survey of Internet of Things (IoT) in education:
opportunities and challenges. In: Hassanien, A.E., Bhatnagar, R., Khalifa, N.E.M., Taha,
M.H.N. (eds.) Toward Social Internet of Things (SIoT): Enabling Technologies, Architectures
and Applications. SCI, vol. 846, pp. 197–209. Springer, Cham (2020). https://doi.org/10.1007/
978-3-030-24513-9_12
4. Verma, A., Singh, A., Anand, D., Aljahdali, H.M., Alsubhi, K., Khan, B.: IoT inspired intelli-
gent monitoring and reporting framework for education 4.0. IEEE Access 9, 131286–131305
(2021). https://doi.org/10.1109/ACCESS.2021.3114286
5. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101
(2006). https://doi.org/10.1191/1478088706qp063oa
6. Maguire, M., Delahunt, B.: Doing a thematic analysis: a practical, step-by-step guide for
learning and teaching scholars (2017)
7. Banica, L., Burtescu, E., Enescu, F.: The Impact of Internet-of-Things in Higher Education.
http://www.gartner.com/newsroom/id/2819918
8. Charmonman, S., Mongkhonvanit, P., Dieu, V.N., van der Linden, N.: Applications of Internet
of Things in E-learning. Int. J. Comput. Internet Manag. 23(3), 1–4. www.charm.SiamTe
chU.net
9. Martins, P., Lopes, S.I., da Cruz, A.M.R., Curado, A.: Towards a smart & sustainable campus:
an application-oriented architecture to streamline digitization and strengthen sustainability in
academia. Sustainability 13(6) (2021). https://doi.org/10.3390/su13063189
10. McRae, L., Ellis, K., Kent, M.: The Internet of Things (IoT): Education and Technology.
http://www.curtin.edu.au/
11. Shehzad, K., Xiaoxing, L., Sarfraz, M., Zulfiqar, M.: Signifying the imperative nexus between
climate change and information and communication technology development: a case from
Pakistan. Environ. Sci. Pollut. Res. 27(24), 30502–30517 (2020). https://doi.org/10.1007/s11
356-020-09128-x
12. Majeed, A., Ali, M.: How Internet-of-Things (IoT) making the university campuses smart?
QA higher education (QAHE) perspective. In: 2018 IEEE 8th Annual Computing and Com-
munication Workshop and Conference, CCWC 2018, vol. 2018, pp. 646–648, January 2018.:
https://doi.org/10.1109/CCWC.2018.8301774
13. Domínguez, F., Ochoa, X.: Smart objects in education: an early survey to assess opportu-
nities and challenges. In: 2017 4th International Conference on eDemocracy eGovernment,
ICEDEG 2017, pp. 216–220 (2017). https://doi.org/10.1109/ICEDEG.2017.7962537
14. Hardyanto, H.: Smartclass design based on Internet of Things. In: International Confer-
ence on Education and Science, Icons, pp. 959–962 (2017)
15. Bagheri, M., Movahed, S.H.: The effect of the Internet of Things (IoT) on education business
model. In: Proceedings - 12th International Conference on Signal Image Technology and
Internet-Based Systems, SITIS 2016, pp. 435–441 (2017). https://doi.org/10.1109/SITIS.201
6.74
16. Ramlowat, D.D., Pattanayak, B.K.: Exploring the Internet of Things (IoT) in education: a
review. In: Satapathy, S.C., Bhateja, V., Somanah, R., Yang, X.-S., Senkerik, R. (eds.) Infor-
mation Systems Design and Intelligent Applications. AISC, vol. 863, pp. 245–255. Springer,
Singapore (2019). https://doi.org/10.1007/978-981-13-3338-5_23
17. Alalade, A.M., Ejemeyovwi, J.O., Ekong, E.E., Adeyemo, D.: Internet of Things as a tool
for enhancement of education administration and delivery. Int. J. Mech. Eng. Technol. 10(5),
48–62 (2019)
18. Marquez, J., Villanueva, J., Solarte, Z., Garcia, A.: IoT in education: integration of objects
with virtual academic communities. In: Advances in Intelligent Systems and Computing, vol.
444, pp. 201–212 (2016). https://doi.org/10.1007/978-3-319-31232-3_19
19. Rodney, B.D.: Understanding the paradigm shift in education in the twenty-first century: the
role of technology and the Internet of Things. Worldw. Hosp. Tour. Themes 12(1), 35–47
(2020). https://doi.org/10.1108/WHATT-10-2019-0068
20. He, X., Guo, H., Cheng, X.: Blockchain-based privacy protection scheme for IoT-assisted
educational big data management. Wirel. Commun. Mob. Comput. 2021 (2021). https://doi.
org/10.1155/2021/3558972
21. Guo, J., Sun, C.: Real-time monitoring of physical education classroom in colleges and universi-
ties based on open IoT and cloud computing. J. Intell. Fuzzy Syst. 40(4), 7397–7409 (2021).
https://doi.org/10.3233/JIFS-189563
22. Paganelli, F., Mylonas, G., Cuffaro, G.: A RESTful rule management framework for internet
of things applications. IEEE Access 8, 217987–218001 (2020). https://doi.org/10.1109/ACC
ESS.2020.3041321
23. Herlianto, H.R., Kusuma, G.P.: IoT-based student monitoring system for smart school appli-
cations. Int. J. Emerg. Trends Eng. Res. 8(9), 6423–6430 (2020). https://doi.org/10.30534/ije
ter/2020/242892020
24. Jasim, N.A., Salim AlRikabi, H.T., Farhan, M.S.: Internet of Things (IoT) application
in the assessment of learning process. In: IOP Conference Series: Materials Science and
Engineering, vol. 1184, no. 1, p. 012002 (2021). https://doi.org/10.1088/1757-899x/1184/1/
012002
25. Miglino, O., Di Fuccio, R., Di Ferdinando, A., Ricci, C.: BlockMagic, a hybrid educational
environment based on RFID technology and Internet of Things concepts. In: Giaffreda, R.,
et al. (eds.) IoT360 2014. LNICSSITE, vol. 150, pp. 64–69. Springer, Cham (2015). https://
doi.org/10.1007/978-3-319-19656-5_10
26. Magyari, A., Chen, Y.: FPGA remote laboratory using IoT approaches. Electronics 10(18)
(2021). https://doi.org/10.3390/electronics10182229
27. Njeru, A.M., Omar, M.S., Yi, S., Paracha, S., Wannous, M.: Using IoT technology to improve
online education through data mining. In: Proceedings of the 2017 IEEE International Con-
ference on Applied System Innovation: Applied System Innovation for Modern Technology,
ICASI 2017, pp. 515–518 (2017). https://doi.org/10.1109/ICASI.2017.7988469
28. Shinghal, K., Saxena, A., Saxena, N., Misra, R.: IoT based modified hybrid blended learning
model for education. In: Proceedings of the 2020 International Conference on Advances
in Computing, Communication and Materials, ICACCM 2020, pp. 229–232 (2020). https://
doi.org/10.1109/ICACCM50413.2020.9213049
29. AjazMoharkan, Z., Choudhury, T., Gupta, S.C., Raj, G.: Internet of Things and its applications
in E-learning. Int. J. Eng. Technol. 7, 422–427 (2017). https://doi.org/10.1109/CIACT.2017.
7977333
30. Zaguia, A., Ameyed, D., Haddar, M., Cheikhrouhou, O., Hamam, H.: Cognitive IoT-based
e-learning system: enabling context-aware remote schooling during the pandemic. J. Healthc.
Eng. 2021 (2021). https://doi.org/10.1155/2021/7358874
31. Yakoubovsky, R., Sarian, V.: IoT in effective distance learning process. In: 2019 10th IFIP
International Conference on New Technologies, Mobility and Security, NTMS 2019, pp. 1–4
(2019). https://doi.org/10.1109/NTMS.2019.8763805
32. Hassan, R.H., Hassan, M.T., Naseer, S., Khan, Z., Jeon, M.: ICT enabled TVET education:
a systematic literature review. IEEE Access (2021). https://doi.org/10.1109/ACCESS.2021.
3085910
33. Hayashi, V.T., Arakaki, R., Ruggiero, W.V.: OKIoT: trade off analysis of smart speaker archi-
tecture on open knowledge IoT project. Internet Things 12 (2020). https://doi.org/10.1016/j.
iot.2020.100310
34. Riekki, J., Mammela, A.: Research and education towards smart and sustainable world. IEEE
Access 9, 53156–53177 (2021). https://doi.org/10.1109/ACCESS.2021.3069902
35. Shi, W., Haga, A., Okada, Y.: Web-based 3D and 360° VR materials for IoT security education
and test supporting learning analytics. Internet Things. 15, 100424 (2021). https://doi.org/10.
1016/j.iot.2021.100424
36. Pour, M.J., Hosseinzadeh, M., Rafiei, K.: Identifying and prioritizing applications of Inter-
net of Things (IOT) in educational learning using Interval Best-Worst Method (BWM). In:
Proceedings of the 4th International Conference on Smart City, Internet Things Applications,
SCIoT 2020, pp. 1–6 (2020). https://doi.org/10.1109/SCIOT50840.2020.9250206.
37. A.-L. Enterprise: The Internet of Things in Education: Improve Learning and Teaching Experiences by Leveraging IoT on a Secure Foundation. Solution Brief
38. Kassab, M., Neto, V.V.G., Allian, A.: Investigating quality requirements from a human per-
spective in IoT-based software architectures for education. In: PervasiveHealth: Pervasive
Computing Technologies for Healthcare, vol. 2, pp. 241–244 (2019). https://doi.org/10.1145/
3344948.3344978
39. Ionescu-Feleaga, L., Ștefan Ionescu, B., Bunea, M.: The IoT technologies acceptance in
education by the students from the economic studies in Romania. Amfiteatru Econ. 23(57),
342–359 (2021). https://doi.org/10.24818/EA/2021/57/342
40. Tripathi, G., Ahad, M.A.: IoT in education: an integration of educator community to promote
holistic teaching and learning. In: Nayak, J., Abraham, A., Krishna, B.M., Chandra Sekhar,
G.T., Das, A.K. (eds.) Soft Computing in Data Analytics. AISC, vol. 758, pp. 675–683.
Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0514-6_64
41. Bajracharya, B., Blackford, C., Chelladurai, J.: Number 1, vol. 6
42. Kumar, S.R., et al.: This work is licensed under a creative commons attribution 4.0 interna-
tional license IOT based cloud integrated smart classroom and sustainable campus. Int. Adv.
Res. J. Sci. Eng. Technol. 8(5) (2021). https://doi.org/10.17148/IARJSET.2021.8560
43. Gómez, J., Huete, J.F., Hoyos, O., Perez, L., Grigori, D.: Interaction system based on Internet
of Things as support for education. Procedia Comput. Sci. 21, 132–139 (2013). https://doi.
org/10.1016/j.procs.2013.09.019
44. Kiryakova, G., Yordanova, L., Angelova, N.: Can we make Schools and universities smarter
with the Internet of Things? TEM J. 6(1), 80–84 (2017). https://doi.org/10.18421/TEM61-11
45. Kumar, A., Vengatesan, K., Rajesh, M., Singhal, A.: Teaching literacy through animation &
multimedia. Int. J. Innov. Technol. Explor. Eng. 8(5), 73–76 (2019)
46. Romeo, L., Petitti, A., Marani, R., Milella, A.: Internet of robotic things in smart domains:
applications and challenges. Sensors 20(12), 1–23 (2020). MDPI AG. https://doi.org/10.3390/
s20123355
47. Gutiérrez-Martínez, Y., et al.: A challenge-based learning experience in industrial engineering
in the framework of education 4.0. Sustainability 13(17) (2021). https://doi.org/10.3390/su1
3179867
48. Shrestha, S.K., Furqan, F.: IoT for smart learning/education (2021)
49. Abed, S., Alyahya, N., Altameem, A.: IoT in education: its impacts and its future in Saudi
universities and educational environments. In: Luhach, A.K., Kosa, J.A., Poonia, R.C., Gao,
X.-Z., Singh, D. (eds.) First International Conference on Sustainable Technologies for Com-
putational Intelligence. AISC, vol. 1045, pp. 47–62. Springer, Singapore (2020). https://doi.
org/10.1007/978-981-15-0029-9_5
50. Mohammadian, H.D.: IoT - a solution for educational management challenges. In: IEEE
Global Engineering Education Conference, EDUCON, pp. 1400–1406, April 2019. https://
doi.org/10.1109/EDUCON.2019.8725213
51. Silva, R., de Pontes Bernardo, C., Watanabe, C.Y.V., da Silva, R.M.P., da Silva Neto, J.M.:
Contributions of the internet of things in education as support tool in the educational man-
agement decision-making process. Int. J. Innov. Learn. 27(2), 175–196 (2020). https://doi.
org/10.1504/IJIL.2020.105077
52. Mahmood, S., Palaniappan, S., Hasan, R., Sarker, K.U., Abass, A., Rajegowda, P.M.: Rasp-
berry PI and role of IoT in education. In: 2019 4th MEC International Conference on Big
Data and Smart City, ICBDSC 2019, pp. 1–6 (2019). https://doi.org/10.1109/ICBDSC.2019.
8645598
53. Cornetta, G., Touhafi, A., Togou, M.A., Muntean, G.M.: Fabrication-as-a-service: a web-
based solution for STEM education using Internet of Things. IEEE Internet Things J. 7(2),
1519–1530 (2020). https://doi.org/10.1109/JIOT.2019.2956401
54. Fang, A.D., Xie, S.C., Cui, L., Harn, L.: Research on the structure and practice of internet
environment of things based on big data analysis. Ekoloji 28(107), 4239–4247 (2019)
55. Alhazmi, A.K., Imtiaz, A., Al-Hammadi, F., Kaed, E.: Success and failure aspects of LMS
in e-learning systems. Int. J. Interact. Mob. Technol. 15(11), 133–147 (2021). https://doi.org/
10.3991/ijim.v15i11.20805
56. Dai, Z., Zhang, Q., Zhu, X., Zhao, L.: A Comparative study of Chinese and foreign research
on the Internet of Things in Education: bibliometric analysis and visualization. IEEE Access
9, 130127–130140 (2021). https://doi.org/10.1109/ACCESS.2021.3113805
57. Kumar, S., Tiwari, P., Zymbler, M.: Internet of Things is a revolutionary approach for future
technology enhancement: a review. J. Big Data 6(1), 1–21 (2019). https://doi.org/10.1186/
S40537-019-0268-2/FIGURES/9
The VCDLN Mobile Learning System for Digital
Learning Services in Pandemic Covid-19
1 Introduction
The COVID-19 pandemic, which impacts service delivery and direct learning interac-
tions, requires new studies for policymakers, scientists, and industrial partners provid-
ing digital learning platforms to collaborate in the development of an e-learning system
[1]. According to research, several professional educator organizations such as Subject
Teacher Consultations (STC) and Teacher Working Groups (TWG), Indonesian Teach-
ers Association, and even UNESCO, quickly implemented the necessary innovations.
Everyone has realized how important it is to establish a robust communication system
and strategy [2] in learning services right away. Also, new techniques and strategies are
urgently needed considering the development of the educational world, which has seen
a shift from face-to-face teaching to learning from home.
As part of its strategic response to the COVID-19 pandemic in Indonesia, the ministry
developed an online digital mobile learning innovation policy [3]. This is a new force in
the “New Normal Education” era’s learning revolution. As a result, this research aims to
create a “Virtual Community Digital Learning Nusantara in the COVID-19 pandemic”.
It is expected to accommodate all innovations and revolutions in learning through mobile systems [4] and healthy learning communication strategies within a virtual, community-based, digital, online, mobile, electronic distance learning framework packaged in the form of Digital Mobile Television. This is then developed and utilized by the VCDLN community, which includes educators, industry, local governments, schools, minimarkets, police stations, and village and sub-district heads, all of whom work together to serve students in remote parts of the archipelago. Specifically, the objectives of this research are as follows: (1) to integrate the distance learning system with multiple learning resources ("Hand on Hand Technology"); (2) to develop a CBT system for e-assessment needs; and (3) to measure the opinions of the VCDLN community members
which include minimarket, village office, district police office, integrated healthcare cen-
ter, school, sub-district office, community health centers and military Rayon Command
on the implementation of the program as a model for Multiplatform Distance Learning,
during the COVID-19 pandemic.
2 Literature Review
2.1 Element of the VCDLN System
In retrospect, some essential elements in implementing the VCDLN system can be analyzed in terms of several objects or target subjects that are often used in educational practice, such as software, hardware, brainware, and environmentware. Likewise, the analysis makes it possible to quickly put into practice a new mobile learning concept or model adapted from the research in [5, 6].
Fig. 1. Elements of the VCDLN system: hardware, software, brainware, environmentware, learning practices, technological education institutions and virtual communities (Europe, USA, Asia), virtual communication technology (Spain), and social media and websites.
In the implementation paradigm, mobile education communication system and strategy services are needed [7], and the learning innovation in this research is called the Virtual Community Digital Learning Nusantara (VCDLN) system. The analysis results of these elements are summarized in Fig. 1.
These elements of the VCDLN system, as shown in Fig. 1, become the main object of study in light of the regulations noted above, under which current rules and demands call for a return to normal under new conditions. The implementation of this system must therefore be harmonized with the "New Normal" regulations, following the evaluation of the mobile technology expert system [8]. As a method of implementing VCDLN in the context of realizing this new normal, Mobile Blended Learning becomes possible. This method is governed by the Minister of Research, Technology, and Higher Education Regulation Number 51 of 2018. Thus, it is a concrete form of education and learning policy combining face-to-face and distance learning systems that use online databases as a research basis [4], primarily through television broadcast programs.
3 Research Method
This research employed the mixed method (qualitative and quantitative) [14], which was applied in each year of the study (2021, 2022, and 2023). The qualitative approach was used to integrate distance learning services with multiple learning resources ("Hand on Hand Technology") as a mobile television system adaptable to elementary, middle, and higher education levels. It was also used to develop a CBT system for VCDLN's e-assessment needs. Meanwhile, the quantitative approach was utilized to assess the impact of the opinions of VCDLN community members on the implementation of the VCDLN Service Program. Multiple regression analysis was used as the test statistic.
4.1 Integrating the Distance Learning Service System with Multiple Learning
Resources “Hand on Hand Technology”
The VCDLN system, which was built as a dedicated website with the domain name VCDLN Learning, was developed as an e-learning educational database that hosts the learning video products [7]. Besides being a database system, it was also created in the form of Mobile-VCDLN learning through an APK application that can be accessed and downloaded on Android, drawing on the research in [15]. The goal is to design a multiplatform system that benefits both educators and students. Furthermore, to utilize ready-made
learning video products, the third system takes the form of an official YouTube channel named VCDLN access, which is integrated with TVUPI and can be found at https://www.youtube.com/watch?v=hGDec-Jpm4E&t=206s.
In this pandemic era, the three systems are integrated into the Mobile Distance Learning model as shown in Fig. 3 below.
Fig. 3. Integration of VCDLN development products for mobile distance learning services in "hand on hand technology."
The need for integration arises from the desire to serve all students, who have varying levels of ownership of digital learning infrastructure. Considering that not all pupils have mobile phones, the VCDLN technology tries to provide a variety of devices, channels, and learning resources, following the research in [8]. Furthermore, one of the most important reasons for integrating the VCDLN products is to address differences in the regional and geographical conditions in which students live. Indonesia has regions that are still limited in terms of internet access and signal coverage. As a result, the solution developed by VCDLN, Satellite TVUPI, becomes an option for distance learning services, with reference to the research in [16].
This research also developed an evaluation tool to measure students' learning success using the multiplatform VCDLN. Because the learning system is digital, online, and mobile-based, the evaluation system is also digital, online, and mobile, using the CBT (Computer-Based Test) concept. This CBT system serves as an e-assessment function, with an architecture designed for online digital services that mobile online clients can access.
Fig. 4. Client-server communication system of the CBT VCDLN application.
Figure 4 shows the client-server communication system. The application has two login page views, for test takers and administrators. Participants can only register and take tests, whereas VCDLN administrators (in this case, teachers) can manage system components such as questions, texts, grades, and students [1]. In addition, students can register, take tests, and view test results in this system, while educators or administrators can manage texts, types of questions, users, and test results [16].
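As a rough sketch (not the authors' implementation; the class and operation names are assumed for illustration), the two login roles and their permitted operations described above could be modelled as follows:

```python
from dataclasses import dataclass

# Operations permitted for each CBT VCDLN role, as described in the text.
PERMISSIONS = {
    "participant": {"register", "take_test", "view_results"},
    "admin": {"manage_questions", "manage_texts", "manage_grades",
              "manage_users", "view_results"},
}

@dataclass
class User:
    name: str
    role: str  # "participant" (student) or "admin" (teacher/administrator)

    def can(self, operation: str) -> bool:
        """Return True if this user's role permits the requested operation."""
        return operation in PERMISSIONS.get(self.role, set())

student = User("student01", "participant")
teacher = User("teacher01", "admin")
print(student.can("take_test"))         # True
print(student.can("manage_questions"))  # False
print(teacher.can("manage_grades"))     # True
```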
The first step in developing CBT VCDLN is constructing data flow diagrams (DFDs), starting with a context diagram that depicts the system as a whole with its visible external entities. This context diagram is then subdivided into the processes that occur within it, broken down from DFD level 0 to the more detailed DFD level 1; DFD level 0 is shown in Fig. 5 below.
Fig. 5. DFD level 0 of the CBT VCDLN system (participant registration and login, CBT test, question and text management, result evaluation, and score export).
DFD level 0 shows the CBT process traffic for each educator during the online evaluation process for their students, to which this development refers [17]. A level 1 DFD was then developed to explain how the CBT system works in more detail. The interrelation between all elements in the level 1 data flow diagram can be seen in Fig. 6 below.
Fig. 6. DFD level 1 of the CBT VCDLN system (admin management of questions, modules, and test items; evaluation; display of results and scores; logout).
This research assessed the opinions of the VCDLN community to determine the impact of the VCDLN program's implementation, based on the findings of a previous study [18]. The communities that have been designated as members of the VCDLN were asked to contribute their thoughts on the influence of the changes caused by the program's integration in their area. The measurement process was carried out using multiple regression test statistics. The following measurement output shows the results of the calculations regarding the effect of the eight community variables (X1, X2, X3, X4, X5, X6, X7, and X8) on the success variable of the VCDLN program implementation (Y). More details can be seen in Table 1 below.
The table depicts the relationship between the independent variables (X1), (X2), (X3), (X4), (X5), (X6), (X7), and (X8) and the dependent variable (Y), namely the VCDLN program implementation, with a coefficient of 0.794. The total contribution of the Military Rayon Command, Community Health Centers, Sub-District Offices, Integrated Healthcare Centers, District Police Offices, and schools to the implementation of the program is 0.59. Thus, the successful implementation of the VCDLN program is determined by the contribution of the eight X variables, namely KP × 100% = 0.590 × 100% = 59.0%. The remaining 41% is influenced by other variables not examined. This finding demonstrates that the strength of each community's leadership will determine the success of the VCDLN program in the field [19].
Furthermore, to prove the significance of the simultaneous effect of the independent variables (X1), (X2), (X3), (X4), (X5), (X6), (X7), and (X8) on the success of the VCDLN program implementation (Y), the ANOVA test with the F-count formula is used as follows.
$$F_{\text{count}} = \frac{R^2 / K}{(1 - R^2)/(n - 1)} \tag{1}$$
The SPSS output results can be seen in the ANOVA table below.
Table 2. Anova
According to Table 2, the F-count value is 84.591, with a significance value of 0.000. This value is greater than the F-table value of 4.57, and the F significance value is less than α = 5%. Hence the null hypothesis is rejected and the alternative hypothesis is accepted, which means that the variables (X1), (X2), (X3), (X4), (X5), (X6), (X7), and (X8), which represent community strengths, simultaneously have a significant effect on the Y variable [20].
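For illustration, the snippet below computes the F-count exactly as written in Eq. (1) from the reported R² and number of predictors, and compares it with a conventional critical F value from scipy; the sample size n and the degrees of freedom are assumptions, since they are not reported in this excerpt:

```python
from scipy.stats import f

def f_count(r_squared, k, n):
    """F-count as written in Eq. (1): (R^2 / K) / ((1 - R^2) / (n - 1))."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - 1))

# R^2 = 0.590 and K = 8 predictors are taken from the text; n is an assumed
# sample size for the example only.
r2, k, n = 0.590, 8, 50
F = f_count(r2, k, n)
F_critical = f.ppf(0.95, dfn=k, dfd=n - k - 1)  # conventional critical value at alpha = 5%
print(F, F_critical, F > F_critical)
```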
Based on the coefficients in Table 3 below, the multiple regression equation can be constructed.
Table 3. Coefficients
According to the table, the minimarket regression coefficient (X1) is 0.106. It indicates that if its score increases by one unit, the implementation of the VCDLN program will rise by 10.6% as a result of the minimarket community's influence. The impacts of the other variables are, in order: the Village Office (X2) with 0.306 or 30.6%; the District Police Office (X3) with 0.222 or 22.2%; the Integrated Healthcare Center (X4) with 0.352 or 35.2%; the School (X5) with 0.552 or 55.2%; the Sub-District Office (X6) with 0.286 or 28.6%; the Community Health Centers (X7) with 0.520 or 52.0%; and the Military Rayon Command (X8) with 0.282 or 28.2%. These findings support research from [23] on the power of understanding value co-creation in virtual communities.
Furthermore, to test the significance of the effect of each independent variable on the dependent variable, the t-count value is used. The table shows that all t-count values are greater than the t-table value of 2.021. As a result, H0 is rejected and H1 is accepted, indicating that the regression coefficients of the (Constant), Military Rayon Command (X8), School (X5), Village Office (X2), Integrated Healthcare Center (X4), Community Health Centers (X7), Minimarket (X1), District Police Office (X3), and Sub-District Office (X6) have a significant influence on the VCDLN program implementation (Y). These vital variables are expected to serve as the basis for distance learning services ranging from early childhood education to university, as in the research from [21, 24–27].
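A sketch of how a multiple regression output of this shape (coefficients, t-values, R², and the F statistic) could be reproduced with statsmodels; the data below are randomly generated placeholders seeded with the reported coefficients, not the study's survey data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50                                    # assumed number of respondents
X = rng.normal(size=(n, 8))               # X1..X8: the eight community variables
beta = np.array([0.106, 0.306, 0.222, 0.352, 0.552, 0.286, 0.520, 0.282])
y = X @ beta + rng.normal(scale=0.5, size=n)   # synthetic Y using the reported coefficients

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.rsquared)    # coefficient of determination (R^2)
print(model.fvalue)      # F statistic for the joint significance test
print(model.tvalues)     # t-values for the constant and X1..X8
print(model.params)      # estimated regression coefficients
```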
5 Conclusion
The implementation of this VCDLN innovation program achieved its targets of (1) integrating the distance learning system with multiple learning resources ("Hand on Hand Technology"), (2) developing the CBT system for e-assessment needs, and (3) measuring the opinions of the VCDLN community members, which include the Minimarket, Village Office, District Police Office, Integrated Healthcare Center, School, Sub-District Office, Community Health Centers, and Military Rayon Command, on the implementation of the program as a multiplatform distance learning model during the COVID-19 pandemic. They have all provided comprehensive benefits in delivering solutions for distance learning services throughout Indonesia during the pandemic. The measurements revealed that the contribution and influence of all the VCDLN community's opinions on the implementation of the programs are positive. With the help of the learning service community, the VCDLN Mobile system that was built and tested can be a multiplatform distance learning service solution.
References
1. Tawafak, R.M., et al.: A combined model for continuous intention to use e-learning system.
Int. J. Interact. Mob. Technol. 15(03), 113–129 (2021)
2. Landicho, J.A.: VOISEE COMMUNICATOR: an android mobile application for hearing-
impaired and blind communications. Int. J. Interact. Mob. Technol. 10(4), 26 (2016). https://
doi.org/10.3991/ijim.v10i4.5859
3. Mun, S.H., et al.: Active learning using digital smart board to enhance primary school students’
learning. Int. J. Mob. Technol. 13(7), 4–16 (2019)
4. Rimale, Z., El Habib, B., Tragha, A., El Guemmat, K.: Survey on the use of the mobile
learning based on mobile cloud computing. Int. J. Interact. Mob. Technol. 10(3), 35 (2016).
https://doi.org/10.3991/ijim.v10i3.5672
5. Chohan, A.H., Mohd Affandi, H., Awad, J., Che-Ani, A.I.: A methodology to develop a
mobile application model to appraise housing design quality. Int. J. Interact. Mob. Technol.
11(6), 4 (2017). https://doi.org/10.3991/ijim.v11i6.6379
6. Almeatani, M., Alotaibi, H., Alasmari, E., Meccawy, M., Alghamdi, B.: Thesis supervision
mobile system for enhancing student-supervisor communication, pp. 4–14 (2019)
7. Karkar, A., Al Ja’am, J.: An educational ontology-based m-learning system. Int. J. Interact.
Mob. Technol. 10(4), 48 (2016). https://doi.org/10.3991/ijim.v10i4.6011
8. Divayana, D.G.H., et al.: An evaluation of instructional process of expert system course
program by using mobile technology-based CSE-UCLA model. Int. J. Interact. Mob. Technol.
11(6), 18 (2017). https://doi.org/10.3991/ijim.v11i6.6697
9. Villa-martinez, H.A.: Digital learning tools for mobile devices for accomplish hypothesis
testing of statistical parameters. Int. J. Interact. Mob. Technol. 13(6), 15–26 (2019)
10. Sharma, K., Mangaroska, K., van Berkel, N., Giannakos, M., Kostakos, V.: Information flow
and cognition affect each other: evidence from digital learning. Int. J. Hum. Comput. Stud.
146, 102549 (2021). https://doi.org/10.1016/j.ijhcs.2020.102549
11. Deng, C., Ji, X., Rainey, C., Zhang, J., Lu, W.: Integrating machine learning with human
knowledge. iScience 23(11), 101656 (2020). https://doi.org/10.1016/j.isci.2020.101656
12. Kattayat, S., Josey, S., Asha, J.V.: Mobile learning apps in instruction and students achieve-
ment. Int. J. Interact. Mob. Technol. 11(1), 143–147 (2017). https://doi.org/10.3991/ijim.
v11i1.6420
13. Zhao, H.: A summary of the research on the teaching mode of MOOCs, pp. 96–109 (2019).
https://doi.org/10.4236/jss.2019.72007
14. Creswell, J.D., Creswell, J.W.: Research Design: Qualitative, Quantitative, and Mixed
Methods Approaches. Sage Publ. (2017)
15. Zhampeissova, K., Kosareva, I., Borisova, U.: Collaborative mobile learning with smartphones
in higher education. Int. J. Interact. Mob. Technol. 14(21), 4 (2020). https://doi.org/10.3991/
ijim.v14i21.18461
16. Haddad, M.E.O., Ferreira, N.S.C., Faria, A.A.: The use of educational technologies in distance
education—enabling the appropriation of teaching and learning process. Open J. Soc. Sci.
02(01), 54–58 (2014). https://doi.org/10.4236/jss.2014.21006
17. Kraleva, R.: Designing an interface for a mobile application based on children’s opinion. Int.
J. Interact. Mob. Technol. 11(1), 53–70 (2017). https://doi.org/10.3991/ijim.v11i1.6099
18. Kattayat, S., Josey, S., Asha, J.V.: Mobile learning apps in instruction and students achieve-
ment. Int. J. Interact. Mob. Technol. 11(1), 143 (2017). https://doi.org/10.3991/ijim.v11i1.
6420
19. Hamzah, M.I.M., Jamil, M.F.: The relationship of distributed leadership and professional
learning community. Creat. Educ. 10(12), 2730–2741 (2019). https://doi.org/10.4236/ce.
2019.1012199
20. Gómez, R.L., Suárez, A.M.: Extending impact beyond the community: protocol for a scoping
review of evidence of the impact of communities of practice on teaching and learning in
higher education. Int. J. Educ. Res. Open 2, 100048 (2021). https://doi.org/10.1016/j.ijedro.
2021.100048
21. Strunga, A.: The integration of virtual learning communities into universities’ knowledge
management Models. Procedia Soc. Behav. Sci. 197, 2430–2434 (2015). https://doi.org/10.
1016/j.sbspro.2015.07.306
22. Strungă, A.: Using virtual learning communities in shaping the professional identity of pri-
mary and preschool pedagogy specialization students: a knowledge management approach.
Procedia Soc. Behav. Sci. 180, 460–467 (2015). https://doi.org/10.1016/j.sbspro.2015.02.145
23. Rodríguez-López, N.: Understanding value co-creation in virtual communities: the key role
of complementarities and trade-offs. Inf. Manag. 58(5) (2021). https://doi.org/10.1016/j.im.
2021.103487
24. Amponsah, E., Fusheini, A., Adam, A.: Influence of information, education and communi-
cation on prenatal and skilled delivery in the Tano North District, Ghana: a cross-sectional
study. Heliyon 7(6), e07245 (2021). https://doi.org/10.1016/j.heliyon.2021.e07245
25. Ahmady, S., Kohan, N., Bagherzadeh, R., Rakshhani, T., Shahabi, M.: Validity testing of
classroom community scale in virtual environment learning: a cross sectional study. Ann.
Med. Surg. 36, 256–260 (2018). https://doi.org/10.1016/j.amsu.2018.08.021
26. Liu, X., Zhang, J.: Foreign language learning through virtual communities. Energy Procedia
17, 737–740 (2012). https://doi.org/10.1016/j.egypro.2012.02.165
27. Aderibigbe, S.A., Dias, J.M., Abraham, M.S.: Understanding issues affecting students’ com-
mitment to online discussion forums in undergraduate courses. Int. J. Interact. Mob. Technol.
15(1), 4–23 (2021). https://doi.org/10.3991/IJIM.V15I01.17939
Applying Design Thinking Approach to Improve
Online Education
1 Introduction
The massive worldwide expansion of online education in response to COVID-19 teaches many lessons about the value of distributed learning; equally important, it reminds us of its limits. For example, all education facilities in the Kingdom of Saudi Arabia (KSA) moved to 100% online learning during the COVID-19 crisis. Squinting at a tiny display is more taxing and less enjoyable than conventional in-person interaction for many professors, parents, and students who hold and attend their lessons online. Although there are not yet any comprehensive data or surveys capturing this phenomenon, many teachers and students report that they cannot spend as much time learning online as they could in person without being overwhelmed [1].
The COVID-19 pandemic has also led to depression, anxiety, and other mental health issues among school children [2]. During the pandemic, it has been important
to continue children’s education by implementing measures for remote online learn-
ing. However, the change in routine and limited access to their courses affect students’
behavior, alter their moods, and can cause acute depression [1]. Children are unable to
cope with online education, as they have difficulties managing challenges caused by
the pandemic. They are unable to attend school in person and are restricted to online
education. They can often become depressed when they are unable to complete their
tasks at home, and see online education as a burden [3].
Many technological solutions have been rendered unusable over time because not all stakeholders' interests are addressed when they are created. Recognizing this, Design Thinking is a human-centered strategic approach that improves problem solving through innovation in different fields [4].
Design Thinking is “a discipline that uses the designer’s sensibility and methods to
match peoples’ needs with what is technologically feasible and what a viable business
strategy can convert into customer value and market opportunity” [5]. It underlines a
thoughtful and specific process for identifying issues in the system and developing or
coming up with potential solutions. It is predicated on the transformative yet simple
notion that individuals who face problems daily would have a high likelihood of holding
the key to solving them. Design thinkers often collaborate with multiple stakeholders
and actively identify problems and remedies, so the resultant solutions are the product of
thought, collaboration, and iterative effort from different perspectives. Design Thinking
revolves around three aspects: desirability (desirable conditions are available to help
understand the scenario), viability (the ability to grow and observe conditions), and
feasibility (conditions matching with the people’s needs) [6].
Online education can cause depression, anxiety, and frustration among youth and children. Mental health disorders affect about 13.4% of people aged 5 to 24 years in the USA [7]. Studies in Saudi Arabia showed that depression was present among 6.7% of children aged 14–25 years and 11.3% of children aged 7–9 years [8].
Since the onset of COVID-19, the Ministry of Education in Saudi Arabia has established the Madrasty platform as the new gateway for distance teaching and learning for grades 1 to 12 in the 2020–2021 school year. However, students have faced many issues with the platform, and the authors studied these issues as part of this research.
This paper aims to improve the design of the Madrasty educational website in Saudi Arabia during online education by applying the Design Thinking approach as a foundation for extracting the requirements for a design prototype with complete options to support remote learning. In addition, it investigates the difference in depression levels between physical school attendance and remote school attendance in Saudi Arabia. This comparison informs the addition of a mental health status feature to the educational website for remote learning.
The rest of the paper is organized as follows: Sect. 2 presents the related work in
applying Design Thinking in education and the relation between depression in children
and online education. Section 3 illustrates the approach of Design Thinking workflow.
662 A. Alwadai and R. Alnanih
Section 4 addresses the empathize phase, including the data collection and analysis. Section 5 identifies the problem and Sect. 6 defines the idea. Section 7 presents the prototype. Section 8 describes the test phase, and Sect. 9 shows the result analysis in detailed tabular form. Section 10 discusses the work, and Sect. 11 concludes.
2 Literature Review
The literature presents the related work from two views: 1) Design Thinking in education
and 2) children’s depression in remote learning.
Schaeffer and Konetes found that students learning remotely are more likely to leave their studies than traditional education students [12]. Furthermore, the major factor affecting students' study skills was social isolation during online learning [13].
Generally, depression and anxiety are higher in children with ASD (autism spectrum
disorder), which may be due to school closures and working from home. Moreover,
depression has increased among parents due to a lack of professional support. Althiabi et al. suggested a methodology to explore the anxiety and attitudes of children and parents during COVID-19 [14]; that study helped to analyze factors for government intervention to ease the burden of working from home and stabilize people's mental health.
4 Empathize
Empathy-driven development aims to improve user involvement, engagement, and motivation; as a result, it can mitigate some of the drawbacks of traditional methods. Empathic design considers the entire end-to-end user experience in addition to the core issue, its relevance, and the needs of multiple users [4].
The first step of the Design Thinking methodology is to empathize, which provides the researchers with real data on children's needs so that the current design of educational websites can be improved to match remote learning needs and requirements. This phase is achieved through two sequential steps; the output from the first step is the input to the second step. The details of these two steps are described below.
4.1 Step 1
The researchers conducted a questionnaire to evaluate the current educational site and to verify the addition of features to support remote learning. The target users were the parents of children in grades 1 and 2 (7–9 years old) and the Madrasty platform teachers. A questionnaire was distributed to parents and teachers to measure their opinion of the Madrasty platform for virtual learning. A total of 350 questionnaires were distributed, 74.3% to parents and 25.7% to teachers. A list of proposed features related to the Madrasty platform and the results are presented in Table 1.
• Regarding having a "Tutorial guide" video on how to use the Madrasty platform to help users better understand how the platform works, about 81.4% of the participants agreed (57.4% of the parents and 24% of the teachers).
• Regarding an option for a student to display new "Notifications" on the platform, such as new assignments and tasks to be completed, 90% preferred it (66.9% of the parents and 23.1% of the teachers).
• Regarding "Recorded classes" of the lessons for reference when studying, about 82.6% of the participants preferred it (64% of the parents and 18.6% of the teachers).
• Regarding an option for a student to display a "Grade Center" gathering the different assessments (assignment, project, test) for a course in one place so that they can be followed up, about 82% preferred it (59.4% of the parents and 22.6% of the teachers).
• Regarding the most important reasons that negatively affect the effectiveness of online education, Table 2 shows that psychological reasons (such as depression and anxiety) were the most important, selected by about 41.7% of the total participants (30.3% of the parents and 11.4% of the teachers). Family and health reasons each accounted for about 16% of the total participants. In comparison, the material reason accounted for 14.9% of the total, and the remaining 11.4% were other reasons that negatively affect the effectiveness of online education.
Table 2. What are the main reasons that negatively affect the effectiveness of online education (from your point of view)?
Reason Parents (Freq. / Percent) Teachers (Freq. / Percent) Total (Freq. / Percent)
Psychological 106 / 30.3% 40 / 11.4% 146 / 41.7%
Family 39 / 11.1% 17 / 4.9% 56 / 16.0%
Health 48 / 13.7% 8 / 2.2% 56 / 16.0%
Material 34 / 9.7% 18 / 5.1% 52 / 14.9%
Other 33 / 9.4% 7 / 2.0% 40 / 11.4%
Total 260 / 74.3% 90 / 25.7% 350 / 100%
From this step, the authors conclude that adding the aforementioned features to the Madrasty platform is required and that it is important to consider students' mental health status in the platform.
4.2 Step 2
Based on the previous step, the researchers conducted a questionnaire to examine the impact of children's mental health status on physical school attendance versus remote school attendance. The sample population was chosen from the Asir and Riyadh regions of Saudi Arabia because of their easy accessibility for the researchers during 2021. The target users were the parents of young children in grades 1 and 2 (7–9 years old). The questionnaire was structured into three parts as follows:
Before distributing the questionnaire, a test questionnaire was conducted with expert
users to ensure the items were clear. The questionnaire was distributed online through
family and friends, and responses from the two target regions were extracted. The
researchers received 1455 responses, of which 843 came from the Asir and Riyadh regions; 595 participants had children aged 7–9 years, and this is the sample considered. The sample size was decided based on a confidence level of 95% and an error margin of 5%. Reliability was tested to measure the degree to which the research instrument could be relied on to yield the same results in repeated applications. The results indicate that the reliability indicators (Cronbach's alpha and split-half) were 0.747 and 0.679 for parts 2 and 3, respectively, which is considered high.
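These sampling and reliability figures can be sanity-checked with standard formulas. The sketch below (Python, our own illustration, not the authors' analysis scripts) computes Cochran's minimum sample size for a 95% confidence level and a 5% margin of error, and a plain Cronbach's alpha from item-score columns; the function names and the example call are hypothetical.

import math

def cochran_sample_size(z=1.96, margin=0.05, p=0.5):
    # Minimum sample size for estimating a proportion (Cochran's formula).
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def cronbach_alpha(item_columns):
    # item_columns: one list of respondent scores per questionnaire item.
    k = len(item_columns)
    n = len(item_columns[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(col[i] for col in item_columns) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in item_columns) / var(totals))

print(cochran_sample_size())  # 385 -> the 595 usable responses exceed this minimum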
moderate depression, and 9.1% suffer from severe depression; more than half of the
sample suffered from moderate or severe depression when studying online.
Figure 2 compares the results between children’s level of depression when attending
school in-person versus online. The results clearly show that children’s depression was
higher during remote learning than attending school in person.
5 Definition
The second step of the Design Thinking approach is to define the problem. The empathize step identified the problem of depression existing during remote learning. It indicates a need to design and add a set of features to meet the requirements of children during online education. The authors collected the issues from the Madrasty website and defined them as PM# (problem Madrasty #); the list is shown in Table 8.
6 Ideate
The third step of the Design Thinking approach is to ideate. For this purpose, the authors
explored various ideas related to the platform and its issues. The proposed ideas are
mainly based on the following criteria related to design principles:
• Simplicity: Making the design easy to understand, regardless of the users’ experience,
knowledge, language skills, or current concentration level, and adding instructions
with illustrations and text.
• Usability: Considering the most critical factor in assessing the quality of a web application's user interface: product users are mainly concerned with finding information quickly and want a platform that is easy to navigate, along with its design and content. For example, initializing camera capture whenever a student is potentially depressed.
• Accessibility: Designing an inclusive and equitable online learning environment for
diverse users to improve access to course content for all learners.
• Satisfaction: Making a product with the overall ease of use, comfortable learning,
ease of set-up and installation, accessibility, table of contents, help, graphics, and so
on.
The authors developed three ideas and examined the proposed solutions’ applicability
based on the above criteria. The approved idea is described as follows:
Fig. 2. Comparative graph of child depression during in-person and online school
The authors planned to design a platform similar to Madrasty with new missing
features. Teachers and students can log in through their Madrasty credentials. On the
platform, students can view all the missing features in PM#s 1–4 and access online
lectures and academic data. PM 5 is related to the teacher’s view only. The redesign of
Madrasty is called Madrasty 2, which includes the new proposed added features.
This step concludes that there is a difference in children's mental health between physical and online schooling: the level of depression is higher in online school than in physical school. This indicates the importance of adding mental health features to the educational site.
7 Design Prototype
Figure 3 shows the prototype screens for the Madrasty 2 platform, including a set of
new features to resolve the problems in the existing Madrasty. The main design looks
the same as the existing one. The new features that have been added are as follows.
1. Notification center: This will notify students of new tasks, events, and deadlines for
assignments (Fig. 3A).
2. Grades center: This will help students and parents keep track of courses grades
(Fig. 3B).
3. Tutorial video: This will provide users with information on how to use the website
(Fig. 3C).
4. Student's mental health status: This informs parents and instructors about students' mental health status so they can look into it (Fig. 3D).
5. Recordings of previous classes: This helps students download recordings of previous classes, as shown in Fig. 4.
8 Pilot Test
The last step of the Design Thinking approach is testing. The proposed prototype for the
Madrasty 2 website with the new features was used to perform the test. Usability testing
was conducted with two types of users as follows:
1. Students: 30 students aged 7–9 years who have previously used the Madrasty
platform.
2. Teachers: 20 instructors in the teaching domain to evaluate the newly added features.
The usability testing for teachers and students consisted of two parts. First, a list of tasks (Table 9) was prepared to measure task performance and the number of incorrect actions. Second, a post-test questionnaire was administered to measure their satisfaction. To define the benchmarks, an expert in the domain with background knowledge of the new features in the platform performed the tasks and measured the time and number of clicks needed for a teacher to perform each task. The expert determined that a completion time of 25 s for a given task constituted excellent performance, 35 s constituted acceptable performance, and 45 s constituted unacceptable performance. It was also determined that 1 click to complete a task constituted excellent performance, 2–3 clicks were acceptable, and 4 or more were unacceptable. The same method was used for the students, but an additional 10 s was added to each performance-time benchmark to balance the differences in age and abilities, while the number of clicks was the same as for teachers.
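The benchmark scheme above maps directly onto simple threshold checks. The following sketch (Python, an illustration of the reported thresholds; the function names are our own) classifies completion times and click counts, including the 10-second allowance for students.

def rate_time(seconds, student=False):
    # Students get an extra 10 s per band to balance age and ability differences.
    offset = 10 if student else 0
    if seconds <= 25 + offset:
        return "excellent"
    if seconds <= 35 + offset:
        return "acceptable"
    return "not acceptable"

def rate_clicks(clicks):
    # Click benchmarks are the same for teachers and students.
    if clicks <= 1:
        return "excellent"
    if clicks <= 3:
        return "acceptable"
    return "not acceptable"

print(rate_time(30), rate_time(30, student=True), rate_clicks(2))
# -> acceptable excellent acceptable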
The prototype testing was run on a MacBook Pro individually for each student, and each student's comments were recorded immediately. The test was given during real Madrasty platform time, between classes.
9 Results
9.1 Teacher Performance Tasks and Satisfaction Questionnaire
The usability testing for the 20 teachers consisted of five tasks (to measure the number of clicks per task), shown in Table 9. The average rate of "excellent" click performance, 75.8%, speaks in favor of the new features. The best performance was on Tasks 1 and 2, with 100% and 85% of teachers, respectively, performing at "acceptable" or better.
Table 10 shows the completion times the teachers needed for the tasks, divided into three groups: excellent (≤25 s), acceptable (26–35 s), and not acceptable (≥36 s). The average proportion of "excellent" and "acceptable" times, 72%, reflects the ease of use of the new features.
Table 11 shows the results of teachers’ post-test questionnaire (measuring their sat-
isfaction). The results were obtained using relative weights (RW), which is a way to
Table 13 shows the performance time for the students to complete each task, divided into three groups: excellent (≤35 s), acceptable (36–50 s), and not acceptable (>50 s). The average proportion of "excellent" and "acceptable" times, 83%, reflects the ease of use of the new features.
10 Discussion
Regarding the analysis of the teachers' responses to the performance tasks, it can be stated that, despite having room for improvement, the percentage of correct actions in the performance tasks is high, at 87% (Table 8). The average percentage of teachers performing all the tasks within an acceptable time was 72% (46% + 26%), indicating that over half of the sample could perform all the tasks in an acceptable period. Table 15 shows that the average time needed for the teachers to perform all tasks is about 34.2 s, the success rate for all tasks is 81%, the average number of clicks for all tasks is about 1.11, and the correlation between the average time needed for tasks and the average number of clicks is about 0.94, where a correlation value close to 1.0 indicates a strong positive relationship between time and clicks.
For the students, the overall rate of correct actions during the performance tasks was 82.5% (Table 9), indicating that the majority could perform all the tasks with an excellent number of clicks. The percentage of students performing all the tasks within an excellent or acceptable time was 83% (68% + 15%), indicating that most students performed all the tasks in an excellent period. Table 16 shows that the average time needed for the students to perform all tasks is about 39.35 s, the success rate for all tasks is 74.75%, and the average number of clicks for all tasks is about 1.28.
Table 15. Correlation between average time and average clicks for teachers
Task Time of task Done Clicks Correlation between time and clicks
Task 1 29.9 100% 0.55 0.812
Task 2 29.9 85% 1.00 0.954
Task 3 34.4 80% 1.00 0.988
Task 4 39.2 70% 1.40 0.970
Task 5 37.8 70% 1.60 0.943
Average 34.2 81% 1.11 0.940
The correlation between the average time needed for the tasks and the average number of clicks is about 0.931, which indicates a positive relationship between these two variables because it is close to 1.0. As for Task 1, the average time needed is about 34.5 s, with a success rate of about 83%; the average number of clicks for the task is 1, which maps to "excellent" as defined above, and the correlation between the time needed for the task and the number of clicks is about 0.898.
Table 16. Correlation between average time and average clicks for students
Task Time of task Done Clicks Correlation between time and clicks
Task 1 34.5 83% 1 0.898
Task 2 32.7 90% 0.9 0.929
Task 3 34.5 83% 0.97 0.959
Task 4 55.7 43% 2.27 0.948
Average 39.35 74.75% 1.28 0.931
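The correlation values in Tables 15 and 16 are Pearson coefficients between completion time and number of clicks. A minimal sketch of this computation is given below; the per-participant measurements are hypothetical and do not reproduce the study data, and the function name is ours.

import math

def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equally long sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-participant measurements for a single task (not the study data).
times = [22, 31, 28, 44, 36, 25]
clicks = [1, 2, 1, 4, 3, 1]
print(round(pearson_r(times, clicks), 3))  # a value close to 1.0 means times and clicks rise together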
11 Conclusion
Design Thinking is a method for devising solutions to existing problems. These solutions are tailored to users' demands and have a beneficial impact. Design Thinking is an organized and iterative approach. This paper aimed to apply the Design Thinking approach to improve the educational website and add the missing required features: 1) a grade center to display grades for the different assessment methods; 2) a notification feature to notify students of updates; 3) a tutorial guide to support all the different types of students; and 4) class recordings that students can return to when needed. Finally, the paper highlights the importance of adding a mental health feature to the website panel, visible to instructors and invisible to students, to record any abnormal observations during online education.
Acknowledgment. The authors gratefully acknowledge all the participants in the experiment test
for their time and effective feedback.
References
1. Althiabi, Y.: Attitude, anxiety and perceived mental health care needs among parents of
children with autism spectrum disorder (ASD) in Saudi Arabia during COVID-19 pandemic.
Res. Dev. Disabil. 111 (2021). https://doi.org/10.1016/j.ridd.2021.103873
2. Gul, H., Iqbal, S.Z., Saqib, M.: Usability evaluation of an educational website in Saudi Arabia.
VAWKUM Trans. Comput. Sci. 8(2) (2015). https://doi.org/10.21015/vtcs.v8i2.382
3. AlAzzam, M., Abuhammad, S., Abdalrahim, A., Hamdan-Mansour, A.M.: Predictors of
depression and anxiety among senior high school students During COVID-19 pandemic:
the context of home quarantine and online education. J. Sch. Nurs. 37(4) (2021). https://doi.
org/10.1177/1059840520988548
4. Scholten, H., Granic, I.: Use of the principles of design thinking to address limitations of
digital mental health interventions for youth: viewpoint. J. Med. Internet Res. 21(1) (2019).
https://doi.org/10.2196/11528
5. Böhm, M., et al.: Fluid status telemedicine alerts for heart failure: a randomized controlled
trial. Eur. Heart J. 37(41) (2016). https://doi.org/10.1093/eurheartj/ehw099
6. Langkamp, D.L., McManus, M.D., Blakemore, S.D.: Telemedicine for children with develop-
mental disabilities: a more effective clinical process than office-based care. Telemed. e-Health
21(2) (2015). https://doi.org/10.1089/tmj.2013.0379
7. Olfson, M., Druss, B.G., Marcus, S.C.: Trends in mental health care among children and
adolescents. N. Engl. J. Med. 372(21) (2015). https://doi.org/10.1056/NEJMsa1413512
8. Fan, Y.: Research on feature extraction of EEG signals using MSE-PCA and sleep staging
(2018). https://doi.org/10.1109/ICSPCC.2018.8567757
9. Tung, K., Liu, P.K., Chuang, Y.C., Wang, S.H., Wu, A.Y.: Entropy-assisted multi-modal
emotion recognition framework based on physiological signals (2019). https://doi.org/10.
1109/IECBES.2018.8626634
10. Herman, K.C., et al.: Does child likeability mediate the link between academic competence
and depressive symptoms in early elementary school? Child Dev. 91(2) (2020). https://doi.
org/10.1111/cdev.13214
11. Duraku, Z.L., Hoxha, L.: The impact of COVID-19 on higher education: a study of interaction
among students’ mental health, attitudes toward online learning, study skills, and changes in
students’ life. Researchgate.net, May 2020
12. Schaeffer, C.E., Konetes, G.D.: Impact of learner engagement on attrition rates and student
success in online learning. Int. J. Instr. Technol. Distance Learn. 7, 3–9 (2010)
13. Hopkins, K., Crosland, P., Elliott, N., Bewley, S.: Diagnosis and management of depression
in children and young people: summary of updated nice guidance. BMJ 350 (2015). https://
doi.org/10.1136/bmj.h824
14. Venkataraman, D., Parameswaran, N.S.: Extraction of facial features for depression detection
among students. http://www.ijpam.eu
15. Belfer, M.L.: Child and adolescent mental disorders: the magnitude of the problem across
the globe. J. Child Psychol. Psychiatry Allied Discip. 49(3) (2008). https://doi.org/10.1111/
j.1469-7610.2007.01855.x
16. Tönnies, J., et al.: Mental health specialist video consultations for patients with depression or
anxiety disorders in primary care: protocol for a randomised controlled feasibility trial. BMJ
Open 9(9) (2019). https://doi.org/10.1136/bmjopen-2019-030003
17. Tonidandel, S., LeBreton, J.M., Johnson, J.W.: Determining the statistical significance of
relative weights. Psychol. Methods 14(4) (2009). https://doi.org/10.1037/a0017735
A Universal IT Support System for Teachers
for Educational Processes, Publishing
and Academic Research Using All-in-One
Educational Software
1 Introduction
The automation of teacher activities and the integration of IT into teaching forms
a complex interdisciplinary problem. Terms such as e-learning, learning technology,
technology-enhanced learning, educational technology, technology in education, and
digital learning can be encountered in the scientific literature, and these differ in princi-
ple based on whether a technology-driven or educationally driven approach is empha-
sized. Unlike in the recent past, a university teacher now needs to bulk process a much
larger volume of educational content in digital form and with more software that was
not developed for educational purposes. The automation of these activities is hampered
by information overload and the incompatibility between current software, hardware,
and computer files in various formats (such as text, image, audio, and video files). The
practical impact on educational processes is that the teacher, and in fact users in general,
must adapt to technology rather than the technology being used as a support tool to
automate learning processes, i.e., for the fastest possible and most efficient processing
of educational content.
General sources can also be mentioned; for example, Edutopia writes that "It is sometimes difficult to describe how technology can impact learning because the term technology integration is such a broad umbrella that covers so many varied tools and practices" [1]. However, from the teacher's point of view, the use of these tools is based on a technology-driven approach rather than an educationally driven one. The same applies when a reader reads that "Learning technology encompasses the full range of tools and media that can be used to facilitate teaching and learning" [2]. The web page for the Learning Technology Toolkit of the University of Saskatchewan lists over 20 technological items (e.g., Microsoft 365, OneDrive, Canvas, Mobius, Zoom). In terms of a different division of technology categories, the university has Approved Academic Tools by Function, which is more understandable to the university teacher, e.g., assessment, course management, content creation, STEM student practice, or open textbook creation/sharing.
For comparison, the focus of the ICT-22-2016 call "Technologies for Learning and Skills" from the European Union research program was on the "innovation of learning technology" and on the challenge "to create an innovation ecosystem that will facilitate use of digital content, tools and services for personalized learning and teaching."
From this short introduction, we can see that there is considerable chaos and incon-
sistency in the definitions of technologies that are suitable for IT integration. Although
attempts have been made to highlight the differences between terms, e.g. “The difference
between technology of education and technology in education,” as in [3], ambiguities are
also observed in general sources; for example, the link to technology-enhanced learn-
ing on Wikipedia redirects the user to the page on educational technology [4], which
states that this term “is not restricted to high technology but is anything that enhances
classroom learning in the utilization of blended, face to face, or online learning.”
In other words, these are basically only synonyms from the point of view of the aver-
age teacher, i.e., the same technology is referred to as e-learning, technology-enhanced
learning, learning technology, digital learning, or educational technology. The average
teacher needs to choose only a few tailor-made supporting tools from a wide portfolio of
existing technological tools and needs to focus on those that will allow for the creation,
but by the painstaking analysis of human learning processes and of the requirements of
a particular task.” The ineffective implementation of technology-related change is also
mentioned, in relation to “resistance to change from students, professors, administra-
tors”. In the context of educational software, the authors mention that future software
applications should be concerned with three aspects of learning: lectures, laboratories,
and libraries. Particularly pertinent is the statement: “That is why we find it necessary
to have a design and technology team behind every professor.” Exactly the same view
is put forward today by the authors of [6], who state that “learning support services are
extremely important, so the instructors or tutors have to understand the learning difficul-
ties and the learning environment of the learners so as to have effective communication
with them.” It is also interesting to argue that “with the advent of visual technologies,
students lose the motivation to make their own notes.” A similar approach is found in
another monograph [7], which emphasizes that technology is not a simple panacea for
education and that a teacher is always a key player in the process of teaching and learn-
ing, in terms of creating and managing educational content. Specific emphasis is placed
on the TPACK model, which introduced the concept of Technological and Pedagogical
Content Knowledge (TPACK) as a framework for “integrating technology in teachers’
knowledge.” According to Mishra and Koehler, TPACK is an emergent form of knowl-
edge that goes beyond all three components (content, pedagogy, and technology), and
is “different from knowledge of a disciplinary or technology expert and also from the
general pedagogical knowledge shared by teachers across disciplines.”
The TPACK framework, with its seven types of knowledge, is still popular in educa-
tional technology, and is explained by many of the internet sources for the educational
community (see for example in the literature review in [18]). In regard to the TPACK
model, the authors of [7] emphasize that “teachers not only need to know the content
they are teaching but also must recognize how to integrate technology into pedagogy to
achieve greatest impact on desired outcomes.” Another model of classroom-based sce-
nario is discussed in [8] and is called the “Turn around Technology Integration Pedagogy
and Planning” (TTIPP) model; this includes phases based on the analysis of learning
and teaching assets and needs, design of the integration framework, and post-instruction
analysis and revisions. A great deal of attention is paid to integrating technology within specific disciplines (e.g., science, engineering, mathematics, and second and foreign languages). The principle of the TPACK model is aligned with Laurillard's statement [9] that the optimal solution to technology-enhanced learning can be achieved in
practice if the teacher, researcher and designer work closely with each other.
In a paper on educational technology [6], which is mainly related to the area of
learning activity design it is emphasized that from the perspective of learners, each
learning activity includes four aspects: (1) the learning tasks (which allow the learners
to explicitly understand what they should do); (2) learning resources (non-digital and
digital materials that provide the learner with the necessary information and content);
(3) evaluation methods (which should allow for adequate examination of the completion
of learning activities); and (4) learning support services (where the instructors or tutors
should understand the learning difficulties and environment of the learners, in order to
facilitate effective communication with them). Several theories have been put forward on
the better design of learning activities, such as Bloom’s taxonomy, Sweller’s cognitive
load theory, and Mayer’s principles of multimedia learning. In the last approach, the idea
is that students can learn more deeply with multimedia than they could have with words or
pictures alone, and that multimedia instruction should “encourage the learner to construct
a coherent mental representation of the material” in order to “construct new knowledge”
[19]. One important argument is that “a technology need not be a specific device, as a
technology could be generally understood to be a systematic and disciplined application
of knowledge”. This question of knowledge is a key aspect of the all-in-one software
WPad, which can be considered an adaptive learning software. The cybernetics idea on
which the software is based was published in an AI journal, as it allows for knowledge
extraction and representation, even in natural language, usable by lay users [20]. As
will be explained, WPad is based on a specific model of knowledge representation,
whereas in the aforementioned monograph [7], knowledge representation is discussed only in general terms and without any definition. Sensors, graphs, drawing and painting programs, and hypermedia are declared there as technologies that represent knowledge. However, computers do not know what knowledge is or how to use it if it is not defined in computational terms. Such a representation of knowledge is presented, e.g., by Syed [21] for next-generation knowledge machines, where knowledge is represented in the form of graphs as a quantifiable and dimensioned entity. However, this is only a theory and is far removed from the work of a teacher in the realm of natural language within educational settings.
The related pedagogical context can also be selected from a newer monograph of
the design of technology-enhanced learning [8]. The TPACK framework approach is
emphasized, as in previous studies, and the pedagogical aspects of technology-enhanced
learning are also clarified. From the point of view of the function of the WPad educa-
tional software, the important aspect of representing and sharing content is mentioned
in relation to conceptualizing content in the Anderson-Krathwohl taxonomy of learning,
teaching, and assessing. In the Anderson-Krathwohl taxonomy (i.e., a revised Bloom’s
taxonomy), factual, conceptual, procedural, and metacognitive knowledge are taken into
consideration [22].
The pedagogical aspects discussed above are rarely followed in practice, since com-
puters were invented for calculations rather than for teaching. As a result, the current
state of the technology has not yet reached the level required to support the teacher.
Existing technologies are still not optimal for practical teaching; users have to adapt to
the technology, and to check whether it is suitable for their educational needs. As set
out in the introduction, such a huge range of technological tools is now available that
ordinary teachers are likely to find this disorienting.
3 Purpose/Goal
3.1 Motivation and Research Focus on Technology Integration
Technology that is suitable for integration into teaching and selected pedagogical aspects
is discussed here to clarify the purposes and goals of our research. As mentioned above,
our approach focuses on several interdisciplinary elements that are not described else-
where in the scientific literature. Although the scientific monographs discussed above
are very useful, from the point of view of the teacher or researcher, it is interesting
that they do not pay more attention to the factor of time, i.e., the speed of processing
educational knowledge and content, which is a key element in automating any educational process and fundamentally affects a teacher's performance, and hence
the learning outcomes in general. In addition, it is well known that if a teacher creates
educational materials, these need to be updated after a certain time, which poses a signif-
icant problem in practice. There is also little mention of the fact that although teachers
typically work for 10–20 years, the lifespan of software and hardware is only a few
years (for example, laptops and mobile phones often fail after 2–3 years, a programming
language may change, and operating systems and software are continually updated).
These practical issues are mentioned because a universal software must be ‘resistant’ to
any changes in software and hardware. In this respect, there was a particular focus in
the development of WPad software on adaptation to the Windows operating system and
the most common Internet browsers. Compatibility with Microsoft Office packages and
other software used in education and the ability to switch from the program environment
to other software and online portals and environments are also advantageous.
From our point of view, however, the issue of mass processing of information and
knowledge is much more important, and this is not mentioned in the related scien-
tific literature. This issue formed the basis of our vision, published in 2007–2008, that a
knowledge worker (such as a teacher or researcher) needs to process such a large amount
of information in the course of teaching and research that they need to be technologi-
cally equipped like a “contemporary soldier” [23]. Since no suitable software was on
the market at the time, the designer of WPad began developing an all-in-one software
for undergraduates, based on a batch information and knowledge processing paradigm.
The progress made in terms of integrating technology into teaching was a subject on
which we continuously published papers in conferences and scientific journals in the
field of technology-enhanced learning. Our original empirical research (which was ini-
tially based on a technology-driven approach and then an educationally driven one) was
transformed into the current interdisciplinary research, including the registration with
the Slovak patent office of a utility model for the conversion of uncertain and unstruc-
tured data into semi-structured data. This is related to our model of virtual knowledge,
i.e., a specific data structure operating on the cybernetic principle of isomorphism of
physical computer processes and mental activities [24].
In the context of finding solutions for future learning technologies, our motivation
and focus is presently on the automation of knowledge-based educational processes
(based on our academic research). In practice, when a teacher aims to develop various
training activities, this mainly requires solving the following key issues:
The main aim of this paper is to give an overview of how all these key elements
have been addressed in our academic research over about 15 years, with a focus on the
integration of IT into teaching to support the teacher (researcher, designer) as a key player
in the educational process. The secondary objectives are to exchange experience with the
academic community and to outline the challenges for solutions of future technologies. In
the following sections, several outcomes will be presented based on case study examples
drawn from academic practice.
A milestone in our research and the key outcome was the invention of the informatics
data structure mentioned above, which simulated human knowledge; this was predicted
in the author’s habilitation thesis, which focused on the mass construction of educational
content and e-learning materials [25]. This virtual knowledge was invented by addressing
the issue of how one computer program could function in academic practice as an all-in-
one program that was suitable for teaching, research, and publishing. As mentioned in the
previous section, it can replace numerous other software packages that the teacher would
need to use for the same educational activities. For a lay person, this can be explained by
the fact that it is sufficient for a teacher, student, researcher, or other user to find a way
of transferring tacit and explicit human knowledge into virtual knowledge (which takes
the form of an ordinary table containing plain text that can be edited). In this case, the
computer can process the virtual knowledge extremely quickly and give the teacher the
desired output in relation to classroom teaching or other educational activities. In doing so, a never-ending story of hundreds of possible outputs began. Numerous categories of teachers' activities can now be supported by the WPad-based IT support system within a personalized IT infrastructure for teaching, research, and publishing (at the level of personal or collaborative outcomes, including distance learning).
Within our research, the different categories of educational activities were supported
from the perspective of providing knowledge in teaching (lectures, exercises, self-study,
collaborative learning) as follows:
• Learning content for several study programs (for which the outcomes were published
in global conferences and scientific journals).
• Outcomes from cooperation with international consortia that have submitted proposals
for projects related to the integration of IT into teaching (FP7 and Horizon 2020 calls
for IT).
• The WPad educational software, a shared IT infrastructure (including WEB, cloud, virtual servers), communication channels, and a batch knowledge processing paradigm.
Each user (a lay person, teacher, student, researcher, or expert) needs to find their
own style and way of inserting their tacit or explicit knowledge into the knowledge
tables using plain text. The content field of the table has a simple text editor, which
enables a user to manually input shorter texts or to paste in a larger amount of text. It
therefore functions as a container for the content and uses hypertext to directly connect
the knowledge tables to the Internet and the personal folders on the user’s computer. For
Fig. 2. Example of a (Virtual) knowledge table entitled PAPER, which is used to support
publishing in a WPad work environment
From the user's point of view, it is important to be able to use hypertext links directly from the knowledge table, both to folders on the personal computer and to Internet paths, without needing to open a browser or Windows Explorer and type in the paths. WPad also functions as a simple HTML editor, so by simply pressing CTRL-F1, the table can be converted into a mirrored HTML format, as shown in Fig. 3.
Fig. 3. Example of the conversion of a knowledge table entitled PAPER into HTML format
In other words, a user can produce HTML tables with concentrated content, where one row represents one Web page. Since the knowledge table can contain many rows, this enabled the development of the batch information and knowledge paradigm, i.e., a way of
The teacher inserts educational content into the table and the computer ‘understands’
it as an IT data type that can be processed extremely quickly, with outputs provided in a
form that is comprehensible to humans. It should be emphasized that the computer does
not perform the mental work in the place of humans, but simply supports our mental
processes.
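To make the row-to-page mechanism described above concrete, the sketch below mirrors a small plain-text knowledge table as an HTML table, one table row per entry, with links kept clickable. This is our own illustration of the idea; the field names (keyword, content, link) and the output file are assumptions for illustration, not WPad's actual data model or source code.

from html import escape

def rows_to_html(rows, out_path="knowledge_table.html"):
    # Mirror a small knowledge table as an HTML table: one <tr> per row, links kept clickable.
    body = []
    for row in rows:
        text = escape(row["content"])
        if row.get("link"):
            text += ' <a href="{0}">{0}</a>'.format(escape(row["link"]))
        body.append("<tr><td>{}</td><td>{}</td></tr>".format(escape(row["keyword"]), text))
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("<table border='1'>\n" + "\n".join(body) + "\n</table>")

rows = [{"keyword": "PAPER", "content": "Notes for a conference paper draft",
         "link": "https://www.springer.com/series/11777"}]
rows_to_html(rows)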
Fig. 5. Example of the basic level of WPad, as used in the classroom teaching of undergraduates
Fig. 6. A teacher’s personal “hybrid” Offline/Online IT infrastructure, based on the advanced use
of our educational WPad software, including a communication channel (PIKS)
of view, this is also in line with the Mayer model, which states that a student will understand learning material more quickly if the learning content is a combination of texts and images, and possibly in a multimedia format.
The teacher found that the WPad tables created by the students functioned as their notebooks; they made notes on lectures and exercises directly in the knowledge tables, and many of them would not have taken notes without using WPad as a supporting tool. Another pedagogical (didactic) advantage was that the teacher could collect and evaluate the students' notes from the class computers, or combine them into a single table and place it in the faculty's shared virtual learning space. In this way, a collaborative activity was used to create new study material, which was also used for self-study by other undergraduates in subsequent years.
• Go to the IEEE journal page, create a table with links to yearly issues of the journal,
select the option to download it to a computer, open it, convert to HTML format, open
it in the internet browser, or if necessary, synchronize the transfer to the BOX cloud,
which is shared with several researchers.
• Copy the text from the conference proceedings to a row in the knowledge table, create
a corpus table and enter search keywords, e.g., keywords or stylistic phrases to support
writing an article in English.
• Copy the RTF output from the university's publication server to a row in the knowledge table (e.g., for the years 2010 to 2021), create a corpus table from it, and search for a list of your publications or any publications from the department, institute, or the whole faculty.
• Make a list of all PDF files on the computer, USB, or backup disk, and add them as new lines at the end of the opened table (SHIFT-F9); a minimal sketch of this kind of batch file listing is given after this list.
• Write source code in a row of the table that will do something with the rows or content,
so it can be used instead of the standard command window or console, and the user can
enter the source code into the same table in which the educational content is stored.
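As a concrete illustration of the batch file-listing case mentioned in the list above, the following sketch walks a folder tree and appends one line per PDF file to a plain-text knowledge table. The tab-separated row layout, file names, and folder path are assumptions for illustration only; this does not reproduce WPad's SHIFT-F9 implementation.

from pathlib import Path

def append_pdf_list(table_path, root):
    # Append one row per PDF found under 'root' to a plain-text knowledge table.
    with open(table_path, "a", encoding="utf-8") as table:
        for pdf in sorted(Path(root).rglob("*.pdf")):
            table.write("PDF\t{}\n".format(pdf.resolve()))  # keyword + full path per row

append_pdf_list("knowledge_table.txt", "C:/Users/teacher/Documents")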
IT support for publishing is also being developed, which is based on inserting various
content (e.g., multilingual annotations, links to journals, instructions for authors, pdf-
articles, and various custom or e-resources) into the text field of the table. This advanced
mode of operation is schematically illustrated in Fig. 7.
Fig. 7. Relationships between virtual knowledge/knowledge tables and computer files (after loading or linking files to an empty table, the tables contain domain educational knowledge)
It should be emphasized that the transfer of WPad tables containing virtual knowl-
edge between notebooks, client computer folders or online servers is radically different
from the transfer of computer files as commonly used by teachers and other users. This is
not generally understood by reviewers of scientific journals with a focus on educational
technology and database specialists (as these users are familiar only with the relational
database paradigm). Computer files are processed in batches by file management meth-
ods, while in knowledge tables, the batches consist of groups of rows. The monthly man-
ual table for individuals contains about 20–50 rows, whereas an automatically created
table with WPad can have a million rows.
Figure 8 illustrates the content of row 4887 from a table that has 676,896 rows.
The table was created automatically and contains a list of paths to all the existing files in
the teacher’s notebook. These can be opened directly from the table; for example, after
clicking on the path in this row, a picture of the Fe-C diagram from the STEM course will
be displayed. This principle of offline hypertext, which can be used as a menu item of the
user menu of WPad, represents added value in terms of file management in Windows.
In this context, Fig. 9 illustrates a file management process. The result table contains
a list of files output from a search of the BOX folder; this folder is synchronized with
the online BOX cloud, meaning that the researcher does not have to search in the BOX
cloud and can instead search offline.
Fig. 8. Knowledge table automatically created from the backup folder of teacher’s personal
computer (knowledge base - 676,896 Rows, 800 MB, opening/closing takes 20–40 s)
Fig. 9. Results of an offline search for an explanation of a file management function used in WPad
In our case, learning and teaching content can be stored both in the knowledge tables
and in computer files with various formats (TXT, HTML, PDF, DOC, JPG, MP3, MP4,
PHP, CPP, etc.). A regular user typically has many windows (different types of software, browsers, e-mail accounts, etc.) open at any one time, must work with numerous computer files and interfaces, and switches between them by clicking with the mouse. In comparison, using the knowledge tables, the user can directly visit websites and local folders and directly open software or browsers, meaning that the number of mouse clicks required is significantly lower when using WPad. It can be estimated that an individual working with information and knowledge can save tens of thousands of clicks per year in this way. Since only selected learning content is inserted into the knowledge tables, the knowledge base consisting of the knowledge tables of individuals is drastically smaller than the size
of the computer files. As WPad is an all-in-one software, it is not possible to describe
all the cases that the authors have dealt with over the years of research. The following
screenshots illustrate some of them.
Figure 10 shows some of the teacher's tables in WPad, i.e., tables with direct links to the academic information system without using a browser (shown at the top) and tables used for assessment, with automatic evaluation and grading (shown at the bottom). For the lower image in Fig. 10, it should be noted that at that time, handwritten work by students was scanned and evaluated by scoring three areas, and the computer was able to automatically sum these scores and insert the result into the table (although the points were entered in the text field).
Fig. 10. Teacher’s tables linking to the academic information system and student assessments
Figure 11 presents schematically two cases from research on speech recognition and
modeling of the creation of educational packages by an international team.
Fig. 11. Testing speech recognition software for controlling source code via voice (Left); scheme
for educational packages creation by an international team (Right)
5 Conclusion
In this paper, we have described a solution for integrating IT into educational processes based on the design of our own educational software, which serves as a universal tool for all the common activities of a teacher in teaching, publishing, and research. The teacher does not have to adapt to existing technology; instead, the software and the IT infrastructure are built according to the needs of the teacher and the students. WPad software was explained in terms of its use as a universal interdisciplinary all-in-one educational tool. From an informatics point of view, it can be used (1) for processing educational texts; (2) for creating large amounts of e-learning and educational materials, as it also functions as a simple personal HTML editor; (3) as an editor and corpus when teaching programming languages (C++, C, PHP); and (4) as a supporting tool for pre-service teachers for their diploma theses, and in a wide variety of situations in the realm of teaching and learning.
As WPad allows the teacher to process large amounts of educational content, it
has also been tested as a tool for processing large volumes of information contained in
computer files. Indeed, thanks to today’s technology, teachers have a “small internet”
on their computers. Therefore, the research focuses on aggregating educational content
from all offline/online sources and reducing it into a form of a personal knowledge base.
This approach allows teachers to minimize the current information overload. There are
also technological limitations, e.g., when transferring a very large amount of computer
files between offline and online environments, or limitations related to the technology
lifecycle, which is shorter than teachers need in practice. From a pedagogical point of
view, it is important whether the teacher is able to formulate the educational algorithms
needed to write the appropriate informatics algorithms. This is particularly important for
automating the creation of educational content in the form of learning packages. Future
work could therefore focus on the design of a shared virtual server for teaching students,
or the creation of an educational portal with language support. In terms of future plans,
the research will focus on interdisciplinary aspects such as synchronization of teaching
algorithms and computer algorithms. In this context, research is limited by the level of
available technology (e.g., the planned use of Speech recognition technologies depends
on the possibility of using it for languages other than English).
References
1. Edutopia: Technology integration (2007). https://www.edutopia.org/technology-integration-
guide-description
2. Learning technologies: Teaching with technology. https://teaching.usask.ca/strategies/lea
rning-technologies.php#Usingtechnology
3. Technology of education vs technology in education (2011). https://www.differencebetween.
com/difference-between-technology-of-education-and-vs-technology-in-education
4. Wikipedia: Educational technology. https://en.wikipedia.org/wiki/Technology-Enhanced_
Learning
5. Goodman, S.P., et al.: Technology-Enhanced Learning: Opportunities for Change. Laurence
Erlbaum Associates, Mahwah, NJ, USA (2002)
6. Huang, R., Kinshuk, Jemni, M., Chen, N.-S., Spector, J.M. (eds.): Lecture Notes in
Educational Technology Series (2021). https://www.springer.com/series/11777
7. Roblyer, M.D., Doering, A.H.: Integrating Educational Technology into Teaching, 6th edn.
Pearson (2013)
8. Bower, M.: Design of Technology-Enhanced Learning: Integrating Research and Practice.
Emerald Group Publishing – Education (2017)
9. Balacheff, N., Ludvigsen, S., de Jong, T., Lazonder, A., Barnes, S. (eds.): Technology-Enhanced Learning: Principles and Products, vol. XXVI, 326 p. Springer (2009)
10. Stošić, L.: The importance of educational technology in teaching. Int. J. Cogn. Res. Sci. Eng.
Educ. 3(1), 111–114 (2015). https://doi.org/10.23947/2334-8496-2015-3-1-111-114
11. Martens, A.: Software engineering and modelling in TEL. In: Huang, R., Kinshuk, N.-S.C.
(eds.) The New Development of Technology Enhanced Learning: Concept, Research and
Best Practices, LNET, pp. 27–40. Springer, Heidelberg (2014). https://doi.org/10.1007/978-
3-642-38291-8_2
12. Oliver, M.: Learning technology: theorising the tools we study. Br. J. Edu. Technol. 44, 31–43
(2013)
13. Kinchin, I.: Avoiding technology-enhanced non-learning. Br. J. Edu. Technol. 43(2), 43–48
(2012)
14. Walker, R., Voce, J., Swift, E., Ahmed, J., Jenkins, M., Vincent, P.: 2016 Survey of Technology
Enhanced Learning for Higher Education in the UK. UCISA TEL Survey Report 2016.
University of Oxford (2016)
15. Lundie, D.: Authority, autonomy and automation: the irreducibility of pedagogy to informa-
tion transactions. Stud. Philos. Educ. 35(3), 279–291 (2016)
16. Svetsky, S., Moravcik, O.: Some barriers regarding the sustainability of digital technology
for long-term teaching. In: Arai, K., Bhatia, R., Kapoor, S. (eds.) FTC 2018. AISC, vol. 880,
pp. 950–961. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-02686-8_71
17. Mishra, P., Koehler, M.J.: Technological pedagogical content knowledge: a framework for
integrating technology in teachers’ knowledge. Teach. Coll. Rec. 108(6), 1017–1054 (2006)
18. Zhang, W., Tang, J.: Teachers’ TPACK development: a review of literature. Open J. Soc. Sci.
9, 367–380 (2021). https://doi.org/10.4236/jss.2021.97027
A Universal IT Support System for Teachers 697
19. Mayer, R.: Cognitive Theory of Multimedia Learning, pp. 43–71. The Cambridge Handbook
of Multimedia Learning, Cambridge University Press, Cambridge, UK (2014)
20. Svetsky, S., Moravcik, O.: The automation of teaching processes based on knowledge
processing. Trans. Mach. Learn. Artif. Intell. 2(5) (2014)
21. Syed, V.A.: Next generation knowledge machine: design and architecture Page xiii. Elsevier.
https://www.sciencedirect.com/science/article/pii/B9780124166295000153
22. Biology discussion: Anderson and Krathwohl’s taxonomy (with comprehensive view)
| Biology. https://www.biologydiscussion.com/living-organism/taxonomy-living-organism/
anderson-and-krathwohls-taxonomy-with-comprehensive-view-biology/85945
23. Svetsky, S., Moravcik, O., Tanuska, P., Rehakpva, A., Ruskova, D.: The implementation of
technology enhanced learning at dislocated university workplace. In: ICETA International
Conference on Emerging e-Learning Technologies (2008)
24. Svetsky, S., Moravcik, O.: The utility model UV 7340-2014: The linked unstructured data
processing system using a specific data structure. Industrial Property Office of the Slovak
Republic (2016)
25. Svetsky, S.: The practical aspect of knowledge construction and automation of teaching
processes within technology-enhanced learning and eLearning. Habilitation thesis, Slovak
University of Technology (2012)
Communicating Vessels Model for the Intelligent
Monitoring System of the Service Guarantee
in the New Generation of Digital Open
Universities (NG-DOU)
1 Introduction
The DOUNG is an improved model, defined in [1], that allows a learner to follow lectures
using a laptop or cell phone. The learner can use the synchronous mode and/or asynchronous
file transfer over multiple available channels. However, mobile devices have capacity
limitations in terms of processing, storage and data display that force the system to operate
under many constraints. A response is given by the authors of [2] through the Advanced Text
Reading System (ATRS), which converts courses available in text format into audio. It is
possible to improve the efficiency of the DOUNG service, which includes the VPN (Virtual
Private Network) [3–6] and the m-learning [7] model. For this purpose, this paper starts with
the definition of the “service guarantee” concept and the problem it poses in the DOUNG
system. The area covered is then extended with the complexity calculation of the audio/video
lecture warehouse for the asynchronous access mode. The communicating vessels model is used
to highlight the system operation in synchronous mode, with significant QoS parameters that
populate the Intelligent Interface of Monitoring the Service Guarantee (IIMSG).
belongs to some identified parameters. Their values can strengthen or compromise the
reliability of the service; in the latter case, the provider fails to respect its commitment.
In some cases, the violation of the commitment engages the civil responsibility of the
service provider, often with the obligation to compensate the client.
When transposing the postal service model to the distance education service offered
by the DOUNG, for a learner following a lecture in real time the service reliability is
compromised if a large percentage of the lecture is lost. The loss can occur inside the
system or through the vagaries of the network link. In particular, the volatile nature of the
wireless link integrated in the DOUNG architecture can compromise the transmission of
the teacher’s message and alter the understanding of the lecture. To ensure the reliability
of its lecture delivery service, the DOUNG can use the IIMSG to indicate the service
guarantee level attached to each learner access type. The reliability of the service helps to
achieve the goal of increasing attendance and influences the learner’s choice of service
access mode, according to constraints such as the type of communication and of Internet
access, the use of cell phones and the traversal of the GSM network [8].
Generally, mobile networks such as GSM are characterized by a low flow rate, high
latency during information exchange, low battery autonomy and high cost. In addition,
the availability of the channel is volatile, which strongly affects the option of following
a DOUNG lecture in real time. Setting the cost of each kind of course access becomes
more complex because of the nature of the mobile networks, in addition to the no less
binding limitations of current cell devices. In many multimedia cases, the real-time
system implements the anticipation window concept. To support a large buffer size, it is
necessary to reconcile the storage limitation of cell phones with the high-latency
constraint described in the channel model by repeated link failures of the GSM network.
Crossing these two constraints brings the IIMSG to invoke inference rules and determine
the rank from which downloading a multimedia file, or another file format (web page,
processed text, raw text, or PDF), becomes more efficient than following the course live.
The nature of the cell device determines the local environment of the learner. The strong
requirements of the visualization application in terms of storage, processing and browsing
capacities contrast with the limited interface of cell phones, their low processing capacity,
their poor ergonomics and their weak storage capacity.
As the DOUNG offers multiple options of lecture delivery, a further constraint concerns
access to the live lecture or to the lecture warehouse regardless of the requirements of the
hybrid platforms used. The cell device environment requires a global approach to specify
the lecture formats and a particular management of the lecture
warehouse. One solution is based on the use of learning objects that are not bound to a
platform. The concept of systems interoperating easily with each other becomes of
paramount interest. A learning object can be designed as a cloned model that incorporates
different options and can be used by software applications according to their needs. In the
DOUNG case, the learning object will incorporate the content delivery system to implement
modular learning. The learner can access the DOUNG lecture warehouse and download a
learning object ready to be played and visualized on a compliant system. In addition, web
browsers are evolving to integrate the capacity and connectivity of mobile devices.
To use a mobile learning solution on a wide range of devices, content delivery must be
independent of the mobile devices. The content must be separated from its format so that
devices do not have to implement device-specific solutions. The XML language (eXtensible
Markup Language) [9–12] illustrates a way of specifying the content regardless of how it is
visualized over multiple types of mobile devices.
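As a rough illustration of this separation of content from presentation, the short Python sketch below builds a minimal XML descriptor for a lecture with the standard xml.etree.ElementTree module; the element and attribute names are illustrative assumptions and do not correspond to any DOUNG or learning-object specification.

```python
# Minimal sketch: describe a lecture's content in XML, independently of how a
# given mobile device will render it. Element/attribute names are illustrative
# assumptions, not a DOUNG or learning-object standard.
import xml.etree.ElementTree as ET

lecture = ET.Element("lecture", attrib={"id": "unit-01", "duration_min": "60"})
seg = ET.SubElement(lecture, "segment", attrib={"order": "1"})
ET.SubElement(seg, "title").text = "Introduction"
# The same content node can later be bound to the audio, video or text rendition
# chosen by the client device; only the available formats are listed here.
renditions = ET.SubElement(seg, "renditions")
ET.SubElement(renditions, "rendition", attrib={"type": "video", "codec": "AMV"})
ET.SubElement(renditions, "rendition", attrib={"type": "audio"})
ET.SubElement(renditions, "rendition", attrib={"type": "text", "mime": "application/pdf"})

print(ET.tostring(lecture, encoding="unicode"))
```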
an encapsulation technique for audio and video streams in continuous mode. The required
synchronization between the two flows facilitates the simultaneous playing of the sound
and the image sequences, with various formats for the video. The OpenDML format developed
subsequently makes it possible to exceed the 2 GB limit set by the basic AVI format. The
evolution of the AVI format also leads to the AMV (Anime Music Video) format, created for
MP3 and MP4 players, with a ratio of 4 pixels per byte (Pbb) instead of the 10 Pbb ratio
generated by MPEG-2 (Moving Picture Experts Group). The resolution range used by the AMV
format goes from 96 × 96 pixels to 208 × 176 pixels. The images-per-second (Ibs) rate
varies from 10 and 12 to 16 images. For a resolution of 128 × 96 at 12 Ibs, a video stream
of thirty minutes generates around 80 Mb. Before the advent of the MPEG format and its high
compression ratio, multimedia technology produced M-JPEG (Motion-Joint Photographic
Experts Group) video capture devices able to process a 29 Mbps flow rate.
Let Tv be the number of bits per second generated by a video capture application. Let
Nx × Ny be the image resolution, with Nx the number of pixels per line and Ny the number
of pixels per column. Let Ni be the Ibs rate used by the video capture device and Np the
number of Pbb. Taking the duration of the teaching unit as equal to one hour, the equations
below give the complexity rate Cv, in Mb, of one teaching unit in video format.
The number of bits of every image is given by:
(Nx ∗ Ny ∗ 8) / Np (2)
Tv = (Nx ∗ Ny ∗ Ni ∗ 8) / Np (3)
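A small Python sketch of this calculation is given below, reading Eqs. (2) and (3) with the grouping (Nx ∗ Ny ∗ 8)/Np, i.e., with Np taken as the pixels-per-byte ratio; this grouping, the function name, and the durations passed in are assumptions made here for illustration, using the AMV example values quoted above.

```python
# Sketch of the complexity calculation for a teaching unit in video format,
# reading Eq. (2)-(3) as: bits per image = (Nx * Ny * 8) / Np, where Np is the
# pixels-per-byte (Pbb) ratio. The grouping and the helper name are assumptions
# made for illustration; the values are the AMV example from the text.
def video_complexity_mb(nx, ny, ni, np_pbb, duration_s=3600):
    bits_per_image = nx * ny * 8 / np_pbb      # Eq. (2), as interpreted above
    tv_bits_per_s = bits_per_image * ni        # Eq. (3): throughput Tv
    cv_bits = tv_bits_per_s * duration_s       # complexity of one teaching unit
    return cv_bits / 8 / 1_000_000             # result in megabytes

# AMV example: 128 x 96 resolution, 12 images per second, 4 pixels per byte
print(round(video_complexity_mb(128, 96, 12, 4, duration_s=1800), 1), "MB for 30 min")
print(round(video_complexity_mb(128, 96, 12, 4), 1), "MB for a one-hour unit")
```

Under this reading, the 30-minute AMV example yields roughly 66 MB, the same order of magnitude as the figure of about 80 Mb quoted above.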
Ca and Cv make it possible to determine the space required in the lecture warehouse and on
the device of a learner who uses the complete download option. Thus, the space required
on the cell device, and the capacity of the external storage device to be used, can be
deduced. The amount of space required for opening a distance education branch of study
in the DOUNG derives from these two parameters by applying university norms, so that
efficient projections can be made for the equipment to be mobilized. Ca and Cv are
integrated in the IIMSG, which is used as a dashboard. In addition, the decision to open a
new branch of study requires resolving the asynchronous and synchronous connection
throughput with the necessary additional parameters.
one hour and fifteen minutes to two hours, including the break time and the question-and-answer
period between the teacher and the learners. The academic standard sets the duration of a
license branch of study to three years of 600 h per year. This amount of hours includes
the lecture (LT) period, the assignment (AS) period and the practical work (PW) period.
The AS and PW are mainly conducted by the learners and the LT by the teacher. The 600 h
of a year are divided into three periods for LT, AS and PW. Let us assume that the three
periods are equivalent. The LT periods are used to determine the required amount of
storage space. Thus, during the whole three years of a license branch of study, 600 h are
spent in standard on the LT. The previous Cas and Cvs are then adapted to that specific
case to become the new Casl and Cvsl parameters calculated in Eqs. (5) and (6) below.
They indicate the complexity of the lecture warehouse for a license branch of study in the
DOUNG according to the audio or video format of the lectures:
The same scheme is applied to the two years of the Master branch of study. It produces
400 h of LT when the internship period is left out of consideration. The Casm and Cvsm
parameters indicating the complexity of the lecture warehouse for a Master branch of
study are calculated in Eqs. (7) and (8).
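Since Eqs. (5)–(8) are not reproduced in this excerpt, the following Python sketch only illustrates the underlying scaling idea: the per-hour complexity of a teaching unit is multiplied by the total LT hours of the branch of study (600 h for the license, 400 h for the Master). The function name and the per-hour value are assumptions carried over from the previous sketch.

```python
# Hedged sketch: the lecture-warehouse complexity for a whole branch of study is
# approximated here by scaling the per-hour complexity Cv (or Ca) by the total
# lecture-time (LT) hours. This is an illustration of the scaling idea only.
def warehouse_size_mb(per_hour_mb, lt_hours):
    """Storage needed for all LT hours of a branch of study, in MB."""
    return per_hour_mb * lt_hours

cv_per_hour = 132.7                          # one-hour AMV unit (previous sketch)
cvsl = warehouse_size_mb(cv_per_hour, 600)   # license branch: 600 LT hours
cvsm = warehouse_size_mb(cv_per_hour, 400)   # Master branch: 400 LT hours
print(f"license: ~{cvsl/1000:.0f} GB, master: ~{cvsm/1000:.0f} GB")
```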
Other additional parameters are identified when considering the synchronous transfer
mode and the use of the anticipation window by the learner device. When that system is
operating, both buffers (DOUNG and learner) must prevent the multimedia stream from
being exhausted. The information residing in the DOUNG buffer is waiting to be conveyed
in real time to the learner destination through the network. Every information unit should
reach the learner anticipation buffer completely before the learner application requests it,
even while its transfer is still in progress. We use the communicating vessels model to
describe all the parameters required to monitor the service guarantee.
The classic communicating vessels model puts into association two vessels with a content
moving from one to the other through a pipe used as the conveying channel. The amount of
content subtracted from one is equal to the amount that reaches the other. Thus, when
providing a real-time lecture, the first vessel is the buffer created by the camera stream at
the digital university side. The channel is composed of the DOUNG protocol stack at the
sender side, the network, and the learner protocol stack at the destination. The second
vessel is the learner buffer created by the anticipation
window of the application. Figure 1 below illustrates that model and makes it possible to
identify all the additional IIMSG parameters, to build their equations and to make their
interdependence explicit.
Note: The DOUNG and the learner buffers operate in FIFO (First In First Out) mode.
For the DOUNG buffer, the stream enters from the top (input) and goes out (output) from
the bottom of the vessel. Conversely, the input access point of the learner anticipation
buffer is the bottom, while the stream is output from the top.
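The following Python sketch gives a minimal discrete-time reading of this communicating vessels picture: the DOUNG buffer is filled by the camera stream and drained by the network, which in turn fills the learner's anticipation buffer drained by playback. The rates, the start delay, and the variable names are illustrative assumptions, not measured DOUNG values.

```python
# Minimal sketch of the communicating-vessels view of the two FIFO buffers.
# camera_rate, net_rate and play_rate stand for the camera stream, the network
# channel and the playback drain of the anticipation window (bits per second);
# all values below are illustrative assumptions.
def simulate(camera_rate, net_rate, play_rate, seconds, start_delay=5):
    doung, learner = 0.0, 0.0
    for t in range(seconds):
        doung += camera_rate                 # vessel 1 fills from the top
        moved = min(doung, net_rate)         # the pipe: network transfer
        doung -= moved
        learner += moved                     # vessel 2 fills from the bottom
        if t >= start_delay:                 # playback starts after the shift time
            drained = min(learner, play_rate)
            learner -= drained
            if drained < play_rate:
                print(f"t={t}s: anticipation buffer exhausted (stall)")
    return doung, learner

print(simulate(camera_rate=300_000, net_rate=250_000, play_rate=300_000, seconds=32))
```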
Legend of the parameters:
The Round Trip Time (RTT) is an additional parameter that indicates the time taken by
the learner’s application to reach the DOUNG and get back the requested data.
required by the anticipation buffer before the learner application starts playing the video.
The difference between STv and STs is then extended by the time needed to transfer δ bits
through the network, where δ is the required size of the anticipation buffer. In addition,
the network throughput and the type of traffic help to determine the shift time before
synchronization between the teacher and the learner. For example, the Constant Bit Rate
(CBR) [13] model provides an uninterrupted information stream, contrary to the exponential
traffic variation model, which describes an alternating traffic with peaks and low-activity
periods. Some other models integrate traffic interruption periods to differentiate the
continuous and discontinuous (discrete) nature of the information stream.
We are using average values of the CBR for monitoring the service guarantee dur-
ing the lecture delivery. Thus, at every current time (CT), the amount of transferred
information is used to calculate the effective values of the parameters that populate the
IIMSG.
expires. That philosophy avoids the loss of synchronization and prevents the learner from
losing the thread of the lecture; such a loss would impact the understanding of the
teacher’s message.
The theoretical amount of data available in the DOUNG buffer at CT, without any
recovery, is Qs (in number of bits):
The theoretical amount of data available in the anticipation buffer of the learner at CT,
without any recovery, is Qr (in number of bits):
Qr = (Vtc − Rv) ∗ T (12)
An inference rule of the IIMSG can be stated: the value “true” of PRTR allows the recovery
to be initiated, while “false” indicates that it is prohibited. The value of PRTR
depends on Qs, the amount of data available in the buffer, added to QRTT and compared
to Smax:
If (Qs + QRTT) < Smax then PRTR = “true” else PRTR = “false” (14)
The values of Qs and Qr are exchanged between the two sides of the system. The TCP
(Transmission Control Protocol) flow control mechanism is used as a model; the “window”
field of TCP can be used to limit the overhead in the communicating vessels system.
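A minimal Python sketch of rule (14) is shown below; the estimate of QRTT as the product of the channel throughput and the RTT, the helper name, and the example values are assumptions made here for illustration.

```python
# Hedged sketch of the recovery-permission rule (14): recovery of lost data is
# allowed only while the sender-side queue, plus the amount in flight during one
# round trip (QRTT), still fits under the buffer ceiling Smax. Estimating QRTT
# as throughput * RTT and the example values are assumptions for illustration.
def recovery_allowed(qs_bits, throughput_bps, rtt_s, smax_bits):
    q_rtt = throughput_bps * rtt_s          # data arriving during one RTT
    return (qs_bits + q_rtt) < smax_bits    # Eq. (14): PRTR is "true" or "false"

# Example: 1.5 Mbit queued, 250 kbit/s channel, 400 ms RTT, 2 Mbit buffer ceiling
prtr = recovery_allowed(qs_bits=1_500_000, throughput_bps=250_000, rtt_s=0.4,
                        smax_bits=2_000_000)
print("PRTR =", "true" if prtr else "false")   # the value exchanged, TCP-window style
```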
11 Conclusion
The DOUNG is bound to respect its commitment to deliver a complete distance education
service despite multiple constraints such as the weakness of mobile networks and the
volatile availability of their channels with repeated link failures. These constraints are
compounded by the no less binding limitations of current cell devices, which have limited
storage and low processing and browsing capacities. All these constraints impact the
reliability of the service and influence the learner’s choice of a service access mode.
This paper helps to identify the service guarantee parameters. The goal is to make their
values available to the DOUNG and to the learner during a course delivery, showing the
level at which the service guarantee is realized. Some of the parameters that populate the
knowledge base of the IIMSG derive from the synchronous mode, which has a restrictive
transmission character. We study that real-time lecture delivery using the communicating
vessels model to obtain an efficient design for the problem. In addition, the parameters are
extended with those of the asynchronous access mode or those required for opening a new
branch of study.
Upcoming work will run simulations according to the traffic type so that the variation of
the buffer levels becomes explicit. This will help to set significant thresholds for the
parameters that fill the inference base of the IIMSG.
References
1. Issoufou Tiado, M., Saliah-Hassane, H.: Cloud-Computing based architecture for the advent
of a New Generation of Digital Open Universities in m-learning. In: ICEER13 Proceedings,
pp. 572–579 (2013). www.labader.org
2. Tiado, M.I., Idrissa, A., Karimou, D.: Improved text reading system for digital open
universities. IJARAI 4(10), 29–34 (2015)
3. Sun, Y., Wang, B., Wang, C., Wei, Y.: On man-in-the-middle attack risks of the VPN gate
relay system. Hindawi Secur. Commun. Netw. Article ID 9091675, 7 (2021). https://doi.org/
10.1155/2021/9091675
4. Zhou, Z., Huang, T.: Open VPN application in COVID-19 pandemic. In: 2021 Interna-
tional Conference on Advances in Optics and Computational Sciences, Journal of Physics:
Conference Series, vol. 1865, p. 42015 (2021). https://doi.org/10.1088/1742-6596/1865/4/
042015
5. Kumaki, K., Murai, T., Cheng, D., Matsushima, S., Jiang, P.: Support for Resource Reservation
Protocol Traffic Engineering (RSVP-TE) in Layer 3 Virtual Private Networks (L3VPNs), RFC
6882 (2013)
6. Fuzi, M.F.M., Alias, M.R.M., Kaur, N., Halim, I.H.A.: SafeSearch: obfuscated VPN server
using Raspberry Pi for secure network. J. Comput. Res. Innov. 6(4), 90–101 (2021).
https://jcrinn.com, eISSN: 2600-8793
7. LeCavé, A., Salamin, A.D.: Mobile Learning: Les avantages du papier virtuel, FI 1, Février,
pp.7–9 (2004)
8. Aziz, B.T.A., Tiado, M.I., Abdoulwahabou, S., Harouna, M., Noura, I.G.: Models of Quality
of Service (QoS) in the GSM environment of the New Generation of Digital Open Univer-
sities (DOUNG). Int. J. Wireless Networks Commun. 13(1), 1–13 (2021). (Research India
Publication, ISSN 0975-6507, 2020)
9. Liu, X.: Wireless network communication in the XML metadata storage of Wushu Historical
Archives. Hindawi Wireless Commun. Mobile Comput. 2021(Article ID 5171713), 13 (2021).
https://doi.org/10.1155/2021/5171713
10. Breje, A.-R., Győrödi, R., Győrödi, C., Zmaranda, D., Pecherle, G.: Comparative study of
data sending methods for XML and JSON models. IJACSA 9(12) (2018)
11. Seol, K., Kim, Y.-G., Lee, E., Seo, Y.-D., Baik, D.-K.: Privacy-preserving attribute-based
access control model for XML-based electronic health record system. IEEE ACCESS, Digital
Object Identifier (2018). https://doi.org/10.1109/ACCESS.2018.2800288
12. http://www.ieee.org/publications_standards/publications/rights/index.html
13. Sinnema, R., Wilde, E.: eXtensible Access Control Markup Language (XACML) XML Media
Type, RFC 7061 (2013)
14. Ceccarelli, D., Zhang, F., Belotti, S., Rao, R., Drake, J.: Traffic Engineering Extensions to
OSPF for GMPLS Control of Evolving G.709 Optical Transport Networks, RFC 7138 (2014)
15. Key, R., Delord, S., Jounay, F., Huang, L., Liu, Z., Paul, M.: Requirements for Metro Ethernet
Forum (MEF) Ethernet-Tree (E-Tree) Support in Layer 2 Virtual Private Network (L2VPN),
RFC 7152 (2014)
People Skills and Online Learning: To Assume
Makes an Ass Out of U and Me
C. Todd Williams(B)
Southeastern Oklahoma State University, 425 W. University Blvd., Durant, OK 74701 (Morrison
220-C), USA
[email protected]
1 Introduction
Due to the popularity and convenience of online degree programs, there is little doubt that
participation in online learning programs has skyrocketed in recent years. About 46%
of college students in the United States have taken at least one online course according
to statistics from eLearning Industry [14]. Likewise, research on the effectiveness of
online learning, as reported in The Future of State Universities analysis [19] suggests
that “the growth of online learning is also in response to the new college student who
is older, more technologically savvy, and in need of an accessible, low-cost educational
option.” According to Allen and Seaman, almost 30% of all enrollments are now in
online courses [1].
However, not all of academia is convinced that online learning is the answer to cure
all ills of higher education. The Babson Study [13] put it this way:
In addition to questions about academic rigor and the legitimacy of online learning,
there are questions about grading. Research completed by Littlefield [13] clearly supports
the notion that “students who took all or part of their class online performed better, on
average, than those taking the same course through traditional face-to-face instruction.”
In some cases, there is general distrust between professors of traditional, face-to-face
instructional delivery and those who teach strictly online. The ever-present concern
that your university is becoming a “diploma mill” looms large in the minds of college
professors who care deeply about their field of study and are passionate about delivering
curriculum that is relevant, meaningful, and helps train students who will make a positive
impact on others.
The individual writing this paper is a college professor teaching online courses for
Southeastern Oklahoma State University in Durant, Oklahoma. He was hired in 2017
and has been teaching educational administration courses for 4.5 years. Most of the
courses taught by the writer/researcher have been delivered online. To be completely
honest with you, the reader, the writer/researcher would prefer to teach face-to-face
in the physical presence of his students. In a somewhat dated although very important
contribution to the field of education, Paulo Freire [9] indicated that a relationship with
a caring, supportive teacher is critical to student success. This viewpoint is shared by
the researcher as he has tried to be a supportive educator who not only can empathize
with his students but one who tries to teach in a way that prepares students for success
once they have graduated from our program.
During this journey as a college professor, he has made a habit of not only listening
to students but also listening to the people who are going to (hopefully) hire them.
One concern that has consistently reared its ugly head is the idea that graduates of
online programs do not have the requisite people skills necessary for success as a school
administrator. In fact, the researcher has been told by a prominent leader of an educational
service provider that some school superintendents “will not hire any more graduates of
an online program” due to the perceived lack of people skills that they have witnessed in
the graduates they have hired from such institutions. You can imagine how the previous
statement has caused not just a little bit of anxiety as our faculty has tried to navigate
the conundrum of trying to avoid this reputation and design learning activities that are
relevant yet cognizant of the need to develop an awareness of and sensitivity to others.
In the paragraphs that follow, the researcher will explain how he went about dealing
with this problem. A survey was developed for local administrators to evaluate the people
skills of known graduates of our program at Southeastern. Admittedly, the survey size
is small (33 participants), yet the results are insightful. (One reason for the small sample
size is the relatively recent development of our online degree. The pool of school
administrators from our area who could evaluate our graduates is limited due to the rural
nature of our campus and area, coupled with the fact that school administration jobs can be
hard to get.)
The specific question that this research attempts to address is this: “Do graduates
of the online master’s in education (MED) program offered at Southeastern Oklahoma
State University possess the necessary interpersonal skills that allow them to be suc-
cessful school leaders?” As the question suggests, the researcher targeted emotional
intelligence, and the reader will understand this better on seeing the questions related to
the survey. Goleman [10] estimated that “close to 90% of a leader’s success is attributable
to emotional intelligence”.
Education is a people-intensive enterprise, requiring school leaders to have a skill set that
includes sensitivity to others, especially children. In addition to the obvious points about
people skills, another item the researcher hoped to target is whether or not we, as a staff,
need to address some of these issues in our curriculum and possibly update our approach
to instruction as it relates to these matters.
The setting for the research conducted in this study was a group of public schools
in the state of Oklahoma near a regional university. What prompted the research was a
desire to know the answers to the following questions:
2 Literature Review
Interpersonal skills are referred to as soft skills in today’s business world. Soft skills
include uniquely human relational skills such as listening, empathy, communication,
compassion, and a caring attitude towards others. Based on a study by the Society
for Human Resource Management [16], 51% of its members reported that “education
systems have done little or nothing to help address the skills shortage.” In addition, human
resource professionals targeted soft skills such as professionalism, business acumen,
critical thinking, and lifelong learning as skills that are lacking in job candidates and
potential employees [16].
According to Chamorro-Premuzic and Frankiewicz [3], the demand for colleges and
universities to stress soft skills is becoming more important and necessary. They stated
the need this way:
“…universities could substantially increase the value of the college degree if they
spent more time teaching their students critical soft skills. Recruiters and employ-
ers are unlikely to be impressed by candidates unless they can demonstrate a certain
degree of people-skills. This is perhaps one of the biggest differences between
what universities and employers look for in applicants. While employers want
candidates with higher levels of EQ (emotional intelligence), resilience, empathy,
and integrity, those are rarely attributes that universities nurture or select for in
admissions. As the impact of AI (artificial intelligence) and disruptive technology
grows, candidates who can perform tasks that machines cannot are becoming more
valuable—and that underscores the growing importance of soft skills, which are
hard for machines to emulate.” [3].
In a survey of 2,600 hiring managers and human resource professionals, 71% stated
they valued emotional intelligence more than intelligence; 75% stated they were more
likely to promote a worker who is highly emotionally intelligent; and 59% mentioned
they would not hire a candidate with a high IQ but low EQ [8].
Deutschendorf [8] listed seven reasons why emotionally intelligent candidates are
so valuable:
Tackie [18] also explained the importance of social presence and how it impacts the
online learning environment: “Effective social presence enables students to recognize
their teachers’ humanity. By conveying personal information, or making themselves
readily available, teachers establish human connection, which in turn leads students to
more deeply engage in the classroom and motivates enhanced communication.”
A relatively recent development that has occurred due to the advent of online learn-
ing is TPACK or Technological Pedagogical And Content Knowledge. According to
tpack.org, [20] TPACK “attempts to identify the nature of knowledge required by teach-
ers for technology integration in their teaching, while addressing the complex, multi-
faceted and situated nature of teacher knowledge.” The TPACK framework consists of
seven components and is illustrated in Fig. 1:
From the graphic represented above, one can see that the optimal level of student
learning occurs when the components of technological knowledge (TK), content knowl-
edge (CK), and pedagogical knowledge (PK) intersect. For teachers who deliver instruc-
tion in a purely online format, this model has significant implications. It is easy to see that
what is important to the instructor becomes important to the student. Personal character-
istics and qualities that are deemed to be essential interpersonal skills by the instructor
are emphasized in learning activities that lead to the student making an emotional evalu-
ation as to their significance and deciding whether these qualities will be adopted by the
learner. These values are not only “taught” but are effectively “caught” by the students
as the instructor leads the class.
A quote by Maya Angelou serves as a good example. She once said, “People may
not remember what you said but they will always remember how you made them feel.”
The importance of this mindset is used as the basis for a class discussion via Zoom in
the writer’s classes as a way of stressing the importance of genuine personal interactions
which lead to deeper levels of trust, sensitivity to others, and mutual respect. Now, the
reader might assume that due to the impersonal setting of a virtual classroom meeting
online, personal concepts such as those mentioned above cannot be transmitted to a
class full of digital natives who are participating via the internet from their own homes.
However, just the opposite was found by simply reading through the student evaluations
for the writer/researcher, in addition to the numerous studies about this phenomenon (Cui
et al. [5]; Song et al. [17]; Tackie [18]).
Effective leaders possess a high degree of empathy (Greenleaf [11]; Culver [6]).
Effective leaders also demonstrate humility and are focused on the needs of others
(Collins [4]; Blanchard and Hodges [2]). Therefore, it was decided to survey a group of
school administrators who were working alongside a graduate of our program and had
the responsibility of evaluating these graduates in order to test the hypotheses mentioned
previously in this article.
3 Research Methodology
With this knowledge in mind, it was decided to survey administrators of local school
districts who had known graduates of the online program at Southeastern. The purpose
of the survey was to evaluate the emotional intelligence of our graduates in an attempt
to measure two things:
4 Participants
The sample size for this study included supervisors of graduates of our program who
had secured employment as a school administrator since 2017, when our program went
online. Thirty-three supervisors responded to the survey which was completed online via
Survey Monkey. Admittedly, the sample size is small, due mainly to the fact that it takes
time for a person to secure a job as a school administrator. However, the results do reveal
some interesting insights about people skills, emotional intelligence, and the perceptions
supervisors have about our graduates. Certainly, further study of this phenomenon is
necessary.
6 Results
Demographics. Surveys were sent via email to 60 school administrators in our local
area who were currently supervising known graduates of the online MED program at
Southeastern Oklahoma State University (SOSU). Responses were received from 33
school administrators who agreed to participate in the study. Among the school admin-
istrators who responded, four (12%) were Superintendents, four (12%) were Assistant
Superintendents, 12 (37%) were Principals, six (18%) were Assistant Principals, and
seven (21%) were from the category “Other”, meaning curriculum director or other
district/campus leader.
Of the respondents, a little over half (18 or 55%) had served 21 or more years in
education, while roughly a fourth (8 or 24%) had served 11–20 years in education.
Five (15%) had served 6–10 years and two (6%) had served 0–5 years in education,
respectively.
In terms of years of experience, 55% (18) reported they had spent 0–5 years in their
current position. Fully 30% (10) reported they had 6–10 years of experience in their
current position, while 12% (4) indicated they had 16–20 years of experience in
their current position. Only one (3%) indicated they had 21–25 years of experience in
their current position.
In describing their community, 52% (17) indicated they worked in rural areas. Twelve
respondents (36%) identified their community as suburban and four (12%) reported they
worked in an urban setting. In terms of level of education, five respondents (15%) had
earned a doctoral degree while 28 (85%) had earned a master’s degree.
Findings of the Study. Overall, the respondents for this survey were generally positive
about their experiences with graduates of our online program. Questions 7, 8, and 13 were
designed to measure the degree to which respondents view online learning as a legitimate
form of instructional delivery. Question 8 was intentionally worded in a negative tone in
order to verify the reliability of responses for a similarly worded question. The results
for these three questions confirm that the respondents do not view online learning in a
negative way.
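As an illustration of how such a reliability check can be carried out, the Python sketch below reverses the scale of the negatively worded Q8 and compares its distribution with that of Q7 (values taken from Table 2 in Appendix I). Reverse-coding negatively worded items is standard survey practice; the exact comparison shown is an illustration, not the procedure used in the study.

```python
# Minimal sketch of a reverse-coded item check, using the Q7/Q8 response
# distributions from Table 2 (Appendix I). Reversing the scale of the negatively
# worded Q8 is standard survey practice; the comparison is an illustration only.
q7 = {"SA": 54.55, "A": 39.39, "N": 3.03, "D": 3.03, "SD": 0.00}   # positive wording
q8 = {"SA": 0.00, "A": 3.03, "N": 3.03, "D": 54.55, "SD": 39.39}   # negative wording

# Reverse Q8 so that "disagree with the negative statement" counts as agreement.
order = ["SA", "A", "N", "D", "SD"]
q8_reversed = {k: q8[rk] for k, rk in zip(order, reversed(order))}

max_gap = max(abs(q7[k] - q8_reversed[k]) for k in order)
print(q8_reversed)
print(f"largest category gap: {max_gap:.2f} percentage points")
```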
The remaining questions on the survey (6, 9–12, and 14–22) all dealt with items
related to interpersonal skills and emotional intelligence. The results of these items on
the survey also verify that school leaders who have a supervisory role relating to the eval-
uation of graduates of our program indicated they were generally satisfied when it comes
to these areas. Multiple questions were designed to measure perceptions about emotional
intelligence and interpersonal skills for these sections of the survey and administrator
responses about those questions were generally favorable. One key takeaway for the
researcher is that people skills can be conveyed during online class sessions and based
on the evidence from the study, we are doing a good job of stressing the importance of
social and emotional learning (SEL). The entire survey and results for each question can
be found in Appendices I and II.
As stated previously in this article, more research in this area is warranted and future
study will allow for ongoing evaluation of our program. The conclusions drawn from this
research can only “whet the appetite” for more discovery of what works when it comes
to online education in a consistent approach to monitoring for continuous improvement.
According to Kolloff [12], “The design role becomes important in that the majority of the
instructor’s time is spent in determining how the course is to be implemented.” With that
in mind, we (as a staff) can begin to consider and evaluate what is critically important
as we design learning activities that serve our students well in the preparation for their
future roles as school leaders who possess personal skills which lead to success in the
performance of their job duties.
Appendix I
See Tables 1 and 2
Table 2. Survey responses (SA = strongly agree, A = agree, N = neutral, D = disagree, SD = strongly disagree)

Q6. When thinking about graduates of the online MED program at SEOSU, they are more likely to demonstrate empathy towards other individuals in the performance of their job as a school leader. SA 15.15%, A 60.61%, N 24.24%, D 0.00%, SD 0.00%

Q7. When thinking about online degree programs in general, I believe they are a valuable asset to our employees as they allow for convenience and expediency. SA 54.55%, A 39.39%, N 3.03%, D 3.03%, SD 0.00%

Q8. When thinking about online degree programs in general, I believe they are a waste of time as students are not exposed to the experiences necessary for personal development and personal interaction which allow for professional growth as they earn their degree. SA 0.00%, A 3.03%, N 3.03%, D 54.55%, SD 39.39%

Q9. When thinking about graduates of the online MED program at SEOSU, I am more likely to hire a graduate from this program as our school district has benefitted from the curriculum being taught in this program. SA 12.12%, A 48.48%, N 36.36%, D 3.03%, SD 0.00%

Q10. When thinking about graduates of the online MED program at SEOSU, I believe there should be a greater emphasis on interpersonal skills than what I’ve seen thus far. SA 0.00%, A 15.15%, N 36.36%, D 42.42%, SD 6.06%

Q11. Graduates of the online MED program at SEOSU are generally reliable and dependable professionals who consistently meet the demands of the position in which they are employed. SA 30.30%, A 60.61%, N 9.09%, D 0.00%, SD 0.00%

Q12. When presented with an opportunity to hire someone for a school leadership position, I feel confident in recommending an individual who has completed the online MED program at SEOSU. SA 30.30%, A 54.55%, N 15.15%, D 0.00%, SD 0.00%

Q13. I prefer traditional, face-to-face learning experiences as opposed to the current trend towards online learning when it comes to preparing individuals for their roles as school leaders. SA 9.09%, A 15.15%, N 27.27%, D 36.36%, SD 12.12%

Q14. Graduates of the online MED program at SEOSU have demonstrated the ability to build trust with others as they perform the duties of their job. SA 36.36%, A 54.55%, N 9.09%, D 0.00%, SD 0.00%

Q15. Graduates of the online MED program at SEOSU display humility and authenticity as they perform the duties of their job. SA 33.33%, A 54.55%, N 9.09%, D 3.03%, SD 0.00%

Q16. Graduates of the online MED program at SEOSU exhibit emotional intelligence as they perform the duties of their job. SA 30.30%, A 54.55%, N 15.15%, D 0.00%, SD 0.00%

Q17. Graduates of the online MED program at SEOSU develop positive relationships with others as they perform the duties of their job. SA 36.36%, A 57.58%, N 6.06%, D 0.00%, SD 0.00%

Q18. Graduates of the online MED program at SEOSU demonstrate courage as they perform the duties of their job. SA 33.33%, A 57.58%, N 9.09%, D 0.00%, SD 0.00%

Q19. Graduates of the online MED program at SEOSU use data to make informed decisions. SA 33.33%, A 57.58%, N 9.09%, D 0.00%, SD 0.00%

Q20. Graduates of the online MED program at SEOSU hold themselves and others accountable for their actions. SA 30.30%, A 63.64%, N 6.06%, D 0.00%, SD 0.00%

Q21. Graduates of the online MED program at SEOSU use resources wisely. SA 27.27%, A 63.64%, N 9.09%, D 0.00%, SD 0.00%
Appendix II
References
1. Allen, I.E., Seaman, J.: Class differences: Online education in the United States. Sloan
Consortium (NJ1) (2010)
2. Blanchard, K., Hodges, P.: Lead Like Jesus. Thomas Nelson, Nashville, TN (2005)
3. Chamorro-Premuzic, T., Frankiewicz, B.: Does higher education still prepare people for jobs?
Harvard Business Review (2019)
4. Collins, J.C.: Good to Great: Why Some Companies Make the Leap…and Others Don’t.
Harper Collins, New York, NY (2001)
5. Cui, G., Lockee, B., Meng, C.: Building modern online social presence: a review of social
presence theory and its instructional design implications for future trends. Educ. Inf. Technol.
18, 661–685 (2013). https://doi.org/10.1007/s10639-012-9192-1
6. Culver, M.K.: Applying Servant Leadership in Today’s Schools. Routledge, New York, NY
(2009)
7. Derlega, V.J., Metts, S., Petronio, S., Margulis, S.T.: Self-disclosure. Sage Publications, Inc.
(1993)
8. Deutschendorf, H.: 7 reasons why emotional intelligence is one of the fastest-growing job
skills. Fast Company. https://www.fastcompany.com/3059481/7-reasons-why-emotional-int
elligence-is-one-of-the-fastest-growing-job-skills. Accessed 18 March 2022
9. Freire, P.: Pedagogy of the Oppressed. Simon Fraser University Library (2018)
10. Goleman, D.: Working with Emotional Intelligence. Bantam Books, New York, NY (1998)
11. Greenleaf, R.K.: The Servant as Leader. The Robert Greenleaf Center, Indianapolis, IN (1991)
12. Kolloff, M.: Strategies for Effective Student-To-Student Interaction in Online Courses.
University of Wisconsin System Board of Regents, Madison, WI (2001)
13. Littlefield, J.: What Does Research Say About Online Learning? (2020). https://www.thoughtco.
com/what-research-says-about-online-learning-1098012
14. Pappas, C.: E-learning: Top 10 e-Learning Statistics for 2014 You Need To Know. eLearn-
ing Industry. https://elearningindustry.com/top-10-e-learning-statistics-for-2014-you-need-
to-know. Accessed 18 March 2022
15. Rasmussen, B.M., Mishna, F.: A fine balance: instructor self-disclosure in the classroom. J.
Teach. Soc. Work. 28(1–2), 191–207 (2008)
16. SHRM: The Global Skills Shortage. SHRM (2019). https://www.shrm.org/hr-today/trends-
and-forecasting/research-and-surveys/Pages/default.aspx. Accessed 18 March 2022
17. Song, H., et al.: Teacher–student relationship in online classes: a role of teacher self-
disclosure. Comput. Hum. Behav. 54, 436–443 (2016). https://doi.org/10.1016/j.chb.2015.
07.037
18. Tackie, H.N.: (Dis)Connected: establishing social presence and intimacy in teacher-student
relationships during emergency remote learning. AERA Open. (2022). https://doi.org/10.
1177/23328584211069525
19. The Future of State Universities: Research on the effectiveness of online learning (2011).
https://www.learningfront.com/Media/Research_Online_Learning.pdf. Accessed 18 March
2022
20. tpack.org
Scenarios for Virtual Clinical Simulation
to Train Nursing Students at a South African
University
School of Nursing, University of the Free State, Bloemfontein, Free State, South Africa
[email protected]
Abstract. With the COVID-19 pandemic, nursing students were left in the dark
when it came to clinical practice and skills acquisition; suddenly, already limited
placement and practical skill opportunities became even more restricted. The
University of the Free State in South Africa was no exception, which forced rapid
innovation and expansion of digital systems to assist nursing students in practising
skills and integrating theory in practice. To address the need for theory and practice
integration, the researchers sought free-to-use VCS platforms and scenarios that
might be used by students to practice their skills and integrate their theory and
practice. During this research, it became clear that nursing does not have a lot of
support in the open-source and free-to-use world of software, as most platforms
and scenarios are aimed at medical doctors. There were, however, some platforms
and scenarios which could be included and linked to outcomes in the Bachelor of
Nursing programme at the University of the Free State.
1 Introduction
In the light of the recent COVID-19 pandemic, innovative online solutions are being
sought to provide nursing students at the University of the Free State (UFS) with oppor-
tunities to integrate their theory and practice. The COVID-19 pandemic further increased
the gap between theory and practice integration because of even more restricted access
to clinical placements [1], which has always been a global challenge [2, 3].
The identified problem is that there are limited available virtual reality (VR) appli-
cations for students to practice clinical skills in nursing. To try and address the lack
of VR applications, the researchers identified free-to-use online clinical skills training
solutions in different health science fields which might be applicable for practicing nurs-
ing skills. The researchers sought to evaluate free-to-use desktop-based VR applications
for students to use as training opportunities for bridging the theory and practice gap by
comparing the outcomes of the available scenarios and skills to those that are required
throughout the four-year Bachelor of Nursing (B.Nur) programme presented by the
UFS.
2 Related Work
Various studies have emphasised the gap between theory and practice, for example, Choi
et al., Gilbert and Johnson, Howard, Scully and Van Zyl [4–8]. The gap between theory
and practice is mainly due to limited accredited clinical placement sites where students
can apply and transfer their theoretical knowledge in practical situations, especially in a
developing country like South Africa [8, 9]. Innovative teaching and learning strategies
may help bridge the gap between theory and practice and improve patient safety in
clinical environments [7, 8, 10].
Innovative teaching and learning strategies that have been identified to try and address
the gap between theory and practice include the use of Computerised Human Patient
Simulation (CHPS) and Virtual Clinical Simulation (VCS) [11, 12]. CHPS is an effective, but
expensive method to assist in bridging the gap between theory and practice and to help
students develop their skills [13, 14].
VCS has been investigated by various researchers to determine its viability as a
modality for training nursing students, for example, managing a patient with a foreign
object in the right lung [15], diagnosing patients via an Artificial Learning Interface
for Clinical Education (A.L.I.C.E.) [16], patient safety [17], medication administration
[18] and enhancing men’s awareness of testicular diseases [19], to mention a few. All the
aforementioned research also found VCS to be an adequate modality for skills acquisition
in nursing education.
For this research the spotlight falls on desktop-based VCS, also referred to as non-immersive
VCS, which is seen as VCS utilising interactions in a virtual environment (VE) using a mouse
and keyboard or a touchscreen on a mobile device [18, 20]. One issue with the available
research is that it was, in most cases, created by the researchers themselves and is not always
freely available. There were, however, free-to-use applications that could be considered for
evaluation during this research, to determine whether they could assist with the acquisition of
nursing skills. This study aims to evaluate free-to-use desktop VR applications for skills
acquisition in the four-year Bachelor of Nursing (B.Nur) programme presented by the UFS.
The searches were performed both with and without the term “free”. From all the searches,
only the platforms and scenarios that could be freely accessed and used were included. Set
out in Table 1 are the platforms and their scenarios that were found from the aforementioned
search terms and from the research articles to which the researchers had access.
Once the platforms with their scenarios were listed, the principal researcher determined
their technical viability, for example, whether the platform works on various devices
(mobile and desktop) and what the technical requirements are. This was done since not all
students have high-end computers or smart mobile devices. From the available platforms
and scenarios, the unviable options were excluded before the evaluation commenced (see
Table 2).
Once the initial exclusions were completed, two reviewers evaluated the scenarios and
platforms for inclusion: one reviewer is a nurse educator who specialises in simulation for
nursing students and the other is a specialist in various simulation technologies. Both
reviewers determined which platforms and scenarios could be included based on the
outcomes of the B.Nur programme at the UFS. The outcomes are available from the
yearbook published on the website of the UFS [26].
The reviewers played each of the selected VCS games together and compared the outcomes
of each VCS scenario with those set out in the curricula of the B.Nur programme. Figure 1
shows one of the evaluated scenarios available from Full Code. The reviewers then
determined for which year group or groups a scenario is applicable and which outcome it
satisfies, as will be discussed in the results and discussion to follow.
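A minimal Python sketch of this coding step is given below: each evaluated scenario is mapped to the year group(s) and programme outcome(s) it satisfies, and the mapping is then grouped by year. The scenario entries and outcome labels are hypothetical placeholders, not the actual content of Table 3.

```python
# Hedged sketch of the outcome-coding step: each evaluated VCS scenario is
# mapped to the B.Nur year group(s) and programme outcome(s) it satisfies.
# The scenario names and outcome labels below are hypothetical placeholders.
from collections import defaultdict

codings = [
    {"platform": "Full Code", "scenario": "Example emergency scenario", "years": [3, 4],
     "outcomes": ["patient assessment", "clinical decision-making"]},
    {"platform": "Septris", "scenario": "Sepsis management", "years": [4],
     "outcomes": ["early recognition of deterioration"]},
]

by_year = defaultdict(list)
for c in codings:
    for year in c["years"]:
        by_year[year].append(f'{c["platform"]}: {c["scenario"]}')

for year in sorted(by_year):
    print(f"Year {year}: {', '.join(by_year[year])}")
```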
After evaluating the games, the reviewers noted that most scenarios are aimed at medical
doctors. Even though there were aspects applicable to nursing, it would be somewhat
difficult to split the aspects applicable to nursing skills from the main scenarios, because
the platforms and the scenarios they house require the student to address all the outcomes
to complete a scenario and receive feedback. There were, however, some platforms and
scenarios which could be included and coded to outcomes of the B.Nur programme, as can
be seen in Table 3.
From all the scenarios and platforms, a total of two platforms with ten scenarios between
them could be coded to the B.Nur programme. However, the reviewers also found that
some platforms and scenarios might be applicable for use in future for certain
post-graduate diplomas, as can be seen in Table 4.
The reviewers determined that there is a limited number of platforms that can be used by
the nursing students in the B.Nur programme; however, these can still assist with some of
the outcomes, as mentioned previously.
5 Conclusion
In conclusion, the research showed that there is a limited number of free-to-use platforms
and scenarios, and even more so for nursing students, as the focus seems to be on training
medical doctors. This is a big problem, especially seeing that 90% of the world
population’s first contact is with a nurse [27, 28]. Even though the evaluated scenarios can
help the students in the B.Nur programme at the UFS, the available platforms for nursing
still need to be expanded, especially open-source options, because developing countries do
not have the financial capacity to pay licensing fees for virtual platforms.
This research can also provide insights to other researchers on possible available
platforms and scenarios which could be expanded on in future research and tested to
determine their effectiveness in providing nursing students with the needed skills as
per the outcomes of their nursing programme. Furthermore, this research can provide
opportunities for medical doctors to research the effect of free-to-use VCS for skills
acquisition on the platforms and scenarios that are aimed at them.
6 Ethical Clearance
Ethical clearance was granted by the relevant ethics committee under a project with
ethical clearance number: UFS-HSD2020/1313.
7 Future Research
The platforms and scenarios were evaluated but not tested by the students, which could be a
valuable future research endeavor. VCS could also be combined with CHPS by letting VCS
activities precede the simulation sessions; research can then determine whether this helps
prepare students better for CHPS.
Another issue was that the post-graduate diplomas are still being designed and must be
approved by the regulating bodies in South Africa, which meant the reviewers based the
inclusion of scenarios for post-graduate diplomas on the old outcomes, which might in
future differ from the new post-graduate diploma outcomes once they are complete and
approved.
For this research, the researchers assumed that the platforms and scenarios will contribute
positively to skills acquisition for nursing students, given the body of research indicating
the positive effect of VCS. The effect of these VCS platforms and scenarios can, however,
be tested empirically in future research to determine their impact on the skills acquisition
of nursing students, which is one of the upcoming research endeavors that flowed from this
research.
References
1. Dolan, H., Amidon, B.J., Gephart, S.M.: Evidentiary and theoretical foundations for virtual
simulation in nursing education. J. Prof. Nurs. 37, 810–815 (2021). https://doi.org/10.1016/
j.profnurs.2021.06.001
2. Niederhauser, V., Schoessler, M., Gubrud-Howe, P.M., Magnussen, L., Codier, E.: Creating
innovative models of clinical nursing education. J. Nurs. Educ. (2012). https://doi.org/10.
3928/01484834-20121011-02
3. Poikela, P., Ruokamo, H., Teräs, M.: Comparison of meaningful learning characteristics in
simulated nursing practice after traditional versus computer-based simulation method: a qual-
itative videography study. Nurse Educ. Today. 35 (2015). https://doi.org/10.1016/j.nedt.2014.
10.009
4. Choi, W., et al.: Engagement and learning in simulation: recommendations of the Simnovate
Engaged Learning Domain Group. BMJ Simul. Technol. Enhanc. Learn. 3, S23–S32 (2017).
https://doi.org/10.1136/bmjstel-2016-000177
5. Gilbert, K.A., Johnson, C.W.: Increasing self-efficacy through immersive simulations: leading
professional learning communities. J. Leadership Educ. 17, 72–93 (2018). https://doi.org/10.
12806/V17/I4/R5
6. Howard, S.: Increasing Fidelity and Realism in Simulation (2018)
7. Scully, N.J.: The theory-practice gap and skill acquisition: an issue for nursing education.
Collegian 18, 93–98 (2011). https://doi.org/10.1016/j.colegn.2010.04.002
8. Van Zyl, A.E.: Exploring the potential theory-practice gap in the teaching methods of nurse
educators (2014)
9. Waldner, M.H., Olson, J.K.: Taking the patient to the classroom: applying theoretical frame-
works to simulation in nursing education. Int. J. Nurs. Educ. Scholarsh. 4, Article18 (2007).
https://doi.org/10.2202/1548-923X.1317
10. Alinier, G., Platt, A.: International overview of high-level simulation education initiatives in
relation to critical care. Nurs. Critical Care 19 (2013). https://doi.org/10.1111/nicc.12030
11. Botha, B.S., Hugo-van Dyk, L., Nyoni, C.N.: The reality of virtual reality at a South African
university during the COVID-19 pandemic. African J. Heal. Prof. Educ. 13, 199–200 (2021).
https://doi.org/10.7196/AJHPE.2021.v13i3.1503
12. Verkuyl, M., Atack, L., Mastrilli, P., Romaniuk, D.: Virtual gaming to develop students’
pediatric nursing skills: a usability test. Nurse Educ. Today. 46, 81–85 (2016). https://doi.org/
10.1016/j.nedt.2016.08.024
13. Lapkin, S., Levett-Jones, T.: A cost-utility analysis of medium vs. high-fidelity human patient
simulation manikins in nursing education. J. Clin. Nurs. 20, 3543–3552 (2011). https://doi.
org/10.1111/j.1365-2702.2011.03843.x
14. Pywell, M.J., Evgeniou, E., Highway, K., Pitt, E., Estela, C.M.: ScienceDirect High fidelity,
low cost moulage as a valid simulation tool to improve burns education. Burns 42, 844–852
(2016). https://doi.org/10.1016/j.burns.2015.12.013
15. Botha, B.S., de Wet, L., Botma, Y.: Usability of a foreign body object scenario in VR for
nursing education. In: IEEE (ed.) 2020 IEEE Conference on Virtual Reality and 3D User
Interfaces Abstracts and Workshops (VRW). pp. 787–788. IEEE, Atlanta (2020)
16. Kleinert, R., Wahba, R., Chang, D.H., Plum, P., Hölscher, A.H., Stippel, D.L.: 3D immersive
patient simulators and their impact on learning success: a thematic review (2015)
17. Butt, A.L., Kardong-Edgren, S., Ellertson, A.: Using game-based virtual reality with haptics
for skill acquisition. Clin. Simul. Nurs. 16, 25–32 (2018). https://doi.org/10.1016/j.ecns.2017.
09.010
18. Dubovi, I., Levy, S.T., Dagan, E.: Now I know how! The learning process of medication
administration among nursing students with non-immersive desktop virtual reality simulation.
Comput. Educ. 113, 16–27 (2017). https://doi.org/10.1016/j.compedu.2017.05.009
19. Saab, M.M., Hegarty, J., Murphy, D., Landers, M.: Incorporating virtual reality in nurse
education: a qualitative study of nursing students’ perspectives. Nurse Educ. Today 105,
105045 (2021). https://doi.org/10.1016/j.nedt.2021.105045
20. Choi, D.H., Dailey-Hebert, A., Estes, J.S.: Emerging tools and applications of virtual reality
in education (2016)
21. Full Code: Emergency Medicine Simulation. https://app.full-code.com/Player/Player.html
22. Stanford University School of Medicine: Septris. http://septris.stanford.edu//game/SeptrisTi
tle.html
23. Surgery Squad: Surgery Games | Surgery Squad. http://www.surgerysquad.com/category/sur
gery-games/page/2/
24. MedicActiV: MedicActiV. https://app.medicactiv.com/?redirect=%2Fhome
25. Breakawaygames: vHealthCare. https://store.breakawaygames.com/Home/Index
26. University of the Free State: Rule book – Courses. https://www.ufs.ac.za/health/departments-
and-divisions/school-of-nursing-home/general/courses
27. Knowles, M.: Survey: 90% of nurses admit they do not have enough time to prop-
erly care for patients. https://www.beckershospitalreview.com/quality/survey-90-of-nurses-
admit-they-do-not-have-enough-time-to-properly-care-for-patients.html
28. University of Texas Arlington: The Nurse’s Role in Global Health. https://academicpartner
ships.uta.edu/articles/healthcare/nurses-role-in-global-health.aspx
Learning Factory Synergy: Applied Learning
and Problem-Based Pedagogy in the Digital
Transformation Ecosystem
Abstract. The manufacturing process has changed drastically in the last decade due to the many changes in computing hardware and software. It is thus worth investigating how people use these latest technologies in a single workplace to fully exploit the manufacturing business process, from learning through production to further development.
In this paper, we discuss the challenges of synchronizing workplace synergy with academia in the form of research centers. We surveyed key personnel who partnered with higher education institutes on collaborative work. Based on their experience, we showcase and discuss the core factors that make academic-industrial collaboration successful. We discuss the project plan, partner relationship, and knowledge-sharing process between industry supporters, academic staff, and students, including the pedagogy used, how digital transformation takes place in the learning factory ecosystem, and how the output is then transferred to the real world.
We conclude that four elements are essential to achieving good workplace synergy in the learning factory ecosystem: a real-world scenario, a work-based learning pedagogy, a long-term industry partner, and a knowledgeable manager, i.e., a professional with commercial experience and a technically trained background. We believe vocational education, applied learning, and work-based learning and teaching are critical educational elements for enhancing economic growth.
1 Introduction
The first industrial revolution [1] started its chapter by burning coal to power steam-
powered engines in the 1760s; while the second industrial revolution [2] started the
1.1 Insights from World Economic Forum – Resetting the Future of Work
Agenda
In 2020, the World Economic Forum (WEF) conducted global research on how businesses would respond to the effects of COVID-19 [9]. 84% of the respondents replied that they would accelerate digitalization in their workplaces and business processes, 83% responded that they would provide more opportunities to work remotely, and 50% addressed the importance of process automation, with extra effort to be put into accelerating it in the workplace (Fig. 1).
What does this mean for the learning factory? One of the major impacts is the shift from the physical environment to the online virtual environment. COVID-19 sped up the digital transformation process and moved much work from physical, face-to-face settings to the digital world. Whether because of safety, the resources saved, or convenience, more people prefer to work remotely (from home) instead of spending time commuting. So, when we talk about digital applications and pedagogy in the learning factory ecosystem, what should we pay attention to in light of the insights above? The five imperatives for resetting the future of work agenda discussed in the WEF white paper may give us some hints (Fig. 2).
Embracing stakeholder capitalism creates and enforces a closer working relationship between industry and academia through a win-win, sustainable ecosystem that drives innovation from the research center to commercial areas [10]. Funding and the equitable sharing of risks and rewards foster a better atmosphere in the workplace. Aligning new technologies and skill sets gives students, trainees, and current employees room to develop their hidden talents to the next level [11]. Transforming organizational design and workflow facilitates the digital transformation trend that was already taking place during the COVID-19 period [12].
Fig. 1. Planned business measures in response to COVID 19 (Partial) (Source: World Economic
Forum, the future of jobs report 2020; Image Source: World Economic Forum resetting the future
of work Agenda: disruption and renewal in a post-COVID World 2020).
Fig. 2. Five imperatives for resetting the future of work Agenda (Source: World Economic Forum
and Mercer, 2020; Image Source: World Economic Forum resetting the future of work Agenda:
Disruption and renewal in a post-COVID World 2020).
1.2 Feedback from the Students and Feedforward from the Industry Partners
To plan forward and make the insights mentioned above work in the ecosystem, what do we need to do? It is common to receive feedback when we complete a task or a project; it is becoming popular to use feedforward as the reverse approach, managing expectations in advance to achieve better outcomes for an event.
Take the survey results in the previous section as an example: more people are concerned about digital transformation in the workplace during the COVID-19 period. How can students understand the importance of digital transformation in advance when they participate in learning factory activities? How can industry partners give advance input both to the teaching staff who design the simulation and curriculum and to the students on what they can expect after participating in the training program? In a well-matched and balanced situation, expectations and outputs can be managed in a harmonized way. In the next section, we will look at how digital transformation and a work-based learning pedagogy work together to make the learning factory ecosystem effective in the new era.
The goal of our interview is to understand, at an in-depth level, how the management of a supporting business partner thinks about the educational body and responds to the technological changes that take place in the real world, and what kind of supporting facilities we should have to bring a positive result to the industry partner. The interviewees are resource owners and sponsors of part of the learning factory design, so they are accountable for the actual usefulness of the resources they spend.
Due to the continuing development of COVID-19, the interviews were conducted via an online video conferencing system, and open-ended questions were asked. The interviewees were told that no personal information would be revealed in the research process; all identifiable information, including the names of persons and companies and all business-related data, would be concealed. The company background, professional domain, and other data important to the success and implications of the research would be fully disclosed.
3.2 Questions
Four questions were discussed in the interview and asked in order, with no time limit set for each question. The interviewees were informed that the discussion would last around 60 min. The two interviewees were introduced to each other at the beginning of the interview and could share comments at any time; this arrangement allows more communication based on the other's sharing, if any.
1. What do you think about the learning factory nowadays in academia?
2. What do you expect from the school training? How is it related to your com-
pany/business?
3. Do you think that digital transformation, blockchain, metaverse; and all these kinds
of new technological keywords are related to the training, and to the students?
4. How can we do better as a partner, as an ecosystem?
3.3 Interview
What Do You Think About the Learning Factory Nowadays in Academia?
N: As a manager in a higher education institute, I am fortunate to have the opportunity
to visit different kinds of laboratories and learning factories in different schools and
universities. They are attractive, fun and interesting. I can see that the arrangement is
well-designed. I would love to see if this kind of facility can be fully utilized.
K: For me, I just want my expectation can be fulfilled. Well as you know, we are
accountable for the ROI, no matter whether it is a commercial project or not.
What Do You Expect from the School Training? How Is It Related to Your Com-
pany/Business?
K: Job-ready student, less supervision in the workplace is the best. I hope that they come
to the workplace to solve our problems.
N: School should provide comprehensive training. I think that a good student should
be both equipped with hands-on knowledge and theoretical knowledge. This is the best
scenario.
Do You Think that Digital Transformation, Blockchain, Metaverse; and All These
Kinds of New Technological Keywords are Related to the Training, and to the Stu-
dents?
K: We want more digital things, just like we want to try 3D construction on the site. But
we lack the skills and it looks like expensive in research as well. I don’t know. There
are just lots of new technologies coming out in recent years. Many of them I just don’t
know what it is.
N: I have no idea, too. I guess students should think about whether they would like
to go wide or go deep. They should have a good mentor who allows them to understand
both the business and technical knowledge.
4 Discussion
We analysed the viewpoints shared by the interviewees and found that four elements are especially important: (i) a real-world scenario, (ii) a work-based learning pedagogy, (iii) a long-term industry partner, and (iv) a knowledgeable manager. Here we discuss each of the identified elements and the reasons why it is important.
We found that a real-world scenario gives the exact detail [13] about where the problem lies, how it arises, and what could be done to solve it. This gives educators a clear direction for planning and designing their teaching. Simulations in the learning factory can be built on such scenarios, and trainees can mock up the entire process in a safe and sustainable way over the long term. A work-based learning pedagogy serves the needs of both society and the students; whether for economic reasons or family financial circumstances, a work-based teaching methodology typically diverges students into two streams, hands-on first or theory first [14]. With suitable design, the learning factory can provide hands-on-first training while the academic partner fills the gap, showing how theory backs up the practical knowledge students have just applied in the simulation.
A good research item takes time to investigate, study, plan, and develop; this is what the interviewee addressed: although output is important, they sometimes need to balance the number of countable outputs delivered against the quality of the results (i.e., performance indicators may not apply in every scenario) [15]. This message is especially important to convey to the trainees: the digital transformation process often speeds things up many times over, but it is not 100% guaranteed. Technologies can be helpful in many situations, and they can be destructive in many others. Know-how managers, ideally working as a pair of one senior and one junior, are recommended to guide the students through the entire learning process. A know-how manager is a person who knows the real-world problems in the workplace well and knows exactly what is needed in the research output, both to give value to the company and to serve as a KPI measuring how sustainably the cooperation works [16].
5 Conclusion
Technologies should be chosen selectively for the project picked by the industry partner. Industry partners should create a feedforward mechanism and provide real-world problems for academia to work on and to plan simulations in the learning factory around. Among the many forms of new technology, digital transformation is one process that cannot be missed because of its global impact. The work-based learning approach should be considered by the faculty, who also need to balance theory education to bridge the gap between the practical and the theoretical. An ecosystem can be achieved through a long-term partnership and the appointment of a know-how manager, who is directly responsible for setting the questions and examining the research output.
References
1. Deane, P.M.: The First Industrial Revolution. Cambridge University Press (1979)
2. Mokyr, J., Strotz, R.H.: The second industrial revolution, 1870–1914. Storia dell’economia
Mondiale 21945(1) (1998)
3. Janicke, M., Jacob, K.: A third industrial revolution. Long-term governance for social-
ecological change, pp. 47–71 (2013)
4. Cooper, C., Kaplinsky, R.: Technology and Development in the Third Industrial Revolution.
Routledge (2005)
5. Bai, C., Dallasega, P., Orzes, G., Sarkis, J.: Industry 4.0 technologies assessment: a
sustainability perspective. Int. J. Product. Econ. 229, 107776 (2020). https://doi.org/10.1016/
j.ijpe.2020.107776. ISSN 0925–5273
6. Wikipedia contributors: Fourth Industrial Revolution (2022). https://en.wikipedia.org/wiki/
Fourth_Industrial_Revolution
7. Panel, E.: 10 Ways Technology Has Changed Team Communication. Forbes (2018). https://
www.forbes.com/sites/forbesbusinessdevelopmentcouncil/2018/08/02/10-ways-technology-
has-changed-team-communication/
8. Gronau, N., Ullrich, A., Teichmann, M.: Development of the industrial IoT competences
in the areas of organization, process, and interaction based on the learning factory concept.
Procedia Manuf. 9, 254–261 (2017)
9. Resetting the Future of Work Agenda: Disruption and Renewal in a Post-COVID World.
World Economic Forum (2020). https://www.weforum.org/whitepapers/resetting-the-future-
of-work-agenda-disruption-and-renewal-in-a-post-covid-world
10. Drobyazko, S., Okulich-Kazarin, V., Rogovyi, A., Goltvenko, O., Marova, S.: Factors of
influence on the sustainable development in the strategy management of corporations. Acad.
Strateg. Manag. J. 18, 1–5 (2019)
11. Green, A.: The COVID-19 crisis and implications for skills development and the skills system. In: Productivity and the Pandemic. Edward Elgar Publishing (2021)
12. Priyono, A., Moin, A., Putri, V.N.A.O.: Identifying digital transformation paths in the business
model of SMEs during the COVID-19 pandemic. J. Open Innov. Technol. Market Compl.
6(4), 104 (2020)
13. Okuda, S.M., Runco, M.A., Berger, D.E.: Creativity and the finding and solving of real-world
problems. J. Psychoeduc. Assess. 9(1), 45–53 (1991)
14. Black, J.S., Mendenhall, M.: A practical but theory-based framework for selecting cross-
cultural training methods. Hum. Resour. Manage. 28(4), 511–539 (1989)
15. Marr, B.: Key Performance Indicators (KPI): the 75 measures every manager needs to know.
Pearson UK (2012)
16. Petrosjan, L.A., Zenkevich, N.A.: Conditions for sustainable cooperation. Autom. Remote.
Control. 76(10), 1894–1904 (2015). https://doi.org/10.1134/S0005117915100148
Teacher Training Management Guidelines
for Improving Green IT Teaching Intention
and Behavior
University of South Africa (Unisa), 28 Pioneer Avenue, Florida, Roodepoort 1709, South Africa
[email protected], [email protected]
1 Introduction
The United Nations sustainable development goals (SDGs) provide an urgent call for the
global community to address society’s most urgent challenges [1]. A major challenge is
the protection and sustainable use of the natural environment to provide for the needs of
the present and future generations, and the SDGs expose the vital and indispensable role
that the natural environment plays in the well-being of all people on Earth. Nevertheless,
people continue to use the natural environment unsustainably, causing severe environ-
mental depletion and degradation that is evident in many forms, such as pollution, global
warming, ocean acidification, loss of biodiversity and deforestation [2].
Information Technology (IT), incorporating Information and Communications Technologies (ICTs), also has an impact on the natural environment. IT use has become ubiquitous, resulting in extensive consumption of non-renewable resources during IT manufacture with the associated air, water and soil pollution [3], increased global warming through carbon emissions due to energy consumption during IT use [4], and considerable ground and water pollution from millions of tons of hazardous electronic waste (e-waste) annually at IT disposal [3]. To address these negative environmental impacts, the theory
and practice of Green Information Technology (Green IT) was developed [5]. Notably,
the term ‘green’ is typically associated with nature, corresponds to plants, grass and trees
and is used to denote environmental sustainability. In addition, the concept of Green IT
is considered to have conceptual equivalence to Green ICT, IT for Sustainability, Green
Computing, Sustainable IT and Environmentally Sustainable Computing.
Environmental sustainability necessitates human attitude and behavior changes
toward sustainable ways of living and interacting with the natural environment [6].
To enable attitude and behavior changes, people require the knowledge and skills to
understand sustainability problems and solutions, make sustainable decisions and take
sustainable actions. To this end, education has a vital role to play in the teaching and
learning of green competencies, including green awareness, knowledge, skills, abilities,
attitudes and behaviors [7].
However, before teaching can proceed, teachers themselves need to acquire green
competencies and be motivated to teach sustainability. In this regard, the literature reports
that teachers often lack these requirements resulting in inconsistent and inadequate sus-
tainability education in schools [8]. A key opportunity for developing teachers’ green
competencies and motivation is during teacher training or when student teachers are
being trained about what and how to teach. Yet, the literature reports that student teach-
ers also do not have the appropriate green competencies and feel unprepared to teach
sustainability [9].
and IT fields. Additionally, this knowledge provides teacher training management with
valuable insight into the management and design of student teacher training and courses
for improving student teachers’ Green IT competencies and teaching motivation. Thus,
the research question was what guidelines should teacher training management follow
to improve their student schoolteachers’ Green IT teaching intention and behavior?
Correspondingly, the research objective was to develop teacher training management
guidelines by building on prior empirical work [15].
The paper is structured into five sections, with the first contextualizing the study, the second reviewing applicable frameworks, theories and models in the literature, and the third justifying the research methodology. The results are presented in the fourth section, and the study concludes in the final section.
2 Literature Review
To achieve the research objective, applicable frameworks, theories and models were
searched for in the Green IT and IT literature to establish a basis for an empirically
testable research model from which to develop teacher training management guidelines.
Nine prominent frameworks, theories and models were evident, namely the IT Gover-
nance and Green IT model (ITGM) [16], the Green-readiness framework (G-readiness)
[17], the adoption model for Green IT [18], the Green IT adoption model (GITAM)
[19], a readiness self-assessment model for implementing Green lean initiatives [20],
the belief-action-outcome framework [21], the theory of planned behavior (TPB) [22],
the theory of reasoned action (TRA) [23] and the decomposed theory of planned behavior
(DTPB) [24].
It was evident that the first six are mostly applicable at an organizational level and
DTPB is more suitable for new technology adoption and utilization, resulting in their
exclusion. However, the TRA had applicability for targeting behavioral change strategies
and the TPB exposed variables affecting behavioral intention. Hence, TRA and TPB were
selected and formed the basis of the study’s research model as they could be used to
address the research problem, answer the research question and had previously provided
useful insight and explanations about the variables involved in behavioral intention and
behavior in prior Green IT, IT and sustainability research [25–27].
Subsequently, these theories guided the study’s research model development result-
ing in ten constructs, namely behavioral beliefs (BB), normative beliefs (NB), con-
trol beliefs (CB), level of awareness (LA), attitude toward behavior (ATB), subjective
norm (SN), perceived behavioral control (PBC), person-related beliefs (PRB), behavioral
intention (BI) and behavior (B).
To elaborate, BB relates to the level of acceptance by a student teacher that Green IT
teaching results in improved Green IT practices, NB relates to the level of acceptance by
a student teacher that Green IT teaching is expected by important people in the education
domain and CB relates to the level of acceptance by a student teacher of his/her discretion
to teach Green IT. LA relates to the level of Green IT knowledge, ATB relates to how
a student teacher feels about Green IT teaching, SN relates to the level of approval a
student teacher expects from people significant to him/her about Green IT teaching, PBC
relates to the perceived level of difficulty or ease of teaching Green IT, and PRB relates to how important student teachers think their role is in promoting Green IT practice. BI relates to the resolve of a student teacher to teach Green IT, and B relates to Green IT teaching.
3 Methodology
For answering the research question and addressing the research problem, the study was
appropriately guided by a positivist philosophy. As an epistemology, positivism indicates
that knowledge can be objectively acquired using the scientific method and observed
empirical quantitative data and analyses. Theory and hypothesis testing are common
characteristics. Hence, the study conducted an online anonymous questionnaire survey
to measure the research model constructs and test their relationships.
Following advice in the literature, the study used purposive sampling for relevant
and knowledgeable respondents representing the key research problem categories [28].
This method is efficient and replicable. A total of three hundred responses were collected
from student teachers across three teacher training tertiary institutions in Swaziland that
demonstrated a broad set of demographics, teaching grades and related qualifications.
Ethics clearance was approved by the University of South Africa (Unisa) following
formal permission from all the teacher training institutions and each respondent provided
informed consent.
Conducting the ANOVA was also important because prior research [27] had noted
material differences that could impact teacher training management guidelines. For
instance, it would be important to know whether there was a significant difference in
Green IT teaching attitude between males and females or a significant difference in
Green IT teaching intention between student schoolteachers of different ages.
The ANOVA was carried out using the statistical software platform called SPSS
and examined all demographic variables for each of the ten research model constructs
[29]. However, ANOVA, like many other statistical procedures, has certain requirements. One of these is homogeneity of variance, which was determined using Levene's test. If the significance of a Levene's test is below five percent, then the null hypothesis of equal variances is rejected, which means a violation of the homogeneity of variance assumption; this signifies likely misleading ANOVA results, and those ANOVA results should not be interpreted. However, if the significance of a Levene's test is above or equal to five percent, then the null hypothesis is not rejected and there is no violation of the homogeneity of variance assumption.
Notably, ANOVA’s benefit is its utilization of one procedure to simultaneously inves-
tigate all comparisons, but its disadvantage is its inability to indicate which groups differ
on a variable. For this information, the post hoc procedure called Tukey’s honestly sig-
nificant difference (HSD) is required. However, it can occur that the ANOVA reports
a statistically significant difference and Tukey’s HSD does not because Tukey’s HSD
requires a greater difference before significance, due to its control of the Type I error,
which occurs when the null hypothesis is actually true but is rejected.
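To make the gate-then-test sequence concrete, the following is a minimal, hypothetical Python sketch of the analysis described above (the study itself used SPSS); the data file name and column names are illustrative assumptions only.

# Minimal sketch of the Levene's-test gate, one-way ANOVA, and Tukey's HSD
# pipeline described above. Assumption: the study used SPSS, not this code;
# the CSV file and the "gender"/"ATB" column names are hypothetical.
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

ALPHA = 0.05  # the five-percent significance level used in the paper

df = pd.read_csv("survey_responses.csv")   # hypothetical data file
construct, demographic = "ATB", "gender"   # one construct, one grouping variable

# Split the construct scores into one array per demographic group.
groups = [g[construct].dropna().values for _, g in df.groupby(demographic)]

# 1. Levene's test: only interpret the ANOVA if variances are homogeneous.
_, p_levene = stats.levene(*groups)
if p_levene < ALPHA:
    print("Homogeneity of variance violated; ANOVA results not interpreted.")
else:
    # 2. One-way ANOVA across all groups simultaneously.
    f_stat, p_anova = stats.f_oneway(*groups)
    print(f"ANOVA: F = {f_stat:.3f}, p = {p_anova:.3f}")
    # 3. Tukey's HSD post hoc test to see which groups differ.
    if p_anova < ALPHA:
        tukey = pairwise_tukeyhsd(endog=df[construct].dropna(),
                                  groups=df.loc[df[construct].notna(), demographic],
                                  alpha=ALPHA)
        print(tukey.summary())

In practice, this same gate-then-test sequence would be repeated for every combination of demographic variable and research model construct, as reported in the paragraphs that follow.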
Proceeding with the ANOVA and the demographic variable called gender, Levene's test showed that ANOVA could be conducted for the constructs PRB, PBC, CB, ATB, BB, BI and LA. However, the corresponding ANOVA for these constructs indicated
a statistically significant (SS) difference on the construct ATB only, but Tukey’s HSD
indicated no SS difference. Thus, there were no meaningful differences on gender.
For home language, Levene's test showed that ANOVA could be conducted for all constructs excluding PRB. The corresponding ANOVA indicated a SS difference on the construct BB only, but Tukey's HSD was not conducted since one or more groups had fewer than two responses, namely the Zulu home language group with only one response. To clarify, there were six English, one Zulu and 293 Swazi responses on home language, so it was not meaningful to compare such uneven groups, resulting in no meaningful differences on home language.
For age, Leven’s test showed that ANOVA could be conducted for all constructs
and a SS difference on construct LA only was indicated on the corresponding ANOVA.
Subsequently, on the construct LA, Tukey’s HSD indicated SS differences amongst the
20 - 24 and I do not want to answer this question age response options, and the 25 - 29
and I do not want to answer this question age response options. It was concluded that
there were no meaningful differences on age because then I do not want to answer this
question response option provides little useful information and insight.
For course year level, Levene's test showed that ANOVA could be conducted for all constructs excluding B and LA, but the resulting ANOVA showed no SS differences for course year level.
For planned teaching grades, Levene's test showed that ANOVA could be conducted for all constructs excluding LA, B and NB, but the corresponding ANOVA showed a SS difference on the construct BI only, on which Tukey's HSD indicated a SS difference
between the primary school grades and early childhood grades groups. In conclusion,
Green IT teaching intention differed significantly between those respondents planning
primary school teaching, which had a mean of 15.560, and those planning early childhood
teaching, which had a lower mean of 14.138. Those planning early childhood teaching
would possibly benefit from adapted Green IT training to improve their Green IT teaching
intention.
For planned teaching subject, Levene's test showed that ANOVA could be conducted for all constructs excluding LA, NB and SN. Subsequently, there was a SS difference on the construct PRB only from the ANOVA, but Tukey's HSD was not conducted since one or more groups had fewer than two responses. Thus, there were no meaningful
differences on planned teaching subject.
For the registered qualification, Levene's test showed that ANOVA could be conducted for all constructs excluding BB, PBC, B, NB and LA, and the ANOVA showed a SS difference on BI only. Tukey's HSD, similar to planned teaching grades, indicated a SS difference between the Primary Teacher's Diploma and Early Childhood Education Diploma groups on BI. Tukey's HSD also indicated a SS difference between the Primary Teacher's Diploma and the Bachelor of Special and Inclusive Education groups on BI. Thus, Green IT teaching intention differed significantly between those respondents registered for the Primary Teacher's Diploma, with a mean of 15.9077, those registered for the Early Childhood Education Diploma, with a lower mean of 14.5309, and those registered for the Bachelor of Special and Inclusive Education, with the lowest mean of 13.7857. Hence, those registered for the Early Childhood Education Diploma and the Bachelor of Special and Inclusive Education would possibly benefit from adapted Green IT training to improve their Green IT teaching intention.
For practical teaching experience, Levene's test showed that ANOVA could be conducted for all the constructs excluding B and LA, but the ANOVA showed no SS differences between the different practical teaching experience groups.
5 Conclusion
5.1 Management Guidelines and Recommendations
Following the empirical work, teacher training management guidelines were developed
with the aim of improving student schoolteachers’ Green IT teaching intention and
behavior, as follows:
• Given the strong empirical link demonstrated between Green IT teaching intention and Green IT teaching, it would be important for teacher training management to address the Green IT teaching resolve of student teachers directly. This could be done by formally explaining to the student teachers the many global and local risks of not practicing and teaching general environmental sustainability and Green IT, and also formally explaining the benefits of doing so.
• A central focus by teacher training management should be on raising the student
teachers’ Green IT awareness since the analysis indicated that Green IT awareness
has the greatest positive influence on Green IT teaching intention. Green IT awareness
could be improved by formally integrating Green IT information and knowledge into
current teacher training courses and curriculums, including problems and solutions.
Also, informal methods including social media and events could augment formal
Green IT awareness.
• Another important aspect is the student teachers’ perceived behavioral control or their
perceptions about how difficult or easy it would be to teach Green IT, as the analysis
showed this aspect’s positive influence on Green IT teaching intention. This aspect
could be practically improved through the provision of teaching methods training
specifically designed for teaching Green IT content and creating opportunities for the
student teachers to pilot those methods with their planned teaching grades.
• Also essential is the person-related beliefs of the student teachers given its positive
influence on Green IT teaching intention. Person-related beliefs relate to the student
teachers’ perceptions of their part in improving the practice of Green IT by others.
Improving their person-related beliefs could involve training the student teachers in
the design of Green IT assessments to assess the Green IT behavior changes in their
pupils over time to provide evidence of their part in affecting Green IT change in
others.
• In addition, it would be essential to adapt and augment the Green IT training for
student teachers who plan to teach early childhood grades as these students showed a
significantly lower Green IT teaching intention. This could require teaching content
adapted for early childhood age groups given this age group’s young learning stage,
development and particular use of IT.
• Similarly, the student teachers enrolled in the Bachelor of Special and Inclusive Edu-
cation and Early Childhood Education Diploma would require adapted and special-
ized Green IT teaching content and teaching methods that fit the particular learning
development stages, capabilities and IT use contexts of these learners.
References
1. UN: Transforming our world: The 2030 agenda for sustainable development (2015). https://
sdgs.un.org/publications/transforming-our-world-2030-agenda-sustainable-development-
17981. Accessed 08 Apr 2022
2. UNEP: Frontiers 2022: Noise, blazes and mismatches: Emerging issues of environmen-
tal concern. Nairobi, Kenya (2022). https://www.unep.org/resources/frontiers-2022-noise-bla
zes-and-mismatches. Accessed 08 Apr 2022
3. Krumay, B., Brandtweiner, R.: Measuring the environmental impact of ICT hardware. Int. J.
Sustain. Dev. Plan. 11, 1064–1076 (2016). https://doi.org/10.2495/SDP-V11-N6-1064-1076
4. Bekaroo, G., Bokhoree, C., Pattinson, C.: Impacts of ICT on the natural ecosystem: a grassroot
analysis for promoting socio-environmental sustainability. Renew. Sustain. Energy Rev. 57,
1580–1595 (2016). https://doi.org/10.1016/j.rser.2015.12.147
5. Murugesan, S., Gangadharan, G.R. (eds.): Harnessing Green IT: Principles and Practices.
John Wiley and Sons Ltd, Chichester, United Kingdom (2012)
6. Cabral, C., Dhar, R.L.: Green competencies: insights and recommendations from a systematic
literature review. Benchmarking: An Int. J. 28(1), 66–105 (2021). https://doi.org/10.1108/BIJ-
11-2019-0489
7. Erhabor, N.I.: Developing leaders through mentoring in environmental education. Electr.
Green J. 1, 1–9 (2018). https://doi.org/10.5070/G314134454
25. Chen, Y., Shi, S., Chow, W.S.: Investigating users’ extrinsic motivation for green personal
computing. J. Comput. Inf. Syst. 56, 70–78 (2016). https://doi.org/10.1080/08874417.2015.
11645803
26. de Leeuw, A., Valois, P., Ajzen, I., Schmidt, P.: Using the theory of planned behavior to identify
key beliefs underlying pro-environmental behavior in high-school students: implications for
educational interventions. J. Environ. Psychol. 42, 128–138 (2015). https://doi.org/10.1016/
j.jenvp.2015.03.005
27. Mishra, D., Akman, I., Mishra, A.: Theory of reasoned action application for green information
technology acceptance. Comput. Hum. Behav. 36, 29–40 (2014). https://doi.org/10.1016/j.
chb.2014.03.030
28. Tongco, M.D.C.: Purposive sampling as a tool for informant selection. Ethnobotany Res.
Appl. 5, 147–158 (2007). https://doi.org/10.17348/era.5.0.147-158
29. Tredoux, C., Durrheim, K. (eds.): Numbers, Hypotheses & Conclusions: A Course in Statistics
for the Social Sciences. UCT Press, Cape Town, South Africa (2005)
Design and Implementation of an Automatic
Word Match Generator
1 Introduction
In this paper, we address a common problem that instructors frequently encounter. Instructors use word-matching interactives to teach new vocabulary to students. Typically, these word-matching interactives have had to be developed by hand.
We have developed more than sixty word-matching exercises. Each word-matching exercise is a representation of a word-matching interactive. These interactives are embedded in interactive eBooks as shown in [1–3]. The interactives in the eBooks have received good reviews [4, 5]. They help students learn and grasp key terms. Each of the word-matching interactives was programmed manually. Creating word-matching interactives requires programming skill and takes a lot of time and effort. To give instructors the ability to create word-matching exercises themselves, we created the Automatic Word Match Generator.
The Automatic Word Match Generator enables the user to enter key terms and their
descriptions and generates a Web page for a word-matching interactive as shown in
Fig. 1.
In the following sections, we will demonstrate word-matching interactives and the
use of the Automatic Word Match Generator. We will present the model for the Automatic
Word Match Generator and the design and implementation of the Automatic Word Match
Generator. Finally, we will discuss lessons learned from this project and future work.
2 Word-Matching Interactives
A word-matching interactive is a Web Page for students to learn key terms by matching
key terms with descriptions. Figure 2 shows an example of a word-matching interactive,
which can be viewed from https://liveexample.pearsoncmg.com/wordmatch/Section1_
2.html. Figure 3 shows the result after the user drags the key terms to match their
descriptions. A Congratulations dialog (see Fig. 4) is displayed when all key terms are
matched to their descriptions.
We designed a simple and intuitive user interface for an instructor to use. The first step in
developing the Automatic Word Match Generator was to create a method for generating
the static HTML, CSS, and JavaScript word-matching interactive template. The next
phase of developing the Automatic Word Match Generator was to save the generated
code to the internal server by clicking the Post button. The Automatic Word Match
Generator creates an HTML file to store the generated HTML code for the exercises and
then displays a View button.
The View button serves two purposes. First, it renders the HTML code for the exer-
cise. Second, it shows the URL for the exercise on the server. The instructor can give
this URL to the student.
To use the Automatic Word Match Generator, go to http://livelab.georgiasouthern.
edu/wordmatchgenerator as shown in Fig. 5.
Fig. 5. The initial screen for the automatic word match generator.
Now enter a title, Key Term 1, Description for Key Term 1, Key Term 2, and Descrip-
tion for Key Term 2. You can click the Add More button to create more entries for key
terms and their descriptions. For example, to create the word matching exercise in Fig. 2,
you can enter the following entries in Fig. 6.
Now click the Generate HTML button to display the generated HTML code for this
word matching exercise. The generated HTML is shown in Fig. 7.
Click the Post button to post the word match exercise to the server. Note that the
descriptions are randomly ordered. The Post button saves the generated HTML file for the
exercise on the server and creates a URL for the generated exercise. After the generated
HTML file is posted, a View button is displayed, as shown in Fig. 8. Clicking the View
button displays the exercise using the URL, as shown in Fig. 9. The instructor can give
the URL for this exercise to the student.
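For readers who want a feel for the generate-and-post workflow just described, the following minimal Python sketch illustrates the underlying idea of turning key terms and descriptions into a static HTML page with randomly ordered descriptions and saving it where a web server could expose it at a URL. It is an illustration only, not the authors' Java/Spring Boot implementation; the output directory, markup, and function names are assumptions.

# Illustrative sketch only: a minimal stand-in for the "Generate HTML" and
# "Post" steps. The real tool is a Java/Spring Boot web application; the
# output directory, markup, and styling here are hypothetical.
import random
from pathlib import Path

def generate_word_match_html(title, pairs):
    """Build a static HTML page from (key term, description) pairs,
    listing the descriptions in random order as the interactive does."""
    descriptions = [d for _, d in pairs]
    random.shuffle(descriptions)
    terms_html = "\n".join(f'    <li class="term">{t}</li>' for t, _ in pairs)
    descs_html = "\n".join(f'    <li class="description">{d}</li>' for d in descriptions)
    return f"""<!DOCTYPE html>
<html>
<head><title>{title}</title></head>
<body>
  <h1>{title}</h1>
  <ul id="terms">
{terms_html}
  </ul>
  <ul id="descriptions">
{descs_html}
  </ul>
  <!-- Drag-and-drop matching logic would be added here in JavaScript. -->
</body>
</html>
"""

def post_exercise(title, pairs, out_dir="exercises"):
    """Save the generated page so a web server can expose it at a URL."""
    Path(out_dir).mkdir(exist_ok=True)
    path = Path(out_dir) / f"{title.replace(' ', '_')}.html"
    path.write_text(generate_word_match_html(title, pairs), encoding="utf-8")
    return path  # the instructor would share the corresponding URL

# Example usage with made-up key terms and descriptions.
url_path = post_exercise("Section 1.2 Key Terms",
                         [("compiler", "translates source code to machine code"),
                          ("interpreter", "executes source code directly")])
print(url_path)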
Web Page uses HTML, CSS and JavaScript to create a word-matching interactive. Post
Web Page automatically posts the generated Web page on a web server so the page can
be viewed on the Internet.
The methods used in the custom WordMatchController class facilitate the following
operations:
After the response is returned, an instructor can give a student the ID associated with the word-matching interactive.
Before implementing our project, a survey of possible solutions was conducted. In this phase, various programming languages were evaluated. We decided to use Java for two reasons: it supports a set of frameworks for implementing web applications, and we had more experience using it. Using Spring Boot, we would be able to create an application that could be used to generate, serve, and store word-matching interactives as static HTML pages.
An overview of the steps used to develop the Automatic Word Match Generator is shown below:
1. Developed a JavaScript method to capture the values from HTML input elements.
2. Logged the input values to the JavaScript console as shown in Fig. 14.
3. Displayed the concatenated string from step two in a textarea element below the
instructor GUI as shown in Fig. 14.
4. Rendered the output from the textarea box in a separate window as shown in Fig. 15.
The specific classes used in the implementation of our microservice are shown in
Fig. 16, 17, 18, and 19.
Fig. 15. The console log output from the generated HTML function.
Fig. 16. An initial attempt at rendering the HTML output from the generated HTML
The primary purpose of the controller class shown in Fig. 17 is to provide URL routes for our application. The classes shown in Fig. 18 and Fig. 19 provide models used to transport data throughout the service. The View class shown in Fig. 19 was designed to keep track of the word-matching interactives saved on the server, and the WordMatch class shown in Fig. 18 was used as a model to transport the word-matching interactive; the model exists entirely on the server. The method saveJSP shown in Fig. 20 was used to save a word-matching interactive as a JSP (Java Server Pages) file on the server, and saveHTML was used to convert the JSP file to a static HTML file.
Fig. 17. The UML diagram for the custom WordMatchController class.
Fig. 18. The UML diagram for the custom WordMatch model class.
Fig. 19. The UML diagram for the custom view model class.
Fig. 20. The UML diagram for the custom WordMatchService class.
6 Lessons Learned
We created many word-matching exercises manually. It was time-consuming to create and maintain each exercise. Now we have this tool, and creating a word-matching exercise without writing any code is a simple process. In retrospect, we should have created this tool earlier to save hundreds of hours of writing word-matching exercises manually.
When we first designed the tool, we generated the HTML code and displayed the code in a text area. We expected the instructor to copy and paste the code, but found that this limited the adoption of the tool. So we added the Post button to save the generated HTML code to a server and create a URL for the instructor to access it directly without any extra work.
7 Future Work
At present, the generated exercises are not associated with a user. We plan to let instructors create accounts so that they can create and store exercises in a database. An instructor will be able to view all created exercises and delete them as well. With a user account, the keys and their descriptions for each exercise will be saved in the database and can be regenerated. The instructor will not need to re-enter the keys and descriptions if new functionality or a new user interface is added to the generated HTML file.
Another direction for future work is to create multiple word-matching exercises at once. This idea was proposed by an instructor who wishes to create an XML file that stores information for multiple exercises. For each exercise, the file specifies the title, key terms, and their descriptions. The Automatic Word Match Generator would take the information from the XML file and automatically generate an HTML file for each exercise specified in the XML file.
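As an illustration of this proposed batch feature, the sketch below shows one possible XML layout and how it might be parsed in Python; the element and attribute names are hypothetical and not a format the tool currently supports.

# Hypothetical sketch of the proposed XML batch input and how the generator
# might iterate over it; the element and attribute names are assumptions.
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<exercises>
  <exercise title="Section 1.2 Key Terms">
    <pair term="compiler" description="translates source code to machine code"/>
    <pair term="interpreter" description="executes source code directly"/>
  </exercise>
  <exercise title="Section 2.1 Key Terms">
    <pair term="variable" description="a named storage location"/>
    <pair term="constant" description="a value that cannot change"/>
  </exercise>
</exercises>
"""

root = ET.fromstring(SAMPLE_XML)
for exercise in root.findall("exercise"):
    title = exercise.get("title")
    pairs = [(p.get("term"), p.get("description"))
             for p in exercise.findall("pair")]
    # Each (title, pairs) tuple would then be fed to the HTML generator,
    # producing one word-matching page per exercise element.
    print(title, pairs)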
8 Conclusions
This paper presented a Web-based tool for automatically generating a word-matching interactive. Instructors can enter the terms and their descriptions to generate an HTML page and share the URL with students. The tool is freely available from http://livelab.georgiasouthern.edu/wordmatchgenerator.
We proposed a generic model for automatically generating web pages. The Automatic
Word Match Generator is a demonstration of a concrete implementation for this generic
model. We believe that many other web page generation projects can be implemented
using similar approaches. Our Automatic Word Match Generator project serves as a
stepping stone in the field of automatic programming for generating web pages.
The Automatic Word Match Generator removes the pain that instructors typically face when they have to create word-matching games. Before we implemented the Automatic Word Match Generator, all word-matching interactives had to be developed manually, and the process of manually creating these exercises consumed a great deal of instructors' time.
Our tool provides instructors with the ability to create word-matching interactives without having to write any code. The first iteration of our tool required instructors to at least copy and paste the generated code from the text area onto the server. The manual effort required to copy and paste resulted in poor adoption, so we added a Post button to save the generated HTML code. Once the content is saved onto the server, a URL is created for the instructor to access the exercise directly without any extra work.
The contribution from our research is a web-based tool that can automatically gener-
ate a word-matching interactive. Now instructors can enter their terms and descriptions
to create fun word-matching interactives which can be shared with students by sending
them a URL. The tool is freely available from http://livelab.georgiasouthern.edu/wor
dmatchgenerator.
References
1. Liang, Y.D.: REVEL™ for Introduction to Java Programming and Data Structures. Pearson
Education (2016). ISBN-13: 978–0134167008
2. Liang, Y.D.: REVEL™ for Introduction to C++ Programming and Data Structures. Pearson
Education (2018). ISBN-13: 978–0134669854
3. Liang, Y.D.: REVEL™ for Introduction to Python Programming and Data Structures. Pearson
Education (2018). ISBN-13: 978–0135187753
4. REVEL™ educator study observes homework and exam grades at University of Louisiana,
Spring (2016). http://www.pearsoned.com/results/revel-educator-study-observes-homework-
exam-grades-university-louisiana/. Accessed 16 May 2022
5. REVEL educator study assesses quiz, exam, and final course grades at Central Michigan University, Fall (2015). http://www.pearsoned.com/results/revel-educator-study-assesses-quiz-exam-final-course-grades-central-michigan-university/. Accessed 16 May 2022
6. Olsson, R.: Inductive functional programming using incremental program transformation.
Artif. Intell. 74(1), 55–81 (1995)
7. Wang, C., Feng, Y., Bodik, R., Dillig, I., Cheung, A., Ko, A.J.: Falx: synthesis-powered
visualization authoring. In: Proceedings of the 2021 CHI Conference on Human Factors in
Computing Systems, pp. 1–15 (2021)
8. Balzer, R.: A 15 year perspective on automatic programming. IEEE Trans. Softw. Eng. 11,
1257–1268 (1985)
9. Jazayeri, M.: Formal specification and automatic programming. In: Proceedings of the 2nd
International Conference on Software Engineering, pp. 293–296 (1976)
10. Whalen, M.W., Heimdahl, M.P.E.: An approach to automatic code generation for safety-
critical systems. In: Proceedings of the 14th IEEE International Conference on Automated
Software Engineering, pp. 315–318. IEEE (1999)
11. Sun, S.Y.: A translator description language tdl for specification languages. J. Inf. Process.
3(3) (1990)
12. Palshikar, G.K.: Applying formal specifications to real-world software development. IEEE
Softw. 18(6), 89–97 (2001)
The Impact of Feedback Modes on Learners’
Performance in Paragraph Writing
1 Introduction
Teacher feedback is an integral part of English as a second/foreign language (ESL/EFL)
writing classrooms [1]. However, although feedback has been acknowledged to be useful
for ESL/EFL learners, it may not be well understood and effectively utilized by learners
in revising their texts and improving their writing performance [2, 3]. Therefore, research
has called for enhancing how teachers formulate and give feedback [4–6]. In this regard,
there is a line of research that supports the role of digital technology in improving
teachers’ feedback-formulating and giving processes as teachers have become able to
formulate and provide feedback in different digital modes varying from text/written
comments to voice notes and even audio-visual formats (e.g. [7–11]).
Some previous studies exploring teacher feedback-giving through digital modes have focused on how these digital modes impact teachers' feedback types and patterns (e.g., [12]). However, these studies have not addressed how such digital modes affect learners' use of teacher feedback in improving their writing. Some studies have measured the
impact of two different digital modes (e.g. text and voice) (e.g. [13–16]) and text and
audio-visual [9, 17] on learners’ text revisions. A few studies have compared more than
two modes: text, voice, and audio-visual modes [8, 18], oral, text, voice, and screencast
modes [7]. They have reported contradictory findings. Since the purpose of using digital modes in feedback delivery is to enhance learners' uptake and writing performance, it is essential to examine how and to what extent learners take up feedback given in different digital modes and whether it enhances their writing performance [2, 8]. The author argues that replication research should be based on sound methodological and analytical practices to advance ESL/EFL theory and inform pedagogy. Therefore, this study aims to determine the effect of teacher feedback modes on EFL learners' text revisions and performance in paragraph writing. It addresses the question of which modes are more effective in enhancing learners' performance in paragraph writing.
2 Literature Review
There are different modes of teacher feedback. Starting with the traditional or non-digital
feedback modes, oral feedback is corrective and evaluative information orally given
by writing teachers on students’ written texts in face-to-face (FTF) classroom settings,
which might take the form of dialogue [19]. A few studies have compared oral feedback to
other feedback modalities. For instance, [20] reported that oral metalinguistic feedback
was more effective than written feedback in enhancing learners’ use of subject-verb
agreement in English. According to [21], oral feedback was more efficient than written
feedback on writing for Turkish EFL learners. A few other studies have highlighted
the potential of teacher feedback provided in the oral modality due to the occurrence
of teacher-learner dialogue around feedback [22, 23]. Yet, the efficacy of teacher oral
feedback has not been explored in comparison to other feedback modalities [19].
Because of the widespread application of technology in writing courses, teacher
feedback has been increasingly digital. It is formulated and provided through digital
tools such as Google Docs comments, voice records, and screencast capture records.
This has resulted in diversifying digital feedback modes from text to voice and audio-
visual modalities [2, 7, 12, 24, 25]. While text feedback is provided in the form of written
comments/notes inserted into learners’ written texts using online writing tools, such as
Google Docs, voice feedback is usually recorded through audio recording programs
and provided in voice notes on learners’ writing. In addition, audio-visual feedback
is recorded through screencast capture tools, thus making it multimodal feedback. It
consists of oral or voice comments and visual elements (e.g., mouse color effect, mouse
pointer, text display, etc.) [7, 12].
Concerning empirical research comparing these different digital modalities and their
effect on learners’ writing, some studies have compared two digital modes: text and voice
modes [13, 16] and text and audio-visual modes [9, 17, 26, 27]. The findings of the first
group of studies seem inconclusive. While one study supported the efficacy of voice
feedback in enhancing learners’ content and ideas, organization, and style [16], the
other study reported that feedback mode was not an essential factor affecting learners’
tasks [13]. For the second group of studies, some provided evidence of the effectiveness of the audio-visual feedback modality on learners' writing [9, 26, 27], whereas another study found that students' writing improved regardless of the feedback mode used. Results of some other studies comparing text and audio-visual feedback modes appeared mixed. In other words, audio-visual feedback was found to trigger a higher number of successful learner text revisions of macro-level errors such as content, organization, and structure [28], as well as appropriate vocabulary use [29], whereas text feedback was found more effective in leading to more revisions of micro-level errors, such as linguistic errors, vocabulary choice, and punctuation [28, 29].
A few recent studies have compared three digital feedback modalities: text, voice,
and audio-visual [8, 18] and even four modalities: oral, text, voice, and audio-visual [7].
The first two studies found that no digital feedback mode was more efficient than the
others in enhancing learners’ texts. On the other hand, the latter study reported that most
of the successful text revisions made by learners were elicited by audio-visual feedback.
In contrast, the least was elicited by text feedback. In general, some of the above studies
support the efficacy of audio-visual feedback due to its multimodal composition that
makes the information easier to understand and to use by learners to improve their
writing.
3 Method
3.1 Participants
The present study used a pretest-posttest design to measure the effectiveness of teacher
feedback modes on learners’ paragraph writing performance. It was conducted among
60 EFL undergraduates joining a writing course in a Saudi public university over five
weeks. The writing course introduces learners to paragraph writing of different genres:
descriptive, narrative, argumentative, comparison, and contrast. However, the present
study focused on narrative writing. The writing course instructor taught the course using
English as the medium of instruction and feedback. However, he had to shift to Arabic
in some cases to simplify the information.
Before the experiment, the students were randomly assigned to four groups of fifteen members each. The four groups were labeled according to the feedback mode conditions: oral feedback group (OFG), text feedback group (TFG), voice feedback group (VFG), and audio-visual feedback group (A-VFG). During the first week, the students were assigned a narrative paragraph writing task on the topic of an experience in my life. The writing task was initiated by writing the first draft (the 60 first drafts are referred to as the pretest in this study). After collecting the first drafts, the teacher read the drafts
and gave feedback using a different mode for each group of students (OFG-oral feedback,
TFG-text feedback, VFG-voice feedback, and A-VFG-audiovisual feedback) for four
weeks. The oral feedback was given in the classroom setting in the form of dialogue,
and it was also recorded using mobile audio records so that it could be later used as
data. However, the feedback in the latter three modes was provided by the instructor
using different digital media/tools: Blackboard Forum commenting box (Snapshot 1),
WhatsApp audio/voice records (Snapshot 2), and Bandicam screencast recorder software (Snapshot 3), respectively. In addition, the voice and audio-visual feedback records were
shared with the course WhatsApp group (Snapshots 2 & 4) as shown in Fig. 1.
During the last week, the students were requested to revise their first drafts of narrative writing based on the teacher's feedback. They were also asked to submit their final drafts at the end of the week (n = 60). The writing instructor read and checked these final drafts as the post-test in this study.
and final drafts were assessed and scored out of 20 marks based on the writing task rubric specified in the course. Then, these pretest and post-test scores in paragraph writing
were compared for each feedback group independently using descriptive (mean values)
and inferential statistics (a paired sample t-test) to determine the effect of each feedback
on learners’ paragraph writing. In addition, the mean values of the post-test scores in
paragraph writing of the four groups were compared against each other to determine
the significance level of differences between and within groups using one-way ANOVA.
To find out where these significant differences lay or the location of these statistically
significant differences, the Scheffe test, one of the ANOVA tests, was performed in this
study.
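To make the analysis sequence concrete, the following is a minimal, hypothetical Python sketch of the paired-samples t-test and one-way ANOVA described above; the score lists are placeholders rather than the study's data, and the Scheffe post hoc comparisons would require a dedicated post hoc routine that is not reproduced here.

# Minimal sketch of the analysis described above: a paired-samples t-test for
# each feedback group and a one-way ANOVA across the four groups' post-test
# scores. The score lists are illustrative placeholders, not the study data.
from scipy import stats

# Hypothetical (first draft, final draft) score lists per group.
groups = {
    "OFG":   ([12, 11, 13], [16, 17, 16]),
    "TFG":   ([12, 13, 11], [13, 12, 13]),
    "VFG":   ([13, 12, 14], [16, 15, 17]),
    "A-VFG": ([11, 10, 11], [17, 16, 17]),
}

# Paired-samples t-test: did each group improve from first to final draft?
for name, (first, final) in groups.items():
    t_stat, p_value = stats.ttest_rel(first, final)
    print(f"{name}: t = {t_stat:.3f}, p = {p_value:.3f}")

# One-way ANOVA: do the four groups differ on their final-draft (post-test) scores?
f_stat, p_anova = stats.f_oneway(*(final for _, final in groups.values()))
print(f"ANOVA on post-test scores: F = {f_stat:.3f}, p = {p_anova:.3f}")

Run per group, the paired test mirrors the comparisons reported in Table 1, and the ANOVA mirrors the between-groups comparison reported in Table 2.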
Table 1. Paired sample T-test of scores of first and final drafts for each group
Paired t-test
Group Drafts Mean Std. deviation t Sig. (2-tailed)
OFG First draft 11.8000 1.14642 −8.646 .000
Final draft 16.3333 1.83874
TFG First draft 12.1333 2.66905 −.592 .563
Final draft 12.7333 2.73774
VFG First draft 12.9333 2.68506 −4.641 .000
Final draft 16.0667 2.43389
A-VFG First draft 10.6667 2.09307 −16.568 .000
Final draft 16.6000 1.99284
As shown in Table 2, a one-way ANOVA was performed to compare these four feedback groups with respect to their final drafts, or post-test scores, in paragraph writing. Table 2 shows that the difference between groups (49.311) is higher than the difference within groups (1.05). In addition, the observed F for the writing post-test was 9.496, and the critical F has 3 degrees of freedom. The four groups differed in their performance on the writing post-test. In other words, overall, the difference among the four groups is statistically significant (p = .000 < .005), which suggests that the feedback treatment affected learners' paragraph writing performance.
To determine where this significant difference lies, that is, which feedback groups outperformed
the others in the post-test, a post-hoc analysis was performed using the Scheffe test, as shown
in Table 3. The results show significant differences between OFG and TFG (p = .001 < .005),
between VFG and TFG (p = .003 < .005), and between A-VFG and TFG (p = .000 < .005). This
suggests that the OFG, VFG, and A-VFG outperformed the TFG in the post-test paragraph writing.
In other words, the three feedback modes of oral, voice, and audio-visual feedback are more
effective than the text feedback mode in enhancing learners' performance in paragraph writing.
On the other hand, the differences between OFG and VFG (p = .991 > .005),
between OFG and A-VFG (p = .991 > .005), and between VFG and A-VFG (p = .938 > .005)
are not statistically significant. This suggests that no feedback group outperformed
its counterpart in any of these three pairwise comparisons (e.g., OFG compared to VFG).
These results also indicate that, among the oral, voice, and audio-visual feedback modes,
no mode is more effective than the others in improving learners' paragraph writing performance.
5 Conclusion
The present study addressed issues related to the effectiveness of teacher feedback on
students' paragraph writing. Specifically, it focused on enhancing feedback-giving practices
to maximize their usefulness to students in improving their writing. By exploring the effect
of four different feedback modes on learners' narrative writing, the study revealed that
the oral, voice, and audio-visual feedback modes are more effective than text feedback
in improving learners' writing performance.
Although the study contributes to earlier research on feedback-giving practices and
their effect on learners' writing, it has some limitations. The first is that a single
writing course instructor gave the feedback. Future studies should therefore examine the
effect of these feedback modes across different writing instructors to compare and better
understand how the effect varies. In addition, the study was limited to one writing course,
which focuses on paragraph writing. Future research may therefore examine the effect of
feedback modes in other writing courses, including essay writing courses, as well as across
different writing topics and genres, since feedback mode may not be the only factor affecting
learners' writing performance. Finally, students' views on feedback modes, as receivers of
feedback, should be considered to enrich and support these results.
Metasearch: A Web-Based Application
to Perform Systematic Reviews
Rafael Santos Crema(B) , Guilherme Nunes Nogueira Neto, and Percy Nohama
Abstract. This article presents some of the available features dedicated to per-
forming systematic reviews, their limitations and importance, followed by a novel
tool for a scientific article search engine, fully integrated into a robust system to
perform systematic reviews. It has helpful and desirable features such as Database
Integration; Duplicate Removal; Collaboration and Reviewers; Validation Pro-
cess; Automated Criteria Creation; and Cost. This tool, called Metasearch, is a
web-based application focused on performing systematic reviews, with automatic
search and metadata retrieval from databases, duplicate removal in one click,
quick application of complex exclusion criteria rules, filtering on many metadata
fields or tags, and work validation by third-party reviewers.
The tool was developed following the scientific method of performing system-
atic reviews, always focusing on saving time and helping researchers with smart
tools and a friendly interface, providing an integrated set of tools to reach these
objectives.
1 Introduction
General-purpose internet search engines have a negative influence on the search, retrieval,
storage, and reporting stages of systematic reviews [1–6]. Their use offers little guidance
for making internet searches reproducible and fails to identify results without introducing
bias, bringing more harm than benefit [7]. Considering this, specialized search engines
(such as Rayyan [8]) or applications (such as Publish or Perish [9]) are reliable and safe
alternatives for searching and retrieving data for systematic reviews. However, even among
the available options, no tool offers a desirable, fully integrated set of features including
article metadata retrieval and collaboration and sharing tools.
In this article, we present some of the available tools, their limitations and importance;
the methods used to develop a novel scientific article search engine, fully integrated into
a robust system for performing systematic reviews and offering helpful and desirable features;
and, finally, the conclusions and benefits reached by developing it.
that can automatically create exclusion criteria based on search rules, a feature we call
Automated Criteria Creation.
Finally, we considered Cost an important characteristic because, depending on its cost,
a tool can be used by more or fewer researchers.
After selecting and considering the main features, we analysed all the cited tools with
respect to them, presenting the results in Table 1.
As the table shows, none of the evaluated tools offers all the selected features together.
Some features, such as Automated Criteria Creation and the Validation Process, are not available
in any tool, even though they are very important for reducing the time spent on creating
exclusion criteria and for guaranteeing a transparent evaluation of the work by reviewers.
Providing this set of features together helps with the exhausting and painful work required to
perform systematic reviews, leaving more time for article reading, criteria development, and
search validation. This paper aims to present a complete solution that combines all the listed features.
3 Metasearch Tool
The Metasearch tool is a web-based application focused on developing systematic reviews,
reducing the time spent on searching article databases, sorting articles with exclusion
criteria, and validating the work. It also helps researchers collaborate and share their work.
The motivation for developing the tool came from the authors' own need to perform systematic
reviews in a faster and more traceable way. Considering that more than three-quarters of all
studies used in systematic reviews are found in electronic databases [16–20], and that Boolean
logic is already used in systematic reviews because it allows complex query formulation [21],
the main idea was to build the tool around these capabilities, providing together the previously
selected features: database integration, duplicate removal, collaboration and reviewers,
validation process, automated criteria creation, and cost.
4 Tool Development
The development started by designing a workflow based on how systematic reviews are
performed, including all stages. To better understand the process, the PRISMA statement
was used as a reference, with its steps of Identification, Screening, Eligibility, and
Included [22].
After understanding the PRISMA flow, we created a flow diagram for the tool, describing
the main process it should support (see Fig. 1).
The first three steps of the diagram inside the Metasearch lane address the Identification
stage. The flow starts with "Perform Search on Databases with Boolean Logic", which lets the
user run searches in different databases directly from the tool. The tool is integrated with
these databases and will "List Search Results", showing each database and the number of
articles it retrieved for the query. The user then proceeds to "Select databases to include in
the Systematic Review", choosing which of the retrieved sets to keep, and the system will
"Import Articles Metadata", using the database integration to import all metadata from the
retrieved articles; this is possible because each database exposes a "List Articles with
Metadata" operation on its integration interface.
After these first steps, the user has a systematic review started inside the tool and moves to
the fourth main step, "View Results". This step lists all imported articles with their metadata
and offers filters to locate specific articles. From here the user can perform three main
actions: Create Exclusion Criteria, Fill Missing Article Metadata, and Create Article Tags.
Create Article Tags means that the tool provides a tag system in which the user can create,
assign, and remove tags for each individual article, allowing articles to be categorized and
supporting the Screening step of PRISMA. Fill Missing Article Metadata refers to supplying
metadata that is missing for an article, which is necessary because the database integration
interface sometimes cannot retrieve everything. Finally, Create Exclusion Criteria is the
ability to create the planned exclusion criteria for the systematic review inside the tool,
automatically marking the matching articles as excluded. This can be done in two ways: a
regular manual one, in which the user selects the articles for an exclusion criterion, and a
smart, automatic one that excludes articles based on rules, for example articles containing a
specific word in the title or abstract, or articles with a publication date older than a
given year.
Ending the flow, there are two final steps, Export and Share Results, in which the tool
provides means to export the article list with each article's status (included or excluded)
and to share the results with others.
To implement the proposed workflow, a web-based technical approach was adopted: the tool was
developed in the PHP programming language, served by the Nginx web server running on a Linux
server, with MySQL as the database management system. Furthermore, as a tool for academic
purposes, all the technology selected for its development consists of open-source solutions,
avoiding unnecessary costs with software licences.
5 Results
After development, the tool was released as a beta version covering all the proposed workflow
stages. This first beta version made it possible to create systematic reviews with the developed
tool, benefiting from a fully integrated set of features and internal tools, following the
workflow explained next.
5.2 Sorting
Once the articles are imported into the database, the sorting stage can be done inside the
Metasearch tool. First, the articles are listed with Database, Year, and Title information.
The list also allows creating tags for each article, suggesting already used tags as the user
starts typing; these tags can be used to create exclusion criteria, as mentioned below, and to
filter the article listing. Other features are: Article Status, which shows whether the article
is still included in the systematic review or has already been excluded by one of the exclusion
criteria; Alerts, which flag possibly missing article metadata; and an Edit option, where the
missing data flagged by Alerts can be filled in to help the researcher during the sorting stage.
The sorting of articles is performed by creating exclusion criteria and can use the tool
features developed for this task. The first is the Remove Duplicates function, which looks
for articles with the same DOI and removes the duplicates. The choice of which copy to exclude
is based on a scientific relevance ranking of the databases provided by the tool: the duplicate
coming from the lower-ranked database is the one excluded.
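As an illustration only (Metasearch itself is a PHP/MySQL application, and its source is not reproduced here), the following Python sketch shows the kind of logic the Remove Duplicates step describes: grouping imported articles by DOI and keeping the copy from the better-ranked database. The data structures and the `db_rank` ordering are assumptions.

```python
def remove_duplicates(articles, db_rank):
    """articles: list of dicts with at least "doi" and "database" keys.
    db_rank: dict mapping database name -> relevance rank (lower = more important)."""
    kept = {}        # DOI -> article chosen so far
    no_doi = []      # articles without a DOI are never merged
    for article in articles:
        doi = (article.get("doi") or "").strip().lower()
        if not doi:
            no_doi.append(article)
            continue
        current = kept.get(doi)
        if current is None or (db_rank.get(article["database"], 99)
                               < db_rank.get(current["database"], 99)):
            kept[doi] = article  # keep the copy from the better-ranked database
    return list(kept.values()) + no_doi

# Example with a hypothetical ranking: copies from the first-ranked database win.
# deduped = remove_duplicates(imported_articles, {"Scopus": 1, "Google Scholar": 2})
```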
Custom exclusion criteria can be created using two types of features, Manual and
Smart Criteria. The Manual option allows the user to select all the articles that were not
excluded yet and which ones they want to remove from the review, creating exclusion
criteria just like a regular sorting.
On the other hand, using the Smart Criteria function, the user can create smart filters
to sort all the remaining articles using countless combinations of fields and rules (see
Fig. 3). The filters can be created for the fields Title, Abstract, Authors, Keywords, Year,
Tags, and Content Type, using operators such as CONTAIN, NOT CONTAIN, START
WITH, NOT START WITH, END WITH, NOT END WITH, IS and IS NOT, also
GREATER THAN, LOWER THAN, EQUAL TO for the Year field. Another detail about
the filters is that they can be applied if ALL or ANY of the conditions are TRUE/FALSE,
allowing the user to create Smart Criteria with many possibilities. After the articles are
filtered using all the conditions created by the user, the smart exclusion criterion is created
automatically with one click, saving the considerable time otherwise spent selecting articles
by hand.
Smart Criteria can also be created through two shortcuts on the main systematic review screen:
the user can create a criterion for a specific Tag or Content Type directly with just one click.
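A minimal sketch of how such a Smart Criterion could be evaluated is shown below. The operator names mirror those listed above (CONTAIN, START WITH, IS, GREATER THAN, and their negations, combined with ALL/ANY matching), but the data structures are assumptions made for illustration, not Metasearch internals.

```python
OPERATORS = {
    "CONTAIN":      lambda field, value: value.lower() in str(field).lower(),
    "START WITH":   lambda field, value: str(field).lower().startswith(value.lower()),
    "END WITH":     lambda field, value: str(field).lower().endswith(value.lower()),
    "IS":           lambda field, value: str(field).lower() == value.lower(),
    "GREATER THAN": lambda field, value: int(field) > int(value),
    "LOWER THAN":   lambda field, value: int(field) < int(value),
    "EQUAL TO":     lambda field, value: int(field) == int(value),
}

def condition_holds(article, condition):
    """condition: (field, operator, value, negated), e.g. ("Title", "CONTAIN", "survey", False);
    negated=True covers the NOT CONTAIN / NOT START WITH / IS NOT variants."""
    field, operator, value, negated = condition
    result = OPERATORS[operator](article.get(field, ""), value)
    return not result if negated else result

def apply_smart_criterion(articles, conditions, mode="ALL", expected=True):
    """Mark still-included articles as excluded when ALL or ANY condition equals `expected`."""
    combine = all if mode == "ALL" else any
    for article in articles:
        if article.get("excluded"):
            continue
        if combine(condition_holds(article, c) == expected for c in conditions):
            article["excluded"] = True
    return articles

# Example: exclude articles whose Title contains "survey" OR whose Year is below 2015.
# apply_smart_criterion(articles,
#                       [("Title", "CONTAIN", "survey", False),
#                        ("Year", "LOWER THAN", "2015", False)],
#                       mode="ANY")
```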
After all criteria are created and the undesirable articles excluded from the systematic
review, everything can be checked on the main screen (see Fig. 4). It presents the total
number of articles retrieved, excluded, and included, together with the exclusion criteria
list ordered by creation sequence and the number of exclusions produced by each criterion.
Completing the main screen functionality, there is a feature to activate or deactivate
exclusion criteria, which is helpful for understanding and testing the criteria against an
easy, aggregated view of the results.
5.3 Collaboration
To support systematic reviews with multiple authors as well as validation by reviewers,
Metasearch was designed to let the systematic review owner/author grant other users access
under three possible roles: Owner/Author, who collaborates with the same access rights as the
original owner/author; Reviewer, who can leave comments on criteria, article metadata, or the
review itself; and Viewer, who can view the work done without performing any action.
Fig. 4. Review main screen with main information regarding the systematic review
6 Discussion
The advantages of the Metasearch tool became apparent after the beta version was released and
the first systematic review was performed as a test. It was clearly helpful to have the article
databases directly integrated into the system, which allowed not only running the planned query
for the review but also trying different queries with Boolean logic, something that helps the
researcher find the best keyword match while planning the systematic review. The database
integration also saved a great amount of time, retrieving and importing the metadata of almost
a thousand articles in less than half an hour.
The Duplicate Removal feature was very helpful, finding duplicate articles based on DOI, but
it also pointed to a possible future improvement of the Metasearch tool: detecting duplicates
not only by DOI number but also by identical titles, or by means of a similarity index that
can handle cases where an article receives small updates between different publishers.
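One way the suggested similarity index could look is sketched below, purely for illustration; the normalisation and the 0.9 threshold are assumptions, not values from the published tool.

```python
from difflib import SequenceMatcher

def title_similarity(title_a, title_b):
    """Rough similarity score in [0, 1] between two normalised titles."""
    normalise = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, normalise(title_a), normalise(title_b)).ratio()

def likely_duplicates(articles, threshold=0.9):
    """Flag article pairs whose DOIs differ but whose titles are nearly identical."""
    pairs = []
    for i, first in enumerate(articles):
        for second in articles[i + 1:]:
            if first.get("doi") != second.get("doi") and \
               title_similarity(first["title"], second["title"]) >= threshold:
                pairs.append((first, second))
    return pairs
```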
Regarding Collaboration and Reviewers, the tool successfully allowed collaboration between
multiple authors, who can work together on the same review, sharing tag categorization and
comments and reviewing each other's work. In addition, reviewers such as a supervising
professor can follow the progress of the systematic review and share comments with the author.
One of the most interesting features turned out to be the Validation Process: the tool user can
not only export all the metadata and entered data to a CSV file but also, more usefully, create
a password-protected share link through which paper reviewers, or anyone else the author
chooses, can access the systematic review on a simple timeline screen. It presents all articles
and their status (included or excluded) and all exclusion criteria, each with its excluded
articles and, in the case of smart criteria, the rule used to filter, which is very helpful for
sharing the work done.
Automated criteria creation proved to be a great ally for researchers when applying the planned
exclusion criteria to the systematic review, making the process fast with manual criteria and
even faster with smart criteria, given the countless combinations of filters possible.
Finally, as a free beta version built only on open-source technologies, the tool has a very low
cost to keep online and in use.
After some tests and validations, it became clear that the tool can also be used in a second
way: as a simple search engine. This use is focused on prospecting information in a given area
or subarea. Unlike regular search engines, however, the search can be run over many databases
simultaneously, and the smart criteria can be used to filter the results and find the desired
articles.
This work has some limitations: the current version is a beta and is not open to the public;
not all available databases are integrated, only those mentioned before; and the tool currently
works only in English, without support for other languages.
Future work will include integrating new databases so that a comprehensive set of journals can
be used in systematic reviews, which will also allow researchers from other areas of science to
use the platform. In the same vein, importing articles from the user's local files in different
formats, such as CSV and XLSX, will be helpful. Another possible development, already mentioned,
is improving the Duplicate Removal tool so that it works with better accuracy. Finally, one
future work already being planned focuses on comparing the performance of Metasearch with other
tools and with performing a review without any tool.
7 Conclusion
The Metasearch tool was developed following the scientific method of performing systematic
reviews, always focusing on saving time and helping researchers with smart tools and a friendly
interface. The original aspect of this work is the provision of integrated features for
developing systematic reviews or for regular article searches. This was made possible by
combining database integration, duplicate removal, collaboration and reviewers, a validation
process, automated criteria creation, and low cost, innovating specifically with the validation
of the systematic review by reviewers through a timeline feature and with automatic criteria
creation using smart filters. The described solution therefore provides the desired tool, with
all its features, in an integrated way.
References
1. Adams, J., Hillier-Brown, F.C., Moore, H.J., et al.: Searching and synthesising ‘grey literature’
and ‘grey information’ in public health: critical reflections on three case studies. Syst Rev. 5,
164 (2016). https://doi.org/10.1186/s13643-016-0337-y
2. Stansfield, C., Dickson, K., Bangpan, M.: Exploring issues in the conduct of website searching
and other online sources for systematic reviews: how can we be systematic? Syst Rev. 5, 191
(2016). https://doi.org/10.1186/s13643-016-0371-9
3. Mahood, Q., Van Eerd, D., Irvin, E.: Searching for grey literature for systematic reviews:
challenges and benefits. Res Synth Methods. 5, 221–34 (2014). https://doi.org/10.1002/jrsm.
1106
4. Briscoe, S.: A review of the reporting of web searching to identify studies for Cochrane sys-
tematic reviews. Res Synth Methods (2017). https://doi.org/10.1002/jrsm.1275. Epub ahead
of print
5. Briscoe, S.: Web searching for systematic reviews: a case study of reporting standards in the
UK health technology assessment programme. BMC Res. Notes 8, 153 (2015). https://doi.
org/10.1186/s13104-015-1079-y
6. Cooper, C., Booth, A., Britten, N., Garside, R.: A comparison of results of empirical studies of
supplementary search techniques and recommendations in review methodology handbooks:
a methodological review. Syst Review 6(1), 234 (2017). https://doi.org/10.1186/s13643-017-
0625-1
7. Ćurković, M., Košec, A.: Bubble effect: including internet search engines in systematic
reviews introduces selection bias and impedes scientific reproducibility. BMC Medical
Research Methodology 18, 130 (2018). https://doi.org/10.1186/s12874-018-0599-2
8. Rayyan Intelligent Systematic Review Homepage. https://www.rayyan.ai/. Accessed 24 Apr 2022
9. Rawat, S., Meena, S.: Publish or perish: where are we heading? J. Res. Med. Sci. 19(2), 87–89
(2014)
10. Software tools to support your systematic review processes, IFIS – Food and Health Informa-
tion https://www.ifis.org/en/research-skills-blog/software-tools-to-support-your-systematic-
review-processes Accessed 15 June 2022
11. Mendeley Homepage https://www.mendeley.com/ Accessed 15 June 2022
12. EndNote Homepage https://endnote.com Accessed 15 June 2022
13. Bullers, K., Howard, Hanson, A., Kearns, A., Orriola, B., Polo, J., Sakmar, K.A.: It takes
longer than you think: Librarian time spent on systematic review tasks. J. Medical Library
Association. 106(2), 198 (2018). https://doi.org/10.5195/JMLA.2018.323
14. Qi, X., Yang, M., Ren, W., et al.: Find duplicates among the PubMed, EMBASE, and cochrane
library databases in systematic review. PLoS One. 8(8), e71838 (2013). Published 2013 Aug
20. https://doi.org/10.1371/journal.pone.0071838
15. Shokraneh, F.: Reproducibility and replicability of systematic reviews. World J. Meta-
Analysis 7(3), 66–76 (2019). https://doi.org/10.13105/wjma.v7.i3.66
16. Royle, P., Waugh, N.: Literature searching for clinical and cost-effectiveness studies used
in health technology assessment reports carried out for the national institute for clinical
excellence appraisal system. Health Technol. Assess 7, 1–64 (2003)
17. Wallace, S., et al.: After MEDLINE? Dividend from other potential sources of randomised
controlled trials [abstract]. In: 2nd International Conference, Scientific Basis of Health
Services & 5th Annual Cochrane Colloquium, Amsterdam (1997)
18. Jadad, A.R., McQuay, H.J.: A high-yield strategy to identify randomized controlled trials for
systematic reviews. Online J. Current Clinical Trials (1993). Doc No 33:3973
19. Farriol, M., Jorda-Olives, M., Padro, J.B.: Bibliographic information retrieval in the field of
artificial nutrition. Clin. Nutr. 17, 217–222 (1998)
20. Suarez-Almazor, M.E., Belseck, E., Homik, J., Dorgan, M., Ramos-Remus, C.: Identifying
clinical trials in the medical literature with electronic databases: MEDLINE alone is not
enough. Control Clin. Trials 21, 476–487 (2000)
21. Sampson, M., et al.: Can electronic search engines optimize screening of search results in
systematic reviews: an empirical study. BMC Medical Res. Methodology 6(1), 1–8 (2006)
22. PRISMA Homepage https://prisma-statement.org/ Accessed 15 June 2022
Preliminary Study on e-Collaboration Readiness
and Community of Inquiry Presences
in a Higher Educational Institution
Abstract. Large class sizes in Higher Educational Institutions (HEI), the recent
COVID-19 pandemic, and, more importantly, the fact that lecturer-student relationships
mostly end right after the class session have confronted educators with many new
challenges. In response, educators have found it
imperative to change the pedagogical and didactical approaches to teaching by
integrating Information and Communication Technologies (ICT) into the class-
room. e-Collaboration is one of the pedagogical approaches that enable two or
more people to work together using technology to help achieve a goal. This study
has introduced students to e-collaboration platforms via Learning Management
System (LMS) and Piazza. The present research focuses on the experience of and readiness
for e-collaboration in a HEI. Both qualitative and quantitative approaches were employed
in the study. Results indicated that the majority of participants have a positive attitude
towards e-collaboration, that their attitude varies significantly with gender, and that
there are positive correlations among the Community of Inquiry (CoI) constructs (up to
r = 0.75, n = 75, p = 0.01). In addition, the majority of participants would like to use
e-collaboration in the future (M = 3.95), and both males and females have positive attitudes
towards e-collaboration (M = 3.82, SD = 0.74). The research brings to light the usefulness
and the possibilities of e-collaboration for effective teaching and learning in HEI.
1 Introduction
1.1 Background
The emerging concept of the digital native or student-K [1–5] and the continuing use of
the concept have awakened researchers to the nature of students in Higher Educational
Institutions (HEI). Prensky [6] coined the term digital native in 2001. The term has
since then gained popularity among researchers and educators. Other concepts used as
synonyms include Generation-Y by Stanat [7], and net-generation by Tapscott [8].
Current students are described as digital natives or student-K because they are a generation
born into an era proliferated with smartphones, computers, mobile digital devices, video games,
cell phones, laptops, and video cameras. These electronic devices, among others, have become
part of their lives, and they enjoy using them for more hours than they spend sitting in a
lecture theatre. The ubiquitous nature of these devices has therefore given today's college
students different ways of thinking and processing information compared to their predecessors
[9]. According to Prensky [2, 10], digital natives are fast at receiving information, prefer
graphics over text-based content, and favour multitasking, networking, parallel processing,
and frequent rewards.
The nature of current students and their enthusiasm for adopting and using technology therefore
require educators to change their teaching methodology, the design of learning content, and the
integration of appropriate technologies into teaching and learning activities. According to
Popescu and Cioiu [11], traditional teaching methods should be adapted to accommodate the
learning needs of the new generation of digital native students by integrating Web 2.0 tools to
support social learning in educational settings. Thus, e-collaborative teaching and learning
approaches pave the way for students to learn socially, collaboratively, and in a more engaging
manner.
With the advent of Information and Communication Technology (ICT) and advances in social
software, educators have started integrating technology into the classroom. ICT, including
social software, has opened up opportunities for students and lecturers without the limits of
time, location, and space. More importantly, students can change the way they interact with
communities of inquiry in online environments. ICT can improve the quality of collaboration
and facilitate social interaction between teachers and students. As a result, there has been a
move towards using technology to motivate people to develop interactions and connections with
individuals. ICT has also opened opportunities for experimenting with various teaching and
learning methods online.
e-Collaboration is one of the pedagogical approaches that enable two or more people to work
together using technology to help achieve a goal [12–14]. It is therefore an extension of the
regular class online, where both students and lecturers can have in-depth discussions that may
have eluded them during the regular class sessions. Accordingly, the e-collaboration effort has
risen as one of the most encouraging approaches to improving learning. e-Collaboration is an
environment in which students assume a key role in the learning process. It provides an
interactive, simulation-based, innovative, and comprehensive learning experience. In addition,
e-collaboration improves the effectiveness and performance of the learning experience [15, 16].
The purpose of e-collaboration practices is to use social technologies to promote discussion
and communication among groups and peers, especially in higher educational institutions. It
prepares students for the demands of today's global industry, where staff participating in
group projects are geographically dispersed regardless of time and space. Currently, there is
an assortment of tools (such as PIAZZA, Facebook,
2 Theoretical Foundation
2.1 e-Collaboration Teaching and Learning
e-Collaboration is an educational approach that engages students, or students and teachers
together, in a joint academic effort online using electronic devices [17, 18]. It is a teaching
method in which groups of students collaborate, with the aid of electronic devices, to solve a
problem, complete an assignment, or produce a product.
In a community of inquiry, the key to e-collaboration is the principle of social dependence,
in which participants communicate freely and contribute to the achievement of goals [18–20],
thereby shifting the teaching model from a teacher-centered to a student-centered one. In the
teacher-centered learning model, learners play a passive role once knowledge has been
communicated to them. e-Collaboration, by contrast, encourages greater success across all age
groups and subject areas than individualistic teacher-centered learning and enables students
and their lecturers to collaborate more intensively on topics during or after class sessions.
Compared to the conventional learning approach, an e-collaboration learning environment
incorporates a social constructivist teaching and learning approach, self-accountability, and
personal responsibility. More importantly, e-collaborative teaching and learning is more
beneficial than traditional (teacher-centered) learning and has a clear impact on the success
and performance of students inside and outside the lecture halls. e-Collaboration not only
affects students' performance but also develops skills such as learning and communication,
interacting with others, and working effectively in a group [15]. Group participants use a
variety of techniques to solve problems and to understand the demands of tasks, which increases
retention for all members of the group. "Students on the set can recognize their abilities,
strengths, and weaknesses when performing the required tasks" [21]. In this study, we
investigated the benefits of and readiness for e-collaboration in a HEI by experimenting with
the didactics in six courses.
A model of the online collaborative learning process is the Community of Inquiry (CoI) model,
originally developed by [29]. The CoI model is a social constructivist model of learning
processes in online and blended environments which emphasizes that effective online learning,
especially in higher education, requires a community of inquiry [30]. The CoI model is a complex
model of key elements that are essential for promoting social development and research in all
educational institutions. Since its development, the CoI model has introduced key research
methods and guidelines for online learning. Overall, it has become the most widely referenced
and leading theory for the study and design of effective e-learning.
According to Garrison (2016), in order to create a community of learners, computer-based
conferencing should incorporate text-based, asynchronous discussions that connect learners to
one another. This differs from the traditional, individualistic distance teaching and learning
process.
As shown in Fig. 1, the three key elements of the Community of Inquiry are Cognitive Presence
(CP), Social Presence (SP), and Teaching Presence (TP). Figure 1 also illustrates how the
intersection of the three elements of the CoI model leads to deep and meaningful learning
outcomes. TP is defined as "the design, facilitation, and direction of cognitive and social
processes to realize personally meaningful and educationally worthwhile learning outcomes"
[31]. In online learning settings, TP involves (1) the instructional design and organization of
the course and activities, (2) the facilitation of the course and activities, and (3)
facilitating or directing discussions to achieve the desired learning outcomes [18].
CP is the extent to which learners are able to construct and confirm meaning through
sustained reflection and discourse. The ultimate goal of the Community of Inquiry is to
build a solid foundation of social presence and teaching presence to stimulate cognitive
presence in a course. It is the ability of collaborators to construct knowledge through
3 Methodology
3.1 Population
The target population for this study comprises University of Cape Coast, Department of Computer
Science and Information Technology (UCC-DCSIT) students from levels 200–400 enrolled in the
following courses: Web Technologies I, Multimedia Computing, Software Development Practices,
Web Technologies II, Human Computer Interface, and System Security and Administration. The
research was a purposive study of students who registered for courses in DCSIT. Each student
was enrolled in at least two courses, giving a total of N = 75 participants.
3.2 Instrumentation
UCC-LMS was used for communication and posting of teaching and learning materials. Piazza was
used for posting teaching and learning materials, group formation, and e-collaboration. In
addition, a questionnaire was designed to collect data for the purposes of this study. The
questionnaire consists of four parts: the first covered respondent demographic information
(6 items); the second contained general attitudinal questions (12 items); the third was the
standardized CoI questionnaire by [35] (34 items); and the fourth was an open question on the
effect of using e-learning in higher education (1 item). Overall, the questionnaire for the
study comprised 52 items. The reliability coefficient of the items is α = 0.958, which exceeds
0.70, indicating that the questionnaire used in the study is reliable.
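For completeness, a minimal sketch of how Cronbach's alpha, the reliability coefficient reported above, can be computed from a respondents-by-items matrix of Likert scores follows. The input matrix is not part of the paper, so its shape and contents are assumptions.

```python
import numpy as np

def cronbach_alpha(responses):
    """responses: 2-D array of shape (n_respondents, n_items) with Likert scores."""
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    item_variances = responses.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_variance = responses.sum(axis=1).var(ddof=1)     # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_variances / total_variance)

# The study reports alpha = 0.958 for the questionnaire items.
```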
3.3 Procedure
At the beginning of the first and second semesters of the 2019–2020 academic year, students
were introduced to the concepts of e-collaboration, the course outline, textbooks, and the
tools (UCC-LMS and PIAZZA) in the first week. In addition, teaching and learning materials and
activities were posted on both UCC-LMS and PIAZZA. This was followed in the second week by
forming collaboration groups of 3–5 members. From the third week onwards, on a weekly basis,
collaboration activities in the form of problem-based tasks and questions were posted and
moderated by the facilitator on the Piazza system. The system notified students whenever
something was posted for participation. At the end of six weeks of e-collaboration, the
research instrument (Appendix I) was set up as a Google Form and students were invited to
participate in the survey. The following presents the data analysis criteria.
Responses from the study were coded and analyzed using SPSS 16.0. First, a reliability analysis
was performed, followed by descriptive statistics such as mean, standard deviation, and
frequency. Finally, we produced scatterplots and computed the Pearson product-moment
correlation coefficient. Results from the analysis are presented and discussed below.
• Save time and help students learn using the power of community
• Wiki style format enables collaboration in a single space
• Features LaTeX editor, highlighted syntax and code blocking
• Questions and posts needing immediate action are highlighted
• Instructors endorse answers to keep the class on track
• Anonymous posting encourages every student to participate
• Highly customizable online polls
• Integrates with every major LMS
Figure 3 presents the interface of the CSC312 class. It shows the class at a glance, with
27 total posts and 317 total contributions.
will increase the flexibility to learn inside and outside the classroom" (M = 4.08; SD =
.928). The majority also believe that "implementing and using piazza as part of teaching and
learning tools will make the educational process easier and more enjoyable" (M = 4.00;
SD = .944). The statement "I would like my lecturer to integrate piazza in my class in addition
to face-to-face meetings in the class" scored M = 3.93 (SD = 1.018). The overall mean score and
standard deviation were M = 3.82 and SD = 0.739, respectively. This demonstrates that the
majority of participants in the study from DCSIT-UCC have an overall positive attitude towards
e-collaboration and that their attitudes vary significantly with gender. These findings help to
answer Research Question (RQ1), "What are students' attitudes toward e-collaboration through
piazza in addition to face-to-face meetings in the class?".
The scatterplot in Fig. 8, and the correlation results in Fig. 9, respectively illustrate the
relationship results.
Scatterplot Results
In order to explore the relationship between two continuous variables and to know whether they
are linearly or curvilinearly related, it is important to generate a scatterplot before
calculating correlations [36, 38]. This is because only linearly related variables qualify for
correlation analysis.
Correlation Results
Since the scatterplot does not give a definite answer, we followed it up with the Pearson
product-moment correlation coefficient; the output of the analysis is shown in Fig. 9. A
correlation is statistically significant if its "Sig. (2-tailed)" value is below 0.05. The
results in Fig. 9 indicate a significant positive association among all the constructs. The
highest correlation was between cognitive presence and teaching presence (r = 0.745, n = 75,
p = 0.01), while the lowest was between cognitive presence and overall attitude (r = 0.601,
n = 75, p = 0.01). The scatterplot in Fig. 8 further summarizes the results. In conclusion,
there was a strong, positive correlation among all the constructs, and an increase in one
construct is associated with an increase in the others, demonstrating strong relationships
among them. The correlation results therefore answer research question (RQ4), "What are the
relationships among the key constructs of the instrument?".
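The same pairwise correlations (computed with SPSS in the study) could be reproduced along the following lines; the DataFrame and the construct column names (TP, SP, CP, Attitude) are assumptions about how each participant's mean scale scores might be stored.

```python
import pandas as pd
from scipy import stats

def construct_correlations(df, columns=("TP", "SP", "CP", "Attitude")):
    """Pearson product-moment correlations between every pair of construct scores."""
    for i, first in enumerate(columns):
        for second in columns[i + 1:]:
            r, p = stats.pearsonr(df[first], df[second])
            print(f"{first} vs {second}: r = {r:.3f}, p = {p:.3f} (n = {len(df)})")
    # A scatter matrix provides the visual linearity check recommended above.
    pd.plotting.scatter_matrix(df[list(columns)])
```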
In addition, Table 1 presents both positive and negative recommendations for future
use of e-collaboration through Piazza and other forms of social media.
Table 1. Recommendation for Future use of e-Collaboration through LMS and Piazza
The results from the open-ended question in Table 1 therefore support the quantitative results
in Fig. 9. This affirms an answer to RQ3, "What recommendations do students have for future use
of e-collaboration?".
5.3 Limitations
As we proceeded with this research, some challenges encountered were technological
in nature. Some students encountered slow response on the platform, making it difficult
to engage effectively.
In addition, the sample population for this research was restricted to the Department of
Computer Science and Information Technology, University of Cape Coast. A larger sample
population would ultimately have increased the number of responses we received.
Finally, internet connectivity was an issue for some students, which reduced or prevented
their participation on the platform.
References
1. Prensky, M.: Digital natives, digital immigrants part 1. On the horizon 9(5), 1–6 (2001)
2. Prensky, M.: Don’t Bother Me, Mom, I’m Learning!: How Computer and Video Games are
Preparing Your Kids for 21st Century Success and how You Can Help! Paragon House, New
York (2006)
3. Bennett, S., Maton, K., Kervin, L.: The ‘digital natives’ debate: a critical review of the
evidence. Br. J. Edu. Technol. 39(5), 775–786 (2008)
4. Brown, C., Czerniewicz, L.: Debunking the ‘digital native’: beyond digital apartheid, towards
digital democracy. J. Comput. Assist. Learn. 26(5), 357–369 (2010)
5. Teo, T., Kabakçı Yurdakul, I., Ursavaş, Ö.F.: Exploring the digital natives among pre-service
teachers in Turkey: a cross-cultural validation of the digital native assessment scale. Interactive
Learning Environments, (ahead-of-print): pp. 1–14 (2014)
6. Prensky, M.: Digital natives, digital immigrants part 2: Do they really think differently? On
the horizon (2001)
7. Stanat, M.: China’s generation Y: Understanding the future leaders of the world’s next
superpower. Homa & Sekey Books (2006)
8. Tapscott, D.: Grown up digital: How the net generation is changing your world HC. McGraw-
Hill (2008)
9. Suto, H., Sakamoto, M.: Developing an Education Material for Robot Literacy, in Human
Interface and the Management of Information. Information and Knowledge in Applications
and Services. Springer. pp. 99–108 (2014). https://doi.org/10.1007/978-3-319-07863-2_11
10. Prensky, M.R.: From digital natives to digital wisdom: Hopeful essays for 21st century
learning. Corwin Press (2012)
11. Popescu, E., Cioiu, D.: eMUSE-integrating Web 2.0 tools in a social learning environment.
Advances in Web-Based Learning-ICWL 2011, p. 41–50 (2011)
12. Chebil, R., Lejouad-Chaari, W., Cerri, S.A.: An e-collaboration new vision and its effects on
performance evaluation. Int. J. Computer Inf. Systems Industrial Manage. Appl. 3, 560–567
(2011)
13. Kock, N., Nosek, J.: Expanding the boundaries of e-collaboration. Professional Communica-
tion, IEEE Trans. 48(1), 1–9 (2005)
14. Razmerita, L., Kirchner, K.: Social media collaboration in the classroom: a study of group
collaboration. In: Baloian, N., Burstein, F., Ogata, H., Santoro, F., Zurita, G. (eds.) CRIWG
2014. LNCS, vol. 8658, pp. 279–286. Springer, Cham (2014). https://doi.org/10.1007/978-3-
319-10166-8_25
15. Béres, I., Turcsányi-Szabó, M.: Added value model of collaboration in higher education.
Interdisciplinary J. E-Learning and Learning Objects 6(1), 203–215 (2010)
16. Jara, C.A., et al.: Synchronous collaboration of virtual and remote laboratories. Computer
Appl. Eng. Educ. 20(1), 124–136 (2012)
17. Maddrell, J.A., Morrison, G.R., Watson, G.S.: Community of inquiry framework and learner
achievement. In: Annual Meeting of the Association of Educational Communications &
Technology, Jacksonville, FL. http://www.jennifermaddrell.com/papers.2011
18. Garrison, D.R., Anderson, T., Archer, W.: Critical inquiry in a text-based environment:
computer conferencing in higher education. Internet Higher Education 2(2), 87–105 (2000)
19. Garrison, D.R., Anderson, T., Archer, W.: The first decade of the community of inquiry
framework: a retrospective. Internet Higher Educ. 13(1), 5–9 (2010)
20. Akyol, Z., Garrison, D.R.: The development of a community of inquiry over time in an
online course: Understanding the progression and integration of social, cognitive and teaching
presence (2014)
21. Smith, B.L., MacGregor, J.T.: What is collaborative learning. Towards the Virtual University:
International Online Learning Perspectives, pp. 217-232 (1992).
22. Schoenmakers, S., Plugge, L., Kirschner, P.: Criteria for the evaluation of electronic
learning environments. Report of MMI/Learning Lab, Maastricht (2000). http://members.home.nl/laplugge1/Plugge/publications/papers/UNESCO%20Criteria%20for%20the%20Evaluation%20of%20Electronic%20Learning.pdf [27/07/2007]
23. Reigeluth, C.M.: Instructional design theories and models: An overview of their current status.
Routledge (2013)
24. Nie, Y., Lau, S.: Differential relations of constructivist and didactic instruction to students’
cognition, motivation, and achievement. Learn. Instr. 20(5), 411–423 (2010)
25. Nie, Y., et al.: The roles of teacher efficacy in instructional innovation: its predictive relations
to constructivist and didactic instruction. Educ. Res. Policy Pract. 12(1), 67–77 (2013)
26. Corporation, E.B.: Constructivism as a Paradigm for Teaching and Learning (2004)
27. Mayer, R.E.: Should there be a three-strikes rule against pure discovery learning? Am. Psychol.
59(1), 14 (2004)
28. Anderson, J.R., Reder, L.M., Simon, H.A.: Applications and misapplications of cognitive
psychology to mathematics education. ERIC Clearinghouse (1999)
29. Anderson, T., et al.: Methodological Issues in the Content Analysis of Computer Conference
Transcripts (2000)
30. Swan, K., Garrison, D., Richardson, J.C.: A constructivist approach to online learning: The
community of inquiry framework, in Information technology and constructivism in higher
education: Progressive learning frameworks., IGI global. pp. 43–57 (2009)
31. Anderson, T., et al.: Assessing Teaching Presence in a Computer Conferencing Context (2001)
32. Garrison, D.R., Arbaugh, J.B.: Researching the community of inquiry framework: review,
issues, and future directions. Int. Higher Educ. 10(3), 157–172 (2007)
33. Gunawardena, C.N., Zittle, F.J.: Social presence as a predictor of satisfaction within a
computer-mediated conferencing environment. American J. Distance Educ. 11(3), 8–26
(1997)
34. Garrison, D.R.: Online community of inquiry review: social, cognitive, and teaching presence
issues. J. Asynchronous Learning Networks 11(1), 61–72 (2007)
35. Arbaugh, J.B., et al.: Developing a community of inquiry instrument: testing a measure of the
community of inquiry framework using a multi-institutional sample. Internet Higher Educ.
11(3–4), 133–136 (2008)
36. Pallant, J.: Survival Manual. A Step by Step Guide to Data Analysis Using SPSS, p. 4 (2011)
37. Gerber, S.B., Finn, K.V.: Using SPSS for Windows: Data analysis and graphics. Springer
(2013). https://doi.org/10.1007/0-387-27604-1
38. Green, S.B., Salkind, N.J.: Using SPSS for Windows and Macintosh. Pearson Upper Saddle
River, NH (2013)
Utilising Gamification and Virtual Environments
to Present Digitally Enhanced Advanced
Services (DEAS) for the Financial Sector
1 Introduction
Servitization in the current manufacturing and building industries presents a unique
opportunity to offer additional and future services to customers. The aim is to ensure a
long-term, positive customer/user experience whilst enhancing the provider's revenue by
offering additional services as part of the initial deal [1–3]. However, explaining the
various benefits that these services could yield for customers is a major challenge due to
their complexity and bespoke nature.
To this end, technological advances and the increased popularity of smartphones and tablets
have provided a new conduit for presenting and visualising information to the general public.
In particular, emerging technologies such as 3D visualisation, Virtual and Augmented Reality
(VR/AR) and serious games have been employed to present complex information, training and
simulations to individuals and companies in diverse domains such as medical training,
environmental sciences, defence and commercial electronics [4–7].
Transferring this know-how to services provided for the manufacturing and financial sectors
was achieved through a new approach, namely Digitally Enhanced Advanced Services (DEAS) [8, 9].
With the growing number of organisations and businesses adopting innovative technologies to
offer advanced services rather than just selling products, financial service providers have
also had to consider DEAS as a long-term business model [10, 11].
To investigate this further, this project developed a prototype online 3D serious game in close
collaboration with the EHAB group (servitization designers for the building and financial
sector), focusing on enhancing the understanding and education of their servitization offers in
the aforementioned domains. The project was designed with a two-fold approach: (a) to provide a
complete and realistic simulation of a building construction and (b) to embed and explain
seamlessly the DEAS offer of the real-life provider (EHAB). This game design mantra produced
positive outcomes in previous studies [12].
During the initial stages of the project, it was observed that servitization offers were
particularly convoluted and difficult for customers to understand. One of the challenges faced
by this financial servitization design team was to help end-users see the limitations imposed
by the current method used for pricing risk.
The following sections present the design and development process as well as the challenges of
the proposed serious game. The paper elaborates on the game design and provides the feedback of
a specialists' focus group after extensive gameplay testing. It concludes with a tentative plan
of work for the development of an example building positioned in a 3D/VR segment of a real-life
UK city, following previous studies that employed simulated cities and gamification [13].
Fig. 1. Screenshot of EHAB serious game showing weather simulation and risks.
The game was designed as a simulation of construction processes and random weather conditions
spanning multiple months, depending on the size of the construction project. In addition, the
game design focused on motivating the user through gradual reward schemes embedded in the game
that reflect the user's decisions [21, 22]. In particular, the game centred on EHAB's
servitization offer, the Weather Ledger Platform; like its real-life counterpart, the offer
works in-game in a similar way, providing improvements to the player and simulating the
benefits.
The selection of different options and timelines was accommodated in half of the screen, whilst
the other half presented the 3D visualisation of the different construction stages and the
weather conditions, as presented in Fig. 2.
The user, however, could change the screen size ratio and customize the panels to monitor the
development of more than one building, as shown in Fig. 3. The customization of the operating
environment, as well as the provision of multiple choices, enhances the experience for each
individual user [23, 24]. To further immerse the user in the process and the perks of the DEAS
offer, the game design introduced a key feature of the service, the Enhanced Planner. The
latter helps the user mitigate risk and make better predictions about future weather events.
The benefits of utilising the Enhanced Planner were directly mapped into the game.
Fig. 3. The game offers the option to build and monitor more than one construction site.
Most of the relevant information is presented within the main virtual environment;
time-sensitive information, however, is presented with different colour intensity or
highlighted by red dots and frames, as seen in Fig. 3.
The options and activities panels that present the simulation facts and support the
decision-making process of the user are illustrated in Fig. 4. The interface reveals the
probability of adverse weather days per month when the user updates the time risk allowance
on the monthly calendar, as presented in Fig. 5. This feature gives the player the flexibility
to change and update insurance in every round based on predicted weather events,
imitating the function of the real-world counterpart, which provides a consistent and
accurate approach and streamlines how the user can plan for and avoid unforeseen issues.
• Risk Management: points awarded for insurance successes and buying site upgrades
(to help insurance)
• Experience: points awarded for progressing and completing sites
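A short, hypothetical sketch of one such round is given below: the player derives a time risk allowance from the displayed monthly weather probability, and points are awarded when that allowance covers the adverse days that actually occur. The thresholds and point values are illustrative assumptions, not the game's actual tuning.

```python
import random

# Hypothetical sketch of one insurance round: the player sets a time risk
# allowance from the displayed monthly weather probability, and points are
# awarded when the allowance covers the adverse days that actually occur.
def choose_allowance(p_bad_day: float, working_days: int = 20, buffer: int = 1) -> int:
    """Set the time risk allowance to the expected number of bad days plus a buffer."""
    return round(p_bad_day * working_days) + buffer

def play_round(p_bad_day: float, rng: random.Random) -> int:
    allowance = choose_allowance(p_bad_day)
    actual_bad_days = sum(rng.random() < p_bad_day for _ in range(20))
    points = 0
    if actual_bad_days <= allowance:
        points += 25        # risk-management points: the insurance decision succeeded
    points += 5             # experience points for completing the round
    return points

rng = random.Random(3)
print(sum(play_round(0.35, rng) for _ in range(6)), "points over six rounds")
```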
Fig. 4. The UI design presents a simple and colour-coded panel that guides the user through the
different options and activities.
Fig. 5. User interface (UI) design that allows the user to closely monitor the weather patterns and
the risks involved in relation to the insurance services enabled.
The player’s reputation points are also mapped to an overall star system, where the
player can earn points to increase the number of stars they have. This is implemented
using a curve/graph which determines how many stars the player should have based on
the total number of points (e.g. 1 star requires 100 points, 4 stars require 900 points,
etc.).
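One simple way to realise such a curve is a threshold lookup, as sketched below. Only the 1-star (100 points) and 4-star (900 points) values come from the text; the remaining thresholds are interpolated assumptions rather than the game's actual curve.

```python
# Hypothetical points-to-stars curve. Only the 1-star (100 points) and 4-star
# (900 points) thresholds come from the text; the other values are assumptions.
STAR_THRESHOLDS = [(100, 1), (350, 2), (600, 3), (900, 4), (1300, 5)]

def stars_for_points(points: int) -> int:
    """Return the highest star level whose threshold the player has reached."""
    stars = 0
    for threshold, star in STAR_THRESHOLDS:
        if points >= threshold:
            stars = star
    return stars

assert stars_for_points(99) == 0
assert stars_for_points(100) == 1
assert stars_for_points(900) == 4
```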
For the evaluation of this project, the team opted to develop a Technology Acceptance
Model (TAM), based on previous projects that evaluated prototype systems and
technologies with particular groups of the public, as presented in Fig. 6 [25–27]. The
TAM aims to identify whether, and to what extent, users will accept new technologies to
complement or replace existing practices [28, 29].
This TAM followed a similar structure to previous studies related to the introduction
of emerging technologies to diverse areas, aiming to acquire users’ feedback on a set of
user experience areas [26, 30].
4.2 Participants
The evaluation was performed by ten users (five female, five male) who were specialists in
the field and formed the initial focus group that gameplay-tested the application. The
participants volunteered to test and evaluate the game.
The users’ feedback on the perceived usefulness (PU), regarding their experience and under-
standing of the aforementioned aim, offered encouraging results, as illustrated in Fig. 7.
In particular, the users responded positively, with 80% (Strongly Agree, Moderately
Agree and Somewhat Agree) for PU1 (The use of this serious game helped me understand
the EHAB servitisation offer) and 50% of the responses being Strongly Agree. This
positive feedback was of major importance for the project, as the complexity of the financial
products, and especially the parametric insurance offers, was particularly challenging
for the users.
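Percentages of this kind can be reproduced by a straightforward aggregation over the Likert responses; the sketch below uses hypothetical response data for ten participants rather than the study's raw dataset.

```python
from collections import Counter

# Hypothetical aggregation of Likert responses for one TAM statement.
# The example responses are illustrative, not the study's raw data.
POSITIVE = {"Strongly Agree", "Moderately Agree", "Somewhat Agree"}

def agreement_percentage(responses: list[str]) -> float:
    """Percentage of responses that fall in the positive Likert categories."""
    return 100 * sum(r in POSITIVE for r in responses) / len(responses)

pu1 = ["Strongly Agree"] * 5 + ["Moderately Agree", "Somewhat Agree",
                                "Moderately Agree", "Neutral", "Neutral"]
print(Counter(pu1))
print(f"PU1 positive responses: {agreement_percentage(pu1):.0f}%")
```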
Fig. 7. Participants’ feedback on the statements related to perceived usefulness (PU) of the game.
The second statement, PU2 (The use of this serious game simulated the servitisation
offer effectively), received similar feedback, also scoring 80%, while 20% of users responded
neutrally to this statement. Post-questionnaire discussions with the users highlighted that,
although the simulation presented the construction process correctly, it should have taken
into consideration additional factors that might affect the delivery time of a building.
This was an interesting suggestion, and the game could be enriched with additional
construction issues in future versions. However, this specific work was concerned
with the adverse weather conditions that could damage and delay the construction of a
building, focusing mainly on the Weather Ledger Platform.
On the third statement, PU3 (The use of this serious game offered a better opportunity
to learn about the servitization offer), the users responded positively with 90% (Strongly
Agree, Moderately Agree and Somewhat Agree), with only 10% being neutral and no
negative responses of any kind from the participants.
The above results provide an initial indication of potential users’ willingness
to utilise these technologies and methods (i.e. 3D visualisation and gamification) in the
particular field. The different hypotheses that link the nine TAM constructs as illustrated
in Fig. 6 are not analysed in this paper, as the limited number of participants offered
mainly indicative results that could not support a full TAM analysis [27–29].
However, the users’ responses related to the Perceived Usefulness (PU) of the pro-
posed system are on par with other studies that utilised gamification to support the
servitization of various products and industries or investigated the impact of gamifica-
tion on clients [12, 16, 31–34]. This supports the initial hypothesis that the gamification
approach for the servitization of financial products would have comparable outcomes to
other studies that focus primarily on manufacturing servitization [12, 35, 36].
As this study investigates an uncommon area of servitization, one that is not directly
linked to the manufacturing domain but employs gamification to present financial servi-
tization offers, no other studies that use the same methods and metrics were
found. Remote similarities could be found in only one study, which customised an existing
board game, namely snakes and ladders, to convey different servitization offers [16].
In addition, the customised TAM, based on previous projects which aimed to investi-
gate the impact of emerging technologies on customers’ uptake of new products, produced
responses similar to the current study’s results [27, 29]. The project’s design,
which was supported by industry collaboration and continuous feedback throughout the
development, was reflected in the users’ responses to the PU questions. This established
a baseline of areas of interest that need to be covered in such applications and suggested
a selection of UI structures and actions that successfully convey complex financial
products to customers.
At this stage of the project, this output was deemed essential for the continuation of
the development and expansion of the particular system. In addition, the above results
and analysis of this preliminary evaluation highlight the potential use of these structures
and methods for the development of other similar systems that employ gamification and
3D visualisation for enhancing the presentation of information and user engagement
with other servitization offers.
6 Conclusions
This paper presented the design considerations and challenges of a novel 3D serious game
developed to support customers’ understanding of financial and insurance choices
in the construction sector. Both the virtual environment and the game design focused on
the simplification of the information provided to the user/customer whilst offering a holistic
overview and real-time visualisation of construction projects.
The game was based on the EHAB insurance servitization offers and their Weather
Ledger Platform, which supports the decision-making process of various construction
projects in the UK.
The application was evaluated by ten volunteers who responded to pre- and post-test
questionnaires designed to inform a custom TAM. The results of this part of the TAM
were overall positive, yet they are only indicative, and additional user trials with larger
cohorts are required to define the exact level of learning outcomes achieved through this
serious game application.
A plan for further enriching this serious game with additional variables and
construction drawbacks will be formed in the next stage. To further identify the impact
on the particular industry, a larger cohort evaluation will be essential.
The long-term impact could highlight the potential of video games outside entertain-
ment; this could encourage other businesses to collaborate with game developers and/or
other innovative technology practitioners to create solutions for their management and
marketing issues related to DEAS.
Acknowledgments. The authors would like to thank EHAB for their involvement as an industrial
partner and the provision of vital information for the development of this system. Furthermore, the
authors would like to thank Lyall Campbell for his work on this project. This project was funded
by EPSRC.
References
1. Schroeder, A., Naik, P., Ziaee Bigdeli, A., Baines, T.: Digitally enabled advanced services:
a socio-technical perspective on the role of the internet of things (IoT). Int. J. Oper. Prod.
Manag. 40, 1243–1268 (2020)
2. Kowalkowski, C., Bigdeli, A.Z., Baines, T.: Guest editorial: the future of servitization in a
digital era. J. Serv. Manag. 33(1), 59–69 (2022). https://doi.org/10.1108/JOSM-01-2022-450
3. Du, W., Sepasgozar, S.M.E., Romero, J.S.G.: Measuring virtual reality (VR) technology
application and adoption in chinese construction risk management. Environ. Sci. Proceed.
12(1), 18 (2021). https://doi.org/10.3390/environsciproc2021012018
4. Lagoo, R., Charissis, V., Harrison, D.: Mitigating driver’s distraction with the use of
augmented reality head-up display and gesture recognition system. IEEE Consum. Electron.
Mag. 8(5), 79–85 (2019)
5. Liu, X., Zhang, J., Hou, G., Wang, Z.: Virtual reality and its application in military. IOP Conf.
Ser. Earth Environ. Sci. 170(3), 032155 (2018)
6. Ward, B.M., Charissis, V., Rowley, D., Anderson, P., Brady, L.: An evaluation of prototype
VR medical training environment: applied surgical anatomy training for malignant breast
disease. Stud. Health Technol. Inform. 132, 550–555 (2008)
7. Huang, J., Lucash, M.S., Scheller, R.M., Klippel, A.: Visualizing ecological data in virtual
reality. In: 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 1311–
1312 (2019). https://doi.org/10.1109/VR.2019.8797771
8. Alfalah, S., Harrison, D.K., Charissis, V., Evans, D.: Investigation of multimodal interaction
and 3D simulation environment for prototype healthcare system. J. Enterp. Inf. Manag.
26(1/2), 183–197 (2013). ISSN 1741-0398
9. DEAS NetworkPlus: Digitally Enhanced Advanced Services EPSRC NetworkPlus Manufac-
turing Theme Research Agenda 2019. University of Westminster, London, UK (2019). ISBN
978 185449 478 8. Available online: www.deas.ac.uk. Accessed 20 April 2022
10. Baines, T., Lightfoot, H.W.: Servitization of the manufacturing firm. Int. J. Oper. Prod. Manag.
34(1), 2–35 (2014)
11. Wood, Z., Godsiff, P.: Establishing the core principles of servitisation for application outside
manufacturing. Compet. Advant. Digit. Econ. 2021, 125–130 (2021). https://doi.org/10.1049/
icp.2021.2425
12. Khan, M.S., et al.: Improving user experience and communication of digitally enhanced
advanced services (DEAS) offers in manufacturing sector. Multimodal Technol. Interact. 6,
21 (2022). https://doi.org/10.3390/mti6030021
13. Wang, S., Charissis, V., Harrison, D.K.: Augmented reality prototype HUD for passenger
infotainment in a vehicular environment. Adv. Sci. Technol. Eng. Syst. J. 2(3),
634–641 (2017)
14. Romero-Rodriguez, L.M., Ramirez-Montoya, M.S., Gonzalez, J.R.V.: Gamification in
MOOCs: Engagement application test in energy sustainability courses. IEEE Access 7,
32093–32101 (2019). https://doi.org/10.1109/access.2019.2903230
15. Abuhammad, A., et al.: “MedChemVR”: a virtual reality game to enhance medicinal chemistry
education. Multimodal. Technol. Interact. 5, 10 (2021). https://doi.org/10.3390/mti5030010
16. Andrews, D., Dmitrijeva, J., Bigdeli, A.Z., Baines, T.: Snakes and ladders in servitization:
using a game to capture inhibitors and enablers of transformation. Res. Technol.
Manag. 61, 1–12 (2018). https://doi.org/10.1080/08956308.2018.1516930
17. Falah, J., et al.: Identifying the characteristics of virtual reality gamification for complex
educational topics. Multimodal. Technol. Interact. 5(9), 53 (2021). https://doi.org/10.3390/
mti5090053
18. Khan, M.S., Charissis, V., Harrison, D.: Development and preliminary evaluation of a serious
game to communicate digitally enhanced advance service (DEAS) offers; servitization: a
pathway towards a resilient, productive and sustainable future. In: Proceedings of the Spring
Servitization Conference 2021, Virtual Conference, 10–12 May, p. 287 (2021)
19. Gebauer, H., Paiola, M., Saccani, N., Rapaccini, M.: Digital servitization: crossing the per-
spectives of digitization and servitization. Ind. Mark. Manag. 93, 382–388 (2021). https://doi.
org/10.1016/j.indmarman.2020.05.011
20. Marcon, E., Marcon, A., Le Dain, M.A., Ayala, N.F., Frank, A.G., Matthieu, J.: Barriers
for the digitalization of servitization. Procedia CIRP 83, 254–259 (2019). https://doi.org/10.
1016/j.procir.2019.03.129
21. Kohtamäki, M., Parida, V., Patel, P.C., Gebauer, H.: The relationship between digitalization
and servitization: the role of servitization in capturing the financial potential of digitalization.
Technol. Forecast. Soc. Chang. 151, 119804 (2020). https://doi.org/10.1016/j.techfore.2019.
119804
22. Alsawaier, R.: The effect of gamification on motivation and engagement. Int. J. Inf. Learn.
Technol. (2018). https://doi.org/10.1108/IJILT-02-2017-0009
23. García-Magro, C., Soriano-Pinar, I.: Design of services in servitized firms:
gamification as an adequate tool. J. Bus. Ind. Mark., 575–585 (2019). https://doi.org/10.
1108/JBIM-12-2018-0413
24. Altarteer, S., Charissis, V., Harrison, D., Chan, W.: Product customisation: virtual reality and
new opportunities for luxury brands online trading. In: International Conference on 3D Web
Technology/ACM SIGGRAPH, 22–24, Anaheim, California, USA (2016)
25. Kharoub, H., Lataifeh, M., Ahmed, N.: 3D user interface design and usability for immersive
VR. Applied Sciences 9(22), 4861 (2019). https://doi.org/10.3390/app9224861
26. Al-Emran, M.: Evaluating the use of smartwatches for learning purposes through the inte-
gration of the technology acceptance model and task-technology fit. Int. J. Human-Computer
Interaction 37(19), 1874–1882 (2021). https://doi.org/10.1080/10447318.2021.1921481
27. Altarteer, S., Charissis, V.: Technology acceptance model for 3D virtual reality system in
luxury brands online stores. IEEE Access 7, 64053–64062 (2019)
28. Marangunić, N., Granić, A.: Technology acceptance model: a literature review from 1986 to
2013. Univ. Access Inf. Soc. 14, 81–95 (2015)
29. Lee, Y., Larsen, K.R.T.: The technology acceptance model: past, present, and future. Com-
munications of the Association for Information Systems 12 (2003). https://doi.org/10.17705/
1CAIS.01250
30. Vanduhe, V.Z., Nat, M., Hasan, H.F.: Continuance intentions to use gamification for training
in higher education: Integrating the technology acceptance model (TAM), social motivation
and task technology fit (TTF). IEEE Access 8, 21473–21484 (2020)
31. Eisingerich, A.B., Marchand, A., Fritze, M.P., Dong, L.: Hook vs. hope: how to enhance
customer engagement through gamification. Int. J. Res. Mark. 36(2),
200–215 (2019)
32. Xi, N., Hamari, J.: Does gamification affect brand engagement and equity? a study in online
brand communities. J. Bus. Res. 109, 449–460 (2020)
33. Baird, A., Raghu, T.: Associating consumer perceived value with business models for digital
services. Eur. J. Inf. Syst. 24(1), 4–22 (2015). https://doi.org/10.1057/ejis.2013.12
34. García-Magro, C., Soriano-Pinar, I.: Design of services in servitized firms: gamification as
an adequate tool. J. Business Ind. Marketing 35(3), 575–585 (2020). https://doi.org/10.1108/
JBIM-12-2018-0413
35. Shi, V.G., Ridgway, K., Baldwin, J., et al.: Gamification for servitization. In: Baines, T.,
Clegg, B., Harrison, D. (eds.) Growth Through Servitization: Frameworks and Analytical
Techniques (2014)
36. Baines, T., Shi, V.G.: A Delphi study to explore the adoption of servitization in UK companies.
Production Planning & Control 26(14–15), 1171–1187 (2015). https://doi.org/10.1080/095
37287.2015.1033490
Author Index
A
Abdullah, Norris Syed, 249, 257
AbuSa’aleek, Atef Odeh, 766
Adankai, Victor, 381
Ahmad, S., 1
Akinnuwesi, Boluwaji, 341
Alankar, Adham M. M., 59
AL-Ansari, Aliya, 413
Alattar, Alhassan E., 398
Alférez, Germán H., 196
Al-Hammadi, Fatima, 633
Al-Hammadi, Yousra, 633
Alhazmi, Abdulsalam K., 633
Alismail, Sarah, 325
Allamudi, Meghna, 607
AL-Lawati, Batool, 413
Allgood, Nicholas R., 273
Alnanih, Reem, 660
Aló, Richard, 381
Al-Omair, Osamah M., 451
Alsakkaf, Nasr, 633
AL-Sawafi, Sumaya, 413
Alwadai, Asma, 660
Alwan, Ali A., 16
Amoo, Franklin Kome, 786
Anagnostopoulos, Christos, 69
Apaza, Honorio, 565
Aruhuanca, Brisayda, 565
Asad, Arghavan, 227
Athavale, Rishi, 359
B
Bahari, Mahadi, 257
Bahn, Jacob A., 196
Bansal, Arvind K., 432
Batsukh, Bat-Erdene, 508
Bautista, Yohn Jairo Parra, 381
Bouflous, Zakariyae, 206
Bouragba, Khalid, 206
Buvet, Pierre-André, 463
C
Cabrera, Rafael Guzmán, 499
Caligiuri, Luigi Maxmilian, 237
Carrillo, Luis Manuel Ledesma, 499
Castro, José Carmen Morales, 499
Cearley, Jerry, 103
Cecile, Fourie, 724
Charissis, V., 802
Christian, Adepo Joël, 44
Cookenmaster, Dakota C., 196
Crema, Rafael Santos, 775
D
Darmawan, Deni, 649
Davoudi, Heidar, 593
Deksne, Daiga, 555
Derahman, M. N., 16
Dlamini, Ricky Nhlanhla, 742
du Preez, Johan A., 534
E
Ebrahimi, Mehran, 593
Ekpenyong, Moses, 341
Elkaseer, Ahmed, 398
Estrada, Jheanel, 478
F
Fache, Bertrand, 463
Fadel, Wiam, 463
Flores, Anibal, 565
G
Georges, Anoh Nogbou, 44
Gertis, E. Miles, 752
Gomez-Enriquez, Diego, 286
Guzmán-Castillo, Adán, 370
H
Haddara, Moutaz, 121
Hajiyan, Hooria, 593
Halicka, Katarzyna, 485
Hamid, Hanifah Binti Abdul, 59
Harouna, Moussa, 698
Harrison, D. K., 802
Hederman, Lucy, 295
Howard, Grant Royd, 742
Huang, Shihong, 451
I
Ibarra-Fiallo, Julio, 370
Ibrahim, Ahmed Mamdouh Abdelfatah, 249, 257
Ikwunne, Tochukwu, 295
Intriago-Pazmiño, Monserrate, 370
J
Junan, S., 1
K
Kaed, Ezzadeen, 633
Kapočiūtė-Dzikienė, Jurgita, 577
Kaur, Shubhpreet, 90
Kaur, Tarandeep, 90
Khan, S., 802
Kolomvatsos, Kostas, 69
Krovi, Venkat N., 493
L
Langseth, Marius, 121
Latip, Rohaya, 16
Lezama Sánchez, Ana Laura, 521
Liang, Y. Daniel, 752
Lyons-Rocque, Catherine, 312
M
Maddipatla, Jagadeepram, 359
Mattu, Gurjeet Singh, 622
Michel, Babri, 44
Mohammad, A., 1
Mohammadi, Farah, 227
Mohsen, Saeed, 398
Moravcik, Oliver, 680
Morrison, Ann, 155
Mullachery, Balakrishnan, 325
N
Neto, Guilherme Nunes Nogueira, 775
Ngoc, Quoc Tran, 179
Nicholas, Charles K., 273
Nicolas, Boukar Abatchia, 698
Nina, Mariela M., 565
Nohama, Percy, 775
Noura, Ibrahim Ganaou, 698
Novikova, Aleksandra, 155
Nwokoro, Chukwudi, 341
O
Obot, Okure, 341
Opinas Jr., Gil, 478
Ouattara, Kobenan Ali, 44
Ouzzif, Mohammed, 206
Øverdal, Maria, 121
P
Pallipuram, Vivek K., 103
Peerzada, Abdul B., 493
Pérez-Hernández, María, 370
Pinales, José Ruiz, 499
Priego, Belém, 499
R
Rahadian, Dian, 649
RahmtAllah, Enas Abdelwahab Eltom, 766
Raman, Adhiti, 493
Rangaraju, Prasad, 493
Rebola, Claudia B., 286
Reyes-Ortiz, José A., 521
Risda, Dianni, 649
Rouam, Abdelhadi, 463
S
Saeed, Murad Abdu, 766
Saif, Faten A., 16
Salimbajevs, Askars, 577
Schmid, Matthias J., 493
Scholz, Steffen, 398
Scrivner, Olga, 607
Semwal, Sudhanshu Kumar, 312
Seonghoon, K., 1
Sharma, Deepak, 622
Sharma, Sukhdeep, 622
Silva, Carlos, 565
Singh, Aditi, 432
Skadiņš, Raivis, 555
Skotti, Xenia, 69
Srivastava, Manu, 493
Stephanus, Botha Benjamin, 724
Suryadi, Andri, 649
Svetsky, Stefan, 680
T
Taylor, Rebecca M. C., 534
Theran-Suarez, Carlos, 381
Tiado, Mahamadou Issoufou, 698
Tian, Yun, 138
Tito, Euler, 565
Torres-Constante, Eddy, 370
Tovar Vidal, Mireya, 521
Tripathi, Anshuman, 478
Tso, Ejoe, 734
U
Udby, Tristan, 138
Udo, Aniema I. A., 341
Uzoka, Faith-Michael, 341
V
Vargas-Alfonso, Erwin, 286
W
Wahyudin, Dinn, 649
Walker, Ian D., 493
Wall, P. J., 295
Wan Ahmad, Wan Fatimah, 786
Williams, C. Todd, 710
Wong, Dennis, 734
Y
Yau, Peter ChunYu, 734
Yong, B., 1
Yussiff, Abdul-Lateef, 786
Yussiff, Alimatu–Saadia, 786
Z
Zaizi, Nurzi Juana Binti Mohd, 59
Zidoum, Hamza, 413