Machine and Deep Learning Algorithms for Wearable Health Monitoring
Article · May 2021

Machine and Deep Learning Algorithms
for Wearable Health Monitoring
Chengwei Fei, Rong Liu, Zihao Li, Tianmin Wang, and Faisal N. Baig
Abstract Because people desire a high quality of life, health is a vital factor in the standard of living and is attracting considerable attention. Thus, the development of methods that enable rapid, real-time evaluation and monitoring of human health status is crucial. In this study, we systematically review data mining and machine learning (ML) techniques for wearable health monitoring (WHM) and their applications, including conventional ML methods (artificial neural networks, the Kriging model, support vector machines, and principal component analysis) and the latest advances in deep learning (DL) algorithms for WHM. Specifically, the advantages of DL-based approaches over traditional ML methods are analyzed in terms of metrics associated with data feature extraction and identification performance. Moreover, to provide intuitive insight, this study further reviews developments in classifier performance with regard to detection, monitoring, identification, and accuracy. Finally, considering the characteristics of the time series data acquired by sensors during health condition monitoring, recommendations are provided for applying DL methods to human body evaluation in specific fields, and future research trends required to further improve the capability of DL algorithms are outlined.
Keywords Machine learning · Deep learning · Wearable health monitoring · Data mining
C. Fei · Z. Li · T. Wang
Department of Aeronautics and Astronautics, Fudan University,
Shanghai, People’s Republic of China
R. Liu (*)
Institute of Textiles and Clothing, the Hong Kong Polytechnic University,
Hong Kong, SAR, China
F. N. Baig
Department of Health Technology and Informatics, Institute of Textiles and Clothing,
the Hong Kong Polytechnic University, Hong Kong, SAR, China
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. K. Manocha et al. (eds.), Computational Intelligence in Healthcare, Health
Information Science, https://doi.org/10.1007/978-3-030-68723-6_6
1 Introduction
With improvements in living standards and a growing aging population, people are becoming increasingly aware of the significance of healthcare in their daily lives. Wearable health monitoring (WHM) is a rising technology that enables continuous ambulatory monitoring of humans, recording vital information related to their health and body without much discomfort or interference with their routine activities, whether they are at home, at the workplace, in exercise-focused places, or in a clinical environment [124, 131, 164]. The four major areas of focus in the technical design of WHM devices are reliability and safety, low power consumption, ergonomics, and comfort [179, 220]. Considerable attention is concentrated on smart WHM systems, which are fabricated using actuators, sensors, and smart fabrics and involve technologies such as electronic surveillance and wireless sensor networks. The reason is that smart WHM systems enable synchronization with patients at home and allow real-time consultation with healthcare providers without incurring any travel cost [136, 141, 189]. Smart WHM systems are provided in various forms (skin-contact devices, implantable devices, smart clothes, and other small wearable items [76, 158]) and have been applied to monitoring vital signals related to health and body, body movement, fall prevention, and location [18, 60, 114, 168]. However, the acceptance of WHM devices by end users is low [4, 117, 153] because the data processing technologies used in WHM systems or devices cannot efficiently manage the data collected by the variety of sensors installed in the system during the stages of data preprocessing, extraction of discriminative and salient features, and data recognition [33] in body activity recognition. To overcome this problem, machine learning (ML) techniques play a crucial role in interpreting the activity details. The ML methods include the support vector machine (SVM) [8, 55, 56, 104], hidden Markov model [172], decision tree [96, 98], K-nearest neighbor (K-NN) [182], and Gaussian mixture model [165]. These techniques provide WHM with efficient solutions for human activity recognition through wearable sensors. Moreover, deep learning (DL), a newer branch of ML, models high-level features present in data through automatic feature extraction with less human effort. Thus, DL has been widely applied in machine fault diagnosis and system health monitoring [102, 166, 167, 173, 186, 230]. However, to date, few studies have systematically focused on the use of ML and DL algorithms in WHM systems and technologies.
This review aims to explore the processes involved in developing wearable health monitoring devices through ML and DL and offers an in-depth review of the related technologies and the process of implementation and feature learning. The rest of the paper is organized as follows. Section 2 presents the existing problems of WHM. Section 3 analyzes traditional ML-based approaches and their subdivisions. Section 4 investigates advanced DL-based approaches and their subdivisions. Section 5 discusses the various ML algorithms used for WHM. Section 6 presents the limitations and challenges for future work on WHM based on ML and DL methods. Section 7 concludes this investigation.
2 Key Technologies of WHM
Since the beginning of the twenty-first century, the population in developed countries has been aging at an extraordinary rate. Currently, approximately 600 million people worldwide are above 60 years old, and approximately 860 million people have chronic heart disease [85]. In Japan, for instance, aging is rising sharply: more than 23% of the population was above 65 years old in 2010, and this value is expected to exceed 30% in 2025 [85]. The growing aging population increases both medical expenditure and the risk of lifestyle diseases. Therefore, it is urgent to implement preventive medicine and health management, rather than passive medical care, to improve individuals’ quality of life and reduce their medical expenses.
Health or healthcare monitoring is a preventive medicine and health management technique that can be used in people’s daily lives. Healthcare monitoring is applied for the following reasons: to better support medical diagnosis, to speed recovery after medical treatment or injury, to monitor athletes’ performance in sport or fitness activities, and to help professional personnel evaluate and monitor their physical response under dangerous conditions so that they can better manage their assigned tasks and their occupational health [167].
Information and communication technology (ICT) is a crucial aspect of health monitoring because it enables several applications, such as telesurgery and teleconsultation, that support independent living and wellness. Moreover, environment sensors installed in a patient’s surroundings can make the patient’s health information accessible from any part of the world by continuously supervising and evaluating the patient’s domestic activity. However, the use of such pervasive ICT systems is still constrained to closed environments. In addition, advanced wireless communication systems such as Bluetooth, WiFi, near-field communication, ANT+, and Zigbee have been adopted in healthcare devices, mobile phones, and smartphones [31, 44]. Furthermore, advances in micromechatronics, such as micromachines and large-scale integrated circuits, have enabled the development of miniature and lightweight wearable sensors. Nonintrusive wearable sensors equipped with wireless ICT overcome the restrictions of emergency and hospitalization care and thus enable WHM for individuals. A comparison of medical services as they have evolved with different micromechatronic systems is shown in Table 1.
As indicated in Table 1, unlike hospitalization (or emergency care) and home care, WHM is a promising method that can monitor a patient’s health anytime and anywhere in real time at a low cost through miniature, lightweight, and portable sensors. WHM devices are therefore key systems for delivering healthcare. The following subsections review the general system architecture of WHM and elaborate on the systems or devices used, the vital signals obtained, and the data and signal analyses conducted in WHM.

Table 1 Medical service evolution with different micromechatronics

Medical service          Hospitalization and emergency care  Home care                             Wearable healthcare monitoring
Cost/performance         Low (treatment medicine)            Medium (regular monitoring)           High (preventive medicine)
Care place               Medical facilities                  Any individual’s home                 Anywhere, anytime
Medical devices (sizes)  Any (big)                           Designed for use in house (portable)  Wearable sensors (tiny and light)

Fig. 1 Commonly used framework of wearable health system comprising devices [47, 101]
2.1 Generic System Architecture of WHM
The increase in the aging population and in chronic heart disease has led to growing interest in the development of cost-effective wearable physiological measurement devices with data storage for nondomestic consumers [220]. On the basis of a thorough literature investigation, an abstract generic WHM architecture was designed in this study (Fig. 1). The architecture is decomposed into four parts (or modules): (i) a body area network (BAN) module that can employ different techniques, (ii) a data logger or portable unit (PU), (iii) a data analysis module, and (iv) a real-time monitoring module that enables the visualization of health-related data [47, 101].
Because of its structure, the BAN can be applied to many types of wearable devices. The BAN is a network of interconnected sensors placed around the human body. The signals of this network are transmitted to the portable processing module (or unit). Connecting all sensors through a network makes it possible to centralize data in a single PU. Thus, information can be gathered from numerous sensors and then sent from the body to external networks for teleprocessing. Moreover, the BAN strengthens the synchronization, control, programming, and scheduling of the entire system. It enables the WHM system to readjust according to the current physical condition and external situations. These merits optimize resource usage [151]. Wireless connection is a crucial asset and enables systems to become mobile and ubiquitous.
The PU, a data logger unit or user interface box, is the unit in which all the information is gathered, and it contains the output and input ports of the WHM device. The primary inputs are the vital signals obtained from sensors and other connected portable devices. The exchange between the sensors and the PU is generally conducted using wires, which makes the WHM device simpler and more economical. This communication technique is now changing with the development of technologies such as smart clothes. Smart clothes contain various interconnections (wires) that are woven and embedded into the fabric of clothes worn by patients. This is a much more favorable WHM technology because it avoids the inconvenience of loose wires around the body, which leads to higher degrees of comfort and freedom. An inventive approach has also been proposed in which communication takes place through biological channels [220]. In this method, the human body itself serves as the transmission channel by means of electrostatic fields. After acquiring the analog vital signals, the PU amplifies and/or filters them and then converts them into digital signals. Signal processing is conducted in the PU or, after the data are transmitted, in another device. Signal features are extracted through signal processing to evaluate a subject’s health condition through anomaly detection and disease prediction. The original data received by the PU may be wirelessly transmitted or saved on a memory card. The PU also receives data from real-time monitoring equipment and stores them in a local memory unit (Fig. 1). This bidirectional communication allows other devices to establish wireless contact with a primary device and facilitates the storage of data collected from several sensors or storage devices. This capability is beneficial for tracking the time of incidents and maintaining records [47, 151, 220]. The popular wireless technologies for WHM include WiFi, Bluetooth, Zigbee, and LoRa (a more recent technology). The features of these technologies are listed in Table 2.
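The conditioning chain described for the PU (amplify, filter, then convert to digital) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the design of any particular device: the gain, the moving-average low-pass filter, the ±1.65 V input range, and the 12-bit resolution are all hypothetical choices made for the example.

```python
def amplify(samples, gain):
    """Scale the raw analog samples by a fixed gain."""
    return [gain * s for s in samples]


def moving_average(samples, window):
    """Causal moving-average low-pass filter (a stand-in for the PU filter)."""
    out = []
    acc = 0.0
    for i, s in enumerate(samples):
        acc += s
        if i >= window:
            acc -= samples[i - window]  # drop the sample leaving the window
        out.append(acc / min(i + 1, window))
    return out


def quantize(samples, v_min, v_max, bits):
    """Map voltages in [v_min, v_max] to integer codes, clipping outliers."""
    levels = (1 << bits) - 1
    codes = []
    for s in samples:
        clipped = min(max(s, v_min), v_max)
        codes.append(round((clipped - v_min) / (v_max - v_min) * levels))
    return codes


def condition(samples, gain=1000.0, window=4, v_min=-1.65, v_max=1.65, bits=12):
    """Hypothetical PU chain: amplify -> low-pass filter -> digitize."""
    return quantize(moving_average(amplify(samples, gain), window), v_min, v_max, bits)


# Millivolt-scale input, as might come from a surface electrode (made-up values)
print(condition([0.0002, 0.0004, 0.0011, 0.0003, -0.0002]))
```

A real PU would use an analog front end and an anti-aliasing filter matched to the signal bandwidth; the moving average is used here only because it is easy to verify by hand.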
Mobile telecommunication technologies are also employed to transmit data in real time via general packet radio service (GPRS). GPRS is a standard mobile data service for global mobile communication. The communication protocol is a fundamental aspect of a wearable device and helps minimize the energy consumption of the device [11]. Battery life can be extended by reducing the amount of data transferred. For offline analysis, data need not be transmitted continuously: they can be saved on a micro secure digital (SD) card or in internal digital memory and transferred later through a USB link between the WHM device and another device. The energy consumption of a WHM device can also be minimized by integrating compression techniques with the transmission protocol [11]. Moreover, this approach is helpful when network bandwidth or data storage capacity is limited [23].

Table 2 Comparison between primary features of wireless protocols [46, 128]

Wireless protocol               Max range (m)  Max data rate (Mbps)  Power consumption (mW)
Bluetooth low-energy (BLE)      100            1                     10
Bluetooth (before version 4.0)  100            1–3                   2.5–100
LoRa                            50,000         0.0007                (customizable)
Zigbee                          100            0.25                  35
Wi-Fi                           150–200        54                    1,000,000

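The link between data volume, compression, and transmission energy can be made concrete with a rough estimate: airtime is the number of bits divided by the data rate, and energy is power multiplied by airtime. The sketch below uses the BLE and Zigbee figures listed in Table 2 and the standard-library `zlib` compressor as a stand-in for whichever compression scheme a device actually uses. The synthetic samples are hypothetical, and the estimate is idealized: it ignores protocol overhead, retransmissions, idle power, and the energy spent on compression itself.

```python
import struct
import zlib

# (max data rate in Mbps, power consumption in mW), as listed in Table 2
PROTOCOLS = {
    "BLE": (1.0, 10.0),
    "Zigbee": (0.25, 35.0),
}


def transmit_energy_mj(n_bytes, rate_mbps, power_mw):
    """Idealized energy (mJ) to send n_bytes: power (mW) x airtime (s)."""
    airtime_s = n_bytes * 8 / (rate_mbps * 1e6)
    return power_mw * airtime_s


def synthetic_samples(n):
    """Hypothetical slowly varying 16-bit sensor readings (not real data)."""
    return struct.pack(f"<{n}H", *(512 + (i % 50) for i in range(n)))


raw = synthetic_samples(5000)
packed = zlib.compress(raw)

for name, (rate, power) in PROTOCOLS.items():
    before = transmit_energy_mj(len(raw), rate, power)
    after = transmit_energy_mj(len(packed), rate, power)
    print(f"{name}: {before:.3f} mJ raw, {after:.3f} mJ compressed")
```

Under these assumptions, one hour of 2-byte samples at 250 Hz (1,800,000 bytes) would take about 14.4 s of airtime and roughly 144 mJ over BLE, which illustrates why reducing the transferred volume extends battery life.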
Real-Time Monitoring Remote monitoring through WHM allows prompt action to be taken for hospitalized patients by alerting the medical staff and the patient in case of any clinical emergency. Moreover, daily vital signals can also be monitored [151]. Patients inside a specific area, such as a hospital, can be monitored using WHM: they can move freely within the area while fundamental information, such as their location, is sent wirelessly to a remote monitoring center. These real-time monitoring systems can be equipped with a set of alarms [47, 103] that alert the medical staff and the patient during an emergency, so that the patient can lead a healthy life and move freely while their vital signals are continuously or intermittently transmitted to the remote monitoring center. The data collected from patients can also reveal their ambient temperature (excessive cold or heat) [35, 47]. Finally, the key signals can be sent via Bluetooth to personal computers or portable devices to visualize and analyze a person’s health condition. Mobile technologies such as GPRS can be used in this real-time monitoring method to analyze athletes’ vital signals during exercise, sports activities, and daily workouts and to analyze the health information of combatants and firefighters [46, 47].
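The alarm mechanism mentioned above can be illustrated with a simple range check on each incoming sample. This is only a sketch: the signal names and limits below are hypothetical placeholders chosen for the example, not clinical thresholds, and a deployed system would take its limits from medical staff and per-patient configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical limits for illustration only; not clinical guidance
LIMITS = {
    "heart_rate_bpm": Limit(40, 130),
    "spo2_percent": Limit(90, 100),
    "body_temp_c": Limit(35.0, 38.5),
}


def check_sample(sample):
    """Return one alarm message per monitored signal outside its limits."""
    alarms = []
    for name, value in sample.items():
        limit = LIMITS.get(name)
        if limit is None:
            continue  # signal is not monitored in this sketch
        if value < limit.low:
            alarms.append(f"{name} low: {value}")
        elif value > limit.high:
            alarms.append(f"{name} high: {value}")
    return alarms


for reading in (
    {"heart_rate_bpm": 72, "spo2_percent": 97},
    {"heart_rate_bpm": 150, "spo2_percent": 86},
):
    print(check_sample(reading) or "no alarm")
```

The returned messages would then be forwarded to the medical staff and the patient by whatever notification channel the monitoring center provides.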
Offline Monitoring The key signal data may be saved in the PU, for example on a tiny SD card, for use in medical diagnosis and analysis or simply as an individual record. Data storage and real-time monitoring can be conducted simultaneously to keep a record of vital data for diagnosis and prediction in hospital [46, 114].
2.2 Device or System of WHM
Many WHM devices have been developed to conduct one or several physiological
parameter measurements. Figure 2 illustrates the various WHM devices attached to
end users’ bodies.
A smartwatch is a typical WHM device that can monitor blood oxygen saturation, heart rate, and skin temperature and communicates through a wireless data communication module [123]. Recent smartwatches, such as that proposed in [28], are comfortable to wear owing to their design, offer mobile and wireless connectivity, and provide long-term vital monitoring (more than 24 h). These smartwatches monitor physical activity parameters, such as calories burned, distance traveled, and heart rate. More recently, the PEAK™ (Fig. 2-(4)) was proposed as the first smartwatch that could track sleep cycles [20, 211]. The Moov (Fig. 2-(7)) is a new wearable bracelet that monitors movement and can be worn on different parts of the body depending on the type of sport. For example, the bracelet can be worn on the wrist while swimming or on the leg while running.
Google Contact Lens (Fig. 2-(2)) is a type of WHM device that indicates the direction in which wearables are developing. In the future, such wearables will shrink from the macroscale to the microscale and then to the nanoscale so that they can be introduced into the body [112]. The ear accessory is another emerging class of wearable device that can obtain many physiological parameters, such as heart rate and oxygen saturation level. The ear is a suitable site for such sensors (Fig. 2-(1)) because muscle interference is limited owing to the composition of the central cartilage and the presence of arteries near the surface of the ear. Valencell, a major supplier of sensing technology, reported that the signals obtained through devices attached to the ear were 100 times clearer than those obtained through devices worn on the wrist. Thus, the use of ear-based WHM devices is trending [61].
Fig. 2 Several wearable health monitoring devices, including (1) SensoTRACK ear sensor, (2) Google Contact Lens, (3) BioPatch™, (4) Smartwatch Basis PEAK™, (5) QardioCore, (6) Vital Jacket® t-shirt, and (7) Moov
Most wearable devices are concerned with heart activity. These WHM devices are divided into three main types: chest straps (Fig. 2-(5)), adhesive patches (Fig. 2-(3)), and t-shirts with embedded sensors (Fig. 2-(6)). The former two types efficiently acquire signals pertaining to several vital parameters of the user’s body and health. However, they are not as comfortable and convenient as t-shirts with embedded sensors. E-textiles are made by combining electronic technologies with cloth materials and can obtain a larger number of physiological signals because they cover a larger body area than other WHM devices. To compare the three types of WHM devices (chest straps, t-shirts, and adhesive patches), the quality of their heart-activity-signal monitoring was analyzed.
The concept of e-textiles is applied in many fields, from fashion (e.g., light dresses) to medical science (e.g., monitoring of vital health and body parameters). Studies are being conducted to develop smart (intelligent) fabrics, that is, textiles that have the unique properties of electronic systems. These smart fabrics can be classified into two categories: metal yarns comprising conductive fibers and electroconductive yarns containing carbon-coated or polymeric threads. The development of smart textiles for WHM devices mainly focuses on textile electrodes (known as textrodes) that acquire signals from the human body. This technology is already used in some methods for acquiring vital health and body parameters. Currently, smart-textile-based WHM devices can be developed for many lifestyle and sport monitoring applications. However, clinical use places stricter requirements on materials because of the demands of device certification and signal quality.
Wet electrodes are the gold standard in medicine for acquiring electrocardiograms (ECGs). However, these electrodes cause skin discomfort and irritation. Textile-based electrodes are therefore an alternative to wet electrodes, with a slight compromise in signal quality. When wet electrodes are used, the total contact conductivity of the electrodes may increase because sweat reduces the adhesion between the electrode and the skin. The quality of heart-activity WHM devices ranges from measuring only the heart rate to measuring an ECG-quality waveform, depending on the contact between the sensor and the skin and on the influence of the acquisition hardware on the accuracy of the extracted signal [6]. In this study, the WHM devices include three kinds of heart-activity-monitoring devices: (1) HR devices, which acquire the R-peaks to evaluate the heart rate; (2) R-R interval devices, which determine the time at which each R-peak of an ECG signal occurs; and (3) ECG devices, which acquire the ECG waveform to mine morphologic feature parameters (peaks and valleys of the ECG waveform), diagnose cardiovascular diseases, and analyze cardiovascular rehabilitation.
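The relationship between the first two device classes can be shown with a short calculation: given the times at which R-peaks occur, the R-R intervals are the differences between successive peaks, and the mean heart rate in beats per minute is 60 divided by the mean R-R interval in seconds. The sketch below assumes that R-peak detection has already been done; the timestamps are made-up values.

```python
def rr_intervals(r_peak_times_s):
    """Successive differences between R-peak timestamps, in seconds."""
    return [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def mean_heart_rate_bpm(r_peak_times_s):
    """Mean heart rate: 60 divided by the mean R-R interval (s)."""
    rr = rr_intervals(r_peak_times_s)
    if not rr:
        raise ValueError("need at least two R-peaks")
    return 60.0 / (sum(rr) / len(rr))


# Hypothetical R-peak times (seconds since the start of the recording)
peaks = [0.00, 0.82, 1.65, 2.46, 3.30]
print(rr_intervals(peaks))
print(round(mean_heart_rate_bpm(peaks), 1))
```

An HR device only needs the final figure, whereas an R-R interval device keeps the individual intervals, which is why it requires more accurate peak timing.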
On the basis of the brand specifications of the proposed devices [1, 10, 29, 67, 77, 78, 145, 185, 203–206, 226, 227], the heart-activity-monitoring wearable devices are assessed along two dimensions (Fig. 3): the type of wearable device (adhesive patches, chest straps, and t-shirts) and the purpose of the device (fitness, sports, medical, and health). The purpose of the device determines the accuracy required of the heart-activity measurement (HR < R-R interval < ECG). Figure 3 shows that medical and health-related applications require heart-activity signals of higher accuracy and quality than fitness and sports applications.
Fig. 3 Heart trackers characterized by different types of WHM devices

2.3 Vital Signals for WHM
Different physiological signals of the human body can be measured, ranging from electrical signals to biochemical signals. Human biosignals can be used to assess the health condition of the human body and its response to external factors. Before examining how these signals are generated and acquired by wearable devices and sensors, the main biosignals that contribute to an efficient analysis of human health should be identified. Currently, technology and wearable scenarios allow WHM to be classified along three dimensions (Fig. 4): the setting of use (home, remote, or clinical environment), the kind of monitoring (offline or online), and the type of user (healthy individual or patient) [19]. In Fig. 4, the solid line indicates devices used for medical purposes, and the dotted line indicates devices used for activity purposes.
With respect to application, WHM devices are generally categorized into activity applications (fitness monitoring and wellness monitoring), nonmedical applications (self-rehabilitation monitoring), and medical applications. The medical applications can be further divided into three main subcategories: prediction, anomaly detection, and diagnosis support. Prediction involves identifying events in order to provide medical information that helps prevent further chronic problems and supports a diagnosis [19]. Anomaly detection involves identifying unusual patterns to distinguish outlier data from standard data; raising an alarm is a subtask particular to anomaly detection [19]. Diagnosis support is one of the basic tasks of clinical observation and monitoring and is used for making clinical decisions by combining information retrieved from health records, anomaly detection data, and vital signals [19].
Fig. 4 Illustration of four types of main data comprising prediction, activity, diagnosis/decision support, and anomaly detection concerning different perspectives of wearable sensing in WHM systems and devices
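As a concrete illustration of separating outlier data from standard data, the sketch below flags a sample when it lies more than a chosen number of standard deviations from the mean of the preceding window. The chapter does not prescribe a specific detector; the rolling z-score, the window length, and the threshold are assumptions made for this example, and the heart-rate values are invented.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=10, threshold=3.0):
    """Indices whose value deviates strongly from the preceding window."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as unusual
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical heart-rate stream (beats per minute) with one spike
stream = [70, 71, 69, 70, 72, 71, 70, 69, 71, 70, 70, 140, 71]
print(zscore_anomalies(stream))
```

Each flagged index is a candidate for the alarm subtask; in practice the detector would be tuned per signal and per user to limit false alarms.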
From all the health-related parameters that can be acquired from the human body, it is necessary to identify the most helpful medical and nonmedical (pertaining to activity, exercise, or sports) parameters. To illustrate the progress of scientific research on WHM devices, a survey of the Web of Science database was performed using the criteria presented in Table 3. In the survey, the search for wearable devices was conducted for two periods (2010–2015 and 2016–2019). For each period (i.e., each search in Web of Science), we searched for every “pair” of purpose and vital signal, excluding all other purposes and vital signals. An example of a search is as follows: “Topic: (Wearable) AND Topic: (Medical) NOT Topic: (Activity) AND Topic: (Body Temperature) NOT Topic: (Blood Pressure) NOT Topic: (Respiration) NOT Topic: (Glucose) NOT Topic: (Heart Rate) NOT Topic: (oxygen saturation) NOT Topic: (Electrocardiogram); Timespan: 2010–2015.” The vital signals used in the search were selected as those most frequently observed in the acquired literature, and the results were segregated into the two main fields of activity and medical (Fig. 5).
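The pairwise search procedure can be reproduced mechanically. The sketch below builds one query string per combination of period, purpose, and vital signal, following the pattern of the example search quoted above; the function and variable names are hypothetical, and the output is plain text to be pasted into a search form rather than a call to any Web of Science interface.

```python
PURPOSES = ["Medical", "Activity"]
VITAL_SIGNS = [
    "Body Temperature",
    "Blood Pressure",
    "Respiration",
    "Glucose",
    "Heart Rate",
    "oxygen saturation",
    "Electrocardiogram",
]
PERIODS = ["2010–2015", "2016–2019"]


def build_query(purpose, vital_sign, period):
    """Include one purpose and one vital sign; exclude all the others."""
    parts = ["Topic: (Wearable)", f"AND Topic: ({purpose})"]
    parts += [f"NOT Topic: ({p})" for p in PURPOSES if p != purpose]
    parts.append(f"AND Topic: ({vital_sign})")
    parts += [f"NOT Topic: ({v})" for v in VITAL_SIGNS if v != vital_sign]
    return " ".join(parts) + f"; Timespan: {period}"


queries = [
    build_query(purpose, sign, period)
    for period in PERIODS
    for purpose in PURPOSES
    for sign in VITAL_SIGNS
]
print(len(queries))
print(queries[0])
```

With two periods, two purposes, and seven vital signals, the survey amounts to 28 separate searches.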
Overall, five traditional vital signals, namely heart rate (HR), blood pressure (BP), respiration rate (RR), blood oxygen saturation (SpO2), and body temperature (BT), were identified as crucial. These five signals are generally used to assess human health conditions and to monitor patients continuously. Ahrens et al. [3] presented two novel physiological markers, capnography and stroke volume, and suggested that these signals should be measured immediately if a patient is in a critical medical condition. Similarly, Elliott and Coventry [52] described three further vital signals pertaining to human health: pain, urine output, and level of consciousness. These signals should also be regarded as part of regular patient monitoring. The authors indicated that combining the three additional signals with the five vital signals allows variations in a patient’s physiology to be recognized accurately. Electrocardiography is important for analyzing the electrical activity of the heart and for predicting and diagnosing cardiovascular diseases [215]. Monitoring the blood glucose level is essential in patients with diabetes mellitus, an endocrine disorder. Numerous studies have been conducted to develop noninvasive methods for blood-glucose-level monitoring [35]. To conduct WHM, vital signals should be analyzed and identified using data mining techniques. The next subsection describes the data analysis of vital signals used for WHM.
Table 3 WHM devices survey topics and restrictions

Topic                Wearable healthcare monitoring (WHM)
Years                2010–2015; 2016–2019
Intention            “Medical,” “Activity”
Vital signals words  “Body temperature,” “blood pressure,” “respiration,” “glucose,” “heart rate,” “oxygen saturation,” “electrocardiogram”

Fig. 5 The related scientific papers retrieved for the topics of WHM and monitored physiological signals (vertical axis: number of papers). Note: BP, blood pressure; BT, body temperature; BG, blood glucose; RR, respiration rate; HR, heart rate; ECG, electrocardiogram; SpO2, blood oxygen saturation

Signal  2010–2015 Medical  2010–2015 Activity  2016–2019 Medical  2016–2019 Activity
ECG     23                 24                  56                 46
SpO2    2                  2                   0                  3
HR      24                 86                  52                 297
BG      2                  2                   13                 23
RR      1                  2                   8                  5
BP      14                 6                   19                 41
BT      12                 9                   30                 48
Total   78                 131                 178                463

2.4 Data Analysis of Vital Signals Used for WHM
Owing to technological advances in healthcare and sensors, numerous data mining approaches have been proposed [13, 18, 22, 35, 138]. Sow et al. [187] categorized the primary procedure of sensor data mining into five stages: data acquisition, data preprocessing, data transformation, data modelling, and data evaluation. Moreover, in other studies [22, 222], data mining algorithms were subdivided into two types: (1) unsupervised or descriptive learning (i.e., clustering, association, and summarization) and (2) supervised or predictive learning (i.e., regression and classification). However, these studies lack an in-depth examination of the applicability of the algorithms to the specific features of sensor data in WHM systems and devices.
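The five stages can be made tangible with a toy supervised example. The sketch below is not taken from the cited studies: the window features (mean and range), the 1-nearest-neighbour model, the activity labels, and all numbers are hypothetical choices that merely show how acquired samples flow through preprocessing, transformation, modelling, and evaluation.

```python
from statistics import mean


def preprocess(samples):
    """Data preprocessing: drop missing readings (None)."""
    return [s for s in samples if s is not None]


def transform(samples, window):
    """Data transformation: non-overlapping windows -> (mean, range)."""
    features = []
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        features.append((mean(chunk), max(chunk) - min(chunk)))
    return features


def nearest_neighbor(train, query):
    """Data modelling: 1-nearest-neighbour with squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda item: dist(item[0], query))
    return label


def accuracy(train, test):
    """Data evaluation: fraction of test windows labelled correctly."""
    hits = sum(nearest_neighbor(train, feats) == label for feats, label in test)
    return hits / len(test)


# Data acquisition: hypothetical heart-rate readings with one dropout
acquired = [61, 60, None, 62, 119, 125, 131]
features = transform(preprocess(acquired), window=3)
train = [((60.0, 2.0), "rest"), ((120.0, 15.0), "exercise")]
print([nearest_neighbor(train, f) for f in features])
```

An unsupervised variant would replace the labelled training set with a clustering step; the surrounding stages would stay the same.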
In recent years, research on WHM systems has moved from simple calculations and measurements on wearable sensor data, such as computing sleep time or the number of steps in a single day, to a higher level of data and signal processing that promises to provide more useful details to end users. Thus, healthcare has focused more on in-depth data mining to obtain richer information representations. On the basis of the studies selected for analysis, three kinds of data mining tasks were identified in this study: anomaly detection, prediction, and diagnosis. Here, anomaly detection includes raising an alarm when an anomaly occurs. Diagnosis is a decision-making process in which the data are often classified into categories corresponding to diseases or other conditions. Figure 4 illustrates the three tasks from a three-dimensional (3-D) perspective. The first dimension is the setting in which monitoring is conducted. Most monitoring applications in home and remote settings predominantly involve anomaly detection and prediction, whereas applications in clinical settings generally focus on diagnosis [35, 188]. This is because increasing attention is being paid to achieving more preventive care (prediction) with wearable sensors and to promoting independent living at home through an increased sense of security (alarm). In clinical settings, in turn, enough information is available for diagnosis and decision-making [22]. The second dimension concerns the data mining tasks that are important for each type of user. For patients with known medical records, WHM devices with diagnosis capabilities and the ability to raise alarms are crucial. The use of such devices by healthy individuals to maintain good health through monitoring, prediction, and anomaly detection has also been reported in the literature [135]. The final dimension concerns how the data are processed. For all three tasks, data have been processed both offline and online. Moreover, a large number of alarm-related tasks use online processing for continuous monitoring [187].
The central role of data mining in WHM systems is information retrieval, such as anomaly detection, diagnosis decision-making, and prediction. According to previous studies [138, 187], most healthcare systems deal with issues related to the following aspects: (1) data acquisition through a sufficient set of sensors, (2) data transmission from the patient to the doctor, (3) data integration with other descriptive data, and (4) data storage. These aspects are part of all physiological data processing frameworks, which carry out data mining tasks such as noise removal, data cleaning, data compression, and data filtering. Several data mining techniques are commonly applied to conduct these tasks, such as wavelet analysis for both data compression [49] and artifact reduction [139], rule-based methods for data transmission and summarization [2, 223], and a Gaussian processing approach for secure authentication [209]. These tasks must be conducted because, in the real world, WHM systems often handle continuous and unlabeled data [18].
The role of data analysis in a health monitoring system is to acquire information
from low-level sensor data and transform it into high-level information. Thus,
recent health monitoring systems have paid more attention to the data processing
phase to obtain more information that is valuable with respect to the requirements
of expert users. Data mining techniques were among the first to be used on wearable
sensor data in health monitoring systems. In this subsection, we summarize the
approaches that are frequently applied to process wearable sensor data to provide
valuable information. Regardless of the specific data mining technique, the most
extensively applied, standard approach to mining information from wearable sensors
is presented in Fig. 6.
Fig. 6 The generic architecture of the advanced data mining method for the data of wearable sensors
Fig. 7 The relationship between artificial intelligence (AI), machine learning (ML), and deep learning (DL)
Raw sensor data are typically the starting point of the data mining method (Fig. 7).
The sensor data are used either as training data, from which the system learns and
builds models of the features, or as test data, to which the designed model is
applied to assess its real-world use and derive results. This flow is common to both
supervised and unsupervised data mining solutions for any data mining task. The
primary procedures of the data mining method are as follows:
1. Preprocessing of the data Preprocessing of the raw data obtained in the
healthcare domain is necessary because motion artifacts, noise, and sensor errors
can appear in any wearable sensor network in real-life scenarios. Preprocessing
involves the following steps: (1) filtering unusual data to remove artifacts,
usually by applying threshold-based methods [129] or statistical measures to fill
in missing data points [86], and (2) removing high-frequency noise [69, 187]. The
major challenges of the preprocessing phase in healthcare systems were presented in
a previous study [187]. These include the formatting, normalization, and
synchronization of data, because the accumulated sensor data are often unreliable
and abundant [69].
2. Extraction and Selection of Features Commonly, data mining is conducted on
extensive real-world datasets to retrieve valuable information. Feature extraction
aims to find the main features of datasets [74]. In particular, feature extraction
provides a meaningful representation of the abundant and complex raw data obtained
from wearable sensors to formulate a relationship between the raw data and the
expected knowledge for decision-making [24]. Because wearable sensor data
pertaining to vital parameters tend to take the form of continuous time series, the
majority of the features considered relate to the properties of time series
signals [40]. Signals can be analyzed in the time domain and the frequency
domain [15]. In the time domain, the acquired features generally comprise basic
waveform characteristics and statistical parameters related to the apparent nature
of the data stream, for instance, variance, mean, and peak counts [9]. Time-domain
features are common for physiological data because the traditional decision-making
frameworks for vital parameters are based on notable trends in the signal [203].
However, to obtain additional information related to the periodic behavior of time
series data, medical research has focused more on features obtained from the
frequency domain, for instance, the power spectral density, high-pass or low-pass
filters, spectral energy, and wavelet coefficients of the signal [68, 69]. For
example, even though Bsoul et al. [32] proposed a great number of features for ECG
signals, the major concern of their study was the R points (the peak point of each
beat) and their behavior in ECG pulses, for instance, the R-R interval and the peak
count [24, 190].
Feature selection is a way to select more discriminative features, depending on the
number of features acquired from the raw data and the capability of the learning
approach to handle them. Feature selection methods often find a subset of the
extracted high-dimensional features that are uncorrelated and contribute to the
performance of the learner [129]. Applied to physiological data, feature selection
can reduce the scale of the input data. The three most prevalent methods for
dimension reduction in the medical field are LDA, ICA, and principal component
analysis (PCA) [210]. These approaches select the subset of the features that are
statistically most important [74, 116]. Other tools for feature selection include
Fourier transforms [99], analysis of variance (ANOVA) [63], and threshold-based
rules [9].
Even though the bulk of the frameworks presented in healthcare include a feature
extraction or selection phase, the main challenge remains balancing optimal feature
extraction (or selection) methods against system cost. For example, using feature
selection in real-time systems is costly, whereas omitting it decreases the accuracy
of the modeling results. This challenge is directly related to (1) the health
parameters that are selected in the system and (2) the data mining tasks or the
objective of the health monitoring system. Nevertheless, a solution to this
challenge has yet to be presented.
3. Modeling and Learning Method The methods of modeling and learning are crucial in
WHM and are the core of this study. In general, the methods involve statistical
algorithms and ML algorithms. Statistical algorithms include decision trees [121,
152, 219], Gaussian mixture models [43, 209], HMMs [17, 156, 198, 233], rule-based
methods [5, 97], statistical tools [87, 190], and wavelet-based analysis approaches
in the frequency domain [36, 49, 99, 178]. ML has progressed rapidly in recent
decades and is examined in the next section.
Other parameters are also essential to ML and data mining approaches, for instance,
electronic health records, historical measurements, expert knowledge, and
anthropometric parameters (e.g., sex, age). Such metadata provide context for the
analysis and improve the knowledge acquisition process [68, 209]. For example, any
healthcare system that uses HR sensor data needs to account for the effects of
metadata, such as medication, weight, age, and sex, to obtain meaningful reasoning
(e.g., about irregular baseline heart rates) or to personalize the critical pulse
values on the basis of these metadata [82].
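The preprocessing and feature extraction steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the valid range, the moving-average width, the peak height, the sampling rate, and all function names are hypothetical choices, and the moving average merely stands in for whichever low-pass filter a real system would use.

```python
from statistics import mean, pvariance


def remove_artifacts(signal, low, high):
    """Threshold-based filtering: mark out-of-range samples as missing (None)."""
    return [x if low <= x <= high else None for x in signal]


def fill_missing(signal):
    """Fill missing samples by linear interpolation (needs >= 1 valid sample)."""
    filled = list(signal)
    valid = [i for i, x in enumerate(signal) if x is not None]
    for i, x in enumerate(signal):
        if x is None:
            left = max((j for j in valid if j < i), default=None)
            right = min((j for j in valid if j > i), default=None)
            if left is None:
                filled[i] = signal[right]
            elif right is None:
                filled[i] = signal[left]
            else:
                w = (i - left) / (right - left)
                filled[i] = (1 - w) * signal[left] + w * signal[right]
    return filled


def moving_average(signal, width=3):
    """Simple low-pass filter: centered moving average damps high-frequency noise."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def peak_indices(signal, min_height):
    """Indices of local maxima above `min_height` (e.g., R peaks in an ECG trace)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > min_height and signal[i - 1] < signal[i] >= signal[i + 1]]


def time_domain_features(signal, min_height, sampling_rate_hz):
    """Mean, variance, peak count, and peak-to-peak intervals in seconds."""
    peaks = peak_indices(signal, min_height)
    intervals = [(b - a) / sampling_rate_hz for a, b in zip(peaks, peaks[1:])]
    return {"mean": mean(signal), "variance": pvariance(signal),
            "peak_count": len(peaks), "peak_intervals_s": intervals}


# Toy trace sampled at an assumed 4 Hz; the value 9.9 plays the role of an artifact.
raw = [0.0, 0.1, 1.0, 0.1, 0.0, 9.9, 0.0, 0.1, 1.0, 0.1, 0.0]
clean = fill_missing(remove_artifacts(raw, low=-2.0, high=2.0))
smooth = moving_average(clean, width=3)
features = time_domain_features(smooth, min_height=0.38, sampling_rate_hz=4.0)
```

The peak intervals returned here correspond to the R-R intervals mentioned in the text when the trace is an ECG signal.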
3 Traditional Machine Learning-Based Approaches
Extensive data are produced when WHM sensors are deployed to monitor the health of
a person in a home environment. Moreover, these data can be multivariate, with
possible dependencies, when multiple sensors are employed. Thus, suitable data
processing methods are indispensable to make the data intelligible [187]. In this
section, we outline the most common ML algorithms applied to wearable sensor data.
The technical details of each algorithm are presented with typical examples to show
how the algorithm is used in healthcare services. Moreover, the usability,
efficiency, and relevant challenges of each technique in the medical field are
discussed. Artificial intelligence (AI) has been an emerging technology over the
last two decades. ML and DL are the present state-of-the-art techniques for system
health monitoring and machine fault diagnosis [55, 56, 230] and are promising for
rapidly and precisely performing WHM for human activities. The relationship between
AI, ML, and DL is presented in Fig. 7.
Before the rise of DL, various classical ML and data mining algorithms had been in
use for decades, e.g., artificial neural networks (ANN) based on backpropagation
(BP-ANN) and SVM. The application of classical ML algorithms requires considerable
expertise and sophisticated feature engineering because an in-depth exploratory
data analysis often has to be conducted on the dataset first. Subsequently, a
dimension reduction step can be performed using techniques such as PCA to make the
data easier to handle. Finally, the best features have to be selected carefully and
passed to the ML algorithms. Knowledge of typical ML differs considerably between
fields and applications and often requires extensive professional expertise within
each domain. Because the major focus of this review is on DL-based methods, this
section gives a concise summary of each classical ML method with a complete
reference list.
3.1 ANN
The neural network (NN) is an AI approach that is extensively used for
classification and forecasting [150]; its structure is displayed in Fig. 8. NNs
model the training data by learning from records whose classification is known and
comparing the known classes with the predicted classes of the records to adjust the
network weights for the next learning iterations. NNs are currently the most
prevalent method for data modeling in the medical field owing to their acceptable
predictive performance [21, 22]. NNs can model nonlinear systems, for instance,
physiological records in which the relationship between the input parameters is
difficult to detect.
A broad range of decision-making and diagnosis tasks has been performed by NNs in
the medical field. NNs have been applied to multisensor networks and to
sophisticated multivariate data analysis. The multilayer perceptron (MLP) NN was
used in a previous study [115] to evaluate the pulse quality in PPG. In that NN,
several individual signal quality metrics are used as the input. Then, the number
of nodes (2–20) in the hidden layer is optimized over the validation iterations.
The network outputs two categories of signal quality. Standard indices, such as
specificity, accuracy, and sensitivity, were employed to evaluate the task. Vu
et al. [207] presented a framework for recognizing heart rate variability patterns
from accelerometer sensors and ECG. This approach uses a three-layer NN to learn
the obtained patterns incrementally and perform the classification. In the output
layer, three nodes were used for three data classes: activity, location, and heart
status. The replicator NN (RNN) [30] is another kind of NN, which is usually
applied for outlier and anomaly detection. Recently, Chatterjee et al. [37]
proposed an RNN to forecast blood glucose levels. The network was designed with 11
input variables, 3 hidden layers with 8 neurons each, and 1 output node that
provides the predicted blood glucose level. Another study performed online
classification of sleep and awake states [99] using a feedforward NN on ECG and RR
features in the frequency domain. The network was devised using three architectures
(with no hidden layers) that differ in the type of input signal applied. For
evaluation, the investigation used other clinical parameters (e.g., EEG) to monitor
and label the collected data. Many other studies have also used NNs [69, 147].
Fig. 8 ANN structure (input, hidden, and output layers)
In summary, because learning in NNs is a complicated task, the NN method is
generally applied for decision-making in clinical situations that involve complex
and large datasets. However, this model cannot incorporate domain knowledge to
enrich the results. Moreover, because the modeling process in an NN is a black box,
methods employing NNs need to be justified for each input dataset. Thus, NN-based
methods are not considered techniques that can be easily applied to diverse
datasets.
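To make the weight-adjustment loop described in this subsection concrete, the sketch below trains a one-hidden-layer perceptron with backpropagation on a toy two-class problem. It is a minimal illustration, not the architecture of any cited study: the class name `TinyMLP`, the sigmoid activations, the squared-error loss, the learning rate, and the toy "signal quality" data are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One-hidden-layer perceptron trained with backpropagation (illustrative)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, output

    def loss(self, data):
        """Mean squared error between predicted and known classes."""
        return sum((self.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

    def train_step(self, x, y, lr):
        hidden, out = self.forward(x)
        # Error signal at the output node (squared error through a sigmoid).
        delta_out = 2.0 * (out - y) * out * (1.0 - out)
        for j, h in enumerate(hidden):
            # Propagate the error to hidden unit j before its weight changes.
            delta_h = delta_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * delta_out * h
            self.b1[j] -= lr * delta_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
        self.b2 -= lr * delta_out


# Toy data: two quality metrics per pulse, label 1.0 = good, 0.0 = poor (assumed).
data = [([0.9, 0.8], 1.0), ([0.8, 0.9], 1.0), ([0.2, 0.1], 0.0), ([0.1, 0.3], 0.0)]
net = TinyMLP(n_in=2, n_hidden=3)
loss_before = net.loss(data)
for _ in range(2000):
    for x, y in data:
        net.train_step(x, y, lr=0.5)
loss_after = net.loss(data)
```

The number of hidden nodes is a free parameter here, which mirrors the tuning over 2 to 20 nodes reported in the text.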
3.2 Kriging Model
The Kriging surrogate model was first proposed by Danie G. Krige in 1951 in the
field of geostatistics [48]. The term “Kriging” was coined by Matheron, who first
formulated the Kriging model mathematically in 1963 [64]. In 1973, Matheron applied
the Kriging model to mineral reserve estimation and the corresponding error
evaluation [65]. Sacks et al. were the first to apply Kriging models to the design
and analysis of computer experiments [91]. In their method, the points of the input
space are treated as analogous to geographic coordinates. Kriging models have been
widely used in many application scenarios in recent decades, for instance, the
design optimization of engineering structures and other systems. Li et al.
discussed the use of the Kriging model with a multiobjective genetic algorithm for
the engineering optimization of a gear train [125]. In the biomedical engineering
field, Li et al. studied the design optimization of a stent and its dilatation
balloon using the Kriging surrogate model with high-accuracy analysis [84]. Simpson
et al. applied the Kriging model to the multidisciplinary design optimization of an
aerospike nozzle [193]. Zhao et al. developed a dynamic Kriging modeling method to
address transient problems in structural design optimization [107]. Liem et al.
combined an expert's method with the Kriging model to predict aerodynamic
performance, enabling accurate and efficient aircraft mission analysis [57, 107,
161, 193]. The Kriging model has also been applied in structural reliability
analysis [57, 84, 91, 107, 125, 161, 193], where it proved to be fairly efficient
and accurate in dealing with high-dimensional and highly nonlinear problems.
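The idea of treating input points like geographic coordinates can be illustrated with a small ordinary Kriging predictor for one-dimensional inputs. This is a sketch under stated assumptions: the Gaussian correlation function, the fixed width parameter `theta`, and all function names are illustrative choices, and no cited study is claimed to use this exact form.

```python
import math


def gauss_corr(a, b, theta):
    """Gaussian correlation between two 1-D inputs (theta is an assumed width)."""
    return math.exp(-theta * (a - b) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_kriging(xs, ys, theta=1.0):
    """Estimate the constant mean and the weights R^{-1}(y - mu) from samples."""
    corr = [[gauss_corr(a, b, theta) for b in xs] for a in xs]
    r_inv_y = solve(corr, ys)
    r_inv_1 = solve(corr, [1.0] * len(xs))
    mu = sum(r_inv_y) / sum(r_inv_1)  # generalized least-squares mean
    weights = solve(corr, [y - mu for y in ys])
    return {"xs": xs, "theta": theta, "mu": mu, "weights": weights}


def predict_kriging(model, x_new):
    """Kriging prediction: mean plus correlation-weighted residuals."""
    r = [gauss_corr(x_new, a, model["theta"]) for a in model["xs"]]
    return model["mu"] + sum(ri * wi for ri, wi in zip(r, model["weights"]))


# Toy response sampled at four design points (values are illustrative).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.8, 0.9, 0.1]
model = fit_kriging(xs, ys, theta=1.0)
estimate = predict_kriging(model, 1.5)
```

Without a noise term the predictor passes exactly through the sampled points, which is why Kriging is attractive as a surrogate for expensive simulations.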
3.3 SVM
SVM is a major statistical learning method that can classify unseen data by
deriving selected features and constructing a high-dimensional hyperplane that
divides the data points into two categories to build a decision model [45]. SVMs
are currently used extensively for mining physiological data in medical
applications owing to their ability to manage high-dimensional data using a minimal
set of training features.
The standard health parameters considered with the SVM method include SpO2, HR, and
ECG. Hu et al. [86] used SVM to diagnose arrhythmia from ECG signals. They used a
binary SVM classifier to separate ECG signals into regular and arrhythmia
categories. Similarly, another study [111] proposed an SVM method to detect seizure
episodes and arrhythmia from ECG signals. That study indicated that an SVM with
polynomial kernels performs better than SVMs with other kernels. In another
study [25], researchers used a one-against-all SVM method to handle multilabel
classification for detecting the condition of the patient. On the basis of expert
labels for the episode data (four severity levels), several binary SVMs with
distinct kernels, such as sigmoid, polynomial, and RBF, were combined with
simulated input data obtained from multisensor features. Researchers have also
discussed a specific application of SVM classifiers [197]. That study confirmed the
deterioration in the condition of patients with chronic arthritis and gastritis by
classifying ECG radial pulses as abnormal or normal using the SVM algorithm. The
performance of this approach was evaluated in terms of accuracy, specificity, and
sensitivity.
In general, SVM techniques are usually proposed for anomaly detection and
decision-making in healthcare services. Nevertheless, the SVM method cannot
integrate domain knowledge, in the form of symbolic knowledge or metadata, with the
sensor measurements. Moreover, like other classifiers, SVM cannot be used to
discover unexpected information from unlabeled data.
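The binary, hyperplane-based decision model described in this subsection can be sketched with a linear SVM trained by subgradient descent on the regularized hinge loss. The training procedure, the hyperparameters, the function names, and the toy features are assumptions for illustration; the cited studies use kernel SVMs whose details are not reproduced here.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the regularized hinge loss."""
    dim = len(samples[0])
    n = len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # sample violates the margin: hinge loss is active
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy features for two classes, e.g., +1 = arrhythmia and -1 = regular (assumed).
samples = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
predictions = [svm_predict(w, b, x) for x in samples]
```

A one-against-all scheme such as the one mentioned in the text would train one such binary classifier per class and pick the class with the largest decision value.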
3.4 PCA
PCA is an algorithm that reveals the internal structure of the data in the way that
best explains its variance. If a multivariate dataset is visualized as a set of
coordinates in a high-dimensional data space (one axis per variable), PCA can
supply the user with a lower-dimensional projection of this object, viewed from its
most informative viewpoint. Because the sensitivity of the different features that
characterize a bearing defect may change remarkably under various operating
conditions, PCA has proven to be a systematic and practical feature selection
option that provides guidance for manually choosing the most representative
features for defect classification.
One of the earliest applications of PCA to bearing fault diagnosis was reported in
a previous study [88]. This investigation showed that the PCA method can classify
bearing faults with higher precision and fewer feature inputs. Similarly, other
PCA-based studies [214] have used this data mining ability of PCA to support the
manual feature selection process. Recently, PCA was also applied to the automated
diagnosis of coronary artery disease [69].
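The lower-dimensional projection described in this subsection can be sketched as follows. The example finds only the first principal direction, using power iteration on the sample covariance matrix; the function names, the starting vector, and the toy data are assumptions, and practical code would normally rely on a numerical library instead.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return the feature means and the leading principal direction of `rows`."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    vec = [1.0] * dim  # assumed start; must not be orthogonal to the answer
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return means, vec


def project(rows, means, vec):
    """Project each row onto the principal direction (one score per row)."""
    return [sum((r[j] - means[j]) * vec[j] for j in range(len(vec))) for r in rows]


# Toy data: the second feature is exactly twice the first, so one axis suffices.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, direction = first_principal_component(rows)
scores = project(rows, means, direction)
```

Here the two correlated features collapse onto a single score per sample without losing information, which is the sense in which PCA reduces the number of feature inputs.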
3.5 k-NN
The k-NN algorithm is a nonparametric approach for either regression or
classification. In k-NN classification, the output is the class membership of the
object, which is assigned by the majority vote of its neighbors. Applications of
k-NN to personal health monitoring and disease diagnosis have not been found to
date, although k-NN has been widely adopted in rotor fault diagnosis [92, 170]. An
early implementation of the k-NN classifier for the fault diagnosis of bearings was
reported in one study [47]. In that investigation, k-NN serves as the core
algorithm of a data mining classifier for ceramic bearing faults based on acoustic
emission signals. Similarly, other investigations [11, 23, 103] have applied k-NN
to perform distance analysis on every new data sample and determine whether the
sample belongs to a specific fault type.
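The majority-vote rule can be written in a few lines. In this sketch the Euclidean distance, the value of `k`, the function name, and the toy feature vectors are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature samples labelled by condition (values are illustrative).
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
labels = ["normal", "normal", "normal", "fault", "fault", "fault"]
near_normal = knn_classify(points, labels, (0.2, 0.2))
near_fault = knn_classify(points, labels, (0.8, 0.8))
```

Because there is no training phase, every new sample is compared against the stored samples, which is the distance analysis referred to above.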
3.6 Other Traditional ML Algorithms
In addition to the commonly applied ML methods mentioned above, many other
algorithms with different features have been used for WHM, such as decision
trees [121, 152, 219], Gaussian mixture models [43, 69], HMMs [17, 156, 198, 233],
rule-based methods [5, 97], statistical tools [87], and wavelet-based analysis
approaches in the frequency domain [36, 178].
Feature extraction is the most important step among the stages of the WHM framework
because the performance of a WHM system is closely related to the extraction of
distinctive and relevant feature vectors. Much research has been performed to
improve WHM recognition systems using expert-driven features [58]. Traditional
handcrafted feature learning methods are simple and easy to understand and have
been widely used in activity recognition. However, the feature vectors extracted by
these methods depend on the application or task and cannot be transferred to
similar activity tasks. In addition, such features cannot represent the salient
characteristics of complex activities, and time-consuming feature selection
techniques are required to select the best features [218]. Furthermore, there is no
general procedure for selecting appropriate features. Therefore, many studies rely
on heuristic methods based on feature engineering knowledge.
To solve the above problems, researchers have studied automatic feature extraction
through DL techniques, which require fewer human and material resources [109]. DL
is a newer branch of ML that models high-level data features and has developed into
an important technology for human health recognition. DL is composed of multiple
layers of NNs, which represent features from low to high levels, and has become an
important research field in natural language processing, object recognition, image
analysis, environmental monitoring, and machine translation [73]. In recent years,
various DL methods have been combined at different levels to form DL models that
enhance the robustness, flexibility, and performance of the system, which promises
to eliminate the dependence on traditional handcrafted features.
4 Advanced Methods Based on Deep Learning
DL is a part of ML and AI techniques and provides considerable power and
flexibility for feature extraction by learning to represent the world as a nested
hierarchy of concepts. Here, each concept is built from simpler concepts, and less
abstract representations are used to compute more abstract ones. The framework of
the DL network is displayed in Fig. 9.
DL has advanced considerably since its introduction in 2006 [80]. The popularity of
DL research is owing to its ability to extract salient features from raw sensor
data without relying on handcrafted features. Moreover, in the field of human
activity recognition, for example, complicated human activities are hierarchical
and translation invariant. Thus, the same activities can be performed in different
ways by the same participants. Sometimes, an activity can be a preliminary stage of
other, more complicated activities. Jogging and running might not be
distinguishable, depending on the health condition and age of the persons
performing the activities. As an ML technique, DL [26] uses representation learning
to automatically learn features from raw sensor data. DL techniques differ from
classical ML techniques (such as SVM, k-NN, and k-means), which require handcrafted
features to perform optimally [109]. Over the years, DL has been extensively used
in speech recognition [80], image recognition [192], natural language
processing [191], and medicine and pharmacy [127]. Recently, DL has found
applications in human activity recognition [118, 157, 167].
Fig. 9 The framework of deep learning network
Fig. 10 Different frameworks of deep learning algorithms
Many DL methods [109, 177] have been proposed in recent years. Some of them are
listed in Fig. 10, such as the deep autoencoder, restricted Boltzmann machine
(RBM), recurrent NNs (RNNs), sparse coding, and convolutional NN (CNN). These
methods are reviewed in the following subsections, and the features, merits, and
weaknesses of each method are summarized.
4.1 RBM
RBM [59] is a generative model that serves as the main building block in the greedy
layer-by-layer feature learning and training of deep NNs. The model is trained with
contrastive divergence to provide an unbiased estimate of maximum likelihood
learning. However, it is difficult for an RBM to converge to a local minimum and to
represent variant data. Moreover, choosing parameters such as momentum, weight
decay, learning rate, sparsity, and mini-batch size is crucial for achieving the
optimal result [38, 80], yet knowing how to set them is challenging. An RBM
comprises visible units together with hidden units, which are restricted to form a
bipartite graph so that the algorithm can be executed efficiently. Accordingly, the
weights connect neurons in the hidden and visible layers only, with no
hidden-hidden or visible-visible connections, so that the units within a layer are
conditionally independent. To provide efficient feature extraction, several RBMs
are stacked to form visible-to-hidden layers; meanwhile, the top layers are fully
connected to or embedded in a classical ML model to discriminate between feature
vectors [59]. However, problems such as class variation, inactive hidden neurons,
intensity, and sensitivity to large datasets make RBM training difficult. Recently,
methods such as regularization using a noisy rectified linear unit [137] and the
temperature-controlled RBM [113] have been proposed to resolve these problems. RBM
has been widely investigated for feature extraction and dimension reduction [79],
as well as for modeling high-dimensional data from motion and video sensors [195].
The two well-known RBM-based methods in the literature are the deep Boltzmann
machine (DBM) and the deep belief network (DBN), which are shown in Fig. 11.
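For reference, the bipartite structure and the contrastive divergence training mentioned above are usually written as follows for a binary RBM. This is the standard textbook formulation rather than a result of the cited studies; the symbols (visible vector v, hidden vector h, biases a and b, weight matrix W, learning rate epsilon, logistic function sigma) are notation assumed here.

```latex
% Energy of a joint configuration of visible units v and hidden units h
E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h}

% No visible-visible or hidden-hidden connections, so the conditionals factorize
p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big), \qquad
p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big)

% One-step contrastive divergence (CD-1) weight update
\Delta W_{ij} = \epsilon \big( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \big)
```

The factorized conditionals are what allow all hidden units (or all visible units) to be sampled in parallel during training.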
DBN [80] is a DL algorithm that is trained in a greedy layer-by-layer manner by
stacking several RBMs to extract hierarchical features from raw sensor data. A DBN
has directed connections in the lower layers and undirected connections in the top
layer, which makes it possible to model the observed distribution between the
vector space and the hidden layers. Training is likewise conducted layer by layer,
with the weights fine-tuned through contrastive divergence. Next, the conditional
probability distribution of the data is computed to learn robust features that are
invariant to displacement, noise, and transformation [80]. The DBN is depicted with
the different colored boxes in Fig. 12.
Fig. 11 Representation of restricted Boltzmann machine. (a) Deep belief network. (b) Deep Boltzmann machine (each panel: visible layer, hidden layers 1 to N)
Fig. 12 The architecture of DBN [80]
DBM [173] is a generative model. It has several hidden layers, and the connections
between all layers of the network are undirected. DBM learns features
hierarchically from the data: the features learned in one layer are used as latent
variables for the next layer. Similar to DBN, DBM uses a Markov random field to
pretrain on massive unlabeled data layer by layer and uses a bottom-up pass to
provide feedback. Moreover, the algorithm is fine-tuned by backpropagation.
Fine-tuning allows for variational inference and enables the algorithm to be
deployed for specific classification or activity recognition tasks. The DBM
training process [173, 174] involves maximizing the lower bound of the likelihood
with the stochastic maximum likelihood algorithm [224]. In this case, the training
strategy has to determine the training statistics and weight initialization, update
after each mini-batch, and replace random binary values with deterministic
real-valued probabilities. The main disadvantage of DBM is that it takes a lot of
time when a considerable number of parameters must be optimized. Montavon et al.
[134] proposed a centering optimization method that stabilizes the learning
algorithm and a medium-sized DBM for discriminative and generative modeling.
4.2 Autoencoders
The use of autoencoders was proposed in the 1980s as an unsupervised pretraining
approach for ANN [177]. After decades of development, autoencoders have been
extensively applied as a greedy layer-wise pretraining tool for NNs and as an
unsupervised learning tool. The training procedure of an autoencoder with a single
hidden layer is described in Fig. 13. An autoencoder is trained as an ANN that
consists of two parts: an encoder and a decoder. The output of the encoder is sent
to the decoder as input. The ANN uses the mean square error between the output and
the original input as the loss function, and its main purpose is to reproduce the
input as the final output. After the ANN is trained, the decoder is discarded and
only the encoder is retained. Thus, the output of the encoder is a feature
representation that can be used by the classifiers of the next stage.
Fig. 13 Training process of a single hidden layer autoencoder [180]
The deep autoencoder approach reproduces the input values as the output values, as
described in Fig. 14. During WHM, the deep autoencoder method generates the most
discriminative features from unlabeled sensor data and projects them into a
lower-dimensional space using encoder and decoder units. The encoder transforms the
sensor input into hidden features; these features are then reconstructed by the
decoder to approximate the input values while minimizing the error rate [119, 208].
The method provides data-driven feature learning techniques that avoid the problems
commonly inherent in handcrafted features. An autoencoder should be trained such
that the number of hidden units is smaller than the number of inputs or outputs, so
that it can provide lower-dimensional discriminative features for activity
recognition with less computation time [160]. In addition, the deep autoencoder
algorithm uses multiple layers of encoder units to transform high-dimensional data
into low-dimensional feature vectors, which makes the computation easier. Owing to
its complexity, the deep autoencoder algorithm is pretrained using RBMs [79], and
higher-level feature representations can be obtained by stacking multiple levels of
autoencoders [208]. In general, different kinds of autoencoders, such as the
contractive autoencoder, sparse autoencoder, and denoising autoencoder, have been
proposed to ensure robust feature representations for ML applications.
Fig. 14 Deep autoencoder encoding and decoding process
Denoising autoencoders were introduced to learn robust feature representations
stochastically from corrupted data (such as sensor values) by partially corrupting
the original input samples [201]. A denoising autoencoder is therefore trained to
reconstruct the input samples from their corrupted versions, which are produced by
a stochastic mapping that sets randomly chosen values to zero. Like other
unsupervised DL models, the denoising autoencoder is trained with layer-by-layer
initialization. Each layer is trained to produce the input of the next higher
level, which ensures a robust structure of the autoencoder network and captures the
statistical dependencies and regularities in the distribution of the input data.
Furthermore, a stacked denoising autoencoder can learn useful representations from
corrupted versions of the input samples with low classification errors [202].
Recently, a stacked denoising autoencoder was used for recognizing complicated
activities [148].
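The stochastic corruption step can be sketched as below. Only the masking noise and the reconstruction target are shown; the function names, the drop probability, and the toy sample are assumptions, and no trained network is included.

```python
import random


def mask_corrupt(sample, drop_prob, rng):
    """Stochastic mapping: set each value to zero with probability `drop_prob`."""
    return [0.0 if rng.random() < drop_prob else x for x in sample]


def reconstruction_error(original, reconstruction):
    """Mean squared error measured against the clean (uncorrupted) input."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)


rng = random.Random(42)
clean = [0.8, 0.6, 0.9, 0.7, 0.5, 0.4]
noisy = mask_corrupt(clean, drop_prob=0.3, rng=rng)
# A trained denoising autoencoder would map `noisy` back toward `clean`;
# here the corrupted input itself stands in for an untrained reconstruction.
baseline_error = reconstruction_error(clean, noisy)
```

The key design point is that the loss compares the reconstruction with the clean sample, not with the corrupted one that the encoder actually sees.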
The sparse autoencoder [130] is an unsupervised DL model proposed for learning overcomplete and
sparse feature representations of input data. In this kind of autoencoder, a sparsity term is
added to the model loss function so that the activations of some hidden units are driven close
to 0. The sparse autoencoder is well suited to tasks that involve complex, high-dimensional
input data, e.g., videos, images, and motion sensor data. The sparsity term allows general
feature representations to be learned. In addition, the learned features are linearly
separable, robust, and invariant to displacements, distortions, and other changes [232]. Thus,
the sparse autoencoder is very effective in acquiring low-dimensional features from
high-dimensional input data and in providing a compact interpretation of complicated input
data by a supervised learning method.
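One common way to realize the sparsity term is a Kullback-Leibler penalty between a small target activation and the mean activation of each hidden unit. The sketch below assumes that choice; the chapter does not state which sparsity term the cited work uses, and the names `kl_sparsity_penalty` and `sparse_loss`, the target `rho`, the weight `beta`, and the activation values are all illustrative.

```python
import math

def kl_sparsity_penalty(mean_activations, rho=0.05):
    """Sum over hidden units of KL(rho || rho_hat), where rho_hat is the mean activation."""
    total = 0.0
    for rho_hat in mean_activations:
        total += rho * math.log(rho / rho_hat)
        total += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return total

def sparse_loss(reconstruction_error, mean_activations, beta=3.0, rho=0.05):
    """Reconstruction error plus the weighted sparsity term."""
    return reconstruction_error + beta * kl_sparsity_penalty(mean_activations, rho)

# Hypothetical sigmoid activations of 3 hidden units for a batch of 2 samples
hidden_batch = [[0.02, 0.60, 0.05],
                [0.08, 0.40, 0.05]]
mean_acts = [sum(column) / len(column) for column in zip(*hidden_batch)]
loss = sparse_loss(0.2, mean_acts)
```

Units whose mean activation stays near the target contribute almost nothing to the penalty, whereas the second unit, which is active about half of the time, is penalized and pushed towards 0 during training.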
A study proposed the contractive autoencoder [162], which represents features effectively by
introducing a penalty term based on partial derivatives. The penalty is the sum of the squares
of all partial derivatives of the extracted features with respect to the input, so that the
features remain stable within a neighborhood of the input data. Besides, the penalty term
reduces the dimension of the feature space and makes the model invariant to changes and
distortions of the training data. Like the denoising autoencoder, the contractive autoencoder
is designed to be robust to small perturbations of the data samples. Unlike the denoising
autoencoder, however, the contractive autoencoder applies the penalty to the learned feature
mapping itself rather than to corrupted input samples.
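For a single sigmoid encoding layer, the sum of squared partial derivatives has a closed form, which the sketch below evaluates. It is a minimal illustration under that assumption: the names `encode` and `contractive_penalty`, the weights, and the input are hypothetical and do not come from the cited study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, weights, biases):
    """Hidden features h_j = sigmoid(sum_i W[j][i] * x[i] + b[j])."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def contractive_penalty(x, weights, biases):
    """Sum of squared partial derivatives dh_j/dx_i (squared Frobenius norm of the Jacobian).

    For a sigmoid layer, dh_j/dx_i = h_j * (1 - h_j) * W[j][i].
    """
    h = encode(x, weights, biases)
    return sum((hj * (1.0 - hj)) ** 2 * sum(w * w for w in row)
               for hj, row in zip(h, weights))

x = [0.5, -1.0, 2.0]                            # hypothetical input sample
W = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]        # hypothetical encoder weights
b = [0.0, 0.1]
penalty = contractive_penalty(x, W, b)
```

In training, this value would be multiplied by a weighting factor and added to the reconstruction error, so that features which change sharply for small input changes are discouraged.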
4.3 Sparse Coding
Sparse coding is an ML technique first proposed by Olshausen and Field [144] for learning
overcomplete bases that produce efficient representations of data. Sparse coding effectively
reduces the data dimension and describes the data dynamically as a linear combination of basis
vectors, which allows the sparse coding model to capture the data structure and to determine
the relationships between different input vectors [109]. In recent years, many studies have
proposed sparse coding methods for learning data representations, particularly for human
activity recognition, such as sparse fusion and shift-invariant methods [50]. These algorithms
provide feature dimension reduction strategies that lower the computational complexity of WHM.
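The idea of describing a sample as a sparse linear combination of basis vectors can be sketched with the iterative shrinkage-thresholding algorithm (ISTA). ISTA is used here only as one standard solver and is an assumption, not the method of the cited studies; the dictionary `atoms`, the sample `x`, and the penalty weight `lam` are hypothetical, and the dictionary is fixed rather than learned.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def reconstruct(coeffs, atoms):
    """Linear combination of the basis vectors (atoms) weighted by `coeffs`."""
    return [sum(c * atom[i] for c, atom in zip(coeffs, atoms))
            for i in range(len(atoms[0]))]

def objective(x, coeffs, atoms, lam):
    """0.5 * squared reconstruction error + lam * L1 norm of the coefficients."""
    residual = [xi - ri for xi, ri in zip(x, reconstruct(coeffs, atoms))]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(c) for c in coeffs)

def sparse_code(x, atoms, lam=0.1, n_iter=200):
    """Estimate sparse coefficients for `x` over a fixed dictionary with ISTA."""
    # Conservative step size: 1 / (sum of squared dictionary entries).
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        residual = [xi - ri for xi, ri in zip(x, reconstruct(coeffs, atoms))]
        coeffs = [
            soft_threshold(c + step * sum(a * r for a, r in zip(atom, residual)),
                           step * lam)
            for c, atom in zip(coeffs, atoms)
        ]
    return coeffs

# Overcomplete dictionary: 4 basis vectors for 3-dimensional data
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
x = [0.0, 2.0, 0.0]
codes = sparse_code(x, atoms)
```

The soft-threshold step is what produces exact zeros in the coefficient vector, so only a few basis vectors are used to describe each sample.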
4.4 CNNs
Convolution was first proposed to detect image patterns layer by layer, from simple features
to complex features [95]. Basic low-level visual features, such as edges and corners, are
detected by the early layers, and higher-level features, which are composed of these simple
low-level features, are detected by the subsequent layers. A CNN [120] is a deep NN with an
interconnected structure that is inspired by the convolution operation (Fig. 15).
Applying CNN convolution to raw data (such as sensor values) is one of the most studied and
mature DL techniques. CNNs have a wide range of applications in speech recognition, sentence
modelling, and image classification and have recently been widely used in human activity
recognition based on wearable and mobile sensors [73, 120, 167]. A CNN model generally
comprises convolutional layers, pooling layers, and fully connected layers (Fig. 16). These
layers are stacked to form a deep structure that automatically extracts features from the raw
sensor data [146]. The convolutional layer extracts feature maps using various kernel sizes
and strides and then pools the feature maps together to reduce the number of connections
between the convolutional and pooling layers. The pooling layer
Fig. 15 CNN architecture
Fig. 16 Deep convolutional neural network for WHM [120]
reduces the number of parameters and the size of the feature maps and makes the network
translation-invariant to distortions and changes. Researchers have proposed various pooling
strategies for different CNN implementations, including spatial, stochastic, average, and max
pooling [73]. Recent theoretical analyses and performance evaluations of these pooling
strategies have indicated that max pooling performs better than the other strategies. Thus,
max pooling has been widely used in DL training.
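The difference between max and average pooling can be illustrated on a one-dimensional feature map. This is a minimal sketch; the helper `pool1d`, the window size, the stride, and the activation values are assumptions chosen for illustration.

```python
def pool1d(feature_map, size, stride, reduce_fn):
    """Slide a window over a 1D feature map and reduce each window to one value."""
    return [reduce_fn(feature_map[i:i + size])
            for i in range(0, len(feature_map) - size + 1, stride)]

def mean(window):
    return sum(window) / len(window)

feature_map = [0.1, 0.9, 0.2, 0.4, 0.8, 0.3]    # hypothetical activations
jittered = [0.9, 0.1, 0.4, 0.2, 0.3, 0.8]       # same values, moved within each window

max_pooled = pool1d(feature_map, size=2, stride=2, reduce_fn=max)
avg_pooled = pool1d(feature_map, size=2, stride=2, reduce_fn=mean)
max_pooled_jittered = pool1d(jittered, size=2, stride=2, reduce_fn=max)
```

Both strategies halve the length of the feature map here. Max pooling additionally returns the same output for `feature_map` and `jittered`, which shows the tolerance to small shifts that the text attributes to the pooling layer.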
Moreover, recent WHM studies on human activity recognition have applied max pooling because of
its robustness in detecting minor changes [100]. Nevertheless, studies involving time series
analysis with DL showed that the discriminative ability of max pooling decreases [100]. Thus,
further experimental evaluation and analysis of pooling strategies are needed in human
activity recognition and time series applications to confirm their effectiveness. An
inference engine, e.g., an HMM, an SVM, or SoftMax, is integrated into the fully connected
layer and uses the feature vectors of the sensor data to identify activities [33, 53, 166].
In a CNN, the activation values of the units in each region of the network are calculated to
learn the patterns in the input data [146]. The output of the convolutional operation is
computed as
$C_i^{l,j} = \alpha\left(b_j^{l} + \sum_{m=1}^{M} W_m^{l,j}\, x_{i+m-1}^{l-1,j}\right)$,
where l is the layer index, α is the activation function, b is the bias term of the feature
map, W is the feature map weight, and M is the kernel/filter size. Weights can be shared to
reduce complexity and make the network easier to train. The concept of the CNN was derived
from a study by Hubel et al. [88]. They analyzed the structure of the visual cortex and found
that the cortex comprises a map of local receptive fields whose granularity decreases as the
cortex moves along the receptive fields. Since then, several other CNN architectures have
been proposed, such as GoogLeNet [94], VGG [105], and AlexNet [88].
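The convolutional output described above (a bias plus a weighted sum over a window of M inputs, passed through an activation function) can be sketched for a single one-dimensional feature map. The function name `conv1d_feature_map`, the ReLU activation, the kernel, and the signal values are assumptions for illustration; indices are zero-based in the code.

```python
def conv1d_feature_map(x, weights, bias, activation):
    """c_i = activation(bias + sum_m weights[m] * x[i + m]) for every valid position i."""
    kernel_size = len(weights)
    return [
        activation(bias + sum(weights[m] * x[i + m] for m in range(kernel_size)))
        for i in range(len(x) - kernel_size + 1)
    ]

def relu(z):
    return max(0.0, z)

signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0]    # hypothetical single-axis sensor samples
rising_kernel = [-1.0, 1.0]                 # responds where the signal increases
feature_map = conv1d_feature_map(signal, rising_kernel, bias=0.0, activation=relu)
```

The same two weights are reused at every position of the signal, which is the weight sharing mentioned in the text: the layer has M weights and one bias regardless of the length of the input.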
Recently, CNN architectures that combine various DL techniques with CNNs were also proposed
[94, 146]. For instance, DeepConvLSTM was developed to replace the CNN pooling layers with
the long short-term memory (LSTM) units of a recurrent NN (RNN) [94]. Moreover, convolutional
DBNs (CDBNs) were developed to exploit the discriminative capabilities of CNNs and the
pretraining technique of DBNs [110]. Furthermore, Masci et al. (2011) combined a
convolutional NN with autoencoder training based on online stochastic gradient descent
optimization and proposed a deep convolutional autoencoder for feature learning [132]. The
deep CNN framework for WHM is displayed in Fig. 16.
4.5 RNN
RNNs were proposed for modelling sequential data, such as raw sensor data or time series. The
architecture of an RNN is illustrated in Fig. 17. An RNN incorporates a temporal layer to
capture sequence information and then learns complex changes through the hidden units of its
recurrent cells. The hidden units change according to the information available to the
network, and this information is constantly updated to reflect the present state of the
network. An RNN computes the current hidden state by applying an activation function to the
previous hidden state and the current input. However, this model is difficult to train
because of exploding or vanishing gradients, which limits its application in modelling
long-term activity sequences and temporal correlations in sensor data [143]. RNN variants,
such as the gated recurrent unit (GRU) and LSTM, integrate multiple memory cells and gates to
capture temporal activity sequences [71]. The LSTM [83] incorporates memory cells to retain
contextual information; thus, it can control the flow of information into the network.
Because its memory cells contain gates with learnable weights, such as the input gate, output
gate, and forget gate, the LSTM can model temporal correlations in time series data and fully
capture global features to improve the recognition accuracy.
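The recurrent update can be sketched for a plain (ungated) RNN with a tanh activation. This is a minimal illustration of the general mechanism, not of any model in the cited studies; the function names, weights, and input sequence are hypothetical.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b), computed one hidden unit at a time."""
    return [
        math.tanh(
            sum(w * v for w, v in zip(W_x[j], x_t))
            + sum(w * v for w, v in zip(W_h[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]

def run_rnn(sequence, W_x, W_h, b):
    """Process a sequence step by step, starting from a zero hidden state."""
    h = [0.0] * len(b)
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# Hypothetical weights: 2 input channels, 2 hidden units
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_h = [[0.2, 0.0], [-0.4, 0.3]]
b = [0.0, 0.1]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = run_rnn(sequence, W_x, W_h, b)
```

Because the same weights are applied at every time step, gradients are multiplied repeatedly by related factors during training, which is the origin of the exploding and vanishing gradient problem mentioned above.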
Although the LSTM has many advantages, Cho et al. (2014) pointed out that many parameters
need to be updated during training, which increases the computational complexity of the LSTM
[39]. To reduce the complexity caused by these parameter updates, Cho et al. introduced GRUs
with fewer parameters
Fig. 17 Architectures of (a) an RNN and (b) an RNN across a time step [230]
that made the implementation faster and simpler. The GRU and LSTM differ in how they update
the next hidden state and in how they expose its content [200]. The LSTM updates the next
hidden state through a summation operation, whereas the GRU updates the state by assessing the
correlation over time to decide which information to keep in memory. In addition, a recent
comparative analysis of the performance of the GRU and LSTM showed that the GRU is slightly
better than the LSTM in most ML applications [41]. Another study attempted to improve the GRU
by reducing the number of gates in the network and introducing only multiplicative gates to
control the flow of information [66]. Compared with the GRU and LSTM, this algorithm requires
less memory and computing time. Recently, Chung et al. [40] proposed a gated feedback RNN
(GF-RNN) to address the problem of learning at multiple time scales. This learning process is
very challenging in application fields such as language modeling and the sequence evaluation
of programming languages. Specifically, the GF-RNN stacks multiple recurrent layers and gates
the signals that flow from the upper layers to the lower ones. The process uses the previous
hidden state to control the information flow adaptively and to assign different layers to
different time scales. Nevertheless, among all the reviewed studies, none was found that
applies the GF-RNN to human activity recognition in WHM.
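The GRU update can be sketched for a single hidden unit with a scalar input. The sketch assumes the common formulation with an update gate and a reset gate, in which the new state interpolates between the previous state and a candidate state; the parameter names and values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU update for a single hidden unit and a scalar input."""
    z = sigmoid(p["w_z"] * x_t + p["u_z"] * h_prev + p["b_z"])    # update gate
    r = sigmoid(p["w_r"] * x_t + p["u_r"] * h_prev + p["b_r"])    # reset gate
    candidate = math.tanh(p["w_h"] * x_t + p["u_h"] * (r * h_prev) + p["b_h"])
    # Interpolate between the previous state and the candidate state.
    return (1.0 - z) * h_prev + z * candidate

params = {"w_z": 0.5, "u_z": 0.1, "b_z": 0.0,
          "w_r": 0.3, "u_r": 0.2, "b_r": 0.0,
          "w_h": 0.8, "u_h": 0.4, "b_h": 0.0}   # hypothetical values

h = 0.0
for x_t in [0.2, 0.9, -0.4]:                    # hypothetical sensor readings
    h = gru_step(x_t, h, params)

# A strongly negative update-gate bias closes the gate, so the old state is kept.
keep_params = dict(params, b_z=-50.0)
kept = gru_step(1.0, 0.3, keep_params)
```

The unit needs three sets of weights (update gate, reset gate, candidate), compared with four for an LSTM cell with input, forget, and output gates plus the cell candidate, which is consistent with the smaller parameter count noted in the text.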
4.6 Generative Adversarial Network
Goodfellow et al. [89] proposed the generative adversarial network (GAN) in 2014. The network
quickly became one of the most encouraging breakthroughs in the ML field. A GAN consists of
two parts, a function discriminator (FD) and a function generator (FG), as described in
Fig. 18. In a GAN, the FG and FD compete with each other. That is, the FG tries to confuse
the FD, and the FD tries to distinguish between
Fig. 18 The architecture of GAN [89]
the samples that are generated by the FG and the samples that are obtained from the raw data.
A GAN takes the form of a zero-sum game in which the FG and FD compete iteratively to become
better at imitating the raw data and at distinguishing between the samples, respectively.
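In the notation commonly used for the original GAN formulation, the zero-sum game can be written as a minimax objective. The symbols below are not used elsewhere in this chapter and are introduced here as assumptions: G denotes the FG, D denotes the FD, x is a sample of the raw data, and z is the noise input of the generator.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The FD maximizes V by assigning high probabilities to raw samples and low probabilities to generated ones, whereas the FG minimizes V by producing samples that the FD accepts as raw data.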
The categorical adversarial autoencoder (CatAAE) [75], a new GAN framework, was proposed. In
this framework, an autoencoder is trained through an adversarial training process while a
prior distribution is imposed on the latent coding space. Next, a classifier clusters the
input samples by balancing the mutual information between the samples and their predicted
class distributions. The latent coding space and the training process were examined to study
the advantages of the model. Experiments under various signal-to-noise ratios and motor load
fluctuations showed that the proposed CatAAE framework is advantageous in learning useful
features.
GANs have been extensively applied to the fault diagnosis of machines in real-world
applications. Because operating conditions vary, the general assumption that the test set and
training set follow the same distribution is usually invalid. Some advanced models inspired
by the GAN were developed [75]. These models include the adversarial adaptive one-dimensional
CNN model (A2CNN) and the deep convolutional GAN (DCGAN) [16], which exhibited better
performance in terms of training and testing accuracies. However, GANs have not been used for
healthcare monitoring and illness diagnosis. Among all the studies reviewed, no study applied
a GAN to WHM.
5 Algorithms, Application, and Frameworks of Different ML Methods for WHM
5.1 Comparison of Different DL Algorithms for WHM
In this part, the methods mentioned above are compared in terms of their advantages and
disadvantages in WHM. The DL methods mentioned in this paper deliver state-of-the-art
performance in WHM for human activity and disease monitoring. The main advantage of DL is its
capacity to learn automatically from raw sensor data that is
unlabeled. Nevertheless, these methods offer different capabilities for sensor stream
processing. For example, RBM algorithms can efficiently transform sensor data into feature
vectors by using unlabeled data for layer-by-layer training. Moreover, the algorithms allow
the extraction of robust feature vectors. However, RBMs require extensive parameter
initialization, which is a major drawback and makes training computationally expensive.
Supporting real-time, onboard activity recognition is therefore challenging given the limited
computing power of wearable sensors and mobile devices [216].
The deep autoencoder is an effective method that automatically learns lower-dimensional
feature vectors from raw sensor data in an unsupervised manner. Deep autoencoder methods
adopt greedy layer-by-layer learning to learn unsupervised features from continuous sensor
streams. The deep autoencoder is robust to noisy sensor data, and it can learn complex and
hierarchical features from the sensor data. However, the main weaknesses of deep autoencoders
are that they may fail to find the optimal solution and that the computation time is long
owing to the extensive parameter tuning required. The sparse coding method can efficiently
reduce high-dimensional sensor data to feature vectors expressed as linear combinations and
ensures a compact feature representation.
In addition, sparse coding is invariant to sensor location and transformation, and it can
effectively model changes in the activity process [229]. Changes in sensor orientation pose
significant challenges in WHM systems, especially for smartphone accelerometers [90]. In this
case, the accelerometer signals produced by smartphones and wearable sensor devices change
with the orientation and location of the accelerometer.
However, using sparse coding to perform unsupervised feature learning effectively is still
challenging. CNNs can learn feature vectors from sensor data to model high-dimensional and
complex sensor data. The major advantage of CNNs is that they can use the pooling layer to
reduce the dimensions of the training data and make the representation translation-invariant
to distortions and changes [167]. The algorithms can learn repetitive and long-range
activities through the multi-channel method [231]. CNNs are mainly used to process images;
therefore, sensor data are converted into an image description to support the extraction of
discriminative features [176]. CNNs can address the uncertainty of sensor measurements and
the inconsistency in the correlation of high-dimensional sensor data. However, multiple
hyperparameters must be tuned in a CNN to obtain optimal features.
Besides, supporting the onboard identification of complex activity details is a challenge.
Finally, RNNs can be applied to model the temporal dynamics in sensor data, so that complex
activity details can be modeled. RNNs, such as the LSTM, are effective for capturing global
temporal correlations in sensor data. The primary problem of RNNs, especially the LSTM, is the
high computation time required because many parameters need to be updated. Techniques such as
high-throughput parameter update methods can help to decrease the computation time. Table 4
summarizes the latest applications of WHM and the advantages and disadvantages of each DL
method, focusing on the processing of sensor data.
Table 4 A summary of various DL architectures
Columns: Architecture; Description; Characteristics (pros and cons)

CNN
Description: Well-suited for 2D data, i.e., images. Inspired by the neural-biological model of
the visual cortex [88]. Every hidden convolutional filter transforms its input to a 3D output
volume of neuron activation.
Pros: Few neuron connections required compared with a typical ANN. Many variants have been
proposed: AlexNet [105], Clarifai [225], and GoogLeNet [80].
Cons: May require many layers to find an entire hierarchy of visual features. May require a
large dataset of labeled data.

Deep autoencoder
Description: Mainly designed for feature extraction or dimension reduction. Has the same
number of input and output nodes. Unsupervised learning method aiming to recreate the input
vector.
Pros: Does not require labeled data. Many variations have been proposed to make the
representation more noise-resilient and robust: Sparse AutEnc [154], Denoising AutEnc [201],
Contractive AutEnc [162], Convolutional AutEnc [132].
Cons: Requires a pretraining stage. Training may suffer from the vanishing of errors.

DBN
Description: Composed of RBMs where each subnetwork's hidden layer serves as the visible layer
for the next. Has undirected connections just at the top two layers. Allows unsupervised and
supervised training of the network.
Pros: Proposes a layer-by-layer greedy learning strategy to initialize the network. Tractable
inferences maximize the likelihood directly.
Cons: Training may be computationally expensive due to the initialization process and
sampling.

RNN
Description: An ANN capable of analyzing data streams. LSTM revived the application of RNNs.
Suitable for applications where the output depends on the previous computations. Shares the
same weights across all steps.
Pros: Memorizes sequential events. Models time dependencies. Has shown great success in many
applications: speech recognition, natural language processing, video analysis.
Cons: Learning issues are frequent due to gradient vanishing/exploding.

GAN
Description: Mainly designed to generate photographs that look superficially authentic to
human observers. Implemented by a system of generative and discriminative networks.
Semi-supervised learning method aiming to recreate the input vector.
Pros: Requires almost no modifications transferring to new applications. Requires no Monte
Carlo approximations to train. Does not introduce deterministic bias.
Cons: GAN training is unstable as it requires finding a Nash equilibrium of a game. Hard to
learn to generate discrete data, like text.

DNN
Description: The general deep framework usually used for classification or regression. Made of
many hidden layers (more than two). Allows complex (nonlinear) hypotheses to be expressed.
Pros: Widely used with successes in many areas.
Cons: Training is not trivial because once the errors are backpropagated to the first few
layers, they become minuscule. The learning process can be prolonged.

DBM
Description: Proposed in [174]; another approach based on the Boltzmann family. Possesses
undirected connections (conditionally independent) between all layers of the network. Uses a
stochastic maximum likelihood [224] algorithm to maximize the lower bound of the likelihood.
Pros: Incorporates top-down feedback for more robust inferences with ambiguous inputs.
Cons: The time complexity for the inference is higher than DBN. Optimization of the parameters
is not practical for large datasets.
Fig. 19 Integration of wearable sensor data in DL method
(Stages shown in the figure: raw data (wearable sensors); data preprocessing; feature
extraction and selection; DL approach (training or testing), with additional knowledge as a
further input; decision support)
Table 5 Overview of different DL methods of wearable medicine. Acc, accuracy; BA, Bland-
Altman slope (systematic error); P, precision
Data | Application | Network type | Ref. no | Results
EEG | Recognition of cognitive activities | DBN, CNN | 207 | Acc: 91.15%; Acc: 91.63%
EEG | Sleep stage scoring | DBN | 208 | Acc: 91.33%
EEG | Sleep stage scoring | DBN | 209 | Acc: 91.31%
EEG | Sleep stage scoring | LSTM | 210 | Acc: 85.92%
EEG | Anomaly detection | DBN | 211 | P: 0.1920; high performance
EEG | Classification of motor imagery | CNN, autoencoder | 212 | Acc: 77.6%
EEG | Classification of motor imagery | Frequency DBN | 213 | Acc: 84%
EEG | Feature extraction of motor-onset visual evoked potential | CNN | 214 | Acc: 87.5%
EMG | Hand movement classification | DBN | 215 | Acc: 66.59% (healthy); 38.09% (amputees)
ECG | Arrhythmia classification | DBN | 216 | Acc: 98.83%
ECG | Abnormal ECG recognition | DNN | 217 | Acc: 85.52%
ECG | Biometric user identification | CNN-LSTM | 218 | Acc: 99.54%
ECG | Diabetes detection | CNN | 219 | Acc: 95.1%
PPG | Monitoring and detecting of atrial fibrillation | DBN | 220 | Acc: 91.8%
PPG | Biometric user identification | DBN | 221 | Acc: 96.1%
PPG | Blood pressure monitoring | CNN, LSTM-RNN | 222 | BA: 0.47 (systolic); 0.16 (diastolic)
Motion | Human activity recognition | CNN, LSTM | 178 | Acc: 95.8%
Motion | Closed loop for human activity recognition | DBN | 223 | Acc: 90%
Motion | Mobile applications on activity recognition | DBN, CNN | 224; 28; 225 | Acc: 73–94%; Acc: 95.75%; Acc: 91.5–98.2%
Motion | Movement disorder | CNN | 226 | Acc: 90.9%
5.2 Different DL Applications in WHM Systems
Comprehensive reviews of the growth of wearable sensors and the importance of using data
science methods to improve WHM systems can be found in the literature [7, 183]. These reviews
highlight the growing trend of continuous multimodal sensing for the early diagnosis of
diseases, rapid response to emergencies, prevention of chronic disease, and monitoring of
physical activity. Gravina et al. [72] delivered a global overview of multisensor information
fusion in body sensor networks. Banaee et al. [19] reviewed data mining for healthcare with
wearable sensors based on vital signs: blood pressure, heart rate, electrocardiogram,
respiratory rate, blood glucose, photoplethysmography, and oxygen saturation. In this part,
we concentrate on the DL methods used in WHM for the recognition of physiological activities
or human motion analysis. The general concept of the DL method for wearable sensor data is
shown in Fig. 19. The first step is to acquire raw data from various wearable sensors. The
next steps are data preprocessing (such as normalization, filtering, synchronization of all
data, and denoising), feature extraction (nonlinear features or features in the time or
frequency domain), and the selection of the most useful features to train or test the DL
model. The results of feature extraction and selection, together with additional knowledge,
e.g., study metadata, patient metadata, and expert opinion, are the inputs used to train or
test the DL algorithm. The algorithm ultimately determines the medical phenomenon of interest
(prediction, detection, etc.).
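The preprocessing and feature extraction steps of this workflow can be sketched as follows. This is a minimal illustration under stated assumptions: z-score normalization, fixed-size sliding windows, and three simple time-domain features are chosen here as examples, and the function names, window parameters, and signal values are hypothetical.

```python
import statistics

def zscore(signal):
    """Normalize a signal to zero mean and unit variance (constant signals map to zeros)."""
    mu = statistics.fmean(signal)
    sigma = statistics.pstdev(signal)
    if sigma == 0:
        return [0.0] * len(signal)
    return [(s - mu) / sigma for s in signal]

def sliding_windows(signal, size, step):
    """Segment a signal into fixed-size, possibly overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def time_domain_features(window):
    """A few simple time-domain features for one window."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "range": max(window) - min(window),
    }

# Hypothetical accelerometer magnitudes from a wearable sensor
raw = [0.98, 1.02, 1.10, 0.95, 1.40, 1.85, 1.30, 1.01, 0.97, 1.03]
normalized = zscore(raw)
windows = sliding_windows(normalized, size=4, step=2)
feature_rows = [time_domain_features(w) for w in windows]
```

Each entry of `feature_rows` would then be passed, together with any additional knowledge, to the DL model for training or testing. In many DL pipelines the normalized windows themselves are used as input so that the network learns the features directly.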
A summary of the possible application of DL algorithms in WHM is presented
in Table 5 with a comparison of the accuracies of the algorithms.
Sarkar et al. [175] proposed a model using DL algorithms that combines multiple EEG sensors
in an unconstrained environment to recognize human cognitive activities in real time, and
they selected a smaller sensor suite suitable for wearable sensor systems. DBN and CNN were
used to classify two important activities, “watching” and “listening” (91.63% and 91.15%,
respectively). Langkvist et al. [108] tested a two-layer DBN with 200 hidden units in each
layer. They pointed out that, in contrast to other manual approaches, the DBN improves the
sleep scoring accuracy (91.33%) by about 3%. The authors concluded that separate DBNs should
be applied to each signal of multimodal data and that their outputs should be combined using
a secondary DBN. Zhang et al. [228] used a sparse version of the DBN (SDBN) for sleep stage
classification. A voting principle based on classification entropy, using the SDBN and a
combination of classifiers such as HMM, k-NN, and SVM, attained 91.31% accuracy. Dong et al.
[51] applied an LSTM network to learn sequence data and optimized the classification
performance with single-channel EEG. Single-channel EEG signals (Fp2-EOG left and F4-EOG
left) were measured on the forehead. The method was evaluated on 494 hours of sleep from
62 people. Compared with the existing vertex or occipital electrode placement methods, this
method has a higher performance. The classification accuracy of the LSTM network was 85.92%.
The value was relatively higher than that of the MLP (81.43%), random forest (RF, 81.67%), and
SVM (79.7%) methods. Wulsin et al. [213] used a DBN in a semi-supervised paradigm to model
EEG waveforms for anomaly detection and classification. An anomaly is a small isolated set of
waveform patterns, e.g., eye blinks, seizures, spikes, and other noise or artifacts. The
results show that the performance of the DBN is comparable to that of standard classifiers
(k-NN, SVM, and decision trees), while its classification is 1.7 to 103.7 times faster than
that of the other high-performing classifiers. Compared with other common techniques, DBNs
with raw data inputs can be more efficient in automatically identifying anomalies online.
Tabar et al. [194] improved the classification of motor imagery EEG using a DL method for
BCI. They used a CNN and an autoencoder network to classify motor imagery from EEG. A new
deep network was proposed in which the features extracted by the CNN are classified by the
autoencoder network. The average classification accuracy was 77.6%, an improvement of 9% over
the winner of the fourth BCI competition (Berlin, 2008). Lu et al. [122] evaluated the
frequency-domain representations of motor imagery EEG signals obtained by wavelet packet
decomposition and the fast Fourier transform and trained three RBMs. These RBMs were stacked
with an additional output layer to build a four-layer NN called the frequential DBN (FDBN).
The output layer performs classification by SoftMax regression, and the FDBN is fine-tuned by
backpropagation and the conjugate gradient method. With the FDBN, the results on benchmark
data showed a statistically significant improvement in classification (84% vs. 73–80%)
compared with selected state-of-the-art methods. Ma et al. [126] combined a DBN and
compressed sensing to extract the multimodal features generated by the sensing method, which
improved the BCI accuracy by about 3.5%. Compared with the traditional mVEP features, the
proposed method achieved higher classification accuracy (87.5% vs. 84%). The method was also
applicable to subjects with relatively poor performance.
Atzori et al. [14] applied a CNN to classify 50 hand movements from surface EMG signals. They
used the NinaPro open database, which includes data from 78 subjects: 67 healthy subjects and
11 transradial amputees. The average classification accuracies obtained using a simple CNN
and various classical classification methods (such as the random forest, k-NN, SVM, and
linear discriminant analysis methods) were comparable (66.59% vs. 62.06% for dataset 1,
comprising data of healthy participants; 60.27% vs. 60.28% for dataset 2, comprising data of
healthy participants; and 38.09% vs. 38.82% for the dataset comprising data of participants
who underwent amputation).
Yan et al. [217] used the MIT-BIH standard ECG signal database to train RBMs. Half of the
data was used for RBM training, 30% for DBN fine-tuning, and the remaining 20% for testing.
The results showed that the specificity, sensitivity, and accuracy for two kinds of ECG were
96.05%, 99.83%, and 98.83%, respectively. Ripol et al. [163] evaluated the effectiveness of a
DBN in identifying abnormal 12-lead ECG data from a large group of patients (1390 patients
from Clinic of Barcelona Hospital) and compared it with extreme learning machines, k-NN, SVM,
and a dedicated heart disease system. Both the DBN (specificity 78.27% and accuracy 85.52%)
and SVM (specificity 73.46% and accuracy 84.76%) outperformed the other methods in
specificity and accuracy. Page et al. [149] applied a deep NN to the QRS segments of 90
resting participants for biometric identification. The optimal accuracy, sensitivity, and
specificity of the applied NN were 99.54%, 99.49%, and 99.55%, respectively. Ashiquzzaman
et al. [12] utilized a CNN-LSTM to detect diabetes from heart rates obtained from ECG
signals. The method exhibited an accuracy of 95.1%. Thus, this method is promising for
noninvasive diabetes detection.
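The data split and the evaluation measures reported in these studies can be sketched as follows. The 50/30/20 proportions follow the description of Yan et al. above; the function names and the confusion-matrix counts are hypothetical and are not taken from any of the cited results.

```python
def split_50_30_20(records):
    """Split records into pretraining (50%), fine-tuning (30%), and test (20%) parts.

    The records are assumed to be shuffled beforehand.
    """
    n = len(records)
    n_pretrain = n // 2
    n_finetune = (n * 3) // 10
    pretrain = records[:n_pretrain]
    finetune = records[n_pretrain:n_pretrain + n_finetune]
    test = records[n_pretrain + n_finetune:]
    return pretrain, finetune, test

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

pretrain, finetune, test = split_50_30_20(list(range(100)))

# Hypothetical counts for an abnormal-versus-normal ECG classifier
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Reporting sensitivity and specificity alongside accuracy matters for medical data, because a classifier can reach a high accuracy on an imbalanced dataset while missing most of the abnormal cases.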
Shashikumar et al. [181] conducted real-time atrial fibrillation detection on patients. The
continuous wavelet transform of the PPG signal, recorded by the Simband smartwatch, was
applied to obtain the features used to train a CNN. The accuracy of the method was 91.8%,
which was comparable to that of ECG-based methods. Jindal et al. [93] compared the
performance of a DBN with classic fuzzy and k-NN classifiers in biometric user recognition.
The DBN method consists of a clustering step that forms subgroups, followed by data
preprocessing through the DBN and RBM fine-tuning. The accuracy of the combination of
clustering and DL was 96.1%, which was more than 10% higher than that of the classical
methods. Ruiz et al. [169] simultaneously recorded noninvasive blood pressure (through
photoplethysmography) and invasive blood pressure (through a radial artery catheter). RBM
training was carried out with data from 572 patients with stable blood pressure, which was
measured from the PPG signal. The systematic errors (Bland-Altman slope) of the diastolic and
systolic arterial pressure were 0.16 and 0.47, respectively.
DL-based WHM systems have been used by different authors to identify human physical activity.
In these systems, the input signals used for DL training come from motion sensors (such as
gyroscopes and accelerometers). Ordóñez et al. [146] reported that the combination of a CNN
and an RNN can effectively identify human activities for 18 gesture tasks, with an accuracy
of 95.8%. The closed-loop concept proposed by Saeedi et al. [171] can improve the robustness
of activity identification. In this case, the accuracy of activity recognition could reach
90%. Some authors have proposed frameworks for activity identification using mobile devices,
thus demonstrating the possibility of commercial applications [167]. Bhattacharya et al. [27]
proposed a simple RBM solution for a smartwatch processor (Qualcomm Snapdragon 400). The
smartwatch prototype includes the following sensors: gyroscope, accelerometer, magnetometer,
barometer, and temperature and light sensors. Three different RBM models were tested to
support three daily scenarios (physical activities, transportation, and gestures) and the
transition between outdoor and indoor environments. The classification results obtained by
the RBM were more accurate than those obtained by traditional methods (decision trees, RF,
and SVC). With the RBM model, test results showed that the battery life was between 6 and
52 hours. Ronao et al. [167] reported the results of an investigation of 30 volunteers who
performed different physical activities, e.g., standing, lying, walking, and sitting, with a
smartphone in their pockets to collect gyroscope and accelerometer data. They reported that
the classification accuracy of the CNN for human activity identification was 95.75%. In
addition, Ravi et al. [159] introduced another solution to effectively implement a CNN for
the recognition of human activities (such as cycling, jogging, and walking)
with a low-power device. Eskofier et al. [54] demonstrated the superiority and clinical
applicability of a deep CNN in the diagnosis of motor retardation in patients with
Parkinson’s disease (on the basis of data from accelerometers located on the forearm) and
compared it with classical methods (PART, k-NN, SVM, and AdaBoost.M1; 90.9% vs. 81.7%, 67.1%,
85.6%, and 86.3%, respectively).
5.3 Hardware and Software Frameworks for DL Implementation
Table 6 provides a list of the most popular packages that allow custom DL methods to be
implemented on the basis of the methods described in this article. All the software listed in
Table 6 can use CUDA (NVIDIA) to improve performance through GPU acceleration. Because of the
growing trend of turning proprietary DL frameworks into open-source projects, companies such
as Nervana Systems [133] and Wolfram Mathematica [34] have decided to provide cloud-based
services to help researchers speed up the training process. New GPU acceleration hardware
includes systems built specifically for DL, such as the NVIDIA DGX-1 [184]. Other feasible
solutions are neuromorphic electronic systems, which are commonly used in computational
neuroscience simulations. These hardware architectures are intended to implement artificial
synapses and neurons on a chip. Some of the popular hardware components include SpiNNaker
[212], IBM TrueNorth, Intel Curie, and NuPIC.
Other frameworks that simplify DL implementation across heterogeneous devices and
platforms are still under development. For example, ConvNet, DIGITS, and the
MATLAB-based CNN toolbox are being developed for feature extraction. Besides,
CUDA, Cudanet, and C++ implementations of CNNs are being fine-tuned to advance
DL development. Recently, some evaluations of the frameworks have been
reported [100, 120] using criteria such as documentation, language support,
extension speed, development environment, GPU support, training speed, model
library, and maturity level. Among these frameworks, TensorFlow has attracted the
most interest and contributions on GitHub, surpassing Caffe and CNTK. Besides, some
frameworks support GPUs fully whereas others have limited GPU support, and in some
cases the GPU must reside on the workstation (such as MXNet).
Owing to the development of DL-based human activity recognition, these
frameworks have become the main choices for researchers and developers of
mobile and wearable sensor applications. Because the frameworks differ in
implementation and programming support, the choice of framework depends on the
user's technical and programming ability. Recent software frameworks used for
mobile-based human activity recognition include Theano, TensorFlow, Lasagne, Keras,
and Caffe [54, 100, 146, 167, 221]. Other studies use MATLAB [27, 53] and other
programming platforms such as C++ [50] to develop algorithms.
Table 6 Popular software packages of DL implementation (RNN, CNN, and DBN columns indicate supported techniques)

| Name | Creator | License | Platform | Interface | OpenMP support | RNN | CNN | DBN | Cloud computing |
|---|---|---|---|---|---|---|---|---|---|
| Caffe [70] | Berkeley Center | FreeBSD | Linux, Win, OSX, Andr. | C++, Python, MATLAB | ✗ | √ | √ | ✗ | ✗ |
| CNTK [199] | Microsoft | MIT | Linux, Win | Command-line | √ | √ | √ | ✗ | ✗ |
| Deeplearning4j [155] | Skymind | Apache 2.0 | Linux, Win, OSX, Andr. | Java, Scala, Clojure | √ | √ | √ | √ | ✗ |
| Wolfram Math. [34] | Wolfram Research | Proprietary | Linux, Win, OSX, Cloud | Java, C++ | ✗ | ✗ | √ | √ | √ |
| TensorFlow [62] | Google | Apache 2.0 | Linux, OSX | Python | ✗ | √ | √ | √ | ✗ |
| Theano [140] | Université de Montréal | BSD | Cross-platform | Python | √ | √ | √ | √ | ✗ |
| Torch [142] | Ronan Collobert et al. | BSD | Linux, Win, OSX, Andr., iOS | Lua, LuaJIT, C | √ | √ | √ | √ | ✗ |
| Keras [106] | François Chollet | MIT license | Linux, Win, OSX | Python | ✗ | √ | √ | √ | ✗ |
| Neon [133] | Nervana Systems | Apache 2.0 | OSX, Linux | Python | √ | √ | √ | √ | √ |
6 Challenges and Open Directions in WHM Based on ML or DL Methods
6.1 Challenges of ML- or DL-Based WHM
In this part, some research challenges that need further discussion are listed.
Areas that require further study include the collection of large datasets,
real-time and onboard implementation on WHM devices, class imbalance
problems, data preprocessing and evaluation, and the many research issues in the
field of sensor fusion. Here, related research directions are discussed on the basis of
seven important themes:
• Implementation of DL algorithms in real time and onboard for WHM: The
onboard implementation of DL algorithms in mobile and wearable devices will
help reduce the computational burden of data storage and transfer.
However, this technique is restricted by the data acquisition and memory
constraints of current mobile and wearable sensor devices. Moreover, the tuning
and initialization of the many parameters in DL increases the computational time
and is not suitable for low-energy mobile devices. Therefore, methods such as
optimal compression and mobile-phone-enabled GPUs should be utilized to minimize
computation time and resource consumption. Other methods for real-time
implementation include mobile cloud computing platforms for training, which
reduce training time and memory usage. With this type of implementation, the
system can become self-adaptive and require minimal user input for a new
information source.
• A comprehensive evaluation of preprocessing and hyperparameter settings on
learning algorithms: Preprocessing and dimensionality reduction are essential
aspects of the human activity recognition process. Dimensionality reduction
minimizes computational complexity, especially for mobile and wearable sensor
devices with limited computation ability and memory, by projecting
high-dimensional sensor data into lower-dimensional vectors. However, the method
and extent of preprocessing needed for good DL performance remain an open
research topic. Preprocessing techniques such as normalization, standardization,
and different dimensionality reduction methods should be investigated to
determine their effects on the performance, computational time, and accuracy of
DL methods. Aspects such as learning rate optimization to accelerate computation
and reduce model and data size, kernel reuse, filter size, computation time,
memory analysis, and the learning process still require further research because
current studies rely on heuristic methods to set these hyperparameters. Moreover,
the use of grid search and evolutionary optimization methods for mobile-based DL
methods that support lower energy consumption and dynamic and adaptive
applications, together with new techniques that enable mobile GPUs to reduce
computational time, are significant research directions [146].
• Large-sensor dataset collection for evaluation of DL methods: The training and
evaluation of DL techniques require large datasets obtained from different
sensor-based Internet of things (IoT) devices and technologies. The current
review indicates that most studies on DL implementation for mobile and wearable
sensor-based health monitoring depend on benchmark datasets from conventional
ML algorithms, such as OPPORTUNITY, Skoda, and WISDM, for
evaluation. Data collection methods use cyber-physical systems and mobile
crowdsourcing to leverage data pertaining to smart homes, mobile location (for
determining the transportation mode), smart home environments (for elderly care
and monitoring), and GPS (for context-aware location recognition
and other essential applications). Therefore, the collection of large datasets
through the synergy of these technologies is crucial for performance
improvement.
• Transfer learning for DL algorithms on mobile and wearable sensor devices:
Transfer-learning-based activity recognition is a challenging task. Transfer
learning leverages the experience acquired in different domains to improve
performance in new areas that the system has not yet experienced. It is mainly
used to reduce training time, provide robust and versatile activity details, and
reuse existing knowledge in new domains (a critical issue in activity
recognition). Further research in areas related to kernels, convolutional
layers, interlocation transferability, and intermodality transferability would
improve the implementation of DL-based WHM [146]. In addition, transfer learning
for human activity recognition based on mobile and wearable sensors can minimize
environment-, target-, and source-specific application implementation, a topic
that has not received due attention.
• Implementing decision fusion of WHM based on DL: Decision fusion is an
important step for further improving the diversity and performance of human
activity recognition systems, and it is a key step in combining multiple
classifiers, sensors, and architectures into one decision. Moreover,
heterogeneous sensor fusion is a typical area that requires further study; it
combines expert knowledge with DL algorithms as well as various unsupervised
feature-learning methods to improve the performance of the activity recognition
system.
• Solving the class imbalance problem of DL in WHM: Class imbalance problems
can arise in datasets used to detect abnormal activities in WHM. Solving them is
very important in medical monitoring, especially for events such as real falls,
which are rare. In human activity recognition based on mobile and wearable
sensors, dataset distortion and sensor data calibration reduce performance
generalization and can result in class imbalance. Previous studies have
developed a series of solutions, such as cost-sensitive learning strategies and
a hybrid-kernel-based weighted extreme learning machine. However, no study has
examined the impact of class imbalance on DL implementation, especially for
mobile and wearable sensors. Thus, strategies for reducing class imbalance in DL
methods could improve WHM significantly.
• Data augmentation to improve DL performance in WHM: Another aspect to be
studied is the use of data augmentation techniques to improve the performance of
DL, particularly CNNs, on the motion sensors (gyroscopes, accelerometers, etc.)
of WHM. Data augmentation methods generate new data from limited wearable and
mobile sensor data by transforming the existing training data. These processes
are important because they help generate enough training data to avoid
overfitting and improve translation invariance to sensor variation, distortion,
and orientation, especially in CNN models. Data augmentation is a common
training strategy in image classification [73]. In WHM based on wearable and
mobile sensors, the performance and impact of data augmentation need to be
evaluated for generating more training instances and preventing the overfitting
caused by small datasets. Different data augmentation methods, such as arbitrary
rotations, sensor position changes, permutation of sensor event positions,
scaling, and time warping, provide effective means of improving the performance
of DL-based WHM [229].
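Two of the themes above, class imbalance and data augmentation, lend themselves to a short illustration. The following Python sketch is not taken from the reviewed studies. It assumes tri-axial accelerometer windows stored as lists of (x, y, z) tuples, uses inverse-frequency class weights as one simple form of cost-sensitive weighting, and implements only two of the augmentation methods listed above (rotation and scaling). All function names, parameter ranges, and sample values are hypothetical.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def scale(window, factor):
    """Multiply every (x, y, z) sample by a constant amplitude factor."""
    return [(x * factor, y * factor, z * factor) for x, y, z in window]


def rotate_z(window, angle_rad):
    """Rotate every sample about the z axis to mimic a changed sensor orientation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in window]


def augment(window, rng, max_angle_rad=math.pi / 12, scale_range=(0.9, 1.1)):
    """Return one randomly rotated and scaled copy of a window."""
    angle = rng.uniform(-max_angle_rad, max_angle_rad)
    factor = rng.uniform(*scale_range)
    return scale(rotate_z(window, angle), factor)


# Hypothetical imbalanced label set: rare "fall" events among many "walk" windows.
labels = ["walk"] * 90 + ["fall"] * 10
weights = inverse_frequency_weights(labels)

# Hypothetical two-sample window; a seeded generator keeps the run repeatable.
rng = random.Random(0)
window = [(1.0, 0.0, 9.8), (0.0, 1.0, 9.8)]
augmented = [augment(window, rng) for _ in range(3)]
```

With these labels, the rare class receives a weight of 5.0 and the frequent class a weight of about 0.56, so that errors on rare events count more in a weighted loss. Whether such weighting or augmentation actually improves a DL model on wearable data is exactly the open question raised above and would need to be evaluated per dataset.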
6.2 Open Research Directions of ML-/DL-Based WHM
To monitor human activities and health conveniently and smartly, high-performance
and high-reliability WHM devices based on ML or DL approaches should be
developed urgently. Four research directions should be considered in future studies.
• The exploration of a novel data acquisition system based on DL algorithms is
one primary direction, aimed at ensuring rapid, real-time collection of all
signals with less signal loss. In particular, the development of a wireless data
acquisition system that adopts wireless transmission technology is crucial for
collecting remote signal data from human bodies more conveniently and rapidly.
• The signal transfer system should be investigated by using DL algorithms to
develop an intelligent transfer model that transmits signals accurately and
rapidly from the transmission terminal to the receiving terminal wirelessly.
• The use of DL approaches to process signal data should be urgently
investigated. Feature extraction from the acquired signals is a crucial research
direction: feature extraction techniques have to be developed in ML- or DL-based
WHM devices for the precise identification of the different classes of signals
obtained under different health and illness conditions. Acquiring enough data to
support DL approaches is equally important, because DL in WHM needs large-scale
data to train the models. Therefore, the collection and buildup of a database of
human-body-related bio-characteristics is highly needed for the application of
ML-/DL-based WHM devices. It is thus necessary to develop more efficient methods
to replenish insufficient databases and to build large ones, for example,
through collaboration with hospitals, healthcare centers, clinics, or
interdisciplinary research teams.
• The interoperability of the WHM system is a challenge for information exchange
among different populations and subsystems in a WHM system (or device). The
development of fifth-generation (5G) wireless network technology makes it
possible to connect more hospital devices to the network on the spot and to
obtain remote access at home. Thus, how to adopt 5G technology to transmit
important activity and signal information to the home computer, and how to send
data to the appropriate medical professionals by telephone line and the internet
with a miniature, wearable, low-power radio, are the key questions for the
interoperability of the WHM system. Besides, 5G technology is also promising for
advancing new WHM devices.
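The third direction above centers on extracting features from acquired signals. As a point of reference, the following Python sketch shows the kind of hand-crafted, time-domain features that a DL model would be expected to learn automatically. It is a minimal illustration under stated assumptions: the signal values, window size, step, and the particular features chosen (mean, standard deviation, RMS, mean crossings) are hypothetical and are not prescribed by the studies reviewed here.

```python
import math


def sliding_windows(signal, size, step):
    """Split a 1-D signal into fixed-size, possibly overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def time_domain_features(window):
    """Compute a few simple statistics for one window of samples."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((v - mean) ** 2 for v in window) / n
    rms = math.sqrt(sum(v * v for v in window) / n)
    # Count sign changes of the mean-removed signal between adjacent samples.
    crossings = sum(
        1 for a, b in zip(window, window[1:]) if (a - mean) * (b - mean) < 0
    )
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "rms": rms,
        "mean_crossings": crossings,
    }


# Hypothetical single-channel signal (illustrative values only).
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
windows = sliding_windows(signal, size=4, step=2)
features = [time_domain_features(w) for w in windows]
```

Each dictionary in `features` describes one window and could serve as the input row of a conventional ML classifier, which gives a baseline against which learned DL features can be compared.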
7 Conclusions
Automatic feature learning in WHM has developed rapidly in terms of feature
extraction and processing of the data or signals pertaining to the activities of
the individuals who are monitored. This development has benefited from the steady
growth of computing infrastructure capabilities and from access to large datasets
through wearable and mobile sensing technologies, crowdsourcing, and IoT. In this
study, we reviewed multiple ML methods, including manual feature acquisition from
data signals. For automatic feature extraction in WHM, DL methods such as
RBMs, autoencoders, CNNs, and RNNs were presented, and their advantages,
characteristics, and disadvantages were introduced in detail. DL methods
can be divided into discriminative, generative, and hybrid methods. We used these
categories to review and summarize DL implementations of WHM. The generative
DL methods include RBMs, autoencoders, sparse coding, and deep mixture
models. The discriminative methods include RNNs, CNNs, hydrocarbons,
and deep neural models. The hybrid methods combine discriminative and generative
models to enhance feature learning. Currently, such combinations are mainly used
in studies pertaining to DL for WHM. A hybrid method combines different
generative models (such as an RBM and an autoencoder) with a CNN or combines
discriminative models (such as an LSTM and a CNN). These methods are key steps
toward realizing automatic feature learning and improving performance
generalization across activities and datasets.
DL implementation is supported by software frameworks and high-performance
GPU computing. A number of these software frameworks have recently been released
to the research community as open-source projects. They were discussed in terms
of their characteristics and of the information that guides developers'
selection of a particular framework. Besides, the training, evaluation, and
classification of DL algorithms for WHM are not trivial. To provide the best
categorization and comparison of recent work in the research community, we
reviewed the optimization and training strategies used in recently proposed
studies of human activity recognition based on wearable and mobile sensors.
Performance and classification metrics obtained by various validation techniques
are critical for ensuring generalization across datasets; these techniques help
avoid overfitting on the training sets. Open datasets are available for
identifying and modelling human activities. Some of the most widely used are
PAMAP2, Skoda, and OPPORTUNITY, which are also popular for classic ML
algorithms.
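To make the role of validation concrete, the following Python sketch pairs one common validation scheme with one classification metric. Both choices are assumptions made for illustration: leave-one-subject-out splitting is used here because wearable datasets usually contain several recordings per volunteer, and macro-averaged F1 is used because it treats rare and frequent activities equally. The subject identifiers, labels, and predictions are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one split per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over the classes in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical dataset: six windows recorded from three subjects.
subjects = [1, 1, 2, 2, 3, 3]
splits = list(leave_one_subject_out(subjects))

# Hypothetical labels and predictions for one held-out subject.
y_true = ["walk", "walk", "walk", "fall"]
y_pred = ["walk", "walk", "fall", "fall"]
score = macro_f1(y_true, y_pred)
```

Because no window from the held-out subject appears in the training indices, a model evaluated this way is tested on a person it has never seen, which is a stricter check on generalization than a random split of windows.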
To clarify the direction of research progress, we put forward the relevant
challenges that need researchers' attention. For example, DL-based decision
fusion, the implementation of DL on mobile devices, class imbalance handling,
and transfer learning can help WHM implementations achieve higher
accuracy. With the further development of wearable computing technology and
wearable devices, further advances in DL technology are anticipated. In
addition, several research directions are proposed to guide future research.
Acknowledgments This work was supported in part by the Innovation and Technology Fund of
the Hong Kong SAR government (Grant no. ITP/097/18TP), University Grants Committee of the
Hong Kong SAR government (Grant no. UGC-UAHB), the National Natural Science Foundation
of China (Grant no. 51975127), Shanghai International Cooperation Project of One Belt and One
Road of China (Grant No. 20110741700), Aerospace Science and Technology Fund of China
(Grant no. AERO201937), and Fudan Research Start-up Fund (Grant no. FDU38341). The authors
would like to thank them.
Conflicts of Interest The authors declare that there is no conflict of interests regarding the publi-
cation of this article.
References
1. © Polar Electro 2016. H7 Heart Rate Sensor. Available online: www.polar.com.

2. Ahmad, N.F., Hoang, D.B., Phung, M.H. (2009) Robust Preprocessing for Health Care
Monitoring Framework. In: the 11th International Conference on E-Health Networking,
Applications and Services, Sydney, Australia, pp. 169–174.
3. Ahrens, T. (2008) The most important vital signs are not being measured. Aust. Crit
Care 21:3–5.
4. Alaiad, A., Zhou, L. (2014) The determinants of home healthcare robots adoption: an empiri-
cal investigation. Int. J. Med. Inf. 8(3):825–840.
5. Al-Hajji, A.A. (2012) Rule-Based Expert System for Diagnosis and Symptom of Neurological
Disorders “Neurologist Expert System (NES)”. In: the 1st Taibah University International
Conference on Computing and Information Technology, Al-Madinah Al-Munawwarah,
Saudi Arabia, pp. 67–72.
6. Andreoni, G., Standoli, C.E., Perego, P. (2016) Defining requirements and related methods
for designing sensorized garments. Sensors 16:769.
7. Andreu-Perez J, Leff DR, Ip HM, Yang GZ. (2015) From wearable sensors to smart
implants-–toward pervasive and personalized healthcare. IEEE Trans. on Biomed. Eng.
62(12):2750-2762.
8. Anguita, D., Ghio, A., Oneto, L., Parra, X., & Reyes-Ortiz, J. L. (2012) Human activity rec-
ognition on smartphones using a multiclass hardware-friendly support vector machine. In Int.
Workshop on Ambient Assisted Living 216-223.
9. Apiletti, D., Baralis, E., Bruno, G., Cerquitelli, T. (2009) Real-time analysis of physiological
data to support medical applications. Trans. Info. Tech. Biomed. 13:313–321.
10. Appelboom, G., Camacho, E., Abraham, M.E., Bruce, S.S., Dumont, E.L., Zacharia, B.E.,
D’Amico, R., Slomian, J., Reginster, J.Y., Bruyere, O., et al. (2014) Smart wearable body
sensors for patient self-assessment and monitoring. Arch. Public Health 72:28.
11. Asensio, A., Marco, A., Blasco, R., Casas, R. (2014) Protocol and architecture to bring things
into internet of things. Int. J. Distrib. Sens. Netw.
12. Ashiquzzaman A, Tushar AK, Islam MR, Shon D, Im K, Park JH et al. (2018) Reduction of
Overfitting in Diabetes Prediction Using Deep Learning Neural Network. In: IT Convergence
and Security 2017. Singapore: Springer, pp. 35–43.
13. Atallah, L., Lo, B., Yang, G.Z. (2012) Can pervasive sensing address current challenges in
global healthcare? J. Epidemiol. Glob. Health 2:1–13.
14. Atzori M, Cognolato M, Müller H. (2016) Deep learning with convolutional neural networks
applied to electromyography data: a resource for the classification of movements for pros-
thetic hands. Frontiers in Neurorobotics. 10(9): 1–8.
15. Avci, A., Bosch, S., Marin-Perianu, M., Marin-Perianu, R., Havinga, P. (2010) Activity
Recognition Using Inertial Sensing for Healthcare, Wellbeing and Sports Applications:
A Survey. In: the 23rd International Conference on Architecture of Computing Systems,
Hannover, Germany, pp. 167–176.
16. B. Zhang, W. Li, J. Hao, X.-L. Li, and M. Zhang. (2018) Adversarial adaptive 1-D convo-
lutional neural networks for bearing fault diagnosis under varying working condition eprint
arXiv:1805.00778.
17. Bae, J., Tomizuka, M. (2011) Gait phase analysis based on a Hidden Markov Model.
Mechatronics 21:961–970.
18. Baig, M., Gholamhosseini, H. (2013) Smart health monitoring systems: An overview of
design and modeling. J. Med. Syst. 37:1–14.
19. Banaee, H., Ahmed, M.U., Loutfi, A. (2013) Data mining for wearable sensors in health
monitoring systems: A review of recent trends and challenges. Sensors 13:17472–17500.
20. BASIS. PEAK—The Ultimate Fitness and Sleep Tracker. Available online: https://www.
mybasis.com/.
21. Bellazzi, R., Zupan, B. (2008) Predictive data mining in clinical medicine: Current issues and
guidelines. Int. J. Med. Inform. 77:81–97.
22. Bellazzi, R., Ferrazzi, F., Sacchi, L. (2011) Predictive data mining in clinical medicine: A
focus on selected methods and applications. Wiley. Interdiscip. Rev.: Data. Min. Knowl.
Discov. 1:416–430.
23. Belle, A., Thiagarajan, R., Soroushmehr, S., Navidi, F., Beard, D.A., Najarian, K. (2015) Big
data analytics in healthcare. BioMed Res. Int. 2015 370194.
24. Bellos, C.C., Papadopoulos, A., Rosso, R., Fotiadis, D.I. (2010) Extraction and Analysis of
Features Acquired by Wearable Sensors Network. In: the 10th IEEE International Conference
on Information Technology and Applications in Biomedicine, Corfu, Greece, pp. 1–4.
25. Bellos, C., Papadopoulos, A., Rosso, R., Fotiadis, D.I. (2012) A Support Vector Machine
Approach for Categorization of Patients Suffering from Chronic Diseases. In Wireless
Mobile Communication and Healthcare, Nikita, K.S., Lin, J.C., Fotiadis, D.I., Arredondo
Waldmeyer, M.T., Eds., Springer: Berlin, Germany, Volume 83, pp. 264–267.
26. Bengio, Y. (2009) Learning deep architectures for AI. Foundations and trends® in Machine
Learning, 2:1-127.
27. Bhattacharya S, Lane ND. From smart to deep: Robust activity recognition on smartwatches
using deep learning. In: IEEE International Conference on Pervasive Computing and
Communication Workshops (PerCom Workshops), Sydney, Australia, pp. 1–6. (2016)
28. Bieber, G., Haescher, M., Vahl, M. (2013) Sensor requirements for activity recognition on
smart watches. the 6th Int Conf. on PErvasive Technol. Relat. to Assist. Environ. 29–31.
29. Biodevices, S.A. VitalJacket®. Available online: http://www.vitaljacket.com/.
30. Blonde, L., Karter, A.J. (2005) Current evidence regarding the value of self-monitored blood
glucose testing. Am. J. Med. 118:20–26.
31. Bluetooth SIG, “Health Device Profile Specification Vol. 1.0,” http://www.bluetooth.org/.
32. Bsoul, M., Minn, H., Tamil, L. (2011) Apnea medassist: Real-time sleep apnea monitor using
single-lead ECG. IEEE Trans. Inf. Technol. Biomed. 15:416–427.
33. Bulling, A., Blanke, U., & Schiele, B. (2014) A tutorial on human activity recognition using
body-worn inertial sensors. Acm Comput. Surv. 46:1-33.
34. Center Berkeley, Caffe, 2016. [Online]. Available: http://caffe.berkeleyvision.org/
35. Chan, M., Esteve, D., Fourniols, J.Y., Escriba, C., Campo, E. (2012) Smart wearable systems:
Current status and future challenges. Artif. Intell. Med. 56:137–156.
36. Chaovalit, P., Gangopadhyay, A., Karabatis, G., Chen, Z. (2011) Discrete Wavelet transform-
based time series analysis and mining. ACM Comput. Surv. 43:6:1–6:37.
37. Chatterjee, S., Dutta, K., Xie, H.Q., Byun, J., Pottathil, A., Moore, M. (2013) Persuasive and
Pervasive Sensing: A New Frontier to Monitor, Track and Assist Older Adults Suffering from
Type-2 Diabetes. In: the 46th Hawaii International Conference on System Sciences, Grand
Wailea, Maui, HI, USA, pp. 2636–2645.
38. Cho, K., Raiko, T., & Ihler, A. T. (2011) Enhanced gradient and adaptive learning rate for
training restricted Boltzmann machines. In: the 28th International Conference on Machine
Learning (ICML-11) pp. 105–112.
39. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., &
Bengio, Y. (2014) Learning phrase representations using RNN encoder-decoder for statistical
machine translation. arXiv preprint arXiv:1406.1078.
40. Choi, J., Ahmed, B., Gutierrez-Osuna, R. (2012) Development and evaluation of an
ambulatory stress monitor based on wearable sensors. IEEE Trans. Inf. Technol. Biomed.
16:279–286.
41. Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014) Empirical evaluation of gated recurrent
neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
42. Chung, J., Gülçehre, C., Cho, K., & Bengio, Y. (2015) Gated Feedback Recurrent Neural
Networks. In ICML. pp. 2067–2075.
43. Clifton, L., Clifton, D.A., Pimentel, M.A.F., Watkinson, P.J., Tarassenko, L. (2013) Gaussian
processes for personalized e-health monitoring with wearable sensors. IEEE Trans. Biomed.
Eng. 60:193–197.
44. Continua Health Alliance, “Version2010 Design Guidelines,” http://www.continuaalliance.
org/products/design-guidelines.html.
45. Cortes, C., Vapnik, V. (1995) Support-vector networks. Mach. Learn. 20:273–297.
46. Cunha, J.P.S., Cunha, B., Pereira, A.S., Xavier, W., Ferreira, N., Meireles, L. (2010) Vital-
Jacket®: A wearable wireless vital signs monitor for patients’ mobility in cardiology and
sports. 4th Int. Conf. on Pervasive Comput. Technol. for Healthc. 1–2.
47. Custodio, V., Herrera, F.J., Lopez, G., Moreno, J.I. (2012) A review on architectures and com-
munications technologies for wearable health-monitoring systems. Sensors 12:13907–13946.
48. Danie G. Krige (1951) A statistical approach to some basic mine valuation problems on the
Witwatersrand, J. Chem. Metall. Min. Soc. S. Afr. 52:119–139.
49. Ding, H., Sun, H., mean Hou, K. (2011) Abnormal ECG Signal Detection Based on
Compressed Sampling in Wearable ECG Sensor. In: the International Conference on Wireless
Communications and Signal Processing, Nanjing, China, pp. 1–5.
50. Ding, X., Lei, H., & Rao, Y. (2016) Sparse codes fusion for context enhancement of night
video surveillance. Multimed. Tools and Appli., 75:11221–11239.
51. Dong H, Supratak A, Pan W, Wu C, Matthews PM, Guo Y. (2018) Mixed neural network
approach for temporal sleep stage classification. IEEE Trans. on Neural Syst. and Rehabil.
Eng. 26:324–333.
52. Elliott, M.C.A. (2012) Critical care: The eight vital signs of patient monitoring. Br. J. Nurs.
21: 621–625.
53. Erfani, S. M., Rajasegarar, S., Karunasekera, S., & Leckie, C. (2016) High-dimensional
and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern
Recognit. 58:121–134.
54. Eskofier BM, Lee SI, Daneault JF, Golabchi FN, Ferreira-Carvalho G, Vergara-Diaz G. et al.
Recent machine learning advancements in sensor-based mobility analysis: deep learning
for Parkinson’s disease assessment. In: IEEE 38th Annual International Conference of the
Engineering in Medicine and Biology Society (EMBC), Lake Buena Vista, Orlando, USA,
pp. 655–658. (2016)
55. Fei, C.W., Bai, G.C. (2013) Wavelet correlation feature scale entropy and fuzzy support vec-
tor machine approach for aeroengine whole-body vibration fault diagnosis. Shock and Vib.
20(2):341–349.
56. Fei, C.W., Bai, G.C., Tang, W.Z., Ma, S. (2014) Quantitative diagnosis of rotor vibration fault
using process power spectrum entropy and support vector machine method. Shock and Vib.
2014:957531.
57. Fei CW, Lu C, Liem R.P. (2019) Decomposed-coordinated surrogate modelling strategy for
compound function approximation and a turbine blisk reliability evaluation. Aerosp. Sci.
Technol. 95: UNSP105466.
58. Figo, D., Diniz, P. C., Ferreira, D. R., & Cardoso, J. M. (2010) Preprocessing techniques
for context recognition from accelerometer data. Personal and Ubiquitous Computing.
14:645–662.
59. Fischer, A., & Igel, C. (2014) Training restricted Boltzmann machines: An introduction.
Pattern Recognition. 47:25–39.
60. Fraile, A.J., Javier, B., Corchado, J.M., Abraham, A. (2010) Applying wearable solutions in
dependent environments. IEEE Trans. Inf. Technol. Biomed. 14(6):1459–1467.
61. Frank, M. (2015) Your Head Is Better for Sensors than Your Wrist, Outside-Live Bravely:
Santa Fe, NM, USA.
62. François Chollet, Keras, 2016. [Online]. Available: https://keras.io/.
63. Frantzidis, C.A., Bratsas, C., Klados, M.A., Konstantinidis, E., Lithari, C.D., Vivas, A.B.,
Papadelis, C.L., Kaldoudi, E., Pappas, C., Bamidis, P.D. (2010) On the classification of
emotional biosignals evoked while viewing affective pictures: An integrated data-mining-
based approach for healthcare applications. Trans. Inf. Tech. Biomed. 14:309–318.
64. G. Matheron (1963) Principles of geostatistics, Econ. Geol. 58:1246–1266.
65. G. Matheron (1973) The intrinsic random functions and their applications, Adv. Appl. Probab.
5(3):439–468.
66. Gao, Y., & Glowacka, D. (2016) Deep Gate Recurrent Neural Network. arXiv preprint
arXiv:1604.02910.
67. Garmin Ltd. HRM-Tri™. Available online: https://buy.garmin.com.
68. Gialelis, J., Chondros, P., Karadimas, D., Dima, S., Serpanos, D. (2012) Identifying Chronic
Disease Complications Utilizing State of the Art Data Fusion Methodologies and Signal
Processing Algorithms. In Wireless Mobile Communication and Healthcare, Nikita, K.S.,
Lin, J.C., Fotiadis, D.I., Arredondo Waldmeyer, M.T., Eds., Springer: Berlin, Germany,
Volume 83, pp. 256–263.
69. Giri, D., Rajendra Acharya, U., Martis, R.J., Vinitha Sree, S., Lim, T.C., Ahamed VI, T., Suri,
J.S. (2013) Automated diagnosis of Coronary Artery Disease affected patients using LDA,
PCA, ICA and Discrete Wavelet Transform. Know. Based Syst. 37:274–282.
70. Google, Tensorflow, 2016. [Online]. Available: https://www.tensorflow.org/.
71. Graves, A. (2013) Generating sequences with recurrent neural networks. arXiv preprint
arXiv:1308.0850.
72. Gravina R, Alinia P, Ghasemzadeh H, Fortino G. (2017) Multi-sensor fusion in body sensor
networks: State-of-the-art and research challenges. Inf. Fusion. 35:68–80.
73. Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016) Deep learning for
visual understanding: A review. Neurocomputing 187:27–48.
74. Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L.A. (2006) Feature Extraction: Foundations and
Applications (Studies in Fuzziness and Soft Computing), Springer: Secaucus, NJ, USA.
75. H. Liu, J. Zhou, Y. Xu, Y. Zheng, X. Peng, and W. Jiang (2018) Unsupervised fault diagnosis
of rolling bearings using a deep neural network based on generative adversarial networks,
Neurocomput. 315:412–424.
76. Hakonen, M., Piitulainen, H., Visala, A. (2015) Current state of digital signal processing in
myoelectric interfaces and related applications. Biomed. Signal Process. Control 18:334–359.
77. HealthWatch Technologies Ltd. Available online: http://www.personal-healthwatch.com/.
78. Hexoskin. Available online: http://www.hexoskin.com/.
79. Hinton, G. E., & Salakhutdinov, R. R. (2006) Reducing the dimensionality of data with neu-
ral networks. Science. 313:504-507.
80. Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006) A fast learning algorithm for deep belief
nets. Neural comput. 18:1527–1554.
81. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012)
Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint
arXiv:1207.0580.
82. Hjalmarson, A. (2007) Heart rate: An independent risk factor in cardiovascular disease. Eur.
Heart J. Suppl. 9:F3–F7.
83. Hochreiter, S., & Schmidhuber, J. (1997) Long short-term memory. Neural Comput.
9:1735–1780.
84. Hongxia Li, Tao Liu, Minjie Wang, Danyang Zhao, Aike Qiao, Xue Wang, Junfeng Gu,
Zheng Li, Bao Zhu (2017) Design optimization of stent and its dilatation balloon using krig-
ing surrogate model, Biomed. Eng. Online 16(13):1–17.
85. http://www8.cao.go.jp/kourei/whitepaper/index-w.html
86. Hu, F., Jiang, M., Celentano, L., Xiao, Y. (2008) Robust medical ad hoc sensor networks
(MASN) with wavelet-based ECG data mining. Ad Hoc Netw. 6:986–1012.
87. Huang, G., Zhang, Y., Cao, J., Steyn, M., Taraporewalla, K. (2013) Online mining abnormal
period patterns from multiple medical sensor data streams. World Wide Web 2013, doi:https://
doi.org/10.1007/s11280-013-0203-y.
88. Hubel, D. H., & Wiesel, T. N. (1962) Receptive fields, binocular interaction and functional
architecture in the cat's visual cortex. The J. of physiol. 160:106–154.
89. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. C. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information
Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014,
December 8-13 2014, Montreal, Quebec, Canada, pp. 2672–2680.
90. Incel, O. (2015) Analysis of Movement, Orientation and Rotation-Based Sensing for Phone
Placement Recognition. Sensors, 15:25474.
91. J. Sacks, W.J. Welch, T.J. Mitchell, H.P. Wynn (1989) Design and analysis of computer
experiments, Stat. Sci. 4:409–423.
92. J. Tian, C. Morillo, M. H. Azarian and M. Pecht (2016) Motor bearing fault detection using
spectral Kurtosis-based feature extraction coupled with K-nearest neighbor distance analysis.
IEEE Trans. Ind. Electron. 63(3):1793–1803.
93. Jindal V, Birjandtalab J, Pouyan MB, Nourani M, An adaptive deep learning
approach for PPG-based identification. In: 38th Annual International Conference of the
Engineering in Medicine and Biology Society (EMBC), Lake Buena Vista, Orlando, USA,
pp. 6401–6404. (2016)
94. Jing, L., Wang, T., Zhao, M., & Wang, P. (2017) An Adaptive Multi-Sensor Data Fusion
Method Based on Deep Convolutional Neural Networks for Fault Diagnosis of Planetary
Gearbox. Sensors. 17:414.
95. K. Fukushima. (1980) Neocognitron: A self-organizing neural network model for a mecha-
nism of pattern recognition unaffected by shift in position. Biolog. Cybernetics. 36:193–202.
96. Kaewwichian, P., Tanwanichkul, L., Pitaksringkarn, J. (2019) Car ownership demand
modeling using machine learning: decision trees and neural networks. Int. J. of Geomate.
17(62):219–230.
97. Kalagnanam, J., Henrion, M. (2013) A comparison of decision analysis and expert rules for
sequential diagnosis. arXiv:1304.2362.
98. Karabadji, N.E., Khelf, I., Seridi, H., Aridhi, S., Remond, D., Dhifli, W. (2019) A data sam-
pling and attribute selection strategy for improving decision tree construction. Expert Syst.
with Appli. 129:84–96
99. Karlen, W., Mattiussi, C., Floreano, D. (2009) Sleep and wake classification with ECG and
respiratory effort signals. IEEE Trans. Biomed. Circuits Syst. 3:71–78.
100. Kautz, T., Groh, B. H., Hannink, J., Jensen, U., Strubberg, H., & Eskofier, B. M. (2017)
Activity recognition in beach volleyball using a Deep Convolutional Neural Network. Data
Min. and Knowl. Discov. 1–28.
101. Khan, Z.A., Sivakumar, S., Phillips, W., Robertson, B. (2014) ZEQoS: A New Energy and
QoS-Aware Routing Protocol for Communication of Sensor Devices in Healthcare System.
Int. J. Distrib. Sens. Netw. 1–18.
102. Khan, S., Yairi, T. (2018) A review on the application of deep learning in system health man-
agement, Mech. Syst. and Signal Process. 107:241–265.
103. Kharel, J., Reda, H.T., Shin, S.Y. (2018) Fog Computing-Based Smart Health Monitoring
System Deploying LoRa Wireless Communication. IETE Tech. Rev. 1–14.
104. Kim, Y., & Ling, H. (2009) Human activity classification based on micro-Doppler signatures
using a support vector machine. IEEE Trans. on Geosci. and Remote Sens. 47:1328–1337.
105. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012) Imagenet classification with deep con-
volutional neural networks. In Adv. in Neural Inf. Process. Syst. pp. 1097–1105.
106. L. A. Pastur-Romay, F. Cedrón, A. Pazos, and A. B. Porto-Pazos. (2016) Deep artificial neural
networks and neuromorphic chips for big data analysis: Pharmaceutical and bioinformatics
applications, Int. J. Molecular Sci., vol. 17, no. 8, Art. no. 1313.
107. L. Zhao, K.K. Choi, I. Lee (2011) Metamodeling method using dynamic Kriging for design
optimization, AIAA J. 49(9):2034–2046.
108. Längkvist M, Karlsson L, Loutfi A. (2012) Sleep stage classification using unsupervised fea-
ture learning. Adv. in Artif. Neural Syst. 2012:1-9.
109. LeCun, Y., Bengio, Y., & Hinton, G. (2015) Deep learning. Nature. 521:436–444.
110. Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009) Convolutional deep belief networks
for scalable unsupervised learning of hierarchical representations. In: the 26th annual inter-
national conference on machine learning, pp. 609-616. ACM.
111. Lee, K.H., Kung, S.Y., Verma, N. (2012) Low-energy formulations of support vector machine
kernel functions for biomedical sensor applications. J. Signal Process. Syst. 69:339–349.
112. Lee, Y.D., Chung, W.Y. (2009) Wireless sensor network based wearable smart shirt for ubiq-
uitous health and activity monitoring. Sensor. Actuator. B 140:390–395
113. Li, G., Deng, L., Xu, Y., Wen, C., Wang, W., Pei, J., & Shi, L. (2016) Temperature based
Restricted Boltzmann Machines. Sci. Rep. 6.
114. Li, H., Wu, J., Gao, Y.W., Shi, Y. (2016) Examining individuals’ adoption of healthcare wear-
able devices: an empirical study from privacy calculus perspective. Int. J. Med. Inf. 88:8–17.
115. Li, Q., Clifford, G.D. (2012) Dynamic time warping and machine learning for signal quality
assessment of pulsatile signals. Physiol. Meas. 33:1491–1501.
116. Li, X., Porikli, F. (2010) Human State Classification and Predication for Critical Care
Monitoring by Real-Time Bio-signal Analysis. In: the 20th International Conference on
Pattern Recognition, Istanbul, Turkey, pp. 2460–2463.
117. Liddy, C., Dusseault, J.J., Dahrouge, S., et al. (2008) Telehomecare for patients with multiple
chronic illnesses: pilot study. Can. Fam. Physician 54:58–65.
118. Lin, L., Wang, K. Z., Zuo, W. M., Wang, M., Luo, J. B., & Zhang, L. (2016) A Deep Structured
Model with Radius-Margin Bound for 3D Human Activity Recognition. Int. J. of Comput.
Vis. 118:256–273.
119. Liou, C.-Y., Cheng, W.-C., Liou, J.-W., & Liou, D.-R. (2014) Autoencoder for words.
Neurocomputing. 139:84–96.
120. Liu, G., Liang, J., Lan, G., Hao, Q., & Chen, M. (2016) Convolution neutral network
enhanced binary sensor network for human activity recognition. In SENSORS, 2016 IEEE
(pp. 1–3): IEEE.
121. López-Vallverdú, J.A., Riaño, D., Bohada, J.A. (2012) Improving medical decision trees by
combining relevant health-care criteria. Expert Syst. Appl. 39:11782–11791.
122. Lu N, Li T, Ren X, Miao H. (2017) A deep learning scheme for motor imagery classification
based on restricted boltzmann machines. IEEE Trans. on Neural Syst. and Rehabil. Eng. 25:
566–576.
123. Lukowicz, P., Anliker, U., Ward, J., Troster, G., Hirt, E., Neufelt, C. (2002) AMON: A wear-
able medical computer for high risk patients. The 6th Int. Symp. on Wearable Comput.
133–134.
124. Lymberis, A.G.L. (2006) Wearable health systems: From smart technologies to real applica-
tions. In: the Annual International Conference of the IEEE Engineering in Medicine and
Biology Society, New York, NY, USA, pp. 6789–6792.
125. M. Li, G. Li, S. Azarm (2008) A Kriging metamodel assisted multi-objective genetic algo-
rithm for design optimization, J. Mech. Des. 130(3):031401.
126. Ma T, Li H, Yang H, Lv X, Li P, Liu T, et al. (2017) The extraction of motion-onset VEP
BCI features based on deep learning and compressed sensing. J. of Neurosci. Methods.
275: 80–92.
127. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E., & Svetnik, V. (2015) Deep neural nets as a
method for quantitative structure–activity relationships. Journal of Chemical Information and
Modeling, 55:263–274.
128. Majumder, S., Mondal, T., Deen, M.J. (2017) Wearable sensors for remote health monitoring.
Sensors 17:130.
129. Mao, Y., Chen, W., Chen, Y., Lu, C., Kollef, M., Bailey, T. (2012) An Integrated Data Mining
Approach to Real-Time Clinical Monitoring and Deterioration Warning. In: the 18th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing,
China, pp. 1140–1148.
130. Ranzato, M., Poultney, C., Chopra, S., & LeCun, Y. (2007) Efficient learning of sparse
representations with an energy-based model. In: NIPS.
131. Marco Di Rienzo, G.P., Brambilla, G., Ferratini, M., Castiglioni, P. (2005) MagIC System:
A New Textile-Based Wearable Device for Biological Signal Monitoring. Applicability in
Daily Life and Clinical Setting. In: the 2005 IEEE, Engineering in Medicine and Biology
27th Annual Conference 2005, Shanghai, China, pp. 7167–7169.
132. Masci, J., Meier, U., Cireşan, D., & Schmidhuber, J. (2011) Stacked convolutional
auto-encoders for hierarchical feature extraction. In: International Conference on
Artificial Neural Networks, pp. 52–59. Springer.
133. Microsoft, Cntk, 2016. [Online]. Available: https://github.com/Microsoft/CNTK.
134. Montavon, G., & Müller, K.-R. (2012) Deep Boltzmann machines and the centering trick. In
Neural Networks: Tricks of the Trade pp. 621–637: Springer.
135. Mukherjee, A., Pal, A., Misra, P. (2012) Data Analytics in Ubiquitous Sensor-Based Health
Information Systems. In: the 2012 6th International Conference on Next Generation Mobile
Applications, Services and Technologies, Paris, France, pp. 193–198.
136. Murnane, E.L., Cosley, D., Chang, P., Guha, S., Frank, E., Gay, G., Matthews, M. (2016)
Self-monitoring practices, attitudes, and needs of individuals with bipolar disorder: impli-
cations for the design of technologies to manage mental health. J. Am. Med. Inf. Assoc.
23(3):477–484.
137. Nair, V., & Hinton, G. E. (2010) Rectified linear units improve restricted boltzmann machines.
In: the 27th international conference on machine learning (ICML-10) pp. 807–814.
138. Nangalia, V., Prytherch, D., Smith, G. (2010) Health technology assessment review: Remote
monitoring of vital signs—current status and future challenges. Crit. Care 14:1–8.
139. Naraharisetti, K.V.P., Bawa, M, Tahernezhadi, M. (2011) Comparison of Different Signal
Processing Methods for Reducing Artifacts from Photoplethysmograph Signal. In: the IEEE
International Conference on Electro/Information Technology, Mankato, MN, USA, pp. 1–8.
140. Nervana Systems, Neon, 2016. [Online]. Available: https://github.com/NervanaSystems/neon.
141. Niemela, M., Fuentetaja, R.G., Kaasinen, E., Gallardo, J.L. (2007) Supporting independent
living of the elderly with mobile-centric ambient intelligence: user evaluation of three sce-
narios. Lect. Notes Comput. Sci. 4794:91–107.
142. NVIDIA Corp., Nvidia dgx-1, 2016. [Online]. Available: http://www.nvidia.com/object/
deep-learning-system.html.
143. Nweke, H.F., Teh, Y.W., Al-garadi, M.A., & Alo, U.R. (2018) Deep learning algorithms for
human activity recognition using mobile and wearable sensor networks: State of the art and
research challenges. Expert Syst. with Appli. 105:233–261.
144. Olshausen, B. A., & Field, D. J. (1997) Sparse coding with an overcomplete basis set: A
strategy employed by V1? Vis. Res. 37:3311–3325.
145. OM Signal Inc. OM Smart Shirt. Available online: http://omsignal.com.
146. Ordóñez, F. J., & Roggen, D. (2016) Deep Convolutional and LSTM Recurrent Neural
Networks for Multimodal Wearable Activity Recognition. Sensors. 16:115.
147. Ordonez, P., Armstrong, T., Oates, T., Fackler, J. (2011) Classification of Patients Using Novel
Multivariate Time Series Representations of Physiological Data. In: the 10th International
Conference on Machine Learning and Applications, Honolulu, HI, USA, pp. 172–179.
148. Oyedotun, O. K., & Khashman, A. (2016) Deep learning in vision-based static hand gesture
recognition. Neural Comput. and Appli. 1–11.
149. Page A, Kulkarni, A, Mohsenin T. (2015) Utilizing deep neural nets for an embedded ECG-
based biometric authentication system. In: Biomedical Circuits and Systems Conference
(BioCAS), Atlanta, GA, USA, pp. 1–4.
150. Paliwal, M., Kumar, U.A. (2009) Neural networks and statistical techniques: A review of
applications. Expert. Syst. Appl. 36:2–17.
151. Pantelopoulos, A., Bourbakis, N.G. (2010) A Survey on Wearable Sensor-Based Systems for
Health Monitoring and Prognosis. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 40:1–12.
152. Podgorelec, V., Kokol, P., Stiglic, B., Rozman, I. (2002) Decision trees: An overview and
their use in medicine. J. Med. Syst. 26:445–463.
153. Postema, T., Peeters, J.M., Friele, R.D. (2012) Key factors influencing the implementation
success of a home telecare application. Int. J. Med. Inf. 81(5):415–423.
154. Poultney, C., et al., (2006) Efficient learning of sparse representations with an energy-based
model, in Proc. Adv. Neural Inf. Process. Syst., pp. 1137–1144.
155. R. Collobert, K. Kavukcuoglu, and C. Farabet, Torch, 2016. [Online]. Available: http://
torch.ch/.
156. Rabiner, L., Juang, B.H. (1986) An introduction to hidden Markov models. IEEE ASSP
Mag. 3:4–16.
157. Rahhal, M. M. A., Bazi, Y., AlHichri, H., Alajlan, N., Melgani, F., & Yager, R. R. (2016)
Deep learning approach for active classification of electrocardiogram signals. Inf. Sci.
345:340–354.
158. Rault, T., Bouabdallah, A., Challal, Y., Marin, F. (2017) A survey of energy-efficient con-
text recognition systems using wearable sensors for healthcare applications. Pervasive Mob.
Comput. 37:23–44.
159. Ravi D, Wong C, Lo B, Yang GZ. cs. In: 13th International Conference on Wearable and
Implantable Body Sensor Networks (BSN), San Francisco, CA, USA, pp. 71–76. (2016)
160. Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., & Yang,
G. Z. (2017) Deep Learning for Health Informatics. IEEE J. of Biomed. and Health Inf.
21:4–21.
161. Rhea P. Liem, Charles A. Mader, Joaquim R.R.A. Martins (2015) Surrogate models and mix-
tures of experts in aerodynamic performance prediction for mission analysis, Aerosp. Sci.
Technol. 43:126–151.
162. Rifai, S., Vincent, P., Muller, X., Glorot, X., & Bengio, Y. (2011) Contractive auto-encoders:
Explicit invariance during feature extraction. In: the 28th international conference on machine
learning (ICML-11), pp. 833–840.
163. Ripoll VJR, Wojdel A, Romero E, Ramos P, Brugada J. (2016) ECG assessment based on
neural networks with pretraining. Appli. Soft Comput. 49: 399–406.
164. Rita Paradiso, G.L., Taccini, N. (2005) A Wearable Health Care System Based on Knitted
Integrated Sensors. IEEE Trans. Inf. Technol. Biomed. 337–344.
165. Rodriguez, M., Orrite, C., Medrano, C., & Makris, D. (2016) One-Shot Learning of Human
Activity With an MAP Adapted GMM and Simplex-HMM. IEEE Trans. Cybern. 1–12.
166. Ronao, C. A., & Cho, S.-B. (2015) Evaluation of deep convolutional neural network archi-
tectures for human activity recognition with smartphone sensors. In Proc. of the KIISE Korea
Computer Congress 858–860.
167. Ronao, C. A., & Cho, S.-B. (2016) Human activity recognition with smartphone sensors
using deep learning neural networks. Expert Syst. with Appli. 59:235–244.
168. Rosenbloom, S.T. (2016) Person-generated health and wellness data for health care. J. Am.
Med. Inf. Assoc. 23(3):438–439.
169. Ruiz-Rodríguez JC, Ruiz-Sanmartín A, Ribas V, Caballero J, García-Roche A, Riera J et al.
(2013) Innovative continuous non-invasive cuffless blood pressure monitoring based on pho-
toplethysmography technology. Intensive Care Med. 39(9): 1618–1625.
170. S. Lu and X. Wang (2004) PCA-based feature selection scheme for machine defect
classification. IEEE Trans. Instrum. Meas. 53(6):1517–1525.
171. Saeedi R, Norgaard S, Gebremedhin AH. A closed-loop deep learning architecture for robust
activity recognition using wearable sensors. In: IEEE International Conference on Big Data.
Boston, MA, USA, pp. 473–479. (2017)
172. Safi, K., Mohammed, S., Attal, F., Khalil, M., & Amirat, Y. (2016) Recognition of different
daily living activities using hidden Markov model regression. In Biomedical Engineering
(MECBME) 16–19.
173. Salakhutdinov, R., & Larochelle, H. (2010) Efficient Learning of Deep Boltzmann Machines.
In AISTATS 693–700.
174. Salakhutdinov, R., & Hinton, G. (2012) An efficient learning procedure for deep Boltzmann
machines. Neural Comput. 24:1967–2006.
175. Sarkar S, Reddy K, Dorgan A, Fidopiastis C, Giering M. (2016) Wearable EEG-based activ-
ity recognition in PHM-related service environment via deep learning. Int. J. Progn. Health
Manag. 7:1–10.
176. Sathyanarayana, A., Joty, S., Fernandez-Luque, L., Ofli, F., Srivastava, J., Elmagarmid, A.,
Taheri, S., & Arora, T. (2016) Impact of Physical Activity on Sleep: A Deep Learning Based
Exploration. arXiv preprint arXiv:1607.07034.
177. Schmidhuber, J. (2015) Deep learning in neural networks: An overview. Neural Netw.
61:85–117.
178. Scully, C., Lee, J., Meyer, J., Gorbach, A.M., Granquist-Fraser, D., Mendelson, Y., Chon,
K.H. (2012) Physiological parameter monitoring from optical recordings with a mobile
phone. IEEE Trans. Biomed. Eng. 59:303–306.
179. Seoane, F., Mohino-Herranz, I., Ferreira, J., Alvarez, L., Buendia, R., Ayllon, D., Llerena,
C., Gil-Pita, R. (2014) Wearable biomedical measurement systems for assessment of mental
stress of combatants in real time. Sensors. 14:7120–7141.
180. Shao, H., Jiang, H., Zhao, H. and Wang, F. (2017) A novel deep autoencoder feature learning
method for rotating machinery fault diagnosis, Mech. Syst. Signal Process. 95:187–204.
181. Shashikumar SP, Shah AJ, Li Q, Clifford GD, Nemati S, A deep learning approach to monitor-
ing and detecting atrial fibrillation using wearable technology. In: IEEE EMBS International
Conference of Biomedical & Health Informatics (BHI), 4–7 March, Las Vegas, Nevada,
USA, pp. 141–144.
182. Shoaib, M., Bosch, S., Incel, O. D., Scholten, H., & Havinga, P. J. (2016) Complex human
activity recognition using smartphone and wrist-worn motion sensors. Sensors 16:426.
183. Simpao AF, Ahumada LM, Gálvez JA, Rehman MA. (2014) A review of analytics and clini-
cal informatics in healthcare. J. Med. Syst. 38(4):1–7.
184. Skymind, Deeplearning4j, 2016. [Online]. Available: http://deeplearning4j.org/.
185. Solmitech. Patch-type SHC-U7. Available online: http://www.solmitech.com/.
186. Song, Q., Zheng, Y. J., Xue, Y., Sheng, W. G., & Zhao, M. R. (2017) An evolutionary deep
neural network for predicting morbidity of gastrointestinal infections by food contamination.
Neurocomputing 226:16–22.
187. Sow, D., Turaga, D., Schmidt, M. (2013) Mining of Sensor Data in Healthcare: A Survey.
In Managing and Mining Sensor Data, Aggarwal, C.C., Ed., Springer: Berlin, Germany,
459–504.
188. Stacey, M., McGregor, C. (2007) Temporal abstraction in intelligent clinical data analysis: A
survey. Artif. Intell. Med. 39:1–24.
189. Stowe, S., Harding, S. (2010) Telecare, telehealth and telemedicine. Eur. Geriatr. Med.
1:193–197.
190. Sun, F.T., Kuo, C., Cheng, H.T., Buthpitiya, S., Collins, P., Griss, M. (2012) Activity-Aware
Mental Stress Detection Using Physiological Sensors. In Mobile Computing, Applications,
and Services, Gris, M., Yang, G., Eds., Springer: Berlin, Germany, Volume 76, pp. 211–230.
191. Sutskever, I., Vinyals, O., & Le, Q. V. (2014) Sequence to sequence learning with neural
networks. In Adv. Neural Inf. Process. Syst. pp. 3104–3112
192. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke,
V., & Rabinovich, A. (2015) Going deeper with convolutions. In: the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 1–9
193. T.W. Simpson, T.M. Mauery, J.J. Korte, F. Mistree (2001) Kriging metamodels for
global approximation in simulation-based multidisciplinary design optimization, AIAA
J. 39(12):2233–2241.
194. Tabar YR, Halici U. (2016) A novel deep learning approach for classification of EEG motor
imagery signals. J. Neural Eng. 14:016003.
195. Taylor, G. W., Hinton, G. E., & Roweis, S. T. (2007) Modeling human motion using binary
latent variables. Adv. Neural Inf. Process. Syst. 19:1345.
196. Tennina, S., Di Renzo, M., Kartsakli, E., Graziosi, F., Lalos, A.S., Antonopoulos, A., Mekikis,
P.V., Alonso, L. (2014) WSN4QoL: A WSN-Oriented Healthcare System Architecture. Int.
J. Distrib. Sens. Netw. 503417.
197. Thakker, B., Vyas, A.L. (2011) Support vector machine for abnormal pulse classification. Int.
J. Comput. Appl. 22:13–19.
198. Thomas, O., Sunehag, P., Dror, G., Yun, S., Kim, S., Robards, M., Smola, A., Green, D.,
Saunders, P. (2010) Wearable sensor activity analysis using semi-Markov models with a
grammar. Pervasive Mob. Comput. 6:342–350.
199. Universite de Montreal, Theano, 2016. [Online]. Available: http://deeplearning.net/software/
theano/.
200. Valipour, S., Siam, M., Jagersand, M., & Ray, N. (2016) Recurrent Fully Convolutional
Networks for Video Segmentation. arXiv preprint arXiv:1606.00487.
201. Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008) Extracting and compos-
ing robust features with denoising autoencoders. In: the 25th international conference on
Machine learning, pp. 1096–1103.
202. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P.-A. (2010) Stacked denois-
ing autoencoders: Learning useful representations in a deep network with a local denoising
criterion. J. of Mach. Learn. Res. 11:3371–3408.
203. Vital Connect. HealthPatch® MD. Available online: http://www.vitalconnect.com/.
204. Vivonoetics. ActiWave Cardio. Available online: http://vivonoetics.com/.
205. Vivonoetics. Smartex WWS. Available online: http://vivonoetics.com/.
206. VPMS Asia Pacific. V-Patch. Available online: http://www.vpatchmedical.com/.
207. Vu, T.H.N., Park, N., Lee, Y.K., Lee, Y., Lee, J.Y., Ryu, K.H. (2010) Online discovery of
Heart Rate Variability patterns in mobile healthcare services. J. Syst. Softw. 83:1930–1940.
208. Wang, L. (2016) Recognition of human activities using continuous autoencoders with wear-
able sensors. Sensors. 16:189.
209. Wang, W., Wang, H., Hempel, M., Peng, D., Sharif, H., Chen, H.H. (2011) Secure stochas-
tic ECG signals based on gaussian mixture model for e-healthcare systems. IEEE Syst.
J. 5:564–573.
210. Widodo, A., Yang, B.S. (2007) Application of nonlinear feature extraction and support vector
machines for fault diagnosis of induction motors. Expert Syst. Appl. 33:241–250.
211. Withings—Inspire Health. Pulse Ox—Track. Improve. Available online:
http://www.withings.com/eu/withings-pulse.html.
212. Wolfram Research, Wolfram math, 2016. [Online]. Available: https://www.wolfram.com/mathematica/.
213. Wulsin D, Gupta J, Mani R, Blanco J, Litt B. (2011) Modeling electroencephalography wave-
forms with semi-supervised deep belief nets: fast classification and anomaly measurement.
J. Neural Eng. 8(3):1–28.
214. X. Xue and J. Zhou (2017) A hybrid fault diagnosis approach based on mixed-domain state
features for rotating machinery. ISA Trans. 66:284–295.
215. Xu, P.J., Zhang, H., Tao, X.M. (2008) Textile-structured electrodes for electrocardiogram.
Text. Prog. 40:183–213.
216. Yalçın, H. (2016) Human activity recognition using deep belief networks. In 2016 24th
Signal Processing and Communication Application Conference (SIU) pp. 1649–1652.
217. Yan Y, Qin X, Wu Y, Zhang N, Fan J, Wang L. (2015) A restricted Boltzmann machine based
two-lead electrocardiography classification. In: 12th International Conference on Wearable
and Implantable Body Sensor Networks (BSN), Cambridge, Massachusetts, pp. 1–9.
218. Yang, J. B., Nguyen, M. N., San, P. P., Li, X. L., & Krishnaswamy, S. (2015) Deep con-
volutional neural networks on multichannel time series for human activity recognition. In:
the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires,
Argentina, pp. 25–31.
219. Yeh, J.Y., Wu, T.H., Tsao, C.W. (2011) Using data mining techniques to predict hospitaliza-
tion of hemodialysis patients. Decis. Support Syst. 50:439–448.
220. Yilmaz, T., Foster, R., Hao, Y. (2010) Detecting vital signs with wearable wireless sensors.
Sensors 10:10837–10862.
221. Yin, W., Yang, X., Zhang, L., & Oki, E. (2016) ECG Monitoring System Integrated With
IR-UWB Radar Based on CNN. IEEE Access, 4:6344–6351.
222. Yoo, I., Alafaireet, P., Marinov, M., Pena-Hernandez, K., Gopidi, R., Chang, J.F., Hua,
L. (2012) Data mining in healthcare and biomedicine: A survey of the literature. J. Med.
Syst. 36:2431–2448.
223. Yoon, J. (2013) Three-Tiered Data Mining for Big Data Patterns of Wireless Sensor Networks
in Medical and Healthcare Domains. In: the 8th International Conference on Internet and
Web Applications and Services, Rome, Italy, pp. 18–24.
224. Younes, L. (1999) On the convergence of Markovian stochastic algorithms with rapidly
decreasing ergodicity rates. Stochastics: An Int. J. Probab. Stoch. Process. 65:177–228.
225. Zeiler, M. D., and Fergus, R. (2014) Visualizing and understanding convolutional networks,
in Proc. Eur. Conf. Comput. Vision, pp. 818–833.
226. Zephyr Performance Systems. BioHarness™ 3. Available online:
http://www.zephyranywhere.com/products/bioharness-3.
227. Zephyr Technology Corp. Available online: http://zephyranywhere.com/.
228. Zhang J, Wu Y, Bai J, Chen F. (2016) Automatic sleep stage classification based on sparse
deep belief net and combination of multiple classifiers. Trans. of the Institute of Meas. and
Control. 38: 435–451.
229. Zhang, M., & Sawchuk, A. A. (2013) Human daily activity recognition with sparse represen-
tation using wearable sensors. IEEE J. Biomed. Health Inf. 17: 553-560.
230. Zhang, S., Zhang, S., Wang, B., Habetler, T.C., Machine Learning and Deep Learning
Algorithms for Bearing Fault Diagnostics – A Comprehensive Review, https://arxiv.org/
pdf/1901.08247.pdf.
231. Zheng, Y.-J., Ling, H.-F., & Xue, J.-Y. (2014) Ecogeography-based optimization: enhancing
biogeography-based optimization with ecogeographic barriers and differentiations. Comput.
& Oper. Res. 50:115-127.
232. Zhou, X., Guo, J., & Wang, S. (2015) Motion recognition by using a stacked autoencoder-
based deep learning algorithm with smart phones. In Int. Conf. on Wirel. Algorithm., Syst.,
and Appli., pp. 778–787: Springer.
233. Zhu, Y. (2011) Automatic detection of anomalies in blood glucose using a machine learning
approach. J. Commun. Netw. 13:125–131.