Article
Human Posture Detection Using Image Augmentation and
Hyperparameter-Optimized Transfer Learning Algorithms
Roseline Oluwaseun Ogundokun , Rytis Maskeliūnas and Robertas Damaševičius *
Abstract: With the advancement in pose estimation techniques, human posture detection recently received considerable attention in many applications, including ergonomics and healthcare. When using neural network models, overfitting and poor performance are prevalent issues. Recently, convolutional neural networks (CNNs) were successfully used for human posture recognition from human images due to their superior multiscale high-level visual representations over hand-engineered low-level characteristics. However, calculating millions of parameters in a deep CNN requires a significant number of annotated examples, which prohibits many deep CNNs such as AlexNet and VGG16 from being used on problems with minimal training data. We propose a new three-phase model for decision support that integrates CNN transfer learning, image data augmentation, and hyperparameter optimization (HPO) to address this problem. The model is used as part of a new decision support framework for the optimization of hyperparameters for AlexNet, VGG16, CNN, and multilayer perceptron (MLP) models for accomplishing optimal classification results. The AlexNet and VGG16 transfer learning algorithms with HPO are used for human posture detection, while CNN and MLP are used as standard classifiers for contrast. The HPO methods are essential for machine learning and deep learning algorithms because they directly influence the behaviors of training algorithms and have a major impact on the performance of machine learning and deep learning models. We used an image data augmentation technique to increase the number of images available for model training, to reduce model overfitting, and to improve classification performance with the AlexNet, VGG16, CNN, and MLP models. The optimal combination of hyperparameters was found for the four models using a random-based search strategy. The MPII human posture dataset was used to test the proposed approach. The proposed models achieved an accuracy of 91.2% using AlexNet, 90.2% using VGG16, 87.5% using CNN, and 89.9% using MLP. The study is the first HPO study executed on the MPII human pose dataset.
Keywords: data augmentation; CNN; dropout; transfer learning; human pose detection; hyperparameter optimization; decision support
Figure 1. Deep CNN Architecture [8].
Several manuscripts have used machine learning and deep learning methods to solve the human posture classification problem [9,10], but none, as far as we know, applied hyperparameter optimization (HPO) with algorithms to find the best hyperparameters that yield the best classification accuracy for the machine learning or deep learning algorithms used. This set of hyperparameters is not the same for every classification task and changes according to the nature of the medical problem, which is complex by design. When a critical analysis of medical data is required to identify hidden links or abnormalities that are not evident to humans, machine learning technologies are being employed in healthcare for computational decision-making [11]. Decision support systems combined with various artificial intelligence (AI) methods have been used for supporting various medical decisions while analyzing complex biomedical signals and images such as echocardiograms [12], magnetic resonance images [13], and chest x-rays and computed tomography images [14].
Usually, huge amounts of complicated medical data, reports, and images must be analyzed more quickly but with higher accuracy. It is challenging to implement algorithms to carry out such jobs in and of themselves, but it is even more difficult to improve algorithm accuracy while reducing execution time. In particular, hyperparameter optimization may require a much larger overhead, because multiple training rounds and evaluations of machine/deep learning algorithms for hyperparameter-optimized decision support are needed. However, most of these methods solely concentrate on feature selection while focusing specifically on the issues of underfitting and overfitting. The model can perform well on both datasets, i.e., training data and testing data, if overfitting and underfitting are prevented. Overfitting of the training data is frequently caused by irrelevant features and improper (i.e., suboptimal) model design [15]. Most models proposed in the literature are not generalizable, which raises the need for optimizing models that adapt well to new, previously unavailable medical data [16]. As hyperparameter optimization is often neglected in the validation of decision support systems, the significance of its effects remains unreported [17]. Moreover, there is a lack of systematic studies on the effect of hyperparameters on the accuracy and robustness of the AI models used for decision support [18].
This paper aims to propose a rigorous methodology for CNN transfer learning HPO
for human posture image classification. For this, the random search approach was adopted
to create recommended rankings for filter sizes for the convolutional layers and dense
layers of the four models and the learning rate values for the optimizer used in the models.
Additionally, the dataset used in the study includes the MPII human pose images obtained
from the Kaggle repository. Unlike current approaches, the models and techniques utilized in this proposed system are more favorable since they enhance and optimize the
classification of human posture using deep learning algorithms.
The remainder of the article is organized as follows: Section 2 presents the related works. Section 3 describes the two HPO models used in the study, along with the materials and methods. Section 4 presents the experimental results. The discussion is presented in Section 5, and the article is concluded with future works suggested in Section 6.
2. Literature Review
In this article, we describe the use of HPO of deep learning algorithms and data augmentation with deep learning models to solve image classification problems. It is therefore essential to shed light on earlier related works on image classification using deep learning algorithms. Machine learning on limited datasets is aided by data augmentation approaches [19–23], because the creation of synthetic images directly helps to increase a deep learning model’s capacity for generalization and therefore lowers the risk of overfitting [19,20]. One of the difficulties with data augmentation in this regard is determining which transformations (such as zoom, rotation, and flip) will be applied to the images [24–28]. This topic may be categorized as an HPO problem in terms of
machine learning [29–35]. Some research in the literature has looked at the impact of
data augmentation and HPO combinations in a variety of applications, including plant
classification [36], transmission line inspection [37], and the COVID-19 diagnosis procedure
in chest radiographic imaging [38]. Hyperparameter optimization was also applied in
bioinformatics to optimize SVM classification [39], predict real estate prices [40], and
improve neural network training [41].
However, previous studies lack suggestions for optimizing the combinations of deep
learning approaches in human posture image classification. CNNs are deep learning
methods that have been extensively studied in the field of computer vision [42–44]. One
of the most important aspects that contribute to CNN’s relevance in machine learning
approaches is the ability to extract features from processed images automatically [45,46].
The novelty of this paper is as follows:
• A new decision support framework for the optimization of hyperparameters for
AlexNet, VGG16, CNN, and multilayer perceptron (MLP) models for accomplishing
optimal classification results;
• An experimental comparison of AlexNet, VGG16, CNN, and MLP classifiers that were
trained and evaluated by applying the image data augmentation technique to enrich
the training datasets;
• The study is the first HPO study executed on the MPII human pose dataset.
The dataset was split into 75% for training, 15% for validation, and 10% for testing. The dataset used for the study comprised 22,000 images, broken down into 16,496 images for the training set (75%), 3305 images for validation (15%), and 2223 images for testing (10%).
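A minimal sketch of producing such a 75%/15%/10% split is shown below; the directory layout (one sub-folder per posture class) and the random seed are illustrative assumptions rather than the authors' exact pipeline.

```python
import glob
import os
from sklearn.model_selection import train_test_split

# Assumed layout: mpii_poses/<class_name>/<image>.jpg
data_dir = "mpii_poses"
paths = glob.glob(os.path.join(data_dir, "*", "*.jpg"))
labels = [os.path.basename(os.path.dirname(p)) for p in paths]

# First carve off the 10% test portion, then split the remainder so that the
# overall proportions are 75% train / 15% validation / 10% test.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths, labels, test_size=0.10, stratify=labels, random_state=42)
train_paths, val_paths, train_labels, val_labels = train_test_split(
    train_paths, train_labels, test_size=0.15 / 0.90,
    stratify=train_labels, random_state=42)

print(len(train_paths), len(val_paths), len(test_paths))
```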
Figure 2. The framework of the proposed classification transfer learning models with the image data augmentation method.
3.4. AlexNet
The AlexNet model is characterized by containing input layers, convolutional layers, pooling layers, and fully connected (FC) layers. In general, the AlexNet transfer learning algorithm is eight layers deep, comprising 5 convolutional layers and 3 FC layers, as seen in Figure 3. The network was modified by adding Batch Normalization and Dropout layers. The sequential model was built in blocks as follows:
Figure 3. Modified AlexNet Architecture for Human Posture Image Classification.
Block 1:
• Convolutional layer 1: 96 filters, 11 kernels, 4 strides, padding is valid, and activation = relu.
• Pooling layer 1: MaxPooling2D of size 3, 2 strides, padding is valid.
• BatchNormalization layer 1 was passed before moving to the next layer.
Block 2:
• Convolutional layer 2: 256 filters, 5 kernels, 1 stride, padding is valid, and activation = relu.
• Pooling layer 2: MaxPooling2D of size 3, 2 strides, padding is valid.
• BatchNormalization layer 2 was passed before moving to the next layer.
Block 3:
• Convolutional layer 3: 384 filters, 3 kernels, 1 stride, padding is valid, and activation = relu.
• BatchNormalization layer 3 was passed before moving to the next layer.
Block 4:
• Convolutional layer 4: 384 filters, 3 kernels, 1 stride, padding is valid, and activation = relu.
• BatchNormalization layer 4 was passed before moving to the next layer.
Block 5:
• Convolutional layer 5: 256 filters, 5 kernels, 1 stride, padding is valid, and activation = relu.
• Pooling layer 5: MaxPooling2D of size 3, 2 strides, padding is valid.
• BatchNormalization layer 5 was passed before moving to the next layer.
• The network model was flattened before moving to the next block.
Block 6:
• Fully connected layer 1: Dense node 4096, input (227, 227, 3), activation = relu.
• Dropout layer was added to prevent the model from overfitting (0.4).
• BatchNormalization layer 6 was passed before moving to the next layer.
Block 7:
• Fully connected layer 2: Dense node 4096, activation = relu.
• Dropout layer was added to prevent the model from overfitting (0.4).
• BatchNormalization layer 7 was passed before moving to the next layer.
Block 8:
• Output layer: Dense (7), activation = softmax.
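Putting the eight blocks together, a compact Keras sketch of this modified AlexNet could look as follows. The layer settings are taken from the block listing above, and the 227 × 227 × 3 input and 7-class softmax output follow the text; the function name and anything not stated in the paper are illustrative assumptions.

```python
from tensorflow.keras import layers, models

def build_modified_alexnet(num_classes=7, input_shape=(227, 227, 3)):
    m = models.Sequential()
    # Blocks 1-2: convolution + max pooling + batch normalization
    m.add(layers.Conv2D(96, 11, strides=4, padding="valid", activation="relu",
                        input_shape=input_shape))
    m.add(layers.MaxPooling2D(pool_size=3, strides=2, padding="valid"))
    m.add(layers.BatchNormalization())
    m.add(layers.Conv2D(256, 5, strides=1, padding="valid", activation="relu"))
    m.add(layers.MaxPooling2D(pool_size=3, strides=2, padding="valid"))
    m.add(layers.BatchNormalization())
    # Blocks 3-4: convolution + batch normalization (no pooling)
    for _ in range(2):
        m.add(layers.Conv2D(384, 3, strides=1, padding="valid", activation="relu"))
        m.add(layers.BatchNormalization())
    # Block 5: convolution + max pooling + batch normalization, then flatten
    m.add(layers.Conv2D(256, 5, strides=1, padding="valid", activation="relu"))
    m.add(layers.MaxPooling2D(pool_size=3, strides=2, padding="valid"))
    m.add(layers.BatchNormalization())
    m.add(layers.Flatten())
    # Blocks 6-7: dense 4096 + dropout 0.4 + batch normalization
    for _ in range(2):
        m.add(layers.Dense(4096, activation="relu"))
        m.add(layers.Dropout(0.4))
        m.add(layers.BatchNormalization())
    # Block 8: softmax output over the posture classes
    m.add(layers.Dense(num_classes, activation="softmax"))
    return m
```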
3.5. VGG16
The VGG16 model is characterized by containing input layers, convolutional layers, pooling layers, and FC layers as well. In general, the VGG16 transfer learning algorithm is 16 layers deep, comprising 13 convolutional layers and 3 FC layers, as seen in Figure 4. It was developed into six blocks, which are five convolutional blocks and one classification block. The network was modified by adding Batch Normalization and Dropout layers. The input size used was 300 × 300, and the image channel was 3.
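For reference, a roughly equivalent 16-layer backbone can also be taken from tf.keras.applications and extended with the classification block listed below; this is only an illustrative alternative under the stated 300 × 300 × 3 input and 0.4 dropout, not the authors' layer-by-layer construction, and the 7-class output mirrors the AlexNet head.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pretrained convolutional base (transfer learning); the custom classification
# head below mirrors Block 6 of the listing that follows.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(300, 300, 3))
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(7, activation="softmax"),  # assumed number of posture classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```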
Block 1.
• Convolutional layer 1: 64 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• MaxPooling2D layer 1: Pooling size = 3, strides = 2.
Block 2.
• Convolutional layer 2: 128 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• Convolutional layer 2: 128 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• MaxPooling2D layer 2: Pooling size = 3, strides = 2.
Block 3.
• Convolutional layer 3: 256 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• Convolutional layer 3: 256 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• Convolutional layer 3: 256 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• MaxPooling2D layer 3: Pooling size = 3, strides = 2.
Block 4.
• Convolutional layer 4: 512 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• Convolutional layer 4: 512 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• Convolutional layer 4: 512 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• MaxPooling2D layer 4: Pooling size = 3, strides = 2.
Block 5.
• Convolutional layer 5: 512 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• Convolutional layer 5: 512 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• Convolutional layer 5: 512 filters, 3 kernel, activation = relu, padding = same, image_input as defined earlier.
• MaxPooling2D layer 5: Pooling size = 3, strides = 2.
Block 6 (Classification Block)
• Flatten.
• FC layer 1: Dense = 4096 nodes, activation = relu.
• Dropout = 0.4.
• FC layer 2: Dense = 4096 nodes, activation = relu.
• Dropout = 0.4.
• Output layer: classes, activation = softmax.

3.6. Proposed System Architecture
In this study, we used the RandomSearch hyperparameter optimization approach to optimize two CNN transfer learning algorithms for decision support, namely AlexNet and VGG16. These algorithms are employed in MPII human posture analysis to classify human poses. These datasets consist of around 25,000 poses of different activities. When the default hyperparameter settings are used, the accuracy of the algorithms is determined, and then, when each of the HPO techniques is used, it is computed again. There is a comparison between the before and after results. For enriching the initially limited dataset, we also design an image data augmentation approach. Rotation, translation, zoom, flips, shear, mirror, and color perturbation [47] are examples of image data augmentation techniques that solve the problem of insufficient training data by including altered original samples in the training set (Figure 2). The classification results with image data augmentation were verified based on AlexNet [48] and VGG16 [5], respectively.
The proposed system architecture is presented in Figure 5. This represents the classification of human postures using data augmentation at the preprocessing phase, after which the models were trained. The MPII dataset was supplied to the system, and the preprocessing phase, which includes normalization, rescaling, and data augmentation, was applied before the models were trained.
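A minimal sketch of the kind of on-the-fly augmentation described above, using Keras' ImageDataGenerator; the transformation strengths and the directory layout are illustrative assumptions, not the authors' exact settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rotation, shifts (translation), zoom, shear and horizontal flips, plus rescaling,
# applied on the fly while the training images are read from disk.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    shear_range=0.2,
    horizontal_flip=True)

train_gen = train_datagen.flow_from_directory(
    "mpii_poses/train",        # assumed layout: one sub-folder per posture class
    target_size=(300, 300),    # VGG16 input size from Section 3.5
    batch_size=32,
    class_mode="categorical")
```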
Figure 5. Proposed Image Data augmentation with the model’s architecture.
Figure 6. Proposed decision support models’ hyperparameter-optimizer architecture.
The HPO settings for machine learning or deep learning algorithms are a collection of decisions that have a real impact on the training procedure and the classification results, and thus reflect how well a model performs. The procedure of training a model to classify human pose images in the training dataset and predict the output based on the image patterns is referred to as model training. Aside from hyperparameter selection, model design, which describes a model, directly influences how long it takes to train, validate, and test a model. This setting has arisen as an important and problematic subject in the application of deep learning algorithms because of its impact on model performance and the fact that the ideal collection of values is unknown. Hyperparameters can be adjusted in a variety of ways in the literature. The procedures for optimizing these hyperparameters are presented as follows:
• When the scholar has a firm understanding of neural network structure and learning
data, the manual search calculates the hyperparameter value based on the scholar’s
perception or knowledge. However, the criteria for setting hyperparameters are
ambiguous, demanding multiple experiments.
• Random search is a method to train a model that selects random combinations of
hyperparameters. We utilize the best combinations of random hyperparameters.
Random search resembles grid search in several ways.
The fact that we do not give a list of feasible values for each hyperparameter is a
critical distinction. Instead, for each hyperparameter, we sample values from a statistical
distribution. For each hyperparameter, a sample distribution is constructed to perform
a random search. This method allows us to limit the number of hyperparameter combi-
nations that are attempted. In contrast to grid search, which attempts every conceivable
combination, random search allows us to define the number of models to train. Our search
iterations might be based on our computational resources or the time spent per iteration.
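As a concrete sketch of this sampling-based search, the snippet below uses the KerasTuner RandomSearch tuner with hyperparameter names mirroring those reported in Tables 2–5. KerasTuner itself, the search ranges, and the reduced input size are assumptions for illustration; the paper only states that a random-based search strategy was used.

```python
import keras_tuner as kt
from tensorflow.keras import layers, models, optimizers

def build_model(hp):
    model = models.Sequential()
    model.add(layers.Conv2D(hp.Int("Conv_1_filter", 32, 512, step=8),
                            hp.Choice("Conv_1_kernel", [3, 5, 7, 11]),
                            activation="relu",
                            input_shape=(64, 64, 3)))  # small input purely for illustration
    model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(hp.Int("Dense_1_units", 128, 2048, step=128),
                           activation="relu"))
    model.add(layers.Dense(7, activation="softmax"))
    lr = hp.Choice("Learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=optimizers.Adam(lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Each trial draws one random combination; max_trials bounds how many models are trained.
tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=10, directory="hpo_runs", project_name="posture")
# tuner.search(train_gen, validation_data=val_gen, epochs=10)
# best_hp = tuner.get_best_hyperparameters(1)[0]
```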
An experiment is a set of tests designed to identify the factors that have the greatest
impact on a response variable [51]. The major goal of the Design-of-Experiments (DOE)
methodology is to maximize this response variable after these components are found. To
discover the link between factors and the response variable, these studies need a careful
selection of variables, their ranges, and the number of experiments run. The influence of
factors on the response variable has traditionally been examined by changing the amounts
of one component at a time while keeping the other factors constant. However, this method
is inefficient and leaves out information about prospective interactions.
Table 1 provides an overview of all hyperparameters that were fine-tuned for all
models. The most significant parameters of neural networks are the initial learning rate,
the learning rate decay factor, the number of hidden neurons, and regularization strength.
Table 1. Hyperparameters fine-tuned for all models. Columns: Hyperparameter, Description, AlexNet Range, VGG16 Range, CNN Range, MLP Range.
In the validation dataset, the best-performing model with the minimum loss value
is chosen. The ultimate performance of the model is estimated using a hold-out test set,
as the validation dataset’s performance is integrated into the model’s hyperparameter
optimization. This strategy provides an objective assessment of performance. Tables 2–5
show the best hyperparameter results for AlexNet, VGG16, CNN, and MLP, respectively.
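In code, this selection step might look as follows. The sketch assumes a compiled model and the data generators from the earlier sketches; the checkpoint keeps the epoch with the minimum validation loss, and the hold-out test set is evaluated only once at the end.

```python
import tensorflow as tf

# `model`, `train_gen`, `val_gen` and `test_gen` are assumed to exist already.
ckpt = tf.keras.callbacks.ModelCheckpoint("best_model.keras", monitor="val_loss",
                                          save_best_only=True, mode="min")
history = model.fit(train_gen, validation_data=val_gen, epochs=50, callbacks=[ckpt])

best = tf.keras.models.load_model("best_model.keras")  # model with the lowest val_loss
test_loss, test_acc = best.evaluate(test_gen)           # objective hold-out estimate
```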
Table 2. Best hyperparameter values for AlexNet.
Hyperparameter | Value
Conv_1_filter | 232
Conv_1_kernel | 5
Conv_2_filter | 352
Conv_2_kernel | 7
Conv_3_filter | 312
Conv_3_kernel | 7
Dense_1_units | 1540
Dense_2_units | 2564
Learning_rate | 0.0001
Score | 0.8910
Table 3. Best hyperparameter values for VGG16.
Hyperparameter | Value
Conv_1_filter | 232
Conv_1_kernel | 3
Conv_2_filter | 128
Conv_2_kernel | 7
Conv_3_filter | 424
Conv_3_kernel | 11
Dense_1_units | 2052
Dense_2_units | 3076
Learning_rate | 0.001
Score | 0.9022
Table 4. Best hyperparameter values for CNN.
Hyperparameter | Value
Conv_1_filter | 32
Conv_1_kernel | 5
Conv_2_filter | 64
Conv_2_kernel | 7
Dense_1_units | 60
Learning_rate | 0.01
Score | 0.8742
Table 5. Best hyperparameter values for MLP.
Hyperparameter | Value
Dense_1_units | 512
Dense_2_units | 640
Learning_rate | 0.0001
Score | 0.8685
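As an example of turning such a table into a model, the MLP of Table 5 can be assembled directly from the reported values. This is a sketch: the flattened 32 × 32 × 3 input follows Section 4.3.2, and the 7-class output is assumed to match the other models.

```python
from tensorflow.keras import layers, models, optimizers

mlp = models.Sequential([
    layers.Flatten(input_shape=(32, 32, 3)),   # images resized to 32 x 32, 3 channels
    layers.Dense(512, activation="relu"),      # Dense_1_units from Table 5
    layers.Dense(640, activation="relu"),      # Dense_2_units from Table 5
    layers.Dense(7, activation="softmax"),     # assumed number of posture classes
])
mlp.compile(optimizer=optimizers.Adam(1e-4),   # Learning_rate from Table 5
            loss="categorical_crossentropy", metrics=["accuracy"])
```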
Figure 7. Training and testing of AlexNet model: (a) accuracy, (b) loss.
Figure 8. Training and testing of AlexNet + IDA model: (a) accuracy, (b) loss.
Figure 9. Training and testing of AlexNet + HPO model: (a) training and validation accuracy, (b) training and validation loss.
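The accuracy and loss curves in Figures 7–18 can be reproduced from a Keras History object along the following lines; `history` is assumed to come from a `model.fit(...)` call such as the one sketched earlier.

```python
import matplotlib.pyplot as plt

def plot_history(history):
    # Two panels, matching the (a) accuracy / (b) loss layout of the figures.
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    ax_acc.plot(history.history["accuracy"], label="training")
    ax_acc.plot(history.history["val_accuracy"], label="validation")
    ax_acc.set_title("(a) accuracy"); ax_acc.set_xlabel("epoch"); ax_acc.legend()
    ax_loss.plot(history.history["loss"], label="training")
    ax_loss.plot(history.history["val_loss"], label="validation")
    ax_loss.set_title("(b) loss"); ax_loss.set_xlabel("epoch"); ax_loss.legend()
    plt.tight_layout()
    plt.show()
```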
Table 6. Summary of AlexNet transfer learning results.
Transfer Learning Model | Training Accuracy | Training Loss | Testing Accuracy | Validation Loss | Time (min)
AlexNet | 0.9709 = 97.1% | 0.1050 | 0.7885 = 78.9% | 0.6788 | 61
AlexNet + IDA | 0.9877 = 98.8% | 0.0293 | 0.8165 = 81.7% | 1.0808 | 50
AlexNet + HPO | 0.9990 = 99.9% | 0.0043 | 0.9147 = 91.2% | 1.0356 | 28

4.2.2. VGG16 Transfer Learning Model
The VGG16 transfer learning algorithm was implemented alone (Figure 10a,b) without augmenting the image datasets while conducting the preprocessing phase. The dataset was only normalized and resized. The parameters set for the implementation are epochs of 50, batch size of 32, image input shape of (300, 300), channel = 3, optimizer = Adam, loss of categorical_crossentropy, and learning rate of 0.0001. The algorithm was also implemented with IDA, and the results are shown in Figure 11a,b. The result for the implementation of the algorithm with a hyperparameter optimizer is shown in Figure 12a,b. From the implementation, it was discovered that the VGG16 model performed best in terms of a training accuracy of 100%, while VGG16 + HPO performed second best with a training accuracy of 99.8%. The validation accuracy of 90.2% for VGG16 + HPO was the best with the lowest execution time of 33 min. Table 7 shows the summary of the entire VGG16 result executions.

Table 7. Summary of VGG16 transfer learning results.
Transfer Learning Model | Training Acc. | Training Loss | Testing Acc. | Validation Loss | Time (min)
VGG16 | 1.0000 = 100% | 0.0011 | 0.7692 = 76.9% | 1.3228 | 83
VGG16 + IDA | 0.8765 = 87.7% | 0.3005 | 0.7982 = 79.8% | 0.5521 | 78
VGG16 + HPO | 0.9984 = 99.8% | 0.0067 | 0.9015 = 90.2% | 0.8654 | 33
Figure 10. Training and testing of VGG16 model: (a) training and testing accuracy, (b) training and testing loss.
Figure 11. Training and testing of VGG16 + IDA model: (a) training and testing accuracy, (b) training and testing loss.
Figure 12. Training and testing of VGG16 + HPO model: (a) training and validation accuracy, (b) training and validation loss.
4.3. Standard Classifier Results Examination
The performance of each deep learning (CNN and MLP) classifier with data augmentation, with hyperparameter optimization, and alone was measured and is detailed in Tables 8 and 9 for CNN and MLP, respectively.

4.3.1. CNN Model
The CNN model was implemented alone (Figure 13a,b) without augmenting the image datasets while conducting the preprocessing phase. The dataset was only normalized and resized. The parameters set for the implementation are epochs of 50, batch size of 32, image input shape of (32, 32), channel = 3, optimizer = Adam, loss of categorical_crossentropy, and learning rate of 0.0001. The algorithm was also implemented with IDA, and the results are shown in Figure 14. The result for the implementation of the algorithm with hyperparameter optimization is shown in Figure 15. From the implementation, it was discovered that the CNN model performed best in terms of a training accuracy of 99.9%, while CNN + HPO performed second best with a training accuracy of 98.7%. The validation accuracy of 87.5% for CNN + HPO was the best with the lowest execution time of 3 min. Table 8 shows the summary of the CNN results.

Table 8. Summary of CNN results.
Transfer Learning Model | Training Accuracy | Training Loss | Validation Accuracy | Validation Loss | Time (min)
CNN | 0.9996 = 99.9% | 0.0040 | 0.6511 = 65.1% | 2.6715 | 39
CNN + IDA | 0.6873 = 68.7% | 0.8018 | 0.6767 = 67.7% | 1.3748 | 31
CNN + HPO | 0.9869 = 98.7% | 0.0530 | 0.8750 = 87.5% | 2.1469 | 3
Figure 13. Training and testing of CNN model: (a) training and testing accuracy, (b) training and testing loss.
Figure 14. Training and testing of CNN + DA model: (a) training and testing accuracy, (b) training and testing loss.
Figure 15. Training and testing of CNN + HPO model: (a) training and validation accuracy, (b) training and validation loss.

4.3.2. MLP Model
Figures 16–18 give the MLP classifier alone result, with image augmentation, and with the hyperparameter optimizer, respectively. The MLP algorithm was implemented alone (Figure 16a,b) without augmenting the image datasets while conducting the preprocessing phase. The dataset was only normalized and resized. The parameters set for the implementation are epochs of 50, batch size of 32, image input shape of (32, 32), channel = 3, optimizer = Adam, loss of categorical_crossentropy, and learning rate of 0.0001. The algorithm was also implemented with IDA, and the results are shown in Figure 17a,b. The result for the implementation of the model with hyperparameters is shown in Figure 18a,b. From the implementation, it was discovered that the MLP model performed best in terms of training accuracy at 99.3%, while MLP + HPO performed second best with a training accuracy of 97.5%. The validation accuracy of 89.9% for MLP + HPO was the best with the lowest execution time of 7 min. Table 9 shows the summary of the entire MLP result executions.

Table 9. Summary of MLP results.
Transfer Learning Model | Training Accuracy | Training Loss | Validation Accuracy | Validation Loss | Time
MLP | 0.9928 = 99.3% | 0.0386 | 0.7009 = 70.1% | 2.0471 | 38 min
MLP + IDA | 0.6930 = 69.3% | 0.8584 | 0.7205 = 72.1% | 0.8379 | 33 min
MLP + HPO | 0.9746 = 97.5% | 0.0774 | 0.8985 = 89.9% | 0.3604 | 7 min
Figure 18. Training and testing of MLP + HPO model: (a) training and validation accuracy, (b) training and validation loss.
4.4. Performance Analysis of the Models
The proposed models were evaluated using training and validation accuracies and losses, which are shown in Figure 19, while Figure 20 shows the training and validation losses of the models. VGG16 has the best training accuracy with the lowest training loss of 0.0011, while AlexNet + HPO has the second-best training accuracy of 99.9% with a training loss of 0.0043, and VGG16 + HPO is the third-best with a training accuracy of 99.8% and a training loss of 0.0067. AlexNet + HPO performed best with a validation accuracy of 91.2%, and VGG16 + HPO had a validation accuracy of 90.2%, which made it the second-best model. The model with the lowest validation loss (0.5521) is VGG16 + IDA, followed by AlexNet with a validation loss of 0.6788.

Figure 20. Transfer learning models’ training and validation losses.

The training and validation accuracies of the deep learning models are shown in Figure 21, while Figure 22 shows the training and validation losses of the models. CNN has the best training accuracy of 99.9% with the lowest training loss of 0.0040, while MLP has the second-best training accuracy of 99.3% with a training loss of 0.0386, and CNN + HPO is the third-best with a training accuracy of 98.7% and a training loss of 0.0530. MLP + HPO performed best with a validation accuracy of 89.9%, and CNN + HPO had a validation accuracy of 87.5%, which made it the second-best model. The model with the lowest validation loss (0.3604) is MLP + HPO, followed by MLP + IDA with a validation loss of 0.8379.

Figure 21. Deep learning models’ training and validation accuracies.
Figure 22. Deep learning models’ training and validation losses.
The time taken for execution was also used for the evaluation of the models. It was discovered that the transfer learning algorithms that used HPO were executed in the lowest time. AlexNet + HPO executed in 28 min, while VGG16 + HPO executed in 33 min. It was also discovered that the models with the next-lowest time of execution are the ones that used image data augmentation at the preprocessing stage of their datasets. AlexNet + IDA executed in 50 min, while VGG16 + IDA executed in 78 min, as seen in Table 10.
The same pattern applies to the deep learning models, with the models implemented with HPO having the lowest time of execution. The deep learning models with the second-lowest time of execution were also the ones that used IDA at the preprocessing stage of the datasets. CNN + HPO executed in 3 min, while MLP + HPO executed in 7 min. Likewise, CNN + IDA was executed in 31 min, and MLP + IDA was executed in 33 min (Table 11).
5. Discussion
The results of hyperparameter optimization reveal that some combinations of parame-
ters have a greater impact on the model’s performance, while others have a minor impact.
We noticed that the number of layers and the breadth of the filter had a significant impact
on the prediction performance. The results also showed that great performance may be
achieved for all filter widths. Furthermore, using several layers produced somewhat better
performance than using only one layer, since it allows additional model complexity, but it similarly required a longer training time. The number of layers and the filter width affected the training time but not the classification performance. Accordingly, provided that the number of layers is fixed, a large filter width takes fewer epochs to train than a smaller filter width, even though the two alternatives provide similar predictions. The same can be said for the number of filters used: the more filters used, the longer the training period becomes, with no discernible gain in prediction accuracy, so the number of filters used has a significant impact on the training time. The number of trainable parameters is determined by the number of filters, their width, and the number of layers and stacks. As a result, by utilizing fewer filters for the same combination of filter width and layers, training time is lowered. Adding extra layers to the network increases the depth and, as a result, its complexity. Training time is influenced by the number of layers but not by performance. Although both would deliver equal outcomes provided the number of layers was fixed, a wide filter would require fewer layers and hence less training time than a narrower filter.
An independent test set was used to choose the most efficient network model with
the optimal mix of hyperparameter values. These results show that the suggested VGG16
and AlexNet transfer learning models can reliably classify human pose images. As a result,
transfer learning models are an excellent choice for replacing time-consuming classical
machine learning models. Figures 9–19 depict the model training outcomes, including
training accuracies, validation accuracies, training loss, and validation loss numbers. The
fundamental downside of transfer learning models is that they demand a lot of processing
resources, such as GPUs and a lot of RAM. Transfer learning classifiers create synthetic
data during the training phase, which requires a large amount of storage space.
Image augmentation was used in the data preparation portion of the investigation.
Data augmentation was used as a regularizer to help manage data overfitting. By producing
additional training data and exposing the model to diverse versions of data, image aug-
mentation helps to reduce the likelihood of overfitting. The image augmentation utilized in
this study helped the model to operate better and more precisely by improving the results.
It also reduces operating expenses by adding transformations in the datasets, as evidenced
by the fact that the model calculation execution time is shorter than when the model is
without image augmentation. It also helps with data cleansing, which is necessary for good
model accuracy. Image augmentation also improves the robustness of deep learning by
adding variations to the model.
Deep learning models are particularly sensitive to the size of the training set and, to be
adequately built, require significantly bigger training datasets. Our findings reveal that no
single hyperparameter combination significantly beats the others. Training a model with
the same hyperparameter settings again does not necessarily result in the same classification
accuracy due to variations in weight and bias initialization. It is critical to go through
the training process many times before deciding on the best-performing network. Deeper
models with more layers, on the other hand, take longer to train.
The proposed hyperparameter-optimization methodology for decision-making can
support doctors in making critical clinical decisions more effectively. The approach pre-
sented in this paper is also useful in situations where people lack access to integrated
primary medical care technology for early diagnosis and treatment.
6. Conclusions
This study used four models for decision support in posture recognition: two transfer
learning algorithms and two deep learning algorithms, CNN and MLP. The models were
implemented on MPII Human-Posture dataset images. Three main stages were carried out,
which were implementing the algorithms alone, implementing using image augmentation,
and implementing using hyperparameter optimization (HPO). The HPO transfer learning
algorithms outperformed the ones implemented with image augmentation in terms of
training loss and validation accuracy. AlexNet + HPO outperformed the other models with
a validation accuracy of 91.2%, followed by VGG16 + HPO with a validation accuracy
of 90.2%. The algorithm with the lowest training loss was VGG16 (0.0011), while the lowest
validation loss (0.5521) was achieved by VGG16 + IDA. In terms of execution time, deep learning
models with HPO had the lowest execution times of 3 min for CNN + HPO and 7 min for
MLP + HPO. This was a result of having fewer layers. Therefore, we recommend that re-
searchers implement their transfer learning algorithms using hyperparameter-optimization
techniques to obtain optimized training and validation losses and accuracies.
In image classification, particularly using transfer learning and deep learning models,
image augmentation can generate diverse outcomes based on a different dataset. In this
study, the performance of posture image classification was determined using transfer
learning models. In general, the proposed models are effective in their decision-making
application for classifying the MPII pose dataset, as evidenced by the comparison of the
four models.
The disadvantage is the increased complexity, as finding optimal hyperparameter
values requires additional computational resources. The scope of this study will be expanded in the future by performing experiments with larger image datasets. This may
be accomplished by integrating deep learning algorithms with optimization methods to
improve image data augmentation and accurately characterize postures.
Abbreviations
CNN Convolutional Neural Network
MLP Multilayer Perceptron
GPU Graphics Processing Unit
HPO Hyperparameter Optimization
DOE Design of Experiments
References
1. Deng, L. An overview of deep-structured learning for information processing. In Proceedings of the Asia-Pacific Signal and
Information Processing Annual Summit Conference (APSIPA-ASC), Xi’an, China, 19–21 October 2011; pp. 1–14.
2. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks
from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
3. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al.
Imagenet large-scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [CrossRef]
4. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf.
Process. Syst. 2012, 60, 1–25.
5. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
6. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper
with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA,
7–12 June 2015; pp. 1–9.
7. Zhang, M.; Jing, W.; Lin, J.; Fang, N.; Wei, W.; Woźniak, M.; Damaševičius, R. NAS-HRIS: Automatic design and architecture
search of neural network for semantic segmentation in remote sensing images. Sensors 2020, 20, 1–15. [CrossRef] [PubMed]
8. Ogundokun, R.O.; Maskeliunas, R.; Misra, S.; Damaševičius, R. Improved CNN Based on Batch Normal-ization and Adam
Optimizer. In Proceedings of the International Conference on Computational Science and Its Applications, Malaga, Spain,
4–7 July 2022; Springer: Cham, Switzerland, 2022; pp. 593–604.
9. Ben Gamra, M.; Akhloufi, M.A. A review of deep learning techniques for 2D and 3D human pose estimation. Image Vis. Comput.
2021, 114, 104282. [CrossRef]
10. Song, L.; Yu, G.; Yuan, J.; Liu, Z. Human pose estimation and its application to action recognition: A survey. J. Vis. Commun.
Image Represent. 2021, 76, 103055. [CrossRef]
11. Jayatilake, S.M.D.A.C.; Ganegoda, G.U. Involvement of machine learning tools in healthcare decision making. J. Healthc. Eng.
2021, 2021, 6679512. [CrossRef] [PubMed]
12. de Siqueira, V.S.; Borges, M.M.; Furtado, R.G.; Dourado, C.N.; da Costa, R.M. Artificial intelligence applied to support medical
decisions for the automatic analysis of echocardiogram images: A systematic review. Artif. Intell. Med. 2021, 120, 102165.
[CrossRef]
13. Tsougos, I.; Vamvakas, A.; Kappas, C.; Fezoulidis, I.; Vassiou, K. Application of radiomics and decision support systems for breast
MR differential diagnosis. Comput. Math. Methods Med. 2018, 2018, 7417126. [CrossRef] [PubMed]
14. Yang, Y.; Feng, X.; Chi, W.; Li, Z.; Duan, W.; Liu, H.; Liang, W.; Wang, W.; Chen, P.; He, J.; et al. Deep learning aided decision
support for pulmonary nodules diagnosing: A review. J. Thorac. Dis. 2018, 10, S867–S875. [CrossRef]
15. Ali, L.; Rahman, A.; Khan, A.; Zhou, M.; Javeed, A.; Khan, J.A. An automated diagnostic system for heart disease prediction
based on χ2 statistical model and optimally configured deep neural network. IEEE Access 2019, 7, 34938–34945. [CrossRef]
16. Ansarullah, S.I.; Mohsin Saif, S.; Abdul Basit Andrabi, S.; Kumhar, S.H.; Kirmani, M.M.; Kumar, D.P. An intelligent and reliable
hyperparameter optimization machine learning model for early heart disease assessment using imperative risk attributes.
J. Healthc. Eng. 2022, 2022, 9882288. [CrossRef]
17. Cooney, C.; Korik, A.; Folli, R.; Coyle, D. Evaluation of hyperparameter optimization in machine and deep learning methods for
decoding imagined speech eeg. Sensors 2020, 20, 1–22. [CrossRef] [PubMed]
18. Du, X.; Xu, H.; Zhu, F. Understanding the effect of hyperparameter optimization on machine learning models for structure design
problems. CAD Comput. Aided Des. 2021, 135, 103013. [CrossRef]
19. Chollet, F.; Allaire, J.J. Deep Learning mit R und Keras: Das Praxis-Handbuch von den Entwicklern von Keras und Rstudio; MITP-Verlags
GmbH Co. KG: Frechen, Germany, 2018.
20. Elgendy, M. Deep Learning for Vision Systems; Simon and Schuster: New York, NY, USA, 2020.
21. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48. [CrossRef]
22. Younis, M.C.; Keedwell, E. Semantic segmentation on small datasets of satellite images using convolutional neural networks.
J. Appl. Remote Sens. 2019, 13, 046510. [CrossRef]
23. Zeng, S.; Zhang, B.; Zhang, Y.; Gou, J. Dual sparse learning via data augmentation for robust facial image classification. Int. J.
Mach. Learn. Cybern. 2020, 11, 1717–1734. [CrossRef]
24. Abayomi-Alli, O.O.; Damaševicius, R.; Maskeliunas, R.; Misra, S. Few-shot learning with a novel voronoi tessellation-based
image augmentation method for facial palsy detection. Electronics 2021, 10, 978. [CrossRef]
25. Abayomi-Alli, O.O.; Damaševičius, R.; Misra, S.; Maskeliūnas, R. Cassava disease recognition from low-quality images using
enhanced data augmentation model and deep learning. Expert Syst. 2021, 38, e12746. [CrossRef]
26. Abayomi-Alli, O.O.; Damaševičius, R.; Misra, S.; Maskeliūnas, R.; Abayomi-Alli, A. Malignant skin melanoma detection using
image augmentation by oversampling in nonlinear lower-dimensional embedding manifold. Turk. J. Electr. Eng. Comput. Sci.
2021, 29, 2600–2614. [CrossRef]
27. Oyewola, D.O.; Dada, E.G.; Misra, S.; Damaševičius, R. A novel data augmentation convolutional neural network for detecting
malaria parasite in blood smear images. Appl. Artif. Intell. 2022, 36, 1. [CrossRef]
28. Wang, Z.; Yang, J.; Jiang, H.; Fan, X. CNN training with twenty samples for crack detection via data augmentation. Sensors 2020,
20, 4849. [CrossRef] [PubMed]
29. Hutter, F.; Hoos, H.; Leyton-Brown, K. An efficient approach for assessing hyperparameter importance. In Proceedings of the
International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 754–762.
30. Hutter, F.; Kotthoff, L.; Vanschoren, J. Automated Machine Learning: Methods, Systems, Challenges; Springer Nature: Berlin, Germany,
2019; p. 219.
31. Mantovani, R.G.; Rossi, A.L.D.; Alcobaça, E.; Vanschoren, J.; de Carvalho, A.C.P.L.F. A meta-learning recommender system for
hyperparameter tuning. Inf. Sci. 2019, 501, 193–221. [CrossRef]
32. Neary, P. Automatic hyperparameter tuning in deep convolutional neural networks using asynchronous reinforcement learning.
In Proceedings of the 2018 IEEE International Conference on Cognitive Computing (ICCC), San Francisco, CA, USA, 2–7 July 2018;
pp. 73–77.
33. Ottoni, A.L.; Nepomuceno, E.G.; de Oliveira, M.S.; de Oliveira, D.C. Tuning of reinforcement learning parameters applied to sop
using the Scott–Knott method. Soft Comput. 2020, 24, 4441–4453. [CrossRef]
34. Schratz, P.; Muenchow, J.; Iturritxa, E.; Richter, J.; Brenning, A. Hyperparameter tuning and performance assessment of statistical
and machine-learning algorithms using spatial data. Ecol. Model. 2019, 406, 109–120. [CrossRef]
35. Shankar, K.; Zhang, Y.; Liu, Y.; Wu, L.; Chen, C.H. Hyperparameter tuning deep learning for diabetic retinopathy fundus image
classification. IEEE Access 2020, 8, 118164–118173. [CrossRef]
36. Pawara, P.; Okafor, E.; Schomaker, L.; Wiering, M. Data augmentation for plant classification. In Proceedings of the International
Conference on Advanced Concepts for Intelligent Vision Systems, Antwerp, Belgium, 18–21 September 2017; Springer: Cham,
Switzerland, 2017; pp. 615–626.
37. Song, C.; Xu, W.; Wang, Z.; Yu, S.; Zeng, P.; Ju, Z. Analysis of the impact of data augmentation on target recognition for UAV-based
transmission line inspection. Complexity 2020, 2020, 3107450. [CrossRef]
38. Monshi, M.M.A.; Poon, J.; Chung, V.; Monshi, F.M. CovidXrayNet: Optimizing data augmentation and CNN hyperparameters
for improved COVID-19 detection from CXR. Comput. Biol. Med. 2021, 133, 104375. [CrossRef]
39. Damaševičius, R. Optimization of SVM parameters for recognition of regulatory DNA sequences. TOP 2010, 18, 339–353.
[CrossRef]
40. Kalliola, J.; Kapočiūtė-Dzikienė, J.; Damaševičius, R. Neural network hyperparameter optimization for prediction of real estate
prices in helsinki. PeerJ Comput. Sci. 2021, 7, e444. [CrossRef] [PubMed]
41. Połap, D.; Woźniak, M.; Hołubowski, W.; Damaševičius, R. A heuristic approach to the hyperparameters in training spiking
neural networks using spike-timing-dependent plasticity. Neural Comput. Appl. 2022, 34, 13187–13200. [CrossRef]
42. Lawal, M.O. Tomato Detection Based on Modified YOLOv3 Framework; Springer Science and Business Media LLC: Berlin, Germany, 2021.
[CrossRef]
43. Zhang, K.; Robinson, N.; Lee, S.-W.; Guan, C. Adaptive Transfer Learning for EEG Motor Imagery Classification with Deep Convolutional
Neural Network; Elsevier BV: Amsterdam, The Netherlands, 2021. [CrossRef]
44. Roy, A.M. Adaptive Transfer Learning-Based Multiscale Feature Fused Deep Convolutional Neural Network for EEG MI Multiclassification
in Brain–Computer Interface; Elsevier BV: Amsterdam, The Netherlands, 2022. [CrossRef]
45. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
46. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
47. Andriluka, M.; Pishchulin, L.; Gehler, P.; Schiele, B. 2d human pose estimation: New benchmark and state-of-the-art analysis.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014;
pp. 3686–3693.
48. Flusser, J.; Suk, T. Pattern recognition by affine moment invariants. Pattern Recognit. 1993, 26, 167–174. [CrossRef]
49. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM
2017, 60, 84–90.
50. Ogundokun, R.O.; Misra, S.; Douglas, M.; Damaševičius, R.; Maskeliūnas, R. Medical Internet-of-Things Based Breast Cancer
Diagnosis Using Hyperparameter-Optimized Neural Networks. Future Internet 2022, 14, 153. [CrossRef]
51. Montgomery, D.C. Design and Analysis of Experiments; John Wiley Sons: Hoboken, NJ, USA, 2017.
52. Luvizon, D.; Picard, D.; Tabia, H. Multi-task deep learning for real-time 3D human pose estimation and action recognition. IEEE
Trans. Pattern Anal. Mach. Intell. 2021, 43, 2752–2764. [CrossRef]
53. Munea, T.L.; Yang, C.; Huang, C.; Elhassan, M.A.; Zhen, Q. SimpleCut: A simple and strong 2D model for multi-person pose
estimation. Comput. Vis. Image Underst. 2022, 222, 103509. [CrossRef]
54. Qin, X.; Guo, H.; He, C.; Zhang, X. Lightweight human pose estimation: CVC-net. Multimed. Tools Appl. 2022, 81, 17615–17637.
[CrossRef]
55. Wang, R.; Geng, F.; Wang, X. MTPose: Human pose estimation with high-resolution multi-scale transformers. Neural Process. Lett.
2022, 1–24. [CrossRef]
56. Wang, W.; Zhang, K.; Ren, H.; Wei, D.; Gao, Y.; Liu, J. UULPN: An ultra-lightweight network for human pose estimation based on
unbiased data processing. Neurocomputing 2022, 480, 220–233. [CrossRef]
57. Wu, Y.; Ma, S.; Zhang, D.; Huang, W.; Chen, Y. An improved mixture density network for 3D human pose estimation with ordinal
ranking. Sensors 2022, 22, 4987. [CrossRef]
58. Yang, L.; Qin, Y.; Zhang, X. Lightweight densely connected residual network for human pose estimation. J. Real-Time Image
Process. 2021, 18, 825–837. [CrossRef]
59. Zhang, W.; Fang, J.; Wang, X.; Liu, W. EfficientPose: Efficient human pose estimation with neural architecture search. Comput. Vis.
Media 2021, 7, 335–347. [CrossRef]