Article
Solving Regression Problems with Intelligent Machine Learner
for Engineering Informatics
Jui-Sheng Chou 1, * , Dinh-Nhat Truong 1,2 and Chih-Fong Tsai 3
1 Department of Civil and Construction Engineering, National Taiwan University of Science and Technology,
Taipei City 106335, Taiwan; [email protected]
2 Department of Civil Engineering, University of Architecture Ho Chi Minh City (UAH),
Ho Chi Minh City 700000, Vietnam
3 Department of Information Management, National Central University, Taoyuan City 320317, Taiwan;
[email protected]
* Correspondence: [email protected]
Abstract: Machine learning techniques have been used to develop many regression models to make
predictions based on experience and historical data. They might be used singly or in ensembles.
Single models are either classification or regression models that use one technique, while ensemble
models combine various single models. Constructing and identifying the best model is complex and
time-consuming, so this study develops a new platform, called intelligent Machine Learner (iML), to
automatically build popular models and identify the best one. The iML platform is benchmarked
with WEKA by analyzing publicly available datasets. After that, four industrial experiments are
conducted to evaluate the performance of iML. In all cases, the best models determined by iML are
superior to prior studies in terms of accuracy and computation time. Thus, the iML is a powerful
and efficient tool for solving regression problems in engineering informatics.
Keywords: applied machine learning; classification and regression; data mining; ensemble model; engineering informatics

1. Introduction
Machine Learning (ML)-based methods for building prediction models have attracted abundant scientific attention and are extensively used in industrial engineering [1–3], design optimization of electromagnetic devices, and other areas [4,5]. ML-based methods have been confirmed to be effective for solving real-world engineering problems [6–8]. Various supervised ML techniques (e.g., artificial neural network, support vector machine, classification and regression tree, linear (ridge) regression, and logistic regression) are typically used individually to construct single models and ensemble models [9,10]. To construct a series of models and identify the best one among these ML techniques, users need comprehensive knowledge of ML and must spend significant effort building advanced models.
The primary objective of this research is to develop a user-friendly and powerful ML platform, called intelligent Machine Learner (iML), to help its users solve real-world engineering problems with shorter training time and greater accuracy than before. The iML can automatically build and scan all regression models, and then identify the best one. Novice users with no experience of ML can easily use this system. Briefly, the iML (1) helps users build prediction models easily; (2) provides an overview of the parameter settings for the purpose of making objective choices; and (3) yields clear performance indicators, facilitating reading and understanding of the results, on which decisions can be based.
Four experiments were carried out to evaluate the performance of iML and were compared with previous studies. In the first experiment, empirical data concerning enterprise resource planning (ERP) software projects by a leading Taiwan software provider
over the last five years were collected and analyzed [1]. The datasets in the other three
experiments were published on the UCI website [11–13]. Specifically, the purpose of the
second experiment was to train a regression model that compares the performance of CPU
processors by using hardware characteristics as input. The third experiment involved forecast-
ing the demand supporting structured productivity and high levels of customer service,
and the fourth experiment involved estimating the total bikes rented per day.
The rest of this paper is organized as follows. Section 2 reviews applications of machine
learning techniques in various disciplines. Section 3 presents the proposed methodology
and iML framework. Section 4 introduces the evaluation metrics to measure accuracy of
the developed system. Section 5 demonstrates iML’s interface. Section 6 shows benchmarks
between iML and WEKA (a free, open source program). Section 7 exhibits the applicability
of iML in numerical experiments. Section 8 draws conclusions, and provides managerial
implications and suggestions for future research.
2. Literature Review
Numerous researchers in various fields, such as ecology [14,15], materials properties [16–18], water resources [19], energy management [20], and decision support [21,22], use data-mining techniques to solve regression problems, and especially project-related problems [23,24]. Artificial neural network (ANN), support vector machine/regression (SVM/SVR), classification and regression tree (CART), linear ridge regression (LRR), and
logistic regression (LgR) are the most commonly used methods for this purpose and are
all considered to be among the best machine learning techniques [25–27]. Similarly, four
popular ensemble models, including voting, bagging, stacking and tiering [28–30], can be
built based on the meta-combination rules of aforementioned single models.
Chou (2009) [31] developed a generalized linear model-based expert system for es-
timating the cost of transportation projects. Dandikas et al. (2018) [32] assessed the
advantages and disadvantages of regression models for predicting potential of biomethane.
The results indicated that the regression method could predict variations in the methane
yield and could be used to rank substrates for production quality. However, least squares-based regression often leads to overfitting, failure to find unique solutions, and difficulty dealing with multicollinearity among the predictors [33], so ridge regression, a regularized type of regression, is integrated in this study to avoid these
problems. Additionally, Sentas and Angelis (2006) [34] investigated the possibility of using
some machine learning methods for estimating categorical missing values in software cost
databases. They concluded that multinomial logistic regression was the best for imputation
owing to its superior accuracy.
The general regression neural network was originally designed chiefly to solve re-
gression problems [24,35]. Caputo and Pelagagge (2008) [36] compared the ANN with the
parametric methods for estimating the cost of manufacturing large, complex-shaped pres-
sure vessels in engineer-to-order manufacturing systems. Their comparison demonstrated
that the ANN was more effective than the parametric models, presumably because of its
better mapping capabilities. Rocabruno-Valdés et al. (2015) [37] developed models based
on ANN for predicting the density, dynamic viscosity, and cetane number of methyl esters
and biodiesel. Similarly, Ganesan et al. (2015) [38] used ANN to predict the performance
and exhaust emissions of a diesel electricity generator.
SVM was originally developed by Vapnik (1999) for classification (SVM) and regres-
sion (SVR) [39,40]. Jing et al. (2018) [41] used SVM to classify air balancing, which is a key element of heating, ventilating, and air-conditioning (HVAC) and variable air volume (VAV) system installation, and is useful for improving energy efficiency by minimizing unnecessary fresh air supplied to the air-conditioned zones. The results demonstrated that SVM achieved a relative error of 4.6% and is a promising approach for air balancing. García-Floriano
et al. (2018) [42] used SVR to model software maintenance (SM) effort prediction. The SVR
model was superior to regression, neural networks, association rules, and decision trees at a 95% confidence level.
The classification and regression tree method (CART), introduced by Breiman et al.
(2017) [43], is an effective method for solving classification and regression problems [42]. Choi and Seo (2018) [44] predicted fecal coliform in the North Han River, South Korea, using CART models; the test results showed that the total correct classification rates of the four models ranged from 83.7% to 93.0%. Ru et al. (2016) [45] used the CART model to predict cadmium enrichment levels in reclaimed coastal soils, achieving an accuracy of 78.0%. Similarly, Li (2006) [16] used CART to predict materials properties and behavior. Chou et al. (2014, 2017) [26,46] utilized the CART method to model steel pitting risk and corrosion rate and to forecast project dispute resolutions.
In addition to the aforementioned single models, Elish (2013) [47] used a voting ensemble for estimating software development effort. The ensemble model outperformed all the single models in terms of Mean Magnitude of Relative Error (MMRE), and achieved a competitive percentage of observations whose Magnitude of Relative Error (MRE) is less than 0.25 (PRED(25)) as well as competitive results on the recently proposed Evaluation Function (EF). Wang et al. (2018) demonstrated that an ensemble bagging tree (EBT) model could accurately predict hourly building energy usage with MAPE ranging from 2.97% to 4.63% [48]. Compared with conventional single prediction models, the EBT is superior in prediction accuracy and stability; however, it requires more computation time and lacks interpretability owing to its sophisticated model structure.
Chen et al. (2019) [49] showed that the stacking model outperformed the individual
models, achieving the highest R2 of 0.85, followed by XGBoost (0.84), AdaBoost (0.84) and
random forest (0.82). For the estimation of hourly PM2.5 in China, the stacking model
exhibited relatively high stability, with R2 ranging from 0.79 to 0.92. Basant et al. (2016) [50] proposed a three-tier quantitative structure-activity relationship (QSAR) model. This model can be used for screening chemicals in future drug design and development processes and for safety assessment of chemicals. In comparison with previous studies on the same endpoint property, the proposed QSAR models showed encouraging statistical quality.
According to the reviewed literature, various machine learning platforms have been developed over the past decades, such as the Scikit-Learn Python library, Google's TensorFlow, WEKA, and Microsoft Research's CNTK. Users can easily adopt a machine learning tool and/or framework to solve numerous problems as per their needs [51]. ML-
based approaches have been confirmed to be effective in providing decisive information.
Since there is no best model suitable to predict all problems (the “No Free Lunch” the-
orem [52,53]), a comprehensive comparison of single and ensemble models embedded
within an efficient forecasting platform for solving real-world engineering problems is
urgently needed. The iML platform proposed in this study can efficiently address
this issue.
(Figure: artificial neural network model mapping input variables 1 to u through the network to outputs y1 to yn.)
For a classification problem with m classes, the one-against-all scheme builds m binary SVM models, each of which separates one class (a positive class) from the remaining classes (a negative class). In training, given a set of $l$ data points $(x_i, y_i)_{i=1}^{l}$, where $x_i \in \mathbb{R}^n$ is the input data and $y_i \in \{1, 2, \ldots, m\}$ is the class label of $x_i$, the $i$th SVM model is solved using the following optimization problem [59].

$$\min_{w^i,\, b,\, \xi}\; J(w^i, b, \xi) = \frac{1}{2} (w^i)^T w^i + C \sum_{j=1}^{l} \xi_j^i \tag{2}$$

$$\text{subject to:}\quad
\begin{cases}
(w^i)^T \phi(x_j) + b^i \ge 1 - \xi_j^i, & y_j = i,\\
(w^i)^T \phi(x_j) + b^i \le -1 + \xi_j^i, & y_j \ne i,\\
\xi_j^i \ge 0, & j = 1, \ldots, l.
\end{cases} \tag{3}$$

When the SVM models have been solved, the class label of example $x$ is predicted as follows:

$$y(x) = \arg\max_{i=1,\ldots,m} \left[ (w^i)^T \phi(x) + b^i \right] \tag{4}$$

where $i$ denotes the $i$th SVM model; $w^i$ is a vector normal to the hyperplane; $b^i$ is a bias; $\phi(x)$ is a nonlinear function that maps $x$ to a high-dimensional feature space; $\xi^i$ is the misclassification error; and $C \ge 0$ is a constant that specifies the trade-off between the classification margin and the cost of misclassification.
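As an illustration of the one-against-all scheme in Equations (2)–(4), the following minimal Python sketch trains one RBF-kernel SVM per class with scikit-learn's one-vs-rest wrapper and predicts by taking the largest decision value; the three-class toy data and the values of C and gamma are illustrative assumptions, not settings prescribed by iML.

```python
# One-against-all SVM sketch (Equations (2)-(4)).
# The three-class toy data and hyperparameters are illustrative placeholders.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# Three Gaussian blobs, one per class label in {0, 1, 2}.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 2, 4)])
y = np.repeat([0, 1, 2], 50)

# One binary RBF-kernel SVM is fitted per class; prediction takes the arg max
# of the decision values, as in Equation (4).
clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma=0.5))
clf.fit(X, y)
print("Predicted class of [2.1, 1.9]:", clf.predict([[2.1, 1.9]])[0])
```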
To train the SVM model, the radial basis function (RBF) kernel maps samples non-linearly into a feature space with more dimensions. In this study, the RBF kernel is used as the SVM kernel function.

$$K(x_i, x_j) = \exp\!\left( \frac{-\| x_i - x_j \|^2}{2\sigma^2} \right) \tag{5}$$

where $\sigma$ is a positive parameter that controls the radius of the RBF kernel function.

Support vector regression (SVR) [40] is a version of SVM for regression. SVR computes a linear regression function in the new higher-dimensional feature space using the $\varepsilon$-insensitive loss while simultaneously reducing model complexity by minimizing $\|w\|^2$. This process is implemented by introducing (non-negative) slack variables $\xi_i, \xi_i^*$ to measure the deviation of training samples outside the $\varepsilon$-insensitive zone. The SVR can be formulated as the minimization of the following equation:

$$\min_{w,\, b,\, \xi}\; J(w, b, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \tag{6}$$

$$\text{subject to:}\quad
\begin{cases}
y_i - f(x_i, w) \le \varepsilon + \xi_i^*,\\
f(x_i, w) - y_i \le \varepsilon + \xi_i,\\
\xi_i,\, \xi_i^* \ge 0, \quad i = 1, \ldots, l.
\end{cases} \tag{7}$$

When the SVR model has been solved, the value of example $x$ is predicted as follows:

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b \tag{8}$$

where $K(x_i, x)$ is the kernel function and $\alpha_i^*, \alpha_i$ are the Lagrange multipliers in the dual function.
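A corresponding minimal sketch of the ε-insensitive SVR with an RBF kernel (Equations (5)–(8)) is given below using scikit-learn; the synthetic data and the C, ε, and γ values are placeholders for illustration only.

```python
# Minimal SVR sketch with an RBF kernel (Equations (5)-(8)).
# The data and hyperparameters below are illustrative placeholders only.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))            # one input feature
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)  # noisy target

# C is the trade-off constant, epsilon the insensitive-zone width,
# gamma = 1 / (2 * sigma^2) parameterizes the RBF kernel K(x_i, x_j).
model = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=0.5)
model.fit(X, y)

print("Predicted f(x) at x = 3.0:", model.predict([[3.0]])[0])
```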
Figure 3. The classification and regression tree (CART) model.

$$g(t) = \sum_{j \ne i} p(j|t)\, p(i|t) \tag{9}$$

$$p(j|t) = \frac{p(j, t)}{p(t)} \tag{10}$$

$$p(j, t) = \frac{p(j)\, N_j(t)}{N_j} \tag{11}$$

$$p(t) = \sum_{j} p(j, t) \tag{12}$$

$$\text{Gini index} = 1 - \sum_{j} p(j, t)^2 \tag{13}$$

where $i$ and $j$ are the categorical variables in each item; $N_j(t)$ is the recorded number of nodes $t$ in category $j$; $N_j$ is the recorded number of the root nodes in category $j$; and $p(j)$ is the prior probability value for category $j$.
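To make the splitting criterion concrete, the short Python sketch below computes a Gini index in the standard per-node form corresponding to Equation (13) and fits a CART regression tree with scikit-learn; the class counts, synthetic data, and tree depth are illustrative assumptions.

```python
# Sketch of the CART splitting criterion (Equation (13)) and a CART regressor.
# Class counts, data, and max_depth below are illustrative placeholders.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gini_index(class_counts):
    """Gini index = 1 - sum_j p(j)^2 for a node with the given class counts."""
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

print("Gini of a node with counts [40, 10]:", gini_index([40, 10]))  # impure node
print("Gini of a pure node [50, 0]:", gini_index([50, 0]))           # 0.0

# A CART regression tree minimizes within-node variance at each split.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))
y = X[:, 0] ** 2 + 3 * X[:, 1] + rng.normal(0, 1, 300)

cart = DecisionTreeRegressor(max_depth=4).fit(X, y)
print("CART prediction at (5, 5):", cart.predict([[5.0, 5.0]])[0])
```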
3.1.4. Linear Ridge Regression (LRR) and Logistic Regression (LgR)
Statistical models of the relationship between dependent variables (response variables) and independent variables (explanatory variables) are developed using linear regression (Figure 4). The general formula for multiple regression models is as follows.

$$y = f(\mathbf{x}) = \beta_0 + \sum_{j=1}^{n} \beta_j x_j + \varepsilon \tag{14}$$

where $y$ is a dependent variable; $\beta_0$ is a constant; $\beta_j$ is a regression coefficient ($j = 1, 2, \ldots, n$); and $\varepsilon$ is an error term.

Linear ridge regression (LRR) is a regularization technique that can be used together with generic regression algorithms to model highly correlated data [61,62]. The least squares method is a powerful technique for training the LRR model; it finds $\beta$ by minimizing the Residual Sum of Squares (RSS). Therefore, the cost function is presented below.

$$\mathrm{Cost}(\beta) = \mathrm{RSS}(\beta) = \sum_{i=1}^{l} (y_i - y_i')^2 + \lambda \left( \sum_{j=1}^{n} \beta_j^2 \right) \tag{15}$$

$$y' = \beta_0 + \sum_{j} \beta_j x_j \tag{16}$$

where $\lambda$ is a pre-chosen constant whose product with the squared norm of the $\beta$ vector forms the penalty term, and $y'$ is the predicted value.
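As a concrete illustration of Equations (15) and (16), the following minimal sketch fits a ridge regression with scikit-learn, where the parameter alpha plays the role of the penalty constant λ; the collinear synthetic data are placeholders.

```python
# Minimal ridge regression sketch (Equations (15)-(16)).
# alpha plays the role of the penalty constant lambda; data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)      # highly correlated with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 2 * x2 + rng.normal(scale=0.5, size=200)

model = Ridge(alpha=1.0)                         # minimizes RSS + alpha * ||beta||^2
model.fit(X, y)
print("Coefficients beta_j:", model.coef_, "Intercept beta_0:", model.intercept_)
```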
Figure 4. Linear Ridge Regression (LRR) and Logistic Regression (LgR) models: (a) LRR model; (b) LgR model.

Logistic regression (LgR) models the probability of a binary outcome as follows.

$$p(x) = \frac{1}{1 + e^{-\left( \beta_0 + \sum_{j=1}^{n} \beta_j x_j \right)}} \tag{17}$$

where $p(x)$ is the probability that the dependent variable equals a "success" or "case" rather than a failure or non-case. $\beta_0$ and $\beta_j$ are found by minimizing the cost function defined in Equation (18).

$$\mathrm{Cost}(\beta) = -\left( \sum_{i=1}^{l} \Big[ y_i \ln\big(p(\mathbf{x}_i)\big) + (1 - y_i) \ln\big(1 - p(\mathbf{x}_i)\big) \Big] \right) + \frac{\lambda}{2} \sum_{j=1}^{n} \beta_j^2 \tag{18}$$

where $y_i$ is the observed outcome of case $\mathbf{x}_i$, having 0 or 1 as possible values [64].
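Similarly, Equations (17) and (18) correspond to an L2-penalized logistic regression; in the sketch below, scikit-learn's parameter C acts as the inverse of the regularization strength λ, and the toy data are assumptions for illustration only.

```python
# Minimal L2-regularized logistic regression sketch (Equations (17)-(18)).
# C is the inverse of the regularization strength lambda; data are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300) > 0).astype(int)

model = LogisticRegression(penalty="l2", C=1.0)
model.fit(X, y)

# p(x) from Equation (17) for a new case
print("p(success | x=[0.2, -0.1]):", model.predict_proba([[0.2, -0.1]])[0, 1])
```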
3.2. Ensemble Regression Model
In this study, several ensemble schemes, including voting, bagging, stacking, and tiering, were investigated using the input data and are described below; a minimal code sketch of these schemes follows the list.
• Voting: The voting ensemble model combines the outputs of the single models using a meta-rule. The mean of the output values is used in this study. According to the adopted ML models, 11 voting models are trained in this study, including (1) ANN + SVR, (2) ANN + CART, (3) ANN + LRR, (4) SVR + CART, (5) SVR + LRR, (6) CART + LRR, (7) ANN + SVR + CART, (8) ANN + CART + LRR, (9) ANN + SVR + LRR, (10) SVR + CART + LRR, and (11) ANN + SVR + CART + LRR. Figure 5a presents the voting ensemble model.
• Bagging: The bagging ensemble model duplicates samples at random, and each regression model predicts values from the samples independently. The meta-rule is then applied to all of the outputs in this study. The bagging ensemble model is depicted in Figure 5b.
• Stacking: The stacking ensemble model is a two-stage model, and Figure 5c describes the principle of the model. In stage 1, each single model predicts one output value. Then, these outputs are used as inputs to train a model with these machine learning techniques again to make a meta-prediction in stage 2. There are four stacking models herein, including ANN (ANN, SVR, CART, LRR); SVR (ANN, SVR, CART, LRR); CART (ANN, SVR, CART, LRR); and LRR (ANN, SVR, CART, LRR).

Figure 5. Ensemble models: (a) voting; (b) bagging; (c) stacking; (d) tiering.
• Tiering: Figure 5d illustrates the tiering ensemble model. There are two tiers inside a tiering ensemble model in this study. The first tier classifies the data into k classes on the basis of the T value [18]; the machine learning technique used for classifying the data in the first tier needs to be identified. After classifying the data, a regression machine learning model is trained on the data (Sub Data) of each class (second tier) to predict results. In the iML, we developed three types of tiering models, including 2-class, 3-class, and 4-class. The equation for calculating the T value is:

$$T = \frac{y_{\max} + y_{\min}}{k} \tag{19}$$

where $T$ is the standard value, $k$ is the number of classes, and $y_{\max}$ and $y_{\min}$ are the maximum and minimum of the actual values, respectively.
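As noted above, the following sketch assembles simple voting, bagging, stacking, and two-class tiering ensembles in Python with scikit-learn; the base learners, their settings, and the tiering threshold are illustrative assumptions rather than iML's exact configuration.

```python
# Minimal sketches of the four ensemble schemes (Section 3.2).
# Base learners, settings, and the 2-class tiering threshold are placeholders.
import numpy as np
from sklearn.ensemble import VotingRegressor, BaggingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=(400, 2))
y = X[:, 0] * 3 + X[:, 1] ** 2 + rng.normal(0, 1, 400)

ann = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
svr, cart, lrr = SVR(), DecisionTreeRegressor(max_depth=5), Ridge()

# Voting averages the single-model outputs (the mean meta-rule).
voting = VotingRegressor([("ann", ann), ("svr", svr), ("cart", cart), ("lrr", lrr)]).fit(X, y)
# Bagging trains each learner on duplicated (bootstrap) samples.
bagging = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20, random_state=0).fit(X, y)
# Stacking feeds stage-1 predictions into a stage-2 meta-learner.
stacking = StackingRegressor(
    estimators=[("svr", SVR()), ("cart", DecisionTreeRegressor()), ("lrr", Ridge())],
    final_estimator=Ridge(),
).fit(X, y)

# Tiering (2-class): split samples at the threshold T and fit one regressor per tier.
# (At prediction time, a first-tier classifier would assign a new sample to a tier.)
T = (y.max() + y.min()) / 2                       # Equation (19) with k = 2
low, high = y < T, y >= T
tier_low, tier_high = Ridge().fit(X[low], y[low]), Ridge().fit(X[high], y[high])
print("Tier regressors fitted on", low.sum(), "and", high.sum(), "samples")

x_new = np.array([[5.0, 5.0]])
print("Voting:", voting.predict(x_new)[0], "Bagging:", bagging.predict(x_new)[0])
print("Stacking:", stacking.predict(x_new)[0])
```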
3.3. K-Fold Cross Validation
K-fold cross validation is used to compare two or more prediction models. This method randomly divides a sample into a training sample and a test sample by splitting it into K subsets. K−1 subsets are selected to train the model while the remaining subset is used for testing, and this training process is repeated K times (Figure 6). To compare models, the average of the performance results (e.g., RMSE and MAPE) is computed. Kohavi (1995) stated that K = 10 provides analytical validity, computational efficiency, and optimal deviation [65]. Thus, K = 10 is used in this study. The performance metrics are explained in detail in Section 4.

Figure 6. K-fold cross-validation method.
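The tenfold scheme of Figure 6 can be written as in the sketch below, which averages the test RMSE over K = 10 splits with scikit-learn's KFold; the synthetic data and the small neural network are illustrative stand-ins, not iML's models.

```python
# Tenfold cross-validation sketch (Section 3.3): average test RMSE over K = 10 splits.
# The data and the small neural network below are illustrative placeholders.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(500, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + X[:, 2] + rng.normal(0, 0.1, 500)

rmses = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    rmses.append(mean_squared_error(y[test_idx], pred) ** 0.5)

print("Mean tenfold RMSE:", np.mean(rmses))
```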
3.4. Intelligent Machine Learner Framework
Figure 7 presents the structure of iML. In stage 1 (data preprocessing), the data is classified distinctly for particular use in the tiering ensemble model. Meanwhile, all data is divided into two main data groups, namely, learning data and test data, and the learning data is duplicated for training ensemble models.
At the next stage, all retrieved data is automatically used for training models, which include single models (ANN, SVR, LRR, and CART) and ensemble models (voting, bagging, stacking, and tiering). Notably, the tiering ensemble model needs to employ a classification technique to assign a class label to the original input at the first tier. A corresponding regression model for the particular class is then adopted at the second tier to obtain the predictive value [17,26].
Finally, in stage 3 (find the best model), the predictive performances of all the models learned (trained) in stage 2 are compared using the test dataset to identify the best models. Section 4 describes the performance evaluation metrics in detail.

4. Mathematical Formulas for Performance Measures
To measure the performance of classification models, the accuracy, precision, sensitivity, specificity, and the area under the curve (AUC) are calculated. For the regression models, five performance measures (i.e., correlation coefficient (R), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared error (RMSE), and total error rate (TER)) are calculated. Table 1 presents a confusion matrix and Table 2 exhibits those performance measures [17,66].
In Table 2, MAE is the mean absolute difference between the prediction and the actual value. MAPE represents the mean percentage error between the prediction and the actual value; the smaller the MAPE, the better the prediction result achieved by the model. The MAPE is the index typically used to evaluate the accuracy of prediction models. RMSE represents the dispersion of the errors of a prediction model. The statistical index that shows the linear correlation between two variables is denoted as R. Lastly, TER is the total difference between the predicted and actual values [17].
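For concreteness, the following sketch computes the regression measures described above for a pair of actual/predicted vectors; the TER formula used here (total absolute error divided by the total of the actual values) is an assumed interpretation of the description, not necessarily iML's exact definition.

```python
# Sketch of the regression performance measures described above.
# The TER formula here is an assumed interpretation (total absolute error
# divided by the total of the actual values), not necessarily iML's definition.
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0   # in percent
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    r = np.corrcoef(y_true, y_pred)[0, 1]                        # correlation coefficient
    ter = np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R": r, "TER": ter}

print(regression_metrics([10.0, 20.0, 30.0], [12.0, 19.0, 33.0]))
```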
The goal is to identify the model that yields the lowest error on the test data. To obtain a comprehensive performance measure, the five statistical measures (RMSE, MAE, MAPE, 1−R, and TER) were combined into a synthesis index (SI) using Equation (20). Based on the SI values, the best model is identified.

$$SI = \frac{1}{m_p} \sum_{i=1}^{m_p} \frac{P_i - P_{\min,i}}{P_{\max,i} - P_{\min,i}} \tag{20}$$

where $m_p$ is the number of performance measures, $P_i$ is the value of the $i$th measure, and $P_{\min,i}$ and $P_{\max,i}$ are the minimum and maximum values of the $i$th measure over the compared models.
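Equation (20) can be computed as in the sketch below, in which each row of the score matrix holds one model's (RMSE, MAE, MAPE, 1−R, TER) values and a lower SI indicates a better model; the numbers are placeholders.

```python
# Synthesis index (SI) of Equation (20): min-max normalize each performance
# measure across models and average; lower SI indicates a better model.
# The score values below are illustrative placeholders.
import numpy as np

def synthesis_index(scores):
    """scores: array of shape (n_models, m_p) with lower-is-better measures."""
    scores = np.asarray(scores, dtype=float)
    p_min = scores.min(axis=0)
    p_max = scores.max(axis=0)
    return ((scores - p_min) / (p_max - p_min)).mean(axis=1)

# Rows: models; columns: RMSE, MAE, MAPE, 1 - R, TER.
scores = np.array([
    [5.30, 3.73, 12.67, 0.054, 0.02],
    [4.77, 3.55, 12.72, 0.044, 0.03],
    [14.8, 11.8, 56.78, 0.556, 0.10],
])
si = synthesis_index(scores)
print("SI per model:", si, "-> best model index:", int(np.argmin(si)))
```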
Table 4. Test results by WEKA and iML on concrete compressive strength dataset via hold-out validation.
Model | WEKA: R, RMSE (MPa), MAE (MPa), MAPE (%), SI (Ranking) | iML: R, RMSE (MPa), MAE (MPa), MAPE (%), SI (Ranking)
I. Single CART ANN
0.927 6.546 5.170 18.770 0.142 (7) 0.946 5.302 3.728 12.673 0.023 (3)
II. Voting ANN + CART ANN + CART
0.936 6.202 4.930 19.090 0.124 (6) 0.956 4.771 3.550 12.723 0.000 (1)
III. Bagging CART ANN
0.960 5.044 3.983 15.130 0.032 (4) 0.951 5.056 3.647 12.249 0.010 (2)
IV. Stacking (*) CART (*) LRR
0.939 5.986 4.792 17.520 0.104 (5) 0.444 14.829 11.779 56.775 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance.
Table 5. Test results by WEKA and iML on real estate dataset via hold-out validation.
Model | WEKA: R, RMSE (U), MAE (U), MAPE (%), SI (Ranking) | iML: R, RMSE (U), MAE (U), MAPE (%), SI (Ranking)
I. Single CART ANN
0.740 10.762 5.882 13.210 0.321 (6) 0.871 6.630 4.912 13.591 0.049 (3)
II. Voting ANN + CART + LRR ANN + CART
0.745 11.054 5.908 12.780 0.327 (7) 0.877 6.615 4.739 12.867 0.030 (2)
III. Bagging CART CART
0.770 10.321 5.281 11.760 0.246 (4) 0.884 6.485 4.381 12.305 0.000 (1)
IV. Stacking (*) CART (*) ANN
0.744 10.748 5.774 12.940 0.311 (5) 0.391 12.638 10.411 33.807 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance; U: house price of unit area (10,000 New Taiwan
Dollar/Ping, where Ping is a local unit, 1 Ping = 3.3 m squared).
Table 6. Test results by WEKA and iML on energy efficiency data set (Heating load) via hold-out validation.
Model | WEKA: R, RMSE (kW), MAE (kW), MAPE (%), SI (Ranking) | iML: R, RMSE (kW), MAE (kW), MAPE (%), SI (Ranking)
I. Single CART ANN
0.996 0.914 0.646 3.300 0.418 (6) 0.999 0.488 0.354 1.700 0.046 (3)
II. Voting ANN + CART ANN + CART
0.996 0.929 0.729 3.820 0.449 (7) 0.999 0.495 0.336 1.617 0.045 (2)
III. Bagging CART ANN
0.997 0.870 0.619 3.210 0.354 (5) 0.999 0.426 0.311 1.519 0.000 (1)
IV. Stacking (*) CART (*) LRR
0.998 0.754 0.524 2.480 0.231 (4) 0.998 3.454 3.226 17.658 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance.
Table 7. Test results by WEKA and iML on energy efficiency dataset (Cooling load) via hold-out validation.
Model | WEKA: R, RMSE (kW), MAE (kW), MAPE (%), SI (Ranking) | iML: R, RMSE (kW), MAE (kW), MAPE (%), SI (Ranking)
I. Single CART ANN
0.986 1.524 1.006 3.900 0.320 (6) 0.992 1.231 0.884 3.577 0.038 (2)
II. Voting ANN + CART ANN + CART
0.987 1.504 1.064 4.330 0.293 (4) 0.988 1.509 0.982 3.544 0.222 (3)
III. Bagging CART ANN
0.986 1.565 1.046 4.030 0.358 (7) 0.993 1.177 0.809 3.165 0.000 (1)
IV. Stacking (*) SVR (*) LRR
0.986 1.537 0.979 3.700 0.314 (5) 0.989 4.290 3.762 17.305 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance.
Table 8. Test results by WEKA and iML on airfoil self-noise dataset via hold-out validation.
Model | WEKA: R, RMSE (dB), MAE (dB), MAPE (%), SI (Ranking) | iML: R, RMSE (dB), MAE (dB), MAPE (%), SI (Ranking)
I. Single CART ANN
0.898 3.185 2.339 1.880 0.502 (6) 0.953 2.149 1.577 1.259 0.044 (2)
II. Voting ANN + CART ANN + CART
0.893 3.471 2.649 2.100 0.591 (7) 0.952 2.163 1.633 1.301 0.058 (3)
III. Bagging CART ANN
0.922 2.902 2.135 1.710 0.332 (4) 0.958 2.031 1.494 1.194 0.000 (1)
IV. Stacking (*) CART (*) LRR
0.905 3.082 2.271 1.820 0.450 (5) 0.952 7.050 5.648 4.613 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance.
Table 9. Performance of WEKA and iML on concrete compressive strength dataset via tenfold cross-validation.
Model | WEKA: R, RMSE (MPa), MAE (MPa), MAPE (%), SI (Ranking) | iML: R, RMSE (MPa), MAE (MPa), MAPE (%), SI (Ranking)
I. Single CART ANN
0.923 6.434 4.810 15.510 0.228 (5) 0.946 5.411 4.003 13.866 0.154 (3)
II. Voting ANN + CART ANN + CART
0.917 6.823 5.213 17.230 0.265 (7) 0.955 4.903 3.506 12.397 0.111 (2)
III. Bagging CART CART
0.932 6.082 4.598 15.030 0.205 (4) 0.980 3.359 2.432 8.356 0.000 (1)
IV. Stacking (*) SVR (*) ANN
0.924 6.436 4.852 15.530 0.229 (6) 0.613 14.381 10.867 44.759 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance.
Table 10. Performance of WEKA and iML on real estate valuation dataset via tenfold cross-validation.
Model | WEKA: R, RMSE (U), MAE (U), MAPE (%), SI (Ranking) | iML: R, RMSE (U), MAE (U), MAPE (%), SI (Ranking)
I. Single CART ANN
0.807 8.021 5.197 15.270 0.314 (5) 0.813 8.011 5.388 14.991 0.315 (6)
II. Voting SVR + CART ANN + CART + LRR
0.805 8.091 5.198 15.090 0.315 (7) 0.821 7.878 5.376 15.116 0.308 (4)
III. Bagging CART CART
0.828 7.637 5.017 14.930 0.280 (2) 0.925 4.774 3.201 8.974 0.000 (1)
IV. Stacking (*) SVR (*) ANN
0.819 7.823 4.969 14.440 0.284 (3) 0.432 12.309 9.526 32.267 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance; U: house price of unit area (10,000 New Taiwan
Dollar/Ping, where Ping is a local unit, 1 Ping = 3.3 m squared).
Table 11. Performance of WEKA and iML on energy efficiency dataset (Heating load) via tenfold cross-validation.
Model | WEKA: R, RMSE (kW), MAE (kW), MAPE (%), SI (Ranking) | iML: R, RMSE (kW), MAE (kW), MAPE (%), SI (Ranking)
I. Single CART ANN
0.995 1.046 0.712 3.200 0.459 (7) 0.999 0.484 0.360 1.722 0.049 (2)
II. Voting ANN + CART ANN + CART
0.997 0.853 0.641 3.190 0.309 (4) 0.999 0.497 0.352 1.602 0.053 (3)
III. Bagging CART ANN
0.997 0.915 0.633 2.890 0.324 (5) 0.999 0.384 0.291 1.409 0.000 (1)
IV. Stacking (*) SVR (*) LRR
0.996 0.872 0.639 2.990 0.337 (6) 0.998 3.522 3.226 18.181 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance.
Table 12. Performance of WEKA and iML on energy efficiency dataset (Cooling load) via tenfold cross-validation.
Model | WEKA: R, RMSE (kW), MAE (kW), MAPE (%), SI (Ranking) | iML: R, RMSE (kW), MAE (kW), MAPE (%), SI (Ranking)
I. Single CART ANN
0.982 1.812 1.183 4.160 0.460 (5) 0.993 1.140 0.799 3.161 0.150 (2)
II. Voting ANN + CART ANN + CART
0.982 1.831 1.276 4.770 0.491 (7) 0.989 1.415 0.900 3.206 0.250 (3)
III. Bagging CART ANN
0.983 1.785 1.160 4.070 0.444 (4) 0.997 0.808 0.556 2.129 0.000 (1)
IV. Stacking (*) SVR (*) LRR
0.982 1.827 1.195 4.210 0.465 (6) 0.989 4.108 3.619 17.253 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance.
Table 13. Performance of WEKA and iML on airfoil self-noise dataset via tenfold cross-validation.
Model | WEKA: R, RMSE (dB), MAE (dB), MAPE (%), SI (Ranking) | iML: R, RMSE (dB), MAE (dB), MAPE (%), SI (Ranking)
I. Single CART ANN
0.877 3.314 2.381 1.910 0.497 (5) 0.946 2.239 1.660 1.331 0.152 (2)
II. Voting ANN + CART ANN + CART
0.851 3.685 2.747 2.220 0.641 (7) 0.946 2.246 1.664 1.334 0.152 (3)
III. Bagging CART CART
0.911 2.906 2.160 1.730 0.352 (4) 0.971 1.727 1.271 1.023 0.000 (1)
IV. Stacking (*) LRR (*) LRR
0.874 3.374 2.494 1.990 0.525 (6) 0.946 6.894 5.587 4.562 1.000 (8)
Note: (*) is (ANN + SVR + CART + LRR); bold value denotes the best overall performance.
6.4. Discussion
Single, voting, bagging, and stacking models are compared using WEKA and iML,
except for the tiering method, which is not available in WEKA. Additionally, unlike manual
construction of individual models in the WEKA interface, iML can automatically build and
identify the best model for the imported datasets. Hold-out validation and tenfold cross-
validation are used to evaluate the performance results (R, MAE, RMSE, and MAPE) in each
scheme (single, voting, bagging, and stacking). The analytical results of either validation
show that most of the models trained by iML are superior to those trained by WEKA using
the same datasets. Hence, iML is an effective platform to solve regression problems.
7. Numerical Experiments
This section validates iML by using various industrial datasets, including (1) enterprise
resource planning data [1], (2) CPU computer performance data [12], (3) customer data for
a logistics company [13], and (4) daily bike rental data [11]. Table 14 presents the initial
parameter settings for these problems.
Table 15. Variables and descriptive statistics for predicting enterprise resource planning (ERP) software development effort.
Variable  Min.  Max.  Mean  Standard Deviation  Data Type
Y: Software development effort (person-hour)  4  2694  258.55  394.69  Numerical
X1 : Program type entry 0 1 Dummy variable Boolean
X2 : Program type report 0 1 Dummy variable Boolean
X3 : Program type batch 0 1 Dummy variable Boolean
X4 : Program type query 0 1 Dummy variable Boolean
X5 : Program type transaction 0 0 Referential category Boolean
X6 : Number of programs 1 88 16.73 19.12 Numerical
X7 : Number of zooms 0 2028 100.22 255.40 Numerical
X8 : Number of columns in form 3 3216 397.75 548.06 Numerical
X9 : Number of actions 0 1645 288.44 339.61 Numerical
X10 : Number of signature tasks 0 15 0.39 1.77 Numerical
X11 : Number of batch serial numbers 0 11 0.31 1.50 Numerical
X12 : Number of multi-angle trade tasks 0 22 0.55 2.66 Numerical
X13 : Number of multi-unit tasks 0 21 1.10 3.41 Numerical
X14 : Number of reference calls 0 528 13.96 49.92 Numerical
X15 : Number of confirmed tasks 0 21 1.50 3.99 Numerical
X16 : Number of post tasks 0 12 0.23 1.33 Numerical
X17 : Number of industry type tasks 0 21 0.80 2.97 Numerical
Table 16. Performances of predictive models for ERP software development effort.
Note: (*) is (ANN + SVR + CART + LRR); (**) SVM-(ANN, SVR); (***) CART-(ANN, SVR, SVR); (****) CART-(CART, SVR, SVR, SVR);
(No.): Ranking.
Figure 10. Root mean square errors of best models.

Three models (single, voting, and bagging) provided better results in terms of R (0.94 to 0.99) than the tiering and stacking ensemble models, which had R values of 0.58 to 0.95. Among these three best models, in terms of MAPE, the bagging model exhibited the best balance of MAPE results from learning and test data (21.45% and 19.50%, respectively). The single and voting models depicted unbalanced MAPEs for training and test data (19.91% and 30.65% for the single model; 16.83% and 33.90% for the voting model). Thus, the bagging model was the best model to predict ERP.
The first experiment indicates that the iML not only identifies the best model, but also reports the performance values of all the trained models. Chou et al. (2012) obtained training and testing MAPEs of 26.8% and 27.3%, and RMSEs of 234.0157 h and 97.2667 h, using the Evolutionary Support Vector Machine Inference Model (ESIM) [1]. The iML yields the bagging ensemble model with MAPEs of 21.45% and 19.50%, and RMSEs of 70.28 h and 65.58 h, for the same training and test data, respectively. As a result, the iML is effective in finding the best model among the popular regression models.

7.2. Experiments on Industrial Datasets
Three additional experiments were performed to evaluate iML. To ensure a fair comparison, 70% of the data was used for learning whereas the remaining 30% was utilized for testing.

7.2.1. Performance of CPU Processors
This experiment concerns the comparison of the performance of CPU processors. The data for this experiment were taken from Maurya and Gupta (2015) [12]. This dataset contains 209 samples with a total of 6 attributes (Table 17). The descriptions of the attributes are as follows: X1: Machine cycle time in nanoseconds (integer, input); X2: Minimum main memory in kilobytes (integer, input); X3: Maximum main memory in kilobytes (integer, input); X4: Cache memory in kilobytes (integer, input); X5: Minimum channels in units (integer, input); X6: Maximum channels in units (integer, input); and Y: Estimated relative performance (integer, output).

Table 17. Descriptive statistics for CPU processors.
Statistic  X1  X2  X3  X4  X5  X6  Y   (X1–X6: input; Y: output)
Min 17 64 64 0 0 0 15
Max 1500 32,000 64,000 256 52 176 1238
Mean 203.82 2867.98 11,796.2 25.21 4.7 18.27 99.33
Std. 260.26 3878.74 11,726.6 40.63 6.82 26 154.76
Table 18. Variables and descriptive statistics for daily demand forecasting orders.
Statistic  X1  X2  X3  X4  X5  X6  X7  X8  X9  X10  X11  X12  Y   (X1–X12: input; Y: output)
Min 1 2 43.65 77.37 21.83 25.13 74.37 0 11,992 3452 16,411 7679 129.41
Max 5 6 435.30 223.27 118.18 267.34 302.45 865 71,772 210,508 188,411 73,839 616.45
Mean - - 172.55 118.92 52.11 109.23 139.53 77.4 44,504.4 46,640.8 79,401.5 23,114.6 300.87
Std. - - 69.51 27.17 18.83 50.74 41.44 186.5 12,197.9 45,220.7 40,504.4 13,148 89.6
Table 19. Variables and descriptive statistics for total hourly-shared bike rental per days.
Statistic  X1  X2  X3  X4  X5  X6  X7  X8  X9  X10  X11  Y   (X1–X11: input; Y: output)
Min 1 0 1 0 0 0 1 0.059 0.079 0 0.022 22
Max 4 1 12 1 6 1 3 0.862 0.841 0.973 0.507 8714
Mean - - - 0.029 2.997 0.684 1.395 0.495 0.474 0.628 0.19 4504.35
Std. - - - 0.167 2.005 0.465 0.545 0.183 0.163 0.142 0.077 1937.21
In this study, to calculate MAPE, the output was normalized and 0.1 was added to
prevent a zero value.
$$y_i = \frac{y_i - y_{\min}}{y_{\max} - y_{\min}} + 0.1 \tag{21}$$
where yi , ymin , and ymax are actual value, minimum and maximum of actual value, respec-
tively.
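For reference, Equation (21) corresponds to the following small helper (a sketch; the sample values are placeholders).

```python
# Min-max normalization with a 0.1 offset (Equation (21)) to avoid zero values
# when computing MAPE. The sample values are placeholders.
import numpy as np

def normalize_with_offset(y):
    y = np.asarray(y, dtype=float)
    return (y - y.min()) / (y.max() - y.min()) + 0.1

print(normalize_with_offset([22, 4504, 8714]))   # -> values in [0.1, 1.1]
```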
No. | Model | Learn: RMSE, MAE, MAPE, R, TER | Test: RMSE, MAE, MAPE, R, TER
2 Single 2.462 0.569 1.015% 1.000 0.236% 8.738 3.683 3.775% 0.996 2.730%
Voting 17.509 4.965 2.808% 0.995 0.118% 13.087 5.173 5.685% 0.989 0.921%
Bagging 40.127 8.893 2.360% 0.981 3.015% 13.484 3.930 2.489% 0.996 3.795%
Stacking 40.428 9.030 3.389% 0.973 0.000% 64.782 43.615 104.992% 0.842 9.852%
Tiering-2class 163.338 26.947 3.229% 0.383 24.818% 18.196 5.822 4.629% 0.986 3.220%
Tiering-3class 167.093 29.784 3.856% 0.340 27.446% 76.406 12.698 5.268% 0.639 13.094%
Tiering-4class 182.572 44.722 7.970% 0.112 41.332% 91.701 21.089 8.342% 0.422 24.368%
3 Single 0.349 0.080 0.023% 1.000 0.021% 0.317 0.231 0.093% 1.000 0.042%
Voting 17.417 10.754 3.089% 0.985 0.010% 12.162 10.157 3.993% 0.951 0.867%
Bagging 0.917 0.399 0.110% 1.000 0.020% 0.296 0.221 0.087% 1.000 0.074%
Stacking 0.338 0.090 0.026% 1.000 0.014% 0.335 0.251 0.101% 1.000 0.042%
Tiering-2class 169.296 63.580 14.294% −0.399 21.483% 214.674 86.711 16.747% −0.704 27.688%
Tiering-3class 273.303 212.047 62.384% −0.664 71.449% 295.065 223.209 62.186% −0.570 71.054%
Tiering-4class 329.001 312.397 97.619% −0.304 99.023% 51.164 45.122 18.684% 0.706 11.339%
4 Single 0.046 0.030 6.670% 0.979 10.450% 0.105 0.073 14.120% 0.883 0.550%
Voting 0.052 0.037 7.850% 0.974 8.000% 0.080 0.056 10.750% 0.929 0.260%
Bagging 0.049 0.034 7.150% 0.977 6.930% 0.069 0.046 8.870% 0.948 0.190%
Stacking 0.005 0.003 0.700% 1.000 21.430% 0.214 0.169 38.680% 0.000 0.086%
Tiering-2class 0.580 0.432 57.620% −0.582 58.900% 0.589 0.451 61.410% −0.570 68.290%
Tiering-3class 0.639 0.568 84.420% −0.680 64.600% 0.646 0.565 82.110% −0.717 90.260%
Tiering-4class 0.648 0.596 92.370% −0.513 65.200% 0.652 0.584 87.400% −0.630 94.300%
Note: No. 2: CPU experiment dataset; No. 3: Customer experiment dataset; No. 4: Rental bike experiment dataset; the bold denotes the
best model in each experiment.
In experiment No. 3, Ferreira et al. (2016) reported a MAPE of 3.45%, whereas iML confirms the single ANN model as the best model, with MAPE values for the learning and test data of 0.023% and 0.093%, respectively [13]. The stacking ANN ensemble also performs well, with MAPEs for the learning and test data of 0.026% and 0.010%, respectively.
Finally, in experiment No. 4, iML achieves R values of 0.97660 for learning and 0.94790 for testing, with bagging ANN as the best model. In contrast, Fanaee-T and Gama (2014) obtained a maximum R value of 0.91990 [11].
As shown in the above numerical experiments, iML trains and identifies the best
models which are better than those in the previous studies.
8. Conclusions
Four industrial experiments were carried out to validate the performance of iML. The first experiment involved training a model for predicting ERP development effort, in which iML yielded an RMSE of 70.28 h for the learning data and 65.58 h for the testing data by using the bagging ANN ensemble (the best model). In contrast, Chou et al. (2012) [1] obtained training and testing RMSE values of 234.0157 h and 97.2667 h, respectively.
In the second experiment on performance of CPU processors, iML yielded 0.99990
for R-learning and 0.99629 for R-testing, which are better than those reported in Maurya
and Gupta (2015) [12], and confirmed that single ANN was the best model. In the third
experiment of daily demand forecasting orders, iML achieved MAPE values of 0.026%
(learning) and 0.010% (testing). The results are as excellent as those obtained in Ferreira
et al. (2016) [13]. In the fourth experiment for total hourly-shared bike rental, R-learning
and R-testing values of 0.97660 and 0.94790 were reached using iML. The test performance
was 6% better than that obtained by Fanaee-T and Gama (2014) [11]. In addition to the
enhanced prediction performance, the iML possesses the ability to determine the best models
on the basis of multiple evaluation metrics.
In conclusion, the iML is a powerful and promising prediction platform for solving
diverse engineering problems. Since the iML platform can only deal with regression
problems, future research should upgrade iML for solving complex classification and
time series problems by automatically presenting the alternative models for practical use
in engineering applications, as well as adding some other advanced ML methods (such
as deep learning models). Moreover, metaheuristic optimization algorithms could be
integrated with the iML to help users fine-tune the hyperparameters of the chosen machine
learning models.
Author Contributions: Conceptualization, J.-S.C.; data curation, D.-N.T.; formal analysis, J.-S.C. and
D.-N.T.; funding acquisition, J.-S.C.; investigation, J.-S.C., D.-N.T. and C.-F.T.; methodology, J.-S.C.
and C.-F.T.; project administration, J.-S.C.; resources, J.-S.C. and C.-F.T.; software, D.-N.T.; supervision,
J.-S.C.; validation, J.-S.C., D.-N.T. and C.-F.T.; visualization, J.-S.C. and D.-N.T.; writing—original
draft, J.-S.C., D.-N.T. and C.-F.T.; writing—review and editing, J.-S.C. and D.-N.T. All authors have
read and agreed to the published version of the manuscript.
Funding: This research was funded by the Ministry of Science and Technology, Taiwan, under grants
108-2221-E-011-003-MY3 and 107-2221-E-011-035-MY3.
Data Availability Statement: The data that support the findings of this study are available from the
UCI Machine Learning Repository or corresponding author upon reasonable request.
Acknowledgments: The authors would like to thank the Ministry of Science and Technology, Taiwan,
for financially supporting this research.
Conflicts of Interest: The authors declare that they have no conflict of interest.
References
1. Chou, J.-S.; Cheng, M.-Y.; Wu, Y.-W.; Wu, C.-C. Forecasting enterprise resource planning software effort using evolutionary
support vector machine inference model. Int. J. Proj. Manag. 2012, 30, 967–977. [CrossRef]
2. Pham, A.-D.; Ngo, N.-T.; Nguyen, Q.-T.; Truong, N.-S. Hybrid machine learning for predicting strength of sustainable concrete.
Soft Comput. 2020. [CrossRef]
3. Cheng, M.-Y.; Chou, J.-S.; Cao, M.-T. Nature-inspired metaheuristic multivariate adaptive regression splines for predicting
refrigeration system performance. Soft Comput. 2015, 21, 477–489. [CrossRef]
4. Li, Y.; Lei, G.; Bramerdorfer, G.; Peng, S.; Sun, X.; Zhu, J. Machine Learning for Design Optimization of Electromagnetic Devices:
Recent Developments and Future Directions. Appl. Sci. 2021, 11, 1627. [CrossRef]
5. Piersanti, S.; Orlandi, A.; Paulis, F.d. Electromagnetic Absorbing Materials Design by Optimization Using a Machine Learning
Approach. IEEE Trans. Electromagn. Compat. 2018, 1–8. [CrossRef]
6. Chou, J.S.; Pham, A.D. Smart artificial firefly colony algorithm-based support vector regression for enhanced forecasting in civil
engineering. Comput.-Aided Civ. Infrastruct. Eng. 2015, 30, 715–732. [CrossRef]
7. Cheng, M.-Y.; Prayogo, D.; Wu, Y.-W. A self-tuning least squares support vector machine for estimating the pavement rutting
behavior of asphalt mixtures. Soft Comput. 2019, 23, 7755–7768. [CrossRef]
8. Al-Ali, H.; Cuzzocrea, A.; Damiani, E.; Mizouni, R.; Tello, G. A composite machine-learning-based framework for supporting
low-level event logs to high-level business process model activities mappings enhanced by flexible BPMN model translation. Soft
Comput. 2019. [CrossRef]
9. López, J.; Maldonado, S.; Carrasco, M. A novel multi-class SVM model using second-order cone constraints. Appl. Intell. 2016,
44, 457–469. [CrossRef]
10. Bogawar, P.S.; Bhoyar, K.K. An improved multiclass support vector machine classifier using reduced hyper-plane with skewed
binary tree. Appl. Intell. 2018, 48, 4382–4391. [CrossRef]
11. Fanaee-T, H.; Gama, J. Event labeling combining ensemble detectors and background knowledge. Prog. Artif. Intell. 2014,
2, 113–127. [CrossRef]
12. Maurya, V.; Gupta, S.C. Comparative Analysis of Processors Performance Using ANN. In Proceedings of the 2015 5th International
Conference on IT Convergence and Security (ICITCS), Kuala Lumpur, Malaysia, 24–27 August 2015; pp. 1–5.
13. Ferreira, R.P.; Martiniano, A.; Ferreira, A.; Ferreira, A.; Sassi, R.J. Study on Daily Demand Forecasting Orders using Artificial
Neural Network. IEEE Lat. Am. Trans. 2016, 14, 1519–1525. [CrossRef]
14. De’ath, G.; Fabricius, K.E. Classification and regression trees: A powerful yet simple technique for ecological data analysis.
Ecology 2000, 81, 3178–3192. [CrossRef]
15. Li, H.; Wen, G. Modeling reverse thinking for machine learning. Soft Comput. 2020, 24, 1483–1496. [CrossRef]
16. Li, Y. Predicting materials properties and behavior using classification and regression trees. Mater. Sci. Eng. A 2006, 433, 261–268.
[CrossRef]
17. Chou, J.-S.; Yang, K.-H.; Lin, J.-Y. Peak Shear Strength of Discrete Fiber-Reinforced Soils Computed by Machine Learning and
Metaensemble Methods. J. Comput. Civ. Eng. 2016, 30, 04016036. [CrossRef]
18. Qi, C.; Tang, X. Slope stability prediction using integrated metaheuristic and machine learning approaches: A comparative study.
Comput. Ind. Eng. 2018, 118, 112–122. [CrossRef]
19. Chou, J.-S.; Ho, C.-C.; Hoang, H.-S. Determining quality of water in reservoir using machine learning. Ecol. Inform. 2018, 44, 57–75.
[CrossRef]
20. Chou, J.-S.; Bui, D.-K. Modeling heating and cooling loads by artificial intelligence for energy-efficient building design. Energy
Build. 2014, 82, 437–446. [CrossRef]
21. Alkahtani, M.; Choudhary, A.; De, A.; Harding, J.A. A decision support system based on ontology and data mining to improve
design using warranty data. Comput. Ind. Eng. 2018. [CrossRef]
22. Daras, G.; Agard, B.; Penz, B. A spatial data pre-processing tool to improve the quality of the analysis and to reduce preparation
duration. Comput. Ind. Eng. 2018, 119, 219–232. [CrossRef]
23. Chou, J.-S.; Tsai, C.-F. Preliminary cost estimates for thin-film transistor liquid–crystal display inspection and repair equipment:
A hybrid hierarchical approach. Comput. Ind. Eng. 2012, 62, 661–669. [CrossRef]
24. Chen, T. An ANN approach for modeling the multisource yield learning process with semiconductor manufacturing as an
example. Comput. Ind. Eng. 2017, 103, 98–104. [CrossRef]
25. Wu, X.; Kumar, V.; Quinlan, J.R.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.J.; Ng, A.; Liu, B.; Philip, S.Y. Top 10 algorithms in
data mining. Knowl. Inf. Syst. 2008, 14, 1–37. [CrossRef]
26. Chou, J.-S.; Ngo, N.-T.; Chong, W.K. The use of artificial intelligence combiners for modeling steel pitting risk and corrosion rate.
Eng. Appl. Artif. Intell. 2017, 65, 471–483. [CrossRef]
27. Das, D.; Pratihar, D.K.; Roy, G.G.; Pal, A.R. Phenomenological model-based study on electron beam welding process, and
input-output modeling using neural networks trained by back-propagation algorithm, genetic algorithms, particle swarm
optimization algorithm and bat algorithm. Appl. Intell. 2018, 48, 2698–2718. [CrossRef]
28. Tewari, S.; Dwivedi, U.D. Ensemble-based big data analytics of lithofacies for automatic development of petroleum reservoirs.
Comput. Ind. Eng. 2018. [CrossRef]
29. Priore, P.; Ponte, B.; Puente, J.; Gómez, A. Learning-based scheduling of flexible manufacturing systems using ensemble methods.
Comput. Ind. Eng. 2018, 126, 282–291. [CrossRef]
30. Fang, K.; Jiang, Y.; Song, M. Customer profitability forecasting using Big Data analytics: A case study of the insurance industry.
Comput. Ind. Eng. 2016, 101, 554–564. [CrossRef]
31. Chou, J.-S. Generalized linear model-based expert system for estimating the cost of transportation projects. Expert Syst. Appl.
2009, 36, 4253–4267. [CrossRef]
32. Dandikas, V.; Heuwinkel, H.; Lichti, F.; Drewes, J.E.; Koch, K. Predicting methane yield by linear regression models: A validation
study for grassland biomass. Bioresour. Technol. 2018, 265, 372–379. [CrossRef] [PubMed]
33. Ngo, S.H.; Kemény, S.; Deák, A. Performance of the ridge regression method as applied to complex linear and nonlinear models.
Chemom. Intell. Lab. Syst. 2003, 67, 69–78. [CrossRef]
34. Sentas, P.; Angelis, L. Categorical missing data imputation for software cost estimation by multinomial logistic regression. J. Syst.
Softw. 2006, 79, 404–414. [CrossRef]
35. Slowik, A. Application of an Adaptive Differential Evolution Algorithm With Multiple Trial Vectors to Artificial Neural Network
Training. IEEE Trans. Ind. Electron. 2011, 58, 3160–3167. [CrossRef]
36. Caputo, A.C.; Pelagagge, P.M. Parametric and neural methods for cost estimation of process vessels. Int. J. Prod. Econ. 2008,
112, 934–954. [CrossRef]
37. Rocabruno-Valdés, C.I.; Ramírez-Verduzco, L.F.; Hernández, J.A. Artificial neural network models to predict density, dynamic
viscosity, and cetane number of biodiesel. Fuel 2015, 147, 9–17. [CrossRef]
38. Ganesan, P.; Rajakarunakaran, S.; Thirugnanasambandam, M.; Devaraj, D. Artificial neural network model to predict the diesel
electric generator performance and exhaust emissions. Energy 2015, 83, 115–124. [CrossRef]
39. Vapnik, V.N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10, 988–999. [CrossRef]
40. Vapnik, V. The Nature of Statistical Learning Theory, 2nd ed.; Springer: New York, NY, USA, 2013.
41. Jing, G.; Cai, W.; Chen, H.; Zhai, D.; Cui, C.; Yin, X. An air balancing method using support vector machine for a ventilation
system. Build. Environ. 2018, 143, 487–495. [CrossRef]
42. García-Floriano, A.; López-Martín, C.; Yáñez-Márquez, C.; Abran, A. Support vector regression for predicting software enhance-
ment effort. Inf. Softw. Technol. 2018, 97, 99–109. [CrossRef]
43. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Routledge: New York, NY, USA, 2017; p. 368.
[CrossRef]
44. Choi, S.Y.; Seo, I.W. Prediction of fecal coliform using logistic regression and tree-based classification models in the North Han
River, South Korea. J. Hydro-Environ. Res. 2018, 21, 96–108. [CrossRef]
45. Ru, F.; Yin, A.; Jin, J.; Zhang, X.; Yang, X.; Zhang, M.; Gao, C. Prediction of cadmium enrichment in reclaimed coastal soils by
classification and regression tree. Estuar. Coast. Shelf Sci. 2016, 177, 1–7. [CrossRef]
46. Chou, J.-S.; Tsai, C.-F.; Pham, A.-D.; Lu, Y.-H. Machine learning in concrete strength simulations: Multi-nation data analytics.
Constr. Build. Mater. 2014, 73, 771–780. [CrossRef]
47. Elish, M.O. Assessment of voting ensemble for estimating software development effort. In Proceedings of the 2013 IEEE
Symposium on Computational Intelligence and Data Mining (CIDM), Singapore, 16–19 April 2013; pp. 316–321.
48. Wang, Z.; Wang, Y.; Srinivasan, R.S. A novel ensemble learning approach to support building energy use prediction. Energy Build.
2018, 159, 109–122. [CrossRef]
49. Chen, J.; Yin, J.; Zang, L.; Zhang, T.; Zhao, M. Stacking machine learning model for estimating hourly PM2.5 in China based on
Himawari 8 aerosol optical depth data. Sci. Total Environ. 2019, 697, 134021. [CrossRef] [PubMed]
50. Basant, N.; Gupta, S.; Singh, K.P. A three-tier QSAR modeling strategy for estimating eye irritation potential of diverse chemicals
in rabbit for regulatory purposes. Regul. Toxicol. Pharmacol. 2016, 77, 282–291. [CrossRef]
51. Lee, K.M.; Yoo, J.; Kim, S.-W.; Lee, J.-H.; Hong, J. Autonomic machine learning platform. Int. J. Inf. Manag. 2019, 49, 491–501.
[CrossRef]
52. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [CrossRef]
53. Wolpert, D.H.; Macready, W.G. No Free Lunch Theorems for Search; Technical Report SFI-TR-95-02-010; Santa Fe Institute: Santa Fe,
NM, USA, 1995.
54. Cheng, D.; Shi, Y.; Gwee, B.; Toh, K.; Lin, T. A Hierarchical Multiclassifier System for Automated Analysis of Delayered IC Images.
IEEE Intell. Syst. 2019, 34, 36–43. [CrossRef]
55. Basheer, I.A.; Hajmeer, M. Artificial neural networks: Fundamentals, computing, design, and application. J. Microbiol. Methods
2000, 43, 3–31. [CrossRef]
56. Jain, A.K.; Jianchang, M.; Mohiuddin, K.M. Artificial neural networks: A tutorial. Computer 1996, 29, 31–44. [CrossRef]
57. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [CrossRef]
58. Chamasemani, F.F.; Singh, Y.P. Multi-class Support Vector Machine (SVM) Classifiers—An Application in Hypothyroid Detection
and Classification. In Proceedings of the 2011 Sixth International Conference on Bio-Inspired Computing: Theories and
Applications, Penang, Malaysia, 27–29 September 2011; pp. 351–356.
59. Yang, X.; Yu, Q.; He, L.; Guo, T. The one-against-all partition based binary tree support vector machine algorithms for multi-class
classification. Neurocomputing 2013, 113, 1–7. [CrossRef]
60. Tuv, E.; Runger, G.C. Scoring levels of categorical variables with heterogeneous data. IEEE Intell. Syst. 2004, 19, 14–19. [CrossRef]
61. Chiang, W.; Liu, X.; Zhang, T.; Yang, B. A Study of Exact Ridge Regression for Big Data. In Proceedings of the 2018 IEEE
International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 3821–3830.
62. Marquardt, D.W.; Snee, R.D. Ridge Regression in Practice. Am. Stat. 1975, 29, 3–20. [CrossRef]
63. Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Society. Ser. B 1958, 20, 215–242. [CrossRef]
64. Jiang, F.; Guan, Z.; Li, Z.; Wang, X. A method of predicting visual detectability of low-velocity impact damage in composite
structures based on logistic regression model. Chin. J. Aeronaut. 2021, 34, 296–308. [CrossRef]
65. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the
International Joint Conference on Artificial Intelligence 1995, Montreal, QC, Canada, 20–25 August 1995; pp. 1137–1143.
66. Chou, J.; Truong, D.; Le, T. Interval Forecasting of Financial Time Series by Accelerated Particle Swarm-Optimized Multi-Output
Machine Learning System. IEEE Access 2020, 8, 14798–14808. [CrossRef]
67. Yeh, I.-C. Analysis of Strength of Concrete Using Design of Experiments and Neural Networks. J. Mater. Civ. Eng. 2006,
18, 597–604. [CrossRef]
68. Yeh, I.C.; Hsu, T.-K. Building real estate valuation models with comparative approach through case-based reasoning. Appl. Soft
Comput. 2018, 65, 260–271. [CrossRef]
69. Tsanas, A.; Xifara, A. Accurate quantitative estimation of energy performance of residential buildings using statistical machine
learning tools. Energy Build. 2012, 49, 560–567. [CrossRef]
70. Lau, K.; López, R. A Neural Networks Approach to Aerofoil Noise Prediction; International Center for Numerical Methods in
Engineering: Barcelona, Spain, 2009.