Applied Sciences

Article
Predicting Commercial Building Energy Consumption Using a
Multivariate Multilayered Long-Short Term Memory
Time-Series Model
Tan Ngoc Dinh *, Gokul Sidarth Thirunavukkarasu, Mehdi Seyedmahmoudian *, Saad Mekhilef
and Alex Stojcevski

School of Science, Computing and Engineering Technologies, Swinburne University of Technology,
Hawthorn, VIC 3122, Australia; [email protected] (G.S.T.); [email protected] (S.M.);
[email protected] (A.S.)
* Correspondence: [email protected] (T.N.D.); [email protected] (M.S.)

Abstract: The global demand for energy has been steadily increasing due to population growth,
urbanization, and industrialization. Numerous researchers worldwide are striving to create precise
forecasting models for predicting energy consumption to manage supply and demand effectively. In
this research, a time-series forecasting model based on multivariate multilayered long short-term
memory (LSTM) is proposed for forecasting energy consumption and tested using data obtained
from commercial buildings in Melbourne, Australia: the Advanced Technologies Center, Advanced
Manufacturing and Design Center, and Knox Innovation, Opportunity, and Sustainability Center
buildings. This research specifically identifies the best forecasting method for subtropical condi-
tions and evaluates its performance by comparing it with the most commonly used methods at
present, including LSTM, bidirectional LSTM, and linear regression. The proposed multivariate,
multilayered LSTM model was assessed by comparing mean absolute error (MAE), root-mean-square
error (RMSE), and mean absolute percentage error (MAPE) values with and without labeled time.
Results indicate that the proposed model exhibits optimal performance with improved precision and
accuracy. Specifically, the proposed LSTM model achieved a decrease in MAE of 30%, RMSE of 25%,
and MAPE of 20% compared with the LSTM method. Moreover, it outperformed the bidirectional
LSTM method with a reduction in MAE of 10%, RMSE of 20%, and MAPE of 18%. Furthermore, the
proposed model surpassed linear regression with a decrease in MAE by 2%, RMSE by 7%, and MAPE
by 10%. These findings highlight the significant performance increase achieved by the proposed
multivariate multilayered LSTM model in energy consumption forecasting.

Keywords: energy consumption; time-series forecasting; long short-term memory; machine learning

Citation: Dinh, T.N.; Thirunavukkarasu, G.S.; Seyedmahmoudian, M.; Mekhilef, S.; Stojcevski, A.
Predicting Commercial Building Energy Consumption Using a Multivariate Multilayered
Long-Short Term Memory Time-Series Model. Appl. Sci. 2023, 13, 7775.
https://doi.org/10.3390/app13137775

Academic Editor: Dae-Ki Kang

Received: 30 May 2023
Revised: 18 June 2023
Accepted: 21 June 2023
Published: 30 June 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Appl. Sci. 2023, 13, 7775. https://doi.org/10.3390/app13137775 https://www.mdpi.com/journal/applsci

1. Introduction
Energy consumption refers to the amount of energy used over a certain period of
time, typically measured in kilowatt-hours (kWh) or British thermal units (BTUs). It is
a crucial metric for evaluating energy usage, efficiency, and understanding the energy
requirements of a specific region or country [1]. Urban mining, which involves extracting
valuable minerals from waste or secondary resources, is emphasized as a sustainable
solution. Various secondary resources, such as biomass, desalination water, sewage sludge,
phosphogypsum, and e-waste, have been evaluated for their market potential and elemental
composition, showcasing their potential to partially replace minerals in different sectors.
The review also discusses technological advancements in mineral extraction from waste,
emphasizing the importance of improving processes for large-scale implementation [2].
In the context of renewable energy, biofuels are identified as a potential alternative for
the transportation industry, and research on utilizing surplus rice straw as a feedstock for
biofuels is explored. While biofuels have the potential to reduce emissions, further research
is needed to address concerns regarding food security, feedstock selection, and their impact
on climate and human health [3].
According to the International Energy Agency (IEA) [4], global energy consumption
is expected to continue to increase in the coming decades, driven by population growth,
urbanization, and industrialization in developing countries. However, most of the current
research has focused on forecasting for countries or regions [5,6]. There are only a limited
number of studies that specifically focus on forecasting for individual buildings. Energy
consumption in buildings encompasses the precise measurement of energy utilized for
specific purposes such as heating, cooling, lighting, and other essential functions within
residential, commercial, and institutional structures. In China and India, buildings account
for 37% [7] and 35% [8] of total energy consumption, respectively, making the building sector
an important area for energy efficiency and sustainability efforts [9].
In this study, we propose a forecasting model for energy consumption in commercial
buildings and test it on three of them: the ATC and AMDC Buildings at the Hawthorn Campus
and the KIOSC Building at the Wantirna Campus. The proposed model, together with its data
preprocessing method, outperformed other popular models both when a sufficient amount of
training data was available and when training data were scarce. The results indicate that
the proposed method and model can be used to accurately predict energy consumption
in commercial buildings, which is crucial for energy management and conservation. In
addition, the proposed method can be easily applied to other commercial buildings with
similar energy consumption patterns, providing a practical solution for energy management
in the commercial building sector.
The rest of this study is organized as follows. Section 2 presents background knowledge
by reviewing the existing research on forecasting models and their adoption for energy
consumption forecasting. Section 3 introduces the available data and the configuration of
the experiment. Section 4 provides the details of the proposed model. Section 5 describes
some popular bench-marking models. Section 6 presents the experiment to evaluate the
efficiency of the models on different datasets. Finally, the conclusion of this study is
given in Section 7.

2. Background
2.1. Forecasting Models
A forecasting model is a mathematical algorithm or statistical tool used to predict
future trends, values, or events based on historical data and patterns. In this study, a
forecasting model M analyzes the historical values up to the current time t and returns the
predicted value at time t + 1, denoted as $\hat{y}_{t+1}$. The objective of the forecasting
model is to minimize the discrepancy between the estimated value $\hat{y}_{t+1}$ and the
actual value $y_{t+1}$. When the data points are indexed in time order, a time-series
forecasting model is typically used.
A time-series forecasting model is a predictive algorithm that utilizes historical
time-series data to anticipate future trends or patterns in the data over time. Two techniques
are available for building the model M, i.e., (i) univariate and (ii) multivariate
time series (TS) [10–12]. In univariate TS, only a 1D sequence of energy consumption values
$y_t = \{y_{t-(k-1)}, y_{t-(k-2)}, \ldots, y_{t-1}, y_t\}$ is utilized to produce the
estimated value $\hat{y}_{t+1}$, where k is the number of time steps looked back from the
current time t [13,14]. By contrast, multivariate TS employs one or more other historical
features in addition to the energy consumption for training model M [15]. These features can
be time fields or data from other sources. The input for multivariate TS is therefore a
multi-dimensional sequence $X_t = \{x_{t-(k-1)}, x_{t-(k-2)}, \ldots, x_{t-1}, x_t\}$, where
each $x_i \in \mathbb{R}^n$ is a vector of dimension n. Taking related features into account
can improve the forecast, because the additional information helps the model capture the
dependencies or correlations between the features and the target variable. The model can thus
better understand the context, mitigate the impact of missing values, and make more precise
predictions. For these reasons, multivariate TS has recently been employed frequently for
building forecasting models [16].

2.2. Energy Consumption Forecasting


Energy consumption forecasting refers to predicting a particular region’s future energy
consumption based on historical consumption and other relevant factors [1]. Accurate
energy consumption forecasting is essential for proper energy planning, pricing, and
management. It plays a significant role in the transition toward a more sustainable energy
future. There have been numerous studies on energy consumption forecasting, and various
models have been proposed for this purpose [17]. As shown in Figure 1, forecasting models in
the literature often fall into two categories: (i) conventional models and (ii) artificial
intelligence (AI) models. Table 1 summarizes forecasting models for commercial buildings,
listing each forecasting technique, its location, and the performance evaluation indexes
with the best accuracy.

Table 1. Summary of different forecasting models for commercial buildings.

| No. | Forecasting Model | Year | Country | Ref. | Forecast Horizon | Accuracy (MAPE / MAE / Normalised-RMSE, values as reported) |
|---|---|---|---|---|---|---|
| 1 | ANN model with external variables (NARX) | 2019 | Korea | [18] | hour ahead | 1.69%; 85.44 |
| 2 | Long Short-term Memory Networks with attention (LSTM) | 2020 | USA | [19] | hour ahead | 5.96%; 7.21 |
| 3 | AdaBoost.R2 | 2021 | Portugal | [20] | hour ahead | 5.34% |
| 4 | Support Vector Machine (SVM) | 2022 | Ireland | [21] | hour ahead | 5.3%; 3.82; 11.94 kW |
| 5 | Seq2seq RNN | 2020 | USA | [22] | hour ahead | 3.74 kW |
| 6 | Bayesian regularized (BR) (12 inputs) | 2019 | Canada | [23] | hour ahead | 1.83%; 105.03 kW |
| 6 | Levenberg-Marquardt (LM) (12 inputs) | 2019 | Canada | | hour ahead | 1.82%; 104.21 kW |
| 7 | Hybrid convolutional neural network (CNN) with an LSTM autoencoder (LSTM-AE) | 2020 | Korea | [24] | hour ahead | 0.76%; 0.47; 0.31 |
| 8 | Hybrid method of Random Forest (RF) and Long Short-Term Memory (LSTM) based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDA) | 2022 | USA | [25] | hour ahead | 5.33%; 0.57; 0.43 |
| 9 | Seasonal autoregressive integrated moving average (SARIMAX) | 2020 | Korea | [26] | day ahead | 27.15%; 557.6 kW |
| 10 | Gated Recurrent Unit (GRU) | 2022 | Spain | [27] | day ahead | 7.86%; 156.11 |
| 11 | Hybrid Neural Fuzzy Inference System (HyFIS) | 2019 | Portugal | [28] | day ahead | 8.71% |
| 11 | Wang and Mendel's Fuzzy Rule Learning Method (WM) | 2019 | Portugal | | day ahead | 8.58% |
| 11 | A genetic fuzzy system for fuzzy rule learning based on the MOGUL methodology (GFS.FR.MOGUL) | 2019 | Portugal | | day ahead | 9.87% |
| 12 | XGBoost | 2022 | Spain | [29] | day ahead | 8.83 |
[Figure 1: tree diagram. Energy consumption forecasting divides into conventional models
(stochastic time series approach, regression-based models, gray models) and artificial
intelligence (AI) models (artificial neural network, support vector machine).]

Figure 1. Common categories of forecasting models in the literature.

Conventional models used in energy consumption forecasting commonly include
stochastic time series (TS) models [30], regression models (RMs) [31], and gray models
(GMs) [32,33]. These models typically require historical energy consumption data as
input and use various statistical and mathematical techniques to make future energy
consumption predictions. However, these models may not be able to capture complex
nonlinear relationships and may require manual feature engineering, making them less
efficient and scalable compared with AI-based models. AI-based models have become
increasingly popular in the field of energy forecasting due to their ability to learn patterns
and relationships in complex data [34–36]. The use of AI in energy forecasting has the
potential to reduce energy costs, optimize energy production, and enhance energy security.
Among AI-based models, LSTM models are able to capture both short-term and long-term
dependencies in TS data. They are also capable of handling nonlinear relationships between
input and output variables, which is important in energy forecasting, where the relationships
may be complex. Finally, LSTMs can process sequential data of varying lengths, which is useful
for handling variable-length TS data in energy forecasting.

3. Data Used in This Study


3.1. Data Collection
Data collected from three buildings in different regions are employed to evaluate
the models, i.e., the ATC Building at the Hawthorn Campus (denoted as DatasetS1), the AMDC
Building at the Hawthorn Campus (denoted as DatasetS2), and the KIOSC Building at the
Wantirna Campus (denoted as DatasetS3). These data are valuable because they provide
real-time insights into each building's performance, energy consumption, and operational
efficiency, allowing for a swift response to potential issues. Such real-time data not only
enhance decision-making capabilities for building management, maintenance, and optimization
but also provide a basis for developing more accurate forecasting models. Furthermore, they
can guide strategic energy management, potentially leading to significant cost savings,
improved sustainability, and increased occupant comfort over time. DatasetS1 and DatasetS2
contain the energy consumption from 2017 to 2019, and DatasetS3 contains the energy
consumption from 2018 to 2019. The value to be predicted is the difference between two
consecutive cumulative energy readings, i.e., the energy used during one interval. The model
input is a history of 96 data points sampled every 15 min (i.e., the preceding 24 h), which is
used to predict the next value. Figure 2 indicates the location of the Hawthorn Campus and the
Wantirna Campus in the context of Melbourne, and Figures 3 and 4 show the electricity
accumulation every 15 min and every hour, respectively, for the ATC Building at the
Hawthorn Campus.

Figure 2. Location of Hawthorn Campus and Wantirna Campus in Metropolitan Melbourne, Victoria,
Australia.

Figure 3. Electricity accumulation in every 15 min at Hawthorn Campus—ATC Building.

Figure 4. Average electricity accumulation in every 1 h at Hawthorn Campus—ATC Building in 2018.


As can be seen, electricity consumption peaks between 8 a.m. and 9 p.m.

3.2. Data Setup


Dataset. In this experiment, the three datasets described in Section 3.1 are employed
to evaluate the models, i.e., DatasetS1 (ATC Building, Hawthorn Campus), DatasetS2 (AMDC
Building, Hawthorn Campus), and DatasetS3 (KIOSC Building, Wantirna Campus). DatasetS1 and
DatasetS2 contain the energy consumption from 2017 to 2019, and DatasetS3 contains the
energy consumption from 2018 to 2019. The value to be predicted is the difference between two
consecutive cumulative energy readings. The model input is a history of 96 data points
sampled every 15 min, which is used to predict the next value.
Configuration of the proposed model. The proposed model, M-LSTM, consists of a
succession of one input layer, two LSTM layers, and one dense layer at the end. The input
layer takes two input types, as described in Section 4.1. The first LSTM layer contains
eight LSTM units, and the second contains four units. The last dense layer has one unit for
predicting energy consumption.
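As a rough check on the size of this configuration, the sketch below counts the trainable parameters of the two LSTM layers and the dense layer. It assumes the standard LSTM parameterization (four gate blocks, each with input weights, recurrent weights, and one bias vector) and an input of two features per time step (the differenced value and the time label). Neither assumption is stated in the text, and the helper names are illustrative.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of one standard LSTM layer (assumed formula)."""
    # Four gate blocks, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Trainable parameters of a fully connected layer."""
    return input_dim * units + units


N_FEATURES = 2  # assumption: differenced consumption + peak-time label

layer1 = lstm_params(N_FEATURES, 8)  # first LSTM layer, eight units
layer2 = lstm_params(8, 4)           # second LSTM layer, four units
head = dense_params(4, 1)            # dense layer, one output unit
total = layer1 + layer2 + head

print(layer1, layer2, head, total)   # 352 208 5 565
```

Under these assumptions the whole network has only a few hundred parameters, which is consistent with the small layer widths described above.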
Configuration of bench-marking models. As mentioned earlier, four competitive
models are used for comparison: LSTM, Bi-LSTM, LR, and SVM. The LSTM model consists of a
single LSTM layer with one unit, followed by a dense layer for prediction. Likewise, the
Bi-LSTM model consists of a single bidirectional LSTM layer with one unit, followed by a
dense layer for prediction. The LR model is trained as a single dense layer.
Training configuration. The M-LSTM, LSTM, Bi-LSTM, LR, and SVM models are all
trained using the same training set and evaluated on the same test set. For DatasetS1 and
DatasetS2, the models are trained on the data from 2017 and 2018. For DatasetS3, the models
are trained only on the data from 2018 to demonstrate how the models perform when training
data are scarce.

4. Methodology
In this section, we propose the forecasting model (Multivariate Multilayered LSTM),
which is referred to as M-LSTM. The overview of the proposed method is illustrated in
Figure 5. There are three phases, i.e., data preprocessing, model training, and evaluation.

[Figure 5: flowchart. Start; data preprocessing (data differencing, time labeling,
concatenation, window slicing); input layer (x1, x2, x3, ..., xk); hidden layer with a number
of stacked LSTM layers (LSTM 1 to LSTM h), trained by loss optimization with the Adam
optimizer; output layer (dense layer); evaluation; stop.]

Figure 5. Workflow of the proposed model.


4.1. Data Preprocessing


Two major techniques are used in the data preprocessing phase, i.e., (i) data differencing
and (ii) time labeling. Additionally, there are two other techniques, i.e., (iii) concatenation
and (iv) window slicing.
In regard to data differencing, because the energy consumption values are large and the
series is usually not stationary, the difference between each data value and its previous
data value is used instead of the raw value. Differencing removes trend-like patterns, so the
resulting stationary TS can be more easily modeled and forecasted. This technique is
required by many forecasting models [37].
In time labeling, the amount of energy consumed during peak time points is typically
greater than that during non-peak times. Consequently, the labeling convention assigns a
value of 1 to peak time and 0 to non-peak time [38]. These labeled time periods serve as
valuable features for training the forecasting model. In the concatenation phase, the
differenced data and the time labels are concatenated, and the result is then sliced into
sequences of length k during the window-slicing phase.
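The four preprocessing steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the meter readings are invented, the function names are hypothetical, and the peak period of 8 a.m. to 9 p.m. is an assumption taken from the peak visible in Figure 4. The study uses k = 96; a small k is used here so the result is easy to inspect.

```python
from datetime import datetime, timedelta


def difference(cumulative):
    """Consumption per interval: each reading minus the previous one."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def peak_label(ts, start_hour=8, end_hour=21):
    """1 for peak time, 0 otherwise (the peak hours are an assumption)."""
    return 1 if start_hour <= ts.hour < end_hour else 0


def make_windows(rows, targets, k):
    """Slice rows into length-k windows, each paired with the next target."""
    X = [rows[i:i + k] for i in range(len(rows) - k)]
    y = [targets[i + k] for i in range(len(rows) - k)]
    return X, y


# Hypothetical cumulative meter readings at 15-minute intervals.
start = datetime(2018, 1, 1, 7, 0)
readings = [100.0, 101.0, 102.5, 104.0, 106.0, 109.0, 112.5, 116.0]
stamps = [start + timedelta(minutes=15 * i) for i in range(len(readings))]

diffs = difference(readings)                    # (i) data differencing
labels = [peak_label(t) for t in stamps[1:]]    # (ii) time labeling
rows = [[d, l] for d, l in zip(diffs, labels)]  # (iii) concatenation
X, y = make_windows(rows, diffs, k=3)           # (iv) window slicing
```

Each element of `X` is a window of k two-feature rows, and the matching element of `y` is the differenced consumption that immediately follows the window.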

4.2. Forecasting Model—Multivariate Multilayered LSTM


M-LSTM is an extension of the LSTM model, which is a type of recurrent neural network
(RNN) architecture used for sequential data processing tasks such as TS forecasting. The
hidden layer of M-LSTM contains h stacked LSTM sub-layers, with each layer having its own
set of neurons that process the input variables independently. The output of each layer is
then fed into the next layer, allowing the model to capture more complex and abstract
relationships between the input variables. The hidden part may also contain additional
layers, such as dropout and normalization layers, for training the model efficiently.
As shown in Figure 6, the memory cell is responsible for storing information about the
long-term dependencies and patterns in the input sequence, while the gates control the flow
of information into and out of the cell. The gates are composed of sigmoid neural network
layers and a point-wise multiplication operation, which allows the network to learn which
information to keep or discard. In particular, each ith cell at the layer m, denoted as
$C_i^m$, has three inputs, i.e., the hidden state $h_{i-1}^m$ and the cell state $c_{i-1}^m$
of the previous cell, and the hidden state $h_i^{m-1}$ of the cell i at the previous layer
m − 1. The cell $C_i^m$ has two recurrent features, i.e., the hidden state $h_i^m$ and the
cell state $c_i^m$. $C_i^m$ is a mathematical function, given in Equation (1), that takes
three inputs and returns two outputs. Both outputs leave the cell at time i and are fed
into that same cell at time i + 1, and the input $x_{i+1}$ is also fed into the cell. In the
first layer (m = 1), the hidden state $h_i^{m-1}$ is the input $x_i$.

$$(h_i^m, c_i^m) = C(h_{i-1}^m, c_{i-1}^m, h_i^{m-1}) \tag{1}$$

Inside the ith cell, the previous hidden state $h_{i-1}^m$ and the input vector $h_i^{m-1}$
(i.e., $x_i$ in the first layer) are fed into three gates, i.e., the input gate ($ig_i^m$),
forget gate ($fg_i^m$), and output gate ($og_i^m$). They are sigmoid functions (σ), each of
which produces a scalar value, as described in Equations (2)–(4), respectively.

$$ig_i^m(h_{i-1}^m, h_i^{m-1}) = \sigma(w_{ig,m-1} h_i^{m-1} + w_{ig,m} h_{i-1}^m + b_{ig}) \tag{2}$$

$$fg_i^m(h_{i-1}^m, h_i^{m-1}) = \sigma(w_{fg,m-1} h_i^{m-1} + w_{fg,m} h_{i-1}^m + b_{fg}) \tag{3}$$

$$og_i^m(h_{i-1}^m, h_i^{m-1}) = \sigma(w_{og,m-1} h_i^{m-1} + w_{og,m} h_{i-1}^m + b_{og}) \tag{4}$$

where $w_{ig,m-1}, w_{fg,m-1}, w_{og,m-1} \in \mathbb{R}^n$ and
$w_{ig,m}, w_{fg,m}, w_{og,m}, b_{ig}, b_{fg}, b_{og} \in \mathbb{R}$ denote the weights and
biases, which are the parameters updated during the training of the cell. Another scalar
function, called the update function (denoted as ug), has a tanh activation function, as
described in Equation (5).

$$ug_i^m(h_{i-1}^m, h_i^{m-1}) = \tanh(w_{ug,x} h_i^{m-1} + w_{ug,h} h_{i-1}^m + b_{ug}) \tag{5}$$

where $w_{ug,x} \in \mathbb{R}^n$ and $w_{ug,h} \in \mathbb{R}$ are further weights to be
learned. The returned cell state ($c_i^m$) and hidden state ($h_i^m$) are formulated in
Equations (6) and (7), respectively.

$$c_i^m = fg_i^m \cdot c_{i-1}^m + ig_i^m \cdot ug_i^m \tag{6}$$

$$h_i^m = og_i^m \cdot \tanh(c_i^m) \tag{7}$$
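To make the data flow through Equations (2) to (7) concrete, the sketch below evaluates one cell step with scalar states, as in the formulation above. The weights are arbitrary illustrative values, not trained parameters, and the function and dictionary names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))


def lstm_cell(h_prev, c_prev, below, p):
    """One step of the cell at position i in layer m.

    h_prev, c_prev: scalar states of the previous cell in the same layer.
    below: vector from the layer below (the input x_i in the first layer).
    p: dict mapping gate name -> (input weights, recurrent weight, bias).
    """
    def pre(gate):
        w_in, w_rec, b = p[gate]
        return dot(w_in, below) + w_rec * h_prev + b

    ig = sigmoid(pre("ig"))        # Eq. (2), input gate
    fg = sigmoid(pre("fg"))        # Eq. (3), forget gate
    og = sigmoid(pre("og"))        # Eq. (4), output gate
    ug = math.tanh(pre("ug"))      # Eq. (5), update function
    c = fg * c_prev + ig * ug      # Eq. (6), new cell state
    h = og * math.tanh(c)          # Eq. (7), new hidden state
    return h, c


# Hypothetical weights for a two-feature input, chosen only for illustration.
params = {g: ([0.5, -0.3], 0.1, 0.0) for g in ("ig", "fg", "og", "ug")}
h, c = 0.0, 0.0
for x in ([1.0, 0.0], [1.5, 0.0], [2.0, 1.0]):
    h, c = lstm_cell(h, c, x, params)
```

The loop shows the recurrence: the two outputs of one step become the first two inputs of the next step, while a new input vector arrives from the layer below.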


In energy forecasting, loss optimization is one of the important steps to improve the
accuracy of the model [39]. One common technique for loss optimization is the Adam
optimizer, a stochastic gradient descent method that adapts the learning rate using moving
averages of the gradients and of the squared gradients. The Adam optimizer computes
individual adaptive learning rates for different parameters from these estimates of the
first and second moments of the gradients. This makes it suitable for optimizing the loss
function in models with a large number of parameters. By using the Adam optimizer, the model
can efficiently update the weights of the neurons at each training step, resulting in better
prediction accuracy. In addition, the loss function minimized by the Adam optimizer in the
proposed model aims to strike a balance between accuracy and robustness, taking into account
the unique characteristics of the data and the specific requirements of the forecasting task.
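The update rule described above can be sketched for a single scalar parameter. This is the standard Adam formulation with its commonly used default hyperparameters; the hyperparameters actually used in this study are not given in the text, and the toy loss below is purely illustrative.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimise the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * (w - 3.0), m, v, t, lr=0.05)
```

Because the step is scaled by the ratio of the two moment estimates, the first update has a magnitude close to the learning rate regardless of how large the raw gradient is.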

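[Figure 6: diagram of an LSTM cell with inputs $c_{i-1}^m$, $h_{i-1}^m$, and $h_i^{m-1}$,
three sigmoid (σ) gates, two tanh blocks, point-wise multiplication (×) and addition (+)
operations, and outputs $c_i^m$ and $h_i^m$.]

Figure 6. Illustration of the ith LSTM cell at the layer m.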

5. Bench-Marking Models
In this study, we compare the proposed model with four well-known models, i.e.,
linear regression (LR), long short-term memory (LSTM), bidirectional long short-term
memory (Bi-LSTM), and support vector machine (SVM).

5.1. Linear Regression


LR describes the relationship between the response variable (energy consumption) and
the explanatory variables (the other variables). As a causal technique, regression
analysis predicts energy demand from one or more causes (independent variables), which
might include the day of the week, energy prices, the availability of housing, or other
variables. The LR method is applied when the historical data show a clear pattern. Because
it is simple to apply, it has been used in numerous works on the prediction of electricity
consumption. Bianco et al. (2009) used an LR model to project Italy's electricity
consumption [40], while Saab et al. (2001) investigated various univariate modeling
approaches to project Lebanon's monthly electric energy usage [41]. Both studies reported
good results with these statistical models.
The LR model works by fitting a line to a set of data points with the goal of minimizing
the sum of the squared differences between the predicted and actual values of the depen-
dent variable. The slope of the line represents the relationship between the dependent and
independent variables, while the intercept represents the value of the dependent variable
when the independent variable is equal to zero. The LR model describes the linear
relationship between the previous values $y_t$ and the estimated future value
$\hat{y}_{t+1}$, formulated as follows:

$$\hat{y}_{t+1} = \sum_{i=t-(k-1)}^{t} w_i \cdot y_i \tag{8}$$
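Equation (8) is a weighted sum over the last k observations, as the short sketch below shows. The window and weights are invented for illustration; in practice the weights would be learned from the training data.

```python
def lr_forecast(window, weights):
    """Eq. (8): weighted sum of the last k observations."""
    if len(window) != len(weights):
        raise ValueError("window and weights must have the same length")
    return sum(w * y for w, y in zip(weights, window))


# Hypothetical window (k = 4) and weights that favour recent values.
window = [10.0, 12.0, 11.0, 13.0]
weights = [0.1, 0.2, 0.3, 0.4]
forecast = lr_forecast(window, weights)  # 1.0 + 2.4 + 3.3 + 5.2
```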

5.2. LSTM
The LSTM technique is a type of recurrent neural network (RNN). In contrast to standard
neural networks, RNNs [42] are capable of processing data sequences, i.e., data that must be
read together in a precise order to have meaning. This ability is made possible by the RNNs'
architectural design, which enables them to receive, at each instant of time, the input
specific to that instant in addition to the activation value from the previous instant.
Because information from earlier instants is carried forward in this way, the network has a
certain amount of “memory”. Consequently, they possess a memory cell, which maintains
the state throughout time [43]. Figure 7 illustrates an overview of the simple LSTM model.

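[Figure 7: the inputs x1, x2, x3, ..., xk-1, xk feed a chain of LSTM cells that starts from a
zero initial state; the output of the chain is passed to a dense layer.]

Figure 7. Overview of the LSTM model.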

As noted in Section 4, the LSTM model [44] can remove information from or add
information to the cell state, and thus decides what information passes through the network.
Different from the M-LSTM model, the LSTM model has just one LSTM layer, which takes the
input sequence x. Therefore, the hidden state (h) and cell state (c) for the ith LSTM cell
are calculated as in Equation (9).

$$(h_i, c_i) = C(h_{i-1}, c_{i-1}, x_i) \tag{9}$$

In the experiment, we compare the performance of the proposed model with the
univariate and multivariate LSTM models. The univariate LSTM takes the first input (i)
described in Section 4.1, and the multivariate LSTM takes both inputs.

5.3. Bidirectional LSTM


Bi-LSTM is also an RNN. It utilizes information in both the previous and following
directions in the training phase [45]. The fundamental principle of the Bi-LSTM model is
that it examines a specific sequence from both the front and the back, using one LSTM
layer for forward processing and another for backward processing. The network can thus
capture how the energy consumption evolves in relation to both its past and its future [46].
This bidirectional processing is achieved by duplicating the hidden layers of the LSTM,
where one set of layers processes the input sequence in the forward direction and another
set of layers processes the input sequence in the reverse direction. As illustrated in
Figure 8, the hidden state ($h_i^f$) and the cell state ($c_i^f$) in the ith forward LSTM
cell are calculated similarly to Equation (9). By contrast, each LSTM cell in the backward
LSTM takes the following hidden state ($h_{i+1}^b$), the following cell state
($c_{i+1}^b$), and $x_i$ as input. Therefore, the hidden state ($h_i^b$) and cell state
($c_i^b$) of the ith backward LSTM cell are calculated as in Equation (10).

$$(h_i^b, c_i^b) = C(h_{i+1}^b, c_{i+1}^b, x_i) \tag{10}$$

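[Figure 8: the inputs x1, x2, x3, ..., xk-1, xk feed a backward LSTM chain and a forward LSTM
chain, each starting from a zero initial state; the outputs of the two chains at each position
are combined through a sigmoid (σ) and passed to a dense layer.]

Figure 8. Overview of the Bi-LSTM model.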

After the calculations of both forward and backward LSTM cells, the hidden states of
the two directions could be concatenated or combined in some way to obtain the output.
The common combination is the sigmoid function, as noted in Figure 8. The output is fed
into the Dense layer to obtain the final prediction. Similar to the LSTM model, we also
compare the proposed model with univariate and multivariate Bi-LSTM models.

5.4. Support Vector Machine


Support Vector Machines (SVM) are supervised machine learning models used for
both classification and regression tasks. In this paper, SVM with a Radial Basis Function
(RBF) kernel is used for the regression task. This kind of SVM model utilizes the RBF kernel
to transform the input space into a higher-dimensional feature space, enabling the SVM
to learn nonlinear decision boundaries. The formulation of the RBF kernel is defined as
Equation (11).

$$K(x, x') = \exp(-\gamma \, \|x - x'\|^2) \tag{11}$$

where $x$ and $x'$ are input data points, $\|\cdot\|$ denotes the Euclidean distance between
them, and γ is a parameter that controls the width of the Gaussian curve. Higher values of γ
result in more localized and complex decision boundaries. Furthermore, the decision function
in SVM with the RBF kernel can be represented as Equation (12).

$$f(x) = b + \sum_{i=1}^{D} \alpha_i \, y_i \, K(x, x_i) \tag{12}$$

where $x$ is the input data point, $b$ is the bias term, $\alpha_i$ is the Lagrange multiplier
associated with the ith support vector, $y_i$ is the corresponding class label, $K(x, x_i)$
is the RBF kernel function, and the summation is performed over all support vectors.
The SVM with RBF kernel formulation aims to find the optimal hyperplane that
maximizes the margin between the classes while allowing some misclassifications. The RBF
kernel enables the SVM to capture complex, nonlinear patterns in the data by mapping the
data to a higher-dimensional feature space. The model is trained by solving the quadratic
programming problem to find the Lagrange multipliers ( αi ) and bias term b that define
the decision function.
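Equations (11) and (12) can be evaluated directly once the support vectors and coefficients are known. The sketch below uses invented support vectors and coefficients rather than values obtained by solving the quadratic program, and each coefficient stands for the product of the multiplier and the label in Equation (12).

```python
import math


def rbf_kernel(x, x_prime, gamma):
    """Eq. (11): exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, coeffs, bias, gamma):
    """Eq. (12), with each coeff standing for the product alpha_i * y_i."""
    return bias + sum(
        c * rbf_kernel(x, sv, gamma) for c, sv in zip(coeffs, support_vectors)
    )


# Hypothetical support vectors and coefficients, for illustration only.
svs = [[0.0, 0.0], [1.0, 1.0]]
coeffs = [0.5, -0.25]
print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=2.0))  # 1.0 for identical points
print(decision_function([0.0, 0.0], svs, coeffs, bias=0.1, gamma=0.5))
```

Increasing `gamma` makes the kernel value fall off faster with distance, which is why larger values give more localized decision functions.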

6. Experiment
6.1. Metric
To better evaluate the performance, a model is tested by making a set of predictions
$\hat{y} = \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_D\}$ and then comparing it with a set of
known actual values $Y = \{y_1, y_2, \ldots, y_D\}$, where D is the size of the test set.
Three common metrics are used to compare the overall distance of these two sets, i.e., mean
absolute percentage error (MAPE), normalized root mean squared error (NRMSE), and R-squared
score (R2 score).
MAPE As shown in Equation (13), MAPE is calculated by taking the absolute difference
between the predicted and actual values, dividing it by the actual value, and then
taking the average of these values over the entire dataset. This calculation results in a
single number that represents the average percentage difference between the predicted and
actual values. The smaller the MAPE value, the better the model's performance.

$$\mathrm{MAPE} = \frac{100\%}{D} \sum_{i=1}^{D} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \tag{13}$$

NRMSE Normalized Root Mean Squared Error (NRMSE) is a metric used to evaluate
the accuracy of a prediction model. It measures the normalized average magnitude of
the residuals or errors between the predicted values and the actual values, as shown in
Equation (14).

$$\mathrm{NRMSE} = \frac{\sqrt{\frac{1}{D} \sum_{i=1}^{D} (\hat{y}_i - y_i)^2}}{y_{\max} - y_{\min}} \tag{14}$$

where $\hat{y}_i$ represents the predicted values and $y_i$ represents the actual values. The
term $(\hat{y}_i - y_i)^2$ calculates the squared residuals or errors between the predicted
and actual values. The $y_{\max}$ and $y_{\min}$ represent the maximum and minimum values in
the actual values, respectively. The smaller the NRMSE value, the better the model's
performance.
R2 score The R2 score, also known as the coefficient of determination, is a statistical
measure that indicates the proportion of the variance in the dependent variable that is
predictable from the independent variables in a regression model. The R2 score is typically
used to evaluate the fitness of a regression model, as formulated in Equation (15).

$$R^2 = 1 - \frac{\sum_{i=1}^{D} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{D} (y_i - y^*)^2} \tag{15}$$

where $y^*$ is the mean of the actual values. In essence, the R2 score is a measure of how well
the regression model fits the data and provides an assessment of its predictive performance.
A higher R2 score indicates a better fit and stronger explanatory power of the model.
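The three metrics can be computed in a few lines, as sketched below. The function names and the sample values are illustrative, and MAPE as written assumes that no actual value is zero.

```python
import math


def mape(actual, predicted):
    """Eq. (13), in percent; assumes no actual value is zero."""
    n = len(actual)
    return 100.0 / n * sum(abs((p - a) / a) for a, p in zip(actual, predicted))


def nrmse(actual, predicted):
    """Eq. (14): RMSE divided by the range of the actual values."""
    n = len(actual)
    rmse = math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)
    return rmse / (max(actual) - min(actual))


def r2_score(actual, predicted):
    """Eq. (15): 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 6.0, 8.0]
print(mape(actual, predicted))      # 12.5
print(nrmse(actual, predicted))     # RMSE of 0.5 over a range of 6
print(r2_score(actual, predicted))  # approximately 0.95
```

In this example a single error of one unit on the smallest value dominates MAPE, while NRMSE and the R2 score are affected much less, which illustrates why the three metrics are reported together.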

6.2. Result and Discussion


This study experimentally assesses the effectiveness of the proposed model by addressing
two questions: the general performance of the proposed model when trained, and its
performance compared with other competitive models.

6.2.1. General Performance


In the first part of the experiments, M-LSTM is trained and evaluated with two data
preprocessing strategies, i.e., with a labeled time field and without a labeled time
field. Figure 9 shows the results on the test sets of DatasetS1, DatasetS2, and DatasetS3.
Two cases are considered: (i) a sufficient training set and (ii) a limited training set.

[Figure 9: three grouped bar charts, panels (a), (b), and (c); x-axis: datasetS1, datasetS2, datasetS3; legend: with labeled time, without labeled time.]

Figure 9. Comparison of M-LSTM trained with labelled time and without labelled time. (a) MAPE
on the scale of (0, 1), (b) NRMSE, (c) R2 score. Note that a higher R2 score indicates
better model performance.

For the first result (i), the model is trained on data from 2017 and 2018 and evaluated on
data from 2019 for DatasetS1 and DatasetS2. Figure 9 shows that the model achieves
better performance in all three metrics with the labeled time field. The results are similar
under the same settings for the other models. The details are provided in Table 2 and the
line plot in Figure 10. These results suggest that the models can learn and extract more
valuable features when they are trained with the appropriate data preprocessing strategy.
For the second result (ii), the model is trained only on data from 2018 and evaluated on
data from 2019 for DatasetS3. Figure 10 shows that the model performed well in predicting
and matching the actual values, as evidenced by its superior fit line compared with the other
models in Figures 9 and 10. These findings suggest that the proposed preprocessing method
is effective, particularly when only a limited amount of data is available for
model training.
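The two training regimes described above can be sketched as a split by calendar year. The record layout (a date paired with a consumption reading), the function name `split_by_year`, and the sample readings are assumptions for illustration; the actual data format used in the study is not specified in this section.

```python
from datetime import date


def split_by_year(records, train_years, test_year):
    """Split (date, value) records into training and test sets by year."""
    train = [r for r in records if r[0].year in train_years]
    test = [r for r in records if r[0].year == test_year]
    return train, test


# Hypothetical readings: (measurement date, energy consumption).
records = [
    (date(2017, 6, 1), 120.0),
    (date(2018, 6, 1), 130.0),
    (date(2019, 6, 1), 125.0),
]

# Case (i): sufficient training data (2017 and 2018), evaluated on 2019.
train_full, test_set = split_by_year(records, {2017, 2018}, 2019)
# Case (ii): limited training data (2018 only), evaluated on 2019.
train_limited, _ = split_by_year(records, {2018}, 2019)
print(len(train_full), len(train_limited), len(test_set))
```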

Figure 10. Comparison of M-LSTM with labelled time (M-LSTMt ), M-LSTM without labelled time,
and other models when training data are limited (DatasetS3 in 2019). The time step is 7 days.

Table 2. Comparison of M-LSTM with competitive models with and without labelled time. MAPEt ,
NRMSEt , and R2t score denote the metrics for models trained with labelled time. MAPE, NRMSE,
and R2 score denote the metrics for models trained without labelled time. MAPEt and MAPE are
rescaled to the range from 0 to 1. Better values are marked in bold.

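With labelled time: MAPEt, NRMSEt, R2t score. Without labelled time: MAPE, NRMSE, R2 score.

| Dataset   | Model             | MAPEt | NRMSEt | R2t score | MAPE  | NRMSE | R2 score |
|-----------|-------------------|-------|--------|-----------|-------|-------|----------|
| DatasetS1 | M-LSTM            | 0.159 | 0.071  | 0.543     | 0.251 | 0.084 | 0.347    |
| DatasetS1 | LSTM              | 0.260 | 0.090  | 0.252     | 0.270 | 0.097 | 0.135    |
| DatasetS1 | Bi-LSTM           | 0.324 | 0.097  | 0.180     | 0.343 | 0.134 | 0.115    |
| DatasetS1 | Linear Regression | 0.285 | 0.094  | 0.191     | 0.311 | 0.093 | 0.216    |
| DatasetS1 | SVM               | 0.239 | 0.074  | 0.490     | 0.258 | 0.081 | 0.310    |
| DatasetS2 | M-LSTM            | 0.139 | 0.034  | 0.831     | 0.176 | 0.044 | 0.719    |
| DatasetS2 | LSTM              | 0.385 | 0.075  | 0.156     | 0.495 | 0.091 | −0.230   |
| DatasetS2 | Bi-LSTM           | 0.352 | 0.087  | 0.143     | 0.476 | 0.093 | −0.258   |
| DatasetS2 | Linear Regression | 0.167 | 0.034  | 0.827     | 0.345 | 0.069 | 0.291    |
| DatasetS2 | SVM               | 0.208 | 0.058  | 0.248     | 0.388 | 0.078 | 0.122    |
| DatasetS3 | M-LSTM            | 0.072 | 0.130  | 0.506     | 0.099 | 0.134 | 0.399    |
| DatasetS3 | LSTM              | 0.184 | 0.146  | 0.378     | 0.431 | 0.144 | 0.395    |
| DatasetS3 | Bi-LSTM           | 0.449 | 0.331  | −2.196    | 0.814 | 0.295 | −1.536   |
| DatasetS3 | Linear Regression | 0.312 | 0.182  | 0.035     | 0.390 | 0.138 | 0.141    |
| DatasetS3 | SVM               | 0.192 | 0.136  | 0.404     | 0.397 | 0.147 | 0.341    |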
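As a quick cross-check of Table 2, the sketch below selects the model with the lowest MAPEt (trained with labelled time) for each dataset. The numbers are copied from Table 2; the dictionary layout and variable names are assumptions for illustration.

```python
# MAPEt values (with labelled time) copied from Table 2.
mape_t = {
    "DatasetS1": {"M-LSTM": 0.159, "LSTM": 0.260, "Bi-LSTM": 0.324,
                  "Linear Regression": 0.285, "SVM": 0.239},
    "DatasetS2": {"M-LSTM": 0.139, "LSTM": 0.385, "Bi-LSTM": 0.352,
                  "Linear Regression": 0.167, "SVM": 0.208},
    "DatasetS3": {"M-LSTM": 0.072, "LSTM": 0.184, "Bi-LSTM": 0.449,
                  "Linear Regression": 0.312, "SVM": 0.192},
}

# For each dataset, pick the model whose MAPEt is smallest.
best_model = {
    dataset: min(scores, key=scores.get)
    for dataset, scores in mape_t.items()
}

for dataset, model in best_model.items():
    print(dataset, model, mape_t[dataset][model])
```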

6.2.2. Experiments with Different Models


In the second part of the experiments, we compare the performance of M-LSTM with that
of other competitive models (LSTM, Bi-LSTM, LR, and SVM) on three datasets in terms of
three metrics. Two sets of performance metrics are presented: one set includes time label
information (MAPEt , NRMSEt , R2t score), whereas the other set does not (MAPE, NRMSE,
R2 score). Table 2 presents the evaluation results, and Figure 11 shows the MAPE of the
models with and without labeled time.
In general, the results indicate that models using labelled time information tend to
perform better than those that do not in the same setting. For example, on DatasetS1,
the four competitive models obtain lower MAPE values with labelled time, whereas the same
models trained without it have higher error values. Among the different models,
M-LSTM using the labeled time information tends to perform the best overall, with the
lowest MAPEt and NRMSEt and the highest R2t score in most cases. To summarize, using the
proposed preprocessing method with time information tends to improve the performance
of the models. The M-LSTM model using time label information performs the best in general.
The labeled time field provides useful information for predicting energy consumption in
peak and non-peak periods. These findings suggest that considering time information can
help in accurately predicting the target variable in the studied datasets.
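To make the effect of the labeled time field concrete, the relative reduction in MAPE for M-LSTM can be computed from the Table 2 values. This is an illustrative calculation; the helper name `relative_reduction` is an assumption, and the percentages are derived here rather than reported in this section.

```python
def relative_reduction(with_time, without_time):
    """Fractional decrease in error when the labeled time field is used."""
    return (without_time - with_time) / without_time


# (MAPEt, MAPE) pairs for M-LSTM, copied from Table 2.
m_lstm_mape = {
    "DatasetS1": (0.159, 0.251),
    "DatasetS2": (0.139, 0.176),
    "DatasetS3": (0.072, 0.099),
}

reductions = {
    dataset: relative_reduction(with_time, without_time)
    for dataset, (with_time, without_time) in m_lstm_mape.items()
}

for dataset, value in reductions.items():
    print(f"{dataset}: {value:.1%} lower MAPE with labeled time")
```

Under these numbers, the reduction is roughly 37% on DatasetS1, 21% on DatasetS2, and 27% on DatasetS3.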

[Figure 11: bar chart; y-axis: MAPE (0.00 to 1.00); x-axis: Model (M-LSTM_t, M-LSTM, LSTM_t, LSTM, Bi-LSTM_t, Bi-LSTM, Linear Regression_t, Linear Regression, SVM_t, SVM).]

Figure 11. Bar chart comparing M-LSTM with labelled time (M-LSTMt ), M-LSTM without
labelled time, and other models with and without labelled time when training data are limited
(DatasetS3 in 2019).

7. Conclusions
In conclusion, this work presents a method for pre-processing data and a model for
accurately predicting energy consumption in commercial buildings, specifically focusing
on buildings on the Hawthorn and Wantirna campuses. The proposed pre-processing
method effectively improves the accuracy of energy consumption prediction, even when
training data are limited. The results demonstrate the applicability of the proposed method
and model for accurately predicting energy consumption in various commercial buildings.
The proposed model, denoted as M-LSTM, achieved the lowest MAPE values of 0.159,
0.139, and 0.072 for DatasetS1, DatasetS2, and DatasetS3, respectively. This achievement is
crucial for effective energy management and conservation in commercial buildings. The
practicality of this approach extends to other commercial buildings with similar energy
consumption patterns, making it a viable solution for energy management in the commer-
cial building sector. Visualizations were also provided to aid in understanding the data
patterns and trends in the model predictions. Additionally, further research can explore
the effectiveness of the proposed pre-processing method and models in predicting energy
consumption for different types of buildings or larger datasets. Exploring alternative
techniques, such as seasonal decomposition or time series analysis, for incorporating time
information into the models could also yield valuable insights. Such advances in
energy consumption forecasting can contribute to cost savings and environmental
benefits in commercial buildings.

Author Contributions: Individual Contribution: Conceptualization, T.N.D., G.S.T., M.S., S.M. and
A.S.; Methodology, T.N.D., G.S.T. and M.S.; Software, T.N.D. and G.S.T.; Validation, M.S., A.S. and
S.M.; Formal analysis, G.S.T., M.S., A.S. and S.M.; Investigation, G.S.T., M.S., S.M. and A.S.; Resources,
M.S., A.S. and S.M.; Data curation, G.S.T., T.N.D. and M.S.; Writing—original draft preparation, T.N.D.
and G.S.T.; Writing—review and Editing, M.S., S.M. and A.S.; Visualization, G.S.T., T.N.D. and M.S.
All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Khalil, M.; McGough, A.S.; Pourmirza, Z.; Pazhoohesh, M.; Walker, S. Machine learning, deep learning and statistical analysis for
forecasting building energy consumption—A systematic review. Eng. Appl. Artif. Intell. 2022, 115, 105287. [CrossRef]
2. Agrawal, R.; Bhagia, S.; Satlewal, A.; Ragauskas, A.J. Urban mining from biomass, brine, sewage sludge, phosphogypsum and
e-waste for reducing the environmental pollution: Current status of availability, potential, and technologies with a focus on LCA
and TEA. Environ. Res. 2023, 224, 115523. [CrossRef] [PubMed]
3. Alok, S.; Ruchi, A.; Samarthya, B.; Art, R. Rice straw as a feedstock for biofuels: Availability, recalcitrance, and chemical properties:
Rice straw as a feedstock for biofuels. Biofuels Bioprod. Biorefining 2017, 12, 83–107.
4. IEA. Clean Energy Transitions in Emerging and Developing Economies; IEA: Paris, France, 2021.
5. Shin, S.-Y.; Woo, H.-G. Energy consumption forecasting in korea using machine learning algorithms. Energies 2022, 15, 4880.
[CrossRef]
6. Özbay, H.; Dalcalı, A. Effects of COVID-19 on electric energy consumption in turkey and ann-based short-term forecasting. Turk.
J. Electr. Eng. Comput. Sci. 2021, 29, 78–97. [CrossRef]
7. Ji, Y.; Lomas, K.J.; Cook, M.J. Hybrid ventilation for low energy building design in south China. Build. Environ. 2009, 44,
2245–2255. [CrossRef]
8. Manu, S.; Shukla, Y.; Rawal, R.; Thomas, L.E.; De Dear, R. Field studies of thermal comfort across multiple climate zones for the
subcontinent: India Model for Adaptive Comfort (IMAC). Build. Environ. 2016, 98, 55–70. [CrossRef]
9. Delzendeh, E.; Wu, S.; Lee, A.; Zhou, Y. The impact of occupants’ behaviours on building energy analysis: A research review.
Renew. Sustain. Energy Rev. 2017, 80, 1061–1071. [CrossRef]
10. Itzhak, N.; Tal, S.; Cohen, H.; Daniel, O.; Kopylov, R.; Moskovitch, R. Classification of univariate time series via temporal
abstraction and deep learning. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data) IEEE, Osaka,
Japan, 17–20 December 2022; pp. 1260–1265.
11. Ibrahim, M.; Badran, K.M.; Hussien, A.E. Artificial intelligence-based approach for univariate time-series anomaly detection
using hybrid cnn-bilstm model. In Proceedings of the 2022 13th International Conference on Electrical Engineering (ICEENG)
IEEE, Cairo, Egypt, 29–31 March 2022; pp. 129–133.
12. Hu, M.; Ji, Z.; Yan, K.; Guo, Y.; Feng, X.; Gong, J.; Zhao, X.; Dong, L. Detecting anomalies in time series data via a meta-feature
based approach. IEEE Access 2018, 6, 27760–27776. [CrossRef]
13. Niu, Z.; Yu, K.; Wu, X. Lstm-based vae-gan for time-series anomaly detection. Sensors 2020, 20, 3738. [CrossRef]
14. Warrick, P.; Homsi, M.N. Cardiac arrhythmia detection from ecg combining convolutional and long short-term memory networks.
In Proceedings of the 2017 Computing in Cardiology (CinC) IEEE, Rennes, France, 24–27 September 2017; pp. 1–4.
15. Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate lstm-fcns for time series classification. Neural Netw. 2019, 116,
237–245. [CrossRef]
16. Gasparin, A.; Lukovic, S.; Alippi, C. Deep learning for time series forecasting: The electric load case. CAAI Trans. Intell. Technol.
2021, 7, 1–25. [CrossRef]
17. Wei, N.; Li, C.; Peng, X.; Zeng, F.; Lu, X. Conventional models and artificial intelligence-based models for energy consumption
forecasting: A review. J. Pet. Sci. Eng. 2019, 181, 106187. [CrossRef]
18. Kim, Y.; Son, H.G.; Kim, S. Short term electricity load forecasting for institutional buildings. Energy Rep. 2019, 5, 1270–1280.
[CrossRef]
19. Chitalia, G.; Pipattanasomporn, M.; Garg, V.; Rahman, S. Robust short-term electrical load forecasting framework for commercial
buildings using deep recurrent neural networks. Appl. Energy 2020, 278, 115410. [CrossRef]
20. Pinto, T.; Praça, I.; Vale, Z.; Silva, J. Ensemble learning for electricity consumption forecasting in office buildings. Neurocomputing
2021, 423, 747–755. [CrossRef]
21. Pallonetto, F.; Jin, C.; Mangina, E. Forecast electricity demand in commercial building with machine learning models to enable
demand response programs. Energy AI 2022, 7, 100121. [CrossRef]
22. Skomski, E.; Lee, J.Y.; Kim, W.; Chandan, V.; Katipamula, S.; Hutchinson, B. Sequence-to-sequence neural networks for short-term
electrical load forecasting in commercial office buildings. Energy Build. 2020, 226, 110350. [CrossRef]
23. Dagdougui, H.; Bagheri, F.; Le, H.; Dessaint, L. Neural network model for short-term and very-short-term load forecasting in
district buildings. Energy Build. 2019, 203, 109408. [CrossRef]
24. Khan, Z.A.; Hussain, T.; Ullah, A.; Rho, S.; Lee, M.; Baik, S.W. Towards efficient electricity forecasting in residential and
commercial buildings: A novel hybrid CNN with a LSTM-AE based framework. Sensors 2020, 20, 1399. [CrossRef]
25. Karijadi, I.; Chou, S.Y. A hybrid RF-LSTM based on CEEMDAN for improving the accuracy of building energy consumption
prediction. Energy Build. 2022, 259, 111908. [CrossRef]
26. Hwang, J.; Suh, D.; Otto, M.O. Forecasting electricity consumption in commercial buildings using a machine learning approach.
Energies 2020, 13, 5885. [CrossRef]
27. Fernández-Martínez, D.; Jaramillo-Morán, M.A. Multi-Step Hourly Power Consumption Forecasting in a Healthcare Building
with Recurrent Neural Networks and Empirical Mode Decomposition. Sensors 2022, 22, 3664. [CrossRef] [PubMed]
Appl. Sci. 2023, 13, 7775 16 of 16

28. Jozi, A.; Pinto, T.; Marreiros, G.; Vale, Z. Electricity consumption forecasting in office buildings: An artificial intelligence approach.
In Proceedings of the 2019 IEEE Milan PowerTech, Milan, Italy, 23–27 June 2019; pp. 1–6.
29. Mariano-Hernández, D.; Hernández-Callejo, L.; Solís, M.; Zorita-Lamadrid, A.; Duque-Pérez, O.; Gonzalez-Morales, L.; Alonso-
Gómez, V.; Jaramillo-Duque, A.; Santos García, F. Comparative study of continuous hourly energy consumption forecasting
strategies with small data sets to support demand management decisions in buildings. Energy Sci. Eng. 2022, 10, 4694–4707.
[CrossRef]
30. Divina, F.; Torres, M.G.; Vela, F.A.G.; Noguera, J.L.V. A comparative study of time series forecasting methods for short term
electric energy consumption prediction in smart buildings. Energies 2019, 12, 1934. [CrossRef]
31. Johannesen, N.J.; Kolhe, M.; Goodwin, M. Relative evaluation of regression tools for urban area electrical energy demand
forecasting. J. Clean. Prod. 2019, 218, 555–564. [CrossRef]
32. Singhal, R.; Choudhary, N.; Singh, N. Short-Term Load Forecasting Using Hybrid ARIMA and Artificial Neural Network Model.
In Advances in VLSI, Communication, and Signal Processing: Select Proceedings of VCAS 2018; Springer: Singapore, 2020; pp. 935–947.
33. Li, K.; Zhang, T. Forecasting electricity consumption using an improved grey prediction model. Information 2018, 9, 204. [CrossRef]
34. del Real, A.J.; Dorado, F.; Duran, J. Energy demand forecasting using deep learning: Applications for the french grid. Energies
2020, 13, 2242. [CrossRef]
35. Fathi, S.; Srinivasan, R.S.; Kibert, C.J.; Steiner, R.L.; Demirezen, E. AI-based campus energy use prediction for assessing the effects
of climate change. Sustainability 2020, 12, 3223. [CrossRef]
36. Khan, S.U.; Khan, N.; Ullah, F.U.M.; Kim, M.J.; Lee, M.Y.; Baik, S.W. Towards intelligent building energy management: AI-based
framework for power consumption and generation forecasting. Energy Build. 2023, 279, 112705. [CrossRef]
37. Athiyarath, S.; Paul, M.; Krishnaswamy, S. A comparative study and analysis of time series forecasting techniques. SN Comput.
Sci. 2020, 1, 175. [CrossRef]
38. Noor, R.M.; Yik, N.S.; Kolandaisamy, R.; Ahmedy, I.; Hossain, M.A.; Yau, K.L.A.; Shah, W.M.; Nandy, T. Predict Arrival Time by
Using Machine Learning Algorithm to Promote Utilization of Urban Smart Bus. Preprints.org 2020, 2020020197. [CrossRef]
39. Ciampiconi, L.; Elwood, A.; Leonardi, M.; Mohamed, A.; Rozza, A. A Survey and Taxonomy of Loss Functions in Machine
Learning. arXiv 2023, arXiv:2301.05579.
40. Bianco, V.; Manca, O.; Nardini, S. Electricity consumption forecasting in italy using linear regression models. Energy 2009, 34,
1413–1421. [CrossRef]
41. Saab, S.; Badr, E.; Nasr, G. Univariate modeling and forecasting of energy consumption: The case of electricity in lebanon. Energy
2001, 26, 1–14. [CrossRef]
42. Yuan, X.; Li, L.; Wang, Y. Nonlinear dynamic soft sensor modeling with supervised long short-term memory network. IEEE Trans.
Ind. Inform. 2019, 16, 3168–3176. [CrossRef]
43. Durand, D.; Aguilar, J.; R-Moreno, M.D. An analysis of the energy consumption forecasting problem in smart buildings using
lstm. Sustainability 2022, 14, 13358. [CrossRef]
44. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
45. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional lstm and other neural network architectures.
Neural Netw. 2005, 18, 602–610. [CrossRef]
46. Le, T.; Vo, M.T.; Vo, B.; Hwang, E.; Rho, S.; Baik, S.W. Improving electric energy consumption prediction using cnn and bi-lstm.
Appl. Sci. 2019, 9, 4237. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.