
energies

Article
Fusion Forecasting Algorithm for Short-Term Load in
Power System
Tao Yu 1 , Ye Wang 1 , Yuchong Zhao 1 , Gang Luo 1 and Shihong Yue 2, *

1 CCCC Tianjin Waterway Bureau Co., Ltd., Tianjin 300202, China; [email protected] (T.Y.);
[email protected] (Y.W.); [email protected] (Y.Z.); [email protected] (G.L.)
2 School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China
* Correspondence: [email protected]

Abstract: Short-term load forecasting plays an important role in power system scheduling, optimization, and maintenance, but no existing typical method can consistently maintain high prediction accuracy. Hence, fusing different complementary methods has attracted increasing attention. To improve forecasting accuracy and stability, the features that affect the short-term power system load are first extracted as prior knowledge, and the advantages and disadvantages of existing methods are analyzed. Then, three typical methods are used for short-term power load forecasting, and their interaction and complementarity are studied. Finally, the Choquet integral (CI) is used to fuse the three existing complementary methods. Different from other fusion methods, the CI can fully utilize the interactions and complementarity among different methods to achieve consistent forecasting results and reduce the disadvantages of a single forecasting method. Essentially, a CI with n inputs is equivalent to n! constrained feedforward neural networks, leading to a strong generalization ability in the load prediction process. Consequently, the CI-based method provides an effective way for the fusion forecasting of short-term load in power systems.

Keywords: short-term load forecasting; time series; information fusion; Choquet integral

Citation: Yu, T.; Wang, Y.; Zhao, Y.; Luo, G.; Yue, S. Fusion Forecasting Algorithm for Short-Term Load in Power System. Energies 2024, 17, 5173. https://doi.org/10.3390/en17205173

Academic Editors: Ramiro Barbosa and Tek Tjing Lie

Received: 21 August 2024; Revised: 12 October 2024; Accepted: 15 October 2024; Published: 17 October 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Short-term load forecasting is a crucial task in the power system and serves as a fundamental basis for power load scheduling, optimization, and maintenance [1]. However, power loads have real-time, rapidly changing, and non-storable characteristics. Ensuring the stability, safety, and economy of the power system through accurate forecasting has always been a key issue [2].

The current effective forecasting methods fall into two categories [3]: deep learning and traditional machine learning. The deep learning methods that have been applied to short-term load forecasting in power systems include convolutional neural networks, long short-term memory networks, and lightweight networks [4]. Deep learning methods can effectively solve load forecasting issues if there are sufficient representative samples and if load changes exhibit strong repeatability. However, for short-term power system load forecasting, these conditions are often not met, and thus the predicted results may have high uncertainty and produce significant errors [5]. Moreover, their interpretability and operability are often quite restricted. On the other hand, traditional machine learning methods are effective under certain conditions and usage environments. Notably, three commonly used algorithms are Kalman filtering (KF) [6], backpropagation (BP) neural networks [7], and the gray system (GS) model [8], each with its own characteristics and applicable range. These methods have the advantages of clear principles, easy realization, and good interpretability.

KF is a recursive, optimal filtering and forecasting method. It is a highly efficient and useful method for solving a large number of time series forecasting problems with prior and




subsequent correlations. When KF is used for power system load forecasting, it offers easy realization, clear principles, and stable forecasting results. However, KF is easily affected by noise and nonlinear variables [9]. For irregular load changes, constructing the
recursive matrix and observation matrix in KF is key and challenging in practice. Moreover,
possible random interference factors and noisy data also limit its forecasting accuracy
and stability.
The BP neural network is widely used in traditional machine learning, with numerous
open-source systems and functional modules available in most software packages and
computing platforms. The use of BP for short-term load forecasting can establish causal
relationships between various related factors, offer explanations for load measurements,
and improve forecasting accuracy under certain conditions. However, BP networks have
evident limitations, such as the difficulty in avoiding the problem of local optima of the
objective function and the gradient disappearing in the learning process [10], which have
not been fundamentally solved so far.
The GS is a modeling and approximation method based on gray theory [11]. It
is a simple and practical method to solve modeling, forecasting, and optimization in
inconsistent, uncertain, and incomplete situations. Currently, many uncertain and random
factors affect daily power load changes, and both mechanistic and semi-mechanistic models
struggle to accurately describe and quantitatively calculate these factors. In contrast, the
GS method offers advantages such as simple calculations, strong practicality, and a high
accuracy for power forecasting, particularly when the load changes are not rapid. So, the
GS may be suitable for short-, mid-, and long- term power load forecasting. However,
GS-based forecasting is less effective when samples are complex. Especially for time series
with long-term and persistent load changes, the forecasting results may be inaccurate.
In summary, the above studies partially or entirely use meteorological information, daily patterns, and historical loads as the key factors of feature selection in power load forecasting [12]. However, their input features are not processed according to the characteristics of the short-term load forecasting task before being fed into the forecasting model for optimization and computation. Therefore, they are unable to effectively utilize existing experience and prior knowledge in the learning and forecasting models. Furthermore, most machine learning-based methods predict the power load independently and do not consider integration and time series characteristics, leading to limitations such as reduced applicability and unstable results [13]. Therefore, integrating multiple load forecasting methods to provide more accurate and stable short-term forecasting for the power system has both practical significance and urgency.
In this study, the three commonly used short-term load forecasting methods, KF, BP, and GS, are used to model and predict short-term loads. After exploring their applicability and complementarity, the Choquet integral (CI) model [14] is proposed as a fusion forecasting method to enhance accuracy and stability. The CI is a nonlinear fusion model that considers the interactions among various methods or information sources. It can automatically identify the advantages and disadvantages of different methods by solving the model parameters, thereby assigning higher importance to the more accurate sources and reducing the limitations of single methods, whose predictions may be unstable or inaccurate under different conditions. The proposed CI-based method is validated by predicting the annual load data of a district in Tianjin, China, and practical results are obtained regarding the applicability of the new method.

2. Related Work
2.1. KF Technique
The KF provides the state estimation of any dynamic system through recursive re-
lationships, and is the optimal estimator for non-stationary stochastic processes under
the criterion of minimum error covariance [15]. The implementation of KF consists of the
following five fundamental equations [16]:

Yk = Hk Xk + Wk  (Measurement equation)    (1)

Xk+1 = Ak Xk + Vk  (State equation)    (2)

where Ak is the one-step state transition matrix; Vk is the dynamic noise of the system at the kth moment; Hk is the observation matrix; Xk is an N-dimensional state vector; and Wk is the random interference noise at the kth moment. The forecasting at time (k + 1) is

X̃k+1 = Ak X̂k    (3)

where X̂k is the filtering estimate at time k. After obtaining the new measurement Yk+1, the forecasting error (Yk+1 − Ỹk+1) is multiplied by an appropriate coefficient as the correction term for the state forecasting, and the estimated value at time k + 1 is obtained as

X̂k+1 = Ak X̂k + Kk+1 (Yk+1 − Hk+1 Ak X̂k)    (4)

After selecting the gain matrix Kk+1, the covariance matrix of the estimation error (Xk+1 − X̂k+1) is obtained as

Pk+1 = E{(Xk+1 − X̂k+1)(Xk+1 − X̂k+1)^T}    (5)

The KF forecasting result is achieved by recursively using the above five key equations.
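As a concrete illustration of the recursion in (1)-(5), the following is a minimal scalar (one-dimensional) sketch, in which the matrices A and H and the noise covariances reduce to single numbers; all numeric values, including the process and measurement noise variances Q and R, are illustrative assumptions and are not taken from the paper.

```python
# Minimal scalar Kalman filter sketch of Equations (1)-(5):
# forecast x~ = A x^ (Eq. (3)), correction via the gain K (Eq. (4)),
# and the error covariance update (Eq. (5)).

def kalman_step(x_hat, P, y, A=1.0, H=1.0, Q=1e-2, R=1e-1):
    """One KF recursion: returns the updated estimate and error covariance."""
    # Forecast (Equation (3)) and its covariance
    x_pred = A * x_hat
    P_pred = A * P * A + Q
    # Gain and correction (Equation (4))
    K = P_pred * H / (H * P_pred * H + R)
    x_new = x_pred + K * (y - H * x_pred)
    # Covariance of the estimation error (Equation (5))
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

x_hat, P = 0.0, 1.0
for y in [1.0, 1.1, 0.9, 1.05]:   # noisy measurements of a roughly constant load
    x_hat, P = kalman_step(x_hat, P, y)
```

With these made-up measurements, the estimate settles near the underlying level while the error covariance shrinks, which is the dynamic-correction behavior discussed in Section 3.2.1.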

2.2. BP Neural Network


BP first selects the factors that primarily reflect load changes as the input X of the network and uses the load forecasting value Y as the output. This forms a pair (X, Y) and further constructs a time series. The BP modeling steps are as follows [17]:

Step 1. Initialize parameters. Namely, the number of input layer nodes n, the number of hidden layer nodes l, the number of output layer nodes m, the connection weights Wij and Wjk from the input layer to the hidden layer and from the hidden layer to the output layer, the hidden layer thresholds aj, the output layer thresholds bk, and the learning rate η.

Step 2. Compute the hidden layer. The output of the hidden layer Hj is calculated as

Hj = f(∑_{i=1}^{n} Wij Xi − aj), j = 1, 2, . . ., l    (6)

where f is the hidden layer excitation function.

Step 3. Update the output. It is calculated as

Ok = ∑_{j=1}^{l} Hj Wjk − bk, k = 1, 2, . . ., m    (7)

Step 4. Calculate the error. According to Ok and the actual output Yk, the error ek is calculated as

ek = Yk − Ok, k = 1, 2, . . ., m    (8)

Step 5. Update the weighting values. The connection weights among the nodes are updated as

Wij = Wij + η Hj (1 − Hj) Xi ∑_{k=1}^{m} Wjk ek,  Wjk = Wjk + η Hj ek    (9)

where i = 1, . . ., n; j = 1, . . ., l; k = 1, . . ., m.

Step 6. Update the thresholds. The thresholds are updated as

aj = aj + η Hj (1 − Hj) ∑_{k=1}^{m} Wjk ek, j = 1, 2, . . ., l    (10)
bk = bk + ek, k = 1, 2, . . ., m
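The six steps above can be sketched as follows. This is a minimal illustration of Equations (6)-(10) with made-up network sizes and training data; the threshold updates use the standard gradient-descent sign convention, and the sigmoid excitation function is an assumption.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, Wij, Wjk, a, b):
    # Step 2 (Equation (6)) and Step 3 (Equation (7))
    H = [sigmoid(sum(Wij[i][j] * x[i] for i in range(len(x))) - a[j])
         for j in range(len(a))]
    O = [sum(H[j] * Wjk[j][k] for j in range(len(H))) - b[k]
         for k in range(len(b))]
    return H, O

def bp_epoch(data, Wij, Wjk, a, b, eta=0.2):
    """One pass of Steps 2-6; returns the summed squared error before updating."""
    sse = 0.0
    for x, y in data:
        H, O = forward(x, Wij, Wjk, a, b)
        e = [y[k] - O[k] for k in range(len(b))]          # Step 4, Equation (8)
        sse += sum(ek * ek for ek in e)
        for j in range(len(H)):                           # Step 5, Equation (9)
            g = H[j] * (1.0 - H[j]) * sum(Wjk[j][k] * e[k] for k in range(len(b)))
            for i in range(len(x)):
                Wij[i][j] += eta * g * x[i]
            for k in range(len(b)):
                Wjk[j][k] += eta * H[j] * e[k]
            a[j] -= eta * g                               # Step 6, gradient-descent sign
        for k in range(len(b)):
            b[k] -= eta * e[k]
    return sse

# Tiny illustrative fit: n = 2 inputs, l = 3 hidden nodes, m = 1 output
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [0.5]),
        ([1.0, 0.0], [0.5]), ([1.0, 1.0], [1.0])]
Wij = [[0.1, -0.2, 0.3], [0.2, 0.1, -0.1]]
Wjk = [[0.1], [-0.1], [0.2]]
a, b = [0.0, 0.0, 0.0], [0.0]
errs = [bp_epoch(data, Wij, Wjk, a, b) for _ in range(300)]
```

Over repeated epochs the summed squared error decreases, illustrating why BP can improve forecasting accuracy under certain conditions while remaining vulnerable to local optima.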

2.3. GS Model
The GS model uses any historical data series to establish a differential equation as

dX/dt + aX = u    (11)

where a and u are a pair of coefficients [18].

Let the variable X^(0) be the original data sequence X^(0) = (X^(0)(1), X^(0)(2), . . ., X^(0)(n)), which generates the following first-order accumulated sequence

X^(1) = (X^(1)(1), X^(1)(2), . . ., X^(1)(n)),  X^(1)(k) = ∑_{i=1}^{k} X^(0)(i)    (12)

where X^(1)(k) satisfies the discrete solution of the differential equation,

X^(1)(k + 1) = [X^(0)(1) − û/â] e^{−âk} + û/â    (13)

The GS forecast for the original data sequence X^(0) can be obtained by

X̂^(0)(k + 1) = X̂^(1)(k + 1) − X̂^(1)(k) = (1 − e^{â})[X^(0)(1) − û/â] e^{−âk}    (14)

where k = 0, 1, 2, 3, . . .

In the GS model, the sequence X^(1)(k) obeys an exponential growth law, and its coefficients satisfy the discretized mean form of the first-order differential equation,

X^(0)(k + 1) + (â/2)[X^(1)(k + 1) + X^(1)(k)] = û, k = 1, 2, . . ., n − 1    (15)

Based on the residuals between the predicted and the actual values, the least squares criterion applied to (15) can estimate the values of a and u. Hence, the power load forecasting can
be realized by (13) and (14).
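The estimation and forecasting steps of Equations (11)-(15) can be sketched as follows; this is a minimal GM(1,1)-style illustration assuming the usual mean-form least-squares estimation of (a, u), and the input series is invented for the example.

```python
import math

def gm11_fit(x0):
    """Estimate (a, u) of dX/dt + aX = u from an original series x0 (Eq. (15))."""
    n = len(x0)
    # First-order accumulated series, Equation (12)
    x1 = [sum(x0[:k + 1]) for k in range(n)]
    # Background values z(k) = (x1(k) + x1(k-1)) / 2, mean form of Equation (15)
    z = [(x1[k] + x1[k - 1]) / 2.0 for k in range(1, n)]
    y = x0[1:]
    # Ordinary least squares for x0(k) = slope * z(k) + intercept
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(zi * zi for zi in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    slope = (m * szy - sz * sy) / det
    intercept = (szz * sy - sz * szy) / det
    return -slope, intercept          # a = -slope, u = intercept

def gm11_predict(x0, a, u, k):
    """Forecast x0(k+1) via Equations (13) and (14)."""
    c = x0[0] - u / a
    return (1.0 - math.exp(a)) * c * math.exp(-a * k)

series = [100.0, 110.0, 121.0, 133.1]          # roughly exponential, illustrative
a_hat, u_hat = gm11_fit(series)
nxt = gm11_predict(series, a_hat, u_hat, k=4)  # one-step-ahead forecast
```

On this smoothly growing series the forecast tracks the exponential trend closely, which matches the observation below that GS works best when load changes are not rapid.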

3. Proposed Fusion Method


3.1. Feature Extraction
The total load of the power system L can generally be decomposed into four compo-
nents according to the type of influencing factors [19], namely

L = Ln + Lw + Ls + Lr (16)

where Ln represents the normal trend of the load component, typically characterized by the
load of typical historical days; Lw is the power load component related to meteorological
factors, primarily influenced by changes in various meteorological factors; Ls is caused by
special external factors such as holidays; and Lr is a random component of the power load,
which, while having a small proportion, is usually difficult to predict. Therefore, we focus
on the first three key factors.
(1) Temperature and humidity: Based on previous research experiences in power load
forecasting [20], the two meteorological factors of temperature and relative humidity have
the strongest correlation with power system load. Thus, we construct their daily mean and
variance into a vector,

Hk = (tm (k), tv (k), hm (k), hv (k)), k = 1, 2, . . ., K, (17)

where K is the total number of observed days, and tm (k) and tv (k) refer to the daily mean
and variance of the temperature, and hm (k) and hv (k) those of the humidity. According to
the range of annual temperature and humidity in the detection area, the ranges of the four
variables can be determined. All the above vectors consist of a set SH = {Hk |k = 1, 2, . . ., K}.
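A short sketch of how the vector in Equation (17) might be computed from one day of hourly readings follows; the day/night profile and the use of the population variance are assumptions for illustration only.

```python
def daily_feature(temps, humids):
    """H_k = (tm(k), tv(k), hm(k), hv(k)) from one day of hourly readings."""
    def mean(v):
        return sum(v) / len(v)
    def var(v):                      # population variance (an assumption here)
        m = mean(v)
        return sum((x - m) ** 2 for x in v) / len(v)
    return (mean(temps), var(temps), mean(humids), var(humids))

# Crude made-up day/night profile for one day (24 hourly readings)
temps = [18.0 + 6.0 * (8 <= h <= 18) for h in range(24)]
humids = [60.0 - 10.0 * (8 <= h <= 18) for h in range(24)]
Hk = daily_feature(temps, humids)
```

Collecting one such four-dimensional vector per observed day yields the set SH used below.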
(2) Special days. The power loads are very different between working and non-working days, particularly on special days such as Spring Festival, Labor Day, and National Day. Each of them usually includes a group of continuous non-working days. Hence, a feature vector to represent any special day is constructed as follows,

Dq = (dq1, dq2, . . ., dqm), q = 1, 2, . . ., Q,    (18)

where q denotes the qth special day, m is the number of non-working days, and Q is the total number of special days. All such vectors consist of a set, SD = {Dq | q = 1, 2, . . ., Q}.

(3) Normal day. Owing to the cyclical and continuous changes, normal historical power loads from a week except special days are nearly repeated [21]. Assume that the total number of normal days is N; all normal days consist of a set, SN = {Dp | p = 1, 2, . . ., N}.

Therefore, the power load must be forecasted individually over special days and normal days after considering the influences of temperature and humidity. Any predicted power load is first associated with SD∪SH and its relative power loads, or with SN∪SH and its relative power loads, respectively.

Let the forecasting model be q = F(H, c), where q is the power load and c is the data type. We use KF, BP, and the GS to construct forecasting models that approximate F(·,·) using historical data. For each predicted day, the three models provide power load forecasts, which are then integrated by the CI model to achieve a more accurate forecasting of power loads. Figure 1 shows the flowchart of our proposed model.

Figure 1. Illustration of the forecasting process of our proposed model.

3.2. Forecasting of the Three Existing Typical Methods

As an example, the historical load data used in this study consist of daily 24 h loads in a district in Tianjin from 1 July 2019 to 30 June 2021. The sampling unit is hours, and in total, 17,088 pieces of data are collected. Table 1 shows their main features, such as their maximum, minimum, average, standard deviation, and variance. All data are partitioned into two groups: the data in the first year are used to construct the forecasting model, and the data in the second year are used to evaluate the predicting accuracy of the model.

Table 1. Data features of historical loads in two years (unit: MW).

Period\Feature         Maximum    Minimum    Average    Variance    Standard Deviation
June 2019–June 2020    4785.15    1834.64    3206.40    789.93      28.11
July 2020–July 2021    3464.79    587.22     2345.73    312.54      17.68

3.2.1. KF Forecasting

The KF forecasting method is developed using relative power loads from SD∪SH and SN∪SH in the first year. Denote the load data at time k as qk, k = 1, 2, . . ., n. Each piece of data is defined as a node, and the kth node refers to a three-dimensional vector Xk: (qk, q′k, q′′k), where q′k and q′′k are defined as

q′k = qk − qk−1 and q′′k = q′k − q′k−1    (19)

They are the first-order and second-order differences of qk and are computed recursively. The calculation process is shown in Figure 2.

Figure 2. Illustration of the process to obtain a three-dimensional vector from load nodes.

After recording the load every day along each regular interval (1 h), 24 daily sampling values are obtained, and the three-dimensional representation of the measurement nodes can be obtained, Xt: (qt, q′t, q′′t). Hence, the KF matrix coefficient A is determined as follows. Since

q′k = q′k−1 + q′′k−1 and qk = qk−1 + q′k−1 + q′′k−1    (20)

the coefficient matrix A in the KF equation is obtained as

[qk, q′k, q′′k]^T = A [qk−1, q′k−1, q′′k−1]^T,  A = [1 1 1; 0 1 1; a b c]    (21)

When there is no other available prior information, the measurement matrix Hk is taken as the identity matrix.

Using the following seven special days as dividing points: New Year's Day, Spring Festival, Tomb-Sweeping Day, Labor Day, the Dragon Boat Festival, Mid-Autumn Festival, and National Day, we divide the entire year from June 2019 to June 2020 into 7 stages to determine the parameters in each forecasting model. In the same way, the year from July 2020 to July 2021 is divided into 7 stages for load forecasting. In every stage, the forecasting is performed in every hour of all days. The KF forecasting accuracy is evaluated in each stage by a group of statistical quantities: maximal absolute error, average absolute error, average relative error, and standard error, as shown in Table 2.

Table 2. KF forecasting error for 7 stages (unit: MW).

Stage    Maximal Absolute Error    Average Absolute Error    Average Relative Error    Standard Error
1        685.982                   486.239                   29.7%                     166.267
2        64.010                    43.845                    18.0%                     12.137
3        550.280                   467.083                   25.3%                     82.295
4        321.439                   224.451                   24.3%                     49.528
5        436.222                   253.580                   37.6%                     69.458
6        441.775                   310.238                   38.6%                     73.582
7        358.751                   198.239                   15.6%                     70.223
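The node construction and transition identity of Equations (19)-(21) can be sketched as follows. The last row (a, b, c) of A is fitted from historical data in the paper; the sketch assumes the simple choice (0, 0, 1), which just carries the second difference forward, and the load values are illustrative.

```python
def node(q, k):
    """X_k = (q_k, q'_k, q''_k) from Equation (19); requires k >= 2."""
    d1 = q[k] - q[k - 1]          # first-order difference q'_k
    d0 = q[k - 1] - q[k - 2]      # previous first-order difference q'_{k-1}
    return (q[k], d1, d1 - d0)    # q''_k = q'_k - q'_{k-1}

def step(x, abc=(0.0, 0.0, 1.0)):
    """Apply the transition matrix A of Equation (21) to X_{k-1}."""
    a, b, c = abc
    qk = x[0] + x[1] + x[2]                 # row 1 of A, from Equation (20)
    d1 = x[1] + x[2]                        # row 2 of A, from Equation (20)
    d2 = a * x[0] + b * x[1] + c * x[2]     # row 3 of A, fitted in the paper
    return (qk, d1, d2)

q = [100.0, 104.0, 110.0, 118.0]            # illustrative hourly loads
predicted = step(node(q, 2))                # one-step state forecast of X_3
```

For this series the second difference happens to be constant, so the forecast matches the actual node exactly; with real loads the fitted row (a, b, c) and the KF correction absorb the mismatch.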
The following conclusions can be inferred from Table 2.
(1) The minimum forecasting error occurs in the second stage, but the forecasting
error tends to stabilize from the third stage onwards. This is because the KF method is

advantageous as it is an adaptive process with a dynamic correction function. It updates


the state variables using the estimated value from the previous moment and the observed
value at the current moment to obtain the estimated value at this stage.
(2) The maximum absolute error ranges from 64.010 to 685.982 MW, the average absolute error ranges from 43.845 to 486.239 MW, the average relative error ranges from 15.6% to 38.6%, and the standard deviation ranges from 12.137 to 166.267. All error
indicators are relatively high. This is because the parameters of the KF model significantly
impact the forecasting results, with the state transition matrix having a particularly notable
effect on the KF forecasting. Especially in the KF forecasting process, the state matrix is
inferred based on the load information of the first two months of every two quarters, and
its accuracy directly affects the forecasting results.
(3) Analysis of the measurement period reveals that the average relative error of
load forecasting change during the National Day Golden Week is the smallest. Load
value fluctuations are relatively small in summer and winter, while they are larger in
spring and autumn. This indicates that seasonal variations impact the accuracy of the
forecasting results.
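The paper does not give explicit formulas for the four error statistics reported in Tables 2, 3 and 5; the sketch below shows one plausible reading, with the standard error taken as the standard deviation of the forecast errors, and uses invented values.

```python
def error_stats(actual, predicted):
    """Maximal absolute, average absolute, average relative, and standard error."""
    e = [p - a for a, p in zip(actual, predicted)]
    n = len(e)
    max_abs = max(abs(x) for x in e)
    avg_abs = sum(abs(x) for x in e) / n
    avg_rel = sum(abs(x) / abs(a) for x, a in zip(e, actual)) / n
    mean_e = sum(e) / n
    std_err = (sum((x - mean_e) ** 2 for x in e) / n) ** 0.5
    return max_abs, avg_abs, avg_rel, std_err

stats = error_stats([100.0, 200.0, 400.0], [110.0, 190.0, 420.0])
```

Computing these four quantities per stage reproduces the layout of the error tables in this section.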

3.2.2. BP Forecasting
The process of solving the BP network model is as follows:
(1) Input and output layer design: The inputs of the BP take the first-order and second-order
difference of hourly loads, and the output is the load. Therefore, the number of nodes
in the input layer is 2, and that in the output layer is 1.
(2) Hidden layer design: We use a multi-input–single-output BP network with three hidden
layers for load forecasting. The determination of the number of hidden layer neurons
is very important, and we thus refer to the empirical formula [22]:

l = √(n + m) + a    (22)

where n is the number of input layer neurons, m is the number of output layer neurons,
and a is a constant between the range [1, 10]. Therefore, the number of neurons in the
BP network in each quarter can be calculated.
(3) Selection of the excitation function: There are various transfer functions for BP neural networks, but with periodic data, the tansig transfer function yields smaller errors and higher stability than the logsig function. Therefore, tansig is used in the hidden layers, and the purelin function is used as the transfer function in the output layer.
(4) Network parameter determination: Let the network parameters have 5000 epochs of
network iterations and an expected error of 10−5 . The forecasting model established
in the first two months of each quarter forecasts the third month by the BP neural
network method; the experimental results are shown in Table 3.

Table 3. Analysis of forecasting results using BP neural network (unit: MW).

Stage    Maximal Absolute Error    Average Absolute Error    Average Relative Error    Standard Error
1 33.545 26.783 3.8% 14.948
2 26.871 20.127 9.7% 4.092
3 466.754 354.853 13.9% 70.371
4 83.295 50.385 10.0% 31.331
5 134.968 100.358 13.1% 49.695
6 129.702 87.453 11.2% 54.244
7 136.513 89.729 23.4% 49.288

The results can be inferred from Table 3 as follows.



(1) In some stages, there are individual points in the predicted results with significant
forecasting errors, and some forecasting values deviate from the actual values. This may be
due to the inherent drawback of local extremum points in the BP algorithm.
(2) The maximum absolute error is between 26.871 and 466.754 MW, the average
absolute error ranges from 20.127 to 354.853 MW, the average relative error is between 3.8%
and 23.4%, and the standard deviation is between 4.092 and 70.371.
(3) In the forecasting results of each quarter, the absolute and relative errors of the first,
second, and fourth stages are relatively small, while the errors of other stages are relatively
large, indicating that different stages each day have a certain impact on the accuracy.

3.2.3. GS Forecasting
When using the GS, both parameters a and u must be determined. We determined
them through fitting the samples in the year from June 2019 to June 2020, as shown in
Table 4.

Table 4. GS parameters of four quarters.

Quarter 1 2 3 4
a 0.954 0.813 0.762 0.742
u 0.322 0.455 0.651 0.658

After substituting these parameters into (13) and (14), Table 5 shows the errors in all seven stages.

Table 5. The GS forecasting results (unit: MW).

Stage    Maximal Absolute Error    Average Absolute Error    Average Relative Error    Standard Error
1 26.566 18.732 3.2% 9.478
2 6.932 5.054 1.6% 1.178
3 109.351 87.239 17.1% 6.631
4 68.195 50.780 16.0% 3.452
5 97.388 83.227 12.5% 43.597
6 89.640 50.718 11.8% 41.034
7 81.050 65.450 18.5% 8.787

The following results can be inferred from Table 5.


(1) The maximum absolute error predicted by the GS model is between 6.9329 and
109.3515 MW, the average absolute error ranges from 5.054 to 87.239 MW, the average
relative error is between 1.62% and 18.57%, and the standard deviation is between 1.1782
and 43.5972. Evaluating these data shows that the GS forecasting results do not fluctuate up and down with the actual load values. The overall predicted trend exhibits exponential characteristics, which is consistent with the actual situation in the short term. However, the future load trend will not fully obey exponential growth, which is why the GS generates larger errors.
(2) The maximum absolute error, average relative error, and standard deviation of
the predicted results outside the third and fifth stages are relatively small; especially, the
absolute error of the forecasting in the second stage reaches small values, basically reaching
the small error level that can be used for load forecasting. However, the GS has certain
limitations and is only suitable for modeling and forecasting problems with relatively
gentle data changes. It is not suitable for situations where the growth rate of data sequences
is too fast or exhibits significant fluctuations.

3.3. Fusion Based on CI Model


Let X = {x1 , x2 , . . ., xn } be a set of n inputs (components) and P(X) be the power set
of X. The set function g on X: P(X)→[0, 1] is called a non-additive measure [23], which
represents the interrelation among the n inputs, satisfying
(1) Boundary condition: g(∅) = 0, g(X) = 1 (∅ is the empty set);
(2) Monotonicity condition: for any two sets A and B on X that satisfy A ⊆ B, g(A) ≤ g(B).

Let h: X→R be a normalized map on the measure g; the discrete equation of the CI is

E_g = ∫ h dg = ∑_{i=1}^{n} [g(A_i∗) − g(A_{i+1}∗)] h(x_i∗)    (23)

where A_i∗ = {x_i∗, x_{i+1}∗, . . ., x_n∗}, and h(x_1∗), h(x_2∗), . . ., h(x_n∗) is an ordered arrangement of h(x_1), h(x_2), . . ., h(x_n) satisfying h(x_1∗) ≤ h(x_2∗) ≤ . . . ≤ h(x_n∗), with h(x_0∗) = 0 and g(A_{n+1}∗) = 0. From (23), there is the relation

∑_{i=1}^{n} [g(A_i∗) − g(A_{i+1}∗)] = 1    (24)

Equation (24) shows that the CI can be formulated as a weighted average of the n inputs. However, each weighting value in the typical average formula is relative only to its own input, while each weighting value in the CI is relative to all inputs. Therefore, the former is linear and the latter is nonlinear. Specially, the CI has a very strong generalization ability, and a CI with n inputs is equivalent to n! constrained feedforward neural networks. For example, when there are 3 inputs, x1, x2, x3, the CI is equivalent to 6 feedforward neural networks, as shown in Figure 3. The weights of each feedforward neural network consist of 3 relative measures. Therefore, the CI can act as a nonlinear classifier with a strong generalization ability and interpretability.

Figure 3. Equivalence
Equivalence of of a CI and n! n! feedforward
feedforward neural networks. Note: Note: ⃝~ ○○
11 ~⃝ 1818areare −−gg23
g123g123 23 ,,

gg23 −−gg,3 , gg− − g , g


3 g , 4g −132 − g , g − g , g − g , g
g32 , g3232− g232, g2 −g24 , 2g213 − g413 , 213 − g
g13 − g3 13 , g − g ,
, g3 −13g4 , g321 g − g ,
3 − g3 , g4 − g g − g
321, g −21 , g − g
g312 − g121 ,
g4 , 21
23 3 3 4 132 21 21 1 1
g1 − g4 , g312 − g12 , g12 − g2 , g2 − g4 , g231 − g31 , g31 − g1 , g1 − g4 , respectively, where g(·) refers
,to gg(A).
12
− g2 , g2 −g4 , g231 −g31 , g31 − g1 , g1 − g4 , respectively, where g(·) refers to g(A).

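Equation (23) is straightforward to implement. The Python sketch below (the input names and the example measure are hypothetical, chosen only to illustrate the computation) sorts the inputs ascending and accumulates [g(A*_i) − g(A*_{i+1})]h(x*_i); since the weights telescope to g(X) − g(∅) = 1, the result is a nonlinear weighted average of the inputs, as stated in (24):

```python
def choquet(h, g):
    """Discrete Choquet integral, Eq. (23).

    h maps input names to values h(x_i); g maps frozensets of input
    names to the measure g(A), with g(empty set) = 0 and g(X) = 1.
    """
    ordered = sorted(h, key=h.get)  # h(x1*) <= h(x2*) <= ... <= h(xn*)
    total = 0.0
    for i, x in enumerate(ordered):
        A_i = frozenset(ordered[i:])         # A*_i = {x_i*, ..., x_n*}
        A_next = frozenset(ordered[i + 1:])  # A*_(i+1); empty when i = n
        total += (g[A_i] - g[A_next]) * h[x]
    return total

# Hypothetical 3-input measure (illustrative values only)
g = {frozenset(): 0.0,
     frozenset({"x1"}): 0.3, frozenset({"x2"}): 0.4, frozenset({"x3"}): 0.5,
     frozenset({"x1", "x2"}): 0.5, frozenset({"x1", "x3"}): 0.6,
     frozenset({"x2", "x3"}): 0.7,
     frozenset({"x1", "x2", "x3"}): 1.0}

fused = choquet({"x1": 0.2, "x2": 0.8, "x3": 0.5}, g)  # ≈ 0.53
```

Note that the weight attached to each input depends on the rank order of all inputs, which is exactly what makes the CI nonlinear.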
The parameter determination in the CI is a key step. For the 2^n parameters from n inputs, since there are additional constraints on the monotonicity conditions among these inputs, the solving method must be special. Currently, the typical and effective method to solve the CI parameters is the heuristic least mean square (HLMS) [24]. After giving a set of K training data pairs from the input h^k to output E^k: {((h^k_1, h^k_2, ..., h^k_n), E^k) | k = 1, 2, ..., K}, the objective function of HLMS is formulated as

J = min( Σ_{k=1..K} (E^k − Ê^k_g)^2 )    (25)

where Ê^k_g is the computed CI value of (h^k_1, h^k_2, ..., h^k_n).


To apply the CI to integrate the forecasting results from the KF, BP, and GS, we denote the variables x1, x2, and x3 as their errors from the real value. An HLMS algorithm can be performed to solve the parameters of the CI in the open-source software Kappalab [25]. For the set of training samples of the three inputs relative to the three methods, x1, x2, and x3, Table 6 shows the solved parameters of the CI obtained in Kappalab by HLMS using the samples of each quarter in the entire year from June 2019 to June 2020.
Table 6. Solved CI model parameters.

Quarter   g(x1)   g(x2)   g(x3)   g(x1,x2)   g(x1,x3)   g(x2,x3)   g(x1,x2,x3)
1         0.351   0.452   0.543   0.422      0.523      0.732      1.000
2         0.327   0.418   0.476   0.328      0.489      0.549      1.000
3         0.403   0.305   0.404   0.332      0.582      0.512      1.000
4         0.360   0.289   0.332   0.424      0.664      0.675      1.000
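To make the fusion step concrete, the sketch below applies (23) with the quarter-1 measure from Table 6. The three input values stand for normalized outputs of the KF (x1), BP (x2), and GS (x3) predictors and are hypothetical, chosen only for illustration:

```python
def choquet(h, g):
    # Discrete Choquet integral of Eq. (23)
    ordered = sorted(h, key=h.get)
    return sum((g[frozenset(ordered[i:])] - g[frozenset(ordered[i + 1:])]) * h[x]
               for i, x in enumerate(ordered))

# Quarter-1 parameters from Table 6
g_q1 = {frozenset(): 0.0,
        frozenset({"x1"}): 0.351, frozenset({"x2"}): 0.452,
        frozenset({"x3"}): 0.543, frozenset({"x1", "x2"}): 0.422,
        frozenset({"x1", "x3"}): 0.523, frozenset({"x2", "x3"}): 0.732,
        frozenset({"x1", "x2", "x3"}): 1.000}

# Hypothetical normalized values for the KF, BP, and GS outputs
fused_q1 = choquet({"x1": 0.3, "x2": 0.5, "x3": 0.8}, g_q1)  # ≈ 0.6093
```

In practice, the inputs at each sampling point would be the three methods' forecasts (or errors) normalized to [0, 1] before fusion.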

After substituting these parameters into (23) and using the MATLAB tool, the forecasting results by the CI for the entire year from July 2020 to July 2021 are presented by fusing the results from the three methods. Taking the four stages of the first quarter in the second year as an example, the forecasting results are shown in Figure 4, where the forecasting results are evaluated by the error between each real load and the computed load.

Figure 4. Comparison of the forecasting loads by the three methods and their CI fusion. (a) Stage 1; (b) stage 2; (c) stage 3; (d) stage 4.

The forecasting loads by the CI fusion method have reduced the problems of large errors in the first three sampling points of the KF, large errors in individual points of the BP method, and exponential changes in the forecasting values of the GS method. The forecasting errors of the proposed CI fusion for the four quarters are shown in Table 7.

Table 7. Forecasting errors by CI in seven stages (Unit: MW).

Stage   Maximal Absolute Error   Average Maximal Error   Average Relative Error   Standard Error
1       30.121                   26.456                  3.2%                     7.263
2       12.654                   10.337                  2.8%                     3.188
3       109.221                  83.704                  11.8%                    30.126
4       46.139                   40.659                  10.4%                    20.783
5       70.853                   50.760                  12.8%                    28.491
6       43.573                   37.327                  9.1%                     30.328
7       44.205                   36.663                  9.7%                     20.393

Compared with the respective forecasting results from the KF, BP, and GS, the fusion
results by the CI from Table 7 can be summarized as follows.
(1) The maximum forecasting error among the three methods occurs in the KF method.
Although KF is a continuous iterative process, the errors at the initial points and the
increase in subsequent errors can cancel each other out, and the subsequent filtering tends
to stabilize. However, due to the significant errors at the first three points, the maximum
errors still have a significant impact on the overall accuracy of load forecasting, resulting in
various error indicators being less favorable compared to other methods.
(2) For stages 1 and 2, the GS forecasting results show the minimum average relative
error, because this stage corresponds to the minimum fluctuation of the load value at
the morning time of each day, and the GS model is well-suited for load forecasting with
relatively flat data changes. The forecasting results of the CI fusion method are affected by
the large forecasting error of the KF method, leading to inferior results compared to the GS
in some stages. However, it weakens the instability and limitations of single forecasting
results of both KF and the BP neural network.
(3) Comparing the errors of three single forecasting methods, the average relative error
from the CI fusion forecasting results is the smallest, which overall reduces the instability
and limitations of the single forecasting results of the other three methods in various stages.
The advantages of the CI fusion method are obvious.

4. Conclusions
This study aims to address the issues of imprecision, uncertainty, and instability in
short-term power load forecasting using a single method. A new fusion method using
the Choquet integral is proposed. This method can fully use the interactions and comple-
mentarity among various methods, leading to the more accurate and stable forecasting
results of short-term loads. Different from the other existing methods, the proposed fusion
method can effectively extract the key features and parameters in the fusion process by
using a typical optimization algorithm, which is very easily realized and understood in
practice. Therefore, the proposed method provides new ideas and a practical means for
fusing multimodal and multi-resource methods in short-term load forecasting, and it can
be extended to any power system.
However, there is still room for improvement in the method proposed in this study.
Firstly, further research and analysis are required on the typicality of the fused forecasting
methods. Secondly, in the case of big data, current deep learning-based methods can be
used to further evaluate and compare the advantages, disadvantages, and usability of
the proposed method. Finally, an applicable method must be chosen to conduct a more
in-depth analysis of the various factors that affect the operation of the power system and
load changes, in order to effectively improve the accuracy of short-term load forecasting.

Author Contributions: Conceptualization, T.Y.; Methodology, S.Y.; Validation, Y.Z.; Data curation,
G.L.; Writing—original draft, Y.W. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.

Data Availability Statement: The data are not publicly available since they are not a shared resource.
Conflicts of Interest: Authors Tao Yu, Ye Wang, Yuchong Zhao, and Gang Luo were employed by
CCCC Tianjin Waterway Bureau Co., Ltd. The remaining authors declare that the research was
conducted in the absence of any commercial or financial relationships that could be construed as a
potential conflict of interest.

References
1. Dong, J.; Luo, L.; Zhang, Q. A parallel short-term power load forecasting method considering high-level elastic loads. IEEE Trans.
Instrum. Meas. 2023, 72, 2524210. [CrossRef]
2. Luo, L.; Dong, J.; Zhang, Q. A distributed short-term load forecasting method in consideration of holiday distinction. Sustain.
Energy Grid Netw. 2024, 38, 101296. [CrossRef]
3. Jahan, I.S.; Snasel, V.; Misak, S. Intelligent Systems for Power Load Forecasting: A Study Review. Energies 2020, 13, 6105.
[CrossRef]
4. Lai, C.S.; Yang, Y.; Pan, K.; Zhang, J.; Yuan, H.; Ng, W.W.Y. Multi-view neural network ensemble for short and midterm load
forecasting. IEEE Trans. Power Syst. 2021, 36, 2992–3003. [CrossRef]
5. He, Y.; Xu, Q.; Wan, J.; Yang, S. Short-term power load probability density forecasting based on quantile regression neural network
and triangle kernel function. Energy 2016, 114, 498–512. [CrossRef]
6. Sharma, S.; Majumdar, A.; Elvira, V.; Chouzenoux, E. Blind Kalman Filtering for Short-Term Load Forecasting. IEEE Trans. Power
Syst. 2021, 35, 4916–4919. [CrossRef]
7. Li, L.J.; Huang, W. A Short-Term Power Load Forecasting Method Based on BP Neural Network. Appl. Mech. Mater. 2014, 494–495,
1647–1650. [CrossRef]
8. Li, Q.; Liu, S. Grey input-output analysis for fixed assets. J. Grey Syst. 2022, 29, 14–28.
9. Khan, M.E.; Dutt, D.N. An expectation-maximization algorithm based Kalman smoother approach for event-related desynchro-
nization (ERD) estimation from EEG. IEEE Trans. Biomed. Eng. 2007, 54, 1191–1198. [CrossRef]
10. Jetcheva, J.G.; Majidpour, M.; Chen, W.P. Neural network model ensembles for building-level electricity load forecasts. Energy
Build. 2014, 84, 214–223. [CrossRef]
11. Li, H.Z.; Guo, S.; Li, C.J.; Sun, J.Q. A hybrid annual power load forecasting model based on generalized regression neural network
with fruit fly optimization algorithm. Knowl. Based Syst. 2013, 37, 378–387. [CrossRef]
12. Xie, K.; Yi, H.; Hu, G.; Li, L.; Fan, Z. Short-term power load forecasting based on Elman neural network with particle swarm
optimization. Neurocomputing 2020, 416, 136–142. [CrossRef]
13. Liu, Y.; Wang, W.; Ghadimi, N. Electricity load forecasting by an improved forecast engine for building level consumers. Energy
2017, 139, 18–30. [CrossRef]
14. Grabisch, M. A new algorithm for identifying fuzzy measures and its application to pattern recognition. In Proceedings of the
IEEE International Conference on Fuzzy Systems, Yokohama, Japan, 20–24 March 1995; pp. 145–150.
15. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [CrossRef]
16. Graves, A. Long Short-Term Memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg,
Germany, 2012; pp. 1735–1780.
17. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
18. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks
from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
19. Tokgoz, A.; Unal, G. A RNN based time series approach for forecasting turkish electricity load. In Proceedings of the 26th Signal
Processing and Communications Applications Conference, Izmir, Turkey, 2–5 May 2018.
20. Zheng, J.; Xu, C.; Zhang, Z.; Li, X. Electric load forecasting in smart grids using long-short-term-memory based recurrent neural
network. In Proceedings of the 2017 51st Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA,
22–24 March 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6.
21. Chen, Z.; Sun, L. Short-Term Electrical Load Forecasting Based on Deep Learning LSTM Networks. Electron. Technol. 2018, 10,
11–18.
22. Shang, H.; Jiang, Z.; Xu, R.; Wang, D.; Wu, P.; Chen, Y. The dynamic mechanism of a novel stochastic neural firing pattern
observed in a real biological system. Cogn. Syst. Res. 2019, 53, 123–136. [CrossRef]
23. Angilella, S.; Grecoa, S.; Matarazzo, B. Non-additive robust ordinal regression: A multiple criteria decision model based on the
Choquet integral. Eur. J. Oper. Res. 2010, 201, 277–288. [CrossRef]
24. Grabisch, M. Alternative representations of discrete fuzzy measures for decision making. Int. J. Uncertain. Fuzziness Knowl. Based
Syst. 1997, 5, 587–607. [CrossRef]
25. Kappa Software. Available online: http://www.kappa.com.cn (accessed on 14 October 2024).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
