1 s2.0 S0306261924000515 Main
1 s2.0 S0306261924000515 Main
1 s2.0 S0306261924000515 Main
Applied Energy
journal homepage: www.elsevier.com/locate/apenergy
GRAPHICAL ABSTRACT
Dataset link: https://github.com/AlexanderNeu The use of machine learning in building technology has become increasingly important in recent years. One
bauer/heating_cooling_load_VDI_6007 of the applications is heating load prediction, which enables demand-side flexibility. Most studies consider the
heating load prediction without sufficient context with the existing building characteristics. For an accurate
Keywords:
Heating load prediction
load prediction, suitable features have to be selected according to their importance, the feature importance (FI).
Building characteristic The scope of this paper is to investigate whether there is a relationship between the building characteristics
Correlation and the FI and if so, how strong this relationship is. Additionally, an analysis has been conducted to determine
Feature importance which building characteristic have the most significant impact on FI. For this purpose, a full factorial design
SHAP values of a room with six different building characteristics is carried out. In total, the heating load is calculated
Model explainability for 15 552 room variants. The thermal balance, correlation, random forest FI, permutation FI and SHapley
Additive exPlanations (SHAP) values are calculated for these different rooms. The local SHAP values were
https://doi.org/10.1016/j.apenergy.2024.122668
Received 30 September 2023; Received in revised form 28 December 2023; Accepted 12 January 2024
Available online 22 January 2024
0306-2619/© 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
A. Neubauer et al. Applied Energy 359 (2024) 122668
used to explain the model. These values also provide insight into the interaction of individual features with
the heating load. For most variants, the outdoor temperature had the highest FI. It is investigated which
building characteristics have the greatest influence on the thermal balance, correlation, FI and SHAP values.
A relationship was found between the proportion of thermal balance, the correlation between the features and
the label as well as the FI. The greatest association with the thermal balance characteristics was found for the
SHAP values. The study shows the systematic relationship between building characteristics and FI. Therefore,
FI should always be considered in the context of building characteristics.
installed capacity of heat pumps is more than 1000 GW and will increase
Nomenclature
to 2600 GW by 2030 according to the Announced Pledges Scenario
(APS) [1]. The electricity required for the heat pumps is provided by
Greek Symbols an increasing share of renewable energy in the electricity grid. The
𝜙 Shapley value global renewable electricity capacity will grow by more than 60%
between 2020 and 2026, reaching more than 4800 GW. The increase in
Latin Symbols renewable electricity capacity is largely due to photovoltaics and wind.
These are fluctuating electricity generation sources [2]. Heat pumps
𝑄̇ Heat flux (W)
offer a good opportunity for demand-side flexibility. The demand-side
𝑞̇ Heat flux density (W m−2 )
flexibility capacity of heat pumps in the EU increases by a factor of
𝐴 Area (m2 )
more than three between 2021 and 2030 [2]. Heat pumps will have
𝐷 data set (–) a share of about 12% of the total flexibility resources in the EU in
𝑒 Error (–) 2030 [1].
𝑓 Model (–) In order to use demand-side flexibility, heating load predictions
𝑔 𝑔-value (–) with a good prediction accuracy are necessary. Data-driven approaches,
𝐼 Irradiance (W m−2 ) which have recently gained a lot of attention, are a useful way to pre-
𝑖 Player (–) dict the heating load. There are two types of data-driven approaches:
𝐾 Number of repetitions (–) statistical methods and machine learning methods [3]. Data-driven
𝐿 Error metric (–) approaches are black-box models in which the relationship between
𝑁 Set of Players (–) the input variables, and the output variables respectively the predicted
variables, is learned using training data. In contrast to grey-box models
𝑛 Running number (–)
with a physical model, it is possible to dispense with the often complex
𝑄 Heat (W s)
and time-consuming modelling of the building. For the application of
𝑟 Pearson Correlation Coefficient (–) machine learning (ML) models for heating load prediction, it is impor-
𝑟s Spearman’s Rank Correlation Coefficient (–) tant to select suitable input variables, the so-called features in ML. The
𝑆 Coalation of Players (–) suitable features are determined by feature selection. By selecting suit-
𝑇 Temperature (K) able features, the training time and model complexity can be reduced
𝑡 Time (s) and in some cases even the prediction accuracy can be increased [4].
𝑣 Characteristic function (–) The selected features and the accuracy of the prediction should be
𝑥 Feature (–) considered in relation to the characteristics of the building. Since the
𝑦 Label (–) characteristics of the building are present implicitly in the training data.
Another property of black-box approaches is the low interpretability
Subscripts of the results. The interpretability decreases with increasing model
ext exterior accuracy [5]. Model explainability is used to explain the results of the
model. It describes how the model gets from the inputs to the results.
IG Internal gains
There are two different types of model explainability model-specific –
out Outside
depending on the used ML model – or model-agnostic, meaning without
perm Permutation
dependence on a model and only depending on the data set.
sol Solar Previous studies have almost always considered the feature impor-
trans Transmission tance (FI) for heating load predictions without taking the building
vent Ventilation characteristics into account. There is currently a research gap as no
wall Wall comprehensive and systematic investigation of the influence of the
win Window building characteristics on the FI for heating load predictions has been
carried out. In the scope of this work, a parameter study with a generic
room model is used to simulate 15 552 different variations with various
building characteristics and internal loads as well as air change rates
in order to determine the hourly heating load for one year. For these
1. Introduction 15 552 data sets, the thermal balance of the building over one year is
evaluated as well as the FI and model explainability are calculated.
Heating load predictions are a key technology for the energy transi- FI is determined using random forest and permutation. The model
tion. Demand-side flexibility can be achieved through the use of heating explanation is carried out using SHapley Additive exPlanations (SHAP)
load predictions. In the coming years, the sector coupling between values. The aim of the work is to establish a connection between the
the electricity sector and the heating sector will increase strongly. characteristics of the room, the FI and the model explainability. The
Worldwide, 42% of the space heating demand in 2021 was covered by question is whether the characteristics of the building are reflected in
natural gas and 15% by electricity. In the coming years, the share of the results and if this relationship allows a better interpretation of the
gas will drop significantly and the share of electricity will rise sharply. model. In addition, the importance of the individual features for the
This is due to the rising number of heat pumps. In 2021, the global prediction quality will be determined depending on the building type.
2
A. Neubauer et al. Applied Energy 359 (2024) 122668
Three types of sources for building energy prediction uncertainties the cases the accuracy had the same level and the execution time was
can be distinguished, the human factors, the building factors and the decreased. In the work by Chaganti et al. [14], the heating and cooling
weather factors. The weather factors have a large influence [6]. When load for different ML models was investigated for a data set with 768
considering the forecast deviation of the weather factors, it can be seen buildings with eight different features. In Chaganti et al. the heating
that the solar irradiance prediction has a significantly higher mean load is defined as the amount of energy for an unspecified longer
absolute percentage error than the ambient temperature prediction [7]. period of time, whereas in this study the heating load is the hourly
The uncertainty of the input variables leads to uncertainty in the ther- load. The features used were glazing area, orientation, height, relative
mal load prediction [8]. Based on the importance of the input variables compactness, roof area, surface area, and wall area. The relative com-
– the FI – conclusions can be drawn about the prediction accuracy. For pactness, surface area and wall area had the greatest influence on the
example, if the FI of solar irradiance and outdoor temperature is the determination of the heating and cooling load. In the work of Ibrahim
same, the solar irradiance is probably involved to a greater proportion et al. [15], the annual heating and cooling load was determined for
in the heating load prediction error, because its input data have a 3840 typical buildings in Saudi Arabia. The features with the highest
greater uncertainty. The main contributions of this work are as follows: importance for the multilayer perceptron were the 𝑈 -values of the
window and the outer wall, the window-to-wall ratio, whereas the floor
1. Describes the relationship between the building characteristics, height and the building size had only a minor influence.
the thermal balance, the correlation between the features and Chung and Liu [16] created a data set of over 2000 buildings in
the heating load, the FI determined by random forest and per- five different climate zones and the constraints of U.S. Department of
mutation, as well as the SHAP values. Energy reference buildings. For these buildings, the annual heating and
2. Define the FI depending on the building characteristics. cooling demand was calculated and the importance of the nine features
3. Determine the influence of the building characteristics on the was determined using the standardised regression coefficient (SRC), lo-
variation of the thermal balance and the FI. cal interpretable model-agnostic explanations (LIME) and SHAP values.
4. Show the local SHAP values over the time course for different The aim of the study was to reduce the number of features required
building types depending on the feature values. while maintaining or only slightly reducing the prediction accuracy. In
the investigation of the SRC, the global solar radiation and the dry-
The paper is divided into six sections. Section 1 contains the the- bulb temperature are the variables with the highest influence on the
matic introduction to the topic and explains why heating load predic- heating load. For the LIME values, the dry-bulb temperature also had
tions are necessary and why they should be considered in the context the greatest influence and the largest variation. For the SHAP values,
of the room characteristics. In Section 2 the state of the art and already the dry-bulb temperature had the greatest influence on the heating
conducted investigations of feature selection and model explainability load and time also had a high influence. The importance of the SRC,
in the context of energy and heating load predictions are presented. The LIME and SHAP values were ranked. The SRC has a similar tendency
description of the methodology and the used data is given in Section 3. of importance as the physical phenomena of the heat transfer. Whereas
The results of the parameter study, the thermal balance, the correlation, the LIME and SHAP values have a tendency to deviate from the physical
the FI and the model explanation are presented in Section 4. Section 5 understanding. The SHAP value was found to be the most credible
contains the conclusion with questions for future research. analysis method for determining the importance of features for heat
and cooling load prediction. In the work of Deb et al. [17] the best
2. Related work retrofit strategies for buildings were identified using the permutation
FI and SHAP values. It was found that it is promising to choose the
The field of heating load prediction using machine learning has been retrofit strategy based solely on the FI. Wu et al. [18] shows for the
gaining increasing importance in recent years. The number of papers heat and cool load prediction that the local SHAP values for summer
on Explainable Artificial Intelligence in energy and power system ap- and winter and thus the importance of the feature differ strongly. The
plications [9] has increased significantly in recent years. Sakkas and connection between the features and labels could be well mapped and
Abang [10] determined the Spearman’s and Kendall’s coefficients for visualised with the SHAP values. The local SHAP values are consistent
real measured data between the hourly heating load of a district heating with the objective understanding and increase the transparency of the
system and various features. The strongest correlation was between prediction model and the credibility of the results.
temperature and the heating load. Whereas the wind speed and the rain The impact of the same features on building energy differs per
index have only a low correlation. Chen et al. [11] evaluated 25 papers building. Deciding on the selection of features require a good under-
for building energy prediction with different prediction time horizons. standing of the building physics and the specific building itself. This
The outdoor dry bulb temperature was the most frequently used feature process may require the involvement of expert knowledge and bespoke
with 18 uses, followed by outdoor relative humidity (13) and solar modelling, which will drive up development costs [19]. For this reason,
radiation (8). In some studies such as Fan et al. [12], a large number of this paper examines the relationship between FI and building character-
features, among others meteorological features in this case 12, are used istics. It also examines which building characteristics have the strongest
for the calculation of energy consumption. It was found that a large influence on the FI.
number of meteorological features have only a minor importance. In
predicting the heating load for a real district heating network [13], the 3. Methods and materials
past load, the outdoor temperature and the time index were determined
as the most important features. 3.1. Evaluation metrics
Kapetanakis et al. [4] selected appropriate features for data-driven
prediction models for the heating and cooling load. Therefore they In the scope of this study, the Pearson correlation coefficient (PCC)
used the Pearson and Spearman correlation. Six different reference 𝑟 (Eq. (1)) and the Spearman’s rank correlation coefficient (SRCC) 𝑟s
buildings each of 16 different locations were considered. The ambient (Eq. (2)) are used to evaluate the correlation between two variables.
temperature had a strong correlation in almost all cases, while the other The PCC describes the measure of linear correlation between two vari-
input factors like zone air temperature, zone and air humidity and the ables (𝑥, 𝑦). The SRCC assesses the monotonic relationship between two
solar radiation correlates variously with the thermal load. Depending continuous or quantitatively discrete variables independent of linearity.
on the building type and the climate zone. The prediction accuracy The 𝑟s is calculated using the pairwise distances (between 𝑥 and 𝑦) of
of an artificial neural network was examined with a reduced feature the ranks 𝑑 and the number of samples 𝑛. Both evaluation variables can
selected on their correlation with the thermal load. In the majority of assume values between −1 and 1. A positive value means a rising trend
3
A. Neubauer et al. Applied Energy 359 (2024) 122668
and a negative value means a falling trend. The closer the correlation is selected model algorithm. The evaluation of the individual features
to the value ± 1, the stronger the correlation is. The classification of the is repeated for each subset, the generation of the subsets depends on
correlation strength into fixed ranges with cutoff points is problematic the selected search strategy. The computation time of the wrapper
because the interpretation of the correlation coefficients depends on the methods is much higher than for the filter methods. The permutation
underlying data and the field of application [20]. FI is a wrapper method. Embedded methods create an evaluation of the
∑𝑛 feature selection while the model algorithm is executed. The methods
(𝑥𝑖 − 𝑥)(𝑦𝑖 − 𝑦)
𝑟 = √ 𝑖=1 (1) are thus embedded in the learning algorithm. The computation time
∑𝑛 2 2
𝑖=1 (𝑥𝑖 − 𝑥) (𝑦𝑖 − 𝑦) of embedded methods is lower than the computation time of wrapper
∑ methods [22]. An example of an embedded method is the random
6 𝑛𝑖=1 𝑑𝑖2
𝑟s = 1 − (2) forest.
𝑛(𝑛2 − 1)
The multivariate analysis of variance (MANOVA) is a statistical analy- The definition of interpretability in the context of ML systems accord-
sis method that analyses the influence of several independent variables ing to Doshi-Velez and Kim [25] is the ability to explain or to present in
on more than one dependent variable. The linear combinations of the understandable terms to a human. According to Brian and Cotton [26],
different dependent variables are examined for significant differences in interpretability is defined as follows, systems are interpretable if their
the mean value. The MANOVA is a further development of the Analysis operations can be understood by a human, either through introspection
of variance (ANOVA). In the ANOVA, there is only one independent or through a produced explanation. Two types of interpretability were
variable, whereas in MANOVA there are several independent variables. defined. The prediction interpretation, in which a justification for the
To evaluate the relationship between the independent and dependent prediction for a given prediction and usually non-interpretable models
variables, the Wilks Lambda (WL), among others, can be used. The (e.g. neural networks) is made. The other variant is the interpretable
WL indicates how strongly the independent variable divides the depen- models. In this variant, interpretable models are developed. Which are
dent variable into individual groups. A lower value means a greater interpretable by themselves (introspection) and can be explained by
discrimination ability of the independent variables to the dependent reasoning.
variables. In model interpretability, there is a trade-off between interpretabil-
ity and performance. As model accuracy increases, model interpretabil-
3.3. Feature selection ity decreases [5]. For example, rule-based learning models or linear
regressions have a high model interpretability but a low model ac-
The objective of feature selection is to determine suitable features curacy. Whereas deep learning models have a low interpretability
or subsets of features for the prediction model. A proper selection but a high model accuracy. For this reason, attempts to increase the
of features can, for example, reduce the complexity of the model, interpretability of ML models should be made. The advantage of in-
shorten the training time and increase the quality of the prediction. creased interpretability is, among other things, better implementability
Furthermore, the interpretability of the model can be enhanced by an of ML models. When developing ML models, the implementability can
appropriate choice of features. Depending on whether label information be improved by considering interpretability as an additional design
is available or not, feature selection techniques can be divided into criterion for three reasons [5].
three families: the supervised methods, the semi-supervised methods and
the unsupervised methods [21]. In the scope of this work, the labels are • Interpretability helps to provide impartial decision making, i.e., to
known, so they are supervised methods. Three different approaches to detect and consequently correct biases in the training data.
feature selection can be distinguished the filter methods, the wrapper • Interpretability makes it easier to provide robustness by high-
methods and the embedded methods [22]. Furthermore, feature selection lighting potential adverse perturbations that could change the
can be distinguished between model-specific and model-agnostic eval- prediction.
uation methods. With model-agnostic methods, the evaluation results • Interpretability can serve as an insurance that only meaningful
are independent of the selected learning algorithm, whereas the results variables infer the output, i.e., it guarantees that an underlying
of model-specific methods are dependent on the selected learning algo- truthful causality exists in the modelling.
rithm. Fig. 1 shows an overview of selected methods of model selection
and model explanation. The explainability, as opposed to interpretability, is linked to the
Filter methods determine the importance of features independently internal logic and mechanics within the machine learning model. The
of the used learning algorithm (model-agnostic). When using filter more explainable the model is, the better the human can understand
methods, single features (univariant feature filters) or feature subsets why the internal processes happens while the model is training or
(multivariate feature filters) can be used. An example of a univariate making decisions [27]. Explainable models are interpretable by de-
filter variant is the determination of the correlation between the fea- fault but not always vice versa [28]. Explainability refers to reasons
tures and the label [24]. For this purpose, the PCC, SRCC (Eq. (1), (2)) and details a model provides to make its operation clear or easy to
are used in the scope of this work. The correlation for each feature understand. One of the main barriers to the use of ML in practice is
is determined, sometimes ranked, and features with weak correlation the lack of explainability. For example, ML models often deliver good
can be removed. Filter methods have a low computing time. However, performance, but the path to the result is often not apparent. For this
the correlation methods only examine the relationships of individual reason, the explainability of ML models must be considered. The goals
features to the label. The influence of subsets of several variables on of Explainable Artificial Intelligence (XAI) are trustworthiness, causal-
the target variable is not taken into account. Thus, several variables ity, transferability, informativeness, confidence, fairness, accessibility,
which individually have a small influence may together have a large interactivity and privacy awareness [5].
influence on the target variable. To examine such subsets, multivariate
feature filters (e.g. MANOVA) or wrapper methods must be used. 3.5. Permutation feature importance (PFI)
These methods also take into account the interrelationship between the
features. The Permutation Feature Importance (PFI) [29] describes the in-
In contrast to filter methods, wrapper methods use a specific model crease of the model error when a single feature value is randomly
algorithm (model-specific). The results are therefore biased by the shuffled. Due the shuffling the correlation between the feature and the
4
A. Neubauer et al. Applied Energy 359 (2024) 122668
Fig. 1. Overview feature selection and model explanation (feature selection drawing inspired by: [23]).
label is removed. The larger the change in the label variable, the larger of players the importance of the players in each possible coalition. Some
is the influence of the feature variable. Algorithm 1 shows the process players have a higher importance on the total payout. Furthermore, the
of the PFI (based on [30]). importance of the players is not only considered individually, but also
To calculate the PFI, the default model (without permutation) 𝑓 , a in combination with other players. There is a set of players 𝑁 = 1, … , 𝑛,
data set 𝐷 with features 𝐱 and label 𝑦 is required. In the first step, where 𝑛 is the number of players. 𝑆 is a coalition of players. The
the error for the standard model 𝑒default is calculated. In this study, characteristic function 𝑣 assigns a real number to each coalition of
the negative mean square error (NMSE) is used as the error metric 𝐿. players. This number is the value of a coalition 𝑣(𝑆), which represents
For each feature 𝑗, the values are shuffled and the whole is repeated the expected sum of the payout of the coalition, which is the total
for 𝐾 repetitions. Shuffling the feature values resolves the relationship expected profit. The game is thus described by the tuple (𝑣, 𝑁). 𝑁 ⧵ {𝑖}
between the feature and the label. The model error 𝑒perm is calculated is all possible coalitions excluding the player 𝑖. The Shapley value 𝜙𝑖
for the permuted feature matrix 𝐗perm . The importance of a single for player 𝑖 ⊂ 𝑆 ⊆ 𝑁 is given by Eq. (3).
feature PFI𝑗 is calculated from the difference of the model error for
the default model and the mean of the model error of the permuted ∑
1
(𝑛 − 1 − |𝑆|)! ⋅ |𝑆|!
𝜙𝑖 (𝑁, 𝑣) = ⋅(𝑣(𝑆 ∪ {𝑖}) − 𝑣(𝑆)) (3)
data. 𝑆⊆𝑁⧵{𝑖}
𝑛! ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟
marginal contribution
Algorithm 1 Permutation feature importance The Shapley value of the player 𝑖 is the weighted marginal contri-
Require: default model 𝑓 , data set 𝐷 bution of all possible coalitions. The player’s marginal contribution is
Calculate the error of the original model the difference between the value a coalition achieves with and without
𝑒default = 𝐿(𝐲, 𝑓̂(𝐗)) (e.g. NMSE) the player. Since the marginal contribution also depends on the order
for each feature 𝑗 ∈ 1, ..., 𝑝: do in which the players join, the calculation of 𝜙 is done by considering
for each repetition 𝑘 ∈ 1, ..., 𝐾: do all possible orders of the players. The weighting is done as follows.
- Shuffle feature 𝑗 which generates a
The number of possible combinations for 𝑛 players is 𝑛!. For each
permutated feature matrix 𝐗perm .
coalition 𝑆 ⊆ 𝑁 ⧵ {𝑖} there are (𝑛 − 1 − |𝑆|)! ⋅ |𝑆|! orders of precedence.
- Calculate the error 𝑒perm = 𝐿(𝐲, 𝑓̂(𝐗perm ))
with the permutated feature matrix
There are |𝑆| possibilities to arrange the players from 𝑆 before the
end for player 𝑖 and (𝑛 − 1 − |𝑆|)! possibilities to arrange the players from
∑𝐾
Determine PFI𝑗 = 𝑒default − 𝐾1 𝑘=1 𝑒perm,j 𝑁 ⧵ {𝑆 ∪ {𝑖} after the player 𝑖. Using Shapley values, the influence of
end for different players on the payout and its fair distribution can be found.
A fair payout can be considered to be the fulfilment of the properties
3.6. Shapley additive explanations (SHAP) efficiency, symmetry, dummy and additivity. The Shapley value is the
only attribution method that fulfils all of these characteristics [30].
Another method to determine the influence of a feature on a target A variety of methods exist based on the Shapley value using differ-
value is to use SHAP values [31]. This game theoretic approach is used ent approaches [33]. The SHAP values [31] are based on the Shapley
to justify the output of machine learning models and thus increase the Values and have been extended thereby. Thus, SHAP combines the local
transparency and interpretability of machine learning models. SHAP is interpretability of other model-agnostic approaches such as Local Inter-
a model-agnostic approach, so that the explanation can be considered pretable Model-Agnostic Explanations (LIME) [34] and the game theory
separately from the model. Local and global explanations can be given of Shapley values. This results in several properties in the form of
for the model. The importance of the SHAP values is determined by local accuracy, missingness, consistency. Furthermore, SHAP included
the magnitude of the feature attributions, while the PFI is based on the efficient methods for computing the Shapley values. The KernelSHAP
reduction of the model accuracy. SHAP values are based on the Shapley is a kernel-based model, while the TreeSHAP is a tree-based model,
values [32], which are a concept from cooperative game theory. The both are used to estimate the Shapley values. In the scope of this work
Shapley value 𝜙 is a solution concept for the fair distribution of the the TreeSHAP [35] is used to calculate the influence of each feature
payout in a cooperative game. This approach determines for coalitions on the prediction. In the case of the SHAP values, the nomenclature is
5
A. Neubauer et al. Applied Energy 359 (2024) 122668
Table 1 Table 2
Investigated room surfaces in the parameter study. Investigated building characteristics in the parameter study.
Room type 𝐴win 𝐴wall 𝐴win /𝐴wall 𝐴wall /𝑉room Nr. 𝑈wall 𝑈win 𝑔-value AER 𝑞̇ IG
m2 m2 1 m−1 W/(m2 ⋅K) W/(m2 ⋅K) 1 h−1 W∕m2
1 7 3.5 0.67 0.2 1 0.15 0.8 0.15 0.25 5.1
2 3.5 7 0.33 0.2 2 0.25 1.1 0.3 0.5 10.3
3 7 18.5 0.27 0.49 3 0.5 1.4 0.45 0.75 15.4
4 0.75 1.7 0.6 1 20.6
5 1 2.1 0.75 2 25.7
6 1.5 2.7 0.9 4 30.9
considered in the context of machine learning rather than game theory.
So the players are single features or groups of features and the payout
is the prediction of the model.
gains were varied. This variation covers a large part of the building
age classes, insulation types and types of use of office buildings. The
3.7. Justification for the choice of methods
ventilation and internal loads are applied from Monday to Friday
between 7 am and 5 pm. The indoor room temperature in the model
This study analyses various methods for determining the FI and is ideally prescribed to be 20 °C in the heating case and 24 °C in the
compares their characteristics and results. Different types of feature cooling case throughout the year. Within the room temperature band
selection, the filter methods PCC, SRCC, MANOVA, the wrapper method of 20-24 °C, there is neither heating nor cooling. The consideration
PFI, the embedded method random forest feature importance (RFFI) period for the heating season is from 01. October to 31. April. If
and the model explainability method SHAP are investigated. These thermal losses are higher than gains, then heating to the temperature
methods are used to find suitable features for ML prediction models. of 20 °C will occur in this period. The data set for training and testing
The filter methods are considered because there are fast and easy the heating load prediction model uses this time period. As weather
to use. The wrapper methods consider individual subsets of features, data the International Weather for Energy Calculations (IWEC) for the
but are computationally intensive [3]. Through the elimination of German cities Berlin and Stuttgart are used, which allows to consider
iterative model training, XAI has computational advantages for feature different climatic zones. The weather data have a period of one year
selection. In addition, XAI methods provide insight into the model’s and a temporal resolution of one hour. To generate the data set, all
decision-making process [36]. room types, the thermal mass, building physical properties and usage
This model explainability offers great advantage over methods like properties are varied with each other. The 𝑈 -values of the wall and
filter, wrapper or embedded methods. For the explanation of black box the window are always considered in combination and are therefore
models, LIME and SHAP are by far the most comprehensive and dom- not varied among each other, because they only have an influence on
inant methods in the literature for visualising feature importance and the average 𝑈 -value of the exterior building components. Due to the
feature interactions [27]. It was shown that SHAP identify important variation of the properties, 15 552 variants exist.
features that have a major influence on the load forecast [37]. SHAP,
3.9. Definition heating load and demand
which is the most commonly used model explainability method for
power and energy applications [9], is considered in this work due to
The heating load 𝑄̇ heat of the building results from the difference of
its frequent use and diverse possibilities.
the gains and losses of the building (Eq. (4)). Gains are solar irradiance
𝑄̇ sol through the opaque components and internal gains 𝑄̇ IG which
3.8. Building description result from e.g. people, lighting and equipment. The losses are divided
into transmission heat losses 𝑄̇ trans and ventilation heat losses 𝑄̇ vent . In
The data set for the heating load prediction is generated by sim- addition, the building mass acts as a thermal storage and can store and
ulating a simplified room model. The room is represented by two discharge energy 𝑄̇ st .
RC elements. All exterior and interior components are combined into
one equivalent element each. The floor area is 17.5 m2 and the room 𝑄̇ heat = 𝑄̇ sol + 𝑄̇ IG + 𝑄̇ trans + 𝑄̇ vent +𝑄̇ st (4)
⏟⏞⏞⏞⏟⏞⏞⏞⏟ ⏟⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏟
has a south-facing exterior wall with a window. The modelling is gains losses
based on the boundary conditions of test case 5 of VDI 6007-1 [38].
The 𝑄̇ trans is directly proportional to the thermal transmittance
The consideration of the radiation through the transparent exterior
of the exterior components 𝑈ext and the temperature difference 𝛥𝑇
components is carried out according to VDI 6007-2 [39]. The exact
between the room temperature 𝑇room and the outside temperature 𝑇out
structure of the room model is described in Lauster et al. [40]. The
(Eq. (5)). Besides the 𝑈 -values of the exterior components, the A/V
modelling of the room is done with the modelling language Modelica
ratio and the share of window area also have a major influence on the
in the modelling and simulation environment Dymola. The used room
heating load.
model is the thermal zone with two elements for exterior and interior
walls model of the Buildings library version 8.0.0 [41]. 𝑄̇ trans = 𝑈ext ⋅ 𝐴 ⋅ 𝛥𝑇 (5)
For generating different room types, the exterior wall area 𝐴wall and
window area 𝐴win were varied. Table 1 shows the surface properties of 𝑄̇ sol = 𝑔 ⋅ 𝐴win ⋅ 𝐼sol,win (6)
the room configurations. Room type 1 corresponds to the area values
𝑄̇ vent has a directly proportional relationship with the temperature
from VDI 6007. The window to wall ratio 𝐴win ∕𝐴wall is 0.67 and the
difference between 𝑇room and 𝑇out and the air exchange rate. The
wall area to room volume ratio 𝐴wall ∕𝑉room is 0.2. For room type 2, internal gains 𝑄̇ gain directly affect the room as convective and radiation
the areas for the exterior wall and window area were swapped. For the heat, in the study the amount of gains is simplified based on the specific
third variation, an interior wall was changed to an exterior wall. internal gains 𝑞̇ IG . The solar gains 𝑄̇ sol depend on the window area
These adjustments change the 𝐴win ∕𝐴wall and the 𝐴wall ∕𝑉room which 𝐴win , the solar irradiance on the window 𝐼sol,win and the 𝑔-value 𝑔 of the
have a large impact on the heat transmission and solar gains (Eq. (5) window (Eq. (6)). The 𝑔-value indicates what proportion of radiation
and (6)). Additionally to the wall areas, the specific heat capacity of the hitting the window is emitted by transmission or by convection –
room was also varied. Light (50 Wh/m2 K) and medium (90 Wh/m2 K) between the window and the room – to the room. The AER and 𝑞̇ IG
thermal masses are studied. Besides the area properties of the room, are equal to the values in Table 2 during the hours of use Monday to
the building physical properties 𝑈wall , 𝑈win , 𝑔-value of the window, as Friday from 7 am - 5 pm. Outside the hours of use, the air change and
well as the usage properties like air exchange rate (AER) and internal internal gains are 0.
6
A. Neubauer et al. Applied Energy 359 (2024) 122668
Fig. 3. Distribution of selected features and the label for the training, validation and
test set.
7
A. Neubauer et al. Applied Energy 359 (2024) 122668
The relationship between the parts of the thermal balance and the
correlations of the features and the heating load are examined. For
better comparability with the FI and the model explainability, the
correlation is determined exclusively for the test data set (December).
According to Sun et al. [3], the features can be divided into mete-
orological information, indoor environmental information, occupancy
related data, time index, building characteristic data, socio-economic
information and historical data. The most frequently used features are
meteorological, historical data and time index. Since the data sets are
simulation data, only data that influence the results of the simulation
are considered. This means, for example, that wind speed is excluded.
Since only occupancies and internal loads of 0 or 100 % are present
in the simulation and these are known, time index is not used. With
the help of this time index, the occupancy could be learned from the
training data. The correlation between GHI and 𝐼sol,so is very strong
(𝑟 = 0.88, see Fig. 2). This strong correlation can cause unrealistic data
instances in the permutation feature importance, for example, if there is
a very high GHI but no 𝐼sol,so . Furthermore, adding a correlated feature
can reduce the importance of the correlated feature, as the FI is split
between both features [30,43]. For this reason, only 𝐼sol,so is considered
in the following considerations and not in combination with GHI. 𝐼sol,so
Fig. 4. Proportion of thermal balance depending on the building characteristics (olive
line = course of the median values). is chosen because it leads to a higher FI when considering GHI and
𝐼sol,so separately. To reduce the number of evaluation parameters, only
one correlation is evaluated in the following. Since the PCC is more for
interval scale and the SCC more for ordinal scales, the PCC is chosen
and internal gains 𝑄IG range from 1.0–86.4% (mean: 15.1%). The
because more values have interval scale (Fig. 2).
mean outside temperature 𝑇 out in Berlin with 2.42 °C is higher than
For the thermal balance, the amount of energy is determined for
in Stuttgart with 2.03 °C. The mean global radiation GHI, on the other
the entire period of December and for the correlation, the hourly
hand, is greater in Stuttgart with 29.1 W to 16.9 W than in Berlin. The features and the label are used to determine the correlation. Since
mean heat demand 𝑄heat for December in Stuttgart with 449.3 kWh is the energy shares cannot be used directly for the correlation, features
lower than that of Berlin with 492.9 kWh. The higher GHI thus have that significantly influence the parts of the thermal balance are used
a greater influence on the 𝑄heat than the lower 𝑇out . The share of for the correlation. The share of 𝑄sol is equated with 𝐼sol,so and the
𝑄IG is higher in Stuttgart and in Berlin a higher share of the thermal share of 𝑄IG with occupancy. For the share of 𝑄heat , 𝑇out is used as
balance has to be covered by 𝑄heat (Fig. 4). The thermal mass (TM) the feature. Since this is the potential for the heat losses (Eq. (4)) in
and the room type (RT) have a minor influence compared to the other the form of transmission (Eq. (5)) and ventilation heat losses. Fig. 5
building characteristics. A larger window area (RT 1 compared to RT shows the PCC between the features and the heating load for all 15 552
2) increases the mean proportion of solar gains. A larger proportion of variants depending on the building characteristics. Higher features –
external walls leads to a higher mean proportion of heating energy (RT i.e. outdoor temperature, solar irradiance and occupancy – lead to
1/2 to RT 3). A lower 𝑈 -value reduces the mean share of the heater a lower heating load. The individual correlations are thus negative.
The correlation between the features and the heating load is strongly
and increases the share of solar and internal gains. A higher 𝑔-value
dependent on the building characteristics. The strongest correlation is
or higher internal gains leads to a larger share of gains and thus to a
on average between 𝑇out and 𝑄̇ heat with values from −0.8990 to 0.1836
reduction in the share of heating. An increasing AER, on the other hand,
(mean: −0.5495). The sporadic positive correlation occurs with a low 𝑈 -
has a negative influence on the share of active heating, so the share value, high internal loads and a low air exchange. In these variants, the
𝑄heat increases with an increasing AER. The results from the parameter heating load of the building is very small and the correlation between
study are plausible and correspond to the expected tendencies from the 𝑇out and 𝑄̇ heat is not given due to the high internal loads and the low
thermal balance (Eq. (4)). heat losses through transmission and ventilation.
To quantify the results of the thermal balance (Fig. 4), a MANOVA With higher 𝑈 -values and AER, the corr(𝐓out , 𝐐̇ heat ) increases. The
is performed between the building characteristics (independent vari- proportion of internal gains in the thermal balance decreases and a
ables) and the thermal balance (dependent variables). The MANOVA larger proportion must be actively provided by a heater. The heating
determines which building characteristics have a great influence on load is thus dependent on the outdoor temperature to a greater ex-
the thermal balance. Table 3 shows the results of the MANOVA. The tent. The correlation trends of corr(𝐓out , 𝐐̇ heat ) for the 𝑈 -values and
python package statsmodels [42] was used to carry out the MANOVA. the AER are contrary to those for the share of 𝑄heat in the thermal
All building characteristics are statistically significant (𝑝 < 0.05). A balance. Since a higher 𝑇out leads to a lower share of 𝑄̇ heat . The
corr(𝐈sol,so , 𝐐̇ heat ) has values ranging from −0.6522 to −0.0134 (mean:
lower WL value means a greater discrimination ability of the building
−0.4531). The corr(𝐓out , 𝐐̇ heat ) and corr(𝐈sol,so , 𝐐̇ heat ) decreases with in-
characteristics (independent variables) to the thermal balance (depen-
creasing 𝑔-values, because a higher 𝑔-value means that there are fewer
dent variables). Thus, the thermal mass and the room type have a very
hours of heating load. Higher 𝑈 -values, on the other hand, lead to
low discriminant of WL = 0.9968 and 0.9313, respectively. The AER a stronger correlation, since heating load is also present at higher
has the lowest WL with 0.4263 and the internal gain the second lowest 𝐼sol,so values. With a correlation of −0.7245 to 0.0676 (mean: −0.4354),
value with 0.4697. The two variables thus have a greater influence on occupancy has the lowest mean correlation of the shown features.
the subdivision into individual groups of the thermal balance parts. The To quantify the influence of the building characteristics on the
𝑈 -value and the 𝑔-value have a WL of 0.5799 and 0.4913, respectively, correlation, a MANOVA is carried out. All building characteristics
indicating a medium discrimination of the dependent variable. show a statistical significance for the correlations. The thermal mass
8
A. Neubauer et al. Applied Energy 359 (2024) 122668
Table 3
Results of the MANOVA to evaluate the influence of the building characteristics on thermal balance, PCC, PFI, RFFI, and SHAP values.
Building Thermal balance (Fig. 4) PCC (Fig. 5) PFI (Fig. 6) RFFI relative SHAP value (Fig. 7)
characteristic WL Rank WL Rank WL Rank WL Rank WL Rank
location 0.6641 3 0.2764 6 0.8901 3 0.9365 3 0.9408 3
thermal mass 0.9968 1 0.9798 1 0.9755 1 0.9782 2 0.9789 1
room type 0.9313 2 0.8644 2 0.9575 2 0.9928 1 0.9762 2
𝑈 -value 0.5799 4 0.5432 4 0.5272 5 0.8957 4 0.6264 5
𝑔-value 0.4913 5 0.5954 3 0.3339 6 0.8957 4 0.5952 6
IG 0.4697 6 0.2952 5 0.5650 4 0.7076 7 0.7151 4
AER 0.4263 7 0.2673 7 0.3185 7 0.7536 6 0.5813 7
Fig. 5. Pearson correlation coefficients between features and heating load depending Fig. 6. Permutation feature importance depending on the building characteristics (olive
on the building characteristics (olive line = course of the median values). line = course of the median values).
(WL = 0.9798) is the parameter with the lowest discrimination ability each run. The average of these 50 runs was used for the evaluation.
of the function and the location and the AER the parameters with the For the RFFI, the random forest regressor from Scikit-learn with 100
highest discrimination ability (WL = 0.2764 and 0.2673), followed by the trees was used. For variants with a low number of heating hours, the
internal gains with a WL of 0.2952. The ranking of the WL values for model accuracy is in some cases poor, which means that the PFI and
the various building characteristics is identical for the thermal balance RFFI have little significance for these cases.
and the PCC, for the thermal mass, room type, resistance and AER. Fig. 6 shows the PFI for the features depending on the building
Building characteristics that have a large or small influence on the characteristics. The lower the value, the lower the significance of the
thermal balance therefore tend to also have a large or small influence feature on the prediction. A negative value of the PFI means that the
on the correlation between the features and the heating load. model with the shuffled feature has a better accuracy than without the
shuffled feature. The highest mean PFI has 𝐼sol,so with values between
4.3. Importance of the features to predict the heating load 0.002 – 0.149 (mean: 0.098). The 𝑇out has an almost identical mean PFI
with values of −0.015 – 0.196 (mean: 0.097). Occupancy has the lowest
To determine the importance of each feature on the heating load, mean PFI with values ranging from −0.017 – 0.082 (mean: 0.010). For
the two methods PFI (Section 3.5) and the RFFI are determined for the RFFI, the mean decrease in impurity (MDI) is calculated. The sum
each individual room variant. The models are trained with the training of all FI is thus always 1. The highest RFFI has 𝑇out with 0.44 – 0.93
set and the PFI and RFFI are determined for the test set. The data (mean: 0.62), followed by the 𝐼sol,so with 0.01 – 0.54 (mean: 0.35). The
sets were scaled to a value range of 0–1. Due to the dimensioned occupancy has very low values of 0.00 – 0.32 (mean: 0.04). 𝑇out has a
evaluation metric and the wide range of heating loads, all features and much higher mean value compared to the other features. The influence
the label (heating load) of the 15 552 variants were scaled for better of 𝑇out is thus higher for the RFFI than for the PCC and the PFI. For the
comparability. The software library XGBoost [44] (eXtreme Gradient RFFI and the PFI, occupancy is the feature with the lowest importance.
Boosting) in Python is used for the model fitting of the PFI. XGBoost The feature with the highest importance is 𝐼sol,so for the PFI and 𝑇out for
is a distributed gradient boosting library that implements machine the RFFI. In the work of Deng et al. [46] it was found that features with
learning algorithms under the Gradient Boosting framework. XGBoost a high cardinality are preferred in RFFI. By permuting the features, this
uses parallel tree boosting. The model has 100 gradient boosted trees. bias can be removed for features with high cardinality. This results in
In PFI, the individual features are shuffled to see how this affects the a low RFFI value for the occupancy, as the occupancy has only two
model score. The NRMSE was used as the model score for the PFI. The values (low cardinality). Using PFI is therefore more appropriate.
PFI is executed with the Python library Scikit-learn [45]. The features The importance of the features for the model needs to be considered
were shuffled 50 times and the PFI for the features was determined for in the context of the building characteristics. The MANOVA shows
9
A. Neubauer et al. Applied Energy 359 (2024) 122668
Figs. 7 and 6). The thermal mass and the room type lead to a low
discrimination ability of the SHAP values, whereas the AER and the
Fig. 7. Relative global SHAP value depending on the building characteristics (olive 𝑔-values lead to the highest discrimination (see Table 3).
line = course of the median values). Fig. 8 shows the relative SHAP values for five cases. This allows to
show the interaction effects of building characteristics. Case 1 is the
default room (room 1) with room type 2, thermal mass m, 𝑈 -values
the influence of the building characteristics for the PFI and the RFFI. (wall: 0.75 W∕m2 K, window: 1.7 W∕m2 K), 𝑔-value 0.45, AER 0.5 h−1
The thermal mass and the room type have very small influence on and 𝑞̇ IG 10.3 W∕m2 K at the location Berlin.
the discreteness of the PFI and the RFFI. The AER has the greatest For each case, a building characteristic with high discriminatory
discrimination ability for the PFI and the internal load for the RFFI. ability of the SHAP values is varied. Poor insulation (case 2) results
The WL has lower values for the PFI than for the RFFI. The PFI in a higher share of the SHAP value for 𝑇out , while the share of the
thus shows a greater discrimination of the values depending on the occupancy decreases. In case 3, the 𝑔-value of the windows is reduced,
building characteristics (see Table 3). The ranking of the WL values resulting in lower SHAP fraction of 𝐼sol,so . Case 4 is also referred to
in the thermal balance and the PFI are similar. Based on the results, a as room 2. Which has the same characteristics as room 1 but poorer
higher proportion of the thermal balance generally leads to a greater FI. insulation, a lower 𝑔-value and a higher 𝑞̇ IG . The relative SHAP value
Furthermore, no general statement can be made about the importance of occu increases significantly and the SHAP value of 𝑇out decreases due
of features in a heating load prediction, as the FI is highly dependent to the higher internal loads in case 4 compared to case 3. In case 5, the
on the building characteristics. AER is changed from 0.5 h−1 to 4 h−1 . The SHAP value share of 𝑇out is
almost identical in case 5 (higher AER, 𝑈 -value and IG) compared to
4.4. Determine model explainability with SHAP values case 1, despite the poorer insulation and the significantly higher AER.
These two characteristics are counteracted by the higher internal gains.
To determine the model explainability, the SHAP values of the Therefore, all building characteristics must be considered holistically
features are determined. The influence of the features is not only con- as they affect the SHAP values. For example, it can be seen that poorer
sidered separately, also the influence of the interrelationship between insulation results in an on average higher relative SHAP value of 𝑇out
the features is considered. Fig. 7 shows the relative global SHAP values (see Fig. 8). However, this can be compensated by very high internal
of the features depending on the building characteristics. Due to the gains (see Fig. 8 case 1 and 5).
different absolute SHAP values of the individual 15 552 room variants In addition to the global model explanation, SHAP values can also
and the resulting lack of comparability, the relative proportions of the be used to carry out a local model explanation. The global evaluation is
SHAP values were determined for each variant. The relative SHAP value the sum of all absolute SHAP values, while the local evaluation shows
is the feature-specific proportion of the sum of all three SHAP values the SHAP values for each time step. Fig. 9 shows the local SHAP values
of 𝑇out , 𝐼sol,so and occupancy. for room 1 and room 2. The upper subplot shows the scaled heating
The SHAP value is determined with the TreeSHAP method [35]. load.
For this purpose, an XGBoost model with 100 gradient boosted trees The sum of the base value and the three SHAP values of 𝑇out , 𝐼sol,so
is fitted and individual features are removed when determining the and occu at the respective time step is equal to the scaled heating
prediction value. The change in the predicted value is evaluated on the load. Where the mean heating load from the training set is the base
basis of the SHAP value. A negative SHAP value means that the value value. Subplot 2 and 3 show the local SHAP values for the default
leads to a reduction of the prediction value, whereas a positive SHAP room (room 1) and the comparison room (room 2). The last subplot
value leads to an increased prediction value. For the global SHAP value, shows the scaled features (𝑇out , 𝐼sol,so and occupancy). The period of
the sum of all local absolute SHAP values is calculated. The feature observation is from the 9th to the 11th of December and includes
𝑇out has the largest relative global SHAP values, which are between use and non-use days. The heating energy demand of room 1 over
32.3–73.9% (mean: 54.8%), followed by 𝐼sol,so with 23.1–49.1% (mean: the period is 13.8 kWh and that of room 2 is 23.1 kWh. The heating
36.8%). The occupancy 0.7–32.2% (mean: 8.5%) has lower proportions load and thus SHAP values are scaled to a range of values from 0–
(see Fig. 7). The relative SHAP values for 𝐼sol,so rise with increasing 1. This allows a better comparison of the variants with each other, as
𝑔-value. The same applies to the SHAP values of the occupancy with they have different maximum heating loads. The SHAP value indicates
increasing internal loads. In general, SHAP values and PFI show similar whether the feature leads to an increase or reduction compared to the
trends for 𝑇out , but diverge in certain cases for 𝐼sol,so and occu (see mean heating load. The global SHAP values for the period from the
10
A. Neubauer et al. Applied Energy 359 (2024) 122668
Fig. 10. Normalised values from thermal balance, PCC, PFI, RF, relative global SHAP
value.
also have a higher relative prediction error when using future features
than buildings with poor insulation and lower occupancy or smaller
windows. In future work, it can be investigated how high the influence
Fig. 9. Local SHAP values for two rooms depending on the feature values. of the prediction error of the features is on the prediction accuracy of
the heating load.
9th to the 11th of December are shown below the feature labels. Solar 4.5. Utilisation of the results for practical applications
radiation occurs during the day on 09.12. This results in a negative
SHAP value of 𝐼sol,so , as solar radiation leads to a reduction in the Based on the SHAP values, buildings can be classified according to
heating load compared to the average heating load. In room 2, the their building characteristics solely based on the input values of the
global SHAP value from 𝐼sol,so is lower because the 𝑔-value of the features and the heating load. Additionally, it is possible to analyse
window is smaller. On the day of occupancy 10.12. The SHAP value of for each time step which feature accounts for what proportion of the
occupancy is reduced due to the presence of people and the associated heating load. This provides precise information on the interaction of
internal load. In room 2, the internal load is three times higher than in individual factors. Only white or grey box models of the building can
room 1, which leads to a significantly lower SHAP value of occupancy. provide this insight. Which requires a lot of information about the
Lower 𝑇out , as in the period from 5 am to 9 am on 9 December, result building. Local SHAP values can therefore be used to provide purely
in high positive SHAP values of 𝑇out , as these lead to an increase in data-driven insights into the building, identifying ways to optimise it.
the heating load. Overall, the global SHAP value for 𝑇out is higher for By determining the influence of solar radiation and occupancy on the
room 1 than for room 2, despite the lower 𝑈 -value. This is due to the heating load, it is also possible to determine a normalised heating en-
interaction of all three SHAP values. Using the local SHAP values it ergy demand for different observation periods, independent of weather
is possible to observe the influence of the individual features on the and occupancy. Furthermore, a hypothetical heating energy demand
SHAP values. The temporal course of the SHAP values can be used to can be calculated by adjusting the feature values such as occupancy
identify different building characteristics such as insulation, occupancy time or solar radiation.
and window characteristics.
For room 1 the calculation of the TreeSHAP was the fastest, with a 5. Conclusion
computation time2 for the standard room variant of 0.13 s, compared to
the RFFI with 0.20 s and the PFI with 0.38 s. In contrast, the KernelSHAP
This work investigated whether there is a systematic relationship
computation time is 7.8 s.
between the importance of features in a heating load prediction and the
For the final comparison of the influencing variables and features
building characteristics, and if so, how strong this relationship is. A sim-
of the thermal balance, the correlation, the FI and the model explain-
ulation model was used to determine the hourly heating load of 15 552
ability, all variables are normalised to a range of 0–1 (see Fig. 10).
different room variants with, e.g. different 𝑈 -values and window areas.
The normalisation of the values is carried out for each category (Share
The thermal balance, PCC between features and heating load, RFFI, PFI
TB, Pearson, PFI, RFFI, relative global SHAP). The values 𝑄heat and the
and SHAP values were calculated and compared for these rooms. It was
PCC 𝑇out have similar normalised values. Whereas the correlations of
found that outside temperature is the most important feature for most
the PCC 𝐼sol,so have much higher values than 𝑄sol . The same counts for
variants. The FI of 𝑇out increases with poorer insulation of the room
the occupancy and 𝑄IG . For PFI, the 𝐼sol,so and the 𝑇out have the highest
mean FI. The RFFI and SHAP values for the individual features are in and decreases with higher internal gains. Based on the simulations
a similar range. Therefore, feature selection methods can sometimes conducted, it was shown that the building characteristics of the room
produce varying results. It is recommended to use SHAP values because are implicitly present in the FI and that there is a relationship between
they provide local and global FI and other advantages. the two values. These results can be used to estimate the FI values for
It can be concluded that the 𝑇out has the highest importance on the the given room characteristics and serve as reference values. The AER
heating load prediction in most variants. Compared to 𝐼sol,so and occu- and the 𝑔-value have the highest discrimination ability on the FI, while
pancy, the 𝑇out probably has the highest prediction accuracy. If future the thermal mass and the room type have a low value. The local SHAP
features with a prediction error are used to predict the heating load, values were used to determine the effect of the feature values on the
the heating load will also be incorrectly predicted. Buildings with good heating load at each time step. This level of detail would otherwise only
insulation and high occupancy or large windows therefore presumably be possible using a white or grey box model of the building.
The accuracy of the model was low for room types with only a
few hours of heating load due to the small amount of training data.
2
Performed with an AMD Ryzen 9 7950X 16-core processor. Therefore, the FI is not necessarily representative for these variants.
11
A. Neubauer et al. Applied Energy 359 (2024) 122668
The simulation models assumed some simplifications, in particular con- [13] Wang Chao, Li Xinyi, Li Hailong. Role of input features in developing data-driven
stant and regular occupancy. An important finding is that the FI must models for building thermal demand forecast. Energy Build 2022;277:112593.
[14] Chaganti Rajasekhar, Rustam Furqan, Daghriri Talal, Díez Isabel de la Torre,
always be considered in the context of the given feature values. For
Mazón Juan Luis Vidal, Rodríguez Carmen Lili, et al. Building heating and
example, the FI in April is different from the FI in December because cooling load prediction using ensemble machine learning model. Sensors
the feature values and their occurrence are different in these months. By 2022;22(19).
using the local SHAP values, these different feature values can be taken [15] Ibrahim Dina M, Almhafdy Abdulbasit, Al-Shargabi Amal A, Alghieth Manal,
into account. In further investigations, the FI can be determined for Elragi Ahmed, Chiclana Francisco. The use of statistical and machine learning
tools to accurately quantify the energy performance of residential buildings.
real buildings and compared with the results of the simulation study.
PeerJ Comput Sci 2022;8:e856.
Moreover, it can be investigated if there is a relationship between the [16] Chung Woong June, Liu Chunde. Analysis of input parameters for deep learning-
building characteristics and the FI of the cooling load. based load prediction for office buildings in different climate zones using
explainable Artificial Intelligence. Energy Build 2022;276:112521.
CRediT authorship contribution statement [17] Deb Chirag, Dai Zhonghao, Schlueter Arno. A machine learning-based framework
for cost-optimal building retrofit. Appl Energy 2021;294:116990.
[18] Wu Kailang, Gu Jie, Meng Lu, Wen Honglin, Ma Jinghuan. An explainable
Alexander Neubauer: Writing – review & editing, Writing – origi- framework for load forecasting of a regional integrated energy system based on
nal draft, Visualization, Validation, Software, Methodology, Investiga- coupled features and multi-task learning. Prot Control Mod Power Syst 2022;7(1).
tion, Conceptualization. Stefan Brandt: Writing – review & editing. [19] Zhang Liang, Wen Jin, Li Yanfei, Chen Jianli, Ye Yunyang, Fu Yangyang,
Livingood William. A review of machine learning in building load prediction.
Martin Kriegel: Supervision, Resources.
Appl Energy 2021;285:116452.
[20] Schober Patrick, Boer Christa, Schwarte Lothar. Correlation coefficients:
Declaration of competing interest Appropriate use and interpretation. Anesth Analg 2018;126:1.
[21] Miao Jianyu, Niu Lingfeng. A survey on feature selection. Procedia Comput Sci
The authors declare that they have no known competing finan- 2016;91:919–26.
[22] Kumar Vipin. Feature selection: A literature review. Smart Comput Rev
cial interests or personal relationships that could have appeared to
2014;4(3).
influence the work reported in this paper. [23] Tadist Khawla, Najah Said, Nikolov Nikola, Mrabti Fatiha, Zahi Azeddine. Feature
selection methods and genomic big data: a systematic review. J Big Data 2019;6.
Data availability [24] Jovic A, Brkić K, Bogunović N. A review of feature selection methods
with applications. In: 2015 38th international convention on information and
communication technology, electronics and microelectronics. 2015, p. 1200–5.
The hourly heating load data sets for the 15 552 room variants are [25] Doshi-Velez Finale, Kim Been. Towards a rigorous science of interpretable
available on https://github.com/AlexanderNeubauer/heating_cooling_ machine learning. 2017.
load_VDI_6007. [26] Biran Or, Cotton Courtenay. Explanation and justification in machine learning:
A survey. In: IJCAI-17 workshop on explainable AI, vol. XAI, no. 1. 2017, p.
8–13.
Acknowledgements
[27] Linardatos Pantelis, Papastefanopoulos Vasilis, Kotsiantis Sotiris. Explainable AI:
A review of machine learning interpretability methods. Entropy 2021;23(1).
This work is funded by the German Federal Ministry for Eco- [28] Gilpin Leilani H, Bau David, Yuan Ben Z, Bajwa Ayesha, Specter Michael, Ka-
nomic Affairs and Climate Action (BMWK) in the framework of the re- gal Lalana. Explaining explanations: An overview of interpretability of machine
search program EnOB:ML-EBESR 03EN1076B. The support is gratefully learning. 2018, http://dx.doi.org/10.48550/ARXIV.1806.00069.
[29] Altmann André, Toloşi Laura, Sander Oliver, Lengauer Thomas. Permuta-
acknowledged.
tion importance: a corrected feature importance measure. Bioinformatics
2010;26(10):1340–7.
References [30] Molnar Christoph. Interpretable machine learning. second ed.. 2022.
[31] Lundberg Scott M, Lee Su-In. A unified approach to interpreting model
[1] IEA- International Energy Agency. The future of heat pumps. 2022. predictions. In: Guyon I, Luxburg U Von, Bengio S, Wallach H, Fergus R,
[2] IEA, International Energy Agency. Renewables 2021 - Analysis and forecast to Vishwanathan S, Garnett R, editors. Advances in neural information processing
2026. 2021. systems, vol. 30. Curran Associates, Inc.; 2017.
[3] Sun Ying, Haghighat Fariborz, Fung Benjamin CM. A review of the-state-of- [32] Shapley LS. 17. A value for n-person games. In: Contributions to the theory
the-art in data-driven approaches for building energy prediction. Energy Build of games (AM-28), volume II. Princeton: Princeton University Press; 1953, p.
2020;221:110022. 307–18.
[4] Kapetanakis Dimitrios-Stavros, Mangina Eleni, Finn Donal P. Input variable [33] Sundararajan Mukund, Najmi Amir. The many Shapley values for model
selection for thermal load predictive models of commercial buildings. Energy explanation. 2019, CoRR abs/1908.08474.
Build 2017;137:13–26. [34] Ribeiro Marco Túlio, Singh Sameer, Guestrin Carlos. "Why should I trust you?":
[5] Arrieta Alejandro Barredo, Díaz-Rodríguez Natalia, Ser Javier Del, Ben- Explaining the predictions of any classifier. 2016, CoRR abs/1602.04938.
netot Adrien, Tabik Siham, Barbado Alberto, et al. Explainable artificial [35] Lundberg Scott M, Erion Gabriel, Chen Hugh, DeGrave Alex, Prutkin Jordan M,
intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward Nair Bala, et al. From local explanations to global understanding with explainable
responsible AI. 2019. AI for trees. Nat Mach Intell 2020;2(1):56–67.
[6] Yu Jiaqi, Chang Wen-Shao, Dong Yu. Building energy prediction models and [36] van Zyl Corne, Ye Xianming, Naidoo Raj. Harnessing explainable artificial
related uncertainties: A review. Buildings 2022;12(8). intelligence for feature selection in time series energy forecasting: A comparative
[7] Ramirez-Vergara Jose, Bosman Lisa B, Leon-Salas Walter D, Wollega Ebisa. Am- analysis of Grad-CAM and SHAP. Appl Energy 2024;353:122079.
bient temperature and solar irradiance forecasting prediction horizon sensitivity [37] Sim Taeyong, Choi Seonbin, Kim Yunjae, Youn Su Hyun, Jang Dong-Jin,
analysis. Mach Learn Appl 2021;6:100128. Lee Sujin, et al. Explainable AI (XAI)-Based input variable selection methodology
[8] Wang Zhe, Hong Tianzhen, Piette Mary Ann. Building thermal load pre- for forecasting energy consumption. Electronics 2022;11(18). URL https://www.
diction through shallow machine learning and deep learning. Appl Energy mdpi.com/2079-9292/11/18/2947.
2020;263:114683. [38] Verein Deutscher Ingenieure eV(VDI). 6007-1:2012-03: Calculation of transient
[9] Machlev R, Heistrene L, Perl M, Levy KY, Belikov J, Mannor S, et al. Explainable thermal response of rooms and buildings - modelling of rooms. March 2012.
artificial intelligence (XAI) techniques for energy and power systems: Review, [39] Verein Deutscher Ingenieure eV(VDI). 6007-2:2012-03: Calculation of transient
challenges and opportunities. Energy AI 2022;9:100169. thermal response of rooms and buildings - modelling of windows. March 2012.
[10] Sakkas Nikolaos P, Abang Roger. Thermal load prediction of communal district [40] Lauster M, Teichmann J, Fuchs M, Streblow R, Mueller D. Low order thermal
heating systems by applying data-driven machine learning methods. Energy Rep network models for dynamic simulations of buildings on city district scale. Build
2022;8:1883–95. Environ 2014;73:223–31.
[11] Chen Yongbao, Guo Mingyue, Chen Zhisen, Chen Zhe, Ji Ying. Physical energy [41] Wetter Michael, Zuo Wangda, Nouidui Thierry S, Pang Xiufeng. Modelica
and data-driven models in building energy prediction: A review. Energy Rep Buildings library. J Build Perform Simul 2014;7(4):253–70.
2022;8:2656–71. [42] Seabold Skipper, Perktold Josef. Statsmodels: Econometric and statistical
[12] Fan Cheng, Xiao Fu, Wang Shengwei. Development of prediction models for next- modeling with python. In: 9th python in science conference. 2010.
day building energy consumption and peak power demand using data mining [43] Kaneko Hiromasa. Cross-validated permutation feature importance considering
techniques. Appl Energy 2014;127:1–10. correlation between features. Anal Sci Adv 2022;3(9–10):278–87.
12
A. Neubauer et al. Applied Energy 359 (2024) 122668
[44] Chen Tianqi, Guestrin Carlos. XGBoost: A scalable tree boosting system. In: [45] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al.
Proceedings of the 22nd ACM SIGKDD international conference on knowledge Scikit-learn: Machine learning in Python. J Mach Learn Res 2011;12:2825–30.
discovery and data mining. KDD ’16, New York, NY, USA: Association for [46] Deng Houtao, Runger George, Tuv Eugene. Bias of importance measures for
Computing Machinery; 2016, p. 785–94. multi-valued attributes and solutions. In: Proceedings of the 21st international
conference on artificial neural networks - volume part II. Berlin, Heidelberg:
Springer-Verlag; 2011, p. 293–300.
13