Processes 11 03299

processes
Article
Performance Evaluation of Chiller Fault Detection and
Diagnosis Using Only Field-Installed Sensors
Zhanwei Wang 1,2,*, Jingjing Guo 1,2, Sai Zhou 1,2 and Penghua Xia 1,2
1 Institute of Building Energy and Thermal Science, Henan University of Science and Technology,
Luoyang 471023, China; [email protected] (J.G.); [email protected] (S.Z.);
[email protected] (P.X.)
2 Henan Provincial Engineering Research Center of Building Environmental Control and Safety,
Luoyang 471023, China
* Correspondence: [email protected]
Abstract: Owing to the rapid expansion of data science, data-driven methods have emerged as a
dominant trend in chiller fault detection and diagnosis (FDD). Most of these methods prioritize
feature selection to achieve optimal diagnostic performance. However, field surveys indicate that only
a limited number of sensors are commonly installed on site and that diagnostic costs must be kept to a
minimum. This discrepancy between existing research’s feature selection principles and the current on-site
sensor installation status presents a significant challenge. To facilitate the practical implementation of
data-driven methods in real chiller units, this study addresses a critical question: under the constraint
of limited on-site sensor installations, what is the optimal performance achievable by data-driven
methods and their improved versions? To answer this, only features derived from commonly installed
sensors on field chillers are chosen as indicators for typical chiller faults. The FDD performance of
six frequently used data-driven methods, namely, back-propagation neural network, convolutional
neural network, support vector machine, support vector data description, Bayesian network, and
random forest, along with their improved versions, is comprehensively evaluated and validated
using experimental data, considering four evaluation metrics. The conclusions drawn in this paper
provide valuable insights for users/manufacturers with limited or no budget, detailing the best
achievable diagnostic performance for each typical fault and offering guidance for those aiming to
further enhance FDD performance.
Keywords: chiller; fault detection; fault diagnosis; data-driven; field application

Citation: Wang, Z.; Guo, J.; Zhou, S.; Xia, P. Performance Evaluation of Chiller Fault Detection and Diagnosis Using Only Field-Installed Sensors. Processes 2023, 11, 3299. https://doi.org/10.3390/pr11123299
Academic Editors: Iole Nardi and Domenico Palladino
Received: 25 October 2023
Revised: 18 November 2023
Accepted: 20 November 2023
Published: 26 November 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Processes 2023, 11, 3299. https://doi.org/10.3390/pr11123299 https://www.mdpi.com/journal/processes

1. Introduction
The world is currently grappling with two major challenges: energy and the environment. According to research conducted by the International Institute of Refrigeration, there are approximately 3 billion refrigeration, air conditioning, and heat pump units worldwide, which collectively consume nearly 17% of global electricity [1]. Recognizing the significance of this issue, China has placed great emphasis on curbing carbon emissions and aims to reach peak carbon emissions by 2030 and achieve carbon neutrality by 2060.
Within China, the heating, ventilating, and air conditioning (HVAC) sector constitutes a substantial portion of energy consumption, accounting for around 60% of total building energy usage [2]. Chillers, specifically, account for 40% of the energy consumed by buildings [3]. However, chiller failures can result in additional energy wastage, reduced equipment lifespan, compromised indoor comfort, and even environmental pollution through refrigerant leakage [4]. These consequences underscore the importance of fault detection and diagnosis (FDD) in chiller systems.
Over the past few decades, extensive research has been carried out in the field of chiller FDD, as comprehensively summarized and reviewed by Zhao et al. [5] and Mirnaghi et al. [6]. Many scholars have introduced numerous methods and validated them using
experimental data, demonstrating their excellent performance. These methods have been
categorized into different types from varying perspectives. For example, Wei et al. [7]
classified FDD methods as white-box-based, grey-box-based, and black-box-based, while
Katipamula and Brambley [4], and later in their subsequent work [8], categorized them
as quantitative model-based, qualitative model-based, or process history (data-driven).
In essence, these classifications do not differ significantly.
Currently, there is a prevailing trend in the development of FDD methods applied
to chillers: an increasing number of methods are leveraging data-driven algorithms. This
shift is attributed to the rapid expansion of data science in recent years. Notably, recent
review articles by Mirnaghi et al. [6], Wei et al. [7], Shi et al. [9], Chen et al. [10], and
Chen et al. [11] have concentrated on data-driven FDD. Summarizing insights from these
reviews reveals that almost 80% of the recently proposed methods fall under the category
of data-driven approaches. For FDD within the HVAC domain, the following six methods,
grouped here into five typical approaches, are among the most frequently employed. The first typical
approach is neural networks, with the back-propagation neural network (BPNN) and the convolutional
neural network (CNN) being widely used architectures. They have been
applied to anomaly detection for HVAC systems in the work of Borda et al. [12], and to
FDD of chillers in the works of Gao et al. [13] and Gao et al. [14]. The second typical
data-driven approach is support vector machine (SVM), which has been employed for
FDD of HVAC systems in works by Han et al. [15], Fan et al. [16], Amir et al. [17], and
Van et al. [18]. Support vector data description (SVDD) is the third typical data-driven
approach, and its application to FDD in the HVAC field can be found in works by
Zhao et al. [19], Chen et al. [20], and Zhang et al. [21]. Bayesian network (BN) is the fourth
typical data-driven approach, and its application to FDD of HVAC systems is demonstrated
in works by Wang et al. [22,23], Ng et al. [24], Li et al. [25], and Liu et al. [26]. Lastly, random
forest (RF) represents the fifth typical data-driven approach, and its application to FDD of
HVAC systems can be observed in works by Han et al. [15] and Li et al. [27]. When applying
these approaches to the FDD of chillers, their performance has been extensively validated
in the existing literature. Published studies report diagnostic accuracies exceeding 90%
for nearly all typical chiller faults, with some reaching as high as 99%. However, such
exceptional diagnostic performance is contingent upon having an ample set of features.
The question of whether such excellent diagnostic performance can still be achieved when
the available features are extremely limited warrants in-depth investigation.
In addition to the previously mentioned widely employed methods, there are several
other approaches in use, including autoencoder [28], semi-supervised learning [29], as
well as unsupervised learning methods such as principal component analysis [30], associ-
ation rules mining [31], and cluster analysis [32], and other novel methods, e.g., domain
adaptation networks with parameter-free adaptively rectified linear units [33], dual-path
mixed-domain residual threshold networks [34], and wavelet neural network [35].
Data-driven FDD methods, in general, rely on identifying patterns in the measurements of
selected features to detect and diagnose faults [5]. Therefore, the specific quantity and types
of features used in the training process significantly affect the accuracy of these models. In
the context of chiller FDD, numerous efforts have been made to identify suitable features.
Comstock et al. [36], Zhou et al. [37], and Zhao et al. [19] selected 14, 8, and 16 features,
respectively, from the original features using experimental analysis, sensitivity analysis,
and expert knowledge. Bai et al. [38] employed a feature recognition model and kernel
discriminant analysis to select 12 features indicative of the chiller’s healthy conditions.
Zhang et al. [39] developed a hybrid algorithm that combines filter and wrapper methods
to generate an optimal feature set with the ideal number of features. Han et al. [40] and
Gao et al. [41], based on optimization algorithms and global sensitivity analysis, selected 8 and
14 of the most effective features from an initial set of 64 features, respectively. Yan et al. [42],
using cost-sensitive classification accuracy, identified an optimal subset of 16 features for
chiller FDD. Nevertheless, the majority of data-driven methods focus on selecting features
to achieve optimal diagnostic performance, often overlooking the practical constraints
imposed by on-site sensor installations.
Analysis based on on-site research reveals a dual scenario: not only are a limited
number of sensors typically installed on-site, but there is also a demand for minimizing
diagnostic costs. Two comprehensive studies by Wang et al. [43] and Zhao et al. [44] inves-
tigated field chillers, encompassing 22 chillers in China and 14 chillers in America and Italy,
respectively. The findings from these field investigations confirm that only a few
sensors are generally installed in chillers, primarily for operational regulation and control.
The reluctance of users and manufacturers to install an extensive array of sensors in chillers
stems from cost considerations. Furthermore, for chiller FDD to achieve a cost-to-benefit ratio
comparable to that in critical applications such as nuclear power plants, aircraft, and chemical
process plants, it is imperative to minimize its cost. This necessitates reducing the
installation cost of sensors to maximize economic benefits.
The disparity between the existing research’s feature selection principles and the
current on-site sensor installation status has given rise to a significant contradiction. The
practical reality of having a limited number of on-site sensors and the imperative for
cost-effective applications strongly constrain the types and quantity of features that FDD
methods can utilize. As a result, the features chosen by the majority of methods in the
literature do not align with the sensors actually installed on chillers in the field. This
incongruence is a fundamental impediment to the widespread application of proposed FDD
methods in practical scenarios. For instance, to attain optimal diagnostic performance, a
considerable number of data-driven methods incorporate features measured by sensors that
are not typically found on field chillers. Some even resort to virtual features that can only be
obtained in controlled laboratory environments. For example, the water flow rate is a key
sensitive feature because it directly indicates abnormal water flow, and it has been used in
many works [40–42]. Still, water flow meters are not commonly installed owing to their high
installation and maintenance costs. Even relatively inexpensive sub-cooling sensors, used in
other works [19,36,43], are not typically found in the field, emphasizing the cost considerations
of users and manufacturers. In laboratory settings, auxiliary loops may be designed to provide
precise control for simulating faults, but the features from these auxiliary loops are not present
in an actual chiller. These features are referred to as virtual features and are utilized in some
works [40–42].
Through the aforementioned analysis, it is clear that data-driven methods are a prevail-
ing trend, and the restricted installation of on-site sensors is a practical reality. To facilitate
the practical implementation of data-driven methods in actual chiller units, a critical question
needs to be addressed: under the constraint of limited on-site sensor installations,
what is the optimal performance achievable by data-driven methods and their improved
versions? This question takes into consideration both economic constraints and the cost-to-
benefit ratio of FDD. Addressing this question serves two valuable purposes: (1) informing
users/manufacturers with limited or no budget about the best achievable diagnostic per-
formance for each typical fault, and (2) offering guidance for users/manufacturers aiming
to further improve FDD performance. To tackle this question and attain the correspond-
ing beneficial outcomes, only features derived from commonly installed sensors on field
chillers are chosen as indicators for typical chiller faults. These selected features encom-
pass 8 directly measured features and an additional 10 calculated features. Seven typical
chiller faults are considered, and a comprehensive evaluation across six frequently used
data-driven methods (BPNN, CNN, SVM, SVDD, BN, and RF) is conducted using these
selected features, taking into account four evaluation metrics.
The major contributions of this work are as follows:
(1) Whether mainstream data-driven FDD methods can still achieve the expected perfor-
mance when relying solely on features obtained from sensors universally installed
on-site is investigated;
(2) For each fault, insights into the optimal FDD performance achievable by data-driven
models using solely field-installed sensors are provided;
(3) The conclusions drawn provide guidance on whether additional sensor installations
are necessary, serving as a reference for optimizing the cost–benefit ratio in chiller FDD.
2. Methods and Materials
2.1. Frameworks of Data-Driven FDD Methods
According to their classification mechanisms, FDD approaches can be classified into one-class
classification-based and multi-class classification-based ones. Figures 1 and 2 illustrate the FDD
frameworks for one-class classification and multi-class classification, respectively. Among the six
data-driven methods considered in this study, SVDD and BN are used as one-class classification
methods, while BPNN, CNN, SVM, and RF are employed as multi-class classification methods.
[Flowchart with two panels. "Offline training of models": historical data (normal and fault data), steady-state detector, feature selection and data normalization, split into training and testing data, and training of a fault-free model and one model per fault (Fault 1 to Fault n), each retrained until satisfactory. "Online FDD": online real-time measurements pass through the same pre-processing; the fault-free model decides whether a fault is detected, then the fault models decide whether it is diagnosed, giving Normal, Known fault, or Unknown fault.]
Figure 1. FDD framework of the one-class classification-based methods.
For the one-class classification-based methods, the FDD problem is transformed into
a single-class classification problem. Faults are detected and diagnosed step by step.
The process involves sequentially identifying the presence or absence of individual fault
conditions. Once a fault is detected, the subsequent step focuses on diagnosing the
specific fault.
On the contrary, the multi-class classification-based methods transform the FDD prob-
lem into a multi-class classification problem. They simultaneously detect and diagnose
faults using a single-step completion strategy. This involves directly classifying the system
into fault classes or determining the presence of specific fault conditions in a single classifi-
cation step. These approaches provide a more immediate detection and diagnosis of faults,
without the need for sequential steps.
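The two online decision flows described above differ mainly in their control logic, which the following Python sketch contrasts. It assumes a hypothetical interface in which each trained one-class model is a callable returning True when a sample belongs to its class, and the multi-class model is a callable returning a class label. The function names and the single-feature toy models in the example are illustrative only; they do not represent the trained models of this study.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
# Hypothetical interface: a one-class model answers "does x belong to my class?"
OneClassModel = Callable[[Features], bool]
# Hypothetical interface: a multi-class model returns "Normal" or a fault label.
MultiClassModel = Callable[[Features], str]


def one_class_fdd(x: Features, fault_free: OneClassModel,
                  fault_models: Dict[str, OneClassModel]) -> str:
    """Step-by-step FDD: detect with the fault-free model, then diagnose fault by fault."""
    if fault_free(x):
        return "Normal"
    for fault_name, model in fault_models.items():  # known faults checked one by one
        if model(x):
            return fault_name
    return "Unknown fault"  # a fault was detected but no known fault model accepts it


def multi_class_fdd(x: Features, classifier: MultiClassModel) -> str:
    """Single-step FDD: one classifier detects and diagnoses simultaneously."""
    return classifier(x)


if __name__ == "__main__":
    # Toy models on one hypothetical feature (condenser water temperature difference, K).
    fault_free = lambda x: 4.0 <= x[0] <= 6.0
    fault_models = {"RedCdW": lambda x: x[0] > 6.0}
    for sample in ([5.0], [7.5], [2.0]):
        print(sample, one_class_fdd(sample, fault_free, fault_models))
```

Only the one-class flow can return an unknown fault; the multi-class flow always assigns one of the classes it was trained on.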
[Flowchart with two panels. "Offline training of models": historical data (normal and fault data), steady-state detector, feature selection and data normalization, split into training and testing data, and training of a single FDD model until satisfactory. "Online FDD": online real-time measurements pass through the same pre-processing and are classified by the trained FDD model as Normal or a known fault.]
Figure 2. FDD framework of the multi-class classification-based methods.
2.1.1. One-Class Classification
As depicted in Figure 1, the FDD method employing a one-class classification mechanism consists of two main parts: offline model training and online FDD.
During the offline model training process, historical data containing fault-free and faulty instances are collected from the database. These data undergo pre-processing steps. Firstly, a steady-state filter is applied to remove obvious outliers and dynamic data. Secondly, appropriate features are selected to effectively represent the health states of the system. Thirdly, the steady-state data are normalized to eliminate any discrepancies in magnitude among them. Subsequently, the pre-processed data are divided into training and testing sets. The training set is used to train the models, while the testing set evaluates the performance of the trained models. Finally, the models are trained according to predetermined principles and criteria.
In the online FDD process, the trained models are utilized for FDD in real time. Firstly, the online real-time data undergo the same pre-processing steps as those during the offline training process. Then, the fault-free model is applied to detect faults. If a fault is detected, the real-time data are input into the trained models corresponding to each known fault, one by one. This allows the specific fault to be determined based on the outputs of the trained models.
2.1.2. Multi-Class Classification
As illustrated in Figure 2, the FDD method utilizing a multi-class classification mechanism also comprises two main components: offline model training and online FDD.
The data pre-processing, feature selection, and data normalization steps remain the
same for both the one-class classification and multi-class classification methods. The
differences between them can be summarized as follows:
Offline model training: for the multi-class classification methods, a single (or inte-
grated) FDD model is trained to handle both fault detection and diagnosis. The training
process focuses on optimizing this specific model to accurately classify the system’s states.
In contrast, the one-class classification methods involve training multiple models, each
dedicated to detecting and diagnosing a specific fault condition;
Online FDD process: during the online FDD phase, the trained models for the multi-
class classification methods are applied to real-time data for both fault detection and
diagnosis. The significant distinction is that the FDD results are generated simultaneously.
In other words, the models provide an immediate assessment of the system’s status, indicating
whether it is operating normally or experiencing a specific fault.
2.2. Investigation of Field Chiller Onboard Sensors

The knowledge regarding the sensors commonly installed on field chillers has been
extensively investigated in previous studies. Wang et al. [43] conducted a survey in
which they randomly selected and investigated 22 machine rooms with chillers in Shaanxi
province, China. Similarly, Zhao et al. [44] conducted a survey in which they randomly
selected and investigated 14 field chillers from three different manufacturers. The 14 field
chillers were located in four cities of America and Italy. Interestingly, both studies reported
consistent results.
These investigations revealed that field chillers commonly have
eight sensors installed. These sensors and their respective features are listed in Table 1.
Furthermore, additional features can be derived by calculating specific parameters based
on the measurements obtained from the sensors in Table 1. In theory, an extensive array of
features can be derived from the features presented in Table 1. However, from a statistical
standpoint, attempting to list all these features is both impractical and unnecessary. In
this paper, additional features are generated according to three key principles: (1) the
features can be acquired through straightforward calculations, because in real-world on-site
applications elaborate computations require extra computing and storage resources, which
inevitably drive up the cost of FDD implementations; (2) the features have clear thermodynamic
meaning and can effectively characterize the thermodynamic performance of chillers; and
(3) the features are frequently employed by other researchers, and their efficacy has been
validated in prior studies. Adhering to these three principles, the additional features are
identified and listed in Table 2.
Table 1. The respective features measured by the commonly installed sensors.
No. Designation Description Formulation

1 TEI Entering evaporator water temperature Direct measurement
2 TEO Leaving evaporator water temperature Direct measurement
3 TCI Entering condenser water temperature Direct measurement
4 TCO Leaving condenser water temperature Direct measurement
5 Pin Compressor input power Direct measurement
6 TRE/PRE Evaporating temperature/pressure Direct measurement
7 TRC/PRC Condensing temperature/pressure Direct measurement
8 TRdis Refrigerant discharge temperature Direct measurement
Based on the investigation results, the features shown in Tables 1 and 2, obtained
from the commonly installed sensors, are selected to indicate the typical faults of a chiller.
Under these field constraints, this study addresses whether mainstream data-driven FDD
methods can still achieve the expected performance.
Table 2. The features derived from the parameters in Table 1.
No. Designation Description Formulation
1 ∆Te Evaporator water temperature difference ∆Te = TEI − TEO
2 ∆Tc Condenser water temperature difference ∆Tc = TCO − TCI
3 LMTDe Logarithmic mean temperature difference of evaporator LMTDe = (TEI − TEO)/ln[(TEI − TRE)/(TEO − TRE)]
4 LMTDc Logarithmic mean temperature difference of condenser LMTDc = (TCO − TCI)/ln[(TRC − TCI)/(TRC − TCO)]
5 TEA Evaporator approach temperature TEA = TEO − TRE
6 TCA Condenser approach temperature TCA = TRC − TCO
7 Tshdis Refrigerant discharge superheat temperature Tshdis = TRdis − TRC
8 ξsat,c Heat transfer efficiency in saturation section of condenser ξsat,c = (TCO − TCI)/(TRC − TCI)
9 ξsh,c Heat transfer efficiency in superheat section of condenser ξsh,c = Tshdis/(TRdis − TCO)
10 ξsat,e Heat transfer efficiency in saturation section of evaporator ξsat,e = Tshdis/(TRdis − TCO)
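Because the Table 2 features need only arithmetic on the Table 1 measurements, they are cheap to compute on site. The Python sketch below computes them and assembles the 18-feature vector (8 measured plus 10 derived). It assumes steady operation with TEI > TEO > TRE and TRC > TCO > TCI, so that the logarithms and ratios are defined, and that evaporating and condensing pressures have already been converted to saturation temperatures. As printed, the ξsat,e formula repeats the ξsh,c expression; the sketch instead uses (TEI − TEO)/(TEI − TRE), an assumed evaporator analogue of ξsat,c that should be checked against the original definition.

```python
import math
from dataclasses import dataclass, astuple


@dataclass
class Measurements:
    """The eight field-installed measurements of Table 1 (temperatures in °C, power in kW).

    TRE and TRC are saturation temperatures; measured pressures are assumed to be
    converted to saturation temperatures beforehand.
    """
    TEI: float
    TEO: float
    TCI: float
    TCO: float
    Pin: float
    TRE: float
    TRC: float
    TRdis: float


def derived_features(m: Measurements) -> dict:
    """The ten derived features of Table 2 (assumes TEI > TEO > TRE and TRC > TCO > TCI)."""
    dTe = m.TEI - m.TEO
    dTc = m.TCO - m.TCI
    tsh_dis = m.TRdis - m.TRC
    return {
        "dTe": dTe,
        "dTc": dTc,
        "LMTDe": dTe / math.log((m.TEI - m.TRE) / (m.TEO - m.TRE)),
        "LMTDc": dTc / math.log((m.TRC - m.TCI) / (m.TRC - m.TCO)),
        "TEA": m.TEO - m.TRE,
        "TCA": m.TRC - m.TCO,
        "Tshdis": tsh_dis,
        "xi_sat_c": dTc / (m.TRC - m.TCI),
        "xi_sh_c": tsh_dis / (m.TRdis - m.TCO),
        # ASSUMPTION: evaporator analogue of xi_sat_c, not the expression printed in Table 2.
        "xi_sat_e": dTe / (m.TEI - m.TRE),
    }


def feature_vector(m: Measurements) -> list:
    """The 8 measured features followed by the 10 derived ones (18 in total)."""
    return list(astuple(m)) + list(derived_features(m).values())


if __name__ == "__main__":
    # Illustrative operating point, not data from RP-1043.
    m = Measurements(TEI=12.0, TEO=7.0, TCI=30.0, TCO=35.0,
                     Pin=60.0, TRE=4.0, TRC=37.0, TRdis=45.0)
    for name, value in derived_features(m).items():
        print(f"{name:9s} {value:.3f}")
```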
2.3. Experimental Data and Model Evaluation

In this section, the performance of current mainstream data-driven methods is assessed
by applying them to an experimental chiller. The chiller is a centrifugal water-cooled chiller,
as reported in ASHRAE RP-1043 [45]. The FDD performance is evaluated comprehensively
using four evaluation metrics.
2.3.1. Experimental Data

The experimental data from ASHRAE RP-1043 [45] are utilized to evaluate the FDD
performance of the data-driven methods. ASHRAE RP-1043 is a first-phase research project
initiated by ASHRAE in the late 1990s, with the aim of providing a standardized evaluation
tool for chiller FDD methods. This experiment was conducted on a
centrifugal water-cooled chiller to collect a substantial amount of data encompassing both
normal operating conditions and various fault scenarios. These data have become a widely
adopted benchmark dataset and have been utilized by numerous researchers to assess and
validate their proposed FDD methods.
The chiller is a centrifugal water-cooled chiller with a cooling capacity of 316 kW, uti-
lizing R134a as the refrigerant. The condenser and evaporator are both shell and tube heat
exchangers. Seven typical chiller faults were deliberately induced during the experiment,
including reduced condenser water flow (RedCdW), reduced evaporator water flow (Re-
dEvW), refrigerant leakage (RefLeak), refrigerant overcharge (RefOver), condenser fouling
(CdFoul), non-condensable gas in refrigerant (NcG), and excess oil (ExOil). Each fault
condition was evaluated at four severity levels (SL-1, SL-2, SL-3, and SL-4), representing
increasing severity from 10% to 40%.
The experimental measurements were collected under 27 distinct operating conditions,
achieved by varying the evaporator water leaving temperature, condenser water entering
temperature, and the cooling load. A total of 64 variables were recorded at 10 s
intervals during each operating condition. For more detailed information regarding the
experimental setup and data collection process, refer to Ref. [45].
2.3.2. Feature Selection and Data Pre-Processing

Based on the investigation results presented in Section 2.2, the features listed in
Tables 1 and 2 are derived from common field-installed sensors. To address the question of
whether mainstream data-driven FDD methods can still effectively achieve the expected
performance under the constraint of limited sensors, two comprehensive evaluation cases
are established: Case 1 and Case 2.
In Case 1, the evaluation is conducted using only the 8 features listed in Table 1. These
features represent the minimum set of features that can be obtained from the commonly
installed sensors. In Case 2, the evaluation incorporates both the 8 features from Table 1
and the 10 additional features in Table 2.
To filter the experimental dataset and remove obvious outlying and dynamic data, the
steady-state data filter proposed by Rossi et al. [46] was employed. This filter has demon-
strated its effectiveness in chiller FDD, as shown in the works of Zhao et al. [19]. In this
study, 3 features (TCI, TEI, and TEO) were chosen as steady-state indexes. By calculating
the window averages of these features and using ±3 times the standard deviation as upper
and lower thresholds, the experimental dataset was effectively filtered.
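A simplified version of this thresholding step is sketched below in Python. It is not a reimplementation of the Rossi and Braun filter [46]; it assumes a trailing window of preceding samples, whose length (`window = 10`) is a hypothetical choice because the text does not state it, and it keeps a sample only when every steady-state index (TCI, TEI, TEO) lies within the window mean ±3 standard deviations. The first `window` samples have no history and are discarded.

```python
from statistics import mean, pstdev
from typing import Dict, List


def steady_state_mask(indexes: Dict[str, List[float]], window: int = 10,
                      k: float = 3.0) -> List[bool]:
    """True where every index lies within mean ± k·std of its trailing window."""
    n = len(next(iter(indexes.values())))
    mask = [False] * n  # samples without a full window of history are dropped
    for i in range(window, n):
        mask[i] = all(
            abs(values[i] - mean(values[i - window:i])) <= k * pstdev(values[i - window:i])
            for values in indexes.values()
        )
    return mask


if __name__ == "__main__":
    tei = [12.0, 12.2] * 10
    tei[15] = 15.0  # an obvious outlier
    data = {"TCI": [30.0] * 20, "TEI": tei, "TEO": [7.0] * 20}
    mask = steady_state_mask(data)
    print([i for i, keep in enumerate(mask) if keep])
```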
The steady-state data obtained after filtering were then normalized using the maxi-
mum and minimum method. Subsequently, the normalized steady-state data were ran-
domly divided into training and testing datasets. After the filtering process, approximately
30–40% of the original complete data were retained. For each normal operating condition
and each fault at each severity level, 1200 samples were randomly selected. The ratio of
samples in the training set to the testing set was set as 2:1. Consequently, for normal and
each fault at each severity level, there were 800 samples for training and 400 samples for
testing. In total, the training set consisted of 23,200 samples, and the testing set consisted of
11,600 samples. These training and testing datasets were utilized for training and evaluating
the FDD models, respectively.
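The normalization and sampling arithmetic can be checked with a short Python sketch. The condition labels, the random seed, and mapping constant feature columns to 0.0 are assumptions; the counts follow the text: one normal condition plus 7 faults at 4 severity levels gives 29 conditions, so 29 × 800 = 23,200 training and 29 × 400 = 11,600 testing samples.

```python
import random
from typing import Dict, List, Tuple

Sample = List[float]


def min_max_normalize(samples: List[Sample]) -> List[Sample]:
    """Scale every feature column to [0, 1]; constant columns map to 0.0 (an assumption)."""
    columns = list(zip(*samples))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(s, lo, hi)]
            for s in samples]


def split_per_condition(data: Dict[str, List[Sample]], n_per_condition: int = 1200,
                        train_parts: int = 2, test_parts: int = 1, seed: int = 0
                        ) -> Tuple[List[Tuple[Sample, str]], List[Tuple[Sample, str]]]:
    """Randomly draw n samples per condition and split them train:test = 2:1."""
    rng = random.Random(seed)
    n_train = n_per_condition * train_parts // (train_parts + test_parts)  # 800
    train, test = [], []
    for label, samples in data.items():
        picked = rng.sample(samples, n_per_condition)  # without replacement
        train += [(s, label) for s in picked[:n_train]]
        test += [(s, label) for s in picked[n_train:]]
    return train, test


if __name__ == "__main__":
    faults = ["RedCdW", "RedEvW", "RefLeak", "RefOver", "CdFoul", "NcG", "ExOil"]
    conditions = ["Normal"] + [f"{f}-SL{k}" for f in faults for k in range(1, 5)]
    # Dummy stand-in for the filtered, normalized steady-state data of each condition.
    data = {c: [[float(i)] for i in range(1500)] for c in conditions}
    train, test = split_per_condition(data)
    print(len(conditions), len(train), len(test))  # 29 23200 11600
```

Following the text, normalization is applied before the split; fitting the minimum and maximum on the training set only and reusing them for the test set is a common alternative that avoids leaking test-set information.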
2.3.3. Development of Foundational FDD Models

BPNN, CNN, SVM, SVDD, BN, and RF are chosen as representatives of data-driven
methods. According to literature reviews by Zhao et al. [5], Mirnaghi et al. [6], Chen
et al. [10], and Chen et al. [11], these methods are frequently employed and widely applied
in chiller FDD. Additionally, these methods serve as foundational models and are often
subject to improvement. For example, researchers frequently employ optimization algo-
rithms (such as genetic algorithm) to refine the parameters of these foundational models,
aiming to enhance their FDD performance. In this paper, BPNN, CNN, SVM, SVDD, BN,
and RF are initially used to evaluate the selected features as foundational models.
The determination of model parameters and the development of foundational FDD
models are as follows. To ensure fairness and minimize effort in model development,
typical setups found in the literature are adopted to develop these foundational FDD
models. The choice of these references followed two principles: (1) the selected literature
used the same FDD algorithm; and (2) the literature concentrated on chillers, covering
essentially the same set of typical chiller faults, and drew on the same ASHRAE RP-1043
dataset. These two principles ensure that the model parameters in the referenced literature
are highly relevant to the scenarios presented in this paper.
For the BPNN model, a three-layer BPNN architecture was employed. Considering
the number of features, the input layer consisted of either 8 or 18 nodes, and the output
layer consisted of 8 nodes representing normal and the 7 typical faults. According to the
suggestion from Wang [47], the number of nodes in the hidden layer was determined as
2n + 1 (where n is the number of nodes in the input layer). The BPNN model was then
trained using the training dataset.
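A three-layer BPNN of this shape can be sketched with NumPy as below. Only the layer sizes come from the text (8 or 18 inputs, 2n + 1 hidden nodes, 8 outputs); the sigmoid hidden activation, the softmax output with cross-entropy loss, the weight initialization, the learning rate, and the number of full-batch epochs are assumptions chosen for a compact example.

```python
import numpy as np


def hidden_size(n_inputs: int) -> int:
    """Hidden-layer size 2n + 1, following the suggestion attributed to Wang [47]."""
    return 2 * n_inputs + 1


class BPNN:
    """Three-layer network: n inputs -> 2n + 1 sigmoid units -> softmax over 8 classes."""

    def __init__(self, n_inputs: int, n_outputs: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        n_hidden = hidden_size(n_inputs)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_outputs))
        self.b2 = np.zeros(n_outputs)

    def _forward(self, X: np.ndarray):
        H = 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))  # sigmoid hidden layer
        Z = H @ self.W2 + self.b2
        Z -= Z.max(axis=1, keepdims=True)  # stabilise the softmax
        P = np.exp(Z)
        return H, P / P.sum(axis=1, keepdims=True)

    def fit(self, X: np.ndarray, y: np.ndarray, epochs: int = 1000, lr: float = 0.5):
        """Full-batch gradient descent on the mean cross-entropy loss."""
        Y = np.eye(self.W2.shape[1])[y]  # one-hot targets
        for _ in range(epochs):
            H, P = self._forward(X)
            dZ = (P - Y) / len(X)  # output-layer error
            dH = (dZ @ self.W2.T) * H * (1.0 - H)  # error back-propagated through the sigmoid
            self.W2 -= lr * H.T @ dZ
            self.b2 -= lr * dZ.sum(axis=0)
            self.W1 -= lr * X.T @ dH
            self.b1 -= lr * dH.sum(axis=0)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._forward(X)[1].argmax(axis=1)


if __name__ == "__main__":
    # Two well-separated synthetic classes (labels 0 and 1 of the 8 outputs); not chiller data.
    rng = np.random.default_rng(1)
    X = np.vstack([0.2 + 0.05 * rng.standard_normal((100, 8)),
                   0.8 + 0.05 * rng.standard_normal((100, 8))])
    y = np.array([0] * 100 + [1] * 100)
    model = BPNN(n_inputs=8).fit(X, y)
    print("hidden nodes:", model.W1.shape[1], "accuracy:", (model.predict(X) == y).mean())
```

With 8 inputs the hidden layer has 17 nodes, and with 18 inputs it has 37.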
For the CNN model, the input layer comprised either 8 or 18 features, and the
output layer included one node for normal and seven nodes representing the typical faults.
Following the work of Gao et al. [14], the CNN model consisted of
three convolutional layers, three pooling layers, and one fully connected layer. Average
pooling with a single stride was employed in the pooling layers, and the fully connected
layer performed a fully connected process on the pooling results.
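The layer arrangement can be illustrated with a NumPy forward pass. The text fixes only the layer types and counts, average pooling with a stride of one, and the 8-node output; the kernel size (3), pooling window (2), channel widths, zero "same" padding, ReLU activation, and random weights below are hypothetical. Training is omitted; in practice it would back-propagate through these layers, typically with a deep-learning framework.

```python
import numpy as np


def conv1d_same(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """1-D convolution with zero 'same' padding followed by ReLU.

    x: (C_in, L), W: (C_out, C_in, K) with odd K, b: (C_out,). Returns (C_out, L).
    """
    k = W.shape[2]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    out = np.stack([np.tensordot(W, xp[:, i:i + k], axes=([1, 2], [0, 1])) + b
                    for i in range(x.shape[1])], axis=1)
    return np.maximum(out, 0.0)


def avg_pool_stride1(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Average pooling with a stride of one: (C, L) -> (C, L - size + 1)."""
    return np.stack([x[:, i:i + size].mean(axis=1)
                     for i in range(x.shape[1] - size + 1)], axis=1)


def init_cnn(n_features: int, channels=(8, 16, 16), kernel: int = 3, pool: int = 2,
             n_classes: int = 8, seed: int = 0):
    """Random weights for three conv + pool stages and one fully connected layer."""
    rng = np.random.default_rng(seed)
    stages, c_in, length = [], 1, n_features
    for c_out in channels:
        stages.append((rng.normal(0.0, 0.1, (c_out, c_in, kernel)), np.zeros(c_out)))
        c_in, length = c_out, length - pool + 1  # 'same' conv keeps length; pooling shrinks it
    fc = (rng.normal(0.0, 0.1, (c_in * length, n_classes)), np.zeros(n_classes))
    return stages, fc


def cnn_forward(x: np.ndarray, stages, fc, pool: int = 2) -> np.ndarray:
    """Class scores (logits) for one feature vector of length 8 or 18."""
    h = x[np.newaxis, :]  # the feature vector is treated as a one-channel 1-D signal
    for W, b in stages:
        h = avg_pool_stride1(conv1d_same(h, W, b), pool)
    W_fc, b_fc = fc
    return h.reshape(-1) @ W_fc + b_fc


if __name__ == "__main__":
    stages, fc = init_cnn(18)
    print(cnn_forward(np.linspace(0.0, 1.0, 18), stages, fc).shape)
```

A softmax over the eight logits would turn them into class probabilities for normal operation and the seven faults.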
Processes 2023, 11, 3299 9 of 21
When developing the SVM and SVDD models, the “one against one” multi-class algorithm was used for SVM. Grid search and 5-fold cross-validation were applied to optimize the two parameters, namely, the penalty constant and the width of the Gaussian kernel. Various pairs of values for the penalty constant and the Gaussian width were tested, and the pair yielding the best cross-validation accuracy was selected. Following existing research results [19,40], the grid search was conducted on the penalty constant and the width of the Gaussian within the range $[2^{-4}, 2^{4}]$.
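A minimal sketch of the grid search with 5-fold cross-validation over the 2^-4 to 2^4 range follows. The callback `cv_accuracy` is hypothetical and stands in for fitting a one-against-one SVM (or an SVDD) on the training folds and scoring it on the validation fold; folds are formed by striding without shuffling, which is a simplification. The toy callback only demonstrates the search and trains no model.

```python
import itertools
import math

# Candidate values 2^-4, 2^-3, ..., 2^4 for both the penalty constant and the Gaussian width.
GRID = [2.0 ** e for e in range(-4, 5)]


def n_one_vs_one_classifiers(n_classes: int) -> int:
    """A one-against-one multi-class SVM trains one binary SVM per pair of classes."""
    return n_classes * (n_classes - 1) // 2


def k_fold_splits(n_samples: int, k: int = 5) -> list[tuple[list[int], list[int]]]:
    """(train, validation) index pairs; folds are formed by striding, without shuffling."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    return [([idx for j, f in enumerate(folds) if j != i for idx in f], folds[i]) for i in range(k)]


def grid_search(cv_accuracy, n_samples: int, k: int = 5):
    """Return (mean CV accuracy, penalty, width) of the best pair on the 9 x 9 grid.

    cv_accuracy(penalty, width, train_idx, val_idx) is a hypothetical callback that
    trains the model on train_idx and returns its accuracy on val_idx.
    """
    splits = k_fold_splits(n_samples, k)
    best = None
    for penalty, width in itertools.product(GRID, GRID):
        mean_acc = sum(cv_accuracy(penalty, width, tr, va) for tr, va in splits) / k
        if best is None or mean_acc > best[0]:
            best = (mean_acc, penalty, width)
    return best


def toy_accuracy(penalty, width, train_idx, val_idx):
    """Toy stand-in peaking at penalty = 2, width = 0.25; no model is trained."""
    return 1.0 - 0.01 * ((math.log2(penalty) - 1) ** 2 + (math.log2(width) + 2) ** 2)


print(n_one_vs_one_classifiers(8))               # 28 (normal + seven faults)
print(grid_search(toy_accuracy, n_samples=100))  # (1.0, 2.0, 0.25)
```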
The BN model is structured with two layers: a parent node and a child node. Following Wang et al. [43], the parent node represents a discrete variable with two states, indicating the absence or presence of a certain state in the chiller. A separate BN model is built for normal operation and for each fault. In each BN model, the child node represents a continuous vector variable consisting of either the 8 or the 18 features. Equal prior probabilities (1/2) were assigned to each state of the parent node. The child node's conditional probability distribution was assumed to follow a multivariate Gaussian distribution, and its parameters, namely the mean vector and the covariance matrix, were estimated by maximum likelihood from the training data. When the state of the parent node was false, the Gaussian parameters were iteratively tuned to obtain optimal performance.
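The inference step of such a two-layer BN can be sketched with Bayes' rule: with equal priors, the posterior probability that the parent node is "present" follows from the two Gaussian likelihoods of the feature vector. This is a minimal pure-Python sketch under stated assumptions: the Cholesky-based density evaluation, the toy two-feature data, and all function names are illustrative, and the iterative tuning of the "false"-state parameters is not shown.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def gaussian_logpdf(x, mean, cov):
    """Log density of the multivariate Gaussian N(mean, cov) evaluated at x."""
    n = len(x)
    L = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = []  # forward substitution: L z = diff, so the squared Mahalanobis distance is z . z
    for i in range(n):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def fit_gaussian(samples):
    """Maximum-likelihood mean vector and covariance matrix (normalized by N, not N - 1)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def posterior_present(x, params_present, params_absent, prior_present=0.5):
    """P(parent node = present | x) by Bayes' rule; equal priors (1/2) by default."""
    log_p = math.log(prior_present) + gaussian_logpdf(x, *params_present)
    log_a = math.log(1.0 - prior_present) + gaussian_logpdf(x, *params_absent)
    m = max(log_p, log_a)  # subtract the max before exponentiating for numerical stability
    return math.exp(log_p - m) / (math.exp(log_p - m) + math.exp(log_a - m))


# Toy two-feature training data for one hypothetical fault's BN model.
present = fit_gaussian([[1.0, 1.2], [1.2, 0.9], [0.8, 1.1], [1.1, 0.8]])
absent = fit_gaussian([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]])
print(posterior_present([1.0, 1.0], present, absent))  # close to 1
print(posterior_present([0.0, 0.0], present, absent))  # close to 0
```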
In the case of the RF model, the CART algorithm was employed, following Han et al. [15] and Gao et al. [41]. The Gini index was used as the criterion for selecting node-splitting features. When training each tree, the number of selected features was set to $\log_2 n$ (where n is the total number of input features).
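The splitting criterion and the feature count per tree can be sketched as follows. Rounding $\log_2 n$ down to an integer is an assumption (the source gives only the formula), and the helper names are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of the class labels reaching a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate split; CART keeps the split minimizing it."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def features_per_tree(n_features):
    """log2(n) features per tree, rounded down (the rounding rule is an assumption)."""
    return max(1, int(math.log2(n_features)))


rng = random.Random(0)
print(features_per_tree(8), features_per_tree(18))           # 3 4
print(sorted(rng.sample(range(18), features_per_tree(18))))  # one random 4-feature subset
print(gini(["RefOver", "RefOver", "CdFoul", "CdFoul"]))      # 0.5
print(split_gini(["RefOver", "RefOver"], ["CdFoul", "CdFoul"]))  # 0.0 (a pure split)
```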
To provide a comprehensive comparison, four commonly used evaluation metrics were employed to assess the FDD performance of the data-driven methods: accuracy, precision, recall, and F-measure [41,47]. All four can be calculated from the confusion matrix derived from a model's predictions on a given dataset. Details of these metrics can be found in Refs. [41,47].
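As a worked example, the sketch below computes the metrics from the BN-based confusion matrix for Case 1 in Figure 4 (rows are actual classes, columns diagnosed classes). Precision divides a diagonal entry by its column sum and recall by its row sum; the function names are illustrative.

```python
CLASSES = ["RedCdW", "RedEvW", "RefLeak", "RefOver", "CdFoul", "NcG", "ExOil"]

# BN-based method, Case 1 (Figure 4): rows = actual class, columns = diagnosed class.
BN_CASE1 = [
    [1583, 17, 0, 0, 0, 0, 0],
    [61, 1538, 0, 1, 0, 0, 0],
    [203, 23, 1246, 83, 0, 1, 44],
    [22, 4, 84, 1489, 0, 0, 1],
    [85, 30, 8, 88, 1369, 10, 10],
    [0, 0, 0, 15, 0, 1585, 0],
    [292, 111, 663, 132, 0, 1, 401],
]


def accuracy(cm):
    """Correctly classified samples (the diagonal) over all samples."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def class_metrics(cm, k):
    """Precision, recall, and F-measure for class index k."""
    tp = cm[k][k]
    precision = tp / sum(row[k] for row in cm)  # column sum: everything diagnosed as k
    recall = tp / sum(cm[k])                    # row sum: everything actually in class k
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


p, r, f = class_metrics(BN_CASE1, CLASSES.index("RedCdW"))
print(f"{p:.1%} {r:.1%} {f:.1%}")   # 70.5% 98.9% 82.3%
print(f"{accuracy(BN_CASE1):.1%}")  # 82.2%
```

The low RedCdW precision comes from the many RefLeak and ExOil samples that the BN-based method misdiagnoses as RedCdW in Case 1.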
3. Results and Discussions

3.1. Fault Detection Results Using Foundational Models
The fault detection performance of the six foundational data-driven methods is eval-
uated using the normal sample data from the testing dataset. The evaluation results,
represented by accuracy, are presented in Figure 3. It is worth highlighting that the defini-
tions and computation methods for fault detection accuracy and fault diagnosis accuracy
differ. Fault detection accuracy assesses the ability to correctly identify normal samples
and is calculated as the ratio of correctly identified normal samples to the total number of
normal samples. Conversely, fault diagnosis accuracy evaluates the proportion of correctly
identified samples across all classes.
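The difference between the two accuracy definitions can be made concrete with a small sketch; the labels below are hypothetical toy data, not results from this study, and the function names are illustrative.

```python
def fault_detection_accuracy(predicted_for_normal):
    """Share of normal test samples that the model labels as 'normal'."""
    return sum(p == "normal" for p in predicted_for_normal) / len(predicted_for_normal)


def fault_diagnosis_accuracy(actual, predicted):
    """Share of all test samples, across every class, whose diagnosed class is correct."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical predictions, for illustration only.
normal_preds = ["normal", "normal", "CdFoul", "normal"]
actual = ["RefOver", "CdFoul", "NcG", "ExOil"]
diagnosed = ["CdFoul", "CdFoul", "NcG", "RefLeak"]
print(fault_detection_accuracy(normal_preds))       # 0.75
print(fault_diagnosis_accuracy(actual, diagnosed))  # 0.5
```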
Considering both Case 1 (using fewer features) and Case 2 (using more features), under
the constraint of using only features commonly available in the field, the BN-, BPNN-, RF-,
and SVM-based methods achieve excellent fault detection performance, with fault detection
accuracy exceeding 94%. Although the fault detection accuracies of the CNN- and SVDD-based methods are lower, they still exceed 80%. These results indicate that the features commonly
available in the field are sufficient to achieve an acceptable fault detection performance.
However, there is still room for further improvement.
A comparison of the fault detection performance under Case 1 and Case 2 shows that the number of features affects each method differently. Case 2 includes all the features from Case 1 as well as additional computed features, so the Case 2 feature set contains some degree of information redundancy. The results show that this redundancy degrades the fault detection performance of some methods (e.g., BN, BPNN, and RF) while improving that of others (e.g., SVM, CNN, and SVDD). Specifically, compared with Case 1, the fault detection accuracies of the BN-, BPNN-, and RF-based methods under Case 2 decrease by 1.0, 0.5, and 1.3 percentage points, respectively, whereas those of the SVM-, CNN-, and SVDD-based methods increase by 1.8, 5.3, and 3.0 percentage points, respectively. That information redundancy affects the fault detection performance of the methods differently suggests that each method may require its own optimal feature combination to achieve peak performance. Hence, it is recommended to select the optimal feature combination individually for each method.
[Figure 3 chart: fault detection accuracy (%) of the BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based methods; legend: Case 1 (using fewer features), Case 2 (using more features).]
Figure 3. Fault detection accuracies of the six foundational data-driven methods under Case 1 and Case 2.
3.2. Fault Diagnosis Results Using Foundational Models
The fault diagnosis performance of the six foundational data-driven methods is evaluated using the sample data of each fault type from the testing dataset. The evaluation results are as follows.
3.2.1. Overall Fault Diagnosis Performance
The diagnostic results for the seven typical faults using the six foundational FDD methods under Case 1 and Case 2 are represented by the confusion matrices shown in Figures 4–9. Based on these confusion matrices, the overall diagnostic accuracies are calculated and displayed in Figure 10.
Rows: actual class; columns: diagnosed class.
Case 1 (using fewer features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1583      17       0       0       0       0       0
RedEvW          61    1538       0       1       0       0       0
RefLeak        203      23    1246      83       0       1      44
RefOver         22       4      84    1489       0       0       1
CdFoul          85      30       8      88    1369      10      10
NcG              0       0       0      15       0    1585       0
ExOil          292     111     663     132       0       1     401
Case 2 (using more features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1539       4       1       0      56       0       0
RedEvW          21    1522      23      17      13       0       4
RefLeak         55      17    1248      29     128       0     123
RefOver          3       3       3    1296     293       2       0
CdFoul          20      39      27      72    1430       0      12
NcG              0       0       3       3       0    1594       0
ExOil           28      47      64      67     527       0     867
Figure 4. Confusion matrix of the BN-based method under Case 1 and Case 2.
Rows: actual class; columns: diagnosed class.
Case 1 (using fewer features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1600       0       0       0       0       0       0
RedEvW           0    1600       0       0       0       0       0
RefLeak          0       0    1595       0       2       0       3
RefOver          0       0       0    1598       2       0       0
CdFoul           0       0       0       0    1599       0       1
NcG              0       0       0       2       0    1598       0
ExOil            0       0       0       0       0       3    1597
Case 2 (using more features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1598       0       0       1       1       0       0
RedEvW           0    1593       0       4       0       1       2
RefLeak          1       0    1531       1       7       1      59
RefOver          0       1       0    1528      47      10      14
CdFoul           0       0       3      52    1523       0      22
NcG              0       0       0      12       0    1588       0
ExOil            0       0      46       4      14       0    1536
Figure 5. Confusion matrix of the BPNN-based method under Case 1 and Case 2.
Rows: actual class; columns: diagnosed class.
Case 1 (using fewer features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1571       3       5       2      11       7       1
RedEvW           4    1549      14      11       6       2      14
RefLeak          1       7    1571       4       5       1      11
RefOver          5       2       4    1547      15      21       6
CdFoul           3       4       4      16    1562       1      10
NcG              2       0       2      13       1    1581       1
ExOil            1       7      10      13      16       1    1552
Case 2 (using more features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1576       6       5       0       4       0       9
RedEvW           2    1560      12       3       8       0      15
RefLeak          6       6    1551       7       5       1      24
RefOver          0       2       4    1522      58      10       4
CdFoul          13       7       6      39    1519       0      16
NcG              0       0       0      10       1    1589       0
ExOil            7       7      27       4      18       0    1537
Figure 6. Confusion matrix of the RF-based method under Case 1 and Case 2.
Rows: actual class; columns: diagnosed class.
Case 1 (using fewer features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1456       0      18       1      82       0      43
RedEvW           0    1338      37       3     188       0      34
RefLeak          4       0    1482       0      69       0      45
RefOver          4       1      15    1093     317     124      46
CdFoul           7       0      14      20    1178       3     378
NcG              0       0       0     121      18    1461       0
ExOil            0       0     112       2     106       1    1379
Case 2 (using more features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1586       0       4       0       8       0       2
RedEvW           0    1575      14       1       5       0       5
RefLeak          0       0    1172       2      79       1     346
RefOver          0       0       2    1232     348       5      13
CdFoul           8       0      17      77    1402       0      96
NcG              0       0       0      12       1    1587       0
ExOil            1       1      93       1      60       0    1444
Figure 7. Confusion matrix of the SVM-based method under Case 1 and Case 2.
Rows: actual class; columns: diagnosed class.
Case 1 (using fewer features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1589       0       0       0       9       2       0
RedEvW           1    1497      45       1       0      56       0
RefLeak          0      17     895      17      59     611       1
RefOver          2       2       0    1214     349      19      14
CdFoul          26       3      11     130    1147     283       0
NcG              0       0       0      17       0    1583       0
ExOil            6      11     100       2      57       0    1424
Case 2 (using more features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1599       0       0       1       0       0       0
RedEvW           0    1505       1       0      91       3       0
RefLeak          2       4    1291      14     282       7       0
RefOver         21       0      74    1319     100      86       0
CdFoul           1       3     114      20    1449      12       1
NcG              0       0       4     121      20    1433      22
ExOil            0       0       0       0       0       3    1597
Figure 8. Confusion matrix of the CNN-based method under Case 1 and Case 2.
Rows: actual class; columns: diagnosed class.
Case 1 (using fewer features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1455      26      29      18      25      20      27
RedEvW          32    1438      39      23      25      20      23
RefLeak         21       0    1466      24      30      32      27
RefOver          7      18      26    1459      34      24      32
CdFoul          26      16      25      23    1480      13      17
NcG             25      24      12      22      19    1478      20
ExOil           25      17      23      15      22      32    1466
Case 2 (using more features)
Actual      RedCdW  RedEvW RefLeak RefOver  CdFoul     NcG   ExOil
RedCdW        1387     133      52      20       7       1       0
RedEvW           0    1334     151      93      15       6       1
RefLeak          0      25    1363     106     105       1       0
RefOver          0       0       0    1592       6       2       0
CdFoul           0       2       3     174    1368      51       2
NcG              0       0       0       3      42    1555       0
ExOil            0       0       7      27      82     321    1163
Figure 9. Confusion matrix of the SVDD-based method under Case 1 and Case 2.
[Figure 10 chart: overall fault diagnosis accuracy (%) of the BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based methods; legend: Case 1 (using fewer features), Case 2 (using more features).]
Figure 10. Fault diagnosis accuracies of the six foundational data-driven methods under Case 1 and Case 2.
Taking into account both Case 1 and Case 2, and using only features commonly available in the field, the BPNN-, RF-, CNN-, and SVDD-based methods achieve excellent fault diagnosis performance, with fault diagnosis accuracy exceeding 90% in at least one of the two cases. The BN- and SVM-based methods show lower fault diagnosis accuracy but still exceed 80% in both cases. Similar to fault detection, the features commonly available in the field are sufficient to achieve acceptable fault diagnosis performance; however, there is still room for further improvement.

Comparing the fault diagnosis performance under Case 1 and Case 2 shows that the number of features used affects the fault diagnosis performance of different methods differently. When moving from Case 1 to Case 2, the fault diagnosis accuracies of the BPNN-, RF-, and SVDD-based methods decrease by 2.6, 0.7, and 4.3 percentage points, respectively, whereas those of the BN-, SVM-, and CNN-based methods increase by 2.5, 5.5, and 7.5 percentage points, respectively. As in fault detection, the redundant information in the Case 2 features degrades the fault diagnosis performance of some methods (BPNN, RF, and SVDD) but improves that of others (BN, SVM, and CNN). This finding suggests that each method may require its own optimal feature combination to achieve the best fault diagnosis performance. Therefore, it is also advisable to select the optimal feature combination individually for each method when diagnosing faults.
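The accuracy changes quoted above can be reproduced from the confusion matrices in Figures 4–9: each method's overall accuracy is the diagonal sum divided by the 11,200 test samples (seven faults × 1600 samples each, as the matrix row sums show). The diagonal sums below were read from those figures; the script is only a sketch for checking the arithmetic, and the names are illustrative.

```python
TOTAL = 7 * 1600  # seven fault classes x 1600 test samples each (row sums in Figures 4-9)

# Diagonal sums (correctly diagnosed samples) read from Figures 4-9: (Case 1, Case 2).
CORRECT = {
    "BN": (9211, 9496),
    "BPNN": (11187, 10897),
    "RF": (10933, 10854),
    "SVM": (9387, 9998),
    "CNN": (9349, 10193),
    "SVDD": (10242, 9762),
}


def accuracy_changes(correct, total=TOTAL):
    """Change in overall accuracy from Case 1 to Case 2, in percentage points."""
    return {m: round(100 * (c2 - c1) / total, 1) for m, (c1, c2) in correct.items()}


for method, (c1, c2) in CORRECT.items():
    print(f"{method:5s} {100 * c1 / TOTAL:6.2f}% -> {100 * c2 / TOTAL:6.2f}%")
print(accuracy_changes(CORRECT))
# {'BN': 2.5, 'BPNN': -2.6, 'RF': -0.7, 'SVM': 5.5, 'CNN': 7.5, 'SVDD': -4.3}
```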
3.2.2. Individual Fault Diagnosis Performance
The overall diagnostic accuracy provides an overview of the diagnostic performance, but it may not capture differences in how well individual faults are diagnosed. To further analyze the diagnostic results, precision, recall, and F-measure are calculated from the confusion matrices, and the results are presented in Figures 11–17.
[Figure 11 chart: precision, recall, and F-measure (score, %) of the BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based methods; panels: Case 1 (using fewer features), Case 2 (using more features).]
Figure 11. Precision, recall, and F-measure of RedCdW using the six foundational data-driven methods under Case 1 and Case 2.
[Figure 12 chart: precision, recall, and F-measure (score, %) of the BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based methods; panels: Case 1 (using fewer features), Case 2 (using more features).]
Figure 12. Precision, recall, and F-measure of RedEvW using the six foundational data-driven methods under Case 1 and Case 2.
[Figure 13 chart: precision, recall, and F-measure (score, %) of the BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based methods; panels: Case 1 (using fewer features), Case 2 (using more features).]
Figure 13. Precision, recall, and F-measure of RefLeak using the six foundational data-driven methods under Case 1 and Case 2.
[Figure 14 chart: precision, recall, and F-measure (score, %) of the BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based methods; panels: Case 1 (using fewer features), Case 2 (using more features).]
Figure 14. Precision, recall, and F-measure of RefOver using the six foundational data-driven methods under Case 1 and Case 2.
[Figure 15 chart: precision, recall, and F-measure (score, %) of the BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based methods; panels: Case 1 (using fewer features), Case 2 (using more features).]
Figure 15. Precision, recall, and F-measure of CdFoul using the six foundational data-driven methods under Case 1 and Case 2.

[Figure 16 chart: precision, recall, and F-measure (score, %) of the BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based methods; panels: Case 1 (using fewer features), Case 2 (using more features).]
Figure 16. Precision, recall, and F-measure of NcG using the six foundational data-driven methods under Case 1 and Case 2.
[Figure 17 chart: precision, recall, and F-measure (score, %) of the BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based methods; panels: Case 1 (using fewer features), Case 2 (using more features).]
Figure 17. Precision, recall, and F-measure of ExOil using the six foundational data-driven methods under Case 1 and Case 2.
A summary of the individual diagnostic results is as follows:
A summary of the individual diagnostic results is as follows:
(1) When utilizing only the features obtainable from field sensors, different data-driven
methods(1) yieldWhen utilizing
diverse only the
diagnostic features
results obtainable
for different fault from field
types. sensors, different
Considering both data-d
methods yield diverse diagnostic results for
Case 1 and Case 2, almost all foundational FDD methods achieve F-measures over different fault types. Considering
Case 1 and Case 2, almost all foundational
80% for all seven faults. However, only a few foundational FDD methods achieve FDD methods achieve F-measures
80% for all seven faults. However, only a few foundational
F-measures over 90% for all seven faults. This suggests that the features available in FDD methods achie
measures
the field are feasible over 90%
to obtain for all seven
acceptable faults.performance
diagnostic This suggests forthat
thethe features
seven typicalavailable i
faults but mayfield
not are feasible to
be sufficient to obtain
achieveacceptable diagnosticperformance.
excellent diagnostic performance Furtherfor the seven ty
improvementfaultsmightbut mayadditional
require not be sufficient
features totoachieve
enhance excellent diagnostic
the diagnostic performance. Fu
capabilities;
(2) improvement
Based on the number of FDD might
methodsrequire additional
with F-measures overfeatures
90% for each to enhance
fault, when the diagn
capabilities;
using only features obtainable from field sensors, RedCdW, RedEvW, and NcG are
(2)faults
the easiest Based on the number
to diagnose. All sixoffoundational
FDD methodsFDD withmethods
F-measures overF-measures
achieve 90% for each fault,
over 90% for using only features
these three obtainable
faults. RefLeak, from field
CdFoul, sensors,
and ExOil areRedCdW,
relativelyRedEvW,
easier to and Nc
diagnose, with the easiest
five or fourfaults to diagnose.
foundational All sixachieving
methods foundational FDD methods
F-measures over 90% achieve
for F-mea
each of them.overOn the90% for these
other hand,three
RefOver faults. RefLeak,
is the CdFoul, and
most challenging ExOil
fault are relatively eas
to diagnose,
with only three foundational
diagnose, methods
with five or four achieving
foundational F-measures
methods over 90%; F-measures over 90
achieving
(3) Upon analyzing the confusion matrix, it is evident that there
each of them. On the other hand, RefOver is the most challenging is significant confusion
fault to diag
between RefOver with and
onlyCdFoul faults. For instance,
three foundational methodsfor BN-, SVM-,
achieving and CNN-based
F-measures over 90%;
methods,(3)293,Upon
348, analyzing
and 349 RefOver samples,
the confusion respectively,
matrix, it is evident were misdiagnosed
that as
there is significant conf
CdFoul. According
between toRefOver
the experimental
and CdFoul results [45],
faults. Forcertain
instance, features used,
for BN-, suchand
SVM-, as CNN-b
Pin , PRC, andmethods,
TCA, exhibit 293,similar
348, and changing
349 RefOvertrends whensamples, RefOver and CdFoul
respectively, werefaults
misdiagnos
occur, and this similarity has a notable negative impact on the BN-,
CdFoul. According to the experimental results [45], certain features used, such as PRC and TCA, exhibit similar changing trends when RefOver and CdFoul occur, and this similarity has a notable negative impact on the BN-, SVM-, and CNN-based methods. Therefore, it is necessary to supplement additional features for effective diagnosis of RefOver;
(4) Further analysis reveals that, when the same feature combination is used (e.g., the features of either Case 1 or Case 2), the six foundational data-driven methods achieve different fault detection and diagnosis results, and some methods show better FDD performance than others. For example, in Case 1, the fault detection accuracy of the BN-based method is 18.5% higher than that of the CNN-based method, and the fault diagnosis accuracy of the BPNN-based method is 17.6% higher than that of the BN-based method. This indicates that different methods may have different optimal feature combinations for achieving the best FDD performance, which highlights the necessity of feature selection for each method separately. Therefore, it is advisable to choose the optimal feature combination individually for each FDD method.
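One common way to carry out such per-method feature selection is a wrapper search. The sketch below is an illustrative choice, not necessarily the authors' procedure: it runs greedy sequential forward selection separately for each method. The scoring callables (for example, the cross-validated diagnostic accuracy of one method on a candidate feature subset) and all function names are hypothetical.

```python
"""Greedy sequential forward selection (SFS), run once per FDD method.

A minimal sketch: each scorer is a hypothetical callable mapping a feature
subset to a quality score (e.g., cross-validated diagnostic accuracy).
"""
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Score = Callable[[FrozenSet[str]], float]


def forward_select(features: Sequence[str], score: Score,
                   min_gain: float = 1e-9) -> Tuple[List[str], float]:
    """Add one feature at a time, keeping the addition that raises the score most.

    Stops when no remaining feature improves the score by more than `min_gain`.
    Returns the selected features (in order of addition) and the final score.
    """
    selected: List[str] = []
    best = float("-inf")
    remaining = list(features)
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        trials = [(score(frozenset(selected + [f])), f) for f in remaining]
        trial_score, trial_feat = max(trials)
        if trial_score <= best + min_gain:
            break  # no candidate helps enough; stop here
        selected.append(trial_feat)
        remaining.remove(trial_feat)
        best = trial_score
    return selected, best


def select_per_method(features: Sequence[str],
                      scorers: Dict[str, Score]) -> Dict[str, Tuple[List[str], float]]:
    """Run SFS separately for each method, since optimal subsets may differ."""
    return {name: forward_select(features, fn) for name, fn in scorers.items()}
```

Because each method gets its own scorer, two methods can end up with different subsets drawn from the same candidate pool, which is the behavior item (4) argues for.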
3.3. Fault Diagnosis Results Using Improved Models
To assess the optimal performance achievable by improved data-driven methods, enhanced versions of the BPNN, CNN, SVM, SVDD, BN, and RF models were employed to evaluate the selected features. These foundational models are improved by using a genetic algorithm to optimize their parameters. Specifically, for the SVM and SVDD models, the genetic algorithm was used to optimize the penalty constant and the width of the Gaussian kernel. For BN, the genetic algorithm was applied to optimize the Gaussian parameters when the state of the parent node is false. Additionally, for BPNN and CNN, the genetic algorithm was employed to optimize the learning rate and the numbers of convolutional and pooling layers, respectively.
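The genetic-algorithm tuning described above can be sketched for the SVM/SVDD case as follows. This is a minimal real-coded GA written under stated assumptions, not the authors' MATLAB implementation: the two genes are assumed to be log10 of the penalty constant C and log10 of the Gaussian width sigma, the operators (tournament selection, blend crossover, Gaussian mutation, elitism) and their rates are generic choices, and `fitness` is a hypothetical callable.

```python
"""Real-coded genetic algorithm for tuning box-bounded hyperparameters.

A minimal sketch, not the authors' implementation. Genes are assumed to be
log10(C) and log10(sigma) for an SVM/SVDD with a Gaussian kernel; `fitness`
is a hypothetical callable that would return, e.g., validation accuracy.
"""
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def genetic_search(fitness: Callable[[List[float]], float], bounds: Bounds,
                   pop_size: int = 30, generations: int = 60,
                   crossover_rate: float = 0.9, mutation_rate: float = 0.2,
                   seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)

    def clip(x: float, lo: float, hi: float) -> float:
        return min(max(x, lo), hi)

    def random_individual() -> List[float]:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(pop: List[List[float]], scores: List[float], k: int = 3) -> List[float]:
        # Pick the fittest of k randomly drawn individuals.
        idx = max(rng.sample(range(len(pop)), k), key=lambda i: scores[i])
        return pop[idx]

    pop = [random_individual() for _ in range(pop_size)]
    best_x: List[float] = []
    best_f = float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        for ind, f in zip(pop, scores):
            if f > best_f:
                best_x, best_f = list(ind), f
        children = [list(best_x)]  # elitism: carry the best individual forward
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            if rng.random() < crossover_rate:
                # Blend crossover: a random convex combination of the parents.
                w = rng.random()
                child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            else:
                child = list(p1)
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Gaussian mutation scaled to 10% of the gene's range.
                    child[j] = clip(child[j] + rng.gauss(0.0, 0.1 * (hi - lo)), lo, hi)
            children.append(child)
        pop = children
    return best_x, best_f
```

In practice, `fitness` would train the model with C = 10**x[0] and sigma = 10**x[1] and return a validation accuracy. Since every fitness call retrains the model, roughly population size × generations training runs are needed, which is one reason GA-based tuning multiplies training time.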
The diagnostic accuracies of the improved models and their comparison with the foundational models are illustrated in Figure 18. The results demonstrate that after model optimization, all six models exhibit improved diagnostic performance, with the maximum enhancement reaching 10.7% (SVM in Case 1). When only features commonly available in the field are considered, the improved models achieve diagnostic accuracies exceeding 90%. However, model optimization comes at a substantial time cost. The training times of the improved models were calculated on the training sets, using all samples in the training set to train the models. The time cost of model training is presented in Table 3. Compared with the training time of the foundational models before optimization, the training time of the improved models increases significantly, by a factor ranging from 5 to 11. This increase in training time may represent the cost of improving diagnostic performance.
[Figure 18, two panels (Case 1 and Case 2): bar charts of Accuracy /% for the foundational and improved BN-, BPNN-, RF-, SVM-, CNN-, and SVDD-based models.]
Figure 18. The diagnostic accuracies of the improved models and their comparison with the foundational models.
3.4. Discussions
Existing studies typically aim for optimal diagnostic performance during feature selection, disregarding the practical status of on-site sensor installations. This often leads to the selection of features that are difficult to obtain from on-site sensors, which raises the cost of implementing diagnostic models in real-world applications. Such limitations impede practical use, particularly in on-site scenarios with a restricted or no budget for additional sensors. This paper addresses the practical constraint of limited on-site sensor installations and specifically examines the best diagnostic performance achievable using only features obtained from on-site sensors. The study focuses on typical data-driven diagnostic methods and their improved models. According to the evaluation results, the features available in the field are adequate for achieving acceptable diagnostic performance (e.g., accuracies or F-measures exceeding 80%) for the seven typical faults but may fall short of excellent diagnostic performance (e.g., accuracies or F-measures exceeding 90%). Additional sensors (features) might therefore be necessary to enhance diagnostic capabilities.
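The "acceptable" (above 80%) and "excellent" (above 90%) levels used here can be checked from a confusion matrix. The following sketch assumes the usual balanced F-measure, F1 = 2PR/(P + R), computed per fault class, with rows as true classes and columns as predicted classes; the function names are hypothetical and the paper's exact metric definitions may differ.

```python
"""Per-class F-measure and overall accuracy from a confusion matrix.

Assumptions: rows = true class, columns = predicted class; F-measure is the
balanced F1 score. The 80%/90% levels follow the thresholds in the text.
"""
from typing import Dict, Sequence

Matrix = Sequence[Sequence[int]]


def f_measures(cm: Matrix, labels: Sequence[str]) -> Dict[str, float]:
    n = len(labels)
    out: Dict[str, float] = {}
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum
        actual = sum(cm[k])  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        out[labels[k]] = 2 * precision * recall / denom if denom else 0.0
    return out


def accuracy(cm: Matrix) -> float:
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def performance_level(value: float) -> str:
    """Map an accuracy or F-measure (as a fraction) to the levels in the text."""
    if value > 0.90:
        return "excellent"
    if value > 0.80:
        return "acceptable"
    return "below acceptable"
```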
Table 3. The time cost of model training for the foundational and improved models under Case 1 and Case 2.
Case    Models               BN-Based   BPNN-Based   RF-Based   SVM-Based   CNN-Based   SVDD-Based
Case 1  Foundational models  344.2 s    344.2 s      344.2 s    344.2 s     344.2 s     344.2 s
Case 1  Improved models      2019.4 s   1749.3 s     1321.2 s   1927.3 s    1141.4 s    1582.6 s
Case 2  Foundational models  382.4 s    452.8 s      139.1 s    316.5 s     315.4 s     272.4 s
Case 2  Improved models      2346.9 s   2079.3 s     1658.6 s   2467.5 s    1352.1 s    2049.5 s
Note: Training times were measured in MATLAB R2018b on a computer with an Intel Core i9-10900K CPU (3.70 GHz) and 16 GB of memory.
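The training-time increase can be checked directly from Table 3. The sketch below transcribes the tabulated values and computes the ratio of improved-model to foundational-model training time for each method and case; the names `METHODS`, `FOUNDATIONAL`, `IMPROVED`, and `time_ratios` are illustrative.

```python
"""Ratio of improved-model to foundational-model training time (Table 3)."""
from typing import Dict, Tuple

METHODS = ["BN", "BPNN", "RF", "SVM", "CNN", "SVDD"]

# Training times in seconds, transcribed from Table 3.
FOUNDATIONAL = {
    "Case 1": [344.2, 344.2, 344.2, 344.2, 344.2, 344.2],
    "Case 2": [382.4, 452.8, 139.1, 316.5, 315.4, 272.4],
}
IMPROVED = {
    "Case 1": [2019.4, 1749.3, 1321.2, 1927.3, 1141.4, 1582.6],
    "Case 2": [2346.9, 2079.3, 1658.6, 2467.5, 1352.1, 2049.5],
}


def time_ratios() -> Dict[Tuple[str, str], float]:
    """Return {(case, method): improved / foundational training time}."""
    return {
        (case, m): imp / base
        for case in FOUNDATIONAL
        for m, base, imp in zip(METHODS, FOUNDATIONAL[case], IMPROVED[case])
    }


if __name__ == "__main__":
    ratios = time_ratios()
    for (case, m), r in sorted(ratios.items()):
        print(f"{case} {m:>4}-based: {r:5.2f}x")
    lo = min(ratios, key=ratios.get)
    hi = max(ratios, key=ratios.get)
    print(f"min {ratios[lo]:.2f}x at {lo}, max {ratios[hi]:.2f}x at {hi}")
```

With the tabulated values, the per-model ratios range from about 3.3 (CNN-based, Case 1) to about 11.9 (RF-based, Case 2).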
Furthermore, under the constraint of using only on-site sensors, this paper identifies which faults can achieve excellent diagnostic performance and which cannot. For example, RedCdW, RedEvW, and NcG are the faults most easily diagnosed with excellent performance. Conversely, RefOver is the fault for which excellent diagnostic performance (F-measures over 90%) is most difficult to achieve. While the improved models deliver better diagnostic performance, they come with higher computational costs and longer training times. This implies that on-site hardware needs to be upgraded to meet the additional computational demands, incurring additional costs, and users/manufacturers are generally reluctant to accept such changes. These results offer practical guidance for users/manufacturers. For instance, if users/manufacturers lack the budget to add sensors, they can rely on the diagnostic results for RedCdW, RedEvW, and NcG with a high level of confidence. If the budget allows only a limited number of additional sensors, priority should be given to improving the diagnosis of the RefOver fault.
This paper presents only the diagnostic performance attainable by typical foundational data-driven methods and their enhanced versions under the constraint of using only on-site sensors. Further research is warranted in two key aspects: (1) exploring the diagnostic performance of more advanced diagnostic models under the same constraint; (2) determining which sensors need to be supplemented and the strategy for supplementing them to further enhance diagnostic performance. These areas will be the focal points of future research.
4. Conclusions
Under the constraint of using only the features obtained through field-installed sensors, the FDD performance of current mainstream data-driven methods is evaluated to determine whether the expected performance can still be achieved. The main conclusions are as follows:
(1) For these foundational models, considering the overall FDD performance, using features commonly available in the field is adequate to achieve acceptable FDD performance, such as accuracies or F-measures exceeding 80%. However, it falls short of obtaining excellent FDD performance, defined as accuracies or F-measures exceeding 90%. This conclusion provides valuable information for users/manufacturers with limited or no budget regarding the best diagnostic performance achievable using the current mainstream data-driven methods;
(2) While the improved models result in enhanced diagnostic performance, they are accompanied by increased computational costs and longer training times. The findings indicate that after model optimization, all six models exhibited improved diagnostic performance, with the maximum improvement reaching 10.7% (SVM in Case 1). However, it is important to note that the training time significantly increased, ranging from 5 to 11 times;
(3) Not all typical chiller faults require additional features to achieve excellent FDD performance. Based on the number of FDD methods with F-measures exceeding 90%, RedCdW, RedEvW, and NcG are the easiest faults to diagnose, even without supplementing additional features. However, RefOver is the most challenging fault, with only three foundational methods achieving F-measures over 90%. This finding emphasizes the necessity of supplementing additional features for a more precise diagnosis of RefOver, providing valuable guidance for users/manufacturers aiming to enhance FDD performance;
(4) Moreover, each method may require a distinct optimal feature combination to attain
the best FDD performance. The impact of information redundancy within the feature
set varies among different methods, with effects that can be either negative or positive.
Consequently, it is crucial to individually choose the optimal feature combination for
each method.
Author Contributions: Methodology, Z.W.; Software, Z.W. and J.G.; Validation, J.G.; Formal analysis,
Z.W.; Investigation, P.X.; Resources, S.Z.; Data curation, Z.W.; Writing—original draft, Z.W.; Writing—
review & editing, Z.W. and S.Z.; Visualization, J.G. and P.X. All authors have read and agreed to the
published version of the manuscript.
Funding: National Natural Science Foundation of China (No. 51806060), Zhongyuan Outstanding
Youth Talent Program (2022 Year), Youth Science Award Project in Henan Province (225200810087),
the Program for Science & Technology Innovation Talents in Universities of Henan Province (No.
22HASTIT025), and the Program for Innovative Research Team (in Science and Technology) in
University of Henan Province (No. 22IRTSTHN006).
Data Availability Statement: No new data were created or analyzed in this study. Data sharing is
not applicable to this article.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
FDD fault detection and diagnosis

HVAC heating, ventilating, and air conditioning
RedCdW reduced condenser water flow
RedEvW reduced evaporator water flow
RefLeak refrigerant leakage
RefOver refrigerant overcharge
CdFoul condenser fouling
NcG non-condensable gas in refrigerant
ExOil excess oil
BN Bayesian network
BPNN back-propagation neural network
RF random forest
SVM support vector machine
CNN convolutional neural network
SVDD support vector data description
References
1. Nalley, S. Annual Energy Outlook 2021; Energy Information Administration: Washington, DC, USA, 2021.
2. Building Energy Conservation Research Center of Tsinghua University. China Building Energy Conservation Annual Development
Report; China Building Industry Press: Beijing, China, 2021.
3. Wang, Z.; Wang, L.; Ma, A.; Liang, K.; Song, Z.; Feng, L. Performance evaluation of ground water-source heat pump system with
a fresh air pre-conditioner using ground water. Energy Convers. Manag. 2019, 188, 250–261. [CrossRef]
4. Katipamula, S.; Brambley, M.R. Review Article: Methods for Fault Detection, Diagnostics, and Prognostics for Building Systems—
A Review, Part I. HVACR Res. 2005, 11, 3–25. [CrossRef]
5. Zhao, Y.; Li, T.; Zhang, X.; Zhang, C. Artificial intelligence-based fault detection and diagnosis methods for building energy
systems: Advantages, challenges and the future. Renew. Sustain. Energy Rev. 2019, 109, 85–101. [CrossRef]
6. Mirnaghi, M.S.; Haghighat, F. Fault detection and diagnosis of large-scale HVAC systems in buildings using data-driven methods:
A comprehensive review. Energy Build. 2020, 229, 110492. [CrossRef]
7. Wei, Y.; Zhang, X.; Shi, Y.; Xia, L.; Pan, S.; Wu, J.; Han, M.; Zhao, X. A review of data-driven approaches for prediction and
classification of building energy consumption. Renew. Sustain. Energy Rev. 2018, 82, 1027–1047. [CrossRef]
8. Kim, W.; Katipamula, S. A review of fault detection and diagnostics methods for building systems. Sci. Technol. Built Environ.
2018, 24, 3–21. [CrossRef]
9. Shi, Z.; O’Brien, W. Development and implementation of automated fault detection and diagnostics for building systems:
A review. Autom. Constr. 2019, 104, 215–229. [CrossRef]
10. Chen, J.; Zhang, L.; Li, Y.; Shi, Y.; Gao, X.; Hu, Y. A review of computing-based automated fault detection and diagnosis of heating,
ventilation and air conditioning systems. Renew. Sustain. Energy Rev. 2022, 161, 112395. [CrossRef]
11. Chen, Z.; O’Neill, Z.; Wen, J.; Pradhan, O.; Yang, T.; Lu, X.; Lin, G.; Miyata, S.; Lee, S.; Shen, C.; et al. A review of data-driven fault
detection and diagnostics for building HVAC systems. Appl. Energy 2023, 339, 121030. [CrossRef]
12. Borda, D.; Bergagio, M.; Amerio, M.; Masoero, M.C.; Borchiellini, R.; Papurello, D. Development of Anomaly Detectors for HVAC
Systems Using Machine Learning. Processes 2023, 11, 535. [CrossRef]
13. Gao, Y.; Miyata, S.; Akashi, Y. Automated fault detection and diagnosis of chiller water plants based on convolutional neural
network and knowledge distillation. Build. Environ. 2023, 245, 110885. [CrossRef]
14. Gao, J.; Han, H.; Ren, Z.; Fan, Y. Fault diagnosis for building chillers based on data self-production and deep convolutional neural
network. J. Build. Eng. 2021, 34, 102043. [CrossRef]
15. Han, H.; Zhang, Z.; Cui, X.; Meng, Q. Ensemble learning with member optimization for fault diagnosis of a building energy
system. Energy Build. 2020, 226, 110351. [CrossRef]
16. Fan, Y.; Cui, X.; Han, H.; Lu, H. Feasibility and Improvement of Fault Detection and Diagnosis Based on Factory-Installed Sensors
for Chillers. Appl. Therm. Eng. 2020, 164, 114506. [CrossRef]
17. Ebrahimifakhar, A.; Kabirikopaei, A.; Yuill, D. Data-driven fault detection and diagnosis for packaged rooftop units using
statistical machine learning classification methods. Energy Build. 2020, 225, 110318. [CrossRef]
18. van de Sand, R.; Corasaniti, S.; Reiff-Stephan, J. Data-driven fault diagnosis for heterogeneous chillers using domain adaptation
techniques. Control. Eng. Pract. 2021, 112, 104815. [CrossRef]
19. Zhao, Y.; Xiao, F.; Wen, J.; Lu, Y.; Wang, S. A robust pattern recognition-based fault detection and diagnosis (FDD) method for
chillers. HVACR Res. 2014, 20, 798–809. [CrossRef]
20. Chen, K.; Wang, Z.; Gu, X.; Wang, Z. Multicondition operation fault detection for chillers based on global density-weighted
support vector data description. Appl. Soft Comput. 2021, 112, 107795. [CrossRef]
21. Zhang, C.; Peng, K.; Dong, J. An incipient fault detection and self-learning identification method based on robust SVDD and
RBM-PNN. J. Process Control. 2020, 85, 173–183. [CrossRef]
22. Wang, Z.; Liang, B.; Guo, J.J.; Wang, L.; Tan, Y.; Li, X. Fault diagnosis based on residual–knowledge–data jointly driven method
for chillers. Eng. Appl. Artif. Intell. 2023, 125, 106768. [CrossRef]
23. Wang, Z.; Liang, B.; Guo, J.J.; Wang, L.; Tan, Y.; Li, X.; Zhou, S. Fault Diagnosis Based on Fusion of Residuals and Data for Chillers.
Processes 2023, 11, 2323. [CrossRef]
24. Ng, K.H.; Yik, F.W.H.; Lee, P.; Lee, K.; Chan, D. Bayesian Method for HVAC Plant Sensor Fault Detection and Diagnosis. Energy
Build. 2020, 228, 110476. [CrossRef]
25. Li, T.; Zhao, Y.; Zhang, C.; Luo, J.; Zhang, X. A knowledge-guided and data-driven method for building HVAC systems fault
diagnosis. Build. Environ. 2021, 198, 107850. [CrossRef]
26. Liu, Z.; Huang, Z.; Wang, J.; Yue, C.; Yoon, S. A novel fault diagnosis and self-calibration method for air-handling units using
Bayesian Inference and virtual sensing. Energy Build. 2021, 250, 111293. [CrossRef]
27. Li, P.; Anduv, B.; Zhu, X.; Jin, X.; Du, Z. Across working conditions fault diagnosis for chillers based on IoT intelligent agent with
deep learning model. Energy Build. 2022, 268, 112188. [CrossRef]
28. Choi, Y.; Yong, S. Autoencoder-driven fault detection and diagnosis in building automation systems: Residual-based and latent
space-based approaches. Build. Environ. 2021, 203, 108066. [CrossRef]
29. Yoo, Y.-J. Fault Detection Method Using Multi-mode Principal Component Analysis Based on Gaussian Mixture Model for
Sewage Source Heat Pump System. Int. J. Control. Autom. Syst. 2019, 17, 2125–2134. [CrossRef]
30. Zhang, H.; Chen, H.; Guo, Y.; Wang, J.; Li, G.; Shen, L. Sensor fault detection and diagnosis for a water source heat pump
air-conditioning system based on PCA and preprocessed by combined clustering. Appl. Therm. Eng. 2019, 160, 114098. [CrossRef]
31. Zhang, C.; Xue, X.; Zhao, Y.; Zhang, X.; Li, T. An improved association rule mining-based method for revealing operational
problems of building heating, ventilation and air conditioning (HVAC) systems. Appl. Energy 2019, 253, 113492. [CrossRef]
32. Guo, Y.; Liu, J.; Liu, C.; Zhu, J.; Lu, J.; Li, Y. Operation Pattern Recognition of the Refrigeration, Heating and Hot Water Combined
Air-Conditioning System in Building Based on Clustering Method. Processes 2023, 11, 812. [CrossRef]
33. Chen, Y.; Zhang, D.; Yan, R. Domain adaptation networks with parameter-free adaptively rectified linear units for fault diagnosis
under variable operating conditions. IEEE Trans. Neural Netw. Learn. Syst. 2023. [CrossRef]
34. Chen, Y.; Zhang, D.; Zhang, H.; Wang, Q.-G. Dual-path mixed-domain residual threshold networks for bearing fault diagnosis.
IEEE Trans. Ind. Electron. 2022, 69, 13462–13472. [CrossRef]
35. Wang, D.; Yang, J.; Liu, X.; Yang, Q.; Wang, Q. Wavelet neural network approach for fault diagnosis to a chemical reactor.
In Proceedings of the 2006 6th World Congress on Intelligent Control and Automation, Dalian, China, 21–23 June 2006; Volume 2,
pp. 5764–5768.
36. Comstock, M.C.; Braun, J.E.; Groll, E.A. The Sensitivity of Chiller Performance to Common Faults. HVACR Res. 2001, 7, 263–279.
[CrossRef]
37. Zhou, Q.; Wang, S.; Xiao, F. A Novel Strategy for the Fault Detection and Diagnosis of Centrifugal Chiller Systems. HVACR Res.
2009, 15, 57–75. [CrossRef]
38. Bai, X.; Zhang, M.; Jin, Z.; You, Y.; Liang, C. Fault detection and diagnosis for Chiller based on Feature-recognition model and
Kernel Discriminant Analysis. Sustain. Cities Soc. 2022, 79, 103708. [CrossRef]
39. Zhang, L.; Frank, S.; Kim, J.; Jin, X.; Leach, M. A systematic feature extraction and selection framework for data-driven
whole-building automated fault detection and diagnostics in commercial buildings. Build. Environ. 2020, 186, 107338. [CrossRef]
40. Han, H.; Gu, B.; Wang, T.; Li, Z. Important sensors for chiller fault detection and diagnosis (FDD) from the perspective of feature
selection and machine learning. Int. J. Refrig. 2011, 34, 586–599. [CrossRef]
41. Gao, Y.; Han, H.; Ren, Z.X.; Gao, J.; Jiang, S.; Yang, Y. Comprehensive Study on Sensitive Parameters for Chiller Fault Diagnosis.
Energy Build. 2021, 251, 111318. [CrossRef]
42. Yan, K.; Ma, L.; Dai, Y.; Shen, W.; Ji, Z.; Xie, D. Cost-sensitive and Sequential Feature Selection for Chiller Fault Detection and
Diagnosis. Int. J. Refrig. 2017, 86, 401–409. [CrossRef]
43. Wang, Z.; Wang, Z.; Gu, X.; He, S.; Yan, Z. Feature selection based on Bayesian network for chiller fault diagnosis from the
perspective of field applications. Appl. Therm. Eng. 2018, 129, 674–683. [CrossRef]
44. Zhao, X.; Yang, M.; Li, H. Field implementation and evaluation of a decoupling-based fault detection and diagnostic method for
chillers. Energy Build. 2014, 72, 419–443. [CrossRef]
45. Comstock, M.C.; Braun, J.E. Development of Analysis Tools for the Evaluation of Fault Detection and Diagnostics for Chillers; ASHRAE
Research Project 1043-RP, HL 99-20, Report #4036-3; Purdue University: West Lafayette, IN, USA, 1999.
46. Rossi, T.M. Detection, Diagnosis, and Evaluation of Fault in Vapor Compressor Cycle Equipment. Ph.D. Thesis, Purdue University,
West Lafayette, IN, USA, 1995.
47. Wang, F. The use of artificial neural networks in a geographical information system for agricultural land-suitability assessment.
Environ. Plan. A 1994, 26, 265–284. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Processes 2023, 11, 3299. https://doi.org/10.3390/pr11123299 https://www.mdpi.com/journal/processes
