25th International Conference on Electricity Distribution Madrid, 3-6 June 2019
Paper n° 995
MACHINE LEARNING BASED HEALTH FRAMEWORK FOR POWER TRANSFORMERS
Sílvio RODRIGUES, jungle.ai – The Netherlands
Maria Inês VERDELHO, EDP Distribuição – Portugal
Ana Filipa RIBEIRO, EDP Distribuição – Portugal
Luís CORDEIRO, EDP Inovação – Portugal

ABSTRACT
This paper presents a holistic machine learning solution to forecast the current and future health condition of Power Transformers (PTs). The monitoring and maintenance recommendations of current standards, as followed by utility companies, depend mostly on the type of PT and on past results. We propose a health framework that is capable of accurately deriving key actionable insights. It leverages data from different sources and distinct business units. Current and future oil health conditions are forecasted and summarized in a Key Performance Indicator (KPI). The results show that key oil properties are forecasted up to five years in advance with high accuracy and that the predictions are robust to data outliers. Future oil interventions are then recommended based on the KPI value of the forecasted measurements. The proposed framework allows utility companies to save considerable resources by reducing and improving the scheduling of oil analyses, as well as increasing the quality of service (QoS) standards of energy grids.

INTRODUCTION
Failures of HV/MV and MV/MV power transformers (PTs) have a high impact on the quality of service (QoS) of the network and result in costly repairs. Currently, three main approaches are used to aid PT maintenance [1-3]:
● analysis of the physical properties of oil (e.g. Interfacial Tension (IFT) and acidity) to understand the quality of the insulating oil;
● Dissolved Gas Analysis (DGA), which identifies different types of faults the transformer has suffered based on gas concentrations dissolved in the transformer oil;
● analysis of furan content, which indirectly assesses the quality of paper insulation.
Maintenance operations like oil substitutions or regenerations impact the functioning of the transformer by improving its insulating and cooling capabilities. Currently, the standard recommendation for maintenance of insulating oils of PTs (IEC 60422 [1]) suggests an oil examination frequency based on the type of transformer and past examination results. Depending on these results, different corrective actions are recommended (e.g. oil drying, regeneration or substitution). As expected, not every company follows the standard in the same manner; as a result, business units have different policies to guide their transformer maintenance operations.
Some machine learning (ML) approaches to enhance transformer maintenance have been tested before [4-7]. Most either focus on a specific aspect of PT maintenance, like DGA data [8], or take a more holistic approach to quantify the health index of the transformer [9]. However, existing solutions are based on standard methods that do not take into consideration the specificities of a certain transformer fleet, such as its power load, weather exposure and maintenance policies. Furthermore, maintenance policies may be improved by leveraging more data sources. For example, the PTs' power load should contain relevant information.
In this paper, we share insights distilled from an ongoing project whose main objective is to estimate the current and future PT health. The project is a collaboration between several business units of a large utility company and an artificial intelligence company. This allows the development of robust machine learning models, since they are presented with data of PTs exposed to distinct application types.
The results help to improve the PT fleet in multiple ways:
● prioritization and allocation of resources to timely intervene in the most critical fleet assets;
● schedule, and potentially reduce, oil analyses;
● identify rapidly degrading and risky PTs;
● understand the impact of use and load policies.
With this framework, one is able to have an overall view of the current condition of the PTs, as well as predictions of their future condition. Therefore, decisions can be made not only based on the current maintenance of the PT but also to plan the maintenance of the coming years in a predictive way, thereby anticipating PT failures, which have a very large impact on the grid network.
The remainder of this paper is organized as follows: our health framework (HF) is presented in the next section. Thereafter, each main section of the HF is presented in more detail. We start with the data landscape and its preprocessing. Afterwards, we introduce the pipeline used to train ML algorithms. We then assess the quality of our predictions and distil key findings from them. The actionable insights that are passed to the end users are then introduced. Finally, conclusions, main findings and future work are provided in the last section.

HEALTH FRAMEWORK
We introduce a health framework that enables us to forecast and take action on the predicted future health state of a PT oil. The framework comprises the following
key components:
● aggregation of data from multiple sources;
● data handling and preprocessing;
● predictive modelling;
● translation of predictions to actionable insights.
Our framework learns from the examination data of a transformer fleet, i.e. it is able to adapt to different maintenance policies. This results in a solution that may be adopted by already-existing maintenance strategies without disruption.
Using data originating from different sources enables us to have a more complete understanding of a PT’s state over time, and to capture changes in the trend of that state before the next scheduled PT oil analysis. Given the different nature of these data sources, as well as the size of the aggregated data, modern data handling techniques are required to clean and consolidate the data sources: to parse and transform data into usable formats, and to identify outliers using statistical methods and domain-based rules. We also further augmented the original dataset with engineered features.
Once we are able to reliably predict the future values of the quantities of interest, we translate those into actionable insights through the calculation of Key Performance Indicators (KPIs) and the comparison of these with multi-dimensional action thresholds.
The following diagram summarizes the key sections of the HF, which range from raw PT measurements to actionable insights. Each step of the framework will be explored and explained in detail in the following sections.
Fig. 1 Key components of the proposed framework.

DATA AGGREGATION & PREPROCESSING
We aggregated data from several EDP business units. This ensures that our predictive models were exposed to and learned from as many types of PTs, power load scenarios and maintenance policies as possible.
The main data source contains historical records of oil analyses performed on the PT fleet. These oil analyses measure properties that help assess the quality of the insulating oil. The oil analysis data was collected by a specialized laboratory, whose first samples date from 1989. Firstly, we identified outliers that originated from the manual insertion of the measurements in the database. Another challenge was the uneven sampling of the oil analyses. This was an important aspect that had to be considered while preparing the data to use in the ML models in the next step of the HF.
There were no intervention annotations for oil replacements. These interventions have a high impact on the transformer oil condition. This led to the development of an active learning method to infer interventions based on the different oil properties.
The oil analyses data was enriched with asset record information containing the characteristics of each PT, e.g. age, nominal power and voltage. Online SCADA data containing high frequency data of analog and digital sensors, e.g. current, voltage and power, was also used.
Fig. 2 Diagram with the available data sources.
We identified outliers of each property by taking into account the expected values for each property based on current standards [1]. Given the incompleteness of maintenance data (most interventions have not been annotated), we devised a method to detect unlabeled interventions through an active learning approach. This method not only helped us correctly identify unlabeled interventions but also made possible the detection of inliers that had not been detected because they lie within the expected ranges, as one may see in the figures below.
Fig. 3 Examples of detected interventions (black vertical lines) and outliers present in the data (isolated spikes).
We augmented the oil analyses dataset to allow us to predict over multiple time windows. With this strategy the dataset increased to seven times its original size.

PREDICTIVE MODELLING
In this section we describe how the ML models were trained to forecast the oil characteristics.
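
A minimal sketch of how forecasting examples over multiple time windows could be assembled from unevenly sampled oil analyses is given below. The paper does not describe the augmentation procedure, so everything here is an assumption for illustration: the function name `build_forecast_pairs`, the rule of pairing each sample with every later sample of the same PT, the exclusion of pairs that straddle an intervention, and the made-up IFT values and dates.

```python
from datetime import date


def build_forecast_pairs(samples, interventions, max_horizon_years=5.0):
    """Pair each oil sample with later samples of the same PT.

    samples: list of (date, value) tuples sorted by date (hypothetical format).
    interventions: list of dates on which the oil was replaced or regenerated.
    Pairs that straddle an intervention are dropped, because the later value
    no longer reflects natural ageing from the earlier one.
    """
    pairs = []
    for i, (t0, v0) in enumerate(samples):
        for t1, v1 in samples[i + 1:]:
            horizon = (t1 - t0).days / 365.25
            if horizon > max_horizon_years:
                break  # samples are sorted, so later ones are even further away
            if any(t0 < ti <= t1 for ti in interventions):
                break  # every later sample would also straddle the intervention
            pairs.append({
                "sampled_on": t0,
                "value_now": v0,
                "horizon_years": round(horizon, 2),  # irregular gap kept as a feature
                "target": v1,
            })
    return pairs


# Made-up IFT measurements for one transformer (illustrative values only).
ift_samples = [
    (date(2000, 3, 1), 40.0),
    (date(2002, 3, 1), 36.0),
    (date(2004, 9, 1), 31.0),
    (date(2006, 3, 1), 42.0),
    (date(2008, 3, 1), 40.0),
]
oil_interventions = [date(2005, 6, 1)]

pairs = build_forecast_pairs(ift_samples, oil_interventions)
for p in pairs:
    print(p)
```

Keeping the time gap as an explicit feature is one way to cope with uneven sampling: a single model can then be queried for any horizon up to the maximum, instead of requiring samples at fixed intervals.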
ML models
Initially, several modelling approaches and ML models were considered. Among others, we tested Bayesian non-parametric Gaussian process models, neural networks, and several classical ML algorithms, e.g. k-nearest neighbors and logistic regression. However, the best results were obtained with tree-based models. Several ML packages were used (scikit-learn [10], XGBoost [11] and LightGBM [12]), with XGBoost obtaining the best results.
Hyperparameter optimization
The hyperparameters of the ML models were optimized with a Bayesian optimization algorithm. To this end, the library scikit-optimize [13] was employed. A total budget of 200 iterations was used and, among others, the learning rate and maximum depth of each tree were optimized.
Auto-scalable computation infrastructure
Given the highly computationally demanding nature of the task, an appropriate computation infrastructure was required. A cloud-based distributed cluster computing engine, internally developed at jungle.ai, was used to scale and parallelize the training process. Several dozen computers were used simultaneously to optimize the best performing models. This allowed us to iterate over the results with ease, improve the problem formulation and test different scenarios.

RESULTS
This section summarises the prediction results for IFT, the dielectric dissipation factor (DDF), also known as tangent delta (tg(δ)), and acidity. The following plot (Fig. 4) shows the empirical cumulative distribution function (CDF) of the absolute prediction residuals for the IFT. One can observe that the ML model performs better for PTs whose nominal power lies in the 10-20 MVA range than for those in the 50-100 MVA range. This performance difference may be due to the total sample count for the 10-20 MVA range being approximately 10-fold higher than for the 50-100 MVA range. This result demonstrates the importance of the proposed HF integrating data from different business units (BUs) into a holistic solution, leading to a larger dataset of oil analyses samples.
Fig. 4 IFT prediction absolute residuals.
The figures below show real and predicted measurements for two different PTs, together with an automatically detected oil intervention. The first one shows that our predictions follow the real measurements reasonably well even though the data contains a large outlier. The second plot shows that our predictions did not match the real measurement after the oil intervention. The increased residuals were due to an abnormally high oil ageing rate. This specific PT's load conditions may have been the cause of the increased ageing process.
Fig. 5 Real measurements (in blue), detected intervention (black vertical line) and forecasted measurements (in red).
The next figure (Fig. 7) shows the histograms and respective empirical CDFs of the prediction residuals for IFT, tg(δ) and acidity. The prediction time window ranges from 1 to 5 years.
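
As an illustration of how residuals can be compared through their empirical CDFs, the sketch below computes the fraction of absolute residuals at or below a given threshold for two nominal power groups. The residual values are made up for the example and are not taken from the paper; the function name `empirical_cdf` and the group labels are likewise only illustrative.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return a function x -> fraction of values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return cdf


# Made-up absolute IFT residuals per nominal power range (illustrative only).
abs_residuals = {
    "10-20 MVA": [0.5, 1.0, 1.5, 2.0, 4.0],
    "50-100 MVA": [1.0, 3.0, 5.0, 8.0],
}

threshold = 2.0
share_within = {
    group: empirical_cdf(res)(threshold) for group, res in abs_residuals.items()
}
for group, share in share_within.items():
    print(f"{group}: {share:.0%} of residuals are <= {threshold}")
```

A curve that rises faster (a larger share of residuals below the same threshold) indicates better predictions for that group, which is how the two power ranges are compared above.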
The table below it (Fig. 6) shows different quantiles of the absolute prediction residuals for different prediction time windows. The values are presented as a percentage of the typical range amplitude of each quantity. The typical range amplitudes are: 34.32 for the IFT, 0.348 for tg(δ) and 0.744 for the acidity. As expected, the error increases with longer prediction time windows. It is important to note that the prediction quality does not heavily deteriorate for a five-year window and that actions may still be planned based on these predictions.
Fig. 7 IFT, tg(δ) and Acidity prediction residuals counts and empirical CDF.
Fig. 6 Residuals for IFT, tg(δ) and Acidity predictions (% of typical range amplitude).

ACTIONABLE INSIGHTS
The main objective of the proposed HF is to derive actionable insights from the high accuracy forecasts and act in accordance with them. In this section, we demonstrate how we leveraged the predictions of the oil properties, presented in the previous section, and used them to create an actionable metric.
Summarizing the oil condition
We mapped the oil condition to a single value. To do so, we employed a dimensionality reduction algorithm, which allowed us to aggregate the three quantities of interest into one. We treat this one-dimensional representation of the oil condition as a KPI.
Oil condition before an intervention
Since we now have a one-dimensional representation of the oil state, we may understand what the usual oil condition is before an intervention is carried out.
The empirical distribution of the KPI values immediately before an intervention was performed is shown in the next plot; in red, the corresponding CDF. The plot shows that one can interpret the KPI as a metric of oil degradation, since the samples right before an intervention tend to correspond to high values of the KPI. With this plot, we can attribute an actionable meaning to the KPI value, e.g.: “40% of the interventions were performed before the KPI reached a value of 20”.
Fig. 8 CDF and KDE estimated distribution of the KPI values immediately prior to an oil intervention.
Forecasting the oil condition
With this method we can map the three predicted oil characteristics to the KPI space. Thus, we can forecast future KPI values. Consider the following example of a transformer’s KPI value over time, with the true values shown in blue. The oil interventions are marked by the vertical black lines, and the forecasted KPI values are shown in red. It is important to note that these were all predicted at least 2 years ahead.
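
The mapping from three oil properties to a single KPI, and the reading of a KPI value against past interventions, can be sketched as follows. The paper does not name the dimensionality reduction algorithm, so this example assumes the first principal component of the standardized properties (computed by power iteration) purely for illustration. The sign convention (KPI grows with acidity), the function names and all numbers are assumptions, and the resulting KPI is in standardized units rather than on the scale used in the paper's figures.

```python
import math


def standardize(rows):
    """Z-score each column so the three properties are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def first_component(z, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    v = [1.0, 0.5, 0.25]  # arbitrary non-symmetric start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if v[2] < 0:  # assumed orientation: higher acidity means higher KPI
        v = [-x for x in v]
    return v


def fraction_at_or_below(kpi_before_interventions, value):
    """Share of past interventions whose pre-intervention KPI was <= value."""
    hits = sum(1 for k in kpi_before_interventions if k <= value)
    return hits / len(kpi_before_interventions)


# Made-up samples of (IFT, tg(delta), acidity), ordered from new to aged oil.
oil_samples = [
    (40.0, 0.01, 0.02),
    (35.0, 0.03, 0.05),
    (30.0, 0.06, 0.10),
    (25.0, 0.10, 0.16),
    (20.0, 0.15, 0.25),
]
z_scores = standardize(oil_samples)
loadings = first_component(z_scores)
kpi = [sum(a * b for a, b in zip(row, loadings)) for row in z_scores]
print("loadings:", [round(x, 3) for x in loadings])
print("KPI per sample:", [round(k, 2) for k in kpi])

# Made-up KPI values observed right before past interventions.
kpi_before = [10, 15, 20, 25, 30, 12, 18, 22, 28, 35]
print("share of interventions at or below 20:",
      fraction_at_or_below(kpi_before, 20))
```

In this sketch the IFT loading comes out negative and the other two positive, so the KPI rises as the oil ages, which matches the interpretation of the KPI as a degradation metric.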
Fig. 9 Example of KPI predictions over time for a given PT.
As expected, the KPI value increases over time (the oil is degrading), until an intervention is performed and the oil’s health is restored. From that point on, the oil resumes its natural degradation process.
Using the empirical CDF plot from the previous subsection, we can convert the forecasted KPI values into actionable insights. For instance, we obtained a forecasted value of 35 for the first intervention date. This means that at least 2 years in advance, we could have said: “At this given date, the oil will reach a condition such that approximately 95% of all previous interventions were performed before the oil reached such a state”.
However, in some cases, we are not able to accurately predict the quantities of interest. A possible explanation for the lack of predictive ability in those cases is the fact that external factors, such as electrical load and weather conditions, were not considered.

CONCLUSIONS AND FUTURE WORK
Forecasting the health state of PT oil has the potential of saving utility companies considerable amounts of resources, as well as increasing the QoS standards of energy grids. We have proposed an approach that leverages diverse data sources and comprises data aggregation and processing, modelling and interpretation. The predictive power of our approach enabled the creation of a metric that can be acted upon, thus allowing more informed maintenance policies. Hence, decisions can be made not only based on the current maintenance of the PT but also to plan the maintenance of the coming years, anticipating failures in PTs, which have a large impact on the electrical network.
Current efforts for improving the present results consist of integrating data relative to external factors that are known to impact the PT, such as electrical load and weather conditions. This will enable a better understanding of how these factors affect the health of the PT. Furthermore, it will allow the exploration of forecast scenarios conditioned upon different behaviours of said external factors.

Acknowledgments
The authors would like to thank the colleagues from the LABELEC laboratory for the useful discussions and exchange of ideas. We would also like to thank the different EDP business units for their support, knowledge sharing and openness. Finally, we would like to thank the jungle.ai team for their contribution, without which these results would not have been achieved.

REFERENCES
[1] IEC, 2013, "Mineral insulating oils in electrical equipment - Supervision and maintenance guidance"
[2] IEEE, 2008, "C57.104-2008 - IEEE Guide for the Interpretation of Gases Generated in Oil-Immersed Transformers"
[3] F. Ortiz, C. Fernández, A. Santisteban, F. Delgado, A. Ortiz, 2016, "Estimating the age of power transformers using the concentration of furans in dielectric oil", Proceedings ICREPQ conference, RE&PQJ, vol.1, no.14
[4] P. M
[5] R. A. Prasojo, K. Diwyacitta, Suwarno, H. Gumilang, 2017, "Transformer Paper Expected Life Estimation Using ANFIS Based on Oil Characteristics and Dissolved Gases (Case Study: Indonesian Transformers)", MDPI Energies, vol.10, iss.8
[6] W. Zuo, H. Yuan, Y. Shang, Y. Liu, T. Chen, 2016, "Calculation of a Health Index of Oil-Paper Transformers Insulation with Binary Logistic Regression", Mathematical Problems in Engineering, vol. 2016, Article ID 6069784
[7] H. Song, J. Dai, L. Luo, G. Sheng, X. Jiang, 2018, "Power Transformer Operating State Prediction Method Based on an LSTM Network", MDPI Energies, vol.11, iss.4
[8] M. Duval, L. Lamarre, 2014, "The Duval Pentagon - A New Complementary Tool for the Interpretation of Dissolved Gas Analysis in Transformers", IEEE EI Magazine, vol.30, no.6
[9] G. Brandtzæg, 2015, "Health Indexing of Norwegian Power Transformers", NTNU
[10] Pedregosa et al., 2011, "Scikit-learn: Machine Learning in Python", JMLR, vol.12, 2825-2830
[11] T. Chen, C. Guestrin, 2016, "XGBoost: A Scalable Tree Boosting System", Proceedings of the 22nd SIGKDD conference, ACM, 785-794
[12] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T. Liu, 2017, "LightGBM: A Highly Efficient Gradient Boosting Decision Tree", NIPS, vol.30, 3149-3157
[13] T. Head et al., 2018, "scikit-optimize v0.5.2", https://zenodo.org/record/1207017