Structural Health Monitoring as a Big-Data Problem

Article in Structural Engineering International · July 2018

DOI: 10.1080/10168664.2018.1461536

2 authors:

Christian Cremona, Bouygues Construction
João Pedro Santos, National Laboratory for Civil Engineering


Introduction: The Value of Health Monitoring in Structural Performance Assessment

Structural health monitoring (SHM) comprises a wide range of activities, which, through different technologies and algorithms, can provide extensive information about the performance of existing and new structures over their life cycle. Despite the wide acceptance of SHM as a useful aid to infrastructure risk and integrity management, its value is seldom quantified and communicated to infrastructure owners and operators. This special issue aims to rectify this by offering an overview of the latest scientific advancements and professional activities in the field of SHM technologies and algorithms.

The special issue starts out by analysing big-data principles, demonstrating that the challenges inherent to big data share much in common with those of SHM and concluding that there is great potential for making significant advances in both fields through the interchange of information and experience. The value of SHM is addressed in the second contribution, which demonstrates how analyses of the value of damage detection information can be utilised for determining optimal condition assessment and intervention times throughout the service life of a bridge. In the third contribution, a decision process on how to implement and optimise SHM is proposed and its effectiveness is demonstrated through its application to a stadium roof prone to heavy snowfall accumulations.

These contributions are followed by a series of six scientific papers that present SHM technologies and algorithms related to a diverse range of challenges surrounding bridge performance, comprising: (a) a framework for evaluating the fatigue performance of concrete bridges, (b) a temperature deformation model for ensuring passenger comfort on high-speed trains, (c) an investigation into the combined effects of scour and seismic actions, (d) an enhanced damping estimation procedure for cable-supported bridges, (e) a process for the life-cycle assessment of concrete bridges exposed to spatially distributed chloride ingress, and (f) the correlation assessment of masonry bridge building quality and its ageing process.

The special issue concludes with novel utilisations of digital image correlation systems for documenting shear failure mechanisms in concrete beams and assessing the performance of the expansion joints on a major bridge. This contribution encompasses novel digital image correlation analysis for the purpose of long-term monitoring and assessment in relation to bridge risk and integrity management.

In addition, the role of SHM in facilitating suspension-bridge management is reported, and curvature measurement is proposed as an indirect way of monitoring prestressing forces.

Ana Mandić Ivanković, SEI Editorial Board; Sebastian Thöns, Technical University of Denmark; José Matos, University of Minho, Portugal.

Structural Health Monitoring as a Big-Data Problem

Christian Cremona, Director, Technical Division, Bouygues TP, Guyancourt, France; João Santos, Researcher, Structures Department, LNEC,
Lisbon, Portugal.
DOI: 10.1080/10168664.2018.1461536

Abstract

Structural health monitoring (SHM) has evolved over decades of continuous progress in measuring, processing, collecting and storing massive amounts of data that can provide valuable information for owners and managers in order to control and manage the integrity of their structures. The data sets acquired from SHM systems are undoubtedly of the “big data” type due to their sheer volume, complexity and diversity, and conducting relevant analyses of their content can help to identify damage or failure during operation through the relationships between the measurements taken by multiple sensors. A great deal can be learned from these large pools of data, resulting in significant advances in efficient integrity control. From banking to retail, many sectors have already embraced big data, which is often synonymous with “big expectations”; in the present case, it offers opportunities to apply data-processing research to the development of more efficient SHM systems with real-time capabilities. By presenting various examples of bridge monitoring systems, this paper contributes to the ongoing cross-disciplinary efforts in data science for the utilization and advancement of SHM.

Keywords: big data; structural health monitoring; forward techniques; pattern recognition; artificial intelligence; advanced statistics

Introduction

Civil-engineering structures are continuously exposed to aggressive environmental and operational conditions, including extreme weather events and damage due to accidents, all of which result in wear to their components and constitutive materials, as well as potential unexpected structural changes. The thorough quantitative assessment of the performance and integrity of any structure is thus of paramount importance. This is the core objective of structural health monitoring (SHM): to determine and monitor the serviceability and safety of structures. This can be expressed as activities wherein conditions and parameters are observed, measured, recorded and processed [1]. This discipline has been addressed, for many
years, by designers, contractors and owners with almost identical objectives: to check that the structures behave as intended. This focus has generated a fast-growing body of research on a wide variety of structures [1–6].

During the 1990s, SHM underwent major changes due to advances in computing and information technology. Highly accurate sensors, sophisticated signal conditioning units, optical and wireless networks, global positioning systems and other technologies have all contributed to the development of more accurate, cost-efficient measuring and monitoring systems. As a consequence, the volume of data sets has increased tremendously, at rates of millions of measurements per sensor, per hour [7]. These new, richer data sets offer the ability to characterize structural behaviour in unprecedented ways, leading to international enthusiasm for implementing large monitoring systems on structures. Figure 1 shows the exponential increase in the number of sensors installed on some major new bridges worldwide during the past 20 years; it actually underestimates the quantity of measurements, since sensors usually gather measurements across several channels, and the figure does not take into account the measurement rate itself. SHM has thus emerged as a high-tech discipline that is mostly driven by technological capabilities rather than clear objectives and expectations. However, an optimal cost–benefit ratio is still far from the reach of those who operate the vast majority of SHM systems worldwide, and numerous research groups are focused on finding ways to better utilize the large amounts of acquired data, either by optimizing the number of deployed sensors or by developing strategies for dealing with greater amounts of information in real time [1].

Fig. 1: Evolution of the number of sensors during the past 10 years in major monitored bridges

In past decades, the stakeholders of structures have been sceptical regarding the advantages of SHM because it was seen as a generator of huge data sets of information without practical use [1]. This issue has hampered the development and application of this discipline in the past, but the ever-decreasing cost of digital systems for processing, transporting and storing data has contributed to the popularity of SHM to the extent that some researchers are using the term “data deluge” and promoting the need to go beyond the data in order to extract meaningful information [8,9]. This has led to an important breakthrough in pattern recognition [10–14] and motivated the expansion of computing platforms with the aim of extracting value from large amounts of complex data in real time.

Understanding the contribution of big data to SHM requires the clarification of both of these concepts. The present paper therefore addresses this objective and attempts to understand the connection between them. For this purpose, several bridge SHM systems are used as proof-of-concept examples in order to provide clear information that can be used for research and practical purposes in the realm of SHM. The next section addresses and describes the major challenges for the fields of big data and SHM, after which several examples of monitored bridges are presented in order to show the data-processing stages and challenges associated with SHM. Finally, the most important ideas are summed up and discussed, and several conclusions are drawn.

Big Data and Structural Health Monitoring (SHM)

Big Data: Overview and Challenges

The term “big data” is so popular today that it is overused without a clear understanding of its meaning [15]. It not only refers to a specific volume of data sets that is considered vast [16], but also to a wide range of data sets that cannot be handled with traditional data-management tools and conventional data-processing hardware and software frameworks. The main reason for this is not only the data’s size but also its complexity, since these data sets are usually unstructured, distributed and multifaceted [15,16]. The term “big data” is usually defined by the “3 Vs” [16,17]:

- Volume, i.e. the quantity of stored data.
- Variety, i.e. the type and nature of the data.
- Velocity, i.e. the speed at which the data is generated and processed.

Two other “Vs” are often added to the description of big data [18,19]:

- Variability, i.e. the inconsistency of the data set.
- Veracity, i.e. the quality of the captured data.

The value added by big data can lead to more confident decision-making, greater operational efficiency, improved cost reduction and reduced risk regarding any action, process or business. Despite these potential benefits, however, many technical challenges must be addressed [16]. The most significant challenges relating directly to the data include analysis, capture, curation, searching, sharing, storage, transfer, visualization, querying and information privacy [15–19], while the peripheral challenges include the computing architecture, redundancy, failsafe processes, collaborative and automated coding and meta-data modelling, among others [20]. These challenges have been at the heart of the trends in the big-data field, not only for management and computing power but also, and most importantly, for data analytics and pattern recognition [10–14]. The use of sophisticated algorithms on big-data sets has resulted in the development of distributed computing platforms, the mapping of data onto a large number of nodes (computing
clusters) and the use of the massive computing power of graphical processing units (GPUs).

SHM: Overview and Challenges

SHM can be simplified into the process of implementing a damage detection and characterization strategy for civil-engineering structures [21–23]. Damage is defined as the modification of material and/or its geometric properties, as well as changes to the boundary conditions and system connectivity. Said differently, damage is anything that affects a system’s performance. SHM involves the scoring of performance indicators using visual or measured data and analytical simulations combined with engineering experience, the final result of which is the development of a health profile consisting of performance profiles that allow the “future health” of a structure to be accurately predicted. SHM is therefore not limited to a single type of instrumentation or process of data acquisition, but rather aims at extracting features (indicators) to determine the current (diagnosis) and future (prognosis) operating conditions of a structure [3].

The SHM pipeline starts with on-site data acquisition, which consists of periodically sampling dynamic response measurements from an array of sensors. This is generally followed by the extraction of damage-sensitive features from these measurements and the classification of these features to determine the current state of the system. The most important aspect of any SHM framework is its capacity to provide data that can update the health profile; in other words, the information acquired on-site, with its uncertainties and multiple data formats, must be incorporated into either the numerical models (inverse strategies) or the statistical and pattern recognition methods (forward analysis) [3]. Since the data flow may originate from an array of different sensors which capture distinct phenomena at different sampling rates under varying conditions, an additional challenge consists of suppressing the effects of environmental and operational actions on the structural responses, a process referred to in the SHM literature as normalization.

In SHM today, health profiles for both diagnosis and prognosis generally rely on inverse strategies. Usually referred to as model updating or system identification, these strategies consist in finding the parametric physical models which best fit the structural responses obtained from the sensors to infer information that cannot be directly measured on-site. They are therefore generally forced to use optimization or stochastic search techniques for computational efficiency, leading to true mathematical but non-physical solutions [1,24,25].

Forward (data-driven) approaches are the opposite of inverse ones and their use has recently increased both in the SHM literature and in practice. They do not require the development of numerical or analytical models to be fitted with in situ data. In relation to SHM, forward methodologies potentially present a more flexible and inexpensive solution, since they do not require the development of specific models for each system [6]. These techniques aim at extracting sensitive information from a time series on-site using statistical and analytical methods; the recent SHM literature reports their applications to structures that are subjected to operational and environmental effects [26,27], showing that data-based algorithms can more easily reproduce these effects when compared to numerical or analytical models. Moreover, their computational simplicity situates them as the most suitable solution for carrying out real-time analyses of the data generated by the SHM of large-scale structures [28].

Big Data vs. SHM

The review of the latest literature related to SHM and big data reveals that the forward strategies for SHM exhibit the same pipelines and challenges as big data. Table 1 compares the different blocks or conceptual steps of big data and SHM pipelines, while Table 2 synthesizes the different challenges in both fields [1,16–18]. This double matching suggests that forward approaches to big data can easily be transferred to SHM, offering greater capabilities for integrity control and structural management compared to inverse strategies for SHM.

Pipeline
Big data          Forward SHM
Recording         Acquisition
Cleansing         Normalization
Aggregation       Data fusion
Modelling         Prediction
Interpretation    Classification
Table 1: Pipelines for big data and forward approaches to SHM

The field of big data promotes solutions for dealing with the kinds of large-scale data sets that are generated by SHM. For instance, the volume and dimensionality of the data require storage across multiple machines and decentralized learning with parallel GPU computing; thus, SHM as a big-data problem needs to account for storage, access, communication and privacy (as it relates to the details of structures and infrastructure that must be kept secret or secure). In addition, the challenges inherent to big data include navigating huge data sets, detecting new phenomena and “learning”, which is where the current topics and trends of the field are most valuable in relation to SHM; ample opportunities are available to SHM researchers and practitioners to find solutions to the technical and data-processing challenges which are not only the subject of study and practice in the realm of SHM but are also situated in the frameworks of the numerous different activities and actors surrounding the use of big data worldwide.

Common Big Data and SHM Data-Processing Challenges and Solutions

As mentioned in the previous section, forward approaches are generally preferred over inverse ones in modern SHM systems and frameworks. In addition, they exhibit the important advantage of sharing most of the data-processing challenges with the big-data-specific domain. In this section, the various parts of forward
data-driven approaches to SHM (Table 1) are described and solved as parts of a big-data processing pipeline with the objective of showing that numerous solutions of this type are already in use in SHM, while also reinforcing that the present and future solutions developed for big data can also be similarly applied to SHM. The exception to this is recording, which is highly specific for each application and therefore it is not possible to obtain generic conclusions for it.

Big data       SHM
Volume         Data flow
Variety        Data types
Velocity       High-frequency sampling
Variability    Uncertainties
Veracity       Corrupting noise
Table 2: Main challenges for big data and forward approaches to SHM

Cleansing

The process of data cleansing (using feature selection to determine whether to pass on or reject data) can be divided into two steps. The first step consists in detecting and removing corrupt or inaccurate records from a record set [29]. Used mostly in databases, the term “cleansing” refers to the process of identifying irrelevant or missing parts of the acquired data and then modifying or deleting this dirty or coarse data. After cleansing, the data set will be consistent with other similar data sets in an existing system, or will have been modified to fulfil the preset requirements for the relevant application. The cleansing process is generally based on defining hard limits outside of which the data makes no sense for the application at hand, or discarding fields with missing values or unexpected types of data (for example where a text string has been entered into a numeric field).

The second step, normalization, is the process of suppressing changes related to phenomena which are not relevant to the application at hand or to the analyses to be conducted. After performing the tasks associated with this step, the data is expected to be “normalized” in the sense that the changes observed in all samples (future, past and present) can be analysed in the same way [1,23,27].

Depending on the application, the data normalization strategies may have to deal with two types of influencing factor: those which are known to occur and influence the data in relation to the chosen application, and those which may or may not occur. The first type is generally included in the true distribution of the acquired data, while the second may or may not generate different data distributions, so its objects are therefore generally known as outliers [1,28]. Outliers are records or samples belonging to different true statistical distributions, and their removal from data analysis is generally conducted before feature extraction by applying statistical tests through the use of appropriate robust measures of expected values and scatters with high breakdown values (the number of values belonging to different distributions which cause a certain statistical measure to produce an incorrect estimation regarding the distribution under analysis) [1]. Forward applications also require this type of data cleansing; the most well-known multivariate robust estimators, which consist of the minimum volume ellipsoid (MVE) and the minimum covariance determinant (MCD) [30], are used for the estimation of the covariance and median values [28]. Due to the large size of SHM data sets, efficient versions of these algorithms are generally used.

An example of the cleansing of data generated by different unexpected sources is shown in Fig. 2 for the International Guadiana Bridge, which is located in the south-west of the Iberian Peninsula and connects the Portuguese region of the Algarve to the Spanish region of Andalusia. The bridge is being monitored for multiple phenomena using approximately 50 sensors sampled at 100 Hz (Fig. 2a), whose data is compressed into one measurement per hour, per sensor. Figure 2b presents a time series of the rotation at the top of pylon P3, showing the influence of temperature as well as multiple outliers related to incorrect heavy traffic filtering (highlighted in green) and sensor malfunction (highlighted in red). These values were correctly identified as belonging to different distributions by: (a) using the structural response along with the structural temperature to compute the Mahalanobis distance describing the covariance between the two variables, (b) conducting statistical hypothesis tests on the location of the expected value of the Mahalanobis distance (with a 99% confidence level) using the chi-squared statistical distribution, and (c) applying a Kolmogorov–Smirnov hypothesis test to the data set before and after the outlier removal to check if it consists of the same data distribution. The results of this process are shown in Fig. 2c and 2d for the first and second statistical tests, respectively, wherein it can be observed that each test identified a different unexpected source of variation.

The normalization strategies for the variability generated by “regular” and permanent factors are generally dealt with in SHM big-data applications using either regression models or latent-variable statistical algorithms. Regular variability in SHM applications is usually caused by environmental and operational factors such as temperature, wind, humidity and traffic [22,23,31], and may actually be much greater than the variability generated by novel behaviour or even damage, especially at its onset [26,31,32]. An SHM strategy that does not include a data normalization procedure for regular variability in its workflow is therefore unlikely to detect novel behaviour, and will certainly be prone to generating false detections when unexpected environmental and operational conditions occur.

The regression-based approach consists in measuring the variations in operational and environmental actions along with the structural responses and then establishing regression models between the two data sets. Multivariate linear regression (MLR) models or nonparametric nonlinear models such as multilayer perceptron (MLP) neural networks [23,31,33], which are capable of defining, by themselves, the relations which provide the function between actions and responses, are the most popular techniques. The definition of such models for normalization in SHM is exemplified herein using the case study of the viaduct located at PK075 + 317 of the south-east high-speed French railway line (Fig. 3a), which was the subject of retrofitting (Fig. 3b) with the aim of increasing the value of the frequency of the first natural mode in order to avoid resonance problems during the passage of

[Figure 2: (a) elevation of the bridge between Spain and Portugal with pylons P1 to P4 and sensor locations; (b) to (d) time series plotted against Time Index (days), with outliers marked]

Fig. 2: International Guadiana Bridge: (a) overview of the monitoring system, (b) rotation on the top of P3 pylon, (c) first removal of outliers, (d) second removal of outliers.
*CL = tilt meter; DH = horizontal relative displacement; NL = vertical displacement sensor; P = pylon; T = thermometer.
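
The first two screening steps described for the Guadiana Bridge data (a Mahalanobis distance between response and temperature, then a chi-squared threshold at the 99% level) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the classical sample covariance rather than the robust MVE/MCD estimators mentioned in the text, it omits the Kolmogorov–Smirnov check, and the synthetic series, the function names and the injected errors are all hypothetical.

```python
import math
import random


def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample mean,
    using the classical 2x2 sample covariance (not a robust MVE/MCD estimate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    dists = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of the 2x2 covariance.
        dists.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return dists


def chi2_quantile_2dof(p):
    """Quantile of the chi-squared distribution with 2 degrees of freedom.
    For 2 dof the CDF is 1 - exp(-x / 2), so it inverts in closed form."""
    return -2.0 * math.log(1.0 - p)


def flag_outliers(temps, rotations, confidence=0.99):
    """Indices whose squared distance exceeds the chi-squared threshold."""
    limit = chi2_quantile_2dof(confidence)
    d2 = mahalanobis_sq(temps, rotations)
    return [i for i, d in enumerate(d2) if d > limit]


def make_demo_series(n=200, seed=0):
    """Hypothetical hourly series: rotation follows temperature, plus three
    injected gross errors standing in for sensor malfunctions."""
    rng = random.Random(seed)
    temps = [15.0 + 10.0 * math.sin(2.0 * math.pi * i / 50.0) for i in range(n)]
    rots = [-0.02 * t + rng.gauss(0.0, 0.01) for t in temps]
    for i in (40, 120, 121):
        rots[i] += 1.0
    return temps, rots


if __name__ == "__main__":
    temps, rots = make_demo_series()
    print(flag_outliers(temps, rots))
```

Because the distance is computed jointly on temperature and response, a sample is flagged when it departs from the temperature-driven trend, not merely when its raw value is large. With many or clustered outliers the classical covariance can be distorted (masking), which is the motivation for the robust estimators cited in the text.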

TGV trains. The bridge was monitored before, during and immediately after the retrofitting, and then again approximately two years later, at a rate of 500 samples per second across 10 accelerometers during the passage of trains [34,35]. As each train crossed the viaduct, the values of the frequency of the first natural mode were obtained along with the structural temperature and then modelled using linear regression and MLP non-linear regression. Figure 3c shows that the relation between the temperature and the frequency is strongly non-linear, and that the MLP is more capable of capturing it than linear regression. As a consequence, the differences between the data acquired and the MLP estimations can be considered free of the effect of the temperature,

[Figure 3c plot: frequency of the 1st natural mode (Hz) against temperature (°C, 0 to 30); series: Experimental, Lin. Regression]

Fig. 3: Examples of MLR and MLP models: (a) monitoring of the dynamic response of a high speed track railway bridge; (b) retrofitting device;
(c) relation between 1st frequency and temperature
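
The regression-based normalization illustrated in Fig. 3c can be sketched as a comparison of residuals from a linear and a non-linear model of frequency against temperature. The sketch below is an assumption-laden illustration: a quadratic polynomial is swapped in for the MLP so that the example stays short and dependency-free, and the temperature range, frequency values and noise level are hypothetical rather than taken from the viaduct data.

```python
import random


def polyfit(x, y, degree):
    """Least-squares polynomial coefficients c[0] + c[1]*x + ... obtained by
    solving the normal equations with Gaussian elimination."""
    m = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    for col in range(m):
        # Partial pivoting for numerical stability.
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= f * a[col][k]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = b[i] - sum(a[i][k] * coef[k] for k in range(i + 1, m))
        coef[i] = s / a[i][i]
    return coef


def residuals(temps, freqs, degree):
    """Measured minus estimated frequency for a polynomial of given degree."""
    mean_t = sum(temps) / len(temps)
    tc = [t - mean_t for t in temps]  # centring improves conditioning
    coef = polyfit(tc, freqs, degree)
    return [f - sum(c * t ** k for k, c in enumerate(coef))
            for t, f in zip(tc, freqs)]


def rms(values):
    return (sum(v * v for v in values) / len(values)) ** 0.5


def make_demo_data(seed=0):
    """Hypothetical non-linear frequency/temperature relation with noise."""
    rng = random.Random(seed)
    temps = [0.5 * i for i in range(61)]  # 0 to 30 degrees Celsius
    freqs = [5.0 + 0.002 * (t - 30.0) ** 2 + rng.gauss(0.0, 0.01) for t in temps]
    return temps, freqs


if __name__ == "__main__":
    temps, freqs = make_demo_data()
    print("linear residual RMS:    ", rms(residuals(temps, freqs, 1)))
    print("non-linear residual RMS:", rms(residuals(temps, freqs, 2)))
```

When the underlying relation is non-linear, the linear residuals keep a temperature-dependent pattern and a larger scatter, whereas the non-linear residuals shrink towards the measurement noise; this is the sense in which the text describes the non-linear estimates as free of the temperature effect.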

while the differences obtained from the linear models show evidence of the influence of the temperature.

The latent-variable approach to normalizing regular variability in SHM data relies on the well-known principal component analysis (PCA), or non-linear versions of it such as auto-associative neural networks (AANNs), and can characterize the influence, in a set of measured structural responses, of the effects generated by operational and environmental actions without the need to measure them [1,32,36]. This offers an advantage when studying complex structural responses that can be generated by actions whose appropriate quantification and characterization may be both difficult and expensive.

Regardless of the method used, the objective of normalizing the regular variability is to suppress any trends related to known actions along the time series of the structural responses. Trends are completely removed when the corresponding differences between the measured and estimated data exhibit seemingly random distributions and thus exhibit a null auto-correlation function (ACF). Hence, if fully effective estimation models are applied to a time series acquired from unchanged and undamaged structures, the ACF of the resulting residuals must exhibit null expected values and variability. As an example, the monitoring system of the Salgueiro Maia Bridge is considered [1,36]. As depicted in Fig. 4a, the tension of the four stay cables highlighted in red is continuously monitored using single-strand load cells. In addition, numerous sections of the deck and pylons are monitored with thermal probes and embedded strain gauges.

The data set used for the MLP approach consists of the average (T2m and T10m) and differential (T2d and T10d) temperatures of sections S2 and S10 as the measured influencing factors, along with the average strain obtained from sections S2, S4 and S10 (C2m, C4m and C10m) as well as the tension of the four stay cables (SC24, SC29, SC33 and SC36). For the PCA, only the strain and tension data was used.

After 15 years in service, creep and shrinkage are shown to have a near null influence on all structural responses with the exception of strain, for which the influence is mild to small. Hence, the variability of the stay cable forces is mostly influenced by temperature, whereas the average strain measurements are slightly influenced by the concrete rheology. Thus, the most effective normalization method should be capable of producing residual ACFs with the highest expected values and variances for strain and the smallest ones for stay cable tension. The residuals obtained from the MLP and PCA approaches (Figs. 4b,c, respectively) reveal that both methods are effective at removing the temperature effects but are not capable, as expected, of removing the effects of creep and shrinkage: the stay cable forces exhibit ACF distributions close to zero while those obtained from the strain are located between 0.7 and 0.95.

Aggregation

The rising complexity and volume of data presents new opportunities and challenges for both visualization and analysis tools that are required to capture global patterns and structural information. This is a common problem in big data and SHM and is divided into dimensionality reduction (the number of variables considered for visualization and analysis) and length reduction (the number of samples per variable).

Vibrational SHM is a good example of big-data aggregation in terms of both dimensionality reduction and length reduction. This type of monitoring aims at assessing the health of a

[Figure 4: (a) elevation of the bridge between Santarém and Almeirim with pylons P1 to P8, spans of 40.5, 42.0, 78.0, 246.0, 78.0, 42.0 and 40.5, instrumented sections S2, S4 and S10 (strain gauges and thermometers) and stay-cable load cells SC24, SC29, SC33 and SC36; (b), (c) auto-correlation box plots for C2m, C4m, C10m, SC24, SC29, SC33 and SC36]

Fig. 4: The Salgueiro Maia Bridge [36]: (a) monitoring system and box-and-whiskers plots of the auto-correlation function for the differences obtained from the (b) multilayer perceptron and the (c) principal component analysis.
*C = strain gauge; P = pylon; S = section; SC = stay cable; T = thermometer.
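
The residual check summarized in Fig. 4 (an auto-correlation function close to zero when the trends have been removed, and well above zero when a slow effect remains) can be sketched with a direct ACF estimate. This is an illustrative sketch only: the two residual series are synthetic, the drift rate and noise level are hypothetical, and the function names are not from the original study.

```python
import random


def acf(series, lag):
    """Sample auto-correlation of a series at a given positive lag."""
    n = len(series)
    mean = sum(series) / n
    den = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / den


def mean_acf(series, max_lag=20):
    """Average ACF over lags 1..max_lag, a crude one-number summary."""
    return sum(acf(series, k) for k in range(1, max_lag + 1)) / max_lag


def make_demo_residuals(n=1000, seed=1):
    """Two hypothetical residual series: one fully normalized (white noise)
    and one that still carries a slow drift on top of the same noise level."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    drifting = [0.01 * t + rng.gauss(0.0, 1.0) for t in range(n)]
    return clean, drifting


if __name__ == "__main__":
    clean, drifting = make_demo_residuals()
    print("normalized residual, mean ACF:", mean_acf(clean))
    print("drifting residual, mean ACF:  ", mean_acf(drifting))
```

A residual series with near-zero ACF at all lags behaves like noise, while a residual that still contains a slow, unmodelled effect keeps a high ACF over many lags, which is the contrast the text reports between the stay-cable and strain residuals.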

structure by measuring its vibrational response, which is assumed to change in relation to mass and/or stiffness variations [31,34]. Signals are usually acquired at a rate of hundreds or even thousands of samples per channel per second, resulting in data sets that can easily become unmanageable if they are not processed in real time using operational modal analysis. This is achieved through the use of a set of signal processing techniques capable of translating the millions of measurements comprising the acquired data into the structural dynamic features, which consist of a set of frequencies, damping ratios and mode shape coordinates. As an example, a monitoring system with 100 channels producing 1024 values per channel per second generates a total of 102 400 values per second, which exceeds 368 000 000 values per hour. However, using 20 natural vibration modes, the same information is reduced to 2040 values (20 frequencies, 20 damping ratios and 2000 modal coordinates). The data aggregation conducted for Pi-57 of the French A1 motorway, which travels over the River Oise (Fig. 5a) [31], is a good example of dynamic SHM big-data aggregation, as it was continuously monitored at a rate of 250 samples per second across 18 acceleration sensors over a period of several months, thus generating a large volume of data. This data was, however, compressed into eight natural vibration modes, whose frequencies are shown in Fig. 5d using black dots over the frequency spectrum. The shapes of the first four natural vibration modes are shown in Fig. 5b, where the vertices represent the 18 sensor locations. Thus, from the 18 acceleration sensor time series comprising 250 values per second, the big-data aggregation procedure results in 34 time series (8 frequencies, 8 damping ratios and 18 modal displacements) with information updated every 3 h.

In addition to operational modal analysis, other big-data aggregation methods are already being used for SHM applications. A good alternative, bearing in mind the same principles and objectives of modal analysis, is the extraction of auto-regression or wavelet coefficients. Rather different in terms of concept and objectives but also effective are strategies such as the definition of symbolic data objects, which can generally be applied as a second layer of data aggregation to further compress modal, auto-regressive or wavelet information. Symbolic data objects can be defined as richer, less voluminous and less specific types of information when compared to measurements acquired on-site [2,21,34]. While classical data mining focuses on detecting groups or patterns in sets of acquired data, symbolic data deals with concepts that must be properly defined for the application at hand to statistically describe the analysed data. For instance, a set of hourly modal data acquired from 15 sensors and 20 natural modes over the period of one week can be described by 57 120 values (340 modal quantities for each of the 168 hours) or can consist of a single symbolic object named “one week of modal information acquired from a target structure”. This object must be described by statistical quantities such as histograms or interquartile intervals, providing data compression without significant loss of generality or information [32,34,37,38] and allowing for a simple quantification of the four statistical moments (expected value, scatter, symmetry and flatness). In the case of interquartile intervals, the week of modal data is reduced to only 680 values (20 intervals for each frequency, damping ratio and modal coordinates). The effectiveness of symbolic data analysis in SHM relies heavily on the definition of statistical and pattern-recognition algorithms that can be specifically applied to this type of data in the same way that single values and measurements are analysed by the methods applied to big-data sets [37,39].

[Figure 5: (b) panels for the 1st to 4th modes; (c) time series (vertical accelerations) against Time [s], 0 to 300; (d) spectrum of the AV13 time series against Frequency [Hz], 0 to 20, with identified peaks at 2.2, 4.88, 6.84, 11, 12, 14.2, 15.9 and 19 Hz]

Fig. 5: Modal identification of the Oise bridge: (a) overview of the structure, (b) first four

In the realm of SHM, data modelling is strongly related to the construction of feature vectors and their extraction from data. Feature vectors usually consist of multivariate outputs composed of values that are directly correlated with the relevant phenomena being monitored or controlled, which in the case of SHM applications consist of damage and structural integrity loss. The lengths and dimensionalities of these vectors must be lower than those of the original data [1,5,38]. Consequently, feature
natural mode shapes, (c) signals acquired on site, (d) spectrum and natural mode shapes'
frequencies extraction must

include some form of data aggregation, which is the reason that the aggregation and modelling pipeline blocks are frequently addressed simultaneously using identical methods. Hence, the operational modal analysis presented in the previous section (Fig. 5) is an interesting example of data fusion, as is the use of symbolic data objects, which has the advantage of allowing the combination of different types of data (variety).

Modelling for feature extraction can be conducted in two stages along the big-data processing pipeline.1,5 Centralized strategies consist of modelling the data before any normalization takes place; operational modal analysis, auto-regressive modelling and wavelet analysis are all good examples of this. Pattern-level modelling usually takes place after normalization and immediately before the final step of interpretation; cluster analysis is a good example. Big-data applications in the realm of SHM are generally composed of a combination of both centralized and pattern-level modelling.1 Both types are used in the big-data modelling pipeline consisting of clustering symbolic data objects over time that was applied to the Samora Machel Bridge in Tete, Mozambique during the retrofitting works carried out in 2009 (Fig. 6). The rotations of the bridge’s transverse beams were monitored in three phases, during which two retrofitting actions took place: the first changed the structural system and was carried out between monitoring phases 1 and 2, and the second did not generate any changes to the structural system and was conducted between phases 2 and 3. The acquired rotation signals were first converted into symbolic objects, each







Fig. 6: Detection of structural changes in the Samora Machel Bridge: (a) view of the bridge, (b) rotations acquired on site, (c) dissimilarity matrix, (d) dendrogram and (e) clusters’ allocations. Recoverable plot labels: Phase 1, Phase 2, Phase 3; Time Index (min); Symbolic Object Index; [Rot. TM2] (arc sec) / 100; Standardized Ward Linkage; Dendrogram Cut: 2 Clusters; Cluster 1, Cluster 2
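The pipeline illustrated in Fig. 6 (signals cut into short windows, each window summarised by an interquartile interval, and the resulting symbolic objects grouped by hierarchical clustering with Ward linkage) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the signal is synthetic, the window length, phase levels and function names are hypothetical, and each interval is treated as a plain two-dimensional point (Q1, Q3) with Euclidean distance, which is only one of many possible dissimilarity measures for interval data.

```python
import random
import statistics


def interquartile_interval(samples):
    """Summarise one window of raw samples by its (Q1, Q3) interval."""
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    return (q1, q3)


def to_symbolic_objects(signal, window):
    """Cut a signal into consecutive windows; keep one interval per window."""
    return [
        interquartile_interval(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, window)
    ]


def ward_clusters(objects, n_clusters):
    """Naive agglomerative clustering using Ward's merge cost.

    At each step the two clusters whose merger least increases the
    within-cluster sum of squares are joined.
    """
    clusters = [([i], list(obj)) for i, obj in enumerate(objects)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                members_a, centre_a = clusters[a]
                members_b, centre_b = clusters[b]
                dist2 = sum((x - y) ** 2 for x, y in zip(centre_a, centre_b))
                na, nb = len(members_a), len(members_b)
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        members_a, centre_a = clusters[a]
        members_b, centre_b = clusters[b]
        na, nb = len(members_a), len(members_b)
        merged_centre = [(na * x + nb * y) / (na + nb)
                         for x, y in zip(centre_a, centre_b)]
        clusters[a] = (members_a + members_b, merged_centre)
        del clusters[b]
    labels = [None] * len(objects)
    for label, (members, _centre) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Synthetic rotations (hypothetical values): the level shifts after phase 1
# and stays the same in phases 2 and 3.
rng = random.Random(0)
WINDOW = 50              # samples per symbolic object (stand-in for 5 min)
OBJECTS_PER_PHASE = 10
phase_means = [0.0, 5.0, 5.0]
signal = [rng.gauss(mean, 1.0)
          for mean in phase_means
          for _ in range(WINDOW * OBJECTS_PER_PHASE)]

symbolic = to_symbolic_objects(signal, WINDOW)
labels = ward_clusters(symbolic, n_clusters=2)

print(len(signal), "raw values ->", 2 * len(symbolic), "interval bounds")
print(labels)
```

With these synthetic inputs the objects of phase 1 should fall into one cluster and those of phases 2 and 3 into the other, which is the qualitative behaviour reported for the bridge. For real data a library routine (for example a Ward linkage from a scientific computing package) would normally replace the naive cubic-time loop used here.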

representing 5 min of data, and then described by a set of each rotation’s interquartile intervals; Fig. 6e shows that the clear changes between phases 1 and 2 remain, along with the clear resemblance between phases 2 and 3 (when compared to the signals shown in Fig. 6b).

The symbolic objects were analysed using a clustering method, which is a common technique for statistical data analysis used in many different fields40,41 to model data as subgroups by analysing its density. Put differently, clustering can be described as the partitioning of a data set into subsets (clusters) so that the data in each subset share some common properties. To achieve appropriate clustering it is necessary to minimize the within-cluster variation to obtain the most homogeneous clusters possible and, as a natural consequence, to maximize the between-cluster variation to obtain the most dissimilar clusters. To define these clusters and determine the proximity (or similarity) among the tests it is necessary to first define suitable dissimilarity measures, which are shown in Fig. 6c for the Samora Machel Bridge. The lower the values, the more similar the objects, and thus these objects are gathered into the same cluster. Conversely, the objects allocated into different clusters are the ones which have greater distances between them. Dissimilarity measures can take a variety of forms, and some applications might require specific [...]

Two bright areas can be observed in the dissimilarity matrix shown in Fig. 6c, one located in phase 1 and the other located in phases 2 and 3. These bright areas indicate small dissimilarities according to the chosen data model, while the darker areas indicate larger dissimilarities between phase 1 and phases 2 and 3. This matrix reveals that the modelling suggests two distinct groups in the data. Applying a hierarchical agglomerative clustering method (which is the most commonly used method for big-data analysis, along with the popular k-means) produces the dendrogram plot shown in Fig. 6d. This plot makes it possible to obtain, in a fully automatic manner, the structure of the multivariate data in a single tree-like shape, according to the chosen data model. In Fig. 6d, two clusters can be readily identified. When these clusters are observed on a time series of symbolic objects (Fig. 6e), it can be readily concluded that the chosen data model has identified, in a fully automatic manner and without the need for user judgement, two distinct structural behaviours monitored in the Samora Machel Bridge: the first observed, as expected, during phase 1, and the second during phases 2 and 3.

Data Interpretation

Data interpretation involves a wide variety of methodologies, from the many types of quantitative and objective philosophies and algorithms to simple human judgement and subjective thinking. However, the most commonly used approach consists of classifying previously extracted features using single or multiple variables or criteria. Feature classification consists of labelling data features in terms of anything that may be relevant to the task at hand, such as a certain group, pattern, time frame, structural condition, damage state, etc. Two main families of classification technique can be found in the data analytics literature,40–42 depending on the pre-existence of labelled features. Supervised classification is applied in cases in which there is a previously acquired baseline set consisting of arrays of relevant features (normally, all those considered possible for the chosen analysis) along with their corresponding classes or labels.40 In the specific case of SHM applications, this methodology establishes the relations between the monitoring data and the corresponding structural conditions or health (known or novel, damaged or undamaged). When no labels are available, the big-data strategies resort instead to unsupervised classification,41,43 which does not rely on a priori knowledge and is capable of finding any number of labels between 1 (all features assigned to the same class) and n, where n is the number of features in the data set (one class per feature).

Feature classification in SHM is a direct application of the feature classification techniques found in big-data applications and has therefore been reported following three major approaches, of which the first two are supervised and the third is unsupervised.

The first approach1 consists of defining boundaries around features with known labels, such as the default undamaged or stable structural condition. The features extracted afterwards are then compared with the boundaries and classified either as inliers belonging to the same label (the structural condition) or as outliers belonging to a different unknown structural condition (probably corresponding to damage).22,23 The second approach consists of defining boundaries between features belonging to the different labels, such as different known structural conditions corresponding to different damage scenarios.2,44,45 Newly acquired features can then be placed in the feature space to assess which label should be used to classify them. The third approach consists of finding clusters that correspond to one or more unknown structural behaviours that may or may not be present in the feature set.2,46,47 This unsupervised approach is generally preferred when no data or feature baselines are available, such as in old structures or post-accident situations.1,38

Supervised feature classification is generally conducted in two phases: training, which consists of defining the boundaries dividing the feature space, and prediction, which consists of modelling newly acquired features and assessing the labels that are used to classify them. Statistical hypothesis tests, statistical process control and one-class support vector machines are good examples of supervised methods that use the first approach (boundaries around features), and neural networks, multi-class support vector machines and decision trees are good examples of supervised classification algorithms that use the second approach (boundaries between features).2 Unsupervised cluster analysis (the third classification approach) can be conducted using partitioning algorithms such as the popular k-means, its robust and fuzzy versions k-medians and c-means, hierarchical agglomerative or divisive methods, or even density-based methods such as DBSCAN and OPTICS.1

An example comprising both multi-label supervised and unsupervised approaches is found in the PK075 + 317 viaduct shown in Fig. 3. As detailed earlier, this viaduct was monitored before, during and after a retrofitting programme. After the centralized-level aggregation and modelling, the features were analysed using classification algorithms to assess, in a fully quantitative and objective manner,
the effect of changes in the structural stiffness over time during the two years following the strengthening process. Three training sets of dynamic tests were considered as labels: before (B), during (D) and after (A) strengthening. These were used in the training of neural networks, along with the corresponding features. Cluster analyses were also applied to the same data set, but since this type of algorithm is unsupervised no training was conducted. The results comparing the effectiveness of the neural networks and clustering are shown in Table 3 for three features modelled for this specific problem: the raw accelerations acquired on-site, the modal frequencies and the mode shape coordinates. The assignment of features acquired during the two years following the strengthening is shown in the two right-most columns, from which it can be concluded that the neural network assigned most of the features to the structural behaviour of the bridge corresponding to the D-state (during retrofitting) and not to the A-state (after retrofitting), as would be expected. This result shows that the neural network identified a loss in the strengthening effect during the two years following the retrofitting. The results of the clustering method are even more interesting, since they highlight the existence of a new cluster (denoted as “Other” in Table 3), which seems to suggest the existence of a newly observed structural behaviour that is different from the three labels already defined. The inconsistencies between the results for the different features and techniques clearly highlight the fact that the current structural behaviour cannot be assigned to a predefined state. This is the major drawback of a neural-network technique that can only allocate new values to classes used in its training. The clustering technique is more effective than the neural network for the application in this case, since it is capable of detecting new structural [...]

Feature      Label*   Hierarchical clustering (%)   Neural network (%)
Raw data     B          0                            18
             D          0                            69
             A          0                            13
             Other    100                             0
Frequency    B         46                             0
             D         17                           100
             A          0                             0
             Other     37                             0
Mode shape   B          0                             0
             D          0                           100
             A          0                             0
             Other    100                             0
*A = after; B = before; D = during.
Table 3: Labelling of features by hierarchical clustering and neural network

In addition to the unsupervised or supervised nature of classification approaches, feature labelling methods can also be developed as fixed or adaptive over time, or with any other changing parameter. Depending on the application, the requirements for adaptive classification can be low or high, due to environmental and operational variability, and due to the fact that multiple damage phenomena may occur in the same structural system.23,38 This is easy to implement in classification strategies based on defining boundaries around one type of labelling and in clustering methods because of the computational simplicity compared to neural networks, support vector machines and decision trees. In general, these adaptive classification algorithms are based on hypothesis testing defined under moving time-window processes, in order to allow for real-time operation. The monitoring system used during the retrofitting of the Samora Machel Bridge (Fig. 6) is an example of this type of strategy; a moving time-window process was implemented with the objective of measuring both how long it takes to detect structural changes after they are imposed on the structural system and how long it takes for the monitoring system to be ready to detect new changes. This strategy relies on statistically testing the location of the expected value of the average distance between clusters, the Novelty Index38 (NI), using robust estimators of locations and scatters; its usage resulted in the

Fig. 7: Adaptive real-time detection of structural changes in the Samora Machel Bridge using adaptive confidence boundaries. Recoverable plot labels: CB 99.9%; Phase 1, Phase 2, Phase 3; Time Index (min)
*Red circles indicate instants where the NI surpasses the CB: the robust confidence boundary.
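The idea behind Fig. 7, a novelty index tracked over a moving time window and compared against a robust confidence boundary, can be illustrated with the sketch below. It is a loose illustration under explicit assumptions and does not reproduce the NI formulation of ref. 38. Here the index is a stand-in (the gap between the medians of the older and newer halves of the window), the boundary is a median plus a multiple of the scaled median absolute deviation fitted on an early reference period, and the series, window length and function names are all hypothetical.

```python
import random
import statistics


def novelty_index(window):
    """Stand-in NI: gap between the medians of the older and newer halves."""
    half = len(window) // 2
    return abs(statistics.median(window[half:]) - statistics.median(window[:half]))


def robust_boundary(ni_values, k=3.29):
    """Median + k * scaled MAD of reference NI values.

    k = 3.29 is roughly the two-sided 99.9% quantile of a normal
    distribution; 1.4826 rescales the MAD to a standard deviation under
    normality. Both choices are assumptions made for this sketch.
    """
    centre = statistics.median(ni_values)
    mad = statistics.median([abs(v - centre) for v in ni_values])
    return centre + k * 1.4826 * mad


# Synthetic feature (hypothetical values): a structural change at index 200.
rng = random.Random(1)
CHANGE_AT = 200
series = [rng.gauss(0.0, 1.0) for _ in range(CHANGE_AT)]
series += [rng.gauss(6.0, 1.0) for _ in range(100)]

WINDOW = 40
# ni[t] is computed from the WINDOW values that precede time index t.
ni = {t: novelty_index(series[t - WINDOW:t])
      for t in range(WINDOW, len(series) + 1)}

# Boundary fitted on an early reference period assumed to be unchanged.
reference = [ni[t] for t in range(WINDOW, 140)]
cb = robust_boundary(reference)

alarms = [t for t in sorted(ni) if ni[t] > cb]
after_change = [t for t in alarms if t >= CHANGE_AT]
print("boundary:", round(cb, 3))
print("detection delay:", after_change[0] - CHANGE_AT, "time steps")
```

In this sketch the index rises once enough post-change values enter the newer half of the window and falls back once the whole window lies after the change. That loosely mirrors the two quantities discussed in the text: the time needed to detect a change, and the time needed before the system is ready to detect a new one.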

Baseline-Based detection of structural changes:

