
Comput. Methods Appl. Mech. Engrg. 417 (2023) 116357

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/cma

Bayesian structural identification using Gaussian Process discrepancy models

Antonina M. Kosikova a, Omid Sedehi a, Costas Papadimitriou b, Lambros S. Katafygiotis a,∗

a Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Hong Kong
b Department of Mechanical Engineering, University of Thessaly, Volos, Greece

ARTICLE INFO

Keywords: Model updating; Response predictions; Bayesian approach; Prediction error correlation; Gaussian Process models; Kernel covariance functions

ABSTRACT

Bayesian model updating based on Gaussian Process (GP) models has received attention in recent years, as it incorporates kernel-based GPs to provide enhanced-fidelity response predictions. Although most kernel functions provide high fitting accuracy on the training data set, their out-of-sample predictions can be highly inaccurate. This paper investigates this problem by reformulating model updating on a consistent probabilistic foundation, reviewing common choices of kernel covariance functions, and proposing a new Bayesian model selection scheme for choosing the kernel function, aiming to balance fitting accuracy, generalizability, and model parsimony. Computational aspects are addressed via Laplace approximation and sampling techniques, providing detailed algorithms and strategies. Numerical and experimental examples are included to demonstrate the accuracy and robustness of the proposed framework. As a result, an exponential-trigonometric covariance function is characterized and justified based on the Bayesian model selection approach and observations of the sample autocorrelation function of the response discrepancies.

1. Introduction

Finite Element (FE) models of structures are modern tools for structural analysis, design, control, and monitoring. However, the
predictive properties of these numerical models depend strongly on the underlying set of assumptions and parameters considered
by simulation experts. This issue motivates updating FE models based on vibration data obtained from a limited number of
sensors optimally placed at specific positions of structures. In the literature, model updating methods are classified as deterministic,
probabilistic, and non-probabilistic techniques [1–5], but the focus of this study is on probabilistic techniques.
Bayesian model inference is a general probabilistic tool, which describes the discrepancy between the measured and model
responses through a class of probability distributions known as the likelihood function. Then, it updates the user-defined prior
distribution of the structural parameters and characterizes a posterior probability distribution [3,6–11]. This distribution represents
the uncertainty associated with the structural parameters, indicating the relative plausibility of structural parameters conditional
on the modeling assumptions and the data. When multiple candidates of structural models are available, the framework can be
extended to quantify the posterior probability of each model candidate and select the best model class [12–14]. In such cases, it
is also possible to average over the response of different models and reduce the sensitivity of response predictions to the choice of
structural model classes [15–17].

∗ Corresponding author.
E-mail addresses: [email protected] (A.M. Kosikova), [email protected] (O. Sedehi), [email protected] (C. Papadimitriou),
[email protected] (L.S. Katafygiotis).

https://doi.org/10.1016/j.cma.2023.116357
Received 8 January 2023; Received in revised form 5 August 2023; Accepted 6 August 2023
Available online 8 September 2023
0045-7825/© 2023 Elsevier B.V. All rights reserved.

Bayesian structural model updating methods have significantly progressed over the last three decades. Probabilistic identification
of model parameters based on modal statistical information has been an active research direction over the last decade [18–22]. Iden-
tification of systematic nonlinearities has recently been addressed through Bayesian filtering techniques, e.g., Kalman and Particle
Filters [23–28]. Spatial sparsity of structural damage has been incorporated through parameterized prior distributions [29–31].
Test-to-test variability of structural parameters is accounted for through hierarchical Bayesian models, whereby the environmental
and operational effects are characterized [32,33]. In general, improving the robustness and accuracy of probabilistic models has
been an appealing and demanding research direction, which appears set to remain in the research spotlight for years to come.
Treatment of data points as statistically independent and Gaussian distributed yields the most basic mathematical form of
the likelihood function. By the maximum-entropy principle, the Gaussian distribution is an optimal choice when the probability
density function (PDF) is assumed to have fixed second-order central moments [34,35]. Although this probability model
is intended to comply with the principle of model parsimony [12], it turns out to be too strong an assumption in most practical cases, as
it entirely ignores the correlation and dependency between the measured data points. This issue is particularly acute in handling
time–history data sets, where the Gaussian White Noise (GWN) assumption appears to be largely violated. Thus, the need to
incorporate the correlation of data points was recognized even in the first generation of structural identification methods. The
exact covariance matrix was derived in [36,37], where the spatial–temporal correlation of data points is characterized by considering
the input excitations to be band-limited Gaussian noise. However, due to computational difficulties in performing inference based
on exact covariance matrices, it was proposed to substitute them with sample auto-covariance functions curtailed at a sufficiently
large length of the signal, essentially longer than a few fundamental periods of the linear system of interest [38,39]. Nevertheless,
the auto-covariance function might cause over-fitting due to incorporating an excessively large number of correlation coefficients
driven by the data. Recently, Auto-Regressive (AR) models were employed for parameterizing the temporal correlation of prediction
errors, but these models are not adequately flexible for modeling different mathematical functions [40,41].
Gaussian Process (GP) models offer indispensable tools for considering spatial–temporal correlations through kernel covariance
matrices [42]. Kennedy and O’Hagan [43] developed a Bayesian framework for the calibration of computer models, wherein the
inconsistency between the measured and model outputs is described by GP discrepancy models that help reduce the gap between
the outputs by updating the model with an appropriate set of structural parameters. In recent years, various investigators have applied
this framework to structural model updating problems, e.g., [44–46]. GP models have also been used extensively for describing the
correlation of structural parameters with environmental factors [47–49], estimating input forces [50], developing Kriging surrogate
models [51,52], and optimizing sensor configuration [53]. In these works, the Squared-Exponential (SE) kernel function has been the
default choice to describe the error characteristics, but the SE kernel might not generalize well, especially when error processes
exhibit periodic patterns. A recent study has employed the exponential-sinusoidal kernel function to account for the periodicity
of prediction errors [54]. However, a more natural approach is to choose the best kernel covariance function through Bayesian
model selection techniques, as demonstrated in [55]. In a recent study, the exact covariance function of linear systems is used,
exhibiting superior prediction performance compared to other kernels [56].
Despite these advances, a kernel covariance function customized for model updating problems is missing from the structural
dynamics literature. Therefore, in the present paper, structural model updating is reformulated through its augmentation with
GP models. Conventional choices of kernel functions are reviewed, and a new prediction-based Bayesian model class selection
is proposed to rank different kernel structures. Computational aspects are discussed, providing the detailed flow of steps when
Laplace approximation and sampling techniques are employed. Finally, numerical and experimental examples are provided for
demonstrating the proposed framework. It is shown that the prediction errors are characterized best via an exponential-trigonometric
kernel covariance function, which offers remarkable predictability and generalizability compared with the other choices explored herein.
The paper continues with Section 2, describing the Bayesian formulation proposed for model inference, response predictions, and
kernel selection. Section 3 formalizes the computational aspects of this framework and discusses potential difficulties. Sections 4–6
demonstrate the GP framework via numerical and experimental examples. Section 7 summarizes conclusions of this work.

2. Proposed Bayesian methodology

2.1. Structural model updating

It is desired to update a structural model M(𝜽) ∈ 𝑀 and its unknown parameters 𝜽 ∈ R𝑁𝜃 based on a set of vibration data, where
𝑀 is a class of structural models. The data 𝐷 = {(𝐗, 𝐘)} consists of complete input knowledge subsumed into 𝐗 = [𝐱1𝑇 𝐱2𝑇 … 𝐱𝑛𝑇 ]𝑇 ,
as well as the incomplete output of the structure 𝐘 = [𝐲1𝑇 𝐲2𝑇 … 𝐲𝑛𝑇 ]𝑇 considered to be the measured response, where 𝐱𝑖 ∈ R𝑁𝑥 and
𝐲𝑖 ∈ R𝑁𝑜 respectively correspond to the input and output of the structure at discrete time instant 𝑡𝑖 = 𝑖𝛥𝑡, 𝑁𝑜 stands for the
number of observed locations, 𝑁𝑥 is the number of the known input forces, and 𝑛 is the size of the data set. The complete input
is in the form of a fully available time–history of the input forces and the incomplete output is referred to as the measurement
data available from the limited locations of the monitored mechanical system. When the input 𝐗 is given to the deterministic
model M(𝜽), specified by a set of structural parameters 𝜽, it generates a set of time–history responses, defined as the model response
𝐟(𝐗; 𝜽) = [(𝑓 (𝐱1 ; 𝜽))𝑇 … (𝑓 (𝐱𝑛 ; 𝜽))𝑇 ]𝑇 such that the functional form 𝑓 (𝐱𝑖 ; 𝜽) ∈ R𝑁𝑜 gives the model output at time 𝑡𝑖 corresponding
to those degrees-of-freedom from which the measurements 𝐘 are obtained. Depending on the choice of structural parameters (𝜽),
there exists some discrepancy between the measured and model responses such that the observed response can be written as

𝐲𝑖 = 𝑓 (𝐱𝑖 ; 𝜽) + 𝜺𝑖 (1)


where 𝜺𝑖 ∈ R𝑁𝑜 is a vector of prediction errors corresponding to the time instant 𝑡𝑖 , described herein through a GP model P(𝝓) ∈ 𝑃
chosen from a class of probability models (𝑃 ), parameterized by a vector of unknown hyper-parameters 𝝓 ∈ R𝑁𝜙 , where 𝑁𝜙 is the
dimension of the parameters 𝝓. The prediction error is defined herein as a mismatch between the measured data and the deterministic
model output. This probability model is characterized through a kernel covariance function, denoted by 𝑘(𝑡𝑖 , 𝑡𝑖 ′ ; 𝝓), where 𝑡𝑖 ∈ R𝑁𝑛
is a vector of time when the measurements were taken. The knowledge of time is necessary to define the likelihood of observing
the response 𝐲𝑖 conditional on 𝐱𝑖 as follows:

𝑝(𝐲𝑖 |𝐱𝑖 , 𝜽, 𝝓) ∼ 𝐺𝑃 (𝐲𝑖 |𝑓 (𝐱𝑖 ; 𝜽), 𝑘(𝑡𝑖 , 𝑡𝑖 ′ ; 𝝓)) (2)


where this GP has the mean vector 𝑓 (𝐱𝑖 ; 𝜽) and the kernel covariance function 𝑘(𝑡𝑖 , 𝑡𝑖 ′ ; 𝝓). This kernel function correlates the 𝑖th
and 𝑗th samples of the prediction errors in the time space based on their projection onto a multi-dimensional Euclidean manifold S,
on which the above-mentioned vector 𝑡𝑖 lies. In this latent manifold, a dot product operation can be defined over the 𝑡𝑖 ’s, governing
the correlation of data points, i.e., 𝑘(𝑡𝑖 , 𝑡𝑖 ′ ; 𝝓) = 𝜑(𝑡𝑖 ) ⋅ 𝜑(𝑡𝑖 ′ ), where 𝜑(⋅) is a function mapping 𝑡𝑖 into a feature space [42]. Given this probability model,
the model-updating problem transforms into the identification of the parameters 𝜣 = [𝜽𝑇 𝝓𝑇 ]𝑇 based on the data. By following a
Bayesian strategy, the prior probability model P(𝜣) ∈ 𝑈 is adopted to describe user’s initial knowledge (𝑈 ) about these parameters
before updating them based on the data.
The above series of probabilistic reasoning allows establishing a class of probabilistic models MP (𝜣) ∈ 𝑀𝑃 , which embodies a
composite of the structural model M(𝜽), GP model P(𝝓), and prior probability model P(𝜣). Subsequently, the Bayes’ rule can be
used for updating MP (𝜣) as follows:
$$p(\boldsymbol{\Theta} \mid \mathbf{Y}, \mathbf{X}, \mathrm{M}_P) = \frac{p(\mathbf{Y} \mid \boldsymbol{\Theta}, \mathbf{X}, \mathrm{M}_P)\, p(\boldsymbol{\Theta} \mid \mathrm{M}_P)}{p(\mathbf{Y} \mid \mathbf{X}, \mathrm{M}_P)} \tag{3}$$
where 𝑝(𝜣|MP ) is the prior PDF; 𝑝(𝜣|𝐘, 𝐗, MP ) is the posterior PDF to be calculated; and 𝑝(𝐘|𝐗, MP ) is the evidence of the
model class, that is, the integral of the numerator over the entire domain of the parameters, given by

$$p(\mathbf{Y} \mid \mathbf{X}, \mathrm{M}_P) = \int_{\boldsymbol{\Theta}} p(\mathbf{Y} \mid \boldsymbol{\Theta}, \mathbf{X}, \mathrm{M}_P)\, p(\boldsymbol{\Theta} \mid \mathrm{M}_P)\, d\boldsymbol{\Theta} \tag{4}$$
where 𝑝(𝐘|𝜣, 𝐗, MP ) is the likelihood function of all instances of the responses (𝐘) conditional on the input time history (𝐗),
described based on Eq. (2) and given by

𝑝(𝐘|𝐗, 𝜣, MP ) = 𝑁(𝐘|𝐟(𝐗; 𝜽), 𝐊(𝝓)) (5)


where 𝐟 (𝐗; 𝜽) is the mean vector, considered equal to the structural model response, and 𝐊(𝝓) = [𝑘𝑖𝑗 ] is the covariance matrix,
described by the kernel covariance function. This Bayesian formulation provides a posterior distribution for the unknown parameters,
conditional on the data and the probabilistic model. Although the formulation is mathematically elegant and easy to follow,
the computation is challenging and cumbersome even in low-dimensional settings, as we must either perform an optimization to search
for the Most Probable Values (MPV) or run Markov Chain Monte Carlo (MCMC) sampling to draw samples from the posterior
distribution. Later, we will elaborate further on the computational aspects of this methodology.
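To make the computational burden concrete, the Gaussian log-likelihood of Eq. (5) can be evaluated as in the following sketch (a minimal NumPy illustration; the function name and toy values are ours, with generic arrays standing in for the measured response, model response, and kernel covariance matrix):

```python
import numpy as np

def gp_log_likelihood(Y, f_model, K):
    """ln N(Y | f(X; theta), K(phi)) as in Eq. (5)."""
    n = Y.size
    r = Y - f_model                      # prediction-error vector Y - f(X; theta)
    sign, logdet = np.linalg.slogdet(K)  # numerically stable ln|K|
    quad = r @ np.linalg.solve(K, r)     # r^T K^{-1} r without forming K^{-1}
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Toy check: with K = I, the expression reduces to the iid Gaussian case.
Y = np.array([0.1, -0.2, 0.3])
f_model = np.zeros(3)
ll = gp_log_likelihood(Y, f_model, np.eye(3))
```

Repeated evaluations of this quantity inside an optimizer or sampler are what dominate the cost discussed in Section 3.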

Remark 1. The evidence term in Eq. (4) is often difficult to calculate, as it entails an integration over the multi-dimensional
space of all parameters. Fortunately, when inferring the unknown parameters, the evidence is a constant that does not affect
the MPV or the samples of the posterior distribution. Nonetheless, for model class selection, as will be explained
later, its calculation is inevitable.

2.2. Response predictions

After performing Bayesian inference, the next step is to predict unobserved structural response quantities of interest while
considering the uncertainty in the structural model parameters, as well as the prediction error model. For this purpose, it is customary
to assume that a new set of input data, $\mathbf{X}_{pred} = [\mathbf{x}_{n+1}^T \; \mathbf{x}_{n+2}^T \ldots \mathbf{x}_{n+n'}^T]^T$, is available, containing $n'$ data points. This input data set comes
from the same input space as the one used for the inference. Due to the likelihood function considered in Eq. (2), the joint distribution
of the observed and unobserved responses can be specified as

$$p(\mathbf{Y}, \mathbf{Y}_{pred} \mid \mathbf{X}, \mathbf{X}_{pred}, \boldsymbol{\Theta}, \mathrm{M}_P) = N\left( \begin{bmatrix} \mathbf{Y} \\ \mathbf{Y}_{pred} \end{bmatrix} \,\middle|\, \begin{bmatrix} \mathbf{f}(\mathbf{X};\boldsymbol{\theta}) \\ \mathbf{f}(\mathbf{X}_{pred};\boldsymbol{\theta}) \end{bmatrix}, \begin{bmatrix} \mathbf{K}(\boldsymbol{\phi}) & \mathbf{k}(\boldsymbol{\phi})_{pred} \\ \mathbf{k}(\boldsymbol{\phi})_{pred}^T & \mathbf{K}(\boldsymbol{\phi})_{pred} \end{bmatrix} \right) \tag{6}$$

where $\mathbf{Y}_{pred} = [\mathbf{y}_{n+1}^T \; \mathbf{y}_{n+2}^T \ldots \mathbf{y}_{n+n'}^T]^T \in \mathbb{R}^{n'N_o}$ is the unobserved response of the structure subjected to the input $\mathbf{X}_{pred}$; $\mathbf{f}(\mathbf{X}_{pred};\boldsymbol{\theta}) \in \mathbb{R}^{n'N_o}$ is the structural model response with the input $\mathbf{X}_{pred}$ substituted; $\mathbf{K}_{pred} \in \mathbb{R}^{n'N_o \times n'N_o}$ is the covariance matrix of the unobserved output calculated based on the kernel function and the prediction times $t_{pred} \in \mathbb{R}^{n'}$; and $\mathbf{k}(\boldsymbol{\phi})_{pred} \in \mathbb{R}^{nN_o \times n'N_o}$ is the cross-covariance matrix of $\mathbf{Y}$ and $\mathbf{Y}_{pred}$. From this joint distribution, it is straightforward to derive the distribution of the unobserved response conditional
on the observed response and the unknown parameters:
$$p(\mathbf{Y}_{pred} \mid \mathbf{Y}, \mathbf{X}, \mathbf{X}_{pred}, \boldsymbol{\Theta}, \mathrm{M}_P) = N(\mathbf{Y}_{pred} \mid \boldsymbol{\mu}_{pred}, \boldsymbol{\Sigma}_{pred}) \tag{7}$$

where $\boldsymbol{\mu}_{pred} \in \mathbb{R}^{n'N_o}$ and $\boldsymbol{\Sigma}_{pred} \in \mathbb{R}^{n'N_o \times n'N_o}$ are the conditional mean and covariance of the response given by

$$\boldsymbol{\mu}_{pred} = \mathbf{f}(\mathbf{X}_{pred};\boldsymbol{\theta}) + \mathbf{k}(\boldsymbol{\phi})_{pred}^T\, \mathbf{K}(\boldsymbol{\phi})^{-1} (\mathbf{Y} - \mathbf{f}(\mathbf{X};\boldsymbol{\theta})) \tag{8}$$


$$\boldsymbol{\Sigma}_{pred} = \mathbf{K}(\boldsymbol{\phi})_{pred} - \mathbf{k}(\boldsymbol{\phi})_{pred}^T\, \mathbf{K}(\boldsymbol{\phi})^{-1}\, \mathbf{k}(\boldsymbol{\phi})_{pred} \tag{9}$$

By virtue of the principle of total probability, the posterior predictive distribution of the unobserved responses can be calculated
from

$$p(\mathbf{Y}_{pred} \mid \mathbf{Y}, \mathbf{X}, \mathbf{X}_{pred}, \mathrm{M}_P) = \int_{\boldsymbol{\Theta}} p(\mathbf{Y}_{pred} \mid \mathbf{Y}, \mathbf{X}, \mathbf{X}_{pred}, \boldsymbol{\Theta}, \mathrm{M}_P)\, p(\boldsymbol{\Theta} \mid \mathbf{Y}, \mathbf{X}, \mathrm{M}_P)\, d\boldsymbol{\Theta} \tag{10}$$
where the expressions inside the integral are obtained earlier in Eqs. (3) and (7), and 𝑝(𝐘𝑝𝑟𝑒𝑑 |𝐘, 𝐗, 𝐗𝑝𝑟𝑒𝑑 , MP ) is the posterior
predictive distribution of the unobserved response. As explained earlier, the integration over the multi-dimensional space of the
parameters is computationally challenging, which will be addressed later.
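For a fixed parameter value, the conditional mean and covariance of Eqs. (8)–(9) reduce to a few linear solves, as in this sketch (a generic single-output GP on a toy time grid; the kernel, data values, and function names are illustrative assumptions, not the paper's):

```python
import numpy as np

def gp_predict(Y, f_train, f_pred, K, k_cross, K_pred):
    """Conditional mean and covariance of Eqs. (8)-(9)."""
    alpha = np.linalg.solve(K, Y - f_train)                   # K^{-1} (Y - f(X; theta))
    mu = f_pred + k_cross.T @ alpha                           # Eq. (8)
    Sigma = K_pred - k_cross.T @ np.linalg.solve(K, k_cross)  # Eq. (9)
    return mu, Sigma

# Toy 1-D time grid with a unit-length-scale SE kernel plus a small jitter term
se = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
t = np.array([0.0, 1.0, 2.0])
t_new = np.array([1.5])
K = se(t, t) + 1e-8 * np.eye(t.size)
mu, Sigma = gp_predict(np.array([0.0, 1.0, 0.5]), np.zeros(3),
                       np.zeros(1), K, se(t, t_new), se(t_new, t_new))
```

Note that the GP correction shrinks the predictive variance below the prior kernel variance, reflecting the information carried by the training data.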

2.3. Prediction-based model class selection

The probabilistic framework formulated above can be continued with a Bayesian model class selection by writing another Bayes’
rule to evaluate the relative plausibility of the probabilistic model class. Based on [12], one may write:
$$P(\mathrm{M}_P \mid \mathbf{X}, \mathbf{Y}) = \frac{p(\mathbf{Y} \mid \mathbf{X}, \mathrm{M}_P)\, P(\mathrm{M}_P)}{\sum_{\mathrm{M}_P \in M_P} p(\mathbf{Y} \mid \mathbf{X}, \mathrm{M}_P)\, P(\mathrm{M}_P)} \tag{11}$$
where 𝑃 (MP ) is the prior probability of the probabilistic model MP , 𝑝(𝐘|𝐗, MP ) is the evidence term calculated by Eq. (4), and
𝑃 (MP |𝐗, 𝐘) is the posterior probability of MP . When the class of probabilistic models (𝑀𝑃 ) embeds different structural models
varying in their underlying assumptions while using the same kernel function, this formulation establishes a trade-off between the
fitting accuracy and the complexity (in terms of the number of free parameters) of the structural model. However, this formulation
does not consider the accuracy of out-of-sample predictions when a kernel covariance function is included. Moreover, in the context
of GP models, most conventional kernels provide extremely high accuracy within the training data set, fitting almost exactly to the
data except for the measurement noise. Conversely, they do not offer generalizability in terms of predicting dynamical responses.
In this section, we propose a new formulation that considers the generalizability of the kernel covariance functions. For this purpose, it
is assumed that the posterior distribution of the unknown parameters is obtained from the training data set 𝐷 = {(𝐗, 𝐘)} according
to Section 2.1. The posterior distribution is then used for predicting the held-out data set 𝐷𝑝𝑟𝑒𝑑 = {(𝐗𝑝𝑟𝑒𝑑 , 𝐘𝑝𝑟𝑒𝑑 )} according to
Section 2.2, where 𝐘𝑝𝑟𝑒𝑑 is also available but not used for inferring the unknown parameters. In this case, the Bayes’ rule provides:
$$P(\mathrm{M}_P \mid \mathbf{X}, \mathbf{X}_{pred}, \mathbf{Y}, \mathbf{Y}_{pred}) = \frac{p(\mathbf{Y}_{pred} \mid \mathbf{Y}, \mathbf{X}, \mathbf{X}_{pred}, \mathrm{M}_P)\, p(\mathbf{Y} \mid \mathbf{X}, \mathrm{M}_P)\, P(\mathrm{M}_P)}{\sum_{\mathrm{M}_P \in M_P} p(\mathbf{Y}_{pred} \mid \mathbf{Y}, \mathbf{X}, \mathbf{X}_{pred}, \mathrm{M}_P)\, p(\mathbf{Y} \mid \mathbf{X}, \mathrm{M}_P)\, P(\mathrm{M}_P)} \tag{12}$$
where 𝑝(𝐘|𝐗, MP ) is the evidence term calculated by Eq. (4), 𝑝(𝐘𝑝𝑟𝑒𝑑 |𝐘, 𝐗, 𝐗𝑝𝑟𝑒𝑑 , MP ) is the posterior predictive distribution given
by Eq. (10), and 𝑃 (MP |𝐗, 𝐗𝑝𝑟𝑒𝑑 , 𝐘, 𝐘𝑝𝑟𝑒𝑑 ) is the posterior probability of MP , reflecting the relative plausibility of MP conditional on
the data and the user’s judgment. This treatment can also be interpreted as a Bayesian implementation of cross-validation, wherein the
measured response history is split into training and prediction segments, aiming to account for prediction properties.
When the prior probability 𝑃 (MP ) is fixed, the posterior probability of MP becomes proportional to the product of
𝑝(𝐘𝑝𝑟𝑒𝑑 |𝐘, 𝐗, 𝐗𝑝𝑟𝑒𝑑 , MP ) and 𝑝(𝐘|𝐗, MP ). However, as it might be difficult to compare the posterior probabilities of different
model classes directly, their natural logarithms can be used instead, which leads to

$$\ln P(\mathrm{M}_P \mid \mathbf{X}, \mathbf{X}_{pred}, \mathbf{Y}, \mathbf{Y}_{pred}) = \underbrace{\ln p(\mathbf{Y}_{pred} \mid \mathbf{Y}, \mathbf{X}, \mathbf{X}_{pred}, \mathrm{M}_P)}_{\text{Posterior Predictive}} + \underbrace{\ln p(\mathbf{Y} \mid \mathbf{X}, \mathrm{M}_P)}_{\text{Evidence}} \tag{13}$$

In this equation, if we ignore the posterior predictive term, the original formulation presented in Beck and Yuen [12]
is recovered. Nevertheless, this new formulation is more general, as it enables balancing generalizability
(out-of-sample predictions), fitting accuracy on the training data, and model complexity.
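Once the two log-terms of Eq. (13) are available for each candidate kernel, ranking them is direct; the sketch below uses purely hypothetical log-scores for three kernels (the numbers are illustrative, not results from the paper) and normalizes them into posterior probabilities with a log-sum-exp for stability:

```python
import numpy as np

# Hypothetical log-scores for three candidate kernels (illustrative numbers only):
# each pair is (ln posterior predictive, ln evidence), the two terms of Eq. (13).
scores = {
    "SE":   (-120.0, -310.0),
    "PE":   (-118.0, -305.0),
    "MMTE": (-101.0, -298.0),
}

log_post = {name: lp + le for name, (lp, le) in scores.items()}  # Eq. (13), up to a constant

# Normalize into posterior probabilities over the candidate set (log-sum-exp trick)
vals = np.array(list(log_post.values()))
probs = np.exp(vals - vals.max())
probs /= probs.sum()
ranking = dict(zip(log_post, probs))
best = max(log_post, key=log_post.get)
```

Under a uniform prior over the candidate set, the normalized scores play the role of the posterior probabilities in Eq. (12).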

2.4. Kernel covariance functions

The accuracy of GPs depends critically on choosing an appropriate kernel covariance function. There can be numerous kernel
structures to simulate the desired mathematical function. Thus, it is important to choose the simplest structure that fits the
problem at hand. For selecting the best kernel function, we intend to employ the above model selection methodology. However,
in the remainder of this section, we review typical choices of kernel functions and explain under what conditions they might be a
legitimate choice.

2.4.1. Squared exponential (SE) function


The SE, also known as the Gaussian kernel or Radial Basis Function (RBF), is perhaps the most popular kernel function [42]. Its
properties, such as easy integration with most functions and infinite differentiability, make it suitable for a variety of problems.
This kernel has only two parameters: the variance (𝜎𝑓2 ) and the correlation length (𝓁), which governs the decay rate of correlation.
This kernel function can be written as
$$k_{ij} = \sigma_f^2 \exp\left(-\frac{1}{2}\frac{T_{ij}^2}{\ell^2}\right) \tag{14}$$


Fig. 1. Schematic overview of kernel covariance functions (a) SE (b) PE (c) MMTE.

where $k_{ij}$ is the element $(i,j)$ of the covariance matrix $\mathbf{K}(\boldsymbol{\phi}) = [k_{ij}]$, $T_{ij} = \|t_i - t_j\|$ is the so-called distance or similarity measure,
$\sigma_f^2$ is the variance of individual samples, $\ell$ is the correlation time describing how fast the correlation fades out as the distance measure
increases, and $\boldsymbol{\phi} = [\sigma_f^2 \; \ell^{-2}]^T$ is a vector comprising the unknown parameters of the SE kernel function. A schematic view of the
kernel function can be seen in Fig. 1(a). Based on this kernel function, training samples closer in time to the predicted ones
are presumably more informative. Moreover, the SE yields a stationary covariance matrix since it relies only on the distance $T_{ij}$
and not on the actual values of $t_i$ or $t_j$.

2.4.2. Periodic exponential (PE) function


For periodic signals, it is reasonable to modify the SE by embedding a sinusoidal function inside the exponential function. By
doing so, the PE is expressed as
$$k_{ij} = \sigma_f^2 \exp\left(-\frac{1}{2}\frac{\sin^2(\omega T_{ij})}{\ell^2}\right) \tag{15}$$
In this equation, the coefficient 𝜔 specifies the frequency of the sine function, and 𝓁 is the correlation time to be determined. Such a
kernel replicates and predicts responses that have the same period as the training data. However, in the case of unstable periodicity
or non-stationarity of the signal, it fails to predict unobserved responses. This kernel function is displayed in Fig. 1(b), indicating
how the correlation decays as 𝑇𝑖𝑗 grows.

2.4.3. Multi-modal trigonometric exponential (MMTE) kernel


A more flexible kernel function can be obtained by multiplying the SE by a cosine function. Since the correlation might contain
more than a single dominant frequency, the kernel function can further be generalized by considering the
contribution from 𝑚 exponential-cosine functions, given by

$$k_{ij} = \sum_{k=1}^{m} \sigma_{f,k}^2 \exp\left(-\frac{T_{ij}^2}{\ell_k^2}\right) \cos(\omega_k T_{ij}) \tag{16}$$

where $\sigma_{f,k}^2$ is the amplitude of the correlation, $\ell_k$ is the correlation time of the $k$th cosine function, and $\omega_k$ is the frequency of the $k$th function.
A unimodal example of this kernel is shown in Fig. 1(c). By limiting the number of contributing/dominant
modes to at most 𝑚, one can characterize an efficient mixture of sinusoidal functions while retaining simplicity
in terms of the number of parameters. Note that the MMTE function can be regarded as a decomposition into different dynamical
modes, resembling the impulse response of a dynamical system. This kernel function has been shown to be promising in structural dynamics
applications [57], and its theoretical justification is provided in [56].
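The MMTE kernel of Eq. (16) can be assembled mode by mode, as in this sketch (the two-mode hyper-parameters are illustrative assumptions):

```python
import numpy as np

def mmte_kernel(t, sigma_f2, ell, omega):
    """MMTE covariance of Eq. (16): sum of m exponential-cosine modes.

    sigma_f2, ell, omega: length-m sequences of amplitudes, correlation
    times, and frequencies (one triple per contributing mode).
    """
    T = np.abs(t[:, None] - t[None, :])
    K = np.zeros((t.size, t.size))
    for s2, l, w in zip(sigma_f2, ell, omega):
        K += s2 * np.exp(-((T / l) ** 2)) * np.cos(w * T)
    return K

# Two-mode example with illustrative hyper-parameters
t = np.linspace(0.0, 2.0, 50)
K = mmte_kernel(t, sigma_f2=[1.0, 0.3], ell=[0.8, 0.4],
                omega=[2.0 * np.pi, 6.0 * np.pi])
```

Each exponential-cosine term resembles a decaying modal autocorrelation, which is what makes the sum reminiscent of an impulse response.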

2.4.4. Combination with isotropic measurement noise


An advantageous feature of kernel functions is that they can be combined to create more flexible composite kernels. For instance, when the data
is polluted by measurement noise, a Gaussian White Noise (GWN) process can be imposed on top of the kernel function of interest
to attain additional robustness. In this case, the new kernel can be written as

𝑘′𝑖𝑗 = 𝑘𝑖𝑗 + 𝜎𝑛2 𝛿𝑖𝑗 (17)

where 𝛿𝑖𝑗 is the Kronecker delta function, and 𝜎𝑛2 is the measurement noise variance. Later, we will investigate how these kernel
functions perform when applied to structural dynamics problems.
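Composing the GWN term of Eq. (17) with an existing kernel matrix amounts to adding a scaled identity; as a side effect, the added diagonal also regularizes the matrix for inversion (the base kernel and noise variance are illustrative):

```python
import numpy as np

def with_noise(K, sigma_n2):
    """Composite kernel of Eq. (17): k'_ij = k_ij + sigma_n^2 * delta_ij."""
    return K + sigma_n2 * np.eye(K.shape[0])

# Illustrative SE-type base kernel on four time points
t = np.arange(4.0)
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)
K_noisy = with_noise(K, sigma_n2=0.01)  # GWN variance is an assumed value
```

Only the diagonal changes, since the Kronecker delta leaves all cross-correlations untouched.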

3. Computational methods

3.1. Gaussian approximation of posterior distribution

In the presence of a large number of data points, the joint posterior distribution might concentrate around an isolated peak.
This condition, referred to as the globally identifiable case [58], allows applying the Laplace asymptotic approximation, which substitutes


the posterior distribution with an approximate Gaussian distribution, centered at the mode of the posterior distribution and with
covariance matrix equal to the inverse of the Hessian of the objective function evaluated at that mode [59]. By
applying this approach to Eq. (3), the following approximation can be obtained:
$$p(\boldsymbol{\Theta} \mid \mathbf{Y}, \mathbf{X}, \mathrm{M}_P) \approx N\left( \begin{bmatrix} \boldsymbol{\theta} \\ \boldsymbol{\phi} \end{bmatrix} \,\middle|\, \begin{bmatrix} \hat{\boldsymbol{\theta}} \\ \hat{\boldsymbol{\phi}} \end{bmatrix}, \begin{bmatrix} \hat{\boldsymbol{\Sigma}}_{\theta\theta} & \hat{\boldsymbol{\Sigma}}_{\theta\phi} \\ \hat{\boldsymbol{\Sigma}}_{\theta\phi}^T & \hat{\boldsymbol{\Sigma}}_{\phi\phi} \end{bmatrix} \right) \tag{18}$$
where $(\hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\phi}})$ denotes the MPV of the parameters (the mode of the distribution), calculated by minimizing the negative logarithm of the posterior
distribution, i.e., $L(\boldsymbol{\Theta}) = -\ln p(\boldsymbol{\Theta} \mid \mathbf{Y}, \mathbf{X}, \mathrm{M}_P)$; the matrix blocks $\hat{\boldsymbol{\Sigma}}_{\theta\theta} \in \mathbb{R}^{N_\theta \times N_\theta}$, $\hat{\boldsymbol{\Sigma}}_{\theta\phi} \in \mathbb{R}^{N_\theta \times N_\phi}$, and $\hat{\boldsymbol{\Sigma}}_{\phi\phi} \in \mathbb{R}^{N_\phi \times N_\phi}$ are calculated
by inverting the Hessian matrix of $L(\boldsymbol{\Theta})$ at the MPV. The approximation in Eq. (18) requires simultaneously optimizing both the
structural parameters (𝜽) and the kernel function parameters (𝝓) over the augmented multi-dimensional space of these parameters.
However, the simultaneous minimization over 𝜽 and 𝝓 makes the optimization prone to premature termination due to
becoming stuck in undesirable local minima.
As a remedy to this problem, we propose to break the minimization into a two-stage process, as provided in Algorithm 1, where
𝑝(𝜽|𝝓, 𝐘, 𝐗, MP ) and 𝑝(𝝓|𝜽, 𝐘, 𝐗, MP ) are maximized iteratively until convergence is attained. To find the mathematical expression
of these probability distributions, the Bayes’ rule can be used as follows:
$$p(\boldsymbol{\theta} \mid \boldsymbol{\phi}, \mathbf{Y}, \mathbf{X}, \mathrm{M}_P) = \frac{p(\mathbf{Y} \mid \boldsymbol{\theta}, \boldsymbol{\phi}, \mathbf{X}, \mathrm{M}_P)\, p(\boldsymbol{\theta} \mid \boldsymbol{\phi}, \mathrm{M}_P)}{p(\mathbf{Y} \mid \mathbf{X}, \boldsymbol{\phi}, \mathrm{M}_P)} \tag{19}$$
Based on this equation, the logarithm of 𝑝(𝜽|𝝓, 𝐘, 𝐗, MP ) can be calculated as

ln 𝑝(𝜽|𝝓, 𝐘, 𝐗, MP ) = ln 𝑝(𝐘|𝜣, 𝐗, MP ) + ln 𝑝(𝜽|𝝓, MP ) − ln 𝑝(𝐘|𝐗, 𝝓, MP ) (20)

Likewise, the probability distribution 𝑝(𝝓|𝜽, 𝐘, 𝐗, MP ) can be specified by exchanging the position of 𝜽 and 𝝓. Then, maximizing
these conditional distributions turns out to be equivalent to minimizing the negative logarithm of these functions, as given by
$$L(\boldsymbol{\theta} \mid \boldsymbol{\phi}) \doteq -\ln p(\boldsymbol{\theta} \mid \boldsymbol{\phi}, \mathbf{Y}, \mathbf{X}, \mathrm{M}_P) = \tfrac{1}{2}(\mathbf{Y} - \mathbf{f}(\mathbf{X};\boldsymbol{\theta}))^T \mathbf{K}(\boldsymbol{\phi})^{-1} (\mathbf{Y} - \mathbf{f}(\mathbf{X};\boldsymbol{\theta})) - \ln p(\boldsymbol{\theta} \mid \boldsymbol{\phi}, \mathrm{M}_P) + cte. \tag{21}$$

$$L(\boldsymbol{\phi} \mid \boldsymbol{\theta}) \doteq -\ln p(\boldsymbol{\phi} \mid \boldsymbol{\theta}, \mathbf{Y}, \mathbf{X}, \mathrm{M}_P) = \tfrac{1}{2}\ln|\mathbf{K}(\boldsymbol{\phi})| + \tfrac{1}{2}(\mathbf{Y} - \mathbf{f}(\mathbf{X};\boldsymbol{\theta}))^T \mathbf{K}(\boldsymbol{\phi})^{-1} (\mathbf{Y} - \mathbf{f}(\mathbf{X};\boldsymbol{\theta})) - \ln p(\boldsymbol{\phi} \mid \boldsymbol{\theta}, \mathrm{M}_P) + cte. \tag{22}$$
where 𝑐𝑡𝑒. stands for constant terms; the determinant term was dropped from Eq. (21) since it depends only on the parameters (𝝓)
and does not affect the optimization when calculating the parameters (𝜽). The objective in Eq. (22) is complex and nonlinear;
therefore, a nonlinear constrained optimization is used to compute the model parameters using the asymptotic approximation,
with analytical gradients supplied.
Algorithm 1 formalizes the process of finding the MPV of the parameters, which reads as a main ‘‘while-loop’’ iteratively updating
(𝜽) and (𝝓). In this algorithm, convergence is reached when the variation of the estimated parameters between the last two
iterations is no longer above a predefined threshold. This condition should be checked along with a cap placed
on the total number of iterations, aiming to avoid too many trials. Once the MPV of the parameters is obtained, we can predict
structural responses by replacing the optimal values into Eqs. (7)–(10) while ignoring their uncertainties. Algorithm 2 provides
the detailed steps of predicting responses based on the inferred parameters. Although this procedure would be inaccurate due to
neglecting the uncertainty of the parameters, it offers significant computational savings. In case a more rigorous calculation is
required, a full sampling approach might be preferred, as explained in the next section.
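The alternating structure of Algorithm 1 can be illustrated on a deliberately simple toy problem where both stages have closed-form minimizers: a scalar model f(x; θ) = θx with K(ϕ) = ϕI (iid noise). This is a sketch of the iteration logic only, with assumed toy data; the paper's setting instead requires nonlinear constrained optimization of Eqs. (21)–(22):

```python
import numpy as np

# Toy data: f(x; theta) = theta * x with iid Gaussian noise of variance phi,
# i.e., K(phi) = phi * I, so both stages of Algorithm 1 have closed forms.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = 2.0 * x + rng.normal(0.0, np.sqrt(0.05), x.size)

theta, phi = 0.0, 1.0
for _ in range(50):                              # main while-loop with an iteration cap
    theta_new = (x @ y) / (x @ x)                # stage 1: argmin_theta L(theta | phi), Eq. (21)
    phi_new = np.mean((y - theta_new * x) ** 2)  # stage 2: argmin_phi L(phi | theta), Eq. (22)
    if abs(theta_new - theta) < 1e-10 and abs(phi_new - phi) < 1e-10:
        break                                    # convergence: negligible parameter change
    theta, phi = theta_new, phi_new
```

The recovered slope and noise variance should sit close to the values used to generate the data; with a general kernel, each stage would instead call a gradient-based optimizer.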


3.2. Full sampling strategy

MCMC sampling methods can be regarded as a more general treatment of the inference problem formulated above. In this study,
we use the Transitional Markov Chain Monte Carlo (TMCMC) method [58,60,61] for drawing samples from the posterior distribution
described by Eqs. (3)–(5). By doing so, the mean and covariance of the unknown parameters can be estimated using the statistics
of the posterior samples:
$$\mathrm{E}\left(\begin{bmatrix}\boldsymbol{\theta}\\ \boldsymbol{\phi}\end{bmatrix}\right) \approx \frac{1}{N_s}\sum_{m=1}^{N_s} \begin{bmatrix}\boldsymbol{\theta}^{(m)}\\ \boldsymbol{\phi}^{(m)}\end{bmatrix} \tag{23}$$

$$\mathrm{COV}\left(\begin{bmatrix}\boldsymbol{\theta}\\ \boldsymbol{\phi}\end{bmatrix}\right) \approx \frac{1}{N_s}\sum_{m=1}^{N_s}\left(\begin{bmatrix}\boldsymbol{\theta}^{(m)}\\ \boldsymbol{\phi}^{(m)}\end{bmatrix} - \mathrm{E}\left(\begin{bmatrix}\boldsymbol{\theta}\\ \boldsymbol{\phi}\end{bmatrix}\right)\right)\left(\begin{bmatrix}\boldsymbol{\theta}^{(m)}\\ \boldsymbol{\phi}^{(m)}\end{bmatrix} - \mathrm{E}\left(\begin{bmatrix}\boldsymbol{\theta}\\ \boldsymbol{\phi}\end{bmatrix}\right)\right)^T \tag{24}$$

where E[⋅] and COV[⋅] denote the expected value (mean) and the covariance matrix, respectively; $\boldsymbol{\theta}^{(m)} \in \mathbb{R}^{N_\theta}$ and $\boldsymbol{\phi}^{(m)} \in \mathbb{R}^{N_\phi}$ are
the samples obtained from the posterior distribution 𝑝(𝜣|𝐘, 𝐗, MP ), and 𝑁𝑠 is the number of samples. Additionally, the integrals
encountered for calculating the evidence term in Eq. (4) can be approximated as
$$p(\mathbf{Y} \mid \mathbf{X}, \mathrm{M}_P) \approx \frac{1}{N_s}\sum_{m=1}^{N_s} p(\mathbf{Y} \mid \boldsymbol{\theta}^{(m)}, \boldsymbol{\phi}^{(m)}, \mathbf{X}, \mathrm{M}_P) \tag{25}$$

Although this expression is generally true, calculation of the evidence term based on this approximation might not be sufficiently
accurate, especially when the likelihood function is highly peaked. Therefore, in this paper, we resort to the TMCMC sampling
algorithms to calculate it in a more reliable manner. In this approach, in lieu of attempting to converge from a prior distribution
directly into a target PDF, a series of intermediate proposal distributions are created to draw samples from the target distribution
iteratively. The intermediate distributions are determined through an additional resampling stage, where the samples are synthesized
based on plausibility weights. In most cases, the intermediate distributions are considered Gaussian centered at each sample of the
Markov chain, whose spread is characterized via a scaled covariance matrix. Further details about TMCMC sampler and its technical
details are available elsewhere, e.g., [58].
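Given posterior draws, the moment estimates of Eqs. (23)–(24) are direct sample statistics, as in this sketch (synthetic Gaussian draws stand in for actual TMCMC output):

```python
import numpy as np

# Synthetic stand-in for TMCMC output: N_s draws of [theta, phi] stacked row-wise.
rng = np.random.default_rng(1)
Theta_samples = rng.normal([2.0, 0.1], [0.05, 0.01], size=(5000, 2))

mean_est = Theta_samples.mean(axis=0)                    # Eq. (23)
cov_est = np.cov(Theta_samples, rowvar=False, bias=True) # Eq. (24): 1/N_s normalization
```

Note `bias=True` matches the 1/N_s normalization written in Eq. (24), rather than the default unbiased 1/(N_s − 1) estimator.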
Once the evidence term is calculated, it is straightforward to use Eq. (11) for performing Bayesian model class selection.
Furthermore, the dynamical response of the structure subjected to new input scenarios can be approximated based on Eqs. (7)–(10)
and the sampling results, giving:
$$p(\mathbf{Y}_{pred} \mid \mathbf{Y}, \mathbf{X}, \mathbf{X}_{pred}, \mathrm{M}_P) \approx \frac{1}{N_s}\sum_{m=1}^{N_s} N\!\left(\mathbf{Y}_{pred} \,\middle|\, \boldsymbol{\mu}_{pred}^{(m)}, \boldsymbol{\Sigma}_{pred}^{(m)}\right) \tag{26}$$
where $\boldsymbol{\mu}_{pred}^{(m)} \in \mathbb{R}^{n'N_o}$ and $\boldsymbol{\Sigma}_{pred}^{(m)} \in \mathbb{R}^{n'N_o \times n'N_o}$ are respectively the mean vector and covariance matrix of the response calculated based
on the individual samples $\boldsymbol{\theta}^{(m)}$ and $\boldsymbol{\phi}^{(m)}$, given by

$$\boldsymbol{\mu}_{pred}^{(m)} = \mathbf{f}(\mathbf{X}_{pred};\boldsymbol{\theta}^{(m)}) + (\mathbf{k}(\boldsymbol{\phi}^{(m)})_{pred})^T (\mathbf{K}(\boldsymbol{\phi}^{(m)}))^{-1} (\mathbf{Y} - \mathbf{f}(\mathbf{X};\boldsymbol{\theta}^{(m)})) \tag{27}$$

$$\boldsymbol{\Sigma}_{pred}^{(m)} = \mathbf{K}(\boldsymbol{\phi}^{(m)})_{pred} - (\mathbf{k}(\boldsymbol{\phi}^{(m)})_{pred})^T (\mathbf{K}(\boldsymbol{\phi}^{(m)}))^{-1} \mathbf{k}(\boldsymbol{\phi}^{(m)})_{pred} \tag{28}$$

It simply follows from Eq. (26) that the mean and covariance of the Gaussian mixture distribution can be calculated explicitly [33]:

$$\mathrm{E}\left[\mathbf{Y}_{pred}\right] = \frac{1}{N_s}\sum_{m=1}^{N_s}\boldsymbol{\mu}^{(m)}_{pred} \qquad (29)$$

$$\mathrm{COV}\left[\mathbf{Y}_{pred}\right] = \frac{1}{N_s}\sum_{m=1}^{N_s}\left(\left(\boldsymbol{\mu}^{(m)}_{pred}\right)\left(\boldsymbol{\mu}^{(m)}_{pred}\right)^{T}+\boldsymbol{\Sigma}^{(m)}_{pred}\right) - \mathrm{E}\left[\mathbf{Y}_{pred}\right]\mathrm{E}\left[\mathbf{Y}_{pred}\right]^{T} \qquad (30)$$
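The conditioning steps of Eqs. (27)–(30) can be sketched for a single posterior sample as follows; the squared-exponential kernel, the synthetic residual, and the placeholder per-sample means are illustrative stand-ins, not the paper's actual models:

```python
import numpy as np

rng = np.random.default_rng(1)

def sq_exp(t1, t2, ell=1.0, sf=1.0):
    # generic stand-in covariance for K(phi); the paper's kernels plug in here
    d = t1[:, None] - t2[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

t  = np.linspace(0.0, 5.0, 60)    # training instants
tp = np.linspace(5.0, 7.0, 25)    # prediction instants
r  = np.sin(2.0 * t) + 0.05 * rng.standard_normal(t.size)  # residual Y - f(X; theta)

K   = sq_exp(t, t) + 1e-6 * np.eye(t.size)   # K(phi) with jitter for conditioning
kp  = sq_exp(t, tp)                          # k(phi)_pred
Kpp = sq_exp(tp, tp)

# Eqs (27)-(28) for a single sample (the model mean f is taken as zero here)
mu_pred    = kp.T @ np.linalg.solve(K, r)
Sigma_pred = Kpp - kp.T @ np.linalg.solve(K, kp)

# Eqs (29)-(30): moments of the Gaussian mixture over posterior samples
mus    = [mu_pred + 0.01 * m for m in range(3)]   # placeholder per-sample means
Sigmas = [Sigma_pred] * 3
E_y = np.mean(mus, axis=0)
C_y = np.mean([np.outer(m_, m_) + S for m_, S in zip(mus, Sigmas)], axis=0) \
      - np.outer(E_y, E_y)
```

Note that the mixture covariance of Eq. (30) inflates the per-sample covariances by the spread of the per-sample means, so parameter uncertainty propagates into the response prediction.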

Algorithm 3 summarizes the steps for model inference and response prediction using MCMC sampling methods. In this algorithm, as expected, the computational cost is dominated by step ‘‘1’’, where the MCMC samples


are obtained. The remaining steps are mainly post-processing, yielding the estimated second-moment statistics of the parameters and responses.

3.3. Discussion on computational costs

The above optimization and sampling algorithms involve a large number of evaluations of the log-likelihood function. This process is computationally expensive since it requires running the structural model to produce the dynamical response and calculating the determinant and inverse of the covariance matrix. The former can be alleviated through model reduction techniques [62] and
surrogate modeling techniques (e.g., [63]). The latter can easily overwhelm computations even in low-dimensional problems as it
scales cubically with the number of data points, i.e., 𝑂(𝑛3 𝑁𝑜3 ), leading to excessively high dimensional matrices when dealing with
long time–history responses. In machine learning literature, it is customary to use Cholesky decomposition of the covariance matrices
and calculate the determinant and inverse matrix [42]. However, this decomposition can still be computationally prohibitive,
especially when dealing with time-domain vibration data. Moreover, in this approach, the optimization becomes very sensitive
to the initial values, and the estimated parameters might vary significantly. This issue might occur due to large noise contamination
of the signal, which usually prevails in the measurements. Therefore, in this work, the Singular Value Decomposition (SVD) of
the covariance matrix is employed for truncating redundant dimensions from the observed data covariance matrix and alleviating
scalability problems. In order to specify an appropriate level of truncation using the SVD, the slope (gradient) of the singular-value decay is checked against a tolerance of 0.1%–1% of the largest singular value. Later, we will illustrate how such a dimensionality
reduction approach can be used. It is worth highlighting that there are other techniques for reducing the dimension of the covariance
matrix as well, including Proper Orthogonal Decomposition (POD) [64] and Random Fourier features method [65]. However, these
methods are not studied in the present work.
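A minimal sketch of the truncation strategy described above, using a synthetic low-rank covariance in place of K(𝝓) (the rank, size, and tolerance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
A = rng.standard_normal((n, 8))
K = A @ A.T + 1e-8 * np.eye(n)   # stand-in covariance with fast spectral decay

U, s, _ = np.linalg.svd(K)       # K is SPD, so singular values = eigenvalues
keep = s > 1e-3 * s[0]           # 0.1% tolerance on the largest singular value
r = int(keep.sum())

# determinant and inverse restricted to the retained r-dimensional subspace
logdet_r = float(np.sum(np.log(s[:r])))
K_inv_r  = (U[:, :r] / s[:r]) @ U[:, :r].T   # truncated pseudo-inverse
K_approx = (U[:, :r] * s[:r]) @ U[:, :r].T   # rank-r reconstruction of K
```

Once the rank r is fixed, each likelihood evaluation costs O(n²r) instead of O(n³), which is what makes long time-history records tractable.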

4. Numerical example I

The acceleration response (𝐘) of a linear single-degree-of-freedom (SDOF) system subjected to a known input force is generated synthetically and considered as the measurements. The input force is zero-mean GWN with 1 N standard deviation, shown in Fig. 2(a). The mass is 1 kg, the stiffness is 5 N/m, and the viscous damping ratio is 5%. The sampling interval is 0.01 s, and the response length is 40 s, as shown in Fig. 2(b). The dataset 𝐷 then consists of the input force and the acceleration response.
The structural model considered is a linear SDOF structure, having 1 kg mass and 𝜃 N/m stiffness, where 𝜃 should be identified. The viscous damping ratio is set at 4%, deliberately unequal to the damping of the structure. This misspecification introduces a mismatch between the model and measured responses for all choices of 𝜃. Thus, there is no correct value for the parameter 𝜃 to be identified, but an estimate close to 5 N/m is desirable. For 𝜃 = 5 N/m, the model response is very close to the synthetic measurements 𝐘, as shown in Fig. 2(b). The discrepancy between the generated and model responses (the prediction error process) is also shown in Fig. 2(c), and it resembles a sinusoidal function with a dominant frequency.
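The synthetic data generation can be sketched as follows; the time-stepping scheme (semi-implicit Euler) is our choice for illustration, as the paper does not specify the integrator:

```python
import numpy as np

# SDOF system of the example: m = 1 kg, k = 5 N/m, 5% viscous damping
m, k, zeta = 1.0, 5.0, 0.05
c = 2.0 * zeta * np.sqrt(k * m)
dt, T = 0.01, 40.0
t = np.arange(0.0, T, dt)

rng = np.random.default_rng(42)
F = rng.standard_normal(t.size)          # zero-mean GWN force, 1 N std

# semi-implicit Euler stepping (adequate at dt = 0.01 s for wn ~ 2.24 rad/s)
u = np.zeros(t.size)
v = np.zeros(t.size)
a = np.zeros(t.size)
a[0] = (F[0] - c * v[0] - k * u[0]) / m
for i in range(t.size - 1):
    v[i + 1] = v[i] + dt * a[i]
    u[i + 1] = u[i] + dt * v[i + 1]
    a[i + 1] = (F[i + 1] - c * v[i + 1] - k * u[i + 1]) / m
```

Rerunning the same loop with 𝜃 N/m stiffness and 4% damping yields the model response, and the difference between the two acceleration histories is the prediction error process of Fig. 2(c).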
Given the above assumptions, four types of kernel functions, GWN, SE, PE, and MMTE, are considered for model inference and response prediction in this example. Note that the SE, PE, and MMTE kernels are combined with an additive GWN to capture residual random effects. As the SDOF system has only one mode in its responses, the model order of the MMTE kernel function is set to one, i.e., 𝑚 = 1. The first 20 s of the data is used for model inference as training data, and the other 20 s segment of the data
is used for demonstrating prediction properties of the kernel functions. The results of applying Algorithms 1–3 are summarized in
Table 1. It is worth noting that during the optimization process no multimodality was encountered and all the parameters were uniquely identified. The MPV (𝜃̂) calculated by optimization and the estimated mean E[𝜃] identified from iTMCMC sampling [61] are very close to the actual value (5 N/m), regardless of the choice of the covariance function. However, the uncertainty of the stiffness, represented by 𝜎̂𝜃 or SD[𝜃] (the standard deviation of the samples), differs depending on the choice of the kernel function.
The logarithm of the evidence term ln 𝑝(𝐘|𝐗, MP ) and the posterior probability of the model MP , defined in Eq. (12), are calculated
and reported in Table 1 for each kernel function by using Laplace approximation and sampling approach. The MP is the model class
that comprises the structural model M(𝜽) and GP model P(𝝓), where the GP model is defined by the kernel functions, GWN, SE, PE,
and MMTE. Such a comparison is made to identify which kernel is more likely to describe the output data 𝐘. It should be noted
that kernels SE, PE, and MMTE are combined with a GWN kernel function according to Eq. (17).
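Since the closed-form definition of the MMTE kernel (Eq. (17)) lies outside this excerpt, the following sketch assumes an exponential-trigonometric form consistent with the hyper-parameters listed later, namely σ²_{f,i}, ω_i, and ℓ_i⁻² per mode plus a GWN nugget σ²_n; treat it as a plausible reconstruction rather than the paper's exact expression:

```python
import numpy as np

def mmte_cov(t1, t2, sf2, omega, inv_l2, sn2=0.0):
    """Assumed exponential-trigonometric (MMTE-style) covariance:
    k(tau) = sum_i sf2[i] * exp(-0.5 * inv_l2[i] * tau**2) * cos(omega[i] * tau),
    plus an additive GWN nugget sn2 on the diagonal for identical inputs."""
    tau = t1[:, None] - t2[None, :]
    K = np.zeros_like(tau)
    for s, w, il2 in zip(sf2, omega, inv_l2):
        K += s * np.exp(-0.5 * il2 * tau**2) * np.cos(w * tau)
    if t1.shape == t2.shape and np.allclose(t1, t2):
        K = K + sn2 * np.eye(t1.size)
    return K

# order m = 1, with hyper-parameter values of the same kind as in Table 1
t = np.arange(0.0, 2.0, 0.01)
K = mmte_cov(t, t, sf2=[0.06], omega=[2.24], inv_l2=[0.19], sn2=1e-4)
```

Each cosine-modulated squared-exponential term is a valid stationary kernel (its spectral density is a pair of Gaussians centered at ±ω_i), so the resulting Gram matrix is positive semi-definite.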
According to this table, the MMTE is the preferred case. This result agrees with the response predictions shown in Fig. 3(a-d), where the MMTE outperforms the other kernel functions. As shown, the GWN kernel function gives a zero-mean error and a fixed uncertainty


Fig. 2. (a) Input force (b) Measured and model responses (c) Discrepancy between measured and model responses.

Table 1
Identification results for different choices of kernel covariance function.

Method                  Calculated quantities                GWN        SE         PE          MMTE
Laplace Approximation   θ̂ (MPV)                             4.993      5.006      5.009       5.001
                        σ̂_θ (SD)                            0.0017     0.005      0.004       0.014
                        ℓ̂                                   –          0.391      5.842       0.189
                        σ̂_f                                 –          0.060      0.091       0.059
                        ω̂                                   –          –          8.2943      2.255
                        σ̂_n                                 0.0467     8.5×10⁻⁷   9.9×10⁻⁵    1.9×10⁻⁹
                        ln p(Y|X, M_P)                       5.1×10³    8.4×10³    −8.6×10³    1.3×10⁴
                        ln P(M_P|X, X_pred, Y, Y_pred)       9.3×10³    1.7×10⁴    −1.2×10¹⁵   2.7×10⁴
TMCMC Sampling          E[θ]                                 5.003      5.017      4.946       5.000
                        SD[θ]                                0.0004     0.0194     0.0175      0.0220
                        ln p(Y|X, M_P)                       −3.3×10⁴   3.9×10³    −4.8×10³    1.3×10⁴
                        ln P(M_P|X, X_pred, Y, Y_pred)       −2.9×10⁴   1.3×10⁴    −1.8×10¹⁴   2.5×10⁴

bound. The SE kernel initially provides good accuracy but, because it identifies a small correlation length, it soon loses accuracy and reverts to zero-mean predictions with a large uncertainty bound. The PE kernel repeats the same signal as the training data, drifting away from the actual prediction error process; in contrast, the MMTE captures the periodic pattern in the prediction error process well. The uncertainty given by the MMTE is initially small but increases as time goes by and the accuracy of the estimated mean error deteriorates.
The reason the MMTE is superior to the other kernels should be sought in the temporal characteristics of the prediction errors. For this purpose, the prediction error is calculated by considering 𝜃 = 5 N/m, and then the sample autocorrelation function and power spectral density (PSD) are calculated for the model discrepancy signal presented in Fig. 3. As shown in Fig. 4(a-b), the PSD of the prediction error has a sharp peak very close to the natural frequency of the structure, √5∕(2𝜋) Hz, indicating that a kernel function like the MMTE suits the problem at hand.
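The diagnostic of Fig. 4 can be reproduced on a stand-in discrepancy signal (a noisy harmonic at the SDOF natural frequency, chosen here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
dt = 0.01
t = np.arange(0.0, 20.0, dt)
wn = np.sqrt(5.0)                 # natural frequency of the SDOF system, rad/s

# stand-in discrepancy signal: a dominant harmonic plus measurement noise
e = np.cos(wn * t) + 0.05 * rng.standard_normal(t.size)
e0 = e - e.mean()

# biased sample autocorrelation function (normalized to 1 at lag zero)
acf = np.correlate(e0, e0, mode="full")[e0.size - 1:] / (e0.size * e0.var())

# one-sided periodogram (PSD up to a constant)
psd = np.abs(np.fft.rfft(e0)) ** 2
freqs = np.fft.rfftfreq(e0.size, d=dt)        # Hz
f_peak = freqs[np.argmax(psd)]                # should sit near sqrt(5)/(2*pi) Hz
```

A single sharp spectral peak of this kind is exactly the signature that a cosine-modulated kernel such as the MMTE is designed to match.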
As suggested in Section 3.3, the covariance matrix can be reduced by removing redundant dimensions. Fig. 5 shows the eigenvalues of the covariance matrix 𝐊(𝝓) on a logarithmic scale, displaying a fast decay in their magnitudes. The cut-off threshold prescribed earlier implies keeping 30 singular values. This threshold is a good choice since the truncated eigenvalues are several orders of magnitude smaller than the largest one.


Fig. 3. Prediction errors created using different kernel functions (a) GWN (b) SE (c) PE (d) MMTE.

Fig. 4. (a) Sample autocorrelation function of the prediction error (b) Sample PSD.

Fig. 5. Eigenvalues of the covariance matrix and the truncation threshold.

4.1. Effects of larger prediction error

In the example presented earlier, the prediction error was relatively small, created by considering a 4% damping ratio while the system was 5% damped. In this section, we display the results for other choices of damping ratio while using the same set of assumptions as above. Fig. 6(a-c) demonstrates the model discrepancy considering 1%, 2%, and 3% damping ratios, respectively,


Fig. 6. Performance comparison of the proposed method when using 1%, 2%, and 3% damping ratios for the structural model class.

using the MMTE kernel function introduced in the previous section. It is evident that the predictions remain accurate and the uncertainty bounds consistently account for the errors, regardless of the assumed damping ratio.

4.2. Missing data samples

An important aspect of the proposed framework is the reconstruction of missing data based on observed segments. This situation might arise in practice due to malfunctions of sensing or data acquisition devices. In this example, we have assumed that
two segments of data [0–20 s] and [30–40 s] are available while the segment [20–30 s] is missing. The MMTE-based GP model
is used for estimating the missing segment. Fig. 7(a) shows high accuracy of the reconstructed segment. The interval [20–22 s] is
magnified in Fig. 7(b), where the mean and ±SD uncertainty bounds are included to indicate the accuracy of the predictions. In
Fig. 7(c), the prediction error process is plotted along with the mean error and uncertainty bounds. As can be seen, the proposed
framework accurately estimates the missing data in the presence of prediction errors.

5. Numerical example II

5.1. Basic assumptions

In this section, a five-story structure shown in Fig. 8 is used for testing the proposed framework. Since the horizontal beams act rigidly, the only DOFs are the lateral displacements of the floors. The mass matrix is diagonal with equal elements of 1 kg. The stiffness of all stories is 𝑘1 = ⋯ = 𝑘5 = 10 N∕m. Thus, the modal frequencies of the structure can be calculated as 𝛺 = {0.90, 2.63, 4.14, 5.32, 6.07} rad∕s. The damping matrix is proportional to the mass and stiffness matrices, i.e., 𝐂 = 𝛼𝐊 + 𝛽𝐌, where 𝛼 = 0.02 and 𝛽 = 2 × 10⁻⁵. The structure is subjected to a dynamic force 𝐗 applied to the top floor, and the dynamical responses 𝐘 are simulated under a zero-mean GWN force with 1 N standard deviation, using a sampling interval 𝛥𝑡 = 0.01 s and a duration 𝑇 = 60 s. Two sensor configurations are considered:

- Configuration (A): one accelerometer placed on the 2nd story


- Configuration (B): two accelerometers placed on the 2nd and 4th stories

A structural model with a similar configuration is considered for performing model updating and response predictions. The mass
matrix is known, and the structural model deviates from the actual structure in using a different damping coefficient of 𝛼 ′ = 0.03.
The stiffness matrix is partially known, and the following cases are considered for illustrating the proposed method:


Fig. 7. Reconstruction of missing acceleration response history.

Fig. 8. Shear frame used for illustrating the proposed methodology.

- Case (I): Only the stiffness of the 1st floor is unknown


- Case (II): the stiffness of the 1st and 4th floors are unknown
- Case (III): the stiffness of the 1st, 4th, and 5th floors are unknown

Thus, the objective is to update the stiffness parameters and predict the future response of the system in the presence of prediction errors while using measured input–output data. For simplicity, the unknown stiffnesses are replaced by their nominal values multiplied by the dimensionless parameters 𝜽, which should be identified. Thus, the unknown parameters 𝜽 are expected to be identified very close to one.


Fig. 9. (a) Sample Auto-correlation function (ACF) of the prediction error (b) PSD of the prediction errors.

5.2. Results: Case (I) using sensor configuration (A)

We first apply the proposed framework to Case (I) while using sensor configuration (A). Having observed the success of the MMTE kernel function in the preceding section, we only demonstrate the GP framework using this kernel function. This choice can be justified by taking a careful look at the sample autocorrelation function of the prediction errors. As shown in Fig. 9(a), the autocorrelation consists of a few mixed cosine functions. This pattern can be seen more clearly in Fig. 9(b), where the Fourier transform of the autocorrelation function, i.e., the PSD of the prediction errors, is shown. It can be confirmed that there are four dominant peaks, each potentially representing a separate cosine function. Although it might seem reasonable to set the order of the kernel function to 𝑚 = 4, the complexity of the kernel covariance function requires additional investigation in order to avoid over-fitting due to extra parameterization of the kernel function.
At this point, we clarify that the order of the MMTE kernel characterizes the number of dominant peaks on the PSD curves. Thus,
considering a large order for this covariance function can be interpreted as modeling more peaks than necessary. Additionally, since
the number of unknown parameters also grows with the order of the kernel function, a large order creates extra computational costs
that might be unnecessary.
To choose an optimal model order for the kernel function, a Bayesian model selection perspective is adopted herein. For this purpose, the term 𝑃(M_P|𝐗, 𝐘), representing the plausibility of the model M_P conditional on the data 𝐗 and 𝐘, is approximated through the Bayesian Information Criterion (BIC) using the following formula:

$$\ln P(\mathrm{M}_P\mid\mathbf{X},\mathbf{Y}) \approx BIC(\mathrm{M}_P) = \ln p(\mathbf{Y}\mid\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\phi}},\mathbf{X},\mathrm{M}_P) - \frac{1}{2}\left(N_\theta+N_\phi+1\right)\ln(nN_o) \qquad (31)$$

where $\ln p(\mathbf{Y}\mid\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\phi}},\mathbf{X},\mathrm{M}_P)$ is the likelihood function evaluated at the MPVs $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\phi}}$, representing a measure-of-fit; the expression $\frac{1}{2}(N_\theta+N_\phi+1)\ln(nN_o)$ penalizes parameterization, growing with the number of data points and model parameters. The M_P comprises a probabilistic model P(𝝓) that is specified by a kernel order 𝑚. Fig. 10 shows the BIC scores, as well as the log-evidence obtained using TMCMC, versus the number of modes contributing to the covariance function. As can be seen, both methods confirm that 𝑚 = 3 is a reasonable choice.
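Order selection via Eq. (31) reduces to comparing scalar scores; in the sketch below the log-likelihood values are hypothetical (not the paper's results), and N_φ = 3m + 1 reflects three hyper-parameters per MMTE mode plus the GWN variance:

```python
import numpy as np

def bic_score(loglike_mpv, n_theta, n_phi, n_data):
    # Eq. (31): measure-of-fit minus a complexity penalty; larger is better
    return loglike_mpv - 0.5 * (n_theta + n_phi + 1) * np.log(n_data)

# hypothetical MPV log-likelihoods for MMTE kernel orders m = 1..5
loglikes = {1: 9500.0, 2: 11800.0, 3: 12600.0, 4: 12610.0, 5: 12615.0}
n_data  = 3000   # n * N_o data points (illustrative)
n_theta = 1      # one stiffness parameter, as in Case (I)
# each MMTE mode carries 3 hyper-parameters, plus one GWN variance
scores = {m: bic_score(L, n_theta, 3 * m + 1, n_data) for m, L in loglikes.items()}
m_best = max(scores, key=scores.get)
```

With these numbers, the small fit improvement from m = 3 to m = 4 fails to pay the per-mode penalty of 1.5 ln(nN_o), so m = 3 is selected, mirroring the behavior seen in Fig. 10.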
Since a similar trend is observed for the other case studies and sensor configurations, 𝑚 = 3 is considered for all cases. Thus, $\boldsymbol{\phi}=[\sigma^2_{f,1},\omega_1,\ell_1^{-2},\sigma^2_{f,2},\omega_2,\ell_2^{-2},\sigma^2_{f,3},\omega_3,\ell_3^{-2},\sigma^2_n]^T$ should be calibrated based on the data. For this purpose, the iTMCMC sampling technique [61] was used to draw samples from the posterior distribution using Algorithm 3. The prior distribution of the parameters is considered uniform, accepting equal plausibility of the parameters over a relatively large domain. In Fig. 11, the posterior samples of 𝝓 are displayed for Case (I) of model inference using data from sensor configuration (A). The diagonal plots show the histograms of the parameters, and the lower off-diagonal plots show joint posterior distributions in a pairwise manner. Based on this figure, the posterior samples of the parameters {𝜔1, 𝜔2, 𝜔3} concentrate close to the structure's modal frequencies reported above. The samples of {𝓁1⁻², 𝓁2⁻², 𝓁3⁻²} populate around 0.02, which can be interpreted as correlation lengths of around 7.1 s, indicating how far ahead we can expect reasonable predictions.

5.3. Results: Comparison between other cases

A similar pattern was observed for the other cases and configurations, but the results are not provided in the paper for brevity. Table 2 presents the identification of stiffness parameters for the different case studies, obtained using Algorithm 1. The MPVs match the actual


Fig. 10. Selection of the number of contributing modes into the kernel function.

Fig. 11. Posterior samples of the hyper-parameters of the kernel function using 𝑁𝑠 = 5000 number of samples.

values. The posterior uncertainties are also calculated using Laplace approximation. It can be seen that the uncertainty in the parameters consistently increases with the number of unknown parameters, while it decreases when additional sensors are included.
Once the unknown parameters are inferred through Laplace approximation or sampling, it is possible to perform response
predictions. Figs. 12(a) and 13(a) show the predicted responses of the 2nd and 4th stories. Figs. 12(b) and 13(b) compare the
mean and uncertainty bounds along with the model discrepancy. The accuracy and robustness of the predictions are fairly good, and the periodic pattern in the discrepancy is well captured by the proposed GP model. However, due to loss of correlation with the data, the accuracy of the estimated mean deteriorates over time. It appears that the trusted prediction horizon is of the same order as the correlation length identified by the GP model.

6. Experimental example

In this section, the proposed framework is illustrated using experimental data 𝐷 = {(𝐗, 𝐘)}, where 𝐗 is a time–history of the
base excitation and 𝐘 is the measured accelerations from the three-story structure shown in Fig. 14(a). A shaking table test is


Fig. 12. Prediction of the response time histories and the errors of the second floor (Training based on Case (III) and data from configuration (B)).

Table 2
Estimates of mean and standard deviation of the stiffness ratios using sampling for the signal trained at length of 𝑇 = 30 s and 𝛥𝑡 = 0.01 s.

Case ID   θ̂₁        σ̂_θ1      θ̂₂        σ̂_θ2      θ̂₃        σ̂_θ3
Sensor Configuration (A)
(I)       1.0020     0.0033     –          –          –          –
(II)      1.0056     0.0042     1.0034     0.0043     –          –
(III)     1.0040     0.0195     0.9944     0.0113     1.0004     0.0250
Sensor Configuration (B)
(I)       0.9995     0.0018     –          –          –          –
(II)      1.0005     0.0025     0.9988     0.0025     –          –
(III)     1.0020     0.0039     0.9983     0.0034     0.9992     0.0048

performed to apply base excitation and record the dynamical responses. The acceleration responses of the base and the three stories were measured under a band-limited GWN base excitation. The data is sampled at 0.005 s intervals and recorded for 120 s. The mass of the structure is lumped at the floor levels, measured as 5.63 kg, 6.03 kg, and 4.66 kg for the first, second, and third stories, respectively. The mass of the other members is ignored as it is considerably smaller than the floor masses.
A shear frame model shown in Fig. 14(b) is used for describing the dynamics. The mass matrix of the structural model is known as 𝐌 = diag([5.63, 6.03, 4.66]) kg, while the stiffness and damping matrices have to be identified considering the following parameterization:

$$\mathbf{K} = \begin{bmatrix} k_1+k_2 & -k_2 & 0 \\ -k_2 & k_2+k_3 & -k_3 \\ 0 & -k_3 & k_3 \end{bmatrix} \qquad (32)$$

$$\mathbf{C} = \sum_{i=1}^{3} 2\zeta_i \hat{\varpi}_i \, \frac{\mathbf{M}\hat{\boldsymbol{\varphi}}_i \hat{\boldsymbol{\varphi}}_i^{T}\mathbf{M}}{\hat{\boldsymbol{\varphi}}_i^{T}\mathbf{M}\hat{\boldsymbol{\varphi}}_i} \qquad (33)$$


Fig. 13. Prediction of the response time histories and the errors of the fourth floor (Training based on Case (III) and configuration (B)).

Fig. 14. (a) Three-story frame tested on a shaking-table (b) 3-DOF linear structural model.

where $\hat{\varpi}_i$ and $\hat{\boldsymbol{\varphi}}_i$ are nominal values of the modal frequency and mode shape corresponding to the $i$th dynamical mode, obtained from [66]; $\zeta_i$ is the $i$th unknown modal damping ratio. Given this specification, the unknown parameters can be collected into $\boldsymbol{\theta} = [k_1, k_2, k_3, \zeta_1, \zeta_2, \zeta_3]^T$.
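Eqs. (32)–(33) can be assembled directly; in this sketch the nominal modes are computed from the nominal K and M rather than taken from [66], which is an assumption made purely for illustration:

```python
import numpy as np

M = np.diag([5.63, 6.03, 4.66])    # lumped floor masses, kg

def shear_stiffness(k1, k2, k3):
    # Eq. (32): tridiagonal stiffness matrix of the 3-DOF shear frame
    return np.array([[k1 + k2, -k2, 0.0],
                     [-k2, k2 + k3, -k3],
                     [0.0, -k3, k3]])

def modal_damping(M, Phi, omegas, zetas):
    # Eq. (33): damping matrix assembled from modes and modal damping ratios
    C = np.zeros_like(M)
    for phi, w, z in zip(Phi.T, omegas, zetas):
        C += 2.0 * z * w * np.outer(M @ phi, M @ phi) / (phi @ M @ phi)
    return C

K = shear_stiffness(17.49e3, 25.66e3, 24.79e3)     # nominal stiffnesses, N/m

# generalized eigenproblem via the mass-normalized symmetric form
Mh = np.diag(1.0 / np.sqrt(np.diag(M)))
w2, V = np.linalg.eigh(Mh @ K @ Mh)
Phi = Mh @ V                        # mass-orthonormal mode shapes
omegas = np.sqrt(w2)
C = modal_damping(M, Phi, omegas, [0.0239, 0.0087, 0.0065])
```

By construction, projecting C back onto each mode recovers the prescribed modal damping ratio, which is the defining property of Eq. (33).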
Time–history acceleration responses of the 1st and 3rd stories were taken as the measured output 𝐘, paired with the measured input 𝐗, using 𝑛𝑁𝑜 = 12,000 data points for the inference. The MMTE kernel with additive GWN is used for describing the discrepancy between the measured and model responses. Similar to the above example, the proper order of the kernel function is found by comparing the BIC scores for 𝑚 = {1, 2, 3}. The score is computed using Eq. (31), which expresses how likely the probabilistic model with kernel order 𝑚 is to describe the experimental data 𝐘. This example is a 3-DOF structure, so only up to three dynamical modes can be observed in the model response; therefore, the order of the MMTE kernel cannot exceed three. As compared in Fig. 15, the BIC scores suggest that 𝑚 = 3 is a good choice. Thus, the unknown parameters of the kernel function can be collected into $\boldsymbol{\phi}=[\sigma^2_{f,1},\omega_1,\ell_1^{-2},\sigma^2_{f,2},\omega_2,\ell_2^{-2},\sigma^2_{f,3},\omega_3,\ell_3^{-2},\sigma^2_n]^T$.


Fig. 15. Selection of the order of the kernel covariance function.

Fig. 16. Prediction of the acceleration response at the 1st floor (the shaded area shows the ±3SD uncertainty bound; the blue line shows the measured response).

The MPVs of the parameters are calculated through Algorithm 1, and the results are reported in Tables 3–4. The MPVs of the structural parameters are compared with their reference values in Table 3, verifying the validity of the results. The posterior uncertainty of the stiffness and damping parameters is approximated using Laplace approximation, and the standard deviation is provided in Table 3. Additionally, Table 4 provides the optimal values of the kernel parameters. It can be observed that the variances $\sigma^2_{f,i}$ are much larger than $\sigma^2_n$, showing that a large portion of the uncertainty is captured by the kernel covariance function rather than the GWN additive component. The correlation lengths $\ell_i$ appear to lie between 1.0 s and 1.25 s, and the frequencies of the MMTE kernel function ($\omega_i$) are very close to those of the structure, reported in [64] as {27.57, 81.36, 118.20} rad∕s.
Fig. 16 compares the measured acceleration response of the first floor with the uncertainty bounds obtained by the GP framework. Fig. 17 shows the uncertainty bounds along with the structural model discrepancy. As can be observed, the uncertainty bounds gradually expand as we move away from the training segment, which ends at t = 10 s, but they reach a stationary regime after t = 15 s.


Table 3
Posterior distribution of the structural parameters approximated using Algorithm 1.
Structural parameters Nominal values MPV SD
𝑘1 (kN/m) 17.49 17.86 0.16
𝑘2 (kN/m) 25.66 24.20 0.53
𝑘3 (kN/m) 24.79 26.00 0.38
𝜁1 (%) 2.39 1.89 0.73
𝜁2 (%) 0.87 0.34 0.15
𝜁3 (%) 0.65 1.30 0.44

Table 4
Optimal values of the parameters of the MMTE kernel combined with GWN.

Parameters   σ²_f,1   σ²_f,2   σ²_f,3   1∕ℓ₁²   1∕ℓ₂²   1∕ℓ₃²   ω₁       ω₂      ω₃      σ²_n
MPV          0.062    0.027    0.042    0.647   0.727   0.996   114.47   79.66   26.11   5×10⁻⁵

Fig. 17. Prediction of the acceleration response at the 1st floor.

However, care should be taken, as the estimated mean of the model discrepancy is reliable only for a short period of time, [10–10.5 s], lasting about half a length-scale (0.5 s < ℓᵢ∕2 < 0.75 s). Unlike the predicted mean, the indicated uncertainty bound almost entirely covers the measured response and can reflect a reasonable degree of confidence.

7. Conclusions

A novel Bayesian framework is proposed for structural model updating, response prediction, and missing-sample reconstruction using GP models. Kernel functions are introduced for modeling the correlation of prediction errors. Potential merits of these kernels are discussed in terms of how well the GP model compensates for model discrepancies. The choice of kernel function is guided by Bayesian model class selection, and new formulations are offered that incorporate the accuracy of response predictions, the fitting accuracy on training data, and model complexity. However, the computational cost of training the GP models turns out to be challenging, mainly due to the rise in the number of unknown parameters as well as the growth of the covariance matrix with the number of data points. Thus, a few computational strategies are developed for model inference and prediction using Laplace approximation and MCMC sampling techniques. The computational cost of calculating the inverse and the determinant of the covariance matrix is also addressed using a truncated spectral representation. Finally, the proposed framework is tested and verified via numerical and experimental examples. From the results presented, the following conclusions can be drawn:


• The proposed framework provides accurate parameter and response estimates, compensating for the structural model
discrepancy. It also delivers reasonable response uncertainties, covering potential prediction errors.
• Based on the sample autocorrelation function and the spectral density of the model discrepancy (prediction errors), it is seen
that the errors can be approximated by a few dominant mixing cosine functions.
• The MMTE kernel functions offer better prediction performance compared to other choices. This finding is supported by
comparing the relative plausibility of different kernels calculated based on the posterior probability, as well as the quality
of response predictions in held-out data sets.
• The number of modes incorporated into the kernel function has a large effect on the parameter estimates and response predictions. Therefore, it is advisable to include the most dominant modes of the signal of interest and to initialize the frequencies of the MMTE kernel function at the modal frequencies of the structure.
• The BIC scores are observed to provide consistent results for selecting a proper model order for the MMTE covariance function.

In this paper, the proposed MMTE function is shown to be promising for dealing with linear structural models when stationary kernel functions are justifiable. The input forces used in the synthetic simulations were GWN with varying magnitude, and in the experimental example the force was the base acceleration taken from [66]. Although only these types of forces were tested, it is believed that the methodology applies to other types of excitation as well, but further investigation is needed to verify this statement. Our future work will generalize this framework by considering nonlinearities in combination with non-stationary types of excitation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared
to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

Financial support from the Hong Kong Research Grants Council under grants 16212918 and 16211019 is gratefully appreciated.

References

[1] J.L. Beck, Statistical system identification of structures, in: 5th Int. Conf. Struct. Saf. Reliab., ASCE, San Francisco, California, 1989,
cedb.asce.org/CEDBsearch/record.jsp?dockey=0064461.
[2] J.E. Mottershead, M.I. Friswell, Model updating in structural dynamics: A survey, J. Sound Vib. 167 (1993) 347–375.
[3] J.L. Beck, L.S. Katafygiotis, Updating models and their uncertainties. I: Bayesian statistical framework, J. Eng. Mech. 124 (1998) 455–461, http:
//dx.doi.org/10.1061/(ASCE)0733-9399(1998)124:4(455).
[4] E. Simoen, G. De Roeck, G. Lombaert, Dealing with uncertainty in model updating for damage assessment: A review, Mech. Syst. Signal Process. 56 (2015)
123–149, http://dx.doi.org/10.1016/j.ymssp.2014.11.001.
[5] D.J. Jerez, H.A. Jensen, M. Beer, An effective implementation of reliability methods for Bayesian model updating of structural dynamic models with
multiple uncertain parameters, Reliab. Eng. Syst. Saf. 225 (2022) 108634, http://dx.doi.org/10.1016/j.ress.2022.108634.
[6] J.L. Beck, Bayesian system identification based on probability logic, Struct. Control Heal. Monit. (2010) 825–847, http://dx.doi.org/10.1002/stc.424.
[7] J.L. Beck, S.K. Au, Bayesian updating of structural models and reliability using Markov chain Monte Carlo simulation, J. Eng. Mech. 128 (2002) 380–391,
http://dx.doi.org/10.1061/(ASCE)0733-9399(2002)128:4(380).
[8] S.-K. Au, Operational Modal Analysis-Modeling, Bayesian Inference, Uncertainty Laws, Springer, Singapore, 2017, http://dx.doi.org/10.1007/978-981-10-
4118-1.
[9] H.A. Jensen, E. Millas, D. Kusanovic, C. Papadimitriou, Model-reduction techniques for Bayesian finite element model updating using dynamic response
data, Comput. Methods Appl. Mech. Engrg. 279 (2014) 301–324, http://dx.doi.org/10.1016/j.cma.2014.06.032.
[10] M. Kitahara, C. Dang, M. Beer, Bayesian updating with two-step parallel Bayesian optimization and quadrature, Comput. Methods Appl. Mech. Engrg. 403
(2023) 115735, http://dx.doi.org/10.1016/j.cma.2022.115735.
[11] W. Betz, I. Papaioannou, J.L. Beck, D. Straub, Bayesian inference with subset simulation: Strategies and improvements, Comput. Methods Appl. Mech.
Engrg. 331 (2018) 72–93, http://dx.doi.org/10.1016/j.cma.2017.11.021.
[12] J.L. Beck, K.V. Yuen, Model selection using response measurements: Bayesian probabilistic approach, J. Eng. Mech. 130 (2004) 192–203, http://dx.doi.
org/10.1061/(ASCE)0733-9399(2004)130:2(192).
[13] K.V. Yuen, Bayesian methods for structural dynamics and civil engineering, 2010, http://dx.doi.org/10.1002/9780470824566.
[14] K.-V. Yuen, Recent developments of Bayesian model class selection and applications in civil engineering, Struct. Saf. 32 (2010) 338–346, http:
//dx.doi.org/10.1016/j.strusafe.2010.03.011.
[15] J.L. Beck, A.A. Taflanidis, Prior and posterior robust stochastic predictions for dynamical systems using probability logic, Int. J. Uncertain. Quantif. 3
(2013) 271–288, http://dx.doi.org/10.1615/Int.J.UncertaintyQuantification.2012003641.
[16] J. Zhang, A.A. Taflanidis, Bayesian model averaging for kriging regression structure selection, Probabilistic Eng. Mech. 56 (2019) 58–70, http://dx.doi.
org/10.1016/j.probengmech.2019.02.002.
[17] S.H. Cheung, J.L. Beck, Calculation of posterior probabilities for Bayesian model class assessment and averaging from posterior samples based on dynamic
system data, Comput. Civ. Infrastruct. Eng. 25 (2010) 304–321, http://dx.doi.org/10.1111/j.1467-8667.2009.00642.x.
[18] W.J. Yan, L.S. Katafygiotis, A novel Bayesian approach for structural model updating utilizing statistical modal information from multiple setups, Struct.
Saf. 52 (2015) 260–271, http://dx.doi.org/10.1016/j.strusafe.2014.06.004.


[19] H.A. Jensen, C. Esse, V. Araya, C. Papadimitriou, Implementation of an adaptive meta-model for Bayesian finite element model updating in time domain,
Reliab. Eng. Syst. Saf. 160 (2017) 174–190, http://dx.doi.org/10.1016/j.ress.2016.12.005.
[20] I. Behmanesh, B. Moaveni, Accounting for environmental variability, modeling errors, and parameter estimation uncertainties in structural identification,
J. Sound Vib. 374 (2016) 92–110, http://dx.doi.org/10.1016/j.jsv.2016.03.022.
[21] O. Sedehi, C. Papadimitriou, L.S. Katafygiotis, Hierarchical Bayesian uncertainty quantification of finite element models using modal statistical information,
Mech. Syst. Signal Process. 179 (2022) 109296, http://dx.doi.org/10.1016/j.ymssp.2022.109296.
[22] S.-K. Au, F.-L. Zhang, Fundamental two-stage formulation for Bayesian system identification, Part I: General theory, Mech. Syst. Signal Process. 66–67
(2016) 31–42, http://dx.doi.org/10.1016/j.ymssp.2015.04.025.
[23] M. Wu, A.W. Smyth, Application of the unscented Kalman filter for real-time nonlinear structural system identification, Struct. Control Health Monit. 14 (2007) 971–990, http://dx.doi.org/10.1002/stc.186.
[24] E.N. Chatzi, A.W. Smyth, The unscented Kalman filter and particle filter methods for nonlinear structural system identification with non-collocated heterogeneous sensing, Struct. Control Health Monit. 16 (2009) 99–123.
[25] K.V. Yuen, S.C. Kuok, Online updating and uncertainty quantification using nonstationary output-only measurement, Mech. Syst. Signal Process. 66–67
(2016) 62–77, http://dx.doi.org/10.1016/j.ymssp.2015.05.019.
[26] J. Castiglione, R. Astroza, S. Eftekhar Azam, D. Linzell, Auto-regressive model based input and parameter estimation for nonlinear finite element models,
Mech. Syst. Signal Process. 143 (2020) 106779, http://dx.doi.org/10.1016/j.ymssp.2020.106779.
[27] M. Song, L. Renson, J. Noël, B. Moaveni, G. Kerschen, Bayesian model updating of nonlinear systems using nonlinear normal modes, Struct. Control Health Monit. 25 (2018) http://dx.doi.org/10.1002/stc.2258.
[28] K. Erazo, S. Nagarajaiah, Bayesian structural identification of a hysteretic negative stiffness earthquake protection system using unscented Kalman filtering, Struct. Control Health Monit. 25 (2018) e2203, http://dx.doi.org/10.1002/stc.2203.
[29] Y. Huang, J.L. Beck, H. Li, Y. Ren, Sequential sparse Bayesian learning with applications to system identification for damage assessment and recursive
reconstruction of image sequences, Comput. Methods Appl. Mech. Engrg. 373 (2021) 113545, http://dx.doi.org/10.1016/j.cma.2020.113545.
[30] Y. Huang, J. Yu, J.L. Beck, H. Zhu, H. Li, Novel sparseness-inducing dual Kalman filter and its application to tracking time-varying spatially-sparse
structural stiffness changes and inputs, Comput. Methods Appl. Mech. Engrg. 372 (2020) 113411, http://dx.doi.org/10.1016/j.cma.2020.113411.
[31] Y. Huang, J.L. Beck, H. Li, Bayesian system identification based on hierarchical sparse Bayesian learning and Gibbs sampling with application to structural damage assessment, Comput. Methods Appl. Mech. Engrg. 318 (2017) 382–411, http://dx.doi.org/10.1016/j.cma.2017.01.030.
[32] I. Behmanesh, B. Moaveni, G. Lombaert, C. Papadimitriou, Hierarchical Bayesian model updating for structural identification, Mech. Syst. Signal Process.
64–65 (2015) 360–376, http://dx.doi.org/10.1016/j.ymssp.2015.03.026.
[33] O. Sedehi, C. Papadimitriou, L.S. Katafygiotis, Probabilistic hierarchical Bayesian framework for time-domain model updating and robust predictions, Mech.
Syst. Signal Process. 123 (2019) 648–673, http://dx.doi.org/10.1016/j.ymssp.2018.09.041.
[34] E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, 2003, http://dx.doi.org/10.1007/BF02985800.
[35] V.P. Singh, Entropy Theory and Its Application in Environmental and Water Engineering, John Wiley & Sons, 2013.
[36] K.-V. Yuen, L.S. Katafygiotis, Bayesian time–domain approach for modal updating using ambient data, Probabilistic Eng. Mech. 16 (2001) 219–231,
http://dx.doi.org/10.1016/S0266-8920(01)00004-2.
[37] K.-V. Yuen, L.S. Katafygiotis, Bayesian fast Fourier transform approach for modal updating using ambient data, Adv. Struct. Eng. 6 (2003) 81–95.
[38] K.-V. Yuen, L.S. Katafygiotis, Bayesian modal updating using complete input and incomplete response noisy measurements, J. Eng. Mech. 128 (2002)
340–350, http://dx.doi.org/10.1061/(ASCE)0733-9399(2002)128:3(340).
[39] K.-V. Yuen, S.-C. Kuok, Bayesian methods for updating dynamic models, Appl. Mech. Rev. 64 (2011) 010802, http://dx.doi.org/10.1115/1.4004479.
[40] K.F. Christodoulou, Methodology for Structural Identification and Damage Detection, University of Thessaly, Volos, Greece, 2006.
[41] L.S. Katafygiotis, O. Sedehi, F.R. Rofooei, Bayesian time-domain model updating considering correlation of prediction errors, in: 12th Int. Conf. Struct. Saf. Reliab., Vienna, Austria, 2017, pp. 2500–2509.
[42] C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning, 2004, http://dx.doi.org/10.1142/S0129065704001899.
[43] M.C. Kennedy, A. O’Hagan, Bayesian calibration of computer models, J. R. Stat. Soc. Ser. B Stat. Methodol. 63 (2001) 425–464, http://dx.doi.org/10.1111/1467-9868.00294.
[44] H.-P. Wan, W.-X. Ren, A residual-based Gaussian process model framework for finite element model updating, Comput. Struct. 156 (2015) 149–159,
http://dx.doi.org/10.1016/j.compstruc.2015.05.003.
[45] A. Jesus, P. Brommer, Y. Zhu, I. Laory, Comprehensive Bayesian structural identification using temperature variation, Eng. Struct. 141 (2017) 75–82,
http://dx.doi.org/10.1016/j.engstruct.2017.01.060.
[46] P. Ni, J. Li, H. Hao, Q. Han, X. Du, Probabilistic model updating via variational Bayesian inference and adaptive Gaussian process modeling, Comput.
Methods Appl. Mech. Engrg. 383 (2021) 113915, http://dx.doi.org/10.1016/j.cma.2021.113915.
[47] L.D. Avendaño-Valencia, E.N. Chatzi, D. Tcherniak, Gaussian process models for mitigation of operational variability in the structural health monitoring
of wind turbines, Mech. Syst. Signal Process. 142 (2020) 106686, http://dx.doi.org/10.1016/j.ymssp.2020.106686.
[48] Y.-C. Zhu, S.-K. Au, Bayesian data driven model for uncertain modal properties identified from operational modal analysis, Mech. Syst. Signal Process.
136 (2020) 106511, http://dx.doi.org/10.1016/j.ymssp.2019.106511.
[49] Q.-A. Wang, C. Zhang, Z.-G. Ma, Y.-Q. Ni, Modelling and forecasting of SHM strain measurement for a large-scale suspension bridge during typhoon events
using variational heteroscedastic Gaussian process, Eng. Struct. 251 (2022) 113554, http://dx.doi.org/10.1016/j.engstruct.2021.113554.
[50] R. Nayek, S. Chakraborty, S. Narasimhan, A Gaussian process latent force model for joint input-state estimation in linear structural systems, Mech. Syst.
Signal Process. 128 (2019) 497–530, http://dx.doi.org/10.1016/j.ymssp.2019.03.048.
[51] G. Jia, A.A. Taflanidis, Kriging metamodeling for approximation of high-dimensional wave and surge responses in real-time storm/hurricane risk assessment,
Comput. Methods Appl. Mech. Engrg. 261–262 (2013) 24–38, http://dx.doi.org/10.1016/j.cma.2013.03.012.
[52] J. Zhang, A.A. Taflanidis, Accelerating MCMC via kriging-based adaptive independent proposals and delayed rejection, Comput. Methods Appl. Mech.
Engrg. 355 (2019) 1124–1147, http://dx.doi.org/10.1016/j.cma.2019.07.016.
[53] C. Papadimitriou, G. Lombaert, The effect of prediction error correlation on optimal sensor placement in structural dynamics, Mech. Syst. Signal Process.
28 (2012) 105–127, http://dx.doi.org/10.1016/j.ymssp.2011.05.019.
[54] A. Ben Abdessalem, N. Dervilis, D.J. Wagg, K. Worden, Automatic kernel selection for Gaussian processes regression with approximate Bayesian computation
and sequential Monte Carlo, Front. Built Environ. 3 (2017) http://dx.doi.org/10.3389/fbuil.2017.00052.
[55] E. Simoen, C. Papadimitriou, G. Lombaert, On prediction error correlation in Bayesian model updating, J. Sound Vib. 332 (2013) 4136–4152, http://dx.doi.org/10.1016/j.jsv.2013.03.019.
[56] M.K. Ramancha, J.P. Conte, M.D. Parno, Accounting for model form uncertainty in Bayesian calibration of linear dynamic systems, Mech. Syst. Signal
Process. 171 (2022) 108871, http://dx.doi.org/10.1016/j.ymssp.2022.108871.
[57] A.M. Kosikova, O. Sedehi, L.S. Katafygiotis, Bayesian model updating using Gaussian process regression, in: J. Li, P.D. Spanos, J.B. Chen, Y.B. Peng (Eds.), 13th Int. Conf. Struct. Saf. Reliab. (ICOSSAR 2021), Shanghai, China, 2021.
[58] J. Ching, Y.-C. Chen, Transitional Markov chain Monte Carlo method for Bayesian model updating, model class selection, and model averaging, J. Eng.
Mech. 133 (2007) 816–832, http://dx.doi.org/10.1061/(ASCE)0733-9399(2007)133:7(816).
[59] L.S. Katafygiotis, J.L. Beck, Updating models and their uncertainties. II: Model identifiability, J. Eng. Mech. 124 (1998) 463–467, http://dx.doi.org/10.1061/(ASCE)0733-9399(1998)124:4(463).
[60] W. Betz, I. Papaioannou, D. Straub, Transitional Markov chain Monte Carlo: Observations and improvements, J. Eng. Mech. 142 (2016) 04016016,
http://dx.doi.org/10.1061/(ASCE)EM.1943-7889.0001066.
[61] S. Wu, P. Angelikopoulos, C. Papadimitriou, P. Koumoutsakos, Bayesian annealed sequential importance sampling: An unbiased version of transitional
Markov chain Monte Carlo, ASCE-ASME J. Risk Uncert. Engrg. Sys. Part B Mech. Engrg. 4 (2017) 011008, http://dx.doi.org/10.1115/1.4037450.
[62] H. Jensen, C. Papadimitriou, Sub-structure Coupling for Dynamic Analysis, Springer International Publishing, Cham, 2019, http://dx.doi.org/10.1007/978-3-030-12819-7.
[63] C. Lataniotis, S. Marelli, B. Sudret, Extending classical surrogate modeling to high dimensions through supervised dimensionality reduction: A data-driven
approach, Int. J. Uncertain. Quantif. 10 (2020) 55–82, http://dx.doi.org/10.1615/Int.J.UncertaintyQuantification.2020031935.
[64] A. Chatterjee, An introduction to the proper orthogonal decomposition, Current Science 78 (2000) 808–817.
[65] A. Rahimi, B. Recht, Random features for large-scale kernel machines, Adv. Neural Inf. Process. Syst. 20 (2007) 1177–1184.
[66] O. Sedehi, C. Papadimitriou, L.S. Katafygiotis, Data-driven uncertainty quantification and propagation in structural dynamics through a hierarchical Bayesian
framework, Probabilistic Eng. Mech. 60 (2020) 103047, http://dx.doi.org/10.1016/j.probengmech.2020.103047.