
DAGGER: Data AuGmentation GEneRative Framework for Time-Series Data in Data-Driven Smart Manufacturing Systems


David Blank¹, Daniel Ospina Acero², Nicholas Hemleben³, Andrew VanFossen⁴, Frank Zahiri⁵, Mrinal Kumar⁶

¹,²,⁴,⁶ Mechanical & Aerospace Engineering, The Ohio State University, 201 W. 19th Avenue, Columbus, OH 43210, USA
[email protected]
[email protected]
[email protected]
[email protected]

³ PointPro Inc., 7658 Johntimm Ct, Dublin, OH 43017, USA
[email protected]

⁵ 402 Commodities Maintenance Group (CMXG), Warner Robins Air Logistics Complex, Robins AFB, GA 31098
[email protected]

ABSTRACT

Performance of digital twins (DTs) in smart manufacturing is heavily data dependent, especially when physics-based computational models are not available or difficult to obtain in practice, as is the case in most modern manufacturing scenarios. However, in manufacturing applications the availability of data is often limited and involves high dimensional signals. In this work we present the Data AuGmentation GEneRative (DAGGER) framework, which is a deep-learning-based tool combining the strengths of autoencoders (AEs) and generative adversarial networks (GANs) to robustly, efficiently and reliably augment the available sensor data to train the data-based computational models of DTs for smart manufacturing. The DAGGER framework uses the learned latent space from an AE in the training process of the generator in a GAN. This provides increased stability in the convergence of the GAN’s discriminator/critic when working with very small training data sets, and helps the GAN’s generator to more accurately and robustly capture the structure in the sensor data. We corroborate the efficacy of the DAGGER framework in two ways, one in which we directly contrast the synthetically generated time series samples with real ones from publicly available sensor data in a manufacturing application (using performance metrics based on the similarity of mean signals, variance signals, KL divergence, signals in Fourier domain, auto-correlation signals, etc.), and one in which we evaluate the adequacy of the synthetically generated time series samples to perform system identification of a black box dynamic system. In all the cases we compare the results from the DAGGER with those from a Riemannian Hamiltonian Variational AE (RHVAE). We show that the DAGGER’s performance is in general satisfactory, and comparable with that of the RHVAE in all the considered evaluation scenarios.

Annual Conference of the Prognostics and Health Management Society 2023. David Blank et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. INTRODUCTION

As industries transition into the Industry 4.0 paradigm and increasingly incorporate smart manufacturing practices in their operation, the relevance of and interest in concepts like DT are at an all-time high. DTs offer direct avenues for industries to make more accurate predictions, rational decisions, and informed plans, ultimately reducing costs and increasing performance and productivity. The adequate operation of DTs in the context of smart manufacturing relies on an evolving data set relating to the real-life object or process, and a means of dynamically updating the computational model to better conform to the data (Wright & Davidson, 2020). This reliance on data is made more explicit when physics-based computational models are not available or difficult to obtain in practice, as is the case in most modern manufacturing scenarios. For data-based model surrogates to “adequately” represent the underlying physics, the number of training data points must keep pace with the number of degrees of freedom in the model, which can be on the order of thousands. However, in niche industrial scenarios like those in manufacturing applications, the availability of data tends to be limited (on the order of a few hundred data points, at best) (Diez-Olivan, Del Ser, Galar, & Sierra, 2019), mainly because a manual measuring process typically must take place for a few of the relevant quantities,
e.g., level of wear of a tool. In other words, notwithstanding the popular notion of big-data, there is still a stark shortage of ground-truth data when examining, for instance, a complex system’s path to failure. In this work, we present a framework to alleviate this problem via modern machine learning tools, where we show a robust, efficient and reliable pathway to augment the available data to train the data-based computational models.

Small sample size is a key limitation on performance in machine learning, in particular with very high dimensional data (Goodfellow, 2016). Current efforts for synthetic data generation typically involve either Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs) (Figueira & Vaz, 2022; Demir, Mincev, Kok, & Paterakis, 2021), and both have been used in numerous applications, including those with very high dimensionality, e.g., 3D MRI images. However, there remain hurdles to overcome in the context of smart manufacturing:

• Paradoxically, these methods fall under the umbrella of deep learning and therefore themselves require vast numbers of hyperparameters to optimize (Hutter, Lücke, & Schmidt-Thieme, 2015). Traditional GAN and AE architectures therefore require large data sets for adequate training. Thus, new network architectures that can be effectively trained with very small data sets (e.g., a few hundred data points) need to be developed.

• Most applications of GANs and AEs target image analysis and synthesis (Smith & Smith, 2020; Doersch, 2021), which renders traditional network architectures poorly suited for time-series data generation. Although there has been recent work on augmentation techniques for time series data with GANs and/or VAEs (Iglesias, Talavera, González-Prieto, Mozo, & Gómez-Canaval, 2023; Yang, Li, & Zhou, 2023), the problem remains largely open, especially regarding the efficiency of both the training and the generation operations. Consequently, robust and efficient architectures tailored to time-series sensor data generation must be defined.

• Standalone GAN or VAE architectures inherently bring along a few difficulties. For example, GAN models are susceptible to mode collapse, training instability, and high computational costs when used for high dimensional data creation (Khanuja & Agarkar, 2023, Ch. 13). On the other hand, the encoding of VAEs greatly reduces the dimensional complexity of the data and can effectively regularize the latent space, but often produces poorly representative synthetic samples (Shao et al., 2020). Thus, alternative neural network architectures that alleviate such practical issues need to be devised.

In light of these challenges, here we present the Data AuGmentation GEneRative (DAGGER) framework, which corresponds to a hybrid AE+GAN architecture specifically conceived to generate synthetic high dimensional time series data when the training data sets are very small. The DAGGER scheme uses the learned latent space from an AE in the training process of the generator of a GAN, increasing robustness and stability in both training and synthetic sample generation.

2. PROBLEM STATEMENT

A data set $S$ contains $K$ time series signals that come from a sensor in a manufacturing device (e.g., a vibration sensor placed at the spindle on a milling machine). The $k$-th element in $S$ is the time series signal $x_k$ containing $N$ time samples; i.e., $x_k = [x_k(t_1), x_k(t_2), \ldots, x_k(t_N)]$. Our objective is to obtain new time series signals that “resemble” those in $S$. More concretely, if we interpret every $x_k$ as a sample drawn from a probability distribution $p_{\text{data}}(X)$, in principle we simply wish to draw new samples from $p_{\text{data}}(X)$. The problem, however, is that $p_{\text{data}}(X)$ is unknown, and we have to first devise a model for it.

To that end, let us define a probability distribution $p_{\text{model}}(X; \theta)$ that approximates $p_{\text{data}}(X)$, with $\theta$ representing the set of defining parameters of $p_{\text{model}}$. With a slight abuse of notation, our first step is thus defined by the following optimization problem (Goodfellow, 2016):

$$\theta^{*} = \arg\min_{\theta} \; D\Big( p_{\text{data}}\big(\{X = x_k\}_{k=1}^{K}\big),\; p_{\text{model}}\big(\{X = x_k\}_{k=1}^{K}; \theta\big) \Big), \qquad (1)$$

where $D(\cdot, \cdot)$ represents some measure of distance between two probability distributions (e.g., Kullback-Leibler divergence). Once $\theta^{*}$ is found, we obtain a new time series signal $\hat{x}$ by sampling from $p_{\text{model}}(X; \theta^{*})$.

Now, for manufacturing systems, the number of time series signals available for “training” is typically very small (on the order of a few hundred), and the number of time samples in a time series is large (on the order of several thousand), i.e., $N \gg K$. Consequently, the problem in (1) must be solved under those constraints: high-dimensional data with very few samples for “training”. Here we present a robust framework to address these challenges.

3. AE+GAN ARCHITECTURE: DAGGER FRAMEWORK FOR SYNTHETIC TIME SERIES GENERATION

In this section, we describe the methodology developed to solve the problem described in the previous section. The methodology is dubbed the DAGGER framework. The DAGGER corresponds to the combination of two different deep neural network architectures working in tandem to produce synthetic samples of time series signals under the constraints of high dimensionality and very few samples for training, which are typical in manufacturing scenarios. The two network architectures defining the DAGGER framework are an
AE network and a GAN, the former dealing with the transformation of the original data to and from a lower dimensional space, and the latter dealing with the generation of the synthetic samples. A schematic view of the DAGGER framework is shown in Figure 1.

Following the process illustrated in Figure 1, first the (trained) AE network of the DAGGER takes the original time series data (i.e., set $S$) and transforms it into a considerably lower dimensional space via its encoder network. The output of the encoder is then fed to the critic network of the GAN, along with synthetic samples produced by the generator network of the GAN, and the critic simply provides a rating on how real it considers them to be. The performance of the critic when differentiating real samples from synthetically generated ones is measured, and this information is then used to update the defining parameters of both the critic and the generator networks of the GAN. This process is repeated until an acceptable level of performance is reached (when the synthetic samples are indistinguishable from the real ones to the critic). Lastly, the synthetic samples produced by the generator network are transformed back to the space of the original data by the decoder network of the AE.

The reasons for using the hybrid AE+GAN architecture for the DAGGER framework are twofold. First, current efforts for synthetic data generation typically involve either GANs or Variational AEs (VAEs). However, as mentioned, such standalone efforts result in serious practical difficulties. Our hybrid architecture can exploit the strengths of both AEs and GANs, while avoiding their weaknesses: the AE part significantly reduces the dimensionality of the real data, and the GAN produces quality synthetic samples when operating in the low dimensional data space, which helps with its stability issues in the training process. Second, our hybrid architecture directly addresses the main constraints imposed by the application scenario, as the AE part takes care of the high dimensionality of the data, and the GAN part can then operate with a number of defining parameters comparable to the number of signals for training (i.e., a few hundred).

We provide details about the AE and the GAN in the DAGGER framework next.

3.1. Autoencoder (AE)

An AE is an unsupervised learning scheme, at present typically implemented via neural networks, that is trained to reconstruct its inputs (Bank, Koenigstein, & Giryes, 2020). A standard AE architecture consists of two neural networks, an encoder network and a decoder network. The encoder transforms the input into a lower dimensional signal, “compressing” its information content to a reduced but meaningful set of features, which define what is commonly known as the latent space of the AE. The decoder performs the opposite transformation, taking a signal in the latent space and transforming it back out into its original space representation.

For the DAGGER framework the AE network follows a standard architecture, with the encoder formed by an input layer matching the length of the real time series signals (i.e., a few thousand), and a number of hidden linear layers, with rectified linear units (ReLU) as activation functions, that progressively reduce the size of the signal down to a couple of hundred. The decoder follows exactly the same architecture but in the reverse order.

3.2. Wasserstein Generative Adversarial Network (WGAN)

GANs are semi-supervised and unsupervised learning techniques that attempt to capture the distribution of a set of true examples and generate new (unseen) samples out of it (Creswell et al., 2018; Gui, Sun, Wen, Tao, & Ye, 2021). The most common incarnation of a GAN at present involves two neural networks arranged adversarially: a generator network that transforms a noise signal into one resembling the true data, and a discriminator network that tries to discriminate between real and synthetic data samples as accurately as possible (Gui et al., 2021). The two networks are trained at the same time and in competition with each other, as the generator tries its best to dupe the discriminator, and the discriminator tries its best not to let that happen (Creswell et al., 2018).

In the DAGGER framework, the GAN part doesn’t strictly follow the standard architecture (as first conceived by Goodfellow et al. (2014)), and instead takes the form of a Wasserstein GAN (WGAN) (Arjovsky, Chintala, & Bottou, 2017), which corresponds to a few minor practical modifications (although with deep theoretical connotations) to the traditional GAN operation to alleviate some of its weaknesses (Shmelkov, Schmid, & Alahari, 2018):

• In the traditional GAN the generator tries to minimize $\mathbb{E}_x[\log(D(x))] + \mathbb{E}_z[\log(1 - D(G(z)))]$ (with $D(\cdot)$ representing the discriminator and $G(\cdot)$ the generator, and $x$ and $z$ representing respectively a full data sample and a noise signal), and the discriminator tries to maximize it; in the WGAN, the generator tries to minimize $D(G(z))$, and the discriminator, now called critic, tries to minimize $D(x) - D(G(z))$. The reason that the discriminator changes its name in the WGAN is the fact that its output is no longer a number in $[0, 1]$ (with 0.5 as the decision boundary between real and synthetic samples), and instead is any real number, which can be interpreted as a mechanism that rates the “realness” of the samples, instead of simply classifying them as real or synthetic.

• After every gradient update on the critic function (i.e., $D(\cdot)$), the defining weights of the critic are kept bounded
inside a small range of values, as opposed to allowing them to take any value.

• The optimizer of the critic switches from the traditional momentum based methods, like Adam, to RMSProp, which tries to resolve the problem that gradients may vary widely in magnitude for the different weights of the critic. It does so by adapting the step size individually for each weight.

• The generator is trained less frequently than the critic is (e.g., 1 training cycle for the generator every 5 training cycles for the critic).

Figure 1. Schematic of DAGGER framework for synthetic time series generation in manufacturing application scenarios.

3.3. Training and Querying the DAGGER

The schematic shown in Figure 1 suggests that the two main components of the DAGGER (the AE and the WGAN) operate simultaneously, without much distinction between the specifics of the training process and those of the generation of synthetic samples. Strictly speaking, however, the different parts of the two main networks are employed at different times, depending on whether training or synthetic data generation is taking place, in an asynchronous fashion.

3.3.1. Training

The training of the DAGGER, in the traditional sense of the term (i.e., a process taking place a priori, with the sole purpose of finding the “optimal” values for the defining hyperparameters of the model), mainly involves the AE network. In other words, the real time series data is used to train the AE network, involving both the encoder and the decoder, without involving the WGAN. Then, once convergence has occurred (the loss is below an acceptable threshold), the original data is passed through the encoder network of the AE, and its output is then used to “train” the WGAN.

The training of the WGAN, like with most traditional GANs, occurs while synthetic data is being generated, adjusting its hyperparameters at every “training” step. This takes place until an acceptable level of performance is reached. Once that point is reached, the DAGGER can be queried for new synthetic time series samples.

3.3.2. Querying

Querying the DAGGER for synthetic data samples corresponds to simply having the generator of the WGAN produce synthetic latent space samples, and subsequently having the decoder network of the AE transform them from the latent space to the space of the time series data.

4. PERFORMANCE EVALUATION

In this section we present the performance evaluation results for the DAGGER framework. For this we consider two different assessment mechanisms, one directly analyzing the “closeness” of the synthetic data to the real data via numerous evaluation tools (like the means per time sample, the variances per time sample, DFT, KL distance, etc.), and the other one indirectly assessing the quality of the synthetic data through its ability to produce approximate system matrices when performing system identification for input-output signal pairs of a dynamical system.

4.1. Comparison with Riemannian Hamiltonian Variational AE (RHVAE)

Variational autoencoders (VAEs), first introduced by Kingma and Welling (2013) and Rezende et al. (2014), are AE systems that can generate synthetic data by perturbing the original data in the latent space through, for example, random noise addition, and then mapping the resulting data back
to the original space via the decoder network. In the VAE research world, a few works have emerged directly tackling the data augmentation problem under the constraints of a small training data set and very high dimensional data (Chadebec, Mantoux, & Allassonnière, 2020; Chadebec & Allassonnière, 2021; Chadebec, Thibeau-Sutre, Burgos, & Allassonnière, 2022). Their general approach is based on improvements to two of the key procedures in the VAE operation, one being the structure of the latent space, and the other one being the sampling procedure carried out in the latent space to generate the synthetic data.

Regarding the geometry of the latent space, the latest of the works listed above (Chadebec et al., 2022) proposes what the authors refer to as the Riemannian Hamiltonian Variational AE (RHVAE), which configures the latent space as a Riemannian manifold with a specific metric $G$. Noting that defining the metric using standard avenues (i.e., involving the Jacobian of the generator function) results in considerable computational difficulties in practice, Chadebec et al. (2022) define a scheme to learn the metric $G$ as a function of an artificial variable $z$, from the data and the structure of the neural network. To then navigate and sample the resulting Riemannian latent space, Chadebec et al. (2022) define a multivariate zero-mean Gaussian random vector $v$, with covariance matrix given by $G(z)$. Then, the “exploration” of the Riemannian latent space is based on Hamiltonian dynamics, with $z$ being the position and $v$ the velocity: the Hamiltonian is found by adding the resulting potential energy and kinetic energy, which are functions of $z$ and $v$. With this, a leapfrog integrator scheme is defined to iteratively compute the evolution in time of $z$ and $v$ from the Hamiltonian expression, which ensures that the target distribution is preserved, and that the update procedure is volume preserving and time reversible. This iterative process then configures a Markov Chain (MC) on $z$, with transition probabilities given by particular functions of the metric $G$ (which is in turn a function of the current values of $z$ and $v$). The defined MC is said to converge to a distribution $p_{\text{target}}$, which is used to efficiently estimate the relevant distributions from which the synthetic data samples are ultimately drawn. This approach is then shown in (Chadebec et al., 2022), via extensive numerical analyses, to be able to handle very high dimensional data (since the operation essentially occurs in the latent space, which is considerably lower dimensional), and to effectively operate with very few training samples (as the sampling scheme is defined over more meaningfully constructed distributions).

Considering how well the RHVAE framework fits the problem outlined in Section 2, we utilize it as the main contrast medium for the DAGGER in the performance evaluation subsections below.

4.2. Direct Performance Assessment

To validate the performance of the DAGGER against the RHVAE, six time-series sensor signals from a milling environment were compiled from (Teubert, 2022) (referenced as mill data set). Each sensor had a training set composed of 146 runs of 9000 time steps each. The majority of the discussion focuses on three signals: AC motor current, spindle vibration, and table vibration. The AC motor current was chosen because it is the only signal with a periodic nature, while the spindle vibration was chosen due to its more random transient properties. The table vibration had similar properties to the remaining three signals, with well defined transients and moderate noise being present. A toy data set composed of a sinusoid containing a random phase shift and amplitude enveloping per run (referenced as modulated sinusoid) was also used for model training.

The metrics for performance evaluation of the generative models included: mean and variance comparisons, KL divergence, auto-correlation of the mean and variance signals, and discrete Fourier transforms (DFT). These metrics were chosen to better quantify the DAGGER’s ability to capture important time series characteristics such as: multiple transient properties, probability density evolution over time, frequency information, and statistical properties. Error between the synthetic and real data was calculated using mean absolute percentage error (MAPE) and is provided in the figures for the auto-correlation section and expected value/variance section of the results.

4.2.1. Statistical Moments

The first statistical analysis technique determines the mean and variance of each signal. This metric allows for quantification of the DAGGER’s ability to capture basic statistical properties of the real data sets. Both the variance and expected value were calculated for each time step of the entire data set. This methodology analyzes how well DAGGER captures both transient properties and the evolution of statistical properties as the run progresses.

DAGGER was robust in accurately producing data that captured the spread of possible values for the modulated sinusoids, shown in Figure 2. This is evident from the variance signal generated by DAGGER having 19.8% less MAPE error than the RHVAE. Both models were comparable in capturing the expected value, with DAGGER having 3.4% error (0.2% less than RHVAE). Despite the almost identical performance, DAGGER produced smoother signals and better captured significant trends present in the real data set’s expected value signal.

DAGGER had similar performance to the RHVAE for generating synthetic data when training on the mill data set. Figure 3 displays this with only slight error differences being
present for most of the signals. The main exception is the spindle vibration signal, for which the RHVAE captured the expected value signal poorly, with 27.1% higher error than DAGGER. Excluding that signal, both models performed equally well in capturing the major trends present in the remaining signals in the mill data set.

Figure 2. The DAGGER (left column) and RHVAE (right column) generated/real expected value and variance signal for the modulated sinusoid. Note: The graph for the real data is sometimes presented in gray due to the overlap from the graph of the synthetic data.

Figure 4 gives the variance signals of the same three signals shown in Figure 3. DAGGER performed well in capturing trends and variance values for the majority of the signals. DAGGER did have some issues when attempting to capture the variance of the AC motor current as compared to the RHVAE. However, the RHVAE saw issues when modeling the properties of the table vibration and was outperformed by DAGGER. The difference in the nature of these two signals is likely the cause for the noted discrepancies, as the AC motor current is a periodic signal while the spindle vibration signal is notably more random and noisy in nature.

4.2.2. KL Divergence

The second metric employed was the Kullback-Leibler (KL) divergence. This metric measures the quality of the synthetic data by comparing how similar the probability densities are for every time step within the training data. This metric allows for the direct comparison of probability distributions over the time evolution of the signal. A KL divergence of zero indicates that the probabilities are identical, whereas higher values indicate less similarity. The bin size for the histograms remained constant for each case and all data sets were normalized to [0, 1] prior to histogram generation.

DAGGER performance was comparable to that of the RHVAE for the modulated sinusoid case, with similar trends and values for the KL divergence (Figure 5). Figure 5 depicts similar trends, with the main differences being noticed at $t_0 = 0$ ms and $t_f = 9000$ ms. However, DAGGER’s performance is slightly improved, with the majority of KL divergence values at or below 0.2 while the RHVAE was close to 0.25 for nearly 1000 ms.

For four of the six signals, similar performance between the RHVAE and DAGGER was observed. The largest variations occurred with the AC motor current and table vibration, as shown by Figure 6. The KL divergence for the AC small motor current was at or below 0.2 for the majority of the synthetic DAGGER data set, while the RHVAE kept the KL divergence under 0.05 apart from a few time steps. Referencing
Figure 4, DAGGER saw issues with capturing the variance of this signal, thereby leading to this discrepancy. DAGGER outperformed the RHVAE for the table vibration generations, with the KL divergence of DAGGER hovering near 0.1 while that of the RHVAE was near 0.2.

Figure 3. The DAGGER (left column) and RHVAE (right column) generated/real expected value signal for the AC current signal (top row), spindle vibration (middle row), and table vibration (bottom row), present in the mill data set. Note: The graph for the real data is sometimes presented in gray due to the overlap from the graph of the synthetic data.

4.2.3. Auto-correlation

The next metric used for performance measurement was the auto-correlation of the variance and expected value signals. For time series signals, auto-correlation shows trends present in the signal as the lag increases. Comparing the auto-correlation of the generated data’s expected value and variance signals with that of the real data reveals how well the generative models are able to capture the change of dynamics over time. The mean absolute percentage error (MAPE) can then be calculated for these signals as a metric for comparing the performance of DAGGER and RHVAE via:
$$M = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \times 100 \qquad (2)$$

where $n$ is the number of simulated time steps, $A_t$ the real signal, and $F_t$ the synthetic data.

Figure 4. DAGGER (left column) and RHVAE (right column) generated/real variance signal for: AC motor current, spindle vibration, and table vibration, present in the mill data set. Note: The graph for the real data is sometimes presented in gray due to the overlap from the graph of the synthetic data.

Both DAGGER and RHVAE showed issues with capturing the time trend of the modulated sinusoid, with MAPE errors above 1000% for the expected value signal (Figure 7). The large MAPE error is caused by the real data having near 0 values for a considerable number of time samples, which translates into near 0 denominator values in the percentage calculations. Both DAGGER and RHVAE show improved performance when capturing trends in the variance, with DAGGER outperforming the RHVAE by having 9.1% less MAPE error. This improvement can be explained through DAGGER having less variance present throughout the auto-correlation signal.
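The sensitivity of the MAPE in (2) to near 0 values in the real signal can be illustrated with a small numerical sketch. The function name `mape` and the two toy signals below are hypothetical and only serve as illustration; the absolute value inside the mean follows the usual definition of MAPE and is assumed here.

```python
import numpy as np

def mape(real, synthetic):
    """Mean absolute percentage error: mean(|(A_t - F_t) / A_t|) * 100."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    return float(np.mean(np.abs((real - synthetic) / real)) * 100.0)

# Hypothetical "real" signals; the synthetic copy is off by 0.01 everywhere.
well_scaled = np.array([1.0, 2.0, 4.0, 5.0])
near_zero = np.array([1.0, 2.0, 0.001, 5.0])   # one sample close to zero

mape_well_scaled = mape(well_scaled, well_scaled + 0.01)
mape_near_zero = mape(near_zero, near_zero + 0.01)

print(f"well scaled: {mape_well_scaled:.3f}%  near zero: {mape_near_zero:.3f}%")
```

The absolute error is identical in both cases, yet the single near 0 sample dominates the second result, which is the effect described for the expected value signal of the modulated sinusoid.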


Figure 5. DAGGER (left column) and RHVAE (right column) KL divergence for the modulated sinusoid
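The per-time-step KL divergence plotted in Figure 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `kl_per_time_step`, the bin count, the `eps` smoothing of empty bins, and the beta-distributed toy data are all hypothetical choices. The data is assumed to be already scaled to [0, 1], and the same bin edges are used at every time step.

```python
import numpy as np

def kl_per_time_step(real, synth, bins=20, eps=1e-10):
    """KL(real || synth) at each time step, from histograms taken across runs.

    real, synth: arrays of shape (runs, time_steps), already scaled to [0, 1].
    """
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    edges = np.linspace(0.0, 1.0, bins + 1)   # constant bin size for all steps
    kl = np.empty(real.shape[1])
    for t in range(real.shape[1]):
        p, _ = np.histogram(real[:, t], bins=edges)
        q, _ = np.histogram(synth[:, t], bins=edges)
        p = (p + eps) / np.sum(p + eps)       # smooth empty bins, then normalize
        q = (q + eps) / np.sum(q + eps)
        kl[t] = np.sum(p * np.log(p / q))
    return kl

# Toy data: 146 runs of 50 time steps, values in (0, 1).
rng = np.random.default_rng(0)
real = rng.beta(2.0, 5.0, size=(146, 50))
shifted = rng.beta(5.0, 2.0, size=(146, 50))  # clearly different distribution

kl_same = kl_per_time_step(real, real.copy())
kl_shifted = kl_per_time_step(real, shifted)
```

Identical data gives a divergence of zero at every time step, while the mismatched toy data gives strictly positive values; the absolute magnitudes depend on the assumed bin count and smoothing constant.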

Figure 6. DAGGER (left column) and RHVAE (right column) KL divergence for the AC small motor current (top row) and
table vibration (bottom row)


Figure 7. DAGGER (left column) and RHVAE (right column) auto-correlation for the expected value (top row) and variance
(bottom row) signals. Note: The graph for the real data is sometimes presented in gray due to the overlap from the graph of the
synthetic data.
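The curves compared in Figure 7 are auto-correlations of the per-time-step expected value and variance signals. A minimal sketch of that computation is given below; the helper names `moment_signals` and `autocorrelation`, the normalization by the zero-lag value, and the toy modulated sinusoid parameters are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

def moment_signals(runs):
    """Per-time-step expected value and variance across runs (runs x time)."""
    runs = np.asarray(runs, dtype=float)
    return runs.mean(axis=0), runs.var(axis=0)

def autocorrelation(signal):
    """Auto-correlation of a 1-D signal for lags 0..N-1, scaled so lag 0 is 1."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")   # lags -(N-1) .. (N-1)
    acf = full[x.size - 1:]                  # keep the non-negative lags
    return acf / acf[0]

# Toy modulated sinusoid: random phase shift and amplitude per run.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
phases = rng.uniform(0.0, 0.5, size=(100, 1))
amps = rng.uniform(0.8, 1.2, size=(100, 1))
runs = amps * np.sin(2.0 * np.pi * 5.0 * t + phases)

mean_sig, var_sig = moment_signals(runs)
acf_mean = autocorrelation(mean_sig)
acf_var = autocorrelation(var_sig)
```

Applying the same two functions to real and synthetic runs yields the pairs of curves whose mismatch can then be summarized with the MAPE in (2).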

DAGGER and RHVAE show similar performance in capturing the trends of the expected value for the mill sensors (Figure 8). When capturing trends in the AC motor current, performance degrades, with MAPE errors in excess of 600%. As mentioned in the variance signal analysis, this large error is caused by near-zero values in the real data, which appear in the denominator of the error calculation. For the remaining signals, both models demonstrate nearly identical performance in capturing the lagged trends of the expected value signal. DAGGER's and RHVAE's auto-correlated expected value signals reflect noisy trends when compared with the training data set for the spindle vibration. This was most likely caused by the high noise content present throughout the spindle vibration signal. The remaining signals had auto-correlated expected value signals similar to those of the table vibration. In general, DAGGER and RHVAE performed similarly on these signals, both in capturing signal trends and in MAPE.

As shown in Figure 9, DAGGER maintained similar performance to the RHVAE in capturing the lagged trends in the variance for both the AC motor current and table vibration. DAGGER captured the lagged trends of the variance for the noisy spindle vibration signal with 68.7% lower MAPE than the RHVAE.

4.2.4. DFT

The final metric used was the discrete Fourier transform (DFT). It was chosen because the DFT transforms a time series signal into a frequency domain representation that captures both frequency and amplitude information. Applying this metric to the synthetic and real data allows for a visual comparison of the performance of DAGGER in capturing frequency information.

DAGGER captured the main frequency and amplitude information present throughout the training set, but incorrectly excluded the high frequency behavior (Figure 10). This performance was comparable to that of the RHVAE, with both capturing the low frequency behavior. Both DAGGER and RHVAE also contain a low amplitude noise floor that is not present in the modulated sinusoid.

One should note DAGGER's ability to capture the major frequency and amplitude components present in the mill signal, as shown in Figure 11. DAGGER and RHVAE performed


Figure 8. DAGGER (left column) and RHVAE (right column) generated/real auto-correlation signal for the expected value
signal of the AC motor current (top row), spindle vibration (middle row), and table vibration (bottom row). Note: The graph
for the real data is sometimes presented in gray due to the overlap from the graph of the synthetic data.

similarly for the AC motor current, with a DC component (caused by data normalization) and the main AC current frequency being depicted. Both models also have shown similar performance in capturing the high and low frequency components present in the table vibration signal.

4.3. Indirect Performance Assessment

Here we consider the quality of the synthetically generated data by measuring its effectiveness when building black box models out of it. For this we use the model of a simple dynamic system typically studied in the context of control theory, and then use synthetic input-output signal pairs generated by DAGGER and RHVAE to perform system identification. The quality of the synthetic data is established by comparing the resulting model matrices with those built with the original data.

The direction to take two pills every 8 hours can be considered as a control problem and is modeled via a system


Figure 9. DAGGER (left column) and RHVAE (right column) generated/real auto-correlation signal for the variance of the AC
motor current (top row), spindle vibration (middle row), and table vibration (bottom row). Note: The graph for the real data is
sometimes presented in gray due to the overlap from the graph of the synthetic data.

known as compartment models (Åström & Murray, 2010). A schematic to illustrate the idea of the model is shown in Figure 12 (Åström & Murray, 2010). Major components of the human body such as the blood, tissues, and lungs are viewed as compartments separated by membranes. The flow rates between compartments are proportional to the concentration differences in each compartment.

Simplifying the left half of Figure 12 into two compartments as well as assuming there is perfect mixing between compartments and that transport is driven by concentration differences yields (Åström & Murray, 2010):

$$V_1 \frac{dc_1}{dt} = q(c_2 - c_1) - q_0 c_1 + c_0 u, \quad c_1 \ge 0, \tag{3a}$$

$$V_2 \frac{dc_2}{dt} = q(c_1 - c_2), \quad c_2 \ge 0, \tag{3b}$$

$$y = c_2 \tag{3c}$$


Figure 10. DAGGER (left column) and RHVAE (right column) discrete Fourier transform for the synthetic data of the mod-
ulated sinusoid. Note: The graph for the real data is sometimes presented in gray due to the overlap from the graph of the
synthetic data.

Figure 11. DAGGER (left column) and RHVAE (right column) discrete Fourier transform for the synthetic data for the AC
motor current (top row) and table vibration (bottom row). Note: The graph for the real data is sometimes presented in gray due
to the overlap from the graph of the synthetic data.
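The frequency-domain comparison shown in Figures 10 and 11 relies on a one-sided amplitude spectrum. The sketch below shows one way to compute it with NumPy's real FFT; the function name `amplitude_spectrum` and the sampling rate `fs` are assumptions for this example, not details reported for the experiments.

```python
import numpy as np


def amplitude_spectrum(x, fs):
    """One-sided amplitude spectrum of a real signal sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    n = x.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    amp = np.abs(np.fft.rfft(x)) / n
    amp[1:] *= 2.0               # fold negative frequencies into positive ones
    if n % 2 == 0:
        amp[-1] *= 0.5           # the Nyquist bin has no negative twin
    return freqs, amp


# Example: a 5 Hz sinusoid of amplitude 2 riding on a constant offset of 0.5.
fs = 100.0
t = np.arange(200) / fs
freqs, amp = amplitude_spectrum(0.5 + 2.0 * np.sin(2.0 * np.pi * 5.0 * t), fs)
peak_hz = freqs[1 + np.argmax(amp[1:])]   # strongest non-DC component
```

In this example the offset appears as a component at 0 Hz, analogous to the DC component that data normalization introduces in the AC motor current spectrum, while the sinusoid appears as a peak at its own frequency. Overlaying the spectra of real and synthetic samples then gives the visual comparison described in Section 4.2.4.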

In Eq. (3), $c_1$ and $c_2$ are the drug concentrations in each compartment and $V_1$ and $V_2$ are the compartment volumes, as shown on the right half of Figure 12. Equation (3) can be written in state-space form through the introduction of $k_0 = q_0/V_1$, $k_1 = q/V_1$, $k_2 = q/V_2$, and $b_0 = c_0/V_1$, giving (Åström & Murray, 2010):

$$\frac{dc}{dt} = \begin{bmatrix} -k_0 - k_1 & k_1 \\ k_2 & -k_2 \end{bmatrix} c + \begin{bmatrix} b_0 \\ 0 \end{bmatrix} u, \qquad y = \begin{bmatrix} 0 & 1 \end{bmatrix} c \tag{4}$$


Figure 12. Compartment Modeling Schematics

For the test case here, $k_0 = 0.1$, $k_1 = 0.1$, $k_2 = 0.5$, $b_0 = 1.5$, $c_1 = 0$, and $c_2 = 1$. Four random instances (experiments) of the real input signal $u$, normalized to [0, 1], are depicted in the top left of Figure 13. Each is a sinusoid of random amplitude $a \in [1, 5]$, random phase $\phi \in [0, \tfrac{1}{3}\pi]$, and increasing frequency $\omega \in [1, 2\pi \pm \tfrac{1}{4}\pi]$, defined over a simulated mixing duration of 50 min. The corresponding output signal is shown on the upper right. Similarly, four random instances of the synthetic normalized input-output signals for the DAGGER and RHVAE platforms are shown in the middle and bottom rows of Figure 13. As both the real and synthetic input signals are random, an indirect quantitative comparison will be performed here. However, looking closely at the DAGGER and RHVAE signals, note the noisy behavior of both the inputs and outputs as compared to the real data set. To perform an indirect comparison, one can turn to system identification to generate corresponding matrix equations of similar form to Eq. (4).

Figure 13. Compartment Model (Eq. (3)) Input-Output Visualization

For the system identification problem, 150 experimental input signals were generated using the sinusoidal inputs as defined above and passed through Eq. (4). It should be noted that in order to generate both the input signal and output data simultaneously through DAGGER and RHVAE for system identification, the signals were concatenated. This approach was chosen to maintain some connectivity in the system defined by Eq. (4). This data set was then treated as the seeds for training DAGGER and RHVAE. Subsequently, the platforms generated 150 augmented experimental signals, a few of which are shown in Figure 13. System identification attempts to identify the $A$, $B$, $C$, and $D$ matrices shown by the generic state-space form given below:

$$\dot{x} = Ax + Bu \tag{5a}$$

$$y = Cx + Du \tag{5b}$$

with $A = \begin{bmatrix} -k_0 - k_1 & k_1 \\ k_2 & -k_2 \end{bmatrix}$, $B = \begin{bmatrix} b_0 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} 0 & 1 \end{bmatrix}$, and $D = 0$ for the real data generated via Eq. (4). Leveraging the DAGGER and RHVAE experimental data and a generic subspace method within the time domain for system identification, the platforms' $A$, $B$, $C$, and $D$ matrices can be compared to the real matrices, which are also passed through an identical system identification algorithm. It should be noted that both the real and synthetic data sets were normalized to [0, 1], which explains the mismatch between the matrices given by Eq. (4) and those shown here in Table 1.

By inspection, the $A$ and $B$ matrices appear relatively close to one another. In fact, the distances between the real and DAGGER matrices as measured by the Frobenius norm are $\|A_{Real} - A_{DAG}\|_F = 0.0719$ and $\|B_{Real} - B_{DAG}\|_F = 1.8311 \times 10^{-4}$. This corresponds to DAGGER adequately capturing the sinusoidal input signal. Similarly, for the RHVAE, $\|A_{Real} - A_{RH}\|_F = 0.0397$ and $\|B_{Real} - B_{RH}\|_F = 8.3009 \times 10^{-5}$, from which one could also conclude that the input signal has been adequately captured. In contrast, $\|C_{Real} - C_{DAG}\|_F = 420.0466$ and $\|C_{Real} - C_{RH}\|_F = 422.1737$, meaning the output signal is partially degraded when compared to the result from the input. This is likely due to the required concatenation of the input-output signal prior to training DAGGER and RHVAE, which causes an abrupt transition in signal behavior at the junction between the left and right hand sides of Figure 13. While the networks can be trained independently for the input and output signals, one would lose the corresponding connectivity required for system identification. That is, independent networks would match random inputs with random outputs, thereby violating the dependence needed for system identification.
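The generation of one real input-output training sample from Eq. (4) can be sketched as below. This is a minimal example under stated assumptions: the values $c_1 = 0$ and $c_2 = 1$ are interpreted as initial concentrations, the phase bound is read as $\pi/3$, the frequency is assumed to rise linearly from 1 to $2\pi \pm \pi/4$, each signal is normalized on its own, the constraint $c \ge 0$ is not enforced, and a fixed-step RK4 integrator stands in for whatever solver was actually used. All function names and the sample count `n` are hypothetical.

```python
import numpy as np

# Rate constants quoted for the test case of Eq. (4).
K0, K1, K2, B0 = 0.1, 0.1, 0.5, 1.5
A = np.array([[-K0 - K1, K1], [K2, -K2]])
B = np.array([B0, 0.0])
C = np.array([0.0, 1.0])
C_INIT = np.array([0.0, 1.0])  # assumed initial state: c1(0) = 0, c2(0) = 1


def random_input(rng, t):
    """Sinusoid with random amplitude and phase and a rising frequency."""
    amp = rng.uniform(1.0, 5.0)
    phase = rng.uniform(0.0, np.pi / 3.0)           # assumes phi in [0, pi/3]
    w_end = 2.0 * np.pi + rng.uniform(-np.pi / 4.0, np.pi / 4.0)
    w = np.linspace(1.0, w_end, t.size)             # assumed linear sweep
    dt = t[1] - t[0]
    return amp * np.sin(np.cumsum(w) * dt + phase)  # integrate w to get phase


def simulate(u, dt):
    """Integrate dc/dt = A c + B u with RK4, holding u constant over each step."""
    c = C_INIT.copy()
    y = np.empty(u.size)
    for k, uk in enumerate(u):
        y[k] = C @ c                                # output y = c2
        s1 = A @ c + B * uk
        s2 = A @ (c + 0.5 * dt * s1) + B * uk
        s3 = A @ (c + 0.5 * dt * s2) + B * uk
        s4 = A @ (c + dt * s3) + B * uk
        c = c + dt / 6.0 * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
    return y


def normalize(x):
    """Rescale a signal to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())


def make_sample(rng, duration=50.0, n=1000):
    """One training sample: normalized input followed by normalized output."""
    t = np.linspace(0.0, duration, n)
    u = random_input(rng, t)
    y = simulate(u, t[1] - t[0])
    return np.concatenate([normalize(u), normalize(y)])
```

Calling `make_sample(np.random.default_rng(0))` 150 times would give a seed set of the size described in the text. The concatenation in `make_sample` keeps each input paired with its own output, and it also creates the junction between the two halves of the sample that the text identifies as a likely source of degraded output quality.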


Table 1. State-Space Comparison: Real vs. DAGGER vs. RHVAE (matrix rows are separated by semicolons)

     Real                                   DAGGER                                 RHVAE
A    [0.9956, 0.004; −0.011, 1.0014]        [0.9996, 0.0113; −0.0044, 0.9303]      [0.9996, −0.0044; 0.0119, 0.9703]
B    [−5.4097 × 10^−5; −1.2996 × 10^−4]     [−2.4676 × 10^−6; 4.5728 × 10^−5]      [−5.3345 × 10^−7; −6.6542 × 10^−5]
C    [−112.1032, −45.5782]                  [305.0968, −3.2411]                    [307.7099, 0.9949]
D    [0]                                    [0]                                    [0]
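The Frobenius-norm distances discussed in Section 4.3 can be recomputed directly from the entries of Table 1, as in the sketch below. The variable and function names are chosen for this example. Because the table entries are rounded, the recomputed values may differ slightly from those quoted in the text, most visibly for the $C$ matrices.

```python
import numpy as np

# Identified matrices as printed in Table 1 (rounded values).
A_REAL = np.array([[0.9956, 0.004], [-0.011, 1.0014]])
A_DAG = np.array([[0.9996, 0.0113], [-0.0044, 0.9303]])
A_RH = np.array([[0.9996, -0.0044], [0.0119, 0.9703]])
B_REAL = np.array([[-5.4097e-5], [-1.2996e-4]])
B_DAG = np.array([[-2.4676e-6], [4.5728e-5]])
B_RH = np.array([[-5.3345e-7], [-6.6542e-5]])
C_REAL = np.array([[-112.1032, -45.5782]])
C_DAG = np.array([[305.0968, -3.2411]])
C_RH = np.array([[307.7099, 0.9949]])


def frobenius_distance(x, y):
    """Frobenius norm of the difference between two equally shaped matrices."""
    return float(np.linalg.norm(x - y, ord="fro"))


for name, real, dag, rh in [("A", A_REAL, A_DAG, A_RH),
                            ("B", B_REAL, B_DAG, B_RH),
                            ("C", C_REAL, C_DAG, C_RH)]:
    print(name, frobenius_distance(real, dag), frobenius_distance(real, rh))
```

With the tabulated entries, the $A$ and $B$ distances closely match the values quoted in the text, while the $C$ distances are larger by several orders of magnitude, which is the contrast the discussion draws between input and output signal quality.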

5. CONCLUSION

In this paper, we presented the DAGGER framework, which corresponds to a hybrid AE+GAN deep neural network architecture specifically conceived to produce synthetic high dimensional time series data with very small training data sets, which are characteristic of traditional manufacturing applications.

We evaluated the effectiveness of the DAGGER framework by analyzing the quality of the synthetically generated data through both direct and indirect performance assessment methods. In the former, for toy artificial and real sensor data (a publicly available benchmarking data set containing sensor readings from a milling machine), we employed performance metrics based on the similarity between original data and synthetic data in terms of mean, variance, KL divergence, behavior in Fourier domain, and auto-correlation. Regarding the indirect performance assessment, we evaluated the adequacy of the synthetically generated time series samples to perform system identification of a black box dynamic system. In all the cases, we compared the results from DAGGER with those from an RHVAE, which is a recently published data augmentation framework specifically designed to handle very small, high dimensional data sets.

Direct analysis methods revealed similar performance between DAGGER and the RHVAE regarding the ability to capture frequency, statistical, and time trend information. Both models captured the transient properties present in the signal with low MAPE but had difficulty capturing trends in the expected value and variance for noisy signals. Overall, DAGGER had no significant performance differences in comparison to the RHVAE.

Regarding the indirect performance assessment procedures, DAGGER and RHVAE demonstrated similar results when identifying a compartment model system. Both platforms adequately captured input signal behavior at the expense of output signal performance. Aside from further hyperparameter tuning within both DAGGER and RHVAE, an alternative to the abrupt transition between the concatenated input-output signals required for system identification will also be addressed to improve synthetic data generation. The similarity between the DAGGER and RHVAE results is supported throughout the direct performance assessment as well, whereby both platforms exhibit comparable characteristics during the examination of different synthetically generated signals. That is, the results showed that DAGGER's performance is in general satisfactory, and comparable with that of the RHVAE in all the considered evaluation scenarios. From our tests, the area in which DAGGER performed better was the duration of the training process, with DAGGER empirically achieving convergence in at least an order of magnitude less time than the RHVAE. This requires a deep and careful exploration, however, and it is thus part of our future explorations.

Looking ahead, optimization of the DAGGER architecture is planned by implementing autoencoder networks with lower reconstruction error and tuning the parameters of the GAN. Furthermore, software tools will be completed to improve analysis techniques and increase understanding of latent space modeling and exploration for the GAN. Different generative models besides the RHVAE will be explored as benchmarks against the improved DAGGER architecture.

ACKNOWLEDGMENT

This work is supported by the US Air Force AFWERX SBIR Program under Grant No. FA8649-22-P-0701.

REFERENCES

Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In International conference on machine learning (pp. 214–223).
Åström, K., & Murray, R. (2010). Feedback systems: An introduction for scientists and engineers. Princeton University Press.
Bank, D., Koenigstein, N., & Giryes, R. (2020). Autoencoders. arXiv preprint arXiv:2003.05991.


Chadebec, C., & Allassonnière, S. (2021). Data augmentation with variational autoencoders and manifold sampling. In Deep generative models, and data augmentation, labelling, and imperfections (pp. 184–192). Springer.
Chadebec, C., Mantoux, C., & Allassonnière, S. (2020). Geometry-aware Hamiltonian variational auto-encoder. arXiv preprint arXiv:2010.11518, 0-44.
Chadebec, C., Thibeau-Sutre, E., Burgos, N., & Allassonnière, S. (2022). Data augmentation in high dimensional low sample size setting using a geometry-based variational autoencoder. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1), 53–65.
Demir, S., Mincev, K., Kok, K., & Paterakis, N. G. (2021). Data augmentation for time series regression: Applying transformations, autoencoders and adversarial networks to electricity price forecasting. Applied Energy, 304, 117695.
Diez-Olivan, A., Del Ser, J., Galar, D., & Sierra, B. (2019). Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0. Information Fusion, 50, 92–111.
Doersch, C. (2021). Tutorial on variational autoencoders.
Figueira, A., & Vaz, B. (2022). Survey on synthetic data generation, evaluation methods and GANs. Mathematics, 10(15). doi: 10.3390/math10152733
Goodfellow, I. (2016). NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., . . . Bengio, Y. (2014). Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Weinberger (Eds.), Advances in neural information processing systems (Vol. 27). Curran Associates, Inc.
Gui, J., Sun, Z., Wen, Y., Tao, D., & Ye, J. (2021). A review on generative adversarial networks: Algorithms, theory, and applications. IEEE Transactions on Knowledge and Data Engineering.
Hutter, F., Lücke, J., & Schmidt-Thieme, L. (2015). Beyond manual tuning of hyperparameters. KI-Künstliche Intelligenz, 29(4), 329–337.
Iglesias, G., Talavera, E., González-Prieto, Á., Mozo, A., & Gómez-Canaval, S. (2023). Data augmentation techniques in time series domain: a survey and taxonomy. Neural Computing and Applications, 35(14), 10123–10145.
Khanuja, H. K., & Agarkar, A. A. (2023). Towards GAN challenges and its optimal solutions. Generative Adversarial Networks and Deep Learning: Theory and Applications.
Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv.
Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. International conference on machine learning, 1278–1286.
Shao, H., Yao, S., Sun, D., Zhang, A., Liu, S., Liu, D., . . . Abdelzaher, T. (2020). ControlVAE: Controllable variational autoencoder. In International conference on machine learning (pp. 8655–8664).
Shmelkov, K., Schmid, C., & Alahari, K. (2018). How good is my GAN? In Proceedings of the European conference on computer vision (ECCV) (pp. 213–229).
Smith, K. E., & Smith, A. O. (2020). Conditional GAN for timeseries generation.
Teubert, C. (2022). Milling wear data set. Retrieved from https://data.nasa.gov/Raw-Data/Milling-Wear/vjv9-9f3x (Dataset)
Wright, L., & Davidson, S. (2020). How to tell the difference between a model and a digital twin. Advanced Modeling and Simulation in Engineering Sciences, 7(1), 1–13.
Yang, Z., Li, Y., & Zhou, G. (2023). TS-GAN: Time-series GAN for sensor-based health data augmentation. ACM Transactions on Computing for Healthcare, 4(2), 1–21.
