1 Introduction
The widespread use of sensors, high-throughput experiments and simulations, as well as high-performance computing have made "big data" available for a range of engineering physics systems.
This has sparked an explosion of interest in data-driven modeling for these systems. An extreme
manifestation of this still-developing field is seen in “model-free” approaches that do not rely on
physics-based knowledge. Such a path to modelling holds the promise of very high computational
efficiency by circumventing solutions to potentially complex physics. However, it draws criticism for
its “black box” nature, and often for lack of interpretability. More seriously, these approaches offer
scant openings for analysis when the model fails. Conversely, the availability of abundant data also
presents opportunities to discover mathematical frameworks, with well-understood physical mean-
ing, which govern the underlying behavior. This has fueled the field of system identification, which
is particularly compelling for determining the governing partial differential equations (PDEs). This
is so because knowledge of the PDE translates directly to deep insights into the physics, guided by
differential and integral calculus.
Approaches for data-assisted and data-driven discovery of PDEs have proceeded along several
fronts. (a) Early work in parameter identification can be traced to nonlinear regression approaches
[1, 2]. (b) If the governing equation is unknown, an approximate model—whether physical, empiri-
cal, or mixed—could be learned from data. For example, the approximate model could be a neural
network [3], a reduced-order model [4, 5], a phenomenological effective model formed by a linear
combination of basis functions in finite dimensions [6, 7], or a linear mapping from a current to a
future state where the dynamic characteristics of the mapping could be learned by Dynamic Mode
Decomposition [8, 9]. (c) Lastly, the complete underlying governing equations could be extracted
from data by combining symbolic regression and genetic programming to infer algebraic expressions
along with their coefficients [10, 11]. In this context, while genetic programming balances model
accuracy and complexity, it proves very expensive when searching for a few relevant terms from
a large library of candidates. In a recent approach to solving inverse problems [12], the strong
form of a specified PDE was directly embedded in the loss function while training deep neural
network representations of the solution variable. This approach results in the minimum-residual
solution subject to a deep neural network global basis. The work contained examples where up to
two unknown PDE coefficients were estimated, and did not explore system identification from a
larger set of possible operators. A somewhat more ambitious and extensive approach is based on
the recognition that most physical systems have only a few relevant terms, thus creating an oppor-
tunity to develop sparse regression techniques for system identification. This class of approaches
has recently been applied with success [13, 14, 15] to determining the operators in PDEs from
a comprehensive library of candidates, thus efficiently discovering governing equations from data.
More recent extensions have been applied to model recovery using sparse data after abrupt changes
in the system [16], hybrid dynamical systems [17] and multiscale systems [18]. The work presented
here is based on this approach, inspired by its direct treatment of operator forms in PDEs, and
their connections with physical mechanisms, while presenting several novel advances. These include
the adoption of the variational setting, via the weak forms of PDEs; step-wise regression with the
statistical F-test; and a treatment of noise and its amelioration by varying fidelity.
For context, we briefly discuss the role of pattern forming systems of equations in developmental
biology and materials physics. Following Alan Turing’s seminal work on reaction-diffusion systems
[19], a robust literature has developed on the application of nonlinear versions of this class of PDEs
to model pattern formation in developmental biology [20, 21, 22, 23, 24, 25, 26, 27, 28]. The Cahn-
Hilliard phase field equation [29] has been applied to model other biological processes with evolving
fronts, such as tumor growth and angiogenesis [30, 31, 32, 33, 34, 35, 36, 37]. Pattern formation
during phase transformations in materials physics can happen as the result of instability-induced
bifurcations from a uniform composition [38, 39, 40], which was the original setting of the Cahn-
Hilliard treatment. Another phase field treatment, the Allen-Cahn equation [41] models nucleation
and growth of precipitates and also has seen wide use [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]. All
these pattern forming systems fall into the class of nonlinear, parabolic PDEs, and have spawned
a vast literature in mathematical physics. They can be written as systems of first-order dynamics
driven by a number of time-independent terms of algebraic and differential form. The spatio-
temporal, differential-algebraic operators act on either a composition (normalized concentration) or
an order parameter. It also is common for the algebraic and differential terms to be coupled across
multiple species.
It is compelling to attempt to discover the physics governing these pattern forming systems by
identifying their PDE forms from data. However, proper evaluation of the numerical derivatives is
very challenging. The commonly used finite-difference approximations require data to be available
in a very finely discretized space. Polynomial interpolation has proven to be reliable and robust
[53], and recently has found use in data-driven discovery of PDEs [15]. However, this approach
encounters substantial difficulties in estimating the coefficients of high-order derivatives, because
of the significant error associated with numerical differentiation. Still, to the best of our knowledge, the published literature on discovering governing equations has remained focused on identifying the strong form of the PDEs. Every strong form, however, has equivalent weak form(s), which appear not to have been exploited for system identification. As will be seen in this communication, the advantages of using weak forms include the natural occurrence of the loss function for the regression problem; the ability to identify boundary conditions, which has proven to be a challenge; and the successful identification of higher-order derivatives with the aid of smooth basis functions (non-uniform rational B-splines, or NURBS, in this work).
Here, after briefly reviewing how the Galerkin weak form may be employed for identification of a
general dynamical system (Section 1.1), we focus on introducing our methods for identifying PDEs
in this variational setting, using stepwise regression (Section 2). In this first communication, we
focus on examples to identify reaction-diffusion equations evolving under Schnakenberg kinetics [54],
and the Cahn-Hilliard and Allen-Cahn equations driven by non-convex energy density functions
(Section 3). This choice is driven by the fact that, as explained above, these are widely used models
in biological patterning and morphogenesis, and have well-established traditions of use in materials
physics, also. The datasets we use are all obtained by high fidelity direct numerical simulations. The
more challenging problem of datasets from physical experiments will be addressed in a subsequent
communication. We do call attention to our treatment of both high and low fidelity data, with and
without noise, as a precursor to using data from physical experiments. Concluding remarks appear
under Discussion and Conclusions (Section 4).
We note in passing that, very recently, there has been growing interest in combining data-driven
methods with PDEs in a more direct manner: to find solutions to the canonical initial and boundary
value problems (IBVPs) of mathematical physics. These approaches include Gaussian processes
with kernels defined by specific PDEs in linearized form for one- and two-dimensions [55], and deep
neural networks for solution of high-dimensional IBVPs on semi-linear parabolic PDEs [56] or with
strong forms of the target one- and two-dimensional PDEs embedded in the loss functions [12].
and ω is the vector of pre-factors for each term. Thus, the one-field diffusion-reaction equation
\[ \frac{\partial C}{\partial t} - D\nabla^{2}C - f = 0 \tag{3} \]
with constant diffusivity D and reaction rate f has
\[ \chi = (\nabla^{2}C, \; f), \qquad \omega = (D, \; 1). \tag{4} \]
Note that the time derivative ∂C/∂t is treated separately from the other terms in order to highlight the first-order dynamics of all problems we target in this work.
Next, we recall the weak form that is equivalent to a general strong form written as above with vector of operators χ and vector of pre-factors ω. For infinite-dimensional problems with Dirichlet boundary conditions on Γc the weak form can be stated as: ∀ w ∈ V, where V = {w | w = 0 on Γc}, find C ∈ S, where S = {C | C = C̄ on Γc}, such that
\[ \int_\Omega w \left( \frac{\partial C}{\partial t} - \chi \cdot \omega \right) dv = 0. \tag{5} \]
With V^h = {w^h ∈ H²(Ω) | w^h = 0 on Γu}, the finite-dimensional (Galerkin) weak form of the problem is satisfied. The choice of H²(Ω) as the Sobolev space is motivated by the differential operators, which reach the highest order of two in the weak forms we consider (four in strong form). The variations w^h and trial solutions C^h are defined component-wise using a finite number of basis functions,
\[ w^h = \sum_{a=1}^{n_b} d_a N^a, \tag{6} \]
\[ C^h = \sum_{a=1}^{n_b} c_a N^a, \tag{7} \]
where n_b is the dimensionality of the function spaces S^h and V^h, and N^a represents the basis functions. Substituting Equations (6) and (7) for w^h and C^h in Equation (5), and decomposing the integration over Ω into a sum over subdomains Ω_e, leads to:
\[ \sum_e \int_{\Omega_e} \sum_{a=1}^{n_b} d_a N^a \left( \frac{\partial C^h}{\partial t} - \chi(C^h) \cdot \omega \right) dv = 0. \tag{8} \]
Upon integration by parts, application of appropriate boundary conditions, and accounting for the arbitrariness of w^h in V^h, the finite-dimensionality leads to a system of residual equations, one for each degree of freedom (DOF):
\[ R_i = F_i\!\left( \frac{\partial C^h}{\partial t}, \, C^h, \, \nabla C^h, \ldots, N, \nabla N, \ldots \right), \tag{9} \]
where R_i[C^h] is the i-th component of the residual vector, a functional of C^h satisfying R_i[C^h] = 0 for the Galerkin solution of the problem, C^h.
Selecting all or a subset of these DOFs, a Galerkin representation of the field could be constructed
over the domain. The use of fewer DOFs represents lower fidelity data, which is discussed separately
in Section 2.4. We can then obtain the system of residual equations in (9), with the independent
candidate basis terms in weak form populating χ. Below we illustrate the procedure with four
example terms.
Constant term 1:
\[ \int_\Omega w \, 1 \, dv = \int_\Omega \sum_{i=1}^{n_b} d_i N^i \sum_{a=1}^{n_b} 1 \, N^a \, dv. \tag{10} \]
Since the d_i are arbitrary, we obtain a vector of size N_{DOF}, whose i-th component is
\[ \Xi_i^{\mathrm{cons}} = \int_\Omega N^i \sum_{a=1}^{n_b} N^a \, dv. \tag{11} \]
Time derivative ∂C/∂t at time t_n: We write a backward difference approximation of the time derivative at t_n as
\[ \Xi_i^{\dot{C}}|_n = \int_\Omega N^i \sum_{a=1}^{n_b} \frac{c_n^a - c_{n-1}^a}{\Delta t} N^a \, dv, \tag{12} \]
where Δt = t_n − t_{n−1} is the time step, and c_n^a and c_{n−1}^a are the control variable values at times t_n and t_{n−1}.
Laplace operator ∇²C at time t_n: Multiplying by the weighting function and integrating by parts:
\[ \int_\Omega w \nabla^2 C_n^h \, dv = -\int_\Omega \sum_{i=1}^{n_b} d_i \nabla N^i \cdot \sum_{a=1}^{n_b} c_n^a \nabla N^a \, dv \tag{13} \]
\[ \qquad\qquad + \int_\Gamma \sum_{i=1}^{n_b} d_i N^i \sum_{a=1}^{n_b} c_n^a \nabla N^a \cdot n \, ds. \tag{14} \]
Note that the second term on the right is the Neumann boundary term. Consider a flux boundary condition on Γ such that −∇C_n^h · n = 1. We thus have two bases after accounting for the arbitrariness of d_i:
\[ \Xi_i^{\nabla^2 C}|_n = -\int_\Omega \nabla N^i \cdot \sum_{a=1}^{n_b} c_n^a \nabla N^a \, dv \tag{15} \]
is the basis for ∇²C_n^h (with zero Neumann boundary condition), and
\[ \Xi_i^{\nabla^2 C \, \mathrm{BC}}|_n = \int_\Gamma N^i \, 1 \, ds \tag{16} \]
is the basis for the Neumann boundary term arising from ∇2 C at the surface Γ. As is well known,
in the variational setting the weak form of the divergence operator yields volume (15) and surface
(16) terms, with the latter corresponding to the Neumann boundary condition. In this work we
treat them as distinct bases, thus allowing us to separately identify the Neumann boundary term.
We also call attention to the fact that for the unsteady diffusion problem, equating (12) to
the sum of (15) and (16) amounts to the Backward Euler time integration algorithm, which is
first-order accurate. The mid-point rule, on the other hand, is second-order accurate in time.
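To make the assembly of these weak-form bases concrete, the following is a minimal, hypothetical 1D sketch in Python of Equations (12) and (15), using Gauss quadrature. Linear hat functions stand in for the basis purely for brevity (the computations in this paper use NURBS); all names and values are illustrative.

```python
import numpy as np

def assemble_bases(c_n, c_nm1, dt, x):
    """Assemble the time-derivative basis of Eq. (12) and the (zero-flux)
    Laplacian basis of Eq. (15) on a 1D mesh with linear hat functions."""
    n = len(x)                                   # number of DOFs (nodes)
    Xi_dot, Xi_lap = np.zeros(n), np.zeros(n)
    gp = np.array([-1.0, 1.0]) / np.sqrt(3.0)    # 2-point Gauss rule, weights 1
    for e in range(n - 1):                       # loop over elements
        he = x[e + 1] - x[e]
        J = he / 2.0                             # Jacobian of the element map
        dofs = [e, e + 1]
        for xi in gp:
            N = np.array([(1 - xi) / 2.0, (1 + xi) / 2.0])
            dN = np.array([-1.0, 1.0]) / he      # physical derivatives
            cdot = N @ (c_n[dofs] - c_nm1[dofs]) / dt   # interpolated rate
            dC = dN @ c_n[dofs]                         # interpolated gradient
            for i_loc, i in enumerate(dofs):
                Xi_dot[i] += N[i_loc] * cdot * J        # Equation (12)
                Xi_lap[i] += -dN[i_loc] * dC * J        # Equation (15)
    return Xi_dot, Xi_lap

x = np.linspace(0.0, 1.0, 51)
c_nm1 = np.sin(np.pi * x)                        # synthetic snapshots
c_n = c_nm1 * np.exp(-0.1)
Xi_dot, Xi_lap = assemble_bases(c_n, c_nm1, dt=0.1, x=x)
```

Equating Xi_dot to a pre-factor times Xi_lap, DOF by DOF, then reproduces the Backward Euler system described above.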
Biharmonic operator ∇⁴C_n at time t_n: Multiplying by the weighting function and integrating by parts:
\[ \begin{aligned} \int_\Omega w \nabla^4 C_n^h \, dv = {}& \int_\Omega \sum_{i=1}^{n_b} d_i \nabla^2 N^i \sum_{a=1}^{n_b} c_n^a \nabla^2 N^a \, dv \\ & - \int_\Gamma \sum_{i=1}^{n_b} d_i \nabla N^i \cdot n \sum_{a=1}^{n_b} c_n^a \nabla^2 N^a \, ds \\ & + \int_\Gamma \sum_{i=1}^{n_b} d_i N^i \sum_{a=1}^{n_b} c_n^a \nabla(\nabla^2 N^a) \cdot n \, ds. \end{aligned} \tag{17} \]
Accounting for the arbitrariness of d_i, the volume term yields the basis for ∇⁴C_n^h,
\[ \Xi_i^{\nabla^4 C}|_n = \int_\Omega \nabla^2 N^i \sum_{a=1}^{n_b} c_n^a \nabla^2 N^a \, dv, \tag{18} \]
the first surface term yields the basis
\[ \Xi_i^{\nabla^4 C \, \mathrm{BC}}|_n = -\int_\Gamma \nabla N^i \cdot n \sum_{a=1}^{n_b} c_n^a \nabla^2 N^a \, ds, \tag{19} \]
and the second surface term yields the basis for the corresponding Neumann boundary condition,
\[ \Xi_i^{\nabla^4 C \, \mathrm{NBC}}|_n = \int_\Gamma N^i \sum_{a=1}^{n_b} c_n^a \nabla(\nabla^2 N^a) \cdot n \, ds. \tag{20} \]
The second-order gradients in the Equation (18) require the solutions and basis functions to lie in
H 2 (Ω), while the Lagrange polynomial basis functions traditionally used in finite element analysis
only lie in H 1 (Ω). We therefore draw the basis functions, N , from the family of Non-Uniform
Rational B-Splines (NURBS), and adopt Isogeometric Analysis (IGA) in our simulations to find
the solutions in H 2 (Ω). A discussion of the NURBS basis and IGA is beyond the scope of this
paper; interested readers are directed to the original works on this topic [57]. We briefly present
the construction of the NURBS basis functions in the following subsection.
2.2 NURBS basis functions
Similar to Lagrange polynomial basis functions traditionally used in the Finite Element Method
(FEM), NURBS basis functions are partitions of unity with compact support, satisfy affine co-
variance (i.e., an affine transformation of the basis is obtained by the affine transformation of its
nodes/control points), and support an isoparametric formulation, thereby making them suitable for
a Galerkin framework. They enjoy advantages over Lagrange polynomial basis functions in being
able to ensure C^n-continuity, possessing the positive basis and convex hull properties, and being
variation diminishing.
The building blocks of the NURBS basis functions are univariate B-spline functions defined
as follows. Consider two positive integers p and n, and a non-decreasing sequence of values χ =
[ξ1 , ξ2 , ...., ξn+p+1 ], where p is the polynomial order, n is the number of basis functions, ξi are
coordinates in the parametric space referred to as knots (equivalent to nodes in FEM), and χ is the
knot vector. The B-spline basis functions B_p^i(ξ) are defined starting with the zeroth-order basis
\[ B_0^i(\xi) = \begin{cases} 1 & \text{if } \xi_i \le \xi < \xi_{i+1}, \\ 0 & \text{otherwise}, \end{cases} \tag{21} \]
and higher orders using the Cox-de Boor recursive formula for p ≥ 1 [58]:
\[ B_p^i(\xi) = \frac{\xi - \xi_i}{\xi_{i+p} - \xi_i} B_{p-1}^i(\xi) + \frac{\xi_{i+p+1} - \xi}{\xi_{i+p+1} - \xi_{i+1}} B_{p-1}^{i+1}(\xi). \tag{22} \]
The knot vector divides the parametric space into intervals referred to as knot spans (equivalent
to elements in FEM). A B-spline basis function is C ∞ -continuous inside knot spans and C p−1 -
continuous at the knots. If an interior knot value repeats, it is referred to as a multiple knot. At
a knot of multiplicity k, the continuity is C^{p−k}. Using a quadratic B-spline basis (Figure 1), a C¹-continuous one-dimensional NURBS basis can now be constructed:
\[ N_p^i(\xi) = \frac{B_p^i(\xi)\, w_i}{\sum_{i'=1}^{n_b} B_p^{i'}(\xi)\, w_{i'}}, \tag{23} \]
where w_i are the weights associated with each of the B-spline functions. In higher dimensions, NURBS basis functions are constructed as tensor products of their one-dimensional counterparts:
\[ N_p^{ij}(\xi,\eta) = \frac{B_p^i(\xi)\, B_p^j(\eta)\, w_{ij}}{\sum_{i'=1}^{n_{b1}} \sum_{j'=1}^{n_{b2}} B_p^{i'}(\xi)\, B_p^{j'}(\eta)\, w_{i'j'}} \quad \text{(2D)}. \tag{24} \]
Using NURBS basis functions, we are able to smoothly evaluate the higher order derivatives of
the data by interpolating them from the DOF control variable values. This is crucial in enabling
the identification of higher order derivatives (e.g., the second order derivatives that arise in the
weak form of the Cahn-Hilliard equations) using the stepwise regression methods described next.
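As an illustration of Equations (21)-(23), here is a brief Python sketch of the Cox-de Boor recursion and the rational weighting; the knot vector reproduces the quadratic C¹ basis of Figure 1, and the function names are ours rather than any particular library's.

```python
import numpy as np

def bspline_basis(i, p, xi, knots):
    """Evaluate B^i_p(xi) by the Cox-de Boor recursion, Eqs. (21)-(22)."""
    if p == 0:
        return 1.0 if knots[i] <= xi < knots[i + 1] else 0.0
    left, right = 0.0, 0.0
    if knots[i + p] > knots[i]:                  # skip zero-width spans
        left = (xi - knots[i]) / (knots[i + p] - knots[i]) \
               * bspline_basis(i, p - 1, xi, knots)
    if knots[i + p + 1] > knots[i + 1]:
        right = (knots[i + p + 1] - xi) / (knots[i + p + 1] - knots[i + 1]) \
                * bspline_basis(i + 1, p - 1, xi, knots)
    return left + right

def nurbs_basis(i, p, xi, knots, w):
    """Rational basis N^i_p(xi) from Equation (23)."""
    num = bspline_basis(i, p, xi, knots) * w[i]
    den = sum(bspline_basis(j, p, xi, knots) * w[j] for j in range(len(w)))
    return num / den

p = 2
knots = np.array([0, 0, 0, 1/6, 1/3, 1/2, 2/3, 5/6, 1, 1, 1])
n = len(knots) - p - 1                           # number of basis functions
w = np.ones(n)                                   # unit weights: plain B-splines
vals = [nurbs_basis(i, p, 0.4, knots, w) for i in range(n)]
assert abs(sum(vals) - 1.0) < 1e-12              # partition of unity
```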
Figure 1: A one-dimensional quadratic B-spline basis constructed from the knot vector χ = [0, 0, 0, 1/6, 1/3, 1/2, 2/3, 5/6, 1, 1, 1].
be our target vector. Note that each element Ξ^Ċ|_n ∈ R^{N_{DOF}} in (26) is a vector formed in Equation (12):
\[ \Xi^{\dot C}|_n = \begin{bmatrix} \vdots \\ \Xi_{i-1}^{\dot C}|_n \\ \Xi_{i}^{\dot C}|_n \\ \Xi_{i+1}^{\dot C}|_n \\ \vdots \\ \Xi_{N_{DOF}}^{\dot C}|_n \end{bmatrix}. \tag{27} \]
The residual accounting for all DOFs and time steps can then be expressed as
\[ R = y - \Xi\omega. \tag{29} \]
Setting the residual to zero yields the linear system
\[ y = \Xi\omega. \tag{30} \]
Note that the target vector in (27) and matrix Ξ in (28) are currently structured to contain all DOFs. However, we can always exclude some DOFs by deleting their corresponding rows. Additional attention is needed if one wishes to eliminate an entire basis operator. For instance, doing so for the Neumann boundary basis requires deletion of the corresponding column. The reduced Equation (30) would then represent data with the homogeneous Neumann boundary condition.

If Equation (30) were directly solved for ω as an ordinary least squares regression problem, the analytic solution is
\[ \omega = (\Xi^{\mathsf T}\Xi)^{-1}\Xi^{\mathsf T} y, \tag{31} \]
where a nontrivial y leads to a non-trivial solution for ω. The least squares formulation can also be interpreted as an optimization problem with the loss function
\[ l(\omega) = \lVert y - \Xi\omega \rVert_2^2, \tag{32} \]
which is simply the squared Euclidean norm of the residual vector obtained from the weak form. The system identification problem can then be viewed as finding ω that minimizes the loss:
\[ \omega = \arg\min_{\omega} \; l(\omega). \tag{33} \]
If the linear system in Equation (30) is free of error, the loss function l and the pre-factors for irrelevant terms can all be driven down to machine zero. For a linear system containing error terms (further discussed in Section 2.4), however, performing a standard regression will result in a solution ω with nonzero contributions in all components of this vector. Such a solution of the system identification will not sharply delineate the relevant bases. Alternatively, we could apply a separate least squares regression for each possible combination of the bases and choose the best one, a technique called best subset selection [60]. However, the problem of selecting the best model from among the 2^m possibilities grows exponentially in m, the number of candidate bases. A viable alternative is to select the relevant bases by shrinking the pre-factors of the irrelevant bases towards zero. Ridge regression and ℓ₁ regression (also known as the least absolute shrinkage and selection operator, LASSO, a popular problem in compressive sensing research) are two well-known techniques to induce regularization.
The optimization problem of Equation (33) is updated with additional penalty terms:
\[ \omega = \arg\min_{\omega} \; l(\omega) + \lambda_2 \sum_j \omega_j^2 \qquad \text{(Ridge regression)} \tag{34} \]
\[ \omega = \arg\min_{\omega} \; l(\omega) + \lambda_1 \sum_j |\omega_j| \qquad \text{(LASSO)} \tag{35} \]
where λ2 and λ1 are regularization parameters. For both methods, however, selecting a good
regularization parameter is critical, and may need to leverage prior knowledge about the magnitude
of the pre-factors [60].
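For context, a minimal sketch of these two penalized regressions using scikit-learn, with a synthetic stand-in for the basis matrix Ξ and target y (the regularization strengths here are arbitrary illustrative choices, not values used in this work):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
Xi = rng.normal(size=(200, 10))              # stand-in for the basis matrix
omega_true = np.array([1.0, -40.0] + [0.0] * 8)
y = Xi @ omega_true + 1e-3 * rng.normal(size=200)

ridge = Ridge(alpha=1e-2).fit(Xi, y)         # lambda_2 penalty on sum omega_j^2
lasso = Lasso(alpha=1e-2, max_iter=10000).fit(Xi, y)   # lambda_1 on sum |omega_j|
# LASSO drives irrelevant pre-factors to exactly zero; ridge only shrinks them.
print(ridge.coef_.round(3), lasso.coef_.round(3))
```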
In this work, we use backward model selection by stepwise regression [60]. The algorithm is summarized below.
Algorithm for model selection by stepwise regression:

Step 1: Solve for ω^i in the linear regression problem, Equation (30), using ordinary least squares regression. Calculate the loss function at this iteration, l^i.

Step 2: Eliminate basis terms in matrix Ξ by deleting their columns, using the criterion introduced below. Set the corresponding components of ω^i to zero. GOTO Step 1. Note that in this case the value of the loss function remains small (l^i ∼ l^{i−1}), and the solution may be overfitted.

Step 3: The algorithm stops if the pre-specified criterion does not allow us to eliminate any more basis terms. Beyond this point, the loss function increases dramatically for any further reduction.
There are several choices for the criterion for eliminating basis terms. Kutz and co-workers applied
a hard threshold on the pre-factors in each iteration [13, 14, 15]. Since the results of sparse re-
gression and sparsity profiles of ω vary with different values of the threshold, a different method is
needed for the final solution corresponding to the optimal tolerance. In the work mentioned above,
Kutz and co-workers used Pareto front analysis by cross-validation to find the optimal tolerance.
The drawback of this approach, in our experience, is that its performance degrades if the pre-factors
of the relevant basis operators differ significantly in magnitude. For the approach to perform well,
a carefully chosen rescaling is needed for the basis operators along with cross-validation.
Here, we explore, as an alternative, a widely used statistical criterion called the F-test [60], which can be motivated as follows: Since the model at iteration i contains fewer bases than at iteration i − 1, and is therefore more restrictive, the loss function is, in general, higher at iteration i than at iteration i − 1. In this sense, the model always becomes "worse" after eliminating any basis. We want to determine whether the model at iteration i is significantly worse than at iteration i − 1. If not, the basis can be eliminated. We evaluate the significance of the change by the F-test:
\[ F = \frac{\left(l^i - l^{i-1}\right)/\left(p^{i-1} - p^i\right)}{l^{i-1}/\left(n - p^{i-1}\right)}, \tag{36} \]
where p^i is the number of bases at iteration i and n is the total number of bases. The F value essentially evaluates the increase in the loss function while accounting for model complexity. If F does not exceed a pre-defined tolerance, α, meaning that the model at iteration i is not significantly worse than at iteration i − 1, we eliminate the considered basis. Using the F-test in Step 2 of the above algorithm, this step can be re-stated as follows:
Step 2 with F-test:

Step 2.1: Tentatively eliminate the bases corresponding to pre-factors in ω^i that are smaller than the pre-defined threshold; evaluate the F value following an ordinary least squares regression on the reduced basis set.

Step 2.2:
IF F < α
THEN formally eliminate these bases in matrix Ξ by deleting the corresponding columns. GOTO Step 1.
ELSE GOTO Step 2.1, and choose another basis.

The threshold at iteration i is
\[ \mathrm{threshold}^i = \omega^i_{\mathrm{smallest}} + \epsilon, \tag{37} \]
where ω^i_smallest is the smallest value in ω^i, and ε is a small tolerance, chosen here to be 10⁻⁵. This threshold will eliminate the basis with the smallest pre-factor, as well as all bases with pre-factors within ε of the smallest one. Note that α in the F-test is the only hyperparameter in the algorithm.
System identification is significantly more challenging if greater numbers of candidate bases are
retained in the first few iterations. A small value of α may not eliminate all irrelevant bases at
once, but can serve as a good initial value as it will filter some irrelevant bases, thus mollifying
the problem to a degree. In this work, we choose α = 1 initially, and increase it to 10 when no
more bases can be eliminated for α = 1 (for a different problem, the value can be rigorously chosen by cross-validation). We have summarized the step-wise algorithm in Figure
2. As mentioned previously, our approach builds on recent work in sparse regression [13, 14, 15].
However, it differs prominently from those papers in adopting a variational setting, using the F -
test for determining bases, and, as will be seen in Section 2.4, in considering the influence of data
fidelity and noise.
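For concreteness, the following Python sketch condenses the backward stepwise algorithm with the F-test criterion of Equation (36) and the threshold of Equation (37). It is an illustrative re-implementation under our reading of the algorithm, not the authors' production code; in particular it compares magnitudes of the pre-factors, simply stops when F ≥ α rather than trying another basis, and guards the degrees-of-freedom denominator at the first iteration.

```python
import numpy as np

def ols(Xi, y, cols):
    """Ordinary least squares on the retained columns; zeros elsewhere."""
    omega = np.zeros(Xi.shape[1])
    sol, *_ = np.linalg.lstsq(Xi[:, cols], y, rcond=None)
    omega[cols] = sol
    r = y - Xi @ omega
    return omega, r @ r                          # pre-factors and loss l

def stepwise_f_test(Xi, y, alpha=1.0, eps=1e-5):
    n = Xi.shape[1]                              # total number of bases
    active = list(range(n))
    omega, loss = ols(Xi, y, active)             # Step 1
    while len(active) > 1:
        # Step 2.1: tentatively drop bases within eps of the smallest |omega|
        smallest = np.abs(omega[active]).min()
        trial = [j for j in active if abs(omega[j]) > smallest + eps]
        if not trial:
            break
        omega_t, loss_t = ols(Xi, y, trial)
        # Equation (36); denominator guarded against the degenerate first step
        denom = loss / max(n - len(active), 1) + 1e-30
        F = (loss_t - loss) / (len(active) - len(trial)) / denom
        if F < alpha:                            # Step 2.2: accept elimination
            active, omega, loss = trial, omega_t, loss_t
        else:                                    # Step 3: stop
            break
    return omega, active
```

In use, Xi and y would be the assembled quantities of Equation (30), with α = 1 as the initial tolerance.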
More broadly, the purpose of the F -test is to perform model selection: a criterion to judge
whether the new candidate model with fewer operators is more suitable than the previous one. At
the core, the comparison is made to balance the desires of (a) fitting the data and (b) reducing
the model complexity. Indeed, model selection is a rich and active area of research, encompassing
methods such as cross-validation [61, 62], Akaike information criterion (AIC) [63], Bayesian infor-
mation criterion (BIC) [64], and Bayes factor [65, 66]. We will explore additional model selection
techniques in the future, especially those that require a Bayesian version of our system identification
framework.
Figure 2: Flowchart of the step-wise regression algorithm.

2.4 Data fidelity and noise

In practice, data could be contaminated by background noise as well as by data collection error, such that the observed data Ĉ may be noisy. We model these imperfections using an additive form
\[ \hat{C}^h = C^h + \epsilon. \tag{38} \]
We further assume that ε, the noise on C^h, follows an independent and identically distributed (i.i.d.) Gaussian distribution with zero mean and standard deviation σ (while the observation noise may have systematic bias and correlation, we do not explore these possibilities in this paper, and instead take an initial step in assuming an i.i.d. setting):
\[ \epsilon \sim \mathcal{N}(0, \sigma^2). \tag{40} \]
Since linear combinations of mutually independent Gaussian random variables are also Gaussian, we have:
\[ \hat{\Xi}_i^{\dot C}|_n = \int_\Omega N^i \sum_{a=1}^{n_b} \frac{c_n^a - c_{n-1}^a}{\Delta t} N^a \, dv + \int_\Omega N^i \sum_{a=1}^{n_b} \frac{\epsilon_{\dot C}^a}{\Delta t} N^a \, dv, \tag{41} \]
where ε_Ċ is Gaussian with zero mean and standard deviation √2 σ. Since the integral is computed using quadrature, we have:
\[ \hat{\Xi}_i^{\dot C}|_n = \sum_{q=1}^{Q} \sum_{a=1}^{n_b} N^i \frac{c_n^a - c_{n-1}^a}{\Delta t} N^a |J| w_q + \sum_{q=1}^{Q} \sum_{a=1}^{n_b} N^i \frac{\epsilon_{\dot C}^a}{\Delta t} N^a |J| w_q, \tag{42} \]
where |J| is the determinant of the Jacobian matrix of the element mapping from an isoparametric domain, and w_q is the weight for quadrature point q. The first term on the right-hand side reflects the noiseless (true) component of Ξ̂^Ċ_i|_n, and the second term is the error. The true term converges:
\[ \lim_{\Delta t \to 0} \frac{c_n^a - c_{n-1}^a}{\Delta t} = \frac{\partial C^h}{\partial t}, \tag{43} \]
however the error term, scaled by the time step, diverges as Δt → 0. This also indicates that the error due to noise decreases if larger time steps are used:
\[ \frac{\text{error term}}{\text{true term}} \sim \frac{1}{\Delta t}. \tag{44} \]
For gradient operators, such as the Laplacian and the biharmonic operator, we have
\[ \hat{\Xi}_i^{\nabla^2 C}|_n = \sum_{q=1}^{Q} \sum_{a=1}^{n_b} \nabla N^i \cdot c_n^a \nabla N^a |J| w_q + \sum_{q=1}^{Q} \sum_{a=1}^{n_b} \nabla N^i \cdot \epsilon^a \nabla N^a |J| w_q, \tag{45} \]
\[ \hat{\Xi}_i^{\nabla^4 C}|_n = \sum_{q=1}^{Q} \sum_{a=1}^{n_b} \nabla^2 N^i \, c_n^a \nabla^2 N^a |J| w_q + \sum_{q=1}^{Q} \sum_{a=1}^{n_b} \nabla^2 N^i \, \epsilon^a \nabla^2 N^a |J| w_q, \tag{46} \]
from which, since the forward solution converges with decreasing mesh size h, we have:
\[ \lim_{h \to 0} \sum_{a=1}^{n_b} \nabla N^i \cdot c_n^a \nabla N^a = \nabla^2 C^h, \tag{47} \]
\[ \lim_{h \to 0} \sum_{a=1}^{n_b} \nabla^2 N^i \, c_n^a \nabla^2 N^a = \nabla^4 C^h. \tag{48} \]
However, the error terms are scaled by h⁻² in the noisy Laplacian operator and h⁻⁴ in the noisy biharmonic operator, where the mesh size h is introduced by the gradients of the basis functions. The error on Ĉ^h is amplified by the mesh size acting through the spatial gradient operators, and diverges as h → 0. Consequently, we have
\[ \frac{\text{error term}}{\text{true term}} \sim h^{-2} \quad \text{for the Laplacian operator}, \tag{49} \]
\[ \frac{\text{error term}}{\text{true term}} \sim h^{-4} \quad \text{for the biharmonic operator}. \tag{50} \]
The above analysis illustrates that the final error induced by noise on C^h will decrease with larger time steps and element sizes. In Section 3 we show that for noisy C^h, using low fidelity data improves the results of system identification. For purely algebraic bases, i.e., without differential operators from time derivatives or spatial gradients, the time step and element size do not affect the ratio of error to true value. Note also that the approximations in Equations (43), (47) and (48) degrade with increasing time step and element size, bringing in discretization errors.
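Analogously, the h-scaling can be checked with a strong-form second difference standing in for the weak-form Laplacian basis (an assumption made for brevity; the weak form differs in detail but shares the h⁻² amplification of noise):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1e-5
for n in (50, 100, 200, 400):
    x, h = np.linspace(0.0, 1.0, n + 1, retstep=True)
    C = np.sin(2 * np.pi * x)
    noisy = C + rng.normal(0.0, sigma, size=C.shape)
    lap_true = (C[2:] - 2 * C[1:-1] + C[:-2]) / h**2
    lap_noisy = (noisy[2:] - 2 * noisy[1:-1] + noisy[:-2]) / h**2
    # error std grows ~ sigma / h^2 as the mesh is refined
    print(f"h={h:.4f}  noise error std={np.std(lap_noisy - lap_true):.3e}")
```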
3 Examples
We now turn to using the framework detailed in the preceding section to identify the parabolic
PDEs that govern patterning. Instead of directly applying our algorithms to data from physical
experiments, in this first communication, we test our methods on identifying PDEs from data
obtained through high-fidelity, direct numerical simulations (DNS).
We consider test cases with the following four data-generating models, with their true coefficients
summarized in Table 1:
Model 1:
\[ \frac{\partial C_1}{\partial t} = D_1 \nabla^2 C_1 + R_{10} + R_{11} C_1 + R_{13} C_1^2 C_2 \tag{51} \]
\[ \frac{\partial C_2}{\partial t} = D_2 \nabla^2 C_2 + R_{20} + R_{21} C_1^2 C_2 \tag{52} \]
\[ \nabla C_1 \cdot n = 0, \quad \nabla C_2 \cdot n = 0 \quad \text{on } \Gamma \tag{53} \]
where Γ is the domain boundary.
Model 2:
\[ \frac{\partial C_1}{\partial t} = D_1 \nabla^2 C_1 + R_{10} + R_{11} C_1 + R_{13} C_1^2 C_2 \tag{54} \]
\[ \frac{\partial C_2}{\partial t} = D_2 \nabla^2 C_2 + R_{20} + R_{21} C_1^2 C_2 \tag{55} \]
\[ \nabla C_2 \cdot n = 0 \quad \text{on } \Gamma \tag{56} \]
where Γ2 and Γ4 are shown in Figure 5. Models 1 and 2 represent coupled diffusion-reaction equa-
tions for two species following Schnakenberg kinetics [54], but with different boundary conditions.
For an activator-inhibitor species pair, these equations use auto-inhibition with cross-activation
of a short range species, and auto-activation with cross-inhibition of a long range species to form
so-called Turing patterns [19].
Model 3:
\[ \frac{\partial C_1}{\partial t} = \nabla \cdot (M_1 \nabla \mu_1) \tag{57} \]
\[ \frac{\partial C_2}{\partial t} = \nabla \cdot (M_2 \nabla \mu_2) \tag{58} \]
\[ \mu_1 = \frac{\partial g}{\partial C_1} - k_1 \nabla^2 C_1 \tag{59} \]
\[ \mu_2 = \frac{\partial g}{\partial C_2} - k_2 \nabla^2 C_2 \tag{60} \]
where g is a non-convex, "homogeneous" free energy density function, whose form has been chosen from [28]:
\[ \begin{aligned} g(C_1, C_2) = {}& \frac{3d}{2s^4} \left[ (2C_1 - 1)^2 + (2C_2 - 1)^2 \right]^2 + \frac{d}{s^3} (2C_2 - 1) \left[ (2C_2 - 1)^2 - 3(2C_1 - 1)^2 \right] \\ & - \frac{3d}{2s^2} \left[ (2C_1 - 1)^2 + (2C_2 - 1)^2 \right]. \end{aligned} \tag{63} \]
Model 4:
\[ \frac{\partial C_1}{\partial t} = \nabla \cdot (M_1 \nabla \mu_1) \tag{64} \]
\[ \frac{\partial C_2}{\partial t} = -M_2 \mu_2 \tag{65} \]
\[ \mu_1 = \frac{\partial g}{\partial C_1} - \nabla \cdot (k_1 \nabla C_1) \tag{66} \]
\[ \mu_2 = \frac{\partial g}{\partial C_2} - \nabla \cdot (k_2 \nabla C_2) \tag{67} \]
\[ \nabla C_2 \cdot n = 0 \quad \text{on } \Gamma \tag{69} \]
Model 3 is a two field Cahn-Hilliard system with the well-known fourth-order term in the
concentration, C1 and C2 . The three-well non-convex free energy density function (See Figure 4),
g(C1 , C2 ), drives segregation of the system into two distinct types. We have previously used this
system to make connections with cell segregation in developmental biology [28]. The diffusion-
reaction and Cahn-Hilliard equations are widely used in biological pattern generation, as discussed
in the Introduction. The Cahn-Hilliard equation [29] also occupies a central role in the materials
physics literature for modelling phase transformations developing from a uniform concentration
field in the presence of an instability.
Model 4 is the coupled Cahn-Hilliard/Allen-Cahn system of equations [41], where the field C2
describes growth of alloy precipitates from nuclei created by a process of spinodal decomposition
that controls the field C1 . The free energy density g(C1 , C2 ) couples these processes through µ1
and µ2 in both Models 3 and 4. The Introduction also cites some of the relevant literature on
the use of Cahn-Hilliard and Allen-Cahn equations in materials physics. An important difference
between Models 3 and 4 is that C2 in Model 4 is a non-conserved order parameter, which defines
the identities of the precipitate, described by composition field C1 , and matrix phases. Both
Cahn-Hilliard equations in Model 3 are conservative. Note that solving the Cahn-Hilliard and
Allen-Cahn equations in the isogeometric analysis framework requires imposition of the higher-
order Dirichlet boundary conditions (61) and (68). We use Nitsche’s method [67] to impose these
boundary conditions weakly.
Substituting the parameter values from Table 1, we present the weak form of each model:
Model 1:
\[ \int_\Omega w_1 \frac{\partial C_1}{\partial t} dv = -\int_\Omega 1\,\nabla w_1 \cdot \nabla C_1 \, dv + \int_\Omega w_1 (0.1 - C_1 + 1 C_1^2 C_2) \, dv \tag{70} \]
\[ \int_\Omega w_2 \frac{\partial C_2}{\partial t} dv = -\int_\Omega 40\,\nabla w_2 \cdot \nabla C_2 \, dv + \int_\Omega w_2 (0.9 - 1 C_1^2 C_2) \, dv \tag{71} \]
Model 2:
\[ \int_\Omega w_1 \frac{\partial C_1}{\partial t} dv = -\int_\Omega 1\,\nabla w_1 \cdot \nabla C_1 \, dv + \int_\Omega w_1 (0.1 - C_1 + 1 C_1^2 C_2) \, dv + \int_{\Gamma_2} w_1\, 0.1 \, ds \tag{72} \]
\[ \int_\Omega w_2 \frac{\partial C_2}{\partial t} dv = -\int_\Omega 40\,\nabla w_2 \cdot \nabla C_2 \, dv + \int_\Omega w_2 (0.9 - 1 C_1^2 C_2) \, dv \tag{73} \]
Model 3:
\[ \begin{aligned} \int_\Omega w_1 \frac{\partial C_1}{\partial t} dv = {}& \int_\Omega \nabla w_1 \cdot \left( -17.8126 + 47.98 C_1 + 21.591 C_2 - 47.98 C_1^2 - 15.9933 C_2^2 \right) \nabla C_1 \, dv \\ & + \int_\Omega \nabla w_1 \cdot \left( -10.7955 + 21.591 C_1 + 15.9933 C_2 - 31.9867 C_1 C_2 \right) \nabla C_2 \, dv \\ & + \int_\Omega -1\, \nabla^2 w_1 \nabla^2 C_1 \, dv \end{aligned} \tag{74} \]
\[ \begin{aligned} \int_\Omega w_2 \frac{\partial C_2}{\partial t} dv = {}& \int_\Omega \nabla w_2 \cdot \left( -10.7955 + 21.591 C_1 + 15.9933 C_2 - 31.9867 C_1 C_2 \right) \nabla C_1 \, dv \\ & + \int_\Omega \nabla w_2 \cdot \left( -12.2149 + 15.9933 C_1 + 42.3823 C_2 - 15.9933 C_1^2 - 47.98 C_2^2 \right) \nabla C_2 \, dv \\ & + \int_\Omega -1\, \nabla^2 w_2 \nabla^2 C_2 \, dv \end{aligned} \tag{75} \]
Model 4:
\[ \begin{aligned} \int_\Omega w_1 \frac{\partial C_1}{\partial t} dv = {}& \int_\Omega \nabla w_1 \cdot \left( -17.8126 + 47.98 C_1 + 21.591 C_2 - 47.98 C_1^2 - 15.9933 C_2^2 \right) \nabla C_1 \, dv \\ & + \int_\Omega \nabla w_1 \cdot \left( -10.7955 + 21.591 C_1 + 15.9933 C_2 - 31.9867 C_1 C_2 \right) \nabla C_2 \, dv \\ & + \int_\Omega -1\, \nabla^2 w_1 \nabla^2 C_1 \, dv \end{aligned} \tag{76} \]
\[ \begin{aligned} \int_\Omega w_2 \frac{\partial C_2}{\partial t} dv = {}& -\int_\Omega 1\,\nabla w_2 \cdot \nabla C_2 \, dv + \int_\Omega w_2 \big( 3.5085 - 10.7955 C_1 - 12.2149 C_2 \\ & + 10.7955 C_1^2 + 15.9933 C_1 C_2 + 21.1912 C_2^2 - 15.9933 C_1^2 C_2 - 15.9933 C_2^3 \big) \, dv \end{aligned} \tag{77} \]
The higher order boundary conditions and stabilization terms in Nitsche’s methods are not shown
here. We do not infer these terms since they are of a numerical rather than physical nature.
DNS was carried out on uniformly discretized 400 × 400 meshes. We start all Models from the same
initial conditions obtained by running Model 1 for a very short time from a randomized state with
small perturbations about C1 , C2 = 0.5 (see Figure 5). This state was relabelled as t = 0. With
Figure 5: Concentration of C1 (left) and C2 (right) in all four models at t = 0. The boundary segments Γ1 (bottom), Γ2 (left), Γ3 (right) and Γ4 (top) are indicated.
this as the initial condition, the early stage dynamics of Models 1-4 cannot be distinguished by eye (see Figure 6). However, driven by different boundary conditions and/or governing equations, the dynamics of the four models soon evolve along distinct trajectories, and form completely different patterns over longer times, as shown in Figure 7.
The 400 × 400 mesh yields the DNS data for each model, in terms of the fields (C1 , C2 ), at
the corresponding 401 × 401 DOFs for each time step. The synthetic, high fidelity DNS data are
regularized by having been obtained from PDEs with derivative constraints. Therefore, they are
noise-free, which contributes to finding solutions ω that drive the residual equations (9) down to
machine zero. However, the data collected from physical experiments will not, in general, satisfy PDEs to the same precision: the data could be at lower fidelity, having been collected at fewer spatial
positions and time instants, and therefore may not approximate derivatives well. Background noise
and collection error will further contribute to a poorer approximation by PDEs, as discussed in
Section 2.4. To mimic the experimental data, we take two approaches: i) As illustrated in Figure
3, we use nested meshes for coarse-graining the data (C1 , C2 ) after generating them by DNS on the
400 × 400 mesh. Doing so introduces a loss of information on derivatives. ii) We superpose noise
with a Gaussian distribution having zero mean and standard deviation σ, on the collected (C1 , C2 ).
We first collect the DNS data, from all four models, on DOFs corresponding to the 400 × 400 mesh,
Figure 6: C1 and C2 for Models 1 to 4 (left to right) at t = 0.01 are almost identical.

Figure 7: C1 and C2 for Models 1 to 4 (left to right) at a later time, showing the completely different patterns that eventually form.
which was used for the forward computations. Data from the same forward computations were then
collected on the DOFs corresponding only to the nested 200 × 200, 100 × 100 and 50 × 50 meshes.
This yielded datasets with four different fidelities for each model, and 16 different datasets in total.
We then superimpose noise with σ = 10−4 for data generated from Models 1 and 2, and σ = 10−5
for data generated from Models 3 and 4, yielding another 16 data sets. Having the clean and noisy
data at different fidelity, we generate 38 candidate bases in addition to the time derivative terms,
summarized in Table 2. Note that to construct the time derivative terms, we need data at two time
steps, tn−1 and tn . Data at tn−1 is only used in the time derivative terms, while other bases are
constructed using data at tn (Backward Euler integration).
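A short sketch of this degradation protocol, with a random array standing in for a DNS snapshot (the strided subsampling realizes the nested meshes; σ = 10⁻⁴ corresponds to Models 1 and 2):

```python
import numpy as np

rng = np.random.default_rng(3)
C1 = rng.random((401, 401))                 # stand-in for a 400x400-mesh field
datasets = {}
for stride, label in ((1, "400x400"), (2, "200x200"),
                      (4, "100x100"), (8, "50x50")):
    coarse = C1[::stride, ::stride]         # DOFs of the nested coarser mesh
    datasets[label] = coarse
    datasets[label + "+noise"] = coarse + rng.normal(0.0, 1e-4, coarse.shape)
```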
Table 2: Candidate bases for model selection. Asterisks (∗) in the left column represent algebraic operators on C1 and C2.

type of basis | bases in weak form
y | ∫_Ω w (∂C1/∂t) dv;  ∫_Ω w (∂C2/∂t) dv
∇·(∗∇C1) | ∫_Ω ∇w·∇C1 dv;  ∫_Ω ∇w C1∇C1 dv;  ∫_Ω ∇w C2∇C1 dv;  ∫_Ω ∇w C1²∇C1 dv;  ∫_Ω ∇w C1C2∇C1 dv;  ∫_Ω ∇w C2²∇C1 dv;  ∫_Ω ∇w C1³∇C1 dv;  ∫_Ω ∇w C1²C2∇C1 dv;  ∫_Ω ∇w C1C2²∇C1 dv;  ∫_Ω ∇w C2³∇C1 dv
∇·(∗∇C2) | ∫_Ω ∇w·∇C2 dv;  ∫_Ω ∇w C1∇C2 dv;  ∫_Ω ∇w C2∇C2 dv;  ∫_Ω ∇w C1²∇C2 dv;  ∫_Ω ∇w C1C2∇C2 dv;  ∫_Ω ∇w C2²∇C2 dv;  ∫_Ω ∇w C1³∇C2 dv;  ∫_Ω ∇w C1²C2∇C2 dv;  ∫_Ω ∇w C1C2²∇C2 dv;  ∫_Ω ∇w C2³∇C2 dv
∇²(∗∇²C) | ∫_Ω ∇²w ∇²C1 dv;  ∫_Ω ∇²w C1∇²C1 dv;  ∫_Ω ∇²w ∇²C2 dv;  ∫_Ω ∇²w C2∇²C2 dv
non-gradient | −∫_Ω w 1 dv;  −∫_Ω w C1 dv;  −∫_Ω w C2 dv;  −∫_Ω w C1² dv;  −∫_Ω w C1C2 dv;  −∫_Ω w C2² dv;  −∫_Ω w C1³ dv;  −∫_Ω w C1²C2 dv;  −∫_Ω w C1C2² dv;  −∫_Ω w C2³ dv
boundary condition | −∫_{Γ1} w 1 ds;  −∫_{Γ2} w 1 ds;  −∫_{Γ3} w 1 ds;  −∫_{Γ4} w 1 ds
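To see how the count of 38 arises, the following sketch enumerates symbolic labels for the candidates of Table 2 (the strings are our shorthand, not executable operators):

```python
from itertools import combinations_with_replacement

fields = ("C1", "C2")
# cubic polynomial pre-factors in (C1, C2): 1 constant + 2 + 3 + 4 = 10 terms
prefactors = ["1"] + ["*".join(c) for d in (1, 2, 3)
                      for c in combinations_with_replacement(fields, d)]

candidates = []
for C in fields:                                           # gradient operators
    candidates += [f"div({p} grad {C})" for p in prefactors]        # 2 x 10
candidates += [f"lap({p} lap {C})" for C in fields
               for p in ("1", C)]                                   # 4
candidates += [f"-{p}" for p in prefactors]                # source terms: 10
candidates += [f"flux on Gamma{k}" for k in (1, 2, 3, 4)]  # boundary terms: 4
assert len(candidates) == 38
```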
the four datasets, respectively. In particular, our system identification algorithms correctly identify the zero flux boundary condition in Model 1, the constant flux boundary condition on Γ2 in Model 2, and the higher order terms of the form ∇²(∗∇²C) (see Table 2) in Models 3 and 4. The algorithm terminates after the second iteration since no more bases can be eliminated by the F-test. As shown
in Figure 8, for the first two iterations, the loss function remains machine zero as the synthetic
data drive the residual equations (9) down to this limit. If any more bases are eliminated, the loss
function increases dramatically in Iteration 3 due to underfitting.
Figure 8: Value of the loss function at each iteration during stepwise regression using the high fidelity data generated at t = 0 and t = 0.01. Note that the algorithm terminates at Iteration 2, and the value of the loss function increases dramatically if more bases are eliminated, as seen in Iteration 3.
Initially C1 and C2 are close to uniformly distributed, with small perturbations. This uniformity yields bases, such as ∫_Ω ∇w·∇C1 dv and ∫_Ω ∇w C1∇C1 dv, that appear to be linearly dependent (left plot in Figure 9). The high fidelity data (400 × 400 mesh) drive the residual equations (9) down to machine zero, allowing the linear regression model, Equation (30), to converge. The algorithms can therefore select the relevant bases, distinguishing them from other, irrelevant bases that are close to being linearly dependent. However, low fidelity data using NURBS basis functions yield poor representations of derivatives such as those in Equations (43), (47) and (48). As a result, the linear regression model contains greater error using low fidelity data, which are created by directly sparsifying the high-fidelity data instead of being generated from a coarse grid. Therefore, an unresolvable discrepancy remains in the low-fidelity representation, and two terms that are nearly linearly dependent cannot be distinguished. Using low fidelity data at t = 0 and t = 0.01, the algorithms are unable to identify any of the four governing PDEs (Models 1-4).
This spurious linearity, however, disappears as the system evolves (right plot in Figure 9). By
choosing data at 10 time steps: from t = 1 with time step ∆t = 1, the algorithms are able
to correctly identify all PDEs (Models 1-4) using low fidelity data. To quantify the algorithm’s
Figure 9: The nearly uniform distribution, C1 ≈ 0.5, yields a slope of ≈ 1/2 between ∫_Ω ∇w C1∇C1 dv and ∫_Ω ∇w·∇C1 dv at t = 0.01, shown in the left plot. This spurious linearity disappears as the system evolves over longer times, as shown in the right plot.
elimination in the first few iterations. Although we are able to select all relevant terms with this prior, the error increases significantly, as the mesh cannot properly resolve the small variation of the basis, as we see below in Figure 14.

Figure 10: Inferred operators for C1 (left panel) and C2 (right panel) using data generated from Model 1 using ∆t = 1, with 10 snapshots. The identified pre-factors of relevant terms are scaled by their true values: −1∫_Ω ∇w1·∇C1 dv, 0.1∫_Ω w1 dv, −1∫_Ω w1C1 dv, 1∫_Ω w1C1²C2 dv in the left panel and −40∫_Ω ∇w2·∇C2 dv, 0.9∫_Ω w2 dv, −1∫_Ω w2C1²C2 dv in the right panel. They are also arranged from left to right in the same order as they appear in Equations (70) and (71). The original results are summarized in Table 3. The transient results have been included as Movies 1 and 2 in the supplementary material.
Of the four datasets of different fidelity from each model, the high fidelity dataset (400 × 400 mesh) yields exact pre-factors for all bases. However, for the lower fidelity datasets, variations in the data field over a small neighborhood of the domain may diminish as the coarser mesh resolves the variation poorly, i.e., the approximations in Equations (47) and (48) become poor. The variations may even completely disappear, as seen on the 50 × 50 mesh (see the example shown in Figure 14). Therefore, all the results deteriorate when using lower fidelity data. Specifically, for data generated from Model 1, the error gradually increases with decreasing fidelity. Due to the small pre-factor for the constant term, the error in the governing equation of C1 is larger than that in the governing equation of C2.
With the smaller time step, ∆t = 0.001, the noise, amplified by 1/∆t, washes out the true value.
Figure 11: Inferred operators for C1 (left panel) and C2 (right panel) using data generated from Model 2 using ∆t = 1, with 10 snapshots. The identified pre-factors of relevant terms are scaled by their true values: −1∫_Ω ∇w1·∇C1 dv, 0.1∫_Ω w1 dv, −1∫_Ω w1C1 dv, 1∫_Ω w1C1²C2 dv, 0.1∫_{Γ2} w1 ds in the left panel and −40∫_Ω ∇w2·∇C2 dv, 0.9∫_Ω w2 dv, −1∫_Ω w2C1²C2 dv in the right panel. They are also arranged from left to right in the same order as they appear in Equations (72) and (73). The original results are summarized in Table 4. As shown in the left panel, the Neumann boundary condition term (last leaf) is more poorly estimated than the other terms, and fails to be identified on the 50 × 50 mesh. The transient results have been included as Movies 3 and 4 in the supplementary material.
Figure 12: Inferred operators for C1 (left panel) and C2 (right panel) using data generated from Model 3 using ∆t = 1, with 10 snapshots. The identified pre-factors of relevant terms are scaled by their true values, and arranged from left to right in the same order as they appear in Equations (74) and (75). The original results are summarized in Table 5. When using the 50 × 50 mesh, a bias on ∫_Ω ∇w C2²∇C1 dv is needed to protect it from elimination in the first few iterations for inferring the governing PDE of C1. The transient results have been included as Movies 5 and 6 in the supplementary material.
Figure 13: Inferred operators for C1 (left panel) and C2 (right panel) using data generated from Model 4 using ∆t = 1, with 10 snapshots. The identified pre-factors of relevant terms are scaled by their true values, and arranged from left to right in the same order as they appear in Equations (76) and (77). The original results are summarized in Table 6. A bias on ∫_Ω ∇w C2²∇C1 dv is needed to protect it from elimination in the first few iterations for inferring the governing PDE of C1. The transient results have been included as Movies 7 and 8 in the supplementary material.
Figure 14: The field ∇²C1 using the different fidelity datasets generated by Model 3, on the corresponding meshes. As the mesh is coarsened, the more diffuse features diminish in sharpness, and even completely disappear on the 50 × 50 mesh.
Decreasing the mesh size cannot alleviate the noise, as the ratio of the error, denoted by σ, to its true value remains largely fixed. At the larger time step, ∆t = 1, the noise is suppressed, as expressed in Equation (44). For the following analysis, we choose data at the same 10 time steps starting from t = 10, with time step ∆t = 1.
Figure 16 shows the Laplacian basis operator on C2, i.e., Ξ^{∇²C₂}, with data generated by Model 1 using different meshes. For the fine mesh, the noise remains at a high level over an element, and washes out the true value. By coarsening the mesh, the ratio of the final error to the true value decreases, following the h⁻² scaling of Equation (49), and the true value begins to emerge in
the basis generated from noisy data. Consequently, as shown in Figure 17, using high fidelity data
(the 400 × 400 mesh) we are unable to correctly identify the governing equation for C2 using data
generated by Model 1. Using lower fidelity data, however, we are able to identify all the relevant
terms. The error also is smaller when using lower fidelity data. For the same reason, the governing
equation for C2 using data generated by Model 2 also cannot be identified using the 400 × 400
and 200 × 200 meshes, but becomes feasible for lower fidelity data (see Figure 18). On the other
hand, the error is small for the basis operator C1 in both models. As a result, the estimation of the governing PDE for C1 is comparable with that using data of the same fidelity, but without noise. Again, as seen in the previous section, the Neumann boundary condition cannot be identified using very low fidelity data on the 50 × 50 mesh.
Figure 19 shows the biharmonic operator, Ξ^{∇⁴C₁}, using data generated by Model 3 with different meshes.^d Similar to the Laplace operator, the true value of the basis is washed out as the noise
overwhelms the high fidelity data on the fine meshes. By using low fidelity data, we are able
to correctly identify the governing equations for C1 and C2 using data generated from Model 3 as
shown in Figure 20. The results for data generated from Model 4 are mixed. The governing equation
for C1 is similar to that in Model 3 with the fourth-order biharmonic operator in the concentration.
^d This field was computed by projecting the data onto a fourth-order C³ NURBS basis.
Figure 15: Each plot shows the true value of the basis components Ξ^Ċ_i, i = 1, ..., N_{DOF}, generated from noise-free data, in the top subplot; the basis generated from noisy data in the middle subplot; and the histogram of the error between them in the bottom subplot. The error follows a Gaussian distribution with standard deviation denoted by σ. The ratio of the error to the maximum true value, σ / max_i |Ξ^Ċ_i|, is ∼ O(10⁻²) for the top two plots (∆t = 0.001), and decreases with increasing ∆t to ∼ O(10⁻⁴) for the bottom two plots (∆t = 1).
Figure 16: Each panel shows the field ∇²C1 extracted from datasets of varying fidelity, generated by Model 1 with and without noise. The histogram of the error between the noise-free and noisy data appears in the bottom plot. The error appears to follow a Gaussian distribution with standard deviation denoted by σ. All the noise-free data produce similar results, while the noise increasingly dominates as the mesh is refined (the standard deviations of the error distributions scale as h⁻²). The true value begins to emerge in the basis generated from noisy data as the mesh is coarsened.
Figure 17: Inferred operators for C1 (left panel) and C2 (right panel) using data generated from Model 1 using ∆t = 1, with 10 snapshots and σ = 10⁻⁴. The identified pre-factors of relevant terms are scaled by their true values: −1∫_Ω ∇w1·∇C1 dv, 0.1∫_Ω w1 dv, −1∫_Ω w1C1 dv, 1∫_Ω w1C1²C2 dv in the left panel and −40∫_Ω ∇w2·∇C2 dv, 0.9∫_Ω w2 dv, −1∫_Ω w2C1²C2 dv in the right panel. They are also arranged from left to right in the same order as they appear in Equations (70) and (71). The original results are summarized in Table 7. Using data from the 400 × 400 mesh, the governing equation for C2 is identified incorrectly, as seen in the right panel. The transient results have been included as Movies 9 and 10 in the supplementary material.
Figure 18: Inferred operators for C1 (left panel) and C2 (right panel) with data generated from Model 2 using ∆t = 1, with 10 snapshots and σ = 10⁻⁴. The identified pre-factors of relevant terms are scaled by their true values: −1∫_Ω ∇w1·∇C1 dv, 0.1∫_Ω w1 dv, −1∫_Ω w1C1 dv, 1∫_Ω w1C1²C2 dv, 0.1∫_{Γ2} w1 ds in the left panel and −40∫_Ω ∇w2·∇C2 dv, 0.9∫_Ω w2 dv, −1∫_Ω w2C1²C2 dv in the right panel. They are also arranged from left to right in the same order as they appear in Equations (72) and (73). The original results are summarized in Table 8. Using the data from the 400 × 400 or 200 × 200 mesh, the governing equation for C2 is identified incorrectly, as seen in the right panel. The transient results have been included as Movies 11 and 12 in the supplementary material.
Similar to that case, and for the same reason of noise overwhelming the true data, Model 4 cannot
be identified by using high fidelity data, but can be identified by using low fidelity data (see Figure
21). However, the governing equation for C2 does not have the fourth-order biharmonic operator.
Given the small error on (C1, C2), the identification of the governing PDE for C2 is comparable with that using data of the same fidelity, but without noise. Again, as seen in the previous section, for the 50 × 50 mesh, we need to place a bias on ∫_Ω ∇w C2²∇C1 dv to protect it from elimination in the first few iterations for data generated from Models 3 and 4. Also note that the error increases significantly, to 20.7%, with decreasing fidelity, as shown in Table 10.
Finally, for all results using noisy data, the errors are, in general, higher than those obtained using data of the same fidelity, but without noise.
Figure 19: Each plot shows the field ∇⁴C1 using different fidelity datasets with and without noise, generated by Model 3, with the histogram of the error between them in the bottom plot. The error appears to follow a Gaussian distribution with standard deviation denoted by σ. All the noise-free data produce similar results, while the noise washes out the fields very quickly as the mesh is refined (the standard deviations of the errors scale as h⁻⁴). The true value begins to emerge in the basis generated from noisy data as the mesh is coarsened.
Figure 20: Inferred governing equations for C1 (left plot) and C2 (right plot) using data generated from Model 3 using ∆t = 1, with 10 snapshots and σ = 10⁻⁵. The estimated pre-factors of relevant terms are scaled by their true values, and arranged from left to right in the same order as they appear in Equations (74) and (75). The original results are summarized in Table 9. Using data from the 400 × 400 or 200 × 200 meshes, neither governing equation can be identified correctly. The transient results have been included as Movies 13 and 14 in the supplementary material.
Figure 21: Inferred operators for C1 (left plot) and C2 (right plot) using data generated from Model 4 using ∆t = 1, with 10 snapshots and σ = 10⁻⁵. The identified pre-factors of relevant terms are scaled by their true values, and arranged from left to right in the same order as they appear in Equations (76) and (77). The original results are summarized in Table 9. Using data from the 400 × 400 or 200 × 200 meshes, the governing equation for C1 is identified incorrectly, as seen in the left panel. The transient results have been included as Movies 15 and 16 in the supplementary material.
4 Discussion and conclusions

The development of patterns in biophysics and materials physics is governed by a range of spatio-temporal PDEs. It is compelling to attempt to discover the analytic forms of these PDEs from data, because doing so immediately provides insight into the governing physics. System identification
has been explored using the strong form of the PDEs [10, 11, 13, 14, 15], as discussed in the
Introduction. However the equivalent weak forms, which can offer capabilities not achievable by
strong forms, have remained under-exploited for system identification. The weak form transfers
derivatives to the weighting function, and allows smooth evaluation of the remaining, possibly
higher-order, derivatives when NURBS basis functions are employed. In our numerical experiments,
NURBS basis functions allow the fourth-order terms in the Cahn-Hilliard and Allen-Cahn equations
to be correctly identified with higher accuracy estimation of the pre-factors using data at different
fidelity, with and without noise (results in Tables 5, 6, 9 and 10). Another advantage of writing the
PDEs in weak form is that Neumann boundary conditions are explicitly included as surface integrals.
During system identification, this class of boundary conditions can be, therefore, constructed along
with other operators in the PDEs. By incorporating these terms in the regression model, Equation
(30), identification of boundary conditions becomes feasible (results in Tables 4 and 8).
Intuitively high fidelity data is favored as it minimizes the residual equations, and can accurately
evaluate the time derivative and spatial gradient. For data without noise, the results enjoy these
advantages (results in Tables 3, 4, 5 and 6 and Figures 10 - 13). However, the error embedded
in noisy data is exaggerated relative to the true signal by small time steps and element lengths,
as discussed in the context of Equations (43), (45) and (46). Low fidelity data in time and space
may allow greater variations in the true signal relative to the noise, as shown in Figures 15, 16
and 19, and thus improve the results of system identification (results in Tables 7, 8, 9 and 10). On
the other hand, low fidelity data may yield inaccurate estimation of the pre-factors due to poor
evaluation of the operators (Figure 14). Data of very low fidelity may be incapable of capturing
the high-frequency content of solution fields, thus leading to entirely incorrect identification of the
operators. For example, without an injection of prior knowledge, we were unable to identify high frequency operators such as ∫_Ω ∇w C2²∇C1 dv in Models 3 and 4 using low fidelity data on the 50 × 50 mesh. The careful selection of data could improve these results, but that direction for
study lies beyond the scope of this first communication.
We have demonstrated our algorithm on data at different fidelity with and without noise,
generated from three different types of PDEs. In this first communication, we do not aim to
systematically study the uncertainty in the models induced by the lack of fidelity and noise, nor to
optimize the data extraction from simulations/experiments. However we point out several factors
that could influence the results:
1. Two analytically independent bases may appear to be linearly dependent over certain time
intervals (Figure 9). This spurious linearity could disappear as the system evolves.
2. Low fidelity data on coarse meshes resolves the variation of fields poorly (see Figure 14).
However it helps by allowing larger variation of the true signal relative to the noise. It thus
can effectively smooth out the error in bases constructed from noisy data (Figures 16 and
19).
3. Fields may evolve at different rates, and be distributed with different gradients (Figures
14 and 15 ). As a result a chosen basis may be more prominent at certain DOFs than at
others, and contribute more in the linear regression model. Using DOFs over subdomains
may improve the results of system identification. Using partial data in system identification
has been seen to reduce the computational expense [15], but attempting to systematically
improve the results by choosing partial data at certain times has not been widely explored.
4. We found that results improve when the Neumann boundary condition bases are not included. We could correctly identify the governing equation generated from Model 1 with larger noise by not including the boundary condition bases. We suspect that since the Neumann boundary condition basis is zero at all inner DOFs, it renders the matrix of bases, Ξ, close to singular.
5. Any prior knowledge of the system would help to select the correct basis from many can-
didates. For example we placed a bias on certain bases when identifying PDEs using data
generated from Model 3 with low fidelity data. Similarly, knowing that Model 3 is a conserved
system, all bases contributing to the reaction (source) terms can be pre-eliminated.
In this work, we currently use standard linear regression models for each iteration of the step-
wise regression algorithm. Bayesian frameworks present a natural path to systematically inject prior
knowledge and domain expertise, and to quantify and propagate the uncertainty induced by lack
of model fidelity and data noise. Furthermore, they enable techniques for comparing competing
models that are rooted in a rigorous probabilistic paradigm, such as with the Bayesian model
selection and model averaging methods. Bayesian analysis of our methods will appear in future
communications.
Acknowledgements
We acknowledge the support of Toyota Research Institute, Award #849910, “Computational frame-
work for data-driven, predictive, multi-scale and multi-physics modeling of battery materials” (ZW
and KG). Additional support: This material is based upon work supported by the Defense Advanced
Research Projects Agency (DARPA) under Agreement No. HR0011199002, “Artificial Intelligence
guided multi-scale multi-physics framework for discovering complex emergent materials phenom-
ena” (ZW, XH and KG). Simulations in this work were performed using the Extreme Science and
Engineering Discovery Environment (XSEDE) Stampede2 at the Texas Advanced Computing Cen-
ter through allocations TG-MSS160003 and TG-DMR180072. XSEDE is supported by National
Science Foundation grant number ACI-1548562.
Appendix
Table 3: Results using data generated from Model 1 using ∆t = 1, with 10 snapshots.
mesh size: 400 × 400
∫_Ω w1 ∂C1/∂t dv = ∫_Ω −1∇w1 · ∇C1 dv + ∫_Ω w1 (0.1 − C1 + 1C1²C2) dv + ∫_Γ2 w1 0.1 ds (error: 0%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω −40∇w2 · ∇C2 dv + ∫_Ω w2 (0.9 − 1C1²C2) dv (error: 0%)

mesh size: 200 × 200
∫_Ω w1 ∂C1/∂t dv = ∫_Ω −0.982∇w1 · ∇C1 dv + ∫_Ω w1 (0.106 − 1.0027C1 + 0.997C1²C2) dv + ∫_Γ2 w1 0.1305 ds (error: 30%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω −40.23∇w2 · ∇C2 dv + ∫_Ω w2 (0.90043 − 1.0006C1²C2) dv (error: 0.57%)

mesh size: 100 × 100
∫_Ω w1 ∂C1/∂t dv = ∫_Ω −0.9793∇w1 · ∇C1 dv + ∫_Ω w1 (0.1095 − 1.0036C1 + 0.9949C1²C2) dv + ∫_Γ2 w1 0.1471 ds (error: 47.1%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω −40.751∇w2 · ∇C2 dv + ∫_Ω w2 (0.9028 − 1.0031C1²C2) dv (error: 1.87%)

mesh size: 50 × 50
∫_Ω w1 ∂C1/∂t dv = ∫_Ω −0.884∇w1 · ∇C1 dv + ∫_Ω w1 (0.1172 − 0.98576C1 + 0.9643C1²C2) dv (error: 17%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω −41.985∇w2 · ∇C2 dv + ∫_Ω w2 (0.9076 − 1.0118C1²C2) dv (error: 5%)
Table 4: Results using data generated from Model 2 with ∆t = 1, with 10 snapshots. The high
error in the governing equation of C1 is induced by the Neumann boundary condition term, which
cannot be identified on the 50 × 50 mesh.
Table 5: Results using data generated from Model 3 using ∆t = 1, with 10 snapshots. A bias on
∫_Ω ∇w C2²∇C1 dv is needed to protect it from elimination in the first few iterations.
mesh size: 400 × 400
∫_Ω w1 ∂C1/∂t dv = ∫_Ω ∇w1 · (−17.8126 + 47.98C1 + 21.591C2 − 47.98C1² − 15.9933C2²) ∇C1 dv + ∫_Ω ∇w1 · (−10.7955 + 21.591C1 + 15.9933C2 − 31.9867C1C2) ∇C2 dv + ∫_Ω −1∇²w1 ∇²C1 dv (error: 0%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω −1∇w2 · ∇C2 dv + ∫_Ω w2 (3.5085 − 10.7955C1 − 12.2149C2 + 10.7955C1² + 15.9933C1C2 + 21.1912C2² − 15.9933C1²C2 − 15.9933C2³) dv (error: 0%)

mesh size: 200 × 200
∫_Ω w1 ∂C1/∂t dv = ∫_Ω ∇w1 · (−17.8589 + 48.06C1 + 21.7044C2 − 48.0635C1² − 16.09C2²) ∇C1 dv + ∫_Ω ∇w1 · (−10.8207 + 21.64C1 + 16.0478C2 − 32.089C1C2) ∇C2 dv + ∫_Ω −1.0123∇²w1 ∇²C1 dv (error: 1.2%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω −1.0114∇w2 · ∇C2 dv + ∫_Ω w2 (3.498 − 10.7636C1 − 12.191C2 + 10.7663C1² + 15.9459C1C2 + 21.1655C2² − 15.9499C1²C2 − 15.9752C2³) dv (error: 1.14%)

mesh size: 100 × 100
∫_Ω w1 ∂C1/∂t dv = ∫_Ω ∇w1 · (−17.3435 + 46.4025C1 + 21.4151C2 − 46.4119C1² − 15.9544C2²) ∇C1 dv + ∫_Ω ∇w1 · (−10.4451 + 20.8839C1 + 15.5847C2 − 31.1354C1C2) ∇C2 dv + ∫_Ω −1.0092∇²w1 ∇²C1 dv (error: 3.28%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω −1.0458∇w2 · ∇C2 dv + ∫_Ω w2 (3.46 − 10.6416C1 − 12.096C2 + 10.6545C1² + 15.7652C1C2 + 21.056C2² − 15.7843C1²C2 − 15.8966C2³) dv (error: 4.58%)

mesh size: 50 × 50
∫_Ω w1 ∂C1/∂t dv = ∫_Ω ∇w1 · (−14.9026 + 38.6896C1 + 19.735C2 − 38.689C1² − 14.989C2²) ∇C1 dv + ∫_Ω ∇w1 · (−8.5697 + 17.1244C1 + 13.1685C2 − 26.231C1C2) ∇C2 dv + ∫_Ω −0.9216∇²w1 ∇²C1 dv (error: 20.6%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω −1.1476∇w2 · ∇C2 dv + ∫_Ω w2 (3.3323 − 10.2516C1 − 11.7393C2 + 10.3C1² + 15.191C1C2 + 20.5589C2² − 15.2618C1²C2 − 15.5277C2³) dv (error: 14.7%)
Table 6: Results using data generated from Model 4 using ∆t = 1, with 10 snapshots. A bias on
∫_Ω ∇w C2²∇C1 dv is needed to protect it from elimination in the first few iterations.
Table 7: Results using data generated from Model 1 using ∆t = 1, with 10 snapshots and σ = 10⁻⁴.
Table 8: Results using data generated from Model 2 using ∆t = 1, with 10 snapshots and σ = 10⁻⁴.
mesh size: 400 × 400
∫_Ω w1 ∂C1/∂t dv: identified wrongly (error: N/A)
∫_Ω w2 ∂C2/∂t dv: identified wrongly (error: N/A)

mesh size: 200 × 200
∫_Ω w1 ∂C1/∂t dv: identified wrongly (error: N/A)
∫_Ω w2 ∂C2/∂t dv: identified wrongly (error: N/A)

mesh size: 100 × 100
∫_Ω w1 ∂C1/∂t dv = ∫_Ω ∇w1 · (−13.7499 + 36.3723C1 + 17.9356C2 − 36.3702C1² − 13.9C2²) ∇C1 dv + ∫_Ω ∇w1 · (−7.6263 + 15.2494C1 + 11.158C2 − 22.32C1C2) ∇C2 dv + ∫_Ω −0.7919∇²w1 ∇²C1 dv (error: 30.2%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω ∇w2 · (−6.9381 + 13.8269C1 + 10.1406C2 − 20.2006C1C2) ∇C1 dv + ∫_Ω ∇w2 · (−9.1114 + 13.4938C1 + 29.8767C2 − 13.5301C1² − 33.5644C2²) ∇C2 dv + ∫_Ω −0.735∇²w2 ∇²C2 dv (error: 30.3%)

mesh size: 50 × 50
∫_Ω w1 ∂C1/∂t dv = ∫_Ω ∇w1 · (−15.1116 + 40.105C1 + 19.6542C2 − 40.132C1² − 15.2351C2²) ∇C1 dv + ∫_Ω ∇w1 · (−8.3106 + 16.6052C1 + 12.1407C2 − 24.2681C1C2) ∇C2 dv + ∫_Ω −0.9238∇²w1 ∇²C1 dv (error: 24.1%)
∫_Ω w2 ∂C2/∂t dv = ∫_Ω ∇w2 · (−7.6666 + 15.2396C1 + 11.2172C2 − 22.272C1C2) ∇C1 dv + ∫_Ω ∇w2 · (−10.1748 + 15.3511C1 + 33.0199C2 − 15.4756C1² − 36.9985C2²) ∇C2 dv + ∫_Ω −0.8631∇²w2 ∇²C2 dv (error: 30.3%)
Table 9: Results using data generated from Model 3 using ∆t = 1, with 10 snapshots and σ = 10⁻⁵.
Table 10: Results using data generated from Model 4 using ∆t = 1, with 10 snapshots and σ = 10⁻⁵.
References
1. H. Voss and J. Kurths. Reconstruction of nonlinear time delay models from data by the use of
optimal transformations. Phys. Lett. A., 234:336–344, 1997.
4. P. J. Attar and E. H. Dowell. A reduced order system ID approach to the modelling of nonlinear
structural behavior in aeroelasticity. J. Fluids Struct., 21, 2005.
5. M. Khalil, S. Adhikari, and A. Sarkar. Linear system identification using proper orthogonal
decomposition. Mech Syst Signal Process., 21, 2007.
6. L. Z. Guo, S. A. Billings, and D. Coca. Identification of partial differential equation models for
a class of multiscale spatio-temporal dynamical systems. Int. J. Control., 83, 2010.
10. M. Schmidt and H. Lipson. Distilling free-form natural laws from experimental data. Science,
324, 2009.
12. M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural networks: A deep
learning framework for solving forward and inverse problems involving nonlinear partial differ-
ential equations. J. Comput. Phys., 378, 2019.
13. S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data by
sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci., 113, 2016.
14. N. M. Mangan, S. L. Brunton, J. L. Proctor, and J. N. Kutz. Inferring biological networks by
sparse identification of nonlinear dynamics. IEEE Trans. Mol. Biol. Multi-Scale Commun., 2,
2016.
16. M. Quade, M. Abel, J. N. Kutz, and S. L. Brunton. Sparse identification of nonlinear dynamics
for rapid model recovery. Chaos, 28, 2018.
17. N. M. Mangan, T. Askham, S. L. Brunton, J. N. Kutz, and J. L. Proctor. Model selection for hy-
brid dynamical systems via sparse regression. Proceedings of the Royal Society A: Mathematical,
Physical and Engineering Sciences, 475(2223):20180534, 2019. doi: 10.1098/rspa.2018.0534.
19. A. M. Turing. The chemical basis of morphogenesis. Phil. Trans. Roy. Soc. Lond. Ser. B., 237,
1952.
20. A. Gierer and H. Meinhardt. A theory of biological pattern formation. Kybernetik, 12, 1972.
21. J. D. Murray. On pattern formation mechanisms for lepidopteran wing patterns and mammalian
coat markings. Phil. Trans. Roy. Soc. Lond. Ser. B, 295, 1981.
22. R. Dillon, P. K. Maini, and H. G. Othmer. Pattern formation in generalized Turing systems I:
Steady-state patterns in systems with mixed boundary conditions. J. Math. Biol., 32, 1994.
23. R. A. Barrio, C. Varea, and J. L. Aragon. A two-dimensional numerical study of spatial pattern
formation in interacting Turing systems. Bull. Math. Biol., 61, 1999.
25. Philip K. Maini, Thomas E. Woolley, Ruth E. Baker, Eamonn A. Gaffney, and S. Seirin Lee.
Turing’s model for biological pattern formation and the robustness problem. Interface Focus,
2(4):487–496, 2012. doi: 10.1098/rsfs.2011.0113.
26. F. Spill, P. Guerrero, T. Alarcon, P. K. Maini, and H. Byrne. Hybrid approaches for multiple-
species stochastic reaction-diffusion models. J. Comput. Phys., 299, 2015.
27. K. Korvasová, E. A. Gaffney, P. K. Maini, M. A. Ferreira, and V. Klika. Investigating the Turing
conditions for diffusion-driven instability in the presence of a binding immobile substrate. J.
Theor. Biol., 367, 2015.
28. K. Garikipati. Perspectives on the mathematics of biological patterning and morphogenesis. J.
Mech. Phys. Solids., 99, 2017.
29. J. W. Cahn and J. E. Hilliard. Free energy of a nonuniform system. I. Interfacial energy. J.
Chem. Phys., 28, 1958.
31. V. Cristini, X. Li, J. S. Lowengrub, and S. M. Wise. Nonlinear simulations of solid tumor
growth using a mixture model: invasion and branching. J. Math. Biol., 58, 2009.
32. J. S. Lowengrub, H. B. Frieboes, F. Jin, Y-L. Chuang, X. Li, P. Macklin, S. M. Wise, and
V. Cristini. Nonlinear modelling of cancer: bridging the gap between cells and tumours. Non-
linearity, 23, 2010.
33. J. S. Lowengrub, A. Rätz, and A. Voigt. Phase-field modeling of the dynamics of multicom-
ponent vesicles: Spinodal decomposition, coarsening, budding, and fission. Phys. Rev. E, 79,
2009.
34. G. Vilanova, I. Colominas, and H. Gomez. Capillary networks in tumor angiogenesis: From
discrete endothelial cells to phase-field averaged descriptions via isogeometric analysis. Num.
Meth. Biomed. Eng., 29, 2013.
35. G. Vilanova, I. Colominas, and H. Gomez. Coupling of discrete random walks and continuous
modeling for three-dimensional tumor-induced angiogenesis. Comput. Mech., 53, 2014.
36. J. T. Oden, A. Hawkins, and S. Prudhomme. General diffuse-interface theories and an approach
to predictive tumor growth modeling. Math. Mod. Meth. App. Sci., 20, 2010.
37. J. Xu, G. Vilanova, and H. Gomez. A mathematical model coupling tumor growth and angio-
genesis. PLoS ONE, 11, 2016.
38. T. Jiang, S. Rudraraju, A. Roy, A. Van der Ven, K. Garikipati, and M. L. Falk. Multi-physics
simulations of lithiation-induced stress in Li1+xTi2O4 electrode particles. J. Phys. Chem. C, 120, 2016.
39. S. Rudraraju, A. Van der Ven, and K. Garikipati. Mechano-chemical spinodal decomposition:
A phenomenological theory of phase transformations in multi-component crystalline solids.
npj Computational Materials, 2, 2016.
40. G.H. Teichert, S. Rudraraju, and K. Garikipati. A variational treatment of material config-
urations with application to interface motion and microstructural evolution. Journal of the
Mechanics and Physics of Solids, 99, 2017.
41. Samuel M. Allen and John W. Cahn. A microscopic theory for antiphase boundary motion and
its application to antiphase domain coarsening. Acta Metall., 27, 1979.
42. V. Vaithyanathan, C. Wolverton, and L.Q. Chen. Multiscale modeling of precipitate microstruc-
ture evolution. Phys. Rev. Lett., 88, 2002.
43. S.Y. Hu and L-Q. Chen. A phase field model for evolving microstructures with strong elastic
inhomogeneity. Acta Mater., 49, 2001.
44. J.Z. Zhu, Z.K. Liu, V. Vaithyanathan, and L-Q. Chen. Linking phase-field model to CALPHAD:
application to precipitate shape evolution in Ni-base alloys. Scripta Mater., 46, 2002.
45. C.H. Su and P.W. Voorhees. The dynamics of precipitate evolution in stressed solids-I. Inverse
coarsening. Acta Mater., 44, 1996.
46. S. G. Kim, W. T. Kim, and T. Suzuki. Phase-field model for binary alloys. Phys. Rev.
E, 60, 1999.
47. Y. Gao, H. Liu, R. Shi, N. Zhou, Z. Xu, Y.M. Zhu, J.F. Nie, and Y. Wang. Simulation study
of precipitation in an Mg–Y–Nd alloy. Acta Mater., 60, 2012.
48. H. Liu, Y. Gao, J.Z. Liu, Y.M. Zhu, Y. Wang, and J.F. Nie. A simulation study of the shape
of β′ precipitates in Mg–Y and Mg–Gd alloys. Acta Mater., 61, 2013.
49. Y.Z. Ji, A. Issa, T.W. Heo, J.E. Saal, C. Wolverton, and L.-Q. Chen. Predicting β′ precipitate
morphology and evolution in Mg–RE alloys using a combination of first-principles calculations
and phase-field modeling. Acta Mater., 76, 2014.
50. H. Liu, W.F. Xu, L.M. Peng, W.J. Ding, and J.F. Nie. A simulation study of the distribution
of β′ precipitates in a crept Mg–Gd–Zr alloy. Comput. Mater. Sci., 130, 2017.
51. H.-J. Jou, P.H. Leo, and J.S. Lowengrub. Microstructural evolution in inhomogeneous elastic
media. J. Comput. Phys., 131, 1997.
52. G. Teichert and K. Garikipati. Machine learning materials physics: Surrogate optimization
and multi-fidelity algorithms predict precipitate morphology in an alternative to phase field
dynamics. Computer Methods in Applied Mechanics and Engineering, 344, 2019.
53. I. Knowles and R. J. Renka. Methods for numerical differentiation of noisy data. Electron. J.
Differ. Eq., 21, 2014.
54. J. Schnakenberg. Network theory of microscopic and macroscopic behavior of master equation
systems. Rev. Mod. Phys., 48, 1976.
55. M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial
differential equations. J. Comput. Phys., 357, 2018.
56. J. Han, A. Jentzen, and W. E. Solving high-dimensional partial differential equations using
deep learning. Proc. Natl. Acad. Sci., 115, 2018.
57. J. Cottrell, T. Hughes, and Y. Bazilevs. Isogeometric Analysis: Toward Integration of CAD and
FEA. Wiley, Chichester, 2009.
58. L. Piegl and W. Tiller. The NURBS Book, 2nd ed. Springer-Verlag New York, Inc., New York,
NY, USA, 1997.
59. Marc C. Kennedy and Anthony O'Hagan. Bayesian calibration of computer models. Journal
of the Royal Statistical Society: Series B (Statistical Methodology), 63(3):425–464, 2001. doi:
10.1111/1467-9868.00294.
61. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning.
Springer, New York, NY, 2nd edition, 2009.
62. Richard R. Picard and R. Dennis Cook. Cross-Validation of Regression Models. Journal of the
American Statistical Association, 79(387):575–583, 1984.
63. Hirotugu Akaike. A New Look at the Statistical Model Identification. IEEE Transactions on
Automatic Control, 19(6):716–723, 1974. ISSN 15582523. doi: 10.1109/TAC.1974.1100705.
64. Gideon Schwarz. Estimating the Dimension of a Model. The Annals of Statistics, 6(2):461–464,
1978. ISSN 0090-5364. doi: 10.1214/aos/1176344136.
65. Robert E. Kass and Adrian E. Raftery. Bayes Factors. Journal of the American Statistical
Association, 90(430):773–795, 1995. ISSN 01621459. doi: 10.2307/2291091.
66. Larry Wasserman. Bayesian Model Selection and Model Averaging. Journal of Mathematical
Psychology, 44:92–107, 2000. doi: 10.1006/jmps.1999.1278.
67. S. Rudraraju, A. Van der Ven, and K. Garikipati. Mechanochemical spinodal decomposition: a
phenomenological theory of phase transformations in multi-component, crystalline solids. npj
Comput. Mater., 2, 2016.