
Nuclear Science and Techniques (2023) 34:161

https://doi.org/10.1007/s41365-023-01313-0

Physics-constrained neural network for solving discontinuous interface K-eigenvalue problem with application to reactor physics
Qi-Hong Yang1 · Yu Yang1 · Yang-Tao Deng1 · Qiao-Lin He1 · He-Lin Gong2 · Shi-Quan Zhang1

Received: 7 June 2023 / Revised: 28 July 2023 / Accepted: 25 August 2023 / Published online: 13 November 2023
© The Author(s) 2023

Abstract
Machine learning-based modeling of reactor physics problems has attracted increasing interest in recent years. Despite some progress in one-dimensional problems, there is still a paucity of benchmark studies on problems that are easy to solve with traditional numerical methods yet remain challenging for neural networks across a wide range of practical applications. We present two networks, namely the Generalized Inverse Power Method Neural Network (GIPMNN) and the Physics-Constrained GIPMNN (PC-GIPMNN), to solve K-eigenvalue problems in neutron diffusion theory. GIPMNN follows the main idea of the inverse power method and determines the lowest eigenvalue using an iterative method. The PC-GIPMNN additionally enforces conservative interface conditions for the neutron flux. Meanwhile, the Deep Ritz Method (DRM) directly solves the smallest eigenvalue by minimizing the eigenvalue in Rayleigh quotient form. A comprehensive study was conducted using GIPMNN, PC-GIPMNN, and DRM to solve problems of complex spatial geometry with varying material domains from the field of nuclear reactor physics. The methods were compared with the standard finite element method. The applicability and accuracy of the methods are reported and indicate that PC-GIPMNN outperforms GIPMNN and DRM.

Keywords Neural network · Reactor physics · Neutron diffusion equation · Eigenvalue problem · Inverse power method


This work was partially supported by the National Natural Science Foundation of China (No. 11971020), the Natural Science Foundation of Shanghai (No. 23ZR1429300), and the Innovation Funds of CNNC (Lingchuang Fund).

Corresponding authors: Qiao-Lin He ([email protected]); He-Lin Gong ([email protected])

1 School of Mathematics, Sichuan University, Chengdu 610065, China
2 Paris Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai 200240, China

1 Introduction

In the nuclear engineering domain, the fundamental mode solution of the K-eigenvalue problem based on the steady-state multigroup neutron diffusion theory is crucial for the simulation and analysis of nuclear reactors. The eigenvalue equation can be expressed as follows:

$$
-\nabla \cdot (D_g \nabla \phi_g) + \Sigma_{R}^{g} \phi_g - \sum_{g' \neq g} \Sigma_{s}^{g' \to g} \phi_{g'} = \frac{1}{K}\, \chi_g \sum_{g'} \nu \Sigma_{f}^{g'} \phi_{g'}, \quad g = 1, 2, \ldots, G, \tag{1}
$$

where φ_g ≡ φ_g(r) denotes the neutron flux at spatial point r in the g-th energy group, and D_g, Σ_R^g, Σ_s^{g'→g}, χ_g, and νΣ_f^g denote spatially dependent (possibly discontinuous) parameters that reflect the material properties in a reactor core [1]. Nuclear engineers and analysts must numerically determine the fundamental mode eigenvalue (commonly called K_eff) and the corresponding eigenvector for a given geometry/material configuration. For a numerical solution, Eq. (1) must be discretized and reduced to a set of G coupled algebraic equations, which can be expressed using matrices as follows:

$$
M \Phi = \frac{1}{K} F \Phi. \tag{2}
$$

This is often termed the generalized eigenvalue problem because coefficient matrices occur on both sides of the equation. Many mature numerical methods, such as the finite


difference method [2], nodal collocation method [3], finite element method [4–6], and nodal expansion method [7–9], have been proposed to solve neutron diffusion equations. Among these methods, the nodal expansion method is widely used because it is easier to implement and requires less computational effort than other methods. The power iteration method is the most well-known numerical method for solving the principal K-eigenvalue [10]. A detailed review of conventional numerical methods for solving K-eigenvalue problems is available in recent nuclear reactor physics textbooks [1, 11, 12]. Although traditional and mature numerical methods are currently widely used in nuclear reactor physics to solve K-eigenvalue problems with acceptable engineering accuracy and computational cost, state-of-the-art neural networks for solving K-eigenvalue problems are still in their infancy. Nevertheless, neural networks have the potential to provide another option for solving nuclear engineering problems as new algorithms and hardware progress.

As mentioned above, many traditional numerical methods have been proposed to solve neutron diffusion equations. However, a dense mesh is required to ensure highly accurate results. The model consumes more computational resources if the mesh is extremely dense, and inaccurate solutions are obtained if the mesh is coarse. Moreover, for high-dimensional problems, classical methods are either less efficient or unsuccessful because of the curse of dimensionality. A neural network can potentially enhance its performance by using a capable hypothesis space due to its relatively low statistical error [13].

Neural networks exhibit multiple potential benefits in solving nuclear engineering problems when compared with conventional numerical methods.

• They provide mesh-free solutions to approximate physics fields in nuclear reactors, in which mesh generation is significantly complex due to the high heterogeneity of geometry and materials.
• They provide a general framework to solve high-dimensional problems governed by parameterized PDEs, particularly the original neutron transport equation. This equation comprises seven variables: three spatial variables, two directional variables, one energy variable, and one temporal variable.
• They are able to seamlessly incorporate prior data (potentially with noise) into existing algorithms, given that data assimilation is otherwise necessary as a post-process for conventional numerical methods [14–19].
• They provide a general framework to solve inverse problems with hidden physics, which are typically prohibitively expensive and require different formulations and elaborate computer codes with conventional numerical methods.

In recent years, neural networks have been widely used to solve Partial Differential Equations (PDEs) [20] and have achieved remarkable success. Based on neural networks, a large number of methods have been proposed to solve PDEs, such as the deep backward stochastic differential equation (BSDE) method [21, 22], the deep Galerkin method (DGM) [23], the deep Ritz method (DRM) [24], and the Physics-Informed Neural Network (PINN) [25]. The deep BSDE method reformulates PDEs using backward stochastic differential equations and approximates the gradient of the unknown solution using neural networks. Although DGM and PINN appear independently under two names in the literature, they are similar: both train neural networks by minimizing the mean squared error loss of the residual equation. In contrast, DRM reformulates the original problem into a variational problem and trains neural networks by minimizing the energy functional of the variational problem. Specifically, DRM and PINN [25] have attracted widespread attention. Extensive studies have applied these methods to a variety of problems, including fiber optics [26], hyperelasticity [27], solid mechanics [28], heat transfer problems [29–31], inverse problems [32–35], and uncertainty quantification [36–39]. However, only a few studies have focused on solving eigenvalue problems [24, 40–45].

The need to solve eigenvalue problems with neural networks can be traced back to 2018 [24]. A deep Ritz method to solve variational problems was proposed, and several examples elucidated how to use DRM to solve eigenvalue problems [24]. First, the original eigenvalue problem was transformed into a variational problem. Then, a specially defined loss function, termed the Rayleigh quotient, was constructed using the variational principle [46]. The Rayleigh quotient is a well-known approximation of the eigenvalue of a matrix A, defined by

$$
R = \frac{x^{\mathrm{T}} A x}{x^{\mathrm{T}} x}.
$$

Finally, they minimized the loss function and obtained the smallest eigenvalue. Similarly, some studies [41, 42] directly used PINN to solve eigenvalue problems. In contrast to DRM, which transfers the original problem to a variational problem, PINN solves eigenvalue problems without variation. Neural networks are used in PINN to represent the function, and automatic differentiation (AD) [47] is used to acquire the quantities acted on by the differential operators. The loss function is obtained using the Rayleigh quotient. The smallest eigenvalue and corresponding eigenvector are then solved for using optimization tools. Moreover, the deep forward-backward stochastic differential equation (FBSDE) method [48] was proposed to solve eigenvalue problems; it is an extension of the deep BSDE method, in which the eigenvalue problem is reformulated as a fixed-point problem of the semigroup flow induced by the operator. Other studies [43, 44] proposed an alternative method to learn the eigenvalue problem by adding one or two regularization terms to the loss function. More recently, in Ref.


[49], a neural network framework based on the power method [50] was presented to solve eigenvalue problems and smallest eigenvalue problems, where the eigenfunction is expressed by the neural network and iteratively solved following the idea of the power method or inverse power method. However, the scope was limited to linear operators and certain special eigenvalue problems.

In a recent study, PINN was applied to solve neutron diffusion equations [45, 51], where the authors used a free learnable parameter to approximate the eigenvalue and a novel regularization technique to exclude null solutions from the PINN framework. A conservative physics-informed neural network (cPINN) was proposed in discrete domains for nonlinear conservation laws [52]. Moreover, cPINN [53] was applied to solve heterogeneous neutron diffusion problems in one-dimensional cases; it develops a PINN for each subdomain and considers additional conservation laws along the interfaces of subdomains (a general consideration in reactor physics [11]), with the eigenvalue involved in the neural network training as a variable to be optimized.

More recently, a data-enabled physics-informed neural network (DEPINN) [40] was proposed to solve neutron diffusion eigenvalue problems. To achieve acceptable engineering accuracy for complex engineering problems, it is suggested that a very small amount of prior data from physical experiments be used to improve the training accuracy and efficiency. In contrast to PINN, which solves the neutron diffusion eigenvalue problem directly, an autoencoder-based machine learning method in combination with the reduced-order method [54, 55] was proposed [56] to solve the same problem. However, it still relies on solving the governing equations with traditional numerical methods such as the finite difference method.

Although DRM provides a way to solve eigenvalue problems with neural networks, as shown in Sect. 4.2.2, our results indicate that DRM is not stable when solving two-dimensional cases. First, DRM learns the eigenvalue and eigenfunction at an early stage of the training process. Subsequently, DRM attempts to learn a smaller eigenvalue after it is close to the true eigenvalue. Finally, DRM learns one smaller eigenvalue that may be close to the true eigenvalue and one incorrect eigenfunction that is meaningless and far from the true eigenfunction. Additionally, the framework of a neural network based on the power method [50] is unsuitable for generalized eigenvalue problems. Therefore, it is necessary to propose a new algorithm to solve K-eigenvalue problems.

This study focuses on eigenvalue problems that are also interface problems, in which the eigenfunctions are continuous on the interface while the derivatives of the eigenfunctions are not continuous at the interface. Specifically, in the nuclear reactor physics domain, this is a general situation in which the reactor core is composed of fuel assemblies of different fissile nuclide enrichments [1]. Some studies have focused on the use of neural networks to solve elliptic interface problems. Some researchers [57] used the idea of DRM and formulated the PDEs as variational problems, which can then be solved using a deep learning approach. A novel mesh-free numerical method for solving elliptic interface problems based on deep learning was presented in [58]; the authors employed different neural networks in different subdomains and reformulated the problem as a least-squares problem. In a similar spirit, the authors of [53] enforce the interface conditions using a piecewise neural network. In contrast to these methods, a discontinuity-capturing shallow neural network (DCSNN) [59] has been proposed for elliptic interface problems. The crucial concept of DCSNN is that a d-dimensional piecewise continuous function can be extended to a continuous function defined in (d + 1)-dimensional space, where the augmented coordinate variable labels the pieces of each subdomain. However, to the best of the authors' knowledge, only a few studies have focused on using neural networks to solve eigenvalue problems that incorporate interface problems involving regions of different materials. Challenges exist on at least three fronts.

• Designing a neural network that is more suitable for K-eigenvalue problems on more complicated, medium-size test problems.
• Dealing with the interface problem in a more general and understandable manner when designing the neural network.
• Proposing a framework that effectively enhances the robustness of the neural network and improves the efficiency of utilizing noisy prior data.

To address the aforementioned challenges and advance beyond the state-of-the-art research [45, 51, 53], we build on our earlier study [40], which served as a preliminary demonstration of the applicability of the PINN approach to reactor physics eigenvalue problems in complex engineering scenarios. The contributions of this study are as follows.

• Firstly, we extend the Inverse Power Method Neural Network (IPMNN) [49] to the so-called Generalized Inverse Power Method Neural Network (GIPMNN) to solve for the smallest eigenvalue and the related eigenfunction of the generalized K-eigenvalue problems. Compared to DEPINN from our previous study [53], we omit the prior data in the training process and attempt to solve the K-eigenvalue problems from a data-free mathematical/numerical perspective.
• Then, we propose a Physics-Constrained GIPMNN (PC-GIPMNN) to address the interface problem in a more general and understandable manner with respect to previous studies [45, 51, 53].


• Finally, we conduct a thorough comparative study of GIPMNN, PC-GIPMNN, and DRM using a variety of numerical tests. We evaluate the applicability and accuracy of these three methods on typical 1D and 2D test cases in reactor physics, particularly accounting for material discontinuities in different geometries. In the 1D example, we determine the optimal ratio of outer to inner iterations, a finding that may be particularly relevant for GIPMNN. Additionally, we observe the failure of DRM in the 2D experiments, whereas PC-GIPMNN consistently outperforms GIPMNN and DRM over a set number of epochs.

The rest of this paper is organized as follows. The governing equations for the eigenvalue problems are presented in Sect. 2. In Sect. 3, we propose GIPMNN and PC-GIPMNN and introduce DRM for our cases. In Sect. 4, the results of 1D and 2D test cases are presented to verify the three methods. Finally, conclusions and future research are discussed in Sect. 5.

2 K-eigenvalue problems

This section introduces the equations that govern neutron criticality over a spatial domain. We recall Eqs. (1) and (2). The generalized K-eigenvalue problem can be formulated as follows:

$$
\begin{cases}
\mathcal{L}\phi = \lambda\, \mathcal{Q}\phi, & \text{in } \Omega,\\
\mathcal{B}\phi = g, & \text{on } \partial\Omega,
\end{cases} \tag{3}
$$

where the domain Ω ⊂ R^d, and L, Q, and B denote the differential operators acting on functions defined in the interior of Ω and on the boundary ∂Ω. Furthermore, φ denotes the eigenfunction of the system and λ denotes the associated eigenvalue. In this preliminary study, inspired by the notable work in [56], we utilize the one-group steady-state diffusion equation for criticality, framing it as a generalized eigenvalue problem. It is expressed as

$$
-\nabla \cdot (D \nabla \phi) + \Sigma_a \phi = \lambda\, \nu\Sigma_f\, \phi, \tag{4}
$$

where the eigenfunction φ denotes the neutron flux, a scalar quantity used in nuclear reactor physics. It corresponds to the total length travelled by all free neutrons per unit time and volume. Σ_a, Σ_f, and ν denote the absorption cross section, fission cross section, and average number of neutrons produced per fission event, respectively. We follow [56] and use the diffusion coefficient approximation

$$
D = \frac{1}{3(\Sigma_a + \Sigma_s)}. \tag{5}
$$

Here, Σ_s denotes the cross section for a neutron scattering into a different direction. The eigenvalue λ multiplies the neutron source term in Eq. (4); its value balances the terms that produce neutrons against those that account for the losses. It is defined as the reciprocal of k_eff, i.e., λ = 1/k_eff, where

$$
k_{\text{eff}} = \frac{\text{number of neutrons in one generation}}{\text{number of neutrons in the preceding generation}}. \tag{6}
$$

Two main boundary conditions are imposed on the diffusion equation. One condition represents a surface on which neutrons are reflected back into the domain (reflective condition), and the other represents surfaces that allow neutrons to escape from the system (vacuum or bare condition). Both conditions are satisfied by relating the flux solution to its gradient on the boundary, i.e.,

$$
-\frac{1}{2} D \nabla\phi \cdot n =
\begin{cases}
\frac{1}{4}\phi & \text{bare surface},\\
0 & \text{reflective surface},
\end{cases} \tag{7}
$$

where n denotes the outward normal to the surface.

2.1 PINN as an eigenvalue solver

In this subsection, we discuss using PINN to solve the generalized eigenvalue problem (Eq. (2)). The eigenfunction of the operator L is approximated by N_θ, i.e., φ(x) = N_θ(x). Then, Lφ and Bφ can be computed using AD. For the boundary conditions, a penalty term (Eq. (8)) is added to the loss function in PINN, which penalizes the discrepancy between the approximated value on the boundary and the exact boundary condition, where N_b denotes the number of sampling points on the boundary ∂Ω and x_i is one point in the sampling set {x_i}_{i=1}^{N_b}. The eigenvalue is approximated by the Rayleigh quotient in Eq. (9).

$$
\text{Loss}_b = \sum_{i=1}^{N_b} |\mathcal{B}\phi(x_i) - g(x_i)|^2. \tag{8}
$$

$$
\lambda = \frac{\langle \mathcal{L}\phi, \phi\rangle}{\langle \phi, \phi\rangle}. \tag{9}
$$

As mentioned previously, we are concerned with the smallest eigenvalue and the associated eigenfunction. Hence, the eigenvalue (Rayleigh quotient) is viewed as a loss term in PINN to attain the lowest eigenvalue. Finally, the total loss function (Eq. (10)) in PINN is the weighted sum of the two objectives (Eq. (8)) and (Eq. (9)), where α and β correspond to the weights.

$$
\text{Loss}_{\text{total}} = \alpha \lambda^2 + \beta\, \text{Loss}_b. \tag{10}
$$

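To make the construction of Eqs. (8)-(10) concrete, the following is a minimal PyTorch-style sketch of such a PINN loss for the one-group operator of Eq. (4) in one dimension. The network `model`, the sampled tensors `x_in` and `x_b`, the constant coefficients, and the zero-value boundary proxy (standing in for the Robin condition of Eq. (7)) are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of a PINN-style loss for the one-group diffusion eigenvalue problem.
import torch

def pinn_loss(model, x_in, x_b, D=1.0, sig_a=0.4, alpha=1.0, beta=1.0):
    # Interior residual quantities obtained via automatic differentiation.
    x_in = x_in.clone().requires_grad_(True)
    phi = model(x_in)
    dphi = torch.autograd.grad(phi.sum(), x_in, create_graph=True)[0]
    flux = -D * dphi                                  # -D * dphi/dx
    dflux = torch.autograd.grad(flux.sum(), x_in, create_graph=True)[0]
    L_phi = dflux + sig_a * phi                       # L(phi) = -(D phi')' + sig_a * phi
    # Rayleigh quotient of Eq. (9), used as the eigenvalue loss term.
    lam = torch.sum(L_phi * phi) / torch.sum(phi * phi)
    # Boundary penalty of Eq. (8); a zero-value proxy is assumed here.
    loss_b = torch.sum(model(x_b) ** 2)
    # Total loss of Eq. (10): squared eigenvalue plus weighted boundary penalty.
    return alpha * lam ** 2 + beta * loss_b, lam
```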

Unfortunately, PINN does not work for the cases in this study and even performs worse than DRM. Therefore, we only compare the results of our methods with those of DRM.

Remark 1 It should be noted that the square of the eigenvalue is used as the loss function in PINN. Because the smallest eigenvalue is understood as the eigenvalue with the smallest absolute value, PINN would attempt to drive the eigenvalue toward negative infinity if the square term were not used.

3 Methodologies

In this section, we extend our previous study [49] and discuss the use of a neural network to numerically solve the smallest eigenvalue problem. The main concept is to use a neural network to approximate the eigenfunction and to compute the eigenvalue through the Rayleigh quotient, using an eigenvector formed by evaluating the eigenfunction at the sample points.

3.1 Neural network architecture

Next, we discuss the neural network structure used to approximate the eigenfunction φ(x). The neural network architecture employed in this study is the same as ResNet [60], which is built by stacking several residual modules. It is one of the most popular models in deep learning, and it is also commonly used for solving PDEs via neural networks [24, 61, 62]. Each module has one skip connection, and each block consists of two fully connected layers, as shown in Fig. 1.

Fig. 1 Neural network architecture consisting of two modules and one linear output layer. Each module contains two fully connected layers and a skip connection

In the network, let x, x_k ∈ R^d be the inputs, and let W_l and b_l, l = 1, 2, 3, 4, 5, be the parameters of the fully connected layers. We use W_{r_k} and b_{r_k}, k = 1, 2, to denote the parameters of the residual connections. The outputs s_1 and s_2 of the two modules can be expressed as follows:

$$
s_1 = \sigma\big(W_2\,\sigma(W_1 x + b_1) + b_2\big) + r_1(x), \tag{11}
$$
$$
s_2 = \sigma\big(W_4\,\sigma(W_3 s_1 + b_3) + b_4\big) + r_2(s_1), \tag{12}
$$

where σ denotes the activation function and is chosen as tanh. Furthermore, r_k, k = 1, 2, are the functions in the residual connections, which can be represented as

$$
r_k(x_k) = \sigma\big(W_{r_k} x_k + b_{r_k}\big). \tag{13}
$$

Subsequently, the neural network N_θ(x) is expressed as

$$
N_\theta(x) = W_5\, s_2(s_1(x)) + b_5. \tag{14}
$$

Therefore, the eigenfunction is φ(x) = N_θ(x), where θ denotes the neural network parameters.

Remark 2 Given that φ(x) is the scalar neutron population and denotes the density of the free-moving neutron distribution over the spatial domain, we have φ(x) ≥ 0. Therefore, the eigenfunction can be expressed as φ(x) = (N_θ(x))².

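A minimal PyTorch-style sketch of the residual architecture of Eqs. (11)-(14), with the squared output of Remark 2, could look as follows; the class and variable names, and the default width of 20 neurons, are assumptions for illustration rather than the authors' code.

```python
# Sketch of the two-module residual network of Fig. 1 and Eqs. (11)-(14).
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    """One module: two fully connected layers plus a skip connection, Eq. (13)."""
    def __init__(self, dim_in, width):
        super().__init__()
        self.fc1 = nn.Linear(dim_in, width)   # W1 x + b1  (or W3 s1 + b3)
        self.fc2 = nn.Linear(width, width)    # W2 (.) + b2 (or W4 (.) + b4)
        self.res = nn.Linear(dim_in, width)   # W_r x + b_r in the skip path
        self.act = nn.Tanh()

    def forward(self, x):
        # s = sigma(W2 sigma(W1 x + b1) + b2) + r(x), cf. Eqs. (11)-(12)
        return self.act(self.fc2(self.act(self.fc1(x)))) + self.act(self.res(x))

class EigenNet(nn.Module):
    """N_theta(x) = W5 s2(s1(x)) + b5, Eq. (14), squared for positivity (Remark 2)."""
    def __init__(self, dim_in=1, width=20):
        super().__init__()
        self.block1 = ResidualModule(dim_in, width)
        self.block2 = ResidualModule(width, width)
        self.out = nn.Linear(width, 1)        # linear output layer

    def forward(self, x):
        return self.out(self.block2(self.block1(x))) ** 2
```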


3.2 Recap of the inverse power method neural network

We consider the following linear eigenvalue problem:

$$
\mathcal{L}\phi = \lambda\phi, \quad \text{in } \Omega, \tag{15}
$$

which differs from the generalized eigenvalue problem Lφ = λQφ. Inspired by the idea of the inverse power method, IPMNN [49] was proposed to solve for the smallest eigenvalue and the associated eigenfunction. Equation (16) depicts the key step of the inverse power method, and Eq. (17) is analogous to Eq. (16), where A denotes a matrix, and λ_{k−1} and φ_{k−1} denote the results of the previous iteration. Therefore, λ_k and φ_k are obtained by using Eq. (16):

$$
\begin{cases}
p_k = A^{-1}\phi_{k-1},\\[2pt]
\phi_k = \dfrac{p_k}{\|p_k\|},\\[2pt]
\lambda_k = \dfrac{\langle A\phi_k, \phi_k\rangle}{\langle \phi_k, \phi_k\rangle}.
\end{cases} \tag{16}
$$

$$
\frac{\mathcal{L}\psi_k}{\|\mathcal{L}\psi_k\|} = \psi_{k-1}. \tag{17}
$$

Here, the neural network N_θ represents the approximated eigenfunction ψ_k at the k-th iteration of Eq. (17). Given ψ_{k−1}, which is obtained from the previous iterative step, following the main idea of the inverse power method we would have to solve for ψ_k using P_k = L^{-1}ψ_{k−1} and ψ_k = P_k/‖P_k‖. However, it is difficult to obtain the inverse operator L^{-1} of the differential operator L. Therefore, ψ_k is obtained without knowing the inverse operator: the term Lψ_k in Eq. (17) is computed using AD. The main idea is that it is not necessary to calculate L^{-1}; the eigenfunction can be approximated iteratively by minimizing the loss (18), which drives the network toward Eq. (17), where x_i ∈ S is the dataset and N denotes the number of points in S. The eigenvalue in the k-th iteration is obtained using Eq. (19).

$$
\text{Loss}_{\text{ipmnn}}(\theta) = \sum_{i=1}^{N} \left| \frac{\mathcal{L}\psi_k(x_i)}{\|\mathcal{L}\psi_k\|} - \psi_{k-1}(x_i) \right|^2. \tag{18}
$$

$$
\lambda_k = \frac{\langle \mathcal{L}\psi_k, \psi_k\rangle}{\langle \psi_k, \psi_k\rangle}. \tag{19}
$$

3.3 Generalized inverse power method neural network

In standard nuclear engineering procedures, we normally employ the inverse power method to solve for the smallest eigenvalue and the associated eigenvector, which relies on the discretization of Eq. (3). IPMNN [49] was proposed to solve the smallest eigenvalue problem as a mesh-free method realized by a neural network. However, that method is restricted to solving the equation Lφ = λφ, which is a simple form. Therefore, we propose GIPMNN to solve Eq. (3). Details of the GIPMNN algorithm are presented in Algorithm 1.

First, consider the generalized inverse power method used to solve Eq. (20). The key step is given in Eq. (21), where A and B are two matrices, and λ_{k−1} and φ_{k−1} are the results of the previous iteration. Therefore, λ_k and φ_k are obtained by using Eq. (21).

$$
A\phi = \lambda B\phi. \tag{20}
$$

$$
\begin{cases}
A\phi_k = \lambda_{k-1} B\phi_{k-1},\\[2pt]
\lambda_k = \dfrac{\langle A\phi_k, \phi_k\rangle}{\langle B\phi_k, \phi_k\rangle}.
\end{cases} \tag{21}
$$

We use a neural network N_θ to represent the approximated eigenfunction ψ.

$$
\begin{cases}
\mathcal{L}\psi_k = \lambda_{k-1}\,\mathcal{Q}\psi_{k-1},\\[2pt]
\lambda_k = \dfrac{\langle \mathcal{L}\psi_k, \psi_k\rangle}{\langle \mathcal{Q}\psi_k, \psi_k\rangle}.
\end{cases} \tag{22}
$$

In GIPMNN, Eq. (22) is the analog of Eq. (21), where L and Q denote linear differential operators implemented by AD as opposed to specially discretized matrices. In a manner similar to the generalized inverse power method, we record the result λ_{k−1} of the previous iteration. In contrast to the generalized inverse power method, instead of recording φ_{k−1} we record Qψ_{k−1}. It should be noted that ψ_{k−1} denotes the eigenfunction represented by the neural network in the (k − 1)-th iteration, and Qψ_{k−1} is realized by AD. In the k-th iteration, we directly compute ψ_k using the neural network, i.e., ψ_k = N_θ, and calculate Lψ_k using AD. We thus obtain ψ_k directly through the neural network instead of solving the equation Lψ_k = λ_{k−1}Qψ_{k−1}. We define the loss function Loss_gipmnn in Eq. (23) to propel the neural network to learn ψ_k. When the neural network converges, we obtain the smallest eigenvalue, and the associated eigenfunction is expressed by the neural network.

$$
\text{Loss}_{\text{gipmnn}} = \sum_{i=1}^{N} \big|\mathcal{L}\psi_k(x_i) - \lambda_{k-1}\,\mathcal{Q}\psi_{k-1}(x_i)\big|^2. \tag{23}
$$

Remark 3 In the GIPMNN algorithm, given that the initial function ψ_0 is not represented by the neural network, it is not possible to obtain Qψ_0 using AD. Therefore, we choose an arbitrary function for Qψ_0.

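As an illustration, a single GIPMNN update based on Eqs. (22)-(23) for the one-group operator of Eq. (4) might be organized as in the following PyTorch-style sketch; the function signature, coefficient names, and the detaching strategy are assumptions for illustration, not the authors' implementation.

```python
# Sketch of one GIPMNN step (Eqs. (22)-(23)) for the one-group diffusion operator.
import torch

def gipmnn_step(model, x, lam_prev, q_psi_prev, D, sig_a, nu_sig_f):
    x = x.clone().requires_grad_(True)
    psi = model(x)
    dpsi = torch.autograd.grad(psi.sum(), x, create_graph=True)[0]
    dflux = torch.autograd.grad((-D * dpsi).sum(), x, create_graph=True)[0]
    L_psi = dflux + sig_a * psi          # L(psi) = -(D psi')' + sig_a * psi
    Q_psi = nu_sig_f * psi               # Q(psi) = nu * sig_f * psi
    # Residual loss of Eq. (23): drive L(psi_k) toward lam_{k-1} * Q(psi_{k-1}).
    loss = torch.sum((L_psi - lam_prev * q_psi_prev) ** 2)
    # Generalized Rayleigh quotient of Eq. (22) for the updated eigenvalue.
    lam = torch.sum(L_psi * psi) / torch.sum(Q_psi * psi)
    # Detach so the next iteration treats lam_k and Q(psi_k) as fixed targets.
    return loss, lam.detach(), Q_psi.detach()
```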

The Neumann and Robin boundary conditions are used for the eigenvalue problem. It is difficult to enforce them by encoding the boundary conditions into the neural network, as is done in [49, 63, 64] for Dirichlet and periodic boundary conditions. We therefore form the boundary loss Loss_b as in Eq. (8), where N_b denotes the number of sampling points on ∂Ω and x_i is a point in the sampling set {x_i}_{i=1}^{N_b}. The boundary loss is defined in Eqs. (24) and (25), according to whether the surface allows neutrons to escape from the system or reflects them back into the domain:

$$
\text{Loss}_b = \sum_{i=1}^{N_b} \left| -\frac{1}{2} D(x_i)\,\nabla\psi(x_i)\cdot n - \frac{1}{4}\psi(x_i) \right|^2 \quad \text{(bare surface)}, \tag{24}
$$

$$
\text{Loss}_b = \sum_{i=1}^{N_b} \left| -\frac{1}{2} D(x_i)\,\nabla\psi(x_i)\cdot n \right|^2 \quad \text{(reflective surface)}. \tag{25}
$$

The total loss function (Eq. (26)) is the weighted sum of the two objectives (Eq. (23)) and (Eq. (8)). For the overall procedure of GIPMNN, refer to Algorithm 1.

$$
\text{Loss}_{\text{total}} = \alpha\,\text{Loss}_{\text{gipmnn}} + \beta\,\text{Loss}_b. \tag{26}
$$

Remark 4 In Eq. (26), α and β denote the weights of the two losses. We note that β ≥ α should hold when training the neural network, particularly for GIPMNN; with this choice it is easier for the neural network to determine the eigenvalue and eigenfunction.

Algorithm 1: GIPMNN for finding the smallest eigenvalue
Given: N, the number of points for training the neural network; N_b, the number of points on the boundary ∂Ω; N_epoch, the maximum number of epochs; and ε, the stopping criterion.
Step 1: Build the data set S for the residual loss and the data set S_b for the boundary loss.
Step 2: Initialize a neural network with random initialization of parameters.
for k = 1, 2, ..., N_epoch do
    Input all points in S and S_b into the neural network N_θ.
    Let ψ_k(x_i) = N_θ(x_i), where x_i ∈ S ∪ S_b;
    Loss_gipmnn = Σ_{i=1}^{N} |Lψ_k(x_i) − λ_{k−1} Qψ_{k−1}(x_i)|²,
    Loss_b = Σ_{i=1}^{N_b} |Bψ_k(x_i) − g(x_i)|²,
    Loss_total = α Loss_gipmnn + β Loss_b.
    Update the parameters θ of the neural network via gradient descent:
    θ_{k+1} = θ_k − η ∇_θ Loss_total(θ_k), where η is the learning rate and θ_k is the vector of parameters in the k-th iteration.
    if Loss_total < ε then
        Record the best eigenvalue and eigenfunction.
        If the stopping criterion is met, the iteration can be stopped.
    end
end

3.4 Physics-constrained generalized inverse power method neural network

Although GIPMNN can solve the eigenvalue problems of Eq. (3), it is still difficult for it to solve eigenvalue problems with discontinuous coefficients in different regions. We consider interface problems, in which the eigenfunction is continuous at the interface while its derivatives may not be. The enforcement of interface conditions is therefore very important for GIPMNN.

In this study, inspired by the idea of a piecewise deep neural network [58], we propose PC-GIPMNN to solve eigenvalue problems with interface conditions. However, instead of employing different neural networks in different subdomains, we use only one neural network with multiple neurons in the output layer, as shown in Fig. 2. It should be noted that each neuron in the output layer corresponds to a subdomain. Therefore, we obtain outputs in the different subdomains that can be used to enforce the conditions at the interface.

Suppose that there are two domains Ω_l and Ω_r with an interface Γ, the common boundary between the two domains. Given the properties of the neutron population φ(x), the eigenfunction satisfies the two interface conditions (27) and (28), where φ_l and φ_r represent the eigenfunctions defined in Ω_l and Ω_r, respectively, and n denotes the normal vector pointing from Ω_r to Ω_l. D_l and D_r are the coefficients defined in Ω_l and Ω_r, respectively, which are discontinuous at the interface. Equation (27) states that the eigenfunction, i.e., the neutron flux, is continuous at the interface. Equation (28) states that the neutron current is continuous at the interface.

$$
\phi_l = \phi_r, \tag{27}
$$
$$
-D_l \nabla\phi_l \cdot n = -D_r \nabla\phi_r \cdot n. \tag{28}
$$

Fig. 2 Illustration of the PC-GIPMNN architecture. There are multiple neurons in the output layer, which denote the eigenfunctions in the different subdomains

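A minimal sketch of this single-network, multi-output design is given below; a plain fully connected body stands in for the residual backbone of Fig. 1, and all names and sizes are illustrative assumptions rather than the authors' code.

```python
# Sketch of the PC-GIPMNN output layer of Fig. 2: one shared network whose
# output layer has one neuron per subdomain.
import torch.nn as nn

class PiecewiseEigenNet(nn.Module):
    def __init__(self, dim_in=2, width=20, n_subdomains=6):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim_in, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
        )
        # One output neuron per subdomain, e.g. phi_l and phi_r in Eq. (32).
        self.out = nn.Linear(width, n_subdomains)

    def forward(self, x):
        # Column j of the output is the candidate eigenfunction on subdomain j;
        # squaring keeps each branch non-negative, as in Remark 2.
        return self.out(self.body(x)) ** 2
```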

Assume that S_Γ is the set of points on the interface Γ and |S_Γ| denotes the number of points in S_Γ. We then introduce two penalty terms to enforce the two interface conditions, Eqs. (29) and (30), where x_i ∈ S_Γ:

$$
\text{Loss}_{i1} = \sum_{i=1}^{|S_\Gamma|} \big|\psi_l(x_i) - \psi_r(x_i)\big|^2, \tag{29}
$$

$$
\text{Loss}_{i2} = \sum_{i=1}^{|S_\Gamma|} \big|(-D_l \nabla\psi_l(x_i)\cdot n) - (-D_r \nabla\psi_r(x_i)\cdot n)\big|^2. \tag{30}
$$

By combining (26), (29), and (30), Eq. (31) gives the total loss used in our method PC-GIPMNN, where α, β, γ, and δ are the weights of the different losses. In the subsequent experiments, we set all of them to 1. In this study, we focus on the proposed algorithms and neglect the influence of the weights; we expect our algorithms to be universal and to achieve good results even without adjusting the weights. Therefore, we select all weights as 1.

$$
\text{Loss}_{\text{total}} = \alpha\,\text{Loss}_{\text{gipmnn}} + \beta\,\text{Loss}_b + \gamma\,\text{Loss}_{i1} + \delta\,\text{Loss}_{i2}. \tag{31}
$$

Moreover, the eigenfunction can be represented as

$$
\phi = l_l\, \phi_l + l_r\, \phi_r, \tag{32}
$$

where l_l and l_r denote the indicator functions: l_l = 1 in Ω_l and l_l = 0 in Ω_r, while l_r = 1 in Ω_r and l_r = 0 in Ω_l.

Remark 5 Conservative PINN (cPINN) [53] develops a PINN for each subdomain and considers additional conservation laws along the subdomains' interfaces (a general consideration in reactor physics [11]). However, in that approach the eigenvalue is involved in the neural network training as a variable to be optimized, and the numerical examples presented correspond to one-dimensional cases only; the relative errors of keff in those cases are 4.4800 × 10−04 and 3.3500 × 10−04. Similarly, we use Eqs. (29) and (30) to enforce the interface conditions. As shown below, our methods are more generic and yield better results.

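To make the interface treatment concrete, the following PyTorch-style sketch evaluates the penalties of Eqs. (29)-(30) and assembles the global eigenfunction of Eq. (32) for a generic two-region problem, assuming a network whose two output columns represent (ψ_l, ψ_r); the normals, diffusion coefficients, and names are assumptions for illustration. The four terms of Eq. (31) are then summed with α = β = γ = δ = 1, as stated above.

```python
# Sketch of the interface penalties (Eqs. (29)-(30)) and of Eq. (32).
import torch

def interface_losses(model, x_gamma, n_gamma, D_l, D_r):
    """x_gamma: interface points; n_gamma: unit normals pointing from Omega_r to Omega_l."""
    x_gamma = x_gamma.clone().requires_grad_(True)
    out = model(x_gamma)                     # shape (N_gamma, 2): [psi_l, psi_r]
    psi_l, psi_r = out[:, 0:1], out[:, 1:2]
    grad_l = torch.autograd.grad(psi_l.sum(), x_gamma, create_graph=True)[0]
    grad_r = torch.autograd.grad(psi_r.sum(), x_gamma, create_graph=True)[0]
    # Eq. (29): continuity of the flux across the interface.
    loss_i1 = torch.sum((psi_l - psi_r) ** 2)
    # Eq. (30): continuity of the neutron current -D * grad(psi) . n.
    cur_l = -(D_l * grad_l * n_gamma).sum(dim=1, keepdim=True)
    cur_r = -(D_r * grad_r * n_gamma).sum(dim=1, keepdim=True)
    loss_i2 = torch.sum((cur_l - cur_r) ** 2)
    return loss_i1, loss_i2

def assemble_phi(model, x, in_left_mask):
    """Eq. (32): phi = 1_l * phi_l + 1_r * phi_r; in_left_mask is a bool tensor of shape (N, 1)."""
    out = model(x)
    return torch.where(in_left_mask, out[:, 0:1], out[:, 1:2])
```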
Remark 6 In [45], the impact of the interface conditions was ignored, and the relative errors of keff in their cases were 1.3 × 10−03 and 4.4 × 10−03, respectively; moreover, that study did not involve the smallest eigenvalue problem. In this study, we obtain the lowest eigenvalue using the inverse power method as opposed to using a free learnable parameter to approximate the eigenvalue. Our numerical results demonstrate that accurate results can be obtained in more complicated cases.

3.5 Deep Ritz method

DRM is a deep-learning-based method for numerically solving variational problems [24]. It reformulates the original PDEs into equivalent variational equations and defines the loss function based on the variational formulation. The solutions of the PDEs are represented by a neural network, and the derivatives are calculated using AD. DRM [24] has also been used to solve eigenvalue problems. Here, we specify how to use DRM to solve Eq. (3).

We consider the variational principle for the smallest eigenvalue:

$$
\begin{cases}
\min\limits_{\phi}\ \dfrac{\int_\Omega \mathcal{L}\phi \cdot \phi \,\mathrm{d}x}{\int_\Omega \mathcal{Q}\phi \cdot \phi \,\mathrm{d}x},\\[6pt]
\text{s.t. } \mathcal{B}\phi|_{\partial\Omega} = g,
\end{cases} \tag{33}
$$

where the Rayleigh quotient is used. The boundary conditions are enforced by adding a penalty term,

$$
\min \int_{\partial\Omega} |\mathcal{B}\phi - g|^2 \,\mathrm{d}s, \tag{34}
$$

and the total loss function Loss_drm is defined as

$$
\text{Loss}_{\text{drm}} = \alpha\, \frac{\int_\Omega \mathcal{L}\phi \cdot \phi \,\mathrm{d}x}{\int_\Omega \mathcal{Q}\phi \cdot \phi \,\mathrm{d}x} + \beta \int_{\partial\Omega} |\mathcal{B}\phi - g|^2 \,\mathrm{d}s, \tag{35}
$$

where α and β denote the weights of the different losses. We chose α = 1 and β = 1 in our experiments. After the optimal approximation is obtained by solving the optimization problem (35), we obtain the smallest eigenvalue λ = ∫_Ω Lφ·φ dx / ∫_Ω Qφ·φ dx and the eigenfunction represented by the trained neural network. It should be noted that Lφ, Qφ, and Bφ are computed using AD. For the procedure of DRM, refer to Algorithm 2.

Remark 7 We enforce the boundary condition by adding the penalty term (34). However, if the boundary condition is a Neumann or Robin boundary condition, we do not use the penalty term (34), because the boundary condition is incorporated into the Rayleigh quotient through Green's first identity [65].

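As an illustration of Eqs. (33)-(35), the following is a minimal PyTorch-style sketch of the DRM loss in which the integrals of the Rayleigh quotient are replaced by Monte Carlo averages over sampled points, as is done later in Eqs. (41) and (46); the coefficient and function names are assumptions rather than the authors' implementation.

```python
# Sketch of the DRM loss of Eq. (35) for the one-group diffusion problem.
import torch

def drm_loss(model, x_in, D, sig_a, nu_sig_f, x_b=None, alpha=1.0, beta=1.0):
    x_in = x_in.clone().requires_grad_(True)
    phi = model(x_in)
    grad_phi = torch.autograd.grad(phi.sum(), x_in, create_graph=True)[0]
    # Numerator of the Rayleigh quotient: mean of D|grad phi|^2 + sig_a * phi^2.
    num = torch.mean(D * (grad_phi ** 2).sum(dim=1, keepdim=True) + sig_a * phi ** 2)
    # Denominator: mean of nu*sig_f * phi^2 (fission production term).
    den = torch.mean(nu_sig_f * phi ** 2)
    rayleigh = num / den
    loss = alpha * rayleigh
    if x_b is not None:
        # Optional boundary penalty of Eq. (34); omitted when a Neumann/Robin
        # condition is absorbed into the quotient (Remark 7).
        loss = loss + beta * torch.mean(model(x_b) ** 2)
    return loss, rayleigh
```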

4 Experiments

In this section, we present numerical experiments to compare the applicability and accuracy of GIPMNN, PC-GIPMNN, and DRM for solving the smallest eigenvalue problems in reactor physics. In all the experiments below, we chose the Adam optimizer with an initial learning rate of 10−3 to minimize the loss function. Furthermore, we trained the neural network with the ResNet architecture on a server equipped with a CentOS 7 system, one Intel Xeon Platinum 8358 2.60-GHz CPU, and one NVIDIA A100 80-GB GPU. Moreover, unless otherwise specified, the activation function was the tanh function.

Algorithm 2: DRM for finding the smallest eigenvalue
Given: N, the number of points for training the neural network; N_b, the number of points on the boundary ∂Ω; Length, a measure of ∂Ω; N_epoch, the maximum number of epochs; and ε, the stopping criterion.
Step 1: Build the data set S for the loss of the Rayleigh quotient and the data set S_b for the boundary loss.
Step 2: Initialize a neural network with random initialization of parameters.
for k = 1, 2, ..., N_epoch do
    Input all points in S and S_b into the neural network N_θ.
    Let ψ_k(x_i) = N_θ(x_i), where x_i ∈ S ∪ S_b;
    Loss_drm = α ⟨Lψ_k, ψ_k⟩ / ⟨Qψ_k, ψ_k⟩ + β (Length/N_b) Σ_{i=1}^{N_b} |Bψ_k(x_i) − g(x_i)|².
    Update the parameters θ of the neural network via gradient descent:
    θ_{k+1} = θ_k − η ∇_θ Loss_drm(θ_k), where η is the learning rate and θ_k is the vector of parameters in the k-th iteration.
    if Loss_drm < ε then
        Record the best eigenvalue and eigenfunction.
        If the stopping criterion is met, the iteration can be stopped.
    end
end

Table 1 Six sets of material cross sections from [66] used to test GIPMNN and DRM

Problem | Region | Σa (cm−1) | Σs (cm−1) | νΣf (cm−1)
F1 | Region 1 | 0.4 | 2.0 | 0.5
F1 | Region 2 | 0.4 | 2.0 | 0.5
F1 | Region 3 | 0.4 | 2.0 | 0.5
F2 | Region 1 | 0.4 | 2.0 | 0.5
F2 | Region 2 | 0.6 | 2.0 | 0.3
F2 | Region 3 | 0.4 | 2.0 | 0.5
F3 | Region 1 | 0.4 | 2.0 | 0.5
F3 | Region 2 | 0.4 | 2.0 | 0.5
F3 | Region 3 | 0.6 | 2.0 | 0.3
R1 | Region 1 | 0.4 | 2.0 | 0.5
R1 | Region 2 | 0.4 | 2.0 | 0.5
R1 | Region 3 | 0.4 | 2.0 | 0.5
R2 | Region 1 | 0.4 | 2.0 | 0.5
R2 | Region 2 | 0.6 | 2.0 | 0.3
R2 | Region 3 | 0.6 | 2.0 | 0.3
R3 | Region 1 | 0.4 | 2.0 | 0.7
R3 | Region 2 | 0.5 | 2.0 | 0.4
R3 | Region 3 | 0.5 | 2.0 | 0.4

4.1 One-dimensional slab reactor

We consider a one-dimensional case with a simple slab reactor consisting of a domain bounded by vacuum (bare) surfaces, as shown in Fig. 3. The length of the slab reactor is 10 cm. It consists of fuel and control-rod regions labeled 1, 2, and 3. The control rods are located between 2.2 and 2.5 cm and between 7.5 and 7.8 cm on the x-axis.

Fig. 3 Macroscopic cross sections for the 1D slab reactor, modified from the figure reported in [66]. There are three regions with different materials and functions

As shown in Fig. 3, there are two control rods in the one-dimensional slab reactor. Either device can be withdrawn or inserted. Three scenarios were designed to model the reactor and fully simulate the actions of the control rods. These three situations are labeled F1, F2, and F3 in Table 1; they correspond to both rods withdrawn, only the left rod inserted, and only the right rod inserted, respectively. We also considered three other problems, R1, R2, and R3 in Table 1. Problem R1 is the same as F1 and is designed to simulate the withdrawal of both control rods; we use R1 to facilitate comparison with R2 and R3. Although problems R2 and R3 both correspond to all control rods being inserted, problem R2 resembles heavily inserted control rods, whereas problem R3 resembles slightly inserted control rods.

To generate the data needed to validate the accuracy of GIPMNN, PC-GIPMNN, and DRM, FreeFEM [67–73] is utilized to solve the problems in Table 1. FreeFEM is a partial differential equation solver for nonlinear multi-physics systems based on the finite element method. We choose the number of cells in the x-direction as N = 1001 and the mesh size as Δx = 10−3. The baseline solution is obtained by FEM, and uniform cells are used to train GIPMNN, PC-GIPMNN, and DRM.

Remark 8 When solving all the parameter-dependent problems below, the parameters Σa, Σs, and νΣf are selected according to the region to which the data point x belongs. For PC-GIPMNN, the data points on the interface are considered to belong to multiple regions simultaneously. Therefore, we can enforce the interface conditions using the data points on the interface.

Remark 9 As unsupervised algorithms, our methods use only points to train the neural network, without any prior knowledge. Therefore, there is no test set for the proposed algorithms.

4.1.1 Using GIPMNN to solve for the eigenvalue

In this one-dimensional example, we select 20 neurons for each hidden layer in ResNet and N_epoch = 50000. Without loss of generality, the weights α and β of the different losses are not adjusted to achieve better results; α and β are set to one.

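As a concrete illustration of the region-wise parameter selection described in Remark 8, the following small sketch maps a 1D sample point to the cross sections of Table 1 for the geometry of Fig. 3 and evaluates the diffusion coefficient of Eq. (5); the dictionary layout and function names are assumptions for illustration.

```python
# Sketch of the region-wise cross-section lookup for the 1D slab (Fig. 3, Table 1).
# (sig_a, sig_s, nu_sig_f) per region for each problem; R1 is identical to F1.
TABLE_1 = {
    "F1": [(0.4, 2.0, 0.5), (0.4, 2.0, 0.5), (0.4, 2.0, 0.5)],
    "F2": [(0.4, 2.0, 0.5), (0.6, 2.0, 0.3), (0.4, 2.0, 0.5)],
    "F3": [(0.4, 2.0, 0.5), (0.4, 2.0, 0.5), (0.6, 2.0, 0.3)],
    "R2": [(0.4, 2.0, 0.5), (0.6, 2.0, 0.3), (0.6, 2.0, 0.3)],
    "R3": [(0.4, 2.0, 0.7), (0.5, 2.0, 0.4), (0.5, 2.0, 0.4)],
}

def region_of(x):
    """Regions 2 and 3 are the control-rod slots; Region 1 is the remaining fuel."""
    if 2.2 <= x <= 2.5:
        return 1          # Region 2 (left rod slot)
    if 7.5 <= x <= 7.8:
        return 2          # Region 3 (right rod slot)
    return 0              # Region 1 (fuel)

def coefficients(x, problem="F2"):
    sig_a, sig_s, nu_sig_f = TABLE_1[problem][region_of(x)]
    D = 1.0 / (3.0 * (sig_a + sig_s))     # diffusion coefficient, Eq. (5)
    return D, sig_a, nu_sig_f
```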

In Eq. (21), we must solve for φ_k using λ_{k−1} and φ_{k−1} from the previous iteration. To accelerate the training process and obtain more accurate results with Algorithm 1, we split the iterations of the original Algorithm 1 into inner and outer iterations, as shown in Algorithm 3, where N_inner and N_outer denote the numbers of inner and outer iterations, respectively.

Algorithm 3: The iterations of Algorithm 1 are split into inner and outer iterations.
for k = 1, 2, ..., N_outer do
    for j = 1, 2, ..., N_inner do
        Input all points in S and S_b into the neural network N_θ.
        Let ψ_k(x_i) = N_θ(x_i), where x_i ∈ S ∪ S_b;
        Loss_gipmnn = Σ_{i=1}^{N} |Lψ_k(x_i) − λ_{k−1} Qψ_{k−1}(x_i)|²,
        Loss_b = Σ_{i=1}^{N_b} |Bψ_k(x_i) − g(x_i)|²,
        Loss_total = α Loss_gipmnn + β Loss_b.
        if j = N_inner then
            ψ_{k−1} = ψ_k / ‖ψ_k‖,
            λ_{k−1} = ⟨Lψ_k, ψ_k⟩ / ⟨Qψ_k, ψ_k⟩.
        end
        Update the parameters θ of the neural network via gradient descent:
        θ_{k,j+1} = θ_{k,j} − η ∇_θ Loss_total(θ_{k,j}), where η denotes the learning rate and θ_{k,j} denotes the vector of parameters in the (k, j)-th iteration.
    end
    if Loss_total < ε then
        Record the best eigenvalue and eigenfunction.
        If the stopping criterion is met, the iteration can be stopped.
    end
end

We chose ratios of outer to inner iterations of 1 : 1, 1 : 10, 1 : 100, 1 : 1000, and 1 : 10000 to investigate the effect of the ratio on the results. A ratio of 1 : n implies that the outer code is executed once, whereas the inner code is executed n times. For comparison, the total number of iterations was fixed at N_total = 100000, with N_total = N_inner × N_outer. The relative errors of keff and of the eigenfunction for the different ratios of outer and inner iterations during the training process are shown in Fig. 4. The results for keff_rel are shown in the first row and those for φ_rel in the second row, where the relative errors of keff and of the eigenfunction are calculated as

$$
k_{\text{eff}}^{\text{rel}} = \frac{|k_{\text{eff}}(\mathrm{FEM}) - k_{\text{eff}}(\mathrm{NN})|}{k_{\text{eff}}(\mathrm{FEM})}, \tag{36}
$$

$$
\phi^{\text{rel}} = \frac{\max_x |\phi(\mathrm{FEM}) - \phi(\mathrm{NN})|}{\max_x |\phi(\mathrm{FEM})|}, \tag{37}
$$

respectively. As shown in the figures, the best ratio is 1 : 1, because the relative errors of keff and of the eigenfunction for the ratio 1 : 1 are smaller than those for the other ratios during training. The convergence worsens as the ratio of outer to inner iterations changes from 1 : 1 to 1 : 10000. Therefore, we trained GIPMNN using a ratio of 1 : 1 to solve the 1D and 2D problems.

Moreover, we fixed the number of outer iterations at N_outer = 1000 and retrained the neural network using the same ratios. The results are shown in Fig. 5. The opposite behavior is observed when compared with the results in Fig. 4: the ratio 1 : 1 is now the worst, because an increase in the number of inner iterations leads to a better approximation of the eigenfunction in the next outer iteration. This is consistent with the behavior of the inverse power method.

4.1.2 Using PC-GIPMNN to solve for the eigenvalue

We divide the slab reactor into five parts from left to right. The output layer includes five neurons, and five functions are defined in the different subdomains. Here, u, w, and q denote the functions defined in Region 1, and v and p denote those defined in Regions 2 and 3, respectively.

As shown in Fig. 3, the four interface points are denoted as x_{i1} = 2.2, x_{i2} = 2.5, x_{i3} = 7.5, and x_{i4} = 7.8. The interface conditions (29) and (30) are implemented as the loss functions defined in (38) and (39) for the 1D example:

$$
\begin{aligned}
\text{Loss}_{i1} ={}& |u(x_{i1}) - v(x_{i1})|^2 + |v(x_{i2}) - w(x_{i2})|^2\\
&+ |w(x_{i3}) - p(x_{i3})|^2 + |p(x_{i4}) - q(x_{i4})|^2,
\end{aligned} \tag{38}
$$

$$
\begin{aligned}
\text{Loss}_{i2} ={}& |(-D_1 u_x(x_{i1})) - (-D_2 v_x(x_{i1}))|^2\\
&+ |(-D_2 v_x(x_{i2})) - (-D_1 w_x(x_{i2}))|^2\\
&+ |(-D_1 w_x(x_{i3})) - (-D_3 p_x(x_{i3}))|^2\\
&+ |(-D_3 p_x(x_{i4})) - (-D_1 q_x(x_{i4}))|^2.
\end{aligned} \tag{39}
$$

The eigenfunction can then be expressed as

$$
\phi = u\,l_1 + v\,l_2 + w\,l_3 + p\,l_4 + q\,l_5, \tag{40}
$$

where l_1, l_2, l_3, l_4, and l_5 are indicator functions that are 1 or 0 depending on whether the input x lies inside or outside the corresponding subdomain.

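For reference, the comparison metrics of Eqs. (36)-(37), which are used for all the results reported below, can be computed with a small utility of the following form; this NumPy-based helper is an illustrative assumption, with the FEM reference and the network prediction assumed to be sampled on the same grid.

```python
# Sketch of the error metrics of Eqs. (36)-(37).
import numpy as np

def k_eff_rel(k_fem, k_nn):
    # Eq. (36): relative error of the effective multiplication factor.
    return abs(k_fem - k_nn) / k_fem

def phi_rel(phi_fem, phi_nn):
    # Eq. (37): max-norm relative error of the (normalized) eigenfunction.
    phi_fem = np.asarray(phi_fem)
    phi_nn = np.asarray(phi_nn)
    return np.max(np.abs(phi_fem - phi_nn)) / np.max(np.abs(phi_fem))
```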

Fig. 4 (Color online) Relative errors of keff and φ for problems F1, F2, and F3 (from (a) to (c)) with respect to different ratios of outer and inner iterations during the training process. The first row shows the results for keff_rel and the second row shows the results for φ_rel. The neural network is trained with N_total = 100000

Fig. 5 (Color online) Relative errors of keff and φ for problems F1, F2, and F3 (from (a) to (c)) with respect to different ratios of outer and inner iterations during the training process. The first row shows the results for keff_rel and the second row shows the results for φ_rel. The neural network is trained with N_outer = 1000


4.1.3 Using DRM to solve for the eigenvalue

The configuration of DRM is identical to that of GIPMNN. Because DRM is a variational formulation and the boundary condition is incorporated into the Rayleigh quotient, we do not use a penalty term to enforce the boundary condition. The loss function Loss_drm is defined as

$$
\text{Loss}_{\text{drm}} = \frac{\dfrac{Length_{\text{slab}}}{N}\sum_{i=1}^{N} D\,|\nabla\phi(x_i)|^2 + \dfrac{1}{2}\big(\phi(x_l)^2 + \phi(x_r)^2\big) + \dfrac{Length_{\text{slab}}}{N}\sum_{i=1}^{N} \Sigma_a\,\phi(x_i)^2}{\dfrac{Length_{\text{slab}}}{N}\sum_{i=1}^{N} \nu\Sigma_f\,\phi(x_i)^2}, \tag{41}
$$

where Length_slab denotes the length of the slab reactor, x_l and x_r denote the endpoints of the slab reactor, and N denotes the number of points used to approximate the integrals. The smallest eigenvalue λ is obtained after the neural network converges.

4.1.4 Results

The results for the one-dimensional example are presented in Tables 2 and 3 and Fig. 6. The values of keff and their relative errors are listed in Table 2. For problems F1 and R1, which have continuous coefficients, the results obtained by GIPMNN are better than those obtained by PC-GIPMNN and DRM. For problem F2, the relative error of keff obtained by DRM is better than those obtained by the other methods. For the other problems, with discontinuous coefficients, the results obtained by PC-GIPMNN are better than those obtained by GIPMNN and DRM. The relative error of keff computed by PC-GIPMNN is approximately 10−5, which is lower than the approximately 10−4 obtained by DRM. The relative errors of the eigenfunctions are listed in Table 3. For all problems, PC-GIPMNN attains better results than GIPMNN and DRM.

In Fig. 6, the eigenfunctions of GIPMNN, PC-GIPMNN, and DRM are compared with those of FEM. Specifically, we follow the conventional normalization process in nuclear reactor physics [1], where the normalization constant is generally computed to make the average reactor power equal to unity; thus, the eigenfunctions are normalized by Eq. (42), where N denotes the number of training points in the entire domain.

$$
\phi_{\text{norm}} = \frac{N}{\sum_{i=1}^{N}\phi(x_i)}\;\phi. \tag{42}
$$

The figures in the first row show the results for problems F1, F2, and F3, and those in the second row show the results for problems R1, R2, and R3. In each figure, we plot the eigenfunction obtained by FEM and compare the relative errors of the eigenfunctions computed by GIPMNN, PC-GIPMNN, and DRM. Evidently, the results obtained by PC-GIPMNN are better than those obtained by the other methods, which is consistent with the results in Table 3.

As reported in a previous study [49], IPMNN can attain more accurate eigenvalues than DRM when linear eigenvalue problems without interfaces are considered. IPMNN and GIPMNN are therefore suited to finding the eigenfunction of a problem in strong form, whereas DRM is similar to FEM in that it is applicable to finding the eigenfunction of a problem in weak form.

Table 2 Results obtained by GIPMNN, PC-GIPMNN, and DRM compared with those obtained by FEM for the problems in Table 1. keff_rel denotes the relative error of keff

Problem | keff (FEM) | keff (GIPMNN) | keff (PC-GIPMNN) | keff (DRM) | keff_rel (GIPMNN) | keff_rel (PC-GIPMNN) | keff_rel (DRM)
F1 | 1.2127 | 1.2127 | 1.2127 | 1.2128 | 2.5142 × 10−06 | 3.7743 × 10−06 | 5.8568 × 10−05
F2 | 1.1973 | 1.1970 | 1.1973 | 1.1972 | 2.3429 × 10−04 | 2.6246 × 10−04 | 1.1679 × 10−04
F3 | 1.1973 | 1.1972 | 1.1973 | 1.1971 | 5.5503 × 10−05 | 1.6898 × 10−05 | 1.6809 × 10−04
R1 | 1.2127 | 1.2127 | 1.2127 | 1.2128 | 2.5142 × 10−06 | 3.7743 × 10−06 | 5.8568 × 10−05
R2 | 1.1715 | 1.1697 | 1.1715 | 1.1707 | 1.5629 × 10−03 | 3.2020 × 10−05 | 7.2244 × 10−04
R3 | 1.6498 | 1.6484 | 1.6498 | 1.6487 | 8.5897 × 10−04 | 2.9838 × 10−05 | 6.7968 × 10−04
The bold numbers are the best results

Table 3 Results obtained by GIPMNN, PC-GIPMNN, and DRM compared with those obtained by FEM for the problems in Table 1. φ_rel denotes the relative error of the eigenfunction

Problem | φ_rel (GIPMNN) | φ_rel (PC-GIPMNN) | φ_rel (DRM)
F1 | 2.7301 × 10−03 | 6.3620 × 10−04 | 2.6313 × 10−03
F2 | 2.6279 × 10−02 | 1.6140 × 10−03 | 8.2217 × 10−03
F3 | 2.7403 × 10−02 | 1.0120 × 10−03 | 6.2428 × 10−03
R1 | 2.7301 × 10−03 | 6.3620 × 10−04 | 2.6313 × 10−03
R2 | 2.5768 × 10−02 | 7.5946 × 10−04 | 6.8807 × 10−03
R3 | 1.4559 × 10−02 | 7.5257 × 10−04 | 8.1038 × 10−03

Fig. 6 (Color online) Eigenfunctions obtained by GIPMNN, PC-GIPMNN, and DRM compared with those obtained by FEM for the one-dimensional example. The eigenfunctions are normalized. The first row shows the results of problems F1, F2, and F3, and the second row shows the results of problems R1, R2, and R3

Consequently, DRM is better than GIPMNN in this one-dimensional example with discontinuous coefficients. However, because the interface conditions are well implemented in PC-GIPMNN, it successfully learns the eigenvalue and eigenfunction and achieves better results than GIPMNN and DRM, as shown in Tables 2 and 3.

Remark 10 Specifically, DRM is suited to finding the eigenfunction of a problem in weak form, which implies that the eigenfunction may exhibit low regularity. In contrast, as shown in the implementation of GIPMNN, the eigenfunction obtained from GIPMNN is required to exhibit high regularity; the learned eigenfunction is in strong form. Hence, GIPMNN is unable to obtain accurate values at the interface. PC-GIPMNN, however, does not require the eigenfunction to exhibit higher regularity at the interface but instead guarantees continuity and the physical constraints by enforcing the interface conditions. Therefore, PC-GIPMNN successfully learns the eigenvalue and eigenfunction and obtains better results than GIPMNN and DRM.

Fig. 7 Geometry of a 2D reactor core with graphite, fuel, and four control-rod regions labeled 1, 2, 3, and 4. The figure is similar to that in a previous study [66]

4.2 Two-dimensional reactor

As shown in Fig. 7, a two-dimensional reactor is modeled in a square-shaped domain with 90-cm sides. The reactor is surrounded by a neutron reflector made of graphite, which implies that the Robin boundary condition is selected. The main bulk of the reactor corresponds to the fuel region. Within its central region, four control rods can be inserted or withdrawn. When the control rods are withdrawn, the materials in those regions are replaced with water, which corresponds to common practice in many reactor designs. Table 4 lists the five different materials used in the mock reactor.


Table 4 Cross-section data for the various materials of the 2D reactor. The data are similar to those reported in a previous study [66]

Material | Σa (cm−1) | Σs (cm−1) | νΣf (cm−1)
Fuel 1 | 0.075 | 0.53 | 0.079
Fuel 2 | 0.072 | 0.53 | 0.085
Water | 0.01 | 0.89 | 0.0
Control rod | 0.38 | 0.2 | 0.0
Graphite | 0.15 | 0.5 | 0.0

The two types of fuel in Table 4 are designed to simulate different fuel materials. Fuel type 1 defines the standard fuel used in most problems, and fuel type 2 defines an adjusted fuel composition different from fuel type 1. Fuel type 2 in problem R7 is used to test whether our methods are affected by different types of fuel.

As shown in Table 5, 12 problems are used to validate the accuracy of the proposed methods. Five full models, labeled F1-F5, simulate the reactor with all control rods removed and then with only one control rod inserted in each of the control-rod regions. As previously discussed, when a control rod is removed, the material in the region is replaced with water; therefore, W denotes water in Table 5. The other seven reactor configurations, denoted R1-R7, simulate cases in which more control rods are inserted. Problems R1 and R2 are equivalent to the full-model problems F1 and F2. Problems R3-R6 use different combinations of inserted and withdrawn control rods. It should be noted that in problem R7 the material configuration differs from that of the other problems: the fuel type is replaced with fuel type 2, and the control rods are assumed to be partially inserted, which implies that the materials in the control-rod regions correspond to a mix of control-rod and water materials.

Table 5 Configurations for the reactor problems with different materials. The configurations are similar to those in a previous study [66]. C-R denotes the control-rod region, CR denotes control-rod material, and W denotes water

Problem | Fuel type | C-R1 | C-R2 | C-R3 | C-R4
F1 | Fuel 1 | W | W | W | W
F2 | Fuel 1 | CR | W | W | W
F3 | Fuel 1 | W | CR | W | W
F4 | Fuel 1 | W | W | CR | W
F5 | Fuel 1 | W | W | W | CR
R1 | Fuel 1 | W | W | W | W
R2 | Fuel 1 | CR | W | W | W
R3 | Fuel 1 | W | CR | CR | W
R4 | Fuel 1 | W | W | CR | CR
R5 | Fuel 1 | CR | CR | W | CR
R6 | Fuel 1 | CR | CR | CR | CR
R7 | Fuel 2 | 1/4 CR + 1/2 W | 1/10 CR + 9/10 W | 3/10 CR + 7/10 W | 1/5 CR + 4/5 W

In the two-dimensional case, we use FreeFEM to solve the problems listed in Table 5. We choose uniform grids with Δx = 1/90 and Δy = 1/90, and we train GIPMNN, PC-GIPMNN, and DRM with N_x = 91 and N_y = 91.

4.2.1 Using GIPMNN and PC-GIPMNN to solve for the eigenvalue

The number of points N = N_x N_y is used to calculate Loss_gipmnn, and the number of points N_b = 2(N_x − 2) + 2(N_y − 2) + 4 is used to calculate Loss_b. The number of neurons is 20 for each hidden layer in ResNet, and N_epoch = 500000 for both GIPMNN and PC-GIPMNN. Without loss of generality, α and β are set to one. As mentioned previously, we use the optimal ratio of outer to inner iterations, which is 1 : 1.

The 2D reactor is divided into six parts, as shown in Fig. 7. The output layer includes six neurons, and six functions are defined in the different subdomains, labeled u, v, w, r, p, and q, where u, v, w, and r denote the functions defined in CR1, CR2, CR3, and CR4, and p and q denote the functions defined for the fuel and graphite regions, respectively. S_CR1, S_CR2, S_CR3, S_CR4, and S_GF denote the data sets on the different interfaces.

For PC-GIPMNN, the interface conditions (29) and (30) are implemented as the loss functions (43) and (44) in the 2D example.


$$
\begin{aligned}
\text{Loss}_{i1} ={}& \sum_{i=1}^{|S_{\mathrm{CR1}}|} |u(x_i) - p(x_i)|^2 + \sum_{i=1}^{|S_{\mathrm{CR2}}|} |v(x_i) - p(x_i)|^2 + \sum_{i=1}^{|S_{\mathrm{CR3}}|} |w(x_i) - p(x_i)|^2\\
&+ \sum_{i=1}^{|S_{\mathrm{CR4}}|} |r(x_i) - p(x_i)|^2 + \sum_{i=1}^{|S_{\mathrm{GF}}|} |p(x_i) - q(x_i)|^2,
\end{aligned} \tag{43}
$$

$$
\begin{aligned}
\text{Loss}_{i2} ={}& \sum_{i=1}^{|S_{\mathrm{CR1}}|} |(-D_{\mathrm{CR1}}\nabla u(x_i)\cdot n) - (-D_{\mathrm{Fuel}}\nabla p(x_i)\cdot n)|^2\\
&+ \sum_{i=1}^{|S_{\mathrm{CR2}}|} |(-D_{\mathrm{CR2}}\nabla v(x_i)\cdot n) - (-D_{\mathrm{Fuel}}\nabla p(x_i)\cdot n)|^2\\
&+ \sum_{i=1}^{|S_{\mathrm{CR3}}|} |(-D_{\mathrm{CR3}}\nabla w(x_i)\cdot n) - (-D_{\mathrm{Fuel}}\nabla p(x_i)\cdot n)|^2\\
&+ \sum_{i=1}^{|S_{\mathrm{CR4}}|} |(-D_{\mathrm{CR4}}\nabla r(x_i)\cdot n) - (-D_{\mathrm{Fuel}}\nabla p(x_i)\cdot n)|^2\\
&+ \sum_{i=1}^{|S_{\mathrm{GF}}|} |(-D_{\mathrm{Fuel}}\nabla p(x_i)\cdot n) - (-D_{\mathrm{Graphite}}\nabla q(x_i)\cdot n)|^2.
\end{aligned} \tag{44}
$$

The eigenfunction is expressed as

$$
\phi = u\,l_1 + v\,l_2 + w\,l_3 + r\,l_4 + p\,l_5 + q\,l_6, \tag{45}
$$

where l_1, l_2, l_3, l_4, l_5, and l_6 denote indicator functions.

Remark 11 In this experiment, it is important to use Eq. (42) instead of the original L2 normalization, whose values are too small to optimize the neural network. Specifically, the L2 norm of the eigenfunction equals one if we attempt to find a normalized eigenfunction; if the total number of points N is very large, the value of each component of the eigenvector becomes so small that it is difficult for the neural network to learn the eigenfunction.

4.2.2 Using DRM to solve for the eigenvalue, and the failure of DRM in the 2D experiments

Given that the homogeneous Neumann boundary condition is used, the loss function in DRM can be defined as

$$
\text{Loss}_{\text{drm}} = \frac{\dfrac{Area_{\text{square}}}{N}\sum_{i=1}^{N} D\,|\nabla\phi(x_i)|^2 + \dfrac{Area_{\text{square}}}{N}\sum_{i=1}^{N} \Sigma_a\,\phi(x_i)^2}{\dfrac{Area_{\text{square}}}{N}\sum_{i=1}^{N} \nu\Sigma_f\,\phi(x_i)^2}, \tag{46}
$$

where Area_square denotes the area of the square domain shown in Fig. 7.

We use N_epoch = 500000 epochs in DRM. We found that DRM learns the eigenvalue and eigenfunction at an early stage of the training process. Subsequently, DRM attempts to learn a smaller eigenvalue after it is already close to the true eigenvalue. Finally, DRM learns one smaller eigenvalue that may be close to the true eigenvalue and one poor eigenfunction that is meaningless and far from the true eigenfunction. This phenomenon is illustrated in Fig. 8.

Fig. 8 (Color online) Failure of DRM during the 2D experiments. DRM fails to learn the eigenfunctions of F1-F5 and R3-R7

As listed in Table 6, although the neural network in DRM fails to learn the eigenfunction, the eigenvalue is close to the true value. This phenomenon may be caused by the minimization of the Rayleigh quotient. As mentioned in [49], the loss approaches zero, whereas the eigenvalue may not reach zero.

In Fig. 8, we observe that the failure of DRM in the 2D experiments occurs for N_epoch ≥ 60000. Therefore, we selected N_epoch = 50000 to train DRM again and recorded the results. In a previous study [40], the stopping criterion of the training process for PINN was investigated. In future studies, we will follow the technique discussed in [40] to determine the stopping criterion for DRM. In the next section, we compare


Table 6 Failure of DRM during the 2D experiments. The eigenfunction of F1-F5 and R1-R7 is not correct, but the eigenvalue is close to the true value. DRM is trained with Nepoch = 50000

Problem   keff (FEM)   keff (DRM)   keff_rel (DRM)   φ_rel (DRM)
F1   1.0118   1.0423   3.0156 × 10−02   16.6580
F2   1.0052   1.0399   3.4538 × 10−02   21.8882
F3   1.0000   1.0428   4.2782 × 10−02   29.5368
F4   1.0079   1.0533   4.5361 × 10−02   26.4549
F5   1.0052   1.0450   3.9598 × 10−02   32.0516
R1   1.0118   1.0423   3.0156 × 10−02   16.6580
R2   1.0052   1.0399   3.4538 × 10−02   21.8882
R3   0.9921   1.0399   4.8250 × 10−02   166.2996
R4   1.0017   1.0397   3.7884 × 10−02   163.6593
R5   0.9780   1.0498   7.3430 × 10−02   182.7224
R6   0.9668   1.0492   8.5210 × 10−02   87.2200
R7   1.1018   1.1517   4.5337 × 10−02   36.6777

Remark 12 The numerical results in Fig. 8 are not due to hardware problems or code errors but can be attributed to the nature of the neural network. In the DRM, the eigenvalue is approximated by constructing a Rayleigh quotient, and this eigenvalue is then treated as the loss function used to optimize the neural network. However, this mechanism of minimizing the eigenvalue leads to overfitting, because the network always attempts to find a point where the loss function tends toward zero.

4.2.3 Results

Similar to the one-dimensional case, the relative errors of keff and of the eigenfunction φ are shown in Tables 7 and 8. It can be observed that the relative errors of keff obtained via all the methods are small; the DRM results are obtained with Nepoch = 50000.

For problems F1, F2, F4, and R7, the relative error of keff obtained by DRM was smaller than that obtained by GIPMNN, whereas the opposite holds for problems F3, F5, R3, R4, R5, and R6. However, although the relative error of keff obtained by GIPMNN is small, the relative error of φ simulated by GIPMNN is larger than that obtained by DRM. It is observed that keff_rel obtained by the PC-GIPMNN is smaller than that obtained by GIPMNN and DRM for all problems. Furthermore, the relative errors of φ computed by the PC-GIPMNN are smaller than those obtained by the GIPMNN and DRM for all problems. Therefore, the PC-GIPMNN can successfully learn eigenvalues and eigenfunctions.

In Figs. 9 and 10, the eigenfunctions computed by the FEM are shown in the first column, and the relative errors of the eigenfunctions obtained by the GIPMNN, PC-GIPMNN, and DRM are shown in the other columns for the different problems. It is observed that the relative errors of the eigenfunction computed by the PC-GIPMNN and DRM are smaller than those obtained by the GIPMNN, which failed to learn some details. Compared with the eigenfunctions computed by the FEM, the results obtained by the PC-GIPMNN are the best among the three methods.

Fig. 9 (Color online) First column shows the heatmap of the eigenfunction of FEM and the other columns show the heatmaps of the relative error of GIPMNN (the second column), PC-GIPMNN (the third column), and DRM (the fourth column) for problems F1, F2, F3, F4, and F5. Evidently, GIPMNN yields less favorable results than DRM. However, by enforcing the interface conditions, PC-GIPMNN outperforms GIPMNN and DRM, as shown in the third column

Fig. 10 (Color online) First column shows the heatmap of the eigenfunction of FEM and the other columns show the heatmaps of the relative error of GIPMNN (the second column), PC-GIPMNN (the third column), and DRM (the fourth column) for problems R3, R4, R5, R6, and R7. Obviously, GIPMNN yields less favorable results than DRM. However, by enforcing the interface conditions, PC-GIPMNN outperforms GIPMNN and DRM, as shown in the third column

Table 7 Results obtained via GIPMNN, PC-GIPMNN, and DRM compared with FEM for problems in Table 5. keff_rel denotes the relative error of keff. Especially, DRM is trained with Nepoch = 50000

Problem   keff (FEM)   keff (GIPMNN)   keff (PC-GIPMNN)   keff (DRM)   keff_rel (GIPMNN)   keff_rel (PC-GIPMNN)   keff_rel (DRM)
F1 1.0118 1.0094 1.0121 1.0096 2.3574 × 10−03 2.8227 × 10−04 2.1558 × 10−03
F2 1.0052 1.0023 1.0055 1.0024 2.8782 × 10−03 2.9864 × 10−04 2.7674 × 10−03
F3 1.0000 0.9968 0.9999 0.9965 3.2380 × 10−03 1.2475 × 10−04 3.4253 × 10−03
F4 1.0079 1.0053 1.0081 1.0054 2.5913 × 10−03 2.4834 × 10−04 2.4439 × 10−03
F5 1.0052 1.0025 1.0054 1.0024 2.7120 × 10−03 3.1435 × 10−04 2.7394 × 10−03
R1 1.0118 1.0094 1.0121 1.0096 2.3574 × 10−03 2.8227 × 10−04 2.1558 × 10−03
R2 1.0052 1.0023 1.0055 1.0024 2.8782 × 10−03 2.9864 × 10−04 2.7674 × 10−03
R3 0.9921 0.9937 0.9927 0.9878 1.5981 × 10−03 6.0649 × 10−04 4.3297 × 10−03
R4 1.0017 1.0014 1.0015 0.9987 3.5502 × 10−04 2.4851 × 10−04 3.0409 × 10−03
R5 0.9780 0.9759 0.9781 0.9721 2.1223 × 10−03 1.6160 × 10−04 6.0148 × 10−03
R6 0.9668 0.9639 0.9680 0.9584 3.0188 × 10−03 1.2722 × 10−03 8.6348 × 10−03
R7 1.1018 1.0929 1.1020 1.0953 8.0309 × 10−03 2.1827 × 10−04 5.9042 × 10−03
The bold numbers are the best results


Table 8 Results obtained via GIPMNN, PC-GIPMNN, and DRM when compared with FEM for problems in Table 5. φ_rel denotes the relative error of the eigenfunction. Especially, DRM is trained with Nepoch = 50000

Problem   φ_rel (GIPMNN)   φ_rel (PC-GIPMNN)   φ_rel (DRM)
F1   4.6050 × 10−02   3.0937 × 10−02   4.4186 × 10−02
F2   8.9731 × 10−02   3.6713 × 10−02   7.0763 × 10−02
F3   8.9385 × 10−02   4.1200 × 10−02   6.7164 × 10−02
F4   8.5110 × 10−02   4.3430 × 10−02   5.9907 × 10−02
F5   7.9413 × 10−02   4.6256 × 10−02   6.6049 × 10−02
R1   4.6050 × 10−02   3.0937 × 10−02   4.4186 × 10−02
R2   8.9731 × 10−02   3.6713 × 10−02   7.0763 × 10−02
R3   1.1047 × 10−01   6.2990 × 10−02   8.4720 × 10−02
R4   1.3013 × 10−01   2.9511 × 10−02   6.8647 × 10−02
R5   2.5852 × 10−01   2.9718 × 10−02   7.7923 × 10−02
R6   4.6253 × 10−01   5.1511 × 10−02   9.1424 × 10−02
R7   1.1612 × 10−01   1.8635 × 10−02   7.2148 × 10−02
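For completeness, the error metrics listed in Tables 6-8 can be reproduced from the FEM reference with a few lines of NumPy. The relative L2 norm used below for the eigenfunction error is an assumption made for illustration, and the eigenfunctions are assumed to be evaluated on the same grid and normalized consistently.

```python
import numpy as np

def relative_errors(k_fem, k_nn, phi_fem, phi_nn):
    """Relative error of k_eff and relative L2 error of the eigenfunction."""
    k_rel = abs(k_nn - k_fem) / abs(k_fem)
    phi_rel = np.linalg.norm(phi_nn - phi_fem) / np.linalg.norm(phi_fem)
    return k_rel, phi_rel

# Example with the rounded Table 7 values for problem F1 (dummy eigenfunctions)
k_rel, _ = relative_errors(1.0118, 1.0121, np.ones(4), np.ones(4))
print(f"{k_rel:.4e}")   # 2.9650e-04 with rounded inputs; Table 7 lists 2.8227e-04
```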

4.3 2D IAEA benchmark problem

We also considered the classical 2D IAEA benchmark problem reported in the study by Yang et al. [40], which was modeled using two-dimensional and two-group diffusion equations. Here, the one-group neutron diffusion equation defined in Eq. (4) is considered, and multigroup problems will be considered in our future study. Its geometry is shown in Fig. 11. The main bulk of the reactor consists of two fuel regions, labeled 1 and 2, representing the two types of fuel materials. Within its central region, there are four control rods, which are all labeled as 3. The last region, labeled 4, is composed of water. Cross-sectional data for the 2D IAEA benchmark problem are presented in Table 9. It is worth noting that only one quarter of the reactor is shown in this figure because the rest can be inferred by symmetry along the x- and y-axes. Therefore, this 2D IAEA benchmark problem is confined to the two types of boundary conditions defined in Eq. (7): the Neumann boundary condition on the x- and y-axes and the Robin boundary condition on the other boundaries.

Fig. 11 (Color online) Geometry of the 2D IAEA benchmark problem with two fuel regions, four control-rod regions, and a water region, labeled as 1, 2, 3, and 4. This figure is similar to the figure reported in a previous study [40]

Table 9 Cross section data for the 2D IAEA benchmark problem

Region   Σa (cm−1)   Σs (cm−1)   νΣf (cm−1)   Material
1   2.0   0.53   0.079   Fuel 1
2   0.087   0.55   0.085   Fuel 2
3   0.38   0.20   0.0   Control rod
4   0.01   0.89   0.0   Water

In this two-dimensional case, we used FreeFEM to solve the 2D IAEA benchmark problem with the parameters listed in Table 9. We selected uniform grids with Δx = 1/170 and Δy = 1/170. We trained GIPMNN, PC-GIPMNN, and DRM with Nx = 171 and Ny = 171.

4.3.1 Using GIPMNN and PC-GIPMNN to solve for Eigenvalue

The number of points N = Nx Ny is used to calculate Loss_gipmnn. The numbers of points N_Nb on the x- and y-axes and N_Rb on the other boundaries were used to calculate Loss_Nb and Loss_Rb, which enforce the Neumann and Robin boundary conditions. The number of neurons was 20 for each hidden layer in ResNet, and Nepoch = 500000 for GIPMNN and PC-GIPMNN. As mentioned previously, we used the optimal ratio of the outer and inner iterations, which was 1 : 1.

The 2D reactor is divided into seven parts, as shown in Fig. 11. The output layer has seven neurons, and seven functions are defined in the different subdomains and labeled as u, v, w, r, p, q, and h, where u, v, w, and r denote the functions defined in the control rods and p, q, and h are the functions defined in the fuel and water regions, respectively. S_up, S_vp, S_wp, S_rp, S_rq, S_pq, and S_qh denote the datasets at the different interfaces.

For the PC-GIPMNN, the interface conditions (29) and (30) are implemented as the loss functions (47) and (48), respectively, in the 2D example.

Loss_i1 = Σ_{i=1}^{|S_up|} |u(x_i) − p(x_i)|² + Σ_{i=1}^{|S_vp|} |v(x_i) − p(x_i)|²
        + Σ_{i=1}^{|S_wp|} |w(x_i) − p(x_i)|²
        + Σ_{i=1}^{|S_rp|} |r(x_i) − p(x_i)|² + Σ_{i=1}^{|S_rq|} |r(x_i) − q(x_i)|²
        + Σ_{i=1}^{|S_pq|} |p(x_i) − q(x_i)|²
        + Σ_{i=1}^{|S_qh|} |q(x_i) − h(x_i)|²,   (47)

Loss_i2 = Σ_{i=1}^{|S_up|} |(−D3 ∇u(x_i) · n) − (−D2 ∇p(x_i) · n)|²
        + Σ_{i=1}^{|S_vp|} |(−D3 ∇v(x_i) · n) − (−D2 ∇p(x_i) · n)|²
        + Σ_{i=1}^{|S_wp|} |(−D3 ∇w(x_i) · n) − (−D2 ∇p(x_i) · n)|²
        + Σ_{i=1}^{|S_rp|} |(−D3 ∇r(x_i) · n) − (−D2 ∇p(x_i) · n)|²
        + Σ_{i=1}^{|S_rq|} |(−D3 ∇r(x_i) · n) − (−D1 ∇q(x_i) · n)|²
        + Σ_{i=1}^{|S_pq|} |(−D2 ∇p(x_i) · n) − (−D1 ∇q(x_i) · n)|²
        + Σ_{i=1}^{|S_qh|} |(−D1 ∇q(x_i) · n) − (−D4 ∇h(x_i) · n)|².   (48)

The eigenfunction can be expressed as follows:

φ = u l1 + v l2 + w l3 + r l4 + p l5 + q l6 + h l7,   (49)

where l1, l2, l3, l4, l5, l6, and l7 are the indicator functions.
Physics-constrained neural network... Page 19 of 23 161

Fig. 10 (Color online) First column shows the heatmap of the eigen- problems R3, R4, R5, R6, and R7. Obviously, GIPMNN yields less
function of FEM (the first column) and the other columns show the favorable results than DRM. However, by enforcing the interface con-
heatmaps of the relative error of GIPMNN (the second column), ditions, PC-GIPMNN outperforms GIPMNN and DRM, as shown in
PC-GIPMNN (the third column), and DRM (the fourth column) for the third column

N Length NRb 1 N
Ar ea
N i=1 D|∇φ(xi )|2 + NRb i=1 2 φ(x i )
2 + Ar ea
N i=1  φ(x i )
a 2
J= N
, (50)
i=1 v φ(x i )
Ar ea f 2
N

where Ar ea denotes the area of all regions, Length indicates 4.3.3 Results
the length of the boundaries other than the x- and y-axes in
Fig. 11, and NRb denotes the number of points on the Robin As discussed above, we also trained the DRM with the num-
boundary. ber of epochs Nepoch = 500000 and found that DRM failed

123
161 Page 20 of 23 Q.-H. Yang et al.

Table 11 Results obtained via GIPMNN, PC-GIPMNN, and DRM


when compared with FEM for the 2D IAEA benchmark problem. φ rel
denotes the relative error of the eigenfunction. Especially, DRM is
trained with Nepoch = 50000

Problem φ rel (GIPMNN) φ rel (PC-GIPMNN) φ rel (DRM)

IAEA 8.5688 × 10−02 4.7381 × 10−02 8.9602 × 10−02

For the 1D slab reactor, the computation time of FEM


is 1.25 s, and the training times of DRM, GIPMNN, and
PC-GIPMNN are 4788.37 s, 10614.57 s, and 16052.16 s,
respectively. For the 2D reactor, the computation time of
FEM is 3.64 s, and the training times of DRM, GIPMNN,
and PC-GIPMNN are 5827.91 s, 18444.39 s, and 108072.41
s, respectively. For the 2D IAEA benchmark, the computa-
tion time of FEM is 5.22 s, and the training times of DRM,
GIPMNN, and PC-GIPMNN are 29352.83 s, 64546.74 s, and
137812.92 s, respectively.
Although PC-GIPMNN was better than DRM and
GIPMNN, it required significantly more training time to
Fig. 11 (Color online) Geometry of the 2D IAEA benchmark prob-
lem with two fuel regions, four control-rod regions, and a water region
obtain accurate results. Compared with FEM, neural network
labeled as 1, 2, 3 m and 4. This figure is similar to the figure reported in methods require excessive time. However, neural networks
a previous study [40] are an emerging method and we believe that they will achieve
better results in the near future.
Table 9 Cross section data for the 2D IAEA benchmark problem
Region  a (cm−1 )  s (cm−1 ) v f (cm−1 ) Material
5 Conclusion
1 2.0 0.53 0.079 Fuel 1
2 0.087 0.55 0.085 Fuel 2 In this study, we proposed two methods, GIPMNN and PC-
3 0.38 0.20 0.0 Control rod GIPMNN, to solve generalized K-eigenvalue problems in
4 0.01 0.89 0.0 Water nuclear reactor physics. We also conducted a comprehen-
sive study of GIPMNN, PC-GIPMNN, and DRM. GIPMNN
follows the main idea of the inverse power method to find
to learn the eigenfunction again. In this case, DRM attains the smallest eigenvalue. The PC-GIPMNN enforce interface
a good keff = 0.9750 and the relative errors of keff and φ conditions through multiple neurons in the output layer. The
are 6.4023 × 10−03 and 0.9557, respectively. Therefore, we concept of DRM is to define the function of the Rayleigh
retrained the DRM with Nepoch = 50000 and stored the best quotient and form an optimization problem. Unlike DRM
results. solving for the smallest eigenvalue by directly minimizing
The relative errors of keff and eigenfunction φ are shown in the eigenvalue (Rayleigh quotient), the GIPMNN and PC-
Table 10 and Table 11. It was observed that all three methods GIPMNN attain the smallest eigenvalue using the iterative
obtained good results, and the relative errors of keff of DRM method. All the methods used neural networks to represent
were small. However, the ability of DRM to learn the eigen- functions and the differential was implemented using AD.
function was worse than that of GIPMNN and PC-GIPMNN Finally, we applied these three methods to problems in reac-
and the relative errors of φ were the largest. Thus, it can tor physics.
be concluded that the PC-GIPMNN successfully learned the Three numerical experiments were conducted to verify the
eigenvalues and eigenfunctions. The same conclusion can be applicability and accuracy of the GIPMNN, PC-GIPMNN,
drawn from the graphs in Fig. 12. and DRM. In the first 1D example, we used inner and outer
Table 10 Results obtained via GIPMNN, PC-GIPMNN, and DRM when compared with FEM for the 2D IAEA benchmark problem. Specifically,
epoch = 50000
rel denotes the relative error of k . Especially, DRM is trained with N
keff eff

Problem keff (FEM) keff (GIPMNN) keff (PC-GIPMNN) keff (DRM) rel (GIPMNN)
keff rel (PC-GIPMNN)
keff rel (DRM)
keff

IAEA 0.9688 0.9692 0.9691 0.9685 4.0370 × 10−04 3.0812 × 10−04 2.8218 × 10−04
The bold numbers are the best results

123
Physics-constrained neural network... Page 21 of 23 161

Fig. 12 (Color online) First column shows the heatmap of the eigen- IAEA benchmark problem. However, by enforcing the interface condi-
function of FEM (the first column) and the other columns show the tions, PC-GIPMNN outperforms GIPMNN and DRM, as shown in the
heatmaps of the relative error of GIPMNN (the second column), PC- third column
GIPMNN (the third column), and DRM (the fourth column) for the 2D

iterations for the simulation. According to our test, the best eigenvalue problem should be studied and clarified. Finally,
ratio of outer and inner iterations was 1 : 1. Furthermore, for the interface problem, a suitable sampling algorithm can
we compared the results of the GIPMNN, PC-GIPMNN, facilitate the training process and provide a better approxima-
and DRM with those of the FEM. For the continuous prob- tion. The next study could also involve the implementation
lem, the solution learned by the GIPMNN was more accurate of the proposed networks in emerging reactor digital twins
than those learned by the DRM and PC-GIPMNN. For inter- [75–77] as core solvers.
face problem, the eigenvalue and eigenfunction learned by
Author Contributions All authors contributed to the study conception
PC-GIPMNN were better than that learned by DRM and and design. Material preparation, data collection and analysis were per-
GIPMNN. This is due to the interface conditions that are formed by Qi-Hong Yang, Yu Yang, Yang-Tao Deng, Qiao-Lin He,
implemented in the loss function of PC-GIPMNN. He-Lin Gong, and Shi-Quan Zhang. The first draft of the manuscript
In the 2D examples, we observed the failure of DRM on was written by Qi-Hong Yang and all authors commented on previ-
ous versions of the manuscript. All authors read and approved the final
the 2D experiments. The DRM can learn the eigenvalue and manuscript.
eigenfunction at the early stage of the training process, and
the results decrease when Nepoch increases. Therefore, we Declarations
selected Nepoch = 50000 to train the DRM and compare
the results obtained by the GIPMNN, PC-GIPMNN, and Conflict of interest The authors declare no conflict of interest.
DRM with those obtained by the FEM. The results show
that the PC-GIPMNN outperforms the GIPMNN and DRM Open Access This article is licensed under a Creative Commons
for the results of eigenvalue and eigenfunction. Moreover, the Attribution 4.0 International License, which permits use, sharing, adap-
tation, distribution and reproduction in any medium or format, as
GIPMNN and PC-GIPMNN are more stable than the DRM. long as you give appropriate credit to the original author(s) and the
Although good results were obtained, there are still some source, provide a link to the Creative Commons licence, and indi-
aspects that need to be examined in the future. First, given cate if changes were made. The images or other third party material
that the architecture of a neural network significantly influ- in this article are included in the article’s Creative Commons licence,
unless indicated otherwise in a credit line to the material. If material
ences the accuracy of our methods, it is important to design a is not included in the article’s Creative Commons licence and your
universal network architecture to achieve high accuracy. For intended use is not permitted by statutory regulation or exceeds the
example, MLP is widely used in the current field of solving permitted use, you will need to obtain permission directly from the copy-
PDEs using neural networks, and ResNet [60] is effective right holder. To view a copy of this licence, visit http://creativecomm
ons.org/licenses/by/4.0/.
in improving the convergence rate and may even obtain bet-
ter results than MLP. Recently, a transformer [74] was used
for operator learning and better results were obtained. Sifan References
Wang et al. [62] investigated the effect of MLP on opera-
1. A. Hébert, Applied Reactor Physics, 3rd edn. (Presses interna-
tor learning and proposed a modified MLP [61], which is a tionales Polytechnique, 2020)
new architecture of MLP that improves accuracy. Although 2. T.B. Fowler, D.R. Vondy, Nuclear reactor core analysis code: Cita-
GIPMNN and PC-GIPMNN are more stable than DRM and tion. (Jan 1969) https://www.osti.gov/biblio/4772428
PC-GIPMNN is more accurate than DRM, they require a 3. A. Hébert, Development of the nodal collocation method for
solving the neutron diffusion equation. Ann. Nuclear Energy 14,
large number of iterations and a long training time to achieve 527–541 (1987). https://doi.org/10.1016/0306-4549(87)90074-0
good results. Therefore, improving convergence and reduc- 4. L.A. Semenza, E.E. Lewis, E.C. Rossow, The application of the
ing training time will be investigated in our future study. finite element method to the multigroup neutron diffusion equation.
Furthermore, the failure of DRM on 2D experiments on the Nuclear Sci. Eng. 47, 302–310 (1972). https://doi.org/10.13182/
NSE72-A22416

123
5. A. Hébert, Application of a dual variational formulation to finite element reactor calculations. Ann. Nuclear Energy 20, 823-845 (1993). https://doi.org/10.1016/0306-4549(93)90076-2
6. L. Wanai, G. Helin, Z. Chunyu, Solution of neutron diffusion problems by discontinuous Galerkin finite element method with consideration of discontinuity factors. J. Nuclear Eng. Radiat. Sci. 9, 031503 (2023). https://doi.org/10.1115/1.4055379
7. R. Lawrence, Progress in nodal methods for the solution of the neutron diffusion and transport equations. Prog. Nuclear Energy 17, 271-301 (1986). https://doi.org/10.1016/0149-1970(86)90034-X
8. K.S. Smith, An analytic nodal method for solving the two-group, multidimensional, static and transient neutron diffusion equation. (Mar 1979) http://hdl.handle.net/1721.1/15979
9. P. An, Y. Ma, P. Xiao et al., Development and validation of reactor nuclear design code corca-3d. Nuclear Eng. Technol. 51, 1721-1728 (2019). https://doi.org/10.1016/j.net.2019.05.015
10. A. Kuz'min, Iterative methods for solving nonlinear problems of nuclear reactor criticality. Phys. Atomic Nuclei 75, 1551-1556 (2012). https://doi.org/10.1134/S1063778812130042
11. W.M. Stacey, Nuclear reactor physics (John Wiley & Sons, 2007). https://doi.org/10.1002/9783527611041
12. S. Marguet, The physics of nuclear reactors (Springer, 2018). https://doi.org/10.1007/978-3-319-59560-3
13. K. Tang, X. Wan, C. Yang, Das-pinns: a deep adaptive sampling method for solving high-dimensional partial differential equations. J. Comput. Phys. 476, 111868 (2023). https://doi.org/10.1016/j.jcp.2022.111868
14. J.P. Argaud, B. Bouriquet, P. Erhard et al., Data assimilation in nuclear power plant core (Springer, 2010). https://doi.org/10.1007/978-3-642-12110-4_61
15. H. Gong, Y. Yu, Q. Li et al., An inverse-distance-based fitting term for 3d-var data assimilation in nuclear core simulation. Ann. Nuclear Energy 141, 107346 (2020). https://doi.org/10.1016/j.anucene.2020.107346
16. S. Cheng, J. Chen, C. Anastasiou et al., Generalised latent assimilation in heterogeneous reduced spaces with machine learning surrogate models, vol. 94 (Springer, 2023). https://doi.org/10.1007/s10915-022-02059-4
17. S. Cheng, D. Lucor, J.P. Argaud, Observation data compression for variational assimilation of dynamical systems. J. Comput. Sci. 53, 101405 (2021). https://doi.org/10.1016/j.jocs.2021.101405
18. S. Riva, C. Introini, S. Lorenzi et al., Hybrid data assimilation methods, part I: numerical comparison between GEIM and PBDW. Ann. Nuclear Energy 190, 109864 (2023). https://doi.org/10.1016/j.anucene.2023.109864
19. S. Riva, C. Introini, S. Lorenzi et al., Hybrid data assimilation methods, part II: application to the dynasty experimental facility. Ann. Nuclear Energy 190, 109863 (2023). https://doi.org/10.1016/j.anucene.2023.109863
20. L.C. Evans, Partial differential equations. (2010) https://bookstore.ams.org/gsm-19-r/
21. J. Han, A. Jentzen et al., Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 5, 349-380 (2017). https://doi.org/10.1007/s40304-017-0117-6
22. J. Han, A. Jentzen, W. E, Solving high-dimensional partial differential equations using deep learning. Proc. Nat. Acad. Sci. 115, 8505-8510 (2018). https://doi.org/10.1073/pnas.1718942115
23. J. Sirignano, K. Spiliopoulos, Dgm: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339-1364 (2018). https://doi.org/10.1016/j.jcp.2018.08.029
24. W. E, B. Yu, The deep ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6, 1-12 (2018). https://doi.org/10.1007/s40304-018-0127-z
25. M. Raissi, P. Perdikaris, G.E. Karniadakis, Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686-707 (2019). https://doi.org/10.1016/j.jcp.2018.10.045
26. X. Jiang, D. Wang, Q. Fan et al., Physics-informed neural network for nonlinear dynamics in fiber optics. Laser & Photon. Rev. 16, 2100483 (2022). https://doi.org/10.1002/lpor.202100483
27. D.W. Abueidda, S. Koric, E. Guleryuz et al., Enhanced physics-informed neural networks for hyperelasticity. Int. J. Numer. Methods Eng. 124, 1585-1601 (2023). https://doi.org/10.1002/nme.7176
28. D.W. Abueidda, Q. Lu, S. Koric, Meshless physics-informed deep learning method for three-dimensional solid mechanics. Int. J. Numer. Methods Eng. 122, 7182-7201 (2021). https://doi.org/10.1002/nme.6828
29. R. Laubscher, Simulation of multi-species flow and heat transfer using physics-informed neural networks. Phys. Fluids 33, 087101 (2021). https://doi.org/10.1063/5.0058529
30. S. Cai, Z. Wang, S. Wang et al., Physics-informed neural networks for heat transfer problems. J. Heat Trans. 143, 060801 (2021). https://doi.org/10.1115/1.4050542
31. Y. Liu, R. Hu, A. Kraus et al., Data-driven modeling of coarse mesh turbulence for reactor transient analysis using convolutional recurrent neural networks. Nuclear Eng. Des. 390, 111716 (2022). https://doi.org/10.1016/j.nucengdes.2022.111716
32. Y. Chen, L. Lu, G.E. Karniadakis et al., Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Opt. Express 28, 11618-11633 (2020). https://doi.org/10.1364/OE.384875
33. T. Kadeethum, T.M. Jørgensen, H.M. Nick, Physics-informed neural networks for solving inverse problems of nonlinear biot's equations: batch training. (Jun 2020) https://onepetro.org/ARMAUSRMS/proceedings-abstract/ARMA20/All-ARMA20/447584
34. S. Cheng, I. Colin Prentice, Y. Huang et al., Data-driven surrogate model with latent data assimilation: application to wildfire forecasting. J. Comput. Phys. 464, 111302 (2022). https://doi.org/10.1016/j.jcp.2022.111302
35. S. Cheng, J. Chen, C. Anastasiou et al., Generalised latent assimilation in heterogeneous reduced spaces with machine learning surrogate models. J. Sci. Comput. 94, 11 (2023). https://doi.org/10.1007/s10915-022-02059-4
36. Y. Gao, M.K. Ng, Wasserstein generative adversarial uncertainty quantification in physics-informed neural networks. J. Comput. Phys. 463, 111270 (2022). https://doi.org/10.1016/j.jcp.2022.111270
37. C. Oszkinat, S.E. Luczak, I. Rosen, Uncertainty quantification in estimating blood alcohol concentration from transdermal alcohol level with physics-informed neural networks. IEEE Trans. Neural Netw. Learn. Syst. (2022). https://doi.org/10.1109/TNNLS.2022.3140726
38. Y. Yang, P. Perdikaris, Adversarial uncertainty quantification in physics-informed neural networks. J. Comput. Phys. 394, 136-152 (2019). https://doi.org/10.1016/j.jcp.2019.05.027
39. Y. Liu, D. Wang, X. Sun et al., Uncertainty quantification for multiphase-cfd simulations of bubbly flows: a machine learning-based Bayesian approach supported by high-resolution experiments. Reliab. Eng. Syst. Saf. 212, 107636 (2021). https://doi.org/10.1016/j.ress.2021.107636
40. Y. Yang, H. Gong, S. Zhang et al., A data-enabled physics-informed neural network with comprehensive numerical study on solving neutron diffusion eigenvalue problems. Ann. Nuclear Energy 183, 109656 (2023). https://doi.org/10.1016/j.anucene.2022.109656
41. I. Ben-Shaul, L. Bar, N. Sochen, Solving the functional eigen-problem using neural networks. (Jul 2020). https://doi.org/10.48550/arXiv.2007.10205
42. I. Ben-Shaul, L. Bar, N. Sochen, Deep learning solution of the eigenvalue problem for differential operators. Neural Comput. 35, 1100-1134 (2023). https://doi.org/10.1162/neco_a_01583
43. H. Jin, M. Mattheakis, P. Protopapas, Unsupervised neural networks for quantum eigenvalue problems. (Oct 2020). https://doi.org/10.48550/arXiv.2010.05075
44. H. Jin, M. Mattheakis, P. Protopapas, Physics-informed neural networks for quantum eigenvalue problems. 2022 International Joint Conference on Neural Networks (IJCNN) 1-8 (2022). https://doi.org/10.1109/IJCNN55064.2022.9891944
45. M.H. Elhareef, Z. Wu, Physics-informed neural network method and application to nuclear reactor calculations: a pilot study. Nuclear Sci. Eng. 197, 1-22 (2022). https://doi.org/10.1080/00295639.2022.2123211
46. V.L. Berdichevsky, Variational principles. Var. Princ. Contin. Mech. (2009). https://doi.org/10.1007/978-3-540-88467-5_1
47. A.G. Baydin, B.A. Pearlmutter, A.A. Radul et al., Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res. 18, 1-43 (2018)
48. J. Han, J. Lu, M. Zhou, Solving high-dimensional eigenvalue problems using deep neural networks: a diffusion Monte Carlo like approach. J. Comput. Phys. 423, 109792 (2020). https://doi.org/10.1016/j.jcp.2020.109792
49. Q. Yang, Y. Deng, Y. Yang et al., Neural networks based on power method and inverse power method for solving linear eigenvalue problems. Comput. Math. Appl. 147, 14-24 (2023). https://doi.org/10.1016/j.camwa.2023.07.013
50. W.H. Greub, Linear algebra, vol. 23 (Springer Science & Business Media, 2013). https://doi.org/10.1007/978-3-662-01545-2
51. M.H. Elhareef, Z. Wu, Extension of the PINN diffusion model to k-eigenvalue problems. (2022) https://inis.iaea.org/search/search.aspx?orig_q=RN:54002255
52. A.D. Jagtap, E. Kharazmi, G.E. Karniadakis, Conservative physics-informed neural networks on discrete domains for conservation laws: applications to forward and inverse problems. Comput. Methods Appl. Mech. Eng. 365, 113028 (2020). https://doi.org/10.1016/j.cma.2020.113028
53. J. Wang, X. Peng, Z. Chen et al., Surrogate modeling for neutron diffusion problems based on conservative physics-informed neural networks with boundary conditions enforcement. Ann. Nuclear Energy 176, 109234 (2022). https://doi.org/10.1016/j.anucene.2022.109234
54. A. Quarteroni, G. Rozza, Reduced order methods for modeling and computational reduction, vol. 9 (Springer, 2014). https://doi.org/10.1007/978-3-319-02090-7
55. J.S. Hesthaven, G. Rozza, B. Stamm, Certified reduced basis methods for parametrized partial differential equations (Springer, 2016). https://doi.org/10.1007/978-3-319-22470-1
56. T. Phillips, C.E. Heaney, P.N. Smith et al., An autoencoder-based reduced-order model for eigenvalue problems with application to neutron diffusion. Int. J. Numer. Methods Eng. 122, 3780-3811 (2021). https://doi.org/10.1002/nme.6681
57. Z. Wang, Z. Zhang, A mesh-free method for interface problems using the deep learning approach. J. Comput. Phys. 400, 108963 (2019). https://doi.org/10.1016/j.jcp.2019.108963
58. C. He, X. Hu, L. Mu, A mesh-free method using piecewise deep neural network for elliptic interface problems. J. Comput. Appl. Math. 412, 114358 (2022). https://doi.org/10.1016/j.cam.2022.114358
59. W.F. Hu, T.S. Lin, M.C. Lai, A discontinuity capturing shallow neural network for elliptic interface problems. J. Comput. Phys. 469, 111576 (2022). https://doi.org/10.1016/j.jcp.2022.111576
60. K. He, X. Zhang, S. Ren et al., Deep residual learning for image recognition. (June 2016) https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html
61. S. Wang, Y. Teng, P. Perdikaris, Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM J. Sci. Comput. 43, A3055-A3081 (2021). https://doi.org/10.1137/20M1318043
62. S. Wang, H. Wang, P. Perdikaris, Improved architectures and training algorithms for deep operator networks. J. Sci. Comput. 92, 35 (2022). https://doi.org/10.1007/s10915-022-01881-0
63. L. Lyu, K. Wu, R. Du et al., Enforcing exact boundary and initial conditions in the deep mixed residual method. (Aug 2020). https://doi.org/10.48550/arXiv.2008.01491
64. S. Dong, N. Ni, A method for representing periodic functions and enforcing exactly periodic boundary conditions with deep neural networks. J. Comput. Phys. 435, 110242 (2021). https://doi.org/10.1016/j.jcp.2021.110242
65. G.F. Carrier, C.E. Pearson, Partial differential equations: theory and technique (Academic Press, 2014)
66. A. Buchan, C. Pain, F. Fang et al., A pod reduced-order model for eigenvalue problems with application to reactor physics. Int. J. Numer. Methods Eng. 95, 1011-1032 (2013). https://doi.org/10.1002/nme.4533
67. F. Hecht, New development in freefem++. J. Numer. Math. 20, 251-266 (2012). https://doi.org/10.1515/jnum-2012-0013
68. J.P. Argaud, B. Bouriquet, F. De Caso et al., Sensor placement in nuclear reactors based on the generalized empirical interpolation method. J. Comput. Phys. 363, 354-370 (2018). https://doi.org/10.1016/j.jcp.2018.02.050
69. H. Gong, Y. Yu, Q. Li, Reactor power distribution detection and estimation via a stabilized Gappy proper orthogonal decomposition method. Nuclear Eng. Des. 370, 110833 (2020). https://doi.org/10.1016/j.nucengdes.2020.110833
70. H. Gong, Z. Chen, W. Wu et al., Neutron noise calculation: a comparative study between sp3 theory and diffusion theory. Ann. Nuclear Energy 156, 108184 (2021). https://doi.org/10.1016/j.anucene.2021.108184
71. H. Gong, Z. Chen, Y. Maday et al., Optimal and fast field reconstruction with reduced basis and limited observations: application to reactor core online monitoring. Nuclear Eng. Des. 377, 111113 (2021). https://doi.org/10.1016/j.nucengdes.2021.111113
72. J. Argaud, B. Bouriquet, H. Gong et al., Stabilization of (G)EIM in presence of measurement noise: application to nuclear reactor physics (Springer, 2017). https://doi.org/10.1007/978-3-319-65870-4_8
73. H. Gong, S. Cheng, Z. Chen et al., Data-enabled physics-informed machine learning for reduced-order modeling digital twin: application to nuclear reactor physics. Nuclear Sci. Eng. 196, 668-693 (2022). https://doi.org/10.1080/00295639.2021.2014752
74. S. Cao, Choose a transformer: Fourier or Galerkin. Adv. Neural Inf. Process. Syst. 34, 24924-24940 (2021)
75. H. Gong, T. Zhu, Z. Chen et al., Parameter identification and state estimation for nuclear reactor operation digital twin. Ann. Nuclear Energy 180, 109497 (2023). https://doi.org/10.1016/j.anucene.2022.109497
76. H. Gong, S. Cheng, Z. Chen et al., An efficient digital twin based on machine learning svd autoencoder and generalised latent assimilation for nuclear reactor physics. Ann. Nuclear Energy 179, 109431 (2022). https://doi.org/10.1016/j.anucene.2022.109431
77. H. Gong, S. Cheng, Z. Chen et al., Data-enabled physics-informed machine learning for reduced-order modeling digital twin: application to nuclear reactor physics. Nuclear Sci. Eng. 196, 668-693 (2022). https://doi.org/10.1080/00295639.2021.2014752
