Openpipeflow 1.02b

Download as pdf or txt
Download as pdf or txt
You are on page 1of 22

A Pipe-Flow Code

Primitive variable version


Ashley P. Willis
May 29, 2014
CONTENTS
1 Getting started 3
1.1 Overview of les . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Input les . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.1 Snapshot data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.2 Time-series data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Compiling libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Compiling and running . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.7 Monitoring a run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.8 Making utils . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Spatial representation 8
2.1 Finite dierences r . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Dierentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.2 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.3 Matrix representation of linear operators . . . . . . . . . . . . . . . 9
2.2 Fourier representaion , z . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Expansion of variables . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.3 Energy integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.4 Volume integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Enforced conditions at the axis . . . . . . . . . . . . . . . . . . . . . . . . . 10
3 Model 11
3.1 Governing equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.1 Non-dimensionalisation / scales . . . . . . . . . . . . . . . . . . . . . 11
3.1.2 Dimensionless parameters . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.3 Evolution equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Decoupling the equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Simple projection method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4 PPE-formulation with correct boundary conditions . . . . . . . . . . . . . . 12
3.5 Predictor-corrector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.6 Timestep control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4 The code 15
4.1 Naming conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
CONTENTS 2
4.1.1 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.2 Public variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2 Ordering the Fourier modes . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.3 Data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.3.1 coll, spec storage of coecients . . . . . . . . . . . . . . . . . . . 16
4.3.2 phys real space data . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.3.3 mesh, lumesh storage of banded matrices . . . . . . . . . . . . . . 17
4.3.4 rdom the radial domain . . . . . . . . . . . . . . . . . . . . . . . 17
4.4 Modications for the parallel implementation . . . . . . . . . . . . . . . . . 17
A Dierential operators in cylindricalpolar coordinates 20
1
GETTING STARTED
1.1 Overview of les
Makefile will require modication for your compiler and libraries (see 1.5). Sample
commands for other compilers can be found near the top of the le.
parallel.h Ensure _Np is set to 1 if you do not have MPI. This le contains macros
for parallelisation that are only invoked if the number of processes Np is greater than
1.
program/parameters.f90, where youll nd parameters. See 1.2
utils/ contains a number of utilities. Almost anything can be done in a util, both
post-processing and analysing data at runtime. There should be no need to alter the
core code. See 1.8
Matlab/, a few scripts. See Matlab/Readme.txt .
doc/, this document!
Chapter 1. Getting started 4
1.2 Parameters
./parallel.h:
_Np Number of processors (set 1 for serial)
./program/parameters.f90:
i_N Number of radial points n [1, N]
i_K Maximum k (axial), k (K, K)
i_M Maximum m (azimuthal), m [0, M)
i_Mp Azimuthal periodicity, i.e. m = 0, M
p
, 2M
p
, . . . , (M 1)M
p
(set =1 for no symmetry assumption)
d_Re Reynolds number Re or Rm
m
d_alpha Axial wavenumber = 2/L
z
b_const_flux Enforce constant ux U
b
=
1
2
.
i_save_rate1 Save frequency for snapshot data les
i_save_rate2 Save frequency for time-series data
i_maxtstep Maximum number of timesteps (no limit if set < 0)
d_cpuhours Maximum number of cpu hours
d_time Start time (taken from state.cdf.in if set < 0)
d_timestep Fixed timestep (typically 0.01 or dynamically controlled if set < 0)
d_dterr Maximum corrector norm, |f
corr
|
d_courant Courant number C
d_implicit Implicitness c
1.3 Input les
State les are stored in the NetCDF data format and can be transferred across dier-
ent architectures safely. The program main.out runs with the compiled parameters (see
main.info) but will load states of other truncations. Copy any state le e.g. state0018.cdf.dat
to state.cdf.in It will interpolate if necessary.
state.cdf.in:
t Start time. Overridden by d_time if d_time0
t Timestep. Ignored, see parameter d_timestep.
N, M
p
, r
n
Number of radial points, azimuthal periodicity,
radial values of the input state.
If the radial points dier the elds are
interpolated onto the new points.
u
r
, u

, u
z
Field vectors.
If K ,=K or M ,=M, harmonics are truncated or
zeros appended.
1.4 Output
1.4.1 Snapshot data
Data saved every i_save_rate1 timesteps:
state????.cdf.dat
vel_spectrum.dat
All output is sent to the current directory, and state les are numbered 0000, 0001, 0002,. . . .
Each le can be renamed state.cdf.in should a restart be necessary. To list times t for
Chapter 1. Getting started 5
each saved state le,
grep state OUT
The spectrum les are overwritten each save as they are retrievable from the state data.
To verify sucient truncation, a quick prole of the energy spectrum can be plotted with
gnuplot> set log
gnuplot> plot vel_spectrum.dat w lp
1.4.2 Time-series data
Data saved every i_save_rate2 timesteps:
tim_step.dat t, t, t
f
, t
CFL
current and limiting step sizes
vel_energy.dat t, E energies and viscous dissipation
vel_friction.dat t, U
b
or , u
z
(r = 0)
z
, u

bulk velocites, pressure, friction vel.


1.5 Compiling libraries
Take a look at Makele. The compiler and ags will probably need editing. There are
suggested ags for many compilers at the top of this le.
Building the main code is successful if running make produces no errors, but this requires
the necessary libraries to be present - LAPACK, netCDF and FFTW. Often it is necessary
to build LAPACK and NetCDF with the compiler and compile ags that will be used for
the main simulation code.
The default procedure for building a package (FFTW3, netCDF) is
tar -xvvzf package.tar.gz
cd package/
[set environment variables if necessary]
./configure --prefix=<path>
make
make install
FFTW3. This usually requires no special treatment. Install with your package man-
ager or build with the default settings.
LAPACK. If using gfortran you might be able to use the binary version supplied for
your linux distribution. Otherwise, edit the le make.inc that comes with LAPACK, setting
the fortran compiler and ags to those you plan to use. Type make. Once nished, copy
the following binaries into your library path (see Makele LIBS -L<path>/lib/)
cp lapack.a <path>/lib/liblapack.a
cp blas.a <path>/lib/libblas.a
netCDF. If using gfortran you might be able to use the supplied binary version for
your linux distribution. Otherwise, typical environment variables required to build netCDF
are
CXX=""
FC=/opt/intel/fc/10.1.018/bin/ifort
FFLAGS="-O3 -mcmodel=medium"
export CXX FC FFLAGS
After the make install, ensure that les netcdf.mod and typesizes.mod appear in your
Chapter 1. Getting started 6
include path (see Makele COMPFLAGS -I<path>/include/). can be found at the top of
the page
http://www.unidata.ucar.edu/downloads/netcdf/current/index.jsp Version 4.1.3 is rela-
tively straight forward to install, the above ags should be sucient. It is trickier for more
recent versions [2013-12-11 currently netcdf-4.3.0.tar.gz]. First
./configure --disable-netcdf-4 --prefix-<path>
which disables HDF5 support (not currently required in code). Fortran is no longer bun-
dled, so get netcdf-fortran-4.2.tar.gz or a more recent version from here
http://www.unidata.ucar.edu/downloads/netcdf/index.jsp
Build with the above ags, and in addition
CPPFLAGS=-I<path>/include
export CPPFLAGS
Add the link ag -lnetcdff before -lnetcdf in the Makefile.
1.6 Compiling and running
For serial use set _Np to 1 in parallel.h. Otherwise MPI is required and the compile must
be updated in the Makefile. First set parameters in program/parameters.f90. Next type
> make
> make install
The second command creates the directory install/ and a text le main.info, which is a
record of settings at compile time. Next an initial condition state.cdf.in is needed,
> mv install ~/runs/job0001
> cd ~/runs/job0001/
> cp .../state0100.cdf.dat state.cdf.in
> nohup ./main.out > OUT 2> OUT.err &
Press enter again to see if main.out stopped. If it has stopped, check OUT.err or OUT for
clues why. If it appears to be running, wait a minute or two then
> rm RUNNING
This will terminate the job cleanly.
NOTE: Any output state, e.g. state0012.cdf.dat can be copied to state.cdf.in to be
used as an initial condition. If resolutions do not match, they are automatically interpolated
or truncated. This is how I generate almost all initial conditions, by taking a state from a
run with similar parameters. If there is a mismatch in Mp, use utils/changeMp.f90.
1.7 Monitoring a run
Immediately after starting a job, its a good idea to check for any warnings
> less OUT
To nd out number of timesteps completed, or for possible diagnosis of an early exit,
> tail OUT
Chapter 1. Getting started 7
The code outputs timeseries data and snapshot data, the latter has a 4-digit number e.g.
state0012.cdf.dat.
To see when the in the run each state was saved,
> head -n 1 vel_spec* | less
I often monitor progress with tail vel_energy.dat or
> gnuplot
> plot vel_energy.dat w l
Use rm RUNNING to end the job.
1.8 Making utils
The core of the code in program/ rarely needs to be changed. Almost anything can be
done by creating a utility; there are many examples in utils/
In Makefile, set UTIL = utilname (omitting the .f90 extension), then type make util
which builds utilname.out.
2
SPATIAL REPRESENTATION
2.1 Finite dierences r
A function p(r) is represented by the values p
n
= p(r
n
), where p(r) is evaluated on the
N radial points r
n
.
2.1.1 Dierentiation
Taylor expansions about central point x
0
, k neighbouring points each side
f(x
k
) = f(x
0
) + (x
k
x
0
) f

(x
0
) +
(x
k
x
0
)
2
2!
f

(x
0
) +. . .
.
.
.
f(x
k
) = f(x
0
) + (x
k
x
0
) f

(x
0
) +
(x
k
x
0
)
2
2!
f

(x
0
) +. . .
(2.1)
The expansions (2.1) can be written
f = Adf, df = A
1
f, f =
_

_
f(x
k
)
.
.
.
f(x
k
)
_

_, df =
_

_
f(x
0
)
f

(x
0
)
f

(x
0
)
.
.
.
_

_
(2.2)
Derivatives are calculated using weights from the appropriate row of A
1
.
2.1.2 Integration
Integrating (2.1) the indenite integral may be approximated about x
0
_
f(x) dx =
(x x
0
) f(x
0
) +
(xx
0
)
2
2!
f

(x
0
) +
(xx
0
)
3
3!
f

(x
0
) +. . .
(2.3)
= [
(x x
0
)
(xx
0
)
2
2!
. . .
] A
1
f (2.4)
The total integral is approximated by
1
2

n
_
x
n+1
x
n1
f(x) dx (2.5)
Chapter 2. Spatial representation 9
2.1.3 Matrix representation of linear operators
Linear equations can be written in the matrix form
L p = Mq +s. (2.6)
L and M are N N matrices representing linear operators. If q and s are known then the
right-hand side is simple to evaluate. Consider the equation
L p = q, (2.7)
where p is unknown. If the matrix operators are formed by linear combinations of the
weights in 2.1.1, using only k neighbouring points to each side, they are banded. Boundary
conditions are formed in the same way and are placed in the end rows. The equation is
solved by forward-backward substitution, using the banded LU-factorisation of L.
BC
p q L
=
LU
2.2 Fourier representaion , z
2.2.1 Expansion of variables
A(, z) =

km
A
km
e
i(kz+m
0
m)
(2.8)
A real implies A
km
= A

k,m
(coecients are conjugate-symmetric). It is sucient to keep
only m 0, and for m = 0 keep k 0.
2.2.2 Orthogonality
Exponentials
_
2
0
e
i(m+n)
d = 2
m,n
,
_ 2

0
e
i(k+j)
dz =
2


k,j
(2.9)
2.2.3 Energy integral
E =
1
2
_
A AdV =
2
2

km
_
[A
km
[
2
r dr (2.10)
E =

m=0
E
m
, E
m
=
_

_
2
2

_
A
2
00
r dr +
4
2

k>0
_
[A
2
k0
[ r dr, m = 0,
4
2

k
_
[A
km
[
2
r dr, m > 0.
(2.11)
Chapter 2. Spatial representation 10
2.2.4 Volume integral
E =
1
2
_
AdV =
4
2

_
A
00
r dr (2.12)
2.3 Enforced conditions at the axis
The geometry enforces conditions on each variable at the axis when expanded over Fourier
modes in . Given a vector A, each mode for A
z
is even in r if m is even, and is odd if m
is odd. Each mode for A
r
and A

is even in r if m is odd, and is odd if m is even.


3
MODEL
3.1 Governing equations
3.1.1 Non-dimensionalisation / scales
length R,
velocity, xed ux 2 U
b
,
velocity, xed pressure U
cl
of HPF.
For HPF U
cl
= 2 U
b
.
3.1.2 Dimensionless parameters
Reynolds number, xed ux, Re
m
= 2U
b
R/
Reynolds number, xed pressure, Re = U
cl
R/
1 + = Re/Re
m
Re

= u

R/ = (2 Re
m
(1 +))
1
2
= (2 Re)
1
2
3.1.3 Evolution equations
Fixed ux,
(
t
+u )u = p +
4
Re
m
(1 +) z +
1
Re
m

2
u (3.1)
Fixed pressure
(
t
+u )u = p +
4
Re
z +
1
Re

2
u (3.2)
Let u = W(r) z +u

. Using the scaling above, W(r) = 1 r


2
. Then
(
t

1
Re
m

2
) u

= u

( u

)
dW
dr
u

z
z W
z
u

+
4
Re
m
z p

. (3.3)
3.2 Decoupling the equations
The equations for u
r
and u

are coupled in the Laplacian. They can be separated in a


Fourier decompositon by considering
u

= u
r
i u

, (3.4)
Chapter 3. Model 12
for which the are considered respectively. Original variables are easily recovered
u
r
=
1
2
(u
+
+u

), u

=
i
2
(u
+
u

). (3.5)
Governing equations are then
(
t

2

) u

= N

(p)

, (3.6)
(
t

2
) u
z
= N
z
(p)
z
, (3.7)
where

=
2

1
r
2

i
r
2

(3.8)
3.3 Simple projection method
The basic projection method oers simplicity and exibility. The error measured by the
predictor-corrector is second order (on the nonlinear terms), but the method is rst order
overall. See following section for a better method.
The following steps are performed, where q refers to time t
q
:
u

u
q
t
= N
q+
1
2
, (3.9)
u
q+1
= u

, (3.10)
where

2
= u

. (3.11)
3.4 PPE-formulation with correct boundary conditions
Write the time-discretised NavierStokes equations in the form
_
Xu
q+1
= Yu
q
+N
q+
1
2
p ,

2
p = (Yu
q
+N
q+
1
2
),
(3.12)
where q denotes time t
q
, which is sixth order in r for u
q+1
and second order for p, where the
solenoidal condition is not explicitly imposed. Symmetry conditions provide the conditions
at the axis. The diculty is in imposing the remaining four this system should be
inverted, in principle, simultaneously for p and u
q+1
with boundary conditions u
q+1
= 0
and u
q+1
= 0 on r = R (Rempfer 2006). In practice it would be preferable to invert for
p rst then for u
q+1
, but the boundary conditions to not involve p directly.
Note that the Yu
q
term has been included in the right-hand side of the pressure-Poisson
equation, the divergence of which should be small. Assume that pressure boundary condi-
tion is known: the right-hand side of the NavierStokes equation is then projected onto the
space of solenoidal functions though p and hence after inversion, u
q+1
will be solenoidal.
Consider the bulk solution, u, p, obtained from solution of the following:
_
X u = Yu
q
+N
q+
1
2
p ,

2
p = (Yu
q
+N
q+
1
2
),
(3.13)
Chapter 3. Model 13
with boundary conditions u = 0 and
r
p = 0. Introduce the following systems:
_
Xu

= p

2
p

= 0,
(3.14)
with boundary conditions u = 0 and
r
p = 1 on r = R, and
_
X u = 0, (3.15)
with boundary conditions u
+
= 1, u

= 1, u
z
= i on r = R. The system (3.14) provides
a linearly independent function u

j
that may be added to u without aecting the right-
hand side in (3.13), but altering (to correct) the boundary condition applied. Similarly the
system (3.15) provides a further three functions. The superposition
u
q+1
= u +
4

j=1
a
j
u

j
(3.16)
may be formed in order to satisfy the four boundary conditions, u
q+1
= 0 and u
q+1
= 0
on r = R. Substituting (3.16) into the boundary conditions, they may be written
Aa = g( u), (3.17)
where A = A(g(u

)) is a 44 matrix. The appropriate coecients required to satisfy the


boundary conditions are thereby recovered from solution of this small system for a. The
error in the boundary conditions g
j
(u
q+1
) using the inuence-matrix technique is at the
level of the machine epsilon, typically order 1e-18.
The functions u

j
(r), the matrix A and its inverse may all be precomputed. The boundary
conditions for u

have been chosen so that that u

are pure real, u

z
is pure imaginary, and
A is real. For each timestep, this application of the inuence matrix technique requires
only evaluation of the deviation from the boundary condition, multiplication by a 44 real
matrix, and the addition of only two functions to each component of u, each either pure
real or pure imaginary. Compared to the evaluation of nonlinear terms, the computational
overhead is negligible.
3.5 Predictor-corrector
The model equation for each Fourier mode is
(
t

2
)f = N, (3.18)
where nonlinear terms have been evaluated on each radial point by the transform method,
and is solved as in 2.1.3. The predictor at time t
q
, with Euler nonlinear terms and implic-
itness c
f
q+1
1
f
q
t

_
c
2
f
q+1
1
+ (1 c)
2
f
q
_
= N
q
, (3.19)
_
1
t
c
2
_
f
q+1
1
=
_
1
t
+ (1 c)
2
_
f
q
+N
q
. (3.20)
Corrector iterations are
_
1
t
c
2
_
f
q+1
j+1
=
_
1
t
+ (1 c)
2
_
f
q
+c N
q+1
j
+ (1 c) N
q
, (3.21)
Chapter 3. Model 14
or equivalently, for the correction f
corr
= f
q+1
j+1
f
q+1
j
_
1
t
c
2
_
f
corr
= c N
q+1
j
c N
q+1
j1
, (3.22)
where j = 1, 2, . . . and N
q+1
0
= N
q
. The size of the correction |f
corr
| must reduce at each
iteration. For c =
1
2
the scheme is second order such that |f
corr
| t
2
.
3.6 Timestep control
t = C min(/ [v[), 0 < C < 1, (3.23)
where C is the Courant number.
The timestep should also be small enough such that the corrector norm |f
corr
| is satis-
factorily small. This may be important for integrating initial transients, but ought not to
be the limiting factor in general.
4
THE CODE
4.1 Naming conventions
4.1.1 Parameters
Parameters dened within parameters.f90 are given a prex indicating the data type.
For example:
N i_N integer
t d_timestep double
xed ux? b_fixed_flux boolean
Note that the naming convention frees the variable name for dummy variables, e.g.:
integer :: n
do n = 1, i_N
...
4.1.2 Public variables
By default variables declared within a module are accessible to any other module or
subroutine that uses the module. To ensure that it is clear where a public variable is
declared, variables are given a prex. Examples:
t tim_dt timestep.f90
u
r
vel_r velocity.f90
4.2 Ordering the Fourier modes
A =

km
A
km
e
i(kz+m
0
m)
, (4.1)
Coecients are stored as in the following example for m
0
=Mp= 2, M= 4 and K= 4:
Chapter 4. The code 16
m m_ nh
3 6 18 19 20 21 22 23 24
2 4 11 12 13 14 15 16 17
1 2 4 5 6 7 8 9 10
0 0 0 1 2 3
-3 -2 -1 0 1 2 3 k
Rather than two indices k and m, the single index nh labels the modes. In the code this
ordering corresponds to the typical loop
nh = -1
do m = 0, M-1
m_ = m*Mp
do k = -(K-1), (K-1)
if(m==0 .and. k<0) cycle
nh = nh+1
...
NOTE: this loop, and its parallel equivalent, are predended macros in parallel.h.
4.3 Data types
The data types below consist of logical data groups to facilitate the passing of data
between functions. For clarity, in the following text they are dened as though we were
working on a single processor. See 4.4 regarding subtle dierences in the denitions due
to parallelisation.
4.3.1 coll, spec storage of coecients
The collocated data type (coll) is the principle type used for mode-independent oper-
ations.
type coll
double precision :: Re(i_N, 0:i_H1)
double precision :: Im(i_N, 0:i_H1)
end type coll
The value Re(n,nh) is the real part of the coecient for the nh
th
harmonic at r
n
. H1= 24
for the example in 4.2 and N is the number of radial points.
The spectral type (spec) is the data type used for operations independent at each radial
point. It is principally a transitory data format between the coll and phys types.
type spec
double precision :: Re(0:i_H1, i_N)
double precision :: Im(0:i_H1, i_N)
end type spec
Conversions between the coll and spec types involve only a transpose, var_coll2spec()
and var_spec2coll().
Chapter 4. The code 17
4.3.2 phys real space data
The type (phys) is used for data in real space.
type phys
double precision :: Re(0:i_Z-1, 0:i_Th-1, i_N)
end type phys
The element Re(k,m,n) refers to the value at z
k
,
m
and r
n
.
4.3.3 mesh, lumesh storage of banded matrices
The nite-dierence stencil has nite width (2.1) and so matrix operations involve ma-
trices that are banded. The type (mesh), dened in meshs.f90, is used for matrices on
the radial mesh involving kl points to either side of each node.
type mesh
double precision :: M(2*i_KL+1, i_N)
end type mesh
For a matrix A, the element M(KL+1+n-j, j) = A(n,j). See the man page for the lapack
routine dgbtrf. The LU-factorisation of a banded matrix is also banded:
type lumesh
integer :: ipiv(i_N)
double precision :: M(3*i_KL+1, i_N)
end type lumesh
The element M(2*KL+1+n-j, j) = A(n,j). Pivoting information is stored along with the
matrix.
4.3.4 rdom the radial domain
The type (rdom) contains information the radial domain. The public variable mes_D
is declared in meshs.f90. Data includes the number of radial points N; powers of r,
r
p
n
=r(n,p); weights for integration rdr; matrices for taking the p
th
derivative dr(p);. . .
type rdom
integer :: N
double precision :: r(i_N,i_rpowmin:i_rpowmax)
double precision :: intrdr(i_N)
type (mesh) :: dr(i_KL)
end type rdom
4.4 Modications for the parallel implementation
The code has been parallelised using the Message Passing Interface (MPI). Sections of the
code with special MPI function calls are separated from the serial sections by preprocessor
directives. The number of processors is given by the parameter _Np set in parallel.h. The
rank of the local processor is given by mpi_rnk declared in mpi.f90, and mpi_sze=_Np.
Data type changes for the model equation (see equations 2.7, 3.18) are as follows:
Chapter 4. The code 18
q
*
s = p L
mesh
= g
lumesh
on radial subdomain
all real space values
for radial subdomain
all spectral coefficients
selection of modes
all radial points for coll
spec
transpose
transform
g
coll
spec
phys
multiply
invert
p
coll
coll
spec
phys
factorise
transform
for radial subdomain
all spectral coefficients
selection of modes
all radial points for
transpose
phys
Data is split over the spherical harmonics for the linear parts of the code evaluating
curls, gradients and matrix inversions for the timestepping. These linear operations do
not couple modes. Here all radial points for a particular mode are located on the same
processor, separate modes may be located on separate processors. In the denition of
type (coll) the parameter i_H1 is replaced by i_pH1 = (_Np+i_H1)/_Np-1. The variable
var_H has elements to determine which harmonics are on which processor. The elements
pH0 and pH1 are the zeroth index and one-minus the number of harmonics on the local
processor. The corresponding values for all processors are stored in the arrays pH0_(rank)
and pH1_(rank). For a variable of type (coll), valid values for the harmonic index in
Re(n,nh) are nh=0..pH1, which refer to the indices (4.2) nh=pH0..pH0+pH1.
Data is split radially when calculating Fourier transforms and when evaluating products
in real space. In the denitions of type (spec) and type (phys) the parameter i_N is
replaced by i_pN = (_Np+i_N-1)/_Np. The location of the radial subdomain on the local
processor is given by additional elements of the variable mes_D. The elements pNi and pN
are the location of the rst (inner) radial point and the number of points. The values
for all processors are stored in the arrays pNi_(rank) and pN_(rank). For a variable of
type (spec), valid values for the radial index in Re(nh,n) are n=1..pN which refer to the
indices of r
n
, where n=pNi..pNi+pN-1.
The bulk of communication between processors occurs during the data transposes in
the functions var_coll2spec() and var_spec2coll(). For _Np processors the data is
transfered in _Np1 steps. At each step each processor sends and receives a block of data,
the source and destination are selected systematically as set in the following loop:
do step = 1, mpi_sze-1
dest = modulo(mpi_rnk+step, mpi_sze)
src = modulo(mpi_rnk-step+mpi_sze, mpi_sze)
...
Chapter 4. The code 19
0 1 2 3
2
3
step=1
src / dest
data
sections
nr / nh
The sketch is for the case _Np= 4. The size of each block is approximately i_pN(i_pH1+1),
and the data within each block must also be transposed.
A
DIFFERENTIAL OPERATORS IN CYLINDRICALPOLAR
COORDINATES
Gradient
(f)
r
=
r
f, (f)

=
1
r

f, (f)
z
=
z
f. (A.1)
Laplacian

2
f = (
1
r

r
+
rr
)f +
1
r
2

f +
zz
f. (A.2)
Laplacian of vector
(
2
A)
r
=
2
A
r

2
r
2


A
r
r
2
,
(
2
A)

=
2
A

+
2
r
2

A
r

A

r
2
, (A.3)
(
2
A)
z
=
2
A
z
.
Divergence
A = (
1
r
+
r
)A
r
+
1
r

+
z
A
z
. (A.4)
Curl
( A)
r
=
1
r

A
z

z
A

,
( A)

=
z
A
r

r
A
z
, (A.5)
( A)
z
= (
1
r
+
r
)A


1
r

A
r
.
Directional derivative
(A B)
r
= A
r

r
B
r
+
A

B
r
+A
z

z
B
r

A

r
,
(A B)

= A
r

r
B

+
A

+A
z

z
B

+
A

B
r
r
, (A.6)
(A B)
z
= A
r

r
B
z
+
A

B
z
+A
z

z
B
z
.
Appendix A. Dierential operators in cylindricalpolar coordinates 21
Toroidalpoloidal decompositions
Axial
A = ( z) + ( z), (A.7)
= (r, , z), = (r, , z).

2
h
f = (
1
r

r
+
rr
)f +
1
r
2

f. (A.8)
A
r
=
1
r

+
rz
,
A

=
r
+
1
r

z
, (A.9)
A
z
=
2
h
.
Radial
A =
0

+
0
z + (r) + (r), (A.10)

0
=
0
(r),
0
=
0
(r), = (r, , z), = (r, , z).

2
c
f =
1
r
2

f +
zz
f. (A.11)
A
r
= r
2
c
,
A

=
0
+r
z
+
r
, (A.12)
A
z
=
0

+ (2 +r
r
)
z
.

You might also like