The aim of the project was to extend the scalability and parallelization strategy of the SimuCoast code to enable the use of hybrid CPU+GPU supercomputers. The code is focused on increasing the understanding of coastal processes, utilizing high-performance computing (HPC) for the numerical simulation of the three-dimensional turbulent flow induced in the coastal zone, and mainly in the surf zone, by wave propagation (oblique to the shore), refraction, breaking and dissipation. A model based on MPI+OpenACC was implemented to increase the computing capabilities of the code. The adapted code was validated against data from the Vittori-Blondeaux simulation and tested using up to 512 computing nodes of the Piz Daint supercomputer.
2017 International Conference on High Performance Computing & Simulation (HPCS), Jul 1, 2017
UPC open knowledge portal, http://upcommons.upc.edu/e-prints — Oyarzun, G. et al. (2017) Memory-aware Poisson solver for peta-scale simulations with one FFT-diagonalizable direction.
Currently, one of the trending topics in high-performance computing is exascale computing. Although the hardware is not yet available, the software community is developing and updating codes that can efficiently use exascale architectures once they become available. Alya is one of the codes being developed towards exascale computing. It is part of the simulation packages of the Unified European Applications Benchmark Suite (UEABS) and the Accelerators Benchmark Suite of PRACE, and thus complies with the highest standards in HPC. Even though Alya has proven its scalability up to hundreds of thousands of CPU-cores, some expensive routines could affect its performance on exascale architectures. One of these routines is the conjugate gradient (CG) algorithm. CG is relevant because it is called at each time step to solve a linear system of equations. The bottleneck in CG is the large number of collective communication calls. In par…
Part of the research developments and results presented in this chapter were funded by: the European Union's Horizon 2020 Programme (2014–2020) and the Brazilian Ministry of Science, Technology and Innovation, through the Rede Nacional de Pesquisa (RNP), under the HPC4E project, grant agreement 689772; EoCoE, a project funded by the European Union under contract H2020-EINFRA-2015-1-676629; and PRACE Type C and Type A projects.
Modern supercomputers allow the simulation of complex phenomena with increased accuracy. Eventually, this requires finer geometric discretizations with larger numbers of mesh elements. In this context, and extrapolating to the Exascale paradigm, meshing operations such as generation, adaptation or partition become a critical issue within the simulation workflow. In this paper, we focus on mesh partitioning. In particular, we present some improvements carried out on an in-house parallel mesh partitioner based on the Hilbert space-filling curve. Additionally, taking advantage of its performance, we present the application of the SFC-based partitioning to dynamic load balancing. This method is based on direct monitoring of the imbalance at runtime and subsequent re-partitioning of the mesh. The target weights for the optimized partitions are evaluated using a least-squares approximation considering all measurements from previous iterations. In this way, the final partition co…
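The least-squares weight estimation described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: it assumes each rank's execution time grows linearly with its assigned load, fits the per-element cost of each rank from the measurements of previous load-balancing iterations, and derives target weights inversely proportional to those costs.

```python
import numpy as np

def fit_cost_per_element(loads, times):
    """Least-squares fit of per-element cost c from T ~ c * L,
    using all (load, time) measurements from previous iterations."""
    loads = np.asarray(loads, float)
    times = np.asarray(times, float)
    return float(loads @ times) / float(loads @ loads)

def target_weights(costs):
    """Partition weights inversely proportional to per-rank cost,
    normalized to sum to 1."""
    speeds = 1.0 / np.asarray(costs, float)
    return speeds / speeds.sum()

# e.g. two ranks measured over three iterations (made-up numbers):
c0 = fit_cost_per_element([100, 120, 110], [1.0, 1.2, 1.1])  # fast rank
c1 = fit_cost_per_element([100, 120, 110], [2.0, 2.4, 2.2])  # slow rank
w = target_weights([c0, c1])  # the fast rank gets twice the elements
```

A rank that runs twice as fast thus receives twice the target weight in the next re-partitioning.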
Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing
Conjugate Gradient (CG) is a widely used iterative method to solve linear systems Ax=b with matrix A symmetric and positive definite. Part of its effectiveness relies on finding a suitable preconditioner that accelerates its convergence. Factorized Sparse Approximate Inverse (FSAI) preconditioners are a prominent and easily parallelizable option. An essential element of an FSAI preconditioner is the definition of its sparse pattern, which constrains the approximation of the inverse A⁻¹. This definition is generally based on numerical criteria. In this paper we introduce complementary architecture-aware criteria to increase the numerical effectiveness of the preconditioner without incurring significant performance costs. In particular, we define cache-aware pattern extensions that do not trigger additional cache misses when accessing vector x in the y=Ax sparse matrix-vector (SpMV) kernel. As a result, we obtain very significant reductions in average solution time, ranging between 12.94% and 22.85% on three different architectures (Intel Skylake, POWER9 and A64FX) over a set of 72 test matrices.
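As background, the preconditioned CG iteration that such preconditioners accelerate can be sketched in a few lines. The FSAI pattern-extension technique itself is not reproduced here; the example stands in the most trivial diagonal-pattern FSAI (which reduces to Jacobi scaling), applied to a hypothetical 2x2 test matrix.

```python
import numpy as np

def pcg(A, b, apply_Minv, tol=1e-10, maxit=1000):
    """Preconditioned conjugate gradient for SPD A.
    apply_Minv(r) applies the approximate-inverse preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Trivial FSAI with a diagonal sparse pattern: G = diag(1/sqrt(a_ii)),
# so M^{-1} = G.T @ G is just Jacobi scaling (illustrative matrix).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
g = 1.0 / np.sqrt(np.diag(A))
x = pcg(A, b, lambda r: (g * g) * r)
```

Richer FSAI patterns replace the diagonal G with a sparse triangular factor, which is where the cache-aware pattern extensions of the paper come into play.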
Ivette Rodriguez et al. […] in a significant part of the airfoil chord. As a consequence, airfoil aerodynamic efficiency increases by 124%, with a reduction of the drag coefficient of about 46%. This kind of technique seems promising for delaying flow separation and its associated losses when the angle of attack increases beyond the maximum lift of the baseline case.
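A quick consistency check of the quoted figures, assuming aerodynamic efficiency means the lift-to-drag ratio C_L/C_D (an assumption; the excerpt does not define it): a 46% drag cut alone would not yield a 124% efficiency gain, so the lift coefficient must also rise, by roughly 21%.

```python
# Illustrative arithmetic on the quoted figures (assumed definition:
# efficiency = C_L / C_D; the numbers below come from the excerpt).
eff_gain = 1.24   # +124% aerodynamic efficiency
drag_cut = 0.46   # -46% drag coefficient

# efficiency' / efficiency = (C_L'/C_L) / (C_D'/C_D)
# => C_L'/C_L = (1 + eff_gain) * (1 - drag_cut)
lift_ratio = (1.0 + eff_gain) * (1.0 - drag_cut)   # about 1.21
```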
International Journal of Computational Fluid Dynamics
HPC systems are currently experiencing a disruptive moment, with a variety of novel architectures and frameworks and no clarity as to which will prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for DNS and LES of incompressible turbulent flows using unstructured hybrid meshes. The proposed strategy consists of representing the whole time-integration algorithm using only three basic algebraic operations: the sparse matrix-vector product, linear combinations of vectors, and the dot product. The main idea is based on decomposing the non-linear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted, with tests using up to 128 GPUs. The main objective is to understand the challenges of implementing CFD codes on new architectures.
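The three algebraic building blocks can be sketched as follows; the CSR storage and the 1D Laplacian test operator are illustrative choices, not the paper's data structures. A time-integration step then reduces to composing these three kernels, which is what makes the approach portable across back-ends.

```python
import numpy as np

def spmv(indptr, indices, data, x):
    """Sparse matrix-vector product, CSR storage."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def axpy(a, x, y):
    """Linear combination of vectors: a*x + y."""
    return a * x + y

def dot(x, y):
    """Dot product (the only collective operation in a parallel run)."""
    return float(x @ y)

def laplacian_csr(n):
    """1D Laplacian stencil [-1, 2, -1] in CSR form (test operator)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices), np.array(data)

# One explicit time step u <- u - dt * L u, expressed with the three ops.
n, dt = 8, 0.1
indptr, indices, data = laplacian_csr(n)
u = np.ones(n)
u_new = axpy(-dt, spmv(indptr, indices, data, u), u)
```

In the paper's formulation, the non-linear convective term would likewise be expressed as two chained `spmv` calls rather than a bespoke kernel.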
The sparse matrix-vector product (SpMV) is one of the main operations of iterative solvers and, in a parallel context, it is also the locus of point-to-point communications between neighboring MPI processes. The parallel SpMV is built in such a way that it gives, up to round-off errors, the same result as its sequential counterpart. In this regard, nodes on the interfaces (or halo nodes, if halos are considered) are duplicated nodes of the same original mesh. It is therefore limited to matching meshes. In this work, we generalize the parallel SpMV to glue the solution of non-matching (non-conforming) meshes through the introduction of transmission matrices. This extension of the SpMV thus enables the implicit and parallel solution of partial differential equations on non-matching meshes, as well as the implicit coupling of multiphysics problems, such as fluid-structure interactions. The proposed method is developed similarly to classical parallelization techniques and can therefore be implemented by modifying a few subroutines of an already MPI-based code. Within the proposed framework, the classical parallelization technique appears as a particular case of this general setup.
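The role of the transmission matrix can be illustrated with a two-rank emulation: each rank computes its local block plus an interface contribution filtered through T. For matching meshes T is the identity, so the distributed product reproduces the sequential one; for non-matching meshes T would interpolate the neighbour's interface values. The 4x4 matrix below is hypothetical.

```python
import numpy as np

def parallel_spmv(A_ii, A_ib, x_own, x_neigh, T):
    """Per-rank SpMV: local block plus interface contribution,
    with the neighbour values passed through a transmission matrix T."""
    return A_ii @ x_own + A_ib @ (T @ x_neigh)

# Two-rank emulation of a 4x4 system split 2+2 (illustrative matrix).
A = np.array([[4., 1., 0., 0.],
              [1., 4., 1., 0.],
              [0., 1., 4., 1.],
              [0., 0., 1., 4.]])
x = np.array([1., 2., 3., 4.])
T = np.eye(2)                       # matching-mesh case: T = identity
y0 = parallel_spmv(A[:2, :2], A[:2, 2:], x[:2], x[2:], T)   # "rank 0"
y1 = parallel_spmv(A[2:, 2:], A[2:, :2], x[2:], x[:2], T)   # "rank 1"
y = np.concatenate([y0, y1])        # matches the sequential A @ x
```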
Larger supercomputers allow the simulation of more complex phenomena with increased accuracy. Eventually this requires finer, and thus larger, geometric discretizations. In this context, and extrapolating to the Exascale paradigm, meshing operations such as generation, deformation, adaptation/regeneration or partition/load balance become a critical issue within the simulation workflow. In this paper we focus on mesh partitioning. In particular, we present a fast and scalable geometric partitioner based on space-filling curves (SFC), as an alternative to the standard graph-partitioning approach. We have avoided any computing or memory bottleneck in the algorithm, while imposing that the solution achieved is independent (up to round-off errors) of the number of parallel processes used to compute it. The performance of the SFC-based partitioner has been demonstrated using up to 4096 CPU-cores on the Blue Waters supercomputer.
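The SFC partitioning idea can be sketched with a Morton (Z-order) curve standing in for the Hilbert curve used in practice (Morton is shorter to write but has somewhat worse locality): encode each point's position along the curve, sort, and cut the ordering into equal-sized chunks.

```python
import numpy as np

def morton2d(ix, iy, bits=16):
    """Interleave the bits of integer grid coordinates (Z-order key).
    A Hilbert curve would give better locality; Morton is used here
    only for brevity."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code

def sfc_partition(points, nparts, bits=16):
    """Sort 2D points along the SFC and cut into equal-count chunks."""
    pts = np.asarray(points, float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    scale = (2**bits - 1) / np.maximum(hi - lo, 1e-30)
    grid = ((pts - lo) * scale).astype(np.int64)
    keys = np.array([morton2d(ix, iy, bits) for ix, iy in grid])
    order = np.argsort(keys, kind="stable")
    part = np.empty(len(pts), dtype=int)
    for p, chunk in enumerate(np.array_split(order, nparts)):
        part[chunk] = p
    return part
```

Because the cut points depend only on the global ordering, the resulting partition is (up to ties) independent of how many processes compute it, which is the determinism property the abstract emphasizes.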
Large-eddy simulations (LES) of the flow past a circular cylinder are used to investigate the flow topology and the vortex-shedding process at Reynolds numbers Re = 2.5×10⁵–8.5×10⁵. This range encompasses both the critical and super-critical regimes. As the flow enters the critical regime, major changes occur which affect the flow configuration. Asymmetries in the flow are found in the critical regime, whereas the wake recovers its symmetry and stabilises in the super-critical regime. Wake characteristic lengths are measured and compared between the different Reynolds numbers. It is shown that the super-critical regime is characterised by a plateau in the drag coefficient at about C_D ≈ 0.22, and a quasi-stable wake with a non-dimensional width of d_w/D ≈ 0.4. The periodic nature of the flow is analysed by means of measurements of the unsteady drag and lift coefficients. Power spectra of the lift fluctuations are computed. Wake vortex shedding is found to occur in both regimes investigated, although a jump in frequency is observed when the flow enters the super-critical regime. In this regime, the non-dimensional vortex-shedding frequency is almost constant and equal to St = f_vs·D/U_ref ≈ 0.44. The analysis also shows a steep decrease in the fluctuating lift when entering the super-critical regime. The combined analysis of wake topology and vortex shedding completes the physical picture of a stable and highly coherent flow in the super-critical regime.
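For reference, the non-dimensional shedding frequency quoted above is St = f_vs·D/U_ref; a minimal sketch with made-up dimensional values (not the paper's data):

```python
def strouhal(f_vs, D, U_ref):
    """Strouhal number: shedding frequency non-dimensionalized by
    cylinder diameter D and free-stream velocity U_ref."""
    return f_vs * D / U_ref

# e.g. shedding at 44 Hz on a 0.1 m cylinder in a 10 m/s stream
# (hypothetical values) gives the super-critical St of about 0.44.
St = strouhal(44.0, 0.1, 10.0)
```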
Volume-of-Fluid (VOF) is one of the methods of choice to reproduce the interface motion in the simulation of multi-fluid flows. One of its main strengths is its accuracy in capturing sharp interface geometries, although this requires a number of geometric calculations. Under these circumstances, achieving parallel performance on current supercomputers is a must. The main obstacle to parallelization is that the computing costs are concentrated only in the discrete elements that lie on the interface between fluids. Consequently, if the interface is not homogeneously distributed throughout the domain, standard domain decomposition (DD) strategies lead to imbalanced workload distributions. In this paper, we present a new parallelization strategy for general unstructured VOF solvers, based on a dynamic load-balancing process complementary to the underlying DD. Its parallel efficiency has been analyzed and compared to that of the DD using up to 1024 CPU-cores on an Intel SandyBridge-based supercomputer. The results obtained on several artificially generated test cases show a speedup of up to 12x with respect to the standard DD, depending on the interface size, the initial distribution and the number of parallel processes engaged. Moreover, the new parallelization strategy is general-purpose, so it could be used to parallelize any VOF solver without requiring changes to the coupled flow solver. Finally, note that although designed for the VOF method, our approach could easily be adapted to other interface-capturing methods, such as the level-set method, which may present similar workload imbalances.
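The imbalance that motivates this method can be quantified as the maximum rank load over the mean; a counting sketch (no actual cell migration, and hypothetical load figures) of measuring it and of the even redistribution the dynamic balancing aims for:

```python
import numpy as np

def imbalance(loads):
    """Parallel load imbalance: max over mean (1.0 is perfect balance)."""
    loads = np.asarray(loads, float)
    return float(loads.max() / loads.mean())

def rebalance_interface(cells_per_rank):
    """Redistribute interface-cell counts as evenly as possible
    (a counting sketch only; a real solver migrates the cells)."""
    total, n = sum(cells_per_rank), len(cells_per_rank)
    base, extra = divmod(total, n)
    return [base + (1 if r < extra else 0) for r in range(n)]

# e.g. one rank owning the whole interface is 4x imbalanced on 4 ranks:
before = [100, 0, 0, 0]
after = rebalance_interface(before)   # even split across ranks
```

The 12x speedups reported above come precisely from removing this kind of concentration of interface work on a few ranks.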
Lecture Notes in Computational Science and Engineering, 2010
In this paper a parallel direct Poisson solver for the DNS of turbulent flows statistically homogeneous in one spatial direction is presented. It is based on a Fourier diagonalization in the spanwise direction and a Schur decomposition in the streamwise direction. Numerical experiments carried out to test the robustness and efficiency of the algorithm are presented. This solver is being used for a DNS of the turbulent flow around a circular cylinder at Re = 1×10⁴; the required mesh has about 104 M elements, and the resulting discrete Poisson equation is solved in less than one second of CPU time using 720 CPUs of the MareNostrum supercomputer.
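A minimal sketch of the FFT-diagonalization idea, assuming a 2D grid that is periodic in x (the homogeneous direction) and has Dirichlet conditions in y; the per-mode system that the paper solves via a Schur decomposition is solved here with a small dense solve for brevity:

```python
import numpy as np

def apply_laplacian(u, dx, dy):
    """Discrete Laplacian: periodic in x, homogeneous Dirichlet in y."""
    lap = (np.roll(u, 1, axis=0) - 2 * u + np.roll(u, -1, axis=0)) / dx**2
    up = np.zeros((u.shape[0], u.shape[1] + 2))   # zero Dirichlet ghosts
    up[:, 1:-1] = u
    lap += (up[:, 2:] - 2 * u + up[:, :-2]) / dy**2
    return lap

def poisson_fft(f, dx, dy):
    """Solve apply_laplacian(u) = f by FFT in the periodic x direction,
    then one independent (and parallelizable) solve per Fourier mode."""
    nx, ny = f.shape
    fh = np.fft.fft(f, axis=0)
    k = np.arange(nx)
    # Eigenvalues of the periodic second-difference operator in x:
    lam = 2.0 * (np.cos(2.0 * np.pi * k / nx) - 1.0) / dx**2
    # Second-difference operator in y with Dirichlet boundaries:
    D = (np.diag(-2.0 * np.ones(ny)) + np.diag(np.ones(ny - 1), 1)
         + np.diag(np.ones(ny - 1), -1)) / dy**2
    uh = np.empty_like(fh)
    for i in range(nx):
        uh[i] = np.linalg.solve(D + lam[i] * np.eye(ny), fh[i])
    return np.fft.ifft(uh, axis=0).real
```

Manufactured-solution check: applying the discrete operator to any field and then solving should recover the field to machine precision, since the Fourier diagonalization of the periodic direction is exact.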
In the context of time-accurate numerical simulation of incompressible flows, a Poisson equation needs to be solved at least once per time step to project the velocity field onto a divergence-free space. Due to the non-local nature of its solution, this elliptic system is one of the most time-consuming and difficult-to-parallelise parts of the code. In this paper, a…
Applied Parallel Computing Large Scale Scientific and Industrial Problems, 1998
This paper describes work in progress at Hitachi Dublin Laboratory to develop a parallel adaptive mesh refinement library. The library has been designed to be linked with a finite element simulation engine for solving three-dimensional unstructured turbulent fluid dynamics problems using large-eddy simulation. The library takes as input a distributed mesh and a list of mesh elements to be refined, carries out the refinement in parallel on the distributed data structure, redistributes the computational load and passes the updated…
Papers by Ricard Borrell