The aim of the project was to extend the scalability and parallelization strategy of the SimuCoast code to enable the use of hybrid CPU+GPU supercomputers. The code is focused on increasing the understanding of coastal processes, utilizing high-performance computing (HPC) for the numerical simulation of the three-dimensional turbulent flow induced in the coastal zone, and mainly in the surf zone, by wave propagation (oblique to the shore), refraction, breaking and dissipation. A model based on MPI+OpenACC was implemented to increase the computing capabilities of the code. The adapted code was validated against data from the Vittori-Blondeaux simulation and tested using up to 512 computing nodes of the Piz Daint supercomputer.
2017 International Conference on High Performance Computing & Simulation (HPCS), Jul 1, 2017
UPC open knowledge portal, http://upcommons.upc.edu/e-prints — Oyarzun, G. et al. (2017) Memory-aware Poisson solver for peta-scale simulations with one FFT-diagonalizable direction.
Currently, one of the trending topics in high-performance computing is exascale computing. Although the hardware is not yet available, the software community is developing and updating codes that can efficiently use exascale architectures once they become available. Alya is one of the codes being developed towards exascale computing. It is part of the simulation packages of the Unified European Applications Benchmark Suite (UEABS) and the Accelerators Benchmark Suite of PRACE, and thus complies with the highest standards in HPC. Even though Alya has proven its scalability up to hundreds of thousands of CPU-cores, some expensive routines could affect its performance on exascale architectures. One of these routines is the conjugate gradient (CG) algorithm. CG is relevant because it is called at each time step to solve a linear system of equations. The bottleneck in CG is the large number of collective communication calls. In par…
Part of the research developments and results presented in this chapter were funded by: the European Union's Horizon 2020 Programme (2014–2020) and the Brazilian Ministry of Science, Technology and Innovation, through the Rede Nacional de Pesquisa (RNP), under the HPC4E project, grant agreement 689772; EoCoE, a project funded by the European Union under contract H2020-EINFRA-2015-1-676629; and PRACE Type C and Type A projects.
Modern supercomputers allow the simulation of complex phenomena with increased accuracy. Eventually, this requires finer geometric discretizations with larger numbers of mesh elements. In this context, and extrapolating to the Exascale paradigm, meshing operations such as generation, adaptation or partition become a critical issue within the simulation workflow. In this paper, we focus on mesh partitioning. In particular, we present some improvements carried out on an in-house parallel mesh partitioner based on the Hilbert space-filling curve. Additionally, taking advantage of its performance, we present the application of the SFC-based partitioning to dynamic load balancing. This method is based on direct monitoring of the imbalance at runtime and subsequent re-partitioning of the mesh. The target weights for the optimized partitions are evaluated using a least-squares approximation considering all measurements from previous iterations. In this way, the final partition co…
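The least-squares weight estimation described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: it assumes each rank's execution time grows linearly with its assigned load, fits the per-element cost of each rank from the measurements of previous load-balancing iterations, and derives target weights inversely proportional to those costs.

```python
import numpy as np

def fit_cost_per_element(loads, times):
    """Least-squares fit of per-element cost c from T ~ c * L,
    using all (load, time) measurements from previous iterations."""
    loads = np.asarray(loads, float)
    times = np.asarray(times, float)
    return float(loads @ times) / float(loads @ loads)

def target_weights(costs):
    """Partition weights inversely proportional to per-rank cost,
    normalized to sum to 1."""
    speeds = 1.0 / np.asarray(costs, float)
    return speeds / speeds.sum()

# e.g. two ranks measured over three iterations (made-up numbers):
c0 = fit_cost_per_element([100, 120, 110], [1.0, 1.2, 1.1])  # fast rank
c1 = fit_cost_per_element([100, 120, 110], [2.0, 2.4, 2.2])  # slow rank
w = target_weights([c0, c1])  # the fast rank gets twice the elements
```

A rank that runs twice as fast thus receives twice the target weight in the next re-partitioning.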
Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing
Conjugate Gradient (CG) is a widely used iterative method to solve linear systems Ax=b with matrix A symmetric and positive definite. Part of its effectiveness relies on finding a suitable preconditioner that accelerates its convergence. Factorized Sparse Approximate Inverse (FSAI) preconditioners are a prominent and easily parallelizable option. An essential element of an FSAI preconditioner is the definition of its sparse pattern, which constrains the approximation of the inverse A⁻¹. This definition is generally based on numerical criteria. In this paper we introduce complementary architecture-aware criteria to increase the numerical effectiveness of the preconditioner without incurring significant performance costs. In particular, we define cache-aware pattern extensions that do not trigger additional cache misses when accessing vector x in the y=Ax sparse matrix-vector (SpMV) kernel. As a result, we obtain very significant reductions in average solution time, ranging between 12.94% and 22.85% on three different architectures (Intel Skylake, POWER9 and A64FX) over a set of 72 test matrices.
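As background, the preconditioned CG iteration that such preconditioners accelerate can be sketched in a few lines. The FSAI pattern-extension technique itself is not reproduced here; the example stands in the most trivial diagonal-pattern FSAI (which reduces to Jacobi scaling), applied to a hypothetical 2x2 test matrix.

```python
import numpy as np

def pcg(A, b, apply_Minv, tol=1e-10, maxit=1000):
    """Preconditioned conjugate gradient for SPD A.
    apply_Minv(r) applies the approximate-inverse preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Trivial FSAI with a diagonal sparse pattern: G = diag(1/sqrt(a_ii)),
# so M^{-1} = G.T @ G is just Jacobi scaling (illustrative matrix).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
g = 1.0 / np.sqrt(np.diag(A))
x = pcg(A, b, lambda r: (g * g) * r)
```

Richer FSAI patterns replace the diagonal G with a sparse triangular factor, which is where the cache-aware pattern extensions of the paper come into play.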
Ivette Rodriguez et al. […] in a significant part of the airfoil chord. As a consequence, airfoil aerodynamic efficiency increases by 124%, with a reduction of the drag coefficient of about 46%. This kind of technique seems promising for delaying flow separation and its associated losses when the angle of attack increases beyond the maximum lift of the baseline case.
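A quick consistency check of the quoted figures, assuming aerodynamic efficiency means the lift-to-drag ratio C_L/C_D (an assumption; the excerpt does not define it): a 46% drag cut alone would not yield a 124% efficiency gain, so the lift coefficient must also rise, by roughly 21%.

```python
# Illustrative arithmetic on the quoted figures (assumed definition:
# efficiency = C_L / C_D; the numbers below come from the excerpt).
eff_gain = 1.24   # +124% aerodynamic efficiency
drag_cut = 0.46   # -46% drag coefficient

# efficiency' / efficiency = (C_L'/C_L) / (C_D'/C_D)
# => C_L'/C_L = (1 + eff_gain) * (1 - drag_cut)
lift_ratio = (1.0 + eff_gain) * (1.0 - drag_cut)   # about 1.21
```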
International Journal of Computational Fluid Dynamics
HPC systems are currently experiencing a disruptive moment, with a variety of novel architectures and frameworks and no clarity as to which will prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for DNS and LES of incompressible turbulent flows using unstructured hybrid meshes. The proposed strategy consists of representing the whole time-integration algorithm using only three basic algebraic operations: the sparse matrix-vector product, linear combinations of vectors, and the dot product. The main idea is based on decomposing the non-linear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted, with tests using up to 128 GPUs. The main objective is to understand the challenges of implementing CFD codes on new architectures.
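The three algebraic building blocks can be sketched as follows; the CSR storage and the 1D Laplacian test operator are illustrative choices, not the paper's data structures. A time-integration step then reduces to composing these three kernels, which is what makes the approach portable across back-ends.

```python
import numpy as np

def spmv(indptr, indices, data, x):
    """Sparse matrix-vector product, CSR storage."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def axpy(a, x, y):
    """Linear combination of vectors: a*x + y."""
    return a * x + y

def dot(x, y):
    """Dot product (the only collective operation in a parallel run)."""
    return float(x @ y)

def laplacian_csr(n):
    """1D Laplacian stencil [-1, 2, -1] in CSR form (test operator)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices), np.array(data)

# One explicit time step u <- u - dt * L u, expressed with the three ops.
n, dt = 8, 0.1
indptr, indices, data = laplacian_csr(n)
u = np.ones(n)
u_new = axpy(-dt, spmv(indptr, indices, data, u), u)
```

In the paper's formulation, the non-linear convective term would likewise be expressed as two chained `spmv` calls rather than a bespoke kernel.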
The sparse matrix-vector product (SpMV) is one of the main operations of iterative solvers and, in a parallel context, it is also the locus of point-to-point communications between neighboring MPI processes. The parallel SpMV is built in such a way that it gives, up to round-off errors, the same result as its sequential counterpart. In this regard, nodes on the interfaces (or halo nodes, if halos are considered) are duplicated nodes of the same original mesh. It is therefore limited to matching meshes. In this work, we generalize the parallel SpMV to glue the solution of non-matching (non-conforming) meshes through the introduction of transmission matrices. This extension of the SpMV thus enables the implicit and parallel solution of partial differential equations on non-matching meshes, as well as the implicit coupling of multiphysics problems, such as fluid-structure interactions. The proposed method is developed similarly to classical parallelization techniques and can therefore be implemented by modifying a few subroutines of an already MPI-based code. Within the proposed framework, the classical parallelization technique appears as a particular case of this general setup.
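The role of the transmission matrix can be illustrated with a two-rank emulation: each rank computes its local block plus an interface contribution filtered through T. For matching meshes T is the identity, so the distributed product reproduces the sequential one; for non-matching meshes T would interpolate the neighbour's interface values. The 4x4 matrix below is hypothetical.

```python
import numpy as np

def parallel_spmv(A_ii, A_ib, x_own, x_neigh, T):
    """Per-rank SpMV: local block plus interface contribution,
    with the neighbour values passed through a transmission matrix T."""
    return A_ii @ x_own + A_ib @ (T @ x_neigh)

# Two-rank emulation of a 4x4 system split 2+2 (illustrative matrix).
A = np.array([[4., 1., 0., 0.],
              [1., 4., 1., 0.],
              [0., 1., 4., 1.],
              [0., 0., 1., 4.]])
x = np.array([1., 2., 3., 4.])
T = np.eye(2)                       # matching-mesh case: T = identity
y0 = parallel_spmv(A[:2, :2], A[:2, 2:], x[:2], x[2:], T)   # "rank 0"
y1 = parallel_spmv(A[2:, 2:], A[2:, :2], x[2:], x[:2], T)   # "rank 1"
y = np.concatenate([y0, y1])        # matches the sequential A @ x
```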
Larger supercomputers allow the simulation of more complex phenomena with increased accuracy. Eventually this requires finer, and thus larger, geometric discretizations. In this context, and extrapolating to the Exascale paradigm, meshing operations such as generation, deformation, adaptation/regeneration or partition/load balance become a critical issue within the simulation workflow. In this paper we focus on mesh partitioning. In particular, we present a fast and scalable geometric partitioner based on space-filling curves (SFC), as an alternative to the standard graph-partitioning approach. We have avoided any computing or memory bottleneck in the algorithm, while imposing that the solution achieved is independent (up to round-off errors) of the number of parallel processes used to compute it. The performance of the SFC-based partitioner has been demonstrated using up to 4096 CPU-cores on the Blue Waters supercomputer.
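The SFC partitioning idea can be sketched with a Morton (Z-order) curve standing in for the Hilbert curve used in practice (Morton is shorter to write but has somewhat worse locality): encode each point's position along the curve, sort, and cut the ordering into equal-sized chunks.

```python
import numpy as np

def morton2d(ix, iy, bits=16):
    """Interleave the bits of integer grid coordinates (Z-order key).
    A Hilbert curve would give better locality; Morton is used here
    only for brevity."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code

def sfc_partition(points, nparts, bits=16):
    """Sort 2D points along the SFC and cut into equal-count chunks."""
    pts = np.asarray(points, float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    scale = (2**bits - 1) / np.maximum(hi - lo, 1e-30)
    grid = ((pts - lo) * scale).astype(np.int64)
    keys = np.array([morton2d(ix, iy, bits) for ix, iy in grid])
    order = np.argsort(keys, kind="stable")
    part = np.empty(len(pts), dtype=int)
    for p, chunk in enumerate(np.array_split(order, nparts)):
        part[chunk] = p
    return part
```

Because the cut points depend only on the global ordering, the resulting partition is (up to ties) independent of how many processes compute it, which is the determinism property the abstract emphasizes.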
Large-eddy simulations (LES) of the flow past a circular cylinder are used to investigate the flow topology and the vortex-shedding process at Reynolds numbers Re = 2.5×10⁵–8.5×10⁵. This range encompasses both the critical and super-critical regimes. As the flow enters the critical regime, major changes occur which affect the flow configuration. Asymmetries in the flow are found in the critical regime, whereas the wake recovers its symmetry and stabilises in the super-critical regime. Wake characteristic lengths are measured and compared between the different Reynolds numbers. It is shown that the super-critical regime is characterised by a plateau in the drag coefficient at about C_D ≈ 0.22, and a quasi-stable wake with a non-dimensional width of d_w/D ≈ 0.4. The periodic nature of the flow is analysed by means of measurements of the unsteady drag and lift coefficients. Power spectra of the lift fluctuations are computed. Wake vortex shedding is found to occur in both regimes investigated, although a jump in frequency is observed when the flow enters the super-critical regime. In this regime, the non-dimensional vortex-shedding frequency is almost constant and equal to St = f_vs·D/U_ref ≈ 0.44. The analysis also shows a steep decrease in the fluctuating lift when entering the super-critical regime. The combined analysis of wake topology and vortex shedding completes the physical picture of a stable and highly coherent flow in the super-critical regime.
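For reference, the non-dimensional shedding frequency quoted above is St = f_vs·D/U_ref; a minimal sketch with made-up dimensional values (not the paper's data):

```python
def strouhal(f_vs, D, U_ref):
    """Strouhal number: shedding frequency non-dimensionalized by
    cylinder diameter D and free-stream velocity U_ref."""
    return f_vs * D / U_ref

# e.g. shedding at 44 Hz on a 0.1 m cylinder in a 10 m/s stream
# (hypothetical values) gives the super-critical St of about 0.44.
St = strouhal(44.0, 0.1, 10.0)
```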
Volume-of-Fluid (VOF) is one of the methods of choice to reproduce the interface motion in the simulation of multi-fluid flows. One of its main strengths is its accuracy in capturing sharp interface geometries, although this requires a number of geometric calculations. Under these circumstances, achieving parallel performance on current supercomputers is a must. The main obstacle to parallelization is that the computing costs are concentrated only in the discrete elements that lie on the interface between fluids. Consequently, if the interface is not homogeneously distributed throughout the domain, standard domain decomposition (DD) strategies lead to imbalanced workload distributions. In this paper, we present a new parallelization strategy for general unstructured VOF solvers, based on a dynamic load-balancing process complementary to the underlying DD. Its parallel efficiency has been analyzed and compared to that of the DD using up to 1024 CPU-cores on an Intel SandyBridge-based supercomputer. The results obtained on several artificially generated test cases show a speedup of up to 12x with respect to the standard DD, depending on the interface size, the initial distribution and the number of parallel processes engaged. Moreover, the new parallelization strategy is general-purpose, so it could be used to parallelize any VOF solver without requiring changes to the coupled flow solver. Finally, note that although designed for the VOF method, our approach could easily be adapted to other interface-capturing methods, such as the level-set method, which may present similar workload imbalances.
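The imbalance that motivates this method can be quantified as the maximum rank load over the mean; a counting sketch (no actual cell migration, and hypothetical load figures) of measuring it and of the even redistribution the dynamic balancing aims for:

```python
import numpy as np

def imbalance(loads):
    """Parallel load imbalance: max over mean (1.0 is perfect balance)."""
    loads = np.asarray(loads, float)
    return float(loads.max() / loads.mean())

def rebalance_interface(cells_per_rank):
    """Redistribute interface-cell counts as evenly as possible
    (a counting sketch only; a real solver migrates the cells)."""
    total, n = sum(cells_per_rank), len(cells_per_rank)
    base, extra = divmod(total, n)
    return [base + (1 if r < extra else 0) for r in range(n)]

# e.g. one rank owning the whole interface is 4x imbalanced on 4 ranks:
before = [100, 0, 0, 0]
after = rebalance_interface(before)   # even split across ranks
```

The 12x speedups reported above come precisely from removing this kind of concentration of interface work on a few ranks.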
Lecture Notes in Computational Science and Engineering, 2010
In this paper a parallel direct Poisson solver for the DNS of turbulent flows statistically homogeneous in one spatial direction is presented. It is based on a Fourier diagonalization in the spanwise direction and a Schur decomposition in the streamwise direction. Numerical experiments carried out to test the robustness and efficiency of the algorithm are presented. This solver is being used for a DNS of the turbulent flow around a circular cylinder at Re = 1×10⁴; the required mesh has about 104 M elements, and the resulting discrete Poisson equation is solved in less than one second of CPU time using 720 CPUs of the MareNostrum supercomputer.
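A minimal sketch of the FFT-diagonalization idea, assuming a 2D grid that is periodic in x (the homogeneous direction) and has Dirichlet conditions in y; the per-mode system that the paper solves via a Schur decomposition is solved here with a small dense solve for brevity:

```python
import numpy as np

def apply_laplacian(u, dx, dy):
    """Discrete Laplacian: periodic in x, homogeneous Dirichlet in y."""
    lap = (np.roll(u, 1, axis=0) - 2 * u + np.roll(u, -1, axis=0)) / dx**2
    up = np.zeros((u.shape[0], u.shape[1] + 2))   # zero Dirichlet ghosts
    up[:, 1:-1] = u
    lap += (up[:, 2:] - 2 * u + up[:, :-2]) / dy**2
    return lap

def poisson_fft(f, dx, dy):
    """Solve apply_laplacian(u) = f by FFT in the periodic x direction,
    then one independent (and parallelizable) solve per Fourier mode."""
    nx, ny = f.shape
    fh = np.fft.fft(f, axis=0)
    k = np.arange(nx)
    # Eigenvalues of the periodic second-difference operator in x:
    lam = 2.0 * (np.cos(2.0 * np.pi * k / nx) - 1.0) / dx**2
    # Second-difference operator in y with Dirichlet boundaries:
    D = (np.diag(-2.0 * np.ones(ny)) + np.diag(np.ones(ny - 1), 1)
         + np.diag(np.ones(ny - 1), -1)) / dy**2
    uh = np.empty_like(fh)
    for i in range(nx):
        uh[i] = np.linalg.solve(D + lam[i] * np.eye(ny), fh[i])
    return np.fft.ifft(uh, axis=0).real
```

Manufactured-solution check: applying the discrete operator to any field and then solving should recover the field to machine precision, since the Fourier diagonalization of the periodic direction is exact.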
In the context of time-accurate numerical simulation of incompressible flows, a Poisson equation needs to be solved at least once per time step to project the velocity field onto a divergence-free space. Due to the non-local nature of its solution, this elliptic system is one of the most time-consuming and difficult-to-parallelise parts of the code. In this paper, a…
Applied Parallel Computing Large Scale Scientific and Industrial Problems, 1998
This paper describes work in progress at Hitachi Dublin Laboratory to develop a parallel adaptive mesh refinement library. The library has been designed to be linked with a finite element simulation engine for solving three-dimensional unstructured turbulent fluid dynamics problems using large-eddy simulation. The library takes as input a distributed mesh and a list of mesh elements to be refined, carries out the refinement in parallel on the distributed data structure, redistributes the computational load and passes the updated…
Papers by Ricard Borrell