Modern multicore architectures require adapted parallel algorithms and implementation strategies for many applications. As a non-trivial example, we choose in this paper a patch-based sparse coding algorithm called Orthogonal Matching Pursuit (OMP) and discuss parallelization and implementation strategies on current hardware. The OMP algorithm is used in imaging and involves heavy computations on many small blocks of pixels called patches. From a global view, the patches within the image can be processed completely in parallel, but within one patch the algorithm is hard to parallelize. We compare the performance on the Cell Broadband Engine Architecture (CBEA), different GPUs, and current multicore CPUs.
This paper deals with parallel 2D and 3D V-cycle multigrid implementations for computing the optical flow between two images. We compare memory costs and convergence rates of four schemes: the Horn-Schunck algorithm (Gauss-Seidel) and multigrid with three different strategies for discretizing the coarse grid operators, namely direct coarsening, lumping, and the Galerkin approach. Experiments on synthetic images and CT images of the brain are conducted.
We present a fast, cell-centered multigrid solver and apply it to image denoising and nonrigid diffusion-based image registration. In both applications real-time performance is required in 3D, and the multigrid method has to be compared to solvers based on the Fast Fourier Transform. For image denoising, the optimization of the underlying variational approach results directly in one time step of a parabolic linear heat equation; for image registration, a nonlinear second-order system of partial differential equations is obtained. This system is solved by a fixed-point iteration using a semi-implicit time discretization, where each time step again results in an elliptic linear heat equation. The multigrid implementation comes close to real-time performance for medium-size medical images in 3D for both applications and is compared to a solver based on the Fast Fourier Transform using available libraries.
WaLBerla (Widely applicable Lattice-Boltzmann from Erlangen) is a massively parallel software framework supporting a wide range of physical phenomena. This article describes the software designs realizing the major goal of the framework: a good balance between expandability and scalable, highly optimized, hardware-dependent, special-purpose kernels. To demonstrate our designs, we discuss the coupling of our Lattice-Boltzmann fluid flow solver with a method for fluid-structure interaction. Additionally, we show a software design for heterogeneous computations on GPU and CPU utilizing optimized kernels. Finally, we estimate the software quality of the framework on the basis of software quality factors.
Optical flow and the related non-rigid image registration both lead to a variational minimization problem that requires robust and efficient numerical solvers due to the often non-smooth input data and the large number of unknowns in real applications.
High Performance Computing in Science and Engineering, Garching/Munich 2007, 2009
This contribution presents three parallel multigrid solvers, two for finite element and one for finite difference simulations. They focus on different aspects of software design (efficiency, usability, and generality), but all have in common that they are highly scalable to large numbers of processors.
The Direct Projection-based solvers are direct methods for solving linear systems of equations in which an initial set of vectors is projected onto the hyperplanes of the system by using projections parallel to specific directions that are also constructed during the course of the algorithm. They were initially designed for square nonsingular systems, but further developments also cover more general ones (see an overview of these algorithms in [1], [6] and references therein). The main aim of our paper is to introduce and theoretically analyse a new class of such direct solvers. Starting from some preliminary methods and results proposed by one of the authors, we extend the theoretical analysis for one of them and design new block row and column projection versions. In the last section of the paper we apply these algorithms and compare them with some classical direct projection-based ones in numerical experiments on linear systems arising from rigid body dynamics problems.
Papers by H. Köstler