Papers by Stanislav Sedukhin
IEICE Technical Report; IEICE Tech. Rep., Jun 12, 2015
Proceedings of the International Conference on Parallel Processing Workshops, 2006
Prediction of tsunami arrival time is critical for evacuating people from coastal areas. Solving many tsunami-related problems is important for reducing the negative effects of this serious disaster. Numerical modeling of tsunami wave propagation is a computationally intensive problem that requires acceleration of calculations by means of parallel processing. The Method of Splitting Tsunami (MOST) is one of the well-known numerical solvers for modeling tsunami waves in the ocean. This paper focuses on the design and evaluation of tsunami simulation code using OpenCL. We have developed a tsunami propagation code based on MOST and implemented different parallel optimizations of it for GPU and FPGA.
Technical Report 93-1-010, University of Aizu, Japan, 1993
This technical report presents a systematic approach to the formal synthesis of a set of systolic array processors for matrix algorithms of linear algebra, digital signal processing, graph theory, etc. This systematic approach is the theoretical background for the visual software tool s4cad, which is also considered. Starting from the source algorithm specification in the form of linear recurrent equations, the formal approach and the s4cad tool allow us to obtain and examine the set of admissible array processors. The tool uses advanced media technologies of the MS Windows 3 graphical environment, which makes it more comfortable for the user to evaluate and choose an optimal solution subject to requirements on computing time, number of processing elements, connection topology, data flow formats, etc. A number of basic algorithms of linear algebra and graph theory were treated with the formal approach and added to the s4cad library. In this report, the design of systolic array processors for the transitive closure algorithm is shown as an example.
The two-dimensional (2D) forward/inverse discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST), discrete Hartley transform (DHT), and discrete Walsh-Hadamard transform (DWHT) play a fundamental role in many practical applications. Due to the separability property, all these transforms can be uniquely defined as a triple matrix product with one matrix transposition. Based on a systematic approach to represent and schedule different forms of the n × n matrix-matrix multiply-add (MMA) operation in 3D index space, we design new orbital highly parallel/scalable algorithms and present an efficient n × n unified array processor for computing any n × n forward/inverse discrete separable transform in the minimal 2n time-steps. Unlike traditional 2D systolic array processing, all n² register-stored elements of the initial/intermediate matrices are processed simultaneously by all n² processing elements of the unified array processor at each time-step. Hence the proposed array processor is appropriate for applications with naturally arranged multidimensional data such as still images, video frames, 2D data from a matrix sensor, etc. Finally, we introduce a novel formulation and a highly parallel implementation of the frequently required matrix data alignment and manipulation by using MMA operations on the same array processor, so that no additional circuitry is needed.
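The "triple matrix product with one matrix transposition" mentioned in the abstract above can be illustrated with a small sketch. It assumes the common form Y = C · X · Cᵀ, where C is the n × n coefficient matrix of the transform (here the DFT matrix). The function names are hypothetical and chosen for illustration; this is not the orbital algorithm or the array processor from the paper, only a plain sequential check of the separability property.

```python
import cmath


def dft_matrix(n):
    """Coefficient matrix of the 1D DFT: C[k][j] = exp(-2*pi*i*k*j/n)."""
    return [[cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n)]
            for k in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def separable_transform_2d(x, c):
    """Assumed form of the 2D separable transform: Y = C * X * C^T."""
    return matmul(matmul(c, x), transpose(c))


def direct_dft_2d(x):
    """2D DFT evaluated directly from its double-sum definition (reference)."""
    n = len(x)
    return [[sum(x[r][s] * cmath.exp(-2j * cmath.pi * (u * r + v * s) / n)
                 for r in range(n) for s in range(n))
             for v in range(n)]
            for u in range(n)]


x = [[1, 2], [3, 4]]
y = separable_transform_2d(x, dft_matrix(2))
```

For other separable transforms (DCT, DST, DHT, DWHT), only the coefficient matrix passed as `c` would change in this sketch; the two matrix products stay the same.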
Proceedings of the Forum on Information Technology (情報科学技術フォーラム講演論文集), Aug 20, 2009
in Torus Networks. Keywords: 3-dimensional discrete transforms, general matrix-matrix multiplication, 3-dimensional data decomposition, orbital matrix-matrix multiplication, torus networks, extremely scalable algorithms, algorithm/architecture co-design. The 3-dimensional (3D) forward/inverse separable discrete transforms, such as the Fourier transform, cosine/sine transform, Hartley transform, and many others, are frequently the principal limiters that prevent many practical applications from scaling to a large number of processors. Existing approaches, which are based on a 1D or 2D data decomposition, do not allow the 3D transforms to be scaled to the maximum possible number of processors. Based on the newly proposed 3D decomposition of N × N × N initial data into P × P × P blocks, where P = N/b and b ∈ [1, 2, ..., N] is a blocking factor, we design unified, highly scalable, GEMM-based algorithms for parallel implementation of any forward/inverse 3D transform on a P × P × P network of toroidally interconnected nodes in 3P "compute-and-roll" time-steps, where each step is equal to the time of execution of b⁴ fused multiply-add (fma) operations and movement of O(b³) scalar data between nearest-neighbor nodes. The proposed 3D orbital algorithms can be scaled up to the maximum number of N³ simple nodes (fma-units), which is equal to the size of the data.
Programming and Computer Software, 1992
This article surveys the author's work on the design and analysis of parallel algorithms and processor arrays with systolic architecture. 52 refs., 5 figs.
Computer Applications in Industry and Engineering, 2008
Use of vinyl acetate-ethylene copolymer emulsions as laminating adhesives for rug backing is disclosed. The copolymer emulsions are prepared by admixing vinyl acetate-ethylene copolymer, dispersant and thickening agent together, with or without the addition of filler. The vinyl acetate-ethylene copolymer can contain between about 20 and about 70 parts by weight of vinyl acetate and between about 30 and about 80 parts by weight of ethylene. The resulting vinyl acetate-ethylene copolymer emulsions have a glass transition temperature (Tg) of between about -35 °C and about -10 °C.
Computers and Their Applications, 2007
The Algebraic Path Problem (APP) unifies well-known matrix, graph, and language problems, such as matrix inversion, all-pairs shortest paths (APSP), maximum capacity paths (MCP), minimum spanning tree, generation of regular languages, etc., into a single algorithmic scheme. The difference between APP instances lies in the underlying algebraic structure. This paper explores the APP and presents an implementation of a block algorithm for solving the APP on the Cell Broadband Engine (Cell/B.E.) heterogeneous multicore processor. The block APP algorithm spends most of its computing time in a block matrix-matrix multiply-add (MMA) operation in different algebras. In our APP algorithm, a fast dense MMA operation in linear (+, ×)-algebra is utilized. The MMA implementation on the Cell/B.E. needs only a single fused multiply-add (FMA) instruction to obtain a single short-vector (+, ×)-result in one cycle. APP instances such as the APSP and MCP problems are based on (min, +)- and (max, min)-algebras, respectively, which differ from the linear (+, ×)-algebra and require three and four instructions to obtain a single short-vector result in three and four cycles, respectively. Because of that, the maximum sustained performance for MMA operation on Cell/B.E.
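The point that APP instances differ only in the underlying algebra can be sketched with a generic matrix multiply-add, C ← C ⊕ (A ⊗ B), parameterized by the two operations. This is a minimal scalar illustration under that assumption, with a hypothetical function name `mma`; it is not the block algorithm or the Cell/B.E. implementation described in the abstract.

```python
import math
import operator


def mma(c, a, b, add, mul):
    """Generic matrix multiply-add C <- C (+) (A (x) B) over a chosen algebra.

    `add` and `mul` stand for the two operations of the algebra, e.g.
    (+, *) for linear algebra, (min, +) for shortest paths, and
    (max, min) for maximum capacity paths.
    """
    out = [row[:] for row in c]  # copy so the input C is not modified
    for i in range(len(a)):
        for j in range(len(b[0])):
            acc = out[i][j]
            for k in range(len(b)):
                acc = add(acc, mul(a[i][k], b[k][j]))
            out[i][j] = acc
    return out


INF = math.inf

# Linear (+, *)-algebra: ordinary C + A*B.
linear = mma([[0, 0], [0, 0]], [[1, 2], [3, 4]], [[5, 6], [7, 8]],
             operator.add, operator.mul)

# (min, +)-algebra: one step extends shortest paths by one more edge.
w = [[0, 3, INF],
     [INF, 0, 1],
     [INF, INF, 0]]
shortest = mma(w, w, w, min, operator.add)

# (max, min)-algebra: one step extends maximum-capacity paths by one more edge.
cap = [[INF, 5, 0],
       [0, INF, 2],
       [0, 0, INF]]
capacity = mma(cap, cap, cap, max, min)
```

In this sketch the loop structure is identical for all three algebras; only the pair of scalar operations changes, which is why the cost per result depends on how many machine instructions each pair needs.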
Federated Conference on Computer Science and Information Systems, Nov 15, 2011
In the present study, a home monitoring healthcare system for elderly and chronic patients is proposed. The system was developed for three types of users: assisted person, doctor, and guardian. It analyzes the collected information (e.g. biomedical signals) and, when dangerous events are detected, informs the physician and guardian. A mobile device plays a key role in the system: it allows data to be exchanged and visualized for the users. This paper describes the design and implementation of a tablet in the home monitoring healthcare system, with a specially developed data exchange protocol. Additionally, special security features were introduced to protect the data exchange. The software part of the system was built using modern technologies such as JavaFX for the central unit and Android for mobile devices.
The basic linear algebra subroutines (BLAS) are standard operations for efficiently solving linear algebra problems on high-performance and parallel systems. In this paper, we study the implementation of some important BLAS operations on an N × N torus array processor. We show that the performance of the Level-3 BLAS, represented by the n × n matrix multiply-add operation with n > N, approaches the theoretical peak as n increases, since the degree of data reuse is high. In contrast, the performance of Level-1 and Level-2 BLAS operations is low as a result of low data reuse. Fortunately, many applications are based on intensive use of Level-3 BLAS with a small percentage of Level-1 and Level-2 BLAS.
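The data-reuse argument in the abstract above can be made concrete with a rough count of arithmetic operations per memory word touched. The sketch below assumes textbook leading-order counts for one representative operation per BLAS level (axpy, gemv, gemm on size-n operands); these counts and the function name are illustrative assumptions, not figures taken from the paper.

```python
def flops_per_word(level, n):
    """Rough ratio of floating-point operations to memory words touched.

    Assumed representative operations and counts (scalars ignored):
      Level 1, axpy  y = a*x + y      : 2n flops,   3n words
      Level 2, gemv  y = a*A*x + b*y  : 2n^2 flops, n^2 + 3n words
      Level 3, gemm  C = a*A*B + b*C  : 2n^3 flops, 4n^2 words
    """
    if level == 1:
        flops, words = 2 * n, 3 * n
    elif level == 2:
        flops, words = 2 * n * n, n * n + 3 * n
    elif level == 3:
        flops, words = 2 * n ** 3, 4 * n * n
    else:
        raise ValueError("level must be 1, 2, or 3")
    return flops / words


ratios = {level: flops_per_word(level, 100) for level in (1, 2, 3)}
```

Under these assumptions the Level-1 and Level-2 ratios stay bounded by a small constant for any n, while the Level-3 ratio grows linearly with n, which is consistent with the observation that only Level-3 operations approach peak performance as n increases.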
Scalable Computing: Practice and Experience, Jul 4, 2014
Scalable Computing: Practice and Experience, Apr 30, 2014
Recently, a novel method for image scrambling (and unscrambling) has been proposed. This method is based on a linear transformation involving the Kronecker delta function. However, while quite interesting, the way it was introduced leaves some open issues concerning its actual usability for information hiding. Therefore, in this paper, we extend the original proposal and show how it can be used to securely pass image-like information between users. * Work of Marcin Paprzycki was completed while visiting the University of Aizu.
Springer eBooks, May 28, 2008