Automatic Parallelization
Recent papers in Automatic Parallelization
Estimating the execution time of nested loops or the volume of data transferred between processors is necessary to make appropriate processor or data allocation decisions. To achieve this goal one needs to estimate the execution time of the body and... more
Adaptive Mesh Refinement (AMR) calculations carried out on structured meshes play an exceedingly important role in several areas of science and engineering. This is so not just because AMR techniques allow us to carry out calculations very... more
Distributed-memory multicomputers such as the Intel Paragon, the IBM SP-2, and the Thinking Machines CM-5 offer significant advantages over shared-memory multiprocessors in terms of cost and scalability. Unfortunately, extracting all... more
This paper extends the algorithms which were developed in Part I to cases in which there is no affine schedule, i.e. to problems whose parallel complexity is polynomial but not linear. The natural generalization is to multidimensional... more
Static scheduling of a program represented by a directed task graph on a multiprocessor system to minimize the program completion time is a well-known problem in parallel processing. Since finding an optimal schedule is an NP-complete... more
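Because optimal task-graph scheduling is NP-complete, practical schedulers rely on heuristics such as greedy list scheduling. The sketch below (an illustration, not the scheduler from the listed paper; task names and durations are invented) assigns each ready task to the earliest-free processor:

```python
# Greedy list scheduling of a task DAG onto p identical processors.
# A heuristic only -- finding an optimal schedule is NP-complete.

def list_schedule(tasks, deps, p):
    # tasks: {name: duration}; deps: {name: set of predecessor names}
    done = {}                     # task -> finish time
    procs = [0.0] * p             # next free time of each processor
    remaining = dict(deps)
    while remaining:
        # A task is ready once all its predecessors have finished.
        ready = [t for t, pre in remaining.items() if pre <= done.keys()]
        t = min(ready)            # deterministic tie-break by name
        i = min(range(p), key=lambda j: procs[j])
        start = max(procs[i],
                    max((done[d] for d in remaining[t]), default=0.0))
        done[t] = start + tasks[t]
        procs[i] = done[t]
        del remaining[t]
    return max(done.values())     # makespan

makespan = list_schedule({"a": 2, "b": 2, "c": 1},
                         {"a": set(), "b": set(), "c": {"a", "b"}}, 2)
```

Here "a" and "b" run in parallel on the two processors, and "c" starts only after both finish, giving a makespan of 3.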
Existing theories of multiple object tracking (MOT) offer different predictions concerning the role of higher level cognitive processes, individual differences, effortful attention and parallel processing in MOT. Pylyshyn's model (1989)... more
GPUs are a class of specialized parallel architectures with tremendous computational power. The new Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on their... more
A software package built upon the proposed algorithm is described. Several practical examples of mesh generation on multiprocessor computational systems are given. It is shown that the developed parallel algorithm enables us to reduce mesh... more
We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and... more
A robot-controlled wafer bonding machine was developed for the bonding of different sizes of wafers ranging up to 8 inches diameter. The features of this equipment are such that: (1) After the automatic parallel adjustment for 8-inch... more
The Support Vector Machine (SVM) is a supervised learning algorithm used for recognizing patterns in data. It is a very popular technique in Machine Learning and has been successfully used in applications such as image classification,... more
The Set-Sharing domain has been widely used to infer at compile-time interesting properties of logic programs such as occurs-check reduction, automatic parallelization, and finite-tree analysis. However, performing abstract unification in... more
Parallelizing compilers have traditionally focused mainly on parallelizing loops. This paper presents a new framework for automatically parallelizing recursive procedures that typically appear in divide-and-conquer algorithms. We present... more
With the advent of digitization and growing abundance of graphic and image processing tools, use cases for clipping using circular windows have grown considerably. This paper presents an efficient clipping algorithm for line segments... more
The widespread use of multicore processors is not a consequence of significant advances in parallel programming.
Data-oriented workflows are often used in scientific applications for executing a set of dependent tasks across multiple computers. We discuss how these can be modeled using lambda calculus, and how ideas from functional programming are... more
MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function,... more
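The programming model described above can be sketched in a few lines of plain Python (a toy illustration of the model, not Google's implementation): the user supplies only the map and reduce functions, and the framework groups intermediate values by key before reducing.

```python
from collections import defaultdict

def map_fn(doc):
    # User-supplied map: emit (word, 1) for every word in a document.
    for word in doc.split():
        yield word, 1

def reduce_fn(key, values):
    # User-supplied reduce: combine all counts emitted for one word.
    return key, sum(values)

def map_reduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)      # "shuffle": group by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = map_reduce(["a b a", "b c"], map_fn, reduce_fn)
```

Because each map call and each per-key reduce call is independent, the framework is free to distribute them across machines.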
Recent advances in polyhedral compilation technology have made it feasible to automatically transform affine sequential loop nests for tiled parallel execution on multi-core processors. However, for multi-statement input programs with... more
Previous literature in alphabetic languages suggests that the occipital-temporal region (the ventral pathway) is specialized for automatic parallel word recognition, whereas the parietal region (the dorsal pathway) is specialized for... more
We describe pHPF, a research prototype HPF compiler for the IBM SP series of parallel machines. The compiler accepts as input Fortran 90 and Fortran 77 programs, augmented with HPF directives; sequential loops are automatically... more
This paper develops and experimentally demonstrates a robust automatic parallel parking algorithm for parking in tight spaces. Novel fuzzy logic controllers are designed for each step of the maneuvering process. The controllers are first... more
The problem of writing software for multicore processors is greatly simplified if we could automatically parallelize sequential programs. Although auto-parallelization has been studied for many decades, it has succeeded only in a few... more
We report on a detailed study of the application and effectiveness of program analysis based on abstract interpretation to automatic program parallelization. We study the case of parallelizing logic programs using the notion of strict... more
Current approaches to parallelizing compilation perform a purely structural analysis of the sequential code. Conversely, a semantic analysis performing concept assignment for code sections, can support the recognition of the algorithms... more
Automatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic... more
HEXAR, a new software product developed at Cray Research, Inc., automatically generates good quality meshes directly from surface data produced by computer-aided design (CAD) packages. The HEXAR automatic mesh generator is based on a... more
We discuss the parallelization and object-oriented implementation of Monte Carlo simulations for physical problems. We present a C++ Monte Carlo class library for the automatic parallelization of Monte Carlo simulations. Besides... more
We describe and evaluate a novel approach for the automatic parallelization of programs that use pointer-based dynamic data structures, written in Java. The approach exploits parallelism among methods by creating an asynchronous thread of... more
The evolution of high performance computers is progressing toward increasingly heterogeneous systems. These new architectures pose new challenges, particularly in the field of programming languages. New tools and languages are needed if... more
Two key steps in the compilation of strict functional languages are the conversion of higher-order functions to data structures (closures) and the transformation to tail-recursive style. We show how to perform both steps at once by... more
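Both steps can be illustrated on factorial (a sketch of the general idea, not the paper's algorithm): the continuation that a recursive call would build is represented as an explicit data structure, and the computation becomes a loop, i.e. tail-recursive style.

```python
# Sketch of closure conversion + tail-recursion transformation.
# The continuation "lambda r: k(n * r)" of recursive factorial is
# turned into a plain data structure (a stack of saved n values),
# and the recursion becomes iteration.

def fact(n):
    stack = []            # the closure-converted continuation
    while n > 1:          # each "recursive call" just pushes a frame
        stack.append(n)
        n -= 1
    result = 1
    while stack:          # apply the stored continuation step by step
        result *= stack.pop()
    return result
```

The point of the transformation is that the program now needs no call stack at runtime; the continuation lives in an ordinary heap data structure.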
This paper introduces an analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views computations as composed of operations... more
Atomic operations are a key primitive in parallel computing systems. The standard implementation mechanism for atomic operations uses mutual exclusion locks. In an object-based programming system, the natural granularity is to give each... more
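The lock-per-object granularity mentioned above looks like this in a minimal Python sketch (an illustration of the standard mechanism, not the paper's system): each object carries its own mutual exclusion lock, and every atomic method runs while holding it.

```python
import threading

class Counter:
    # Each object gets its own mutual-exclusion lock, the natural
    # granularity for atomic operations in an object-based system.
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def add(self, n):
        # Atomic operation: executed while holding the object's lock.
        with self._lock:
            self._value += n

    def value(self):
        with self._lock:
            return self._value

c = Counter()
threads = [threading.Thread(target=c.add, args=(1,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, the read-modify-write in `add` could interleave across threads and lose updates; with it, the hundred increments always yield 100.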
A flexible compiler framework for distributed-memory multicomputers automatically parallelizes sequential programs. A unified approach efficiently supports regular and irregular computations using data and functional parallelism.
Tree contraction algorithms, whose idea was first proposed by Miller and Reif, are important parallel algorithms to implement efficient parallel programs manipulating trees. Despite their efficiency, the tree contraction algorithms have... more
Divide-and-conquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms are a class of... more
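A list homomorphism is a function h satisfying h(x ++ y) = h(x) ⊙ h(y) for some associative operator ⊙, which is exactly the property that lets divide-and-conquer evaluate the two halves independently. A minimal sketch (generic illustration, not the paper's framework):

```python
# A list homomorphism: h(x ++ y) = op(h(x), h(y)) for associative op.
# The two recursive calls are independent, so a parallel runtime could
# evaluate them on separate processors.

def homomorphism(f, op, identity):
    # f: per-element map; op: associative combine; identity: unit of op
    def h(xs):
        if not xs:
            return identity
        if len(xs) == 1:
            return f(xs[0])
        mid = len(xs) // 2
        return op(h(xs[:mid]), h(xs[mid:]))   # independent halves
    return h

total = homomorphism(lambda x: x, lambda a, b: a + b, 0)([1, 2, 3, 4])
```

Sum, maximum, and length are all homomorphisms in this sense; associativity of the combining operator is what makes the split point irrelevant to the result.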