Automatic Parallelization
Most downloaded papers in Automatic Parallelization
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously.... more
With the advent of digitization and growing abundance of graphic and image processing tools, use cases for clipping using circular windows have grown considerably. This paper presents an efficient clipping algorithm for line segments... more
Existing theories of multiple object tracking (MOT) offer different predictions concerning the role of higher level cognitive processes, individual differences, effortful attention and parallel processing in MOT. Pylyshyn's model (1989)... more
This paper presents an overview of the SUIF compiler, which automatically parallelizes and optimizes sequential programs for shared-memory multiprocessors. We describe new technology in this system for locating coarse-grain parallelism... more
In the area of automatic parallelization of programs, analyzing and transforming loop nests with parametric affine loop bounds requires fundamental mathematical results. The most common geometrical model of iteration spaces, called the... more
Data-oriented workflows are often used in scientific applications for executing a set of dependent tasks across multiple computers. We discuss how these can be modeled using lambda calculus, and how ideas from functional programming are... more
Previous literature in alphabetic languages suggests that the occipital-temporal region (the ventral pathway) is specialized for automatic parallel word recognition, whereas the parietal region (the dorsal pathway) is specialized for... more
Performance optimization of stencil computations has been widely studied in the literature, since they occur in many computationally intensive scientific and engineering applications. Compiler frameworks have also been developed that can... more
GPUs are a class of specialized parallel architectures with tremendous computational power. The new Compute Unified Device Architecture (CUDA) programming model from NVIDIA facilitates programming of general purpose applications on their... more
A robot-controlled wafer bonding machine was developed for bonding wafers of different sizes, up to 8 inches in diameter. The features of this equipment are such that: (1) After the automatic parallel adjustment for 8-inch... more
This paper develops and experimentally demonstrates a robust automatic parallel parking algorithm for parking in tight spaces. Novel fuzzy logic controllers are designed for each step of the maneuvering process. The controllers are first... more
The ubiquity of multicore processors in commodity computing systems has raised a significant programming challenge for their effective use. An attractive but challenging approach is automatic parallelization of sequential codes. Although... more
The widespread use of multicore processors is not a consequence of significant advances in parallel programming.
The Support Vector Machine (SVM) is a supervised learning algorithm used for recognizing patterns in data. It is a very popular technique in Machine Learning and has been successfully used in applications such as image classification,... more
Two key steps in the compilation of strict functional languages are the conversion of higher-order functions to data structures (closures) and the transformation to tail-recursive style. We show how to perform both steps at once by... more
Current Fortran optimizing compilers often include source-to-source transformations for automatic parallelization or vectorization of loops. Lower level optimizations, such as those that aim to exploit ILP, are performed at later stages... more
Program parallelization becomes increasingly important when new multi-core architectures provide ways to improve performance. One of the greatest challenges of this development lies in programming parallel applications. Using... more
This paper describes the development of a backward automatic parallel parking system for a nonholonomic mobile robot. The system comprises an ultrasonic sensor, rotary encoder, controller, and actuators. The path planning... more
Static scheduling of a program represented by a directed task graph on a multiprocessor system to minimize the program completion time is a well-known problem in parallel processing. Since finding an optimal schedule is an NP-complete... more
As multicore systems become the dominant mainstream computing technology, one of the most difficult challenges the industry faces is the software. Applications with large amounts of explicit thread-level parallelism naturally scale... more
Parallelization of image analysis tasks is key to processing huge volumes of image data in real time. To this end, suitable subtasks for parallel processing have to be extracted and mapped to components of a distributed system. Basically,... more
The evolution of high performance computers is progressing toward increasingly heterogeneous systems. These new architectures pose new challenges, particularly in the field of programming languages. New tools and languages are needed if... more
Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is time consuming and costly, porting codes would ideally be automated by using some... more
Recent advances in polyhedral compilation technology have made it feasible to automatically transform affine sequential loop nests for tiled parallel execution on multi-core processors. However, for multi-statement input programs with... more
an NSF Graduate Research Fellowship and NSF and DARPA grants to the Fugu and Raw projects. While provided a vital support network. Most of all, I have relied on my wife, Kathleen Shannon, and my children, Karissa and Anya. Their love has... more
We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and... more
This article deals with automatic parallelization of static control programs.
Most parallel databases exploit two types of parallelism: intra-query parallelism and inter-transaction concurrency. Between these two cases lies another type of parallelism: inter-query parallelism within a transaction or application.... more
per. A software package built upon the proposed algorithm is described. Several practical examples of mesh generation on multiprocessor computational systems are given. It is shown that the developed parallel algorithm enables us to reduce mesh... more
We describe pHPF, a research prototype HPF compiler for the IBM SP series parallel machines. The compiler accepts as input Fortran 90 and Fortran 77 programs, augmented with HPF directives; sequential loops are automatically... more
The aim of this paper is to explain the importance of polytopes and polyhedra in automatic parallelization. We show that the semantics of parallel programs is best described geometrically, as properties of sets of integral points in... more
The increased complexity of memory systems, introduced to narrow the gap between processor and memory speeds, has made it increasingly hard for compilers to optimize arbitrary code within a reasonable amount of time. With the emergence of... more
The growing number of processing cores in a single CPU demands more parallelism from sequential programs. But over the past decades little work has succeeded in automatically exploiting enough parallelism, which casts a shadow over the... more
Distributed-memory multicomputers such as the Intel Paragon, the IBM SP-2, and the Thinking Machines CM-5 offer significant advantages over shared-memory multiprocessors in terms of cost and scalability. Unfortunately, extracting all... more