Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing ICPPW-96, 1996
HPC++ is a C++ library and language extension framework that is being developed by the HPC++ consortium as a standard model for portable parallel C++ programming. This paper provides a brief introduction to HPC++-style programming and outlines some of the unresolved issues.
Proceedings of the 1989 ACM/IEEE conference on Supercomputing - Supercomputing '89, 1989
In this paper we describe an interactive tool designed for performance prediction of parallel programs. Static performance prediction, in general, is a very difficult task. In order to avoid some inherent problems, we concentrate on reasonably structured scientific programs. Our prediction system, which is built as a sub-system of a larger interactive environment, uses a parser, dependence analyzer, database and an X-window based front end in analyzing programs. The system provides the user with execution times of different sections of programs. When there are unknowns involved, such as the number of processors or unknown loop bounds, the output is an algebraic expression in terms of these variables. We propose a simple analytical model as an attempt to predict performance degradation due to data references in hierarchical memory systems. The predicted execution times of some Lawrence Livermore loop kernels are given together with the experimental values obtained by executing the loops on an Alliant FX/8.
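To make the idea of a simple analytical performance model concrete, here is a minimal C++ sketch. It is not the paper's prediction system or its formulas; the two-term model (a computation term divided by the processor count plus a memory-penalty term) and all parameter names are assumptions for illustration only.

```cpp
#include <iostream>

// Hypothetical analytical model (not the paper's actual formulas):
//   predicted time = iterations * flops_per_iter / (p * flop_rate)
//                  + iterations * misses_per_iter * miss_penalty
// The processor count p is left as a parameter, mirroring how the tool
// reports an algebraic expression when p is unknown.
double predict_loop_time(long iterations, double flops_per_iter,
                         double misses_per_iter, int p,
                         double flop_rate, double miss_penalty) {
    double compute = iterations * flops_per_iter / (p * flop_rate);
    double memory  = iterations * misses_per_iter * miss_penalty;
    return compute + memory;
}

int main() {
    // Example: a 10^6-iteration kernel on 8 processors (made-up machine rates).
    std::cout << predict_loop_time(1000000, 10.0, 0.5, 8, 1e8, 1e-7)
              << " seconds (estimated)\n";
}
```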
This report discusses a VLSI implementation of a record-sorting stack. Records are represented on the stack as (key, record-pointer) pairs, and the operations supported are PUSH, POP, and CLEAR. When records are POPped, they are returned in smallest-first order. The implementation allows the sorting of n records in O(n) time, and the design is cascadable so that the capacity of a single VLSI chip does not limit the amount of data which may be sorted. This report describes a paper design and evaluation, and thus serves two purposes: it describes one particular VLSI sorting circuit, and it also serves as a case study in VLSI design methodology. The algorithm is described, the overall chip organization and data flow are presented, and detailed circuits, layouts, and timing analyses are given.
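The stack's external behavior (PUSH, POP, CLEAR, with POP returning the smallest key first) can be mimicked in software. The sketch below is a behavioral model only, built on a min-heap of (key, record-pointer) pairs; it does not reflect the O(n) cascadable circuit the report describes, and the class and member names are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <queue>
#include <utility>
#include <vector>

// Behavioral model of the record-sorting stack: a min-heap of
// (key, record-pointer) pairs. push() inserts, pop() returns the pair
// with the smallest key, clear() empties the stack. The real design
// achieves this with a cascadable array of compare-exchange cells;
// this sketch only mimics the interface.
struct SortingStack {
    using Entry = std::pair<uint32_t, const void*>;  // (key, record pointer)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    void push(uint32_t key, const void* record) { heap.push({key, record}); }
    Entry pop() { Entry e = heap.top(); heap.pop(); return e; }
    void clear() { heap = {}; }
    bool empty() const { return heap.empty(); }
};

int main() {
    SortingStack s;
    int records[3] = {10, 20, 30};
    s.push(42, &records[0]);
    s.push(7,  &records[1]);
    s.push(19, &records[2]);
    while (!s.empty()) std::cout << s.pop().first << ' ';  // prints: 7 19 42
    std::cout << '\n';
}
```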
The divide-and-conquer paradigm yields algorithms that parallelize easily, a very important consideration in high-performance computing. However, high-performance computing also relies on local reuse of data in a memory hierarchy of registers, caches, main memory, and swapping disks.
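One standard way to combine divide-and-conquer parallelism with data reuse is to recurse until a subproblem fits in cache and then switch to a direct base-case loop. The recursive blocked sum below is a generic illustration of that pattern, not code from the paper; the cutoff value is an assumed tuning parameter.

```cpp
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Divide-and-conquer reduction: split the range in half until it is
// small enough to sit comfortably in cache, then sum it directly.
// The two halves are independent, so they could run in parallel; the
// small base case gives the local data reuse the abstract mentions.
double blocked_sum(const double* a, std::size_t n, std::size_t cutoff = 4096) {
    if (n <= cutoff)                         // base case: cache-resident block
        return std::accumulate(a, a + n, 0.0);
    std::size_t half = n / 2;                // divide: two independent halves
    return blocked_sum(a, half, cutoff) + blocked_sum(a + half, n - half, cutoff);
}

int main() {
    std::vector<double> v(1 << 20, 1.0);     // 2^20 ones
    std::cout << static_cast<long long>(blocked_sum(v.data(), v.size()))
              << '\n';                       // prints 1048576
}
```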
Opportunities and Constraints of Parallel Computing, 1989
The subject of parallel computation has been around for almost 20 years and, until very recently, it has been an exotic subdiscipline of computer architecture and algorithm analysis. However, recently we have seen a fundamental shift in the role of parallelism studies in both academic computer science and in the computer industry. In particular, parallel computation issues are now a significant part of research in most branches of computer science and the industry sees parallelism as the only answer to the next generation of high-performance machines. Because we can now build parallel systems with relative ease and because application programmers are now intensively involved in the use of these machines, the nature of basic research has shifted focus. In the past, when there were no real machines available, all we could do was make theoretical studies of parallel algorithm complexity and debate the merits of switching network design. Now the discipline more closely resembles other sciences in that there is a very active experimental branch of computer science that is busy testing new design theories and algorithms. The experimental work is driving new investigations of theoretical questions that have arisen from our need to fully understand this first generation of parallel hardware.
The idea of building computer applications by composing them out of reusable software components emerged in the 1970s and 1980s as developers began to realize that the complexity of software was evolving so rapidly that a different approach was needed if actual software development was going to keep pace with the demands placed upon it. This fact had already been realized by hardware designers. By the mid 1970s, it was standard practice to build digital systems by composing them from standard, well-tested integrated circuits that encapsulated sophisticated, powerful subsystems that were easily reused in thousands of applications. By the 1990s, even the designers of integrated circuits such as microprocessors were building them by composing them from standard cell libraries that provided components such as registers and floating-point units that could be arranged on the chip and easily integrated to form a full processor. Now, multiple processor cores can be assembled on a single chip as components of larger systems.
In many applications, the solutions to important partial differential equations are characterized by a sharp active region of transition (such as wave fronts or areas of rapid diffusion) surrounded by relatively calm stable regions. The numerical approximation to such a solution is based on a mesh or grid structure that is best when very fine in the active region and coarse in the calm region. This work considers algorithms for the proper construction of locally refined grids for finite element based methods for the solution to such problems. By extending the work of Babuska and Rheinboldt to the case of parabolic problems, refinement criteria are developed and tested for this class of problems. The computational complexity of such a strategy is studied, and algorithms based on nested dissection are presented to solve the associated linear algebra problems.
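As a rough illustration of what a refinement criterion looks like, the sketch below marks mesh elements whose error indicator exceeds a fraction of the largest indicator. It is a generic maximum-based marking rule, not the Babuska-Rheinboldt estimator or the parabolic criteria developed in this work; the indicator values and the threshold fraction are assumed inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Mark 1-D elements for refinement: refine element i when its error
// indicator eta[i] exceeds a fixed fraction of the largest indicator.
// The indicator itself (e.g., a residual-based estimate) is assumed
// to be supplied by the discretization.
std::vector<bool> mark_for_refinement(const std::vector<double>& eta,
                                      double fraction = 0.5) {
    double eta_max = *std::max_element(eta.begin(), eta.end());
    std::vector<bool> refine(eta.size());
    for (std::size_t i = 0; i < eta.size(); ++i)
        refine[i] = eta[i] > fraction * eta_max;   // refine the "active" region
    return refine;
}

int main() {
    // Large indicators near a sharp front (elements 2-3), small elsewhere.
    std::vector<double> eta = {0.01, 0.02, 0.9, 0.8, 0.03};
    for (bool r : mark_for_refinement(eta)) std::cout << r << ' ';  // 0 0 1 1 0
    std::cout << '\n';
}
```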
A software component framework is one where an application designer programs by composing well understood and tested "components" rather than writing large volumes of not-very-reusable code. The software industry has been using component technology to build desktop applications for about ten years now. More recently this idea has been extended to applications in distributed systems with frameworks like the CORBA Component Model and Enterprise Java Beans. With the advent of Grid computing, high performance applications may be distributed over a wide area network of compute and data servers. Also, "peer-to-peer" applications exploit vast amounts of parallelism by harnessing the resources of thousands of servers. In this talk we look at the problem of building a component technology for scientific applications. The Common Component Architecture project seeks to build a framework that allows software components running on massively parallel computers to be linked together to form wide-area, high performance application services that may be accessed from desktop applications. This problem is far from being solved, and the talk will describe progress to date and outline some of the difficult problems that remain to be solved.
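The composition idea can be sketched in a few lines: a component advertises the interfaces ("ports") it provides, and a framework wires them to the ports that other components use. The C++ below is a generic illustration with made-up names, not the Common Component Architecture's actual port API.

```cpp
#include <iostream>
#include <memory>

// Generic sketch of component composition: one component provides a
// port (an abstract interface), another uses it, and the "framework"
// step is simply wiring the two together.
struct SolverPort {                       // a provided interface (port)
    virtual double solve(double x) = 0;
    virtual ~SolverPort() = default;
};

struct LinearSolver : SolverPort {        // component providing SolverPort
    double solve(double x) override { return 2.0 * x; }
};

struct Simulation {                       // component using SolverPort
    std::shared_ptr<SolverPort> solver;   // "uses" port, wired externally
    double step(double x) { return solver->solve(x); }
};

int main() {
    Simulation sim;
    sim.solver = std::make_shared<LinearSolver>();   // the composition step
    std::cout << sim.step(21.0) << '\n';             // prints 42
}
```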
Grid systems are now a standard approach to solving problems in large-scale, multidisciplinary scientific endeavors. These research groups are often geographically distributed, and to conduct their research, they need to share access to physical resources such as supercomputers, large databases, on-line instruments and distributed applications. Grid infrastructure helps solve their problems because it can provide a layer of middleware that virtualizes the access to these resources. The users see a single coherent computer system instead of a complex network of distributed resources. In most cases, users enter the grid through a "gateway" portal, which may be a web portal or a "thick" desktop client. The gateway gives the user a way to browse metadata about computational experiments, to access data products, to monitor active workflows, and to run applications and share results. The user can focus on the problems of science and not computer systems. All of this is made possible because of the Service Oriented Architecture (SOA) that underlies the core Grid middleware. In this presentation, we will look at several examples of successful Scientific Grids and Gateways. We will also describe the fundamentals of the web service SOAs that work best in Grid systems. We will illustrate these ideas with an example called LEAD, which is a Grid designed to improve our ability to predict meso-scale weather events such as hurricanes, typhoons and tornadoes. We will also describe how this entire approach to service virtualization is now being used in industry to better use the resources of a single, but distributed, business enterprise. While a great deal of progress has been made, there are many exciting and unsolved problems. As we go through the talk, we will highlight these challenges and research opportunities.