Shared memory multiprocessor system
Recent papers in Shared memory multiprocessor system
Recent advances in the development of reconfigurable optical interconnect technologies allow for the fabrication of low cost and run-time adaptable interconnects in large distributed shared-memory (DSM) multiprocessor machines. This can... more
Toward the direction of supporting an end-to-end video-on-demand (VOD) system with guaranteed quality over ATM networks, this paper introduces our efforts to solve this challenging problem from the source: building VOD servers. One of the... more
We present Lazy Binary Splitting (LBS), a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing Eager Binary Splitting work-stealing (EBS) implemented in Intel's Threading Building... more
Multiprocessor systems are increasingly becoming the systems of choice for low and high-end servers, running such diverse tasks as number crunching, large-scale simulations, data base engines and world wide web server applications. With... more
Distributed Shared Memory is a good solution to the scalability, complexity and high cost problems of large scale Shared Memory Multiprocessors, as well as to the difficulty of the programming model problem of the message passing... more
The specification and verification of shared-memory multiprocessor cache coherence protocols is a paradigmatic example of parallel technologies where formal methods can be applied. In this paper we present the specification and... more
We study a high-order finite difference scheme to solve the time-accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel... more
We describe a parallel resolution theorem prover, called Parthenon, that handles full first-order logic. Although there has been much work on parallel implementations of logic programming languages, our system is apparently the first... more
This paper presents Jade, a high-level parallel programming language for managing coarse-grain concurrency. Jade simplifies programming by providing the programmer with the abstractions of sequential execution and a shared address space.... more
We propose a new parallel Branch and Bound algorithm for the Quadratic Assignment Problem, which is a Combinatorial Optimization problem known to be very hard to solve exactly. An original method to distribute work to processors using the... more
Shared memory multiprocessors are regaining popularity thanks to the rapid spread of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers... more
This paper presents an extension of the Latency Time (LT) scheduling algorithm for assigning tasks with arbitrary execution times on a multiprocessor with shared memory. The Extended Latency Time (ELT) algorithm adds to the priority... more
Modern shared-memory multiprocessors require complex interconnection networks to provide sufficient communication bandwidth between processors. They also rely on advanced memory systems that allow multiple memory operations to be made in... more
In this work we put into evidence how the memory performance of a Web-Server machine may depend on the sharing induced by process migration. We considered a shared-bus shared-memory multiprocessor as the simplest multiprocessor... more
Communication has always been a limiting factor in making efficient computing architectures with large processor counts. Reconfigurable interconnects can help in this respect, since they can adapt the interprocessor network to the... more
In this paper, we propose efficient parallel implementations of the auction/sequential shortest path and the ε-relaxation algorithms for solving the linear minimum cost flow problem. In the parallel auction algorithm, several augmenting... more
This paper deals with a new class of parallel asynchronous iterative algorithms for the solution of nonlinear systems of equations. The main feature of the new class of methods presented here is the possibility of flexible communication... more
In this work, we characterized the impact of operating system activities like process migration on a shared-bus shared-memory multiprocessor running a typical DBMS workload. Our workload has been set up utilizing the TPC-D benchmark on the... more
Allowing multiple threads to execute within the same address space makes it easier to write programs that deal with related asynchronous activities and that execute faster on shared-memory multiprocessors. Supporting multiple threads... more
3.3 Characteristics of most time-consuming subroutines of COMMIX-1AR/P for data set Plr0 running in baseline SV mode on Cray X-MP (from PERFTRAC_). 3.4 Characteristics of most time-consuming subroutines of COMMIX-1AR/P for... more
We present a framework for parallel programming, based on three conceptual classes for understanding parallelism and three programming paradigms for implementing parallel programs. The conceptual classes are result parallelism, which... more
Many attempts have been made to add concurrency to C++, often by extensive compiler extensions, but much of the work has not exploited the power of C++. This paper shows how the object-oriented facilities of C++ are powerful enough to... more
We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time-accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of... more
We describe the design and implementation of Tornado, a new operating system designed from the ground up specifically for today's shared memory multiprocessors. The need for improved locality in the operating system is growing as... more
In a modern shared memory multiprocessor, it is possible to support more than one protocol for maintaining cache coherence. Possible candidates might be based on the Write-Back/Invalidate, Write-Through/Invalidate, and Write-Update... more
In the standard kernel organization on a bus-based multiprocessor, all processors share the code and data of the operating system; explicit synchronization is used to control access to kernel data structures. Distributed-memory... more
Load balancing is a crucial factor in achieving good performance for parallel discrete event simulations. We present a load balancing scheme that combines both static partitioning and dynamic load balancing. The static partitioning scheme... more
Protection and security are becoming essential requirements in commercial servers. In this paper, we present a fast and efficient method for providing secure memory and cache-to-cache communications in shared memory multiprocessor systems... more
We constructed a simulation model, using the PAWS simulation modeling tools, to predict the performance of a shared memory multiprocessor operating system. The operating system is divided into 5 distinct sections, called empires. We used... more
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is... more
A common operation in multiprocessor programs is acquiring a lock to protect access to shared data. Typically, the requesting thread is blocked if the lock it needs is held by another thread. The cost of blocking one thread and activating... more
Parallel processing has been proposed as a means of improving network protocol throughput. Several different strategies have been taken towards parallelizing protocols. A relatively popular approach is packet-level parallelism, where... more
Parallelizing compilers for multiprocessors face many hurdles. However, SUIF’s robust analysis and memory optimization techniques enabled speedups on three fourths of the NAS and SPECfp95 benchmark programs. © 1996 IEEE. Reprinted, with... more
Filtered back projection is a popular algorithm for the reconstruction of n-dimensional signals from their (n-1)-dimensional projections (in the sense of line integrals). Here we specifically treat the problem of the... more