Speculative Execution
Recent papers in Speculative Execution
Hardware multithreading is becoming a generally applied technique in the next generation of microprocessors. Several multithreaded processors have been announced by industry or are already in production in the areas of high-performance... more
The Cydra 5 is a VLIW minisupercomputer with hardware designed to accelerate a broad class of inner loops, presenting unique challenges to its compilers. We discuss the organization of its Fortran/77 compiler and several of the key... more
Improving architectural energy efficiency is important to address diminishing energy efficiency gains from technology scaling. At the same time, limiting hardware complexity is also important. This paper presents a new processor... more
Over the last 20 years, the open-source community has provided more and more software on which the world's high-performance computing systems depend for performance and productivity. The community has invested millions of dollars and years... more
... an NSF Graduate Research Fellowship and NSF and DARPA grants to the Fugu and Raw projects. While ... provided a vital support network. Most of all, I have relied on my wife, Kathleen Shannon, and my children, Karissa and Anya. Their love has... more
The emergence and wide adoption of web applications have moved the client-side component, often written in JavaScript, to the forefront of computing on the web. Web application developers try to move more computation to the client side to... more
This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thread running on another core. Our approach simply executes a copy of all non-control instructions in the prefetching core after they have... more
The AMD-K6 MMX-enabled processor is plug-compatible with the industry-standard Socket 7 and is binary compatible with the existing base of legacy x86 software. The microarchitecture is based on an out-of-order, superscalar execution engine... more
Current microprocessors exploit instruction-level parallelism through a deep processor pipeline and superscalar instruction issue. VLSI technology offers several solutions for aggressive exploitation of the instruction-level... more
The paper presents an approach helping developers to maintain source code identifiers and comments consistent with high-level artifacts. Specifically the approach computes and shows the textual similarity between source code and related... more
The branch predictor (BP) is an essential component in modern processors, since high BP accuracy can improve performance and reduce energy by decreasing the number of instructions executed on the wrong path. However, reducing latency and storage... more
Early web content was expressed statically, making it amenable to straightforward prefetching to reduce user-perceived network delay. In contrast, today's rich web applications often hide content behind JavaScript event handlers,... more
To narrow the widening gap between processor and memory performance, the authors propose improving the cache locality of pointer-manipulating programs and bolstering performance by careful placement of structure elements.
Designers face many choices when planning a new high-performance, general purpose microprocessor. Options include superscalar organization (the ability to dispatch and execute more than one instruction at a time), out-of-order issue of... more
The Multiflow compiler uses the trace scheduling algorithm to find and exploit instruction-level parallelism beyond basic blocks. The compiler generates code for VLIW computers that issue up to 28 operations each cycle and maintain more... more
Improving MapReduce Performance in Heterogeneous Environments. ... If a node crashes, MapReduce re-runs its tasks on a different machine. ...
Performance of multithreaded programs is heavily influenced by the latencies of the thread management and synchronization operations. Improving these latencies becomes especially important when the parallelization is performed at fine... more
Modern superscalar microarchitectures speculatively execute a large amount of code; without branch prediction, it would not be possible to aggressively exploit instruction-level parallelism in programs. Both the... more
As the difference in speed between processor and memory system continues to increase, it is becoming crucial to develop and refine techniques that enhance the effectiveness of cache hierarchies. Two such techniques are data prefetching and... more
There have been a number of successes in the past few years in the use of formal methods for verification of real-time systems, and also in source-to-source transformation of these systems for improved analysis, performance, and... more
Modern superscalar microarchitectures speculatively execute a large amount of code; without branch prediction, it would not be possible to aggressively exploit a program's instruction-level parallelism. Both the... more
Two-level predictors deliver highly accurate conditional branch prediction, indirect branch target prediction and value prediction. Accurate prediction enables speculative execution of instructions, a technique that increases instruction... more
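As a rough illustration of the two-level idea mentioned in the entry above, the following minimal sketch simulates a global-history predictor: the first level records recent branch outcomes, and the second level holds one 2-bit saturating counter per history pattern. The class name `TwoLevelPredictor`, the `accuracy` helper, the 4-bit history length, and the test pattern are all assumptions chosen for illustration; they are not taken from the paper.

```python
class TwoLevelPredictor:
    """Illustrative global-history two-level predictor (hypothetical sketch)."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        # First level: the last `history_bits` outcomes, newest in the low bit.
        self.history = 0
        # Second level: one 2-bit saturating counter per history pattern,
        # initialised to 1 (weakly not-taken).
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self.history] >= 2

    def update(self, taken):
        # Train the counter selected by the current history, then shift
        # the actual outcome into the history register.
        c = self.counters[self.history]
        self.counters[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(predictor, outcomes):
    """Fraction of outcomes predicted correctly, training as we go."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop-style branch: taken three times, then not taken, repeated.
pattern = [True, True, True, False] * 50
print(accuracy(TwoLevelPredictor(history_bits=4), pattern))
```

With a history at least as long as the repeating pattern, each history value maps to a single next outcome, so the predictor only mispredicts during warm-up.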
Speculative Multithreading (SpMT) improves performance by executing multiple threads speculatively to exploit thread-level parallelism. By combining software and hardware approaches, we have improved the capabilities of... more
This paper presents Threaded Multi-Path Execution (TME), which exploits existing hardware on a Simultaneous Multithreading (SMT) processor to speculatively execute multiple paths of execution. When there are fewer threads in an SMT... more
Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to... more
Speculative locking (SL) protocols have been proposed in the literature for improving the performance of read-only transactions (ROTs) without correctness and data currency issues. In these protocols, ROTs carry out speculative executions... more
Irregular algorithms are organized around pointer-based data structures such as graphs and trees, and they are ubiquitous in applications. Recent work by the Galois project has provided a systematic approach for parallelizing irregular... more
To improve the utilization of machine resources in superscalar processors, the instructions have to be carefully scheduled by the compiler. As internal parallelism and pipelining increase, it becomes evident that scheduling should be... more
... Among others, this support was provided by Rachel Allen, Scott Blomquist, Michael Chan, Cornelia Colyer, Mary Ann Ladd, Anne McCarthy, Marilyn Pierce, Lila Rhoades, Ty Sealy ... Most of all, I have relied on my wife, Kathleen... more
Recent research in thread-level speculation (TLS) has proposed several mechanisms for optimistic execution of difficult-to-analyze serial codes in parallel. Though it has been shown that TLS helps to achieve higher levels of parallelism,... more
Replicated state machines are an important and widely-studied methodology for tolerating a wide range of faults. Unfortunately, while replicas should be distributed geographically for maximum fault tolerance, current replicated state... more
WaveScalar is the first dataflow architecture that can efficiently provide the sequential memory semantics required by imperative languages. This work presents an alternative memory ordering mechanism for this architecture, the... more
This paper presents new achievements on the automatic mapping of abstract algorithms, written in imperative software programming languages, to custom computing machines. The reconfigurable hardware element of the target architecture... more
Instruction-level parallelism in a single stream of code for non-numerical applications has been the subject of much recent research. This work extends the analysis to symbolic applications described with logic programming. In... more
PEN-CHUNG YEW and ROY DZ-CHING JU, TIN-FOOK NGAI, SUN CHAN. Speculative execution, such as control speculation or data speculation, is an effective way to improve... more
Speculative execution, such as control speculation and data speculation, is an effective way to improve program performance. Using edge/path profile information or simple heuristic rules, existing compiler frameworks can adequately... more
The contribution of memory latency to execution time continues to increase, and latency hiding mechanisms become ever more important for efficient processor design. While high-end processors can use elaborate techniques like multiple... more
Predicated execution is an effective technique for dealing with conditional branches in application programs. However, there are several problems associated with conventional compiler support for predicated execution. First, all paths of... more
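To make the idea of predicated execution in the entry above concrete, here is a minimal sketch of if-conversion written in Python for readability. Real predication is applied by the compiler and hardware at the instruction level; the function names `abs_diff_branching` and `abs_diff_predicated` are hypothetical and only illustrate the contrast between a control-flow version and a version in which both arms are computed and a predicate selects the result.

```python
def abs_diff_branching(a, b):
    # Control-flow version: a conditional branch decides which arm runs,
    # so only one subtraction is executed.
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    # If-converted version: no branch. Both arms are computed
    # unconditionally and the predicate p selects which value is kept.
    p = int(a > b)       # predicate: 1 if a > b, else 0
    then_value = a - b   # result guarded by p
    else_value = b - a   # result guarded by (not p)
    return p * then_value + (1 - p) * else_value


for a, b in [(7, 3), (3, 7), (5, 5)]:
    print(a, b, abs_diff_branching(a, b), abs_diff_predicated(a, b))
```

The predicated form removes the branch, and with it the risk of a misprediction, at the cost of executing work from both paths.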
Cloud computing systems use distributed file systems (DFSs) to store and process the large volumes of data generated within organizations. The users of web-based information systems very frequently perform read operations and infrequently carry out... more
A long-running transaction is an interactive component of a distributed system which must be executed as if it were a single atomic action. In principle, it should not be interrupted or fail in the middle, and it must not be interleaved... more