Thread Level Speculation Research Papers

Heterogeneous systems that integrate a multicore CPU and a GPU on the same die are ubiquitous. On these systems, both the CPU and GPU share the same physical memory as opposed to using separate memory dies. Although integration eliminates... more

- by Mohammad Dashti
  Computer Science, Parallel Computing, Memory Management, Shared memory

- by Diego R. Llanos
  Computer Science, Parallel Computing, Compiler, Software

This research investigates the feasibility of developing low-cost, personal-computer-based parallel processing procedures applicable to real-time simulation of freeway flows. Specific objectives include: development of a... more

- by Eil Kwon
  Computer Science, Traffic Simulation

Speculative parallelization is a technique that tries to extract parallelism from loops that cannot be parallelized at compile time. The underlying idea is to optimistically execute the code in parallel, while a subsystem checks that... more

- by Diego R. Llanos
  Computer Science, Parallel Computing, Compiler, Software
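
The idea in the abstract above (run loop iterations optimistically in parallel while a subsystem checks for dependence violations) can be illustrated with a minimal sketch. This is not the system from the paper: the function name `run_speculative`, the batch-by-batch scheduling, and the read-set and write-set bookkeeping are all assumptions made for the example. Each iteration reads from a snapshot and buffers its writes. Commits happen in iteration order, and an iteration that read a location written by an earlier iteration of the same batch is re-executed.

```python
from concurrent.futures import ThreadPoolExecutor


def run_speculative(body, data, n_iters, workers=4):
    """Hypothetical sketch: run body(i, read, write) for each i speculatively.

    Returns the number of iterations that had to be re-executed.
    """

    def speculate(i, source):
        reads, writes = set(), {}

        def read(k):
            if k in writes:          # value produced by this same iteration
                return writes[k]
            reads.add(k)             # record the exposed read for later checking
            return source[k]

        def write(k, v):
            writes[k] = v            # buffered, not yet visible to other iterations

        body(i, read, write)
        return reads, writes

    reexecuted = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, n_iters, workers):
            batch = range(start, min(start + workers, n_iters))
            snapshot = list(data)    # state seen by every speculative iteration
            results = list(pool.map(lambda i: speculate(i, snapshot), batch))
            dirty = set()            # locations committed so far in this batch
            for i, (reads, writes) in zip(batch, results):
                if reads & dirty:    # read a stale value: dependence violation
                    reads, writes = speculate(i, data)  # redo on committed state
                    reexecuted += 1
                for k, v in writes.items():
                    data[k] = v      # in-order commit
                dirty.update(writes)
    return reexecuted
```

A loop with no cross-iteration dependences commits every iteration on the first attempt, while a loop such as a running sum, where iteration i reads what iteration i-1 wrote, triggers re-executions but still produces the sequential result.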

With speculative parallelization, code sections that cannot be fully analyzed by the compiler are aggressively executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and memory system.... more

- by Diego R. Llanos
  Computer Science, Parallel Computing, Compiler, Software

Speculative parallelization techniques make it possible to extract parallelism from code fragments that cannot be analyzed at compile time. However, research on software-based, thread-level speculation will greatly benefit from an appropriate... more

- by Diego R. Llanos
  Computer Science, Parallel Computing, XML, Parallel Processing

Software-based, thread-level speculation (TLS) systems allow the parallel execution of loops that cannot be analyzed at compile time. TLS systems optimistically assume that the loop is parallelizable, and augment the original code with... more

- by Diego R. Llanos
  Computer Science, Parallel Computing, Speculation, Springer Ebooks

Robustness is a key issue for any runtime system that aims to speed up the execution of a program. However, robustness considerations are commonly overlooked when new software-based, thread-level speculation (STLS) systems are proposed.... more

With speculative parallelization, code sections that cannot be fully analyzed by the compiler are optimistically executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and/or memory... more

Some emerging technologies try to exploit the parallel capabilities of modern processors.

With the rapid expansion of process mining implementation in global enterprises distributed across numerous branches, there is a critical requirement to develop an application qualified for real-time operation with fast and precise data... more

With the advent of parallel architectures, distributed programs are used intensively and the question of how to formally specify the behaviors expected from such programs becomes crucial. A very general way to specify concurrent objects... more

This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor (DMP). The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by... more

- by Wen-mei Hwu
  Computer Science, Parallel Computing, Springer Ebooks

Lazy hardware transactional memory has been shown to be more efficient at extracting available concurrency than its eager counterpart. However, it poses scalability challenges at commit time, as the existence of conflicts among concurrent... more

Thread-Level Speculation (TLS) facilitates the extraction of parallel threads from sequential applications. Most prior work has focused on developing the compiler and architecture for this execution paradigm. Such studies often narrowly... more

- by Dr. Salman Khan
  Computer Science, Parallel Computing, Architecture, Compiler

Work by Hill and Wood was performed while consulting for AMD Research. Work by Hechtman was performed while on internship at AMD Research.

Effective execution of atomic blocks of instructions (also called transactions) can enhance the performance and programmability of multiprocessors. Atomic blocks can be demarcated in software as in Transactional Memory (TM) or dynamically... more

In shared-memory multicore architectures, handling a cache write operation is more complicated than in single-processor systems. A cache line may be present in more than one private L1 cache. Any cache willing to write this line must... more

- by Lucas Garcia
  Computer Science, IEEE, Cache, Cache Coherence

The VLIW model describes a philosophy whereby the compiler organizes several nondependent machine operations into the same instruction word. Some features of this form of architecture are illustrated and certain strategies on presenting... more

- by John Impagliazzo
  Computer Science, Architecture, Compiler

Hardware transactional memory (HTM) systems have been studied extensively along the dimensions of speculative versioning and contention management policies. The relative performance of several design policies has been discussed at length... more

Scheduling for speculative parallelization is a problem that has remained unsolved despite its importance. Simple methods such as Fixed-Size Chunking (FSC) need several 'dry runs' before an acceptable chunk size is found. Other traditional... more

- by Belén Palop
  Computer Science, Parallel Computing, Just in Time, Speculative Execution
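
As a rough illustration of what Fixed-Size Chunking means in the abstract above, the sketch below splits a loop's iteration space into consecutive chunks of one fixed size. The helper name `fixed_size_chunks` is hypothetical and is not taken from the paper. The 'dry runs' the abstract mentions would correspond to repeating the whole execution with different `chunk_size` values, which is not shown here.

```python
def fixed_size_chunks(n_iters, chunk_size):
    """Split iterations 0..n_iters-1 into consecutive chunks of chunk_size.

    The last chunk may be shorter when chunk_size does not divide n_iters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        range(start, min(start + chunk_size, n_iters))
        for start in range(0, n_iters, chunk_size)
    ]
```

Each returned `range` would be handed to one thread as a unit of speculative work. The chunk size is the only tuning knob, which is why a poor initial choice can only be corrected by running again.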

CEFOS is an operating system based on a continuation-based zero-wait thread model derived from a data-flow computing model. A program consists of zero-wait threads, each of which runs to completion without suspension once started.... more

In parallel processing, fine-grain parallel processing is a quite effective solution to the latency problem caused by remote memory accesses and remote procedure calls. We have proposed a processor architecture, called Datarol-II, that... more

- by Makoto Amamiya
  Computer Science, Architecture, Parallel Processing, System Design

Software developers often face challenges in terms of quality and productivity to match competitive costs. The software industry seeks options to minimize this cost during different phases of software development and maintenance with improved productivity. Software developers adopt different tools for different purposes, such as understanding program behavior, debugging memory issues, debugging concurrency issues, and testing. In this article we study different debugging tools mostly used for program design analysis, thread debugging, and resource management. Stand-alone tools do track static or dynamic control flow, thread activities, etc., but they do not specifically identify the thread work-breakdown structure, global memory location management, thread-data interaction, etc. to allow good comprehension of the concurrency model of the program. Similarly, for resource management, we observe that Valgrind addresses a few required features but does not offer automatic garbage collection. Moreover, to address the outcomes of different tools, developers must compile and configure the application in different environments. This is very time-consuming, requires skills in different software paradigms, and is sometimes not supported by the tool itself. As a result, the tools cannot be used in an interoperable manner to relate their outcomes to one another. In this study, we conduct a detailed survey of the available tools and techniques and their limitations in order to identify gaps. We address these gaps by implementing tools for different phases of software development and maintenance. For example, a concurrency model detector based on thread behavior, a resource debugger with automatic garbage collection, etc. can collectively interoperate within our open-source tool framework PARALLELC-ASSIST to address the common requests of developers in one toolset. The tool is built upon the open-source dynamic instrumentation tool PIN and supports a wide variety of IDEs and operating systems. It detects various multi-threaded memory issues and provides additional features to inject concerns dynamically at run time, so that it can be extended according to the user's needs. We verify our tool with a wide variety of industry-standard benchmarks and compare its features with other similar tools.
Index terms: multi-threaded issues, memory issues, dynamic instrumentation.

Transactions are a simple and powerful mechanism for establishing fault-tolerance. To allow multiple processes to cooperate in a transaction we relax the isolation property and use message passing for communication. We call the new... more

- by Jason Hickey
  Computer Science, Message Passing, Fault Tolerant

- by Joel Emer
  Computer Science, Computer Architecture, Parallel Computing, Architecture

- by Jose Renau
  Computer Science, Parallel Computing, Computer Hardware, Speculation

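The significance of conducting a "monozukuri (manufacturing) analysis" of architecture. Part 1: Architecture seen from the perspective of manufacturing management (Buildings and the analysis of "manufacturing in the broad sense"; The establishment of the Japanese-style building production system and its strengths and weaknesses: the formation and challenges of an integral architecture centered on general contractors; Value creation in architecture: realizing the "functions" required in architectural design and building construction; From product to service). Part 2... more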

- by 隆宏藤本
  Computer Science, Computer Architecture, Architecture

To obtain the benefits of aggressive, wide-issue, architectures, a large window of valid instructions must be available. While researchers have been successful in obtaining high accuracies with a range of dynamic branch predictors, there... more

- by David Kaeli
  Computer Science, Predictability

The notion of permissiveness in Transactional Memory (TM) means aborting a transaction only when it cannot be accepted in any history that guarantees the correctness criterion. This property is neglected by most TMs, which, in order... more

- by Paolo Romano
  Computer Science, Transaction Processing, Correctness, Permissiveness

In this paper, we present parallel algorithms for lossless data compression based on the Burrows-Wheeler Transform (BWT) block-sorting technique. We investigate the performance of using data parallelism and task parallelism for both... more
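
For readers unfamiliar with the Burrows-Wheeler Transform named in the abstract above, here is a minimal sequential sketch of the transform and its inverse. It is a naive textbook formulation that sorts all rotations, not the parallel algorithms the paper presents. The function names and the `$` sentinel are assumptions made for the example.

```python
def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler Transform of text (naive rotation sort)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Sort every rotation of s and keep the last character of each one.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Invert bwt() by repeatedly prepending the last column and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

For instance, `bwt("banana")` yields `"annb$aa"`, which groups repeated characters together and is what makes the block easier to compress in later stages. `inverse_bwt` recovers the original string.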

This paper proposes a new runtime parallelization technique, based on a dynamic optimization framework, to automatically parallelize single-threaded legacy programs. It heavily leverages the optimistic concurrency of transactional memory.... more

- by Dean Tullsen
  Computer Science, Parallel Computing, Compilers, Dynamic Optimization

This research demonstrates that coming support for hardware transactional memory can be leveraged to significantly reduce the cost of implementing true speculative multithreading. In particular, it explores the path from eager conflict... more

Dependence-aware transactional memory (DATM) is a recently proposed model for increasing concurrency of memory transactions without complicating their interface. DATM manages dependences between conflicting, uncommitted transactions so... more

This paper describes a generalisation of modulo scheduling to parallelise loops for SpMT processors that exploits simultaneously both instruction-level parallelism and thread-level parallelism while preserving the simplicity and... more

Efforts to improve performance and reduce execution time began half a century ago. Along with the development of new chipsets and microprocessors with increased clock speeds, software engineers have also developed ways to... more

Implicit Parallelism with Ordered Transactions (IPOT) is an extension of sequential or explicitly parallel programming models to support speculative parallelization. The key idea is to specify opportunities for parallelization in a... more

Chip Multiprocessors (CMP) with Thread-Level Speculation (TLS) have become the subject of intense research. However, TLS is suspected of being too energy inefficient to compete against conventional processors. In this paper, we refute... more

- by Luis Ceze
  Computer Science, Performance, Energy Consumption, Speculation

Chip Multiprocessors (CMPs) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism,... more

- by Luis Ceze
  Computer Science, Parallel Computing, Resource Allocation, Compiler

As multi-core architectures with Thread-Level Speculation (TLS) are becoming better understood, it is important to focus on TLS compilation. TLS compilers are interesting in that, while they do not need to fully prove the independence of... more

- by Luis Ceze
  Computer Science, Compiler, Processor Architecture, Thread Level Speculation

As Thread-Level Speculation (TLS) architectures are becoming better understood, it is important to focus on the role of TLS compilers. In systems where tasks are generated in software, the compiler often has a major performance impact:... more

- by Luis Ceze
  Computer Science, Parallel Computing, Compiler, Thread Level Speculation

Transactional Memory (TM), Thread-Level Speculation (TLS), and Checkpointed multiprocessors are three popular architectural techniques based on the execution of multiple, cooperating speculative threads. In these environments, correctly... more

When supported in silicon, transactional memory (TM) promises to become a fast, simple and scalable parallel programming paradigm for future shared memory multiprocessor systems. Among the multitude of hardware TM design points and... more

- by Anurag Negi
  Computer Science, Conflict Resolution, Transactional Memory, Concurrency
