Thread Level Speculation
Recent papers in Thread Level Speculation
Heterogeneous systems that integrate a multicore CPU and a GPU on the same die are ubiquitous. On these systems, both the CPU and GPU share the same physical memory as opposed to using separate memory dies. Although integration eliminates... more
This research investigates the feasibility of developing low-cost, personal computer-based parallel processing procedures that can be applicable to real time simulation of freeway flows. Specific objectives include, * Development of a... more
Speculative parallelization is a technique that tries to extract parallelism from loops that cannot be parallelized at compile time. The underlying idea is to optimistically execute the code in parallel, while a subsystem checks that... more
With speculative parallelization, code sections that cannot be fully analyzed by the compiler are aggressively executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and memory system.... more
Speculative parallelization techniques make it possible to extract parallelism from code fragments that cannot be analyzed at compile time. However, research on software-based, thread-level speculation will greatly benefit from an appropriate... more
Software-based, thread-level speculation (TLS) systems allow the parallel execution of loops that cannot be analyzed at compile time. TLS systems optimistically assume that the loop is parallelizable, and augment the original code with... more
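The optimistic execution described in the abstracts above can be illustrated with a small, single-threaded simulation. This is only a sketch under stated assumptions, not the mechanism of any of the listed papers: the names `execute_speculatively`, `speculative_loop`, and `window` are hypothetical, iterations are run one after another against a shared snapshot rather than on real threads, and the only check performed is for read-after-write violations at in-order commit time.

```python
def execute_speculatively(i, body, mem):
    """Run iteration i against committed memory, buffering its writes privately."""
    reads, writes = set(), {}

    def load(addr):
        # A value this iteration wrote itself is not an exposed read.
        if addr in writes:
            return writes[addr]
        reads.add(addr)
        return mem[addr]

    def store(addr, value):
        # Writes stay in a private buffer until commit.
        writes[addr] = value

    body(i, load, store)
    return reads, writes


def speculative_loop(data, body, n, window=4):
    """Simulate speculative execution of `for i in range(n): body(i, ...)`.

    Iterations are grouped into windows (an assumption of this sketch).
    All iterations of a window run against the same memory snapshot, then
    commit in iteration order; an iteration that read a location written
    by an earlier iteration of its window is squashed and re-executed.
    """
    mem = list(data)
    squashes = 0
    for start in range(0, n, window):
        batch = range(start, min(start + window, n))
        # Speculative phase: memory is not modified here.
        results = [execute_speculatively(i, body, mem) for i in batch]
        # Commit phase: in order, so sequential semantics are preserved.
        written = set()
        for i, (reads, writes) in zip(batch, results):
            if reads & written:
                # Dependence violation: redo the iteration on committed state.
                squashes += 1
                reads, writes = execute_speculatively(i, body, mem)
            for addr, value in writes.items():
                mem[addr] = value
            written |= set(writes)
    return mem, squashes


# Usage: a loop whose dependences hide behind an index array, so they
# cannot be ruled out at compile time.
idx = [0, 1, 1, 3, 4, 4, 6, 7]

def body(i, load, store):
    store(idx[i], load(idx[i]) + i + 1)

result, squashes = speculative_loop([0] * 8, body, 8, window=4)
print(result, squashes)
```

In this sketch the result always matches a plain sequential run, because a squashed iteration is re-executed only after all of its predecessors have committed; the cost of speculation shows up as the number of squashes.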
Robustness is a key issue in any runtime system that aims to speed up the execution of a program. However, robustness considerations are commonly overlooked when new software-based, thread-level speculation (STLS) systems are proposed.... more
With speculative parallelization, code sections that cannot be fully analyzed by the compiler are optimistically executed in parallel. Hardware schemes are fast but expensive and require modifications to the processors and/or memory... more
Some emerging technologies try to exploit the parallel capabilities of modern processors.
With the rapid expansion of process mining implementation in global enterprises distributed across numerous branches, there is a critical requirement to develop an application qualified for real-time operation with fast and precise data... more
With the advent of parallel architectures, distributed programs are used intensively and the question of how to formally specify the behaviors expected from such programs becomes crucial. A very general way to specify concurrent objects... more
This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor (DMP). The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by... more
Lazy hardware transactional memory has been shown to be more efficient at extracting available concurrency than its eager counterpart. However, it poses scalability challenges at commit time as the existence of conflicts among concurrent... more
Thread-Level Speculation (TLS) facilitates the extraction of parallel threads from sequential applications. Most prior work has focused on developing the compiler and architecture for this execution paradigm. Such studies often narrowly... more
Work by Hill and Wood was performed while consulting for AMD Research. Work by Hechtman was performed while on internship at AMD Research.
Effective execution of atomic blocks of instructions (also called transactions) can enhance the performance and programmability of multiprocessors. Atomic blocks can be demarcated in software as in Transactional Memory (TM) or dynamically... more
In shared-memory multicore architectures, handling a cache write operation is more complicated than in single-processor systems. A cache line may be present in more than one private L1 cache. Any cache willing to write this line must... more
The VLIW model describes a philosophy whereby the compiler organizes several nondependent machine operations into the same instruction word. Some features of this form of architecture are illustrated and certain strategies on presenting... more
Hardware transactional memory (HTM) systems have been studied extensively along the dimensions of speculative versioning and contention management policies. The relative performance of several design policies has been discussed at length... more
Scheduling for speculative parallelization is a problem that remained unsolved despite its importance. Simple methods such as Fixed-Size Chunking (FSC) need several 'dry-runs' before an acceptable chunk size is found. Other traditional... more
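As a point of reference for the Fixed-Size Chunking (FSC) mentioned above, the following sketch shows only the basic idea of cutting a loop's iteration space into equal-sized blocks. The function name `fixed_size_chunks` and its parameters are hypothetical, and the sketch says nothing about how the listed paper selects or tunes the chunk size.

```python
def fixed_size_chunks(n_iterations, chunk_size):
    """Split range(n_iterations) into half-open (start, end) chunks.

    Every chunk holds chunk_size iterations, except possibly the last.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, n_iterations))
        for start in range(0, n_iterations, chunk_size)
    ]


# Usage: 10 iterations in chunks of 4.
print(fixed_size_chunks(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Because the chunk size is a single fixed number, a poor choice affects the whole loop, which is consistent with the abstract's remark that several dry runs may be needed before an acceptable size is found.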
CEFOS is an operating system based on a continuation-based zero-wait thread model derived from a data-flow computing model. A program consists of zero-wait threads, each of which runs to completion without suspension once started.... more
In parallel processing, fine-grain parallel processing is quite an effective solution to the latency problem caused by remote memory accesses and remote procedure calls. We have proposed a processor architecture, called Datarol-II, that... more
Software developers often face challenges in terms of quality and productivity to match competitive costs. The software industry seeks options to minimize this cost during different phases of software development and maintenance with... more
Transactions are a simple and powerful mechanism for establishing fault-tolerance. To allow multiple processes to cooperate in a transaction we relax the isolation property and use message passing for communication. We call the new... more
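The significance of carrying out a "monozukuri (manufacturing) analysis" of architecture. Part 1: Architecture seen from manufacturing management studies (Buildings and the analysis of "manufacturing in the broad sense"; The establishment of the Japanese-style building production system and its strengths and weaknesses: the formation and challenges of an integral architecture centered on general contractors; Value creation in architecture: realizing the "functions" required in architectural design and construction; From product to service). Part 2... more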
To obtain the benefits of aggressive, wide-issue, architectures, a large window of valid instructions must be available. While researchers have been successful in obtaining high accuracies with a range of dynamic branch predictors, there... more
The notion of permissiveness in Transactional Memory (TM) translates to only aborting a transaction when it cannot be accepted in any history that guarantees the correctness criterion. This property is neglected by most TMs, which, in order... more
In this paper, we present parallel algorithms for lossless data compression based on the Burrows-Wheeler Transform (BWT) block-sorting technique. We investigate the performance of using data parallelism and task parallelism for both... more
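For readers unfamiliar with the Burrows-Wheeler Transform named above, a naive sequential version can be sketched as follows. It is an illustration only and is not the parallel algorithm of the listed paper: the function name `bwt_naive` and the `"$"` end-of-text sentinel are assumptions of this sketch, and sorting all rotations is far slower than the suffix-sorting methods used in practice.

```python
def bwt_naive(text, sentinel="$"):
    """Return the Burrows-Wheeler Transform of text.

    The sentinel must not occur in text; it marks the end of the input
    so that the transform can later be inverted.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every cyclic rotation of the string, then sort them.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


# Usage
print(bwt_naive("banana"))  # annb$aa
```

The output groups equal characters together, which is what makes the block easier to compress in later stages.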
This paper proposes a new runtime parallelization technique, based on a dynamic optimization framework, to automatically parallelize single-threaded legacy programs. It heavily leverages the optimistic concurrency of transactional memory.... more
This research demonstrates that coming support for hardware transactional memory can be leveraged to significantly reduce the cost of implementing true speculative multithreading. In particular, it explores the path from eager conflict... more
Dependence-aware transactional memory (DATM) is a recently proposed model for increasing concurrency of memory transactions without complicating their interface. DATM manages dependences between conflicting, uncommitted transactions so... more
This paper describes a generalisation of modulo scheduling to parallelise loops for SpMT processors that exploits simultaneously both instruction-level parallelism and thread-level parallelism while preserving the simplicity and... more
Implicit Parallelism with Ordered Transactions (IPOT) is an extension of sequential or explicitly parallel programming models to support speculative parallelization. The key idea is to specify opportunities for parallelization in a... more
Chip Multiprocessors (CMP) with Thread-Level Speculation (TLS) have become the subject of intense research. However, TLS is suspected of being too energy inefficient to compete against conventional processors. In this paper, we refute... more
Chip Multiprocessors (CMPs) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism,... more
As multi-core architectures with Thread-Level Speculation (TLS) are becoming better understood, it is important to focus on TLS compilation. TLS compilers are interesting in that, while they do not need to fully prove the independence of... more
As Thread-Level Speculation (TLS) architectures are becoming better understood, it is important to focus on the role of TLS compilers. In systems where tasks are generated in software, the compiler often has a major performance impact:... more
Transactional Memory (TM), Thread-Level Speculation (TLS), and Checkpointed multiprocessors are three popular architectural techniques based on the execution of multiple, cooperating speculative threads. In these environments, correctly... more
When supported in silicon, transactional memory (TM) promises to become a fast, simple and scalable parallel programming paradigm for future shared memory multiprocessor systems. Among the multitude of hardware TM design points and... more