Papers by Georgios Keramidas
Alexandre Amory Andre Reis Ankur Gupta Antonis Papanikolaou Bruno Allard Christian Weis Christina... more Alexandre Amory Andre Reis Ankur Gupta Antonis Papanikolaou Bruno Allard Christian Weis Christina Gimmler-Dumont Christoforos Kachris Colin Stevens Da-Chen Tzeng Dipankar Saha Egas Neto Elias Kougianos Elias Kougianos Faizal Arya Samman Fazal Hameed François Philipp Geng Zheng Georgios Keramidas Guihai Yan Harry Sidiropoulos Hua-Hsin Yeh Hussam Amrouch Jan Heisswolf Jose Costa Josef Schneider Jude Ambrose Julian José Hilgemberg Pontes Kaushal Gandhi Kele Shen Krishnan Sundaresan Mahtab Niknahad
Lecture Notes in Computer Science, 2007
In this paper, we propose a novel approach to reduce dynamic power in set-associative caches that... more In this paper, we propose a novel approach to reduce dynamic power in set-associative caches that leverages on a leakage-saving proposal, namely Cache Decay. We thus open the possibility to unify dynamic and leakage management in the same framework. The main intuition is that in a decaying cache, dead lines in a set need not be searched. Thus, rather than trying to predict which cache way holds a specific line, we predict, for each way, whether the line could be live in it. We access all the ways that possibly contain the live line and we call this way-selection. In contrast to way-prediction, way-selection cannot be wrong: the line is either in the selected ways or not in the cache. The important implication is that we have a fixed hit timeindispensable for both performance and ease-of-implementation reasons. In order to achieve high accuracy, in terms of total ways accessed, we use Decaying Bloom filters to track only the live lines in waysdead lines are automatically purged. We offer efficient implementations of such autonomously Decaying Bloom filters, using novel quasi-static cells. Our prediction approach grants us high-accuracy in narrowing the choice of ways for hits as well as the ability to predict missesa known weakness of way-prediction. ⋆ This work has been conducted while Polychronis Xekalakis was studying at the University of Patras, Greece.
System-Level Design Methodologies for Telecommunication, 2013
Proceedings of the 2005 international symposium on Low power electronics and design - ISLPED '05, 2005
Leakage power reduction in cache memories continues to be a critical area of research because of ... more Leakage power reduction in cache memories continues to be a critical area of research because of the promise of a significant pay-off. Various techniques have been developed so far that can be broadly categorized into state-preserving (e.g., Drowsy Caches) and non-state preserving (e.g., Cache Decay). Decay saves more leakage but also incurs dynamic power overhead in the form of induced misses. Previous work has shown that depending on the leakage vs. dynamic power trade-off, one or the other technique can be better. Several factors such as cache architecture, technology parameters and temperature, affect this trade-off. Our work proposes the first mechanism -to the best of our knowledge-that takes into account temperature in adjusting the leakage control policy at run time. At very low temperatures, leakage is relatively weak so the need to tightly control it is not as important as the need to minimize extra dynamic power (e.g., decay-induced misses) or performance loss. We use a hybrid decay+drowsy policy where the main benefit comes from decaying cache lines while the drowsy mode is used to save leakage in long decay intervals. To adapt the decay mode to temperature, we propose a simple triggering mechanism that is based on the principles of decaying 4T thermal sensors and, as such, tied to temperature. The hotter the cache is, the faster cache lines are decayed since it is beneficial to do so with very high leakage currents.Conversely, when the cache temperature is low, our mechanism defers putting cache lines in decay mode to avoid dynamic power overhead but still saves a significant amount of leakage using the drowsy mode. Our study shows that across a wide range of temperatures, the simple adaptability of our proposal yields consistently better results than either the decay mode, or drowsy mode alone, improving over the best by as much as 33%.
This paper presents a new cache replacement policy based on Instruction-based Reuse Distance Pred... more This paper presents a new cache replacement policy based on Instruction-based Reuse Distance Prediction (IbRDP) Replacement Policy originally proposed by Keramidas, Petoumenos, and Kaxiras [5] and further optimized by Petoumenos et al. [6]. In these works we have proven that there is a strong correlation between the temporal characteristics of the cache blocks and the access patterns of instructions (PCs) that touch these cache blocks. Based on this observation we introduced a new class of instruction-based predictors which are able to directly predict with high accuracy at run-time when a cache block is going to be accessed in the future, a.k.a. the reuse distance of a cache block. Being able to predict the reuse distances of the cache blocks permits us to make near-optimal replacement decisions by "looking into the future." In this work, we employ an extension of the IbRDP Replacement policy [6]. We carefully re-design the organization as well as the functionality of the predictor and the corresponding replacement algorithm in order to fit into the tight area budget provided by the CRC committee [3]. Since our proposal naturally supports the ability to victimize the currently fetched blocks by not caching them at all in the cache (Selective Caching), we submit for evaluation two versions: the base-IbRDP and the IbRDP enhanced with Selective Caching (IbRDP+SC).
Lecture Notes in Computer Science, 2006
Data cache compression is actively studied as a venue to make better use of onchip transistors, i... more Data cache compression is actively studied as a venue to make better use of onchip transistors, increase apparent capacity of caches, and hide the long memory latencies. While several techniques have been proposed for L2 compression, L1 compression is an elusive goal. This is due to L1's sensitivity to latency and the inability to create compression schemes that are both fast and adaptable to program behavior, i.e. dynamic. In this paper, we propose the first dynamic dictionary-based compression mechanism for L1 data caches. Our design solves the problem of keeping the compressed contents of the cache and the dictionary entries consistent, using a timekeeping decay technique. A dynamic compression dictionary adapts to program behavior without the need of profiling techniques and/or training phases. We compare our approach to previously proposed static dictionary techniques and we show that we surpass them in terms of power, hit ratio and energy delay product.
Lecture Notes in Computer Science, 2009
In this paper, we propose a novel approach to reduce dynamic power in set-associative caches that... more In this paper, we propose a novel approach to reduce dynamic power in set-associative caches that leverages on a leakage-saving proposal, namely Cache Decay. We thus open the possibility to unify dynamic and leakage management in the same framework. The main intuition is that in a decaying cache, dead lines in a set need not be searched. Thus, rather than trying to predict which cache way holds a specific line, we predict, for each way, whether the line could be live in it. We access all the ways that possibly contain the live line and we call this way selection. In contrast to way-prediction, way-selection cannot be wrong: the line is either in the selected ways or not in the cache. The important implication is that we have a fixed hit timeindispensable for both performance and ease-of-implementation reasons. One would expect way-selection to be inferior to sophisticated way-prediction in terms of the total ways accessed, but in fact it can even do better. To achieve this level of accuracy we use Decaying Bloom filters to track only the live lines in waysdead lines are automatically purged. We offer efficient implementations of such autonomously Decaying Bloom filters, using novel quasi-static cells. Our prediction approach affords us high-accuracy in narrowing the choice of ways for hits as well as the ability to predict missesa known weakness of way-predictionthus outperforming sophisticated way-prediction. Furthermore, our approach scales significantly better than way-prediction to higher associativity. We show that decay is a necessary component in this approachway-selection and Bloom filters alone cannot compete with sophisticated way-prediction. We compare our approach to Multi-MRU and we show that without even considering leakage savingswe surpass it terms of relative power savings and in relative energy-delay in 4-way (9%) and more so in 8-way (20%) and 16-way caches (31%).
Proceedings of the 7th ACM international conference on Computing frontiers - CF '10, 2010
and AMD Phenom II processors. By exploiting slack, inherent in memory-bound programs, our approac... more and AMD Phenom II processors. By exploiting slack, inherent in memory-bound programs, our approach aims to improve power efficiency even when the processor does not sit idle. Our underlying methodology is based on a simple first-order processor performance model in which frequency scaling is expressed as a change (in cycles) of the main memory latency. Utilizing available performance monitoring hardware, we show that our model is powerful enough to i) predict with reasonable accuracy the effect of frequency scaling (in terms of performance loss), and ii) predict the energy consumed by the core under different V/f combinations. To validate our approach we perform high-accuracy, fine-grain, power measurements directly on the off-chip voltage regulators. We use our model to implement various DVFS policies as Linux “green ” governors to continuously optimize for various powerefficiency metrics such as EDP (Energy-Delay Product) or ED 2 P (Energy-Delay-Square Product), or achieve energy ...
Lecture Notes in Computer Science, 2006
Abstract. Denial-of-Service (DoS) attacks try to exhaust some shared resources (eg process tables... more Abstract. Denial-of-Service (DoS) attacks try to exhaust some shared resources (eg process tables, functional units) of a service-centric provider. As Chip Multi-Processors (CMPs) are becoming mainstream architecture for server class processors, the need to manage on-chip ...
2009 International Symposium on Systems, Architectures, Modeling, and Simulation, 2009
The effect of caching is fully determined by the program locality or the data reuse and several c... more The effect of caching is fully determined by the program locality or the data reuse and several cache management techniques try to base their decisions on the prediction of temporal locality in programs. However, prior work reports only rough techniques which either try to predict when a cache block loses its temporal locality or try to categorize cache items as highly or poorly temporal. In this work, we quantify the temporal characteristics of the cache block at run time by predicting the cache block reuse distances (measured in intervening cache accesses), based on the access patterns of the instructions (PCs) that touch the cache blocks. We show that an instruction-based reused distance predictor is very accurate and allows approximation of optimal replacement decisions, since we can "see" the future. We experimentally evaluate our prediction scheme in various sizes L2 caches using a subset of the most memory intensive SPEC2000 benchmarks. Our proposal obtains a significant improvement in terms of IPC over traditional LRU up to 130.6% (17.2% on average) and it also outperforms the previous state of the art proposal (namely Dynamic Insertion Policy or DIP) by up to 80.7% (15.8% on average).
2007 25th International Conference on Computer Design, 2007
Several cache management techniques have been proposed that indirectly try to base their decision... more Several cache management techniques have been proposed that indirectly try to base their decisions on cacheline reuse-distance, like Cache Decay which is a postdiction of reuse-distances: if a cacheline has not been accessed for some "decay interval" we know that its reuse-distance is at least as large as this decay interval. In this work, we propose to directly predict reuse-distances via instruction-based (PC) prediction and use this information for cache level optimizations. In this paper, we choose as our target for optimization the replacement policy of the L2 cache, because the gap between the LRU and the theoretical optimal replacement algorithm is comparatively large for L2 caches. This indicates that, in many situations, there is ample room for improvement. We evaluate our reusedistance based replacement policy using a subset of the most memory intensive SPEC2000 and our results show significant benefits across the board.
Uploads
Papers by Georgios Keramidas