IRAM
Justin K George
Introduction
• One of the biggest performance challenges in computer systems today is the speed mismatch between microprocessors and memory.
• In current practice, microprocessors and DRAMs are fabricated as separate chips on separate fab lines. Merging the processor and the DRAM onto a single chip gives us IRAM.
• IRAM is attractive today for several reasons:
1) The gap between processor and DRAM speed is growing at about 50% per year.
2) Upcoming DRAM generations have enough capacity that whole programs and data sets can fit on a single chip.
3) DRAM dies have grown about 50% each generation.
- DRAMs are being made with more metal layers to accelerate the longer lines of these larger chips.
- Also, the high-speed interface of synchronous DRAM will require fast transistors on the DRAM chip. These two DRAM trends should bring logic on a DRAM process closer to the speed of logic on logic fabs than in the past.
• System architects have attempted to bridge the processor-memory performance gap by introducing deeper and deeper cache hierarchies; unfortunately, this makes memory latency even longer in the worst case.
• SRAM also has the following disadvantages:
- It is more expensive
- It takes up a lot more space
- It requires more power
- It dissipates more heat
• To address this challenge, designers merge the DRAM and the processor onto a single chip, since most of the transistors on this merged chip will be devoted to memory.
IRAM
• Given the growing processor-memory performance gap and the awkwardness of high-capacity DRAM chips, we believe it is time to consider unifying logic and DRAM; such a chip is called "IRAM", standing for Intelligent RAM.
• One more reason to put the processor in DRAM, rather than increasing on-processor SRAM, is that DRAM is in practice approximately 25 times denser than SRAM. Thus IRAM enables a much larger amount of on-chip memory than a conventional architecture: for example, the area that holds 1 MB of SRAM could instead hold roughly 25 MB of DRAM.
• Intelligent RAM, or IRAM, merges processor and memory into a single chip.
• The goal of Intelligent RAM (IRAM) is to design a cost-effective computer by building the processor in a memory fabrication process, instead of a conventional logic fabrication process, and including memory on chip.
• System-on-a-chip integration means the chip can decide how to:
- Notify the processor of I/O events
- Keep caches coherent
- Update memory
• An on-chip DRAM array has much higher capacity than an SRAM array of the same area.
• IRAM is a research model for the next generation of DRAM.
2) Lower Latency.
To reduce latency:
• The wire length should be kept as short as possible.
• In addition, the DRAM cells furthest from the processor will be slower than the closest ones.
Being on the same chip as the DRAM, the processor also avoids:
• driving the off-chip wires,
• potentially turning around the data bus, and
• accessing an external memory controller.
In summary, the access latency of an IRAM processor is lower than that of a standard DRAM part. Much lower latency can be obtained through intelligent floorplanning, faster circuit topologies, and redesigned address/data bussing schemes.
These first two points (bandwidth and latency) suggest IRAM offers performance opportunities for two types of applications:
1. Applications with predictable memory accesses, such as matrix manipulations, may take advantage of the potential 50X to 100X increase in IRAM bandwidth.
2. Applications with unpredictable memory accesses and very large memory "footprints", such as databases, may take advantage of the potential 5X to 10X decrease in IRAM latency.
3) Energy Efficiency
• IRAM offers the potential to improve the energy consumption of the memory system. DRAM is much denser than SRAM, which is traditionally used for on-chip memory; therefore an IRAM will have far fewer external memory accesses, which consume a great deal of energy driving high-capacitance off-chip buses. Even on-chip accesses will be more energy efficient, since DRAM consumes less energy than SRAM. Finally, an IRAM has the potential for higher performance than a conventional approach; since higher performance at a fixed energy budget can be traded for equal performance at lower energy, the performance advantages of IRAM can also be translated into lower energy consumption.
3) Energy Efficiency (contd.)
• IRAM reduces the frequency of accesses to lower levels of the memory hierarchy, which require more energy.
• IRAM reduces the energy needed to access each level of the memory hierarchy.
• Consequently, IRAM reduces the average energy per instruction:
Energy per memory access = AE_L1 + MR_L1 * (AE_L2 + MR_L2 * AE_offchip)
where AE = access energy and MR = miss rate (a worked sketch of this model follows below).
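A minimal numeric sketch of this model, in Python, using assumed (illustrative) access energies and miss rates rather than measured values:

# Illustrative sketch of the energy-per-memory-access model above.
# All access energies (in nJ) and miss rates are assumed example values,
# not measurements from any particular IRAM or conventional design.

def energy_per_access(ae_l1, ae_l2, ae_offchip, mr_l1, mr_l2):
    # AE_L1 + MR_L1 * (AE_L2 + MR_L2 * AE_offchip)
    return ae_l1 + mr_l1 * (ae_l2 + mr_l2 * ae_offchip)

# Conventional system: SRAM L2 on chip, expensive off-chip DRAM accesses.
conventional = energy_per_access(ae_l1=0.5, ae_l2=2.0, ae_offchip=50.0,
                                 mr_l1=0.05, mr_l2=0.25)

# IRAM-style system: a much larger on-chip DRAM L2 lowers the L2 miss
# rate, and the misses that remain stay on chip, shrinking the last term.
iram = energy_per_access(ae_l1=0.5, ae_l2=1.5, ae_offchip=5.0,
                         mr_l1=0.05, mr_l2=0.05)

print(f"conventional: {conventional:.3f} nJ per access")
print(f"IRAM-style:   {iram:.3f} nJ per access")

The reduction comes mainly from the last term: IRAM lowers both MR_L2 (larger on-chip memory) and AE_offchip (fewer trips over the off-chip bus).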
4) Memory Size and Width.
• A fourth attraction of IRAM is the ability to adjust both the size and width of the on-chip DRAM. Rather than being limited to powers of 2 in length or width, as conventional DRAM is, IRAM designers can specify exactly the number of words and their width. This flexibility can improve the cost of IRAM solutions versus memories made from conventional DRAMs.
5) Board Space.
• No need for parallel DRAMs, a memory controller, a bus to turn around, SIMM modules, or extra pins.
• Finally, IRAM may be attractive in applications where board area is precious, such as cellular phones or portable computers, since it integrates several chips into one.
Challenges
• 1) Speed and area of logic in a DRAM process: Discussions with experts in circuit design and process have suggested that the area cost might be 30% to 70%, and the speed cost today might be 30% to 100%.
• 2) Area and power impact of increasing bandwidth to the DRAM core: Standard DRAM cores are designed with few, highly multiplexed I/O lines to reduce area and power. To make effective use of a DRAM core's internal bandwidth, we will need to add more I/O lines. The area increase will affect the cost per bit of IRAM.
• 3) Retention time of the DRAM core when operating at high temperatures: Retention time roughly halves for every increase of 10 degrees centigrade; thus, refresh rates could rise dramatically if the IRAM is run at the temperature of some microprocessors (see the sketch after this list).
• 4) Scaling a system beyond a single IRAM: Even though a gigabit DRAM contains 128 Mbytes, there will certainly be systems needing more memory. A major architectural challenge is quantifying the pros and cons of the several potential solutions.
• 5) Matching IRAM to the commodity focus of the DRAM industry: Today's DRAMs are second-sourced, interchangeable commodities, which allows them to be manufactured in high volumes. Unless a single processor architecture were adopted, adding a processor would stratify IRAMs and effectively reduce interchangeability.
• 6) Testing IRAM: The cost of testing during manufacturing is significant for DRAMs. Adding a processor would significantly increase test time on conventional DRAM testers.
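A minimal sketch of challenge 3, assuming only what the slide states: retention time roughly halves for every 10 degree rise, so the required refresh rate roughly doubles. The 64 ms baseline at 45 degrees is an assumed illustrative figure, not a datasheet value.

# Illustrative model of DRAM retention vs. temperature: retention time
# halves per 10 degree C increase, so refresh must happen twice as often.
# The baseline (64 ms retention at 45 C) is an assumed value.

def retention_time_ms(temp_c, base_ms=64.0, base_temp_c=45.0):
    return base_ms * 2.0 ** (-(temp_c - base_temp_c) / 10.0)

for temp_c in (45, 65, 85):  # DRAM-rated temp vs. hotter microprocessor temps
    rt = retention_time_ms(temp_c)
    print(f"{temp_c} C: retention ~{rt:5.1f} ms, "
          f"full refresh needed ~{1000.0 / rt:6.1f} times per second")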
Applications
• 1) Accelerators. This category includes some logic on chip to make a DRAM run well for a restricted application. Most of these efforts have targeted graphics, where logic is included with memory to be used as the frame buffer. The best-known example is Video DRAM. Other examples are Mitsubishi's 3D-RAM, which includes a portion of the Z-buffer logic with 10 Mbits of DRAM to speed up 3D graphics, and NeoMagic's graphics accelerator for portable PCs. A non-graphics example is an L2 cache that uses DRAM to increase its size.
• 2) Uniprocessors. This category combines a processor with on-chip DRAM. Such a part might be attractive because of high performance, good power-performance, good cost-performance, or a combination of all three.
• 3) Multiprocessors. This category includes IRAMs intended exclusively to be used as building blocks in a multiprocessor, IRAMs that include a MIMD (Multiple Instruction streams, Multiple Data streams) multiprocessor within a single chip, and IRAMs that include a SIMD (Single Instruction stream, Multiple Data streams) multiprocessor, or array processor, within a single chip. This category is the most popular research area for IRAMs.
Future predictions
• Multiprocessors on a chip
• More compiler information migrating to the processor
• More integration of system components onto the processor
• Limited fine-grained explicit parallelism
• Dealing with wire delay domination
Conclusion
Merging a microprocessor and DRAM on the same chip presents opportunities in performance:
• a factor of 5 to 10 reduction in latency,
• a factor of 50 to 100 increase in bandwidth,
• a factor of 2 to 4 advantage in energy efficiency, and
• an unquantified cost savings from removing superfluous memory and reducing board area.
Query time
Thank you
Frequency of Accesses
• An on-chip DRAM array has much higher capacity than an SRAM array of the same area.
• IRAM reduces the frequency of accesses to lower levels of the memory hierarchy, which require more energy.
• On-chip DRAM organized as an L2 cache has lower off-chip miss rates than an SRAM L2 of the same area, reducing the off-chip energy penalty.
• When the entire main memory array is on chip, the high off-chip energy cost is avoided entirely.
Energy of Accesses
• IRAM reduces the energy needed to access each level of the memory hierarchy.
• On-chip memory accesses use less energy than off-chip accesses by avoiding the high-capacitance off-chip bus.
• The multiplexed address scheme of conventional DRAMs selects a larger number of DRAM arrays than necessary.
• The narrow pin interface of an external DRAM wastes energy in the multiple column cycles needed to fill an entire cache block (see the sketch below).
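A minimal sketch of the last two bullets, with assumed bus widths, cache block size, and per-cycle energies (illustrative figures, not measurements):

# Illustrative comparison: column cycles (and energy) needed to fill one
# cache block over a narrow external DRAM interface vs. a wide on-chip
# IRAM interface. All widths and per-cycle energies are assumed values.

BLOCK_BYTES = 32  # cache block size (assumed)

def cycles_to_fill(block_bytes, bus_bytes_per_cycle):
    return -(-block_bytes // bus_bytes_per_cycle)  # ceiling division

external_cycles = cycles_to_fill(BLOCK_BYTES, bus_bytes_per_cycle=2)   # 16-bit DRAM part
onchip_cycles   = cycles_to_fill(BLOCK_BYTES, bus_bytes_per_cycle=32)  # wide internal row access

# Assumed energy per transfer cycle: off-chip cycles drive a
# high-capacitance bus, on-chip cycles do not.
external_energy = external_cycles * 5.0   # nJ per off-chip cycle (assumed)
onchip_energy   = onchip_cycles * 0.5     # nJ per on-chip cycle (assumed)

print(f"external: {external_cycles} cycles, ~{external_energy:.1f} nJ per block fill")
print(f"on-chip:  {onchip_cycles} cycles, ~{onchip_energy:.1f} nJ per block fill")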