IRAM


Prepared by Justin K George

Introduction
- One of the biggest performance challenges in computer systems today is the speed mismatch between microprocessors and memory.
- In current practice, microprocessors and DRAMs are fabricated as different chips on different fab lines. Merging the processor and the DRAM onto a single chip gives what we call an IRAM.
- IRAM is attractive today for several reasons:
1) The gap between processor and DRAM speed is growing at 50% per year;

Processor-Memory Performance Gap

2) The actual processor occupies only about one-third of the die, and the upcoming DRAM has enough capacity that whole programs and data sets can fit on a single chip.
3) DRAM dies have grown about 50% each generation:
- DRAMs are being made with more metal layers to accelerate the longer lines of these larger chips.
- Also, the high-speed interface of synchronous DRAM will require fast transistors on the DRAM chip.
These two DRAM trends should make logic on DRAM closer in speed to logic on logic fabs than in the past.
System architects have attempted to bridge the processor-memory performance gap by introducing deeper and deeper cache memory hierarchies; unfortunately, this makes the memory latency even longer in the worst case.

SRAM also has the following disadvantages:
- It is more expensive
- It takes up a lot more space
- It requires more power
- It dissipates more heat
To address this challenge, processor designers now merge the DRAM onto the processor chip, since most of the transistors on this merged chip will be devoted to memory.

IRAM
- Given the growing processor-memory performance gap and the awkwardness of high-capacity DRAM chips, we believe it is time to consider unifying logic and DRAM; such a chip is called "IRAM", standing for Intelligent RAM.
- One more reason to put the processor in DRAM, rather than increasing the on-processor SRAM, is that DRAM is in practice approximately 25 times denser than SRAM. Thus, IRAM enables a much larger amount of on-chip memory than a conventional architecture.
- Intelligent RAM, or IRAM, merges processor and memory into a single chip.

Serial I/O lines: a natural match for IRAM


Benefits:
- Avoids the large number of pins needed for parallel I/O buses.
- IRAM can sink a high I/O rate without interfering with computation.
How well will serial I/O work for IRAM?
- Serial lines provide high I/O bandwidth for I/O-intensive applications.
- I/O bandwidth is incrementally scalable by adding more lines.
- The number of pins required is still lower than for a parallel bus.

Serial I/O and IRAM


How to overcome the limited memory capacity of a single IRAM?
- SmartSIMM: a collection of IRAMs (and optionally external DRAMs)
- Can leverage high-bandwidth I/O to compensate for limited memory
- In addition to its other strengths, IRAM with serial lines provides high I/O bandwidth

- The goal of Intelligent RAM (IRAM) is to design a cost-effective computer by designing a processor in a memory fabrication process, instead of a conventional logic fabrication process, and including memory on chip.
- System-on-a-chip integration means the chip can decide how to:
  - Notify the processor of I/O events
  - Keep caches coherent
  - Update memory
- An on-chip DRAM array has much higher capacity than an SRAM array of the same area.
- It is a research model for the next generation of DRAM.

POTENTIAL ADVANTAGES OF IRAM


1) Higher Bandwidth. A DRAM naturally has extraordinary internal bandwidth, and an on-chip processor can tap it. The potential bandwidth of a gigabit DRAM is even greater than its logical organization indicates. Since it is important to keep the storage cell small, the normal solution is to limit the length of the bit lines, typically to 256 to 512 bits per sense amp. This quadruples the number of sense amplifiers. To save die area, each block has a small number of I/O lines, which reduces the internal bandwidth by a factor of about 5 to 10 but still meets the external demand. One IRAM goal is to capture a larger fraction of this potential on-chip bandwidth.
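The bandwidth argument above can be sketched with back-of-the-envelope arithmetic. The 1 Gbit capacity, the 512 bits per sense amp, and the 5-10x multiplexing loss come from the slides; the 60 ns row-cycle time is an illustrative assumption:

```python
# Back-of-the-envelope estimate of a gigabit DRAM's internal bandwidth.
CAPACITY_BITS = 2**30        # 1 Gbit DRAM (from the slides)
BITS_PER_SENSE_AMP = 512     # upper end of the 256-512 range above
ROW_CYCLE_NS = 60            # assumed row-access cycle time (illustrative)

# Every sense amp latches one bit per row cycle, chip-wide.
sense_amps = CAPACITY_BITS // BITS_PER_SENSE_AMP
internal_bw_gb_per_s = sense_amps / (ROW_CYCLE_NS * 1e-9) / 8 / 1e9

print(f"sense amplifiers: {sense_amps:,}")
print(f"potential internal bandwidth: ~{internal_bw_gb_per_s:,.0f} GB/s")
# Multiplexed I/O lines cut what leaves the arrays by ~5-10x (from the slides):
print(f"after 10x I/O multiplexing: ~{internal_bw_gb_per_s / 10:,.0f} GB/s")
```

Even after multiplexing, the on-chip figure dwarfs a conventional external memory bus, which is the fraction IRAM aims to capture.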

2) Lower Latency.
To reduce latency:
- The wire length should be kept as short as possible.
- In addition, the DRAM cells furthest from the processor will be slower than the closest ones.
Also, being on the same chip as the DRAM, the processor avoids:
- driving the off-chip wires,
- potentially turning around the data bus, and
- accessing an external memory controller.
In summary, the access latency of an IRAM processor is lower than that of a standard DRAM part. Much lower latency may be obtained by intelligent floor planning, faster circuit topologies, and redesigned address/data bussing schemes.

These first two points suggest IRAM offers performance opportunities for two types of applications:
1. Applications with predictable memory accesses, such as matrix manipulations, may take advantage of the potential 50X to 100X increase in IRAM bandwidth; and
2. Applications with unpredictable memory accesses and very large memory "footprints", such as databases, may take advantage of the potential 5X to 10X decrease in IRAM latency.

3) Energy Efficiency
- IRAM offers the potential to improve the energy consumption of the memory system. DRAM is much denser than SRAM, which is traditionally used for on-chip memory. Therefore, an IRAM will have many fewer external memory accesses, which consume a great deal of energy to drive high-capacitance off-chip buses. Even on-chip accesses will be more energy efficient, since DRAM consumes less energy than SRAM. Finally, an IRAM has the potential for higher performance than a conventional approach; since higher performance at a fixed energy consumption can be traded for equal performance at lower energy, the performance advantages of IRAM can be translated into lower energy consumption.

Contd.
- IRAM reduces the frequency of accesses to lower levels of the memory hierarchy, which require more energy
- IRAM reduces the energy needed to access the various levels of the memory hierarchy
- Consequently, IRAM reduces the average energy per instruction:
Energy per memory access = AE_L1 + MR_L1 * (AE_L2 + MR_L2 * AE_off-chip)
where AE = access energy and MR = miss rate
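As a numeric sketch of this formula, the per-access energies below are taken from the table at the end of this deck, while the miss rates are illustrative assumptions, not figures from the slides:

```python
def avg_energy_per_access(ae_l1, mr_l1, ae_l2, mr_l2, ae_lower):
    """Average energy per memory access:
    AE_L1 + MR_L1 * (AE_L2 + MR_L2 * AE_lower)."""
    return ae_l1 + mr_l1 * (ae_l2 + mr_l2 * ae_lower)

MR_L1, MR_L2 = 0.05, 0.20  # assumed local miss rates, not from the slides

# Per-access energies in nJ, from the table at the end of the deck:
conventional = avg_energy_per_access(0.5, MR_L1, 2.4, MR_L2, 316.0)
iram = avg_energy_per_access(0.5, MR_L1, 1.6, MR_L2, 4.6)
print(f"conventional: {conventional:.3f} nJ/access, IRAM: {iram:.3f} nJ/access")
```

Under these assumed miss rates, the off-chip term dominates the conventional design's average energy, which is exactly the cost IRAM avoids.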

4) Memory Size and Width.


- Another advantage of IRAM over conventional designs is the ability to adjust both the size and width of the on-chip DRAM. Rather than being limited to powers of 2 in length or width, as conventional DRAMs are, IRAM designers can specify exactly the number of words and their width. This flexibility can improve the cost of IRAM solutions versus memories built from conventional DRAMs.

5) Board Space.
- No need for parallel DRAMs, a memory controller, a bus to turn around, SIMM modules, or their pins.
- Finally, IRAM may be attractive in applications where board area is precious, such as cellular phones or portable computers, since it integrates several chips into one.

Potential Disadvantages of IRAM


1) Area and speed of logic in a DRAM process: Discussions with experts in circuit design and process have suggested that the area cost might be 30% to 70%, and the speed cost today might be 30% to 100%.
2) Area and power impact of increasing bandwidth to the DRAM core: Standard DRAM cores are designed with few, highly multiplexed I/O lines to reduce area and power. To make effective use of a DRAM core's internal bandwidth, we will need to add more I/O lines. The area increase will affect the cost per bit of IRAM.
3) Retention time of the DRAM core when operating at high temperatures: Retention time halves for every 10 degrees centigrade increase; thus, refresh rates could rise dramatically if the IRAM is run at the temperature of some microprocessors.
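The retention-time rule in point 3 can be sketched as follows; the 64 ms base retention, the 45 °C base temperature, and the sample temperatures are illustrative assumptions, not figures from the slides:

```python
def retention_ms(temp_c, base_ms=64.0, base_temp_c=45.0):
    """DRAM retention time, halving for every 10 degrees C above base_temp_c.
    The 64 ms / 45 C baseline is an assumed illustrative operating point."""
    return base_ms * 2 ** (-(temp_c - base_temp_c) / 10.0)

for t in (45, 65, 85):  # 85 C is near some microprocessor die temperatures
    print(f"{t} C: retains ~{retention_ms(t):.0f} ms "
          f"-> refresh ~{1000 / retention_ms(t):.1f} times/s")
```

A 40 °C rise shrinks retention by 16x, so the refresh overhead grows just as fast, which is why hot-running IRAMs are a concern.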

4) Scaling a system beyond a single IRAM: Even though a gigabit DRAM contains 128 Mbytes, there will certainly be systems needing more memory. Thus a major architectural challenge is quantifying the pros and cons of the several potential solutions.
5) Matching IRAM to the commodity focus of the DRAM industry: Today's DRAMs are second-sourced commodities that are interchangeable, which allows them to be manufactured in high volumes. Unless a single processor architecture were adopted, adding a processor would stratify IRAMs and effectively reduce interchangeability.
6) Testing IRAM: The cost of testing during manufacturing is significant for DRAMs. Adding a processor would significantly increase the test time on conventional DRAM testers.

Applications
1) Accelerators. This category includes some logic on chip to make a DRAM run well for a restricted application. Most of the efforts have been targeted at graphics, where logic is included with memory to be used as the frame buffer. The best-known example is Video DRAM. Other examples are Mitsubishi's 3D-RAM, which includes a portion of the Z-buffer logic with 10 Mbits of DRAM to speed up 3D graphics, and Neomagic's graphics accelerator for portable PCs. A non-graphics example is an L2 cache that uses DRAM to increase its size.

2) Uniprocessors.
- This category combines a processor with on-chip DRAM. This part might be attractive because of high performance, good power-performance of the system, good cost-performance of the system, or a combination of all three.

3) Multiprocessors. This category includes chips intended exclusively to be used as building blocks in a multiprocessor, IRAMs that include a MIMD (Multiple Instruction streams, Multiple Data streams) multiprocessor within a single chip, and IRAMs that include a SIMD (Single Instruction stream, Multiple Data streams) multiprocessor, or array processor, within a single chip. This category is the most popular research area for IRAMs.

Future predictions
- Multiprocessors on a chip
- More compiler information migrating to the processor
- More integration of system components onto the processor
- Limited fine-grained explicit parallelism
- Dealing with wire-delay domination

Conclusion
Merging a microprocessor and DRAM on the same chip presents opportunities in:
- performance:
  - a factor of 5 to 10 reduction in latency
  - a factor of 50 to 100 increase in bandwidth
  - a factor of 2 to 4 advantage in energy efficiency
- an unquantified cost savings, by removing superfluous memory and reducing board area.

Query time

Thank you

Frequency of Accesses
- An on-chip DRAM array has much higher capacity than an SRAM array of the same area
- IRAM reduces the frequency of accesses to lower levels of the memory hierarchy, which require more energy
- On-chip DRAM organized as an L2 cache has lower off-chip miss rates than an SRAM L2, reducing the off-chip energy penalty
- When the entire main-memory array is on-chip, the high off-chip energy cost is avoided entirely

Energy of Accesses
- IRAM reduces the energy needed to access the various levels of the memory hierarchy
- On-chip memory accesses use less energy than off-chip accesses by avoiding the high-capacitance off-chip bus
- The multiplexed address scheme of conventional DRAMs selects a larger number of DRAM arrays than necessary
- The narrow pin interface of external DRAM wastes energy in the multiple column cycles needed to fill an entire cache block

For 1 access, measured in nJoules:

                                    Conventional   IRAM
on-chip L1$ (SRAM)                  0.5            0.5
on-chip L2$ (SRAM vs. DRAM)         2.4            1.6
L1 to Memory (off- vs. on-chip)     98.5           4.6
L2 to Memory (off-chip)             316.0          (n.a.)
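As a quick sanity check on this table, the per-level conventional-to-IRAM energy ratios can be computed directly (values transcribed from the table; the "(n.a.)" row is omitted):

```python
# Per-access energies in nJ, transcribed from the table above.
conventional = {"L1": 0.5, "L2": 2.4, "L1_to_mem": 98.5}
iram = {"L1": 0.5, "L2": 1.6, "L1_to_mem": 4.6}

for level in conventional:
    ratio = conventional[level] / iram[level]
    print(f"{level:>10}: conventional/IRAM = {ratio:.1f}x")
```

The roughly 21x drop in L1-to-memory energy is where moving memory on-chip pays off most.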
