Intelligent RAM (IRAM): the Industrial Setting, Applications, and Architectures

David Patterson, Krste Asanovic, Aaron Brown, Richard Fromm,
Jason Golbus, Benjamin Gribstad, Kimberly Keeton, Christoforos Kozyrakis,
David Martin, Stylianos Perissakis, Randi Thomas, Noah Treuhaft, and Katherine Yelick
Computer Science Division, University of California, Berkeley CA 94720-1776

Abstract

The goal of Intelligent RAM (IRAM) is to design a cost-effective computer by designing a processor in a memory fabrication process, instead of in a conventional logic fabrication process, and to include memory on-chip. To design a processor in a DRAM process one must learn about the business and culture of the DRAM industry, which is quite different from that of microprocessors. We describe some of those differences, and then our current vision of IRAM applications, architectures, and implementations.

1. Potential and Challenges of IRAM

Intelligent RAM (IRAM) may lead to a different style of computer than those based on conventional microprocessors. IRAM technology offers the following potential:

• Improve memory latency by factors of 5 to 10 and memory bandwidth by factors of 50 to 100, by redesigning the memory interface and exploiting the proximity of on-chip memory [1][2];
• Improve energy efficiency of memory by factors of 2 to 4, primarily by going off-chip less frequently [3][4];
• Reduce design effort tenfold by filling the die with replicated memory rather than with custom logic [5];
• Make the memory size and organization fit the intended workload;
• Reduce board area by factors of 4 or much greater by integrating many components on a single chip; and
• Improve I/O bandwidth by factors of 4 to 8 by replacing the conventional I/O bus with multiple high-speed, point-to-point, serial lines [6-8].

This list makes IRAM an exciting opportunity.

One IRAM challenge is matching the performance of microprocessors. Performance obstacles include:

• IRAM is fabricated in a process that has been oriented towards small memory size and low charge leakage rather than fast transistor speed;
• This same DRAM fabrication process offers fewer metal layers than a logic process to lower costs, since routing speed is less of an issue in a memory;
• DRAMs are designed to work in plastic packages and dissipate less than 2 watts, while desktop microprocessors dissipate 20 to 50 watts using ceramic packages;
• DRAM refresh rates go up with operating temperature, approximately doubling for every rise of 10 degrees C;
• Some applications may not fit within the on-chip memory of an IRAM, and hence IRAMs must access either conventional DRAMs or other IRAMs over a much slower path than on-chip accesses.

Another major IRAM challenge is matching the cost of DRAM memory. Cost obstacles include:

• DRAMs include redundant memory so that fabrication flaws can be circumvented to improve yield and therefore lower cost. Microprocessors traditionally have no redundant logic to improve yield. Hence the on-chip logic may effectively determine the yield of the IRAM.
• Testing time affects chip costs. Given both logic and DRAM on the same die, an IRAM die may need to be tested on both logic and memory testers.
• To help close the performance gap for logic in a DRAM process, merged-logic DRAM processes are being created with faster transistors and more metal layers, which increases the cost per wafer by 10% to 30%.

The business model for IRAM also has challenges. Although an IRAM may be classified as a single chip computer and sold like desktop or embedded microprocessors, the initial companies most interested in pursuing IRAMs are DRAM companies, and they generally have little experience in the microprocessor market. Some challenges are:

• DRAMs are “generic” parts, used in many places without impacting the software. Putting a processor in the DRAM limits the software that can run on the IRAM.
• The DRAM economic model depends on producing a very high volume of parts (billions of DRAMs are made each year), while some microprocessors sell fewer than a million per year.
• DRAM companies do not need to worry about a supply of support and application software for their chips. IRAM would change that requirement.

This paper first goes into more depth on the DRAM industry to motivate initial solutions to the IRAM challenges, looks at potential IRAM applications and architectures, and then concludes with our target implementation alternatives that could be taped out in 1999.

2. DRAM and Microprocessor Industries

Figure 1 highlights some of the differences between the DRAM industry and the desktop microprocessor industry. DRAM companies agree on new standard interfaces for new generations and configurations of DRAMs. These standards include almost everything: pinout, package, addressing, refresh rates, and so on.

Each microprocessor manufacturer generally sets its own instruction set standards to ensure software compatibility with prior generations, but is free to invent new interfaces with different packages and pins, different memory interfaces, and so on. Whereas microprocessors follow their own architecture standards with varying implementations over time, DRAM manufacturers standardize at the package level and innovate in the size of the memory cell and in the efficiency of the manufacturing process.

                      DRAM                               Microprocessor
Standards             pinout, package, refresh rate,     binary compatibility, IEEE 754
                      addressing, capacity, width,       Floating Point, I/O bus
                      fast transfer mode, failure rate
Sources               multiple                           single
Key figures of merit  1) capacity, cost/bit              1) performance on standard benchmarks
                      2) bandwidth                       2) cost
                      3) latency
Rate of improvement   1) 60%/year, 25%/year              1) 60%/year
                      2) 20%/year                        2) little change
                      3) 7%/year

Figure 1. Business models of DRAM and desktop microprocessor industries.

2.1. Differing Design Targets

Not only does the multiple-source versus single-source business model affect the design of the chips, but the figures of merit also vary between the two cultures. DRAM designers pride themselves on improving storage capacity per chip by fourfold every three years (60%/year) and on having the smallest memory cell so as to have the lowest cost per bit. The capacity increases are generally achieved by reducing cell size by about a factor of 2.5 and increasing the die size by a factor of 1.5. The increasing die size is a major reason that the cost per bit changes more slowly than the capacity per chip. A secondary design target is bandwidth in the fast access mode, and the trailing concern is latency to access a random bit in memory.

A DRAM company’s business goal is typically to supply 10% of a single DRAM generation. As there were 6.25 billion DRAMs shipped in 1996, such an apparently modest target can lead to hundreds of millions of chips.

Desktop microprocessor designers tend to have a design cost target, expressed as die size, and then build the fastest chip they can for that size. Microprocessor volumes have more to do with an instruction set target than with the actual final performance, but given that microprocessor designers generally do not get to pick the instruction set, they aim for the highest performance. Of secondary importance is the cost. Recently the power dissipation has become so high that it is now a concern.

Embedded microprocessor designers have much lower cost targets and power budgets, and are more likely than designers of desktop microprocessors to sacrifice performance to meet the cost/power budgets.

The different figures of merit for memory designers and microprocessor designers have resulted in a performance gap between processor and memory in computer systems. The primary approach to bridging this gap has been increasing the amount of SRAM on a microprocessor to act as a cache. Today, many microprocessors dedicate between one-third and two-thirds of the area on chip to these caches.[2] Moreover, today there are often external SRAM chips to build secondary caches. Such chips add to cost and increase board area.

2.2. Differing Generation Strategies

Traditionally, DRAM manufacturers would design a new memory cell and a new fabrication process simultaneously. The company then produces tens of thousands of “engineering samples” until both the fabrication process and memory cell design are fully “characterized.” Characterization means that the resulting dies will operate at minimum refresh rates over the full temperature range, supplying data with acceptable bit error rates.

Once characterized, the subsequent chips are at the “first customer ship” milestone. There may also be a separate milestone of “mass production” when the part achieves the high volume that DRAM manufacturers strive for. Given that all DRAM manufacturers use the same semiconductor fabrication equipment and same wafers, the time to these milestones can determine what share of the market a company will achieve.
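
The growth figures quoted in Section 2.1 can be checked with a few lines of arithmetic. The Python sketch below is an illustration only, not part of the original analysis, and the function name annual_rate is hypothetical. It converts "fourfold every three years" into a compound annual rate and multiplies the two quoted per-generation factors (cell shrink and die growth).

```python
def annual_rate(total_factor, years):
    """Compound annual growth rate implied by a total growth factor over `years`."""
    return total_factor ** (1.0 / years) - 1.0

# Fourfold capacity growth every three years (Section 2.1).
capacity_rate = annual_rate(4.0, 3.0)

# Quoted contributions per generation: cells about 2.5x smaller, dies 1.5x larger.
combined_factor = 2.5 * 1.5

print(f"capacity growth: {capacity_rate:.1%} per year")
print(f"cell shrink x die growth: {combined_factor:.2f}x per generation")
```

The compound rate comes out near 59% per year, which the text rounds to 60%, and the product of the two factors is 3.75, close to the fourfold capacity increase per generation.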
The size of the die, testing time, and yield determine the profit of a company that has a sizeable market share. As a result, DRAM manufacturers are much more secretive about Spice parameters and design rules than microprocessor companies. To lower costs they shrink the die to increase the number of chips per wafer and improve the fabrication process to improve yield. As they better understand a process, they will reduce the testing time and may even reduce the number of spare rows and columns to get slightly smaller dies. DRAMs typically go through 3 to 4 generations of die sizes over a 4 to 6 year lifetime.

Recently, DRAM manufacturers have separated the process and memory cell size from the capacity of the die. Hence the same line might make third generation 64 Mbit and first generation 256 Mbit parts depending on the demands of the market. Today, it makes more sense to talk about the generations of memory cell size and process rather than just the generation of, say, a 64 Mbit part.

Once in mass production, DRAM die yields below 60% are considered disastrous. Such high yields come from small die, low defect density, and using redundant rows and columns to repair some flaws. Although real yields are closely guarded secrets, yields of 80% or 90% are apparently achieved by some efficient manufacturers.

Microprocessor manufacturers generally are not as tightly tied to the fabrication process as are DRAM designers. In fact, there are several “fabless” microprocessor manufacturers, but no major “fabless” DRAM manufacturers. Microprocessor designers tend to not worry as much about fully characterizing a design. The key milestones tend to be tape out, booting the operating system on an early chip, and then mass production, which occurs when the system using the chip is also shipped. Intel, which ships 10 to 100 times the volume of other microprocessor manufacturers, spends much more time on design verification and process tweaking to improve yield.

While every chip designer desires high yield, microprocessor designers typically design chips that almost fill the full reticle and hence may be very happy with initial yields of 20%.

The die is shrunk once as the technology scales, thereby improving yield and increasing clock rate. Companies with high volumes like Intel have a shrink team at work before the die is originally taped out, and will go through more generations of the die than lower volume manufacturers.

2.3. Differing Profits

Between 1994 and early 1996, DRAM price per megabyte did not decline by its historical 25% per year. Since technology continued to improve and thus costs continued to decline, the DRAM industry became increasingly profitable.

The economic law of supply and demand was invoked in 1996, as DRAM companies increased production and new companies entered the market. Between January 1996 and December 1996 the price of a 16 Mbit DRAM fell from about $40 per chip to $6 per chip, far below the level implied by the historical 25% per year price decline. Stated alternatively, overall DRAM sales fell from $16.5B in 3Q95 to $7B in 1Q97. And although prices rose to $8 per 16 Mbit DRAM in March 1997, they returned to $6 in August 1997.[9]

At the same time Intel was posting record profits. In 1996 Intel’s net revenue was $20 billion, with a ten year growth rate of 30% per year. In recent quarters about a third of Intel’s income was profit.

In addition to the interesting potential of the IRAM technology, DRAM companies are hoping that IRAM would enable profits per wafer to be more like those of recent microprocessor wafers than like those of recent DRAM wafers.

3. Potential IRAM applications

For DRAM manufacturers to enjoy the profits of an Intel, they need to find potential IRAM applications that sell in the millions. The first three applications could meet that goal. The last two applications are predicated on the success of one or more of these first three, as they are unlikely to achieve such high volumes.

3.1. “Intelligent” Video Game

Nintendo sold 2.6 million of its latest video game player for $150 in its first year. Each is based on a four-chip set: one 64-bit MIPS processor chip, one graphic accelerator chip, and two RAMBUS memory chips. Graphics and sound have always needed as much performance as possible, with 3D graphics being especially needy in memory bandwidth and floating point performance.

An IRAM combining the processor, graphics accelerator, and 4 to 16 megabytes of memory could exploit the orders of magnitude in memory bandwidth and small board area advantages of IRAM to offer an attractive chip for the next generation of video games.

3.2. “Intelligent” PDA

Palm-top PDAs are becoming increasingly popular. For example, 1 million Palm Pilots were sold in their first year, each for about $300. The Palm Pilot requires the user to learn a new alphabet and then enter the characters with a stylus on a touch sensitive screen. Other PDAs offer miniature keyboards.

If an IRAM could include sufficient computing power to enable speaker trained, isolated-word speech input to a PDA, the device would be much more useful. In such a machine the stylus would be used to correct the errors, with the correction usually selected from a pop-up list of potential words. At 90% to 95% word accuracy, achieved by systems like Dragon Dictate, and if 80% to 90% of the time the correct word is found in the pop-up error menu, then speaking into a PDA could be as fast as typing on a full-sized keyboard.

An IRAM with sufficient performance and 4 to 16 MB of memory to hold the dictionary, when combined with the advantages of energy efficiency and small board area, could be an attractive building block for the next generation of PDAs.

3.3. “Intelligent” Disk

Tens of millions of magnetic disks are made each year, and they include integrated circuits with memory for a track cache and logic to calculate the error correction codes for each block. The track cache grows with the increasing linear density of a track, or about 1.3X per year. For example, the 9-GB Seagate Cheetah drive comes with a 0.5 Mbyte track cache and offers a 2.0 Mbyte cache as an option. The new Fibre Channel serial interfaces for disks increase bandwidth demands, requiring transfer rates to the cache of 100 Mbytes/second over two ports.

An IRAM with high-speed serial interfaces could easily supply the required memory capacity and network bandwidth. With sufficient computing power, in addition to calculating error correction codes, it could handle the network and security protocols. Such a disk could attach directly to a local area network, thereby avoiding a server. Such a network-attached secure disk may improve scalability and bandwidth over conventional systems.[10]

As disks will dissipate between 5 and 20 watts, an IRAM for an Intelligent Disk must be power efficient. Disks also value small board area very highly, as the chips must fit on the back of 2.5 inch or 3.5 inch diameter disks.

An attractive chip for disk manufacturers might be a low-power IRAM with 4 to 16 MB of memory for disk caches and networking code plus serial I/O for the interface to disk and local area networks.

3.4. Scalable, Low-Cost, Data-Server Cluster

If IRAM proves successful in such high volume markets as those above, such chips may be available to construct much more cost-effective cluster-based servers than those based on conventional desktop microprocessors.

One example comes from the commercial world. One I/O benchmark is Minute Sort, which copies data from disk, sorts it, and then stores it back to disk. This application places the same demands on servers as decision support systems. The current world record is 8.6 GBytes using a cluster of 95 Sun Ultra 1 workstations connected via 160 Mbyte per sec links through a switch-based local area network.[11]

Using the serial lines to connect to disks should allow a single IRAM in two to three years to sort more than the current record. Using a few serial lines to connect a cluster of 16 to 32 IRAMs via a switch for network communication, and other serial lines to connect them to disks, could allow this cluster to sort more than 100 GB in a minute. Given that the high volume applications above need inexpensive IRAMs, the cost of 16-32 IRAMs would likely be much less than 10% of the disk infrastructure cost.[7]

Greg Papadopolous, Chief Technical Officer of Sun Microsystems Computing Corporation, observed a trend in data mining.[12] While processors are doubling performance every 18 months, customers are doubling data storage every 5 months. Customers would like to “mine” this data overnight to shape their business practices, but data is being accumulated faster than affordable computers can process the information. Combining Intelligent Disks with an IRAM cluster might lead to scalable processing for data mining that can keep up with “Greg’s Law” at a fraction of the costs of the disks.

3.5. Low-Cost TeraFLOPS Cluster

A traditional but even lower volume market is supercomputing. Using the same serial networks to connect IRAMs via cross bar switches, hundreds of small, low power IRAMs could be placed on a few small boards. If IRAMs for video games could compute at 1 GFLOPS, then in 2 to 3 years 1000 IRAMs and the disk system needed for the sorting above could offer TeraFLOPS computing for less than $500,000. Figure 2 compares key parameters to the $55,000,000 ASCI Red machine.

Note the smaller memory and higher I/O bandwidth of the IRAM cluster. The sort benchmark was able to trade off higher I/O bandwidth for smaller memory. Whether this would be true for supercomputing remains to be seen.

Even adjusting cost/performance of ASCI Red by a factor of 4 to 6 improvement for technological advances between 1996 and 2000, an IRAM cluster might be attractive for supercomputing.

              ASCI Red [13]        IRAM cluster
Processors    9000 Pentium Pros    1000 IRAMs
Memory        600 GB               16-24 GB
Disk          2000 GB              2100 GB
Peak Perf.    1.8 TeraFLOPS        1.0 TeraFLOPS
I/O speed     450 GB/s             2000 GB/s
Floor space   1600 sq. ft.         <10 sq. ft.
Cost          $55,000,000          <$500,000
Year          1996                 2000

Figure 2. Supercomputing clusters.
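
The cost/performance argument of Section 3.5 can be made concrete with the numbers in Figure 2. The Python sketch below is an illustration only: the constant and function names are hypothetical, the IRAM cost uses the quoted upper bound of $500,000, and peak FLOPS stand in for delivered performance, which the text itself leaves open.

```python
# Peak performance and cost as listed in Figure 2.
ASCI_RED_COST_USD = 55_000_000
ASCI_RED_PEAK_GFLOPS = 1800.0     # 1.8 TeraFLOPS
IRAM_COST_USD = 500_000           # upper bound quoted in the text
IRAM_PEAK_GFLOPS = 1000.0         # 1.0 TeraFLOPS

def usd_per_gflops(cost_usd, peak_gflops):
    """Cost per peak GFLOPS."""
    return cost_usd / peak_gflops

asci_red = usd_per_gflops(ASCI_RED_COST_USD, ASCI_RED_PEAK_GFLOPS)
iram = usd_per_gflops(IRAM_COST_USD, IRAM_PEAK_GFLOPS)

def remaining_advantage(adjustment):
    """IRAM advantage after improving ASCI Red's cost/performance by `adjustment`."""
    return (asci_red / adjustment) / iram

# 1x is the raw comparison; 4x and 6x are the technology adjustments from the text.
for adjustment in (1, 4, 6):
    print(f"adjustment {adjustment}x: IRAM advantage {remaining_advantage(adjustment):.1f}x")
```

Under these assumptions the raw gap is about 61x in cost per peak GFLOPS, and the projected cluster still keeps roughly a 10x to 15x advantage after the factor of 4 to 6 adjustment.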
4. IRAM Architectures and Implementations

Putting a conventional cache-based, superscalar microprocessor in an IRAM does not lead to exciting performance.[7][14] Hence IRAM needs a new architecture.

If an architecture requires programmers to rewrite their programs, then it needs advantages of factors of at least 10 and as much as 50.[15] The reason for this high threshold is that software development is slow, and with conventional microprocessor performance doubling every 18 months, there must still be a large advantage after the programming is completed. Otherwise programmers will just wait, as in the long run novel machines are often unsuccessful commercially.

Given the silicon budgets of the next five or so years, it is unlikely that any alternative will have that large an advantage over conventional microprocessors for a large set of programs. Keep in mind that the DRAM vendors want designs that can be fabricated in the millions, so it is likely that IRAMs will be targeted at many applications.

Hence, in selecting a new architecture, the key is finding a design that exploits the memory bandwidth potential of IRAM while leveraging software developed for traditional computing. Thus an architecture that offers mature compiler technology is at an advantage. A secondary consideration is energy efficiency. Given the applications in section 3, architectures that reduce power while preserving performance are very attractive for IRAM. Another consideration is small code size to reduce the amount of memory occupied by programs in IRAM.

We see four architectural alternatives: SIMD, VLIW, MIMD on a chip, or vector. While SIMD is a good match to the IRAM technology when the logic is distributed with memory modules, it has never been a general purpose solution. It also has received little compiler development for traditional programming languages. So we rejected it.

VLIW is very popular today in the architecture research community, but it has three negatives. The first is that the compiler technology has not been successful commercially, although it is an area of active compiler research. The second is that VLIW architectures traditionally have the largest code size of the alternatives. The third is object-code compatibility across multiple generations.

MIMD on a chip is a plausible direction for IRAM, and many have taken or are taking this track.[16-18] The MIMD commercial successes have been servers, where the performance is number of tasks per hour rather than time for a single task. While servers are found in section 3, they probably will not have the volumes to justify IRAM. Hence one question is whether a specific MIMD organization lends itself to compiler technology to automatically parallelize an application to run well on all processors within a single chip. A second question is the energy efficiency of fetching four independent instruction streams.

We selected a vector architecture for four reasons. The first is that the compiler technology is the most mature of the options, increasing the chances that programs would run on an IRAM with little or no change.

The second reason is that the specification of many parallel operations in a single instruction helps in the power-performance trade-off. Since the power is reduced by the square of a voltage reduction, two techniques allow us to lower power while maintaining performance: deeper pipelines and multiple pipes or lanes. Deeper pipelines make more sense in a vector architecture because the vector operation specifies 64 or 128 operations without a branch. Multiple pipes or lanes means that by including, say, 2 ALUs and cutting clock rate in half we can maintain performance while reducing voltage to lower power.

The third reason is that the multimedia support suggested by video games, PDAs, or data mining is an ideal application for vector architectures. Compared to multimedia extensions such as MMX, vectors are a more elegant way of specifying multiple subword operations. We can simply divide vector registers into smaller elements.

The fourth reason is that the use of multiple pipes or lanes gives the IRAM the ability to have redundant logic that can be discarded to improve yield. With four ALUs, for example, it may cost little in overall area but significantly reduce costs to include a fifth ALU as a spare.

5. Conclusion

Figure 3 shows the 1999 merged logic-DRAM technology, available from several companies, and parameter estimates of two potential vector IRAMs: low power and high performance. We believe the low power option is a better match to high volume applications such as video games, PDAs, or disks.

Target              Low Power                            High Performance
Technology          0.18-0.20 micron, 5-6 metal layers, fast xtor (both targets)
Die size            ≈200 mm2 (both targets)
Memory              16-24 MB (both targets)
Vector lanes        4 64-bit (or 8 32-bit or 16 16-bit or 32 8-bit) (both targets)
Serial I/O          4 lines @ 1 Gbit/s                   8 lines @ 2 Gbit/s
Power               ≈2 w @ 1-1.5 v logic                 ≈10 w @ 1.5-2 v
Clock (university)  200 scalar/100 vector MHz            250 s/250 v MHz
Clock (industry)    400 scalar/200 vector MHz            500 s/500 v MHz
Perf (university)   0.8 GFLOPS (64-bit) to 6 G (8-bit)   2 GFLOPS (64-bit) to 16 G (8-bit)
Perf (industry)     1.6 GFLOPS (64-bit) to 12 G (8-bit)  4 GFLOPS (64-bit) to 32 G (8-bit)

Figure 3. Low power and high performance Vector IRAM goals to be taped out in 1999. The two clock rates are for the scalar unit and the vector unit, and the range of the performance is between 64-bit floating point and 8-bit integer.

We believe our small, academic design team can build an IRAM with half the performance of a larger and more experienced industrial team. Yet even this design would demonstrate the potential of IRAM to offer an interesting combination of performance, power, memory capacity, board space, and cost.

Several characteristics make IRAM an exciting research topic: large advantages on many dimensions, the design challenges that make success not obvious, the need to rethink the computer design for IRAM, its availability in a fairly standard manufacturing process, and its potential impact on two large industries. Only time can tell us the impact of this intriguing opportunity.

6. Acknowledgments

This research was supported by DARPA (DABT63-C-0056), the California State MICRO program, and by research grants from Intel, Samsung, Silicon Graphics/Cray Research, and Sun Microsystems.

7. References

[1] Patterson, D.; Anderson, T.; Cardwell, N.; Fromm, R.; Keeton, K.; Kozyrakis, C.; Thomas, R.; Yelick, K. “Intelligent RAM (IRAM): chips that remember and compute,” 1997 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, San Francisco, CA, USA, 6-8 Feb. 1997. p.224-5.

[2] Patterson, D.; Anderson, T.; Cardwell, N.; Fromm, R.; Keeton, K.; Kozyrakis, C.; Thomas, R.; and Yelick, K. “A case for intelligent RAM,” IEEE Micro, vol.17, (no.2), March-April 1997. p.34-44.

[3] Fromm, R.; Perissakis, S.; Cardwell, N.; Kozyrakis, C.; McGaughy, B.; Patterson, D.; Anderson, T.; Yelick, K. “The energy efficiency of IRAM architectures,” 24th Annual International Symposium on Computer Architecture (ISCA '97), Denver, CO, USA, 2-4 June 1997. p.327-37.

[4] Shimizu, T.; et al. “A multimedia 32 b RISC microprocessor with 16 Mb DRAM.” ISSCC Digest of Technical Papers, San Francisco, CA, USA, 8-10 Feb. 1996. p.216-17, 448.

[5] Perissakis, S.; Kozyrakis, C.; Anderson, T.; Asanovic, K.; Cardwell, N.; Fromm, R.; Golbus, J.; Gribstad, B.; Keeton, K.; Patterson, D.; Thomas, R.; Treuhaft, N.; and Yelick, K. “Scaling Processors to 1 Billion Transistors and Beyond: IRAM,” To appear in IEEE Computer, September 1997.

[6] Yang, C.K.K.; Horowitz, M.A. “A 0.8-µm CMOS 2.5 Gb/s oversampling receiver and transmitter for serial links.” IEEE Journal of Solid-State Circuits, 31:12, Dec. 1996. p.2015-23.

[7] Keeton, K.; Arpaci-Dusseau, R.; and Patterson, D. “IRAM and SmartSIMM: Overcoming the I/O Bus Bottleneck,” Workshop on Mixing Logic and DRAM: Chips that Compute and Remember, Denver, CO, USA, 1 June 1997. (http://iram.cs.berkeley.edu/isca97-workshop/w2-120-draft.ps)

[8] Saulsbury, A.; Nowatzyk, A. “Missing the memory wall: the case for processor/memory integration.” ISCA'96: The 23rd Annual International Conference on Computer Architecture, Philadelphia, PA, USA, 22-24 May 1996. p.90-101.

[9] Achilles Corporation. “DRAM Market Price Information in Japan,” 1 August 1997. (http://pweb.aix.or.jp/~maski-na/index1-1EG.html)

[10] Gibson, G.A.; Nagle, D.F.; Amiri, K.; Chang, F.W.; Feinberg, E.M.; Gobioff, H.; Lee, C.; Ozceri, B.; Riedel, E.; Rochberg, D.; Zelenka, J. “File server scaling with network-attached secure disks.” 1997 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 97), Seattle, WA, USA, 15-18 June 1997. p.272-84.

[11] Arpaci-Dusseau, A.C.; Arpaci-Dusseau, R.H.; Culler, D.E.; Hellerstein, J.M.; Patterson, D.A. “High-performance sorting on networks of workstations.” SIGMOD 1997: ACM SIGMOD International Conference on Management of Data, Tucson, AZ, USA, 13-15 May 1997. p.243-54.

[12] Papadopolous, G. “The Future of Computing.” Unpublished talk at NOW Workshop, Lake Tahoe, CA, USA, 27 July 1997.

[13] Rowell, J. “Intel Ships 20 Gflops Teraflops Installment to Sandia,” May 9, 1996. (http://www.ssd.intel.com/tflop1.html)

[14] Bowman, N.; Cardwell, N.; Kozyrakis, C.; Romer, C.; and Wang, H. “Evaluation of Existing Architectures in IRAM Systems,” Workshop on Mixing Logic and DRAM: Chips that Compute and Remember, Denver, CO, USA, 1 June 1997. (http://iram.cs.berkeley.edu/isca97-workshop/w2-114.ps)

[15] Weems, C. “Considerations Leading to an Asynchronous SIMD Architectural Approach for Exploiting Mixed Logic and Memory,” Workshop on Mixing Logic and DRAM: Chips that Compute and Remember, Denver, CO, USA, 1 June 1997. (http://iram.cs.berkeley.edu/isca97-workshop/w2-108.ps)

[16] Kogge, P.M.; Sunaga, T.; Miyataka, H.; Kitamura, K.; and others. “Combined DRAM and logic chip for massively parallel systems.” Proceedings. 16th Conference on Advanced Research in VLSI, Chapel Hill, NC, USA, 27-29 March 1995. p.4-16.

[17] Murakami, K.; Shirakawa, S.; Miyajima, H. “Parallel processing RAM chip with 256 Mb DRAM and quad processors.” 1997 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, San Francisco, CA, USA, 6-8 Feb. 1997. p.228-9, 528.

[18] Yamauchi, T.; Hammond, L.; and Olukotun, K. “Evaluation of Existing Architectures in IRAM Systems,” Workshop on Mixing Logic and DRAM: Chips that Compute and Remember, Denver, CO, USA, 1 June 1997. (http://iram.cs.berkeley.edu/isca97-workshop/w2-106.ps)
