
Chapter 6

The Memory System

Jin-Fu Li
Department of Electrical Engineering
National Central University
Jungli, Taiwan
Outline
¾ Basic Concepts
¾ Semiconductor Random Access Memories
¾ Read Only Memories
¾ Speed, Size, and Cost
¾ Cache Memories
¾ Performance Considerations
¾ Virtual Memories

Advanced Reliable Systems (ARES) Lab. Jin-Fu Li, EE, NCU 2


Content Coverage

[Diagram: the central processing unit (CPU), containing operational registers, the
arithmetic and logic unit, instruction sets, the program counter, the control unit,
and cache memory, exchanges addresses and data/instructions with the main memory
system; both connect to the input/output system.]



Basic Concepts
¾ The maximum size of the memory that can be used
in any computer is determined by the addressing
scheme
‹ For example, a 16-bit computer that generates 16-bit addresses is
capable of addressing up to 2^16 = 64K memory locations
‹ Similarly, machines whose instructions generate 32-bit addresses
can utilize a memory that contains up to 2^32 = 4G memory
locations
¾ Most modern computers are byte addressable
¾ From the system standpoint, we can view the
memory unit as a black box
‹ Data transfer between the memory and processor takes place
through the use of two processor registers, MAR and MDR
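The addressing arithmetic above follows directly from the address width; a one-line sketch (the helper name is illustrative):

```python
def addressable_locations(address_bits):
    """Number of distinct memory locations an address of the given width can name."""
    return 2 ** address_bits

# 16-bit addresses reach 64K locations; 32-bit addresses reach 4G locations.
print(addressable_locations(16))   # 65536
print(addressable_locations(32))   # 4294967296
```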



Connection of the Memory to the Processor

[Diagram: the processor's MAR drives a k-bit address bus and its MDR connects to
an n-bit data bus; control lines (R/W, MFC, etc.) run between processor and memory.
The memory offers up to 2^k addressable locations with a word length of n bits.]



Basic Concepts
¾ A useful measure of the speed of memory units is the
time that elapses between the initiation of an
operation and the completion of that operation. This
is referred to as the memory access time.
¾ Another important measure is the memory cycle time,
which is the minimum time delay required between
the initiation of two successive memory operations
¾ A memory unit is called random-access memory
(RAM) if any location can be accessed for a Read or
Write operation in some fixed amount of time that is
independent of the location’s address
¾ The memory cycle time is the bottleneck in the
system
Basic Concepts
¾ One way to reduce the memory access time is to use
a cache memory
¾ Cache memory is a small, fast memory that is
inserted between the larger, slower main memory
and the processor.
¾ Virtual memory is used to increase the apparent size
of the physical memory. Data are addressed in a
virtual address space that can be as large as the
addressing capability of the processor. But at any
given time, only the active portion of this space is
mapped onto locations in the physical memory. The
remaining virtual addresses are mapped onto the
bulk storage devices used, such as magnetic disks
Semiconductor Random Access Memories

[Diagram: a generic RAM organization. Row address bits A0–Ak-1 feed a row decoder
whose outputs S0–Sn-1 select one of the word lines Word0–Wordni-1; column address
bits A0–Aj-1 feed a column decoder; sense amplifiers and the read/write circuit,
gated by CS and R/W, connect the selected columns to the m-bit input/output lines.]
Organization of Bit Cells in a Memory

[Diagram: bit cells arranged in a two-dimensional array; the row decoder, driven by
the address, selects one row of memory cells, and the column decoder with the
read/write circuitry routes the selected cells to the data I/Os.]



Generic RAM Circuit
[Diagram: a generic RAM datapath. Bit-line conditioning circuitry sits above the
RAM cell array; address bits n-1:k drive the row decoder and bits k-1:0 drive the
column mux; sense amplifiers, column mux, and write buffers connect the write data
and read data lines to the array, all timed by clocks.]



Static RAM
¾ A 6T SRAM cell

[Circuit: two cross-coupled inverters store the bit; two access transistors, gated
by the word line, connect the cell to the bit and -bit lines.]



Read Operation

[Waveforms: the bit and -bit lines are precharged; the word line is then asserted
and the cell pulls one of the two lines low; the resulting differential on bit and
-bit is sensed to produce the read data.]



Write Operation

[Circuit and waveforms: to write, the write-data drivers force bit and -bit to
complementary values while the word line is asserted; the access transistors (N5,
N6) pass the new values onto the cross-coupled pair (N1–N4), flipping cell and
-cell to the written state.]



Dynamic RAM
¾ A typical DRAM cell

[Circuit: a single access transistor, gated by the word line, connects the bit
line to a storage capacitor.]



Data Retention Time of a DRAM Cell
¾ Write and hold operation in a DRAM cell

WL=1 WL=0

on + off +
+ Cs Vs Cs Vs
- -
-

Write Operation Hold

V s = V max = V DD − V tn
Q max = C s (V DD − V tn )



Data Retention Time of a DRAM Cell
¾ Charge leakage in a DRAM cell
[Plot: with WL = 0, the stored voltage Vs(t) decays from Vmax through the leakage
current IL of the off access transistor; the hold time th is the time for Vs to
fall to V1, the minimum logic-1 voltage.]

I_L = −(dQ_s/dt) = −C_s (dV_s/dt) ≈ −C_s (ΔV_s/Δt)

t_h = |Δt| ≈ (C_s/I_L) ΔV_s
DRAM Refresh Operation
As an example, if IL = 1 nA, Cs = 50 fF, and the allowed change in Vs is 1 V, the
hold time is

t_h = (50×10^-15 / 1×10^-9) × 1 = 50 µs
Memory units must be able to hold data so long as the power is
applied. To overcome the charge leakage problem, DRAM arrays
employ a refresh operation where the data is periodically read from
every cell, amplified, and rewritten.

The refresh cycle must be performed on every cell in the array with a
minimum refresh frequency of about
f_refresh ≈ 1/(2t_h)
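Using the relations above with I_L = 1 nA, C_s = 50 fF, and ΔV_s = 1 V gives a hold time of 50 µs and a refresh frequency of 10 kHz; a small sketch (function names are illustrative):

```python
def hold_time(c_s, i_l, delta_v):
    """t_h ≈ (C_s / I_L) * ΔV_s: time for the cell voltage to droop by ΔV_s."""
    return (c_s / i_l) * delta_v

def refresh_frequency(t_h):
    """f_refresh ≈ 1 / (2 * t_h)."""
    return 1.0 / (2.0 * t_h)

t_h = hold_time(50e-15, 1e-9, 1.0)   # 5e-05 s, i.e. 50 microseconds
f = refresh_frequency(t_h)           # 10000 Hz, i.e. 10 kHz
```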



Asynchronous DRAMs
¾ A 16M-bit DRAM chip, configured as 2Mx8, is
shown below
[Diagram: RAS clocks row address bits A20-9 into the row address latch; the row
decoder selects one of 4096 rows of the 4096×(512×8) cell array. CAS clocks column
address bits A8-0 into the column address latch; the column decoder selects one of
the 512 bytes presented by the sense/write circuits, which drive data pins D7–D0
under control of CS and R/W.]



Fast Page Mode
¾ When the DRAM shown above is accessed, the
contents of all 4096 cells in the selected row are
sensed, but only 8 bits are placed on the data lines
D7-0. This byte is selected by the column address bits
A8-0.
¾ A simple modification can make it possible to access
the other bytes in the same row without having to
reselect the row
‹ A latch can be added at the output of the sense amplifier in each
column
‹ Transfer of successive bytes is achieved by applying a
consecutive sequence of column addresses under the control of
successive CAS signals
¾ This block transfer capability is referred to as the fast
page mode feature
Synchronous DRAMs
¾ The structure of a synchronous DRAM (SDRAM) is shown below
[Diagram: an SDRAM adds a refresh counter, row and column address latches fed from
shared row/column address pins, a column address counter, data input and output
registers, and a mode register with timing control, all operated synchronously from
the clock together with RAS, CAS, R/W, and CS; the row decoder, cell array, and
column decoder with read/write circuits and latches are as in the asynchronous
DRAM.]
Burst Read Operation in an SDRAM
¾ SDRAMs have several different modes of
operation, which can be selected by writing
control information into a mode register
¾ The burst operations use the block transfer
capability described above as the fast page mode
feature
¾ In SDRAMs, it is not necessary to provide
externally generated pulses on the CAS line to
select successive columns. The necessary control
signals are provided internally using a column
counter and the clock signal



Latency
¾ Transfers between the memory and the processor
involve single words of data or small blocks of words
¾ The speed and efficiency of these transfers have a
large impact on the performance of a computer
system
¾ A good indication of the performance is given by two
parameters: latency and bandwidth
¾ The term memory latency is used to refer to the
amount of time it takes to transfer a word of data to
or from the memory
¾ In block transfers, the term latency is used to denote
the time it takes to transfer the first word of data



Bandwidth
¾ When transferring blocks of data, it is of interest
to know how much time is needed to transfer an
entire block
¾ Since blocks can be variable in size, it is useful to
define a performance measure in terms of the
number of bits or bytes that can be transferred in
one second. This measure is often referred to as
the memory bandwidth
¾ The bandwidth of a memory unit depends on the
speed of access to the stored data and on the
number of bits that can be accessed in parallel



Structure of Larger Memories
[Diagram: a larger memory assembled from multiple chips; the two high-order address
bits drive a 2-bit decoder that selects one row of chips, and the chips in the
selected row together supply the 32-bit data word D31-24, D23-16, D15-8, D7-0.]


Memory System Considerations
¾ Memory controller
‹ To reduce the number of pins, the DRAM chips use multiplexed
address inputs
‹ A typical processor issues all bits of an address at the same time
‹ The required multiplexing of address bits is usually performed
by a memory controller circuit, which is interposed between the
processor and the DRAM as shown below
[Diagram: the memory controller sits between the processor and the memory. It
receives the full address, R/W, request, and clock from the processor, and
generates the multiplexed row/column address together with RAS, CAS, R/W, CS, and
the clock for the memory; the data lines connect the processor and the memory
directly.]
Refresh Overhead
¾ All DRAMs have to be refreshed
¾ Consider an SDRAM whose cells are arranged in
8K (8192) rows. Suppose that it takes four clock
cycles to access (read) each row. Then it takes
8192x4=32768 cycles to refresh all rows
¾ At a clock rate of 133 MHz, the time needed to
refresh all rows is 32768/(133×10^6) ≈ 246×10^-6
seconds
¾ Thus, the refreshing process occupies 0.246ms in
each 64-ms time interval. The refresh overhead is
0.246/64=0.0038.
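The overhead estimate above can be reproduced with a short sketch (variable names are illustrative):

```python
rows = 8192              # 8K rows
cycles_per_row = 4       # clock cycles to access (read) one row
clock_hz = 133e6         # 133 MHz clock
interval_s = 64e-3       # every row must be refreshed once per 64 ms

refresh_cycles = rows * cycles_per_row        # 32768 cycles
refresh_time = refresh_cycles / clock_hz      # ≈ 246 microseconds
overhead = refresh_time / interval_s          # ≈ 0.0038 of each interval
```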



Read Only Memory (ROM)
¾ A 4×4-bit NOR-based ROM array

[Circuit: word lines R1–R4 cross bit lines C1–C4; a transistor placed at an
intersection pulls the bit line low when its row is selected.]

R1 R2 R3 R4 | C1 C2 C3 C4
1  0  0  0  | 0  1  0  1
0  1  0  0  | 0  0  1  1
0  0  1  0  | 1  0  0  1
0  0  0  1  | 0  1  1  0
Read Only Memory (ROM)
¾ A 4×4-bit NAND-based ROM array

[Circuit: word lines R1–R4 cross bit lines C1–C4; the selected row is driven low
(active-low), and the presence or absence of a transistor at an intersection
encodes the stored bit.]

R1 R2 R3 R4 | C1 C2 C3 C4
0  1  1  1  | 0  1  0  1
1  0  1  1  | 0  0  1  1
1  1  0  1  | 1  0  0  1
1  1  1  0  | 0  1  1  0



Memory Hierarchy
[Diagram: the memory hierarchy, from top to bottom: processor registers, primary
(L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory.
Going down the hierarchy the size increases; going up, the speed and the cost per
bit increase.]



Cache Memories
¾ The speed of the main memory is very low in
comparison with the speed of modern processors
¾ Hence, it is important to devise a scheme that
reduces the time needed to access the necessary
information
¾ Since the speed of the main memory unit is limited by
electronic and packaging constraints, the solution
must be sought in a different architectural
arrangement
¾ An efficient solution is to use a fast cache memory
which essentially makes the main memory appear to
the processor to be faster than it really is



Effectiveness of the Cache Memory
¾ The effectiveness of the cache mechanism is
based on a property of computer programs called
locality of reference
¾ Analysis of programs shows that most of their
execution time is spent on routines in which
many instructions are executed repeatedly. These
instructions may constitute a simple loop, nested
loops, or a few procedures that repeatedly call
each other
¾ Many instructions in localized areas of the
program are executed repeatedly during some
time period, and the remainder of the program is
accessed relatively infrequently. This is referred
to as locality of reference
Temporal and Spatial Localities
¾ The temporal aspect of the locality of reference
suggests that whenever an information item
(instruction or data) is first needed, this item should
be brought into the cache where it will hopefully
remain until it is needed again
¾ The spatial aspect of the locality of reference suggests
that instead of fetching just one item from the main
memory to the cache, it is useful to fetch several
items that reside at adjacent addresses as well. We
will use the term block to refer to a set of contiguous
address locations of some size. Another term that is
often used to refer to a cache block is cache line



Use of a Cache Memory
¾ Consider the simple arrangement shown below

[Diagram: Processor ↔ Cache ↔ Main memory]

¾ Usually, the cache memory can store a reasonable
number of blocks at any given time, but this number
is small compared to the total number of blocks in
the main memory
¾ The correspondence between the main memory
blocks and those in the cache is specified by a
mapping function
Cache Replacement Algorithm
¾ When the cache is full and a memory word that is
not in the cache is referenced, the cache control
hardware must decide which block should be
removed to create space for the new block that
contains the referenced word
¾ The collection of rules for making this decision
constitutes the replacement algorithm



Hits & Misses
¾ Read hits
‹ this is what we want
¾ Read misses
‹ stall the CPU, fetch block from memory, deliver to
cache, restart
¾ Write hits:
‹ can replace data in cache and memory (write-through)
‹ write the data only into the cache (write-back the
cache later)
¾ Write misses:
‹ read the entire block into the cache, then write the
word
Mapping Functions
¾ To discuss possible methods for specifying where
memory blocks are placed in the cache, we use a
specific small example
¾ Consider a cache consisting of 128 blocks of 16 words
each, for a total of 2048 (2K) words, and assume that
the main memory is addressable by a 16-bit address.
The main memory has 64K words, which we will
view as 4K blocks of 16 words each
¾ Cache mapping functions
‹ Direct mapping
‹ Associative mapping
‹ Set-associative mapping
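The three mapping functions differ only in how the 16-bit address of this example is partitioned; a sketch (the helper name is illustrative) reproduces the field widths:

```python
import math

def address_fields(address_bits, cache_blocks, words_per_block, ways=None):
    """Split an address into (tag, index, word) field widths.
    ways=None means fully associative; ways=1 means direct-mapped."""
    word = int(math.log2(words_per_block))
    if ways is None:
        index = 0                          # associative: no index field
    else:
        sets = cache_blocks // ways        # direct-mapped: one block per set
        index = int(math.log2(sets))
    tag = address_bits - index - word
    return tag, index, word

# 128-block cache, 16 words per block, 16-bit addresses (the example above):
assert address_fields(16, 128, 16, ways=1) == (5, 7, 4)   # direct-mapped
assert address_fields(16, 128, 16) == (12, 0, 4)          # associative
assert address_fields(16, 128, 16, ways=2) == (6, 6, 4)   # 2-way set-associative
```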



Direct-Mapped Cache
[Diagram: a direct-mapped cache of 128 tagged blocks. Main memory block j maps to
cache block j modulo 128, so blocks 0, 128, 256, … of the 4096 main memory blocks
share cache block 0, and so on. The 16-bit main memory address is split into a
5-bit tag, a 7-bit block field, and a 4-bit word field.]



Associative-Mapped Cache
[Diagram: an associative-mapped cache; any of the 4096 main memory blocks may be
placed in any of the 128 tagged cache blocks. The 16-bit main memory address is
split into a 12-bit tag and a 4-bit word field.]



Set-Associative-Mapped Cache
[Diagram: a set-associative-mapped cache with 64 sets of two tagged blocks each
(Set 0 holds blocks 0–1, …, Set 63 holds blocks 126–127). Main memory block j maps
to set j modulo 64 and may occupy either block of that set. The 16-bit main memory
address is split into a 6-bit tag, a 6-bit set field, and a 4-bit word field.]



K-Way Set-Associative Cache
¾ The number of blocks per set is a parameter that
can be selected to suit the requirements of a
particular computer
¾ A cache that has k blocks per set is referred to as
a k-way set-associative cache



Tag Memory Structure
¾ For performance reasons, the tags must be searched
in parallel
¾ An example of the structure of a tag word
[Circuit: a CAM-style tag word Wi. Each cell stores one tag bit and compares it
against the search lines bit[0]…bit[w-1]; the match line Mi, precharged to Vdd
through P, remains high only if every stored bit matches, so all tag words are
compared in parallel by the comparison logic.]
Comparison of Different Mapping Schemes
¾ Direct-mapped cache
‹ Cost: low
‹ Flexibility: low
¾ Associative-mapped cache
‹ Cost: high
‹ Flexibility: high
¾ Set-associative-mapped cache
‹ Cost: medium
‹ Flexibility: medium



Replacement Algorithms
¾ In a direct-mapped cache, the position of each block
is predetermined; hence, no replacement strategy
exists
¾ In associative and set-associative caches there exists
some flexibility. When a new block is to be brought
into the cache and all the positions that it may
occupy are full, the cache controller must decide
which of the old blocks to overwrite. This is an
important issue because the decision can be a strong
determining factor in system performance
¾ Thus efficient replacement algorithms are needed. In
general, the objective is to keep blocks in the cache
that are likely to be referenced in the near future
LRU Replacement Algorithm
¾ Because programs usually stay in localized areas
for reasonable periods of time, there is a high
probability that the blocks that have been
referenced recently will be referenced again soon
¾ Therefore, when a block is to be overwritten, it is
sensible to overwrite the one that has gone the
longest time without being referenced. This block
is called the least recently used (LRU) block, and
the technique is called the LRU replacement
algorithm
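A minimal sketch of LRU bookkeeping, using Python's OrderedDict to track recency (the class and its names are illustrative, not part of the slides):

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` blocks; evicts the least recently used one."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block address -> data, oldest first

    def access(self, block, data=None):
        if block in self.blocks:             # hit: mark block most recently used
            self.blocks.move_to_end(block)
        else:                                # miss: evict the LRU block if full
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)
            self.blocks[block] = data

cache = LRUCache(2)
cache.access(1); cache.access(2); cache.access(1); cache.access(3)
# block 2 had gone the longest without being referenced, so it was evicted
```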



Example of Mapping Techniques
¾ Assume the data cache has space for only eight
blocks of data. Also assume that each block consists
of only one 16-bit word of data and the memory is
word addressable with 16-bit addresses
¾ Assume the LRU replacement algorithm is used for
block replacement in the cache
¾ Let the following program be run, where a 4×10 array of
numbers is stored in main memory locations 7A00
through 7A27

    SUM := 0
    for j := 0 to 9 do
        SUM := SUM + A(0,j)
    end
    AVE := SUM/10
    for i := 9 downto 0 do
        A(0,i) := A(0,i)/AVE
    end
Data Stored in the Main Memory
Memory address   Binary address      Contents
(7A00)           0111101000000000    A(0,0)
(7A01)           0111101000000001    A(1,0)
(7A02)           0111101000000010    A(2,0)
(7A03)           0111101000000011    A(3,0)
(7A04)           0111101000000100    A(0,1)
  …
(7A24)           0111101000100100    A(0,9)
(7A25)           0111101000100101    A(1,9)
(7A26)           0111101000100110    A(2,9)
(7A27)           0111101000100111    A(3,9)

[In the original figure, brackets mark the high-order portion of the binary address
used as the tag for the direct-mapped, set-associative, and associative caches.]



Contents of a Direct-Mapped Data Cache

Contents of the data cache after pass:

Block position   j=1      j=3      j=5      j=7      j=9      i=6      i=4      i=2      i=0
0                A(0,0)   A(0,2)   A(0,4)   A(0,6)   A(0,8)   A(0,6)   A(0,4)   A(0,2)   A(0,0)
1
2
3
4                A(0,1)   A(0,3)   A(0,5)   A(0,7)   A(0,9)   A(0,7)   A(0,5)   A(0,3)   A(0,1)
5
6
7



Contents of an Associative Data Cache

Contents of the data cache after pass:

Block position   j=7      j=8      j=9      i=1      i=0
0                A(0,0)   A(0,8)   A(0,8)   A(0,8)   A(0,0)
1                A(0,1)   A(0,1)   A(0,9)   A(0,1)   A(0,1)
2                A(0,2)   A(0,2)   A(0,2)   A(0,2)   A(0,2)
3                A(0,3)   A(0,3)   A(0,3)   A(0,3)   A(0,3)
4                A(0,4)   A(0,4)   A(0,4)   A(0,4)   A(0,4)
5                A(0,5)   A(0,5)   A(0,5)   A(0,5)   A(0,5)
6                A(0,6)   A(0,6)   A(0,6)   A(0,6)   A(0,6)
7                A(0,7)   A(0,7)   A(0,7)   A(0,7)   A(0,7)



Contents of a Set-Associative Data Cache

Contents of the data cache after pass:

Block position        j=3      j=7      j=9      i=4      i=2      i=0
Set 0   0             A(0,0)   A(0,4)   A(0,8)   A(0,4)   A(0,4)   A(0,0)
        1             A(0,1)   A(0,5)   A(0,9)   A(0,5)   A(0,5)   A(0,1)
        2             A(0,2)   A(0,6)   A(0,6)   A(0,6)   A(0,2)   A(0,2)
        3             A(0,3)   A(0,7)   A(0,7)   A(0,7)   A(0,3)   A(0,3)
Set 1   0
        1
        2
        3



Performance Considerations
¾ Two key factors in the commercial success of a
computer are performance and cost; the best
possible performance at the lowest cost is the
objective
¾ The challenge in considering design alternatives
is to improve the performance without increasing
the cost
¾ A common measure of success is the
price/performance ratio



Addressing Multiple-Module Memories
¾ Consecutive words in a module
[Diagram: the memory address is split into a k-bit module field (high-order) and
an m-bit address-in-module field; each of the modules 0 … n-1 has its own address
buffer register (ABR) and data buffer register (DBR).]



Addressing Multiple-Module Memories
¾ Consecutive words in consecutive modules
[Diagram: the memory address is split into an m-bit address-in-module field
(high-order) and a k-bit module field (low-order); each of the modules 0 … 2^k-1
has its own ABR and DBR.]

¾ This memory structure is called memory
interleaving. The low-order k bits of the memory
address select a module, and the high-order m
bits name a location within that module.
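The interleaved address split can be sketched as follows (the function name is illustrative):

```python
def interleaved_split(address, k):
    """Low-order k bits select the module; the high-order bits name the
    location within that module (consecutive words go to consecutive modules)."""
    module = address & ((1 << k) - 1)
    location = address >> k
    return module, location

# With four modules (k = 2), consecutive addresses rotate through modules 0..3:
assert [interleaved_split(a, 2)[0] for a in range(8)] == [0, 1, 2, 3, 0, 1, 2, 3]
assert interleaved_split(13, 2) == (1, 3)   # 13 = 0b1101 -> module 1, location 3
```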
Access Time Reduction
¾ Consider the time needed to transfer a block of data
from the main memory to the cache when a read
miss occurs.
‹ Suppose that a cache with 8-word blocks is used. On a read miss,
the block that contains the desired word must be copied from the
memory into the cache
‹ Assume that the hardware has the following properties. It takes
one clock cycle to send an address to the main memory. The main
memory allows the first word to be accessed in 8 cycles, but
subsequent words of the block are accessed in 4 cycles per word
‹ Also, one clock cycle is needed to send one word to the cache

¾ If a single memory module is used, then the time
needed to load the desired block into the cache is
‹ 1+8+(7×4)+1 = 38 cycles



Access Time Reduction
¾ Suppose now that the memory is constructed as four
interleaved modules, using the scheme shown above.
¾ When the starting address of the block arrives at the
memory, all four modules begin accessing the required
data, using the high-order bits of the address. After 8
cycles, each module has one word of data in its data
buffer register (DBR). These words are transferred to the
cache, one word at a time, during the next 4 cycles.
During this time, the next word in each module is
accessed. Then it takes another 4 cycles to transfer these
words to the cache
¾ Therefore, the total time needed to load the block from
the interleaved memory is
‹ 1+8+4+4=17 cycles
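The two block-load times can be reproduced with a small sketch whose parameter defaults follow the timing assumptions above (function names are illustrative):

```python
def single_module_cycles(block_words, addr=1, first=8, later=4, send=1):
    """One cycle to send the address, 8 cycles for the first word, 4 for each
    later word, plus one cycle to send the final word to the cache."""
    return addr + first + (block_words - 1) * later + send

def interleaved_cycles(addr=1, first=8, group=4):
    """Four modules access in parallel; after the first access the 8 words
    drain to the cache in two groups of 4 cycles."""
    return addr + first + group + group

assert single_module_cycles(8) == 38   # 1 + 8 + 7*4 + 1
assert interleaved_cycles() == 17      # 1 + 8 + 4 + 4
```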



Hit Rate and Miss Penalty
¾ A successful access to data in a cache is called a
hit. The number of hits stated as a fraction of all
attempted accesses is called the hit rate, and the
miss rate is the number of misses stated as a
fraction of attempted accesses
¾ High hit rates, well over 0.9, are essential for
high-performance computers
¾ Performance is adversely affected by the actions
that must be taken after a miss. The extra time
needed to bring the desired information into the
cache is called the miss penalty



Average Access Time
¾ Consider the impact of the cache on the overall
performance of the computer. Let h be the hit rate, M the
miss penalty, that is, the time to access information in the
main memory, and C the time to access information in the
cache. The average access time experienced by the
processor is
‹ tave=hC+(1-h)M
¾ Consider a high-performance processor that has two levels of
caches, L1 and L2. The average access time experienced
by the processor is
‹ tave=h1C1+(1-h1)h2C2+(1-h1)(1-h2)M
‹ where h1 is the hit rate in the L1 cache; h2 is the hit rate in the L2
cache; C1 is the time to access information in the L1 cache; C2 is
the time to access information in the L2 cache
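Both formulas can be written down directly; a sketch (function names are illustrative):

```python
def t_ave(h, c, m):
    """Average access time with one cache level: t_ave = hC + (1 - h)M."""
    return h * c + (1 - h) * m

def t_ave_2level(h1, c1, h2, c2, m):
    """Two cache levels: t_ave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M."""
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m
```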



Example
¾ If the computer has no cache, then, using a fast processor and a typical
DRAM main memory, it takes 10 clock cycles for each memory read
access. Suppose the computer has a cache that holds 8-word blocks and
an interleaved main memory. Then, as shown above, 17 cycles are
needed to load a block into the cache. Assume that 30% of the
instructions in a typical program perform a read or a write operation,
which means that there are 130 memory accesses for every 100
instructions executed. Assume that the hit rates in the cache are 0.95 for
instructions and 0.9 for data. Let us further assume that the miss penalty
is the same for both read and write accesses. Then a rough estimate of the
improvement in performance that results from using the cache can be
obtained as follows:
Time without cache / Time with cache
  = (130×10) / (100(0.95×1 + 0.05×17) + 30(0.9×1 + 0.1×17)) ≈ 5.04

¾ It is also interesting to consider how effective this cache is compared to
an ideal cache that has a hit rate of 100%

(100(0.95×1 + 0.05×17) + 30(0.9×1 + 0.1×17)) / (130×1) ≈ 1.98
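The two ratios above can be checked numerically (variable names are illustrative):

```python
h_instr, h_data = 0.95, 0.9          # hit rates for instructions and data
hit_cycles, miss_cycles = 1, 17      # cache hit vs. block-load miss penalty
no_cache_cycles = 10                 # DRAM access time without a cache

# 130 memory accesses per 100 instructions: 100 instruction + 30 data accesses
time_no_cache = 130 * no_cache_cycles
time_with_cache = (100 * (h_instr * hit_cycles + (1 - h_instr) * miss_cycles)
                   + 30 * (h_data * hit_cycles + (1 - h_data) * miss_cycles))
time_ideal = 130 * hit_cycles        # a 100%-hit-rate cache

speedup = time_no_cache / time_with_cache    # ≈ 5.04
vs_ideal = time_with_cache / time_ideal      # ≈ 1.98
```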
Virtual Memories
¾ Techniques that automatically move program
and data blocks into the physical main memory
when they are required for execution are called
virtual-memory techniques
¾ Programs, and hence the processor, reference an
instruction and data space that is independent of
the available physical main memory space. The
binary addresses that the processor issues for
either instructions or data are called virtual or
logical addresses. These addresses are translated
into physical addresses by a combination of
hardware and software components
Virtual Memory Organization

[Diagram: the processor issues a virtual address to the MMU; the MMU produces a
physical address used to access the cache and the main memory, and data flow back
to the processor. The main memory exchanges pages with disk storage via DMA
transfer.]

A special hardware unit, called the Memory Management Unit (MMU), translates
virtual addresses into physical addresses. When the desired data are in the main
memory, these data are fetched as described in our presentation of the cache
mechanism. If the data are not in the main memory, the MMU causes the operating
system to bring the data into the memory from the disk. Transfer of data between
the disk and the main memory is performed using the DMA scheme.


Address Translation
¾ A simple method for translating virtual addresses into
physical addresses is to assume that all programs and
data are composed of fixed-length units called pages,
each of which consists of a block of words that occupy
contiguous locations in the main memory.
¾ Conceptually, cache techniques and virtual-memory
techniques are very similar. They differ mainly in the
details of their implementation
‹ The cache bridges the speed gap between the processor and the
main memory and is implemented in hardware
‹ The virtual-memory mechanism bridges the size and speed gaps
between the main memory and secondary storage and is usually
implemented in part by software techniques



Virtual-Memory Address Translation
[Diagram: the virtual address from the processor is split into a virtual page
number and an offset. The page table base register supplies the page table
address; the virtual page number indexes the page table, whose entry holds control
bits and the page frame in memory. The page frame concatenated with the offset
forms the physical address in main memory.]



Use of an Associative-Mapped TLB
[Diagram: the virtual address from the processor is split into a virtual page
number and an offset. Each TLB (Translation Lookaside Buffer) entry holds a
virtual page number, control bits, and the corresponding page frame in memory.
The virtual page number is compared (=?) associatively against all entries; on a
hit, the matching page frame concatenated with the offset forms the physical
address in main memory, and on a miss the page table must be consulted.]



Use of an Associative-Mapped TLB
¾ An essential requirement is that the contents of
the TLB be coherent with the contents of page
tables in the memory. When the operating system
changes the contents of page tables, it must
simultaneously invalidate the corresponding
entries in the TLB.
¾ Address translation proceeds as follows
‹ Given a virtual address, the MMU looks in the TLB for
the referenced page. If the page table entry for this
page is found in the TLB, the physical address is
obtained immediately. If there is a miss in the TLB,
then the required entry is obtained from the page table
in the main memory and the TLB is updated
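The lookup procedure can be sketched as follows; the dictionary-based TLB and the 4 KB page size are illustrative assumptions, not from the slides:

```python
PAGE_SIZE = 4096   # assumed page size for this sketch

def translate(vaddr, tlb, page_table):
    """Look in the TLB first; on a miss, consult the page table and update the TLB."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage in tlb:                 # TLB hit: physical address obtained immediately
        frame = tlb[vpage]
    else:                            # TLB miss: get the entry from the page table
        frame = page_table[vpage]
        tlb[vpage] = frame           # ... and update the TLB
    return frame * PAGE_SIZE + offset

tlb = {}
page_table = {0: 5, 1: 9}
paddr = translate(PAGE_SIZE + 100, tlb, page_table)   # virtual page 1 -> frame 9
```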
Page Fault
¾ When a program generates an access request to a
page that is not in the main memory, a page fault is
said to have occurred. The whole page must be
brought from the disk into the memory before access
can proceed
¾ When it detects a page fault, the MMU asks the
operating system to intervene by raising an exception
(interrupt). Processing of the active task is
interrupted, and control is transferred to the
operating system. The operating system then copies
the requested page from the disk into the main
memory and returns control to the interrupted task.

