Test 6 PracticeQuestion Cachememory 1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 21

Practice problems: Cache memory

1. What is Hierarchy of Memories?

Programmers want memory to be fast, large, and cheap. Question may be asked whether we need a very
fast and very large memory, since programs access a small proportion of their address space at any time!
Architects have found that they can address these conflicting demands with a hierarchy of memories, with
the fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest, largest,
and cheapest per bit at the bottom. To optimize cost and speed, combination of memory devices are
used. Smaller amount of expensive but fast memory is used close to the processor whereas large
amounts of cheaper but slower memory is used farther from the processor. Hierarchy of memories
give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy and nearly
as big and cheap as the bottom of the hierarchy. Due to hierarchy, to CPU, it would appear as fast as
most expensive memory and as big as the cheapest.
The fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest,
largest, and cheapest per bit at the bottom.

2. What is principle of locality?


 Programs access a small proportion of their address space at any time
 Temporal locality
 Items accessed recently are likely to be accessed again soon
 instructions in a loop
 Spatial locality
 Items near those accessed recently are likely to be accessed soon
 sequential instruction access, array data
3. Cache Memory reduces frequency of access to RAM while CPU is running a program
Reduces average memory access time of a program
Reduces program execution time
Caches give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy
and nearly as big and cheap as the bottom of the hierarchy.

4.
5. Define Hit rate, Hit time, Miss rate, miss penalty, average memory access time, and memory stall cycles.
• Hit Time (HT): The hit time is how long it takes data to be sent from the cache to the processor.
This is usually fast, on the order of 1-3 clock cycles.
• Miss Penalty (MP): The miss penalty is the time to copy data from main memory to the cache.
This often requires dozens of clock cycles (at least). The miss rate is the percentage of misses.
• Miss Rate (MR): 1 – Hit Ratio
• The average memory access time, or AMAT, can then be computed.
AMAT = Hit time + (Miss rate × Miss penalty)
This is just averaging the amount of time for cache hits and the amount of time for cache misses
6. What is locality of reference? Define temporal locality and spatial locality.
7. How does CPU use cache memory? Explain briefly
8. List the steps in sequential order, how CPU access/uses cache memory while it runs a program.
9. What is cache miss? What is the consequence of cache miss? Explain.
10. What are the performance measures of cache memory?
11. What is the justification of block transfer from RAM to cache in the event of cache miss? explain
12. Parameters that matter: RAM

Hit Time (HT) Cache


C
P
Miss Rate (MR) U
Miss Penalty (MP) Access time 5ns
Access time 30ns
AMAT = Hit Time + Miss Rate x Miss Penalty
Suppose a program (all instructions are register based), initially loaded into RAM. The Hit ratio of
Cache is 80%. Calculate average access time.
Solution: AMAT = Hit Time + Miss Rate x Miss Penalty
Average Memory Access Time: 5ns +0.2×30ns = 11ns

13. If a memory system consists of a single external cache with an access time of 20 ns and a hit rate of
0.92, and a main memory with an access time of 60 ns, what is the average memory access time
(AMAT) of this system?
AMAT = Hit time + Miss rate × Miss penalty

AMAT = 20 + (0.08x60) = 24.8 ns

14. If Hit ratio of Cache is 80%, Hit Time (HT) = 5ns and Miss Penalty (MP) = 30ns, calculate AMAT.
Miss Rate = 1- 0.8 = 0.2
AMAT = Hit Time + Miss Rate x Miss Penalty
= 5ns +0.2×30ns = 11ns
15. Hit time is also important for performance Average memory access time (AMAT)
AMAT = Hit time + Miss rate × Miss penalty

Example
CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, cache miss rate = 5%

AMAT = 1 + 0.05 × 20 = 2ns


2 cycles per instruction

16. Processor clock cycle: 200 ns, Miss Penalty of 50 clock cycles, Miss Ratio of 0.02 misses/instruction,
and Hit time of 1 clock cycle

AMAT = Hit time + Miss rate × Miss penalty


= 1+ 0.02 × 50 = 2 clock cycles = 400 ns

Which improvement would be best?


A) 190 ns clock:

AMAT = 1+ 0.02 × 50 = 2 clock cycles = 380 ns

B) MP (Miss Penalty) of 40 clock cycles:

AMAT = 1+ 0.02 × 40 = 1.8 clock cycles = 3600 ns

C) MR(Miss Ratio) of 0.015 misses/instruction:

AMAT = 1+ 0.015 × 50 = 1.75 clock cycles = 350 ns

17. A machine has a base CPI of 2 clock cycles. Measurements obtained show that the instruction miss rate
is 12% and the data miss rate is 6%, and that on average, 30% of all instructions contain one data
reference. The miss penalty for the cache is 10 cycles. What is the total CPI?
Solution:
Please note that Base CPI means the clock cycles required to read from cache memory. You can also say
that this is hit time in CPU clock cycles, i.e., access time of Cache memory in CPU clock cycles.
Average CPI = 2.0 + instruction miss cycles + data miss cycles
= 2.0 + 0.12 x 10 + 0.30 x 0.06 x 10 = 2.0 + 1.2 + 0.18
= 3.38

18. Memory Stall Cycles = Number of Memory Accesses × Miss rate × Miss penalty

19. Machine has a base CPI of 1 clock cycles. Measurements obtained show that the instruction miss rate is
15% and the data miss rate is 6%, and that on average, 40% of all instructions contain one data
reference. The miss penalty for the cache is 20 cycles. What is the total CPI?
Solution:

Average CPI = Base CPI + instruction miss cycles + data miss cycles
I-cache miss rate = 15%
D-cache miss rate = 6%
Miss penalty = 20 cycles
Base CPI (ideal cache) = 1
Load & stores are 40% of instructions
Miss cycles per instruction
I-cache: 1 x 0.15 × 20 = 3
D-cache: 1x 0.40 × 0.06 × 20 = 0.48
Actual CPI = 1 + 3 + 0.48 = 4.48

20. Suppose our processor has separate L1 instruction cache and data cache. Our CPI-base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our Instruction cache miss rate is 3% while our Data cache
miss rate is 10%. 40% of our instructions are loads or stores.
a. What is our processor's CPI stall?

CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles
= 2 + 1 ×0.03 ×100 + 0.4×0.1×100
= 2 +0.07 × 100 = 9 cycles

To improve the performance our processor, we add a unified L2 cache between the L1 caches and
memory. Our L2 cache has a hit time of 10 cycles and a global miss rate of 2%.
b. What is our new CPI stall?

CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles + L2 inst miss cycles + L2 data miss
cycles
= 2 + (1 ×0.03 × 10) + (0.4 ×0.1 × 10) + (1×0.02 × 100) + (0.4 ×0.02 × 100)
= 2 + 0.3 + 0.4 + 2 + 0.8
= 2 + 3.5 = 5.5 cycles
21.  Assume
1. I-cache miss rate 3%.
2. D-cache miss rate 5%.
3. 40% of instructions reference data.
4. Miss penalty of 50 cycles.
5. Base CPI is 2.
 What is the CPI including the misses? Average CPI
 How much slower is the machine when misses are taken into account? Average CPI/base CPI
 Redo the above if the I-miss penalty is reduced to 10 (D-miss still 50)
 With I-miss penalty back to 50, what is performance if CPU (and the caches) are 100 times faster
22.  Assume
o 5% I-cache misses.
o 10% D-cache misses.
o 1/3 of the instructions access data.
 What is the CPI if the miss penalty is 12?
 What is the CPI if miss penalty is 24 clock)?

23. Suppose our processor has separate L1 instruction cache and data cache. Our CPIbase is 2 clock cycles,
whereas memory accesses take 100 cycles. Our I$ miss rate is 3% while our D$ miss rate is 10%. 40%
of our instructions are loads or stores.

• a. What is our processor's CPIstall?


CPIstall = CPIbase + L1 inst miss cycles + L1 data miss cycles
= 2 + 1 x 0.03 x 100 + 0.4 x 0.1 x 100
= 2 + 0.07 x 100 = 9 cycles
To improve the performance our processor, we add a unified L2 cache between the L1 caches
and memory. Our L2 cache has a hit time of 10 cycles and a global miss rate of 2%.
• b. What is our new CPIstall?
CPIstall=CPIbase + L1 inst miss cycles + L1 data miss cycles + L2 inst miss cycles + L2 data
miss cycles
= 2 + (1 x 0.03 x 10) + (0.4 x 0.1 x 10) + (1 x 0.02 x 100 ) + (0.4 x 0.02 x 100)
= 2 + 0.3 + 0.4 + 2 + 0.8
= 2 + 3.5 = 5.5 cycles

24. Assume a memory access to main memory on a cache "miss" takes 30 ns and a cache "hit" takes 3 ns. If
80% of the processor's memory requests result in a cache "hit", what is the average memory access time?
Solution:
(0.8 × 3 ns) + (0.2 × 33 ns) = 2.4 ns + 6.6 ns
= 9 ns

AMAT = 3 + 0.2 x 30 = 9ns

25. I-cache miss rate = 2%


D-cache miss rate = 4%
Miss penalty = 100 cycles
Base CPI (ideal cache) = 2
Load & stores are 36% of instructions
Solution:

Miss cycles per instruction


I-cache: 0.02 × 100 = 2
D-cache: 0.36 × 0.04 × 100 = 1.44
Actual CPI = 2 + 2 + 1.44 = 5.44

Ideal CPI = Base CPI = 2


Speed up = 5.44/2 =2.72

Ideal CPU is 2.72 times faster


26. CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
Calculate: Average memory access time (AMAT)
Solution:

AMAT = Hit time + Miss rate × Miss penalty


AMAT = 1 + 0.05 × 20 = 2ns
2 cycles per instruction
27.

28. Suppose our processor has separate L1 instruction cache and data cache. Our CPI base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our instruction miss rate is 3% while our data miss rate is
10%. 40% of our instructions are loads or stores.

a. What is our processor's CPI stall?


CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles
= 2 + 1 x 0.03 x 100 + 0.4 x 0.1 x 100
= 2 + 0.07 x 100 = 9 cycles
To improve the performance our processor, we add a unified L2 cache between the L1 caches and
memory. Our L2 cache has a hit time of 10 cycles and a global miss rate of 2%.
b. What is our new CPI stall?
CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles + L2 inst miss cycles + L2 data miss
cycles
= 2 + (1 x 0.03 x 10) + (0.4 x 0.1 x 10) + (1 x 0.02 x 100 ) + (0.4 x 0.02 x 100)
= 2 + 0.3 + 0.4 + 2 + 0.8
= 2 + 3.5 = 5.5 cycles

Question 2:
You want your AMAT to be <=2 cycles. You have two levels of cache.
L1 Hit Time is 1 cycle; L1 miss rate is 10%;
L2 Hit Time is 3 cycles; L2 Miss Penalty is 100 cycles
What must you optimize your L2 miss rate to be?
2 = 1 + 0.1x ( 3 +100x)
x = 0.07 = 7% miss rate
29. Assume that 33% of the instructions in a program are data accesses. The cache hit ratio is 97% and the
hit time is one cycle, but the miss penalty is 20 cycles.
AMAT = Hit time + (Miss rate x Miss penalty)

30. • CPI = 1.0 when cache has 100% hit rate

data accesses are only performed in loads and stores

load/stores make up 50% of all instructions

miss penalty = 50 clock cycles

miss rate = 1%

• How much faster is an ideal machine?

CPU time ideal = IC x 1.0 xclock cycle time

CPU time this machine = IC x (1.0 + 1.50 x 0.01 x 50) x clock cycle time

= IC x 1.75 x clock cycle time

ideal machine is 1.75 times faster (75%)

31. Assume that 33% of the instructions in a program are data accesses. The cache hit ratio is 97% and the
hit time is one cycle, but the miss penalty is 20 cycles.

Solution:
Average Memory Access Time (AMAT) = Hit time + (Miss rate x Miss penalty)

Miss rate = 1 – 0.97 = 0.03

AMAT = 1 + (0.03 × 0.33 × 20) = 1.198

32. Suppose our processor has separate L1 instruction cache and data cache. Our CPI-base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our Instruction cache hit rate is 97% while our Data cache
hit rate is 90%. 40% of our instructions are loads or stores.

[Stalled means the processor was not making forward progress with instructions, and usually happens
because it is waiting on memory I/O.]
Alternate statement:

Suppose our processor has separate L1 instruction cache and data cache. Our CPI-base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our Instruction cache miss rate is 3% while our Data cache
miss rate is 10%. 40% of our instructions are loads or stores.

a. What is our processor's CPI stall?

CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles
= 2 + 1 ×0.03 ×100 + 0.4×0.1×100
= 2 +0.07 × 100 = 9 cycles

33.
You want your Average Memory Access Time (AMAT) to be <=2 cycles. You have two levels of cache.
L1 Hit Time is 1 cycle; L1 miss rate is 10%;
L2 Hit Time is 3 cycles; L2 Miss Penalty is 100 cycles
What must you optimize your L2 miss rate to be?
2 = 1 +0.1× (3 + x ×100)
x = 0.07 = 7% miss rate

34. Assume
CPI = 1.0 when cache has 100% hit rate
Miss penalty = 50 clock cycles
Miss rate = 1%
How much faster is an ideal machine?

CPU time ideal = IC × 1.0 × clock cycle time


CPU time this machine = IC × (1.0 + 0.01 × 50) × clock cycle time = IC × 1.5 × clock cycle time
ideal machine is 1.5 times faster (50%)

35. A machine has a base CPI of 2 clock cycles. Measurements obtained show that the instruction miss
rate is 12% and the data miss rate is 6%, and that on average, 30% of all instructions contain one data
reference. The miss penalty for the cache is 10 cycles. What is the total CPI?

Effective CPI = 2.0 + instruction miss cycles + data miss cycles


= 2.0 + 0.12×10 + 0.30×0.06×10 = 2.0 + 1.2 + 0.18 = 3.38

36. I-cache miss rate = 2%


D-cache miss rate = 4%
Miss penalty = 100 cycles
Base CPI (ideal cache) = 2
Load & stores are 36% of instructions
Miss cycles per instruction
I-cache: 0.02 × 100 = 2
D-cache: 0.36 × 0.04 × 100 = 1.44
Actual CPI = 2 + 2 + 1.44 = 5.44
Ideal CPU is 5.44/2 =2.72 times faster

37. Assume

I-cache miss rate 3%.


D-cache miss rate 5%.
40% of instructions reference data.
miss penalty of 50 cycles.
Base CPI is 2.

What is the CPI including the misses?


How much slower is the machine when misses are taken into account?
Redo the above if the I-miss penalty is reduced to 10 (D-miss still 50)
With I-miss penalty back to 50, what is performance if CPU (and the caches) are 100 times faster

38. Assume
5% I-cache misses.
10% D-cache misses.
1/3 of the instructions access data.
The CPI = 4 if the miss penalty is 0. A 0miss penalty is not realistic of course.
What is the CPI if the miss penalty is 12?
What is the CPI if we upgrade to a double speed cpu+cache, but keep a single speed memory (i.e., a 24
clock miss penalty)?
How much faster is the double speed machine? It would be double speed if the miss penalty were 0 or if
there was a 0% miss rate.

----

Problem:
HT(L1) = 2ns ;
HT(L2) = 10ns ;
Miss Ratio(L1) = 6%,
Miss Ratio (L2) = 2%
RAM Access time = 100ns.
Calculate average memory access time.

If Average Memory Access Time = 3.358 ns, calculate what percent of instruction CPU reads from L1
cache memory?

Miss rate of L-1 = x%

Hit ratio of L-1 = (100 – x)

For the following

Problem:
HT(L1) = 2ns ; Hit Ratio(L1) = 70%
HT(L2) = 10ns ; Hit Ratio(L1) = 80%
HT(L3) = 20ns ;
Global Miss Rate = 4% and RAM Access time = 100ns
Calculate: Average Memory Access Time

Assume:
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
30% of instructions reference data L2 miss rate 4%
L2 time of 10 clock cycles Memory access time 90 clock cycles
Base CPI of 2.5 CPU Clock rate 3GHz
Which of the following implementation will be better?
a. A faster L2 of 6 clock cycles
b. A larger L2 of miss rate 3%
Average CPI: base CPI + L-1 Instruction miss cycles + L-1 Data Miss cycles + L-2 instruction miss cycles + L-2
Data Miss Cycles

base CPI = 2.5

L-1 Instruction miss cycles = Instruction count x Instruction MR x L-1 Miss Penalty = 1 x 0.03 x 10

L-1 data miss cycles = Data access count x data MR x L-1 Miss Penalty = (1 x0.3)x 0.08 x 10

L-2 Instruction miss cycles = Instruction count x Instruction MR x L-2 Miss Penalty = 1 x 0.04 x 90

L-2 data miss cycles = data access count x data MR x L-2 Miss Penalty = (1x0.3) x 0.04 x 90

Average CPI =

Case-a
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
30% of instructions reference data L2 miss rate 4%
L2 time of 6 clock cycles Memory access time 90 clock cycles
Base CPI of 2.5 CPU Clock rate 3GHz

Average CPI=

Case-b
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
30% of instructions reference data L2 miss rate 3%
L2 time of 10 clock cycles Memory access time 90 clock cycles
Base CPI of 2.5 CPU Clock rate 3GHz

Average CPI =

Which of the following implementation will be better?


a. A faster L2 of 6 clock cycles
b. A larger L2 of miss rate 3%
Which of the following implementation will be better?
a. A faster L2 of 6 clock cycles
b. A larger L2 of miss rate 3%

Assume:
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
40% of instructions reference data L2 miss rate 4%
L2 time of 10 clock cycles Memory access time 80 clock cycles
Base CPI of 2 CPU Clock rate 4GHz
Which of the following implementation will be better?
a. A faster L2 of 5 clock cycles
b. A larger L2 of miss rate 2%

Assume
• L1 I-cache miss rate 8%
• L1 D-cache miss rate 10%
• Load Instructions: 15% of total instruction count
• Store Instructions: 25% of total instruction count
• Data ref = 40%
• L2 miss rate 6%
• L2 time of 15ns = 15ns / 0.25ns = 60 cycles
• Memory access time 100ns = 100ns/0.25ns = 400 cycles
• Base CPI of 2 = 0.25 ns x 2 = 0.5 ns
• Clock rate 4GHz
CPU clock period = 1/4x109 = 0.25 x 10-9 sec = 0.25 ns
a) How many instructions per second does this machine execute?

Average Memory Access time (ns)


No of instructions per second does this machine execute = 1 /average memory access time (ns)

1/ 20 x 10-9 =

Average CPI (no of clock cycles)


No of instructions per second does this machine execute = 4GHz /average CPI = 4x109/average CPI

b) How many instructions per second would this machine execute if the L2 cache were eliminated?
Average CPI
No of instructions per second does this machine execute = 4GHz /average CPI

c) How many instructions per second would this machine execute if both caches were eliminated?
Average CPI
No of instructions per second does this machine execute = 4GHz /average CPI

d) How many instructions per second would this machine execute if the L2 cache had a 0% miss
rate (L1 as originally specified)?
e) How many instructions per second would this machine execute if both L1 caches had a 0% miss
rate?

Problem: For Intel Core i7-965 processor, assume that


HT(L1) = 4ns ; Miss Ratio(L1) = 10%
HT(L2) = 10ns ; Miss Ratio(L2) = 6%
HT(L3) = 20ns ; Miss Ratio(L3) = 2%
RAM Access time = 100ns
Calculate: Average Memory Access Time

Assume
• L1 I-cache miss rate 5%
• L1 D-cache miss rate 8%
• Load Instructions: 25% of total instruction count
• Store Instructions: 10% of total instruction count
• L2 miss rate 4%
• L2 time of 15ns
• Memory access time 90ns
• Base CPI of 2
• Clock rate 3GHz
f) How many instructions per second does this machine execute?
g) How many instructions per second would this machine execute if the L2 cache were eliminated?
h) How many instructions per second would this machine execute if both caches were eliminated?
i) How many instructions per second would this machine execute if the L2 cache had a 0% miss
rate (L1 as originally specified)?
j) How many instructions per second would this machine execute if both L1 caches had a 0% miss
rate?

Problem: For Intel Core i7-965 processor, assume that


HT(L1) = 5ns ; Miss Ratio(L1) = 10%
HT(L2) = 10ns ; Miss Ratio(L2) = 8%
HT(L3) = 15ns ; Miss Ratio(L3) = 2%
RAM Access time = 100ns
Calculate: Average Memory Access Time
AMAT = HT(L1) + MR(L1)xMP(L1)
= HT(L1) + MR(L1)x[HT(L2) + MR(L2)xMP(L-2)]
= HT(L1) + MR(L1)x[HT(L2) + MR(L2)x{HT(L-3) + MR(L-3) x MP(L-3)}]
= 5ns + 0.1 x [10ns + 0.08x{15ns + 0.02 x 100ns}] =

You might also like