Test 6 PracticeQuestion Cachememory 1
Test 6 PracticeQuestion Cachememory 1
Test 6 PracticeQuestion Cachememory 1
Programmers want memory to be fast, large, and cheap. Question may be asked whether we need a very
fast and very large memory, since programs access a small proportion of their address space at any time!
Architects have found that they can address these conflicting demands with a hierarchy of memories, with
the fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest, largest,
and cheapest per bit at the bottom. To optimize cost and speed, combination of memory devices are
used. Smaller amount of expensive but fast memory is used close to the processor whereas large
amounts of cheaper but slower memory is used farther from the processor. Hierarchy of memories
give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy and nearly
as big and cheap as the bottom of the hierarchy. Due to hierarchy, to CPU, it would appear as fast as
most expensive memory and as big as the cheapest.
The fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest,
largest, and cheapest per bit at the bottom.
4.
5. Define Hit rate, Hit time, Miss rate, miss penalty, average memory access time, and memory stall cycles.
• Hit Time (HT): The hit time is how long it takes data to be sent from the cache to the processor.
This is usually fast, on the order of 1-3 clock cycles.
• Miss Penalty (MP): The miss penalty is the time to copy data from main memory to the cache.
This often requires dozens of clock cycles (at least). The miss rate is the percentage of misses.
• Miss Rate (MR): 1 – Hit Ratio
• The average memory access time, or AMAT, can then be computed.
AMAT = Hit time + (Miss rate × Miss penalty)
This is just averaging the amount of time for cache hits and the amount of time for cache misses
6. What is locality of reference? Define temporal locality and spatial locality.
7. How does CPU use cache memory? Explain briefly
8. List the steps in sequential order, how CPU access/uses cache memory while it runs a program.
9. What is cache miss? What is the consequence of cache miss? Explain.
10. What are the performance measures of cache memory?
11. What is the justification of block transfer from RAM to cache in the event of cache miss? explain
12. Parameters that matter: RAM
13. If a memory system consists of a single external cache with an access time of 20 ns and a hit rate of
0.92, and a main memory with an access time of 60 ns, what is the average memory access time
(AMAT) of this system?
AMAT = Hit time + Miss rate × Miss penalty
14. If Hit ratio of Cache is 80%, Hit Time (HT) = 5ns and Miss Penalty (MP) = 30ns, calculate AMAT.
Miss Rate = 1- 0.8 = 0.2
AMAT = Hit Time + Miss Rate x Miss Penalty
= 5ns +0.2×30ns = 11ns
15. Hit time is also important for performance Average memory access time (AMAT)
AMAT = Hit time + Miss rate × Miss penalty
Example
CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, cache miss rate = 5%
16. Processor clock cycle: 200 ns, Miss Penalty of 50 clock cycles, Miss Ratio of 0.02 misses/instruction,
and Hit time of 1 clock cycle
17. A machine has a base CPI of 2 clock cycles. Measurements obtained show that the instruction miss rate
is 12% and the data miss rate is 6%, and that on average, 30% of all instructions contain one data
reference. The miss penalty for the cache is 10 cycles. What is the total CPI?
Solution:
Please note that Base CPI means the clock cycles required to read from cache memory. You can also say
that this is hit time in CPU clock cycles, i.e., access time of Cache memory in CPU clock cycles.
Average CPI = 2.0 + instruction miss cycles + data miss cycles
= 2.0 + 0.12 x 10 + 0.30 x 0.06 x 10 = 2.0 + 1.2 + 0.18
= 3.38
18. Memory Stall Cycles = Number of Memory Accesses × Miss rate × Miss penalty
19. Machine has a base CPI of 1 clock cycles. Measurements obtained show that the instruction miss rate is
15% and the data miss rate is 6%, and that on average, 40% of all instructions contain one data
reference. The miss penalty for the cache is 20 cycles. What is the total CPI?
Solution:
Average CPI = Base CPI + instruction miss cycles + data miss cycles
I-cache miss rate = 15%
D-cache miss rate = 6%
Miss penalty = 20 cycles
Base CPI (ideal cache) = 1
Load & stores are 40% of instructions
Miss cycles per instruction
I-cache: 1 x 0.15 × 20 = 3
D-cache: 1x 0.40 × 0.06 × 20 = 0.48
Actual CPI = 1 + 3 + 0.48 = 4.48
20. Suppose our processor has separate L1 instruction cache and data cache. Our CPI-base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our Instruction cache miss rate is 3% while our Data cache
miss rate is 10%. 40% of our instructions are loads or stores.
a. What is our processor's CPI stall?
CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles
= 2 + 1 ×0.03 ×100 + 0.4×0.1×100
= 2 +0.07 × 100 = 9 cycles
To improve the performance our processor, we add a unified L2 cache between the L1 caches and
memory. Our L2 cache has a hit time of 10 cycles and a global miss rate of 2%.
b. What is our new CPI stall?
CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles + L2 inst miss cycles + L2 data miss
cycles
= 2 + (1 ×0.03 × 10) + (0.4 ×0.1 × 10) + (1×0.02 × 100) + (0.4 ×0.02 × 100)
= 2 + 0.3 + 0.4 + 2 + 0.8
= 2 + 3.5 = 5.5 cycles
21. Assume
1. I-cache miss rate 3%.
2. D-cache miss rate 5%.
3. 40% of instructions reference data.
4. Miss penalty of 50 cycles.
5. Base CPI is 2.
What is the CPI including the misses? Average CPI
How much slower is the machine when misses are taken into account? Average CPI/base CPI
Redo the above if the I-miss penalty is reduced to 10 (D-miss still 50)
With I-miss penalty back to 50, what is performance if CPU (and the caches) are 100 times faster
22. Assume
o 5% I-cache misses.
o 10% D-cache misses.
o 1/3 of the instructions access data.
What is the CPI if the miss penalty is 12?
What is the CPI if miss penalty is 24 clock)?
23. Suppose our processor has separate L1 instruction cache and data cache. Our CPIbase is 2 clock cycles,
whereas memory accesses take 100 cycles. Our I$ miss rate is 3% while our D$ miss rate is 10%. 40%
of our instructions are loads or stores.
24. Assume a memory access to main memory on a cache "miss" takes 30 ns and a cache "hit" takes 3 ns. If
80% of the processor's memory requests result in a cache "hit", what is the average memory access time?
Solution:
(0.8 × 3 ns) + (0.2 × 33 ns) = 2.4 ns + 6.6 ns
= 9 ns
28. Suppose our processor has separate L1 instruction cache and data cache. Our CPI base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our instruction miss rate is 3% while our data miss rate is
10%. 40% of our instructions are loads or stores.
Question 2:
You want your AMAT to be <=2 cycles. You have two levels of cache.
L1 Hit Time is 1 cycle; L1 miss rate is 10%;
L2 Hit Time is 3 cycles; L2 Miss Penalty is 100 cycles
What must you optimize your L2 miss rate to be?
2 = 1 + 0.1x ( 3 +100x)
x = 0.07 = 7% miss rate
29. Assume that 33% of the instructions in a program are data accesses. The cache hit ratio is 97% and the
hit time is one cycle, but the miss penalty is 20 cycles.
AMAT = Hit time + (Miss rate x Miss penalty)
miss rate = 1%
CPU time this machine = IC x (1.0 + 1.50 x 0.01 x 50) x clock cycle time
31. Assume that 33% of the instructions in a program are data accesses. The cache hit ratio is 97% and the
hit time is one cycle, but the miss penalty is 20 cycles.
Solution:
Average Memory Access Time (AMAT) = Hit time + (Miss rate x Miss penalty)
32. Suppose our processor has separate L1 instruction cache and data cache. Our CPI-base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our Instruction cache hit rate is 97% while our Data cache
hit rate is 90%. 40% of our instructions are loads or stores.
[Stalled means the processor was not making forward progress with instructions, and usually happens
because it is waiting on memory I/O.]
Alternate statement:
Suppose our processor has separate L1 instruction cache and data cache. Our CPI-base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our Instruction cache miss rate is 3% while our Data cache
miss rate is 10%. 40% of our instructions are loads or stores.
CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles
= 2 + 1 ×0.03 ×100 + 0.4×0.1×100
= 2 +0.07 × 100 = 9 cycles
33.
You want your Average Memory Access Time (AMAT) to be <=2 cycles. You have two levels of cache.
L1 Hit Time is 1 cycle; L1 miss rate is 10%;
L2 Hit Time is 3 cycles; L2 Miss Penalty is 100 cycles
What must you optimize your L2 miss rate to be?
2 = 1 +0.1× (3 + x ×100)
x = 0.07 = 7% miss rate
34. Assume
CPI = 1.0 when cache has 100% hit rate
Miss penalty = 50 clock cycles
Miss rate = 1%
How much faster is an ideal machine?
35. A machine has a base CPI of 2 clock cycles. Measurements obtained show that the instruction miss
rate is 12% and the data miss rate is 6%, and that on average, 30% of all instructions contain one data
reference. The miss penalty for the cache is 10 cycles. What is the total CPI?
37. Assume
38. Assume
5% I-cache misses.
10% D-cache misses.
1/3 of the instructions access data.
The CPI = 4 if the miss penalty is 0. A 0miss penalty is not realistic of course.
What is the CPI if the miss penalty is 12?
What is the CPI if we upgrade to a double speed cpu+cache, but keep a single speed memory (i.e., a 24
clock miss penalty)?
How much faster is the double speed machine? It would be double speed if the miss penalty were 0 or if
there was a 0% miss rate.
----
Problem:
HT(L1) = 2ns ;
HT(L2) = 10ns ;
Miss Ratio(L1) = 6%,
Miss Ratio (L2) = 2%
RAM Access time = 100ns.
Calculate average memory access time.
If Average Memory Access Time = 3.358 ns, calculate what percent of instruction CPU reads from L1
cache memory?
Problem:
HT(L1) = 2ns ; Hit Ratio(L1) = 70%
HT(L2) = 10ns ; Hit Ratio(L1) = 80%
HT(L3) = 20ns ;
Global Miss Rate = 4% and RAM Access time = 100ns
Calculate: Average Memory Access Time
Assume:
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
30% of instructions reference data L2 miss rate 4%
L2 time of 10 clock cycles Memory access time 90 clock cycles
Base CPI of 2.5 CPU Clock rate 3GHz
Which of the following implementation will be better?
a. A faster L2 of 6 clock cycles
b. A larger L2 of miss rate 3%
Average CPI: base CPI + L-1 Instruction miss cycles + L-1 Data Miss cycles + L-2 instruction miss cycles + L-2
Data Miss Cycles
L-1 Instruction miss cycles = Instruction count x Instruction MR x L-1 Miss Penalty = 1 x 0.03 x 10
L-1 data miss cycles = Data access count x data MR x L-1 Miss Penalty = (1 x0.3)x 0.08 x 10
L-2 Instruction miss cycles = Instruction count x Instruction MR x L-2 Miss Penalty = 1 x 0.04 x 90
L-2 data miss cycles = data access count x data MR x L-2 Miss Penalty = (1x0.3) x 0.04 x 90
Average CPI =
Case-a
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
30% of instructions reference data L2 miss rate 4%
L2 time of 6 clock cycles Memory access time 90 clock cycles
Base CPI of 2.5 CPU Clock rate 3GHz
Average CPI=
Case-b
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
30% of instructions reference data L2 miss rate 3%
L2 time of 10 clock cycles Memory access time 90 clock cycles
Base CPI of 2.5 CPU Clock rate 3GHz
Average CPI =
Assume:
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
40% of instructions reference data L2 miss rate 4%
L2 time of 10 clock cycles Memory access time 80 clock cycles
Base CPI of 2 CPU Clock rate 4GHz
Which of the following implementation will be better?
a. A faster L2 of 5 clock cycles
b. A larger L2 of miss rate 2%
Assume
• L1 I-cache miss rate 8%
• L1 D-cache miss rate 10%
• Load Instructions: 15% of total instruction count
• Store Instructions: 25% of total instruction count
• Data ref = 40%
• L2 miss rate 6%
• L2 time of 15ns = 15ns / 0.25ns = 60 cycles
• Memory access time 100ns = 100ns/0.25ns = 400 cycles
• Base CPI of 2 = 0.25 ns x 2 = 0.5 ns
• Clock rate 4GHz
CPU clock period = 1/4x109 = 0.25 x 10-9 sec = 0.25 ns
a) How many instructions per second does this machine execute?
1/ 20 x 10-9 =
b) How many instructions per second would this machine execute if the L2 cache were eliminated?
Average CPI
No of instructions per second does this machine execute = 4GHz /average CPI
c) How many instructions per second would this machine execute if both caches were eliminated?
Average CPI
No of instructions per second does this machine execute = 4GHz /average CPI
d) How many instructions per second would this machine execute if the L2 cache had a 0% miss
rate (L1 as originally specified)?
e) How many instructions per second would this machine execute if both L1 caches had a 0% miss
rate?
Assume
• L1 I-cache miss rate 5%
• L1 D-cache miss rate 8%
• Load Instructions: 25% of total instruction count
• Store Instructions: 10% of total instruction count
• L2 miss rate 4%
• L2 time of 15ns
• Memory access time 90ns
• Base CPI of 2
• Clock rate 3GHz
f) How many instructions per second does this machine execute?
g) How many instructions per second would this machine execute if the L2 cache were eliminated?
h) How many instructions per second would this machine execute if both caches were eliminated?
i) How many instructions per second would this machine execute if the L2 cache had a 0% miss
rate (L1 as originally specified)?
j) How many instructions per second would this machine execute if both L1 caches had a 0% miss
rate?