Chapter 5
CpE 440
Computer Architecture
Dr. Haithem Al-Mefleh
Computer Engineering Department
Yarmouk University, Second 2020-2021
CpE 440, Second 2020-2021, Yarmouk University, 2/26/2021
• Principle of Locality
• Programs access a small portion of their address space at any given time
2 types of locality
• Temporal (locality in time)
• An item that is referenced tends to be referenced again soon
• Loops, arrays, ...
• Spatial (locality in space)
• Items near a referenced item tend to be referenced soon
• Sequential access to code, data
Memory Hierarchy
• Take advantage of the Principle of Locality by implementing memory (M) as a hierarchy
• Main M – DRAM
• Caches – SRAM
• Magnetic Disks, or Flashes in Embedded Systems
• Access time: shortest at Level 1 (closest to the processor)
• Size: largest at Level n (farthest from the processor)
• Hit Time
• Time to access upper level; includes determining a miss or a hit
• Miss Penalty
• Time to replace a block in the upper level with the corresponding block from the lower level + time to deliver it to the processor
Cache structure
Direct Mapped
• Simplest
• Each M location is mapped directly to exactly 1 location in the cache
• Almost all direct-mapped caches use: (block address) modulo (number of blocks in the cache)
An 8-block cache: the index is the 3 lowest bits of the block address
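The 8-block mapping can be sketched in a few lines of Python. This is only an illustration, assuming the usual modulo mapping (block address modulo number of blocks); the function name `cache_index` is hypothetical and not from the slides.

```python
def cache_index(block_address: int, num_blocks: int = 8) -> int:
    """Return the direct-mapped cache index for a memory block address.

    Assumes num_blocks is a power of two, so the modulo is the same as
    keeping the lowest log2(num_blocks) bits of the block address.
    """
    if num_blocks <= 0 or num_blocks & (num_blocks - 1):
        raise ValueError("num_blocks must be a power of two")
    return block_address & (num_blocks - 1)


# Memory blocks 1, 9, 17 and 25 all end in binary 001, so they compete
# for the same entry of an 8-block cache.
for block in (0b00001, 0b01001, 0b10001, 0b11001):
    print(f"block {block:05b} -> index {cache_index(block):03b}")
```

Because several memory blocks share one index, the cache also needs a tag to tell which of them is currently stored there.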
Address Subdivision
• 32-bit address
• One-word blocks
Naming Convention
• A "4 KB cache" refers to the size of the data only (tag and valid bits are not counted)
A cache with 2^n blocks
• Each block – 2^m words (2^(m+2) bytes)
• m bits – select a word within the block
• 2 bits – select a byte within the word
16 KB / 4 B = 4K words
4K words / 4 words per block = 1K blocks in the cache
Field:  tag | index | word | byte
Bits:   18  | 10    | 2    | 2
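The field widths above follow mechanically from the cache parameters. The sketch below recomputes them for a direct-mapped cache; the helper names `log2_exact` and `address_fields` are hypothetical, and all sizes are assumed to be powers of two.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two, otherwise raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def address_fields(cache_bytes, words_per_block, address_bits=32, bytes_per_word=4):
    """Bit widths of the tag, index, word-offset and byte-offset fields."""
    num_blocks = cache_bytes // (words_per_block * bytes_per_word)
    byte_bits = log2_exact(bytes_per_word)    # byte within a word
    word_bits = log2_exact(words_per_block)   # word within a block
    index_bits = log2_exact(num_blocks)       # block within the cache
    tag_bits = address_bits - index_bits - word_bits - byte_bits
    return {"tag": tag_bits, "index": index_bits, "word": word_bits, "byte": byte_bits}


# 16 KB of data, 4-word blocks, 32-bit addresses.
print(address_fields(16 * 1024, 4))
# 4 KB of data, 1-word blocks, 32-bit addresses.
print(address_fields(4 * 1024, 1))
```

The same function covers the one-word-block case, where the word-offset field simply has zero bits.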
Byte address 1200 in M, with 16 bytes per block, is in block floor(1200/16) = 75 in M
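As a quick check of this arithmetic, the sketch below maps a byte address to its memory block and then to a cache block. The 16-byte block size comes from the division above; the 64-block cache size is an assumption added only to complete the example, and the name `locate` is hypothetical.

```python
def locate(byte_address, block_bytes=16, num_cache_blocks=64):
    """Return (memory block number, direct-mapped cache block number).

    num_cache_blocks=64 is an assumed cache size, not taken from the slide.
    """
    memory_block = byte_address // block_bytes       # floor division
    cache_block = memory_block % num_cache_blocks    # modulo mapping
    return memory_block, cache_block


memory_block, cache_block = locate(1200)
print(memory_block, cache_block)
```

Every byte address from 1200 to 1215 falls in the same memory block, since they differ only in the low 4 bits.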
Block Size?
Larger blocks exploit spatial locality to reduce miss rates
• Eventually, the miss rate may go up if the block size is a significant fraction of the cache size
• The number of blocks that can be in the cache becomes small
• A block is replaced before many of its words are accessed, so spatial locality among the words of a block decreases
• Fewer blocks means blocks are replaced many times
Handling Writes
• Inconsistency
• Writing only to a cache word makes that word in M different from its cached copy
• Write policy
• Write Through – write to both the cache and M; simplest
• Write Back (Copy Back) – write to the lower level only when the block is replaced
• More complex to implement
• A dirty bit is needed
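The difference between the two write policies can be seen by counting how many writes actually reach memory. The toy model below is a sketch only: it assumes a cache holding a single one-word block, and the class and function names (`OneBlockCache`, `count_memory_writes`) are hypothetical.

```python
class OneBlockCache:
    """Toy single-entry cache used only to contrast the two write policies."""

    def __init__(self, memory, write_back):
        self.memory = memory          # dict: address -> value (the lower level)
        self.write_back = write_back  # True: write back, False: write through
        self.address = None           # address currently held in the cache
        self.value = None
        self.dirty = False            # only set under write back
        self.memory_writes = 0        # number of writes that reached memory

    def _write_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1

    def _evict(self):
        # Write back: a dirty block is copied to memory only when replaced.
        if self.write_back and self.dirty:
            self._write_memory(self.address, self.value)
        self.dirty = False

    def write(self, address, value):
        if self.address != address:   # miss: replace the cached entry
            self._evict()
            self.address = address
        self.value = value
        if self.write_back:
            self.dirty = True         # the copy in memory is now stale
        else:
            self._write_memory(address, value)  # write through: update both


def count_memory_writes(write_back):
    cache = OneBlockCache(memory={}, write_back=write_back)
    for value in range(5):
        cache.write(0x10, value)      # five writes to the same word
    cache.write(0x20, 99)             # forces replacement of address 0x10
    return cache.memory_writes


print("write through:", count_memory_writes(write_back=False))
print("write back:   ", count_memory_writes(write_back=True))
```

In this run write through sends every store to memory, while write back sends only the one dirty block that gets replaced; the last write is still sitting dirty in the cache.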
• On a write miss – fetch the block from M, place it into the cache, overwrite the word, then write it to M using the full address (if write through)
• Example
• 10% of instructions – Stores
• CPI without cache misses – 1
• 100 extra cycles – on every write
• CPI = 1 + 0.1*100 = 11, which reduces performance by more than a factor of 10
• Write Buffer – write to the cache and to a buffer; the processor continues while the buffer writes to M
• When the write to M is complete, free the entry in the buffer
• If the buffer is full, stall
• If the rate at which M can finish writes < the rate at which the processor generates writes
• buffering cannot help
• If the rate at which M can finish writes > the rate at which the processor generates writes
• Stalls can still happen when writes occur in bursts
• To reduce this, make the write buffer deeper than one entry
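The CPI figure in the example above is simple arithmetic and can be checked with a short sketch. The function name `cpi_with_write_stalls` is hypothetical; the inputs are the ones given in the example (10% stores, base CPI of 1, 100 extra cycles per write).

```python
def cpi_with_write_stalls(base_cpi, store_fraction, stall_cycles_per_store):
    """Effective CPI when every store stalls the processor (no write buffer)."""
    return base_cpi + store_fraction * stall_cycles_per_store


base_cpi = 1
cpi = cpi_with_write_stalls(base_cpi, store_fraction=0.10, stall_cycles_per_store=100)
print(f"CPI = {cpi:.2f}, slowdown = {cpi / base_cpi:.1f}x")
```

A write buffer aims to remove the stall term, so that in the ideal case the CPI falls back toward the base value.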
One-word-wide
• A memory (M) that is 2 words wide
• Miss penalty = 1 + 2x15 + 2x1 = 33 memory bus clock cycles
• Bandwidth of a single miss = 0.48 bytes per bus clock cycle
• Cost
• Wider bus, wider memory
• Possible increase in cache access time
• Multiplexor & control logic between processor and cache
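The miss penalty and bandwidth figures can be reproduced for different memory widths. The sketch below reads the formula above as 1 cycle to send the address, 15 cycles per memory access, and 1 cycle per transfer, with a 4-word (16-byte) block; that breakdown and the function names are assumptions made for illustration.

```python
def miss_penalty(block_words, memory_width_words,
                 addr_cycles=1, access_cycles=15, transfer_cycles=1):
    """Miss penalty in memory bus clock cycles (assumed timing parameters)."""
    accesses = -(-block_words // memory_width_words)   # ceiling division
    return addr_cycles + accesses * access_cycles + accesses * transfer_cycles


def bandwidth(block_words, penalty_cycles, bytes_per_word=4):
    """Bytes delivered per bus clock cycle for a single miss."""
    return block_words * bytes_per_word / penalty_cycles


for width in (1, 2, 4):
    penalty = miss_penalty(block_words=4, memory_width_words=width)
    print(f"{width}-word-wide memory: {penalty} cycles, "
          f"{bandwidth(4, penalty):.2f} bytes/cycle")
```

Under these assumptions, doubling the width from one word to two roughly halves the miss penalty, which is the benefit weighed against the costs listed above.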
• 5.44/2 = 2.72
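Only the final ratio survives on this slide. The sketch below shows one way such a ratio arises, by comparing CPI with memory stalls against CPI with a perfect cache. All inputs (base CPI of 2, 2% instruction miss rate, 4% data miss rate, 36% loads and stores, 100-cycle miss penalty) are assumptions chosen to reproduce 5.44; they are not taken from the slide, and the function name is hypothetical.

```python
def cpi_with_misses(base_cpi, inst_miss_rate, data_miss_rate,
                    mem_ops_per_inst, miss_penalty):
    """CPI including stall cycles from instruction and data cache misses."""
    inst_stalls = inst_miss_rate * miss_penalty
    data_stalls = mem_ops_per_inst * data_miss_rate * miss_penalty
    return base_cpi + inst_stalls + data_stalls


# Assumed inputs, picked only so the result matches the 5.44 figure.
base_cpi = 2
cpi = cpi_with_misses(base_cpi, inst_miss_rate=0.02, data_miss_rate=0.04,
                      mem_ops_per_inst=0.36, miss_penalty=100)
print(f"CPI with stalls = {cpi:.2f}")
print(f"slowdown vs. perfect cache = {cpi / base_cpi:.2f}")
```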
Set-Associative Cache
• A block can be placed in one of a fixed number of locations in cache
• n locations: an n-way set-associative cache
• 8 blocks total
• Block 12 – 1100 in binary
• Direct mapped: lowest 3 bits are 100, so block 4
• 2-way set-associative: 4 sets; lowest 2 bits are 00, so set 0 (any block inside this set)
• Fully associative – any block
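The three placements for block 12 can be computed with one function by varying the number of ways. This is a minimal sketch: it assumes sets are laid out contiguously in the cache (set s occupies blocks s*ways to s*ways + ways - 1), and the name `candidate_blocks` is hypothetical.

```python
def candidate_blocks(block_address, num_blocks=8, ways=1):
    """Return (set index, list of cache blocks where the block may be placed).

    ways=1 is direct mapped; ways=num_blocks is fully associative.
    """
    num_sets = num_blocks // ways
    set_index = block_address % num_sets
    first = set_index * ways            # assumed contiguous layout of each set
    return set_index, list(range(first, first + ways))


for ways, name in ((1, "direct mapped"), (2, "2-way"), (8, "fully associative")):
    set_index, blocks = candidate_blocks(12, num_blocks=8, ways=ways)
    print(f"{name}: set {set_index}, candidate blocks {blocks}")
```

More ways give a block more places to go, at the cost of comparing more tags on each access.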
Cache of 8 blocks
Direct,….
Fully associative,…
Virtual Memory
• Main Memory (MM) acts as a cache for the secondary storage, usually magnetic disks
• Share memory among programs – safely/efficiently
• A program can read/write only its part of MM
• Remove burden of limited memory
• VM automates management
• Page = a VM block
• Virtual Address (VA)
• corresponds to a location in virtual space
• produced by processor
• mapped to a Physical Address (using a combination of hardware and software)
• Page Fault = a VM miss; an accessed page is not in MM
• Address Translation = address mapping; a VA is mapped to an address used to access M
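Address translation can be sketched as splitting the virtual address into a virtual page number and a page offset, then replacing the page number. The example assumes 4 KB pages and models the mapping as a plain Python dict; `translate` and `PageFault` are hypothetical names, and a real system does this with hardware and operating system support.

```python
PAGE_BYTES = 4096                          # assumed 4 KB pages
OFFSET_BITS = PAGE_BYTES.bit_length() - 1  # 12 offset bits


class PageFault(Exception):
    """Raised when the accessed page is not in main memory."""


def translate(virtual_address, page_table):
    """Map a virtual address to a physical address.

    page_table: dict from virtual page number to physical page number,
    holding only the pages currently in main memory.
    """
    vpn = virtual_address >> OFFSET_BITS
    offset = virtual_address & (PAGE_BYTES - 1)   # unchanged by translation
    if vpn not in page_table:
        raise PageFault(f"virtual page {vpn:#x} is not in main memory")
    return (page_table[vpn] << OFFSET_BITS) | offset


page_table = {0x2: 0x1A}                   # virtual page 2 -> physical page 0x1A
print(hex(translate(0x2ABC, page_table)))
```

Only the page number is translated; the offset within the page passes through unchanged.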
• Page table: a table in memory that indexes memory to find pages
• A table for each program
• Each entry can have extra information
• protection, …
Example:
32-bit Virtual Address, 4 KB pages
Assuming 4 bytes per page table entry (only 19 bits are shown)
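With these numbers, the size of one program's page table follows directly. The sketch below works it out, assuming a single flat table with one entry per virtual page; the function name `page_table_size` is hypothetical.

```python
def page_table_size(virtual_address_bits=32, page_bytes=4096, entry_bytes=4):
    """Return (number of entries, table size in bytes) for a flat page table."""
    offset_bits = page_bytes.bit_length() - 1         # 12 bits for 4 KB pages
    num_entries = 2 ** (virtual_address_bits - offset_bits)
    return num_entries, num_entries * entry_bytes


entries, size_bytes = page_table_size()
print(f"{entries} entries, {size_bytes // 2**20} MB per program")
```

Since each program has its own table, this cost is multiplied by the number of running programs.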
Page Fault
• Operating System
• Find page in next level (usually magnetic disk)
• Decide where to place it in MM
Writes
• Write-back – write back a page to disk when replaced in memory
• Dirty pages
• Best case
• A virtual address translated by TLB, sent to cache, data found in cache
• Worst case
• A reference would miss in all components (TLB, page table, and cache)
Pitfalls
Any questions/comments?
Thank you