Lecture-04 & 05, Adv. Computer Architecture, CS-522
CS-522
MS – Computer Science
Lecture: 04 and 05
• External (secondary)
– Optical disks, magnetic disks and tapes
Capacity of computer memories
• Capacity of Internal Memory
– Typically expressed in bytes (8-bits) or words
– Word size: the natural unit of organisation
– e.g. 8, 16 or 32 bits
• Capacity of External Memory
– Typically expressed in bytes (8-bits)
– KB, MB, GB or even TB etc.
Unit of Transfer
• Internal memory
– Usually governed by data bus width (number of data lines into and out of the memory module)
– May be equal to the word length, but is often larger, such as 64, 128, or
256 bits
– Addressable unit: a word or a byte
– The smallest location which can be uniquely addressed
• External memory
– Usually a block, usually much larger than a word
Access Methods
• Sequential
– Access must be in a specific linear sequence
– Start at the beginning and read through in order
– Access time depends on location of data and previous location
• Thus highly variable
– e.g. Tape
• Direct
– Individual blocks have unique address based on physical locations
– Access is by jumping to the vicinity (surrounding area), followed by sequential searching, counting or waiting
– Access time depends on location and previous location
• Thus variable
– e.g. Disk
Access Methods
• Random
– Individual addresses identify memory locations exactly
– Access time is independent of location or previous access
– e.g. Main memory (RAM) and some Cache
• Associative
– Data is located by a comparison with contents of a portion of the store
– i.e. word is retrieved based on a portion of its contents rather than its address
– Access time is independent of location or previous access
– e.g. Cache (Random Access type)
Performance
• Access time (latency)
– Time between presenting the address to memory and getting the valid
data
– Time it takes to perform a read or write operation
• Memory Cycle time
– Time may be required for the memory to “recover” before next access
– Memory Cycle time = access time + additional time (recovery)
• Transfer Rate
– Rate at which data can be moved (into or out of memory)
• Power consumption
Organisation
• Physical arrangement of bits to form words
• Not always obvious
– e.g. Interleaved
• (to insert something alternately and regularly between the parts of…)
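As an illustration of an interleaved organisation, the sketch below assumes low-order interleaving across four banks, so that consecutive word addresses fall in consecutive banks. The bank count and the name `locate` are assumptions for illustration, not taken from the lecture.

```python
NUM_BANKS = 4  # assumed number of memory banks


def locate(word_address, num_banks=NUM_BANKS):
    """Return (bank, row) for a word address under low-order interleaving."""
    bank = word_address % num_banks   # low-order part selects the bank
    row = word_address // num_banks   # remaining part selects the row in that bank
    return bank, row


for addr in range(8):
    print(addr, locate(addr))
```

With this layout, a run of consecutive addresses is spread over all banks in turn rather than hitting a single bank repeatedly.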
Memory Hierarchy
• Registers
– In CPU
• Internal or Main memory
– May include one or more levels of cache
– “RAM”
• External memory
– Backing store
– HD, FD, CD, DVD, USB, Tapes etc.
Memory design constraints
• How much?
– Capacity
• How fast?
– Access time
• How expensive?
– Cost per bit
Cache
• Small amount of fast memory
• Sits between normal main memory (RAM) and CPU
• May be located on CPU chip or module
Cache Addresses
• A logical cache (virtual cache)
– stores data using virtual addresses
– processor accesses the cache directly,
• without going through the MMU (Memory Management Unit)
– Advantage: cache access speed is faster than for a physical cache
• as cache can respond before the MMU performs an address translation
– Disadvantage: virtual memory supplies each application with the same virtual memory address space
• each application sees a virtual memory that starts at address 0
• each cache line needs to identify which virtual address space its address refers to
• A physical cache
– stores data using main memory physical addresses
– processor accesses the cache through the MMU
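The address-space problem of a logical cache can be sketched as follows. This is a minimal model, assuming each cache entry is keyed by an address-space identifier together with the virtual address; the class `LogicalCache` and the identifiers used are hypothetical.

```python
class LogicalCache:
    """Toy virtual-address cache whose entries are tagged with an address-space id."""

    def __init__(self):
        self.lines = {}  # (address_space_id, virtual_address) -> data

    def store(self, asid, vaddr, data):
        self.lines[(asid, vaddr)] = data

    def load(self, asid, vaddr):
        # None models a cache miss
        return self.lines.get((asid, vaddr))


cache = LogicalCache()
cache.store(asid=1, vaddr=0, data="app 1 data")
cache.store(asid=2, vaddr=0, data="app 2 data")
print(cache.load(1, 0), "|", cache.load(2, 0))
```

Both applications use virtual address 0, yet neither sees the other's data because the identifier is part of the lookup key.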
Cache Size
• The cache should be small enough to keep cost down, yet large enough to keep access fast
– Cost
• Overall cost per bit should stay close to that of main memory alone
• More cache is expensive
– Speed
• More cache is faster (up to a point)
• A large cache needs a large number of gates (for addressing the cache)
– Large caches tend to be slightly slower than small ones
Mapping Function
• There are fewer cache lines than main memory blocks
– An algorithm is needed for mapping main memory blocks into cache lines
– A means is needed for determining which main memory block currently occupies a cache line
– Three techniques are used
• Direct
• Associative
• Set-associative
Direct Mapping Cache Line Table
Tag 8 bit   Line (slot) 14 bit   Word 2 bit
• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
– 8 bit tag (=22-14)
– 14 bit slot or line
• No two blocks that map to the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
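The 8/14/2 split of the 24-bit address can be sketched with shifts and masks. The function name `split_address` and the sample address `0x16339C` are chosen for illustration only.

```python
WORD_BITS = 2    # word identifier within a 4-byte block
LINE_BITS = 14   # cache line (slot) number
TAG_BITS = 8     # remaining high-order bits


def split_address(addr):
    """Split a 24-bit address into (tag, line, word) for direct mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_address(0x16339C)
print(f"tag={tag:#04x} line={line:#06x} word={word}")
```

A lookup then reads the line selected by `line` and compares its stored tag with `tag`.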
Direct Mapping Cache Organization
Direct Mapping Example
Direct Mapping pros & cons
• Simple
• Inexpensive to implement
• Disadvantage: Fixed location for given block
– If a program repeatedly accesses two blocks that map to the same line, the miss rate is very high
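The conflict problem can be demonstrated with a small simulation. This is a sketch that assumes the 8/14/2 address split of the earlier slide and tracks only the tag held by each line; the addresses `a` and `b` are made up so that they share a line but differ in tag.

```python
WORD_BITS = 2
LINE_BITS = 14


def count_misses(addresses):
    """Simulate a direct-mapped cache (tags only) and return the number of misses."""
    lines = {}  # line index -> tag currently held
    misses = 0
    for addr in addresses:
        line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        tag = addr >> (WORD_BITS + LINE_BITS)
        if lines.get(line) != tag:
            misses += 1
            lines[line] = tag  # the new block evicts whatever was there
    return misses


a = 0x000004
b = 0x010004  # same line as a, different tag
conflict = count_misses([a, b] * 10)      # the two blocks keep evicting each other
friendly = count_misses([a, a + 8] * 10)  # the two blocks occupy different lines
print(conflict, friendly)
```

In the conflicting pattern every one of the 20 accesses misses, while the non-conflicting pattern misses only on the first touch of each block.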
Exercise
• Consider a computer system with a direct-mapped cache
having the following characteristics:
– Cache size: 8 KB
– Block size: 8 B
– Main memory size: 256 KB (byte-addressable)
Exercise - Solution
• Determine the address length of a main memory location.
– 256 KB = 2^18 bytes, so 18 bits
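The address length follows from the memory size: 256 KB is 2^18 bytes, so a byte address needs 18 bits. The sketch below recomputes this and also breaks the address into tag, line and offset fields. The field breakdown is an assumption about what the exercise asks, and the function name is hypothetical; all sizes are assumed to be powers of two.

```python
def direct_mapped_fields(cache_bytes, block_bytes, memory_bytes):
    """Field widths for a byte-addressable, direct-mapped cache (power-of-two sizes)."""
    address_bits = (memory_bytes - 1).bit_length()
    offset_bits = (block_bytes - 1).bit_length()
    num_lines = cache_bytes // block_bytes
    line_bits = (num_lines - 1).bit_length()
    tag_bits = address_bits - line_bits - offset_bits
    return {
        "address": address_bits,
        "offset": offset_bits,
        "lines": num_lines,
        "line": line_bits,
        "tag": tag_bits,
    }


fields = direct_mapped_fields(8 * 1024, 8, 256 * 1024)
print(fields)
```

For the stated parameters this gives 1024 cache lines and an 18-bit address made of a 5-bit tag, a 10-bit line number and a 3-bit byte offset.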
Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
Tag 22 bit   Word 2 bit
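A fully associative lookup can be sketched as a search over all resident tags. This is a minimal model, assuming a 2-bit word field and FIFO replacement; the replacement policy and the class name `AssociativeCache` are assumptions for illustration.

```python
WORD_BITS = 2


class AssociativeCache:
    """Toy fully associative cache that tracks tags only."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = []  # tags of resident blocks, oldest first

    def access(self, addr):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        tag = addr >> WORD_BITS        # everything above the word field is the tag
        if tag in self.tags:           # models comparing against every line's tag
            return True
        if len(self.tags) == self.num_lines:
            self.tags.pop(0)           # assumed FIFO replacement
        self.tags.append(tag)
        return False


cache = AssociativeCache(num_lines=4)
results = [cache.access(a) for a in (0x000004, 0x010004, 0x000004, 0x010004)]
print(results)
```

The two blocks coexist in the cache, so only the first access to each one misses. Hardware performs the tag comparison in parallel, which is where the cost of this scheme comes from.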
Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = $2^{s+w}$ words or bytes
• Block size = line size = $2^{w}$ words or bytes
• Number of blocks in main memory = $2^{s+w} / 2^{w} = 2^{s}$
• Number of lines in cache = undetermined by address
• Size of tag = s bits
Associative Mapping Address Structure
Tag 22 bit   Word 2 bit
Associative Cache Organization
Associative Mapping Example
Associative Mapping pros & cons
• Flexible: any block may be chosen for replacement when a new block is read into the cache
– Replacement algorithms are designed to maximize the hit ratio
• Disadvantage: examining every line's tag makes the cache search expensive
Set Associative Mapping
• Combines the advantages of both direct and associative mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
– e.g. Block B can be in any line of set i
• k lines per set = k-way set associative mapping
• e.g. 2 lines per set = 2-way set associative mapping
– A given block can be in either of the 2 lines of exactly one set
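The placement rule can be sketched as follows, assuming the usual choice of set index as block number modulo the number of sets, and assuming the lines of set i are numbered contiguously. Both the numbering and the name `candidate_lines` are assumptions for illustration.

```python
def candidate_lines(block, num_sets, k):
    """Return (set index, list of cache lines) where a block may be placed."""
    set_index = block % num_sets      # the block maps to exactly one set
    first_line = set_index * k        # assumed layout: each set's lines are contiguous
    return set_index, list(range(first_line, first_line + k))


print(candidate_lines(6, num_sets=4, k=2))
print(candidate_lines(2, num_sets=4, k=2))
```

Blocks 2 and 6 fall in the same set, but with two lines per set both can be resident at once, which a direct-mapped cache would not allow.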
Set Associative Mapping Address Structure
Tag 9 bit   Set 13 bit   Word 2 bit
Set Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = $2^{s+w}$ words or bytes
• Block size = line size = $2^{w}$ words or bytes
• Number of blocks in main memory = $2^{s}$
• Number of lines in set = k
• Number of sets = v = $2^{d}$
• Number of lines in cache = kv = $k \times 2^{d}$
• Size of tag = (s – d) bits
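The summary can be checked numerically against the 24-bit address structure shown earlier (9-bit tag, 13-bit set, 2-bit word), taking s = 22, w = 2, d = 13 and k = 2 for a two-way cache. The function name is hypothetical.

```python
def set_associative_summary(s, w, d, k):
    """Evaluate the set associative summary quantities for given s, w, d and k."""
    return {
        "address_bits": s + w,
        "addressable_units": 2 ** (s + w),
        "block_size": 2 ** w,
        "memory_blocks": 2 ** s,
        "sets": 2 ** d,
        "cache_lines": k * 2 ** d,
        "tag_bits": s - d,
    }


summary = set_associative_summary(s=22, w=2, d=13, k=2)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The computed tag size of 9 bits agrees with the address structure, and the cache holds 16384 lines arranged as 8192 sets of 2.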
Two Way Set Associative Cache Organization
Replacement Algorithms
Direct mapping
• No choice
• Each block only maps to one line
• Replace that line
Replacement Algorithms
Associative & Set Associative
Write Policy
• Possible problems
– Multiple CPUs may have individual caches
– I/O may address main memory directly
Write through
• All write operations go to main memory as well as to the cache, so that main memory is always valid
Write back
• Minimize memory write operations
• Updates initially made in cache only
• Update bit for cache line (slot) is set when update occurs
• If block is to be replaced, write to main memory only if update
bit is set
• Disadvantages
– Other caches get out of sync
– I/O must access main memory through cache
• Note: about 15% of memory references are writes
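The difference between the two write policies can be sketched by counting main memory writes. This is a simplified model, assuming a small direct-mapped cache with one update (dirty) bit per line and allocation on a write miss; the function name and the operation format are made up for illustration.

```python
def memory_writes(policy, operations, num_lines=4):
    """Count main memory writes for a list of ("r" | "w", block) operations."""
    lines = {}  # line index -> (resident block, update bit)
    writes = 0
    for op, block in operations:
        idx = block % num_lines
        resident, dirty = lines.get(idx, (None, False))
        if resident != block:
            if policy == "write-back" and dirty:
                writes += 1            # write the modified block back before replacing it
            resident, dirty = block, False
        if op == "w":
            if policy == "write-through":
                writes += 1            # every write also goes to main memory
            else:
                dirty = True           # only mark the line as updated
        lines[idx] = (resident, dirty)
    return writes


ops = [("w", 0)] * 5 + [("r", 4)]  # five writes to block 0, then block 4 replaces it
print(memory_writes("write-through", ops), memory_writes("write-back", ops))
```

Write through performs five memory writes here, while write back performs a single one when the updated line is replaced. Lines still marked as updated at the end of the run are not counted in this sketch.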
Line Size
• No definitive optimum value
• However,
– A size of 8 to 64 bytes seems reasonably close to optimum
– For HPC systems, 64- and 128-byte cache line sizes are most
frequently used
Number of Caches
• On-chip Caches (Level-1, L1)
• Off-chip or external caches (Level-2, L2)
– known as a two-level cache
Pentium 4 Cache
• 80386 – no on chip cache
• 80486 – single on chip cache
– 8kB, using 16 byte lines, 4-way set associative organization
• Pentium (all versions) –
– Two on chip caches (both L1)
– Data & instructions caches
• Pentium III – L3 cache added off chip
• Pentium 4
– L1 cache, 16kB, 64 byte lines, four way set associative
– L2 cache, 256kB, 128 byte lines, 8-way set associative
– L3 cache on chip. 1MB, 128 byte lines, 8-way set associative
Pentium 4 Block Diagram
Quiz-01
A short quiz is scheduled for next week's class