Lecture-04 & 05, Adv. Computer Architecture, CS-522


Advanced Computer Architecture

CS-522

MS – Computer Science

Credit Hours : 3-0

Dr. Shahid Latif (Associate Professor)

Department of Computer Science & IT


Sarhad University of Science and Information Technology, Peshawar
Course Details
Course title/code: Adv. Computer Architecture/CS-522

Lecture: 04 and 05

Topic: Cache Memory

Program: MS – Computer Science (Sem: 1st, 2nd, 3rd & 4th)

Lecture Outlines
• Computer Memory System Overview
  – Characteristics of Memory System
    • Location, Capacity, Unit of transfer, Access method, Performance,
      Physical type, Physical characteristics, Organization etc.
  – The Memory Hierarchy
• Cache Memory Principles
  – What is Cache?
  – Cache/Main memory structure
  – Cache Operation
• Cache Memory
  – Cache and Main Memory Structure
  – Elements of Cache Design
    • Cache Addresses: Logical and Physical addresses
    • Cache Size
    • Mapping Function
      – Direct Mapping
      – Associative Mapping
      – Set-Associative Mapping
Characteristics of Computer Memory
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location of computer memories
• Internal (main memory)
– Processor (local memory)
• Registers: PC, ACC, MQ, MAR, MBR, IR, IBR etc.
– Main memory (RAM), Cache

• External (secondary)
– Optical disks, magnetic disks and tapes
Capacity of computer memories
• Capacity of Internal Memory
– Typically expressed in bytes (8-bits) or words
– Word size: the natural unit of organisation
– e.g. 8, 16 or 32 bits
• Capacity of External Memory
– Typically expressed in bytes (8-bits)
– KB, MB, GB or even TB etc.
Unit of Transfer
• Internal memory
– Usually governed by data bus width (number of data lines into and out of
  the memory module)
– May be equal to the word length, but is often larger, such as 64, 128, or
  256 bits
– Addressable unit (word or byte): the smallest location which can be
  uniquely addressed

• External memory
– Usually a block, usually much larger than a word
Access Methods
• Sequential
– Access must be in a specific linear sequence
– Start at the beginning and read through in order
– Access time depends on location of data and previous location
• Thus highly variable
– e.g. Tape
• Direct
– Individual blocks have unique address based on physical locations
– Access is by jumping to vicinity (surrounding area) plus sequential
searching/counting/waiting
– Access time depends on location and previous location
• Thus variable
– e.g. Disk
Access Methods
• Random
– Individual addresses identify memory locations exactly
– Access time is independent of location or previous access
– e.g. Main memory (RAM) and some Cache

• Associative
– Data is located by a comparison with contents of a portion of the store
– i.e. word is retrieved based on a portion of its contents rather than its address
– Access time is independent of location or previous access
– e.g. Cache (Random Access type)
Performance
• Access time (latency)
– Time between presenting the address to memory and getting the valid
data
– Time it takes to perform a read or write operation
• Memory Cycle time
– Time may be required for the memory to “recover” before next access
– Memory Cycle time = access time + additional time (recovery)
• Transfer Rate
– Rate at which data can be moved (into or out of memory)

Recovery: regeneration of data after a destructive read


Physical Types
• Semiconductor
– RAM
• Magnetic
– Disk & Tape
• Optical
– CD & DVD
• Others (such as magneto-optical)
– Bubble
• A thin film of a magnetic material to hold small magnetized areas, known
as bubbles or domains, each storing one bit of data
Physical Characteristics
• Volatile/non-volatile
– Volatile: information is lost when electrical power is switched off
– Non-Volatile: no electrical power is needed to retain information
• Semiconductor memories may be either volatile or non-volatile
• Magnetic surface memories are non-volatile
• Erasable/non-erasable
– Non-erasable: cannot be altered, except by destroying the storage
unit
• such as read-only memory (ROM)
– A practical non-erasable memory must also be nonvolatile

• Power consumption
Organisation
• Physical arrangement of bits to form words
• Not always obvious
– e.g. Interleaved
• (to insert something alternately and regularly between the parts of…)
Memory Hierarchy
• Registers
– In CPU
• Internal or Main memory
– May include one or more levels of cache
– “RAM”
• External memory
– Backing store
– HD, FD, CD, DVD, USB, Tapes etc.
Memory design constraints
• How much?
– Capacity
• How fast?
– Access time
• How expensive?
– Cost per bit

• The following relationships hold among the three (capacity, access time, cost):
  – Faster access time, greater cost per bit
  – Greater capacity, smaller cost per bit
  – Greater capacity, slower access time
Memory Hierarchy - Diagram

As one goes down the hierarchy:
a. Decreasing cost per bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access of memory by the processor
So you want fast?
• Smaller, more expensive but faster memory (e.g. 1,000 words, access
  time = 0.01 µs) is supplemented (complemented) by larger, cheaper but
  slower memory (e.g. 100,000 words, access time = 0.1 µs)

• It is possible to build a computer which uses only static RAM (see later)
  – This would be very fast
  – This would need no cache
  – This would cost a very large amount
Cache Memory Principles

Cache
• Small amount of fast memory
• Sits between normal main memory (RAM) and CPU
• May be located on CPU chip or module

• Contains a copy of portions of main memory
  – When the processor attempts to read a word of memory, a check is made
    to determine if the word is in the cache
  – If so, the word is delivered to the processor
  – If not, a block of main memory (some fixed number of words) is read
    into the cache and then the word is delivered to the processor
Cache/Main Memory Structure

Tag = a portion of the main memory address

C (number of cache lines) << M (number of main memory blocks)


Exercise
• A computer system has a main memory with a capacity of 2GB.
• The system uses a memory architecture with a block size of 512 B.

• Calculate the following:

• The number of main memory blocks.


• The size of the memory address (in bits) required to address each block
uniquely.
• If the system uses a byte-addressable memory, calculate the total number of
unique addresses in the main memory.
Exercise - Solution
• Number of Main Memory Blocks:
  Number of Blocks = Memory Size / Block Size
                   = 2 GB / 512 B = 2^31 / 2^9
                   = 2^22 = 4,194,304 blocks

• Size of Memory Address (to address each block uniquely):
  Address bits = n = log2(Number of Blocks) = log2(2^22) = 22 bits
  (equivalently, 2^n = Number of Blocks)

• Total Number of Unique Addresses (byte-addressable):
  Total addresses = Memory Size / 1 B = 2^31 addresses,
  requiring log2(2^31) = 31 address bits
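The arithmetic above can be checked with a few lines of Python; this is a minimal sketch using only the exercise's figures (2 GB main memory, 512 B blocks).

import math

memory_size = 2 * 2**30                          # 2 GB = 2^31 bytes
block_size = 512                                 # 512 B = 2^9 bytes

num_blocks = memory_size // block_size           # 2^22 = 4,194,304 blocks
block_address_bits = int(math.log2(num_blocks))  # 22 bits to address a block
byte_address_bits = int(math.log2(memory_size))  # 31 bits for byte addressing

print(num_blocks, block_address_bits, byte_address_bits)   # 4194304 22 31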
Exercise
• A computer system has a cache memory with a total size of 64 KB
  – The cache is organized into lines (blocks) of 128 B each

• Calculate the following:

• The number of cache lines in the cache.


• The size of the memory address (in bits) required to uniquely address each
cache line.

• The number of cache lines = cache size / block size = 64 KB / 128 B
  = 2^16 / 2^7 = 2^9 = 512 lines

• The size of the memory address (line number) = log2(number of cache lines)
  = log2(512) = 9 bits
  • In general, n address bits identify 2^n cache lines
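A similar check for this exercise (64 KB cache, 128 B lines); again a minimal sketch, not part of the original slides.

import math

cache_size = 64 * 1024                         # 64 KB = 2^16 bytes
line_size = 128                                # 128 B = 2^7 bytes

num_lines = cache_size // line_size            # 2^9 = 512 cache lines
line_address_bits = int(math.log2(num_lines))  # 9 bits to address a line

print(num_lines, line_address_bits)            # 512 9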
Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which block of main memory is
in each cache slot
• Since C (number of lines) << M (number of blocks),
  – a single line of cache cannot be uniquely and permanently dedicated to one block
  – so a tag is used to identify which block is currently stored
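The sequence above can be sketched in a few lines of Python; the dictionary-based cache and the list standing in for main memory are illustrative assumptions, not a real design.

BLOCK_SIZE = 4                      # words per block (example value)
main_memory = list(range(1024))     # pretend main memory, one value per word
cache = {}                          # block number -> list of words

def read(address):
    block_no, offset = divmod(address, BLOCK_SIZE)
    if block_no in cache:                        # hit: deliver straight from cache
        return cache[block_no][offset]
    start = block_no * BLOCK_SIZE                # miss: fetch the whole block first
    cache[block_no] = main_memory[start:start + BLOCK_SIZE]
    return cache[block_no][offset]

print(read(10), read(11))           # the second access hits the block just loaded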
Cache Read Operation Flowchart
Typical Cache Organization
Elements of Cache Design

Cache Addresses
• A logical cache (virtual cache)
– stores data using virtual addresses
– processor accesses the cache directly,
• without going through the MMU (Memory Management Unit)
– Advantage: cache access speed is faster than for a physical cache
• as cache can respond before the MMU performs an address translation
– Disadvantage: virtual memory supplies each application with the same
  virtual memory address space
  • each application sees a virtual memory that starts at address 0
  • each cache line therefore needs to identify which virtual address space
    the address refers to
• A physical cache
  – stores data using main memory physical addresses
  – processor accesses the cache after going through the MMU
Cache Size
• We would like the cache to be small enough that the overall average cost per
  bit approaches that of main memory alone, yet large enough that the overall
  average access time approaches that of the cache alone

  – Cost
    • More cache is expensive

– Speed
• More cache is faster (up to a point)
• Large cache means large number of gates (for addressing the cache)
– Tend to be slightly slower than smaller ones

– Chip and board area


• Available chip area also limits cache size
Comparison of Cache Sizes

Mapping Function
• As there are fewer cache lines than main memory blocks
  – An algorithm is needed for mapping main memory blocks into cache lines
    • i.e. a means of determining which main memory block currently occupies a
      given cache line
  – Three techniques are used
    • Direct
    • Associative
    • Set-associative

• For all three cases, the example used in the following slides is:
  – Cache of 64 kB
  – Cache line/block of 4 bytes (each line = 32 bits)
    • i.e. the cache holds 16k (2^14) lines of 4 bytes each
  – 16 MB main memory
    • 24-bit address (2^24 = 16M)
    • 4M blocks of 4 bytes each
Direct Mapping
• Each block of main memory maps to only one cache line
– i.e. if a block is in cache, it must be in one specific place

• The main memory address is viewed as two parts (three fields in total)
  – Least significant w bits (2 bits) identify a unique word within a block
  – Most significant s bits (22 bits) specify one memory block
  – The s bits are split into
    • a cache line field of r bits and
    • a tag of s-r bits (most significant)

  Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of cache = 2^(r+w) words or bytes
• Size of tag = (s - r) bits
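A quick numeric check of these relations for the running example (64 kB cache, 4-byte lines, 16 MB main memory, so s = 22, r = 14, w = 2); a minimal Python sketch with illustrative variable names.

s, r, w = 22, 14, 2

addressable_units = 2 ** (s + w)   # 2^24 = 16M bytes of main memory
blocks_in_memory = 2 ** s          # 2^22 = 4M blocks
lines_in_cache = 2 ** r            # 2^14 = 16K lines
cache_data_size = 2 ** (r + w)     # 2^16 = 64 KB of data
tag_bits = s - r                   # 8-bit tag stored with each line

print(addressable_units, blocks_in_memory, lines_in_cache, cache_data_size, tag_bits)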

Direct Mapping Cache Line Table

Cache line | Main memory blocks assigned
0          | 0, 100, 200, 300, …
1          | 1, 101, 201, 301, …
…          | …

(Block j maps to cache line j modulo the number of lines; in this illustration
the cache has 100 lines, so line 0 receives blocks 0, 100, 200, 300, …)
Direct Mapping Address Structure

Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits

• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
– 8 bit tag (=22-14)
– 14 bit slot or line
• No two blocks that map into the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
Direct Mapping Cache Organization

Direct Mapping Example

Direct Mapping pros & cons
• Simple
• Inexpensive to implement
• Disadvantage: Fixed location for given block
– If a program accesses 2 blocks that map to the same line repeatedly,
cache misses are very high

Exercise
• Consider a computer system with a direct-mapped cache
having the following characteristics:
– Cache size: 8 KB
– Block size: 8 B
– Main memory size: 128 KB (byte-addressable)

• Determine the size (in bits) of a main memory address.

• Calculate the corresponding tag, cache line and word/block offset bits
  for the addresses given below.
– 1 0110 0010 1011 0010
– 1 0001 1101 0111 1011

Exercise - Solution
• Size of a main memory address:
  – 17 bits (128 KB = 2^17 byte-addressable locations)

• Field widths for the direct-mapped cache:
  – Word offset = log2(8 B) = 3 bits
  – Cache line = log2(8 KB / 8 B) = log2(1024) = 10 bits
  – Tag = 17 - 10 - 3 = 4 bits

• For the address 1 0110 0010 1011 0010:

  Tag: 1011 | Cache line: 00 0101 0110 | Word: 010
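The same split (and the one for the second address, which the slide leaves unworked) can be reproduced with a short Python sketch; the function name and constants below are illustrative assumptions taken from the exercise parameters.

WORD_BITS = 3    # log2(8 B block)
LINE_BITS = 10   # log2(8 KB / 8 B) = log2(1024); remaining 4 bits form the tag

def split(address):
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word

for addr in (0b10110001010110010, 0b10001110101111011):
    tag, line, word = split(addr)
    print(f"tag={tag:04b} line={line:010b} word={word:03b}")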

Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive

Tag: 22 bits | Word: 2 bits

Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = undetermined by the address format
• Size of tag = s bits

Associative Mapping Address Structure

Tag: 22 bits | Word: 2 bits

• 22 bit tag stored with each 32 bit block of data


• Compare tag field with tag entry in cache to check for hit
• Least significant 2 bits of address identify which word is
required from 32 bit data block
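A toy sketch of an associative lookup, assuming a dictionary of (stored tag, block) entries purely for illustration: every stored tag is examined for a match, sequentially here, whereas real hardware compares all tags in parallel.

WORD_BITS = 2    # the remaining 22 address bits form the tag

cache = {0x2AB01: [10, 11, 12, 13]}   # stored tag -> 4-byte block (illustrative)

def lookup(address):
    word = address & ((1 << WORD_BITS) - 1)
    tag = address >> WORD_BITS
    for stored_tag, block in cache.items():   # examine every line's tag
        if stored_tag == tag:
            return block[word]                # hit
    return None                               # miss

print(lookup((0x2AB01 << 2) | 0b10))  # 12 -- tag matches, word 2 selected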

Associative Cache Organization

Associative Mapping Example

Associative Mapping pros & cons
• Flexibility as to which block to replace when a new block is read
  into the cache
– Replacement algorithms are designed to maximize the hit ratio

• Disadvantage: complex circuitry required to examine the tags


of all cache lines in parallel

Set Associative Mapping
• Combines the advantages of both the direct and associative approaches
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
  – e.g. Block B can be in any line of set i
  – k lines per set = k-way set-associative mapping
  – e.g. with 2 lines per set (2-way set-associative mapping),
    a given block can be in one of 2 lines, in only one set

Set Associative Mapping Address Structure

Tag: 9 bits | Set: 13 bits | Word: 2 bits

• Use the set field to determine which cache set to look in
  – 13-bit set number
  – Set number = block number modulo 2^13
  – e.g. addresses 000000, 008000, 010000, 018000, … map to the same set

• Compare tag field to see if we have a hit

Set Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^s
• Number of lines in set = k
• Number of sets = v = 2^d
• Number of lines in cache = kv = k x 2^d
• Size of tag = (s - d) bits
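A minimal sketch of the 9/13/2-bit split from the address-structure slide above, showing that addresses sharing a set number fall in the same set; the constants and names are illustrative only.

SET_BITS, WORD_BITS = 13, 2    # the remaining 9 bits form the tag

def split_set_assoc(address):
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

# addresses that differ only in the tag field all land in set 0
for addr in (0x000000, 0x008000, 0x010000, 0x018000):
    print(split_set_assoc(addr))   # (tag, set, word) with set == 0 each time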

Two Way Set Associative Cache Organization

Replacement Algorithms
Direct mapping

• No choice
• Each block only maps to one line
• Replace that line

• No replacement algorithm is needed

Replacement Algorithms
Associative & Set Associative

• Implemented in hardware (to achieve high speed)

• Least recently used (LRU)
  – e.g. in a 2-way set-associative cache, replace whichever of the two
    blocks in the set was used least recently
• First in first out (FIFO)
– replace block that has been in cache longest
• Least frequently used (LFU)
– replace block which has had fewest hits
• Random
– not based on usage
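A minimal sketch of LRU within one two-way set, written as a small Python class purely for illustration; a real cache would implement this with a single USE bit per pair of lines.

class TwoWaySet:
    """One 2-way set: keeps its (tag, block) pairs ordered by recency of use."""
    def __init__(self):
        self.lines = []                      # most recently used entry is last

    def access(self, tag, fetch_block):
        for i, (t, data) in enumerate(self.lines):
            if t == tag:                     # hit: mark as most recently used
                self.lines.append(self.lines.pop(i))
                return data
        if len(self.lines) == 2:             # set full: evict the LRU line
            self.lines.pop(0)
        data = fetch_block(tag)              # miss: fetch the block
        self.lines.append((tag, data))
        return data

s = TwoWaySet()
fetch = lambda t: f"block-{t}"
s.access(1, fetch); s.access(2, fetch)
s.access(1, fetch)                           # hit: tag 2 becomes LRU
s.access(3, fetch)                           # miss: evicts tag 2
print([t for t, _ in s.lines])               # [1, 3]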
Write Policy
• A cache block must not be overwritten (replaced) unless main memory is up to
  date with any change made to that block
  – e.g. if a result such as 2 + 3 = 5 has been written only to the cached copy,
    it must reach main memory before the block is replaced

• A variety of write policies are possible

• Possible Problems
– Multiple CPUs may have individual caches
– I/O may address main memory directly

Write through
• All write operations go to main memory as well as cache, (so
that main memory is always valid)

• Multiple CPUs can monitor main memory traffic to keep local


cache (its own) up to date
– Disadvantage: generates a lot of memory traffic, which can become a bottleneck
– Slows down writes

Write back
• Minimize memory write operations
• Updates initially made in cache only
• Update bit for cache line (slot) is set when update occurs
• If block is to be replaced, write to main memory only if update
bit is set

• Disadvantages
  – Other caches can get out of sync
  – I/O must access main memory through the cache
  – Note: typically only about 15% of memory references are writes
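A minimal sketch of the write-back behaviour described above, using dictionary stand-ins for the cache and main memory (illustrative names only): a write touches only the cache and sets the update (dirty) bit, and main memory is written only when a dirty block is replaced.

main_memory = {}            # block number -> data
cache = {}                  # block number -> (data, dirty bit)

def write(block_no, data):
    cache[block_no] = (data, True)        # update cache only, set the update bit

def replace(block_no):
    data, dirty = cache.pop(block_no)
    if dirty:                             # write back only if the block was updated
        main_memory[block_no] = data

write(7, "new data")
replace(7)
print(main_memory)          # {7: 'new data'} -- reached memory only at replacement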

Line Size
• No definitive optimum value
• However,
– A size of 8 to 64 bytes seems reasonably close to optimum
– For HPC systems, 64- and 128-byte cache line sizes are most
frequently used

Number of Caches
• On-chip Caches (Level-1, L1)
• Off-chip or external caches (Level-2, L2)
– known as a two-level cache

• Due to shrinkage of processor components


– Most microprocessors have moved the L2 cache onto the processor
chip and added an L3 cache

• More recently, many microprocessors have incorporated an


on-chip L3 cache

Pentium 4 Cache
• 80386 – no on chip cache
• 80486 – single on chip cache
– 8kB, using 16 byte lines, 4-way set associative organization
• Pentium (all versions) –
– Two on chip caches (both L1)
– Data & instructions caches
• Pentium III – L3 cache added off chip
• Pentium 4
– L1 cache, 16kB, 64 byte lines, four way set associative
– L2 cache, 256kB, 128 byte lines, 8-way set associative
– L3 cache on chip, 1 MB, 128 byte lines, 8-way set associative

Pentium 4 Block Diagram

Quiz-01
A short quiz is scheduled for next week's class.

The quiz comprises:
  2-3 questions
  Duration: 20-30 minutes
  Worth: 10 marks
  Covers: Lectures 1-5
Thank you

Dr. Shahid Latif (Associate Professor)

Department of Computer Science & IT


Sarhad University of Science and Information Technology, Peshawar

