Lecture-04 & 05, Adv. Computer Architecture, CS-522


Advanced Computer Architecture

CS-522

MS – Computer Science

Credit Hours : 3-0

Dr. Shahid Latif (Associate Professor)

Department of Computer Science & IT


Sarhad University of Science and Information Technology, Peshawar
Course Details
Course title/code: Adv. Computer Architecture/CS-522

Lecture: 04 and 05

Topic: Cache Memory

Program: MS – Computer Science (Sem: 1st, 2nd, 3rd & 4th)

Lecture Outlines
• Computer Memory System Overview
  – Characteristics of Memory System
    • Location, Capacity, Unit of transfer, Access method, Performance,
      Physical type, Physical characteristics, Organization etc.
  – The Memory Hierarchy
• Cache Memory Principles
  – What is Cache?
  – Cache/Main memory structure
  – Cache Operation
• Cache Memory
  – Cache and Main Memory Structure
  – Elements of Cache Design
    • Cache Addresses: Logical and Physical addresses
    • Cache Size
    • Mapping Function
      – Direct Mapping
      – Associative Mapping
      – Set-Associative Mapping
Characteristics of Computer Memory
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location of computer memories
• Internal (main memory)
– Processor (local memory)
• Registers: PC, ACC, MQ, MAR, MBR, IR, IBR etc.
– Main memory (RAM), Cache

• External (secondary)
– Optical disks, magnetic disks and tapes
Capacity of computer memories
• Capacity of Internal Memory
– Typically expressed in bytes (8-bits) or words
– Word size: the natural unit of organisation
– e.g. 8, 16 or 32 bits
• Capacity of External Memory
– Typically expressed in bytes (8-bits)
– KB, MB, GB or even TB etc.
Unit of Transfer
• Internal memory
– Usually governed by data bus width (number of data lines into and out of
  the memory module)
– May be equal to the word length, but is often larger, such as 64, 128, or
  256 bits
– Addressable unit (word or byte): the smallest location which can be
  uniquely addressed

• External memory
– Usually a block, usually much larger than a word
Access Methods
• Sequential
– Access must be in a specific linear sequence
– Start at the beginning and read through in order
– Access time depends on location of data and previous location
• Thus highly variable
– e.g. Tape
• Direct
– Individual blocks have unique address based on physical locations
– Access is by jumping to vicinity (surrounding area) plus sequential
searching/counting/waiting
– Access time depends on location and previous location
• Thus variable
– e.g. Disk
Access Methods
• Random
– Individual addresses identify memory locations exactly
– Access time is independent of location or previous access
– e.g. Main memory (RAM) and some Cache

• Associative
– Data is located by a comparison with contents of a portion of the store
– i.e. word is retrieved based on a portion of its contents rather than its address
– Access time is independent of location or previous access
– e.g. Cache (Random Access type)
Performance
• Access time (latency)
– Time between presenting the address to memory and getting the valid
data
– Time it takes to perform a read or write operation
• Memory Cycle time
– Time may be required for the memory to “recover” before next access
– Memory Cycle time = access time + additional time (recovery)
• Transfer Rate
– Rate at which data can be moved (into or out of memory)

Recovery: regeneration of data after a destructive read


Physical Types
• Semiconductor
– RAM
• Magnetic
– Disk & Tape
• Optical
– CD & DVD
• Others (such as magneto-optical)
– Bubble
• A thin film of a magnetic material to hold small magnetized areas, known
as bubbles or domains, each storing one bit of data
Physical Characteristics
• Volatile/non-volatile
– Volatile: information is lost when electrical power is switched off
– Non-Volatile: no electrical power is needed to retain information
• Semiconductor memories may be either volatile or non-volatile
• Magnetic surface memories are non-volatile
• Erasable/non-erasable
– Non-erasable: cannot be altered, except by destroying the storage
unit
• such as read-only memory (ROM)
– A practical non-erasable memory must also be nonvolatile

• Power consumption
Organisation
• Physical arrangement of bits to form words
• Not always obvious
– e.g. Interleaved
• (to insert something alternately and regularly between the parts of…)
Memory Hierarchy
• Registers
– In CPU
• Internal or Main memory
– May include one or more levels of cache
– “RAM”
• External memory
– Backing store
– HD, FD, CD, DVD, USB, Tapes etc.
Memory design constraints
• How much?
– Capacity
• How fast?
– Access time
• How expensive?
– Cost per bit

• The following relationships hold among the three (capacity, access time, cost):
  – Faster access time, greater cost per bit
  – Greater capacity, smaller cost per bit
  – Greater capacity, slower access time
Memory Hierarchy - Diagram

As one goes down the hierarchy:
a. Decreasing cost per bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access of memory by the processor
So you want fast?
• Smaller, more expensive but faster memory (e.g. 1,000 words, access
  time = 0.01 µs) is supplemented (complemented) by larger, cheaper but
  slower memory (e.g. 100,000 words, access time = 0.1 µs)

• It is possible to build a computer which uses only static RAM (see later)
  – This would be very fast
  – This would need no cache
  – This would cost a very large amount
Cache Memory Principles

Cache
• Small amount of fast memory
• Sits between normal main memory (RAM) and CPU
• May be located on CPU chip or module

• Contains a copy of portions of main memory
  – When the processor attempts to read a word of memory, a check is made
    to determine if the word is in the cache
  – If so, the word is delivered to the processor
  – If not, a block of main memory (some fixed number of words) is read
    into the cache and then the word is delivered to the processor
Cache/Main Memory Structure

Tag = a portion of the main memory address

C (number of cache lines) << M (number of main memory blocks)


Exercise
• A computer system has a main memory with a capacity of 2GB.
• The system uses a memory architecture with a block size of 512 B.

• Calculate the following:

• The number of main memory blocks.


• The size of the memory address (in bits) required to address each block
uniquely.
• If the system uses a byte-addressable memory, calculate the total number of
unique addresses in the main memory.
Exercise - Solution
• Number of Main Memory Blocks:
  Number of Blocks = Memory Size / Block Size
                   = 2 GB / 512 B = 2^31 / 2^9
                   = 2^22 = 4,194,304 blocks

• Size of Memory Address (to address each block uniquely):
  Address bits = n = log2(Number of Blocks) = log2(2^22) = 22 bits
  (equivalently, 2^n = Number of Blocks)

• Total Number of Unique Addresses (byte-addressable):
  Total addresses = Memory Size / 1 B = 2^31 addresses,
  requiring log2(2^31) = 31 address bits
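The arithmetic above can be checked with a few lines of Python; this is a minimal sketch using only the exercise's figures (2 GB main memory, 512 B blocks).

import math

memory_size = 2 * 2**30                          # 2 GB = 2^31 bytes
block_size = 512                                 # 512 B = 2^9 bytes

num_blocks = memory_size // block_size           # 2^22 = 4,194,304 blocks
block_address_bits = int(math.log2(num_blocks))  # 22 bits to address a block
byte_address_bits = int(math.log2(memory_size))  # 31 bits for byte addressing

print(num_blocks, block_address_bits, byte_address_bits)   # 4194304 22 31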
Exercise
• A computer system has a cache memory with a total size of 64 KB
  – The cache is organized into lines (blocks) of 128 B each

• Calculate the following:

• The number of cache lines in the cache.


• The size of the memory address (in bits) required to uniquely address each
cache line.

• The number of cache lines = cache size / block size = 64 KB / 128 B
  = 2^16 / 2^7 = 2^9 = 512 lines

• The size of the memory address (line number) = log2(number of cache lines)
  = log2(512) = 9 bits
  • In general, n address bits identify 2^n cache lines
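A similar check for this exercise (64 KB cache, 128 B lines); again a minimal sketch, not part of the original slides.

import math

cache_size = 64 * 1024                         # 64 KB = 2^16 bytes
line_size = 128                                # 128 B = 2^7 bytes

num_lines = cache_size // line_size            # 2^9 = 512 cache lines
line_address_bits = int(math.log2(num_lines))  # 9 bits to address a line

print(num_lines, line_address_bits)            # 512 9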
Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which block of main memory is
in each cache slot
• Since C (number of lines) << M (number of blocks),
  – a single line of cache cannot be uniquely and permanently dedicated to one block
  – so a tag is used to identify which block is currently stored
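The sequence above can be sketched in a few lines of Python; the dictionary-based cache and the list standing in for main memory are illustrative assumptions, not a real design.

BLOCK_SIZE = 4                      # words per block (example value)
main_memory = list(range(1024))     # pretend main memory, one value per word
cache = {}                          # block number -> list of words

def read(address):
    block_no, offset = divmod(address, BLOCK_SIZE)
    if block_no in cache:                        # hit: deliver straight from cache
        return cache[block_no][offset]
    start = block_no * BLOCK_SIZE                # miss: fetch the whole block first
    cache[block_no] = main_memory[start:start + BLOCK_SIZE]
    return cache[block_no][offset]

print(read(10), read(11))           # the second access hits the block just loaded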
Cache Read Operation Flowchart
Typical Cache Organization
Elements of Cache Design

Cache Addresses
• A logical cache (virtual cache)
– stores data using virtual addresses
– processor accesses the cache directly,
• without going through the MMU (Memory Management Unit)
– Advantage: cache access speed is faster than for a physical cache
• as cache can respond before the MMU performs an address translation
– Disadvantage: virtual memory supplies each application with the same
  virtual memory address space
  • each application sees a virtual memory that starts at address 0
  • each cache line therefore needs to identify which virtual address space
    the address refers to
• A physical cache
  – stores data using main memory physical addresses
  – processor accesses the cache after going through the MMU
Cache Size
• We would like the cache to be small enough that the overall average cost per
  bit approaches that of main memory alone, yet large enough that the overall
  average access time approaches that of the cache alone

  – Cost
    • More cache is expensive

– Speed
• More cache is faster (up to a point)
• Large cache means large number of gates (for addressing the cache)
– Tend to be slightly slower than smaller ones

– Chip and board area


• Available chip area also limits cache size
Comparison of Cache Sizes

Mapping Function
• As there are fewer cache lines than main memory blocks
  – An algorithm is needed for mapping main memory blocks into cache lines
    • i.e. a means of determining which main memory block currently occupies a
      given cache line
  – Three techniques are used
    • Direct
    • Associative
    • Set-associative

• For all three cases, the example used in the following slides is:
  – Cache of 64 kB
  – Cache line/block of 4 bytes (each line = 32 bits)
    • i.e. the cache holds 16k (2^14) lines of 4 bytes each
  – 16 MB main memory
    • 24-bit address (2^24 = 16M)
    • 4M blocks of 4 bytes each
Direct Mapping
• Each block of main memory maps to only one cache line
– i.e. if a block is in cache, it must be in one specific place

• The main memory address is viewed as two parts (three fields in total)
  – Least significant w bits (2 bits) identify a unique word within a block
  – Most significant s bits (22 bits) specify one memory block
  – The s bits are split into
    • a cache line field of r bits and
    • a tag of s-r bits (most significant)

  Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of cache = 2^(r+w) words or bytes
• Size of tag = (s - r) bits
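A quick numeric check of these relations for the running example (64 kB cache, 4-byte lines, 16 MB main memory, so s = 22, r = 14, w = 2); a minimal Python sketch with illustrative variable names.

s, r, w = 22, 14, 2

addressable_units = 2 ** (s + w)   # 2^24 = 16M bytes of main memory
blocks_in_memory = 2 ** s          # 2^22 = 4M blocks
lines_in_cache = 2 ** r            # 2^14 = 16K lines
cache_data_size = 2 ** (r + w)     # 2^16 = 64 KB of data
tag_bits = s - r                   # 8-bit tag stored with each line

print(addressable_units, blocks_in_memory, lines_in_cache, cache_data_size, tag_bits)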

Direct Mapping Cache Line Table

Cache line | Main memory blocks assigned
0          | 0, 100, 200, 300, …
1          | 1, 101, 201, 301, …
…          | …

(Block j maps to cache line j modulo the number of lines; in this illustration
the cache has 100 lines, so line 0 receives blocks 0, 100, 200, 300, …)
Direct Mapping Address Structure

Tag (s-r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits

• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
– 8 bit tag (=22-14)
– 14 bit slot or line
• No two blocks that map into the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
Direct Mapping Cache Organization

Direct Mapping Example

Direct Mapping pros & cons
• Simple
• Inexpensive to implement
• Disadvantage: Fixed location for given block
– If a program accesses 2 blocks that map to the same line repeatedly,
cache misses are very high

Exercise
• Consider a computer system with a direct-mapped cache
having the following characteristics:
– Cache size: 8 KB
– Block size: 8 B
– Main memory size: 128 KB (byte-addressable)

• Determine the size (in bits) of a main memory address.

• Calculate the corresponding tag, cache line and word/block offset bits
  for the addresses given below.
– 1 0110 0010 1011 0010
– 1 0001 1101 0111 1011

Exercise - Solution
• Size of a main memory address:
  – 17 bits (128 KB = 2^17 byte-addressable locations)

• Field widths for the direct-mapped cache:
  – Word offset = log2(8 B) = 3 bits
  – Cache line = log2(8 KB / 8 B) = log2(1024) = 10 bits
  – Tag = 17 - 10 - 3 = 4 bits

• For the address 1 0110 0010 1011 0010:

  Tag: 1011 | Cache line: 00 0101 0110 | Word: 010
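The same split (and the one for the second address, which the slide leaves unworked) can be reproduced with a short Python sketch; the function name and constants below are illustrative assumptions taken from the exercise parameters.

WORD_BITS = 3    # log2(8 B block)
LINE_BITS = 10   # log2(8 KB / 8 B) = log2(1024); remaining 4 bits form the tag

def split(address):
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word

for addr in (0b10110001010110010, 0b10001110101111011):
    tag, line, word = split(addr)
    print(f"tag={tag:04b} line={line:010b} word={word:03b}")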

Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive

Tag: 22 bits | Word: 2 bits

Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = undetermined by the address format
• Size of tag = s bits

Associative Mapping Address Structure

Tag: 22 bits | Word: 2 bits

• 22 bit tag stored with each 32 bit block of data


• Compare tag field with tag entry in cache to check for hit
• Least significant 2 bits of address identify which word is
required from 32 bit data block
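A toy sketch of an associative lookup, assuming a dictionary of (stored tag, block) entries purely for illustration: every stored tag is examined for a match, sequentially here, whereas real hardware compares all tags in parallel.

WORD_BITS = 2    # the remaining 22 address bits form the tag

cache = {0x2AB01: [10, 11, 12, 13]}   # stored tag -> 4-byte block (illustrative)

def lookup(address):
    word = address & ((1 << WORD_BITS) - 1)
    tag = address >> WORD_BITS
    for stored_tag, block in cache.items():   # examine every line's tag
        if stored_tag == tag:
            return block[word]                # hit
    return None                               # miss

print(lookup((0x2AB01 << 2) | 0b10))  # 12 -- tag matches, word 2 selected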

Associative Cache Organization

Associative Mapping Example

Associative Mapping pros & cons
• Flexibility as to which block to replace when a new block is read
  into the cache
– Replacement algorithms are designed to maximize the hit ratio

• Disadvantage: complex circuitry required to examine the tags


of all cache lines in parallel

Set Associative Mapping
• Combines the advantages of both the direct and associative approaches
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
  – e.g. Block B can be in any line of set i
  – k lines per set = k-way set-associative mapping
  – e.g. with 2 lines per set (2-way set-associative mapping),
    a given block can be in one of 2 lines, in only one set

Set Associative Mapping Address Structure

Tag: 9 bits | Set: 13 bits | Word: 2 bits

• Use the set field to determine which cache set to look in
  – 13-bit set number
  – Set number = block number modulo 2^13
  – e.g. addresses 000000, 008000, 010000, 018000, … map to the same set

• Compare tag field to see if we have a hit

Set Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^s
• Number of lines in set = k
• Number of sets = v = 2^d
• Number of lines in cache = kv = k x 2^d
• Size of tag = (s - d) bits
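A minimal sketch of the 9/13/2-bit split from the address-structure slide above, showing that addresses sharing a set number fall in the same set; the constants and names are illustrative only.

SET_BITS, WORD_BITS = 13, 2    # the remaining 9 bits form the tag

def split_set_assoc(address):
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

# addresses that differ only in the tag field all land in set 0
for addr in (0x000000, 0x008000, 0x010000, 0x018000):
    print(split_set_assoc(addr))   # (tag, set, word) with set == 0 each time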

Two Way Set Associative Cache Organization

Replacement Algorithms
Direct mapping

• No choice
• Each block only maps to one line
• Replace that line

• No replacement algorithm is needed

Replacement Algorithms
Associative & Set Associative

• Implemented in hardware (to achieve high speed)

• Least recently used (LRU)
  – e.g. in a 2-way set-associative cache, replace whichever of the two
    blocks in the set was used least recently
• First in first out (FIFO)
– replace block that has been in cache longest
• Least frequently used (LFU)
– replace block which has had fewest hits
• Random
– not based on usage
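A minimal sketch of LRU within one two-way set, written as a small Python class purely for illustration; a real cache would implement this with a single USE bit per pair of lines.

class TwoWaySet:
    """One 2-way set: keeps its (tag, block) pairs ordered by recency of use."""
    def __init__(self):
        self.lines = []                      # most recently used entry is last

    def access(self, tag, fetch_block):
        for i, (t, data) in enumerate(self.lines):
            if t == tag:                     # hit: mark as most recently used
                self.lines.append(self.lines.pop(i))
                return data
        if len(self.lines) == 2:             # set full: evict the LRU line
            self.lines.pop(0)
        data = fetch_block(tag)              # miss: fetch the block
        self.lines.append((tag, data))
        return data

s = TwoWaySet()
fetch = lambda t: f"block-{t}"
s.access(1, fetch); s.access(2, fetch)
s.access(1, fetch)                           # hit: tag 2 becomes LRU
s.access(3, fetch)                           # miss: evicts tag 2
print([t for t, _ in s.lines])               # [1, 3]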
Write Policy
• A cache block must not be overwritten (replaced) unless main memory is up to
  date with any change made to that block
  – e.g. if a result such as 2 + 3 = 5 has been written only to the cached copy,
    it must reach main memory before the block is replaced

• A variety of write policies are possible

• Possible Problems
– Multiple CPUs may have individual caches
– I/O may address main memory directly

Write through
• All write operations go to main memory as well as cache, (so
that main memory is always valid)

• Multiple CPUs can monitor main memory traffic to keep local


cache (its own) up to date
– Disadvantage: generates a lot of memory traffic, which can become a bottleneck
– Slows down writes

Write back
• Minimize memory write operations
• Updates initially made in cache only
• Update bit for cache line (slot) is set when update occurs
• If block is to be replaced, write to main memory only if update
bit is set

• Disadvantages
  – Other caches can get out of sync
  – I/O must access main memory through the cache
  – Note: typically only about 15% of memory references are writes
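A minimal sketch of the write-back behaviour described above, using dictionary stand-ins for the cache and main memory (illustrative names only): a write touches only the cache and sets the update (dirty) bit, and main memory is written only when a dirty block is replaced.

main_memory = {}            # block number -> data
cache = {}                  # block number -> (data, dirty bit)

def write(block_no, data):
    cache[block_no] = (data, True)        # update cache only, set the update bit

def replace(block_no):
    data, dirty = cache.pop(block_no)
    if dirty:                             # write back only if the block was updated
        main_memory[block_no] = data

write(7, "new data")
replace(7)
print(main_memory)          # {7: 'new data'} -- reached memory only at replacement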

Line Size
• No definitive optimum value
• However,
– A size of 8 to 64 bytes seems reasonably close to optimum
– For HPC systems, 64- and 128-byte cache line sizes are most
frequently used

Number of Caches
• On-chip Caches (Level-1, L1)
• Off-chip or external caches (Level-2, L2)
– known as a two-level cache

• Due to shrinkage of processor components


– Most microprocessors have moved the L2 cache onto the processor
chip and added an L3 cache

• More recently, many microprocessors have incorporated an


on-chip L3 cache

Pentium 4 Cache
• 80386 – no on chip cache
• 80486 – single on chip cache
– 8kB, using 16 byte lines, 4-way set associative organization
• Pentium (all versions) –
– Two on chip caches (both L1)
– Data & instructions caches
• Pentium III – L3 cache added off chip
• Pentium 4
– L1 cache, 16kB, 64 byte lines, four way set associative
– L2 cache, 256kB, 128 byte lines, 8-way set associative
– L3 cache on chip, 1 MB, 128 byte lines, 8-way set associative

Pentium 4 Block Diagram

Quiz-01
A short quiz is scheduled for next week's class.

The quiz comprises:
  2-3 questions
  Duration: 20-30 minutes
  Worth: 10 marks
  Covers: Lectures 1-5
Thank you

Dr. Shahid Latif (Associate Professor)

Department of Computer Science & IT


Sarhad University of Science and Information Technology, Peshawar

