MC Module-5 Notes
A cache is a small, fast array of memory placed between the processor core and main memory that
stores portions of recently referenced main memory.
The goal of a cache is to reduce the memory access bottleneck imposed on the processor core by slow
memory.
Often used with a cache is a write buffer—a very small first-in-first-out (FIFO) memory placed
between the processor core and main memory. The purpose of a write buffer is to free the processor
core and cache memory from the slow write time associated with writing to main memory.
Since cache memory only represents a very small portion of main memory, the cache fills quickly
during program execution.
Once full, the cache controller frequently evicts existing code or data from cache memory to make
more room for the new code or data.
This eviction process tends to occur randomly, leaving some data in the cache and removing other data.
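As a rough illustration of random eviction, the short C sketch below models victim selection: when the cache is full and a new block must be allocated, one of the candidate cache lines is picked at random and replaced. The candidate count of four and the function names are illustrative assumptions, not taken from these notes.

#include <stdio.h>
#include <stdlib.h>

/* Sketch of random eviction: when room is needed, the controller
 * picks one of the candidate cache lines at random and replaces it.
 * The candidate count of 4 is an illustrative assumption. */
#define CANDIDATE_LINES 4

static unsigned pick_victim_line(void)
{
    return (unsigned)(rand() % CANDIDATE_LINES);  /* any candidate may be evicted */
}

int main(void)
{
    for (int i = 0; i < 3; i++)
        printf("allocation %d evicts line %u\n", i, pick_victim_line());
    return 0;
}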
Memory Hierarchy
The figure reviews some of this information to show where a cache and write buffer fit in the memory hierarchy.
The innermost level of the hierarchy is at the processor core.
This memory is so tightly coupled to the processor that in many ways it is difficult to think of it as
separate from the processor. This memory is known as a register file.
Also at the primary level is main memory. It includes volatile components like SRAM and DRAM, and
non-volatile components like flash memory.
The next level is secondary storage—large, slow, relatively inexpensive mass storage devices such as
disk drives or removable memory.
Also included in this level is data derived from peripheral devices, which are characterized by their
extremely long access times.
A cache may be incorporated between any two levels of the hierarchy where there is a significant
access time difference between the memory components.
The L1 cache is an array of high-speed, on-chip memory that temporarily holds code and data from a
slower level.
The write buffer is a very small FIFO buffer that supports writes to main memory from the cache.
An L2 cache is located between the L1 cache and slower memory. The L1 and L2 caches are also
known as the primary and secondary caches.
The figure shows the relationship between the cache, the main memory system, and the processor core.
The upper part shows the system without a cache; the lower part shows it with a cache.
If a cached core supports virtual memory, the cache can be located between the core and the memory
management unit (MMU), or between the MMU and physical memory.
Placement of the cache before or after the MMU determines the addressing realm the cache
operates in and how a programmer views the cache memory system.
Cache Architecture
ARM uses two bus architectures in its cached cores, the Von Neumann and the Harvard.
A different cache design is used to support the two architectures.
In processor cores using the Von Neumann architecture, there is a single cache used for instruction
and data. This type of cache is known as a unified cache.
The Harvard architecture has separate instruction and data buses to improve overall system
performance, but supporting the two buses requires two caches.
In processor cores using the Harvard architecture, there are two caches: an instruction cache (I-cache)
and a data cache (D-cache). This type of cache is known as a split cache.
Set Associativity
Set associativity is a structural design feature that divides the cache memory into smaller, equal
units called ways.
Consider a 4 KB, four-way set associative cache. The cache has 256 cache lines in total, which are
separated into four ways, each containing 64 cache lines; each cache line holds four words.
The set index now addresses more than one cache line: it points to one cache line in each way.
Instead of one way of 256 lines, the cache has four ways of 64 lines.
The four cache lines with the same set index are said to be in the same set, which is the origin of the
name “set index.”
A data or code block from main memory can be allocated to any of the four ways in a set without
affecting program behaviour; in other words, storing data in cache lines within a set does not
affect program execution.
The important thing to note is that the data or code blocks from a specific location in main memory
can be stored in any cache line that is a member of a set.
The bit field for the tag is now two bits larger, and the set index bit field is two bits smaller. This
means four million main memory addresses now map to one set of four cache lines, instead of one
million addresses mapping to one location.
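The address breakdown described above can be expressed directly in code. The C sketch below assumes the 4 KB, four-way cache from the figure (16-byte lines, 64 sets), giving a 4-bit line offset, a 6-bit set index, and a 22-bit tag for a 32-bit address; the function names and the example address are hypothetical.

#include <stdint.h>
#include <stdio.h>

/* Address decomposition for the 4 KB, four-way set associative cache
 * described above: 16-byte lines (4 words) and 64 sets per way.
 * The field widths follow from those sizes; they are illustrative
 * assumptions, not ARM-specific definitions. */
#define LINE_OFFSET_BITS 4   /* 16 bytes per cache line         */
#define SET_INDEX_BITS   6   /* 64 sets (256 lines / 4 ways)    */

static unsigned set_index(uint32_t addr)
{
    return (unsigned)((addr >> LINE_OFFSET_BITS) & ((1u << SET_INDEX_BITS) - 1u));
}

static unsigned tag(uint32_t addr)
{
    return (unsigned)(addr >> (LINE_OFFSET_BITS + SET_INDEX_BITS));  /* remaining 22 bits */
}

int main(void)
{
    uint32_t addr = 0x00012344;  /* example address, chosen arbitrarily */
    printf("addr 0x%08X -> tag 0x%06X, set %u, byte offset %u\n",
           (unsigned)addr, tag(addr), set_index(addr),
           (unsigned)(addr & ((1u << LINE_OFFSET_BITS) - 1u)));
    return 0;
}

With 22 tag bits, 2^22 (about four million) main memory addresses share each set of four cache lines, which matches the figure quoted above.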
Write Buffers
A write buffer is a very small, fast FIFO memory buffer that temporarily holds data that the processor
would normally write to main memory.
In a system with a write buffer, data is written at high speed to the FIFO and then emptied to slower
main memory.
The write buffer reduces the processor time taken to write small blocks of sequential data to main
memory. The FIFO memory of the write buffer is at the same level in the memory hierarchy as the L1
cache, and is shown in the figure.
The efficiency of the write buffer depends on the ratio of main memory writes to the number of
instructions executed.
A write buffer also improves cache performance; the improvement occurs during cache line evictions.
If the cache controller evicts a dirty cache line, it writes the cache line to the write buffer instead of
main memory.
Data written to the write buffer is not available for reading until it has exited the write buffer to main
memory.
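To make this behaviour concrete, the C sketch below models a write buffer as a small software FIFO: the core deposits writes into the buffer at full speed, and a separate drain step later copies each entry to slow main memory, which is why buffered data is not readable until it has drained. The depth of eight entries and all names are illustrative assumptions, not a description of any specific ARM implementation.

#include <stdint.h>
#include <stdio.h>

/* Minimal model of a FIFO write buffer (for illustration only). */
#define WB_DEPTH 8

typedef struct { uint32_t addr; uint32_t data; } wb_entry_t;

static wb_entry_t wb[WB_DEPTH];
static int wb_count = 0;

/* Accept a write if the FIFO has room; a real core would stall otherwise. */
static int wb_write(uint32_t addr, uint32_t data)
{
    if (wb_count == WB_DEPTH)
        return 0;                      /* buffer full: write cannot be absorbed */
    wb[wb_count].addr = addr;
    wb[wb_count].data = data;
    wb_count++;
    return 1;
}

/* Drain the oldest entry to main memory (modelled here as a printf). */
static void wb_drain_one(void)
{
    if (wb_count == 0)
        return;
    printf("main memory write: [0x%08X] = 0x%08X\n",
           (unsigned)wb[0].addr, (unsigned)wb[0].data);
    for (int i = 1; i < wb_count; i++)  /* shift the FIFO forward */
        wb[i - 1] = wb[i];
    wb_count--;
}

int main(void)
{
    wb_write(0x20000000, 0xDEADBEEF);   /* fast writes into the FIFO        */
    wb_write(0x20000004, 0xCAFEF00D);
    while (wb_count > 0)
        wb_drain_one();                 /* data only reaches memory here    */
    return 0;
}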
Measuring Cache Efficiency
The hit rate is the number of cache hits divided by the total number of memory requests over a given
time interval. The value is expressed as a percentage:
hit rate = (cache hits / memory requests) × 100
The miss rate is similar in form: the total cache misses divided by the total number of memory
requests expressed as a percentage over a time interval. Note that the miss rate also equals 100
minus the hit rate.
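Both percentages can be computed directly from hit and request counters, as in the C sketch below; the counter values are invented purely for illustration.

#include <stdio.h>

/* Hit rate and miss rate as defined above; the counts are made up. */
int main(void)
{
    unsigned long cache_hits      = 950000;
    unsigned long memory_requests = 1000000;

    double hit_rate  = (double)cache_hits / memory_requests * 100.0;
    double miss_rate = 100.0 - hit_rate;    /* miss rate = 100 - hit rate */

    printf("hit rate  = %.1f%%\n", hit_rate);   /* 95.0% */
    printf("miss rate = %.1f%%\n", miss_rate);  /*  5.0% */
    return 0;
}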
Hit time is the time it takes to access a memory location in the cache.
Miss penalty is the time it takes to load a cache line from main memory into the cache.
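Hit time, miss rate, and miss penalty are commonly combined into an average memory access time, computed as hit time + miss rate × miss penalty. This formula and the cycle counts in the C sketch below are standard illustrative assumptions rather than figures from these notes.

#include <stdio.h>

/* Average memory access time from hit time, miss rate, and miss penalty.
 * All numbers below are assumed for illustration. */
int main(void)
{
    double hit_time_cycles     = 1.0;    /* assumed cache hit time       */
    double miss_penalty_cycles = 30.0;   /* assumed line-fill penalty    */
    double miss_rate           = 0.05;   /* 5% miss rate                 */

    double avg = hit_time_cycles + miss_rate * miss_penalty_cycles;
    printf("average access time = %.2f cycles\n", avg);  /* 2.50 */
    return 0;
}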