MC Module-5 Notes


ARM processors Module - 5

 A cache is a small, fast array of memory placed between the processor core and main memory that
stores portions of recently referenced main memory.
 The goal of a cache is to reduce the memory access bottleneck imposed on the processor core by slow
memory.
 Often used with a cache is a write buffer—a very small first-in-first-out (FIFO) memory placed
between the processor core and main memory. The purpose of a write buffer is to free the processor
core and cache memory from the slow write time associated with writing to main memory.
 Since cache memory only represents a very small portion of main memory, the cache fills quickly
during program execution.
 Once full, the cache controller frequently evicts existing code or data from cache memory to make
more room for the new code or data.
 This eviction process tends to occur randomly, leaving some data in the cache and removing other data.

The Memory Hierarchy and Cache Memory

Memory Hierarchy
 The figure reviews this information, showing where a cache and write buffer fit in the hierarchy.
 The innermost level of the hierarchy is at the processor core.
 This memory is so tightly coupled to the processor that in many ways it is difficult to think of it as
separate from the processor. This memory is known as a register file.
 Also at the primary level is main memory. It includes volatile components like SRAM and DRAM, and
non-volatile components like flash memory.
 The next level is secondary storage—large, slow, relatively inexpensive mass storage devices such as
disk drives or removable memory.
 Also included in this level is data derived from peripheral devices, which are characterized by their
extremely long access times.
 A cache may be incorporated between any two levels in the hierarchy where there is a significant
access-time difference between the memory components.
 The L1 cache is an array of high-speed, on-chip memory that temporarily holds code and data from a
slower level.
 The write buffer is a very small FIFO buffer that supports writes to main memory from the cache.


 An L2 cache is located between the L1 cache and slower memory. The L1 and L2 caches are also
known as the primary and secondary caches.

 The figure shows the relationship of a cache to the main memory system and the processor core.
 The upper part of the figure shows a system without a cache; the lower part shows a system with a cache.

Caches and Memory Management Units

 If a cached core supports virtual memory, the cache can be located between the core and the memory
management unit (MMU), or between the MMU and physical memory.
 Placement of the cache before or after the MMU determines the addressing realm the cache
operates in and how a programmer views the cache memory system.


 A logical cache is located between the processor and the MMU.


 The processor can access data from a logical cache directly without going through the MMU.
 A logical cache is also known as a virtual cache.
 A logical cache stores data in a virtual address space.
 A physical cache is located between the MMU and main memory. For the processor to access memory,
the MMU must first translate the virtual address to a physical address before the cache memory can
provide data to the core.
 A physical cache stores memory using physical addresses.

Cache Architecture
 ARM uses two bus architectures in its cached cores, the Von Neumann and the Harvard.
 A different cache design is used to support the two architectures.
 In processor cores using the Von Neumann architecture, there is a single cache used for instruction
and data. This type of cache is known as a unified cache.
 The Harvard architecture has separate instruction and data buses to improve overall system
performance, but supporting the two buses requires two caches.
 In processor cores using the Harvard architecture, there are two caches: an instruction cache (I-cache)
and a data cache (D-cache). This type of cache is known as a split cache.

Basic Architecture of a Cache Memory


A 4 KB cache consisting of 256 cache lines of four 32-bit words.

 A simple cache memory is shown on the right side of the figure.


 It has three main parts: a directory store, a data section, and status information. All three parts of the
cache memory are present for each cache line.
 The cache must know where the information stored in a cache line originates from in main memory. It
uses a directory store to hold the address identifying where the cache line was copied from main
memory. The directory entry is known as a cache-tag.
 A cache memory must also store the data read from main memory. This information is held in the
data section.
 The size of a cache is defined as the actual code or data the cache can store from main memory.
 There are also status bits in cache memory to maintain state information. Two common status bits
are the valid bit and dirty bit.
 A valid bit marks a cache line as active, meaning it contains live data originally taken from main
memory and is currently available to the processor core on demand.
 A dirty bit defines whether or not a cache line contains data that is different from the value it
represents in main memory.
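
To make the three parts concrete, the following C sketch models one cache line of the 4 KB cache described above (256 lines of four 32-bit words). The type and field names are illustrative, not taken from any ARM manual.

#include <stdbool.h>
#include <stdint.h>

#define WORDS_PER_LINE 4    /* four 32-bit words per cache line    */
#define NUM_LINES      256  /* 256 lines x 16 bytes = 4 KB of data */

/* One cache line: directory store (cache-tag), status bits, data section. */
typedef struct {
    uint32_t tag;                   /* cache-tag: source address in memory */
    bool     valid;                 /* valid bit: line holds live data     */
    bool     dirty;                 /* dirty bit: line differs from memory */
    uint32_t data[WORDS_PER_LINE];  /* data section: copy of main memory   */
} cache_line_t;

static cache_line_t cache[NUM_LINES];  /* the whole direct-mapped cache */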

Basic Operation of a Cache Controller


 The cache controller is hardware that copies code or data from main memory to cache memory
automatically.
 First, the controller uses the set index portion of the address to locate the cache line within the cache
memory that might hold the requested code or data. This cache line contains the cache-tag and status
bits, which the controller uses to determine the actual data stored there.
 The controller then checks the valid bit to determine if the cache line is active, and compares the
cache-tag to the tag field of the requested address. If both the status check and comparison succeed,
it is a cache hit. If either the status check or comparison fails, it is a cache miss.
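
A minimal sketch of this lookup, reusing cache_line_t and the cache array from the sketch above. It assumes the 4 KB direct-mapped geometry of the figure, which puts the byte and word offset in bits [3:0], the set index in bits [11:4], and the tag in bits [31:12]; the function names are illustrative.

#define OFFSET_BITS 4   /* [3:2] word select, [1:0] byte select */
#define INDEX_BITS  8   /* 2^8 = 256 cache lines                */

static uint32_t set_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
}

static uint32_t tag_field(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}

/* Mirrors the controller's two checks: the valid bit, then the tag compare. */
bool is_cache_hit(uint32_t addr) {
    const cache_line_t *line = &cache[set_index(addr)];
    return line->valid && line->tag == tag_field(addr);
}

For any address whose low 12 bits are 0x824, set_index() returns 0x82; this is exactly the address class used in the next section.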

The Relationship between Cache and Main Memory


How main memory maps to a direct-mapped cache.


 The figure represents the simplest form of cache, known as a direct-mapped cache.
 In a direct-mapped cache each addressed location in main memory maps to a single location in cache
memory.
 Since main memory is much larger than cache memory, there are many addresses in main memory
that map to the same single location in cache memory.
 The figure shows this relationship for the class of addresses ending in 0x824.
 The set index selects the one location in cache where all values in memory with an ending address of
0x824 are stored.
 The tag field is the portion of the address that is compared to the cache-tag value found in the
directory store.
 The comparison of the tag with the cache-tag determines whether the requested data is in cache or
represents another of the million locations in main memory with an ending address of 0x824.
 During a cache line fill the cache controller may forward the loading data to the core at the same time
it is copying it to cache; this is known as data streaming.
 If valid data exists in this cache line but represents another address block in main memory, the entire
cache line is evicted and replaced by the cache line containing the requested address. This process of
removing an existing cache line as part of servicing a cache miss is known as eviction.
 A direct-mapped cache is a simple solution, but there is a design cost inherent in having a single
location available to store a value from main memory.
 Direct-mapped caches are subject to high levels of thrashing—a software battle for the same location
in cache memory.
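
A contrived C sketch of the thrashing pattern, assuming the 4 KB direct-mapped cache above and assuming the linker happens to place the two arrays exactly one cache size apart in memory:

#define CACHE_SIZE 4096

/* Assume b is placed immediately after a, so a[i] and b[i] are 4096 bytes
 * apart, share address bits [11:0], and map to the same cache line. */
static char a[CACHE_SIZE];
static char b[CACHE_SIZE];

long thrash(void) {
    long sum = 0;
    for (int i = 0; i < CACHE_SIZE; i++) {
        sum += a[i];   /* miss: line fill for a[i]'s block...            */
        sum += b[i];   /* miss: evicts that same line to bring in b[i]'s */
    }
    return sum;
}

Every access misses because each line fill evicts the line the other array just loaded; the set associativity described next removes exactly this kind of conflict.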

Set Associativity
 Set associativity is a structural change that divides the cache memory into smaller equal units,
called ways.


A 4 KB, four-way set associative cache. The cache has 256 total cache lines, which are
separated into four ways, each containing 64 cache lines. The cache line contains four
words.

 The set index now addresses more than one cache line—it points to one cache line in each way.
Instead of one way of 256 lines, the cache has four ways of 64 lines.
 The four cache lines with the same set index are said to be in the same set, which is the origin of the
name “set index.”
 A data or code block from main memory can be allocated to any of the four ways in a set without
affecting program execution; in other words, a block from a specific location in main memory can be
stored in any cache line that is a member of its set.
 The bit field for the tag is now two bits larger, and the set index bit field is two bits smaller. This
means four million main memory addresses now map to one set of four cache lines, instead of one
million addresses mapping to one location.
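
The same lookup for the four-way cache is sketched below; only the field widths and the search over the ways change. It reuses cache_line_t and OFFSET_BITS from the earlier sketches, and the array name ways is illustrative.

#define NUM_WAYS        4
#define INDEX_BITS_4WAY 6   /* 2^6 = 64 sets; the tag grows to bits [31:10] */

static cache_line_t ways[NUM_WAYS][1 << INDEX_BITS_4WAY];  /* 4 x 64 lines */

bool is_cache_hit_4way(uint32_t addr) {
    uint32_t set = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS_4WAY) - 1u);
    uint32_t tag = addr >> (OFFSET_BITS + INDEX_BITS_4WAY);
    /* The block may reside in any of the four ways of its set. */
    for (int way = 0; way < NUM_WAYS; way++)
        if (ways[way][set].valid && ways[way][set].tag == tag)
            return true;
    return false;
}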

Increasing Set Associativity


 The ideal goal would be to maximize the set associativity of a cache by designing it so any main
memory location maps to any cache line.
 However, as the associativity increases, so does the complexity of the hardware that supports it. One
method hardware designers use to increase the set associativity of a cache is a content addressable
memory (CAM).
 A CAM uses a set of comparators to compare the input tag address with the cache-tag stored in each
valid cache line.
 Using a CAM allows many more cache-tags to be compared simultaneously, thereby increasing the
number of cache lines that can be included in a set.


4 KB 64-way set associative D-cache using a CAM.


 The tag portion of the requested address is used as an input to the four CAMs that simultaneously
compare the input tag with all cache-tags stored in the 64 ways.
 If there is a match, cache data is provided by the cache memory. If no match occurs, a miss signal is
generated by the memory controller.
 The controller enables one of four CAMs using the set index bits.
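
A software model of this lookup is sketched below: two set index bits enable one of the four CAMs, and the loop stands in for the 64 tag comparisons the CAM performs simultaneously in hardware. The names and exact bit split are illustrative, and cache_line_t and OFFSET_BITS come from the earlier sketches.

#define NUM_CAMS 4    /* 256 lines = 4 CAMs x 64 ways */
#define CAM_WAYS 64

static cache_line_t cam[NUM_CAMS][CAM_WAYS];

bool cam_lookup(uint32_t addr) {
    uint32_t set = (addr >> OFFSET_BITS) & (NUM_CAMS - 1u);  /* 2 index bits */
    uint32_t tag = addr >> (OFFSET_BITS + 2);
    for (int way = 0; way < CAM_WAYS; way++)   /* parallel in the real CAM */
        if (cam[set][way].valid && cam[set][way].tag == tag)
            return true;
    return false;   /* the controller generates the miss signal */
}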

Write Buffers

 A write buffer is a very small, fast FIFO memory buffer that temporarily holds data that the processor
would normally write to main memory.
 In a system with a write buffer, data is written at high speed to the FIFO and then emptied to slower
main memory.
 The write buffer reduces the processor time taken to write small blocks of sequential data to main
memory. The FIFO memory of the write buffer sits at the same level in the memory hierarchy as the L1
cache, as shown in the memory hierarchy figure.
 The efficiency of the write buffer depends on the ratio of main memory writes to the number of
instructions executed.
 A write buffer also improves cache performance; the improvement occurs during cache line evictions.
If the cache controller evicts a dirty cache line, it writes the cache line to the write buffer instead of
main memory.
 Data written to the write buffer is not available for reading until it has exited the write buffer to main
memory.
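
A minimal software model of such a FIFO is sketched below; the depth and the slow_memory_write() call are hypothetical stand-ins, since real write buffer depths and drain mechanisms are core-specific.

#include <stdint.h>

#define WB_DEPTH 8   /* hypothetical; real write buffers hold only a few entries */

typedef struct { uint32_t addr, data; } wb_entry_t;

static wb_entry_t wb[WB_DEPTH];
static unsigned   wb_head, wb_tail, wb_count;

/* Memory side: drain one entry to slow main memory. Until an entry has
 * drained, its data is not yet available for reading from main memory. */
void wb_drain_one(void) {
    if (wb_count == 0)
        return;
    /* slow_memory_write(wb[wb_head].addr, wb[wb_head].data); -- hypothetical */
    wb_head = (wb_head + 1) % WB_DEPTH;
    wb_count--;
}

/* Processor side: fast enqueue; the core stalls only when the FIFO is full. */
void wb_write(uint32_t addr, uint32_t data) {
    if (wb_count == WB_DEPTH)   /* full: model the stall by draining first */
        wb_drain_one();
    wb[wb_tail] = (wb_entry_t){ addr, data };
    wb_tail = (wb_tail + 1) % WB_DEPTH;
    wb_count++;
}
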
Measuring Cache Efficiency
 The hit rate is the number of cache hits divided by the total number of memory requests over a given
time interval. The value is expressed as a percentage:

hit rate = (cache hits / total memory requests) × 100

 The miss rate is similar in form: the total cache misses divided by the total number of memory
requests, expressed as a percentage over a time interval. Note that the miss rate also equals 100
minus the hit rate:

miss rate = (cache misses / total memory requests) × 100 = 100 − hit rate
 hit time—the time it takes to access a memory location in the cache


 miss penalty—the time it takes to load a cache line from main memory into cache.
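
These quantities can be computed in a small sketch; the counter names are illustrative. The last function uses the standard average-memory-access-time combination (hit time plus miss rate times miss penalty), a common metric though not stated in the notes above.

typedef struct { unsigned long hits, misses; } cache_stats_t;

double hit_rate(const cache_stats_t *s) {          /* percentage */
    unsigned long requests = s->hits + s->misses;
    return requests ? 100.0 * s->hits / requests : 0.0;
}

double miss_rate(const cache_stats_t *s) {         /* = 100 - hit rate */
    return 100.0 - hit_rate(s);
}

/* Average memory access time, in the same time unit as the two arguments. */
double avg_access_time(const cache_stats_t *s,
                       double hit_time, double miss_penalty) {
    return hit_time + (miss_rate(s) / 100.0) * miss_penalty;
}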

Memory Protection Units


 There are two methods to control access to system resources: unprotected and protected.
 An unprotected system relies solely on software to protect the system resources.
 A protected system relies on both hardware and software to protect the system resources.
 An unprotected embedded system has no hardware dedicated to enforcing the use of memory and
peripheral devices during operation.
 A protected system has dedicated hardware to check and restrict access to system resources. It can
enforce resource ownership.
 A protected system is proactive in preventing one task from using the resources of another.
