CHAPTER FIVE
MEMORY ORGANIZATION
Introduction
“Ideally one would desire an indefinitely large memory capacity such that any particular word
would be immediately available.
We are forced to recognize the possibility of constructing a hierarchy of memories, each of which
has greater capacity than the preceding but which is less quickly accessible.”
Computer architects rightly predicted that programmers would want unlimited amounts of
memory. A cost-effective way to provide large amounts of memory is by using a memory
hierarchy, which takes advantage of locality and of the cost and performance characteristics of memory technologies (such as SRAM and DRAM). The principle of locality states that most programs do not access all of their code or data uniformly; locality occurs in time (temporal locality) and in space (spatial locality). The fact that smaller hardware can be made faster also contributed to the development of memory hierarchies built from memories of different speeds and sizes.
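To make the idea concrete, the following short Python sketch (an illustration added here; the array name and size are arbitrary assumptions) marks where temporal and spatial locality appear in an ordinary loop.

# Illustrative sketch of locality in an ordinary loop (names and sizes are arbitrary).
data = list(range(1000))        # elements stored in consecutive memory locations

total = 0
for i in range(len(data)):
    # Spatial locality: data[i] and data[i + 1] lie next to each other,
    # so fetching one block of memory brings in the neighbours as well.
    total += data[i]
    # Temporal locality: the loop instructions and the variable `total`
    # are reused on every iteration.
print(total)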
The significance of the memory hierarchy has grown with advances in processor performance. As processor performance increases, the memory system must keep pace with it. Therefore, computer architects work towards minimizing the processor-memory performance gap.
You may be aware that the data required to operate a computer is stored on the hard drive. However, other storage methods are also required. This is the case for a number of reasons, but mostly because the hard drive is slow, and executing programs directly from it would be impractical. When the processor requires data or code, it is first fetched from the hard drive and loaded into the main memory (RAM). This increases the operating speed, and programs execute faster.
Main memory and the hard drive form two levels of the computer’s memory hierarchy. In memory
hierarchy, the storage devices are arranged in such a way that they take advantage of the
characteristics of different storage technologies in order to improve the overall performance of a
computer system.
The memory unit is an essential component of any digital computer, because all the programs and
data are stored here. A very small computer with a limited purpose may be able to accomplish its
intended task without the need for additional storage capacity. Nearly all conventional computers
would run more efficiently if they were provided with storage capacity beyond that of the main memory. It is not possible for a single memory unit to accommodate all the programs used in a conventional computer, owing to lack of storage space. Additionally, almost all computer users collect and continue to amass large amounts of data-processing software. Not all of the information that is gathered is needed by the processor at the same time. Thus, it is more appropriate to use low-cost storage devices as a backup for storing the information that is not currently used by the CPU.
The memory unit that exchanges information directly with the CPU is called the main memory.
Devices that support backup storage are referred to as auxiliary memory. The most common
devices that support storage in typical computer systems are magnetic disks and magnetic tapes.
They are mainly used to store system programs, large data files, and other backup information.
Only programs and data that are currently required by the processor reside in the main memory.
All other information is stored in auxiliary/secondary memory and moved to the main memory
when needed.
The overall memory capacity of a computer can be pictured as being a hierarchy of components.
The memory hierarchy system includes all storage devices used in a computer system. They range
from the slow but large capacity auxiliary memory to a comparatively faster main memory, to an
even smaller but faster cache memory accessible to the high-speed processing logic.
Placed at the bottom of the hierarchy is the comparatively slow magnetic tape used to store
removable files. Next is the magnetic disk that is used as backup storage.
The main memory takes up a central position because of its ability to communicate directly with
the CPU and with auxiliary memory devices, through an Input/output (I/O) processor. When the
CPU needs programs that are not present in the main memory, they are brought in from the
auxiliary memory. Programs that are not currently required in the main memory are moved into
the auxiliary memory to provide space for currently used programs and data.
To increase the speed of processing, a very-high-speed memory, known as cache, is used. It helps
in making the current data and programs available to the CPU at high speed. The cache memory
is used in computer systems to compensate for the difference between the main memory access
time and the speed of the processor logic. CPU logic is usually faster than main memory access. Yet, the processing speed of the CPU is limited by the main memory's speed. The difference in these operating speeds can be balanced by using an extremely fast, small cache between the CPU and main memory. The cache access time is close to the processor logic clock cycle time. The cache is used to store segments of programs currently being executed by the CPU, together with data frequently needed in the present calculations. When programs and data are made available at a high rate, the performance of the computer increases.
Although the I/O processor manages data transfers between main memory and auxiliary memory,
the cache organization deals with the transfer of information between the main memory and the
CPU.
Thus, the cache and the CPU interact at different levels in the memory hierarchy system. The
main reason for having two or three levels of memory hierarchy is cost-effectiveness. As the storage capacity of a memory increases, the cost per bit of storing binary information decreases, but the memory access time increases.
When compared to main memory, the auxiliary memory has a large storage capacity and is
relatively inexpensive. However, the auxiliary memory has low access speed.
The cache memory is very small, quite expensive, and has very high access speed. Consequently,
as the memory access speed increases, so does its relative cost. The main purpose of employing a
memory hierarchy is to achieve the optimum average access speed while minimizing the total cost
of the entire memory system.
Did u know? Many operating systems are designed in such a way that they allow the CPU to
process a number of independent programs concurrently.
This concept, called multiprogramming, refers to the presence of two or more programs in different
parts of the memory hierarchy at the same time.
In this way, it is possible to keep the computer busy by working on several programs in sequence.
Example: Assume that a program is being executed in the CPU and an I/O operation is required. The CPU instructs the I/O processor to start the transfer. At this point, the CPU is free to execute another program. In a multiprogramming system, when one program is waiting for an input or output operation to take place, another program is ready to use the CPU.
5.2 Main Memory
The main memory is the fundamental storage unit in a computer system. It is a relatively large and fast memory, and it stores programs and data during computer operations. Main memory technology is based on semiconductor integrated circuits.
As mentioned earlier, RAM is the main memory. Integrated circuit Random Access Memory
(RAM) chips are available in two possible operating modes. They are:
1. Static: It basically consists of internal flip-flops, which store the binary information. The stored information remains valid as long as power is supplied to the unit. Static RAM is easier to use and has shorter read and write cycles.
2. Dynamic: It stores the binary information in the form of electric charges that are applied to
capacitors. The capacitors are made available inside the chip by Metal Oxide Semiconductor
(MOS) transistors.
The stored charge on the capacitors tends to discharge with time and thus, the capacitors must be
regularly recharged by refreshing the dynamic memory.
Refreshing is done by cycling through the words every few milliseconds to restore the decaying charge.
Dynamic RAM offers reduced power consumption and larger storage capacity in a single memory chip.
Most of the main memory in a computer is typically made up of RAM integrated circuit chips, but
a part of the memory may be built with Read Only Memory (ROM) chips.
Originally, RAM referred to any random-access memory. The term is now used to denote a read/write memory, to distinguish it from read-only memory, although ROM is also random access. RAM is used for storing the bulk of the programs and data that are subject to change. ROM is used for storing programs that permanently reside in the computer.
It is also used for storing tables of constants whose value does not change once the computer is
constructed.
RAM and ROM chips are available in a variety of sizes. If the memory requirement of the computer is larger than the capacity of a single chip, a number of chips can be combined to form the required memory size.
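As a rough, hypothetical illustration (the chip and memory sizes below are assumptions, not figures from the text), the following sketch computes how many identical chips are needed to build a larger memory.

# Hypothetical example: build a 1024 x 16 memory from 256 x 8 chips.
required_words, required_bits_per_word = 1024, 16
chip_words, chip_bits_per_word = 256, 8

# Chips are placed side by side to widen the word, and in groups to add more words.
chips_per_row = required_bits_per_word // chip_bits_per_word   # 2 chips per row
rows_of_chips = required_words // chip_words                    # 4 rows of chips
print(chips_per_row * rows_of_chips)                            # 8 chips in total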
5.2.1 Random Access Memory
The term Random Access Memory (RAM) is typically used to refer to memory that is easily read from and written to by the microprocessor. Strictly speaking, this usage is imprecise: for a memory to be called random access, it should be possible to access any address at any time.
This differentiates RAM from storage devices such as tapes or hard drives where the data is
accessed sequentially.
Practically, RAM is the main memory of a computer. Its objective is to store data and applications
that are currently in use.
The operating system controls the usage of this memory. It determines when items are to be loaded into RAM, where they are to be located in RAM, and when they need to be removed from RAM.
RAM is intended to be very fast both for reading and writing data. RAM also tends to be volatile,
that is, all the data is lost as soon as power is cut off.
5.2.2 Read Only Memory
In every computer system, there must be a segment of memory that is stable and unaffected by
power loss.
This kind of memory is called Read Only Memory (ROM). Once again, the term is not strictly accurate: if it were truly impossible to write to this type of memory, the code or data it contains could never have been stored in it.
It simply indicates that without special mechanisms in place, a processor may not be able to write
to this type of memory.
Did u know? ROM also stores the computer's BIOS (Basic Input/Output System). BIOS is the code that guides the processor in accessing its resources when power is first turned on, so it must be present even when the computer is powered down.
ROM is also used to store the code for embedded systems.
Example: It is important for the code in your car’s computer to persist even if the battery is
disconnected.
There are some categories of ROM that the microprocessor can write to. However, the time taken
to write to them, or the programming requirements needed to do so, makes it difficult to write to
them regularly. This is why these memories are still considered read only.
There are also situations where the processor cannot write to a ROM under any circumstances.
Example: There is no need to modify the code in your car's computer, so this ROM is programmed before installation. To install a new program in the car's computer, the old ROM is removed and a new ROM is installed in its place.
SRAM
RAMs built from circuits that can preserve the stored information as long as power is supplied are referred to as Static Random Access Memories (SRAMs).
Flip-flops form the basic memory elements in a SRAM device. A SRAM consists of an array of
flip-flops, one for each bit.
Since an SRAM consists of an array of flip-flops, a large number of flip-flops are needed to provide a higher-capacity memory. Because of this, simplified flip-flop circuits built from BJT or MOS transistors are used in SRAMs. This saves chip area and yields memory integrated circuits with relatively lower cost, higher speed, and reduced power dissipation.
SRAMs have very short access times, typically less than 10 ns. SRAMs with battery backup are commonly used to preserve data during power loss.
DRAM
SRAMs are faster but their cost is high, because their cells require many transistors. RAMs can be
obtained at a lower cost if simpler cells are used.
A MOS storage cell based on capacitors can be used to replace the SRAM cells. Such a storage
cell cannot preserve the charge (that is, data) indefinitely and must be recharged periodically.
Therefore, these cells are called dynamic storage cells. RAMs using these cells are referred to as Dynamic RAMs, or simply DRAMs.
5.3 Cache
Even with improvements in hard drive performance, it is still not practical to execute programs or
access data directly from the mechanical devices like hard disk and magnetic tapes, because they
are very slow.
Therefore, when the processor needs to access data, it is first loaded from the hard drive into the
main memory where the higher performance RAM allows fast access to the data.
When the processor does not require the data anymore, it can either be discarded or used to update
the hard drive.
Because of cost, the capacity of a computer's main memory is small compared to that of its hard drive. However, this does not have a major impact, since the processor does not need to access all of the data on the hard drive at the same time.
Only the currently active data or programs need to be in RAM. Additional performance
improvements can be achieved by taking this concept to another level.
As discussed earlier, there are two main classifications of RAM: Static RAM (SRAM) and
Dynamic RAM (DRAM). SRAM is faster, but that speed comes at a high cost - it has a lower
density and it is more expensive.
Since main memory needs to be relatively large and inexpensive, it is implemented with DRAM.
Main memory improves the performance of the system by loading only the data that is currently
required or in use from the hard drive.
However, the system performance can be improved considerably by using a small, fast SRAM to
store data and code that is in immediate use.
The code and data that is not currently needed can be stored in the main memory.
You must be aware that, in a program, the instructions being executed at any given time tend to be clustered together.
This is mainly due to the basic constructs of programming such as loops and subroutines.
Therefore, when one instruction is executed, the chances of it or its surrounding instructions being
executed again in the near future are high.
Over a short interval, a cluster of instructions may execute over and over again. This is referred to
as the principle of locality.
Data also follows this principle, as related data items are often stored in consecutive locations.
To benefit from this principle, a small, fast SRAM is placed between the processor/CPU and main memory to hold the most recently used instructions and data, on the assumption that they will most likely be used again soon.
This small, fast SRAM is called a RAM cache or just a cache. The location of a cache in a memory
hierarchy is shown in Figure 5.3.
The SRAM of the cache needs to be small because larger address decoder circuits are slower than smaller ones.
As the memory increases, the complexity of the address decoder circuit also increases. As the
complexity of the address decoder circuit increases, the time taken to select a memory location
based on the address it received also increases. For this reason, making a memory smaller makes
it faster.
The concept of reducing the size of memory can be taken further by placing an even smaller SRAM between the cache and the processor, thereby creating two levels of cache. This new cache is usually contained inside the processor. Because the new cache is inside the processor, the wires connecting the two become very short, and the interface circuitry becomes more closely integrated with that of the processor. These two conditions, together with the smaller decoder circuit, facilitate faster data access. When two caches are present, the cache within the processor is referred to as a level 1 or L1 cache, and the cache between the L1 cache and memory is referred to as a level 2 or L2 cache. Figure 12.4 shows the placement of the L1 and L2 caches in memory.
The split cache, another cache organization, is shown in Figure 5.4. A split cache requires two caches: the processor uses one cache to store code/instructions and a second cache to store data.
This cache organization is typically used to support an advanced type of processor architecture
such as pipelining.
Here, the mechanisms used by the processor to handle the code are so distinct from those used for
data that it does not make sense to put both types of information into the same cache.
The success of caches depends upon the principle of locality. The principle proposes that when
one data item is loaded into a cache, the items close to it in memory should be loaded too.
If a program enters a loop, most of the instructions that are part of that loop are executed multiple
times.
Therefore, when the first instruction of a loop is loaded into the cache, its neighboring instructions are loaded along with it to save time.
In this way, the processor does not have to wait for the main memory to provide subsequent instructions. As a result, caches are organized so that when one piece of data or code is loaded, a block of neighboring items is loaded too. Each block loaded into the cache is
identified with a number known as a tag. This tag can be used to find the original addresses of the
data in the main memory.
Therefore, when the processor is in search of a piece of data or code (hereafter referred to as a
word), it only needs to check the tags to see if the word is contained in the cache.
Each block of words and its corresponding tag is combined in the cache to form a line. The lines
are structured into a table.
When the processor needs a word from within a block, the whole block is moved from main memory into one of the lines of the cache, along with its tag, which is used to identify the address of the block.
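To make the idea of lines, blocks, and tags concrete, here is a small Python sketch (the block size and the class itself are illustrative assumptions, not part of the original text) of a cache line that holds a tag together with a block of neighbouring words.

# Illustrative model of a cache line: a tag plus a block of neighbouring words.
BLOCK_SIZE = 4                      # assumed number of words per block

class CacheLine:
    def __init__(self):
        self.valid = False
        self.tag = None
        self.block = [None] * BLOCK_SIZE

    def lookup(self, tag, offset):
        # Return the word if this line currently holds the block identified by `tag`.
        if self.valid and self.tag == tag:
            return self.block[offset]        # cache hit
        return None                          # cache miss

    def fill(self, tag, block):
        # Load a whole block, together with its tag, into the line.
        self.valid, self.tag, self.block = True, tag, list(block)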
The process of transferring data from main memory to cache memory is called mapping. There are three methods used to map a line in the cache to an address in memory so that the processor can quickly find a word. They are:
1. Direct Mapping
2. Associative Mapping
3. Set-Associative Mapping
5.3.1 Direct Mapping
Direct mapping is a procedure used to assign each memory block in main memory to a specific line in the cache. If a line is already filled with a memory block and a new block needs to be loaded, the old block is discarded from the cache.
Figure 5.5 shows how multiple blocks from the above example are mapped to each line in the cache.
Just like locating a word within a block, bits are taken from the main memory address to uniquely
describe the line in the cache where a block can be stored.
Example: Consider a cache with 2^9 = 512 lines; a line would then need 9 bits to be uniquely identified.
Therefore, the 9 bits of the address immediately to the left of the word identification bits identify the line in the cache where the block is to be stored. The bits of the address that are not used for the word offset or the cache line are used for the tag. Figure 5.6 represents this partitioning of the bits.
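Continuing the 512-line example, the sketch below splits a memory address into word-offset, line, and tag fields; the 4-word block size and 24-bit address width are assumptions chosen only for illustration.

# Direct mapping: partition an address into word offset, cache line, and tag.
WORD_BITS = 2                 # assumed block of 2^2 = 4 words
LINE_BITS = 9                 # 2^9 = 512 lines, as in the example
ADDRESS_BITS = 24             # assumed address width; the tag gets the remaining 13 bits

def split_address(addr):
    word = addr & ((1 << WORD_BITS) - 1)                  # lowest bits: word within the block
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)   # next 9 bits: cache line
    tag = addr >> (WORD_BITS + LINE_BITS)                 # remaining bits: tag
    return tag, line, word

# Two addresses that differ only in the tag map to the same cache line:
# exactly the collision that direct mapping can suffer from.
print(split_address(0x012345))
print(split_address(0x112345))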
5.3.2 Associative Mapping
Associative mapping or fully associative mapping does not make use of line numbers. It breaks
the main memory address into two parts - the word ID and a tag as shown in Figure 5.7.
To check whether a block is stored in the cache, the tag is pulled from the memory address and a search is performed through all of the lines of the cache to see if the block is present.
This method of searching for a block within a cache appears like it might be a slow process, but it
is not.
Each line of the cache has its own compare circuitry, which can quickly analyze whether or not
the block is contained at that line.
With all of the lines performing this comparison process in parallel, the correct line is identified
quickly.
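The following Python sketch mimics that parallel tag comparison in software with a dictionary keyed by tag; real hardware uses one comparator per line, so the structure here is only an illustration, and the block loader is a stand-in.

# Fully associative cache: any block may occupy any line and is found by its tag alone.
cache = {}          # tag -> block of words (stands in for the parallel comparators)

def read(tag, offset, load_block_from_memory):
    block = cache.get(tag)
    if block is None:                        # miss: fetch the whole block
        block = load_block_from_memory(tag)
        cache[tag] = block
    return block[offset]                     # return the requested word

# Example use with a stand-in "main memory" loader.
print(read(7, 2, lambda tag: [tag * 10 + i for i in range(4)]))    # prints 72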
This mapping technique is designed to solve a problem that exists with direct mapping where two
active blocks of memory could map to the same line of the cache.
When this happens, neither block of memory is allowed to stay in the cache as it is replaced quickly
by the competing block.
This leads to a condition referred to as thrashing. In thrashing, a line in the cache goes back and forth between two or more blocks, with each block usually being replaced before the processor has finished using it.
Thrashing can be avoided by allowing a block of memory to map to any line of the cache.
However, this advantage comes with a price. When an associative cache is full and the processor
needs to load a new block from memory, a decision has to be made regarding which of the existing
blocks should be discarded.
The selection method, known as a replacement algorithm, should aim to replace the block least
likely to be needed by the processor in the near future.
There are many replacement algorithms, none of which is decisively better than the others. In an attempt to realize the fastest operation, each of these algorithms is implemented in hardware. Four common ones are:
1. Least Recently Used (LRU): This method replaces the block that has not been accessed by the processor for the longest time.
2. First In First Out (FIFO): This method replaces the block that has been in the cache the longest.
3. Least Frequently Used (LFU): This method replaces the block that has had the fewest hits since being loaded into the cache.
4. Random: This method randomly selects a block to be replaced. Its performance is only slightly inferior to that of the other algorithms.
The objective of a replacement algorithm is to remove the block least likely to be referenced in the immediate future.
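As an illustration of the first of these policies, here is a short sketch of LRU replacement using Python's OrderedDict; the capacity and the block loader are arbitrary assumptions.

from collections import OrderedDict

# Minimal LRU replacement sketch: evict the tag that was used least recently.
class LRUCache:
    def __init__(self, capacity=4):           # arbitrary capacity for illustration
        self.capacity = capacity
        self.lines = OrderedDict()             # tag -> block, least recently used first

    def access(self, tag, load_block):
        if tag in self.lines:
            self.lines.move_to_end(tag)        # mark as most recently used
            return self.lines[tag]
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)     # evict the least recently used block
        self.lines[tag] = load_block(tag)
        return self.lines[tag]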
5.3.3 Set-Associative Mapping
Set-associative mapping merges direct mapping with fully associative mapping by grouping the lines of a cache into sets. The set is selected using a direct-mapping scheme. However, the lines within each set behave like a tiny fully associative cache: any block that maps to the set can be stored in any line within that set.
Figure 12.9 represents this arrangement using a sample cache with four lines per set.
A set-associative cache that contains k lines per set is called a k-way set-associative cache. Since the mapping technique uses the memory address just as direct mapping does, the number of lines contained in a set must be an integer power of two, for example, two, four, eight, or sixteen.
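A short sketch of the combined scheme follows: the set is chosen by direct mapping (a power-of-two modulus) and the lines inside the set are searched associatively. The 4-way geometry, the number of sets, and the simple eviction rule are assumptions made only for illustration.

# k-way set-associative lookup: direct-map to a set, then search the set's lines.
NUM_SETS = 128                 # assumed number of sets (a power of two)
WAYS = 4                       # 4-way set-associative, as in the sample cache

sets = [dict() for _ in range(NUM_SETS)]       # each set: tag -> block

def access(block_number, load_block):
    set_index = block_number % NUM_SETS        # direct-mapped choice of set
    tag = block_number // NUM_SETS             # identifies the block within the set
    lines = sets[set_index]
    if tag not in lines:                       # associative search within the set
        if len(lines) >= WAYS:
            lines.pop(next(iter(lines)))       # simple first-in-first-out eviction, for brevity
        lines[tag] = load_block(block_number)
    return lines[tag]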
5.4 Virtual Memory
In typical computer systems, data and programs are initially stored in auxiliary memory. Fragments of data or programs are brought into main memory as and when the CPU requires them.
Virtual memory is a method or approach used in some large computer systems. It allows the user
to construct programs as though a large memory space was available, which is equal to the whole
of auxiliary memory.
Every address that is referenced by the CPU goes through an address mapping from the so-called
virtual address to a physical address in the main memory.
Virtual memory is made use of to give programmers the impression that they have a very large
memory at their disposition, even though the computer actually has a relatively small main
memory.
This translation happens dynamically even as programs are being executed in the CPU. The
translation or mapping is automatically handled by the hardware by means of a mapping table.
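As an illustration of the mapping-table idea (the page size, table contents, and fault handling below are assumptions, not details from the text), the following sketch translates a virtual address to a physical address.

# Illustration: translate a virtual address to a physical one via a mapping table.
PAGE_SIZE = 1024                     # assumed number of words per page/block

# Hypothetical mapping table: virtual page number -> physical block number.
mapping_table = {0: 5, 1: 2, 2: 7}

def translate(virtual_address):
    page = virtual_address // PAGE_SIZE
    offset = virtual_address % PAGE_SIZE
    if page not in mapping_table:
        raise LookupError("page fault: page not resident in main memory")
    return mapping_table[page] * PAGE_SIZE + offset

print(translate(2 * 1024 + 50))      # virtual page 2 -> physical block 7, word 50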
Addresses that are used by programmers are called virtual addresses, and the set of such addresses
is called the address space.
The space or spot where the address is stored in the main memory is called a location or physical
address and the set of such locations is called the memory space.
Therefore, the address space is the set of addresses generated by programs as they reference
instructions and data.
The memory space holds the actual main memory locations that are directly addressable for
processing. In most computers, the address and memory spaces are the same.
Did u know? The address space is permitted to be bigger than the memory space in computers with
virtual memory.
Example: Consider a main memory with a capacity of 32K words (K = 1024). Since 32K = 2^15, 15 bits are required to specify a physical address in memory. Suppose the computer has auxiliary memory capable of storing 2^20 = 1024K words.
Thus, the auxiliary memory has a storage capacity equivalent to that of 32 main memories. Denoting the address space by N and the memory space by M, we have for this example N = 1024K and M = 32K.
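The bit counts in this example follow directly from powers of two; the short sketch below reproduces the arithmetic.

import math

K = 1024
memory_space = 32 * K            # M: physical main memory, 32K words
address_space = 1024 * K         # N: auxiliary-backed virtual space, 1024K words

physical_bits = int(math.log2(memory_space))    # 15 bits, since 32K = 2^15
virtual_bits = int(math.log2(address_space))    # 20 bits, since 1024K = 2^20
print(physical_bits, virtual_bits)               # -> 15 20
print(address_space // memory_space)             # -> 32 main memories' worth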
In a multiprogramming computer system, programs and data are transferred to and from auxiliary
memory and main memory when required by the CPU.
Suppose Program 1 is currently being executed in the CPU. Program 1 and a portion of its associated data are transferred from auxiliary memory into main memory, as shown in Figure 5.10.
The associated programs and data need not be in adjacent locations in the memory, since
information is being moved in and out, and empty spaces may be scattered in the memory.