MEMORY HIERARCHY
The computer memory hierarchy is usually drawn as a pyramid that describes the differences among
memory types. It organizes computer storage into levels.
Level 0: CPU registers
Level 1: Cache memory
Level 2: Main memory or primary memory
Level 3: Magnetic disks or secondary memory
Level 4: Optical disks or magnetic tapes (tertiary memory)
In the memory hierarchy, capacity grows as we move down the levels, while speed and cost per bit
decrease. The devices are arranged from fast to slow, that is, from registers down to tertiary memory.
Let us discuss each level in detail:
Level-0 − Registers
The registers are present inside the CPU, so they have the shortest access time. Registers are the most
expensive and the smallest in capacity, generally amounting to no more than a few kilobytes. They are
implemented using flip-flops.
Level-1 − Cache
Cache memory stores the segments of a program that are frequently accessed by the processor. It is
expensive and relatively small, generally a few megabytes, and is implemented using static RAM.
Memory technologies
Memory latency is traditionally quoted using two measures: access time and cycle time. Access time
is the time between when a read is requested and when the desired word arrives; cycle time is the
minimum time between requests to memory. One reason that cycle time is greater than access time is
that the memory needs the address lines to be stable between accesses.
DRAM technology
The main memory of virtually every desktop or server computer sold since 1975 is composed of
semiconductor DRAMs. As early DRAMs grew in capacity, the cost of a package with all the
necessary address lines became an issue. The solution was to multiplex the
address lines, thereby cutting the number of address pins in half. One half of the address is sent first,
during the row access strobe (RAS). It is followed by the other half of the address, sent during the
column access strobe (CAS). These names come from the internal chip organization, since the memory
is organized as a rectangular matrix addressed by rows and columns.
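As a rough sketch of this idea (the 24-bit address and the even 12/12 split are illustrative assumptions, not the parameters of any particular DRAM part), the multiplexing can be modelled as sending one half of the address with RAS and the other half with CAS over the same pins:

# Hypothetical illustration of DRAM address multiplexing (not a real part's timing).
# A 24-bit address is split into a 12-bit row and a 12-bit column that share 12 pins.

ADDRESS_BITS = 24
HALF = ADDRESS_BITS // 2          # 12 address pins instead of 24

def multiplex(address):
    """Return the (row, column) halves sent with RAS and CAS, respectively."""
    row = (address >> HALF) & ((1 << HALF) - 1)   # high half, latched on RAS
    col = address & ((1 << HALF) - 1)             # low half, latched on CAS
    return row, col

row, col = multiplex(0x35_7A9C)
print(f"row sent with RAS: {row:#05x}, column sent with CAS: {col:#05x}")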
DRAMs are commonly sold on small boards called DIMMs for Dual Inline Memory Modules.
DIMMs typically contain 4 to 16 DRAMs. They are normally organized to be eight bytes wide for
desktop systems.
SRAM Technology
In contrast to DRAMs are SRAMs—the first letter standing for static. The dynamic nature of the
circuits in DRAM requires data to be written back after being read, hence the difference between the
access time and the cycle time as well as the need to refresh. SRAMs typically use six transistors per
bit to prevent the information from being disturbed when read.
In DRAM designs the emphasis is on cost per bit and capacity, while SRAM designs are concerned
with speed and capacity. (Because of this concern, SRAM address lines are not multiplexed.) Thus,
unlike DRAMs, there is no difference between access time and cycle time. For memories designed in
comparable technologies, the capacity of DRAMs is roughly 4 to 8 times that of SRAMs. The cycle
time of SRAMs is 8 to 16 times faster than DRAMs, but they are also 8 to 16 times as expensive.
Embedded computers usually have small memories, and most do not have a disk to act as non-volatile
storage. Two memory technologies are found in embedded computers to address this problem.
The first is Read-Only Memory (ROM). ROM is programmed at time of manufacture, needing only a
single transistor per bit to represent 1 or 0. ROM is used for the embedded program and for constants,
often included as part of a larger chip. In addition to being non-volatile, ROM is also non-destructible;
nothing the computer can do can modify the contents of this memory. Hence, ROM also provides a
level of protection to the code of embedded computers. Since address-based protection is often not
enabled in embedded processors, ROM can fulfill an important role.
The second memory technology offers non-volatility but allows the memory to be modified. Flash
memory allows the embedded device to alter nonvolatile memory after the system is manufactured,
which can shorten product development.
To improve bandwidth, there have been a variety of evolutionary innovations over time.
1. The first was timing signals that allow repeated accesses to the row buffer without another
row access time, typically called fast page mode.
2. The second major change addressed the fact that conventional DRAMs have an asynchronous interface to the
memory controller, so every transfer involves overhead to synchronize with the controller. Adding a clock
signal to the DRAM interface removes this overhead; the optimization is called Synchronous DRAM (SDRAM).
3. The third major DRAM innovation to increase bandwidth is to transfer data on both the
rising edge and the falling edge of the DRAM clock signal, thereby doubling the peak data rate. This
optimization is called Double Data Rate (DDR).
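A back-of-the-envelope calculation shows how the clocked interface and the dual-edge transfer combine; the 200 MHz clock and eight-byte bus width are assumed, illustrative values rather than the figures of any specific DRAM generation:

# Illustrative peak-bandwidth arithmetic; the clock rate and bus width are assumptions,
# not the parameters of any particular DRAM generation.

clock_mhz = 200          # assumed SDRAM interface clock
bus_bytes = 8            # a DIMM is normally organized eight bytes wide

sdr_peak = clock_mhz * 1_000_000 * bus_bytes   # one transfer per clock
ddr_peak = 2 * sdr_peak                        # both clock edges used

print(f"SDRAM peak: {sdr_peak / 1e9:.1f} GB/s, DDR peak: {ddr_peak / 1e9:.1f} GB/s")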
CACHE BASICS
Basic Ideas
The cache is a small mirror image of a portion (several "lines") of main memory. Because the cache is
faster than main memory, our goal is to maximize its utilization.
Locality of reference
The principle that the instruction currently being fetched/executed is very close in memory to
the instruction to be fetched/executed next. The same idea applies to the data value currently
being accessed (read/written) in memory.
If we keep the most active segments of program and data in the cache, overall execution speed
for the program will be optimized. Our strategy for cache utilization should maximize the number
of cache read/write operations, in comparison with the number of main memory read/write
operations.
Example
A line is an adjacent series of bytes in main memory (that is, their addresses are contiguous).
Suppose a line is 16 bytes in size. For example, suppose we have a 2^12 = 4K-byte cache with 2^8 =
256 16-byte lines; a 2^24 = 16M-byte main memory, which is 2^12 = 4K times the size of the cache;
and a 400-line program, which will not all fit into the cache at once.
Each active cache line is established as a copy of a corresponding memory line during execution.
Whenever a memory write takes place in the cache, the "Valid" bit is reset (marking that line
"Invalid"), which means that it is no longer an exact image of its corresponding line in memory.
Cache Dynamics
When the processor issues a memory read:
1. If the line with that memory address is in the cache (this is called a cache hit), the data is
read from the cache to the MDR.
2. If the line with that memory address is not in the cache (this is called a miss), the cache is
updated by replacing one of its active lines with the line containing that memory address, and then the
data is read from the cache to the MDR.
When the processor issues a memory write:
1. If the line with that memory address is in the cache, the data is written from the MDR to the
cache, and the line is marked "invalid" (since it is no longer an image of the corresponding
memory line).
2. If the line with that memory address is not in the cache, the cache is updated by replacing
one of its active lines with the line containing that memory address. The data is then written from the
MDR to the cache and the line is marked "invalid."
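The read and write cases above can be summarized in a short sketch. It is a simplified model of the scheme described here, not of any real cache controller: a written line is simply marked invalid, and an invalid line is copied back to memory before it is replaced.

# A minimal sketch of the read/write cases above (illustrative, not tied to a
# particular mapping scheme).  Each cache entry records which memory line it
# mirrors, the line's data, and a Valid bit; following this note's convention,
# a line that has been written to is marked invalid and must be copied back
# to memory before it is replaced.
import random

def choose_victim(cache):
    """Placeholder replacement policy; LRU is the usual choice."""
    return random.choice(cache)

def access(cache, memory, mem_line, word, mdr=None, write=False):
    entry = next((e for e in cache if e["line"] == mem_line), None)
    if entry is None:                                  # miss: replace an active line
        victim = choose_victim(cache)
        if not victim["valid"]:                        # written-to line: copy back
            memory[victim["line"]] = victim["data"][:]
        victim.update(line=mem_line, data=memory[mem_line][:], valid=True)
        entry = victim
    if write:
        entry["data"][word] = mdr                      # MDR -> cache
        entry["valid"] = False                         # no longer an exact image
    else:
        return entry["data"][word]                     # cache -> MDR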
1. A candidate line is chosen for replacement using an algorithm that tries to minimize the
number of cache updates throughout the life of the program run. Two algorithms have been
popular in recent architectures:
- Choose the line that has been least recently used - "LRU" for short (e.g., the PowerPC)
2. If the candidate line is "invalid," write out a copy of that line to main memory (thus bringing
the memory up to date with all recent writes to that line in the cache).
As a working example, suppose the cache has 2^7 = 128 lines, each with 2^4 = 16 words. Suppose
the memory has a 16-bit address, so that 2^16 = 64K words are in the memory's address space.
Direct Mapping
Under this mapping scheme, each memory line j maps to cache line j mod 128, so the memory
address looks like this:
[ Tag : 5 bits | Line : 7 bits | Word : 4 bits ]
Here, the "Word" field selects one from among the 16 addressable words in a line. The "Line"
field defines the cache line where this memory line should reside. The "Tag" field of the address
is then compared with that cache line's 5-bit tag to determine whether there is a hit or a miss.
If there's a miss, we need to swap out the memory line that occupies that position in the cache
and replace it with the desired memory line.
E.g., suppose we want to read or write a word at the address 357A, whose 16 bits are
0011010101111010. This translates to Tag = 6, Line = 87, and Word = 10 (all in decimal). If line 87
in the cache has the same tag (6), then memory address 357A is in the cache. Otherwise, a miss
has occurred and the contents of cache line 87 must be replaced by the memory line
001101010111 = 855 before the read or write is executed.
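The bit slicing in this example can be checked with a few lines of code (the constants follow the 5-bit tag / 7-bit line / 4-bit word split described above):

# Checking the direct-mapped split of address 357A:
# 5-bit tag | 7-bit line | 4-bit word.
addr = 0x357A
word = addr & 0xF            # lowest 4 bits  -> 10
line = (addr >> 4) & 0x7F    # next 7 bits    -> 87
tag  = (addr >> 11) & 0x1F   # top 5 bits     -> 6
mem_line = addr >> 4         # 12-bit memory line number -> 855
print(tag, line, word, mem_line)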
Direct mapping is the most efficient cache mapping scheme, but it is also the least effective in its
utilization of the cache - that is, it may leave some cache lines unused.
Associative Mapping
This mapping scheme attempts to improve cache utilization, but at the expense of speed. Here,
the cache line tags are 12 bits, rather than 5, and any memory line can be stored in any cache
line. The memory address looks like this:
[ Tag : 12 bits | Word : 4 bits ]
Here, the "Tag" field identifies one of the 2^12 = 4096 memory lines; all the cache tags are
searched to find out whether or not the Tag field matches one of the cache tags. If so, we have a
hit, and if not there's a miss and we need to replace one of the cache lines by this line before
reading or writing into the cache. (The "Word" field again selects one from among 16
addressable words (bytes) within the line.)
For example, suppose again that we want to read or write a word at the address 357A, whose 16
bits are 0011010101111010. Under associative mapping, this translates to Tag = 855 and Word =
10 (in decimal). So we search all of the 128 cache tags to see if any one of them will match with
855. If not, there's a miss and we need to replace one of the cache lines with line 855 from
memory before completing the read or write.
The search of all 128 tags in the cache is time-consuming. However, the cache is fully utilized
since none of its lines will be unused prior to a miss (recall that direct mapping may detect a miss
even though the cache is not completely full of active lines).
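For comparison, here is the associative split of the same address; the list of stored tags is a stand-in used only for illustration:

# Fully associative split of the same 16-bit address: 12-bit tag | 4-bit word.
addr = 0x357A
word = addr & 0xF                          # -> 10
tag  = addr >> 4                           # -> 855
cache_tags = [0] * 128                     # illustrative stand-in for the 128 stored tags
hit = any(stored == tag for stored in cache_tags)   # every tag must be searched
print("hit" if hit else "miss")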
Set-associative Mapping
This scheme is a compromise between the direct and associative schemes described above. Here,
the cache is divided into sets of tags, and the set number is directly mapped from the memory
address (e.g., memory line j is mapped to cache set j mod 64), as suggested by the diagram
below:
[ Tag : 6 bits | Set : 6 bits | Word : 4 bits ]
Here, the "Tag" field identifies one of the 2^6 = 64 different memory lines that map to each of the
2^6 = 64 different "Set" values. Since each cache set has room for only two lines at a time, the search for a
match is limited to those two lines (rather than the entire cache). If there's a match, we have a
hit and the read or write can proceed immediately.
Otherwise, there's a miss and we need to replace one of the two cache lines by this line before
reading or writing into the cache. (The "Word" field again selects one from among 16 addressable
words inside the line.) In set-associative mapping, when the number of lines per set is n, the
mapping is called n-way associative. For instance, the above example is 2-way associative.
E.g., Again suppose we want to read or write a word at the memory address 357A, whose 16 bits
are 0011010101111010. Under set-associative mapping, this translates to Tag = 13, Set = 23, and
Word = 10 (all in decimal). So we search only the two tags in cache set 23 to see if either one
matches tag 13. If so, we have a hit. Otherwise, one of these two must be replaced by the
memory line being addressed (good old line 855) before the read or write can be executed.
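The same check for the set-associative split (the constants follow the 6-bit tag / 6-bit set / 4-bit word layout above):

# 2-way set-associative split of the 16-bit address: 6-bit tag | 6-bit set | 4-bit word.
addr = 0x357A
word = addr & 0xF              # -> 10
set_ = (addr >> 4) & 0x3F      # -> 23  (memory line 855 mod 64)
tag  = addr >> 10              # -> 13  (compared against only the two tags in set 23)
print(tag, set_, word)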
A Detailed Example
Suppose we have an 8-word cache and a 16-bit memory address space, where each memory
"line" is a single word (so the memory address need not have a "Word" field to distinguish
individual words within a line). Suppose we also have a 4x10 array a of numbers (one number per
addressable memory word) allocated in memory column by column, beginning at address 7A00.
That is, we have the following declaration and memory-allocation picture for
the array a:
Direct Mapping
Direct mapping of the cache for this model can be accomplished by using the rightmost 3 bits of
the memory address. For instance, the memory address 7A00 = 0111101000000 000, which
maps to cache address 000. Thus, the cache address of any value in the array a is just its memory
address modulo 8.
Using this scheme, we see that the above calculation uses only cache words 000 and 100, since
each entry in the first row of a has a memory address with either 000 or 100 as its rightmost 3
bits. The hit rate of a program is the number of cache hits among its reads and writes divided by
the total number of memory reads and writes. There are 30 memory reads and writes for this
program, and the following diagram illustrates cache utilization for direct mapping throughout
the life of these two loops:
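The diagram itself is not reproduced here. The two loops it traces can be reconstructed from the analysis that follows; the exact loop bounds and the backward order of the second loop are assumptions inferred from that analysis:

# Assumed reconstruction of the two loops whose memory accesses are analysed below.
# Loop 1 reads a[0][0] .. a[0][9] to form the average (10 reads); loop 2 walks the
# row backwards, reading and writing each element (10 reads + 10 writes): 30 accesses.

a = [[float(r * 10 + c) for c in range(10)] for r in range(4)]   # illustrative 4x10 array

total = 0.0
for i in range(10):            # loop 1: read a[0][0], a[0][1], ..., a[0][9]
    total += a[0][i]
ave = total / 10

for i in range(9, -1, -1):     # loop 2: read and write a[0][9], a[0][8], ..., a[0][0]
    a[0][i] = a[0][i] / ave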
Reading the sequence of events from left to right over the ranges of the indexes i and j, it is easy
to pick out the hits and misses. In fact, the first loop has a series of 10 misses (no hits). The
second loop contains a read and a write of the same memory location on each repetition (i.e.,
a[0][i] = a[0][i]/Ave;), so the 10 writes are guaranteed to be hits. Moreover, the first two
repetitions of the second loop have hits in their read operations, since a[0][9] and a[0][8] are still in the
cache at the end of the first loop. Thus, the hit rate for direct mapping in this algorithm is 12/30 =
40%.
Associative Mapping
Associative mapping for this problem simply uses the entire address as the cache tag. If we use
the least recently used cache replacement strategy, the sequence of events in the cache after the
first loop completes is shown in the left half of the following diagram. The second loop happily
finds all of a[0][9] through a[0][2] already in the cache, so it experiences a series of 16 hits (2 for each
repetition) before missing on a[0][1] when i = 1. The last two steps of the second loop therefore have
2 hits and 2 misses, giving a hit rate of 18/30 = 60% for associative mapping.
Set-Associative Mapping
Set-associative mapping is a compromise between these two. Suppose we divide the cache into two
sets, distinguished from each other by the rightmost bit of the memory address, and assume the
least recently used strategy for cache line replacement. Cache utilization for our program can
now be pictured as follows:
Here all the entries in a that are referenced in this algorithm have even-numbered addresses
(their rightmost bit = 0), so only the top half of the cache is utilized. The hit rate is therefore
slightly worse than associative mapping and slightly better than direct. That is, set-associative
cache mapping for this program yields 14 hits out of 30 read/writes for a hit rate of 46%.
VIRTUAL MEMORY
The physical main memory is not as large as the address space spanned by an address issued by
the processor. When a program does not completely fit into the main memory, the parts of it not
currently being executed are stored on secondary storage devices, such as magnetic disks. Of
course, all parts of a program that are eventually executed are first brought into the main
memory.
When a new segment of a program is to be moved into a full memory, it must replace another
segment already in the memory. The operating system moves programs and data automatically
between the main memory and secondary storage. This process is known as swapping. Thus, the
application programmer does not need to be aware of limitations imposed by the available main
memory.
Techniques that automatically move program and data blocks into the physical main memory
when they are required for execution are called virtual-memory techniques. Programs, and
hence the processor, reference an instruction and data space that is independent of the available
physical main memory space. The binary addresses that the processor issues for either
instructions or data are called virtual or logical addresses. These addresses are translated into
physical addresses by a combination of hardware and software components. If a virtual address
refers to a part of the program or data space that is currently in the physical memory, then the
contents of the appropriate location in the main memory are accessed immediately. On the
other hand, if the referenced address is not in the main memory, its contents must be brought
into a suitable location in the memory before they can be used.
Figure shows a typical organization that implements virtual memory. A special hardware unit,
called the Memory Management Unit (MMU), translates virtual addresses into physical
addresses. When the desired data (or instructions) are in the main memory, these data are
fetched as described in our presentation of the cache mechanism. If the data are not in the main
memory, the MMU causes the operating system to bring the data into the memory from the disk.
The DMA scheme is used to perform the data transfer between the disk and the main memory.
ADDRESS TRANSLATION
The process of translating a virtual address into a physical address is known as address translation.
It is done with the help of the MMU. A simple method for translating virtual addresses into
physical addresses is to assume that all programs and data are composed of fixed-length units
called pages, each of which consists of a block of words that occupy contiguous locations in the
main memory. Pages commonly range from 2K to 16K bytes in length. They constitute the basic
unit of information that is moved between the main memory and the disk whenever the
translation mechanism determines that a move is required.
Pages should not be too small, because the access time of a magnetic disk is much longer
(several milliseconds) than the access time of the main memory. The reason for this is that it
takes a considerable amount of time to locate the data on the disk, but once located, the data
can be transferred at a rate of several megabytes per second. On the other hand, if pages are too
large it is possible that a substantial portion of a page may not be used, yet this unnecessary data
will occupy valuable space in the main memory.
The cache bridges the speed gap between the processor and the main memory and is
implemented in hardware. The virtual-memory mechanism bridges the size and speed gaps
between the main memory and secondary storage and is usually implemented in part by
software techniques. Conceptually, cache techniques and virtual- memory techniques are very
similar. They differ mainly in the details of their implementation.
Consider a virtual-memory address translation method based on the concept of fixed-length pages. Each
virtual address generated by the processor, whether it is for an instruction fetch or an operand
fetch/store operation, is interpreted as a virtual page number (high-order bits) followed by an
offset (low-order bits) that specifies the location of a particular byte (or word) within a page.
Information about the main memory location of each page is kept in a page table. This
information includes the main memory address where the page is stored and the current status
of the page.
An area in the main memory that can hold one page is called a page frame. The starting address
of the page table is kept in a page table base register. By adding the virtual page number to the
contents of this register, the address of the corresponding entry in the page table is obtained.
The contents of this location give the starting address of the page if that page currently resides in
the main memory. Each entry in the page table also includes some control bits that describe the
status of the page while it is in the main memory. One bit indicates the validity of the page, that
is, whether the page is actually loaded in the main memory.
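A minimal sketch of this lookup may help; the 4K-byte page size, the field widths, and the table contents are illustrative assumptions:

# Illustrative page-table walk: the virtual page number indexes the page table,
# whose entry supplies the page frame's starting address plus control bits.

PAGE_SIZE = 4096                      # assumed 4K-byte pages
OFFSET_BITS = 12                      # log2(PAGE_SIZE)

# page_table[vpn] -> (valid_bit, frame_base_address); contents are made up
page_table = {0: (1, 0x0002_0000), 1: (0, None), 2: (1, 0x0005_3000)}

def translate(virtual_address):
    vpn    = virtual_address >> OFFSET_BITS          # high-order bits
    offset = virtual_address & (PAGE_SIZE - 1)       # low-order bits
    valid, frame_base = page_table[vpn]
    if not valid:
        raise RuntimeError("page fault: page not in main memory")
    return frame_base + offset

print(hex(translate(0x0000_2A38)))    # vpn 2, offset 0xA38 -> 0x53a38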
This bit allows the operating system to invalidate the page without actually removing it. Another
bit indicates whether the page has been modified during its residency in the memory. As in cache
memories, this information is needed to determine whether the page should be written back to
the disk before it is removed from the main memory to make room for another page. Other
control bits indicate various restrictions that may be imposed on accessing the page. For
example, a program may be given full read and write permission, or it may be restricted to read
accesses only.
The MMU must use the page table information for every read and write access; so ideally, the
page table should be situated within the MMU. Unfortunately, the page table may be rather
large, and since the MMU is normally implemented as part of the processor chip (along with the
primary cache), it is impossible to include a complete page table on this chip. Therefore, the page
table is kept in the main memory. However, a copy of a small portion of the page table can be
accommodated within the MMU.
This portion consists of the page table entries that correspond to the most recently accessed
pages. A small cache, usually called the Translation Lookaside Buffer (TLB), is incorporated into
the MMU for this purpose. The operation of the TLB with respect to the page table in the main
memory is essentially the same as the operation of a cache memory; the TLB must also include the
virtual address of the entry. Figure shows a possible organization of a TLB where the associative-
mapping technique is used. Set-associative mapped TLBs are also found in commercial products.
An essential requirement is that the contents of the TLB be coherent with the contents of page
tables in the memory. When the operating system changes the contents of page tables, it must
simultaneously invalidate the corresponding entries in the TLB. One of the control bits in the TLB
is provided for this purpose. When an entry is invalidated, the TLB will acquire the new
information as part of the MMU's normal response to access misses. Address translation
proceeds as follows.
Given a virtual address, the MMU looks in the TLB for the referenced page. If the page table
entry for this page is found in the TLB, the physical address is obtained immediately. If there is a
miss in the TLB, then the required entry is obtained from the page table in the main memory and
the TLB is updated. When a program generates an access request to a page that is not in the
main memory, a page fault is said to have occurred. The whole page must be brought from the
disk into the memory before access can proceed. When it detects a page fault, the MMU asks the
operating system to intervene by raising an exception (interrupt).
The operating system then copies the requested page from the disk into the main memory and
returns control to the interrupted task. Because a long delay occurs while the page transfer takes
place, the operating system may suspend execution of the task that caused the page fault and
begin execution of another task whose pages are in the main memory.
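The order of events just described (TLB first, then the page table, then a page fault) can be summarized in a small, purely illustrative sketch:

# Illustrative translation order: TLB first, then the page table in main memory,
# and finally a page fault that hands control to the operating system.

def translate(vaddr, tlb, page_table, offset_bits=12):
    vpn, offset = vaddr >> offset_bits, vaddr & ((1 << offset_bits) - 1)
    frame = tlb.get(vpn)                     # TLB hit: physical address immediately
    if frame is None:
        entry = page_table.get(vpn)          # TLB miss: consult the page table
        if entry is None or not entry["valid"]:
            raise RuntimeError("page fault: OS must load the page from disk")
        frame = entry["frame"]
        tlb[vpn] = frame                     # update the TLB
    return (frame << offset_bits) | offset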
It is essential to ensure that the interrupted task can continue correctly when it resumes
execution. A page fault occurs when some instruction accesses a memory operand that is not in
the main memory, resulting in an interruption before the execution of this instruction is
completed. Hence, when the task resumes, either the execution of the interrupted instruction
must continue from the point of interruption, or the instruction must be restarted. The design of
a particular processor dictates which of these options should be used.
If a new page is brought from the disk when the main memory is full, it must replace one of the
resident pages. The problem of choosing which page to remove is just as critical here as it is in a
cache, and the idea that programs spend most of their time in a few localized areas also applies.
Because main memories are considerably larger than cache memories, it should be possible to
keep relatively larger portions of a program in the main memory. This will reduce the frequency
of transfers to and from the disk.
Concepts similar to the FIFO, Optimal and LRU replacement algorithms can be applied to page
replacement and the control bits in the page table entries can indicate usage. One simple scheme
is based on a control bit that is set to 1 whenever the corresponding page is referenced
(accessed). The operating system occasionally clears this bit in all page table entries, thus
providing a simple way of determining which pages have not been used recently.
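A sketch of this reference-bit scheme, as an illustrative approximation of LRU rather than any particular operating system's code:

# Illustrative use of a per-page reference bit to approximate LRU replacement.
# The bit is set on every access; the OS occasionally clears all the bits.

def touch(page_table, vpn):
    page_table[vpn]["referenced"] = 1        # set whenever the page is accessed

def clear_reference_bits(page_table):
    for entry in page_table.values():        # done occasionally by the OS
        entry["referenced"] = 0

def pick_victim(page_table):
    # prefer a page whose reference bit is still 0 (not used recently)
    for vpn, entry in page_table.items():
        if entry["referenced"] == 0:
            return vpn
    return next(iter(page_table))            # fallback: every page was used recently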
A modified page has to be written back to the disk before it is removed from the main memory. It
is important to note that the write-through protocol, which is useful in the framework of cache
memories, is not suitable for virtual memory. The access time of the disk is so long that it does
not make sense to access it frequently to write small amounts of data. The address translation
process in the MMU requires some time to perform, mostly dependent on the time needed to
look up entries in the TLB. Because of locality of reference, it is likely that many successive
translations involve addresses on the same page. This is particularly evident in fetching
instructions. Thus, we can reduce the average translation time by including one or more special
registers that retain the virtual page number and the physical page frame of the most recently
performed translations. The information in these registers can be accessed more quickly than the
TLB.
DIRECT MEMORY ACCESS (DMA)
A special control unit is provided to allow transfer of a block of data directly between an external
device and the main memory, without continuous intervention by the processor. This approach is
called direct memory access, or DMA.
DMA transfers are performed by a control circuit that is part of the I/O device interface. We refer
to this circuit as a DMA controller. The DMA controller performs the functions that would
normally be carried out by the processor when accessing the main memory. For each word
transferred, it provides the memory address and all the bus signals that control data transfer.
Since it has to transfer blocks of data, the DMA controller must increment the memory address
for successive words and keep track of the number of transfers.
Although a DMA controller can transfer data without intervention by the processor, its operation
must be under the control of a program executed by the processor. To initiate the transfer of a
block of words, the processor sends the starting address, the number of words in the block, and
the direction of the transfer. On receiving this information, the DMA controller proceeds to
perform the requested operation. When the entire block has been transferred, the controller
informs the processor by raising an interrupt signal.
While a DMA transfer is taking place, the program that requested the transfer cannot continue,
and the processor can be used to execute another program. After the DMA transfer is
completed, the processor can return to the program that requested the transfer. I/O operations
are always performed by the operating system of the computer in response to a request from an
application program.
The OS is also responsible for suspending the execution of one program and starting another.
Thus, for an I/O operation involving DMA, the OS puts the program that requested the transfer in
the Blocked state, initiates the DMA operation, and starts the execution of another program.
When the transfer is completed, the DMA controller informs the processor by sending an
interrupt request. In response, the OS puts the suspended program in the Runnable state so that
it can be selected by the scheduler to continue execution.
The above figure shows an example of the DMA controller registers that are accessed by the
processor to initiate transfer operations. Two registers are used for storing the starting address
and the word count. The third register contains status and control flags. The R/W bit determines
the direction of the transfer. When this bit is set to 1 by a program instruction, the controller
performs a read operation, that is, it transfers data from the memory to the I/O device.
Otherwise, it performs a write operation.
When the controller has completed transferring a block of data and is ready to receive another
command, it sets the Done flag to 1. Bit 30 is the Interrupt-enable flag, IE. When this flag is set to
1, it causes the controller to raise an interrupt after it has completed transferring a block of data.
Finally, the controller sets the IRQ bit to 1 when it has requested an interrupt.
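Decoding the status and control word might look roughly like the following; only the IE flag's position (bit 30) is stated in the text, so the other bit positions are assumptions made for illustration:

# Illustrative decoding of the DMA status/control register.  Only the IE flag's
# position (bit 30) is given in the text; the other bit positions are assumptions.

IRQ_BIT  = 31   # assumed: set by the controller when it requests an interrupt
IE_BIT   = 30   # interrupt-enable flag (from the text)
DONE_BIT = 1    # assumed: set when a block transfer has completed
RW_BIT   = 0    # assumed: 1 = read (memory -> I/O device), 0 = write

def decode_status(reg):
    return {
        "irq":  (reg >> IRQ_BIT)  & 1,
        "ie":   (reg >> IE_BIT)   & 1,
        "done": (reg >> DONE_BIT) & 1,
        "read": (reg >> RW_BIT)   & 1,
    }

print(decode_status((1 << IE_BIT) | (1 << RW_BIT)))   # IE and R/W set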
A DMA controller connects a high-speed network to the computer bus. The disk controller, which
controls two disks, also has DMA capability and provides two DMA channels. It can perform two
independent DMA operations, as if each disk had its own DMA controller. The registers needed
to store the memory address, the word count, and so on are duplicated, so that one set can be
used with each device.
To start a DMA transfer of a block of data from the main memory to one of the disks, a program
writes the address and word count information into the registers of the corresponding channel of
the disk controller. It also provides the disk controller with information to identify the data for
future retrieval. The DMA controller proceeds independently to implement the specified
operation.
When the DMA transfer is completed, this fact is recorded in the status and control register of
the DMA channel by setting the Done bit. At the same time, if the IE bit is set, the controller
sends an interrupt request to the processor and sets the IRQ bit. The status register can also be
used to record other information, such as whether the transfer took place correctly or errors
occurred.
Memory accesses by the processor and the DMA controllers are interwoven. Requests by DMA
devices for using the bus are always given higher priority than processor requests. Among
different DMA devices, top priority is given to high-speed peripherals such as a disk, a high-speed
network interface, or a graphics display device. Since the processor originates most memory
access cycles, the DMA controller can be said to "steal" memory cycles from the processor.
Hence, this interweaving technique is usually called cycle stealing.
Alternatively, the DMA controller may be given exclusive access to the main memory to transfer
a block of data without interruption. This is known as block or burst mode. Most DMA controllers
incorporate a data storage buffer. In the case of the network interface for example, the DMA
controller reads a block of data from the main memory and stores it into its input buffer. This
transfer takes place using burst mode at a speed appropriate to the memory and the computer
bus. Then, the data in the buffer are transmitted over the network at the speed of the network.
I/O PROCESSORS.
An I/O processor (IOP) is:
• A specialized processor
• A processor that not only loads and stores into memory but can also execute instructions drawn from a
set of I/O instructions
• A processor that handles the sequence of events involved in I/O transfers, moving the results of an I/O
operation into the main memory (using a program for the IOP, which is also held in main memory)
• Used to address the problem of direct transfer after executing the necessary format
conversion or other instructions
• In an IOP-based system, I/O devices can directly access the memory without intervention by
the processor
IOP instructions
• IOP instructions help with format conversions, e.g., converting bytes from memory into packed decimal for output
• I/O device data in a different format can thus be transferred to main memory using an IOP
Sequence 1:
A DRQ (IOP request) signal from an I/O device starts the IOP sequence; the IOP then signals an
interrupt on the INTR line, which requests attention from the processor.
Sequence 2:
The processor responds by checking the device’s status via the memory-mapped control registers
and issues a command telling the IOP to execute IOP instructions for the transfer to move the
formatted data into the memory.
Sequence 3:
During each successive transfer of formatted byte(s), the DMAC (DMA controller) logic inside
the IOP uses a processor bus-hold request line, HOLD, which is distinct from the INTR device interrupt
request line.
• The main processor responds with a signal called DACK (distinct from the interrupt acknowledge)
• The I/O device bus has access to the address and data buses of the memory bus when DACK is
activated
• It has no access when DACK is not activated, that is, when a HOLD request is not accepted because
the processor is using the memory bus
• Once the DMA-logic start command has been issued to the IOP, the main processor begins working
on something else while the I/O device transfers the data into the memory
Sequence 4:
When the IOP's DMA transfer is complete as per the instructions, the I/O device signals another
interrupt (using DRQ). This lets the main processor know that the DMA is done and that it may access
the data.
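The four sequences can be tied together in a purely illustrative walk-through; the signal names follow the text, and the "signals" here are simply printed in the order the text gives them:

# Purely illustrative ordering of the IOP handshake signals described above.

def signal(source, name, note=""):
    print(f"{source:>9}: {name:<5} {note}")

signal("device",    "DRQ",  "(sequence 1: request the IOP)")
signal("IOP",       "INTR", "(interrupt the main processor)")
signal("processor", "CMD",  "(sequence 2: check status, start IOP program)")
for byte in range(3):                       # a few formatted bytes, for illustration
    signal("IOP",       "HOLD", "(sequence 3: request the memory bus)")
    signal("processor", "DACK", "(grant the address/data buses)")
    signal("IOP",       "DATA", f"(write formatted byte {byte} to memory)")
signal("device",    "DRQ",  "(sequence 4: transfer complete)")
signal("IOP",       "INTR", "(processor may now access the data)")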