Chap 04
• In this chapter, we first look at a very simple computer called MARIE: A Machine
Architecture that is Really Intuitive and Easy.
• We then provide brief overviews of Intel and MIPS machines, two popular
architectures reflecting the CISC (Complex Instruction Set Computer) and RISC
(Reduced Instruction Set Computer) design philosophies.
• The objective of this chapter is to give you an understanding of how a computer
functions.
• The Central processing unit (CPU) is responsible for fetching program instructions,
decoding each instruction that is fetched, and executing the indicated sequence of
operations on the correct data.
• The two principal parts of the CPU are the datapath and the control unit.
• The datapath consists of an arithmetic-logic unit (ALU) and storage units (registers) that are interconnected by a data bus that is also connected to main memory (see Figure 1.4 on page 29).
• Various CPU components perform sequenced operations according to signals
provided by its control unit.
• The arithmetic-logic unit (ALU) carries out logical and arithmetic operations as
directed by the control unit.
• The control unit determines which actions to carry out according to the values in a
program counter register and a status register.
• The CPU shares data with other system components by way of a data bus.
• A bus is a set of wires that simultaneously convey a single bit along each line.
• Two types of buses are commonly found in computer systems: point-to-point, and
multipoint buses.
• At any one time, only one device (be it a register, the ALU, memory, or some other
component) may use the bus.
• Because the bus is shared, it often becomes a communications bottleneck.
• In a master-slave configuration, where more than one device can be the bus master,
concurrent bus master requests must be arbitrated.
• Four categories of bus arbitration are:
o Daisy chain: Permissions are passed from the highest priority device to the
lowest.
o Centralized parallel: Each device is directly connected to an arbitration circuit, and a centralized arbiter selects which device gets the bus (see the sketch after this list).
o Distributed using self-detection: Devices decide which gets the bus among
themselves.
o Distributed using collision detection: Any device can try to use the bus. If its data collides with the data of another device, the device tries again (Ethernet uses this type of arbitration).
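• To make one of these schemes concrete, below is a minimal Python sketch of centralized parallel arbitration, in which a single arbiter grants the bus to the highest-priority requester. The fixed priority ordering and the function name are illustrative assumptions, not part of any particular bus standard.

# Minimal sketch of centralized parallel arbitration: every device raises its
# own request line, and a single arbiter grants the bus to the requesting
# device with the highest (lowest-numbered) priority.

def arbitrate(request_lines):
    """Return the index of the winning device, or None if no device is requesting."""
    for device, requesting in enumerate(request_lines):  # index 0 = highest priority
        if requesting:
            return device
    return None

# Devices 1 and 3 request the bus in the same cycle; device 1 wins.
print(arbitrate([False, True, False, True]))   # -> 1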
• Every computer contains at least one clock that synchronizes the activities of its
components.
• A fixed number of clock cycles are required to carry out each data movement or
computational operation.
• The clock frequency, measured in megahertz or gigahertz, determines the speed with
which all operations are carried out.
• Clock cycle time is the reciprocal of clock frequency.
o An 800 MHz clock has a cycle time of 1.25 ns.
• The minimum clock cycle time must be at least as great as the maximum
propagation delay of the circuit.
• The CPU time required to run a program is given by the general performance equation:
CPU time = (instructions / program) × (average cycles / instruction) × (seconds / cycle)
• We see that we can improve CPU throughput when we reduce the number of
instructions in a program, reduce the number of cycles per instruction, or reduce the
number of nanoseconds per clock cycle.
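• As a small worked example with made-up numbers, the Python sketch below evaluates the performance equation for a hypothetical program and shows that cycle time is the reciprocal of clock frequency; the instruction count and CPI are assumptions chosen only for illustration.

# Worked example of the general performance equation with made-up numbers:
#   CPU time = (instructions/program) x (avg. cycles/instruction) x (seconds/cycle)

instructions = 1_000_000      # instructions executed by the program (assumed)
cpi = 2.5                     # average clock cycles per instruction (assumed)
clock_hz = 800e6              # 800 MHz clock, as in the example above

cycle_time = 1 / clock_hz                      # reciprocal of frequency: 1.25 ns
cpu_time = instructions * cpi * cycle_time     # seconds

print(f"cycle time = {cycle_time * 1e9:.2f} ns")   # 1.25 ns
print(f"CPU time   = {cpu_time * 1e3:.3f} ms")     # 3.125 ms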
• In general, multiplication requires more time than addition, floating point operations
require more cycles than integer ones, and accessing memory takes longer than
accessing registers.
• Bus clocks are usually slower than CPU clocks, causing bottleneck problems.
FIGURE 4.4 (a) N 8-Bit Memory Locations; (b) M 16-Bit Memory Locations
• Normally, memory is byte-addressable, which means that each individual byte has a
unique address.
• For example, a computer might handle 32-bit words but still employ a byte-addressable architecture. In this situation, when a word occupies multiple bytes, the byte with the lowest address determines the address of the entire word.
• It is also possible that a computer might be word-addressable, which means each
word has its own address, but most current machines are byte-addressable.
• If an architecture is byte-addressable and the instruction set architecture's word is larger than 1 byte, the issue of alignment must be addressed.
• Memory is built from random access memory (RAM) chips. Memory is often referred to using the notation L × W (length × width). For example,
o 4M × 16 means the memory is 4M long (4M = 2^2 × 2^20 = 2^22 words) and 16 bits wide (each word is 16 bits).
o To address this memory (assuming word addressing), we need to be able to uniquely identify 2^22 different items.
o The memory locations for this memory are numbered 0 through 2^22 − 1.
o The memory bus of this system requires at least 22 address lines.
• In general, if a computer has 2^n addressable units of memory, it requires n bits to uniquely address each unit (see the sketch below).
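• The address-line arithmetic can be checked with a few lines of Python; the 4M × 16 organization and word addressing come from the example above, and everything else is illustrative.

# Address-line arithmetic for an L x W memory, here the 4M x 16 example
# (assuming word addressing).
import math

length_words = 4 * 2**20          # 4M words
width_bits = 16                   # each word is 16 bits wide

address_lines = int(math.log2(length_words))
print(address_lines)              # 22 -> addresses 0 .. 2**22 - 1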
• Access is more efficient when memory is organized into banks of chips with the
addresses interleaved across the chips:
o In high-order interleaving, the high-order address bits specify the memory bank.
o With low-order interleaving, the low-order bits of the address specify which memory bank contains the address of interest (both schemes are sketched below).
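• Below is a minimal Python sketch of the two interleaving schemes; the 4 banks and 16-bit addresses are assumptions chosen only for illustration.

# Decompose an address into (bank, offset) under high-order vs. low-order
# interleaving, assuming 4 banks and 16-bit addresses.

BANK_BITS = 2                     # 4 banks
ADDR_BITS = 16

def high_order(addr):
    """High-order interleaving: the top bits select the bank."""
    offset_bits = ADDR_BITS - BANK_BITS
    return addr >> offset_bits, addr & ((1 << offset_bits) - 1)

def low_order(addr):
    """Low-order interleaving: the bottom bits select the bank."""
    return addr & ((1 << BANK_BITS) - 1), addr >> BANK_BITS

# Consecutive addresses stay in one bank with high-order interleaving,
# but spread across banks with low-order interleaving.
for addr in range(4):
    print(addr, high_order(addr), low_order(addr))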
• Interrupts are events that alter (or interrupt) the normal flow of execution in the
system. An interrupt can be triggered for a variety of reasons, including:
o I/O requests
o Arithmetic errors (e.g., division by zero)
o Arithmetic underflow or overflow
o Hardware malfunction (e.g., memory parity error)
o User-defined break points (such as when debugging a program)
o Page faults (this is covered in detail in Chapter 6)
o Invalid instructions (usually resulting from pointer issues)
o Miscellaneous
• Each interrupt is associated with a procedure that directs the actions of the CPU
when an interrupt occurs.
• A computer’s instruction set architecture (ISA) specifies the format of its instructions
and the primitive operations that the machine can perform.
• The ISA is an interface between a computer’s hardware and its software.
• Some ISAs include hundreds of different instructions for processing data and
controlling program execution.
• The MARIE ISA consists of only thirteen instructions.
o For example, Load X copies the contents of memory location X into the AC:
MAR ← X
MBR ← M[MAR], AC ← MBR
o Add X adds the contents of memory location X to the AC:
MAR ← X
MBR ← M[MAR]
AC ← AC + MBR
o SKIPCOND skips the next instruction according to the value of the AC.
• All computers follow a basic machine cycle: the fetch, decode, and execute cycle.
• The fetch-decode-execute cycle represents the steps that a computer follows to run a
program.
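• As an illustration only, here is a minimal Python sketch of the fetch-decode-execute cycle for a MARIE-like machine. The opcode values, the handful of instructions implemented, and the sample program are simplifying assumptions, not the full MARIE specification.

# Minimal sketch of the fetch-decode-execute cycle for a MARIE-like machine.
# Word size, opcode values, and the instruction subset are illustrative
# assumptions rather than the complete MARIE ISA.

MEM_SIZE = 4096          # 4K words, so addresses fit in 12 bits

def run(memory, start=0):
    ac, pc = 0, start
    while True:
        # Fetch: copy the instruction at PC into the IR and advance PC.
        ir = memory[pc]
        pc += 1
        # Decode: upper 4 bits are the opcode, lower 12 bits the address field.
        opcode, addr = ir >> 12, ir & 0xFFF
        # Execute the indicated operation.
        if opcode == 0x1:            # Load X:  AC <- M[X]
            ac = memory[addr]
        elif opcode == 0x2:          # Store X: M[X] <- AC
            memory[addr] = ac
        elif opcode == 0x3:          # Add X:   AC <- AC + M[X]
            ac = ac + memory[addr]
        elif opcode == 0x7:          # Halt
            return ac
        elif opcode == 0x8:          # Skipcond: skip next instruction on an AC test
            cond = (addr >> 10) & 0x3
            if (cond == 0 and ac < 0) or (cond == 1 and ac == 0) or (cond == 2 and ac > 0):
                pc += 1
        else:
            raise ValueError(f"unhandled opcode {opcode:X}")

# Tiny program: AC <- M[0x100] + M[0x101]; M[0x102] <- AC; Halt
memory = [0] * MEM_SIZE
memory[0:4] = [0x1100, 0x3101, 0x2102, 0x7000]
memory[0x100], memory[0x101] = 20, 15
run(memory)
print(memory[0x102])     # expected: 35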
• When the CPU executes an input or output instruction, the appropriate I/O device is
notified.
• The CPU then continues with other useful work until the device is ready.
• At that time, the device sends an interrupt signal to the CPU.
• The CPU then processes the interrupt, after which it continues with the normal fetch-decode-execute cycle.
• Consider the simple MARIE program given in TABLE 4.3. We show a set of
mnemonic instructions stored at addresses 100 - 106 (hex):
• Mnemonic instructions, such as LOAD 104, are easy for humans to write and
understand.
• During the assembler's first pass, we build the symbol table and produce partially translated instructions; the second pass uses the symbol table to fill in the missing addresses (see the sketch below).
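• Below is a minimal Python sketch of this two-pass idea for a few MARIE mnemonics. The opcode values follow the usual MARIE table (Load = 1, Store = 2, Add = 3, Halt = 7), while the source format, the DEC directive handling, and the sample program are simplified assumptions rather than a complete MARIE assembler.

# Minimal two-pass assembler sketch for a few MARIE mnemonics.
# Opcode values, source format, and the starting address are simplified
# assumptions for illustration only.

OPCODES = {"LOAD": 0x1, "STORE": 0x2, "ADD": 0x3, "HALT": 0x7}

def assemble(lines, origin=0x100):
    # Pass 1: record each label's address in the symbol table.
    symbols, stripped = {}, []
    for addr, line in enumerate(lines, start=origin):
        label, _, rest = line.partition(",")
        if rest:                              # "label, instruction" form
            symbols[label.strip()] = addr
            line = rest
        stripped.append(line.split())
    # Pass 2: translate mnemonics, resolving operands via the symbol table.
    code = []
    for parts in stripped:
        mnemonic = parts[0].upper()
        operand = parts[1] if len(parts) > 1 else "0"
        if mnemonic == "DEC":                 # data directive: a decimal constant
            code.append(int(operand) & 0xFFFF)
            continue
        target = symbols.get(operand)
        target = target if target is not None else int(operand, 16)
        code.append((OPCODES[mnemonic] << 12) | (target & 0xFFF))
    return code

program = ["LOAD X", "ADD Y", "STORE Z", "HALT",
           "X, DEC 35", "Y, DEC -23", "Z, DEC 0"]
print([hex(word) for word in assemble(program)])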
• Most programmers agree that 10% of the code in a program uses approximately 90%
of the CPU time.
• In time-critical applications, we often need to optimize this 10% of code.
Programmers can make the program more efficient in terms of time (and space).
• If the overall size of the program or response time is critical, assembly language often
becomes the language of choice.
• Embedded systems must be reactive and are often found in time-constrained environments. These systems are designed to perform a single task or a very specific set of tasks.
• EXAMPLE 4.1 (Page 176) Here is an example using a loop to add five numbers.
• EXAMPLE 4.2 (Page 178) This example illustrates the use of an if/else construct to allow for selection. In particular, it implements the following:
if X = Y then
X := X * 2
else
Y := Y – X;
• EXAMPLE 4.3 (Page 179) This example illustrates the use of a simple subroutine to double any number. (Note: the line numbers are given for information only.)
• A computer’s control unit keeps things synchronized, making sure that bits flow to
the correct components as the components are needed.
• There are two general ways in which a control unit can be implemented: hardwired
control and microprogrammed control.
o With microprogrammed control, a small program is placed into read-only
memory in the microcontroller.
o Hardwired controllers implement this program using digital logic components.
• For example, a 4-to-16 decoder could be used to decode the opcode. Using the contents of the IR and the status of the ALU, the control unit drives the registers, the ALU operations, the shifters, and bus access (a small decoding sketch follows below).
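• Below is a minimal Python sketch of hardwired opcode decoding: a 4-to-16 decoder asserts exactly one output line for the 4-bit opcode held in the IR. The one-hot behavior is standard for a decoder, but the sample instruction and what each output line would drive are illustrative assumptions.

# Minimal sketch of hardwired opcode decoding: a 4-to-16 decoder asserts
# exactly one of 16 output lines for the 4-bit opcode held in the IR.

def decode_4_to_16(opcode):
    """Return a 16-bit one-hot value with only bit `opcode` set."""
    assert 0 <= opcode <= 0xF
    return 1 << opcode

ir = 0x3104                       # e.g. an "Add 104"-style instruction: opcode 3
lines = decode_4_to_16(ir >> 12)  # one decoder output per opcode
print(f"{lines:016b}")            # bit 3 is asserted -> enable the add control signals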
• Each member of the Intel x86 family, including the Pentium line, is a CISC (Complex Instruction Set Computer) machine, whereas the MIPS architectures are examples of RISC (Reduced Instruction Set Computer) machines.
• The main objective of RISC machines is to simplify instructions so they can execute
more quickly. Each instruction performs only one operation; they are all the same
size.
• The classic Intel architecture, the 8086, was born in 1979. It is a CISC architecture.
• It was adopted by IBM for its famed PC, which was released in 1981.
• The 8086 operated on 16-bit data words and supported 20-bit memory addresses.
• Later, to lower costs, Intel introduced the 8088, which had an 8-bit external data bus. Like the 8086, it used 20-bit memory addresses.
• In 1985, Intel introduced the 32-bit 80386.
• Like its predecessors, it had no built-in floating-point unit.
• The 80486, introduced in 1989, was an 80386 that had built-in floating-point
processing and cache memory.
• The 80386 and 80486 offered downward compatibility with the 8086 and 8088.
• Software written for the smaller word systems was directed to use the lower 16 bits of
the 32-bit registers.
• Intel’s most advanced 32-bit microprocessor is the Pentium 4.
• It can run as fast as 3.06 GHz. This clock rate is over 350 times faster than that of the
8086.
• Speed enhancing features include multilevel cache and instruction pipelining.
• Intel, along with many others, is marrying many of the ideas of RISC architectures
with microprocessors that are largely CISC.
• The MIPS family of CPUs has been one of the most successful in its class.
• In 1986 the first MIPS CPU was announced.
• It had a 32-bit word size and could address 4GB of memory.
• Over the years, MIPS processors have been used in general purpose computers as
well as in games.
• The MIPS architecture now offers 32- and 64-bit versions.
• MIPS was one of the first RISC microprocessors.
• The original MIPS architecture had only 55 different instructions, as compared with
the 8086 which had over 100.
• MIPS was designed with performance in mind: It is a load/store architecture, meaning
that only the load and store instructions can access memory.
• The major components of a computer system are its control unit, registers, memory,
ALU, and data path.
• MARIE has 4K 16-bit words of main memory, uses 16-bit instructions, and has seven
registers.
• There is only one general purpose register, the AC.
• Instructions for MARIE use 4 bits for the opcode and 12 bits for an address.
• A built-in clock keeps everything synchronized.
• Computers run programs through iterative fetch-decode-execute cycles.
• Computers can run programs that are in machine language.
• An assembler converts mnemonic code to machine language.
• Control units can be microprogrammed or hardwired.
• Hardwired control units give better performance, while microprogrammed units are
more adaptable to changes.
• The Intel architecture is an example of a CISC architecture; MIPS is an example of a
RISC architecture.