+ Chapter 6, Part III
Processor Structure and Function
+ Processor Organization
Processor Requirements:
• Fetch instruction: the processor reads an instruction from memory (register, cache, main memory)
• Interpret instruction: the instruction is decoded to determine what action is required
• Fetch data: the execution of an instruction may require reading data from memory or an I/O module
• Process data: the execution of an instruction may require performing some arithmetic or logical operation on data
• Write data: the results of an execution may require writing data to memory or an I/O module
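Taken together, these five requirements form the classic fetch-decode-execute loop. The following is a minimal Python sketch of that loop; the tiny LOAD/ADD/STORE/HALT instruction set and the memory layout are invented purely for illustration, not any real ISA.

```python
# Hypothetical illustration of the basic processor cycle: fetch an
# instruction, interpret (decode) it, fetch/process data, write results.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", 0),
          10: 5, 11: 7, 12: 0}
acc, pc, running = 0, 0, True

while running:
    opcode, operand = memory[pc]      # fetch instruction
    pc += 1                           # advance program counter
    if opcode == "LOAD":              # fetch data from memory
        acc = memory[operand]
    elif opcode == "ADD":             # fetch data, then process it
        acc += memory[operand]
    elif opcode == "STORE":           # write data back to memory
        memory[operand] = acc
    elif opcode == "HALT":
        running = False

print(memory[12])  # -> 12 (5 + 7)
```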
Register Organization
Within the processor there is a set of registers that function as a level of memory above main memory and cache in the hierarchy.
User-visible registers are referenced by means of the machine language that the processor executes. Categories:
• General purpose: can be assigned to a variety of functions by the programmer
• Data: may be used only to hold data and cannot be employed in the calculation of an operand address
• Address: may be somewhat general purpose or may be devoted to a particular addressing mode; examples: segment pointers, index registers, stack pointer
• Condition codes: also referred to as flags; bits set by the processor hardware as the result of operations
Table 14.1: Condition Codes
The instruction cycle has three parts:
• Fetch: read the next instruction from memory into the processor
• Execute: interpret the opcode and perform the indicated operation
• Interrupt: if interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt
• The main line of activity consists of alternating instruction fetch and instruction
execution activities.
• After an instruction is fetched, it is examined to determine if any indirect
addressing is involved. If so, the required operands are fetched using indirect
addressing.
• Following execution, an interrupt may be processed before the next instruction
fetch.
During the fetch cycle, an instruction is read from memory. Figure 14.6 shows the flow of data
during this cycle. The PC contains the address of the next instruction to be fetched. This address is
moved to the MAR and placed on the address bus. The control unit requests a memory read, and
the result is placed on the data bus and copied into the MBR and then moved to the IR.
Meanwhile, the PC is incremented by 1, in preparation for the next fetch.
Once the fetch cycle is over, the control unit examines the contents of the IR to determine if it
contains an operand specifier using indirect addressing. If so, an indirect cycle is performed. As
shown in Figure 14.7, this is a simple cycle. The rightmost N bits of the MBR, which contain the
address reference, are transferred to the MAR. Then the control unit requests a memory read, to get
the desired address of the operand into the MBR.
Like the fetch and indirect cycles, the interrupt cycle is simple and predictable (Figure 14.8). The
current contents of the PC must be saved so that the processor can resume normal activity after the
interrupt. Thus, the contents of the PC are transferred to the MBR to be written into memory. The
special memory location reserved for this purpose is loaded into the MAR from the control unit. It
might, for example, be a stack pointer. The PC is loaded with the address of the interrupt routine. As a
result, the next instruction cycle will begin by fetching the appropriate instruction.
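The fetch, indirect, and interrupt data flows just described can be summarized as register transfers among the PC, MAR, MBR, and IR. Below is a hypothetical Python sketch of those transfers; the memory size, the ISR address, and the PC save location are illustrative assumptions, not fixed by the text.

```python
# Register-transfer view of the fetch, indirect, and interrupt cycles.
regs = {"PC": 0, "MAR": 0, "MBR": 0, "IR": 0}
memory = [0] * 128

def fetch_cycle():
    regs["MAR"] = regs["PC"]           # address of the next instruction
    regs["MBR"] = memory[regs["MAR"]]  # control unit requests a memory read
    regs["IR"] = regs["MBR"]           # instruction moves into the IR
    regs["PC"] += 1                    # prepare for the next fetch

def indirect_cycle(n_bits=8):
    # Rightmost N bits of the MBR hold the address reference.
    regs["MAR"] = regs["MBR"] & ((1 << n_bits) - 1)
    regs["MBR"] = memory[regs["MAR"]]  # fetch the operand's address

def interrupt_cycle(isr_address=64, save_location=100):
    regs["MBR"] = regs["PC"]           # save PC so execution can resume
    regs["MAR"] = save_location        # special location from control unit
    memory[regs["MAR"]] = regs["MBR"]
    regs["PC"] = isr_address           # next fetch starts the interrupt routine
```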
Pipelining Strategy
• Similar to the use of an assembly line in a manufacturing plant
• To apply this concept to instruction execution we must recognize that an instruction has a number of stages
While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next
instruction. This is called instruction prefetch or fetch overlap. Note that this approach, which involves instruction buffering, requires more
registers. In general, pipelining requires registers to store data between stages.
It should be clear that this process will speed up instruction execution. If the fetch and execute stages were of equal duration, the instruction
cycle time would be halved. However, if we look more closely at this pipeline (Figure 14.9b), we will see that this doubling of execution rate
is unlikely for two reasons:
-The execution time will generally be longer than the fetch time. Execution will involve reading and storing operands and the performance
of some operation. Thus, the fetch stage may have to wait for some time before it can empty its buffer.
-A conditional branch instruction makes the address of the next instruction to be fetched unknown. Thus, the fetch stage must wait until it
receives the next instruction address from the execute stage. The execute stage may then have to wait while the next instruction is fetched.
Guessing can reduce the time loss arising from the second factor. A simple rule is the following: when a conditional branch instruction is passed on
from the fetch to the execute stage, the fetch stage fetches the next instruction in memory after the branch instruction. Then, if the branch is
not taken, no time is lost. If the branch is taken, the fetched instruction must be discarded and a new instruction fetched.
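This guess rule amounts to a predict-not-taken policy in the fetch stage. A minimal sketch, where is_branch, taken, and target are hypothetical signals fed back from the execute stage:

```python
# Predict-not-taken in a two-stage pipeline: the fetch stage always
# fetches sequentially; a taken branch forces a discard and refetch.
def next_fetch_address(pc, is_branch, taken, target):
    if is_branch and taken:
        return target  # guess was wrong: discard the prefetched instruction
    return pc + 1      # guess was right (or no branch): no time lost
```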
+ Additional Stages
For the sake of illustration, let us assume equal duration. Using this assumption, Figure 14.10 shows that a six-
stage pipeline can reduce the execution time for 9 instructions from 54 time units to 14 time units.
Several comments are in order: The diagram assumes that each instruction goes through all six stages of the
pipeline. This will not always be the case. For example, a load instruction does not need the WO stage. However,
to simplify the pipeline hardware, the timing is set up assuming that each instruction requires all six stages. Also,
the diagram assumes that all of the stages can be performed in parallel. In particular, it is assumed that there are no
memory conflicts. For example, the FI, FO, and WO stages involve a memory access. The diagram implies that all
these accesses can occur simultaneously. Most memory systems will not permit that. However, the desired value
may be in cache, or the FO or WO stage may be null. Thus, much of the time, memory conflicts will not slow
down the pipeline.
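The 54-to-14 reduction follows the general pipeline timing relation: with k stages of equal duration executing n instructions, pipelined execution takes k + (n - 1) time units, versus n × k time units without pipelining. Checking with the figures above:

```python
k, n = 6, 9                    # six stages, nine instructions
unpipelined = n * k            # 9 * 6 = 54 time units
pipelined = k + (n - 1)        # 6 + 8 = 14 time units
print(unpipelined, pipelined)  # -> 54 14
```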
Another difficulty is the conditional branch instruction, which can invalidate several instruction fetches.
A similar unpredictable event is an interrupt.
Figure 14.11 illustrates the effects of the conditional branch, using the same program as Figure 14.10.
Assume that instruction 3 is a conditional branch to instruction 15. Until the instruction is executed,
there is no way of knowing which instruction will come next. The pipeline, in this example, simply loads
the next instruction in sequence (instruction 4) and proceeds. The branch is not determined until the end
of time unit 7. At this point, the pipeline must be cleared of instructions that are not useful. During time
unit 8, instruction 15 enters the pipeline. No instructions complete during time units 9 through 12; this is
the performance penalty incurred because we could not anticipate the branch.
Figure 14.12 indicates the logic needed for pipelining to account for branches and
interrupts.
In Figure 14.13a (which corresponds to Figure 14.10), the pipeline is full at time 6, with 6 different instructions in
various stages of execution, and remains full through time 9; we assume that instruction I9 is the last instruction to be
executed. In Figure 14.13b, (which corresponds to Figure 14.11), the pipeline is full at times 6 and 7. At time 7,
instruction 3 is in the execute stage and executes a branch to instruction 15. At this point, instructions I4 through I7
are flushed from the pipeline, so that at time 8, only two instructions are in the pipeline, I3 and I15.
Figure 14.14a plots the speedup factor as a function of the number of instructions that are executed
without a branch.
Figure 14.14b shows the speedup factor as a function of the number of stages in the instruction pipeline.
Thus, the larger the number of pipeline stages, the greater the potential for speedup. However, as a practical matter, the potential gains of additional pipeline stages are countered by increases in cost, delays between stages, and the fact that branches will be encountered, requiring the flushing of the pipeline.
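The curves in Figure 14.14 follow from the speedup factor Sk = nk / (k + n - 1), the ratio of unpipelined to pipelined execution time. A small helper reproduces both views (speedup versus n, and versus k):

```python
# Speedup of a k-stage pipeline over non-pipelined execution of n
# instructions (assuming no branches or other stalls).
def speedup(n, k):
    return (n * k) / (k + n - 1)

print(speedup(30, 6))   # ~5.14: speedup approaches k as n grows
print(speedup(30, 12))  # ~8.78: more stages, more potential speedup
```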
Pipeline Hazards
A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must stall because conditions do not permit continued execution.
There are three types of hazards:
• Resource
• Data
• Control
A resource hazard occurs when two (or more) instructions that are already in the pipeline need the
same resource. The result is that the instructions must be executed in serial rather than parallel for a
portion of the pipeline. A resource hazard is sometimes referred to as a structural hazard.
Now assume that main memory has a single port and that all instruction fetches and data reads and writes
must be performed one at a time. Further, ignore the cache. In this case, an operand read from or write to
memory cannot be performed in parallel with an instruction fetch. This is illustrated in Figure 14.15b,
which assumes that the source operand for instruction I1 is in memory, rather than a register. Therefore,
the fetch instruction stage of the pipeline must idle for one cycle before beginning the instruction fetch
for instruction I3. The figure assumes that all other operands are in registers.
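A minimal sketch of this serialization, assuming a single-ported memory where the fetch stage must wait out any cycle claimed by an operand access (the busy-cycle set is an illustrative stand-in for port arbitration):

```python
# Resource (structural) hazard: with one memory port, an instruction
# fetch (FI) cannot share a cycle with an operand fetch (FO) or write (WO).
def fetch_cycle_for(requested_cycle, busy_cycles):
    cycle = requested_cycle
    while cycle in busy_cycles:  # port already claimed by FO/WO
        cycle += 1               # fetch stage idles one cycle
    return cycle

# I1's operand fetch occupies memory in cycle 3, so I3's instruction
# fetch slips from cycle 3 to cycle 4, as in Figure 14.15b.
print(fetch_cycle_for(3, {3}))  # -> 4
```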
Clock cycle     1    2    3    4    5    6    7    8    9    10
ADD EAX, EBX    FI   DI   FO   EI   WO
SUB ECX, EAX         FI   DI   --   --   FO   EI   WO
I3                        FI   --   --   DI   FO   EI   WO
I4                                       FI   DI   FO   EI   WO
A data hazard occurs when there is a conflict in the access of an operand location. In general terms, we
can state the hazard in this form: Two instructions in a program are to be executed in sequence and both
access a particular memory or register operand. If the two instructions are executed in strict sequence, no
problem occurs. However, if the instructions are executed in a pipeline, then it is possible for the operand
value to be updated in such a way as to produce a different result than would occur with strict sequential
execution. In other words, the program produces an incorrect result because of the use of pipelining.
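As a concrete illustration, consider the ADD/SUB pair from the timing diagram above: SUB ECX, EAX needs the EAX value produced by ADD EAX, EBX, so if the pipeline lets SUB's operand fetch run before ADD's write-back, the result differs from strict sequential execution. A hedged Python rendering of that effect (register values are illustrative):

```python
# RAW (read-after-write) hazard: SUB ECX, EAX reads EAX before
# ADD EAX, EBX has written it back.
regs = {"EAX": 1, "EBX": 2, "ECX": 10}

stale_eax = regs["EAX"]                   # SUB's operand fetch runs early
regs["EAX"] = regs["EAX"] + regs["EBX"]   # ADD writes back: EAX = 3

pipelined  = regs["ECX"] - stale_eax      # 10 - 1 = 9 (incorrect)
sequential = regs["ECX"] - regs["EAX"]    # 10 - 3 = 7 (correct)
print(pipelined, sequential)              # -> 9 7
```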
+ Types of Data Hazard
• Read after write (RAW), or true dependency: an instruction modifies a location and a later instruction reads it; the result is incorrect if the read occurs before the write completes
• Write after read (WAR), or antidependency: an instruction reads a location and a later instruction writes it; the result is incorrect if the write occurs before the read
• Write after write (WAW), or output dependency: two instructions write the same location; the result is incorrect if the writes occur in reverse order
Control Hazard
Also known as a branch hazard
Occurs when the pipeline makes the wrong decision on
a branch prediction
Brings instructions into the pipeline that must
subsequently be discarded
Dealing with Branches:
Multiple streams
Prefetch branch target
Loop buffer
Branch prediction
Delayed branch
Multiple Streams
A simple pipeline suffers a penalty for a branch instruction because it must choose one of two instructions to fetch next and may make the wrong choice. A brute-force approach is to replicate the initial portions of the pipeline and allow both instructions to be fetched, making use of two streams.
Drawbacks:
• With multiple pipelines there are contention
delays for access to the registers and to memory
• Additional branch instructions may enter the
pipeline before the original branch decision is
resolved
Loop Buffer
Small, very-high-speed memory maintained by the
instruction fetch stage of the pipeline and containing the
n most recently fetched instructions, in sequence
Benefits:
Instructions fetched in sequence will be available without the
usual memory access time
If a branch occurs to a target just a few locations ahead of the
address of the branch instruction, the target will already be in
the buffer
This strategy is particularly well suited to dealing with loops
[Figure 14.17: Loop buffer (256 bytes). On a hit, the instruction to be decoded is taken directly from the buffer.]
Figure 14.17 gives an example of a loop buffer. If the buffer contains 256 bytes, and byte
addressing is used, then the least significant 8 bits are used to index the buffer. The remaining
most significant bits are checked to determine if the branch target lies within the environment
captured by the buffer.
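A sketch of that address check, assuming the 256-byte buffer and byte addressing from the example (buffer_tag is a hypothetical name for the high-order address bits of the region currently held in the buffer):

```python
# Loop buffer lookup: the low 8 bits index the 256-byte buffer; the
# remaining most significant bits are compared against the buffered region.
BUFFER_BITS = 8  # 2**8 = 256 bytes

def loop_buffer_lookup(address, buffer_tag):
    tag = address >> BUFFER_BITS                 # most significant bits
    index = address & ((1 << BUFFER_BITS) - 1)   # least significant 8 bits
    return tag == buffer_tag, index              # (hit?, buffer offset)

print(loop_buffer_lookup(0x1A2B, 0x1A))  # -> (True, 43): hit at offset 0x2B
```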
Among the machines using a loop buffer are some of the CDC machines (Star-100, 6600, 7600) and the CRAY-1. A
specialized form of loop buffer is available on the Motorola 68010, for executing a three-instruction loop involving the DBcc
(decrement and branch on condition) instruction (see Problem 14.14). A three-word buffer is maintained, and the processor
executes these instructions repeatedly until the loop condition is satisfied.
+ Branch Prediction
The first two approaches are the simplest. These either always assume that the branch will not be taken and continue to
fetch instructions in sequence, or they always assume that the branch will be taken and always fetch from the branch target. The predict-never-taken approach is the most popular of all the branch prediction methods. Studies analyzing program
behavior have shown that conditional branches are taken more than 50% of the time [LILJ88], and so if the cost of
prefetching from either path is the same, then always prefetching from the branch target address should give better
performance than always prefetching from the sequential path.
The final static approach makes the decision based on the opcode of the branch instruction. The processor assumes that the
branch will be taken for certain branch opcodes and not for others. [LILJ88] reports success rates of greater than 75% with
this strategy.
With a single bit, all that can be recorded is whether the last execution of this instruction resulted in a branch or not. A
shortcoming of using a single bit appears in the case of a conditional branch instruction that is almost always taken, such
as a loop instruction. With only one bit of history, an error in prediction will occur twice for each use of the loop: once on
entering the loop, and once on exiting.
If two bits are used, they can be used to record the result of the last two instances of the execution of the associated
instruction, or to record a state in some other fashion. Figure 14.18 shows a typical approach (see Problem 14.13 for other
possibilities). Assume that the algorithm starts at the upper-left-hand corner of the flowchart. As long as each succeeding
conditional branch instruction that is encountered is taken, the decision process predicts that the next branch will be
taken. If a single prediction is wrong, the algorithm continues to predict that the next branch is taken. Only if two
successive branches are not taken does the algorithm shift to the right-hand side of the flowchart. Subsequently, the
algorithm will predict that branches are not taken until two branches in a row are taken. Thus, the algorithm requires two
consecutive wrong predictions to change the prediction decision.
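One common realization of this decision process is a two-bit saturating counter. The sketch below assumes an illustrative encoding (states 0-1 predict "not taken", states 2-3 predict "taken"); as the text notes, other arrangements are possible (see Problem 14.13).

```python
# Two-bit branch history: two consecutive wrong predictions are needed
# to change the prediction decision.
def predict(state):
    return state >= 2                 # True -> predict taken

def update(state, taken):
    return min(state + 1, 3) if taken else max(state - 1, 0)

state = 3                             # e.g. a loop branch, usually taken
for outcome in [True, True, False, True]:  # one loop exit, then re-entry
    print(predict(state), outcome)    # only the exit is mispredicted
    state = update(state, outcome)
```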
-The decision process can be represented more compactly by a finite-state machine, shown in Figure 14.19.
-The use of history bits, as just described, has one drawback: If the decision is made to take the branch, the
target instruction cannot be fetched until the target address, which is an operand in the conditional branch
instruction, is decoded. Greater efficiency could be achieved if the instruction fetch could be initiated as soon
as the branch decision is made. For this purpose, more information must be saved, in what is known as a
branch target buffer, or a branch history table.
[Figure 14.20 Dealing with Branches: (a) predict-never-taken strategy; (b) branch history table strategy]
- Figure 14.20 contrasts the branch history table scheme with a predict-never-taken strategy. With the predict-never-taken strategy, the instruction fetch stage always fetches the next sequential address. If a branch is taken, some logic in the processor detects this and instructs that the next instruction be fetched from the target address (in addition to flushing the pipeline).
- The branch history table is treated as a cache. Each prefetch triggers a lookup in the branch history table. If no match is found, the next sequential address is used for the fetch. If a match is found, a prediction is made based on the state of the instruction: either the next sequential address or the branch target address is fed to the select logic.
- When the branch instruction is executed, the execute stage signals the branch history table logic with the result. The state of the instruction is updated to reflect a correct or incorrect prediction. If the prediction is incorrect, the select logic is redirected to the correct address for the next fetch. When a conditional branch instruction is encountered that is not in the table, it is added to the table and one of the existing entries is discarded, using one of the cache replacement algorithms discussed in Chapter 4.
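A minimal sketch of the branch history table treated as a small cache. The entry format here collapses the state bits to a single taken/not-taken flag, and FIFO replacement stands in for one of the cache replacement algorithms mentioned above; the sizes and names are illustrative assumptions.

```python
# Branch history table as a cache: each entry maps a branch instruction's
# address to (prediction state, target address).
from collections import OrderedDict

MAX_ENTRIES = 4
bht = OrderedDict()   # branch address -> (predict_taken, target)

def next_fetch(pc):
    entry = bht.get(pc)
    if entry is None:                 # miss: use the next sequential address
        return pc + 1
    predict_taken, target = entry
    return target if predict_taken else pc + 1

def execute_stage_report(pc, taken, target):
    if pc not in bht and len(bht) == MAX_ENTRIES:
        bht.popitem(last=False)       # discard an existing entry (FIFO here)
    bht[pc] = (taken, target)         # update state with the actual result
```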
Intel 80486 Pipelining
Fetch
• Operates independently of the other stages to keep the prefetch buffers full
• Objective is to fill the prefetch buffers with new data as soon as the old data have been consumed by the instruction decoder
Decode stage 1
• All opcode and addressing-mode information is decoded in the D1 stage
Decode stage 2
• Expands each opcode into control signals for the ALU
• Also controls the computation of the more complex addressing modes
Execute
• Stage includes ALU operations, cache access, and register update
Write back
• Updates registers and status flags modified during the preceding execute stage
-Figure 14.21 shows examples of the operation of
the pipeline. Figure 14.21a shows that there is no
delay introduced into the pipeline when a memory
access is required.
-However, as Figure 14.21b shows, there can be a
delay for values used to compute memory
addresses. That is, if a value is loaded from
memory into a register and that register is then
used as a base register in the next instruction, the
processor will stall for one cycle.
Table 14.2: x86 Processor Registers
+ Interrupt Processing
Interrupts and Exceptions
Interrupts
• Generated by a signal from hardware; may occur at random times during the execution of a program
• Maskable: the processor does not recognize a maskable interrupt unless the interrupt enable flag (IF) is set
• Nonmaskable: recognition of such interrupts cannot be prevented
Exceptions
• Generated from software; provoked by the execution of an instruction
• Processor detected
• Programmed
Table 14.3: x86 Exception and Interrupt Vector Table
+ The ARM Processor
The ARM architecture supports seven execution modes. Most application programs execute in user mode:
• While the processor is in user mode the program being executed is unable to access protected system resources or to change mode, other than by causing an exception to occur
Table 14.4: ARM Interrupt Vector
+ Summary: Processor Structure and Function
Chapter 6
• Processor organization
• Register organization
  - User-visible registers
  - Control and status registers
• Instruction cycle
  - The indirect cycle
  - Data flow
• Instruction pipelining
  - Pipelining strategy
  - Pipeline performance
  - Pipeline hazards
  - Dealing with branches
  - Intel 80486 pipelining
• The x86 processor family
  - Register organization
  - Interrupt processing
• The ARM processor
  - Processor organization
  - Processor modes
  - Register organization
  - Interrupt processing