Unit 3 Students


22-Sep-2024

UNIT 3
• Basic Processing Unit: Concepts - Instruction Execution - Hardware Components - Instruction Fetch and
Execution Steps -Control Signals - Hardwired Control.

• Pipelining: Basic Concept - Pipeline Organization- Pipelining Issues - Data Dependencies - Memory Delays -
Branch Delays - Resource Limitations - Performance Evaluation -Superscalar Operation.

Unit – 3: Basic Processing Unit-Fundamental Concepts

To execute a program (Basic Steps): Straight Line Sequencing


▪ A typical computing task consists of a series of steps specified by a
sequence of machine instructions that constitute a program.
▪ The processor fetches one instruction at a time and performs the
operations specified.
▪ Instructions are fetched from successive memory locations until a
branch or a jump instruction is encountered.
▪ The processor keeps track of the address of the memory location
containing the next instruction to be fetched using the program counter,
PC.
▪ After fetching an instruction, the contents of the PC are updated to
point to the next instruction in the sequence.


To execute a program (Basic Steps): Branch Instructions

▪ A branch instruction may load a different value into the PC.

▪ The Program Counter (PC) value changes according to the branch
instruction (i.e., the condition), and execution jumps to the particular
instruction based on that condition.

▪ Hence in this case, the sequence of execution is broken.

▪ Examples of branching instructions include JMP and LOOP.

Steps to execute an instruction


Processor Organization

[Figure 7.1. Single-bus organization of the datapath inside a processor.
The PC, MAR, MDR, IR, general-purpose registers R0 ... R(n-1), registers Y
and TEMP, a multiplexer (MUX) with the constant 4 as one input, and the
ALU (with Add, Sub, XOR, and carry-in controls) are all connected to a
single internal processor bus. The MAR drives the address lines and the
MDR the data lines of the memory bus, while the instruction decoder and
control logic generate the control signals and the ALU control lines.]

Datapath

• The primary function of a computer system is to execute a program, a
sequence of instructions. These instructions are stored in computer memory.
• These instructions are executed to process data that have already been
loaded into the computer memory through some input devices.
• After processing the data, the result is either stored in the memory for
further reference, or it is sent to the outside world through some output
port.
• To execute an instruction, in addition to the arithmetic and logic unit
and the control unit, the processor contains a number of registers used for
temporary storage of data, as well as some special-function registers.
• The special-function registers include the program counter (PC), the
instruction register (IR), the memory address register (MAR), and the
memory data register (MDR).
• The program counter is one of the most critical registers in the CPU. It
monitors the execution of instructions, keeping track of which instruction
is being executed and what the next instruction will be.
• The instruction register (IR) holds the instruction that is currently
being executed.
• The contents of the IR are available to the control unit, which generates
the timing signals that control the various processing elements involved in
executing the instruction.
• The two registers MAR and MDR handle data transfers between the main
memory and the processor.
• The MAR holds the address of the main memory location to or from which
data are to be transferred.
• The MDR contains the data to be written into or read from the addressed
word of the main memory.


Programming Example

    void main()
    {
        int var1, var2 = 20, var3;
        scanf("%d", &var1);
        var3 = var1 + var2;
        printf("%d", var3);
    }

The variables var1, var2, and var3 are held in registers R1, R2, and R3.
Equivalent machine code:

    Load  R1, Loc(var1)   ; R1 <- var1 (= 10)
    Mvi   R2, 14          ; R2 <- 14H (= 20)
    Add   R3, R1, R2      ; R3 <- R1 + R2
    Store R3, Loc(var3)   ; Loc(var3) <- R3

Data layout in memory (values in hex):

    var1:  1001: 00   1002: 0A    (10)
    var2:  2001: 00   2002: 14    (20)
    var3:  3001: 00   3002: 1E    (30)

The code is stored starting at address 4001:

    4001: 21   4002: 01   4003: 10   4004: 3E   4005: 14
    4006: 85   4007: 32   4008: 01   4009: 30   400A: 76

Fetch trace as the instructions are executed:

    PC      MAR     MDR     IR
    4001    --      --      --
    4004    4001    21      21
    4006    4004    3E      3E
    4007    4006    85      85
    400A    4007    32      32
    --      400A    76      76

Instruction Execution
• It refers to the process by which a computer's processor (CPU) carries out the instructions specified in a program.
This process involves several key steps and components.
1. Instruction Fetch
• Program Counter (PC): The CPU uses the Program Counter to keep track of the address of the next
instruction to execute.
• Memory Access: The instruction is fetched from memory (RAM) based on the address in the Program
Counter.
2. Instruction Decode
• Instruction Register (IR): The fetched instruction is loaded into the Instruction Register.
• Decoding: The CPU decodes the instruction to determine what operation is to be performed. This involves
interpreting the opcode (operation code) and identifying the operands (data or addresses involved).
3. Operand Fetch
• Registers or Memory Access: Depending on the instruction, operands might need to be fetched from registers
or memory locations. For instructions that involve data in memory, the CPU may need to access specific
memory locations.
4. Instruction Execution
• Execution Unit: The actual operation specified by the instruction is performed by the CPU's execution units.
This could involve arithmetic operations (e.g., addition, subtraction), logical operations (e.g., AND, OR), or
data movement operations (e.g., loading data into a register).


5. Write Back
• Update Registers or Memory: The result of the execution is written back to a register or memory location,
depending on the instruction’s requirements.
6. Update Program Counter
• Next Instruction Address: The Program Counter is updated to point to the address of the next instruction,
preparing the CPU to fetch the next instruction in the sequence.
Key Concepts
• Pipelining: Modern CPUs use pipelining to overlap these stages for improved performance. Different
instructions can be at different stages of execution simultaneously.
• Instruction Set Architecture (ISA): The ISA defines the set of instructions a CPU can execute and the format of
these instructions.
• Control Unit: This part of the CPU coordinates the fetching, decoding, execution, and write-back processes.
• Data Path: The internal pathways through which data and instructions travel within the CPU.
• Clock Cycles: Each step of the process typically takes one or more clock cycles, which are the basic unit of
time in the CPU’s operation.
• This sequence of steps is often referred to as the Instruction Cycle or Fetch-Decode-Execute Cycle. The
specific details of instruction execution can vary depending on the architecture of the CPU, such as whether it's
a RISC (Reduced Instruction Set Computer) or CISC (Complex Instruction Set Computer) architecture.
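The fetch-decode-execute cycle described above can be sketched as a toy simulator. The three-field instruction encoding, the opcode names, and the memory layout below are invented for illustration and do not correspond to any real ISA:

```python
# Toy fetch-decode-execute loop over a word-addressed memory (a dict).
# Each instruction is a tuple (op, a, b); this encoding is invented.

def run(memory, num_regs=8):
    regs = [0] * num_regs
    pc = 0
    while True:
        op, a, b = memory[pc]       # 1. fetch: IR <- [[PC]]
        pc += 1                     # 6. update PC to the next instruction
        if op == "halt":            # 2. decode the opcode ...
            break
        elif op == "load":          # 3-5. operand fetch, execute, write back
            regs[a] = memory[b]
        elif op == "add":
            regs[a] = regs[a] + regs[b]
        elif op == "store":
            memory[b] = regs[a]
    return regs, memory

# Program: R0 <- mem[100]; R1 <- mem[101]; R0 <- R0 + R1; mem[102] <- R0
prog = {0: ("load", 0, 100), 1: ("load", 1, 101),
        2: ("add", 0, 1), 3: ("store", 0, 102), 4: ("halt", 0, 0)}
prog.update({100: 10, 101: 20, 102: 0})
regs, mem = run(prog)
print(mem[102])   # -> 30
```

Note that the PC is updated immediately after the fetch, before the instruction is decoded, mirroring the step ordering described in the text.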

Steps to execute an instruction


To execute an instruction, the processor has to perform the following three steps:
1. Fetch Phase:
▪ Fetch the contents of the memory location pointed to by the PC.
▪ The contents of this location are interpreted as an instruction to be
executed and are loaded into the Instruction Register (IR).
▪ Fetching an instruction and loading it into the IR is usually referred to
as the instruction-fetch phase.
        IR ← [[PC]]
▪ Assume the word size is 4 bytes, each instruction is 4 bytes long, and
memory is byte-addressable.
▪ Then, increment the contents of the PC by 4:
        PC ← [PC] + 4



2. Decode Phase:
▪ Decode the instruction from the IR.
3. Execute Phase
▪ Carry out the actions specified by the instruction in the IR by executing it.

▪ Performing the operation specified in the instruction constitutes the
instruction execution phase. With a few exceptions, the operation specified
by an instruction can be carried out by performing one or more of the
following actions:

➢ Read the contents of a given memory location and load them into a processor register.

➢ Read data from one or more processor registers.

➢ Perform an arithmetic or logic operation and place the result into a processor register.

➢ Store data from a processor register into a given memory location.




Hardware Components
• The processor communicates with the memory through the
processor-memory interface, which transfers data from and
to the memory during Read and Write operations.
• The instruction address generator updates the contents of
the PC after every instruction is fetched.
• The register file is a memory unit whose storage locations
are organized to form the processor’s general-purpose
registers.
• During execution, the contents of the registers named in
an instruction that performs an arithmetic or logic
operation are sent to the arithmetic and logic unit (ALU),
which performs the required computation. The results of
the computation are stored in a register in the register file.


Register File
• General-purpose registers are usually implemented in the form of a register file, which is a small and fast
memory block. It consists of an array of storage elements, with access circuitry that enables data to be read
from or written into any register.
• The access circuitry is designed to enable two registers to be read at the same time, making their contents
available at two separate outputs, A and B.
• The register file has two address inputs that select the two registers to be read. These inputs are connected to
the fields in the IR that specify the source registers, so that the required registers can be read. The register file
also has a data input, C, and a corresponding address input to select the register into which data are to be
written. This address input is connected to the IR field that specifies the destination register of the instruction.
• The inputs and outputs of any memory unit are often called input and
output ports. A memory unit that has two output ports is said to be
dual-ported.
• There are two alternatives for implementing a dual-ported register file.


• One possibility is to use a single set of registers with duplicate data paths and access circuitry that enable two
registers to be read at the same time.
• An alternative is to use two memory blocks, each containing one copy of the register file. Whenever data are
written into a register, they are written into both copies of that register. Thus, the two files have identical
contents. When an instruction requires data from two registers, one register is accessed in each file. In effect, the
two register files together function as a single dual-ported register file.
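As a rough sketch of the dual-ported behaviour described above, the class below models a register file with two read ports (outputs A and B) driven by two address inputs, and one write port (data input C). The Python interface itself is invented for illustration:

```python
# Sketch of a dual-ported register file: two registers can be read in the
# same cycle, and one register can be written via data input C.

class RegisterFile:
    def __init__(self, n=32):
        self.regs = [0] * n

    def read(self, addr_a, addr_b):
        # Both selected registers are read simultaneously, their contents
        # appearing at the two separate outputs, A and B.
        return self.regs[addr_a], self.regs[addr_b]

    def write(self, addr_c, data_c):
        # Data input C is written into the selected destination register.
        self.regs[addr_c] = data_c

rf = RegisterFile()
rf.write(4, 15)
rf.write(5, 27)
a, b = rf.read(4, 5)
print(a, b)   # -> 15 27
```

In the two-copies implementation described above, `write` would store the value into both copies while each `read` port would be served by a different copy; the observable behaviour is the same.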

ALU
• The arithmetic and logic unit is used to manipulate data. It performs arithmetic operations such as addition and
subtraction, and logic operations such as AND, OR, and XOR.
• When an instruction that performs an arithmetic or logic operation is
being executed, the contents of the two registers specified in the
instruction are read from the register file and become available at
outputs A and B.
• Output A is connected directly to the first input of the ALU, InA, and
output B is connected to a multiplexer, MuxB.
• The multiplexer selects either output B of the register file or the
immediate value in the IR to be connected to the second ALU input,
InB.
• The output of the ALU is connected to the data input, C, of the register
file so that the results of a computation can be loaded into the
destination register.
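The input selection described above can be sketched as follows: InA comes straight from register-file output A, while MuxB chooses between output B and the immediate field of the IR for InB. The operation names and values below are illustrative:

```python
# Sketch of the ALU and its MuxB input selection.

def mux_b(out_b, immediate, select_immediate):
    # MuxB forwards either register-file output B or the IR immediate.
    return immediate if select_immediate else out_b

def alu(op, in_a, in_b):
    ops = {"add": in_a + in_b, "sub": in_a - in_b,
           "and": in_a & in_b, "or": in_a | in_b, "xor": in_a ^ in_b}
    return ops[op]

# Add R3, R4, R5 : both operands come from the register file.
print(alu("add", 7, mux_b(5, None, False)))   # -> 12
# Add R3, R4, #1000 : MuxB selects the immediate value from the IR.
print(alu("add", 7, mux_b(5, 1000, True)))    # -> 1007
```

The ALU result would then be routed to data input C of the register file, as the text describes.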


Datapath
• Instruction processing consists of two phases: the fetch phase and the execution phase. It is convenient to divide
the processor hardware into two corresponding sections. One section fetches instructions and the other executes
them.
• The section that fetches instructions is also responsible for decoding them and for generating the control signals
that cause appropriate actions to take place in the execution section.
• The execution section reads the data operands specified in an
instruction, performs the required computations, and stores the results.
• An instruction is fetched in step 1 by hardware stage 1 and placed into the IR. It is decoded, and its source
registers are read in step 2. The information in the IR is used to generate the control signals for all subsequent
steps. Therefore, the IR must continue to hold the instruction until its execution is completed.
• Data read from the register file are placed in registers RA and RB. Register RA provides the data to input InA of
the ALU. Multiplexer MuxB forwards either the contents of RB or the immediate value in the IR to the ALU’s
second input, InB. The ALU constitutes stage 3, and the result of the computation it performs is placed in
register RZ.

Datapath
• Recall that for computational instructions, such as an Add instruction,
no processing actions take place in step 4.
• During that step, multiplexer MuxY simply passes the result in register
RZ on to register RY. The contents of RY are transferred to the register
file in step 5 and loaded into the destination register.
• For this reason, the register file is in both stages 2 and 5. It is a
part of stage 2 because it contains the source registers and a part of
stage 5 because it contains the destination register.
• For Load and Store instructions, the effective address of the
memory operand is computed by the ALU in step 3 and
loaded into register RZ. From there, it is sent to the
memory, which is stage 4.



Instruction Fetch Section

• Figure shows the instruction fetch section of the processor.


• The addresses used to access the memory come from the PC when fetching
instructions and from register RZ in the datapath when accessing instruction
operands.
• Multiplexer MuxMA selects one of these two sources to be sent to the
processor-memory interface. The PC is included in a larger block, the
instruction address generator, which updates the contents of the PC after
each instruction is fetched.
• The instruction read from the memory is loaded into the IR, where it stays
until its execution is completed and the next instruction is fetched. The
contents of the IR are examined by the control circuitry to generate the
signals needed to control all the processor’s hardware. They are also used
by the block labeled Immediate.

Instruction Fetch Section

• An adder is used to increment the PC by 4 during straight-line execution.
It is also used to compute a new value to be loaded into the PC when
executing branch and subroutine call instructions.
• One adder input is connected to the PC. The second input is connected to a
multiplexer, MuxINC, which selects either the constant 4 or the branch
offset to be added to the PC. The branch offset is given in the immediate
field of the IR and is sign-extended to 32 bits by the Immediate block.
• The output of the adder is routed to the PC via a second multiplexer,
MuxPC, which selects between the adder and the output of register RA. The
latter connection is needed when executing subroutine linkage instructions.
• Register PC-Temp is needed to hold the contents of the PC temporarily
during the process of saving the subroutine or interrupt return address.



Instruction Fetch and Execution Steps

Actions involved in fetching and executing instructions.

We illustrate these actions using a few representative RISC-style instructions.

1. Load Instructions

2. Arithmetic and Logic Instructions

3. Store Instructions



1. Load Instructions

▪ Consider the instruction: Load R5, X(R7)

▪ The above example uses the Index addressing mode to load a word of data
from memory location X + [R7] into register R5.



Actions involved in fetching and executing instructions: Load Instructions


Execution of this instruction involves the following actions:

1. Fetch the instruction from the memory.

2. Increment the program counter.

3. Decode the instruction to determine the operation to be performed.

4. Read register R7.

5. Add the immediate value X to the contents of R7.

6. Use the sum X + [R7] as the effective address of the source operand, and read the contents of that
location in the memory.

7. Load the data received from the memory into the destination register, R5.


Actions involved in fetching and executing instructions: Load Instructions


▪ Depending on hardware organization, some of these actions can be performed at the same time.
▪ Let us assume that the processor has five hardware stages, which is a commonly used
arrangement in RISC-style processors.
▪ Execution of each instruction is divided into five steps, such that each step is carried out by one
hardware stage.
▪ In this case, fetching and executing the Load instruction above can be completed as follows:

1. Fetch the instruction and increment the program counter.


2. Decode the instruction and read the contents of register R7 in the register file.
3. Compute the effective address.
4. Read the memory source operand.
5. Load the operand into the destination register, R5.
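Assuming X = 100, [R7] = 200, and an operand value of 42 at address 300 (all concrete values invented for illustration), the five steps for Load R5, X(R7) can be traced like this:

```python
# Walk through the five steps for: Load R5, X(R7)
# with X = 100 and [R7] = 200, so the operand lives at address 300.

memory = {0: ("Load", 5, 100, 7),   # instruction word at address 0
          300: 42}                  # memory source operand
regs = {7: 200}
pc = 0

op, rdst, x, rsrc = memory[pc]; pc += 4   # 1. fetch, PC <- [PC] + 4
assert op == "Load"                       # 2. decode the instruction,
r7 = regs[rsrc]                           #    and read R7
ea = x + r7                               # 3. effective address X + [R7]
data = memory[ea]                         # 4. read the memory operand
regs[rdst] = data                         # 5. load it into R5

print(regs[5], pc)   # -> 42 4
```

Each comment corresponds to one of the five hardware stages, so in a pipelined processor each line would be handled by a different stage in a different clock cycle.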
Dr. N. P. PONNUVIJI--Unit 3: COA-Basic Processing Unit and Pipelining


Actions involved in fetching and executing instructions:

2. Arithmetic and Logic Instructions

Instructions that involve an arithmetic or logic operation can be executed using similar steps.

▪ They differ from the Load instruction in two ways:

▪ There are either two source registers, or a source register and an immediate source operand.

▪ No access to memory operands is required.

▪ A typical instruction of this type is:

Add R3, R4, R5


Actions involved in fetching and executing instructions: Arithmetic and
Logic Instructions

Add R3, R4, R5
It requires the following steps:
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read the contents of source registers R4 and R5.
3. Compute the sum [R4] + [R5].
4. Load the result into the destination register, R3.



Actions involved in fetching and executing instructions: Arithmetic and
Logic Instructions
▪ If the instruction uses an immediate operand, as in Add R3, R4, #1000,
the immediate value is given in the instruction word.

▪ Once the instruction is loaded into the IR, the immediate value is available for use in the
addition operation.

▪ The same step sequence can be used, with steps 2 and 3 modified as:

2. Decode the instruction and read register R4.

3. Compute the sum [R4] + 1000.


Actions involved in fetching and executing instructions:

3. Store Instructions

▪ The five-step sequence is suitable for all Load and Store instructions, because the addressing modes that can be
used in these instructions are special cases of the Index mode.

▪ Most RISC-style processors provide one general-purpose register, usually register R0, that always contains the
value zero.

Store R1, X(R0) ; Store the value in register R1 to memory address (X + R0)

▪ When R0 is used as the index register, the effective address of the operand is the immediate value X. This is
the Absolute addressing mode.

▪ Alternatively, if the offset X is set to zero, the effective address is
the contents of the index register, Ri. This is the Register indirect
addressing mode.



Instruction Fetch and Execution Steps – Branching Instructions

• The instruction is fetched and the PC is incremented as usual in step 1.
After the instruction has been decoded in step 2, multiplexer MuxINC
selects the branch offset in the IR to be added to the PC in step 3. This
is the address that will be used to fetch the next instruction.
• Execution of a Branch instruction is completed in
step 3.
• No action is taken in steps 4 and 5.


• In processors that do not use condition-code flags, the branch
instruction specifies a compare-and-test operation that determines the
branch condition. For example, the instruction
        Branch_if_[R5]==[R6] LOOP
• results in a branch if the contents of registers R5 and R6 are identical. When this
instruction is executed, the register contents are compared, and if they are
equal, a branch is made to location LOOP.
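Under the simplifying assumption that the branch offset is added to the already-incremented PC (the PC was updated in step 1, the offset is added in step 3), the PC-update path can be sketched as follows; the addresses and offset are invented:

```python
# Sketch of the PC-update path for a compare-and-test branch.
# MuxINC selects the sign-extended offset when the branch is taken;
# the adder output is routed through MuxPC back into the PC.

def execute_branch(pc, r5, r6, offset):
    pc = pc + 4                  # step 1: fetch, PC <- [PC] + 4
    if r5 == r6:                 # steps 2-3: decode, then compare
        pc = pc + offset         # MuxINC selects the offset; adder -> PC
    return pc

print(execute_branch(1000, 3, 3, -44))   # -> 960 (taken, back to LOOP)
print(execute_branch(1000, 3, 7, -44))   # -> 1004 (not taken)
```

When the branch is not taken, the PC simply keeps the incremented value from step 1, so the next sequential instruction is fetched.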



Subroutine Call Instructions


• Subroutine calls and returns are implemented in a similar manner to branch instructions. The address of the
subroutine may either be computed using an immediate value given in the instruction or it may be given in full in
one of the general-purpose registers.
Call_Register R9
• which calls a subroutine whose address is in register R9. The contents of that register are read and placed in RA
in step 2. During step 3, multiplexer MuxPC selects its 0 input, thus transferring the data in register RA to be
loaded into the PC.


Control Signals or Control Unit


• Control signals guide the flow of data and instructions between the CPU, memory, and peripheral devices.
Here's an overview of key control signals and their functions:
1. Clock Signal
• Synchronizes the timing of all operations within the computer.
• The clock signal determines the timing for data transfers, instruction execution, and other operations,
ensuring that everything happens in a coordinated manner.
2. Reset Signal
• Initializes or resets the computer system to a known state.
• When activated, it clears the CPU registers, resets memory addresses, and ensures that the system
starts from a known baseline, usually upon power-up.
3. Read/Write Signals
• Indicate whether data is being read from or written to a memory location or I/O device.
• Read Signal: Activates when the CPU wants to read data from memory or an I/O device.
• Write Signal: Activates when the CPU wants to write data to memory or an I/O device.
4. Memory Address Signals
• Specify the address of the memory location or I/O device involved in the operation.
• Address lines carry the memory or I/O address to identify where data should be read from or written to.
5. Data Signals
• Transfer actual data between the CPU, memory, and I/O devices.
• Data lines carry the binary information being read from or written to memory or I/O devices.


6. Interrupt Signals
• Notify the CPU that an event requiring immediate attention has occurred.
• When an interrupt signal is received, the CPU temporarily halts its current operations to address the interrupting event.
7. Control Bus Signals
• Manage various control operations across the system.
• Include signals such as Memory Read (MEMR), Memory Write (MEMW), and Input/Output Read (IOR), among others.
8. Status Signals
• Provide information about the status of different components.
• Signals such as Interrupt Request (IRQ) or Flag Status might indicate the status of a device or the result of an operation.
9. Bus Control Signals
• Manage access to the system bus.
• Signals like Bus Request (BRQ) and Bus Grant (BG) control access to the shared system bus, coordinating which
component can use the bus at a given time.
10. DMA (Direct Memory Access) Signals
• Manage data transfers directly between memory and I/O devices without CPU intervention.
• Signals such as DMA Request (DMARQ) and DMA Acknowledge (DMAACK) coordinate the data transfer process in DMA
operations.

• Control signals are essential for the proper operation of a computer system. They ensure that different parts of the system
communicate and coordinate effectively, allowing for efficient execution of instructions and management of data.

Control Signals or Control Unit

• A Central Processing Unit is the most important component of a computer system. A control unit is a part of
the CPU.
• A control unit controls the operations of all parts of the computer but it does not carry out any data processing
operations.
Functions of the Control Unit
• It coordinates the sequence of data movements into, out of, and between a
processor's many sub-units.
• It interprets instructions.
• It controls data flow inside the processor.
• It receives external instructions or commands, which it converts into a
sequence of control signals.
• It controls the many execution units (i.e., the ALU, data buffers, and
registers) contained within a CPU.
• It also handles multiple tasks, such as fetching, decoding, execution
handling, and storing results.
Types of Control Unit
There are two types of control units:
• Hardwired control unit
• Micro-programmed control unit


Hardwired Control Unit

• By generating control signals, the hardwired control unit executes the
instructions at the correct time and in the proper sequence.
• Compared to the micro-programmed control unit, the hardwired CU is
generally faster.
• The control signals are generated with the help of a PLA circuit and a
step counter. The central processing unit requires all these control
signals.
• The hardwired control signals are generated directly in hardware; this is
essentially a circuitry approach.

• The structure of a hardwired control unit is shown in the figure.
• The instruction register is a processor register that holds the
instruction currently being executed.
• The instruction register supplies the op-code bits corresponding to the
operation as well as the addressing mode of the operands.

• The generated op-code bits are fed to the instruction decoder, which
interprets the operation and the instruction's addressing mode. On the
basis of the addressing mode and the operation present in the instruction
register, the instruction decoder sets the corresponding instruction signal
INSi to 1.
• Five steps are used to execute each instruction: instruction fetch,
decode, operand fetch, ALU operation, and memory store.
• The control unit must know which step of the instruction is currently
being executed. For this, a Step Counter is used, which holds the signals
T1, ..., T5. On the basis of the step that the instruction has reached, the
corresponding step-counter signal from T1 to T5 is set to 1.
• One clock cycle is needed for each step. For example, if the step counter
sets T3 to 1, then after one clock cycle completes, the step counter will
set T4 to 1.
• What will happen if the execution of an instruction is interrupted for
some reason? Will the step counter still be triggered by the clock? The
answer is no: until the execution of the current step is completed, the
Counter Enable signal will disable the Step Counter so that it does not
increment to the next step signal.
• What if the execution of an instruction depends on some condition? In
this case, the Condition Signals are used. Various conditions, such as less
than, greater than, less than or equal, and greater than or equal, generate
these signals.
• The external input is the last one. It is used to tell the Control Signal Generator about the interrupts, which will
affect the execution of an instruction.
• So, on the basis of the input obtained by the conditional signals, step counter, external inputs, and instruction
register, the control signals will be generated with the help of Control signal Generator.
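A minimal sketch of this idea: each control signal is asserted when a particular combination of the decoded instruction signal (INSi) and the step-counter signal (T1 to T5) is active, exactly as a PLA would compute it. The signal names and the two example instructions below are invented for illustration:

```python
# Sketch of hardwired control-signal generation: each signal is a
# combinational function of the decoded instruction and the current step.

def control_signals(ins, step):
    """Return the set of control signals asserted for instruction `ins`
    during step T<step> (1..5)."""
    signals = set()
    if step == 1:
        # fetch is identical for every instruction
        signals |= {"MEM_READ", "IR_LOAD", "PC_INC"}
    if ins == "ADD":
        if step == 2: signals.add("RF_READ")
        if step == 3: signals.add("ALU_ADD")
        if step == 5: signals.add("RF_WRITE")
    elif ins == "LOAD":
        if step == 2: signals.add("RF_READ")
        if step == 3: signals.add("ALU_ADD")     # effective address
        if step == 4: signals.add("MEM_READ")
        if step == 5: signals.add("RF_WRITE")
    return signals

print(sorted(control_signals("LOAD", 4)))   # -> ['MEM_READ']
```

In real hardware the `if` conditions become AND gates over the INSi and Tk lines, and extending the instruction set means adding gates, which is why hardwired control is hard to modify.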


Hardwired Control Unit vs. Micro-programmed Control Unit

• Hardwired: implemented with a hardware circuit (the circuitry approach).
  Micro-programmed: implemented through programming, using
  microinstructions.
• Hardwired: uses logic circuits to generate the control signals required
  by the processor.
  Micro-programmed: uses microinstructions, stored in a control memory, to
  generate the control signals.
• Hardwired: the control signals are hard-wired, so the unit is very
  difficult to modify.
  Micro-programmed: very easy to modify, because modifications are made
  only at the microinstruction level.
• Hardwired: more costly.
  Micro-programmed: less costly, because it only requires microinstructions
  to generate the control signals.
• Hardwired: cannot handle complex instructions, because the circuit
  designed for them would become too complex.
  Micro-programmed: able to handle complex instructions.
• Hardwired: because of the hardware implementation, it can support only a
  limited number of instructions.
  Micro-programmed: can generate control signals for many instructions.
• Hardwired: used in computers that follow the RISC (Reduced Instruction
  Set Computer) approach.
  Micro-programmed: used in computers that follow the CISC (Complex
  Instruction Set Computer) approach.
• Hardwired: generates only the required control signals directly in
  hardware, so it is faster than the micro-programmed control unit.
  Micro-programmed: uses microinstructions to generate the control signals,
  so it is slower than the hardwired control unit.

Pipelining
• To improve the performance of a CPU we have two options:
• Improve the hardware by introducing faster circuits.
• Arrange the hardware such that more than one operation can be performed
at the same time. Since there is a limit on the speed of hardware and the
cost of faster circuits is quite high, we have to adopt the second option.
• Pipelining is a process of arrangement of hardware elements of the CPU such that its overall performance is
increased. Simultaneous execution of more than one instruction takes place in a pipelined processor.
• Let us see a real-life example that works on the concept of pipelined operation. Consider a water bottle
packaging plant. Let there be 3 stages that a bottle should pass through, Inserting the bottle(I), Filling water in
the bottle(F), and Sealing the bottle(S).
• Let us consider these stages as stage 1, stage 2, and stage 3 respectively. Let each stage take 1 minute to
complete its operation. Now, in a non-pipelined operation, a bottle is first inserted in the plant, after 1 minute
it is moved to stage 2 where water is filled. Now, in stage 1 nothing is happening. Similarly, when the bottle
moves to stage 3, both stage 1 and stage 2 are idle. But in pipelined operation, when the bottle is in stage 2,
another bottle can be loaded at stage 1. Similarly, when the bottle is in stage 3, there can be one bottle each in
stage 1 and stage 2. So, after each minute, we get a new bottle at the end of stage 3.
• Hence, the average time taken to manufacture 1 bottle is:


Pipelining
• Hence, the average time taken to manufacture 1 bottle is:
Without pipelining = 9/3 minutes = 3m
I F S | | | | | |
| | | I F S | | |
| | | | | | I F S (9 minutes)

With pipelining = 5/3 minutes = 1.67m


I F S | |
| I F S |
| | I F S (5 minutes)

Thus, pipelined operation increases the efficiency of a system.
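The bottle-plant arithmetic generalizes to k stages, n items, and t minutes per stage; the sketch below checks the numbers above:

```python
# Timing of the bottle-plant example: without pipelining each item
# occupies the whole plant for k*t minutes; with pipelining a new item
# finishes every t minutes once the pipeline is full.

def non_pipelined_time(n, k, t=1):
    return n * k * t

def pipelined_time(n, k, t=1):
    return (k + n - 1) * t

n, k = 3, 3                          # 3 bottles, 3 stages, 1 min/stage
print(non_pipelined_time(n, k))      # -> 9 minutes (3 min per bottle)
print(pipelined_time(n, k))          # -> 5 minutes (5/3 = 1.67 per bottle)
```

As n grows, the pipelined time per item approaches t, giving a speedup that approaches k.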

Design of a basic pipeline


• In a pipelined processor, a pipeline has two ends, the input end and the output end. Between these ends, there are multiple
stages/segments such that the output of one stage is connected to the input of the next stage and each stage performs a
specific operation.
• Interface registers are used to hold the intermediate output between two stages. These interface registers are also called
latch or buffer.
• All the stages in the pipeline along with the interface registers are controlled by a common clock.
Pipelining
Execution in a pipelined processor

The execution sequence of instructions in a pipelined processor can be visualized using a space-time diagram. For example, consider a processor having 4 stages and let there be 2 instructions to be executed. We can visualize the execution sequence through the following space-time diagrams:
Total time = 5 cycles
Pipeline Organization

Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set. Following are the 5 stages of the RISC pipeline with their respective operations:
• Stage 1 (Instruction Fetch) In this stage, the CPU reads the instruction from the memory address contained in the program counter.
• Stage 2 (Instruction Decode) In this stage, instruction is decoded and the register file is accessed to get the
values from the registers used in the instruction.
• Stage 3 (Instruction Compute) In this stage, ALU operations are performed.
• Stage 4 (Memory Access) In this stage, memory operands are read from or written to the memory address specified by the instruction.
• Stage 5 (Write Back) In this stage, the computed or fetched value is written back to the register specified in the instruction.
• In the first stage of the pipeline, the program counter (PC) is used to fetch a new instruction. As other instructions are
fetched, execution proceeds through successive stages.
• At any given time, each stage of the pipeline is processing a different instruction. Information such as register addresses,
immediate data, and the operations to be performed must be carried through the pipeline as each instruction proceeds
from one stage to the next. This information is held in interstage buffers.
The interstage buffers are used as follows:
• Interstage buffer B1 feeds the Decode stage with a newly-fetched instruction.
• Interstage buffer B2 feeds the Compute stage with the two operands read from the register file, the source/destination
register identifiers, the immediate value derived from the instruction, the incremented PC value used as the return address
for a subroutine call, and the settings of control signals determined by the instruction decoder. The settings for control
signals move through the pipeline to determine the ALU operation, the memory operation, and a possible write into the
register file.
• Interstage buffer B3 holds the result of the ALU operation, which may be data to be written into the register file or an
address that feeds the Memory stage. In the case of a write access to memory, buffer B3 holds the data to be written.
These data were read from the register file in the Decode stage. The buffer also holds the incremented PC value passed
from the previous stage, in case it is needed as the return address for a subroutine-call instruction.
• Interstage buffer B4 feeds the Write stage with a value to be written into the register file. This value may be the ALU
result from the Compute stage, the result of the Memory access stage, or the incremented PC value that is used as the
return address for a subroutine-call instruction.
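The buffer contents described above can be modeled as simple records. This is an illustrative sketch only; the field names are assumptions, not taken from any particular ISA:

```python
# Sketch: the interstage buffers B1-B4 as simple records.
from dataclasses import dataclass

@dataclass
class B1:                  # Fetch -> Decode
    instruction: int       # the newly fetched instruction word
    pc_incremented: int

@dataclass
class B2:                  # Decode -> Compute
    operand_a: int         # two operands read from the register file
    operand_b: int
    dest_register: int
    immediate: int         # immediate value derived from the instruction
    pc_incremented: int    # possible return address for a subroutine call
    control_signals: dict  # settings produced by the instruction decoder

@dataclass
class B3:                  # Compute -> Memory
    alu_result: int        # data for the register file or a memory address
    store_data: int        # value to write on a memory store
    dest_register: int
    pc_incremented: int
    control_signals: dict

@dataclass
class B4:                  # Memory -> Write
    write_value: int       # ALU result, loaded data, or return address
    dest_register: int
    control_signals: dict
```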
• But there are times when it is not possible to have a new instruction enter the pipeline in every cycle. Consider the case of two instructions, Ij and Ij+1, where the destination register for instruction Ij is a source register for instruction Ij+1. The result of instruction Ij is not written into the register file until cycle 5, but it is needed earlier, in cycle 3, when the source operand is read for instruction Ij+1.
• Any condition that causes the pipeline to stall is called a hazard.
We have just described an example of a data hazard, where the
value of a source operand of an instruction is not available when
needed. Other hazards arise from memory delays, branch
instructions, and resource limitations. The next several sections
describe these hazards in more detail, along with techniques to
mitigate their impact on performance.
Example of pipelining processing
R1 to R5 are registers; MUL and ADD are combinational circuits.
Pipeline example: content of registers in each clock pulse
First clock pulse:
• Transfers A1 and B1 into R1 and R2
Second clock pulse:
• Transfers the product of R1 and R2 into R3 and C1 into R4
• The same clock pulse transfers A2 and B2 into R1 and R2
Third clock pulse:
• Operates on all segments simultaneously
• Places A3 and B3 into R1 and R2
• Transfers the product of R1 and R2 into R3 and C2 into R4
• Places the sum of R3 and R4 into R5
• From then on, each clock pulse produces a new output and moves the data one step down the pipeline.
• This continues as long as new input data flow into the system.
• When input data are not available, the clock must continue until the last output emerges from the pipeline.
Problem 1 Solution

Problem 2 Solution

Problem 3 Solution
Advantages/Disadvantages
Advantages:
■ More efficient use of the processor
■ Faster execution of a large number of instructions

Disadvantages:
▪ Pipelining requires adding hardware to the chip
▪ Inability to continuously run the pipeline at full speed because of pipeline hazards, which disrupt the smooth execution of the pipeline.

Speed Up
For a pipelined processor:
• A k-stage pipeline with a clock cycle time tp is used to execute n tasks.
• The time required for the first task T1 to complete = k*tp (the task passes through all k segments).
• The time required to complete the remaining (n-1) tasks = (n-1)*tp (one task emerges per cycle).
• Total time to complete n tasks using a k-segment pipeline = (k + (n-1))*tp.
For a non-pipelined processor:
• Time to complete each task = tn
• Total time required to complete n tasks = n*tn

• Speedup S = non-pipelined time / pipelined time = (n*tn) / ((k + (n-1))*tp)
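The speedup formula can be evaluated directly; this sketch uses invented example numbers, not values from the text:

```python
# Speedup of a k-stage pipeline over non-pipelined execution:
# S = (n * tn) / ((k + (n - 1)) * tp)
def speedup(n, k, tp, tn):
    pipelined_time = (k + (n - 1)) * tp
    non_pipelined_time = n * tn
    return non_pipelined_time / pipelined_time

# Assumed example: 100 tasks, 4 stages, tp = 20 ns, tn = 80 ns
print(round(speedup(n=100, k=4, tp=20, tn=80), 2))  # 3.88
```

As n grows large relative to k, the speedup approaches tn/tp (here 4, the number of stages when tn = k*tp).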
Mov R1, A
Mov R2, B
Add R3, R1, R2
Inc R3
Store C, R3

Mov R1, A        F D E M W
Mov R2, B          F D E M W
Add R3, R1, R2       F D E M W
Inc R3                 F D E M W
Store C, R3              F D E M W
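The staggered diagram above can be generated mechanically. The sketch below assumes an ideal 5-stage pipeline with no stalls, where instruction i enters stage s in cycle i + s:

```python
# Print a space-time diagram for a list of instructions in an ideal
# 5-stage pipeline (F D E M W), one new instruction entering per cycle.
STAGES = ["F", "D", "E", "M", "W"]

def space_time(instructions):
    rows = []
    for i, name in enumerate(instructions):
        # shift each instruction's stage row right by one cycle (two chars)
        row = "  " * i + " ".join(STAGES)
        rows.append(f"{name:<16}{row}")
    return "\n".join(rows)

print(space_time(["Mov R1, A", "Mov R2, B", "Add R3, R1, R2",
                  "Inc R3", "Store C, R3"]))
# 5 instructions finish in 5 + (5 - 1) = 9 cycles
```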
Pipelining Issues / Hazards
• Hazard refers to situations that can prevent the next instruction in the pipeline from
executing during its designated clock cycle.
• There are three primary types of hazards:

• Structural Hazards: Occur when hardware resources are insufficient to support all
concurrent operations in a pipeline. For example, if a single memory unit is used for both
instruction fetch and data access, a conflict can arise.
• Data Hazards: Arise when an instruction depends on the result of a previous instruction
that has not yet completed. This can be classified into:
– Read After Read (RAR)
– Read After Write (RAW)
– Write After Read (WAR)
– Write After Write (WAW)
• Control Hazards: Occur due to branch instructions that change the flow of execution,
making it uncertain which instruction should be fetched next. This can cause delays while
the pipeline waits to resolve the branch.
Structural Hazards
• Structural Hazard
– Hardware cannot support this combination of instructions - two instructions need the
same resource
• It may be too expensive to eliminate a structural hazard, in which case the pipeline should stall.
• Stalling: The pipeline might need to stall the execution of one instruction until the other
completes, reducing overall performance.
• When the pipeline stalls, no instructions are issued until the hazard has been resolved
• Methods to prevent/overcome Structural Hazards
– Duplicate resources
– Pipeline the resource
– Reorder the instructions
Data Hazards or Data Dependencies
• Instruction depends on result of prior instruction still in the pipeline and not completed
• These data hazards are categorized based on the order of read and write access in the
instruction.
– Read After Read (RAR) : when two or more instructions attempt to read the same data
from a register or memory location. In this situation, the second instruction reads the
data after the first instruction has also read it, but before any write operation occurs.
– Read After Write (RAW): when an instruction needs to read a value that is produced by a
previous instruction that has not yet completed its write operation. This situation can
lead to incorrect results if not properly managed.
– Write After Read (WAR): when an instruction writes to a location before a previous
instruction reads from it. Although it’s less common than other hazards, it can still lead
to incorrect program behavior if not properly managed.
– Write After Write (WAW): when two instructions attempt to write to the same register or
memory location in such a way that the order of writes can affect the final value. This can
lead to incorrect results if the writes are not properly ordered.
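The four categories above follow mechanically from which registers each instruction reads and writes. A minimal classifier sketch (register names are assumed strings like "R1"; i1 precedes i2 in program order):

```python
# Classify the dependence between two instructions from their
# written (destination) and read (source) register sets.
def classify_hazard(i1_writes, i1_reads, i2_writes, i2_reads):
    hazards = []
    if set(i1_writes) & set(i2_reads):
        hazards.append("RAW")   # i2 reads what i1 writes (true dependence)
    if set(i1_reads) & set(i2_writes):
        hazards.append("WAR")   # i2 writes what i1 reads (anti-dependence)
    if set(i1_writes) & set(i2_writes):
        hazards.append("WAW")   # both write the same register (output dependence)
    if set(i1_reads) & set(i2_reads) and not hazards:
        hazards.append("RAR")   # shared read only; harmless to correctness
    return hazards

# ADD R1, R2, R3 followed by SUB R4, R1, R5 -> RAW on R1
print(classify_hazard(["R1"], ["R2", "R3"], ["R4"], ["R1", "R5"]))
```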
RAR (Read After Read)
ADD R3, R1, R4 ; Read R1 and R4, store the sum in R3
SUB R5, R1, R6 ; Read R1 and R6, store the difference in R5
Hazard Explanation
• Dependency: Both instructions read the same register, R1; neither writes it, so there is no true dependence between them.
• Potential Conflict: If the register file has too few read ports to serve both instructions at once, the second read may have to wait, leading to inefficient use of the pipeline.
Consequences
While RAR does not lead to incorrect values, it can cause:
• Stalls: The pipeline may need to stall until the first instruction completes, affecting overall performance.
• Resource Contention: If the architecture doesn't handle concurrent reads effectively, it can slow down
execution.
Mitigation Strategies
1.Out-of-Order Execution: Allow other independent instructions to execute while waiting for the read to
complete, thus reducing stalls.
2.Instruction Scheduling: Reorder instructions during compilation to minimize RAR hazards and improve
throughput.
3.Using Multiple Read Ports: Some architectures support multiple read ports for registers, allowing
simultaneous reads without conflicts.
RAW (Read After Write)
Instruction 1 (Write): ADD R1, R2, R3 ; R1 = R2 + R3
Instruction 2 (Read): SUB R4, R1, R5 ; R4 = R1 - R5
Hazard Explanation
• Dependency: The second instruction (SUB) depends on the result of the first instruction (ADD). If the second instruction
executes before the first instruction completes its write to R1, it will read an incorrect or old value of R1.
• Timing Issue: In a pipelined architecture, the first instruction might be in the process of executing while the second
instruction is attempting to read R1, leading to a RAW hazard.
Consequences
If not addressed, a RAW hazard can lead to incorrect calculations and program behavior.
Mitigation Strategies
Data Forwarding:
The processor can forward the result of the ADD operation directly to the SUB instruction without writing it back to the
register file first, allowing the second instruction to use the correct value.
Stalling:
The pipeline can introduce a stall (delay) for the second instruction until the first instruction has completed its write
operation.
Out-of-Order Execution:
More advanced architectures may allow instructions to be executed out of order, enabling the CPU to continue executing
other instructions while waiting for the result of the dependent instruction.
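The trade-off between stalling and forwarding can be quantified with a rough model. The cycle numbers below follow the 5-stage pipeline described earlier (operands read in Decode, ALU result ready after Compute, register file written in Write Back); the exact counts depend on the design:

```python
# Stall cycles caused by a RAW dependence between a producer and a
# consumer that is `distance` instructions later (1 = back-to-back).
def raw_stall_cycles(distance, forwarding):
    if forwarding:
        ready = 3              # ALU result forwarded at end of producer's Compute
        needed = 3 + distance  # consumer's Compute stage if it never stalls
    else:
        ready = 5              # value valid only after producer's Write Back
        needed = 2 + distance  # consumer's Decode stage if it never stalls
    return max(0, ready - needed + 1)

print(raw_stall_cycles(1, forwarding=False))  # 3 stall cycles
print(raw_stall_cycles(1, forwarding=True))   # 0 stall cycles
```

With forwarding, adjacent ALU instructions incur no stalls at all in this model, which is why bypassing is the first technique listed above.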
WAR (Write After Read)
Consider the following sequence of instructions:
Instruction 1 (Read): ADD R1, R2, R3 ; R1 = R2 + R3
Instruction 2 (Write): SUB R2, R4, R5 ; R2 = R4 - R5
Hazard Explanation
• Dependency: The first instruction reads the value of R2, and the second instruction writes to R2. If the second instruction
executes before the first instruction completes its read operation, the first instruction may read an outdated or incorrect
value of R2.
• Execution Order: In a pipelined architecture, if the write operation of Instruction 2 occurs before Instruction 1 has a
chance to read R2, it can lead to incorrect calculations.
Consequences
If a WAR hazard occurs, the first instruction may not retrieve the expected value, potentially leading to erroneous results.
Mitigation Strategies
In-Order Execution:
Ensure that instructions are executed in their original order to prevent conflicts between reads and writes.
Pipeline Interlocks:
Implement hardware mechanisms that detect potential WAR hazards and stall the pipeline until it is safe to proceed.
Register Renaming:
Use register renaming to allocate different physical registers for different uses, ensuring that the write does not interfere
with the read operation.
WAW (Write After Write)
Consider the following sequence of instructions:
Instruction 1 (Write): ADD R1, R2, R3 ; R1 = R2 + R3
Instruction 2 (Write): SUB R1, R4, R5 ; R1 = R4 - R5
Hazard Explanation
• Dependency: Both instructions are writing to the same register, R1. If Instruction 2 executes before Instruction 1 has
completed its write operation, the value written by Instruction 1 could be overwritten by the value from Instruction 2.
• Execution Order: In a pipelined architecture, these instructions might be processed in such a way that the write from
Instruction 2 occurs before the write from Instruction 1 completes, leading to an incorrect final value in R1.
Consequences
If not properly managed, a WAW hazard can lead to incorrect values being stored in registers, causing unexpected behavior
in the program.
Mitigation Strategies
In-Order Execution:
Ensure that instructions are executed in the order they appear in the program, preventing the possibility of a WAW hazard.
Register Renaming:
Use register renaming to allocate different physical registers for the results of different instructions. This eliminates the
conflict by ensuring that each write goes to a unique register.
Pipeline Interlocks:
Implement hardware interlocks that detect potential WAW hazards and stall the pipeline until it is safe to proceed.

Minimizing Data hazards
• Minimizing data hazards in pipelined architectures is crucial for maintaining high performance and ensuring
correct program execution. Here are several strategies to address different types of data hazards:
1. Data Forwarding (Bypassing)
• Description: Allows the result of an instruction to be used by subsequent instructions without waiting for it to
be written back to the register file.
• Example: If an instruction produces a value, it can be forwarded directly to a dependent instruction instead of
going through the register.
2. Stalling (Pipeline Interlocks)
• Description: Introduce delays (or stalls) in the pipeline until the data dependencies are resolved.
• Use: Useful for preventing data hazards when forwarding isn’t feasible or when waiting for data is unavoidable.
3. Out-of-Order Execution
• Description: Allows instructions to be executed as resources become available rather than strictly in the order
they appear.
• Benefit: Helps avoid stalls by keeping the pipeline busy with independent instructions while waiting for
dependent instructions to complete.
4. Instruction Scheduling
• Description: Reorder instructions at compile time to minimize hazards.
• Example: Placing independent instructions between dependent ones can help keep the pipeline full.
5. Register Renaming
• Description: Allocate different physical registers for instructions that write to the same logical
register, preventing WAW and WAR hazards.
• Benefit: Ensures that writes do not interfere with reads.
6. Branch Prediction
• Description: Predict the outcome of branch instructions to keep the pipeline filled with the correct
instructions.
• Use: Reduces control hazards that can stall the pipeline, especially in the presence of data
dependencies.
7. Use of Delayed Loads and Stores
• Description: Introduce NOPs (no-operation instructions) or other independent instructions between
loads/stores and their dependent operations.
• Benefit: Provides time for memory operations to complete before dependent instructions execute.
8. Hardware Interlocks
• Description: Implement hardware mechanisms to detect hazards and automatically introduce stalls
when necessary.
• Use: Helps prevent the execution of instructions that would result in data hazards.
Control Hazards
• Control hazards occur in pipelined processors when the flow of instruction execution is
altered, typically due to branch or jump instructions. These hazards can lead to incorrect
instruction fetching and execution, resulting in performance issues. Here’s a detailed
overview:
Causes of Control Hazards
Branch Instructions:
• Conditional branches (e.g., if statements) may change the next instruction to be executed
based on the outcome of a condition.
Jump Instructions:
• Unconditional jumps redirect the flow of execution to a specified address, which may not
be the next sequential instruction.
Consequences of Control Hazards
• Stalling: The pipeline may have to wait until the branch condition is evaluated before it can
fetch the next instruction.
• Incorrect Fetching: If the wrong instructions are fetched (before the branch decision is
resolved), it can lead to executing incorrect code.
Example
Consider the following sequence of instructions:
• 1. BEQ R1, R2, Label ; Branch to Label if R1 equals R2
• 2. ADD R3, R4, R5 ; This instruction might be executed if the branch is not taken
• 3. Label: SUB R6, R7, R8
Control Hazard Management
• If the pipeline executes the ADD instruction while waiting for the branch condition from
BEQ, it may end up executing the ADD instruction when it should have jumped to Label.
• Using branch prediction, the processor could predict the outcome of BEQ, allowing it to
fetch either the ADD instruction or jump directly to Label without stalling.
Techniques to minimize the Control Hazards
• Minimizing control hazards is crucial for maintaining high performance in pipelined processors. Here are
several strategies to address and mitigate control hazards:
1. Branch Prediction
• Description: Use algorithms to predict whether a branch will be taken or not taken.
Types:
• Static Prediction: Always predict taken or not taken based on fixed rules.
• Dynamic Prediction: Use hardware structures like branch history tables to make predictions based on
previous behavior.
• Benefit: Reduces stalls by allowing the pipeline to continue fetching instructions based on the prediction.
2. Delayed Branch
• Description: Rearrange the instruction sequence so that instructions that are not dependent on the branch
follow the branch instruction.
• Implementation: The compiler schedules independent instructions in the delay slots of branch instructions.
• Benefit: Keeps the pipeline busy while waiting for the branch resolution.
3. Branch Target Buffer (BTB)
• Description: A cache that stores the target addresses of previously executed branch instructions.
• Function: When a branch instruction is encountered, the BTB allows the CPU to quickly access the target
address if the branch is taken.
• Benefit: Reduces the time to fetch the next instruction after a branch.
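One common dynamic-prediction structure mentioned above is a 2-bit saturating counter per branch: states 0-1 predict not-taken, states 2-3 predict taken, so a single deviation (such as a loop exit) does not flip the prediction. A sketch (the initial state is an arbitrary choice):

```python
# 2-bit saturating-counter branch predictor.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start at "weakly taken" (an assumption)

    def predict(self):
        return self.state >= 2  # True = predict taken

    def update(self, taken):
        # saturate at 0 (strongly not-taken) and 3 (strongly taken)
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
correct = 0
for taken in [True, True, False, True, True]:  # loop-like branch behavior
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct)  # 4: only the single not-taken outcome is mispredicted
```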
4. Stalling the Pipeline
• Description: Introduce bubbles (NOPs) in the pipeline until the branch instruction's outcome is known.
• Implementation: Simple but can lead to performance degradation if used excessively.
• Benefit: Ensures correct instruction execution at the cost of pipeline efficiency.
5. Speculative Execution
• Description: Execute both paths of a branch instruction, guessing the likely outcome, and discard the results
of the incorrect path once the actual outcome is known.
• Implementation: Requires additional hardware to manage the execution paths and rollback if necessary.
• Benefit: Can significantly improve performance by reducing the impact of control hazards.
6. Out-of-Order Execution
• Description: Allows instructions to be executed as resources are available, rather than strictly in program
order.
• Benefit: Keeps the pipeline filled with independent instructions while waiting for branch outcomes, reducing
stalls.
7. Hardware Interlocks
• Description: Implement mechanisms that automatically stall the pipeline when a control hazard is detected.
• Function: Ensures that instructions dependent on the result of a branch are not executed until the branch is
resolved.
• Benefit: Simplifies the handling of control hazards but can introduce delays.
Memory Delays
• Memory delays in pipelining refer to the latency introduced when accessing memory during instruction execution.
These delays can significantly impact the performance of pipelined processors, especially since memory operations
often take longer than other operations. Here’s an overview of memory delays and their effects in a pipelined
architecture:
Sources of Memory Delays
Access Time:
• The time it takes to read from or write to memory can vary based on memory hierarchy (e.g., cache, main memory).
Accessing lower-level memory (like RAM) generally takes longer than accessing cache.
Cache Misses:
• If the required data is not found in the cache, the processor must retrieve it from slower main memory, introducing
significant delays.
Instruction Fetching:
• Each instruction must be fetched from memory, which can contribute to pipeline stalls if it takes longer than expected.
Load/Store Instructions:
• Memory access for load and store operations can create delays, particularly when the data is not in the cache.
Impact of Memory Delays on Pipelining
• Stalls: When an instruction that depends on a memory operation is in the pipeline, subsequent instructions may need
to wait, causing pipeline stalls.
• Reduced Throughput: Frequent stalls can lead to lower instruction throughput, negating the performance benefits of
pipelining.
• Complexity: Managing memory delays adds complexity to pipeline design and instruction scheduling.
Strategies to Mitigate Memory Delays
Caching:
• Implement multi-level caches (L1, L2, L3) to reduce the average access time for frequently used data.
Prefetching:
• Anticipate future memory accesses and load data into the cache before it is requested, reducing wait times.
Memory Interleaving:
• Distribute memory accesses across multiple banks to allow simultaneous access, increasing throughput.
Out-of-Order Execution:
• Allow independent instructions to execute while waiting for memory operations to complete, keeping the pipeline
busy.
Load/Store Buffers:
• Use buffers to hold the results of load/store operations until the memory access completes, allowing the processor to
continue executing other instructions.
Reducing Memory Access Frequency:
• Optimize code to minimize the number of memory accesses, for instance, by using registers more effectively or
reducing the number of load/store operations.
Branch Delays
• Branch delays in pipelining refer to the performance penalties that occur when the flow of instruction execution is
altered by branch instructions (like jumps or conditional branches). These delays can disrupt the smooth flow of
instructions through the pipeline, leading to inefficiencies.
Causes of Branch Delays
Uncertainty in Control Flow:
• When a branch instruction is encountered, the processor may not immediately know which instruction to fetch next,
leading to potential stalls while the outcome of the branch is determined.
Pipeline Flushing:
• If the prediction of a branch outcome is incorrect, the instructions that were fetched speculatively need to be
discarded (flushed), wasting cycles and resources.
Instruction Fetching:
• The time taken to fetch the next instruction after a branch can lead to stalls, especially if the branch is taken.
Impact of Branch Delays on Performance
• Stalls: The pipeline may need to wait until the branch condition is resolved, introducing bubbles (NOPs) that can slow
down overall execution.
• Reduced Throughput: Frequent branches can lower instruction throughput, as the pipeline can spend significant time
resolving branches.
• Increased Complexity: Managing branch delays requires additional logic and mechanisms in the processor, increasing
design complexity.
Resource Limitations
• Resource limitations in pipelining refer to constraints on hardware resources that can affect the efficiency and performance
of a pipelined processor. These limitations can lead to various hazards and inefficiencies during instruction execution.
Types of Resource Limitations
Functional Units:
• Description: The number of arithmetic logic units (ALUs), floating-point units (FPUs), or other functional units can limit the
number of instructions that can be executed simultaneously.
• Impact: If multiple instructions require the same functional unit, it can lead to pipeline stalls while waiting for resources to
become available.
Registers:
• Description: Limited physical registers can constrain the ability to hold intermediate values.
• Impact: Register file conflicts can lead to data hazards, particularly in Write After Write (WAW) and Read After Write (RAW)
scenarios.
Memory Bandwidth:
• Description: The speed and capacity of memory accesses can limit how quickly data can be fetched or stored.
• Impact: Memory bottlenecks can cause delays in instruction execution, especially with frequent load/store operations.
Cache Size and Hierarchy:
• Description: The size and levels of caches (L1, L2, L3) can limit the amount of frequently accessed data that can be stored
close to the CPU.
• Impact: Cache misses can introduce significant latency, affecting overall performance and leading to pipeline stalls.
Pipeline Stages:
• Description: The number of stages in a pipeline can affect how instructions are processed.
• Impact: A longer pipeline may lead to more complex hazard management, while a shorter pipeline may not fully exploit
parallelism.
Consequences of Resource Limitations
• Pipeline Stalls: Conflicts for functional units or registers can lead to delays, reducing overall throughput.
• Underutilization: If certain resources are idle while others are overused, it can lead to inefficient execution and wasted
potential.
• Increased Complexity: Managing limited resources requires additional control logic and mechanisms, complicating the
design and operation of the processor.
Strategies to Mitigate Resource Limitations
Resource Duplication:
• Increase the number of functional units, registers, or cache levels to reduce contention and increase parallelism.
Instruction Scheduling:
• Reorder instructions during compilation to minimize conflicts and optimize the use of available resources.
Out-of-Order Execution:
• Allow instructions to be executed based on resource availability rather than strict program order, helping to keep the
pipeline busy.
Register Renaming:
• Use additional physical registers to eliminate WAW and WAR hazards, allowing multiple instructions to execute without
conflicts.
Efficient Cache Design:
• Optimize cache sizes and hierarchies to balance speed and capacity, reducing the likelihood of cache misses.
Memory Management Techniques:
• Implement techniques such as prefetching and memory interleaving to improve memory bandwidth and reduce access
times.
Performance Evaluation
Throughput:
• The number of instructions completed per unit of time (e.g., instructions per second).
• Importance: Higher throughput indicates better utilization of the pipeline.
Latency:
• The time taken to complete a single instruction from start to finish.
• Importance: Lower latency is desirable for quick instruction execution, but pipelining may increase the
latency of individual instructions due to the presence of pipeline stages.
Speedup:
• The ratio of the time taken to execute a program on a non-pipelined processor to the time taken on a
pipelined processor.
• Importance: Indicates how much faster a program runs on a pipelined processor compared to a non-
pipelined one.
Utilization:
• The fraction of time the pipeline is busy executing instructions.
• Importance: High utilization suggests that the pipeline is being effectively used, while low utilization
indicates idle time and inefficiencies.
Cycle Time:
• The time taken to complete one clock cycle in the pipeline.
• Importance: Cycle time can affect overall performance; shorter cycle times generally lead to better
performance.
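These metrics can be computed from a handful of measured quantities. The sketch below uses invented example numbers (one million instructions, 1.25 million cycles including stalls, a 2 ns cycle time):

```python
# Evaluate throughput and CPI (average cycles per instruction) for a run.
def evaluate(instr_count, total_cycles, cycle_time_ns):
    total_time_s = total_cycles * cycle_time_ns * 1e-9
    throughput = instr_count / total_time_s  # instructions per second
    cpi = total_cycles / instr_count         # > 1 when hazards cause stalls
    return throughput, cpi

throughput, cpi = evaluate(instr_count=1_000_000,
                           total_cycles=1_250_000,  # includes stall cycles
                           cycle_time_ns=2)
print(f"{throughput:.1e} instructions/s, CPI = {cpi}")
```

A CPI above 1.0 measures how far the pipeline falls short of the ideal of one instruction completed per cycle.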
Superscalar Operations
• Superscalar operations in pipelining refer to the ability of a processor to execute multiple instructions simultaneously in a
single clock cycle. This is achieved by having multiple execution units and pipelines within a single processor core, allowing
for greater throughput and improved performance.
Key Concepts of Superscalar Architecture
Multiple Functional Units:
• Superscalar processors have multiple ALUs, FPUs, and load/store units, allowing them to execute more than one instruction
at a time.
Instruction-Level Parallelism (ILP):
• Superscalar architectures exploit ILP by issuing multiple instructions per clock cycle. The degree of parallelism depends on
the dependencies between instructions.
Dynamic Scheduling:
• Instructions are dynamically scheduled at runtime, which means the processor can decide the order of instruction
execution based on resource availability and data dependencies, rather than following the original program order.
Instruction Dispatching:
• Instructions are fetched, decoded, and dispatched to various execution units in parallel. The dispatch unit determines which
instructions can be executed simultaneously based on their dependencies.
Benefits of Superscalar Operations
• Increased Throughput: By executing multiple instructions in parallel, superscalar architectures can significantly increase the
number of instructions processed per cycle.
• Better Resource Utilization: Multiple functional units can be utilized effectively, reducing idle times and enhancing overall
performance.
• Higher Performance for Diverse Workloads: Superscalar designs are particularly effective for applications with high
instruction-level parallelism, such as scientific computing and graphics processing.