UNIT-V
 Pipelining: Basic Concepts
Pipelining is a technique for breaking down a sequential process into various sub-
operations and executing each sub-operation in its own dedicated segment that runs in
parallel with all other segments.

The most significant feature of the pipeline technique is that it allows several
computations to run concurrently in different segments at the same time.
By associating a register with every segment in the pipeline, the process of
computation can be overlapped. The registers provide separation and isolation
between segments, allowing each segment to work on different data at the same time.
An input register for each segment, followed by a combinational circuit, can be
used to illustrate the structure of a pipeline organisation. To better understand the
pipeline organisation, consider an example of a combined multiplication and addition
operation.
A stream of numbers is used to perform the combined multiplication and addition
operation, such as:
Ai * Bi + Ci    for i = 1, 2, 3, …, 7
The operation to be performed on the numbers is broken down into sub-operations, each of
which is implemented in a segment of the pipeline. We can define the sub-operations
performed in every segment of the pipeline as:
Segment 1: Input Ai and Bi
    R1 ← Ai, R2 ← Bi
Segment 2: Multiply, and input Ci
    R3 ← R1 * R2, R4 ← Ci
Segment 3: Add Ci to the product
    R5 ← R3 + R4
The sub-operations performed in each segment of the pipeline are depicted in the
block diagram below:
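The overlap above can be sketched in Python. This is a minimal simulation, not hardware: the register names R1–R5 follow the text, the input values are made up, and the segments are evaluated back-to-front on each clock so that every segment uses the values latched in the previous cycle.

```python
A = [2, 3, 4, 5, 6, 7, 8]
B = [1, 2, 3, 4, 5, 6, 7]
C = [5, 5, 5, 5, 5, 5, 5]

R1 = R2 = R3 = R4 = R5 = None   # segment registers
results = []
n = len(A)

for clock in range(n + 2):          # n items + 2 extra cycles to drain the pipe
    # Segment 3: R5 <- R3 + R4 (uses values latched in the previous cycle)
    if R3 is not None:
        R5 = R3 + R4
        results.append(R5)
    # Segment 2: R3 <- R1 * R2, R4 <- Ci
    if R1 is not None:
        R3, R4 = R1 * R2, C[clock - 1]
    else:
        R3 = R4 = None
    # Segment 1: R1 <- Ai, R2 <- Bi
    if clock < n:
        R1, R2 = A[clock], B[clock]
    else:
        R1 = R2 = None

print(results)  # equals [A[i]*B[i] + C[i] for each i]
```

Note that after the first two cycles fill the pipe, one result emerges per clock: that is the speedup the register-separated segments buy.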

VSREDDY

Registers R1, R2, R3, and R4 hold the data, and the combinational circuits operate on it in a
particular segment.

The output of a given segment’s combinational circuit is used as an input register for the
next segment. The register R3 is used here as one of the input registers for a
combinational adder circuit, as shown in the block diagram.
The pipeline organization is applicable to two areas of computer design:
1. Arithmetic Pipeline
2. Instruction Pipeline

 Arithmetic Pipeline
Arithmetic pipelines are mostly used in high-speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.
To understand the concepts of the arithmetic pipeline in a more convenient way, let us
consider an example of a pipeline unit for floating-point addition and subtraction.
The inputs to the floating-point adder pipeline are two normalized floating-point
numbers defined as:
X = A × 10^a = 0.9504 × 10^3
Y = B × 10^b = 0.8200 × 10^2
Where A and B are two fractions that represent the mantissas and a and b are the
exponents.
The combined operation of floating-point addition and subtraction is divided into four
segments. Each segment contains the corresponding suboperation to be performed in the
given pipeline. The suboperations that are shown in the four segments are:

1. Compare the exponents by subtraction.


2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
5. We will discuss each suboperation in a more detailed manner later in this section.

The following block diagram represents the suboperations performed in each segment of
the pipeline.


1. Compare exponents by subtraction:


The exponents are compared by subtracting them to determine their difference. The
larger exponent is chosen as the exponent of the result.
The difference of the exponents, i.e., 3 − 2 = 1, determines how many times the mantissa
associated with the smaller exponent must be shifted to the right.
2. Align the mantissas:
The mantissa associated with the smaller exponent is shifted according to the difference
of exponents determined in segment one.
X = 0.9504 × 10^3
Y = 0.08200 × 10^3
3. Add the mantissas:
The two mantissas are added in segment three.
Z = X + Y = 1.0324 × 10^3
4. Normalize the result:
After normalization, the result is written as:
Z = 0.10324 × 10^4
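The four suboperations can be sketched in Python on decimal (mantissa, exponent) pairs. This is a simplified model for illustration: the function name fp_add is made up, and real hardware would operate on binary fractions with shifters rather than division by powers of ten.

```python
def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Add two normalized decimal floating-point numbers, one pipeline
    suboperation per step."""
    # Segment 1: compare the exponents by subtraction; keep the larger one.
    diff = a_exp - b_exp
    exp = max(a_exp, b_exp)
    # Segment 2: align the mantissa associated with the smaller exponent.
    if diff > 0:
        b_mant /= 10 ** diff
    elif diff < 0:
        a_mant /= 10 ** (-diff)
    # Segment 3: add the mantissas.
    mant = a_mant + b_mant
    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(mant) >= 1:
        mant, exp = mant / 10, exp + 1
    while 0 < abs(mant) < 0.1:
        mant, exp = mant * 10, exp - 1
    return mant, exp

mant, exp = fp_add(0.9504, 3, 0.8200, 2)
print(round(mant, 5), exp)   # 0.10324 4
```

Running the worked example from the text reproduces Z = 0.10324 × 10^4.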

 Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction stream
as well. Most digital computers with complex instructions require an instruction
pipeline to carry out operations such as fetching, decoding, and executing instructions.
In general, the computer needs to process each instruction with the following sequence
of steps.
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
Each step is executed in a particular segment, and there are times when different
segments may take different times to operate on the incoming information. Moreover,
there are times when two or more segments may require memory access at the same time,
causing one segment to wait until another is finished with the memory.
The organization of an instruction pipeline will be more efficient if the instruction cycle
is divided into segments of equal duration. One of the most common examples of this type
of organization is a four-segment instruction pipeline.

A four-segment instruction pipeline combines two or more of these steps into a single
segment. For instance, the decoding of the instruction can be combined with the
calculation of the effective address into one segment.

The following block diagram shows a typical example of a four-segment instruction
pipeline. The instruction cycle is completed in four segments.

Segment 1:
The instruction fetch segment can be implemented using a first-in, first-out (FIFO) buffer.
Segment 2:
The instruction fetched from memory is decoded in the second segment, and eventually,
the effective address is calculated in a separate arithmetic circuit.
Segment 3:
An operand from memory is fetched in the third segment.
Segment 4:
The instructions are finally executed in the last segment of the pipeline organization.
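The overlap of the four segments can be visualized with a small space-time table. The stage abbreviations FI, DA, FO, and EX (fetch instruction, decode/address, fetch operand, execute) and the helper name timetable are our own shorthand, not from the text; one clock cycle per segment and no stalls are assumed.

```python
def timetable(n_instructions, stages=("FI", "DA", "FO", "EX")):
    """Return one row per instruction giving the stage it occupies
    in each clock cycle ('--' means idle)."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * n_cycles
        for s, stage in enumerate(stages):
            row[i + s] = stage       # instruction i enters stage s at cycle i + s
        rows.append(row)
    return rows

for i, row in enumerate(timetable(6), start=1):
    print(f"I{i}: " + " ".join(row))
```

Each printed row is shifted one cycle to the right of the previous one: six instructions finish in 9 cycles instead of the 24 a non-pipelined machine would need.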

 Instruction hazards
Instruction hazards are a type of hazard that can occur in pipelined computer
architectures. These hazards arise when instructions are not executed in the correct order
or when the pipeline stages are not utilized efficiently. In this section, we will explore the
basics of instruction hazards, how they arise, and techniques used to mitigate them.
What are Instruction Hazards?
Instruction hazards are a type of hazard that occurs when instructions in a
pipelined architecture are not executed in the correct order. In a pipelined architecture,
instructions are broken down into several stages, and each stage of the pipeline is
responsible for a particular operation. These stages operate on different instructions in a
pipelined manner, such that one instruction is being executed while another instruction
is being fetched and decoded.
However, when instructions depend on the results of previous instructions,
instruction hazards can occur. For example, if instruction B depends on a result
produced by instruction A, the two must execute in the correct order. If instruction B
is executed before instruction A completes, the result of instruction A may not be
available to instruction B.
Types of Instruction Hazards
There are three types of instruction hazards that can occur in pipelined computer
architectures. These are:
1. Structural Hazards: Structural hazards occur when multiple instructions
require the same hardware resource. For example, if two instructions require the same
arithmetic logic unit (ALU), then only one instruction can be executed at a time. This can
lead to delays and decreased efficiency in the pipeline.
2. Control Hazards: Control hazards occur when the pipeline is unable to predict
the outcome of a branch instruction. In this case, the pipeline may continue to execute
instructions that depend on the outcome of the branch instruction, even though the
branch instruction has not yet been resolved. This can lead to incorrect results and wasted
processing time.
3. Data Hazards: Data hazards, discussed in detail in the next section, occur when
instructions depend on the results of previous instructions. Data hazards can lead to
incorrect results and must be mitigated.


 Data hazards
Data hazards are a type of hazard that occurs in pipelined computer architectures.
These hazards arise when instructions that depend on the data produced by previous
instructions are not executed in the correct order. In this section, we will explore the
basics of data hazards, how they arise, and techniques used to mitigate them.
What are Data Hazards?
Data hazards are a type of hazard that occurs when instructions in a pipelined
architecture require data produced by previous instructions. In a pipelined architecture,
instructions are broken down into several stages, and each stage of the pipeline is
responsible for a particular operation. These stages operate on different instructions in
a pipelined manner, such that one instruction is being executed while another
instruction is being fetched and decoded.
However, when instructions depend on the results of previous instructions, data hazards
can occur. For example, if instruction A is responsible for writing data to a register, and
instruction B is responsible for reading data from the same register, then instruction B
must wait for instruction A to complete before it can execute. If instruction B is executed
before instruction A completes, then the data read by instruction B may be incorrect.
Types of Data Hazards
There are three types of data hazards that can occur in pipelined computer
architectures. These are:
1. Read-After-Write (RAW) Hazards: RAW hazards occur when an instruction reads
data from a register that has not yet been written by a previous instruction. In this case,
the previous instruction must be completed before the instruction that reads the data
can execute.
2. Write-After-Read (WAR) Hazards: WAR hazards occur when an instruction writes
data to a register before a previous instruction has read the old value from it. In this case,
the previous instruction's read must complete before the instruction that writes the data
can execute.
3. Write-After-Write (WAW) Hazards: WAW hazards occur when two instructions
attempt to write data to the same register. In this case, the second instruction must wait
for the first instruction to complete before it can execute.
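The three cases can be detected by comparing the register sets each instruction reads and writes. A minimal sketch follows; the function name classify_hazards and the (reads, writes) tuple encoding are made up for illustration.

```python
def classify_hazards(earlier, later):
    """Classify the hazards between two instructions, each given as a
    (reads, writes) pair of register-name sets."""
    e_reads, e_writes = earlier
    l_reads, l_writes = later
    hazards = []
    if e_writes & l_reads:
        hazards.append("RAW")   # later reads a register the earlier one writes
    if e_reads & l_writes:
        hazards.append("WAR")   # later overwrites a register the earlier one reads
    if e_writes & l_writes:
        hazards.append("WAW")   # both write the same register
    return hazards

# I1: R1 <- R2 + R3,  I2: R4 <- R1 + R5  (I1 writes R1, which I2 reads)
print(classify_hazards(({"R2", "R3"}, {"R1"}),
                       ({"R1", "R5"}, {"R4"})))   # ['RAW']
```

A scoreboard or forwarding unit in real hardware performs essentially these set intersections on register numbers every cycle.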


Influence on Instruction Sets

Some instructions are much better suited to pipelined execution than others. For
example, instruction side effects can lead to undesirable data dependencies. Machine
instructions are influenced by addressing modes and condition code flags.

• Addressing modes: Addressing modes should provide the means for accessing a
variety of data structures simply and efficiently. Useful addressing modes include
index, indirect, autoincrement, and autodecrement. Many processors provide
various combinations of these modes to increase the flexibility of their instruction
sets. Complex addressing modes, such as those involving double indexing, are
often encountered.

Two important considerations in this regard are the side effects of addressing
modes such as autoincrement and autodecrement and the extent to which
complex addressing modes cause the pipeline to stall. Another important factor is
whether a given mode is likely to be used by compilers.

• Condition codes

In many processors, the condition code flags are stored in the processor status
register. They are either set or cleared by many instructions, so that they can be
tested by subsequent conditional branch instructions to change the flow of
program execution. An optimizing compiler for a pipelined processor attempts to
reorder instructions to avoid stalling the pipeline when branches or data
dependencies between successive instructions occur. In doing so, the compiler
must ensure that reordering does not cause a change in the outcome of a
computation. The dependency introduced by the condition-code flags reduces the
flexibility available for the compiler to reorder instructions.

 Large Computer Systems


The computer has been viewed as a sequential machine. Most computer
programming languages require the programmer to specify algorithms as sequences of
instructions. CPUs execute programs by executing machine instructions in a sequence and
one at a time. Each instruction is executed in a sequence of operation (fetch instruction,
fetch operands, perform operation, store results).
This view of the computer has never been entirely true. At the microoperation
level, multiple control signals are generated at the same time. Instruction pipelining, at
least to the extent of overlapping fetch and execute operations, has been around for a long
time. Both of these are examples of performing functions in parallel.
When a computer application requires a very large amount of computation that
must be completed in a reasonable amount of time, it becomes necessary to use machines
with correspondingly large computing capacity. Such machines are called
supercomputers. Typical applications that require supercomputers include weather


forecasting; simulation of large, complex physical systems; and computer-aided design
(CAD) using high-resolution graphics.
As a general quantitative measure, a supercomputer should have the capability to
execute at least 100 million instructions per second.

In the development of powerful computers, two basic approaches can be followed.


1. The first possibility is to build the system around a high-performance single
processor and to use the fastest circuit technology available. Architectural features
such as multiple functional units for arithmetic operations, pipelining, large
caches, and separate buses for instructions and data can be used to achieve very
high throughput. As the design is based on a single processor, the system is
relatively easy to use because conventional software techniques can be applied.

2. The second approach is to configure a system that contains a large number of
conventional processors. The individual processors do not have to be complex,
high-performance units. They can be standard microprocessors. The system
derives its high performance from the fact that many computations can proceed in
parallel.

 What is parallel processing?


Parallel processing, or multiprocessing, refers to a computing method that helps
process large tasks by separating them into multiple parts and completing them
simultaneously with two or more central processing units (CPUs). This type of processing
helps improve performance and reduces the time needed to complete a task. Any system
with multiple CPUs, such as a multi-core processor, can be used for multiprocessing.
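A minimal Python sketch of the idea, splitting a sum over four worker processes with the standard multiprocessing module (the chunk count and worker count are arbitrary choices for illustration):

```python
from multiprocessing import Pool

def part_sum(chunk):
    """One sub-task: sum a slice of the data."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]   # separate the task into 4 parts
    with Pool(processes=4) as pool:           # workers complete them simultaneously
        partials = pool.map(part_sum, chunks)
    print(sum(partials) == sum(data))         # True: the parts recombine correctly
```

The pattern — partition, compute in parallel, recombine — is the essence of multiprocessing regardless of the task.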
Types of parallel processing
You may divide parallel processors into four groups based on data streams and
instructions. These groups include:

1. Single Instruction Single Data (SISD): In a SISD architecture, there is a single
processor that executes a single instruction stream and operates on a single data
stream. This is the simplest type of computer architecture and is used in most
traditional computers.

2. Single Instruction Multiple Data (SIMD): In a SIMD architecture, there is a single
processor that executes the same instruction on multiple data streams in parallel.
This type of architecture is used in applications such as image and signal
processing.

3. Multiple Instruction Single Data (MISD): In a MISD architecture, multiple
processors execute different instructions on the same data stream. This type of
architecture is not commonly used in practice, as it is difficult to find applications
that can be decomposed into independent instruction streams.

4. Multiple Instruction Multiple Data (MIMD): In a MIMD architecture, multiple
processors execute different instructions on different data streams. This type of
architecture is used in distributed computing, parallel processing, and other high-
performance computing applications.
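The SISD/SIMD distinction can be illustrated as a programming model in plain Python. This only mimics the model: real SIMD hardware applies one instruction across many vector lanes in a single step.

```python
data = [1, 2, 3, 4, 5, 6, 7, 8]

# SISD style: one instruction operates on one datum at a time.
sisd_result = []
for x in data:
    sisd_result.append(x * 2)

# SIMD style: conceptually, a single "multiply by 2" instruction is
# applied to every element of the data stream at once.
simd_result = [x * 2 for x in data]

print(sisd_result == simd_result)   # True: same computation, different model
```

Both produce the same values; what differs is whether the multiply is issued once per element or once for the whole stream.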


 Array Processors
A processor that performs computations on a vast array of data is known as an array
processor. Array processors are also known as vector processors. An array processor
executes only one instruction at a time on an array of data. Array processors work with
massive data sets to perform computations, and hence they are used to enhance the
computer's performance.
Classification of Array Processors

Attached Array Processor

An attached array processor is an auxiliary processor, shown below, that is
connected to a computer to enhance the machine's performance in numerical computation
tasks. It is connected to the general-purpose computer through an I/O interface and a
local memory interface, where both the main memory and the local memory are connected.
The processor achieves high performance through parallel processing by multiple
functional units.


SIMD Array Processor

A SIMD ('Single Instruction and Multiple Data Stream') processor is a computer
with several processing units that operate in parallel. The processing units perform
the same operation in synchronism under the supervision of a common control unit
(CCU). The SIMD processor includes a set of identical PEs (processing elements), where
each PE has a local memory.

This processor includes a master control unit and a main memory. The master
control unit controls the operation of the processing elements; it also decodes each
instruction and determines how it is executed. If the instruction is a program-control
or scalar instruction, it is executed directly in the master control unit. Main memory
is mainly used to store the program, while every processing unit uses operands stored
in its local memory.

Advantages
The advantages of an array processor include the following.

• Array processors improve the overall instruction processing speed.
• These processors run asynchronously from the host CPU, so the overall capacity of
the system is improved.
• These processors include their own local memory that provides extra memory to
the system. This is an important consideration for systems with a limited
address space or physical memory.
• These processors simply perform computations on a huge array of data.
• These are extremely powerful tools that help in handling problems with a high
degree of parallelism.
• This processor includes a number of ALUs that permit all the array elements to
be processed simultaneously.
• Generally, the I/O devices of a processor-array system are very efficient in
supplying the required data to memory directly.
• The main advantage of using this processor with a range of sensors is a smaller
footprint.
Applications
The applications of array processors include the following.
• This processor is used in medical and astronomy applications.
• These are very helpful in speech improvement.
• These are used in sonar and radar systems.
• These are applicable in anti-jamming, seismic exploration & wireless
communication.
• This processor is connected to a general-purpose computer to improve the
computer's performance in arithmetic computation tasks, attaining high
performance through parallel processing by several functional units.

 The Structure of General-Purpose Multiprocessors

A multiprocessor is a computer that contains two or more central processing units
(CPUs) that share full access to a common RAM (random-access memory). The primary
goal of using a multiprocessor is to increase the system's execution speed, along with
fault tolerance.
We can divide multiprocessors into:
o Shared-memory multiprocessors: all CPUs share a common memory.
o Distributed-memory multiprocessors: each CPU has its own private memory.

Multiprocessor
• A multiprocessor system is an interconnection of two or more CPU’s with
memory and input-output equipment.


• Multiprocessor systems are classified as multiple instruction stream, multiple
data stream (MIMD) systems.
• There is a distinction between multiprocessors and multicomputers, though
both support concurrent operations.
• In multicomputers, several autonomous computers are connected through a
network and they may or may not communicate, but in a multiprocessor
system there is a single OS control that provides interaction between
processors and all the components of the system, so they cooperate in the
solution of the problem.
• VLSI circuit technology has reduced the cost of computers to such a low
level that the concept of applying multiple processors to meet system
performance requirements has become an attractive design possibility.
Processor Organization

Characteristics of Multiprocessors:
o Symmetric shared memory: It introduces multiprocessor architecture
and shared memory machine design options. It explains the cache
coherence problem and the solutions that are accessible. It also contains
information on snooping protocols.
o Memory consistency models: It presents the memory consistency idea
and the sequential consistency paradigm. More lenient consistency models,
such as the release/acquire model, are also presented. It also provides
information on the memory model used by Intel processors.
o Synchronization: It discusses the synchronization problem in shared
memory computers. It also discusses the several hardware primitives that
we can utilize to accomplish synchronization. It explains what a lock and a
barrier are and how to create them.
o Distributed shared memory: It goes over the concept of distributed
shared memory in greater depth and introduces directory-based protocols.
 Interconnection Structures
o The components that form a multiprocessor system are CPUs, IOPs connected to
input-output devices, and a memory unit.
o The interconnection between the components can have different physical
configurations, depending on the number of transfer paths that are available
 Between the processors and memory in a shared memory system
 Among the processing elements in a loosely coupled system
There are several physical forms available for establishing an interconnection network.
• Time-shared common bus
• Multiport memory
• Crossbar switch
Time Shared Common Bus
o A common-bus multiprocessor system consists of a number of processors
connected through a common path to a memory unit.
Disadvantages:
• Only one processor can communicate with the memory or another
processor at any given time.
• As a consequence, the total overall transfer rate within the system
is limited by the speed of the single path
o A more economical implementation of a dual bus structure is depicted in Fig.
below.
o Part of the local memory may be designed as a cache memory attached to the
CPU.


System bus structure for multiprocessors


Multiport Memory
o A multiport memory system employs separate buses between each memory
module and each CPU.
o The module must have internal control logic to determine which port will have
access to memory at any given time.
o Memory access conflicts are resolved by assigning fixed priorities to each
memory port.
o Advantages:
The high transfer rate can be achieved because of the multiple paths.
o Disadvantages:
It requires expensive memory control logic and a large number of cables
and connections.


Crossbar Switch
o Consists of a number of crosspoints that are placed at intersections between
processor buses and memory module paths.
o The small square in each crosspoint is a switch that determines the path from
a processor to a memory module.
o Advantages:
 Supports simultaneous transfers from all memory modules.
o Disadvantages:
 The hardware required to implement the switch can become quite
large and complex.
The figure below shows the functional design of a crossbar switch connected to one
memory module.


Crossbar switch

Block diagram of crossbar switch

Important Questions
Small questions
1. Explain Arithmetic Pipeline.
2. Explain stages for Instruction Pipeline.
3. Explain Data hazards.
4. Explain Instruction Hazard.
5. Explain Large Computer Systems.
6. Explain Attached Array Processor
7. Explain SIMD Array Processor

Large Questions
1. Explain Basic Pipeline Concept in Computer Organization.
2. Explain Hazards in pipeline.
3. What is Parallel processing? Explain forms of parallel processing.
4. Explain Array Processors in computer organization.
5. Define the structure of general-purpose multiprocessors.
6. Explain interconnection structures.

Assignment questions
Small questions
1. Explain Arithmetic Pipeline.
2. Explain stages for Instruction Pipeline.
3. Explain Data hazards.


4. Explain Instruction Hazard.


5. Explain Large Computer Systems.
Large Questions
1. Explain Basic Pipeline Concept in Computer Organization.
2. Explain Hazards in pipeline.
3. What is Parallel processing? Explain forms of parallel processing.
4. Explain Array Processors in computer organization.
5. Define the structure of general-purpose multiprocessors.
