Parallel Algorithm Merged

Download as pdf or txt
Download as pdf or txt
You are on page 1of 76

Your Roll No....

CEN-704
VII Semester Examination 2018
B.Tech.(Computer Engineering),
Parallel and Distributed Computing

Paper No: CEN-704

Max. Marks:-60
Time:-03 Hours

top immediately on receipt question paper.


of
Write your Roll No. on

All questions carry equal


Attempt all questions by attempting any
two parts from each question.
marks.

CO1
Q14) a new architecture:
following speed ups are proposed for
Three enhancements with the
one enhancement is usable at a time.
Speedupl-30, speedup2-20, and peedup3 =15. Only 2 1
Assume the enhancements can be used 25%,35% and 10% of the time for enhancements
no enhancement in use?
and 3 respectively. For what fraction of the reduced execution time is
enhancements should be used? 06
If only two enhancements are to be used then which two

Q1(6) benchmark takes


i) An application program is executed on a nine computer cluster. A program
time T on this cluster. Further 25% of T is time in which the application is run simultaneously on
all nine computers. The remaining time, the application has to run on a single computer.
Calculate the effective speedup under the aforementioned condition as compared to executing the
in
program on a single computer. Also calculate the percentage of code that has been parallelized
the preceding program. 03

Gi) Let a be the percentagee of program code that can be executed simultaneously by n
computers in a cluster, each computer using a difierent set of parameters or initial conditions.
Assume that the remaining code must be executed simultaneously by a single processor. Each
processor has an execution rate of x MIPS. Determine an expression for the effective MIPS rate
when using the system for exclusive execution of this program in terms of a, n and x. 03

Q1
().Consider a computer which can execute a program in two operational modes; regular mode
versus enhanced mode with a probability distribution of {A, 1-A} respectively. If A varies
between a and b and 0O<=a<=1, derive an expression for the average speedup factor using
harmonic mean concept. 06
CO2
06
of suitable examples and diagrams.
Q2a) Describe Branch prediction techniques with help

Q2(6)) Consider the following


N= no. of instructions to be executed
M= No. of segments in pipeline
branch instruction
P=probability that a given instruction is a unconditional
branch instruction
Q probability that a given instruction is a conditional
R= probability that a given conditional branch instruction will cause branching

Calculate the followings


Speedup
i. Throughput
ii. efficiency
iv. Average no. of Instructions executed per Instruction cycle. 04

Q2(b)
(i)Derive an expression for optimal number of stages in a pipeline. 02
Q2(c) Three functional pipelines fl, f2 and f3 are characterized by the following reservation
tables. 06

fl: f2: f3:


123 4 2 13 |4 123 4
S1X Ti X| U1 X | X |
$2 X T2 X U2
| S3 XXX] T3 X| U3 X

Using these three pipelines a composite pipeline is formed as below.

Mux F1 F2
:
A F3
BMux
output

Each task going through this composite pipeline in the following order: f1 first, 2 and f3 next, f1
again and then the output is obtained. The dual multiplexer selects a pair of inputs (A,B) or
(X,Y) and feeds them into the input of fl.
(i) Draw the state transition diagram for collision free scheduling
ii)List all simple cycle and greedy cycle
(ii)Find MAL
reservation table.
CO3
Q3 (a) Design and verify SIMD
complexity as O(N ). algorithm for NXN Matrix multiplication with computational
Q3 6) What is PRAM model? Design and verify PRAM 06
Algorithm for EREW NXN matrix
multiplication with computational complexity of O(N). 06
Q3(¢) Parallelize Quick Sort Algorithm. Compute its speced up with respect to its
algorithm. sequential
06

CO4
Q4(a ). What is MPI program? Write an MPI program for computing the values of N! 06

Q46) What do you mean by Open Mp program ? Write an open MP program to compute the
value of PI 06

4(0) Explain the following terms () GPU (Gi) cUDA. Explain how a CUDA program is written
with help of suitable example. Revise 06

COS

Q5. Describe in details any two of the following with the help of suitable examples and
diagrams.
(a)Mobile Agent
(6) Object request Broker

(c) Matrix Logical clock 2x6 12


Parallel Algorithm
RAM (Random Access Machine)
 Unbounded number of local memory cells
 Each memory cell can hold an integer of unbounded size
 Instruction set included –simple operations, data operations,
comparator, branches
 All operations take unit time
 Time complexity = number of instructions executed
 Space complexity = number of memory cells used
PRAM (Parallel Random Access Machine)
 Definition:
◦ Is an abstract machine for designing the algorithms applicable to parallel
computers
◦ M’ is a system <M, X, Y, A> of infinitely many
 RAM’s M1, M2, …, each Mi is called a processor of M’. All the processors are assumed to be
identical. Each has ability to recognize its own index i
 Input cells X(1), X(2),…,
 Output cells Y(1), Y(2),…,
 Shared memory cells A(1), A(2),…,
PRAM (Parallel RAM)
 Unbounded collection of RAM processors P0, P1, …,
 Each processor has unbounded registers
 Unbounded collection of share memory cells
 All processors can access all memory cells in unit time
 All communication via shared memory
PRAM (Parallel RAM)
 Some subset of the processors can remain idle

P0 P1 P2 PN

Shared Memory Cells

 Two or more processors may read


simultaneously from the same cell
 A write conflict occurs when two or
more processors try to write
simultaneously into the same cell
Share Memory Access Conflicts
 PRAM are classified based on their Read/Write
abilities (realistic and useful)
◦ Exclusive Read(ER) : all processors can simultaneously
read from distinct memory locations
◦ Exclusive Write(EW) : all processors can
simultaneously write to distinct memory locations
◦ Concurrent Read(CR) : all processors can
simultaneously read from any memory location
◦ Concurrent Write(CW) : all processors can write to
any memory location
◦ EREW, CREW, CRCW
Concurrent Write (CW)
• What value gets written finally?
• Priority CW: processors have priority based on which value is decided, the
highest priority is allowed to complete WRITE
• Common CW: all processors are allowed to complete WRITE iff all the values
to be written are equal.
• Arbitrary/Random CW: one randomly chosen processor is allowed to
complete WRITE
• Combimimg CW: a functiom may map multiple values into a single value.
Function may be max, min, sum, multiply etc
Strengths of PRAM
 PRAM is attractive and important model for designers of parallel
algorithms Why?
◦ It is natural: the number of operations executed per one cycle on p processors
is at most p
◦ It is strong: any processor can read/write any shared memory cell in unit time
◦ It is simple: it abstracts from any communication or synchronization overhead,
which makes the complexity and correctness of PRAM algorithm easier
◦ It can be used as a benchmark: If a problem has no feasible/efficient solution
on PRAM, it has no feasible/efficient solution for any parallel machine
An initial example
• How do you add N numbers residing in memory location A[0, 1, …, N]

• Serial Algorithm = O(N)

• PRAM Algorithm using N processors P0, P1, P2, …, PN ?


PRAM Algorithm (Parallel Addition)

P0 + P1 + P2 + P3 + Step 1

P0 + P2 + Step 2

P0 + Step 3
• Program in P(i)
• L=n
• Repeat
• L=L/2
• If(i<L) then begin
• read A[2i] from SM
• Read A[2i+1] from SM
• Compute sum=A[2i]+A[2i+1]
• store in A[i]
• Until(L=1)
PRAM Algorithm (Parallel Addition)
• Log (n) steps = time needed
• n / 2 processors needed
• Speed-up = n / log(n)
• Efficiency = 1 / log(n)
• Applicable for other operations
• +, *, <, >, etc.
Another algorithm to find sum
• Program in P(i)
• L=n
• Repeat
• L=L/2
• If(i<L) then begin
• read A[i] from SM
• Read A[i+L] from SM
• Compute sum of (A[i]and A[i+L])
• store in A[i]
• Until(L=1)
Program to find maximum of N nos
• Program in P(i)
• L=n
• Repeat
• L=L/2
• If(i<L) then begin
• read A[i] from SM
• Read A[i+L] from SM
• Compute Max(A[i],A[i+L])
• store in A[i]
• Until(L=1)
Program to find minimum of N nos
• Program in P(i)
• L=n
• Repeat
• L=L/2
• If(i<L) then begin
• read A[i] from SM
• Read A[i+L] from SM
• Compute Min(A[i],A[i+L])
• store in A[i]
• Until(L=1)
Program to find Product of N nos
• Program in P(i)
• L=n
• Repeat
• L=L/2
• If(i<L) then begin
• read A[i] from SM
• Read A[i+L] from SM
• Compute Product(A[i],A[i+L])
• store in A[i]
• Until(L=1)
Program to find sum of N nos using M
processors
• Program in P(I)
• sum=0

• For(J=I; J<N; J=J+M)


• Sum=sum+A[J]
• A[I]= sum
• L=M
• Repeat
• L=L/2
• If(i<L) then begin
• read A[i] from SM
• Read A[i+L] from SM
• Compute sum(A[i],A[i+L])
• store in A[i]
• Until(L=1)
Program to find sum of N nos using M
processors-another way of writing
• Program in P(I)
• PAR for (I= 0;I< M,I++ )
• {
• sum=0
• For(J=I; J<N; J=J+M)
• Sum=sum+A[J]
• A[I]= sum
• }
• L=M
• Repeat
• L=L/2
• If(i<L) then begin
• read A[i] from SM
• Read A[i+L] from SM
• Compute sum(A[i],A[i+L])
• store in A[i]
• Until(L=1)
Program to find sum of N nos using M
processors-another way of writing
• Program in P(I)
• PAR for (I= 0;I< M,I++ )
• {
• sum=0
• K= N/M
• For(J=K*I; J<(I+1)*K; J=J+1)
• Sum=sum+A[J]
• A[I]= sum
• }
• L=M
• Repeat
• L=L/2
• If(i<L) then begin
• read A[i] from SM
• Read A[i+L] from SM
• Compute sum(A[i],A[i+L])
• store in A[i]
• Until(L=1)
Matrix multiplication using n Processors on
CRCW PRAM
• For ( i=0;i<n; i++)
• {
• for (j=0;j<n; j++)
• { C[i][j]=0
• PAR for(k=0; k<n;k++)
• {
• Read A[i][k]
• Read B[k][j]
• Compute C[i][j]=A[i][k]*B[k][j]
• store in C[i][j]
• }
• }
• }
Matrix multiplication using n Processors on
CREW PRAM
• For ( i=0;i<n; i++)
• {
• for (j=0;j<n; j++)
• { PAR for(k=0; k<n;k++)
• {C[i][j]=0
• Read A[i][k]
• Read B[k][j]
• Compute C[i][j]=C[i][j]+A[i][k]*B[k][j]
• store in C[i][j]
• }
• }
• }
Matrix multiplication using nxn Processors on
CRCW PRAM
• PAR For ( i=0;i<n; i++)
• {
• PAR for (j=0;j<n; j++)
• { C[i][j]=0
• for(k=0; k<n;k++)
• {
• Read A[i][k]
• Read B[k][j]
• Compute C[i][j]=A[i][k]*B[k][j]
• store in C[i][j]
• }
• }
• }
Matrix multiplication using nxn Processors on
CRCW PRAM
• For ( i=0;i<n; i++)
• {
• PAR for (j=0;j<n; j++)
• { PAR for(k=0; k<n;k++)
• {
• Read A[i][k]
• Read B[k][j]
• Compute C[i][j]=A[i][k]*B[k][j]
• store in C[i][j]
• }
• }
• }
Matrix multiplication using nxn Processors on
CREW PRAM
• PAR For ( i=0;i<n; i++)
• {
• PAR for (j=0;j<n; j++)
• { for(k=0; k<n;k++)
• {
• Read A[i][k]
• Read B[k][j]
• Compute C[i][j]= C[i][j]+A[i][k]*B[k][j]
• store in C[i][j]
• }
• }
• }
Matrix multiplication using nxnxn Processors
on CRCW PRAM
• PAR For ( i=0;i<n; i++)
• {
• PAR for (j=0;j<n; j++)
• { PAR for(k=0; k<n;k++)
• {
• Read A[i][k]
• Read B[k][j]
• Compute C[i][j]=A[i][k]*B[k][j]
• store in C[i][j]
• }
• }
• }
Matrix multiplication using nxnxn Processors
on CREW PRAM
• PAR For ( i=0;i<n; i++)
• {
• PAR for (j=0;j<n; j++)
• { PAR for(k=0; k<n;k++)
• {C[i][j]=0
• Read A[i][k]
• Read B[k][j]
• Compute C[i][j]=C[i][j]+A[i][k]*B[k][j]
• store in C[i][j]
• }
• }
• }
Matrix multiplication using nxn Processors on
EREW PRAM
• PAR For ( i=0;i<n; i++)
• {
• PAR for (j=0;j<n; j++)
• { C[i][j]=0
• for(k=0; k<n;k++)
• {
• Ik=(i+j+k)mod n + 1
• Read A[i][Ik]
• Read B[Ik][j]
• Compute C[i][j]=C[i][j]+A[IK][k]*B[Ik][j]
• store in C[i][j]
• }
• }
• }
Matrix Multiplication on Mesh with NxN processors
Parallel Algorithm
Complexity
• Processor P(I,j) receives its input after i-1+j-1 steps from the
beginning of computation
• After getting the input P(I,j) takes n steps to Compute C(I,j)
• So C(I,j) is computed in i-1+j-1 +n steps
• So complexity of algorithm is O(i-1+j-1 +n )=O(n-1+n-1+n)=O(n)
Matrix Multiplication on SIMD machines
Memory allocation for matrix multiplication
Successive content of C array in the memory
Initial distribution of rows of A
4-way broadcaste of rows of A
Initial Distribution of rows of B transpose
4-way broadcast of rows of B transpose
CREW Matrix multiplication
EREW Matrix multiplication
CRCW Matrix multiplication

You might also like