Assignment 1: Name Class Date Period Sbuid Netid Email
Assignment 1: Name Class Date Period Sbuid Netid Email
Assignment 1: Name Class Date Period Sbuid Netid Email
2. Time complexity = O(logn) + O(n/logn) + time complexity of read and write operation between the
iteration in step2.
Time taken by write after first iteration = number of elements x time taken by single write operation,
Which is n/logn elements x O(1) = O(n/logn) for write after first iteration. Time complexity of read
after first iteration is also O(n/logn). Similarly, time complexity of read and write after second
iteration is o(n/logn*2) and so on and so forth. Total read/write time complexity = O(n/logn) +
O(n/logn*2) + O(n/logn*4) O(1) = O(n/logn).Use this in equation 2.
Algorithm:
Divide n elements into n/logn groups of logn elements:
For each Pi in 1<= i <= n/logn: Do in parallel
find max in group Gi in logn element in time O(logn)
write to max of each group in location M[Gi].
while i not equal to 1 for Gi://only one element left(max)
For each Pi in 1<= i <= Gi/2: Do in parallel.
Read from M[i*2-1] and M[i*2]
Compare and find max between i*2-1 and i*2 element.
Write to max of two to memory M[i].
Gi= Gi/2
2.Given n processors and assume that initially each shared memory location M[i](1<=i<=n)
holds an input value ai. Design an O(logn) algorithm on a CREW-PRAM model such that at
the end of the algorithm M[i] = ik=1 ak. Give necessary explanation and analysis.
Step 4. To find remaining elements by adding the previous sum and original element at that place
because previous elements in final array is already sum of all previous element.
Example f1 = a1, f3 = f2 + a3, f5 = f4 +a5, so on and so forth.
*Step 2 can be solved recursively, by again adding adjacent number in pairs and jump to step 1 until
only one element remaining in array.
Algorithm:
1.For each Pi 1<=i<=n/2: do in parallel:
add two adjacent numbers in pairs. : T(1)
2.find sum M[i] 1<=i<=n/2 : T(n/2)
3.for each Pi in 1<=i<=n/2: do in parallel: : T(1)
add f(n-1) + a(n) and place in M[n] : T(1)
Time complexity T(n) = T(1) + T(n/2) + T(1) + T(1). Step 2 can be recursively solved by passing n/2 array
again to step1 until only 1 element left in array. So, T(n) = T(n/2K) + C recursively, which is O(logn).
Page 2 of 4
3.Design an algorithm on a CRCW-PRAM model for fast multiplication of two n x n matrix
for the following case:
(a). The number of processors P(n) = n and the time complexity of the algorithm T(n) =
O(n2).
Traverse n x n location and find each element by multiplying A(0,k) and B(k,o) in parallel by Pk
processor and, then add all to get M(i,j). Repeat this for all n x n element. Only advantage this
algorithm has over the 1 processor sequential algorithm is that it can multiply all n x n number to
find i,j element in final matrix in O(1) time because of parallelism.
(b). The number of processors P(n) = n2 and the time complexity of the algorithm T(n) =
O(n2).
Algorithm for matrix A(n, n) x B(n, n) multiplication using n2 processors:
For each Pi,j in 1<=i,j<=n : do in parallel.
M(i,j) = nk=0 A(i,k)*B(k,j) -- T(k)
Since we have n2 processors, we can use each processer to find M(i, j) number in final matrix. Each
processor can calculate sigma of A(i,k) * B(k,j) for k 0 to n to find number at position i,j.
4.Prove that the best parallel algorithm written for an n processors EREW-PRAM model can
be no more than O(logn) times slower than any algorithm for a CRCW model of PRAM
having same number of processors.
Difference between the EREW and CRCW model is that, in CRCW all processors can read and
write to memory location mi concurrently. But, in EREW processors cannot read memory location
in same cycle. So, we need to broadcast the data for other processors. Every processor can
broadcast data using following algorithm:
Page 3 of 4
Start broadcast of mi:
Foreach Pi 1<=i<=n:
For j in 1<=j<=log2 n:
M[i+ 2j] = M[i]
j = j+1
then Pi reads from M[i] location
So the idea is before Pi reads from memory, it will be broadcasted to other locations exponentially.
Time complexity is O(logn) because of For j in 1<=j<=log2 n.
Case: In matrix multiplication in Answer 3 (a). Time complexity is O(n2) CRCW model. If we use
EREW model for same problem then only extra step is read and write, which can be done with
above broadcasting algorithm. So time complexity for EREW model will be O(logn) more than
CRCW model for matrix multiplication.
Page 4 of 4