National University of Computer and Emerging Sciences, Lahore Campus
3. There are N processors. Computations of a matrix are divided among N processors. Once they are
done, each one of them sends their computed result to a single processor (part of N processors)
and it is summed there. How many All-to-One reductions are required?
a. 1
b. N - 1
c. N/2
d. N
4. For task dependency graphs that are trees, the maximum degree of concurrency is:
a. The ratio of the total amount of work to the critical path length
b. Equal to the sum of the weights of nodes/tasks on the critical path
c. Equal to the sum of the weights of leaves/number of leaves in a tree
d. Equal to the weight of the leaf with the largest weight (among leaves) in a tree
6. Speedups can become _________ when we use exploratory decomposition, because of the
_______ position of the goal state within the search space
a. anomalous; deterministic
b. anomalous; uncertain
c. deterministic; uncertain
d. deterministic; random
7. With the naïve solution for one-to-all broadcast on a ring, we would expect to send ____ messages
to the other _____ processes, and this may lead to an __________ of the communication network
a. p; p-1; overutilization
b. p-1; p-1; underutilization
c. p-1; p-1; overutilization
d. (p-1)²; p-1; underutilization
8. On a ring, if we move from the naïve solution to recursive doubling for one-to-all broadcast, we
would decrease the number of cycles consumed by approximately (assuming p processors):
a. (p-1) – log(p)
b. (p-1)² – log(p)
c. (p-1) – (2*log(p))
d. ((p-1)/2) – (2*log(p))
9. Using OpenMP, if we don’t use the OpenMP for construct, loop work-sharing can be done by:
a. Modifying the start and end values of the for loop, using a combination of
omp_get_thread_num() and omp_get_num_threads()
b. Modifying the start and end values of the for loop, using a combination of
omp_set_thread_num() and omp_set_num_threads()
c. Modifying the start and end values of the for loop, using a combination of
omp_get_thread_ID() and omp_get_num_threads()
d. Modifying the start and end values of the for loop, with the random() function and
omp_get_num_threads()
10. In OpenMP, a private variable has _________ address in the _______ context of every thread:
a. the same; execution
b. the same; memory
c. a different; execution
d. a different; variable
11. Assume a sequential program S has an execution time of 650 seconds. Now assume a parallel
variant of S takes 85.55 seconds to complete when we have 8 processors available. The Karp-Flatt
metric is approximately equal to:
a. 1.15
b. 0.129
c. 0.99
d. 0.007
12. Considering a 2-D mesh (without wraparound) with M rows and N columns, we would expect arc
connectivity of ______, bisection width of ________, and link cost _____________.
a. 3; minimum(M, N); 2MN – (M+N)
b. 4; minimum(M, N); 2MN + (MN)
c. 2; maximum(M, N); 2MN – (M+N)
d. 2; minimum(M, N); 2MN – (M+N)
13. If we use the schedule(static, 8) clause within the #pragma omp parallel for, we are enabling:
a. Each thread is assigned 1/8th of the total iterations of the for loop in round-robin manner
b. Each thread is assigned 8 contiguous iterations of the for loop in round-robin manner
c. Each idle thread is dynamically assigned 1/8th of the remaining iterations of the for loop
d. Each idle thread is dynamically assigned the 8 leftmost contiguous remaining iterations of
the for loop
14. We would use the lastprivate() clause in OpenMP when we want the master thread:
a. To copy the private copy of the variable from the thread that executed the last iteration
b. To copy the private copy of the variable from the thread that executed the first iteration
c. To copy the private copy of the variable from the thread that was created last
d. To copy the private copy of the variable from the thread that has the largest thread ID
15. Assume a sequential program S has an execution time of 400 seconds. Now assume a parallel
variant of S takes 55.55 seconds to complete when we have 8 processors available. The speedup
is approximately equal to:
a. 50
b. 7.2
c. 0.9
d. 57.6
int k;
omp_set_num_threads(8);
int c = 10;
printf("total threads are:%d \n", omp_get_num_threads());
#pragma omp parallel
{
    #pragma omp master
    printf("total threads are:%d \n", omp_get_num_threads());
}
Output:
int brr[1000];
int i;
brr[0]=5;
for (i = 1; i < 1000; i++)
    brr[i] = brr[i-1] + 1;
Parallel Code:
MPI_Init(&argc,&argv);
MPI_Status status;
int p;
int i;
MPI_Comm_size(MPI_COMM_WORLD, &p);
int my_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
int a = my_rank;
int b;
int sendTag=1;
int recvTag=1;
MPI_Sendrecv(&a, 1, MPI_INT, next, sendTag,
             &b, 1, MPI_INT, prev, recvTag,  /* receive half reconstructed;
                next/prev are assumed to be the ring neighbour ranks */
             MPI_COMM_WORLD, &status);
MPI_Finalize();
Output:
Bisection Width
Arc connectivity
b) Draw a 2-D mesh without wraparound having 16 nodes and calculate the values of the
above parameters, first mentioning or deriving the formulas for these parameters for a
2-D mesh without wraparound.