Unit 4 Shared-Memory Parallel Programming With Openmp
Example: a minimal "Hello world" program with OpenMP:

#include <stdio.h>
#include <omp.h>

int main (int argc, char *argv[])
{
  #pragma omp parallel
  {
    printf("Hello world!\n");
  }
  return 0;
}
Each thread can query the total number of threads and its own thread ID;
the master construct restricts a block to the master thread only:

#include <stdio.h>
#include <omp.h>

int main (int argc, char *argv[])
{
  #pragma omp parallel
  {
    #pragma omp master
    {
      printf("I'm the master thread, I'm alone.\n");
    }
    int num_threads, thread_id;
    num_threads = omp_get_num_threads();
    thread_id = omp_get_thread_num();
    printf("Hello world! I'm thread No.%d out of %d threads.\n",
           thread_id, num_threads);
  }
  return 0;
}
“Manual” loop parallelization
Now, let’s try to “manually” parallelize a for-loop, that is, divide the
iterations evenly among the threads.
Example:

#pragma omp parallel
{
  int num_threads, thread_id, blen, bstart, bend;
  num_threads = omp_get_num_threads();
  thread_id = omp_get_thread_num();
  blen = N/num_threads;
  if (thread_id < (N%num_threads)) {
    blen = blen + 1;
    bstart = blen * thread_id;
  }
  else {
    bstart = blen * thread_id + (N%num_threads);
  }
  bend = bstart + blen;
  // this thread is responsible for iterations bstart, ..., bend-1
}
Each thread can either declare new local variables inside the
parallel region; these variables are private “by birth”.
Or, each thread can “privatize” some of the shared variables that
already existed before the parallel region, using the private
clause:
int blen, bstart, bend;
#pragma omp parallel private(blen, bstart, bend)
{
// ...
}
or simply declare blen, bstart and bend inside the parallel region.
Serial implementation:

int N, i;
double w = 1.0/N, x, approx_pi;
double sum = 0.;
/* ... loop over i accumulating sum ... */
approx_pi = w*sum;
A naive OpenMP implementation:

int N, i;
double w = 1.0/N, x, approx_pi = 0.;
double sum = 0.;
/* ... parallel loop; each thread's partial sum must be combined
   into approx_pi without a data race ... */
An implementation using the reduction clause:

int N, i;
double sum = 0.;
double w = 1.0/N, x, approx_pi;
/* ... parallel loop with reduction(+:sum) ... */
approx_pi = w*sum;
Another example of using the reduction clause
Using OpenMP tasks:

#pragma omp single
{
  for (i=0; i<N; i++) {
    r = rand(); // a randomly generated number
    if (p[i] > r) {
      #pragma omp task
      {
        do_some_work (p[i]);
      }
    } // end of if-test
  } // end of for-loop
} // end of the single directive
/* pointer swapping */
temp_ptr = phi_new;
phi_new = phi;
phi = temp_ptr;
}
OpenMP-parallel Jacobi algorithm
/* pointer swapping */
temp_ptr = phi_new;
phi_new = phi;
phi = temp_ptr;
OpenMP-parallel Jacobi algorithm (version 2)
double maxdelta = 1.0, eps = 1.0e-14;
/* ... iterate until maxdelta <= eps ... */
} // end of the while loop
Challenge: Parallelizing 3D Gauss-Seidel algorithm
Note: The upper limits of k, j and i are different from those given
in Chapter 6 of the textbook.
Wavefront parallelization
numthreads = omp_get_num_threads();
threadID = omp_get_thread_num();
jstart = ((jmax-2)*threadID)/numthreads + 1;
jend = ((jmax-2)*(threadID+1))/numthreads;