I am trying to offload an array calculation to a GPU (GTX 1080 Ti) using OpenMP and C++, with this dummy code that I have written:
#include <omp.h>
#include <iostream>
using namespace std;
int main(){
//int totalSum, ompSum;
int totalSum=0, ompSum=0;
const int N = 1000;
int array[N];
for (int i=0; i<N; i++){
array[i]=i;
}
#pragma omp target
{
#pragma omp parallel private(ompSum) shared(totalSum)
{
ompSum=0;
omp_set_num_threads(100);
printf ( "Total number of threads are %d!\n", omp_get_num_threads() );
#pragma omp for
for (int i=0; i<N; i++){
ompSum += array[i];
}
#pragma omp critical
totalSum += ompSum;
}
printf ( "Caculated sum should be %d but is %d\n", N*(N-1)/2, totalSum );
}
return 0;
}
Upon running the code, this is the output I get:
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Caculated sum should be 499500 but is 499500
The calculated sum is correct, but I am curious why it reports only 8 threads instead of the 100 threads I set in the code.
When I call omp_set_num_threads right below the #pragma omp target, the runtime reports:
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
I am new to OpenMP, and I would greatly appreciate it if someone could help explain this issue.
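To check whether the code actually runs on the GPU, call the
int omp_is_initial_device(void);
function inside the target region. If the return value is 0, the code is running on the GPU; otherwise it is running on the CPU.
To spread the loop across the GPU, a combined construct with a reduction can be used on the loop:
#pragma omp target teams distribute parallel for simd reduction(+:totalSum) map(tofrom:totalSum) map(to:array[:N])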
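As a minimal sketch of both suggestions, the two helpers below wrap the device check and the offloaded reduction. The function names target_region_ran_on_host and offloaded_sum are hypothetical, chosen only for illustration. The sketch assumes a compiler built with OpenMP offloading support (with GCC, OpenMP is enabled with -fopenmp); without it the pragmas are ignored or the region falls back to the host, and the sum is still computed there.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns true if a target region executed on the
// host (no offload happened), false if it ran on a device such as a GPU.
bool target_region_ran_on_host() {
    int on_host = 1;  // assume host unless the target region reports otherwise
#ifdef _OPENMP
    #pragma omp target map(tofrom: on_host)
    {
        // Nonzero on the host (initial) device, 0 on an offload device.
        on_host = omp_is_initial_device();
    }
#endif
    return on_host != 0;
}

// Hypothetical helper: sums data[0..n-1] using the combined construct.
// 'total' is mapped tofrom so that its initial value 0 reaches the device
// and the reduced result is copied back to the host.
long offloaded_sum(const int *data, int n) {
    assert(n > 0);
    long total = 0;
    #pragma omp target teams distribute parallel for simd \
        reduction(+:total) map(tofrom: total) map(to: data[:n])
    for (int i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

Called on the array from the question (values 0 to 999), offloaded_sum should return N*(N-1)/2 = 499500, and target_region_ran_on_host tells you whether that work ran on the GPU or silently fell back to the CPU. The thread count is left to the runtime here instead of being set with omp_set_num_threads inside the region.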