I am trying to offload an array calculation to a GPU (GTX 1080 Ti) using OpenMP and C++, with this dummy code that I have written:

#include <omp.h>
#include <cstdio>   // printf
#include <iostream>

using namespace std;

int main(){

        //int totalSum, ompSum;
        int totalSum=0, ompSum=0;
        const int N = 1000;
        int array[N];
        for (int i=0; i<N; i++){
                array[i]=i;
        }
        #pragma omp target
        {
                #pragma omp parallel private(ompSum) shared(totalSum)
                {
                        ompSum=0;
                        omp_set_num_threads(100);
                        printf ( "Total number of threads are %d!\n", omp_get_num_threads() );
                        #pragma omp for
                        for (int i=0; i<N; i++){
                                ompSum += array[i];
                        }

                        #pragma omp critical
                        totalSum += ompSum;

                }

                printf ( "Caculated sum should be %d but is %d\n", N*(N-1)/2, totalSum );
        }
        return 0;


}

Upon running the code, this is the output I get:

Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Total number of threads are 8!
Caculated sum should be 499500 but is 499500

The calculated sum is correct, but I am curious why it reports only 8 threads rather than the 100 threads I requested in the code.

When I move the omp_set_num_threads call to directly below #pragma omp target, the runtime reports:

libgomp: cuCtxSynchronize error: an illegal memory access was encountered

I am new to OpenMP and would greatly appreciate it if someone could help explain this issue.

  • It's invalid to set the thread count inside a parallel region.
    – paddy
    Commented Aug 6, 2021 at 5:31
  • @paddy I tried setting it outside the parallel region as well; no change.
    – OMEGOSH01
    Commented Aug 6, 2021 at 5:32
  • Are you sure that you have set up the GPU correctly (and that OpenMP is actually using it)? Please check the return value of the function int omp_is_initial_device(void) inside the target region. If it returns 0, the code is running on the GPU; otherwise it is running on the CPU.
    – Laci
    Commented Aug 6, 2021 at 6:05
  • It is returning a value of 0, which I presume is the GPU @Laci
    – OMEGOSH01
    Commented Aug 6, 2021 at 6:17
  • Why is there a need to set the number of GPU threads? AFAIK, GCC will use one thread per warp, so the correct way to write the offload directive would be #pragma omp target teams distribute parallel for simd reduction(+:totalSum) map(from:totalSum) map(to:array[:N])
    Commented Aug 6, 2021 at 6:37
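
A minimal sketch of the two suggestions in the comments above, assuming a compiler with OpenMP offloading support (for example, GCC built with nvptx offloading). The helper names offloaded_sum and target_runs_on_host are hypothetical and do not come from the question. The sketch uses map(tofrom: ...) rather than map(from: ...) for the reduction variable so that its initial value of 0 is copied to the device; that choice is an assumption about what the comment intended. The _OPENMP guards are only there so the file still compiles without OpenMP, in which case the loop simply runs serially.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not from the question): reports whether a target
// region executes on the host instead of an offload device.
bool target_runs_on_host() {
    int on_host = 1;  // assume host unless the target region says otherwise
#ifdef _OPENMP
    #pragma omp target map(tofrom: on_host)
    {
        // Nonzero on the host (initial device), 0 on an offload device.
        on_host = omp_is_initial_device();
    }
#endif
    return on_host != 0;
}

// Hypothetical helper (not from the question): sums data[0..n) with the
// combined directive from the comment, letting the runtime choose the
// number of teams and threads.
long long offloaded_sum(const int* data, int n) {
    assert(data != nullptr && n >= 0);
    long long total = 0;
    // Assumption: tofrom (not from) so the initial 0 reaches the device.
    #pragma omp target teams distribute parallel for simd \
        reduction(+:total) map(to: data[0:n]) map(tofrom: total)
    for (int i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

Calling offloaded_sum(array, N) with the question's array should give 499500, and target_runs_on_host() should return false when offloading is active. If a specific thread count is really needed, it is normally requested with a clause such as num_threads or thread_limit on the directive rather than by calling omp_set_num_threads inside the region, and the runtime may still pick a different number.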
