
I have followed the guidance offered by this question and this link, which deal with passing arrays of pointers to a device and back, but I seem to be struggling with my specific case, where the pointers point to objects. See the example code below, where I have removed error checking for brevity.

// Kernel
__global__ void myKernel(Obj** d_array_of_objs)
{
    // Change the scalar of each object to 5
    // by dereferencing device array to get 
    // appropriate object pointer.
    *d_array_of_objs->changeToFive();    <--------- SEE QUESTION 4
}

// Entry point
int main()
{

    /********************************/
    /* INITIALISE OBJ ARRAY ON HOST */
    /********************************/

    // Array of 3 pointers to Objs
    Obj* h_obj[3];
    for (int i = 0; i < 3; i++) {
        h_obj[i] = new Obj();       // Create
        h_obj[i]->scalar = i * 10;  // Initialise
    }

    // Write out
    for (int i = 0; i < 3; i++) {
        std::cout << h_obj[i]->scalar << std::endl;
    }


    /**************************************************/
    /* CREATE DEVICE VERSIONS AND STORE IN HOST ARRAY */
    /**************************************************/

    // Create host pointer to array-like storage of device pointers
    Obj** h_d_obj = (Obj**)malloc(sizeof(Obj*) * 3);    <--------- SEE QUESTION 1
    for (int i = 0; i < 3; i++) {
        // Allocate space for an Obj and assign
        cudaMalloc((void**)&h_d_obj[i], sizeof(Obj));
        // Copy the object to the device (only has single scalar field to keep it simple)
        cudaMemcpy(h_d_obj[i], &(h_obj[i]), sizeof(Obj), cudaMemcpyHostToDevice);
    }

    /**************************************************/
    /* CREATE DEVICE ARRAY TO PASS POINTERS TO KERNEL */
    /**************************************************/

    // Create a pointer which will point to device memory
    Obj** d_d_obj = nullptr;
    // Allocate space for 3 pointers on device at above location
    cudaMalloc((void**)&d_d_obj, sizeof(Obj*) * 3);
    // Copy the pointers from the host memory to the device array
    cudaMemcpy(d_d_obj, h_d_obj, sizeof(Obj*) * 3, cudaMemcpyHostToDevice);


    /**********
     * After the above, VS2013 shows the memory pointed to by d_d_obj 
     * to be NULL <------- SEE QUESTION 2.
     **********/


    // Launch Kernel
    myKernel <<<1, 3>>>(d_d_obj);

    // Synchronise and pass back to host
    cudaDeviceSynchronize();
    for (int i = 0; i < 3; i++) {
        cudaMemcpy(&(h_obj[i]), h_d_obj[i], sizeof(Obj), cudaMemcpyDeviceToHost);     <--------- SEE QUESTION 3
    }

    // Write out
    for (int i = 0; i < 3; i++) {
        std::cout << h_obj[i]->scalar << std::endl;
    }

    return 0;
}

So the questions are:

  1. If the line indicated by SEE QUESTION 1 above allocates host memory for the pointers, and, once I have used cudaMalloc in the subsequent loop to allocate device memory, the pointers pointed to by h_d_obj get overwritten with device addresses, does that mean I have allocated host memory for 3 Obj* that now has no pointer pointing to it?

  2. Why does the cudaMemcpy succeed when I test the returned status, yet apparently not copy the addresses correctly? I was expecting the "arrays" of memory addresses of both h_d_obj and d_d_obj to be the same, since they should point to the same Obj in the device address space.

  3. At the line SEE QUESTION 3 (assuming I'm correct in question 2), I also expect to be able to use either h_d_obj or d_d_obj to retrieve the Obj objects from the device, since the only difference would be whether I dereference a host pointer or a device pointer to reach the device pointer to an Obj, both of which I should be able to do in a cudaMemcpy call, right? If I use what is written, the copy succeeds, but the pointer at h_obj[0] is corrupted and I cannot write out the data.

  4. At the line SEE QUESTION 4, why can I not dereference an Obj** to get an Obj* and then use the -> operator to call a device method? The compiler moans that it is not a pointer to class type, which the fact that it is an Obj* tells me it is.

1 Answer

First of all, it's convenient if you provide a complete code, including a definition for the Obj class. I've provided one based on inspection of your code and some guesswork.

Second, much of your confusion here seems to stem from a less-than-crisp facility with pointers in C (or C++). Using the CUDA API with double-pointer constructions (**) between host and device requires a crisp understanding of, and the ability to visualize, what is happening.

If the line indicated by SEE QUESTION 1 above allocates host memory for the pointers, and, once I have used cudaMalloc in the subsequent loop to allocate device memory, the pointers pointed to by h_d_obj get overwritten with device addresses, does that mean I have allocated host memory for 3 Obj* that now has no pointer pointing to it?

No. h_d_obj is established (i.e. given a meaningful value) by the malloc operation. Nothing you have done subsequent to that modifies the value of h_d_obj. The cudaMalloc calls in the loop overwrite the individual Obj* slots stored in that host allocation (h_d_obj[i]), not h_d_obj itself, so the malloc'd host memory is still pointed to by h_d_obj.
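To make that concrete, here is a minimal sketch (compile as a .cu file; stripped-down Obj, no error checking, and the variable before is just illustrative) showing that the loop only overwrites the Obj* slots inside the malloc'd host block, never h_d_obj itself:

#include <cassert>
#include <cstdlib>
#include <cuda_runtime.h>

struct Obj { int scalar; };

int main()
{
    Obj** h_d_obj = (Obj**)malloc(sizeof(Obj*) * 3); // h_d_obj -> host block holding 3 Obj* slots
    Obj** before  = h_d_obj;                         // remember the value of h_d_obj itself

    for (int i = 0; i < 3; i++)
        cudaMalloc((void**)&h_d_obj[i], sizeof(Obj)); // writes a device address into slot i of the host block

    assert(h_d_obj == before); // h_d_obj never changed; only the slot contents did
    free(h_d_obj);
    return 0;
}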

Why does the cudaMemcpy succeed when I test the returned status, yet apparently not copy the addresses correctly? I was expecting the "arrays" of memory addresses of both h_d_obj and d_d_obj to be the same, since they should point to the same Obj in the device address space.

I don't see anything wrong with the pointer handling in your code up to this point. The value of h_d_obj was established (previously) by malloc, and its numerical value is an address in host memory. The value of d_d_obj was established by cudaMalloc, and its numerical value is an address in device memory. Numerically, I would expect them to be different. The contents of the two arrays (the three Obj* device addresses), however, should be identical after the cudaMemcpy. Note also that a host debugger generally cannot display the contents of device memory, so what VS2013 shows for the memory pointed to by d_d_obj is not meaningful.
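If you want to see that the contents of the two arrays do match, you can copy the device-side array back into a scratch host array and compare; a sketch you could drop in right after the cudaMemcpy in question (error checking omitted, check is just an illustrative name):

// Fetch the 3 Obj* values stored at d_d_obj (device memory) back into a scratch host array
Obj* check[3];
cudaMemcpy(check, d_d_obj, sizeof(Obj*) * 3, cudaMemcpyDeviceToHost);

for (int i = 0; i < 3; i++)
    std::cout << (check[i] == h_d_obj[i]) << std::endl;   // prints 1 1 1: both arrays hold the same device addresses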

At the line SEE QUESTION 3 (assuming I'm correct in question 2), I also expect to be able to use either h_d_obj or d_d_obj to retrieve the Obj objects from the device, since the only difference would be whether I dereference a host pointer or a device pointer to reach the device pointer to an Obj, both of which I should be able to do in a cudaMemcpy call, right? If I use what is written, the copy succeeds, but the pointer at h_obj[0] is corrupted and I cannot write out the data.

NO. You cannot dereference a device pointer in host code, even if it is a parameter in cudaMemcpy. This is legal as a source or destination in a cudaMemcpy operation:

h_d_obj[i]

This is not legal:

d_d_obj[i]

The reason is that in order to get the actual target address, I must dereference a host pointer (i.e. access a memory location on the host) in the first case, but a device pointer in the second case. From host code, I can retrieve the contents of h_d_obj[i]. I am not allowed to try to retrieve the contents of d_d_obj[i] in host code (and the parameter manipulation for cudaMemcpy is host code). The value of d_d_obj can be used as a destination from host code. d_d_obj[i] cannot.
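If you really did want to go through the device-side array, you would first have to copy the stored pointer value itself back to the host, and only then use it as the source of the object copy. A sketch of what the loop body at SEE QUESTION 3 could look like in that case (error checking omitted, d_ptr is just an illustrative name):

// Not allowed: cudaMemcpy(h_obj[i], d_d_obj[i], ...) -- evaluating d_d_obj[i] dereferences a device pointer in host code.

// Allowed: first fetch the i-th Obj* out of the device array into host memory.
// d_d_obj + i is just pointer arithmetic, not a dereference, so it is fine in host code.
Obj* d_ptr = nullptr;
cudaMemcpy(&d_ptr, d_d_obj + i, sizeof(Obj*), cudaMemcpyDeviceToHost);

// Now d_ptr is a device address held in a host variable, so it can be the source of the real copy.
cudaMemcpy(h_obj[i], d_ptr, sizeof(Obj), cudaMemcpyDeviceToHost);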

At the line SEE QUESTION 4, why can I not dereference an Obj** to get an Obj* and then use the -> operator to call a device method? The compiler moans that it is not a pointer to class type, which the fact that it is an Obj* tells me it is.

The compiler is barking at you because of the order of operations (precedence) of the operators (*, ->) you are using: -> binds more tightly than the unary *, so your statement parses as *(d_array_of_objs->changeToFive()), i.e. -> is applied to an Obj**, which is indeed not a pointer to class type. If you add parentheses to get the order you intended:

(*d_array_of_objs)->changeToFive(); 

Then the compiler won't object to that (although I would do it slightly differently as below).

Here's a modified version of your code with the Obj definition added, a slight change to the kernel so that independent threads work on independent objects, and a few other fixes. Your code was mostly correct:

$ cat t1231.cu
#include <iostream>

class Obj{

  public:
  int scalar;
  __host__ __device__
  void changeToFive() {scalar = 5;}
};

// Kernel
__global__ void myKernel(Obj** d_array_of_objs)
{
    // Change the scalar of each object to 5
    // by dereferencing device array to get
    // appropriate object pointer.
    int idx = threadIdx.x+blockDim.x*blockIdx.x;
    // (*d_array_of_objs)->changeToFive();  //  <--------- SEE QUESTION 4 (add parentheses)
    d_array_of_objs[idx]->changeToFive();
}

// Entry point
int main()
{

    /********************************/
    /* INITIALISE OBJ ARRAY ON HOST */
    /********************************/

    // Array of 3 pointers to Objs
    Obj* h_obj[3];
    for (int i = 0; i < 3; i++) {
        h_obj[i] = new Obj();       // Create
        h_obj[i]->scalar = i * 10;  // Initialise
    }

    // Write out
    for (int i = 0; i < 3; i++) {
        std::cout << h_obj[i]->scalar << std::endl;
    }


    /**************************************************/
    /* CREATE DEVICE VERSIONS AND STORE IN HOST ARRAY */
    /**************************************************/

    // Create host pointer to array-like storage of device pointers
    Obj** h_d_obj = (Obj**)malloc(sizeof(Obj*) * 3); //    <--------- SEE QUESTION 1
    for (int i = 0; i < 3; i++) {
        // Allocate space for an Obj and assign
        cudaMalloc((void**)&h_d_obj[i], sizeof(Obj));
        // Copy the object to the device (only has single scalar field to keep it simple)
        cudaMemcpy(h_d_obj[i], h_obj[i], sizeof(Obj), cudaMemcpyHostToDevice);  // copy from h_obj[i] (the object), not &(h_obj[i]) (the pointer)
    }

    /**************************************************/
    /* CREATE DEVICE ARRAY TO PASS POINTERS TO KERNEL */
    /**************************************************/

    // Create a pointer which will point to device memory
    Obj** d_d_obj = NULL;
    // Allocate space for 3 pointers on device at above location
    cudaMalloc((void**)&d_d_obj, sizeof(Obj*) * 3);
    // Copy the pointers from the host memory to the device array
    cudaMemcpy(d_d_obj, h_d_obj, sizeof(Obj*) * 3, cudaMemcpyHostToDevice);


    /**********
     * After the above, VS2013 shows the memory pointed to by d_d_obj
     * to be NULL <------- SEE QUESTION 2.
     **********/


    // Launch Kernel
    myKernel <<<1, 3>>>(d_d_obj);

    // Synchronise and pass back to host
    cudaDeviceSynchronize();
    for (int i = 0; i < 3; i++) {
        cudaMemcpy(h_obj[i], h_d_obj[i], sizeof(Obj), cudaMemcpyDeviceToHost);  //   <--------- SEE QUESTION 3: copy to h_obj[i], not &(h_obj[i])
    }

    // Write out
    for (int i = 0; i < 3; i++) {
        std::cout << h_obj[i]->scalar << std::endl;
    }

    return 0;
}
$ nvcc -o t1231 t1231.cu
$ cuda-memcheck ./t1231
========= CUDA-MEMCHECK
0
10
20
5
5
5
========= ERROR SUMMARY: 0 errors
$

A diagram of h_d_obj and d_d_obj might help:

HOST                               |    DEVICE
h_d_obj-->(Obj *)-------------------------->Obj0<---(Obj *)<----|
          (Obj *)-------------------------->Obj1<---(Obj *)     |
          (Obj *)-------------------------->Obj2<---(Obj *)     |
                                   |                            |
d_d_obj---------------------------------------------------------|
HOST                               |    DEVICE

You're allowed to access any quantity (location) on the left hand side (HOST) of the above diagram, in host code, or in a cudaMemcpy operation. You're not allowed to access any quantity (location) on the right hand side, in host code.

  • Thank you @Robert Crovella for your answer. The diagram is very helpful. Also thank you for being so polite in describing my confusion as "less-than-crisp", very kind of you :) Commented Aug 17, 2016 at 7:27
  • I think I understand my misunderstanding in question 1. Obj** h_d_obj = (Obj**)malloc(sizeof(Obj*) * 3); allocates enough host memory to hold 3 pointers and returns a pointer to that memory in h_d_obj. Then cudaMalloc((void**)&h_d_obj[i], sizeof(Obj)); dereferences h_d_obj[i] to get the i-th pointer in host memory (although this pointer has never been set), then takes the address of this pointer with &, into which cudaMalloc writes the address of an Obj in device memory. So h_d_obj points to host memory holding device pointers, and although h_d_obj[i] has changed, h_d_obj itself hasn't. Commented Aug 17, 2016 at 7:32
