I have followed the guidance offered by this question and this link, which deal with passing arrays of pointers to a device and back, but I am struggling with my specific case, where the pointers point to objects. See the example code below, from which I have removed error checking for brevity.
    // Kernel
    __global__ void myKernel(Obj** d_array_of_objs)
    {
        // Change the scalar of each object to 5
        // by dereferencing device array to get
        // appropriate object pointer.
        *d_array_of_objs->changeToFive();   // <--------- SEE QUESTION 4
    }

    // Entry point
    int main()
    {
        /********************************/
        /* INITIALISE OBJ ARRAY ON HOST */
        /********************************/

        // Array of 3 pointers to Objs
        Obj* h_obj[3];
        for (int i = 0; i < 3; i++) {
            h_obj[i] = new Obj();       // Create
            h_obj[i]->scalar = i * 10;  // Initialise
        }

        // Write out
        for (int i = 0; i < 3; i++) {
            std::cout << h_obj[i]->scalar << std::endl;
        }

        /**************************************************/
        /* CREATE DEVICE VERSIONS AND STORE IN HOST ARRAY */
        /**************************************************/

        // Create host pointer to array-like storage of device pointers
        Obj** h_d_obj = (Obj**)malloc(sizeof(Obj*) * 3);   // <--------- SEE QUESTION 1
        for (int i = 0; i < 3; i++) {
            // Allocate space for an Obj and assign
            cudaMalloc((void**)&h_d_obj[i], sizeof(Obj));
            // Copy the object to the device (only has single scalar field to keep it simple)
            cudaMemcpy(h_d_obj[i], &(h_obj[i]), sizeof(Obj), cudaMemcpyHostToDevice);
        }

        /**************************************************/
        /* CREATE DEVICE ARRAY TO PASS POINTERS TO KERNEL */
        /**************************************************/

        // Create a pointer which will point to device memory
        Obj** d_d_obj = nullptr;
        // Allocate space for 3 pointers on device at above location
        cudaMalloc((void**)&d_d_obj, sizeof(Obj*) * 3);
        // Copy the pointers from the host memory to the device array
        cudaMemcpy(d_d_obj, h_d_obj, sizeof(Obj*) * 3, cudaMemcpyHostToDevice);

        /**********
         * After the above, VS2013 shows the memory pointed to by d_d_obj
         * to be NULL <------- SEE QUESTION 2.
         **********/

        // Launch Kernel
        myKernel <<<1, 3>>>(d_d_obj);

        // Synchronise and pass back to host
        cudaDeviceSynchronize();
        for (int i = 0; i < 3; i++) {
            cudaMemcpy(&(h_obj[i]), h_d_obj[i], sizeof(Obj), cudaMemcpyDeviceToHost);   // <--------- SEE QUESTION 3
        }

        // Write out
        for (int i = 0; i < 3; i++) {
            std::cout << h_obj[i]->scalar << std::endl;
        }

        return 0;
    }

So the questions are:
1. If the line marked `SEE QUESTION 1` above allocates host memory for the pointers, and the `cudaMalloc` calls in the subsequent loop then overwrite the pointers stored in `h_d_obj` with device addresses, does that mean I have allocated host memory for 3 `Obj*` that now has no pointer pointing to it?

2. Why does the `cudaMemcpy` succeed when I test the returned status, yet clearly not copy the addresses correctly? I was expecting the "arrays" of memory addresses of both `h_d_obj` and `d_d_obj` to be the same, since they should point to the same `Obj` in the device address space.

3. At the line marked `SEE QUESTION 3`, assuming I'm correct in question 2, I also expect to be able to use either `h_d_obj` or `d_d_obj` to retrieve the `Obj` objects from the device, since the only difference would be whether I dereference a host pointer or a device pointer to reach a device pointer to `Obj`, both of which I can do in a `cudaMemcpy` call, right? If I use what is written, the copy succeeds but the pointer at `h_obj[0]` is corrupted and I cannot write out the data.

4. At the line marked `SEE QUESTION 4`, why can I not dereference an `Obj**` to get an `Obj*` and then use the `->` operator to call a device method? The compiler complains that it is not a pointer to class type, which the fact that it is an `Obj*` tells me it is.
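
To narrow down questions 3 and 4, here is a host-only C++ sketch of the two pointer expressions involved. It rests on assumptions: `Obj` is a hypothetical stand-in with one `scalar` field and a `changeToFive()` method (the real class is not shown above), `std::memcpy` stands in for `cudaMemcpy`, and the helper names `changeFirst`, `changeAt` and `copyBack` are made up for illustration. It is not a drop-in fix for the CUDA code, only a way to see what each expression refers to.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the Obj class, which the post does not show.
struct Obj {
    int scalar;
    void changeToFive() { scalar = 5; }
};

// Question 4: -> binds more tightly than unary *, so
//     *arr->changeToFive();
// parses as *(arr->changeToFive()), which applies -> to an Obj**.
// Dereferencing (or indexing) first yields an Obj* that -> accepts.
void changeFirst(Obj** arr) {
    (*arr)->changeToFive();  // equivalent to arr[0]->changeToFive()
}

// In a kernel launched with one thread per object, the index would
// presumably come from threadIdx.x instead of a parameter.
void changeAt(Obj** arr, int i) {
    arr[i]->changeToFive();
}

// Question 3: dst[i] is already an Obj*, whereas &(dst[i]) is the address
// of the pointer slot itself. Copying sizeof(Obj) bytes to &(dst[i])
// overwrites the pointer, not the object it points to.
void copyBack(Obj* const* dst, Obj* const* src, int n) {
    for (int i = 0; i < n; ++i) {
        assert(dst[i] != nullptr && src[i] != nullptr);
        std::memcpy(dst[i], src[i], sizeof(Obj));  // dst[i], not &(dst[i])
    }
}
```

If this reading is right, the same reasoning would apply to the host-to-device copy in the first loop, where `&(h_obj[i])` is also passed as the source instead of `h_obj[i]`.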