OpenCL spec says that “All variables inside a function (including __kernel functions), or passed into the function as arguments are in the __private or private address space. Variables declared as pointers are considered to point to the __private address space if an address space qualifier is not specified”.
I have few questions on this:
-
I want to know how these variables mapped to GPU’ resources (i.e. to registers, or local memory or global memory)? Actually I know that these variables are stored in GPU’s registers which is on-chip. Is that correct or is it stored in off-chip memory?
-
As in the given example below:
void func1(__global uint *input, __global uint *output, __private uint *arr)
{
//some stuff
}
__kernel void demoKernel(__global uint *d_input,
__global uint *d_output,
int d_maxSize)
{
long index = get_global_id(0);
__private uint arr[12];
func1(d_input[index], d_output[index], arr);
}
I am using Nvidia’ Kepler K20 GPU card in Ubuntu with OpenCL platform version 1.1 .
To find the number of resource usage info in the above example, I used “-cl-nv-verbose -cl-nv-maxrregcount=122” flag while building OpenCL kernel.
The resource usage info is given below:
a. when arr was declared as __private address space
ptxas info : Compiling entry function ‘demoKernel’ for ‘sm_35’
ptxas info : Function properties for demoKernel
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 332 bytes cmem[0] : 249
b. when arr was declared as __local address space
ptxas info : Compiling entry function ‘demoKernel’ for ‘sm_35’
ptxas info : Function properties for demoKernel
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 48+0 bytes smem, 332 bytes cmem[0] : 266
how number of registers usage got decreased when i use __private address space in place of __local? I think that when we use __private address space, number of register usage should be increased but that didn’t happen above.
Thanks !!