How OpenCL __private address space is mapped on GPU?

OpenCL spec says that “All variables inside a function (including __kernel functions), or passed into the function as arguments are in the __private or private address space. Variables declared as pointers are considered to point to the __private address space if an address space qualifier is not specified”.

I have few questions on this:

  1. I want to know how these variables mapped to GPU’ resources (i.e. to registers, or local memory or global memory)? Actually I know that these variables are stored in GPU’s registers which is on-chip. Is that correct or is it stored in off-chip memory?

  2. As in the given example below:

void func1(__global uint *input, __global uint *output, __private uint *arr)
{
//some stuff
}
__kernel void demoKernel(__global uint *d_input,
__global uint *d_output,
int d_maxSize)
{
long index = get_global_id(0);

  __private uint arr[12];
  func1(d_input[index], d_output[index], arr);

}

I am using Nvidia’ Kepler K20 GPU card in Ubuntu with OpenCL platform version 1.1 .
To find the number of resource usage info in the above example, I used “-cl-nv-verbose -cl-nv-maxrregcount=122” flag while building OpenCL kernel.
The resource usage info is given below:

a. when arr was declared as __private address space
ptxas info : Compiling entry function ‘demoKernel’ for ‘sm_35’
ptxas info : Function properties for demoKernel
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 332 bytes cmem[0] : 249

b. when arr was declared as __local address space
ptxas info : Compiling entry function ‘demoKernel’ for ‘sm_35’
ptxas info : Function properties for demoKernel
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 48+0 bytes smem, 332 bytes cmem[0] : 266

how number of registers usage got decreased when i use __private address space in place of __local? I think that when we use __private address space, number of register usage should be increased but that didn’t happen above.

Thanks !!

  1. Private variables are stored in registers on GPU. If the kernel uses more registers than available, some variables are stored instead in global memory (register spilling)

  2. Hard to say since this is the decision of the compiler. You can look at the PTX compiled code with clGetProgramInfo(CL_PROGRAM_BINARIES) to understand what happens under the hood.