__global vs __constant qualifier in OpenCL

I want an array variable to have program scope.

One way to do this is to pass a pointer to it as a function argument throughout the program, which can get complex when multiple functions read and write the array.
The second way is to use a global variable with program scope. As per the OpenCL specification, global variables are declared in the program source with the __constant qualifier and are accessed as read-only variables.

Here is a sample program that demonstrates my problem:

    __constant uint arr[2] = {0, 0}; // an array of unsigned integers
    void func1(uint tmp)
    {
        for (int i = 0; i < 2; i++)
            arr[i] = tmp+i;
    }
    void func2(uint tmp)
    {
        for (int i = 0; i < 2; i++)
            tmp = arr[i];
    }
    __kernel void demoKernel(__global uint *input,
                             __global uint *output)
    {
        uint index = get_global_id(0);
        func1(input[index]);
        func2(output[index]);
    }

When I compiled this, I got the following error:

tmp/OCLrnlEIO.cl", line 5: error: expression must be a modifiable lvalue
    arr[i] = tmp+i;

I searched Google for this error and found a suggestion that it occurs because “arr” is an array of length 2 rather than a pointer.

So my questions are:

  1. What is the reason for this error, and how can I fix it?
  2. I need to write to the array as well as read it, so how can I do that with the __constant qualifier, which makes the variable read-only?

Thanks in advance!

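  1. As its name suggests, a __constant variable cannot be modified, so a line such as ‘arr[i] = tmp+i’ is an error. The message has nothing to do with arr being an array rather than a pointer.

  2. A variable defined at program scope can be either __constant or __global (program-scope __global variables need OpenCL 2.0 or later; OpenCL 1.x only allows __constant there). So you could declare arr as __global.
    However, a program-scope __global array is shared by every work-item, and your program is written so that arr should store a different value for each work-item. Its correct address space is therefore __private, and arr cannot be declared __private at program scope.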

You must declare arr as a __private array inside demoKernel and pass it as an argument to func1 and func2.
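
The suggestion above could look roughly like the following. This is only a sketch: the helper names fill_arr and sum_arr and the ARR_LEN macro are made up for illustration, and writing the sum to output is an assumption, since the original func2 does not store its result anywhere.

```opencl
// ARR_LEN, fill_arr and sum_arr are hypothetical names used for illustration.
#define ARR_LEN 2

// Writes into the calling work-item's private array.
void fill_arr(__private uint *arr, uint tmp)
{
    for (int i = 0; i < ARR_LEN; i++)
        arr[i] = tmp + i;
}

// Reads the calling work-item's private array and returns the sum of its elements.
uint sum_arr(__private const uint *arr)
{
    uint sum = 0;
    for (int i = 0; i < ARR_LEN; i++)
        sum += arr[i];
    return sum;
}

__kernel void demoKernel(__global const uint *input,
                         __global uint *output)
{
    size_t index = get_global_id(0);

    // Declared inside the kernel, so it is __private: one copy per work-item.
    uint arr[ARR_LEN] = {0, 0};

    fill_arr(arr, input[index]);
    output[index] = sum_arr(arr);   // assumed use of the result
}
```

Because each work-item owns its own copy of arr, no synchronization is needed between work-items.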

Thanks for helping me out!
One more thing I want to know: suppose arr has 44 elements. Would it then be more efficient to use the __local address space instead of __private?
I thought of using __local for this, but I am not sure shared memory will help here, since as far as I know it is only useful when data is accessed more than once, either within the same thread or from different threads within a block.

Please share any insight on how to get better performance in this scenario.

__local memory is used to share data among work-items of the same work-group, which implies that your algorithm can be split into blocks.
In your simple case, arr contains temporary data relative to a work-item, not to a work-group, so its natural memory space is __private.
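Private memory is usually the fastest, since it typically maps to registers, and GPUs generally have more private memory than local memory. Very large private arrays can spill to slower memory on some devices, so it is worth measuring.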
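
For contrast, here is a rough sketch of a case where __local does make sense: every work-item in a work-group reads the same 44-entry table. The kernel name localDemo, the table argument and the TABLE_LEN macro are assumptions for illustration and are not part of the original program.

```opencl
// TABLE_LEN, localDemo and the table argument are hypothetical.
#define TABLE_LEN 44

__kernel void localDemo(__global const uint *table,
                        __global const uint *input,
                        __global uint *output)
{
    // One copy per work-group, visible to all work-items in that group.
    __local uint shared[TABLE_LEN];

    size_t lid   = get_local_id(0);
    size_t lsize = get_local_size(0);

    // The work-items of the group cooperatively copy the table into local memory.
    for (size_t i = lid; i < TABLE_LEN; i += lsize)
        shared[i] = table[i];

    // Wait until every work-item in the group has finished its part of the copy.
    barrier(CLK_LOCAL_MEM_FENCE);

    size_t gid = get_global_id(0);
    output[gid] = input[gid] + shared[gid % TABLE_LEN];
}
```

The copy and the barrier only pay off when the data really is shared across the group. For per-work-item scratch data like arr, they are pure overhead.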

Thank you.
I used the __private address space and got results slightly faster than with __local.