const variable / memory latency

Hi,

I have this kernel:

__kernel void some_kernel(__global int* in_array, __global int* out_array)
{
    uint tid = get_global_id(0);
    out_array[tid] = in_array[tid] * 3;
}

in_array is initialized in the beginning of my program - and never changes.

  1. Would it be beneficial if I change the kernel to “__global const int* in_array” ? And if so, why?

  2. Would it be OK to process one element per kernel - or would it be better to do a loop?

Or in a few words: What’s the best way to minimize the effects of memory latency here?

  1. Would it be beneficial if I change the kernel to “__global const int* in_array” ? And if so, why?

I don’t think it will make any difference, since ‘const’ can be cast away. A compiler smart enough to notice that you didn’t remove the ‘const’ with a cast will also be smart enough to notice that you never write into that array.

Something that may help the compiler more is declaring the pointers as ‘__global int* restrict’ (the qualifier goes after the ‘*’, as in C99). Do this only if you know that, when you enqueue this kernel, you are going to pass a different, non-overlapping buffer object for each of the arguments. ‘restrict’ in OpenCL works the same as in C99, if you are familiar with it.
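As a rough sketch, the ‘restrict’ version of the kernel from the question could look like the following. The shim section at the top is not OpenCL; it is an assumption added here so the kernel body can be compiled and tried as plain C99 on the host, and it should be deleted in a real .cl file. ‘fake_gid’ and ‘run_range’ are hypothetical names that stand in for the work-item id and for enqueueing the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Host-only shims (assumption, not OpenCL): they let the kernel compile as
   plain C99. Delete this section when the kernel lives in a .cl file. */
#define __kernel
#define __global
static size_t fake_gid;  /* hypothetical stand-in for the work-item id */
static size_t get_global_id(unsigned dim) { (void)dim; return fake_gid; }

/* 'restrict' follows the '*', as in C99: it qualifies the pointer itself.
   It promises that in_array and out_array never refer to overlapping memory. */
__kernel void some_kernel(__global int* restrict in_array,
                          __global int* restrict out_array)
{
    size_t tid = get_global_id(0);
    out_array[tid] = in_array[tid] * 3;
}

/* Hypothetical host-side driver: runs the kernel once per work-item,
   one after another, over n elements. */
void run_range(int* in, int* out, size_t n)
{
    /* Keeping the restrict promise is the caller's job; this only
       catches the simplest violation (the same buffer passed twice). */
    assert(in != out);
    for (fake_gid = 0; fake_gid < n; ++fake_gid)
        some_kernel(in, out);
}
```

Whether the compiler actually generates different code with ‘restrict’ depends on the device and driver, so it is worth measuring before relying on it.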

  2. Would it be OK to process one element per kernel - or would it be better to do a loop?

It depends on your definition of “better”. If you mean “it executes faster”, then the answer depends on how large your NDRange is. If you are going to execute a sufficiently large NDRange, then doing a loop will improve performance because it will save the time it takes to start up and tear down one work-group after another. I don’t think there will be any savings related to memory latency.
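For reference, one common way to write the loop variant is to let each work-item step through the array with a stride equal to the global size. This is only a sketch: the element count ‘n’ is a hypothetical extra kernel argument, and the shim section plus the ‘run_range_loop’ driver are assumptions added so the code compiles as plain C99 on the host (delete them in a real .cl file).

```c
#include <assert.h>
#include <stddef.h>

/* Host-only shims (assumption, not OpenCL): they let the kernel compile as
   plain C99. Delete this section when the kernel lives in a .cl file. */
#define __kernel
#define __global
static size_t fake_gid;    /* hypothetical stand-in for the work-item id */
static size_t fake_gsize;  /* hypothetical stand-in for the global size  */
static size_t get_global_id(unsigned dim)   { (void)dim; return fake_gid; }
static size_t get_global_size(unsigned dim) { (void)dim; return fake_gsize; }

/* Each work-item handles elements tid, tid + stride, tid + 2*stride, ...
   so neighbouring work-items still touch neighbouring addresses.
   n (hypothetical argument) is the number of elements in both arrays. */
__kernel void some_kernel_loop(__global int* in_array,
                               __global int* out_array,
                               unsigned int n)
{
    size_t stride = get_global_size(0);
    for (size_t i = get_global_id(0); i < n; i += stride)
        out_array[i] = in_array[i] * 3;
}

/* Hypothetical host-side driver: emulates an NDRange of global_size
   work-items by running them one after another. */
void run_range_loop(int* in, int* out, unsigned int n, size_t global_size)
{
    assert(global_size > 0);
    fake_gsize = global_size;
    for (fake_gid = 0; fake_gid < global_size; ++fake_gid)
        some_kernel_loop(in, out, n);
}
```

With this shape the NDRange can be smaller than the array, for example 4 work-items covering 10 elements in the emulation above. How small it can get before performance suffers is device-specific.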

Personally, I would code for readability and let the driver/compiler decide about these kinds of optimizations.

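On some devices, and for some data sizes, using ‘__constant int* in_array’ instead might be an improvement. Keep in mind that the constant address space is limited in size; the limit can be queried as CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.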