Hi everyone,
I’ve read somewhere (some forum I cannot recall right now) that allocating local memory (“shared” memory in NVIDIA CUDA nomenclature) statically, like below, should be avoided because it’s implementation-dependent:
__local float s_elData[32];
Dynamic allocation through a kernel argument and clSetKernelArg should supposedly be used instead. Unfortunately, when I use the latter method my register usage increases from 14 to 19, with no other change in the code, just the way of allocation. So I would rather stick with the former, static, method of allocation. Is it safe, or should it really be avoided?
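For illustration, the dynamic variant described above looks roughly like this (the kernel name, argument index, and size are placeholders, not the poster's actual code): the kernel receives the local buffer as a pointer argument, and the host passes only a byte size with a NULL pointer, letting the implementation allocate the local memory per work-group:

```c
/* Kernel side: the local buffer arrives as a __local pointer argument. */
__kernel void K(__local float *s_elData /*, ... other args */)
{
    /* ... use s_elData exactly like the statically declared array ... */
}

/* Host side: arg_value is NULL, arg_size is the number of bytes of
   local memory to allocate for this argument. */
clSetKernelArg(kernel, 0, 32 * sizeof(float), NULL);
```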
Thanks!
You mean when allocating statically? In the kernel, like this:
__kernel void K(
//.. kernel args
){
//definition of s_el
__local float s_el[32];
//.. download data from global to s_el, make computations in parallel, store results from s_el back to global
}
So kernel scope. I only need it to download some data from global memory, perform a lot of computations in the kernel, and store the results back to global memory.
It works (on NVIDIA's OpenCL implementation), and the register consumption is lower than if I allocated dynamically with a kernel argument and clSetKernelArg. (The s_el array always has a constant size, so I don't need dynamic allocation.) Is this way of defining variables in local memory all right?
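The download/compute/store pattern described above needs a barrier between the stage-in and the parallel reads, so that every work-item sees the complete local tile. A minimal sketch of that pattern (the buffer names and the summation are assumptions for illustration, not the poster's actual computation; it presumes a work-group size of 32 matching the array):

```c
__kernel void K(__global const float *in, __global float *out)
{
    __local float s_el[32];               /* statically sized, kernel scope */
    int lid = get_local_id(0);

    s_el[lid] = in[get_global_id(0)];     /* stage from global to local     */
    barrier(CLK_LOCAL_MEM_FENCE);         /* all writes visible to all items */

    float acc = 0.0f;
    for (int i = 0; i < 32; ++i)          /* each item reads the whole tile */
        acc += s_el[i];

    out[get_global_id(0)] = acc;          /* store result back to global    */
}
```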
You are doing things fine. Declaring a __local variable at kernel scope is perfectly legal; see section 6.5.2 of the OpenCL 1.1 specification, which even includes an example. Don’t worry about it.