Why can kernels take __local pointer arguments?

As far as I know, the only meaningful value that the host could set for this argument is NULL. Sure, other OpenCL functions could call the kernel and pass a __local pointer that means something, but I can’t imagine a scenario where that makes sense. The kernel would likely be functioning in two different ways – thus make it two functions.

I just want to make sure I’m not missing something big regarding options to allocate __local memory. (i.e. not possible to do dynamically based on workgroup size right?)

I’m speculating this was just an easier standard to specify since all other functions can take __local pointers.

The only valid pointer for __local arguments passed from the host is NULL. However, the size passed along with it sets how much local memory the kernel will use on the device. This is as close as it gets to dynamically allocating local memory.
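To make that concrete, here is a minimal sketch of a kernel that takes a __local pointer, with the matching host calls shown as comments. The kernel name reverse_in_group, the argument order, and the work-group size of 64 are only assumptions for illustration; the sketch also assumes the global size is a multiple of the work-group size.

```c
// OpenCL C kernel: each work-group stages its slice of `in` in local memory,
// then writes that slice back in reversed order.
// `scratch` has no size here; the host decides it with clSetKernelArg().
__kernel void reverse_in_group(__global const float *in,
                               __global float *out,
                               __local float *scratch)
{
    size_t lid = get_local_id(0);
    size_t lsz = get_local_size(0);
    size_t gid = get_global_id(0);

    scratch[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);   // wait until the whole group has stored

    out[gid] = scratch[lsz - 1 - lid];
}

// Host side (C), with a work-group size picked at run time:
//   size_t local_size = 64;   // assumed value, not from the thread
//   clSetKernelArg(kernel, 2, local_size * sizeof(float), NULL);
//   clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
//                          &global_size, &local_size, 0, NULL, NULL);
```

The pointer argument is NULL and only the size carries information, so the same compiled kernel can be enqueued with a different work-group size just by setting argument 2 again.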

Ok, wow. I didn’t realize that could be done, but now I see it in the clSetKernelArg() documentation. Thanks.

I guess one might do this when not enforcing a workgroup size at kernel compile time, but wanting to specify it at the time of clEnqueueNDRangeKernel()? Otherwise you might not know how big to make the __local array.

It seems strange that the workgroup size constraints could change every time the __local kernel argument is changed. Right? If you ask for a large __local array through the API, that has to drive the workgroup size down…

If you statically allocate local memory, even with a macro definition, then you can’t change it for the kernel unless you recompile. This method seems to be meant for compiling to a binary and distributing it, so that the same binary can still run on a wide range of hardware.

Constraints on the workgroup size, and on the number of workgroups able to swap in and out of a compute unit, may change depending on the amount of local memory used. I recommend reading the AMD Accelerated Parallel Processing OpenCL Programming Guide for some more insight.
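As a rough illustration of that trade-off, the host can check a planned work-group size against the local memory limit before setting the argument. The helper names below are made up for this sketch, and local_mem_limit stands in for a value the host would normally query with clGetDeviceInfo() and CL_DEVICE_LOCAL_MEM_SIZE. The real limit for a given kernel can be lower, so treat this as a first approximation only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes to request for a __local array holding one element per work-item. */
size_t local_bytes_needed(size_t work_group_size, size_t elem_size)
{
    assert(elem_size != 0);
    return work_group_size * elem_size;
}

/* True if that request fits within local_mem_limit (in bytes). */
bool fits_in_local(size_t work_group_size, size_t elem_size,
                   size_t local_mem_limit)
{
    return local_bytes_needed(work_group_size, elem_size) <= local_mem_limit;
}

/* Halve `preferred` until the array fits.
   Returns 0 if not even a single element fits. */
size_t shrink_to_fit(size_t preferred, size_t elem_size,
                     size_t local_mem_limit)
{
    size_t wg = preferred;
    while (wg > 0 && !fits_in_local(wg, elem_size, local_mem_limit))
        wg /= 2;
    return wg;
}
```

The resulting size would then be used both as the local work size for clEnqueueNDRangeKernel() and, multiplied by the element size, as the size passed to clSetKernelArg().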

So, just to confirm: for local memory, you call clSetKernelArg() on the host side with a NULL pointer and a size.
This just allocates local memory, right? You can’t fill it or edit it from the host side? It just makes local memory available for use inside the kernel?

Cheers.

That’s correct.

Cheers!

Ok, thanks. One further question: is local memory intended to be used locally within a single workitem, or can it be accessed across the workgroup? I.e., you pass in the pointer as an arg and it can then be used by other items in the same workgroup (since private memory is lower down the hierarchy)? Or is it simply for workitems to have a bit of local memory each?

Thanks.

I’m pretty sure this is covered in the docs.

But yes, local memory is shared across all items in the work-group; there wouldn’t be much point to it otherwise (registers are already local to the work-item). Short of atomics or a round trip through global memory, it is the only way to communicate within the work-group, which is often desirable.
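A small sketch of that kind of communication is a per-group sum, where work-items read values that other work-items in the same group wrote to local memory. The kernel name group_sum and the buffer layout are assumptions for illustration, and the loop assumes the work-group size is a power of two.

```c
// OpenCL C kernel: every work-group adds up its slice of `in` and writes
// one partial sum to partial[group id].
__kernel void group_sum(__global const float *in,
                        __global float *partial,
                        __local float *scratch)
{
    size_t lid = get_local_id(0);

    scratch[lid] = in[get_global_id(0)];
    barrier(CLK_LOCAL_MEM_FENCE);   // all loads are now visible to the group

    // Tree reduction: in each pass the lower half adds in the upper half.
    for (size_t stride = get_local_size(0) / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);   // reached by every work-item
    }

    if (lid == 0)
        partial[get_group_id(0)] = scratch[0];
}
```

Work-items in other groups never see this scratch array; each work-group gets its own copy.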

As to your original query, passing NULL plus a size for local arguments is the only way to dynamically allocate local memory for a given kernel, which you need if you want the same code to work on different-sized problems without wasting memory.

Great, thanks. I was asking because the docs I have read are very clear on what it can do but not on what it can’t do, so I just wanted to check.

Thanks.