Associate memory chunks with compute units

Hi!

As I understand it, you cannot create arrays of a runtime-determined size in OpenCL C. However, you can allocate runtime-determined-sized buffers on the host and pass them as pointers to the kernels.
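For example, as I understand the host API, something like this should work (just a sketch with made-up names, error checking left out):

/* Host side: the buffer size n is only known at runtime. */
#include <CL/cl.h>

void run_with_runtime_size(cl_context ctx, cl_command_queue queue,
                           cl_kernel kernel, size_t n)
{
    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                n * sizeof(float), NULL, &err);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

    size_t global = n;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clFinish(queue);

    clReleaseMemObject(buf);
}

and the kernel just receives the buffer as a pointer:

__kernel void touch(__global float *data)
{
    data[get_global_id(0)] = 0.0f;
}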

Now, I want to associate one runtime-determined-sized memory chunk with each compute unit, and “recycle” this memory for each new kernel run on a core in the GPU.

I tried to allocate a memory chunk large enough to hold the memory for all cores and then have each kernel calculate an offset into it. But I didn’t find a way to calculate this offset: you cannot determine which core you are running on from inside a kernel, and you cannot hand out slices with a shared counter, since there is no way to make sure nothing happens between reading the int and incrementing it.
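Roughly what I tried in the kernel (again just a sketch, the names are made up):

/* scratch is sized chunk_size * number_of_cores; next_slot is a shared counter. */
__kernel void my_kernel(__global float *scratch,
                        __global int  *next_slot,
                        int chunk_size)
{
    /* The race: other work-items can run between the read and the write,
       so two work-items can end up claiming the same slot. */
    int slot = *next_slot;
    *next_slot = slot + 1;

    __global float *my_chunk = scratch + slot * chunk_size;
    my_chunk[0] = 0.0f; /* ... use the chunk ... */
}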

Any tip on this is very much appreciated. This might be an easy problem; I’m fairly new to the framework, so I might have missed some fundamental concept that avoids this issue from the start.

You cannot create arrays of a runtime-determined size in OpenCL C. However, you can allocate runtime-determined-sized buffers and pass them as pointers to the kernels.

Correct.

I want to associate one runtime-determined-sized memory chunk with each compute unit, and “recycle” this memory for each new kernel run on a core in the GPU.

You already lost me here. Here is my guess at what you are looking for: you want to allocate a buffer object and pass different, non-overlapping pieces of it to different kernel invocations? Is that it? You could use sub-buffer objects.
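Something along these lines (a sketch only: names are made up, error checking is omitted, clCreateSubBuffer requires OpenCL 1.1, and each region’s origin has to respect the device’s CL_DEVICE_MEM_BASE_ADDR_ALIGN alignment):

#include <CL/cl.h>

/* Carve one parent buffer into num_pieces non-overlapping sub-buffers. */
void make_pieces(cl_context ctx, size_t piece_bytes, int num_pieces, cl_mem *pieces)
{
    cl_int err;
    cl_mem parent = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                   piece_bytes * num_pieces, NULL, &err);

    for (int i = 0; i < num_pieces; ++i) {
        cl_buffer_region region;
        region.origin = (size_t)i * piece_bytes; /* must be suitably aligned */
        region.size   = piece_bytes;
        pieces[i] = clCreateSubBuffer(parent, CL_MEM_READ_WRITE,
                                      CL_BUFFER_CREATE_TYPE_REGION, &region, &err);
    }
    /* Each pieces[i] can now be passed to clSetKernelArg like any other cl_mem.
       The parent is deliberately not released here; it must outlive the sub-buffers. */
}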

Rather than talking about “cores” and “compute units”, it’s less ambiguous to talk about OpenCL devices, NDRanges/kernels, work-groups, and work-items.
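For instance, work-groups and work-items have IDs you can query from inside a kernel, which is what a per-chunk offset would normally be keyed on, rather than on a physical core. A sketch (made-up names; scratch would be sized get_num_groups(0) * chunk_size floats):

__kernel void my_kernel(__global float *scratch, int chunk_size)
{
    size_t group = get_group_id(0);  /* which work-group this work-item is in */
    size_t lid   = get_local_id(0);  /* position within the work-group */

    /* All work-items in the same work-group share this chunk. */
    __global float *my_chunk = scratch + group * (size_t)chunk_size;

    if (lid == 0)
        my_chunk[0] = 0.0f; /* ... use the chunk ... */
}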

Thanks a lot for the reply!

The thing is, wouldn’t that allocate one memory chunk per kernel? It would be preferred if it didn’t allocate more than there will be parallel kernels running, which would be the number of compute units. Or does the sub-buffer automatically “recycle” the memory?

For example: I have 30 cores in my graphics card, that is, 30 compute units. I then run 100 kernels. Will that allocate 100 memory chunks, or will it allocate 30 and reuse them with the sub-buffer strategy?

The thing is, wouldn’t that allocate one memory chunk per kernel? It would be preferred if it didn’t allocate more than there will be parallel kernels running, which would be the number of compute units. Or does the sub-buffer automatically “recycle” the memory?

Sorry, I’m having a lot of trouble following you. What do you mean by “kernels” or “parallel kernels”? Do you mean (a) NDRange commands or (b) work-items?

A sub-buffer object is nothing more than a way to represent a slice or region of a parent buffer object. It’s not a new memory allocation.
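So for the 30-versus-100 example: you could allocate one parent buffer, create as many sub-buffers as you want to be “in flight” at once, and reuse them across successive NDRange enqueues. A rough host-side sketch (made-up names, no error handling; it assumes an in-order command queue, so two enqueues never use the same piece at the same time):

#include <CL/cl.h>

void run_many(cl_command_queue queue, cl_kernel kernel,
              cl_mem *pieces, int num_pieces,
              int num_runs, size_t global_size)
{
    for (int run = 0; run < num_runs; ++run) {
        /* Cycle through the same pieces instead of allocating per run. */
        cl_mem piece = pieces[run % num_pieces];
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &piece);
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                               0, NULL, NULL);
    }
    clFinish(queue);
}

The argument value is captured when the command is enqueued, so re-setting it between enqueues is fine.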