Pass scalars to a kernel and avoid creating a CLBuffer each time

I have to execute a kernel many times, where the only changes between calls are that two buffers are swapped and a third one is updated before the call. For example, the first call would be:

```python
myContext.program.myKernel(
    myContext.queue,
    (NThreads,),
    None,
    bufferA,
    bufferB,
    bufferState)
```

and the next:

```python
myContext.program.myKernel(
    myContext.queue,
    (NThreads,),
    None,
    bufferB,
    bufferA,
    bufferState)
```

But bufferState actually contains only one integer, which is incremented by 1 on each call. Can I somehow avoid creating a new clBuffer each time just to pass this integer? All threads have to be synchronized, so incrementing the integer inside the kernel on each call is not possible because of the unsynchronized access.

I'm also wondering whether I can pass scalar values instead of arrays at all. So far I create a NumPy array, then a clBuffer from it, and after passing the buffer to the kernel I extract the scalar from the array inside the kernel. This seems like a bit too much effort. Maybe this is specific to PyOpenCL, I don't know.

OK, I found a solution, though it didn't speed anything up. :)

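Anyway: to pass a single scalar instead of a whole array with PyOpenCL, wrap the value in a sized NumPy scalar type such as numpy.int32; no buffer is needed. Example:

```c
__kernel void foo(int x, int y)
{
    // Scalar arguments are passed by value: this assignment changes only
    // the kernel's private copy of y, so the host never sees the result.
    y = x + 1;
}
```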

To call this kernel from Python, write:

```python
import numpy

# Assumes queue and a built program containing foo already exist.
x = 17
y = 14
program.foo(queue, (1,), None, numpy.int32(x), numpy.int32(y))
```
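Applied to the original question, the counter no longer needs a buffer: pass it as a numpy.int32 scalar on each call and swap the two buffer arguments. The sketch keeps the kernel launch behind a callable so the loop logic stands on its own. The names `run_ping_pong` and `launch` are hypothetical, and it assumes `myKernel` is changed to take the counter as a plain `int` argument instead of a `__global` buffer. With PyOpenCL, `launch` might be `lambda src, dst, step: kernel(queue, (NThreads,), None, src, dst, step)`, with `kernel = myContext.program.myKernel` fetched once before the loop.

```python
import numpy


def run_ping_pong(launch, buf_a, buf_b, n_steps, start=0):
    """Call `launch` n_steps times, swapping the two buffers after each call.

    `launch` is a hypothetical callable taking (src, dst, step); with PyOpenCL
    it would wrap the actual kernel call. The step counter is passed as a
    numpy.int32 scalar, so no buffer is allocated for it.
    """
    src, dst = buf_a, buf_b
    for i in range(n_steps):
        launch(src, dst, numpy.int32(start + i))
        src, dst = dst, src  # the output of this step is the input of the next
    return src  # buffer written by the most recent call
```

On a default (in-order) command queue, successive kernel launches execute one after another, so each call sees the results of the previous one; that provides the step-to-step synchronization the question asks for without incrementing anything inside the kernel. Alternatively, PyOpenCL's `Kernel.set_scalar_arg_dtypes` lets you declare the scalar argument types once and then pass plain Python integers.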

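Mind the size and signedness: OpenCL ushort corresponds to numpy.uint16 (short to numpy.int16), int to numpy.int32, float to numpy.float32, and so on.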
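The correspondence can be written out as a lookup table. This is only an illustration, not part of PyOpenCL; `OPENCL_TO_NUMPY` and `kernel_scalar` are hypothetical names. The widths follow the OpenCL C specification, where char, short, int and long are always 8, 16, 32 and 64 bits and the u-prefixed types are their unsigned counterparts.

```python
import numpy

# OpenCL C scalar types mapped to NumPy types of the same size and signedness.
# Unlike host C, OpenCL fixes these widths (e.g. long is always 64-bit).
# double additionally requires a device with fp64 support.
OPENCL_TO_NUMPY = {
    "char": numpy.int8,
    "uchar": numpy.uint8,
    "short": numpy.int16,
    "ushort": numpy.uint16,
    "int": numpy.int32,
    "uint": numpy.uint32,
    "long": numpy.int64,
    "ulong": numpy.uint64,
    "float": numpy.float32,
    "double": numpy.float64,
}


def kernel_scalar(cl_type, value):
    """Wrap a Python value in the NumPy scalar type matching an OpenCL parameter type."""
    return OPENCL_TO_NUMPY[cl_type](value)
```

For a kernel parameter declared as `ushort n`, you would then pass `kernel_scalar("ushort", n)`.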