Hi,
My OpenCL kernel seems to have issues when I execute it with a large global_work_size.
On a Macbook pro with an ‘Intel Iris 1536 MB’ it returns no results (all ints in the out buffer are 0).
200_000_000 works fine but 300_000_000 does not.
I searched for limits on global_work_size, but if I understand correctly there aren't any.
Does anyone have a clue about why this is happening?
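To rule out the obvious, I already checked that both global sizes divide evenly by my local size of 64 (otherwise clEnqueueNDRangeKernel would fail with CL_INVALID_WORK_GROUP_SIZE), and that the ids still fit in a signed 32-bit int, since my kernel stores get_global_id(0) in an int. Just the arithmetic, in plain Java:

```java
public class SizeCheck {
    public static void main(String[] args) {
        long[] globalSizes = { 200_000_000L, 300_000_000L };
        long localSize = 64;

        for (long g : globalSizes) {
            // global_work_size must be a multiple of local_work_size
            boolean divisible = (g % localSize == 0);
            // the kernel stores get_global_id(0) in a signed 32-bit int,
            // so the largest id (g - 1) must not exceed Integer.MAX_VALUE
            boolean fitsInInt = (g - 1) <= Integer.MAX_VALUE;
            System.out.println(g + ": divisible=" + divisible
                    + ", fitsInInt=" + fitsInInt);
        }
    }
}
```

Both sizes pass both checks, so neither of those explains the difference between 200M and 300M.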
I stripped down my kernel to an example that is as simple as possible:
__kernel void process_moves_with_local(__global int* out)
{
    int global_id = get_global_id(0);

    int test[128];
    for (int i = 0; i < 128; i++) {
        test[i] = i;
    }

    if (global_id < 5) {
        out[1] = test[1];
    }
    out[0] = 2;
}
This is of course a silly example, but it's enough to demonstrate the issue.
I write test[1] to the output buffer because otherwise the issue does not occur; I guess the compiler would otherwise optimise the code and remove the array initialisation, since the array is never read.
I run the kernel with this host code:
int nrOfMoves = 10;
final int dstArray[] = new int[nrOfMoves];
final Pointer dst = Pointer.to(dstArray);

final cl_mem memObjects[] = new cl_mem[1];
memObjects[0] = clCreateBuffer(context.context, CL_MEM_READ_WRITE, Sizeof.cl_int * nrOfMoves, null, null);
clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(memObjects[0]));

final long global_work_size[] = new long[] { 200_000_000 };
final long local_work_size[] = new long[] { 64 };
clEnqueueNDRangeKernel(commandQueue, kernel, 1, null, global_work_size, local_work_size, 0, null, null);
clEnqueueReadBuffer(commandQueue, memObjects[0], CL_TRUE, 0, nrOfMoves * Sizeof.cl_int, dst, 0, null, null);
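One workaround I was considering, in case there is some internal per-enqueue limit on this device: splitting the range into several smaller enqueues using the global_work_offset argument of clEnqueueNDRangeKernel (get_global_id(0) then includes the offset, so the kernel wouldn't need changes). I haven't tested it yet; this is just a sketch of the chunking math in plain Java, with a hypothetical chunk size of 50M:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedRange {
    // Split [0, total) into chunks of at most chunkSize work-items.
    // chunkSize must be a multiple of localSize so every chunk keeps
    // a valid work-group size. Returns {offset, size} pairs, one per enqueue.
    static List<long[]> chunks(long total, long chunkSize, long localSize) {
        if (chunkSize % localSize != 0) {
            throw new IllegalArgumentException("chunkSize must be a multiple of localSize");
        }
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < total; offset += chunkSize) {
            long size = Math.min(chunkSize, total - offset);
            result.add(new long[] { offset, size });
        }
        return result;
    }

    public static void main(String[] args) {
        // 300M items in chunks of 50M -> 6 enqueues
        for (long[] c : chunks(300_000_000L, 50_000_000L, 64)) {
            System.out.println("offset=" + c[0] + " size=" + c[1]);
        }
    }
}
```

Each {offset, size} pair would go into one clEnqueueNDRangeKernel call as global_work_offset and global_work_size. But I'd rather understand why the single big enqueue silently produces zeros in the first place.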
Thanks in advance,
Joep