Hi,
My OpenCL kernel seems to have issues when I execute it with a large global_work_size.
On a Macbook pro with an ‘Intel Iris 1536 MB’ it returns no results (all ints in the out buffer are 0).
200_000_000 works fine but 300_000_000 does not.
I searched for limits on global_work_size, but if I understand correctly there aren't any.
Does anyone have a clue about why this is happening?
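To rule out the obvious, I already checked that both global sizes divide evenly by my local size of 64 (otherwise clEnqueueNDRangeKernel would fail with CL_INVALID_WORK_GROUP_SIZE), and that the ids still fit in a signed 32-bit int, since my kernel stores get_global_id(0) in an int. Just the arithmetic, in plain Java:

```java
public class SizeCheck {
    public static void main(String[] args) {
        long[] globalSizes = { 200_000_000L, 300_000_000L };
        long localSize = 64;

        for (long g : globalSizes) {
            // global_work_size must be a multiple of local_work_size
            boolean divisible = (g % localSize == 0);
            // the kernel stores get_global_id(0) in a signed 32-bit int,
            // so the largest id (g - 1) must not exceed Integer.MAX_VALUE
            boolean fitsInInt = (g - 1) <= Integer.MAX_VALUE;
            System.out.println(g + ": divisible=" + divisible
                    + ", fitsInInt=" + fitsInInt);
        }
    }
}
```

Both sizes pass both checks, so neither of those explains the difference between 200M and 300M.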
I stripped down my kernel to an example that is as simple as possible:
__kernel void process_moves_with_local(__global int* out)
{
    int global_id = get_global_id(0);

    int test[128];
    for (int i = 0; i < 128; i++) {
        test[i] = i;
    }

    if (global_id < 5) {
        out[1] = test[1];
    }
    out[0] = 2;
}
This is of course a silly example, but it's enough to demonstrate the issue.
I write test[1] to the output buffer because otherwise the issue does not occur; I guess the compiler would otherwise optimise the code and remove the array initialisation, since the array is never read.
I run the kernel with this host code:
int nrOfMoves = 10;
final int dstArray[] = new int[nrOfMoves];
final Pointer dst = Pointer.to(dstArray);

final cl_mem memObjects[] = new cl_mem[1];
memObjects[0] = clCreateBuffer(context.context, CL_MEM_READ_WRITE, Sizeof.cl_int * nrOfMoves, null, null);
clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(memObjects[0]));

final long global_work_size[] = new long[] { 200_000_000 };
final long local_work_size[] = new long[] { 64 };
clEnqueueNDRangeKernel(commandQueue, kernel, 1, null, global_work_size, local_work_size, 0, null, null);
clEnqueueReadBuffer(commandQueue, memObjects[0], CL_TRUE, 0, nrOfMoves * Sizeof.cl_int, dst, 0, null, null);
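One workaround I was considering, in case there is some internal per-enqueue limit on this device: splitting the range into several smaller enqueues using the global_work_offset argument of clEnqueueNDRangeKernel (get_global_id(0) then includes the offset, so the kernel wouldn't need changes). I haven't tested it yet; this is just a sketch of the chunking math in plain Java, with a hypothetical chunk size of 50M:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedRange {
    // Split [0, total) into chunks of at most chunkSize work-items.
    // chunkSize must be a multiple of localSize so every chunk keeps
    // a valid work-group size. Returns {offset, size} pairs, one per enqueue.
    static List<long[]> chunks(long total, long chunkSize, long localSize) {
        if (chunkSize % localSize != 0) {
            throw new IllegalArgumentException("chunkSize must be a multiple of localSize");
        }
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < total; offset += chunkSize) {
            long size = Math.min(chunkSize, total - offset);
            result.add(new long[] { offset, size });
        }
        return result;
    }

    public static void main(String[] args) {
        // 300M items in chunks of 50M -> 6 enqueues
        for (long[] c : chunks(300_000_000L, 50_000_000L, 64)) {
            System.out.println("offset=" + c[0] + " size=" + c[1]);
        }
    }
}
```

Each {offset, size} pair would go into one clEnqueueNDRangeKernel call as global_work_offset and global_work_size. But I'd rather understand why the single big enqueue silently produces zeros in the first place.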
Thanks in advance,
Joep