Hi All,
I am using JOCL as the binding for OpenCL.
The exact same code runs fine (and fast) on Snow Leopard (OpenCL 1.0) but throws the following error on Lion (OpenCL 1.1):
com.jogamp.opencl.CLException$CLInvalidWorkGroupSizeException: can not enqueue 1DRange CLKernel [id: 140699706121896 name: IntegrateHHStep]
with gwo: null gws: {256} lws: {256}
cond.: null events: null [error: CL_INVALID_WORK_GROUP_SIZE]
I am using the following code to define the local work-group size and the global work size for the I/O buffers:
// Length of array to process
int elementCount = models.size();
// Local work size dimensions for the selected device
int localWorkSize = min(device.getMaxWorkGroupSize(), 256);
// rounded up to the nearest multiple of the localWorkSize
int globalWorkSize = roundUp(localWorkSize, elementCount);
// results buffers are bigger as we are capturing every value for every item for every time-step
int globalWorkSize_Results = roundUp(localWorkSize, elementCount*timeConfigSteps);
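For reference, roundUp is not shown above. A minimal sketch of what such a helper typically looks like follows, assuming it takes (groupSize, globalSize) in that order and returns the smallest multiple of groupSize that is not less than globalSize. The class name WorkSizeSketch and the sample numbers are hypothetical and only there to make the snippet runnable on its own. The point of the rounding is that OpenCL 1.x requires the global work size to be evenly divisible by the local work size.

```java
public class WorkSizeSketch {

    // Assumed behaviour of the roundUp helper used above:
    // round globalSize up to the nearest multiple of groupSize.
    static int roundUp(int groupSize, int globalSize) {
        int remainder = globalSize % groupSize;
        if (remainder == 0) {
            return globalSize; // already a multiple, nothing to pad
        }
        return globalSize + groupSize - remainder;
    }

    public static void main(String[] args) {
        int localWorkSize = 256;  // hypothetical value, as in the error message
        int elementCount = 1000;  // hypothetical number of models
        int globalWorkSize = roundUp(localWorkSize, elementCount);
        System.out.println(globalWorkSize); // prints 1024
    }
}
```

With a padded global size like this, the kernel has to skip work-items whose global id is at or beyond elementCount, otherwise they read and write past the end of the buffers.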
If I set localWorkSize to 0, so that a work-group size is picked automatically, the code does work on Lion, but performance drops a lot compared to what I was getting on Snow Leopard:
int elementCount = models.size();
int localWorkSize = 0;
int globalWorkSize = elementCount;
int globalWorkSize_Results = elementCount*timeConfigSteps;
Can someone explain what could be going on here? Is the error about the local or the global work size, and does anyone have a clue how to troubleshoot or fix this?
Any help appreciated!
Thanks!