Second chance exception after kernel execution

Hello everybody!

I have a problem with my OpenCL code. The program reads some data from a file, then sends everything to the kernel for computation. In particular, the data is stored in host memory pointed to by “n”; a buffer is then created from n and passed to the kernel.

Here is the code.


devHostMemN = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof(cl_uchar)*(*sizeIn), n, &ciErr1);

ciErr1 = clSetKernelArg(ckKernel, 0, sizeof(cl_mem), (void*)&devHostMemN);

ciErr1 = clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL, &szGlobalWorkSize, &szLocalWorkSize, 0, NULL, NULL);

The problem is that after this last call (which completes with no error whatsoever), I call clFinish, which returns “Invalid Command Queue”. gDEBugger reports a second chance exception, saying that “The thread tried to read from or write to a virtual address to which it does not have access”.

Can you help me? I’m a newbie here, so please be kind :smiley:
Thanks a lot!!

Routine

The kernel you are executing is attempting to read beyond the end of buffer devHostMemN. Check the value of szGlobalWorkSize and the contents of the kernel you are running.
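One way to check this on the host before enqueueing is sketched below. The kernel source was not posted, so the access pattern here is an assumption: each work-item is taken to touch a contiguous run of bytes_per_item bytes starting at gid * bytes_per_item. The function name launch_fits_buffer and its parameters are hypothetical, not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical access pattern (assumption, the real kernel was not shown):
 * work-item gid touches bytes
 *   [gid * bytes_per_item, gid * bytes_per_item + bytes_per_item).
 *
 * Returns 1 if every work-item of a 1-D launch stays inside a buffer of
 * buffer_bytes bytes, 0 if at least one work-item would run past the end. */
int launch_fits_buffer(size_t global_work_size,
                       size_t bytes_per_item,
                       size_t buffer_bytes)
{
    assert(bytes_per_item > 0);

    /* The launch fits when global_work_size * bytes_per_item <= buffer_bytes.
     * Dividing instead of multiplying avoids size_t overflow. */
    return global_work_size <= buffer_bytes / bytes_per_item;
}
```

With an 8000-byte buffer, 400 work-items reading 20 bytes each fit exactly, while a 401st work-item would read past the end.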

Thanks for your reply.

Buffer devHostMemN contains a 400x20 dataset, so I set the global worksize to 400 and the local worksize to 20.
Is that right?

Routine

Buffer devHostMemN contains a 400x20 dataset, so I set the global worksize to 400 and the local worksize to 20.

400x20 what? cl_chars?

The global work-size represents the number of parallel work-items that you want to execute. The local work-size indicates how the global work-size is divided into smaller units called work-groups.

If you enqueue an NDRange with 400 work-items that’s the number of work-items that will be spawned. The local work-size doesn’t change the number of work-items, only how they are grouped together.

In your case you may be better off passing NULL as the local work-size. That tells the driver to choose a suitable value itself.
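The relationship between the two sizes can be sketched as a small host-side helper. The name work_group_count is hypothetical. It assumes a 1-D launch and the OpenCL 1.x rule that an explicit local work-size must divide the global work-size evenly.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups in a 1-D launch with an explicit local work-size.
 * Returns 0 when global is not a multiple of local, which OpenCL 1.x
 * rejects at enqueue time. The number of work-items is always `global`;
 * `local` only decides how they are grouped. */
size_t work_group_count(size_t global, size_t local)
{
    assert(local > 0);

    if (global % local != 0) {
        return 0; /* not a valid combination in OpenCL 1.x */
    }
    return global / local;
}
```

For a global work-size of 400 and a local work-size of 20 this gives 20 work-groups of 20 work-items each, still 400 work-items in total.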

400x20 what? cl_chars?

No, they are unsigned chars.
I’m not sure I understand what size to choose for the global work size… If I’m not mistaken, it has to cover all of the input data, so it should be at least as large as the input.
I also have other arguments sent to the kernel, should I include them as well to calculate the size?

Sorry to bother you, I’m new to OpenCL and I could use all of the information you can give me.

Thanks,
Routine

The globalSize is not directly related to the size of the data you have; it is a combination of the
size of the problem and the amount of work each instance of the kernel performs. It is not uncommon
for one work-item to process several data items, in which case you need fewer work-items than data items.

In your case, assuming each instance of your kernel operates on a single data item and you have 400x20 data items,
you will need a global worksize of 8000. Setting the local worksize to NULL, as suggested by david.garcia,
will allow the runtime to choose an appropriate value.
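As a rough sketch of that calculation (the helper name global_size_for and its parameters are made up for illustration), the global worksize is the number of data items divided by the number of items each work-item handles, rounded up:

```c
#include <assert.h>
#include <stddef.h>

/* Global work-size for a rows x cols dataset when each work-item handles
 * items_per_work_item data items. The division rounds up, so the last
 * work-item may have fewer items to process; the kernel then has to check
 * its index against the real item count before touching the buffer. */
size_t global_size_for(size_t rows, size_t cols, size_t items_per_work_item)
{
    assert(items_per_work_item > 0);

    size_t total = rows * cols;
    return (total + items_per_work_item - 1) / items_per_work_item;
}

/* Possible use with the variables from the original post (not compiled here):
 *
 *   szGlobalWorkSize = global_size_for(400, 20, 1);
 *   ciErr1 = clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL,
 *                                   &szGlobalWorkSize, NULL, 0, NULL, NULL);
 *
 * Passing NULL for the local work-size leaves the grouping to the runtime. */
```

One item per work-item gives 8000; if instead each work-item processed a whole row of 20 items, 400 work-items would be enough.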


jason

Thanks for the help guys.

routine