My global work size is {128; 128; 4096}. The calculations are pretty intense, so every time I start my kernel, the Windows watchdog kicks in and the GPU driver crashes (the screen goes black).

Right now I'm enqueueing the kernel in the following way (using the OpenCL.Net wrapper for C#):
Code:
// Global work size: 128 x 128 x 4096 work-items in total.
IntPtr[] globalWorkSize = new IntPtr[] { (IntPtr)128, (IntPtr)128, (IntPtr)4096 };
// No global offset (first null) and no explicit local work size (second null).
error = Cl.EnqueueNDRangeKernel(cmdQueue, kernel, 3, null, globalWorkSize, null, 0, null, out clevent);

As you can see, I'm passing null for the local work size, so OpenCL chooses it for me. Is that optimal in my case, or should I set it myself? If so, what would be the optimal local work size? I don't really understand the concept of work-groups: why would I want more than one of them?

How can I divide my large task into several smaller subtasks and prevent the GPU driver from crashing? As far as I understand, the Windows watchdog kills the kernel because it takes too long to execute. What steps can be taken to prevent this behavior?
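
To make the question concrete, here is a rough, untested sketch of the kind of split I have in mind: cut the third dimension (4096) into slices and enqueue each slice separately, using the global work offset so that get_global_id(2) still covers 0..4095 across all launches. The names KernelSlicer, Slices and sliceDepth are made up for illustration, and the value 256 is an arbitrary guess. As far as I know, a non-null global work offset needs OpenCL 1.1 or later. The actual OpenCL.Net calls are left as comments because I don't know whether this is the right approach, or whether waiting between slices is enough to keep each launch under the watchdog limit.

```csharp
using System;
using System.Collections.Generic;

static class KernelSlicer
{
    // Splits the range [0, total) into consecutive chunks of at most chunkSize items.
    // Hypothetical helper, not part of OpenCL.Net.
    public static IEnumerable<(int Offset, int Size)> Slices(int total, int chunkSize)
    {
        if (total < 0) throw new ArgumentOutOfRangeException(nameof(total));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        for (int offset = 0; offset < total; offset += chunkSize)
        {
            // The last chunk may be smaller if chunkSize does not divide total.
            yield return (offset, Math.Min(chunkSize, total - offset));
        }
    }

    static void Main()
    {
        const int sliceDepth = 256; // assumed value; would need tuning per GPU

        foreach (var (offset, size) in Slices(4096, sliceDepth))
        {
            // Offset and size of this slice; only the third dimension changes.
            IntPtr[] globalOffset = { (IntPtr)0, (IntPtr)0, (IntPtr)offset };
            IntPtr[] globalSize = { (IntPtr)128, (IntPtr)128, (IntPtr)size };

            // With a real queue and kernel, each slice would presumably be enqueued like this:
            // error = Cl.EnqueueNDRangeKernel(cmdQueue, kernel, 3, globalOffset, globalSize, null, 0, null, out clevent);
            // Cl.Finish(cmdQueue); // wait for this slice before submitting the next one

            Console.WriteLine($"offset = (0, 0, {globalOffset[2]}), size = (128, 128, {globalSize[2]})");
        }
    }
}
```

With sliceDepth = 256 this would turn the single launch into 16 launches of 128 x 128 x 256 work-items each. Is this a reasonable way to do it?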

Thank you!