while-loop breaks kernel

Hello,
I am using OpenCL to render an image of the Mandelbrot set. The program stops functioning when the while loop in the kernel below finishes at different times in different threads. I know this because when I remove the condition ‘zimag*zimag + zreal*zreal <= 4 &&’ from the while loop, the kernel runs fine without errors.

enqueueNDRangeKernel() returns an out-of-range error, and Windows shows a notification that reads "Display driver has stopped responding and has recovered". I think this happens because the kernel takes too long and Windows stops it. Is there a way to work around this?

https://github.com/benhiller/opencl-mandelbrot
This project on GitHub seems to be working and uses the same escape-time algorithm as I do. What am I doing wrong?

I am using the OpenCL C++ bindings on an Intel(R) HD Graphics 4400 (I can’t get it to work on my NVIDIA card).

Host (I have left out the part where the kernel is built):

//uses the GPU to generate the Mandelbrot set.
//stores the colours in the variables r, g, b of
//the class Mandelbrot.
void Mandelbrot::run_kernel(){
    std::cout<<"running kernel!
";
    int tempI[w * h] = {};
    
    cl::Buffer buffer_Maxit(context,CL_MEM_READ_WRITE, sizeof(max_iterations));
    cl::Buffer buffer_unit(context, CL_MEM_READ_WRITE, sizeof(float));
    cl::Buffer buffer_i(context, CL_MEM_READ_WRITE, sizeof(tempI));
    cl::CommandQueue clqueue(context, devices[device_id]);

    clqueue.enqueueWriteBuffer(buffer_Maxit, CL_TRUE,0, sizeof(int),&max_iterations);
    clqueue.enqueueWriteBuffer(buffer_unit, CL_TRUE,0, sizeof(unit),&unit);

    cl::Kernel kernel_colour = cl::Kernel(program,"calculateColour");
    kernel_colour.setArg(0, buffer_Maxit);
    kernel_colour.setArg(1, buffer_unit);
    kernel_colour.setArg(2, buffer_i);
    clqueue.enqueueNDRangeKernel(kernel_colour, cl::NullRange, cl::NDRange(w,h), cl::NullRange);
    clqueue.finish();
    clqueue.enqueueReadBuffer(buffer_i, CL_TRUE, 0, sizeof(tempI), &tempI);    

    //colouring
    for(int y = 0; y < h; y++){
        for(int x = 0; x < w; x++){
            if(tempI[x + y * w] < max_iterations)
            {
                r[x][y] = 255;
                g[x][y] = 255;
                b[x][y] = 255;
            }
            else
            {
                r[x][y] = 0;
                g[x][y] = 0;
                b[x][y] = 0;
            }
        }
    }
}

Kernel:

kernel void calculateColour(const int maxIt, const float unit, global int* tempI){
    float zreal = 0;
    float zimag = 0;
    float creal = unit * get_global_id(0);
    float cimag = unit * get_global_id(1);
    int i = 0;
    while(zimag * zimag + zreal * zreal <= 4 && i < maxIt){ // <- This is the loop that is mentioned in the text above
        float zrealtemp = zreal * zreal - zimag * zimag + creal;
        zimag = zreal * zimag * 2 + cimag;
        zreal = zrealtemp;
        i++;
    }
    tempI[get_global_id(0) + get_global_id(1) * get_global_size(0)] = i;
}

General advice: Since the display driver needs to use the GPU, Windows kills long-running compute work on it (the TDR timeout). When running OpenCL on the display GPU you should try to keep your kernels in the single-digit milliseconds, or double digits if necessary (and rarely). If you run multiple kernels of over 100 ms each, your GUI becomes sluggish and unusable.

Specific advice: Instead of trying to compute the entire Mandelbrot set in a single NDRange enqueue, break it up into many smaller enqueues; each will interleave with display-GPU work as needed. Say your image needs 10 seconds to compute (which Windows will kill after something like 8 seconds): divide it into a 50 by 50 grid of tiles (2,500 enqueues) and each tile should take about 4 ms. If these smaller NDRange kernels still time out, you probably have a bug. A sketch of the idea is below.
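For illustration, here is a minimal sketch of that tiling with the C++ bindings, reusing kernel_colour, clqueue, w and h from your host code; the 64x64 tile size is an arbitrary choice, and the kernel would then have to index tempI with the full image width instead of get_global_size(0), since the global size only covers one tile.

    const size_t tile = 64; //arbitrary; tune so each enqueue stays in the millisecond range
    for (size_t y0 = 0; y0 < (size_t)h; y0 += tile) {
        for (size_t x0 = 0; x0 < (size_t)w; x0 += tile) {
            size_t tw = (x0 + tile <= (size_t)w) ? tile : (size_t)w - x0;
            size_t th = (y0 + tile <= (size_t)h) ? tile : (size_t)h - y0;
            //the global offset is added to get_global_id() inside the kernel
            //(OpenCL 1.1+), so the per-pixel maths does not change
            clqueue.enqueueNDRangeKernel(kernel_colour,
                                         cl::NDRange(x0, y0), //offset of this tile
                                         cl::NDRange(tw, th), //size of this tile
                                         cl::NullRange);
        }
    }
    clqueue.finish();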

Thank you for your reply.
I tried with smaller kernels (even down to 1x1) but none of them worked (my computer crashed once with VIDEO_TDR_FAILURE, but it did not crash the second time I tried). Do you know where in the program the bug might be (kernel, host, etc.)?

Your kernel takes three arguments: an int, a float, and a buffer.
You are calling setArg with buffer, buffer, and buffer.
The first two are wrong; you should be passing an int and a float (and you also don’t need to create three buffers, just one).

So I surmise that inside your kernel the value of maxIt is very large, the kernel runs for a long time, and it gets terminated.
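As a rough sketch of what the corrected setup could look like (assuming max_iterations really is an int and unit a float, as your host code suggests):

    cl::Buffer buffer_i(context, CL_MEM_READ_WRITE, sizeof(int) * w * h); //the only buffer needed

    cl::Kernel kernel_colour(program, "calculateColour");
    kernel_colour.setArg(0, max_iterations); //plain int, passed by value
    kernel_colour.setArg(1, unit);           //plain float, passed by value
    kernel_colour.setArg(2, buffer_i);       //output buffer

With the scalars passed by value there is nothing to enqueueWriteBuffer for them either; only the read of buffer_i remains.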

If you don’t have a way to single-step the kernel (probably not, unless you have a fancy debugger), then check your assumptions, starting with the values of the input parameters: write them to the buffer and exit, then read the buffer on the host and see whether the values are correct. You can work through the kernel piece by piece to debug it this way.
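For example, a throwaway variant of the kernel along these lines (just a sketch, the name checkArgs is made up) lets you read the argument values back on the host and would have shown the mismatch immediately:

    kernel void checkArgs(const int maxIt, const float unit, global int* tempI){
        //write the scalar arguments into the output buffer and do nothing else,
        //so the host can read them back and check that they arrived as expected
        tempI[0] = maxIt;
        tempI[1] = (int)(unit * 1000000.0f); //crude scaling so the float survives the int cast
    }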

Thank you once again for your reply. I am a little busy with other things but I will try what you have suggested when I have time.

I have now set the arguments correctly and it works like a charm! Thank you!