CL_OUT_OF_RESOURCES when calling clEnqueueReadBuffer

Hello guys,

I’m writing a program with two OpenCL kernels. Both of them run fine on their own, but eventually I want to run them in a loop many times. The output of kernel 1 is the input of kernel 2, so my program looks like this:



main()
{
     initialize kernel 1;
     initialize kernel 2;

     for(int i=0;i<n;++i)
     {
            initialize the input for kernel 1;    
            execute kernel 1;
            read the result of kernel 1;

            initialize the input for kernel 2;
            execute kernel 2;
            read the result of kernel 2;
     }
}


However, my code only gets through two iterations; on the third I get a CL_OUT_OF_RESOURCES error when I read the result of kernel 2:


     ciErr1 = clEnqueueReadBuffer(cqCommandQueue, cmDevNeighbors, CL_TRUE, 0, sizeof(cl_float) * iNumElements*60, neighbors, 0, NULL, NULL);

Here are the things I don’t quite understand:

First, according to the online OpenCL 1.0 specification (http://www.khronos.org/opencl/sdk/1.0/d … rrors.html), clEnqueueReadBuffer should not return CL_OUT_OF_RESOURCES.

Second, my understanding is that CL_OUT_OF_RESOURCES means my kernel uses up all the registers. But why would a read-back function need registers?

Third, the kernels run perfectly twice and only fail on the third iteration, even though both the kernel code and the size of the input arrays are fixed. If the kernels can run once, the resources should be enough for every following execution. Why does it stop at the third?

One thing I’m not sure about: I don’t release my cl_mem objects after each execution. Instead I reuse them by writing the new data into them. The size of the data is fixed, so the program looks like this:



// Create the device buffer once, before the loop.
cmDevBuffer = clCreateBuffer(cxGPUContext, CL_MEM_READ_ONLY, sizeof(cl_int) * hostBuffer.size(), NULL, &ciErr1);

// Bind it to argument 2 of kernel 1.
ciErr1 = clSetKernelArg(Kernel1, 2, sizeof(cl_mem), (void*)&cmDevBuffer);

// Blocking write of the initial data.
ciErr1 |= clEnqueueWriteBuffer(cqCommandQueue, cmDevBuffer, CL_TRUE, 0, sizeof(cl_int) * hostBuffer.size(), &(hostBuffer[0]), 0, NULL, NULL);

for (int i = 0; i < n; ++i)
{
    run kernel1;

    // Reuse an existing buffer: overwrite it with this iteration's data.
    ciErr1 = clEnqueueWriteBuffer(cqCommandQueue, cmDevCellPointsBeginAndEnd, CL_TRUE, 0, sizeof(cl_int) * cellPointsBeginAndEnd.size(), &(cellPointsBeginAndEnd[0]), 0, NULL, NULL);
}


Should I release cmDevBuffer at the end of each loop iteration and create a new one at the start of the next? Like this:



for(int i=0;i<n;++i){
     cmDevBuffer=clCreateBuffer(...);
     clSetKernelArg();
     clEnqueueWriteBuffer();

     call Kernel1;

     call Kernel2;

     clEnqueueReadBuffer();
     clReleaseMemObject(cmDevBuffer);
}

I didn’t do this because I thought it was unnecessary, as the size of the buffer doesn’t change. But now I do have the out-of-resources problem.
I will try this option, but if it doesn’t work I have no idea how to fix this.

Has anybody had the same problem before, or does anyone have an idea of what the problem might be?

Thank you very much.

An interesting thing happened.

The code was running on my MacBook with an NVIDIA GeForce 9400M card.

Now I’ve switched to my desktop with an 8800 GTX, and the code runs perfectly. I looped the code 50 times and there was no problem.

Could this be a driver bug? As far as I know, the notebook driver is still a demo version.

You see, coding on a graphics card is always a frustrating experience: even when everything is correct, things don’t work.

I take it you’re not using Mac OS X here based on your comment about driver demo code. I would suggest trying your program under OS X and either providing a context error callback function or setting the environment variable CL_LOG_ERRORS=stdout. This will give you far more detailed error reporting.

However, I don’t see any reason why you should get this error. You should not need to free and recreate cl_mem objects, and doing so would be a big performance hit. My guess is that this is a driver bug, probably related to the different amounts of VRAM available on the desktop (512MB) and laptop (256MB) devices.
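For the context error callback mentioned above, a minimal sketch might look like the following. The parameter list matches the pfn_notify argument of clCreateContext; the error_log struct and the name context_error_cb are made up for illustration, and the block deliberately avoids including the OpenCL headers so that it compiles on its own.

```c
#include <stddef.h>
#include <stdio.h>

/* CL_CALLBACK comes from the OpenCL headers on platforms that need a
   calling-convention attribute. Fall back to nothing so this sketch
   compiles without them. */
#ifndef CL_CALLBACK
#define CL_CALLBACK
#endif

/* Hypothetical bookkeeping struct, handed to the callback via user_data. */
typedef struct {
    int error_count;
} error_log;

/* Same parameter list as the pfn_notify argument of clCreateContext. */
void CL_CALLBACK context_error_cb(const char *errinfo,
                                  const void *private_info,
                                  size_t cb,
                                  void *user_data)
{
    error_log *log = (error_log *)user_data;

    (void)private_info; /* implementation-specific binary data, unused here */
    (void)cb;           /* size in bytes of private_info */

    if (log != NULL) {
        log->error_count++;
    }
    fprintf(stderr, "OpenCL context error: %s\n",
            errinfo != NULL ? errinfo : "(no message)");
}
```

It would be registered when the context is created, e.g. clCreateContext(NULL, 1, &device, context_error_cb, &log, &err), where device, log and err are whatever your program already has. How much detail arrives in errinfo depends on the implementation.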

Hi,

I found this page while trying to solve the same issue myself. I now think I understand what my problem was: I had a write-only buffer that exceeded the amount of memory that can be allocated in one block. (I assume the same would happen even if the array were smaller than the upper limit, as long as it still exceeded the remaining resources.)

It does not seem that odd to me that this error is raised by this method, because it is the first call I make after queuing the kernel. What I guess happens is that the kernel is queued, starts, attempts to allocate the buffer, fails, and is left in a broken state. Attempting to read then retrieves the error code.

What’s more, this would explain why your problem goes away when you change to a card with more memory.
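One way to rule this cause in or out is to compare the requested buffer size against the device limit before creating the buffer. The sketch below is only the arithmetic part; buffer_fits is a hypothetical helper, and the limit is assumed to come from clGetDeviceInfo with CL_DEVICE_MAX_MEM_ALLOC_SIZE, which reports a cl_ulong.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if count * elem_size bytes can be requested as a single buffer
   without exceeding max_alloc_bytes, and 0 otherwise.
   max_alloc_bytes is assumed to be the value reported by
   clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, ...). */
int buffer_fits(uint64_t count, uint64_t elem_size, uint64_t max_alloc_bytes)
{
    if (count == 0 || elem_size == 0) {
        return 0; /* clCreateBuffer rejects a size of 0 */
    }
    if (count > UINT64_MAX / elem_size) {
        return 0; /* the byte count would overflow 64 bits */
    }
    return count * elem_size <= max_alloc_bytes;
}
```

For the read buffer in the original post, count would be iNumElements * 60 and elem_size would be sizeof(cl_float). Passing this check does not guarantee the allocation succeeds, since other buffers on the device also take up memory.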

Andrew

Just in case anyone else reads this, some more help:

  • Calling clFinish() gets you the error status for the calculation (rather than getting it when you try to read data).

  • The “out of resources” error can also be caused by a 5 s timeout if the (NVIDIA) card is also being used as a display.

  • It can also appear when you have pointer errors in your kernel.

Andrew
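To make use of the clFinish() tip, it helps to check the status after every stage and print which one failed. Here is a small sketch of such a helper. The names cl_error_name and check_stage are hypothetical; the numeric values are the ones defined in cl.h, written as literals so the block compiles without the OpenCL headers.

```c
#include <stdio.h>

/* Map a few common OpenCL status codes to their names.
   The values match the definitions in cl.h. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    default:  return "unknown OpenCL error";
    }
}

/* Report which stage failed. Returns 1 on success, 0 on failure. */
int check_stage(int err, const char *stage)
{
    if (err != 0) {
        fprintf(stderr, "%s failed: %s (%d)\n", stage, cl_error_name(err), err);
        return 0;
    }
    return 1;
}
```

In the loop from the original post this could be used as check_stage(clFinish(cqCommandQueue), "kernel 2") right after enqueuing kernel 2, and again around clEnqueueReadBuffer, so that a kernel failure is not mistaken for a read failure.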

Andrew may well be right here. Reporting errors from the GPU is difficult because it operates asynchronously from the CPU and errors are caught at a much coarser granularity. That said, the runtime should check to make sure your data fits on the card before letting you enqueue the kernel. However, that assumption relies on the card’s memory paging system being robust, which is not always the case with GPU drivers. If you are writing off the end of an array on the GPU you can get all sorts of bad errors, as the GPU’s memory protection is not as robust as the CPU’s. Running on a card with more memory may make this disappear, as you won’t overwrite anything important. I find it is very helpful to run my kernel first on the CPU to make sure I’m not accessing memory out of bounds before moving to the GPU.
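As a rough illustration of the idea, here is a plain-C stand-in that runs a kernel body once per work-item and counts out-of-bounds accesses. Note that this is not the OpenCL CPU device itself but a hand-written emulation; checked_buf, kernel_body and run_on_cpu are all hypothetical names, and the example kernel body deliberately reads one element past the end.

```c
#include <stddef.h>

/* Hypothetical checked view of a buffer: counts out-of-bounds accesses
   instead of performing them. */
typedef struct {
    float *data;
    size_t len;
    size_t violations;
} checked_buf;

static int in_range(const checked_buf *b, long long idx)
{
    return idx >= 0 && (unsigned long long)idx < (unsigned long long)b->len;
}

float checked_load(checked_buf *b, long long idx)
{
    if (!in_range(b, idx)) {
        b->violations++;
        return 0.0f; /* substitute a harmless value */
    }
    return b->data[idx];
}

void checked_store(checked_buf *b, long long idx, float value)
{
    if (!in_range(b, idx)) {
        b->violations++;
        return; /* drop the write */
    }
    b->data[idx] = value;
}

/* Example kernel body: out[gid] = in[gid] + in[gid + 1].
   The last work-item reads in[len], which is out of bounds. */
void kernel_body(size_t gid, checked_buf *in, checked_buf *out)
{
    float v = checked_load(in, (long long)gid)
            + checked_load(in, (long long)gid + 1);
    checked_store(out, (long long)gid, v);
}

/* Run the body once per work-item and return the total violation count. */
size_t run_on_cpu(size_t global_size, checked_buf *in, checked_buf *out)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        kernel_body(gid, in, out);
    }
    return in->violations + out->violations;
}
```

A non-zero return value from run_on_cpu means the indexing needs fixing before the kernel goes anywhere near the GPU.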

Going on what Andrew said, I was just having this error on a convolution algorithm.

“it can also appear when you have pointer errors in your kernel.”
That makes sense if you had an out-of-bounds index that the GPU catches: you would be accessing a memory address that wasn’t there.
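For a convolution, the usual place this goes wrong is at the edges, where the filter window hangs off the array. A sketch of the guard, written as plain C with one outer-loop iteration standing in for one work-item (convolve_1d is a hypothetical name, and the taps are applied without flipping, treating everything outside the array as zero):

```c
#include <stddef.h>

/* 1D filter with zero padding. Taps that would fall outside the input
   are skipped instead of being read from memory that is not there. */
void convolve_1d(const float *in, size_t n,
                 const float *taps, size_t k,
                 float *out)
{
    long long radius = (long long)(k / 2);

    for (size_t gid = 0; gid < n; ++gid) {      /* one work-item each */
        float acc = 0.0f;
        for (size_t t = 0; t < k; ++t) {
            long long src = (long long)gid + (long long)t - radius;
            if (src < 0 || src >= (long long)n) {
                continue;                        /* the bounds guard */
            }
            acc += in[src] * taps[t];
        }
        out[gid] = acc;
    }
}
```

Without the guard, the first and last work-items would read before the start and past the end of the input, which is exactly the kind of pointer error that can surface later as an out-of-resources error.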