Command Queue going invalid

PaulS · September 18, 2009, 8:23am

I’m having a problem where I have a clCommandQueue object going invalid. Is there any reason this should happen when none of my code is calling clReleaseCommandQueue?

Some context:
This is an OpenGL based program on SnowLeopard (10.6.1), but I’m not using shared GL objects.
The program renders the data, and then uses OpenCL to calculate the next frame.
If I target the GPU the command queue becomes invalid after 1-2 frames.
If I target the CPU everything works fine.
Adding extra retains on the command queue and context (just for good measure) have no effect.

dbs2 · September 22, 2009, 12:03am

Paul, try setting CL_LOG_ERRORS=stdout (I think that’s the right spelling on SnowLeopard) when you run your application or providing a context logging function when you create your contexts. (Apple provides a few sample ones in the header files you can use.)

In my experience, contexts can be invalidated if you read/write out-of-bounds of your memory objects in a kernel (the CPU is much more forgiving) or if the GPU can’t get all the memory on the card at the same time. This may be because the system needs some memory on the card for overhead. Try reducing the total amount you are trying to use at once, and verify that you are not accessing memory you haven’t allocated.

PaulS · September 22, 2009, 5:06am

Thanks for that. It looks like apple ship a few standard error handlers that you can plug into the clCreateContext call. They’re pretty self explanatory:

extern void clLogMessagesToSystemLogAPPLE(...)
extern void clLogMessagesToStdoutAPPLE(...)
extern void clLogMessagesToStderrAPPLE(...)

You’re meant to check for the relevant extension string before using them, but for debug it’s reasonable to just plug them in.

Using these I’ve managed to track it down to one particular clEnqueueReadBuffer giving this error:

[CL_INVALID_COMMAND_QUEUE] : OpenCL Error : Read caused an error that invalidated the queue (0x119090). This may be due to a resource allocation failure at execution time.

… and then next time round everything fails in a big explosion. I just need to work out why the read fails.

PaulS · September 22, 2009, 6:41am

Ok, I found out what was going on, and I’ll write up a little description to help anybody else.

My kernel is structured so that a number of float arrays are read, a calculation is performed and then the results are written out to another buffer. It was this output buffer that I was reading back, and that read was causing the failure.

Previously my code looked something like this:

kernel void foobar(float * in0, float * in1, float * out);

Notice the lack of global, local, constant, etc. I had assumed that that the system would make an informed choice as to which memory space these arrays needed to be in, but that didn’t seem to be the case when targeting the GPU (the CPU worked fine with this). I would have at least expected a default of global.

Defining the parameters like this (because the inputs are read-only):

kernel void foobar(constant float * in0, constant float * in1, global float * out);

brought this error

[CL_UNKNOWN_ERROR] : OpenCL Error : clBuildProgram failed: could not build program for device 0 (0x7365522f) (-44)
[CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
Error while compiling the ptx module: CLH_ERROR_NO_BINARY_FOR_GPU
PTX Info log: 
PTX Error log:

… which I still don’t know what it means, but it does seem to be related. When I only had a couple of things tagged constant, but it was still too big I got a sensible error during clEnqueueNDRangeKernel (i.e. after I had set the kernel parameters and therefore set the size of the constant data).

Tagging everything as global solves the issue, and get’s everything running. That probably means that the output buffer wasn’t in global space when I’d left it undefined, and so when I read it things went bang.

I’m now off to try some optimisations, as things are running slower than the CPU at the moment, but at least it’s running.

dbs2 · September 23, 2009, 9:32am

Paul, have you checked that you’re not doing anything non-constant-like with your constants? My guess is that you are using them in a way that is causing the compiler to have problems. It should generate an error in this case, though. (Are you checking the compiler log?) If the compiler is failing like this you should try to reduce this to the simplest kernel that reproduces it and file a bug with the vendor.

PaulS · September 24, 2009, 4:55am

Unfortunately that is the compiler log. Not very helpful.

I’ll try to find the smallest kernel that exhibits the problem.

PaulS · September 24, 2009, 5:18am

This seems to be it:

kernel void foobar(constant float * in,
                     global float * out) {
                     
    int gid = get_global_id(0);
        
    for (int i = 0; i < 1; i++) {
       out[gid] = in[i];
    }
}

What seems to matter is the constant array being accessed with the loop counter as the index.
I’ll report a bug.

dbs2 · September 25, 2009, 1:18am

Ah. Yes. I suspect that the compiler is trying to access the constant memory as a pointer, but that memory is handled very differently from other memories in hardware so that might cause problems. My guess is that accessing it with constant offsets will work. I would suggest filing a bug against Apple on this.

mpw · September 30, 2009, 7:33pm

This is a known problem that will be fixed in a future SnowLeopard Software Update.