CLH_ERROR_NO_BINARY_FOR_GPU when not using sampler!?

memo · November 20, 2009, 4:20pm

Hi all, my very simple kernel code is below.

__kernel void msacv_multiplyf(write_only image2d_t dstImage, read_only image2d_t srcImage, const float f, sampler_t smp) {
	int2 coords = (int2)(get_global_id(0), get_global_id(1));
	float4 color = read_imagef(srcImage, CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, coords);
	write_imagef(dstImage, coords, color * f);
}

You will see that I have a parameter that isn’t used: sampler_t smp. If I remove that parameter, I get the compile error below! I’ve googled this message and it seems a few people are getting it in odd situations, but I haven’t seen anyone get it because of this.

***** Error building program. *****

OF_ERROR: Error while compiling the ptx module: CLH_ERROR_NO_BINARY_FOR_GPU
PTX Info log:
PTX Error log:

The odd thing is, in the same program, I have the kernels below, and the program builds fine with these in!

//--------------------------------------------------------------
__kernel void msacv_invert(write_only image2d_t dstImage, read_only image2d_t srcImage) {
	int2 coords = (int2)(get_global_id(0), get_global_id(1));
	float4 color = read_imagef(srcImage, CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, coords);
	color = (float4)(1.0f, 1.0f, 1.0f, 1.0f) - color;
	write_imagef(dstImage, coords, color);
}

//--------------------------------------------------------------
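__kernel void msacv_threshold(write_only image2d_t dstImage, read_only image2d_t srcImage, const float thresholdLevel) {
	int2 coords = (int2)(get_global_id(0), get_global_id(1));
	float4 color = read_imagef(srcImage, CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, coords);
	// for vector arguments, isgreaterequal returns -1 (all bits set) for true and 0 for false;
	// write_imagei is only defined for images with a CL_SIGNED_INT8/16/32 channel type
	write_imagei(dstImage, coords, isgreaterequal(color, thresholdLevel));
}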

what am I missing?

cheers,

memo.

dbs2 · November 21, 2009, 11:29pm

It looks like you’ve found a bug in Nvidia’s PTX compiler. You should file a bug report with the vendor (in this case I’m guessing you’re using Apple’s OpenCL). An error like this, reported with a PTX log, indicates that the OpenCL compilation itself went fine but there was a problem when the GPU’s compiler tried to turn the PTX into the final binary.

As an aside, you will not get very good performance out of these kernels because they are so small. The overhead of the kernel setup and the get_global_id() calls will be quite large compared to the operations you are doing. (This is one area where GLSL would be far faster as it gets the global_id information for free in hardware and has optimized write patterns.) If you apply multiple kernels like this to an image you should seriously consider combining them into one kernel so there is more work done per kernel invocation.

PaulS · November 22, 2009, 1:18am

You’re right, that error does seem to be a general “something went wrong!” error, and the reasons for it are numerous. Sometimes it’s worth compiling for the CPU, as that can pick up errors in your code that are missed when compiling for the GPU… and sometimes it just works and doesn’t tell you anything.

I think in this case it’s because a sampler_t might not be passable as a kernel parameter, since it’s a special type. This is a bug, as the spec says “The sampler can be passed as an argument to the kernel using clSetKernelArg, or it can be a constant variable of type sampler_t declared in the program source”, but it wouldn’t surprise me if the implementation had a problem with it.

Looks like another one to have a radar filed on it. At least you’ve already got a simple kernel that shows the problem.

memo · November 22, 2009, 2:52am

Thanks for the replies. Yeah, it looks very suspicious; I’ve filed radar 7414906 (this was on 10.6.2 with the latest MacBook Pro, 9600GT). I can’t understand why msacv_threshold would work and msacv_multiplyf won’t; they are almost identical!

Regarding

If you apply multiple kernels like this to an image you should seriously consider combining them into one kernel so there is more work done per kernel invocation.

That was actually another question I had; I will definitely do that.