LLVM compilation failure -- why?

So I have this kernel which compiles fine when the indicated line is commented out, but causes this error when the line is in:

cvmsErrorCompilerFailure: LLVM compiler has failed to compile a function.

Any ideas why?

__kernel void Convolve(
        global const uint *img,
	write_only image2d_t newImg,
	const uint *filter, const int filterWidth, const int filterHeight,
	const int w, const int h, const int wpl)
{
        private const int gid = get_global_id(0);
	private const int x = gid%w;
	private const int y = gid/w;
	private const int halfWidth = filterWidth/2;
	private const int halfHeight = filterHeight/2;

	private uint sum = 0;
	private uint filterPointer = 0;
        private uint xfrom = (x-halfWidth)/32;
        private uint xto = (x+halfWidth)/32;
        private uint xwidth = xto - xfrom + 1;
        private uint imgCache[xwidth*(2*halfHeight+1)];
        private uint imgCachePtr = 0;


        for (int v=y-halfHeight; v<=y+halfHeight; v++)
            for (int u=xfrom; u<=xto; u++, imgCachePtr++)
            {
                bool outOfBounds = (v<0 || v>=h || u<0 || u>=wpl);
                int uu = outOfBounds ? 0 : u;
                int vv = outOfBounds ? 0 : v;
                uint val = img[wpl*vv + uu];
                val = outOfBounds ? 0 : val;
                imgCache[imgCachePtr] = val;
            }


	for (int v=0; v<filterHeight; v++)
		for (int u=0; u<filterWidth; u++, filterPointer++)
		{
			int yy = y + v - halfHeight;
			int xx = x + u - halfWidth;
                        bool outOfBounds = (yy < 0 || yy >= h || xx < 0 || xx >= w);
                        yy = outOfBounds ? 0 : yy;
                        //xx = outOfBounds ? 0 : xx;
                        imgCachePtr = v*xwidth + (xx/32) - xfrom;

                        uint buggy = (imgCache[imgCachePtr] >> (31 - (xx & 31))) & 0x01;
                        buggy = outOfBounds ? 0 : buggy;
                        buggy = 255 * (1 - buggy);
                        sum += buggy * filter[filterPointer]; // Why does this line cause a compile failure?????

                        uint val = (img[wpl*yy + xx/32] >> (31 - (xx & 31))) & 0x01;
                        val = outOfBounds ? 0 : val;
                        val = 255 * (1 - val);
                        sum += val * filter[filterPointer]; // .... and this line does not????

		}


	uint lowerBound = max( (uint)0, sum );
        uint upperBound = min( (uint)255, upperBound );
	write_imageui(newImg, (int2)(x, y),
		(uint4)( upperBound, 0, 0, 0) );

}

The error I can see here is the declaration one of your parameters: const uint *filter.
Pointer argument always must point to address space global, local, or constant. So try with: constant uint *filter.

This looks like a known bug in 10.6.0/10.6.1 with indexing into constant arrays with a variable index. I’m afraid there’s no work-around except to move your data out of the constant space until Apple fixes it.

Replaced const uint *filter with global uint *filter, same error. I think there’s something more fundamental and buggy going on here. Otherwise, why would the code compile the second section – uint val accessing an element in global space (img) and multiplying by an element in global space (filter), when the code in the first section fails-- uint buggy accessing an element in private space (imgCache) and multiplying by an element in global space (filter).

I did want to rely on the global memory in img being cached by the card, but the GeForce 8600M GT has no global cache. So I figured that I would be clever and copy the part of img that I needed into private space, but noooooooo…

Given that this is my goal, can anyone think of a workaround?

Or, and I’m just guessing here, maybe there’s a problem with variable arrays on the stack. When the line is in, the array is accessed, which causes the compiler to blow up. When the line is out, the compiler can optimize the array access out, which causes the compiler to not allocate the array on the stack, so it compiles OK…?

Indeed, that was the case. I replaced the variable array with a fixed-size array, and that worked. And, I didn’t know that until I did a search for “arrays” in the standard and discovered in section 6.8 (Restrictions):

d. Variable length arrays and structures with flexible (or unsized) arrays are not supported.

Looks like I’ll be reading this section a bit more carefully!

Sounds like the compiler should be generating an error there instead of failing like that.

BTW – everything is private by default so you don’t have to put in most (if not all) of those “private” declarations.

Well, yes, the compiler should generate an error. But this is Apple’s OpenCL compiler, no? The one which is kinda fragile… I hope they pay attention to the bug reports and fix them for 10.6.2.

Thanks – I was pretty sure local stack was private, but I guess inertia got the better of me.

Apple bug report 7375696 filed.

That may or may not be a failure in Apple’s compiler, actually. LLVM is usually really good about telling you what failed and where. If you get less descriptive errors it may be because LLVM generated code and passed it on to the GPU vendor’s compiler which choked on it. That tends to get reported back in a less-than-helpful manner as some generic failure.