How to declare a constant array of float2 vectors?

In my kernel code I am trying to declare a constant array of float2 vectors like this:


__constant float2 grads[2] = {(float2)(1.0f,0.0f), (float2)(0.0f,1.0f)};

But this gives a rather obscure error when attempting to build the kernel:

ptxas application ptx input, line 11; fatal : Parsing error near ',': syntax error
ptxas fatal : Ptx assembly aborted due to errors
error : Ptx compilation failed: gpu='sm_20', device code='cuModuleLoadDataEx_4'
: Considering profile 'compute_20' for gpu='sm_20' in 'cuModuleLoadDataEx_4'
: Retrieving binary for 'cuModuleLoadDataEx_4', for gpu='sm_20', usage mode=' '
: Considering profile 'compute_20' for gpu='sm_20' in 'cuModuleLoadDataEx_4'
: Control flags for 'cuModuleLoadDataEx_4' disable search path
: Ptx binary found for 'cuModuleLoadDataEx_4', architecture='compute_20'
: Ptx compilation for 'cuModuleLoadDataEx_4', for gpu='sm_20', ocg options=' '

Any suggestions for the correct way to do this?

I’m using the Linux Nvidia OpenCL implementation.
This code previously worked fine on Mac, so I assumed the syntax was correct.

Thanks!

Did you declare grads at program scope or at function scope? That is, does it look like “foo” or like “bar” in the code below:


// Program scope. This is valid and should compile.
__constant float2 foo[2] = {(float2)(1.0f,0.0f), (float2)(0.0f,1.0f)};

__kernel void fubar()
{
    // Function scope. This is not valid and should not compile.
    __constant float2 bar[2] = {(float2)(1.0f,0.0f), (float2)(0.0f,1.0f)};
}

Section 6.5.3 of the OpenCL spec says:

Variables allocated in the __constant address space can only be defined as program scope variables and are required to be initialized.

Yes, it is at program scope (not function scope).

So the syntax should be correct and it should work, but it doesn’t. Any ideas? Is this a bug?

Yeah, as I mentioned earlier, if it is at program scope it should work. This is a compiler bug.

Thanks, I had the same problem and now it’s resolved!

Care to share with us how it got resolved?

Just an update on this, in case anybody else runs into the same issue:
Nvidia says it should be fixed in their r280 driver release.
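
For anyone stuck on an older driver in the meantime, one thing that might be worth trying is to keep vector literals out of the program-scope initializer altogether: store the components in a flat __constant float array and rebuild each float2 with vload2. This is only a sketch and an assumption on my part; nobody in this thread has confirmed that it avoids the ptxas error. The names grads_flat, grad_at and use_grads are made up for illustration, and the #ifndef block is just a host-side shim so the same file can be compiled as plain C for a quick sanity check (an OpenCL compiler defines __OPENCL_VERSION__ and skips it).

```c
/* Host-side shim (illustrative only): lets this file compile as plain C.
   A real OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __constant const
#define __kernel
#define __global
typedef struct { float x, y; } float2;
/* Stand-in for the OpenCL built-in: reads p[2*offset] and p[2*offset + 1]. */
static float2 vload2(size_t offset, const float *p) {
    float2 v = { p[2 * offset], p[2 * offset + 1] };
    return v;
}
/* Stand-in for the OpenCL built-in: pretend we are always work-item 0. */
static size_t get_global_id(unsigned dim) { (void)dim; return 0; }
#endif

/* Program scope, scalar initializers only: (1,0) followed by (0,1). */
__constant float grads_flat[4] = { 1.0f, 0.0f,   0.0f, 1.0f };

/* Hypothetical helper: rebuild the i-th float2 from the flat array. */
float2 grad_at(size_t i) {
    return vload2(i, grads_flat);
}

/* Hypothetical kernel: each work-item writes one of the two gradients. */
__kernel void use_grads(__global float2 *out) {
    size_t gid = get_global_id(0);
    out[gid] = grad_at(gid % 2);
}
```

The original float2 declaration is still the correct syntax per the spec, so I would only use something like this until a fixed driver is installed.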

I ran into the same issue (with driver 306.94) and investigated further; see the thread “OpenCL bug: __constant program scope variable and __constant kernel argument?” under CUDA Programming and Performance on the NVIDIA Developer Forums.

Kind regards,
Markus