Problem passing struct kernel arguments (Apple)

I’ve run into a problem where I would like to pass one or more simple structures to my OpenCL kernel. I am aware of potential issues with packing and endianness. I’m prepared to work with those carefully, but I don’t think that’s what’s going on now.

Bottom line: I can’t pass a simple structure as a kernel argument into my kernel compiled on Mac OS X 10.7.2 for an NVIDIA GeForce 8600M GT (OpenCL 1.0 supported).

Struct passing works as expected on the same machine when compiled for an Intel CPU. It also works fine on a Linux machine when compiled for a Fermi GPU (CUDA 4.1).

I’m thinking this is a bug with Apple’s implementation, but want to see what the community thinks before reporting it. Also, can anyone confirm this bug on a different Mac GPU device?

Links are for two simple demo programs. The first demonstrates the struct problem (at least when run on my setup). Desired output is:

Input to kernel was: 2002, output back was 2002.

The second demonstrates successful results when the native type is passed directly (not contained in the struct).

The key point is that passing a kernel argument

struct { int intMember; }

does not work while the native-type argument

int nativeArg

does.

Struct Argument: http://dl.dropbox.com/u/54478577/supporting%20forum%20posts/structProblemDemo.cpp

Native Argument: http://dl.dropbox.com/u/54478577/supporting%20forum%20posts/nativeScalarOKDemo.cpp

Build commands are in the leading source file comments.

Thanks.

It does indeed look like passing structures by value to an OpenCL kernel doesn’t work on Apple.

I reported this to Apple as a Bug Report back on April 3rd. Not that that has gone anywhere since, but maybe someday.

Although your own structs cannot be passed as arguments to OpenCL kernels, you can pass built-in vector types such as int4 or uint4 by value as a workaround. You can #define meaningful names for the x, y, z, w fields:

#define m_numElements x

__kernel void name(int4 args)
{
    if (args.m_numElements > 0)
    {
        // args.x is readable here under its meaningful name
    }
}
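
For what it’s worth, a minimal host-side sketch of that workaround might look like the following. It assumes four hypothetical parameters (the lane names are mine, not from any real program) and uses a plain struct as a stand-in for cl_int4 so that it builds without the OpenCL headers. In a real host program you would fill a cl_int4 and hand its address to clSetKernelArg with sizeof(cl_int4).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for cl_int4 (assumption: four packed 32-bit ints, lanes x, y, z, w). */
typedef struct { int32_t s[4]; } int4_arg;

/* Hypothetical lane assignments; the kernel side would use matching #defines. */
enum { LANE_NUM_ELEMENTS = 0, LANE_STRIDE = 1, LANE_OFFSET = 2, LANE_FLAGS = 3 };

/* Pack four scalar parameters into one vector-typed kernel argument. */
static int4_arg pack_int4_args(int32_t num_elements, int32_t stride,
                               int32_t offset, int32_t flags)
{
    int4_arg a;
    a.s[LANE_NUM_ELEMENTS] = num_elements;
    a.s[LANE_STRIDE]       = stride;
    a.s[LANE_OFFSET]       = offset;
    a.s[LANE_FLAGS]        = flags;
    return a;
}

/* Read one lane back by index, for checking what was packed. */
static int32_t int4_lane(const int4_arg *a, int lane)
{
    assert(a != 0 && lane >= 0 && lane < 4);
    return a->s[lane];
}
```

Calling pack_int4_args(2002, 1, 0, 0) puts 2002 in the x lane, which the kernel above would then read as args.m_numElements.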

I can confirm that I came across this problem previously on a Mac (OS X 10.6.8). I never found a solution, so thanks for filing the bug report. I presume that you’re getting the same error number as me (-51: CL_INVALID_ARG_SIZE). Actually, I’m a little disappointed when I look at Apple’s more recent efforts regarding OpenCL. I get the feeling from their website that they might not be as committed to it as they once were.

Erwin: regarding your workaround, do you think that there would be any performance benefit to using an int4 buffer over 4 buffers of ints? I am currently using a random number generator with 4 state variables, so this would be quite interesting to me.

Thanks,
Dave.

Dave, I do not receive any error when setting the argument, just incorrect struct values within the kernel.

Here’s an update after testing a few different platforms and GPUs. The bug seems specific to Apple Mac OS X with an NVIDIA GPU.

FAILS:
Mac OS X 10.7.4 - NVIDIA GeForce 8600M GT
Mac OS X 10.8.2 - NVIDIA GeForce GT 650M
Mac OS X 10.8.2 - NVIDIA Quadro 4000

WORKS:
Mac OS X 10.8.2 - ATI Radeon HD 5770
Linux NVIDIA CUDA SDK - Tesla M2050
Linux AMD APP SDK - Xeon CPU
Linux AMD APP SDK - ATI Radeon HD 5770

Amending my previous update: in slightly more complicated struct cases, the Apple/AMD GPU case fails as well.

I’m on OSX and, all day long, I pass in a structure containing dozens of variables with no problems. This struct contains bytes, ints, floats, and arrays of all three…

No problems under 10.7.3 or 10.7.4. (I use …4 on the laptop for coding, and …3 on the production Mac Pro, since …4 breaks OpenCL for the AMD GPU on that machine…)

Photovore, does my “Struct Argument” program linked in my original post work for you when run on your GPU device? I’m saying that program works for me on my AMD GPU. It fails on the NVIDIA GPU. (Both on 10.8.2)

Noah,

  1. I didn’t read carefully earlier; I pass my structure by address.

  2. My only NVIDIA device is in my laptop, where I run 10.7.4; indeed structProblemDemo fails:

Available devices:

device: 0

CL_DEVICE_NAME: GeForce GT 330M
CL_DRIVER_VERSION: CLH 1.0

SELECTED DEVICE: 0.

Build Log:

End of Build Log.

Setting kernel arguments…
Enqueuing kernel…
Kernel executed. retreiving results.
Input to kernel was: 2002, output back was 1335342685.

Struct size in host: 4, size in kernel: 0.

The End

I hit the same problem on OS X 10.8.4 with either a GeForce 320M or a Radeon HD 6970M :(

As a workaround, I replaced the argument that receives the struct by value with one that receives it by reference (that is, a buffer object).
That way I could read the member values correctly. But if the struct contains an array as a member, I couldn’t read the individual elements of the array.

I know this thread is no longer new, but I’m posting this report anyway :)
The problem is still there.

I’m having the same problem on a Retina MacBook Pro with a GeForce GT 650M running 10.8.5. Passing in a struct with 4 float4s freezes up my machine. On an older MacBook with an ATI 6750M it doesn’t crash, but the values passed in aren’t correct.

You cannot pass structs as kernel arguments. There is nothing in the spec that says you can. You can only pass basic types, vector types, and mem objects. To pass structs you need to upload them as buffers or, as another poster suggested, use a vector type (I’ve used float16 to pass 16 float parameters in).

There’s nothing in the spec that says you can’t. In fact, the major implementations of OpenCL on Windows (Intel, NVIDIA, AMD) can pass structure arguments to a kernel, so this looks like a bug in Apple’s implementation.

Passing structure arguments is covered by §5.7.2 of the OpenCL 1.2 spec, “Setting Kernel Arguments”, under the “other kernel arguments” case: arg_value should be a pointer to the structure data and arg_size the size of the structure.

Technically, passing a structure as argument is nothing more than copying its content into a __global or __constant buffer and transparently passing this buffer to the kernel. So if an implementation is able to pass a mem buffer containing an array of structures to a kernel, it can also pass a structure as value to a kernel.

I can confirm that structs still do not work with Apple OpenCL. If I pass a struct with 4 ints, the kernel receives the first 2 ints correctly but the other 2 members are zero. This is with the Iris Pro GPU on OSX 10.9.

Typically my kernels have between 10 and 20 int-sized parameters, and it’s very convenient to be able to share the same structure type between CPU and GPU. So this bug really sucks. As a workaround I’m now looking into casting my struct into a cl_ulong16. This has enough space and seems to work even with Apple, but it’s terribly ugly.
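
Roughly what I have in mind on the host side is sketched below. This is only a sketch under assumptions: KernelParams and its fields are hypothetical, and ulong16_arg is a plain stand-in for cl_ulong16 (16 x 64-bit lanes, 128 bytes) so the snippet builds without the OpenCL headers. It uses memcpy rather than a pointer cast to stay clear of alignment and aliasing trouble, and checks at compile time that the struct fits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for cl_ulong16 (assumption: 16 x 64-bit lanes = 128 bytes). */
typedef struct { uint64_t s[16]; } ulong16_arg;

/* Hypothetical parameter struct shared between host and kernel code. */
typedef struct {
    int32_t width;
    int32_t height;
    int32_t depth;
    int32_t stride;
    int32_t extra[12];
} KernelParams;

/* Refuse to compile if the struct would not fit into the vector argument. */
_Static_assert(sizeof(KernelParams) <= sizeof(ulong16_arg),
               "KernelParams must fit in 128 bytes");

/* Copy the struct bytes into a zero-padded 128-byte block. */
static ulong16_arg pack_params(const KernelParams *p)
{
    ulong16_arg out;
    assert(p != 0);
    memset(&out, 0, sizeof out);
    memcpy(&out, p, sizeof *p);
    return out;
}

/* Inverse of pack_params, handy for checking the round trip on the host. */
static KernelParams unpack_params(const ulong16_arg *a)
{
    KernelParams p;
    assert(a != 0);
    memcpy(&p, a, sizeof p);
    return p;
}
```

The packed block would then go to clSetKernelArg with sizeof(cl_ulong16). The kernel still has to reinterpret the lanes back into the individual fields, which is not shown here and is where most of the ugliness lives.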

I’m wondering though what is considered the best way to pass kernel parameters of this size? From what I understand kernel parameters map into __private space which is scarce. And __private maps into registers which are per-thread while the kernel parameters are identical and const for all threads in a work group. Or does the compiler recognise this and use shared memory?

Or is uploading my struct to __constant memory a better solution? But this requires an enqueueWriteBuffer for each kernel execution plus waiting for the upload completion event, which seems inefficient if it’s just 100 bytes.

Thanks
Joost