but so far nobody has replied to me. I searched quite a bit in the meantime, including reading the specification, but was unable to find an example of how to use a struct in a kernel.
Can someone take a look at my example and tell me what I should do in this case? I tried replacing all member types with cl_float4/cl_float etc. in my host code, but it does not work on an nvidia card either.
I don’t see anything wrong with your code and you already said that the number of constant kernel arguments is 4 so that’s not an issue either.
Between that and the fact that it works on ATI, it looks quite clearly like a bug in NVidia’s OpenCL drivers.
I’m sorry I don’t have any advice on how to work around the issue. You could try commenting out some of the struct fields and see if at some point the problem goes away.
I printed sizeof(KParam) on both the host and the device and found that the two sizes differ for the code I posted at nvidia’s forum: it is 180 bytes in the host code and 192 bytes in the CL kernel. I prefixed all type names with cl_ in the host definition, and now the sizes are the same.
In your opinion, if I don’t use the cl_ prefix on the types, will there be misalignment when passing the 180-byte host struct to the 192-byte device struct? Where does the padding happen? Is it at the very end of the struct, or can it be between two elements?
I also found that the segfault may not be caused solely by the constant parameter, but also by some bugs in nvidia’s compiler in handling nested if-statements. I am still investigating this.
In your opinion, if I don’t use the cl_ prefix on the types, will there be misalignment when passing the 180-byte host struct to the 192-byte device struct? Where does the padding happen? Is it at the very end of the struct, or can it be between two elements?
Ah, I missed that. Yes, you must always use cl_xxx types on the API side, as they are guaranteed to match the size of their cousins in OpenCL C (except for size_t and bool). For example, cl_long on the API side is equivalent to long in OpenCL C (a 64-bit signed integer).
Padding can happen either between struct members or at the end of the struct. This comes from C99 actually.
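To make the padding question concrete, here is a small host-only sketch in plain C11. It does not use the OpenCL headers: float4_aligned is a hypothetical stand-in for cl_float4 (four floats with 16-byte alignment, which is what OpenCL C requires for float4), and ParamsPlain / ParamsAligned are made-up structs, not the KParam from this thread. It assumes a 4-byte float.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* printf */

/* Hypothetical stand-in for cl_float4: 16 bytes, 16-byte aligned. */
typedef struct { _Alignas(16) float s[4]; } float4_aligned;

/* Same payload with only float alignment, like a plain float[4] on the host. */
typedef struct { float s[4]; } float4_packed;

/* Illustrative host struct built from plain types. */
typedef struct {
    float         scale;
    float4_packed origin;
    float         step;
} ParamsPlain;

/* Same members, but the vector member carries the 16-byte alignment. */
typedef struct {
    float          scale;
    float4_aligned origin;
    float          step;
} ParamsAligned;

/* Padding inserted between `scale` and `origin` in the aligned layout. */
size_t padding_before_origin(void) {
    return offsetof(ParamsAligned, origin) - sizeof(float);
}

/* Padding appended after `step`, the last member of the aligned layout. */
size_t padding_at_end(void) {
    return sizeof(ParamsAligned) - (offsetof(ParamsAligned, step) + sizeof(float));
}

/* Print size and member offsets of both layouts side by side. */
void print_layouts(void) {
    assert(sizeof(float) == 4);  /* assumption behind the numbers discussed here */
    printf("plain:   size=%zu origin@%zu step@%zu\n",
           sizeof(ParamsPlain),
           offsetof(ParamsPlain, origin),
           offsetof(ParamsPlain, step));
    printf("aligned: size=%zu origin@%zu step@%zu\n",
           sizeof(ParamsAligned),
           offsetof(ParamsAligned, origin),
           offsetof(ParamsAligned, step));
}
```

Calling print_layouts() from main should show the plain struct at 24 bytes with origin at offset 4, and the aligned one at 48 bytes with origin at offset 16: 12 bytes of padding in the middle and another 12 at the end. So a size mismatch like 180 vs 192 probably means member offsets differ too, not just the tail, and copying the smaller host struct into the device struct would shift every field after the first padded one. Comparing offsetof values for each member on the host against what the kernel expects is one way to find where the layouts diverge.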