Endianness in kernel arguments

Hi,

I’m new to OpenCL and have been reading threads about endianness, but the more I read, the more confused I get.

Let’s suppose a kernel is going to receive arrays of floats and unsigned ints from the host. So the host initializes such arrays, creates memory objects from them, and sets them as kernel arguments.

But… must the host byte-swap each float and each unsigned int in the array if the device and the host have different endianness?

I hope not, because otherwise sending floats to devices and reading them back would be a complex task. Doesn’t OpenCL have some convenience API functions for this?
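For reference, converting such a buffer on the host means reversing the bytes of each 32-bit element before writing it to the device and again after reading it back. A minimal sketch in plain C, assuming a 32-bit IEEE 754 float and a 32-bit unsigned int (which is what OpenCL’s float and uint are); swap_u32, swap_u32_array and swap_float_array are hypothetical helper names, not OpenCL API functions:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Reverse the byte order of a 32-bit unsigned integer. */
static uint32_t swap_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Byte-swap every element of a uint32_t array in place. */
static void swap_u32_array(uint32_t *a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = swap_u32(a[i]);
}

/* Byte-swap every element of a float array in place.
 * memcpy reinterprets the bits without violating strict aliasing;
 * assumes float is 32 bits wide. */
static void swap_float_array(float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t bits;
        memcpy(&bits, &a[i], sizeof bits);
        bits = swap_u32(bits);
        memcpy(&a[i], &bits, sizeof bits);
    }
}
```

Applying the same function twice restores the original data, so one helper serves both directions (before clEnqueueWriteBuffer and after clEnqueueReadBuffer).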

Thanks!

For kernel arguments the implementation takes care of endianness. For data buffers (cl_mem objects created with clCreateBuffer) it is your responsibility. However, I haven’t yet heard of an implementation where the device’s endianness doesn’t match the host’s, so I wouldn’t worry about it until you encounter one (you’d have no way to test it anyway).

Do you mean that NVIDIA and AMD OpenCL implementations are little-endian, just as Intel CPUs are little-endian on every OS?

I’m coming to the conclusion that all the OpenCL examples in books and on the web are wrong: they don’t check whether the host’s endianness matches the devices’. So all OpenCL demos work by accident, just because the endianness happens to be the same, and they would fail on any system where it isn’t.

Likely true, but again, until such a mismatched implementation exists, how would you test that you handled it correctly? It seems like a lot of extra work for an unlikely scenario.

I suppose all software would fail on implementations where the host and OpenCL device endianness differ, so a vendor would match the host endianness through transparent conversion, even if the spec says otherwise, just to make sure apps keep working.

Honestly, in my opinion the spec is broken here: the API functions for managing buffers should use the host endianness, and if the device endianness differs, it should be the OpenCL implementation’s responsibility to perform the conversion transparently. I don’t know what the spec designers were thinking when they left it like this.

That’s simply not possible, since the runtime doesn’t know what you’re storing in the buffers. They could hold data types of any size in any combination. There is no way for it to know which bytes to swap.
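To make the point concrete: the same raw bytes need a different swap depending on the element width, and only the program knows that width. A small sketch in plain C that reverses bytes within fixed-width groups (swap_groups is a hypothetical helper name):

```c
#include <stddef.h>

/* Reverse the bytes inside each consecutive group of `width` bytes
 * (e.g. 2, 4 or 8). This is the only kind of conversion a runtime could
 * apply to raw bytes, and it cannot be done without knowing `width`,
 * i.e. the element type the kernel will use. */
static void swap_groups(unsigned char *buf, size_t len, size_t width)
{
    for (size_t off = 0; off + width <= len; off += width) {
        for (size_t i = 0; i < width / 2; ++i) {
            unsigned char t = buf[off + i];
            buf[off + i] = buf[off + width - 1 - i];
            buf[off + width - 1 - i] = t;
        }
    }
}
```

Called on the bytes 01 02 03 04, a width of 4 (one 32-bit value) gives 04 03 02 01, while a width of 2 (two 16-bit values) gives 02 01 04 03. Both are correct for some interpretation of the buffer, and nothing in the bytes themselves says which one the kernel expects.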

That’s what the “endian” attribute is for: use __attribute__((endian(host))) or __attribute__((endian(device))) to tell OpenCL which endianness the data in a buffer uses. The default is device endianness (cf. section 6.11.3, “Specifying Attributes of Variables”).

There is, however, a real problem with kernel arguments passed by value. The spec explicitly says:

“When the host and device have different endianness, the developer must ensure that kernel argument values are processed correctly. The implementation may or may not automatically convert endianness of kernel arguments. Developers should consult vendor documentation for guidance on how to handle kernel arguments in these situations.”

It’s a pity there isn’t even a device query to tell how kernel arguments are handled in this case. However, I can’t imagine an implementation that would not convert arguments automatically: otherwise even a simple int passed as an argument would have to be byte-reversed by hand in the kernel.
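To see what that manual reversal would guard against: a little-endian host stores the int 1 as the bytes 01 00 00 00, and a big-endian device reading those bytes unconverted would see 16777216. A sketch in plain C of the two ways of assembling the same four bytes (load_le32 and load_be32 are hypothetical helper names):

```c
#include <stdint.h>

/* Assemble a 32-bit value treating the first byte as least significant
 * (little-endian order). */
static uint32_t load_le32(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}

/* Assemble the same bytes treating the first byte as most significant
 * (big-endian order). */
static uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

With the bytes {1, 0, 0, 0}, load_le32 returns 1 and load_be32 returns 16777216 (0x01000000), which is why an unconverted scalar argument would be silently wrong rather than obviously broken.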

Using void pointers in C/C++ (e.g. void *ptr) is a convenient form of data abstraction (one that must be handled with care, of course). But in the end, you can’t do anything with a void pointer by itself: you must cast it to a pointer to some concrete type, an aggregate type, or whatever you need before you can use it.

The same goes for OpenCL. The fact that OpenCL doesn’t care how you cast a buffer is no excuse. It should know, and there should be a way to tell OpenCL what a buffer contains (yes, even if it’s an array of a non-trivial aggregate type).
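As an illustration of what describing a buffer’s contents could look like, here is a sketch of a layout-driven conversion in plain C. Nothing like this exists in the OpenCL API; field_desc and swap_records are hypothetical names, and the sketch assumes the layout lists every multi-byte scalar field of the record with its offset and size:

```c
#include <stddef.h>

/* Hypothetical description of one scalar field inside a record:
 * its byte offset and its size in bytes. Not an OpenCL type. */
typedef struct {
    size_t offset;
    size_t size;
} field_desc;

/* Reverse the bytes of one field in place. */
static void reverse_bytes(unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n / 2; ++i) {
        unsigned char t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}

/* Convert `count` records of `record_size` bytes each, field by field,
 * using the layout in `fields`. Bytes not covered by any field
 * (e.g. padding) are left untouched. */
static void swap_records(void *buf, size_t count, size_t record_size,
                         const field_desc *fields, size_t nfields)
{
    unsigned char *base = buf;
    for (size_t r = 0; r < count; ++r)
        for (size_t f = 0; f < nfields; ++f)
            reverse_bytes(base + r * record_size + fields[f].offset,
                          fields[f].size);
}
```

For a struct S { uint32_t a; uint16_t b; uint16_t c; }, the layout would be { {offsetof(S, a), 4}, {offsetof(S, b), 2}, {offsetof(S, c), 2} }. A runtime given such a descriptor could in principle do the conversion itself; without one, it only has raw bytes.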

Otherwise, as I see it, the only reasonable way to program OpenCL is to assume that host and device endianness match, and accept that your program will malfunction if they don’t (well, you could notify the user: “Unable to run program because of endianness mismatch between host and devices”).
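If you take that route, the check is short. A sketch in plain C, assuming the device’s byte order has already been obtained with clGetDeviceInfo(device, CL_DEVICE_ENDIAN_LITTLE, sizeof(cl_bool), &flag, NULL); the flag is passed in as a plain int so the snippet builds without OpenCL headers, and host_is_little_endian and check_endianness are hypothetical helper names:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if this host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}

/* device_is_little: the value reported for the device by
 * clGetDeviceInfo(..., CL_DEVICE_ENDIAN_LITTLE, ...).
 * Returns 0 if host and device agree; otherwise prints a message and
 * returns -1 so the caller can skip that device. */
static int check_endianness(int device_is_little)
{
    if (host_is_little_endian() != (device_is_little != 0)) {
        fprintf(stderr, "Unable to run program because of endianness "
                        "mismatch between host and device\n");
        return -1;
    }
    return 0;
}
```

Running this per device rather than once per platform lets the program keep using the devices that do match.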

What the OpenCL spec does (passing this responsibility to application developers) is simply absurd and impractical. There’s no practical way to deal with it without making your code absurdly complicated.