Differences between certified OpenCL vendor implementations make life hard

Hey there,
I am working on a relatively big OpenCL (1.0) project (our OpenCL code totals tens of thousands of lines).

Our code has to run on the OpenCL implementations from AMD, Intel & nVidia, and nowadays this is nearly impossible, since all of those implementations (which I think are certified, but to be honest I haven’t checked) do different things in some cases …

We keep finding differences between the implementations that make the code crash or fail to compile, and I wanted to ask people more familiar with the standard whether those behaviors are to be expected. So here is a list to begin with …

  1. Passing an uninitialized variable to an OpenCL kernel causes a crash, even though the variable is not used at all (no reads, no writes) anywhere in the kernel. This situation can arise relatively easily (for example, if you have a lot of #ifdefs because you want to support all the other GPGPU platforms too). This happens only on Intel OpenCL for CPU (the latest version, and all the older versions I have tried too).

  2. Converting __global const T * restrict to a bool does not compile. Example :

__global const int * restrict ptr = NULL;
if (!ptr) {} //this does not compile
if (ptr != NULL) {} //this is okay

This happens on nVidia OpenCL (latest version).

  3. Macro redefinition causes a compilation error: “Macro redefinition is not allowed in OpenCL”. This happens only with Intel OpenCL.

  4. Can’t use the -> operator on float4. Example:

float4* foo = &bar;
foo->x = 0.f; //does not compile
(*foo).w = 0.f; //works

I am not sure which implementation did that, but I can check (most likely nVidia). It works on all other implementations.
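
For items 2 and 3, the spellings that avoid the reported errors can be collected in a minimal sketch. The explicit comparison against NULL is the form reported above as compiling; the #undef before the second #define is only an assumption about what the Intel compiler accepts and has not been verified. KERNEL_GLOBAL, first_or_default and TILE_SIZE are hypothetical names, and the host-side branch exists only so that the same text also builds as plain C for testing.

```c
/* Hypothetical shim: expands to __global under OpenCL C and to nothing
   when the same source is compiled as plain host C. */
#ifdef __OPENCL_VERSION__
#define KERNEL_GLOBAL __global
#else
#include <stddef.h>   /* NULL for the host build */
#define KERNEL_GLOBAL
#endif

/* Item 2: test the pointer with an explicit comparison instead of `!ptr`. */
int first_or_default(KERNEL_GLOBAL const int * restrict ptr, int fallback)
{
    if (ptr != NULL) {
        return ptr[0];
    }
    return fallback;
}

/* Item 3: drop the earlier definition before defining the macro again.
   That this satisfies the Intel compiler is an assumption. */
#define TILE_SIZE 8     /* earlier definition, e.g. from a shared header */
#undef TILE_SIZE
#define TILE_SIZE 16    /* the value this translation unit wants */
```

Even with spellings like these, each implementation still has to be tested separately.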

There are more, but right now these are the first that come to my mind …

Please note that each of the examples above is problematic on just one of the three OpenCL implementations we use, so we can never be sure that our OpenCL source is valid after testing on only one of them.
Clearly, either all of these are undefined behaviour, or these OpenCL implementations do not stick very closely to the standard (and if it is the latter and they are OpenCL certified, how did they get certified?).
The problem is that something works in one place (we haven’t enabled any extensions), either because somebody has decided to “extend” the language or because somebody else has a crappy OpenCL implementation, and so it does not work as “a standard”. We can never be sure that everything is okay on our side, and this has been causing a lot of headaches for quite some time now … It has to work either on all of them or on none. For example, we don’t care whether we can use float4->w, as long as it is consistent across the implementations.

Thanks,
Blagovest Taskov.

It should be noted that pretty much every widely implemented standard has these kinds of issues. OpenGL is probably the best known for these issues, but C++ compilers encounter them as well. How many “XML parsers” are there that don’t parse XML correctly (fully)? Even those standards with conformance tests and certification programs like OpenGL ES have huge variances among implementations.

Annoying though it may be, the only effective means of shipping code that runs on multiple implementations of a standard is to test your software with all of the implementations that are relevant for that product. That’s something the HTML folk have had to live with for decades, and W3C’s conformance tests and other testing rigs haven’t helped one bit.

“Write once, run everywhere” never worked for anything. You have to test everywhere.

[QUOTE=Alfonse Reinheart;31377]It should be noted that pretty much every widely implemented standard has these kinds of issues. OpenGL is probably the best known for these issues, but C++ compilers encounter them as well. How many “XML parsers” are there that don’t parse XML correctly (fully)? Even those standards with conformance tests and certification programs like OpenGL ES have huge variances among implementations.

Annoying though it may be, the only effective means of shipping code that runs on multiple implementations of a standard is to test your software with all of the implementations that are relevant for that product. That’s something the HTML folk have had to live with for decades, and W3C’s conformance tests and other testing rigs haven’t helped one bit.

“Write once, run everywhere” never worked for anything. You have to test everywhere.[/QUOTE]

Thank you for that note. It seems correct, but it also seems a bit off-topic to me.
The purpose of my post is to find some actions that can be taken in spite of the situation we are in.

What I am asking is: is this undefined behavior, or is it an implementation issue? If it is the latter, could you please add these cases to the standard conformance tests, so that we can make sure the future is a bit brighter for us all?
We understand that when a standard has different implementations you will get all kinds of issues, but what I am trying to do here is take some action and start solving them (one small step at a time). After all, my examples are not obscure corner cases, but rather pretty straightforward ones.

Hey guys, I want to bump this a little bit.
Perhaps this is not the place to put these complaints ? Is there any place for bug reports or something like that ?
Thanks,
Blagovest.

A little bump, maybe.

I think I should note, guys, that if we ignore these issues they are not going to disappear by themselves.

I have to confess that I am a bit disappointed by the lack of feedback on this.
If it will help, I can gather as many OpenCL users as I can and kindly ask them to post here as well (or sign a petition or something, I don’t know).

I am happy to do whatever is necessary to improve this, since clearly we can all do better.

Very interesting. I have an objection regarding the use of swizzles on pointers (item 4), however.

The “message between the lines” in all GPU-oriented languages is that swizzles/masks look very similar to the ordinary operator. (member access), but they are not the same thing. They are different operators. Therefore operator-> doesn’t swizzle, since it is basically operator. applied to a dereferenced pointer.

I would suggest always dereferencing the pointer to a value before swizzling it. That said, having operator-> do swizzles/masks would indeed be very convenient.
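
A minimal sketch of that suggestion, assuming a helper called zero_x_and_w (a made-up name): copy the vector out of the pointer, select components on the value, then write it back. The struct typedef is only a host-side stand-in so the same text also builds as plain C; under OpenCL C the built-in float4 is used instead.

```c
#ifndef __OPENCL_VERSION__
/* Host-only stand-in for the built-in OpenCL C vector type (an assumption
   made so that this sketch also compiles as plain C). */
typedef struct { float x, y, z, w; } float4;
#endif

/* Hypothetical helper: zero the x and w components through a pointer. */
void zero_x_and_w(float4 *p)
{
    float4 v = *p;   /* dereference to a value first */
    v.x = 0.0f;      /* then select components with operator. */
    v.w = 0.0f;
    *p = v;          /* write the whole vector back */
}
```

Whether a given compiler also accepts p->x directly is exactly the kind of thing that differs between implementations, so the dereferenced form is the conservative choice.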

With the exception of the first issue that you described, the problems you are encountering are all issues with the compiler front-ends behaving differently. As has been pointed out, ensuring that all compiler front-ends behave in the same way for any possible valid input program is an intractable problem. You can add bugs to a regression test suite after-the-fact, but there will always be these kinds of differences when you are trying to compile a single program with several different compilers. The same is true of vanilla C programs being compiled with GCC, Clang and MSVC - you always have to test everywhere to make sure your code will actually compile everywhere, regardless of whether or not you are rigidly sticking to a standard.

However, this situation should improve with OpenCL as of version 2.1. In OpenCL 2.1, there is a new OpenCL C++ kernel language, but it is no longer required that OpenCL drivers can consume the kernel language at runtime. Instead, OpenCL drivers are required to support an intermediate representation (which will be the new SPIR-V standard on most platforms). Compilation of your OpenCL C++ kernels will be performed offline (i.e. at application build time), generating the SPIR-V program which you then ship with your program.

This means that you will only have to deal with a single compiler front-end, and therefore you shouldn’t have so many of the kinds of issues that you have been seeing. If you need online kernel compilation, there will likely also be solutions for this that will provide a single OpenCL C++ front-end, rather than one for each vendor.
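
If you do ship a precompiled SPIR-V file with your program, one cheap sanity check on the host side is to look at the first word before handing the bytes to the driver. The sketch below is only an illustration under assumptions: looks_like_spirv is a hypothetical helper, it accepts only modules stored in the host's byte order, and it does not validate anything beyond the magic number. In OpenCL 2.1 a buffer that passes would then be given to clCreateProgramWithIL, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SPIR-V modules are streams of 32-bit words; the first word is this
   magic number. */
#define SPIRV_MAGIC 0x07230203u

/* Hypothetical helper: returns 1 if the buffer plausibly holds a SPIR-V
   module in host byte order, 0 otherwise. */
int looks_like_spirv(const void *data, size_t size)
{
    uint32_t first_word;

    /* Need at least one whole word, and a whole number of words. */
    if (data == NULL || size < sizeof first_word || size % 4 != 0) {
        return 0;
    }

    /* memcpy avoids alignment assumptions about the caller's buffer. */
    memcpy(&first_word, data, sizeof first_word);
    return first_word == SPIRV_MAGIC;
}
```

A check like this only catches the wrong file being shipped; it says nothing about whether a particular driver will accept the module.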

As James pointed out, we are not ignoring these issues. We have decided to step back from runtime compilation and move to a mandated intermediate language with offline compilation and this is one of the big reasons for doing that.