Multiple cl::Program instances within one context with identical kernel names

I have the following situation:

Two threads handle two OpenCL devices that share the same context. Each thread loads a different version of the OpenCL device code, creates a cl::Program instance and compiles the code for its own cl::Device. However, after the program builds successfully, the subsequent createKernels call fails with error code -47, which the specification describes as:

CL_INVALID_KERNEL_DEFINITION if the function definition for __kernel function given by kernel_name such as the number of arguments, the argument types are not the same for all devices for which the program executable has been built.

With multiple cl::Context instances (one for each device), this worked well. Looking at the OpenCL class diagram, I don't see why I should not be able to use multiple programs with multiple kernels within one context, since the kernels are clearly distinguishable via their associated programs.

I’m using Nvidia's OpenCL implementation from CUDA SDK 5.5 (on multiple Tesla M2090s). The questions that arise for me are:

Is this a general misunderstanding of the OpenCL structure, i.e. is there a rule that every kernel within a context must have a unique name? Or is this one of Nvidia’s ways of handling this particular use case that does not conform to the OpenCL standard?

I really want multiple devices within one context so that I can copy from one cl::Buffer to another even if their memory resides on different devices.

According to the specification, the requirement is that the kernel signature (number and type of arguments) must be the same for all devices for which the program was built. If you build different programs for different devices, the error should not be raised, so this looks like it could be a spec violation in the NVIDIA OpenCL implementation. Can you prepare a minimal application that shows the problem? It could be useful for testing the problem across different platforms.
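
To make the rule above concrete, here is a toy model in C++. Everything in it (Signature, ProgramModel, violatesKernelDefinitionRule) is hypothetical and not part of the OpenCL API; it only encodes one reading of the spec, namely that the CL_INVALID_KERNEL_DEFINITION check compares a kernel's signature across the devices of a single program and never looks at other programs in the same context.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model, NOT the OpenCL API: a kernel "signature" is reduced to
// the list of its argument types, e.g. {"__global float*", "int"}.
using Signature = std::vector<std::string>;

// Hypothetical model of one cl::Program: for every device the program was
// built for, the __kernel functions found in that device's executable.
struct ProgramModel {
    std::map<int, std::map<std::string, Signature>> builds;  // device -> kernels
};

// True if creating all kernels of this program is expected to fail with
// CL_INVALID_KERNEL_DEFINITION, i.e. some kernel name has different signatures
// on two devices of this program. Other programs in the same context are
// deliberately not part of the check.
bool violatesKernelDefinitionRule(const ProgramModel& program) {
    std::map<std::string, Signature> firstSeen;  // kernel name -> first signature
    for (const auto& deviceBuild : program.builds) {
        for (const auto& [name, sig] : deviceBuild.second) {
            auto [it, inserted] = firstSeen.emplace(name, sig);
            if (!inserted && it->second != sig) {
                return true;  // same name, different definition across devices
            }
        }
    }
    return false;
}
```

Under this reading, each thread's single-device program passes the check even though both programs define a kernel with the same name; only one program built for two devices with diverging definitions should fail.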

Here is the problem condensed from my project: http://ideone.com/vFK9bi
The application reproduces the error. The second call to createKernels returns -47 aka CL_INVALID_KERNEL_DEFINITION.
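
When posting error output like the one below, it can help to log the symbolic name next to the number. This is a minimal sketch of such a helper; clErrorName is a hypothetical name, only a subset of codes is covered, and the numeric values are hard-coded from the OpenCL 1.1 CL/cl.h header so that the snippet compiles without it (in real code you would switch on the CL_* macros instead).

```cpp
#include <string>

// Hypothetical helper, not part of OpenCL: map an OpenCL error code to its
// symbolic name. Values correspond to the CL_* constants in CL/cl.h (1.1).
std::string clErrorName(int err) {
    switch (err) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -33: return "CL_INVALID_DEVICE";
        case -34: return "CL_INVALID_CONTEXT";
        case -44: return "CL_INVALID_PROGRAM";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -46: return "CL_INVALID_KERNEL_NAME";
        case -47: return "CL_INVALID_KERNEL_DEFINITION";
        case -48: return "CL_INVALID_KERNEL";
        default:  return "unknown OpenCL error " + std::to_string(err);
    }
}
```

For example, clErrorName(-47) yields "CL_INVALID_KERNEL_DEFINITION", and any code outside the listed subset falls back to the raw number.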

Output from my development environment:

Platform: NVIDIA CUDA Version: OpenCL 1.1 CUDA 4.2.1
Device 0: Device Name: Tesla M2090
Device 1: Device Name: Tesla M2090
Device 2: Device Name: Tesla M2090
Device 3: Device Name: Tesla M2090
Device 4: Device Name: Tesla M2090
OPENCL: program2.createKernels(&kernels2) failed! Error: -47 main.cpp:103

The CUDA Toolkit version is 5.5. However, the OpenCL library reports 4.2.1…
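
Note that the OpenCL spec defines the CL_PLATFORM_VERSION string as "OpenCL<space><major_version.minor_version><space><platform-specific information>". In "OpenCL 1.1 CUDA 4.2.1", only 1.1 is the OpenCL version; "CUDA 4.2.1" is vendor-specific text, which the spec does not require to match the toolkit version. The following sketch splits such a string. PlatformVersion and parsePlatformVersion are hypothetical names, and the input would typically come from cl::Platform::getInfo<CL_PLATFORM_VERSION>().

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical result type for a parsed CL_PLATFORM_VERSION string.
struct PlatformVersion {
    int major = 0;
    int minor = 0;
    std::string vendorInfo;  // platform-specific remainder, may be empty
};

// Parses "OpenCL<space><major>.<minor><space><platform-specific information>".
// Returns std::nullopt if the string does not start with that format.
std::optional<PlatformVersion> parsePlatformVersion(const std::string& s) {
    std::istringstream in(s);
    std::string prefix;
    PlatformVersion v;
    char dot = 0;
    if (!(in >> prefix) || prefix != "OpenCL") return std::nullopt;
    if (!(in >> v.major >> dot >> v.minor) || dot != '.') return std::nullopt;
    std::getline(in >> std::ws, v.vendorInfo);  // rest of the line, if any
    return v;
}
```

Fed the string "OpenCL 1.1 CUDA 4.2.1", it returns major 1, minor 1 and vendorInfo "CUDA 4.2.1".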