Building OpenCL kernels: clCreateProgramWithSource/clBuildProgram

I’m trying to build the following kernel:

pastebin<dot>com<slash>8m1Gvjr9

(I cannot post links, and the forum does not accept the code inline in a post) on an NVIDIA device running CUDA 7.5, on Debian:

$ lsb_release  -da
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 8.4 (jessie)
Release:	8.4
Codename:	jessie

OpenCL was installed together with CUDA.

The code that calls the read and build function is (pastebin URL):

YsJ3b3tZ


clCreateProgramWithSource() completes successfully, but clBuildProgram() fails with the following log (pastebin URL):

UfdvG9c0

Is it possible to define an emulated context for AMD GPUs on Nvidia systems so that the code can be run?

Short answer: No

Long answer: You are trying to run code that uses AMD-specific extensions on an NVIDIA GPU, where they are not supported. You should either switch to an AMD GPU or re-write the kernel to use generic OpenCL C without the AMD extensions (replace them with C code that does the same thing). Check with the author of the code to see if they had a reference C version before writing the device-specific version.
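As a rough illustration of "C code that does the same thing": the kernel itself is only on pastebin, so which built-ins it actually calls is an assumption here. The sketch below takes amd_bitalign from cl_amd_media_ops as an example, since it shows up often in AMD-tuned kernels, and writes it in plain C following the definition in the extension text. The names bitalign_generic and rotr32_generic are made up for this post. In OpenCL C the same body should work with uint and ulong in place of the stdint types.

```c
#include <stdint.h>

/* Portable stand-in for amd_bitalign (cl_amd_media_ops):
   treat src0 as the high word and src1 as the low word of a 64-bit value,
   shift right by the low 5 bits of src2, and keep the low 32 bits.
   The name bitalign_generic is hypothetical. */
static uint32_t bitalign_generic(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint64_t wide = ((uint64_t)src0 << 32) | (uint64_t)src1;
    return (uint32_t)(wide >> (src2 & 31u));
}

/* A common use: rotate right, obtained by passing the same word twice. */
static uint32_t rotr32_generic(uint32_t x, uint32_t n)
{
    return bitalign_generic(x, x, n);
}
```

For plain rotations, the standard OpenCL C built-in rotate() is another option that works on any vendor's device.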

Thanks. So all AMD-specific extensions explicitly require an AMD GPU and cannot be emulated on an NVIDIA device? (I cannot really rewrite the kernels, as the target platform is a multi-GPU AMD system.)

No, they are not emulated.

Your target platform is an AMD GPU and your kernel uses AMD-specific extensions, but your development system has an NVIDIA GPU that does not support them.

Every GPU AMD sells supports OpenCL, so buy or borrow the cheapest one you can find and use it for development. It will cost you less than the time you have spent so far on this issue.
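If you want the host code to detect this up front instead of failing in clBuildProgram(), one option is to look for the extension name in the device's extension string, which clGetDeviceInfo() returns for CL_DEVICE_EXTENSIONS as a space-separated list. The helper below is only a sketch of the string check. The name has_extension is hypothetical, and the OpenCL query itself is left out so the snippet compiles without the OpenCL headers.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in
   `ext_list`, 0 otherwise. A plain strstr() is not enough, because
   "cl_amd_media_ops" is a prefix of "cl_amd_media_ops2".
   has_extension is a hypothetical helper, not part of the OpenCL API. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += 1; /* keep scanning after a partial match */
    }
    return 0;
}
```

The host could then pick an AMD-specific or a generic kernel source depending on the result, for example has_extension(extensions, "cl_amd_media_ops2").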

https://www.khronos.org/registry/cl/extensions/amd/cl_amd_media_ops2.txt

cl_amd_media_ops2 in particular does not do anything extraordinary. But you will be in trouble if, for example, your project uses some of the major OpenCL 2.0 features.

The major OpenCL 2.0 features in question are shared virtual memory and device-side kernel enqueue. Neither is completely impossible to emulate in some perverted fashion, but doing so is far more trouble than it is worth and, obviously, carries a huge performance hit. There are also quality-of-life improvements like pipes, built-in reduce/scan functions, and generic pointers, plus features for razor-thin optimization like C11 atomics or the *_broadcast functions, but none of them changes the API's computational model too drastically.
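Coming back to cl_amd_media_ops2: to show how ordinary its built-ins are, here is a sketch of two of them, amd_bfe (unsigned bit-field extract) and amd_bfm (bit-field mask), written in plain C from my reading of the extension text linked above. Treat it as an approximation and check the edge cases against the spec before relying on it. The names bfe_generic and bfm_generic are hypothetical, and in OpenCL C you would use uint instead of uint32_t.

```c
#include <stdint.h>

/* Unsigned bit-field extract, modelled on amd_bfe:
   take `width` bits of `src` starting at bit `offset`.
   Both offset and width are masked to 5 bits, as in the extension text. */
static uint32_t bfe_generic(uint32_t src, uint32_t offset, uint32_t width)
{
    offset &= 31u;
    width &= 31u;
    if (width == 0u)
        return 0u;
    if (offset + width < 32u)
        return (src << (32u - offset - width)) >> (32u - width);
    return src >> offset; /* field runs past bit 31: just shift down */
}

/* Bit-field mask, modelled on amd_bfm:
   `width` one-bits shifted left by `offset`. */
static uint32_t bfm_generic(uint32_t width, uint32_t offset)
{
    return ((1u << (width & 31u)) - 1u) << (offset & 31u);
}
```

For example, bfe_generic(0xABCD1234, 8, 8) picks out bits 8 to 15 and gives 0x12, and bfm_generic(8, 4) gives 0xFF0.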