Is it a programmer's nightmare?
I'll start by saying that I'm not a programmer; I'm just an end-user who wants to get the best performance he can out of his hardware. To that end, I occasionally download source code and compile software, using compiler options to tune it for my hardware. Recently, I decided to compile SciLab. Their instructions suggested using Intel's Math Kernel Library, so I downloaded it, only to discover that it produces sub-optimal code for non-Intel CPUs, which put the brakes on that idea. Whilst I was on Intel's site, though, I saw their parallelization library, and thought that it would be the perfect solution to speed up machine learning for trading on the foreign exchange market.
To cut a long story short, after looking at dozens of libraries, and glancing through the OpenCL 2.1 spec, I've come up with an end-user's dream, which goes like this...
I install a program (e.g. SciLab, or R), and, during the installation process it...
1) Queries the host, and installs the appropriate main executable and associated specialized libraries (MKL for Intel; BLIS, LibM and libflame for AMD).
2) Queries any GPUs, DSPs, etc., and installs the appropriate specialized libraries for that hardware (if they're not already installed as part of the driver package for that hardware).
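To make the two installation steps concrete, here is a rough Python sketch of the selection logic. It is only an illustration under assumptions: the function names, the `generic-blas` fallback and the accelerator library names are hypothetical, and no real installer is claimed to work this way. The vendor strings `GenuineIntel` and `AuthenticAMD` are the ones the CPUs themselves report.

```python
# Hypothetical installer logic: choose libraries from what the hardware reports.
# The library names for Intel and AMD are the ones mentioned in this post.
LIBRARIES_BY_CPU_VENDOR = {
    "GenuineIntel": ["MKL"],
    "AuthenticAMD": ["BLIS", "LibM", "libflame"],
}

# Placeholder for unknown CPUs (an assumption, not a real package name).
GENERIC_FALLBACK = ["generic-blas"]


def select_host_libraries(vendor_id):
    """Step 1: pick the host math libraries for the detected CPU vendor."""
    return list(LIBRARIES_BY_CPU_VENDOR.get(vendor_id, GENERIC_FALLBACK))


def plan_install(vendor_id, accelerator_libraries, already_installed):
    """Step 2: add libraries for GPUs, DSPs, etc., skipping any that the
    driver package has already installed.

    accelerator_libraries maps a device name to the libraries it wants.
    """
    plan = select_host_libraries(vendor_id)
    for device, libraries in accelerator_libraries.items():
        for library in libraries:
            if library not in already_installed and library not in plan:
                plan.append(library)
    return plan


if __name__ == "__main__":
    # Made-up example: an AMD host with one GPU whose BLAS came with the driver.
    print(plan_install(
        "AuthenticAMD",
        {"gpu0": ["gpu-blas", "gpu-fft"]},
        already_installed={"gpu-blas"},
    ))
```

Note that the vendor string is only consulted at install time; nothing in the plan itself needs it afterwards.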
During operation, it uses the host's parallelization library to schedule tasks between the CPU and other hardware, but, to prevent vendor games, the scheduler should not know the vendor ID, just each device's capabilities. If the other hardware has its own parallelization library, it uses that as needed. Each hardware driver would supply optimization settings for the JIT compiler to use when compiling kernels for its discrete compute units.
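The "capabilities, not vendor ID" idea can be sketched as follows. This is a hypothetical Python illustration, not how OpenCL or any real scheduler is implemented: `DeviceCaps`, `Task` and `pick_device` are made-up names, and the two capabilities shown are just examples. (OpenCL does expose this kind of information, e.g. `CL_DEVICE_MAX_COMPUTE_UNITS` via `clGetDeviceInfo`, but it also exposes the vendor.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCaps:
    """What the scheduler is allowed to see. Deliberately has no vendor field."""
    name: str
    compute_units: int
    supports_fp64: bool


@dataclass(frozen=True)
class Task:
    name: str
    needs_fp64: bool


def pick_device(task, devices):
    """Choose the eligible device with the most compute units.

    A device is eligible if it meets the task's requirements; the vendor
    never enters the decision because the scheduler cannot see it.
    """
    eligible = [d for d in devices if d.supports_fp64 or not task.needs_fp64]
    if not eligible:
        raise ValueError(f"no device can run task {task.name!r}")
    return max(eligible, key=lambda d: d.compute_units)


if __name__ == "__main__":
    # Made-up hardware: a CPU with double precision, a GPU without it.
    devices = [
        DeviceCaps("cpu", compute_units=8, supports_fp64=True),
        DeviceCaps("gpu", compute_units=64, supports_fp64=False),
    ]
    print(pick_device(Task("train-model", needs_fp64=True), devices).name)
    print(pick_device(Task("filter-ticks", needs_fp64=False), devices).name)
```

A real scheduler would weigh much more than compute-unit counts (memory, transfer costs, current load), but the point is the same: the choice is made from capabilities alone.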
Am I asking too much? Am I being greedy? From what I've read (and from what little I really understand), vendors would provide low-level libraries that get the best performance out of their hardware, OpenCL aims to use the appropriate libraries, and the application programmer wouldn't have to worry about each piece of hardware's idiosyncrasies.
That's all I can come up with at the moment.