An End-User's Dream

Is it a programmer’s nightmare?

I’ll start by saying that I’m not a programmer; I’m just an end-user who wants to get the best performance he can out of his hardware. To that end, I occasionally download source code and compile software, using compiler options to tweak settings for my hardware. Recently, I decided to compile SciLab, and their instructions suggested using Intel’s Math Kernel Library, so I downloaded it, only to discover that it produces sub-optimal code for non-Intel CPUs, which put the brakes on that idea. Whilst I was on Intel’s site, though, I saw their parallelization library, and thought it would be the perfect solution to speed up machine learning for trading on the foreign exchange market.

To cut a long story short, after looking at dozens of libraries and glancing through the OpenCL 2.1 spec, I’ve come up with an end-user’s dream, which goes like this…

I install a program (e.g. SciLab or R), and during the installation process it…

  1. Queries the host, and installs the appropriate main executable and associated specialized libraries (MKL for Intel; BLIS, LibM & libflame for AMD).

  2. It then queries GPUs, DSPs, etc., and installs the appropriate specialized libraries for that hardware (if they’re not already installed as part of the driver package for that hardware); a sketch of this query step follows the list.
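
To make step 2 concrete, here is a minimal sketch of what the installer’s hardware query might look like, assuming it were built on the OpenCL platform API. The policy of mapping a device to a library package is entirely hypothetical; only the cl* calls are standard OpenCL.

```c
/*
 * Minimal sketch of an installer's hardware query, assuming it uses
 * the OpenCL platform API. The install policy (which tuned library to
 * unpack per device) is hypothetical; the cl* calls are real.
 */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;

    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS)
        return 1;
    if (num_platforms > 8)
        num_platforms = 8;

    for (cl_uint p = 0; p < num_platforms; ++p) {
        cl_device_id devices[16];
        cl_uint num_devices = 0;

        /* CL_DEVICE_TYPE_ALL picks up CPUs, GPUs and accelerators (e.g. DSPs). */
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                           16, devices, &num_devices) != CL_SUCCESS)
            continue;
        if (num_devices > 16)
            num_devices = 16;

        for (cl_uint d = 0; d < num_devices; ++d) {
            char name[256] = "";
            cl_device_type type = 0;

            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof name, name, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_TYPE, sizeof type, &type, NULL);

            /* A real installer would decide here which tuned library
             * (BLAS, FFT, ...) to unpack for this class of device. */
            printf("%s device found: %s\n",
                   (type & CL_DEVICE_TYPE_GPU) ? "GPU" :
                   (type & CL_DEVICE_TYPE_CPU) ? "CPU" : "Other",
                   name);
        }
    }
    return 0;
}
```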

During operation, it uses the host’s parallelization library to schedule tasks between the CPU and other hardware, but, to prevent vendor games, it should not know the vendor ID, only that hardware’s capabilities. If the other hardware has its own parallelization library, it uses that as needed. Each hardware driver would have optimization codes for the JIT compiler to use when compiling kernels for its discrete compute units.
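
To illustrate the vendor-blind part, here is a sketch that ranks devices purely by advertised capabilities (never CL_DEVICE_VENDOR_ID) and then lets the winning device’s own driver JIT-compile a kernel. The scoring heuristic, the kernel, and the build option are my own illustrative assumptions, not part of any spec; the API calls themselves are standard OpenCL.

```c
/*
 * Sketch of vendor-blind scheduling: rank devices only by the
 * capabilities they advertise (never CL_DEVICE_VENDOR_ID), then let
 * the chosen device's driver JIT-compile a kernel with its own
 * hardware-specific optimizations. Scoring formula and kernel are
 * illustrative assumptions.
 */
#include <stdio.h>
#include <CL/cl.h>

static const char *kernel_src =
    "__kernel void scale(__global float *x, float a) {"
    "    size_t i = get_global_id(0);"
    "    x[i] *= a;"
    "}";

/* Crude throughput proxy: compute units x clock (MHz). A real
 * scheduler would also weigh memory size, bandwidth and extensions. */
static cl_ulong score(cl_device_id dev)
{
    cl_uint cus = 0, mhz = 0;
    clGetDeviceInfo(dev, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof cus, &cus, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof mhz, &mhz, NULL);
    return (cl_ulong)cus * mhz;
}

int main(void)
{
    cl_platform_id plat;
    cl_uint np = 0;
    if (clGetPlatformIDs(1, &plat, &np) != CL_SUCCESS || np == 0)
        return 1;

    cl_device_id devs[16], best = NULL;
    cl_uint nd = 0;
    if (clGetDeviceIDs(plat, CL_DEVICE_TYPE_ALL, 16, devs, &nd) != CL_SUCCESS)
        return 1;
    if (nd > 16)
        nd = 16;

    cl_ulong best_score = 0;
    for (cl_uint i = 0; i < nd; ++i) {
        cl_ulong s = score(devs[i]);
        if (s > best_score) { best_score = s; best = devs[i]; }
    }
    if (!best)
        return 1;

    cl_int err;
    cl_context ctx = clCreateContext(NULL, 1, &best, NULL, NULL, &err);
    if (err != CL_SUCCESS)
        return 1;
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);

    /* clBuildProgram hands the source to the device driver's own
     * compiler, which applies its hardware-specific optimizations. */
    err = clBuildProgram(prog, 1, &best, "-cl-mad-enable", NULL, NULL);
    printf("kernel build %s\n", err == CL_SUCCESS ? "succeeded" : "failed");

    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}
```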

Am I asking too much? Am I being greedy? From what I’ve read (and from what little I really understand), vendors would provide low-level libraries that get the best performance out of their hardware, OpenCL would dispatch work to the appropriate library for each device, and the application programmer wouldn’t have to worry about each piece of hardware’s idiosyncrasies.

That’s all I can come up with at the moment.

Take care,

&

Have fun!

Radar =8^)

[QUOTE=Radar]It then queries GPUs, DSPs, etc., and installs the appropriate specialized libraries for that hardware (if they’re not already installed as part of the driver package for that hardware).[/QUOTE]

Basically, HSA :stuck_out_tongue:

For better or for worse, a programmer’s goal is never to reach 100% performance. Basically, it has to be “okay” and better than the competition’s. Of course, there are rare cases where a 5% speed-up means you can roll out your product a week sooner, but that usually means a dev has some fixed set of hardware to optimize for. Otherwise, it is wiser to invest more time into testing and new features. Who cares if your software is ludicrously fast when it is buggy and lacks basic features?

[QUOTE=Salabar;39564]Basically, HSA :stuck_out_tongue:

For better or for worse, a programmer’s goal is never to reach 100% performance. Basically, it has to be “okay” and better than the competition’s. Of course, there are rare cases where a 5% speed-up means you can roll out your product a week sooner, but that usually means a dev has some fixed set of hardware to optimize for. Otherwise, it is wiser to invest more time into testing and new features. Who cares if your software is ludicrously fast when it is buggy and lacks basic features?[/QUOTE]

I guess I could have been a bit clearer in the line you quoted. I meant specialized libraries that are provided by the hardware vendor. Looking around the ‘net, I’ve seen performance libraries from the major hardware vendors, and I think it’s safe to assume that they would be sub-optimal if used with competitors’ hardware.

Maybe I have an overly optimistic view of the purpose of OpenCL… I thought it was to save devs from having to optimize for any particular piece of hardware, so that they can concentrate on implementing features and squashing bugs.

A man can but dream :wink: